English Pages 1375 [1374] Year 2005
Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science
3613
Lipo Wang Yaochu Jin (Eds.)
Fuzzy Systems and Knowledge Discovery Second International Conference, FSKD 2005 Changsha, China, August 27-29, 2005 Proceedings, Part I
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors
Lipo Wang
Nanyang Technological University, School of Electrical and Electronic Engineering
Block S1, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]

Yaochu Jin
Honda Research Institute Europe
Carl-Legien-Str. 30, 63073 Offenbach/Main, Germany
E-mail: [email protected]
Library of Congress Control Number: 2005930642
CR Subject Classification (1998): I.2, F.4.1, F.1, F.2, G.2, I.2.3, I.4, I.5
ISSN 0302-9743
ISBN-10 3-540-28312-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-28312-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springeronline.com © Springer-Verlag Berlin Heidelberg 2005 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 11539506 06/3142 543210
Preface
This book and its sister volume, LNAI 3613 and 3614, constitute the proceedings of the Second International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2005), jointly held with the First International Conference on Natural Computation (ICNC 2005, LNCS 3610, 3611, and 3612) on August 27–29, 2005 in Changsha, Hunan, China. FSKD 2005 attracted 1249 submissions from 32 countries/regions (the joint ICNC-FSKD 2005 received 3136 submissions). After rigorous reviews, 333 high-quality papers, i.e., 206 long papers and 127 short papers, were included in the FSKD 2005 proceedings, representing an acceptance rate of 26.7%. The ICNC-FSKD 2005 conference featured the most up-to-date research results in computational algorithms inspired by nature, including biological, ecological, and physical systems. It is an exciting and emerging interdisciplinary area in which a wide range of techniques and methods are being studied for dealing with large, complex, and dynamic problems. The joint conferences also promoted cross-fertilization over these exciting and yet closely related areas, which had a significant impact on the advancement of these important technologies. Specific areas included computation with words, fuzzy computation, granular computation, neural computation, quantum computation, evolutionary computation, DNA computation, chemical computation, information processing in cells and tissues, molecular computation, artificial life, swarm intelligence, ant colony, artificial immune systems, etc., with innovative applications to knowledge discovery, finance, operations research, and more. In addition to the large number of submitted papers, we were blessed with the presence of four renowned keynote speakers and several distinguished panelists.
On behalf of the Organizing Committee, we thank Xiangtan University for sponsorship, and the IEEE Circuits and Systems Society, the IEEE Computational Intelligence Society, and the IEEE Control Systems Society for technical co-sponsorship. We are grateful for the technical cooperation from the International Neural Network Society, the European Neural Network Society, the Chinese Association for Artificial Intelligence, the Japanese Neural Network Society, the International Fuzzy Systems Association, the Asia-Pacific Neural Network Assembly, the Fuzzy Mathematics and Systems Association of China, and the Hunan Computer Federation. We thank the members of the Organizing Committee, the Advisory Board, and the Program Committee for their hard work over the past 18 months. We wish to express our heart-felt appreciation to the keynote and panel speakers, special session organizers, session chairs, reviewers, and student helpers. Our special thanks go to the publisher, Springer, for publishing the FSKD 2005 proceedings as two volumes of the Lecture Notes in Artificial Intelligence series (and the ICNC 2005 proceedings as three volumes of the Lecture Notes in Computer Science series). Finally, we thank all the authors
and participants for their great contributions that made this conference possible and all the hard work worthwhile.

August 2005
Lipo Wang
Yaochu Jin
Organization
FSKD 2005 was organized by Xiangtan University and technically co-sponsored by the IEEE Circuits and Systems Society, the IEEE Computational Intelligence Society, and the IEEE Control Systems Society, in cooperation with the International Neural Network Society, the European Neural Network Society, the Chinese Association for Artificial Intelligence, the Japanese Neural Network Society, the International Fuzzy Systems Association, the Asia-Pacific Neural Network Assembly, the Fuzzy Mathematics and Systems Association of China, and the Hunan Computer Federation.
Organizing Committee
Honorary Conference Chairs: Shun-ichi Amari, Japan; Lotfi A. Zadeh, USA
General Chair: He-An Luo, China
General Co-chairs: Lipo Wang, Singapore; Yunqing Huang, China
Program Chair: Yaochu Jin, Germany
Local Arrangement Chairs: Renren Liu, China; Xieping Gao, China
Proceedings Chair: Fen Xiao, China
Publicity Chair: Hepu Deng, Australia
Sponsorship/Exhibits Chairs: Shaoping Ling, China; Geok See Ng, Singapore
Webmasters: Linai Kuang, China; Yanyu Liu, China
Advisory Board Toshio Fukuda, Japan Kunihiko Fukushima, Japan Tom Gedeon, Australia Aike Guo, China Zhenya He, China Janusz Kacprzyk, Poland Nikola Kasabov, New Zealand John A. Keane, UK Soo-Young Lee, Korea Erkki Oja, Finland Nikhil R. Pal, India
Witold Pedrycz, Canada Jose C. Principe, USA Harold Szu, USA Shiro Usui, Japan Xindong Wu, USA Lei Xu, Hong Kong Xin Yao, UK Syozo Yasui, Japan Bo Zhang, China Yixin Zhong, China Jacek M. Zurada, USA
Program Committee Members Janos Abonyi, Hungary Jorge Casillas, Spain Pen-Chann Chang, Taiwan Chaochang Chiu, Taiwan Feng Chu, Singapore Oscar Cordon, Spain Honghua Dai, Australia Fernando Gomide, Brazil Saman Halgamuge, Australia Kaoru Hirota, Japan Frank Hoffmann, Germany Jinglu Hu, Japan Weili Hu, China Chongfu Huang, China Eyke Hüllermeier, Germany Hisao Ishibuchi, Japan Frank Klawonn, Germany Naoyuki Kubota, Japan Sam Kwong, Hong Kong Zongmin Ma, China
Michael Margaliot, Israel Ralf Mikut, Germany Pabitra Mitra, India Tadahiko Murata, Japan Detlef Nauck, UK Hajime Nobuhara, Japan Andreas Nürnberger, Germany Da Ruan, Belgium Thomas Runkler, Germany Rudy Setiono, Singapore Takao Terano, Japan Kai Ming Ting, Australia Yiyu Yao, Canada Gary Yen, USA Xinghuo Yu, Australia Jun Zhang, China Shichao Zhang, Australia Yanqing Zhang, USA Zhi-Hua Zhou, China
Special Sessions Organizers David Siu-Yeung Cho, Singapore Vlad Dimitrov, Australia Jinwu Gao, China Zheng Guo, China Bob Hodge, Australia Jiman Hong, Korea Jae-Woo Lee, Korea Xia Li, China
Zongmin Ma, China Geok-See Ng, Singapore Shaoqi Rao, China Slobodan Ribari, Croatia Sung Y. Shin, USA Yasufumi Takama, Japan Robert Woog, Australia
Reviewers Nitin V. Afzulpurkar Davut Akdas Kürat Ayan Yasar Becerikli Dexue Bi Rong-Fang Bie
Liu Bin Tao Bo Hongbin Cai Yunze Cai Jian Cao Chunguang Chang
An-Long Chen Dewang Chen Gang Chen Guangzhu Chen Jian Chen Shengyong Chen
Shi-Jay Chen Xuerong Chen Yijiang Chen Zhimei Chen Zushun Chen Hongqi Chen Qimei Chen Wei Cheng Xiang Cheng Tae-Ho Cho Xun-Xue Cui Ho Daniel Hepu Deng Tingquan Deng Yong Deng Zhi-Hong Deng Mingli Ding Wei-Long Ding Fangyan Dong Jingxin Dong Lihua Dong Yihong Dong Haifeng Du Weifeng Du Liu Fang Zhilin Feng Li Gang Chuanhou Gao Yu Gao Zhi Geng O. Nezih Gerek Rongjie Gu Chonghui Guo Gongde Guo Huawei Guo Mengshu Guo Zhongming Han Bo He Pilian He Liu Hong Kongfa Hu Qiao Hu Shiqiang Hu Zhikun Hu Zhonghui Hu
Zhonghui Hu Changchun Hua Jin Huang Qian Huang Yanxin Huang Yuansheng Huang Kohei Inoue Mahdi Jalili-Kharaajoo Caiyan Jia Ling-Ling Jiang Michael Jiang Xiaoyue Jiang Yanping Jiang Yunliang Jiang Cheng Jin Hanjun Jin Hong Jin Ningde Jin Xue-Bo Jin Min-Soo Kim Sungshin Kim Taehan Kim Ibrahim Beklan Kucukdemiral Rakesh Kumar Arya Ho Jae Lee Sang-Hyuk Lee Sang-Won Lee Wol Young Lee Xiuren Lei Bicheng Li Chunyan Li Dequan Li Dingfang Li Gang Li Hongyu Li Qing Li Ruqiang Li Tian-Rui Li Weigang Li Yu Li Zhichao Li Zhonghua Li Hongxing Li Xiaobei Liang
Ling-Zhi Liao Lei Lin Caixia Liu Fei Liu Guangli Liu Haowen Liu Honghai Liu Jian-Guo Liu Lanjuan Liu Peng Liu Qihe Liu Sheng Liu Xiaohua Liu Xiaojian Liu Yang Liu Qiang Luo Yanbin Luo Zhi-Jun Lv Jian Ma Jixin Ma Longhua Ma Ming Ma Yingcang Ma Dong Miao Zhinong Miao Fan Min Zhang Min Zhao Min Daniel Neagu Yiu-Kai Ng Wu-Ming Pan Jong Sou Park Yonghong Peng Punpiti Piamsa-Nga Heng-Nian Qi Gao Qiang Wu Qing Celia Ghedini Ralha Wang Rong Hongyuan Shen Zhenghao Shi Jeong-Hoon Shin Sung Chul Shin Chonghui Song Chunyue Song
Guangda Su Baolin Sun Changyin Sun Ling Sun Zhengxing Sun Chang-Jie Tang Shanhu Tang N K Tiwari Jiang Ping Wan Chong-Jun Wang Danli Wang Fang Wang Fei Wang Houfeng Wang Hui Wang Laisheng Wang Lin Wang Ling Wang Shitong Wang Shu-Bin Wang Xun Wang Yong Wang Zhe Wang Zhenlei Wang Zhongjie Wang Runsheng Wang Li Wei Weidong Wen Xiangjun Wen Taegkeun Whangbo Huaiyu Wu Jiangning Wu Jiangqin Wu
Jianping Wu Shunxiang Wu Xiaojun Wu Yuying Wu Changcheng Xiang Jun Xiao Xiaoming Xiao Wei Xie Gao Xin Zongyi Xing Hua Xu Lijun Xu Pengfei Xu Weijun Xu Xiao Xu Xinli Xu Yaoqun Xu De Xu Maode Yan Shaoze Yan Hai Dong Yang Jihui Yang Wei-Min Yang Yong Yang Zuyuan Yang Li Yao Shengbao Yao Bin Ye Guo Yi Jianwei Yin Xiang-Gang Yin Yilong Yin Deng Yong
Chun-Hai Yu Haibin Yuan Jixue Yuan Weiqi Yuan Chuanhua Zeng Wenyi Zeng Yurong Zeng Guojun Zhang Jian Ying Zhang Junping Zhang Ling Zhang Zhi-Zheng Zhang Yongjin Zhang Yongkui Zhang Jun Zhao Quanming Zhao Xin Zhao Yong Zhao Zhicheng Zhao Dongjian Zheng Wenming Zheng Zhonglong Zheng Weimin Zhong Hang Zhou Hui-Cheng Zhou Qiang Zhou Yuanfeng Zhou Yue Zhou Daniel Zhu Hongwei Zhu Xinglong Zhu
The term after a name may represent either a country or a region.
Table of Contents – Part I
Fuzzy Theory and Models

On Fuzzy Inclusion in the Interval-Valued Sense
  Jin Han Park, Jong Seo Park, Young Chel Kwun . . . . . 1
Fuzzy Evaluation Based Multi-objective Reactive Power Optimization in Distribution Networks
  Jiachuan Shi, Yutian Liu . . . . . 11
Note on Interval-Valued Fuzzy Set
  Wenyi Zeng, Yu Shi . . . . . 20
Knowledge Structuring and Evaluation Based on Grey Theory
  Chen Huang, Yushun Fan . . . . . 26
A Propositional Calculus Formal Deductive System LU of Universal Logic and Its Completeness
  Minxia Luo, Huacan He . . . . . 31
Entropy and Subsethood for General Interval-Valued Intuitionistic Fuzzy Sets
  Xiao-dong Liu, Su-hua Zheng, Feng-lan Xiong . . . . . 42
The Comparative Study of Logical Operator Set and Its Corresponding General Fuzzy Rough Approximation Operator Set
  Suhua Zheng, Xiaodong Liu, Fenglan Xiong . . . . . 53
Associative Classification Based on Correlation Analysis
  Jian Chen, Jian Yin, Jin Huang, Ming Feng . . . . . 59
Design of Interpretable and Accurate Fuzzy Models from Data
  Zong-yi Xing, Yong Zhang, Li-min Jia, Wei-li Hu . . . . . 69
Generating Extended Fuzzy Basis Function Networks Using Hybrid Algorithm
  Bin Ye, Chengzhi Zhu, Chuangxin Guo, Yijia Cao . . . . . 79
Analysis of Temporal Uncertainty of Trains Converging Based on Fuzzy Time Petri Nets
  Yangdong Ye, Juan Wang, Limin Jia . . . . . 89
Interval Regression Analysis Using Support Vector Machine and Quantile Regression
  Changha Hwang, Dug Hun Hong, Eunyoung Na, Hyejung Park, Jooyong Shim . . . . . 100
An Approach Based on Similarity Measure to Multiple Attribute Decision Making with Trapezoid Fuzzy Linguistic Variables
  Zeshui Xu . . . . . 110
Research on Index System and Fuzzy Comprehensive Evaluation Method for Passenger Satisfaction
  Yuanfeng Zhou, Jianping Wu, Yuanhua Jia . . . . . 118
Research on Predicting Hydatidiform Mole Canceration Tendency by a Fuzzy Integral Model
  Yecai Guo, Yi Guo, Wei Rao, Wei Ma . . . . . 122
Consensus Measures and Adjusting Inconsistency of Linguistic Preference Relations in Group Decision Making
  Zhi-Ping Fan, Xia Chen . . . . . 130
Fuzzy Variation Coefficients Programming of Fuzzy Systems and Its Application
  Xiaobei Liang, Daoli Zhu, Bingyong Tang . . . . . 140
Weighted Possibilistic Variance of Fuzzy Number and Its Application in Portfolio Theory
  Xun Wang, Weijun Xu, Weiguo Zhang, Maolin Hu . . . . . 148
Another Discussion About Optimal Solution to Fuzzy Constraints Linear Programming
  Yun-feng Tan, Bing-yuan Cao . . . . . 156
Fuzzy Ultra Filters and Fuzzy G-Filters of MTL-Algebras
  Xiao-hong Zhang, Yong-quan Wang, Yong-lin Liu . . . . . 160
A Study on Relationship Between Fuzzy Rough Approximation Operators and Fuzzy Topological Spaces
  Wei-Zhi Wu . . . . . 167
A Case Retrieval Model Based on Factor-Structure Connection and λ-Similarity in Fuzzy Case-Based Reasoning
  Dan Meng, Zaiqiang Zhang, Yang Xu . . . . . 175
A TSK Fuzzy Inference Algorithm for Online Identification
  Kyoungjung Kim, Eun Ju Whang, Chang-Woo Park, Euntai Kim, Mignon Park . . . . . 179
Histogram-Based Generation Method of Membership Function for Extracting Features of Brain Tissues on MRI Images
  Weibei Dou, Yuan Ren, Yanping Chen, Su Ruan, Daniel Bloyet, Jean-Marc Constans . . . . . 189

Uncertainty Management in Data Mining

On Identity-Discrepancy-Contrary Connection Degree in SPA and Its Applications
  Yunliang Jiang, Yueting Zhuang, Yong Liu, Keqin Zhao . . . . . 195
A Mathematic Model for Automatic Summarization
  Zhiqi Wang, Yongcheng Wang, Kai Gao . . . . . 199
Reliable Data Selection with Fuzzy Entropy
  Sang-Hyuk Lee, Youn-Tae Kim, Seong-Pyo Cheon, Sungshin Kim . . . . . 203

Uncertainty Management and Probabilistic Methods in Data Mining

Optimization of Concept Discovery in Approximate Information System Based on FCA
  Hanjun Jin, Changhua Wei, Xiaorong Wang, Jia Fu . . . . . 213
Geometrical Probability Covering Algorithm
  Junping Zhang, Stan Z. Li, Jue Wang . . . . . 223

Approximate Reasoning

Extended Fuzzy ALCN and Its Tableau Algorithm
  Jianjiang Lu, Baowen Xu, Yanhui Li, Dazhou Kang, Peng Wang . . . . . 232
Type II Topological Logic C2T and Approximate Reasoning
  Yalin Zheng, Changshui Zhang, Yinglong Xia . . . . . 243
Type-I Topological Logic C1T and Approximate Reasoning
  Yalin Zheng, Changshui Zhang, Xin Yao . . . . . 253
Vagueness and Extensionality
  Shunsuke Yatabe, Hiroyuki Inaoka . . . . . 263
Using Fuzzy Analogical Reasoning to Refine the Query Answers for Relational Databases with Imprecise Information
  Z.M. Ma, Li Yan, Gui Li . . . . . 267
A Linguistic Truth-Valued Uncertainty Reasoning Model Based on Lattice-Valued Logic
  Shuwei Chen, Yang Xu, Jun Ma . . . . . 276

Axiomatic Foundation

Fuzzy Programming Model for Lot Sizing Production Planning Problem
  Weizhen Yan, Jianhua Zhao, Zhe Cao . . . . . 285
Fuzzy Dominance Based on Credibility Distributions
  Jin Peng, Henry M.K. Mok, Wai-Man Tse . . . . . 295
Fuzzy Chance-Constrained Programming for Capital Budgeting Problem with Fuzzy Decisions
  Jinwu Gao, Jianhua Zhao, Xiaoyu Ji . . . . . 304
Genetic Algorithms for Dissimilar Shortest Paths Based on Optimal Fuzzy Dissimilar Measure and Applications
  Yinzhen Li, Ruichun He, Linzhong Liu, Yaohuang Guo . . . . . 312
Convergence Criteria and Convergence Relations for Sequences of Fuzzy Random Variables
  Yan-Kui Liu, Jinwu Gao . . . . . 321
Hybrid Genetic-SPSA Algorithm Based on Random Fuzzy Simulation for Chance-Constrained Programming
  Yufu Ning, Wansheng Tang, Hui Wang . . . . . 332
Random Fuzzy Age-Dependent Replacement Policy
  Song Xu, Jiashun Zhang, Ruiqing Zhao . . . . . 336
A Theorem for Fuzzy Random Alternating Renewal Processes
  Ruiqing Zhao, Wansheng Tang, Guofei Li . . . . . 340
Three Equilibrium Strategies for Two-Person Zero-Sum Game with Fuzzy Payoffs
  Lin Xu, Ruiqing Zhao, Tingting Shu . . . . . 350
Fuzzy Classifiers

An Improved Rectangular Decomposition Algorithm for Imprecise and Uncertain Knowledge Discovery
  Jiyoung Song, Younghee Im, Daihee Park . . . . . 355
XPEV: A Storage Model for Well-Formed XML Documents
  Jie Qin, Shu-Mei Zhao, Shu-Qiang Yang, Wen-Hua Dou . . . . . 360
Fuzzy-Rough Set Based Nearest Neighbor Clustering Classification Algorithm
  Xiangyang Wang, Jie Yang, Xiaolong Teng, Ningsong Peng . . . . . 370
An Efficient Text Categorization Algorithm Based on Category Memberships
  Zhi-Hong Deng, Shi-Wei Tang, Ming Zhang . . . . . 374
The Integrated Location Algorithm Based on Fuzzy Identification and Data Fusion with Signal Decomposition
  Zhao Ping, Haoshan Shi . . . . . 383
A Web Document Classification Approach Based on Fuzzy Association Concept
  Jingsheng Lei, Yaohong Kang, Chunyan Lu, Zhang Yan . . . . . 388
Optimized Fuzzy Classification Using Genetic Algorithm
  Myung Won Kim, Joung Woo Ryu . . . . . 392
Dynamic Test-Sensitive Decision Trees with Multiple Cost Scales
  Zhenxing Qin, Chengqi Zhang, Xuehui Xie, Shichao Zhang . . . . . 402
Design of T–S Fuzzy Classifier via Linear Matrix Inequality Approach
  Moon Hwan Kim, Jin Bae Park, Young Hoon Joo, Ho Jae Lee . . . . . 406
Design of Fuzzy Rule-Based Classifier: Pruning and Learning
  Do Wan Kim, Jin Bae Park, Young Hoon Joo . . . . . 416
Fuzzy Sets Theory Based Region Merging for Robust Image Segmentation
  Hongwei Zhu, Otman Basir . . . . . 426
A New Interactive Segmentation Scheme Based on Fuzzy Affinity and Live-Wire
  Huiguang He, Jie Tian, Yao Lin, Ke Lu . . . . . 436
Fuzzy Clustering

The Fuzzy Mega-cluster: Robustifying FCM by Scaling Down Memberships
  Amit Banerjee, Rajesh N. Davé . . . . . 444
Robust Kernel Fuzzy Clustering
  Weiwei Du, Kohei Inoue, Kiichi Urahama . . . . . 454
Spatial Homogeneity-Based Fuzzy c-Means Algorithm for Image Segmentation
  Bo-Yeong Kang, Dae-Won Kim, Qing Li . . . . . 462
A Novel Fuzzy-Connectedness-Based Incremental Clustering Algorithm for Large Databases
  Yihong Dong, Xiaoying Tai, Jieyu Zhao . . . . . 470
Classification of MPEG VBR Video Data Using Gradient-Based FCM with Divergence Measure
  Dong-Chul Park . . . . . 475
Fuzzy-C-Mean Determines the Principle Component Pairs to Estimate the Degree of Emotion from Facial Expressions
  M. Ashraful Amin, Nitin V. Afzulpurkar, Matthew N. Dailey, Vatcharaporn Esichaikul, Dentcho N. Batanov . . . . . 484
An Improved Clustering Algorithm for Information Granulation
  Qinghua Hu, Daren Yu . . . . . 494
A Novel Segmentation Method for MR Brain Images Based on Fuzzy Connectedness and FCM
  Xian Fan, Jie Yang, Lishui Cheng . . . . . 505
Improved-FCM-Based Readout Segmentation and PRML Detection for Photochromic Optical Disks
  Jiqi Jian, Cheng Ma, Huibo Jia . . . . . 514
Fuzzy Reward Modeling for Run-Time Peer Selection in Peer-to-Peer Networks
  Huaxiang Zhang, Xiyu Liu, Peide Liu . . . . . 523
KFCSA: A Novel Clustering Algorithm for High-Dimension Data
  Kan Li, Yushu Liu . . . . . 531
Fuzzy Database Mining and Information Retrieval

An Improved VSM Based Information Retrieval System and Fuzzy Query Expansion
  Jiangning Wu, Hiroki Tanioka, Shizhu Wang, Donghua Pan, Kenichi Yamamoto, Zhongtuo Wang . . . . . 537
The Extraction of Image’s Salient Points for Image Retrieval
  Wenyin Zhang, Jianguo Tang, Chao Li . . . . . 547
A Sentence-Based Copy Detection Approach for Web Documents
  Rajiv Yerra, Yiu-Kai Ng . . . . . 557
The Research on Query Expansion for Chinese Question Answering System
  Zhengtao Yu, Xiaozhong Fan, Lirong Song, Jianyi Guo . . . . . 571
Multinomial Approach and Multiple-Bernoulli Approach for Information Retrieval Based on Language Modeling
  Hua Huo, Junqiang Liu, Boqin Feng . . . . . 580
Adaptive Query Refinement Based on Global and Local Analysis
  Chaoyuan Cui, Hanxiong Chen, Kazutaka Furuse, Nobuo Ohbo . . . . . 584
Information Push-Delivery for User-Centered and Personalized Service
  Zhiyun Xin, Jizhong Zhao, Chihong Chi, Jiaguang Sun . . . . . 594
Mining Association Rules Based on Seed Items and Weights
  Chen Xiang, Zhang Yi, Wu Yue . . . . . 603
An Algorithm of Online Goods Information Extraction with Two-Stage Working Pattern
  Wang Xun, Ling Yun, Yu-lian Fei . . . . . 609
A Novel Method of Image Retrieval Based on Combination of Semantic and Visual Features
  Ming Li, Tong Wang, Bao-wei Zhang, Bi-Cheng Ye . . . . . 619
Using Fuzzy Pattern Recognition to Detect Unknown Malicious Executables Code
  Boyun Zhang, Jianping Yin, Jingbo Hao . . . . . 629
Method of Risk Discernment in Technological Innovation Based on Path Graph and Variable Weight Fuzzy Synthetic Evaluation
  Yuan-sheng Huang, Jian-xun Qi, Jun-hua Zhou . . . . . 635
Application of Fuzzy Similarity to Prediction of Epileptic Seizures Using EEG Signals
  Xiaoli Li, Xin Yao . . . . . 645
A Fuzzy Multicriteria Analysis Approach to the Optimal Use of Reserved Land for Agriculture
  Hepu Deng, Guifang Yang . . . . . 653
Fuzzy Comprehensive Evaluation for the Optimal Management of Responding to Oil Spill
  Xin Liu, Kai W. Wirtz, Susanne Adam . . . . . 662

Information Fusion

Fuzzy Fusion for Face Recognition
  Xuerong Chen, Zhongliang Jing, Gang Xiao . . . . . 672
A Group Decision Making Method for Integrating Outcome Preferences in Hypergame Situations
  Yexin Song, Qian Wang, Zhijun Li . . . . . 676
A Method Based on IA Operator for Multiple Attribute Group Decision Making with Uncertain Linguistic Information
  Zeshui Xu . . . . . 684
A New Prioritized Information Fusion Method for Handling Fuzzy Information Retrieval Problems
  Won-Sin Hong, Shi-Jay Chen, Li-Hui Wang, Shyi-Ming Chen . . . . . 694
Multi-context Fusion Based Robust Face Detection in Dynamic Environments
  Mi Young Nam, Phill Kyu Rhee . . . . . 698
Unscented Fuzzy Tracking Algorithm for Maneuvering Target
  Shi-qiang Hu, Li-wei Guo, Zhong-liang Jing . . . . . 708
A Pixel-Level Multisensor Image Fusion Algorithm Based on Fuzzy Logic
  Long Zhao, Baochang Xu, Weilong Tang, Zhe Chen . . . . . 717

Neuro-Fuzzy Systems

Approximation Bound for Fuzzy-Neural Networks with Bell Membership Function
  Weimin Ma, Guoqing Chen . . . . . 721
A Neuro-Fuzzy Method of Forecasting the Network Traffic of Accessing Web Server
  Ai-Min Yang, Xing-Min Sun, Chang-Yun Li, Ping Liu . . . . . 728
A Fuzzy Neural Network System Based on Generalized Class Cover Problem
  Yanxin Huang, Yan Wang, Wengang Zhou, Chunguang Zhou . . . . . 735
A Self-constructing Compensatory Fuzzy Wavelet Network and Its Applications
  Haibin Yu, Qianjin Guo, Aidong Xu . . . . . 743
A New Balancing Method for Flexible Rotors Based on Neuro-fuzzy System and Information Fusion
  Shi Liu . . . . . 757
Recognition of Identifiers from Shipping Container Images Using Fuzzy Binarization and Enhanced Fuzzy Neural Network
  Kwang-Baek Kim . . . . . 761
Directed Knowledge Discovery Methodology for the Prediction of Ozone Concentration
  Seong-Pyo Cheon, Sungshin Kim . . . . . 772
Application of Fuzzy Systems in the Car-Following Behaviour Analysis
  Pengjun Zheng, Mike McDonald . . . . . 782

Fuzzy Control

GA-Based Composite Sliding Mode Fuzzy Control for Double-Pendulum-Type Overhead Crane
  Diantong Liu, Weiping Guo, Jianqiang Yi . . . . . 792
A Balanced Model Reduction for T-S Fuzzy Systems with Integral Quadratic Constraints
  Seog-Hwan Yoo, Byung-Jae Choi . . . . . 802
An Integrated Navigation System of NGIMU/GPS Using a Fuzzy Logic Adaptive Kalman Filter
  Mingli Ding, Qi Wang . . . . . 812
Method of Fuzzy-PID Control on Vehicle Longitudinal Dynamics System
  Yinong Li, Zheng Ling, Yang Liu, Yanjuan Qiao . . . . . 822
Design of Fuzzy Controller and Parameter Optimizer for Non-linear System Based on Operator’s Knowledge
  Hyeon Bae, Sungshin Kim, Yejin Kim . . . . . 833
A New Pre-processing Method for Multi-channel Echo Cancellation Based on Fuzzy Control
  Xiaolu Li, Wang Jie, Shengli Xie . . . . . 837
Robust Adaptive Fuzzy Control for Uncertain Nonlinear Systems
  Chen Gang, Shuqing Wang, Jianming Zhang . . . . . 841
Intelligent Fuzzy Systems for Aircraft Landing Control
  Jih-Gau Juang, Bo-Shian Lin, Kuo-Chih Chin . . . . . 851
Scheduling Design of Controllers with Fuzzy Deadline
  Hong Jin, Hongan Wang, Hui Wang, Danli Wang . . . . . 861
A Preference Method with Fuzzy Logic in Service Scheduling of Grid Computing
  Yanxiang He, Haowen Liu, Weidong Wen, Hui Jin . . . . . 865
H∞ Robust Fuzzy Control of Ultra-High Rise / High Speed Elevators with Uncertainty
  Hu Qing, Qingding Guo, Dongmei Yu, Xiying Ding . . . . . 872
A Dual-Mode Fuzzy Model Predictive Control Scheme for Unknown Continuous Nonlinear System
  Chonghui Song, Shucheng Yang, Hui Yang, Huaguang Zhang, Tianyou Chai . . . . . 876
Fuzzy Modeling Strategy for Control of Nonlinear Dynamical Systems
  Bin Ye, Chengzhi Zhu, Chuangxin Guo, Yijia Cao . . . . . 882
Intelligent Digital Control for Nonlinear Systems with Multirate Sampling
  Do Wan Kim, Jin Bae Park, Young Hoon Joo . . . . . 886
Feedback Control of Humanoid Robot Locomotion
  Xusheng Lei, Jianbo Su . . . . . 890
Application of Computational Intelligence (Fuzzy Logic, Neural Networks and Evolutionary Programming) to Active Networking Technology
  Mehdi Galily, Farzad Habibipour Roudsari, Mohammadreza Sadri . . . . . 900
Fuel-Efficient Maneuvers for Constellation Initialization Using Fuzzy Logic Control
  Mengfei Yang, Honghua Zhang, Rucai Che, Zengqi Sun . . . . . 910
Design of Interceptor Guidance Law Using Fuzzy Logic
  Ya-dong Lu, Ming Yang, Zi-cai Wang . . . . . 922
Relaxed LMIs Observer-Based Controller Design via Improved T-S Fuzzy Model Structure
  Wei Xie, Huaiyu Wu, Xin Zhao . . . . . 930
Fuzzy Virtual Coupling Design for High Performance Haptic Display
  D. Bi, J. Zhang, G.L. Wang . . . . . 942
Linguistic Model for the Controlled Object
  Zhinong Miao, Xiangyu Zhao, Yang Xu . . . . . 950
Fuzzy Sliding Mode Control for Uncertain Nonlinear Systems
  Shao-Cheng Qu, Yong-Ji Wang . . . . . 960
Fuzzy Control of Nonlinear Pipeline Systems with Bounds on Output Peak
  Fei Liu, Jun Chen . . . . . 969
Grading Fuzzy Sliding Mode Control in AC Servo System
  Hu Qing, Qingding Guo, Dongmei Yu, Xiying Ding . . . . . 977
A Robust Single Input Adaptive Sliding Mode Fuzzy Logic Controller for Automotive Active Suspension System
  Ibrahim B. Kucukdemiral, Seref N. Engin, Vasfi E. Omurlu, Galip Cansever . . . . . 981
Construction of Fuzzy Models for Dynamic Systems Using Multi-population Cooperative Particle Swarm Optimizer
  Ben Niu, Yunlong Zhu, Xiaoxian He . . . . . 987
Human Clustering for a Partner Robot Based on Computational Intelligence
  Indra Adji Sulistijono, Naoyuki Kubota . . . . . 1001
Fuzzy Switching Controller for Multiple Model
  Baozhu Jia, Guang Ren, Zhihong Xiu . . . . . 1011
Generation of Fuzzy Rules and Learning Algorithms for Cooperative Behavior of Autonomouse Mobile Robots (AMRs)
  Jang-Hyun Kim, Jin-Bae Park, Hyun-Seok Yang, Young-Pil Park . . . . . 1015
UML-Based Design and Fuzzy Control of Automated Vehicles Abdelkader El Kamel, Jean-Pierre Bourey . . . . . . . . . . . . . . . . . . . . . . . . . 1025
Fuzzy Hardware
Design of an Analog Adaptive Fuzzy Logic Controller Zhihao Xu, Dongming Jin, Zhijian Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1034 VLSI Implementation of a Self-tuning Fuzzy Controller Based on Variable Universe of Discourse Weiwei Shan, Dongming Jin, Weiwei Jin, Zhihao Xu . . . . . . . . . . . . . . 1044
Knowledge Visualization and Exploration
Method to Balance the Communication Among Multi-agents in Real Time Traffic Synchronization Li Weigang, Marcos Vinícius Pinheiro Dib, Alba Cristina Magalhães de Melo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053 A Celerity Association Rules Method Based on Data Sort Search Zhiwei Huang, Qin Liao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1063 Using Web Services to Create the Collaborative Model for Enterprise Digital Content Portal Ruey-Ming Chao, Chin-Wen Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067 Emotion-Based Textile Indexing Using Colors and Texture Eun Yi Kim, Soo-jeong Kim, Hyun-jin Koo, Karpjoo Jeong, Jee-in Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1077 Optimal Space Launcher Design Using a Refined Response Surface Method Jae-Woo Lee, Kwon-Su Jeon, Yung-Hwan Byun, Sang-Jin Kim . . . . . . 1081 MEDIC: A MDO-Enabling Distributed Computing Framework Shenyi Jin, Kwangsik Kim, Karpjoo Jeong, Jaewoo Lee, Jonghwa Kim, Hoyon Hwang, Hae-Gook Suh . . . . . . . . . . . . . . . . . . . . . . 1092 Time and Space Efficient Search for Small Alphabets with Suffix Arrays Jeong Seop Sim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1102 Optimal Supersonic Air-Launching Rocket Design Using Multidisciplinary System Optimization Approach Jae-Woo Lee, Young Chang Choi, Yung-Hwan Byun . . . . . . . . . . . . . . . . 1108
Numerical Visualization of Flow Instability in Microchannel Considering Surface Wettability Doyoung Byun, Budiono, Ji Hye Yang, Changjin Lee, Ki Won Lim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1113 A Interactive Molecular Modeling System Based on Web Service Sungjun Park, Bosoon Kim, Jee-In Kim . . . . . . . . . . . . . . . . . . . . . . . . . . 1117 On the Filter Size of DMM for Passive Scalar in Complex Flow Yang Na, Dongshin Shin, Seungbae Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . 1127 Visualization Process for Design and Manufacturing of End Mills Sung-Lim Ko, Trung-Thanh Pham, Yong-Hyun Kim . . . . . . . . . . . . . . . 1133 IP Address Lookup with the Visualizable Biased Segment Tree Inbok Lee, Jeong-Shik Mun, Sung-Ryul Kim . . . . . . . . . . . . . . . . . . . . . . . 1137 A Surface Reconstruction Algorithm Using Weighted Alpha Shapes Si Hyung Park, Seoung Soo Lee, Jong Hwa Kim . . . . . . . . . . . . . . . . . . . 1141
Sequential Data Analysis
HYBRID: From Atom-Clusters to Molecule-Clusters Zhou Bing, Jun-yi Shen, Qin-ke Peng . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1151 A Fuzzy Adaptive Filter for State Estimation of Unknown Structural System and Evaluation for Sound Environment Akira Ikuta, Hisako Masuike, Yegui Xiao, Mitsuo Ohta . . . . . . . . . . . . . 1161 Preventing Meaningless Stock Time Series Pattern Discovery by Changing Perceptually Important Point Detection Tak-chung Fu, Fu-lai Chung, Robert Luk, Chak-man Ng . . . . . . . . . . . . 1171 Discovering Frequent Itemsets Using Transaction Identifiers Duckjin Chai, Heeyoung Choi, Buhyun Hwang . . . . . . . . . . . . . . . . . . . . . 1175 Incremental DFT Based Search Algorithm for Similar Sequence Quan Zheng, Zhikai Feng, Ming Zhu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1185
Parallel and Distributed Data Mining
Computing High Dimensional MOLAP with Parallel Shell Mini-cubes Kong-fa Hu, Chen Ling, Shen Jie, Gu Qi, Xiao-li Tang . . . . . . . . . . . . . 1192
Sampling Ensembles for Frequent Patterns Caiyan Jia, Ruqian Lu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1197 Distributed Data Mining on Clusters with Bayesian Mixture Modeling M. Viswanathan, Y.K. Yang, T.K. Whangbo . . . . . . . . . . . . . . . . . . . . . . 1207 A Method of Data Classification Based on Parallel Genetic Algorithm Yuexiang Shi, Zuqiang Meng, Zixing Cai, B. Benhabib . . . . . . . . . . . . . . 1217
Rough Sets
Rough Computation Based on Similarity Matrix Huang Bing, Guo Ling, He Xin, Xian-zhong Zhou . . . . . . . . . . . . . . . . . 1223 The Relationship Among Several Knowledge Reduction Approaches Keyun Qin, Zheng Pei, Weifeng Du . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1232 Rough Approximation of a Preference Relation for Stochastic Multi-attribute Decision Problems Chaoyuan Yue, Shengbao Yao, Peng Zhang, Wanan Cui . . . . . . . . . . . . 1242 Incremental Target Recognition Algorithm Based on Improved Discernibility Matrix Liu Yong, Xu Congfu, Yan Zhiyong, Pan Yunhe . . . . . . . . . . . . . . . . . . . 1246 Problems Relating to the Phonetic Encoding of Words in the Creation of a Phonetic Spelling Recognition Program Michael Higgins, Wang Shudong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1256 Diversity Measure for Multiple Classifier Systems Qinghua Hu, Daren Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1261 A Successive Design Method of Rough Controller Using Extra Excitation Geng Wang, Jun Zhao, Jixin Qian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1266 A Soft Sensor Model Based on Rough Set Theory and Its Application in Estimation of Oxygen Concentration Xingsheng Gu, Dazhong Sun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1271 A Divide-and-Conquer Discretization Algorithm Fan Min, Lijun Xie, Qihe Liu, Hongbin Cai . . . . . . . . . . . . . . . . . . . . . . . 1277 A Hybrid Classifier Based on Rough Set Theory and Support Vector Machines Gexiang Zhang, Zhexin Cao, Yajun Gu . . . . . . . . . . . . . . . . . . . . . . . . . . . 1287
A Heuristic Algorithm for Maximum Distribution Reduction Xiaobing Pei, YuanZhen Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1297 The Minimization of Axiom Sets Characterizing Generalized Fuzzy Rough Approximation Operators Xiao-Ping Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1303 The Representation and Resolution of Rough Sets Based on the Extended Concept Lattice Xuegang Hu, Yuhong Zhang, Xinya Wang . . . . . . . . . . . . . . . . . . . . . . . . . 1309 Study of Integrate Models of Rough Sets and Grey Systems Wu Shunxiang, Liu Sifeng, Li Maoqing . . . . . . . . . . . . . . . . . . . . . . . . . . . 1313 Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1325
Table of Contents – Part II
Dimensionality Reduction
Dimensionality Reduction for Semi-supervised Face Recognition Weiwei Du, Kohei Inoue, Kiichi Urahama . . . . . . . . . . . . . . . . . . . . . . . .
1
Cross-Document Transliterated Personal Name Coreference Resolution Houfeng Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
Difference-Similitude Matrix in Text Classification Xiaochun Huang, Ming Wu, Delin Xia, Puliu Yan . . . . . . . . . . . . . . . . .
21
A Study on Feature Selection for Toxicity Prediction Gongde Guo, Daniel Neagu, Mark T.D. Cronin . . . . . . . . . . . . . . . . . . . .
31
Application of Feature Selection for Unsupervised Learning in Prosecutors’ Office Peng Liu, Jiaxian Zhu, Lanjuan Liu, Yanhong Li, Xuefeng Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
35
A Novel Field Learning Algorithm for Dual Imbalance Text Classification Ling Zhuang, Honghua Dai, Xiaoshu Hang . . . . . . . . . . . . . . . . . . . . . . . .
39
Supervised Learning for Classification Hongyu Li, Wenbin Chen, I-Fan Shen . . . . . . . . . . . . . . . . . . . . . . . . . . . .
49
Feature Selection for Hyperspectral Data Classification Using Double Parallel Feedforward Neural Networks Mingyi He, Rui Huang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
58
Robust Nonlinear Dimension Reduction: A Self-organizing Approach Yuexian Hou, Liyue Yao, Pilian He . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
67
An Effective Feature Selection Scheme via Genetic Algorithm Using Mutual Information Chunkai K. Zhang, Hong Hu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
73
Pattern Recognition and Trend Analysis
Pattern Classification Using Rectified Nearest Feature Line Segment Hao Du, Yan Qiu Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
Palmprint Identification Algorithm Using Hu Invariant Moments Jin Soo Noh, Kang Hyeon Rhee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
91
Generalized Locally Nearest Neighbor Classifiers for Object Classification Wenming Zheng, Cairong Zou, Li Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . .
95
Nearest Neighbor Classification Using Cam Weighted Distance Chang Yin Zhou, Yan Qiu Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 A PPM Prediction Model Based on Web Objects’ Popularity Lei Shi, Zhimin Gu, Yunxia Pei, Lin Wei . . . . . . . . . . . . . . . . . . . . . . . . . 110 An On-line Sketch Recognition Algorithm for Composite Shape Zhan Ding, Yin Zhang, Wei Peng, Xiuzi Ye, Huaqiang Hu . . . . . . . . . . 120 Axial Representation of Character by Using Wavelet Transform Xinge You, Bin Fang, Yuan Yan Tang, Luoqing Li, Dan Zhang . . . . . . 130 Representing and Recognizing Scenario Patterns Jixin Ma, Bin Luo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 A Hybrid Artificial Intelligent-Based Criteria-Matching with Classification Algorithm Alex T.H. Sim, Vincent C.S. Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Auto-generation of Detection Rules with Tree Induction Algorithm Minsoo Kim, Jae-Hyun Seo, Il-Ahn Cheong, Bong-Nam Noh . . . . . . . . 160 Hand Gesture Recognition System Using Fuzzy Algorithm and RDBMS for Post PC Jung-Hyun Kim, Dong-Gyu Kim, Jeong-Hoon Shin, Sang-Won Lee, Kwang-Seok Hong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 An Ontology-Based Method for Project and Domain Expert Matching Jiangning Wu, Guangfei Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 Pattern Classification and Recognition of Movement Behavior of Medaka (Oryzias Latipes) Using Decision Tree Sengtai Lee, Jeehoon Kim, Jae-Yeon Baek, Man-Wi Han, Tae-Soo Chon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
A New Algorithm for Computing the Minimal Enclosing Sphere in Feature Space Chonghui Guo, Mingyu Lu, Jiantao Sun, Yuchang Lu . . . . . . . . . . . . . . 196 Y-AOI: Y-Means Based Attribute Oriented Induction Identifying Root Cause for IDSs Jungtae Kim, Gunhee Lee, Jung-taek Seo, Eung-ki Park, Choon-sik Park, Dong-kyoo Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 New Segmentation Algorithm for Individual Offline Handwritten Character Segmentation K.B.M.R. Batuwita, G.E.M.D.C. Bandara . . . . . . . . . . . . . . . . . . . . . . . . 215 A Method Based on the Continuous Spectrum Analysis for Fingerprint Image Ridge Distance Estimation Xiaosi Zhan, Zhaocai Sun, Yilong Yin, Yayun Chu . . . . . . . . . . . . . . . . . 230 A Method Based on the Markov Chain Monte Carlo for Fingerprint Image Segmentation Xiaosi Zhan, Zhaocai Sun, Yilong Yin, Yun Chen . . . . . . . . . . . . . . . . . . 240 Unsupervised Speaker Adaptation for Phonetic Transcription Based Voice Dialing Weon-Goo Kim, MinSeok Jang, Chin-Hui Lee . . . . . . . . . . . . . . . . . . . . . 249 A Phase-Field Based Segmentation Algorithm for Jacquard Images Using Multi-start Fuzzy Optimization Strategy Zhilin Feng, Jianwei Yin, Hui Zhang, Jinxiang Dong . . . . . . . . . . . . . . . 255 Dynamic Modeling, Prediction and Analysis of Cytotoxicity on Microelectronic Sensors Biao Huang, James Z. Xing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 Generalized Fuzzy Morphological Operators Tingquan Deng, Yanmei Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 Signature Verification Method Based on the Combination of Shape and Dynamic Feature Yingna Deng, Hong Zhu, Shu Li, Tao Wang . . . . . . . . . . . . . . . . . . . . . . 285 Study on the Matching Similarity Measure Method for Image Target Recognition Xiaogang Yang, Dong Miao, Fei Cao, Yongkang Ma . . . . . . . . . . . . . . . . 
289 3-D Head Pose Estimation for Monocular Image Yingjie Pan, Hong Zhu, Ruirui Ji . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
The Speech Recognition Based on the Bark Wavelet Front-End Processing Xueying Zhang, Zhiping Jiao, Zhefeng Zhao . . . . . . . . . . . . . . . . . . . . . . . 302 An Accurate and Fast Iris Location Method Based on the Features of Human Eyes Weiqi Yuan, Lu Xu, Zhonghua Lin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306 A Hybrid Classifier for Mass Classification with Different Kinds of Features in Mammography Ping Zhang, Kuldeep Kumar, Brijesh Verma . . . . . . . . . . . . . . . . . . . . . . 316 Data Mining Methods for Anomaly Detection of HTTP Request Exploitations Xiao-Feng Wang, Jing-Li Zhou, Sheng-Sheng Yu, Long-Zheng Cai . . . 320 Exploring Content-Based and Image-Based Features for Nude Image Detection Shi-lin Wang, Hong Hui, Sheng-hong Li, Hao Zhang, Yong-yu Shi, Wen-tao Qu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 Collision Recognition and Direction Changes Using Fuzzy Logic for Small Scale Fish Robots by Acceleration Sensor Data Seung Y. Na, Daejung Shin, Jin Y. Kim, Su-Il Choi . . . . . . . . . . . . . . . 329 Fault Diagnosis Approach Based on Qualitative Model of Signed Directed Graph and Reasoning Rules Bingshu Wang, Wenliang Cao, Liangyu Ma, Ji Zhang . . . . . . . . . . . . . . 339 Visual Tracking Algorithm for Laparoscopic Robot Surgery Min-Seok Kim, Jin-Seok Heo, Jung-Ju Lee . . . . . . . . . . . . . . . . . . . . . . . . 344 Toward a Sound Analysis System for Telemedicine Cong Phuong Nguyen, Thi Ngoc Yen Pham, Castelli Eric . . . . . . . . . . . 352
Other Topics in FSKD Methods
Structural Learning of Graphical Models and Its Applications to Traditional Chinese Medicine Ke Deng, Delin Liu, Shan Gao, Zhi Geng . . . . . . . . . . . . . . . . . . . . . . . . . 362 Study of Ensemble Strategies in Discovering Linear Causal Models Gang Li, Honghua Dai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
The Entropy of Relations and a New Approach for Decision Tree Learning Dan Hu, HongXing Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378 Effectively Extracting Rules from Trained Neural Networks Based on the New Measurement Method of the Classification Power of Attributes Dexian Zhang, Yang Liu, Ziqiang Wang . . . . . . . . . . . . . . . . . . . . . . . . . . 388 EDTs: Evidential Decision Trees Huawei Guo, Wenkang Shi, Feng Du . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398 GSMA: A Structural Matching Algorithm for Schema Matching in Data Warehousing Wei Cheng, Yufang Sun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408 A New Algorithm to Get the Correspondences from the Image Sequences Zhiquan Feng, Xiangxu Meng, Chenglei Yang . . . . . . . . . . . . . . . . . . . . . . 412 An Efficiently Algorithm Based on Itemsets-Lattice and Bitmap Index for Finding Frequent Itemsets Fuzan Chen, Minqiang Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420 Weighted Fuzzy Queries in Relational Databases Ying-Chao Zhang, Yi-Fei Chen, Xiao-ling Ye, Jie-Liang Zheng . . . . . . 430 Study of Multiuser Detection: The Support Vector Machine Approach Tao Yang, Bo Hu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Robust and Adaptive Backstepping Control for Nonlinear Systems Using Fuzzy Logic Systems Gang Chen, Shuqing Wang, Jianming Zhang . . . . . . . . . . . . . . . . . . . . . . 452 Online Mining Dynamic Web News Patterns Using Machine Learn Methods Jian-Wei Liu, Shou-Jian Yu, Jia-Jin Le . . . . . . . . . . . . . . . . . . . . . . . . . . 462 A New Fuzzy MCDM Method Based on Trapezoidal Fuzzy AHP and Hierarchical Fuzzy Integral Chao Zhang, Cun-bao Ma, Jia-dong Xu . . . . . . . . . . . . . . . . . . . . . . . . . . . 
466 Fast Granular Analysis Based on Watershed in Microscopic Mineral Images Danping Zou, Desheng Hu, Qizhen Liu . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Cost-Sensitive Ensemble of Support Vector Machines for Effective Detection of Microcalcification in Breast Cancer Diagnosis Yonghong Peng, Qian Huang, Ping Jiang, Jianmin Jiang . . . . . . . . . . . 483 High-Dimensional Shared Nearest Neighbor Clustering Algorithm Jian Yin, Xianli Fan, Yiqun Chen, Jiangtao Ren . . . . . . . . . . . . . . . . . . 494 A New Method for Fuzzy Group Decision Making Based on α-Level Cut and Similarity Jibin Lan, Liping He, Zhongxing Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 Modeling Nonlinear Systems: An Approach of Boosted Linguistic Models Keun-Chang Kwak, Witold Pedrycz, Myung-Geun Chun . . . . . . . . . . . . 514 Multi-criterion Fuzzy Optimization Approach to Imaging from Incomplete Projections Xin Gao, Shuqian Luo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524 Transductive Knowledge Based Fuzzy Inference System for Personalized Modeling Qun Song, Tianmin Ma, Nikola Kasabov . . . . . . . . . . . . . . . . . . . . . . . . . 528 A Sampling-Based Method for Mining Frequent Patterns from Databases Yen-Liang Chen, Chin-Yuan Ho . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536 Lagrange Problem in Fuzzy Reversed Posynomial Geometric Programming Bing-yuan Cao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546 Direct Candidates Generation: A Novel Algorithm for Discovering Complete Share-Frequent Itemsets Yu-Chiang Li, Jieh-Shan Yeh, Chin-Chen Chang . . . . . . . . . . . . . . . . . . . 551 A Three-Step Preprocessing Algorithm for Minimizing E-Mail Document’s Atypical Characteristics Ok-Ran Jeong, Dong-Sub Cho . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561 Failure Detection Method Based on Fuzzy Comprehensive Evaluation for Integrated Navigation System Guoliang Liu, Yingchun Zhang, Wenyi Qiang, Zengqi Sun . . . . . . . . . . 
567 Product Quality Improvement Analysis Using Data Mining: A Case Study in Ultra-Precision Manufacturing Industry Hailiang Huang, Dianliang Wu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
Two-Tier Based Intrusion Detection System Byung-Joo Kim, Il Kon Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 SuffixMiner: Efficiently Mining Frequent Itemsets in Data Streams by Suffix-Forest Lifeng Jia, Chunguang Zhou, Zhe Wang, Xiujuan Xu . . . . . . . . . . . . . . . 592 Improvement of Lee-Kim-Yoo’s Remote User Authentication Scheme Using Smart Cards Da-Zhi Sun, Zhen-Fu Cao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
Mining of Spatial, Textual, Image and Time-Series Data
Grapheme-to-Phoneme Conversion Based on a Fast TBL Algorithm in Mandarin TTS Systems Min Zheng, Qin Shi, Wei Zhang, Lianhong Cai . . . . . . . . . . . . . . . . . . . . 600 Clarity Ranking for Digital Images Shutao Li, Guangsheng Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610 Attribute Uncertainty in GIS Data Shuliang Wang, Wenzhong Shi, Hanning Yuan, Guoqing Chen . . . . . . . 614 Association Classification Based on Sample Weighting Jin Zhang, Xiaoyun Chen, Yi Chen, Yunfa Hu . . . . . . . . . . . . . . . . . . . . 624 Using Fuzzy Logic for Automatic Analysis of Astronomical Pipelines Lior Shamir, Robert J. Nemiroff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634 On the On-line Learning Algorithms for EEG Signal Classification in Brain Computer Interfaces Shiliang Sun, Changshui Zhang, Naijiang Lu . . . . . . . . . . . . . . . . . . . . . . 638 Automatic Keyphrase Extraction from Chinese News Documents Houfeng Wang, Sujian Li, Shiwen Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648 A New Model of Document Structure Analysis Zhiqi Wang, Yongcheng Wang, Kai Gao . . . . . . . . . . . . . . . . . . . . . . . . . . 658 Prediction for Silicon Content in Molten Iron Using a Combined Fuzzy-Associative-Rules Bank Shi-hua Luo, Xiang-guan Liu, Min Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . 667
An Investigation into the Use of Delay Coordinate Embedding Technique with MIMO ANFIS for Nonlinear Prediction of Chaotic Signals Jun Zhang, Weiwei Dai, Muhui Fan, Henry Chung, Zhi Wei, D. Bi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677 Replay Scene Based Sports Video Abstraction Jian-quan Ouyang, Jin-tao Li, Yong-dong Zhang . . . . . . . . . . . . . . . . . . . 689 Mapping Web Usage Patterns to MDP Model and Mining with Reinforcement Learning Yang Gao, Zongwei Luo, Ning Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698 Study on Wavelet-Based Fuzzy Multiscale Edge Detection Method Wen Zhu, Beiping Hou, Zhegen Zhang, Kening Zhou . . . . . . . . . . . . . . . 703 Sense Rank AALesk: A Semantic Solution for Word Sense Disambiguation Yiqun Chen, Jian Yin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710 Automatic Video Knowledge Mining for Summary Generation Based on Un-supervised Statistical Learning Jian Ling, Yiqun Lian, Yueting Zhuang . . . . . . . . . . . . . . . . . . . . . . . . . . . 718 A Model for Classification of Topological Relationships Between Two Spatial Objects Wu Yang, Ya Luo, Ping Guo, HuangFu Tao, Bo He . . . . . . . . . . . . . . . . 723 A New Feature of Uniformity of Image Texture Directions Coinciding with the Human Eyes Perception Xing-Jian He, Yue Zhang, Tat-Ming Lok, Michael R. Lyu . . . . . . . . . . . 727 Sunspot Time Series Prediction Using Parallel-Structure Fuzzy System Min-Soo Kim, Chan-Soo Chung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731 A Similarity Computing Algorithm for Volumetric Data Sets Tao Zhang, Wei Chen, Min Hu, Qunsheng Peng . . . . . . . . . . . . . . . . . . . 742 Extraction of Representative Keywords Considering Co-occurrence in Positive Documents Byeong-Man Kim, Qing Li, KwangHo Lee, Bo-Yeong Kang . . . . . . . . . 
752 On the Effective Similarity Measures for the Similarity-Based Pattern Retrieval in Multidimensional Sequence Databases Seok-Lyong Lee, Ju-Hong Lee, Seok-Ju Chun . . . . . . . . . . . . . . . . . . . . . . 762
Crossing the Language Barrier Using Fuzzy Logic Rowena Chau, Chung-Hsing Yeh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768 New Algorithm Mining Intrusion Patterns Wu Liu, Jian-Ping Wu, Hai-Xin Duan, Xing Li . . . . . . . . . . . . . . . . . . . 774 Dual Filtering Strategy for Chinese Term Extraction Xiaoming Chen, Xuening Li, Yi Hu, Ruzhan Lu . . . . . . . . . . . . . . . . . . . 778 White Blood Cell Segmentation and Classification in Microscopic Bone Marrow Images Nipon Theera-Umpon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787 KNN Based Evolutionary Techniques for Updating Query Cost Models Zhining Liao, Hui Wang, David Glass, Gongde Guo . . . . . . . . . . . . . . . . 797 A SVM Method for Web Page Categorization Based on Weight Adjustment and Boosting Mechanism Mingyu Lu, Chonghui Guo, Jiantao Sun, Yuchang Lu . . . . . . . . . . . . . . 801
Fuzzy Systems in Bioinformatics and Bio-medical Engineering
Feature Selection for Specific Antibody Deficiency Syndrome by Neural Network with Weighted Fuzzy Membership Functions Joon S. Lim, Tae W. Ryu, Ho J. Kim, Sudhir Gupta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811 Evaluation and Fuzzy Classification of Gene Finding Programs on Human Genome Sequences Atulya Nagar, Sujita Purushothaman, Hissam Tawfik . . . . . . . . . . . . . . . 821 Application of a Genetic Algorithm — Support Vector Machine Hybrid for Prediction of Clinical Phenotypes Based on Genome-Wide SNP Profiles of Sib Pairs Binsheng Gong, Zheng Guo, Jing Li, Guohua Zhu, Sali Lv, Shaoqi Rao, Xia Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830 A New Method for Gene Functional Prediction Based on Homologous Expression Profile Sali Lv, Qianghu Wang, Guangmei Zhang, Fengxia Wen, Zhenzhen Wang, Xia Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 836
Analysis of Sib-Pair IBD Profiles and Genomic Context for Identification of the Relevant Molecular Signatures for Alcoholism Chuanxing Li, Lei Du, Xia Li, Binsheng Gong, Jie Zhang, Shaoqi Rao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845 A Novel Ensemble Decision Tree Approach for Mining Genes Coding Ion Channels for Cardiopathy Subtype Jie Zhang, Xia Li, Wei Jiang, Yanqiu Wang, Chuanxing Li, Qiuju Wang, Shaoqi Rao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852 A Permutation-Based Genetic Algorithm for Predicting RNA Secondary Structure — A Practicable Approach Yongqiang Zhan, Maozu Guo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861 G Protein Binding Sites Analysis Fan Zhang, Zhicheng Liu, Xia Li, Shaoqi Rao . . . . . . . . . . . . . . . . . . . . . 865 A Novel Feature Ensemble Technology to Improve Prediction Performance of Multiple Heterogeneous Phenotypes Based on Microarray Data Haiyun Wang, Qingpu Zhang, Yadong Wang, Xia Li, Shaoqi Rao, Zuquan Ding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
Fuzzy Systems in Expert System and Informatics
Fuzzy Routing in QoS Networks Runtong Zhang, Xiaomin Zhu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880 Component Content Soft-Sensor Based on Adaptive Fuzzy System in Rare-Earth Countercurrent Extraction Process Hui Yang, Chonghui Song, Chunyan Yang, Tianyou Chai . . . . . . . . . . . 891 The Fuzzy-Logic-Based Reasoning Mechanism for Product Development Process Ying-Kui Gu, Hong-Zhong Huang, Wei-Dong Wu, Chun-Sheng Liu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897 Single Machine Scheduling Problem with Fuzzy Precedence Delays and Fuzzy Processing Times Yuan Xie, Jianying Xie, Jun Liu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 907 Fuzzy-Based Dynamic Bandwidth Allocation System Fang-Yie Leu, Shi-Jie Yan, Wen-Kui Chang . . . . . . . . . . . . . . . . . . . . . . 911
Self-localization of a Mobile Robot by Local Map Matching Using Fuzzy Logic Jinxia Yu, Zixing Cai, Xiaobing Zou, Zhuohua Duan . . . . . . . . . . . . . . . 921 Navigation of Mobile Robots in Unstructured Environment Using Grid Based Fuzzy Maps Özhan Karaman, Hakan Temeltaş . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 925 A Fuzzy Mixed Projects and Securities Portfolio Selection Model Yong Fang, K.K. Lai, Shou-Yang Wang . . . . . . . . . . . . . . . . . . . . . . . . . . 931 Contract Net Protocol Using Fuzzy Case Based Reasoning Wunan Wan, Xiaojing Wang, Yang Liu . . . . . . . . . . . . . . . . . . . . . . . . . . 941 A Fuzzy Approach for Equilibrium Programming with Simulated Annealing Algorithm Jie Su, Junpeng Yuan, Qiang Han, Jin Huang . . . . . . . . . . . . . . . . . . . . . 945 Image Processing Application with a TSK Fuzzy Model Perfecto Mariño, Vicente Pastoriza, Miguel Santamaría, Emilio Martínez . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950 A Fuzzy Dead Reckoning Algorithm for Distributed Interactive Applications Ling Chen, Gencai Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961 Intelligent Automated Negotiation Mechanism Based on Fuzzy Method Hong Zhang, Yuhui Qiu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 972 Congestion Control in Differentiated Services Networks by Means of Fuzzy Logic Morteza Mosavi, Mehdi Galily . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
Fuzzy Systems in Pattern Recognition and Diagnostics
Fault Diagnosis System Based on Rough Set Theory and Support Vector Machine Yitian Xu, Laisheng Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 980 A Fuzzy Framework for Flashover Monitoring Chang-Gun Um, Chang-Gi Jung, Byung-Gil Han, Young-Chul Song, Doo-Hyun Choi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 989
Feature Recognition Technique from 2D Ship Drawings Using Fuzzy Inference System Deok-Eun Kim, Sung-Chul Shin, Soo-Young Kim . . . . . . . . . . . . . . . . . . 994 Transmission Relay Method for Balanced Energy Depletion in Wireless Sensor Networks Using Fuzzy Logic Seung-Beom Baeg, Tae-Ho Cho . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 998 Validation and Comparison of Microscopic Car-Following Models Using Beijing Traffic Flow Data Dewang Chen, Yueming Yuan, Baiheng Li, Jianping Wu . . . . . . . . . . . . 1008 Apply Fuzzy-Logic-Based Functional-Center Hierarchies as Inference Engines for Self-learning Manufacture Process Diagnoses Yu-Shu Hu, Mohammad Modarres . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1012 Fuzzy Spatial Location Model and Its Application in Spatial Query Yongjian Yang, Chunling Cao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1022 Segmentation of Multimodality Osteosarcoma MRI with Vectorial Fuzzy-Connectedness Theory Jing Ma, Minglu Li, Yongqiang Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1027
Knowledge Discovery in Bioinformatics and Bio-medical Engineering
A Global Optimization Algorithm for Protein Folds Prediction in 3D Space Xiaoguang Liu, Gang Wang, Jing Liu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1031 Classification Analysis of SAGE Data Using Maximum Entropy Model Jin Xin, Rongfang Bie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037 DNA Sequence Identification by Statistics-Based Models Jitimon Keinduangjun, Punpiti Piamsa-nga, Yong Poovorawan . . . . . . 1041 A New Method to Mine Gene Regulation Relationship Information De Pan, Fei Wang, Jiankui Guo, Jianhua Ding . . . . . . . . . . . . . . . . . . . . 1051
Knowledge Discovery in Expert System and Informatics

Shot Transition Detection by Compensating for Global and Local Motions
    Seok-Woo Jang, Gye-Young Kim, Hyung-Il Choi
Hybrid Methods for Stock Index Modeling
    Yuehui Chen, Ajith Abraham, Ju Yang, Bo Yang
Designing an Intelligent Web Information System of Government Based on Web Mining
    Gye Hang Hong, Jang Hee Lee
Automatic Segmentation and Diagnosis of Breast Lesions Using Morphology Method Based on Ultrasound
    In-Sung Jung, Devinder Thapa, Gi-Nam Wang
Composition of Web Services Using Ontology with Monotonic Inheritance
    Changyun Li, Beishui Liao, Aimin Yang, Lijun Liao
Ontology-DTD Matching Algorithm for Efficient XML Query
    Myung Sook Kim, Yong Hae Kong
An Approach to Web Service Discovery Based on the Semantics
    Jing Fan, Bo Ren, Li-Rong Xiong
Non-deterministic Event Correlation Based on C-F Model
    Qiuhua Zheng, Yuntao Qian, Min Yao
Flexible Goal Recognition via Graph Construction and Analysis
    Minghao Yin, Wenxiang Gu, Yinghua Lu
An Implementation for Mapping SBML to BioSPI
    Zhupeng Dong, Xiaoju Dong, Xian Xu, Yuxi Fu, Zhizhou Zhang, Lin He
Knowledge-Based Faults Diagnosis System for Wastewater Treatment
    Jang-Hwan Park, Byong-Hee Jun, Myung-Geun Chun
Study on Intelligent Information Integration of Knowledge Portals
    Yongjin Zhang, Hongqi Chen, Jiancang Xie
The Risk Identification and Assessment in E-Business Development
    Lin Wang, Yurong Zeng
A Novel Wavelet Transform Based on Polar Coordinates for Datamining Applications
    Seonggoo Kang, Sangjun Lee, Sukho Lee
Impact on the Writing Granularity for Incremental Checkpointing
    Junyoung Heo, Xuefeng Piao, Sangho Yi, Geunyoung Park, Minkyu Park, Jiman Hong, Yookun Cho
Using Feedback Cycle for Developing an Adjustable Security Design Metric
    Charlie Y. Shim, Jung Y. Kim, Sung Y. Shin, Jiman Hong
w-LLC: Weighted Low-Energy Localized Clustering for Embedded Networked Sensors
    Joongheon Kim, Wonjun Lee, Eunkyo Kim, Choonhwa Lee
Energy Efficient Dynamic Cluster Based Clock Synchronization for Wireless Sensor Network
    Md. Mamun-Or-Rashid, Choong Seon Hong, Jinsung Cho
An Intelligent Power Management Scheme for Wireless Embedded Systems Using Channel State Feedbacks
    Hyukjun Oh, Jiman Hong, Heejune Ahn
Analyze and Guess Type of Piece in the Computer Game Intelligent System
    Z.Y. Xia, Y.A. Hu, J. Wang, Y.C. Jiang, X.L. Qin
Large-Scale Ensemble Decision Analysis of Sib-Pair IBD Profiles for Identification of the Relevant Molecular Signatures for Alcoholism
    Xia Li, Shaoqi Rao, Wei Zhang, Guo Zheng, Wei Jiang, Lei Du
A Novel Visualization Classifier and Its Applications
    Jie Li, Xiang Long Tang, Xia Li
Active Information Gathering on the Web

Automatic Creation of Links: An Approach Based on Decision Tree
    Peng Li, Seiji Yamada
Extraction of Structural Information from the Web
    Tsuyoshi Murata
Blog Search with Keyword Map-Based Relevance Feedback
    Yasufumi Takama, Tomoki Kajinami, Akio Matsumura
An One Class Classification Approach to Non-relevance Feedback Document Retrieval
    Takashi Onoda, Hiroshi Murata, Seiji Yamada
Automated Knowledge Extraction from Internet for a Crisis Communication Portal
    Ong Sing Goh, Chun Che Fung
Neural and Fuzzy Computation in Cognitive Computer Vision

Probabilistic Principal Surface Classifier
    Kuiyu Chang, Joydeep Ghosh
Probabilistic Based Recursive Model for Face Recognition
    Siu-Yeung Cho, Jia-Jun Wong
Performance Characterization in Computer Vision: The Role of Visual Cognition Theory
    Aimin Wu, De Xu, Xu Yang, Jianhui Zheng
Generic Solution for Image Object Recognition Based on Vision Cognition Theory
    Aimin Wu, De Xu, Xu Yang, Jianhui Zheng
Cognition Theory Motivated Image Semantics and Image Language
    Aimin Wu, De Xu, Xu Yang, Jianhui Zheng
Neuro-Fuzzy Inference System to Learn Expert Decision: Between Performance and Intelligibility
    Laurence Cornez, Manuel Samuelides, Jean-Denis Muller
Fuzzy Patterns in Multi-level of Satisfaction for MCDM Model Using Modified Smooth S-Curve MF
    Pandian Vasant, A. Bhattacharya, N.N. Barsoum

Author Index
On Fuzzy Inclusion in the Interval-Valued Sense

Jin Han Park (1), Jong Seo Park (2), and Young Chel Kwun (3)

(1) Division of Math. Sci., Pukyong National University, Pusan 608-737, South Korea
[email protected]
(2) Department of Math. Education, Chinju National University of Education, Chinju 660-756, South Korea
[email protected]
(3) Department of Mathematics, Dong-A University, Pusan 604-714, South Korea
[email protected]
Abstract. As a generalization of fuzzy sets, the concept of interval-valued fuzzy sets was introduced by Gorzalczany [Fuzzy Sets and Systems 21 (1987) 1]. In this paper, we extend the concept of “fuzzy inclusion”, introduced by Šostak [Supp. Rend. Circ. Mat. Palermo (Ser. II) 11 (1985) 89], to the interval-valued fuzzy setting and study its fundamental properties to some extent.
1 Introduction
After Zadeh [10] introduced the concept of fuzzy sets, several researchers studied generalizations of the notion, e.g. fuzzy sets of type n [11], intuitionistic fuzzy sets [1,2] and interval-valued fuzzy sets [4]. The concept of interval-valued fuzzy sets was introduced by Gorzalczany [4], and recently there has been progress in the study of such sets by Mondal and Samanta [6] and Ramakrishnan and Nayagam [7]. On the other hand, Šostak [8] defined fuzzy inclusion between two fuzzy sets A and B in order to measure the degree to which one is included in the other, and applied this notion to the fuzzy topological spaces he had defined. Fuzzy inclusion in intuitionistic fuzzy sets was defined and studied by Çoker and Demirci [3]. In this paper, we define and study fuzzy inclusion in interval-valued fuzzy sets and then apply this inclusion to interval-valued fuzzy topological spaces in Šostak’s sense.
2 Preliminaries
First we shall present the fundamental definitions given by Gorzalczany [4]:

Definition 1. [4] Let $X$ be a nonempty fixed set. An interval-valued fuzzy set (IVF set, for short) $A$ on $X$ is an object having the form
\[ A = \{(x, [\mu_A^L(x), \mu_A^U(x)]) : x \in X\} \]
where $\mu_A^L : X \to [0,1]$ and $\mu_A^U : X \to [0,1]$ are functions satisfying $\mu_A^L(x) \le \mu_A^U(x)$ for each $x \in X$.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1–10, 2005. © Springer-Verlag Berlin Heidelberg 2005
Let $D$ be the set of all closed subintervals of the unit interval $[0,1]$ and consider singletons $\{a\}$ in $[0,1]$ as closed subintervals of the form $[a,a]$. An IVF set $A = \{(x, [\mu_A^L(x), \mu_A^U(x)]) : x \in X\}$ in $X$ can be identified with an element of $D^X$. Thus for each $x \in X$, $A(x)$ is a closed interval whose lower and upper end points are $\mu_A^L(x)$ and $\mu_A^U(x)$, respectively. Obviously, every fuzzy set $A = \{\mu_A(x) : x \in X\}$ on $X$ is an IVF set of the form $A = \{(x, [\mu_A(x), \mu_A(x)]) : x \in X\}$.

Let $X_0$ be a subset of $X$. For any interval $[a,b] \in D$, the IVF set whose value is the interval $[a,b]$ for $x \in X_0$ and $[0,0]$ for $x \in X \setminus X_0$ is denoted by $\widetilde{[a,b]}_{X_0}$. In particular, if $a = b$ the IVF set $\widetilde{[a,b]}_{X_0}$ is denoted simply by $\tilde{a}_{X_0}$. The IVF set $\widetilde{[a,b]}_X$ (resp. $\tilde{a}_X$) is denoted simply by $\widetilde{[a,b]}$ (resp. $\tilde{a}$). For the sake of simplicity, we shall often use the symbol $A = [\mu_A^L, \mu_A^U]$ for the IVF set $A = \{(x, [\mu_A^L(x), \mu_A^U(x)]) : x \in X\}$.

Definition 2. [4] Let $A$ and $B$ be IVF sets on $X$. Then
(a) $A \subseteq B$ iff $\mu_A^L(x) \le \mu_B^L(x)$ and $\mu_A^U(x) \le \mu_B^U(x)$ for all $x \in X$;
(b) $A = B$ iff $A \subseteq B$ and $B \subseteq A$;
(c) The complement $A^c$ of $A$ is defined by $A^c(x) = [1 - \mu_A^U(x), 1 - \mu_A^L(x)]$ for all $x \in X$;
(d) If $\{A_i : i \in J\}$ is an arbitrary family of IVF sets on $X$, then
\[ \Big(\bigcap_{i \in J} A_i\Big)(x) = \Big[\inf_{i \in J} \mu_{A_i}^L(x), \ \inf_{i \in J} \mu_{A_i}^U(x)\Big], \qquad \Big(\bigcup_{i \in J} A_i\Big)(x) = \Big[\sup_{i \in J} \mu_{A_i}^L(x), \ \sup_{i \in J} \mu_{A_i}^U(x)\Big]. \]
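The operations of Definition 2 can be illustrated on a finite universe. The following is a minimal sketch, assuming an IVF set is stored as a Python dictionary from points to (lower, upper) pairs, and taking the intersection as the pointwise infimum and the union as the pointwise supremum of the end points (the usual lattice convention). The function names are illustrative and do not come from the paper.

```python
# Minimal sketch: an IVF set on a finite X is a dict mapping x -> (lower, upper).
# Function names are illustrative assumptions, not notation from the paper.

def is_subset(A, B):
    """A is contained in B iff both end points of A(x) are <= those of B(x)."""
    return all(A[x][0] <= B[x][0] and A[x][1] <= B[x][1] for x in A)

def complement(A):
    """A^c(x) = [1 - upper end point, 1 - lower end point]."""
    return {x: (1 - hi, 1 - lo) for x, (lo, hi) in A.items()}

def intersection(*sets):
    """Pointwise infimum of the lower and of the upper end points."""
    return {x: (min(S[x][0] for S in sets), min(S[x][1] for S in sets))
            for x in sets[0]}

def union(*sets):
    """Pointwise supremum of the lower and of the upper end points."""
    return {x: (max(S[x][0] for S in sets), max(S[x][1] for S in sets))
            for x in sets[0]}

A = {"a": (0.25, 0.5), "b": (0.5, 0.75)}
B = {"a": (0.5, 0.75), "b": (0.5, 1.0)}

print(is_subset(A, B))
print(complement(A))
print(intersection(A, B) == A, union(A, B) == B)
```

Because A is contained in B in this example, the intersection returns A and the union returns B, which is a quick sanity check on the convention chosen above.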
Definition 3. Let $X$ and $Y$ be two nonempty sets and $f : X \to Y$ be a function. Let $A$ and $B$ be IVF sets on $X$ and $Y$ respectively.
(a) The inverse image $f^{-1}(B)$ of $B$ under $f$ is the IVF set on $X$ defined by $f^{-1}(B)(x) = [\mu_B^L(f(x)), \mu_B^U(f(x))]$ for all $x \in X$.
(b) The image $f(A)$ of $A$ under $f$ is the IVF set on $Y$ defined by $f(A) = [\mu_{f(A)}^L, \mu_{f(A)}^U]$, where
\[ \mu_{f(A)}^L(y) = \begin{cases} \sup_{x \in f^{-1}(y)} \mu_A^L(x) & \text{if } f^{-1}(y) \neq \emptyset, \\ 0 & \text{otherwise,} \end{cases} \qquad \mu_{f(A)}^U(y) = \begin{cases} \sup_{x \in f^{-1}(y)} \mu_A^U(x) & \text{if } f^{-1}(y) \neq \emptyset, \\ 0 & \text{otherwise,} \end{cases} \]
for each $y \in Y$.

Now we list the properties of images and preimages, some of which we shall frequently use in Sections 3 and 4.

Theorem 1. Let $A$ and $A_i$ $(i \in J)$ be IVF sets on $X$, $B$ and $B_i$ $(i \in J)$ be IVF sets on $Y$, and $f : X \to Y$ be a function. Then:
(a) If $A_1 \subseteq A_2$, then $f(A_1) \subseteq f(A_2)$.
(b) If $B_1 \subseteq B_2$, then $f^{-1}(B_1) \subseteq f^{-1}(B_2)$.
(c) $A \subseteq f^{-1}(f(A))$. If, furthermore, $f$ is injective, then $A = f^{-1}(f(A))$.
(d) $f(f^{-1}(B)) \subseteq B$. If, furthermore, $f$ is surjective, then $f(f^{-1}(B)) = B$.
(e) $f^{-1}(\bigcup B_i) = \bigcup f^{-1}(B_i)$, $f^{-1}(\bigcap B_i) = \bigcap f^{-1}(B_i)$.
(f) $f(\bigcup A_i) = \bigcup f(A_i)$, $f(\bigcap A_i) \subseteq \bigcap f(A_i)$. If, furthermore, $f$ is injective, then $f(\bigcap A_i) = \bigcap f(A_i)$.
(g) $f^{-1}(\tilde{1}) = \tilde{1}$, $f^{-1}(\tilde{0}) = \tilde{0}$.
(h) $f(\tilde{0}) = \tilde{0}$, and $f(\tilde{1}) = \tilde{1}$ if $f$ is surjective.
(i) $f^{-1}(B)^c = f^{-1}(B^c)$, and $f(A)^c \subseteq f(A^c)$ if $f$ is surjective.
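To make the image and inverse image concrete, here is a minimal sketch on finite sets, assuming IVF sets are dictionaries from points to (lower, upper) pairs and the function f is a dictionary from X to Y. All names are illustrative assumptions. The example uses a map that is neither injective nor surjective, so the inclusions in Theorem 1 (c) and (d) are strict.

```python
# Sketch of Definition 3 on finite sets; names and data are illustrative.

def preimage(f, B):
    """f^{-1}(B)(x) = B(f(x))."""
    return {x: B[y] for x, y in f.items()}

def image(f, A, Y):
    """f(A)(y): supremum of A over the fibre f^{-1}(y), or [0, 0] if the fibre is empty."""
    out = {}
    for y in Y:
        fibre = [x for x in A if f[x] == y]
        if fibre:
            out[y] = (max(A[x][0] for x in fibre), max(A[x][1] for x in fibre))
        else:
            out[y] = (0.0, 0.0)
    return out

def is_subset(A, B):
    return all(A[k][0] <= B[k][0] and A[k][1] <= B[k][1] for k in A)

Y = ["y1", "y2", "y3"]
f = {"x1": "y1", "x2": "y1", "x3": "y2"}   # not injective, not surjective
A = {"x1": (0.25, 0.5), "x2": (0.5, 0.75), "x3": (0.0, 0.25)}
B = {"y1": (0.25, 0.5), "y2": (0.5, 1.0), "y3": (0.75, 1.0)}

fA = image(f, A, Y)
print(fA)
print(is_subset(A, preimage(f, fA)))           # Theorem 1 (c)
print(is_subset(image(f, preimage(f, B), Y), B))  # Theorem 1 (d)
```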
3 Fuzzy Inclusion in the Interval-Valued Sense
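In this section, we shall extend the concept of “fuzzy inclusion” [8,9] to the interval-valued fuzzy setting:

Definition 4. Let $X$ be a nonempty set. Then the right fuzzy inclusion, denoted by $\tilde{\subset}$, is the IVF set on $D^X \times D^X$ defined by
\[ \mu_{\tilde{\subset}}^L(A, B) = \inf\{((1 - \mu_A^U) \vee \mu_B^L)(x) : x \in X\}, \qquad \mu_{\tilde{\subset}}^U(A, B) = \inf\{((1 - \mu_A^L) \vee \mu_B^U)(x) : x \in X\} \]
for each $A, B \in D^X$. Here $\mu_{\tilde{\subset}}^L(A, B)$ denotes the lower limit of inclusion of $A$ in $B$, while $\mu_{\tilde{\subset}}^U(A, B)$ denotes the upper limit of inclusion of $A$ in $B$.

Remark 1. Since $1 - \mu_A^U \le 1 - \mu_A^L$, we may deduce the following:
\[ ((1 - \mu_A^U) \vee \mu_B^L)(x) \le ((1 - \mu_A^L) \vee \mu_B^U)(x) \quad \text{for each } x \in X \]
\[ \Rightarrow \ \inf_{x \in X} ((1 - \mu_A^U) \vee \mu_B^L)(x) \le \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_B^U)(x) \]
\[ \Rightarrow \ \mu_{\tilde{\subset}}^L(A, B) \le \mu_{\tilde{\subset}}^U(A, B). \]
Therefore, for each $A, B \in D^X$, $[\mu_{\tilde{\subset}}^L(A, B), \mu_{\tilde{\subset}}^U(A, B)]$ is a closed interval.

Definition 5. For any two IVF sets $A, B \in D^X$, the closed interval $[\mu_{\tilde{\subset}}^L(A, B), \mu_{\tilde{\subset}}^U(A, B)]$ will be denoted by $[A \,\tilde{\subset}\, B]$, i.e.,
\[ [A \,\tilde{\subset}\, B] = [\mu_{\tilde{\subset}}^L(A, B), \mu_{\tilde{\subset}}^U(A, B)]. \]

Remark 2. The closed interval $[A \,\tilde{\subset}\, B]$ shows “to what extent the IVF set $A$ is contained in the IVF set $B$” (cf. [3,9]). If $A$ and $B$ are crisp IVF sets on $X$ given by $A = \tilde{1}_C$ and $B = \tilde{1}_D$, where $C$ and $D$ are nonempty subsets of $X$, then $[A \,\tilde{\subset}\, B] = \tilde{1}$ iff $C \subseteq D$, and $[A \,\tilde{\subset}\, B] = \tilde{0}$ otherwise.

Similar to the concept of right fuzzy inclusion, we can easily define the left fuzzy inclusion $\tilde{\supset}$ as follows:
\[ \mu_{\tilde{\supset}}^L(A, B) = \mu_{\tilde{\subset}}^L(B, A), \qquad \mu_{\tilde{\supset}}^U(A, B) = \mu_{\tilde{\subset}}^U(B, A), \qquad [A \,\tilde{\supset}\, B] = [\mu_{\tilde{\supset}}^L(A, B), \mu_{\tilde{\supset}}^U(A, B)]. \]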
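As an illustration of Definitions 4 and 5, the following minimal sketch computes the right fuzzy inclusion interval on a finite universe, where the infimum becomes a minimum. It assumes IVF sets are dictionaries from points to (lower, upper) pairs; the function name and the sample data are hypothetical. The last two lines reproduce the crisp case of Remark 2.

```python
# Sketch of the right fuzzy inclusion on a finite universe; names are illustrative.

def right_inclusion(A, B):
    """Return (lower, upper) limits of inclusion of A in B."""
    lower = min(max(1 - A[x][1], B[x][0]) for x in A)   # inf of (1 - mu_A^U) v mu_B^L
    upper = min(max(1 - A[x][0], B[x][1]) for x in A)   # inf of (1 - mu_A^L) v mu_B^U
    return (lower, upper)

A = {"a": (0.25, 0.5), "b": (0.5, 0.75)}
B = {"a": (0.5, 0.75), "b": (0.25, 0.5)}
print(right_inclusion(A, B))    # inclusion of A in B
print(right_inclusion(B, A))    # left inclusion of A w.r.t. B, by definition

# Crisp sets 1_C and 1_D with C = {a} and D = {a, b}
one_C = {"a": (1.0, 1.0), "b": (0.0, 0.0)}
one_D = {"a": (1.0, 1.0), "b": (1.0, 1.0)}
print(right_inclusion(one_C, one_D))   # C is a subset of D
print(right_inclusion(one_D, one_C))   # D is not a subset of C
```

In both directions the lower limit never exceeds the upper limit, in line with Remark 1.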
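Definition 6. For any two IVF sets $A, B \in D^X$, the interval-valued fuzzy equality of $A$ to $B$ is defined as follows:
\[ [A \,\tilde{=}\, B] = [A \,\tilde{\subset}\, B] \wedge [A \,\tilde{\supset}\, B]. \]
The interval values $\mu_{\tilde{=}}^L(A, B) = \mu_{\tilde{\subset}}^L(A, B) \wedge \mu_{\tilde{\supset}}^L(A, B)$ and $\mu_{\tilde{=}}^U(A, B) = \mu_{\tilde{\subset}}^U(A, B) \vee \mu_{\tilde{\supset}}^U(A, B)$, respectively, denote the lower limit of equality and the upper limit of equality of the IVF set $A$ to the IVF set $B$.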
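Now we present some of the basic properties of interval-valued fuzzy inclusion:

Theorem 2. For IVF sets $A$, $B$, $C$ and $D$ on $X$, the following properties hold:
(a) If $A \subseteq B$ and $D \subseteq C$, then $[A \,\tilde{\subset}\, C] \ge [B \,\tilde{\subset}\, D]$.
(b) $[A^c \,\tilde{\subset}\, B^c] = [B \,\tilde{\subset}\, A]$.
(c) $[A \cup B \,\tilde{\subset}\, C \cap D] \le [A \,\tilde{\subset}\, C] \wedge [B \,\tilde{\subset}\, D]$.
(d) $[A \,\tilde{\subset}\, C] \vee [B \,\tilde{\subset}\, D] \le [A \cap B \,\tilde{\subset}\, C \cup D]$.
(e) $[A \cap B^c \,\tilde{\subset}\, \tilde{0}] = [A \,\tilde{\subset}\, B] = [\tilde{1} \,\tilde{\subset}\, A^c \cup B]$.
(f) If $\{B_i \in D^X : i \in J\}$, then $\bigwedge_{i \in J} [A \,\tilde{\subset}\, B_i] = [A \,\tilde{\subset}\, \bigcap_{i \in J} B_i]$ and $\bigwedge_{i \in J} [B_i \,\tilde{\subset}\, A] = [\bigcup_{i \in J} B_i \,\tilde{\subset}\, A]$.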
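Proof. (a) Let $A \subseteq B$ and $D \subseteq C$. Since $\mu_A^L \le \mu_B^L$, $\mu_A^U \le \mu_B^U$, $\mu_D^L \le \mu_C^L$ and $\mu_D^U \le \mu_C^U$, we obtain
\[ [A \,\tilde{\subset}\, C] = \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \mu_C^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_C^U)(x)\Big] \ge \Big[\inf_{x \in X} ((1 - \mu_B^U) \vee \mu_D^L)(x), \ \inf_{x \in X} ((1 - \mu_B^L) \vee \mu_D^U)(x)\Big] = [B \,\tilde{\subset}\, D]. \]

(b)
\[ [B \,\tilde{\subset}\, A] = \Big[\inf_{x \in X} ((1 - \mu_B^U) \vee \mu_A^L)(x), \ \inf_{x \in X} ((1 - \mu_B^L) \vee \mu_A^U)(x)\Big] \]
\[ = \Big[\inf_{x \in X} ((1 - (1 - \mu_A^L)) \vee (1 - \mu_B^U))(x), \ \inf_{x \in X} ((1 - (1 - \mu_A^U)) \vee (1 - \mu_B^L))(x)\Big] = [A^c \,\tilde{\subset}\, B^c]. \]

(c) By (a), $[A \cup B \,\tilde{\subset}\, C \cap D] \le [A \,\tilde{\subset}\, C]$ and $[A \cup B \,\tilde{\subset}\, C \cap D] \le [B \,\tilde{\subset}\, D]$, and hence $[A \cup B \,\tilde{\subset}\, C \cap D] \le [A \,\tilde{\subset}\, C] \wedge [B \,\tilde{\subset}\, D]$.

(d) Similar to (c).

(e)
\[ [A \,\tilde{\subset}\, B] = \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \mu_B^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_B^U)(x)\Big] \]
\[ = \Big[\inf_{x \in X} (1 - (\mu_A^U \wedge (1 - \mu_B^L)))(x), \ \inf_{x \in X} (1 - (\mu_A^L \wedge (1 - \mu_B^U)))(x)\Big] \]
\[ = \Big[\inf_{x \in X} ((1 - \mu_{A \cap B^c}^U) \vee 0)(x), \ \inf_{x \in X} ((1 - \mu_{A \cap B^c}^L) \vee 0)(x)\Big] = [A \cap B^c \,\tilde{\subset}\, \tilde{0}] \]
and
\[ [A \,\tilde{\subset}\, B] = \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \mu_B^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_B^U)(x)\Big] = \Big[\inf_{x \in X} (0 \vee \mu_{A^c \cup B}^L)(x), \ \inf_{x \in X} (0 \vee \mu_{A^c \cup B}^U)(x)\Big] = [\tilde{1} \,\tilde{\subset}\, A^c \cup B]. \]

(f)
\[ [A \,\tilde{\subset}\, \textstyle\bigcap_i B_i] = \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \mu_{\cap B_i}^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_{\cap B_i}^U)(x)\Big] \]
\[ = \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \textstyle\bigwedge_i \mu_{B_i}^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \textstyle\bigwedge_i \mu_{B_i}^U)(x)\Big] \]
\[ = \Big[\inf_{x \in X} \inf_i ((1 - \mu_A^U) \vee \mu_{B_i}^L)(x), \ \inf_{x \in X} \inf_i ((1 - \mu_A^L) \vee \mu_{B_i}^U)(x)\Big] \]
\[ = \textstyle\bigwedge_i \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \mu_{B_i}^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_{B_i}^U)(x)\Big] = \textstyle\bigwedge_i [A \,\tilde{\subset}\, B_i] \]
and by (b) we obtain
\[ [\textstyle\bigcup_i B_i \,\tilde{\subset}\, A] = [A^c \,\tilde{\subset}\, (\textstyle\bigcup_i B_i)^c] = [A^c \,\tilde{\subset}\, \textstyle\bigcap_i (B_i)^c] = \textstyle\bigwedge_i [A^c \,\tilde{\subset}\, (B_i)^c] = \textstyle\bigwedge_i [B_i \,\tilde{\subset}\, A]. \]

Now we list the properties of interval-valued fuzzy inclusion related to images and preimages:

Theorem 3. Let $A$, $B$ be IVF sets on $X$ and $C$, $D$ be IVF sets on $Y$. For a function $f : X \to Y$, the following properties hold:
(a) $[A \,\tilde{\subset}\, B] \le [f(A) \,\tilde{\subset}\, f(B)]$. Furthermore, $[A \,\tilde{\subset}\, B] = [f(A) \,\tilde{\subset}\, f(B)]$ if $f$ is injective.
(b) $[C \,\tilde{\subset}\, D] \le [f^{-1}(C) \,\tilde{\subset}\, f^{-1}(D)]$. Furthermore, $[C \,\tilde{\subset}\, D] = [f^{-1}(C) \,\tilde{\subset}\, f^{-1}(D)]$ if $f$ is surjective.
(c) $[A \,\tilde{\subset}\, f^{-1}(f(A))] \le [f(A) \,\tilde{\subset}\, f(A)]$, $[f^{-1}(f(A)) \,\tilde{\subset}\, A] \le [A \,\tilde{\subset}\, A]$, $[f(f^{-1}(C)) \,\tilde{\subset}\, C] \le [f^{-1}(C) \,\tilde{\subset}\, f^{-1}(C)]$ and $[C \,\tilde{\subset}\, f(f^{-1}(C))] \le [C \,\tilde{\subset}\, C]$.
(d) $[f(A) \,\tilde{\subset}\, C] = [A \,\tilde{\subset}\, f^{-1}(C)]$.
(e) If $\{C_i : i \in J\} \subseteq D^Y$, then $[f(A) \,\tilde{\subset}\, \bigcup_i C_i] = [A \,\tilde{\subset}\, \bigcup_i f^{-1}(C_i)]$.
(f) If $\{A_i : i \in J\} \subseteq D^X$, then $[f(\bigcap_i A_i) \,\tilde{\subset}\, C] = [\bigcap_i A_i \,\tilde{\subset}\, f^{-1}(C)]$.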
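Proof. (a) By Definition 3 (b), we obtain
\[ \inf_{y \in Y} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y) \ge \inf_{x \in X} ((1 - \mu_A^U) \vee \mu_B^L)(x) \quad \text{and} \quad \inf_{y \in Y} ((1 - \mu_{f(A)}^L) \vee \mu_{f(B)}^U)(y) \ge \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_B^U)(x), \]
and hence
\[ [f(A) \,\tilde{\subset}\, f(B)] = [\mu_{\tilde{\subset}}^L(f(A), f(B)), \ \mu_{\tilde{\subset}}^U(f(A), f(B))] \]
\[ = \Big[\inf_{y \in Y} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y), \ \inf_{y \in Y} ((1 - \mu_{f(A)}^L) \vee \mu_{f(B)}^U)(y)\Big] \]
\[ \ge \Big[\inf_{x \in X} ((1 - \mu_A^U) \vee \mu_B^L)(x), \ \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_B^U)(x)\Big] = [A \,\tilde{\subset}\, B]. \]
Now we prove the equality in the case that $f$ is injective.
\[ \inf_{y \in Y} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y) = \inf_{y \in f(X)} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y) \wedge \inf_{y \notin f(X)} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y) \]
\[ = \inf_{y \in f(X)} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y) \wedge 1 = \inf_{y \in f(X)} ((1 - \mu_{f(A)}^U) \vee \mu_{f(B)}^L)(y) \]
\[ = \inf_{y \in f(X)} \Big(\big(1 - \sup_{x \in f^{-1}(y)} \mu_A^U(x)\big) \vee \sup_{x \in f^{-1}(y)} \mu_B^L(x)\Big) = \inf_{x \in X} ((1 - \mu_A^U) \vee \mu_B^L)(x). \]
Similarly, we have
\[ \inf_{y \in Y} ((1 - \mu_{f(A)}^L) \vee \mu_{f(B)}^U)(y) = \inf_{x \in X} ((1 - \mu_A^L) \vee \mu_B^U)(x). \]
Hence, from the two equalities above, we obtain $[A \,\tilde{\subset}\, B] = [f(A) \,\tilde{\subset}\, f(B)]$.

(b) Similar to (a).

(c) From (a) and (b), the required inequalities can be easily obtained.

(d) By (b) and Theorem 1 (c), we have $[f(A) \,\tilde{\subset}\, C] \le [f^{-1}(f(A)) \,\tilde{\subset}\, f^{-1}(C)] \le [A \,\tilde{\subset}\, f^{-1}(C)]$. Similarly, by (a) and Theorem 1 (d), $[A \,\tilde{\subset}\, f^{-1}(C)] \le [f(A) \,\tilde{\subset}\, f(f^{-1}(C))] \le [f(A) \,\tilde{\subset}\, C]$. Hence $[f(A) \,\tilde{\subset}\, C] = [A \,\tilde{\subset}\, f^{-1}(C)]$.

(e) By (d) and Theorem 1 (e), we have $[f(A) \,\tilde{\subset}\, \bigcup_i C_i] = [A \,\tilde{\subset}\, f^{-1}(\bigcup_i C_i)] = [A \,\tilde{\subset}\, \bigcup_i f^{-1}(C_i)]$.

(f) Similar to (e).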
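Some of the identities above can be spot-checked numerically on a small finite example. The sketch below is only a sanity check under stated assumptions, not a proof: IVF sets are dictionaries of (lower, upper) pairs, the function f is a dictionary, and all helper names and sample values are hypothetical. It checks Theorem 2 (b), Theorem 2 (e) and Theorem 3 (d).

```python
# Numerical spot-check of Theorem 2 (b), (e) and Theorem 3 (d); illustrative only.

def incl(A, B):
    """Right fuzzy inclusion interval of A in B on a finite universe."""
    return (min(max(1 - A[x][1], B[x][0]) for x in A),
            min(max(1 - A[x][0], B[x][1]) for x in A))

def comp(A):
    return {x: (1 - hi, 1 - lo) for x, (lo, hi) in A.items()}

def meet(A, B):
    return {x: (min(A[x][0], B[x][0]), min(A[x][1], B[x][1])) for x in A}

def join(A, B):
    return {x: (max(A[x][0], B[x][0]), max(A[x][1], B[x][1])) for x in A}

def preimage(f, C):
    return {x: C[y] for x, y in f.items()}

def image(f, A, Y):
    out = {}
    for y in Y:
        fibre = [x for x in A if f[x] == y]
        out[y] = ((max(A[x][0] for x in fibre), max(A[x][1] for x in fibre))
                  if fibre else (0.0, 0.0))
    return out

Y = ["y1", "y2", "y3"]
f = {"x1": "y1", "x2": "y1", "x3": "y2"}
A = {"x1": (0.25, 0.5), "x2": (0.5, 0.75), "x3": (0.0, 0.25)}
B = {"x1": (0.5, 0.75), "x2": (0.25, 0.5), "x3": (0.25, 1.0)}
C = {"y1": (0.25, 0.5), "y2": (0.5, 1.0), "y3": (0.0, 0.25)}
zero = {x: (0.0, 0.0) for x in A}
one = {x: (1.0, 1.0) for x in A}

print(incl(comp(A), comp(B)) == incl(B, A))                   # Theorem 2 (b)
print(incl(meet(A, comp(B)), zero) == incl(A, B)
      == incl(one, join(comp(A), B)))                         # Theorem 2 (e)
print(incl(image(f, A, Y), C) == incl(A, preimage(f, C)))     # Theorem 3 (d)
```

The sample values are multiples of 0.25 so that the floating-point comparisons are exact; with arbitrary values a tolerance would be the safer choice.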
4 Interval-Valued Fuzzy Families
In this section, we define the concept of an interval-valued fuzzy family in order to obtain generalized De Morgan’s laws and, later, interval-valued fuzzy topological spaces.

Definition 7. An IVF set $F$ on the set $D^X$ is called an interval-valued fuzzy family (IVFF for short) on $X$ and is denoted by $F = [\mu_F^L, \mu_F^U]$.

Definition 8. Let $F$ be an IVFF on $X$. Then the IVFF of complemented IVF sets on $X$ is defined by $F^* = [\mu_{F^*}^L, \mu_{F^*}^U]$, where $\mu_{F^*}^L(A) = \mu_F^L(A^c)$ and $\mu_{F^*}^U(A) = \mu_F^U(A^c)$ for each $A \in D^X$.

Example 1. We can easily extend the fuzzy topological space, first defined by Šostak [8,9], to the case of interval-valued fuzzy sets [4]. Let $\tau$ be an IVFF on $X$. For each $A \in D^X$, we can construct the closed interval $\tau(A)$ as follows:
\[ \tau(A) = [\mu_\tau^L(A), \mu_\tau^U(A)]. \]
In this case, an interval-valued fuzzy topology in Šostak’s sense (So-IVFT for short) on a nonempty set $X$ is an IVFF $\tau$ on $X$ satisfying the following axioms:
(O1) $\tau(\tilde{0}) = \tilde{1}$ and $\tau(\tilde{1}) = \tilde{1}$;
(O2) $\tau(A_1 \cap A_2) \ge \tau(A_1) \wedge \tau(A_2)$ for any $A_1, A_2 \in D^X$;
(O3) $\tau(\bigcup_i A_i) \ge \bigwedge_i \tau(A_i)$ for any $\{A_i : i \in J\} \subseteq D^X$.
In this case the pair $(X, \tau)$ is called an interval-valued fuzzy topological space in Šostak’s sense (So-IVFTS for short). For any $A \in D^X$, the closed interval $\tau(A)$ is called the interval-valued degree of openness of $A$.

Of course, a So-IVFTS $(X, \tau)$ is a fuzzy topological space in Chang’s sense. So we also define a So-IVFTS in the sense of Lowen [5] as follows: $(X, \tau)$ is a So-IVFTS in the sense of Lowen if $(X, \tau)$ is a So-IVFTS satisfying the condition that for each IVF set of the form $\widetilde{[a,b]}$, where $[a,b] \subseteq [0,1]$, $\tau(\widetilde{[a,b]}) = \tilde{1}$ holds.

Let $(X, \tau)$ be a So-IVFTS on $X$. Then the IVFF $\tau^*$ of the complemented IVF sets on $X$ is defined by $\tau^*(A) = \tau(A^c)$ for each $A \in D^X$. The closed interval $\tau^*(A) = [\mu_{\tau^*}^L(A), \mu_{\tau^*}^U(A)]$ is called the interval-valued degree of closedness of $A$. Thus the IVFF $\tau^*$ on $X$ satisfies the following properties:
(C1) $\tau^*(\tilde{0}) = \tilde{1}$ and $\tau^*(\tilde{1}) = \tilde{1}$;
(C2) $\tau^*(A_1 \cup A_2) \ge \tau^*(A_1) \wedge \tau^*(A_2)$ for any $A_1, A_2 \in D^X$;
(C3) $\tau^*(\bigcap_i A_i) \ge \bigwedge_i \tau^*(A_i)$ for any $\{A_i : i \in J\} \subseteq D^X$.

Now we extend the intersection and union of a fuzzy family [8,9] to the interval-valued fuzzy setting:

Definition 9. Let $F$ be an IVFF on $X$. Then
(a) the intersection $\bigcap F$ of this IVFF is the IVF set defined by $\bigcap F(x) = [\mu_{\cap F}^L(x), \mu_{\cap F}^U(x)]$ for all $x \in X$, where
\[ \mu_{\cap F}^L(x) = \inf\{(1 - \mu_F^U(A)) \vee \mu_A^L(x) : A \in D^X\}, \qquad \mu_{\cap F}^U(x) = \inf\{(1 - \mu_F^L(A)) \vee \mu_A^U(x) : A \in D^X\}; \]
(b) the union $\bigcup F$ of this IVFF is the IVF set defined by $\bigcup F(x) = [\mu_{\cup F}^L(x), \mu_{\cup F}^U(x)]$ for all $x \in X$, where
\[ \mu_{\cup F}^L(x) = \sup\{\mu_F^L(A) \wedge \mu_A^L(x) : A \in D^X\}, \qquad \mu_{\cup F}^U(x) = \sup\{\mu_F^U(A) \wedge \mu_A^U(x) : A \in D^X\}. \]
Theorem 4. (Generalized De Morgan’s Laws) Let $F$ be an IVFF on $X$. Then we have
(a) $(\bigcup F)^c = \bigcap F^*$.
(b) $(\bigcap F)^c = \bigcup F^*$.

Proof. (a) Let $x \in X$. Then we have
\[ \mu_{\cap F^*}^L(x) = \inf\{(1 - \mu_{F^*}^U(A)) \vee \mu_A^L(x) : A \in D^X\} \]
\[ = \inf\{(1 - \mu_F^U(A^c)) \vee \mu_A^L(x) : A \in D^X\} \]
\[ = \inf\{(1 - \mu_F^U(A^c)) \vee (1 - (1 - \mu_A^L))(x) : A \in D^X\} \]
\[ = \inf\{(1 - \mu_F^U(A^c)) \vee (1 - \mu_{A^c}^U)(x) : A \in D^X\} \]
\[ = 1 - \sup\{\mu_F^U(A^c) \wedge \mu_{A^c}^U(x) : A \in D^X\} \]
\[ = 1 - \sup\{\mu_F^U(A) \wedge \mu_A^U(x) : A \in D^X\} = 1 - \mu_{\cup F}^U(x) \]
and
\[ \mu_{\cap F^*}^U(x) = \inf\{(1 - \mu_{F^*}^L(A)) \vee \mu_A^U(x) : A \in D^X\} \]
\[ = \inf\{(1 - \mu_F^L(A^c)) \vee (1 - (1 - \mu_A^U))(x) : A \in D^X\} \]
\[ = \inf\{(1 - \mu_F^L(A^c)) \vee (1 - \mu_{A^c}^L)(x) : A \in D^X\} \]
\[ = 1 - \sup\{\mu_F^L(A^c) \wedge \mu_{A^c}^L(x) : A \in D^X\} \]
\[ = 1 - \sup\{\mu_F^L(A) \wedge \mu_A^L(x) : A \in D^X\} = 1 - \mu_{\cup F}^L(x). \]
Hence $(\bigcup F)^c = \bigcap F^*$.

(b) Similar to (a).
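The union and intersection of an IVFF, and the first De Morgan law, can be tried out on a small example. The sketch below makes one simplifying assumption that should be kept in mind: the family F takes the value [0, 0] outside a finite collection of IVF sets, so the suprema and infima over all of $D^X$ reduce to that collection (sets with degree [0, 0] contribute 0 to the union and 1 to the intersection). IVF sets are stored as tuples of (lower, upper) pairs over a fixed ordering of X; all names are illustrative.

```python
# Sketch of Definition 9 and Theorem 4 (a) for an IVFF with finite support.
# Assumption: F is [0, 0] outside the listed sets. Names are illustrative.

def comp(A):
    return tuple((1 - hi, 1 - lo) for lo, hi in A)

def family_union(F, n):
    """Pointwise sup over the support of (degree of A) ^ (value of A at x)."""
    return tuple((max(min(F[A][0], A[i][0]) for A in F),
                  max(min(F[A][1], A[i][1]) for A in F)) for i in range(n))

def family_intersection(F, n):
    """Pointwise inf over the support of (1 - degree of A) v (value of A at x)."""
    return tuple((min(max(1 - F[A][1], A[i][0]) for A in F),
                  min(max(1 - F[A][0], A[i][1]) for A in F)) for i in range(n))

def star(F):
    """F*(A) = F(A^c), so F* is supported on the complements of the sets in F."""
    return {comp(A): degree for A, degree in F.items()}

A1 = ((0.25, 0.5), (0.5, 0.75))    # IVF set on X = {x1, x2}
A2 = ((0.5, 1.0), (0.0, 0.25))
F = {A1: (0.5, 0.75), A2: (0.25, 1.0)}

U = family_union(F, 2)
print(U)
print(comp(U) == family_intersection(star(F), 2))   # Theorem 4 (a)
```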
Finally, we shall define the image and preimage of IVFFs under a function $f : X \to Y$:

Definition 10. Let $F$ be an IVFF on $X$ and let $f : X \to Y$ be an injective function. Then the image $F^f = [\mu_{F^f}^L, \mu_{F^f}^U]$ of $F = [\mu_F^L, \mu_F^U]$ under $f$ is the IVFF on $Y$ defined as follows:
\[ \mu_{F^f}^L(B) = \begin{cases} \mu_F^L(f^{-1}(B)), & \text{if } B \subseteq \tilde{1}_{f(X)} \\ 0, & \text{otherwise,} \end{cases} \qquad \mu_{F^f}^U(B) = \begin{cases} \mu_F^U(f^{-1}(B)), & \text{if } B \subseteq \tilde{1}_{f(X)} \\ 0, & \text{otherwise.} \end{cases} \]

Definition 11. Let $F$ be an IVFF on $Y$ and let $f : X \to Y$ be a function. Then the preimage $F^{f^{-1}} = [\mu_{F^{f^{-1}}}^L, \mu_{F^{f^{-1}}}^U]$ of $F = [\mu_F^L, \mu_F^U]$ under $f$ is the IVFF on $X$ defined by
\[ \mu_{F^{f^{-1}}}^L(A) = \sup\{\mu_F^L(B) : A = f^{-1}(B), \ B \in D^Y\}, \qquad \mu_{F^{f^{-1}}}^U(A) = \sup\{\mu_F^U(B) : A = f^{-1}(B), \ B \in D^Y\}. \]
Theorem 5. Let $F$ be an IVFF on $X$ and $f : X \to Y$ be an injective function. Then
(a) $f(\bigcup F) = \bigcup F^f$.
(b) $f(\bigcap F) = \bigcap F^f$.

Proof. (a) We notice that, under $B \subseteq \tilde{1}_{f(X)}$, if $B \in D^Y$, then there exists an $A \in D^X$ such that $A = f^{-1}(B)$ and so $f(A) = B$. Similarly, if $A \in D^X$, then there exists a $B \in D^Y$ such that $B \subseteq \tilde{1}_{f(X)}$ and $A = f^{-1}(B)$, and so $B = f(A)$. Let $y \in Y$. If $f^{-1}(y) \neq \emptyset$, then we have
\[ \mu_{\cup F^f}^L(y) = \sup\{\mu_{F^f}^L(B) \wedge \mu_B^L(y) : B \in D^Y\} \]
\[ = \sup\{\mu_F^L(f^{-1}(B)) \wedge \mu_B^L(y) : B \in D^Y, \ B \subseteq \tilde{1}_{f(X)}\} \vee \sup\{\mu_{F^f}^L(B) \wedge \mu_B^L(y) : B \in D^Y, \ B \not\subseteq \tilde{1}_{f(X)}\} \]
\[ = \sup\{\mu_F^L(f^{-1}(B)) \wedge \mu_B^L(y) : B \in D^Y, \ B \subseteq \tilde{1}_{f(X)}\} \vee 0 \]
\[ = \sup\{\mu_F^L(f^{-1}(B)) \wedge \mu_B^L(y) : B \in D^Y, \ B \subseteq \tilde{1}_{f(X)}\} \]
\[ = \sup\{\mu_F^L(A) \wedge \mu_{f(A)}^L(y) : A \in D^X\} \]
\[ = \sup\{\mu_F^L(A) \wedge f(\mu_A^L)(y) : A \in D^X\}. \]
On the other hand, whenever $f^{-1}(y) \neq \emptyset$, there exists an $x \in X$ such that $f(x) = y$, and this $x$ is unique since $f$ is injective. Then we have
\[ \mu_{f(\cup F)}^L(y) = f(\mu_{\cup F}^L)(y) = \mu_{\cup F}^L(x) = \sup\{\mu_F^L(A) \wedge \mu_A^L(x) : A \in D^X\} = \sup\{\mu_F^L(A) \wedge f(\mu_A^L)(y) : A \in D^X\}. \]
If $f^{-1}(y) = \emptyset$, then we have $\mu_{\cup F^f}^L(y) = 0$ and $\mu_{f(\cup F)}^L(y) = 0$. Therefore, we obtain $\mu_{\cup F^f}^L = \mu_{f(\cup F)}^L$. Similarly, if $f^{-1}(y) \neq \emptyset$, then we have
\[ \mu_{\cup F^f}^U(y) = \sup\{\mu_{F^f}^U(B) \wedge \mu_B^U(y) : B \in D^Y\} = \sup\{\mu_F^U(f^{-1}(B)) \wedge \mu_B^U(y) : B \in D^Y, \ B \subseteq \tilde{1}_{f(X)}\} = \sup\{\mu_F^U(A) \wedge f(\mu_A^U)(y) : A \in D^X\} \]
and, with $x \in f^{-1}(y)$,
\[ \mu_{f(\cup F)}^U(y) = \mu_{\cup F}^U(x) = \sup\{\mu_F^U(A) \wedge \mu_A^U(x) : A \in D^X\} = \sup\{\mu_F^U(A) \wedge f(\mu_A^U)(y) : A \in D^X\}. \]
If $f^{-1}(y) = \emptyset$, since $f(\mu_{\cup F}^U)(y) = 0$ and $\mu_{\cup F^f}^U(y) = 0$, we have $\mu_{\cup F^f}^U = \mu_{f(\cup F)}^U$. Hence $f(\bigcup F) = \bigcup F^f$.

(b) Similar to (a).
Theorem 6. Let $F$ be an IVFF on $Y$ and $f : X \to Y$ be a function. Then
(a) $f^{-1}(\bigcup F) = \bigcup F^{f^{-1}}$.
(b) $f^{-1}(\bigcap F) = \bigcap F^{f^{-1}}$.

Proof. (a) Let $N(A) = \{B \in D^Y : A = f^{-1}(B)\}$ for each $A \in D^X$. Then $D^Y = \bigcup\{N(A) : A \in D^X\}$. Let $x \in X$. Then we have
\[ \mu_{\cup F^{f^{-1}}}^L(x) = \sup\{\mu_{F^{f^{-1}}}^L(A) \wedge \mu_A^L(x) : A \in D^X\} \]
\[ = \sup\{\sup\{\mu_F^L(B) : A = f^{-1}(B), \ B \in D^Y\} \wedge \mu_A^L(x) : A \in D^X\} \]
\[ = \sup\{\sup\{\mu_F^L(B) \wedge \mu_A^L(x) : A = f^{-1}(B), \ B \in D^Y\} : A \in D^X\} \]
\[ = \sup\{\sup\{\mu_F^L(B) \wedge \mu_A^L(x) : B \in N(A)\} : A \in D^X\} \]
\[ = \sup\{\sup\{\mu_F^L(B) \wedge \mu_{f^{-1}(B)}^L(x) : B \in N(A)\} : A \in D^X\} \]
\[ = \sup\{\mu_F^L(B) \wedge \mu_B^L(f(x)) : B \in D^Y\} = \mu_{f^{-1}(\cup F)}^L(x). \]
Similarly, we have $\mu_{\cup F^{f^{-1}}}^U(x) = \mu_{f^{-1}(\cup F)}^U(x)$. Hence $f^{-1}(\bigcup F) = \bigcup F^{f^{-1}}$.

(b) Similar to (a).
References

1. K. Atanassov, “Intuitionistic fuzzy sets”, in: V. Sgurev, Ed., VII ITKR’s Session, Sofia (June 1983 Central Sci. and Techn. Library, Bulg. Academy of Sciences, 1984).
2. K. Atanassov, “Intuitionistic fuzzy sets”, Fuzzy Sets and Systems, Vol. 20, pp. 87-96, 1986.
3. D. Çoker and M. Demirci, “On fuzzy inclusion in the intuitionistic sense”, J. Fuzzy Math., Vol. 4, pp. 701-714, 1996.
4. M. B. Gorzalczany, “A method of inference in approximate reasoning based on interval-valued fuzzy sets”, Fuzzy Sets and Systems, Vol. 21, pp. 1-17, 1987.
5. R. Lowen, “Fuzzy topological spaces and fuzzy compactness”, J. Math. Anal. Appl., Vol. 56, pp. 621-633, 1976.
6. T. K. Mondal and S. K. Samanta, “Topology of interval-valued fuzzy sets”, Indian J. Pure Appl. Math., Vol. 30, pp. 23-38, 1999.
7. P. V. Ramakrishnan and V. Lakshmana Gomathi Nayagam, “Hausdorff interval-valued fuzzy filters”, J. Korean Math. Soc., Vol. 39, pp. 137-148, 2002.
8. A. Šostak, “On a fuzzy topological structure”, Supp. Rend. Circ. Mat. Palermo (Ser. II), Vol. 11, pp. 89-103, 1985.
9. A. Šostak, “On compactness and connectedness degrees of fuzzy sets in fuzzy topological spaces”, General Topology and its Relations to Modern Analysis and Algebra, Heldermann Verlag, Berlin, pp. 519-532, 1988.
10. L. A. Zadeh, “Fuzzy sets”, Inform. and Control, Vol. 8, pp. 338-353, 1965.
11. L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning - I”, Inform. Sci., Vol. 8, pp. 199-249, 1975.
Fuzzy Evaluation Based Multi-objective Reactive Power Optimization in Distribution Networks

Jiachuan Shi and Yutian Liu

School of Electrical Engineering, Shandong University, Jinan 250061, China
[email protected]

Abstract. A fuzzy evaluation based multi-objective optimization model for reactive power optimization in power distribution networks is presented in this paper. The two objectives, reducing active power losses and improving voltage profiles, are each evaluated by a membership function, so that they can be compared on a single scale. To facilitate the solving process, a compromise objective is formed by the weighted-sum approach. The weights are decided according to the preferences for and importance of the objectives. The reactive tabu search algorithm is employed to obtain global optimization solutions. Simulation results for a practical power distribution network, which show greatly improved voltage profiles and reduced power losses, demonstrate that the proposed method is effective.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 11–19, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction

Reactive power optimization (RPO) aims to reduce active power losses by adjusting the reactive power distribution. In distribution systems, the main regulating means are transformer tap-changers and shunt capacitors, both of which are discrete variables. RPO is therefore formulated as a constrained combinatorial optimization problem. Several approaches, such as numerical programming, heuristic methods and AI-based methods, have been devised to solve reactive power optimization problems.

Numerical programming algorithms solve the optimization problem by iterative techniques. A quadratic integer programming algorithm based on active power loss reduction formulas has been applied to capacitor allocation and real-time control problems in distribution networks [1]. Numerical programming algorithms are effective in small-scale cases, but the computational burden increases greatly when dealing with practical large-scale networks. Heuristic methods are based on rules developed through intuition, experience and judgment. They produce near-optimal solutions quickly, but the results are not guaranteed to be optimal [2].

AI-based methods, including global optimization techniques [3,4] and fuzzy set theory [5-8], have been applied to RPO. In [3], a GA-based two-stage algorithm, which combines a GA with a heuristic algorithm, is introduced to solve the constrained optimization problem. GA-SA-TS hybrid algorithms are introduced in [4]. By combining the advantages of the individual algorithms, the local and global search strategies are joined to find better solutions within reasonable time. Fuzzy set theory is efficient in compensating for the uncertainty of data and optimization models. Applications of fuzzy set theory to shunt capacitor placement in distribution networks are introduced in [5]. Fuzzy set theory is combined with dynamic programming to solve distribution network voltage/var regulation problems in [6,7,8].

It should be noted that most of these papers focus on reducing active power losses and/or energy losses, while the voltage profiles are treated as operating constraints. A single-objective optimization model often raises voltages to their upper limits in order to reduce active power losses, which may not be acceptable in practice. Reducing active power losses may therefore conflict with the voltage profile constraints.

A reactive power optimization method is presented in this paper. To resolve the conflict between active power loss reduction and the operating constraints, the voltage profile constraints are converted into an objective. That is, the aim is to improve voltage profiles and reduce active power losses as much as possible. The RPO problem is therefore formulated as a constrained combinatorial multi-objective optimization problem. The constraints include power flow, operating constraints, and adjusting frequencies. Membership functions are introduced to assess the objectives, so that the objectives can be compared without being influenced by their original values. The weighted-sum approach is used to combine the objectives into one, and the constrained multi-objective optimization problem is solved by the Reactive Tabu Search (RTS) technique.
2 Problem Formulation

A fuzzy evaluation based multi-objective optimization model for the RPO problem in distribution networks is presented in this section. Traditional RPO aims to minimize active power losses without voltage violations. The optimization process tends to minimize active power losses by raising voltages to the upper limit, and the results are not acceptable in practice once variations of loads and of the source node voltage are considered. Furthermore, the voltage constraints are treated as “hard” constraints, that is, no voltage violation is allowed; it may then be hard to find a feasible solution, especially when two or more operating conditions are considered.

In the new RPO model, the voltage profile constraints are treated as an objective. That is, the objectives are to improve voltage profiles and to reduce active power losses as much as possible. The constraints include power flow and operating constraints. The objectives are evaluated by fuzzy membership functions. The membership functions scale the objectives to the unit interval [0,1], so that the degrees of satisfaction of the objectives can be compared without being influenced by their original values. The weighted-sum approach is introduced to combine the objectives. The weights express the importance of the objectives, that is, the preferences among them. After combination, the multi-objective optimization problem is transformed into a single-objective optimization problem. The optimal solution of the latter problem is a nondominated solution of the former one.
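The weighted-sum combination described above can be sketched as follows. This is only an illustration under stated assumptions: the passage does not give the exact formula or the weights, so the linear form, the function name, the weights and the candidate membership values below are all hypothetical.

```python
# Sketch of a weighted-sum compromise objective; all values are hypothetical.

def compromise_objective(f_voltage, f_loss, w_voltage=0.5, w_loss=0.5):
    """Combine two membership values in [0, 1] into a single score."""
    return w_voltage * f_voltage + w_loss * f_loss

# Hypothetical candidate solutions: (voltage membership, loss membership)
candidates = {"s1": (0.5, 1.0), "s2": (1.0, 0.25), "s3": (0.75, 0.875)}

def best(weights):
    """Return the candidate with the highest compromise score for the given weights."""
    return max(candidates,
               key=lambda name: compromise_objective(*candidates[name], *weights))

print(best((0.5, 0.5)))     # equal preference for both objectives
print(best((0.25, 0.75)))   # stronger preference for loss reduction
```

Changing the weights changes which candidate is preferred, which is the sense in which the weights encode the preferences among the objectives.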
2.1 Voltage Profiles Assessment

To evaluate the voltage profiles, the trapezoidal membership function shown in Fig. 1 is introduced:

\[ F_v(V_i) = \begin{cases} \dfrac{L_0^{upper} - V_i}{L_0^{upper} - L_1^{upper}} & (L_1^{upper} < V_i < L_0^{upper}) \\[2ex] \dfrac{V_i - L_0^{lower}}{L_1^{lower} - L_0^{lower}} & (L_0^{lower} < V_i < L_1^{lower}) \\[2ex] 1 & (L_1^{lower} \le V_i \le L_1^{upper}) \\[1ex] 0 & (\text{others}) \end{cases} \tag{1} \]

where $V_i$ is the voltage of node $i$; $L_0^{upper}$ and $L_0^{lower}$ are the unacceptable voltage limits; $L_1^{upper}$ and $L_1^{lower}$ are the acceptable voltage margins.

[Figure: trapezoidal curve of $F_v(V_i)$ against $V_i$ (p.u.), rising from 0 at $L_0^{lower}$ to 1 at $L_1^{lower}$ and falling from 1 at $L_1^{upper}$ to 0 at $L_0^{upper}$]

Fig. 1. Membership function for voltage profiles assessment
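The trapezoidal membership function of Eq. (1) translates directly into code. In the minimal sketch below, the four breakpoints are assumed values chosen only for illustration (the passage does not fix them), and the function name is hypothetical.

```python
# Sketch of the trapezoidal voltage membership of Eq. (1).
# The default breakpoints (in p.u.) are assumed values for illustration.

def voltage_membership(v, l0_lower=0.90, l1_lower=0.95, l1_upper=1.05, l0_upper=1.07):
    if l1_lower <= v <= l1_upper:      # inside the acceptable margins
        return 1.0
    if l0_lower < v < l1_lower:        # rising edge of the trapezoid
        return (v - l0_lower) / (l1_lower - l0_lower)
    if l1_upper < v < l0_upper:        # falling edge of the trapezoid
        return (l0_upper - v) / (l0_upper - l1_upper)
    return 0.0                         # at or beyond the unacceptable limits

for v in (0.85, 0.925, 1.0, 1.06, 1.10):
    print(v, round(voltage_membership(v), 3))
```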
The membership function of the network voltage profiles is
¦¦ F (V ) N
3
v
ip
i =1 p =1
Fvoltage =
(2)
3N
where p represents the phase A, B and C; N is the number of secondary buses; Vip is the phase voltage of node i. To evaluate voltage eligibility of the network, three-phase voltage-eligibility ratio (TPVER) is defined as
¦ ¦ f (V ) N
3
ip
TPVER =
(3)
i =1 p =1
°0 f (Vip ) = ® °¯1
3N (Vip > Vupper ) or (Vip < Vlower )
(V
lower
≤ Vip ≤ Vupper )
where Vlower and Vupper are voltage limits, e.g. 0.9 and 1.07p.u., respectively.
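The assessment in Eqs. (1)-(3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample voltages are hypothetical, the default limits for Eq. (1) are the 1±2% and 1±20% values quoted later in Section 4.2, and TPVER is returned as a fraction rather than a percentage.

```python
def voltage_membership(v, l0_lower=0.80, l1_lower=0.98, l1_upper=1.02, l0_upper=1.20):
    """Trapezoidal membership F_v(V_i) of Eq. (1); all voltages in p.u."""
    if l1_lower <= v <= l1_upper:
        return 1.0
    if l0_lower < v < l1_lower:          # rising edge below the acceptable band
        return (v - l0_lower) / (l1_lower - l0_lower)
    if l1_upper < v < l0_upper:          # falling edge above the acceptable band
        return (l0_upper - v) / (l0_upper - l1_upper)
    return 0.0


def network_voltage_membership(phase_voltages):
    """F_voltage of Eq. (2): average of F_v over N nodes and 3 phases."""
    flat = [v for node in phase_voltages for v in node]
    return sum(voltage_membership(v) for v in flat) / len(flat)


def tpver(phase_voltages, v_lower=0.90, v_upper=1.07):
    """Three-phase voltage-eligibility ratio of Eq. (3), as a fraction in [0, 1]."""
    flat = [v for node in phase_voltages for v in node]
    return sum(1 for v in flat if v_lower <= v <= v_upper) / len(flat)


# Hypothetical phase voltages (A, B, C) of two secondary buses, in p.u.
voltages = [(1.00, 1.01, 0.99), (1.08, 1.11, 0.95)]
print(network_voltage_membership(voltages), tpver(voltages))
```

The membership value degrades gradually outside the acceptable band, whereas TPVER only counts whether each phase voltage lies inside the statutory limits.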
J. Shi and Y. Liu
2.2 Active Power Losses Assessment
So that it can be compared with the other objective, the active power loss is also evaluated by a membership function. Unlike the voltage profiles, there is no standard or limit for the reduction of active power losses. In this paper, the membership function for active power losses is determined by two parameters, Pl_ori and Pl_min. The former is the active power loss before optimization, and its membership value is set to 0.5. The latter, Pl_min, is the minimum active power loss, and its membership value is 1. The active power losses can be calculated by

$$
P_{loss} = \sum_{i=1}^{N_b} I_i^2 R_i
\tag{4}
$$
where Ploss is the active power losses of the whole network; Ii is the current of branch i; Ri is the resistance of branch i; Nb is the number of branches. The current can be divided into active power current Ii_real, and reactive power current Ii_imag. The reactive currents account for a portion of these losses.
$$
\begin{aligned}
I_i &= I_{i\_real} + jI_{i\_imag} \\
P_{loss} &= P_{loss\_real} + P_{loss\_imag} = \sum_{i=1}^{N_b} I_{i\_real}^2 R_i + \sum_{i=1}^{N_b} I_{i\_imag}^2 R_i
\end{aligned}
\tag{5}
$$
After reactive power compensation, the branch current Ii is reduced to Ii′, and the active power losses are reduced to P′loss. The losses caused by the active power currents, Ploss_real, cannot be decreased by reactive power compensation.
$$
\begin{aligned}
I_i' &= I_{i\_real} + j(I_{i\_imag} - I_{i\_comp}) \\
P_{loss}' &= P_{loss\_real} + P_{loss\_imag}' = P_{loss\_real} + \sum_{i=1}^{N_b} (I_{i\_imag} - I_{i\_comp})^2 R_i
\end{aligned}
\tag{6}
$$
In an ideal operating condition, enough shunt capacitors are installed and the reactive power current is reduced to 0. Therefore, Ploss_imag is 0 and Ploss_real can be treated as the minimum active power loss, Pl_min. The membership function of the active power losses, shown in Fig. 2, is defined as

$$
F_{loss}(P_l) =
\begin{cases}
0, & P_l > 2P_{l\_ori} - P_{l\_min} \\[1ex]
0.5 \times \dfrac{P_l - P_{l\_ori}}{P_{l\_min} - P_{l\_ori}} + 0.5, & P_{l\_min} \le P_l \le 2P_{l\_ori} - P_{l\_min} \\[2ex]
1, & P_l < P_{l\_min}
\end{cases}
\tag{7}
$$
where Pl is the active power losses. The two parameters, Pl_ori and Pl_min, are determined by the system structure and original operating conditions. The assessment results of trial solutions under different operating conditions are comparable.
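As a rough illustration of Eqs. (4)-(7), the sketch below splits the branch losses into the parts caused by active and reactive currents and then evaluates the loss membership. The helper names and the branch data are hypothetical; the sketch assumes Pl_min < Pl_ori and takes Pl_min as the loss that remains when the reactive current is fully compensated, as in the ideal case described above.

```python
def split_losses(currents, resistances):
    """Split sum(|I_i|^2 R_i) into active- and reactive-current parts (Eqs. (4)-(5))."""
    p_real = sum(i.real ** 2 * r for i, r in zip(currents, resistances))
    p_imag = sum(i.imag ** 2 * r for i, r in zip(currents, resistances))
    return p_real, p_imag


def loss_membership(p_l, p_ori, p_min):
    """Membership F_loss(P_l) of Eq. (7); assumes p_min < p_ori."""
    if p_l < p_min:
        return 1.0
    if p_l > 2 * p_ori - p_min:
        return 0.0
    return 0.5 * (p_l - p_ori) / (p_min - p_ori) + 0.5


# Hypothetical branch currents (A, complex) and resistances (ohm)
currents = [complex(30, 40), complex(10, 20)]
resistances = [0.1, 0.2]
p_real, p_imag = split_losses(currents, resistances)
p_ori = p_real + p_imag      # losses before compensation
p_min = p_real               # ideal case: reactive current fully compensated
print(loss_membership(p_ori, p_ori, p_min), loss_membership(p_min, p_ori, p_min))
```

By construction the original losses map to 0.5 and the ideal losses to 1, so any trial solution that reduces the losses scores between these two values.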
Fig. 2. Membership function for active power losses assessment
2.3 Objective Function
The multi-objective RPO can be formulated as

$$
\begin{aligned}
&\max F_{loss}, \quad \max F_{voltage} \\
&\text{s.t.} \quad T_{k\min} \le T_k \le T_{k\max}, \quad 0 \le Q_{cj} \le Q_{cj\max}
\end{aligned}
\tag{8}
$$

where Floss is the membership function for the active power losses; Fvoltage is the membership function for the voltage profiles; Tk is the ratio of transformer k; Qcj is the capacity of the capacitors at node j. The power flow constraints are not listed here. Normally, the solutions of a multi-objective optimization problem form a set of Pareto solutions. A compromise objective function for operating condition m is represented as
$$
F_{obj\_m} = (\alpha F_{loss} + \beta F_{voltage})_m
\tag{9}
$$

where Fobj_m is the objective function for operating condition m; α and β are the weight coefficients of the two objectives. Considering load variation, two or more different operating conditions are considered during the optimization process. The RPO problem is described as

$$
\begin{aligned}
&\max F_{obj} = \sum_{m=1}^{M} F_{obj\_m} \\
&\text{s.t.} \quad T_{k\min} \le T_k \le T_{k\max}, \quad 0 \le Q_{cj} \le Q_{cj\max}
\end{aligned}
\tag{10}
$$
where M is the number of operating conditions.
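The weighted-sum combination of Eqs. (9) and (10) amounts to the short sketch below. The function names and the membership values are hypothetical, the default weights are the α = 3 and β = 1 used in Section 4.2, and only the bound constraints of Eq. (10) are checked (the power flow constraints are outside the scope of this sketch).

```python
def compromise_objective(f_loss, f_voltage, alpha=3.0, beta=1.0):
    """F_obj_m of Eq. (9) for one operating condition."""
    return alpha * f_loss + beta * f_voltage


def within_bounds(taps, tap_min, tap_max, q_caps, q_max):
    """Bound constraints of Eq. (10) on transformer ratios and capacitor sizes."""
    taps_ok = all(lo <= t <= hi for t, lo, hi in zip(taps, tap_min, tap_max))
    caps_ok = all(0 <= q <= qm for q, qm in zip(q_caps, q_max))
    return taps_ok and caps_ok


def total_objective(memberships, alpha=3.0, beta=1.0):
    """F_obj of Eq. (10): sum of F_obj_m over the M operating conditions."""
    return sum(compromise_objective(fl, fv, alpha, beta) for fl, fv in memberships)


# Hypothetical (F_loss, F_voltage) pairs for two operating conditions
print(total_objective([(0.6, 0.9), (0.5, 1.0)]))
```

Because both memberships lie in [0,1], the weights alone decide how much a gain in one objective can offset a drop in the other.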
3 Solution Method
The Reactive Tabu Search (RTS) algorithm [9] is used to solve the constrained combinatorial optimization problem. Two search strategies, diversification and intensification, are present in many heuristic algorithms. In traditional tabu search, the length of the tabu list is chosen by experience. RTS introduces feedback mechanisms to adjust the length of the tabu list, so that diversification and intensification can be balanced as the optimization proceeds. Therefore, compared with traditional TS, RTS is more robust and effective. The reactive power optimization process by RTS is as follows:
i) Read in the initial data, including the impedances of the feeders, the loads, the regulation variables and the inequality constraints. Code the regulation variables.
ii) Generate the initial solution. Set the regulation variables randomly without violating the constraints, including the power flow constraints. Calculate the objective function f(X), and set the best solution vector Xopt to X.
iii) Generate a group of trial solutions, X1, X2, …, Xk, by a "move" from X. Check their feasibility and discard the infeasible ones.
iv) Look up the corresponding objective values in the hash table, in which all the visited configurations are stored. If a trial solution is not in the hash table, calculate its objective function value and store it in the hash table.
v) Search the neighborhood. Take the best one, X*, of the trial solutions. Update X with X* if X* is not in the tabu list or if X* meets the aspiration criterion. If the best trial solution cannot update X, try the next best one.
vi) Update the tabu list. Push the record of the reversed move into a FIFO (first-in, first-out) queue, the tabu list.
vii) Update Xopt with X* if f(X*) is better than f(Xopt).
viii) Update the length of the tabu list. If most solutions are retrieved from the hash table, the length of the tabu list is increased, up to an upper limit, to escape from local optima. If few solutions are retrieved from the hash table, the length of the tabu list is decreased, down to a lower limit, to relax the restrictions imposed by the tabu list.
ix) Check the termination conditions. Stop the optimization and output the results if f(Xopt) has not improved for several iterations or the maximum number of iterations is reached.
If the termination conditions are not met, the iteration continues from step iii).
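Steps ii) to ix) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the regulation variables are coded as bounded integers, a "move" changes one variable by one step, feasibility is reduced to the bound check, the tabu list stores forbidden (index, value) assignments, and all names and default parameters are hypothetical.

```python
from collections import deque


def reactive_tabu_search(f, x0, lower, upper, max_iter=200, patience=30,
                         tenure=3, min_tenure=1, max_tenure=10):
    """Maximize f over integer vectors within [lower, upper] (simplified RTS)."""
    x = tuple(x0)
    cache = {x: f(x)}                 # hash table of visited configurations
    best_x, best_f = x, cache[x]
    tabu = deque()                    # FIFO tabu list of forbidden (index, value)
    stall = 0
    for _ in range(max_iter):
        # iii) trial solutions: change one variable by +/-1, keep the bounds
        moves = [(i, x[i] + d) for i in range(len(x)) for d in (-1, 1)
                 if lower[i] <= x[i] + d <= upper[i]]
        if not moves:
            break
        # iv) look up or evaluate the trial solutions
        hits, trials = 0, []
        for i, v in moves:
            y = x[:i] + (v,) + x[i + 1:]
            if y in cache:
                hits += 1
            else:
                cache[y] = f(y)
            trials.append((cache[y], i, v, y))
        trials.sort(key=lambda t: t[0], reverse=True)
        # v) best admissible trial: not tabu, or better than the best so far
        for fy, i, v, y in trials:
            if (i, v) not in tabu or fy > best_f:
                tabu.append((i, x[i]))        # vi) forbid the reverse move
                x = y
                break
        else:
            tabu.popleft()                    # every move is tabu: relax the list
            continue
        # vii) update the best solution found so far
        if cache[x] > best_f:
            best_x, best_f, stall = x, cache[x], 0
        else:
            stall += 1
        # viii) reactive step: many revisits -> longer list, few -> shorter list
        if 2 * hits > len(moves):
            tenure = min(tenure + 1, max_tenure)
        else:
            tenure = max(tenure - 1, min_tenure)
        while len(tabu) > tenure:
            tabu.popleft()
        # ix) stop when the best value has not improved for `patience` iterations
        if stall >= patience:
            break
    return best_x, best_f


# Toy objective with a single maximum at (3, -1)
toy = lambda x: -(x[0] - 3) ** 2 - (x[1] + 1) ** 2
print(reactive_tabu_search(toy, (0, 0), (-5, -5), (5, 5)))
```

In the RPO setting, f would evaluate Eq. (10) through a power flow calculation for each trial solution, which is why caching visited configurations in the hash table pays off.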
4 Test Results
The effectiveness of the proposed method is verified on a practical distribution system in Jinan, China, shown in Fig. 3. Data are sampled by the distribution SCADA system every 15 minutes.
4.1 Operating Conditions Selection
Active power losses and voltage profiles are closely related to the loads, which vary continually over the day and across the seasons of the year. If the voltage profiles under the peak load and valley load conditions are eligible, those in between are believed to be acceptable. Consequently, the peak and valley load conditions cover the daily load variation, and two typical days in spring and summer represent the load variation over a year. Therefore, the load variation over a long period can be covered by a few operating conditions, and the optimization results suit the long-period variation of the voltage profiles.
Fig. 3. Structure of case distribution network
The number of operating conditions considered in the optimization strongly affects the calculation speed. Two typical operating conditions, the maximum loads and the minimum loads in a year, are used to "cover" the voltage and load variations in the optimization. The former is taken from the peak load of a typical day in summer, and the latter is the valley load condition in spring.
4.2 Optimization Results
Three-phase power flow is calculated by the forward-backward sweep method [10]. All 13 three-phase 10kV/0.4kV transformers in this network have no-load tap-changers (NLTC) with a regulating range of 1±2×2.5%; the original tap positions are 3, that is, the ratios are 1 p.u. The upper and lower limits of the phase-to-neutral voltage for 0.4kV distribution networks are 1+7% (235.4V) and 1-10% (198V) respectively. In Eq. (1), the acceptable upper and lower voltage margins are 1±2%, and the unacceptable limits are 1±20%. In Eq. (9), α is 3 and β is 1, which means that the reduction of active power losses is emphasized more than the improvement of the voltage profiles. The four nodes with the largest sensitivity values are pre-selected for compensation, as shown in Table 1; all of them carry heavier loads. Furthermore, the transformer at 6010# has the smallest capacity (250kVA), and its impedance is larger than that of the others (315kVA or 400kVA). Different regulation means are compared. In scheme 1, only the NLTCs and fixed capacitors (FC) are used to regulate the voltage profiles. In scheme 2, NLTCs, FC and switchable capacitors (SC) are used. After optimization, the tap changers of all the transformers are moved from position 3 to position 1, that is, the ratios decrease from 1 to 0.95 p.u. The voltage profiles and active power losses before and after optimization are shown in Table 2. Both schemes clearly improve the voltage-eligibility ratio and decrease the active power losses under the two operating conditions.

Table 1. Sensitivity and capacity at 0.4kV sides

Locations   SC (W/kVar)   Capacity (kVar)
                          Scheme 1 (FC)   Scheme 2 (FC+SC)
6010        10.9          40              40+80
6007        8.4           40              40+80
6000        7.2           40              40+80
6008        6.1           60              60+80
Table 2. Voltage profiles and active power losses

                            Summer Max                   Spring Min
Items                       TPVER (%)   Power loss (%)   TPVER (%)   Power loss (%)
Before optimization         94.87       2.38             0           6.39
Scheme 1 (NLTC, FC)         100         2.24             100         6.36
Scheme 2 (NLTC, SC, FC)     100         2.13             100         6.36
5 Conclusions
The reactive power optimization problem is formulated as a multi-objective optimization problem, which aims to improve the voltage profiles and to decrease the active power losses. In this way, the conflict between the reduction of active power losses and the voltage constraints can be resolved by compromise. The fuzzy-evaluation-based weighted-sum approach is effective in solving multi-objective optimization problems. The membership functions scale the objectives to the unit interval [0,1]. Therefore, the degrees of satisfaction of different objectives can be compared fairly. The fuzzy evaluation strategy removes the influence of the original values.
References
1. Jin-Cheng Wang, et al. Capacitor placement and real time control in large-scale unbalanced distribution systems: Loss reduction formula, problem formulation, solution methodology and mathematical justification. IEEE Transactions on Power Delivery 1997; 12(2): 953-58.
2. H. N. Ng, M. M. A. Salama, A. Y. Chikhani. Classification of capacitor allocation techniques. IEEE Transactions on Power Delivery 2000; 15(1): 387-92.
3. Karen Nan Miu, Hsiao-Dong Chiang, Gary Darling. Capacitor placement, replacement and control in large-scale distribution systems by a GA-based two-stage algorithm. IEEE Transactions on Power Systems 1997; 12(3): 1160-66.
4. Yutian Liu, Li Ma, Jianjun Zhang. Reactive power optimization by GA/SA/TS combined algorithms. Int. J. of Electrical Power & Energy Systems 2002; 24(9): 765-69.
5. S. F. Mekhamer, et al. Application of fuzzy logic for reactive-power compensation of radial distribution feeders. IEEE Transactions on Power Systems 2003; 18(1): 206-13.
6. Yutian Liu, Peng Zhang, Xizhao Qiu. Optimal voltage/var control in distribution systems. Int. J. of Electrical Power & Energy Systems 2002; 24(4): 271-76.
7. Feng-Chang Lu, Yuan-Yih Hsu. Fuzzy dynamic programming approach to reactive power/voltage control in a distribution substation. IEEE Transactions on Power Systems 1997; 12(2): 681-88.
8. Andrija T. Saric, Milan S. Calovic, Vladimir C. Strezoski. Fuzzy multi-objective algorithm for multiple solution of distribution systems voltage control. Int. J. of Electrical Power & Energy Systems 2003; 25(2): 145-53.
9. V. J. Rayward-Smith, I. H. Osman, C. R. Reeves, G. D. Smith. Modern Heuristic Search Methods. John Wiley and Sons Ltd, 1996.
10. Whei-Min Lin, et al. Three-phase unbalanced distribution power flow solutions with minimum data preparation. IEEE Transactions on Power Systems 1999; 14(3): 1178-83.
Note on Interval-Valued Fuzzy Set
Wenyi Zeng1,2 and Yu Shi1
1 Department of Mathematics, Beijing Normal University, Beijing, 100875, P.R. China
2 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, SE171 77, Sweden
[email protected] shi yu [email protected]
Abstract. In this note, we introduce the concept of cut set of interval-valued fuzzy set and discuss some properties of cut set of interval-valued fuzzy set, propose three decomposition theorems of interval-valued fuzzy set and investigate some properties of cut set of interval-valued fuzzy set and mapping H in detail. These works can be used in setting up the basic theory of interval-valued fuzzy set.
1 Introduction
The theory of fuzzy sets, pioneered by Zadeh [11], has achieved many successful applications in practice. As a generalization of the fuzzy set, Zadeh [12, 13, 14] introduced the concept of interval-valued fuzzy set. After that, some authors investigated the topic and obtained some meaningful conclusions, for example, Biswas [2] and Li [7] on interval-valued fuzzy subgroups, Mondal [8] on interval-valued fuzzy topology, Bustince et al. [3], Chen et al. [4], Yuan et al. [10], Arnould [1] and Gorzalczany [6] on approximate reasoning with interval-valued fuzzy sets, Bustince et al. [3] and Deschrijver et al. [5] on interval-valued fuzzy relations and implication, Turksen [9] on normal forms of interval-valued fuzzy sets, and so on. These works show the importance of the interval-valued fuzzy set. The decomposition theorems of fuzzy sets played an important role in fuzzy set theory and helped to develop many branches such as fuzzy algebra, fuzzy measure and integral, fuzzy analysis, fuzzy decision making and so on. In this paper, our aim is to investigate decomposition theorems of interval-valued fuzzy sets, as a preparation for setting up the basic theory of interval-valued fuzzy sets and developing its related branches. Our work is organized as follows. In Section 2, we introduce the concept of cut set of interval-valued fuzzy set and discuss some properties of cut set of interval-valued fuzzy set. In Section 3, we propose three decomposition theorems of interval-valued fuzzy set and give some properties of cut set of interval-valued fuzzy set and mapping H. The final section is the conclusion.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 20–25, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Preliminaries
In this section, we introduce the concept of cut set of interval-valued fuzzy set and give some properties of cut set of interval-valued fuzzy set.
Let $I = [0,1]$ and $[I]$ be the set of all closed subintervals of the interval $[0,1]$. Then, according to Zadeh's extension principle [11], we can extend the operations $\vee$, $\wedge$ and $c$ to $[I]$; thus $([I], \vee, \wedge, c)$ is a complete lattice with a minimal element $\bar{0} = [0,0]$ and a maximal element $\bar{1} = [1,1]$. Furthermore, let $\bar{a} = [a^-, a^+]$, $\bar{b} = [b^-, b^+]$; then we have $\bar{a} = \bar{b} \iff a^- = b^-, a^+ = b^+$; $\bar{a} \le \bar{b} \iff a^- \le b^-, a^+ \le b^+$; and $\bar{a} < \bar{b} \iff \bar{a} \le \bar{b}$ and $\bar{a} \ne \bar{b}$. Since $[I]$ is dense, $([I], \vee, \wedge, c)$ is a superior soft algebra.
Let $X$ be a universal set. We call a mapping $A : X \to [I]$ an interval-valued fuzzy set in $X$. Let IVFSs stand for the set of all interval-valued fuzzy sets in $X$. For every $A \in$ IVFSs and $x \in X$, $A(x) = [A^-(x), A^+(x)]$ is called the degree of membership of the element $x$ to $A$; the fuzzy sets $A^- : X \to [0,1]$ and $A^+ : X \to [0,1]$ are called the lower fuzzy set of $A$ and the upper fuzzy set of $A$, respectively. For simplicity, we denote $A = [A^-, A^+]$, and write $F(X)$ and $P(X)$ for the sets of all fuzzy sets and all crisp sets in $X$, respectively. The operations $\cup$, $\cap$, $c$ can therefore be introduced into IVFSs; thus (IVFSs, $\cup$, $\cap$, $c$) is a superior soft algebra.
Definition 1. Let $A \in$ IVFSs and $\lambda = [\lambda_1, \lambda_2] \in [I]$. We define:

$$
\begin{aligned}
A_\lambda^{(1,1)} = A_{[\lambda_1,\lambda_2]}^{(1,1)} &= \{x \in X \mid A^-(x) \ge \lambda_1,\ A^+(x) \ge \lambda_2\} \\
A_\lambda^{(1,2)} = A_{[\lambda_1,\lambda_2]}^{(1,2)} &= \{x \in X \mid A^-(x) \ge \lambda_1,\ A^+(x) > \lambda_2\} \\
A_\lambda^{(2,1)} = A_{[\lambda_1,\lambda_2]}^{(2,1)} &= \{x \in X \mid A^-(x) > \lambda_1,\ A^+(x) \ge \lambda_2\} \\
A_\lambda^{(2,2)} = A_{[\lambda_1,\lambda_2]}^{(2,2)} &= \{x \in X \mid A^-(x) > \lambda_1,\ A^+(x) > \lambda_2\} \\
A_\lambda^{(3,3)} = A_{[\lambda_1,\lambda_2]}^{(3,3)} &= \{x \in X \mid A^-(x) \le \lambda_1 \text{ or } A^+(x) \le \lambda_2\} \\
A_\lambda^{(3,4)} = A_{[\lambda_1,\lambda_2]}^{(3,4)} &= \{x \in X \mid A^-(x) \le \lambda_1 \text{ or } A^+(x) < \lambda_2\} \\
A_\lambda^{(4,3)} = A_{[\lambda_1,\lambda_2]}^{(4,3)} &= \{x \in X \mid A^-(x) < \lambda_1 \text{ or } A^+(x) \le \lambda_2\} \\
A_\lambda^{(4,4)} = A_{[\lambda_1,\lambda_2]}^{(4,4)} &= \{x \in X \mid A^-(x) < \lambda_1 \text{ or } A^+(x) < \lambda_2\}
\end{aligned}
$$

$A_{[\lambda_1,\lambda_2]}^{(i,j)}$ is called the $(i,j)$th $(\lambda_1, \lambda_2)$-(double value) cut set of the interval-valued fuzzy set $A$. Specially, if $\lambda = \lambda_1 = \lambda_2$, $A_\lambda^{(i,j)} = A_{[\lambda,\lambda]}^{(i,j)}$ is called the $(i,j)$th $\lambda$-(single value) cut set of the interval-valued fuzzy set $A$.
For $A \in F(X)$, $\lambda \in [0,1]$, we denote $A_\lambda^1 = \{x \in X \mid A(x) \ge \lambda\}$, $A_\lambda^2 = \{x \in X \mid A(x) > \lambda\}$, $A_\lambda^3 = \{x \in X \mid A(x) \le \lambda\}$, $A_\lambda^4 = \{x \in X \mid A(x) < \lambda\}$. Therefore, we have the following properties.
Property 1.

$$
\begin{aligned}
A_{[\lambda_1,\lambda_2]}^{(i,j)} &= (A^-)_{\lambda_1}^{i} \cap (A^+)_{\lambda_2}^{j}, \quad i, j = 1, 2 \\
A_{[\lambda_1,\lambda_2]}^{(i,j)} &= (A^-)_{\lambda_1}^{i} \cup (A^+)_{\lambda_2}^{j}, \quad i, j = 3, 4
\end{aligned}
$$
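The cut sets of Definition 1 are easy to compute on a finite universe. The sketch below assumes that the indices 1 to 4 stand for the relations ≥, >, ≤ and < (as in the cut sets of an ordinary fuzzy set above) and that an interval-valued fuzzy set is stored as a dictionary from elements to (lower, upper) pairs; the function name and the sample set are hypothetical.

```python
import operator

# Relation attached to each cut index: 1 -> >=, 2 -> >, 3 -> <=, 4 -> <
REL = {1: operator.ge, 2: operator.gt, 3: operator.le, 4: operator.lt}


def cut_set(A, lam, i, j):
    """(i, j)-th cut set of A (dict: x -> (lower, upper)) at lam = (l1, l2)."""
    l1, l2 = lam
    if i in (1, 2) and j in (1, 2):      # both conditions must hold
        return {x for x, (lo, hi) in A.items()
                if REL[i](lo, l1) and REL[j](hi, l2)}
    if i in (3, 4) and j in (3, 4):      # at least one condition must hold
        return {x for x, (lo, hi) in A.items()
                if REL[i](lo, l1) or REL[j](hi, l2)}
    raise ValueError("indices must both be in {1, 2} or both in {3, 4}")


# Hypothetical interval-valued fuzzy set on X = {a, b, c}
A = {"a": (0.2, 0.5), "b": (0.4, 0.4), "c": (0.6, 0.9)}
lam = (0.4, 0.5)
X = set(A)
print(cut_set(A, lam, 1, 1), cut_set(A, lam, 4, 4))
# The (4,4) cut is the complement of the (1,1) cut at the same level
print(X - cut_set(A, lam, 1, 1) == cut_set(A, lam, 4, 4))
```

With this representation, inclusions and complements between cut sets can be checked directly with Python set operations.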
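Property 2.

$$
\begin{aligned}
A_{[\lambda_1,\lambda_2]}^{(2,2)} \subseteq A_{[\lambda_1,\lambda_2]}^{(1,2)} \subseteq A_{[\lambda_1,\lambda_2]}^{(1,1)}, &\qquad A_{[\lambda_1,\lambda_2]}^{(4,4)} \subseteq A_{[\lambda_1,\lambda_2]}^{(3,4)} \subseteq A_{[\lambda_1,\lambda_2]}^{(3,3)} \\
A_{[\lambda_1,\lambda_2]}^{(2,2)} \subseteq A_{[\lambda_1,\lambda_2]}^{(2,1)} \subseteq A_{[\lambda_1,\lambda_2]}^{(1,1)}, &\qquad A_{[\lambda_1,\lambda_2]}^{(4,4)} \subseteq A_{[\lambda_1,\lambda_2]}^{(4,3)} \subseteq A_{[\lambda_1,\lambda_2]}^{(3,3)}
\end{aligned}
$$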
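Property 3. For $\lambda_1 = [\lambda_1^1, \lambda_1^2]$, $\lambda_2 = [\lambda_2^1, \lambda_2^2] \in [I]$ with $\lambda_1^1 < \lambda_2^1$ and $\lambda_1^2 < \lambda_2^2$, we have

$$
A_{\lambda_1}^{(2,2)} \supseteq A_{\lambda_2}^{(1,1)}, \qquad A_{\lambda_1}^{(3,3)} \subseteq A_{\lambda_2}^{(4,4)}.
$$

Property 4. For $\lambda = [\lambda_1, \lambda_2]$, we have

$$
\left(A_\lambda^{(1,1)}\right)^c = A_\lambda^{(4,4)}, \quad
\left(A_\lambda^{(1,2)}\right)^c = A_\lambda^{(4,3)}, \quad
\left(A_\lambda^{(2,1)}\right)^c = A_\lambda^{(3,4)}, \quad
\left(A_\lambda^{(2,2)}\right)^c = A_\lambda^{(3,3)}.
$$

Definition 2. For $[\lambda_1, \lambda_2] \in [I]$ and $A \in$ IVFSs, we define $[\lambda_1, \lambda_2] \cdot A$, $[\lambda_1, \lambda_2] * A \in$ IVFSs, whose membership functions are as follows:

$$
\begin{aligned}
([\lambda_1, \lambda_2] \cdot A)(x) &= [\lambda_1 \wedge A^-(x),\ \lambda_2 \wedge A^+(x)] \\
([\lambda_1, \lambda_2] * A)(x) &= [\lambda_1 \vee A^-(x),\ \lambda_2 \vee A^+(x)]
\end{aligned}
$$

Property 5. For $A, B \in$ IVFSs and $\lambda, \lambda_1, \lambda_2 \in [I]$, we have:
(1) $\lambda_1 \le \lambda_2 \Rightarrow \lambda_1 \cdot A \subseteq \lambda_2 \cdot A$, $\lambda_1 * A \subseteq \lambda_2 * A$
(2) $A \subseteq B \Rightarrow \lambda \cdot A \subseteq \lambda \cdot B$, $\lambda * A \subseteq \lambda * B$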
3 Decomposition Theorem
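In this section, we give three decomposition theorems of interval-valued fuzzy set and some properties of cut set of interval-valued fuzzy set and mapping H.
Theorem 1.

$$
A = \bigcup_{[\lambda_1,\lambda_2] \in [I]} [\lambda_1, \lambda_2] \cdot A_{[\lambda_1,\lambda_2]}^{(1,j)}, \quad j = 1, 2.
$$

Theorem 2.

$$
A = \left( \bigcap_{\lambda \in [I]} \lambda * A_{\lambda^c}^{(4,i)} \right)^c, \quad i = 3, 4.
$$

Suppose $H$ is a mapping from $[I]$ to $P(X)$, $H : [I] \to P(X)$; for every $\lambda = [\lambda_1, \lambda_2] \in [I]$, we have $H(\lambda) \in P(X)$. Obviously, every cut set of an interval-valued fuzzy set satisfies $A_\lambda \in P(X)$, which means that such a mapping $H$ indeed exists. Based on Theorem 1 and Theorem 2, we have the following theorem in general.
Theorem 3. For $A \in$ IVFSs, we have:
(1) If $A_{[\lambda_1,\lambda_2]}^{(1,2)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(1,1)}$, then

$$
A = \bigcup_{[\lambda_1,\lambda_2] \in [I]} [\lambda_1, \lambda_2] \cdot H([\lambda_1, \lambda_2]).
$$

(2) If $A_{[\lambda_1,\lambda_2]}^{(4,4)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(4,3)}$, then

$$
A = \left( \bigcap_{[\lambda_1,\lambda_2] \in [I]} [\lambda_1, \lambda_2] * H([\lambda_1, \lambda_2]^c) \right)^c.
$$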
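The decomposition theorems can be checked numerically on a finite universe. The sketch below makes several assumptions that are not part of the text: it reads Theorem 1 with j = 1 as the union of the sets [λ1, λ2]·A^(1,1), reads Theorem 2 with i = 4 as the complement of the intersection of the sets λ∗A^(4,4) taken at the complemented level λ^c = [1 − λ2, 1 − λ1], treats each crisp cut set as a set with membership [1,1] inside and [0,0] outside, and replaces [I] by a finite grid of levels that contains every membership endpoint and its complement. Exact fractions avoid rounding problems; all names are hypothetical.

```python
from fractions import Fraction as Fr


def level_grid(A):
    """Finite grid of levels: 0, 1, every endpoint and its complement."""
    pts = {Fr(0), Fr(1)}
    for lo, hi in A.values():
        pts |= {lo, hi, 1 - lo, 1 - hi}
    return sorted(pts)


def rebuild_by_union(A, levels):
    """Theorem 1 with j = 1: union of [l1, l2] . A^(1,1) over the grid."""
    out = {}
    for x, (lo, hi) in A.items():
        r_lo, r_hi = Fr(0), Fr(0)
        for l1 in levels:
            for l2 in levels:
                if l1 <= l2 and lo >= l1 and hi >= l2:   # x lies in the (1,1) cut
                    r_lo, r_hi = max(r_lo, l1), max(r_hi, l2)
        out[x] = (r_lo, r_hi)
    return out


def rebuild_by_intersection(A, levels):
    """Theorem 2 with i = 4: complement of the intersection of l * A^(4,4) at l^c."""
    out = {}
    for x, (lo, hi) in A.items():
        b_lo, b_hi = Fr(1), Fr(1)
        for l1 in levels:
            for l2 in levels:
                if l1 > l2:
                    continue
                c1, c2 = 1 - l2, 1 - l1          # complemented level l^c
                in_cut = lo < c1 or hi < c2      # x lies in the (4,4) cut at l^c
                t_lo, t_hi = (Fr(1), Fr(1)) if in_cut else (l1, l2)
                b_lo, b_hi = min(b_lo, t_lo), min(b_hi, t_hi)
        out[x] = (1 - b_hi, 1 - b_lo)            # outer complement
    return out


# Hypothetical interval-valued fuzzy set with exact rational memberships
A = {"a": (Fr(1, 5), Fr(1, 2)), "b": (Fr(2, 5), Fr(2, 5)), "c": (Fr(3, 5), Fr(9, 10))}
grid = level_grid(A)
print(rebuild_by_union(A, grid) == A, rebuild_by_intersection(A, grid) == A)
```

Both reconstructions return the original set here because the grid contains exactly the levels at which the cut sets change.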
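In the following, we investigate the relation between the cut sets of an interval-valued fuzzy set and the mapping $H$. For simplicity, we denote $\lambda_1 = [\lambda_1^1, \lambda_1^2]$, $\lambda_2 = [\lambda_2^1, \lambda_2^2]$ with $\lambda_1, \lambda_2 \in [I]$, and give some properties of cut set of interval-valued fuzzy set and mapping $H$.
Property 6. If $A_{[\lambda_1,\lambda_2]}^{(2,2)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(1,1)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \supseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(1,1)} = \bigcap_{\alpha_1 < \lambda_1,\ \alpha_2 < \lambda_2} H([\alpha_1, \alpha_2])$, $\lambda_1 \ne 0$ and $\lambda_2 \ne 0$.
Property 7. If $A_{[\lambda_1,\lambda_2]}^{(2,2)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(2,1)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \supseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(2,1)} = \bigcap_{\alpha_2 < \lambda_2} H([\lambda_1, \alpha_2])$, $\lambda_2 \ne 0$.
Property 8. If $A_{[\lambda_1,\lambda_2]}^{(2,2)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(1,2)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \supseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(1,2)} = \bigcap_{\alpha_1 < \lambda_1} H([\alpha_1, \lambda_2])$, $\lambda_1 \ne 0$.
Property 9. If $A_{[\lambda_1,\lambda_2]}^{(2,1)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(1,1)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \supseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(1,1)} = \bigcap_{\alpha_1 < \lambda_1} H([\alpha_1, \lambda_2])$, $\lambda_1 \ne 0$.
Property 10. If $A_{[\lambda_1,\lambda_2]}^{(1,2)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(1,1)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \supseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(1,1)} = \bigcap_{\alpha_2 < \lambda_2} H([\lambda_1, \alpha_2])$, $\lambda_2 \ne 0$,
(c) $A_{[\lambda_1,\lambda_2]}^{(1,2)} = \bigcup_{\alpha_2 > \lambda_2} H([\lambda_1, \alpha_2])$, $\lambda_2 \ne 1$.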
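Property 11. If $A_{[\lambda_1,\lambda_2]}^{(4,4)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(3,3)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \subseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(4,4)} = \bigcup_{\alpha_1 < \lambda_1,\ \alpha_2 < \lambda_2} H([\alpha_1, \alpha_2])$, $\lambda_1 \ne 0$ and $\lambda_2 \ne 0$.
Property 12. If $A_{[\lambda_1,\lambda_2]}^{(4,4)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(4,3)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \subseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(4,4)} = \bigcup_{\alpha_2 < \lambda_2} H([\lambda_1, \alpha_2])$, $\lambda_2 \ne 0$.
Property 13. If $A_{[\lambda_1,\lambda_2]}^{(4,4)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(3,4)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \subseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(4,4)} = \bigcup_{\alpha_1 < \lambda_1} H([\alpha_1, \lambda_2])$, $\lambda_1 \ne 0$.
Property 14. If $A_{[\lambda_1,\lambda_2]}^{(3,4)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(3,3)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \subseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(3,4)} = \bigcup_{\alpha_2 < \lambda_2} H([\lambda_1, \alpha_2])$, $\lambda_2 \ne 0$.
Property 15. If $A_{[\lambda_1,\lambda_2]}^{(4,3)} \subseteq H([\lambda_1, \lambda_2]) \subseteq A_{[\lambda_1,\lambda_2]}^{(3,3)}$, then we have:
(a) $\lambda_1, \lambda_2 \in [I]$ and $\lambda_1^i < \lambda_2^i$, $i = 1, 2$ $\Rightarrow$ $H(\lambda_1) \subseteq H(\lambda_2)$,
(b) $A_{[\lambda_1,\lambda_2]}^{(4,3)} = \bigcup_{\alpha_1 < \lambda_1} H([\alpha_1, \lambda_2])$, $\lambda_1 \ne 0$,
(c) $A_{[\lambda_1,\lambda_2]}^{(3,3)} = \bigcap_{\alpha_1 > \lambda_1} H([\alpha_1, \lambda_2])$, $\lambda_1 \ne 1$.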
4 Conclusion
In this paper, we introduce the concept of cut set of interval-valued fuzzy set and discuss some properties of cut set of interval-valued fuzzy set, propose three decomposition theorems of interval-valued fuzzy set and investigate some properties of cut set of interval-valued fuzzy set and mapping H in detail. These works can be used in setting up the basic theory of interval-valued fuzzy set. The discussion of representation theorems of interval-valued fuzzy set will be studied in other papers.
References
[1] Arnould, T., Tano, S.: Interval valued fuzzy backward reasoning, IEEE Trans. Fuzzy Syst. 3(4)(1995), 425-437
[2] Biswas, R.: Rosenfeld's fuzzy subgroups with interval-valued membership functions, Fuzzy Sets and Systems 63(1994), 87-90
[3] Bustince, H., Burillo, P.: Mathematical analysis of interval-valued fuzzy relations: Application to approximate reasoning, Fuzzy Sets and Systems 113(2000), 205-219
[4] Chen, S.M., Hsiao, W.H.: Bidirectional approximate reasoning for rule-based systems using interval-valued fuzzy sets, Fuzzy Sets and Systems 113(2000), 185-203
[5] Deschrijver, G., Kerre, E.E.: On the relationship between some extensions of fuzzy set theory, Fuzzy Sets and Systems 133(2003), 227-235
[6] Gorzalczany, M.B.: A method of inference in approximate reasoning based on interval-valued fuzzy sets, Fuzzy Sets and Systems 21(1987), 1-17
[7] Li, X.P., Wang, G.J.: The SH-interval-valued fuzzy subgroup, Fuzzy Sets and Systems 112(2000), 319-325
[8] Mondal, T.K., Samanta, S.K.: Topology of interval-valued fuzzy sets, Indian J. Pure Appl. Math. 30(1999), 23-38
[9] Turksen, I.B.: Interval-valued fuzzy sets based on normal forms, Fuzzy Sets and Systems 20(1986), 191-210
[10] Yuan, B., Pan, Y., Wu, W.M.: On normal form based on interval-valued fuzzy sets and their applications to approximate reasoning, Internat. J. General Systems 23(1995), 241-254
[11] Zadeh, L.A.: Fuzzy sets, Infor. and Contr. 8(1965), 338-353
[12] Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning, Part 1, Infor. Sci. 8(1975), 199-249
[13] Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning, Part 2, Infor. Sci. 8(1975), 301-357
[14] Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning, Part 3, Infor. Sci. 9(1975), 43-80
Knowledge Structuring and Evaluation Based on Grey Theory
Chen Huang and Yushun Fan
CIMS ERC, Department of Automation, Tsinghua University, Beijing 100084, PR China
[email protected] [email protected]
Abstract. It is important nowadays to provide guidance for individuals or organizations to improve their knowledge according to their objectives, especially in the case of incomplete cognition. Based on grey system theory, a knowledge architecture which consists of grey elements including knowledge fields and knowledge units is built. The method to calculate the weightiness of each knowledge unit, with regard to the user's objectives, is detailed. The knowledge possessed by the user is also evaluated with grey clustering method by whitenization weight function.
1 Introduction
Knowledge is a very important factor for knowledge workers and knowledge-intensive organizations. To improve one's knowledge more efficiently, guidance is required that points out which knowledge is most important with respect to the user's objectives and evaluates the knowledge currently possessed. However, the existing knowledge management technology [1, 2] cannot resolve this problem, since the cognition and evaluation of knowledge are indefinite and difficult. To resolve this problem, this paper introduces grey theory [3, 4] into knowledge management. Grey theory was founded by Prof. Julong Deng in 1982 and attracted intense attention because of its original thought and broad applicability. Grey means that the information is incomplete. The theory is intended to use extremely limited known information to forecast unknown information. So far, it has developed a set of technologies including system modeling, analysis, evaluation, optimization, forecasting and decision-making, and has been applied in many fields such as agriculture, environment and mechanical engineering. For knowledge workers and knowledge-intensive organizations, the objectives are usually indefinite and changing, while the cognition of their own knowledge is incomplete. Though the cognition will be continually improved along with the accumulation of knowledge, the characteristic of greyness will always exist. Therefore, it is more suitable to use grey system theory, instead of other traditional theories and methods, to model and evaluate one's knowledge. In this paper, the knowledge architecture based on grey theory is given first. Then a method is provided to calculate the weightiness of each knowledge unit in the architecture with regard to the user's objectives. Finally, it is shown how to evaluate the knowledge with the grey clustering method.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 26–30, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Knowledge Architecture Based on Grey Theory Since the pursuing of knowledge should be objective-driven, in this knowledge architecture, we should first define the system objective, which could change with time. The knowledge architecture mainly consists of knowledge fields (KF) and knowledge units (KU), both of which are grey elements. A knowledge field is the field that the knowledge belongs to. Each knowledge field could be divided into many sub-fields. The knowledge fields that can not be divided any more are called knowledge units. The knowledge architecture defined above is shown as Fig. 1. Here KF refers to knowledge field, while KU refers to knowledge unit.
Fig. 1. Knowledge architecture
Because of the incomplete cognition, each element in this knowledge architecture is a grey element, which means the element has not been cognized completely. We use grey degree to represent the extent of grey for grey elements. Grey degree describes the grey extent of a knowledge unit. The value range is (0,1] . 0 represents completely known, while 1 represents totally unknown. According to grey theory, the knowledge units will never turn 'white', in other words, the grey degree will never reach 0. But users can define that when the grey degree is less than a certain value (such as 0.1), the knowledge unit can be regarded as white approximately. What's more, since the grey degree is very hard to be measured precisely, we can define several grey clusters in (0,1] , thus the grey degree could be estimated by judging which grey cluster the knowledge unit belongs to.
3 The Weightiness of Knowledge Units
The weightiness of a knowledge unit is a quantitative parameter that indicates the importance of the knowledge unit with regard to the user's objective. The basic principle used to determine the weightiness of a knowledge unit is the analytic hierarchy process. The process is detailed as follows.
Firstly, construct the judgement matrices, including the Objective-KF/KU matrix and the KF-KF/KU matrices. Each node in the architecture can have a judgement matrix with its sub-nodes. Suppose node A has n sub-nodes. Then the judgement matrix is $\Delta = \{\delta_{ij}\}$, $i, j = 1, \ldots, n$, which means it is of order n. Compare every two sub-nodes i and j. If they are equally important, $\delta_{ij} = 1$. If i is more important than j, $\delta_{ij} = 5$. If i is extremely more important than j, $\delta_{ij} = 9$.
Secondly, calculate the weightiness within every single layer from the judgement matrices. In other words, for a node that has sub-nodes, calculate the weightiness of its sub-nodes. It is obtained by calculating the largest eigenvalue $\lambda_{\max}$ and the corresponding eigenvector $W$ of the judgement matrix $\Delta$, where

$$
\Delta W = \lambda_{\max} W
\tag{1}
$$
Finally, we calculate the overall weightiness from the weightiness of every single layer. Suppose layer 1 has m elements $A_1, \ldots, A_m$, and the weightiness of $A_i$ to layer 0 is $a_0^i$, $i = 1, 2, \ldots, m$. Layer 2 has k elements $B_1, \ldots, B_k$, and the weightiness of $B_j$ to $A_i$ is $b_i^j$, $j = 1, 2, \ldots, k$. If $B_j$ is independent of $A_i$, we have $b_i^j = 0$. Then the weightiness of $B_j$ to layer 0 is

$$
w_0^j = \sum_{i=1}^{m} a_0^i b_i^j, \quad j = 1, 2, \ldots, k
\tag{2}
$$

Actually, since $B_j$ has only one father node, say $A_l$, the weightiness of $B_j$ to layer 0 is

$$
w_0^j = a_0^l b_l^j
\tag{3}
$$
If there are several objectives, the weightiness of each knowledge unit to each objective can be calculated in the similar way.
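The single-layer and overall weightiness calculations can be sketched as below. The sketch uses plain power iteration to approximate the largest eigenvalue and its eigenvector, which is one common way to solve Eq. (1) and is not necessarily the authors' choice. It also assumes that the judgement matrix is reciprocal (δji = 1/δij), which the text does not state explicitly; the function names and the sample matrix are hypothetical.

```python
def ahp_weights(delta, iterations=100):
    """Approximate the principal eigenvector (normalised to sum 1) and lambda_max."""
    n = len(delta)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(delta[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)                 # w sums to 1, so sum(delta w) estimates lambda_max
        w = [vi / lam for vi in v]
    return w, lam


def overall_weights(a0, b):
    """Eq. (3): each leaf under parent l gets a0[l] * b[l][j]."""
    return [a0[l] * b_lj for l, children in enumerate(b) for b_lj in children]


# Hypothetical judgement matrix for three sub-nodes on the 1 / 5 / 9 scale
delta = [[1, 5, 9],
         [1 / 5, 1, 5],
         [1 / 9, 1 / 5, 1]]
weights, lam_max = ahp_weights(delta)
print(weights, lam_max)

# Two knowledge fields with weights 0.6 and 0.4; the first has two units, the second one
print(overall_weights([0.6, 0.4], [[0.5, 0.5], [1.0]]))
```

Normalising the eigenvector so that its entries sum to 1 makes the weights of the leaves of the whole architecture sum to 1 as well.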
4 Knowledge Evaluation with Grey Clustering Method
Since knowledge is very hard to evaluate, we use the grey clustering method with whitenization weight functions to classify knowledge units according to how well they are mastered.
Definition: Suppose there are n objects to be clustered, m clustering criteria and s different grey clusters. Classifying object i (i = 1, 2, …, n) into grey cluster k ($k \in \{1, 2, \ldots, s\}$) according to the samples $x_{ij}$ (i = 1, 2, …, n; j = 1, 2, …, m) of object i regarding criterion j is called grey clustering [3].
In the knowledge architecture, each knowledge unit is an object to be clustered. Clustering criteria are observation criteria used to judge how well the knowledge unit is mastered. Grey clusters refer to the grey classes defined by the extent of knowledge mastery, such as 'bad', 'medium' and 'excellent'.
In grey theory, whitenization weight function is frequently used to describe the preference extent when a grey element takes different value in its value field. Frequently used whitenization weight functions are shown in Fig. 2.
Fig. 2. Whitenization weight function
The whitenization weight function is denoted $f_j^k(\cdot)$. If $f_j^k(\cdot)$ is as shown in Fig. 2(a), Fig. 2(b), Fig. 2(c) or Fig. 2(d), then it is written as $f_j^k[x_j^k(1), x_j^k(2), x_j^k(3), x_j^k(4)]$, $f_j^k[-, -, x_j^k(3), x_j^k(4)]$, $f_j^k[x_j^k(1), x_j^k(2), -, x_j^k(4)]$ or $f_j^k[x_j^k(1), x_j^k(2), -, -]$, respectively. In the knowledge architecture, generally speaking, the whitenization weight function of the 'bad' grey cluster should be like $f_j^k[-, -, x_j^k(3), x_j^k(4)]$; that of the 'medium' grey cluster should be like $f_j^k[x_j^k(1), x_j^k(2), -, x_j^k(4)]$; and that of the 'excellent' grey cluster should be like $f_j^k[x_j^k(1), x_j^k(2), -, -]$.
Since the significance and the dimension of the criteria differ greatly from one another, we adopt fixed weightiness for clustering. We call $\eta_j$ the clustering weightiness of criterion j. The steps of grey fixed-weightiness clustering are:
(1) Give the whitenization weight function $f_j^k(\cdot)$ (j = 1, 2, …, m; k = 1, 2, …, s) of sub-cluster k of criterion j.
(2) Give the clustering weightiness $\eta_j$ (j = 1, 2, …, m) of each criterion by qualitative analysis or by the method given in Section 3.
(3) Given $f_j^k(\cdot)$, $\eta_j$ and the samples $x_{ij}$ (i = 1, 2, …, n; j = 1, 2, …, m) of object i regarding criterion j, calculate the grey fixed-weightiness clustering coefficients

$$
\sigma_i^k = \sum_{j=1}^{m} f_j^k(x_{ij}) \cdot \eta_j, \quad i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, s.
$$

(4) If $\sigma_i^{k^*} = \max_{1 \le k \le s} \{\sigma_i^k\}$, we conclude that object i belongs to grey cluster $k^*$.
For example, suppose there are 4 knowledge units a, b, c, d. With the criterion set {number of pieces of literature read; number of published papers; number of lectures given}, we classify the four units into three clusters: ‘excellent’, ‘medium’ and ‘bad’. The samplings are shown in Table 1.
Table 1. Samplings of each knowledge unit regarding the criteria

Knowledge unit   Literature read   Published papers   Lectures given
KU a             5                 1                  0
KU b             44                4                  3
KU c             20                3                  1
KU d             25                5                  2
Firstly, give the whitenization weight functions: $f_1^1[0,40,-,-]$, $f_1^2[0,20,-,40]$, $f_1^3[-,-,5,10]$, $f_2^1[0,10,-,-]$, $f_2^2[0,5,-,10]$, $f_2^3[-,-,4,8]$, $f_3^1[0,4,-,-]$, $f_3^2[0,2,-,4]$, $f_3^3[-,-,1,2]$. Assume the clustering weightiness of the criteria is $\eta_1 = 0.2$, $\eta_2 = 0.5$, $\eta_3 = 0.3$. Then we obtain:
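$\sigma_1 = (\sigma_1^1, \sigma_1^2, \sigma_1^{3*}) = (0.075, 0.15, 1.0)$,
$\sigma_2 = (\sigma_2^{1*}, \sigma_2^2, \sigma_2^3) = (0.625, 0.55, 0.5)$,
$\sigma_3 = (\sigma_3^1, \sigma_3^2, \sigma_3^{3*}) = (0.385, 0.65, 0.8)$,
$\sigma_4 = (\sigma_4^1, \sigma_4^{2*}, \sigma_4^3) = (0.525, 0.95, 0.375)$,
where the indices 1 to 4 correspond to the knowledge units a to d and the superscripts 1, 2, 3 to the clusters ‘excellent’, ‘medium’ and ‘bad’. The results indicate that knowledge unit b belongs to the ‘excellent’ grey cluster, d belongs to the ‘medium’ grey cluster, and a and c belong to the ‘bad’ grey cluster. Combining the evaluation result with the weightiness of the knowledge units, the foremost learning and development directions can be identified.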
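The worked example above can be re-computed with a short script. The following is only a sketch: the piecewise-linear shape assumed for each whitenization weight function is an interpretation of Fig. 2 (which is not reproduced here), and the names `whiten`, `coefficients` and `classify` are hypothetical. Under these assumptions the script reproduces all printed quotieties except $\sigma_3^1$, for which it yields 0.325 where the text prints 0.385; the resulting classification is unaffected.

```python
def whiten(spec, x):
    """Value at x of a whitenization weight function f[x1, x2, x3, x4].

    None stands for the '-' placeholder. Assumed piecewise-linear shape:
    rise from 0 at x1 to 1 at x2, stay at 1 up to x3 (or x2 if x3 is
    missing), then fall to 0 at x4. A missing side means no rise or fall.
    """
    x1, x2, x3, x4 = spec
    if x1 is not None and x < x2:            # rising branch
        return max(0.0, (x - x1) / (x2 - x1))
    peak_end = x3 if x3 is not None else x2
    if x4 is not None and x > peak_end:      # falling branch
        return max(0.0, (x4 - x) / (x4 - peak_end))
    return 1.0


CLUSTERS = ("excellent", "medium", "bad")     # k = 1, 2, 3

# FUNCS[j][k] is the function of cluster k for criterion j (0-based).
FUNCS = [
    [(0, 40, None, None), (0, 20, None, 40), (None, None, 5, 10)],
    [(0, 10, None, None), (0, 5, None, 10), (None, None, 4, 8)],
    [(0, 4, None, None), (0, 2, None, 4), (None, None, 1, 2)],
]
ETA = (0.2, 0.5, 0.3)                         # clustering weightiness

SAMPLES = {"a": (5, 1, 0), "b": (44, 4, 3), "c": (20, 3, 1), "d": (25, 5, 2)}


def coefficients(sample):
    """sigma_i^k = sum_j f_j^k(x_ij) * eta_j for k = 1..s."""
    return [
        sum(ETA[j] * whiten(FUNCS[j][k], sample[j]) for j in range(len(ETA)))
        for k in range(len(CLUSTERS))
    ]


def classify(sample):
    """Step (4): pick the cluster with the largest quotiety."""
    sigma = coefficients(sample)
    return CLUSTERS[sigma.index(max(sigma))]


for unit, sample in SAMPLES.items():
    print(unit, [round(v, 3) for v in coefficients(sample)], classify(sample))
```

Changing `ETA` or the entries of `FUNCS` is enough to explore how the classification reacts to other weightiness choices.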
6 Conclusion

This paper presents a knowledge architecture based on grey theory. It provides a method to calculate the weightiness of knowledge units and to evaluate them. The method not only helps users establish their knowledge architecture, but also tells them which knowledge is most important and which learning directions come first. It also offers existing knowledge management technology a new idea, one that is human-oriented, since the user's objective and the expansibility of cognition are taken into account. Since the method provided in this paper is only objective-oriented, process-based knowledge modeling and evaluation will be studied in our future work.
References
1. Schreiber, G., Akkermans, H., Anjewierden, A., deHoog, R., Shadbolt, N., VandeVelde, W., Wielinga, B.: Knowledge engineering and management: the CommonKADS methodology. MIT Press, Cambridge Massachusetts London England (2000)
2. Mertins, K., Heisig, P., Vorbeck, J.: Knowledge management concepts and best practices. Tsinghua University Press, Beijing (2004)
3. Liu, S., Guo, T., Dang, Y.: Theory and applications of grey systems. Science Press, Beijing (1999)
4. Deng, J.: The tutorial of grey system theory. Huazhong Science and Technology University Press, Wuhan (1990)
5. Xia, S., Yang, J., Yang, Z.: The introduction of system engineering. Tsinghua University Press, Beijing (1995)
A Propositional Calculus Formal Deductive System LU of Universal Logic and Its Completeness

Minxia Luo 1,2 and Huacan He 1

1 School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, P.R. China
[email protected], [email protected]
2 Department of Mathematics, Yuncheng University, Yuncheng, 044000, P.R. China
Abstract. Universal logic has given 0-level universal conjunction operation, universal disjunction operation and the universal implication operation. We introduce a new kind of algebra system UBL algebra based on these operations. A general propositional calculus formal deductive system LU of universal logic based on UBL algebras is built up, and its completeness is proved.
1 Introduction

In the past several years, fuzzy logic has been applied in the field of fuzzy control. However, fuzzy logic theory is not yet complete. Fuzzy logic has been doubted and criticized (see [1]) because the methods of fuzzy reasoning lack a strict logical foundation. Residuated fuzzy logic calculi are related to continuous t-norms, which are used as truth functions for the conjunction connective, and their residua as truth functions for the implication. The main examples are Łukasiewicz, Gödel and product logics, related to the Łukasiewicz t-norm (x∗y = max(0, x + y − 1)), the Gödel t-norm (x∗y = min(x, y)) and the product t-norm (x∗y = xy) respectively. Rose and Rosser [2] proved completeness results for Łukasiewicz logic and Dummett [3] for Gödel logic, and Hájek, Godo and Esteva [4] axiomatized product logic. More recently, Hájek [5] proposed the axiomatic system BL corresponding to a generic continuous t-norm. A kind of fuzzy propositional logic was proposed by Professor Guojun Wang [6] in 1997, and its completeness was proved in 2002 (see [8]). A new kind of logical system based on strong regular residuated lattices was proposed by Professor Daowu Pei in 2002, and its completeness has been proved (see [9]).

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 31–41, 2005. © Springer-Verlag Berlin Heidelberg 2005

With the development of computer science and modern logic, new requirements have been raised for classical logic. Non-classical logic and modern logic are developing rapidly. The universal logic principle was proposed by Professor Huacan He in 2001 (see [10]). A kind of flexible relation between fuzzy propositions has been found in the study of artificial intelligence theory. The flexible relation is described by the model of continuous universal logic operations. The main factors influencing the flexible relation are generalized correlation and generalized autocorrelation. Some 0-level binary universal operations are defined as follows (see [10]):
$T(x, y, h) = \mathrm{ite}\{0 \mid x = 0 \text{ or } y = 0;\ (\max(0, x^m + y^m - 1))^{1/m}\}$
$S(x, y, h) = \mathrm{ite}\{1 \mid x = 1 \text{ or } y = 1;\ 1 - (\max(0, (1 - x)^m + (1 - y)^m - 1))^{1/m}\}$
$I(x, y, h) = \mathrm{ite}\{1 \mid x \le y;\ 0 \mid m \le 0 \text{ and } y = 0;\ (1 - x^m + y^m)^{1/m}\}$
where $m = (3 - 4h)/(4h(1 - h))$, $h \in [0, 1]$. Here $\mathrm{ite}\{\beta \mid \alpha;\ \gamma\}$ is a conditional expression: if α is true its value is β; otherwise its value is γ.

The universal logic formal deduction system B in the ideal condition (h≡0.5) was established by the authors [11], and its completeness has been proved [12]. In this paper, we introduce a kind of algebra, the UBL algebra, and discuss its properties. A general propositional calculus formal system LU based on UBL algebras is built up, and its completeness is proved. Moreover, the formal deduction system B is a schematic extension of the general universal logic system LU.
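The 0-level operations T, S and I defined in this section can be evaluated numerically. The sketch below is a minimal illustration rather than the authors' implementation; the function names are hypothetical, and for simplicity it rejects h = 0, h = 1 (where the formula for m divides by zero) and h = 0.75 (where m = 0 and the exponent 1/m is undefined).

```python
def exponent(h):
    """m = (3 - 4h) / (4h(1 - h)) for 0 < h < 1, h != 0.75."""
    if not 0.0 < h < 1.0:
        raise ValueError("this sketch only handles 0 < h < 1")
    m = (3.0 - 4.0 * h) / (4.0 * h * (1.0 - h))
    if m == 0.0:
        raise ValueError("h = 0.75 gives m = 0, which this sketch excludes")
    return m


def t_norm(x, y, h):
    """0-level universal conjunction T(x, y, h)."""
    if x == 0 or y == 0:
        return 0.0
    m = exponent(h)
    return max(0.0, x ** m + y ** m - 1.0) ** (1.0 / m)


def s_norm(x, y, h):
    """0-level universal disjunction S(x, y, h)."""
    if x == 1 or y == 1:
        return 1.0
    m = exponent(h)
    return 1.0 - max(0.0, (1.0 - x) ** m + (1.0 - y) ** m - 1.0) ** (1.0 / m)


def implication(x, y, h):
    """0-level universal implication I(x, y, h)."""
    if x <= y:
        return 1.0
    m = exponent(h)
    if m <= 0 and y == 0:
        return 0.0
    return (1.0 - x ** m + y ** m) ** (1.0 / m)


# At h = 0.5 the exponent is m = 1 and the operations reduce to the
# Lukasiewicz connectives.
print(t_norm(0.7, 0.6, 0.5), s_norm(0.2, 0.3, 0.5), implication(0.7, 0.6, 0.5))
```

For h = 0.5 the three printed values agree, up to floating-point rounding, with max(0, x + y − 1), min(1, x + y) and min(1, 1 − x + y).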
2 Main Properties of UBL Algebras

Some properties of the model of 0-level binary universal propositional connectives have been studied in [10]. It has been proved in [13] that the model of 0-level universal conjunction is a nilpotent t-norm for h∈(0, 0.75) and a strict t-norm for h∈(0.75, 1). The model of 0-level universal implication is the residuum of the model of 0-level universal conjunction, i.e., the two models form an adjoint pair (see [13]). In this section, we introduce the UBL algebra through the models of 0-level universal conjunction, universal disjunction and universal implication. Its main properties are studied as follows.

Definition 1. Let L be a partially ordered set. A UBL algebra is an algebra (L, ⊗, ⊕, ⇒, 0, 1) with three binary operations and two constants such that
(1) (L, ⊗, 1) is a commutative semigroup with the unit element 1, i.e. ⊗ is commutative, associative, and 1⊗x = x for all x;
(2) (L, ⊕, 0) is a commutative semigroup with the unit element 0;
(3) ⊗ and ⇒ form an adjoint pair, i.e. x⊗y≤z if and only if x≤y⇒z;
(4) (x⇒y)⇒((x⊕z)⇒(y⊕z)) = 1;
(5) x⇒y = 1 or y⇒x = 1.

Example 1. (1) Every MV-algebra is a UBL algebra.
(2) For all x, y∈[0, 1], we define the following operations:
$x \otimes y = \mathrm{ite}\{0 \mid x = 0 \text{ or } y = 0;\ (\max(0, x^m + y^m - 1))^{1/m}\}$
$x \oplus y = \mathrm{ite}\{1 \mid x = 1 \text{ or } y = 1;\ 1 - (\max(0, (1 - x)^m + (1 - y)^m - 1))^{1/m}\}$
$x \Rightarrow y = \mathrm{ite}\{1 \mid x \le y;\ 0 \mid m \le 0 \text{ and } y = 0;\ (1 - x^m + y^m)^{1/m}\}$
where m∈R. Then ([0, 1], ⊗, ⊕, ⇒, 0, 1) is a UBL algebra, called the UBL unit interval.

Proposition 1. Let L be a UBL algebra. For all a, b, c∈L, the following properties hold:
(1) a⊗b = b⊗a, a⊕b = b⊕a
(2) (a⊗b)⊗c = a⊗(b⊗c), (a⊕b)⊕c = a⊕(b⊕c)
(3) 1⊗a = a, 0⊕a = a
(4) a⊗b≤c if and only if a≤b⇒c
(5) (a⊗b)⇒c = a⇒(b⇒c)
(6) a≤b if and only if a⇒b = 1
(7) 1⇒a = a
(8) a⇒(b⇒a) = 1
(9) a⇒(b⇒c) = b⇒(a⇒c)
(10) a≤b⇒c if and only if b≤a⇒c
(11) a⇒(b⇒a⊗b) = 1
(12) ((a⇒b)⊗a)⇒b = 1
(13) b⇒c≤(a⇒b)⇒(a⇒c)
(14) a⇒b≤(b⇒c)⇒(a⇒c)
(15) a⇒b≤(a⊗c)⇒(b⊗c)
(16) a⇒b≤(a⊕c)⇒(b⊕c)
(17) $a^n \le a^m$ for $m \le n$, where $a^{k+1} = a^k \otimes a$.
The proof is omitted.
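As a sanity check on Definition 1 and Proposition 1, the conditions can be tested exhaustively on a finite grid for the Łukasiewicz operations on [0, 1], i.e. the MV-algebra case of Example 1(1), which corresponds to m = 1 in Example 1(2). The sketch below uses exact rational arithmetic to avoid rounding effects; the function names and the grid are illustrative choices, and passing the check on a grid is evidence, not a proof.

```python
from fractions import Fraction
from itertools import product


def conj(x, y):
    """x (x) y = max(0, x + y - 1)."""
    return max(Fraction(0), x + y - 1)


def disj(x, y):
    """x (+) y = min(1, x + y)."""
    return min(Fraction(1), x + y)


def impl(x, y):
    """x => y = min(1, 1 - x + y)."""
    return min(Fraction(1), 1 - x + y)


GRID = [Fraction(k, 8) for k in range(9)]


def check_ubl_conditions():
    """Conditions (1)-(5) of Definition 1 on every grid triple."""
    for x, y, z in product(GRID, repeat=3):
        ok = (
            conj(x, y) == conj(y, x)
            and conj(conj(x, y), z) == conj(x, conj(y, z))
            and conj(1, x) == x
            and disj(x, y) == disj(y, x)
            and disj(disj(x, y), z) == disj(x, disj(y, z))
            and disj(0, x) == x
            and (conj(x, y) <= z) == (x <= impl(y, z))       # adjoint pair
            and impl(impl(x, y), impl(disj(x, z), disj(y, z))) == 1
            and (impl(x, y) == 1 or impl(y, x) == 1)
        )
        if not ok:
            return False
    return True


def check_proposition_1_samples():
    """Items (5), (12), (13) and (16) of Proposition 1 on the grid."""
    for a, b, c in product(GRID, repeat=3):
        ok = (
            impl(conj(a, b), c) == impl(a, impl(b, c))
            and impl(conj(impl(a, b), a), b) == 1
            and impl(b, c) <= impl(impl(a, b), impl(a, c))
            and impl(a, b) <= impl(disj(a, c), disj(b, c))
        )
        if not ok:
            return False
    return True


print(check_ubl_conditions(), check_proposition_1_samples())
```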
3 A Propositional Calculus Formal Deductive System LU

The 0-level universal conjunction, universal disjunction and universal implication propositional connectives are ∧_h, ∨_h and →_h, written simply as ∧, ∨ and → respectively.

Definition 2. For the 0-level model of universal conjunction, universal disjunction and universal implication, a propositional calculus system LU is defined as follows: the set F of well-formed formulas (wfs) of LU is defined as usual from a countable set of propositional variables P1, P2, · · ·, the three connectives ∧, ∨, → and the truth constant 0̄ for 0. Further definable connectives are:
¬A : A→0̄
A≡B : (A→B)∧(B→A)

Definition 3. The following formulas are axioms of the universal logic system LU:
(U1) (A→B)→((B→C)→(A→C))
(U2) A∧B→A
(U3) A∧B→B∧A
(U4) A→A∨B
(U5) A∨B→B∨A
(U6) (A∧(A→B))→(B∧(B→A))
(U7) (A→(B→C))→((A∧B)→C)
(U8) ((A∧B)→C)→(A→(B→C))
(U9) ((A→B)→C)→(((B→A)→C)→C)
(U10) 0̄→A
(U11) (A∨B)∨C→A∨(B∨C)
(U12) (A→B)→(A∨C→B∨C)
The deduction rule is modus ponens (MP): from A and A→B infer B.

In a natural manner, we can introduce concepts such as proof, theorem, deduction from a formula set Γ, and Γ-consequence in the system LU. A theory of LU is a set of wfs. Γ ⊢ A denotes that A is provable in the theory Γ. ⊢ A denotes that A is a theorem of the system LU. We denote
Thm(LU) = {A∈F | ⊢ A}
Ded(Γ) = {A∈F | Γ ⊢ A}

Proposition 2. The hypothetical syllogism (HS rule for short) holds, i.e. if Γ = {A→B, B→C}, then Γ ⊢ A→C.
Proof. Assume Γ = {A→B, B→C}; then
1° A→B (Γ)
2° (A→B)→((B→C)→(A→C)) (U1)
3° (B→C)→(A→C) (1°, 2°, MP)
4° B→C (Γ)
5° A→C (3°, 4°, MP)
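The derivation in the proof of Proposition 2 can be checked mechanically. The sketch below is a deliberately small, hypothetical checker: formulas are nested tuples, axioms are supplied as a set of already instantiated formulas (no schema matching is attempted), and the only rule is modus ponens. All names in it are illustrative.

```python
def imp(a, b):
    """Build the formula a -> b as a nested tuple."""
    return ("->", a, b)


def u1(a, b, c):
    """Instance of axiom (U1): (A->B)->((B->C)->(A->C))."""
    return imp(imp(a, b), imp(imp(b, c), imp(a, c)))


def check_proof(steps, hypotheses, axioms):
    """Check a Hilbert-style derivation.

    steps is a list of (formula, reason). reason is "hyp", "axiom", or
    ("MP", i, j), where step i is the premise X and step j is X -> formula
    (1-based indices of earlier steps).
    """
    for n, (formula, reason) in enumerate(steps, start=1):
        if reason == "hyp":
            ok = formula in hypotheses
        elif reason == "axiom":
            ok = formula in axioms
        else:
            _, i, j = reason
            ok = (
                1 <= i < n
                and 1 <= j < n
                and steps[j - 1][0] == imp(steps[i - 1][0], formula)
            )
        if not ok:
            return False
    return True


A, B, C = "A", "B", "C"
gamma = {imp(A, B), imp(B, C)}
axioms = {u1(A, B, C)}

# The five steps of the proof of the HS rule.
hs_proof = [
    (imp(A, B), "hyp"),
    (u1(A, B, C), "axiom"),
    (imp(imp(B, C), imp(A, C)), ("MP", 1, 2)),
    (imp(B, C), "hyp"),
    (imp(A, C), ("MP", 4, 3)),
]

print(check_proof(hs_proof, gamma, axioms))
```

A fuller checker would match formulas against the axiom schemes (U1) to (U12) instead of a fixed set of instances; that is omitted here to keep the sketch short.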
Theorem 1. The following formulae are theorems of the formal system LU:
(T1) (A→(B→C))→(B→(A→C))
(T2) (B→C)→((A→B)→(A→C))
(T3) A→(B→A)
(T4) A→A
(T5) A→(B→(A∧B))
(T6) (A∧B)→(A→B)
(T7) (A→B∧C)→(A∧B→C)
(T8) (A∨B→C)→(A→B∨C)
(T9) ((A→C)∧(B→C))→(A∧B→C)
(T10) ((A→B)∧(A→C))→(A→B∨C)
(T11) (A∧(A→B))→B
(T12) (A→B)→(A∧C→B∧C)
(T13) A∧(B∧C)→(A∧B)∧C
(T14) (A∧B)∧C→A∧(B∧C)
Proof.
(T1)
1° B∧A→A∧B (U3)
2° (B∧A→A∧B)→((A∧B→C)→(B∧A→C)) (U1)
3° (A∧B→C)→(B∧A→C) (1°, 2°, MP)
4° (B∧A→C)→(B→(A→C)) (U8)
5° (A∧B→C)→(B→(A→C)) (3°, 4°, HS)
6° (A→(B→C))→(A∧B→C) (U7)
7° (A→(B→C))→(B→(A→C)) (5°, 6°, HS)

(T2)
1° (A→B)→((B→C)→(A→C)) (U1)
2° ((A→B)→((B→C)→(A→C)))→((B→C)→((A→B)→(A→C))) (T1)
3° (B→C)→((A→B)→(A→C)) (1°, 2°, MP)

(T3)
1° A∧B→A (U2)
2° (A∧B→A)→(A→(B→A)) (U8)
3° A→(B→A) (1°, 2°, MP)

(T4)
1° A→(B→A) (T3)
2° (A→(B→A))→(B→(A→A)) (T1)
3° B→(A→A) (1°, 2°, MP)
4° B (let B be any axiom)
5° A→A (3°, 4°, MP)

(T5)
1° B∧A→A∧B (U3)
2° (B∧A→A∧B)→(B→(A→A∧B)) (U8)
3° B→(A→A∧B) (1°, 2°, MP)
4° (B→(A→A∧B))→(A→(B→A∧B)) (T1)
5° A→(B→A∧B) (3°, 4°, MP)

(T6)
1° A∧B→B∧A (U3)
2° B∧A→B (U2)
3° A∧B→B (1°, 2°, HS)
4° B→(A→B) (T3)
5° A∧B→(A→B) (3°, 4°, HS)

(T7)
1° B∧C→(B→C) (T6)
2° (B∧C→(B→C))→((A→B∧C)→(A→(B→C))) (T2)
3° (A→B∧C)→(A→(B→C)) (1°, 2°, MP)
4° (A→(B→C))→(A∧B→C) (U7)
5° (A→B∧C)→(A∧B→C) (3°, 4°, HS)
(T8)
1° A→A∨B (U4)
2° (A→A∨B)→((A∨B→C)→(A→C)) (U1)
3° (A∨B→C)→(A→C) (1°, 2°, MP)
4° C→C∨B (U4)
5° C∨B→B∨C (U5)
6° C→B∨C (4°, 5°, HS)
7° (C→B∨C)→((A→C)→(A→B∨C)) (T2)
8° (A→C)→(A→B∨C) (6°, 7°, MP)
9° (A∨B→C)→(A→B∨C) (3°, 8°, HS)

(T9)
1° (A→C)∧(B→C)→(A→C) (U2)
2° A∧B→A (U2)
3° (A∧B→A)→((A→C)→(A∧B→C)) (U1)
4° (A→C)→(A∧B→C) (2°, 3°, MP)
5° (A→C)∧(B→C)→(A∧B→C) (1°, 4°, HS)

(T10)
1° (A→B)∧(A→C)→(A→C)∧(A→B) (U3)
2° (A→C)∧(A→B)→(A→C) (U2)
3° (A→B)∧(A→C)→(A→C) (1°, 2°, HS)
4° C→B∨C (U4)
5° (C→B∨C)→((A→C)→(A→B∨C)) (T2)
6° (A→C)→(A→B∨C) (4°, 5°, MP)
7° ((A→B)∧(A→C))→(A→B∨C) (3°, 6°, HS)

(T11)
1° (A→B)→(A→B) (T4)
2° ((A→B)→(A→B))→((A→B)∧A→B) (U7)
3° (A→B)∧A→B (1°, 2°, MP)
4° A∧(A→B)→(A→B)∧A (U3)
5° ((A→B)∧A→B)→(A∧(A→B)→B) (U1)
6° (A∧(A→B))→B (4°, 5°, MP)

(T12)
1° (A∧(A→B))→B (T11)
2° B→(C→B∧C) (T5)
3° (A∧(A→B))→(C→B∧C) (1°, 2°, HS)
4° ((A∧(A→B))→(C→B∧C))→(A→((A→B)→(C→B∧C))) (U8)
5° A→((A→B)→(C→B∧C)) (3°, 4°, MP)
6° ((A→B)→(C→B∧C))→(C→((A→B)→B∧C)) (T1)
7° A→(C→((A→B)→B∧C)) (5°, 6°, HS)
8° (A→(C→((A→B)→B∧C)))→(A∧C→((A→B)→B∧C)) (U7)
9° (A∧C)→((A→B)→B∧C) (7°, 8°, MP)
10° ((A∧C)→((A→B)→B∧C))→((A→B)→(A∧C→B∧C)) (T1)
11° (A→B)→(A∧C→B∧C) (9°, 10°, MP)

(T13)
1° ((A∧B)∧C→D)→(A∧B→(C→D)) (U8)
2° (A∧B→(C→D))→(A→(B→(C→D))) (U8)
3° ((A∧B)∧C→D)→(A→(B→(C→D))) (1°, 2°, HS)
4° (B→(C→D))→(B∧C→D) (U7)
5° ((B→(C→D))→(B∧C→D))→((A→(B→(C→D)))→(A→(B∧C→D))) (T2)
6° (A→(B→(C→D)))→(A→(B∧C→D)) (4°, 5°, MP)
7° (A→(B∧C→D))→(A∧(B∧C)→D) (U7)
8° (A→(B→(C→D)))→(A∧(B∧C)→D) (6°, 7°, HS)
9° ((A∧B)∧C→D)→(A∧(B∧C)→D) (3°, 8°, HS)
10° Let D = (A∧B)∧C
11° A∧(B∧C)→(A∧B)∧C (9°, 10°, MP)

The proof of (T14) is similar to that of (T13).
Definition 4. The binary relation ∼ on F is called the provable equivalence relation: A∼B if and only if ⊢ A→B and ⊢ B→A.

Theorem 2. The relation ∼ is a congruence relation on F. The quotient algebra [F] = F/∼ = {[A] | A∈F} is a UBL algebra in which the partial ordering ≤ is defined as follows: [A]≤[B] if and only if ⊢ A→B.
Proof. It is clear that ∼ is an equivalence relation on F. That ∼ is a congruence relation on F follows from (U1), (U12), (T1), (T2) and (T12). The quotient algebra [F] = F/∼ = {[A] | A∈F} is a UBL algebra, where [A]∨[B] = [A∨B], [A]∧[B] = [A∧B], [A]→[B] = [A→B].
4 The Completeness of Formal System LU

Now we extend the semantical concepts of the system LU to general UBL algebras.

Definition 5. Let L be a UBL algebra. A (∧, ∨, →)-type homomorphism v : F→L, i.e. a mapping satisfying
v(A∧B) = v(A)⊗v(B), v(A∨B) = v(A)⊕v(B), v(A→B) = v(A)⇒v(B),
is called an L-valuation of the system LU. The set Ω(L) of all L-valuations of the system LU is called the L-semantics of the system LU.
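An L-valuation in the sense of Definition 5 is determined by its values on the propositional variables and can be computed by structural recursion. The sketch below takes L to be the unit interval with the Łukasiewicz operations (the m = 1 case) and checks, on a finite grid of assignments, that the axioms (U1) to (U12) always receive the value 1. The formula encoding, the helper names and the convention that the constant 0̄ is sent to 0 are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)

# Interpretation of the connectives in the chosen algebra (m = 1).
OPS = {
    "and": lambda x, y: max(ZERO, x + y - 1),
    "or": lambda x, y: min(ONE, x + y),
    "->": lambda x, y: min(ONE, 1 - x + y),
}


def K(a, b):
    return ("and", a, b)


def D(a, b):
    return ("or", a, b)


def I(a, b):
    return ("->", a, b)


def valuate(formula, assignment):
    """Extend an assignment of variables homomorphically to a formula."""
    if formula == "0":
        return ZERO
    if isinstance(formula, str):
        return assignment[formula]
    op, left, right = formula
    return OPS[op](valuate(left, assignment), valuate(right, assignment))


A, B, C = "A", "B", "C"
AXIOMS = {
    "U1": I(I(A, B), I(I(B, C), I(A, C))),
    "U2": I(K(A, B), A),
    "U3": I(K(A, B), K(B, A)),
    "U4": I(A, D(A, B)),
    "U5": I(D(A, B), D(B, A)),
    "U6": I(K(A, I(A, B)), K(B, I(B, A))),
    "U7": I(I(A, I(B, C)), I(K(A, B), C)),
    "U8": I(I(K(A, B), C), I(A, I(B, C))),
    "U9": I(I(I(A, B), C), I(I(I(B, A), C), C)),
    "U10": I("0", A),
    "U11": I(D(D(A, B), C), D(A, D(B, C))),
    "U12": I(I(A, B), I(D(A, C), D(B, C))),
}

GRID = [Fraction(k, 4) for k in range(5)]


def is_valid_on_grid(formula):
    """True if the formula takes the value 1 under every grid assignment."""
    return all(
        valuate(formula, {"A": a, "B": b, "C": c}) == ONE
        for a, b, c in product(GRID, repeat=3)
    )


print({name: is_valid_on_grid(f) for name, f in AXIOMS.items()})
```

By contrast, a formula such as A→(A∧A) is not valid here: with v(A) = 1/2 it evaluates to 1/2.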
Definition 6. Let L be a UBL algebra, A∈F, Γ⊆F.
(1) A is called an L-tautology, denoted by ⊨_L A, if v(A) = 1 for all v∈Ω(L).
(2) A is called an L-semantic consequence of Γ, denoted by Γ ⊨_L A, if for all v∈Ω(L) we have v(A) = 1 whenever v(Γ)⊆{1}. A is an L-tautology when Γ = ∅.
We use T(L) to denote the set of all L-tautologies, i.e., T(L) = {A∈F | ⊨_L A}.

Theorem 3 (L-soundness). Let L be a UBL algebra, A∈F and Γ⊆F. If Γ ⊢ A, then Γ ⊨_L A. In particular, Thm(LU)⊆T(L).
Proof. Let Γ⊆F. If v(Γ)⊆{1} and Γ ⊢ A, then A∈Axm(LU), or A∈Γ, or A is obtained from B and B→A by MP, where B and B→A are Γ-consequences and v(B) = v(B→A) = 1. If A∈Axm(LU) or A∈Γ, then v(A) = 1. If A is obtained from B and B→A by MP, then v(A) = 1 since v(A)≥v(B)⊗(v(B)⇒v(A)) = v(B)⊗v(B→A) = 1.

Definition 7. Let L = (L, ⊕, ⊗, ⇒, 0, 1) be a UBL algebra. A filter on L is a non-empty set F⊆L such that for each a, b∈L,
a∈F and b∈F implies a⊗b∈F,
a∈F and a≤b implies b∈F.
F is a prime filter iff for each a, b∈L, (a⇒b)∈F or (b⇒a)∈F.

Remark. If F is a filter on a UBL algebra and x, y∈F, then x⊕y∈F. In fact, x⊗y≤x≤x⊕y.

Proposition 3. Let L be a UBL algebra and F be a filter. We define the relation ∼_F as follows: a ∼_F b if and only if (a⇒b)∈F and (b⇒a)∈F. Then
(1) ∼_F is a congruence and the quotient algebra L/∼_F is a UBL algebra.
(2) L/∼_F is linearly ordered if and only if F is a prime filter.

Proposition 4. Let L be a UBL algebra and a∈L\{1}. Then there is a prime filter F on L such that a∉F.
Proof. Let A be the set of all filters on L not containing a. Then A≠∅ since {1}∈A. By Zorn's Lemma there is a maximal element F in A, because every chain {F_t}_{t∈I} has the upper bound ∪_{t∈I} F_t in the partially ordered set (A, ⊆). We show that F is a prime filter on L.
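If there exist b, c∈L such that b⇒c∉F and c⇒b∉F, let
$F_1 = \{x \in L \mid \exists y \in F, \exists n \in N,\ y \otimes (b \Rightarrow c)^n \le x\}$
$F_2 = \{u \in L \mid \exists v \in F, \exists m \in N,\ v \otimes (c \Rightarrow b)^m \le u\}$
We can prove that F_1 and F_2 are filters on L such that b⇒c∈F_1, F⊂F_1, c⇒b∈F_2, F⊂F_2. Now we only need to prove that F_1∈A or F_2∈A, i.e., a∉F_1 or a∉F_2. In fact, if a∈F_1 and a∈F_2, then there exist y, v∈F and n, m∈N such that $a \ge y \otimes (b \Rightarrow c)^n$ and $a \ge v \otimes (c \Rightarrow b)^m$. Without loss of generality, we assume n≥m; then $a \ge v \otimes (c \Rightarrow b)^n$. Hence
$a \ge \max(y \otimes (b \Rightarrow c)^n,\ v \otimes (c \Rightarrow b)^n) \ge y \otimes v \otimes \max((b \Rightarrow c)^n, (c \Rightarrow b)^n) = y \otimes v$.
Thus a∈F since y⊗v∈F, a contradiction. Therefore, a∉F_1 or a∉F_2. Since F is properly contained in both F_1 and F_2, this contradicts the maximality of F in A; hence F is a prime filter.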
Theorem 4. Let L be any UBL algebra. Then there exist UBL chains {L_t | t∈I} such that L can be isomorphically embedded into $L^* = \prod_{t \in I} L_t$.
Proof. Let P = {F | F is a prime filter on L}; then P≠∅. By Proposition 3, L/∼_F is a UBL chain for all F∈P. We prove that the mapping $i : L \to L^*$, $x \mapsto ([x]_F)_{F \in P}$ is an embedding of L into L*, i.e., that i is a monomorphism. Obviously, i is a homomorphism. For a, b∈L with a≠b, we may assume without loss of generality that a≰b; then a⇒b≠1. By Proposition 4 there exists a prime filter F on L such that a⇒b∉F. Thus [a]_F ≰ [b]_F in L/∼_F. Hence [a]_F ≠ [b]_F, i.e., i(a)≠i(b).

Proposition 5. If L_1 and L_2 are UBL algebras and L_1 ≅ L_2, then T(L_1) = T(L_2).

Proposition 6. Let L, L_1 be UBL algebras. If L_1 is a subalgebra of L, then T(L)⊆T(L_1).

Theorem 5. Let A∈F. If A∈T(L) for every UBL chain L, then A∈T(L_1), where L_1 is any UBL algebra.
Proof. By Theorem 4, Proposition 5 and Proposition 6, it suffices to show that A∈T(L*) when A∈T(L_t) for all t∈I, where $L^* = \prod_{t \in I} L_t$ and each L_t (t∈I) is a chain. In fact, for every v∈Ω(L*) we have f_t v∈Ω(L_t), where f_t : L*→L_t is the projection mapping. If v(A) = (a_t)_{t∈I} ≠ 1, then there exists t∈I such that a_t ≠ 1, i.e. f_t v(A) = f_t(v(A)) ≠ 1, a contradiction. Thus A∈T(L*).
Theorem 6 ([L]-completeness). Let A∈F. Then ⊢ A if and only if ⊨_[L] A.
Proof. The necessity is obvious by Theorem 3. By Theorem 2, in [L] we have [A]≤[B] if and only if ⊢ A→B, and [A] = [B] if and only if A∼B. If B∈Thm(LU), then ⊢ A→B for all A∈F, i.e., [A]≤[B]. Thus [B] = 1 is the maximal element of [L]. Assume [A] = 1; then [A] = [B], therefore A∼B, i.e., A∈Thm(LU). Hence the maximal element 1 of [L] is the set of all theorems of the system LU. Suppose A∈T([L]); then v(A) = 1 for all v∈Ω([L]). In particular, for the valuation v : F→[L], A↦[A], A∈F, we have v(A) = 1, i.e., [A] = 1. Thus A∈Thm(LU), i.e., ⊢ A.
Theorem 7 (completeness). Let A∈F; then the following are equivalent:
(i) ⊢ A;
(ii) for each linearly ordered UBL algebra L, A∈T(L);
(iii) for each UBL algebra L, A∈T(L).
Proof. The implications (i)⇒(ii), (ii)⇒(iii) and (iii)⇒(i) have been proved by Theorem 3, Theorem 5 and Theorem 6 respectively.

Theorem 8 (strong completeness). Let Γ⊆F, A∈F. The following are equivalent:
(i) Γ ⊢ A;
(ii) for each linearly ordered UBL algebra L, Γ ⊨_L A;
(iii) for each UBL algebra L, Γ ⊨_L A.
Proof. The implications (i)⇒(ii) and (ii)⇒(iii) have been proved by Theorem 3 and Theorem 5 respectively. We show (iii)⇒(i). We define the relation ∼_Γ on F by: A ∼_Γ B if and only if Γ ⊢ A→B and Γ ⊢ B→A. It can be proved easily that ∼_Γ is a congruence on F, that the corresponding quotient algebra [F]_Γ = F/∼_Γ = {[A]_Γ | A∈F} is a UBL algebra, and that v(Γ)⊆{1} for all v∈Ω([F]_Γ). We can prove that 1 = Ded(Γ) is the maximal element of [F]_Γ. If Γ ⊨_{[F]_Γ} A, then v(A) = 1 for all v∈Ω([F]_Γ). In particular, for the mapping v : F→[F]_Γ, A↦[A]_Γ, A∈F, we have [A]_Γ = 1 = Ded(Γ). Hence A∈Ded(Γ), i.e., Γ ⊢ A.
5 Conclusions

We introduce a new kind of algebra system, the UBL algebra, based on the 0-level universal conjunction, universal disjunction and universal implication operations. A general propositional calculus formal deductive system LU of universal logic based on UBL algebras is built up, and its completeness is proved. This shows that the syntax and semantics of the formal system LU agree, which may establish a solid logical foundation for flexible reasoning. Moreover, the formal deductive system B ([11]) of universal logic in the ideal condition (h≡0.5) is a schematic extension of the general universal logic system LU.
References
1. Hongxing L.: To see the success of fuzzy logic from mathematical essence of fuzzy control: on the paradoxical success of fuzzy logic. Fuzzy Systems and Mathematics 9 (1995) 1–14 (in Chinese)
2. Rose, P.A., Rosser, J.B.: Fragments of many valued statement calculi. Trans. A.M.S. 87 (1958) 1–53
3. Dummett, M.: A propositional calculus with denumerable matrix. Journal of Symbolic Logic 24 (1959) 97–106
4. Hájek, P., Godo, L., Esteva, F.: A complete many-valued logic with product conjunction. Archive for Mathematical Logic 35 (1996) 191–208
5. Hájek, P.: Metamathematics of fuzzy logic. Kluwer Academic Publishers (1998)
6. Guojun W.: A formal deductive system of fuzzy propositional calculus. Chinese Science Bulletin 42 (1997) 1041–1045 (in Chinese)
7. Guojun W.: The full implication triple I method for fuzzy reasoning. Science in China (Series E) 29 (1999) 43–53 (in Chinese)
8. Daowu P., Guojun W.: The completeness and applications of the formal system L∗. Science in China (Series F) 45 (2002) 40–50
9. Daowu P.: A logic system based on strong regular residuated lattices and its completeness. Acta Mathematica Sinica 45 (2002) 745–752 (in Chinese)
10. Huacan He.: Principle of Universal Logic. Beijing: Chinese Scientific Press (2001) (in Chinese)
11. Minxia L., Huacan H.: The formal deductive system B of universal logic in the ideal condition. Science of Computer 31 (2004) 95–98 (in Chinese)
12. Minxia L., Huacan H.: The completeness of the formal deductive system B of universal logic in the ideal condition. Science of Computer (in Chinese) (in press)
13. Minxia L., Huacan H.: Some algebraic properties about the 0-level universal operation model of universal logic. Fuzzy Systems and Mathematics (in Chinese) (in press)
Entropy and Subsethood for General Interval-Valued Intuitionistic Fuzzy Sets

Xiao-dong Liu, Su-hua Zheng, and Feng-lan Xiong

Department of Mathematics, Ocean University of China, Qingdao 266071, P.R. China
[email protected], [email protected], [email protected]

Abstract. In this paper, we extend entropy and subsethood from intuitionistic fuzzy sets to general interval-valued intuitionistic fuzzy sets, propose definitions of entropy and subsethood, offer an entropy function and construct a class of subsethood functions. Then, by discussing the relationship between entropy and subsethood, we show that each choice of subsethood yields a corresponding entropy function. Our work is also applicable to practical fields such as neural networks, expert systems, and others.
1 Introduction

Since L.A. Zadeh introduced fuzzy sets [1] in 1965, many new theories treating imprecision and uncertainty have been introduced. Some of them are extensions of fuzzy set theory. K.T. Atanassov extended this theory and proposed the definitions of intuitionistic fuzzy sets (IFS, for short) ([3] [4]) and interval-valued intuitionistic fuzzy sets (IVIFS, for short) [5], which, among the several higher-order fuzzy sets, have been found to be highly useful in dealing with vagueness. Then, in 1999, Atanassov defined the lattice-intuitionistic fuzzy set [6]. A measure of fuzziness often used and cited in the literature is entropy, first mentioned in 1965 by L.A. Zadeh [2]. The name entropy was chosen because of an intrinsic similarity of its equations to those of the Shannon entropy [7]. However, the two differ in the type of uncertainty they capture: Shannon entropy basically measures the average uncertainty, in bits, associated with the prediction of outcomes in a random experiment. This theory was extended, and a non-probabilistic-type entropy measure for IFS was proposed, by Eulalia Szmidt and Janusz Kacprzyk [8]. In this paper we extend it again to general interval-valued intuitionistic fuzzy sets (VIFS, for short).

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 42–52, 2005. © Springer-Verlag Berlin Heidelberg 2005

We organize this paper as follows. In Section 2, we first define VIFS and then discuss the relation between VIFS and some other kinds of intuitionistic fuzzy sets. From the discussion, we establish that all these sets are subsets of VIFS in the view of the isomorphic embedding mapping, which ensures that our work on the theory of entropy and subsethood is a reasonable extension of FS, IFS and IVIFS. So, according to Section 2, the theories in this paper apply to all these kinds of intuitionistic fuzzy sets. In Section 3, we extend entropy [7] to VIFS(X). Firstly, we give a reasonable definition of "A is less fuzzy than B" on VIFS. Since L is only a lattice, two elements of L are not always comparable, so we base the definition of "A refines B" on a discussion of the incomparable case. Then we define entropy by giving the properties of "A refines B" in Theorem 3.1. Finally, from Theorem 3.2, we obtain a specific entropy function. In Section 4, we define a class of subsethood functions by means of three real functions. Then, by discussing the relation between entropy and subsethood, we obtain a class of entropy functions that is useful under different conditions, and Eg. 2 confirms that it is reasonable. Thus, each choice of subsethood yields a corresponding entropy function.
2 Definitions and Quantities of Some Kinds of IFS

Throughout this paper, let X be a nonempty finite set, |X| = n.

Definition 2.1. An interval number over [0, 1] is defined as an object of the form $a = [a^-, a^+]$ with the property $0 \le a^- \le a^+ \le 1$. Let L denote the set of all interval numbers over [0, 1]. We define the relation “≤” on L by
$a \le b \Leftrightarrow a^- \le b^- \text{ and } a^+ \le b^+$.
We can easily prove that “≤” is a partial order on L, and ⟨L, ≤⟩ is a complete lattice where
$a \vee b = [a^- \vee b^-, a^+ \vee b^+]$, $a \wedge b = [a^- \wedge b^-, a^+ \wedge b^+]$,
$\bigvee_{i \in I} a_i = [\bigvee_{i \in I} a_i^-, \bigvee_{i \in I} a_i^+]$, $\bigwedge_{i \in I} a_i = [\bigwedge_{i \in I} a_i^-, \bigwedge_{i \in I} a_i^+]$.
Let 1 = [1, 1] denote the maximum of ⟨L, ≤⟩ and 0 = [0, 0] the minimum of ⟨L, ≤⟩. We define the “C” operation by $a^c = [1 - a^+, 1 - a^-]$. We can easily prove that the complement operation “C” has the following properties:
1. $(a^c)^c = a$,
2. $(a \vee b)^c = a^c \wedge b^c$, $(a \wedge b)^c = a^c \vee b^c$,
3. $(\bigvee_{i \in I} a_i)^c = \bigwedge_{i \in I} a_i^c$, $(\bigwedge_{i \in I} a_i)^c = \bigvee_{i \in I} a_i^c$,
4. if $a \le b$, then $a^c \ge b^c$ (∀a, b ∈ L).

Next we give the definition of intuitionistic fuzzy sets on L.
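The lattice of interval numbers from Definition 2.1 is easy to model directly. The following sketch is illustrative only; the class name `Interval` and the helper `iv` are hypothetical, and exact rationals are used so that equalities such as the De Morgan laws can be tested without rounding error.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    """Interval number a = [lo, hi] with 0 <= lo <= hi <= 1."""
    lo: Fraction
    hi: Fraction

    def __post_init__(self):
        if not 0 <= self.lo <= self.hi <= 1:
            raise ValueError("need 0 <= lo <= hi <= 1")

    def __le__(self, other):
        # a <= b iff both endpoints are ordered; this is only a partial order
        return self.lo <= other.lo and self.hi <= other.hi

    def join(self, other):
        return Interval(max(self.lo, other.lo), max(self.hi, other.hi))

    def meet(self, other):
        return Interval(min(self.lo, other.lo), min(self.hi, other.hi))

    def complement(self):
        # a^c = [1 - a+, 1 - a-]
        return Interval(1 - self.hi, 1 - self.lo)


def iv(lo, hi):
    """Convenience constructor from decimal strings or integers."""
    return Interval(Fraction(lo), Fraction(hi))


a, b = iv("0.2", "0.5"), iv("0.3", "0.4")
print(a <= b, b <= a)                       # neither holds: a, b incomparable
print(a.join(b).complement() == a.complement().meet(b.complement()))
```

The pair a = [0.2, 0.5], b = [0.3, 0.4] shows why the order is only partial: neither a ≤ b nor b ≤ a holds.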
Definition 2.2. A general interval-valued intuitionistic fuzzy set A on X is an object of the form A = {⟨x, μ_A(x), ν_A(x)⟩, x ∈ X}, where μ_A : X → L and ν_A : X → L, with the property $0 \le \mu_A(x) \le \nu_A(x)^c \le 1$ (∀x ∈ X). For brevity, we denote A = {⟨x, μ_A(x), ν_A(x)⟩, x ∈ X} by A = ⟨μ_A(x), ν_A(x)⟩ and write $\mu_A(x) = [\mu_A^-(x), \mu_A^+(x)] \in L$, $\nu_A(x) = [\nu_A^-(x), \nu_A^+(x)] \in L$. Let VIFS(X) denote the set of all general interval-valued intuitionistic fuzzy sets on X. We define the relation “≤” by
$A \le B \Leftrightarrow \mu_A(x) \le \mu_B(x) \text{ and } \nu_A(x) \ge \nu_B(x)$.
We can easily prove that ≤ is also a partial order on VIFS(X), and ⟨VIFS(X), ≤⟩ is a complete lattice where
$A \vee B = \langle \mu_A(x) \vee \mu_B(x),\ \nu_A(x) \wedge \nu_B(x) \rangle$,
$A \wedge B = \langle \mu_A(x) \wedge \mu_B(x),\ \nu_A(x) \vee \nu_B(x) \rangle$,
$\bigvee_{i \in I} A_i = \langle \bigvee_{i \in I} \mu_{A_i},\ \bigwedge_{i \in I} \nu_{A_i} \rangle$,
$\bigwedge_{i \in I} A_i = \langle \bigwedge_{i \in I} \mu_{A_i},\ \bigvee_{i \in I} \nu_{A_i} \rangle$.
We define the “C” operation by $A^c = \langle \nu_A(x), \mu_A(x) \rangle$. We can easily prove that the complement operation “C” has the following properties:
1. $(A^c)^c = A$,
2. $(A \vee B)^c = A^c \wedge B^c$,
3. $(\bigvee_{i \in I} A_i)^c = \bigwedge_{i \in I} A_i^c$, $(\bigwedge_{i \in I} A_i)^c = \bigvee_{i \in I} A_i^c$,
4. if A ≤ B, then $A^c \ge B^c$ (∀A, B ∈ VIFS(X)).
So ⟨VIFS(X), ≤⟩ is a complete lattice with complement. Let I denote the maximum of VIFS(X) and θ the minimum of VIFS(X), where μ_I(x) = 1, ν_I(x) = 0, μ_θ(x) = 0, ν_θ(x) = 1. Next we discuss the relationship between IVIFS(X) [5] and VIFS(X).

Definition 2.3. An interval-valued intuitionistic fuzzy set on X (IVIFS(X), for short) is an object of the form B = {⟨x, M_B(x), N_B(x)⟩, x ∈ X}, where M_B : X → L and N_B : X → L, with the condition $0 \le M_B^+(x) + N_B^+(x) \le 1$ (∀x ∈ X).

Proposition 2.1. IVIFS(X) [5] is a subset of VIFS(X).
Proof. For any B in IVIFS(X) [5], we have B = {⟨x, M_B(x), N_B(x)⟩, x ∈ X}, where M_B : X → L and N_B : X → L, with the condition $0 \le M_B^+(x) + N_B^+(x) \le 1$ (∀x ∈ X), that is, $0 \le M_B^+(x) \le 1 - N_B^+(x)$ (∀x ∈ X). From Definition 2.2 we can easily prove $0 \le M_B(x) \le N_B(x)^c \le 1$ (∀x ∈ X). That is, B belongs to VIFS(X). So IVIFS(X) [5] is a subset of VIFS(X).

We have an example here:
Eg1. Let A = {⟨x, [0, 0.3], [0, 0.8]⟩, x ∈ X}. Since $[0, 0.8]^c = [0.2, 1]$ and [0, 0.3] ≤ [0.2, 1], we have A ∈ VIFS(X). But from 0.3 + 0.8 ≥ 1, we get A ∉ IVIFS(X). Hence IVIFS(X) ⊂ VIFS(X) and IVIFS(X) ≠ VIFS(X).

Theorem 2.1 below shows that, in the view of the isomorphic embedding, we may consider P(X) a subset of VIFS(X).

Definition 2.4. Define a function δ : P(X) → VIFS(X), where for all A ∈ P(X),
$\mu_{\delta(A)}(x) = 1$ if x ∈ A and $\mu_{\delta(A)}(x) = 0$ if x ∉ A;
$\nu_{\delta(A)}(x) = 0$ if x ∈ A and $\nu_{\delta(A)}(x) = 1$ if x ∉ A.

Theorem 2.1. The function δ is an injective map preserving union, intersection and complement.
Proof. Let A_i ∈ P(X) (i ∈ I).
1. It is easy to prove that δ is an injection.
2. $\mu_{\delta(\bigcup_{i \in I} A_i)}(x) = 1 \Leftrightarrow x \in \bigcup_{i \in I} A_i \Leftrightarrow \exists i_0 \in I$ satisfying $x \in A_{i_0}$, and
$\bigvee_{i \in I} \mu_{\delta(A_i)}(x) = 1 \Leftrightarrow \exists i_0 \in I,\ \mu_{\delta(A_{i_0})}(x) = 1 \Leftrightarrow \exists i_0 \in I$ satisfying $x \in A_{i_0}$.
That is, $\mu_{\delta(\bigcup_{i \in I} A_i)}(x) = \bigvee_{i \in I} \mu_{\delta(A_i)}(x)$ (∀x ∈ X). Likewise,
$\nu_{\delta(\bigcup_{i \in I} A_i)}(x) = 1 \Leftrightarrow x \notin \bigcup_{i \in I} A_i \Leftrightarrow \forall i \in I,\ x \notin A_i$, and
$\bigwedge_{i \in I} \nu_{\delta(A_i)}(x) = 1 \Leftrightarrow \forall i \in I,\ \nu_{\delta(A_i)}(x) = 1 \Leftrightarrow \forall i \in I,\ x \notin A_i$.
That is, $\nu_{\delta(\bigcup_{i \in I} A_i)}(x) = \bigwedge_{i \in I} \nu_{\delta(A_i)}(x)$ (∀x ∈ X). So we have $\delta(\bigcup_{i \in I} A_i) = \bigvee_{i \in I} \delta(A_i)$.
3. Similarly, we can prove $\delta(\bigcap_{i \in I} A_i) = \bigwedge_{i \in I} \delta(A_i)$.
4. We have
$\mu_{\delta(A^c)}(x) = 1 \Leftrightarrow x \in A^c \Leftrightarrow x \notin A$, (1)
$\nu_{\delta(A^c)}(x) = 1 \Leftrightarrow x \notin A^c \Leftrightarrow x \in A$, (2)
$\mu_{(\delta(A))^c}(x) = \nu_{\delta(A)}(x) = 1 \Leftrightarrow x \notin A$, (3)
$\nu_{(\delta(A))^c}(x) = \mu_{\delta(A)}(x) = 1 \Leftrightarrow x \in A$. (4)
From (1) and (3) we have $\mu_{\delta(A^c)}(x) = \mu_{(\delta(A))^c}(x)$ (∀x ∈ X), and from (2) and (4) we have $\nu_{\delta(A^c)}(x) = \nu_{(\delta(A))^c}(x)$ (∀x ∈ X). Then we get $\delta(A^c) = (\delta(A))^c$.
So δ is an injective map preserving union, intersection and complement.

In the view of the embedding mapping, we may consider P(X) a subset of VIFS(X). Let F(X) be the class of all fuzzy sets proposed by L.A. Zadeh, and let ξ : F(X) → VIFS(X), ξ(F) = A = ⟨μ_A(x), ν_A(x)⟩, with μ_A(x) = [F(x), F(x)], ν_A(x) = [1 − F(x), 1 − F(x)]. Then the injection ξ preserves union, intersection and complement. In the same way, we may also consider F(X) a subset of VIFS(X) in the view of the embedding mapping.
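The pointwise conditions of Definitions 2.2 and 2.3, the counterexample Eg1 and the embedding δ of Definition 2.4 can be illustrated with a few lines of code. This is a sketch under simplifying assumptions: interval numbers are plain (lower, upper) tuples of exact rationals, a set in VIFS(X) is a dictionary from points of X to (μ, ν) pairs, and all function names are hypothetical.

```python
from fractions import Fraction as Fr

ONE, ZERO = (Fr(1), Fr(1)), (Fr(0), Fr(0))


def leq(a, b):
    """Partial order on interval numbers."""
    return a[0] <= b[0] and a[1] <= b[1]


def comp(a):
    """a^c = [1 - a+, 1 - a-]."""
    return (1 - a[1], 1 - a[0])


def join(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]))


def is_vifs_value(mu, nu):
    """Condition of Definition 2.2 at one point: mu <= nu^c."""
    return leq(mu, comp(nu))


def is_ivifs_value(mu, nu):
    """Condition of Definition 2.3 at one point: mu+ + nu+ <= 1."""
    return mu[1] + nu[1] <= 1


def delta(subset, universe):
    """Embedding of a crisp subset: (1, 0) inside, (0, 1) outside."""
    return {x: (ONE, ZERO) if x in subset else (ZERO, ONE) for x in universe}


def vifs_join(A, B):
    return {x: (join(A[x][0], B[x][0]), meet(A[x][1], B[x][1])) for x in A}


def vifs_comp(A):
    return {x: (A[x][1], A[x][0]) for x in A}


# Eg1: mu = [0, 0.3], nu = [0, 0.8]
mu, nu = (Fr(0), Fr("0.3")), (Fr(0), Fr("0.8"))
print(is_vifs_value(mu, nu), is_ivifs_value(mu, nu))

X = {1, 2, 3}
P, Q = {1}, {2}
print(delta(P | Q, X) == vifs_join(delta(P, X), delta(Q, X)))
print(delta(X - P, X) == vifs_comp(delta(P, X)))
```

The first line of output reflects Eg1: the pair satisfies the VIFS condition but not the IVIFS condition. The last two lines check, on one small instance, that δ preserves union and complement.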
Definition 2.5. An intuitionistic fuzzy set on X (IF S(X) , for short) is an object of the form: A = {< x, μA (x), νA (x), x ∈ X >} , where μA : X → [0, 1] and νA : X → [0, 1] , with the property: 0 ≤ νA (x) + μA (x) ≤ 1 (∀x ∈ x). Let’s discuss the relationship between IF S(X)[4] and V IF S(X) . Let η: IF S(X) → V IF S(X) Where any C belongs to IF S(X)[4] , C = {< x, α(x), β(x) >, x ∈ X} (α(x) + β(x) ≤ 1), η(C) = A =< μA (x), νA (x) >, μA (x) = [α(x), α(x)] , νA (x) = [β(x), β(x)] . Similarly we have IF S(X) is the subset of V IF S(X) in the view of imbedding mapping. All the above ensure that what we get in this paper are feasible on P (X), F (X), IF S(X) and IV IF S(X). A method of judging whether set A ∈ V IF S(X) is a subset of P (X), is given by the following theorem . Theorem 2.2. Let A ∈ V IF S(X) , so we have A ∨ Ac = I ⇔ A ∈ δ(P (X)) . Proof. “ ⇒ ” Let A =< μA (x), νA (x) > , from definition 1) we have: Ac =< νA (x), μA (x) >, μA∨Ac = μA (x) ∨ νA (x), νA∨Ac = μA (x) ∧ νA (x) . − (1) From μA (x)∨νA (x) = 1 , we have μ− A (x)∨νA (x) = 1 ; + + From μA (x)∧νA (x) = 0 , we have μA (x)∧νA (x) = 0 . (2) There are two different cases: + − (x) = 0 , then we have νA (x) = 0 , that is νA (x) = 0 , and from (1) 1. If νA − + we have μA (x) = 1 . So we get μA (x) = 1, that is μA (x) = 1 ; − 2. If μ+ A (x) = 0 then we have μA (x) = 0 , that isμA (x) = 0 .
X.-d. Liu, S.-h. Zheng, and F.-l. Xiong
And from (1) we have $\nu_A^-(x) = 1$. So we get $\nu_A^+(x) = 1$, that is, $\nu_A(x) = 1$. So for any $x \in X$, $\mu_A(x)$ and $\nu_A(x)$ can only be 0 or 1, with the following property: $\mu_A(x) = 1 \Leftrightarrow \nu_A(x) = 0$. Let $K = \{x \mid x \in X, \mu_A(x) = 1\}$. From the definition we have:
$\mu_{\delta(K)}(x) = 1 \Leftrightarrow x \in K \Leftrightarrow \mu_A(x) = 1$;
$\nu_{\delta(K)}(x) = 0 \Leftrightarrow x \in K \Leftrightarrow \mu_A(x) = 1 \Leftrightarrow \nu_A(x) = 0$ $(\forall x \in X)$.
So we get $\mu_{\delta(K)}(x) = \mu_A(x)$, $\nu_{\delta(K)}(x) = \nu_A(x)$ $(\forall x \in X)$. That is, $\delta(K) = A$.
"$\Leftarrow$" It is straightforward.
Corollary 2.1. $A \wedge A^c = \theta \Leftrightarrow A \in \delta(P(X))$.
Propositions 2.2 to 2.4 give some properties of $VIFS$.
Proposition 2.2. Let $A \in VIFS(X)$. Then we have:
1. $\mu_A(x) = 1 \Rightarrow \nu_A(x) = 0$ $(\forall x \in X)$,
2. $\nu_A(x) = 1 \Rightarrow \mu_A(x) = 0$ $(\forall x \in X)$.
Proof. 1. Since $\nu_A(x) \le (\mu_A(x))^c = (1)^c = 0$, we have $\nu_A(x) = 0$. Similarly we can prove 2.
Proposition 2.3. $A \vee A^c \ne \theta$ $(\forall A \in VIFS(X))$.
Proof. Let $A = \langle \mu_A(x), \nu_A(x) \rangle$; then $A^c = \langle \nu_A(x), \mu_A(x) \rangle$ and $A \vee A^c = \langle \mu_A(x) \vee \nu_A(x), \nu_A(x) \wedge \mu_A(x) \rangle$. If $A \vee A^c = \theta$, then we have $\mu_A(x) \vee \nu_A(x) = 0$ and $\nu_A(x) \wedge \mu_A(x) = 1$, which is a contradiction. Hence $A \vee A^c \ne \theta$ $(\forall A \in VIFS(X))$.
Proposition 2.4. Let $A \in VIFS(X)$. Then we have $\nu_{A \vee A^c}(x) = \nu_{A \wedge A^c}(x) \Leftrightarrow \mu_A(x) = \nu_A(x)$ $(\forall x \in X)$.
Proof. "$\Rightarrow$" From Definition 2.2,
$\nu_{A \vee A^c}(x) = \nu_A(x) \wedge \mu_A(x) = [\nu_A^-(x) \wedge \mu_A^-(x), \nu_A^+(x) \wedge \mu_A^+(x)]$,
$\nu_{A \wedge A^c}(x) = \nu_A(x) \vee \mu_A(x) = [\nu_A^-(x) \vee \mu_A^-(x), \nu_A^+(x) \vee \mu_A^+(x)]$.
From $\nu_{A \vee A^c}(x) = \nu_{A \wedge A^c}(x)$, we have $\nu_A^-(x) \wedge \mu_A^-(x) = \nu_A^-(x) \vee \mu_A^-(x)$ and $\nu_A^+(x) \wedge \mu_A^+(x) = \nu_A^+(x) \vee \mu_A^+(x)$. So we get $\nu_A^-(x) = \mu_A^-(x)$ and $\nu_A^+(x) = \mu_A^+(x)$. That is, $\nu_A(x) = \mu_A(x)$.
"$\Leftarrow$" It is straightforward.
3 Entropy on VIFS
De Luca and Termini [12] first axiomatized non-probabilistic entropy. The De Luca-Termini axioms formulated for intuitionistic fuzzy sets are intuitive and have been widely employed in the fuzzy literature. To extend this theory, we first define the following function.
Definition 3.1. $M: VIFS(X) \to [0, 1]$, where for every $A \in VIFS(X)$,
$M(A) = \frac{1}{2n} \sum_{x \in X} [2 - \nu_A^-(x) - \nu_A^+(x)]$.
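As an illustration of Definition 3.1, the following Python sketch evaluates $M$ on a small finite universe. The data layout (a dict mapping each element to a pair of intervals) and the names `measure_M`, `theta` and `full` are assumptions made only for this example, as is the reading of $n$ as the number of elements of $X$.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]                # (lower endpoint, upper endpoint)
VIFS = Dict[str, Tuple[Interval, Interval]]   # x -> (mu_A(x), nu_A(x))


def measure_M(A: VIFS) -> float:
    """M(A) = 1/(2n) * sum over x of [2 - nu^-(x) - nu^+(x)], with n = |X| (assumed)."""
    n = len(A)
    return sum(2.0 - nu[0] - nu[1] for (_mu, nu) in A.values()) / (2 * n)


# theta: membership [0, 0] and non-membership [1, 1] at every point.
theta: VIFS = {x: ((0.0, 0.0), (1.0, 1.0)) for x in ("x1", "x2")}
# I (called `full` here): membership [1, 1] and non-membership [0, 0] at every point.
full: VIFS = {x: ((1.0, 1.0), (0.0, 0.0)) for x in ("x1", "x2")}

print(measure_M(theta), measure_M(full))
```

In this sketch $M$ vanishes on $\theta$ and reaches 1 on $I$, which matches the range $[0, 1]$ stated in the definition.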
Entropy and Subsethood for General Interval-Valued IFS
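Proposition 3.1. For any $A \in VIFS(X)$, $M(A) = 0 \Leftrightarrow \nu_A^-(x) = \nu_A^+(x) = 1$ $(\forall x \in X)$ $\Leftrightarrow A = \theta$.
Proof. $M(A) = 0 \Leftrightarrow 2 - \nu_A^-(x) - \nu_A^+(x) = 0$ $(\forall x \in X)$. Since $\nu_A^-(x) \le 1$ and $\nu_A^+(x) \le 1$, we have $M(A) = 0 \Leftrightarrow \nu_A^-(x) = \nu_A^+(x) = 1$ $(\forall x \in X)$. From Proposition 2.2, we have $\mu_A(x) = 0$. So we have $\mu_A(x) = \mu_\theta(x)$ and $\nu_A(x) = \nu_\theta(x)$, that is, $A = \theta$. Hence $M(A) = 0 \Leftrightarrow \nu_A^-(x) = \nu_A^+(x) = 1$ $(\forall x \in X)$ $\Leftrightarrow A = \theta$.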
Proposition 3.2. Let $A, B \in VIFS(X)$. Then we have $A \ge B \Rightarrow M(A) \ge M(B)$.
Proof. Since $A \ge B$, we have $\nu_A(x) \le \nu_B(x)$ $(\forall x \in X)$. So we have $\nu_A^-(x) \le \nu_B^-(x)$ and $\nu_A^+(x) \le \nu_B^+(x)$. Hence
$\sum_{x \in X} [2 - \nu_A^-(x) - \nu_A^+(x)] \ge \sum_{x \in X} [2 - \nu_B^-(x) - \nu_B^+(x)]$.
That is, $M(A) \ge M(B)$.
Proposition 3.3. Let $A, B \in VIFS(X)$. Then we have: $\nu_A(x) \le \nu_B(x)$ $(\forall x \in X)$ and $M(A) = M(B)$ $\Leftrightarrow \nu_A(x) = \nu_B(x)$ $(\forall x \in X)$.
Proof. "$\Leftarrow$" It is straightforward.
"$\Rightarrow$" Since $\nu_A(x) \le \nu_B(x)$, we have $\nu_A^-(x) \le \nu_B^-(x)$ and $\nu_A^+(x) \le \nu_B^+(x)$. So we get $2 - \nu_A^-(x) - \nu_A^+(x) \ge 2 - \nu_B^-(x) - \nu_B^+(x)$ $(\forall x \in X)$. If there is an $x_0 \in X$ for which at least one of the two inequalities $\nu_A^-(x_0) < \nu_B^-(x_0)$, $\nu_A^+(x_0) < \nu_B^+(x_0)$ holds, then we have
$M(A) = \frac{1}{2n} \Big\{ \sum_{x \ne x_0,\, x \in X} [2 - \nu_A^-(x) - \nu_A^+(x)] + [2 - \nu_A^-(x_0) - \nu_A^+(x_0)] \Big\}$
$> \frac{1}{2n} \Big\{ \sum_{x \ne x_0,\, x \in X} [2 - \nu_B^-(x) - \nu_B^+(x)] + [2 - \nu_B^-(x_0) - \nu_B^+(x_0)] \Big\} = M(B)$.
This contradicts $M(A) = M(B)$. So for any $x \in X$, we have $\nu_A^-(x) = \nu_B^-(x)$ and $\nu_A^+(x) = \nu_B^+(x)$. That is, $\nu_A(x) = \nu_B(x)$ $(\forall x \in X)$.
Definition 3.2. Let $A, B \in VIFS(X)$ satisfy the following properties:
1. If $\mu_B(x) \ge \nu_B(x)$, then $\mu_A(x) \ge \mu_B(x)$ and $\nu_A(x) \le \nu_B(x)$;
2. If $\mu_B(x) < \nu_B(x)$, then $\mu_A(x) \le \mu_B(x)$ and $\nu_A(x) \ge \nu_B(x)$;
3. If $\mu_B(x)$ and $\nu_B(x)$ are incomparable, there are two cases:
1) If $\mu_B^-(x) \ge \nu_B^-(x)$, then the following four inequalities hold: (1) $\mu_A^-(x) \ge \mu_B^-(x)$, (2) $\mu_A^+(x) \le \mu_B^+(x)$, (3) $\nu_A^-(x) \le \nu_B^-(x)$, (4) $\nu_A^+(x) \ge \nu_B^+(x)$.
2) If $\mu_B^-(x) < \nu_B^-(x)$, then the following four inequalities hold: (5) $\mu_A^-(x) \le \mu_B^-(x)$, (6) $\mu_A^+(x) \ge \mu_B^+(x)$, (7) $\nu_A^-(x) \ge \nu_B^-(x)$, (8) $\nu_A^+(x) \le \nu_B^+(x)$.
Then we say that $A$ refines $B$ (that is, $A$ is less fuzzy than $B$).
Theorem 3.1. Let $A, B \in VIFS(X)$, and let $A$ refine $B$. Then we have
1. $(A \wedge A^c) \le (B \wedge B^c)$,
2. $(A \vee A^c) \ge (B \vee B^c)$.
Proof. We prove inequality 1 first.
1. If $\mu_B(x) \ge \nu_B(x)$, from Definition 3.2 we have $\mu_A(x) \ge \mu_B(x)$ and $\nu_A(x) \le \nu_B(x)$. Then we have $\mu_A(x) \ge \mu_B(x) \ge \nu_B(x) \ge \nu_A(x)$.
Hence $\mu_{A \wedge A^c}(x) = \mu_A(x) \wedge \nu_A(x) = \nu_A(x)$, $\mu_{B \wedge B^c}(x) = \mu_B(x) \wedge \nu_B(x) = \nu_B(x)$, $\nu_{A \wedge A^c}(x) = \mu_A(x) \vee \nu_A(x) = \mu_A(x)$, $\nu_{B \wedge B^c}(x) = \mu_B(x) \vee \nu_B(x) = \mu_B(x)$. So $\mu_{A \wedge A^c}(x) \le \mu_{B \wedge B^c}(x)$ and $\nu_{A \wedge A^c}(x) \ge \nu_{B \wedge B^c}(x)$.
2. If $\mu_B(x) < \nu_B(x)$, from Definition 3.2 we have $\mu_A(x) \le \mu_B(x)$ and $\nu_A(x) \ge \nu_B(x)$. Then we have $\mu_A(x) \le \mu_B(x) \le \nu_B(x) \le \nu_A(x)$. Hence $\mu_{A \wedge A^c}(x) = \mu_A(x) \wedge \nu_A(x) = \mu_A(x)$, $\mu_{B \wedge B^c}(x) = \mu_B(x) \wedge \nu_B(x) = \mu_B(x)$, $\nu_{A \wedge A^c}(x) = \mu_A(x) \vee \nu_A(x) = \nu_A(x)$, $\nu_{B \wedge B^c}(x) = \mu_B(x) \vee \nu_B(x) = \nu_B(x)$. So $\mu_{A \wedge A^c}(x) \le \mu_{B \wedge B^c}(x)$ and $\nu_{A \wedge A^c}(x) \ge \nu_{B \wedge B^c}(x)$.
3. If $\mu_B(x)$ and $\nu_B(x)$ are incomparable, there are again two cases.
1) If $\mu_B^-(x) \ge \nu_B^-(x)$ (9), then, since $\mu_B(x)$ is not larger than $\nu_B(x)$, we have $\mu_B^+(x) \le \nu_B^+(x)$ (10). From (1), (3) and (9) we get $\mu_A^-(x) \ge \mu_B^-(x) \ge \nu_B^-(x) \ge \nu_A^-(x)$. So we have $\mu_A^-(x) \ge \nu_A^-(x)$ (11), and from (2), (4) and (10) we get $\mu_A^+(x) \le \mu_B^+(x) \le \nu_B^+(x) \le \nu_A^+(x)$. So we have $\mu_A^+(x) \le \nu_A^+(x)$ (12). Then from (9) and (10), we have
$\mu_{B \wedge B^c}(x) = \mu_B(x) \wedge \nu_B(x) = [\mu_B^-(x) \wedge \nu_B^-(x), \mu_B^+(x) \wedge \nu_B^+(x)] = [\nu_B^-(x), \mu_B^+(x)]$,
$\nu_{B \wedge B^c}(x) = \mu_B(x) \vee \nu_B(x) = [\mu_B^-(x) \vee \nu_B^-(x), \mu_B^+(x) \vee \nu_B^+(x)] = [\mu_B^-(x), \nu_B^+(x)]$.
In the same way, from (11) and (12), we get $\mu_{A \wedge A^c}(x) = [\nu_A^-(x), \mu_A^+(x)]$ and $\nu_{A \wedge A^c}(x) = [\mu_A^-(x), \nu_A^+(x)]$. Then from (2) and (3), we get $\mu_{A \wedge A^c}(x) \le \mu_{B \wedge B^c}(x)$. From (1) and (4), we get $\nu_{A \wedge A^c}(x) \ge \nu_{B \wedge B^c}(x)$.
2) If $\mu_B^-(x) < \nu_B^-(x)$ (13), then, since $\mu_B(x)$ is not less than $\nu_B(x)$, we have $\mu_B^+(x) \ge \nu_B^+(x)$ (14). From (5), (13) and (7), we get $\mu_A^-(x) \le \mu_B^-(x) < \nu_B^-(x) \le \nu_A^-(x)$. Hence $\mu_A^-(x) < \nu_A^-(x)$ (15), and from (6), (14) and (8), we get $\mu_A^+(x) \ge \mu_B^+(x) \ge \nu_B^+(x) \ge \nu_A^+(x)$. Hence $\mu_A^+(x) \ge \nu_A^+(x)$ (16). Then from (13) and (14), we have
$\mu_{B \wedge B^c}(x) = \mu_B(x) \wedge \nu_B(x) = [\mu_B^-(x) \wedge \nu_B^-(x), \mu_B^+(x) \wedge \nu_B^+(x)] = [\mu_B^-(x), \nu_B^+(x)]$,
$\nu_{B \wedge B^c}(x) = \mu_B(x) \vee \nu_B(x) = [\mu_B^-(x) \vee \nu_B^-(x), \mu_B^+(x) \vee \nu_B^+(x)] = [\nu_B^-(x), \mu_B^+(x)]$.
In the same way, from (15) and (16), we get $\mu_{A \wedge A^c}(x) = [\mu_A^-(x), \nu_A^+(x)]$ and $\nu_{A \wedge A^c}(x) = [\nu_A^-(x), \mu_A^+(x)]$. Then from (5) and (8), we get $\mu_{A \wedge A^c}(x) \le \mu_{B \wedge B^c}(x)$. From (6) and (7), we get $\nu_{A \wedge A^c}(x) \ge \nu_{B \wedge B^c}(x)$.
In summary, we get inequality 1, $(A \wedge A^c) \le (B \wedge B^c)$. From inequality 1, we can easily get inequality 2, $(A \vee A^c) \ge (B \vee B^c)$.
Now we can define entropy on $VIFS$. The De Luca-Termini axioms [12] were formulated in the following way. Let $E$ be a set-to-point mapping $E: F(X) \to [0, 1]$. Hence $E$ is a fuzzy set defined on fuzzy sets. $E$ is an entropy measure if it satisfies the four De Luca and Termini axioms:
1. $E(A) = 0$ iff $A \in 2^X$ ($A$ non-fuzzy),
2. $E(A) = 1$ iff $\mu_A(x) = 0.5$ for all $x \in X$,
3. $E(A) \le E(B)$ if $A$ is less fuzzy than $B$, i.e., if $\mu_A \le \mu_B$ when $\mu_B \le 0.5$ and $\mu_A \ge \mu_B$ when $\mu_B \ge 0.5$,
4. $E(A) = E(A^c)$.
Since the De Luca and Termini axioms were formulated for fuzzy sets, we extend them to $VIFS$.
Definition 3.3. A real function $E: VIFS(X) \to [0, 1]$ is an entropy measure if $E$ has the following properties:
1. $E(A) = 0 \Leftrightarrow A \in \delta(P(X))$,
2. $E(A) = 1 \Leftrightarrow \nu_A(x) = \mu_A(x)$ $(\forall x \in X)$,
3. $E(A) \le E(B)$ if $A$ refines $B$,
4. $E(A) = E(A^c)$.
Definition 3.4. Let $\sigma: VIFS(X) \to [0, 1]$, where for any $A \in VIFS(X)$ we have
$\sigma(A) = \frac{M(A \wedge A^c)}{M(A \vee A^c)}$.
From Proposition 2.3, we know that $A \vee A^c \ne \theta$ $(\forall A \in VIFS(X))$; then from Proposition 3.1, we know that $M(A \vee A^c) \ne 0$; and then from $A \vee A^c \ge A \wedge A^c$ and Proposition 3.2, we know that $M(A \vee A^c) \ge M(A \wedge A^c)$, that is, $0 \le \sigma(A) = \frac{M(A \wedge A^c)}{M(A \vee A^c)} \le 1$. So the definition of $\sigma$ is reasonable.
Theorem 3.2. $\sigma$ is an entropy.
Proof. 1. $\sigma(A) = 0 \Leftrightarrow M(A \wedge A^c) = 0 \Leftrightarrow A \wedge A^c = \theta \Leftrightarrow A \in \delta(P(X))$.
2. $\sigma(A) = 1 \Leftrightarrow M(A \wedge A^c) = M(A \vee A^c)$. Since $A \vee A^c \ge A \wedge A^c$, we get $\nu_{A \vee A^c}(x) \le \nu_{A \wedge A^c}(x)$ $(\forall x \in X)$. So from Proposition 3.3, we have $\nu_{A \vee A^c}(x) = \nu_{A \wedge A^c}(x)$ $(\forall x \in X)$, and from Proposition 2.4, we get $\mu_A(x) = \nu_A(x)$ $(\forall x \in X)$.
3. If $A$ refines $B$, we have $A \wedge A^c \le B \wedge B^c$. From Proposition 3.2, we have $M(A \wedge A^c) \le M(B \wedge B^c)$. And from $A \vee A^c \ge B \vee B^c$, we have $M(A \vee A^c) \ge M(B \vee B^c)$. So we get $\frac{M(A \wedge A^c)}{M(A \vee A^c)} \le \frac{M(B \wedge B^c)}{M(B \vee B^c)}$, that is, $\sigma(A) \le \sigma(B)$.
4. It is straightforward.
So we have constructed a class of entropy functions on $VIFS$.
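The ratio in Definition 3.4 can be checked numerically with a short Python sketch. The dict-of-intervals representation and all function names below are assumptions for illustration; the lattice operations follow the componentwise interval minimum and maximum used in the proofs above, and $n$ is again read as the number of elements of $X$.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]
VIFS = Dict[str, Tuple[Interval, Interval]]   # x -> (mu_A(x), nu_A(x))


def i_max(a: Interval, b: Interval) -> Interval:
    return (max(a[0], b[0]), max(a[1], b[1]))


def i_min(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), min(a[1], b[1]))


def complement(A: VIFS) -> VIFS:
    # A^c swaps the membership and non-membership intervals.
    return {x: (nu, mu) for x, (mu, nu) in A.items()}


def join(A: VIFS, B: VIFS) -> VIFS:
    # A v B: maximum of memberships, minimum of non-memberships.
    return {x: (i_max(A[x][0], B[x][0]), i_min(A[x][1], B[x][1])) for x in A}


def meet(A: VIFS, B: VIFS) -> VIFS:
    # A ^ B: minimum of memberships, maximum of non-memberships.
    return {x: (i_min(A[x][0], B[x][0]), i_max(A[x][1], B[x][1])) for x in A}


def measure_M(A: VIFS) -> float:
    return sum(2.0 - nu[0] - nu[1] for (_mu, nu) in A.values()) / (2 * len(A))


def sigma(A: VIFS) -> float:
    Ac = complement(A)
    return measure_M(meet(A, Ac)) / measure_M(join(A, Ac))


crisp: VIFS = {"x1": ((1.0, 1.0), (0.0, 0.0)), "x2": ((0.0, 0.0), (1.0, 1.0))}
balanced: VIFS = {"x1": ((0.3, 0.4), (0.3, 0.4))}
mixed: VIFS = {"x1": ((0.6, 0.8), (0.1, 0.2))}

print(sigma(crisp), sigma(balanced), sigma(mixed))
```

In this sketch a crisp set has entropy 0, a set with $\mu_A = \nu_A$ has entropy 1, and a set and its complement receive the same value, in line with properties 1, 2 and 4 of Definition 3.3.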
4 Subsethood on VIFS
Definition 4.1. A real function $Q: VIFS(X) \times VIFS(X) \to [0, 1]$ is called a subsethood if $Q$ has the following properties:
1. $Q(A, B) = 0 \Leftrightarrow A = I, B = \theta$,
2. If $A \le B$, then $Q(A, B) = 1$,
3. If $A \ge B$ and $Q(A, B) = 1$, then $A = B$,
4. If $A \le B \le C$, then $Q(C, A) \le Q(C, B)$ and $Q(C, A) \le Q(B, A)$.
Definition 4.2. We define the function $f: VIFS(X) \times VIFS(X) \to [0, 1]$ by
$f(A, B) = \frac{1}{2n} \Big\{ \sum_{x \in X} \min[1, g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1))] + \sum_{x \in X} \min[1, g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1))] \Big\}$,
where $\varphi: [0, 2] \to [0, 2]$ and $\psi: [0, 2] \to [0, 2]$ have the following properties:
1. $\alpha > \beta \Rightarrow \varphi(\alpha) > \varphi(\beta), \psi(\alpha) > \psi(\beta)$ $(\alpha, \beta \in [0, 2])$,
2. $\varphi(\alpha) = 2 \Leftrightarrow \alpha = 2$; $\psi(\beta) = 0 \Leftrightarrow \beta = 0$,
3. $\varphi(1) = \psi(1) = 1$,
and the function $g: [0, 2] \times [0, 2] \to [0, 2]$ has the following properties:
1. $\alpha > \beta \Rightarrow g(\alpha, \gamma) < g(\beta, \gamma), g(\gamma, \alpha) > g(\gamma, \beta)$ $(\alpha, \beta, \gamma \in [0, 2])$,
2. $g(\alpha, \beta) = 0 \Leftrightarrow \alpha = 2, \beta = 0$,
3. $g(1, 1) = 1$.
Theorem 4.1. $f$ is a subsethood.
Proof. Let $A, B \in VIFS(X)$.
1. For each $x \in X$, we have $\mu_I(x) = 1$, $\nu_I(x) = 0$, $\mu_\theta(x) = 0$ and $\nu_\theta(x) = 1$. Then, taking $A = I$ and $B = \theta$,
$g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) = 0$,
$g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1)) = 0$.
So we get $f(I, \theta) = 0$. Conversely, if $f(A, B) = 0$, then for each $x \in X$ we have $g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) = 0$, and then $\varphi(\mu_A^-(x) - \mu_B^-(x) + 1) = 2$ and $\psi(\nu_A^-(x) - \nu_B^-(x) + 1) = 0$. So we get $\mu_A^-(x) - \mu_B^-(x) + 1 = 2$ (1) and $\nu_A^-(x) - \nu_B^-(x) + 1 = 0$ (2). From (1), we have $\mu_A^-(x) = \mu_B^-(x) + 1$. Since $\mu_A^-(x) \le 1$ and $\mu_B^-(x) \ge 0$, we have $\mu_A^-(x) = 1$ and $\mu_B^-(x) = 0$. From (2), we have $\nu_B^-(x) = \nu_A^-(x) + 1$. Similarly, we have $\nu_A^-(x) = 0$ and $\nu_B^-(x) = 1$. In the same way, we can prove $\mu_A^+(x) = 1$, $\nu_A^+(x) = 0$, $\mu_B^+(x) = 0$ and $\nu_B^+(x) = 1$. Thus we have $\mu_A(x) = 1$, $\nu_A(x) = 0$, $\mu_B(x) = 0$ and $\nu_B(x) = 1$, that is, $A = I$ and $B = \theta$.
2. If $A \le B$, then for each $x \in X$ we have $\mu_A(x) \le \mu_B(x)$ and $\nu_A(x) \ge \nu_B(x)$. So we have the following four inequalities:
$\mu_A^-(x) - \mu_B^-(x) + 1 \le 1$, $\mu_A^+(x) - \mu_B^+(x) + 1 \le 1$, $\nu_A^-(x) - \nu_B^-(x) + 1 \ge 1$, $\nu_A^+(x) - \nu_B^+(x) + 1 \ge 1$.
Then we have $\varphi(\mu_A^-(x) - \mu_B^-(x) + 1) \le 1$, $\varphi(\mu_A^+(x) - \mu_B^+(x) + 1) \le 1$, $\psi(\nu_A^-(x) - \nu_B^-(x) + 1) \ge 1$, $\psi(\nu_A^+(x) - \nu_B^+(x) + 1) \ge 1$. And then we have
$g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) \ge g(1, 1) = 1$,
$g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1)) \ge g(1, 1) = 1$.
So we get
$f(A, B) = \frac{1}{2n} \Big\{ \sum_{x \in X} \min[1, g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1))] + \sum_{x \in X} \min[1, g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1))] \Big\} = 1$.
3. If $f(A, B) = 1$, then for each $x \in X$ we have
$g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) \ge 1$,
$g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1)) \ge 1$.
And from $A \ge B$, we have $\mu_A(x) \ge \mu_B(x)$ and $\nu_A(x) \le \nu_B(x)$ $(\forall x \in X)$. So we have the following four inequalities:
$\mu_A^-(x) - \mu_B^-(x) + 1 \ge 1$, $\mu_A^+(x) - \mu_B^+(x) + 1 \ge 1$, $\nu_A^-(x) - \nu_B^-(x) + 1 \le 1$, $\nu_A^+(x) - \nu_B^+(x) + 1 \le 1$.
If $A \ne B$, then at least one of the four inequalities above is strict for some $x$:
1) If $\mu_A^-(x) - \mu_B^-(x) + 1 > 1$, then we have $g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) < g(1, \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) \le g(1, 1) = 1$. This is a contradiction.
2) If $\mu_A^+(x) - \mu_B^+(x) + 1 > 1$, then we have $g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1)) < g(1, \psi(\nu_A^+(x) - \nu_B^+(x) + 1)) \le g(1, 1) = 1$. This is a contradiction.
3) If $\nu_A^-(x) - \nu_B^-(x) + 1 < 1$, then we have $g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), \psi(\nu_A^-(x) - \nu_B^-(x) + 1)) < g(\varphi(\mu_A^-(x) - \mu_B^-(x) + 1), 1) \le g(1, 1) = 1$. This is a contradiction.
4) If $\nu_A^+(x) - \nu_B^+(x) + 1 < 1$, then we have $g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), \psi(\nu_A^+(x) - \nu_B^+(x) + 1)) < g(\varphi(\mu_A^+(x) - \mu_B^+(x) + 1), 1) \le g(1, 1) = 1$. This is a contradiction.
So we have $A = B$.
4. Let $A \le B \le C$ $(A, B, C \in VIFS(X))$. Then for each $x \in X$, we have $\mu_A(x) \le \mu_B(x) \le \mu_C(x)$ and $\nu_A(x) \ge \nu_B(x) \ge \nu_C(x)$. So we have
$\mu_C^-(x) - \mu_A^-(x) \ge \mu_C^-(x) - \mu_B^-(x)$, $\nu_C^-(x) - \nu_A^-(x) \le \nu_C^-(x) - \nu_B^-(x)$,
$\mu_C^+(x) - \mu_A^+(x) \ge \mu_C^+(x) - \mu_B^+(x)$, $\nu_C^+(x) - \nu_A^+(x) \le \nu_C^+(x) - \nu_B^+(x)$.
Then we get
$f(C, A) = \frac{1}{2n} \Big\{ \sum_{x \in X} \min[1, g(\varphi(\mu_C^-(x) - \mu_A^-(x) + 1), \psi(\nu_C^-(x) - \nu_A^-(x) + 1))] + \sum_{x \in X} \min[1, g(\varphi(\mu_C^+(x) - \mu_A^+(x) + 1), \psi(\nu_C^+(x) - \nu_A^+(x) + 1))] \Big\}$
$\le \frac{1}{2n} \Big\{ \sum_{x \in X} \min[1, g(\varphi(\mu_C^-(x) - \mu_B^-(x) + 1), \psi(\nu_C^-(x) - \nu_B^-(x) + 1))] + \sum_{x \in X} \min[1, g(\varphi(\mu_C^+(x) - \mu_B^+(x) + 1), \psi(\nu_C^+(x) - \nu_B^+(x) + 1))] \Big\} = f(C, B)$.
In the same way, we can prove $f(C, A) \le f(B, A)$.
Example 2. For $\varphi, \psi: [0, 2] \to [0, 2]$ and $g: [0, 2] \times [0, 2] \to [0, 2]$, let $\varphi(x) = x$, $\psi(y) = y$ and $g(x, y) = [(2 - x) + y] \times \frac{1}{2}$. It is straightforward to prove that $\varphi$ and $\psi$ have the following properties:
1) $\alpha > \beta \Rightarrow \varphi(\alpha) > \varphi(\beta), \psi(\alpha) > \psi(\beta)$ $(\alpha, \beta \in [0, 2])$,
2) $\varphi(\alpha) = 2 \Leftrightarrow \alpha = 2$; $\psi(\beta) = 0 \Leftrightarrow \beta = 0$,
3) $\varphi(1) = \psi(1) = 1$,
and that $g$ has the following properties:
1) $\alpha > \beta \Rightarrow g(\alpha, \gamma) < g(\beta, \gamma), g(\gamma, \alpha) > g(\gamma, \beta)$ $(\alpha, \beta, \gamma \in [0, 2])$,
2) $g(\alpha, \beta) = 0 \Leftrightarrow \alpha = 2, \beta = 0$,
3) $g(1, 1) = 1$.
Theorem 4.2 shows the relationship between entropy and subsethood on $VIFS$.
Theorem 4.2. Let $Q$ be a subsethood and let $\rho: VIFS(X) \to [0, 1]$, where $\rho(A) = Q(A \vee A^c, A \wedge A^c)$ $(\forall A \in VIFS(X))$. Then $\rho$ is an entropy.
Proof. 1. For every $A \in \delta(P(X))$, since $A \vee A^c = I$ and $A \wedge A^c = \theta$, we have $\rho(A) = Q(A \vee A^c, A \wedge A^c) = Q(I, \theta) = 0$. Conversely, if $\rho(A) = Q(A \vee A^c, A \wedge A^c) = 0$, then we have $A \vee A^c = I$ and $A \wedge A^c = \theta$, so we get $A \in \delta(P(X))$.
2. If $\rho(A) = 1$, that is, $Q(A \vee A^c, A \wedge A^c) = 1$, then since $A \vee A^c \ge A \wedge A^c$ we have $A \vee A^c = A \wedge A^c$. So we get $\mu_A(x) = \nu_A(x)$. Conversely, if $\mu_A(x) = \nu_A(x)$, we have $A \vee A^c = A \wedge A^c$, so we get $Q(A \vee A^c, A \wedge A^c) = \rho(A) = 1$.
3. If $A$ refines $B$, then we have $A \wedge A^c \le B \wedge B^c \le B \vee B^c \le A \vee A^c$, so we get $\rho(A) = Q(A \vee A^c, A \wedge A^c) \le Q(B \vee B^c, A \wedge A^c) \le Q(B \vee B^c, B \wedge B^c) = \rho(B)$.
4. $\rho(A) = Q(A \vee A^c, A \wedge A^c) = Q(A^c \vee A, A^c \wedge A) = \rho(A^c)$.
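To make Definition 4.2, the choice of $\varphi$, $\psi$, $g$ in the example above, and the induced entropy of Theorem 4.2 concrete, here is a minimal Python sketch. The dict-of-intervals representation and every function name are assumptions for illustration only, and $n$ is read as the number of elements of $X$.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]
VIFS = Dict[str, Tuple[Interval, Interval]]   # x -> (mu_A(x), nu_A(x))


def phi(a: float) -> float:
    return a


def psi(b: float) -> float:
    return b


def g(a: float, b: float) -> float:
    # Decreasing in the first argument, increasing in the second; g(2, 0) = 0, g(1, 1) = 1.
    return ((2.0 - a) + b) / 2.0


def subsethood_f(A: VIFS, B: VIFS) -> float:
    total = 0.0
    for x in A:
        (mu_a, nu_a), (mu_b, nu_b) = A[x], B[x]
        for k in (0, 1):   # k = 0: lower endpoints, k = 1: upper endpoints
            total += min(1.0, g(phi(mu_a[k] - mu_b[k] + 1.0),
                                psi(nu_a[k] - nu_b[k] + 1.0)))
    return total / (2 * len(A))


def complement(A: VIFS) -> VIFS:
    return {x: (nu, mu) for x, (mu, nu) in A.items()}


def join(A: VIFS, B: VIFS) -> VIFS:
    return {x: ((max(A[x][0][0], B[x][0][0]), max(A[x][0][1], B[x][0][1])),
                (min(A[x][1][0], B[x][1][0]), min(A[x][1][1], B[x][1][1])))
            for x in A}


def meet(A: VIFS, B: VIFS) -> VIFS:
    return {x: ((min(A[x][0][0], B[x][0][0]), min(A[x][0][1], B[x][0][1])),
                (max(A[x][1][0], B[x][1][0]), max(A[x][1][1], B[x][1][1])))
            for x in A}


def rho(A: VIFS) -> float:
    # Entropy induced by the subsethood: rho(A) = f(A v A^c, A ^ A^c).
    Ac = complement(A)
    return subsethood_f(join(A, Ac), meet(A, Ac))


full: VIFS = {"x1": ((1.0, 1.0), (0.0, 0.0))}
theta: VIFS = {"x1": ((0.0, 0.0), (1.0, 1.0))}
mixed: VIFS = {"x1": ((0.6, 0.8), (0.1, 0.2))}

print(subsethood_f(full, theta), subsethood_f(theta, full), rho(mixed))
```

With these choices the sketch gives $f(I, \theta) = 0$ and $f(\theta, I) = 1$, and the induced $\rho$ stays strictly between 0 and 1 for a set that is neither crisp nor balanced.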
From Theorem 4.1 and Theorem 4.2, we can construct a class of reasonable subsethood and entropy functions, which are useful in different practical conditions.
5 Conclusion
In this paper, we offered different kinds of entropy functions and subsethood functions, which should be practical in different experiments.
References
1. Zadeh, L.A.: Fuzzy Sets. Inform. and Control 8 (1965) 338–353
2. Zadeh, L.A.: Fuzzy Sets and Systems. In: Proc. Systems Theory, Polytechnic Institute of Brooklyn, New York (1965) 29–67
3. Atanassov, K.T.: Intuitionistic Fuzzy Sets. In: Sgurev, V. (Ed.), VII ITKR's Session, Sofia, June (1983), Central Sci. and Techn. Library, Bulgarian Academy of Sciences (1984)
4. Atanassov, K.T.: Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 20 (1986) 87–97
5. Atanassov, K.T., Gargov, G.: Interval Valued Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 31 (1989) 343–349
6. Atanassov, K.T.: Intuitionistic Fuzzy Sets. Physica-Verlag, Heidelberg, New York (1999)
7. Jaynes, E.T.: Where do We Stand on Maximum Entropy? In: Levine, Tribus (Eds.), The Maximum Entropy Formalism, MIT Press, Cambridge, MA
8. Szmidt, E., Kacprzyk, J.: Entropy for Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 118 (2001) 467–477
9. Liu, Y.-h., Xiong, F.-l.: Subsethood on Intuitionistic Fuzzy Sets. International Conference on Machine Learning and Cybernetics, Vol. 3 (2002) 1336–1339
10. Deschrijver, G., Kerre, E.E.: On the Relationship between some Extensions of Fuzzy Set Theory. Fuzzy Sets and Systems 133 (2003) 227–235
11. Wang, G.-j., He, Y.-m.: Intuitionistic Sets and Fuzzy Sets. Fuzzy Sets and Systems 110 (2000) 271–274
12. De Luca, A., Termini, S.: A Definition of a Non-probabilistic Entropy in the Setting of Fuzzy Sets Theory. Inform. and Control 20 (1972) 301–312
The Comparative Study of Logical Operator Set and Its Corresponding General Fuzzy Rough Approximation Operator Set
Suhua Zheng, Xiaodong Liu, and Fenglan Xiong
Department of Mathematics, Ocean University of China; Qingdao 266071, Shandong, P.R. China
[email protected] [email protected]
Abstract. This paper presents a general framework for the study of fuzzy rough sets in which a constructive approach is used. In this approach, a pair of lower and upper general fuzzy rough approximation operators in the lattice L is defined. Furthermore, the overall properties of, and the connection between, the set of logical operators and the set of its corresponding general fuzzy rough approximation operators are examined, and we prove that they are in 1-1 correspondence. In addition, the structural theorem of the negator operator is given. Finally, the decomposition and synthesis theorems of the general fuzzy rough approximation operator are proved. These results provide a theoretical foundation for promoting a general rough approximation operator to a suitable general fuzzy rough approximation operator, that is, for selecting the logical operator.
1 Introduction
After Pawlak proposed the notion of rough set in 1982 [3], many authors have investigated attribute reduction and knowledge representation based on rough set theory. However, the study of fuzzy rough sets and fuzzy rough approximation theory ([4][5][6]) is not yet thorough enough. In 2002, various kinds of logical operators and their corresponding fuzzy rough sets were defined and investigated in [1]. In 2003, generalized approximation operators were studied in [10]. This paper develops and extends that work: we define the upper and lower general fuzzy rough approximation operators decided by a t-norm and a t-conorm in the lattice L. Besides, we study the overall properties of, and the relations between, the logical operator set and its corresponding fuzzy rough approximation operator set.
This paper is organized as follows. In Section 2, we define the mapping σ from the set of t-norm operators in the lattice L to the set of general fuzzy rough approximation operators, and we prove that σ is a surjection. Furthermore, we construct a 1-1 mapping from the set of t-norm equivalence classes to the set of general upper fuzzy rough approximation operators. Subsequently, in Section 3, we define the mapping Σ_N from the t-norm operator set to the set of pairs of lower and upper fuzzy rough approximation operators. In addition, we give the constructional theorem of the negator, so that we can construct different forms of negator operators and then obtain the t-norm and t-conorm operators that are dual with respect to N, together with their corresponding general lower and upper fuzzy rough approximation operators. In Section 4, we prove the decomposition and synthesis theorems of the general fuzzy rough approximation operator.
In this paper, unless otherwise stated, L is a lattice with least and largest elements 0 and 1; X, Y and Z are finite nonempty sets; F_L(X) denotes the set of all fuzzy sets on X whose range is L; and R ∈ F_L(X × Y) is a serial fuzzy relation [10]: for every x ∈ X, there exists y ∈ Y such that R(x, y) = 1.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 53–58, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 The Properties and Relativity of t-norm Operator Set and General Upper Fuzzy Rough Approximate Operator Set
Definition 2.1 ([1]). A t-norm operator in $L$ is a mapping $t: L \times L \to L$ such that, for all $x, y, y_1, y_2 \in L$, $t$ satisfies the following conditions:
1) $t(x, y) = t(y, x)$;
2) $t(x, 1) = x$;
3) If $y_1 \ge y_2$, then $t(x, y_1) \ge t(x, y_2)$.
From Definition 2.1, it is easy to get the following:
4) $t(x, 0) = 0$ for all $x \in L$;
5) $t(x, y) \le x \wedge y$ for all $x, y \in L$.
Definition 2.2 ([1][3]). Let $t$ be a t-norm in $L$. Then $\varphi_t: F_L(Y) \to F_L(X)$ is called a general upper fuzzy rough approximation operator decided by $t$ if, for every $x \in X$ and $A \in F_L(Y)$, it satisfies
$\varphi_t(A)(x) = \sup_{y \in Y} t(R(x, y), A(y))$.
For brevity, we denote $\varphi_t$ by $\varphi$, and $(X, Y, R)$ is called the general fuzzy rough approximation space.
Theorem 2.1. Let $\lambda \in L$; $x^i, x^j, x^k, x^e \in X$; $y^r, y^d, y^p, y^q \in Y$. Then $\varphi$ has the following properties:
p1) $\varphi(z_\lambda)(x) = t(R(x, z), \lambda)$.
p2) $\varphi(\emptyset) = \emptyset$.
p3) $\varphi(\bigcup_{i \in I} z^i_{\lambda_i}) = \bigcup_{i \in I} \varphi(z^i_{\lambda_i})$ ($z^i_{\lambda_i} \in F_L(Y)$, $I$ is finite, $\lambda_i \in \{\lambda_i \mid i \in I\}$, and for $i, j \in I$, $i \ne j$ it follows that $z^i \ne z^j$).
p4) If $R(x^i, y^d) = R(x^j, y^r)$, it follows that $\varphi(y^d_\lambda)(x^i) = \varphi(y^r_\lambda)(x^j)$.
p5) If $R(x^i, y^d) = 1$, it follows that $\varphi(y^d_\lambda)(x^i) = \lambda$.
p6) $\varphi(y^d_{R(x^j, y^r)})(x^i) = \varphi(y^r_{R(x^i, y^d)})(x^j)$.
p7) If $R(x^i, y^d) \ge R(x^j, y^r)$, it follows that $\varphi(y^d_\lambda)(x^i) \ge \varphi(y^r_\lambda)(x^j)$.
p8) Let $u, v \in L$ satisfy $R(x^i, y^d) < u < R(x^j, y^r)$ and $R(x^k, y^p) < v < R(x^e, y^q)$; it follows that $\varphi(y^d_v)(x^i) \vee \varphi(y^p_u)(x^k) \le \varphi(y^r_v)(x^j) \wedge \varphi(y^q_u)(x^e)$.
From now on, assume that every $\alpha \in L$ is comparable with $R(x, y)$ $(\forall x \in X, y \in Y)$.
Theorem 2.2. Let $L' = \{R(x, y) \mid x \in X, y \in Y\}$, $L' \cup \{0\} = \{0 = L_0 < L_1 < L_2 < \ldots < L_s = 1\}$, $T = \{t \mid t \text{ is a t-norm operator in } L\}$, and $M = \{\varphi \mid \varphi: F_L(Y) \to F_L(X) \text{ and } \varphi \text{ satisfies p2)-p8)}\}$. We define $\sigma: T \to M$ by $\sigma(t) = \varphi_t$. Then $\sigma$ is a surjection.
Proof. First, we construct $t(u, v)$ from $\varphi$:
1) If $0 \in \{u, v\}$, then we define $t(u, v) = 0$.
2) If neither of them is equal to zero and at least one of them belongs to $L' - \{0\}$: if $u \in L' - \{0\}$, then we can assume $u = L_i = R(x^i, y^i)$, where $0 < i \le s$, $x^i \in X$, $y^i \in Y$, so we define $t(u, v) = \varphi(y^i_v)(x^i)$. By Theorem 2.1 p4), the definition is reasonable. Otherwise, let $t(u, v) = t(v, u)$. We can see that if $v = R(x^j, y^j)$, then from p6) it follows that $\varphi(y^i_{R(x^j, y^j)})(x^i) = \varphi(y^j_{R(x^i, y^i)})(x^j)$, i.e., $t(u, v) = t(v, u)$.
3) If $u \ne 0$, $v \ne 0$ and $u, v \notin L' - \{0\}$, let $R(x^i, y^i) = L_i < u < L_{i+1} = R(x^{i+1}, y^{i+1})$ and $R(x^j, y^j) = L_j < v < L_{j+1} = R(x^{j+1}, y^{j+1})$. We define $t(u, v) = \varphi(y^i_v)(x^i) \vee \varphi(y^j_u)(x^j)$. If $R(x, y) \ne 0$ for arbitrary $x \in X$, $y \in Y$, then we define $R(x^0, y^0) = 0 = \varphi(y^0_v)(x^0) = \varphi(y^0_u)(x^0)$.
Now we can prove that $t(u, v)$ is a t-norm and that $\varphi_t = \varphi$. From all of the above, we obtain that $\sigma$ is a surjection.
Theorem 2.3. Let $\sigma$ and $T$ be defined as before and $t_1, t_2 \in T$. Then the following conclusion holds: $\sigma(t_1) = \sigma(t_2)$ if and only if $t_1(R(x, y), \lambda) = t_2(R(x, y), \lambda)$ for every $x \in X$, $y \in Y$, $\lambda \in L$.
Corollary 2.4. Define the equivalence relation on $T$ as follows: $t_1 \sim t_2 \Leftrightarrow t_1(R(x, y), \lambda) = t_2(R(x, y), \lambda)$ for arbitrary $x \in X$, $y \in Y$, $\lambda \in L$. Then there exists $\mu: T/\!\sim \to M$, defined by $\mu([t]) = \sigma(t)$ for all $[t] \in T/\!\sim$; furthermore, $\mu$ is a 1-1 mapping.
Definition 2.3. We use $\underline{t}$ to denote the t-norm that we construct only by 1), 2), 3) in Theorem 2.2. In fact, we can also construct another t-norm from $\varphi$ by changing only 3) as follows: let $R(x^i, y^i) = L_i < u < L_{i+1} = R(x^{i+1}, y^{i+1})$ and $R(x^j, y^j) = L_j < v < L_{j+1} = R(x^{j+1}, y^{j+1})$, where $0 \le i, j \le s - 1$. Then we define $t(u, v) = \varphi(y^{i+1}_v)(x^{i+1}) \wedge \varphi(y^{j+1}_u)(x^{j+1})$. We denote it by $\overline{t}$, and we can prove that $\sigma(\underline{t}) = \sigma(\overline{t})$. In order to show this more clearly, we give Theorem 2.5.
Theorem 2.5. For arbitrary $u, v \in L$, $t \in T$, $\varphi \in M$ with $\sigma(t) = \varphi$, we have $\underline{t}(u, v) \le t(u, v) \le \overline{t}(u, v)$.
3 The Structural Theorem of Negator, the t-norm Equivalent Classes and the Pair of Dual Upper and Lower General Fuzzy Rough Approximation Operator is 1-1 Corresponding
Definition 3.1 ([1]). A t-conorm operator in $L$ is a mapping $g: L \times L \to L$ such that, for all $x, y, y_1, y_2 \in L$, $g$ satisfies the following conditions:
1) $g(x, y) = g(y, x)$;
2) $g(x, 0) = x$;
3) If $y_1 \ge y_2$, then $g(x, y_1) \ge g(x, y_2)$.
From Definition 3.1, it is easy to get the property:
4) $g(x, y) \ge x \vee y$ for every $x, y \in L$.
We denote $G = \{g \mid g \text{ is a t-conorm in } L\}$.
Definition 3.2 ([1]). A function $N: L \to L$ is called a negator if, for all $x, y \in L$, $N$ has the properties:
1) $N(N(x)) = x$;
2) If $x \ge y$, then $N(x) \le N(y)$.
It is easy to see that the following properties hold:
3) $N(0) = 1$, $N(1) = 0$;
4) [8] $\bigwedge_{i \in I} N(x_i) = N(\bigvee_{i \in I} x_i)$, where $I$ is finite and $x_i \in L$.
In addition, we denote $\Pi = \{N \mid N \text{ is a negator in } L\}$, and assume $\Pi \ne \emptyset$.
Definition 3.3 ([1]). Let $t \in T$, $g \in G$, $N \in \Pi$. If they satisfy $t(x, y) = N(g(N(x), N(y)))$, we say that $t$ and $g$ are dual with respect to $N$. It is easy to see that the following property holds: $t$ and $g$ are dual with respect to $N$ if and only if $t(N(x), N(y)) = N(g(x, y))$.
Definition 3.4 ([1][8]). For arbitrary $A \in F_L(Y)$ and $x \in X$, $\psi_g: F_L(Y) \to F_L(X)$ is called a general lower fuzzy rough approximation operator if
$\psi_g(A)(x) = \inf_{y \in Y} g(N(R(x, y)), A(y))$.
Let $x \in Y$ and $A \in F_L(Y)$; we define $N(A) \in F_L(Y)$ by $N(A)(x) = N(A(x))$.
Definition 3.5. Let $\varphi, \psi: F_L(Y) \to F_L(X)$ be the upper and lower general fuzzy rough approximation operators, respectively. We say that they are dual with respect to $N$ if, for every $A \in F_L(Y)$, they satisfy $\varphi(A) = N(\psi(N(A)))$.
Theorem 3.1. Let $E = \{\psi_g \mid g \in G\}$ and $W_N = \{(\varphi, \psi) \mid \varphi, \psi \text{ are dual with respect to } N\} \subset M \times E$, where $N$ is given. For arbitrary $t \in T$, we define the function $\Sigma_N: T \to W_N$ by $\Sigma_N(t) = (\varphi_t, \psi_g)$, where $g$ is dual to $t$ with respect to $N$. Then there exists $\nu: T/\!\sim \to W_N$, defined by $\nu([t]) = \Sigma_N(t)$, and $\nu$ is a 1-1 mapping.
Theorem 3.2. Let $L = [0, 1]$. Then $h(x)$ is a negator if and only if there is a function $f(x)$ on $[0, 1]$ which is continuous, strictly monotone decreasing and satisfies $f(0) = 1$, and there exists $\xi \in (0, 1)$ such that
$h(x) = f(x)$ if $0 \le x \le \xi$, and $h(x) = f^{-1}(x)$ if $\xi < x \le 1$.
Remark. Theorem 3.2 is different from the constructional theorem of negator in [9]. There exist examples that show this.
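The upper and lower operators and their duality can be tried out on a small example. The Python sketch below is an illustration under stated assumptions: it takes $L = [0, 1]$ with the minimum t-norm, the maximum t-conorm and the standard negator $N(a) = 1 - a$ (one dual triple among many, chosen here only for the example), uses exact fractions to avoid rounding, and all names and data are hypothetical.

```python
from fractions import Fraction as Fr


def t_norm(a, b):
    return min(a, b)          # minimum t-norm (assumed choice)


def t_conorm(a, b):
    return max(a, b)          # maximum t-conorm, dual to min under N(a) = 1 - a


def negator(a):
    return 1 - a              # standard negator on [0, 1]


def upper(R, A, X, Y):
    # phi_t(A)(x) = sup over y of t(R(x, y), A(y))
    return {x: max(t_norm(R[x, y], A[y]) for y in Y) for x in X}


def lower(R, A, X, Y):
    # psi_g(A)(x) = inf over y of g(N(R(x, y)), A(y))
    return {x: min(t_conorm(negator(R[x, y]), A[y]) for y in Y) for x in X}


X = ["x1", "x2"]
Y = ["y1", "y2", "y3"]
# A serial relation: every x has some y with R(x, y) = 1.
R = {("x1", "y1"): Fr(1), ("x1", "y2"): Fr(1, 2), ("x1", "y3"): Fr(0),
     ("x2", "y1"): Fr(1, 4), ("x2", "y2"): Fr(1), ("x2", "y3"): Fr(3, 4)}
A = {"y1": Fr(1, 3), "y2": Fr(2, 3), "y3": Fr(1)}

up = upper(R, A, X, Y)
low = lower(R, A, X, Y)
# Duality of Definition 3.5: phi(A) = N(psi(N(A))).
neg_A = {y: negator(a) for y, a in A.items()}
dual = {x: negator(v) for x, v in lower(R, neg_A, X, Y).items()}

print(up, low, dual == up)
```

The last comparison confirms the duality $\varphi(A) = N(\psi(N(A)))$ for this particular choice of operators; other dual pairs $(t, g)$ could be substituted in the same way.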
4 The Decomposition and Composition Theorem of General Fuzzy Rough Approximation Operator
In this section, L is an ordered lattice with the largest and least elements denoted by 1 and 0. It has a negator operator which satisfies that λ = α, for all α ∈ L. λ
1 ⇒ $conf(r) > sup(c_r)$: $a_r$ and $c_r$ are positively correlated, and the following holds:
$0 < conf(r) - sup(c_r) \le 1 - sup(c_r)$.
In particular, we have:
$0 < C_m = \frac{conf(r) - sup(c_r)}{1 - sup(c_r)} \le 1$.
The closer the ratio $C_m$ is to 1, the higher the positive correlation between the antecedent $a_r$ and the class label $c_r$.
3. $Dependence(a_r, c_r) < 1 \Rightarrow conf(r) < sup(c_r)$: $a_r$ and $c_r$ are negatively correlated, and the following holds:
$-sup(c_r) \le conf(r) - sup(c_r) < 0$.
In particular, we have: −1 ≤ Cm =
conf (r) − sup(cr )
MIN TOTAL WEIGHT) do
    A ← A, T ← T ;
    ar ← ∅;
    while (1) do
        foreach literal li ∈ A do CalculateFoilGain(li ) in T ;
        l = AttributeOfBestGain();
        if l.gain ≤ MIN BEST GAIN then
            φ = Cm (ar , ci );
            if φ ≥ φmin ∧ NoMoreGeneralRule(ar ⇒ ci ) then
                R ← R ∧ (ar ⇒ ci , φ);
            end
            if φ ≤ −φmin ∧ NoMoreGeneralRule(a r ⇒ ci ) then
                R ← R ∧ (ar ⇒ ci , φ);
            end
        end
        gainThreshold = bestGain*GAIN SIMILARITY RATIO;
        foreach l ∈ A do
            if l .gain ≥ gainThreshold then
                ar ← ar ; ar ← ar ∧ l ;
                remove ar from A ;
                remove each t ∈ T dissatisfy ar ;
                A ← A , T ← T ;
                CARGenerator(T ,A ,ci );
            end
        end
        ar ← ar ∧ l;
        remove ar from A ;
        remove each t ∈ T dissatisfy ar ;
    end
end
foreach t ∈ P satisfy ar do reduce t.weight by a decay factor
end
Once the classifier has been established in the form of a list of rules, regardless of the methodology used to generate it, there are a number of strategies for using the resulting classifier to classify an unseen data object: (1) choose the single best rule that matches the data object and has the highest rank to make a prediction; (2) collect all rules in the classifier satisfying the given unseen data and make a prediction by the "combined effect" of different class association rules; and (3) select the k best rules for each class, evaluate the average accuracy of each class, and choose the class with the highest expected accuracy as the predicted class. In our case, ACBCA establishes the classifier in the form of a list of ordered rules during the process of rule generation. Instead of taking all rules satisfying the
J. Chen et al.
new data object into consideration, ACBCA selects only a small set of strongly correlated association rules to make a prediction, by the following procedure:
Step 1: Among the rules whose antecedents are satisfied by the new data, select the first k best rules for each class according to the rules' consequents;
Step 2: Calculate the rank of each group by summing up the Cm of the relevant rules. We consider the average accuracy inadvisable because many trivial rules with low accuracy would weaken the effect of the whole group;
Step 3: Assign to the new data the class label of the group with the highest rank.
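The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Rule` record, the `predict` function, the string encoding of literals, and the reading of "k best" as "the k largest Cm values" are all assumptions, and only rules that predict membership in a class are handled.

```python
from collections import defaultdict
from typing import FrozenSet, List, NamedTuple, Optional


class Rule(NamedTuple):
    antecedent: FrozenSet[str]   # literals such as "a=1" (hypothetical encoding)
    label: str                   # class label in the consequent
    cm: float                    # correlation score Cm of the rule


def predict(rules: List[Rule], obj: FrozenSet[str], k: int = 3) -> Optional[str]:
    # Step 1: keep rules whose antecedent is satisfied, grouped by class.
    by_class = defaultdict(list)
    for r in rules:
        if r.antecedent <= obj:
            by_class[r.label].append(r.cm)
    if not by_class:
        return None              # no rule matches the object
    # Step 2: rank each class by the sum of Cm over its k best matching rules.
    ranks = {label: sum(sorted(cms, reverse=True)[:k])
             for label, cms in by_class.items()}
    # Step 3: return the class of the highest-ranked group.
    return max(ranks, key=ranks.get)


rules = [
    Rule(frozenset({"a=1"}), "yes", 0.9),
    Rule(frozenset({"b=2"}), "yes", 0.3),
    Rule(frozenset({"a=1", "b=2"}), "no", 0.8),
    Rule(frozenset({"b=2"}), "no", 0.5),
    Rule(frozenset({"c=3"}), "no", 0.95),   # does not match the object below
]
obj = frozenset({"a=1", "b=2"})

print(predict(rules, obj, k=3), predict(rules, obj, k=1))
```

The example shows why the choice of k matters: with a single best rule per class the decision can differ from the one obtained when several matching rules per class are summed.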
4 Experimental Results and Performance Study
To evaluate the accuracy and efficiency of ABCBA, we have performed an extensive performance study on some datasets from the UCI Machine Learning Repository [10]. It would have been desirable to use the same datasets as those used by other associative classification approaches; however, many of these datasets were found to be no longer available in the UCI repository. All the experiments were performed on a 2.4 GHz Pentium-4 PC with 512 MB of main memory. A 10-fold cross validation was performed on each dataset, and the results are given as the average of the accuracies obtained for each fold. The parameters of ABCBA are set as follows. In the rule generation algorithm, MIN TOTAL WEIGHT is set to 0.05, MIN BEST GAIN to 0.7, GAIN SIMILARITY RATIO to 0.99, φmin to 0.2 and the decay factor to 2/3. The best 3 rules are used in prediction. The experimental results for C4.5, Ripper, CBA, CMAR and CPAR are taken from [6]. Table 1 gives the average predictive accuracy for each algorithm. Bold values denote the best accuracy for the respective dataset. Columns 2 and 3 present the results of the traditional rule-based classification algorithms C4.5 and Ripper. Columns 4 to 6 show the results of several recently developed associative classification methods: CBA, CMAR and CPAR. Column 7 gives the results of ABCBA when only positive rules are considered for classification. Column 8 shows the results of ABCBA when it considers both positive rules of the form ar ⇒ cr and negative rules of the form ar ⇒ cr. As can be seen, ABCBA achieves the best accuracy, or comes very close to it, on almost all occasions. Moreover, the classification accuracy increases when the correlated rules are taken into consideration. This is most noticeable for the Ionosphere, Iris Plant and Pima Indians Diabetes datasets, on which the correlated rules give better guidance to the classification.
5 Conclusions
Associative classification is an important issue in data mining and machine learning involving large, real databases. We have introduced ABCBA, a new associative classification algorithm based on correlation analysis. ABCBA differs from
Associative Classification Based on Correlation Analysis
Table 1. Comparison of Accuracies on Different UCI Datasets

Dataset    C4.5   Ripper  CBA     CMAR    CPAR    ABCBA p  ABCBA all
austral    84.7   87.3    84.9    86.1    86.2    85.22    85.8
breast     95     95.1    96.3    96.4    96      95.1     94.5
cleve      78.2   82.2    82.8    82.2    81.5    78.21    79.21
crx        84.9   84.9    84.7    84.9    85.7    85.51    85.65
diabetes   74.2   74.7    74.5    75.8    75.1    76.67    75.99
german     72.3   69.8    73.4    74.9    73.4    70.9     73.3
hepatic    80.6   76.7    81.8    80.5    79.4    80.83    80.83
horse      82.6   84.8    82.1    82.6    84.2    82.02    82.02
iono       90     91.2    92.3    91.5    92.6    92.02    93.74
iris       95.3   94      94.7    94      94.7    94.67    95.3
labor      79.3   84      86.3    89.7    84.7    89.17    97.17
led7       73.5   69.7    71.9    72.5    73.6    72.56    73.91
pima       75.5   73.1    72.9    75.1    73.8    74.08    76.18
wine       92.7   91.6    95      95      95.5    93.13    94.5
zoo        92.2   88.1    96.8    97.1    95.1    96       94
Average    83.4   83.15   84.69   85.22   84.77   84.41    85.47
existing algorithms for this task insofar, that it does not employ the supportconfidence framework and take both positive and negative rules under consideration. It uses an exhaustive and greedy algorithm based Foil Gain to extract CARs directly from the training set. During the rule building process, instead of selecting only the best literals, ABCBA inherits the basic idea of CPAR, keeping all close-to-the-best literals in rules generation process so that it will not miss the some important rule. Since the class distribution in the dataset and the correlated relationship among attributes should be taken into account because it is conforming to the usual or ordinary course of nature in the real world, we present a new rule scoring schema named Cm to evaluate the correlation of the CARs. The experimental results of the ABCBA show that a much smaller set of positive and negative association rules can achieve best accuracy or comes very close on almost all occasions. Actually, we just use the association rules of the type (ar ⇒ cr ) and (ar ⇒ cr ) for classification. These two type rules have an direct association with the class label, so they can be considered together. However, if there are more than two classes in the dataset, (ar ⇒ cr ) and (ar ⇒ cr ) just provide information that ”not-belong-to”, instead of giving the association between data attributes and class label directly. We are currently investigating reasonable and effective methods to use these two kinds of rules.
References
1. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
2. Cohen, W.W.: Fast effective rule induction. In Prieditis, A., Russell, S.J., eds.: ICML 1995, Tahoe City, California, USA, Morgan Kaufmann (1995) 115-123
3. Liu, B., Hsu, W., Ma, Y.: Integrating Classification and Association Rule Mining. Proceedings KDD-98, New York, 27-31 August. AAAI (1998) 80-86
4. Li, W., Han, J., Pei, J.: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. Proceedings of the 2001 IEEE International Conference on Data Mining, San José, California, USA, IEEE Computer Society (2001)
5. Quinlan, J.R., Cameron-Jones, R.M.: FOIL: A midterm report. In Proceedings of European Conference on Machine Learning, Vienna, Austria (1993) 3-20
6. Yin, X., Han, J.: CPAR: Classification based on Predictive Association Rules. Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003
7. Piatetsky-Shapiro, G.: Discovery, Analysis, and Presentation of Strong Rules. In Piatetsky-Shapiro, G., Frawley, W.J., eds.: Knowledge Discovery in Databases. AAAI/MIT Press (1991) 229-238
8. Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: Generalizing association rules to correlations. SIGMOD Record (ACM Special Interest Group on Management of Data) (1997) 265-276
9. Wu, X., Zhang, C., Zhang, S.: Efficient Mining of Both Positive and Negative Association Rules. ACM Transactions on Information Systems 22(3) (2004) 381-405
10. Blake, C.L., Merz, C.J.: UCI Repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, CA: University of California, Department of Information and Computer Science (1998)
Design of Interpretable and Accurate Fuzzy Models from Data
Zong-yi Xing 1, Yong Zhang 1, Li-min Jia 2, and Wei-li Hu 1
1 Nanjing University of Science and Technology, Nanjing 210094, China
2 Beijing Jiaotong University, Beijing, 100044, China
[email protected]
Abstract. An approach to identifying interpretable and accurate fuzzy models from data is presented in this paper. Firstly, the Gustafson-Kessel fuzzy clustering algorithm is used to identify the initial fuzzy model, and cluster validity indices are adopted to determine the number of rules. Secondly, the orthogonal least squares method and a similarity measure of fuzzy sets are utilized to reduce the initial fuzzy model and improve its interpretability. Thirdly, a constrained Levenberg-Marquardt algorithm is used to optimize the reduced fuzzy model and improve its accuracy. The proposed approach is applied to a pH neutralization process, and the results show its validity.
1 Introduction
In recent years, fuzzy modeling has become an active research area owing to its successful application to classification, data mining, pattern recognition, simulation, prediction, control, etc. [1-4]. Several fuzzy modeling methods have been proposed, including fuzzy clustering based algorithms [5], neuro-fuzzy systems [6-7] and genetic rule generation [8-9]. However, all these techniques focus only on precision, fitting the data with the highest possible accuracy, and neglect the interpretability of the obtained fuzzy models, which is considered a primary merit of fuzzy systems and is the most prominent feature that distinguishes them from many other models [10].

Several methods have been developed to improve the interpretability of fuzzy models. Setnes et al. [11] proposed a set-theoretic similarity measure to quantify the similarity among fuzzy sets and to reduce the number of fuzzy sets in the model. Yen et al. [12] introduced several orthogonal transformation techniques for selecting the most important fuzzy rules from a given rule base in order to construct a compact fuzzy model. Abonyi et al. [13] proposed a combined method to create simple Takagi-Sugeno fuzzy models that can be effectively used to represent complex systems. Delgado [14] presented fuzzy modeling as a multi-objective decision making problem with accuracy, interpretability and autonomy as goals; all these goals are handled via a single-objective ε-constrained decision making problem, which is solved by a hierarchical evolutionary algorithm.

This paper proposes systematic techniques to construct interpretable and accurate fuzzy models. The paper is organized as follows. Section 2 describes the Takagi-Sugeno fuzzy model. The systematic fuzzy modeling approach, including the fuzzy clustering algorithm, cluster validity measures, rule reduction, fuzzy set merging and constrained Levenberg-Marquardt optimization, is presented in Section 3. The proposed method is demonstrated on the pH neutralization benchmark in Section 4. Section 5 concludes the paper.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 69-78, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Takagi-Sugeno Fuzzy Model
The Takagi-Sugeno (TS) fuzzy model [1] was proposed by Takagi and Sugeno in an effort to develop a systematic approach to generating fuzzy models from a given input-output data set. A typical fuzzy rule of the model has the form

$$R^i:\ \text{IF } x_1 \text{ is } A_{i,1} \text{ and } \dots \text{ and } x_p \text{ is } A_{i,p}\ \text{ THEN } y_i = a_{i0} + a_{i1} x_1 + \dots + a_{ip} x_p \tag{1}$$

where $x_j$ are the input variables, $A_{ij}$ are fuzzy sets defined on the universe of discourse of the input variables, and $y_i$ are the outputs of the rules. The output of the TS fuzzy model is computed using the normalized fuzzy mean formula

$$y(k) = \sum_{i=1}^{c} p_i(x)\, y_i \tag{2}$$

where c is the number of rules and $p_i$ is the normalized firing strength of the ith rule:

$$p_i(x) = \frac{\prod_{j=1}^{p} A_{ij}(x_j)}{\sum_{l=1}^{c} \prod_{j=1}^{p} A_{lj}(x_j)} \tag{3}$$

Given N input-output data pairs $\{x_k, y_k\}$, the model in (2) can be written as a linear regression problem

$$y = P\theta + e \tag{4}$$

where θ is the matrix of rule consequents and e is the approximation error matrix. In this paper, Gaussian membership functions are used to represent the fuzzy sets $A_{ij}$:

$$A_{ij}(x_j) = \exp\left(-\frac{1}{2}\,\frac{(x_j - v_{ij})^2}{\sigma_{ij}^2}\right) \tag{5}$$

where $v_{ij}$ and $\sigma_{ij}^2$ are the center and the variance of the Gaussian function, respectively.
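The inference defined by (1)-(5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the two-rule, one-input model below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership of Eq. (5): exp(-0.5 * (x - v)^2 / sigma^2)."""
    return math.exp(-0.5 * (x - center) ** 2 / sigma ** 2)

def ts_output(x, centers, sigmas, consequents):
    """Output of a Takagi-Sugeno model for one input vector x.

    centers[i][j], sigmas[i][j]: Gaussian parameters of fuzzy set A_ij.
    consequents[i] = [a_i0, a_i1, ..., a_ip]: linear consequent of rule i.
    Returns the model output and the normalized firing strengths.
    """
    firing = []
    for v_i, s_i in zip(centers, sigmas):
        beta = 1.0
        for x_j, v_ij, s_ij in zip(x, v_i, s_i):
            beta *= gaussian_mf(x_j, v_ij, s_ij)  # product over inputs, Eq. (3)
        firing.append(beta)
    total = sum(firing)
    p = [b / total for b in firing]               # normalized firing strengths
    y_rules = [a[0] + sum(a_j * x_j for a_j, x_j in zip(a[1:], x))
               for a in consequents]              # rule outputs y_i, Eq. (1)
    y = sum(p_i * y_i for p_i, y_i in zip(p, y_rules))  # Eq. (2)
    return y, p

# Hypothetical model: rule 1 is centered at 0 with y_1 = 1,
# rule 2 is centered at 4 with y_2 = 2 * x_1.
centers = [[0.0], [4.0]]
sigmas = [[1.0], [1.0]]
consequents = [[1.0, 0.0], [0.0, 2.0]]
y, p = ts_output([2.0], centers, sigmas, consequents)
print(y, p)
```

At x = 2 both rules fire equally, so the output is the mean of the two rule outputs 1 and 4, i.e. 2.5.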
3 Design of Data-Driven Interpretable and Accurate Fuzzy Models
3.1 Construct Initial Fuzzy Models Using Fuzzy Clustering Algorithm
The Gustafson-Kessel (GK) algorithm [15] is employed to construct the initial fuzzy models. Its objective function is

$$J(Z; U, V) = \sum_{i=1}^{c}\sum_{k=1}^{N} (\mu_{ik})^m D_{ik}^2 \tag{6}$$

where Z is the set of data, $U = [\mu_{ik}]$ is the fuzzy partition matrix, $V = [v_1, v_2, \dots, v_c]^T$ is the set of cluster centers, c is the number of clusters, N is the number of data, m is the weighting exponent, and $\mu_{ik}$ is the membership degree of the kth data point in the ith cluster, which satisfies the conditions

$$\mu_{ik} \in [0,1]; \quad \sum_{i=1}^{c} \mu_{ik} = 1; \quad 0 < \sum_{k=1}^{N} \mu_{ik} < N$$

[...]

1) [...] $\varepsilon > 0$.
2) Generate the partition matrix U with random memberships.
3) Compute the parameters of the model using (13), (14) and (17).
4) Calculate the distance norm using (8).
5) Update the partition matrix U using (12).
6) Stop if $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$, else go to 3).
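The iteration can be illustrated with a small, self-contained sketch. It assumes the standard GK update formulas from the clustering literature (fuzzy weighted centers, fuzzy covariance matrices, a volume-normalised Mahalanobis distance with all cluster volumes set to 1, and the usual membership update), which may differ in detail from the equations used by the authors. To keep the linear algebra explicit it handles two-dimensional data only; the function name `gk_cluster_2d`, the max-norm stopping test and the sample data are assumptions made for this illustration.

```python
import random

def gk_cluster_2d(data, c, m=2.0, eps=1e-6, max_iter=100, seed=0):
    """Gustafson-Kessel clustering sketch for 2-D points (cluster volumes = 1)."""
    rng = random.Random(seed)
    n = len(data)
    # Random partition matrix, normalized so that each column sums to one.
    u = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    centers = []
    for _ in range(max_iter):
        centers, dist2 = [], []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            sw = sum(w)
            # Cluster center: membership-weighted mean of the data.
            vx = sum(wk * z[0] for wk, z in zip(w, data)) / sw
            vy = sum(wk * z[1] for wk, z in zip(w, data)) / sw
            centers.append((vx, vy))
            # Fuzzy covariance matrix F = [[fxx, fxy], [fxy, fyy]].
            fxx = sum(wk * (z[0] - vx) ** 2 for wk, z in zip(w, data)) / sw
            fyy = sum(wk * (z[1] - vy) ** 2 for wk, z in zip(w, data)) / sw
            fxy = sum(wk * (z[0] - vx) * (z[1] - vy) for wk, z in zip(w, data)) / sw
            det = max(fxx * fyy - fxy * fxy, 1e-12)   # guard against singular F
            s = det ** -0.5          # det(F)^(1/2) / det(F) for dimension 2
            # Squared distance (z - v)^T A (z - v) with A = det(F)^(1/2) F^-1.
            row = []
            for z in data:
                dx, dy = z[0] - vx, z[1] - vy
                d2 = s * (fyy * dx * dx - 2.0 * fxy * dx * dy + fxx * dy * dy)
                row.append(max(d2, 1e-12))
            dist2.append(row)
        # Membership update: mu_ik = 1 / sum_j (D_ik / D_jk)^(2 / (m - 1)).
        new_u = [[1.0 / sum((dist2[i][k] / dist2[j][k]) ** (1.0 / (m - 1.0))
                            for j in range(c))
                  for k in range(n)] for i in range(c)]
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change <= eps:   # stopping test on the change of U (max norm)
            break
    return u, centers

# Hypothetical data: two compact groups of points.
data = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
        (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5)]
u, centers = gk_cluster_2d(data, c=2)
print(centers)
```

Because the initial partition is random, the result depends on the seed; in practice several restarts are commonly compared by the value of the objective function.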
3.2 Cluster Validation Indices
It is essential to determine the number of rules, i.e. the number of fuzzy clusters. This can be done by validation analysis using cluster validity indices. There are two categories of fuzzy validity indices: the first uses only the membership values of a fuzzy partition of the data, whereas the second involves both the partition matrix and the data itself. The partition coefficient (PC) and the partition entropy coefficient (PE) [16] are typical cluster validity indices of the first category:

$$PC(c) = \frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^2 \tag{18}$$

$$PE(c) = -\frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik} \log_a \mu_{ik} \tag{19}$$

where $PC(c) \in [1/c, 1]$ and $PE(c) \in [0, \log_a c]$. The number of clusters corresponding to a significant knee is selected as the optimal number of rules. The compactness and separation validity function proposed by Xie and Beni (XB) [17] is representative of the second category. The smallest value of XB indicates the optimal clustering, where

$$XB(c) = \frac{\sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^m \, \|x_k - v_i\|^2}{N \cdot \min_{i \ne j} \|v_i - v_j\|^2} \tag{20}$$
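The three indices (18)-(20) are straightforward to compute from a partition matrix. The sketch below is only illustrative; the function names are hypothetical, the partition matrix is stored as a list of c rows with N memberships each, and terms with zero membership are skipped in the entropy (taking 0 log 0 = 0).

```python
import math

def partition_coefficient(u):
    """PC of Eq. (18): mean of the squared memberships over the N data."""
    n = len(u[0])
    return sum(mu ** 2 for row in u for mu in row) / n

def partition_entropy(u, base=math.e):
    """PE of Eq. (19); terms with mu = 0 contribute nothing."""
    n = len(u[0])
    return -sum(mu * math.log(mu, base) for row in u for mu in row if mu > 0) / n

def xie_beni(u, data, centers, m=2.0):
    """XB of Eq. (20): compactness divided by N times the minimum center separation."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    c, n = len(centers), len(data)
    compactness = sum(u[i][k] ** m * sqdist(data[k], centers[i])
                      for i in range(c) for k in range(n))
    separation = min(sqdist(centers[i], centers[j])
                     for i in range(c) for j in range(c) if i != j)
    return compactness / (n * separation)

# Hypothetical crisp partition of four one-dimensional points into two clusters.
data = [(0.0,), (1.0,), (4.0,), (5.0,)]
centers = [(0.5,), (4.5,)]
u_crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
u_fuzzy = [[0.5] * 4, [0.5] * 4]
print(partition_coefficient(u_crisp), partition_entropy(u_crisp))
print(partition_coefficient(u_fuzzy), partition_entropy(u_fuzzy))
print(xie_beni(u_crisp, data, centers))
```

The crisp partition reaches the bounds PC = 1 and PE = 0, while the completely fuzzy one gives PC = 1/c and PE = log c, which matches the ranges stated above.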
3.3 Rule Reduction Based on Orthogonal Least Square Method
In the process of cluster validity analysis, noise or abnormal data may be clustered and create redundant or incorrect fuzzy rules. This problem can be solved by rule reduction using the orthogonal least squares (OLS) method [18]. The OLS method transforms the columns of the firing strength matrix into a set of orthogonal basis vectors. Using the Gram-Schmidt orthogonalization procedure, the firing strength matrix P is decomposed into

$$P = WA \tag{21}$$

where W is a matrix with orthogonal columns $w_i$, and A is an upper triangular matrix with unity diagonal elements. Substituting (21) into (4) yields

$$y = WA\theta + e = Wg + e \tag{22}$$

where $g = A\theta$. Since the columns $w_i$ of W are orthogonal, the sum of squares of y(k) can be written as

$$y^T y = \sum_{i=1}^{c} g_i^2\, w_i^T w_i + e^T e \tag{23}$$

Dividing both sides of (23) by N, it can be seen that the part of the output variance $y^T y / N$ explained by the regressors is $\sum_{i} g_i^2\, w_i^T w_i / N$, and the error reduction ratio due to an individual rule is defined as

$$[err]_i = \frac{g_i^2\, w_i^T w_i}{y^T y}, \quad 1 \le i \le c. \tag{24}$$
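As a sketch of how (21)-(24) fit together, the following code orthogonalizes the columns of a firing strength matrix with classical Gram-Schmidt and returns one error reduction ratio per rule. It processes the columns in their given order, which is a simplifying assumption: a forward-regression implementation would instead pick, at each step, the remaining column with the largest ratio. The function name and the small matrix are hypothetical.

```python
def error_reduction_ratios(P, y):
    """Error reduction ratios of Eq. (24) for the columns of P, in the given order.

    P: list of N rows, each holding the c firing strengths of one sample.
    y: list of N outputs.
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    c = len(P[0])
    columns = [[row[i] for row in P] for i in range(c)]
    W = []
    for p in columns:
        w = p[:]
        for w_prev in W:
            # Remove the component of p along each earlier orthogonal vector.
            alpha = dot(w_prev, p) / dot(w_prev, w_prev)
            w = [wi - alpha * wpi for wi, wpi in zip(w, w_prev)]
        W.append(w)
    yy = dot(y, y)
    ratios = []
    for w in W:
        ww = dot(w, w)
        g = dot(w, y) / ww            # least squares coefficient for w
        ratios.append(g * g * ww / yy)
    return ratios

# Hypothetical firing strength matrix with two rules and two samples.
P = [[1.0, 1.0],
     [0.0, 1.0]]
y = [3.0, 4.0]
ratios = error_reduction_ratios(P, y)
print(ratios)
```

Here the second column is orthogonalized to (0, 1), so the two ratios are 9/25 and 16/25; they sum to one because two independent columns fit two samples exactly and leave no residual.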
The error reduction ratio offers a simple means of seeking a subset of important rules in a forward-regression manner. If r rules are to be used to construct the fuzzy model, the r rules with the largest error reduction ratios are selected. If the importance measure of a fuzzy rule is far smaller than that of the others, and deleting this rule does not deteriorate precision, the rule is removed to improve the interpretability of the fuzzy model.
3.4 Similarity Merging of Fuzzy Sets
Fuzzy models obtained as above may contain redundant information in the form of similarity between fuzzy sets. This makes the fuzzy model hard to interpret, because it is difficult to assign qualitatively meaningful labels to similar fuzzy sets. To acquire an effective and interpretable fuzzy model, it is necessary to eliminate redundancy and make the fuzzy model as simple as possible. For two similar fuzzy sets, a similarity measure is utilized to determine whether they should be combined into a new fuzzy set. For fuzzy sets A and B, a similarity measure based on set-theoretic operations [11] is defined as

$$S(A, B) = \frac{\sum_{k=1}^{N} [\mu_A(x_k) \wedge \mu_B(x_k)]}{\sum_{k=1}^{N} [\mu_A(x_k) \vee \mu_B(x_k)]} \tag{25}$$
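The measure (25) reduces to a ratio of two sums once the membership functions are sampled on a discrete universe. The sketch below is illustrative only: the grid, the Gaussian parameters, the threshold value and the function names are hypothetical, and the merging rule itself is not implemented.

```python
import math

def similarity(mu_a, mu_b):
    """Similarity of Eq. (25): sum of pointwise minima over sum of pointwise maxima."""
    num = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    den = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    return num / den

def sample_gaussian(center, sigma, grid):
    """Sample a Gaussian membership function on a discrete universe."""
    return [math.exp(-0.5 * (x - center) ** 2 / sigma ** 2) for x in grid]

grid = [i * 0.1 for i in range(101)]      # hypothetical universe on [0, 10]
a = sample_gaussian(3.0, 1.0, grid)
b = sample_gaussian(3.2, 1.0, grid)       # almost the same set as a
c = sample_gaussian(8.0, 1.0, grid)       # far away from a
tau = 0.6                                 # hypothetical merging threshold
print(similarity(a, b) > tau, similarity(a, c) > tau)
```

With these values the pair (a, b) exceeds the threshold and would be a candidate for merging, while the pair (a, c) overlaps very little and would be kept separate.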
Here $X = \{x_k \mid k = 1, 2, \dots, N\}$ is the discrete universe, and ∧ and ∨ are the minimum and maximum operators, respectively. S is a similarity measure in [0,1]: S = 1 means that the compared fuzzy sets are equal, while S = 0 indicates that the fuzzy sets do not overlap. If $S(A, B) > \tau$, i.e. the fuzzy sets are very similar, the two fuzzy sets A and B are merged to create a new fuzzy set C, where τ is a predefined threshold. In general, $\tau \in [0.4, 0.7]$ is a good choice.
3.5 Optimization
After rule reduction and merging of similar fuzzy sets, the interpretability of the initial fuzzy model is improved, while its precision is reduced. It is therefore essential to optimize the reduced fuzzy model to improve its precision while preserving its interpretability. The relationship between the precision and the parameters of the fuzzy model is strongly nonlinear, so a robust optimization technique should be applied to ensure good convergence. The constrained Levenberg-Marquardt (LM) method [19] is adopted in this paper. The premise parameters are allowed to change only within a range of ±α% around their initial values in order to preserve the distinguishability of the fuzzy sets. To maintain the local interpretability of the fuzzy model, the consequent parameters are restricted to vary within ±β% of their initial values.
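The ±α% and ±β% restrictions amount to box constraints around the initial parameter values. The sketch below shows one simple way to derive such bounds and to enforce them by clipping a candidate update back into the box. This projection step is an assumption made for illustration; the constrained LM method of [19] may enforce the constraints differently, and the function names and numbers are hypothetical.

```python
def relative_bounds(initial, percent):
    """Lower and upper bounds allowing +/- percent % around each initial value."""
    lower, upper = [], []
    for value in initial:
        delta = abs(value) * percent / 100.0
        lower.append(value - delta)
        upper.append(value + delta)
    return lower, upper

def clip_to_bounds(params, lower, upper):
    """Project a candidate parameter vector back into the box [lower, upper]."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(params, lower, upper)]

# Hypothetical premise parameters (two centers) with alpha = 5.
initial_centers = [10.0, -2.0]
lower, upper = relative_bounds(initial_centers, 5.0)
candidate = [11.0, -2.05]                 # proposed by an unconstrained step
constrained = clip_to_bounds(candidate, lower, upper)
print(lower, upper, constrained)
```

In this example the first center is pulled back to its upper bound, while the second already lies inside its interval and is left unchanged.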
4 Example
pH neutralization [20] is a typical nonlinear system with three influent streams (acid, buffer and base) and one effluent stream. The influent buffer stream and the influent acid stream are kept constant, and the influent base stream changes randomly. The output is the pH in the tank. To determine the number of rules, the cluster validity indices PC, PE, XB and Average Partition density (PA) [21] are adopted. Fig. 1 shows the result of the cluster validity analysis. All the cluster validity indices indicate that the optimal number of rules is 3. The GK fuzzy clustering algorithm is used to construct a fuzzy model with 3 fuzzy rules. The fuzzy sets of the obtained fuzzy model are illustrated in Fig. 2(b). The fuzzy sets are clearly distinguishable, and it is easy to assign an understandable linguistic term to each fuzzy set. The OLS method is adopted to pick out unnecessary fuzzy rules. The importance measures of the fuzzy rules are [0.2249 0.0250 0.7438], where the measure of the 2nd fuzzy rule is far smaller than the others. If precision is not considered, the 2nd fuzzy rule can be deleted. Fig. 2(a) shows the fuzzy sets of the model with 2 rules.
Fig. 1. Cluster validity indices: PC, PE, XB and PA plotted against the number of rules (2 to 10), panels (a)-(d)
After rule reduction, the interpretability of the fuzzy model is improved, while the precision is reduced. If the model with 2 rules satisfies practical demands, this model is employed; otherwise, the model with 3 rules is preserved. In this paper the reduced fuzzy model with 2 rules is adopted in order to demonstrate interpretability. The fuzzy set merging process is then carried out sequentially. The similarity measure between the fuzzy sets of flow rate is 0.3420, and the measure between the fuzzy sets of pH is 0.0943. If precision and physical reality were not considered, the fuzzy sets of flow rate could be merged into a new fuzzy set, and the flow rate variable could then be deleted because it would contain only one fuzzy set. In practice, however, the similarity measure between the fuzzy sets of flow rate is small, flow rate is an independent variable, and the precision of the model deteriorates after merging, so the fuzzy sets of flow rate are not merged in this paper.

To illustrate the interpretability and precision of the different fuzzy models, Fig. 2 shows the fuzzy sets of the models with 2, 3, 4 and 5 fuzzy rules, and Table 1 lists the corresponding errors and numbers of rules and fuzzy sets. As the number of rules increases, the precision improves while the interpretability is reduced. When the number of rules is 4 or 5, the fuzzy sets overlap heavily, whereas the precision increases only a little.

$$\begin{aligned} R^1:\ &\text{If } FR(k) \text{ is High and } PH(k) \text{ is High Then } PH(k+1) = 0.0321\,FR(k) + 0.8614\,PH(k) + 0.6659 \\ R^2:\ &\text{If } FR(k) \text{ is Low and } PH(k) \text{ is Low Then } PH(k+1) = 0.1103\,FR(k) + 0.7521\,PH(k) + 0.2614 \end{aligned} \tag{26}$$
Fig. 2. Fuzzy sets of different fuzzy models: membership functions μ over flow rate and PH for models with (a) 2, (b) 3, (c) 4 and (d) 5 fuzzy rules

Table 1. Comparison of different fuzzy models

| Number of fuzzy rules | Number of fuzzy sets | Training error | Validation error | Validation error decrease |
|---|---|---|---|---|
| 2 | 2 | 1.6290 | 1.7547 | - |
| 2 | 4 | 0.4020 | 0.3433 | 411.13% |
| 3 | 6 | 0.3791 | 0.3160 | 8.64% |
| 4 | 8 | 0.3716 | 0.3119 | 1.31% |
| 5 | 10 | 0.3739 | 0.3099 | 0.65% |
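The last row of Table 1 is not defined explicitly in the text. The listed percentages appear to be the reduction in validation error between consecutive models, expressed relative to the error of the more complex model; the short check below reproduces them under that assumption.

```python
# Validation errors of Table 1, from the simplest to the most complex model.
validation_error = [1.7547, 0.3433, 0.3160, 0.3119, 0.3099]

# Relative decrease (in percent) with respect to the error of the next model.
decrease = [round(100.0 * (prev - cur) / cur, 2)
            for prev, cur in zip(validation_error, validation_error[1:])]
print(decrease)
```

Under this reading, going from four to six fuzzy sets reduces the validation error by less than 9%, and each further pair of fuzzy sets by less than 2%, which supports the observation that additional rules improve precision only slightly.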
The fuzzy model is optimized by the Levenberg-Marquardt method. The constraint on the antecedent parameters is 5%, and the constraint on the consequent parameters is 15%. The optimized fuzzy model is given by (26), where FR(k) is the value of the flow rate and PH(k) is the value of the pH. Fig. 3(a) shows the fuzzy sets of the model. After optimization, the training error and validation error of the model are 0.3711 and 0.3088, respectively. The precision of the model is thus improved without decreasing its interpretability. Fig. 3(b) compares the model outputs with the measured outputs.
Fig. 3. Fuzzy sets of the model and comparison of model output and measured output: (a) fuzzy sets Low and High for flow rate and PH; (b) model output and measured output against the number of data
5 Conclusion
This paper proposes systematic techniques to construct interpretable and accurate fuzzy models. Firstly, the GK fuzzy clustering algorithm is used to identify the fuzzy model, and cluster validity indices are adopted to determine the number of rules. Secondly, the orthogonal least squares method and a similarity measure of fuzzy sets are utilized to reduce the initial fuzzy model and improve its interpretability. Thirdly, a constrained Levenberg-Marquardt algorithm is used to optimize the reduced fuzzy model. Finally, the simulation on pH neutralization illustrates the effectiveness of the proposed methods.

Acknowledgements
This work is supported by the scientific research foundation of Nanjing University of Science and Technology, and by the Jiangsu Planned Projects for Postdoctoral Research Funds.
References
[1] Takagi, T., Sugeno, M.: Fuzzy identification of systems and its application to modeling and control. IEEE Trans on Systems Man and Cybernetics, 1985, 15(1): 116-132
[2] Sugeno, M., Yasukawa, T.: A fuzzy-logic-based approach to qualitative modeling. IEEE Trans on Fuzzy Systems, 1993, 1(1): 7-31
[3] Wang, L. X.: Adaptive fuzzy systems and control: design and stability analysis. Prentice Hall, 1994
[4] Min, Y. C., Linkens, D. A.: Rule-base self-generation and simplification for data-driven fuzzy models. Fuzzy Sets and Systems, 2004, 142(2): 243-265
[5] Gomez-Skarmeta, A. F., Delgado, M., Vila, M. A.: About the use of fuzzy clustering techniques for fuzzy model identification. Fuzzy Sets and Systems, 1999, 106(2): 179-188
[6] Jang, J. R.: ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans on Systems Man and Cybernetics, 1993, 23: 665-684
[7] Lefteri, H. T., Robert, E. U.: Fuzzy and Neural Approaches in Engineering. Wiley, 1997
[8] Cordon, O., Herrera, F., Hoffmann, F., Magdalena, L.: Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Rule Bases. World Scientific, 2001
[9] Cordon, O., Gomide, F., Herrera, F., Hoffmann, F., Magdalena, L.: Ten Years of Genetic Fuzzy Systems: Current Framework and New Trends. Fuzzy Sets and Systems, 2004, 141(1): 5-31
[10] Babuska, R., Bersini, H., Linkens, D. A., Nauck, D., Tselentis, G., Wolkenhauer, O.: Future Prospects for Fuzzy Systems and Technology [EL/OB]. ERUDIT Newsletter, Aachen, Germany, 6(1), 2000. Available: http://www.erudit.de/erudit/newsletters/news 61/page5.htm
[11] Setnes, M., Babuska, R., Kaymak, U., Lemke, H. R. N.: Similarity Measures in Fuzzy Rule Base Simplification. IEEE Trans on Systems Man and Cybernetics, 1998, 28(3): 376-386
[12] Yen, J., Wang, L.: Simplifying fuzzy modeling by both gray relational analysis and data transformation methods. IEEE Trans on Systems Man and Cybernetics, 1999, 29(1): 13-24
[13] Abonyi, J., Roubos, J. A., Oosterom, M., Szeifert, F.: Compact TS-fuzzy models through clustering and OLS plus FIS model reduction. Proc of IEEE int conf on fuzzy systems, Sydney, Australia, 2001: 1420-1423
[14] Delgado, M. R., Zuben, F. V., Gomide, F.: Multi-Objective Decision Making: Towards Improvement of Accuracy, Interpretability and Design Autonomy in Hierarchical Genetic Fuzzy Systems. Proc of IEEE int conf on fuzzy systems, Honolulu, Hawaii, 2002: 1222-1227
[15] Gustafson, D., Kessel, W.: Fuzzy clustering with a fuzzy covariance matrix. Proc of IEEE conf on decision and control, San Diego, USA, 1979: 761-766
[16] Bezdek, J. C.: Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981
[17] Xie, X. L., Beni, G.: A Validity Measure for Fuzzy Clustering. IEEE Trans on Pattern Analysis and Machine Intelligence, 1991, 13(8): 841-847
[18] Yen, J., Wang, L.: Simplifying fuzzy modeling by both gray relational analysis and data transformation methods. IEEE Trans on Systems Man and Cybernetics, 1999, 29(1): 13-24
[19] Gaweda, A. E.: Optimal data-driven rule extraction using adaptive fuzzy neural models. University of Louisville, 2002
[20] Babuska, R.: Fuzzy Modeling for Control. Boston: Kluwer Academic Publishers, 1998
[21] Gath, I., Geva, A. B.: Fuzzy clustering for the estimation of the parameters of the components of mixtures of normal distributions. Pattern Recognition Letters, 1989, 9: 77-86
Generating Extended Fuzzy Basis Function Networks Using Hybrid Algorithm
Bin Ye, Chengzhi Zhu, Chuangxin Guo, and Yijia Cao
College of Electrical Engineering, National Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
[email protected], [email protected]
Abstract. This paper presents a new kind of Evolutionary Fuzzy System (EFS) based on the Least Squares (LS) method and a hybrid learning algorithm: Adaptive Evolutionary-programming and Particle-swarm-optimization (AEPPSO). The structure of the Extended Fuzzy Basis Function Network (EFBFN) is first proposed, and the LS method is used to design it with preset widths of the hidden units. Then, to further enhance the performance of the obtained EFBFN, a novel learning algorithm based on least squares and the hybrid of evolutionary programming and particle swarm optimization (AEPPSO) is proposed, in which EPPSO tunes the premise parameters of the EFBFN while the LS algorithm simultaneously determines its consequent parameters. In the simulation part, the proposed method is employed to predict a chaotic time series. Comparisons with some typical fuzzy modeling methods and artificial neural networks are presented and discussed.
1 Introduction
In recent years, various neural-fuzzy networks have been proposed. Among them, fuzzy basis function networks (FBFNs) [1], which are similar in structure to radial basis function networks (RBFNs) [2], have gained much attention. In addition to their simple structure, FBFNs have the advantage that they can readily adopt the various learning algorithms already developed for RBFNs. Among the methods for designing FBFNs, the OLS method [1] has attracted much attention because of its simple and straightforward computation and reliable performance. However, its performance depends largely on the preset input membership functions (MFs), because the parameters of the MFs remain unchanged during the learning process. Furthermore, it fails to yield a meaningful fuzzy system after learning [3].

In this paper, we extend the original FBFN with singleton output fuzzy variables to an extended FBFN with first-order fuzzy output, named EFBFN. The least squares (LS) method is used to select the significant fuzzy rules from the candidate fuzzy basis functions (FBFs [1]) based on an error reduction measure that is calculated by a projection matrix instead of by orthogonalization. While the LS algorithm is a computationally efficient way of constructing a fuzzy system, its performance depends on the preset parameters of the basis functions. It is therefore necessary to tune the parameters of the obtained MFs within a small range using a tuning algorithm. Here, a tuning algorithm based on least squares and the hybrid of evolutionary programming and particle swarm optimization, named Adaptive Evolutionary-programming and Particle-swarm-optimization (AEPPSO), is proposed. The hybrid algorithm EPPSO, based on EP and PSO, tunes the parameters of the premise part, while the LS algorithm simultaneously determines the consequent parameters of the fuzzy rule base. To demonstrate the performance of the proposed algorithm for designing the EFBFN based on the hybrid of LS and AEPPSO, the prediction of a chaotic time series is performed, and the results are compared with those of other methods.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 79-88, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Extended Fuzzy Basis Function Network
2.1 Structure of Extended Fuzzy Basis Function Network
The original FBFN [1] uses singleton fuzzy membership functions as output fuzzy variables. In this paper we extend the FBFN to the following form:

$$R^i:\ \text{if } x_1 \text{ is } A_1^i(x_1) \text{ and } \dots\ x_k \text{ is } A_k^i(x_k), \text{ then } y \text{ is } \xi_0^i + \xi_1^i x_1 + \dots + \xi_k^i x_k \tag{1}$$

where $R^i$ represents the ith rule $(1 \le i \le r)$, $x_j$ $(1 \le j \le k)$ is an input variable and y is an output variable. $A_j^i(x_j)$ are the fuzzy variables defined in Eqn. (2), and $\xi_j^i$ $(1 \le i \le r,\ 0 \le j \le k)$ are the consequent parameters of the fuzzy model.

$$A_j^i(x_j) = \exp\left[-\frac{1}{2}\left(\frac{x_j - m_j^i}{\sigma_j^i}\right)^2\right] \tag{2}$$

where $m_j^i$ and $\sigma_j^i$ are the mean value and the standard deviation of the Gaussian-type MF, respectively. The fuzzy basis functions defined by Wang can also be used here:

$$p_i(\boldsymbol{\chi}) = \frac{\prod_{j=1}^{k} A_j^i(x_j)}{\sum_{l=1}^{r} \prod_{j=1}^{k} A_j^l(x_j)}, \quad i = 1, 2, \dots, r, \quad \boldsymbol{\chi} = [x_1, \dots, x_k] \tag{3}$$

The fuzzy basis function expansion defined in Eqn. (4) for the extended FBFN is equivalent to the first-order T-S fuzzy model suggested by Takagi and Sugeno [4]:

$$y^* = \sum_{i=1}^{r} p_i(\boldsymbol{\chi})\, C_i \tag{4}$$

where $C_i$ is the consequent part of the fuzzy rule, $C_i = \xi_0^i + \xi_1^i x_1 + \dots + \xi_k^i x_k$.

2.2 EFBFN Based on Least Squares Method
Before describing the LS method for designing the EFBFN, we define the I/O data as

$$\mathcal{D} = \{(\boldsymbol{\chi}_h, y_h) \mid \boldsymbol{\chi}_h = (x_{h1}, x_{h2}, \dots, x_{hk}),\ h = 1, 2, \dots, N\}.$$
According to the orthogonal least squares (OLS) algorithm proposed by Wang [1] for constructing FBFNs, the following N candidate fuzzy basis function (FBF) nodes are generated at the beginning:

$$p_h(\boldsymbol{\chi}) = \frac{\prod_{j=1}^{k} A_j^h(x_j)}{\sum_{l=1}^{N} \prod_{j=1}^{k} A_j^l(x_j)}, \quad h = 1, 2, \dots, N, \quad \boldsymbol{\chi} = [x_1, \dots, x_k] \tag{5}$$

where the widths and the candidate centers of the fuzzy MFs are defined as

$$\sigma_j^i = \frac{\max(x_j) - \min(x_j)}{r}, \quad [m_1^i, \dots, m_k^i] = \boldsymbol{\chi}_h, \quad j = 1, 2, \dots, k, \quad \boldsymbol{\chi}_h = [x_{h1}, x_{h2}, \dots, x_{hk}], \quad x_j = [x_{1j}, \dots, x_{Nj}] \tag{6}$$

Assume that r (r < N) EFBFN nodes have been selected by the OLS algorithm. The resulting FBF expansion produced by the OLS algorithm is

$$y^* = \sum_{i=1}^{r} p_{h_i}(\boldsymbol{\chi})\, C_i = \frac{\sum_{i=1}^{r} C_i \prod_{j=1}^{k} A_j^{h_i}(x_j)}{\sum_{h=1}^{N} \prod_{j=1}^{k} A_j^{h}(x_j)}, \quad i = 1, 2, \dots, r \tag{7}$$

Comparing this equation with the original definition of the EFBFN in Eqn. (4), i.e.

$$y^* = \frac{\sum_{i=1}^{r} C_i \prod_{j=1}^{k} A_j^i(x_j)}{\sum_{i=1}^{r} \prod_{j=1}^{k} A_j^i(x_j)}, \quad C_i = \xi_0^i + \xi_1^i x_1 + \dots + \xi_k^i x_k \tag{8}$$

it is clear that the fuzzy basis functions selected by the OLS algorithm cannot be interpreted as a true FIS, since they retain in the denominator the normalization factor computed before training. By contrast, the least squares method used in this paper for constructing the EFBFN can be clearly understood in terms of fuzzy rules defined in physical domains, and it is computationally efficient [3]. We consider the inferred output formula, Eqn. (4), as a linear regression model:

$$y = W\boldsymbol{\xi} + e \tag{9}$$

where $y = [y_1, \dots, y_l, \dots, y_N]^T$ is the desired output, $e = [e_1, e_2, \dots, e_N]^T$ is the error signal, and the matrix W and the vector ξ are defined as follows:

$$W = [w_1, \dots, w_N] = [p_1, \dots, p_r,\ p_1 .\!* x_1, \dots, p_r .\!* x_1,\ \dots,\ p_1 .\!* x_k, \dots, p_r .\!* x_k]$$

$$\boldsymbol{\xi} = [\xi_0^1, \dots, \xi_0^r,\ \xi_1^1, \dots, \xi_1^r,\ \dots,\ \xi_k^1, \dots, \xi_k^r]^T$$

$$p_i = [p_i(\boldsymbol{\chi}_1), \dots, p_i(\boldsymbol{\chi}_N)]^T, \quad x_j = [x_{1j}, x_{2j}, \dots, x_{Nj}]^T,$$

$$p_i .\!* x_j = [p_i(\boldsymbol{\chi}_1)\, x_{1j}, \dots, p_i(\boldsymbol{\chi}_N)\, x_{Nj}]^T, \quad i = 1, 2, \dots, r, \quad j = 1, 2, \dots, k$$

The LS algorithm selects the most significant FBFs by maximizing the following error reduction measure [err]:

$$[err] = W W^{+} y \tag{10}$$
where $W^{+}$ denotes the pseudoinverse of W. It can be seen that $WW^{+}$ is the orthogonal projection onto the column space of W. With the LS algorithm applied to the EFBFN, when the first EFBFN node (r = 1) is selected from the FBF set, the N candidate FBFs are calculated using

$$p_h(\boldsymbol{\chi}_g) = \prod_{j=1}^{k} A_j^h(x_{gj}), \quad g = 1, 2, \dots, N, \quad h = 1, 2, \dots, N \tag{11}$$

where the parameters of the fuzzy MFs are defined as in Eqn. (6). When s-1 EFBFN nodes have been picked from the data set, the N-s+1 candidate FBFs for selecting the sth EFBFN node are defined as

$$p_h(\boldsymbol{\chi}_g) = \frac{\prod_{j=1}^{k} A_j^h(x_{gj})}{\sum_{i=1}^{s} \prod_{j=1}^{k} A_j^i(x_{gj})}, \quad h \ne h_1, \dots, h \ne h_{s-1}, \quad g = 1, 2, \dots, N \tag{12}$$

Our objective is to find r fuzzy rules from the N training data pairs. The selection procedure is as follows.

Step 1: for $1 \le h \le N$, compute

$$p_1^{(h)} = [p_h(\boldsymbol{\chi}_1), p_h(\boldsymbol{\chi}_2), \dots, p_h(\boldsymbol{\chi}_N)]^T$$

$$W_1^{(h)} = [p_1^{(h)},\ p_1^{(h)} .\!* x_1,\ \dots,\ p_1^{(h)} .\!* x_k]$$

$$[err]_1^{(h)} = W_1^{(h)} \big(W_1^{(h)}\big)^{+} y$$

Find $[err]_1^{(h_1)} = \max\{[err]_1^{(h)},\ 1 \le h \le N\}$ and select $p_1 = p_1^{(h_1)}$.

Step s (s ≥ 2): for $1 \le h \le N$, $h \ne h_1, \dots, h \ne h_{s-1}$, compute

$$p_s^{(h)} = [p_h(\boldsymbol{\chi}_1), p_h(\boldsymbol{\chi}_2), \dots, p_h(\boldsymbol{\chi}_N)]^T$$

$$W_s^{(h)} = [p_1, \dots, p_s^{(h)},\ p_1 .\!* x_1, \dots, p_s^{(h)} .\!* x_1,\ \dots,\ p_1 .\!* x_k, \dots, p_s^{(h)} .\!* x_k]$$

$$[err]_s^{(h)} = W_s^{(h)} \big(W_s^{(h)}\big)^{+} y$$

Find $[err]_s^{(h_s)} = \max\{[err]_s^{(h)},\ 1 \le h \le N,\ h \ne h_1, \dots, h \ne h_{s-1}\}$ and select $p_s = p_s^{(h_s)}$.

The procedure stops when the number of selected FBFs reaches r. Once r FBFs are found, i.e. r fuzzy rules are generated, the matrix W in Eqn. (9) is determined as $W = [p_1, \dots, p_r,\ p_1 .\!* x_1, \dots, p_r .\!* x_1,\ \dots,\ p_1 .\!* x_k, \dots, p_r .\!* x_k]$, and the consequent parameters are calculated as

$$\boldsymbol{\xi} = W^{+} y \tag{13}$$

The final EFBFN is

$$y^* = \sum_{i=1}^{r} p_i(\boldsymbol{\chi}) \cdot \boldsymbol{\chi}' \cdot \boldsymbol{\xi}_i \tag{14}$$

where $\boldsymbol{\chi}' = [1, x_1, \dots, x_k]$ and $\boldsymbol{\xi}_i = [\xi_0^i, \xi_1^i, \dots, \xi_k^i]^T$.
Generating Extended Fuzzy Basis Function Networks Using Hybrid Algorithm
83
3 Adaptive EPPSO for EFBFN The EFBFN constructed by LS algorithm has the limitation that the widths of the membership functions in the hidden units are prefixed and the centers are selected only for the available train data pairs. In this section, we will use the proposed algorithm AEPPSO to tune the parameters of the MFs in the EFBFN obtained using LS algorithm, and to determine the consequent parameters simultaneously.
3.1 Population Initialization for EPPSO

(1) Parameter representation. The fuzzy model used in this paper is described in Eqn. (1), and its parameters can be divided into a premise part and a consequent part. The parameter set of each part can be expressed as a two-dimensional matrix. For the premise part, the parameter set $\{m_j^i, \sigma_j^i, 1 \le i \le r, 1 \le j \le k\}$ of the MFs constitutes the first matrix, named $Q$, while for the consequent part, the parameter set $\{\xi_j^i, 1 \le i \le r, 0 \le j \le k\}$ constitutes the second matrix, named $\boldsymbol{\xi}$, where r is the number of fuzzy rules and k is the number of input variables. Gaussian MFs are used in this paper, so the size of $Q$ is $\ell_1 = r \times (2k)$ and the size of $\boldsymbol{\xi}$ is $\ell_2 = r \times (k+1)$, giving $\ell = \ell_1 + \ell_2$ parameters in total.
(2) Initialization. In this step, the M individuals forming the population are initialized, each consisting of $\ell_1$ parameters. The parameter matrix $Q$ is initialized from the MF parameters of the EFBFN obtained by the LS algorithm in the previous section. Assume that the obtained parameters are $\{m_j^i(0), \sigma_j^i(0)\}$ and that the domain of the jth input variable in the training data set is $[\min(x_j), \max(x_j)]$. The centers and widths of the FBFs are then initialized as random values in the following domains:
$$m_j^i \in [m_j^i(0) - \delta_j, \; m_j^i(0) + \delta_j], \quad \sigma_j^i \in [\sigma_j^i(0) - \delta_j, \; \sigma_j^i(0) + \delta_j], \tag{15}$$
where $\delta_j$ is a small positive value, usually defined as $\delta_j = (\max(x_j) - \min(x_j))/10$. These individuals can be regarded as population members in terms of EP and as particles in terms of PSO.
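A minimal sketch of this initialization follows, assuming that the centers and widths are drawn uniformly from the domains in Eqn. (15) and stored as two arrays of shape (M, r, k). The function name `init_population` and the demo values for the LS-derived parameters are hypothetical.

```python
import numpy as np

def init_population(m0, s0, X, M, rng):
    """Draw M individuals around the LS-derived centers m0 and widths s0 (both (r, k))."""
    delta = (X.max(axis=0) - X.min(axis=0)) / 10.0      # delta_j, shape (k,)
    centers = rng.uniform(m0 - delta, m0 + delta, size=(M,) + m0.shape)
    widths = rng.uniform(s0 - delta, s0 + delta, size=(M,) + s0.shape)
    return centers, widths, delta

# Hypothetical demo: r = 12 rules, k = 4 inputs, M = 70 individuals.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 2.0, size=(50, 4))
m0_demo = X_demo[:12]                    # stand-in for the centers picked by LS
s0_demo = np.full((12, 4), 0.5)          # stand-in for the preset widths
centers_demo, widths_demo, delta_demo = init_population(m0_demo, s0_demo, X_demo, M=70, rng=rng)
```

Note that a width could become non-positive if $\sigma_j^i(0) < \delta_j$; the sketch does not guard against this case.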
3.2 Algorithm Description

(1) PSO operator. In each generation, after the fitness of each individual has been evaluated, the top 50% of the individuals are selected as elites and the others are discarded. Similar to the maturing phenomenon in nature, the individuals are first enhanced by PSO and become better suited to the environment after acquiring knowledge from the society. The whole set of elites can be regarded as a swarm, and each elite corresponds to a particle in it. In PSO, individuals (particles) of the same generation enhance themselves based on their own private cognition and on social interactions with each other; this procedure is regarded as the maturing phenomenon in EPPSO. The selected M/2 elites are regarded as particles in PSO, and each elite corresponds to a particle $\mathbf{z}_\theta = (z_{\theta 1}, z_{\theta 2}, \ldots, z_{\theta \ell_1})^T$. In applying PSO, we adopt the following equation [5] to improve these individuals:
$$\mathbf{v}_\theta(t+1) = \omega \mathbf{v}_\theta(t) + c_1 r_1 (\Psi_\theta(t) - \mathbf{z}_\theta(t)) + c_2 r_2 (\Psi_g(t) - \mathbf{z}_\theta(t)), \qquad \mathbf{z}_\theta(t+1) = \mathbf{z}_\theta(t) + \mathbf{v}_\theta(t+1), \tag{16}$$
where $\mathbf{v}_\theta = (v_{\theta 1}, v_{\theta 2}, \ldots, v_{\theta \ell_1})^T$ is the velocity vector of this particle, $\Psi_\theta = (\psi_{\theta 1}, \psi_{\theta 2}, \ldots, \psi_{\theta \ell_1})^T$ is the best previous position encountered by the θth particle, $\Psi_g(t)$ is the best previous position among all the individuals of the swarm, $\theta = 1, 2, \ldots, M/2$, $\omega$ is a parameter called the inertia weight, t is the iteration counter, $c_1$ and $c_2$ are positive constants, referred to as the cognitive and social parameters, respectively, and $r_1$, $r_2$ are random numbers uniformly distributed within the interval [0, 1]. In Eqn. (16), $\Psi_g(t)$ is the best performing individual evolved so far, either an enhanced elite obtained by PSO or an offspring produced by EP. By performing the PSO operation on the selected elites before the mutation operation in EP, we may accelerate the convergence of the individuals and improve the search ability. The enhanced elites are copied to the next generation and are also designated as the parents of the generation for EP; they are copied and mutated to produce the other M/2 individuals.

(2) EP operator. To produce better-performing offspring, the mutation parents are selected only from the elites enhanced by PSO. In the EP operation, the M/2 particles in PSO become M/2 population members with parameters $\{\mathbf{z}_\theta \mid \mathbf{z}_\theta = z_{\theta n}, 1 \le \theta \le M/2; \; 1 \le n \le \ell_1\}$. The parameter mutation changes the parameters of the membership functions by adding to them Gaussian random numbers generated with probability p [6]:
$$z'_{\theta n} = z_{\theta n} + \alpha \cdot e^{(F_{\max} - F_m)/F_{\max}} \cdot N(0,1), \tag{17}$$
where $z'_{\theta n}$ is the mutated parameter, $F_{\max}$ is the largest fitness value of the individuals in the current generation, $F_m$ is the fitness of the mth individual, and $\alpha$ is a real value between 0 and 1. The offspring of EP together with the elites enhanced by PSO form the new generation of EPPSO, and after the fitness evaluation the evolution proceeds until the termination condition is satisfied. For details of the basic procedure of EP, see Ref. [7].

(3) Least Squares Estimate. The hybrid algorithm EPPSO is applied to tune the MF parameters of the EFBFN, while LS is used to determine the $\ell_2$ consequent
parameters $\xi_j^i$ of the fuzzy rule base. For each individual in each generation, once the premise parameter matrix $Q$ has been determined by EPPSO, we fix these parameters and use the following procedure to obtain the consequent parameters:

i. Use the MF parameters $\{m_j^i, \sigma_j^i\}$ in $Q$ to calculate the FBFs $p_i(\chi)$ as follows:
$$p_i(\chi) = \frac{\prod_{j=1}^{k} A_j^i(x_j)}{\sum_{i=1}^{r} \left( \prod_{j=1}^{k} A_j^i(x_j) \right)}, \quad \text{where } A_j^i(x_j) = \exp\left[ -\frac{1}{2} \left( \frac{x_j - m_j^i}{\sigma_j^i} \right)^2 \right].$$

ii. Calculate the matrix W:
$$W = [\mathbf{w}_1, \ldots, \mathbf{w}_N]^T = [\mathbf{p}_1, \ldots, \mathbf{p}_r, \mathbf{p}_1 .\!* \mathbf{x}_1, \ldots, \mathbf{p}_r .\!* \mathbf{x}_1, \ldots, \mathbf{p}_1 .\!* \mathbf{x}_k, \ldots, \mathbf{p}_r .\!* \mathbf{x}_k],$$
where $\mathbf{p}_i = [p_i(\chi_1), \ldots, p_i(\chi_N)]^T$ and $\mathbf{x}_j = [x_{1j}, x_{2j}, \ldots, x_{Nj}]^T$.

iii. Calculate the consequent parameters of the current fuzzy rule base using Eqn. (13); the inferred output can then be calculated using Eqn. (14).

Since the fitness evaluation differs from problem to problem, it is introduced in the simulation section. Once the fitness has been evaluated, the top 50% of the individuals are selected, which completes one generation. The evolution continues until the termination condition is satisfied.
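The PSO and EP operators of this section can be sketched as follows. This is an illustration under assumptions, not the authors' code: `r1` and `r2` are drawn per dimension, the mutation mask is drawn per parameter with probability `p`, each offspring is scaled by the fitness of its parent, and the fitness values and initial personal bests are hypothetical placeholders. The least squares step for the consequent parameters is omitted.

```python
import numpy as np

def pso_step(z, v, pbest, gbest, omega, c1, c2, rng):
    """One velocity and position update in the form of Eqn. (16)."""
    r1 = rng.random(z.shape)
    r2 = rng.random(z.shape)
    v_new = omega * v + c1 * r1 * (pbest - z) + c2 * r2 * (gbest - z)
    return z + v_new, v_new

def ep_mutate(z, fitness, alpha, p, rng):
    """Fitness-scaled Gaussian mutation in the form of Eqn. (17)."""
    f_max = fitness.max()
    scale = alpha * np.exp((f_max - fitness) / f_max)   # one factor per individual
    mask = rng.random(z.shape) < p                       # mutate each parameter with probability p
    return z + mask * scale[:, None] * rng.standard_normal(z.shape)

# Hypothetical demo: M/2 = 35 elites with 96 premise parameters each.
rng = np.random.default_rng(2)
elites = rng.normal(size=(35, 96))
velocity = np.zeros_like(elites)
fitness = rng.uniform(10.0, 50.0, size=35)               # placeholder fitness values
best = int(np.argmax(fitness))
gbest = elites[best].copy()
enhanced, velocity = pso_step(elites, velocity, elites.copy(), gbest, 0.35, 1.5, 1.5, rng)
offspring = ep_mutate(enhanced, fitness, alpha=0.1, p=0.1, rng=rng)
next_generation = np.vstack([enhanced, offspring])       # elites plus mutated copies
```

In a full run the fitness of the enhanced elites would be re-evaluated before mutation; the sketch reuses the placeholder values for brevity.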
4 Simulation Results

In this section, simulation results of predicting a chaotic time series using the fuzzy inference system based on the proposed hybrid algorithm are presented. The chaotic time series is generated from the Mackey-Glass differential delay equation [8] defined below:
$$\dot{x}(t) = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t). \tag{18}$$
The problem is to use past values of x to predict some future value of x. The same example as in [8-10] has been adopted to allow a comparison with the published results. To obtain the time series value at each integer point, we applied the fourth-order Runge-Kutta method to find the numerical solution of Eqn. (18), with initial condition x(0) = 1.2 and τ = 17. The value of the signal six steps ahead, x(t+6), is predicted from the values of the signal at the current moment and 6, 12 and 18 steps back. The input-output data pairs have the following format:
$$[x(t - 18), x(t - 12), x(t - 6), x(t); \; x(t + 6)]. \tag{19}$$
The data range $118 \le t \le 1117$ has been adopted, with the first 500 samples forming the training data set and the second 500 forming the validation data set. The nondimensional error index (NDEI) has been calculated to compare model performance, and the fitness function used in this example is defined as F = 1/NDEI.
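The data preparation described above can be sketched as follows. Several details are assumptions of this sketch rather than statements of the paper: an integration step of 0.1, x(t) = 0 for t < 0, the delayed term held constant within each Runge-Kutta step, and NDEI computed as the RMSE divided by the standard deviation of the target series. The function names are hypothetical.

```python
import numpy as np

def mackey_glass(n_steps, tau=17, x0=1.2, dt=0.1):
    """Integrate Eqn. (18) with classical RK4 and return x at t = 0, 1, ..., n_steps."""
    per_unit = int(round(1.0 / dt))
    lag = int(round(tau / dt))
    total = n_steps * per_unit
    x = np.zeros(total + 1)
    x[0] = x0

    def f(x_now, x_delayed):
        return 0.2 * x_delayed / (1.0 + x_delayed ** 10) - 0.1 * x_now

    for i in range(total):
        x_delayed = x[i - lag] if i >= lag else 0.0   # assumed zero history for t < 0
        k1 = f(x[i], x_delayed)
        k2 = f(x[i] + 0.5 * dt * k1, x_delayed)
        k3 = f(x[i] + 0.5 * dt * k2, x_delayed)
        k4 = f(x[i] + dt * k3, x_delayed)
        x[i + 1] = x[i] + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x[::per_unit]

def make_pairs(series, t_start=118, t_end=1117):
    """Build [x(t-18), x(t-12), x(t-6), x(t); x(t+6)] for t_start <= t <= t_end."""
    t = np.arange(t_start, t_end + 1)
    inputs = np.stack([series[t - 18], series[t - 12], series[t - 6], series[t]], axis=1)
    return inputs, series[t + 6]

def ndei(y_true, y_pred):
    """Root mean square error divided by the standard deviation of the target."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)

series = mackey_glass(1123)                  # need x up to t = 1117 + 6
inputs, targets = make_pairs(series)
X_train, y_train = inputs[:500], targets[:500]
X_valid, y_valid = inputs[500:], targets[500:]
```

Under this definition a model that always predicts the mean of the targets has NDEI equal to 1, and the fitness of an individual would be `1.0 / ndei(y_train, y_pred)`.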
During the training of the EFBFN constructed for the chaotic time series using the LS algorithm, the decrease in NDEI becomes very small once the number of fuzzy rules reaches about twelve. Hence, we generate a FIS with twelve fuzzy rules for predicting the chaotic time series. The final prediction NDEIs in training and testing are 0.020 and 0.022, respectively. We then use the proposed algorithm AEPPSO to tune the MF parameters of the obtained EFBFN to achieve better performance. The twelve fuzzy rules in the EFBFN result in 156 free parameters in total, of which 96 premise parameters are tuned using EPPSO and the other 60 consequent parameters are determined by the LS method. In applying AEPPSO, 70 individuals are randomly generated to form the initial population, i.e., M = 70. During the PSO operation, the inertia weights are set to w_max = 0.35 and w_min = 0.1, and the cognitive parameter c1 and the social parameter c2 are both set to 1.5; the mutation probability p and the learning parameter α in EP are both set to 0.1. To show the superiority of AEPPSO, two other algorithms, AEP (Adaptive Evolutionary Programming) and APSO (Adaptive Particle Swarm Optimization), are also applied to the same problem. The evolution runs for 100 generations and is repeated for 5 runs. The best-so-far training NDEI values for each generation, averaged over the 5 runs, are shown in Fig. 1. From the figure, we can see that AEP converges more slowly than APSO and AEPPSO, while APSO converges fastest but to a larger NDEI than AEPPSO. This behavior reflects the characteristics of each algorithm as follows:

i. AEP obtains knowledge only from the individuals themselves without sharing the knowledge with each other, while the individuals in APSO adapt themselves according to their own knowledge and also their companions' knowledge. Therefore, it is reasonable for APSO to converge faster.
ii. APSO memorizes the previous knowledge of good solutions as flying destinations of the particles. However, the particles are restricted to this previous knowledge and thus come to lack diversity. This feature makes APSO likely to converge to an unsatisfactory result.
iii. AEPPSO combines the merits of the two algorithms, and the simulation results in Fig. 1 confirm that it achieves both a satisfactory result and a satisfactory convergence speed.

As the figure shows, the tuning performance for the EFBFN based on the hybrid of LS and AEP is much worse than that of the other two, revealing the ineffectiveness of EP in problems that demand high precision. Table 1 lists the generalization capabilities of different methods, measured by using each method to predict the 500 points immediately following the training set. The proposed method outperforms all other approaches listed in the table except ANFIS. Note that the generalization NDEIs of the hybrid algorithms are the mean values over 5 runs, and the minimum generalization NDEIs of the hybrid algorithms LS+AEP, LS+APSO and LS+AEPPSO are 0.015, 0.011 and 0.009, respectively.
[Figure 1: NDEI (vertical axis, 0.008 to 0.022) plotted against generation (horizontal axis, 0 to 100) for LS+AEPPSO, LS+AEP and LS+APSO.]

Fig. 1. Average best-so-far NDEI in each generation (iteration) using different methods

Table 1. Comparisons of generalization capability with published results

Method                        Training Data    Error Index (NDEI)
Cascade-Correlation NN [8]    500              0.06
Back-Prop NN [8]              500              0.02
6th-order Polynomial [8]      500              0.04
ANFIS [9]                     500              0.007
AEPLSE [10]                   500              0.014
LS+AEP                        500              0.017
LS+APSO                       500              0.013
LS+AEPPSO                     500              0.011
Unlike the neuro-fuzzy system ANFIS, which combines back-propagation and LSE, the method proposed in this paper uses a hybrid of an evolutionary algorithm and LSE, and achieves comparable performance. As a new evolutionary fuzzy system, the EFBFN generating method based on the hybrid of LS and AEPPSO provides another effective method of modeling and prediction using fuzzy inference systems.
5 Conclusions

A novel fuzzy modeling strategy based on the least squares method and the so-called Adaptive Evolutionary-programming and Particle-swarm-optimization (AEPPSO) algorithm has been presented. The LS method is first used to determine the fuzzy basis functions of the proposed EFBFN with preset centers of the Gaussian membership functions. In the second stage, the combined algorithm (EPPSO) is used to tune the obtained MF parameters of the EFBFN, and the LS method is used again to determine its consequent parameters simultaneously. Proposed as a new kind of Evolutionary Fuzzy System (EFS), the method has been examined on the prediction of a chaotic time series, and the results are compared with those of a neuro-fuzzy system, neural networks and other fuzzy modeling methods to demonstrate its effectiveness.
Acknowledgements

This work is supported by the Outstanding Young Scholars Fund (No. 60225006) and the Innovative Research Group Fund (No. 60421002) of the Natural Science Foundation of China.
References

1. Wang, L. X., Mendel, J. M.: Fuzzy Basis Functions, Universal Approximation, and Orthogonal Least-Squares Learning. IEEE Trans. Neural Networks. 3 (1992) 807-814
2. Chen, S., Cowan, C. F. N., Grant, P. M.: Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks. IEEE Trans. Neural Networks. 2 (1991) 302-309
3. Lee, C. W., Shin, Y. C.: Construction of Fuzzy Systems Using Least-Squares Method and Genetic Algorithms. Fuzzy Sets and Systems. 137 (2003) 297-323
4. Takagi, T., Sugeno, M.: Fuzzy Identification of Systems and Its Application. IEEE Trans. Systems, Man, and Cybernetics. 15 (1985) 116-132
5. Shi, Y., Eberhart, R. C.: Parameter Selection in Particle Swarm Optimization. The 7th Annual Conference on Evolutionary Programming, San Diego, USA. 7 (1998) 591-600
6. Hwang, H. S.: Automatic Design of Fuzzy Rule Bases for Modeling and Control Using Evolutionary Programming. IEE Proc. Control Theory Appl. 146 (1999) 9-16
7. Fogel, D. B.: Evolutionary Computations: Toward a New Philosophy of Machine Intelligence. New York: IEEE (1995)
8. Crowder, R. S.: Predicting The Mackey-Glass Time Series With Cascade-correlation Learning. Proc. 1990 Connectionist Models Summer School: Carnegie Mellon University. (1990) 117-123
9. Jang, J. S. R., Sun, C. T., Mizutani, E.: ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Systems, Man, and Cybernetics. 23 (1993) 665-685
10. Ye, B., Guo, C. X., Cao, Y. J.: Identification of fuzzy model using evolutionary programming and least squares estimate. Proc. 2004 IEEE Fuzzy Systems Conf. 2 (2004) 593-598
Analysis of Temporal Uncertainty of Trains Converging Based on Fuzzy Time Petri Nets*

Yangdong Ye (1), Juan Wang (1), and Limin Jia (2)

(1) Department of Computer Science, Zhengzhou University, Zhengzhou 450052, China [email protected], [email protected]
(2) School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China [email protected]
Abstract. This paper defines a fuzzy time Petri net (FTPN) that adopts four fuzzy set theoretic functions of time, called fuzzy timestamp, fuzzy enabling time, fuzzy occurrence time and fuzzy delay, to deal with the temporal uncertainty of train group operation. We also present different firing strategies for the net to give prominence to key events. The application instance shows that the FTPN-based method can efficiently analyze the trains converging time, the possibility of converging and the train terminal time when adjusting a train operation plan. Compared with the time interval method, this method offers accurate analysis, simple computation, a simplified system and convenient system integration.
1 Introduction

The main issues [1][2] of train group operation modeling in the research of Railway Intelligent Transportation Systems (RITS) include the representation of train group concurrency, the processing of multi-level problems oriented to different control and decision tasks, the handling of hybrid system attributes, and the processing of the various uncertainty factors affecting train group operation, all of which are related to the analysis of time parameters. As the speed of train group operation increases and objective uncertainty factors exert their influence, it is increasingly important to handle temporal uncertainty in train group operation. The analysis of temporal uncertainty in trains converging is significant for dynamic control during train operation, train dispatching in stations, passengers changing trains, goods transfer and resource distribution in the railway system.

When dealing with time factors in train group operation, the usual representation by a single point of time is impractical and incomplete, and the representation by a time interval makes quantitative analysis difficult. Based on the existing Petri net model of train group operation [2][10], this paper introduces fuzzy set theory to describe uncertain or subjective time information, which can satisfy practical applications.

How to represent uncertain knowledge is an issue that has attracted attention [3~7] in research on Petri net modeling. Building on the existing time Petri net (TPN) [8~11], we define a fuzzy time Petri net adopting fuzzy set theoretic functions of time to analyze the trains converging issue. The paper introduces the relevant concepts of the fuzzy time Petri net and the fuzzy set theoretic functions of time in Sections 2 and 3; we then give an example of train operation in Section 4 and analyze it in Section 5; finally, with respect to temporal uncertainty, we compare this method with the reasoning algorithm of TPN in Section 6.

* This research was partially supported by the National Science Foundation of China under grant number 600332020 and the Henan Science Foundation under grant number 0411012300.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 89 – 99, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Fuzzy Time Petri Nets

In order to describe train behaviors with time constraints, the paper defines a fuzzy time Petri net adopting four fuzzy time functions to deal with the temporal uncertainty of the trains converging issue. These functions appear in some publications and applications [6~8], and their definitions were given by Murata et al. [3~5].

Definition 2.1. Fuzzy time Petri net system (FTPN's)
FTPN's = (P, T, E, β, A, FT, D, M0), where:
(1) P = {p1, p2, …, pn} is a finite set of places;
(2) T = {t1, t2, …, tm} is a finite set of transitions, where P∪T ≠ ∅, P∩T = ∅;
(3) E = {e1, e2, …, em} is a finite set of events;
(4) β: E→T is a mapping function that associates each event with a transition;
(5) A ⊆ (P × T)∪(T × P) is a set of arcs;
(6) FT is a set of fuzzy timestamps, which are associated with tokens. An unrestricted timestamp is represented by [0,0,0,0], and an empty timestamp is ∅;
(7) D is a set of fuzzy delay times associated with the outgoing arcs of transitions;
(8) M0: P→FT is an initial marking function.

Definition 2.2. State marking of FTPN's
A marking Mi: P→FT (i = 0,1,2,3,…) is a description of the dynamic behavior of the system. A marking of the system corresponds to a vector over the places. In this paper, we use sets of tokens, {(p, π(τ)) | p∈P, π(τ)∈FT}, to describe markings, and the empty timestamp ∅ does not appear in the sets of tokens. If M2 is directly reachable from M1 via e1, the sequence q is M1[e1>M2.

Definition 2.3. Firing strategies of transitions in FTPN's
We define FTPN primarily to describe the analysis of temporal uncertainty in train group operation. The firing of a transition takes no time. When there are no conflicts, transitions fire immediately once all required tokens have arrived. When there are conflicts, the system needs certain firing strategies to carry out different analyses. The following strategies are used in FTPN's according to the problem at hand: (1) the earliest enabling time firing strategy; (2) the strategy of multi-condition first firing; (3) the strategy of high-possibility first firing; (4) mixed strategies, etc. Using these firing strategies, we can give prominence to key events and simplify system analysis.
3 Fuzzy Time Function

A fuzzy time function is a mapping from the time scale, the set of all non-negative real numbers, to the real interval [0, 1]. The value of the function indicates the degree of possibility of an event at a point of time τ. Fuzzy time functions are specified by 4-tuples [3~5], and their graphs are trapezoids describing the earliest train arrival or departure time, the most possible time and the latest time. The paper writes the functions in square brackets to keep them consistent with time intervals [10,11].

Definition 3.1. Fuzzy timestamp π(τ)
The fuzzy timestamp is the possibility distribution of a token arriving in a place at time τ.

Definition 3.2. The possibility distribution of the latest time
latest is an operator that calculates the possibility distribution of the latest time of several fuzzy timestamps. Suppose that there are n fuzzy timestamps πi(τ) = hi[ai, bi, ci, di], i = 1,2,…,n. The possibility distribution of the latest time is calculated as follows:
latest{π1(τ), π2(τ)} = min(h1,h2)[max(a1,a2), max(b1,b2), max(c1,c2), max(d1,d2)]
latest{πi(τ), i=1,2,…,n} = latest{latest{…{latest{π1(τ), π2(τ)}, …, πi(τ)}, …}, πn(τ)}.

Definition 3.3. The possibility distribution of the earliest time
earliest is an operator that calculates the possibility distribution of the earliest time among several timestamps. Suppose that there are n fuzzy timestamps πi(τ) = hi[ai, bi, ci, di], i = 1,2,…,n. The possibility distribution of the earliest time is as follows:
earliest{π1(τ), π2(τ)} = max(h1,h2)[min(a1,a2), min(b1,b2), min(c1,c2), min(d1,d2)]
earliest{πi(τ), i=1,2,…,n} = earliest{earliest{…{earliest{π1(τ), π2(τ)}, …, πi(τ)}, …}, πn(τ)}.

Definition 3.4. Fuzzy enabling time e(τ)
When the occurrence of a transition or an event needs more than one token or resource, the possibility distribution of the latest time is the fuzzy enabling time e(τ). The net in this paper is an ordinary net (the weight of each input arc is 1). So if transition t has n input places, firing t needs n tokens, i.e., the occurrence of the corresponding event needs one token in each of the n places. The fuzzy enabling time of t is
e(τ) = latest{πi(τ), i=1,2,…,n}    (1)
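The latest and earliest operators of Definitions 3.2 and 3.3 can be sketched as below. The `FuzzyTime` tuple type and the two sample timestamps are hypothetical and serve only to illustrate the component-wise rules.

```python
from functools import reduce
from typing import NamedTuple

class FuzzyTime(NamedTuple):
    """Trapezoidal fuzzy time function h[a, b, c, d]."""
    h: float
    a: float
    b: float
    c: float
    d: float

def latest(stamps):
    """Fold the pairwise rule of Definition 3.2 over a sequence of timestamps."""
    def pair(p, q):
        return FuzzyTime(min(p.h, q.h), max(p.a, q.a), max(p.b, q.b),
                         max(p.c, q.c), max(p.d, q.d))
    return reduce(pair, stamps)

def earliest(stamps):
    """Fold the pairwise rule of Definition 3.3 over a sequence of timestamps."""
    def pair(p, q):
        return FuzzyTime(max(p.h, q.h), min(p.a, q.a), min(p.b, q.b),
                         min(p.c, q.c), min(p.d, q.d))
    return reduce(pair, stamps)

# Hypothetical sample timestamps.
pi_1 = FuzzyTime(1.0, 1, 3, 4, 8)
pi_2 = FuzzyTime(0.5, 2, 2, 5, 6)
enabling = latest([pi_1, pi_2])     # fuzzy enabling time of a transition fed by both tokens
first = earliest([pi_1, pi_2])
```

For these samples the enabling time is 0.5[2,3,5,8] and the earliest time is 1[1,2,4,6].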
Definition 3.5. Fuzzy occurrence time o(τ)
The fuzzy occurrence time o(τ) is the possibility distribution of the time at which an event occurs. In ordinary circumstances we adopt the principle of First Come, First Served, giving high priority to the event that is enabled earlier. The fuzzy occurrence time of an event is the minimum overlapping area between the fuzzy enabling time of this event and the result of earliest; this operation is expressed by min [3][4]. In this paper the operator Min rounds the value of the result calculated by min. Suppose that there are n enabled events ei, i = 1,2,…,n, and that the corresponding fuzzy enabling times ei(τ), i = 1,2,…,n, are given by formula (1). The fuzzy occurrence time of the event et with fuzzy enabling time et(τ) is calculated as follows:
ot(τ) = Min{et(τ), earliest{ei(τ), i=1,2,…,t,…,n}}    (2)
Definition 3.6. Fuzzy delay d(τ)
The fuzzy delay is a fuzzy time function associated with the outgoing arcs of transitions. It is a measure of time used when describing and analyzing events, and it expresses the degree of possibility of the time span the system takes to change from one state to another when an event occurs. Given the fuzzy delay and the fuzzy occurrence time, the fuzzy timestamp of the new token is calculated as follows:
π(τ) = o(τ) ⊕ d(τ) = [o1,o2,o3,o4] ⊕ [d1,d2,d3,d4] = [o1+d1, o2+d2, o3+d3, o4+d4]    (3)
Suppose that transition t has one input place p1 and one output place p2, and that there is no conflict. If the fuzzy timestamp of the token in p1 is π1(τ), then the fuzzy enabling time is et(τ) = π1(τ) and the fuzzy occurrence time is ot(τ) = et(τ) = π1(τ). Given the fuzzy delay d(τ), the fuzzy timestamp of the new token in p2 after the event occurs is
π2(τ) = ot(τ) ⊕ d(τ) = et(τ) ⊕ d(τ) = π1(τ) ⊕ d(τ)    (4)
In this case the fuzzy timestamp of the new token equals the result of applying the operator ⊕ to the fuzzy timestamp of the token in the input place and the fuzzy delay.
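Formulas (3) and (4) amount to component-wise addition of two 4-tuples. A minimal sketch follows; the function name `fuzzy_add` is hypothetical, heights are left out because formula (3) is stated for the four corner points only, and the delays are sample values.

```python
def fuzzy_add(o, d):
    """The operator of formula (3): add occurrence time and delay component-wise."""
    return tuple(oi + di for oi, di in zip(o, d))

# A token with the unrestricted timestamp [0,0,0,0] passes through two
# conflict-free transitions with sample delays.
pi_start = (0, 0, 0, 0)
pi_after_first = fuzzy_add(pi_start, (18, 20, 25, 27))
pi_after_second = fuzzy_add(pi_after_first, (13, 15, 20, 22))
```

Because each step only adds corner points, the timestamp after a conflict-free sequence of transitions is the component-wise sum of the delays along that sequence.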
4 Train Group Behaviors Model

During the modeling of train group behaviors, a series of temporal uncertainty problems emerge, caused by objective stochastic uncertainty that makes train operation depart from the train operation graph.

4.1 Example of Train Group Operation

In order to explain the validity of FTPN clearly, the railway net in Fig. 1 is given, where S denotes a station and Se a section. Train operation is described by the following time constraints, where time (in minutes) is given by fuzzy time functions.

Fig. 1. Railway route net of certain area

Train Tr1 plans to start from station S1 at 7:00am, arrive at S6, and then, after time [13,15,20,22] on Se611, reach terminal S11. There are three paths to S6. Tr1 runs on Se12 and Se25 for times [18,20,25,27] and [13,15,20,22] to S5, or on Se13 and Se35 for [18,20,25,27] and [20,22,30,32] to S5, and then spends [18,20,25,27] on Se56 from S5 to S6. The third path is from S1 via Se14 and Se46 to S6, taking [26,28,32,34] and [26,28,32,34] respectively. The problem of converging arises at S6: if converging happens, the train is delayed for [2,2,4,4]; if not, there is no delay.

Train Tr2 starts from S8 at 7:00am and reaches S6 via Se89 and Se96, taking [18,20,25,27] and [20,22,25,27] respectively, or via Se87 and Se76, taking [13,15,18,20] and [23,25,30,32]; it then spends [18,20,25,27] to reach S10. The same problem of converging arises at S6.

4.2 Structure of the Model

We construct the FTPN model as shown in Fig. 5. The descriptions of places and transitions are listed in Table 1. FTPN's = (P, T, E, β, A, FT, D, M0), where
(1) P = {pi | i=1,2,…,14}; T = {tj | j=1,2,…,16}; E = {ej | j=1,2,…,16};
(2) Fuzzy delay D = {d1 = [18,20,25,27], d2 = [18,20,25,27], d3 = [13,15,20,22], d4 = [20,22,30,32], d5 = [18,20,25,27], d6 = [26,28,32,34], d7 = [26,28,32,34], d8 = d15 = [0,0,0,0], d9 = d17 = [2,2,4,4], d10 = [13,15,20,22], d11 = [18,20,25,27], d12 = [13,15,18,20], d13 = [20,22,25,27], d14 = [23,25,30,32], d16 = [18,20,25,27]};
(3) M0: {(p1, π01(τ)), (p9, π02(τ))}; π01(τ) = [0,0,0,0], π02(τ) = [0,0,0,0].

Table 1. Places and transitions of train operation FTPN model

p1, p2, p3, p4, p5, p6      Tr1 is in station S1, S2, S3, S4, S5 and S6.
p7, p13                     Tr1 and Tr2 are ready to depart from S6.
p8                          Tr1 is in station S11.
p9, p10, p11, p12           Tr2 is in station S8, S9, S7 and S6.
p14                         Tr2 is in station S10.
t1, t2, t3, t4, t5          Tr1 is running on section Se12, Se13, Se25, Se35 and Se56.
t6, t7                      Tr1 is running on section Se14 and Se46.
t8, t15                     Trains are in S6 without trains converging.
t9                          Trains stay in S6 with trains converging.
t10                         Tr1 is running on section Se611.
t11, t12, t13, t14, t16     Tr2 is running on section Se89, Se87, Se96, Se76 and Se610.
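With the delays of the model and formula (4), the fuzzy timestamps with which each train can reach S6 follow by summing the delays along each path. The sketch below does this for the three paths of Tr1 (tokens arriving in p6) and the two paths of Tr2 (tokens arriving in p12); the dictionary and function names are hypothetical, and only the delay values and the transition-to-section mapping come from the model above.

```python
from functools import reduce

# Fuzzy delays of the transitions that lead to station S6, taken from the set D.
DELAYS = {
    "t1": (18, 20, 25, 27), "t2": (18, 20, 25, 27), "t3": (13, 15, 20, 22),
    "t4": (20, 22, 30, 32), "t5": (18, 20, 25, 27),
    "t6": (26, 28, 32, 34), "t7": (26, 28, 32, 34),
    "t11": (18, 20, 25, 27), "t12": (13, 15, 18, 20),
    "t13": (20, 22, 25, 27), "t14": (23, 25, 30, 32),
}

def fuzzy_add(o, d):
    """Component-wise addition of two 4-tuples, as in formula (3)."""
    return tuple(oi + di for oi, di in zip(o, d))

def timestamp_after(start, sequence):
    """Timestamp of the token after a conflict-free transition sequence."""
    return reduce(fuzzy_add, (DELAYS[t] for t in sequence), start)

START = (0, 0, 0, 0)   # both trains depart at 7:00am
tr1_at_p6 = {seq: timestamp_after(START, seq)
             for seq in [("t1", "t3", "t5"), ("t2", "t4", "t5"), ("t6", "t7")]}
tr2_at_p12 = {seq: timestamp_after(START, seq)
              for seq in [("t11", "t13"), ("t12", "t14")]}
```

Under these assumptions Tr1 reaches S6 with [49,55,70,76], [56,62,80,86] or [52,56,64,68], and Tr2 with [38,42,50,54] or [36,40,48,52], all in minutes after 7:00am.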
5 Analysis of Temporal Uncertainty of Trains Converging

This section analyzes the issues that arise when adjusting the train operation plan for the model in Fig. 3, including the trains converging time, the possibility of converging, the train terminal time and the adjustment of the train operation time.

5.1 Problem Statement

The analysis of the time parameters of trains converging in stations reduces to the computation of timestamps and the analysis of transition sequences from one state of FTPN's to another. Three transition sequences, t1,t3,t5; t2,t4,t5; t6,t7, exist when one token moves from p1 to p6. Two sequences, t11,t13; t12,t14, exist when a token moves from p9 to p12. Suppose that the possible fuzzy timestamps in p6 are π61(τ), π62(τ), π63(τ) and the fuzzy timestamps in p12 are π121(τ), π122(τ). When there are tokens in both p6 and p12, the system marking is Mp: {(p6, π6(τ)), (p12, π12(τ))}. Mp is the critical state that decides whether trains converging happens or not. All possible fuzzy timestamps in Mp can be calculated by formulas (3) and (4), and the results are shown in Fig. 2.

Fig. 2. All possible fuzzy timestamps in Mp

From Fig. 2, transitions t8, t15 and t9 may be enabled simultaneously under Mp, so the corresponding events e8, e15 and e9 are in conflict. With different firing strategies, the occurrence sequence from one state to another is different. Here we adopt the principle of small subscript first to deal with events that have the same enabling time and are independent of each other. The behaviors of the two trains after arriving in station S6 are summarized as follows:
q1: Mp [e9>Mj [e10>M10 [e16>Mf
q2: Mp [e8>M8 [e15>Mnj [e10>M10' [e16>Mnf
where the markings are Mj: {(p7, πj1(τ)), (p13, πj2(τ))}, Mnj: {(p7, πnj1(τ)), (p13, πnj2(τ))}, Mf: {(p8, π8(τ)), (p14, π14(τ))} and Mnf: {(p8, π8'(τ)), (p14, π14'(τ))}. The trains converging issue is the computation of timestamps and the analysis of transition sequences from state M0 to Mp, then to Mj or Mnj, and on to the final state Mf or Mnf. All the transition sequences and the related markings are listed in Table 2. When trains converging does not happen, the values of the corresponding fuzzy timestamps remain unchanged, so we use the same subscripts in Table 2.

Table 2. Transition sequences and part markings of model

Sequence with trains converging (q1):
Case   Transition sequence                  State Mp before converging             State Mj after converging
q11    t1,t3,t5,t11,t13,t9,t10,t16          Mp1: {(p6,π61(τ)),(p12,π121(τ))}       Mj1: {(p7,π71(τ)),(p13,π131(τ))}
q12    t6,t7,t11,t13,t9,t10,t16             Mp2: {(p6,π63(τ)),(p12,π121(τ))}       Mj2: {(p7,π72(τ)),(p13,π132(τ))}
q13    t1,t3,t5,t12,t14,t9,t10,t16          Mp3: {(p6,π61(τ)),(p12,π122(τ))}       Mj3: {(p7,π73(τ)),(p13,π133(τ))}

Sequence without trains converging (q2):
Case   Transition sequence                  State Mp before converging             State Mnj after converging
q21    t2,t4,t5,t11,t13,t8,t15,t10,t16      Mp4: {(p6,π62(τ)),(p12,π121(τ))}       Mnj1: {(p7,π62(τ)),(p13,π121(τ))}
q22    t2,t4,t5,t12,t14,t8,t15,t10,t16      Mp5: {(p6,π62(τ)),(p12,π122(τ))}       Mnj2: {(p7,π62(τ)),(p13,π122(τ))}
q23    t6,t7,t12,t14,t8,t15,t10,t16         Mp6: {(p6,π63(τ)),(p12,π122(τ))}       Mnj3: {(p7,π63(τ)),(p13,π122(τ))}
q24    t1,t3,t5,t11,t13,t8,t15,t10,t16      Mp7: {(p6,π61(τ)),(p12,π121(τ))}       Mnj4: {(p7,π61(τ)),(p13,π121(τ))}
q25    t6,t7,t11,t13,t8,t15,t10,t16         Mp8: {(p6,π63(τ)),(p12,π121(τ))}       Mnj5: {(p7,π63(τ)),(p13,π121(τ))}
q26    t1,t3,t5,t12,t14,t8,t15,t10,t16      Mp9: {(p6,π61(τ)),(p12,π122(τ))}       Mnj6: {(p7,π61(τ)),(p13,π122(τ))}

On the basis of the earliest enabling time firing strategy, we adopt the strategies of multi-condition first firing and high-possibility first firing to deal with the temporal uncertainty of trains converging. The questions are whether trains converging occurs, what the possibility of converging is, what the earliest or latest terminal time is, and how to adjust the train operation plan given the analysis of trains converging.
5.2 Analysis Using the Strategy of Multi-condition First Firing

The strategy of multi-condition first firing means that, comparing the number of tokens required by the enabled events, the transition requiring more tokens fires first. This strategy highlights the complex event that requires more resources. With this strategy the occurrence sequence is q1. The analysis of the trains converging time corresponds to the computation of the timestamp of e9. The fuzzy occurrence time of e9 under Mp1, Mp2 and Mp3 in Table 2 can be calculated by formulas (1) and (2) respectively: oe91(τ) = 0.5[49,52,52,54], oe92(τ) = 0.25[52,53,53,54], oe93(τ) = 0.25[49,51,51,52]. The final fuzzy timestamps corresponding to marking Mf can be calculated by formula (4), and the results are shown with dashed lines in Fig. 3(a) and (b). The following analysis of trains converging can be made from the above computation.
• The trains converging time and the possibility of converging are given by oe9(τ). The highest occurrence possibility, 0.5, is reached at τ = 52, i.e., trains Tr1 and Tr2 can meet in station S6 between 7:49am and 7:54am, and the largest possibility is 0.5.
• The earliest or latest terminal time after trains converging can be read from Fig. 3(a) and (b). The earlier fuzzy time functions with which Tr1 arrives at its terminal S11 (place p8) are π81(τ) = 0.5[64,69,76,80] and π83(τ) = 0.25[64,68,75,78], so the earliest terminal time of Tr1 is 8:04am and the highest possibility of arriving at the terminal station is 0.5.
• The paths corresponding to the earliest or latest terminal time can also be obtained. The fuzzy timestamp π81(τ) arises from the transition sequence t1,t3,t5,t11,t13,t9,t10,t16, i.e., Tr1 runs only through the sections Se12, Se25, Se56 and Se611 and stops for a while at station S6, which gives the earliest terminal time of 8:04am.
The time and paths of Tr2 can be analyzed in the same way.
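The occurrence times of e9 quoted above can be checked with a short sketch. It assumes that the overlap between the enabling time of e9 and the earliest enabling time is summarized by the crossing of the rising edge of the former with the falling edge of the latter, which is valid only when the two plateaus do not overlap; the helper names are hypothetical and the input timestamps are the path sums of the delays in the model.

```python
def latest(p, q):
    return tuple(max(x, y) for x, y in zip(p, q))

def earliest(p, q):
    return tuple(min(x, y) for x, y in zip(p, q))

def fuzzy_add(o, d):
    return tuple(oi + di for oi, di in zip(o, d))

def occurrence_e9(pi6, pi12):
    """Height and corner points of the overlap for the converging event e9.

    Assumes the enabling time of e9 rises after the plateau of the earliest
    enabling time has ended, so the overlap is a triangle.
    """
    enabling = latest(pi6, pi12)                      # e9 needs both tokens
    first = earliest(enabling, earliest(pi6, pi12))   # earliest over e8, e15 and e9
    a, b = enabling[0], enabling[1]                   # rising edge of e9's enabling time
    c, d = first[2], first[3]                         # falling edge of the earliest time
    tau = (a * (d - c) + d * (b - a)) / ((b - a) + (d - c))
    return (tau - a) / (b - a), (a, tau, tau, d)

h1, o1 = occurrence_e9((49, 55, 70, 76), (38, 42, 50, 54))   # marking Mp1
h2, o2 = occurrence_e9((52, 56, 64, 68), (38, 42, 50, 54))   # marking Mp2
h3, o3 = occurrence_e9((49, 55, 70, 76), (36, 40, 48, 52))   # marking Mp3

# Terminal timestamp of Tr1 after converging under Mp1: add d9 and d10.
pi81 = fuzzy_add(fuzzy_add(o1, (2, 2, 4, 4)), (13, 15, 20, 22))
```

For Mp1 and Mp2 the sketch gives 0.5[49,52,52,54] and 0.25[52,53,53,54]. For Mp3 the unrounded crossing lies at about τ = 50.8 with height 0.3, whereas the text reports the rounded value 0.25[49,51,51,52] produced by the Min operator.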
5.3 Analysis with the Strategy of High-Possibility First Firing
The strategy of high-possibility first firing means that, for a given occurrence possibility threshold δ, if there is a time τ at which the possibility hi of oei(τ) is greater than δ, i.e. hi>δ, then the event ei is a key event to be analyzed. This strategy gives prominence to the events with higher possibility. If δ=0.5 in the strategy of high-possibility first firing, the trains converging event of the model becomes a non-occurring or low-possibility event. The occurrence sequence q2 includes all cases without trains converging. The fuzzy occurrence times of e8 and e15 are oe8(τ) = π6(τ) and oe15(τ) = π12(τ), and the marking after e8 and e15 is Mnj: {(p7,πnj1(τ)),(p13,πnj2(τ))}. The corresponding transition sequences and fuzzy timestamps can be seen in Table 2. All final fuzzy timestamps calculated by formula (4) are shown in Fig. 3(a) and (b) with solid lines. The following analysis of the temporal uncertainty of trains converging can be made from the above results.
• The trains converging event does not occur with this strategy, so the two trains run independently and there is no stop in station S6.
• The earliest or latest terminal time without trains converging can be read from Fig. 3(a) and (b). The earliest fuzzy time when Tr1 arrives in S8 is π81’(τ)=[62,70,90,98], so the earliest terminal time of Tr1 is 8:02am and the highest possibility is 1.
• Corresponding to π81(τ), Tr1 goes only through the sections Se12, Se25, Se56 and Se611 with no stop in S6, and the earliest terminal time is then 8:02am.
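The threshold rule of Section 5.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak possibilities of e9 are the values 0.5, 0.25 and 0.25 reported in Section 5.2, while the entry named "no_converging" uses the possibility 1 reported for the case without converging as a stand-in; the function name and dictionary keys are hypothetical.

```python
def key_events(heights, delta):
    """Return the events whose peak possibility h_i is strictly greater than delta."""
    return [name for name, h in heights.items() if h > delta]


# Peak possibilities: e9 values from Section 5.2; "no_converging" is an
# illustrative stand-in for the non-converging case with possibility 1.
heights = {
    "e9_Mp1": 0.5,
    "e9_Mp2": 0.25,
    "e9_Mp3": 0.25,
    "no_converging": 1.0,
}

selected = key_events(heights, delta=0.5)
```

Because the comparison is strict (hi > δ), the converging event with peak possibility 0.5 is not selected when δ = 0.5, which is consistent with the statement that converging becomes a non-occurring or low-possibility event under this strategy.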
(a) All possible fuzzy timestamps of token in p8
(b) All possible fuzzy timestamps of token in p14
Fig. 3. All final fuzzy timestamps
5.4 Analysis of Train Operation Plan Adjustment Based on Trains Converging
The analysis of train operation plan adjustment based on trains converging can be carried out from Fig. 2. The possibility of trains not converging is higher than that of trains converging. If the real system needs the trains to converge, we can carry out a quantitative analysis of the train operation plan using the FTPN model. Among all the cases of π6(τ) and π12(τ), the maximum time difference is 5. With the minimum adjustment degree, i.e. when the later token arrives 5 units of time ahead of schedule or the earlier token is delayed by 5 units, the event e9 has the highest possibility 1 through the transition sequence t1, t3, t5, t11, t13. That is to say, if Tr2 departs ahead of time or speeds up, or Tr1 postpones its departure, the possibility of trains converging will be higher. With the adjustment based on one train as benchmark and the highest possibility 1, Tr1 passes through Se12, Se25, Se56 and Tr2 goes through Se89, Se96; the time of converging is [44,50,50,54] or [49,55,55,59], as shown in Fig. 4(a) and (b).
(a) Fuzzy timestamps of Tr1 ahead for 5 minutes
(b) Fuzzy timestamps of Tr2 delaying for 5 minutes
Fig. 4. Train operation adjustment for trains converging
Similarly, when the later token is delayed by 5 units of time or the earlier token arrives 5 units of time ahead of schedule, e9 has the lowest possibility, 0.
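The two converging times reported in Section 5.4 differ coordinate-wise by exactly the 5-unit adjustment degree. A minimal sketch, assuming that advancing or delaying a token simply translates its trapezoidal timestamp along the time axis (the helper name is hypothetical), makes this relation explicit:

```python
def shift(trapezoid, units):
    """Translate a trapezoidal timestamp [a, b, c, d] by the given number of time units."""
    return tuple(t + units for t in trapezoid)


later_case = (49, 55, 55, 59)        # converging time reported in Section 5.4
earlier_case = shift(later_case, -5)  # the same timestamp moved 5 units earlier
```

Here `earlier_case` reproduces the other reported converging time, [44,50,50,54].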
6 Comparing Analysis with Reasoning Algorithm of Time Petri Net
Existing research on temporal knowledge reasoning algorithms for TPN [10,11] starts from the TPN system model, uses the finite set of system states to create a sprouting graph of time parameters, and then analyzes the time parameters with the sprouting graph.
Fig. 5. FTPN model of train operation
Fig. 6. TPN model of train operation
6.1 Time Petri Net Model of the Example
The time Petri net (TPN) adopts time intervals to deal with the temporal uncertainty of train group operation [10]. The TPN model of the example from [10] is shown in Fig. 6, where the time interval parameters are marked. The corresponding sprouting graph with the time parameters of the model is shown in Fig. 7.
Fig. 7. Time parameter sprouting graph of train operation
6.2 Comparing Analysis
We analyze the example using the temporal knowledge reasoning algorithm of TPN; the specific contrasts with the analysis in Section 5 can be seen in Table 3.
Table 3. Temporal knowledge reasoning contrast of TPN and FTPN

Item | TPN | FTPN
Representation of temporal uncertainty | Time interval | Fuzzy time function
Processing of temporal uncertainty | No quantitative analysis | Quantitative analysis
Data structure | Dynamic sprouting graph | Timestamp
Computation procedure | 4 | 2
Occurrence time of trains converging in q11, q12, q13 | [49,54]; [52,54]; [49,52] | 0.5[49,52,52,54]; 0.25[52,53,53,54]; 0.25[49,51,51,52]
The earliest terminal time of Tr1 after converging and transition sequence | [64,78]; q13: t1,t3,t5,t12,t14,t9,t10,t16 | 0.25[64,68,75,78]; q13: t1,t3,t5,t12,t14,t9,t10,t16
The earliest terminal time of Tr2 after converging and transition sequence | [69,83]; q13: t1,t3,t5,t12,t14,t9,t10,t16 | 0.25[69,73,80,83]; q13: t1,t3,t5,t12,t14,t9,t10,t16
Adjustment of train operation plan | No | Adjustment degree is 5 units of time
The following conclusions can be drawn from Table 3: (1) The descriptions of temporal uncertainty given by the two methods are consistent. (2) The fuzzy time functions of FTPN support quantitative analysis. (3) Our computation procedure is simpler. The existing method has four procedures: modeling, searching system states, creating the sprouting graph, and analyzing. Our method needs only two procedures: searching system states, and calculating and analyzing time parameters as tokens move. (4) Time analysis is easier in our method. FTPN adopts a token structure with fuzzy timestamps, which is much simpler than the sprouting graph structure. (5) Our method adopts firing strategies that can give prominence to key events under different conditions. (6) The method is convenient for the integration of train object models.
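Conclusion (1) can be checked directly against the rows of Table 3. The sketch below assumes that "consistent" means the support [a, d] of each fuzzy timestamp h[a,b,c,d] coincides with the corresponding TPN time interval; the data are copied from Table 3, and the helper name is illustrative.

```python
def support(fuzzy_time):
    """Support [a, d] of a fuzzy timestamp given as (h, (a, b, c, d))."""
    _, (a, _, _, d) = fuzzy_time
    return (a, d)


# (TPN interval, FTPN fuzzy timestamp) pairs taken from Table 3.
table3_pairs = [
    ((49, 54), (0.5, (49, 52, 52, 54))),
    ((52, 54), (0.25, (52, 53, 53, 54))),
    ((49, 52), (0.25, (49, 51, 51, 52))),
    ((64, 78), (0.25, (64, 68, 75, 78))),
    ((69, 83), (0.25, (69, 73, 80, 83))),
]

consistent = all(support(ftpn) == tpn for tpn, ftpn in table3_pairs)
```

In every row the TPN interval equals the support of the fuzzy timestamp, while the fuzzy timestamp additionally carries the possibility height and the most plausible sub-interval.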
7 Conclusions
The paper defines a fuzzy time Petri net for the temporal uncertainty of train group operation and introduces four fuzzy set theoretic functions of time into the net to represent temporal uncertainty during train operation. Oriented to the temporal uncertainty of trains converging in train operation plan adjustment, this paper presents different firing strategies for the model to give prominence to key events and to simplify system modeling and analysis. The application instance shows that the method can quantitatively analyze the temporal uncertainty of trains converging, including the trains converging time, the possibility of converging, and the terminal time in train operation plan adjustment. Comparing with the time interval method, we conclude that our method has outstanding characteristics such as accurate analysis, simple computation, system simplification and convenience for system integration. The method can therefore be used to represent temporal knowledge and enriches the research on various railway expert systems.
References
1. Jia, L.-M., Jiang, Q.-H.: Study on Essential Characters of RITS. Proceeding of 6th International Symposium on Autonomous Decentralized Systems. IEEE Computer Society, Pisa Italy (2003) 216-221.
2. Ye, Y.-D., Zhang, L., Du, Y.-H., Jia, L.-M.: Three-dimension Train Group Operation Simulation System Based on Petri Net with Objects. Proceeding of IEEE 6th International Conference on Intelligent Transportation Systems. Vol.2. (2003) 1568-1573.
3. Murata, T.: Temporal Uncertainty and Fuzzy-timing High-Level Petri Nets. 17th International conference on Application and Theory of Petri Nets, Lecture Notes in Computer Science, vol.1091. Springer-Verlag, New York (1996) 11-28.
4. Zhou, Y., Murata, T.: Petri Net Model with Fuzzy-timing and Fuzzy-Metric. International Journal of Intelligent Systems. 14(8) (1999) 719-746.
5. Zhou, Y., Murata, T., DeFanti, T.A.: Modeling and Performance Analysis using Extended Fuzzy-timing Petri Nets for Networked Virtual Environments. IEEE Transaction on System, Man and Cybernetics-Part B: Cybernetics, 30(5) (2000) 737-756.
6. Cardoso, J., Valette, R., Dubois, D.: Possibilistic Petri Nets. IEEE Transaction on System, Man and Cybernetics-Part B: Cybernetics. 29(5) (1999) 573-582.
7. Dubois, D., Prade, H.: Processing Fuzzy Temporal Knowledge. IEEE Transaction on System, Man and Cybernetics. 19(4) (1989) 729-744.
8. Merlin, P.M.: A Methodology for The Design and Implementation of Communication Protocol. IEEE Transaction on Communication. 24(6) (1976) 614-621.
9. Tsai, J.-J.P., Yang, S.-J., Chang, Y.-H.: Timing Constraint Petri Nets and Their Application to Schedulability Analysis of Real-time System Specifications. IEEE Transaction on Software Engineering. 21(1) (1995) 32-49.
10. Ye, Y.-D., Du, Y.-H., Gao, J.-W., Jia, L.-M.: A Temporal Knowledge Reasoning Algorithm using Time Petri Nets and its Applications in Railway Intelligent Transportation System. Journal of the China Railway Society. 24(5) (2002) 5-10.
11. Jong, W.-T., Shiau, Y.-S., Horng, Y.-J., Chen, H.-H., Chen, S.-M.: Temporal Knowledge Representation and Reasoning Techniques Using Time Petri Nets. IEEE Transaction on System, Man and Cybernetics, Part-B: Cybernetics. 29(4) (1999) 541-545.
Interval Regression Analysis Using Support Vector Machine and Quantile Regression
Changha Hwang1, Dug Hun Hong2, Eunyoung Na3, Hyejung Park3, and Jooyong Shim4
1 Division of Information and Computer Sciences, Dankook University, Yongsan Seoul, 140-714, South Korea [email protected]
2 Department of Mathematics, Myongji University, Yongin Kyunggido, 449-728, South Korea [email protected]
3 Department of Statistical Information, Catholic University of Daegu, Kyungbuk 712-702, South Korea {ney111, hyjpark}@cu.ac.kr
4 Corresponding Author, Department of Statistics, Catholic University of Daegu, Kyungbuk 702-701, South Korea [email protected]
Abstract. This paper deals with interval regression analysis using support vector machine and quantile regression method. The algorithm consists of two phases - the identification of the main trend of the data and the interval regression based on acquired main trend. Using the principle of support vector machine the linear interval regression can be extended to the nonlinear interval regression. Numerical studies are then presented which indicate the performance of this algorithm.
1 Introduction
Regression analysis has been the most frequently used technique in various areas of business, science, and engineering. But we encounter many situations where the necessary assumptions for regression analysis cannot be met because the data are not based on random variables. In such situations interval regression analysis, which is regarded as the simplest version of the possibilistic regression analysis introduced by Tanaka et al.[9], can be a good alternative. Interval regression, which can measure the spread of the data in the presence of heteroscedasticity, is addressed and analyzed here. In interval regression analysis, an interval containing all the sample data is estimated. However, when the estimation is made from the whole data, outliers dominate the estimated interval, and in this case the analysis of the tendency of the whole data becomes difficult. Lee and Tanaka[6] proposed upper and lower approximation models, based on quantile regression, to overcome this problem in linear interval regression. The support vector machine (SVM), first developed by Vapnik[10][11], is being used as a new technique for regression and classification problems. SVM is gaining popularity due to its many attractive features and promising empirical performance. It has been successfully applied to a number of real world problems such as handwritten character and digit recognition, face detection, text categorization and object detection in machine vision. The aforementioned applications are classification problems. SVM was initially developed to solve classification problems, but it has since been extended to the domain of regression problems, where it is also widely applicable. SVM is based on the structural risk minimization (SRM) principle, which has been shown to be superior to the traditional empirical risk minimization (ERM) principle. SRM minimizes an upper bound on the expected risk, whereas ERM minimizes the error on the training data. By minimizing this bound, high generalization performance can be achieved. Introductions to and overviews of recent developments in SVM regression can be found in Gunn[2], Smola and Schoelkopf[8], Kecman[4], and Wang[12]. SVM avoids over-fitting by using a regularization technique and is robust to outliers. Hong and Hwang[3] introduced interval regression analysis using quadratic loss SVM for crisp input and interval output. In this paper, we propose interval regression analysis for crisp input and output using the principle of SVM and quantile regression. The proposed model can be applied to both linear and nonlinear interval regression. The remainder of this paper is organized as follows. In Section 2 we explain the estimation methods of quantile regression using SVM. In Section 3 we explain the estimation methods of interval regression using the principle of SVM and the quantile regression method of Section 2. In Section 4 we perform numerical studies through four examples. Finally, Section 5 gives the conclusions.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 100–109, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Quantile Regression Using Support Vector Machine
The quantile regression model introduced by Koenker and Bassett[5] assumes that the conditional quantile function of the response given $x_i$ is linearly related to the covariate vector $x_i$ as follows:
$$Q(p \mid x_i) = \beta(p)^t x_i \quad \text{for } p \in (0, 1), \tag{1}$$
where $\beta(p)$, the $p$-th regression quantile, is defined as any solution to the optimization problem
$$\min_{\beta} \sum_{i=1}^{n} \rho_p(y_i - \beta^t x_i), \tag{2}$$
where $\rho_p(\cdot)$ is the check function defined as
$$\rho_p(r) = p\, r\, I(r \ge 0) + (p - 1)\, r\, I(r < 0). \tag{3}$$
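As a small illustration of the check function (3), the following Python sketch evaluates it and confirms, on a toy sample, that minimizing the summed check loss over a constant predictor picks out a sample quantile. The toy data and function names are assumptions made for illustration; the intercept-only search over observed values is a simplification of problem (2), not the estimation method used in this paper.

```python
import math


def check_loss(r, p):
    """Check function rho_p(r) = p*r for r >= 0 and (p - 1)*r for r < 0."""
    return p * r if r >= 0 else (p - 1) * r


def total_loss(q, ys, p):
    """Objective of problem (2) for a constant predictor q (intercept only)."""
    return sum(check_loss(y - q, p) for y in ys)


# Toy sample (hypothetical); candidates are restricted to the observed values.
ys = [1.0, 2.0, 3.0, 4.0, 10.0]
best_median = min(ys, key=lambda q: total_loss(q, ys, 0.5))
best_q25 = min(ys, key=lambda q: total_loss(q, ys, 0.25))
```

With p = 0.5 the minimizer is the sample median, and with p = 0.25 it moves to a lower order statistic, because negative residuals are then weighted three times as heavily as positive ones.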
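Here we reexpress the above problem by the formulation stated in Vapnik[10][11]:
$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + \gamma \sum_{i=1}^{n} \left( p\,\xi_i + (1 - p)\,\xi_i^* \right) \quad \text{for } p \in (0, 1),$$
$$\text{subject to} \quad y_i - w^t x_i \le \xi_i, \quad w^t x_i - y_i \le \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \tag{4}$$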
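where the $p$-th regression quantile $\beta(p)$ is expressed in terms of $w$. We construct a Lagrange function as follows:
$$L = \frac{1}{2}\|w\|^2 + \gamma \sum_{i=1}^{n} \left( p\,\xi_i + (1 - p)\,\xi_i^* \right) - \sum_{i=1}^{n} \alpha_i (\xi_i - y_i + w^t x_i) - \sum_{i=1}^{n} \alpha_i^* (\xi_i^* + y_i - w^t x_i) - \sum_{i=1}^{n} (\eta_i \xi_i + \eta_i^* \xi_i^*). \tag{5}$$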
We notice that the positivity constraints $\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$ should be satisfied. After taking partial derivatives of the above equation with regard to the primal variables $(w, \xi_i, \xi_i^*)$ and plugging them into the above equation, we have the optimization problem below:
$$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, x_i^t x_j + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) \tag{6}$$
with constraints $\alpha_i \in [0, p\gamma]$ and $\alpha_i^* \in [0, (1 - p)\gamma]$. Solving the above problem under these constraints determines the optimal Lagrange multipliers $\alpha_i, \alpha_i^*$; the $p$-th regression quantile estimator and the conditional quantile function estimator given the covariate vector $x$ are then obtained as, respectively,
$$w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, x_i \quad \text{and} \quad Q(p \mid x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, x_i^t x. \tag{7}$$
For the nonlinear quantile regression case, the conditional quantile function requires the computation of dot products $\phi(x_k)^t \phi(x_l)$, $k, l = 1, \cdots, n$, in a potentially higher dimensional feature space. Under certain conditions (Mercer[7]), these demanding computations can be reduced significantly by introducing a kernel function $K$ such that
$$\phi(x_k)^t \phi(x_l) = K(x_k, x_l). \tag{8}$$
The kernels often used are given below:
$$K(x, y) = (x^t y + 1)^d, \qquad K(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}, \tag{9}$$
where $d$ and $\sigma^2$ are kernel parameters.
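The two kernels in (9) can be written down directly. The sketch below is a plain-Python illustration, with vectors represented as lists of floats; the function names and the helper that builds the Gram matrix of kernel values used in the dual problems are assumptions made for illustration.

```python
import math


def dot(x, y):
    """Euclidean inner product x^t y."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x^t y + 1)^d from (9)."""
    return (dot(x, y) + 1.0) ** d


def gaussian_kernel(x, y, sigma2=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) from (9)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma2))


def gram_matrix(xs, kernel):
    """Matrix of kernel values K(x_k, x_l) replacing the dot products in (8)."""
    return [[kernel(a, b) for b in xs] for a in xs]


xs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
K = gram_matrix(xs, lambda a, b: gaussian_kernel(a, b, sigma2=0.5))
```

The Gram matrix is symmetric with ones on the diagonal for the Gaussian kernel, which is a quick sanity check before passing it to a quadratic programming solver.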
3 Support Vector Interval Regression
The proposed model is divided into two parts: the lower approximation model and the upper approximation model. In the lower approximation model, a main proportion of the data without extreme points is determined from the data classified into three classes. In the upper approximation model, intervals including all observations are obtained based on the already obtained lower approximation model.
3.1 Lower Approximation Model
In the lower approximation model, we identify the interval regression model from the main trend of the data. We adopt quantile regression using SVM to select the data set which we mainly want to consider. By applying quantile regression, we can determine a main proportion of the given data without extreme points. If we mainly want to consider the central $100p\%$ $(0 < p < 1)$ of the observations, that portion of the data can be obtained by quantile regression with $p_1 = 0.5 + p/2$ and $p_2 = 0.5 - p/2$. The obtained model is insensitive to extreme data points since only the central $100p\%$ of the observations are used. The lower approximation model is expressed as
$$Y_L(x_i) = B_0 + B_1 x_{i1} + \cdots + B_m x_{im} = \left[ y_L^-(x_i),\; y_L^+(x_i) \right], \quad i \in C_2, \tag{10}$$
where the interval coefficients are denoted as $B_j = (b_j, d_j)$ $(j = 0, 1, \cdots, m)$ and $y_L^-(x_i)$ and $y_L^+(x_i)$ are the bounds of $Y_L(x_i)$. Outputs in Class 2 ($i \in C_2$) should be included in the lower approximation model $Y_L(x_i)$, which can be expressed as follows:
$$y_i \in Y_L(x_i) \iff \begin{cases} b^t x_i + d^t |x_i| \ge y_i \\ b^t x_i - d^t |x_i| \le y_i \end{cases}, \quad i \in C_2. \tag{11}$$
It is desirable that observations in $C_1$ are located above the upper bound of $Y_L(x_i)$, while observations in $C_3$ are located below the lower bound of $Y_L(x_i)$. But, for multivariate data, some observations in $C_1$ or $C_3$ or both may be included in the lower approximation model $Y_L(x_i)$. Thus, to permit some observations in $C_1$ or $C_3$ or both to be included in $Y_L(x_i)$, we introduce a tolerance vector $\theta = (\theta_0, \cdots, \theta_m)^t$. Adding the tolerance vector $\theta$ to the radius vector $d$ in the interval coefficient vector $B$, the following inequalities for observations in $C_1$ and $C_3$ are considered:
$$y_i \ge b^t x_i + (d^t |x_i| - \theta^t |x_i|), \quad i \in C_1, \qquad y_i \le b^t x_i - (d^t |x_i| - \theta^t |x_i|), \quad i \in C_3, \tag{12}$$
where $\theta_j$ is the tolerance parameter introduced by Lee and Tanaka[6]. $\theta_j = 0$ $(j = 0, 1, \cdots, m)$ indicates that the lower approximation model $Y_L(x_i)$ classifies the data into three classes clearly, and $\theta_j \ne 0$ for some $j \in \{0, 1, \cdots, m\}$ indicates that the lower approximation model $Y_L(x_i)$ includes some data points in $C_1$ or $C_3$ or both. Based on the assumptions mentioned above, the problem is to obtain the optimal interval coefficients $B_j = (b_j, d_j)$ $(j = 0, 1, \cdots, m)$ of the lower approximation model that minimize the following objective function:
$$\min \; \frac{1}{2}\left( \|b\|^2 + \|d\|^2 + \|\theta\|^2 \right) + \gamma \left( \sum_{i \in C} \xi_{1i} + \sum_{i \in C} \xi_{2i} + \sum_{i \in C_2} (\xi_{3i} + \xi_{3i}^*) \right)$$
subject to
$$\begin{cases}
d^t |x_i| \le \xi_{1i}, & i \in C = C_1 \cup C_2 \cup C_3 \\
\theta^t |x_i| \le \xi_{2i}, & i \in C \\
y_i - b^t x_i \le \epsilon + \xi_{3i}, \quad b^t x_i - y_i \le \epsilon + \xi_{3i}^*, & i \in C_2 \\
b^t x_i - d^t |x_i| \le y_i \le b^t x_i + d^t |x_i|, & i \in C_2 \\
y_i \ge b^t x_i + d^t |x_i| - \theta^t |x_i|, & i \in C_1 \\
y_i \le b^t x_i - d^t |x_i| + \theta^t |x_i|, & i \in C_3 \\
d^t |x_i| - \theta^t |x_i| \ge 0, & i \in C \\
\xi_{1i} \ge 0, \; \xi_{2i} \ge 0, \; i \in C \quad \text{and} \quad \xi_{3i}^{(*)} \ge 0, & i \in C_2.
\end{cases} \tag{13}$$
Here, $\xi_{1i}$ represent the spreads of the estimated outputs, and $\xi_{2i}, \xi_{3i}, \xi_{3i}^*$ are slack variables representing upper and lower constraints on the outputs of the model. Constructing a Lagrange function and differentiating it with respect to the primal variables $b, d, \theta, \xi_{1i}, \xi_{2i}, \xi_{3i}, \xi_{3i}^*$, we can derive the corresponding dual optimization problem.
t α2i α2j |xi |t |xj | max − 21 i,j∈C α1i α1j |xi | |xj | +
i,j∈C t +2 i,j∈C α7i α7j |xi | |xj | + i,j∈C2 (α3i − α∗3i )(α3j − α∗3j ) xti xj
+ i,j∈C2 (α4i − α∗4i )(α4j − α∗4j ) xti xj
+ − 2 i∈C1 ,j∈C3 α5i α6j xti xj
+ i,j∈C2 (α4i + α∗4i )(α4j + α∗4j ) |xi |t |xj |
+3 i,j∈C1 α5i α5j |xi |t |xj | + 3 i,j∈C3 α6i α6j |xi |t |xj |
+ i,j∈C2 (α3i − α∗3i )(α4j − α∗4j ) xti xj
−2 i∈C2 ,j∈C1 (α3i − α∗3i )α5j xti xj
+2 i∈C2 ,j∈C3 (α3i − α∗3i )α6j xti xj
−2 i∈C2 ,j∈C1 (α4i − α∗4i )α5j xti xj
+2 i∈C2 ,j∈C3 (α4i − α∗4i )α6j xti xj
(14) −2 i∈C,j∈C2 (α1i + α∗7i )(α4j + α∗4j ) |xi |t |xj |
∗ t −2 i∈C2 ,j∈C1 (α4i + α4i )α5j |xi | |xj |
−2 i∈C2 ,j∈C3 (α4i + α∗4i )α6j |xi |t |xj |
+2 i,j∈C (α1i + α∗2i )α7j |xi |t |xj |
+2 i∈C,j∈C1 (α1i − α∗2i )α5j |xi |t |xj |
−2 i∈C,j∈C3 (α1i + α∗2i )α6j |xi |t |xj |
+4 i∈C1 ,j∈C3 α5i α6j |xi |t |xj |
yi ) + i∈C2 α∗3i ( + yi ) − i∈C2 α3i ( − − i∈C2 α4i yi + i∈C2 α∗4i yi + i∈C1 α5i yi − i∈C3 α∗3i yi (∗) (∗) subject to 0 ≤ α1i , α2i , α3i ≤ γ, α4i , α5i , α6i , α7i ≥ 0.
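Solving the above problem, the optimal lower approximation model is obtained as follows:
$$Y_L(x) = \Big( \sum_{i \in C_2} \left[ (\alpha_{3i} - \alpha_{3i}^*) + (\alpha_{4i} - \alpha_{4i}^*) \right] x_i^t x - \sum_{i \in C_1} \alpha_{5i}\, x_i^t x + \sum_{i \in C_3} \alpha_{6i}\, x_i^t x, \;\; \sum_{i \in C} \left[ -\alpha_{1i} + \alpha_{7i} \right] |x_i|^t |x| + \sum_{i \in C_2} (\alpha_{4i} + \alpha_{4i}^*)\, |x_i|^t |x| - \sum_{i \in C_1} \alpha_{5i}\, |x_i|^t |x| - \sum_{i \in C_3} \alpha_{6i}\, |x_i|^t |x| \Big). \tag{15}$$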
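The ingredients of the lower approximation model can be sketched in a few lines of Python. The sketch assumes that the three classes are formed by comparing each output with the fitted upper and lower quantile curves (above, between, below), which is one natural reading of the description above; the coefficient values are hypothetical and all function names are illustrative.

```python
import math


def quantile_levels(p):
    """Quantile levels p1, p2 bracketing the central 100p% of the data."""
    return 0.5 + p / 2.0, 0.5 - p / 2.0


def classify(y, q_upper, q_lower):
    """Assumed rule: C1 above the p1 curve, C3 below the p2 curve, C2 otherwise."""
    if y > q_upper:
        return "C1"
    if y < q_lower:
        return "C3"
    return "C2"


def lower_model_interval(b, d, x):
    """Bounds [b^t x - d^t|x|, b^t x + d^t|x|] of Y_L(x) as in (10) and (11)."""
    center = sum(bj * xj for bj, xj in zip(b, x))
    radius = sum(dj * abs(xj) for dj, xj in zip(d, x))
    return center - radius, center + radius


def satisfies_tolerance(y, b, d, theta, x, cls):
    """Check inequality (12) for an observation in C1 or C3."""
    center = sum(bj * xj for bj, xj in zip(b, x))
    reduced = sum((dj - tj) * abs(xj) for dj, tj, xj in zip(d, theta, x))
    if cls == "C1":
        return y >= center + reduced
    if cls == "C3":
        return y <= center - reduced
    raise ValueError("inequality (12) applies only to C1 and C3")


p1, p2 = quantile_levels(0.6)                 # main trend of 60%
b, d = [1.0, 2.0], [0.5, 0.25]                # hypothetical centers and radii
x = [1.0, -2.0]                               # first entry plays the role of the intercept
lower, upper = lower_model_interval(b, d, x)
```

With a main trend of 60% the levels are 0.8 and 0.2, the values used in the numerical studies of this paper.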
3.2 Upper Approximation Model
In the upper approximation model, we formulate the interval regression model including all observations based on the already obtained lower approximation model (main trend). The upper approximation model $Y_U(x_i)$ including all data can be expressed as
$$Y_U(x_i) = A_0 + A_1 x_{i1} + \cdots + A_m x_{im} = \left[ y_U^-(x_i),\; y_U^+(x_i) \right], \quad i = 1, \cdots, n, \tag{16}$$
where $A_j = (a_j, c_j)$ $(j = 0, 1, \cdots, m)$, $a_j$ and $c_j$ are the center and the radius of the interval coefficient $A_j$, and $y_U^-(x_i)$ and $y_U^+(x_i)$ are the bounds of $Y_U(x_i)$. We have already obtained the lower approximation model $Y_L(x_i)$ representing the main trend of the given data. Since the upper approximation model $Y_U(x_i)$ should include the lower approximation model, we can formulate the upper approximation model including all observations as the following problem:
$$\min \; \frac{1}{2}\left( \|a\|^2 + \|c\|^2 \right) + \gamma \left( \sum_{i=1}^{n} \xi_{1i} + \sum_{i=1}^{n} \xi_{2i} + \sum_{i=1}^{n} \xi_{2i}^* \right),$$
$$\text{subject to} \quad \begin{cases}
c^t |x_i| \le \xi_{1i}, \\
y_i - a^t x_i \le \epsilon + \xi_{2i}, \quad a^t x_i - y_i \le \epsilon + \xi_{2i}^*, \\
a^t x_i - c^t |x_i| \le \min\left( y(x_i),\, y_L^-(x_i) \right), \\
a^t x_i + c^t |x_i| \ge \max\left( y(x_i),\, y_L^+(x_i) \right), \\
\xi_{1i} \ge 0, \quad \xi_{2i}^{(*)} \ge 0.
\end{cases} \tag{17}$$
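Then, constructing a Lagrange function and differentiating it with respect to $a, c, \xi_{1i}, \xi_{2i}, \xi_{2i}^*$, we have the corresponding dual optimization problem:
$$\begin{aligned}
\text{maximize} \quad & -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_{2i} - \alpha_{2i}^*)(\alpha_{2j} - \alpha_{2j}^*)\, x_i^t x_j
 - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_{3i} - \alpha_{3i}^*)(\alpha_{3j} - \alpha_{3j}^*)\, x_i^t x_j \\
& - \sum_{i,j=1}^{n} (\alpha_{2i} - \alpha_{2i}^*)(\alpha_{3j} - \alpha_{3j}^*)\, x_i^t x_j
 - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_{3i} + \alpha_{3i}^*)(\alpha_{3j} + \alpha_{3j}^*)\, |x_i|^t |x_j| \\
& - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_{1i} \alpha_{1j}\, |x_i|^t |x_j|
 + \sum_{i,j=1}^{n} \alpha_{1i} (\alpha_{3j} + \alpha_{3j}^*)\, |x_i|^t |x_j|
 + \sum_{i=1}^{n} (\alpha_{2i} - \alpha_{2i}^*)\, y_i \\
& + \sum_{i=1}^{n} \alpha_{3i} \max\left( y(x_i),\, y_L^+(x_i) \right)
 - \sum_{i=1}^{n} \alpha_{3i}^* \min\left( y(x_i),\, y_L^-(x_i) \right)
 - \epsilon \sum_{i=1}^{n} (\alpha_{2i} + \alpha_{2i}^*).
\end{aligned} \tag{18}$$
Solving the above problem, the optimal upper approximation model is obtained as follows: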
$$Y_U(x) = \Big( \sum_{i=1}^{n} \left[ (\alpha_{2i} - \alpha_{2i}^*) + (\alpha_{3i} - \alpha_{3i}^*) \right] x_i^t x, \;\; \sum_{i=1}^{n} \left[ -\alpha_{1i} + (\alpha_{3i} + \alpha_{3i}^*) \right] |x_i|^t |x| \Big). \tag{19}$$
By using the kernel trick mentioned in Section 2, we can easily extend the linear interval regression to the nonlinear interval regression. We obtain the following dual optimization problem:
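$$\begin{aligned}
\text{maximize} \quad & -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_{2i} - \alpha_{2i}^*)(\alpha_{2j} - \alpha_{2j}^*)\, K(x_i, x_j)
 - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_{3i} - \alpha_{3i}^*)(\alpha_{3j} - \alpha_{3j}^*)\, K(x_i, x_j) \\
& - \sum_{i,j=1}^{n} (\alpha_{2i} - \alpha_{2i}^*)(\alpha_{3j} - \alpha_{3j}^*)\, K(x_i, x_j)
 - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_{3i} + \alpha_{3i}^*)(\alpha_{3j} + \alpha_{3j}^*)\, K(|x_i|, |x_j|) \\
& - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_{1i} \alpha_{1j}\, K(|x_i|, |x_j|)
 + \sum_{i,j=1}^{n} \alpha_{1i} (\alpha_{3j} + \alpha_{3j}^*)\, K(|x_i|, |x_j|)
 + \sum_{i=1}^{n} (\alpha_{2i} - \alpha_{2i}^*)\, y_i \\
& + \sum_{i=1}^{n} \alpha_{3i} \max\left( y(x_i),\, y_L^+(x_i) \right)
 - \sum_{i=1}^{n} \alpha_{3i}^* \min\left( y(x_i),\, y_L^-(x_i) \right)
 - \epsilon \sum_{i=1}^{n} (\alpha_{2i} + \alpha_{2i}^*).
\end{aligned} \tag{20}$$
Here we should notice that the constraints $0 \le \alpha_{1i}, \alpha_{2i}, \alpha_{2i}^* \le \gamma$ and $\alpha_{3i}, \alpha_{3i}^* \ge 0$ are unchanged. Solving the above dual optimization problem determines the Lagrange multipliers $\alpha_{1i}, \alpha_{ki}, \alpha_{ki}^*$, $k = 2, 3$. For the nonlinear case, the difference from the linear case is that $a$ and $c$ are no longer explicitly given. However, they are uniquely defined in the weak sense by the inner products $a^t \phi(x)$ and $c^t \phi(|x|)$. Similar to the linear case, it is noted that $c^t \phi(|x|)$ is nonnegative. Therefore, the interval nonlinear regression function is given as follows: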
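$$Y_U(x) = \Big( \sum_{i=1}^{n} \left[ (\alpha_{2i} - \alpha_{2i}^*) + (\alpha_{3i} - \alpha_{3i}^*) \right] K(x_i, x), \;\; \sum_{i=1}^{n} \left[ -\alpha_{1i} + (\alpha_{3i} + \alpha_{3i}^*) \right] K(|x_i|, |x|) \Big). \tag{21}$$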
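Once the multipliers are available, evaluating the nonlinear interval function (21) is a pair of weighted kernel sums. The following Python sketch assumes the Gaussian kernel in the form given in (9) and uses hypothetical multiplier values (they are not solutions of any dual problem in this paper); the function names are illustrative.

```python
import math


def gaussian_kernel(x, y, sigma2):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), the form given in (9)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma2))


def interval_prediction(x, xs, a1, a2, a2s, a3, a3s, kernel):
    """Return (center, radius) of Y_U(x) following the two sums in (21)."""
    abs_x = [abs(v) for v in x]
    center = sum(
        ((a2[i] - a2s[i]) + (a3[i] - a3s[i])) * kernel(xs[i], x)
        for i in range(len(xs))
    )
    radius = sum(
        (-a1[i] + a3[i] + a3s[i]) * kernel([abs(v) for v in xs[i]], abs_x)
        for i in range(len(xs))
    )
    return center, radius


# Hypothetical training inputs and multipliers, for illustration only.
xs = [[0.0], [1.0]]
a1, a2, a2s = [0.0, 0.0], [1.0, 0.0], [0.0, 0.5]
a3, a3s = [0.2, 0.1], [0.1, 0.1]

kernel = lambda u, v: gaussian_kernel(u, v, sigma2=0.5)
center, radius = interval_prediction([0.0], xs, a1, a2, a2s, a3, a3s, kernel)
bounds = (center - radius, center + radius)
```

The returned pair corresponds to the center and radius in (21); the interval bounds follow by subtracting and adding the radius.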
4 Numerical Studies
In order to illustrate the performance of the interval regression estimation using SVM, four examples are considered. Two examples are for linear interval regression and the other two are for nonlinear interval regression. First, we apply the linear interval regression to the officials number data (Lee and Tanaka[6]). The officials number data are from 32 mid-size cities of Korea (the number of officials is between 400 and 1000) in 1987.
Input data:
x1 = area (km²)
x2 = population (1000)
x3 = number of districts
x4 = annual revenues (10^8 won)
x5 = rate of water sewerage service (%)
x6 = rate of water service (%)
x7 = rate of housing supply (%)
Output data:
y = number of officials.
We put the main trend of the number of officials in mid-size cities as 60%, so that $p_1 = 0.8$ and $p_2 = 0.2$. Our goal is to determine the lower approximation model $Y_L(x_i)$ and the upper approximation model $Y_U(x_i)$. Here, $C_1, C_2, C_3$ are found by quantile regression using SVM. In this data set, $\gamma$ and $\epsilon$ are chosen as 30 and 0, which minimize $\sum_i d^t |x_i| + \sum_i |y_i - b^t x_i|$. For the illustration of the nonlinear interval regression, 51 values of $x$ are generated from $x_i = 0.04(i - 1) - 1$ and 51 values of $y$ are generated from $y_i = x_i + 2\exp(-16 x_i^2) + 0.5 e_i$, $i = 1, \cdots, 51$, $e_i \sim U(-1, +1)$. $C_1, C_2, C_3$ are found by SVM-based quantile regression and the Gaussian kernel $K(x, y) = e^{-\|x - y\|^2 / \sigma^2}$ is used. For quantile regression, the kernel parameter $\sigma^2$ and $\gamma$ are chosen as 0.1 and 500, respectively, by 10-fold cross validation. For interval regression, the kernel parameter $\sigma^2$ and $\gamma$ are chosen as 0.3 and 200, with $\epsilon = 0.05$, which minimize $\sum_i d^t |\phi(x_i)| + \sum_i |y_i - b^t \phi(x_i)|$. Figures 1 and 2 illustrate the estimation results for the officials number data and the simulated nonlinear data, respectively. In the figures the dotted line shows the main trend of the data and the solid line shows the interval regression including all data. We have two other real data sets, each with one input variable, so that they can show the difference between the linear and the nonlinear interval regression models clearly. Figure 3 illustrates the estimation result for the fisheggs data (DeBlois and Leggett[1]), where the number eaten and the density of eggs are known to be linearly related. For the estimation of the fisheggs data we use $p_1 = 0.8$, $p_2 = 0.2$, $\gamma = 200$, and $\epsilon = 0.1$. Figure 4 illustrates the estimation result for the ultrasonic data available from http://www.itl.nist.gov/div898/strd/nls, where the ultrasonic response and the metal distance are known to be nonlinearly related. Here we use $p_1 = 0.8$, $p_2 = 0.2$, $\gamma = 500$, $\sigma^2 = 1$ for the quantile regression, and $\gamma = 100$, $\sigma^2 = 1$, $\epsilon = 0.05$ for the interval regression. The Gaussian kernel is used for the ultrasonic data. The values of the parameters used in the estimation of the fisheggs data and the ultrasonic data are obtained from 10-fold cross validation and by minimizing $\sum_i d^t |\phi(x_i)| + \sum_i |y_i - b^t \phi(x_i)|$.
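The simulated nonlinear data set described above is easy to regenerate. The sketch below follows the stated formulas; the random seed, and therefore the particular noise draws, are assumptions, so the points will not coincide with those plotted in the paper. The second helper evaluates the selection criterion (sum of radii plus sum of absolute residuals) for given model outputs; both function names are illustrative.

```python
import math
import random


def simulate(n=51, seed=0):
    """x_i = 0.04(i - 1) - 1 and y_i = x_i + 2 exp(-16 x_i^2) + 0.5 e_i, e_i ~ U(-1, 1)."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for reproducibility
    xs = [0.04 * (i - 1) - 1.0 for i in range(1, n + 1)]
    ys = [x + 2.0 * math.exp(-16.0 * x * x) + 0.5 * rng.uniform(-1.0, 1.0) for x in xs]
    return xs, ys


def selection_criterion(radii, residuals):
    """Sum of interval radii plus sum of absolute residuals of the center line."""
    return sum(radii) + sum(abs(r) for r in residuals)


xs, ys = simulate()
```

The inputs form an equally spaced grid on [-1, 1], and the noise term keeps every output within 0.5 of the underlying curve.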
Fig. 1. Linear interval Regression 1 (horizontal axis: city number; vertical axis: number of officials)
Fig. 2. Nonlinear interval Regression 1 (horizontal axis: x; vertical axis: y)
Fig. 3. Linear interval Regression 2 (horizontal axis: density; vertical axis: number eaten)
Fig. 4. Nonlinear interval Regression 2 (horizontal axis: metal distance; vertical axis: ultrasonic response)

5 Conclusions
We proposed interval regression analysis using the principle of SVM and quantile regression. We identify the main trend from the lower approximation model using the designated center-located proportion of the given data. The obtained lower approximation model can explain the relationship between the input and output data well, because the obtained model is not influenced by extreme points. The upper approximation model including all data was obtained on the basis of the lower approximation model. Through examples, we showed that the proposed algorithm derives satisfactory solutions and is an attractive approach to interval regression. In particular, we can use this algorithm successfully when a linear model is inappropriate. By using the two approximation models we can easily recognize observations whose outputs are extremely far from the main trend of the given data, which makes the proposed algorithm useful for the analysis of data from a possibilistic viewpoint.
Acknowledgement This work was supported by the Korea Research Foundation Grant(KRF-2004042-C00020).
References
1. DeBlois, E.M. and Leggett, W.C. : Functional response and potential impact of invertebrate predators on benthic fish eggs: analysis of the Calliopius laeviusculus-capelin (Mallotus villosus) predator-prey system. Marine Ecology Progress Series 69 (1991) 205–216
2. Gunn S. : Support vector machines for classification and regression. ISIS Technical Report, U. of Southampton (1998)
3. Hong D. and Hwang C. : Interval regression analysis using quadratic loss support vector machine. IEEE Transactions on Fuzzy Systems 13(2) April (2005) 229–237
4. Kecman V. : Learning and soft computing, support vector machines, neural networks and fuzzy logic models. The MIT Press, Cambridge, MA (2001)
5. Koenker R. and Bassett G. : Regression quantiles. Econometrica 46 (1978) 33–50
6. Lee H. and Tanaka H. : Upper and lower approximation models in interval regression using regression quantile techniques. European Journal of Operational Research 116 (1999) 653–666
7. Mercer J. : Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A (1909) 415–446
8. Smola A. and Schoelkopf B. : On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica 22 (1998) 211–231
9. Tanaka H., Koyana K. and Lee H. : Interval regression analysis based on quadratic programming. In: Proceedings of The 5th IEEE International Conference on Fuzzy Systems, New Orleans, USA (1996) 325–329
10. Vapnik V. N. : The nature of statistical learning theory. Springer, New York (1995)
11. Vapnik, V. N.: Statistical learning theory. Springer, New York (1998)
12. Wang, L.(Ed.) : Support vector machines: theory and application. Springer, Berlin Heidelberg New York (2005)
An Approach Based on Similarity Measure to Multiple Attribute Decision Making with Trapezoid Fuzzy Linguistic Variables
Zeshui Xu
College of Economics and Management, Southeast University, Nanjing, Jiangsu 210096, China [email protected]
Abstract. In this paper, we investigate the multiple attribute decision making problems under fuzzy linguistic environment. We introduce the concept of trapezoid fuzzy linguistic variable and some operational laws of trapezoid fuzzy linguistic variables. We develop a similarity measure between two trapezoid fuzzy linguistic variables. Based on the similarity measure and the ideal point of attribute values, we develop an approach to ranking the decision alternatives in multiple attribute decision making with trapezoid fuzzy linguistic variables. We finally illustrate the developed approach with a practical example.
1 Introduction
Multiple attribute decision making under a linguistic environment is an interesting research topic that has received more and more attention from researchers during the last several years [1-5]. In the process of multiple attribute decision making, the linguistic decision information needs to be aggregated by means of some proper approach so as to rank the given decision alternatives and then to select the most desirable one. Bordogna et al. [1] developed a model within fuzzy set theory based on linguistic ordered weighted average (OWA) operators for group decision making in a linguistic context. Herrera and Martínez [2] established a linguistic 2-tuple computational model for dealing with linguistic information. Li and Yang [3] developed a linear programming technique for multidimensional analysis of preferences in multiple attribute group decision making under fuzzy environments, in which all the linguistic information and real numbers are transformed into triangular fuzzy numbers. Xu [4,5] proposed some methods which compute with words directly. In this paper, we shall investigate multiple attribute decision making problems under a fuzzy linguistic environment, in which the decision maker can only provide preferences (attribute values) in the form of trapezoid fuzzy linguistic variables. To do so, this paper is structured as follows. In Section 2 we define the concept of trapezoid fuzzy linguistic variable and some operational laws of trapezoid fuzzy linguistic variables, and then develop a similarity measure between two trapezoid fuzzy linguistic variables. In Section 3 we develop an approach to ranking the decision alternatives based on the similarity measure and the ideal point of attribute values. We illustrate the developed approach with a practical example in Section 4, and give concluding remarks in Section 5.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 110 – 117, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Trapezoid Fuzzy Linguistic Variables

In [6], Zadeh introduced the concept of linguistic variable, that is, a variable whose values are words rather than numbers. The concept of linguistic variable has played and is continuing to play a pivotal role in decision making with linguistic information. Computing with words is a methodology in which the objects of computation are words and propositions drawn from a natural language, e.g., small, large, far, heavy, not very likely, etc. It is inspired by the remarkable human capability to perform a wide variety of physical and mental tasks without any measurements and any computations [7].

In the process of decision making with linguistic information, the decision maker generally provides his/her linguistic assessment information by using a linguistic scale. In [8], Xu defined a finite and totally ordered discrete linguistic scale as $S = \{ s_i \mid i = -t, \ldots, t \}$, where $t$ is a non-negative integer, $s_i$ represents a possible value for a linguistic variable, and it is required that $s_i < s_j$ iff $i < j$. For example, a set of nine labels $S$ could be:

$S = \{ s_{-4} = \text{extremely poor},\ s_{-3} = \text{very poor},\ s_{-2} = \text{poor},\ s_{-1} = \text{slightly poor},\ s_0 = \text{fair},\ s_1 = \text{slightly good},\ s_2 = \text{good},\ s_3 = \text{very good},\ s_4 = \text{extremely good} \}$

In the process of information aggregation, some results may not exactly match any linguistic label in $S$. To preserve all the given information, Xu [8] extended the discrete label set $S$ to a continuous label set $\bar{S} = \{ s_\alpha \mid \alpha \in [-q, q] \}$, where $q$ ($q > t$) is a sufficiently large positive integer. If $s_\alpha \in S$, then $s_\alpha$ is termed an original linguistic label; otherwise, $s_\alpha$ is termed a virtual linguistic label. In general, the decision maker uses the original linguistic labels to evaluate alternatives, and the virtual linguistic labels can only appear in operation.

Since the decision maker is characterized by his/her own personal background and experience, in some situations the decision maker may provide fuzzy linguistic information because of time pressure, lack of knowledge, and limited expertise related to the problem domain. In the following we define the concept of trapezoid fuzzy linguistic variable.

Definition 1. Let $\tilde{s} = [s_\alpha, s_\beta, s_\gamma, s_\eta] \in \tilde{S}$, where $s_\alpha, s_\beta, s_\gamma, s_\eta \in \bar{S}$, $s_\beta$ and $s_\gamma$ indicate the interval in which the membership value is 1, with $s_\alpha$ and $s_\eta$ indicating the lower and upper values of $\tilde{s}$, respectively, then $\tilde{s}$ is called a trapezoid fuzzy linguistic variable, which is characterized by the following membership function (see Fig. 1)
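$$
\mu_{\tilde{s}}(\theta) =
\begin{cases}
0, & s_{-q} \le s_\theta \le s_\alpha \\[4pt]
\dfrac{d(s_\theta, s_\alpha)}{d(s_\beta, s_\alpha)}, & s_\alpha \le s_\theta \le s_\beta \\[8pt]
1, & s_\beta \le s_\theta \le s_\gamma \\[4pt]
\dfrac{d(s_\theta, s_\eta)}{d(s_\gamma, s_\eta)}, & s_\gamma \le s_\theta \le s_\eta \\[8pt]
0, & s_\eta \le s_\theta \le s_q
\end{cases}
$$

where $\tilde{S}$ is the set of all trapezoid fuzzy linguistic variables. Especially, if any two of $\alpha, \beta, \gamma, \eta$ are equal, then $\tilde{s}$ is reduced to a triangular fuzzy linguistic variable; if any three of $\alpha, \beta, \gamma, \eta$ are equal, then $\tilde{s}$ is reduced to an uncertain linguistic variable.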
Fig. 1. A trapezoid fuzzy linguistic variable
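The membership function of Definition 1 can be sketched in a few lines of Python. This is only an illustration: labels are represented by their numeric indices, and the distance $d(s_a, s_b)$, which the text does not define, is assumed here to be $|a - b|$. The function name `membership` is hypothetical.

```python
def membership(theta, alpha, beta, gamma, eta, q):
    """Membership degree of label index theta in [s_alpha, s_beta, s_gamma, s_eta].

    Assumption (not stated in the paper): d(s_a, s_b) = |a - b|.
    """
    if not (-q <= alpha <= beta <= gamma <= eta <= q):
        raise ValueError("indices must satisfy -q <= alpha <= beta <= gamma <= eta <= q")
    if beta <= theta <= gamma:          # plateau: membership value 1
        return 1.0
    if alpha < theta < beta:            # rising edge
        return (theta - alpha) / (beta - alpha)
    if gamma < theta < eta:             # falling edge
        return (eta - theta) / (eta - gamma)
    return 0.0                          # outside the support


# Example with the hypothetical variable [s_-1, s_0, s_1, s_2] and q = 4
print(membership(-0.5, -1, 0, 1, 2, 4))  # halfway up the rising edge
print(membership(0.5, -1, 0, 1, 2, 4))   # on the plateau
```

Checking the plateau first means the degenerate cases (for example $\alpha = \beta$) never divide by zero.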
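Consider any three trapezoid fuzzy linguistic variables $\tilde{s} = [s_\alpha, s_\beta, s_\gamma, s_\eta]$, $\tilde{s}_1 = [s_{\alpha_1}, s_{\beta_1}, s_{\gamma_1}, s_{\eta_1}]$, $\tilde{s}_2 = [s_{\alpha_2}, s_{\beta_2}, s_{\gamma_2}, s_{\eta_2}] \in \tilde{S}$, and suppose that $\lambda \in [0,1]$, then we define their operational laws as follows:

1) $\tilde{s}_1 \oplus \tilde{s}_2 = [s_{\alpha_1}, s_{\beta_1}, s_{\gamma_1}, s_{\eta_1}] \oplus [s_{\alpha_2}, s_{\beta_2}, s_{\gamma_2}, s_{\eta_2}] = [s_{\alpha_1+\alpha_2}, s_{\beta_1+\beta_2}, s_{\gamma_1+\gamma_2}, s_{\eta_1+\eta_2}]$;
2) $\lambda \tilde{s} = \lambda [s_\alpha, s_\beta, s_\gamma, s_\eta] = [s_{\lambda\alpha}, s_{\lambda\beta}, s_{\lambda\gamma}, s_{\lambda\eta}]$.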
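A minimal sketch of the two operational laws, assuming a trapezoid fuzzy linguistic variable is stored as a 4-tuple of label indices and that the sum acts component-wise on the indices of the two operands. The names `add` and `scale` are illustrative, not from the paper.

```python
from typing import Tuple

Trapezoid = Tuple[float, float, float, float]  # (alpha, beta, gamma, eta) label indices


def add(s1: Trapezoid, s2: Trapezoid) -> Trapezoid:
    """Law 1: add the label indices component-wise."""
    return tuple(a + b for a, b in zip(s1, s2))


def scale(lam: float, s: Trapezoid) -> Trapezoid:
    """Law 2: multiply every label index by lam, with lam in [0, 1]."""
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1]")
    return tuple(lam * a for a in s)


print(add((-3, -2, 0, 2), (-1, 0, 1, 2)))   # (-4, -2, 1, 4)
print(scale(0.5, (-2, 0, 2, 4)))            # (-1.0, 0.0, 1.0, 2.0)
```

The results may be virtual labels (non-integer indices), which is why the continuous label set is needed.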
In order to measure the similarity degree between any two trapezoid fuzzy linguistic values $\tilde{s}_1 = [s_{\alpha_1}, s_{\beta_1}, s_{\gamma_1}, s_{\eta_1}]$ and $\tilde{s}_2 = [s_{\alpha_2}, s_{\beta_2}, s_{\gamma_2}, s_{\eta_2}] \in \tilde{S}$, we introduce a similarity measure as follows:

$$
s(\tilde{s}_1, \tilde{s}_2) = 1 - \frac{|\alpha_2 - \alpha_1| + |\beta_2 - \beta_1| + |\gamma_2 - \gamma_1| + |\eta_2 - \eta_1|}{8q} \qquad (1)
$$

where $s(\tilde{s}_1, \tilde{s}_2)$ is called the similarity degree between $\tilde{s}_1$ and $\tilde{s}_2$. Obviously, the greater the value of $s(\tilde{s}_1, \tilde{s}_2)$, the closer $\tilde{s}_1$ is to $\tilde{s}_2$. The properties of the similarity degree $s(\tilde{s}_1, \tilde{s}_2)$ are as follows:

1) $0 \le s(\tilde{s}_1, \tilde{s}_2) \le 1$;
2) $s(\tilde{s}_1, \tilde{s}_2) = 1$ iff $\tilde{s}_1 = \tilde{s}_2$, that is, $\alpha_1 = \alpha_2$, $\beta_1 = \beta_2$, $\gamma_1 = \gamma_2$, $\eta_1 = \eta_2$;
3) $s(\tilde{s}_1, \tilde{s}_2) = s(\tilde{s}_2, \tilde{s}_1)$.
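The similarity measure (1) and its three properties can be checked numerically with a short sketch. The representation of a variable as a 4-tuple of label indices and the function name `similarity` are assumptions made for illustration.

```python
def similarity(s1, s2, q):
    """Similarity degree of Eq. (1) for two 4-tuples of label indices in [-q, q]."""
    return 1.0 - sum(abs(b - a) for a, b in zip(s1, s2)) / (8.0 * q)


q = 4
u = (-3, -2, 0, 2)
v = (-1, 0, 1, 2)

print(similarity(u, v, q))                         # some value in [0, 1]
print(similarity(u, u, q))                         # property 2: identical inputs give 1.0
print(similarity(u, v, q) == similarity(v, u, q))  # property 3: symmetry
```

The bound in property 1 follows because each of the four index differences is at most $2q$, so the numerator never exceeds $8q$.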
Below we propose an operator for aggregating trapezoid fuzzy linguistic variables.

Definition 2. Let $\mathrm{TFLWA}: \tilde{S}^n \to \tilde{S}$, if

$$
\mathrm{TFLWA}_w(\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n) = w_1 \tilde{s}_1 \oplus w_2 \tilde{s}_2 \oplus \cdots \oplus w_n \tilde{s}_n \qquad (2)
$$

where $w = (w_1, w_2, \ldots, w_n)$ is the weighting vector of the $\tilde{s}_i$, $\tilde{s}_i \in \tilde{S}$, $w_i \ge 0$, $i = 1, 2, \ldots, n$, $\sum_{i=1}^{n} w_i = 1$, then TFLWA is called a trapezoid fuzzy linguistic weighted averaging (TFLWA) operator. Especially, if $w = (1/n, 1/n, \ldots, 1/n)$, then the TFLWA operator is reduced to a trapezoid fuzzy linguistic averaging (TFLA) operator.

Example 1. Assume
$\tilde{s}_1 = [s_{-3}, s_{-2}, s_0, s_2]$, $\tilde{s}_2 = [s_{-1}, s_0, s_1, s_2]$, $\tilde{s}_3 = [s_0, s_1, s_2, s_4]$, and $\tilde{s}_4 = [s_{-1}, s_1, s_2, s_3]$, $w = (0.3, 0.1, 0.2, 0.4)$, then by the operational laws of trapezoid fuzzy linguistic variables, we have

$$
\begin{aligned}
\mathrm{TFLWA}_w(\tilde{s}_1, \tilde{s}_2, \tilde{s}_3, \tilde{s}_4) &= 0.3 \times [s_{-3}, s_{-2}, s_0, s_2] \oplus 0.1 \times [s_{-1}, s_0, s_1, s_2] \oplus 0.2 \times [s_0, s_1, s_2, s_4] \oplus 0.4 \times [s_{-1}, s_1, s_2, s_3] \\
&= [s_{-1.4}, s_0, s_{1.3}, s_{2.8}]
\end{aligned}
$$

3 A Similarity Measure Based Approach

A multiple attribute decision making problem under fuzzy linguistic environment is represented as follows: Let
$X = \{x_1, x_2, \ldots, x_n\}$ be the set of alternatives, and $U = \{u_1, u_2, \ldots, u_m\}$ be the set of attributes. Let $w = (w_1, w_2, \ldots, w_m)$ be the weight vector of attributes, where $w_i \ge 0$, $i = 1, 2, \ldots, m$, $\sum_{i=1}^{m} w_i = 1$. Suppose that $\tilde{A} = (\tilde{a}_{ij})_{m \times n}$ is the fuzzy linguistic decision matrix, where $\tilde{a}_{ij} = [a_{ij}^{(\alpha)}, a_{ij}^{(\beta)}, a_{ij}^{(\gamma)}, a_{ij}^{(\eta)}] \in \tilde{S}$ is the attribute value, which takes the form of trapezoid fuzzy linguistic variable, given by the decision maker, for the alternative $x_j \in X$ with respect to the attribute $u_i \in U$. Let $\tilde{a}_j = (\tilde{a}_{1j}, \tilde{a}_{2j}, \ldots, \tilde{a}_{mj})$ be the vector of the attribute values corresponding to the alternative $x_j$, $j = 1, 2, \ldots, n$.

Definition 3. Let $\tilde{A} = (\tilde{a}_{ij})_{m \times n}$ be the decision matrix with trapezoid fuzzy linguistic variables, then we call $\tilde{I} = (\tilde{I}_1, \tilde{I}_2, \ldots, \tilde{I}_m)$ the ideal point of attribute values, where $\tilde{I}_i = [I_i^{(\alpha)}, I_i^{(\beta)}, I_i^{(\gamma)}, I_i^{(\eta)}]$, $I_i^{(\alpha)} = \max_j \{a_{ij}^{(\alpha)}\}$, $I_i^{(\beta)} = \max_j \{a_{ij}^{(\beta)}\}$, $I_i^{(\gamma)} = \max_j \{a_{ij}^{(\gamma)}\}$, $I_i^{(\eta)} = \max_j \{a_{ij}^{(\eta)}\}$, $i = 1, 2, \ldots, m$.
In the following we develop an approach to ranking the decision alternatives based on the similarity measure and the ideal point of attribute values:

Step 1. Utilize the TFLWA operator

$$
\tilde{z}_j = \mathrm{TFLWA}_w(\tilde{a}_{1j}, \tilde{a}_{2j}, \ldots, \tilde{a}_{mj}) = w_1 \tilde{a}_{1j} \oplus w_2 \tilde{a}_{2j} \oplus \cdots \oplus w_m \tilde{a}_{mj}, \quad j = 1, 2, \ldots, n \qquad (3)
$$

to derive the overall values $\tilde{z}_j$ ($j = 1, 2, \ldots, n$) of the alternatives $x_j$ ($j = 1, 2, \ldots, n$), where $w = (w_1, w_2, \ldots, w_m)$ is the weight vector of attributes.

Step 2. Utilize the TFLWA operator

$$
\tilde{z} = \mathrm{TFLWA}_w(\tilde{I}_1, \tilde{I}_2, \ldots, \tilde{I}_m) = w_1 \tilde{I}_1 \oplus w_2 \tilde{I}_2 \oplus \cdots \oplus w_m \tilde{I}_m \qquad (4)
$$

to derive the overall value $\tilde{z}$ of the ideal point $\tilde{I} = (\tilde{I}_1, \tilde{I}_2, \ldots, \tilde{I}_m)$, where $w = (w_1, w_2, \ldots, w_m)$ is the weight vector of attributes.

Step 3. By (1), we get the similarity degree $s(\tilde{z}, \tilde{z}_j)$ between $\tilde{z}$ and $\tilde{z}_j$ ($j = 1, 2, \ldots, n$).

Step 4. Rank all the alternatives $x_j$ ($j = 1, 2, \ldots, n$) and select the best one in accordance with $s(\tilde{z}, \tilde{z}_j)$ ($j = 1, 2, \ldots, n$).

Step 5. End.
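Steps 1 to 4 can be sketched as a small Python routine. The sketch below is illustrative only: the function names are hypothetical, variables are stored as 4-tuples of label indices, and the input is the decision matrix and weight vector of the illustrative example in Section 4 (Table 1), with $q = 4$ as assumed there.

```python
def tflwa(weights, values):
    """Eq. (2): weighted sum of trapezoid variables, component by component."""
    return tuple(sum(w * v[k] for w, v in zip(weights, values)) for k in range(4))


def ideal_point(matrix):
    """Definition 3: component-wise maximum over the alternatives, per attribute."""
    return [tuple(max(a[k] for a in row) for k in range(4)) for row in matrix]


def similarity(s1, s2, q):
    """Eq. (1)."""
    return 1.0 - sum(abs(b - a) for a, b in zip(s1, s2)) / (8.0 * q)


def rank(matrix, weights, q):
    """Steps 1-4: returns the similarity degrees and the alternatives sorted best first."""
    n = len(matrix[0])
    z_ideal = tflwa(weights, ideal_point(matrix))            # Step 2
    scores = []
    for j in range(n):
        z_j = tflwa(weights, [row[j] for row in matrix])     # Step 1
        scores.append(similarity(z_ideal, z_j, q))           # Step 3
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)  # Step 4
    return scores, order


# Rows are attributes G1..G4, columns are alternatives x1..x4 (Table 1).
A = [
    [(-3, -2, 0, 1), (-2, 0, 1, 2), (-1, 1, 3, 4), (0, 1, 2, 4)],
    [(-1, 0, 3, 4), (0, 1, 2, 3), (-4, -3, -1, 1), (-1, 2, 3, 4)],
    [(0, 1, 2, 4), (-1, 0, 3, 4), (1, 2, 3, 4), (-2, 0, 1, 2)],
    [(-2, -1, 0, 2), (-1, 0, 2, 3), (-2, -1, 0, 1), (1, 2, 3, 4)],
]
w = [0.3, 0.2, 0.1, 0.4]

scores, order = rank(A, w, q=4)
print([round(s, 2) for s in scores])        # [0.72, 0.83, 0.74, 0.96]
print(["x%d" % (j + 1) for j in order])     # ['x4', 'x2', 'x3', 'x1']
```

The rounded similarity degrees and the ranking agree with the values reported in Section 4.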
4 Illustrative Example

In this section, a decision making problem of assessing cars for buying (adapted from [2]) is used to illustrate the developed approach. Let us consider a customer who intends to buy a car. Four types of cars $x_j$ ($j = 1, 2, 3, 4$) are available. The customer takes into account four attributes to decide which car to buy: 1) $G_1$: economy, 2) $G_2$: comfort, 3) $G_3$: design, and 4) $G_4$: safety. The decision maker evaluates these four types of cars $x_j$ ($j = 1, 2, 3, 4$) under the attributes $G_i$ ($i = 1, 2, 3, 4$) (whose weight vector is $w = (0.3, 0.2, 0.1, 0.4)$) by using the linguistic scale

$S = \{ s_{-4} = \text{extremely poor},\ s_{-3} = \text{very poor},\ s_{-2} = \text{poor},\ s_{-1} = \text{slightly poor},\ s_0 = \text{fair},\ s_1 = \text{slightly good},\ s_2 = \text{good},\ s_3 = \text{very good},\ s_4 = \text{extremely good} \}$

and gives a fuzzy linguistic decision matrix as listed in Table 1.

Table 1. Fuzzy linguistic decision matrix $\tilde{A}$

| $G_i$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|---|---|---|---|---|
| $G_1$ | $[s_{-3}, s_{-2}, s_0, s_1]$ | $[s_{-2}, s_0, s_1, s_2]$ | $[s_{-1}, s_1, s_3, s_4]$ | $[s_0, s_1, s_2, s_4]$ |
| $G_2$ | $[s_{-1}, s_0, s_3, s_4]$ | $[s_0, s_1, s_2, s_3]$ | $[s_{-4}, s_{-3}, s_{-1}, s_1]$ | $[s_{-1}, s_2, s_3, s_4]$ |
| $G_3$ | $[s_0, s_1, s_2, s_4]$ | $[s_{-1}, s_0, s_3, s_4]$ | $[s_1, s_2, s_3, s_4]$ | $[s_{-2}, s_0, s_1, s_2]$ |
| $G_4$ | $[s_{-2}, s_{-1}, s_0, s_2]$ | $[s_{-1}, s_0, s_2, s_3]$ | $[s_{-2}, s_{-1}, s_0, s_1]$ | $[s_1, s_2, s_3, s_4]$ |
In the following, we utilize the approach developed in this paper to get the most desirable car:

Step 1. From Table 1, we get the vectors of the attribute values corresponding to the alternatives $x_j$ ($j = 1, 2, 3, 4$), and the ideal point as follows:

1) $\tilde{a}_1 = (\tilde{a}_{11}, \tilde{a}_{21}, \tilde{a}_{31}, \tilde{a}_{41})$, where $\tilde{a}_{11} = [s_{-3}, s_{-2}, s_0, s_1]$, $\tilde{a}_{21} = [s_{-1}, s_0, s_3, s_4]$, $\tilde{a}_{31} = [s_0, s_1, s_2, s_4]$, $\tilde{a}_{41} = [s_{-2}, s_{-1}, s_0, s_2]$.
2) $\tilde{a}_2 = (\tilde{a}_{12}, \tilde{a}_{22}, \tilde{a}_{32}, \tilde{a}_{42})$, where $\tilde{a}_{12} = [s_{-2}, s_0, s_1, s_2]$, $\tilde{a}_{22} = [s_0, s_1, s_2, s_3]$, $\tilde{a}_{32} = [s_{-1}, s_0, s_3, s_4]$, $\tilde{a}_{42} = [s_{-1}, s_0, s_2, s_3]$.
3) $\tilde{a}_3 = (\tilde{a}_{13}, \tilde{a}_{23}, \tilde{a}_{33}, \tilde{a}_{43})$, where $\tilde{a}_{13} = [s_{-1}, s_1, s_3, s_4]$, $\tilde{a}_{23} = [s_{-4}, s_{-3}, s_{-1}, s_1]$, $\tilde{a}_{33} = [s_1, s_2, s_3, s_4]$, $\tilde{a}_{43} = [s_{-2}, s_{-1}, s_0, s_1]$.
4) $\tilde{a}_4 = (\tilde{a}_{14}, \tilde{a}_{24}, \tilde{a}_{34}, \tilde{a}_{44})$, where $\tilde{a}_{14} = [s_0, s_1, s_2, s_4]$, $\tilde{a}_{24} = [s_{-1}, s_2, s_3, s_4]$, $\tilde{a}_{34} = [s_{-2}, s_0, s_1, s_2]$, $\tilde{a}_{44} = [s_1, s_2, s_3, s_4]$.
5) $\tilde{I} = (\tilde{I}_1, \tilde{I}_2, \tilde{I}_3, \tilde{I}_4)$, where $\tilde{I}_1 = [s_0, s_1, s_3, s_4]$, $\tilde{I}_2 = [s_0, s_2, s_3, s_4]$, $\tilde{I}_3 = [s_1, s_2, s_3, s_4]$, $\tilde{I}_4 = [s_1, s_2, s_3, s_4]$.

Step 2. Utilize (3) to derive the overall values $\tilde{z}_j$ ($j = 1, 2, 3, 4$) of the alternatives $x_j$ ($j = 1, 2, 3, 4$):

$\tilde{z}_1 = [s_{-1.9}, s_{-0.9}, s_{0.8}, s_{2.3}]$, $\tilde{z}_2 = [s_{-1.1}, s_{0.2}, s_{1.8}, s_{2.8}]$, $\tilde{z}_3 = [s_{-1.8}, s_{-0.5}, s_1, s_{2.2}]$, $\tilde{z}_4 = [s_0, s_{1.5}, s_{2.5}, s_{3.8}]$

Step 3. Utilize (4) to derive the overall value $\tilde{z}$ of the ideal point $\tilde{I}$:

$\tilde{z} = [s_{0.5}, s_{1.7}, s_3, s_4]$

Step 4. By (1) (suppose that $q = 4$), we get the similarity degree $s(\tilde{z}, \tilde{z}_j)$ between $\tilde{z}$ and $\tilde{z}_j$ ($j = 1, 2, 3, 4$):

$s(\tilde{z}, \tilde{z}_1) = 0.72$, $s(\tilde{z}, \tilde{z}_2) = 0.83$, $s(\tilde{z}, \tilde{z}_3) = 0.74$, $s(\tilde{z}, \tilde{z}_4) = 0.96$

Step 5. Rank all the alternatives $x_j$ ($j = 1, 2, 3, 4$) in accordance with $s(\tilde{z}, \tilde{z}_j)$ ($j = 1, 2, 3, 4$):

$x_4 \succ x_2 \succ x_3 \succ x_1$

and thus the best car is $x_4$.
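As a worked check of Step 4, the similarity degree of the best alternative can be written out from (1) with $q = 4$, using the overall values $\tilde{z} = [s_{0.5}, s_{1.7}, s_3, s_4]$ and $\tilde{z}_4 = [s_0, s_{1.5}, s_{2.5}, s_{3.8}]$ given above:

```latex
s(\tilde{z}, \tilde{z}_4)
  = 1 - \frac{|0 - 0.5| + |1.5 - 1.7| + |2.5 - 3| + |3.8 - 4|}{8 \times 4}
  = 1 - \frac{1.4}{32}
  \approx 0.96
```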
5 Concluding Remarks

We have defined the concept of trapezoid fuzzy linguistic variable, and developed a similarity measure between two trapezoid fuzzy linguistic variables. Based on the similarity measure and the ideal point of attribute values, we have developed an approach to multiple attribute decision making with trapezoid fuzzy linguistic variables, which utilizes an aggregation operator called the trapezoid fuzzy linguistic weighted averaging (TFLWA) operator to fuse all the given decision information corresponding to each alternative, and uses the similarity measure to rank the decision alternatives and then to select the most desirable one.

Acknowledgement

This work was supported by China Postdoctoral Science Foundation under Project (2003034366).
References

1. Bordogna, G., Fedrizzi, M., Pasi, G.: A Linguistic Modeling of Consensus in Group Decision Making Based on OWA Operators. IEEE Transactions on Systems, Man, and Cybernetics-Part A 27 (1997) 126-132.
2. Herrera, F., Martínez, L.: An Approach for Combining Numerical and Linguistic Information Based on the 2-Tuple Fuzzy Linguistic Representation Model in Decision Making. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 8 (2000) 539-562.
3. Li, D.F., Yang, J.B.: Fuzzy Linear Programming Technique for Multi-attribute Group Decision Making in Fuzzy Environments. Information Sciences 158 (2004) 263-275.
4. Xu, Z.S.: Uncertain Linguistic Aggregation Operators Based Approach to Multiple Attribute Group Decision Making under Uncertain Linguistic Environment. Information Sciences 168 (2004) 171-184.
5. Xu, Z.S.: Uncertain Multiple Attribute Decision Making: Methods and Applications. Tsinghua University Press, Beijing (2004).
6. Zadeh, L.A.: Outline of a New Approach to the Analysis of Complex Systems and Decision Processes. IEEE Transactions on Systems, Man, and Cybernetics 3 (1973) 28-44.
7. Zadeh, L.A.: From Computing with Numbers to Computing with Words-from Manipulation of Measurements to Manipulation of Perceptions. International Journal of Applied Mathematics and Computer Science 12 (2002) 307-324.
8. Xu, Z.S.: Deviation Measures of Linguistic Preference Relations in Group Decision Making. Omega 33 (2005) 249-254.
Research on Index System and Fuzzy Comprehensive Evaluation Method for Passenger Satisfaction*

Yuanfeng Zhou¹, Jianping Wu², and Yuanhua Jia³

¹ School of Traffic and Transportation, Beijing Jiaotong University, 100044, Beijing, P. R. China
[email protected]
² School of Civil Engineering and the Environment, University of Southampton, SO17 1BJ, Southampton, UK
³ School of Traffic and Transportation, Beijing Jiaotong University, 100044, Beijing, P. R. China
Abstract. Passenger satisfaction index (PSI) is one of the most important indexes in comprehensive evaluation of management performance and service quality of passenger transport corporations. Based on the investigations in China, the authors introduced the notion and method for passenger group division and the concept of index weight matrix, and made successful application for passenger satisfaction evaluation. Index weight matrix developed by applying AHP and Delphi methods gives satisfactory results. The paper ends with examples of using a fuzzy inference system for passenger satisfaction evaluation in Beijing railway station.
1 Introduction

Satisfaction is the feeling people form by comparing the perceived quality of services or products with the quality they expected. At present, competition for passengers in transport markets is fierce. It is important to develop a passenger satisfaction index (PSI) system that helps transport enterprises efficiently modify and improve their service quality and management performance. The research team led by Claes Fornell developed the SCSB model in 1989 [8]. In 1994, the ACSI model was developed in the USA, and in 1999 the ECSI model was developed with the sponsorship of EOQ and EFQM. These models paid more attention to the quality of commodities and services. In China, Zhao Ping assessed China railway passenger satisfaction by making use of the least squares method in 2001 [6]. However, this model did not consider the differences caused by the education and economic background of different passenger groups, which are believed to have a significant influence on the accuracy and objectivity of the evaluation results [7].

In this paper, we give an index system for passenger satisfaction evaluation. We first designed a series of questionnaires and conducted surveys in stations and on trains. From the statistical data, the index matrix was built, and passenger satisfaction was then obtained by fuzzy comprehensive evaluation. Finally, a real-time emulator for PS evaluation was designed, and its results reflect the true situation properly.

* This paper is based on the China Railway User Satisfaction Evaluation Project, which was supported by the S&T Development Plan (2002X024) of the Railway Department of China.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 118 – 121, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Index for Passenger Satisfaction Evaluation

Passenger satisfaction involves many elements associated with a passenger's journey, from buying or booking a ticket to the completion of the trip. Based on the pre-research results, a passenger satisfaction evaluation index system was proposed in this study. The authors proposed a two-level index system: the General Survey and the Specific Survey. A general survey has the following characteristics: comprehensiveness, importance, generality and completeness. The indexes of a general survey include security, economy, efficiency, convenience, comfort, service quality, staff service quality, monitoring, infrastructures, etc. The details can be found in the research of Jia and Zhou in 2003 [7]. A specific survey, in contrast, focuses on a specific service or product of passenger transport.
3 Fuzzy Model for Evaluation of Passenger Satisfaction

As the evaluation of the satisfaction degree of passenger transport involves many factors, using fuzzy theory (L.A. Zadeh, 1965) for the evaluation of the satisfaction degree is possibly a better alternative. A fuzzy evaluation system can be represented as $U = \{u_1, u_2, \ldots, u_i\}$, where $u_i$ is one of the indexes in the investigation. According to rundle theory, we divide the passenger's subjective judgment into 5 levels from very unsatisfactory to very satisfactory, i.e., $V = \{v_1, v_2, \ldots, v_5\}$, and the relevant score $P_j$ ($j = 1, \ldots, 5$) was given for a quantitative evaluation. This paper used a method which combines AHP with the Delphi method to decide the fuzzy set of weights of indexes: $A = \{a_1, a_2, \ldots, a_i\}$. The details of the calculation process of the eigenvector $W = [W_1, \ldots, W_i, \ldots, W_n]^T$, $\lambda_{\max}$ and C.R. can be found in the study of Li, 1993 [2].

Passenger satisfaction is the subjective judgment of passengers, which is closely related to many factors such as education background, occupation, age, etc. The following are identified as the major elements to be considered for passenger group classification: social attributes; natural attributes; geographical distribution; nature of the travel (private or company paid travel); travel frequency; economic solvency and psychological endurance of passengers. Different weights were given to different individuals and groups. When a passenger $i$ of group $k$ submits a questionnaire, the real-time evaluation system calculates the passenger's weight vector by the AHP method according to the importance ordering of the indexes given by him/her. The synthesized index weight vector of passenger group $k$ is:
$$
A^k = \frac{1}{n} \sum_{i=1}^{n} S_i \cdot W_i^k \qquad (1)
$$

$$
S_i = \mu^T x_i = (\mu_A, \ldots, \mu_C)\, x_i = \mu_A A_{ij} + \mu_E E_{ij} + \mu_I I_{ij} + \mu_O O_{ij} + \mu_F F_{ij} + \mu_C C_{ij} \qquad (2)
$$

where $\mu$ is the weighting factor vector. For passenger $i$, $S_i$ shows his/her contribution to the index weight distribution of passenger group $k$; $S_i$ is calculated based on his/her individual information. $A_{ij}$: passenger $i$ with age group $j$. Similarly, $E_{ij}$ is for education, $I_{ij}$ is for income, $O_{ij}$ is for occupation, $F_{ij}$ is for frequency of travel, and $C_{ij}$ is for the charge source.
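Equations (1) and (2) can be illustrated with a small sketch. All numbers below are hypothetical, chosen only to show the arithmetic; the paper does not give the values of $\mu$, the attribute scores, or the individual AHP weight vectors, and the function names are not from the paper.

```python
def contribution(mu, x):
    """Eq. (2): S_i = mu^T x_i, a weighted sum of passenger i's attribute scores."""
    return sum(m * v for m, v in zip(mu, x))


def group_weight_vector(S, W):
    """Eq. (1): A^k = (1/n) * sum_i S_i * W_i^k, computed per index."""
    n = len(S)
    m = len(W[0])
    return [sum(S[i] * W[i][t] for i in range(n)) / n for t in range(m)]


# Hypothetical data: two passengers of one group, three indexes.
mu = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]            # weights for age, education, income,
                                               # occupation, travel frequency, charge source
x = [[1.0, 0.8, 1.2, 1.0, 0.9, 1.1],           # attribute scores of passenger 1
     [1.1, 1.0, 0.9, 1.0, 1.2, 0.8]]           # attribute scores of passenger 2
W = [[0.5, 0.3, 0.2],                          # AHP index weights given by passenger 1
     [0.4, 0.4, 0.2]]                          # AHP index weights given by passenger 2

S = [contribution(mu, xi) for xi in x]
print(group_weight_vector(S, W))
```

Note that Eq. (1) as written does not renormalise the result, so the group weights sum to 1 only if the contributions $S_i$ average to 1.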
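Single Passenger Satisfaction Evaluation Model:

$$
f_i = \sum_{i=1}^{m} \sum_{j=1}^{5} \sigma_{ij} P_j W_i \qquad (3)
$$

where $\sigma_{ij} = 1$ if satisfaction degree $j$ of index $i$ was selected, and $\sigma_{ij} = 0$ if satisfaction degree $j$ of index $i$ was not selected,

s.t.

$$
f_i = \sum_{i=1}^{m} \sum_{j=1}^{5} \sigma_{ij} P_j W_i \le 100, \quad i = 1, 2, \ldots, n \qquad (4)
$$

$$
W = (w_1, \ldots, w_m)^T \ge 0 \qquad (5)
$$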
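Because exactly one satisfaction level is selected per index, the double sum in (3) reduces to a weighted sum of the scores of the chosen levels. The sketch below illustrates this with hypothetical level scores $P_j$ and index weights $W_i$; neither is given in the paper, and the function name is illustrative.

```python
def passenger_score(choices, P, W):
    """Eq. (3) with one-hot sigma: choices[i] is the 0-based level picked for index i."""
    if any(w < 0 for w in W):                  # constraint (5)
        raise ValueError("weights must be non-negative")
    return sum(W[i] * P[j] for i, j in enumerate(choices))


P = [20, 40, 60, 80, 100]    # hypothetical scores for the 5 satisfaction levels
W = [0.5, 0.3, 0.2]          # hypothetical index weights, summing to 1
choices = [4, 3, 1]          # levels selected for the three indexes

print(passenger_score(choices, P, W))  # 0.5*100 + 0.3*80 + 0.2*40, i.e. about 82
```

With weights that sum to 1 and level scores of at most 100, constraint (4) holds automatically.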
Fuzzy Comprehensive Evaluation for Passenger Satisfaction: Firstly, we develop the membership function of each index and build the fuzzy inference matrix $R_{nm}$. In the matrix, $r_{ij}$ is the percentage of passengers who give index $i$ a judgment of $j$. In order to consider the different influences of factors, the weight matrix $A$ was applied. This leads to a fuzzy inference matrix $B$ which has the form:

$$
B = A * R \qquad (6)
$$

$B = \{b_1, b_2, \ldots, b_m\}$ is the fuzzy evaluation result. We applied the following rule: assuming $b_k = \max_i b_i$, then $b = \sum_{i=1}^{k-1} b_i$. If $b \le 0.5$, the passenger satisfaction is grade $K$; otherwise, it is grade $K-1$ [3]. The passenger transport enterprises can take $b$ as a comprehensive index of the passenger satisfaction evaluation.
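The composition (6) and the grading rule can be sketched as follows. The paper does not say which fuzzy composition operator "*" denotes; the sketch assumes the weighted-average form $b_j = \sum_i a_i r_{ij}$, and the function names are illustrative.

```python
def compose(A, R):
    """Assumed reading of B = A * R: b_j = sum_i a_i * r_ij."""
    m = len(R[0])
    return [sum(a * row[j] for a, row in zip(A, R)) for j in range(m)]


def grade(B):
    """Grading rule: k is the (1-based) position of the largest b_i,
    b is the sum of the entries before it; grade k if b <= 0.5, else k - 1."""
    k = max(range(len(B)), key=lambda j: B[j]) + 1
    b = sum(B[:k - 1])
    return k if b <= 0.5 else k - 1


# Hypothetical weights and judgment percentages for two indexes and two grades
print(compose([0.5, 0.5], [[0.2, 0.8], [0.4, 0.6]]))

# A result vector whose largest entry is the second one
print(grade([0.30, 0.46, 0.09, 0.11, 0.04]))   # 2
```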
4 Application Example

Taking the passenger transport service of Beijing Railway Station as a reference, the relevant parameters and evaluation results are as follows:

$\lambda_{\max} = 15.387$, C.I. = 0.1067, R.I. = 1.58, C.R. = 0.0675 < 0.10

$$
B = A * R =
\begin{bmatrix}
0.299953 & 0.459185 & 0.089498 & 0.111913 & 0.039451 \\
0.315847 & 0.459253 & 0.083172 & 0.103227 & 0.038501
\end{bmatrix}
$$

where $b_{12} = 0.758$, $b_{22} = 0.774$. As far as the service quality of Beijing Station is concerned, the evaluations of the higher-middle group and of the lower group of passengers are both at a similar degree of "comparatively satisfied", although the lower group passengers give a slightly higher score. This result is more acceptable and proper than the evaluation result obtained by Zhao Ping (2001).
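The consistency figures above follow the usual AHP formulas C.I. $= (\lambda_{\max} - n)/(n - 1)$ and C.R. $=$ C.I./R.I. The order $n$ of the judgment matrix is not stated in the paper; the sketch below assumes $n = 14$, the value that reproduces the reported C.I.

```python
def consistency(lambda_max, n, ri):
    """Standard AHP consistency index and consistency ratio."""
    ci = (lambda_max - n) / (n - 1)
    return ci, ci / ri


# n = 14 is an assumption inferred from the reported C.I., not a value given in the paper.
ci, cr = consistency(15.387, 14, 1.58)
print(round(ci, 4), round(cr, 4))   # 0.1067 0.0675
print(cr < 0.10)                    # True: the judgment matrix passes the consistency test
```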
5 Conclusion and Recommendations

Based on the results above, we have the following conclusions and recommendations:

Index weights reflect the differences between passengers in their evaluations caused by their different backgrounds. It is important to produce a credible index weight matrix. The index weight systems estimated by combining the AHP and Delphi methods have given satisfactory and reasonable results.

The fuzzy inference system has been successfully used in passenger satisfaction evaluation, and the application at Beijing Station has shown its efficiency and credibility.

The real-time evaluation system can help improve the management performance of transport enterprises by quickly identifying the requests of passengers of different groups.
References

1. Jianping Huang, Jin-Ming Li: Application of Consumer Satisfaction and Satisfaction Index in China. Journal of Beijing Institute of Business, Beijing (2000)
2. Guogang Li, Bao-Shan Li: Management Systems Engineering. The Publish House of the People's University of China, Beijing (1993)
3. Yongbo Lv, Yifei Xu: System Engineering. The Publish House of the Northern Jiaotong University, Beijing (2003)
4. Yong-Ling Cen: Fuzzy Comprehensive Evaluation Model for Consumer Satisfaction. Journal of Liaoning Engineering and Technology University (2001)
5. Yuanfeng Zhou, Yuan-hua Jia: Research on Fuzzy Evaluation Method for Railway Passenger Satisfaction. Journal of Northern Jiaotong University, vol. 27(5), Beijing (2003) 64-68.
6. Ping Zhao: Guide to China Customer Satisfaction Index. The Publish House of China Criterion, Beijing (2003)
7. Yuanhua Jia, Yuanfeng Zhou, Sufen Li: Report of China Railway Passenger Satisfaction Testing Index System and Data-collection and Real-time Evaluation System (2003)
8. Claes Fornell: A National Customer Satisfaction Barometer: The Swedish Experience. Journal of Marketing, Vol. 56, January (1992) 6-21
9. L.A. Zadeh: Fuzzy Sets. Inf. and Control 8 (1965) 38-53
Research on Predicting Hydatidiform Mole Canceration Tendency by a Fuzzy Integral Model

Yecai Guo¹, Wei Rao¹, Yi Guo², and Wei Ma²

¹ Anhui University of Science and Technology, Huainan 232001, China
² Shanghai Marine University, Shanghai, China
[email protected]
Abstract. Based on the Fuzzy mathematical principle, a fuzzy integral model on forecasting the cancerational tendency of hydatidiform mole is created. In this paper, attaching function, quantum standard, weight value of each factor, which causes disease, and the threshold value of fuzzy integral value are determined under condition that medical experts take part in. The detailed measures in this paper are taken as follows: First, each medical expert gives the score of the sub-factors of each factor based on their clinic experience and professional knowledge. Second, based on analyzing the feature of the scores given by medical experts, attaching functions are established using K power parabola larger type. Third, weight values are determined using method by the analytic hierarchy process[AHP] method. Finally, the relative information is obtained from the case histories of hydatidiform mole cases. Fuzzy integral value of each case is calculated and its threshold value is finally determined. Accurate rate of the fuzzy integral model(FIM) is greater than that of the maximum likelihood method (MLM) via diagnosing the history cases and for new cases, the diagnosis results of the FIM is in accordance with those of the medical experts.
1 Introduction

Hydatidiform mole is a benign and deutoplasmic tumor, regarded as benign hydatidiform mole. When its tissue spreads to adjacent or distant organs, or invades the body of the womb and grows in size, volume, quantity, or scope, the uterine cavity may be perforated, which results in massive abdominal haemorrhage. When its tissue becomes intraligamentary, a uterine hematoma forms; the tissue may even spread to the vagina, lung, and brain, and the patient may die. In this case, the benign hydatidiform mole has turned into a malignant tumor and is fatal; it is called malignant hydatidiform mole [1]. Since malignant hydatidiform mole is the result of the canceration of benign hydatidiform mole, it is necessary to predict whether or not a benign hydatidiform mole will become malignant. This question is worth discussing.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 122 – 129, 2005. © Springer-Verlag Berlin Heidelberg 2005

In this paper, we discuss this problem based on fuzzy mathematics. The organization of this paper is as follows. In Section 2, we establish the model for predicting benign hydatidiform mole canceration. Section 3 gives an example to illustrate the effectiveness of this model.
2 Hydatidiform Mole Canceration Prediction Model

Assume that the prediction set of hydatidiform mole canceration is given by

$$
T = \{T_1, T_2\} \qquad (1)
$$

where $T_1$ denotes hydatidiform mole non-canceration tendency and $T_2$ represents hydatidiform mole canceration tendency. According to the related literature [1][2] and the conditions of medical apparatus, assume that the factor set $S$ associated with the pathogenies of hydatidiform mole is written as

$$
S = \{S_1, S_2, \ldots, S_{10}\} \qquad (2)
$$

where $S_1, S_2, \ldots, S_{10}$ are the age of the patient, number of pregnancies, delitescence, enlarged rate of womb, womb size (weeks of pregnancy), hydatid mole size, highest titer value of HCG, time taken for the result of the pregnancy test based on crude urine to turn into a negative reaction (TTNR), pathological examination, and method for termination of pregnancy, respectively. The sub-factor set of the $i$th factor of pathogenies is denoted by

$$
S_i = \{S_{i1}, S_{i2}, \ldots, S_{iL}, \ldots, S_{iK}\} \qquad (3)
$$

where $i = 1, 2, \ldots, 10$; $L = 1, 2, \ldots, K$; and $K$ represents the number of sub-factors of the $i$th factor.

2.1 Quantum Standard of Each Factor

Each factor of the pathogenies of hydatidiform mole is quantized according to the standard given in Table 1.

2.2 Conformation of Attaching Function of Each Factor

The attaching function of each factor is given by the following equations:
$$
\begin{aligned}
\mu(S_{1j}) &= \tfrac{3}{5} \log(7 + S_{1j}) \quad (j = 1, 2, \ldots, 7), \\
\mu(S_{2j}) &= \tfrac{1}{3} \ln(6 + S_{2j}) \quad (j = 1, 2, 3), \\
\mu(S_{3j}) &= \tfrac{1}{2} \log(4 + S_{3j}) \quad (j = 1, 2, \ldots, 6), \\
\mu(S_{4j}) &= \tfrac{3}{4} \log(8 + S_{4j}) \quad (j = 1, 2, 3), \\
\mu(S_{5j}) &= \tfrac{4}{5} \log(5 + S_{5j}) \quad (j = 1, 2, 3), \\
\mu(S_{6j}) &= \tfrac{3}{5} \log(15 + S_{6j}) \quad (j = 1, 2, 3), \\
\mu(S_{ij}) &= \tfrac{3}{5} \log(10 + S_{ij}) \quad (i = 7, 8;\ j = 1, 2, \ldots, 7), \\
\mu(S_{9j}) &= \tfrac{5}{7} \log(7 + S_{9j}) \quad (j = 1, 2, 3), \\
\mu(S_{10,j}) &= \tfrac{5}{9} \log(10 + S_{10,j}) \quad (j = 1, 2, \ldots, 7).
\end{aligned}
\qquad (4)
$$
where $S_{ij}$ is the $j$th sub-factor of the $i$th factor $S_i$. Their quantum standards are given, and their attaching function values are calculated by Eq. (4) and shown in Table 1.

Table 1. Sub-factors of each factor and their scores and weights

| Factor | Weight | Sub-factor | Score | Attaching function |
|---|---|---|---|---|
| Age (S1) | … | …; 44 (S17) | 5; 6; 9; 11; 12; 13; 14 | 0.6475; 0.6684; 0.7225; 0.7532*; 0.7672; 0.7806; 0.7933 |
| Number of pregnancies (S2) | … | 1 (S21); 2~3 (S22); >3 (S23) | 1; 3; 7 | 0.6486; 0.7324*; 0.8549 |
| … | 0.1034 | … | … | … |
| Enlarged rate of womb (S4) | … | …) (S41); Normal (=Month) (S42); Slow (… | … | … |
| … | … | … | 1; 4; 5; 7; 11; 12; 13 | 0.6248; 0.6877; 0.7057; 0.7383*; 0.7933; 0.8055; 0.8170 |
| Highest titer value (IU/L) of HCG (S7) | 0.1725 | … | … | … |
| TTNR (S8) | 0.1034 | …; 3 months (S86) | … | … |
| Pathological examination (S9) | 0.1379 | Low-grade hyperplasia (S91); Middle-grade hyperplasia (S92); Severe-grade hyperplasia (S93) | 0; 7; 12 | 0.6036; 0.8186*; 0.9314 |
| Method for termination of pregnancy (S10) | 0.0345 | Nature (S10,1); Induced labor success (S10,2); Induced labor unsuccess (S10,3); Uterine aspiration (S10,4); Uterine apoxesis (S10,5); Uterine apoxesis removal (S10,6); Direct removal (S10,7) | 5; 10; 12; 6; 7; 1; 2 | 0.6534; 0.7228; 0.7458; 0.6689; 0.6836; 0.5786*; 0.5995 |
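The attaching functions of Eq. (4) can be evaluated with a short sketch. The coefficients and offsets below are read from Eq. (4); "log" is assumed to be the base-10 logarithm and "ln" the natural logarithm, a reading that reproduces the Table 1 values to within about $10^{-4}$. The names `PARAMS` and `attaching` are illustrative.

```python
import math

# factor index -> (coefficient, offset, logarithm), as read from Eq. (4)
PARAMS = {
    1: (3 / 5, 7, math.log10),
    2: (1 / 3, 6, math.log),      # the only factor that uses the natural logarithm
    3: (1 / 2, 4, math.log10),
    4: (3 / 4, 8, math.log10),
    5: (4 / 5, 5, math.log10),
    6: (3 / 5, 15, math.log10),
    7: (3 / 5, 10, math.log10),
    8: (3 / 5, 10, math.log10),
    9: (5 / 7, 7, math.log10),
    10: (5 / 9, 10, math.log10),
}


def attaching(factor, score):
    """Attaching (membership) value of a sub-factor score for the given factor."""
    coeff, offset, log = PARAMS[factor]
    return coeff * log(offset + score)


print(round(attaching(1, 5), 4))    # 0.6475, first age sub-factor in Table 1
print(round(attaching(2, 3), 4))    # 0.7324, second pregnancy-count sub-factor
print(round(attaching(10, 1), 4))   # 0.5786, sixth termination-method sub-factor
```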
2.3 Weight of Each Factor

According to the clinical experience of medical experts and the related literature, the weight value of each factor may be determined. Assume that the weight set is expressed by

$$
A = \{a_1, a_2, \ldots, a_{10}\} \qquad (5)
$$

where $a_i$ is the weight value of the $i$th factor $S_i$. The weight values are given in Table 1.
2.4 Computing Fuzzy Measure $a(S_i)$

For the $\lambda$-fuzzy measure $a(S_i)$ [3],[4], we have, for example,

$$
a(S_1) = a_1, \qquad (6a)
$$

$$
a(S_i) = a_i + a(S_{i-1}) + \lambda \cdot a_i \cdot a(S_{i-1}). \qquad (6b)
$$
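The recursion of Eqs. (6a) and (6b) can be sketched as below, assuming the first term of Eq. (6b) is $a_i$, as in the standard Sugeno $\lambda$-measure. The function name and the sample weights are hypothetical.

```python
def lambda_measure(weights, lam=0.0):
    """Cumulative lambda-fuzzy measures a(S_1), ..., a(S_n) for weights given in order."""
    measures = []
    acc = 0.0
    for idx, a in enumerate(weights):
        # (6a) for the first factor, (6b) for every later one
        acc = a if idx == 0 else a + acc + lam * a * acc
        measures.append(acc)
    return measures


weights = [0.2, 0.3, 0.5]                  # hypothetical weights, already ordered
print(lambda_measure(weights))             # lam = 0: plain running sums
print(lambda_measure(weights, lam=1.0))    # lam > 0 adds the interaction term
```

With $\lambda = 0$ the measure is additive, so the last value equals the sum of all weights.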
When $\lambda = 0$, according to Eq. (6), we also have

$$
a(S_i) = a_i + a(S_{i-1}). \qquad (7)
$$

Using Eq. (7), the fuzzy measure may be computed.

2.5 Computing Fuzzy Integral E Value

In limited ranges, the fuzzy integral is denoted by the symbol "E" and given by

(8)

where $\mu(S_i)$ is the attaching function of the factor $S_i$ and is monotone. The weight values $a_i$ are arranged in decreasing order of $\mu(S_i)$.

2.6 Threshold of Fuzzy Integral E

Based on the clinical experience of the medical experts, the threshold of the fuzzy integral E value is determined and shown in Table 2.

Table 2. Threshold of fuzzy integral E
Hydatidiform mole canceration tendency Threshold of fuzzy integral(E)
Canceration pqi } , # is the set numbers, n −1
ND LLD q ∈ {1, " , n} . In the computing process, if #{x } = 1 or #{x } = 1 , then the process is over. The alternative with the largest non-dominance degree or the largest dominance degree is preferred to others.
3 Consensus Measures and Feedback Mechanism

3.1 Consensus Measures

Definition 3.1. Let $V^c = \{v_1^c, \ldots, v_n^c\}$ and $V^i = \{v_1^i, \ldots, v_n^i\}$ be the collective and individual (i.e., expert $e_i$) ordered vector of alternatives, respectively, where $v_j^c$ and $v_j^i$ are the position of alternative $x_j$ in that collective ordered vector and individual
Z.-P. Fan and X. Chen
are the position of alternative x j in that collective ordered vector and individual (i.e., expert ei ) ordered vector, then consensus degree Ci x j and linguistic consensus degree Q 2 [Ci x j ] of each expert
ei for each alternative x j are defined according to
the following expressions: Ci x j = 1 − v cj − v ij (n − 1) , Q 2 [Ci ( x j )] = Q 2 [1 − v cj − v ij
(5a) n −1 ] .
In which, for example, if collective and individual (i.e., expert
(5b)
e3 ) ordered vector
of alternatives are V c = {x3 , x1 , x2 , x4 } and V 3 = {x2 , x3 , x4 , x1} , linguistic quantifier “most ” with the pair (0.3, 0.8), then C3 x1 and Q 2 [C3 x1 ] are obtained as: C3 x1 = 1 − 2 − 4 /( 4 − 1) = 1 − 2 / 3 = 1 / 3 , Q 2 [C3 x1 ] = s1 = EU . Definition 3.2. If Q 2 [Ci x j ] = sT , where sT is the largest element in set S, then con-
sensus measure Ci x j of expert ei for alternative x j reaches the consensus level. Or else consensus measure Ci x j of expert ei for alternative x j does not reach the consensus level. Definition 3.3. Consensus degree Ci x j and linguistic consensus degree Q 2 [C ( x j )]
of all experts on alternative x j are defined according to the following expressions: m
m
i =1
i =1
C ( x j ) = [ ¦ Ci ( x j )] m = [¦ (1 − v cj − v ij (n − 1))] m , m
m
i =1
i =1
Q 2 [C ( x j )] = Q 2 [( ¦ Ci ( x j )) m] = Q 2 [( ¦ (1 − v cj − v ij (n − 1)) m] .
(6a) (6b)
Definition 3.4. If $Q^2[C(x_j)] = s_T$, where $s_T$ is the largest element in set $S$, then the consensus measure $C(x_j)$ of all experts on alternative $x_j$ reaches the consensus level. Otherwise, the consensus measure $C(x_j)$ of all experts on alternative $x_j$ does not reach the consensus level.

Definition 3.5. Consensus degree $C_X$ and linguistic consensus degree $Q^2[C_X]$ of all experts over the alternative set are defined according to the following expressions:

$$C_X = \frac{1}{n}\sum_{j=1}^{n} C(x_j) = \frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m}\left(1 - \frac{|v_j^c - v_j^i|}{n - 1}\right), \qquad (7a)$$

$$Q^2[C_X] = Q^2\left[\frac{1}{n}\sum_{j=1}^{n} C(x_j)\right] = Q^2\left[\frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m}\left(1 - \frac{|v_j^c - v_j^i|}{n - 1}\right)\right]. \qquad (7b)$$

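The three consensus degrees can be computed directly from the ordered vectors. The following is a minimal Python sketch, assuming that the position difference in (5a) is taken in absolute value (as the worked example with $C_3(x_1) = 1/3$ suggests); the function names are illustrative and do not come from the paper.

```python
def positions(ordered):
    """Map each alternative to its 1-based position in an ordered vector."""
    return {alt: k for k, alt in enumerate(ordered, start=1)}


def consensus_individual(collective, individual, alt):
    """C_i(x_j) from (5a): 1 - |v_j^c - v_j^i| / (n - 1)."""
    n = len(collective)
    v_c = positions(collective)[alt]
    v_i = positions(individual)[alt]
    return 1 - abs(v_c - v_i) / (n - 1)


def consensus_alternative(collective, individuals, alt):
    """C(x_j) from (6a): average of C_i(x_j) over all experts."""
    total = sum(consensus_individual(collective, v, alt) for v in individuals)
    return total / len(individuals)


def consensus_overall(collective, individuals):
    """C_X from (7a): average of C(x_j) over all alternatives."""
    total = sum(consensus_alternative(collective, individuals, a) for a in collective)
    return total / len(collective)


# Worked example from the text: collective vector and expert e_3.
V_c = ["x3", "x1", "x2", "x4"]
V_3 = ["x2", "x3", "x4", "x1"]
c31 = consensus_individual(V_c, V_3, "x1")  # 1 - |2 - 4| / 3
```

Mapping the numeric degree to a linguistic label through $Q^2$ is left out here because it depends on the quantifier definition (4b), which is not restated in this part of the text.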
Consensus Measures and Adjusting Inconsistency of Linguistic Preference Relations
Definition 3.6. If $Q^2[C_X] = s_T$, where $s_T$ is the largest element in set $S$, then the consensus measure $C_X$ of all experts over the alternative set reaches the consensus level. Otherwise, the consensus measure $C_X$ of all experts over the alternative set does not reach the consensus level.

Theorem 3.1. If the consensus measure of every expert $e_i$ ($i = 1, \dots, m$) on alternative $x_j$ reaches the consensus level, then the consensus measure of all experts on alternative $x_j$ reaches the consensus level.

Proof. First, using (5b), $Q^2[C_i(x_j)] = s_T$, $\forall i \in \{1, \dots, m\}$. Using (4b), we then have $C_i(x_j) > b$, $\forall i \in \{1, \dots, m\}$. Using (6a), we have

$$C(x_j) = \frac{1}{m}\sum_{i=1}^{m}\left(1 - \frac{|v_j^c - v_j^i|}{n - 1}\right) = \frac{1}{m}\sum_{i=1}^{m} C_i(x_j) > \frac{1}{m}\sum_{i=1}^{m} b = b,$$

i.e., $C(x_j) > b$. Hence, $Q^2[C(x_j)] = s_T$. By Definition 3.4, the consensus measure of all experts on alternative $x_j$ reaches the consensus level.

Theorem 3.2. If the consensus measure of all experts on every alternative $x_j$ ($j = 1, \dots, n$)
reaches the consensus level, then the consensus measure $C_X$ of all experts reaches the consensus level.

Proof. First, using (6b), $Q^2[C(x_j)] = s_T$, $\forall j \in \{1, \dots, n\}$. Using (4b), we then have $C(x_j) > b$, $\forall j \in \{1, \dots, n\}$. Using (7a), we have

$$C_X = \frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m}\left(1 - \frac{|v_j^c - v_j^i|}{n - 1}\right) = \frac{1}{n}\sum_{j=1}^{n} C(x_j) > \frac{1}{n}\sum_{j=1}^{n} b = b,$$

i.e., $C_X > b$. Hence, $Q^2[C_X] = s_T$. By Definition 3.6, the consensus measure of all experts over the alternative set reaches the consensus level.

Remark 3.1. Note that in Definitions 3.2, 3.4 and 3.6 the required consensus levels differ when the linguistic relative quantifiers differ, i.e., when they are based on different parameters (a, b). The required consensus levels are usually decided by a leading decision maker.

3.2 Feedback Mechanism
When the consensus measure $C_X$ has not reached the required consensus level, i.e., C X < C, the experts' opinions should be improved or modified. In this paper, the rule of the feedback mechanism is developed as follows.

(1) If $v_j^c - v_j^i < 0$, then move the position of alternative $x_j$ forward for the $i$th expert, i.e., number m j about x j ; xi (i = 1, …, n) should be increased in the above Pi [9].
(2) If $v_j^c - v_j^i = 0$, do not change the position of alternative $x_j$ for the $i$th expert.
(3) If $v_j^c - v_j^i > 0$, then move the position of alternative $x_j$ backward for the $i$th expert, i.e., numbers m j about x j ; xi (i = 1, …, n) should be decreased in the above Pi [9].

An adjusting approach is expressed in the following steps.

Step 1. The consensus measure is calculated by (5a)-(7a). If $Q^2[C_X] = s_T$, then go to Step 5. Otherwise, go to the next step.
Step 2. Let v cj − v ij = max v cj − v ij , then the linguistic preference relation of the *
*
*
… if $\gamma > 0$ and $[A]^\gamma = \mathrm{cl}\{t \in \mathbb{R} \mid A(t) > 0\}$ (the closure of the support of $A$) if $\gamma = 0$. It is well known that if $A$ is a fuzzy number then $[A]^\gamma$ is a compact subset of $\mathbb{R}$ for all $\gamma \in [0, 1]$. Moreover, a function $f : [0, 1] \to \mathbb{R}$ is said to be a weighting function if $f$ is non-negative, monotone increasing and satisfies the normalization condition $\int_0^1 f(\gamma)\,d\gamma = 1$.

Fullér and Majlender defined the notions of the $f$-weighted lower and upper possibilistic mean values of a fuzzy number $A$ [6]. They defined the $f$-weighted possibilistic mean value of $A$ as

$$\bar{M}_f(A) = \int_0^1 f(\gamma)\,\frac{a_1(\gamma) + a_2(\gamma)}{2}\,d\gamma = \frac{M_f^L(A) + M_f^U(A)}{2},$$

where

$$M_f^L(A) = \int_0^1 f(\gamma)\,a_1(\gamma)\,d\gamma = \frac{\int_0^1 a_1(\gamma)\,f(\mathrm{Pos}[A \le a_1(\gamma)])\,d\gamma}{\int_0^1 f(\mathrm{Pos}[A \le a_1(\gamma)])\,d\gamma},$$

$$M_f^U(A) = \int_0^1 f(\gamma)\,a_2(\gamma)\,d\gamma = \frac{\int_0^1 a_2(\gamma)\,f(\mathrm{Pos}[A \ge a_2(\gamma)])\,d\gamma}{\int_0^1 f(\mathrm{Pos}[A \ge a_2(\gamma)])\,d\gamma},$$

and Pos denotes possibility, i.e.,

$$\mathrm{Pos}[A \le a_1(\gamma)] = \Pi((-\infty, a_1(\gamma)]) = \sup_{u \le a_1(\gamma)} A(u) = \gamma,$$

$$\mathrm{Pos}[A \ge a_2(\gamma)] = \Pi([a_2(\gamma), +\infty)) = \sup_{u \ge a_2(\gamma)} A(u) = \gamma.$$

Then the following lemma can be proved directly from the definition of the $f$-weighted interval-valued possibilistic mean [6].

Lemma 1. Let $A \in \mathcal{F}$, and let $\theta, \lambda$ be two real numbers. Then

$$\bar{M}_f(A + \theta) = \bar{M}_f(A) + \theta, \qquad \bar{M}_f(\lambda A) = \lambda \bar{M}_f(A),$$

where the addition and the multiplication by a scalar of fuzzy numbers are defined by the sup-min extension principle [4].

We will now introduce the notions of the weighted possibilistic variance and covariance of fuzzy numbers.

Definition 1. The weighted possibilistic variance of $A \in \mathcal{F}$ is defined as

$$\mathrm{Var}_f(A) = \int_0^1 f(\gamma)\left([M_f^L(A) - a_1(\gamma)]^2 + [M_f^U(A) - a_2(\gamma)]^2\right)d\gamma. \qquad (1)$$
X. Wang et al.
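Note that equation (1) can also be rewritten as

$$\mathrm{Var}_f(A) = \int_0^1 f(\gamma)\left([M_f^L(A) - a_1(\gamma)]^2 + [M_f^U(A) - a_2(\gamma)]^2\right)d\gamma = \int_0^1 f(\gamma)\left(a_1^2(\gamma) + a_2^2(\gamma)\right)d\gamma - \left[(M_f^L(A))^2 + (M_f^U(A))^2\right].$$
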
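Definition 2. The weighted possibilistic covariance of $A, B \in \mathcal{F}$ is defined as

$$\mathrm{Cov}_f(A, B) = \int_0^1 f(\gamma)\left[(M_f^L(A) - a_1(\gamma))(M_f^L(B) - b_1(\gamma)) + (M_f^U(A) - a_2(\gamma))(M_f^U(B) - b_2(\gamma))\right]d\gamma.$$

The following theorem shows some important properties of the weighted possibilistic variance and covariance.

Theorem 1. Let $A \in \mathcal{F}$, and let $\theta, \lambda$ be two real numbers. Then

$$\mathrm{Var}_f(A + \theta) = \mathrm{Var}_f(A), \qquad \mathrm{Var}_f(\lambda A) = \lambda^2\,\mathrm{Var}_f(A).$$

Proof. Let $[A]^\gamma = [a_1(\gamma), a_2(\gamma)]$, $\gamma \in [0, 1]$. From the relationship $[A + \theta]^\gamma = [a_1(\gamma) + \theta, a_2(\gamma) + \theta]$, we get

$$M_f^L(A + \theta) = \int_0^1 f(\gamma)(a_1(\gamma) + \theta)\,d\gamma = M_f^L(A) + \theta,$$

$$M_f^U(A + \theta) = \int_0^1 f(\gamma)(a_2(\gamma) + \theta)\,d\gamma = M_f^U(A) + \theta.$$

Thus, $\mathrm{Var}_f(A + \theta) = \mathrm{Var}_f(A)$. According to the relationship

$$[\lambda A]^\gamma = \lambda[A]^\gamma = \lambda[a_1(\gamma), a_2(\gamma)] = \begin{cases} [\lambda a_1(\gamma), \lambda a_2(\gamma)] & \text{if } \lambda \ge 0, \\ [\lambda a_2(\gamma), \lambda a_1(\gamma)] & \text{if } \lambda < 0, \end{cases}$$

we get that

$$M_f^L(\lambda A) = \begin{cases} \lambda M_f^L(A) & \text{if } \lambda \ge 0, \\ \lambda M_f^U(A) & \text{if } \lambda < 0, \end{cases} \qquad M_f^U(\lambda A) = \begin{cases} \lambda M_f^U(A) & \text{if } \lambda \ge 0, \\ \lambda M_f^L(A) & \text{if } \lambda < 0. \end{cases}$$

Hence, $\mathrm{Var}_f(\lambda A) = \lambda^2\,\mathrm{Var}_f(A)$. This completes the proof of the theorem.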
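Theorem 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the weighting function $f(\gamma) = 3\gamma^2$ (the one chosen in the paper's later example), a trapezoidal fuzzy number described by its level-set endpoints, and a plain midpoint rule for the integrals. All helper names are hypothetical.

```python
def integrate(g, steps=20000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / steps
    return sum(g((k + 0.5) * h) for k in range(steps)) * h


def f(gamma):
    """Weighting function f(gamma) = 3 * gamma^2 (integrates to 1 on [0, 1])."""
    return 3.0 * gamma ** 2


def var_f(a1, a2):
    """Weighted possibilistic variance (1) for level sets [a1(g), a2(g)]."""
    m_low = integrate(lambda g: f(g) * a1(g))   # M_f^L(A)
    m_up = integrate(lambda g: f(g) * a2(g))    # M_f^U(A)
    return integrate(
        lambda g: f(g) * ((m_low - a1(g)) ** 2 + (m_up - a2(g)) ** 2)
    )


def trapezoid_levels(lo, hi, alpha, beta):
    """Level-set endpoints of a trapezoidal fuzzy number with core [lo, hi]."""
    return (lambda g: lo - alpha * (1 - g)), (lambda g: hi + beta * (1 - g))


def shift(levels, theta):
    """Level sets of A + theta."""
    a1, a2 = levels
    return (lambda g: a1(g) + theta), (lambda g: a2(g) + theta)


def scale(levels, lam):
    """Level sets of lam * A; the endpoints swap when lam is negative."""
    a1, a2 = levels
    if lam >= 0:
        return (lambda g: lam * a1(g)), (lambda g: lam * a2(g))
    return (lambda g: lam * a2(g)), (lambda g: lam * a1(g))


A = trapezoid_levels(11.0, 12.0, 2.0, 1.2)
v = var_f(*A)
v_shifted = var_f(*shift(A, 5.0))   # should match v
v_scaled = var_f(*scale(A, -3.0))   # should match 9 * v
```

The negative scale factor is used on purpose, since that is the case in which the lower and upper endpoints exchange roles in the proof.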
Weighted Possibilistic Variance of Fuzzy Number and Its Application
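Theorem 1 shows that the weighted possibilistic variance of a fuzzy number is invariant to shifting. The following theorem can be proved in a similar way as Theorem 1.

Theorem 2. Let $f$ be a weighting function, let $A, B, C$ be fuzzy numbers and let $x$ and $y$ be real numbers. Then the following properties hold:

$$\mathrm{Cov}_f(A, B) = \mathrm{Cov}_f(B, A), \quad \mathrm{Cov}_f(A, A) = \mathrm{Var}_f(A), \quad \mathrm{Cov}_f(x, A) = 0, \quad \mathrm{Var}_f(x) = 0.$$

The following theorem shows that the possibilistic variance of a linear combination of fuzzy numbers can easily be computed.

Theorem 3. Let $A$ and $B$ be two fuzzy numbers and let $\lambda, \mu \in \mathbb{R}$. Then

$$\mathrm{Var}_f(\lambda A + \mu B) = \lambda^2\,\mathrm{Var}_f(A) + \mu^2\,\mathrm{Var}_f(B) + 2|\lambda\mu|\,\mathrm{Cov}_f(\varphi(\lambda)A, \varphi(\mu)B),$$

where $\varphi(x)$ is the sign function of $x \in \mathbb{R}$, i.e., $\varphi(x) = 1$ if $x > 0$, $\varphi(x) = 0$ if $x = 0$, and $\varphi(x) = -1$ if $x < 0$.

Proof. Let $[A]^\gamma = [a_1(\gamma), a_2(\gamma)]$ and $[B]^\gamma = [b_1(\gamma), b_2(\gamma)]$, $\gamma \in [0, 1]$. Suppose $\lambda < 0$ and $\mu < 0$. Then $[\lambda A + \mu B]^\gamma = [\lambda a_2(\gamma) + \mu b_2(\gamma), \lambda a_1(\gamma) + \mu b_1(\gamma)]$. We obtain

$$\begin{aligned}
\mathrm{Var}_f(\lambda A + \mu B) &= \int_0^1 f(\gamma)\left(M_f^L(\lambda A + \mu B) - \lambda a_2(\gamma) - \mu b_2(\gamma)\right)^2 d\gamma + \int_0^1 f(\gamma)\left(M_f^U(\lambda A + \mu B) - \lambda a_1(\gamma) - \mu b_1(\gamma)\right)^2 d\gamma \\
&= \int_0^1 f(\gamma)\left(\lambda M_f^U(A) + \mu M_f^U(B) - \lambda a_2(\gamma) - \mu b_2(\gamma)\right)^2 d\gamma + \int_0^1 f(\gamma)\left(\lambda M_f^L(A) + \mu M_f^L(B) - \lambda a_1(\gamma) - \mu b_1(\gamma)\right)^2 d\gamma \\
&= \lambda^2\,\mathrm{Var}_f(A) + \mu^2\,\mathrm{Var}_f(B) + 2\lambda\mu\,\mathrm{Cov}_f(A, B) \\
&= \lambda^2\,\mathrm{Var}_f(A) + \mu^2\,\mathrm{Var}_f(B) + 2\lambda\mu\,\mathrm{Cov}_f(-A, -B),
\end{aligned}$$

that is,

$$\mathrm{Var}_f(\lambda A + \mu B) = \lambda^2\,\mathrm{Var}_f(A) + \mu^2\,\mathrm{Var}_f(B) + 2|\lambda\mu|\,\mathrm{Cov}_f(\varphi(\lambda)A, \varphi(\mu)B).$$

Similar reasoning holds for the case $\lambda \ge 0$ and $\mu \ge 0$.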
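Suppose now that $\lambda > 0$ and $\mu < 0$. Then $[\lambda A + \mu B]^\gamma = [\lambda a_1(\gamma) + \mu b_2(\gamma), \lambda a_2(\gamma) + \mu b_1(\gamma)]$. We get

$$\begin{aligned}
\mathrm{Var}_f(\lambda A + \mu B) &= \int_0^1 f(\gamma)\left(M_f^L(\lambda A + \mu B) - \lambda a_1(\gamma) - \mu b_2(\gamma)\right)^2 d\gamma + \int_0^1 f(\gamma)\left(M_f^U(\lambda A + \mu B) - \lambda a_2(\gamma) - \mu b_1(\gamma)\right)^2 d\gamma \\
&= \int_0^1 f(\gamma)\left(\lambda M_f^L(A) + \mu M_f^U(B) - \lambda a_1(\gamma) - \mu b_2(\gamma)\right)^2 d\gamma + \int_0^1 f(\gamma)\left(\lambda M_f^U(A) + \mu M_f^L(B) - \lambda a_2(\gamma) - \mu b_1(\gamma)\right)^2 d\gamma \\
&= \lambda^2\,\mathrm{Var}_f(A) + \mu^2\,\mathrm{Var}_f(B) + 2|\lambda\mu|\,\mathrm{Cov}_f(\varphi(\lambda)A, \varphi(\mu)B).
\end{aligned}$$

Similar reasoning holds for the case $\lambda < 0$ and $\mu \ge 0$, which ends the proof.

Let $A_i$ ($i = 1, \dots, n$) be fuzzy numbers. Set $b_{ij} = \mathrm{Cov}_f(A_i, A_j)$, $i, j = 1, \dots, n$. The matrix $\mathrm{Cov}_f = (b_{ij})_{n \times n}$ is called the possibilistic covariance matrix of $(A_1, A_2, \dots, A_n)$. The following theorem shows that $\mathrm{Cov}_f = (b_{ij})_{n \times n}$ has the same properties as the covariance matrix in probability theory.

Theorem 4. Let $A_i$ ($i = 1, \dots, n$) be fuzzy numbers. Then $\mathrm{Cov}_f = (b_{ij})_{n \times n}$ is a nonnegative definite matrix.

Proof. Let $[A_i]^\gamma = [a_{i1}(\gamma), a_{i2}(\gamma)]$ ($i = 1, \dots, n$). Then $b_{ii} = \mathrm{Var}_f(A_i)$ and $b_{ij} = b_{ji}$, $i \ne j = 1, 2, \dots, n$. For any $t_i \in \mathbb{R}$ ($i = 1, \dots, n$),

$$\begin{aligned}
\sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij} t_i t_j &= \sum_{i=1}^{n}\sum_{j=1}^{n}\int_0^1 f(\gamma)\left([M_f^L(A_i) - a_{i1}(\gamma)][M_f^L(A_j) - a_{j1}(\gamma)] + [M_f^U(A_i) - a_{i2}(\gamma)][M_f^U(A_j) - a_{j2}(\gamma)]\right)t_i t_j\,d\gamma \\
&= \int_0^1 f(\gamma)\left[\sum_{i=1}^{n} t_i\left(M_f^L(A_i) - a_{i1}(\gamma)\right)\right]^2 d\gamma + \int_0^1 f(\gamma)\left[\sum_{i=1}^{n} t_i\left(M_f^U(A_i) - a_{i2}(\gamma)\right)\right]^2 d\gamma \\
&\ge 0.
\end{aligned}$$

This concludes the proof of the theorem.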
3 Weighted Possibilistic Portfolio Analysis
Assume that there are $n$ risky assets available in markets. Let $r = (\bar{M}_f(r_1), \dots, \bar{M}_f(r_n))^T$ be the vector of mean return rates, where

$$\bar{M}_f(r_i) = \int_0^1 f(\gamma)\,\frac{a_{1i}(\gamma) + a_{2i}(\gamma)}{2}\,d\gamma$$

is the weighted possibilistic mean of the fuzzy number $r_i$, i.e., the expected return rate of asset $i$. Let $x = (x_1, \dots, x_n)^T$ be the investment weights of a portfolio over the $n$ assets. Then the weighted possibilistic mean value of the portfolio is given by $r^T x$. $\mathrm{Cov}_f = (b_{ij})_{n \times n}$ is the $n \times n$ variance-covariance matrix of the return fuzzy numbers of the $n$ assets, where $b_{ij} = \mathrm{Cov}_f(r_i, r_j)$ is the weighted possibilistic covariance between the return fuzzy numbers of the $i$-th and $j$-th assets, and $b_{ii} = \mathrm{Var}_f(r_i)$ is the weighted possibilistic variance of the return fuzzy number of asset $i$. The weighted possibilistic variance of the portfolio is then given by $x^T \mathrm{Cov}_f\,x$. Thus, if the weighted possibilistic variance-covariance matrix $\mathrm{Cov}_f$ is positive definite, the weighted possibilistic M–V portfolio model with short sales may be described by

$$\begin{aligned}
\min\ & \mathrm{Var}_f(x) = x^T \mathrm{Cov}_f\,x \\
\text{s.t.}\ & r^T x = r_E, \\
& e^T x = 1,
\end{aligned} \qquad (2)$$

where $\mathrm{Var}_f(x)$ is the weighted possibilistic risk (variance) of the portfolio $x$, $r_E$ is the goal of expected returns for the investment, and $e$ is the $n$-dimensional vector with all elements being one. Similar to the Markowitz portfolio model in [1] and [7], we can also obtain that the unique global optimal portfolio of (2) is

$$x^* = \mathrm{Cov}_f^{-1}\,(r, e)\,A^{-1}\begin{pmatrix} r_E \\ 1 \end{pmatrix},$$

which is called the weighted possibilistic efficient M–V portfolio with goal $r_E$, where

$$A = \begin{pmatrix} c & b \\ b & a \end{pmatrix}, \quad a = e^T \mathrm{Cov}_f^{-1} e, \quad b = e^T \mathrm{Cov}_f^{-1} r, \quad c = r^T \mathrm{Cov}_f^{-1} r.$$

The corresponding minimum weighted possibilistic risk is obtained by substituting $x^*$ into the objective function of (2),

$$\mathrm{Var}_f(x^*) = \frac{a r_E^2 - 2 b r_E + c}{\Delta}, \qquad (3)$$

where $\Delta = ac - b^2 > 0$, because of the positive definiteness of the matrix $\mathrm{Cov}_f$. In the $\mathrm{Var}_f$–$r_E$ plane, equation (3) describes a hyperbola and the right branch gives the boundary for feasible portfolios. The top half of the boundary is called the weighted possibilistic efficient frontier, and the lower half is the inefficient frontier, while the extreme point G gives the portfolio that has the global minimum
weighted possibilistic variance and the global minimum weighted possibilistic return rate,

$$x_G = \frac{\mathrm{Cov}_f^{-1} e}{a}, \qquad r_G = \frac{b}{a}, \qquad \mathrm{Var}_{f,G} = \frac{1}{a}.$$

$x_G$ is called the global minimum weighted possibilistic variance portfolio.

We now give a simple example to illustrate the application of our results in the security market. Only four stocks are considered, and short sales are allowed. Let $r_j$ be an LR-type fuzzy number that represents the return rate of security $S_j$ ($j = 1, 2, 3, 4$). As in [6], let $r_j = (r_j^-, r_j^+, \alpha, \beta)$ be a fuzzy number of trapezoidal form with peak $[r_j^-, r_j^+]$, left width $\alpha > 0$ and right width $\beta > 0$, and let $f(\gamma) = 3\gamma^2$. The $\gamma$-level set of $r_j$ is given by $[r_j]^\gamma = [r_j^- - \alpha(1 - \gamma), r_j^+ + \beta(1 - \gamma)]$, $\forall \gamma \in [0, 1]$. Then we can compute the $f$-weighted possibilistic mean values, variances, and covariances as follows.

stock j   return rate r_j (%)   M_f^L(r_j)   M_f^U(r_j)   M_f(r_j)   Var_f(r_j)
1         [11–2.0, 12+1.2]      10.5         12.3         11.4       0.204
2         [13–1.5, 14+2.5]      12.625       14.625       13.625     0.31875
3         [12–1.0, 13+1.0]      11.75        13.25        12.5       0.075
4         [16–0.5, 17+1.5]      15.875       17.375       16.625     0.09375

and $\mathrm{Cov}_f(r_1, r_2) = 0.225$, $\mathrm{Cov}_f(r_1, r_3) = 0.12$, $\mathrm{Cov}_f(r_1, r_4) = 0.105$, $\mathrm{Cov}_f(r_2, r_3) = 0.15$, $\mathrm{Cov}_f(r_2, r_4) = 0.16875$, $\mathrm{Cov}_f(r_3, r_4) = 0.075$. Thus, we can obtain the portfolio solution $x = (-1.0762, 0.0103, -0.4969, 1.5628)$.

In a financial market with uncertain factors, real-world problems are usually not so easily formulated in accurate mathematical forms. Moreover, the investors' or experts' judgment in data analysis is very important, and the weighting function can effectively characterize their knowledge. For this reason, the weighted possibilistic analysis methods for portfolio selection shown in this paper are suitable for such settings.
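The tabulated means, variances and covariances can be cross-checked with exact arithmetic. The sketch below relies on closed forms that are not stated in the paper but follow from Definitions 1 and 2 under the example's assumptions ($f(\gamma) = 3\gamma^2$, trapezoidal level sets): since $\int_0^1 3\gamma^2(1-\gamma)\,d\gamma = 1/4$ and $\int_0^1 3\gamma^2(1-\gamma)^2\,d\gamma - 1/16 = 3/80$, one gets $M_f^L = r^- - \alpha/4$, $M_f^U = r^+ + \beta/4$ and $\mathrm{Cov}_f(r_i, r_j) = \tfrac{3}{80}(\alpha_i\alpha_j + \beta_i\beta_j)$. Function names are illustrative.

```python
from fractions import Fraction as Fr

# (r_minus, r_plus, alpha, beta) for the four stocks in the example.
STOCKS = {
    1: (Fr(11), Fr(12), Fr(2), Fr(6, 5)),
    2: (Fr(13), Fr(14), Fr(3, 2), Fr(5, 2)),
    3: (Fr(12), Fr(13), Fr(1), Fr(1)),
    4: (Fr(16), Fr(17), Fr(1, 2), Fr(3, 2)),
}

K_MEAN = Fr(1, 4)   # integral of 3g^2 (1 - g) over [0, 1]
K_VAR = Fr(3, 80)   # integral of 3g^2 (1 - g)^2 over [0, 1], minus (1/4)^2


def mean_lower(j):
    """M_f^L(r_j) for a trapezoidal fuzzy number with f = 3 * gamma^2."""
    r_minus, _, alpha, _ = STOCKS[j]
    return r_minus - K_MEAN * alpha


def mean_upper(j):
    """M_f^U(r_j) for a trapezoidal fuzzy number with f = 3 * gamma^2."""
    _, r_plus, _, beta = STOCKS[j]
    return r_plus + K_MEAN * beta


def mean(j):
    """Weighted possibilistic mean: average of lower and upper means."""
    return (mean_lower(j) + mean_upper(j)) / 2


def cov(i, j):
    """Cov_f(r_i, r_j); cov(j, j) is the weighted possibilistic variance."""
    _, _, alpha_i, beta_i = STOCKS[i]
    _, _, alpha_j, beta_j = STOCKS[j]
    return K_VAR * (alpha_i * alpha_j + beta_i * beta_j)


for j in STOCKS:
    print(j, float(mean_lower(j)), float(mean_upper(j)), float(mean(j)), float(cov(j, j)))
```

Under these closed forms the covariance matrix is a sum of two rank-one matrices, so it is worth checking positive definiteness before applying the closed-form solution of model (2) to a concrete data set.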
4 Summary

In this paper, we have introduced the notions of the weighted possibilistic variance and covariance of fuzzy numbers, which are consistent with the extension principle and the well-known definitions of variance and covariance in probability theory. Moreover, we use these notions to build the weighted possibilistic model of portfolio selection. Since the returns of risky assets are in a fuzzy uncertain economic environment and vary from time to time, the fuzzy number is a powerful tool for describing an uncertain environment with vagueness and ambiguity.
Acknowledgement The authors are grateful to the referees for their valuable comments and suggestions. Moreover, the last author would like to acknowledge the support No. 2004070 from the Fund of Higher Education Science Research Project of Ningxia in China.
References
1. Best, M.J., Hlouskova, J.: The efficient frontier for bounded assets. Mathematics of Operations Research. 52 (2000) 195–212
2. Carlsson, C., Fullér, R.: On possibilistic mean value and variance of fuzzy numbers. Fuzzy Sets and Systems. 122 (2001) 315–326
3. Dubois, D., Prade, H.: The mean value of a fuzzy number. Fuzzy Sets and Systems. 24 (1987) 279–300
4. Dubois, D., Prade, H.: What are fuzzy rules and how to use them. Fuzzy Sets and Systems. 84 (1996) 169–185
5. Dubois, D., Prade, H., Sabbadin, R.: Decision-theoretic foundations of qualitative possibility theory. European Journal of Operational Research. 128 (2001) 459–478
6. Fullér, R., Majlender, P.: On weighted possibilistic mean and variance of fuzzy numbers. Fuzzy Sets and Systems. 136 (2003) 363–374
7. Markowitz, H.: Portfolio selection: Efficient diversification of investments. Wiley, New York. (1959)
8. Yager, R.: On the instantiation of possibility distributions. Fuzzy Sets and Systems. 128 (2002) 261–266
9. Zhang, W., Nie, Z.: On possibilistic variance of fuzzy numbers. In: Wang, G., Liu, Q., Yao, Y., Skowron, A. (eds.): Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. Lecture Notes in Computer Science, Vol. 2639. Springer-Verlag, Berlin Heidelberg New York (2003) 398–402
Another Discussion About Optimal Solution to Fuzzy Constraints Linear Programming
Yun-feng Tan 1 and Bing-yuan Cao 1,2
1 Dept of Math., Shantou University, Shantou 515063, P. R. China
2 School of Math. and Inf. Sci., Guangzhou University, Guangzhou 510405, China
Abstract. In this paper, we focus on linear programming with fuzzy constraints. First we discuss the properties of an optimal solution vector and of an optimal value in the corresponding parametric programming, and propose a method for finding the critical values. Then we present a new algorithm for fuzzy constraint linear programming by associating an objective function with the optimal value of the parametric programming.
1 Introduction

The normal form of a linear programming problem with fuzzy constraints is

$$(\widetilde{LP}) \qquad \max Z = CX \quad \text{s.t.}\ AX \lesssim b,\ X \ge 0,$$

where $A = (a_{ij})_{m \times n}$ is an $m \times n$ matrix, $b$ is an $m$-dimensional vector, and $C$ and $X$ are $n$-dimensional vectors. Let rank(A) = m. "$\lesssim$" denotes the fuzzy version of "$\le$" and has the linguistic interpretation "essentially smaller than or equal to" [1]. The representative method for $(\widetilde{LP})$ is to turn it into a classical linear programming problem [2]. We will try to explain why researchers usually find its fuzzy decision to be 0.5 [3][4][5], and we shall propose another algorithm for $(\widetilde{LP})$.
2 Analysis of Fuzzy LP

We first discuss the properties of the optimal value $Z_\lambda$ in

$$(LP_\lambda) \qquad \max Z = CX \quad \text{s.t.}\ AX \le b + (1 - \lambda)d,\ X \ge 0,$$

where $\lambda$ is a parameter on the interval [0,1], and $d \ge 0$. In this paper, $X_\lambda$ denotes the optimal solution to $(LP_\lambda)$, and $B_\lambda$ and $Z_\lambda$ denote the optimal basis matrix and the optimal value of $(LP_\lambda)$, respectively. The right-hand side $b + (1 - \lambda)d$ of the constraints in $(LP_\lambda)$ varies with the parameter $\lambda$. The optimal solution to the parametric linear programming $(LP_\lambda)$ is $B_\lambda^{-1}(b + (1 - \lambda)d)$.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 156–159, 2005. © Springer-Verlag Berlin Heidelberg 2005
If we solve $(LP_\lambda)$ by the simplex method, the discriminant number $\sigma = C_N - C_B B^{-1} N$ does not depend on the parameter $\lambda$, so the variation of the optimal basis matrix is decided only by $X_\lambda$.

Definition 1. Let $B$ be one of the optimal basis matrices of $(LP_\lambda)$. If there exists an interval $[\lambda_1, \lambda_2]$ such that $B$ is an optimal basis matrix of $(LP_\lambda)$ for all $\lambda \in [\lambda_1, \lambda_2]$ while $B$ is not an optimal basis matrix for any $\lambda \notin [\lambda_1, \lambda_2]$, we call $\lambda_1$ and $\lambda_2$ critical values of $(LP_\lambda)$ and $[\lambda_1, \lambda_2]$ a characteristic interval.

Theorem 1. $(LP_\lambda)$ has finitely many characteristic intervals on the interval [0, 1].

Proof. Assume that $B$ is an optimal basis matrix of $(LP_\lambda)$ and that there are two characteristic intervals $[\lambda_{i-1}, \lambda_i]$ and $[\lambda_{i+1}, \lambda_{i+2}]$ ($\lambda_i < \lambda_{i+1}$) corresponding to $B$. The optimal solution to $(LP_\lambda)$ is $(X_B, X_N)^T$, where $\lambda \in [\lambda_{i-1}, \lambda_i] \cup [\lambda_{i+1}, \lambda_{i+2}]$, $X_B = B^{-1}(b + (1 - \lambda)d) \ge 0$ and $X_N = 0$. Since $X_B$ is linear in $\lambda$, $X_B = B^{-1}(b + (1 - \lambda)d) \ge 0$ also holds when $\lambda \in [\lambda_i, \lambda_{i+1}]$, which means that $B$ is also an optimal basis matrix of $(LP_\lambda)$ on the interval $[\lambda_i, \lambda_{i+1}]$. Therefore the characteristic interval on which the optimal basis matrix stays invariant is $[\lambda_{i-1}, \lambda_{i+2}]$, so each optimal basis matrix has only one corresponding characteristic interval. Because the coefficient matrix of $(LP_\lambda)$ stays invariant on the interval [0,1] and the number of optimal basis matrices is finite, the number of characteristic intervals is finite.

Theorem 2. Let $B$ be the optimal basis matrix of $(LP_\lambda)$ on a characteristic interval $[\lambda_1, \lambda_2]$. If $(B^{-1}d)_i \ne 0$, $\forall i \in \{1, \dots, m\}$, then

$$\lambda_1 = \max\left\{\frac{[B^{-1}(b + d)]_i}{(B^{-1}d)_i},\ 0 \;\middle|\; (B^{-1}d)_i < 0,\ i = 1, 2, \dots, m\right\}, \qquad (1)$$

$$\lambda_2 = \min\left\{\frac{[B^{-1}(b + d)]_i}{(B^{-1}d)_i},\ 1 \;\middle|\; (B^{-1}d)_i > 0,\ i = 1, 2, \dots, m\right\}, \qquad (2)$$

where $[B^{-1}(b + d)]_i$ and $(B^{-1}d)_i$ are the $i$-th components of $B^{-1}(b + d)$ and $B^{-1}d$, respectively.

Proof. We can use partitioned matrices to represent the simplex method for an LP. Since

$$\begin{pmatrix} B & N & b + (1 - \lambda)d \\ C_B & C_N & Z \end{pmatrix} \Longrightarrow \begin{pmatrix} I & B^{-1}N & B^{-1}(b + (1 - \lambda)d) \\ C_B & C_N & Z \end{pmatrix} \Longrightarrow \begin{pmatrix} I & B^{-1}N & B^{-1}(b + (1 - \lambda)d) \\ 0 & C_N - C_B B^{-1}N & Z - C_B B^{-1}(b + (1 - \lambda)d) \end{pmatrix},$$

the discriminant number does not depend on $\lambda$, where $N$ is the non-basis matrix corresponding to $B$. Hence only $B^{-1}(b + (1 - \lambda)d) \ge 0$ is required in order to keep the optimal basis matrix of $(LP_\lambda)$ invariant. This means $[B^{-1}(b + d)]_i - \lambda[B^{-1}d]_i \ge 0$, $\forall i$. By solving this inequality, we obtain $\lambda \in [\lambda_1, \lambda_2]$, where $\lambda_1$ and $\lambda_2$ are given by (1) and (2). It is obvious that the optimal basis matrix of $(LP_\lambda)$ will change when $\lambda > \lambda_2$ or $\lambda < \lambda_1$. Therefore the characteristic interval corresponding to the optimal basis matrix $B$ is $[\lambda_1, \lambda_2]$.
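The bounds of Theorem 2 follow from solving $[B^{-1}(b + d)]_i - \lambda[B^{-1}d]_i \ge 0$ componentwise and intersecting with [0, 1]. A minimal Python sketch is given below; it assumes the two vectors $B^{-1}(b + d)$ and $B^{-1}d$ have already been computed, that the upper bound is clipped at 1, and the function name is hypothetical.

```python
def characteristic_interval(p, q):
    """Return (lambda_1, lambda_2) for p = B^{-1}(b + d) and q = B^{-1} d.

    Each component requires p_i - lambda * q_i >= 0:
      q_i > 0 gives lambda <= p_i / q_i (upper bound),
      q_i < 0 gives lambda >= p_i / q_i (lower bound).
    The result is clipped to the parameter range [0, 1].
    """
    lam1, lam2 = 0.0, 1.0
    for p_i, q_i in zip(p, q):
        if q_i > 0:
            lam2 = min(lam2, p_i / q_i)
        elif q_i < 0:
            lam1 = max(lam1, p_i / q_i)
    return lam1, lam2


# Hypothetical two-constraint illustration (not taken from the paper).
interval = characteristic_interval([3.0, 2.0], [4.0, -1.0])
```

In this illustration the first component bounds $\lambda$ from above by 0.75, while the second only requires $\lambda \ge -2$, which is weaker than $\lambda \ge 0$.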
Now we can easily get the properties of the optimal value function $Z_\lambda$ as follows.

Property 1. Let $B$ be an optimal basis matrix of $(LP_\lambda)$ on the characteristic interval $[\lambda_i, \lambda_j]$. Then $X_\lambda = B^{-1}(b + (1 - \lambda)d)$ ($\lambda_i \le \lambda \le \lambda_j$) is a linear vector function of the variable $\lambda$. The optimal value function $Z_\lambda = C_B B^{-1}(b + (1 - \lambda)d)$ is a linear function of $\lambda$ and decreases as $\lambda$ increases.

It follows from Property 1 that every component of the bound vector $b + (1 - \lambda)d$ and $Z_\lambda$ decrease continuously and monotonically on every characteristic interval $[\lambda_i, \lambda_{i+1}]$. Every component of the optimal solution vector function is continuous at every point $\lambda_i$.

Property 2. The optimal value function $Z_\lambda$ of $(LP_\lambda)$ is continuous on the interval [0,1].

Based on the above conclusions, we consider an optimal solution to $(\widetilde{LP})$.

Theorem 3. Let $C$ be the constraint and $G$ the objective function on domain $X$. Then the optimal solution $x^*$ to the fuzzy optimal set $D = C \wedge G$ satisfies

$$D(x^*) = \max_{x \in X} D(x) = \max_{0 \le \lambda \le 1}\left\{\lambda \wedge \max_{x \in C_\lambda} G(x)\right\},$$

where $C_\lambda = \{x \mid x \in X,\ C(x) \ge \lambda\}$ [2].

The fuzzy objective function can be defined as $C_\lambda$: $Z_\lambda = Z_1 + d_0\lambda$, so we can use the intersection of the fuzzy objective function $C_\lambda$: $Z_\lambda = Z_1 + d_0\lambda$ and $G_\lambda$: $Z_\lambda = C_{B_\lambda} B_\lambda^{-1}(b + (1 - \lambda)d)$ to find an optimal decision of $(\widetilde{LP})$, as shown in Fig. 1.

Fig. 1. The intersection of $C_\lambda$ and $G_\lambda$ (horizontal axis: $\lambda$ from 0 to 1; vertical axis: $Z_\lambda$, with $Z_1$ and $Z_0$ marked)
3 Algorithm to Fuzzy LP Problem

Let $Z_1$ be the optimal value of $(LP_1)$ and $Z_0$ be the optimal value of $(LP_0)$, with $d_0 = Z_0 - Z_1 > 0$. Based on the above conclusions, we give a new algorithm for problem $(\widetilde{LP})$ as follows.

Step 1. Solve the linear programming problems $(LP_0)$ and $(LP_1)$. Let the optimal solutions be $X_0$, $X_1$, the optimal values be $Z_0$, $Z_1$, and the optimal basis matrix of $(LP_0)$ be $B_0$.
Step 2. Solve $[B_0^{-1}(b + (1 - \lambda)d)]_i = 0$. Assume the solutions are $\lambda_1, \dots, \lambda_{n-1}$ ($0 < \lambda_1 < \dots < \lambda_{n-1} < 1$), and let $\lambda_0 = 0$, $\lambda_n = 1$, $\lambda = \lambda_1$, $k = 1$.
Step 3. Solve $(LP_\lambda)$. Let the optimal value be $Z_\lambda$. If $Z_\lambda \le Z_1 + d_0\lambda$, then go to Step 4; otherwise let $k = k + 1$, $\lambda = \lambda_k$, and repeat Step 3.
Step 4. Compute the optimal decision as follows:

$$\lambda^* = \frac{Z_1\lambda_k - Z_1\lambda_{k-1} - Z_{\lambda_{k-1}}\lambda_k + Z_{\lambda_k}\lambda_{k-1}}{Z_{\lambda_k} - Z_{\lambda_{k-1}} - \lambda_k d_0 + \lambda_{k-1} d_0}.$$

Step 5. Solve the linear programming problem $(LP_{\lambda^*})$ to obtain the optimal solution $X_{\lambda^*}$ and the optimal value $Z_{\lambda^*}$.

We apply the above algorithm to the following example.

Example 1. max $3x_1 + 5x_2$ s.t. $7x_1 + 2x_2 \lesssim 66$, $5x_1 + 3x_2 \lesssim 61$, $x_1 + x_2 \lesssim 16$, $x_1 \lesssim 8$, $x_2 \lesssim 5$, $x_i \ge 0$ ($i = 1, 2$), $d = (0, 0, 0, 0, 7)$.

Solution. According to the above algorithm, the optimal decision $\lambda^*$ is 0.557. By solving $(LP_{0.557})$, we obtain $X_{0.557} = (6.861, 8.899)^T$ and $Z_{0.557} = 65.077$. So the optimal solution to the example is $X^* = (6.861, 8.899)^T$ and the optimal value is $Z^* = 65.078$.
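The formula in Step 4 is the intersection of two straight lines: the segment of $Z_\lambda$ through $(\lambda_{k-1}, Z_{\lambda_{k-1}})$ and $(\lambda_k, Z_{\lambda_k})$, and the line $Z = Z_1 + d_0\lambda$. The Python sketch below implements only this step, with hypothetical argument names; solving the linear programs in Steps 1, 3 and 5 is outside its scope, and the numeric inputs are illustrative rather than taken from Example 1.

```python
def optimal_decision(z1, d0, lam_prev, z_prev, lam_k, z_k):
    """Step 4: intersect the Z_lambda segment with the line Z = z1 + d0 * lambda.

    (lam_prev, z_prev) and (lam_k, z_k) are consecutive critical values
    together with the optimal values of (LP_lambda) at those points.
    """
    numerator = z1 * lam_k - z1 * lam_prev - z_prev * lam_k + z_k * lam_prev
    denominator = z_k - z_prev - lam_k * d0 + lam_prev * d0
    return numerator / denominator


# Single segment: Z_lambda falls linearly from Z_0 = 10 to Z_1 = 6, so d0 = 4.
single = optimal_decision(z1=6.0, d0=4.0, lam_prev=0.0, z_prev=10.0, lam_k=1.0, z_k=6.0)

# Two segments with a hypothetical breakpoint Z_0.5 = 9: the crossing lies in [0.5, 1].
piecewise = optimal_decision(z1=6.0, d0=4.0, lam_prev=0.5, z_prev=9.0, lam_k=1.0, z_k=6.0)
```

The single-segment case returns 0.5, and the two-segment case returns a value above 0.5, which is in line with the two cases distinguished in the paper's conclusion.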
4 Conclusion

This paper shows that the optimal decision of fuzzy constraint linear programming does not necessarily equal 0.5. The new algorithm shows the following. (1) When $(LP_0)$ and $(LP_1)$ have the same optimal basis matrix, the optimal value function of the parametric programming corresponding to $(\widetilde{LP})$ is a single segment, so the optimal decision is 0.5. (2) When $(LP_0)$ and $(LP_1)$ do not have the same optimal basis matrix, the optimal value function corresponding to $(\widetilde{LP})$ is convex and composed of several segments. Therefore the optimal decision is more than 0.5.
Acknowledgments Thanks to the support by National Natural Science Foundation of China (No. 70271047) and “ 211” Project Foundation of Shantou University.
References
1. Cao B.Y.: The Method of Research LP Antinomy. Journal of Hunan Education Institute 9(2) (1991) 17–22
2. Cao, B.Y.: Fuzzy Geometric Programming. Kluwer Academic Pub. Dordrecht (2002)
3. Fu G.Y.: On Optimal Solution of Fuzzy Linear Programming. Fuzzy Systems and Mathematics 4(1) (1990) 65–72
4. Lui M.J., Cao B.Y.: The Research and Expansion on Optimal Solution of Fuzzy LP. Proc. of 1st Int. Conf. on FSKD, Vol. 2. Singapore (2002) 539–543
5. Liu T.F.: The Solution to the Problem of Parametric Linear Programming by Lumped Matrix. Journal of Electric Power 15(1) (2000) 22–25
6. Zimmermann H.-J.: Fuzzy Programming and Linear Programming with Several Objective Functions. Fuzzy Sets and Systems 1(1) (1978) 45–55
Fuzzy Ultra Filters and Fuzzy G-Filters of MTL-Algebras
Xiao-hong Zhang 1,∗, Yong-quan Wang 2, and Yong-lin Liu 3
1 Faculty of Science, Ningbo University, Ningbo 315211, Zhejiang Province, P.R. China [email protected]
2 Information Technology Center, East China University of Politics and Law, Shanghai 200042, P.R. China [email protected]
3 Department of Mathematics, Nanping Teachers College, Nanping 353000, P.R. China [email protected]
Abstract. The concepts of fuzzy ultra filters and fuzzy G-filters of MTL-algebras are introduced. Some examples are given and the following main results are proved: (1) a fuzzy filter of an MTL-algebra is a fuzzy ultra filter if and only if it is both a fuzzy prime filter and a fuzzy Boolean filter; (2) a fuzzy filter of an MTL-algebra is a fuzzy Boolean filter if and only if it is both a fuzzy G-filter and a fuzzy MV-filter.
1 Introduction

Interest in the foundations of fuzzy logic has been growing rapidly, and several new algebras playing the role of structures of truth values have been introduced. P. Hajek introduced the axiom system of basic logic (BL) for fuzzy propositional logic and defined the class of BL-algebras (see [1]). Ying discussed fuzzy reasoning and implication operators (see [2], [3]). Wang proposed a formal deductive system L* for fuzzy propositional calculus, and a new kind of algebraic structure, called R0-algebras (see [4], [5], [6]). Esteva and Godo proposed a new formal deductive system MTL, called monoidal t-norm-based logic, intended to cope with left-continuous t-norms and their residua; the algebraic semantics for MTL is based on MTL-algebras (see [7], [8]). Zhang ([13], [14]) further studied properties of filters in MTL-algebras, and introduced the notions of Boolean filters, MV-filters and G-filters. Based on fuzzy set theory, Kim, Zhang and Jun [9] studied the fuzzy structure of filters in MTL-algebras. Furthermore, Jun, Xu and Zhang studied characterizations of fuzzy filters in MTL-algebras, and investigated the fuzzification of Boolean filters and MV-filters. As a continuation of the paper [10], in this paper we further study fuzzy ultra filters, fuzzy prime filters and fuzzy G-filters of MTL-algebras. We give some examples of fuzzy ultra filters and fuzzy G-filters, and prove the following important results: (1) a fuzzy filter of an MTL-algebra is a fuzzy ultra filter if and only if it is both a fuzzy prime filter and a fuzzy Boolean filter; (2) a fuzzy filter of an MTL-algebra is a fuzzy Boolean filter if and only if it is both a fuzzy G-filter and a fuzzy MV-filter.

∗ This work was supported by the National Natural Science Foundation of P. R. China (Grant no. 60273087) and the Research Foundation of Ningbo University.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 160–166, 2005. © Springer-Verlag Berlin Heidelberg 2005
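2 Preliminaries

First of all, we recall some notions and their important properties.

Definition 2.1. (see [1]). A residuated lattice is an algebra $(L, \wedge, \vee, \otimes, \to, 0, 1)$ with four binary operations and two constants such that
(i) $(L, \wedge, \vee, 0, 1)$ is a lattice with the largest element 1 and the least element 0 (with respect to the lattice ordering $\le$).
(ii) $(L, \otimes, 1)$ is a commutative semigroup with the unit element 1, i.e. $\otimes$ is commutative, associative, and $1 \otimes x = x$ for all $x \in L$.
(iii) $\otimes$ and $\to$ form an adjoint pair, i.e. $z \le x \to y$ iff $z \otimes x \le y$ for all $x, y, z \in L$.

Definition 2.2. (see [7]). A residuated lattice $(L, \wedge, \vee, \otimes, \to, 0, 1)$ is called an MTL-algebra if it satisfies the pre-linearity equation: $(x \to y) \vee (y \to x) = 1$, $\forall x, y \in L$.

Proposition 2.3. (see [11] and [13]). Let $(L, \wedge, \vee, \otimes, \to, 0, 1)$ be a residuated lattice. Then for all $x, y, z \in L$,
(1) $x \le y$ if and only if $x \to y = 1$;
(2) $x = 1 \to x$, $x \to (y \to x) = 1$, $y \le (y \to x) \to x$;
(3) $x \le y \to z$ if and only if $y \le x \to z$;
(4) $x \to (y \to z) = (x \otimes y) \to z = y \to (x \to z)$;
(5) $x \le y$ implies $z \to x \le z \to y$, $y \to z \le x \to z$;
(6) $z \to y \le (x \to z) \to (x \to y)$, $z \to y \le (y \to x) \to (z \to x)$;
(7) $(x \to y) \otimes (y \to z) \le x \to z$;
(8) $x' = x'''$, $x \le x''$;
(9) $x' \otimes x = 0$;
(10) if $x \vee x' = 1$ then $x \wedge x' = 0$;
(11) $x' \wedge y' = (x \vee y)'$;
(12) $x \otimes (\bigvee_{i \in \Gamma} y_i) = \bigvee_{i \in \Gamma} (x \otimes y_i)$;
(13) $(\bigvee_{i \in \Gamma} y_i) \to x = \bigwedge_{i \in \Gamma} (y_i \to x)$, $x \to (\bigwedge_{i \in \Gamma} y_i) = \bigwedge_{i \in \Gamma} (x \to y_i)$;
(14) $\bigvee_{i \in \Gamma} (y_i \to x) \le (\bigwedge_{i \in \Gamma} y_i) \to x$;
(15) $x \le y$ implies $x \otimes z \le y \otimes z$;
(16) $y \to z \le x \vee y \to x \vee z$,
where $x' = x \to 0$, $\Gamma$ is a finite or infinite index set, and the corresponding infinite meets and joins are assumed to exist in $L$.

Proposition 2.4. (see [11] and [13]) Let $(L, \wedge, \vee, \otimes, \to, 0, 1)$ be an MTL-algebra. Then for all $x, y, z \in L$,
(17) $(x \wedge y) \to z = (x \to z) \vee (y \to z)$;
(18) $x \vee y = ((x \to y) \to y) \wedge ((y \to x) \to x)$;
(19) $((x \to y) \to z) \to (((y \to x) \to z) \to z) = 1$;
(20) $x \to (y \vee z) = (x \to y) \vee (x \to z)$;
(21) $x \otimes (y \wedge z) = (x \otimes y) \wedge (x \otimes z)$, $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$;
(22) $x \otimes y \le x \wedge y$;
(23) $x' \vee y' = (x \wedge y)'$.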
Definition 2.5 (see [7]). Let $L$ be an MTL-algebra. A filter $F$ is a nonempty subset of $L$ satisfying
(F1) for any $x, y \in F$, $x \otimes y \in F$;
(F2) for any $x \in F$, if $x \le y$, then $y \in F$.

Proposition 2.6 (see [7]). Let $(L, \wedge, \vee, \otimes, \to, 0, 1)$ be an MTL-algebra and $F \subseteq L$. Then $F$ is a filter of $L$ if and only if
(F3) $1 \in F$;
(F4) $y \in F$ whenever $x, x \to y \in F$, for any $x, y \in L$.

In what follows let $L$ denote an MTL-algebra unless otherwise specified.

Definition 2.7 ([9]). A fuzzy set $\mu$ in $L$ is called a fuzzy filter of $L$ if it satisfies
(i) for any $x, y \in L$, $\mu(x \otimes y) \ge \min\{\mu(x), \mu(y)\}$;
(ii) $\mu$ is order-preserving, that is, for any $x, y \in L$, $x \le y \Rightarrow \mu(x) \le \mu(y)$.

Theorem 2.8 (see [10]). A fuzzy set $\mu$ in $L$ is a fuzzy filter of $L$ if and only if it satisfies
(iii) for any $x \in L$, $\mu(1) \ge \mu(x)$;
(iv) for any $x, y \in L$, $\mu(y) \ge \min\{\mu(x), \mu(x \to y)\}$.

Definition 2.9 (see [10]). A fuzzy filter $\mu$ of $L$ is said to be Boolean if it satisfies the following equality:
(v) for any $x \in L$, $\mu(x \vee x') = \mu(1)$.

Theorem 2.10 (see [10]) Let $\mu$ be a fuzzy filter of $L$. Then the following assertions are equivalent:
(vi) $\mu$ is Boolean;
(vii) for any $x, y, z \in L$, $\mu(x \to z) \ge \min\{\mu(x \to (z' \to y)), \mu(y \to z)\}$;
(viii) for any $x, y \in L$, $\mu(x) \ge \mu((x \to y) \to x)$.

Definition 2.11 (see [10]). A fuzzy set $\mu$ in $L$ is called a fuzzy MV-filter of $L$ if it is a fuzzy filter of $L$ that satisfies the following inequality:
(ix) for any $x, y \in L$, $\mu(x \to y) \le \mu(((y \to x) \to x) \to y)$.
3 Fuzzy Ultra Filters of MTL-Algebras

Definition 3.1 A fuzzy set $\mu$ in $L$ is called a fuzzy ultra filter of $L$ if it is a fuzzy filter of $L$ that satisfies the following condition:
(x) for any $x \in L$, $\mu(x) = \mu(1)$ or $\mu(x') = \mu(1)$.

Example 3.2 Let $L = \{0, a, b, c, d, 1\}$ in which $\to$ and $\otimes$ are defined by Tables 1 and 2. Then $(L, \wedge, \vee, \otimes, \to, 0, 1)$ is an MTL-algebra (the lattice order is shown in Fig. 1).

Table 1
→ | 0 a b c d 1
0 | 1 1 1 1 1 1
a | c 1 b b a 1
b | d a 1 a d 1
c | a 1 1 1 a 1
d | b 1 b b 1 1
1 | 0 a b c d 1

Table 2
⊗ | 0 a b c d 1
0 | 0 0 0 0 0 0
a | 0 d c 0 d a
b | 0 c b c 0 b
c | 0 0 c 0 0 c
d | 0 d 0 0 d d
1 | 0 a b c d 1

Fig. 1

Let $\mu$ be a fuzzy set in $L$ given by $\mu(x) = \alpha$ if $x \in \{1, a, d\}$ and $\mu(x) = \beta$ otherwise, where $\alpha > \beta$ in [0,1]. Then $\mu$ is a fuzzy ultra filter of $L$.
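The claim of Example 3.2 can be verified by brute force over the six elements. The Python sketch below assumes the residuum table reads as reconstructed from Table 1 (rows indexed by $x$, columns by $y$, entry $x \to y$), picks hypothetical values $\alpha = 0.8$ and $\beta = 0.3$, and checks conditions (iii) and (iv) of Theorem 2.8 together with condition (x) of Definition 3.1.

```python
# Elements of L in table order.
E = ["0", "a", "b", "c", "d", "1"]

# Residuum (Table 1): IMP[x] lists x -> y for y running through E.
IMP = {
    "0": ["1", "1", "1", "1", "1", "1"],
    "a": ["c", "1", "b", "b", "a", "1"],
    "b": ["d", "a", "1", "a", "d", "1"],
    "c": ["a", "1", "1", "1", "a", "1"],
    "d": ["b", "1", "b", "b", "1", "1"],
    "1": ["0", "a", "b", "c", "d", "1"],
}

ALPHA, BETA = 0.8, 0.3  # hypothetical values with ALPHA > BETA in [0, 1]


def imp(x, y):
    """x -> y, looked up in Table 1."""
    return IMP[x][E.index(y)]


def neg(x):
    """x' = x -> 0."""
    return imp(x, "0")


def mu(x):
    """Fuzzy set of Example 3.2."""
    return ALPHA if x in {"1", "a", "d"} else BETA


def is_fuzzy_filter(m):
    """Conditions (iii) and (iv) of Theorem 2.8."""
    top = all(m("1") >= m(x) for x in E)
    mp = all(m(y) >= min(m(x), m(imp(x, y))) for x in E for y in E)
    return top and mp


def is_fuzzy_ultra(m):
    """Condition (x) of Definition 3.1."""
    return all(m(x) == m("1") or m(neg(x)) == m("1") for x in E)


print(is_fuzzy_filter(mu), is_fuzzy_ultra(mu))
```

A fuzzy set that takes the value $\alpha$ only on $\{1, a\}$ would fail condition (x) at $x = b$, since $b' = d$; the check therefore depends on both $a$ and $d$ belonging to the top level set.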
Definition 3.3. A fuzzy set $\mu$ in $L$ is called a fuzzy prime filter of $L$ if it is a fuzzy filter of $L$ that satisfies the following condition:
(xi) for any $x, y \in L$, $\mu(x \vee y) \le \mu(x) \vee \mu(y)$.

In Example 3.2, $\mu$ is a fuzzy prime filter of $L$ as well as a fuzzy ultra filter.

Example 3.4. Let $L = [0,1]$ and define a product $\otimes$ and a residuum $\to$ on $L$ as follows: for all $x, y \in L$,

$$x \otimes y = \begin{cases} x \wedge y & \text{if } x + y > 1, \\ 0 & \text{otherwise,} \end{cases} \qquad x \to y = \begin{cases} 1 & \text{if } x \le y, \\ (1 - x) \vee y & \text{otherwise.} \end{cases}$$

Then $L$ is an MTL-algebra. Let $\mu_1$ and $\mu_2$ be two fuzzy sets in $L$ given by

$$\mu_1(x) = \begin{cases} \alpha & \text{if } x \in [0, 0.5], \\ \alpha + 0.5x^2 & \text{if } x \in (0.5, 1], \end{cases} \qquad \mu_2(x) = \begin{cases} \alpha & \text{if } x \in [0, 0.5], \\ \alpha + 0.3x & \text{if } x \in (0.5, 1], \end{cases}$$

where $\alpha \in [0, 0.5)$. Then $\mu_1$ and $\mu_2$ are fuzzy prime filters of $L$, but they are not fuzzy ultra filters of $L$, since $\mu(0.5) \ne \mu(1)$ and $\mu(0.5') \ne \mu(1)$ for $\mu = \mu_1, \mu_2$.

Theorem 3.5. A fuzzy set $\mu$ in $L$ is a fuzzy ultra filter of $L$ if and only if it satisfies (a) $\mu$ is a fuzzy Boolean filter of $L$; (b) $\mu$ is a fuzzy prime filter of $L$.

Proof: Suppose that $\mu$ is a fuzzy Boolean and fuzzy prime filter of $L$. For any $x \in L$, by (v) and (xi) we have $\mu(x \vee x') = \mu(1) \le \mu(x) \vee \mu(x')$. If $\mu(x) \ne \mu(1)$, then since $\mu(x) \le \mu(1)$ and $\mu(x') \le \mu(1)$ by Theorem 2.8 (iii), we have $\mu(x') = \mu(1)$. Thus, $\mu$ is a fuzzy ultra filter of $L$.

Conversely, let $\mu$ be a fuzzy ultra filter of $L$. For any $x \in L$, since $x \le x \vee x'$ and $x' \le x \vee x'$, we have $\mu(x) \le \mu(x \vee x')$ and $\mu(x') \le \mu(x \vee x')$ by Definition 2.7 (ii). According to the definition of a fuzzy ultra filter, we have $\mu(x) = \mu(1)$ or $\mu(x') = \mu(1)$. Thus, $\mu(1) \le \mu(x \vee x')$. From this and Theorem 2.8 (iii), we get $\mu(1) = \mu(x \vee x')$. This means that $\mu$ is a fuzzy Boolean filter of $L$. Moreover, by Definition 2.7 (ii) and Proposition 2.4 (18) we have

$$\mu(x \vee y) = \mu(((x \to y) \to y) \wedge ((y \to x) \to x)) \le \mu((x \to y) \to y).$$

From $0 \le y$ and Proposition 2.3 (5) we get $(x \to y) \to y \le x' \to y$. Thus, $\mu((x \to y) \to y) \le \mu(x' \to y)$ by Definition 2.7 (ii). So, $\mu(x \vee y) \le \mu(x' \to y)$.
For any x, y L, if μ(x)=μ(1), then μ(x y)≤ μ(1)=μ(x)≤μ(x) μ(y). If μ(x) μ(1), then μ(x′)=μ(1) by Definition 3.1. Thus, μ(y)≥min{μ(x′), μ(x′y)}= min{μ(1), μ(x′y)}=μ(x′y). by Theorem 2.8 (iv) and (iii). Therefore, μ(x y)≤μ(x′y)≤μ(y)≤μ(x) μ(y). This means that μ is a fuzzy prime filter of L.
4 Fuzzy G-Filters of MTL-Algebras
Definition 4.1. A fuzzy set μ in L is called a fuzzy G-filter of L if it is a fuzzy filter of L that satisfies the following inequality: (x) For any x, y ∈ L, μ(x⊗x→y) ≤ μ(x→y).
Example 4.2. Let L = {0, a, b, c, d, 1}, in which → and ⊗ are defined by Tables 3 and 4. Then (L, ∧, ∨, ⊗, →, 0, 1) is an MTL-algebra. Note that (L, ≤) is not a chain.

Table 3. The operation → (the entry in row x, column y is x→y)
→ | 0 a b c d 1
0 | 1 1 1 1 1 1
a | d 1 b b d 1
b | 0 a 1 a d 1
c | d 1 1 1 d 1
d | a 1 1 1 1 1
1 | 0 a b c d 1

Table 4. The operation ⊗ (the entry in row x, column y is x⊗y)
⊗ | 0 a b c d 1
0 | 0 0 0 0 0 0
a | 0 a c c 0 a
b | 0 c b c d b
c | 0 c c c 0 c
d | 0 0 d 0 0 d
1 | 0 a b c d 1
Fig. 2
Let μ be a fuzzy set in L given by
$$\mu(x) = \begin{cases} \alpha, & \text{if } x \in \{1, a\}, \\ \beta, & \text{otherwise,} \end{cases}$$
where α > β in [0,1]. Then it is routine to verify that μ is a fuzzy G-filter of L.
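The "routine" verification in Example 4.2 can be done by brute force over the six elements. The sketch below is illustrative only: the names `IMP`, `MUL`, `make_mu`, `is_fuzzy_filter` and `is_g_filter` are hypothetical, the values α = 0.8 and β = 0.3 are arbitrary, and the fuzzy-filter test assumes that the two conditions quoted in the text as Theorem 2.8 (iii) and (iv) characterise fuzzy filters.

```python
# Elements of L and the operation tables of Example 4.2.
# IMP[x][y] encodes x -> y (Table 3); MUL[x][y] encodes x (x) y (Table 4).
ELEMS = ["0", "a", "b", "c", "d", "1"]

def table(rows):
    """Build a nested dict from one whitespace-separated row per element."""
    return {x: dict(zip(ELEMS, row.split())) for x, row in zip(ELEMS, rows)}

IMP = table([
    "1 1 1 1 1 1",
    "d 1 b b d 1",
    "0 a 1 a d 1",
    "d 1 1 1 d 1",
    "a 1 1 1 1 1",
    "0 a b c d 1",
])
MUL = table([
    "0 0 0 0 0 0",
    "0 a c c 0 a",
    "0 c b c d b",
    "0 c c c 0 c",
    "0 0 d 0 0 d",
    "0 a b c d 1",
])

def make_mu(alpha, beta):
    """Fuzzy set of Example 4.2: alpha on {1, a}, beta elsewhere."""
    return lambda x: alpha if x in ("1", "a") else beta

def is_fuzzy_filter(mu):
    # Assumed characterisation: mu(1) >= mu(x) and mu(y) >= min(mu(x), mu(x -> y)).
    top = all(mu("1") >= mu(x) for x in ELEMS)
    mp = all(mu(y) >= min(mu(x), mu(IMP[x][y])) for x in ELEMS for y in ELEMS)
    return top and mp

def is_g_filter(mu):
    # Definition 4.1: mu((x (x) x) -> y) <= mu(x -> y) for all x, y.
    return is_fuzzy_filter(mu) and all(
        mu(IMP[MUL[x][x]][y]) <= mu(IMP[x][y]) for x in ELEMS for y in ELEMS
    )

mu = make_mu(0.8, 0.3)
print(is_fuzzy_filter(mu), is_g_filter(mu))
```

Under the same assumptions, the fuzzy set that takes the larger value only at 1 is still a fuzzy filter but fails the G-filter inequality at x = d, y = 0, which shows that the condition is not vacuous on this algebra.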
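Example 4.3. Let L=[0,1] and define a product ⊗ and a residuum → on L as follows, for all x, y ∈ L:
$$x \otimes y = \begin{cases} x \wedge y, & \text{if } x + y > 1, \\ 0, & \text{otherwise,} \end{cases} \qquad x \to y = \begin{cases} 1, & \text{if } x \le y, \\ (1 - x) \vee y, & \text{otherwise.} \end{cases}$$
Then L is an MTL-algebra. Let μ1 and μ2 be two fuzzy sets in L given by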
μ1(x)= ®α 2 ¯α + (1 − α ) x
if x ∈ [0, 0.5] if x ∈ (0.5,1]
,
μ2(x)= ®α ¯α + (1 − α ) x
if x ∈ [0, 0.5] if x ∈ (0.5,1]
where α ∈ [0,1). Then μ1 and μ2 are fuzzy filters of L, but μ1 and μ2 are not fuzzy G-filters of L, since μ1(0.3⊗0.3→0.1) > μ1(0.3→0.1) and μ2(0.3⊗0.3→0.1) > μ2(0.3→0.1).
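The counterexample in Example 4.3 can be reproduced numerically. The following is a minimal sketch for μ2 only; the function names are hypothetical and α = 0.2 is an arbitrary choice from [0,1).

```python
def t_norm(x, y):
    """Product of Example 4.3: min(x, y) if x + y > 1, otherwise 0."""
    return min(x, y) if x + y > 1 else 0.0

def residuum(x, y):
    """Residuum of Example 4.3: 1 if x <= y, otherwise max(1 - x, y)."""
    return 1.0 if x <= y else max(1.0 - x, y)

def mu2(x, alpha=0.2):
    """Fuzzy set mu2 of Example 4.3; alpha = 0.2 is an illustrative value."""
    return alpha if x <= 0.5 else alpha + (1.0 - alpha) * x

lhs = residuum(t_norm(0.3, 0.3), 0.1)  # (0.3 (x) 0.3) -> 0.1
rhs = residuum(0.3, 0.1)               # 0.3 -> 0.1

# The G-filter inequality would require mu2(lhs) <= mu2(rhs).
g_condition_holds = mu2(lhs) <= mu2(rhs)
print(lhs, rhs, g_condition_holds)
```

Here 0.3⊗0.3 = 0 because 0.3 + 0.3 does not exceed 1, so the left-hand argument is 0→0.1 = 1, while 0.3→0.1 = 0.7; since μ2 is strictly increasing on (0.5, 1], the inequality of Definition 4.1 fails.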
Lemma 4.4. (see [10]) Every fuzzy Boolean filter μ of L satisfies the following inequality: (xi) For any x, y, z ∈ L, μ(x→z) ≥ min{μ(x→(y→z)), μ(x→y)}.
Theorem 4.5. A fuzzy set μ in L is a fuzzy Boolean filter of L if and only if it satisfies (a) μ is a fuzzy G-filter of L; (b) μ is a fuzzy MV-filter of L.
Proof: Suppose that μ is a fuzzy Boolean filter of L. By Theorem 3.19 in [10], we know that μ is a fuzzy MV-filter of L. For any x, y ∈ L, by Lemma 4.4 (xi) and Theorem 2.8 (iii) we have
μ(x→y) ≥ min{μ(x→(x→y)), μ(x→x)} = min{μ(x→(x→y)), μ(1)} = μ(x→(x→y)) = μ(x⊗x→y).
Thus, μ is a fuzzy G-filter of L. Conversely, let μ be a fuzzy G-filter and fuzzy MV-filter of L. For any x, y ∈ L, we have
x ≤ (x→y)→y (by Proposition 2.3 (2)),
(x→y)→x ≤ (x→y)→((x→y)→y) (by Proposition 2.3 (6)),
(x→y)→x ≤ ((x→y)⊗(x→y))→y (by Proposition 2.3 (4)).
So,
μ((x→y)→x) ≤ μ((x→y)⊗(x→y)→y) ≤ μ((x→y)→y)
by Definition 2.7 (ii) and Definition 4.1. We also have
(x→y)→y ≤ (x→(x→y))→(x→y) (by Proposition 2.3 (6)),
((x→(x→y))→(x→y))→x ≤ ((x→y)→y)→x (by Proposition 2.3 (5)).
So,
μ((x→y)→x) ≤ μ(((x→(x→y))→(x→y))→x) ≤ μ(((x→y)→y)→x)
by Definition 2.11 (ix) and Definition 2.7 (ii). Thus,
μ((x→y)→x) ≤ min{μ((x→y)→y), μ(((x→y)→y)→x)}.
Therefore, by Theorem 2.8 (iv) we get μ((x→y)→x) ≤ μ(x). By Theorem 2.10 (viii), this means that μ is a fuzzy Boolean filter of L. This completes the proof.
References 1. Hájek P.: Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, Dordrecht (1998) 2. M.S.Ying: Perturbation of fuzzy reasoning. IEEE Trans. Fuzzy Sys. 7(1999) 625-629 3. M.S.Ying: Implications operators in fuzzy logic. IEEE Trans. Fuzzy Sys. 10(2002) 88-91 4. G.J.Wang: A formal deductive system for fuzzy propositional calculus. Chinese Science Bulletin 42 (1997) 1521-1526.
5. G.J.Wang: On the logic foundation of fuzzy reasoning. Information Sciences 117 (1999) 47-88 6. G.J.Wang: Non-classical Mathematical Logic and Approximate Reasoning. Science Press, Beijing (in Chinese, 2000) 7. Esteva F., Godo L.: Monoidal t-norm-based logic: towards a logic for left-continuous t-norms. Fuzzy Sets and Systems 12 (2001) (4) 271-288 8. Jenei S., Montagna F.: A proof of standard completeness for Esteva and Godo’s logic MTL. Studia Logica 70 (2002) 183-192. 9. K.H.Kim, Zhang Q., Y.B.Jun: On Fuzzy filters of MTL-algebras. The Journal of Fuzzy Mathematics 10(2002) (4) 981-989 10. Y.B.Jun, Y.Xu, X.H.Zhang: Fuzzy filters of MTL-algebras. Information Sciences (in print) 11. E.Turunen: Boolean deductive systems of BL-algebras. Arch. Math. Logic 40 (2001) 467-473 12. R.S.Su, G.J.Wang: On fuzzy MP filters of R0-algebras. Fuzzy Systems and Mathematics 18 (2) (2004) 15-23. 13. X.H.Zhang: On filters in MTL-algebras (submitted) 14. X.H.Zhang, W.H. Li: On fuzzy logic algebraic systems MTL (submitted) 15. X.H.Zhang: Fuzzy BZ-ideals and fuzzy anti-grouped ideals. The Journal of Fuzzy Mathematics 11(2003) (4) 915-932
A Study on Relationship Between Fuzzy Rough Approximation Operators and Fuzzy Topological Spaces
Wei-Zhi Wu
Information College, Zhejiang Ocean University, Zhoushan, Zhejiang, 316004, P.R. China [email protected]
Abstract. It is proved that a pair of dual fuzzy rough approximation operators can induce a topological space if and only if the fuzzy relation is reflexive and transitive. The sufficient and necessary condition that a fuzzy interior (closure) operator derived from a fuzzy topological space can associate with a fuzzy reflexive and transitive relation such that the induced fuzzy lower (upper) approximation operator is the fuzzy interior (closure) operator is also examined.
1 Introduction
Rough set theory, proposed by Pawlak [1], is an extension of set theory for the study of intelligent systems characterized by insufficient and incomplete information. The notion of an approximation space, consisting of a universe of discourse and an indiscernibility relation imposed on it, is one of the fundamental concepts of rough set theory. Based on the approximation space, the primitive notions of lower and upper approximation operators can be induced. Motivated by both theoretical and practical needs, many authors have generalized the concept of approximation operators by using nonequivalent binary relations [2-6], neighborhood systems [7, 8], or axiomatic approaches [9-11]. More general frameworks have been obtained in the fuzzy environment, involving the rough approximations of fuzzy sets (rough fuzzy sets) and the fuzzy approximations of sets (fuzzy rough sets) [12-23]. Extensive research has also been carried out to compare the theory of rough sets with other theories of uncertainty, such as modal logic [18, 24, 25], conditional events [26], and the Dempster-Shafer theory of evidence [5, 6, 27, 28]. On the other hand, the relationships between rough sets and topological spaces have been studied by many authors [29-32]. The relationships between crisp rough sets and crisp topological spaces have been studied in detail. The relationship between fuzzy rough sets and fuzzy topological spaces was investigated by Boixader et al. [12], but their study was restricted to fuzzy T-rough sets defined by fuzzy T-similarity relations, which reduce to crisp equivalence relations in the crisp case. In this paper, we focus on the relationship between fuzzy rough approximation operators and fuzzy topological spaces in the general case.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 167–174, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Fuzzy Rough Approximation Operators
Let X be a nonempty set called the universe of discourse. The class of all subsets (respectively, fuzzy subsets) of X will be denoted by P(X) (respectively, by F(X)). For any A ∈ F(X), the complement of A is denoted by ∼A.
Definition 1. Let R be a fuzzy binary relation on U, i.e. R ∈ F(U × U). R is referred to as a serial fuzzy relation if $\bigvee_{y \in U} R(x, y) = 1$ for all x ∈ U; R is referred to as a reflexive fuzzy relation if R(x, x) = 1 for all x ∈ U; R is referred to as a transitive fuzzy relation if $R(x, z) \ge \bigvee_{y \in U} (R(x, y) \wedge R(y, z))$ for all x, z ∈ U.
Definition 2. If R is a fuzzy relation on U, then the pair (U, R) is referred to as a fuzzy approximation space. Let A ∈ F(U). The lower and upper approximations of A, $\underline{R}(A)$ and $\overline{R}(A)$, with respect to the fuzzy approximation space (U, R) are fuzzy sets of U whose membership functions, for each x ∈ U, are defined, respectively, by
$$\underline{R}(A)(x) = \bigwedge_{y \in U} [(1 - R(x, y)) \vee A(y)], \qquad \overline{R}(A)(x) = \bigvee_{y \in U} [R(x, y) \wedge A(y)].$$
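The pair $(\underline{R}(A), \overline{R}(A))$ is referred to as a fuzzy rough set, and $\underline{R}, \overline{R} : F(U) \to F(U)$ are referred to as lower and upper fuzzy rough approximation operators, respectively.
Lemma 3 ([20]). The lower and upper fuzzy rough approximation operators, $\underline{R}$ and $\overline{R}$, satisfy the following properties: ∀A, B ∈ F(U), ∀A_j ∈ F(U) (∀j ∈ J), ∀α ∈ I = [0, 1],
(FL1) $\underline{R}(A) = \sim\overline{R}(\sim A)$, (FU1) $\overline{R}(A) = \sim\underline{R}(\sim A)$;
(FL2) $\underline{R}(A \cup \hat{\alpha}) = \underline{R}(A) \cup \hat{\alpha}$, (FU2) $\overline{R}(A \cap \hat{\alpha}) = \overline{R}(A) \cap \hat{\alpha}$;
(FL3) $\underline{R}(\bigcap_{j \in J} A_j) = \bigcap_{j \in J} \underline{R}(A_j)$, (FU3) $\overline{R}(\bigcup_{j \in J} A_j) = \bigcup_{j \in J} \overline{R}(A_j)$,
where $\hat{\alpha}$ is the constant fuzzy set, i.e. $\hat{\alpha}(x) = \alpha$, ∀x ∈ U.
Properties (FL1) and (FU1) show that the fuzzy rough approximation operators $\underline{R}$ and $\overline{R}$ are dual to each other. Properties with the same number may be regarded as dual properties. It can be checked that
(FL4) $A \subseteq B \Longrightarrow \underline{R}(A) \subseteq \underline{R}(B)$, (FU4) $A \subseteq B \Longrightarrow \overline{R}(A) \subseteq \overline{R}(B)$;
(FL5) $\underline{R}(\bigcup_{j \in J} A_j) \supseteq \bigcup_{j \in J} \underline{R}(A_j)$, (FU5) $\overline{R}(\bigcap_{j \in J} A_j) \subseteq \bigcap_{j \in J} \overline{R}(A_j)$.
Properties (FL2) and (FU2) imply the following properties (FL2)′ and (FU2)′:
(FL2)′ $\underline{R}(U) = U$, (FU2)′ $\overline{R}(\emptyset) = \emptyset$.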
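On a finite universe, the two approximation operators of Definition 2 reduce to min/max computations over the rows of R. The following is a minimal sketch under that assumption; the relation `R` and the fuzzy set `A` are hypothetical values chosen so that the floating-point arithmetic is exact, and the function names are not from the paper.

```python
def lower(R, A):
    """Lower approximation: infimum over y of max(1 - R[x][y], A[y])."""
    return [min(max(1 - rxy, a) for rxy, a in zip(row, A)) for row in R]

def upper(R, A):
    """Upper approximation: supremum over y of min(R[x][y], A[y])."""
    return [max(min(rxy, a) for rxy, a in zip(row, A)) for row in R]

def complement(A):
    """Pointwise complement of a fuzzy set given as a list of degrees."""
    return [1 - a for a in A]

# Hypothetical fuzzy relation on a three-element universe, and a fuzzy set A.
R = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.25],
     [0.0, 0.25, 1.0]]
A = [0.75, 0.25, 0.5]

low, up = lower(R, A), upper(R, A)

# Duality (FL1): the lower approximation of A is the complement of the
# upper approximation of the complement of A.
dual_ok = low == complement(upper(R, complement(A)))
print(low, up, dual_ok)
```

The same functions can be used to check (FL2)′ and (FU2)′ by applying `lower` to the all-ones list and `upper` to the all-zeros list.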
Definition 4. Let L, H : F(U) → F(U) be two operators. They are referred to as dual operators if for all A ∈ F(U):
(Fl1) $L(A) = \sim H(\sim A)$, (Fu1) $H(A) = \sim L(\sim A)$.
Rough set approximation operators can also be characterized by axioms. In the axiomatic approach, rough sets are axiomatized by abstract operators. For the case of fuzzy rough sets, the primitive notion is a system (F(U), ∩, ∪, ∼, L, H), where L, H : F(U) → F(U) are operators from F(U) to F(U).
Lemma 5 ([20]). Let L, H : F(U) → F(U) be two operators. Then there exists a fuzzy relation R on U such that for all A ∈ F(U)
$$L(A) = \underline{R}(A) \quad \text{and} \quad H(A) = \overline{R}(A)$$
if and only if (iff) L and H satisfy the following axioms: ∀A, B ∈ F(U), ∀A_j ∈ F(U) (∀j ∈ J), and ∀α ∈ I,
(Fl1) $L(A) = \sim H(\sim A)$, (Fu1) $H(A) = \sim L(\sim A)$;
(Fl2) $L(A \cup \hat{\alpha}) = L(A) \cup \hat{\alpha}$, (Fu2) $H(A \cap \hat{\alpha}) = H(A) \cap \hat{\alpha}$;
(Fl3) $L(\bigcap_{j \in J} A_j) = \bigcap_{j \in J} L(A_j)$, (Fu3) $H(\bigcup_{j \in J} A_j) = \bigcup_{j \in J} H(A_j)$.
Definition 6. Let L, H : F(U) → F(U) be a pair of dual operators. If L satisfies axioms (Fl2) and (Fl3), or equivalently H satisfies axioms (Fu2) and (Fu3), then the system (F(U), ∩, ∪, ∼, L, H) is referred to as a fuzzy rough set algebra, and L and H are referred to as lower and upper fuzzy approximation operators respectively.
Lemma 7 ([20]). Assume that L, H : F(U) → F(U) is a pair of dual fuzzy approximation operators, i.e., L satisfies axioms (Fl1), (Fl2) and (Fl3), and H satisfies (Fu1), (Fu2) and (Fu3). Then there exists a serial fuzzy relation R on U such that $L(A) = \underline{R}(A)$ and $H(A) = \overline{R}(A)$ for all A ∈ F(U) iff L and H satisfy the axioms:
(Fl0) $L(\hat{\alpha}) = \hat{\alpha}$, ∀α ∈ I; (Fu0) $H(\hat{\alpha}) = \hat{\alpha}$, ∀α ∈ I.
Lemma 8 ([20]). Assume that L, H : F(U) → F(U) is a pair of dual fuzzy approximation operators. Then there exists a reflexive fuzzy relation R on U such that $L(A) = \underline{R}(A)$ and $H(A) = \overline{R}(A)$ for all A ∈ F(U) iff L and H satisfy the axioms:
(Fl6) $L(A) \subseteq A$, ∀A ∈ F(U); (Fu6) $A \subseteq H(A)$, ∀A ∈ F(U).
Lemma 9 ([20]). Assume that L, H : F(U) → F(U) is a pair of dual fuzzy approximation operators. Then there exists a transitive fuzzy relation R on U such that $L(A) = \underline{R}(A)$ and $H(A) = \overline{R}(A)$ for all A ∈ F(U) iff L and H satisfy the axioms:
(Fl7) $L(A) \subseteq L(L(A))$, ∀A ∈ F(U); (Fu7) $H(H(A)) \subseteq H(A)$, ∀A ∈ F(U).
3 Fuzzy Topological Spaces and Fuzzy Rough Approximation Operators
Definition 10 ([33]). A subset τ of F(U) is referred to as a fuzzy topology on U iff it satisfies
(1) If $\mathcal{A} \subseteq \tau$, then $\bigcup_{A \in \mathcal{A}} A \in \tau$,
(2) If A, B ∈ τ, then A ∩ B ∈ τ,
(3) If $\hat{\alpha} \in F(U)$ is a constant fuzzy set, then $\hat{\alpha} \in \tau$.
Definition 11 ([33]). A map Ψ : F(U) → F(U) is referred to as a fuzzy interior operator iff for all A, B ∈ F(U) it satisfies:
(1) Ψ(A) ⊆ A,
(2) Ψ(A ∩ B) = Ψ(A) ∩ Ψ(B),
(3) Ψ²(A) = Ψ(A),
(4) $\Psi(\hat{\alpha}) = \hat{\alpha}$, ∀α ∈ I.
Definition 12 ([33]). A map Φ : F(U) → F(U) is referred to as a fuzzy closure operator iff for all A, B ∈ F(U) it satisfies:
(1) A ⊆ Φ(A),
(2) Φ(A ∪ B) = Φ(A) ∪ Φ(B),
(3) Φ²(A) = Φ(A),
(4) $\Phi(\hat{\alpha}) = \hat{\alpha}$, ∀α ∈ I.
The elements of a fuzzy topology τ are referred to as open fuzzy sets, and it is easy to show that a fuzzy interior operator Ψ defines a fuzzy topology $\tau_\Psi = \{A \in F(U) : \Psi(A) = A\}$. So, the open fuzzy sets are the fixed points of Ψ.
Theorem 13. Assume that R is a fuzzy relation on U. Then the operator $\Phi = \overline{R} : F(U) \to F(U)$ is a fuzzy closure operator iff R is a reflexive and transitive fuzzy relation.
Proof. "⇒" Assume that $\Phi = \overline{R} : F(U) \to F(U)$ is a fuzzy closure operator. Since A ⊆ Φ(A), ∀A ∈ F(U), by Lemma 8 we know that R is a reflexive fuzzy relation. Since Φ²(A) = Φ(A), i.e. $\overline{R}(\overline{R}(A)) = \overline{R}(A)$, ∀A ∈ F(U), by using the reflexivity of R we must have
$$\overline{R}(\overline{R}(A)) \subseteq \overline{R}(A), \quad \forall A \in F(U).$$
Hence by Lemma 9 we conclude that R is a transitive fuzzy relation.
"⇐" If R is a reflexive and transitive fuzzy relation, by Lemma 8 we have A ⊆ Φ(A), ∀A ∈ F(U). Note that the reflexivity of R implies that R is serial, so it follows from Lemma 7 that $\Phi(\hat{\alpha}) = \hat{\alpha}$, ∀α ∈ I. Since R is transitive, by Lemma 9 we have
$$\Phi^2(A) \subseteq \Phi(A), \quad \forall A \in F(U).$$
On the other hand, by using the reflexivity it is easy to see that
$$\Phi^2(A) \supseteq \Phi(A), \quad \forall A \in F(U).$$
Hence
$$\Phi^2(A) = \Phi(A), \quad \forall A \in F(U).$$
From Lemma 3 we have that Φ(A ∪ B) = Φ(A) ∪ Φ(B). Thus we have proved that $\Phi = \overline{R} : F(U) \to F(U)$ is a fuzzy closure operator.
Theorem 14. Assume that R is a fuzzy relation on U. Then the operator $\Psi = \underline{R} : F(U) \to F(U)$ is a fuzzy interior operator iff R is a reflexive and transitive fuzzy relation.
Proof. It is similar to the proof of Theorem 13.
Theorem 15. Assume that R is a reflexive and transitive fuzzy relation on U. Then there exists a fuzzy topology $\tau_R$ on U such that $\Psi = \underline{R} : F(U) \to F(U)$ and $\Phi = \overline{R} : F(U) \to F(U)$ are the fuzzy interior and closure operators respectively.
Proof. Suppose that R is a reflexive and transitive fuzzy relation on U. Then by Theorem 13 and Theorem 14 we conclude that $\Psi = \underline{R}$ and $\Phi = \overline{R}$ are fuzzy interior and closure operators respectively. Define
$$\tau_R = \{A \in F(U) : \Psi(A) = \underline{R}(A) = A\}.$$
It can easily be checked that $\tau_R$ is a fuzzy topology on U. Evidently, Ψ and Φ are respectively the interior and closure operators induced by $\tau_R$.
Theorem 16. Let Φ : F(U) → F(U) be a fuzzy closure operator. Then there exists a reflexive and transitive fuzzy relation R on U such that $\overline{R}(A) = \Phi(A)$ for all A ∈ F(U) iff Φ satisfies the following two conditions:
(1) $\Phi(\bigcup_{j \in J} A_j) = \bigcup_{j \in J} \Phi(A_j)$, A_j ∈ F(U), j ∈ J,
(2) $\Phi(A \cap \hat{\alpha}) = \Phi(A) \cap \hat{\alpha}$, ∀A ∈ F(U), ∀α ∈ I.
Proof. Let Φ : F(U) → F(U) be a fuzzy closure operator. If there exists a reflexive and transitive fuzzy relation R on U such that $\overline{R}(A) = \Phi(A)$ for all A ∈ F(U), then by Lemma 3 we know that conditions (1) and (2) hold. Conversely, if Φ satisfies conditions (1) and (2), then, by Lemmas 5, 8, 9, and Theorem 13, there exists a reflexive and transitive fuzzy relation R on U such that $\overline{R}(A) = \Phi(A)$ for all A ∈ F(U).
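The closure-operator characterisation can be illustrated on a small finite universe. The sketch below checks conditions (1)-(4) of Definition 12 for the upper approximation on a grid of sample fuzzy sets; it is a numerical sanity check rather than a proof, and the relations `R_GOOD` and `R_BAD`, the grid, and all function names are hypothetical.

```python
from itertools import product

def upper(R, A):
    """Upper approximation: supremum over y of min(R[x][y], A[y])."""
    return [max(min(rxy, a) for rxy, a in zip(row, A)) for row in R]

def is_reflexive(R):
    return all(R[i][i] == 1 for i in range(len(R)))

def is_transitive(R):
    """Max-min transitivity as in Definition 1."""
    n = range(len(R))
    return all(R[x][z] >= max(min(R[x][y], R[y][z]) for y in n)
               for x in n for z in n)

def is_closure_on(R, sets):
    """Check Definition 12 (1)-(4) for Phi = upper approximation on sample sets."""
    phi = lambda A: upper(R, A)
    union = lambda A, B: [max(a, b) for a, b in zip(A, B)]
    extensive = all(all(a <= c for a, c in zip(A, phi(A))) for A in sets)
    idempotent = all(phi(phi(A)) == phi(A) for A in sets)
    additive = all(phi(union(A, B)) == union(phi(A), phi(B))
                   for A in sets for B in sets)
    constants = all(phi(A) == A for A in sets if len(set(A)) == 1)
    return extensive and idempotent and additive and constants

# Sample fuzzy sets on a three-element universe (an illustrative grid).
GRID = [list(t) for t in product([0.0, 0.25, 0.5, 0.75, 1.0], repeat=3)]

# Reflexive and transitive (and deliberately not symmetric).
R_GOOD = [[1.0, 0.5, 0.25],
          [0.0, 1.0, 0.25],
          [0.0, 0.0, 1.0]]
# Reflexive but not transitive: R(0,1) = R(1,2) = 1 while R(0,2) = 0.
R_BAD = [[1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0]]

print(is_closure_on(R_GOOD, GRID), is_closure_on(R_BAD, GRID))
```

For `R_BAD` the idempotence condition is the one that fails: applying `upper` twice to the fuzzy set [0, 0, 1] gives a strictly larger set than applying it once.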
Theorem 17. Let Ψ : F(U) → F(U) be a fuzzy interior operator. Then there exists a reflexive and transitive fuzzy relation R on U such that $\underline{R}(A) = \Psi(A)$ for all A ∈ F(U) iff Ψ satisfies the following two conditions:
(1) $\Psi(\bigcap_{j \in J} A_j) = \bigcap_{j \in J} \Psi(A_j)$, A_j ∈ F(U), j ∈ J,
(2) $\Psi(A \cup \hat{\alpha}) = \Psi(A) \cup \hat{\alpha}$, ∀A ∈ F(U), ∀α ∈ I.
Proof. Let Ψ : F(U) → F(U) be a fuzzy interior operator. If there exists a reflexive and transitive fuzzy relation R on U such that $\underline{R}(A) = \Psi(A)$ for all A ∈ F(U), then by Lemma 3 we know that conditions (1) and (2) hold. Conversely, if Ψ satisfies conditions (1) and (2), then, by Lemmas 5, 8, 9, and Theorem 14, there exists a reflexive and transitive fuzzy relation R on U such that $\underline{R}(A) = \Psi(A)$ for all A ∈ F(U).
Remark. It should be pointed out that if U is a finite universe of discourse, then properties (FL3) and (FU3) in Lemma 3 can equivalently be replaced by the following properties (FL3)′ and (FU3)′ respectively [21, 22]:
(FL3)′ $\underline{R}(A \cap B) = \underline{R}(A) \cap \underline{R}(B)$, ∀A, B ∈ F(U),
(FU3)′ $\overline{R}(A \cup B) = \overline{R}(A) \cup \overline{R}(B)$, ∀A, B ∈ F(U),
thus condition (1) in Theorems 16 and 17 can be omitted.
4 Conclusion
We have proved that a pair of dual fuzzy rough approximation operators can induce a topological space if and only if the fuzzy relation is reflexive and transitive. On the other hand, under certain conditions a fuzzy interior (closure) operator derived from a fuzzy topological space can be associated with a reflexive and transitive fuzzy relation such that the induced fuzzy lower (upper) approximation operator is the fuzzy interior (closure) operator. In this paper, we considered only the fuzzy rough sets constructed with the triangular norm T = min; we will generalize the research to (I, T)-fuzzy rough sets. The analysis will facilitate further research in uncertain reasoning under fuzziness.
Acknowledgement. This work was supported by a grant from the National Natural Science Foundation of China (No. 60373078).
References 1. Pawlak, Z.: Rough sets. International Journal of Computer and Information Science 11(1982) 341–356 2. Pomykala, J.A.: Approximation operations in approximation space. Bulletin of the Polish Academy of Sciences: Mathematics 35(1987) 653–662
3. Wybraniec-Skardowska, U.: On a generalization of approximation space. Bulletin of the Polish Academy of Sciences: Mathematics 37(1989) 51–61 4. Yao, Y.Y.: Generalized rough set model. In: Polkowski, L., Skowron, A.(eds): Rough Sets in Knowledge Discovery 1. Methodology and Applications. PhysicaVerlag, Heidelberg, 1998, pp.286–318 5. Zhang, W.-X., Leung, Y., Wu, W.-Z.: Information Systems and Knowledge Discovery. Science Press, Beijing, 2003 6. Zhang, W.-X., Wu, W.-Z., Liang, J.-Y., Li, D.-Y.: Rough Set Theory and Approach. Science Press, Beijing, 2001 7. Wu, W.-Z., Zhang, W.-X.: Neighborhood operator systems and approximations. Information Sciences 144(2002) 201–217 8. Yao, Y.Y.: Relational interpretations of neighborhood operators and rough set approximation operators. Information Sciences 111(1998) 239–259 9. Lin, T.Y., Liu, Q.: Rough approximate operators: axiomatic rough set theory. In: Ziarko, W.(eds.): Rough Sets, Fuzzy Sets and Knowledge Discovery. Springer, Berlin, 1994, pp.256–260 10. Thiele, H.: On axiomatic characterisations of crisp approximation operators. Information Sciences 129(2000) 221–226 11. Yao, Y.Y.: Constructive and algebraic methods of the theory of rough sets. Journal of Information Sciences 109(1998) 21–47 12. Boixader, D., Jacas, J., Recasens, J.: Upper and lower approximations of fuzzy sets. International Journal of General Systems 29(2000) 555–568 13. Dubois, D., Prade, H.: Rough fuzzy sets and fuzzy rough sets. International Journal of General Systems 17(1990) 191–208 14. Mi, J.-S., Zhang, W.-X.: An axiomatic characterization of a fuzzy generalization of rough sets. Information Sciences 160(2004) 235–249 15. Morsi, N.N., Yakout, M.M.: Axiomatics for fuzzy rough sets. Fuzzy Sets and Systems 100(1998) 327–342 16. Radzikowska, A.M., Kerre, E.E.: A comparative study of fuzzy rough sets. Fuzzy Sets and Systems 126(2002) 137–155 17. 
Thiele, H.: On axiomatic characterisation of fuzzy approximation operators I, the fuzzy rough set based case. RSCTC 2000, Banff Park Lodge, Bariff, Canada, Oct. 19–19, 2000. Conference Proceedings, pp.239–247 18. Thiele, H.: On axiomatic characterisation of fuzzy approximation operators II, the rough fuzzy set based case. Proceedings of the 31st IEEE International Symposium on Multiple-Valued Logic, 2001, pp.330–335 19. Thiele, H.: On axiomatic characterization of fuzzy approximation operators III, the fuzzy diamond and fuzzy box cases. The 10th IEEE International Conference on Fuzzy Systems 2001, Vol. 2, pp.1148–1151 20. Wu, W.-Z., Leung, Y., Mi, J.-S.: On characterizations of (I, T)-fuzzy rough approximation operators. Fuzzy Sets and Systems (to appear) 21. Wu, W.-Z., Mi, J.-S., Zhang, W.-X.: Generalized fuzzy rough sets. Information Sciences 151(2003) 263–282 22. Wu., W.-Z., Zhang, W.-X.: Constructive and axiomatic approaches of fuzzy approximation operators. Information Sciences 159(2004) 233–254 23. Yao, Y.Y.: Combination of rough and fuzzy sets based on α-level sets. In: Lin, T. Y., Cercone, N.(eds.): Rough Sets and Data Mining: Analysis for Imprecise Data. Kluwer Academic Publishers, Boston, 1997, pp.301–321 24. Nakamura, A., Gao, J. M.: On a KTB-modal fuzzy logic. Fuzzy Sets and Systems 45(1992) 327–334
25. Yao, Y.Y., Lin, T.Y.: Generalization of rough sets using modal logic. Intelligent Automation and Soft Computing: An International Journal 2 (1996) 103–120 26. Wasilewska, A.: Conditional knowledge representation systems—model for an implementation. Bulletin of the Polish Academy of Sciences: Mathematics 37(1989) 63–69 27. Wu, W.-Z., Leung, Y., Zhang, W.-X.: Connections between rough set theory and Dempster-Shafer theory of evidence. International Journal of General Systems 31(2002) 405–430 28. Yao, Y. Y., Lingras, P.J.: Interpretations of belief functions in the theory of rough sets. Information Sciences 104(1998) 81–106 29. Chuchro, M.: On rough sets in topological Boolean algebras. In: Ziarko, W.(ed.): Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer-Verlag, New York, 1994, pp.157–160 30. Chuchro, M.: A certain conception of rough sets in topological Boolean algebras. Bulletin of the Section of Logic 22(1)(1993) 9–12 31. Kortelainen, J.: On relationship between modified sets, topological space and rough sets. Fuzzy Sets and Systems 61(1994) 91–95 32. Wiweger, R.: On topological rough sets. Bulletin of Polish Academy of Sciences: Mathematics 37(1989) 89–93 33. Lowen, R.: Fuzzy topological spaces and fuzzy compactness. Journal of Mathematical Analysis and Applications 56(1976) 621–633
A Case Retrieval Model Based on Factor-Structure Connection and λ-Similarity in Fuzzy Case-Based Reasoning
Dan Meng1, Zaiqiang Zhang2, and Yang Xu3
1 School of Economics Information Engineering, Southwest University of Finance and Economics, Chengdu, Sichuan, China, 610074 mengd [email protected]
2 School of Economics and Management, Southwest Jiaotong University, Chengdu, Sichuan, China, 610031 [email protected]
3 Intelligent Control Development Center, Southwest Jiaotong University, Chengdu, Sichuan, China, 610031 [email protected]
Abstract. One of the fundamental goals of artificial intelligence (AI) is to build computer-based systems that simulate and extend human intelligence and that perform tasks routinely performed by human beings. Without an effective reasoning mechanism, a computer cannot think or reason, so research on reasoning is both interesting and meaningful. Case-based reasoning (CBR) is one branch of this research, with many research results and successful applications. How to index and retrieve similar cases is one of the key issues in CBR research. In addition, daily work and everyday life involve a great deal of uncertain information, so handling uncertain information effectively in CBR is becoming more and more important. In this paper, a new case indexing and retrieval model that can deal with both fuzzy information and accurate information is presented.
1 Introduction
Case-based reasoning (CBR) is an effective reasoning technique that has been widely used in many real-world applications such as disease diagnosis, decision support systems, classification, planning, configuration, and design [2,3,4,6,7,11,12]. There is a large body of research on CBR [1,4,5,7,10,11,12]. However, some problems remain. For example, some methods fit only qualitative attributes and cannot handle cases with quantitative attributes efficiently, especially fuzzy quantitative attributes. In addition, many previous methods either do not address the inner logic relations that exist in each case or cannot deal with them reasonably. A new indexing model is therefore discussed in this paper. It has the following characteristics. First, the presentation of cases is based on all kinds of logic relations, which makes the indexing and retrieval process clearer and simpler. Second, the model can deal with cases that have both qualitative and quantitative attributes. Third, each case is stored in the case base in the form of factors and their corresponding weights, which requires less storage than previous methods. Fourth, the thresholds λ and α can be determined dynamically.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 175–178, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 The Presentation of Cases
In this paper, the presentation of cases is based on the factor space [9]. We use a new unified model, a representation based on universal logic relations, to represent cases. This method eliminates the logic relation formally, but the underlying relation is preserved.
Definition 1. Presentation of universal logic relation:
1. if the relation of the premises is conjunction, fuzzy conjunction, or fuzzy-weight conjunction, then the case is expressed as 'IF A1, ..., An, THEN B';
2. if the relation of the premises is disjunction, fuzzy disjunction, or fuzzy-weight disjunction, then decompose 'IF A1 ∨ A2 ∨ ... ∨ An, THEN B' into 'IF Ai THEN B' (i = 1, 2, ..., n);
3. if the relation of the premises combines conjunctions and disjunctions of the above kinds, then turn it into a disjunctive normal form [13] and decompose it according to items 1 and 2 respectively;
4. if the relation of the premises is a feeble logic relation, then express the case as 'IF A1, ..., An, THEN B'.
By the above method, every case can be expressed as 'IF A1, ..., An, THEN B'.
Definition 2. A fuzzy set consisting of atom factors and their corresponding weights is called an atom factor fuzzy set, which has the form {ω1/f1, ..., ωn/fn}.
In the following, we express a case (whether for query or for reasoning) by an atom factor fuzzy set {ω1/g1, ..., ωm/gm} and its atom factor state set {A1, ..., Am}, where gi is a factor, ωi is the corresponding weight, and Ai is a fuzzy set (possibly a classical set).
3 Indexing and Retrieval Model Based on Factor-Structure Connection and λ-Similarity
3.1 Factor-Structure Connection Among Cases
Definition 3. SR(a_i, b_j) is called the semantic connection between atom factors a_i and b_j:
$$SR(a_i, b_j) = \begin{cases} 1, & a_i = b_j, \\ 0, & a_i \neq b_j. \end{cases} \qquad (1)$$
Definition 4. Let the atom factor fuzzy sets of cases A and B be A = {p1/a1, ..., pn/an} and B = {q1/b1, ..., qn/bn} respectively, such that
1. $\sum_{i=1}^{n} p_i \le 1$, $\sum_{j=1}^{n} q_j \le 1$;
2. SR(a_i, b_i) = 1, where i = 1, 2, ..., n.
Then SR(A, B) is called the factor-structure connection of cases A and B:
$$SR(A, B) = \sum_{i=1}^{n} (1 - |p_i - q_i|) \times (p_i \otimes q_i), \qquad (2)$$
where ⊗ is average or min, and × is multiplication.
Definition 5. Let the atom factor fuzzy sets of A and B be A = {p1/a1, ..., pn/an} and B = {q1/b1, ..., qm/bm} respectively, and let the factor sets of A and B be F_A = {a1, ..., an} and F_B = {b1, ..., bm} respectively. If |F_A ∩ F_B| = k, k ≤ min(m, n), align the common atom factors of A and B with their corresponding weights and delete the differing factors and weights. Denote the resulting fuzzy sets by A* = {p*_1/a1, ..., p*_k/ak} and B* = {q*_1/a1, ..., q*_k/ak}. Without loss of generality, the factor-structure connection of A and B is defined as follows:
$$SR(A, B) = SR(A^*, B^*) = \sum_{i=1}^{k} (1 - |p_i^* - q_i^*|) \times (p_i^* \otimes q_i^*). \qquad (3)$$
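The factor-structure connection of Definition 5 can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it assumes the aggregation over the shared factors is a sum, represents each case as a dictionary from factor name to weight, and uses hypothetical factor names and weights.

```python
def average(p, q):
    """One of the two choices the text allows for the inner operator."""
    return (p + q) / 2

def factor_structure_connection(case_a, case_b, combine=min):
    """SR(A, B) over the atom factors shared by both cases.

    Assumption: the aggregation over shared factors is a sum.
    `combine` plays the role of the inner operator (min or average).
    """
    shared = case_a.keys() & case_b.keys()  # factors present in both cases
    return sum(
        (1 - abs(case_a[f] - case_b[f])) * combine(case_a[f], case_b[f])
        for f in shared
    )

# Hypothetical stored case and query case (factor -> weight).
stored = {"fever": 0.5, "cough": 0.25, "rash": 0.25}
query = {"fever": 0.5, "cough": 0.5}

sr_min = factor_structure_connection(stored, query)
sr_avg = factor_structure_connection(stored, query, combine=average)
print(sr_min, sr_avg)
```

Only "fever" and "cough" contribute here, because "rash" has no counterpart in the query and is dropped, which mirrors the deletion of differing factors in Definition 5.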
Using Definition 4 and Definition 5, we can compute the factor-structure connection SR(A_j, A_0) between an existing case A_j and the problem case A_0. The next step is to judge whether their states are similar.
3.2 λ-Similarity of Cases
Definition 6. The adjacency field D_λ(A, B) of fuzzy sets A and B is defined as follows: D_λ : F(U) × F(U) → P(U), ∀(A, B) ∈ F(U) × F(U),
$$D_\lambda(A, B) = \{u \mid u \in U \text{ and } d(A(u), B(u)) < \lambda\}, \qquad (4)$$
where λ ∈ [0, 1] and d(a, b) is some distance on [0, 1].
Definition 7. Let U be a finite discrete domain. The λ-similarity S_λ of fuzzy sets A and B is defined as follows: S_λ : F(U) × F(U) → [0, 1], ∀(A, B) ∈ F(U) × F(U),
$$S_\lambda(A, B) = \frac{|D_\lambda(A, B)|}{|U|}, \qquad (5)$$
where λ ∈ [0, 1] and D_λ(A, B) is the adjacency field of the fuzzy sets A and B.
Let the remaining state value set of the problem A_0 be {A_01, ..., A_0k}, and let the remaining state value set of an existing case A_j be {A_j1, ..., A_jk}. Then we can obtain S_λ(A_ji, A_0i), i = 1, 2, ..., k, and combine them by the following formula:
$$S_\lambda(A_j, A_0) = \sum_{i=1}^{k} (\omega_{ji} \otimes \omega_{0i}) \circ S_\lambda(A_{ji}, A_{0i}), \qquad (6)$$
where ω_ji and ω_0i are the weights of the same factor in the existing case and in the problem respectively, ⊗ is average or min, and ∘ is min or multiplication.
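The λ-similarity of Definitions 6 and 7 and the weighted combination in (6) can be sketched as follows. The example is illustrative only: fuzzy sets are dictionaries over a shared finite universe, the distance d is taken to be the absolute difference, formula (6) is read as a sum with ∘ taken as multiplication, and all names and values are hypothetical.

```python
def adjacency_field(A, B, lam, dist=lambda a, b: abs(a - b)):
    """D_lambda(A, B): points whose membership degrees differ by less than lam."""
    return {u for u in A if dist(A[u], B[u]) < lam}

def lam_similarity(A, B, lam):
    """S_lambda(A, B) = |D_lambda(A, B)| / |U| on a finite universe U."""
    return len(adjacency_field(A, B, lam)) / len(A)

def case_similarity(weights_j, weights_0, states_j, states_0, lam, combine=min):
    """Formula (6), assuming a sum over factors and 'o' taken as multiplication."""
    return sum(
        combine(weights_j[f], weights_0[f])
        * lam_similarity(states_j[f], states_0[f], lam)
        for f in weights_j
    )

# Hypothetical fuzzy sets on the universe {1, 2, 3, 4}.
A = {1: 1.0, 2: 0.75, 3: 0.25, 4: 0.0}
B = {1: 0.75, 2: 0.75, 3: 0.75, 4: 0.0}

field = adjacency_field(A, B, lam=0.5)
sim = lam_similarity(A, B, lam=0.5)
total = case_similarity({"temp": 0.5}, {"temp": 0.25},
                        {"temp": A}, {"temp": B}, lam=0.5)
print(field, sim, total)
```

Point 3 is excluded from the adjacency field because the degrees differ by exactly 0.5, and Definition 6 uses a strict inequality.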
4 Conclusion
In this paper, an indexing model for fuzzy case-based reasoning is given. In theory, it is suitable for cases with qualitative or quantitative attributes and with precise or fuzzy attribute values. However, how to define the weight of each factor reasonably and how to define the membership functions of the fuzzy sets appropriately still need further discussion.1
References 1. He Xingui: fuzzy data-base systems (in Chinese). Tsinghua university press,Beijing, (1994), 61-63 2. He Xingui: The theory and technology of fuzzy knowledge processing.Beijing: National Defence Industry Press, (1994), 23-32,178-182, 3. He Xingui: Weighted fuzzy logic and its application in different fields. Chinese Journal of Computers(in Chinese),(1989), 458-464, 12(6), 4. Li and D.Xu: Case-based Reasoning. IEEE Potentials, (1994), 10-14 5. Nguyen Hoang Phuong, Nguyen Ba Tu et al: Case Based Reasoning Using Fuzzy Set Theory and the Importance of Features in Medicine. IEEE, (2001), 872-876 6. Shu Chang etal.: The feeble logic of Compound fuzzy proposition and it’s algorithm. Chinese Journal of Computers (in Chinese),(2000),272-277, 23(3) 7. Tsatoulis C and Lasbyap RL: Case-Based reasoning and learning In manufacturing with the TOLTEC planne. IEEE Transaction on Systems, Man, and cybernetics, (1993), 1010-1023, 23(4) 8. Wei-Chou Chen et al.: A Framework of Features for the Case-based Reasoning. IEEE, (2000), 1-5 9. Wang Peizhuang and Li Hongxing: Fuzzy system and fuzzy computer(in Chinese). Science press, (1996), 29-59 10. Wang Mingyang et al.: ICIMX:An expert systems with case-based reasoning. Chinese Journal of Computers (in Chinese),(1997), 105-110, 20(2) 11. Xu Ming and Hu Shouren: Research on indexing model for Case-based reasoning. Computer Science (in Chinese), (1993), 32-35, 20(4) 12. X. Z. Wang,and D. S. Yeung: Using Fuzzy Integral to Modeling Case Based Reasoning with Feature Interaction. IEEE, (2000), 3660-3665 1
This work was supported by the Scientific Research Fund of Southwestern University of Finance and Economics, the National Natural Science Fund of P.R. China (Grant No. 60474022), and the Innovation Fund of SWJTU for Ph.D.
A TSK Fuzzy Inference Algorithm for Online Identification
Kyoungjung Kim1, Eun Ju Whang1, Chang-Woo Park2, Euntai Kim1, and Mignon Park1
1 Department of Electrical and Electronic Engineering, Yonsei University, 134, Shinchon-dong, Sudaemoon-gu, Seoul, Korea {kjkim01, garung, etkim, mignpark}@yonsei.ac.kr http://yeics.yonsei.ac.kr
2 Korea Electronics Technology Institute, 401-402 B/D 192, Yakdae-Dong, Wonmi-Gu, Buchon-Si, Kyunggi-Do, Korea {drcwpark}@keti.re.kr
Abstract. This paper proposes an online self-organizing identification algorithm for the TSK fuzzy model. The structure of the TSK fuzzy model is identified using distance. The parameters of the piecewise linear functions that form the consequent part are obtained using a recursive version of a learning method that combines global and local learning. Both the input and output spaces are considered when identifying the structure of the TSK fuzzy model. Because clustering is carried out in both the input and output spaces, outliers are effectively excluded from clustering. The proposed algorithm is insensitive to noise because it does not use the data points themselves as cluster centers. It can obtain a TSK fuzzy model in a single pass, and with the proposed combined learning method the estimated function can achieve high accuracy.
1 Introduction
The TSK fuzzy model has been used in various applications. In recent years, many studies on fuzzy-model-based identification of nonlinear dynamic systems have been conducted. System identification can be roughly classified into two groups: 1) off-line identification [12], [21], [22], [23] and 2) online identification [1-5], [7], [9-11], [13], [15], [20], [28-30]. Off-line identification supposes that all the data are available at the start of the training process. Online identification requires both model structure identification and parameter identification. Clustering is the main method used for structure identification. In clustering, the distance from the centers of the fuzzy rules, the potential of a new data sample, or the error from the previous step has been used to update the fuzzy rules.
Different mechanisms are used to construct the structure. The self-organizing fuzzy neural network (SOFNN) [1], the self-constructing fuzzy neural network (SCFNN) [5], the dynamic fuzzy neural network (D-FNN) [9], [10] and the general dynamic fuzzy neural network (GDFNN) [8] use the error to update fuzzy rules. The dynamic evolving neural-fuzzy inference system (DENFIS) [3] and the self-constructing neural fuzzy inference network (SONFIN) [4] use distance to update fuzzy rules. The evolving Takagi-Sugeno model (ETS) [2] uses potential to update fuzzy rules. There are other approaches: evolving fuzzy neural networks (EFuNNs) [15], [20], which use the fuzzy difference between two membership vectors; the neural fuzzy control network (NFCN) [7], which uses Kohonen's feature maps; the neuro-fuzzy ART-based structure and parameter learning TSK model (NeuroFAST) [11], which uses the ART concept; and generalized adaptive neuro-fuzzy inference systems (GANFIS) [13], which use modified mountain clustering. To learn the parameters of the consequent part, recursive least squares (RLS) [2], [4], [9,10] and weighted recursive least squares (wRLS) [2] are mainly used; singular value decomposition (SVD) [6] and the δ rule [11] are also used.
A new online structure identification algorithm is proposed in this paper. Generally, clustering is considered in the input space only. Because the output as well as the input carries important information about the system, we consider clustering in both the input and the output space. Also, because it does not take data points as cluster centers, the proposed algorithm is insensitive to noise. We use the maximum distance from the centers to update the rules. To learn the parameters of the consequent part, we propose a method that learns globally and locally. The parameters of the consequent part are calculated recursively using a recursive version of the proposed learning method.
The rest of the paper is organized as follows. In section 2, we briefly describe the TSK fuzzy model and present a new structure identification algorithm. A new parameter learning algorithm is also proposed in section 2. In section 3, the procedure for online structure and parameter identification is presented. In section 4, test results are given. Conclusions are presented in section 5.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 179–188, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 TSK Fuzzy Modeling Algorithm
The TSK fuzzy model can represent a static or a dynamic nonlinear system. This paper deals with dynamic nonlinear systems. The TSK fuzzy model has premise parts consisting of membership functions and consequent parts consisting of piecewise linear functions. It has the form

$$R^i:\ \text{If } x_1 \text{ is } A_1^i(\vec{\theta}_1^i) \text{ and } x_2 \text{ is } A_2^i(\vec{\theta}_2^i), \ldots, x_n \text{ is } A_n^i(\vec{\theta}_n^i) \ \text{ then } \ h^i = b_0^i + b_1^i x_1 + b_2^i x_2 + \cdots + b_n^i x_n \tag{1}$$

where $i = 1, 2, \ldots, r$, $r$ is the number of rules, $A_j^i(\vec{\theta}_j^i)$ is the fuzzy set of the $i$-th rule for $x_j$ with the adjustable parameter set $\vec{\theta}_j^i$, and $\vec{b}^i = (b_0^i, \ldots, b_n^i)$ is the parameter set in the consequent part. The predicted output of the fuzzy model can be inferred from (1) as

$$\hat{y} = \frac{\sum_{i=1}^{C} h^i w^i}{\sum_{i=1}^{C} w^i} \tag{2}$$

where $h^i$ is the output of the $i$-th rule and $w^i$ is the $i$-th rule's firing strength, which is obtained as the minimum of the fuzzy membership functions

$$A_{ij}(C_{ij}, \sigma_{ij}) = \exp\left\{ -\frac{(x_j - C_{ij})^2}{2\sigma_{ij}^2} \right\} \tag{3}$$

where $C_{ij}$ is the cluster center and $\sigma_{ij}$ is the width of the membership function.
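As an illustration of the inference in (1)-(3), the following minimal Python sketch evaluates a TSK model for one input vector. The array layout (one row per rule) and the function names `gaussian_membership` and `tsk_predict` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def gaussian_membership(x, centers, sigmas):
    """Eq. (3): Gaussian membership of every input x_j in every rule.

    x has shape (n,); centers and sigmas have shape (r, n), one row per rule.
    """
    return np.exp(-((x - centers) ** 2) / (2.0 * sigmas ** 2))

def tsk_predict(x, centers, sigmas, b):
    """Eq. (2): firing-strength-weighted average of the rule outputs.

    b has shape (r, n + 1); row i holds (b0, b1, ..., bn) of rule i.
    """
    mu = gaussian_membership(x, centers, sigmas)
    w = mu.min(axis=1)              # firing strength: minimum over the inputs
    h = b[:, 0] + b[:, 1:] @ x      # linear consequent of each rule, Eq. (1)
    return float(np.sum(h * w) / np.sum(w))
```

With two rules whose centers lie symmetrically around the input, both rules fire equally and the prediction is the mean of the two rule outputs.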
We now describe a fast and effective method to find the centers and widths of the membership functions of the premise parts and the parameters of the consequent parts.

2.1 Structure Identification
Structure identification is conducted by clustering the data in the input and output spaces and deciding the cluster centers. We suppose that the fuzzy membership functions are Gaussian. The clustering process is presented in 2-dimensional space. We calculate the distances between the incoming data and all existing cluster centers in the input and output spaces as

$$d_{ij}^{in} = \| x_i - C_j^x \|, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m \tag{4}$$

$$d_{ij}^{out} = \| y_i - C_j^y \|, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m \tag{5}$$

where $x_i$ is the input data, $y_i$ is the output data, $d_{ij}^{in}$ is the distance between the $i$-th incoming data sample and the $j$-th cluster center in the input space, $d_{ij}^{out}$ is the distance between the $i$-th incoming data sample and the center of cluster $j$ in the output space, and $n$ and $m$ are the number of data samples and the number of clusters respectively. $C_j^x$ is the center in the input space and $C_j^y$ is the center in the output space.
Let $d^{in}$ and $d^{out}$ be the predefined upper bounds in the input space and the output space respectively. $MAXD_i$ represents the maximum distance between the center of cluster $j$ and the data points included in cluster $j$.
1) Create a new cluster and take the first incoming data sample as the position of the cluster center. Initialize the width of the membership function with a small random number:

$$C_1 = [x_1, y_1], \quad \sigma_1 = \sigma_0$$

Also initialize the maximum distance from the cluster center to 0.
When the $k$-th data sample enters, calculate the distances between the incoming data and all existing cluster centers in the input space and the output space, and find the minimum distance. Using this minimum distance, create a new cluster or update the center position and width as follows.

– $\min(d_{ij}^{in}) > d^{in}$ and $d_{ij}^{out} > d^{out}$: When the minimum distance is greater than the upper bound in the input space and the distance between the data point and the corresponding cluster is greater than the upper bound in the output space, the incoming data sample is considered an outlier. In this case, the center and width of the cluster and the number of clusters are not changed.

$$C_j = C_j, \quad \sigma_j = \sigma_j, \quad m(t+1) = m(t) \tag{6}$$

– $\min(d_{ij}^{in}) > d^{in}$ and $d_{ij}^{out} \le d^{out}$: When the minimum distance is greater than the upper bound in the input space and the distance between the incoming data and the corresponding cluster is smaller than or equal to the upper bound in the output space, create a new cluster and increase the number of clusters. Initialize the cluster center and width as

$$C_j = [x_k, y_k], \quad \sigma_j = \sigma_0, \quad m(t+1) = m(t) + 1 \tag{7}$$

– $\min(d_{ij}^{in}) \le d^{in}$ and $d_{ij}^{out} > d^{out}$: When the minimum distance is smaller than or equal to the upper bound in the input space and the distance between the incoming data and the corresponding cluster is greater than the upper bound in the output space, the incoming data sample is considered an outlier, and the number, centers and widths of the clusters are maintained as in (6).

– $\min(d_{ij}^{in}) \le d^{in}$ and $d_{ij}^{out} \le d^{out}$: When the minimum distance is smaller than or equal to the upper bound in the input space and the distance between the incoming data and the corresponding cluster center is smaller than or equal to the upper bound in the output space, the incoming data sample is considered one of the data points included in cluster $j$, and the center position and width of the cluster are updated. In this case, the number of clusters is maintained.
In case of $MAXD_i \ge d_{ij}^{in}$, update the center position as follows

$$C_i(t+1) = C_i(t) + \alpha (D_{i\max} - d_{ij}) \tag{8}$$

$$D_{i\max} = d_{ij}^{in} \tag{9}$$

where $\alpha$ is the update ratio.
In case of $MAXD_i < d_{ij}^{in}$, maintain the center position and the maximum distance between cluster $j$ and any data points included in cluster $j$.
Next, we determine the width using the distance between the cluster center and each data point included in the cluster.

$$S_j(t+1) = S_j(t) + d_{ij}^2 \tag{10}$$

$$\sigma_j^2 = S_j / g \tag{11}$$

where $g$ represents the number of data points included in cluster $j$.
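The four-way case analysis and the width update (10)-(11) can be sketched in Python as below. This is only an illustrative reading of the rules above: the function names, the scalar output assumption, and the choice to increment the member count $g$ inside `update_width` are assumptions of this sketch, and the center update (8)-(9) is deliberately left out.

```python
import numpy as np

def classify_sample(x, y, centers_x, centers_y, d_in, d_out):
    """Decide what to do with one incoming sample (x, y).

    centers_x has shape (m, n), centers_y has shape (m,).
    Returns (action, j), where j indexes the nearest cluster in input space
    and action is "outlier", "new" or "member".
    """
    dist_in = np.linalg.norm(centers_x - x, axis=1)   # Eq. (4) for every cluster
    j = int(np.argmin(dist_in))
    dist_out = abs(y - centers_y[j])                  # Eq. (5) for that cluster
    if dist_out > d_out:
        return "outlier", j       # cases 1 and 3: nothing changes, Eq. (6)
    if dist_in[j] > d_in:
        return "new", j           # case 2: create a cluster, Eq. (7)
    return "member", j            # case 4: update cluster j

def update_width(S_j, g_j, d):
    """Eqs. (10)-(11): accumulate the squared distance and recompute sigma.

    Assumption of this sketch: g_j counts the members seen so far and is
    incremented together with S_j.
    """
    S_j = S_j + d ** 2
    g_j = g_j + 1
    return S_j, g_j, float(np.sqrt(S_j / g_j))
```

A caller would apply `update_width` only when `classify_sample` returns `"member"`.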
The fuzzy structure can be constructed with the above process. After the structure is obtained, we can learn the parameters of the consequent part.

2.2 Parameter Identification
In this paper, we use a combined cost function of global and local approximation [6] to estimate the parameters of the consequent part. Each piecewise linear function of the consequent part can be represented as

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_q x_q \tag{12}$$

The cost function for global learning is

$$J_G = \sum_{k=1}^{N} [y(k) - \hat{y}(k)]^2 \tag{13}$$

where $y(k)$ is the output of the real system, $\hat{y}(k)$ is the output of the identified model, and $N$ is the number of training data. The cost function for local learning is

$$J_L = \sum_{i=1}^{L} \sum_{k=1}^{N} [y(k) - w_i X b]^2 \tag{14}$$
The cost function of combined learning is

$$J = \gamma_1 (y - Xb)^T (y - Xb) + \gamma_2 (y - WXb)^T (y - WXb) \tag{15}$$

where $\gamma_1$ and $\gamma_2$ are two positive constants satisfying (16).

$$\gamma_1 + \gamma_2 = 1 \tag{16}$$

From (15), $b$ can be obtained as

$$b = (X^T (\gamma_1 I + \gamma_2 W^T W) X)^{-1} (\gamma_1 I + \gamma_2 W) X^T y \tag{17}$$

We can rearrange (17) as

$$b = P (\gamma_1 I + \gamma_2 W) X^T y \tag{18}$$

$$P = (X^T (\gamma_1 I + \gamma_2 W^T W) X)^{-1} \tag{19}$$
Now, we can write (18) and (19) in a recursive form with a forgetting factor as follows

$$b_{k+1} = b_k + (\gamma_1 + \gamma_2 w_{k+1}) P_{k+1} x_{k+1} (y_{k+1} - x_{k+1}^T b_k) \tag{20}$$

$$P_{k+1} = \frac{1}{\lambda} \left( P_k - \frac{(\gamma_1 + \gamma_2 w_{k+1}^T w_{k+1}) P_k x_{k+1} x_{k+1}^T P_k}{\lambda + x_{k+1}^T P_k x_{k+1}} \right) \tag{21}$$

where $\lambda$ is a forgetting factor whose typical value is between 0.8 and 1. The initial values of $P$ and $b$ can be calculated by applying (20) and (21) to the center and the first data points included in the cluster.
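One step of the recursion (20)-(21) can be sketched in Python as follows. The sketch assumes that the firing strength $w_{k+1}$ is a scalar, so that $w_{k+1}^T w_{k+1}$ reduces to $w^2$; the function name `recursive_update` and the default parameter values are chosen for illustration only.

```python
import numpy as np

def recursive_update(b, P, x, y, w, gamma1=0.8, gamma2=0.2, lam=0.8):
    """One recursive step for the consequent parameters.

    b: (q, 1) parameter vector, P: (q, q) matrix, x: (q,) regressor,
    y: scalar target, w: scalar firing strength (assumption of this sketch).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    Px = P @ x
    denom = lam + (x.T @ Px).item()
    # Eq. (21): update P with forgetting factor lam
    P_new = (P - (gamma1 + gamma2 * w * w) * (Px @ (x.T @ P)) / denom) / lam
    # Eq. (20): correct b with the prediction error of the old parameters
    err = y - (x.T @ b).item()
    b_new = b + (gamma1 + gamma2 * w) * (P_new @ x) * err
    return b_new, P_new
```

For $w = 1$ and $\gamma_1 + \gamma_2 = 1$ the step coincides with ordinary recursive least squares with forgetting, which is a convenient sanity check on an implementation.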
3 Procedure for Online Identification
We proposed the structure and parameter identification algorithms in section 2. We now describe the procedure for online learning with the proposed algorithm.
Step 1) Set the upper bounds $d^{in}$ and $d^{out}$ in the input and output spaces, the center update ratio $\alpha$, the forgetting factor $\lambda$ and the two positive constants $\gamma_1$ and $\gamma_2$.
Step 2) Create the first cluster, take the first data sample as the center position, and set the width $\sigma$ to a small random number.
Step 3) When the $k$-th data point enters, calculate the distances between the entered data and all existing cluster centers, and update the center position according to (6)-(9).
Step 4) Calculate the widths of the membership functions according to (10) and (11).
Step 5) Calculate the consequent parameters recursively from (20) and (21).
Step 6) Stop the procedure if there is no more data to learn.

By following the above procedure, we obtain a fuzzy model that is insensitive to outliers and noise. Because it does not store many data points and processes the data in a single pass, the proposed algorithm responds quickly.
4 Experimental Results
In this section, we test the proposed algorithm. We use the Mackey-Glass (MG) data, which is used as a standard example in the area of fuzzy systems and neural networks in [2], [3], [15]. The time series data used in the test are generated with the MG time-delay differential equation

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t) \tag{22}$$

In these tests, the data obtained with $x(0) = 1.2$, $\tau = 17$, and $x(t) = 0$ for $t < 0$ are used. The I/O data for the proposed algorithm are taken from the Mackey-Glass time series in the following format: input vector $[x(t), x(t-6), x(t-12), x(t-18)]$ and output vector $[x(t+6)]$. The task is to predict the future value $x(t+6)$ from four points spaced at six time intervals in the past. We test using 500 data samples from $t = 5001$ to $t = 5500$.
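The data set described above can be reproduced approximately with the Python sketch below. The paper does not state how (22) was integrated, so the explicit Euler scheme and the step size `dt = 0.1` are assumptions of this sketch, as are the function names.

```python
import numpy as np

def mackey_glass(t_max, tau=17.0, x0=1.2, dt=0.1):
    """Integrate Eq. (22) with explicit Euler and return x at t = 0, 1, ..., t_max."""
    steps_per_unit = int(round(1.0 / dt))
    delay = int(round(tau / dt))
    n = t_max * steps_per_unit
    x = np.zeros(n + 1)
    x[0] = x0
    for k in range(n):
        x_tau = x[k - delay] if k >= delay else 0.0   # x(t) = 0 for t < 0
        x[k + 1] = x[k] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[k])
    return x[::steps_per_unit]

def make_io_pairs(series, t_start, t_end):
    """Build inputs [x(t), x(t-6), x(t-12), x(t-18)] and targets x(t+6)."""
    t = np.arange(t_start, t_end + 1)
    inputs = np.stack([series[t], series[t - 6], series[t - 12], series[t - 18]], axis=1)
    targets = series[t + 6]
    return inputs, targets
```

Calling `mackey_glass(5506)` followed by `make_io_pairs(series, 5001, 5500)` yields the 500 samples used in the test.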
To evaluate the performance of the proposed algorithm, the non-dimensional error index (NDEI) used in [2], [3], [15] is adopted. NDEI is the RMSE divided by the standard deviation of the target series. In this test, we set the update ratio of the cluster center, $\alpha$, to 0.5 and the forgetting factor $\lambda$ to 0.8. Figure 1 shows the identification results of the proposed algorithm for the data samples obtained from (22). We set the upper bound in the input space to 0.2 and the upper bound in the output space to 0.8. The number of fuzzy rules is 100 in Figure 1. The solid line represents the real function and the dotted line represents the estimated function. Figure 1 shows that the proposed algorithm estimates the real function well. In Figure 1, the weights $\gamma_1$ and $\gamma_2$ are 0.8 and 0.2 respectively.
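The NDEI defined above is straightforward to compute. The following Python sketch assumes the population standard deviation (NumPy's default), since the text does not say which estimator was used.

```python
import numpy as np

def ndei(targets, predictions):
    """Non-dimensional error index: RMSE divided by the std of the targets."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    rmse = np.sqrt(np.mean((targets - predictions) ** 2))
    return float(rmse / np.std(targets))
```

A predictor that always outputs the mean of the target series has an NDEI of 1, so values well below 1 indicate that the model explains most of the variation of the series.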
Fig. 1. Estimation results (solid line: real function, dotted line: estimated function)
Table 1 presents experimental results compared with other algorithms. We set the upper bound in the input space to 7, the upper bound in the output space to 0.8, and $\gamma_1$ and $\gamma_2$ to 0.8 and 0.2 respectively. In this case, the TSK fuzzy model has 72 rules. We use the eTS model [2], DENFIS [3] and EFuNN [15] for comparison with the proposed algorithm.

Table 1. Results compared with other algorithms

Methods                  Rules   NDEI
DENFIS                   58      0.276
eTS Model                113     0.0954
EFuNN                    1125    0.094
The proposed algorithm   72      0.0172
Table 1 shows that the proposed algorithm is superior to the other algorithms when it is applied to data samples without noise.
We also test whether the proposed algorithm can estimate well from noisy data. We obtain the noisy data by adding noise drawn from the distribution (24) to the data samples.

$$F = (1 - \varepsilon) G + \varepsilon H \tag{24}$$

where $F$ is the distribution of the noise, and $G$ and $H$ are probability distributions that occur with probabilities $1 - \varepsilon$ and $\varepsilon$ respectively. Figure 2 presents the real function and the noisy data. Here, we use $G \sim N(0, 0.05)$, $H \sim N(0, 0.1)$ and $\varepsilon = 0.05$.

Fig. 2. Data samples with noise (solid line: real function, dots: noisy data)

Fig. 3. Estimation results with noisy data (solid line: real function, dotted line: estimated function)

The above experimental results show that the proposed algorithm performs better than the other algorithms. It has a smaller number of fuzzy rules than other algorithms, and it approximates well from noisy data. It shows robust performance against noise. Because it has a smaller number of fuzzy rules than other algorithms, it can reduce the computational effort.
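Noise following the contaminated model (24) can be drawn as in the Python sketch below. The text does not say whether the second argument of $N(0, 0.05)$ and $N(0, 0.1)$ is a variance or a standard deviation; the sketch treats it as a standard deviation, and the function name and seed handling are likewise assumptions.

```python
import numpy as np

def sample_contaminated_noise(n, eps=0.05, scale_g=0.05, scale_h=0.1, seed=0):
    """Draw n noise values: from G with probability 1 - eps, from H with probability eps.

    scale_g and scale_h are used as standard deviations (assumption).
    Returns the noise and a boolean mask marking the samples drawn from H.
    """
    rng = np.random.default_rng(seed)
    from_h = rng.random(n) < eps
    noise = rng.normal(0.0, scale_g, size=n)
    noise[from_h] = rng.normal(0.0, scale_h, size=int(from_h.sum()))
    return noise, from_h
```

The noisy series is then obtained by adding the returned noise to the clean Mackey-Glass samples.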
5 Conclusions
This paper proposes a self-organizing online TSK fuzzy identification algorithm. In the proposed algorithm, distance is used to update the fuzzy rules, and a recursive version of a learning method that combines global and local learning is used to calculate the parameters of the consequent part. The proposed algorithm is robust because it does not use the data points themselves as cluster centers. Outliers are excluded from the estimation by considering both the input and the output space in clustering. We calculate the widths of the membership functions recursively from the squared distance between the cluster center and the current data. The proposed algorithm shows robust performance against noise, and it is effective in identifying systems that include noise.
References
1. Leng, G., Prasad, G. and McGinnity, T. M.: An on-line algorithm for creating self-organizing fuzzy neural networks. Neural Networks, Vol. 17 (2004) 1477-1493
2. Angelov, P. P. and Filev, D. P.: An Approach to Online Identification of Takagi-Sugeno Fuzzy Models. IEEE Trans. Systems, Man, and Cybernetics-Part B, Vol. 34 (2004) 484-498
3. Kasabov, N. and Song, Q.: DENFIS: Dynamic Evolving Neural-Fuzzy Inference System and Its Application for Time-Series Prediction. IEEE Trans. Fuzzy Systems, Vol. 10 (2004) 144-154
4. Juang, C.-F. and Lin, C.-T.: An On-line Self-Constructing Neural Fuzzy Inference Network and Its Applications. IEEE Trans. Fuzzy Systems, Vol. 6 (1998) 12-32
5. Lin, F.-J., Lin, C.-H. and Shen, P.-H.: Self-Constructing Fuzzy Neural Network Speed Controller for Permanent-Magnet Synchronous Motor Drive. IEEE Trans. Fuzzy Systems, Vol. 9 (2001) 751-759
6. Yen, J., Wang, L. and Gillespie, C. W.: Improving the Interpretability of TSK Fuzzy Models by Combining Global Learning and Local Learning. IEEE Trans. Fuzzy Systems, Vol. 6 (1998) 530-537
7. Lin, C.-T.: A neural fuzzy control system with structure and parameter learning. Fuzzy Sets and Systems, Vol. 70 (1995) 183-212
8. Wu, S., Er, M. J. and Gao, Y.: A Fast Approach for Automatic Generation of Fuzzy Rules by Generalized Dynamic Fuzzy Neural Networks. IEEE Trans. Fuzzy Systems, Vol. 9 (2001) 578-594
9. Er, M. J. and Wu, S.: A fast learning algorithm for parsimonious fuzzy neural systems. Fuzzy Sets and Systems, Vol. 126 (2002) 337-351
10. Wu, S. and Er, M. J.: Dynamic Fuzzy Neural Networks-A Novel Approach to Function Approximation. IEEE Trans. Systems, Man, and Cybernetics-Part B, Vol. 30 (2000) 358-364
11. Tzafestas, S. G. and Zikidis, K. C.: NeuroFAST: On-line Neuro-Fuzzy ART-Based Structure and Parameter Learning TSK Model. IEEE Trans. Systems, Man, and Cybernetics-Part B, Vol. 31 (2001) 797-802
12. Kukolj, D. and Levi, E.: Identification of Complex Systems Based on Neural and Takagi-Sugeno Fuzzy Model. IEEE Trans. Systems, Man, and Cybernetics-Part B, Vol. 34 (2004) 272-282
13. Azeem, M. F., Hanmandlu, M. and Ahmad, N.: Structure Identification of Generalized Adaptive Neuro-Fuzzy Inference Systems. IEEE Trans. Fuzzy Systems, Vol. 11 (2003) 666-681
14. Liu, P. X. and Meng, M. Q.-H.: Online Data-Driven Fuzzy Clustering With Applications to Real-Time Robotic Tracking. IEEE Trans. Fuzzy Systems, Vol. 12 (2004) 516-523
15. Kasabov, N.: Evolving Fuzzy Neural Networks for Supervised/Unsupervised Online Knowledge-Based Learning. IEEE Trans. Systems, Man, and Cybernetics-Part B, Vol. 31 (2001) 902-918
16. Angelov, P. P., Hanby, V. I., Buswell, R. A. and Wright, J. A.: Automatic generation of fuzzy rule-based models from data by genetic algorithms. In Advances in Soft Computing, R. John and R. Birkenhead, Eds. Heidelberg, Germany: Springer-Verlag (2001) 31-40
17. Angelov, P. P.: Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems. Heidelberg, Germany: Springer-Verlag (2002)
18. Takagi, T. and Sugeno, M.: Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Systems, Man, and Cybernetics, Vol. 15 (1985) 116-132
19. Wang, L. X.: A Course in Fuzzy Systems and Control. Prentice Hall (1997)
20. Yamakawa, T. and Matsumoto, G.: Methodologies for the Conception, Design, and Applications of Soft Computing, Eds. Singapore: World Scientific (1998) 271-274
21. Kim, K., Kim, Y.-K., Kim, E. and Park, M.: A New Fuzzy Modeling Approach. Proc. of FUZZ-IEEE 2004 (2004) 773-776
22. Kim, E., Park, M., Ji, S. and Park, M.: A new approach to fuzzy modeling. IEEE Trans. Fuzzy Systems, Vol. 5 (1997) 328-337
23. Kim, K., Kyung, K. M., Park, C.-W., Kim, E. and Park, M.: Robust TSK Fuzzy Modeling Approach Using Noise Clustering Concept for Function Approximation. LNCS, Vol. 3314 (2004) 538-543
24. Isermann, R., Lachmann, K.-H. and Matko, D.: Adaptive Control Systems. Prentice Hall (1992)
25. Ljung, L.: System Identification: Theory for the User. Prentice Hall (1998)
26. Goodwin, G. C. and Sin, K. S.: Adaptive Filtering Prediction and Control. Prentice Hall (1984)
27. Haykin, S.: Adaptive Filter Theory. Prentice Hall (1996)
28. Wai, R.-J., Chen, P.-C.: Intelligent Tracking Control for Robot Manipulator Including Actuator Dynamics via TSK-Type Fuzzy Neural Network. IEEE Trans. Fuzzy Systems, Vol. 12 (2004) 552-559
29. Kiguchi, K., Tanaka, T., Fukuda, T.: Neuro-Fuzzy Control of a Robotic Exoskeleton With EMG Signals. IEEE Trans. Fuzzy Systems, Vol. 12 (2004) 481-490
30. Wang, L., Frayman, Y.: A dynamically generated fuzzy neural network and its application to torsional vibration control of tandem cold rolling mill spindles. Engineering Appl. Artificial Intelligence, Vol. 15 (2002) 541-550
Histogram-Based Generation Method of Membership Function for Extracting Features of Brain Tissues on MRI Images
Weibei Dou1,2, Yuan Ren1, Yanping Chen3, Su Ruan2, Daniel Bloyet2, and Jean-Marc Constans4
1 Department of Electronic Engineering, Tsinghua University, 100084 Beijing, China
[email protected]
2 GREYC-ENSICAEN CNRS UMR 6072, Bd. Maréchal Juin, 14050 Caen, France
[email protected]
3 Imaging Diagnostic Center, Nanfang Hospital, Guangzhou, China
4 Unité d'IRM, EA3916, CHRU, Caen, France
Abstract. We propose a method for generating membership functions that extract features of brain tissues from Magnetic Resonance Imaging (MRI)1 images. The method creates a membership function from histogram analysis. Using a priori knowledge given by the neuro-radiologist, such as the gray-level features of different brain tissues in MR images, we detect the peak and valley features of the histogram of MRI brain images. We then determine a transformation of the histogram by selecting the feature values, which generates a fuzzy membership function corresponding to one type of brain tissue. A function approximation process is used to build a continuous membership function. The proposed method is validated by extracting white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). It is also evaluated using simulated MR images with two different MRI sequences, T1-weighted and T2-weighted. The kappa statistic shows a high agreement with the reference fuzzy model.
1 Introduction
Feature extraction is a very important process in a fusion system, especially for feature fusion. Fuzzy set theory is an empirical science [1]. It represents natural phenomena that we observe in our real lives. A fuzzy set has been defined as a collection of objects with membership degrees [2]. A membership function represents a mapping of the elements of a universe of discourse to the unit interval [0, 1] in order to determine the degree to which each object is compatible with the distinctive features of the collection. The meaning and measurement of the membership degree is one of the main difficulties in determining a membership function. Six classes of experimental methods that help to determine membership functions were introduced in [2]. We can put them into three main categories: experiment-based fuzzy statistics; experiment-based trichotomy; and parametric estimation based on a fuzzy distribution. However, the first two methods depend on certain necessary experimental conditions, and the limitation of the third one is how to determine the fuzzy distribution and the estimation criterion. Normally, the boundaries and the shape of a membership function must strictly correspond to an interpretation of an observation. An objective measurement is therefore the first choice and the most suitable way to determine the membership function that represents a fuzzy set.
We have found a correlation between the histogram of an MRI image and the radiologist's interpretation of the image. On this basis we propose a histogram-based method for generating membership functions that extract features of brain tissues from MRI images. The second section of this paper explains part of the relationship between the interpretation and the gray level of MRI images. The third section presents the combination of histogram measurement and property ranking to determine the membership degree of some brain tissues. Finally, the method is validated on simulated MR images from BrainWeb [3] with two different MRI sequences, T1-weighted (T1) and T2-weighted (T2). The error measurement, carried out with the kappa statistic, shows that this is an efficient method for generating membership functions.
1 Project 60372023 supported by National Natural Science Foundation of China.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 189–194, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Measurement of the Membership Degree
MRI images can provide much information about brain tissues from a variety of excitation sequences. Generally there are three main tissues in a brain image: white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). Due to the limited resolution of the acquisition system, partial volume effects occur, in which one pixel may be composed of multiple tissue types.

2.1 Observation Space and Object
Observation Space and Object. The universe of discourse $B = \{v\}$ is the MRI brain image, where $v = (x, y, z)$ is the coordinate of a voxel. The images of the various MRI sequences are called signal intensity spaces of $B$, noted $SI \in B$. We focus on two subsets of $B$, T1 and T2, noted $T1 = \{(v, GL_{T1})\}$ and $T2 = \{(v, GL_{T2})\}$, where $GL$ denotes the gray level or signal intensity of $v$, so that $SI = \{T1, T2\}$. The pair $(v, GL)$ is our observation object.
Destination Space. Our task is to answer the question "Which tissue does a voxel belong to?" The different tissues are therefore our destination space, noted $Tiss = \{CSF, GM, WM, \ldots\}$, $Tiss \in B$.
Definition of Fuzzy Set. Due to the partial volume effect, a fuzzy result is more suitable for answering the question mentioned above. The question is therefore mapped to fuzzy sets of $B$. From a relation pair of the signal intensity space and the destination space, $(SI, Tiss)$, we can construct the fuzzy sets as a map $\mu_{Tiss}^{SI} : B \to [0, 1]$. It means that a voxel belongs to a destination space $Tiss$ with the membership degree $\mu_{Tiss}^{SI}$ in a signal intensity space $SI$.

2.2 Interpretation of the Histogram Features
The gray-level histogram of an image represents some of its statistical features. Let $GL_k$ denote the $k$-th gray level and $L$ the total number of gray levels (for an 8-bit image, $L = 256$), $k = 0, 1, \ldots, L-1$. The histogram $p(GL_k)$ gives an estimate of the probability of $GL_k$:

$$p(GL_k) = \frac{n_k}{N} \tag{1}$$

where $n_k$ is the total number of pixels whose gray level is equal to $GL_k$, and $N$ is the total number of pixels in the image.
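The histogram estimate of equation (1) can be computed as in the short Python sketch below. The function name and the use of `numpy.bincount` are choices of this sketch; the image is assumed to hold non-negative integer gray levels smaller than `levels`.

```python
import numpy as np

def gray_level_histogram(image, levels=256):
    """Return p(GL_k) = n_k / N for k = 0, ..., levels - 1."""
    counts = np.bincount(np.asarray(image, dtype=np.int64).ravel(), minlength=levels)
    return counts / counts.sum()
```

For a whole MRI sequence, the volume can be passed as a single 3-D array, since the array is flattened before counting.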
Fig. 1. Histogram and membership function. (a) is the histogram of the whole sequence of the original image (b). The transformed histograms ($GL_{T1}$, $\mu_{CSF}^{T1}$-histo), ($GL_{T1}$, $\mu_{GM}^{T1}$-histo) and ($GL_{T1}$, $\mu_{WM}^{T1}$-histo) are shown in the first line, and the continuous membership functions ($GL_{T1}$, $\mu_{CSF}^{T1}$), ($GL_{T1}$, $\mu_{GM}^{T1}$) and ($GL_{T1}$, $\mu_{WM}^{T1}$) are shown in the second line.
One example of the histogram of a whole sequence is shown in figure 1(a). What do these peaks and valleys mean? By combining the gray-level characteristics of the three main tissues on the different MRI sequences [4], we have found some features of the histogram of an MRI image:
1. The gray levels at the histogram peaks correspond to the features of pure tissue types. For example, in figure 1(a), the highest peak corresponds to pure GM, the middle-height peak corresponds to pure WM and the lowest peak corresponds to pure CSF.
2. The gray levels at the histogram valleys correspond to the features of mixed tissues. A valley means a transition from one tissue type to another. The maximum valley in figure 1(a) corresponds to the mix of GM and WM, and the other one corresponds to the mix of CSF and GM.
3. The normalized histogram provides an assessment of possibility. The gray-level histogram $p(GL_k)$ of an image provides the frequency or probability of the gray value $GL_k$ in the image. If the histogram is normalized with the value corresponding to the pure tissue, the normalized histogram provides an assessment of the possibility that $GL_k$ belongs to the pure tissue.
3 Generation of the Membership Function
We propose an approach for transforming the histogram into a membership function. The feature points for the transformation are the peaks of the histogram, noted $(GL_{SI}^{pk}, p(GL_{SI}^{pk}))$, and its valleys, noted $(GL_{SI}^{vl}, p(GL_{SI}^{vl}))$. Let $F$ denote one transformation operation. The membership function is then given by equation (2), and an example of the transformation of figure 1(a) is shown in figure 2-(a).

$$\mu_{Tiss}^{SI} = F\{(GL_{SI}, p(GL_{SI})),\ p(GL_{SI}^{pk}),\ p(GL_{SI}^{vl})\} \tag{2}$$

where $GL_{SI} = \{GL_{T1}, GL_{T2}\}$. As an example, $\mu_{Tiss}^{SI}$ can be obtained with the following property ranking in T1:

$$\mu_{WM}^{T1} \propto GL_{T1} \tag{3}$$

$$\mu_{CSF}^{T1} \propto \frac{1}{GL_{T1}} \tag{4}$$

$$\mu_{GM}^{T1} \propto \begin{cases} GL_{T1}, & \text{if } GL_{T1} \le GL_{T1}^{pk} \\ \dfrac{1}{GL_{T1}}, & \text{if } GL_{T1}^{pk} < GL_{T1} \end{cases} \tag{5}$$

Fig. 2. Anatomical models $\mu_{Tiss}^{Std}$ and extraction results by the membership functions $\mu_{Tiss}^{SI}$. $\mu_{Tiss}^{SI}$-(a) are obtained by transformation of the histogram, $\mu_{Tiss}^{SI}$-(b) are obtained by the approximating functions, and $\mu_{Tiss}^{t\text{-}norm}$ are obtained by applying the t-norm between $\mu_{Tiss}^{T1}$ and $\mu_{Tiss}^{T2}$. (Image panels: rows CSF, GM and WM; columns T1-(a), T1-(b), T2-(a), T2-(b), t-norm-(a), t-norm-(b) and Std.)

To build a simple and continuous membership function of brain tissues on MRI images, function approximation is considered first. According to the shape of $\mu_{Tiss}^{SI}$, a simpler continuous function is selected to approximate it under a minimum-error rule. For example, we chose a lower semi-trapezoid function to approximate $\mu_{CSF}^{T1}$, a triangle function for $\mu_{GM}^{T1}$, and an upper semi-trapezoid function for $\mu_{WM}^{T1}$. These approximation functions are illustrated in figure 1. The parameters of these functions are determined from the boundary points of $\mu_{Tiss}^{SI}$, i.e. the feature points in the histogram, such as the peaks or valleys.
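The three approximating shapes named above can be written as simple piecewise-linear functions. The Python sketch below is only one possible parameterization: the breakpoint arguments `a`, `b` and `c` stand for gray levels that would be read from the histogram peaks and valleys, and both the names and the linear slopes are assumptions of this sketch.

```python
import numpy as np

def lower_semi_trapezoid(gl, a, b):
    """1 for gl <= a, decreasing linearly to 0 at gl >= b (e.g. for CSF in T1)."""
    return np.clip((b - np.asarray(gl, dtype=float)) / (b - a), 0.0, 1.0)

def upper_semi_trapezoid(gl, a, b):
    """0 for gl <= a, increasing linearly to 1 at gl >= b (e.g. for WM in T1)."""
    return np.clip((np.asarray(gl, dtype=float) - a) / (b - a), 0.0, 1.0)

def triangle(gl, a, b, c):
    """0 outside (a, c), rising to 1 at the peak b (e.g. for GM in T1)."""
    gl = np.asarray(gl, dtype=float)
    rising = (gl - a) / (b - a)
    falling = (c - gl) / (c - b)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)
```

Each function accepts a scalar or an array of gray levels, so it can be applied directly to a whole image volume.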
4 Fuzzy Feature Extraction and Evaluation
The simulated MRI volumes available on BrainWeb [3] are used to evaluate our method. To extract the features from the original images, a geometric mean is used as the t-norm operation between the $\mu_{Tiss}^{SI}$; it is the fusion operator of the fuzzy information fusion system proposed in [4]. The results are shown in figure 2. The results labelled (a) are derived directly from the transformed histogram $\mu_{Tiss}^{SI}$, and those labelled (b) are derived from the approximated functions of $\mu_{Tiss}^{SI}$ described in section 3.
The fuzzy anatomical models provided by BrainWeb [3] are used as the standard sets $\mu_{Tiss}^{Std}$ shown in figure 2. The error between $\mu_{Tiss}^{Std}$ and $\mu_{Tiss}^{SI}$ is measured with two methods: the detection error of binary sets and a measure of agreement using the kappa statistic [5]. The fuzzy value $\mu = 0.5$ is a very important feature point, because it is where the fuzziness of a fuzzy set is maximal. The measurement is therefore carried out for two cases: $\mu_{Tiss} > 0$ and $\mu_{Tiss} \ge 0.5$. We define the relative miss detection $\eta_{rm}$ in (6) and the relative false detection $\eta_{rf}$ in (7).

$$\eta_{rm} = \frac{N_{FalseNegative}}{N_{FalseNegative} + N_{TruePositive}} \tag{6}$$

$$\eta_{rf} = \frac{N_{FalsePositive}}{N_{FalseNegative} + N_{TruePositive}} \tag{7}$$

where $N$ denotes the number of voxels. The main results are as follows: (1) The fusion operation, the geometric mean proposed in [4] as t-norm, gives a better result. (2) There is a high agreement between the $\mu_{Tiss}^{SI}$ derived directly from the transformed histogram and $\mu_{Tiss}^{Std}$: the coefficient of the kappa statistic for any tissue is above 0.67 for 10 classes separated from the entire set and above 0.96 for two classes separated by the 0.5-cut set. (3) The relative false detection $\eta_{rf}$ is low. The maximum $\eta_{rf}$ is only 5.49% for the 0.5-cut set, and for WM it is 0.09%. (4) The relative miss detection $\eta_{rm}$ is low. The maximum $\eta_{rm}$ is 5.74% for the 0.5-cut set and 5.52% for the entire set. For GM, $\eta_{rm}$ is only 1.41% for the 0.5-cut set.
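The two detection errors (6) and (7) can be computed from a pair of membership maps as in the Python sketch below. The way the maps are binarized (a strict comparison for the case $\mu > 0$ and a non-strict one for the 0.5-cut) follows the two cases named above; the function name and its arguments are assumptions of this sketch.

```python
import numpy as np

def detection_errors(mu_est, mu_ref, cut=0.5, strict=False):
    """Return (eta_rm, eta_rf) after binarizing both membership maps.

    strict=True uses mu > cut (e.g. cut=0 for the entire set);
    strict=False uses mu >= cut (e.g. the 0.5-cut set).
    """
    mu_est = np.asarray(mu_est, dtype=float)
    mu_ref = np.asarray(mu_ref, dtype=float)
    est = mu_est > cut if strict else mu_est >= cut
    ref = mu_ref > cut if strict else mu_ref >= cut
    n_fn = np.sum(~est & ref)    # voxels in the reference but missed
    n_fp = np.sum(est & ~ref)    # voxels detected but not in the reference
    n_tp = np.sum(est & ref)
    n_ref = n_fn + n_tp          # size of the reference set, denominator of (6) and (7)
    return float(n_fn / n_ref), float(n_fp / n_ref)
```

Both ratios are normalized by the size of the reference set, so they are undefined when the reference set is empty at the chosen cut.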
5 Conclusion
The first measurement problem in fuzzy set theory deals with measuring the degree of membership of several subjects or objects in a single fuzzy set [1]. The proposed histogram-based approach for generating membership functions obtains the membership degree from the statistical histogram of the gray level. The process consists of three steps: calculating the histogram of the whole MRI sequence; transforming the histogram into a membership function by selecting feature values such as the peaks and valleys of the histogram; and applying a function approximation process to build a continuous membership function. It successfully combines objective measurement with the a priori knowledge and experience of the neuro-radiologist. The measurement of the membership degree is objective, because it is derived from the entire histogram of the MRI image volume. The transformation of the histogram is derived from an integration of a priori knowledge, experience, objective observation of the histogram and analysis of property ranking. The generated membership function can be approximated with an appropriate continuous function. There is at least one appropriate t-norm operator, the geometric mean, for the conjunction of the different models. It gives a natural description of possibility. The error measurement, based on the coefficient of the kappa statistic, shows a high agreement between the constructed fuzzy model and the reference fuzzy model. However, this method of membership function generation is only applicable to an object or fuzzy set of larger size.
References
1. Dubois, Didier and Prade, Henri: Fundamentals of fuzzy sets. Kluwer Academic Publishers (2000).
2. Pedrycz, Witold and Gomide, Fernando: An introduction to fuzzy sets analysis and design. The MIT Press (1998).
3. Cocosco, A., Kollokian, V., Kwan, R. K-S., and Evans, A. C.: Brain Web: Online interface to a 3D MRI simulated brain database. Available at http://www.bic.mni.mcgill.ca/brainweb.
4. Dou, W., Ruan, S., Bloyet, D., Constans, JM., and Chen, Y.: Segmentation based on Information Fusion Applied to Brain Tissue on MRI. Proceedings of SPIE-IST Electronic Imaging Vol. 5298 (2004) pp. 492-503.
5. Siegel, Sidney and Castellan, N. J., Jr.: Nonparametric Statistics for the Behavioral Sciences. Second edition. McGraw-Hill (1988).
On Identity-Discrepancy-Contrary Connection Degree in SPA and Its Applications
Yunliang Jiang 1,2, Yueting Zhuang 1, Yong Liu 1, and Keqin Zhao 3
1 College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China
2 School of Information & Engineering, Huzhou University, Huzhou 313000, P.R. China
3 Institute of Zhuji Connection Mathematics, Zhuji 311811, P.R. China
{jylsy, yzhuang}@zju.edu.cn, [email protected], [email protected]
Abstract. As a new kind of uncertainty theory, set pair analysis (SPA) studies certainties and uncertainties as a whole. The main ideas of SPA are: (1) every system comprises certainty knowledge and uncertainty knowledge; (2) in every system, certainties and uncertainties are inter-related, inter-influenced, inter-restricted, and even inter-transformed under certain conditions; (3) a computation formula, the identity-discrepancy-contrary (IDC) connection degree formula, which fully embodies the above ideas, is used to depict uniformly all kinds of uncertainty, such as fuzzy uncertainty, random uncertainty, indeterminate-known uncertainty, unknown and unexpected incident uncertainty, and uncertainty resulting from incomplete information. This paper introduces the concepts and presents the applications of the IDC connection degree and the IDC connection number in SPA.
1 Introduction
Set pair analysis (SPA) is a novel mathematical tool for dealing with vagueness and uncertainty. It can be used effectively to analyze and process uncertainty knowledge such as imprecise, inconsistent, and incomplete knowledge, to find hidden knowledge, and to reveal latent rules [1]. SPA was proposed in 1989 by the Chinese scholar Keqin Zhao, one of the four authors of this paper. Since 1995, a workshop on SPA has been convened annually in China. Since 2001, SPA has been one of the solicited topics of the national academic conference of the Chinese Artificial Intelligence Institute, and it has also been an important topic in many academic journals. All of this has effectively promoted the development and application of SPA. SPA is now a relatively new hot topic in AI in China, especially in uncertainty knowledge processing, and has received much attention from researchers all over China. After more than ten years of research and development, substantial advances have been achieved in the theory and application of SPA. In recent years, the IDC connection degree (number) in SPA has been successfully applied in many fields, including decision making, forecasting, and data fusion.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 195 – 198, 2005. © Springer-Verlag Berlin Heidelberg 2005
Y. Jiang et al.
2 Basic Concepts
2.1 IDC Connection Degree
In general, the IDC connection degree is given by the formula [1]
\[ \mu = \frac{S}{N} + \frac{F}{N}\, i + \frac{P}{N}\, j \qquad (1) \]
Here N is the total number of features of the set pair under discussion, S is the number of identity features, and P is the number of contrary features. F = N − S − P is the number of features of the two sets that are neither identical nor contrary. S/N, F/N, and P/N are called the identity degree, discrepancy degree, and contrary degree of the two sets, respectively, under the given circumstances. The coefficient j of the contrary degree is specified as −1. The coefficient i of the discrepancy degree is an uncertain value between −1 and 1, i.e. i ∈ [−1, 1], depending on the circumstances. When the values of i and j need not be considered, i and j can be regarded merely as markers of the discrepancy degree and the contrary degree, respectively. To simplify formula (1), we set a = S/N, b = F/N, c = P/N, so that formula (1) can be rewritten as
\[ \mu = a + b\,i + c\,j \qquad (2) \]
It is obvious that a, b, and c satisfy the equation a + b + c = 1.
2.2 IDC Connection Number
The IDC connection number is a kind of number derived from the IDC connection degree. Its representation is still μ = a + bi + cj, where a, b, and c are arbitrary positive numbers, j = −1, and i ∈ [−1, 1] [1]. SPA interprets the IDC connection number from the viewpoint of layers. SPA regards the IDC connection number as a kind of number that can depict an uncertain quantity, and considers it essentially different from a constant, a variable, and a super uncertain quantity (see Table 1).

Table 1. Constant, variable, uncertain quantity, and super uncertain quantity

Items          | Constant        | Variable             | Uncertain quantity    | Super uncertain quantity
macro layer    | certain         | uncertain            | certain               | uncertain
micro layer    | certain         | certain              | uncertain             | uncertain
number concept | constant number | variable number      | IDC connection number | chaos number
instance       | circle rate π   | free dropping rate v | waiver rate bi        |
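Formulas (1) and (2) can be illustrated with a short numerical sketch. The function names and the feature counts below are hypothetical and chosen only for illustration; exact fractions are used so that a + b + c = 1 can be checked without rounding error.

```python
from fractions import Fraction

def idc_connection_degree(n_total, n_identity, n_contrary):
    """Return (a, b, c) = (S/N, F/N, P/N) for a set pair, following formulas (1)-(2)."""
    if n_total <= 0:
        raise ValueError("N must be positive")
    if n_identity < 0 or n_contrary < 0 or n_identity + n_contrary > n_total:
        raise ValueError("need S >= 0, P >= 0 and S + P <= N")
    n_discrepancy = n_total - n_identity - n_contrary  # F = N - S - P
    a = Fraction(n_identity, n_total)
    b = Fraction(n_discrepancy, n_total)
    c = Fraction(n_contrary, n_total)
    return a, b, c

def evaluate(a, b, c, i, j=-1):
    """Value of mu = a + b*i + c*j for a chosen i in [-1, 1]; j is fixed at -1."""
    if not -1 <= i <= 1:
        raise ValueError("i must lie in [-1, 1]")
    return a + b * i + c * j

# Hypothetical set pair: N = 10 features, S = 6 identical, P = 1 contrary, so F = 3.
a, b, c = idc_connection_degree(10, 6, 1)
# As i sweeps [-1, 1], mu sweeps the interval [low, high].
low, high = evaluate(a, b, c, -1), evaluate(a, b, c, 1)
print(a, b, c, low, high)
```

Because i is only known to lie in [−1, 1], a single connection degree corresponds to an interval of numeric values rather than one number, which is how the discrepancy term carries the uncertainty.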
3 Research Advances
Research on SPA currently focuses on its mathematical properties, its extension, and its relation to and mutual supplementation with other uncertainty approaches.
3.1 Mathematical Property and Extension of SPA
The paper [2] studies the fundamental rules and properties of the arithmetic operations, the equivalence relation, and the priority relation of the IDC connection degree, and proves that the IDC connection degree is partially ordered ("half order"). Research on the extension of SPA deals mainly with the generalized SPA model. This model does not require the two sets in a set pair to have the same number of elements. Moreover, it holds that the elements of the two sets should be compared as ordered pairs of elements rather than directly as corresponding elements, and that the comparison criterion should be whether or not a certain relation holds [2].
3.2 Relation and Mutual Supplement with Other Uncertainty Approaches
The paper [3] discusses some differences between SPA and fuzzy set theory in research object, way of thinking, application field, etc. The papers [4, 5] present the fuzzy SPA method, which regards certainty knowledge and uncertainty knowledge as a pair of fuzzy sets and describes an object in terms of both certainty and uncertainty. The method extracts the relatively certain knowledge from the object, acknowledges the corresponding relatively uncertain knowledge under a given description, and treats this knowledge as useful. The paper [6] presents the definitions of the rough logic operator and the valuation of the upper and lower approximation sets, applies the IDC connection degree to rough sets, and puts forward the concept of the rough IDC connection degree.
The paper [7] proposes the concept of the IDC connection degree of the rough set of the set type, presents the calculation steps for condition attribute reduction and redundant attribute value reduction of a decision table using the rough set IDC connection degree, and gives an example illustrating that the new approach is simpler than the traditional reduction approach of rough set theory.
4 Applications
Fei Peng et al. introduced SPA into FHW and established a schema evaluation decision approach based on SPA and FHW DSS. Jie Gao et al. proposed an SPA classified forecasting method that combines "the principle of selecting the closer" of IDC pattern recognition in SPA with the basic idea of clustering analysis. Radar-ESM correlation is one of the key subjects in data fusion. Based on SPA and Multiple Hypothesis Testing (MHT), Guohong Wang et al. developed a new radar-ESM correlation algorithm suitable for situations where radar and ESM tracks are specified by different numbers of measurements.
More than 400 papers on the application of SPA have been published to date. The application fields of SPA also include uncertainty reasoning, network planning, comprehensive evaluation, traffic, power systems, and others [8].
5 Conclusions
Although SPA has a development history of only a little more than ten years, its research and application have already borne much fruit. It has been shown that SPA is a relatively promising soft computing approach that provides a powerful means of analysis for processing uncertainty knowledge.
Acknowledgements This research was supported in part by the Zhejiang Provincial Natural Science Foundation of China under grant M603169, and in part by the Natural Science Foundation of China under grant 60402010, and in part by the Huzhou Natural Science Foundation of Zhejiang. We are grateful to the anonymous referees for their insightful comments and suggestions, which clarified the presentation.
References
1. Keqin Zhao. Set Pair Analysis and its Preliminary Applications. Zhejiang Science and Technology Press, Hangzhou, 2000 (In Chinese).
2. Peng Zhang, and Guangyuan Wang. New Theory of Set Pair. Journal of Harbin University of C. E. & Architecture, 33(3): 1-5, 2000 (In Chinese).
3. Bin Zhang. The Method and Thought of Set Pair Treated with Uncertainties Information. Fuzzy Systems and Mathematics, 15(2): 89-93, 2001 (In Chinese).
4. Bin Zhang, and Xiumei Zhao. The Fuzzy Set Pair Analysis Way for Processing Uncertainty Problem. Journal of University of Electronic Science and Technology of China, 26(6): 630-634, 1997 (In Chinese).
5. Bin Zhang. The Fuzzy Set Pair Analysis Way of Multi-Objective System Decision. Systems Engineering-Theory & Practice, 17(12): 108-114, 1997 (In Chinese).
6. Ping Zhang, and Decai Huang. Rough Set Based on Connection Degree. Journal of Hangzhou Institute of Electronic Engineering, 21(1): 50-54, 2001 (In Chinese).
7. Ping Zhang, and Decai Huang. Reducing Method for Knowledge Representation System Based on Rough Set Connection Degree. Journal of Zhejiang University of Technology, 30(1): 5-8, 2002 (In Chinese).
8. Yunliang Jiang, Congfu Xu, et al. Advances in Set Pair Analysis Theory and its Applications. Computer Science (In Chinese) (Accepted).
A Mathematic Model for Automatic Summarization
Zhiqi Wang, Yongcheng Wang, and Kai Gao
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, P. R. China
{shrimpwang, ycwang, gaokai}@sjtu.edu.cn
Abstract. Automatic summarization is a need of the era. Mathematics is an important tool of abstract thinking. This paper establishes and discusses a mathematic model of automatic summarization. The model uses meta-knowledge to describe the composition of a summary and to help calculate the semantic distance between the summary and the source document. It is proposed that obtaining the meta-knowledge aggregate and the weights of its elements are the key problems of the model.
1 Introduction
With vast amounts of text pouring in every day, text summaries are becoming essential. It is said that a professional abstractor can edit at most 55 summaries in one day (at least 15, and 27 on average) [1]. This makes it necessary to use computers to produce summaries automatically. Mathematics is an important tool of abstract thinking, and research and development in almost every field of science must turn to mathematics as a tool. However, there is no previous work on the mathematic description of automatic summarization, although such a description would be useful both for applying automatic summarization and for automatically assessing summaries.
2 Mathematic Model for Automatic Summarization
2.1 Mathematic Description of Automatic Summarization
Suppose D represents the source document and A represents a summary of document D. L is the length of A, and the error scope of the length L is ±δ. Suppose Length(S) is the length measuring function of a string S, and TopicSet(S) is an aggregate that contains all the information conveyed by S. A summary is then a string that must satisfy all of the following conditions:
Condition 1: |Length(A_i) − L| ≤ δ
Condition 2: TopicSet(A_i) ⊂ TopicSet(D)
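As a rough illustration of Conditions 1 and 2 only, the check below treats Length as the character count and uses a toy stand-in for TopicSet (the set of lower-cased words). Both choices, the function names, and the sample strings are assumptions made for this sketch; the symbol ⊂ is read here as a proper subset.

```python
def topic_set(text):
    """Toy stand-in for TopicSet(S): the set of lower-cased words in S (an assumption)."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}

def is_admissible_summary(summary, document, target_length, delta):
    """Check Condition 1 and Condition 2 for a candidate summary string."""
    # Condition 1: |Length(A) - L| <= delta, with Length taken as the character count.
    length_ok = abs(len(summary) - target_length) <= delta
    # Condition 2: TopicSet(A) is a proper subset of TopicSet(D).
    topics_ok = topic_set(summary) < topic_set(document)
    return length_ok and topics_ok

# Hypothetical document and candidate summary.
document = "Fuzzy logic handles vague concepts. Description logics handle crisp concepts."
summary = "Fuzzy logic handles vague concepts."
print(is_admissible_summary(summary, document, target_length=40, delta=10))
```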
(L 0 C(xi ) = l, otherwise
(6)
(7)
J. Zhang, S.Z. Li, and J. Wang
where l denotes an unlabelled class or an outlier. With Eqs. (6) and (7), outliers and samples of unknown classes can be removed effectively. It is worth noting that the proposed algorithm cannot distinguish between the two exceptions (outlier and sample of an unknown class) because no prior knowledge is available.
2.2 Comparison with NFL Algorithms
The proposed algorithm inherits the basic idea of the NFL algorithm [4], namely that extended prototypes are generated by interpolation. However, there are some differences between the two algorithms, which are illustrated in Figure 1.
Fig. 1. Two examples: (a) NFL algorithm; (b) GPC algorithm
In Figure 1, point B is equidistant from the three edges of the triangle, the projection distance from point A to its nearest line equals that of point B, and the three white circular points are the assumed prototypes. The dashed curves are hypothesized to form the boundary of the data distribution. Figure 1(a) shows that, in the NFL algorithm, the extended prototypes are generated by orthogonal projection of sample A onto the three lines, each of which passes through two training samples. In Figure 1(b), by contrast, the extended prototypes of the proposed GPC algorithm are the means of any two prototypes of the same class. Point A has the same projection distance as point B under the NFL algorithm, so when NFL is used to classify points A and B, their nearest projection distances are equal. From Figure 1(b), however, it is clear that point A is closer to the boundary of the distribution, so point B should be more important than point A. Because of the local behavior of the NFL algorithm, this property is not reflected, especially when the data are sparse and highly twisted.
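The two ways of extending a pair of prototypes can be compared in a small sketch. The feature-line distance uses the standard orthogonal projection onto the line through two prototypes, and the midpoint is the pairwise mean described for GPC; the function names and the sample points p and q are hypothetical and are not the points A and B of Figure 1.

```python
import math

def feature_line_distance(x, x1, x2):
    """Distance from x to the line through prototypes x1 and x2 (orthogonal projection)."""
    direction = [b - a for a, b in zip(x1, x2)]
    offset = [v - a for a, v in zip(x1, x)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("x1 and x2 must be distinct")
    t = sum(o * d for o, d in zip(offset, direction)) / denom
    projection = [a + t * d for a, d in zip(x1, direction)]
    return math.dist(x, projection)

def midpoint(x1, x2):
    """Extended prototype formed as the mean of two prototypes."""
    return [(a + b) / 2.0 for a, b in zip(x1, x2)]

x1, x2 = [0.0, 0.0], [2.0, 0.0]
p, q = [1.0, 1.0], [5.0, 1.0]   # q lies far beyond the segment between x1 and x2

# Both points are equally far from the (infinite) feature line ...
print(feature_line_distance(p, x1, x2), feature_line_distance(q, x1, x2))
# ... but not from the midpoint prototype.
print(math.dist(p, midpoint(x1, x2)), math.dist(q, midpoint(x1, x2)))
```

The projection distance alone cannot separate p from q, whereas the distance to the midpoint does, which is the kind of local behavior the comparison above refers to.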
Geometrical Probability Covering Algorithm
Conversely, in the proposed GPC algorithm, the probability of each sample is the sum of the probabilities that the sample is assigned to each member of the extended prototype set of the same class. The procedure resembles covering the structure or distribution of the data. As in Figure 1(b), the local probability densities from sample A to each extended prototype are the same as those of sample B. However, when a suitable variance σ² is selected, the global probability of sample A is smaller, because a sample on the boundary is covered by fewer Gaussian kernels than a sample in the middle of the distribution. The reason is that the proposed GPC algorithm takes the properties of the probability distribution into account on the basis of geometrical covering. More generally, data on the boundary of the distribution have lower probabilities than samples in the "middle" of the data. In particular, the GPC algorithm may achieve better classification performance than the NFL algorithm when the data are highly twisted and sparse.
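The covering idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the unnormalised Gaussian kernel exp(−‖x − p‖² / (2σ²)), the function names, and the toy prototypes are all assumptions, since the exact kernel equations are not reproduced in this passage.

```python
import math
from itertools import combinations

def extended_prototypes(prototypes):
    """Original prototypes plus the mean of every pair of prototypes of one class."""
    means = [[(a + b) / 2.0 for a, b in zip(p, q)]
             for p, q in combinations(prototypes, 2)]
    return [list(p) for p in prototypes] + means

def class_score(x, prototypes, sigma2):
    """Sum of Gaussian kernel values from x to the extended prototype set of one class."""
    total = 0.0
    for p in extended_prototypes(prototypes):
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * sigma2))  # assumed kernel form
    return total

def gpc_classify(x, training, sigma2):
    """training maps each label to its prototype vectors; returns (label, scores)."""
    scores = {label: class_score(x, protos, sigma2)
              for label, protos in training.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-class toy data: one triangle of prototypes per class.
training = {
    "left": [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]],
    "right": [[6.0, 0.0], [8.0, 0.0], [7.0, 2.0]],
}
label, scores = gpc_classify([1.0, 0.5], training, sigma2=1.0)
print(label, scores)

# A point inside the triangle is covered by more kernels than one outside it.
inside, outside = [1.0, 0.7], [1.0, -0.7]
print(class_score(inside, training["left"], 1.0),
      class_score(outside, training["left"], 1.0))
```

In this toy setting the interior point receives the larger summed score even though both points lie at the same height above and below the bottom edge, which mirrors the boundary-versus-middle argument in the text.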
3 Experiments
Extensive experiments are carried out on several databases to verify the performance of the proposed GPC algorithm. We also explore potential applications of the probability representation of the expression and of outlier removal based on the extended version of the GPC algorithm.
3.1 Recognition
In the recognition experiments, several databases, comprising three face databases and one image-segmentation database, are used to evaluate the classification ability of the proposed GPC algorithm. The face databases are the ORL, UMIST and JAFFE databases [7,8,9]. The image-segmentation database is from the UCI repository [10]. All databases have been normalized to the range [0, 1]. Note that we employ dimensionality reduction approaches to reduce the dimensions of the high-dimensional databases and to avoid the "curse of dimensionality". The details of the experimental procedure are listed in Table 1.

Table 1. Experimental approach and the abbreviations of the corresponding algorithms: 1: PCA+NN; 2: PCA+NFL; 3: PCA+GPC; 4: LLE+NN; 5: LLE+NFL; 6: LLE+GPC; 7: PCA+LDA+NN; 8: PCA+LDA+NFL; 9: PCA+LDA+GPC; 10: ULLELDA+NN; 11: ULLELDA+NFL; 12: ULLELDA+GPC. TR: training samples (*classes); TE: test samples.

Database           | Compared algorithms | Experimental average | TR    | TE    | Classes
ORL                | 1-12                | 100 runs             | 5*40  | 5*40  | 40
UMIST              | 1-12                | 100 runs             | 10*20 | 375   | 20
JAFFE              | 1-12                | 100 runs             | 6*10  | 153   | 10
Image-Segmentation | 1-12                | Once                 | 30*7  | 300*7 | 7
In Table 1, LDA denotes linear discriminant analysis [11], PCA principal component analysis [12], LLE the locally linear embedding algorithm [13], and ULLELDA our proposed dimensionality reduction approach [14]. These dimensionality reduction approaches are used to obtain different subspaces in order to assess the robustness of the proposed GPC algorithm. Because the paper focuses on the proposed GPC algorithm, details of the dimensionality reduction can be found in [14,15,16]. For the classifiers, NN denotes 1-nearest neighbor and NFL nearest feature line. Several combined algorithms are thus applied for recognition in our experiments. For example, PCA+NN represents the combination of principal component analysis and the nearest neighbor algorithm. For the proposed GPC algorithm, the variance σ² needs to be predefined. Considering the geometrical explanation mentioned above, a set of parameter values is generated from 1e−10 to 1e+10 on a logarithmic scale. The final results are selected on the basis of the lowest error rate of each algorithm. The results are reported in Table 2, where the lowest error rates are marked in bold type.
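The parameter sweep for σ² can be sketched as follows. The grid spacing of one value per decade and the helper name are assumptions, and the error function is a toy placeholder, since the passage states only the range and the logarithmic scale.

```python
import math

# Candidate variances 10^-10, 10^-9, ..., 10^10 (one per decade; the step is an assumption).
sigma2_grid = [10.0 ** k for k in range(-10, 11)]

def select_sigma2(error_rate, grid=sigma2_grid):
    """Return the grid value with the lowest error; error_rate is a user-supplied callable."""
    return min(grid, key=error_rate)

# Toy placeholder error curve with its minimum at sigma^2 = 100.
best = select_sigma2(lambda s2: abs(math.log10(s2) - 2.0))
print(len(sigma2_grid), best)
```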
Table 2. The error rates (%) and standard deviations (%) of the classification algorithms

Algorithm    | ORL        | JAFFE      | UMIST      | Image Segmentation
LLE+GPC      | 5.01±1.68  | 2.75±1.44  | 4.323±1.68 | 9.52
LLE+NFL      | 4.35±1.42  | 3.79±2.14  | 4.50±1.76  | 9.19
LLE+NN       | 7.01±1.83  | 5.78±2.49  | 5.96±1.97  | 10.91
PCA+GPC      | 8.39±2.51  | 11.21±2.86 | 6.55±2.06  | 12.62
PCA+NFL      | 8.27±2.41  | 9.65±2.70  | 5.44±1.74  | 10.61
PCA+NN       | 10.12±2.06 | 11.29±2.85 | 8.38±1.88  | 14.24
ULLELDA+GPC  | 3.82±1.91  | 2.08±1.39  | 2.20±1.21  | 7.05
ULLELDA+NFL  | 4.06±1.40  | 2.14±1.45  | 2.40±1.18  | 8.81
ULLELDA+NN   | 4.13±1.33  | 2.18±1.38  | 2.20±1.22  | 8.76
PCA+LDA+GPC  | 7.56±1.92  | 0.80±0.95  | 2.60±1.21  | 7.61
PCA+LDA+NFL  | 7.35±1.89  | 1.09±1.11  | 2.62±1.22  | 8.33
PCA+LDA+NN   | 7.45±1.92  | 0.93±1.05  | 2.59±1.21  | 9.10
The table shows that the proposed GPC algorithm obtains the lowest recognition error rates when compared with the other classification algorithms. Moreover, when the dimensionality of the database is large, combining the GPC algorithm with dimensionality reduction techniques yields better classification ability. We also investigate the influence of the number of training samples on the proposed GPC algorithm, as illustrated in Figure 2. As the number of training samples increases, the ULLELDA+GPC algorithm reports the lowest error rates on the ORL database. When 9 training samples per class are used for the ORL face database, the error rate and standard deviation of the ULLELDA+GPC algorithm are 0.30% and 0.96%, respectively. However, when 11 training samples per class are used for the JAFFE face database, two algorithms (PCA+LDA+GPC and PCA+LDA+NN) achieve 100% recognition accuracy. We assume the reason is that the two algorithms perform comparably when the distribution of the training samples approaches the true one. We also observe that, when the training samples are sparse, the classification results of the proposed GPC algorithm are better than those of the other algorithms.
Fig. 2. The influence of training samples: (a) ORL face recognition; (b) JAFFE face recognition. Each panel plots the error rate against the number of training samples for ULLELDA+NFL, ULLELDA+GPC, ULLELDA+NN, PCA+LDA+NFL, PCA+LDA+GPC and PCA+LDA+NN.
3.2 Outlier Removal
Generally speaking, when large-scale data are processed, some samples belong to categories other than the labelled classes. An effective face recognition system should not only recognize faces but also exclude unlabelled non-face images or data. With the extension of the GPC algorithm, outliers and unlabelled data can be removed. In this experiment, we randomly select some images, such as faces and landscapes, that are not in the JAFFE database. The images are shown in Figure 3(a) and all have the same size (146*111 pixels). For the proposed GPC algorithm, the variance σ² is set to 1 and the dimension is reduced to 150 using the LLE algorithm; the relationship between the rejected rate and the accepted rate at different thresholds is then illustrated in Figure 3(b).
Fig. 3. Outlier removal: (a) unlabelled faces and landscapes; (b) the rejected and accepted curves of the unlabelled samples. Panel (b) plots the rejection rate against the threshold value (×10⁻³) for two curves: the error identification rate for the fabricated face-and-nonface database and the error rejection rate for the JAFFE database.
In Figure 3(b), the line represents the rejected ratio of the unlabelled samples for thresholds in the range [0, 0.005]; the unlabelled samples are completely excluded when the threshold is about 0.0015. The dashed line shows the rejected ratio of the labelled samples. As a trade-off, the threshold is set to 0.0012. Consequently, only one of the 22 unlabelled images is misclassified, and all the labelled images are recognized without being removed as outliers.
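The thresholding step can be illustrated with a small sketch. The rule below accepts the best-scoring class only when its global score exceeds the threshold and otherwise returns the extra label; the strict inequality, the function names, and all score values are assumptions made for illustration and are not the experimental data.

```python
REJECT = "unlabelled"  # stands for the extra label l used for outliers and unknown classes

def classify_with_rejection(scores, threshold):
    """Return the best-scoring label, or REJECT if its score does not exceed threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else REJECT

def rejection_rate(score_dicts, threshold):
    """Fraction of samples rejected at the given threshold."""
    rejected = sum(classify_with_rejection(s, threshold) == REJECT for s in score_dicts)
    return rejected / len(score_dicts)

# Hypothetical global scores for two labelled and two unlabelled samples.
labelled = [{"c1": 0.0040, "c2": 0.0002}, {"c1": 0.0001, "c2": 0.0025}]
unlabelled = [{"c1": 0.0003, "c2": 0.0008}, {"c1": 0.0014, "c2": 0.0001}]

for threshold in (0.0012, 0.0015):
    print(threshold,
          rejection_rate(labelled, threshold),
          rejection_rate(unlabelled, threshold))
```

Raising the threshold rejects more unlabelled samples but eventually starts to reject labelled ones as well, which is the trade-off behind the chosen operating point.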
4 Conclusions
In this paper, we propose the GPC algorithm to improve classification ability. Combining geometrical covering with probability distribution, the proposed algorithm first forms extended prototypes by computing the means of any two prototypes of the same class. A Gaussian kernel is then adopted to control the scope of the geometrical covering and to generate the local probability measurement. The global measurement for classification is obtained by summing the probabilities from a sample to the set of prototypes and extended prototypes. With the GPC algorithm, data on the boundary have lower probability than data in the "middle" of the distribution. Simple and geometrically intuitive, the GPC algorithm is computationally effective and does not involve the estimation of many parameters. When the training samples are sparse, the proposed algorithm obtains better performance than the other classification algorithms. Experiments on several databases show the advantages of the proposed algorithm. However, several problems remain. First, the computational complexity of the proposed algorithm is higher than that of the NN algorithm, which makes practical application to large-scale data difficult; how to decrease the complexity deserves further research. Second, the choice of variance depends on human experience; adaptive selection of this parameter is worth studying. Finally, all local probabilities are represented with the same Gaussian kernel in this paper, but there may exist different kernels that lead to better classification performance. We will therefore investigate alternative local kernels in future work.
References
1. Duda, R. O., Hart, P. E., Stork, D. G.: Pattern Classification. Second Edition. John Wiley & Sons, Inc. (2001)
2. Cover, T. M., Hart, P. E.: Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory. 13 (1967) 21-27
3. Dasarathy, B. V. (ed.): Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society, Washington (1991)
4. Li, S. Z., Chan, K. L., Wang, C. L.: Performance Evaluation of the Nearest Feature Line Method in Image Classification and Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. 22(11) (2000) 1335-1339
5. Fraley, C., Raftery, A. E.: Model-based Clustering, Discriminant Analysis, and Density Estimation. Technical Report No. 380, University of Washington (2000)
6. Dempster, A., Laird, N., Rubin, D.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Statist. Soc. B. 39 (1977) 1-38
7. Samaria, F. S.: Face Recognition Using Hidden Markov Models. PhD thesis, University of Cambridge (1994)
8. Graham, D. B., Allinson, N. M.: Characterizing Virtual Eigensignatures for General Purpose Face Recognition. In: Wechsler, H., Phillips, P. J., Bruce, V., Fogelman-Soulie, F., Huang, T. S. (eds.): Face Recognition: From Theory to Applications. NATO ASI Series F, Computer and Systems Sciences. 163 (1998) 446-456
9. Lyons, M. J., Budynek, J., Akamatsu, S.: Automatic Classification of Single Facial Images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 21(12) (1999) 1357-1362
10. Murphy, P. M., Aha, D. W.: UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science (1994)
11. Swets, D. L., Weng, J.: Using Discriminant Eigenfeatures for Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. 18(8) (1996) 831-836
12. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neuroscience. 3(1) (1991) 71-86
13. Roweis, S. T., Saul, L. K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science. 290 (2000) 2323-2326
14. Zhang, J., Shen, H., Zhou, Z-H.: Unified Locally Linear Embedding and Linear Discriminant Analysis Algorithm (ULLELDA) for Face Recognition. In: Li, S. Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds.): Advances in Biometric Personal Authentication. LNCS 3338, Springer-Verlag (2004) 209-307
15. Zhang, J., Li, S. Z., Wang, J.: Nearest Manifold Approach for Face Recognition. The 6th IEEE International Conference on Automatic Face and Gesture Recognition. Seoul, Korea, May (2004)
16. Zhang, J., Li, S. Z., Wang, J.: Manifold Learning and Applications in Recognition. In: Tan, Y. P., Yap, K. H., Wang, L. (eds.): Intelligent Multimedia Processing with Soft Computing. Springer-Verlag, Heidelberg (2004)
Extended Fuzzy ALCN and Its Tableau Algorithm*
Jianjiang Lu 1,2, Baowen Xu 1,2,3, Yanhui Li 1, Dazhou Kang 1, and Peng Wang 1
1 Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2 Jiangsu Institute of Software Quality, Nanjing 210096, China
3 State Key Laboratory of Software, Wuhan University, Wuhan 430072, China
[email protected]
Abstract. Typical description logics are limited to dealing with crisp concepts. To manage fuzzy information, it is necessary to add fuzzy features to description logics. In this paper, we propose extended fuzzy ALCN to enable representation of and reasoning about complex fuzzy information. We define the syntax, semantic interpretation and reasoning problems of extended fuzzy ALCN, and discuss reasoning properties that do not exist in typical description logics. We also design tableau algorithms for the reasoning problems of extended fuzzy ALCN, developed in the style of the so-called constraint propagation method. Extended fuzzy ALCN is more expressive than existing fuzzy description logics and can represent a wider range of fuzzy information.
1 Introduction
The Semantic Web [1] is an extension of the current Web in which web information is given well-defined semantics so that it is machine-understandable, better enabling intelligent Web data processing. Description logics (DLs) [2] provide a logical reconstruction of object-centric and frame-based knowledge representation languages and serve as the main inferential means on the Semantic Web. Web applications often need to manage fuzzy information, but typical DLs are limited to dealing with crisp concepts and crisp roles. Therefore, it is necessary to add fuzzy features to description logics. Meghini [3] proposed a preliminary fuzzy DL, which lacks a reasoning algorithm, as a modeling tool for multimedia document retrieval. Straccia [4] presented Fuzzy ALC (FALC), a fuzzy extension of ALC that combines fuzzy logic with typical ALC [5], and gave a constraint propagation calculus for reasoning in it. FALC offers only limited expressive power, which is insufficient for complex fuzzy information. Some discussion of reducing FALC to classical ALC is given in [6], but the reduction does not extend the expressive bounds of FALC. We present a more expressive fuzzy DL called Extended Fuzzy ALCN (EFALCN), which introduces the cut sets of fuzzy concepts and fuzzy roles as atomic concepts and atomic roles, inherits the concept constructors and representation of ALCN [2], and defines a semantic interpretation for concepts and roles. EFALCN adopts a special fuzzification method different from that of FALC, and covers the expressive scope of FALC with more expressive power.
* This work was supported in part by the NSFC (60373066, 60425206, 90412003), National Grand Fundamental Research 973 Program of China (2002CB312000), National Research Foundation for the Doctoral Program of Higher Education of China (20020286004).
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 232 – 242, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 A Quick Look at FALC
Many vague concepts in the real world do not have a precise definition of membership. Such vague concepts are called fuzzy concepts. DLs are limited to dealing with crisp concepts and lack the expressive power for fuzzy concepts. Straccia presented FALC, an extension of ALC with fuzzy features, to support fuzzy concept representation. Syntactically, starting with a set of atomic concepts and a set of atomic roles, let B be an atomic concept and R an atomic role; FALC concept descriptions C and D are defined inductively using the ALC concept constructors according to the following syntax rule: C, D ::= ⊤ | ⊥ | B | ¬C | C ⊓ D | C ⊔ D | ∀R.C | ∃R.C. The top concept ⊤ denotes the whole domain and the bottom concept ⊥ denotes the empty set. FALC roles can only be atomic roles. A FALC knowledge base is a pair Σ(A, T), where A is an ABox, i.e. a set of fuzzy assertions, and T is a TBox, i.e. a set of fuzzy terminological axioms. A fuzzy assertion has the form ⟨a : C ≥ n⟩ or ⟨(a, b) : R ≥ n⟩. ⟨a : C ≥ n⟩ means that the membership degree of the individual a in the concept C is at least n, and ⟨(a, b) : R ≥ n⟩ has a similar meaning. A fuzzy terminological axiom has the form ⟨B ⊑ D⟩ or ⟨B :≈ D⟩. ⟨B ⊑ D⟩ means that B is a sub-concept of D, and ⟨B :≈ D⟩ means that B is equivalent to D. Semantically, a fuzzy interpretation for FALC is a pair $I = \langle \Delta^I, \cdot^I \rangle$, where $\Delta^I$ is a nonempty domain and $\cdot^I$ is an interpretation function satisfying:
(a) I = a I ∈ Δ I ; B I : Δ I → [0,1] ; R I : Δ I × Δ I → [0,1] ;
(1)
F I ( d ) = 1 ; ⊥ I ( d ) = 0 ; ( ¬C ) I ( d ) = 1 − C I ( d ) ; (C D) I (d ) = min{C I (d ), D I (d )} ; (C D) I (d ) = max{C I (d ), D I (d )} ;
(∃R.C ) I (d ) = sup d '∈Δ I {min{R I (d , d '), C I (d ')}} ; (∀R.C ) I (d ) = inf d '∈ΔI {max{1 − R I (d , d '), C I (d ')}} . I satisfies a fuzzy assertion (resp. ) iff C I (a I ) ≥ n (resp. R (a I , b I ) ≥ n ). I satisfies an ABox A (written I §A) iff I satisfies all fuzzy assertions in A. I satisfies a fuzzy terminological axiom < B E⋅ D > (resp. < B :≈ D >) iff I
∀d ∈ Δ I , B I (d ) ≤ D I ( d ) (resp. B I (d ) = D I ( d ) ). I satisfies a TBox T (written I §T)
iff I satisfies all fuzzy terminological axioms in T. I satisfies a knowledge base ¦ (A,
T), iff I satisfies both A and T.
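The FALC semantics can be tried out on a small finite interpretation, where sup and inf reduce to max and min. The following is only a minimal sketch under assumptions that are not in the paper: concepts are encoded as nested tuples, the function name `degree` and the sample individuals and degrees are hypothetical, and a missing entry is read as membership degree 0.

```python
# Hypothetical encoding (not from the paper): concepts are nested tuples such as
# ("top",), ("bot",), ("atom", "Tall"), ("not", C), ("and", C, D), ("or", C, D),
# ("some", "friend", C) and ("all", "friend", C).

def degree(concept, d, domain, concepts, roles):
    """Membership degree C^I(d) over a finite domain (sup/inf become max/min)."""
    op = concept[0]
    if op == "top":
        return 1.0
    if op == "bot":
        return 0.0
    if op == "atom":
        return concepts[concept[1]].get(d, 0.0)
    if op == "not":
        return 1.0 - degree(concept[1], d, domain, concepts, roles)
    if op in ("and", "or"):
        left = degree(concept[1], d, domain, concepts, roles)
        right = degree(concept[2], d, domain, concepts, roles)
        return min(left, right) if op == "and" else max(left, right)
    if op in ("some", "all"):
        role, filler = roles[concept[1]], concept[2]
        if op == "some":
            return max(min(role.get((d, e), 0.0),
                           degree(filler, e, domain, concepts, roles))
                       for e in domain)
        return min(max(1.0 - role.get((d, e), 0.0),
                       degree(filler, e, domain, concepts, roles))
                   for e in domain)
    raise ValueError(f"unknown constructor: {op}")


# Illustrative interpretation (made-up degrees).
domain = ["stan", "kyle"]
concepts = {"Tall": {"kyle": 0.8}, "Strong": {"kyle": 0.9}}
roles = {"friend": {("stan", "kyle"): 0.7}}

tall_and_strong = ("and", ("atom", "Tall"), ("atom", "Strong"))
some_friend = degree(("some", "friend", tall_and_strong), "stan", domain, concepts, roles)
all_friends = degree(("all", "friend", tall_and_strong), "stan", domain, concepts, roles)

# I satisfies <C(a) >= n> iff C^I(a^I) >= n.
satisfied = some_friend >= 0.7
```

In this toy interpretation the assertion that stan has a tall and strong friend holds with threshold 0.7 because the existential degree is bounded by the role degree 0.7.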
3 EFALCN
3.1 Syntax Structure of EFALCN
EFALCN introduces the cut sets of fuzzy concepts and fuzzy roles. Let $\Delta N_C$ and $\Delta N_R$ denote the set of atomic fuzzy concepts and the set of atomic fuzzy roles. The set of atomic concepts in EFALCN is defined as $\Delta N_{CE} = \{B_{[n]} \mid B \in \Delta N_C \wedge n \in (0,1]\}$, where $B_{[n]}$ is called an atomic cut concept. The set of atomic roles in EFALCN is defined as $\Delta N_{RE} = \{R_{[n]} \mid R \in \Delta N_R \wedge n \in (0,1]\}$, where $R_{[n]}$ is called an atomic cut role. For any $B_{[n]}$ (resp. $R_{[n]}$), we define $B$ (resp. $R$) as the prefix of $[n]$ and, conversely, $[n]$ as the suffix of $B$ (resp. $R$). EFALCN starts with $B_{[n]}$ and $R_{[n]}$, and inherits the concept constructors of ALCN to define cut concepts in EFALCN:
1. The top and bottom concepts (denoted by $\top_{[1]}$, $\bot_{[1]}$) are cut concepts;
2. For any $B_{[n]} \in \Delta N_{CE}$, $B_{[n]}$ is a cut concept;
3. If $C_{[n_1,\ldots,n_k]}$ and $D_{[m_1,\ldots,m_l]}$ are cut concepts, and $R_{[n]}$ is an element of $\Delta N_{RE}$, then $\neg C_{[n_1,\ldots,n_k]}$, $C_{[n_1,\ldots,n_k]} \sqcap D_{[m_1,\ldots,m_l]}$, $C_{[n_1,\ldots,n_k]} \sqcup D_{[m_1,\ldots,m_l]}$, $\exists R_{[n]}.D_{[m_1,\ldots,m_l]}$, $\forall R_{[n]}.D_{[m_1,\ldots,m_l]}$, $\geq N R_{[n]}$ and $\leq N R_{[n]}$ are cut concepts.
Here $C_{[n_1,\ldots,n_k]}$ and $D_{[m_1,\ldots,m_l]}$ are abbreviations for cut concepts. For example, $\exists friend_{[0.7]}.(Tall_{[0.7]} \sqcap Strong_{[0.9]})$ is a cut concept, and it can be denoted by $\exists friend.(Tall \sqcap Strong)_{[0.7,0.7,0.9]}$ by collecting all the suffixes in a suffix vector, where $\exists friend.(Tall \sqcap Strong)$ is called the prototype of the cut concept and $[0.7, 0.7, 0.9]$ is called the suffix vector of the cut concept.
The cut assertions, which are of the form $a : C_{[n_1,\ldots,n_k]}$ or $(a, b) : R_{[n]}$, build up EFALCN ABoxes, where $a : C_{[n_1,\ldots,n_k]}$ states that the individual $a$ belongs to the cut concept $C_{[n_1,\ldots,n_k]}$. For example, $Stan : \exists friend.(Tall \sqcap Strong)_{[0.7,0.7,0.9]}$ means that Stan likely has a friend who is likely tall and more likely strong. Similarly, $(a, b) : R_{[n]}$ states that the pair of individuals $(a, b)$ belongs to the cut role $R_{[n]}$.
EFALCN TBoxes are composed of cut terminological axioms about the inclusion and equivalence of cut concepts, which are of the form $B_{[n]} \sqsubseteq C_{[f_1(n),\ldots,f_k(n)]}$, $n \in X$, and $B_{[n]} \equiv C_{[f_1(n),\ldots,f_k(n)]}$, $n \in X$, where $B_{[n]}$ is an atomic cut concept, $C_{[f_1(n),\ldots,f_k(n)]}$ is a cut concept, $X$ is continuous and $f_i(n)$ is a linear function from domain $X$ to $(0, 1]$, $i = 1, \ldots, k$. Suffix vectors that include variables and functions are called alterable suffix vectors, and the corresponding cut concepts are called alterable cut concepts in the given domain $X$. A cut role $R_{[f_i(n)]}$ may appear in $C_{[f_1(n),\ldots,f_k(n)]}$; such an $R_{[f_i(n)]}$ is also called an alterable cut role in the given domain $X$. An EFALCN ABox $A$ and TBox $T$ make up an EFALCN knowledge base $\Sigma_E(A, T)$.
3.2 Semantic Interpretation of EFALCN
The interpretation of EFALCN is defined as a pair $I = \langle \Delta^I, \cdot^I \rangle$, where $\Delta^I$ is a nonempty set as the domain and $\cdot^I$ is an interpretation function. $\cdot^I$ maps every individual $a$ into an element of the domain, $(a)^I = a^I$, and maps every atomic fuzzy concept and atomic fuzzy role into a membership degree function: $B^I : \Delta^I \to [0,1]$, $R^I : \Delta^I \times \Delta^I \to [0,1]$. Additionally, $\cdot^I$ maps cut concepts and cut roles into subsets of $\Delta^I$ and $\Delta^I \times \Delta^I$ (correspondingly called interpretations of cut concepts and cut roles). The interpretation of the atomic cut concept $B_{[n]}$ is $(B_{[n]})^I = \{d \mid d \in \Delta^I \wedge B^I(d) \geq n\}$. In fuzzy mathematics, $(B_{[n]})^I$ can be treated as the n-cut of the fuzzy set $B^I$ with respect to the universe $\Delta^I$, which is the reason that we call $B_{[n]}$ a cut concept. Similarly, the interpretation of the atomic cut role $R_{[n]}$ is $(R_{[n]})^I = \{(d, d') \mid d, d' \in \Delta^I \wedge R^I(d, d') \geq n\}$. The interpretations of cut concepts are inductively defined as:
$$(\neg C_{[n_1,\ldots,n_k]})^I = \Delta^I \setminus (C_{[n_1,\ldots,n_k]})^I ; \tag{2}$$
$$((C \sqcap D)_{[n_1,\ldots,n_k]})^I = (C_{[n_1,\ldots,n_m]})^I \cap (D_{[n_{m+1},\ldots,n_k]})^I ; \tag{3}$$
$$((C \sqcup D)_{[n_1,\ldots,n_k]})^I = (C_{[n_1,\ldots,n_m]})^I \cup (D_{[n_{m+1},\ldots,n_k]})^I ; \tag{4}$$
$$((\exists R.C)_{[n_1,\ldots,n_k]})^I = \{d \mid d \in \Delta^I \wedge \exists d' \in \Delta^I, R^I(d, d') \geq n_1 \wedge d' \in (C_{[n_2,\ldots,n_k]})^I\} ; \tag{5}$$
$$((\forall R.C)_{[n_1,\ldots,n_k]})^I = \{d \mid d \in \Delta^I \wedge \forall d' \in \Delta^I, R^I(d, d') \geq n_1 \to d' \in (C_{[n_2,\ldots,n_k]})^I\} ; \tag{6}$$
$$(\geq N R_{[n]})^I = \{d \mid d \in \Delta^I, |\{d' \mid R^I(d, d') \geq n\}| \geq N\} ; \tag{7}$$
$$(\leq N R_{[n]})^I = \{d \mid d \in \Delta^I, |\{d' \mid R^I(d, d') \geq n\}| \leq N\} . \tag{8}$$
In equations (3-4), $[n_1,\ldots,n_m]$ and $[n_{m+1},\ldots,n_k]$ are the suffix vectors of the cut concepts $C_{[n_1,\ldots,n_m]}$ and $D_{[n_{m+1},\ldots,n_k]}$ respectively. In equations (5-6), $[n_2,\ldots,n_k]$ is the suffix vector of $C_{[n_2,\ldots,n_k]}$. In equations (7-8), $|Z|$ denotes the number of elements in the set $Z$.
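Equations (2)-(8) turn each cut concept into an ordinary (crisp) subset of the domain, which is easy to check on a finite interpretation. The sketch below is illustrative only: the tuple encoding (suffixes stored directly on atoms and roles), the names `successors` and `extension`, and the sample degrees are assumptions made for this example, not part of the paper.

```python
# Hypothetical encoding: ("atom", B, n), ("not", C), ("and", C, D), ("or", C, D),
# ("some", R, n, C), ("all", R, n, C), ("atleast", N, R, n), ("atmost", N, R, n).

def successors(role, n, d, domain, roles):
    """Elements d' with R^I(d, d') >= n, i.e. the R_[n]-successors of d."""
    return {e for e in domain if roles[role].get((d, e), 0.0) >= n}


def extension(concept, domain, concepts, roles):
    """Crisp interpretation of a cut concept, following equations (2)-(8)."""
    op = concept[0]
    if op == "atom":
        _, name, n = concept
        return {d for d in domain if concepts[name].get(d, 0.0) >= n}
    if op == "not":
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if op in ("and", "or"):
        left = extension(concept[1], domain, concepts, roles)
        right = extension(concept[2], domain, concepts, roles)
        return left & right if op == "and" else left | right
    if op in ("some", "all"):
        _, role, n, filler = concept
        inner = extension(filler, domain, concepts, roles)
        if op == "some":
            return {d for d in domain if successors(role, n, d, domain, roles) & inner}
        return {d for d in domain if successors(role, n, d, domain, roles) <= inner}
    if op in ("atleast", "atmost"):
        _, bound, role, n = concept
        if op == "atleast":
            return {d for d in domain if len(successors(role, n, d, domain, roles)) >= bound}
        return {d for d in domain if len(successors(role, n, d, domain, roles)) <= bound}
    raise ValueError(f"unknown constructor: {op}")


# Illustrative interpretation (made-up degrees).
domain = ["stan", "kyle"]
concepts = {"Tall": {"kyle": 0.8}, "Strong": {"kyle": 0.9}}
roles = {"friend": {("stan", "kyle"): 0.7}}

# Exists friend.(Tall and Strong) with suffix vector [0.7, 0.7, 0.9]
example = ("some", "friend", 0.7,
           ("and", ("atom", "Tall", 0.7), ("atom", "Strong", 0.9)))
members = extension(example, domain, concepts, roles)
```

Raising the role suffix above 0.7 empties the extension in this toy data, which mirrors how each suffix acts as an independent threshold.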
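EFALCN can also convert cut concepts into equivalent forms in negation normal form (NNF). An EFALCN cut concept $C_{[n_1,\ldots,n_k]}$ is in NNF iff negation $\neg$ occurs only in front of atomic cut concepts. Every cut concept can be converted into NNF by exhaustively applying the following rewrite rules:
$$\begin{aligned}
& \neg\neg C_{[n_1,\ldots,n_k]} \rightsquigarrow C_{[n_1,\ldots,n_k]} ; \\
& \neg(C_{[n_1,\ldots,n_k]} \sqcap D_{[h_1,\ldots,h_s]}) \rightsquigarrow \neg C_{[n_1,\ldots,n_k]} \sqcup \neg D_{[h_1,\ldots,h_s]} ; \\
& \neg(C_{[n_1,\ldots,n_k]} \sqcup D_{[h_1,\ldots,h_s]}) \rightsquigarrow \neg C_{[n_1,\ldots,n_k]} \sqcap \neg D_{[h_1,\ldots,h_s]} ; \\
& \neg(\exists R.C)_{[n_1,\ldots,n_{k+1}]} \rightsquigarrow \forall R_{[n_1]}.\neg C_{[n_2,\ldots,n_{k+1}]} ; \\
& \neg(\forall R.C)_{[n_1,\ldots,n_{k+1}]} \rightsquigarrow \exists R_{[n_1]}.\neg C_{[n_2,\ldots,n_{k+1}]} ; \\
& \neg(\geq N R_{[n_1]}) \rightsquigarrow (\leq (N-1) R_{[n_1]}) ; \\
& \neg(\leq N R_{[n_1]}) \rightsquigarrow (\geq (N+1) R_{[n_1]}) .
\end{aligned} \tag{9}$$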
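The rewrite rules can be applied recursively from the outside in. The following minimal sketch assumes a hypothetical tuple encoding of cut concepts with the suffix stored on each atom and role; the function name `nnf` and the sample concept are illustrative and not taken from the paper.

```python
# Hypothetical encoding: ("atom", B, n), ("not", C), ("and", C, D), ("or", C, D),
# ("some", R, n, C), ("all", R, n, C), ("atleast", N, R, n), ("atmost", N, R, n).

def nnf(concept):
    """Push negation inwards until it only precedes atomic cut concepts."""
    op = concept[0]
    if op != "not":
        if op in ("and", "or"):
            return (op, nnf(concept[1]), nnf(concept[2]))
        if op in ("some", "all"):
            return (op, concept[1], concept[2], nnf(concept[3]))
        return concept                      # atoms and number restrictions
    inner = concept[1]
    iop = inner[0]
    if iop == "atom":
        return concept                      # negated atom is already in NNF
    if iop == "not":
        return nnf(inner[1])                # double negation
    if iop == "and":
        return ("or", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if iop == "or":
        return ("and", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if iop == "some":
        return ("all", inner[1], inner[2], nnf(("not", inner[3])))
    if iop == "all":
        return ("some", inner[1], inner[2], nnf(("not", inner[3])))
    if iop == "atleast":
        # For N = 0 this yields "at most -1", which is empty under equation (8).
        return ("atmost", inner[1] - 1, inner[2], inner[3])
    if iop == "atmost":
        return ("atleast", inner[1] + 1, inner[2], inner[3])
    raise ValueError(f"unknown constructor: {iop}")


negated = ("not", ("and", ("atom", "Young", 0.8),
                   ("some", "friend", 0.9, ("atom", "Tall", 0.7))))
converted = nnf(negated)
```

Note that the suffixes are untouched by the conversion: only the constructors change, which matches the rules keeping the suffix vector fixed.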
For example: a cut concept ¬ ((Young Tall) friend)[0.8,0.7,0.9] can be converted to ( ¬ Young ¬ Tall ( friend)) [0.8,0.7,0.9], which is in NNF. For any $I = \langle \Delta^I, \cdot^I \rangle$, we can prove that the above rewrite rules are semantically consistent. We now turn to the relation between EFALCN interpretations $I$ and a knowledge base $\Sigma_E(A, T)$. $I$ satisfies a cut assertion $a : C_{[n_1,\ldots,n_k]}$ (resp. $(a, b) : R_{[n]}$) iff
$a^I \in (C_{[n_1,\ldots,n_k]})^I$ (resp. $(a^I, b^I) \in (R_{[n]})^I$). $I$ satisfies an ABox $A$ (written $I \models A$) iff $I$ satisfies all cut assertions in $A$; such an $I$ is called a model of $A$. An interpretation $I$ satisfies $B_{[n]} \sqsubseteq C_{[f_1(n),\ldots,f_k(n)]}$, $n \in X$, iff for any $n_0 \in X$, $(B_{[n_0]})^I \subseteq (C_{[f_1(n_0),\ldots,f_k(n_0)]})^I$. $I$ satisfies $B_{[n]} \equiv C_{[f_1(n),\ldots,f_k(n)]}$, $n \in X$, iff for any $n_0 \in X$, $(B_{[n_0]})^I = (C_{[f_1(n_0),\ldots,f_k(n_0)]})^I$. $I$ satisfies a TBox $T$ (written $I \models T$) iff $I$ satisfies all cut terminological axioms in $T$; such an $I$ is called a model of $T$. $I$ satisfies $\Sigma_E(A, T)$ iff $I$ is a model of both $A$ and $T$.
3.3 Reasoning Problems and Reasoning Properties of EFALCN
The main reasoning problems of EFALCN are satisfiability and consistency. In this paper, we only discuss reasoning problems without respect to a TBox.
Satisfiability: a given cut concept $C_{[n_1,\ldots,n_k]}$ is satisfiable iff there is an interpretation $I$ such that $(C_{[n_1,\ldots,n_k]})^I \neq \emptyset$.
Consistency: a given ABox $A$ is consistent iff there is a model $I$ of $A$.
Because alterable concepts are allowed in EFALCN, we extend satisfiability of cut concepts to satisfiability of alterable cut concepts in a given domain $X$, and discuss the sat-domain task instead of the satisfiability task.
Sat-domain: for an alterable cut concept $C_{[f_1(n),\ldots,f_k(n)]}$ in a given domain $n \in X_0$, where $X_0 \subseteq [0,1]$, $X_0$ is continuous, and $f_i(n)$ is a linear function from domain $X_0$ to $(0,1]$, the sat-domain task computes the satisfiable and unsatisfiable sub-domains of $X_0$. For any $n_0 \in X_0$, if $C_{[f_1(n_0),\ldots,f_k(n_0)]}$ is satisfiable then $n_0$ is in the satisfiable sub-domain, otherwise $n_0$ is in the unsatisfiable sub-domain.
Having stated the reasoning problems, we now discuss two reasoning properties that deal with the suffix vectors of cut concepts and cut roles. These two properties support reasoning in EFALCN.
Property 1. For any two cut roles $R_{[n_1]}$ and $R_{[m_1]}$, if they have the same prototype and $n_1 \leq m_1$, then for any interpretation $I = \langle \Delta^I, \cdot^I \rangle$, it is true that $(R_{[n_1]})^I \supseteq (R_{[m_1]})^I$ (this implicitly means $R_{[m_1]} \sqsubseteq R_{[n_1]}$).
The constraint of suffixes: for any suffix, when its prefix is an atomic fuzzy concept, if $\neg$ occurs in front of the concept, its constraint is $\neg$; when its prefix is an atomic fuzzy role, if $\forall$, $\exists$, $\geq$ or $\leq$ is in front of the role, its constraint is $\forall$, $\exists$, $\geq$ or $\leq$ respectively.
Property 2. For any two cut concepts $C_{[n_1,\ldots,n_k]}$ and $C_{[m_1,\ldots,m_k]}$ with the same prototype, any two $i$th suffixes $n_i$, $m_i$ in the suffix vectors have the same prefix and constraint. If the following conditions hold:
Condition 1: When the prefix is an atomic fuzzy concept, if the constraint is $\neg$, then $n_i \geq m_i$, otherwise $n_i \leq m_i$.
Condition 2: When the prefix is an atomic fuzzy role, if the constraint is $\forall$ or $\leq N$, then $n_i \geq m_i$, otherwise $n_i \leq m_i$.
then for any interpretation $I = \langle \Delta^I, \cdot^I \rangle$, it is true that $(C_{[n_1,\ldots,n_k]})^I \supseteq (C_{[m_1,\ldots,m_k]})^I$.
4 Tableau Algorithm for EFALCN
We design tableau algorithms for the sat-domain and consistency problems. The algorithms are developed in the style of the so-called constraint propagation method. In the process of constraint propagation, ABoxes may contain assertions such as $x : D_{[f_i(n),\ldots,f_l(n)]}$ or $(x, y) : R_{[f(n)]}$ with $n \in X$. Such an ABox is called an alterable ABox, denoted as a pair $(A, X)$, where $A$ is the set of assertional axioms and $X$ is the domain of the variable $n$. In EFALCN, we define translation rules that carry out the constraint propagation process while preserving the consistency of the alterable ABox $(A, X)$ in its domain. In defining the translation rules, we must consider the effect of comparisons between suffix functions in domain $X$. For example, suppose there are two assertional axioms in the current alterable ABox $(A, X)$: $x : \forall R_{[f_i(n)]}.D_{[f_{i+1}(n),\ldots,f_l(n)]}$ and $(x, y) : R_{[f_j(n)]}$. The comparison between the two functions $f_i(n)$ and $f_j(n)$ in domain $X$ affects the application of translation rules. If $f_i(n) \leq f_j(n)$ in $X$, then the translation rules add $y : D_{[f_{i+1}(n),\ldots,f_l(n)]}$. In a more complex case, if $f_i(n) \leq f_j(n)$ only in a sub-domain $X_1$ of $X$, then $(A, X)$ is divided into two new ABoxes: $(A_1, X_1)$, obtained by adding $y : D_{[f_{i+1}(n),\ldots,f_l(n)]}$ to $A$ and replacing the domain with $X_1$, and $(A, X_2)$, which keeps $A$ and changes the domain to $X_2 = X \setminus X_1$. We define a method to carry out the comparison between two functions $f_i(n)$ and $f_j(n)$ in domain $X$.
Method Comparison
Input: $(f_i(n), f_j(n), X)$, where $f_i(n)$ and $f_j(n)$ are two functions and $X$ is the domain of variable $n$.
Output: $(X_1, X_2)$, where $X_1$ is the sub-domain where $f_i(n) \leq f_j(n)$ and $X_2$ is the other sub-domain of $X$.
Fig. 1. Comparison method between functions $f_i(n)$ and $f_j(n)$ in $X$
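Since the suffix functions are linear, the Comparison method reduces to locating at most one crossing point. The sketch below is one possible reading, not the authors' implementation: it assumes each function is given as a pair (a, b) encoding f(n) = a*n + b, that X is an interval given by its two end points, and it does not track whether boundary points are open or closed; the name `comparison` is hypothetical.

```python
def comparison(fi, fj, X):
    """Split interval X into X1 (where fi(n) <= fj(n)) and X2 (the rest).

    fi and fj are (a, b) pairs for f(n) = a*n + b; X is (lo, hi).
    A sub-domain that is empty is returned as None.
    """
    lo, hi = X
    slope = fi[0] - fj[0]          # slope of g(n) = fi(n) - fj(n)
    const = fi[1] - fj[1]          # intercept of g(n)
    if slope == 0:                 # parallel lines: one case holds on all of X
        return (X, None) if const <= 0 else (None, X)
    root = -const / slope          # g(root) = 0
    if slope > 0:                  # g(n) <= 0 to the left of the root
        x1 = (lo, min(hi, root)) if root >= lo else None
        x2 = (max(lo, root), hi) if root < hi else None
    else:                          # g(n) <= 0 to the right of the root
        x1 = (max(lo, root), hi) if root <= hi else None
        x2 = (lo, min(hi, root)) if root > lo else None
    return x1, x2


# f_i(n) = n against the constant f_j(n) = 0.5 on X = [0, 1]
split = comparison((1.0, 0.0), (0.0, 0.5), (0.0, 1.0))
```

For the universal rule this means the filler assertion is propagated only on the first returned interval, while the second keeps the ABox unchanged.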
Additionally, because EFALCN supports the number restrictions $\geq N R_{[n]}$ and $\leq N R_{[n]}$, the constraint propagation process may require comparing one given function with several other functions. For example, the current alterable ABox $(A, X)$ may contain $x : \leq N R_{[f_0(n)]}$ and $(x, y_j) : R_{[f_j(n)]}$, $1 \leq j \leq N+1$. If $f_0(n) \leq f_j(n)$, $1 \leq j \leq N+1$, in a sub-domain $X^*$ of $X$, then $X^*$ is a special sub-domain which needs to be separated from the whole domain $X$. To keep the domain of each alterable ABox continuous, $X$ is divided into three sub-domains: $X^*$ and the left and right adjacent domains of $X^*$. We also define a method to carry out the comparison between a given function $f_0(n)$ and the other $N+1$ functions in domain $X$.
Method Multi-Comparison
Input: $(f_0(n), f_1(n), \ldots, f_{N+1}(n), X)$, where $f_i(n)$ is a function, $0 \leq i \leq N+1$, and $X$ is the domain of variable $n$.
Output: $(X_1, X_2, X_3)$, where $X_2$ is the sub-domain where $f_0(n) \leq f_j(n)$, $1 \leq j \leq N+1$; $X_1$ and $X_3$ are the left and right adjacent domains of $X_2$.
Fig. 2. Multi-Comparison method between function $f_0(n)$ and multiple functions in $X$
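With linear functions, the set where $f_0(n)$ stays below all other functions is an intersection of half-lines and hence a single interval, which is why three sub-domains suffice. The following sketch makes several assumptions that are not in the paper: functions are (a, b) pairs for f(n) = a*n + b, X is an interval (lo, hi) whose boundary openness is ignored, and when the middle sub-domain is empty the whole of X is returned as the first component. The name `multi_comparison` is hypothetical.

```python
def multi_comparison(f0, others, X):
    """Return (X1, X2, X3): X2 is where f0(n) <= fj(n) for every fj in others,
    X1 and X3 are the parts of X to the left and right of X2 (None if empty)."""
    lo, hi = X
    left, right = lo, hi                 # running bounds of X2
    for a, b in others:
        slope = f0[0] - a                # slope of g(n) = f0(n) - fj(n)
        const = f0[1] - b
        if slope == 0:
            if const > 0:                # f0 > fj everywhere: X2 is empty
                return X, None, None
            continue
        root = -const / slope
        if slope > 0:                    # g(n) <= 0 only for n <= root
            right = min(right, root)
        else:                            # g(n) <= 0 only for n >= root
            left = max(left, root)
    if left > right:                     # the half-lines do not overlap
        return X, None, None
    x1 = (lo, left) if left > lo else None
    x3 = (right, hi) if right < hi else None
    return x1, (left, right), x3


# f0(n) = n against f1(n) = 0.8 and f2(n) = 2n - 0.2 on X = [0, 1]
parts = multi_comparison((1.0, 0.0), [(0.0, 0.8), (2.0, -0.2)], (0.0, 1.0))
```

In the number-restriction example, the middle interval is the part of the domain where all N+1 successors reach the threshold of the restriction.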
With these preparations in place, we present our tableau algorithm. Given an alterable cut concept $C_{[f_1(n),\ldots,f_k(n)]}$ in domain $X$, the tableau algorithm starts with a single alterable ABox $(\{x_0 : C_{[f_1(n),\ldots,f_k(n)]}\}, X)$, and applies domain-consistency preserving translation rules (Fig. 3) to ABoxes, which divides the whole domain $X$ into a satisfiable and an unsatisfiable sub-domain. New ABoxes generated by the action of rules are called subsequences of the given ABoxes. By applying some translation rules, the given alterable ABox $(A, X)$ is translated into multiple subsequences. Therefore we work with a set of ABoxes $S$ instead of a single ABox. The tableau algorithm starts with a single set $S_0 = \{(\{x_0 : C_{[f_1(n),\ldots,f_k(n)]}\}, X_0)\}$. Translation rules are applied to $S$ by replacing one alterable ABox by its subsequences. We define some terms to describe the termination conditions.
Definition 1. An alterable ABox $(A, X)$ is complete iff none of the translation rules can be applied to it. An alterable ABox $(A, X)$ is not closed iff it has not been announced closed. An alterable ABox is open iff it is complete and not closed. A finite set of alterable ABoxes $S$ is complete iff every alterable ABox $(A, X)$ in $S$ is complete. The satisfiable domain of $S$ is denoted by $Sat(S)$ and defined as the union of the domains of the open ABoxes $(A_i, X_i)$: $Sat(S) = \bigcup X_i$, where $(A_i, X_i) \in S$ and $(A_i, X_i)$ is open.
We now refine the process of the tableau algorithm for sat-domain (Fig. 4) in the following steps:
(1) Let $C_{[f_1(n),\ldots,f_k(n)]}$ be a cut concept in NNF. The tableau algorithm starts with a set consisting of a single alterable ABox: $S_0 = \{(\{x_0 : C_{[f_1(n),\ldots,f_k(n)]}\}, X)\}$.
(2) The tableau algorithm exhaustively applies translation rules to the current $S_i$. We denote by $S_{i+1}$ the subsequence set of ABoxes of $S_i$, so rule application yields a chain $S_0 \to S_1 \to \ldots \to S_i \to \ldots$.
(3) If the current $S_i$ is complete, the tableau algorithm returns $(X_{sat}, X_{unsat})$, where $X_{sat} = Sat(S_i)$ is the satisfiable sub-domain and $X_{unsat} = X \setminus X_{sat}$ is the unsatisfiable sub-domain.
The termination and correctness of the tableau algorithm can be guaranteed, and the complexity of sat-domain is PSPACE-complete. In this paper, we do not give the detailed proofs of these results.
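Step (3) only needs the domains of the ABoxes in the complete set and whether each one was announced closed. The sketch below illustrates that final step alone, under assumptions not made in the paper: every domain is an interval (lo, hi) with boundary openness ignored, each ABox is abstracted to a hypothetical pair (domain, closed), and the name `sat_domain` is illustrative.

```python
def sat_domain(X, aboxes):
    """Compute (X_sat, X_unsat) from a complete set of alterable ABoxes.

    aboxes is a list of (domain, closed) pairs; since the set is complete,
    an ABox is open exactly when it is not closed.
    Both results are lists of disjoint intervals inside X.
    """
    lo, hi = X
    open_domains = sorted(dom for dom, closed in aboxes if not closed)
    sat = []
    for a, b in open_domains:            # merge overlapping or touching intervals
        if sat and a <= sat[-1][1]:
            sat[-1] = (sat[-1][0], max(sat[-1][1], b))
        else:
            sat.append((a, b))
    unsat = []
    cursor = lo
    for a, b in sat:                     # gaps between merged intervals
        if a > cursor:
            unsat.append((cursor, a))
        cursor = max(cursor, b)
    if cursor < hi:
        unsat.append((cursor, hi))
    return sat, unsat


# Two branches share the domain (0, 0.3); one of them stays open.
complete_set = [((0.0, 0.3), False), ((0.0, 0.3), True),
                ((0.3, 0.6), True), ((0.6, 1.0), False)]
x_sat, x_unsat = sat_domain((0.0, 1.0), complete_set)
```

A sub-domain counts as satisfiable as soon as one open ABox covers it, even if other branches over the same sub-domain were closed.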
For any alterable ABox $(A, X)$ which is not closed:
$\sqcap$-rule
Condition: $x : D_{[f_i(n),\ldots,f_s(n)]} \sqcap E_{[f_{s+1}(n),\ldots,f_l(n)]} \in A$, $x : D_{[f_i(n),\ldots,f_s(n)]}$ or $x : E_{[f_{s+1}(n),\ldots,f_l(n)]} \notin A$.
Action: $(A_1, X) = (A \cup \{x : D_{[f_i(n),\ldots,f_s(n)]}, x : E_{[f_{s+1}(n),\ldots,f_l(n)]}\}, X)$.
$\sqcup$-rule
Condition: $x : D_{[f_i(n),\ldots,f_s(n)]} \sqcup E_{[f_{s+1}(n),\ldots,f_l(n)]} \in A$, $x : D_{[f_i(n),\ldots,f_s(n)]}$, $x : E_{[f_{s+1}(n),\ldots,f_l(n)]} \notin A$.
Action: $(A_1, X) = (A \cup \{x : D_{[f_i(n),\ldots,f_s(n)]}\}, X)$; $(A_2, X) = (A \cup \{x : E_{[f_{s+1}(n),\ldots,f_l(n)]}\}, X)$.
$\exists$-rule
Condition: $x : \exists R_{[f_i(n)]}.D_{[f_{i+1}(n),\ldots,f_l(n)]} \in A$, $\forall y \in A$, $(x, y) : R_{[f_i(n)]}$ or $y : D_{[f_{i+1}(n),\ldots,f_l(n)]} \notin A$.
Action: $(A_1, X) = (A \cup \{z : D_{[f_{i+1}(n),\ldots,f_l(n)]}, (x, z) : R_{[f_i(n)]}\}, X)$, where $z$ is newly generated.
$\forall$-rule
Condition: $x : \forall R_{[f_i(n)]}.D_{[f_{i+1}(n),\ldots,f_l(n)]}$, $(x, y) : R_{[f_j(n)]} \in A$, $y : D_{[f_{i+1}(n),\ldots,f_l(n)]} \notin A$, $(X_1, X_2) = Comparison(f_i(n), f_j(n), X)$ and $X_1 \neq \emptyset$.
Action: $(A_1, X) = (A \cup \{y : D_{[f_{i+1}(n),\ldots,f_l(n)]}\}, X_1)$; if $X_2 \neq \emptyset$, $(A_2, X) = (A, X_2)$.
$\neg$-rule
Condition: $x : \neg B_{[f_i(n)]}$, $x : B_{[f_j(n)]} \in A$, $(X_1, X_2) = Comparison(f_i(n), f_j(n), X)$ and $X_1 \neq \emptyset$.
Action: $(A_1, X) = (A, X_1)$, $(A_1, X)$ is announced closed; if $X_2 \neq \emptyset$, $(A_2, X) = (A, X_2)$.
$\geq$-rule
Condition: $x : \geq N R_{[f_i(n)]} \in A$ and $|\{y_i \mid (x, y_i) : R_{[f_i(n)]} \in A\}|$ …
… $\langle B \sqsubseteq D \rangle \rightsquigarrow B_{[n]} \sqsubseteq D_{[n]}$, $n \in [0,1]$, and $\langle B :\approx D \rangle \rightsquigarrow B_{[n]} \sqsubseteq D_{[n]}$, $n \in [0,1]$ and $D_{[n]} \sqsubseteq B_{[n]}$, $n \in [0,1]$.
Secondly, EFALCN and EFALCN* have the same expressive power for fuzzy information. We only have to prove that every complex cut concept in EFALCN* can be equivalently denoted by atomic cut concepts in EFALCN. Obviously, this is true. For example, if $C_{[n]} = (D \sqcap E)_{[n]}$, then $C_{[n]}$ can be denoted by $D_{[n]} \sqcap E_{[n]}$. The denotation of other forms of complex cut concepts is similar.
To sum up, EFALCN* can express any FALC knowledge base $\Sigma(A, T)$, and EFALCN and EFALCN* have the same expressive power for fuzzy information, so all fuzzy information described by FALC can also be described by EFALCN. EFALCN is more expressive than FALC. Firstly, EFALCN supports a more powerful representation of fuzzy concept assertions, which is not allowed in FALC. For example, $\exists R.C$ is a FALC fuzzy concept; if an interpretation $I$ satisfies the fuzzy assertion $\langle (\exists R.C)(a) \geq n \rangle$, then $(\exists R.C)^I(a^I) \geq n$. From the definition of $(\exists R.C)^I$, we get that $\exists b^I \in \Delta^I$, $R^I(a^I, b^I) \geq n$ and $C^I(b^I) \geq n$. In many applications, more complex fuzzy assertions may be encountered, such as an individual $a$ for which $\exists b^I \in \Delta^I$, $R^I(a^I, b^I) \geq n_1$ and $C^I(b^I) \geq n_2$, where $n_1 \neq n_2$. In this case, the membership degrees of the concept and the role are different and need separate representations. FALC cannot distinguish such diverse membership degrees, but EFALCN can handle them. The above fuzzy assertion can be described as $a : \exists R_{[n_1]}.C_{[n_2]}$.
Secondly, EFALCN extends the expressive scope of fuzzy terminological information in FALC. The FALC fuzzy terminological axiom $C \sqsubseteq D$ implicitly means $\forall d \in \Delta^I$, $C^I(d) \geq n \to D^I(d) \geq n$, which cannot express complex inclusions based on different membership degrees. For example, the fuzzy concepts Young and Very-young have the following inclusion relation: if the membership degree of an individual $a$ belonging to Very-young is at least 0.6, then the membership degree of this individual belonging to Young is at least 0.9, formally denoted by $\forall d \in \Delta^I$, $Very\text{-}young(d) \geq 0.6 \to Young(d) \geq 0.9$. Here the two sides of the inclusion carry different membership degree constraints, which is not allowed in FALC but can be denoted in EFALCN: $Very\text{-}young_{[0.6]} \sqsubseteq Young_{[0.9]}$.
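The inclusion between cuts with different suffixes can be checked directly as a subset test between two cut sets. The sketch below is only an illustration: the individuals, their membership degrees, and the function names `cut` and `satisfies_inclusion` are made up for this example.

```python
def cut(membership, n):
    """The n-cut of a fuzzy set given as {individual: degree}."""
    return {d for d, degree in membership.items() if degree >= n}


def satisfies_inclusion(sub, n_sub, sup, n_sup):
    """Check (Sub_[n_sub])^I is a subset of (Sup_[n_sup])^I on a finite interpretation."""
    return cut(sub, n_sub) <= cut(sup, n_sup)


# Illustrative membership degrees (not from the paper).
very_young = {"ann": 0.7, "bob": 0.4, "cat": 0.65}
young_ok = {"ann": 0.95, "bob": 0.8, "cat": 0.92}
young_bad = {"ann": 0.95, "bob": 0.8, "cat": 0.85}

holds = satisfies_inclusion(very_young, 0.6, young_ok, 0.9)
fails = satisfies_inclusion(very_young, 0.6, young_bad, 0.9)
```

In the second interpretation the individual with Very-young degree 0.65 and Young degree 0.85 violates the axiom, although the same-threshold reading with $n = 0.6$ on both sides would be satisfied.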
6 Conclusions
We propose the extended fuzzy description logic EFALCN, which introduces the cut sets of fuzzy concepts and fuzzy roles as atomic concepts and atomic roles. In detail, we define the syntax structure, semantic interpretation and reasoning problems of EFALCN. We also design tableau algorithms for the reasoning problems of EFALCN. We compare EFALCN with other fuzzy DLs and explain why EFALCN has more expressive power. This work can be applied as a language in the current Semantic Web to enrich its means of representation, and can be regarded as a new approach to extending DLs with fuzzy features.
References
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, vol. 284, no. 5 (2001) 34–43
2. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (Eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press (2003)
3. Meghini, C., Sebastiani, F., Straccia, U.: Reasoning about the form and content for multimedia objects. In: Proceedings of AAAI 1997 Spring Symposium on Intelligent Integration and Use of Text, Image, Video and Audio, California (1997) 89–94
4. Straccia, U.: Reasoning within fuzzy description logics. Journal of Artificial Intelligence Research, no. 14 (2001) 137–166
5. Schmidt-Schauß, M., Smolka, G.: Attributive concept descriptions with complements. Artificial Intelligence, vol. 48 (1991) 1–26
6. Straccia, U.: Transforming fuzzy description logics into classical description logics. In: Proceedings of the 9th European Conference on Logics in Artificial Intelligence, Lisbon (2004) 385–399
7. Heinsohn, J.: Probabilistic description logics. In: Proceedings of UAI-94 (1994) 311–318
8. Jaeger, M.: Probabilistic reasoning in terminological logics. In: Proceedings of KR-94 (1994) 305–316
9. Giugno, R., Lukasiewicz, T.: A probabilistic extension of SHOQ(D) for probabilistic ontologies in the semantic web. In: Proceedings of the 9th European Conference on Logics in Artificial Intelligence, Cosenza, Italy (2002) 23–26
10. Yen, J.: Generalizing term subsumption languages to fuzzy logic. In: Proceedings of the 12th Int. Joint Conference on Artificial Intelligence (1991) 472–477
Type II Topological Logic C2T and Approximate Reasoning
Yalin Zheng, Changshui Zhang, and Yinglong Xia
State Key Laboratory for Intelligent Technology and Systems, Department of Automation, Faculty of Information Science and Technology, Tsinghua University, Beijing 100084, The People's Republic of China
{zheng-yl, zcs}@mail.tsinghua.edu.cn [email protected]
Abstract. This paper proposes a topological logic model of approximate reasoning based on the type II topological logic C2T and the structure of the matching function C and the C-matching neighborhood group. The type II topological algorithms of simple approximate reasoning and multiple approximate reasoning are given in type II topological logic C2T with matching function C. We also propose the structure of type II regular topological logic C2T and the regular matching function C. The type II completeness and type II perfectness of a knowledge base K are investigated in type II regular topological logic C2T with regular matching function C.
1 Introduction
In a Fréchet topology, for each knowledge $A_i \to B_i$ in the knowledge base $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\}$, the topological closure $\{A_i\}^-$ of the antecedent $A_i$ is identically itself, that is, $\{A_i\}^- = \{A_i\}$. Therefore, any input $A^*$ that is different from $A_i$ cannot activate knowledge $A_i \to B_i$, which implies that a Fréchet topology is not suitable for the type I topological algorithm of approximate reasoning. Consequently, we intend to explore a reasonable topology $T$ among non-Fréchet topologies for the type I topological algorithm of approximate reasoning. A non-Fréchet topology $T$ which makes the type I topological algorithm valid and meaningful possesses the following two characteristics. On one hand, its open sets are few, which makes it possible to include the antecedent A
This work is supported by the project (60475001) of the National Natural Science Foundation of China.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 243–252, 2005. © Springer-Verlag Berlin Heidelberg 2005
of knowledge $A \to B$ into $\bigcap_{U \in \mathcal{U}(A^*)} U$, the intersection of all neighborhoods of $A^*$. On the other hand, the corresponding type I topological algorithm imposes a rigid topological requirement, namely, input $A^*$ succeeds in firing knowledge $A \to B$ iff $A^*$ is an adherent formula of $\{A\}$. Then, any adherent formula $B^*$ of $\{B\}$ is taken as the reasoning result and is output by the reasoning system. In this paper, we significantly decrease the "topological requirement" and significantly increase the "number of open sets" so that the topological algorithm is more convenient for application.
2 The Structure of Type II Topological Logic C2T
Assume classical propositional logic $C = (\hat{C}, \tilde{C})$, where $\hat{C}$ is the syntax of $C$ and $\tilde{C}$ is the semantics of $C$. $F(S)$, which is the type $(2, 2, 2, 1)$ free algebra generated by the non-empty countable set $S$, is called the proposition set, or formula set, of $C$. $\to$, $\wedge$, $\vee$, $\neg$ are the implication, conjunction, disjunction and negation connectives for propositions respectively. $\emptyset^{\vdash}$ is the set of all theorems in $\hat{C}$. $\emptyset^{\models}$ is the set of all tautologies in $\tilde{C}$. According to the soundness theorem and the completeness theorem for $C$, we have $\emptyset^{\vdash} = \emptyset^{\models}$. Assigning a topology $T$ to the formula set $F(S)$ in $C$, we obtain a topological space $(F(S), T)$. For any formulae $A$, $B$, $A \to B \in F(S)$, we denote their neighborhood systems under the topology $T$ by $\mathcal{U}(A)$, $\mathcal{U}(B)$ and $\mathcal{U}(A \to B)$. If for any implication formula $A \to B \in F(S)$ we have
$$\forall W \in \mathcal{U}(A \to B), \ \exists U \in \mathcal{U}(A), \ \exists V \in \mathcal{U}(B), \ \text{s.t. } U \to V \subseteq W \tag{1}$$
where $U \to V = \{A^* \to B^* \mid A^* \in U, B^* \in V\}$, then the 3-tuple C2T $= (\hat{C}, \tilde{C}, T)$ is called the type II topological propositional logic on the formula set $F(S)$, or type II topological logic for short. We also say that the topology $T$ is a type II consistent topology with respect to the logic $C = (\hat{C}, \tilde{C})$. Constraint (1) depicts the following characteristic of type II topological logic C2T. For any "precision requirement" $W$ approaching formula $A \to B$, one can always find a "precision requirement" $U$ approaching formula $A$ and a "precision requirement" $V$ approaching formula $B$ such that the approximate degree between the implication formulae $A^* \to B^*$ and $A \to B$ does not exceed the preassigned $W$ whenever the approximate degree between $A^*$ and $A$ does not exceed $U$ and the approximate degree between $B^*$ and $B$ does not exceed $V$.
Fig. 1. The structure of type II topological logic C2T
3 The Simple Approximate Reasoning Based on Type II Topological Logic C2T
In type II topological logic C2T, one can construct the choice function
$$C : H \to \bigcup_{A \to B \in H} (\mathcal{U}(A) \times \mathcal{U}(B) \times \mathcal{U}(A \to B)); \quad (A \to B) \mapsto (U_A, U_B, U_{A \to B}) \in \mathcal{U}(A) \times \mathcal{U}(B) \times \mathcal{U}(A \to B),$$
which satisfies $U_A \to U_B \subseteq U_{A \to B}$, where $H$ is the set of all implication formulae of the form $A \to B$, that is, $H = \{A \to B \mid A \in F(S), B \in F(S)\}$. We call the choice function $C$ the matching function on the formula set $H$, and the 3-tuple $C(A \to B) = (U_A, U_B, U_{A \to B})$ the C-matching neighborhood group with respect to the implication formula $A \to B$, or matching neighborhood group for short. In type II topological logic C2T with matching function $C$, consider the following simple approximate reasoning model
$$\begin{array}{l} A \to B \\ A^* \\ \hline B^* \end{array} \tag{2}$$
where $A \to B \in \emptyset^{\vdash}$ is called knowledge, $A^* \in F(S)$ is called input and $B^* \in F(S)$ is called output, or approximate reasoning conclusion. The matching function $C$ assigns the matching neighborhood group $(U_A, U_B, U_{A \to B})$ to the knowledge $A \to B$.
If $A^* \in U_A$, knowledge $A \to B$ is activated by input $A^*$. Taking any $B^* \in U_B$ as the approximate reasoning conclusion, we call $B^*$ a type II topological solution of simple approximate reasoning with respect to knowledge $A \to B$ and input $A^*$. If $A^* \notin U_A$, knowledge $A \to B$ is not activated by input $A^*$. In this case, $A^*$ is not considered, i.e., there is no type II topological solution of simple approximate reasoning with respect to knowledge $A \to B$ and input $A^*$. The above algorithm is called the type II topological algorithm of simple approximate reasoning.
Fig. 2. The type II topological algorithm of simple approximate reasoning
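The activation test and the choice of conclusions can be illustrated on a toy finite example. The sketch below makes strong simplifying assumptions that are not in the paper: formulae are plain strings, neighborhoods are finite Python sets, an implication $A^* \to B^*$ is modelled as the pair (A*, B*), and the names `is_matching_group` and `simple_reasoning` are hypothetical.

```python
from itertools import product


def is_matching_group(U_A, U_B, U_AB):
    """Check the constraint U_A -> U_B is a subset of U_{A->B}."""
    return set(product(U_A, U_B)) <= U_AB


def simple_reasoning(group, a_star):
    """Return every admissible conclusion B* for input A*.

    The knowledge is activated iff A* lies in U_A; any element of U_B may
    then be taken as a solution. An empty set means no solution exists.
    """
    U_A, U_B, _ = group
    return set(U_B) if a_star in U_A else set()


# Toy matching neighborhood group for a knowledge "A -> B".
U_A = {"A", "A1"}
U_B = {"B", "B1"}
U_AB = set(product(U_A, U_B)) | {("A2", "B")}
group = (U_A, U_B, U_AB)

solutions = simple_reasoning(group, "A1")
no_solution = simple_reasoning(group, "Z")
```

Whenever the group satisfies the subset constraint, every pair built from an activating input and a returned solution lies in $U_{A \to B}$ by construction.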
The structure of type II topological logic C2T guarantees the successful construction of the matching function $C$; the C-matching neighborhood group of knowledge $A \to B$ assigned by the matching function $C$ guarantees a sufficient approaching degree between the approximate knowledge $A^* \to B^*$ and the standard knowledge $A \to B$ with respect to a type II topological solution $B^*$ of simple approximate reasoning. In particular, when the approximate degree between input $A^*$ and the antecedent $A$ of knowledge $A \to B$ does not exceed $U_A$, the approximate degree between the solution $B^*$ obtained by the type II topological algorithm of simple approximate reasoning and the consequent $B$ of knowledge $A \to B$ will not exceed $U_B$. The approximate degree between the approximate knowledge $A^* \to B^*$ and the standard knowledge $A \to B$ will also be within $U_{A \to B}$. It is straightforward to derive the theorem below.
Theorem 1. In type II topological logic C2T with matching function $C$, if there exists a type II topological solution $B^*$ of input $A^*$ with respect to knowledge $A \to B$, then we have $A^* \to B^* \in U_{A \to B}$.
Recall a simple approximate reasoning algorithm named the Modus Ponens reappear algorithm, or MP reappear algorithm for short: in the model
$$\begin{array}{l} A \to B \\ A^* \\ \hline B^* \end{array} \tag{3}$$
when $A^*$ is exactly the antecedent $A$ of knowledge $A \to B$, the consequent $B$ of $A \to B$ can be taken as the reasoning result.
Theorem 2. In type II topological logic C2T with matching function $C$, the type II topological algorithm of simple approximate reasoning (2) is an MP reappear algorithm.
Recall a simple approximate reasoning algorithm named the degenerated approximate reasoning algorithm, or classical reasoning algorithm for short: each knowledge $A \to B$ can be activated by input $A^*$ if and only if $A^* = A$.
Theorem 3. In type II topological logic C2T with matching function $C$, the type II topological algorithm of simple approximate reasoning is a classical reasoning algorithm if and only if the topology $T$ in C2T is a discrete topology and the C-matching neighborhood group $C(A \to B)$ assigned by the matching function $C$ for each implication $A \to B$ satisfies $(U_A, U_B, U_{A \to B}) = (\{A\}, \{B\}, \{A \to B\})$.
Two versions of the above theorem are very useful.
Theorem 4. For every knowledge $A \to B$ and input $A^*$ which is different from $A$ in type II topological logic C2T with matching function $C$, there does not exist any type II topological solution of simple approximate reasoning for $A^*$ with respect to $A \to B$ if and only if the topology $T$ is discrete in C2T and the C-matching neighborhood group $C(A \to B)$ assigned by the matching function $C$ for each implication formula $A \to B$ satisfies $(U_A, U_B, U_{A \to B}) = (\{A\}, \{B\}, \{A \to B\})$.
Theorem 5. In type II topological logic C2T with matching function $C$, there exists a type II topological solution $B^*$ of simple approximate reasoning for a knowledge $A \to B$ with respect to an input $A^*$ which is different from $A$ if and only if the C-matching neighborhood group $(U_A, U_B, U_{A \to B})$ assigned by the matching function $C$ for $A \to B$ satisfies $U_A \neq \{A\}$, independently of whether the topology $T$ in C2T is discrete.
From the above discussion, we know that the type II topological algorithm of approximate reasoning imposes only few constraints on the topology $T$, which shows that the type II topological algorithm can choose topologies from a very wide scope. In particular, we will see that the Hausdorff matching with good "separation" is preferable once the type II topological algorithm of multiple approximate reasoning is given. This illustrates that a topology $T$ with a large number of open sets is more favorable for the type II topological algorithm of multiple approximate reasoning, because an input $A^*$ which can activate a knowledge in the knowledge base $K$ can then completely determine the final output $B^*$ as the reasoning result. Furthermore, in the type II topological algorithm an input $A^*$ can activate a knowledge $A \to B$ if and only if it is included in the neighborhood $U_A$ assigned for $A$, that is, $A^* \in U_A$. In contrast, in the type I topological algorithm an input $A^*$ which can activate knowledge $A \to B$ must be included in the topological closure $\{A\}^-$, i.e., $A^* \in \{A\}^-$, which means that $A$ must be included in all the neighborhoods of $A^*$, i.e., $A \in \bigcap_{U \in \mathcal{U}(A^*)} U$. Therefore, the type II topological algorithm is more convenient than the type I topological algorithm in practice. From the above discussion, we also know that the key phase for determining and implementing the precision requirement of the type II topological algorithm of approximate reasoning is the choice of the matching function $C$. By tuning the matching function $C$ according to the requirement, we can fulfill different precision requirements of approximate reasoning.
4 The Multiple Approximate Reasoning Based on Type II Topological Logic C2T
In type II topological logic C2T with matching function $C$, consider the following multiple approximate reasoning model
$$\begin{array}{l} A_1 \to B_1 \\ A_2 \to B_2 \\ \cdots \\ A_n \to B_n \\ A^* \\ \hline B^* \end{array} \tag{4}$$
where $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\} \subseteq \emptyset^{\vdash}$ is called the knowledge base, a nonempty subset of which is called a knowledge group; $A^* \in F(S)$ is called input; $B^* \in F(S)$ is called output or approximate reasoning conclusion. The C-matching neighborhood groups assigned by the matching function $C$ for each knowledge $A_i \to B_i$ $(i = 1, 2, \cdots, n)$ are given by
$$(U_{A_1}, U_{B_1}, U_{A_1 \to B_1}), \quad (U_{A_2}, U_{B_2}, U_{A_2 \to B_2}), \quad \cdots, \quad (U_{A_n}, U_{B_n}, U_{A_n \to B_n}).$$
If $A^* \in U_{A_i}$, we say that knowledge $A_i \to B_i$ can be activated by input $A^*$. If $\exists E \subseteq \{1, 2, \cdots, n\}$, $E \neq \emptyset$, such that $A^* \in U_{A_i}$ for each $i \in E$, that is,
$$A^* \in \bigcap_{i \in E} U_{A_i}$$
but $A^* \notin U_{A_i}$ for each $i \in \{1, 2, \cdots, n\} - E$, that is,
$$A^* \notin \bigcup_{i \in \{1,2,\cdots,n\} - E} U_{A_i}$$
then we say that the knowledge group $\{A_i \to B_i \mid i \in E\}$ can be activated by input $A^*$. Take any $B_i^* \in U_{B_i}$ for each $i \in E$ and let
$$B^* = \bigoplus_{i \in E} B_i^*$$
then
$$B^* \in \bigcup_{i \in E} U_{B_i}$$
where $\bigoplus_{i \in E} B_i^*$ is an aggregation of $\{B_i^* \mid i \in E\}$. $B^*$, as the conclusion of approximate reasoning, is called a type II topological solution of multiple approximate reasoning for $A^*$ with respect to the knowledge base $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\}$. In particular, if there exists a unique $i \in \{1, 2, \cdots, n\}$ which satisfies $A^* \in U_{A_i}$, i.e., only $A_i \to B_i$ can be activated by $A^*$, then any $B^* \in U_{B_i}$ is taken as the approximate reasoning conclusion, called a type II topological solution of multiple approximate reasoning for $A^*$ with respect to the knowledge base $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\}$. If $A^* \notin U_{A_i}$ for any $i \in \{1, 2, \cdots, n\}$, that is,
$$A^* \notin \bigcup_{i=1}^{n} U_{A_i}$$
then the knowledge base $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\}$ cannot be activated by input $A^*$. Input $A^*$ is ignored in this case, that is, there is no type II topological solution of multiple approximate reasoning of $A^*$ with respect to the knowledge base $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\}$. This scheme is called the type II topological algorithm of multiple approximate reasoning. It is straightforward to show the following theorems.
Theorem 6. In type II topological logic C2T with matching function $C$, there exists a type II topological solution of multiple approximate reasoning for $A^*$ with respect to the knowledge base $K = \{A_i \to B_i \mid i = 1, 2, \cdots, n\}$,
if and only if there exists at least one i ∈ {1, 2, · · · , n} which satisfies A∗ ∈ UAi.

Theorem 7. In type II topological logic C2T with matching function C, there exists a type II topological solution of multiple approximate reasoning for A∗ with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n}, if and only if

A∗ ∈ ⋃_{i=1}^{n} UAi

Theorem 8. In type II topological logic C2T with matching function C, if there exists a type II topological solution B∗ of multiple approximate reasoning for A∗ with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n}, then there exists at least one i ∈ {1, 2, · · · , n} for which B∗ ∈ UBi.

Theorem 9. In type II topological logic C2T with matching function C, if there exists a type II topological solution B∗ of multiple approximate reasoning for A∗ with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n}, then

B∗ ∈ ⋃_{i=1}^{n} UBi

Theorem 10. In type II topological logic C2T with matching function C, if there exists a type II topological solution B∗ of multiple approximate reasoning for A∗ with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n}, then there exists at least one i ∈ {1, 2, · · · , n} for which A∗ → B∗ ∈ UAi→Bi.
Theorem 11. In type II topological logic C2T with matching function C, if there exists a type II topological solution B∗ of multiple approximate reasoning for A∗ with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n}, then

A∗ → B∗ ∈ ⋃_{i=1}^{n} UAi→Bi

It is not ideal for an input A∗ to activate more than one knowledge in knowledge base K = {Ai → Bi | i = 1, 2, · · · , n} simultaneously; that is to say, the most favorable situation is that an input A∗ activates at most one knowledge, and B∗ is consequently regarded as the output. Therefore, we would like to impose some constraints on the matching function C and choose a proper topology T. The reason is that the performance of the type II topological algorithm of approximate reasoning mainly depends on the structure of the matching function C, which in turn depends to some extent on the choice of topology T for type II topological logic C2T.

In type II topological logic C2T, matching function C is called a Hausdorff matching with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n} if and only if

UAi ∩ UAj = Ø

for all i, j ∈ {1, 2, · · · , n} with i ≠ j.

Theorem 12. In type II topological logic C2T with matching function C, if C is a Hausdorff matching with respect to knowledge base K = {Ai → Bi | i = 1, 2, · · · , n}, then each input A∗ can activate at most one knowledge Ai → Bi; therefore, B∗ ∈ UBi and A∗ → B∗ ∈ UAi→Bi.
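
The type II topological algorithm of multiple approximate reasoning and the Hausdorff matching condition can be tried out on a small finite example. The following Python sketch is only illustrative and rests on assumptions that are not part of the scheme itself: formulas are represented as plain strings, each neighborhood UAi and UBi is listed explicitly as a finite set, and the names KNOWLEDGE_BASE, activated, is_hausdorff_matching and candidate_outputs are hypothetical.

```python
from itertools import combinations

# Hypothetical finite model: formulas are strings, and every knowledge
# A_i -> B_i carries explicitly listed neighborhoods U_A and U_B.
KNOWLEDGE_BASE = [
    {"rule": "A1 -> B1", "U_A": {"A1", "A1'"}, "U_B": {"B1", "B1'"}},
    {"rule": "A2 -> B2", "U_A": {"A2", "A2'"}, "U_B": {"B2"}},
    {"rule": "A3 -> B3", "U_A": {"A3"}, "U_B": {"B3", "B3'"}},
]


def activated(kb, a_star):
    """Return the index set E of all knowledge whose U_A contains a_star."""
    return [i for i, k in enumerate(kb) if a_star in k["U_A"]]


def is_hausdorff_matching(kb):
    """Check that the input neighborhoods U_{A_i} are pairwise disjoint."""
    return all(
        kb[i]["U_A"].isdisjoint(kb[j]["U_A"])
        for i, j in combinations(range(len(kb)), 2)
    )


def candidate_outputs(kb, a_star):
    """Map each activated knowledge to the neighborhood U_B from which
    B_i* may be taken; an empty dict means the input is ignored."""
    return {kb[i]["rule"]: kb[i]["U_B"] for i in activated(kb, a_star)}


if __name__ == "__main__":
    print(is_hausdorff_matching(KNOWLEDGE_BASE))
    print(candidate_outputs(KNOWLEDGE_BASE, "A1'"))
    print(candidate_outputs(KNOWLEDGE_BASE, "Z"))
```

Because the input neighborhoods in this toy knowledge base are pairwise disjoint, no input can activate more than one knowledge, which is the situation described by Theorem 12. How the selected Bi∗ are aggregated is left open here, as in the text.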
5
Conclusion
We explore approximate reasoning in a logical framework. We propose the structure of type II topological logic C2T, the structure of the matching function C, and the structure of the matching neighborhood group. We investigate approximate reasoning in type II topological logic C2T with matching function C, develop the type II topological algorithms of simple approximate reasoning and multiple approximate reasoning, and introduce the essential characteristics of those schemes. The structure of type II regular topological logic C2T and the structure of the regular matching function C are also proposed in this paper. The type II completeness and type II perfectness of knowledge base K in type II regular topological logic C2T with regular matching function C are also presented. In the future, we will investigate type II approximate knowledge mass, type II knowledge universe and type II k-level knowledge base Kk. We will also build the model of a type II topological expert system with the function of automatic extension. Besides, we will study the structure and properties of the true k-level knowledge circle and the type II (k, j)-level knowledge base Kk,j, establish the model of an extended type II topological expert system, and research the type II (k, j)-level perfectness of the extended type II knowledge base. We also intend to give the structure of type II strong topological logic C2T and the structure of the strong matching function C. Multidimensional approximate reasoning, multiple multidimensional approximate reasoning and the corresponding approximate reasoning schemes in type II strong topological logic C2T with strong matching function C will also be investigated. In order to make our topological schemes more applicable, we significantly decrease the "topological requirements" and simultaneously increase the "number of open sets" in our research work.
References
1. Wang, G.J.: On the logic foundation of fuzzy reasoning. Information Sciences, 117 (1999) 47-88.
2. Wang, G.J., Leung, Y.: Integrated semantics and logic metric spaces. Fuzzy Sets and Systems, 136 (2003) 71-91.
3. Wang, G.J., Wang, H.: Non-fuzzy versions of fuzzy reasoning in classical logic. Information Sciences, 138 (2001) 211-236.
4. Ying, M.S.: A logic for approximate reasoning. The Journal of Symbolic Logic, 59 (1994) 830-837.
5. Ying, M.S.: Fuzzy reasoning under approximate match. Science Bulletin, 37 (1992) 1244-1245.
6. Ying, M.S.: Reasoning about probabilistic sequential programs in a probabilistic logic. Acta Informatica, 39 (2003) 315-389.
7. Zheng, Y.L.: Stratified construction of fuzzy propositional logic. Proceedings of International Conference on Fuzzy Information Processing, Tsinghua University Press, Springer Verlag, 1-2 (2003) 169-174.
8. Zheng, Y.L., Zhang, C.S., Yi, X.: Mamdaniean logic. Proceedings of IEEE International Conference on Fuzzy Systems, Budapest, Hungary, 1-3 (2004) 629-634.
Vagueness and Extensionality
Shunsuke Yatabe1 and Hiroyuki Inaoka2
1 Faculty of Engineering, Kobe University, Kobe 657-8501, Japan
2 The Graduate School of Humanities and Social Sciences, Kobe University, Kobe 657-8501, Japan
Abstract. We introduce a property of sets that can represent vagueness without using truth values. This property has received little attention in fuzzy set theory. We introduce it by analyzing a well-known philosophical argument by Gareth Evans. Interpreting ‘a is a vague object’ as ‘the Axiom of Extensionality is violated for a’ allows us to represent a vague object in Evans’s sense, even within classical logic, and of course within fuzzy logic.
1
Introduction
In fuzzy set theory, vagueness is represented by truth values: a set x is a vague object if there is a set y such that the truth value of y ∈ x is indeterminate, i.e. neither 0 nor 1. However, sets have a property which can be used to represent vagueness without using truth values; it has never been regarded as connected with vagueness in fuzzy set theory. In this paper, we focus our attention on it by analyzing a well-known philosophical argument which asserts the impossibility of vague objects. In his short paper [3], Gareth Evans defined vague objects as those having vague identity statements: a is a vague object if there is some object b such that a = b is of indeterminate truth value. Let us assume there can be a vague object in his sense in the world; we call this Evans’s Vagueness Assumption (EVA). He proceeded with his argument as follows. Let a = b be indeterminate; then

(I) ∇(a = b), i.e. a = b is indeterminate (assumption),
(II) λx[∇(a = x)]b, i.e. b is indeterminately equal to a (by (I)),
(III) ¬∇(a = a), i.e. a = a is determinate,
(IV) ¬λx[∇(a = x)]a, i.e. a is not indeterminately equal to a (by (III)),
(V) a ≠ b, i.e. a is not equal to b (from (II) and (IV)).

We note that ∇ϕ means that the truth value of ϕ is indeterminate. The conclusion (V) is derived by the definition of Leibniz equality, i.e. a = b if and only if ϕ(a) ↔ ϕ(b) for any ϕ, and (V) is ‘contradicting the assumption that the identity statement “a = b” is of indeterminate truth value’. Hence (I) must be rejected, that is to say, any identity statement has a determinate truth value. Therefore, he seems to have concluded that EVA doesn’t hold.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 263–266, 2005. © Springer-Verlag Berlin Heidelberg 2005
Many articles have been published for or against Evans’s proof. Many philosophers have defended EVA and denied his derivation. They implicitly supposed that classical logic is not a suitable framework for representing vagueness, so that a change of logic (e.g. to fuzzy logic) is required to represent a vague object¹. We also defend EVA; however, we show that we need not deny his derivation from (I) to (V), even within classical logic: interpreting ‘a = b is indeterminate’ as ‘a and b are extensionally equal (they share the same elements) but nevertheless a ≠ b holds’ gives us a model of it. This means that the Axiom of Extensionality is violated for a and b. The Axiom asserts that any set is determined precisely by its members, i.e. if x and y have the same members then x = y holds. So we can regard it as a representation of precision. In this sense, the violation of the Axiom represents a sort of vagueness. Many non-extensional set theories, i.e. set theories without the Axiom of Extensionality, have been proposed. In particular, Hajek and Hanikova’s Fuzzy Set Theory (FST) [5] was proposed to axiomatize our intuition of fuzzy sets. It proves that Leibniz equality is crisp (i.e. its truth value is 0 or 1) but extensional equality can be a fuzzy relation, so the Axiom cannot be valid. Such a violation of the Axiom has been regarded as introduced merely for technical reasons; however, it seems to suggest that it is a necessary feature of fuzzy sets implicitly connoted by our intuition itself. It seems that no one studying vagueness has mentioned that the Axiom of Extensionality is connected with the representation of vagueness. We must investigate the phenomena inherently found in examining such vagueness.
2
Representation of Vagueness and Extensionality
Hereafter we employ set theory within classical logic. The well-known equality relations in set theory are as follows:

Leibniz equality: x = y iff (∀z)[x ∈ z ↔ y ∈ z],
Extensional equality: x =ext y iff (∀z)[z ∈ x ↔ z ∈ y].

Of course x = y → x =ext y holds. It is necessary to consider the Axiom of Extensionality when we think about identity relations: the Axiom of Extensionality guarantees that, for any sets x and y, x =ext y → x = y.

2.1 Vagueness as the Violation of the Axiom of Extensionality
In this subsection, we introduce our interpretation of indeterminacy. First we analyze Evans’s proof. Technically speaking, his proof seems to have three implicit assumptions, as follows:

(i) for every a, a = a has a definite truth value (¬∇(a = a)),
(ii) the Diversity of the Dissimilar (DD): if object a has a property that b lacks, then you can infer a ≠ b,
(iii) ⊢ ϕ implies ⊢ Δϕ (as the generalization law in S5 modal logic).

For more details, see [6]. We note that Δϕ means that the truth value of ϕ is determinate. (ii) is used to infer (V) from (II) and (IV). Now, Δ(a ≠ b) is inferred from (V) and (iii). By duality (¬Δ¬ϕ ↔ ∇ϕ), ¬∇(a = b) is inferred from Δ(a ≠ b). This contradicts (I).

We don’t assume (iii): (i) and (ii) are necessary to derive (V) from (I), but (iii) has nothing to do with the derivation itself. What Evans proved is merely that the vague identity statement (I) implies (V) if we don’t admit (iii). We call this implication, ∇(a = b) → a ≠ b, the Evans Conditional (EC), as in [2]. We define vague equality so as to satisfy EC, and show the existence of a model which witnesses the consistency of EVA and EC. We propose to define vague equality as follows:

Definition 1. a = b is indeterminate iff a =ext b & a ≠ b.

This means that the Axiom of Extensionality is violated for a and b. For more details about the justification of this consequence, see [10]. As we saw, a vague object in Evans’s sense is defined by using vague identity.

Definition 2. a is a vague object iff the Axiom of Extensionality is violated for a, i.e. (∃b)[a =ext b & a ≠ b].

Now EC implies a contradiction only when we assume the Axiom of Extensionality. Otherwise ∇(a = b) implies a ≠ b without implying a = b. So any model of set theory without the Axiom of Extensionality is a model of EVA and EC.

¹ One typical approach is to analyze his proof within many-valued logic. For example, Jack Copeland tried to prove that the derivation of (V) from (I) isn’t valid within fuzzy logic [2]. Michael Tye represented vagueness within many-valued logic [9]. However, it has been objected that ‘the writers who adopt this strategy rarely provide much argument for the need for a many-valued logic’ [6]. Another approach is within modal logic. It requires adding new modal operators to represent indeterminacy. For example, Ken Akiba defined a vague object as a transworld object [1].

2.2 A Fuzzy Set Theory as an Example of Non-extensional Set Theory
We can easily generalize Definition 2: the violation of the Axiom of Extensionality represents vagueness not only within classical logic but also within a greater variety of logics. So, within any logic, we maintain that a set theory without the Axiom of Extensionality is required to represent vague objects. Many non-extensional set theories have been proposed by now². In particular, FST is a set theory within the framework of fuzzy logic with the operator Δ, which means ‘determinately true’, i.e. its truth value is 1 [5]. It is in the style of ZF, and it seems to be an attempt to axiomatize our intuition of fuzzy sets.

In FST, the Axiom of Extensionality cannot be valid. This is because Leibniz equality becomes crisp, whereas the truth value of extensional equality can be indeterminate. We note that FST has the Axiom of Pairing, which guarantees that there is a crisp set {x, y} for any sets x, y, where X is a crisp set if and only if x ∈ X is crisp for any x. For any set x, fix the crisp set {x} by the Axiom of Pairing: {x} = {x, x}. Then for any set y, the truth value of y ∈ {x} is 1 or 0: if it is 1, then so is the truth value of x = y; otherwise the truth value of x = y is 0. Here the Axiom of Extensionality holds for crisp sets, but it might be violated for some fuzzy sets. Such a violation of the Axiom of Extensionality has been regarded as introduced merely for technical reasons, but we justify it positively: we can represent two different sorts of vagueness simultaneously in FST. Furthermore, the violation seems to suggest that it is a necessary feature of fuzzy sets implicitly connoted by our intuition of fuzziness itself. In this sense, Definition 2 can be regarded as an isolation of some aspect of fuzziness so that we can represent it within a greater variety of logics.

² Traditionally, it has been studied within intuitionistic logic; one of the most famous results is due to Harvey Friedman [4]. There are a few studies generalizing ZF, first within intuitionistic logic and then within a strengthened logic referred to as Gödel logic [8]. Gödel logic has the truth set [0, 1], as fuzzy logic does, but it has a few new logical constants.
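
Definition 2 can be illustrated on a very small finite structure. The following Python sketch is a toy model only, not a model of any set theory discussed here: the class VSet, the four objects and the restriction of the quantifier (∀z) to a finite universe are assumptions made for illustration. Leibniz equality is checked as "x ∈ z ↔ y ∈ z for every z in the universe" and extensional equality as "same members".

```python
class VSet:
    """A named collection of members; identity is not fixed by the members."""

    def __init__(self, name, members=()):
        self.name = name
        self.members = frozenset(members)


def ext_equal(x, y):
    """Extensional equality: x and y have exactly the same members."""
    return x.members == y.members


def leibniz_equal(x, y, universe):
    """Leibniz equality relative to the finite universe:
    x and y belong to exactly the same sets z."""
    return all((x in z.members) == (y in z.members) for z in universe)


def is_vague(a, universe):
    """Definition 2: some b is extensionally equal to a but not equal to a."""
    return any(
        ext_equal(a, b) and not leibniz_equal(a, b, universe)
        for b in universe
    )


# Hypothetical universe: a and b share the single member e,
# but only a is a member of c, so a and b are not Leibniz-equal.
e = VSet("e")
a = VSet("a", [e])
b = VSet("b", [e])
c = VSet("c", [a])
UNIVERSE = [e, a, b, c]

if __name__ == "__main__":
    for x in UNIVERSE:
        print(x.name, is_vague(x, UNIVERSE))
```

In this universe a and b witness the violation of the Axiom of Extensionality for each other, so both count as vague objects in the sense of Definition 2, while e and c do not.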
3
Conclusion
In this paper, we gave an example of a new way of representing vagueness in set theory by analyzing Evans’s philosophical argument on vague objects. We defined a vague object in his sense as an object for which the Axiom of Extensionality doesn’t hold, so we could construct a model of EVA and EC within classical logic without adding a new operator to represent indeterminacy. Many non-extensional set theories have been proposed, and the violation of the Axiom in FST seems to suggest that it is a necessary feature of fuzziness. We must investigate the phenomena inherently found in examining such vagueness.
References
1. Akiba, Ken. 2000. Vagueness as a modality. The Philosophical Quarterly 50: 359-70.
2. Copeland, B. Jack. 1995. On Vague Objects, Fuzzy Logic and Fractal Boundaries. Southern Journal of Philosophy 33: 83-95.
3. Evans, Gareth. 1978. Can there be vague objects? Analysis 38: 208. Reprinted in Vagueness: a reader, ed. R. Keefe, P. Smith, 317. Cambridge, Mass.: MIT Press.
4. Friedman, Harvey. 1973. The consistency of classical set theory relative to a set theory with intuitionistic logic. Journal of Symbolic Logic 38: 315-319.
5. Hajek, Petr. Hanikova, Zuzana. 2003. A development of set theory in fuzzy logic. Theory and applications of multiple-valued logic, 273-85. Heidelberg: Physica.
6. Keefe, Rosanna. Smith, Peter. 1997. Introduction: theories of vagueness. In Vagueness: a reader, ed. R. Keefe, P. Smith, 1-57. Cambridge, Mass.: MIT Press.
7. Noonan, H.W. 2004. Are there vague objects? Analysis 64: 131-34.
8. Titani, S. Takeuti, G. 1992. Fuzzy logic and fuzzy set theory. Arch. Math. Logic 32: 1-32.
9. Tye, Michael. 1994. Sorites paradoxes and the semantics of vagueness. Reprinted in Vagueness: a reader, 281-93.
10. Yatabe, S. Inaoka, H. On Evans’s vague object from set theoretic viewpoint. Preprint.
Using Fuzzy Analogical Reasoning to Refine the Query Answers for Relational Databases with Imprecise Information
Z.M. Ma1, Li Yan1, and Gui Li2
1 Northeastern University, Shenyang, Liaoning 110004, China
2 Liaoning Branch of China Netcom (Group) Corporation LTD., Shenyang, Liaoning 110044, China
Abstract. In this paper, we use the notion of equivalence degree of fuzzy data, by which we can mine the rules of fuzzy functional dependencies from the fuzzy relational databases. Following the rules of fuzzy functional dependencies, we can apply the frame of analogical reasoning to refine the imprecise query answer for the relational databases with imprecise information.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 267–275, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction
Analogy is an important inference tool in human cognition and is a powerful computation tool for general inference in artificial intelligence and decision-making. Approximate reasoning refers to such an inferring process that an object S (the source) has properties P and Q, respectively, and another object T (the target) shares the property P, then we can infer that T may also possess the property Q [4]. In other words, S and T have the same property Q if S and T are matched on P to each other. This process can be expressed by the following logical rule.

IF P (S) and Q (S) and P (T) THEN Q (T)

Note that the properties P and Q must have an associated relationship in applying the approximate reasoning above. So there are two issues involved in approximate reasoning. One is which property P can determine the property Q. The other is which object S has the same property as the object T on property P, such that T has the same property as S on property Q.

In relational databases, the relationship between properties P and Q in approximate reasoning is essentially the constraint of functional dependency between P and Q, where P and Q can be seen as attributes. According to the semantics of functional dependency, two tuples, i.e., objects S and T, must have the same attribute values on Q if they have the same attribute values on P. Therefore, in relational databases, approximate reasoning can be emulated by functional dependency.

In real-world applications, information is often vague or ambiguous. Different kinds of imperfect information have been extensively introduced into relational databases. Fuzzy information has been extensively investigated in the context of the relational model. Since fuzzy relational databases contain fuzzy information, the query answers may be imprecise or even null [2, 3]. Applying approximate reasoning, it is clear that we can refine the imprecise query answer of tuple T on attribute Q when there exists a more precise attribute value of tuple S on attribute Q. The refinement result is that the value of T on Q is the same as the value of S on Q. However, in fuzzy relational databases, the associated relationship between P and Q (i.e., the functional dependency P → Q) may be fuzzy. In addition, the match between S and T on P is generally fuzzy.

In this paper, we use the notion of equivalence degree of fuzzy data to mine the rules of fuzzy functional dependencies from fuzzy relational databases, so that we know which property P fuzzily determines the property Q. We also need to determine which object S has the highest match degree with the object T on property P, such that T has the same property Q as S. On this basis, we can apply the frame of analogical reasoning to refine the imprecise query answers for relational databases with imprecise information.

The remainder of this paper is organized as follows. Section 2 gives the basic knowledge of fuzzy sets and fuzzy relational databases. Section 3 presents the method of semantic measure of fuzzy data. Section 4 investigates fuzzy functional dependency in fuzzy relational databases. Section 5 proposes a framework to refine imprecise query answers using fuzzy analogical reasoning. Section 6 summarizes this paper.
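
The rule "IF P (S) and Q (S) and P (T) THEN Q (T)", combined with the choice of the source S that best matches the target T on P, can be sketched in a few lines of Python. This is only a sketch under stated assumptions: tuples are modelled as dictionaries, the function name refine_by_analogy and the sample data are hypothetical, and the matching function is passed in as a parameter (a crisp 0/1 match is used below in place of a fuzzy match degree).

```python
def refine_by_analogy(target, sources, match, p, q):
    """Return the value on attribute q of the source that best matches
    the target on attribute p, or None if no source matches at all."""
    best = max(sources, key=lambda s: match(s[p], target[p]), default=None)
    if best is None or match(best[p], target[p]) == 0:
        return None
    return best[q]


def crisp_match(x, y):
    """Stand-in match degree: 1.0 for equal values, 0.0 otherwise."""
    return 1.0 if x == y else 0.0


# Hypothetical sample relation with attributes P = "job", Q = "salary".
SOURCES = [
    {"job": "engineer", "salary": "high"},
    {"job": "clerk", "salary": "medium"},
]
TARGET = {"job": "clerk", "salary": None}

if __name__ == "__main__":
    print(refine_by_analogy(TARGET, SOURCES, crisp_match, "job", "salary"))
```

Replacing crisp_match by a graded match degree turns the same skeleton into a fuzzy analogical inference; whether copying Q is justified still depends on P (fuzzily) determining Q.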
2 Fuzzy Sets and Fuzzy Relational Databases
Fuzzy data is originally described as a fuzzy set by Zadeh [15]. Let U be a universe of discourse; then a fuzzy value on U is characterized by a fuzzy set F in U. A membership function μF: U → [0, 1] is defined for the fuzzy set F, where μF (u), for each u ∈ U, denotes the degree of membership of u in the fuzzy set F. Thus the fuzzy set F is described as follows.

F = {μ (u1)/u1, μ (u2)/u2, ..., μ (un)/un}

When the μF (u) above is explained to be a measure of the possibility that a variable X has the value u in this approach, where X takes values in U, a fuzzy value is described by a possibility distribution πX [16]. Let πX and F be the possibility distribution representation and the fuzzy set representation for a fuzzy value, respectively. It is apparent that πX = F is true [11].

In addition, fuzzy data can be represented by similarity relations on domain elements [1], in which the fuzziness comes from the similarity relations between two values in a universe of discourse, not from the status of an object itself. Similarity relations are thus used to describe the similarity degree of two values from the same universe of discourse. A similarity relation Sim on the universe of discourse U is a mapping U × U → [0, 1] such that
(a) for ∀x ∈ U, Sim (x, x) = 1, (reflexivity)
(b) for ∀x, y ∈ U, Sim (x, y) = Sim (y, x), and (symmetry)
(c) for ∀x, y, z ∈ U, Sim (x, z) ≥ maxy (min (Sim (x, y), Sim (y, z))). (transitivity)
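
The three conditions (a) to (c) can be checked mechanically for a relation given on a finite universe. The following Python sketch assumes that the relation is stored as a nested dictionary sim[x][y]; the function name and the two sample relations are hypothetical and serve only as an illustration.

```python
def is_similarity_relation(universe, sim, tol=1e-9):
    """Check reflexivity, symmetry and max-min transitivity of sim on universe."""
    elems = list(universe)
    reflexive = all(abs(sim[x][x] - 1.0) <= tol for x in elems)
    symmetric = all(
        abs(sim[x][y] - sim[y][x]) <= tol for x in elems for y in elems
    )
    transitive = all(
        sim[x][z] + tol >= max(min(sim[x][y], sim[y][z]) for y in elems)
        for x in elems
        for z in elems
    )
    return reflexive and symmetric and transitive


U = ["a", "b", "c"]

# Satisfies (a), (b) and (c).
SIM_OK = {
    "a": {"a": 1.0, "b": 0.8, "c": 0.5},
    "b": {"a": 0.8, "b": 1.0, "c": 0.5},
    "c": {"a": 0.5, "b": 0.5, "c": 1.0},
}

# Reflexive and symmetric, but Sim(a, c) = 0.2 < min(Sim(a, b), Sim(b, c)) = 0.8.
SIM_NOT_TRANSITIVE = {
    "a": {"a": 1.0, "b": 0.8, "c": 0.2},
    "b": {"a": 0.8, "b": 1.0, "c": 0.8},
    "c": {"a": 0.2, "b": 0.8, "c": 1.0},
}

if __name__ == "__main__":
    print(is_similarity_relation(U, SIM_OK))
    print(is_similarity_relation(U, SIM_NOT_TRANSITIVE))
```

A relation such as SIM_NOT_TRANSITIVE, which is reflexive and symmetric only, corresponds to the weaker kind of relation that needs no transitivity condition.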
In connection with the three types of fuzzy data representations, there exist two basic extended data models for fuzzy relational databases. One of the data models is based on similarity relations [1], proximity relations [13], or resemblance relations [12]. The other is based on possibility distributions [10, 11]. The latter can further be classified into two categories, i.e. tuples associated with possibilities and attribute values represented by possibility distributions [11]. The form of an n-tuple in each of the above-mentioned fuzzy relational models can be expressed, respectively, as t = <p1, p2, ..., pn>, t = <a1, a2, ..., an, d> and t = <πA1, πA2, ..., πAn>, where pi ⊆ Di with Di being the domain of attribute Ai, ai ∈ Di, d ∈ (0, 1], πAi is the possibility distribution of attribute Ai on its domain Di, and πAi (x), x ∈ Di, denotes the possibility that x is the actual value of t [Ai].

It is clear that, based on the above-mentioned basic fuzzy relational models, there should be one type of extended fuzzy relational model [12], where possibility distributions and resemblance relations arise in a relational database simultaneously. The focus of this paper is on such fuzzy relational databases, and it is assumed that the possibility of each tuple in a fuzzy relation is exactly 1.

Definition: A fuzzy relation r on a relational schema R (A1, A2, ..., An) is a subset of the Cartesian product Dom (A1) × Dom (A2) × ... × Dom (An), where Dom (Ai) may be a fuzzy subset or even a set of fuzzy subsets, and there is a resemblance relation on Dom (Ai). A resemblance relation Res on Dom (Ai) is a mapping Dom (Ai) × Dom (Ai) → [0, 1] such that
(a) for all x in Dom (Ai), Res (x, x) = 1, (reflexivity)
(b) for all x, y in Dom (Ai), Res (x, y) = Res (y, x). (symmetry)
3 Semantic Measure of Fuzzy Data
In [7], the notions of semantic inclusion degree and semantic equivalence degree were proposed for the measure of fuzzy data. For two fuzzy data πA and πB, the semantic inclusion degree SID (πA, πB) denotes the degree to which πA semantically includes πB, and the semantic equivalence degree SED (πA, πB) denotes the degree to which πA and πB are equivalent to each other. Based on possibility distributions and resemblance relations, the definitions for calculating the semantic inclusion degree and the semantic equivalence degree of two fuzzy data are given as follows.

Definition: Let U = {u1, u2, …, un} be the universe of discourse. Let πA and πB be two fuzzy data on U based on possibility distributions, and let πA (ui), ui ∈ U, denote the possibility that ui is true. Let Res be a resemblance relation on domain U and α (0 ≤ α ≤ 1) be a threshold corresponding to Res. Then

SIDα (πA, πB) = Σ_{i=1}^{n} min_{ui, uj ∈ U and ResU (ui, uj) ≥ α} (πB (ui), πA (uj)) / Σ_{i=1}^{n} πB (ui)

SEDα (πA, πB) = min (SIDα (πA, πB), SIDα (πB, πA))
Example. Let π1 = {1.0/a, 0.95/b, 0.9/c} and π2 = {0.95/a, 0.9/b, 1.0/d, 0.3/e} be two fuzzy data on domain U = {a, b, c, d, e, f} and let Res be a resemblance relation on U given in Figure 1. Let threshold α = 0.9. Then
SID (π1, π2) = {0.95 + 0.9 + 0.9}/{0.95 + 0.9 + 1.0 + 0.3} = 0.873,
SID (π2, π1) = {0.95 + 0.9 + 0.9}/{1.0 + 0.95 + 0.9} = 0.965,
and thus
SED (π1, π2) = min (SID (π1, π2), SID (π2, π1)) = min (0.873, 0.965) = 0.873.

Res   a     b     c     d     e     f
a     1.0   0.1   0.4   0.3   0.1   0.1
b           1.0   0.2   0.3   0.2   0.2
c                 1.0   0.95  0.5   0.3
d                       1.0   0.3   0.1
e                             1.0   0.4
f                                   1.0

Fig. 1. A Resemblance Relation
Now let us consider the following two particular cases, namely, between π3 = {1.0/a, 1.0/b, 1.0/c} and π4 = {1.0/a, 1.0/b, 1.0/d}, and between π5 = {0.9/a, 1.0/b, 0.8/c} and π6 = {0.3/a, 0.4/b, 0.2/d}. In the same way as above, we have
SID (π3, π4) = {1.0 + 1.0 + 1.0}/{1.0 + 1.0 + 1.0} = 1.0,
SID (π4, π3) = {1.0 + 1.0 + 1.0}/{1.0 + 1.0 + 1.0} = 1.0,
SID (π5, π6) = {0.3 + 0.4 + 0.2}/{0.3 + 0.4 + 0.2} = 1.0,
SID (π6, π5) = {0.3 + 0.4 + 0.2}/{0.9 + 1.0 + 0.8} = 0.333,
and thus
SED (π3, π4) = min (SID (π3, π4), SID (π4, π3)) = min (1.0, 1.0) = 1.0;
SED (π5, π6) = min (SID (π5, π6), SID (π6, π5)) = min (1.0, 0.333) = 0.333.
It follows that π5 semantically includes π6 whereas π6 does not include π5.

The notion of the semantic inclusion (or equivalence) degree of attribute values can be extended to the semantic inclusion (or equivalence) degree of tuples. Let ti and tj be two tuples in fuzzy relational instance r over schema R (A1, A2, …, An). The semantic inclusion degree of tuples ti and tj is denoted
SIDα (ti, tj) = min {SIDα (ti [A1], tj [A1]), SIDα (ti [A2], tj [A2]), …, SIDα (ti [An], tj [An])}.
The semantic equivalence degree of tuples ti and tj is denoted
SEDα (ti, tj) = min {SEDα (ti [A1], tj [A1]), SEDα (ti [A2], tj [A2]), …, SEDα (ti [An], tj [An])}.
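
The calculation of SID and SED can be sketched in Python as follows. The sketch makes one interpretive assumption that the definition leaves implicit: when several uj resemble ui with degree at least α, the largest value of min (πB (ui), πA (uj)) is taken. Under this reading the sketch reproduces the values of the examples above. The names sid, sed, FIG1 and res_fig1 are hypothetical; fuzzy data are stored as dictionaries from elements to possibilities, and the resemblance degrees are those of Fig. 1.

```python
def sid(pi_a, pi_b, res, alpha):
    """Semantic inclusion degree: degree to which pi_a includes pi_b."""
    total = sum(pi_b.values())
    if total == 0:
        return 0.0
    covered = 0.0
    for u_i, b_val in pi_b.items():
        # Assumption: among all u_j resembling u_i, take the best match.
        covered += max(
            (min(b_val, a_val)
             for u_j, a_val in pi_a.items()
             if res(u_i, u_j) >= alpha),
            default=0.0,
        )
    return covered / total


def sed(pi_a, pi_b, res, alpha):
    """Semantic equivalence degree: the smaller of the two inclusion degrees."""
    return min(sid(pi_a, pi_b, res, alpha), sid(pi_b, pi_a, res, alpha))


# Off-diagonal resemblance degrees of Fig. 1 (the relation is symmetric).
FIG1 = {
    ("a", "b"): 0.1, ("a", "c"): 0.4, ("a", "d"): 0.3, ("a", "e"): 0.1,
    ("a", "f"): 0.1, ("b", "c"): 0.2, ("b", "d"): 0.3, ("b", "e"): 0.2,
    ("b", "f"): 0.2, ("c", "d"): 0.95, ("c", "e"): 0.5, ("c", "f"): 0.3,
    ("d", "e"): 0.3, ("d", "f"): 0.1, ("e", "f"): 0.4,
}


def res_fig1(x, y):
    """Resemblance relation of Fig. 1, with Res(x, x) = 1."""
    if x == y:
        return 1.0
    return FIG1.get((x, y), FIG1.get((y, x), 0.0))


PI1 = {"a": 1.0, "b": 0.95, "c": 0.9}
PI2 = {"a": 0.95, "b": 0.9, "d": 1.0, "e": 0.3}

if __name__ == "__main__":
    print(round(sid(PI1, PI2, res_fig1, 0.9), 3))  # 0.873
    print(round(sid(PI2, PI1, res_fig1, 0.9), 3))  # 0.965
    print(round(sed(PI1, PI2, res_fig1, 0.9), 3))  # 0.873
```

The degree for whole tuples is then the minimum of these attribute-wise degrees over all attributes of the schema.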
4 Fuzzy Functional Dependency in Fuzzy Relational Databases

Functional dependencies (FDs), one of the most important kinds of data dependencies in relational databases, express the dependency relationships among attribute values in a relation [5]. In classical relational databases, functional dependencies can be defined as follows [9].

Definition: For a classical relation r (R), in which R denotes the set of attributes, we say r satisfies the functional dependency FD: X → Y, where XY ⊆ R, if (∀ t ∈ r) (∀ s ∈ r) (t [X] = s [X] ⇒ t [Y] = s [Y]).

For a FD: X → Y, t [X] = s [X] implies t [Y] = s [Y]. Such knowledge exists in relational databases and, when it is not known in advance, can easily be mined from precise databases. Discovering such knowledge in fuzzy relational databases, however, is difficult. Fuzzy functional dependencies (FFDs) can represent the dependency relationships among attribute values in fuzzy relations, such as "the salary almost depends on the job position and experience", and have been extensively discussed in the context of fuzzy relational databases [9, 6, 11]. The following gives the definition of fuzzy functional dependency according to the related work.

Definition: For a relation instance r (R), where R denotes the schema, its attribute set is denoted by U, and X, Y ⊆ U, we say r satisfies the fuzzy functional dependency FFD: X ⇝ Y if (∀t ∈ r) (∀s ∈ r) (SEDα (t [X], s [X]) ≤ SEDα (t [Y], s [Y])).

Example. Consider a fuzzy relation instance r in Table 1. Assume that the attribute domains are Dom (X) = {a, b, c, d, e} and Dom (Y) = {f, g, h, i, j}. There are two resemblance relations Res (X) and Res (Y) on X and Y, shown in Fig. 2 and Fig. 3, respectively. Let the two thresholds on Res (X) and Res (Y) be α1 = 0.90 and α2 = 0.95, respectively.

Table 1. Fuzzy relation instance r

K      X                        Y
1001   {0.7/a, 0.4/b, 0.5/d}    {0.9/f, 0.6/g, 1.0/h}
1002   {0.5/a, 0.4/c, 0.8/d}    {0.6/g, 0.9/h, 0.9/i}
1003   {0.3/d, 0.8/e}           {0.6/h, 0.4/i, 0.1/j}
Let t, s, and u denote the tuples with K = 1001, 1002, and 1003, respectively. Since SE (t [X], s [X]) = min (SID (t [X], s [X]), SID (s [X], t [X])) = min (0.824, 0.875) = 0.824 and SE (t [Y], s [Y]) = min (SID (t [Y], s [Y]), SID (s [Y], t [Y])) = min (1.0, 0.96) = 0.96, the inequality SE (t [X], s [X]) ≤ SE (t [Y], s [Y]) holds. Similarly, we have SE (t [X], u [X]) ≤ SE (t [Y], u [Y]) and SE (s [X], u [X]) ≤ SE (s [Y], u [Y]). Hence FFD: X ⇝ Y holds in r.
Z.M. Ma, L. Yan, and G. Li

Res (X)   a      b      c      d      e
a         1.0    0.2    0.3    0.2    0.4
b                1.0    0.92   0.4    0.1
c                       1.0    0.1    0.3
d                              1.0    0.2
e                                     1.0

Fig. 2. Resemblance relation Res (X) on X

Res (Y)   f      g      h      i      j
f         1.0    0.3    0.2    0.96   0.2
g                1.0    0.4    0.2    0.3
h                       1.0    0.3    0.1
i                              1.0    0.4
j                                     1.0

Fig. 3. Resemblance relation Res (Y) on Y
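The numbers in the example can be reproduced with a short script. This is only a sketch: the formal definition of SID is given earlier in the paper, and the form used here (for each element of the included distribution, the best min-membership among sufficiently resembling elements, normalised by the total membership of the included distribution) is an assumption inferred from the worked values. The helper names `make_res`, `sid`, `se`, and `ffd_holds` are hypothetical.

```python
from itertools import combinations

def make_res(domain, upper_rows):
    """Build a symmetric resemblance lookup from upper-triangular rows."""
    res = {}
    for i, u in enumerate(domain):
        for offset, value in enumerate(upper_rows[i]):
            v = domain[i + offset]
            res[(u, v)] = res[(v, u)] = value
    return res

def sid(pa, pb, res, alpha):
    """Assumed form of SID_alpha(pa, pb): degree to which pa includes pb."""
    total = 0.0
    for u, mu_b in pb.items():
        # Elements of pa that resemble u at least to degree alpha.
        matches = [min(mu_a, mu_b) for v, mu_a in pa.items() if res[(u, v)] >= alpha]
        total += max(matches, default=0.0)
    return total / sum(pb.values())

def se(pa, pb, res, alpha):
    """Semantic equivalence degree: the smaller of the two inclusion degrees."""
    return min(sid(pa, pb, res, alpha), sid(pb, pa, res, alpha))

# Resemblance relations of Fig. 2 and Fig. 3 (upper triangles).
RES_X = make_res("abcde", [[1.0, 0.2, 0.3, 0.2, 0.4], [1.0, 0.92, 0.4, 0.1],
                           [1.0, 0.1, 0.3], [1.0, 0.2], [1.0]])
RES_Y = make_res("fghij", [[1.0, 0.3, 0.2, 0.96, 0.2], [1.0, 0.4, 0.2, 0.3],
                           [1.0, 0.3, 0.1], [1.0, 0.4], [1.0]])

# Table 1: K -> (X value, Y value).
TABLE_1 = {
    1001: ({"a": 0.7, "b": 0.4, "d": 0.5}, {"f": 0.9, "g": 0.6, "h": 1.0}),
    1002: ({"a": 0.5, "c": 0.4, "d": 0.8}, {"g": 0.6, "h": 0.9, "i": 0.9}),
    1003: ({"d": 0.3, "e": 0.8}, {"h": 0.6, "i": 0.4, "j": 0.1}),
}

def ffd_holds(rows, alpha_x=0.90, alpha_y=0.95):
    """Check SE on X <= SE on Y for every pair of tuples."""
    return all(se(x1, x2, RES_X, alpha_x) <= se(y1, y2, RES_Y, alpha_y)
               for (x1, y1), (x2, y2) in combinations(rows, 2))

if __name__ == "__main__":
    (tx, ty), (sx, sy), _ = TABLE_1.values()
    print(round(se(tx, sx, RES_X, 0.90), 3), round(se(ty, sy, RES_Y, 0.95), 3))
    print(ffd_holds(list(TABLE_1.values())))
```

Under this assumed form, the pair (1001, 1002) gives 0.824 on X and 0.96 on Y, matching the values stated in the example.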
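Fuzzy functional dependencies have the following property.

Proposition: A classical functional dependency FD satisfies the definition of FFD.

Proof: Let FD: X → Y hold in r (R). Then for ∀ t ∈ r and ∀ s ∈ r, t [X] = s [X] ⇒ t [Y] = s [Y]. If t [X] = s [X], then SED (t [X], s [X]) = SED (t [Y], s [Y]) = 1. Otherwise, for distinct crisp values compared under the identity relation, SED (t [X], s [X]) = 0 ≤ SED (t [Y], s [Y]). In both cases the condition of FFD is satisfied.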
5 Refining Imprecise Query Answers

Approximate reasoning refers to an inference process of the following kind: if an object S (the source) has properties P and Q, and another object T (the target) shares the property P, then we can infer that T may also possess the property Q. In other words, S and T have the same property Q if S and T match each other on P. This process can be expressed by the following logical rule.

IF P (S) and Q (S) and P (T) THEN Q (T)

Two issues are involved in approximate reasoning. One is which property P can determine the property Q. The other is which object S has the same property as the object T on property P, such that T has the same property as S on property Q. In order to use approximate reasoning to refine the query answers for relational databases with imprecise information, we need to determine which property P imprecisely determines the property Q and which object S has the highest match degree with the object T on property P, such that T has the same property Q as S. In relational databases, the relationship between properties P and Q of approximate reasoning is essentially the constraint of functional dependency between P and Q, where P and Q can be seen as attributes. The associated relationship between P and Q may be fuzzy, and the match between S and T on P is generally fuzzy. Approximate reasoning under such conditions is called fuzzy approximate reasoning.

Let a fuzzy database relation with imprecise data be r (A1, A2, …, An), which consists of tuples t1, t2, …, tm. Assume that we have an imprecise query answer tm [An] and we would like to refine this query answer using approximate reasoning. For this purpose, first, we need to know which attribute Ai (1 ≤ i ≤ n - 1) can determine the attribute An. In other words, we have to mine a rule of fuzzy functional dependency Ai ⇝ An (1 ≤ i ≤ n - 1). Second, for the attribute Ai found, we need to know which tuple tj (1 ≤ j ≤ m - 1) has the highest equivalence degree with the tuple tm on attribute Ai. On the basis of that, we can replace tm [An] with tj [An].

5.1 Mining Fuzzy Functional Dependency

According to the definition of fuzzy functional dependency, the requirement SED (tj [Ai], tk [Ai]) ≤ SED (tj [An], tk [An]) (1 ≤ j, k ≤ m – 1) must be satisfied in a relation r (A1, A2, …, An) with imprecise information if FFD: Ai ⇝ An holds. In order to discover such an attribute Ai from the relation r, we give the following algorithm.

For i := 1 to n – 1 do
  For j := 1 to m – 1 do
    For k := j + 1 to m – 1 do
      If SED (tj [Ai], tk [Ai]) > SED (tj [An], tk [An]) then go to Next i;
    Next k;
  Next j;
  Output Ai;
Next i.
For any output attribute Ai, FFD: Ai ⇝ An must hold.

5.2 Measuring Equivalence Degree Between Attribute Values

Assume that FFD: Ai ⇝ An holds. Then we should find a tuple tj (1 ≤ j ≤ m – 1) from r such that for any tuple tk (1 ≤ k ≤ m – 1 and k ≠ j) in r, SED (tj [Ai], tm [Ai]) ≥ SED (tk [Ai], tm [Ai]). In the following we give the corresponding algorithm.

t := t1; x := SED (t1 [Ai], tm [Ai]);
For j := 2 to m – 1 do
  If SED (tj [Ai], tm [Ai]) > x then { x := SED (tj [Ai], tm [Ai]); t := tj };
Next j.
From the algorithm above, it can be seen that tuple t has the highest equivalence degree with tuple tm on attribute Ai.
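The two algorithms of Sections 5.1 and 5.2 can be sketched together in Python. This is an illustrative sketch, not the authors' implementation: the function names `mine_ffd` and `best_match`, the sample relation, and the simplified `sed` (identity resemblance, no threshold) are all assumptions made so that the example is self-contained.

```python
from itertools import combinations

def sed(pa, pb):
    """Toy equivalence degree for possibility distributions given as dicts.

    Assumption: identity resemblance, so only identical elements match.
    """
    common = sum(min(pa[u], pb[u]) for u in pa.keys() & pb.keys())
    return min(common / sum(pa.values()), common / sum(pb.values()))

def mine_ffd(tuples, attrs, target):
    """Section 5.1: attributes Ai whose SED never exceeds the SED on the target.

    Only t1..t(m-1) are compared; the last tuple tm holds the imprecise answer.
    """
    known = tuples[:-1]
    found = []
    for a in attrs:
        if all(sed(tj[a], tk[a]) <= sed(tj[target], tk[target])
               for tj, tk in combinations(known, 2)):
            found.append(a)
    return found

def best_match(tuples, attr):
    """Section 5.2: first tuple among t1..t(m-1) with the highest SED to tm."""
    tm = tuples[-1]
    best, x = tuples[0], sed(tuples[0][attr], tm[attr])
    for tj in tuples[1:-1]:
        d = sed(tj[attr], tm[attr])
        if d > x:  # strict comparison, as in the pseudocode
            best, x = tj, d
    return best, x

# Hypothetical relation r(A1, A2, A3); the last tuple has an imprecise A3.
relation = [
    {"A1": {"x": 1.0}, "A2": {"p": 1.0}, "A3": {"lo": 1.0}},
    {"A1": {"x": 1.0}, "A2": {"q": 1.0}, "A3": {"lo": 1.0}},
    {"A1": {"y": 1.0}, "A2": {"p": 1.0}, "A3": {"hi": 1.0}},
    {"A1": {"x": 1.0}, "A2": {"q": 1.0}, "A3": {"lo": 1.0, "hi": 1.0}},
]

determinants = mine_ffd(relation, ["A1", "A2"], "A3")
source, degree = best_match(relation, determinants[0])
refined_answer = source["A3"]  # simplest refinement: replace tm[A3] by t[A3]

if __name__ == "__main__":
    print(determinants, degree, refined_answer)
```

In this toy relation A2 is rejected because the first and third tuples agree on A2 but not on A3, while A1 passes every pairwise check, so the imprecise answer {lo, hi} is replaced by {lo}.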
5.3 Refining Imprecise Query Answer

Following the idea of analogical reasoning discussed above, we can infer that tm [An] should be equivalent to t [An], because FFD: Ai ⇝ An holds and tm [Ai] is equivalent to t [Ai] with the highest equivalence degree. If t [An] is more precise than tm [An], we can replace the query answer tm [An] with t [An] as an approximate answer. In this way, the query answer is refined.

The simplest approach to refining the imprecise query answer is thus to replace tm [An] with t [An]. However, the imprecise query answer tm [An] can generally be compressed into a more precise value than t [An] according to the constraints of fuzzy functional dependencies [14, 9]. In the following, we give the strategies for compressing imprecise values by using fuzzy functional dependencies. Assume the imprecise values are value intervals on a continuous attribute domain. A similar method can be used for partial values on a discrete attribute domain. Here the value intervals and the partial values can be regarded as special fuzzy values, in which the membership degree of each value in the interval or the partial value is exactly 1.0. In [6], fuzzy values have been transformed into intervals.

Let t [Ai] = f1, tm [Ai] = f2, t [An] = g1, and tm [An] = g2. Moreover, let |f1| = p, |f2| = q, |g1| = u, |g2| = v, |g1 ∩ g2| = k, and |f1 ∩ f2| = k’; then |g1 ∪ g2| = u + v – k and |f1 ∪ f2| = p + q – k’. Here |η| denotes the modulus (length) of a value interval η, and |η| = b − a if η = [a, b] and b ≠ a; sub and sup denote the lower and upper bounds of an interval. Assume SED (f1, f2) > SED (g1, g2). Then FFD: Ai ⇝ An does not hold, and we should make SED (g1, g2) equal to SED (f1, f2) by compression.

(a) If 0 < SED (g1, g2) < SED (f1, f2) = 1, g2 can be refined to make g1 and g2 closer in semantics. In this case, g2 is compressed into g2’ = g1 ∩ g2.
(b) If 0 < SED (g1, g2) < SED (f1, f2) < 1, we can compress g2 to satisfy FFD: Ai ⇝ An, distinguishing the following cases.
• sub (g1) ≥ sub (g2) and sup (g1) ≥ sup (g2): Let g2’ = Rstring (g2, x) such that SED (g1, g2’) = SED (f1, f2), where |g1| = u, |g2’| = x, |g1 ∩ g2’| = k, and |g1 ∪ g2’| = u + x – k. Hence x can be obtained uniquely. If x ≤ k, let x = k. When sub (g1) ≤ sub (g2) and sup (g1) ≤ sup (g2), a similar method yields the unique x.
• sub (g1) ≥ sub (g2) and sup (g1) ≤ sup (g2): Let g2’ = Rstring (Lstring (g2, x), y) such that SED (g1, g2’) = SED (f1, f2) and (n–i)/(i–j) = (sup (g2) – sup (g1))/(sub (g1) – sub (g2)), where |g1| = u, |g2’| = y, |g1 ∩ g2’| = u, and |g1 ∪ g2’| = y. So x and y can be obtained uniquely. If x < y ≤ k, let x = y = k.
When sub (g1) ≤ sub (g2) and sup (g1) ≥ sup (g2), we cannot refine g2.

Note that there may be another attribute Aj such that Aj ⇝ An (1 ≤ j ≤ n – 1). When multiple attributes functionally determine An, we can repeat the procedure above to obtain more precise query answers.
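As a worked instance of the first sub-case of (b), the unique x can be written in closed form. The derivation below is a sketch under an explicit assumption: that the equivalence degree of two intervals is the ratio of the lengths of their intersection and union, which the quantities listed above suggest but which this section does not state. The symbol σ is introduced here as shorthand for SED (f1, f2).

```latex
% Assumption: SED(g_1, g_2') = |g_1 \cap g_2'| / |g_1 \cup g_2'|, with \sigma = SED(f_1, f_2).
\[
\frac{k}{u + x - k} = \sigma
\quad\Longrightarrow\quad
x = \frac{k}{\sigma} + k - u .
\]
```

With hypothetical values u = 4, v = 6, k = 2, and σ = 0.4, the original degree is 2/(4 + 6 − 2) = 0.25 < σ, and the formula gives x = 5 + 2 − 4 = 3 < v, so g2 is shortened from length 6 to length 3; indeed 2/(4 + 3 − 2) = 0.4.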
6 Summary

Imprecise data may appear in databases due to data unavailability. Fuzzy sets have been used to model imprecise data in relational databases. Functional dependencies, which are one kind of integrity constraint in relational databases, not only play a critical role in logical database design but can also be employed, as knowledge, for reasoning. In this paper, we discuss the issue of refining imprecise query answers for relational databases with imprecise information by using analogical reasoning. To do so, we use the notion of the equivalence degree of fuzzy data, by which we define fuzzy functional dependencies in fuzzy relational databases. On that basis, we give an approach to mining the rules of fuzzy functional dependencies and apply the framework of analogical reasoning to refine the imprecise query answer.
References

1. Buckles, B. P., Petry, F. E., A Fuzzy Representation of Data for Relational Database. Fuzzy Sets and Systems, 7 (3): 213-226, 1982.
2. Codd, E. F., Extending the Database Relational Model to Capture More Meaning. ACM Transactions on Database Systems, 4 (4): 397-434, 1979.
3. DeMichiel, L. G., Resolving Database Incompatibility: An Approach to Performing Relational Operations over Mismatched Domains. IEEE Transactions on Knowledge and Data Engineering, 1 (4): 485-493, 1989.
4. Dutta, S., Approximate Reasoning by Analogy to Null Queries. International Journal of Approximate Reasoning, 5: 373-398, 1991.
5. Hale, J., Shenoi, S., Analyzing FD Inference in Relational Databases. Data and Knowledge Engineering, 18: 167-183, 1996.
6. Liao, S. Y., Wang, H. Q., Liu, W. Y., Functional Dependencies with Null Values, Fuzzy Values, and Crisp Values. IEEE Transactions on Fuzzy Systems, 7 (1): 97-103, 1999.
7. Ma, Z. M., Zhang, W. J., Ma, W. Y., Semantic Measure of Fuzzy Data in Extended Possibility-based Fuzzy Relational Databases. International Journal of Intelligent Systems, 15 (8): 705-716, 2000.
8. Ma, Z. M., Zhang, W. J., Ma, W. Y., Mili, F., Data Dependencies in Extended Possibility-Based Fuzzy Relational Databases. International Journal of Intelligent Systems, 17 (3): 321-332, 2002.
9. Ma, Z. M., Zhang, W. J., Mili, F., Fuzzy Data Compression Based on Data Dependencies. International Journal of Intelligent Systems, 17 (4): 409-426, 2002.
10. Prade, H., Testemale, C., Generalizing Database Relational Algebra for the Treatment of Incomplete or Uncertain Information and Vague Queries. Information Sciences, 34: 115-143, 1984.
11. Raju, K. V. S. V. N., Majumdar, A. K., Fuzzy Functional Dependencies and Lossless Join Decomposition of Fuzzy Relational Database Systems. ACM Transactions on Database Systems, 13 (2): 129-166, 1988.
12. Rundensteiner, E. A., Hawkes, L. W., Bandler, W., On Nearness Measures in Fuzzy Relational Data Models. International Journal of Approximate Reasoning, 3: 267-298, 1989.
13. Shenoi, S., Melton, A., Proximity Relations in the Fuzzy Relational Databases. Fuzzy Sets and Systems, 31 (3): 285-296, 1989.
14. Tseng, F. S. C., Chen, A. L. P., Yang, W. P., Refining Imprecise Data by Integrity Constraints. Data and Knowledge Engineering, 11 (3): 299-316, 1993.
15. Zadeh, L. A., Fuzzy Sets. Information and Control, 8 (3): 338-353, 1965.
16. Zadeh, L. A., Fuzzy Sets as a Basis for a Theory of Possibility. Fuzzy Sets and Systems, 1 (1): 3-28, 1978.
A Linguistic Truth-Valued Uncertainty Reasoning Model Based on Lattice-Valued Logic

Shuwei Chen, Yang Xu, and Jun Ma

Department of Mathematics, Southwest Jiaotong University, Chengdu 610031, Sichuan, P.R. China
[email protected]
Abstract. The aim of this work is to establish a mathematical framework that provides the basis and tools for uncertainty reasoning based on linguistic information. This paper focuses on a flexible and realistic approach, i.e., the use of linguistic terms, and specifically on the symbolic approach, which acts by direct computation on linguistic terms. An algebraic model with linguistic terms, which is based on a logical algebraic structure, i.e., lattice implication algebra, is applied to represent imprecise information and deals with both comparable and incomparable linguistic terms (i.e., non-ordered linguistic terms). Within this framework, some inferential rules are analyzed and extended to deal with these kinds of lattice-valued linguistic information.
1 Introduction
One of the fundamental goals of artificial intelligence (AI) is to build computer-based systems that simulate, extend, and expand human intelligence and that empower computers to perform tasks routinely performed by human beings. Because human intelligent actions always involve the processing of uncertain information, one important task of AI is to study how to make computers simulate the way human beings deal with uncertain information. Among the major ways in which human beings deal with uncertain information, uncertainty reasoning is an essential mechanism in AI. In real uncertainty reasoning problems, most information, which always takes the form of propositions with truth-values, can be very qualitative in nature, i.e., described in natural language. Usually, in a quantitative setting, the information is expressed by means of numerical values. However, when we work in a qualitative setting, that is, with vague or imprecise knowledge, the information cannot be estimated with an exact numerical value. It may then be more realistic to use linguistic truth-values instead of numerical values.

Since 1990, there have been some important results on inference with linguistic terms. In 1990, Ho [3] constructed a distributive lattice, the hedge algebra, which can be used to deal with linguistic terms. He [4] gave a measure function between two linguistic terms and obtained a fuzzy inference theory and method based on the hedge algebra of linguistic terms. In 1996, Zadeh [17] discussed the formalization of some words and propositions of natural language, gave the standard form of linguistic propositions and production rules, and discussed fuzzy inference with linguistic terms based on fuzzy set theory and fuzzy inference methods. In 1998 and 1999, Turksen [9,10] studied the formalization and inference of descriptive words, substantive words, and declarative sentences. In 2003, Xu [5], and in 2004, Pei [8] proposed a kind of simple lattice implication algebra with linguistic truth-values.

Based on the symbolic approaches, a linguistic truth-valued algebra model, which is based on a logical algebraic structure, i.e., lattice implication algebra, is applied to represent imprecise information and deals with both comparable and incomparable linguistic terms (i.e., non-ordered linguistic values). Within this framework, some inferential rules are analyzed and extended to deal with these kinds of lattice-valued linguistic information. The paper is organized as follows: Section 2 analyzes the structure of lattice implication algebras with linguistic terms. Based on it, a linguistic truth-valued uncertainty reasoning model based on lattice-valued logic is proposed in Section 3 with an illustration. Section 4 concludes the paper.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 276–284, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Lattice Implication Algebra with Linguistic Terms

2.1 Lattice Structure and Lattice Implication Algebra
Lattice structures [1] have been successfully applied to many fields, such as reliability theory, rough set theory, knowledge representation, and inference. Among them, the introduction of L-fuzzy sets by Goguen [2] provides a general framework for Zadeh’s fuzzy set theory. Two important cases of L are of interest and are often used: when L is a finite simply ordered set, and when L is the unit interval [0, 1]. More generally, L should be a lattice with suitable operations like ∧, ∨, →, ′. The question of the appropriate operations and lattice structure has generated much literature. Goguen [2] established an L-fuzzy logic whose truth-value set is a complete lattice-ordered monoid, also called a complete residuated lattice in Pavelka and Novak’s L-fuzzy logic [7,6]. Since this algebraic structure is quite general, we specialize it to the lattice implication algebras introduced by Xu [11], which were established by combining lattice and implication algebra in an attempt to model and deal with comparable and incomparable information. There has been much work on lattice implication algebras, as well as on the corresponding lattice-valued logic systems and lattice-valued reasoning theory and methods [11,12,13,16]. The lattice implication algebra is defined axiomatically as follows.

Definition 1. [11] Let (L, ∨, ∧, O, I) be a bounded lattice with an order-reversing involution ′, I and O the greatest and the smallest element of L respectively, and →: L × L → L be a mapping. (L, ∨, ∧, ′, →, O, I) is called a lattice implication algebra (LIA) if the following conditions hold for any x, y, z ∈ L:
(I1) x → (y → z) = y → (x → z);
(I2) x → x = I;
(I3) x → y = y′ → x′;
(I4) x → y = y → x = I implies x = y;
(I5) (x → y) → y = (y → x) → x;
(l1) (x ∨ y) → z = (x → z) ∧ (y → z);
(l2) (x ∧ y) → z = (x → z) ∨ (y → z).
In the following, the LIA structure will be used to characterize the set of linguistic terms.

2.2 Lattice Implication Algebra with Linguistic Terms
A linguistic term differs from a numerical one in that its values are not numbers, but words or sentences in a natural or artificial language. Although the boundaries between such words in a natural language are sometimes difficult to distinguish, their meaning in common usage can be understood. A simple structure of linguistic terms is given in Fig. 1.
O < a < b < c < I
O = false, a = less false, b = unknown, c = less true, I = true

Fig. 1. Linguistic terms in a linear ordering
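To make Definition 1 concrete, the five-term chain of Fig. 1 can be given an LIA structure and the axioms checked exhaustively. The sketch below assumes the Łukasiewicz operations on an evenly spaced chain (x′ = 1 − x and x → y = min(1, 1 − x + y)); the paper does not prescribe an implication for Fig. 1, so this choice and the names `VALUE`, `neg`, `imp`, and `is_lia` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Terms of Fig. 1, embedded in [0, 1] as an evenly spaced chain (assumption).
TERMS = ["O", "a", "b", "c", "I"]
VALUE = {name: Fraction(i, 4) for i, name in enumerate(TERMS)}
NAME = {value: name for name, value in VALUE.items()}
L = list(VALUE.values())
TOP = Fraction(1)

def neg(x):
    """Order-reversing involution x' = 1 - x."""
    return 1 - x

def imp(x, y):
    """Lukasiewicz implication x -> y = min(1, 1 - x + y)."""
    return min(TOP, 1 - x + y)

def is_lia():
    """Check axioms (I1)-(I5), (l1), (l2) for all x, y, z in the chain."""
    for x, y, z in product(L, repeat=3):
        checks = [
            imp(x, imp(y, z)) == imp(y, imp(x, z)),            # (I1)
            imp(x, x) == TOP,                                   # (I2)
            imp(x, y) == imp(neg(y), neg(x)),                   # (I3)
            not (imp(x, y) == TOP == imp(y, x)) or x == y,      # (I4)
            imp(imp(x, y), y) == imp(imp(y, x), x),             # (I5)
            imp(max(x, y), z) == min(imp(x, z), imp(y, z)),     # (l1)
            imp(min(x, y), z) == max(imp(x, z), imp(y, z)),     # (l2)
        ]
        if not all(checks):
            return False
    return True

if __name__ == "__main__":
    print(is_lia())
    print(NAME[imp(VALUE["c"], VALUE["a"])])  # "less true" -> "less false"
```

On a chain, ∨ and ∧ are simply max and min, which is why the lattice operations need no separate table here; exact fractions avoid rounding issues in the equality checks.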
Note that there are some “vague overlap districts” among some words, which cannot be strictly linearly ordered; a more general structure of linguistic terms [8] is given in Fig. 2. Note that c = less true and a = less false are incomparable. This structure cannot be collapsed into a linearly ordered one, because that would impose an ordering on a and c that was not originally present. This means that the set of linguistic values may not be strictly linearly ordered. Moreover, linguistic terms can be ordered by their meanings in natural language. Naturally, it should be suitable to represent the linguistic values by a partially ordered set or lattice.

Fig. 2. Linguistic terms in a nonlinear ordering (I = more true, b = true, c = less true, a = less false, d = false, O = more false)

According to the features of linguistic variables, we need to find suitable algebras to characterize the values of linguistic variables. There is good reason to believe that this approach would provide us with simple algorithms for reasoning. To attain this goal, we characterize the set of linguistic truth-values by an LIA structure, i.e., we use an LIA to construct the structure of the value sets of linguistic variables. In general, the value of a linguistic variable can be a linguistic expression involving a set of linguistic terms such as “high,” “middle,” and “low,” modifiers such as “very” and “more or less” (called hedges [3]), and connectives (e.g., “and,” “or”). Let us consider the domain of the linguistic variable “truth”: domain (truth) = {true, false, very true, more or less true, possibly true, very false, ...}, which can be regarded as a partially ordered set whose elements are ordered by their meanings and also as a set generated algebraically from the generators G = {true, false} by means of a set of linguistic modifiers M = {very, more or less, possibly, ...}. The linguistic modifiers [3] play a vital role in the representation of fuzzy sets. They are strictly related to the notion of a vague concept. The generators G can be regarded as the prime terms; different prime terms correspond to different linguistic variables.

Taking into account the above remarks, the construction of an appropriate set of linguistic values for an application can be carried out step by step. Consider a set of linguistic hedges, H = {absolutely, highly, very, quite, exactly, almost, rather, somewhat, slightly}, which is selected by careful and detailed investigation and analysis. H can be naturally ordered by the degree to which its elements strengthen or weaken the meaning of the primary terms true and false. We say that a ≤ b iff a(True) ≤ b(True) in natural language, where a and b are linguistic hedges. Applying the hedges of H to the primary term “true” or “false,” we obtain a partially ordered set or lattice. For example, as represented in Fig. 3, we can obtain a lattice generated from “true” or “false” by means of the modifiers in H. The set of linguistic truth-values obtained by the above procedure is a bounded lattice, denoted by L18 = {I = absolutely true, a = highly true, b = very true, c = quite true, d = exactly true, e = almost true, f = rather true, g = somewhat true, h = slightly true, O = absolutely false, t = highly false, s = very false, r = quite false, q = exactly false, p = almost false, n = rather false, m = somewhat false, l = slightly false}. Moreover, one can define ∧, ∨, the implication → and the complement operation ′ on this lattice according to the LIA structure. The operations ∧ and ∨ can be read from the Hasse diagram in Fig. 3.
The complement operation is given in Table 1 according to the structure and properties of lattice implication algebras (the table of the implication operation is omitted because of its size). Since it can be shown that (L18, ∨, ∧, ′, →, O, I) constructed above is a lattice implication algebra, all the properties of lattice implication algebras hold in L18. So the finite set of linguistic terms can be characterized by a finite lattice implication algebra.
Fig. 3. The lattice-ordering structure of linguistic terms with 18 elements (Hasse diagram on the nodes I, a, b, c, d, e, f, g, h, l, m, n, p, q, r, s, t, O)

Table 1. Complement operator of L18

3 Linguistic Truth-Valued Uncertainty Reasoning Model Based on Lattice-Valued Logic
Uncertainty reasoning can be seen as the basis of intelligent control, decision making, etc. From the viewpoint of symbolism, it is highly necessary to study and establish the logical foundation for uncertainty reasoning. In order to provide a logical foundation for uncertainty reasoning and automated reasoning, Xu and his colleagues established several logical systems, especially the gradational lattice-valued propositional logic $L_{vpl}$ [12] and the gradational lattice-valued first-order logic $L_{vfl}$ [14,15]. Based on lattice-valued logic, some uncertainty reasoning and automated reasoning theories and methods have also been investigated [13,16]. Here we take the gradational lattice-valued propositional logic $L_{vpl}$ as our framework for the linguistic truth-valued uncertainty reasoning approach. We review some related results on $L_{vpl}$; detailed information can be found in [12,16].

Definition 2. [12] Let $D_n \subseteq F_p^n$, where $F_p$ is the set of formulae of $L_{vpl}$. A mapping $r_n : D_n \to F_p$ is called an n-ary partial operation on $F_p$, where $D_n$ is the domain of $r_n$, also denoted by $D_n(r_n)$.

Definition 3. [12] A mapping $t_n : L^n \to L$ is said to be an n-ary truth-valued operation on L if
(1) $\alpha \to t_n(\alpha_1, \dots, \alpha_n) \ge t_n(\alpha \to \alpha_1, \dots, \alpha \to \alpha_n)$ holds for any $\alpha \in L$ and $(\alpha_1, \dots, \alpha_n) \in L^n$;
(2) $t_n$ is isotone in each argument.

Denote $R_n \subseteq \{r_n \mid r_n \text{ is an n-ary partial operation on } F_p\}$, $T_n \subseteq \{t_n \mid t_n \text{ is an n-ary truth-valued operation on } L\}$, $\mathcal{R}_n \subseteq R_n \times T_n$, and $\mathcal{R} \subseteq \bigcup_{n=0}^{+\infty} \mathcal{R}_n$.
If $(r, t) \in \mathcal{R}_n$, then (r, t) is called an n-ary rule of inference in $L_{vpl}$.

Remark 4. It can be seen that the rules of inference in $L_{vpl}$ are composed of two parts, one for the formal deduction of formulae, and the other for the transfer of the truth-values of the corresponding formulae. The following two definitions, of α-i type closedness and α-i type soundness, mean that the syntactical deduction and the semantic transfer should be consistent to some degree. In this sense, uncertainty reasoning can be brought into the framework of lattice-valued logic.

Example 5. For any $p, q, g \in F_p$ and $\beta, \theta \in L$, define
$t_2^*(\theta, \beta) = \theta \wedge \beta$,
$r_2^0(p, p \to q) = q$,
$r_2(p, q) = p \wedge q$,
$r_2^*(p \to g, p \to q) = p \to (g \wedge q)$,
$r_2(p \to q, q \to g) = p \to g$,
$r_2^\nabla(p, q) = (p \to q)$.
It can easily be checked that $(r_2^0, t_2^*)$, $(r_2, t_2^*)$, $(r_2^*, t_2^*)$, $(r_2, t_2^*)$ and $(r_2^\nabla, t_2^*)$ are all 2-ary rules of inference in $L_{vpl}$.

In the following, we assume that $\mathcal{T} \subseteq F_L(F_p)$ and $T_h \subseteq \{T \mid T : F_p \to L \text{ is a homomorphic mapping}\} \triangleq T_H$, where $F_L(F_p)$ is the set of all L-fuzzy subsets on $F_p$.

Definition 6. [12] Let $X \in F_L(F_p)$, $(r, t) \in \mathcal{R}_n$, $\alpha \in L$. If

$$X \circ r \supseteq \alpha \otimes \Bigl(t \circ \prod^{n} X\Bigr) \qquad (1)$$

holds in $D_n(r)$, then X is said to be α-I type closed w.r.t. (r, t). If

$$X \circ r \supseteq t \circ \prod^{n} (\alpha \otimes X) \qquad (2)$$

holds in $D_n(r)$, then X is said to be α-II type closed w.r.t. (r, t). If for any $(r, t) \in \mathcal{R}$, X is α-i type closed w.r.t. (r, t), then X is said to be α-i type closed w.r.t. $\mathcal{R}$, i = I, II.
Definition 7. [12] Let $\alpha \in L$. $\mathcal{R}$ is said to be α-i type sound w.r.t. $\mathcal{T}$ if T is α-i type closed w.r.t. $\mathcal{R}$ for any $T \in \mathcal{T}$, i = I, II.

Proposition 8. Let $\alpha, \theta \in L$, i = I, II, $T \in T_H$.
(1) If $\alpha \le \bigwedge_{\theta \in L} (\theta \vee \theta')$, then T is α-i type closed w.r.t. $(r_2^0, t_2^*)$ and $(r_2, t_2^*)$.
(2) T is α-i type closed w.r.t. $(r_2, t_2^*)$, $(r_2^*, t_2^*)$ and $(r_2^\nabla, t_2^*)$.

The linguistic truth-valued uncertainty reasoning approach based on lattice-valued logic is described as follows:
Step 1. Transfer the real reasoning problem into formulae in $L_{vpl}$ with linguistic truth-values, such as (p, very true), (p → q, somewhat true), etc.
Step 2. Choose appropriate rules of inference in $L_{vpl}$.
Step 3. Use the selected rules of inference to obtain the reasoning consequence, which also takes a form such as (q, somewhat true).
Step 4. Transfer the obtained reasoning consequence into natural language.

To illustrate how the proposed method works, we consider a simple example of deciding whether to buy a certain kind of car.

Example 9. Suppose that we have the following rule: If the car is cheap, safe and comfortable, then David will highly want to buy it. The above rule, described in natural language, can be transferred into (p1, I) ∧ (p2, I) ∧ (p3, I) → (q, a), where p1 = “the car is cheap,” p2 = “the car is safe,” p3 = “the car is comfortable,” q = “David will buy it,” I = absolutely true, a = highly true. By using $(r_2, t_2^*)$ two times and $(r_2^\nabla, t_2^*)$ once, we can transfer the above formula with linguistic truth-values into ((p1 ∧ p2 ∧ p3) → q, a). Now, there is a car which is rather cheap, quite safe and very comfortable. Then what about David’s decision? By a similar procedure, we can transfer the fact into (p1 ∧ p2 ∧ p3, f), where f = rather true. So, by applying $(r_2^0, t_2^*)$, the reasoning consequence is obtained as (q, f), which is interpreted in natural language as: David will rather want to buy this car.

Remark 10. The above example can also be looked upon as a kind of evaluation or decision-making problem with uncertainties, and weights can also be added to the factors. This shows that the above method could also be applied to evaluation and decision-making problems.
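The truth-value transfer in Example 9 can be traced step by step. The sketch below is restricted to the "true" terms of L18 and assumes they form a chain in the order in which they are listed (slightly true lowest, absolutely true highest), so that the meet ∧ is simply the smaller of two terms; it does not model the full 18-element lattice. The names `TRUE_CHAIN`, `RANK`, and `meet` are hypothetical.

```python
# "True" terms of L18, lowest to highest (assumed chain ordering):
# h = slightly, g = somewhat, f = rather, e = almost, d = exactly,
# c = quite, b = very, a = highly, I = absolutely true.
TRUE_CHAIN = ["h", "g", "f", "e", "d", "c", "b", "a", "I"]
RANK = {term: i for i, term in enumerate(TRUE_CHAIN)}

def meet(x, y):
    """Truth-value transfer t2*(x, y) = x AND y on the chain of true terms."""
    return x if RANK[x] <= RANK[y] else y

# Rule: (p1, I), (p2, I), (p3, I) and (q, a) are combined into
# ((p1 AND p2 AND p3) -> q, rule_value) by three applications of t2*.
rule_value = meet(meet(meet("I", "I"), "I"), "a")

# Fact: rather cheap (f), quite safe (c), very comfortable (b).
fact_value = meet(meet("f", "c"), "b")

# Modus-ponens-like rule (r2^0, t2*): combine fact and rule values.
conclusion = meet(fact_value, rule_value)

if __name__ == "__main__":
    print(rule_value, fact_value, conclusion)  # a f f
```

The result f = rather true agrees with the consequence (q, f) stated in the example.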
4 Conclusions
A linguistic truth-valued uncertainty reasoning approach based on lattice-valued logic was proposed in this paper. This method offers the advantage of not requiring the linguistic approximation step. Furthermore, it does not require the definition of membership functions associated with the linguistic terms. In particular, we use a finite set of linguistic terms with a rich lattice-ordered algebraic structure to represent the truth-values of propositions. So this procedure has the further advantage of handling incomparable linguistic terms, with the logical operations carrying out the transfer of truth-values that accompanies the procedure of formal deduction. Since a lattice is a more general structure than the set of linguistic terms, and the implication operation in an LIA is much more general and richer, it would be reasonable and realistic to design uncertainty reasoning models based on these methodologies.
Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 60474022).
References

1. G. Birkhoff, Lattice Theory, 3rd edition, American Mathematical Society, New York, 1967.
2. J. A. Goguen, “The logic of inexact concepts,” Synthese, Vol. 19, pp. 325-373, 1968.
3. N. C. Ho, and W. Wechler, “Hedge algebras: an algebraic approach to structure of sets of linguistic truth values,” Fuzzy Sets and Systems, Vol. 35, pp. 281-293, 1990.
4. N. C. Ho, and W. Wechler, “Extended hedge algebras and their application to fuzzy logic,” Fuzzy Sets and Systems, Vol. 52, pp. 259-281, 1992.
5. T. T. Lee, and Y. Xu, “The consistency of rule-bases in lattice-valued first-order logic LF(X),” Proc. of 2003 IEEE International Conference on SMC, pp. 4968-4973, 2003.
6. V. Novak, “First-order fuzzy logic,” Studia Logica, Vol. 46, pp. 87-109, 1982.
7. J. Pavelka, “On fuzzy logic I: Many-valued rules of inference, II: Enriched residuated lattices and semantics of propositional calculi, III: Semantical completeness of some many-valued propositional calculi,” Zeitschr. f. Math. Logik und Grundlagen d. Math., Vol. 25, pp. 45-52, 119-134, 447-464, 1979.
8. Z. Pei, and Y. Xu, “Lattice implication algebra model of a kind of linguistic terms and its inference,” Proc. of the 6th International FLINS Conference, pp. 93-98, 2004.
9. I. B. Turksen, A. Kandel, and Y. Q. Zhang, “Universal truth tables and normal forms,” IEEE Trans. Fuzzy Syst., Vol. 6, pp. 295-303, 1998.
10. I. B. Turksen, “Computing with descriptive and verisic words,” Proc. NAFIPS’99, pp. 13-17, 1999.
11. Y. Xu, “Lattice implication algebras,” J. Southwest Jiaotong Univ., Vol. 89, pp. 20-27, 1993.
12. Y. Xu, K. Y. Qin, J. Liu, and Z. M. Song, “L-valued propositional logic Lvpl,” Inform. Sci., Vol. 114, pp. 205-235, 1999.
13. Y. Xu, D. Ruan, and J. Liu, “Approximate reasoning based on lattice-valued propositional logic Lvpl,” in: D. Ruan and E. E. Kerre (Eds.) Fuzzy Sets Theory and Applications. Kluwer Academic Publishers, pp. 81-105, 2000.
14. Y. Xu, J. Liu, Z. M. Song, and K. Y. Qin, “On semantics of L-valued first-order logic Lvfl,” Int. J. Gen. Syst., Vol. 29, pp. 53-79, 2000.
15. Y. Xu, Z. M. Song, K. Y. Qin, and J. Liu, “Syntax of L-valued first-order logic Lvfl,” Int. J. Multiple-Valued Logic, Vol. 7, pp. 213-257, 2001.
16. Y. Xu, D. Ruan, K. Y. Qin, and J. Liu, Lattice-Valued Logic, Springer-Verlag, Berlin, 2003.
17. L. A. Zadeh, “Fuzzy logic = computing with words,” IEEE Trans. Fuzzy Syst., Vol. 4, pp. 103-111, 1996.
Fuzzy Programming Model for Lot Sizing Production Planning Problem

Weizhen Yan1, Jianhua Zhao2, and Zhe Cao3

1 Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
2 Department of Mathematics, Shijiazhuang College, Shijiazhuang 050801, China
3 Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
Abstract. This paper investigates the lot sizing production planning problem with fuzzy unit profits, fuzzy capacities and fuzzy demands. First, the fuzzy production planning problem is formulated as a credibility measure based fuzzy programming model. Second, the crisp equivalent model is derived for the case in which the fuzzy parameters are characterized by trapezoidal fuzzy numbers. Then a fuzzy simulation-based genetic algorithm is designed for solving the proposed fuzzy programming model as well as its crisp equivalent. Finally, a numerical example is provided to illustrate the effectiveness of the algorithm.
1 Introduction
As an important topic in operations research, the production planning problem has been attracting attention from both researchers and enterprises. In traditional production planning problems, it is assumed that the system parameters are deterministic and that the objectives and constraints are crisp. In reality, however, uncertainty is unavoidably involved in a production planning problem, which makes the objectives and constraints uncertain. In order to deal with the uncertain environment and arrange production properly, an enterprise has to take into account the uncertainties involved so that it can retain a satisfactory profit level. Recent years have seen increasing interest in the problem of production planning in fuzzy environments. In [1], Dong et al. presented a possibility-based chance-constrained programming model for the fuzzy production planning problem. In [12], Schnidis discussed the fuzzy production planning problem by the tolerance method, which was presented by Zimmermann [18]. In [13,14,15,16], fuzzy production planning problems were modelled by parametric programming, in which the possibility that the fuzzy constraints hold is described by a fuzzy membership function. As far as we know, existing studies of the fuzzy production planning problem are based on fuzzy membership functions and the possibility that fuzzy events hold. Recently, Liu and Liu [9] proposed the credibility measure and credibility measure based fuzzy programming models. The purpose of this paper is to formulate the fuzzy production planning problem as a credibility measure based chance-constrained programming model. Then, based on credibility theory, we derive the crisp equivalent for the case in which the fuzzy parameters are characterized by trapezoidal fuzzy numbers. Moreover, a hybrid intelligent algorithm integrating fuzzy simulation and a genetic algorithm is designed for solving the fuzzy programming model as well as its crisp equivalent model. Finally, a numerical example is provided to illustrate the effectiveness of the hybrid intelligent algorithm.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 285–294, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Preliminaries
Possibility theory [2][17] is based on a pair of dual fuzzy measures: the possibility measure and the necessity measure. Let $\Theta$ be a nonempty set, $\mathcal{P}(\Theta)$ the power set of $\Theta$, and Pos a set function defined on $\mathcal{P}(\Theta)$. Pos is said to be a possibility measure if it satisfies the following conditions:
(i) $\mathrm{Pos}(\emptyset) = 0$, $\mathrm{Pos}(\Theta) = 1$;
(ii) $\mathrm{Pos}(\cup_k A_k) = \sup_k \mathrm{Pos}(A_k)$ for any arbitrary collection $\{A_k\}$ in $\mathcal{P}(\Theta)$.
The necessity measure Nec is defined by $\mathrm{Nec}(A) = 1 - \mathrm{Pos}(A^c)$ for all $A \in \mathcal{P}(\Theta)$. Recently, a third set function Cr, called the credibility measure, was introduced by Liu and Liu [9] as follows:
\[
\mathrm{Cr}(A) = \frac{1}{2}\left(\mathrm{Pos}(A) + \mathrm{Nec}(A)\right).
\]
It is easy to verify that Cr has the following properties:
(i) $\mathrm{Cr}(\emptyset) = 0$, $\mathrm{Cr}(\Theta) = 1$;
(ii) $\mathrm{Cr}(A) \le \mathrm{Cr}(B)$ whenever $A, B \in \mathcal{P}(\Theta)$ and $A \subset B$;
(iii) $\mathrm{Pos}(A) \ge \mathrm{Cr}(A) \ge \mathrm{Nec}(A)$ for all $A \in \mathcal{P}(\Theta)$;
(iv) $\mathrm{Cr}(A) + \mathrm{Cr}(A^c) = 1$ for all $A \in \mathcal{P}(\Theta)$.
Remark 1. (Gao and Liu [3]) A fuzzy event does not necessarily hold even though its possibility achieves 1, and may possibly hold even though its necessity is 0. However, if the credibility of a fuzzy event is 1, then it certainly holds. On the other hand, if the credibility of a fuzzy event is 0, then it certainly does not hold.
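The definitions of Pos, Nec and Cr above can be checked by brute force on a small finite set. The following is a minimal sketch, not part of the original paper: the three-point set and its possibility values (`PI`) are hypothetical and chosen only for illustration. It enumerates every event and lets one confirm properties (i) to (iv) as well as the observation in Remark 1 that possibility 1 does not imply credibility 1.

```python
from itertools import chain, combinations

# Hypothetical possibility values on a three-point set; illustrative only.
# The largest value is 1 so that Pos(Theta) = 1.
PI = {"a": 1.0, "b": 0.6, "c": 0.2}
THETA = frozenset(PI)


def pos(event):
    """Pos(A): supremum of the point possibilities in A (0 for the empty set)."""
    return max((PI[t] for t in event), default=0.0)


def nec(event):
    """Nec(A) = 1 - Pos(complement of A)."""
    return 1.0 - pos(THETA - frozenset(event))


def cr(event):
    """Cr(A) = (Pos(A) + Nec(A)) / 2."""
    return 0.5 * (pos(event) + nec(event))


def all_events():
    """Every subset of THETA, i.e. the power set."""
    items = sorted(THETA)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


if __name__ == "__main__":
    for event in all_events():
        print(sorted(event), pos(event), nec(event), cr(event))
```

In this toy example the event {a} has possibility 1 but credibility strictly below 1, which is the situation Remark 1 describes.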
3 Problem Formulation
Consider an enterprise that manufactures and supplies $N$ types of products to customers over $T$ production planning periods. Let $\xi_i$ be the profit per unit of product $i$, $h_i$ the inventory cost per unit of product $i$, $x_{it}$ the production quantity of product $i$ in period $t$, and $y_{it}$ the inventory level of product $i$ at period $t$, $i = 1, 2, \cdots, N$, $t = 1, 2, \cdots, T$. Then the total profit from the sales of all products in $T$ periods is $\sum_{i=1}^{N}\sum_{t=1}^{T} \xi_i x_{it}$, and the total inventory cost in $T$ periods is $\sum_{i=1}^{N}\sum_{t=1}^{T} h_i y_{it}$. Therefore, the net profit of the enterprise is
\[
\sum_{i=1}^{N}\sum_{t=1}^{T} (\xi_i x_{it} - h_i y_{it}).
\]
Suppose that the production consists of $K$ working procedures. Let $w_{ij}$ be the production capacity required by product $i$ in working procedure $j$, and $\eta_{jt}$ the production capacity of working procedure $j$ in period $t$. Then we have
\[
\sum_{i=1}^{N} w_{ij} x_{it} \le \eta_{jt}, \quad j = 1, 2, \cdots, K,\ t = 1, 2, \cdots, T.
\tag{1}
\]
That is, in each period, the capacity required by production should not exceed the production capacity of the enterprise. Let the initial inventory level be zero, and $d_{it}$ the customer demand for product $i$ in period $t$. Then we have
\[
x_{it} + y_{i(t-1)} - y_{it} \ge d_{it}, \quad y_{i0} = 0,\ i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T.
\tag{2}
\]
That is, in each period, the customer demand should be completely satisfied without delay. Suppose that the enterprise seeks to maximize the total profit by deciding on the production quantities $x_{it}$ and inventory levels $y_{it}$, subject to the production capacity constraints (1) and the demand constraints (2) over the planning horizon. Then we have the following mathematical model:
\[
\begin{cases}
\max \displaystyle\sum_{i=1}^{N}\sum_{t=1}^{T} (\xi_i x_{it} - h_i y_{it}) \\
\text{s.t.} \\
\quad \displaystyle\sum_{i=1}^{N} w_{ij} x_{it} \le \eta_{jt}, \quad j = 1, 2, \cdots, K,\ t = 1, 2, \cdots, T \\
\quad x_{it} + y_{i(t-1)} - y_{it} \ge d_{it}, \quad i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T \\
\quad x_{it} \ge 0,\ y_{it} \ge 0,\ y_{i0} = 0, \quad i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T.
\end{cases}
\tag{3}
\]
In practice, demands, unit profits and production capacities are often subject to fluctuations and difficult to measure. Certainly, we may neglect the uncertainties and solve the problem via deterministic models. However, this simplification may be quite costly because the resulting optimal solution may even be infeasible for the actual realization of the uncertain parameters. Hence, it is necessary to take into account explicitly the range of possible realizations of the uncertain parameters, and to formulate the production planning problem under uncertainty as an uncertain model. Assume that the unit profits $\xi_i$, customer demands $d_{it}$ and production capacities $\eta_{jt}$ are characterized by fuzzy variables. Then model (3) is no longer meaningful. In the following, we use fuzzy programming to model the fuzzy production planning problem. Let
$x = (x_{11}, \cdots, x_{1T}, x_{21}, \cdots, x_{2T}, \cdots, x_{N1}, \cdots, x_{NT})$,
$y = (y_{11}, \cdots, y_{1T}, y_{21}, \cdots, y_{2T}, \cdots, y_{N1}, \cdots, y_{NT})$,
and $\xi = (\xi_1, \cdots, \xi_N)$. Since there are fuzzy parameters, the total profit
\[
f(x, y, \xi) = \sum_{i=1}^{N}\sum_{t=1}^{T} (\xi_i x_{it} - h_i y_{it})
\]
is fuzzy, too. In many cases, the enterprise sets a profit level and wishes to maximize the chance that the fuzzy total profit is no less than this predetermined level. Assume that the profit level is set as $T_0$ by the enterprise. Then the objective of the enterprise is
\[
\max\ \mathrm{Cr}\{f(x, y, \xi) \ge T_0\}.
\]
Moreover, due to the fuzziness of the production capacities and customer demands, the constraint functions are also fuzzy and cannot define the feasible set crisply. In practice, the enterprise often gives certain confidence levels, and regards a solution as feasible when the constraints hold at the predetermined confidence levels. That is, the feasible set is defined by the inequalities
\[
\mathrm{Cr}\left\{\sum_{i=1}^{N} w_{ij} x_{it} \le \eta_{jt}\right\} \ge \alpha, \quad j = 1, 2, \cdots, K,\ t = 1, 2, \cdots, T
\]
and
\[
\mathrm{Cr}\{x_{it} + y_{i(t-1)} - y_{it} \ge d_{it}\} \ge \beta, \quad y_{i0} = 0,\ i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T,
\]
where $\alpha$ and $\beta$ are confidence levels given by the enterprise. Then we have the following fuzzy programming model for the fuzzy production planning problem:
\[
\begin{cases}
\max\ \mathrm{Cr}\{f(x, y, \xi) \ge T_0\} \\
\text{s.t.} \\
\quad \mathrm{Cr}\left\{\displaystyle\sum_{i=1}^{N} w_{ij} x_{it} \le \eta_{jt}\right\} \ge \alpha, \quad j = 1, 2, \cdots, K,\ t = 1, 2, \cdots, T \\
\quad \mathrm{Cr}\{x_{it} + y_{i(t-1)} - y_{it} \ge d_{it}\} \ge \beta, \quad i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T \\
\quad x_{it} \ge 0,\ y_{it} \ge 0,\ y_{i0} = 0, \quad i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T.
\end{cases}
\tag{4}
\]
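Before the fuzzy version is solved, it can help to see the crisp structure of model (3) in executable form. The sketch below is an illustration only: the function names `net_profit` and `is_feasible` and the tiny two-product, two-period data set are assumptions introduced here, not taken from the paper. It evaluates the net profit and checks the capacity constraints (1) and the inventory balance constraints (2) for a given plan.

```python
def net_profit(x, y, xi, h):
    """Sum over products i and periods t of xi[i]*x[i][t] - h[i]*y[i][t]."""
    return sum(
        xi[i] * x[i][t] - h[i] * y[i][t]
        for i in range(len(x))
        for t in range(len(x[0]))
    )


def is_feasible(x, y, w, eta, d, tol=1e-9):
    """Check constraints (1) and (2) of the crisp model with y_{i0} = 0.

    x[i][t], y[i][t]: production and inventory of product i in period t
    w[i][j]:          capacity of procedure j needed per unit of product i
    eta[j][t]:        capacity of procedure j in period t
    d[i][t]:          demand for product i in period t
    """
    n, periods, k = len(x), len(x[0]), len(eta)
    # Capacity constraints (1).
    for j in range(k):
        for t in range(periods):
            used = sum(w[i][j] * x[i][t] for i in range(n))
            if used > eta[j][t] + tol:
                return False
    # Nonnegativity and demand constraints (2).
    for i in range(n):
        previous_inventory = 0.0  # y_{i0} = 0
        for t in range(periods):
            if x[i][t] < -tol or y[i][t] < -tol:
                return False
            if x[i][t] + previous_inventory - y[i][t] < d[i][t] - tol:
                return False
            previous_inventory = y[i][t]
    return True


# Hypothetical data: 2 products, 2 periods, 1 working procedure.
XI = [10.0, 15.0]
H = [1.0, 2.0]
W = [[1.0], [2.0]]
ETA = [[100.0, 100.0]]
D = [[20.0, 30.0], [10.0, 15.0]]
X = [[25.0, 25.0], [10.0, 15.0]]
Y = [[5.0, 0.0], [0.0, 0.0]]

if __name__ == "__main__":
    print(is_feasible(X, Y, W, ETA, D), net_profit(X, Y, XI, H))
```

In the hypothetical plan, product 1 builds five units of inventory in period 1 and uses them to meet the larger demand in period 2, which is exactly the role of the balance constraint (2).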
4 Crisp Equivalents
In this section, we assume that the fuzzy parameters ξi , ηjt and dit are all trapezoidal fuzzy numbers. Then we derive the crisp equivalent model of (4) based on credibility theory.
Lemma 1. Let $\xi$ be a trapezoidal fuzzy number $(r_1, r_2, r_3, r_4)$. Then for any given confidence level $\alpha$ $(> 0.5)$, we have $\mathrm{Cr}\{\xi \ge r\} \ge \alpha \iff r \le K_\alpha$, where $K_\alpha = (2\alpha - 1) r_1 + 2(1 - \alpha) r_2$.
Proof. It follows from the definition of the credibility measure that the credibility distribution function of $\xi$ is
\[
\mathrm{Cr}\{\xi \ge r\} =
\begin{cases}
1, & \text{if } r \le r_1 \\
\dfrac{r - 2r_2 + r_1}{2(r_1 - r_2)}, & \text{if } r_1 \le r \le r_2 \\
\dfrac{1}{2}, & \text{if } r_2 \le r \le r_3 \\
\dfrac{r - r_4}{2(r_3 - r_4)}, & \text{if } r_3 \le r \le r_4 \\
0, & \text{otherwise.}
\end{cases}
\tag{5}
\]
It is obvious that the function $\mathrm{Cr}\{\xi \ge r\}$ is continuous in $r$. If $\alpha > 1/2$ and $\mathrm{Cr}\{\xi \ge r\} \ge \alpha$, then either $r \le r_1$, or $r_1 \le r \le r_2$ and $\mathrm{Cr}\{\xi \ge r\} = (r - 2r_2 + r_1)/(2r_1 - 2r_2) \ge \alpha$. In both cases it is easy to verify that $r \le (2\alpha - 1) r_1 + 2(1 - \alpha) r_2$. Conversely, if $r \le r_1$, then we have $\mathrm{Cr}\{\xi \ge r\} = 1 \ge \alpha$. If $r_1 \le r \le (2\alpha - 1) r_1 + 2(1 - \alpha) r_2$, then we have $\mathrm{Cr}\{\xi \ge r\} = (r - 2r_2 + r_1)/(2r_1 - 2r_2) \ge \alpha$. That is, the crisp equivalent of $\mathrm{Cr}\{\xi \ge r\} \ge \alpha$ is $r \le (2\alpha - 1) r_1 + 2(1 - \alpha) r_2 = K_\alpha$. The lemma is proved.
Lemma 2. Let $\xi$ be a trapezoidal fuzzy number $(r_1, r_2, r_3, r_4)$. Then for any given confidence level $\beta$ $(> 0.5)$, we have $\mathrm{Cr}\{\xi \le r\} \ge \beta \iff r \ge K_\beta$, where $K_\beta = 2(1 - \beta) r_3 + (2\beta - 1) r_4$.
Proof. The proof is similar to that of Lemma 1 and is omitted here.
Assume $\xi_i$, $\eta_{jt}$ and $d_{it}$ to be independent trapezoidal fuzzy variables. Now we derive the crisp equivalent model of fuzzy programming model (4).
Theorem 1. Let $\xi_i = (r_{i1}, r_{i2}, r_{i3}, r_{i4})$, $\eta_{jt} = (c_{jt1}, c_{jt2}, c_{jt3}, c_{jt4})$ and $d_{it} = (s_{it1}, s_{it2}, s_{it3}, s_{it4})$, where $i = 1, 2, \cdots, N$, $j = 1, 2, \cdots, K$ and $t = 1, 2, \cdots, T$. Then we have the crisp equivalent of fuzzy programming model (4) as follows:
\[
\begin{cases}
\max\ g(x, y) \\
\text{s.t.} \\
\quad \displaystyle\sum_{i=1}^{N} w_{ij} x_{it} \le (2\alpha - 1) c_{jt1} + 2(1 - \alpha) c_{jt2}, \quad j = 1, 2, \cdots, K,\ t = 1, 2, \cdots, T \\
\quad x_{it} + y_{i(t-1)} - y_{it} \ge 2(1 - \beta) s_{it3} + (2\beta - 1) s_{it4}, \quad i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T \\
\quad x_{it} \ge 0,\ y_{it} \ge 0,\ y_{i0} = 0, \quad i = 1, 2, \cdots, N,\ t = 1, 2, \cdots, T,
\end{cases}
\tag{6}
\]
where
\[
g(x, y) =
\begin{cases}
1, & \text{if } T_0 \le R_1(x, y) \\
\dfrac{T_0 - 2R_2(x, y) + R_1(x, y)}{2(R_1(x, y) - R_2(x, y))}, & \text{if } R_1(x, y) \le T_0 \le R_2(x, y) \\
\dfrac{1}{2}, & \text{if } R_2(x, y) \le T_0 \le R_3(x, y) \\
\dfrac{T_0 - R_4(x, y)}{2(R_3(x, y) - R_4(x, y))}, & \text{if } R_3(x, y) \le T_0 \le R_4(x, y) \\
0, & \text{otherwise}
\end{cases}
\tag{7}
\]
and
\[
\begin{aligned}
R_1(x, y) &= \sum_{i=1}^{N}\sum_{t=1}^{T} (r_{i1} x_{it} - h_i y_{it}), \\
R_2(x, y) &= \sum_{i=1}^{N}\sum_{t=1}^{T} (r_{i2} x_{it} - h_i y_{it}), \\
R_3(x, y) &= \sum_{i=1}^{N}\sum_{t=1}^{T} (r_{i3} x_{it} - h_i y_{it}), \\
R_4(x, y) &= \sum_{i=1}^{N}\sum_{t=1}^{T} (r_{i4} x_{it} - h_i y_{it}).
\end{aligned}
\tag{8}
\]
Proof. Firstly, it follows from the linearity of $f(x, y, \xi)$ in $\xi_i$ and the Extension Principle of Zadeh that $f(x, y, \xi)$ is a trapezoidal fuzzy variable, too. Let
\[
f(x, y, \xi) = (R_1(x, y), R_2(x, y), R_3(x, y), R_4(x, y)),
\tag{9}
\]
where $R_1(x, y)$, $R_2(x, y)$, $R_3(x, y)$, $R_4(x, y)$ are defined by equation (8). Then it follows from equation (5) that $\mathrm{Cr}\{f(x, y, \xi) \ge T_0\} = g(x, y)$. Secondly, it follows from Lemma 1 that
\[
\mathrm{Cr}\left\{\eta_{jt} \ge \sum_{i=1}^{N} w_{ij} x_{it}\right\} \ge \alpha \iff \sum_{i=1}^{N} w_{ij} x_{it} \le (2\alpha - 1) c_{jt1} + 2(1 - \alpha) c_{jt2}.
\]
Thirdly, it follows from Lemma 2 that
\[
\mathrm{Cr}\{d_{it} \le x_{it} + y_{i(t-1)} - y_{it}\} \ge \beta \iff x_{it} + y_{i(t-1)} - y_{it} \ge 2(1 - \beta) s_{it3} + (2\beta - 1) s_{it4}.
\]
Lastly, we can turn the fuzzy programming model (4) into its crisp equivalent model (6). The proof is completed.
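Lemmas 1 and 2 reduce each chance constraint to a single linear bound. As a quick numerical sanity check, the following sketch (the function names and the trapezoid (1, 2, 3, 4) are assumptions made here for illustration) implements the credibility distribution (5), its counterpart for $\mathrm{Cr}\{\xi \le r\}$ obtained by the same reasoning, and the two bounds $K_\alpha$ and $K_\beta$.

```python
def cr_geq(trap, r):
    """Cr{xi >= r} for a trapezoidal fuzzy number (r1, r2, r3, r4), as in (5)."""
    r1, r2, r3, r4 = trap
    if r <= r1:
        return 1.0
    if r <= r2:
        return (r - 2 * r2 + r1) / (2 * (r1 - r2))
    if r <= r3:
        return 0.5
    if r <= r4:
        return (r - r4) / (2 * (r3 - r4))
    return 0.0


def cr_leq(trap, r):
    """Cr{xi <= r} for the same trapezoidal fuzzy number (mirror image of (5))."""
    r1, r2, r3, r4 = trap
    if r >= r4:
        return 1.0
    if r >= r3:
        return (r + r4 - 2 * r3) / (2 * (r4 - r3))
    if r >= r2:
        return 0.5
    if r >= r1:
        return (r - r1) / (2 * (r2 - r1))
    return 0.0


def k_alpha(trap, alpha):
    """Lemma 1 bound: Cr{xi >= r} >= alpha  iff  r <= K_alpha (alpha > 0.5)."""
    r1, r2, _, _ = trap
    return (2 * alpha - 1) * r1 + 2 * (1 - alpha) * r2


def k_beta(trap, beta):
    """Lemma 2 bound: Cr{xi <= r} >= beta  iff  r >= K_beta (beta > 0.5)."""
    _, _, r3, r4 = trap
    return 2 * (1 - beta) * r3 + (2 * beta - 1) * r4


if __name__ == "__main__":
    trap = (1.0, 2.0, 3.0, 4.0)  # hypothetical trapezoidal fuzzy number
    print(k_alpha(trap, 0.9), cr_geq(trap, k_alpha(trap, 0.9)))
    print(k_beta(trap, 0.9), cr_leq(trap, k_beta(trap, 0.9)))
```

At the bound itself the credibility equals the confidence level, and moving $r$ slightly past the bound drops the credibility below it, which is the equivalence the lemmas state.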
5 Simulation Based Genetic Algorithm
In general, the fuzzy parameters are not all trapezoidal fuzzy variables, and it is difficult to derive crisp equivalent models. We therefore design a simulation based genetic algorithm for solving the proposed fuzzy programming model (4) as well as its equivalent model (6). First, we have to cope with the uncertain functions (functions with fuzzy parameters) in model (4):
\[
\begin{aligned}
U_1(x, y) &= \mathrm{Cr}\{f(x, y, \xi) \ge T_0\}, \\
U_2(x) &= \mathrm{Cr}\left\{\sum_{i=1}^{N} w_{ij} x_{it} \le \eta_{jt}\right\}, \\
U_3(x, y) &= \mathrm{Cr}\{x_{it} + y_{i(t-1)} - y_{it} \ge d_{it}\},
\end{aligned}
\]
$i = 1, 2, \cdots, N$, $j = 1, 2, \cdots, K$, $t = 1, 2, \cdots, T$. The fuzzy simulation technique was proposed and discussed in [10][11]. Now we give a fuzzy simulation procedure for computing the uncertain functions.
Fuzzy Simulation for Uncertain Function $U_1$:
Step 1. Randomly generate $u_{ik}$ from the $\varepsilon$-level set of $\xi_i$, $i = 1, 2, \cdots, N$, respectively, where $k = 1, 2, \cdots, M$ and $\varepsilon$ is a sufficiently small positive number.
Step 2. Set $\nu_k = \min_{1 \le i \le N} \mu_{\xi_i}(u_{ik})$ for $k = 1, 2, \cdots, M$.
Step 3. Return $L(T_0)$ via the estimation formula
\[
L(r) = \frac{1}{2}\left(\max_{1 \le k \le M}\{\nu_k \mid f_k(x, y) \ge r\} + \min_{1 \le k \le M}\{1 - \nu_k \mid f_k(x, y) < r\}\right),
\tag{10}
\]
where
\[
f_k(x, y) = \sum_{i=1}^{N}\sum_{t=1}^{T} (u_{ik} x_{it} - h_i y_{it}), \quad k = 1, 2, \cdots, M.
\tag{11}
\]
Uncertain functions $U_2$ and $U_3$ can be calculated similarly; for simplicity, we omit their computing procedures here. Now we embed fuzzy simulation into a genetic algorithm, producing a simulation based genetic algorithm as follows:
Fuzzy Simulation Based Genetic Algorithm
Step 1. Initialize pop_size chromosomes randomly.
Step 2. Update the chromosomes by crossover and mutation operations, in which fuzzy simulations are used to check the feasibility of chromosomes.
Step 3. Calculate the objective values for all chromosomes by fuzzy simulation.
Step 4. Compute the fitness of each chromosome according to the objective values.
Step 5. Select the chromosomes by spinning the roulette wheel.
Step 6. Repeat the second to fifth steps for a given number of cycles.
Step 7. Report the best chromosome as the optimal solution.
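The fuzzy simulation for $U_1$ (Steps 1 to 3 and formula (10)) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the unit profits are taken to be trapezoidal so that the $\varepsilon$-level set is an interval, samples are drawn uniformly from it, an empty max is taken as 0 and an empty min as 1, and the names `trap_membership` and `simulate_u1` are hypothetical. The genetic algorithm itself is not reproduced.

```python
import random


def trap_membership(trap, u):
    """Membership degree of u in the trapezoidal fuzzy number (r1, r2, r3, r4)."""
    r1, r2, r3, r4 = trap
    if u < r1 or u > r4:
        return 0.0
    if u < r2:
        return (u - r1) / (r2 - r1)
    if u <= r3:
        return 1.0
    return (r4 - u) / (r4 - r3)


def simulate_u1(x, y, xi, h, t0, m=5000, eps=1e-3, seed=0):
    """Estimate Cr{f(x, y, xi) >= t0} by formula (10).

    xi is a list of trapezoids (one per product); x[i][t] and y[i][t] are
    the production and inventory plans; h[i] is the unit inventory cost.
    """
    rng = random.Random(seed)
    n, periods = len(x), len(x[0])
    max_nu = 0.0          # max of nu_k over samples with f_k >= t0 (0 if none)
    min_one_minus = 1.0   # min of 1 - nu_k over samples with f_k < t0 (1 if none)
    for _ in range(m):
        # Step 1: sample each u_i from the eps-level set of xi_i.
        u = []
        for r1, r2, r3, r4 in xi:
            low = r1 + eps * (r2 - r1)
            high = r4 - eps * (r4 - r3)
            u.append(rng.uniform(low, high))
        # Step 2: joint membership degree of the sampled vector.
        nu = min(trap_membership(trap, ui) for trap, ui in zip(xi, u))
        # Formula (11): profit under the sampled unit profits.
        f = sum(
            u[i] * x[i][t] - h[i] * y[i][t]
            for i in range(n)
            for t in range(periods)
        )
        # Step 3: accumulate the two terms of formula (10).
        if f >= t0:
            max_nu = max(max_nu, nu)
        else:
            min_one_minus = min(min_one_minus, 1.0 - nu)
    return 0.5 * (max_nu + min_one_minus)


if __name__ == "__main__":
    # One product, one period, x = 1, y = 0, so f equals the sampled profit.
    print(simulate_u1([[1.0]], [[0.0]], [(1.0, 2.0, 3.0, 4.0)], [0.0], 1.2))
```

For the single-product case in the usage line, the exact value from equation (5) is 0.9, so the estimate can be compared directly against the closed form.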
6 A Numerical Example
Consider an enterprise that manufactures three types of products in 6 periods. Each product needs 7 working procedures. The customer demands $d_{it}$ and other parameters are listed in Tables 1, 2 and 3.

Table 1. Customer Demands ($d_{it}$)
i   t=1            t=2            t=3            t=4            t=5            t=6
1   (30,35,36,38)  (65,66,68,70)  (43,45,46,47)  (56,60,61,62)  (41,44,45,46)  (51,55,56,57)
2   (54,60,61,62)  (37,39,40,41)  (64,67,69,70)  (35,45,46,47)  (35,40,41,43)  (34,36,37,38)
3   (40,45,46,47)  (46,49,50,51)  (24,26,28,29)  (52,54,55,56)  (58,61,63,65)  (54,59,60,61)

Table 2. Problem Parameters $w_{ij}$, $\xi_i$ and $h_i$
i   w_ij: j=1   j=2    j=3     j=4    j=5    j=6    j=7    ξ_i            h_i
1   0.45        4.43   11.91   3.18   3.53   0.42   0.85   (10,11,12,14)  1.8
2   0.41        4.54   12.87   2.90   2.80   0.34   0.87   (15,16,18,19)  4.3
3   0.44        5.00   11.7    4.20   3.51   0.43   0.81   (8,9,10,11)    1.2

Table 3. Production Capacity ($\eta_{jt}$)
j     t = 1, 2, · · · , 6
j=1   (74,75,76,77)
j=2   (840,860,880,900)
j=3   (2200,2300,2400,2500)
j=4   (620,630,640,650)
j=5   (600,610,620,630)
j=6   (70,71,76,80)
j=7   (145,150,180,200)

Table 4. Computing Results
t   x1t      x2t      x3t      y1t     y2t     y3t
1   45.287   70.247   56.837   4.456   0.03    4.254
2   72.282   46.34    51.078   2.098   0.667   0.095
3   51.362   76.548   44.128   0.511   0.035   11.012
4   67.266   55.799   47.814   0.233   0.016   0.68
5   51.196   49.954   69.721   1.396   0.022   0.146
6   61.604   45.027   63.673   0.579   0.19    0.242
Since $\xi_i$, $\eta_{jt}$ and $d_{it}$ are all trapezoidal fuzzy variables, we can derive the crisp equivalent of fuzzy programming model (4). Running the genetic algorithm for 10000 generations with population size 30, crossover probability 0.3, mutation probability 0.2, $T_0 = 11500$ and $\alpha = \beta = 0.9$, we obtain the result shown in Table 4.
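As a plausibility check on the reported plan, the crisp bounds of model (6) can be computed for $\alpha = \beta = 0.9$ and compared with the first period of Table 4. The sketch below assumes that Tables 1 to 4 are read column by column as laid out above; the variable and function names are introduced here for illustration. It checks only period 1, where $y_{i0} = 0$.

```python
ALPHA = BETA = 0.9

# w_ij from Table 2: rows are products i = 1..3, columns are procedures j = 1..7.
W = [
    [0.45, 4.43, 11.91, 3.18, 3.53, 0.42, 0.85],
    [0.41, 4.54, 12.87, 2.90, 2.80, 0.34, 0.87],
    [0.44, 5.00, 11.70, 4.20, 3.51, 0.43, 0.81],
]
# Trapezoidal capacities from Table 3 (the same in every period).
CAPACITY = [
    (74, 75, 76, 77),
    (840, 860, 880, 900),
    (2200, 2300, 2400, 2500),
    (620, 630, 640, 650),
    (600, 610, 620, 630),
    (70, 71, 76, 80),
    (145, 150, 180, 200),
]
# Trapezoidal demands for t = 1 from Table 1.
DEMAND_T1 = [(30, 35, 36, 38), (54, 60, 61, 62), (40, 45, 46, 47)]
# Production and inventory for t = 1 from Table 4.
X_T1 = [45.287, 70.247, 56.837]
Y_T1 = [4.456, 0.03, 4.254]


def capacity_bound(trap, alpha):
    """Right-hand side of the capacity constraint in model (6)."""
    return (2 * alpha - 1) * trap[0] + 2 * (1 - alpha) * trap[1]


def demand_bound(trap, beta):
    """Right-hand side of the demand constraint in model (6)."""
    return 2 * (1 - beta) * trap[2] + (2 * beta - 1) * trap[3]


cap_bounds = [capacity_bound(c, ALPHA) for c in CAPACITY]
cap_used = [sum(W[i][j] * X_T1[i] for i in range(3)) for j in range(7)]
dem_bounds = [demand_bound(d, BETA) for d in DEMAND_T1]
net_supply = [X_T1[i] - Y_T1[i] for i in range(3)]  # y_{i0} = 0 in period 1

if __name__ == "__main__":
    for j in range(7):
        print(f"procedure {j + 1}: used {cap_used[j]:.2f}, bound {cap_bounds[j]:.2f}")
    for i in range(3):
        print(f"product {i + 1}: supply {net_supply[i]:.3f}, bound {dem_bounds[i]:.2f}")
```

Under this reading of the tables, the capacity of procedure 1 is nearly exhausted in period 1, which suggests that this constraint is the binding one there.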
7 Conclusion
The lot-sizing production planning problem with fuzzy unit profits, fuzzy capacities and fuzzy demands was formulated as a credibility measure based fuzzy programming model. Its crisp equivalent model was then derived for the case where the fuzzy parameters are trapezoidal fuzzy numbers. Moreover, a fuzzy simulation-based genetic algorithm was designed for solving the proposed fuzzy programming model as well as its crisp equivalent. Finally, a numerical example was provided to illustrate the effectiveness of the algorithm.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No.6042-5309) and the Specialized Research Fund for the Doctoral Program of Higher Education (No.20020003009).
References
1. Dong, Y., Tang, J., Xu, B., Wang, D.: Chance-constrained programming approach to aggregate production planning. Journal of System Engineering, 18(2003), 255–261 (In Chinese)
2. Dubois, D., Prade, H.: Possibility Theory. Plenum, New York, 1988
3. Gao, J., Liu, B.: New primitive chance measures of fuzzy random event. International Journal of Fuzzy Systems, 3(2001), 527–531
4. Gao, J., Lu, M.: Fuzzy quadratic minimum spanning tree problem. Applied Mathematics and Computation, 164/3(2005), 773–788
5. Gao, J., Lu, M.: On the Randic Index of Unicyclic Graphs. MATCH Commun. Math. Comput. Chem., 53(2005), 377–384
6. Gao, J., Liu, Y.: Stochastic Nash equilibrium with a numerical solution method. Lecture Notes in Computer Science, 3496(2005), 811–816
7. Liu, B., Iwamura, K.: Chance constrained programming with fuzzy parameters. Fuzzy Sets and Systems, 94(1998), 227–237
8. Liu, B.: Dependent-chance programming in fuzzy environments. Fuzzy Sets and Systems, 109(2000), 97–106
9. Liu, B., Liu, Y.K.: Expected value of fuzzy variable and fuzzy expected value models. IEEE Transactions on Fuzzy Systems, 10(2002), 445–450
10. Liu, B.: Theory and Practice of Uncertain Programming. Physica-Verlag, Heidelberg, 2002
11. Liu, B.: Uncertainty Theory: An Introduction to its Axiomatic Foundations. Springer-Verlag, Berlin, 2004
12. Sahinidis, N.V.: Optimization under uncertainty: state-of-the-art and opportunities. Computers and Chemical Engineering, 28(2004), 971–983
13. Sakawa, M., Nishizaki, I., Uemura, Y.: Fuzzy programming and profit and cost allocation for a production and transportation problem. European Journal of Operational Research, 131(2001), 1–15
14. Sakawa, M., Nishizaki, I., Uemura, Y.: Interactive fuzzy programming for two-level linear and linear fractional production and assignment problems: A case study. European Journal of Operational Research, 135(2001), 142–157
15. Tang, J., Wang, D., Xu, B.: Fuzzy modelling approach to aggregate production planning with multi-product. Journal of Management Sciences in China, 6(2003), 44–50 (In Chinese)
16. Wang, R.C., Fang, H.H.: Aggregate production planning with multiple objectives in a fuzzy environment. European Journal of Operational Research, 133(2001), 521–536
17. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(1978), 3–28
18. Zimmermann, H.J.: Fuzzy Set and its Application. Boston, Hingham, 1991
Fuzzy Dominance Based on Credibility Distributions
Jin Peng1, Henry M.K. Mok2, and Wai-Man Tse3
1 Department of Mathematics, Huanggang Normal University, Hubei 438000, China
2 Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Hong Kong
3 Department of Statistics, The Chinese University of Hong Kong, Hong Kong
Abstract. Comparison of fuzzy variables is considered one of the most important and interesting topics in fuzzy theory and applications. This paper introduces the new concept of fuzzy dominance based on credibility distributions of fuzzy variables. Some basic properties of fuzzy dominance are investigated. As an illustration, the first order case of the fuzzy dominance rule for typical triangular fuzzy variables is examined.
Keywords: fuzzy theory, fuzzy dominance, credibility distribution.
1 Introduction
Fuzzy set theory, initiated by Zadeh [22], has been well developed and applied to a wide variety of real problems. The term fuzzy variable was first introduced by Kaufmann [16], and then appeared in Zadeh [23] and Nahmias [18]. Meanwhile, possibility theory was proposed by Zadeh [23] and developed by Dubois & Prade [6] and other researchers [4]. Fuzzy set theory and possibility theory provide an alternative framework for modelling real-world fuzzy decision systems mathematically. As the average of the possibility measure and the necessity measure, a new type of measure, the credibility measure, was introduced by Liu & Liu [15]. Traditionally, the possibility measure is widely used and regarded as the parallel concept of the probability measure. However, as many researchers have pointed out, there are essential differences between the possibility measure and the probability measure. As we know from Dubois and Prade [6], the necessity measure is the dual of the possibility measure, but neither the possibility measure nor the necessity measure is self-dual. Fortunately, the credibility measure is self-dual (Liu [11]). In this respect, the credibility measure shares some properties with the probability measure. In this way, the credibility measure weights the possibility and necessity measures harmoniously. In fact, it is the credibility measure that plays the role of the probability measure in the fuzzy world. More importantly, a new kind of expected value operator was defined by Liu & Liu [15] by means of the credibility measure. Motivated by the development of an axiomatic approach based on the credibility measure, credibility theory was proposed by Liu [12]. Based on the possibility, necessity, and credibility measures of a fuzzy event, the optimistic and pessimistic values of a fuzzy variable were introduced by Liu [11] and investigated by Peng & Liu [20] for handling optimization problems in fuzzy environments. Traditionally, the possibility and necessity measures, as well as the optimistic and pessimistic values of a fuzzy variable with respect to them, are employed to model a type of fuzzy chance constrained programming [10][13][14]. In recent years, the credibility measure, together with the optimistic and pessimistic values of a fuzzy variable with respect to it, has been applied to modelling a new type of fuzzy chance constrained programming [11].
Uncertainty is the key ingredient in many decision problems, and comparing uncertain variables is one of the fundamental interests of decision theory [19]. Random variables and fuzzy variables are two basic types of uncertain variables. Stochastic dominance (see the review articles [1][9] and the references therein) is based on probability distributions and has been widely used in economics, finance, statistics, and operations research [2][3][5]. Inspired by the stochastic dominance approach, we introduce the new concept of fuzzy dominance based on credibility distributions. Comparison of fuzzy numbers is still considered one of the most important topics in fuzzy logic theory, and numerous papers have been written on this subject [7][8][17][21]. Fuzzy dominance by means of credibility distributions provides, in quite a different light, a new approach to comparing fuzzy variables in the theory and applications of fuzzy logic.
The rest of this paper is organized as follows. Section 2 presents preliminaries in credibility theory. The concepts of fuzzy dominance are formally defined in Section 3. In Section 4 we investigate some basic properties of fuzzy dominance. Section 5 provides an example showing the first order case of the fuzzy dominance rule for typical triangular fuzzy variables. Section 6 concludes this paper.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 295–303, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Preliminaries
In this section, we review some concepts of credibility measure, credibility distribution, optimistic and pessimistic values, and the expected value operator for fuzzy variables. Let us introduce some notation used throughout this paper. An abstract possibility space is denoted by $(\Theta, \mathcal{P}(\Theta), \mathrm{Pos})$. The expected value operator is denoted by the standard symbol $E$.
2.1 Fuzzy Variable
Fuzzy variables have been defined in many ways. In this paper we use the following definition of a fuzzy variable in Nahmias' sense [18].
Definition 1. A fuzzy variable is defined as a function from the possibility space $(\Theta, \mathcal{P}(\Theta), \mathrm{Pos})$ to the real line $\mathbb{R}$.
Let $\xi$ be a fuzzy variable on the possibility space $(\Theta, \mathcal{P}(\Theta), \mathrm{Pos})$. Then its membership function is derived from the possibility measure Pos by
\[
\mu(x) = \mathrm{Pos}\{\theta \in \Theta \mid \xi(\theta) = x\}.
\tag{1}
\]
2.2 Possibility, Necessity, Credibility
Dubois and Prade [6] described the possibility measure and necessity measure as follows. Let $r$ be a real number. The possibility and necessity measures of the fuzzy event $\{\xi \le r\}$ are defined as
\[
\mathrm{Pos}\{\xi \le r\} = \sup_{x \le r} \mu(x),
\tag{2}
\]
\[
\mathrm{Nec}\{\xi \le r\} = 1 - \sup_{x > r} \mu(x),
\tag{3}
\]
respectively. Liu and Liu [15] introduced a relatively new concept, the credibility measure.
Definition 2. Let $r$ be a real number. The credibility measure of the fuzzy event $\{\xi \le r\}$ is defined as
\[
\mathrm{Cr}\{\xi \le r\} = \frac{1}{2}\left(\mathrm{Pos}\{\xi \le r\} + \mathrm{Nec}\{\xi \le r\}\right).
\tag{4}
\]
Similarly, the credibility measure of the fuzzy event $\{\xi \ge r\}$ is defined as
\[
\mathrm{Cr}\{\xi \ge r\} = \frac{1}{2}\left(\mathrm{Pos}\{\xi \ge r\} + \mathrm{Nec}\{\xi \ge r\}\right).
\tag{5}
\]
2.3 Credibility Distribution, Credibility Density Function
Definition 3 (Liu [12]). The credibility distribution $\Phi : \mathbb{R} \to [0, 1]$ of a fuzzy variable $\xi$ is defined by
\[
\Phi(x) = \mathrm{Cr}\{\theta \in \Theta \mid \xi(\theta) \le x\}.
\tag{6}
\]
Definition 4 (Liu [12]). The credibility density function $\phi : \mathbb{R} \to [0, +\infty)$ of a fuzzy variable $\xi$ is a function such that
\[
\Phi(x) = \int_{-\infty}^{x} \phi(y)\,dy
\tag{7}
\]
holds for all $x \in \mathbb{R}$, where $\Phi$ is the credibility distribution of the fuzzy variable $\xi$.
Obviously, the credibility distribution of a fuzzy variable is nonnegative and bounded. However, generally speaking, it is neither left-continuous nor right-continuous. Moreover, we have
Lemma 5 (Liu [12]). The credibility distribution $\Phi$ of a fuzzy variable is a non-decreasing function on $\mathbb{R}$ with
\[
\begin{cases}
\lim\limits_{x \to -\infty} \Phi(x) \le 0.5 \le \lim\limits_{x \to \infty} \Phi(x) \\
\lim\limits_{y \downarrow x} \Phi(y) = \Phi(x) \quad \text{if } \lim\limits_{y \downarrow x} \Phi(y) > 0.5 \text{ or } \Phi(x) \ge 0.5.
\end{cases}
\tag{8}
\]
2.4 Optimistic Value, Pessimistic Value, and Expected Value
Liu [11] introduced the following concepts of optimistic and pessimistic values.
Definition 6 (Liu [11]). Let $\xi$ be a fuzzy variable, and $\alpha \in (0, 1]$. Then
\[
\xi_{\sup}(\alpha) = \sup\{r \mid \mathrm{Cr}\{\xi \ge r\} \ge \alpha\}
\tag{9}
\]
and
\[
\xi_{\inf}(\alpha) = \inf\{r \mid \mathrm{Cr}\{\xi \le r\} \ge \alpha\}
\tag{10}
\]
are called the $\alpha$-optimistic value and $\alpha$-pessimistic value of $\xi$ with respect to credibility, respectively.
Remark 7. The $\alpha$-pessimistic value of $\xi$ with respect to credibility is also called the generalized inverse function (g.i.f.) of the credibility distribution function $\Phi(x)$, denoted by $\Phi^{-1}(\alpha)$.
Definition 8 (Liu and Liu [15]). Let $\xi$ be a fuzzy variable. Then the expected value of $\xi$ is defined by
\[
E[\xi] = \int_{0}^{+\infty} \mathrm{Cr}\{\xi \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{\xi \le r\}\,dr
\tag{11}
\]
provided that at least one of the two integrals is finite.
3 Fuzzy Dominance
The forthcoming definitions of fuzzy dominance are based on the credibility distribution functions of fuzzy variables. Let $\xi$ and $\eta$ be two fuzzy variables, and let $\Phi(x)$ and $\Psi(x)$ denote the credibility distribution functions of $\xi$ and $\eta$, respectively. Hereinafter, we write
\[
\Phi^{(k)}(x) = \int_{-\infty}^{x} \Phi^{(k-1)}(t)\,dt \quad \text{and} \quad \Psi^{(k)}(x) = \int_{-\infty}^{x} \Psi^{(k-1)}(t)\,dt, \quad k = 2, 3, \cdots,
\]
where $\Phi^{(1)}(x) = \Phi(x)$ and $\Psi^{(1)}(x) = \Psi(x)$. We focus on the set of fuzzy variables
\[
L_k = \{\xi \mid \xi \text{ has credibility distribution } \Phi(x) \text{ and } \Phi^{(k)}(x) < \infty\}.
\]
In this way, it is guaranteed that the integral $\int_{-\infty}^{x} \Phi^{(k-1)}(t)\,dt$ is well defined when $\xi \in L_k$ $(k = 2, 3, \cdots)$.
Definition 9. Let $\xi$ and $\eta$ be two fuzzy variables in $L_1$. We say that $\xi$ First Order Fuzzy Dominates $\eta$, denoted by $\xi \succeq_{FFD} \eta$, if and only if $\Phi(x) \le \Psi(x)$ for all $x \in \mathbb{R}$.
Definition 10. Let $\xi$ and $\eta$ be two fuzzy variables in $L_2$. We say that $\xi$ Second Order Fuzzy Dominates $\eta$, denoted by $\xi \succeq_{SFD} \eta$, if and only if
\[
\int_{-\infty}^{x} \Phi(t)\,dt \le \int_{-\infty}^{x} \Psi(t)\,dt \quad \text{for all } x \in \mathbb{R}.
\]
Definition 11. Let $\xi$ and $\eta$ be two fuzzy variables in $L_3$. We say that $\xi$ Third Order Fuzzy Dominates $\eta$, denoted by $\xi \succeq_{TFD} \eta$, if and only if
\[
\int_{-\infty}^{x}\int_{-\infty}^{t} \Phi(s)\,ds\,dt \le \int_{-\infty}^{x}\int_{-\infty}^{t} \Psi(s)\,ds\,dt \quad \text{for all } x \in \mathbb{R}.
\]
In general, we can introduce the concept of fuzzy dominance of order $k$.
Definition 12. Let $\xi$ and $\eta$ be two fuzzy variables in $L_k$. We say that $\xi$ $k$-Order Fuzzy Dominates $\eta$, denoted by $\xi \succeq_{kFD} \eta$, if and only if $\Phi^{(k)}(x) \le \Psi^{(k)}(x)$ for all $x \in \mathbb{R}$.
4 Properties of Fuzzy Dominance
In this section, some basic properties related to fuzzy dominance are presented.
Theorem 13. Let $\xi$ be a fuzzy variable with $E[\xi] < \infty$ and let $\Phi(x)$ be the credibility distribution function of $\xi$. Then $\Phi^{(2)}(x) = \int_{-\infty}^{x} \Phi(t)\,dt$ is finite for all $x \in \mathbb{R}$.
Proof. We recall that $E[\xi] = \int_{0}^{+\infty} \mathrm{Cr}\{\xi \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{\xi \le r\}\,dr$. Since $E[\xi] < \infty$, the integral $\int_{-\infty}^{0} \mathrm{Cr}\{\xi \le r\}\,dr$ is finite. Then $\int_{-\infty}^{x} \mathrm{Cr}\{\xi \le r\}\,dr$ is finite for all $x \in \mathbb{R}$.
Theorem 14. Let $\xi$ be a fuzzy variable in $L_2$. Then
(a) $\Phi^{(2)}(x)$ is a non-decreasing, nonnegative function of $x$;
(b) if $\xi$ has credibility density $\phi(x)$, then $\Phi^{(2)}(x)$ is a convex function of $x$;
(c) $\lim_{x \to -\infty} \Phi^{(2)}(x) = 0$;
(d) if $\Phi(x_0) > 0$ for some $x_0 \in \mathbb{R}$, then $\Phi^{(2)}(x)$ is a strictly increasing function of $x \in [x_0, \infty)$.
Proof. (a) The result follows from the fact that $\Phi(x)$ is a nonnegative, monotonic integrable function of $x$.
(b) If $\xi$ has credibility density $\phi(x)$, then
\[
\Phi^{(2)}(x) = \int_{-\infty}^{x} \Phi(t)\,dt = \int_{-\infty}^{x}\int_{-\infty}^{t} \phi(s)\,ds\,dt.
\]
Thus we obtain that
\[
\frac{d^2 \Phi^{(2)}}{dx^2} = \phi(x) \ge 0, \quad \text{a.e.}
\]
and hence $\Phi^{(2)}(x)$ is a convex function.
(c) It follows immediately from Theorem 13.
(d) Suppose that $x_0 \le x_1 < x_2$. Then we have
\[
\Phi^{(2)}(x_2) = \int_{-\infty}^{x_2} \Phi(t)\,dt = \int_{-\infty}^{x_1} \Phi(t)\,dt + \int_{x_1}^{x_2} \Phi(t)\,dt \ge \int_{-\infty}^{x_1} \Phi(t)\,dt + \Phi(x_0)(x_2 - x_1) > \Phi^{(2)}(x_1).
\]
Therefore $\Phi^{(2)}(x)$ is a strictly increasing function of $x \in [x_0, \infty)$.
Theorem 15. Let $\xi$ be a fuzzy variable in $L_3$. Then $\Phi^{(3)}(x)$ is a non-decreasing, convex, nonnegative function of $x$.
Proof. Similar to the proofs of (a) and (b) in Theorem 14.
Theorem 16. Let $\xi$, $\eta$ and $\zeta$ be fuzzy variables in $L_1$. Then
(a) Reflexivity: $\xi \succeq_{FFD} \xi$;
(b) Transitivity: $\xi \succeq_{FFD} \eta$ and $\eta \succeq_{FFD} \zeta$ imply $\xi \succeq_{FFD} \zeta$.
Proof. Simple consequences of the definition of FFD.
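Remark 17. The first (second or third) order fuzzy dominance relation is a partial order but not a total order among fuzzy variables (or their distributions).
Theorem 18. Let $\xi$ and $\eta$ be two fuzzy variables in $L_3$. Then $\xi \succeq_{FFD} \eta \Rightarrow \xi \succeq_{SFD} \eta \Rightarrow \xi \succeq_{TFD} \eta$.
Proof. Let $\Phi(x)$ and $\Psi(x)$ denote the credibility distribution functions of $\xi$ and $\eta$, respectively. Since $\xi \succeq_{FFD} \eta$, we have $\Phi(x) \le \Psi(x)$ for all $x \in \mathbb{R}$. Then $\int_{-\infty}^{x} \Phi(t)\,dt \le \int_{-\infty}^{x} \Psi(t)\,dt$ for all $x \in \mathbb{R}$. That is, $\xi \succeq_{SFD} \eta$. Integrating once more, we also have $\int_{-\infty}^{x}\int_{-\infty}^{t} \Phi(s)\,ds\,dt \le \int_{-\infty}^{x}\int_{-\infty}^{t} \Psi(s)\,ds\,dt$ for all $x \in \mathbb{R}$. Therefore we conclude that $\xi \succeq_{TFD} \eta$.
A necessary and sufficient condition for FFD is given by the following theorem.
Theorem 19. Let $\xi$ and $\eta$ be two fuzzy variables in $L_1$, and let $\xi_{\inf}(\alpha)$ and $\eta_{\inf}(\alpha)$ denote the $\alpha$-pessimistic values of $\xi$ and $\eta$, respectively. Then $\xi \succeq_{FFD} \eta$ if and only if $\xi_{\inf}(\alpha) \ge \eta_{\inf}(\alpha)$ for all $\alpha \in (0, 1]$.
Proof. If $\xi \succeq_{FFD} \eta$, i.e., $\Phi(x) \le \Psi(x)$ for all $x \in \mathbb{R}$, then for all $\alpha \in (0, 1]$ we have $\xi_{\inf}(\alpha) = \inf\{r \mid \Phi(r) \ge \alpha\} \ge \inf\{r \mid \Psi(r) \ge \alpha\} = \eta_{\inf}(\alpha)$. Conversely, suppose that $\xi_{\inf}(\alpha) \ge \eta_{\inf}(\alpha)$ for all $\alpha \in (0, 1]$, and assume to the contrary that there exist some $x_0 \in \mathbb{R}$ and $\alpha_0 \in (0, 1]$ such that $\Phi(x_0) > \alpha_0 > \Psi(x_0)$. On the one hand, from the definition of $\eta_{\inf}(\alpha_0)$, we have $x_0 < \eta_{\inf}(\alpha_0)$. On the other hand, from the definition of $\xi_{\inf}(\alpha_0)$, we have $x_0 \ge \xi_{\inf}(\alpha_0)$. Then $\xi_{\inf}(\alpha_0) < \eta_{\inf}(\alpha_0)$, which contradicts $\xi_{\inf}(\alpha_0) \ge \eta_{\inf}(\alpha_0)$. Therefore, $\Phi(x) \le \Psi(x)$ for all $x \in \mathbb{R}$. That is, $\xi \succeq_{FFD} \eta$.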
Theorem 20. Let $\xi$ and $\eta$ be two continuous fuzzy variables and $\{\xi_n\}$ be a sequence of fuzzy variables in $L_1$. If $\xi_n \succeq_{FFD} \eta$ for all $n = 1, 2, \cdots$ and $\{\xi_n\}$ converges in distribution to $\xi$ (that is, $\Phi_n(x) \to \Phi(x)$ as $n \to \infty$ for all continuity points $x$ of $\Phi$), then $\xi \succeq_{FFD} \eta$.
Proof. Suppose that $\Phi, \Psi, \Phi_1, \Phi_2, \cdots$ are the credibility distributions of the fuzzy variables $\xi, \eta, \xi_1, \xi_2, \cdots$, respectively. Since $\xi_n \succeq_{FFD} \eta$, $n = 1, 2, \cdots$, we have $\Phi_n(x) \le \Psi(x)$ for all $x \in \mathbb{R}$. Because $\lim_{n \to \infty} \Phi_n(x) = \Phi(x)$ for all continuity points $x$ of $\Phi$, we conclude $\Phi(x) \le \Psi(x)$ for all continuity points $x$ of $\Phi$. That is, $\Phi(x) \le \Psi(x)$ holds for all $x \in \mathbb{R}$. This means $\xi \succeq_{FFD} \eta$.
Theorem 21. Let $\xi$ and $\eta$ be two fuzzy variables in $L_1$, and let $U(x)$ be a strictly monotone increasing function defined on $\mathbb{R}$. Then $\xi \succeq_{FFD} \eta$ iff $U(\xi) \succeq_{FFD} U(\eta)$.
Proof. Let $\Phi(x)$ and $\Psi(x)$ denote the credibility distribution functions of $\xi$ and $\eta$, respectively. Assume that $F(x)$ and $G(x)$ are the credibility distribution functions of the fuzzy variables $U(\xi)$ and $U(\eta)$, respectively. Note that $F(x) = \mathrm{Cr}\{U(\xi) \le x\} = \mathrm{Cr}\{\xi \le U^{-1}(x)\} = \Phi(U^{-1}(x))$ and $G(x) = \mathrm{Cr}\{U(\eta) \le x\} = \mathrm{Cr}\{\eta \le U^{-1}(x)\} = \Psi(U^{-1}(x))$. If $\xi \succeq_{FFD} \eta$, i.e., $\Phi(y) \le \Psi(y)$ for all $y \in \mathbb{R}$, then for all $x \in \mathbb{R}$ we have $F(x) = \Phi(U^{-1}(x)) \le \Psi(U^{-1}(x)) = G(x)$. This means $U(\xi) \succeq_{FFD} U(\eta)$. Conversely, if $U(\xi) \succeq_{FFD} U(\eta)$, then the inverse function $U^{-1}$ exists and is also strictly monotone. Therefore, $U^{-1}U(\xi) \succeq_{FFD} U^{-1}U(\eta)$. That is, $\xi \succeq_{FFD} \eta$.
The above theorem shows that first order fuzzy dominance remains invariant under strictly increasing transformations.
Theorem 22. Let $\xi$ and $\eta$ be two fuzzy variables in $L_1$, $a, b \in \mathbb{R}$ and $a > 0$. If $\xi \succeq_{FFD} \eta$, then $a\xi + b \succeq_{FFD} a\eta + b$.
Proof. It follows immediately from Theorem 21.
5 An Example
Generally speaking, the condition of fuzzy dominance is rather strong and difficult to verify, because it is an inequality between two credibility distribution functions. In some specific cases, however, the fuzzy dominance relation can be easily observed. As an illustration, the first order case of the fuzzy dominance rule for typical triangular fuzzy variables is examined in the following example.
Example 23. The credibility distribution function of a triangular fuzzy variable $\xi = (r_1, r_2, r_3)$ is
\[
\Phi(x) =
\begin{cases}
0, & \text{if } x \le r_1 \\
\dfrac{x - r_1}{2(r_2 - r_1)}, & \text{if } r_1 \le x \le r_2 \\
\dfrac{x + r_3 - 2r_2}{2(r_3 - r_2)}, & \text{if } r_2 \le x \le r_3 \\
1, & \text{if } r_3 \le x,
\end{cases}
\]
and for any given $\alpha$ $(0 < \alpha \le 1)$, we have
\[
\xi_{\inf}(\alpha) =
\begin{cases}
r_1 + 2(r_2 - r_1)\alpha, & \text{if } 0 < \alpha \le 0.5 \\
2r_2 - r_3 + 2(r_3 - r_2)\alpha, & \text{if } 0.5 < \alpha \le 1.
\end{cases}
\]
The credibility distribution function and pessimistic value function of a triangular fuzzy variable $\eta = (s_1, s_2, s_3)$ are expressed in a similar way. A natural question is: under what conditions does one triangular fuzzy variable fuzzy dominate another? According to the definition of first order fuzzy dominance or Theorem 19, we obtain the following intuitive result.
Proposition 24. Let $\xi = (r_1, r_2, r_3)$ and $\eta = (s_1, s_2, s_3)$ be two triangular fuzzy variables. Then $\xi \succeq_{FFD} \eta$ iff $r_1 \ge s_1$, $r_2 \ge s_2$, $r_3 \ge s_3$.
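Proposition 24 can be explored numerically. The sketch below is illustrative only: the function names, the grid, and the sample triangles are assumptions introduced here, and all triangles are taken to satisfy $r_1 < r_2 < r_3$. It implements $\Phi$ and $\xi_{\inf}$ from Example 23 and compares the componentwise test of Proposition 24 with a direct check of $\Phi(x) \le \Psi(x)$ on a grid.

```python
def phi(tri, x):
    """Credibility distribution of a triangular fuzzy variable (r1 < r2 < r3)."""
    r1, r2, r3 = tri
    if x >= r3:
        return 1.0
    if x >= r2:
        return (x + r3 - 2 * r2) / (2 * (r3 - r2))
    if x > r1:
        return (x - r1) / (2 * (r2 - r1))
    return 0.0


def pessimistic(tri, alpha):
    """alpha-pessimistic value xi_inf(alpha) for 0 < alpha <= 1."""
    r1, r2, r3 = tri
    if alpha <= 0.5:
        return r1 + 2 * (r2 - r1) * alpha
    return 2 * r2 - r3 + 2 * (r3 - r2) * alpha


def ffd_by_components(xi, eta):
    """Proposition 24: componentwise comparison of the two triangles."""
    return all(r >= s for r, s in zip(xi, eta))


def ffd_on_grid(xi, eta, grid, tol=1e-12):
    """Definition 9 checked only at the grid points (a necessary check)."""
    return all(phi(xi, x) <= phi(eta, x) + tol for x in grid)


GRID = [k * 0.1 for k in range(-10, 91)]  # points from -1.0 to 9.0

if __name__ == "__main__":
    xi, eta = (2.0, 4.0, 7.0), (1.0, 3.0, 5.0)
    print(ffd_by_components(xi, eta), ffd_on_grid(xi, eta, GRID))
    print(pessimistic(xi, 0.25), pessimistic(eta, 0.25))
```

The grid check can only refute dominance, not prove it, so it is best read as a cross-check of the componentwise rule rather than a replacement for it.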
6 Conclusions
This paper introduced the fuzzy dominance approach in terms of credibility distributions. Some properties of fuzzy dominance were investigated. As an illustration, the first order case of the fuzzy dominance rule for typical triangular fuzzy variables was examined. We see this short paper as a first step towards developing a route for dealing with uncertainty by means of fuzzy dominance within the axiomatic framework of credibility theory. Future research on both theoretical and applied aspects is under way.
Acknowledgments This work was supported by the National Natural Science Foundation of China Grant No. 60174049, the Significant Project No. Z200527001 and Scientific and Technological Innovative Group Project of Hubei Provincial Department of Education, China.
Fuzzy Chance-Constrained Programming for Capital Budgeting Problem with Fuzzy Decisions

Jinwu Gao¹, Jianhua Zhao², and Xiaoyu Ji³

¹ School of Information, Renmin University of China, Beijing 100872, China
² Department of Mathematics, Shijiazhuang College, Shijiazhuang 050801, China
³ Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
Abstract. In this paper, the capital budgeting problem with fuzzy decisions is formulated as fuzzy chance-constrained programming models. Then fuzzy simulation, neural networks and a genetic algorithm are integrated to produce a hybrid intelligent algorithm for solving the proposed models. Finally, numerical experiments are provided to illustrate the effectiveness of the hybrid intelligent algorithm.
1 Introduction
The original capital budgeting problem is to choose an appropriate combination of projects under fixed assets so as to maximize the total net profit. Early attempts to model the capital budgeting problem used linear programming with a single objective and crisp coefficients. In practice, however, there often exist many uncertain elements, such as future demands, the benefit of each project, and the level of funds that needs to be allocated to each project. Moreover, multiple conflicting goals make the problem more complex. For this reason, uncertain programming [12][17], a more powerful and suitable approach, is employed to deal with it. Chance-constrained programming (CCP) [1] has long been used to model stochastic capital budgeting problems in working capital management [8] and in the production area [9]. De et al. [3] extended chance-constrained goal programming (CCGP) to the zero-one case and applied it to capital budgeting problems. A framework of fuzzy chance-constrained programming was presented by Liu and Iwamura [10][11], and was used to model capital budgeting problems in fuzzy environments by Iwamura and Liu [6].

Traditional mathematical programming models all produce crisp decision vectors such that some objectives achieve their optimal values. But sometimes we should provide fuzzy decisions for practical purposes. In view of this, Liu and Iwamura [15] provided a spectrum of CCP with fuzzy decisions.

The purpose of this paper is to investigate the capital budgeting problem with fuzzy decisions. Section 2 formulates capital budgeting problems with fuzzy decisions by CCP and CCGP models. Then, fuzzy simulation, neural networks and a genetic algorithm are integrated to produce a hybrid intelligent algorithm for solving the proposed CCP models. Lastly, two numerical examples are provided to illustrate the effectiveness of the hybrid intelligent algorithm.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 304–311, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Fuzzy Capital Budgeting Problem with Fuzzy Decisions
Consider a company that is planning to set up a new factory. Suppose that there are several types of machines to be selected, and that different types of machines can produce the same product. The objective is to make a profit by satisfying future demands, subject to resource constraints such as capital and space. The following notation will be used henceforth.

$n$ : the number of types of machines;
$A_i$ : the $i$-th type of machine;
$x_i$ : nonnegative integer representing the number of $A_i$ selected;
$b_i, b$ : the price of an $A_i$ and the total capital available for buying new machines;
$c_i, c$ : the space that an $A_i$ takes up and the total space available, $i = 1, 2, \cdots, n$;
$\eta_j$ : the $j$-th class demand per time period, $j = 1, 2, \cdots, p$;
$\xi_{ij}$ : the production capacity of an $A_i$ for the $j$-th class product, $i = 1, 2, \cdots, n$, $j = 1, 2, \cdots, p$;
$a_i$ : the net profit of an $A_i$ per time period, $i = 1, 2, \cdots, n$.
Immediately, we have the following deterministic linear integer programming model for the capital budgeting problem:
$$
\begin{cases}
\max \sum_{i=1}^{n} a_i x_i \\
\text{subject to:} \\
\quad \sum_{i=1}^{n} b_i x_i \le b \\
\quad \sum_{i=1}^{n} c_i x_i \le c \\
\quad \sum_{i=1}^{n} \xi_{ij} x_i \ge \eta_j, \quad j = 1, 2, \cdots, p \\
\quad x_i,\ i = 1, 2, \cdots, n, \text{ nonnegative integers.}
\end{cases}
\tag{1}
$$
In practice, the production capacities $\xi_{ij}$, the future demands $\eta_j$, the net profit $a_i$ per $A_i$, the level of funds $b_i$ that needs to be allocated to a type $i$ machine, $i = 1, 2, \cdots, n$, $j = 1, 2, \cdots, p$, and so on are not necessarily deterministic. Here, we assume that they are fuzzy variables. In addition, management decisions sometimes cannot be carried out precisely in actual cases, even if crisp values have been provided. It is thus reasonable to fuzzify the decision. In a fuzzy decision, each decision may be restricted to a reference collection of fuzzy sets, which is determined by the properties of the decision system. The decision maker then has more choices. In the following, we denote a fuzzy decision by a vector $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_n)$, in which each component $\tilde{x}_i$ is assigned an element of the reference collection of fuzzy sets $X_i$, $i = 1, 2, \cdots, n$, respectively. Since the reference collections are all predetermined, $\tilde{x}$ must be taken from the Cartesian product
$$
X = X_1 \otimes X_2 \otimes \cdots \otimes X_n. \tag{2}
$$
Since there exist fuzzy parameters and fuzzy decision variables, the objective and constraints in model (1) are fuzzy, too. Then the feasible set as well as
the optimal objective are not defined mathematically. In the following, we will model the capital budgeting problem with fuzzy decisions by chance-constrained programming models.
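Before the fuzzy version is developed, the deterministic model (1) can be made concrete with a small brute-force search. This is only an illustrative sketch: the function name, the search bound `x_max` and all numerical data are hypothetical and are not taken from the paper, and exhaustive enumeration is practical only for very small instances.

```python
from itertools import product


def solve_capital_budgeting(a, b, c, xi, eta, b_total, c_total, x_max):
    """Enumerate integer vectors x in {0..x_max}^n and return the best one for model (1).

    a[i]     : net profit of one machine of type i
    b[i]     : price of one machine of type i;  b_total: capital available
    c[i]     : space of one machine of type i;  c_total: space available
    xi[i][j] : capacity of one machine of type i for product class j
    eta[j]   : demand for product class j
    """
    n = len(a)
    best_x, best_profit = None, float("-inf")
    for x in product(range(x_max + 1), repeat=n):
        if sum(b[i] * x[i] for i in range(n)) > b_total:
            continue  # capital constraint violated
        if sum(c[i] * x[i] for i in range(n)) > c_total:
            continue  # space constraint violated
        if any(sum(xi[i][j] * x[i] for i in range(n)) < eta[j]
               for j in range(len(eta))):
            continue  # some demand is not met
        profit = sum(a[i] * x[i] for i in range(n))
        if profit > best_profit:
            best_x, best_profit = x, profit
    return best_x, best_profit


if __name__ == "__main__":
    # Hypothetical crisp data for two machine types and one product class.
    print(solve_capital_budgeting(
        a=[5, 6], b=[4, 5], c=[20, 18], xi=[[60], [70]], eta=[200],
        b_total=30, c_total=120, x_max=8))
```

Once the coefficients become fuzzy variables, the comparisons inside the loop are no longer well defined, which is the gap the chance-constrained models address.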
3 Chance-Constrained Programming Models
Suppose that the manager wishes to maximize the total profit at a confidence level $\alpha_1$, and also specifies the confidence levels at which the capital constraint, the space constraint, and the demand constraints should hold. Then we have the CCP with fuzzy decisions as follows:
$$
\begin{cases}
\max \bar{f} \\
\text{subject to:} \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} a_i \tilde{x}_i \ge \bar{f}\} \ge \alpha_1 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} b_i \tilde{x}_i \le b\} \ge \alpha_2 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} c_i \tilde{x}_i \le c\} \ge \alpha_3 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} \xi_{ij} \tilde{x}_i \ge \eta_j\} \ge \beta_j, \quad j = 1, 2, \cdots, p.
\end{cases}
\tag{3}
$$
In general, there are several elements, such as the total profit, the service level, and the capital and space available, which need to be taken into account by the manager. Sometimes the manager has to take more than one of them as objectives and the others as constraints. However, multiple goals often conflict. In order to balance them, suppose that the decision maker gives the following target levels and priority structure.

Priority 1: Profit goal, that is, the total profit should achieve a given level $a$ at confidence level $\alpha_1$;
Priority 2: Budget goal, that is, the total cost spent on machines should not exceed the amount available at confidence level $\alpha_2$;
Priority 3: Space goal, that is, the total space used by the machines should not exceed the space available at confidence level $\alpha_3$.

Then the fuzzy capital budgeting problem may also be modelled by the following CCGP with fuzzy decisions and fuzzy coefficients:
$$
\begin{cases}
\operatorname{lexmin}\{d_1^-, d_2^+, d_3^+\} \\
\text{subject to:} \\
\quad \mathrm{Pos}\{a - \sum_{i=1}^{n} a_i \tilde{x}_i \le d_1^-\} \ge \alpha_1 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} b_i \tilde{x}_i - b \le d_2^+\} \ge \alpha_2 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} c_i \tilde{x}_i - c \le d_3^+\} \ge \alpha_3 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{n} \xi_{ij} \tilde{x}_i \ge \eta_j\} \ge \beta_j, \quad j = 1, 2, \cdots, p \\
\quad d_1^-, d_2^+, d_3^+ \ge 0,
\end{cases}
\tag{4}
$$
where lexmin represents lexicographical minimization.
4 Hybrid Intelligent Algorithm

In this section, fuzzy simulation, neural networks, and a genetic algorithm are integrated to produce a hybrid intelligent algorithm for solving the chance-constrained programming models with fuzzy decisions.

4.1 Fuzzy Simulations
In our fuzzy programming models with fuzzy decisions, we summarize the following two types of uncertain functions:
$$
\begin{aligned}
U_1(\tilde{x}) &: \tilde{x} \to \mathrm{Pos}\{g(\tilde{x}, \xi) \le 0\} \\
U_2(\tilde{x}) &: \tilde{x} \to \max\{\bar{f} \mid \mathrm{Pos}\{f(\tilde{x}, \xi) \ge \bar{f}\} \ge \alpha\}
\end{aligned}
\tag{5}
$$
where $\tilde{x}$ and $\xi$ are fuzzy vectors. Due to their complexity, we resort to fuzzy simulation for computing the uncertain functions. For each given decision variable $\tilde{x}$, the computation procedures for the two types of uncertain functions $U_1(\tilde{x})$ and $U_2(\tilde{x})$ are given as follows.

Fuzzy Simulation for the Possibility of a Fuzzy Event:
Step 1: Set $U_1(\tilde{x}) = 0$;
Step 2: Sample a crisp vector $(x_0, \xi_0)$ from the $\alpha$-level set of $(\tilde{x}, \xi)$. If $g(x_0, \xi_0) \le 0$ and $U_1(\tilde{x}) < \mu(x_0, \xi_0)$, then let $U_1(\tilde{x}) = \mu(x_0, \xi_0)$;
Step 3: Repeat Step 2 $N$ times, then return $U_1(\tilde{x})$.

Fuzzy Simulation for the Optimistic Value:
Step 1: Set $U_2(\tilde{x}) = -\infty$;
Step 2: Sample a crisp vector $(x_0, \xi_0)$ from the $\alpha$-level set of $(\tilde{x}, \xi)$. If $f(x_0, \xi_0) > U_2(\tilde{x})$, then let $U_2(\tilde{x}) = f(x_0, \xi_0)$;
Step 3: Repeat Step 2 $N$ times, then return $U_2(\tilde{x})$.

4.2 Neural Networks
A neural network is essentially a nonlinear mapping from the input space to the output space. It is known that a neural network with an arbitrary number of hidden neurons is a universal approximator for continuous functions [2][5]. Moreover, it operates at high speed once it is well trained on a set of input-output data. In order to speed up the solution process, we train neural networks to approximate the uncertain functions, and then use the trained neural networks to evaluate the uncertain functions in the solution process.

Training Procedure of Neural Network:
Step 1. Sample $M$ fuzzy vectors $\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_M$ from the reference set $X = X_1 \otimes X_2 \otimes \cdots \otimes X_n$, then calculate the values of $U_1(\tilde{x}_i)$ and $U_2(\tilde{x}_i)$, $i = 1, 2, \cdots, M$, by fuzzy simulation.
Step 2. Use these samples to train the neural networks (NNs) by the backpropagation algorithm (BPA) to approximate these uncertain functions.
Step 3. Return the trained NN.
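The training samples in Step 1 come from the two fuzzy simulation procedures of Section 4.1, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C implementation: all fuzzy quantities are taken to be independent triangular fuzzy numbers, the joint membership is the minimum of the component memberships, sampling is uniform on the level set, and the function names and default parameters are hypothetical.

```python
import random


def tri_membership(x, tri):
    """Membership degree of x in the triangular fuzzy number tri = (r1, r2, r3)."""
    r1, r2, r3 = tri
    if r1 < x <= r2:
        return (x - r1) / (r2 - r1)
    if r2 < x < r3:
        return (r3 - x) / (r3 - r2)
    return 0.0


def level_cut(tri, level):
    """Closed interval of points whose membership degree is at least `level`."""
    r1, r2, r3 = tri
    return r1 + level * (r2 - r1), r3 - level * (r3 - r2)


def sample(tris, level, rng):
    """Draw one crisp vector uniformly from the level set of the fuzzy vector."""
    return [rng.uniform(*level_cut(t, level)) for t in tris]


def simulate_possibility(g, tris, n_samples=5000, level=0.0, seed=0):
    """Estimate U1 = Pos{g(xi) <= 0} as the largest membership among feasible samples."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_samples):
        point = sample(tris, level, rng)
        mu = min(tri_membership(x, t) for x, t in zip(point, tris))
        if g(point) <= 0 and mu > best:
            best = mu
    return best


def simulate_optimistic(f, tris, alpha, n_samples=5000, seed=0):
    """Estimate U2 = max{fbar | Pos{f(xi) >= fbar} >= alpha} over the alpha-level set."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_samples):
        best = max(best, f(sample(tris, alpha, rng)))
    return best


if __name__ == "__main__":
    print(simulate_possibility(lambda v: v[0] - 1.5, [(1, 2, 3)]))
    print(simulate_optimistic(lambda v: v[0] + v[1], [(1, 2, 3), (2, 3, 4)], 0.9))
```

Both estimates approach the exact values from below as the number of samples grows, so in practice the sample size trades accuracy against running time.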
4.3 Hybrid Intelligent Algorithm
A genetic algorithm is a search method for optimization problems based on the mechanics of natural selection in biological evolution. It has demonstrated considerable success in providing approximately optimal solutions to many complex optimization problems. In a general genetic algorithm, the feasibility and objective value of each chromosome have to be calculated by time-consuming fuzzy simulation. A natural idea is to train a neural network to approximate the uncertain functions and then embed it into the genetic algorithm, which greatly improves the efficiency of the genetic algorithm. This is the idea behind the hybrid intelligent algorithm, whose concrete procedure may be written as follows.

Hybrid Intelligent Algorithm:
Step 1. Generate samples of the uncertain functions by fuzzy simulation;
Step 2. Train neural networks with these samples to approximate the uncertain functions;
Step 3. Initialize a population in which the feasibility of each chromosome is checked through the trained neural networks;
Step 4. Use crossover and mutation operators to generate offspring, then use the trained neural networks to check their feasibility;
Step 5. Calculate the objective values of all chromosomes through the trained neural networks, then give each of them a fitness;
Step 6. Select the chromosomes by spinning the roulette wheel;
Step 7. Repeat Steps 4 to 6 for a given number of cycles;
Step 8. Report the best chromosome as the optimal solution.
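Steps 3 to 8 above can be sketched as a small genetic algorithm whose feasibility check and objective are passed in as functions. This is a hedged sketch rather than the authors' implementation: in the paper these two functions would be the trained neural networks, whereas here they are arbitrary callables; the integer encoding, the one-point crossover, the single-gene mutation, the shifted fitness and all parameter values are illustrative assumptions.

```python
import random


def hybrid_ga(objective, feasible, n_genes, low, high,
              pop_size=30, generations=100, pc=0.3, pm=0.2, seed=0):
    """Maximize `objective` over integer vectors accepted by `feasible`.

    Assumes n_genes >= 2 and that feasible vectors exist in [low, high]^n_genes.
    """
    rng = random.Random(seed)

    def random_feasible():
        while True:
            candidate = [rng.randint(low, high) for _ in range(n_genes)]
            if feasible(candidate):
                return candidate

    # Step 3: initial population of feasible chromosomes.
    population = [random_feasible() for _ in range(pop_size)]
    best = max(population, key=objective)

    for _ in range(generations):  # Step 7: repeat Steps 4 to 6
        # Step 4a: one-point crossover on consecutive pairs; keep feasible children.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                cut = rng.randint(1, n_genes - 1)
                a, b = population[i], population[i + 1]
                for k, child in ((i, a[:cut] + b[cut:]), (i + 1, b[:cut] + a[cut:])):
                    if feasible(child):
                        population[k] = child
        # Step 4b: mutation of a single gene; keep the child only if feasible.
        for i in range(pop_size):
            if rng.random() < pm:
                child = list(population[i])
                child[rng.randrange(n_genes)] = rng.randint(low, high)
                if feasible(child):
                    population[i] = child
        # Step 5: objective values and (shifted, strictly positive) fitness.
        values = [objective(c) for c in population]
        best = max(population + [best], key=objective)
        floor = min(values)
        weights = [v - floor + 1e-9 for v in values]
        # Step 6: roulette wheel selection proportional to fitness.
        population = [list(c) for c in
                      rng.choices(population, weights=weights, k=pop_size)]

    return best  # Step 8


if __name__ == "__main__":
    profit = lambda x: 5 * x[0] + 6 * x[1]
    within_budget = lambda x: 3 * x[0] + 4 * x[1] <= 24
    print(hybrid_ga(profit, within_budget, n_genes=2, low=0, high=8))
```

Because the surrogate functions are only approximations, a solution reported by such a procedure would normally be re-checked with fuzzy simulation before it is accepted.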
5 Numerical Examples
In this section, two numerical examples, implemented in the C language, are given to illustrate the effectiveness of the hybrid intelligent algorithm. First, we give a concrete description of a plant which is going to select among five types of machines. The fuzzy parameters are given as follows: $a_i$ ($i = 1, 2, \cdots, 5$) are fuzzy numbers with membership functions $\exp(-|x - 5|)$, $\exp(-|x - 6|)$, $\exp(-|x - 4|)$, $\exp(-|x - 10|)$ and $\exp(-|x - 7|)$, respectively; $b$, $b_i$ ($i = 1, 2, \cdots, 5$) are triangular fuzzy numbers (250, 350, 450), (3, 4, 5), (4, 5, 6), (5, 6, 7), (11, 12, 13) and (6, 7, 8), respectively; $c$, $c_i$ ($i = 1, 2, \cdots, 5$) are 700, 20, 18, 16, 12 and 15, respectively; $\xi_{11}$, $\xi_{21}$, $\xi_{31}$, $\xi_{42}$ and $\xi_{52}$ are triangular fuzzy numbers (50, 60, 70), (60, 70, 80), (80, 90, 100), (110, 130, 150) and (100, 110, 120), respectively; $\eta_1$, $\eta_2$ are trapezoidal fuzzy numbers (2800, 2900, 3000, 3100) and (3300, 3400, 3500, 3600), respectively. Suppose the decision maker wants to maximize the net profit at a predetermined confidence level, subject to capital, space, and demand constraints at certain confidence levels. According to the discussion in Section 3, the CCP model with fuzzy decisions for capital budgeting is
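$$
\begin{cases}
\max \bar{f} \\
\text{subject to:} \\
\quad \mathrm{Pos}\{\sum_{i=1}^{5} a_i \tilde{x}_i \ge \bar{f}\} \ge 0.95 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{5} b_i \tilde{x}_i \le b\} \ge 0.95 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{5} c_i \tilde{x}_i \le 700\} \ge 0.90 \\
\quad \mathrm{Pos}\{\xi_{11} \tilde{x}_1 + \xi_{21} \tilde{x}_2 + \xi_{31} \tilde{x}_3 \ge \eta_1\} \ge 0.85 \\
\quad \mathrm{Pos}\{\xi_{42} \tilde{x}_4 + \xi_{52} \tilde{x}_5 \ge \eta_2\} \ge 0.80.
\end{cases}
\tag{6}
$$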
Here, the fuzzy decision $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_5)$ should be taken from the Cartesian product $X = X_1 \otimes X_2 \otimes \cdots \otimes X_5$, and each reference collection $X_i$ is composed of all fuzzy numbers
$$
\sum_{j=-4}^{4} (1 - 0.05|j|)/(t + j), \quad t = 6, 7, \cdots, 15.
$$
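The reference collection can be written out explicitly. The sketch below assumes that the expression is in the usual "membership grade / element" notation for discrete fuzzy sets and uses the grades $1 - 0.05|j|$ that appear in the reported solutions; the function names and the dictionary representation are illustrative choices, not part of the paper.

```python
def reference_fuzzy_number(t, spread=4, step=0.05):
    """Discrete fuzzy number centred at t: element t + j has grade 1 - step * |j|."""
    return {t + j: 1.0 - step * abs(j) for j in range(-spread, spread + 1)}


def alpha_cut(fuzzy_number, alpha):
    """Elements whose membership grade is at least alpha (with a rounding tolerance)."""
    return sorted(x for x, mu in fuzzy_number.items() if mu >= alpha - 1e-12)


# The ten candidate fuzzy values for one decision component, t = 6, ..., 15.
reference_collection = [reference_fuzzy_number(t) for t in range(6, 16)]

if __name__ == "__main__":
    number = reference_fuzzy_number(12)
    print(number)
    print(alpha_cut(number, 0.95))
```

With five decision components, the Cartesian product $X$ then contains $10^5$ candidate fuzzy decision vectors under this reading.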
Then we train a neural network (5 input neurons, 10 hidden neurons, 5 output neurons) to approximate the uncertain functions. Finally, we integrate the trained neural network and the genetic algorithm to produce a hybrid intelligent algorithm. A run of the hybrid intelligent algorithm (5000 cycles in simulation, 2000 steps in training, 500 generations in the genetic algorithm) shows that the optimal solution $(\tilde{x}_1^*, \tilde{x}_2^*, \tilde{x}_3^*, \tilde{x}_4^*, \tilde{x}_5^*)$ is
$$
\begin{cases}
\tilde{x}_1^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(12 + j) \\
\tilde{x}_2^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(6 + j) \\
\tilde{x}_3^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(12 + j) \\
\tilde{x}_4^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(15 + j) \\
\tilde{x}_5^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(15 + j),
\end{cases}
$$
whose objective value is $\bar{f}^* \approx 433$.

In order to balance multiple conflicting objectives, suppose that the decision maker has set the target levels and priority structure as in Section 3. Then we can formulate the CCGP with fuzzy decisions for capital budgeting as follows:
$$
\begin{cases}
\operatorname{lexmin}\{d_1^-, d_2^+, d_3^+\} \\
\text{subject to:} \\
\quad \mathrm{Pos}\{433 - \sum_{i=1}^{5} a_i \tilde{x}_i \le d_1^-\} \ge 0.95 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{5} b_i \tilde{x}_i - b \le d_2^+\} \ge 0.95 \\
\quad \mathrm{Pos}\{\sum_{i=1}^{5} c_i \tilde{x}_i - 700 \le d_3^+\} \ge 0.90 \\
\quad \mathrm{Pos}\{\xi_{11} \tilde{x}_1 + \xi_{21} \tilde{x}_2 + \xi_{31} \tilde{x}_3 \ge \eta_1\} \ge 0.85 \\
\quad \mathrm{Pos}\{\xi_{42} \tilde{x}_4 + \xi_{52} \tilde{x}_5 \ge \eta_2\} \ge 0.80 \\
\quad d_1^-, d_2^+, d_3^+ \ge 0.
\end{cases}
\tag{7}
$$
Then we train a neural network (5 input neurons, 10 hidden neurons, 3 output neurons) to approximate the uncertain functions. Finally, we integrate the trained neural network and the genetic algorithm to produce a hybrid intelligent algorithm. A run of the hybrid intelligent algorithm (5000 cycles in simulation, 2000 steps in training, 1500 generations in the genetic algorithm) shows that the optimal fuzzy solution $(\tilde{x}_1^*, \tilde{x}_2^*, \tilde{x}_3^*, \tilde{x}_4^*, \tilde{x}_5^*)$ is
$$
\begin{cases}
\tilde{x}_1^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(12 + j) \\
\tilde{x}_2^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(14 + j) \\
\tilde{x}_3^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(6 + j) \\
\tilde{x}_4^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(13 + j) \\
\tilde{x}_5^* = \sum_{j=-4}^{4} (1 - 0.05|j|)/(13 + j),
\end{cases}
$$
which satisfies the first two goals, while the third objective is 80.
6 Conclusions
In this paper, CCP and CCGP models were used to formulate capital budgeting problem with fuzzy coefficients and decisions. A hybrid intelligent algorithm was also designed for solving the proposed fuzzy programming models. Two numerical examples showed that the hybrid intelligent algorithm was effective.
Acknowledgments This work was supported by the National Natural Science Foundation of China (No. 60425309), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20020003009).
References
1. Charnes, A., Cooper, W.W.: Chance-constrained programming. Management Science, 6(1959), 73–79
2. Cybenko, G.: Approximations by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(1989), 183–192
3. De, P.K., Acharya, D., Sahu, K.C.: A chance-constrained goal programming model for capital budgeting. Journal of the Operational Research Society, 33(1982), 635–638
4. Gao, J., Lu, M.: Dependent-chance integer programming models for capital budgeting problems in fuzzy environments. Proceedings of the Asia-Pacific Conference on Genetic Algorithms and Applications, May 3-5, (2000), Global-Link Publishing Company, Hong Kong, 409–514
5. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks, 2(1989), 359–366
6. Iwamura, K., Liu, B.: Chance-constrained integer programming for capital budgeting in fuzzy environments. Journal of the Operational Research Society, 49(1998), 854–860
7. Iwamura, K., Liu, B.: Dependent-chance integer programming applied to capital budgeting. Journal of the Operations Research Society of Japan, 42(1999), 117–127
8. Keown, A.J., Martin, J.D.: A chance-constrained goal programming model for working capital management. Engng Econ, 22(1977), 153–174
9. Keown, A.J., Taylor, B.W.: A chance-constrained integer goal programming model for capital budgeting in the production area. Journal of the Operational Research Society, 31(1980), 579–589
10. Liu, B., Iwamura, K.: A note on chance-constrained programming with fuzzy coefficients. Fuzzy Sets and Systems, 100(1998), 229–233
11. Liu, B., Iwamura, K.: Chance-constrained programming with fuzzy parameters. Fuzzy Sets and Systems, 94(1998), 227–237
12. Liu, B.: Uncertain Programming, John Wiley & Sons, New York, 1999
13. Liu, B., Iwamura, K.: Modeling stochastic decision systems using dependent-chance programming. European Journal of Operational Research, 101(1997), 193–203
14. Liu, B.: Dependent-chance programming: A class of stochastic optimization. Computers & Mathematics with Applications, 34(1997), 89–104
15. Liu, B.: Dependent-chance programming with fuzzy decisions. IEEE Transactions on Fuzzy Systems, 7(1999), 354–360
16. Liu, B.: Dependent-chance programming in fuzzy environments. Fuzzy Sets and Systems, 109(2000), 95–104
17. Liu, B.: Theory and Practice of Uncertain Programming, Physica-Verlag, Heidelberg, 2002
Genetic Algorithms for Dissimilar Shortest Paths Based on Optimal Fuzzy Dissimilar Measure and Applications

Yinzhen Li¹˒², Ruichun He¹, Linzhong Liu¹, and Yaohuang Guo²

¹ Lanzhou Jiaotong University, Lanzhou 730070, China
² Southwest Jiaotong University, Chengdu, 610031, China
Abstract. The derivative problems of the classical shortest path problem (SPP) are becoming more and more important in real life [1]. The dissimilar shortest paths problem is a typical derivative problem. In a Vehicles Navigation System (VNS), it is necessary to provide drivers with alternative paths to select from. Usually, the path selected is dissimilar to the jammed path. In fact, "dissimilar" is a fuzzy notion. Considering traffic and transportation networks, in this paper we put forward a definition of a dissimilar paths measure that takes into account the decision maker's preference on both the road sections and the intersections. A minimization model is formulated in which not only the length of the paths but also the paths dissimilar measure is considered, and a genetic algorithm is designed. Finally, we calculate and analyze the dissimilar paths in the traffic network of the middle and east districts of Lanzhou city, P.R. China, by the method proposed in this paper.
1 Introduction
With the development of communications, computer science, traffic and transportation systems, and so on, derivative problems of the classical SPP are becoming more and more important in real life, and many researchers have paid much attention to them [1]. In some applications, restrictive conditions make it very difficult to implement a single shortest path. Thus the second shortest path, the third shortest path, and so on are needed. For example, in order to find a practicable railway freight path, it is always necessary to find k-shortest paths because of the restriction of railway transport capacity. In the 0-1 assignment method for the urban traffic assignment problem, it is likewise necessary to find the k-shortest paths. There have been many works on finding k-shortest paths [2-4].

There is another kind of k-shortest paths problem in traffic and transportation systems, called dissimilar paths. The dissimilar paths are a set of shortest paths from the source node to the destination node that have minimum common edges or maximum geographical difference. The maximum geographical difference is fuzzy. For example, in a Vehicles Navigation System (VNS), it is necessary to provide drivers with several alternative paths to select from. However, because the circumstances or traffic jams differ between road sections, and the number of intersections or the delay differs between paths, and so on, drivers always want to choose a satisfactory path by weighing various factors together. Usually, the path chosen is dissimilar to the jammed path. In addition, when delivering military supplies or special hazardous goods, dispatchers always choose several dissimilar paths according to weather conditions or natural disasters.

S.P. Miaou, S.M. Chin, D.R. Shier and C.K. Lee proposed that, after a set of shortest paths has been obtained by the traditional algorithms for finding k-shortest paths, decision makers choose a satisfactory path from them according to their requirements [3,4,5]. However, only one objective, such as time, distance or cost, is considered in the traditional algorithms. Thus the shortest paths in a set are very similar. In many cases, these paths share many common edges or common nodes, and it is generally very difficult for the decision makers to choose the best dissimilar path. R. Dial, F. Glover and D. Karney put forward the method called GSP (Gateway Shortest Paths) [6]. Interested readers are referred to reference [6] for a detailed survey. Obviously, this method involves randomness in nominating the passing nodes and emphasizes the geographical distribution of a network. In addition, K. Lombard and R.L. Church pointed out that this method sometimes yields a group of very similar paths and loses some very good dissimilar paths [7]. E. Erkut and N.M. Ruphail proposed the penalty iteration method [8,12]. The essence of the method lies in the penalty mechanism. Besides, there are p-dispersion methods [9,10,11].

In the VNS, considering the delay caused by traffic jams, drivers want to get a group of shortest paths that are dissimilar in both road sections and intersections. Taking this practical situation into account, Li and He defined the paths dissimilar measure α based on road sections and intersections, and designed the node-edge penalty algorithm [14]. Further, in this paper we formulate a fuzzy dissimilar measure and a minimum α model with limited path length, and we design a genetic algorithm. Finally, as an instance, we calculate and analyze the dissimilar paths in the real traffic network of the middle and east districts of Lanzhou city, P.R. China.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 312–320, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Models

2.1 α Fuzzy Measure
Definition 1. Given a directed graph $G = \{V, E, W\}$, where $V = \{v_1, v_2, \cdots, v_n\}$ is a set of nodes, $|V| = n$, $E = \{e_1, e_2, \cdots, e_m\}$ is a set of edges, $|E| = m$, and $W = \{w_{i,j} \mid i, j \in V\}$ is a set of edge weights. Let $p^g_{1,n}$ be the $g$-th path from source node 1 to destination node $n$. Then $p^g_{1,n} = \{v_1, e_i, v_k, \cdots, e_j, v_n\}$ denotes the set of nodes and edges in the path, $l^g_{1,n}$ is the length of the path, and $v^g_{1,n}$ is the number of nodes in the path, not including the source node and the destination node. Let $P_{1,n}$ be the set of all paths from source node 1 to destination node $n$.
Definition 2. For $p_{1,n} \in P_{1,n}$ and $p'_{1,n} \in P_{1,n}$, let $l_{1,n}$ and $l'_{1,n}$ be the lengths of $p_{1,n}$ and $p'_{1,n}$, respectively, and let $v_{1,n}$ and $v'_{1,n}$ be their numbers of intermediate nodes. Then the membership function
$$
\alpha =
\begin{cases}
1, & \text{when } p_{1,n} = p'_{1,n}, \\
\dfrac{2\lambda \sum w_{i,j}}{l_{1,n} + l'_{1,n}} + \dfrac{2(1-\lambda)\, v^*_{1,n}}{v_{1,n} + v'_{1,n}}, & \text{others}, \\
0, & \text{when } p_{1,n} \cap p'_{1,n} = \Phi,
\end{cases}
$$
is called the paths fuzzy dissimilar measure of path $p_{1,n}$ to $p'_{1,n}$ (or $p'_{1,n}$ to $p_{1,n}$), or simply the α fuzzy measure. Here the sum of the $w_{i,j}$ is taken over the edges $e_k \in p_{1,n} \cap p'_{1,n}$, and $v^*_{1,n}$ is the number of nodes (not including source node 1 and destination node $n$) that are present in both $p_{1,n}$ and $p'_{1,n}$. $\lambda$ and $1 - \lambda$ are the decision maker's preferences for edges (road sections) and nodes (intersections), respectively.

[Fig. 1. The membership function, where $p_{1,n} = p'_{1,n}$ at "m", and $p_{1,n} \cap p'_{1,n} = \Phi$ at "n". Vertical axis: α; horizontal axis: the common nodes and edges.]

2.2 Model
Assume we have obtained $m - 1$ shortest paths $p^r_{1,n}$ from node 1 to node $n$, $r = 1, 2, \ldots, m - 1$. The shortest path among them is $p^*_{1,n}$, and its length is $l^*_{1,n}$. To get the $m$-th shortest path $p^m_{1,n}$ which has the best paths dissimilar measure to $p^r_{1,n}$, $r = 1, 2, \ldots, m - 1$, we formulate the following optimal α model with limited path length:
$$
\begin{aligned}
\min\ & \alpha = \frac{1}{C_m^2} \sum_{k} \alpha_k, \quad k = 1, 2, \ldots, C_m^2, \\
\text{s.t.}\ & \alpha_k =
\begin{cases}
1, & \text{when } p^g_{1,n} = p^h_{1,n}, \\
\dfrac{2\lambda \sum w_{i,j}}{l^g_{1,n} + l^h_{1,n}} + \dfrac{2(1-\lambda)\, v^*_{1,n}}{v^g_{1,n} + v^h_{1,n}}, & \text{others}, \\
0, & \text{when } p^g_{1,n} \cap p^h_{1,n} = \Phi,
\end{cases}
\quad g = 1, 2, \ldots, m;\ h = 1, 2, \ldots, m;\ g \ne h, \\
& \sum_{j} x_{i,j} - \sum_{j} x_{j,i} =
\begin{cases}
1, & \text{if } i = 1, \\
0, & \forall\, i \in V \setminus \{1, n\}, \\
-1, & \text{if } i = n,
\end{cases} \\
& l^m_{1,n} = \sum w_{i,j}\, x_{i,j}, \\
& l^m_{1,n} \le (1 + \beta)\, l^*_{1,n}, \\
& x_{i,j} = 0 \text{ or } 1,
\end{aligned}
$$
where the first formula minimizes the paths fuzzy dissimilar measure α. The second formula calculates the paths fuzzy dissimilar measure of two arbitrary paths. The third is the set of constraints defining a path from node 1 to node $n$. The fourth and fifth are the constraints on the length of the $m$-th path, where β is a given constant. It is obvious that the model is NP-hard. We propose the following genetic algorithm.
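The α fuzzy measure of Definition 2, which the model averages over all pairs of paths, can be computed directly. The sketch below is an illustration under stated assumptions: a path is a list of node labels, the weights are stored in a dictionary keyed by directed edge `(i, j)`, and the sum in the first term runs over the common edges of the two paths. The function names and the example network are hypothetical.

```python
from itertools import combinations


def path_edges(path):
    """Directed edges (i, j) traversed by a path given as a list of nodes."""
    return list(zip(path, path[1:]))


def alpha_measure(p, q, w, lam=0.5):
    """alpha fuzzy measure of two paths with common source and destination."""
    if p == q:
        return 1.0
    common_edges = set(path_edges(p)) & set(path_edges(q))
    common_nodes = set(p[1:-1]) & set(q[1:-1])  # intermediate nodes only
    if not common_edges and not common_nodes:
        return 0.0
    length_p = sum(w[e] for e in path_edges(p))
    length_q = sum(w[e] for e in path_edges(q))
    edge_term = 2.0 * lam * sum(w[e] for e in common_edges) / (length_p + length_q)
    node_count = (len(p) - 2) + (len(q) - 2)
    node_term = 2.0 * (1.0 - lam) * len(common_nodes) / node_count if node_count else 0.0
    return edge_term + node_term


def mean_alpha(paths, w, lam=0.5):
    """Objective of the model: average alpha over all unordered pairs of paths."""
    pairs = list(combinations(paths, 2))
    return sum(alpha_measure(p, q, w, lam) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    weights = {(1, 2): 2.0, (2, 4): 3.0, (1, 3): 2.0, (3, 4): 3.0, (2, 3): 1.0}
    print(alpha_measure([1, 2, 4], [1, 2, 3, 4], weights))
    print(mean_alpha([[1, 2, 4], [1, 3, 4], [1, 2, 3, 4]], weights))
```

A smaller value means the paths share fewer road sections and intersections, so a set of paths with a small mean value is the more dissimilar one.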
3 Algorithm
The priority-based encoding genetic algorithm is a very effective method. M. Gen [13] solved the bicriterion SP problem by this method in 2004, but he considered only acyclic graphs. Using the idea of priority-based encoding, we design a genetic algorithm for the above model that applies to arbitrary networks with non-negative weights. The dissimilar paths problem is very complex. Its solution is a set of paths, not a single path, and this set includes the shortest path $p^*_{1,n}$. Thus it is not necessary to design and calculate a fitness function for a single chromosome. Instead, we need to formulate a fitness function for a set of chromosomes, and select several sets of chromosomes as the offspring. Assume we want to find $m$ optimal dissimilar paths. Then each set of chromosomes includes $m - 1$ chromosomes, because the shortest path $p^*_{1,n}$ is certainly included in the optimal solution. Thus, the population size is a multiple of $m - 1$.
Node i:          1  2  3  4  5  6  7
Priority n(i):   5  4  1  6  7  3  2

Fig. 2. The code of a chromosome
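A chromosome such as the one in Fig. 2 assigns a priority to every node, and a path is grown from the source by repeatedly moving to the not-yet-visited neighbour with the highest priority (the decoding procedure of Step 2.3 below). The following sketch illustrates chromosome generation and decoding under stated assumptions: the adjacency lists and function names are hypothetical, the number of random swaps is taken to be ⌈n/2⌉, the source node is treated as visited from the start, and a dead end returns `None` instead of jumping back to regenerate the chromosome.

```python
import random


def random_chromosome(n, rng):
    """Start from n(i) = i and apply ceil(n / 2) random swaps of priorities."""
    priority = list(range(1, n + 1))
    for _ in range((n + 1) // 2):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            priority[a], priority[b] = priority[b], priority[a]
    return priority  # priority[i - 1] is the priority of node i


def decode(priority, adjacency, source, target):
    """Grow a path by always moving to the unvisited neighbour of highest priority.

    Returns the node list of the path, or None if a dead end is reached
    (the chromosome is then infeasible).
    """
    path = [source]
    visited = {source}
    node = source
    while node != target:
        candidates = [j for j in adjacency.get(node, ()) if j not in visited]
        if not candidates:
            return None
        node = max(candidates, key=lambda j: priority[j - 1])
        visited.add(node)
        path.append(node)
    return path


if __name__ == "__main__":
    # Priorities of Fig. 2 on a hypothetical 7-node directed network.
    fig2_priority = [5, 4, 1, 6, 7, 3, 2]
    network = {1: [2, 3], 2: [4, 5], 3: [5, 6], 4: [7], 5: [4, 6, 7], 6: [7]}
    print(decode(fig2_priority, network, source=1, target=7))
    print(random_chromosome(7, random.Random(0)))
```

Because every swap only permutes the priorities, each chromosome remains a permutation of 1, ..., n, which is what makes the swap-based mutation of Step 4 safe.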
The steps of the algorithm are as follows.

Step 1. Find the shortest path $p^*_{1,n}$ from node 1 to node $n$ by the Dijkstra algorithm, and mark it as the $m$-th path. Let $l^*$ be its length.
Step 2. Generate the initial population.
Step 2.1. Let $n(i) = i$, $\forall i \in V$.
Step 2.2. Generate $\lceil n/2 \rceil$ random integer pairs $l_1, l_2 \in [1, n]$; if $l_1 \ne l_2$, then swap($n(l_1)$, $n(l_2)$). The array n() is a chromosome.
Step 2.3. Decode. The process of decoding from node 1 to node $n$ is as follows.

    i = 1
    flag = false
    P = Φ
    while not flag
        A(i) = {all adjacent vertices of node i}
        repeat
            j* = arg max{n(j)}, j ∈ A(i)
            if j* ∈ P then A(i) − {j*} → A(i)
            else p(i) = j*; {j*} + P → P; i = j*; break
            if j* = n then flag = true; return
            if A(i) = Φ then go to Step 2.1 {array n() is an infeasible chromosome}
        until true
    loop

Then p() is a path from node 1 to node $n$.
Step 2.4. Examine the feasibility. Calculate the length of path p(): $l = \sum w_{i,j}$. If $l \le (1 + \beta) l^*$, then n() is a feasible chromosome; otherwise, n() is not feasible, go to Step 2.1.
Step 2.5. If size feasible chromosomes have been obtained, go to Step 3; otherwise, go to Step 2.1 to continue.
Step 3. Crossover operator. The OX (Order Crossover) operator is adopted in this paper. Select chromosomes with crossover probability $P_c$ and randomly group them into pairs. In this way, we obtain the pairs of crossover parents. The OX is shown in Fig. 3.
Parent 1:    5  4  1  6  7  3  2
Offspring:   5  4  1  6  7  3  2
Parent 2:    4  7  6  1  2  3  5
Fig. 3. Example of OX operator
Step 3.1. Random multi-position crossover is adopted here. One offspring is obtained by crossing one pair: the genes of parent 1 to be preserved are selected at random, and the unpreserved genes of parent 1 are replaced with the corresponding genes of parent 2.

Step 3.2. Examine the feasibility of the offspring generated in Step 3.1 by means of Step 2.3 and Step 2.4. If it is infeasible, repeat Step 3.1 and Step 3.2 until it is feasible or the predetermined number of rounds has been run. If no feasible offspring can be obtained by Step 3.1 and Step 3.2, parent 1 (or parent 2) is kept as the offspring.

Step 3.3. Repeat Step 3.1 and Step 3.2 until all pairs are crossed.

Step 4. Mutation operator. Select chromosomes with mutation probability Pm. The swap mutation operator is used here.

Step 4.1. Select two gene positions of a mutation parent at random and swap them. The process is shown in Fig. 4.
Parent:      4  3  2  1  6  5  7
Offspring:   4  3  5  1  6  2  7
Fig. 4. Example of mutation operator
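The two operators of Steps 3.1 and 4.1 can be sketched as follows. The function names are hypothetical. The rule used to fill the unpreserved positions (take the missing genes in the order in which they appear in parent 2, so that the offspring stays a permutation) is an assumption, since the text does not spell it out. The swap mutation reproduces the example of Fig. 4.

```python
def position_crossover(parent1, parent2, keep_mask):
    """Keep parent1 genes where keep_mask is True.

    The remaining positions are filled with the genes that are still missing,
    taken in the order in which they occur in parent2 (assumed fill rule).
    """
    kept = {gene for gene, keep in zip(parent1, keep_mask) if keep}
    filler = iter([gene for gene in parent2 if gene not in kept])
    return [gene if keep else next(filler) for gene, keep in zip(parent1, keep_mask)]


def swap_mutation(parent, i, j):
    """Return a copy of the parent with the genes at positions i and j swapped."""
    child = list(parent)
    child[i], child[j] = child[j], child[i]
    return child


# Parents of Fig. 3 with an arbitrary (hypothetical) preservation mask.
child = position_crossover([5, 4, 1, 6, 7, 3, 2], [4, 7, 6, 1, 2, 3, 5],
                           [True, False, False, True, True, False, False])

# Parent of Fig. 4; swapping the third and sixth genes gives the offspring shown there.
mutant = swap_mutation([4, 3, 2, 1, 6, 5, 7], 2, 5)
```

In the algorithm the mask and the two swap positions would be drawn at random, and each offspring would then be decoded and checked as in Steps 3.2 and 4.2.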
Step 4.2. Examine the feasibility of the offspring generated in Step 4.1 by means of Step 2.3 and Step 2.4. If it is infeasible, repeat Step 4.1 and Step 4.2 until it is feasible or the predetermined number of rounds has been run. If no feasible offspring can be obtained by Step 4.1 and Step 4.2, the parent is kept as the offspring.

Step 4.3. Repeat Step 4.1 and Step 4.2 until all mutation chromosomes are processed.

Step 5. Evaluate the fitness function of each chromosome group. Since 0 ≤ α ≤ 1, the definition of α implies that the smaller α is, the better the dissimilarity measure of the paths. Thus, the fitness function of the kth chromosome group is defined as

$$\mathrm{eval}(k) = 1 - \frac{\alpha_k}{\sum_{j=1}^{C_{size}^{m-1}} \alpha_j}, \quad k = 1, 2, \ldots, C_{size}^{m-1},$$

where

$$\alpha_k = \frac{1}{C_m^2} \sum_{g,h} \begin{cases} 1, & \text{when } p^{g}_{1,n} = p^{h}_{1,n}, \\ \dfrac{2\lambda \sum w_{i,j}}{l^{g}_{1,n} + l^{h}_{1,n}} + \dfrac{2(1-\lambda)\, v^{*}_{1,n}}{v^{g}_{1,n} + v^{h}_{1,n}}, & \text{others}, \\ 0, & \text{when } p^{g}_{1,n} \cap p^{h}_{1,n} = \Phi, \end{cases} \qquad g, h = 1, 2, \ldots, m;\ g \ne h.$$
Step 6. Select T (T × (m − 1) = size) chromosome groups as the offspring by the roulette wheel approach. The probability that the kth chromosome group is selected is

$$\frac{\mathrm{eval}(k)}{\sum_{j=1}^{C_{size}^{m-1}} \mathrm{eval}(j)}, \quad k = 1, 2, \ldots, C_{size}^{m-1}.$$

Step 7. Repeat Step 3 to Step 6 until a satisfactory set of solutions is obtained or the predetermined number of rounds has been run.
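The group fitness of Step 5 and the selection probabilities of Step 6 can be illustrated with a short sketch. The function names and the three α values are hypothetical, and the α values are taken as given rather than computed from paths.

```python
import random


def group_fitness(alphas):
    """eval(k) = 1 - alpha_k / sum_j alpha_j for every chromosome group k."""
    total = sum(alphas)
    return [1.0 - a / total for a in alphas]


def selection_probabilities(evals):
    """Roulette wheel probability of each group: eval(k) / sum_j eval(j)."""
    total = sum(evals)
    return [e / total for e in evals]


def roulette_select(probabilities, count, rng):
    """Draw `count` group indices (with replacement) by spinning the wheel."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=count)


alphas = [0.1, 0.3, 0.6]                  # hypothetical dissimilarity measures
evals = group_fitness(alphas)             # smaller alpha gives larger fitness
probs = selection_probabilities(evals)
picked = roulette_select(probs, count=2, rng=random.Random(0))
```

With these values the fitness vector is approximately (0.9, 0.7, 0.4) and the selection probabilities are approximately (0.45, 0.35, 0.20), so the group with the smallest α is the most likely to be selected.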
4 Instance
In a vehicle navigation system (VNS), it is necessary to help drivers avoid jammed routes by providing them with some dissimilar paths. Fig. 5 shows the simplified network of the middle and east districts of Lanzhou city in China. According to real data and calculation, the route 1, 4, 9, 14, 20, 21, 28, 34, 40, 44, 45 is always jammed among all paths from 1 to 45. We now want to provide 4 dissimilar routes from 1 to 45 for the drivers. We implemented the algorithm as a computer program and ran it on this instance with size = 40, Pc = 0.4, Pm = 0.2, λ = 0.5, β = 0.2. The computing results are given in Table 1, where no. 2 to no. 5 are the dissimilar paths provided for the drivers. The decision makers may choose the required route from Table 1. Fig. 6 shows that the optimal solutions do not depend on the initial population, which illustrates the good robustness of the proposed algorithm. In addition, Fig. 7 shows that the convergence of the algorithm is also good.

Table 1. The computing results
No.   Dissimilar path                              Length
1     1, 4, 9, 14, 20, 21, 28, 34, 40, 44, 45      11271
2     1, 3, 8, 13, 19, 26, 33, 39, 43, 45          12932
3     1, 5, 11, 12, 45                             13509
4     1, 5, 11, 22, 29, 35, 40, 44, 45             12384
5     1, 4, 9, 14, 20, 27, 33, 39, 43, 45          12398
The average dissimilar measure α = 0.11145
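As a quick consistency check, the lengths in Table 1 can be compared with the feasibility bound of Step 2.4. The sketch below assumes that route no. 1 is the shortest path found in Step 1, so that l∗ = 11271, and that m − 1 = 4 because four dissimilar routes are requested; the variable names are hypothetical.

```python
from math import comb

shortest = 11271                      # length of route no. 1, taken as l*
beta = 0.2
lengths = {2: 12932, 3: 13509, 4: 12384, 5: 12398}   # routes no. 2 to no. 5

bound = (1 + beta) * shortest         # feasibility bound (1 + beta) * l*
all_feasible = all(length <= bound for length in lengths.values())

size, routes = 40, 4                  # population size and m - 1
groups_selected = size // routes      # T, from T * (m - 1) = size
groups_possible = comb(size, routes)  # number of candidate chromosome groups
```

Under these assumptions the bound is about 13525.2, all four reported routes lie below it, T equals 10, and there are 91390 candidate groups of 4 chromosomes.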
5 Conclusion
In a vehicle navigation system (VNS), it is necessary to offer drivers alternative paths to choose from. Usually, the selected path is a fuzzy dissimilar path to the jammed path. Considering traffic and transportation networks, this paper puts forward the definition of the α fuzzy dissimilar paths measure, which takes into account the decision maker's preference on both the road sections and the intersections. The minimum α model is formulated, in which both the length of the paths and their dissimilarity measure are considered. The designed genetic algorithm has good robustness and convergence. It can find a group of dissimilar paths with both the shortest lengths and the best fuzzy dissimilar measures.
Fig. 5. The simplified traffic network of middle and east districts of Lanzhou city
Fig. 6. The relation of optimal and initial population
Fig. 7. The evolution process of the first population
Acknowledgments This work is supported by the National Natural Science Foundation of China (No. 70071028) and the Qinglan Project of Lanzhou Jiaotong University.
References
1. B.V. Cherkassky, Andrew V. Goldberg, Tomasz Radzik. Shortest paths algorithms: Theory and experimental evaluation [J]. Mathematical Programming, 1996: 129–174.
2. J.Y. Yen. Finding the k shortest loopless paths in a network [J]. Management Science, 1971: 17, 712–716.
3. S.P. Miaou, S.M. Chin. Computing k-shortest paths for nuclear spent fuel highway transportation [J]. European Journal of Operational Research, 1991: 53, 64–80.
4. D.R. Shier. On algorithms for finding the k shortest paths in a network [J]. Networks, 1979: 17, 341–352.
5. C.K. Lee. A multiple path routing strategy for vehicle route guidance systems [J]. Tran. Res., 1994: 2c, 185–195.
6. R. Dial, F. Glover, D. Karney. A computational analysis of alternative algorithms and labeling techniques for finding shortest path trees [J]. Networks, 1999: 9, 214–248.
7. K. Lombard, R.L. Church. The gateway shortest path problem: Generating alternative routes for a corridor location problem [J]. Geographical Systems, 1993: 1, 25–45.
8. Kanliang Wang, Yinfeng Xu. A model and algorithm on the path selection in dissimilar path problem [J]. Theory Method Application of System Engineering, 2001: 1, 8–12.
9. M. Kuby. Programming models for facility dispersion: The p-dispersion and maximum dispersion problems [J]. Geographical Analysis, 1987: 19, 315–329.
10. E. Erkut. The discrete p-dispersion problem [J]. European Journal of Operational Research, 1990: 46, 48–60.
11. N.M. Ruphail, et al. A DSS for dynamic pre-trip route planning [A]. Applications of Advanced Technologies in Transportation Engineering [C]: Proc. of the 4th Intl. Conference, 1995: 325–329.
12. Qun Yang. Research on the path selection based on multiple rational paths [J]. Journal of Managing Engineering, 2004: 4, 42–45.
13. M. Gen. Bicriterion network design problem [R]. The 3rd International Information and Management Science Conference, Dunhuang, P.R. of China, 2004: 5, 419–425.
14. Yinzhen Li and Ruichun He. On the models and algorithms for finding dissimilar shortest paths in a traffic network [R]. The 7th National Operation Research Conference of China, Qingdao, China, 2004: 8
Convergence Criteria and Convergence Relations for Sequences of Fuzzy Random Variables
Yan-Kui Liu1 and Jinwu Gao2
1 College of Mathematics and Computer Science, Hebei University, Baoding 071002, Hebei, China [email protected]
2 School of Information, Renmin University of China, Beijing 100872, China [email protected]
Abstract. A fuzzy random variable is a measurable map from a probability space to a collection of fuzzy variables. In this paper, we first present several new convergence concepts for sequences of fuzzy random variables, including convergence almost sure, uniform convergence, almost uniform convergence, convergence in mean chance, and convergence in mean chance distribution. Then, we discuss the criteria for convergence almost sure, almost uniform convergence, and convergence in mean chance. Finally, we deal with the relationship among the various types of convergence.
1 Introduction
In classical measure theory [1], there are well-known convergence theorems such as Lebesgue's theorem, Riesz's theorem and Egoroff's theorem for sequences of measurable functions. Nonadditive measure theory [22] is an extension of measure theory. By using the structural characteristics of fuzzy measures, most of the convergence theorems of measure theory have been studied for both real-valued and set-valued measurable functions [10,11]. Possibility theory [2,4,23] can be regarded as a branch of nonadditive measure theory, since it is based on two dual semicontinuous nonadditive measures, the possibility and necessity measures; it is also the theoretical foundation of fuzzy programming [7]. In a practical decision-making process, we often face a hybrid uncertain environment in which linguistic uncertainty (fuzziness) and frequency-based uncertainty (randomness) coexist. The fuzzy random variable is one of the tools for dealing with this kind of twofold uncertainty. The term fuzzy random variable was introduced by Kwakernaak [6] to describe phenomena in which fuzziness and randomness appear simultaneously. Since then, variants and extensions have been presented by other researchers for different purposes, e.g., Krätschmer [3], Kruse and Meyer [5], López-Diaz and Gil [17], and Puri and Ralescu [20]. In this paper, we adopt the definition of fuzzy random variable given in [12] for fuzzy random optimization, such as the fuzzy random expected value model [13], the fuzzy random minimum-risk problem [14], and fuzzy random equilibrium chance-constrained programming [15]. For other formulations of fuzzy random programming, we refer to Wang and Qiao [16] and Luhandjula [18].

Our goal in this paper is to study convergence for sequences of fuzzy random variables, which is the theoretical foundation of fuzzy random optimization. The material is arranged into five sections. First, in Section 2, we recall some concepts of fuzzy random theory such as fuzzy variable, credibility measure, fuzzy random variable, and the mean chance of a fuzzy random event. Section 3 first presents several new convergence concepts for sequences of fuzzy random variables, such as uniform convergence, convergence in mean chance, and convergence in mean chance distribution, and then discusses their convergence criteria. The relationship among the various types of convergence is covered in Section 4.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 321–331, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Fuzzy Random Variables
Given a universe $\Gamma$, an ample field [21] $\mathcal{A}$ on $\Gamma$ is a class of subsets of $\Gamma$ that is closed under arbitrary unions, intersections, and complementation in $\Gamma$. Let Pos be a set function defined on the ample field $\mathcal{A}$. The set function Pos is said to be a possibility measure if it satisfies the following conditions:

(P1) $\mathrm{Pos}(\emptyset) = 0$, and $\mathrm{Pos}(\Gamma) = 1$;
(P2) $\mathrm{Pos}\left(\bigcup_{i \in I} A_i\right) = \sup_{i \in I} \mathrm{Pos}(A_i)$ for any subclass $\{A_i \mid i \in I\}$ of $\mathcal{A}$, where $I$ is an arbitrary index set.

The triplet $(\Gamma, \mathcal{A}, \mathrm{Pos})$ will be called a possibility space, which was called a pattern space by Nahmias [19]. Based on the possibility measure, a self-dual set function Cr, called the credibility measure, was defined as follows [9]:

Definition 1. Let $(\Gamma, \mathcal{A}, \mathrm{Pos})$ be a possibility space. The credibility measure, denoted Cr, is defined as

$$\mathrm{Cr}(A) = \frac{1}{2}\left(1 + \mathrm{Pos}(A) - \mathrm{Pos}(A^c)\right), \quad A \in \mathcal{A}, \tag{1}$$

where $A^c$ is the complement of $A$.

A credibility measure has the following properties:

(C1) $\mathrm{Cr}(\emptyset) = 0$, and $\mathrm{Cr}(\Gamma) = 1$.
(C2) Monotonicity: $\mathrm{Cr}(A) \le \mathrm{Cr}(B)$ for all $A, B \in \mathcal{A}$ with $A \subset B$.
(C3) Self-duality: $\mathrm{Cr}(A) + \mathrm{Cr}(A^c) = 1$ for all $A \in \mathcal{A}$.
(C4) Subadditivity: $\mathrm{Cr}(A \cup B) \le \mathrm{Cr}(A) + \mathrm{Cr}(B)$ for all $A, B \in \mathcal{A}$.
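Definition 1 and properties (C1) to (C4) are easy to check numerically on a finite possibility space. The following sketch assumes a three-point universe with hypothetical point possibilities `mu` (the largest being 1, so that Pos(Γ) = 1) and computes Pos as a maximum over the event; the function names are not from the paper.

```python
def pos(event, mu):
    """Possibility of an event on a finite universe: the largest point possibility."""
    return max((mu[g] for g in event), default=0.0)


def cr(event, mu):
    """Credibility by Definition 1: Cr(A) = (1 + Pos(A) - Pos(A^c)) / 2."""
    complement = set(mu) - set(event)
    return 0.5 * (1.0 + pos(event, mu) - pos(complement, mu))


mu = {"g1": 1.0, "g2": 0.6, "g3": 0.3}   # hypothetical point possibilities
A = {"g2", "g3"}
B = {"g1"}                                # complement of A

cr_A = cr(A, mu)   # 0.5 * (1 + 0.6 - 1.0)
cr_B = cr(B, mu)   # 0.5 * (1 + 1.0 - 0.6)
```

Here Cr(A) is about 0.3 and Cr(Aᶜ) is about 0.7, which illustrates self-duality (C3), while the empty set and the whole universe receive credibility 0 and 1 as in (C1).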
Using a possibility space, a fuzzy variable can be defined as follows.

Definition 2. Let $(\Gamma, \mathcal{A}, \mathrm{Pos})$ be a possibility space. A function $X$ from $\Gamma$ to $\mathbb{R}$ is said to be a fuzzy variable defined on the possibility space if

$$\{\gamma \mid X(\gamma) \le t\} \in \mathcal{A} \quad \text{for every } t \in \mathbb{R}. \tag{2}$$

The possibility distribution of $X$ is defined as

$$\mu_X(t) = \mathrm{Pos}\{\gamma \in \Gamma \mid X(\gamma) = t\} \tag{3}$$

for every $t \in \mathbb{R}$.

Definition 3 ([12]). Let $(\Omega, \Sigma, \Pr)$ be a probability space. A fuzzy random variable is a map $\xi : \Omega \to \mathcal{F}_v$ such that for any Borel subset $B$ of $\mathbb{R}$, the function

$$\xi^{*}(B)(\omega) = \mathrm{Pos}\{\gamma \mid \xi_\omega(\gamma) \in B\} \tag{4}$$

is measurable with respect to $\omega$, where $\mathcal{F}_v$ is a collection of fuzzy variables defined on a possibility space $(\Gamma, \mathcal{A}, \mathrm{Pos})$.

Suppose $\xi$ is a fuzzy random variable, and $f$ is a continuous function defined on $\mathbb{R}$. In the theory of fuzzy random optimization, a fuzzy random event is usually expressed as

$$f(\xi) \le 0. \tag{5}$$

For each fixed $\omega \in \Omega$, $\xi_\omega$ is a fuzzy variable and (5) becomes a fuzzy event

$$f(\xi_\omega) \le 0 \tag{6}$$
whose credibility is denoted as $\mathrm{Cr}\{f(\xi_\omega) \le 0\}$, which is a measurable function of $\omega$ [12].

Definition 4 ([14]). Let $\xi$ be a fuzzy random variable, and $f : \mathbb{R} \to \mathbb{R}$ be a continuous function. Then the mean chance, denoted Ch, of the fuzzy random event (5) is defined as

$$\mathrm{Ch}\{f(\xi) \le 0\} = \int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Cr}\{f(\xi_\omega) \le 0\} \ge p\}\, dp. \tag{7}$$

Since Pr is an additive measure, the mean chance defined by (7) is equivalent to the following form:

$$\mathrm{Ch}\{f(\xi) \le 0\} = \int_\Omega \mathrm{Cr}\{f(\xi_\omega) \le 0\}\, \Pr(d\omega). \tag{8}$$
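The equivalence of (7) and (8) can be illustrated on a discrete probability space. In the sketch below, `probs` are hypothetical probabilities of three outcomes and `creds` are the corresponding credibilities Cr{f(ξ_ω) ≤ 0}, taken as given. Form (8) is the plain expectation, and form (7) is integrated exactly because the integrand is a step function of p.

```python
def mean_chance_expectation(probs, creds):
    """Form (8): the expected value of the credibility over the outcomes."""
    return sum(p * c for p, c in zip(probs, creds))


def mean_chance_layers(probs, creds):
    """Form (7): integrate Pr{Cr >= p} over p in [0, 1], one step at a time."""
    total, previous = 0.0, 0.0
    for level in sorted(set(creds)):
        # On (previous, level] the set {Cr >= p} contains exactly these outcomes.
        tail = sum(p for p, c in zip(probs, creds) if c >= level)
        total += (level - previous) * tail
        previous = level
    return total


probs = [0.5, 0.3, 0.2]   # hypothetical Pr of each outcome
creds = [0.2, 0.7, 1.0]   # hypothetical credibility of the event for each outcome

ch_8 = mean_chance_expectation(probs, creds)
ch_7 = mean_chance_layers(probs, creds)
```

Both forms give a mean chance of about 0.51 for these values, as expected from the equivalence stated above.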
3 Convergence Criteria

3.1 Modes of Convergence

The intent of this section is to present some new convergence concepts for sequences of fuzzy random variables, including convergence almost uniform, convergence almost sure, convergence in chance, and convergence in distribution. From the measure-theoretic viewpoint, these concepts are important for the development of fuzzy random theory [8].
Definition 5 (Convergence almost sure). A sequence $\{\xi_n\}$ of fuzzy random variables is said to converge almost surely to a fuzzy random variable $\xi$, denoted $\xi_n \xrightarrow{a.s.} \xi$, if there exist $E \in \Sigma$, $F \in \mathcal{A}$ with $\Pr(E) = \mathrm{Cr}(F) = 0$ such that for every $(\omega, \gamma) \in \Omega \backslash E \times \Gamma \backslash F$,

$$\lim_{n \to \infty} \xi_{n,\omega}(\gamma) = \xi_\omega(\gamma).$$

Definition 6 (Uniform convergence). A sequence $\{\xi_n\}$ of fuzzy random variables is said to converge uniformly to a fuzzy random variable $\xi$ on $\Omega \times \Gamma$, denoted $\xi_n \xrightarrow{u.} \xi$, if

$$\lim_{n \to \infty} \sup_{(\omega,\gamma) \in \Omega \times \Gamma} |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| = 0.$$

Definition 7 (Almost uniform convergence). A sequence $\{\xi_n\}$ of fuzzy random variables is said to converge almost uniformly to a fuzzy random variable $\xi$, denoted $\xi_n \xrightarrow{a.u.} \xi$, if there exist two nonincreasing sequences $\{E_m\} \subset \Sigma$, $\{F_m\} \subset \mathcal{A}$ with $\lim_m \Pr(E_m) = \lim_m \mathrm{Cr}(F_m) = 0$ such that for each $m = 1, 2, \cdots$, we have $\xi_n \xrightarrow{u.} \xi$ on $\Omega \backslash E_m \times \Gamma \backslash F_m$.

Definition 8 (Convergence in chance). A sequence $\{\xi_n\}$ of fuzzy random variables is said to converge in chance Ch to a fuzzy random variable $\xi$, denoted $\xi_n \xrightarrow{Ch} \xi$, if for every $\varepsilon > 0$,

$$\lim_{n \to \infty} \mathrm{Ch}\{|\xi_n - \xi| \ge \varepsilon\} = 0.$$

Definition 9. Let $\xi$ be a fuzzy random variable. The (mean) chance distribution function of $\xi$ is defined as

$$G_\xi(t) = \mathrm{Ch}\{\xi \ge t\}, \quad t \in \mathbb{R}.$$

It is evident that $G_\xi$ is a nonincreasing real-valued function. Let $\{F_n\}$ and $F$ be nonincreasing real-valued functions. The sequence $\{F_n\}$ is said to converge weakly to $F$, denoted $F_n \xrightarrow{w.} F$, if $F_n(t) \to F(t)$ for all continuity points $t$ of $F$.

Definition 10 (Convergence in distribution). Let $G_{\xi_n}$ and $G_\xi$ be the chance distribution functions of fuzzy random variables $\xi_n$ and $\xi$, respectively. The sequence $\{\xi_n\}$ is said to converge in distribution to $\xi$, denoted $\xi_n \xrightarrow{d.} \xi$, if $G_{\xi_n} \xrightarrow{w.} G_\xi$.

3.2 Convergence Criteria
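The purpose of this section is to deal with the criteria for convergence almost sure, convergence almost uniform, and convergence in mean chance.

Proposition 1 (Convergence a.s. criterion). Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. Then $\xi_n \xrightarrow{a.s.} \xi$ if and only if for every $\varepsilon > 0$,

$$\mathrm{Ch}\left(\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0. \tag{9}$$

Proof. First, it is easy to check that $\xi_n \xrightarrow{a.s.} \xi$ iff the limit $\xi_{n,\omega} \xrightarrow{a.s.} \xi_\omega$ holds almost surely with respect to $\omega$, i.e.,

$$\Pr\left\{\omega \mid \xi_{n,\omega} \xrightarrow{a.s.} \xi_\omega\right\} = 1.$$

Therefore, there exists $E \in \Sigma$ with $\Pr(E) = 0$ such that for each $\omega \in \Omega \backslash E$, $\xi_{n,\omega} \xrightarrow{a.s.} \xi_\omega$, i.e., for every $\varepsilon > 0$,

$$\mathrm{Cr}\left(\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\}\right) = 0, \quad \text{a.s.},$$

which is equivalent to

$$\mathrm{Ch}\left(\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0.$$

The proof is complete.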
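Proposition 2 (Convergence a.u. criterion). Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. If $\xi_n \xrightarrow{a.u.} \xi$, then for every $\varepsilon > 0$, the limit

$$\lim_{m \to \infty} \mathrm{Cr}\left(\bigcup_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\}\right) = 0 \tag{10}$$

holds almost surely with respect to $\omega$, and furthermore

$$\lim_{m \to \infty} \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0. \tag{11}$$

Conversely, if $\omega$ is a discrete random variable assuming a finite number of values, then Eq. (10) implies $\xi_n \xrightarrow{a.u.} \xi$.

Proof. If $\xi_n \xrightarrow{a.u.} \xi$, then there exist two nonincreasing sequences $\{E_m\} \subset \Sigma$, $\{F_m\} \subset \mathcal{A}$ with $\lim_m \Pr(E_m) = \lim_m \mathrm{Cr}(F_m) = 0$ such that for each $m = 1, 2, \cdots$, $\xi_n \xrightarrow{u.} \xi$ on $\Omega \backslash E_m \times \Gamma \backslash F_m$.

Let $E = \bigcap_{m=1}^{\infty} E_m$. Then $\Pr(E) = 0$, and for every $\omega \in \Omega \backslash E$, there is a positive integer $m_\omega$ such that $\omega \in \Omega \backslash E_{m_\omega}$. Since $\{E_m\}$ is nonincreasing, we have $\omega \in \Omega \backslash E_m$ whenever $m \ge m_\omega$. Therefore, there is a subsequence $\{F_m, m \ge m_\omega\}$ of $\{F_m\}$ such that for each $m = m_\omega, m_\omega + 1, \cdots$, the sequence $\{\xi_{n,\omega}\}$ converges to $\xi_\omega$ uniformly on $\Gamma \backslash F_m$, i.e., $\xi_{n,\omega} \xrightarrow{a.u.} \xi_\omega$ almost surely with respect to $\omega$.

We now show that $\xi_{n,\omega} \xrightarrow{a.u.} \xi_\omega$ implies

$$\lim_{m \to \infty} \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0.$$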
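In fact, for any $\delta > 0$, there exists $F_\omega \in \mathcal{A}$ with $\mathrm{Cr}(F_\omega) < \delta$ such that $\{\xi_{n,\omega}\}$ converges to $\xi_\omega$ uniformly on $\Gamma \backslash F_\omega$. Thus, for every $\varepsilon > 0$, there exists a positive integer $m = m(\varepsilon, \omega)$ such that for all $\gamma \in \Gamma \backslash F_\omega$, $|\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| < \varepsilon$ whenever $n \ge m$. Therefore

$$\Gamma \backslash F_\omega \subset \bigcap_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| < \varepsilon\},$$

or

$$\bigcup_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \subset F_\omega,$$

which implies

$$\mathrm{Cr}\left(\bigcup_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\}\right) \le \mathrm{Cr}(F_\omega) < \delta.$$

Letting $\delta \to 0$, we have

$$\lim_{m \to \infty} \mathrm{Cr}\left(\bigcup_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\}\right) = 0.$$

Applying the dominated convergence theorem, we obtain

$$\lim_{m \to \infty} \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = \int_\Omega \lim_{m \to \infty} \mathrm{Cr}\left(\bigcup_{n=m}^{\infty} \{|\xi_{n,\omega} - \xi_\omega| \ge \varepsilon\}\right) \Pr(d\omega) = 0.$$

Conversely, suppose Eq. (10) is valid; we prove $\xi_n \xrightarrow{a.u.} \xi$. By supposition, assume $\omega$ has the following probability distribution

$$\omega \sim \begin{pmatrix} \omega_1 & \omega_2 & \cdots & \omega_N \\ p_1 & p_2 & \cdots & p_N \end{pmatrix}$$

with $p_i > 0$ and $\sum_{i=1}^{N} p_i = 1$. For each $i = 1, \cdots, N$, we have

$$\lim_{m \to \infty} \mathrm{Cr}\left(\bigcup_{n=m}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge \varepsilon\}\right) = 0.$$

Then for every $\delta \in (0, 1)$ and each $k = 1, 2, \cdots$, there exists a positive integer $m_k$ such that

$$\mathrm{Cr}\left(\bigcup_{n=m_k}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\}\right) < \delta/2k$$

for $i = 1, \cdots, N$. Let

$$F = \bigcup_{i=1}^{N} \bigcup_{k=1}^{\infty} \bigcup_{n=m_k}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\}.$$

Then

$$\mathrm{Cr}(F) = \mathrm{Cr}\left(\bigcup_{i=1}^{N} \bigcup_{k=1}^{\infty} \bigcup_{n=m_k}^{\infty} \{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\}\right) \le \sup_i \sup_k \mathrm{Pos}\left(\bigcup_{n=m_k}^{\infty} \{\gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\}\right) < \delta.$$

In addition, for each $k = 1, 2, \cdots$, one has

$$\sup_{1 \le i \le N} \sup_{\gamma \in \Gamma \backslash F} |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| < 1/k$$

whenever $n \ge m_k$, which implies $\xi_n \xrightarrow{a.u.} \xi$. The proof is complete.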
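Proposition 3 (Convergence in chance criterion). Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. Then $\xi_n \xrightarrow{Ch} \xi$ if and only if for every $\varepsilon > 0$,

$$\mathrm{Cr}\{\gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \xrightarrow{\Pr} 0.$$

Proof. Assume that $\xi_n \xrightarrow{Ch} \xi$. Then for every $\varepsilon > 0$ and $\eta > 0$,

$$\Pr\{\omega \mid \mathrm{Cr}\{|\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \ge \eta\} \le \frac{1}{\eta} \int_\Omega \mathrm{Cr}\{\gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\}\, \Pr(d\omega) = \frac{1}{\eta} \mathrm{Ch}\{|\xi_n - \xi| \ge \varepsilon\},$$

which implies $\mathrm{Cr}\{\gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \xrightarrow{\Pr} 0$.

On the other hand, if $\mathrm{Cr}\{\gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \xrightarrow{\Pr} 0$, then for every $\alpha \in (0, 1]$,

$$\lim_{n \to \infty} \Pr\{\omega \mid \mathrm{Cr}\{|\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \ge \alpha\} = 0.$$

Applying the Lebesgue bounded convergence theorem, we obtain

$$\lim_{n \to \infty} \mathrm{Ch}\{|\xi_n - \xi| \ge \varepsilon\} = \int_0^1 \lim_{n \to \infty} \Pr\{\omega \mid \mathrm{Cr}\{|\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \ge \alpha\}\, d\alpha = 0,$$

which completes the proof.

Corollary 1. Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. Then $\xi_n \xrightarrow{Ch} \xi$ provided that the limit $\xi_{n,\omega} \xrightarrow{Cr} \xi_\omega$ holds almost surely with respect to $\omega$.

Proof. Assume that $\xi_{n,\omega} \xrightarrow{Cr} \xi_\omega$ almost surely with respect to $\omega$. Then for every $\varepsilon > 0$, the limit

$$\lim_{n \to \infty} \mathrm{Cr}\{\gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} = 0$$

holds with probability 1. Since convergence a.s. implies convergence in probability, one has

$$\mathrm{Cr}\{\gamma \mid |\xi_{n,\omega}(\gamma) - \xi_\omega(\gamma)| \ge \varepsilon\} \xrightarrow{\Pr} 0,$$

which, by Proposition 3, implies $\xi_n \xrightarrow{Ch} \xi$.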
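Convergence in chance (Definition 8) and the Markov-type bound used in the proof of Proposition 3 can be illustrated on a small finite example. Everything in the sketch below is hypothetical: a two-point probability space, a three-point possibility space, and a sequence whose deviation from the limit is `gap[w][g] / n`. The mean chance is computed through form (8) as a probability-weighted sum of credibilities.

```python
def credibility(event, mu):
    """Cr(A) = (1 + Pos(A) - Pos(A^c)) / 2 on a finite possibility space."""
    pos_in = max((mu[g] for g in event), default=0.0)
    pos_out = max((mu[g] for g in mu if g not in event), default=0.0)
    return 0.5 * (1.0 + pos_in - pos_out)


def deviation_credibility(n, eps, w, mu, gap):
    """Cr{g : |xi_n - xi| >= eps} for outcome w, with |xi_n - xi| = gap[w][g] / n."""
    return credibility({g for g in mu if gap[w][g] / n >= eps}, mu)


def chance_of_deviation(n, eps, prob, mu, gap):
    """Ch{|xi_n - xi| >= eps} as the expected credibility over the outcomes."""
    return sum(p * deviation_credibility(n, eps, w, mu, gap) for w, p in prob.items())


prob = {"w1": 0.4, "w2": 0.6}
mu = {"g1": 1.0, "g2": 0.6, "g3": 0.3}
gap = {"w1": {"g1": 0.0, "g2": 1.0, "g3": 2.0},
       "w2": {"g1": 0.5, "g2": 0.0, "g3": 3.0}}

chances = [chance_of_deviation(n, 0.5, prob, mu, gap) for n in range(1, 9)]

# Markov-type bound from the proof of Proposition 3, for n = 1 and eta = 0.5.
eta = 0.5
lhs = sum(p for w, p in prob.items() if deviation_credibility(1, 0.5, w, mu, gap) >= eta)
rhs = chances[0] / eta
```

For ε = 0.5 the mean chance starts at about 0.54 for n = 1, never increases, and is exactly 0 from n = 7 on, so this toy sequence converges in chance. For n = 1 and η = 0.5 the probability on the left of the bound is 0.6, which is below the right-hand side of about 1.08.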
4 Relationship
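In this section, we discuss the relationship among the various types of convergence based on the convergence criteria obtained in Section 3. The following theorem compares convergence a.u. and convergence a.s.

Theorem 1. Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. If $\xi_n \xrightarrow{a.u.} \xi$, then $\xi_n \xrightarrow{a.s.} \xi$.

Proof. Assume $\xi_n \xrightarrow{a.u.} \xi$. By Proposition 2, we have

$$\lim_{m \to \infty} \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0.$$

Since

$$\mathrm{Ch}\left(\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) \le \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right),$$

we obtain

$$\mathrm{Ch}\left(\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0.$$

It follows from Proposition 1 that $\xi_n \xrightarrow{a.s.} \xi$. The proof is complete.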
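The following theorem compares convergence a.u. and convergence in chance.

Theorem 2. Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. If $\xi_n \xrightarrow{a.u.} \xi$, then $\xi_n \xrightarrow{Ch} \xi$. Conversely, if $\omega$ is a discrete random variable assuming a finite number of values, then $\xi_n \xrightarrow{Ch} \xi$ implies $\xi_n \xrightarrow{a.u.} \xi$.

Proof. Suppose $\xi_n \xrightarrow{a.u.} \xi$. By Proposition 2, we have

$$\lim_{m \to \infty} \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right) = 0.$$

According to the inequality

$$\mathrm{Ch}\{|\xi_m - \xi| \ge \varepsilon\} \le \mathrm{Ch}\left(\bigcup_{n=m}^{\infty} \{|\xi_n - \xi| \ge \varepsilon\}\right),$$

we obtain $\lim_{m \to \infty} \mathrm{Ch}\{|\xi_m - \xi| \ge \varepsilon\} = 0$, i.e., $\xi_n \xrightarrow{Ch} \xi$.

We now assume $\omega$ is a discrete random variable whose probability distribution is

$$\omega \sim \begin{pmatrix} \omega_1 & \omega_2 & \cdots & \omega_N \\ p_1 & p_2 & \cdots & p_N \end{pmatrix}$$

with $p_i > 0$ and $\sum_{i=1}^{N} p_i = 1$. By $\xi_n \xrightarrow{Ch} \xi$, we deduce that $\xi_{n,\omega_i} \xrightarrow{Cr} \xi_{\omega_i}$ for $i = 1, 2, \cdots, N$. Then for each positive integer $k = 1, 2, \cdots$,

$$\lim_{n \to \infty} \mathrm{Cr}\{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\} = 0$$

for $i = 1, 2, \cdots, N$. Thus, for each $m$, there exists $N_{km}$ such that for $i = 1, 2, \cdots, N$,

$$\mathrm{Cr}\{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\} < 1/2m$$

whenever $n \ge N_{km}$. Letting

$$E_m = \bigcup_{i=1}^{N} \bigcup_{k=1}^{\infty} \bigcup_{n \ge N_{km}} \{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge 1/k\},$$

we have

$$\mathrm{Cr}(E_m) \le \sup_i \sup_k \sup_{n \ge N_{km}} \mathrm{Pos}\left\{\gamma \in \Gamma \mid |\xi_{n,\omega_i}(\gamma) - \xi_{\omega_i}(\gamma)| \ge \frac{1}{k}\right\} < 1/m.$$

It is easy to show that $\{\xi_{n,\omega_i}\}$ converges to $\xi_{\omega_i}$ uniformly on each $\Gamma \backslash E_m$. Thus, $\xi_n \xrightarrow{a.u.} \xi$. The proof is complete.

As a consequence of Theorems 1 and 2, we have the following result:

Theorem 3. Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. If $\omega$ is a discrete random variable assuming a finite number of values, then $\xi_n \xrightarrow{Ch} \xi$ implies $\xi_n \xrightarrow{a.s.} \xi$.

Finally, we compare convergence in chance and convergence in distribution.

Theorem 4. Suppose $\{\xi_n\}$ and $\xi$ are fuzzy random variables. If $\xi_n \xrightarrow{Ch} \xi$, then $\xi_n \xrightarrow{d.} \xi$.

Proof. Let $G_n$ and $G$ be the chance distribution functions of $\xi_n$ and $\xi$, respectively. Then for every $\omega \in \Omega$, $t \in \mathbb{R}$, $\varepsilon > 0$ and each integer $n$, one has

$$\mathrm{Cr}\{\xi_{n,\omega} \ge t\} \le \mathrm{Cr}\{\xi_{n,\omega} \ge t, |\xi_{n,\omega} - \xi_\omega| < \varepsilon\} + \mathrm{Cr}\{\xi_{n,\omega} \ge t, |\xi_{n,\omega} - \xi_\omega| \ge \varepsilon\} \le \mathrm{Cr}\{\xi_\omega \ge t - \varepsilon\} + \mathrm{Cr}\{|\xi_{n,\omega} - \xi_\omega| \ge \varepsilon\}.$$

Integrating the preceding inequality with respect to $\omega$, we obtain

$$\mathrm{Ch}\{\xi_n \ge t\} \le \mathrm{Ch}\{\xi \ge t - \varepsilon\} + \mathrm{Ch}\{|\xi_n - \xi| \ge \varepsilon\}.$$

Letting $n \to \infty$, and then $\varepsilon \to 0$, we obtain

$$\limsup_{n \to \infty} \mathrm{Ch}\{\xi_n \ge t\} \le G(t^{-}),$$

where $G(t^{-})$ is the left limit of the function $G(s)$ at $s = t$.

On the other hand, according to the inequality

$$\mathrm{Cr}\{\xi_\omega \ge t + \varepsilon\} \le \mathrm{Cr}\{\xi_\omega \ge t + \varepsilon, |\xi_{n,\omega} - \xi_\omega| < \varepsilon\} + \mathrm{Cr}\{\xi_\omega \ge t + \varepsilon, |\xi_{n,\omega} - \xi_\omega| \ge \varepsilon\} \le \mathrm{Cr}\{\xi_{n,\omega} \ge t\} + \mathrm{Cr}\{|\xi_{n,\omega} - \xi_\omega| \ge \varepsilon\},$$

we have

$$\mathrm{Ch}\{\xi \ge t + \varepsilon\} \le \mathrm{Ch}\{\xi_n \ge t\} + \mathrm{Ch}\{|\xi_n - \xi| \ge \varepsilon\}.$$

Letting $n \to \infty$, and then $\varepsilon \to 0$, we deduce

$$\liminf_{n \to \infty} \mathrm{Ch}\{\xi_n \ge t\} \ge G(t^{+}),$$

where $G(t^{+})$ is the right limit of the function $G(s)$ at $s = t$.

At every continuity point $t$ of $G$ we have $G(t^{-}) = G(t^{+}) = G(t)$, so $G_n(t) \to G(t)$. Therefore, $G_n \xrightarrow{w.} G$, i.e., $\xi_n \xrightarrow{d.} \xi$.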
5 Concluding Remarks
The major new results of this paper can be summarized in the following three aspects:

• Several new convergence concepts for sequences of fuzzy random variables were presented, such as convergence almost uniform, convergence in mean chance and convergence in mean chance distribution.
• The criteria for convergence almost sure, convergence almost uniform, and convergence in chance were obtained.
• The relationships between convergence almost uniform and convergence almost sure, convergence almost uniform and convergence in chance, convergence in chance and convergence almost sure, and convergence in chance and convergence in distribution were established.

As a continuation of this work, several issues will be considered in our future research. For instance, we will apply the obtained convergence results to fuzzy random optimization in order to discuss the convergence of approximation approaches to the mean chance of a fuzzy random event and to the expected value of a fuzzy random variable.
Acknowledgments This work was supported by Natural Science Foundation of Hebei Province Grant A2005000087, and National Natural Science Foundation of China (NSFC).
References
1. Cohn, D. L.: Measure Theory. Birkhäuser, Boston (1980)
2. Dubois, D., Prade, H.: Possibility Theory. Plenum Press, New York (1988)
3. Krätschmer, V.: A Unified Approach to Fuzzy Random Variables. Fuzzy Sets Syst. 123 (2001) 1–9
4. Klir, G. J.: On Fuzzy-Set Interpretation of Possibility Theory. Fuzzy Sets Syst. 108 (1999) 263–373
5. Kruse, R., Meyer, K. D.: Statistics with Vague Data. D. Reidel Publishing Company, Dordrecht (1987)
6. Kwakernaak, H.: Fuzzy Random Variables–I. Definitions and Theorems. Inform. Sciences 15 (1978) 1–29
7. Liu, B.: Uncertain Programming. John Wiley & Sons, New York (1999)
8. Liu, B.: Uncertainty Theory: An Introduction to Its Axiomatic Foundations. Springer-Verlag, Berlin Heidelberg New York (2004)
9. Liu, B., Liu, Y.-K.: Expected Value of Fuzzy Variable and Fuzzy Expected Value Models. IEEE Trans. Fuzzy Syst. 10 (2002) 445–450
10. Liu, Y.-K.: On the Convergence of Measurable Set-Valued Function Sequence on Fuzzy Measure Space. Fuzzy Sets Syst. 112 (2000) 241–249
11. Liu, Y.-K., Liu, B.: The Relationship between Structural Characteristics of Fuzzy Measure and Convergences of Sequences of Measurable Functions. Fuzzy Sets Syst. 120 (2001) 511–516
12. Liu, Y.-K., Liu, B.: Fuzzy Random Variable: A Scalar Expected Value Operator. Fuzzy Optimization and Decision Making 2 (2003) 143–160
13. Liu, Y.-K., Liu, B.: A Class of Fuzzy Random Optimization: Expected Value Models. Inform. Sciences 155 (2003) 89–102
14. Liu, Y.-K., Liu, B.: On Minimum-Risk Problems in Fuzzy Random Decision Systems. Computers & Operations Research 32 (2005) 257–283
15. Liu, Y.-K., Liu, B.: Fuzzy Random Programming with Equilibrium Chance Constraints. Inform. Sciences 170 (2005) 363–395
16. Wang, G., Qiao, Z.: Linear Programming with Fuzzy Random Variable Coefficients. Fuzzy Sets Syst. 57 (1993) 295–311
17. López-Diaz, M., Gil, M. A.: Constructive Definitions of Fuzzy Random Variables. Statistics and Probability Letters 36 (1997) 135–143
18. Luhandjula, M. K.: Fuzziness and Randomness in an Optimization Framework. Fuzzy Sets Syst. 77 (1996) 291–297
19. Nahmias, S.: Fuzzy Variables. Fuzzy Sets Syst. 1 (1978) 97–101
20. Puri, M. L., Ralescu, D. A.: Fuzzy Random Variables. J. Math. Anal. Appl. 114 (1986) 409–422
21. Wang, P.: Fuzzy Contactability and Fuzzy Variables. Fuzzy Sets Syst. 8 (1982) 81–92
22. Wang, Z., Klir, G. J.: Fuzzy Measure Theory. Plenum Press, New York (1992)
23. Zadeh, L. A.: Fuzzy Sets as a Basis for a Theory of Possibility. Fuzzy Sets Syst. 1 (1978) 3–28
Hybrid Genetic-SPSA Algorithm Based on Random Fuzzy Simulation for Chance-Constrained Programming
Yufu Ning1,2, Wansheng Tang1, and Hui Wang3
1 Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
2 Department of Computer Science, Dezhou University, Dezhou 253023, China
3 Department of Statistics, Henan Institute of Finance and Economics, Zhengzhou 450002, China
Abstract. In this paper, a hybrid genetic-SPSA algorithm based on random fuzzy simulation is proposed for solving chance-constrained programming in random fuzzy decision-making systems by combining random fuzzy simulation, genetic algorithm (GA), and simultaneous perturbation stochastic approximation (SPSA). In the proposed algorithm, random fuzzy simulation is designed to estimate the chance of a random fuzzy event and the optimistic value of a random fuzzy variable, GA is employed to search for the optimal solution in the entire space, and SPSA is used to improve the new chromosomes obtained by the crossover and mutation operations at each generation of GA. At the end of the paper, an example is given to illustrate the effectiveness of the presented algorithm.
1 Introduction
In a large number of realistic decision-making systems, randomness and fuzziness usually appear simultaneously. The fuzzy random variable was introduced by Kwakernaak [2] as a function from a probability space to a collection of fuzzy variables. In contrast with fuzzy random variables, Liu [3] defined a random fuzzy variable as a function from a possibility space to a collection of random variables. In random fuzzy environments, chance-constrained programming models have been provided in [3,4]. The basic idea of random fuzzy chance-constrained programming is to make the decision that maximizes the optimistic value of the random fuzzy return subject to some chance constraints. To solve random fuzzy chance-constrained programming, hybrid intelligent algorithms (HIAs) have been designed in [3,4], integrating random fuzzy simulation, neural network (NN) and genetic algorithm (GA). In the HIAs, GA is used to search for the optimal solution in the entire space. GA performs well in global search, but it is relatively time-consuming to converge to a local optimum, especially in solving large-scale problems. A local search method, on the other hand, can converge faster to a local optimum but is poor at finding the global optimum. Therefore, the issue of incorporating a local search method into GA has attracted many researchers, e.g., [1,5,8].

SPSA is a gradient approximation method for solving optimization problems. This gradient approximation uses only two measurements of the objective function in each iteration, independent of the dimension of the problem. The SPSA algorithm is therefore efficient in solving large-dimensional optimization problems [6]. The details on the implementation of SPSA can be found in [7]. In essence, SPSA is a "hill-climbing" method, so it is a local search algorithm. Inspired by the above-mentioned algorithms, this paper proposes a hybrid genetic-SPSA algorithm based on random fuzzy simulation for solving random fuzzy chance-constrained programming. In Section 2, random fuzzy chance-constrained programming is introduced, and an algorithm is proposed in Section 3. Finally, the presented algorithm is employed to solve an example in Section 4.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 332–335, 2005. © Springer-Verlag Berlin Heidelberg 2005
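The two-measurement gradient approximation described above can be sketched as follows. This is a minimal sketch rather than the implementation used in the paper: it minimizes a loss (a return to be maximized would be passed with its sign reversed), it assumes the commonly used gain sequences a_k = a/(A + k + 1)^α and c_k = c/(k + 1)^γ with symmetric ±1 perturbations, and the quadratic test loss and all parameter values are hypothetical.

```python
import random


def spsa_minimize(loss, theta, iterations, a, A, c, alpha, gamma, rng):
    """Simultaneous perturbation stochastic approximation for minimizing `loss`."""
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (A + k + 1) ** alpha      # step-size gain
        c_k = c / (k + 1) ** gamma          # perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Only two loss measurements per iteration, whatever the dimension.
        diff = loss(plus) - loss(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


def quadratic(theta):
    """Hypothetical test loss with its minimum at (1, 1, 1)."""
    return sum((t - 1.0) ** 2 for t in theta)


start = [0.0, 0.0, 0.0]
result = spsa_minimize(quadratic, start, iterations=500, a=0.1, A=10.0,
                       c=0.1, alpha=0.602, gamma=0.101, rng=random.Random(0))
```

Each iteration perturbs all components at once, which is why the cost per iteration does not grow with the dimension. On this test loss the final loss is smaller than the initial one.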
2 Random Fuzzy Chance-Constrained Programming
The details on the concept and results about random fuzzy variables can be found in [3]. It is assumed that the decision-maker wants to maximize the optimistic value of the random fuzzy return subject to some chance constraints. Then the following random fuzzy chance-constrained programming model can be constructed [3]:

$$\begin{cases} \max\ \overline{f} \\ \text{subject to:} \\ \quad \mathrm{Ch}\{f(x, \xi) \ge \overline{f}\}(\gamma) \ge \delta, \\ \quad \mathrm{Ch}\{g_j(x, \xi) \le 0\}(\alpha_j) \ge \beta_j, \quad j = 1, 2, \cdots, p, \end{cases} \tag{1}$$

where $x$ is a decision vector, $\xi$ is a random fuzzy vector, $f(x, \xi)$ is the objective function, $g_j(x, \xi)$ are the constraint functions for $j = 1, 2, \cdots, p$, and $\alpha_j, \beta_j$ are specified confidence levels for $j = 1, 2, \cdots, p$.
3 Hybrid Genetic-SPSA Algorithm Based on Random Fuzzy Simulation
One of the difficulties in solving model (1) is to calculate the chances of random fuzzy events, Ch{gj (x, ξ) ≤ 0}(αj ), j = 1, 2, · · · , p, and the (γ, δ)-optimistic value f to random fuzzy variable f (x, ξ), for every fixed x. In most practical decision-making systems, it is very difficult or impossible to obtain their exact values by analytic methods. Therefore, random fuzzy simulation is used to estimate these values (see [3]). In this section, hybrid genetic-SPSA algorithm based on random fuzzy simulation is proposed. The procedure is summarized as follows. Step 1) Determine the size of population pop size, crossover probability Pc , mutation probability Pm in GA, and the parameters a, A, c, α, γ in SPSA.
334
Y. Ning, W. Tang, and H. Wang
Step 2) Randomly generate pop_size feasible chromosomes. To verify the feasibility of each chromosome, random fuzzy simulation is used to estimate the values of the constraint functions.
Step 3) Apply SPSA to all pop_size chromosomes in the initial population.
Step 4) Estimate the objective values of the pop_size chromosomes by random fuzzy simulation, and calculate the fitness of each chromosome.
Step 5) Select the chromosomes by spinning the roulette wheel.
Step 6) Apply the crossover operator to the chromosomes selected with probability P_c.
Step 7) Apply SPSA to the new chromosomes obtained by the crossover operation.
Step 8) Apply the mutation operator to the chromosomes selected with probability P_m.
Step 9) Apply SPSA to the new chromosomes obtained by the mutation operation.
Step 10) If the termination criterion for GA is satisfied, go to the next step. Otherwise, return to Step 4.
Step 11) Apply SPSA to all the final chromosomes obtained by GA.
Step 12) Report the chromosome with maximum fitness among all the pop_size chromosomes improved by SPSA, then end the algorithm.

Remark 1. The choice of the parameters a, A, c, α, γ in SPSA is crucial for the performance of the proposed algorithm. A detailed guideline can be found in [7].

Remark 2. To prevent SPSA from spending almost all available computation time on improving the new chromosomes generated by the crossover and mutation operations in GA, a relatively small number of SPSA iterations should be used before GA ends. To further improve the final chromosomes reached by GA, a relatively large number of SPSA iterations is recommended after GA ends.
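The control flow of Steps 1 to 12 can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not part of the paper: the objective is an ordinary deterministic function standing in for the value estimated by random fuzzy simulation, feasibility checking is omitted, fitness is rank-based, crossover is arithmetic, and mutation adds a uniform displacement. The names `spsa` and `hybrid_ga_spsa` and all default values other than the SPSA coefficients quoted in Section 4 are hypothetical.

```python
import random


def spsa(x, loss, n_iter, rng, a=0.16, A=100.0, c=0.2, alpha=0.602, gamma=0.101):
    """Minimise `loss` from x with n_iter SPSA steps (two loss calls per step)."""
    x = list(x)
    for k in range(n_iter):
        a_k = a / (A + k + 1) ** alpha
        c_k = c / (k + 1) ** gamma
        delta = [rng.choice((-1.0, 1.0)) for _ in x]   # simultaneous perturbation
        y_plus = loss([xi + c_k * d for xi, d in zip(x, delta)])
        y_minus = loss([xi - c_k * d for xi, d in zip(x, delta)])
        x = [xi - a_k * (y_plus - y_minus) / (2.0 * c_k * d)
             for xi, d in zip(x, delta)]
    return x


def hybrid_ga_spsa(objective, dim, bounds, pop_size=30, p_c=0.3, p_m=0.2,
                   generations=50, inner_iters=5, final_iters=200, seed=0):
    """Maximise `objective`; feasibility checks are omitted in this sketch."""
    rng = random.Random(seed)
    lo, hi = bounds

    def refine(x, n):
        # SPSA minimises, the model maximises, hence the sign change.
        return spsa(x, lambda z: -objective(z), n, rng)

    # Steps 2-3: random initial population, each member refined by SPSA.
    pop = [refine([rng.uniform(lo, hi) for _ in range(dim)], inner_iters)
           for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 4-5: rank-based fitness (an assumption), roulette-wheel selection.
        pop.sort(key=objective, reverse=True)
        weights = [0.05 * 0.95 ** i for i in range(pop_size)]
        pop = [list(x) for x in rng.choices(pop, weights=weights, k=pop_size)]
        # Steps 6-7: arithmetic crossover on selected parents, then short SPSA.
        parents = [i for i in range(pop_size) if rng.random() < p_c]
        for i, j in zip(parents[::2], parents[1::2]):
            lam = rng.random()
            xi, xj = pop[i], pop[j]
            pop[i] = refine([lam * u + (1 - lam) * v for u, v in zip(xi, xj)],
                            inner_iters)
            pop[j] = refine([(1 - lam) * u + lam * v for u, v in zip(xi, xj)],
                            inner_iters)
        # Steps 8-9: mutation by a random displacement, then short SPSA.
        for i in range(pop_size):
            if rng.random() < p_m:
                mutated = [u + rng.uniform(-1.0, 1.0) for u in pop[i]]
                pop[i] = refine(mutated, inner_iters)
    # Steps 11-12: longer SPSA run on the final population, report the best.
    pop = [refine(x, final_iters) for x in pop]
    best = max(pop, key=objective)
    return best, objective(best)
```

In line with Remark 2, `inner_iters` is kept small and `final_iters` large. To apply the sketch to model (1), `objective` would be replaced by a random fuzzy simulation estimate of the optimistic value, and infeasible offspring would have to be rejected.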
4 An Example

Let us consider the following model:

\[
\begin{cases}
\max \bar{f} \\
\text{subject to:} \\
\quad \mathrm{Ch}\{-\xi_1 x_1^2 - \xi_2 x_2^3 + \xi_3 x_3^3 - \xi_4 x_4^2 - \xi_2 x_1 x_2 - \xi_3 x_3 x_4 \ge \bar{f}\}(0.95) \ge 0.90, \\
\quad \mathrm{Ch}\{(\xi_3 x_1^2 - x_2)(\xi_2 x_3 - x_1) + \xi_4 x_2^2 + x_3 + x_4 \le 10\}(0.90) \ge 0.85, \\
\quad x_i \ge 0, \quad i = 1, 2, 3, 4,
\end{cases}
\]

where x_1, x_2, x_3 and x_4 are decision variables, and ξ_1, ξ_2, ξ_3 and ξ_4 are random fuzzy variables defined as follows:

\[
\xi_1 \sim \mathcal{U}(\rho_1 - 1, \rho_1 + 1), \quad \xi_2 \sim \mathcal{N}(\rho_2, 2), \quad \xi_3 \sim \mathcal{EXP}(\rho_3), \quad \xi_4 \sim \mathcal{N}(\rho_4, 4),
\]

where ρ_1 is a fuzzy variable with membership function μ_{ρ1}(x) = [1 − (x − 1)^2] ∨ 0, ρ_2 is a triangular fuzzy variable (2, 3, 4), ρ_3 is a trapezoidal fuzzy variable (1, 2, 3, 4), and ρ_4 is a fuzzy variable with membership function μ_{ρ4}(x) = [1 − (x − 4)^2] ∨ 0.

The solution obtained by the proposed algorithm (with parameters pop_size = 30, P_c = 0.3, P_m = 0.2, a = 0.16, A = 100, c = 0.20, α = 0.602, γ = 0.101, 5000 cycles in random fuzzy simulation, 500 generations in GA, and 1500 iterations in SPSA) is x_1* = 1.0823, x_2* = 0.0031, x_3* = 0.2530, x_4* = 0.2317 with the objective value −1.2805. For comparison, when GA alone is employed to search for the optimal solution, the solution after 2000 generations is x_1 = 9.5156, x_2 = 0.0700, x_3 = 1.7186, x_4 = 0.4427 with the objective value −59.7336.
5 Conclusion

This paper proposed a hybrid genetic-SPSA algorithm based on random fuzzy simulation, which combines the global search ability of GA with the strong convergence property of SPSA. A numerical example illustrated the validity of the proposed algorithm.
Acknowledgments This work was supported by the National Natural Science Foundation of China Grant No. 70471049 and China Postdoctoral Science Foundation No. 2004035013.
References
1. Ishibuchi, H., Murata, T.: A multi-objective genetic local search algorithm and its application to flowshop scheduling. IEEE Transactions on Systems, Man, and Cybernetics - C. 28 (1998) 392-403
2. Kwakernaak, H.: Fuzzy random variables-I: Definition and theorems. Information Sciences. 15 (1978) 1-29
3. Liu, B.: Theory and Practice of Uncertain Programming. Physica-Verlag, Heidelberg (2002)
4. Liu, Y. K., Liu, B.: Random fuzzy programming with chance measures defined by fuzzy integrals. Mathematical and Computer Modelling. 36 (2002) 509-524
5. Renders, J. M., Flasse, S.: Hybrid methods using genetic algorithm for global optimization. IEEE Transactions on Systems, Man, and Cybernetics - B. 26 (1996) 243-258
6. Spall, J. C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control. 37 (1992) 332-341
7. Spall, J. C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems. 34 (1998) 817-823
8. Tsai, J. T., Liu, T. K., Chou, J. H.: Hybrid Taguchi-genetic algorithm for global numerical optimization. IEEE Transactions on Evolutionary Computation. 8 (2004) 365-377
Random Fuzzy Age-Dependent Replacement Policy Song Xu, Jiashun Zhang, and Ruiqing Zhao Institute of Systems Engineering, Tianjin University, Tianjin, 300072, China [email protected], [email protected]
Abstract. This paper discusses the age-dependent replacement policy in which the interarrival lifetimes of components are characterized as random fuzzy variables. A random fuzzy expected value model is presented, and it is shown how the model can be applied to reduce the loss caused by system failures. To solve the proposed model, a simultaneous perturbation stochastic approximation (SPSA) algorithm based on random fuzzy simulation is developed to search for the optimal solution. At the end of the paper, a numerical example is given.
1 Introduction

In the past several decades, renewal processes have been extensively investigated in the literature. Most models restrict attention to system lifetimes characterized through a probability measure. Based on probability theory, stochastic renewal theory considers processes in which the interarrival maintenance times are assumed to be independent and identically distributed (iid) random variables [1]. In contrast to stochastic renewal theory, fuzzy renewal theory holds that the interarrival times cannot be known precisely in the real world. Under this theory, Huang [2] presented a genetic-evolved fuzzy system for maintenance scheduling of generating units. Huang et al [3] proposed a new approach using fuzzy dynamic programming for generator maintenance scheduling in power systems. Suresh [9] gave a fuzzy-set model for the maintenance policy of multi-state equipment. In many practical situations, however, randomness and fuzziness need to be considered simultaneously in one process. Zhao and Tang [10] considered a random fuzzy renewal process, in which the interarrival times were modeled as iid random fuzzy variables. Later, Zhao and Tang [11] extended the idea of Zhao and Tang [10] to the random fuzzy renewal reward process and provided a theorem on the limit of the long-run expected reward per unit time. In this paper, the lifetimes of components are treated as random fuzzy variables as defined by Liu and Liu [4]. In Section 2, we recall some basic concepts and results about random fuzzy variables. In Section 3, we discuss the random fuzzy age-dependent replacement policy and present a mathematical model for this policy. In Section 4, we employ random fuzzy simulation and the SPSA algorithm to solve the proposed model. Finally, a numerical example is given.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 336–339, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Random Fuzzy Variables

Let Θ be a nonempty set, P(Θ) the power set of Θ, and Pos a possibility measure. Then (Θ, P(Θ), Pos) is called a possibility space (for details, see [7]). Based on the possibility measure, the necessity and credibility of a fuzzy event A are given as Nec{A} = 1 − Pos{A^c} and Cr{A} = (1/2)(Pos{A} + Nec{A}), respectively. A random fuzzy variable is a function from the possibility space (Θ, P(Θ), Pos) to the set of random variables. In a practical system, the lifetimes of components should be nonnegative and iid.

Definition 1. A random fuzzy variable ξ is nonnegative if and only if for each θ ∈ Θ, Pr{ξ(θ) < 0} = 0.

Definition 2 (Liu [6]). The random fuzzy variables ξ_1, ξ_2, ..., ξ_n are iid if and only if (Pr{ξ_i(θ) ∈ B_1}, Pr{ξ_i(θ) ∈ B_2}, ..., Pr{ξ_i(θ) ∈ B_m}), i = 1, 2, ..., n are iid vectors for any Borel sets B_1, B_2, ..., B_m of ℝ and any positive integer m.

Definition 3 (Liu and Liu [4]). Let ξ be a random fuzzy variable. Then the expected value E[ξ] is defined by

\[
E[\xi] = \int_0^{\infty} \mathrm{Cr}\{\theta \in \Theta \mid E[\xi(\theta)] \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{\theta \in \Theta \mid E[\xi(\theta)] \le r\}\,dr
\tag{1}
\]

provided that at least one of the two integrals is finite.
3 Random Fuzzy Age-Dependent Replacement Policy

In this section, we focus on the age-dependent replacement policy in a random fuzzy environment. Let ξ_i be the interarrival time between the (i − 1)th event and the ith event, i = 1, 2, ..., and suppose that ξ_i, i = 1, 2, ... are iid nonnegative random fuzzy variables. Then the length of the ith maintenance cycle is

\[
L_i(T) = \xi_i I_{(\xi_i \le T)} + T I_{(\xi_i > T)},
\]

where I_{(·)} is the characteristic function of the event (·), and the cost of the ith cycle is

\[
C_i(T) = c_f I_{(\xi_i \le T)} + c_p I_{(\xi_i > T)},
\]

where c_f and c_p are the costs of replacement upon failure and at the predetermined age T, respectively. Since the routine maintenance cost should be less than the loss caused by a system failure, we have c_f > c_p. Let N(t) be the total number of events by time t, and C(t) the total cost by time t. Then we have

\[
C(t) = \sum_{i=1}^{N(t)} C_i(T).
\]
Definition 4. The long-run expected cost per unit time under the age-dependent replacement policy is defined as

\[
C_a(T) = \lim_{t \to \infty} \frac{E[C(t)]}{t}.
\tag{2}
\]
Furthermore, if $E\left[\frac{C_1(T)}{L_1(T)}\right]$ is finite, it follows from Zhao and Tang [10] that $C_a(T) = E\left[\frac{C_1(T)}{L_1(T)}\right]$. Our problem is to find an optimal value T such that C_a(T) is minimized. This problem can be described as the following model:

\[
\begin{cases}
\min E\left[\dfrac{C_1(T)}{L_1(T)}\right] \\
\text{subject to:} \\
\quad T > 0,
\end{cases}
\]

where L_1(T) = ξ_1 I_{(ξ_1 ≤ T)} + T I_{(ξ_1 > T)} is the length of the first cycle, C_1(T) = c_f I_{(ξ_1 ≤ T)} + c_p I_{(ξ_1 > T)} is the cost of the first cycle, and ξ_1 is the lifetime of the first renewed component.
4 SPSA Algorithm Based on Random Fuzzy Simulation and Numerical Experiment

For any fixed time T, it is difficult to calculate the value of $E\left[\frac{C_1(T)}{L_1(T)}\right]$ due to the complexity of the function structure. Here we employ the random fuzzy simulation technique designed by Liu and Liu [4] to estimate it. The method for estimating $E\left[\frac{C_1(T)}{L_1(T)}\right]$ is summarized as follows.

1. Set E = 0.
2. Uniformly sample θ_i from Θ such that Pos{θ_i} ≥ ε, i = 1, 2, ..., N, where ε is a sufficiently small number.
3. Let $a = \min_{1 \le i \le N} E\left[\frac{C_1(T)(\theta_i)}{L_1(T)(\theta_i)}\right]$ and $b = \max_{1 \le i \le N} E\left[\frac{C_1(T)(\theta_i)}{L_1(T)(\theta_i)}\right]$.
4. Uniformly generate r from [a, b].
5. If r > 0, then $E \leftarrow E + \mathrm{Cr}\left\{\theta \in \Theta \mid E\left[\frac{C_1(T)(\theta)}{L_1(T)(\theta)}\right] \ge r\right\}$.
6. If r < 0, then $E \leftarrow E - \mathrm{Cr}\left\{\theta \in \Theta \mid E\left[\frac{C_1(T)(\theta)}{L_1(T)(\theta)}\right] \le r\right\}$.
7. Repeat the fourth to sixth steps N times.
8. $E\left[\frac{C_1(T)}{L_1(T)}\right] = a \vee 0 + b \wedge 0 + E \cdot (b - a)/N$.
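The eight steps above can be sketched in Python as follows. This is a minimal sketch, assuming that the inner expectation E[C_1(T)(θ)/L_1(T)(θ)] is available as a callable and that the credibility is estimated from the sampled points through Cr{A} = (Pos{A} + 1 − Pos{A^c})/2. The function and argument names (`random_fuzzy_expected_value`, `sample_theta`, `possibility`, `inner_expectation`) are hypothetical and are not taken from [4].

```python
import random


def credibility(values, pos, predicate):
    """Sample-based Cr{A} = (Pos{A} + 1 - Pos{not A}) / 2."""
    pos_a = max((p for v, p in zip(values, pos) if predicate(v)), default=0.0)
    pos_not_a = max((p for v, p in zip(values, pos) if not predicate(v)),
                    default=0.0)
    return 0.5 * (pos_a + 1.0 - pos_not_a)


def random_fuzzy_expected_value(sample_theta, possibility, inner_expectation,
                                n=400, seed=0):
    rng = random.Random(seed)
    # Step 2: sample theta_i; the caller's sampler must respect Pos >= epsilon.
    thetas = [sample_theta(rng) for _ in range(n)]
    pos = [possibility(t) for t in thetas]
    # Inner (stochastic) expectation for each sampled theta_i.
    values = [inner_expectation(t) for t in thetas]
    a, b = min(values), max(values)                      # step 3
    e = 0.0                                              # step 1
    for _ in range(n):                                   # steps 4-7
        r = rng.uniform(a, b)
        if r > 0:
            e += credibility(values, pos, lambda v: v >= r)
        elif r < 0:
            e -= credibility(values, pos, lambda v: v <= r)
    return max(a, 0.0) + min(b, 0.0) + e * (b - a) / n   # step 8
```

As a sanity check under these assumptions, taking θ to be the triangular fuzzy variable (1, 2, 3) with an inner expectation equal to θ itself should give an estimate close to (1 + 2·2 + 3)/4 = 2, the credibilistic expected value of that triangular fuzzy variable.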
In addition, since the gradient of C_a(T) may not always exist, the common gradient descent algorithms cannot be used to find an optimal value of T such that C_a(T) is minimized. Here, we introduce an SPSA algorithm based on random fuzzy simulation to solve the problem (for the SPSA algorithm, see [8]).

1. Choose an initial guess of T and the nonnegative parameters a, c, A, α, and γ.
2. Generate the perturbation Δ_k, which should satisfy the conditions provided in Spall [8].
3. Compute $C_a(\hat{T}_k + c_k \Delta_k)$ and $C_a(\hat{T}_k - c_k \Delta_k)$ by the random fuzzy simulation proposed above, where $c_k = c/(k + 1)^{\gamma}$.
4. Generate the simultaneous perturbation approximation to the unknown gradient

\[
\hat{g}_k(\hat{T}_k) = \frac{C_a(\hat{T}_k + c_k \Delta_k) - C_a(\hat{T}_k - c_k \Delta_k)}{2 c_k \Delta_k}.
\tag{3}
\]

5. Use $\hat{T}_{k+1} = \hat{T}_k - a_k \hat{g}_k(\hat{T}_k)$ to update $\hat{T}_k$, where $a_k = a/(A + k + 1)^{\alpha}$.
6. Return to Step 2 with k + 1 replacing k. Terminate the algorithm and return $\hat{T}_k$ if there is little change in several successive iterates.

Example 1. Let ξ be the lifetime of the component, defined as ξ ∼ EXP(1/θ), where θ = (0, 4, 7, 10), and let c_f = 1000, c_p = 300. In order to minimize the long-run expected cost per unit time, we first initialize the decision variable as $\hat{T}_0 = 5$ and select the coefficients A = 100, c = 0.2, α = 0.602, γ = 0.101. Then we run the SPSA iterations presented in Section 4. After 3000 iterations, we obtain the solution T* = 6.87385 with the cost C_a(T*) = 77.9375.
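For a scalar decision variable such as T, the SPSA iteration in steps 1 to 6 can be sketched in Python as follows. This is a sketch under stated assumptions: the cost is passed in as an ordinary callable (in the paper it would be the random fuzzy simulation estimate of C_a(T)), the perturbation Δ_k is a symmetric ±1 Bernoulli variable, the perturbation gain is taken in Spall's standard form c_k = c/(k + 1)^γ, and a fixed number of iterations replaces the "little change" stopping rule. The default a = 0.16 is borrowed from the previous paper's example because Example 1 does not state it, and the name `spsa_scalar` is hypothetical.

```python
import random


def spsa_scalar(cost, t0, a=0.16, A=100.0, c=0.2, alpha=0.602, gamma=0.101,
                n_iter=3000, seed=0):
    """Minimise a scalar cost with SPSA, using two cost evaluations per step."""
    rng = random.Random(seed)
    t = t0
    for k in range(n_iter):
        a_k = a / (A + k + 1) ** alpha          # step-size gain
        c_k = c / (k + 1) ** gamma              # perturbation gain
        delta = rng.choice((-1.0, 1.0))         # step 2: Bernoulli perturbation
        # Steps 3-4: two-sided estimate of the gradient, as in (3).
        g_hat = (cost(t + c_k * delta) - cost(t - c_k * delta)) / (2.0 * c_k * delta)
        t = t - a_k * g_hat                     # step 5: update
    return t
```

On a smooth test function such as (T − 3)^2, started from T = 5, the iterates approach 3. The sketch does not enforce the constraint T > 0; a projection onto a positive interval would have to be added for the replacement model.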
Acknowledgments This work was supported by National Natural Science Foundation of China Grant No. 70471049 and China Postdoctoral Science Foundation No. 2004035013.
References
1. Asmussen, S.: Applied Probability and Queues. Wiley, New York (1987)
2. Huang, S.: A genetic-evolved fuzzy system for maintenance scheduling of generating units. Electrical Power Energy Systems. 20 (1998) 191-195
3. Huang, C., Lin, C.: Fuzzy approach for generator maintenance scheduling. Electric Power System Research. 24 (1992) 31-38
4. Liu, Y., Liu, B.: Expected value operator of random fuzzy variable and random fuzzy expected value models. International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems. 11 (2003) 195-215
5. Liu, B.: Uncertainty Theory: An Introduction to its Axiomatic Foundations. Springer-Verlag, Berlin (2004)
6. Liu, B.: Theory and Practice of Uncertain Programming. Physica-Verlag, Heidelberg (2002)
7. Nahmias, S.: Fuzzy variables. Fuzzy Sets and Systems. 1 (1978) 97-110
8. Spall, J.: An overview of the simultaneous perturbation method for efficient optimization. Johns Hopkins APL Technical Digest. 19 (1998) 482-492
9. Suresh, P., Chaudhuri, D.: Fuzzy-set approach to select maintenance strategies for multistate equipment. IEEE Transactions on Reliability. 43 (1994) 431-456
10. Zhao, R., Tang, W.: Random fuzzy renewal process. European Journal of Operational Research. to be published
11. Zhao, R., Tang, W.: Random fuzzy renewal reward process. Information Sciences. to be published
A Theorem for Fuzzy Random Alternating Renewal Processes Ruiqing Zhao, Wansheng Tang, and Guofei Li Institute of Systems Engineering, Tianjin University, Tianjin 300072, China [email protected], [email protected]
Abstract. In this paper, a new kind of alternating renewal process, the fuzzy random alternating renewal process, is studied. A theorem on the limit value of the mean chance of the fuzzy random event “system is on at time t” is presented. The two degenerate cases of the theorem, the stochastic and fuzzy cases, are also analyzed. The importance of the results lies in the fact that the relation between classical alternating renewal processes and fuzzy random alternating renewal processes is established.
1 Introduction

The stochastic alternating renewal process is one of the important processes in stochastic renewal theory. In such a process, the on and off times are usually assumed to be independent and identically distributed (iid) random variables, respectively. The results on stochastic alternating renewal processes are discussed in detail in almost all standard textbooks on stochastic processes, such as Asmussen [1], Birolini [2], Daley and Vere-Jones [3], Ross [20] and Tijms [21]. After stochastic renewal theory, fuzzy renewal theory has been considered by several authors. In this theory, parameters such as the interarrival times are supposed to be fuzzy variables. Dozzi et al [4] gave a limit theorem for a counting renewal process indexed by fuzzy sets. Liu [13] discussed a fuzzy-valued Markov process and gave some properties of this process. Zhao and Liu [22] discussed the fuzzy renewal process and the fuzzy renewal reward process, for which a fuzzy elementary renewal theorem and a renewal reward theorem were provided. As an extension of stochastic renewal theory and fuzzy renewal theory, fuzzy random renewal theory has been considered recently. Popova and Wu [17] considered a fuzzy random renewal reward process in which interarrival times and rewards were assumed to be fuzzy random variables, and gave a theorem on the long-run average fuzzy reward per unit time. Hwang [5] investigated a renewal process in which the interarrival times were considered as iid fuzzy random variables and provided a theorem on the fuzzy rate of the fuzzy random renewal process. As a continuation of the above-mentioned works, this paper deals with a new kind of renewal process, the fuzzy random alternating renewal process. In Section 2, we review some basic concepts on fuzzy variables and fuzzy random variables. Fuzzy random alternating renewal processes are then discussed in Section 3.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 340–349, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Fuzzy Variables and Fuzzy Random Variables

Let ξ be a fuzzy variable on the possibility space (Θ, P(Θ), Pos), where Θ is a universe, P(Θ) is the power set of Θ and Pos is a possibility measure defined on P(Θ). Based on the possibility measure Pos, the necessity (Nec) and credibility (Cr) of a fuzzy event {ξ ≥ r} can be expressed by

\[
\mathrm{Nec}\{\xi \ge r\} = 1 - \mathrm{Pos}\{\xi < r\}, \qquad
\mathrm{Cr}\{\xi \ge r\} = \frac{1}{2}\left(\mathrm{Pos}\{\xi \ge r\} + \mathrm{Nec}\{\xi \ge r\}\right),
\tag{1}
\]

respectively.

Definition 1 (Liu [9]). Let ξ be a fuzzy variable on the possibility space (Θ, P(Θ), Pos), and α ∈ (0, 1]. Then

\[
\xi_\alpha^L = \inf\{r \mid \mathrm{Pos}\{\xi \le r\} \ge \alpha\}
\quad\text{and}\quad
\xi_\alpha^U = \sup\{r \mid \mathrm{Pos}\{\xi \ge r\} \ge \alpha\}
\tag{2}
\]

are called the α-pessimistic value and the α-optimistic value of ξ, respectively.

Definition 2 (Liu and Liu [12]). Let ξ be a fuzzy variable on the possibility space (Θ, P(Θ), Pos). The expected value E[ξ] is defined as

\[
E[\xi] = \int_0^{\infty} \mathrm{Cr}\{\xi \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{\xi \le r\}\,dr
\tag{3}
\]

provided that at least one of the two integrals is finite.

The definitions of a fuzzy random variable have been discussed by several authors [6], [7], [8], [10], [11], [14], [18], [19]. In what follows, we introduce the results in [14] that are related to our work. We assume that (Ω, A, Pr) is a probability space. Let F be a collection of fuzzy variables defined on the possibility space (Θ, P(Θ), Pos).

Definition 3 (Liu and Liu [14]). A fuzzy random variable is a function ξ : Ω → F such that for any Borel set B of ℝ, Pos{ξ(ω) ∈ B} is a measurable function of ω.

Definition 4 (Liu and Liu [14]). Let ξ be a fuzzy random variable defined on the probability space (Ω, A, Pr). Then its expected value is defined by

\[
E[\xi] = \int_\Omega \left[ \int_0^{\infty} \mathrm{Cr}\{\xi(\omega) \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{\xi(\omega) \le r\}\,dr \right] \Pr(d\omega)
\]

provided that at least one of the two integrals is finite.
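For a triangular fuzzy variable (a, b, c), the α-pessimistic and α-optimistic values of Definition 1 have simple closed forms, which the following Python sketch illustrates. The helper names are hypothetical. The last function evaluates one half of the integral over α ∈ (0, 1] of the sum of the two values, which for a triangular fuzzy variable equals (a + 2b + c)/4, the expected value in the sense of Definition 2.

```python
def alpha_pessimistic(tri, alpha):
    """inf{r : Pos{xi <= r} >= alpha} for a triangular fuzzy variable (a, b, c)."""
    a, b, c = tri
    return a + alpha * (b - a)


def alpha_optimistic(tri, alpha):
    """sup{r : Pos{xi >= r} >= alpha} for a triangular fuzzy variable (a, b, c)."""
    a, b, c = tri
    return c - alpha * (c - b)


def expected_value(tri, steps=1000):
    """Midpoint rule for 0.5 * integral over alpha of (pessimistic + optimistic)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) * h
        total += alpha_pessimistic(tri, alpha) + alpha_optimistic(tri, alpha)
    return 0.5 * h * total
```

For the triangular fuzzy variable (2, 3, 4), the 0.5-pessimistic and 0.5-optimistic values are 2.5 and 3.5, and the expected value is 3.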
Definition 5 (Liu and Liu [16]). Let ξ be a fuzzy random variable defined on the possibility space (Θ, P(Θ), Pos). Then the mean chance, denoted by Ch, of the fuzzy random event characterized by {ξ ≤ 0} is defined as

\[
\mathrm{Ch}\{\xi \le 0\} = \int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Cr}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha.
\tag{4}
\]
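Proposition 1. Equation (4) is equivalent to the following form:

\[
\mathrm{Ch}\{\xi \le 0\} = \frac{1}{2}\int_0^1 \left( \Pr\{\omega \in \Omega \mid \xi_\alpha^L(\omega) \le 0\} + \Pr\{\omega \in \Omega \mid \xi_\alpha^U(\omega) \le 0\} \right) d\alpha,
\]

where $\xi_\alpha^L(\omega)$ and $\xi_\alpha^U(\omega)$ are the α-pessimistic value and the α-optimistic value of ξ(ω), respectively.

Proof. Note that

\[
\begin{aligned}
\mathrm{Ch}\{\xi \le 0\}
&= \int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Cr}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha \\
&= \int_\Omega \mathrm{Cr}\{\xi(\omega) \le 0\}\,\Pr(d\omega) \\
&= \frac{1}{2}\int_\Omega \left[\mathrm{Pos}\{\xi(\omega) \le 0\} + \mathrm{Nec}\{\xi(\omega) \le 0\}\right] \Pr(d\omega) \\
&= \frac{1}{2}\int_\Omega \mathrm{Pos}\{\xi(\omega) \le 0\}\,\Pr(d\omega) + \frac{1}{2}\int_\Omega \mathrm{Nec}\{\xi(\omega) \le 0\}\,\Pr(d\omega) \\
&= \frac{1}{2}\int_\Omega \mathrm{Pos}\{\xi(\omega) \le 0\}\,\Pr(d\omega) + \frac{1}{2}\int_\Omega \left[1 - \mathrm{Pos}\{\xi(\omega) > 0\}\right] \Pr(d\omega) \\
&= \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha + \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid 1 - \mathrm{Pos}\{\xi(\omega) > 0\} \ge \alpha\}\,d\alpha \\
&= \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha + \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) > 0\} \le 1 - \alpha\}\,d\alpha \\
&= \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha + \frac{1}{2}\int_1^0 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) > 0\} \le p\}\,(-dp) \\
&= \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha + \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) > 0\} \le p\}\,dp \\
&= \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) \le 0\} \ge \alpha\}\,d\alpha + \frac{1}{2}\int_0^1 \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) > 0\} \le \alpha\}\,d\alpha \\
&= \frac{1}{2}\int_0^1 \left( \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) \le 0\} \ge \alpha\} + \Pr\{\omega \in \Omega \mid \mathrm{Pos}\{\xi(\omega) > 0\} \le \alpha\} \right) d\alpha \\
&= \frac{1}{2}\int_0^1 \left( \Pr\{\omega \in \Omega \mid \xi_\alpha^L(\omega) \le 0\} + \Pr\{\omega \in \Omega \mid \xi_\alpha^U(\omega) \le 0\} \right) d\alpha \\
&= \frac{1}{2}\int_0^1 \left( \Pr\{\xi_\alpha^L(\omega) \le 0\} + \Pr\{\xi_\alpha^U(\omega) \le 0\} \right) d\alpha.
\end{aligned}
\]

The proof is completed.

Remark 1. For convenience, the smaller one of $\Pr\{\omega \in \Omega \mid \xi_\alpha^L(\omega) \le 0\}$ and $\Pr\{\omega \in \Omega \mid \xi_\alpha^U(\omega) \le 0\}$ is called the α-pessimistic probability of the event {ξ ≤ 0}, and the other is called the α-optimistic probability of the event {ξ ≤ 0}.

Definition 6 (Liu and Liu [14]). The fuzzy random variables ξ_1, ξ_2, ..., ξ_n are independent and identically distributed if and only if (Pos{ξ_i(ω) ∈ B_1}, Pos{ξ_i(ω) ∈ B_2}, ..., Pos{ξ_i(ω) ∈ B_m}), i = 1, 2, ..., n are iid random vectors for any Borel sets B_1, B_2, ..., B_m of ℝ and any positive integer m.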
3 Fuzzy Random Alternating Renewal Processes

Consider a system that can be in one of two states: on or off. Initially it is on and it remains on for a period of time ξ_1; it then goes off and remains off for a period of time η_1; it then goes on for a period of time ξ_2; then off for a period of time η_2; then on, and so forth. We suppose that (ξ_i, η_i), i = 1, 2, ... are iid fuzzy random vectors on the probability space (Ω, A, Pr); in particular, ξ_i and η_i are assumed to be independent. The process characterized by {(ξ_i, η_i), i ≥ 1} is called a fuzzy random alternating renewal process. For each ω ∈ Ω, ξ_i(ω) and η_i(ω) are clearly fuzzy variables, and their α-pessimistic values and α-optimistic values are denoted by $\xi_{i,\alpha}^L(\omega)$, $\xi_{i,\alpha}^U(\omega)$, $\eta_{i,\alpha}^L(\omega)$, and $\eta_{i,\alpha}^U(\omega)$, respectively.
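In what follows, we focus on the mean chance of the event {system is on at t}. For convenience, the α-pessimistic probability of the event {system is on at t} is denoted by $\Pr_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\}$ and its α-optimistic probability is denoted by $\Pr_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\}$. By Definition 5 and Proposition 1, we have

\[
\mathrm{Ch}\{\text{system is on at time } t\} = \frac{1}{2}\int_0^1 \left( \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} + \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} \right) d\alpha.
\tag{5}
\]

Definition 7. A positive random variable ζ is lattice if and only if there exists d ≥ 0 such that $\sum_{n=0}^{\infty} \Pr\{\zeta = nd\} = 1$.

Theorem 1. Assume that {(ξ_i, η_i), i ≥ 1} is a sequence of pairs of iid positive fuzzy random variables. For any given ω ∈ Ω and α ∈ (0, 1], let $\xi_{i,\alpha}^L(\omega)$, $\eta_{i,\alpha}^L(\omega)$, $\xi_{i,\alpha}^U(\omega)$, and $\eta_{i,\alpha}^U(\omega)$ be the α-pessimistic values and α-optimistic values of ξ_i(ω) and η_i(ω), i = 1, 2, ... If the random variables $\xi_{i,\alpha}^L(\omega)$, $\xi_{i,\alpha}^U(\omega)$, $\eta_{i,\alpha}^L(\omega)$ and $\eta_{i,\alpha}^U(\omega)$ are nonlattice, $E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)] < \infty$ and $E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)] < \infty$, then

\[
\lim_{t \to \infty} \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} = \frac{E[\xi_{1,\alpha}^L(\omega)]}{E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)]},
\tag{6}
\]

\[
\lim_{t \to \infty} \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} = \frac{E[\xi_{1,\alpha}^U(\omega)]}{E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)]}.
\tag{7}
\]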
Proof. For any given α ∈ (0, 1], we define two sequences of real numbers {β_i} and {γ_i} such that α ≤ β_i ≤ 1 and α ≤ γ_i ≤ 1 for i = 1, 2, ... Furthermore, for any given ω ∈ Ω, ξ_i(ω) and η_i(ω) are clearly fuzzy variables. Let $\xi_{\beta_i}(\omega)$ be the β_i-pessimistic value or the β_i-optimistic value of ξ_i(ω), and $\eta_{\gamma_i}(\omega)$ the γ_i-pessimistic value or the γ_i-optimistic value of η_i(ω). Hence, we have

\[
\xi_{i,\alpha}^L(\omega) \le \xi_{\beta_i}(\omega) \le \xi_{i,\alpha}^U(\omega)
\quad\text{and}\quad
\eta_{i,\alpha}^L(\omega) \le \eta_{\gamma_i}(\omega) \le \eta_{i,\alpha}^U(\omega)
\]

for each ω ∈ Ω. Moreover, $\xi_{\beta_i}(\omega)$ and $\eta_{\gamma_i}(\omega)$ are random variables when ω varies over Ω. Thus, for any given t > 0, we have

\[
\Pr\{\xi_{i,\alpha}^L(\omega) > t\} \le \Pr\{\xi_{\beta_i}(\omega) > t\} \le \Pr\{\xi_{i,\alpha}^U(\omega) > t\}
\tag{8}
\]

and

\[
\Pr\{\eta_{i,\alpha}^L(\omega) > t\} \le \Pr\{\eta_{\gamma_i}(\omega) > t\} \le \Pr\{\eta_{i,\alpha}^U(\omega) > t\}.
\tag{9}
\]
Now let us consider the following three processes:

(1) the process A characterized by $\{(\xi_{i,\alpha}^L(\omega), \eta_{i,\alpha}^U(\omega)), i \ge 1\}$;
(2) the process B characterized by $\{(\xi_{i,\alpha}^U(\omega), \eta_{i,\alpha}^L(\omega)), i \ge 1\}$;
(3) the process C characterized by $\{(\xi_{\beta_i}(\omega), \eta_{\gamma_i}(\omega)), i \ge 1\}$.

It is easy to see that the processes A and B are two standard stochastic alternating renewal processes, since $(\xi_{i,\alpha}^L(\omega), \eta_{i,\alpha}^U(\omega))$ and $(\xi_{i,\alpha}^U(\omega), \eta_{i,\alpha}^L(\omega))$, i = 1, 2, ... are iid random vectors, respectively. Correspondingly, let

\[
P_1(t) = \Pr\{\text{process A is on at } t\},
\tag{10}
\]

\[
P_2(t) = \Pr\{\text{process B is on at } t\},
\tag{11}
\]

\[
P_3(t) = \Pr\{\text{process C is on at } t\}.
\tag{12}
\]

Now we prove that

\[
P_1(t) \le P_3(t) \le P_2(t).
\tag{13}
\]

We only prove the left inequality of (13). To do so, we say that a renewal takes place each time the system goes on. Conditioning on the time of the last renewal prior to or at time t yields

\[
P_1(t) = \Pr\{\text{process A is on at } t \mid S_A = 0\} + \int_0^{\infty} \Pr\{\text{process A is on at } t \mid S_A = y\}\,dF_{S_A}(y)
\]

and

\[
P_3(t) = \Pr\{\text{process C is on at } t \mid S_C = 0\} + \int_0^{\infty} \Pr\{\text{process C is on at } t \mid S_C = y\}\,dF_{S_C}(y),
\]

where S_A represents the time of the last renewal prior to or at time t in the process A, S_C the time of the last renewal prior to or at time t in the process C, $F_{S_A}(y)$ the distribution function of S_A and $F_{S_C}(y)$ the distribution function of S_C. Furthermore, we have

\[
\Pr\{\text{process A is on at } t \mid S_A = 0\} = \Pr\{\xi_{1,\alpha}^L(\omega) > t \mid \xi_{1,\alpha}^L(\omega) + \eta_{1,\alpha}^U(\omega) > t\},
\]

\[
\Pr\{\text{process C is on at } t \mid S_C = 0\} = \Pr\{\xi_{\beta_1}(\omega) > t \mid \xi_{\beta_1}(\omega) + \eta_{\gamma_1}(\omega) > t\},
\]

and for y < t,

\[
\Pr\{\text{process A is on at } t \mid S_A = y\} = \Pr\{\xi_{1,\alpha}^L(\omega) > t - y \mid \xi_{1,\alpha}^L(\omega) + \eta_{1,\alpha}^U(\omega) > t - y\},
\]

\[
\Pr\{\text{process C is on at } t \mid S_C = y\} = \Pr\{\xi_{\beta}(\omega) > t - y \mid \xi_{\beta}(\omega) + \eta_{\gamma}(\omega) > t - y\},
\]
where $\xi_{\beta}(\omega)$ and $\eta_{\gamma}(\omega)$ are the β-pessimistic (or β-optimistic) value and the γ-pessimistic (or γ-optimistic) value of the last on and off times prior to or at time t in the process C, respectively. It follows from (8), (9) and the definition of conditional probability that

\[
\begin{aligned}
\Pr\{\xi_{1,\alpha}^L(\omega) > t \mid \xi_{1,\alpha}^L(\omega) + \eta_{1,\alpha}^U(\omega) > t\}
&\le \Pr\{\xi_{1,\alpha}^L(\omega) > t \mid \xi_{1,\alpha}^L(\omega) + \eta_{\gamma_1}(\omega) > t\} \\
&\le \Pr\{\xi_{\beta_1}(\omega) > t \mid \xi_{\beta_1}(\omega) + \eta_{\gamma_1}(\omega) > t\},
\end{aligned}
\]

i.e.,

\[
\Pr\{\text{process A is on at } t \mid S_A = 0\} \le \Pr\{\text{process C is on at } t \mid S_C = 0\}.
\tag{14}
\]

Similarly, we can prove that

\[
\Pr\{\text{process A is on at } t \mid S_A = y\} \le \Pr\{\text{process C is on at } t \mid S_C = y\}
\tag{15}
\]

for any y < t. It follows from (14) and (15) that the left inequality of (13) holds. A similar technique proves the right inequality of (13). By the arbitrariness of the process C, it follows from (13) that

\[
P_1(t) = \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\}
\tag{16}
\]

and

\[
P_2(t) = \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\}.
\tag{17}
\]

Since the processes A and B are two standard stochastic alternating renewal processes, it follows from the classical result on stochastic alternating renewal processes that

\[
\lim_{t \to \infty} P_1(t) = \frac{E[\xi_{1,\alpha}^L(\omega)]}{E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)]}
\tag{18}
\]

and

\[
\lim_{t \to \infty} P_2(t) = \frac{E[\xi_{1,\alpha}^U(\omega)]}{E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)]}
\tag{19}
\]

provided that $E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)] < \infty$ and $E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)] < \infty$ (see Ross [20]). Finally, it follows from (16) and (17) that the results (6) and (7) hold. The theorem is proved.

Remark 2. If {(ξ_i, η_i), i ≥ 1} degenerates to a sequence of iid random vectors, then the results (6) and (7) in Theorem 1 degenerate to the form

\[
\lim_{t \to \infty} \Pr\{\text{system is on at } t\} = \frac{E[\xi_1]}{E[\xi_1] + E[\eta_1]},
\]

which is just the conventional result in the stochastic case (see Ross [20]).
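The stochastic limit in Remark 2 can be checked numerically with a small Monte Carlo simulation. The Python sketch below is illustrative only: the function name `on_probability`, the uniform on and off time distributions in the usage note, and the number of runs are assumptions chosen for the demonstration.

```python
import random


def on_probability(t, sample_on, sample_off, runs=20000, seed=0):
    """Estimate Pr{system is on at time t} for an alternating renewal process."""
    rng = random.Random(seed)
    on_count = 0
    for _ in range(runs):
        clock = 0.0
        while True:
            clock += sample_on(rng)      # the system is on during this period
            if clock > t:
                on_count += 1            # t falls inside an on period
                break
            clock += sample_off(rng)     # the system is off during this period
            if clock > t:
                break                    # t falls inside an off period
    return on_count / runs
```

With on times uniform on (0, 2), so that E[ξ_1] = 1, and off times uniform on (0, 6), so that E[η_1] = 3, both of which are nonlattice, the estimate for a moderately large t such as 60 should be close to 1/(1 + 3) = 0.25.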
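Remark 3. If {(ξ_i, η_i), i ≥ 1} degenerates to a sequence of fuzzy vectors with the same membership function, then, for each α ∈ (0, 1], the α-pessimistic values and α-optimistic values of ξ_1 and η_1 degenerate to four real numbers (denoted by $\xi_\alpha^L$, $\xi_\alpha^U$, $\eta_\alpha^L$, $\eta_\alpha^U$). The results (6) and (7) in Theorem 1 degenerate to the form

\[
\lim_{t \to \infty} \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} = \frac{\xi_\alpha^L}{\xi_\alpha^L + \eta_\alpha^U}
\quad\text{and}\quad
\lim_{t \to \infty} \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} = \frac{\xi_\alpha^U}{\xi_\alpha^U + \eta_\alpha^L},
\tag{20}
\]

respectively.

Let ξ be one of the fuzzy variables with α-pessimistic value $E[\xi_{1,\alpha}^L(\omega)]$ and α-optimistic value $E[\xi_{1,\alpha}^U(\omega)]$, and η one of the fuzzy variables with α-pessimistic value $E[\eta_{1,\alpha}^L(\omega)]$ and α-optimistic value $E[\eta_{1,\alpha}^U(\omega)]$. Then we have the following theorem.

Theorem 2. Assume that {(ξ_i, η_i), i ≥ 1} is a sequence of pairs of iid positive fuzzy random variables. For any given α ∈ (0, 1], if the random variables $\xi_{i,\alpha}^L(\omega)$, $\xi_{i,\alpha}^U(\omega)$, $\eta_{i,\alpha}^L(\omega)$ and $\eta_{i,\alpha}^U(\omega)$ are nonlattice, $E[\xi + \eta] < \infty$ and $E\left[\frac{\xi}{\xi + \eta}\right] < \infty$, then we have

\[
\lim_{t \to \infty} \mathrm{Ch}\{\text{system is on at } t\} = E\left[\frac{\xi}{\xi + \eta}\right].
\tag{21}
\]

Proof. By Definition 5 and Proposition 1, we have

\[
\mathrm{Ch}\{\text{system is on at } t\} = \frac{1}{2}\int_0^1 \left( \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} + \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} \right) d\alpha.
\tag{22}
\]

It follows from Theorem 1 that

\[
\lim_{t \to \infty} \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} = \frac{E[\xi_{1,\alpha}^L(\omega)]}{E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)]},
\tag{23}
\]

\[
\lim_{t \to \infty} \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} = \frac{E[\xi_{1,\alpha}^U(\omega)]}{E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)]}.
\tag{24}
\]

That is,

\[
\begin{aligned}
&\lim_{t \to \infty} \left( \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} + \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} \right) \\
&\qquad = \frac{E[\xi_{1,\alpha}^L(\omega)]}{E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)]} + \frac{E[\xi_{1,\alpha}^U(\omega)]}{E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)]}.
\end{aligned}
\]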
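On the one hand, we have

\[
0 \le \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} + \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} \le 2.
\tag{25}
\]

It follows from the dominated convergence theorem that

\[
\begin{aligned}
\lim_{t \to \infty} \mathrm{Ch}\{\text{system is on at } t\}
&= \frac{1}{2}\int_0^1 \lim_{t \to \infty} \left( \Pr\nolimits_\alpha^L\{\omega \in \Omega \mid \text{system is on at } t\} + \Pr\nolimits_\alpha^U\{\omega \in \Omega \mid \text{system is on at } t\} \right) d\alpha \\
&= \frac{1}{2}\int_0^1 \left( \frac{E[\xi_{1,\alpha}^L(\omega)]}{E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)]} + \frac{E[\xi_{1,\alpha}^U(\omega)]}{E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)]} \right) d\alpha.
\end{aligned}
\]

On the other hand,

\[
\begin{aligned}
E\left[\frac{\xi}{\xi + \eta}\right]
&= \int_0^{\infty} \mathrm{Cr}\left\{\frac{\xi}{\xi + \eta} \ge r\right\} dr \\
&= \frac{1}{2}\int_0^1 \left( \frac{E[\xi_{1,\alpha}^L(\omega)]}{E[\xi_{1,\alpha}^L(\omega)] + E[\eta_{1,\alpha}^U(\omega)]} + \frac{E[\xi_{1,\alpha}^U(\omega)]}{E[\xi_{1,\alpha}^U(\omega)] + E[\eta_{1,\alpha}^L(\omega)]} \right) d\alpha.
\end{aligned}
\]

The proof is completed.

Remark 4. If {(ξ_i, η_i), i ≥ 1} degenerates to a sequence of iid random vectors, then the result in Theorem 2 degenerates to the form

\[
\lim_{t \to \infty} \Pr\{\text{system is on at time } t\} = \frac{E[\xi_1]}{E[\xi_1] + E[\eta_1]},
\]

which is just the conventional result in the stochastic case (see Ross [20]). If {(ξ_i, η_i), i ≥ 1} degenerates to a sequence of fuzzy vectors, then the result in Theorem 2 degenerates to the form

\[
\lim_{t \to \infty} \mathrm{Cr}\{\text{system is on at time } t\} = E\left[\frac{\xi_1}{\xi_1 + \eta_1}\right].
\]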
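In the purely fuzzy case of Remark 4, the limit can be evaluated numerically from the α-level values. The following Python sketch does this for triangular fuzzy on and off times. It assumes that the pessimistic limit pairs the α-pessimistic on time with the α-optimistic off time and the optimistic limit pairs the α-optimistic on time with the α-pessimistic off time, as in the three processes of the proof of Theorem 1; the function names and the triangular shapes are hypothetical.

```python
def level_values(tri, alpha):
    """(alpha-pessimistic, alpha-optimistic) values of a triangular fuzzy number."""
    a, b, c = tri
    return a + alpha * (b - a), c - alpha * (c - b)


def limiting_on_chance(on, off, steps=1000):
    """Midpoint rule for 0.5 * integral over alpha of the two limiting ratios."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) * h
        on_lo, on_hi = level_values(on, alpha)
        off_lo, off_hi = level_values(off, alpha)
        pessimistic = on_lo / (on_lo + off_hi)   # shortest on time, longest off time
        optimistic = on_hi / (on_hi + off_lo)    # longest on time, shortest off time
        total += pessimistic + optimistic
    return 0.5 * h * total
```

For on time (1, 2, 3) and off time (2, 3, 4), the two ratios are (1 + α)/5 and (3 − α)/5, so their sum is 4/5 for every α and the limiting value is 0.4.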
Acknowledgments This work was supported by the National Natural Science Foundation of China Grant No. 70471049 and China Postdoctoral Science Foundation No. 2004035013.
References
1. Asmussen, S.: Applied Probability and Queues. Wiley, New York. 1987
2. Birolini, A.: Quality and Reliability of Technical Systems. Springer, Berlin. 1994
3. Daley, D., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Springer, Berlin. 1988
4. Dozzi, M., Merzbach, E., Schmidt, V.: Limit theorems for sums of random fuzzy sets. Journal of Mathematical Analysis and Applications. 259 (2001) 554-565
5. Hwang, C.: A theorem of renewal process for fuzzy random variables and its application. Fuzzy Sets and Systems. 116 (2000) 237-244
6. Kwakernaak, H.: Fuzzy random variables-I. Information Sciences. 15 (1978) 1-29
7. Kwakernaak, H.: Fuzzy random variables-II. Information Sciences. 17 (1979) 253-278
8. Kruse, R., Meyer, K.: Statistics with Vague Data. D. Reidel Publishing Company, Dordrecht. 1987
9. Liu, B.: Theory and Practice of Uncertain Programming. Physica-Verlag, Heidelberg. 2002
10. Liu, B.: Fuzzy random chance-constrained programming. IEEE Transactions on Fuzzy Systems. 9 (2001) 713-720
11. Liu, B.: Fuzzy random dependent-chance programming. IEEE Transactions on Fuzzy Systems. 9 (2001) 721-726
12. Liu, B., Liu, Y.: Expected value of fuzzy variable and fuzzy expected value models. IEEE Transactions on Fuzzy Systems. 10 (2002) 445-450
13. Liu, P.: Fuzzy-valued Markov processes and their properties. Fuzzy Sets and Systems. 91 (1997) 45-52
14. Liu, Y., Liu, B.: Fuzzy random variables: a scalar expected value. Fuzzy Optimization and Decision Making. 2 (2003) 143-160
15. Liu, Y., Liu, B.: Expected value operator of random fuzzy variable and random fuzzy expected value models. International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems. 11 (2003) 195-215
16. Liu, Y., Liu, B.: On minimum-risk problems in fuzzy random decision systems. Computers & Operations Research. 32 (2005) 257-283
17. Popova, E., Wu, H.: Renewal reward processes with fuzzy rewards and their applications to T-age replacement policies. European Journal of Operational Research. 117 (1999) 606-617
18. Puri, M., Ralescu, D.: The concept of normality for fuzzy random variables. Annals of Probability. 13 (1985) 1371-1379
19. Puri, M., Ralescu, D.: Fuzzy random variables. Journal of Mathematical Analysis and Applications. 114 (1986) 409-422
20. Ross, S.: Stochastic Processes. John Wiley & Sons, Inc, New York. 1996
21. Tijms, H.: Stochastic Modelling and Analysis: A Computational Approach. Wiley, New York. 1994
22. Zhao, R., Liu, B.: Renewal process with fuzzy interarrival times and rewards. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 11 (2003) 573-586
Three Equilibrium Strategies for Two-Person Zero-Sum Game with Fuzzy Payoffs
Lin Xu, Ruiqing Zhao, and Tingting Shu
Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
Abstract. In this paper, a two-person zero-sum game is considered in which the payoffs are characterized as fuzzy variables. Based on the possibility measure, the credibility measure, and the fuzzy expected value operator, three types of minimax equilibrium strategy are defined: the r-possible minimax equilibrium strategy, the r-credible minimax equilibrium strategy, and the expected minimax equilibrium strategy. An iterative algorithm based on fuzzy simulation is designed to find the equilibrium strategies. Finally, a numerical example is provided to illustrate the effectiveness of the algorithm.
1 Introduction
When game theory is applied to real-world problems in economics and management science, it is sometimes difficult to assess payoffs exactly because of imprecise information and experts' fuzzy understanding of the situation. In such cases, it is useful to model the problems as games with fuzzy payoffs. Campos [1] studied two-person zero-sum matrix games with fuzzy payoffs, in which fuzzy linear programming models were employed to solve fuzzy matrix games. Furthermore, Nishizaki and Sakawa [10] discussed multiobjective matrix games with fuzzy payoffs. Maeda [7,8] discussed a game in which the payoffs were characterized as fuzzy numbers, and defined equilibrium strategies based on possibility and necessity measures. Recently, Liu and Liu [4] defined a credibility measure and gave an expected value operator for fuzzy variables. Properties of the possibility measure, the credibility measure, and the fuzzy expected value operator can be found in [4,5,6,9]. In this paper, we discuss a two-person zero-sum game with fuzzy payoffs and provide three types of minimax equilibrium strategy, based on the possibility measure, the credibility measure, and the fuzzy expected value operator, respectively. An iterative algorithm based on fuzzy simulation is designed to find the equilibrium strategies. Finally, we provide a numerical example to illustrate the procedure.
2 Two-Person Zero-Sum Game
In this section, we discuss a two-person zero-sum game in which the payoffs are characterized as fuzzy variables, and then define three types of minimax equilibrium strategy for the game.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 350–354, 2005. © Springer-Verlag Berlin Heidelberg 2005
Let $I = \{1, 2, \ldots, m\}$ be the set of pure strategies of player I and $J = \{1, 2, \ldots, n\}$ the set of pure strategies of player II. Mixed strategies of players I and II are represented by weights on their pure strategies, i.e.,

$$x \in \Big\{ (x_1, x_2, \ldots, x_m)^T \in \Re^m_+ \;\Big|\; \sum_{i \in I} x_i = 1 \Big\}$$

is a mixed strategy of player I, and

$$y \in \Big\{ (y_1, y_2, \ldots, y_n)^T \in \Re^n_+ \;\Big|\; \sum_{j \in J} y_j = 1 \Big\}$$

is a mixed strategy of player II, where $\Re^m_+ = \{a \in \Re^m \mid a_i \ge 0,\ i = 1, 2, \ldots, m\}$ and $x^T$ is the transpose of $x$. By the fuzzy variable $\xi_{ij}$ we denote the payoff that player I receives, or player II loses, when player I plays the pure strategy $i$ and player II plays the pure strategy $j$. A two-person zero-sum game is then represented by the payoff matrix of fuzzy variables

$$\eta = \begin{bmatrix} \xi_{11} & \xi_{12} & \cdots & \xi_{1n} \\ \xi_{21} & \xi_{22} & \cdots & \xi_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{m1} & \xi_{m2} & \cdots & \xi_{mn} \end{bmatrix}. \tag{1}$$
Definition 1. Let $\xi_{ij}$ ($i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$) be independent fuzzy variables. Then $(x^*, y^*)$ is called an expected minimax equilibrium strategy to the game if

$$E\left[x^T \eta y^*\right] \le E\left[x^{*T} \eta y^*\right] \le E\left[x^{*T} \eta y\right], \tag{2}$$

where $\eta$ is defined by equality (1).

Definition 2. Let $\xi_{ij}$ ($i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$) be independent fuzzy variables and $r \in \Re$ the predetermined level of the payoffs. Then $(x^*, y^*)$ is called an r-possible minimax equilibrium strategy to the game if

$$\mathrm{Pos}\left\{x^T \eta y^* \ge r\right\} \le \mathrm{Pos}\left\{x^{*T} \eta y^* \ge r\right\} \le \mathrm{Pos}\left\{x^{*T} \eta y \ge r\right\}, \tag{3}$$

where $\eta$ is defined by equality (1).

Definition 3. Let $\xi_{ij}$ ($i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$) be independent fuzzy variables and $r \in \Re$ the predetermined level of the payoffs. Then $(x^*, y^*)$ is called an r-credible minimax equilibrium strategy to the game if

$$\mathrm{Cr}\left\{x^T \eta y^* \ge r\right\} \le \mathrm{Cr}\left\{x^{*T} \eta y^* \ge r\right\} \le \mathrm{Cr}\left\{x^{*T} \eta y \ge r\right\}, \tag{4}$$

where $\eta$ is defined by equality (1).
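The measures Pos and Cr used in Definitions 2 and 3 can be illustrated on a single fuzzy variable. The sketch below is not taken from the paper: it assumes the standard relations $\mathrm{Pos}\{\xi \ge r\} = \sup_{x \ge r} \mu(x)$ and $\mathrm{Cr}\{A\} = \frac{1}{2}(\mathrm{Pos}\{A\} + 1 - \mathrm{Pos}\{A^c\})$ from credibility theory [4], and approximates the suprema on a uniform grid. The helper names (`grid`, `pos_ge`, `pos_lt`, `cr_ge`) and the grid bounds are illustrative choices; the membership function is the one used for $\xi_{11}$ in the numerical example of Section 4.

```python
import math

def grid(lo=-10.0, hi=10.0, steps=20000):
    """Uniform grid used to approximate suprema (bounds are an assumption)."""
    dx = (hi - lo) / steps
    return [lo + k * dx for k in range(steps + 1)]

def pos_ge(mu, r, xs):
    """Approximate Pos{xi >= r} = sup of mu(x) over grid points x >= r."""
    return max((mu(x) for x in xs if x >= r), default=0.0)

def pos_lt(mu, r, xs):
    """Approximate Pos{xi < r} = sup of mu(x) over grid points x < r."""
    return max((mu(x) for x in xs if x < r), default=0.0)

def cr_ge(mu, r, xs):
    """Cr{xi >= r} = (Pos{xi >= r} + 1 - Pos{xi < r}) / 2."""
    return 0.5 * (pos_ge(mu, r, xs) + 1.0 - pos_lt(mu, r, xs))

def mu11(x):
    """Membership function exp(-(x - 1)^2 / 2), as for xi_11 in Section 4."""
    return math.exp(-((x - 1.0) ** 2) / 2.0)

XS = grid()
print(pos_ge(mu11, 3.0, XS), cr_ge(mu11, 3.0, XS), cr_ge(mu11, 1.0, XS))
```

For this membership function the possibility of exceeding the level r = 3 is close to exp(-2), while the credibility is about half of that, which shows why the r-possible and r-credible equilibria of Definitions 2 and 3 can differ for the same level r.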
3 Iterative Algorithm Based on Fuzzy Simulation
When the payoffs of a game are fuzzy variables, traditional analytical solution methods are not applicable for finding a minimax equilibrium strategy. In this section, we propose an algorithm in which fuzzy simulation is employed to estimate the possibility, the credibility, and the expected value, and an iterative technique is used to seek the minimax equilibrium strategy. The reader interested in fuzzy simulation can consult [2,3,4]. For convenience, let $g(\xi_{ij})$ be one of the functions $E[\xi_{ij}]$, $\mathrm{Pos}\{\xi_{ij} \ge r\}$, and $\mathrm{Cr}\{\xi_{ij} \ge r\}$, where $\xi_{ij}$ is the payoff that player I receives, or player II loses, when player I plays the pure strategy $i$ and player II plays the pure strategy $j$. Similarly, let $f(x^*, y^*)$ be one of the functions $E\left[x^{*T} \eta y^*\right]$, $\mathrm{Pos}\left\{x^{*T} \eta y^* \ge r\right\}$, and $\mathrm{Cr}\left\{x^{*T} \eta y^* \ge r\right\}$. The procedure to find the minimax equilibrium strategy for a two-person zero-sum game with fuzzy payoffs is summarized as follows.

Step 1) Let $x_{i_1}$ be a pure strategy selected randomly from the set of pure strategies $\{1, 2, \ldots, m\}$ by player I, and $y_{j_1}$ a pure strategy selected randomly from the set of pure strategies $\{1, 2, \ldots, n\}$ by player II.

Step 2) After the $t$th game, the pure strategies selected by player I are $x_{i_1}, \ldots, x_{i_t}$, while the pure strategies selected by player II are $y_{j_1}, \ldots, y_{j_t}$. In the $(t + 1)$th game, the pure strategy $x_{i_{t+1}}$ selected by player I should satisfy the condition

$$\sum_{k=1}^{t} g(\xi_{i_{t+1} j_k}) = \max_{1 \le i \le m} \sum_{k=1}^{t} g(\xi_{i j_k}).$$

The pure strategy $y_{j_{t+1}}$ selected by player II should satisfy the condition

$$\sum_{k=1}^{t} g(\xi_{i_k j_{t+1}}) = \min_{1 \le j \le n} \sum_{k=1}^{t} g(\xi_{i_k j}),$$

where $g(\xi_{ij})$ can be estimated by fuzzy simulation.

Step 3) Return to Step 2 with $t = t + 1$ until the iteration number is $N$, where $N$ is a sufficiently large number.

Step 4) Let $x^* = \{x_1^*, x_2^*, \ldots, x_m^*\}$, where $x_i^*$ is the ratio of the number of times the pure strategy $i$ was selected by player I in the above steps to $N$, $i = 1, 2, \ldots, m$. Let $y^* = \{y_1^*, y_2^*, \ldots, y_n^*\}$, where $y_j^*$ is the ratio of the number of times the pure strategy $j$ was selected by player II in the above steps to $N$, $j = 1, 2, \ldots, n$.

Step 5) Estimate $f(x^*, y^*)$ by fuzzy simulation, then display the minimax equilibrium strategy $(x^*, y^*)$ and $f(x^*, y^*)$.
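Steps 1 to 4 can be sketched as follows, under the assumption that the values $g(\xi_{ij})$ have already been estimated and are passed in as an ordinary matrix `g`. The function names `iterative_equilibrium` and `bilinear_value`, the seeded random generator, and the first-index tie-breaking rule are illustrative choices, not part of the paper. Step 5 calls for fuzzy simulation of $f(x^*, y^*)$; the bilinear form used here is only a stand-in that is appropriate when $g$ is the expected value and the expected value operator is linear for independent fuzzy variables (an assumption of this sketch).

```python
import random

def iterative_equilibrium(g, n_iter, seed=0):
    """Steps 1-4: each player repeatedly best-responds to the opponent's history."""
    m, n = len(g), len(g[0])
    rng = random.Random(seed)
    i, j = rng.randrange(m), rng.randrange(n)    # Step 1: random pure strategies
    row_count, col_count = [0] * m, [0] * n
    row_score = [0.0] * m    # row_score[a] = sum over past games k of g[a][j_k]
    col_score = [0.0] * n    # col_score[b] = sum over past games k of g[i_k][b]
    for _ in range(n_iter):
        row_count[i] += 1
        col_count[j] += 1
        for a in range(m):
            row_score[a] += g[a][j]
        for b in range(n):
            col_score[b] += g[i][b]
        # Step 2: player I maximizes, player II minimizes the accumulated payoff
        i = max(range(m), key=row_score.__getitem__)
        j = min(range(n), key=col_score.__getitem__)
    # Step 4: empirical frequencies of the selected pure strategies
    x = [c / n_iter for c in row_count]
    y = [c / n_iter for c in col_count]
    return x, y

def bilinear_value(g, x, y):
    """Stand-in for Step 5 in the expected-value case: x^T g y."""
    return sum(x[a] * g[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))

# Hypothetical g matrix: the centers of the membership functions in Section 4.
G = [[1.0, 3.0], [4.0, 2.0]]
x_star, y_star = iterative_equilibrium(G, 20000)
print(x_star, y_star, bilinear_value(G, x_star, y_star))
```

The frequencies settle slowly and oscillate around their limits, so a large number of iterations N is needed before the estimates become stable.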
4 Numerical Example
Let

$$\eta = \begin{bmatrix} \xi_{11} & \xi_{12} \\ \xi_{21} & \xi_{22} \end{bmatrix},$$

where $\xi_{ij}$ ($i = 1, 2$, $j = 1, 2$) are fuzzy variables whose membership functions are given by

$$\mu_{11}(x) = e^{-\frac{(x-1)^2}{2}}, \quad \mu_{12}(x) = e^{-\frac{(x-3)^2}{2}}, \quad \mu_{21}(x) = e^{-\frac{(x-4)^2}{2}}, \quad \mu_{22}(x) = e^{-\frac{(x-2)^2}{2}},$$

respectively. By 1000 iterations (2000 sample points in fuzzy simulation), the expected minimax equilibrium strategy is

$$\big((x_1^*, x_2^*), (y_1^*, y_2^*)\big) = \big((0.5230, 0.4770), (0.2530, 0.7470)\big)$$

with $E\left[x^{*T} \eta y^*\right] = 2.5370$. The results of the iterative algorithm are shown in Fig. 1, in which the two straight lines represent the optimal solutions of $x_1$ and $y_1$, and the curves represent the solutions obtained with different numbers of iterations.

[Figure: weights (vertical axis, 0 to 1.0) versus number of iterations N (horizontal axis, 5 to 50); horizontal lines mark $x_1^*$ and $y_1^*$]
Fig. 1. An iterative process based on fuzzy simulation
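The reported values can be cross-checked with a short calculation. The sketch below is not part of the paper. It assumes the credibility-based expected value $E[\xi] = \int_0^{\infty} \mathrm{Cr}\{\xi \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{\xi \le r\}\,dr$ of [4], evaluated by a Riemann sum on a truncated grid, and it assumes that the expected value operator is linear for independent fuzzy variables, so that the expected game reduces to a crisp 2 × 2 matrix game with a closed-form mixed solution. The names `expected_value` and `solve_2x2` and the grid bounds are illustrative.

```python
import math

def expected_value(mu, lo=-10.0, hi=10.0, steps=4000):
    """Riemann-sum estimate of E[xi] from the membership function mu."""
    dx = (hi - lo) / steps
    xs = [lo + k * dx for k in range(steps + 1)]
    ms = [mu(x) for x in xs]
    n = len(xs)
    left = [0.0] * n     # left[k]  = max of mu over grid points strictly left of xs[k]
    for k in range(1, n):
        left[k] = max(left[k - 1], ms[k - 1])
    right = [0.0] * n    # right[k] = max of mu over grid points at or right of xs[k]
    right[n - 1] = ms[n - 1]
    for k in range(n - 2, -1, -1):
        right[k] = max(right[k + 1], ms[k])
    total = 0.0
    for k in range(n):
        cr_ge = 0.5 * (right[k] + 1.0 - left[k])    # Cr{xi >= xs[k]}
        if xs[k] >= 0.0:
            total += cr_ge * dx
        else:
            total -= (1.0 - cr_ge) * dx             # Cr{xi < r} = 1 - Cr{xi >= r}
    return total

def solve_2x2(a, b, c, d):
    """Mixed solution of the crisp game [[a, b], [c, d]] without a saddle point."""
    denom = a - b - c + d
    return (d - c) / denom, (d - b) / denom, (a * d - b * c) / denom

centers = [[1.0, 3.0], [4.0, 2.0]]
E = [[expected_value(lambda x, c=c: math.exp(-((x - c) ** 2) / 2.0)) for c in row]
     for row in centers]
x1, y1, value = solve_2x2(E[0][0], E[0][1], E[1][0], E[1][1])
print(E, x1, y1, value)
```

Under these assumptions the expected payoffs are close to the centers 1, 3, 4, and 2 of the membership functions, and the crisp game with exactly these entries has the solution $x_1 = 0.5$, $y_1 = 0.25$ with value 2.5, which is in line with the simulated values 0.5230, 0.2530, and 2.5370 reported above.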
5 Conclusions
In this paper, a two-person zero-sum game with fuzzy payoffs was studied. By employing fuzzy variables to describe the payoffs of the game, three types of minimax equilibrium strategy were presented. An algorithm based on fuzzy simulation was designed to seek the minimax equilibrium strategies.
Acknowledgments This work was supported by the National Natural Science Foundation of China Grant No. 70471049 and China Postdoctoral Science Foundation No. 2004035013.
References
1. Campos, L.: Fuzzy linear programming models to solve fuzzy matrix games. Fuzzy Sets and Systems. 32 (1989) 275-289
2. Liu, B., Iwamura, K.: Chance constrained programming with fuzzy parameters. Fuzzy Sets and Systems. 94 (1998) 227-237
3. Liu, B., Iwamura, K.: A note on chance constrained programming with fuzzy coefficients. Fuzzy Sets and Systems. 100 (1998) 229-233
4. Liu, B., Liu, Y.: Expected value of fuzzy variable and fuzzy expected value model. IEEE Transactions on Fuzzy Systems. 10 (2002) 445-450
5. Liu, B.: Toward fuzzy optimization without mathematical ambiguity. Fuzzy Optimization and Decision Making. 1 (2002) 43-63
6. Liu, B.: Uncertainty Theory: An Introduction to its Axiomatic Foundations. Springer-Verlag, Berlin (2004)
7. Maeda, T.: Characterization of equilibrium strategy of the bi-matrix game with fuzzy payoff. Journal of Mathematical Analysis and Applications. 251 (2000) 885-896
8. Maeda, T.: On the characterization of equilibrium strategy of two-person zero-sum game with fuzzy payoffs. Fuzzy Sets and Systems. 139 (2003) 283-296
9. Nahmias, S.: Fuzzy variables. Fuzzy Sets and Systems. 1 (1978) 97-110
10. Nishizaki, I., Sakawa, M.: Fuzzy and Multiobjective Games for Conflict Resolution. Physica-Verlag, Heidelberg (2001)
An Improved Rectangular Decomposition Algorithm for Imprecise and Uncertain Knowledge Discovery
Jiyoung Song, Younghee Im, and Daihee Park
Korea Univ. Dept. of Computer & Information Science
{songjy, yheeim, dhpark}@korea.ac.kr
Abstract. In this paper, we propose an improved algorithm for the rectangular decomposition technique, aimed at fuzzy knowledge discovery from large-scale databases in a dynamic environment. To demonstrate its effectiveness, we compare the proposed algorithm, which is based on newly derived mathematical properties, with other methods with respect to classification rate, number of rules, and complexity.
1 Introduction
In real-world applications, a database is usually updated by adding new objects or by modifying old measurements. Maddouri [1] combined the rectangular decomposition algorithm, which allows incremental updates, with fuzzy set theory to process imprecise and uncertain knowledge. The rectangular decomposition algorithms of Khcherif [2] and Maddouri [1] first transform a binary matrix into a general graph and then find the maximum clique in that graph, which is an NP-hard problem [2]. In contrast, our algorithm first transforms the binary matrix into a bipartite graph and then finds bicliques in the bipartite graph by node-deletion [3], which can be done in polynomial time [4]. In Section 2, we review the rectangular decomposition technique. In Section 3, we introduce the new rectangular decomposition algorithm and analyze its complexity. In Section 4, we present and analyze the experimental results. Finally, concluding remarks are given in Section 5.
2 Rectangular Decomposition Technique
Rectangular decomposition is a technique that finds an optimal coverage of a binary relation R, where a relational database is considered as a binary relation, denoted R(O, P), between a set of objects O and a set of attribute values or properties P. Each rectangle obtained in this step corresponds one-to-one to a rule, and the optimal coverage corresponds to a minimum rule base [1], [2].
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 355–359, 2005. © Springer-Verlag Berlin Heidelberg 2005
3 Improved Algorithm
In this paper, we propose an improved algorithm that notably reduces the cost of finding the optimal coverage while keeping the incremental-update advantage of the rectangular decomposition technique in dynamic environments with frequent insertions, deletions, and updates. For this purpose, the following theorems are derived on the bipartite graph obtained by transforming the binary matrix.

Theorem 1. Let the given bipartite graph be $B = (V_1 \cup V_2, E)$, and let the complete bipartite graph $B_c = (W_1 \cup W_2, E_c)$ be a subgraph of $B$, where $B_c = W_1 \times W_2$, $W_1 \subset V_1$, $W_2 \subset V_2$, $E_c \subset E$. If the rectangle obtained by node-deletion is $B_c$, then $B_c$ is a maximal rectangle.

Proof. We argue by contradiction. Assume that $B_c$ is not a maximal rectangle. Then there exists $B_c'$ such that $B_c = W_1 \times W_2 \subset W_1' \times W_2' = B_c'$. Only the following three cases are possible when $B_c$ is a proper subset of $B_c'$:
i) $W_1' = W_1 + x$ and $W_2' = W_2$
ii) $W_1' = W_1$ and $W_2' = W_2 + y$
iii) $W_1' = W_1 + x$ and $W_2' = W_2 + y$
For case i), denote the elements of $V_1$ by $u_1, u_2, \ldots, u_m$, the elements of $V_2$ by $v_1, v_2, \ldots, v_n$, the elements of $W_1$ by $w_1, w_2, \ldots, w_i$, and the elements of $W_2$ by $z_1, z_2, \ldots, z_j$. Let $x$ be an element of $V_1 - W_1$, i.e., $x \in V_1$ and $x \notin W_1$. If we rearrange the elements so that $u_k = w_k$, $v_l = z_l$ ($1 \le k \le i$, $1 \le l \le j$), then $x$ is one of the elements $u_{i+1}, u_{i+2}, \ldots, u_m$. Setting $W_1' = W_1 + x$ gives $W_1' \times W_2 = B_c'$. Since $B_c'$ is a complete bipartite graph, $x$ must have edges to all elements of $W_2$. However, in the node-deletion technique, only nodes that are not connected to every $z \in W_2$ are deleted. Accordingly, no element $x$ with $x \in V_1$ and $x \notin W_1$ exists, which contradicts case i). Therefore, the rectangle $B_c$ obtained by node-deletion is a maximal rectangle. Cases ii) and iii) can be proved likewise.

Corollary. Let the given bipartite graph be $B = (V_1 \cup V_2, E)$, and let the complete bipartite graph $B_c = (W_1 \cup W_2, E_c)$ be a subgraph of $B$, where $B_c = W_1 \times W_2$, $W_1 \subset V_1$, $W_2 \subset V_2$, $E_c \subset E$. Let $B_{c1}$ be the rectangle obtained from $B$ by node-deletion, and let $B_{c2}$ be the rectangle obtained from an edge $e_2 \notin E_{c1}$, i.e., an edge that is not included in $B_{c1}$. In the same way, let $B_{c3}$ be the rectangle obtained from an edge $e_3 \notin (E_{c1} \cup E_{c2})$, and continue up to $B_{cn}$ until all edges $e_i \in E$ are included. If the set of all these rectangles is $CV = \{B_{c1}, B_{c2}, B_{c3}, \ldots, B_{cn}\}$, then $CV$ is a coverage.

Proof. Since every $B_{ci} \in CV$ is a rectangle by Theorem 1 and every edge $e_i \in E$ is contained in $CV$, $CV = \{B_{c1}, B_{c2}, B_{c3}, \ldots, B_{cn}\}$ is a coverage by the definition of coverage.

Theorem 2. Denote the maximal rectangles $B_{ci}$ obtained by node-deletion by $Rec_i$ ($i = 1, 2, \ldots, n$) and the coverage obtained from the Corollary by $CV_{opt} = \{Rec_1, Rec_2, \ldots, Rec_n\}$. If each maximal rectangle is taken as an optimal rectangle, then $CV_{opt}$ is the optimal coverage.

Proof. Since each rectangle is defined to be an optimal rectangle, it is sufficient to show that no $Rec_i$ in $CV_{opt}$ is a redundant rectangle. This is the same as showing that a coverage cannot be formed with $n - 1$ of the $Rec_i$. Let $e_i$ be the edge, i.e., the first pair of nodes, from which $Rec_i$ ($i = 1, 2, \ldots, n$) is obtained in node-deletion. First consider whether $Rec_n$ is a redundant rectangle. By the definition of coverage, every edge must belong to at least one rectangle. However, $e_n$ is not included in any of the $Rec_i$ ($i = 1, 2, \ldots, n - 1$) when $Rec_1, Rec_2, \ldots, Rec_{n-1}$ belong to $CV_{opt}$, and each of them is an optimal rectangle; hence the optimal rectangle $Rec_n$, which contains $e_n$, must be included in $CV_{opt}$. Therefore, $Rec_n$ is not a redundant rectangle. Likewise, $Rec_1, Rec_2, \ldots, Rec_{n-1}$ must be included in the coverage. Moreover, no rectangle can be contained in one or several of the other rectangles, because each rectangle is an optimal rectangle. Consequently, the coverage cannot be formed with $n - 1$ rectangles. Therefore, $CV_{opt}$ is an optimal coverage by definition.

The theorems above show that the desired optimal coverage can be obtained by finding bicliques directly in the bipartite graph obtained from the binary matrix, without transforming that graph into a general graph during rectangular decomposition. Because the search space for finding a biclique in the bipartite graph is an m × n matrix, it is significantly smaller than the (m+n) × (m+n) matrix used in the method of Maddouri [1]. Moreover, finding the maximum clique is NP-hard, whereas the biclique problem on the bipartite graph considered here can be solved in polynomial time. Hence the method proposed in this paper is more efficient than Maddouri's method, which attacks an NP-hard problem with a heuristic.
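The construction in Theorem 1 and the Corollary can be illustrated with a small sketch. The authors' own procedure is given only in Fig. 1 and is not reproduced in the text, so the code below is a hypothetical reading of it rather than their algorithm: starting from an uncovered edge (o, p), it keeps all properties of o as $W_2$ and deletes every object that is not connected to all of $W_2$, which leaves a maximal rectangle $W_1 \times W_2$; it then repeats from the next uncovered edge until every edge is covered. The names `maximal_rectangle` and `coverage`, the edge ordering, and the example relation are all assumptions made for illustration, and the sketch does not claim to produce a minimum number of rectangles.

```python
def maximal_rectangle(relation, objects, props, seed_edge):
    """Grow a maximal rectangle W1 x W2 containing seed_edge = (o, p)."""
    o, _ = seed_edge
    # W2: every property related to the seed object
    w2 = {q for q in props if (o, q) in relation}
    # Node-deletion step: drop objects not connected to every property in W2
    w1 = {a for a in objects if all((a, q) in relation for q in w2)}
    return frozenset(w1), frozenset(w2)

def coverage(relation):
    """Cover all edges of the binary relation with maximal rectangles."""
    objects = {o for o, _ in relation}
    props = {p for _, p in relation}
    covered, rectangles = set(), []
    for edge in sorted(relation):
        if edge in covered:
            continue    # this edge already belongs to an earlier rectangle
        w1, w2 = maximal_rectangle(relation, objects, props, edge)
        rectangles.append((w1, w2))
        covered.update((a, q) for a in w1 for q in w2)
    return rectangles

# Hypothetical binary relation R(O, P) between objects 1..3 and properties a..c
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "b"), (3, "c")}
CV = coverage(R)
print(CV)
```

In this example two rectangles, {1, 2} × {a, b} and {3} × {b, c}, cover all six edges, and each of them corresponds to one rule in the sense of Section 2.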
Fig. 1. (a) Algorithm that finds the optimal coverage from the bipartite graph. (b) Algorithm that finds the maximal rectangle
Fig. 2. Comparison of machine learning methods
Fig. 1 shows the proposed rectangular decomposition algorithm. The nodes of the domain, the nodes of the co-domain, and the edges represent properties, objects, and the relation, respectively. The improved rectangular decomposition algorithm proposed in this paper modifies only the part that finds the optimal coverage. Therefore, it retains the incremental-update advantage of the method proposed by Khcherif [2].
3.1 Complexity Analysis
To show its effectiveness, we compare the proposed method with other methods with respect to complexity and knowledge representation in Fig. 2.
4 Experimental Results
To assess the proposed algorithm, experiments were performed using the IRIS data.

Table 1. Comparison of number of fuzzy rules and classification rate between conventional methods and proposed method

Number of       NM             RM             Jang        Proposed
training data   criterion[6]   criterion[6]   [7]         method
21              89.8(71)       89.6(72)       88.3(32)    90.8(14)
30              93.0(83)       93.3(87)       91.6(36)    92.8(17)
60              93.9(105)      94.1(107)      93.3(46)    94.5(24)
90              94.8(150)      94.6(150)      95.0(46)    95.6(28)
In Table 1, the proposed method uses fewer rules than the conventional methods in every case and achieves a higher classification rate in all cases except the one with 30 training data, where the RM and NM criteria are slightly better. Since our method supports incremental updates, we first created the rules and classified with 21 training data, and then gradually added 9, 30, and 30 data to the initial 21 to update the rule base incrementally.
5 Conclusions
In this paper, we proposed an improved algorithm for the rectangular decomposition technique with incremental updating for large-scale databases in a dynamic environment. The conventional methods first transform a binary matrix into a general graph and then find the maximum clique in that graph, which is an NP-hard problem. The proposed method instead transforms the binary matrix into a bipartite graph and finds bicliques in the bipartite graph by node-deletion, which can be done in polynomial time. The validity of the proposed algorithm rests on the newly derived theorems. In the comparisons, it also produced fewer rules than the conventional methods and competitive or better classification rates.
References
1. M. Maddouri, S. Elloumi, A. Jaoua: An Incremental Learning System for Imprecise and Uncertain Knowledge Discovery. Information Sciences. 109 (1998) 149-164
2. R. Khcherif, A. Jaoua: Rectangular Decomposition Heuristics for Documentary Databases. Information Sciences. 102 (1997) 187-202
3. M. Yannakakis: Node deletion problems on bipartite graphs. SIAM J. Comput. 10 (1981) 310-327
4. D. S. Hochbaum: Approximating Clique and Biclique Problems. Journal of Algorithms. 29 (1998) 174-200
5. R. Khcherif, M. M. Gammoudi, A. Jaoua: Using difunctional relations in information organization. Information Sciences. 125 (2000) 153-166
6. H. Ishibuchi, K. Nozaki, H. Tanaka: Efficient fuzzy partition of pattern space for classification problems. Fuzzy Sets and Systems. 59 (1993) 295-304
7. Jang, D-S., Choi, H-I.: Automatic Generation of Fuzzy Rules with Fuzzy Associative Memory. Proceeding of the ISCA 5th International Conference. (1996) 182-186
XPEV: A Storage Model for Well-Formed XML Documents*
Jie Qin¹, Shu-Mei Zhao², Shu-Qiang Yang¹, and Wen-Hua Dou¹
¹ School of Computer, National University of Defense Technology, Changsha Hunan, 410073 China
² Zhengzhou Railway Vocational & Technical College, Zhengzhou Henan, 450052 China
Abstract. XML is an emerging standard for Internet data representation and exchange. There are more and more XML documents without an associated schema on the Web. In this paper, an XML document without an associated schema is called a well-formed XML document. It is difficult to store and query well-formed XML documents. This paper proposes a new storage model for XML documents based on model-mapping, namely XPEV. The distinguishing feature of a model-mapping-based storage model is that no XML schema information is required for storing XML data. Through XPEV's three tables, the Path table, the Edge table, and the Value table, well-formed XML documents can be stored easily in relational databases. The XPEV model can make full use of the index technology of relational databases and gives a better solution for querying and storing well-formed XML documents using relational databases.
* This work was supported by National Science Foundation of China under Grant No. 90412011, the National High-Tech Research and Development Plan of China under Grant No. 2003AA111020, and the National High-Tech Research and Development Plan of China under Grant No. 2004AA112020.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 360–369, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction
As the eXtensible Markup Language (XML) becomes a standard for Internet data representation and exchange, large amounts of XML documents are being generated, and the problem of storing and querying XML documents has gained increasing attention [1],[2],[3]. There are numerous approaches to storing XML documents, such as file systems, object-oriented database systems, native XML database systems, and relational or object-relational database systems. Comparative studies [4],[5] indicate that relational or object-relational database systems are competitive in terms of efficiency with native XML systems, object-oriented database systems, and file systems. Significant benefits include scalability and support for value-based indexes. Considering ease of management and implementation, a relational database management system (RDBMS) is chosen as an efficient way to store XML data.
Effective storage for XML documents must consider both the dynamic nature of XML data and the stringent requirements of XQuery [6]. When XML documents are stored in off-the-shelf database management systems, the problem of designing a storage model for XML data becomes a database schema design problem. In [3], the authors divide such database schemas into two categories: the structure-mapping approach and the model-mapping approach. In the former, the design of the database schema is based on an understanding of the DTD (Document Type Declaration) or XML Schema that describes the structure of the XML documents. In the latter, a fixed database schema is used to store any XML document without the assistance of an XML schema. Obviously, the structure-mapping approach is inappropriate for storing a large number of dynamic (DTDs vary from time to time) and structurally variant XML documents.
In this paper, an XML document is called well-formed if it does not have an associated DTD (or XML Schema). In many applications, the schema of XML documents is unknown. There are more and more well-formed XML documents on the Web, and it is difficult to query these schema-less XML documents. The model-mapping approach gives a solution to this problem.
A new model-mapping-based storage model, namely XPEV, is presented in this paper. XPEV is a multi-XML-document storage model. It uses a three-table schema to store XML data in relational databases: the Path table stores distinct (root-to-any) paths, the Edge table keeps parent-child relationships, and the Value table stores all the values of elements and attributes. Through XPEV, any well-formed XML document can be converted to a relational database schema. Experiments show that XPEV is an efficient storage model for query processing using relational database technology, and it is easy to implement. To the best of our knowledge, XPEV outperforms known model-mapping methods.
The organization of this paper is as follows. Section 2 discusses XML data models. Section 3 introduces the XPEV data storage model. Section 4 introduces two existing model-mapping approaches, XRel [3] and Edge [8]. Section 5 gives the experimental results. Section 6 is the conclusion.
2 Data Model
The World Wide Web Consortium (W3C) has proposed four XML data models: the Infoset model, the XPath data model, the DOM model, and the XQuery 1.0 and XPath 2.0 data model. Each of the models describes an XML document as a tree, but there are differences in the trees. Although these variations may not affect the models' ability to represent enterprise data for most applications, they do affect the ability to provide consistent and uniform management of documents across diverse applications. The details of the four XML data models can be found in [1]. It is important to note that among the four models only the XQuery 1.0 and XPath 2.0 data model acknowledges that the data universe includes more than just a single document. Furthermore, it is also the only model that includes inter-document and intra-document links in a distinct node type (i.e., Reference). We adopt the XQuery 1.0 and XPath 2.0 data model to represent XML documents. This data model represents XML documents as an ordered graph using 7 types of nodes, namely root, element, text, attribute, namespace, processing instruction, and comment. For brevity, we focus here on four types: root, element, text, and attribute. The formal definition of the XML data graph is omitted. The full specification can be found in [6],[7].
[Figure: XML document listing with the text content SIGMOD 2002; Storing and querying XML; Igor; Stratis; Id = A2; SIGMOD 2001; Updating XML; Igor]
Fig. 1. A small XML document

[Figure: data graph of the XML document in Fig. 1]
Fig. 2. A data graph for a small XML document

Fig. 1 is a simple XML document, and Fig. 2 is its data graph, which illustrates the data model used in our research. All round nodes are element nodes. Each node is assigned a unique number in preorder traversal. An element node has an element-type name, which is represented as the edge-label of the incoming edge pointing to the element node itself. The text nodes are underlined. All triangle nodes are attribute nodes. The root node of the XML data graph has an edge, with the edge-label Proceedings, pointing to the element (&1), which is the real root element of the XML document. IDREF attribute nodes (octagons) are used for intra-document or inter-document references. For example, the element node (&5) has an IDREF attribute (@Reference) pointing to the element node (&14), whose ID attribute (@Id) is A2. A label-path in an XML data graph is a /-separated sequence of edge-labels (element-type names), such as /Proceedings/Proceeding/Article/Author in Fig. 2.
3 XPEV Storage Model
XPEV is a new model-mapping-based storage model. It consists of three parts: the storage schema, the index methods, and the query evaluation algorithm.

3.1 Storage Schema
XPEV's storage schema contains three tables: the Path table, the Edge table, and the Value table. The Path table stores the distinct (root-to-any) paths and gives a global view of the XML documents stored in the database management system. It allows navigation through the data with regular path expressions, which can speed up querying. The Edge table keeps parent-child relationships and describes all edges of the XML data graphs. The Value table stores all the values of elements and attributes, which are the leaf nodes of the XML data graph. Table 1 is the Path table for the XML data graph of Fig. 2, Table 2 shows the Edge table, and Table 3 shows the Value table.
The Path table keeps all distinct label-paths of the XML documents. For example, in Fig. 2 the two data-paths &1/&2/&3 and &1/&11/&12 share the same label-path #/Proceedings/Proceeding/Conference, whose Path_id is 3 in Table 1. The symbol #/ denotes the root node.
The Edge table keeps all edges of the XML data graph, which describe the parent-child relationships of the XML data. Each edge is specified by two node identifiers, Source and Target. The Path_id indicates which path the edge belongs to. The Doc_id indicates which XML document the edge belongs to. In this example there is only one XML document, whose Doc_id is X1. The Label attribute keeps the edge-label of an edge. Because the elements of an XML document are ordered, the Order attribute records the order of the Target element node among its siblings that share the same parent. The Flag value indicates whether the target node is an inter-object reference (Ref) or points to a value (Val). For example, the tuple (8, "X1", &5, &9, "Author", 2, Val) describes the edge from the element node (&5) to the element node (&9). The edge has the edge-label "Author" and a value ("Stratis" in the Value table); the order of element node &9 is 2 (the second author of the article "Storing and querying XML"); its path is #/Proceedings/Proceeding/Article/Author (Path_id = 8); and it belongs to the XML document "X1".
Table 1. Path table of Fig. 2

Path_id  Pathexpress
1        #/Proceedings
2        #/Proceedings/Proceeding
3        #/Proceedings/Proceeding/Conference
4        #/Proceedings/Proceeding/Year
5        #/Proceedings/Proceeding/Article
6        #/Proceedings/Proceeding/Article/ID
7        #/Proceedings/Proceeding/Article/Title
8        #/Proceedings/Proceeding/Article/Author
9        #/Proceedings/Proceeding/Article/Reference

Table 2. Edge table of Fig. 2

Path_id  Doc_id  Source  Target  Label       Order  Flag
2        X1      &1      &2      Proceeding  1      Ref
5        X1      &2      &5      Article     1      Ref
2        X1      &1      &11     Proceeding  2      Ref
5        X1      &12     &14     Article     1      Ref
3        X1      &2      &3      Conference  1      Val
4        X1      &2      &4      Year        1      Val
6        X1      &5      &6      ID          1      Val
7        X1      &5      &7      Title       1      Val
8        X1      &5      &8      Author      1      Val
8        X1      &5      &9      Author      2      Val
9        X1      &5      &10     Reference   1      Val
3        X1      &11     &12     Conference  1      Val
4        X1      &11     &13     Year        1      Val
6        X1      &14     &15     ID          1      Val
7        X1      &14     &16     Title       1      Val
8        X1      &14     &17     Author      1      Val
Table 3. Value table of Fig. 2

Path_id  Doc_id  Source  Target  Label       Order  Val
3        X1      &2      &3      Conference  1      SIGMOD
4        X1      &2      &4      Year        1      2002
6        X1      &5      &6      ID          1      A1
7        X1      &5      &7      Title       1      Storing and Querying XML
8        X1      &5      &8      Author      1      Igor
8        X1      &5      &9      Author      2      Stratis
9        X1      &5      &10     Reference   1      A2
3        X1      &11     &12     Conference  1      SIGMOD
4        X1      &11     &13     Year        1      2001
6        X1      &14     &15     ID          1      A2
7        X1      &14     &16     Title       1      Updating XML
8        X1      &14     &17     Author      1      Igor
The Value table keeps the information of all leaf nodes of the XML data graphs. For example, in Table 3 the tuple (8, "X1", &5, &9, "Author", 2, "Stratis") states that the edge labelled "Author" from element node &5 to element node &9 has the value "Stratis", and that "Stratis" is the second author of the article "Storing and Querying XML". Through these three tables, any XML document can be decomposed according to its tree structure and stored in an RDBMS. The model can be supported efficiently by conventional index mechanisms.
3.2 Query Evaluation Algorithm
When XML documents are retrieved from such a database, XML queries (XQuery) must be translated into SQL queries. With the help of the path expressions (Pathexpress in the Path table) and the Path_id columns of the Edge table and the Value table, the search scope of a query can be reduced efficiently. When a simple XML query (a query with no branch) executes, a string-matching algorithm first selects from the Path table the paths (Path_ids) that satisfy the query path, and the results with the same Path_id are then selected from the Value table. When a complex XML query (a query with multiple branches) executes, both the Edge table and the Value table are used to find the final results.
The outline of the query evaluation algorithm is as follows:
Input: an XQuery query expression
Output: the result of the SQL query
Algorithm:
1. Transform the XQuery query expression into SQL format;
2. Select from the Path table all Path_ids whose Pathexpress satisfies the query path;
3. If the query is a single-path query
   {
     Use the selected Path_ids to find all matching tuples in the Value table;
     Output all the matching tuples in SQL format;
     if there are no matching tuples in the Value table
     {
       Use the selected Path_ids to find all matching tuples in the Edge table;
       Output all the matching tuples in SQL format;
     }
   }
4. else // the query is a complex query with multiple branches
   {
     Set n = the number of Path_ids selected in step 2;
     Use the selected Path_id whose query path is the longest to find all matching tuples in the Value table or the Edge table;
     Store the matching tuples in temp table T1;
     while n-1 > 0 do
     {
       Use one of the unused selected Path_ids to find all matching tuples in the Edge table;
       Store the matching tuples in temp table T2;
       for each tuple of T2, if its Source value equals an ancestor of the Source node of some T1 tuple and the two tuples have the same Doc_id, join the two tuples and store the new tuple in T3;
       Set T1 = T3;
       Set T2 to empty;
       Set T3 to empty;
       n = n-1;
     }
     Output all the tuples in T1 in SQL format;
   }
From the above description of the query evaluation algorithm, it follows that the algorithm finishes in time O(M+N), where M is the size of the query and N is the size of the XML source data.
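The single-path branch of the algorithm (steps 2 and 3) can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch, not the authors' implementation: the table names PathTable and ValueTable, the column name Ord (ORDER is a reserved word in SQL), the helper names, and the use of a SQL LIKE pattern in place of the paper's string-matching step are all assumptions. The rows are copied from Tables 1 and 3, and the fallback to the Edge table is omitted.

```python
import sqlite3

# Rows copied from Table 1 (Path table) and Table 3 (Value table).
PATH_ROWS = [
    (1, "#/Proceedings"),
    (2, "#/Proceedings/Proceeding"),
    (3, "#/Proceedings/Proceeding/Conference"),
    (4, "#/Proceedings/Proceeding/Year"),
    (5, "#/Proceedings/Proceeding/Article"),
    (6, "#/Proceedings/Proceeding/Article/ID"),
    (7, "#/Proceedings/Proceeding/Article/Title"),
    (8, "#/Proceedings/Proceeding/Article/Author"),
    (9, "#/Proceedings/Proceeding/Article/Reference"),
]
VALUE_ROWS = [
    (3, "X1", "&2", "&3", "Conference", 1, "SIGDOM"),
    (4, "X1", "&2", "&4", "Year", 1, "2002"),
    (6, "X1", "&5", "&6", "ID", 1, "A1"),
    (7, "X1", "&5", "&7", "Title", 1, "Storing and Querying XML"),
    (8, "X1", "&5", "&8", "Author", 1, "Igor"),
    (8, "X1", "&5", "&9", "Author", 2, "Stratis"),
    (9, "X1", "&5", "&10", "Reference", 1, "A2"),
    (3, "X1", "&11", "&12", "Conference", 1, "SIGDOM"),
    (4, "X1", "&11", "&13", "Year", 1, "2001"),
    (6, "X1", "&14", "&15", "ID", 1, "A2"),
    (7, "X1", "&14", "&16", "Title", 1, "Updating XML"),
    (8, "X1", "&14", "&17", "Author", 1, "Igor"),
]


def build_demo_db():
    """Create an in-memory database holding the example Path and Value tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PathTable (Path_id INTEGER PRIMARY KEY, Pathexpress TEXT);
        CREATE TABLE ValueTable (Path_id INTEGER, Doc_id TEXT, Source TEXT,
                                 Target TEXT, Label TEXT, Ord INTEGER, Val TEXT);
        """
    )
    conn.executemany("INSERT INTO PathTable VALUES (?, ?)", PATH_ROWS)
    conn.executemany("INSERT INTO ValueTable VALUES (?, ?, ?, ?, ?, ?, ?)", VALUE_ROWS)
    return conn


def simple_path_query(conn, path_pattern):
    """Match the pattern against Pathexpress, then fetch values by Path_id.

    '%' in the pattern stands for any sequence of labels (an assumed
    stand-in for a descendant step in the query path).
    """
    sql = """
        SELECT v.Doc_id, v.Target, v.Val
        FROM PathTable AS p
        JOIN ValueTable AS v ON v.Path_id = p.Path_id
        WHERE p.Pathexpress LIKE ?
        ORDER BY v.Doc_id, CAST(substr(v.Target, 2) AS INTEGER)
    """
    return conn.execute(sql, (path_pattern,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in simple_path_query(db, "#/Proceedings/%/Author"):
        print(row)
```

Because the label-path is matched once against the small Path table, the Value table is only touched through an equality join on Path_id, which is the point the section makes about reducing the search scope.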
4 Related Work
Edge [8] and XRel [3] are two representatives of the model-mapping approaches. The Edge approach stores XML data graphs in a single table named Edge, defined as follows: Edge(Source, Target, Order, Label, Flag, Value). Because it keeps only edge-labels rather than label-paths, it must concatenate edges to form a path when processing user queries, and a large number of joins is needed to check edge connections: the number of join operations grows in proportion to the length of the path expression.
XRel is a path-based storage model. The basic XRel schema consists of the following four relational schemas:
Attribute(docID, pathID, start, end, value)
Text(docID, pathID, start, end, value)
Path(pathID, pathexp)
Element(docID, pathID, start, end, index, reindex)
The Attribute table stores the values of attributes of XML elements. The Text table stores the values of XML elements. The Path table stores distinct (root-to-any) paths, just like XPEV's Path table. The Element table associates each path with a region (start, end). No edge information is explicitly maintained in this schema. Instead, it maintains a containment relationship using a concept called a region, which is the pair of start and end points of a node. A node ni is reachable from another node nj if the region of ni is included in the region of nj. The containment relationship is not only for parent-child relationships,
but also for ancestor-descendant relationships. The attributes index and reindex in the relation Element represent the occurrence order of an element node among its sibling element nodes in document order and reverse document order, respectively. The advantage of this approach is that it can easily identify whether a node is on a path from another node by using θ-joins (to test the containment relationship using >, <) [...]

[...]
While (error > ε)
  While (input file is not empty)
    Read one datum x
    [Update GBFCM(DM) center mean]
    $v^{\mu}(n+1) = v^{\mu}(n) - \eta \mu^{2} \left( v^{\mu}(n) - x^{\mu} \right)$
    [Update GBFCM(DM) membership grade]
    $\mu_i(x) = \dfrac{1}{\sum_{j=1}^{c} \left( \frac{D(x, v_i)}{D(x, v_j)} \right)^{2}}$
    e := $v^{\mu}(n+1) - v^{\mu}(n)$
  End while
  $v^{\sigma^2}_i(n+1) = \dfrac{\sum_{k=1}^{N_i} \left( x^{\sigma^2}_{k,i}(n) + \left( x^{\mu}_{k,i}(n) - v^{\mu}_i(n) \right)^{2} \right)}{N_i}$
  error := e
End while
Output $\mu_i$, $v^{\mu}$ and $v^{\sigma^2}$
End main()
End

5 Experiments and Results
The MPEG video traffic data sets considered in this paper are from the following internet site: http://www3.informatik.uni-wuerzburg.de/MPEG/ The data sets were prepared by O. Rose [7] and are compressed with MPEG-1, where the GoP size is 12 and the frame sequence is IBBPBBPBBPBB. Ten video streams are used for the experiments. Each video stream has 40,000 frames at 25 frames/sec. Table 2 lists the subjects of the 10 video streams. The video data prepared by Rose [7] have been analyzed by Manzoni et al. [10], who approximated the statistical features of the frame and GoP sizes with Gamma or Lognormal distributions. Later, Krunz et al. found that the Lognormal representation gives the best match for I/P/B frames [13]. In this paper, the Lognormal representation of MPEG VBR video data is also employed for our experiments. To apply the divergence measure, the mean and variance of the data of each of the 12 frames in the GoP are obtained, and each GoP is expressed as 12-dimensional data
Classification of MPEG VBR Video Data Using Gradient-Based FCM
Table 2. MPEG VBR Video used for experiments

MOVIE                               SPORTS
"Jurassic Park" (Dino)              ATP tennis final (Atp)
"The silence of the lambs" (Lambs)  Formula 1: GP Hockenheim 1994 (Race)
"Star Wars" (Star)                  Super Bowl 1995: Chargers-49ers (Sbowl)
"Terminator 2" (Term)               Two 1993 World Cup matches (Soc1)
"a 1994 movie preview" (Movie2)     Two 1993 World Cup matches (Soc2)
Table 3. Classification results of Experiment 1 in False Alarm Rate (%)

# of Clusters  3     4     5     6     7
FCM            31.1  28.6  26.8  30.1  27.8
GBFCM          28.6  26.7  26.8  24.8  25.6
GBFCM(DM)      13.2  12.9  12.7  14.2  12.8
Table 4. Classification results of Experiment 2 in False Alarm Rate (%)

# of Clusters  3     4     5     6     7
FCM            17.4  21.2  20.5  18.8  18.4
GBFCM          15.8  16.5  15.2  22.1  17.6
GBFCM(DM)      10.4  10.1  9.9   9.9   9.9
where each dimension consists of the mean and the variance values of each frame in the GoP. In our experiments, part of the available MPEG VBR video data is used to train the algorithms and the rest is used to evaluate the trained algorithms. Two experiments are performed, which differ in how the training data are chosen:
– Experiment 1: Dino from the movie data and ATP from the sports data
– Experiment 2: a randomly chosen 10% of the data from each class
Since the Dino and ATP data have been thoroughly analyzed and successfully modeled with the GPDF, each of these streams is selected as the training data set representing its class. Note that each MPEG VBR data set consists of 3,333 GPDF data. In both experiments, the test data sets are the rest of the available MPEG VBR data. Note that the training and test data sets in Experiment 2 are selected in 20 different ways to obtain unbiased results. The False Alarm Rate (FAR) is used as the performance measure [5]:
$$\mathrm{FAR} = \frac{\text{\# of misclassification cases}}{\text{total \# of cases}}$$
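As a small illustration of the FAR definition above, the following sketch counts misclassified cases and reports the rate as a percentage, matching the unit used in Tables 3 and 4. The function name and the label strings are hypothetical and do not come from the paper.

```python
def false_alarm_rate(predicted, actual):
    """Return the False Alarm Rate in percent: misclassified cases / total cases."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    # A case is misclassified when the predicted class differs from the true class.
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


if __name__ == "__main__":
    predicted = ["movie", "sports", "movie", "movie"]   # hypothetical classifier output
    actual = ["movie", "sports", "sports", "movie"]     # hypothetical ground truth
    print(false_alarm_rate(predicted, actual))
```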
D.-C. Park
For performance evaluation, the FCM and GBFCM are compared with the proposed GBFCM(DM). Note, however, that the FCM and GBFCM use only the mean of each frame's data, while the GBFCM(DM) uses both the mean and the variance. In the experiments, each class (movie and sports) is assigned several clusters because one cluster per class was found to be insufficient. The number of clusters for each class was increased up to 7. Classification results for the experiments are given in Table 3 and Table 4. As can be seen from Table 3 and Table 4, the proposed GBFCM(DM) outperforms both the FCM and GBFCM. This is a somewhat expected result because the FCM and GBFCM do not utilize the variance information of the frame data while the GBFCM(DM) does. The divergence measure used in GBFCM(DM) plays a very important role in the modeling and classification of MPEG VBR data streams. This result implies that the divergence measure makes it possible to utilize the results obtained by researchers including Rose, Manzoni et al., and Krunz et al. [7,10,13].
6 Conclusions
An efficient clustering algorithm, called the Gradient-Based Fuzzy C-Means algorithm with Divergence Measurement (GBFCM(DM)) for the GPDF is proposed in this paper. The proposed GBFCM(DM) employs the divergence measure as its distance measurement and utilizes the spatial characteristics of MPEG VBR video data for MPEG data classification problems. When compared with conventional clustering and classification algorithms such as the FCM and GBFCM, the proposed GBFCM(DM) successfully finds clusters and classifies the MPEG VBR data modelled by the 12-dimensional GPDFs. The results show that the GBFCM(DM) gives 5-15% improvement in False Alarm Rate over the FCM and GBFCM whose FARs are in the range between 15.2% and 31.1% according to the number of clusters used. This result implies that the divergence measure used in the proposed GBFCM(DM) plays a very important role for modeling and classification of MPEG VBR video data streams. Furthermore, the GBFCM(DM) can be used as a useful tool for clustering GPDF data.
Acknowledgement This research was supported by the Korea Research Foundation (Grant # R052003-000-10992-0(2004)).
References
1. Pacifici, G., Karlsson, G., Garrett, M., Ohta, N.: Guest editorial real-time video services in multimedia networks, IEEE J. Select. Areas Commun. 15 (1997) 961-964
2. Tsang, D., Bensaou, B., Lam, S.: Fuzzy-based rate control for real-time MPEG video, IEEE Trans. Fuzzy Syst. 6 (1998) 504-516
3. Tan, Y.P., Yap, K.H., Wang, L. (eds.): Intelligent Multimedia Processing with Soft Computing, Series: Studies in Fuzziness and Soft Computing, Vol. 168. Springer, Berlin Heidelberg New York (2005)
4. Dimitrova, N., Golshani, F.: Motion recovery for video content classification, ACM Trans. Inform. Syst. 13 (1995) 408-439
5. Liang, Q., Mendel, J.M.: MPEG VBR Video Traffic Modeling and Classification Using Fuzzy Technique, IEEE Trans. Fuzzy Systems 9 (2001) 183-193
6. Patel, N., Sethi, I.K.: Video shot detection and characterization for video databases, Pattern Recog. 30 (1997) 583-592
7. Rose, O.: Statistical properties of MPEG video traffic and their impact on traffic modeling in ATM systems, Univ. Wurzburg, Inst. Comput. Sci. Rep. 101 (1995)
8. Park, D.C., Dagher, I.: Gradient Based Fuzzy c-means (GBFCM) Algorithm, IEEE Int. Conf. on Neural Networks, ICNN-94 3 (1994) 1626-1631
9. Looney, C.: Pattern Recognition Using Neural Networks, New York, Oxford University Press (1997) 252-254
10. Manzoni, P., Cremonesi, P., Serazzi, G.: Workload models of VBR video traffic and their use in resource allocation policies, IEEE Trans. Networking 7 (1999) 387-397
11. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms, New York: Plenum (1981)
12. Bezdek, J.C.: A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE Trans. Pattern Anal. Mach. Int. 2 (1980) 1-8, 24 (1975) 835-838
13. Krunz, M., Sass, R., Hughes, H.: Statistical characteristics and multiplexing of MPEG streams, Proc. IEEE Int. Conf. Comput. Commun., INFOCOM'95, Boston, MA 2 (1995) 445-462
14. Kohonen, T.: Learning Vector Quantization, Helsinki University of Technology, Laboratory of Computer and Information Science, Report TKK-F-A-601 (1986)
15. Dunn, J.C.: A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters, J. Cybern. 3 (1973) 32-75
16. Windham, M.P.: Cluster Validity for the Fuzzy c-means clustering algorithm, IEEE Trans. Pattern Anal. Mach. Int. 4 (1982) 357-363
17. Gokcay, E., Principe, J.C.: Information Theoretic Clustering, IEEE Trans. Pattern Anal. Mach. Int. 24 (2002) 158-171
18. Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press Inc., 2nd edition (1990)
Fuzzy-C-Mean Determines the Principle Component Pairs to Estimate the Degree of Emotion from Facial Expressions
M. Ashraful Amin1, Nitin V. Afzulpurkar2, Matthew N. Dailey3, Vatcharaporn Esichaikul1, and Dentcho N. Batanov1
1 Department of CS & IM, 2 Department of MT & ME, Asian Institute of Technology, Thailand
[email protected], [email protected] {vatchara, batanov}@cs.ait.ac.th
3 Sirindhorn International Institute of Technology, Thammasat University, Thailand
[email protected]
Abstract. Although many systems exist for automatic classification of faces according to their emotional expression, these systems do not explicitly estimate the strength of given expressions. This paper describes and empirically evaluates an algorithm capable of estimating the degree to which a face expresses a given emotion. The system first aligns and normalizes an input face image, then applies a filter bank of Gabor wavelets and reduces the data's dimensionality via principal components analysis. Finally, an unsupervised Fuzzy-C-Mean clustering algorithm is applied repeatedly to the same set of data to find the best pair of principal components, judged by how closely the cluster centers align on a straight line. The cluster memberships are then mapped to degrees of a facial expression (i.e., less happy, moderately happy, and very happy). In a test on 54 previously unseen happy faces, we find an orderly mapping of faces to clusters as the subject's face moves from a neutral to a very happy emotional display. Similar results are observed on 78 previously unseen surprised faces.
1 Introduction
A significant amount of research work on facial expression recognition has been performed by researchers from multiple disciplines [1], [2], [3]. In this research, we build on existing systems by applying a fuzzy clustering technique to not only determine the category of a facial expression, but to estimate its strength or degree. The clustering is also used to choose the best description of faces in a reduced dimension. Very few researchers have considered the problem of estimating the degree or intensity of facial expressions. Kimura and Yachida [4] used the concept of a potential network on normalized facial images to recognize and estimate facial expression and its degree respectively. Pantic and Rothkrantz [5] used the famous Ekman [1] defined FACS (Facial Action Coding System) to determine facial expression and its intensity.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 484 – 493, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 The Facial Expression Degree Estimation System
Our implementation of the facial expression recognition & degree estimation system involves four major steps (Fig. 1).
Fig. 1. The facial expression degree estimation system
2.1 Facial Data Acquisition
The training and testing data for our experimental facial expression recognition & degree estimation system are collected from the Cohn-Kanade AU-Coded Facial Expression Database [6].
2.2 Data Preprocessing
Two main issues in image processing affect the recognition results: the brightness distribution of the facial images, and the facial geometric correspondence needed to keep face size constant across subjects. To satisfy these criteria in the facial expression images, an affine transformation (rotation, scaling and translation) is used to normalize the geometric position of the face, to keep the face magnification invariant, and to ensure that the gray values of each face have close geometric correspondence [7].
Fig. 2. Normalization of Face image using the affine-transformation
2.3 Facial Data Extraction
The Gabor wavelets, whose kernels are similar to the 2D receptive field profiles of the mammalian cortical simple cells, exhibit desirable characteristics of spatial locality and orientation selectivity and are optimally localized in the space and frequency domains [8]. The Gabor wavelet (kernel, filter) can be defined as follows:
$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^{2}}{\sigma^{2}} \exp\left(-\frac{\|k_{\mu,\nu}\|^{2}\,\|z\|^{2}}{2\sigma^{2}}\right) \left[ e^{i k_{\mu,\nu} z} - e^{-\sigma^{2}/2} \right] \quad (1)$$
where $\mu$ and $\nu$ define the orientation and scale of the Gabor kernel, $z = (x, y)$, $\|\cdot\|$ denotes the Euclidean norm operator, and the wave vector $k_{\mu,\nu}$ is defined as follows:
$$k_{\mu,\nu} = k_{\nu} e^{i\phi_{\mu}} \quad (2)$$
where $k_{\nu} = k_{\max}/f^{\nu}$ (here $\nu \in \{0, 1, \ldots, 4\}$) and $\phi_{\mu} = \pi\mu/8$ (here $\mu \in \{0, 1, 2, \ldots, 7\}$); $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain [9]. We employ a lattice of phase-invariant filters at five scales, ranging between 16 and 96 pixels in width, and eight orientations, 0 to $7\pi/8$.
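The kernel of Eqs. (1) and (2) can be evaluated point by point with the Python standard library. The sketch below is an illustration rather than the authors' code: the function names are hypothetical, the product of the wave vector and z is taken as the dot product of the two 2-D vectors, and the default values of sigma, k_max and f are assumptions that simply follow the parameter values quoted in the text.

```python
import cmath
import math


def gabor_value(x, y, mu, nu, sigma=2 * math.pi, k_max=math.pi / 2, f=2.0):
    """Value of the Gabor kernel of Eq. (1) at pixel offset z = (x, y)."""
    # Eq. (2): wave vector as a complex number, scale k_max / f**nu, angle pi*mu/8.
    k = (k_max / f ** nu) * cmath.exp(1j * math.pi * mu / 8)
    k_sq = abs(k) ** 2
    z_sq = x * x + y * y
    # Gaussian envelope of Eq. (1).
    envelope = (k_sq / sigma ** 2) * math.exp(-k_sq * z_sq / (2 * sigma ** 2))
    # Oscillating carrier minus the DC-compensation term exp(-sigma^2 / 2).
    carrier = cmath.exp(1j * (k.real * x + k.imag * y)) - math.exp(-sigma ** 2 / 2)
    return envelope * carrier


def gabor_magnitude_grid(half_width, mu, nu):
    """Magnitudes of the kernel on a (2*half_width + 1)^2 pixel grid."""
    coords = range(-half_width, half_width + 1)
    return [[abs(gabor_value(x, y, mu, nu)) for x in coords] for y in coords]


if __name__ == "__main__":
    grid = gabor_magnitude_grid(half_width=2, mu=0, nu=0)
    for row in grid:
        print(" ".join(f"{v:.4f}" for v in row))
```

Taking the magnitude of the complex response, as in gabor_magnitude_grid, is one common way to obtain a phase-invariant filter output of the kind the text mentions.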
Fig. 3. The magnitude of Gabor kernels at five different scales
Fig. 4. Gabor magnitude representation of the face from Figure-2
The parameter values are σ = 2π, k_max = π/2, and f = 2.
Principal Component Analysis (PCA): Even with sub-sampling, the dimensionality of our Gabor feature vector is too large for direct classification. Principal components analysis is a simple statistical method that reduces dimensionality while minimizing the mean squared reconstruction error [11].
Fig. 5. An ideal membership function for the degree of an emotion
2.4 Clustering and Classification
Fuzzy C-Means (FCM) [10] is one of the most commonly used fuzzy clustering techniques for degree estimation problems. Its strength over the well-known k-Means algorithm [11] is that, given an input point, it yields the point's membership value in each of the classes. In one dimension, we would expect the technique to yield a membership function as shown in Fig. 5. The aim of FCM is to find cluster centers (centroids) that minimize a dissimilarity function. The membership matrix $U$ is randomly initialized subject to:
$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \ldots, n \quad (3)$$
The dissimilarity function used in FCM is:
$$J(U, c_1, c_2, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \quad (4)$$
Here, $u_{ij} \in [0, 1]$, $c_i$ is the centroid of the $i$-th cluster, $d_{ij}$ is the Euclidean distance between the $i$-th centroid and the $j$-th data point, and $m \in [1, \infty)$ is a weighting exponent. A minimum of the dissimilarity function must satisfy two conditions, given in Equations (5) and (6):
$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \quad (5)$$
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)}} \quad (6)$$
Pseudo code for Fuzzy-C-Means follows:
I. Randomly initialize the membership matrix (U) subject to the constraint in Equation (3).
II. Calculate the centroids ( ci ) using Equation (5).
III. Compute the dissimilarity between centroids and data points using Equation (4). Stop if its improvement over the previous iteration is below a threshold.
IV. Compute a new U using Equation (6). Go to Step II.
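Steps I to IV can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the default values (m = 2, the tolerance, the iteration cap, the random seed) and the handling of a data point that coincides with a centroid are all choices made for the example. It requires m > 1, since Equation (6) is undefined at m = 1, and Python 3.8 or later for math.dist.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-9, max_iter=200, seed=0):
    """Cluster `points` (a list of equal-length tuples) into c fuzzy clusters.

    Returns (centers, u), where u[i][j] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step I: random memberships, each column normalised to sum to 1 (Eq. 3).
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col_sum = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col_sum
    centers, prev_cost = [], math.inf
    for _ in range(max_iter):
        # Step II: centroids as membership-weighted means (Eq. 5).
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[j] * points[j][d] for j in range(n)) / total for d in range(dim)]
            )
        # Step III: dissimilarity J (Eq. 4); stop when it no longer improves.
        dist = [[math.dist(centers[i], points[j]) for j in range(n)] for i in range(c)]
        cost = sum(u[i][j] ** m * dist[i][j] ** 2 for i in range(c) for j in range(n))
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
        # Step IV: new memberships (Eq. 6).
        for j in range(n):
            on_center = [i for i in range(c) if dist[i][j] == 0.0]
            for i in range(c):
                if on_center:
                    # Assumed convention: a point lying exactly on one or more
                    # centroids is shared equally among those centroids.
                    u[i][j] = 1.0 / len(on_center) if i in on_center else 0.0
                else:
                    u[i][j] = 1.0 / sum(
                        (dist[i][j] / dist[k][j]) ** (2.0 / (m - 1.0)) for k in range(c)
                    )
    return centers, u


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy data
    centers, u = fuzzy_c_means(data, c=2)
    print(centers)
```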
3 Results and Observations
3.1 Neutral-Happy Faces
We applied the above-mentioned steps to 945 facial images, classified as neutral-happy sequences portrayed by 50 actors. Initially we divided the data into two random groups for the PCA stage: 200 faces were used to compute the covariance matrix, and the remaining 745 faces were projected onto this covariance matrix's principal components. Out of these 745 faces we held out 4 randomly selected subjects (54 faces) to test our clustering approach. We used FCM to cluster the remaining 691 faces into 3 fuzzy clusters. Fig. 6 presents the scatter plots of the first three clusterings, where the cluster centers are the black dots. In the clustering process we used different combinations of principal components. We considered the principal components in groups of two in sequence: (1st, 2nd), (2nd, 3rd), (3rd, 4th), (4th, 5th) and so on, until we obtained a satisfactory result. Section 3.3 describes what a satisfactory result means.
Fig. 6. Plot for principal component pairs (1st, 2nd), (2nd, 3rd), (3rd, 4th) of Neutral-Happy faces
Fig. 7. Neutral-Happy photo sequence of Test Example-1; plot of membership values (panel: Test Subject-1; series: Less Happy, Medium Happy, Very Happy; x-axis: image index 1 to 13; y-axis: membership value 0 to 1)
Fig. 8. Neutral-Happy photo sequence of Test Example-2; plot of membership values (panel: Test Subject-2; series: Less Happy, Medium Happy, Very Happy; x-axis: image index 1 to 14; y-axis: membership value)
The Neutral-Happy facial sequences of two individual subjects are presented in Fig. 7 and Fig. 8, along with the membership values of each facial image projected using the (3rd, 4th) principal component pair. Two points are worth noting. First, the membership functions resemble the ideal trapezoidal fuzzy membership function given in Fig. 5. Second, when we use the winner-take-all criterion to assign the absolute membership, the first person moves slowly to the maximum intensity, whereas the second individual remains longer at the maximum intensity (Table 1). Similar results are presented in Fig. 9 (due to space constraints the image sequences are omitted). The winner-take-all strategy is applied to the fuzzy membership values of all 4 subjects for the (3rd, 4th) principal components, and the result is given in Table 1. If a face is classified as less happy after a face in the sequence has been classified as medium happy, it is considered an uneven class assignment. The results follow the desired ordering for a fuzzy clustering in the fuzzy membership representation, and this is also reflected in the absolute class assignment. The test subjects are shown as trajectories plotted against the cluster centers of the 3rd and 4th principal components in Fig. 10. Note that each individual's faces move from the vicinity of one cluster center toward the other cluster centers. This again supports the view that the system captures the fuzzy characteristic embedded in the degree estimation of facial expression.
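The winner-take-all assignment and the check for an uneven class assignment can be sketched as follows. The function names and the membership values are hypothetical, and the check generalizes the rule in the text (less happy appearing after medium happy) to any decrease of the class index along a sequence, which is an assumption.

```python
def winner_take_all(memberships):
    """Map each image's fuzzy memberships to the index of its largest value.

    memberships: one list per image, ordered [less, medium, very].
    """
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def is_even_assignment(labels):
    """True if the class index never decreases along the image sequence."""
    return all(a <= b for a, b in zip(labels, labels[1:]))


if __name__ == "__main__":
    # Hypothetical memberships for a four-image neutral-to-happy sequence.
    sequence = [
        [0.80, 0.15, 0.05],
        [0.60, 0.30, 0.10],
        [0.30, 0.50, 0.20],
        [0.10, 0.30, 0.60],
    ]
    labels = winner_take_all(sequence)
    print(labels, is_even_assignment(labels))
```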
Fig. 9. Plot of Neutral-Happy sequence of Test Example-3 (left) & Example-4 (right) membership values (panels: Test Subject-3, image index 1 to 15; Test Subject-4, image index 1 to 12; series: Less Happy, Medium Happy, Very Happy; y-axis: membership value)
Table 1. Absolute membership assigned to each image for the Neutral-Happy sequence

Subject         Color in Fig. 10  Less Happy (LH)  Medium Happy (MH)  Very Happy (VH)
Test Subject-1  Green             1-6              7-9                10-13
Test Subject-2  Blue              1-3              4-6                7-14
Test Subject-3  Red               1-8              9-10               11-15
Test Subject-4  Black             1-5              6-9                10-12
Fig. 10. Neutral-Happy (3rd, 4th principal component) trajectory of the 4 test individuals
3.2 Neutral-Surprise Faces
We applied similar steps to 1173 facial images, classified as Neutral-Surprise sequences portrayed by 63 actors. This time 209 faces were kept for PCA. The remaining 964 faces were projected onto this covariance matrix's principal components. Out of these 964 faces we held out 4 randomly selected subjects (72 faces) to test our clustering approach. The cluster centers were calculated using the other 892 faces (Fig. 11). The best clustering is achieved for the 2nd and 3rd principal components. In Fig. 12 and Fig. 13, notice that the membership curves are X-shaped rather than trapezoidal. They still follow the concept of fuzzy class assignment, however, as only two consecutive classes are present for one individual.
Fig. 11. Plot for (1st, 2nd), (2nd, 3rd), (3rd, 4th) principal components of Neutral-Surprise faces
Fig. 12. Neutral-Surprise photo sequence of Test Example-1; plot of membership values (panel: Test Subject-1; series: Less Surprised, Medium Surprised, Very Surprised; x-axis: image index 1 to 16; y-axis: membership value)
Fig. 13. Neutral-Surprised photo sequence of Test Example-2; plot of membership values (panel: Test Subject-2; series: Less Surprised, Medium Surprised, Very Surprised; x-axis: image index; y-axis: membership value)
Fig. 14. Plot of Neutral-Surprise sequence of Test Example-3 (left) & Example-4 (right) membership values (panels: Test Subject-3, Test Subject-4; series: Less Surprised, Medium Surprised, Very Surprised; x-axis: image index; y-axis: membership value)
Table 2. Absolute membership assigned to each image for the Neutral-Surprised sequence

Subject         Color in Fig. 15  Less Surprise (LS)  Medium Surprise (MS)  Very Surprise (VS)
Test Subject-1  Green             1-6                 7-10                  11-16
Test Subject-2  Blue              1-17                18-30                 -----
Test Subject-3  Red               -----               1-17                  18-23
Test Subject-4  Black             1-3                 4-6                   7-13
Notice that in Table 2 the Very Surprised category is empty for subject 2, and the Less Surprised category is empty for subject 3, which is also evident from the membership curves. Moreover, the result is seen more clearly in Fig. 15; note that two subjects end their trajectories substantially earlier.
Fig. 15. Neutral-Surprise (2nd, 3rd principal component) trajectory of the 4 test individuals
3.3 Best Cluster Criterion
The above examples and evidence indicate that our proposed system works satisfactorily. An interesting open point is how to find the principal components that best capture the fuzziness of the data. As we have seen, the best principal component pair differs between emotions (Happy: 3rd and 4th; Surprised: 2nd and 3rd). We had to check as many of the combinations as possible. The best pair of principal components is chosen according to a minimum distance criterion (MDC): for every candidate pair of principal components, we cluster the data using the same initial condition and record the distance of the middle cluster center from the midpoint of the line connecting the other two centers. This distance is given by the following equation:
$$D = \sqrt{\left( x_2 - \frac{x_1 + x_3}{2} \right)^{2} + \left( y_2 - \frac{y_1 + y_3}{2} \right)^{2}} \quad (7)$$
where $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$ are the cluster centers and $D$ is the distance of the middle cluster center $(x_2, y_2)$ from the midpoint of the line connecting the other two.
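The minimum distance criterion of Eq. (7) can be sketched as a small selection routine. The function names, the dictionary layout, and the cluster-center coordinates below are hypothetical and serve only to illustrate the selection; Eq. (7) is read here as a Euclidean distance.

```python
import math


def mdc_distance(c1, c2, c3):
    """Eq. (7): distance of the middle center c2 from the midpoint of c1 and c3."""
    mid_x = (c1[0] + c3[0]) / 2
    mid_y = (c1[1] + c3[1]) / 2
    return math.hypot(c2[0] - mid_x, c2[1] - mid_y)


def best_component_pair(centers_by_pair):
    """Pick the principal component pair whose cluster centers give the smallest D.

    centers_by_pair maps a pair such as (3, 4) to its three cluster centers,
    ordered (low, middle, high).
    """
    return min(centers_by_pair, key=lambda pair: mdc_distance(*centers_by_pair[pair]))


if __name__ == "__main__":
    # Hypothetical cluster centers for two candidate pairs.
    candidates = {
        (1, 2): [(0.0, 0.0), (1.0, 4.0), (2.0, 0.0)],
        (3, 4): [(0.0, 0.0), (1.0, 1.5), (2.0, 2.0)],
    }
    print(best_component_pair(candidates))
```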
4 Conclusion and Future Work
Here we have shown that fuzzy clustering is a promising approach to estimating the degree of intensity of a facial expression when a face is characterized with Gabor kernels and projected into a low-dimensional space using PCA. The best results are achieved when the (3rd, 4th) and (2nd, 3rd) principal components are used to describe the Neutral-Happy and Neutral-Surprise faces, respectively. The most suitable principal component pair is selected using the MDC (minimum distance criterion). Satisfactory results were observed in the experiments. We are currently experimenting with other prototypic emotions using the same approach, and in future we will extend it to more sophisticated facial expressions that are known to be hard to characterize computationally.
References
1. Ekman, P. and Friesen, W.: The Facial Action Coding System. Consulting Psychologists Press, San Francisco, USA, 1978.
2. Pantic, M. and Rothkrantz, L. J. M.: Automatic Analysis of Facial Expression: the State of Art, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, 2000, 1424-1445.
3. Cowie, R., Douglas, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W. and Taylor, J. G.: Emotion Recognition in Human-Computer Interaction, IEEE Signal Processing Magazine, no. 1, 2001, 32-80.
4. Kimura, S. and Yachida, M.: Facial Expression Recognition and Its Degree Estimation, Proc. Computer Vision and Pattern Recognition, 1997, 295-300.
5. Pantic, M. and Rothkrantz, L. J. M.: An Expert System for Recognition of Facial Actions and their Intensity, Proc. 12th International Conference on Innovative Applications of Artificial Intelligence, 2000, 1026-1033.
6. http://vasc.ri.cmu.edu/idb/html/face/facial_expression/, 15 May 2004.
7. Jain, A. K.: Fundamentals of Digital Image Processing, Prentice-Hall of India Private Limited, New Delhi, India, 2003.
8. Daugman, J. G.: Complete Discrete 2-D Gabor Transform by Neural Networks for Image Analysis and Compression, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 36, no. 7, 1988, 1169-1179.
9. Lee, T. S.: Image Representation Using 2D Gabor Wavelets, IEEE Trans. PAMI, Vol. 18, no. 10, 1996, 959-971.
10. Höppner, F., Klawonn, F., Kruse, R. and Runkler, T.: Fuzzy Cluster Analysis, Wiley, 1999.
11. Dubes, R. C.: Cluster analysis and related issues, Handbook of Pattern Recognition & Computer Vision, Chen, C. H., Pau, L. F., and Wang, P. S. P. (Eds.): World Scientific Publishing Co., Inc., River Edge, NJ, 3–32.
An Improved Clustering Algorithm for Information Granulation
Qinghua Hu and Daren Yu
Harbin Institute of Technology, Harbin, China, 150001
[email protected]
Abstract. C-means clustering is a popular technique for classifying unlabeled data into different categories. Hard c-means (HCM), fuzzy c-means (FCM) and rough c-means (RCM) have been proposed for various applications. In this paper a fuzzy rough c-means algorithm (FRCM) is presented, which integrates the advantages of fuzzy set theory and rough set theory. Each cluster is represented by a center, a crisp lower approximation and a fuzzy boundary. The area of a lower approximation is controlled by a threshold T, which also influences the fuzziness of the final partition. The analysis shows that the proposed FRCM achieves a trade-off between convergence and speed relative to HCM and FCM. FRCM degrades to HCM or FCM as the parameter T is changed. One of the advantages of the proposed algorithm is that the memberships of the clustering results coincide with human perception, which gives the method potential applications in understandable fuzzy information granulation.
1 Introduction
In essence, an information granule is a cluster of physical or mental objects drawn together by indistinguishability, similarity, proximity or functionality (Zadeh, 1997). Information granulation, constraint representation and constraint propagation are the main aspects of computing with words. Information granulation, the point of departure of computing with words, plays an important role in these techniques and in the computational theory of perceptions (Zadeh, 1999). Generally speaking, there are two kinds of information granules: crisp and fuzzy. Although crisp granules are the foundation of a variety of techniques, such as interval analysis, rough set theory and D-S theory, they fail to reflect the fact that in most perceptual information and human reasoning the granules are fuzzy rather than crisp. In human cognition, the fuzziness of granules is a direct consequence of the fuzziness of the perceptions of similarity. It is entailed by the finite capability of the human mind and sensory organs to resolve detail and store information. Nevertheless, fuzzy information granulation underlies the remarkable human ability to make rational decisions under conditions of imprecision, partial knowledge, partial certainty and partial truth. Fuzzy information granulation plays an important role in fuzzy logic, computing with words and the computational theory of perceptions.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 494–504, 2005.
© Springer-Verlag Berlin Heidelberg 2005
An Improved Clustering Algorithm for Information Granulation
Partitioning a given set of unlabeled data into granules is one of the most fundamental problems in pattern recognition and data mining. C-means clustering algorithms are a family of popular techniques for finding structure in unlabeled sample sets. A classical clustering algorithm introduced three decades ago [1], called hard c-means (HCM), assigns a label to each sample according to the nearest-neighbor principle and minimizes the within-cluster distance. Fuzzy c-means (FCM), a generalization of HCM introduced by Dunn [2] and extended by Bezdek [3], is one of the best-known techniques in cluster analysis. The main difference between HCM and FCM is the introduction of the weighting index m, which controls the fuzziness. FCM has better global convergence than HCM but is slower. The performance of FCM clustering depends on several parameters, such as the fuzziness weighting exponent, the number of clusters [4, 5, 6] and the initialization. Rough set theory [7] is a new paradigm for dealing with uncertainty, vagueness and incompleteness and has been applied to fuzzy rule extraction [8], reasoning with uncertainty [9] and fuzzy modeling. Rough set theory was proposed to handle indiscernibility in classification according to some similarity, whereas fuzzy set theory characterizes the vagueness of linguistic variables. They are complementary in some respects. Combining fuzzy sets and rough sets has been an important direction in reasoning with uncertainty [10, 11]. Pawan Lingras [12] introduced a new clustering method, called rough k-means, for mining web user patterns. The algorithm describes a cluster by a center and a pair of lower and upper approximations, which are weighted with different parameters when the new centers are computed. Asharaf et al. extended the technique to a Leader-based one [13], which does not require the user to specify the number of clusters. In this paper, a fuzzy rough c-means clustering algorithm is presented.
It combines two soft computing techniques, rough set theory and fuzzy set theory. Each cluster is represented by a center, a crisp lower approximation and a fuzzy boundary, and a new center is a weighted average of the lower approximation and the boundary. The rest of the paper is organized as follows. Section 2 reviews HCM and FCM. Section 3 introduces rough sets and rough c-means. Section 4 presents fuzzy rough clustering. Section 5 gives numeric experiments, and Section 6 concludes the paper.
2 Review on HCM and FCM
C-means clustering is defined over a real-number space and computes its centers iteratively by minimizing an objective function. Given a data set $X = \{x_1, x_2, \ldots, x_N\} \subset R^s$, $u = \{u_{ik}\}_{c \times N} \in M_{fcn}$ is a membership matrix and $v = \{v_1, v_2, \ldots, v_c\}$ are the c cluster centers, with $v_i \in R^s$ and $2 \le c \le N$. C-means partitions the N data points into c clusters $C_i$ $(i = 1, 2, \ldots, c)$. The objective function is

$$J(u, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik} \, \|x_k - v_i\|^2,$$

where $\sum_{i=1}^{c} u_{ik} = 1$ and $u_{ik} \in \{0, 1\}$ for hard c-means clustering.
Q. Hu and D. Yu
The c-means algorithm calculates cluster centers iteratively as follows:
1. Initialize the centers $c^{l}$ using random sampling;
2. Decide the membership of each pattern in one of the c clusters according to the nearest-neighbor principle with respect to the cluster centers;
3. Calculate the new centers $c^{l+1}$ as

$$v_i^{l+1} = \frac{\sum_{k=1}^{N} u_{ik}^{l+1} x_k}{\sum_{k=1}^{N} u_{ik}^{l+1}}, \qquad \text{where } u_{ik}^{l+1} = \begin{cases} 1, & \text{if } i = \arg\min_{j} \{\|x_k - v_j^{l}\|\} \\ 0, & \text{otherwise;} \end{cases}$$

4. If $\max_i \|v_i^{l+1} - v_i^{l}\| < \varepsilon$, stop; otherwise go to step 2.
The hard c-means algorithm is intuitive, easy to implement and converges fast. However, its results depend on the initialization of the cluster centers because it stops at a local minimum. The most popular clustering algorithm, FCM, was proposed by Dunn and generalized by Bezdek in 1981. In this technique a pattern does not belong definitely to one cluster but is a member of all clusters, each with a membership value. Here the objective function is defined as

$$J_m(u, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} (u_{ik})^m \, \|x_k - v_i\|^2,$$

where $\sum_{i=1}^{c} u_{ik} = 1$, $u_{ik} \in [0, 1]$ and $m \in (1, +\infty)$.
In order to minimize the objective function, the new centers and memberships are calculated as follows:

$$v_i = \frac{\sum_{k=1}^{N} (u_{ik})^m x_k}{\sum_{k=1}^{N} (u_{ik})^m}, \qquad i = 1, 2, \ldots, c,$$

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{ik}}{d_{jk}} \right)^{2/(m-1)}}, \qquad i = 1, 2, \ldots, c;\ k = 1, 2, \ldots, N.$$
The performance of FCM depends on the selection of the fuzziness weighting exponent m. On the one hand, FCM degrades to HCM as m → 1; on the other hand, FCM gives the mass center of all the data as m → ∞. Bezdek suggested that the preferred value of m lies in the range 1.5 to 2.5. FCM usually stops at a local minimum or a saddle point, and the result changes with the initialization. Many variants of FCM have been proposed for all kinds of applications in the last decade. Different metrics [14], objective functions [15] and constraint conditions on the memberships and centers each lead to a new clustering algorithm.
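The FCM update equations above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `fcm`, the parameter `init_centers`, the choice to start from centers rather than from a random membership matrix, the reading of $d_{ik}$ as the Euclidean distance $\|x_k - v_i\|$, and the stopping rule on the change of $J_m$ are all assumptions made here.

```python
import math
import random


def fcm(points, c, m=2.0, eps=1e-6, max_iter=200, init_centers=None, seed=0):
    """Fuzzy c-means on a list of equal-length tuples (illustrative sketch)."""
    n, dim = len(points), len(points[0])
    # Assumption: start from given centers, or from c randomly sampled data points.
    centers = (list(init_centers) if init_centers
               else random.Random(seed).sample(points, c))
    u = [[0.0] * n for _ in range(c)]
    prev_obj = float("inf")
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        for k in range(n):
            d = [math.dist(points[k], v) for v in centers]
            zero = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zero:  # the point coincides with a center: crisp assignment
                    u[i][k] = 1.0 / len(zero) if i in zero else 0.0
                else:
                    u[i][k] = 1.0 / sum(
                        (d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Center update: mean of the data weighted by (u_ik)^m.
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            if total > 0.0:
                centers[i] = tuple(
                    sum(w[k] * points[k][a] for k in range(n)) / total
                    for a in range(dim))
        # Objective J_m(u, v); stop when it no longer changes noticeably.
        obj = sum(u[i][k] ** m * math.dist(points[k], centers[i]) ** 2
                  for i in range(c) for k in range(n))
        if abs(prev_obj - obj) < eps:
            break
        prev_obj = obj
    return centers, u
```

Calling `fcm(data, c=2)` on a list of 2-D tuples returns the centers and a c × N membership matrix whose columns each sum to 1. Values of m close to 1 make the memberships nearly crisp, in line with the remark on the role of m above.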
3 Rough Sets and Rough C-Means
Rough set methodology has had great success in modeling imprecise and incomplete information. The basic idea of the method hinges on classifying the objects of discourse into clusters of objects that are indiscernible with respect to some attributes. In this section we review some basic definitions of rough set theory. $\langle U, A, V, f \rangle$ is called an information system (IS), where $U = \{x_1, x_2, \ldots, x_n\}$ is the universe; A is a family of attributes, called knowledge on the universe; V is the value domain of A; and f is an information function $f : U \times A \to V$. Any subset B of knowledge A defines an indiscernibility (equivalence) relation IND(B) on U and generates a partition $\Pi_B$ of U, where

$$IND(B) = \{(x, y) \in U \times U \mid \forall a \in B,\ f_a(x) = f_a(y)\},$$

$$\Pi_B = U / B = \otimes \{a \in B : U / IND(a)\},$$

where $A \otimes B = \{X \cap Y : \forall X \in A, \forall Y \in B, X \cap Y \ne \emptyset\}$. An indiscernibility relation on U generates a partition of U, and a partition of U is necessarily induced by an indiscernibility relation; indiscernibility relations and partitions are in one-to-one correspondence. We denote the equivalence classes induced by the attribute set B as $\Pi_B = U / B = \{[x_i]_B : x_i \in U\}$.
Knowledge B induces the elemental concepts $[x_i]_B$ of the universe, which are used to approximate arbitrary subsets of U. We say $\Pi_A$ is a refinement of $\Pi_B$ if there is a partial ordering $\Pi_A \preceq \Pi_B \Leftrightarrow \forall [x_i]_A \in \Pi_A, \exists [x_j]_B : [x_i]_A \subseteq [x_j]_B$.
An arbitrary subset X of U is characterized by a two-tuple $\langle \underline{B}X, \overline{B}X \rangle$, called the lower approximation and the upper approximation, respectively:

$$\underline{B}X = \cup \{[x_i]_B \mid [x_i]_B \subseteq X\}, \qquad \overline{B}X = \cup \{[x_i]_B \mid [x_i]_B \cap X \ne \emptyset\}.$$

The lower approximation is the greatest union of classes $[x_i]_B$ contained in X and the upper approximation is the least union of classes $[x_i]_B$ containing X. The lower approximation is sometimes also called the positive region, denoted by $POS_B(X)$. Correspondingly, the negative region is defined as $NEG_B(X) = U - \overline{B}X$. If $\underline{B}X = \overline{B}X$, we say that the set X is definable; otherwise X is indefinable, or a rough set. $BN_B(X) = \overline{B}X - \underline{B}X$ is called the boundary set. A set X in U is definable if it is
composed of some elemental concepts, so that X can be precisely characterized with respect to knowledge B and $BN_B(X) = \emptyset$. According to the definitions of lower and upper approximations, saying that the object set $[x_i]_B$ belongs to the lower approximation means that all of the objects in $[x_i]_B$ are definitely contained in X, while saying that $[x_j]_B$ belongs to the upper approximation of X means that the objects in $[x_j]_B$ are possibly contained in X based on knowledge B. Here knowledge B classifies the universe into three regions with respect to a given object subset X: the lower approximation, the boundary region and the negative region. Rough set theory has some elementary properties. Given an information system $\langle U, A, V, f \rangle$, $B \subseteq A$ and $U / B = \{X_1, X_2, \ldots, X_c\}$:
• $\forall x \in U$, $x \in \underline{B}X_i \Rightarrow x \notin \underline{B}X_j$, $j = 1, 2, \ldots, c$; $j \ne i$
• $\forall x \in U$, $x \in \underline{B}X_i \Rightarrow x \in \overline{B}X_i$, $i = 1, 2, \ldots, c$
• $\forall x \in U$, $\forall i$, $x \notin \underline{B}X_i \Rightarrow \exists X_k, X_l : x \in \overline{B}X_k$ and $x \in \overline{B}X_l$.
Property 1 shows that an object can be part of at most one lower approximation; property 2 means that objects belonging to a lower approximation are necessarily contained in the corresponding upper approximation; and property 3 shows that if an object is not part of any lower approximation, it must belong to at least two upper approximations. The key problem in incorporating rough set theory into c-means clustering is how to define the lower and upper approximations in real space. Assuming that the lower and upper approximations have been found, the modified centroid is given by

$$v_j = \begin{cases} \omega_{lower} \times \dfrac{\sum_{x \in \underline{A}X} x_j}{|\underline{A}X|} + \omega_{upper} \times \dfrac{\sum_{x \in \overline{A}X - \underline{A}X} x_j}{|\overline{A}X - \underline{A}X|}, & |\overline{A}X - \underline{A}X| \ne 0 \\[2ex] \omega_{lower} \times \dfrac{\sum_{x \in \underline{A}X} x_j}{|\underline{A}X|}, & \text{otherwise} \end{cases}$$

where $\omega_{lower}$ and $\omega_{upper}$ correspond to the relative importance of the lower approximation and the boundary. It is easy to see that the above formula is a generalization of HCM. In particular, when $|\overline{A}X - \underline{A}X| = 0$, RCM degenerates to HCM. We now design the criteria that determine whether an object belongs to the lower approximation, the boundary or the negative region. For each object x and center point v, $D(x, v)$ is the distance from x to v. The differences between $D(x, v_i)$ and $D(x, v_j)$ are used to determine the label of x. Let $D(x, v_j) = \min_{1 \le i \le c} D(x, v_i)$ and $T = \{i : i \ne j,\ |D(x, v_i) - D(x, v_j)| \le Threshold\}$.
If $T \ne \emptyset$, then $x \in \overline{A}(v_i)$ for all $i \in T$, $x \in \overline{A}(v_j)$ and $x \notin \underline{A}(v_l)$, $l = 1, 2, \ldots, c$.
If $T = \emptyset$, then $x \in \underline{A}(v_j)$ and $x \in \overline{A}(v_j)$.
The performance of the rough c-means algorithm depends on five factors: the weights $\omega_{lower}$ and $\omega_{upper}$, the threshold, the number of clusters c and the initialization of the c centers. It is therefore difficult for users to tune the algorithm in practice. We present a fuzzy rough c-means algorithm (FRCM for short) to overcome this problem.
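One iteration of rough c-means, as described in this section, might be sketched as follows. The function names (`rough_assign`, `rough_centers`), the default weights 0.7 and 0.3, and the Euclidean distance are illustrative assumptions. Where the boundary of a cluster is empty, this sketch uses the unweighted mean of the lower approximation so that the update reduces to HCM; the printed formula keeps the factor $\omega_{lower}$ in that case.

```python
import math


def rough_assign(points, centers, threshold):
    """Split points into lower approximations and boundary regions per cluster."""
    c = len(centers)
    lower = [[] for _ in range(c)]
    boundary = [[] for _ in range(c)]
    for x in points:
        d = [math.dist(x, v) for v in centers]
        j = min(range(c), key=d.__getitem__)  # nearest center
        # T: other clusters whose distance is within `threshold` of the nearest.
        t = [i for i in range(c) if i != j and d[i] - d[j] <= threshold]
        if t:
            # Ambiguous object: only in the upper approximations (boundaries).
            for i in [j] + t:
                boundary[i].append(x)
        else:
            lower[j].append(x)
    return lower, boundary


def mean(vectors):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def rough_centers(lower, boundary, old_centers, w_lower=0.7, w_upper=0.3):
    """Weighted centroid update; the weights are hypothetical defaults."""
    new = []
    for low, bnd, old in zip(lower, boundary, old_centers):
        if low and bnd:
            ml, mb = mean(low), mean(bnd)
            new.append(tuple(w_lower * a + w_upper * b for a, b in zip(ml, mb)))
        elif low:
            new.append(mean(low))  # empty boundary: plain HCM centroid
        elif bnd:
            new.append(mean(bnd))  # assumption: fall back to the boundary mean
        else:
            new.append(old)        # empty cluster: keep the old center
    return new
```

Repeating `rough_assign` and `rough_centers` until the centers stop moving gives a complete RCM loop. The number of quantities the caller must choose here (two weights, the threshold, c and the initial centers) illustrates the tuning burden noted above.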
4 Fuzzy Rough Clustering
As we know, HCM assigns a label to an object definitely; the membership value is 0 or 1. FCM instead maps a membership onto the range 0 to 1; each object belongs to some or all of the clusters to some fuzzy degree. RCM classifies the object space into three parts: lower approximation, boundary and negative region. Different weighting values are then used for each part in computing the new centers. In RCM all the objects in a lower approximation take the same weight, and all the objects in a boundary uniformly take another weight. In fact, the objects in boundary regions have different influences on the centers and clusters, so different weights should be imposed on them. How can this be done in computation? The lower approximation is the object subset that belongs to a cluster without doubt, and the boundary is the object subset that possibly belongs to the cluster, so the boundary is the region labeled with uncertainty. A fuzzy membership should therefore be imposed on the objects in the boundary. Here we define the membership function as

$$u_{ik} = \begin{cases} 1, & x_k \in \underline{A}(v_i) \\[1ex] \dfrac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{ik}}{d_{jk}} \right)^{2/(m-1)}}, & x_k \in \overline{A}(v_i) \end{cases} \qquad i = 1, 2, \ldots, c;\ k = 1, 2, \ldots, N.$$
It is worth noting that the membership function is constructed rather than derived from the objective function, so it is not strict. However, this has little influence on the performance. The new centers are then calculated by

$$v_i = \frac{\sum_{k=1}^{N} (u_{ik})^m x_k}{\sum_{k=1}^{N} (u_{ik})^m}, \qquad i = 1, 2, \ldots, c.$$

The objective function used is

$$J_m(u, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} (u_{ik})^m \, \|x_k - v_i\|^2.$$
The lower and upper approximations are then defined in the same way as in RCM. For each object x and center point v, $D(x, v)$ is the distance from x to v. The differences between $D(x, v_i)$ and $D(x, v_j)$ are used to determine the label of x. Let

$$D(x, v_j) = \min_{1 \le i \le c} D(x, v_i) \quad \text{and} \quad T = \{i : i \ne j,\ |D(x, v_i) - D(x, v_j)| \le Threshold\}.$$
1. If $T \ne \emptyset$, then $x \in \overline{A}(v_i)$ for all $i \in T$, $x \in \overline{A}(v_j)$ and $x \notin \underline{A}(v_l)$, $l = 1, 2, \ldots, c$.
2. If $T = \emptyset$, then $x \in \underline{A}(v_j)$ and $x \in \overline{A}(v_j)$.
It is worth noting that these definitions of lower and upper approximations differ from the classical ones: they are not based on any predefined indiscernibility relation on the universe. Moreover, the distance metric is not limited to the Euclidean distance, so the 1-norm, 2-norm, p-norm, infinity norm and other measures can be applied. The FRCM can be formulated as follows:
Input: unlabeled data set; number of clusters c, threshold T, exponent index m, stop criterion ε.
Output: membership matrix U.
Step 1. Let $l = 0$, $J_m^{(0)}(u, v) = 0$; randomly generate a membership matrix $U^{(l)}_{c \times N}$;
Step 2. Compute the c centers $v_i^{(l)}$ $(i = 1, 2, \ldots, c)$ with $U^{(l)}_{c \times N}$ and the data set;
Step 3. Compute $\underline{A}(v_i^{(l)})$, $\overline{A}(v_i^{(l)})$, $u_{ij}^{(l+1)}$ and $v_i^{(l+1)}$ $(i = 1, 2, \ldots, c)$ with $v_i^{(l)}$ $(i = 1, 2, \ldots, c)$ and the threshold T;
Step 4. Compute $J_m^{(l+1)}(u, v)$;
Step 5. If $\| J_m^{(l+1)}(u, v) - J_m^{(l)}(u, v) \| < \varepsilon$, then stop; otherwise set $l = l + 1$ and go to step 2.
The difference between FCM and FRCM is that the membership values of objects in a lower approximation are 1, while those of objects in a boundary region are the same as in FCM. In other words, FRCM first partitions the data into two classes, lower approximation and boundary, and only the objects in the boundary are fuzzified. The difference between RCM and FRCM is that RCM imposes the same weight on all objects of a lower approximation and another common weight on all objects of a boundary region, whereas FRCM imposes a distinct weight on each object in a boundary region.
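The FRCM procedure can be sketched as below. This is a hedged illustration rather than the authors' code. The names `frcm` and `init_centers` are hypothetical, the sketch starts from centers instead of a random membership matrix, an object in a lower approximation is given membership 0 in all other clusters, and boundary objects receive the FCM membership computed over all clusters; these are assumptions consistent with, but not spelled out by, the text.

```python
import math
import random


def frcm(points, c, threshold, m=2.0, eps=1e-6, max_iter=200,
         init_centers=None, seed=0):
    """Fuzzy rough c-means sketch: crisp lower approximations, fuzzy boundaries."""
    n, dim = len(points), len(points[0])
    # Assumption: start from given centers or from c sampled data points.
    centers = (list(init_centers) if init_centers
               else random.Random(seed).sample(points, c))
    u = [[0.0] * n for _ in range(c)]
    prev_obj = 0.0  # J^(0) = 0, as in Step 1
    for _ in range(max_iter):
        for k, x in enumerate(points):
            d = [math.dist(x, v) for v in centers]
            j = min(range(c), key=d.__getitem__)
            # T: other clusters almost as close as the nearest one.
            t = [i for i in range(c) if i != j and d[i] - d[j] <= threshold]
            if not t or d[j] == 0.0:
                # Lower approximation of cluster j (or the point sits exactly
                # on center j): membership 1 in j, 0 elsewhere.
                for i in range(c):
                    u[i][k] = 1.0 if i == j else 0.0
            else:
                # Boundary object: FCM membership formula over all clusters.
                for i in range(c):
                    u[i][k] = 1.0 / sum(
                        (d[i] / d[q]) ** (2.0 / (m - 1.0)) for q in range(c))
        # Center update with weights (u_ik)^m.
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            if total > 0.0:  # keep the old center if the cluster is empty
                centers[i] = tuple(
                    sum(w[k] * points[k][a] for k in range(n)) / total
                    for a in range(dim))
        obj = sum(u[i][k] ** m * math.dist(points[k], centers[i]) ** 2
                  for i in range(c) for k in range(n))
        if abs(obj - prev_obj) < eps:
            break
        prev_obj = obj
    return centers, u
```

With `threshold=0.0` every object (barring exact distance ties) falls into a lower approximation and the result is a hard partition; with a very large threshold every object is a boundary object and the memberships follow the FCM formula, which mirrors the degradation to HCM or FCM described in the paper.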
5 Numeric Experiments
To illustrate the differences between these algorithms, let us look at the membership functions in one-dimensional space for a two-cluster problem. The membership functions are shown in Figure 1. Comparing the panels of Figure 1, we find that the membership generated by FRCM clustering is much closer to human conception. For example, suppose we need to cluster some men into two groups according to their ages. HCM and RCM produce two crisp sets of men, which cannot reflect the fuzziness in human reasoning. Although FCM gives two fuzzy classes, the memberships of the fuzzy sets are not understandable. For a one-year-old baby, the memberships to young and old are nearly identical and close to 0.5, which is unreasonable and unacceptable according to our intuition. The fuzzy rough c-means algorithm is implemented and applied to some artificial data. Three clusters of data are generated randomly. There are one hundred points in each
cluster, and the data points follow a normal distribution. The data and the result of HCM are shown in Figure 2.
Fig. 1. (A) shows some data distributed in one-dimensional space; (B) is the membership function with HCM; (C) is the membership function with RCM; (D) is the membership function with FCM; and (E) is the membership function with FRCM. It is easy to see that HCM gives a crisp boundary line; RCM produces a boundary region; FCM and FRCM develop a soft boundary region.
Figure 3 shows the clustering results with different thresholds, and the comparison is presented in Table 1. HCM needs the fewest iterations, while FCM achieves the best global convergence, i.e., the lowest objective function value. FRCM makes a trade-off between speed and convergence, and as the threshold changes FRCM performs similarly to HCM or FCM.

Fig. 2. Result of HCM

Fig. 3. The results with different Threshold from FRCM: (A) Threshold=0.25; (B) Threshold=0.5; (C) Threshold=0.75; (D) Threshold=1

Table 1. Comparison of HCM, FCM and FRCM

              HCM               FCM               FRCM (T=0.25)     FRCM (T=10)
Iter. count   6                 16                11                16
Obj. fcn      7.0589            5.6172            6.5679            5.6172
Center1       0.4487, 0.5200    0.4272, 0.5153    0.4592, 0.5002    0.4272, 0.5153
Center2       0.4919, 2.4553    0.4675, 0.4699    0.4953, 2.4701    0.4675, 2.4699
Center3       1.4868, 1.5083    1.4873, 1.5054    1.4792, 1.5097    1.4873, 1.5054
6 Conclusion
Combining two soft computing methods, a fuzzy rough c-means clustering algorithm (FRCM) is proposed in this paper. FRCM characterizes each class with a positive region, a fuzzy boundary region and a negative region. According to rough set methodology, the objects in positive regions and negative regions are definitely contained or not contained in the class. Only the objects in boundary regions are fuzzy and unclassified, so a soft separating hyperplane is required in these regions. A new membership function is constructed based on the idea that membership should be assigned only to the classes whose centers are near the data points. Moreover, the size of the fuzzy boundary regions can be controlled with a proportional index, which controls the fuzziness of the clustering results. Because the clustering results are similar to human perception, the algorithm is applicable to fuzzy information granulation and computing with words.
References
1. Mac Queen, J.: Some methods for classification and analysis of multivariate observations. In: Le Cam. 281-297
2. Dunn, J.C.: Some recent investigations of a new fuzzy partition algorithm and its application to pattern classification problems. J. Cybernetics. 4 (1974) 1-15
3. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum, New York (1981)
4. Pal, N.R., Bezdek, J.C.: On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems. 3 (1995) 370-379
5. Yu, J., Cheng, Q., Huang, H.: Analysis on weighting exponent in the FCM. IEEE Transactions on SMC, Part B: Cybernetics. 31 (2004) 634-639
6. Khan, S.S., Ahmad, A.: Cluster center initialization algorithms for K-means clustering. Pattern Recognition Letters. 25 (2004) 1293-1302
7. Pawlak, Z.: Rough sets: theoretical aspects of reasoning about data. Kluwer Academic Publishers (1991)
8. Jensen, R., Shen, Q.: Fuzzy-rough attribute reduction with application to web categorization. Fuzzy Sets and Systems. 141 (2004) 469-485
9. Wang, Yi-Fan: Mining stock price using fuzzy rough set system. Expert Systems with Applications. 24 (2003) 13-23
10. Dubois, D., Prade, H.: Putting fuzzy sets and rough sets together. In: R. Slowinski (Ed.), Intelligent Decision Support (1992) 203-232
11. Wu, W., Zhang, W.: Constructive and axiomatic approaches of fuzzy approximation operators. Information Sciences. 159 (2004) 233-254
12. Lingras, P., West, C.: Interval set clustering of web users with rough k-means. Inter. J. of Intell. Inform. System. 23 (2003) 5-16
13. Asharaf, S., Narasimh M. Murty: A rough fuzzy approach to web usage categorization. Fuzzy Sets and Systems. 148 (2004) 119-129
14. Hathaway, R.J., Bezdek, J.C.: C-means clustering strategies using Lp norm distance. IEEE Trans. on Fuzzy Systems. 8 (2000) 576-582
15. Li, R.P., Mukaidono, M.: A maximum entropy approach to fuzzy clustering. In: Proc. of the 4th IEEE Conf. on Fuzzy Systems (1995) 2227-2232
16. Zadeh, L.A.: Fuzzy logic equals computing with words. IEEE Transactions on Fuzzy Systems. 2 (1996) 103-111
17. Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems. 90 (1997) 111-127
18. Zadeh, L.A.: From computing with numbers to computing with words - from manipulation of measurements to manipulation of perceptions. IEEE Transactions on Circuits and Systems. 46 (1999) 105-119
A Novel Segmentation Method for MR Brain Images Based on Fuzzy Connectedness and FCM
Xian Fan, Jie Yang, and Lishui Cheng
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, 200030, Shanghai, P.R. China [email protected]
Abstract. Image segmentation is an important research topic in the image processing and computer vision community. In this paper, a new unsupervised method for MR brain image segmentation is proposed based on fuzzy c-means (FCM) and fuzzy connectedness. FCM is a widely used unsupervised clustering algorithm for pattern recognition and image processing problems. However, FCM does not consider the spatial coherence of images and is sensitive to noise. On the other hand, the fuzzy connectedness method has achieved good performance in medical image segmentation. However, computing fuzzy connectedness requires seeds to be selected manually, which is laborious and time-consuming. Our new method uses FCM as a first step to select salient seed points and then applies the fuzzy connectedness algorithm based on those seeds. The method thus achieves unsupervised automatic segmentation of brain MR images. Experiments on simulated and real data sets show that it is effective and robust to noise.
1 Introduction
Image processing techniques are important in medical engineering, for example in 3D visualization, registration and fusion, and operation planning. In image processing, object extraction or image segmentation is the most crucial step because it lays the foundation for subsequent steps such as object visualization, manipulation, and analysis. Since manual or semi-automatic segmentation takes a lot of time and energy, automatic image segmentation algorithms have received a lot of attention. Images are by nature fuzzy [1]. This is especially true of biomedical images. The fuzziness of images usually results from the limited spatial, parametric, and temporal resolution of scanners. Moreover, the heterogeneous material composition of human organs adds to the fuzziness of magnetic resonance images (MRI). As the goal of image segmentation is to extract the object from the other parts, segmentation by hard means may destroy the fuzziness of images and lead to bad results. By contrast, using fuzzy methods to segment biomedical images fully respects this inherent property and retains inaccuracies and uncertainties as realistically as possible [2].
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 505–513, 2005. © Springer-Verlag Berlin Heidelberg 2005
Although the object regions of biomedical images are heterogeneous in intensity because of their fuzziness, knowledgeable human observers can easily distinguish the objects from the background. That is, the elements in these regions seem to hang together to form the object regions in spite of their heterogeneous intensities. In 1973, Dunn [3] first developed fuzzy c-means (FCM), a fuzzy clustering method that allows one piece of data to belong to two or more clusters. In [4], Bezdek improved the algorithm so that the objective function is minimized by an iterative procedure. Clark developed a system that provides completely automatic segmentation and labeling of MR brain images [5]. Although FCM preserves the inaccuracy of each element's membership in every cluster, it does not take into consideration that the elements of the same object hang together. Moreover, it is susceptible to noise and may end in a local minimum. In [6], Rosenfeld developed fuzzy digital topological and geometric concepts. He defined fuzzy connectedness by using a min-max construct on fuzzy subsets of 2-D picture domains. Based on this, Udupa [7] introduced a fundamental concept called affinity, which combined fuzzy connectedness directly with images and used it for image segmentation. Thus the properties of inaccuracy and hanging togetherness are fully exploited. However, manual selection of seeds, although onerous and time-consuming, is unavoidable in the initialization of fuzzy connectedness. This paper proposes a new segmentation method based on the combination of fuzzy connectedness and FCM, and applies it to segment MR brain images. The method makes full use of the advantages of the two fuzzy methods. The segmentation procedure implements automatic seed selection and at the same time guarantees the quality of segmentation. The experiments on simulated and real MR brain images show that the new method performs well.
The remainder of this paper is organized as follows. Section 2 introduces the fuzzy connectedness method. Section 3 describes our fuzzy connectedness and FCM based method in detail. Section 4 presents the experimental results. Conclusions are given in Section 5.
2 Background
In this section, we summarize the fuzzy connectedness method for segmentation proposed by Udupa [7]. Let X be any reference set. A fuzzy subset $\mathcal{A}$ of X is a set of ordered pairs $\mathcal{A} = \{(x, \mu_{\mathcal{A}}(x)) \mid x \in X\}$, where $\mu_{\mathcal{A}} : X \to [0, 1]$. $\mu_{\mathcal{A}}$ is called the membership function of $\mathcal{A}$ in X. A 2-ary fuzzy relation ρ in X is a fuzzy subset of $X \times X$, $\rho = \{((x, y), \mu_\rho(x, y)) \mid x, y \in X\}$, where $\mu_\rho : X \times X \to [0, 1]$.
For any $n \ge 2$, let n-dimensional Euclidean space $R^n$ be subdivided into hypercuboids by n mutually orthogonal families of parallel hyperplanes. Assume, with no
loss of generality, that the hyperplanes in each family have equal unit spacing so that the hypercuboids are unit hypercubes, and choose coordinates so that the center of each hypercube has integer coordinates. The hypercubes are called spels (an abbreviation for “space elements”). $Z^n$ is the set of all spels in $R^n$.
A fuzzy relation α in $Z^n$ is said to be a fuzzy adjacency if it is both reflexive and symmetric. It is desirable that α be such that $\mu_\alpha(c, d)$ is a non-increasing function of the distance $\|c - d\|$ between c and d. We call the pair $(Z^n, \alpha)$, where α is a fuzzy adjacency, a fuzzy digital space.
A scene over a fuzzy digital space $(Z^n, \alpha)$ is a pair $\mathcal{C} = (C, f)$, where $C = \{c \mid -b_j \le c_j \le b_j \text{ for some } b \in Z_+^n\}$; $Z_+^n$ is the set of n-tuples of positive integers; and f, called the scene intensity, is a function whose domain is C, called the scene domain, and whose range $[L, H]$ is a set of numbers (usually integers).
Let $\mathcal{C} = (C, f)$ be any scene over $(Z^n, \alpha)$. Any fuzzy relation κ in C is said to be a fuzzy spel affinity (or affinity for short) in $\mathcal{C}$ if it is reflexive and symmetric. In practice, for κ to lead to meaningful segmentation, it should be such that, for any $c, d \in C$, $\mu_\kappa(c, d)$ is a function of 1) the fuzzy adjacency between c and d; 2) the homogeneity of the spel intensities at c and d; 3) the closeness of the spel intensities and of the intensity-based features of c and d to some expected intensity and feature values for the object. Further, $\mu_\kappa(c, d)$ may depend on the actual location of c and d (i.e., $\mu_\kappa$ is shift variant).
Path strength is the strength of a given path connecting two spels. Saha and Udupa [7] have shown that, under a set of reasonable assumptions, the minimum of the affinities is the only valid choice for path strength. So the path strength is

$$\mu_{\mathcal{N}}(p) = \min_{1 < i \le m} [\mu_\kappa(c_{i-1}, c_i)], \qquad (1)$$

where p is the path $\langle c_1, c_2, \ldots, c_m \rangle$ and $\mathcal{N}$ is denoted as the fuzzy κ-net in $\mathcal{C}$. Every pair $(c_{i-1}, c_i)$ is a link in the path, while $c_{i-1}$ and $c_i$ may not always be adjacent.
For any scene $\mathcal{C}$ over $(Z^n, \alpha)$, and for any affinity κ and κ-net $\mathcal{N}$ in $\mathcal{C}$, fuzzy κ-connectedness K in $\mathcal{C}$ is a fuzzy relation in C defined by the following membership function. For any $c, d \in C$,

$$\mu_K(c, d) = \max_{p \in P_{cd}} [\mu_{\mathcal{N}}(p)], \qquad (2)$$

where $P_{cd}$ is the set of all paths from c to d.
Fig. 1. Illustration of the algorithm of fuzzy connectedness
Combined with eq. (1), eq. (2) shows the min-max property of the fuzzy connectedness between any two spels, as illustrated in Fig. 1. As a physical analogy, imagine many strings connecting spels A and B, each with its own strength (the path strength of the corresponding path). Imagine that A and B are pulled apart. Under the force the strings break one by one. For a given string, the affinity of the link where the string breaks is its path strength (the path strength is defined as the minimum affinity over all links in the path). When all but one string are broken, the last string is the strongest one, and its path strength is the fuzzy connectedness between spels A and B. Let S be any subset of C. We refer to S as the set of seed spels and assume throughout that $S \ne \emptyset$. The fuzzy $k\theta$-object of $\mathcal{C}$ containing S equals

$$O_{K\theta}(S) = \{c \mid c \in C \text{ and } \max_{s \in S} [\mu_K(s, c)] \ge \theta\}. \qquad (3)$$

With eq. (3), we can extract the desired object given θ and S. This can be computed via dynamic programming [7].
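The max-min computation of eqs. (1) to (3) can be illustrated with a small sketch for 2-D images. It uses a Dijkstra-style best-first propagation instead of the dynamic programming of [7], restricts links to 4-adjacent pixels, and uses a purely homogeneity-based Gaussian affinity; the function names and the parameter `sigma` are hypothetical choices made for illustration only.

```python
import heapq
import math


def affinity(a, b, sigma=10.0):
    """Hypothetical homogeneity-only affinity between two adjacent intensities."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def fuzzy_connectedness(image, seeds, sigma=10.0):
    """Connectedness of every pixel to the seed set: max over paths of min affinity."""
    rows, cols = len(image), len(image[0])
    conn = [[0.0] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        conn[r][c] = 1.0
        heapq.heappush(heap, (-1.0, r, c))  # max-heap via negated strength
    while heap:
        neg, r, c = heapq.heappop(heap)
        strength = -neg
        if strength < conn[r][c]:
            continue  # stale entry: a stronger path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Path strength is the weakest link along the path (eq. 1).
                cand = min(strength, affinity(image[r][c], image[nr][nc], sigma))
                if cand > conn[nr][nc]:  # keep the strongest path (eq. 2)
                    conn[nr][nc] = cand
                    heapq.heappush(heap, (-cand, nr, nc))
    return conn


def extract_object(conn, theta):
    """Threshold the connectedness map at theta (eq. 3)."""
    return [[v >= theta for v in row] for row in conn]
```

For a small image with one bright region and a seed inside it, `extract_object(fuzzy_connectedness(image, [(0, 0)]), 0.5)` marks only the pixels that can be reached from the seed without crossing a large intensity jump.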
3 Proposed Method
In this section, we propose a new segmentation method which is unsupervised and robust, based on the combination of fuzzy connectedness and FCM. Although Udupa's method utilizes the fuzzy nature of biomedical images in both aspects, inaccuracy and hanging togetherness, identifying seed spels belonging to the various objects is left to the user as an onerous task. Therefore, automatic selection of seeds becomes important in our research work. On the other hand, FCM, as an older fuzzy clustering method, is likely to converge to a local minimum and thus lead to false segmentation. Our new method combines the two fuzzy methods, aiming to implement automatic seed selection while guaranteeing the segmentation quality. The outline of the new method is as follows. First, FCM is used to pre-segment the MR brain image, through which the scope of the seeds is obtained. Since the number
of seeds within the scope is much larger than needed, the unqualified seeds within the scope are automatically eliminated according to their spatial information. With the remaining seeds as initialization, the MR brain image is segmented by fuzzy connectedness. The detailed steps of the proposed algorithm applied to MR brain images are as follows.
Step 1. Preprocess the MR brain images, including denoising, intensity correction, and inhomogeneity correction. In intensity correction, a standardized histogram of an MR brain image is acquired, and all the other images are corrected so that their histograms match the standardized one as closely as possible [8].
Step 2. Set the number of clusters to 4 and pre-segment the image by FCM, so that the expected segmented objects are white matter, grey matter, CSF and background.
Step 3. Compute the average intensity of each of the four clusters, denoted as $v_j^{(b)}$, $b = 1, 2, 3, 4$. Based on the biomedical knowledge that white matter has a higher intensity than the other tissues in an MR brain image, we find the largest value, $v_j^{(i)}$. The cluster to which $v_j^{(i)}$ belongs is the white matter.
Step 4. Compute the deviation $\delta^{(i)}$ of the cluster obtained in step 3. The scope of the seeds is defined by

$$v_j^{(i)} \pm 0.3\,\delta^{(i)}. \qquad (4)$$

Step 5. Define N as the number of seeds in fuzzy connectedness.
Step 6. Take the spatial information of the spels within the scope into account and group them into N clusters. Take the center spel of each cluster as a seed for initialization.
Step 7. Segment the image precisely with the selected seeds by the fuzzy connectedness method.
With step 1, we obtain the standardized histogram of each MR brain image, which guarantees that the parameter 0.3 in eq. (4) will work. Then, according to the average and deviation of the intensity of the region of interest, the scope of the seeds is obtained with eq. (4). Since the spatial information of these spels is taken into account, the automatically selected seeds are effective.
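Steps 3 to 6 can be sketched as follows for a 2-D image. Several details are assumptions, since the text does not fix them: the FCM result is taken as a hard label map `labels`, the "deviation" is read as the population standard deviation, candidates are drawn from the brightest cluster only, and the spatial grouping of step 6 is done with a small k-means on pixel coordinates with farthest-point initialization. All function names are hypothetical.

```python
import math
import statistics


def seed_candidates(image, labels, factor=0.3):
    """Steps 3-4: pixels of the brightest cluster within mean +/- factor * deviation."""
    groups = {}
    for r, row in enumerate(image):
        for c, val in enumerate(row):
            groups.setdefault(labels[r][c], []).append((r, c, val))
    means = {b: statistics.fmean([v for _, _, v in px]) for b, px in groups.items()}
    brightest = max(means, key=means.get)  # assumed to be white matter
    mu = means[brightest]
    delta = statistics.pstdev([v for _, _, v in groups[brightest]])
    return [(r, c) for r, c, v in groups[brightest] if abs(v - mu) <= factor * delta]


def pick_seeds(candidates, n_seeds, iters=20):
    """Steps 5-6: group candidates spatially and return one central pixel per group.

    Requires len(candidates) >= n_seeds.
    """
    # Farthest-point initialization keeps the sketch deterministic.
    centroids = [candidates[0]]
    while len(centroids) < n_seeds:
        centroids.append(max(
            candidates, key=lambda p: min(math.dist(p, q) for q in centroids)))
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for p in candidates:
            i = min(range(len(centroids)), key=lambda q: math.dist(p, centroids[q]))
            buckets[i].append(p)
        centroids = [
            tuple(sum(a) / len(b) for a in zip(*b)) if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    # The seed of each group is the candidate pixel closest to its centroid.
    return [min(candidates, key=lambda p: math.dist(p, ctr)) for ctr in centroids]
```

The returned seed coordinates could then be passed to a fuzzy connectedness routine (step 7). The factor 0.3 is the value given in eq. (4); according to the text it relies on the histogram standardization of step 1.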
4 Experimental Results
In this section, we give the experimental results on both simulated and real MR brain images. For simulated images, since the reference result is the “ground truth”, accuracy is described with three parameters: true positive volume fraction (TPVF), false positive volume fraction (FPVF), and false negative volume fraction (FNVF). These parameters are defined as follows:
$TPVF(V, V_t) = \frac{|V \cap V_t|}{|V_t|}$, (5)
$FPVF(V, V_t) = \frac{|V - V_t|}{|V_t|}$, (6)
$FNVF(V, V_t) = \frac{|V_t - V|}{|V_t|}$, (7)
where $V_t$ denotes the set of spels in the reference result and $V$ denotes the set of spels produced by the algorithm under evaluation. $FPVF(V, V_t)$ denotes the cardinality of the set of spels that are in the segmentation result of the method but not in $V_t$, expressed as a fraction of the cardinality of $V_t$. Analogously, $FNVF(V, V_t)$ denotes the fraction of $V_t$ that is missing in $V$. We use the probability of error (PE) to evaluate the overall accuracy of the simulated image segmentation. PE [9] can be described as
$PE = FPVF(V, V_t) + FNVF(V, V_t)$. (8)
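As a small illustration of eqs. (5)-(8), the measures can be computed directly on sets of spel coordinates. The sketch below assumes that both segmentations are given as Python sets of coordinate tuples; the function names and the toy data are hypothetical.

```python
def tpvf(v, vt):
    """True positive volume fraction, eq. (5)."""
    return len(v & vt) / len(vt)


def fpvf(v, vt):
    """False positive volume fraction, eq. (6)."""
    return len(v - vt) / len(vt)


def fnvf(v, vt):
    """False negative volume fraction, eq. (7)."""
    return len(vt - v) / len(vt)


def probability_of_error(v, vt):
    """PE = FPVF + FNVF, eq. (8)."""
    return fpvf(v, vt) + fnvf(v, vt)


# Toy example: four reference spels, three of which are found, plus one false positive.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
result = {(0, 1), (1, 0), (1, 1), (2, 2)}
print(tpvf(result, reference), fpvf(result, reference),
      fnvf(result, reference), probability_of_error(result, reference))
```

With these definitions TPVF and FNVF always sum to 1, which is consistent with the values reported in Table 1.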
The larger PE is, the poorer the accuracy of the segmentation method. For real images, the reference result is made manually, so the inaccuracy of the reference segmentation should be taken into account. The Dice Similarity Coefficient (DSC) of Zijdenbos et al. [10], which has been adopted for measuring voxel-by-voxel classification agreement, is suitable for evaluating the segmentation results of real images. For any tissue type T, let $V_m$ denote the set of pixels assigned to it by manual segmentation and $V_\alpha$ the set of pixels assigned to it by the algorithm. DSC is defined as
$DSC = 2 \cdot \frac{|V_m \cap V_\alpha|}{|V_m| + |V_\alpha|}$. (9)
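Eq. (9) can be sketched in the same way. The snippet below assumes that the manual and automatic segmentations are given as Python sets of pixel identifiers; the function name `dice` and the convention for two empty sets are assumptions made for this illustration.

```python
def dice(vm, va):
    """Dice Similarity Coefficient of eq. (9) for two sets of pixels."""
    total = len(vm) + len(va)
    if total == 0:
        # Assumed convention: two empty segmentations agree perfectly.
        return 1.0
    return 2 * len(vm & va) / total


manual = {1, 2, 3, 4}
automatic = {3, 4, 5}
print(dice(manual, automatic))  # 2 * 2 / (4 + 3)
```

Unlike PE, DSC is symmetric in its two arguments, so neither segmentation is treated as the ground truth.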
Since manual segmentations are not “ground truth”, DSC provides a reasonable way to evaluate automated segmentation methods.
4.1 Simulated Images
We applied our method to simulated MR brain images provided by the McConnell Brain Imaging Centre [11]. Since the brain database provides a gold standard, the segmentation methods can be validated against it. The experimental images in this paper were T1-weighted, with a resolution of 217 × 181 spels and a slice thickness of 1 mm. The noise level is 3% and the intensity non-uniformity is 20%.
We take the 81st to 100th slices of the whole-brain MRI in the database and segment them with fuzzy connectedness (FC), FCM, and our proposed method, respectively. The evaluation values, computed against the gold standard, are listed in Table 1.
Table 1. Average value of 20 simulated images using three segmentation methods (%)
Study            TPVF   FPVF  FNVF  PE
FCM              96.45  2.79  3.55  6.34
FC               98.03  1.09  1.97  3.06
Proposed Method  98.89  1.55  1.11  2.66
Fig. 2. Segmentation of the white matter from simulated images (a) original image; (b) result of segmentation by fuzzy connectedness; (c) result of segmentation by our proposed method; (d) white matter provided by gold standard
Table 1 shows that our proposed method performs best, followed by fuzzy connectedness and then FCM. This can be attributed to the high quality of the automatic seed selection, and the high accuracy of fuzzy connectedness also contributes to the good result. Take the 97th slice as an example. When fuzzy connectedness is used as the segmentation method, the manually selected seeds are (79, 64) and (61, 131), as shown in Fig. 2(b). When our proposed method is applied, the scope of the seeds from FCM is 0.98949 ± 0.0002878; that is, the mean intensity is 0.98949 and the deviation is 0.0009593, so the half-width of the scope follows from eq. (4) as 0.3 × 0.0009593 ≈ 0.0002878. Considering the spatial information of the seeds from FCM, the unqualified seeds are eliminated and the two remaining seeds are (85, 65) and (62, 130), as shown in Fig. 2(c). The automatically selected seeds are located close to the manually selected ones, and thus they work as effectively as the manual ones.
4.2 Real Images
We applied our method to twelve real MRI brain data sets. They were acquired with a 1.5 T MRI system (GE, Signa) at a resolution of 0.94 × 1.25 × 1.5 mm (256 × 192 × 124 voxels). These data sets had previously been labeled by an expert through a labor-intensive manual method (usually 50 hours per brain). We first applied our method to these data sets and then compared the results with the expert results. The results obtained by our method for the 3rd to the 10th data sets are given in Table 2 and Fig. 3.
Table 2. Average DSC of 8 real cases (%)
Case             1   2   3   4   5   6   7   8
FC               72  67  61  67  68  55  72  54
FCM              78  77  79  80  82  82  81  80
Proposed Method  82  83  85  84  85  86  82  81
Fig. 3. Segmentation of the white matter in real images: (a) original image; (b) the scope of seeds to be selected from FCM; (c) white matter segmented by the proposed method; (d) white matter segmented by FCM
According to Table 2, the ranking of the three methods by DSC is the same as in the simulated image experiments. In several cases the DSC of FCM is rather poor, because the noise level in real images varies widely from case to case. Since FCM is sensitive to noise, it cannot segment the object correctly. Take the 6th case as an illustration. Although the original image is severely corrupted by noise, the scope of the seeds selected by FCM is still within the white matter region, as marked in Fig. 3(b). Thus our proposed method performs well with two automatically selected seeds, as in Fig. 3(c), whereas the result of FCM is very poor, as shown in Fig. 3(d).
5 Conclusion
Fuzziness is inherent in images. When images are segmented by fuzzy methods, the inaccuracy of the elements is taken into account. FCM is one of the fuzzy clustering techniques, but it ignores the hanging-togetherness property of images. Thus it is sensitive to noise and may end in a local minimum. Fuzzy connectedness is a method that takes both the inaccuracy and the hanging-togetherness of the elements into consideration. Although fuzzy connectedness performs well, the selection of seeds costs the operator time and energy. In our proposed method, seed selection is made automatic by FCM. The quality of the automatically selected seeds is ensured by the elimination of unqualified seeds. Moreover, unlike FCM, the new method is robust because of the use of fuzzy connectedness. The simulated and real image experiments show that our proposed method not only selects seeds automatically but also achieves a desirable accuracy.
References
1. Udupa, J.K., Saha, P.K.: Fuzzy Connectedness and Imaging Segmentation. Proceedings of the IEEE, Vol. 91, No. 10, pp. 1649-1669, 2003
2. Saha, P.K., Udupa, J.K., Odhner, D.: Scale-based fuzzy connected image segmentation: Theory, algorithms, and validation. Computer Vision and Image Understanding, Vol. 77, pp. 145-174, 2000
3. Dunn, J.C.: A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Journal of Cybernetics, Vol. 3, pp. 32, 1973
4. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, 1981
5. Clark, M.C.: MRI segmentation using fuzzy clustering techniques. IEEE Engineering in Medicine and Biology, Vol. 13, No. 5, pp. 730, 1994
6. Rosenfeld, A.: Fuzzy digital topology. Information and Control, Vol. 40, No. 1, pp. 76, 1979
7. Udupa, J.K., Samarasekera, S.: Fuzzy Connectedness and Object Definition: Theory, Algorithms, and Applications in Image Segmentation. Graphical Model and Image Processing, Vol. 58, No. 3, pp. 246, 1995
8. Nyul, L.G., Udupa, J.K.: On standardizing the MR image intensity scale. Magn Reson Med, Vol. 42, pp. 1072-1081, 1999
9. Dam, E.B.: Evaluation of diffusion schemes for multiscale watershed segmentation. MSc. Dissertation, University of Copenhagen, 2000
10. Zijdenbos, A.P., Dawant, B.M., Margolin, R.A., Palmer, A.C.: Morphometric analysis of white matter lesions in MR images: Method and validation. IEEE Transactions on Medical Imaging, Vol. 13, No. 4, pp. 716, 1994
11. Cocosco, C.A., Kollokian, V., Kwan, R.K.-S., Evans, A.C.: BrainWeb: Online Interface to a 3D MRI Simulated Brain Database. NeuroImage, Vol. 5, No. 4, part 2/4, S425, 1994
Improved-FCM-Based Readout Segmentation and PRML Detection for Photochromic Optical Disks
Jiqi Jian, Cheng Ma, and Huibo Jia
Optical Memory National Engineering Research Center, Tsinghua University, Beijing 100084, P.R. China
[email protected]
{macheng, jiahb}@tsinghua.edu.cn
Abstract. An improved Fuzzy C-Means (FCM) clustering algorithm with preprocessing is analyzed and validated for readout segmentation of photochromic optical disks. The method takes into account the characteristics of the readout and its differential coefficient, together with other prior knowledge, which makes it more applicable than the traditional FCM algorithm. With the improved-FCM-based readout segmentation, the crest and trough segments can be divided clearly and the rising and falling edges can be located properly, which makes RLL encoding/decoding applicable to photochromic optical disks and increases the storage density. Further discussion shows that the segmentation method is consistent with PRML, and the improved-FCM-based detection can be regarded as an extension of PRML detection.
Keywords: Fuzzy clustering, FCM, PRML, optical storage, photochromism.
1 Introduction
With the fast development of technology and the ever-growing demand for high-density optical data storage, photochromic optical disks, including multi-level and multi-wavelength photochromic optical disks, have become an increasingly important research area. Continuous efforts and experiments have been made, but most sample disks are dot-recorded, and their storage density is lower than that of run-length limited (RLL) encoded disks such as CD, DVD, and Blu-ray Disc. RLL encoding has not been applied to photochromic optical disks partly because the edges of the information characters on a photochromic optical disk are indistinct, unlike the sharp edges of the characters on CD, DVD, and Blu-ray Disc. This makes it hard to divide the readout clearly into wave crests and troughs and to locate the rising and falling edges, as RLL encoding/decoding requires. The information characters and readouts of a dual-wavelength photochromic sample disk, obtained in the laboratory of the Optical Memory National Engineering Research Center, are shown in Fig. 1 and Fig. 2.
In recent years, the synthesis of clustering algorithms and fuzzy set theory has led to the development of fuzzy clustering, whose aim is to model fuzzy unsupervised patterns efficiently. The Fuzzy C-means (FCM) method proposed by Bezdek is the most widely used fuzzy clustering algorithm [1-8]. It may help us to divide the readout of photochromic disks into crests and troughs and locate the rising and falling edges.
Fig. 1. Information characters on a dual-wavelength photochromic sample disk: (a) information characters for the 650 nm laser; (b) information characters for the 532 nm laser
Fig. 2. Readouts of a dual-wavelength photochromic sample disk (reflectance f(x) versus x (μm))
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 514–522, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Theoretical Background
FCM clustering is based on the theory of fuzzy sets. The most important feature of fuzzy set theory is its ability to express in numerical form the imprecision that stems from grouping elements into classes without sharply defined boundaries, which fits the crest and trough classification of the readouts of photochromic disks well. For a given dataset $X = \{x_j \mid j = 1, 2, \ldots, n\}$, the FCM algorithm generates a fuzzy partition providing the membership degree $u_{ij}$ of datum $x_j$ to cluster $i$ ($i = 1, 2, \ldots, C$). The objective of the algorithm is to minimize the objective function $J_m$ to obtain the optimal fuzzy partition for the given dataset, where $m$ is a real-valued number that controls the ‘fuzziness’ of the resulting clusters and $d(x_j, v_i)$ is the squared Euclidean distance between $x_j$ and the cluster center $v_i$. The procedure of the algorithm is outlined in Fig. 3.
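(1) Given the data $x_j$ ($j = 1, 2, \ldots, n$), fix $C$ and $m$, $m > 1$. Initialize the fuzzy partition matrix $U$ with $u_{ij} \in [0, 1]$ ($i = 1, 2, \ldots, C$), $\sum_{i=1}^{C} u_{ij} = 1$, and $0 < \sum_{j=1}^{n} u_{ij} < n$.
(2) Compute the cluster centers: $v_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}$.
(3) Update the fuzzy partition matrix: $u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{d(x_j, v_i)}{d(x_j, v_k)} \right)^{\frac{1}{m-1}}}$, with $d(x_j, v_i) = \|x_j - v_i\|^2$.
(4) Calculate the objective function $J_m$: $J_m^{(l)} = \sum_{j=1}^{n} \sum_{i=1}^{C} u_{ij}^m \, d(x_j, v_i)$.
(5) Check the terminal condition $|J_m^{(l)} - J_m^{(l-1)}| \le \varepsilon$. If it is not satisfied (No), return to (2); if it is satisfied (Yes), end.
Fig. 3. Process of FCM algorithm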
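The process of Fig. 3 can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the experiments: the random initialization, the iteration cap `max_iter`, the handling of points that coincide with a center, and the assumption that the "No" branch returns to the center update are all choices made for this sketch.

```python
import random


def sq_dist(x, v):
    """Squared Euclidean distance d(x, v)."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm(data, c, m=2.0, eps=1e-6, max_iter=100, seed=0):
    """Fuzzy C-means on a list of points (each a list of floats).

    Returns (centers, u), where u[i][j] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial partition matrix; each column is normalised to sum to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col
    previous = None
    centers = []
    for _ in range(max_iter):
        # Cluster centers: weighted means with weights u_ij^m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append([sum(w[j] * data[j][k] for j in range(n)) / total
                            for k in range(dim)])
        # Membership update.
        for j in range(n):
            d = [sq_dist(data[j], centers[i]) for i in range(c)]
            zeros = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zeros:
                    # Point coincides with one or more centers: share membership among them.
                    u[i][j] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    u[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0))
                                        for k in range(c))
        # Objective function J_m and termination test.
        objective = sum(u[i][j] ** m * sq_dist(data[j], centers[i])
                        for i in range(c) for j in range(n))
        if previous is not None and abs(objective - previous) <= eps:
            break
        previous = objective
    return centers, u
```

For the readout segmentation discussed in the next section, each data point would be a pair such as [f(x), f’(x)]; the pairing itself is taken from the text, while the data layout is an assumption of this sketch.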
3 Comparison of Two Readout Segmentation Methods
The traditional FCM algorithm and the improved FCM method with preprocessing are tested using the 650 nm readout curve f(x) of the dual-wavelength photochromic sample disk shown in Fig. 2. Both f(x) and its differential coefficient f’(x) are shown in Fig. 4.
Fig. 4. 650 nm readout curve and its differential coefficient: (a) reflectance f(x) versus x (μm); (b) f’(x) versus x (μm)
3.1 FCM-Based Readout Segmentation
Using the readout curve and its differential coefficient as the dataset, we can segment the readout with the FCM method. With the partition-matrix exponent m = 2.0 [8], we obtain the result shown in Fig. 5(a), where the cluster centers are indicated by the large characters, and in Fig. 5(b), where the curve segments marked with crosses are crests, the segments marked with dots are troughs, and the rising and falling edges lie between them. The segmentation result of the 650 nm readout in Fig. 5 shows that the curve is segmented in an unsupervised manner. However, the FCM-based segmentation contains some errors. For example, some parts of the curve are not classified properly, such as the pseudo-peaks on the left side of the figure, and some edges are located inaccurately, such as the first rising edge on the left.
The main reason for these errors is that no prior knowledge is applied in the algorithm, which limits its effectiveness significantly no matter how many categories the points on the curve are divided into, as Fig. 5 and Fig. 6 suggest.
Fig. 5. Result of readout segmentation based on traditional FCM algorithm (C=2): (a) clusters in the plane of reflectance f(x) and f’(x); (b) segmented readout, reflectance f(x) versus x (μm)
Fig. 6. Result of readout segmentation based on traditional FCM algorithm (C=4): (a) clusters in the plane of reflectance f(x) and f’(x); (b) segmented readout, reflectance f(x) versus x (μm)
3.2 Improved-FCM-Based Readout Segmentation with Preprocessing
Taking into account the characteristics of the readout and its differential coefficient, as well as other prior knowledge, we can add the following preprocessing to the FCM-based segmentation. First, set two threshold values A and B for the readout curve, with A > B. When the value of a readout sample point is larger than A, the point is classified as a crest point, and when the value is smaller than B, the point is classified as a trough point. Second, set two threshold values C and D for the differential coefficient of the readout, with C > D (here C denotes a threshold rather than the number of clusters). When the differential coefficient is larger than C or smaller than D, the point is classified as a rising or falling edge point. Finally, on the basis of this preprocessing, the FCM method is used to classify the remaining parts of the readout, with the results of the pre-clustering as the initial centers. With A = 0.50, B = 0.35, C = 0.05, D = −0.07, and m = 2.0, we obtain the result shown in Fig. 7, where the points on the curve are divided into four categories: crest points, trough points, rising edge points, and falling edge points.
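The preprocessing can be sketched as follows. Several details are assumptions made for illustration only: the value thresholds are tested before the derivative thresholds, a derivative above C is read as a rising edge and one below D as a falling edge, the first sample is given a zero derivative, and the helper names are hypothetical. The derivative uses f’(x) = (f(x) − f(x−1))/2, the form given in Section 3.3.

```python
def differential(f):
    """f'(x) = (f(x) - f(x-1)) / 2; the first sample gets 0.0 by assumption."""
    return [0.0] + [(f[i] - f[i - 1]) / 2 for i in range(1, len(f))]


def prelabel(f, df, a=0.50, b=0.35, c=0.05, d=-0.07):
    """Pre-classify readout samples with the thresholds A, B, C, D.

    Samples that match no threshold get None and are left for FCM.
    """
    labels = []
    for value, slope in zip(f, df):
        if value > a:
            labels.append("crest")
        elif value < b:
            labels.append("trough")
        elif slope > c:
            labels.append("rising")
        elif slope < d:
            labels.append("falling")
        else:
            labels.append(None)
    return labels


def initial_centers(f, df, labels):
    """Mean (f, f') of the pre-labelled samples of each class, usable as FCM starting centers."""
    sums = {}
    for value, slope, label in zip(f, df, labels):
        if label is None:
            continue
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += value
        s[1] += slope
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}
```

The centers returned by `initial_centers` would replace the random initialization of a standard FCM run, so that only the samples labelled None remain to be decided by clustering.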
Fig. 7. Result of readout segmentation based on improved FCM method
The crest and trough segments are divided clearly, and the rising and falling edges are located properly.
3.3 Consistency of the Readout Segmentation Methods with PRML Detection
As data density increases, the partial response maximum likelihood (PRML) detection method has been applied more and more widely to cope with the inter-symbol interference (ISI) problem [9, 10]. Instead of trying to distinguish individual peaks to find flux reversals, PRML manipulates the analog data stream coming from the disk (the "partial response" component) and then determines the most likely sequence of bits (the "maximum likelihood" component). In a PRML system, the channel is truncated to a target PR mode by linear equalization, and the PR modes can be expressed as $P(D) = r_0 + r_1 D + r_2 D^2 + \cdots + r_M D^M$. Both readout segmentation methods manipulate the readout f(x) and its differential coefficient f’(x) together, where $f'(x) = (f(x) - f(x-1))/2 = (1 - D) f(x)/2$. So the detection based on the improved-FCM-based readout segmentation method can be regarded as an extension of PRML detection with the PR mode $P(D) = \alpha_0 + \alpha_1 D$, where $\alpha_0$ and $\alpha_1$ are variable and take different expressions depending on how f(x) and f’(x) relate to the threshold values A, B, C, D, etc.
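To make the correspondence concrete, suppose (purely as an illustration) that a decision uses a weighted sum a·f(x) + b·f’(x) with hypothetical weights a and b. Substituting f’(x) = (1 − D) f(x)/2 gives (a + b/2) f(x) − (b/2) f(x−1), that is, a PR mode with α0 = a + b/2 and α1 = −b/2. The sketch below checks this numerically, assuming that samples before the start of the signal are zero.

```python
def delay(signal, k=1):
    """Apply D^k: shift the signal by k samples, padding the start with zeros."""
    return [0.0] * k + list(signal[:len(signal) - k])


def pr_filter(signal, coeffs):
    """Apply P(D) = sum_k coeffs[k] * D^k to the signal."""
    out = [0.0] * len(signal)
    for k, coeff in enumerate(coeffs):
        shifted = delay(signal, k)
        for i in range(len(signal)):
            out[i] += coeff * shifted[i]
    return out


def differential(signal):
    """f'(x) = (f(x) - f(x-1)) / 2 with f(-1) taken as 0."""
    return [(s - p) / 2 for s, p in zip(signal, delay(signal, 1))]


def weighted_response(signal, a, b):
    """a * f(x) + b * f'(x), with hypothetical weights a and b."""
    return [a * s + b * d for s, d in zip(signal, differential(signal))]


signal = [0.2, 0.4, 0.6, 0.5]
a, b = 1.0, 0.5
direct = weighted_response(signal, a, b)
via_pr = pr_filter(signal, [a + b / 2, -b / 2])  # alpha_0, alpha_1
```

Both lists agree up to rounding, which is the sense in which a rule combining f(x) and f’(x) acts as a two-tap partial response target.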
4 Conclusion
An improved FCM clustering algorithm is analyzed and validated for readout segmentation of photochromic optical disks. The method takes into account the characteristics of the readout and its differential coefficient, together with other prior knowledge, which makes it more applicable than the traditional FCM algorithm. With the improved method, the crest and trough segments can be divided clearly and the rising and falling edges can be located properly, which makes RLL encoding/decoding applicable to photochromic optical disks and paves the way for advances in photochromic optical storage, including photochromism-based multi-level and multi-wavelength optical storage. Further discussion shows that the segmentation method is consistent with PRML, and the detection method based on the improved-FCM-based readout segmentation can be regarded as an extension of the PRML detection method with the PR mode $P(D) = \alpha_0 + \alpha_1 D$, where $\alpha_0$ and $\alpha_1$ are variable and take different expressions depending on how f(x) and f’(x) relate to the threshold values.
References
1. Bezdek J. C.: Fuzzy mathematics in pattern classification. PhD thesis, Cornell University, Ithaca (1973)
2. Bezdek J. C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981)
3. Yoo S. H., Cho S. B.: Partially Evaluated Genetic Algorithm Based on Fuzzy c-Means Algorithm. Parallel Problem Solving from Nature - PPSN VIII, Lecture Notes in Computer Science, Vol. 3242. Springer-Verlag, Berlin Heidelberg New York (2004) 440
4. Pedrycz W., Loia V., Senatore S.: P-FCM: a proximity-based fuzzy clustering. Fuzzy Sets and Systems, Vol. 148. Elsevier, Hoboken (2004) 21-41
5. Tsekouras G. E., Sarimveis H.: A new approach for measuring the validity of the fuzzy c-means algorithm. Advances in Engineering Software, Vol. 35. Elsevier, Hoboken (2004) 567-575
6. Loia V., Pedrycz W., Senatore S.: P-FCM: a proximity-based fuzzy clustering for user-centered web applications. International Journal of Approximate Reasoning, Vol. 34. Elsevier, Hoboken (2003) 121-144
7. Sebzalli Y. M., Wang X. Z.: Knowledge discovery from process operational data using PCA and fuzzy clustering. Engineering Applications of Artificial Intelligence, Vol. 14. Elsevier, Hoboken (2001) 607-616
8. Pal N. R., Bezdek J. C.: On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Systems, Vol. 3. IEEE (1995) 370-379
9. Lee C. H., Cho Y. S.: A PRML detector for a DVDR system. IEEE Trans. Consumer Electron, Vol. 45. IEEE (1999) 278-285
10. Choi S. H., Kong J. J., Chung B. J., Kim Y. H.: Viterbi detector architecture for high speed optical storage. In: Speech and Image Technologies for Computing and Telecommunications, IEEE Proceedings/TENCON, Vol. 1. IEEE (1997) 89-92
Fuzzy Reward Modeling for Run-Time Peer Selection in Peer-to-Peer Networks
Huaxiang Zhang1, Xiyu Liu1, and Peide Liu2
1 Dept. of Computer Science, Shandong Normal Univ., Jinan 250014, Shandong, China
[email protected]
2 Dept. of Computer Science, Shandong Economics Univ., Jinan 250014, Shandong, China
Abstract. A good query plan in p2p networks is crucial to increase the query performance. Optimization of the plan requires effective and suitable remote cost estimation of the candidate peers on the basis of the information concerning the candidates’ run-time cost model and online time. We propose a fuzzy reward model to evaluate a candidate peer’s online reliability relative to the query host, and utilize a real-time cost model to estimate the query execution time. The optimizer is based on the run-time information to generate an effective query plan.
1 Introduction
The peer-to-peer (p2p) [1] paradigm has opened up new research areas in networking and distributed computing. As one of the main application areas, file sharing has gained much attention. Routing query messages to the destination peers and executing the query plan efficiently pose new research challenges. Since each peer can provide services such as data services or computing services, a query plan can be executed through the coordination of relevant peers. Estimating the execution cost of a query plan is a crucial problem, and an optimization approach should be employed to generate the plan. Executing a query at remote nodes may be expensive because of the communication cost or the heavy workload at the remote sites, and different distributed computing strategies result in quite different query processing performance.
Optimizing the query cost in distributed information systems is difficult. The cost evaluation models that have been proposed can be categorized into static and dynamic cost models. A static cost model treats the distributed system as a static system and utilizes different approaches in the cost measurement metrics. As the name implies, no changes are made to static models once they are derived, so static models are not suitable for estimating query cost in a dynamic environment such as a p2p network. In p2p networks, high availability, fault tolerance, and scalability are the main characteristics, and every peer can join or leave the system freely. The workload of a remote peer changes as more or fewer tasks need to be supported, and the dynamic nature of p2p networks requires new query cost models.
The availability of a peer is also very important because of the dynamic nature of p2p networks. Many p2p systems are designed on the fundamental assumption that the availability of a peer depends on its online time. In p2p networks, intermittent online time is common: a peer often leaves the system and rejoins it at a later time. We are also motivated to study peer availability in part to shape the evaluation of p2p systems. The primary goal of this consideration is to optimize the query cost model in p2p networks.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 523–530, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Related Work
As soft computing technologies have been adopted in communications [2], some models based on these technologies have been proposed. Bosc et al. [3] propose a fuzzy data model as the basis for the design of a run-time service selection trader. It is not suitable for p2p networks because the trader environment is static. Cost-based query optimization over distributed information sources has been studied [4], and a calibration-based approach has been proposed. Based on sampling queries run against user database systems, Zhu et al. [5] propose a local cost model. Shahabi et al. [6] propose a run-time cost statistics model for remote peers that employs a probe-based strategy. Although the approach proposed in [6] is able to reflect more run-time cost statistics of remote peers, it lacks scalability because of the large number of probe queries. None of the models mentioned above takes dynamism into consideration. Ling et al. [7] propose a progressive "push-based" remote cost monitoring approach and introduce a fuzzy cost evaluation metric to measure a peer's reliability. Although their approach takes dynamism into consideration, the fuzzy reliability is not properly defined. Saying that a remote peer is more reliable to a query host should mean that the remote peer and the query host share more common online time, not that the remote peer has more online time. Although complex p2p query systems employing different message routing and query location schemes have been proposed in the literature recently, how to optimize a query plan is still a key issue that needs much attention, because the query plan has a great impact on query performance.
3 Cost Evaluation
A complex query generated by a peer can be executed at several peers that hold the target resources. Suppose there are m candidate peers storing the target resources. The objective of the query optimization is to select n ($n \le m$) peers from the m candidates to execute the query at minimal cost and with high execution reliability. This problem can be considered as a multi-agent coordination problem [8], and an effective coordination mechanism is essential for a query host to achieve its goal in p2p systems. The coordination solves the query problem in the p2p environment with distributed resources. The selection of candidate peers should take the remote peer's reliability, the execution time, and other factors into consideration. In p2p networks, a peer's reliability depends on its online time relative to other peers, and this online time can be estimated. Many factors affect the query execution time of a peer, such as the CPU processing speed, the network communication cost, the input/output cost, the size of the information to be transmitted, and the waiting time. To simplify the query cost optimization problem, we regard a peer's reliability and the query execution time as the two main factors that should be used in a query cost model.
3.1 Relative Online Time
A peer's availability is not well modeled in p2p networks. The reliability of one peer to another depends not only on its online availability but also on the inter-dependence between the two peers. A peer's online time determines its reliability. For example, if a peer is offline and the host still sends a message to it, the host gets no result and has to re-transmit the message to other candidate peers. In this case the peer is unreliable, and we set its reliability degree to 0. If a peer is online all the time and can provide other peers with software or hardware services, it is reliable, and we set its reliability degree to 1. The concept of reliability is relative. For example, if peer A is online from 0 am to 2 am and peer B is online from 2 am to 12 pm, then even though peer B's online time is very long, it is not reliable to peer A because they share no common online time. The relative reliability of each peer pair can also be characterized by conditional probabilities. Consider peers A and B and the conditional probability that B is available given that A is available during a given time period. The value $P(B = 1 \mid A = 1)$ is the relative reliability of B to A. In this case, the two peers are dependent.
Definition 1: $[a, b]$ is defined as an online time interval of a peer, and the length of $[a, b]$ is denoted as $|[a, b]|$ $(= b - a)$. $[a, b] \wedge [c, d]$ is the common online time interval that falls within both $[a, b]$ and $[c, d]$. If $[a, b] \wedge [c, d] = \emptyset$, then $|[a, b] \wedge [c, d]| = 0$.
Definition 2: $[a, b] \vee [c, d]$ denotes the union of two online time intervals of a peer, and $|[a, b] \vee [c, d]| = b - a + d - c - |[a, b] \wedge [c, d]|$.
Definition 3: If the $j$th online time interval of peer $n$ is $[s_j^n, e_j^n]$ $(j = 1, \ldots, k)$, where $k$ is the number of online time intervals, then the online time of peer $n$ is $T_n^o = \vee_{j=1}^{k} [s_j^n, e_j^n]$, and the online time length of $n$ is $|T_n^o|$.
Definition 4: If $T_n^o$ and $T_m^o$ are the online times of peers $n$ and $m$ respectively, then we define $T_n^o \wedge T_m^o$ as the online time of peer $n$ relative to $m$ and denote it as $T_{nm}^o$. $|T_{nm}^o|$ is the relative online time length, and it is easy to prove that $|T_{nm}^o| = |T_{mn}^o|$.
If $T_{nm}^o = \emptyset$, peers $m$ and $n$ share no common online time intervals. In this case, peer $n$ is unreliable to peer $m$. If $T_{nm}^o = T_n^o$, peer $n$'s online time falls within peer $m$'s online time. In this case, peer $m$ is completely reliable to peer $n$. If instead $T_{nm}^o = T_m^o$, peer $m$'s online time falls within peer $n$'s online time, and peer $n$ is completely reliable to peer $m$.
3.2 Reliability Degree
The execution time of a query is affected by several factors, as described above, and it is also affected by the dynamism of p2p networks. A peer takes more time to finish a query under a heavy workload than under a light workload.
Definition 5: The relative reliability degree of peer $m$ to $n$ is defined as $r_{mn} = |T_{mn}^o| / |T_n^o|$.
Example 1. Suppose $m$'s online time intervals are $[2, 4]$ and $[6, 10]$, and $n$'s online time intervals are $[3, 5]$, $[7, 9]$, and $[14, 20]$. We have $T_{mn}^o = [3, 4] \vee [7, 9]$ and $|T_{mn}^o| = 4 - 3 + 9 - 7 = 3$; $T_n^o = [3, 5] \vee [7, 9] \vee [14, 20]$ and $|T_n^o| = 5 - 3 + 9 - 7 + 20 - 14 = 10$. Hence $r_{mn} = |T_{mn}^o| / |T_n^o| = 3/10 = 0.3$. Similarly, $r_{nm} = |T_{nm}^o| / |T_m^o| = 3/6 = 0.5$. Clearly $r_{mn} \ne r_{nm}$.
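Definitions 1 to 5 and Example 1 can be reproduced with a short Python sketch. It assumes that online time is given as a list of (start, end) pairs; the function names are hypothetical.

```python
def merge(intervals):
    """Union of intervals (the 'or' of Definitions 2 and 3) as a sorted disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def length(intervals):
    """Total online time length |T| of a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def common(a, b):
    """Common online time (the 'and' of Definitions 1 and 4) of two peers."""
    out = []
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:
                out.append([lo, hi])
    return out


def reliability(m_intervals, n_intervals):
    """Relative reliability degree r_mn = |T_mn| / |T_n| of Definition 5."""
    return length(common(m_intervals, n_intervals)) / length(n_intervals)


# Example 1
m = [(2, 4), (6, 10)]
n = [(3, 5), (7, 9), (14, 20)]
print(reliability(m, n), reliability(n, m))
```

The two printed values correspond to r_mn and r_nm of Example 1, which shows that the reliability degree is not symmetric even though the common online time is.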
3.3 Fuzzy Reward Formulation
The query can be considered as a cooperative task that requires several peers to finish, and the query host has to select the peers that will take part in the coordination. Peer selection should follow a query optimization rule that selects, from all the candidates, the peers with lower query cost and higher reliability. If a candidate peer is more reliable to a query host, we consider that it needs less reward for executing the query; otherwise, it needs more reward. The query optimization is thus transformed into the problem of selecting peers so as to minimize the cost and the rewards.
We use fuzzy set theory [9] to formulate the fuzzy reward. The reliability degree of a candidate relative to the query host falls within $[0, 1]$, and we regard it as a member of a fuzzy set. A fuzzy set $A(r)$ in the universe of discourse, which consists of all possible reliability degrees, can then be formulated as $A(r) = \sum_{r \in C} \mu_A(r)/r$. Here $C$ is the set of all candidates' reliability degrees, and the membership function $\mu_A(r)$ in equation (1) indicates the degree to which $r$ belongs to $A(r)$:
$\mu_A(r) = \begin{cases} 0, & r \le \omega \\ 2\left(1 + (1 - \omega)(r - \omega)^{-1}\right)^{-1}, & r > \omega \end{cases}$ (1)
where $\omega$ is a threshold. A host may set a reliability degree $\omega$ and select candidates with degrees no less than $\omega$. $\mu_A(r)$ approaches 1 if and only if $r$ approaches 1. Following the relation between the reliability degree and the reward discussed above, we define the fuzzy reward function $f(r)$ in equation (2):
$f(r) = \begin{cases} \infty, & r \le \omega \\ \left(1 + (1 - \omega)(r - \omega)^{-1}\right)/2, & r > \omega \end{cases}$ (2)
$f(r)$ indicates the fuzzy reward needed by a candidate peer of reliability degree $r$ to execute the query generated by a host. The objective of the host is to select suitable peers from all candidates with $r > \omega$ so as to minimize its cost. The online time intervals can be collected statistically by queries sent to a candidate peer.
3.4 Cost Model
Several criteria influence the cost for a peer to execute a remote query, such as the waiting time, the communication cost, the workload of the peer, the CPU cost, the input/output cost, and the size of the message to be transmitted. We use the cost model proposed in [7] to estimate the cost in terms of time, that is, the time a peer requires to finish a query. The model employs four evaluation criteria. Previous studies on static query optimization assume that the evaluation criteria remain constant, which may not be true in the dynamic p2p environment. We utilize the above cost model and describe the cost at a very high level. For each peer, we use the query execution time to indicate the cost. This time may change because of the dynamic nature of p2p networks. The host peer has a local cache that stores the cost models of its candidates, and it updates the models according to the corresponding cost monitoring agents at the candidate peers. We use the locally stored cost models to estimate the query time required by a candidate peer.
The available time of a candidate peer must also be estimated. We adopt a prober that periodically probes each peer to determine whether it is available during a particular time interval. This information is stored at the host peer and updated as new information arrives. The relative reliability of a peer is calculated from the probed information stored at the host peer.
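Once the relative reliability of a candidate has been obtained from the probed information, the membership function (1) and the fuzzy reward function (2) of Section 3.3 follow directly. A minimal Python sketch, in which the function names and the example threshold of 0.2 are hypothetical:

```python
import math


def membership(r, omega):
    """Membership degree mu_A(r) of equation (1)."""
    if r <= omega:
        return 0.0
    return 2.0 / (1.0 + (1.0 - omega) / (r - omega))


def reward(r, omega):
    """Fuzzy reward f(r) of equation (2); infinite at or below the threshold."""
    if r <= omega:
        return math.inf
    return (1.0 + (1.0 - omega) / (r - omega)) / 2.0


for r in (0.2, 0.3, 0.6, 1.0):
    print(r, membership(r, 0.2), reward(r, 0.2))
```

For r > ω the two functions are reciprocals of each other, so a more reliable candidate has a higher membership and asks for a smaller reward, and a fully reliable candidate (r = 1) has membership 1 and reward 1.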
H. Zhang, X. Liu, and P. Liu

4 Query Optimization Algorithm
For the peers in the host candidate list, we use equations (1) and (2) to calculate their reliability degrees and fuzzy rewards. The candidates with degrees below the threshold are ignored, which leaves $m$ peers. For a given query, we utilize the cost model to estimate the execution time $t$ required by each peer. The host then has $m$ value pairs $(t_i, f_i)$, $i = 1, \cdots, m$, where $f_i$ is the fuzzy reward requested by the $i$th peer for executing the query and $t_i$ is the time cost. The host needs to select $n$ ($n \le m$) of the $m$ peers to finish the query. The query ending time is the largest of the times required by the selected $n$ peers, and we denote it by $t_e$. We therefore have the following optimization problem:

$$Q = \min\left(\alpha t_e + \sum_{i=1}^{n} f_i\right) \tag{3}$$

where $\alpha$ is a coefficient obtained as

$$\alpha = \frac{\sum_{i=1}^{m} f_i}{\sum_{i=1}^{m} t_i}.$$

Problem (3) is a programming problem and can be solved by the query optimization algorithm proposed below. If we arrange all the time values in ascending order, we get an ordered tuple $(t_1, \cdots, t_m)$, where $t_i$ is the $i$th smallest member of the tuple. Since $n$ peers must be selected, the maximal time required for a host to finish the query must be a member of the tuple $(t_n, \cdots, t_m)$. We test each member of $(t_n, \cdots, t_m)$ in turn, calculate the corresponding $Q$ values, and compare them with each other. The time that minimizes the value is the solution of equation (3). The optimization algorithm is shown in Table 1.
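As an illustration of how equations (1) and (2) and the coefficient $\alpha$ might be evaluated before the selection step, the following is a minimal Python sketch. The function names are our own and hypothetical; the paper does not prescribe an implementation.

```python
import math

def reliability_membership(r, omega):
    """Equation (1): membership degree of reliability r, given threshold omega."""
    if r <= omega:
        return 0.0
    return 2.0 / (1.0 + (1.0 - omega) / (r - omega))

def fuzzy_reward(r, omega):
    """Equation (2): reward requested by a peer of reliability r.

    Peers at or below the threshold get an infinite reward, so they are
    never selected.
    """
    if r <= omega:
        return math.inf
    return (1.0 + (1.0 - omega) / (r - omega)) / 2.0

def alpha_coefficient(rewards, times):
    """Coefficient alpha of problem (3): sum of rewards over sum of times."""
    return sum(rewards) / sum(times)
```

Under these definitions the reward is the reciprocal of the membership degree for $r > \omega$, so a fully reliable peer ($r = 1$) asks for the smallest reward, 1.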
Table 1. Query selection algorithm
(1) Arrange all the times $t$ in ascending order to obtain an ordered tuple $T = (t_1, \cdots, t_m)$.
(2) Initialize a set $S = \emptyset$ and put the $m$ value pairs $(f, t)$ into $S$; arrange the pairs in a list in ascending order of $t$: $S = \{(f_1, t_1), \cdots, (f_m, t_m)\}$.
(3) Arrange the values of $f$ whose $t$ is a member of $(t_1, \cdots, t_n)$ in ascending order too. The ordered values of $f$ are denoted $F^0 = (f_1^0, \cdots, f_n^0)$. Calculate $U_0 = \left(\sum_{i=1}^{n} f_i^0\right) + \alpha t_n$.
(4) For $j = 1$ to $m - n$: compute $U_j = \left(\sum_{i=1}^{n-1} f_i^{j-1}\right) + f_{n+j} + \alpha t_{n+j}$; then insert $f_{n+j}$ into $F^{j-1}$ to obtain the ordered $F^j = (f_1^j, \cdots, f_{n+j}^j)$.
(5) The smallest of the $U_k$, where $k \in \{0, \cdots, m-n\}$, is the best one. For $k \ge 1$, the first $n-1$ members of $F^{k-1}$ together with $f_{n+k}$ are the fuzzy rewards we should select (for $k = 0$, all $n$ members of $F^0$), and $t_{n+k}$ is the maximal time of the selected $n$ peers.
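The steps of Table 1 can be sketched in Python as follows. This is only an illustrative sketch: the name `select_peers` and the `(time, reward)` pair layout are assumptions of ours, and the partial sums are recomputed in every iteration for readability, so the sketch does not try to reproduce the per-step costs used in the complexity analysis.

```python
import bisect

def select_peers(pairs, n, alpha):
    """Choose n peers minimising alpha * (largest time) + (sum of rewards).

    pairs: list of (t_i, f_i) tuples for the m candidate peers.
    Returns (best cost, sorted indices of the chosen peers in `pairs`).
    """
    m = len(pairs)
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m")
    # Steps (1)-(2): order the peers by execution time.
    order = sorted(range(m), key=lambda i: pairs[i][0])
    # Step (3): rewards of the n fastest peers in ascending order, and U_0.
    rewards = sorted((pairs[i][1], i) for i in order[:n])
    best_cost = sum(f for f, _ in rewards) + alpha * pairs[order[n - 1]][0]
    best_set = [i for _, i in rewards]
    # Step (4): let each slower peer in turn be the one that finishes last.
    for pos in range(n, m):
        idx = order[pos]
        t, f = pairs[idx]
        cheapest = rewards[:n - 1]          # n-1 smallest rewards seen so far
        cost = sum(g for g, _ in cheapest) + f + alpha * t
        if cost < best_cost:                # step (5): keep the smallest U_k
            best_cost = cost
            best_set = [i for _, i in cheapest] + [idx]
        bisect.insort(rewards, (f, idx))    # F^(j-1) becomes F^j
    return best_cost, sorted(best_set)
```

For example, `select_peers([(1.0, 5.0), (2.0, 1.0), (3.0, 1.0), (4.0, 0.5), (10.0, 0.1)], 2, 1.0)` selects the second and third peers: the fastest peer is too expensive and the cheapest peer is too slow.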
The computational complexity of the algorithm in Table 1 can be analyzed as follows: the time complexities of steps (1), (2), (3), (4) and (5) are $m \log m$, $m$, $n \log n$, $\sum_{i=0}^{m-n} \log(n+i)$ and $m - n + 1$ respectively. The computational complexity of the algorithm is the sum over steps (1) to (5), which gives $O(m \log m)$.
5 Conclusions
We study the problem of remote query cost optimization in p2p networks. It is important for a query host to optimize the query and generate a query strategy. As the host needs only some of the candidates to execute the query, a selection algorithm is required. The objective is to select peers that provide minimal response times and have high reliability degrees. We consider a query as a cooperative task that can be performed by multiple agents, and propose the concept of a fuzzy reward to evaluate a peer's online reliability relative to the query host. We also adopt a peer cost model to estimate the dynamic cost of each peer, and use this model to calculate the query execution time. Taking the fuzzy reward and the dynamic cost model into consideration, we can utilize run-time information to optimize the query. We argue that existing measurements and previously proposed query cost models for p2p systems do not capture the complex time-varying nature of availability in the dynamic p2p environment, and we propose the concept of relative reliability to deal with the query optimization problem. As query optimization in a p2p environment is a very complex problem, extensive further work remains to be done.
Acknowledgement. This paper is supported by the Natural Science Fund of Shandong Province of China under Grant No. Z2004G02.
References
1. Risson, J., Moors, T.: Survey of research towards robust peer-to-peer networks search methods. Technical Report UNSW-EE-P2P-1-1, Univ. of New South Wales, Sydney, Australia (2004)
2. Wang, L.P.: Soft Computing in Communications. Springer, Berlin Heidelberg New York (2003)
3. Bosc, P., Damiani, E., Fugini, M.: Fuzzy service selection in a distributed object-oriented environment. IEEE Trans. on Fuzzy Systems 9(5) (2001) 682–698
4. Adali, S., Candan, K.S., Papakonstantinou, Y., Subrahmanian, V.: Query caching and optimization in distributed mediator systems. In: Proc. of the 1996 ACM SIGMOD Int. Conf. on Management of Data, Montreal, Canada (1996) 137–148
5. Zhu, Q., Larson, P.A.: Solving local cost estimation problem for global query optimization in multidatabase systems. Distributed and Parallel Databases 6 (1998) 373–420
6. Khan, L., McLeod, D., Shahabi, C.: An Adaptive Probe-Based Technique to Optimize Join Queries in Distributed Internet Databases. J. Database Manag. 12(4) (2001) 3–14
7. Ling, B., Ng, W.S., Shu, Y., Zhou, A.: Fuzzy Cost Modeling for Peer-to-Peer Systems. In: LNCS, Vol. 2872 (2004) 138–143
8. Bourne, R.A., Excelente-Toledo, C.B., Jennings, N.R.: Run-Time Selection of Coordination Mechanisms in Multi-Agent Systems. In: Proc. 14th European Conf. on Artificial Intelligence, Berlin, Germany (2000) 348–352
9. Jang, J.S.R., Sun, C.T., Mizutani, E.: Neuro-fuzzy and soft computing. Prentice Hall (1997)
KFCSA: A Novel Clustering Algorithm for High-Dimension Data
Kan Li and Yushu Liu
Dept. of Computer Science and Engineering, Beijing Institute of Technology, Beijing, China, 100081
[email protected] [email protected]
Abstract. Classical fuzzy c-means and its variants do not cluster well when the characteristics of the samples are not distinct, and these algorithms easily run into locally optimal solutions. To address these drawbacks, a novel Mercer kernel based fuzzy clustering self-adaptive algorithm (KFCSA) is presented. The Mercer kernel method is used to map the input data implicitly into a high-dimensional feature space through a nonlinear transformation. A self-adaptive algorithm is proposed to decide the number of clusters, which is not given in advance but obtained automatically through a validity measure function. In addition, an attribute reduction algorithm is used to decrease the number of attributes before high-dimensional data are clustered. Finally, experiments indicate that KFCSA may achieve better performance.
1 Introduction

Clustering analysis groups data points according to some distance or similarity measure so that objects in a cluster have high similarity. Popular methods such as c-means, fuzzy c-means and its variants [1-5] represent clusters through centroids by optimizing the squared error function in the input space. If the separation boundaries among clusters are nonlinear, the performance of these methods decreases. Outliers also degrade the clustering. The fuzzy c-means algorithm does not cluster well when the characteristics of the samples are not distinct, and current fuzzy clustering algorithms easily run into locally optimal solutions because the number of clusters needs to be determined in advance. To overcome these disadvantages of fuzzy c-means, a novel kernel based fuzzy clustering self-adaptive algorithm (KFCSA) is proposed in this paper. The Mercer kernel method is introduced into the fuzzy c-means method; it maps the input data implicitly into a high-dimensional feature space through a nonlinear mapping. A validity measure function is used to adjust the number of clusters iteratively, instead of requiring the number of clusters in advance.
2 Attribute Reduction of High-Dimension Data

A rough set based attribute reduction algorithm is used here to reduce the number of attributes in the terrain database (high-dimension data) in order to improve the speed of clustering. Before a decision table is reduced, it should be checked for consistency. An inconsistent table is divided into two parts: the completely consistent table and the incompletely consistent table.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 531–536, 2005. © Springer-Verlag Berlin Heidelberg 2005

2.1 Attribute Reduction of Consistent Decision Table

According to database theory, redundancy and dependency should be kept as low as possible in databases. Our algorithm uses this rule as the criterion for attribute reduction: the attribute set with the minimum average relevance is the final result of the reduction. Conditional entropy is used to judge the relevance of attributes.

Algorithm 1. RSAR (Rough Set Based Attribute Reduction Algorithm)
Input. Decision table $S = \{U, A, V, f\}$, $A = C \cup D$, with condition attributes $C$ and decision attributes $D$.
Output. A set of attributes REDU.
Method.
Step 1. Compute the core $C_0$ from the discernibility matrix $M = (m_{ij})_{n \times n}$, where $i, j = 1, 2, \ldots, n$; set REDU $= C_0$.
Step 2. Combine the matrix elements $m_{ij}$ that contain no core attribute into a conjunctive expression, $\bigwedge m_{ij}$.
Step 3. Convert this expression into disjunctive form. Its terms $S_i$ give the sets of attribute reductions $R_i = \{\vee S_i,\ i = 1, 2, \ldots, n\}$.
Step 4. For each $R_i$, compute the relevance of the attributes in $R_i$ from the conditional entropy

$$H(B|A) = -\sum_{i=1}^{n} p(a_i) \sum_{j=1}^{m} p(b_j | a_i) \log p(b_j | a_i),$$

where $A$ and $B$ are elements of $R_i$, $U/IND(A) = \{a_1, a_2, \ldots, a_n\}$ and $U/IND(B) = \{b_1, b_2, \ldots, b_m\}$. Let $R_i = R_i \cup C_0$; the set with the minimum average attribute relevance is REDU.

2.2 Results

The terrain data include the attributes DEM, water area, concealment, vegetation, barrier, shelter, distance and traffic capacity. The data are from the city of Xiamen, which is located in the south of China, and cover the range from (117°38'14'', 24°33'30'') to (117°52'30'', 24°25'14''). After the attribute reduction algorithm is applied to this high-dimensional data, the attributes DEM, water area, concealment, vegetation and traffic capacity are used for clustering.
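The conditional entropy used in Step 4 of Algorithm 1 can be estimated from two attribute columns of a decision table as in the following sketch. It assumes that each equivalence class of $U/IND(A)$ corresponds to one distinct value of the column (a tuple of values can be passed for a set of attributes) and that the logarithm is natural; the paper does not state the base, and the function name is hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(a_values, b_values):
    """Estimate H(B|A) = -sum_i p(a_i) sum_j p(b_j|a_i) log p(b_j|a_i).

    a_values, b_values: equally long sequences holding, for every object of
    the universe U, its value (equivalence class) under A and under B.
    """
    if len(a_values) != len(b_values):
        raise ValueError("columns must have the same length")
    total = len(a_values)
    count_a = Counter(a_values)
    count_ab = Counter(zip(a_values, b_values))
    h = 0.0
    for (a, _b), n_ab in count_ab.items():
        p_a = count_a[a] / total            # p(a_i)
        p_b_given_a = n_ab / count_a[a]     # p(b_j | a_i)
        h -= p_a * p_b_given_a * math.log(p_b_given_a)
    return h
```

A value of 0 means that $B$ is fully determined by $A$ (maximal relevance between the two attributes), while larger values indicate weaker dependence.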
3 Feature Space Clustering

3.1 Algorithm 2. KFCSA (Kernel Based Fuzzy C-Means Self-adaptive Algorithm)

Clustering of data in a feature space has been proposed previously [6], where c-means was expressed by means of the kernel trick. That was the hard-clustering case. In this paper, we apply the kernel function to fuzzy c-means.
A kernel function is chosen, which is equivalent to mapping the data into a high-dimensional space called the feature space ($x \to \phi(x)$). The mapping is obtained implicitly by replacing the inner product. In the feature space, a cluster center can be expressed as

$$c_j = \sum_{k=1}^{n} \gamma_{jk}\, \phi(x_k) \tag{1}$$

The objective function is defined as

$$J = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{\alpha} \left\| \phi(x_i) - \sum_{k=1}^{n} \gamma_{jk}\, \phi(x_k) \right\|^2 = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{\alpha} \left[ k(x_i, x_i) - 2\gamma_j^T k_i + \gamma_j^T K \gamma_j \right] \tag{2}$$

where $\gamma_j = (\gamma_{j1}, \gamma_{j2}, \ldots, \gamma_{jn})^T$, $K = (k_1, k_2, \ldots, k_n)$ is the kernel matrix, and $k_i = (k_{i1}, k_{i2}, \ldots, k_{in})^T$. The update equations for $u_{ij}$ and $\gamma_j$ are

$$u_{ij} = \frac{1}{\sum_{g=1}^{c} \left( \dfrac{d_{ij}}{d_{ig}} \right)^{\frac{1}{\alpha - 1}}}, \qquad \gamma_j = \frac{\sum_{i=1}^{n} u_{ij}^{\alpha} K^{-1} k_i}{\sum_{i=1}^{n} u_{ij}^{\alpha}} \tag{3}$$

where $d_{ij} = k(x_i, x_i) - 2\gamma_j^T k_i + \gamma_j^T K \gamma_j$.

The kernel based fuzzy c-means self-adaptive algorithm is as follows:
Step 1. Initialize the positive parameters $\alpha$, $\varepsilon$ and the iteration counter $m = 1$; initialize $\gamma_j$; fix $c = 2$.
Step 2. Compute the kernel matrix using the Gaussian kernel $k(x_i, x_j) = e^{-q \|x_i - x_j\|^2}$.
Step 3. Update $u_{ij}^{(m)}$; calculate $\gamma_j^{(m)}$ again.
Step 4. If $\max |u_{ij}^{(m)} - u_{ij}^{(m-1)}| \le \varepsilon$, stop; else set $m = m + 1$ and go to Step 3.
Step 5. If the validity measure function $s$ attains its minimum value, the clustering is finished; else set $c = c + 1$ and go to Step 3.

3.2 Clustering Validity Measure Analysis

A validity measure function is used to estimate the number of clusters. Girolami used the block diagonal structure of the kernel matrix to determine the number of clusters [7]. Girolami's method was used with the c-means method, and other researchers applied it to fuzzy c-means. The method has a disadvantage, however: when the data in the data set are not clearly distinguishable, the block diagonal structure of the matrix is not obvious either, and the method easily arrives at a locally optimal solution. In this paper, the number of clusters is determined by the self-adaptive algorithm and need not be given in advance. An initial value of the number of clusters is assumed, and a validity measure function is then used to adjust the number of clusters iteratively. The validity measure function estimates the correctness of the number of clusters.
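The compactness of the clustering is expressed as

$$\mathrm{comp} = \frac{1}{n} \sum_{i=1}^{n}\sum_{j=1}^{c} \lambda_{ij}\, u_{ij}^{2} \left[ k(x_i, x_i) - 2\gamma_j^T k_i + \gamma_j^T K \gamma_j \right] \tag{4}$$

where $K$ is the kernel matrix, $k_i$ is its $i$th column, and $\lambda_{ij}$ is used to judge outliers:

$$\lambda_{ij} = \begin{cases} 1, & u_{ij} > u_{lj},\ i \ne l \\ 0, & \text{otherwise.} \end{cases}$$

If $\lambda_{ij} = 0$, the data point is an outlier and is deleted. The factor $u_{ij}^{2}$ is introduced into the compactness function to strengthen the compactness of the clusters. The separability of the clustering is expressed as

$$\mathrm{sep} = \min_{i \ne j} \left( \gamma_i^T K \gamma_i - 2\gamma_i^T K \gamma_j + \gamma_j^T K \gamma_j \right) \tag{5}$$

where $i$ and $j$ here range over the clusters. The validity measure function is defined as

$$s = \frac{\mathrm{comp}}{\mathrm{sep}} = \frac{\dfrac{1}{n} \sum_{i=1}^{n}\sum_{j=1}^{c} \lambda_{ij}\, u_{ij}^{2} \left[ k(x_i, x_i) - 2\gamma_j^T k_i + \gamma_j^T K \gamma_j \right]}{\min_{i \ne j} \left( \gamma_i^T K \gamma_i - 2\gamma_i^T K \gamma_j + \gamma_j^T K \gamma_j \right)} \tag{6}$$

where comp denotes the compactness of the clustering and sep denotes its separability. After the number of clusters $c$ is determined, the data set can be divided into $c$ clusters such that the weighted sum of squares from the data in the clusters to their cluster centers reaches its minimum value.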
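To make Steps 1 to 5 and the validity measure concrete, the following pure-Python sketch implements one possible reading of them. Several points are assumptions of ours rather than part of the paper: the cluster centers are initialized at evenly spaced data points; since $k_i$ is the $i$th column of $K$, the term $K^{-1} k_i$ is taken to be the $i$th unit vector (which requires $K$ to be invertible), so that $\gamma_{ji} = u_{ij}^{\alpha} / \sum_l u_{lj}^{\alpha}$; outlier removal is omitted ($\lambda_{ij} = 1$); $u_{ij}^2$ is read as a multiplicative weight in the compactness; and Step 5 is simplified to a scan over $c = 2, \ldots, c_{max}$. All function names are hypothetical.

```python
import math

def gaussian_kernel_matrix(points, q):
    """Step 2: K[i][l] = exp(-q * ||x_i - x_l||^2)."""
    return [[math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in points] for x in points]

def quad(K, g, h):
    """Bilinear form g^T K h."""
    n = len(K)
    return sum(g[i] * K[i][l] * h[l] for i in range(n) for l in range(n))

def distances(K, gamma):
    """d[i][j] = k(x_i, x_i) - 2 gamma_j^T k_i + gamma_j^T K gamma_j."""
    n = len(K)
    self_terms = [quad(K, g, g) for g in gamma]
    return [[K[i][i] - 2.0 * sum(g[l] * K[l][i] for l in range(n)) + s
             for g, s in zip(gamma, self_terms)] for i in range(n)]

def memberships(d, alpha, tiny=1e-12):
    """u_ij of equation (3); `tiny` (an assumption) avoids division by zero."""
    p = 1.0 / (alpha - 1.0)
    return [[1.0 / sum((max(dij, tiny) / max(dig, tiny)) ** p for dig in row)
             for dij in row] for row in d]

def coefficients(u, alpha):
    """gamma_j of equation (3), simplified with K^{-1} k_i = e_i."""
    gamma = []
    for j in range(len(u[0])):
        w = [row[j] ** alpha for row in u]
        total = sum(w)
        gamma.append([x / total for x in w])
    return gamma

def kernel_fcm(points, c, q=1.0, alpha=2.0, eps=1e-6, max_iter=300):
    """Steps 1-4 for a fixed number of clusters c >= 2."""
    n = len(points)
    K = gaussian_kernel_matrix(points, q)
    # Assumed initialisation: centres start at c evenly spaced data points.
    seeds = [round(j * (n - 1) / (c - 1)) for j in range(c)]
    gamma = [[1.0 if l == s else 0.0 for l in range(n)] for s in seeds]
    u_old = None
    for _ in range(max_iter):
        u = memberships(distances(K, gamma), alpha)
        gamma = coefficients(u, alpha)
        if u_old is not None and max(
                abs(a - b) for ra, rb in zip(u, u_old) for a, b in zip(ra, rb)) <= eps:
            break
        u_old = u
    return K, u, gamma

def validity(K, u, gamma):
    """s = comp / sep of equation (6), with lambda_ij = 1 (no outlier removal)."""
    n, c = len(u), len(gamma)
    d = distances(K, gamma)
    comp = sum(u[i][j] ** 2 * d[i][j] for i in range(n) for j in range(c)) / n
    sep = min(quad(K, gamma[i], gamma[i]) - 2.0 * quad(K, gamma[i], gamma[j])
              + quad(K, gamma[j], gamma[j])
              for i in range(c) for j in range(c) if i != j)
    return comp / sep if sep > 0.0 else math.inf

def self_adaptive(points, c_max, **kwargs):
    """Step 5, simplified: try c = 2..c_max and keep the c with the smallest s."""
    scores = {c: validity(*kernel_fcm(points, c, **kwargs))
              for c in range(2, c_max + 1)}
    return min(scores, key=scores.get), scores
```

On two well-separated groups of points, splitting one group into two clusters makes sep very small while comp stays of the same order, so $s$ grows and the scan keeps $c = 2$.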
4 Experiments

4.1 Test in the Standard Data Set

In order to verify the validity and feasibility of the clustering algorithm, experiments are made with fuzzy c-means (FCM) and with our algorithm. The experimental data are the iris data set (from the UCI Repository), from which we randomly select 20 data points. To distinguish them, cluster centers and data points are drawn with different labels in the figures. In Fig. 1, the number of clusters is fixed in advance (c = 3). The result of the FCM algorithm shows that the cluster centers easily run into a locally optimal solution, and the results depend on the selection of the initial cluster centers. In Fig. 2, the KFCSA algorithm is used. The number of clusters (c = 3) is obtained automatically by the self-adaptive method, and three clusters can be seen in Fig. 2. The figures show that our algorithm clusters more effectively than the fuzzy c-means algorithm, and the error rate of our algorithm is lower than that of fuzzy c-means.
Fig. 1. FCM clustering

Fig. 2. KFCSA clustering
4.2 Application in the High Dimensional Data

Terrain is analyzed with the KFCSA algorithm to determine its characteristics. The attributes DEM, water area, concealment, vegetation and traffic capacity, selected from the terrain data by the RSAR algorithm, are used for the clustering analysis. The terrain data again cover the range from (117°38'14'', 24°33'30'') to (117°52'30'', 24°25'14''), which is located in the city of Xiamen in China. The result of the terrain analysis by the KFCSA algorithm is shown in Fig. 3. The two ellipses in Fig. 3 indicate the two clusters selected by the KFCSA algorithm.
Fig. 3. KFCSA applied in high dimensional data
5 Conclusions

The validity of a clustering algorithm depends on how distinct the characteristics of the clusters are. When the distinction among clusters is not obvious, or the clusters overlap, the classical fuzzy c-means method cannot handle it well. In this paper, a Mercer kernel based fuzzy clustering self-adaptive algorithm is proposed. Data in the input space are mapped by the Mercer kernel into the feature space, where the characteristics of the data may be strengthened. The algorithm can therefore cluster data whose distinctions are faint. In our algorithm, a self-adaptive method is used to determine the number of clusters automatically. Experiments show that our proposed algorithm achieves better performance than classical clustering algorithms. In addition, an attribute reduction algorithm is used to decrease the number of attributes before the terrain data are clustered.
References
1. MacQueen, J.: Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symposium (1967) 281–297
2. Hartigan, J., Wong, M.: A K-means clustering algorithm. Applied Statistics, Vol. 28 (1979) 100–108
3. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum Press (1981)
4. Jain, A., Dubes, R.: Algorithms for clustering data. Prentice Hall (1988)
5. Wallace, R.: Finding natural clusters through entropy minimization. Ph.D. Thesis. Carnegie-Mellon University, CS Dept (1989)
6. Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, Vol. 10(5) (1998) 1299–1319
7. Girolami, M.: Mercer kernel based clustering in feature space. IEEE Trans. Neural Networks, Vol. 13(3) (2002) 780–784
An Improved VSM Based Information Retrieval System and Fuzzy Query Expansion
Jiangning Wu¹, Hiroki Tanioka², Shizhu Wang³, Donghua Pan¹, Kenichi Yamamoto², and Zhongtuo Wang¹
¹ Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China
{jnwu, gyise, wangzt}@dlut.edu.cn
² Department of R&D Strategy, Justsystem Corporation, Tokushima, 771-0189, Japan
{hiroki_tanioka, kenichi_yamamoto}@justsystem.co.jp
³ Dalian Justsystem Co., Ltd, Dalian, 116024, China
[email protected]
Abstract. In this paper, we propose an improved information retrieval model in which the integration of modification-words and head-words is introduced into the representation of user queries and into the traditional vector space model. We show how to calculate the weights of the combined terms in the vectors. We also propose a new strategy for constructing the thesaurus in a fuzzy way for query expansion. With the developed information retrieval system, we can retrieve documents in a relatively narrow search space and at the same time extend the coverage of the retrieval to related documents that do not necessarily contain the same terms as the given query. Experiments testing the retrieval effectiveness have been carried out on benchmark corpora. The experimental results show that the improved information retrieval system is capable of improving retrieval performance in both precision and recall.
1 Introduction

With the information explosion on the Internet, users encounter a huge amount of information junk when they retrieve documents. There is therefore a great need for tools and methods that filter such junk out while retaining the documents that users really want. The information retrieval (IR) system is considered one of the good tools for solving these problems. So far, many models have been proposed for constructing effective information retrieval systems, of which the vector space model (VSM) [1] [2] [3] is the most influential. To date, this model leads the others in terms of performance [1]. It is therefore adopted in our study to construct the improved IR system.

The major problem with VSM comes from the oversimplicity of its purely term-based representation of information [4]. In this model, keywords are identified, pulled out of context, and further processed to generate term vectors. Unfortunately, independent keywords cannot adequately capture the document contents, which results in poor retrieval performance. This motivates us to find a new way of representing information. It should be noted that, according to their parts of speech, some of the identified keywords are nouns and verbs and some are adjectives and adverbs. From the grammatical point of view, an adjective or adverb that appears alone carries no actual sense. In other words, adjectives and adverbs are constraint words rather than independent concepts. In this paper, adjectives and adverbs are called modification-words (MWs), and nouns and verbs are called head-words (HWs). Ignoring the constraints imposed by MWs results in many irrelevant documents being retrieved through independent, meaningless modification-words. To strengthen the role of MWs in user queries, a new information representation method is proposed in this paper, which integrates MWs with HWs to form new combined terms. The combined terms come somewhat closer to what the user requests with the given query.

Apart from the above problem with VSM, there exist some other problems to be solved. In the traditional VSM, the system's relevance judgment is based on the assumption that a query and a document are related to each other only if they share words. However, simply by examining the terms that a document and a query share, it is still difficult to determine whether the document is relevant to the user. The difficulty lies in the fact that, on the one hand, most terms have multiple meanings (polysemy) and, on the other hand, some concepts can be described by more than one term (synonymy). Therefore, many unrelated documents may be included in the answer set because they match some of the query terms, while some relevant documents may not be retrieved because they do not contain any of the exact query terms [5]. To handle these problems, we propose a new method in which MWs and HWs are partially expanded based on the developed fuzzy synonym thesauri, and the expansion MWs and expansion HWs are then recombined for the search process according to the correlations between them.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 537–546, 2005. © Springer-Verlag Berlin Heidelberg 2005
Based on the proposed methods, we develop a new IR system that can retrieve the relevant documents in a relatively narrow search space and meanwhile extend the coverage of the retrieval to the related documents that do not necessarily contain the same words as the given query. The experimental results show that the retrieval results obtained from our proposed IR system have improvements in two main measures, precision and recall.
2 System Architecture

The flowchart of the proposed IR system is shown in Figure 1. The system consists mainly of three processing stages: the query expanding stage, the document representing stage, and the query-document matching stage. During the first stage, MWs and HWs are first identified by means of the syntactic analyzer. If the given query has no MWs, the system follows path 2 only. In this case, the proposed IR system returns the same retrieval results as a normal search (we call this process normal search in the paper). If, on the other hand, the given query contains MWs, the proposed system follows path 1. In this case, MWs and HWs are first integrated, and then MWs and HWs are expanded separately and recombined into new terms according to the correlations between them and the fuzzy synonym thesauri. The system therefore returns retrieval results that differ from those of normal search (we call this process modification & expansion search in the paper). During the second stage, documents are represented by means of the modified VSM, and the document vectors are formed. During the last stage, a structured search is conducted to obtain the similarities between queries and documents and the scores of the candidate documents. Finally, the retrieved documents whose scores exceed the predefined threshold are returned to the end user.
[Flowchart: user interface (input of a query in natural language); NLP parsing of the given query and of the document corpus; indexing of the parsed keywords and of the documents with MW tagging; test "Is there MW?" (no: path 2; yes: path 1, integrate MWs with HWs, then expand MWs and expand HWs using the fuzzy thesaurus); formation of the document vector and of the expanded query vector; matching by the modified VSM with calculation of the similarity value (CB search); ranking of the candidate documents; display and storage of the retrieval results in the result database.]

Fig. 1. The flowchart of the proposed IR system
3 Methods

3.1 General VSM

In the general VSM [2], each document $d_i$, $i \in [1, m]$, where $m$ is the total number of documents in the corpus, can be represented as a weighted vector

$$d_i = \{w_{1i}, w_{2i}, \ldots, w_{ki}, \ldots, w_{li}\}^T, \tag{1}$$

where $w_{ki}$ is the weight of the term $t_k$ in document $d_i$, which is a nonnegative value, $l$ is the number of indexing terms contained in document $d_i$, and $T$ is the transpose operator. Similarly, a query can be represented as a vector in which the weights indicate the importance of each term in the overall query. The query vector $q_j$, $j \in [1, n]$, where $n$ is the total number of queries, can thus be written as

$$q_j = \{w_{1j}, w_{2j}, \ldots, w_{kj}, \ldots, w_{lj}\}^T, \tag{2}$$

where $w_{kj}$ is the weight of term $t_k$ in query $q_j$, which is a nonnegative value; $l$ and $T$ have the same meanings as above. The similarity $s(d_i, q_j)$ between the two vectors can be measured by the cosine of the angle between $d_i$ and $q_j$, that is

$$s(d_i, q_j) = \cos(d_i, q_j). \tag{3}$$

3.2 Modified VSM
The basic idea behind the modified VSM is to recalculate the weights of the combined terms and the similarity values between the document vectors and query vectors presented in Formulas (1) and (2). The classic tf_idf scheme is a common way to weigh the importance and uniqueness of a term in a document. In this study, we use the following formulas, provided by Justsystem Corporation, to define tf and idf respectively:

$$tf_{ki} = 0.5 + 0.5 \times (freq / max\_freq) \tag{4}$$

and

$$idf_k = 1.0 + \log(total\_doc / dist), \tag{5}$$

where $freq$ is the count of the term $t_k$ in document $d_i$, $max\_freq$ is the maximum count of any term in the same document, $total\_doc$ is the total number of documents in the collection, and $dist$ is the number of documents that contain the term $t_k$. The weight of the combined term can be calculated by the following formula:

$$w_{ki} = tf_{ki} \times idf_k. \tag{6}$$

Normalizing Formula (6), we finally get

$$w_{ki} = \frac{tf_{ki} \times idf_k}{\sqrt{\sum_{k=1}^{l} (tf_{ki} \times idf_k)^2}}, \tag{7}$$

where $l$ is the size of the indexing term set in document $d_i$.
The weight of term $t_k$ in the query $q_j$, denoted $w_{kj}$, is defined in the same way as $w_{ki}$ (that is, from $tf_{kj} \times idf_k$). Once the weights of all indexing combined terms in both the document $d_i$ and the query $q_j$ are determined, the similarity between $d_i$ and $q_j$ can be measured by the cosine of the angle between them, that is

$$s(d_i, q_j) = \frac{\sum_{k=1}^{l} w_{ki} \times w_{kj}}{\sqrt{\sum_{k=1}^{l} w_{ki}^2} \times \sqrt{\sum_{k=1}^{l} w_{kj}^2}}. \tag{8}$$
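Formulas (4) to (8) can be illustrated with the following small Python sketch. It represents each document or query as a list of (combined) terms and each vector as a sparse dictionary; these data structures, the function names, the use of the natural logarithm, and the choice to skip query terms that occur in no document are assumptions made for the illustration, not details given in the paper.

```python
import math
from collections import Counter

def document_frequencies(corpus):
    """dist for every term: the number of documents that contain it."""
    df = Counter()
    for terms in corpus:
        df.update(set(terms))
    return df

def weight_vector(terms, df, total_doc):
    """Normalised tf-idf weights of one document or query, Formulas (4)-(7)."""
    counts = Counter(terms)
    max_freq = max(counts.values())
    raw = {}
    for term, freq in counts.items():
        if df[term] == 0:
            continue                                  # term unseen in the corpus
        tf = 0.5 + 0.5 * (freq / max_freq)            # Formula (4)
        idf = 1.0 + math.log(total_doc / df[term])    # Formula (5), natural log assumed
        raw[term] = tf * idf                          # Formula (6)
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}  # Formula (7)

def cosine(d, q):
    """Formula (8): cosine of the angle between two sparse weight vectors."""
    dot = sum(w * q.get(term, 0.0) for term, w in d.items())
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0
```

Because Formula (7) already normalizes each vector to unit length, the denominator in `cosine` is close to 1 for vectors produced by `weight_vector`; it is kept so that the function also works on unnormalized weights.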
3.3 Query Expansion and Fuzzy Thesaurus

Query expansion is a natural idea when, as is often the case in practice, the user's initial requests are very brief, regardless of whether the initial request terms are very specific or not. Enlarging requests allows both for more discriminating retrieval, through matches on several terms, and for more file coverage, through getting a match at all. Query expansion using thesauri has proved to be an effective way of improving the performance of an IR system. Briefly, there are two types of thesauri: hand-crafted thesauri and corpus-based thesauri [9]. The latter are generated from term co-occurrence statistics. In our study, we construct the fuzzy thesaurus by calculating the simultaneous occurrences of terms and the term-term similarities in thesauri derived from WordNet (http://www.cogsci.princeton.edu/~wn/).

Let us now consider in detail how the expansion terms and their corresponding weights are determined in a fuzzy way. Suppose that the given query $q_j$ contains $l$ terms describing $l$ aspects of the query, and that all terms in the given query can be expanded according to the fuzzy thesauri. If all the expansion terms are added directly to the original query vector, the new expanded query vector takes the form

$$q_j = \{w_{1j}, w_{1j,1}, w_{1j,2}, \ldots, w_{1j,N_1}, w_{2j}, \ldots, w_{sj}, \ldots, w_{lj}, w_{lj,1}, \ldots, w_{lj,N_l}\}^T, \tag{9}$$

where $w_{lj,N_l}$ is the weight of an expansion term related to the term $t_l$ in query $j$, $N_l$ is the number of expansion terms corresponding to the term $t_l$, and $T$ is the transpose operator. To calculate the weights of the expansion terms, we must first obtain the nearness degrees between the original term and the expansion terms. In this paper, they are determined mainly from a term co-occurrence measure: we suppose that two terms that occur frequently together in the same document are related to the same concept. The similarity of the original term $t_k$ in the query and the expansion term $t_e$ can therefore be determined by a term-term relevance coefficient $r_{ke}$ computed from the thesauri derived from WordNet, such as the Tanimoto coefficient [9]:

$$r_{ke} = \frac{n_{ke}}{n_k + n_e - n_{ke}}, \tag{10}$$
where $n_k$ is the number of synonym sets in the thesauri that contain the term $t_k$, $n_e$ is the number of synonym sets that contain the term $t_e$, and $n_{ke}$ is the number of synonym sets that contain both terms. This relevance coefficient represents the nearness degree of the term $t_k$ to the expansion term $t_e$ and takes values in the interval [0, 1]. All nearness values for the term $t_k$ and its corresponding expansion terms form a relevance matrix $R_{ke}$, whose rows and columns are indexed by the terms $t_1, t_2, \ldots, t_{N_k}$:

$$R_{ke} = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1N_k} \\ r_{21} & r_{22} & \cdots & r_{2N_k} \\ \vdots & \vdots & r_{ke} & \vdots \\ r_{N_k 1} & r_{N_k 2} & \cdots & r_{N_k N_k} \end{bmatrix}, \tag{11}$$

where the element $r_{ke}$ represents the nearness degree of term $t_k$ to term $t_e$, $r_{ke} \in [0, 1]$, and $N_k$ is the number of expansion terms. For $k \ne e$, $r_{ke} = r_{ek}$; for $k = e$, $r_{kk} = r_{ee} = 1$. In real-world applications, not all expansion terms are important enough to the document collection. From this point of view, we define a membership value between document $d_i$ and the expansion terms in a fuzzy way, based on the term correlation matrix $R_{ke}$. In this research, the membership value is defined as

$$\mu_{kd_i} = 1 - \prod_{\substack{m=1 \\ m \ne k}}^{N_k} (1 - r_{km}), \quad \text{for } r_{km} \in R_{ke}, \tag{12}$$

where $\mu_{kd_i}$ ($k = 1, \ldots, N_k$) denotes the membership degree of term $t_k$ to document $d_i$, which is computed as the complement of the negated algebraic product over all expansion terms involved, and $r_{km}$ takes its values from the term correlation matrix $R_{ke}$. Adopting an algebraic sum over all expansion terms with respect to the given index term $t_k$ (instead of the classical maximum function) allows a smooth transition of the values of the $\mu_{kd_i}$ factor. Once the relatively important expansion terms are determined, the weighting method can be applied to them. For a query $j$ with $q_j = \{w_{1j}, w_{2j}, \ldots, w_{kj}, \ldots, w_{lj}\}^T$ and an expansion term $t_e$, the similarity between $q_j$ and $t_e$ can be defined as follows [11]:

$$s(q_j, t_e) = \sum_{t_{kj} \in q_j} w_{kj} \times r_{ke}, \quad \text{for } k \in [1, l], \tag{13}$$

where $w_{kj}$ is the weight of term $t_k$ when it is contained in the query $j$, $r_{ke}$ is the nearness value for the term $t_k$ and the expansion term $t_e$, and $l$ is the total number of indexing terms in the collection.
The weight of the actual expansion term $t_e$ with respect to the query vector $q_j$, denoted $w_{ej}$, is defined from $s(q_j, t_e)$ as

$$w_{ej} = \frac{s(q_j, t_e)}{\sum_{t_{kj} \in q_j} w_{kj}}. \tag{14}$$

From the modified weights of all expansion terms corresponding to the original term $t_k$, an expansion term vector can then be formed:

$$q_k = \{w_{ej,1}, w_{ej,2}, \ldots, w_{ej,N_e}\}^T, \tag{15}$$

where $N_e$ is the number of expansion terms with respect to the original term $t_k$. By adding the above vector to the original query vector, the expanded query vector is obtained.
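The quantities in Formulas (10), (12), (13) and (14) can be computed as in the following sketch. The thesaurus is modeled here simply as a list of synonym sets, and a row of the relevance matrix as a list of nearness values; these representations and all function names are hypothetical and chosen only for illustration.

```python
def tanimoto(synsets, term_k, term_e):
    """Formula (10): r_ke = n_ke / (n_k + n_e - n_ke) over a list of synonym sets."""
    n_k = sum(1 for s in synsets if term_k in s)
    n_e = sum(1 for s in synsets if term_e in s)
    n_ke = sum(1 for s in synsets if term_k in s and term_e in s)
    denom = n_k + n_e - n_ke
    return n_ke / denom if denom else 0.0

def term_membership(r_row, k):
    """Formula (12): 1 - prod over m != k of (1 - r_km), for row k of R_ke."""
    product = 1.0
    for m, r_km in enumerate(r_row):
        if m != k:
            product *= 1.0 - r_km
    return 1.0 - product

def expansion_weight(query_weights, nearness_to_e):
    """Formulas (13)-(14): w_ej = sum_k w_kj * r_ke / sum_k w_kj.

    query_weights: dict mapping each query term t_k to its weight w_kj.
    nearness_to_e: dict mapping query terms to r_ke for one expansion term t_e
                   (terms missing from the dict are treated as r_ke = 0).
    """
    similarity = sum(w * nearness_to_e.get(t, 0.0) for t, w in query_weights.items())
    return similarity / sum(query_weights.values())
```

With these definitions, an expansion term that is near to a heavily weighted query term receives a large weight, and a term that shares no synonym set with any query term receives weight 0.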
4 Effectiveness Evaluation

We use a small collection, LA-times, contained in the TREC corpus (http://trec.nist.gov/) to examine the effectiveness of the proposed IR system. The queries and a list of documents relevant to each query are randomly chosen from this collection; their statistics are listed in Table 1. To improve the performance of the proposed IR system, we create a fuzzy synonym thesaurus that stores a number of adjectives used to expand the MWs occurring in the user queries and the documents. We use WordNet to determine the related terms, considering only the synonymy relations implied in WordNet synsets. In our study, precision and recall are used to evaluate the effectiveness of the proposed IR system.

To verify the correctness and effectiveness of the proposed IR system, three types of experiments were implemented first: Normal Search (NS), Head-word WeighTing Search (HWTS), and Head-word WeighTing plus Expansion Search (HWTplusES). In NS, only the CB Search tool developed by Justsystem Corp. is involved, and it serves as the baseline for the other two search approaches. The average precision and recall obtained with NS for the LA-times collection are presented in Table 2. In HWTS, the CB Search tool is also used alone. Unlike NS, however, HWTS adjusts the weights of head-words by adding an importance factor that reflects the importance of each head-word in the examined query, so that the weights of head-words (mainly nouns in our experiments) are raised. As a result, precision increases over NS for the LA-times collection (see Table 2). To test the fuzzy thesaurus and the query expansion method described previously, a further experiment named HWTplusES, based on HWTS, was designed and implemented. In this experiment, the VSM modifying module and the document retrieving module are both used. The results obtained with HWTplusES for the LA-times collection are also summarized in Table 2.

In addition, to reveal how modifiers constrain their associated head-words (here, adjectives and nouns), an experiment named Modifier Adjective Search (MAS) was conducted for the LA-times collection by implementing combined search. The results in Table 2 indicate that the average precision of the proposed IR system rises by 5.16 percentage points over the normal search (from 20.01% to 25.17%). Figure 2 illustrates the 11-point precision-recall curves of the three searches on the LA-times collection for all 108 queries. Besides the precision-recall curves, we also draw a bar graph for the LA-times collection using all 108 queries to compare NS, HWTS and HWTplusES in another way (see Figure 3), in which each bar corresponds to one point in Figure 2.

Table 1. Statistics for the selected collection
Collection name   Number of documents   Number of queries   Size (Mbytes)
LA-times          3,319                 108                 15
Table 2. Average precision and recall obtained from four different search approaches

Name of approach   Avg. Precision (%)   Avg. Recall (%)
NS                 20.01                73.73
HWTS               20.10                73.80
HWTplusES          20.08                73.71
MAS                25.17                34.84
Fig. 2. Precision-recall curves using three different searches (NS, HWTS and HWTplusES) with respect to the LA-times collection for all 108 queries (x-axis: recall; y-axis: precision)
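The 11-point precision-recall curve mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which interpolation it uses, so the sketch assumes the common convention in which the precision at recall level r is the highest precision observed at any recall greater than or equal to r. The function name `eleven_point_curve` and its inputs are hypothetical.

```python
def eleven_point_curve(ranked_relevance, total_relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query.

    ranked_relevance: list of booleans in rank order (True = relevant document).
    total_relevant:   number of relevant documents for the query (assumed > 0).
    """
    # (recall, precision) observed after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    curve = []
    for i in range(11):
        level = i / 10
        # Interpolation: best precision at any recall >= level (0 if never reached).
        reachable = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(reachable) if reachable else 0.0)
    return curve


# Toy ranking: relevant documents at ranks 1 and 3, two relevant documents in total.
example_curve = eleven_point_curve([True, False, True, False], total_relevant=2)
```

Averaging such curves over all queries would give one curve per search approach, which is the kind of comparison Fig. 2 reports.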
Fig. 3. A bar graph for the LA-times collection using all 108 queries, corresponding to the 11 precision-recall points in the two experiments HWTS and HWTplusES (x-axis: recall; y-axis: precision)
5 Concluding Remarks

In this paper, an improved IR system based on the VSM is proposed. The experimental results lead us to believe that integrating MWs with their associated HWs is indeed an effective way to improve the performance of a vector-based IR system. The proposed system not only performs well in terms of precision and recall but also provides a fuzzy thesaurus that is very useful for query expansion. The automatic approach to constructing such a fuzzy thesaurus can save considerable manpower and avoid inconsistencies resulting from human mistakes. After analyzing the method and conducting experiments with the TREC corpus, we conclude that, during text retrieval, restricting HWs by their associated MWs can improve the precision of the proposed IR system compared to the normal CB Search system, and that expanding MWs and HWs simultaneously can improve the recall of the proposed IR system compared to the normal CB Search system.
Acknowledgements

The work reported in this paper is part of an international collaborative research project sponsored by Justsystem Corporation of Japan. The authors would like to thank the master's students Huinan Ma and Jun Zhang of Dalian University of Technology, who did much work on documentation and experimentation.
References

1. Kraft D.H., Petry F.E.: Fuzzy information systems: managing uncertainty in databases and information retrieval systems. Fuzzy Sets and Systems. 90 (1997) 183-191
2. Salton G., Wong A., Yang C.S.: A vector space model for automatic indexing. Communications of the ACM. 18 (1975) 613-620
3. Salton G., Buckley C.: Term-weighting in information retrieval using the term precision model. Journal of the Association for Computing Machinery. 29 (1982) 152-170
4. Papadimitriou C.H., Raghavan P., Tamaki H., Vempala S.: Latent semantic indexing: A probabilistic analysis. Journal of Computer and System Sciences. 61 (2000) 217-235
5. Letsche T.A., Berry M.W.: Large-scale information retrieval with latent semantic indexing. Information Sciences. 100 (1997) 105-137
6. Chandren-Muniyandi R., Komputer J.S., Maklumat F.T. dan S.: Neural network: An exploration in document retrieval system. In: Proceedings of TENCON 2000. Vol. 1 (2000) 156-160
7. Ramirez C., Cooley R.: Case-based reasoning model applied to information retrieval. In: IEE Colloquium on Case Based Reasoning: Prospects for Applications. (1995) 9/1-9/3
8. Liu G.: The semantic vector space model (SVSM): A text representation and searching technique. In: Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, Vol. IV: Information Systems: Collaboration Technology Organizational Systems and Technology. Vol. 4 (1994) 928-937
9. Mandala R., Tokunaga T., Tanaka H.: Query expansion using heterogeneous thesauri. Information Processing and Management. 36 (2000) 361-378
10. Kim M.C., Choi K.S.: A comparison of collocation-based similarity measures in query expansion. Information Processing and Management. 35 (1999) 19-30
11. Qiu Y., Frei H.: Concept based query expansion. In: Proceedings of the 16th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. (1993) 160-169
12. Nie J.Y., Jin F.: Integrating logical operators in query expansion in vector space model. In: Workshop on Mathematical/Formal Methods in Information Retrieval, 25th ACM-SIGIR. (2002)
The Extraction of Image’s Salient Points for Image Retrieval

Wenyin Zhang (1,2), Jianguo Tang (1,2), and Chao Li (1)

1 Chengdu University of Information Technology, 610041, P.R. China
2 Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, P.R. China
Abstract. A new salient point extraction method operating in the Discrete Cosine Transformation (DCT) compressed domain is proposed for content-based image retrieval. Using a few significant DCT coefficients, we provide a robust, self-adaptive salient point extraction algorithm and, based on the salient points, extract 13 rotation-, translation- and scale-invariant moments as the image shape features for retrieval. Our system reduces the amount of data to be processed and only needs partial entropy decoding and partial de-quantization. Therefore, the proposed scheme can accelerate image retrieval. The experimental results also demonstrate that it improves both retrieval efficiency and effectiveness.

Keywords: Salient Point, Image Retrieval, Discrete Cosine Transformation, DCT.
1 Introduction
Digital image databases have grown enormously in both size and number over the years [1]. To reduce bandwidth and storage space, most image and video data are stored and transmitted in some compressed format. However, compressed images cannot be conveniently processed for image retrieval because they need to be decompressed beforehand, which increases both complexity and search time. It is therefore important to develop an efficient technique for retrieving the wanted images directly from the compressed domain. More and more attention is being paid to compressed-domain image retrieval techniques [2], which extract image features from the compressed data of the image. JPEG is the image compression standard [3] based on the DCT and is widely used in large image databases and on the World Wide Web because of its good compression rate and image quality. However, conventional image retrieval approaches for JPEG compressed images need full decompression, which consumes too much time and requires a large amount of computation. Recent studies [4,5,6,7,8,9,10] have brought improvements in that image features can be extracted directly in the compressed domain without full decompression.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 547–556, 2005. © Springer-Verlag Berlin Heidelberg 2005
The purpose of this paper is to propose a novel compressed image retrieval method based on salient points [11] computed in the DCT compressed domain. Salient points are interesting for image retrieval because they are located at visual focus points, so they capture local image information while reducing the amount of data to be processed. They are related to the visually most important parts of an image and lead to more discriminant image features than interest points such as corners [14]. Unlike traditional interest points, salient points should not be clustered in a few regions. Using a small number of such points instead of the whole image clearly reduces the amount of data to be processed. First, based on a small subset of important DCT coefficients, we provide a new salient point extraction algorithm that is very robust to noise, rotation, translation and scaling, and to most common image processing operations such as lightening, darkening, blurring and compression. Then, we adaptively choose some important salient points to constitute a binary salient map of the image, which represents the shape of the objects in the image. Last, we extract 13 rotation-, translation- and scale-invariant moments [12,13] from the salient map as the shape features of the image for retrieval. The remainder of this paper is organized as follows. In Section 2, we introduce work related to JPEG compressed image retrieval. In Section 3, we discuss our new scheme in detail, followed by the experimental results and analysis. Finally, Section 5 concludes the paper.
2 Related Works
Direct manipulation of compressed images and videos offers low-cost processing for real-time multimedia applications, and it is more efficient to extract features directly in the JPEG compressed domain. As a matter of fact, many JPEG compressed image retrieval methods based on DCT coefficients have been developed in recent years. Climer and Bhatia proposed a quadtree-structure-based method [4] that organizes the DCT coefficients of an image into a quadtree, so that the system can use the coefficients on the nodes of the quadtree as image features. Although such a retrieval system can effectively extract features from DCT coefficients, its main drawback is that the computation of the distances between images grows undesirably fast when the number of relevant images is big or the threshold value is large. Feng and Jiang proposed a statistical parameter-based method [5] that uses the mean and variance of the pixels in each block as image features. The mean and variance can be computed directly from the DCT coefficients. However, this system has to calculate the mean and variance of each block in each image, including the query image and the images in the database, which is a heavy computational load. Chang, Chuang and Hu provided a direct JPEG compressed image retrieval technique [6] based on the DC difference and the AC correlation. Instead of fully decompressing the images, it only needs partial entropy decoding and extracts the DC difference and the AC correlation as two image features. Although this retrieval system is faster than the methods in [4,5], it does not cope well with rotation.

The related techniques are not limited to the above three typical methods. Shneier [7] described a method of generating keys of JPEG images for retrieval, where a key is the average value of DCT coefficients computed over a window. Huang [8] rearranged the DCT coefficients and then obtained the image contour for image retrieval. B. Furht [9] and Jose A. Lay [10] made use of the energy histograms of DCT coefficients for image or video retrieval. Most image retrieval methods based on the DCT compressed domain have improved the effectiveness and efficiency of image retrieval [2]. However, most of this research has focused on global statistical feature distributions, which have limited discriminating power because they cannot capture local image information or shape information. In our proposed approach, we use the image salient points computed from a small subset of significant DCT coefficients to describe the image features. The salient points give locally outstanding information and, taken as a whole, provide the shape features of the image.
3 The Proposed Method

In this section, we describe in detail our retrieval method based on salient points. The section is organized in the following sequence: edge point detection → salient point extraction → image feature extraction.

3.1 Fast Edge Detection Based on DCT Coefficients
Edges are significant local changes in an image and are important features for image analysis because they are relevant to estimating the structure and properties of objects in the scene. Here we provide a fast edge detection algorithm in the DCT domain that computes the pixel gradients directly from the DCT coefficients to obtain edge information. Based on it, we give the salient point extraction algorithm. The 8 × 8 inverse DCT formula is as follows:

$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(x,u)\,C(y,v)\,F(u,v) \tag{1}$$

where

$$C(x,u) = c(u)\cos\frac{(2x+1)u\pi}{16}, \qquad c(u) = \begin{cases} \dfrac{1}{\sqrt{2}}, & u = 0 \\ 1, & u \neq 0 \end{cases}$$

Differentiating formula (1), we get:

$$f'(x,y) = \frac{\partial f(x,y)}{\partial x} + \frac{\partial f(x,y)}{\partial y} = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C'(x,u)\,C(y,v)\,F(u,v) + \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(x,u)\,C'(y,v)\,F(u,v) \tag{2}$$

where

$$C'(x,u) = -\frac{u\pi}{8}\,c(u)\sin\frac{(2x+1)u\pi}{16}$$

From equation (2), we can compute the pixel gradient at $(x, y)$; its magnitude is given by:

$$G(x,y) = \left|\frac{\partial f(x,y)}{\partial x}\right| + \left|\frac{\partial f(x,y)}{\partial y}\right| \tag{3}$$
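Equations (1) to (3) can be sketched directly in code. The following is a minimal, unoptimized illustration rather than the authors' implementation: it assumes the block's DCT coefficients are available as an 8 × 8 nested list `F[u][v]`, and the function names (`pixel`, `gradient_magnitude`) and the parameter `n` (use only the first n × n low-frequency coefficients) are hypothetical.

```python
import math


def c(u):
    """Normalisation factor of the 8 x 8 DCT: 1/sqrt(2) for u = 0, else 1."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def C(x, u):
    """Cosine basis term C(x, u) of equation (1)."""
    return c(u) * math.cos((2 * x + 1) * u * math.pi / 16)


def dC(x, u):
    """Derivative C'(x, u) of the basis term with respect to x (equation (2))."""
    return -(u * math.pi / 8) * c(u) * math.sin((2 * x + 1) * u * math.pi / 16)


def pixel(F, x, y, n=8):
    """Pixel value f(x, y) from equation (1), using the first n x n coefficients."""
    return 0.25 * sum(C(x, u) * C(y, v) * F[u][v]
                      for u in range(n) for v in range(n))


def gradient_magnitude(F, x, y, n=8):
    """Edge strength G(x, y) from equation (3), using the first n x n coefficients."""
    gx = 0.25 * sum(dC(x, u) * C(y, v) * F[u][v]
                    for u in range(n) for v in range(n))
    gy = 0.25 * sum(C(x, u) * dC(y, v) * F[u][v]
                    for u in range(n) for v in range(n))
    return abs(gx) + abs(gy)
```

A block that contains only a DC coefficient is flat, so its gradient is zero everywhere, which gives a quick sanity check of the sketch.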
To simplify the computation, we reduce the angle $\frac{(2x+1)u\pi}{16}$ to an acute angle [15]. Let $(2x+1)u = 8(4k+l) + q_{x,u}$, where $k$, $l$ and $q_{x,u}$ are integers with $q_{x,u} = (2x+1)u \bmod 8$, $k = \lfloor (2x+1)u/32 \rfloor$, $l = \lfloor (2x+1)u/8 \rfloor \bmod 4$, $0 \le q_{x,u} < 8$ and $0 \le l < 4$. Then:

$$\sin\frac{(2x+1)u\pi}{16} = \sin\frac{\bigl(8(4k+l)+q_{x,u}\bigr)\pi}{16} = \begin{cases} \sin\dfrac{q'_{x,u}\pi}{16}, & q'_{x,u} = q_{x,u},\ l = 0 \\[4pt] \sin\dfrac{q'_{x,u}\pi}{16}, & q'_{x,u} = 8 - q_{x,u},\ l = 1 \\[4pt] -\sin\dfrac{q'_{x,u}\pi}{16}, & q'_{x,u} = q_{x,u},\ l = 2 \\[4pt] -\sin\dfrac{q'_{x,u}\pi}{16}, & q'_{x,u} = 8 - q_{x,u},\ l = 3 \end{cases} = (-1)^{\lceil (l-1)/2 \rceil}\sin\frac{q'_{x,u}\pi}{16} \tag{4}$$

Similarly, we can get:

$$\cos\frac{(2x+1)u\pi}{16} = (-1)^{\lfloor (l+1)/2 \rfloor}\cos\frac{q'_{x,u}\pi}{16} \tag{5}$$

In formulae (4) and (5), the sign and $q'_{x,u}$ can be determined in advance from $x$ and $u$. Let $ss_{x,u}$ and $cs_{x,u}$ be the signs in formulae (4) and (5), respectively. With rows indexed by $x = 0, \dots, 7$ and columns by $u = 0, \dots, 7$, $ss_{x,u}$ and $q'_{x,u}$ are:

$$ss_{x,u} = \begin{array}{cccccccc} + & + & + & + & + & + & + & + \\ + & + & + & + & + & + & - & - \\ + & + & + & + & - & - & - & + \\ + & + & + & - & - & + & + & - \\ + & + & - & - & + & + & - & - \\ + & + & - & + & + & - & + & + \\ + & + & - & + & - & + & + & - \\ + & + & - & + & - & + & - & + \end{array} \qquad q'_{x,u} = \begin{array}{cccccccc} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 0 & 3 & 6 & 7 & 4 & 1 & 2 & 5 \\ 0 & 5 & 6 & 1 & 4 & 7 & 2 & 3 \\ 0 & 7 & 2 & 5 & 4 & 3 & 6 & 1 \\ 0 & 7 & 2 & 5 & 4 & 3 & 6 & 1 \\ 0 & 5 & 6 & 1 & 4 & 7 & 2 & 3 \\ 0 & 3 & 6 & 7 & 4 & 1 & 2 & 5 \\ 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \end{array}$$
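The angle reduction behind formulae (4) and (5) can be tabulated mechanically. The sketch below is an illustration only, with the hypothetical helper name `reduce_angle`; it derives the reduced index and the two signs from the definitions of $q_{x,u}$ and $l$ given above, so that the tables can be regenerated and checked against the exact sine and cosine values.

```python
import math


def reduce_angle(x, u):
    """Return (q_reduced, sin_sign, cos_sign) for the angle (2x+1)u*pi/16.

    The angle is written as (8(4k + l) + q) * pi / 16 and folded into
    [0, pi/2) so that only sin(q'*pi/16) and cos(q'*pi/16) with
    q' in 0..7 are ever needed.
    """
    m = (2 * x + 1) * u
    q = m % 8
    l = (m // 8) % 4
    # Odd quadrants mirror the residual angle: q' = 8 - q.
    q_reduced = q if l % 2 == 0 else 8 - q
    # Sine is positive in quadrants 0 and 1, cosine in quadrants 0 and 3.
    sin_sign = 1 if l in (0, 1) else -1
    cos_sign = 1 if l in (0, 3) else -1
    return q_reduced, sin_sign, cos_sign


# Lookup tables indexed as TABLE[x][u], computed once off-line.
Q_TABLE = [[reduce_angle(x, u)[0] for u in range(8)] for x in range(8)]
SS_TABLE = [[reduce_angle(x, u)[1] for u in range(8)] for x in range(8)]
CS_TABLE = [[reduce_angle(x, u)[2] for u in range(8)] for x in range(8)]
```

Because the tables depend only on x and u, they can be stored as constants and shared by every 8 × 8 block.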
The signs $cs_{x,u}$ can be tabulated in the same way as $ss_{x,u}$. To save more time, we expand $\sin\frac{q'_{x,u}\pi}{16}$ and $\cos\frac{q'_{x,u}\pi}{16}$ in Taylor series at $\pi/4$:

$$\sin\frac{q'_{x,u}\pi}{16} = \sum_{k=0}^{n}\sin^{(k)}\!\left(\frac{\pi}{4}\right)\frac{\left(\frac{q'_{x,u}\pi}{16}-\frac{\pi}{4}\right)^{k}}{k!} + R_n\!\left(\frac{q'_{x,u}\pi}{16}\right) \tag{6}$$

$$\cos\frac{q'_{x,u}\pi}{16} = \sum_{k=0}^{n}\cos^{(k)}\!\left(\frac{\pi}{4}\right)\frac{\left(\frac{q'_{x,u}\pi}{16}-\frac{\pi}{4}\right)^{k}}{k!} + R_n\!\left(\frac{q'_{x,u}\pi}{16}\right) \tag{7}$$

where

$$|R_n| < \frac{\left(\frac{\pi}{4}\right)^{n+1}\left|\frac{q'_{x,u}-4}{4}\right|^{n+1}}{(n+1)!} \le \frac{\left(\frac{3\pi}{16}\right)^{n+1}}{(n+1)!}$$

Expanding up to the second order, equations (6) and (7) can be approximated as follows:

$$\sin\frac{q'_{x,u}\pi}{16} \approx \frac{\sqrt{2}}{2}\left[1 - \frac{\pi}{16}(4 - q'_{x,u}) - \frac{\pi^2}{512}(4 - q'_{x,u})^2\right] \tag{8}$$

$$\cos\frac{q'_{x,u}\pi}{16} \approx \frac{\sqrt{2}}{2}\left[1 + \frac{\pi}{16}(4 - q'_{x,u}) - \frac{\pi^2}{512}(4 - q'_{x,u})^2\right] \tag{9}$$

The remainder $R_2$ is bounded by about 0.034. This suggests that equations (8) and (9) are determined by $q'_{x,u}$ alone and can be calculated off-line. As such, $C'(x,u)$ and $C(x,u)$ in equation (2) can also be calculated approximately off-line from $q'_{x,u}$, $ss_{x,u}$ and $cs_{x,u}$, which means that the coefficients in the expansion of equation (2) can be computed in advance and equation (3) depends only on the DCT coefficients $F(u,v)$. The computation of equation (3) is thus much simplified. Furthermore, we need not use all the coefficients to compute equation (2): most of the high-frequency DCT coefficients are zero and contribute nothing to the values of the edge points, so they can be omitted. We can therefore use a small subset of low-frequency DCT coefficients to compute the edge points, which saves further computation. Fig. 1 gives an example of edge detection using different numbers of DCT coefficients. From Fig. 1, we can see that the more DCT coefficients are used, the smoother the edges. As the number of DCT coefficients decreases, the "block effect" becomes more and more obvious. In an 8 × 8 block, the more the gray level changes, the larger each edge point value. Fig. 2 gives another example of edge detection, using the first 4 × 4 DCT coefficients.

3.2 Salient Points Computation
Fig. 1. An example of edge detection: (a) Lena.jpg; (b)-(h) are the edge images of Lena computed from the first n × n DCT coefficients, 2 ≤ n ≤ 8.

Fig. 2. Another example of edge detection, computed from the first 4 × 4 DCT coefficients.

According to the analysis in Sec. 3.1, the edge points in a block reflect the gray-level changes in that block: the more changes, the larger the edge point values. We therefore sum all the edge point values in one block to obtain one salient point value, which means that one 8 × 8 block corresponds to one salient point. If $M \times N$ is the size of an image, the maximum number of salient points is $\lfloor M/8 \rfloor \times \lfloor N/8 \rfloor$. Let $Sp(x', y')$ be the salient point value at $(x', y')$, with $0 \le x' < \lfloor M/8 \rfloor$ and $0 \le y' < \lfloor N/8 \rfloor$. It is computed as follows ($\gamma$ is a parameter, $2 \le \gamma \le 7$):

$$Sp(x', y') = \sum_{x = x' \times 8}^{x' \times 8 + \gamma}\ \sum_{y = y' \times 8}^{y' \times 8 + \gamma} |G(x, y)| \tag{10}$$
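Equation (10) amounts to a per-block sum over the gradient magnitudes. The sketch below assumes the edge map is already available as a nested list `G[x][y]` of size M × N; the function name `salient_point_values` is hypothetical, and the inclusive upper limits of the sums are taken literally from equation (10), so each block contributes (γ + 1) × (γ + 1) values.

```python
def salient_point_values(G, gamma=7):
    """Return Sp[x'][y'] for every complete 8 x 8 block of the edge map G.

    G is indexed as G[x][y]; gamma (2 <= gamma <= 7) limits the sum to the
    top-left (gamma + 1) x (gamma + 1) corner of each block, as in eq. (10).
    """
    if not 2 <= gamma <= 7:
        raise ValueError("gamma must lie in the range 2..7")
    blocks_x = len(G) // 8
    blocks_y = len(G[0]) // 8
    return [
        [
            sum(abs(G[x][y])
                for x in range(8 * bx, 8 * bx + gamma + 1)
                for y in range(8 * by, 8 * by + gamma + 1))
            for by in range(blocks_y)
        ]
        for bx in range(blocks_x)
    ]
```

With γ = 7 the sum covers the whole block; smaller values of γ trade some accuracy for fewer additions.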
3.3 Adaptive Selection of Salient Points
Not all salient points are important, so we want to adaptively select the salient points with larger values. The number of salient points extracted clearly influences the retrieval results: too few salient points cannot characterize the image, whereas too many increase the computation cost. A single given threshold T may not suit all images. Through experiments we have found that the gray-level changes in a block are reflected reasonably well by the variance (denoted by σ) of the AC coefficients in the block: the more changes, the larger the variance. Let $M_{sp}$ be the mean value of $Sp(x', y')$. We adaptively select the salient points that satisfy the following condition:

$$\lambda \times Sp(x', y') > \mu \times M_{sp} \tag{11}$$
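where $\lambda = \sigma/128$, $0 \le \lambda \le 1$;

$$Sim_4(q,d) = \begin{cases} \lambda\,Sim_2(q,d) + (1-\lambda)\left(\dfrac{|q \cap d| + ne(q_a \cap d)}{1 + \max(mms) - \min(mms)}\right)^{\alpha}\left(\dfrac{|q \cap d| + ne(q_a \cap d)}{|q| + ne(q_a)}\right)^{\beta}, & \text{otherwise} \\[6pt] Sim_2(q,d), & \text{if } |q \cap d| + ne(q_a \cap d) = 1 \end{cases} \tag{5}$$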
Formula (5) is a modification of formula (4) that adds the query expansion words. Both the query words and the query expansion words are taken into account when computing the span size ratio and the word matching ratio. When counting the number of query words, the set of query expansion words ($q_a$) as a whole counts as one query word. The function $ne(\cdot)$ checks whether the set $q_a \cap d$ is empty: if $|q_a \cap d| > 0$, then $ne(q_a \cap d)$ returns 1, meaning that a query expansion word occurs in the document; otherwise it returns 0, meaning that no query expansion word occurs in the document.

3.3 Case Study of Computing Similarity

Given a question Q3: (How far is it from the earth to the sun?), the query words are { , , } and the question type is " (num_distance)". After query expansion, the query expansion words are { }. The similarity between the question and a document based on the minimal matching span is computed as follows:
Assume that there is a document d in which the query words match at the following positions: Pos_d( ) = {20, 35, 60}, Pos_d( ) = {38, 70}, Pos_d( ) = {Φ}. Before query expansion, the matching spans (ms) are {20,35,38,60,70}, {20,35,70}, {20,38}, {35,38}, {38,60}, ..., {60,70}, obtained by Definition 1, and the minimal matching span (mms) is {35,38}, obtained by Definition 2. The span size ratio is therefore 2/(1+38-35) = 0.5, and the word matching ratio is 2/3 = 0.67. Take α = 1/8, β = 1 and λ = 0.4, and assume that the global similarity before query expansion, computed by formula (1), is $Sim_1(q,d) = 0.72$. Then the minimal matching span similarity without expansion, $Sim_3(q,d)$, is

$$0.4 \times 0.72 + 0.6 \times (0.5)^{1/8} \times (0.67)^{1} = 0.657.$$

According to the question type " (distance)", the query is expanded. Assume that the matching positions of the query expansion words in the document are as follows: Pos_d( ) = {41}, Pos_d( ) = {Φ}, Pos_d( ) = {Φ}, Pos_d( ) = {Φ}, Pos_d( ) = {180}, Pos_d( ) = {Φ}, Pos_d( ) = {Φ}, Pos_d( ) = {Φ}, in which the expansion words " " and " " occur in the document. The distribution of each of the two words must therefore be considered separately, so that one of them is selected to compute the similarity. For the expansion word " ", the corresponding matching spans (ms) are {20,35,38,41,60,70}, {20,35,38,41}, {35,38,41}, {20,38,41}, ..., {20,41,70}, obtained by Definition 1, and the minimal matching span (mms) is {35,38,41}, obtained by Definition 2, so the span size ratio is 3/(1+41-35) = 0.429. Similarly, for the expansion word " ", the corresponding matching spans are {20,35,38,60,70,180}, {20,35,38,180}, {35,38,180}, {35,85,180}, ..., {60,70,180}, and the minimal matching span (mms) is {60,70,180}, so the span size ratio is 3/(1+180-60) = 0.025. Because the span size ratio of " " is larger than that of " ", " " is selected as the expansion word for computing the similarity. The word matching ratio after query expansion is 3/4 = 0.75. Take α = 1/8, β = 1 and λ = 0.4, and assume that the global similarity after query expansion, computed by formula (3), is $Sim_2(q,d) = 0.8$. Then the minimal matching span similarity after query expansion, $Sim_4(q,d)$, is

$$0.4 \times 0.8 + 0.6 \times (0.429)^{1/8} \times (0.75)^{1} = 0.725.$$
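The arithmetic of this case study can be reproduced with a short sketch. It is an illustration under assumptions, not the authors' code: the function names are hypothetical, the minimal matching span is found by brute force (one position per matched term, smallest extent), documents with fewer than two matched terms simply fall back to the global similarity, and the global similarities 0.72 and 0.8 are taken from the text as given.

```python
from itertools import product


def minimal_matching_span(position_lists):
    """Choose one position per matched term so that max - min is smallest."""
    matched = [positions for positions in position_lists if positions]
    best = min(product(*matched), key=lambda combo: max(combo) - min(combo))
    return sorted(best)


def mms_similarity(global_sim, position_lists, n_query_terms,
                   alpha=1 / 8, beta=1.0, lam=0.4):
    """Combine global similarity with span size ratio and word matching ratio.

    position_lists: one list of match positions per query term; an empty list
                    means the term does not occur in the document.
    n_query_terms:  number of query terms, counting the whole set of
                    expansion words as a single term.
    """
    matched = [positions for positions in position_lists if positions]
    if len(matched) < 2:
        # Assumption: with fewer than two matched terms, use the global score.
        return global_sim
    span = minimal_matching_span(matched)
    span_size_ratio = len(matched) / (1 + max(span) - min(span))
    word_matching_ratio = len(matched) / n_query_terms
    return (lam * global_sim
            + (1 - lam) * span_size_ratio ** alpha * word_matching_ratio ** beta)


# Before expansion: three query words, the third one unmatched.
before = mms_similarity(0.72, [[20, 35, 60], [38, 70], []], n_query_terms=3)

# After expansion: the expansion word at position 41 counts as a fourth term.
after = mms_similarity(0.8, [[20, 35, 60], [38, 70], [], [41]], n_query_terms=4)
```

With exact fractions the first value comes out near 0.655 rather than 0.657, because the text rounds 2/3 to 0.67 before multiplying; the second value agrees with 0.725 to three decimals.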
4 Experiment and Evaluation

The experiment evaluates the effect of query expansion on answer-document retrieval. At present, there is no uniform test set of questions and answer-documents for Chinese question answering, and the purpose of answer retrieval in a question answering system is to retrieve the documents that contain correct answers. Therefore, the data set for the experiments had to be collected and tagged. Ten fine-grained question types were selected for this experiment: (num_price), (num_speed), (num_temperature), (num_age), (num_weight), (num_distance), (num_frequency), (num_area), (loc_island), and (time_year). We collected 20 questions for each fine-grained type, obtained the query words by parsing each question, and submitted them to the Baidu search engine. We then took the first 30 web pages returned for each question as the candidate answer-documents, which gave 200 Chinese questions and more than 6000 corresponding documents. Finally, we processed these documents, tagged whether each document contains the corresponding answer, and, for documents containing correct answers, tagged the number of corresponding answers. In this way a test data set of questions and answers was established.

In information retrieval, the usual evaluation measures are precision and recall. In a Chinese question answering system, correct answers still have to be extracted from the retrieved documents, so the evaluation emphasizes whether the retrieved documents include correct answer-documents. We therefore use a simple measure, a@n, to evaluate precision. For a single question, it only checks whether the first n retrieved documents include a correct answer-document: the score is 1 if they do and 0 otherwise. Over all test questions, a@n is the number of questions whose first n retrieved documents include a correct answer-document, divided by the total number of questions in the test set. The experiment uses a@n to evaluate answer-document retrieval.

To verify the effect of the proposed method, we retrieve answer-documents with the vector space model (VSM) and with the minimal matching span (MMS), each without query expansion (unexpanded) and with query expansion (expanded), for the 200 questions in the test set. Document similarity is computed with formula (1) (VSM unexpanded), formula (3) (VSM expanded), formula (4) (MMS unexpanded) and formula (5) (MMS expanded), respectively. Table 2 lists the answer retrieval results before and after query expansion.
The data in Table 2 show that, when the number of retrieved documents is 5, 8, 10 and 15, query expansion brings a substantial improvement in retrieval precision over the unexpanded baseline for both methods. The percentage in brackets is the relative improvement after query expansion. The method based on MMS performs better than the one based on VSM.

Table 2. Comparison of answer-document retrieval results before and after query expansion, based on MMS and VSM respectively

a@n     VSM unexpanded   VSM expanded     MMS unexpanded   MMS expanded
a@5     0.445            0.51 (+14.6%)    0.505            0.71 (+40.6%)
a@8     0.595            0.665 (+11.8%)   0.655            0.78 (+19.1%)
a@10    0.68             0.745 (+9.7%)    0.75             0.855 (+14%)
a@15    0.775            0.85 (+9.7%)     0.855            0.96 (+12.3%)
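The a@n measure reported in Table 2 can be sketched in a few lines. The function name `a_at_n` and the input layout (one list of 0/1 relevance tags per question, in retrieval order) are assumptions made for illustration.

```python
def a_at_n(tagged_rankings, n):
    """Fraction of questions whose first n retrieved documents contain
    at least one document tagged as holding a correct answer.

    tagged_rankings: one list per question; each list holds the 0/1 tags of
                     the retrieved documents in rank order.
    """
    if not tagged_rankings:
        raise ValueError("at least one question is required")
    hits = sum(1 for tags in tagged_rankings if any(tags[:n]))
    return hits / len(tagged_rankings)


# Toy data for three questions (not taken from the paper's test set).
toy_rankings = [
    [0, 1, 0, 0],  # answer found at rank 2
    [0, 0, 0, 1],  # answer found at rank 4
    [0, 0, 0, 0],  # no answer-document retrieved
]
```

Because each question contributes either 0 or 1, a@n can only stay the same or grow as n increases, which matches the trend in every column of Table 2.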
5 Conclusions

The purpose of answer-document retrieval in a Chinese question answering system is to retrieve the documents containing correct answers to the questions. The query expansion methods used in information retrieval are not fully suitable for question answering systems. According to the features of Chinese question answering systems, this paper proposes a new query expansion method that establishes related words for each specific question type by a statistical method, and expands the query with these related words and with synonyms from HowNet. To verify the effect of this method, this paper proposes, on the basis of the vector space model, a method for computing the similarity between questions and documents based on the minimal matching span. This method fully considers the effects of both query words and query expansion words on answer-document retrieval. The experimental results show that, in a Chinese question answering system, answer-document retrieval performs well with the query expansion method proposed in this paper. Further research will focus on the following aspects: 1. query expansion for specific question types when extracting sentences as answers; 2. answer extraction according to the question type and the answer sentence model.
References

1. Zheng, S.F., Liu, T., Qin, B.: Overview of Question Answering. Journal of Chinese Information Processing, Vol.16, No.6, (2002) 46-52
2. Cui, H., Wen, J.R., Li, M.Q.: A Statistical Query Expansion Model Based on Query Logs. Journal of Software, Vol.19, No.3, (2003) 1593-1599
3. He, H.Z., He, P.L., Gao, J.F.: Query Expansion Based on the Context in Chinese Information Retrieval. Journal of Chinese Information Processing, Vol.16, No.6, (2003) 3245
4. Voorhees, E.: Overview of the TREC 2003 Question Answering Track. In: Voorhees, E. (eds.): Proceeding of the 11th Text Retrieval Conference, NIST Special Publication, Gaithersburg, (2003) 1-15
5. Li, X., Roth, D.: Learning Question Classifier. In: Tseng, S.C. (eds): Proceeding of the 19th International Conference on Computational Linguistics, Morgan Kaufmann Publishers, Taipei, (2002) 556-562
6. Singhal, A., Salton, G., Mitra, M.: Document Length Normalization. Information Processing & Management. Vol.32, No.5, (1996) 619-633
7. Buckley, C., Singhal, A., Mitra, M.: New Retrieval Approaches Using SMART. In: Harman, D. (eds.): Proceedings of the Fourth Text REtrieval Conference. NIST Special Publication, Gaithersburg, (1995) 25-48
8. Salton, G., Buckley, C.: Term Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, Vol.24, No.5, (1988) 513-523
Multinomial Approach and Multiple-Bernoulli Approach for Information Retrieval Based on Language Modeling

Hua Huo (1,2), Junqiang Liu (2), and Boqin Feng (1)

1 Department of Computer Science, Xi’an Jiaotong University, P.R. China
[email protected]
2 School of Electronics and Information, Henan University of Science and Technology, P.R. China
Abstract. We present a new retrieval method based on the multiple-Bernoulli model and the multinomial model. We use the multiple-Bernoulli model and the multinomial model to estimate the term probabilities by introducing the conjugate prior and the term frequencies, and use the Dirichlet method to smooth the models in order to solve the "zero probability" problem of the language model.
1 Introduction
Ponte and Croft's multiple-Bernoulli approach [4] and Miller's multinomial approach [3] are typical retrieval methods based on language models. To further improve retrieval performance, we present a new multiple-Bernoulli approach and a new multinomial approach. Unlike those two approaches, which do not capture the term frequencies in the query and the document, we employ the term frequencies and use the Dirichlet smoothing method, instead of linear interpolation smoothing, to solve the "zero probability" problem. The general idea is to build a language model $M_d$ for each document $d$ and to rank the documents according to how likely the query $q$ can be generated from each of these document models, i.e. $P(q|M_d)$. Different models calculate this probability in different ways [4].

In the multinomial model, the query is treated as a sequence of independent terms (i.e. $q = w_1, w_2, \dots, w_m$), taking into account possibly multiple occurrences of the same term. The "ordered sequence of terms" assumption behind this approach states that both queries and documents are defined by an ordered sequence of terms. A query of length $m$ is modeled by an ordered sequence of $m$ random variables, one for each term occurrence in the query. Based on this assumption, the query probability can be obtained by multiplying the individual term probabilities [1]:

$$P(q|M_d) = \prod_{i=1}^{m} P(w_i|M_d) \tag{1}$$
Supported by the science research foundation program of Henan University of Science and Technology, China (2004ZY041) and the natural science foundation program of the Henan Educational Department, China (200410464004).
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 580–583, 2005. © Springer-Verlag Berlin Heidelberg 2005
In the multiple-Bernoulli model, a query is represented as a vector of binary attributes, one for each unique term in the vocabulary, indicating the presence or absence of the term in the query. Terms are assumed to occur independently of one another in a document. The query likelihood $P(q|M_d)$ is thus formulated as the product of two probabilities: the probability of producing the query terms and the probability of not producing the other terms [1]:

$$P(q|M_d) = \prod_{w_i \in q} P(w_i|M_d) \prod_{w_i \notin q} \bigl(1 - P(w_i|M_d)\bigr) \tag{2}$$

2 Computing the Term Probability
Both queries and documents are represented as vectors of indexed words. Given a document d = (f1 , f2 , ..., fV ) ∈ [0, 1]V , where fi is the term frequency of the ith word wi , and V is the size of the vocabulary. Furthermore, consider a Multiple-Bernoulli generation model or a multinomial generation model for each document, parameterized by the vector: Md = (Mf1 , Mf2 , ..., MfV ) ∈ [0, 1]V which indicates the probabilities of emission of the different words in the vocabV
ulary. Where Mfi = P (wi |Md ) and Mfi = 1. Now, let us define the length of i=1
fi . We wish to compute the a document as the sum of their components: nd = i
maximum likelihood by Bayesian inference and use it as the documents language model [1][5]. According the Bayesian approximate equation P (Md |d) ≈ P (d|Md )P (Md )
(3)
ˆd ≈ arg max P (d|Md )P (Md ) M
(4)
we have Md
where $P(d \mid M_d)$ is the likelihood of the document given $M_d$, and $P(M_d)$ is the prior of the sample distribution model. We assume that we sample from a multinomial distribution once for each word in the document model. When $M_d$ parameterizes a multinomial and the model prior is Dirichlet, the conjugate prior for the multinomial, we get:
$$\hat{M}_d = \arg\max_{M_d} \frac{\Gamma\bigl(n_d + \sum_{i=1}^{V} \sigma_i\bigr)}{\prod_{i=1}^{V} \Gamma(f_i + \sigma_i)} \prod_{i=1}^{V} (M_{f_i})^{f_i + \sigma_i - 1} \qquad (5)$$
where $\Gamma$ is the Gamma function, with $\Gamma(s) = (s-1)!$ for positive integers $s$, and $\sigma_i$ are the parameters of the Dirichlet prior. Solving the above equation by using the EM algorithm yields the following form of probability estimates:
$$\hat{M}_{f_i} = \frac{f_i + \sigma_i - 1}{n_d + \sum_{i=1}^{V} \sigma_i - V} \qquad (6)$$
H. Huo, J. Liu, and B. Feng
The simplest method of choosing the parameters $\sigma_i$ is to attribute equal probability to all words in the query. However, this leads to the "zero probability" problem. We therefore use the document collection model to smooth the document model and set the parameters to $\sigma_i = \mu c_i / n_c + 1$. This setting of $\sigma_i$ yields the popular Dirichlet smoothing [2][5]. Here $\mu$ is the smoothing parameter, $c_i$ is the number of times word $w_i$ appears in the collection, and $n_c$ is the total number of words in the collection. Substituting into Equation (6), we get the term probability
$$P(w_i \mid \hat{M}_d) = \frac{f_i + \mu (c_i / n_c)}{n_d + \sum_{i=1}^{V} \mu (c_i / n_c)} \qquad (7)$$
Since $\sum_{i=1}^{V} c_i = n_c$, the denominator equals $n_d + \mu$.
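As an illustration of Equation (7), the following minimal sketch computes Dirichlet-smoothed term probabilities for a toy collection. The function name `dirichlet_term_prob`, the toy documents, and the value of `mu` are assumptions made for this example only and do not come from the paper.

```python
from collections import Counter

def dirichlet_term_prob(word, doc_counts, coll_counts, mu=2000.0):
    """Eq. (7): (f_i + mu * c_i / n_c) / (n_d + mu).

    doc_counts  -- Counter of term frequencies f_i in the document
    coll_counts -- Counter of collection frequencies c_i
    mu          -- smoothing parameter (hypothetical default)
    """
    n_d = sum(doc_counts.values())   # document length
    n_c = sum(coll_counts.values())  # collection length
    return (doc_counts[word] + mu * coll_counts[word] / n_c) / (n_d + mu)

# Toy collection of two tokenized documents (illustrative data only).
collection = [["fuzzy", "logic", "fuzzy"], ["neural", "logic"]]
coll_counts = Counter(w for doc in collection for w in doc)
doc_counts = Counter(collection[0])

probs = {w: dirichlet_term_prob(w, doc_counts, coll_counts, mu=10.0)
         for w in coll_counts}
```

In this toy setting the word "neural" does not occur in the first document but still receives a non-zero probability, and the probabilities over the vocabulary sum to one.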
If we assume that we sample from a multiple-Bernoulli distribution once for each word in the document model, we get
$$\hat{M}_d = \arg\max_{M_d} \prod_{i=1}^{V} \frac{\Gamma(\alpha_i + \beta_i)}{\Gamma(\alpha_i)\Gamma(\beta_i)} (M_{f_i})^{f_i + \alpha_i - 1} (1 - M_{f_i})^{n_d - f_i + \beta_i - 1} \qquad (8)$$
The solution to the above equation is
$$\hat{M}_{f_i} = \frac{f_i + \alpha_i - 1}{n_d + \alpha_i + \beta_i - 2} \qquad (9)$$
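As a check on Equation (9), the maximizer of Equation (8) can be derived component by component. The sketch below assumes that both exponents in Equation (8) are positive, so that the maximum lies in the interior of $[0,1]$; $\ell_i$ is a symbol introduced here for the log of the $i$-th factor.

```latex
\begin{align*}
\ell_i(M_{f_i}) &= (f_i + \alpha_i - 1)\log M_{f_i}
                 + (n_d - f_i + \beta_i - 1)\log(1 - M_{f_i}) + \text{const},\\
\frac{\partial \ell_i}{\partial M_{f_i}}
  &= \frac{f_i + \alpha_i - 1}{M_{f_i}}
   - \frac{n_d - f_i + \beta_i - 1}{1 - M_{f_i}} = 0,\\
\hat{M}_{f_i}
  &= \frac{f_i + \alpha_i - 1}{(f_i + \alpha_i - 1) + (n_d - f_i + \beta_i - 1)}
   = \frac{f_i + \alpha_i - 1}{n_d + \alpha_i + \beta_i - 2}.
\end{align*}
```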
We set $\alpha_i = \mu c_i / n_c + 1$ and $\beta_i = n_c / c_i + \mu (1 - c_i / n_c) - 1$. This setting also leads to Dirichlet smoothing, and we get the term probability
$$P(w_i \mid \hat{M}_d) = \frac{f_i + \mu (c_i / n_c)}{n_d + (n_c / c_i) + \mu - 2} \qquad (10)$$
3 Experiments and Results
Two sets of experiments are performed on the four data sets from TREC. The first set of experiments investigates whether the retrieval performance of our new multiple-Bernoulli approach (NMB) and our new multinomial approach (NML) is sensitive to the setting of the smoothing parameter μ. The second set of experiments compares the performance of NMB and NML with that of Ponte and Croft's traditional multiple-Bernoulli approach (TMB) and Miller's traditional multinomial approach (TML). Results of the first set of experiments are presented in Fig. 1. Average precision is clearly sensitive to μ, especially when the values of μ are small. The optimal values of μ vary from data set to data set, though in most cases they are around 2000. Results of the second set of experiments are shown in Table 1. We observe that the NMB has improvements of 11.5%, 13%, 10.5%, and 13.6% in average precision on the four data sets respectively over the TMB, and that the average improvement in average precision is 12.2%. We attribute the improvements of the NMB over the TMB to the use of the term frequency and the Dirichlet smoothing method in the NMB. We also find that the NML has improvements of 7.7%, 10.1%, 8.3%, and 11.6% respectively over the TML in average precision on the four data sets, and the average improvement is 9.4%.
Table 1. Average precision of TMB, NMB, TML and NML on four data sets

Data set   TMB      NMB      Chg1%   TML      NML      Chg2%
TREC5      0.2304   0.2568   11.5    0.2571   0.2768   7.7
TREC6      0.2101   0.2374   13      0.2404   0.2646   10.1
TREC7      0.2243   0.2478   10.5    0.2585   0.2799   8.3
TREC8      0.2105   0.2392   13.6    0.2386   0.2662   11.6
Avg.       0.2188   0.2453   12.2    0.2486   0.2718   9.4
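The per-data-set change columns of Table 1 can be reproduced from the average-precision columns as relative improvements. The short sketch below does this arithmetic; the dictionary layout and the helper name `pct_change` are assumptions of this example, and the Avg. row is deliberately not re-derived.

```python
# Average precision values copied from Table 1: (TMB, NMB, TML, NML).
table = {
    "TREC5": (0.2304, 0.2568, 0.2571, 0.2768),
    "TREC6": (0.2101, 0.2374, 0.2404, 0.2646),
    "TREC7": (0.2243, 0.2478, 0.2585, 0.2799),
    "TREC8": (0.2105, 0.2392, 0.2386, 0.2662),
}

def pct_change(old, new):
    """Relative improvement of `new` over `old`, in percent, one decimal."""
    return round((new - old) / old * 100, 1)

# chg[name] = (Chg1%, Chg2%): NMB over TMB, and NML over TML.
chg = {
    name: (pct_change(tmb, nmb), pct_change(tml, nml))
    for name, (tmb, nmb, tml, nml) in table.items()
}

for name, (chg1, chg2) in chg.items():
    print(f"{name}: Chg1% = {chg1}, Chg2% = {chg2}")
```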
[Figure: two panels plotting average precision of NMB and average precision of NML against μ (0 to 4000), one curve each for TREC5, TREC6, TREC7, and TREC8.]
Fig. 1. Plots of average precision of NMB and NML for different μ
References
1. Lafferty, J., Zhai, C.: Document language models, query models, and risk minimization for information retrieval. In: Proceedings of SIGIR'01 (2001) 111-119.
2. Metzler, D., Lavrenko, V., Croft, W.B.: Formal multiple-Bernoulli models for language modeling. In: Proceedings of ACM SIGIR'04 (2004) 231-235.
3. Miller, D.H., Leek, T., Schwartz, R.: A hidden Markov model information retrieval system. In: Proceedings of ACM SIGIR'99 (1999) 214-221.
4. Ponte, J., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of ACM SIGIR'98 (1998) 275-281.
5. Zaragoza, H., Hiemstra, D., et al.: Bayesian extension to the language model for ad hoc information retrieval. In: Proceedings of ACM SIGIR'03 (2003) 325-327.
Adaptive Query Refinement Based on Global and Local Analysis
Chaoyuan Cui, Hanxiong Chen, Kazutaka Furuse, and Nobuo Ohbo
University of Tsukuba, Tsukuba Ibaraki 305-8573, Japan
{cui, chx, furuse, ohbo}@dblab.is.tsukuba.ac.jp
Abstract. The goal of information retrieval (IR) is to identify the documents which best satisfy users' information needs. Formulating an effective query is difficult in the sense that it requires users to predict the keywords that will appear in the desired documents. In our study we propose a method of query refinement that combines candidate keywords with query operators. The method uses the concept of a Prime Keyword Set, which is a subset of all keywords obtained by global analysis of the target database. Considering the user's intention, we generate a reasonable number of candidates by local analysis based on several specified principles. Experiments are conducted to confirm the effectiveness and efficiency of the proposed method. Moreover, as an extension of our approach, an online system is implemented to investigate its feasibility.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 584–593, 2005. © Springer-Verlag Berlin Heidelberg 2005
1 Introduction
The rapid growth of on-line documents makes it increasingly difficult for users to find documents relevant to their information needs. Unfortunately, short queries are becoming increasingly common in retrieval applications, especially with the advent of the World Wide Web (WWW). According to [4], queries submitted by users to WWW search engines contain only two keywords on average. Short queries make it difficult to distinguish relevant documents from irrelevant ones. Therefore a system is required to provide candidates automatically to help users refine their initial queries.
Generally, meaning-based candidates and statistics-based candidates are the two main sources of related keywords. The meaning-based approach constructs thesauri manually. WordNet ([6]) provides a tool to search dictionaries conceptually, rather than merely alphabetically. Its basic object is the synset, which is a set of synonymous words representing a certain meaning. Synsets are organized by the semantic relations defined on them. However, this kind of thesaurus is difficult to use because of ambiguity. Selecting the correct meaning for refinement may be difficult, especially in the case of short queries ([10]).
The statistics-based approach can be divided into two kinds, namely global analysis and local analysis. Automatic thesaurus construction techniques group keywords together based on their co-occurrence in documents; keywords which often occur together in documents are assumed to be similar. These thesauri can be
used for automatic or manual query expansion. A global technique requires some corpus-wide statistics that take a considerable amount of computer resources to compute, such as co-occurrence data about all possible pairs of terms in a database. In contrast, a local strategy only processes a small number of top-ranked documents. It is based on the hypothesis that the top-ranked documents are indeed relevant, and it is carried out by relevance feedback systems. [8] improves the quality of the initial run by re-ranking a small number of the retrieved documents, making use of term co-occurrence information to estimate word correlation. [3] uses an information-theoretic term-score function within the top retrieved documents to assign scores to candidate expansion terms. Query expansion has been shown to produce good retrieval performance and prevent query drift.
[11] combines global analysis and local analysis. This approach is based on the use of noun groups, instead of simple keywords, as document concepts. It works as follows. First, some top-ranked passages are retrieved using the original query. This is accomplished by breaking up the documents retrieved by the query into fixed-length passages (of, say, 300 words) and ranking these passages as if they were documents. Then the similarity between each concept c in the top-ranked passages and the whole query q is computed using a variant of tf-idf ranking. The query q is refined by adding some top-ranked concepts, based on the similarity between the concepts and q.
In this work, we adopt both global analysis and local analysis to generate refinement candidates based on statistics. We differ from the above work in that we analyze the relationship between documents of the database in global analysis, which improves the precision of refinement significantly. The rest of the paper is organized as follows. Section 2 gives several principles used throughout this study, and approaches for refinement. Section 3 introduces two important concepts. Sections 4 and 5 explain how global analysis and local analysis work. Section 6 outlines the test data and gives experimental results. Section 7 offers some conclusions and directions for future work.
2 Principles and Goal
The existing techniques emphasize improving retrieval effectiveness, but a problem remains. Sometimes the information the user wants cannot be reached by way of the suggested candidates. Certainly, offering all keywords included in the retrieval results would satisfy the user's intention, but a large number of candidates is inconvenient for the user to browse. Here, we offer several principles which reflect the basic requirements of users of a refinement system.
– No loss of information: any of the retrieved documents can be accessed through candidates or their combinations
– Screen effect: appropriate reduction of documents by the refined query
– Rational size of candidates
– Response time
Based on these principles we aim to develop a refinement system as shown in Figure 1, which illustrates a running example of the prototype system. We use the CRAN test collection as a database, which includes 1398 documents and 4612 keywords. In this example, suppose that a user issues a query with the keyword "method". The system shows that there are 366 hits for this query and that retrieval takes 0.01 second. Besides the links to the hit documents, refinement candidates, each accompanied by Boolean operators, are provided. For example, in the first line, "+(8) -(358) affected", "+" and "-" represent the Boolean operators "AND" and "NOT", respectively. Selecting this line with "+(8)" means that adding the candidate keyword "affected" to the original query leaves 8 documents in the result. Selecting "-" excludes the documents containing "affected", leaving the other 358. Suppose the user is interested in the field of aircraft and does not want documents that include the keyword "tn"; the user may then choose "+ aircraft" and "- tn", which modifies the query to {method + aircraft - tn}. The modified query becomes clearer and the number of retrieved documents decreases. The user can access documents through the links, or repeat the refinement process until he/she is satisfied with the results.
Fig. 1. An example of prototype refinement system
3 Prime Keyword Set
One of the drawbacks of an interactive system is that it requires a lot of computation after the user's query arrives, because most systems search the database and have to process the contents of the retrieved documents. Usually the content of a document is represented approximately by the keywords included in it, and the content of a database is thus represented by the keyword set included in it. In fact, some of these keywords are unnecessary for specifying documents. On the other hand, a document can be represented by only some "specific" keywords. As an extension, a database can also be represented by a "specific" keyword set. In our study, we call this kind of keyword set the prime keyword set. If a database can be well represented by the prime keyword set, the efficiency of online computation would be improved.
For convenience we first give the notation that will be used in the following explanation. Let $D$ and $K$ be a set of documents and a set of keywords, respectively. $\rho$ extracts keywords from a document $d \in D$, or $\rho(d) = \{k \mid (k \in K) \wedge (k \text{ is a keyword included in } d)\}$. Further, let $D' \subset D$. $\bigcup_{d \in D'} \rho(d)$ is denoted by $\rho(D')$, and in particular $\rho(D) = K$. A query $Q$ is simply a subset of $K$. A query evaluation retrieving all the documents that contain all the given keywords in $Q$ is defined as $\sigma(Q) = \{d \mid d \in D \wedge Q \subseteq \rho(d)\}$. A refinement candidate of $Q$ is also a subset of $K$. Let $X$ and $Y$ be subsets of $K$; the fact that "$Y$ is a refinement candidate of $X$" is denoted by an association rule $X \Rightarrow Y$. The confidence of such a rule is defined as $|\sigma(X \cup Y)| / |\sigma(X)|$, where $|\sigma(X \cup Y)|$ is also called the support of the rule. In order to ensure that the system does not arbitrarily narrow the result retrieved by the user's query, we introduce two concepts: coverage and prime keyword set.
Coverage: Let $\mathcal{K}$ ($\subseteq 2^K$) be a subset of the power set of $K$ and $D'$ ($\subseteq D$) be a set of documents. If
$$D' \subseteq \bigcup_{Q \in \mathcal{K}} \sigma(Q)$$
then we say that $\mathcal{K}$ covers $D'$, or $\mathcal{K}$ is a coverage of $D'$. In a similar fashion, $\mathcal{K}$ is also called a coverage of $\mathcal{K}'$ ($\subseteq 2^K$) iff
$$\bigcup_{Q \in \mathcal{K}'} \sigma(Q) \subseteq \bigcup_{Q \in \mathcal{K}} \sigma(Q)$$
Naturally, a coverage $\mathcal{K}$ of $D'$ is a minimum coverage if and only if $D'$ cannot be covered by any proper subset of $\mathcal{K}$. Minimum coverage is defined in order to exclude redundant keywords from the keyword database. In general, the minimum coverage is not unique for a specific database.
Prime Keyword: Let $D$ and $K$ be the set of all documents and all keywords, respectively. $K_p$ is called the prime keyword set of $D$ (or $K$) iff $\{\{k\} \mid k \in K_p\}$ is a minimum cover of $D$ (or $2^K$). In addition, the keywords included in the prime keyword set are called prime keywords. By the definition of coverage, prime keywords guarantee no loss of information. In other words, any document can be retrieved from a certain subset of the prime keywords, expressed as follows.
$$D = \bigcup_{k \in K_p} \sigma(\{k\}) \equiv \bigcup_{K' \subset K_p} \sigma(K')$$
Based on these concepts we extract the prime keyword set and generate candidates according to the relationship between prime keywords.
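The notation of this section maps directly onto set operations. The following is a minimal sketch, assuming that a database is held in memory as a dictionary from document id to its keyword set; the function names and the three toy documents are illustrative assumptions, not part of the described system.

```python
def rho(doc_ids, docs):
    """Keywords of a set of documents: the union of rho(d)."""
    return set().union(*(docs[d] for d in doc_ids))

def sigma(query, docs):
    """Documents that contain every keyword of the query."""
    return {d for d, kws in docs.items() if set(query) <= kws}

def confidence(x, y, docs):
    """Confidence of the rule X => Y: |sigma(X u Y)| / |sigma(X)|."""
    hits_x = sigma(x, docs)
    return len(sigma(set(x) | set(y), docs)) / len(hits_x) if hits_x else 0.0

def covers(queries, doc_ids, docs):
    """True if the union of sigma(Q) over the queries contains doc_ids."""
    covered = set().union(*(sigma(q, docs) for q in queries))
    return set(doc_ids) <= covered

def is_minimum_coverage(queries, doc_ids, docs):
    """A coverage is minimum if dropping any single query breaks it.

    Checking single removals is enough because coverage is monotone:
    if some proper subset covered doc_ids, a subset with exactly one
    query removed would cover them too.
    """
    queries = list(queries)
    if not covers(queries, doc_ids, docs):
        return False
    return all(not covers(queries[:i] + queries[i + 1:], doc_ids, docs)
               for i in range(len(queries)))

# Toy database (illustrative only).
docs = {
    1: {"method", "aircraft"},
    2: {"method", "tn"},
    3: {"aircraft", "wing"},
}
```

With this toy database, the singleton queries {method} and {aircraft} together cover all three documents, and neither can be dropped, so {method, aircraft} behaves like a prime keyword set in the sense defined above.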
4 Global Analysis
In global analysis, the system extracts the prime keyword set based on the principles mentioned in Section 2. Though no loss of information can be guaranteed by prime keywords, the practicability of the system may be hurt by the composition of the database. For a refinement system, it is important to keep a good screen effect. Keywords with very low support are unsuitable as candidates because too many candidates would be necessary. On the other hand, keywords with very high support cannot discriminate one document from the others well. Unfortunately, some documents contain only keywords with very low or very high support. We call them outliers in the sense of effectiveness of refinement. In our study, in the pre-processing phase we exclude outlier documents, whose keywords are all outside a specified range of support. Outlier documents can still be retrieved by the system, though no refinement candidates are returned for them.
Refinement Coefficient. Theoretically, the vector space model has the disadvantage that index terms are assumed to be mutually independent. Due to the locality of many term dependencies, their indiscriminate application to all documents in the database might in fact degrade the overall performance. In order to obtain high effectiveness of refinement, the relationship between keywords should be taken into consideration. In our study, we use the Refinement Coefficient (RC) defined below to express the correlation of keywords.
$$RC(k, d) = \frac{tf(k,d)}{|\rho(\{d\})|} \times \frac{\sum_{k_x \in \rho(\sigma(\{k\})) - \{k\}} cnf(k_x \Rightarrow k)}{|\rho(\sigma(\{k\}))| - 1} \qquad (|\rho(\sigma(\{k\}))| > 1)$$
where $tf(k, d)$ is the term frequency of keyword $k$ in the document $d$, $|\rho(\{d\})|$ is the number of keywords included in $d$, and $\rho(\sigma(\{k\}))$ is the set of keywords that co-occur with $k$. The formula consists of two parts. The left part shows the importance of the keyword in $d$. In general, the higher the frequency of a keyword in the document, the more important the keyword becomes. On the other hand, if the value of $|\rho(\{d\})|$ is small, the keywords included in $d$ are considered to be important. As an extreme example, if a document contains only one keyword, then that keyword is essential. The right part is the average effect of $k$ in refinement. $k_x$ is a keyword included in $\rho(\sigma(\{k\})) - \{k\}$. Suppose $k_x$ is issued as a query; the refinement degree of $k$ is then denoted by $cnf(k_x \Rightarrow k)$. So the right part reflects the average effect of refinement when each $k_x$ is a query.
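The RC formula above can be evaluated directly on a small in-memory database. The sketch below assumes documents are stored as `Counter` objects of term frequencies keyed by document id; returning 0 when a keyword has no co-occurring keywords is an assumption of this example, since the paper only defines RC for $|\rho(\sigma(\{k\}))| > 1$.

```python
from collections import Counter

def sigma(query, docs):
    """Documents containing every keyword of the query."""
    return {d for d, tf in docs.items() if all(k in tf for k in query)}

def cnf(x, y, docs):
    """Confidence of X => Y: |sigma(X u Y)| / |sigma(X)|."""
    hits = sigma(x, docs)
    return len(sigma(set(x) | set(y), docs)) / len(hits) if hits else 0.0

def rc(k, d, docs):
    """Refinement Coefficient RC(k, d)."""
    tf = docs[d]
    # rho(sigma({k})): every keyword of every document that contains k.
    co = set().union(*(docs[h].keys() for h in sigma({k}, docs)))
    if len(co) <= 1:
        return 0.0  # assumption: RC is undefined here, treat it as 0
    importance = tf[k] / len(tf)  # tf(k, d) / |rho({d})|
    avg_effect = sum(cnf({kx}, {k}, docs) for kx in co - {k}) / (len(co) - 1)
    return importance * avg_effect

# Toy database (illustrative only).
docs = {
    1: Counter(method=2, aircraft=1),
    2: Counter(method=1, tn=1),
    3: Counter(aircraft=1, tn=2),
}
```

In this toy database "method" co-occurs with "aircraft" and "tn", each of which leads to "method" with confidence 0.5, so the right-hand factor is 0.5 and document 1, where "method" has frequency 2 among 2 keywords, gets RC = 0.5.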
Generation of Prime Keyword Set. We use RC as the main factor in extracting the prime keyword set. First, the system selects the keyword with the maximal RC value from each document. The selected keyword set is certainly a coverage of the database. Then the keywords in the coverage are sorted in ascending order of RC, and a keyword whose RC value is small is deleted if the remaining keywords still cover the whole database after it is removed. Finally, after the redundant keywords are removed, the minimal cover is output as the prime keyword set. The pseudo code is as follows.

Algorithm Generation of Prime Keyword Set
Input: Document Set D, Keyword Set K, range of support MinSpt and MaxSpt
Output: Prime Keyword Set Kp (initially empty)
forall d ∈ D do
    Xd = {k | k ∈ ρ({d}) ∧ MinSpt ≤ spt(k) ≤ MaxSpt}
    select k0 such that RC(k0, d) = max_{k∈Xd} {RC(k, d)}
    Kp := Kp ∪ {k0}
end
sort Kp in ascending order of RC
forall ki ∈ Kp do
    if ⋃_{k∈Kp−{ki}} σ({k}) = D then Kp := Kp − {ki}
end

The algorithm extracts precise information that makes local analysis efficient. In this algorithm, all keywords included in each document have to be processed, so the cost of computation is very high. Fortunately, this process is done in advance during the off-line phase. Therefore it does not affect the interactive processing.
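The procedure above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: documents are `Counter` objects keyed by id, `spt(k)` is taken to be the number of documents containing `k`, a keyword selected by several documents is sorted by its largest RC value, ties are broken alphabetically, and outlier documents (those with no keyword in the support range) are left out of the coverage target.

```python
from collections import Counter

def sigma(query, docs):
    return {d for d, tf in docs.items() if all(k in tf for k in query)}

def cnf(x, y, docs):
    hits = sigma(x, docs)
    return len(sigma(set(x) | set(y), docs)) / len(hits) if hits else 0.0

def rc(k, d, docs):
    co = set().union(*(docs[h].keys() for h in sigma({k}, docs)))
    if len(co) <= 1:
        return 0.0  # assumption: undefined RC is treated as 0
    avg = sum(cnf({kx}, {k}, docs) for kx in co - {k}) / (len(co) - 1)
    return docs[d][k] / len(docs[d]) * avg

def prime_keywords(docs, min_spt, max_spt):
    support = Counter(k for tf in docs.values() for k in tf)
    chosen = {}      # keyword -> RC value used for sorting (assumption: max)
    targets = set()  # non-outlier documents that must stay covered
    for d, tf in docs.items():
        x_d = [k for k in tf if min_spt <= support[k] <= max_spt]
        if not x_d:
            continue  # outlier document: no keyword in the support range
        k0 = max(x_d, key=lambda k: (rc(k, d, docs), k))
        chosen[k0] = max(chosen.get(k0, 0.0), rc(k0, d, docs))
        targets.add(d)
    kp = set(chosen)
    # Try to drop keywords with small RC first, keeping the coverage intact.
    for k in sorted(chosen, key=lambda k: (chosen[k], k)):
        rest = kp - {k}
        if set().union(*(sigma({x}, docs) for x in rest)) >= targets:
            kp = rest
    return kp

# Toy database; document 4 is an outlier for the support range [2, 3].
docs = {
    1: Counter(method=2, aircraft=1),
    2: Counter(method=1, tn=1),
    3: Counter(aircraft=1, tn=2),
    4: Counter(rare=1),
}
kp = prime_keywords(docs, min_spt=2, max_spt=3)
```

Because each keyword is tested for removal once and the set only shrinks afterwards, no remaining keyword can be dropped without uncovering some non-outlier document.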
5 Local Analysis
There have been a number of efforts to improve local analysis ([2], [3], [5], [7], [9]). These strategies cannot work well when loss of information occurs, and almost all existing methods ignore this problem. Our approach makes up for this drawback of existing techniques by using the concept of prime keywords. At first glance, this may seem to take great effort and to be infeasible. In fact, the size of $\sigma(Q)$ is greatly reduced, and the system considers only the prime keywords included in it when carrying out local analysis. Fortunately, the prime keyword set is a coverage of the whole database, so the prime keywords included in the retrieved documents naturally form a coverage of them. Candidates from this keyword set can guarantee no loss of information. In particular, the retrieved results contain the user's intention. Keywords, especially prime keywords, contained in them reflect the screen effect of refinement. In addition, the size of the prime keyword set is significantly reduced, so time efficiency is improved.
Local RC. Based on the prime keywords extracted from $\sigma(Q)$, the system looks for refinement candidates. In local analysis, we still use RC as the evaluation criterion to discriminate candidates from other keywords, and we localize RC as follows in order to keep consistency.
$$RC_l(k, d) = \frac{tf(k,d)}{|\rho(\{d\})|} \times cnf(Q \Rightarrow \{k\}), \qquad d \in \sigma(Q)$$
$RC_l$ reflects the effect of refinement on the query $Q$.
Generation of Candidates. We propose two algorithms, called Generation of Conjunction Candidates and Generation of Exclusion Candidates, to generate candidates. The similarity function between document $d$ and query $Q$ is defined as follows.
$$sim(d, Q) = \frac{\sum_{k_x \in Q} w(k_x, d)}{\sqrt{\sum_{k \in \rho(d)} w^2(k, d) \times |Q|}}$$
where $w(k, d)$ is the weight of keyword $k$ in document $d$, and
$$w(k, d) = \frac{tf(k,d)}{|\rho(\{d\})|} \times \log\left(\frac{|D|}{|\sigma(k)|}\right).$$
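The similarity function can be sketched as follows, again assuming documents are held as `Counter` objects of term frequencies. Returning 0 when the denominator vanishes (for example, when every keyword of the document occurs in all documents) is an assumption of this example, as is the use of the natural logarithm.

```python
import math
from collections import Counter

def weight(k, d, docs):
    """w(k, d) = tf(k, d) / |rho({d})| * log(|D| / |sigma(k)|)."""
    df = sum(1 for tf in docs.values() if k in tf)  # |sigma(k)|
    if df == 0:
        return 0.0
    return docs[d][k] / len(docs[d]) * math.log(len(docs) / df)

def sim(d, query, docs):
    """Similarity between document d and query Q (a set of keywords)."""
    numerator = sum(weight(k, d, docs) for k in query)
    norm = math.sqrt(sum(weight(k, d, docs) ** 2 for k in docs[d]) * len(query))
    return numerator / norm if norm > 0 else 0.0  # assumption for norm == 0

# Toy database (illustrative only).
docs = {
    1: Counter(method=2, aircraft=1),
    2: Counter(method=1, tn=1),
    3: Counter(aircraft=1, tn=2),
}
ranking = sorted(docs, key=lambda d: -sim(d, {"method"}, docs))
```

For the one-keyword query {method}, document 1 ranks first because "method" carries most of its weight, while document 3 does not contain the keyword and scores 0.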
Using $sim(d, Q)$, we define $\sigma_r(Q)$ as a document set which satisfies the following condition.
$$\sigma_r(Q) \subset \sigma(Q) \;\wedge\; |\sigma_r(Q)| = r \;\wedge\; \forall d_1 \in \sigma_r(Q), \forall d_2 \in (\sigma(Q) - \sigma_r(Q)),\; sim(d_1, Q) \ge sim(d_2, Q)$$
The pseudo code of the two algorithms is as follows.

Algorithm Generation of Conjunction Candidates
Input: Kp, σ(Q), and ranking threshold r.
Output: Refinement Candidates Ca
sort σ(Q) in descending order of sim(d, Q)
Ca := ρp(σr(Q)) − Q
while σ(Q) − ⋃_{k∈Ca} σ({k}) ≠ ∅
    forall d ∈ σ(Q) − ⋃_{k∈Ca} σ({k}) do
        select k0 such that RCl(k0, d) = max_{k∈ρp({d})−Q} {RCl(k, d)}
        Ca := Ca ∪ {k0}
    end

Algorithm Generation of Exclusion Candidates
Input: Kp, σ(Q), and l low-ranked documents.
Output: Refinement Candidates Ca
sort σ(Q) in ascending order of sim(d, Q)
Ca := ρp(σl(Q)) − ρp(σn−l(Q)) − Q
while σ(Q) − ⋃_{k∈Ca} σ({k}) ≠ ∅
    forall d ∈ σ(Q) − ⋃_{k∈Ca} σ({k}) do
        select k0 such that RCl(k0, d) = max_{k∈ρp({d})−Q} {RCl(k, d)}
        Ca := Ca ∪ {k0}
    end
The former emphasizes the importance of the prime keywords appearing in the r top-ranked documents. If the prime keywords in σr(Q) do not cover σ(Q), more keywords are chosen from the uncovered documents. This process is repeated until σ(Q) is covered. Moreover, since the candidate set is a subset of ρp(σ(Q)), its size can be significantly reduced. The effect of the candidates and the time of online computation will be discussed in our experiments. Besides the above idea, if all irrelevant documents are excluded, the remainder are the relevant ones and become easy to find. The latter algorithm, Generation of Exclusion Candidates, is designed based on this idea.
[Figure: number of candidates versus support of query (0 to 10000) for "RetrievedPk", "Conjunction", and "Exclusion".]
Fig. 2. Number of candidates
[Figure: average execution time in seconds versus data size, both on logarithmic axes, for 1-keyword and 2-keyword queries.]
Fig. 3. Execution time vs. database size
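The Generation of Conjunction Candidates procedure can be sketched as below. Several points are assumptions of this example and not statements from the paper: ρp(d) is read as the prime keywords contained in d (ρ(d) ∩ Kp), ties are broken by document id or alphabetically, and the loop stops if a retrieved document has no prime keyword outside Q, since such a document can never be covered.

```python
import math
from collections import Counter

def sigma(query, docs):
    return {d for d, tf in docs.items() if all(k in tf for k in query)}

def weight(k, d, docs):
    df = len(sigma({k}, docs))
    return docs[d][k] / len(docs[d]) * math.log(len(docs) / df) if df else 0.0

def sim(d, query, docs):
    norm = math.sqrt(sum(weight(k, d, docs) ** 2 for k in docs[d]) * len(query))
    return sum(weight(k, d, docs) for k in query) / norm if norm > 0 else 0.0

def rc_local(k, d, query, docs):
    """RC_l(k, d) = tf(k, d) / |rho({d})| * cnf(Q => {k})."""
    hits = sigma(query, docs)
    cnf = len(sigma(set(query) | {k}, docs)) / len(hits) if hits else 0.0
    return docs[d][k] / len(docs[d]) * cnf

def conjunction_candidates(query, docs, kp, r):
    query = set(query)
    hits = sigma(query, docs)
    ranked = sorted(hits, key=lambda d: (-sim(d, query, docs), d))
    # Ca := rho_p(sigma_r(Q)) - Q, with rho_p(d) assumed to be rho(d) & Kp.
    ca = set().union(*(set(docs[d]) & kp for d in ranked[:r])) - query
    while True:
        uncovered = [d for d in sorted(hits) if not set(docs[d]) & ca]
        added = False
        for d in uncovered:
            options = (set(docs[d]) & kp) - query
            if options:
                ca.add(max(options, key=lambda k: (rc_local(k, d, query, docs), k)))
                added = True
        if not added:  # everything covered, or nothing more can be covered
            return ca

# Toy database and prime keyword set (illustrative only).
docs = {
    1: Counter(method=1, aircraft=2, wing=1),
    2: Counter(method=2, tn=1),
    3: Counter(method=1, aircraft=1, tn=1),
    4: Counter(flow=1),
}
kp = {"aircraft", "tn", "flow"}
ca = conjunction_candidates({"method"}, docs, kp, r=1)
```

In this toy run every document retrieved for {method} contains at least one returned candidate, which is the no-loss-of-information property the procedure is designed to preserve.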
6 Experiments
We use C++ for our implementation. The experiments were run on a 3.0GHz Pentium IV processor with 4GB of main memory and 70GB of disk space running Debian Linux (Kernel 2.4). The proposed methods are evaluated using the TREC7,8 ad hoc dataset, which includes 527,993 documents and 72,359 keywords after the elimination of stop words and stemming, as well as 100 test questions. As mentioned in Section 4, our system eliminates outliers before extracting the prime keyword set. In our experiments, we set the support range [MinSpt, MaxSpt] to [500, 10,000], and 738 outlier documents are removed. Applying the algorithm for extracting the prime keyword set to the dataset, we obtained a prime keyword set of less than 1/30 of the original keyword set.
Size of Candidates and Time of Online Processing. In order to reveal the time efficiency and the candidate reduction ratio of the two algorithms in local analysis, we compare them with the method which picks up all the prime keywords included in σ(Q) as refinement candidates. In the following this method is referred to as RetrievedPk. We take the following steps to carry out our experiments. First the system randomly issues 10,000 1-keyword queries and 10,000 2-keyword queries. Then we generate candidates with the two algorithms. For queries with the same support, our system calculates the average number of candidates and the average time. Figure 2 shows the number of candidates produced by the different algorithms. With our local analysis, in most cases, the numbers of conjunction and exclusion candidates are reduced to less than 15% and 3%, respectively.
The next experiment investigates the scalability of our approach. As shown in Figure 3, three datasets, ad hoc, FT (a subset of ad hoc) and CRAN, are used to measure the execution time. The results show that in both heuristics, the processing time is almost linear in the size of the database. Considering the hardware specification used in the experiments, we can conclude that the system is feasible.
Effect of Candidates. This experiment investigates the effect of refinement. Obviously, a query with several keywords can narrow the intention and decrease the ambiguity. This is confirmed in Figure 4. Adding one candidate to the initial query improves the average precision, and an additional candidate enhances the average precision further. A single measure which combines recall and precision might be of interest. One such measure is the harmonic mean F of recall and precision, which is computed as $F(i) = 2 / (r^{-1}(i) + p^{-1}(i))$, where $r(i)$ is the recall for the $i$-th document in the ranking and $p(i)$ is the precision for the $i$-th document in the ranking. For the F-measure, Figure 5 shows the same result as Figure 4.
[Figure: average precision versus recall for "query", "query+1c", and "query+2c".]
Fig. 4. Average Precision and Recall
[Figure: F-measure versus order of ranked documents (0 to 50) for "query", "query+1c", and "query+2c".]
Fig. 5. F-Measure
Comparing with Rocchio. There have been many approaches that attempt to improve retrieval effectiveness by query modification. Rocchio is a famous method and has been proved to be effective. Many systems use the Ide Regular formula to improve the performance by query expansion ([1]).
$$Q_{new} = \alpha Q_{orig} + \beta \sum_{\forall d_j \in D_r} d_j - \gamma \sum_{\forall d_j \in D_n} d_j$$
Here, $Q_{orig}$ and $Q_{new}$ are the initial query vector and the expanded (refined) query vector, respectively. $D_r$ and $D_n$ stand for the sets of relevant and irrelevant documents, respectively. Considering the fact that the information contained in the irrelevant documents is much less important, we set the parameter γ to 0. The experiments are done by the following steps.
– Look for σ(Qorig).
– Sort the retrieved results by sim(d, Qorig).
– Specify the relevant documents and compute the expanded query by Ide Regular.
Figure 6 shows that the precision of our method is double that of Rocchio's at all recall levels. As for the F-measure shown in Figure 7, our method is also remarkably superior to Rocchio's.
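The Ide Regular update and the harmonic-mean F measure used in this section can be illustrated with sparse vectors stored as dictionaries. The function names, the default parameter values, and the toy vectors are assumptions of this sketch; only the formulas themselves come from the text, and γ defaults to 0 as in the described experiments.

```python
def ide_regular(q_orig, relevant, irrelevant=(), alpha=1.0, beta=1.0, gamma=0.0):
    """Q_new = alpha * Q_orig + beta * sum(D_r) - gamma * sum(D_n).

    Vectors are dicts mapping a term to its weight.
    """
    q_new = {t: alpha * w for t, w in q_orig.items()}
    for doc in relevant:
        for t, w in doc.items():
            q_new[t] = q_new.get(t, 0.0) + beta * w
    for doc in irrelevant:
        for t, w in doc.items():
            q_new[t] = q_new.get(t, 0.0) - gamma * w
    return q_new

def f_measure(recall, precision):
    """Harmonic mean F = 2 / (1/r + 1/p); defined as 0 if either is 0."""
    if recall <= 0 or precision <= 0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)

# Toy vectors (illustrative only).
q_orig = {"method": 1.0}
relevant = [{"method": 0.5, "aircraft": 0.8}, {"aircraft": 0.2}]
q_new = ide_regular(q_orig, relevant)
```

With γ = 0 the irrelevant documents have no influence, so the expanded query only gains weight on terms from the documents marked relevant.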
[Figure: average precision versus recall for "query", "Rocchio", and "query+1c".]
Fig. 6. Comparison with Rocchio (Average precision)
[Figure: F-measure versus order of ranked documents (0 to 50) for "query", "Rocchio", and "query+1c".]
Fig. 7. Comparison with Rocchio (F-Measure)
7 Future Work
Our study uses global analysis and local analysis to improve the response time and the quality of candidates in query refinement. It guarantees no loss of information and is shown experimentally to be effective and efficient. In order to examine its practicability, we are attempting to implement a refinement system with a practical dataset. It is expected to be applied to web search engines in the future.
References
1. Baeza-Yates, R. and Ribeiro-Neto, B.: Modern Information Retrieval. Addison Wesley, pp.24-34, 1999.
2. Carmel, D., Farchi, E. and Petruschka, Y.: Automatic Query Refinement using Lexical Affinities with Maximal Information Gain. ACM SIGIR, pp.283-290, 2002.
3. Carpineto, C., De Mori, R., Romano, G. and Bigi, B.: An Information-Theoretic Approach to Automatic Query Expansion. ACM TOIS, vol.19(1), pp.1-27, 2001.
4. Croft, W.B., Cook, R. and Wilder, D.: Providing government information on the Internet: Experience with THOMAS. In Proceedings of the Digital Libraries Conference (DL'95), pp.19-24, 1995.
5. Cui, C., Chen, H., Furuse, K. and Ohbo, N.: Web Query Refinement without Information Loss. APWeb 2004, pp.363-372, 2004.
6. George, M.: Special Issue, WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), 1990.
7. Kraft, R. and Zien, J.: Mining anchor text for query refinement. Proceedings of the 13th International Conference on World Wide Web, pp.666-674, 2004.
8. Mitra, M., Singhal, A. and Buckley, C.: Improving Automatic Query Expansion. ACM SIGIR, pp.206-214, 1999.
9. Vélez, B., et al.: Fast and Effective Query Refinement. ACM SIGIR, pp.6-15, 1997.
10. Voorhees, E.M.: Query Expansion Using Lexical-Semantic Relations. In ACM SIGIR, pp.61-69, 1994.
11. Xu, J. and Croft, W.B.: Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems, vol.18(1), pp.79-112, 2000.
Information Push-Delivery for User-Centered and Personalized Service
Zhiyun Xin1, Jizhong Zhao2, Chihong Chi1, and Jiaguang Sun1
1 School of Software, Tsinghua University, 100084, Beijing, China
[email protected]
2 Department of Computer Science and Technology, Xi'an Jiaotong University, 710049, Xi'an, China
zjz@mail.xjtu.edu.cn
Abstract. In this paper, an Adaptive and Active Computing Paradigm (AACP) for personalized information service in heterogeneous environments is proposed to provide user-centered, push-based, high-quality information service in a timely and proper way. Its motivation is generalized as R4 Service: the Right information at the Right time in the Right way to the Right person. On this basis, formalized algorithms for adaptive user profile management, incremental information retrieval, information filtering, and an active delivery mechanism are discussed in detail. The AACP paradigm serves users in a push-based, event-driven, interest-related, adaptive and active information service mode, which is useful and promising for long-term users who want fresh information without polling various information sources. Performance evaluations based on the AACP retrieval system that we have fully implemented show that the proposed scheme is effective, stable, and feasible for adaptive and active information service in distributed heterogeneous environments.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 594–602, 2005. © Springer-Verlag Berlin Heidelberg 2005
1 Introduction
During the past decades, pull-based information services such as search engines and traditional full-text retrieval have been studied extensively [1-5], and many applications have been put to real use. However, with the explosive growth of the Internet and the World Wide Web, locating relevant information is time consuming and expensive, and push technology [4-10] promises a proper way to relieve users from the drudgery of information searching. Some current commercial software and prototype systems such as PointCast Network, CNN Newswatch, SmartPush and ConCall serve users in a personalized way [8,10,12], while recommendation systems [13-15] such as GroupLens, MovieLens, Alexa, Amazon.com, CDNow.com and Levis.com are used in many Internet commerce fields. Although various personalized recommendation systems have been developed, many problems remain unresolved, and they result in deficient and lower-quality information service than the systems claim. One of the most important reasons is that a single recommendation mechanism, such as content-based or collaborative recommendation, can hardly serve all kinds of users with their various information needs [10-15].
In this paper an Adaptive and Active Computing Paradigm (AACP) for personalized information service in wide-area distributed heterogeneous environments is proposed to provide user-centered, push-based, high-quality information service in a timely and proper way. Its motivation is generalized as R4 Service: the Right information at the Right time in the Right way to the Right person. On this basis, a framework of formalized algorithms for adaptive user profile management, incremental information retrieval, information filtering, and an active delivery mechanism is discussed in detail. As part of the work of the National High-Tech Research and Development Plan of China, we have fully implemented the Self-Adaptive and Active Information Retrieval System (AIRS) for scientific research use, and evaluations showed that the AIRS system is effective and reliable in serving large-scale users for adaptive and active information retrieval.
2 The Adaptive and Active Computing Paradigm

2.1 Abstract Model of Information Retrieval System

A traditional information retrieval system [1-6] is usually composed of a Retrieval-Enabled Information Source (IS), an Indexing Engine (XE), an Information Model (IM), a Retrieval Engine (RE), and a Graphic User Interface (GUI), which is the entrance through which users retrieve information and which includes the order option parameters.

Definition 1: The general information system can be defined according to its components, in a system's view, as:

IRSSysView := {IS, XE, IM, RE, GUI}    (1)
In particular, for an open information source such as a search engine, the indirect IS is the WWW, while the direct IS is the abstract image of the indirect information source. For non-open information sources, such as a full-text retrieval system, the IS is the original metadata suitable for retrieval.

Definition 2: In a user's view, the information retrieval system is a 3-tuple framework composed of a virtual or real information source (IS), the user's Retrieval Input (RI), and the Retrieval Output (RO), that is:

IRSUserView := {IS, RI, RO}    (2)
A traditional information system is essentially an information-centered, pull-based computing paradigm that serves customers in a passive mode and cannot meet long-term users' demand for getting information in an active way.

2.2 Adaptive and Active Computing Paradigm

The Adaptive and Active Computing Paradigm (AACP) for personalized information service [12-15] in heterogeneous environments is a user-centered, push-based, high-quality information service delivered in a proper way. Its motivation is generalized as the R4 Service: the Right information at the Right time in the Right way to the Right person, that is:

R4 := R × R × R × R    (3)
Z. Xin et al.
The R4 Service serves users in an active and adaptive mode, delivering high-quality information in a timely and correct manner. It adjusts the information content dynamically according to the user's preferences, adapts to the user's interests to some degree, and automatically retrieves and delivers related information to users through interaction. The R4 Service can thus be described as adaptive and active information retrieval.

2.3 The Abstract Architecture of AACP

Definition 3: The Adaptive and Active Information System (AAIS) for personalized information service is encapsulated on the traditional information system, adding adaptivity and activity, and can be viewed as the following composition:

AACP := IRSSysView AAIS    (4)

where

AAIS := (VUser, VIS, C, T, P)    (5)

The semantics of each element of AAIS are described in Table 1.

Table 1. Semantics of AAIS

Symbol   Definition
VUser    Vector of the user's interests or preferences
VIS      Vector of the information source
C        Condition under which VIS matches VUser
T        Trigger at which the system starts to work for user U
P        Period during which the system serves user U
For a long-term user of an information retrieval system, what one really needs over a relatively stable period is only a small part of the whole global information domain. This relationship can be described precisely as in Definition 4.

Definition 4: A Personalized Information Set (PIS) is a subset of a Domain Information Set (DIS), while DIS is a subset of the Global Information Set (GIS), so that:

PIS ⊆ DIS ⊂ GIS    (6)
Considering the high dimensionality of information systems, the domains usually overlap each other, especially in multi-subject areas. The sets above in fact form an n-dimensional space, and Fig. 4 shows only a flat model of this relationship.

Definition 5: Personalized Information Service over heterogeneous information sources is a matching map R (where R stands for Retrieval) from VIS to VUser under the condition C and trigger T during the period P, that is:

Q = R(VUser, VIS, C, T, P; MInfo)    (7)
where MInfo is the information model for the retrieval map, which may be a Boolean, probabilistic, or vector space model (VSM). To serve users adaptively and actively, the system must know users' needs; the user profile is usually the basic infrastructure that decides the quality of adaptive and active service. Besides adaptivity and activity, the AACP paradigm also builds on the infrastructure of information indexing, information retrieval, and information filtering. Moreover, automatic monitoring for retrieval and delivery is another key technology. The following section discusses each key technology and its optimization strategy in detail.
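As an illustration of Definition 5, the matching map R can be sketched under a vector space model. This is only a minimal sketch under stated assumptions: the function names `cosine` and `retrieve`, the representation of VUser and VIS as term-weight dictionaries, and the use of a cosine-similarity threshold as the condition C are hypothetical choices, not the AIRS implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(v_user, v_is, threshold, triggered, now, period):
    """Hypothetical map Q = R(VUser, VIS, C, T, P; MInfo) with MInfo = VSM.

    v_user    -- user interest vector (VUser)
    v_is      -- dict of document id -> document vector (VIS)
    threshold -- assumed condition C: minimum cosine similarity
    triggered -- trigger T, True when the system should run
    now       -- current time, compared against the service period P
    period    -- (start, end) of the period P
    """
    start, end = period
    if not triggered or not (start <= now <= end):
        return []
    hits = []
    for doc_id, doc_vector in v_is.items():
        score = cosine(v_user, doc_vector)
        if score >= threshold:
            hits.append((score, doc_id))
    # Best matches first.
    return [doc_id for score, doc_id in sorted(hits, reverse=True)]


profile = {"fuzzy": 0.8, "control": 0.6}
sources = {
    "d1": {"fuzzy": 1.0, "control": 1.0},
    "d2": {"traffic": 1.0},
    "d3": {"fuzzy": 1.0},
}
result = retrieve(profile, sources, 0.5, True, now=5, period=(0, 10))
```

A Boolean or probabilistic model would replace only the scoring function; the trigger and period checks would stay the same.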
3 Infrastructure of AACP

3.1 Abstract User Profile

To serve users according to their preferences and interests, a user profile is needed as an image of what users require, by which the system can decide whom to serve, when to serve, what to serve, and how to serve. A user profile can thus be defined in abstract form as the following multi-tuple:

AUP := (ExtUID, IntUID, TrueVector, RecVector, StartTime, EndTime, Freq)    (8)

where ExtUID and IntUID stand for the external and internal user identity respectively; TrueVector stands for the true, confirmed user interest vector, whereas RecVector is the recommended interest vector, which the user should evaluate to decide whether or not to retrieve by it; StartTime and EndTime describe the start and end of the time during which the user wants to be served; and Freq is the frequency with which the user consumes the information pushed to him or her. To provide high-quality information, it is necessary to assign different weights to different interests, so TrueVector and RecVector can be defined as follows: TrueVector :=
(9)
wi ≥ wj,(1 ≤ i, j ≤ n, iC2). In this case, (1) can be reduced to simple car following model: a(t)=C1*DV|t-Td
(2)
The physical meaning of Td is straightforward: it is the sum of perception reaction time (PRT), engine delay, brake delay, etc., and it always exists in a manual control system. The two linear coefficients, C1 and C2, are the speed tracking gain and the headway tracking gain. Fig. 2 illustrates two simulated car following processes using different parameter values. Headway tracking errors (DX − desired headway) and speed tracking errors (DV) are plotted against each other, so that small circles are associated with small tracking errors. It can be observed that a large C1 results in a small speed tracking error and accordingly a small headway tracking error, and that a large C2 results in a reduced headway tracking error. Time delay is universally harmful to the performance of any tracking task. However, reducing time delay demands perceptual resources and is constrained by human perception ability. In addition, under large C1 and C2, a large correction is made to a perceived error, which results in stronger acceleration or deceleration, manifested as more aggressive car following behaviour. Under combinations of large time delay and large C1 and C2, the system becomes unstable and the tracking error can become very large.

A more accurate representation of car following behaviour should take into account the non-linearity of human response and the limitations of the human perception system, i.e. drivers may not be able to perceive relative speed and headway accurately, and the decision process (acceleration control) could be highly non-linear. Neuro-fuzzy systems are a suitable tool for describing such behaviour. A generic fuzzy inference system (FIS) can map between the variables a driver can perceive and the variables a driver can directly control with arbitrary accuracy based on 'fuzzy reasoning'. This makes the system a natural and suitable way to model a human-in-the-loop system (e.g. [11]).
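The effect of the gains and the time delay on tracking error can be explored with a small simulation. The sketch below is illustrative only: it assumes that the linear model has the form a(t) = C1·DV(t − Td) + C2·(DX(t − Td) − DXdsr), that DV is the leader speed minus the follower speed, and that the leader drives at constant speed. The function name, initial conditions and Euler step are hypothetical choices.

```python
def simulate_car_following(c1, c2, td, dx0=40.0, dv0=0.0, desired=30.0,
                           dt=0.1, t_end=120.0):
    """Euler simulation of an assumed delayed linear car following law.

    Returns the headway (DX) and relative speed (DV) time series.
    """
    n_delay = int(round(td / dt))
    steps = int(round(t_end / dt))
    dx, dv = [dx0], [dv0]
    for k in range(steps):
        j = max(k - n_delay, 0)  # delayed sample; constant history before t = 0
        accel = c1 * dv[j] + c2 * (dx[j] - desired)
        # Leader speed is constant, so follower acceleration reduces DV.
        dv.append(dv[k] - accel * dt)
        dx.append(dx[k] + dv[k] * dt)
    return dx, dv


# Moderate gains and delay: the headway error decays.
dx_stable, _ = simulate_car_following(0.16, 0.02, 2.5)
# Larger gains combined with a long delay: the error grows.
dx_unstable, _ = simulate_car_following(0.32, 0.10, 4.0)
```

Plotting DV against DX − desired for each run gives spirals of the kind described for Fig. 2, shrinking in the first case and growing in the second.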
Fuzzy systems are established on fuzzy logic which is based on the idea that sets are not crisp but some are fuzzy, and these can be described in linguistic terms such as
P. Zheng and M. McDonald
Fig. 2. Car following performances under different combinations of parameters (panels: C1=0.16, C2=0.02, Td=2.5 and C1=0.32, C2=0.1, Td=1.5; horizontal axis: DX − Desired Headway [m]; vertical axis: DV [m/s])
fast, slow and medium [12]. Fuzzy sets have boundaries that are not precise and are described by membership functions, curves that define how each point in the input space is mapped to a membership grade. The transition from non-membership to membership is gradual rather than abrupt. The value of the membership is between 0 and 1, and a membership degree of 1 represents certainty that an entity belongs to a particular set. Fuzzy inference systems use IF-THEN-ELSE rules to relate linguistic terms defined in the output space and the input space. Every linguistic term corresponds to a fuzzy set.

Neuro-fuzzy systems are a combination of artificial neural networks (ANN) and fuzzy systems. An ANN mimics the manner in which the brain processes information and consists of a number of independent processors that communicate with each other via weighted connections. The most prominent feature of an ANN is its ability to learn from and adapt to a large quantity of data. The combination of ANN and fuzzy sets offers a powerful method for modelling human behaviour.

A generic fuzzy inference system (FIS) can then be used to represent the open-loop mapping in car following processes. It is able to approximate any nonlinear function with arbitrary accuracy [13] and is therefore able to identify car following behaviour more accurately. The model can be described notationally as:

a(t) = FUZZY[DV, (DX − DXdsr)]|t−Td    (3)

where FUZZY denotes a fuzzy logic mapping.
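To make the mapping in Equation (3) concrete, a very small zero-order Sugeno-type inference can be sketched as follows. Everything specific here is an assumption for illustration: the three linguistic terms per input, the scaling ranges, and the rule consequents are hypothetical and are not the trained model of this study. The caller is expected to pass inputs already delayed by Td.

```python
def memberships(x, scale):
    """Degrees of 'neg', 'zero' and 'pos' for x; they always sum to 1."""
    z = max(-1.0, min(1.0, x / scale))
    return {"neg": max(0.0, -z), "zero": 1.0 - abs(z), "pos": max(0.0, z)}


# Hypothetical rule base: (DV term, headway-error term) -> acceleration [m/s^2].
CONSEQUENTS = {
    ("neg", "neg"): -2.0, ("neg", "zero"): -1.0, ("neg", "pos"): -0.3,
    ("zero", "neg"): -0.5, ("zero", "zero"): 0.0, ("zero", "pos"): 0.5,
    ("pos", "neg"): 0.3, ("pos", "zero"): 1.0, ("pos", "pos"): 2.0,
}


def fuzzy_acceleration(dv, headway_error, dv_scale=5.0, dx_scale=15.0):
    """Weighted average of rule consequents (product used as the AND operator)."""
    mu_dv = memberships(dv, dv_scale)
    mu_dx = memberships(headway_error, dx_scale)
    numerator = denominator = 0.0
    for (dv_term, dx_term), accel in CONSEQUENTS.items():
        weight = mu_dv[dv_term] * mu_dx[dx_term]
        numerator += weight * accel
        denominator += weight
    return numerator / denominator


a_steady = fuzzy_acceleration(0.0, 0.0)
a_closing = fuzzy_acceleration(-3.0, 0.0)
a_opening = fuzzy_acceleration(3.0, 0.0)
```

In a neuro-fuzzy setting the membership shapes and consequents would be fitted to data rather than fixed by hand as they are here.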
3 Data Collection and Reduction

A car following experiment was conducted to observe car following behaviour under normal driving conditions. An instrumented vehicle (IV, a Vauxhall 2.0 sedan) was used to measure speed, relative speed and headway between a leader and the IV. The instrumentation of the IV included a laser speedometer, front and rear radar, three
Application of Fuzzy Systems in the Car-Following Behaviour Analysis
video cameras typically facing the front, the rear and the driver, as well as other sensors for pedal and brake movement, indicator use, etc. [14]. All outputs from the sensors were logged onto a PC at a sampling rate of 10 Hz. The measurements used for the car following study and their accuracy are summarised in Table 1. The video camera recordings were used to help track vehicles in the later data reduction stage.

Table 1. IV Measurements and the Accuracy

Instrument                 Measurement                    Accuracy
Laser Speedometer          IV Speed                       0.278 m/s
Front Radar / Rear Radar   Front Headway / Rear Headway   20 cm @ 100 m
The survey was carried out during morning peak hours (7:00–9:00 am), when stable car following conditions were frequent at high flow rates. Subjects were asked to drive repeatedly between two junctions on a motorway (M27). Each run involved about 7 km of motorway driving, going to and coming back from the downstream junction, where car following situations were frequent. No instruction was given for the driving, so that drivers could follow any leader as in everyday driving. Over the two months of the survey, 590 driving trials were carried out on 30 experiment days.

Useful car following processes were then extracted from the raw data. A car following situation was assumed to exist if and only if:

(1) both vehicles (with the IV as the follower) were in the same lane;
(2) the distance separation between the two vehicles was less than 100 meters;
(3) the above relative position was kept for more than 30 seconds (about 1 km);
(4) both vehicles were on the motorway (not on a roundabout or a slip road).
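The four conditions above can be applied to logged data as a simple run-length filter. The following is a sketch under assumptions: the per-sample inputs `same_lane`, `headway_m` and `on_motorway` are hypothetical names for signals derived from the 10 Hz log, and the actual data reduction procedure of the study is the one documented in [15].

```python
def extract_following_segments(same_lane, headway_m, on_motorway,
                               rate_hz=10, max_headway_m=100.0,
                               min_duration_s=30.0):
    """Return (start, end) sample index pairs, end exclusive, of runs in which
    all conditions hold continuously for more than min_duration_s."""
    min_samples = int(min_duration_s * rate_hz)
    segments = []
    start = None
    n = len(headway_m)
    for i in range(n + 1):
        ok = (i < n and same_lane[i] and on_motorway[i]
              and headway_m[i] < max_headway_m)
        if ok and start is None:
            start = i                      # a candidate run begins
        elif not ok and start is not None:
            if i - start > min_samples:    # keep only sufficiently long runs
                segments.append((start, i))
            start = None
    return segments


# Synthetic log: 40 s valid, 10 s with the leader too far away, 20 s valid.
headway = [35.0] * 400 + [150.0] * 100 + [35.0] * 200
lane = [True] * len(headway)
motorway = [True] * len(headway)
segments = extract_following_segments(lane, headway, motorway)
```

Only the first run is returned, because the second lasts 20 s and fails condition (3).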
Thirty days of raw data collected on weekdays were processed, and a total of 1236 segments of car following were derived. For each identified car following process, the directly measured headway and speed were further processed to reduce measurement errors. Time derivatives such as acceleration and range rate (DV) were calculated from the speed and headway measurements. The final reduced data were time series of speed, acceleration rate, headway (DX) and relative speed (DV). The detailed data processing procedure is documented elsewhere [15]. A database was formed based on the 30 days of observations; typically, about 40 car following processes were identified each day.
4 Results

Neuro-fuzzy systems were used to extract the necessary information directly from measurements of car following behaviour, as they can directly learn and adapt to the patterns in a large quantity of data, i.e., a fuzzy model was generated from the data with parameters trained to the data. The membership functions were obtained by
clustering every input variable separately. Generation of the fuzzy rules included the generation of the premise and the consequent. The premise was derived directly from the clusters and the obtained membership functions, while the consequent was generated by associating clusters with classes. All input variables were linked in a conjunctive rule. This functionality is available through the Adaptive Neural Fuzzy Inference System in Matlab for Sugeno-type fuzzy inference systems [16]. A description of the adaptive learning algorithm for the neuro-fuzzy system can be found in [17].

Adaptive neural fuzzy training was applied to estimate the parameters of the fuzzy inference system, which can be regarded as analogous to a least squares estimation. All data collected on the same day were grouped, as they represented the behaviour of the same driver. 60% of the data were used as training data and the remaining 40% as checking data to prevent overfitting.

The desired headway of a driver (DXdsr) in a car following process is a steady state in which relative speed and acceleration are zero. In real traffic, however, a strictly defined steady state can hardly be achieved. A typical car following process lasting about 2 minutes is illustrated in Figure 3. It shows repeated 'goal seeking' cycles on the relative speed-headway plane. The desired headway can be identified near the centre of each circle on the headway axis. In this research, the desired headway was derived from each car following process by averaging the crossing points (i.e., where relative speed changes from + to − or from − to +) at which the acceleration rate was below a threshold of 0.1 m/s². By varying the time delay Td, training data were generated to train the fuzzy inference system. The best fit was obtained when the minimum root mean square error of acceleration rate was achieved. The trained model is a nonlinear mapping between the inputs (relative speed, headway discrepancy) and the output (acceleration rate), expressed as a fuzzy system.
A section plane of two trained models representing two different drivers is shown in Figure 4. The nonlinearity of the response is clearly noticeable.
Fig. 3. A car following process; note the desired headway at about 30 m (horizontal axis: Headway [m]; vertical axis: Relative Speed [m/s])
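The crossing-point rule for estimating the desired headway can be sketched as follows. This is a minimal illustration, assuming that the headway at a crossing is taken as the mean of the two samples on either side of the sign change; the function name and the synthetic test signal are hypothetical.

```python
import math


def desired_headway(headway_m, rel_speed, accel, accel_threshold=0.1):
    """Average headway at sign changes of relative speed where the
    acceleration magnitude is below the threshold (m/s^2).

    Returns None when no qualifying crossing is found.
    """
    crossings = []
    for i in range(1, len(rel_speed)):
        sign_change = rel_speed[i - 1] * rel_speed[i] < 0
        if sign_change and abs(accel[i]) < accel_threshold:
            crossings.append(0.5 * (headway_m[i - 1] + headway_m[i]))
    if not crossings:
        return None
    return sum(crossings) / len(crossings)


# Synthetic 'goal seeking' cycle around 30 m, sampled at 10 Hz for 120 s.
times = [0.1 * i for i in range(1201)]
headway = [30.0 + 5.0 * math.sin(0.1 * t + 0.3) for t in times]
rel_speed = [0.5 * math.cos(0.1 * t + 0.3) for t in times]
accel = [0.0] * len(times)
estimate = desired_headway(headway, rel_speed, accel)
```

On this synthetic cycle the crossings alternate between the largest and smallest headway, so their average falls at the centre of the cycle, matching the description of Fig. 3.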
For the identified non-linear mapping, a linear analysis was further performed to determine the qualitative and quantitative effects of the equivalent linear parameters as outlined in Equation (1). This made it possible to compare the model with its linear counterparts. The linearised speed-matching gain (C1) and headway-matching gain (C2) were calculated near the steady state (DV = 0, DX = DXdsr) as:

$$C_1 = \frac{\partial F(DV,\, DX - DX_{dsr})}{\partial DV}, \qquad C_2 = \frac{\partial F(DV,\, DX - DX_{dsr})}{\partial (DX - DX_{dsr})}$$

Fig. 4. Section plane of a trained model at DX=DXdsr (legend: Subject 1, Subject 2, linear line of Subject 1; horizontal axis: DV [m/s]; vertical axis: Acceleration [m/s²])
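The linearised gains can be obtained numerically from any trained mapping by finite differences around the steady state. The sketch below assumes the mapping is available as a function F(DV, DX − DXdsr); `example_mapping` is a hypothetical stand-in with a cubic non-linearity, not a model fitted in this study.

```python
def linearised_gains(mapping, step=1e-4):
    """Central-difference estimates of C1 and C2 at DV = 0, DX = DXdsr."""
    c1 = (mapping(step, 0.0) - mapping(-step, 0.0)) / (2.0 * step)
    c2 = (mapping(0.0, step) - mapping(0.0, -step)) / (2.0 * step)
    return c1, c2


def example_mapping(dv, headway_error):
    """Hypothetical non-linear response used only to exercise the function."""
    return 0.16 * dv + 0.015 * headway_error - 0.01 * dv ** 3


c1, c2 = linearised_gains(example_mapping)
```

Because the cubic term has zero slope at DV = 0, the recovered gains are the linear coefficients of the stand-in mapping.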
The mean and standard deviation of the calculated linear gains are shown in Table 2.

Table 2. Mean and Std of Behavioural Indices

Indexes   Time Delay [s]   C1 [1/s]   C2 [1/s²]
Mean      2.5              0.16       0.015
Std       0.81             0.056      0.008
A comparison of behavioural indices between observations is summarised in Table 3. Because of the insignificance of C2, only C1 is compared. The fitted values of Td and C1 vary considerably between studies, which makes a quantitative comparison difficult. However, the C1 obtained in this research is much smaller than all of the others. This implies that car following was performed in a 'loose' manner under normal task demand, which is intuitively reasonable.
Table 3. A comparison of Td and C1 with other observations

Observations                          Td (s)         C1 (1/s)
Chandler, Test Track                  1.55           0.368
Forbes, Helly, Tunnel and Open Road   0.983          0.633
Allen, Test Track                     3.28           0.21
Allen, Open Road                      4.96 (4.15*)   0.21
McDonald, Zheng, Open Road            2.125          0.296

* If an outlier of 11.47 seconds was excluded
5 Conclusion

A car following observation has been carried out which differs from other experiments in its control over the driver's task demand of car following. The task demand on the driver was kept at a normal level through appropriate experiment design. A detailed analysis revealed a property of car following behaviour under normal task demand, i.e. the dynamic control, measured by open-loop gain, was significantly lower. Knowledge has accumulated regarding manual car following behaviour through observations under experimental settings. Models of car following behaviour were able to reproduce it because car following behaviour was tightly constrained, especially in experimental situations. The results from this research, however, indicate that car following might have been performed in a 'tighter' manner in experiments, while in normal driving, drivers might not always be able, or willing, to follow in this manner. Fortunately, driver assistance devices such as ACC are able to follow in a very 'tight' manner because of their very short time delay, and could therefore be very helpful in improving car following performance. It was the intent of this research to expand the understanding of manual car following behaviour, which has affected and will keep affecting both traffic modelling and driving-task automation.
References

1. Chandler, F.E., Herman, R., Montroll, E.W.: Traffic Dynamics: Studies in Car Following. Operations Research, 6 (1958) 165-184
2. Forbes, T.W., Zagorski, M.J., Holshouser, E.L., Deterline, W.A.: Measurement of Driver Reaction to Tunnel Conditions. Proceedings of the Highway Research Board 37 (1959) 345-357
3. McDonald, M., Wu, J., Brackstone, M.: Development of a Fuzzy Logic Based Microscopic Motorway Simulation Model. In Proceedings of the IEEE Conference on Intelligent Transport Systems. Boston, USA. (1997)
4. Allen, R.W., Magdaleno, R.E., Serafin, C., Eckert, S., Sieja, T.: Driver Car Following Behaviour Under Test Track and Open Road Driving Condition. SAE Paper 970170, SAE. 17 pages. (1997)
5. Todosiev, E.P.: The Action Point Model of the Driver Vehicle System. Report No. 202A3. Ohio State University. (1963)
6. McDonald, M., Brackstone, M., Sultan, B., Roach, C.: Close Following on the Motorway: Initial Findings of an Instrumented Vehicle Study. Vision in Vehicles VII (Ed. Gale, A.G.), Elsevier, North Holland, Amsterdam. (2000)
7. Ozaki, H.: Reaction and Anticipation in the Car-Following Behaviour. Transportation and Traffic Theory (Ed. Daganzo, C.F.), (1993) 349-366
8. Rothery, R.W.: Car Following Models. In Traffic Flow Theory, TRB Special Report 165, (1999)
9. McRuer, D.T., Krendel, E.S.: The Human Operator as a Servo System Element. Journal of the Franklin Institute, Vol. 267, (1959) 381-403
10. Herman, R., Montroll, E.W., Potts, R.B., Rothery, R.W.: Traffic Dynamics: Analysis of Stability in Car Following. Operations Research, Vol. 7, (1959) 86-106
11. Rouse, W.B.: Fuzzy Models of Human Problem Solving. Advances in Fuzzy Set, Possibility Theory and Application (Ed. Wang, P.P.), Plenum Press, New York (1983)
12. Zadeh, L.A.: Fuzzy Sets. Information and Control, Vol. 8 (1965) 65-70
13. Buckley, J.J.: Sugeno Type Controllers are Universal Controllers. Fuzzy Sets and Systems, No. 53 (1993) 299-304
14. McDonald, M., Brackstone, M., Sultan, B.: Instrumented Vehicle Studies of Traffic Flow Models. Proc. of the 3rd International Symposium on Highway Capacity. Copenhagen, Denmark. (1998)
15. Zheng, P., McDonald, M., et al.: Identifying Best Predictors for Car Following Behaviour from Empirical Data. Proceedings of 13th ESS, (2001) 158-165
16. MathWorks Inc.: Fuzzy Logic Toolbox User's Guide (version 2). MathWorks Inc (1999)
17. Jang, J.F.: ANFIS: Adaptive-network-based Fuzzy Inference System. IEEE Trans. Syst., Man, Cybern., 23 (1999) 665-685
GA-Based Composite Sliding Mode Fuzzy Control for Double-Pendulum-Type Overhead Crane

Diantong Liu1, Weiping Guo1, and Jianqiang Yi2

1 Institute of Computer Science and Technology, Yantai University, Shandong Province, 264005, China {diantong.liu, weiping.guo}@163.com
2 Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China [email protected]
Abstract. A genetic algorithm (GA) based composite sliding mode fuzzy control (CSMFC) approach is proposed for the double-pendulum-type overhead crane (DPTOC). The overhead crane exhibits double-pendulum dynamics because of the large-mass hook and the payload volume. Its nonlinear dynamic model is built using Lagrangian method. Through defining a composite sliding mode function, the proposed control approach greatly reduces the complexity to design a controller for complex underactuated systems. The control system stability is analyzed for DPTOC. Real-valued GA is used to optimize the parameters of CSMFC to improve the performance of control system. Simulation results illustrate the complexity of DPTOC and the validity of proposed control algorithm.
1 Introduction

Overhead cranes work as robots in many places, such as workshops and harbors, to transport all kinds of massive goods. An overhead crane is expected to transport payloads to the required position as fast and as accurately as possible without colliding with other equipment. Much work has been done on controlling the overhead crane. William et al. [1] adopted an input shaping control method. Lee [2] proposed feedback control methods. Moreno et al. [3] used a neural network to tune the parameters of state feedback control. Benhidjeb and Gissinger [4] compared fuzzy control with Linear Quadratic Gaussian control of an overhead crane. Nalley and Trabia [5] applied fuzzy logic to both positioning control and swing damping. Mansour and Mohamed [6] used variable structure control for the overhead crane. In the above works, the mass of the hook was usually ignored and the system was treated as a single-pendulum-type overhead crane (SPTOC), so the model is easily built and the control algorithm is easily designed. However, when the hook mass is not much smaller than the payload mass, the overhead crane behaves as a double-pendulum system [7]. The dynamics and control of the DPTOC are more complex than those of the SPTOC. In this paper, the overhead crane that exhibits double-pendulum-type dynamics is investigated and a GA-based composite sliding mode fuzzy control algorithm is proposed.

The remainder of this paper is organized as follows. In Section 2, the system dynamics are given. In Section 3, a composite sliding mode fuzzy control algorithm is proposed and its stability is analyzed. In Section 4, a real-valued GA is designed to optimize the parameters of the CSMFC to improve system performance. Section 5 presents system simulations and Section 6 gives conclusions.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 792 – 801, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 System Dynamics of DPTOC

Fig. 1 shows the schematic representation of the DPTOC. The crane is driven by applying a force F to the trolley in the X-direction. A cable of length l1 hangs below the trolley and supports a hook of mass m1, to which the payload is attached using some form of rigging. Here the rigging and the payload are modeled as a second cable of length l2 and a point mass m2. The system dynamic equations can be derived using the Lagrangian method as follows:

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = \tau \qquad (1)$$

where $q = [x, \theta_1, \theta_2]^T$ is the vector of generalized coordinates and $\tau = [F, 0, 0]^T$ is the vector of generalized forces acting on the system. $M(q)$ is the inertia matrix:

$$M(q) = \begin{bmatrix} m + m_1 + m_2 & (m_1 + m_2) l_1 \cos\theta_1 & m_2 l_2 \cos\theta_2 \\ (m_1 + m_2) l_1 \cos\theta_1 & (m_1 + m_2) l_1^2 & m_2 l_1 l_2 \cos(\theta_1 - \theta_2) \\ m_2 l_2 \cos\theta_2 & m_2 l_1 l_2 \cos(\theta_1 - \theta_2) & m_2 l_2^2 \end{bmatrix} \qquad (2)$$

$G(q)$ is the potential (gravitational/elastic) term:

$$G(q) = \begin{bmatrix} 0 & (m_1 + m_2) g l_1 \sin\theta_1 & m_2 g l_2 \sin\theta_2 \end{bmatrix}^T \qquad (3)$$

$C(q,\dot{q})$ is the matrix of velocity (Coriolis/centrifugal) terms:

$$C(q,\dot{q}) = \begin{bmatrix} 0 & -(m_1 + m_2) l_1 \dot{\theta}_1 \sin\theta_1 & -m_2 l_2 \dot{\theta}_2 \sin\theta_2 \\ 0 & 0 & m_2 l_1 l_2 \dot{\theta}_2 \sin(\theta_1 - \theta_2) \\ 0 & -m_2 l_1 l_2 \dot{\theta}_1 \sin(\theta_1 - \theta_2) & 0 \end{bmatrix} \qquad (4)$$

Fig. 1. Scheme of the double-pendulum-type overhead crane (labels: x, F, m, l1, θ1, m1, l2, θ2, m2)
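Equations (1)-(4) can be evaluated numerically to obtain the accelerations for a given state and driving force. The sketch below is illustrative only: the parameter values are hypothetical, m is taken to be the trolley mass labelled in Fig. 1, and the velocity matrix follows the standard Lagrangian derivation, with the (2,3) entry depending on the payload angular velocity.

```python
import math


def crane_matrices(q, qd, m=20.0, m1=5.0, m2=10.0, l1=2.0, l2=1.0, g=9.81):
    """Build M(q), C(q, qd) and G(q) for q = [x, theta1, theta2]."""
    _, th1, th2 = q
    _, th1d, th2d = qd
    c12 = math.cos(th1 - th2)
    s12 = math.sin(th1 - th2)
    M = [[m + m1 + m2, (m1 + m2) * l1 * math.cos(th1), m2 * l2 * math.cos(th2)],
         [(m1 + m2) * l1 * math.cos(th1), (m1 + m2) * l1 ** 2, m2 * l1 * l2 * c12],
         [m2 * l2 * math.cos(th2), m2 * l1 * l2 * c12, m2 * l2 ** 2]]
    C = [[0.0, -(m1 + m2) * l1 * th1d * math.sin(th1), -m2 * l2 * th2d * math.sin(th2)],
         [0.0, 0.0, m2 * l1 * l2 * th2d * s12],
         [0.0, -m2 * l1 * l2 * th1d * s12, 0.0]]
    G = [0.0, (m1 + m2) * g * l1 * math.sin(th1), m2 * g * l2 * math.sin(th2)]
    return M, C, G


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def accelerations(q, qd, force):
    """Solve M(q) qdd = tau - C(q, qd) qd - G(q) for qdd, with tau = [F, 0, 0]."""
    M, C, G = crane_matrices(q, qd)
    tau = [force, 0.0, 0.0]
    rhs = [tau[i] - sum(C[i][j] * qd[j] for j in range(3)) - G[i]
           for i in range(3)]
    return solve_linear(M, rhs)


at_rest = accelerations([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.0)
pushed = accelerations([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 10.0)
```

With the cables hanging vertically and at rest, a force of 10 N on the assumed 20 kg trolley gives an initial trolley acceleration of 0.5 m/s², since the suspended masses do not yet resist horizontal motion.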
3 Composite Sliding Mode Fuzzy Control

3.1 Sliding Mode Fuzzy Control

Consider a second-order nonlinear system:

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = f(X) + b(X)u \qquad (5)$$

where $X = [x_1, x_2]^T$ is the state vector, $f(X)$ and $b(X)$ are continuous linear or nonlinear functions, and $u$ is the control input. The sliding mode function is defined as

$$s = x_2 + \lambda x_1 \qquad (6)$$

which is a measure of the algebraic distance of the current state to the sliding surface $s = 0$; $\lambda$ is real and positive. The resulting sliding mode fuzzy controller (SMFC) is actually a single-input-single-output fuzzy logic controller [8]. The input to the controller is the sliding mode function $s$, and the output from the controller is the control command $u$. A typical rule of an SMFC has the following format:

Ri: IF s IS Fi THEN u IS Ui

where Fi is the linguistic value of s in the ith fuzzy rule, and Ui is the linguistic value of u in the ith fuzzy rule.

3.2 Composite Sliding Mode Fuzzy Control
The underactuated DPTOC system described by equation (1) can be represented as:

$$\begin{aligned} \dot{x}_1 &= x_2, & \dot{x}_2 &= f_1(X) + b_1(X)u \\ \dot{x}_3 &= x_4, & \dot{x}_4 &= f_2(X) + b_2(X)u \\ \dot{x}_5 &= x_6, & \dot{x}_6 &= f_3(X) + b_3(X)u \end{aligned} \qquad (7)$$

where $X = (x_1, x_2, x_3, x_4, x_5, x_6)$ is the state variable vector that represents the crane position $x$ and velocity $\dot{x}$, the hook swing angle $\theta_1$ and angular velocity $\dot{\theta}_1$, and the payload swing angle $\theta_2$ and angular velocity $\dot{\theta}_2$; $f_1(X)$, $f_2(X)$, $f_3(X)$, $b_1(X)$, $b_2(X)$ and $b_3(X)$ are continuous nonlinear or linear functions; and $u$ is the control input. From equation (7), the system has three coupled subsystems: the position subsystem and two anti-swing subsystems (hook and payload). In order to decouple the system, three sliding mode functions are defined for the three subsystems:

$$s_1 = x_2 + \lambda_1 x_1, \qquad s_2 = x_4 + \lambda_2 x_3, \qquad s_3 = x_6 + \lambda_3 x_5 \qquad (8)$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are positive real numbers.
Based on the three sliding mode functions, a composite sliding mode function can be further defined as

$$s = k_1 s_1 + k_2 s_2 + k_3 s_3 \qquad (9)$$

where $k_1$, $k_2$ and $k_3$ are real numbers. The control objective is to determine a feedback control such that the closed-loop system is globally stable, in the sense that all the state variables are uniformly bounded and converge to their equilibrium asymptotically. The main process is thus to make the system states move toward the surfaces $s_1 = 0$, $s_2 = 0$ and $s_3 = 0$ and then converge to the equilibrium point asymptotically. The signs and values of the coefficients $k_1$, $k_2$ and $k_3$ determine the stability and accessibility of the composite sliding mode surface, and their values adjust the roles of the three subsystems in the composite sliding mode function and hence their respective proportions in system control.

For DPTOC systems, the composite sliding mode function $s$ serves as the input to the sliding mode fuzzy controller. To determine the final control action, we design the following fuzzy rules:

Rk: IF s IS Fk THEN u IS Uk

where Rk is the kth rule among p rules, Fk is a fuzzy set of the input variable s, and Uk is a fuzzy set of the output variable u. Singleton output fuzzy sets and center-of-gravity defuzzification are used:

$$u = \frac{\sum_{k=1}^{p} \mu_{F_k}(s)\, U_k}{\sum_{k=1}^{p} \mu_{F_k}(s)} \qquad (10)$$

where $\mu_{F_k}(s)$ is the firing degree of the kth rule and $u$ is the output of the composite sliding mode fuzzy controller.

3.3 Stability Analysis for DPTOC
Now we consider the asymptotic stability of the DPTOC system under the composite sliding mode fuzzy control.

Theorem 1. For the DPTOC system (1), consider the proposed composite sliding mode fuzzy control (10). If the parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ in the sliding mode functions are positive real numbers, the parameters $k_1$ and $k_3$ in the composite sliding mode function are positive real numbers and $k_2$ is a negative real number, and $k_1 b_1(X) + k_2 b_2(X) + k_3 b_3(X) > 0$ holds, then the system is asymptotically stable.

Proof: The stability of the closed-loop system consists of the stability in the sliding mode surface and the accessibility of the sliding mode surface.

a. The stability in the sliding mode surface

All the sliding mode surfaces $s_1 = 0$, $s_2 = 0$ and $s_3 = 0$ are stable because $\lambda_1$, $\lambda_2$ and $\lambda_3$ are positive real numbers. The stability in the composite sliding mode
surface is decided by the coupling factors $k_1$, $k_2$ and $k_3$ in equation (9) and by the definitions of the swing angles and the control input $u$. Assume that positive angles and a positive $u$ are along the positive X-direction, as in Fig. 1. When the position sliding mode function $s_1 > 0$, a negative control input is required so that $s_1$ approaches $s_1 = 0$. When the payload anti-swing sliding mode function $s_3 > 0$, a negative control input is required so that $s_3$ approaches $s_3 = 0$. However, when the hook anti-swing sliding mode function $s_2 > 0$, a positive driving force is required so that $s_2$ approaches $s_2 = 0$. Therefore, the role of the hook anti-swing sliding mode function in control contradicts that of the others, and the coupling factors $k_1$ and $k_3$ should be positive and $k_2$ should be negative for stability in the sliding mode surface.

b. The accessibility of the sliding mode surface

Choose the Lyapunov function candidate

$$V = s^2/2 \qquad (11)$$

The time derivative of the composite sliding mode function (9) is

$$\dot{s} = k_1(\dot{x}_2 + \lambda_1 \dot{x}_1) + k_2(\dot{x}_4 + \lambda_2 \dot{x}_3) + k_3(\dot{x}_6 + \lambda_3 \dot{x}_5) \qquad (12)$$

Substituting equation (7) into equation (12) gives

$$\dot{s} = k_1(f_1(X) + \lambda_1 x_2) + k_2 f_2(X) + k_2 \lambda_2 x_4 + k_3 f_3(X) + k_3 \lambda_3 x_6 + (k_1 b_1(X) + k_2 b_2(X) + k_3 b_3(X))u \qquad (13)$$

It is easy to obtain the time derivative of the Lyapunov function candidate

$$\dot{V} = s\dot{s} = M(X)s + N(X)su \qquad (14)$$

where

$$\begin{aligned} M(X) &= k_1(f_1(X) + \lambda_1 x_2) + k_2 f_2(X) + k_2 \lambda_2 x_4 + k_3 f_3(X) + k_3 \lambda_3 x_6 \\ N(X) &= k_1 b_1(X) + k_2 b_2(X) + k_3 b_3(X) \end{aligned} \qquad (15)$$

If $k_1 b_1(X) + k_2 b_2(X) + k_3 b_3(X) > 0$ holds, the composite sliding mode fuzzy inference can be designed so that increasing the control input $u$ decreases $s\dot{s}$ when the sliding mode function $s$ is negative, and decreasing the control input $u$ decreases $s\dot{s}$ when $s$ is positive. Therefore, the sliding mode surface $s = 0$ is accessible.
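The composite sliding mode function (9) and the defuzzification rule (10) of Section 3.2 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the triangular input sets, their centres, the singleton outputs and the numerical parameters are hypothetical, chosen only so that a positive s produces a negative u, in line with the accessibility argument.

```python
def composite_sliding_function(state, lambdas, ks):
    """s = k1*s1 + k2*s2 + k3*s3 with s_i as in equation (8)."""
    x1, x2, x3, x4, x5, x6 = state
    lam1, lam2, lam3 = lambdas
    k1, k2, k3 = ks
    s1 = x2 + lam1 * x1
    s2 = x4 + lam2 * x3
    s3 = x6 + lam3 * x5
    return k1 * s1 + k2 * s2 + k3 * s3


def firing_degrees(s, centers):
    """Triangular memberships centred on sorted `centers`; inputs outside
    the outermost centres saturate at the nearest one."""
    s = min(max(s, centers[0]), centers[-1])
    degrees = []
    for k, c in enumerate(centers):
        if s == c:
            degrees.append(1.0)
        elif s < c:
            left = centers[k - 1]
            degrees.append((s - left) / (c - left) if s > left else 0.0)
        else:
            right = centers[k + 1]
            degrees.append((right - s) / (right - c) if s < right else 0.0)
    return degrees


def csmfc_output(s, centers, singletons):
    """Center-of-gravity defuzzification with singleton outputs, equation (10)."""
    degrees = firing_degrees(s, centers)
    return sum(mu * u_k for mu, u_k in zip(degrees, singletons)) / sum(degrees)


CENTERS = [-1.0, -0.5, 0.0, 0.5, 1.0]        # hypothetical input set centres
SINGLETONS = [10.0, 5.0, 0.0, -5.0, -10.0]   # hypothetical outputs U_k

s_value = composite_sliding_function((1.0, 0.0, 0.0, 0.0, 0.0, 0.0),
                                     (0.5, 1.0, 1.0), (1.0, -2.0, 1.0))
u_value = csmfc_output(s_value, CENTERS, SINGLETONS)
```

Here k2 is negative and k1, k3 are positive, matching the sign pattern required by Theorem 1.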
4 CSMFC Parameters Adjusting by Real-Valued GA

System performance is very sensitive to the slopes $\lambda_1$, $\lambda_2$ and $\lambda_3$ of the sliding mode functions: when a slope becomes larger, the rise time becomes smaller, but both the overshoot and the settling time become larger, and vice versa. The values of the coefficients $k_1$, $k_2$ and $k_3$ affect not only the stability of the closed-loop DPTOC system, but also the control performance and the roles of the three subsystems in the system control. The motor speed is limited in practice, and the trolley rated speed $v_e$, obtained from the motor rated speed, is the optimum transporting speed. To make the trolley travel at $v_e$ after the acceleration phase, the position can be limited as follows when calculating the control variable:
$$\bar{x}_1 = \eta_1\, \mathrm{sat}(x_1 / \eta_1) , \tag{16}$$
where $\bar{x}_1$ is used in place of $x_1$ to obtain $s$ in equation (8), and $\eta_1$ is the maximum of $\bar{x}_1$, which decides the decelerating distance from the current position to the desired position. The position sliding mode parameter $\lambda_1$ can be obtained by
$$\lambda_1 = v_e / \eta_1 . \tag{17}$$
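The position limiting in (16) and the slope choice in (17) can be sketched in a few lines. This is only an illustration: the source does not define `sat`, so the usual unit saturation (clipping to [-1, 1]) is assumed here, and the function names are hypothetical.

```python
def sat(z):
    """Unit saturation, assumed here to clip z to the interval [-1, 1]."""
    return max(-1.0, min(1.0, z))


def limited_position(x1, eta1):
    """Equation (16): position used in the sliding function, bounded by eta1."""
    return eta1 * sat(x1 / eta1)


def position_slope(v_e, eta1):
    """Equation (17): slope lambda_1 from rated speed v_e and eta1."""
    return v_e / eta1
```

With this choice, a position beyond $\eta_1$ contributes only $\lambda_1 \eta_1 = v_e$ to the position sliding function, which is consistent with the stated goal of travelling at the rated speed until the decelerating distance is reached.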
It is very difficult to determine the other parameters, because they have no obvious physical meaning and there are complex couplings among them. Therefore, a real-valued GA [9] is used to optimize the parameters of the composite sliding mode fuzzy controller. In applying the real-valued GA to this optimization problem, two considerations must be made: the fitness function and the encoding.

a. Encoding
The chromosome, comprising five genes, represents a set of parameters: $P = [\lambda_2 \;\; \lambda_3 \;\; k_1 \;\; k_2 \;\; k_3]$.

b. Fitness Function
The fitness function is the key to using the GA. A simple fitness function that reflects small steady-state errors, short rise time, low oscillations and overshoots, and good stability is given by
$$J = \frac{1}{\mathit{penalty} \int \left( \mu_1 x + \mu_2 \theta_1 + \mu_3 \theta_2 + \mu_4 u \right) dt} , \tag{18}$$
where $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ are coefficients that respectively represent the importance of the position, hook angle, payload angle and control input in the cost function. The penalty is a penalty coefficient applied when position overshoot arises.

c. Steps of Real-Valued GA
The real-valued GA that we implemented can be described as follows:
Step 1: Generate the initial population P(t) randomly.
Step 2: While (number of generations …

… $M > 0$ means that $M$ is a positive definite matrix. In a block symmetric matrix, $*$ in the $(i, j)$ block means the transpose of the matrix in the $(j, i)$ block. Finally, $\|\cdot\|_\infty$ denotes the $H_\infty$ norm of the system.
2 T-S Fuzzy System with IQC's

We consider the following fuzzy dynamic system with IQC's.

Plant Rule $i$ ($i = 1, \ldots, r$): IF $\rho_1(t)$ is $M_{i1}$ and $\cdots$ and $\rho_g(t)$ is $M_{ig}$, THEN
$$\begin{aligned}
\dot{x}(t) &= A_i x(t) + B_i u(t) + \sum_{j=1}^{q} F_{ij}\, p_j(t), \\
y(t) &= C_i x(t) + \sum_{j=1}^{q} G_{ij}\, p_j(t), \\
q_j(t) &= H_{ij} x(t) + J_{ij} u(t) + K_{ij}\, p(t),
\end{aligned} \tag{1}$$
where $r$ is the number of fuzzy rules, and $\rho_k(t)$ and $M_{ik}$ ($k = 1, \ldots, g$) are the premise variables and the fuzzy sets respectively. $x(t) \in R^n$ is the state vector, $u(t) \in R^m$ is the input, $y(t) \in R^p$ is the output variable, and $p_j(t) \in R^{k_j}$ and $q_j(t) \in R^{k_j}$ are the uncertain variables related to the following IQC's:
$$\int_0^\infty p_j(t)^T p_j(t)\, dt \le \int_0^\infty q_j(t)^T q_j(t)\, dt , \qquad j = 1, \ldots, q, \tag{2}$$
with $p(t) = [\, p_1(t)^T \; \cdots \; p_q(t)^T \,]^T$ and $q(t) = [\, q_1(t)^T \; \cdots \; q_q(t)^T \,]^T$. In addition, $A_i, B_i, \ldots, K_{ij}$ are constant matrices with compatible dimensions. To simplify notation, we define
S.-H. Yoo and B.-J. Choi
$$F_i = [\, F_{i1} \; \cdots \; F_{iq} \,], \quad G_i = [\, G_{i1} \; \cdots \; G_{iq} \,], \quad H_i = [\, H_{i1}^T \; \cdots \; H_{iq}^T \,]^T, \quad J_i = [\, J_{i1}^T \; \cdots \; J_{iq}^T \,]^T, \quad K_i = [\, K_{i1}^T \; \cdots \; K_{iq}^T \,]^T .$$
Let $\mu_i(\rho(t))$, $i = 1, \ldots, r$, be the normalized membership function of the inferred fuzzy set $h_i(\rho(t))$,
$$\mu_i(\rho(t)) = \frac{h_i(\rho(t))}{\sum_{i=1}^{r} h_i(\rho(t))} , \tag{3}$$
where
$$h_i(\rho(t)) = \prod_{k=1}^{g} M_{ik}(\rho_k(t)) , \qquad \rho(t) = [\, \rho_1(t) \;\; \rho_2(t) \;\; \cdots \;\; \rho_g(t) \,]^T . \tag{4}$$
Assuming $h_i(\rho(t)) \ge 0$ for all $i$ and $\sum_{i=1}^{r} h_i(\rho(t)) > 0$, we obtain
$$\mu_i(\rho(t)) \ge 0, \qquad \sum_{i=1}^{r} \mu_i(\rho(t)) = 1 . \tag{5}$$
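As a minimal sketch of (3) to (5), the firing strengths and normalized memberships can be computed as follows. The membership functions in the usage note are hypothetical examples, and the function names are not from the paper.

```python
import numpy as np


def firing_strengths(rho, memberships):
    """h_i = prod_k M_ik(rho_k), as in (4).

    memberships[i][k] is a callable for the fuzzy set M_ik (hypothetical API).
    """
    return np.array([
        np.prod([M_ik(rho_k) for M_ik, rho_k in zip(rule, rho)])
        for rule in memberships
    ])


def normalized_memberships(h):
    """mu_i = h_i / sum_i h_i, as in (3); requires sum_i h_i > 0."""
    total = h.sum()
    if total <= 0.0:
        raise ValueError("at least one rule must fire")
    return h / total
```

For example, with one premise variable and two rules whose memberships are $1 - \rho$ and $\rho$ on $[0, 1]$, the value $\rho = 0.25$ gives $\mu = (0.75, 0.25)$, which is non-negative and sums to one as stated in (5).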
For simplicity, by defining $\mu_i = \mu_i(\rho(t))$ and $\mu^T = [\, \mu_1 \; \cdots \; \mu_r \,]$, the uncertain fuzzy system (1) can be written as follows:
$$\begin{aligned}
\dot{x}(t) &= \sum_{i=1}^{r} \mu_i \left( A_i x(t) + B_i u(t) + F_i p(t) \right) = A(\mu) x(t) + B(\mu) u(t) + F(\mu) p(t), \\
y(t) &= \sum_{i=1}^{r} \mu_i \left( C_i x(t) + G_i p(t) \right) = C(\mu) x(t) + G(\mu) p(t), \\
q(t) &= \sum_{i=1}^{r} \mu_i \left( H_i x(t) + J_i u(t) + K_i p(t) \right) = H(\mu) x(t) + J(\mu) u(t) + K(\mu) p(t).
\end{aligned} \tag{6}$$
In a packed matrix notation, we express the fuzzy system (6) as in (7):
$$G = \begin{bmatrix} A(\mu) & F(\mu) & B(\mu) \\ H(\mu) & K(\mu) & J(\mu) \\ C(\mu) & G(\mu) & 0 \end{bmatrix} . \tag{7}$$
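The blended matrices in (6), such as $A(\mu) = \sum_i \mu_i A_i$, are convex combinations of the local models. A small sketch, assuming the local matrices are given as NumPy arrays (the helper name `blend` is hypothetical):

```python
import numpy as np


def blend(mu, local_matrices):
    """Return sum_i mu_i * M_i for a list of equally shaped local matrices."""
    stacked = np.stack(local_matrices)          # shape (r, rows, cols)
    return np.tensordot(mu, stacked, axes=1)    # contracts over the rule index
```

The same helper applies unchanged to $B_i$, $F_i$, $C_i$, $G_i$, $H_i$, $J_i$ and $K_i$, since every block of (7) is blended with the same weights $\mu_i$.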
3 A Balanced Realization

In this section we present a balanced realization for the uncertain fuzzy system (7) using generalized controllability and observability Gramians. We define generalized controllability and observability Gramians in Lemma 1.
A Balanced Model Reduction for T-S Fuzzy Systems with IQC’s
Lemma 1: (Generalized controllability and observability Gramians)
(i) Suppose that there exist $Q = Q^T > 0$ and $R = \mathrm{diag}(\tau_1 I_{k_1}, \ldots, \tau_q I_{k_q}) > 0$ satisfying LMI (9). Then the output energy is bounded above as follows:
$$\int_0^\infty y(t)^T y(t)\, dt < x(0)^T Q\, x(0) \quad \text{for } u(t) \equiv 0 , \tag{8}$$
$$L_{oi} = \begin{bmatrix} A_i^T Q + Q A_i & * & * & * \\ R H_i & -R & * & * \\ F_i^T Q & K_i^T R & -R & * \\ C_i & 0 & G_i & -I \end{bmatrix} < 0, \qquad i = 1, \ldots, r. \tag{9}$$
(ii) Suppose that there exist $P = P^T > 0$ and $S = \mathrm{diag}(\tau_1 I_{k_1}, \ldots, \tau_q I_{k_q}) > 0$ satisfying LMI (11). Then the input energy transferring from $x(-\infty) = 0$ to $x(0) = x_0$ is bounded below as follows:
$$\int_{-\infty}^{0} u(t)^T u(t)\, dt > x_0^T P^{-1} x_0 , \tag{10}$$
$$L_{ci} = \begin{bmatrix} P A_i^T + A_i P & * & * & * \\ H_i P & -S & * & * \\ S F_i^T & S K_i^T & -S & * \\ B_i^T & J_i^T & 0 & -I \end{bmatrix} < 0, \qquad i = 1, \ldots, r. \tag{11}$$
Proof: Using the Lyapunov function candidates $V = x(t)^T Q\, x(t)$ (in the proof of (i)) and $V = x(t)^T P^{-1} x(t)$ (in the proof of (ii)) together with the S-procedure, the lemma can be easily proved, so we omit the detailed proof.

As in [6], we say that $Q$ and $P$, solutions of LMIs (9) and (11), are the generalized observability Gramian and controllability Gramian respectively. While the observability and controllability Gramians of linear time-invariant systems are unique, the generalized Gramians of the fuzzy system (7) are not unique. However, the generalized Gramians are related to the input and output energy, as can be seen in Lemma 1. Using the generalized Gramians, we suggest a balanced realization of the uncertain fuzzy system (7). We obtain transformation matrices $T$ and $W$ satisfying
$$\begin{aligned}
\Sigma &= \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n) = T^T Q\, T = T^{-1} P\, T^{-T}, & \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n , \\
\Pi &= \mathrm{diag}(\pi_1 I_{i_1}, \pi_2 I_{i_2}, \ldots, \pi_q I_{i_q}) = W^T R\, W = W^{-1} S\, W^{-T}, & \pi_1 \ge \pi_2 \ge \cdots \ge \pi_q ,
\end{aligned} \tag{12}$$
where $i_j$ ($j = 1, \ldots, q$) is a member of the index set $\{k_1, k_2, \ldots, k_q\}$. With $T$ and $W$ defined in (12), the change of coordinates in the fuzzy system (7) gives
$$G_b = \begin{bmatrix} A_b(\mu) & F_b(\mu) & B_b(\mu) \\ H_b(\mu) & K_b(\mu) & J_b(\mu) \\ C_b(\mu) & G_b(\mu) & 0 \end{bmatrix} = \begin{bmatrix} T^{-1} A(\mu) T & T^{-1} F(\mu) W & T^{-1} B(\mu) \\ W^{-1} H(\mu) T & W^{-1} K(\mu) W & W^{-1} J(\mu) \\ C(\mu) T & G(\mu) W & 0 \end{bmatrix} . \tag{13}$$
One can easily observe that the state space realization (13) satisfies the following LMIs (14) and (15):
$$L_o(\mu) = \begin{bmatrix} A_b(\mu)^T \Sigma + \Sigma A_b(\mu) + C_b(\mu)^T C_b(\mu) & * & * \\ \Pi H_b(\mu) & -\Pi & * \\ F_b(\mu)^T \Sigma + G_b(\mu)^T C_b(\mu) & K_b(\mu)^T \Pi & G_b(\mu)^T G_b(\mu) - \Pi \end{bmatrix} < 0, \tag{14}$$
$$L_c(\mu) = \begin{bmatrix} \Sigma A_b(\mu)^T + A_b(\mu) \Sigma + B_b(\mu) B_b(\mu)^T & * & * \\ H_b(\mu) \Sigma + J_b(\mu) B_b(\mu)^T & J_b(\mu) J_b(\mu)^T - \Pi & * \\ \Pi F_b(\mu)^T & \Pi K_b(\mu)^T & -\Pi \end{bmatrix} < 0. \tag{15}$$
For this reason, we say that the realization (13) is a balanced realization of the fuzzy system (7) and that $\Sigma$ is a balanced Gramian.
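One standard way to build a transformation $T$ with $T^T Q T = T^{-1} P T^{-T} = \Sigma$ is the square-root construction from Cholesky factors and a singular value decomposition. The sketch below assumes that $P$ and $Q$ are already available as positive definite solutions of LMIs (11) and (9); it is not necessarily the construction used by the authors, and the function name is hypothetical.

```python
import numpy as np


def balancing_transformation(P, Q):
    """Return (T, sigma) with T.T @ Q @ T = inv(T) @ P @ inv(T).T = diag(sigma).

    P, Q: symmetric positive definite generalized Gramians (assumed given).
    """
    Lp = np.linalg.cholesky(P)            # P = Lp @ Lp.T
    Lq = np.linalg.cholesky(Q)            # Q = Lq @ Lq.T
    U, sigma, Vt = np.linalg.svd(Lq.T @ Lp)
    T = Lp @ Vt.T @ np.diag(sigma ** -0.5)
    return T, sigma                       # sigma is sorted in descending order
```

The same routine applied to the pair $(S, R)$ yields a candidate $W$ and the diagonal of $\Pi$ when the blocks are scalar; for block-structured $R$ and $S$ the block pattern of (12) would have to be enforced separately.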
4 Balanced Model Reduction

In this section, we develop a balanced model reduction scheme using the balanced Gramian defined in Section 3. We also derive an upper bound on the model approximation error. We assume that the fuzzy system (7) is already balanced and partitioned as follows:
$$G = \begin{bmatrix} A_{11}(\mu) & A_{12}(\mu) & F_{11}(\mu) & F_{12}(\mu) & B_1(\mu) \\ A_{21}(\mu) & A_{22}(\mu) & F_{21}(\mu) & F_{22}(\mu) & B_2(\mu) \\ H_{11}(\mu) & H_{12}(\mu) & K_{11}(\mu) & K_{12}(\mu) & J_1(\mu) \\ H_{21}(\mu) & H_{22}(\mu) & K_{21}(\mu) & K_{22}(\mu) & J_2(\mu) \\ C_1(\mu) & C_2(\mu) & G_1(\mu) & G_2(\mu) & 0 \end{bmatrix} = \sum_{i=1}^{r} \mu_i \begin{bmatrix} A_{i,11} & A_{i,12} & F_{i,11} & F_{i,12} & B_{i,1} \\ A_{i,21} & A_{i,22} & F_{i,21} & F_{i,22} & B_{i,2} \\ H_{i,11} & H_{i,12} & K_{i,11} & K_{i,12} & J_{i,1} \\ H_{i,21} & H_{i,22} & K_{i,21} & K_{i,22} & J_{i,2} \\ C_{i,1} & C_{i,2} & G_{i,1} & G_{i,2} & 0 \end{bmatrix} , \tag{16}$$
where $A_{11}(\mu) \in R^{k \times k}$, $F_{11}(\mu) \in R^{k \times (i_1 + \cdots + i_v)}$ and the other matrices are compatibly partitioned. From (16) we obtain a reduced order model by truncating $n - k$ states and $q - v$ IQC's as follows:
$$\hat{G} = \begin{bmatrix} A_{11}(\mu) & F_{11}(\mu) & B_1(\mu) \\ H_{11}(\mu) & K_{11}(\mu) & J_1(\mu) \\ C_1(\mu) & G_1(\mu) & 0 \end{bmatrix} = \sum_{i=1}^{r} \mu_i \begin{bmatrix} A_{i,11} & F_{i,11} & B_{i,1} \\ H_{i,11} & K_{i,11} & J_{i,1} \\ C_{i,1} & G_{i,1} & 0 \end{bmatrix} . \tag{17}$$
Theorem 2: The reduced order system (17) is quadratically stable and balanced. Moreover, the model approximation error is bounded by
$$\| G - \hat{G} \|_\infty \le 2 \left( \sum_{j=k+1}^{n} \sigma_j + \sum_{j=v+1}^{q} \pi_j \right) . \tag{18}$$
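The bound (18) is simply twice the sum of the discarded diagonal entries of $\Sigma$ and $\Pi$. A short sketch, assuming `sigma` and `pi` hold the entries sorted in descending order (the function name is hypothetical):

```python
def truncation_error_bound(sigma, pi, k, v):
    """Right-hand side of (18): keep the first k states and the first v IQC's."""
    return 2.0 * (sum(sigma[k:]) + sum(pi[v:]))
```

For instance, with $\sigma = (3, 1, 0.25, 0.125)$, $\pi = (2, 0.5)$, $k = 2$ and $v = 1$, the bound is $2\,(0.375 + 0.5) = 1.75$.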
Proof: We partition $\Sigma = \mathrm{diag}(\Sigma_1, \Sigma_2)$ and $\Pi = \mathrm{diag}(\Pi_1, \Pi_2)$, where $\Sigma_1 \in R^{k \times k}$ and $\Pi_1 \in R^{(i_1 + \cdots + i_v) \times (i_1 + \cdots + i_v)}$. Then the reduced order system (17) satisfies LMIs (19) and (20):
$$\begin{bmatrix} A_{11}(\mu)^T \Sigma_1 + \Sigma_1 A_{11}(\mu) + C_1(\mu)^T C_1(\mu) & * & * \\ \Pi_1 H_{11}(\mu) & -\Pi_1 & * \\ F_{11}(\mu)^T \Sigma_1 + G_1(\mu)^T C_1(\mu) & K_{11}(\mu)^T \Pi_1 & G_1(\mu)^T G_1(\mu) - \Pi_1 \end{bmatrix} < 0, \tag{19}$$
$$\begin{bmatrix} \Sigma_1 A_{11}(\mu)^T + A_{11}(\mu) \Sigma_1 + B_1(\mu) B_1(\mu)^T & * & * \\ H_{11}(\mu) \Sigma_1 + J_1(\mu) B_1(\mu)^T & J_1(\mu) J_1(\mu)^T - \Pi_1 & * \\ \Pi_1 F_{11}(\mu)^T & \Pi_1 K_{11}(\mu)^T & -\Pi_1 \end{bmatrix} < 0. \tag{20}$$
Hence the reduced order system is quadratically stable and balanced. Without loss of generality we consider two cases.

Case 1: ($k = n - 1$, $v = q$) Note that in this case $F_{12}(\mu)$, $F_{22}(\mu)$, $H_{21}(\mu)$, $H_{22}(\mu)$, $J_2(\mu)$ and $G_2(\mu)$ are empty matrices. Hence a state space realization of the error system $G_e = G - \hat{G}$ can be written as
$$G_e = \begin{bmatrix} A_e(\mu) & F_e(\mu) & B_e(\mu) \\ H_e(\mu) & K_e(\mu) & J_e(\mu) \\ C_e(\mu) & G_e(\mu) & 0 \end{bmatrix} := \begin{bmatrix} A_{11}(\mu) & 0 & 0 & F_{11}(\mu) & 0 & B_1(\mu) \\ 0 & A_{11}(\mu) & A_{12}(\mu) & 0 & F_{11}(\mu) & B_1(\mu) \\ 0 & A_{21}(\mu) & A_{22}(\mu) & 0 & F_{21}(\mu) & B_2(\mu) \\ H_{11}(\mu) & 0 & 0 & K(\mu) & 0 & J(\mu) \\ 0 & H_{11}(\mu) & H_{12}(\mu) & 0 & K(\mu) & J(\mu) \\ -C_1(\mu) & C_1(\mu) & C_2(\mu) & -G(\mu) & G(\mu) & 0 \end{bmatrix} . \tag{21}$$
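The change of coordinates with $M$ in the error system gives
$$G_e = \begin{bmatrix} A_e(\mu) & F_e(\mu) & B_e(\mu) \\ H_e(\mu) & K_e(\mu) & J_e(\mu) \\ C_e(\mu) & G_e(\mu) & 0 \end{bmatrix} := \begin{bmatrix} M^{-1} A_e(\mu) M & M^{-1} F_e(\mu) & M^{-1} B_e(\mu) \\ H_e(\mu) M & K_e(\mu) & J_e(\mu) \\ C_e(\mu) M & G_e(\mu) & 0 \end{bmatrix} , \tag{22}$$
where
$$M = \begin{bmatrix} I & I & 0 \\ I & -I & 0 \\ 0 & 0 & I \end{bmatrix} .$$
It is well known that the existence of $\Sigma_e = \Sigma_e^T > 0$ and $\Pi_e = \Pi_e^T > 0$ satisfying the following LMI (23) guarantees $\| G_e \|_\infty \le \gamma$:
$$L = \begin{bmatrix} \Gamma_{11} & * & * \\ H_e(\mu) \Sigma_e + J_e(\mu) B_e(\mu)^T & J_e(\mu) J_e(\mu)^T - \Pi_e & * \\ \Pi_e F_e(\mu)^T + \gamma^{-2} \Pi_e G_e(\mu)^T C_e(\mu) \Sigma_e & \Pi_e K_e(\mu)^T & \Gamma_{33} \end{bmatrix} < 0, \tag{23}$$
where
$$\Gamma_{11} = \Sigma_e A_e(\mu)^T + A_e(\mu) \Sigma_e + B_e(\mu) B_e(\mu)^T + \gamma^{-2} \Sigma_e C_e(\mu)^T C_e(\mu) \Sigma_e , \qquad \Gamma_{33} = \gamma^{-2} \Pi_e G_e(\mu)^T G_e(\mu) \Pi_e - \Pi_e .$$
Let $\gamma = 2\sigma_n$, $\Sigma_e = \mathrm{diag}(\Sigma_1, \sigma_n^2 \Sigma_1^{-1}, 2\sigma_n)$ and
$$\Pi_e = \begin{bmatrix} \Pi + \sigma_n^2 \Pi^{-1} & \Pi - \sigma_n^2 \Pi^{-1} \\ \Pi - \sigma_n^2 \Pi^{-1} & \Pi + \sigma_n^2 \Pi^{-1} \end{bmatrix} .$$
Then LMI (23) can be written as follows:
$$L = \begin{bmatrix} U_1^T & 0 & 0 \\ 0 & U_2^T & 0 \\ 0 & 0 & U_2^T \end{bmatrix} L_c(\mu) \begin{bmatrix} U_1 & 0 & 0 \\ 0 & U_2 & 0 \\ 0 & 0 & U_2 \end{bmatrix} + \begin{bmatrix} V_1^T & 0 & 0 \\ 0 & V_2^T & 0 \\ 0 & 0 & V_2^T \end{bmatrix} L_o(\mu) \begin{bmatrix} V_1 & 0 & 0 \\ 0 & V_2 & 0 \\ 0 & 0 & V_2 \end{bmatrix} < 0, \tag{24}$$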
where
$$U_1 = \begin{bmatrix} I & 0 & 0 \\ 0 & 0 & I \end{bmatrix} , \quad U_2^T = \begin{bmatrix} I \\ I \end{bmatrix} , \quad V_1 = \begin{bmatrix} 0 & \sigma_n \Sigma_1^{-1} & 0 \\ 0 & 0 & -I \end{bmatrix} , \quad V_2^T = \begin{bmatrix} \sigma_n \Pi^{-1} \\ -\sigma_n \Pi^{-1} \end{bmatrix} .$$

Case 2: ($k = n$, $v = q - 1$) In this case, $A_{12}(\mu)$, $A_{21}(\mu)$, $A_{22}(\mu)$, $F_{21}(\mu)$, $F_{22}(\mu)$, $H_{12}(\mu)$ and $H_{22}(\mu)$ are empty matrices, so that the error system becomes
$$G_e = \begin{bmatrix} A_e(\mu) & F_e(\mu) & B_e(\mu) \\ H_e(\mu) & K_e(\mu) & J_e(\mu) \\ C_e(\mu) & G_e(\mu) & 0 \end{bmatrix} := \begin{bmatrix} A(\mu) & 0 & F_{11}(\mu) & 0 & 0 & B(\mu) \\ 0 & A(\mu) & 0 & F_{11}(\mu) & F_{12}(\mu) & B(\mu) \\ H_{11}(\mu) & 0 & K_{11}(\mu) & 0 & 0 & J_1(\mu) \\ 0 & H_{11}(\mu) & 0 & K_{11}(\mu) & K_{12}(\mu) & J_1(\mu) \\ 0 & H_{21}(\mu) & 0 & K_{21}(\mu) & K_{22}(\mu) & J_2(\mu) \\ -C(\mu) & C(\mu) & -G_1(\mu) & G_1(\mu) & G_2(\mu) & 0 \end{bmatrix} . \tag{25}$$
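The change of coordinates with $M$ in the error system gives
$$G_e = \begin{bmatrix} A_e(\mu) & F_e(\mu) & B_e(\mu) \\ H_e(\mu) & K_e(\mu) & J_e(\mu) \\ C_e(\mu) & G_e(\mu) & 0 \end{bmatrix} := \begin{bmatrix} M^{-1} A_e(\mu) M & M^{-1} F_e(\mu) & M^{-1} B_e(\mu) \\ H_e(\mu) M & K_e(\mu) & J_e(\mu) \\ C_e(\mu) M & G_e(\mu) & 0 \end{bmatrix} , \tag{26}$$
where $M = \begin{bmatrix} I & I \\ I & -I \end{bmatrix}$. We define $\gamma = 2\pi_q$ and
$$\Pi = \begin{bmatrix} \Pi_1 & 0 \\ 0 & \pi_q I_{i_q} \end{bmatrix} , \quad \Sigma_e = \begin{bmatrix} \Sigma & 0 \\ 0 & \pi_q^2 \Sigma^{-1} \end{bmatrix} , \quad \Pi_e = \begin{bmatrix} \Pi_1 + \pi_q^2 \Pi_1^{-1} & \Pi_1 - \pi_q^2 \Pi_1^{-1} & 0 \\ \Pi_1 - \pi_q^2 \Pi_1^{-1} & \Pi_1 + \pi_q^2 \Pi_1^{-1} & 0 \\ 0 & 0 & 2\pi_q I_{i_q} \end{bmatrix} .$$
Then LMI (23) can be written as
$$L = \begin{bmatrix} U_1^T & 0 & 0 \\ 0 & U_2^T & 0 \\ 0 & 0 & U_2^T \end{bmatrix} L_c(\mu) \begin{bmatrix} U_1 & 0 & 0 \\ 0 & U_2 & 0 \\ 0 & 0 & U_2 \end{bmatrix} + \begin{bmatrix} V_1^T & 0 & 0 \\ 0 & V_2^T & 0 \\ 0 & 0 & V_2^T \end{bmatrix} L_o(\mu) \begin{bmatrix} V_1 & 0 & 0 \\ 0 & V_2 & 0 \\ 0 & 0 & V_2 \end{bmatrix} < 0, \tag{27}$$
where
$$U_1 = \begin{bmatrix} I & 0 \end{bmatrix} , \quad U_2 = \begin{bmatrix} I & I & 0 \\ 0 & 0 & I \end{bmatrix} , \quad V_1 = \begin{bmatrix} 0 & \pi_q \Sigma^{-1} \end{bmatrix} , \quad V_2 = \begin{bmatrix} \pi_q \Pi_1^{-1} & -\pi_q \Pi_1^{-1} & 0 \\ 0 & 0 & -I \end{bmatrix} .$$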
This completes the proof.

In Theorem 2, we have derived an upper bound on the model reduction error. To obtain a less conservative error bound, the $n - k$ smallest $\sigma_i$'s of $\Sigma$ and the $q - v$ smallest $\pi_i$'s of $\Pi$ need to be small. Hence we choose the cost function $J = \mathrm{tr}(PQ) + \alpha\, \mathrm{tr}(RS)$ for a positive constant $\alpha$, and minimize it subject to the convex constraints (9) and (11). Since the cost function is non-convex, this optimization problem is very difficult to solve. We therefore suggest a suboptimal procedure based on the following iterative method.

step 1: Set $i = 0$. Initialize $P_i$, $Q_i$, $R_i$ and $S_i$ such that $\mathrm{tr}(P_i + Q_i) + \alpha\, \mathrm{tr}(R_i + S_i)$ is minimized subject to LMIs (9) and (11).
step 2: Set $i = i + 1$.
1) Minimize $J_i = \mathrm{tr}(P_{i-1} Q_i) + \alpha\, \mathrm{tr}(R_i S_{i-1})$ subject to LMI (9).
2) Minimize $J_i = \mathrm{tr}(P_i Q_i) + \alpha\, \mathrm{tr}(R_i S_i)$ subject to LMI (11).
step 3: If $|J_i - J_{i-1}|$ is less than a small tolerance level, stop the iteration. Otherwise, go to step 2.
5 Concluding Remark

In this paper, we have studied a balanced model reduction problem for T-S fuzzy systems with IQC's. For this purpose, we have defined generalized controllability and observability Gramians for the uncertain fuzzy system. These generalized Gramians can be obtained from solutions of an LMI problem. Using the generalized Gramians, we have derived a balanced state space realization. We have obtained the reduced model of the fuzzy system by truncating not only some state variables but also some IQC's.
References

1. Moore, B.C.: Principal component analysis in linear systems: Controllability, observability and model reduction. IEEE Trans. Automatic Contr., Vol. 26 (1982) 17-32
2. Pernebo, L., Silverman, L.M.: Model reduction via balanced state space representations. IEEE Trans. Automatic Contr., Vol. 27 (1982) 382-387
3. Glover, K.: All optimal Hankel-norm approximations of linear multivariable systems and their error bounds. Int. J. Control, Vol. 39 (1984) 1115-1193
4. Liu, Y., Anderson, B.D.O.: Singular perturbation approximation of balanced systems. Int. J. Control, Vol. 50 (1989) 1379-1405
5. Beck, C.L., Doyle, J., Glover, K.: Model reduction of multidimensional and uncertain systems. IEEE Trans. Automatic Contr., Vol. 41 (1996) 1466-1477
6. Wood, G.D., Goddard, P.J., Glover, K.: Approximation of linear parameter varying systems. Proceedings of the 35th CDC, Kobe, Japan, Dec. (1996) 406-411
7. Wu, F.: Induced L2 norm model reduction of polytopic uncertain linear systems. Automatica, Vol. 32, No. 10 (1996) 1417-1426
8. Haddad, W.M., Kapila, V.: Robust, reduced order modeling for state space systems via parameter dependent bounding functions. Proceedings of American Control Conference, Seattle, Washington, June (1996) 4010-4014
9. Tanaka, K., Ikeda, T., Wang, H.O.: Robust stabilization of a class of uncertain nonlinear systems via fuzzy control: Quadratic stabilizability, control theory, and linear matrix inequalities. IEEE Trans. Fuzzy Systems, Vol. 4, No. 1, Feb. (1996) 1-13
10. Nguang, S.K., Shi, P.: Fuzzy output feedback control design for nonlinear systems: an LMI approach. IEEE Trans. Fuzzy Systems, Vol. 11, No. 3, June (2003) 331-340
11. Tuan, H.D., Apkarian, P., Narikiyo, T., Yamamoto, Y.: Parameterized linear matrix inequality techniques in fuzzy control system design. IEEE Trans. Fuzzy Systems, Vol. 9, No. 2, April (2001) 324-332
12. Yakubovich, V.A.: Frequency conditions of absolute stability of control systems with many nonlinearities. Automatica Telemekhanica, Vol. 28 (1967) 5-30
An Integrated Navigation System of NGIMU/GPS Using a Fuzzy Logic Adaptive Kalman Filter

Mingli Ding and Qi Wang
Dept. of Automatic Test and Control, Harbin Institute of Technology, 150001 Harbin, China
[email protected]

Abstract. The non-gyro inertial measurement unit (NGIMU) uses only accelerometers, replacing gyroscopes, to compute the motion of a moving body. In an NGIMU system, an inevitable accumulation error of the navigation parameters is produced by the dynamic noise of the accelerometer output. When an integrated navigation system based on a proposed nine-accelerometer NGIMU and a single-antenna Global Positioning System (GPS) is designed with the conventional Kalman filter (CKF), the filtering results diverge because of the complexity of the system measurement noise. A fuzzy logic adaptive Kalman filter (FLAKF) is therefore applied in the design of the NGIMU/GPS. The FLAKF optimizes the CKF by detecting the bias in the measurement and prevents the divergence of the CKF. A simulation case for estimating the position and the velocity is investigated with this approach. The results verify the feasibility of the FLAKF.
1 Introduction

Most current inertial measurement units (IMU) use linear accelerometers and gyroscopes to sense the linear acceleration and angular rate of a moving body respectively. In a non-gyro inertial measurement unit (NGIMU) [1-6], accelerometers are not only used to acquire the linear acceleration, but also replace gyroscopes to compute the angular rate according to their positions in three-dimensional space. The NGIMU has the advantages of resistance to high-g shock, low power consumption, small volume and low cost. It can be applied in specific settings such as tactical missiles, intelligent bombs and so on. However, because of the dynamic noise of the accelerometer output, the system error inevitably increases quickly with time when the accelerometer output is integrated. The best way to solve this problem is to use an integrated navigation system. An NGIMU/GPS integrated navigation system can exploit the strengths of both components and overcome their shortcomings to realize real-time, high-precision positioning in highly dynamic and strongly electrically disturbed circumstances. But when the conventional Kalman filter (CKF) is used in the NGIMU/GPS, the filtering results often diverge because the statistical characteristics of the dynamic noise of the accelerometer output and of the system measurement noise are uncertain. So, in order to ascertain the statistical characteristics of the noises mentioned above and alleviate the accumulation error, a new fuzzy logic adaptive Kalman filter (FLAKF) [7] is proposed in designing an NGIMU/GPS integrated navigation system.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 812-821, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Accelerometer Output Equation

As is well known, the precession of gyroscopes can be used to measure the angular rate. Based on this principle, an IMU measures the angular rate of a moving body. The angle value can be obtained by integrating the angular rate with given initial conditions. With this angle value and the linear acceleration values in three directions, the current posture of the moving body can be estimated. The angular rate in a certain direction can be calculated by using the linear acceleration between two points. To obtain the linear and angular motion parameters of a moving body in three-dimensional space, the accelerometers need to be appropriately distributed on the moving body and the accelerometer outputs need to be analyzed. An inertial frame and a rotating moving body frame are shown in Fig. 1, where b represents the moving body frame and I the inertial frame.

Fig. 1. Geometry of body frame (b) and inertial frame (I)

The acceleration of point M is given by
$$a = \ddot{R}_I + \ddot{r}_b + \dot{\omega} \times r + 2\omega \times \dot{r}_b + \omega \times (\omega \times r) , \tag{1}$$
where $\ddot{r}_b$ is the acceleration of point M relative to the body frame, $\ddot{R}_I$ is the inertial acceleration of $O_b$ relative to $O_I$, $2\omega \times \dot{r}_b$ is known as the Coriolis acceleration, $\omega \times (\omega \times r)$ represents the centripetal acceleration, and $\dot{\omega} \times r$ is the tangential acceleration owing to the angular acceleration of the rotating frame. If M is fixed in the b frame, the terms $\dot{r}_b$ and $\ddot{r}_b$ vanish, and Eq. (1) can be rewritten as
$$a = \ddot{R}_I + \dot{\omega} \times r + \omega \times (\omega \times r) . \tag{2}$$
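Thus the accelerometers rigidly mounted at locations $r_i$ on the body with sensing directions $\theta_i$ produce $A_i$ as outputs:
$$A_i = \left[ \ddot{R}_I + \dot{\Omega} r_i + \Omega \Omega r_i \right] \cdot \theta_i \qquad (i = 1, 2, \ldots, N) , \tag{3}$$
where
$$\Omega = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix} , \qquad \ddot{R}_I = \begin{bmatrix} \ddot{R}_{Ix} \\ \ddot{R}_{Iy} \\ \ddot{R}_{Iz} \end{bmatrix} . \tag{4}$$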
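The output equation (3) with the skew-symmetric matrix of (4) can be evaluated directly. The following is a minimal sketch for a single ideal, noise-free accelerometer; the function names are hypothetical.

```python
import numpy as np


def skew(w):
    """Skew-symmetric matrix of (4), so that skew(w) @ r equals w x r."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])


def accelerometer_output(R_ddot, w, w_dot, r_i, theta_i):
    """Equation (3): projection of the point acceleration onto theta_i."""
    Om, Om_dot = skew(w), skew(w_dot)
    return float((R_ddot + Om_dot @ r_i + Om @ Om @ r_i) @ theta_i)
```

As a check, a pure rotation about the z axis at 2 rad/s seen by a sensor at $r = (1, 0, 0)$ pointing along x yields the centripetal value $-\omega_z^2 = -4$.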
3 Nine-Accelerometer Configuration

In this research, a new nine-accelerometer configuration of the NGIMU is proposed. The locations and the sensing directions of the nine accelerometers in the body frame are shown in Fig. 2. Each arrow in Fig. 2 points in the sensing direction of the corresponding accelerometer.

Fig. 2. Nine-accelerometer configuration of NGIMU (accelerometers A1 to A9 placed around the origin O of the X, Y, Z body axes)

The locations and sensing directions of the nine accelerometers are
$$[\, r_1, \ldots, r_9 \,] = l \begin{bmatrix} 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 1 \\ 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \end{bmatrix} , \tag{5}$$
where $l$ is the distance between each accelerometer and the origin of the body frame, and
$$[\, \theta_1, \ldots, \theta_9 \,] = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix} . \tag{6}$$
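It is easy to obtain
$$[\, r_1 \times \theta_1, \ldots, r_9 \times \theta_9 \,] = l \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 \\ -1 & 1 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} . \tag{7}$$
With Eq. (3), we get the accelerometer output equation
$$\begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \\ A_5 \\ A_6 \\ A_7 \\ A_8 \\ A_9 \end{bmatrix} = \begin{bmatrix} 0 & 0 & -l & 1 & 0 & 0 \\ 0 & 0 & l & 1 & 0 & 0 \\ 0 & 0 & l & 0 & 1 & 0 \\ 0 & 0 & -l & 0 & 1 & 0 \\ l & 0 & 0 & 0 & 0 & 1 \\ -l & 0 & 0 & 0 & 0 & 1 \\ 0 & l & 0 & 1 & 0 & 0 \\ -l & 0 & 0 & 0 & 1 & 0 \\ 0 & -l & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \dot{\omega}_x \\ \dot{\omega}_y \\ \dot{\omega}_z \\ \ddot{R}_{Ix} \\ \ddot{R}_{Iy} \\ \ddot{R}_{Iz} \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & l \\ 0 & 0 & 0 & 0 & 0 & -l \\ 0 & 0 & 0 & 0 & 0 & l \\ 0 & 0 & 0 & 0 & 0 & -l \\ 0 & 0 & 0 & l & 0 & 0 \\ 0 & 0 & 0 & -l & 0 & 0 \\ 0 & 0 & 0 & 0 & l & 0 \\ 0 & 0 & 0 & l & 0 & 0 \\ 0 & 0 & 0 & 0 & l & 0 \end{bmatrix} \begin{bmatrix} \omega_x^2 \\ \omega_y^2 \\ \omega_z^2 \\ \omega_y \omega_z \\ \omega_x \omega_z \\ \omega_x \omega_y \end{bmatrix} . \tag{8}$$
With Eq. (8), the linear expressions are
$$\dot{\omega}_x = \frac{1}{4l} \left( A_3 + A_4 + A_5 - A_6 - 2A_8 \right) , \tag{9a}$$
$$\dot{\omega}_y = \frac{1}{4l} \left( -A_1 - A_2 + A_5 + A_6 + 2A_7 - 2A_9 \right) , \tag{9b}$$
$$\dot{\omega}_z = \frac{1}{4l} \left( -A_1 + A_2 + A_3 - A_4 \right) , \tag{9c}$$
$$\ddot{R}_{Ix} = \frac{1}{2} (A_1 + A_2) , \quad \ddot{R}_{Iy} = \frac{1}{2} (A_3 + A_4) , \quad \ddot{R}_{Iz} = \frac{1}{2} (A_5 + A_6) . \tag{9d}$$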
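The linear combinations (9a) to (9d) can be checked numerically against the configuration (5) and (6): simulate the nine ideal outputs from Eq. (3), then recover the angular and linear accelerations. The sketch below assumes noise-free sensors; the arm length `L_ARM` and the function names are hypothetical.

```python
import numpy as np

L_ARM = 0.1  # hypothetical distance l from the origin, in metres

# Columns are r_1..r_9 from (5) and theta_1..theta_9 from (6).
R_POS = L_ARM * np.array([[0, 0, 1, -1, 0, 0, 0, 0, 1],
                          [1, -1, 0, 0, 1, -1, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0, 1, 1, 0]], dtype=float)
THETA = np.array([[1, 1, 0, 0, 0, 0, 1, 0, 0],
                  [0, 0, 1, 1, 0, 0, 0, 1, 0],
                  [0, 0, 0, 0, 1, 1, 0, 0, 1]], dtype=float)


def simulate_outputs(R_ddot, w, w_dot):
    """Ideal outputs A_1..A_9 from Eq. (3), using cross products for Omega."""
    tangential = np.cross(w_dot, R_POS.T)            # rows: w_dot x r_i
    centripetal = np.cross(w, np.cross(w, R_POS.T))  # rows: w x (w x r_i)
    accel = R_ddot[None, :] + tangential + centripetal
    return np.sum(accel * THETA.T, axis=1)


def recover(A, l=L_ARM):
    """Angular and linear acceleration from Eqs. (9a)-(9d)."""
    A1, A2, A3, A4, A5, A6, A7, A8, A9 = A
    w_dot = np.array([A3 + A4 + A5 - A6 - 2 * A8,
                      -A1 - A2 + A5 + A6 + 2 * A7 - 2 * A9,
                      -A1 + A2 + A3 - A4]) / (4 * l)
    R_ddot = 0.5 * np.array([A1 + A2, A3 + A4, A5 + A6])
    return w_dot, R_ddot
```

In this ideal setting the quadratic terms in $\omega$ cancel inside each combination, which is why the angular acceleration is recovered without knowing the angular rate itself.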
4 Conventional Kalman Filter (CKF)

In Eq. (9), the linear acceleration and the angular acceleration are all expressed as linear combinations of the accelerometer outputs. The conventional algorithm computes the navigation parameters by integrating the equations in Eq. (9) once or twice with respect to time. But a numerical solution for the navigation parameters depends on the values calculated at previous time steps, so if the accelerometer output has a dynamic error, the error of the navigation parameters will inevitably increase rapidly with $t$ and $t^2$. An NGIMU/GPS integrated navigation system is therefore needed. In this section, the CKF is used in the system.
In order to analyze the problem in focus, we ignore the disturbance error contributed to the accelerometers by the difference of the accelerometers' sensing directions in three-dimensional space. Define the state vector $X(t)$ for the motion as
$$X(t) = [\, S_e(t) \;\; S_N(t) \;\; V_e(t) \;\; V_N(t) \;\; \omega(t) \,]^T , \tag{10}$$
where $S_e(t)$ is the estimated eastern position of the moving body at time $t$ with respect to the earth frame (taken as the inertial frame), $S_N(t)$ is the estimated northern position, $V_e(t)$ is the estimated eastern velocity, $V_N(t)$ is the estimated northern velocity and $\omega(t)$ is the estimated angular rate about the x axis. Considering the relationship between the parameters, the state equations are then
$$\dot{S}_e = V_e , \quad \dot{S}_N = V_N , \quad \dot{V}_e = a_e , \quad \dot{V}_N = a_N , \quad \dot{\omega} = \dot{\omega}_x , \tag{11}$$
where
$$a_e = T_{11} \ddot{R}_{Ix} + T_{21} \ddot{R}_{Iy} + T_{31} \ddot{R}_{Iz} , \qquad a_N = T_{12} \ddot{R}_{Ix} + T_{22} \ddot{R}_{Iy} + T_{32} \ddot{R}_{Iz} . \tag{12}$$
In Eq. (12), $T_{11}$, $T_{21}$, $T_{31}$, $T_{12}$, $T_{22}$ and $T_{32}$ are components of the coordinate transform matrix
$$T = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix} .$$
We also obtain
$$\dot{\omega}_x = \frac{1}{4l} \left( A_3 + A_4 + A_5 - A_6 - 2A_8 \right) . \tag{13}$$
The system state equation and system measurement equation in matrix form become
$$\dot{X} = \Psi X + G u + \Gamma W , \tag{14}$$
and
$$Z = H X + \varepsilon . \tag{15}$$
In the system measurement equation Eq. (15), the measurement vector $Z$ is the output of the GPS receiver (position and velocity). In Eq. (14) and Eq. (15), $\varepsilon$ and $W$ denote the measurement noise matrix and the dynamic noise matrix respectively. The preceding results are expressed in continuous form. The state and measurement equations for discrete time may be deduced by setting $t = kT$, where $k = 1, 2, \ldots$ and $T$ denotes the sampling period. Straightforward application of the discrete-time Kalman filter to (14) and (15) yields the CKF algorithm as outlined below. $\hat{X}_{0|0}$ is the initial estimate of the CKF state vector. $P_{0|0}$ is the initial estimate of the CKF state vector error covariance matrix.
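For orientation, one predict/update cycle of the textbook discrete-time Kalman filter is sketched below. This is the standard form and is only assumed to correspond to the CKF used here: the discretized transition matrix `Phi`, the omission of the control term $Gu$, and the function name are assumptions, not details taken from the paper.

```python
import numpy as np


def ckf_step(x, P, z, Phi, H, Q, R):
    """One cycle of a standard discrete Kalman filter (control input omitted).

    Returns the updated state, its covariance, and the residual z - H x_pred.
    """
    x_pred = Phi @ x                          # state prediction
    P_pred = Phi @ P @ Phi.T + Q              # covariance prediction
    residual = z - H @ x_pred                 # innovation
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ residual
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, residual
```

The residual returned here is the quantity that an adaptive scheme would monitor in order to rescale `Q` and `R` between steps.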
5 Fuzzy Logic Adaptive Kalman Filter (FLAKF)

The CKF algorithm described in the previous section requires that the dynamic noise and the system measurement noise processes are exactly known and that both are zero-mean white noise. In practice, the statistical characteristics of the noises are uncertain. In the GPS measurement equation Eq. (15), the residual ionospheric delay left in the measurement noise $\varepsilon$ after correction by the ionosphere model is not zero-mean white noise. Furthermore, the value of the dynamic error of the accelerometer output cannot be exactly obtained in kinematic NGIMU/GPS positioning. These problems cause errors in the calculation of $K_k$ and make the filtering process diverge. To solve the divergence due to modeling error, a fuzzy logic adaptive Kalman filter (FLAKF) is proposed in this paper to adjust the exponential weighting of a weighted CKF and prevent the Kalman filter from diverging. The fuzzy logic adaptive Kalman filter continually adjusts the noise strengths in the filter's internal model and tunes the filter as well as possible. The structure of the NGIMU/GPS using the FLAKF is shown in Fig. 3.

Fig. 3. Structure of NGIMU/GPS using FLAKF (blocks: NGIMU system, GPS system, EKF, FLAKF; signals: pseudo-range, residual error, output)

Fuzzy logic is a knowledge-based system operating on linguistic variables. The advantages of fuzzy logic over more traditional adaptation schemes are the simplicity of the approach and the use of knowledge about the controlled system. In this paper, the FLAKF is used to detect the bias in the measurement and prevent divergence of the CKF. Let us assume the model covariance matrices to be
$$\begin{cases} R_k = \alpha R_v , \\ Q_k = \beta Q_v , \end{cases} \tag{16}$$
where $\alpha$ and $\beta$ are time-varying adjustment ratios. The values of $\alpha$ and $\beta$ are acquired from the outputs of the FLAKF. Let us define
$$\delta_k = Z_k - H \hat{X}_{k|k-1}$$
as the residual error, which reflects the degree to which the NGIMU/GPS model fits the data. According to the characteristics of the residual error of the CKF, the covariance matrices of the dynamic noise of the accelerometer output and of the measurement noise can be adjusted self-adaptively using $\alpha$ and $\beta$. If $\alpha = \beta = 1$, we obtain a regular CKF. A good way to verify whether the Kalman filter is performing as designed is to monitor the residual error, which can also be used to adapt the filter. In fact, the residual error $\delta$ is the difference between the actual observations and the measurement predictions based on the filter model. If a filter is performing optimally, the residual error is a zero-mean white noise process. The covariance of the residual error $P_r$ is related to $Q_v$ and $R_v$, and is given by
$$P_r = H \left( \Psi P_{k-1} \Psi^T + Q_v \right) H^T + R_v . \tag{17}$$
Following traditional fuzzy logic systems, a Takagi-Sugeno fuzzy logic system is used to detect the divergence of the CKF and adapt the filter. According to the variance and the mean of the residual error, two fuzzy rule groups are built. To improve the performance of the filter, the two groups calculate the proper $\alpha$ and $\beta$ respectively and readjust the covariance matrix $P_r$ of the filter. The covariance and the mean value of the residual error are used as inputs to the FLAKF in order to detect the degree of divergence. By choosing $n$ to provide statistical smoothing, the mean and the covariance of the residual error are
$$\bar{\delta} = \frac{1}{n} \sum_{j=t-n}^{t} \delta_j , \tag{18}$$
$$\hat{P}_r = \frac{1}{n} \sum_{j=t-n+1}^{t} \delta_j \delta_j^T . \tag{19}$$
The estimated value $\hat{P}_r$ can be compared with its theoretical value $P_r$ calculated from the CKF. Generally, when the covariance $\hat{P}_r$ becomes larger than the theoretical value $P_r$ and the mean value $\bar{\delta}$ moves away from zero, the Kalman filter is becoming unstable. In this case, a large value of $\beta$ is applied. A large $\beta$ means that process noise is added, so that more credibility is given to the recent data. This ensures that all states in the model are sufficiently excited by the process noise. Generally, $R_v$ has more impact on the covariance of the residual error. When the covariance is extremely large and the mean differs greatly from zero, there are presumably problems with the GPS measurements. The filter can then no longer depend on these measurements, and a smaller $\alpha$ is used. By selecting appropriate $\alpha$ and $\beta$, the fuzzy logic controller adapts the Kalman filter and tries to keep the innovation sequence acting as zero-mean white noise. The membership functions of the covariance and the mean value of the residual error are also built.
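The residual statistics (18) and (19) and the theoretical covariance (17) can be computed as sketched below. Two simplifications are assumed: the same window of $n$ residuals is used for both the mean and the covariance, and the trace ratio at the end is only a hypothetical scalar indicator of divergence, not the fuzzy rule base of the paper.

```python
import numpy as np


def residual_statistics(residuals):
    """Windowed mean and covariance of the last n residuals, cf. (18)-(19).

    residuals: array-like of shape (n, m), one residual vector per row.
    """
    d = np.asarray(residuals, dtype=float)
    n = d.shape[0]
    mean = d.sum(axis=0) / n
    cov = d.T @ d / n          # (1/n) * sum_j delta_j delta_j^T
    return mean, cov


def theoretical_residual_covariance(H, Psi, P_prev, Q_v, R_v):
    """Equation (17): P_r = H (Psi P_{k-1} Psi^T + Q_v) H^T + R_v."""
    return H @ (Psi @ P_prev @ Psi.T + Q_v) @ H.T + R_v


def divergence_ratio(cov_est, cov_theory):
    """Hypothetical indicator: values well above 1 suggest divergence."""
    return np.trace(cov_est) / np.trace(cov_theory)
```

A ratio near one together with a mean near zero matches the behaviour described for an optimally performing filter; how the two quantities are mapped to $\alpha$ and $\beta$ is left to the fuzzy rules.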
6 Simulations and Results

The simulations of the NGIMU/GPS using the CKF and the FLAKF are performed in this section. Fig. 4, Fig. 5, Fig. 6 and Fig. 7 illustrate the eastern position, northern position, eastern velocity and northern velocity estimating errors respectively. In this simulation, the GPS receiver used is the Jupiter of Rockwell Co. The initial conditions in position, velocity, posture angle and angular rate are $x(0) = 0$ m, $y(0) = 0$ m, $z(0) = 0$ m, $v_x(0) = 0$ m/s, $v_y(0) = 0$ m/s, $v_z(0) = 0$ m/s, $\alpha_x = 0$ rad, $\alpha_y = 0$ rad, $\alpha_z = \pi/3$ rad, $\omega_x(0) = 0$ rad/s, $\omega_y(0) = 0$ rad/s and $\omega_z(0) = 0$ rad/s respectively. The accelerometer static bias is $10^{-5} g$ and the swing of the posture angle is 0.2 rad. Moreover, when using the CKF, $W$ and $\varepsilon$ are assumed to be Gaussian, with covariances $Q_v = (0.01) I_{9 \times 9}$ and $R_v = (0.01) I_{5 \times 5}$ respectively, and $P_{0|0} = (0.01) I_{5 \times 5}$. The simulation time is 100 s and the sampling period is 10 ms.

Comparing the curves in Fig. 4 and Fig. 5, it is obvious that the eastern and northern position estimating errors of the NGIMU/GPS level off after some time with both filtering approaches, and that the errors obtained with the FLAKF are smaller than those obtained with the CKF. In Fig. 4, the error at 100 s drops from 160 m to 50 m when the FLAKF is used. In Fig. 5, it drops from 220 m to 100 m. Similar results are obtained for the velocity estimation in Fig. 6 and Fig. 7. The curves indicate that the NGIMU/GPS with the FLAKF can effectively alleviate the error accumulation in the estimation of the navigation parameters. When a designer lacks sufficient information to develop complete models, or when the parameters change slowly with time, the FLAKF can be used to adjust the performance of the CKF on-line.
Fig. 4. The estimating error of the eastern position
Fig. 5. The estimating error of the northern position
Fig. 6. The estimating error of the eastern velocity
Fig. 7. The estimating error of the northern velocity
7 Conclusions

Due to the dynamic noise of the accelerometer output, the navigation parameter estimation error inevitably increases quickly with time when the accelerometer output is integrated. Using the FLAKF to design an NGIMU/GPS based on a nine-accelerometer NGIMU can overcome the uncertainty of the statistical characteristics of the noises and slow down the accumulation of errors. By monitoring the innovation sequence, the FLAKF can evaluate the performance of a CKF. If the filter does not perform well, it applies two appropriate weighting factors $\alpha$ and $\beta$ to improve the accuracy of the CKF. The FLAKF has only 9 rules, so little computational time is needed. It can be used to navigate and guide autonomous vehicles or robots and achieves relatively accurate performance. The FLAKF can also use a lower-order state model without compromising accuracy significantly. In other words, for any given accuracy, the FLAKF may be able to use a lower-order state model.
References
1. L.D. DiNapoli: The Measurement of Angular Velocities without the Use of Gyros. The Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia (1965) 34-41
2. Alfred R. Schuler: Measuring Rotational Motion with Linear Accelerometers. IEEE Trans. on AES. Vol. 3, No. 3 (1967) 465-472
3. Shmuel J. Merhav: A Nongyroscopic Inertial Measurement Unit. J. Guidance. Vol. 5, No. 3 (1982) 227-235
4. Chin-Woo Tan, Sungsu Park: Design of gyroscope-free navigation systems. Intelligent Transportation Systems, 2001 Proceedings. Oakland (2001) 286-291
5. Sou-Chen Lee, Yu-Chao Huang: Innovative estimation method with measurement likelihood for all-accelerometer type inertial navigation system. IEEE Trans. on AES. Vol. 38, No. 1 (2002) 339-346
6. Wang Qi, Ding Mingli and Zhao Peng: A New Scheme of Non-gyro Inertial Measurement Unit for Estimating Angular Velocity. The 29th Annual Conference of the IEEE Industry Electronics Society (IECON’2003). Virginia (2003) 1564-1567
7. J. Z. Sasiadek, Q. Wang, M. B. Zeremba: Fuzzy Adaptive Kalman Filtering For INS/GPS Data Fusion. Proceedings of the 15th IEEE International Symposium on Intelligent Control, Rio, Patras, GREECE (2000) 181-186
Method of Fuzzy-PID Control on Vehicle Longitudinal Dynamics System
Yinong Li1, Zheng Ling1, Yang Liu1, and Yanjuan Qiao2
1 State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400044, China
[email protected]
2 Changan Automobile CO.LTD Technology Center, Chongqing 400044, China
Abstract. Based on an analysis of the vehicle dynamics control system, the longitudinal tracking control of a two-vehicle platoon is discussed. A second-order model for the longitudinal relative distance control between vehicles is presented, and the logic switch rule between acceleration and deceleration of the controlled vehicle is designed. Then the parameter auto-adjusting fuzzy-PID control method is used: by adjusting the three PID parameters, it controls the range of the errors in the longitudinal relative distance and relative velocity between the controlled vehicle and the leading vehicle, in order to realize the longitudinal control of a vehicle. The simulation results show that, compared with fuzzy control, the parameter auto-adjusting fuzzy-PID control method decreases the overshoot, enhances the anti-disturbance capacity, and has a certain robustness. The conflict between rapid response and small overshoot is resolved.
1 Introduction
With the rapid development and increasing ownership of vehicles in recent years, traffic jams and accidents have become a global problem. Against the background of increasingly deteriorating traffic conditions, many countries have embarked on research into the Intelligent Transport System (ITS) in order to utilize roads and vehicles most efficiently, increase vehicle safety, and reduce pollution and congestion. In the Intelligent Transport System, automatic driving, which is an important component of transport automation and intelligence, has become a popular and important topic in vehicle research [1-2]. As a main part of vehicle active safety, the vehicle longitudinal dynamic control system is composed of the super layer system and the under layer system [3]. The super layer realizes the automatic tracking of the longitudinal spacing of the vehicle platoon. The under layer transfers the output of the super layer system to the controlled vehicle system and computes the expected acceleration/deceleration. The longitudinal dynamic control system is the combination of the super layer system and the under layer system. It is a complicated nonlinear system, so its actual characteristics are hard to describe exactly by a linear system [4]. For the vehicle longitudinal dynamic control system, intelligent control theory is used to fit the actual system more exactly, to realize real-time control, and to respond more quickly. In this paper, the parameter auto-adjusting fuzzy-PID control method is adopted to control the established vehicle longitudinal dynamic system, and its real-time performance and validity are studied in order to realize and improve the control performance of the vehicle longitudinal dynamic system.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 822 – 832, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Longitudinal Relative Distance Control Model
2.1 Longitudinal Relative Distance Control Between Two Vehicles
The control quality of the longitudinal relative distance between two vehicles is an important index for evaluating the active safety of the vehicle. The control parameter of this system is the longitudinal relative distance between the leader and the follower. Considering the dynamic response characteristic of the system, we take the change rate of the longitudinal relative distance between the two vehicles as another parameter of this control system to improve the accuracy of the model. The function of the throttle/brake switch logic control system is to determine the switch between the engine throttle and the brake master cylinder, and to transmit the expected acceleration or deceleration to the under layer of the longitudinal dynamic control system [5]. The logic control for the throttle/brake switch is shown in Fig. 1.
Fig. 1. The logic switch between throttle / brake (blocks: the switch logic; control rule one (engine throttle) and the engine; control rule two (brake master cylinder) and the brake master cylinder; the controlled vehicle)
The one-dimensional control model of the longitudinal relative distance error is described as follows:
$\delta_d' = (x_h - x) - L - H$    (1)
where H is the expected longitudinal relative distance between the two vehicles; L is the length of the vehicle body; xh and x are the longitudinal positions of the back bumpers of the leading vehicle and the controlled vehicle, respectively; δd′ is the error of the longitudinal relative distance. This one-dimensional control model has a simple structure and a clear physical meaning, and it requires little information, so the control purpose is easy to realize. However, under actual longitudinal running conditions, the error of the longitudinal relative distance between the two vehicles is also related to the change rate of the longitudinal displacement (i.e. the running speed) of the controlled vehicle.
Considering the influence of the controlled vehicle’s running speed on the accuracy of the model, some researchers proposed the two-dimensional model for the longitudinal relative distance control [6]:
$\delta_d = (x_h - x) - L - H - \lambda \cdot v = \delta_d' - \lambda \cdot v$    (2)
where δd is the error of the longitudinal relative distance between the two vehicles; λ is the compensation time for the controlled vehicle to converge to δd′; v is the running speed of the controlled vehicle.
2.2 Longitudinal Relative Speed Control
The error of the longitudinal relative speed is expressed by:
$\delta_v = v_h - v$    (3)
where δv is the error of the longitudinal relative speed; vh is the speed of the leading vehicle.
2.3 Second-Order Model for the Longitudinal Relative Distance Control
Combining the two-dimensional control model of the longitudinal relative distance with the control model of the relative speed between the two vehicles, a second-order model for the longitudinal relative distance control can be established (rewriting equation (2) and equation (3)), and the state space equations of this model can be written as follows:
$\begin{cases} \delta_d = (x_h - x) - L - H - \lambda \cdot v \\ \delta_v = v_h - v \end{cases}$    (4)
$\dot{X} = A \cdot X + B \cdot u + \Gamma \cdot w = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \cdot X + \begin{bmatrix} -\lambda \\ -1 \end{bmatrix} \cdot u + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \cdot w$    (5)
where X is the state vector of the control system, $X^T = [x_1 \; x_2] = [\delta_d \; \delta_v]$; u is the control variable of the control system (acceleration or deceleration of the controlled vehicle); w is the disturbance variable of the control system (acceleration or deceleration of the leading vehicle). This second-order control model includes information about the longitudinal displacement, speed, acceleration and deceleration of the leading vehicle and the controlled vehicle. This information reflects the real-time performance, dynamic response and dynamic characteristics of the control system for automatic vehicle-to-vehicle tracking. The control system has two inputs and a single output: the input variables are the relative distance error and the relative speed error, and the output is the expected acceleration or deceleration of the controlled vehicle.
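The behaviour of the second-order model in equation (5) can be illustrated with a short numerical sketch. The forward-Euler integration, the step size, and the simple proportional feedback `simple_controller` (with gains `k_d` and `k_v`) are assumptions made only for this illustration; they are not the controller designed in this paper.

```python
def step(state, u, w, lam, dt):
    """One forward-Euler step of X' = A X + B u + Gamma w from Eq. (5)."""
    delta_d, delta_v = state
    d_delta_d = delta_v - lam * u   # row 1: [0 1] X + (-lambda) u
    d_delta_v = -u + w              # row 2: [0 0] X + (-1) u + (1) w
    return (delta_d + dt * d_delta_d, delta_v + dt * d_delta_v)


def simple_controller(state, k_d=0.5, k_v=1.0):
    """Hypothetical linear feedback, used here only to close the loop."""
    delta_d, delta_v = state
    return k_d * delta_d + k_v * delta_v


def simulate(state, controller, lead_accel, lam=2.0, dt=0.01, steps=3000):
    """Integrate the model; lead_accel(t) is the disturbance w."""
    for k in range(steps):
        u = controller(state)
        state = step(state, u, lead_accel(k * dt), lam, dt)
    return state


if __name__ == "__main__":
    # Start 5 m away from the desired spacing, leading vehicle at constant speed.
    print(simulate((5.0, 0.0), simple_controller, lambda t: 0.0))
```

With these assumed gains the closed loop is stable, so both errors decay toward zero when the leading vehicle does not accelerate.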
3 Throttle/Brake Switch Logic Control Model
The throttle/brake switch logic control model is an important component for realizing the super layer control of the vehicle longitudinal dynamics. The expected acceleration / deceleration of the controlled vehicle, which is the output of the second-order longitudinal relative distance controller, is transmitted to the switch logic control system. According to the switch logic rule, we can determine whether the throttle control or the brake control should be executed, and then the acceleration or deceleration is transformed into the engine throttle angle or the brake pressure of the master cylinder. The longitudinal dynamic model of the controlled vehicle can be expressed as:
$\tau_d - \tau_b - r F_{rr} - r c (v - v_b)^2 = \varepsilon a$    (6)
where τd is the driving torque transmitted to the rear wheel by the driving system; τb is the sum of the torques imposed on the rear wheel and the front wheel; r is the wheel radius; c is the coefficient of wind resistance; Frr is the rolling resistance force; f is the coefficient of rolling resistance; m is the vehicle mass; v is the running speed of the vehicle; vb is the wind speed; a is the acceleration of the vehicle;
$\varepsilon = \frac{1}{r}\left(\frac{J_e}{i^2} + m r^2 + J_{wr} + J_{wf}\right)$
$i = 1 / (i_{gear}\, i_i\, \eta_T)$
igear is the speed ratio of the automatic transmission; ii is the main transmission ratio of the driving system; ηT is the mechanical efficiency of the transmission system; Jwr and Jwf are the moments of inertia of the rear wheel and the front wheel, respectively. When the throttle angle is zero, the minimum acceleration of the controlled vehicle is:
$a_{min} = \frac{1}{\varepsilon}\left[\frac{T_{em}}{i} - r F_{rr} - r c (v - v_b)^2\right]$    (7)
where Tem is the minimum output torque as the engine throttle is totally closed. The switch logic rule between the engine throttle and the brake master cylinder can be deduced by the equation of the minimum acceleration.
Control rule for the engine throttle:
$a_{synth} \ge a_{min}$    (8)
Control rule for the brake master cylinder:
$a_{synth} < a_{min}$    (9)
where asynth is the expected acceleration / deceleration of the controlled vehicle. A hard switch between the engine throttle and the brake master cylinder may cause the control system to oscillate, so it is necessary to introduce a cushion layer near the switch surface to avoid violent oscillation of the system and achieve good control. The optimized switch logic rules are:
Control rule for the engine throttle:
$a_{synth} - a_{min} \ge s$    (10)
Control rule for the brake master cylinder:
$a_{synth} - a_{min} < s$    (11)
where s is the thickness of the cushion layer; here s = 0.05 m/s². According to the switch rule between the engine throttle and the brake master cylinder, we can determine whether the engine throttle control or the brake master cylinder control will be executed, and then calculate the expected throttle angle or brake master cylinder pushrod force. For the engine control, the expected longitudinal driving torque can be written as:
$\tau_d = \varepsilon a_{synth} + r F_{rr} + r c (v - v_b)^2$    (12)
For the brake master cylinder control, the engine throttle is totally closed, and the expected longitudinal brake torque can be written as:
$\tau_b = -\varepsilon a_{synth} + \frac{T_{em}}{i} - r F_{rr} - r c (v - v_b)^2$    (13)
From equation (7), it can be seen that the minimum acceleration is a function of the speed of the controlled vehicle. Substituting the simulation parameters of the vehicle into this equation, we can obtain the switch logic rule between the engine throttle and the brake master cylinder by simulation. The simulation result is shown in Fig. 2.
Fig. 2. Throttle/brake master cylinder switch logic law (vertical axis: the minimum acceleration / deceleration, m/s²; horizontal axis: velocity, km/h)
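The switch logic of equations (7) and (10) to (13) can be sketched in a few lines. All numerical values in `VehicleParams` are hypothetical placeholders chosen for illustration (the simulation parameters of the vehicle are not listed here), and the speed is assumed to be in m/s.

```python
from dataclasses import dataclass


@dataclass
class VehicleParams:
    # Hypothetical values for illustration only, not the paper's data.
    eps: float = 450.0    # epsilon, lumped inertia term of Eq. (6)
    i: float = 0.25       # overall ratio i
    r: float = 0.3        # wheel radius
    c: float = 0.4        # coefficient of wind resistance
    F_rr: float = 150.0   # rolling resistance force
    T_em: float = -20.0   # minimum engine torque, throttle closed
    v_b: float = 0.0      # wind speed


def resistance_torque(v, p):
    """r*F_rr + r*c*(v - v_b)^2, the resistance terms shared by Eqs. (6)-(13)."""
    return p.r * p.F_rr + p.r * p.c * (v - p.v_b) ** 2


def a_min(v, p):
    """Minimum acceleration with the throttle closed, Eq. (7)."""
    return (p.T_em / p.i - resistance_torque(v, p)) / p.eps


def switch_logic(a_synth, v, p, s=0.05):
    """Return ('throttle', tau_d) or ('brake', tau_b) per Eqs. (10)-(13)."""
    if a_synth - a_min(v, p) >= s:
        return "throttle", p.eps * a_synth + resistance_torque(v, p)
    return "brake", -p.eps * a_synth + p.T_em / p.i - resistance_torque(v, p)


if __name__ == "__main__":
    params = VehicleParams()
    print(switch_logic(0.5, 20.0, params))
    print(switch_logic(-2.0, 20.0, params))
```

In the brake branch, the returned torque satisfies equation (6) with the driving torque set to Tem/i, which is a quick consistency check on the reconstruction.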
4 Fuzzy PID Control of the Vehicle Longitudinal Dynamic System
4.1 Parameter Self-adjusting Fuzzy PID Control
Parameter self-adjusting fuzzy PID control is a compound fuzzy control. In order to satisfy the self-adjusting requirement of the PID parameters under different error e and different change rate of the error ec, the PID parameters are modified on line by fuzzy control rules [9]. Combining the accuracy of traditional PID control with the intelligence of fuzzy control gives the controlled object good dynamic and static characteristics. The basic idea of the parameter self-adjusting fuzzy PID control is as follows: first, we find the fuzzy relations between the three PID parameters and the error e and the change rate of the error ec, and evaluate e and ec continuously during operation; then the three parameters are revised on line according to the fuzzy control rules, in order to satisfy the different requirements on the control parameters under different e and ec and achieve the ideal control purpose. The computation is simple, so it is easy to implement on a single-chip processor. The structure of the parameter self-adjusting fuzzy PID controller is shown in Fig. 3.
Fig. 3. Parameter self-adjusting fuzzy PID controller (blocks: error e; d/dt producing ec; the fuzzy controller; PID parameter adjusting of P, I and D; control signal)
In the parameter self-adjusting fuzzy PID control, the self-adjusting rules for the parameters KP, KI and KD under different e and ec can be simply defined as follows:
When e is large, a larger KP and a smaller KD should be chosen (to accelerate the system response), and KI = 0 should be set in order to avoid a large overshoot, so that the integral effect is eliminated.
When e is medium, KP should be set smaller (in order to keep the overshoot of the system response relatively small), and KI and KD should be given proper values (the value of KD has a great effect on the system).
When e is small, KP and KI should be set larger so that the system has good static characteristics, and KD should be set properly in order to avoid oscillation near the equilibrium point.
The absolute value of the error, |e|fuzzy, and the absolute value of the change rate of the error, |ec|fuzzy, are taken as the input linguistic variables, where the subscript “fuzzy” denotes the fuzzy quantity of a parameter. |e|fuzzy can also be written as |E|. Each of the linguistic variables has three linguistic values, i.e. big (B), medium (M), and small (S). The membership functions are shown in Fig. 4.
Fig. 4. The membership functions of the fuzzy PID (one panel: uSE(|E|), uME(|E|), uBE(|E|) over |E| with turning points |E1|, |E2|, |E3|; other panel: uSC(|EC|), uMC(|EC|), uBC(|EC|) over |EC| with turning points |EC1|, |EC2|, |EC3|)
The membership functions can be adjusted by choosing different values of the turning points e1, e2, e3 and ec1, ec2, ec3.
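As a rough illustration of how three turning points can shape the small, medium and big membership functions, the sketch below uses piecewise-linear functions. The exact shapes in Fig. 4 are not recoverable from this copy, so the shoulder and triangle shapes used here are an assumption, as is the function name `memberships`.

```python
def memberships(x, p1, p2, p3):
    """Return (small, medium, big) membership degrees of |x|.

    Assumed shapes: 'small' is 1 up to p1 and falls to 0 at p2,
    'medium' is a triangle peaking at p2, and 'big' rises from p2
    and stays at 1 beyond p3. Requires p1 < p2 < p3.
    """
    x = abs(x)
    if x <= p1:
        return (1.0, 0.0, 0.0)
    if x <= p2:
        m = (x - p1) / (p2 - p1)
        return (1.0 - m, m, 0.0)
    if x <= p3:
        b = (x - p2) / (p3 - p2)
        return (0.0, 1.0 - b, b)
    return (0.0, 0.0, 1.0)


if __name__ == "__main__":
    # Moving the turning points changes where each label dominates.
    for value in (0.5, 1.5, 2.0, 2.5, 4.0):
        print(value, memberships(value, 1.0, 2.0, 3.0))
```

Shifting p1, p2 and p3 moves the regions in which each linguistic value dominates, which is the adjustment described above.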
4.2 The Process of the Parameter Self-adjusting Fuzzy PID Control
For the fuzzy input variables |e|fuzzy and |ec|fuzzy, there are five combination forms based on the rules designed above:
1) |e|fuzzy = B
2) |e|fuzzy = M, |ec|fuzzy = B
3) |e|fuzzy = M, |ec|fuzzy = M
4) |e|fuzzy = M, |ec|fuzzy = S
5) |e|fuzzy = S
The membership degree of each form can be computed by the following expressions:
$\mu_1(|e|_{fuzzy}, |ec|_{fuzzy}) = \mu_{BE}(|e|_{fuzzy})$
$\mu_2(|e|_{fuzzy}, |ec|_{fuzzy}) = \mu_{ME}(|e|_{fuzzy}) \wedge \mu_{BC}(|ec|_{fuzzy})$
$\mu_3(|e|_{fuzzy}, |ec|_{fuzzy}) = \mu_{ME}(|e|_{fuzzy}) \wedge \mu_{MC}(|ec|_{fuzzy})$
$\mu_4(|e|_{fuzzy}, |ec|_{fuzzy}) = \mu_{ME}(|e|_{fuzzy}) \wedge \mu_{SC}(|ec|_{fuzzy})$
$\mu_5(|e|_{fuzzy}, |ec|_{fuzzy}) = \mu_{SE}(|e|_{fuzzy})$
The three PID parameters can be computed by the following equations based on the measurement of e and ec.
$K_P = \left[\sum_{j=1}^{5} \mu_j(|e|_{fuzzy}, |ec|_{fuzzy}) \times K_{Pj}\right] / \left[\sum_{j=1}^{5} \mu_j(|e|_{fuzzy}, |ec|_{fuzzy})\right]$    (14)
$K_I = \left[\sum_{j=1}^{5} \mu_j(|e|_{fuzzy}, |ec|_{fuzzy}) \times K_{Ij}\right] / \left[\sum_{j=1}^{5} \mu_j(|e|_{fuzzy}, |ec|_{fuzzy})\right]$    (15)
$K_D = \left[\sum_{j=1}^{5} \mu_j(|e|_{fuzzy}, |ec|_{fuzzy}) \times K_{Dj}\right] / \left[\sum_{j=1}^{5} \mu_j(|e|_{fuzzy}, |ec|_{fuzzy})\right]$    (16)
where KPj, KIj, KDj (j = 1, 2, …, 5) are the weights of the parameters KP, KI, KD respectively under different states. They can be set as:
$K_{P1} = K'_{P1},\; K_{I1} = 0,\; K_{D1} = 0$
$K_{P2} = K'_{P2},\; K_{I2} = 0,\; K_{D2} = K'_{D2}$
$K_{P3} = K'_{P3},\; K_{I3} = 0,\; K_{D3} = K'_{D3}$
$K_{P4} = K'_{P4},\; K_{I4} = 0,\; K_{D4} = K'_{D4}$
$K_{P5} = K'_{P5},\; K_{I5} = K'_{I5},\; K_{D5} = K'_{D5}$
where K′P1 ~ K′P5, K′I1 ~ K′I5 and K′D1 ~ K′D5 are the values of the parameters KP, KI, KD under the different states, obtained with a normal PID parameter adjusting method. Using the online self-adjusting PID parameters KP, KI, KD, the output control variable u can be computed by the following discrete difference equation of the PID control algorithm:
$u_n = K_P E_n + K_I \sum E_n + K_D (E_n - E_{n-1})$    (17)
4.3 The Selection of Control Parameters and the Research of the Simulation
1) The Selection of Control Parameters
The fuzzy quantization factors of the inputs e and ec, and the defuzzification scale factors of the output variables P, I and D, are taken respectively as:
Ke = 7.2, Kec = 8, RP = 0.6, RI = 1.3, RD = 1.0
The values K′P1 ~ K′P5, K′I1 ~ K′I5, K′D1 ~ K′D5 obtained by the normal PID parameter-adjusting method under the different states are:
K′P1 = 1, K′I1 = 0, K′D1 = 0
K′P2 = 2, K′I2 = 0, K′D2 = 1.25
K′P3 = 3, K′I3 = 0, K′D3 = 2.5
K′P4 = 4, K′I4 = 0, K′D4 = 3.75
K′P5 = 5, K′I5 = 2, K′D5 = 5
2) Analysis of the Simulation
Combining the designed parameter self-adjusting fuzzy PID controller with the vehicle longitudinal dynamic system, the MATLAB/Simulink platform is used for the simulation. The structure of the control system is shown in Fig. 5. Fig. 6 ~ Fig. 10 are the simulation results of the parameter self-adjusting fuzzy PID control system of the vehicle longitudinal dynamic system. The simulation parameters are Ke = 7.2, Kec = 8, RP = 0.6, RI = 1.3, RD = 1.0, λ = 2, L + H = 7.5.
Fig. 5. Fuzzy PID control system of the vehicle longitudinal dynamics (blocks: information of the leading vehicle; the super layer of the vehicle longitudinal dynamics; the fuzzy controller with inputs e and ec = de/dt; PID adjusting; control signal u; the switch logic between the engine and the brake master cylinder; the under layer of the vehicle longitudinal dynamics; the output signal)
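The gain computation of equations (14) to (16) and the PID law of equation (17) can be sketched as follows, using the K′ values listed above. Several points are assumptions made only for illustration: the piecewise-linear membership shapes, the turning points (1, 2, 3), taking ∧ as the minimum, approximating ec by the sample-to-sample difference of e, starting from a zero previous error, and omitting the scale factors Ke, Kec, RP, RI, RD because their exact use is not spelled out here.

```python
# Weights for the five rule combinations, taken from the K' values above.
K_P = (1.0, 2.0, 3.0, 4.0, 5.0)
K_I = (0.0, 0.0, 0.0, 0.0, 2.0)
K_D = (0.0, 1.25, 2.5, 3.75, 5.0)


def memberships(x, p1, p2, p3):
    """(small, medium, big) degrees of |x|; assumed piecewise-linear shapes."""
    x = abs(x)
    if x <= p1:
        return (1.0, 0.0, 0.0)
    if x <= p2:
        m = (x - p1) / (p2 - p1)
        return (1.0 - m, m, 0.0)
    if x <= p3:
        b = (x - p2) / (p3 - p2)
        return (0.0, 1.0 - b, b)
    return (0.0, 0.0, 1.0)


def schedule_gains(e, ec, e_pts=(1.0, 2.0, 3.0), ec_pts=(1.0, 2.0, 3.0)):
    """Weighted-average gains (K_P, K_I, K_D) per Eqs. (14)-(16)."""
    se, me, be = memberships(e, *e_pts)
    sc, mc, bc = memberships(ec, *ec_pts)
    # mu_1..mu_5: e big; e medium with ec big/medium/small; e small.
    mu = (be, min(me, bc), min(me, mc), min(me, sc), se)
    total = sum(mu)  # positive for these membership shapes
    return tuple(sum(m * k for m, k in zip(mu, gains)) / total
                 for gains in (K_P, K_I, K_D))


def pid_outputs(errors):
    """Apply Eq. (17) to a sequence of error samples E_n."""
    outputs, acc, prev = [], 0.0, 0.0
    for e in errors:
        kp, ki, kd = schedule_gains(e, e - prev)
        acc += e  # running sum of the error
        outputs.append(kp * e + ki * acc + kd * (e - prev))
        prev = e
    return outputs


if __name__ == "__main__":
    print(schedule_gains(3.5, 0.0))
    print(pid_outputs([0.5, 0.4, 0.2]))
```

Because the gains are a normalized weighted average, they vary continuously between the five tabulated settings as e and ec move between the linguistic regions.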
Fig. 6. Longitudinal relative distance error (vertical axis: longitudinal relative distance error, m; horizontal axis: time, s)
Fig. 7. Longitudinal relative speed error (vertical axis: longitudinal relative speed error, m/s; horizontal axis: time, s)
Fig. 8. The push force of the brake master cylinder (vertical axis: force, N; horizontal axis: time, s)
Fig. 9. The brake torque (vertical axis: torque, Nm; horizontal axis: time, s)
Fig. 10. The open angle of the engine throttle (vertical axis: angle, deg; horizontal axis: time, s)
Fig. 6 and Fig. 7 show that the parameter self-adjusting fuzzy PID control keeps the longitudinal relative distance error between the leading vehicle and the controlled vehicle within ±1 m while the leading vehicle is accelerating or decelerating. This is better than the ±1.4 m obtained with the fuzzy control method alone. The longitudinal relative speed error lies between –0.6 m/s and 1 m/s, which is better than the ±1 m/s obtained with the fuzzy control method alone. Because the change process is smooth and the error finally converges to zero, the auto-tracking of the leading vehicle by the controlled vehicle can be realized. So the parameter self-adjusting fuzzy PID control is more robust than the fuzzy control method. The response of the brake master cylinder pushrod force is shown in Fig. 8. There is a break at 10 s, when the controller starts to work; for the rest of the time the curve is smooth and in accordance with the brake torque curve in Fig. 9. Fig. 9 and Fig. 10 show the response of the controlled vehicle’s brake torque and throttle angle. When the controlled vehicle’s throttle angle is above zero, the vehicle accelerates and the brake torque is zero; otherwise, when the throttle angle is below zero, the vehicle decelerates and the brake torque is above zero. The simulation curves obtained with the parameter self-adjusting fuzzy PID control fit this logic, and for most of the time the transition is smooth. However, the frequent alternation between acceleration and deceleration from 20 s to 35 s leads to frequent switching between the brake torque and the throttle angle.
5 Conclusion
The above analysis shows that, for the vehicle longitudinal dynamic system, the parameter self-adjusting fuzzy PID control can achieve the control purpose of maintaining the safe distance between two vehicles. It combines the fuzzy control’s advantage of expressing irregular events with the adjusting function of the PID control; thus the overshoot is reduced, the dynamic anti-disturbance capability is strengthened, and the robustness is improved. Meanwhile, the parameter self-adjusting fuzzy PID control method also resolves the conflict between quick response and small overshoot. However, the frequent alternation between acceleration and deceleration leads to frequent switching between the brake torque and the throttle angle; this should be improved in the future.
Acknowledgments
This research is supported by National Natural Science Foundation of China (No. 50475064) and Natural Science Foundation of Chongqing (No: 8366).
References
1. Hesse, Markus: City Logistics: Network Modeling and Intelligent Transport Systems. Journal of Transport Geography, Vol. 10, No. 2. (2002) 158-159
2. Marell, A., Westin, K.: Intelligent Transportation System and Traffic Safety – Drivers Perception and Acceptance of Electronic Speed Checkers. Transportation Research Part C: Emerging Technologies, Vol. 7, No. 2-3. (1999) 131-147
3. Rajamani, R., Shladover, S. E.: An Experimental Comparative Study of Autonomous and Co-Operative Vehicle-Follower Control Systems. Transportation Research Part C: Emerging Technologies, Vol. 9, No. 1. (2001) 15-31
4. Kyongsu Yi, Young Do Kwon: Vehicle-to-Vehicle Distance and Speed Control Using an Electronic-Vacuum Booster. JSAE Review, Vol. 22. (2001) 403–412
5. Sunan Huang, Wei Ren: Vehicle Longitudinal Control Using Throttles and Brakes. Robotics and Autonomous Systems, Vol. 26. (1999) 241-253
6. Swaroop, D., Hedrick, J. K., Chien, C. C., Ioannou, P.: A Comparison of Spacing and Headway Control Laws for Automatically Controlled Vehicles. Vehicle System Dynamics, Vol. 23. (1994) 597-625
7. Visioli, A.: Tuning of PID Controllers with Fuzzy Logic. IEE Proceedings-Control Theory Appl, Vol. 148, No. 1. (2001) 1-6
Design of Fuzzy Controller and Parameter Optimizer for Non-linear System Based on Operator’s Knowledge
Hyeon Bae1, Sungshin Kim1, and Yejin Kim2
1 School of Electrical and Computer Engineering, Pusan National University, 30 Jangjeon-dong, Geumjeong-gu, 609-735 Busan, Korea
{baehyeon, sskim, yjkim}@pusan.ac.kr
http://icsl.ee.pusan.ac.kr
2 Dept. of Environmental Engineering, Pusan National University, 30 Jangjeon-dong, Geumjeong-gu, 609-735 Busan, Korea
Abstract. This article describes a modeling approach based on an operator’s knowledge without a mathematical model of the system, and the optimization of the controller. The system used in this experiment could not easily be modeled by mathematical methods and could not easily be controlled by conventional systems. The controller was designed based on input-output data, and optimized under a predefined performance criterion.
1 Introduction
Fuzzy logic can express linguistic information as rules for designing controllers and models. The fuzzy controller is useful in several industrial fields, because its rules can be designed easily from an expert’s knowledge. Therefore, fuzzy logic can be used to control un-modeled systems based on the expert’s knowledge. During the last several years, fuzzy controllers have been investigated in order to improve manufacturing processes [1]. Conventional controllers need mathematical models and cannot easily handle nonlinear models because of incompleteness or uncertainty [2], [3]. The controller for the experimental system presented in this study was designed based on empirical knowledge of the features of the tested system [4].
2 Ball Positioning System and Control Algorithms
As shown in Fig. 1, the experimental system consists of two independent fans operated by two DC motors. The purpose of this experiment was to move a ball to a final goal position using the two fans. This system contains non-linearity and uncertainty caused by the aerodynamics inside the path. Therefore, the goal of this experiment was to initially design the fuzzy controller based on the operator’s knowledge and then optimize it using the system performance. In this study, the position of the ball is found by processing real-time images. The difference between images was measured to determine the position of the moving ball [5], [6]. Figure 2 shows the sequence of the image processing used to find the ball in the path.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 833 – 836, 2005. © Springer-Verlag Berlin Heidelberg 2005
Fig. 1. The experimental system and components (labels: ball path, CCD camera, motors with fan, micro controller)
Fig. 2. Image processing to determine the ball position: (a) initial position of the ball, (b) final position of the ball, (c) find the ball
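The image-difference step can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists, a reference frame without the ball, and a fixed threshold; the function name `locate_ball` and these choices are hypothetical and are not taken from the experimental setup.

```python
def locate_ball(reference, frame, threshold=30):
    """Return the (row, col) centroid of pixels that differ from the reference.

    Both images are equally sized nested lists of gray levels. Pixels whose
    absolute difference exceeds the threshold are treated as the ball.
    Returns None when no pixel changed.
    """
    row_sum = col_sum = count = 0
    for r, (ref_row, cur_row) in enumerate(zip(reference, frame)):
        for c, (ref_px, cur_px) in enumerate(zip(ref_row, cur_row)):
            if abs(cur_px - ref_px) > threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


if __name__ == "__main__":
    empty = [[0] * 6 for _ in range(4)]
    with_ball = [row[:] for row in empty]
    for r, c in ((1, 2), (1, 3), (2, 2), (2, 3)):
        with_ball[r][c] = 200  # a bright 2x2 blob standing in for the ball
    print(locate_ball(empty, with_ball))
```

The centroid of the changed pixels gives a position estimate that a controller could compare against the goal position.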
3 Experimental Results
3.1 Fuzzy Controller
In the operator’s experiment, the convergence rates and error values were measured to compare the performance. This was done with the performance criterion, that is, the objective function, and the fuzzy rules were fixed based on empirical knowledge. In the operator’s test, random parameters were first selected, and these parameters were subsequently adjusted according to the evaluated performance. The results of the test are shown in Table 1.
3.2 Optimization
Hybrid Genetic Algorithm: GA and Simplex Method
A crossover rate of 30% and a mutation rate of 1.5% were used for the genetic algorithm in this test. The first graph of Fig. 3 represents the performance with respect to each membership function. The performance values jumped to worse values once in the middle of the graph, but they improved gradually with iteration. After the GA process, the simplex method initially showed worse performance values than the GA had reached, as shown in the second graph of Fig. 3, because performance differences exist in real experiments even when the same membership functions are used. The reason is that all parameters transferred from the GA are used to operate the system, and then the performance is evaluated again in the simplex method, so small gaps can exist.
Simulated Annealing
As shown in Fig. 4, two graphs represent the results for the SA algorithm. In the initial part of the second graph of Fig. 4, the performance values drop significantly and then move to better values as the iterations are repeated. SA can find good solutions quickly even though it uses fewer iterations than the GA. When SA is used for optimization, the selection of the initial values is very important. In this experiment, the SA starts with the search parameters from the operator’s trials. Initially a high temperature coefficient is selected for the global search, and then it is reduced gradually during processing for the local search.
Table 1. Parameters of membership functions for experiments
       Error M.F                                            Derivative Error M.F
       NB        NS        ZE        PS        PB           N         ZE        P
       CE   VA   CE   VA   CE   VA   CE   VA   CE   VA      CE   VA   CE   VA   CE   VA
A     -40   20  -20   10    0   10   20   10   70   40     -10    5    0    3   10    5
B     -40   30  -15   15    0    5   15   20   80   50      -4    5    0    5    4    5
C     -50   35  -10   20    0    8   10   20   50   25      -4    5    0    3    4    5
D     -50   25  -20   10    0   10   20   10   50   25      -4    5    0    4    4    5
E     -30   20  -15   10    0   10   15   10   30   20      -4    4    0    3    4    4
F     -35   25  -18   10    0    8   35   25   50   20      -2    2    0    2    2    2
Fig. 3. Performance of the experiment in the case of the Hybrid GA
Fig. 4. Performance of the experiment in the case of the SA
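The simulated annealing search described above can be sketched generically. In the experiment the cost comes from running the physical system, which cannot be reproduced here, so `toy_cost` and its target vector are stand-ins invented for illustration; the geometric cooling schedule, step size and iteration count are likewise assumptions. The start vector is trial A of Table 1.

```python
import math
import random


def simulated_annealing(cost, start, step=2.0, t0=10.0, cooling=0.95,
                        iters=500, seed=0):
    """Minimize cost(params), starting hot (global search) and cooling down."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(iters):
        candidate = [p + rng.uniform(-step, step) for p in current]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # assumed geometric cooling schedule
    return best, best_cost


# Operator trial A from Table 1: (CE, VA) pairs for NB, NS, ZE, PS, PB, N, ZE, P.
TRIAL_A = [-40, 20, -20, 10, 0, 10, 20, 10, 70, 40, -10, 5, 0, 3, 10, 5]

# Hypothetical target, used only so that the toy cost has a known minimum.
TOY_TARGET = [-45, 22, -18, 12, 0, 9, 18, 12, 60, 30, -6, 4, 0, 3, 6, 4]


def toy_cost(params):
    """Stand-in objective: squared distance to the hypothetical target."""
    return sum((p - q) ** 2 for p, q in zip(params, TOY_TARGET))


if __name__ == "__main__":
    best, best_cost = simulated_annealing(toy_cost, TRIAL_A)
    print(toy_cost(TRIAL_A), best_cost)
```

Starting from an operator trial rather than a random point mirrors the observation that the choice of initial values matters for SA.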
4 Conclusions
The primary goal of this study was to design and optimize a fuzzy controller based on the operator’s knowledge and the running process. The method uses the ability of a fuzzy controller to control systems with fixed rules and without mathematical models. Gaussian functions were employed as fuzzy membership functions, and 8 functions were used for the error and the derivative error. Thus, a total of 16 parameters were optimized. SA is better than the hybrid GA in terms of convergence rate, but it is difficult to determine which one is the better optimization method overall. It depends on the conditions under which the systems operate.
Acknowledgement
This work was supported by “Research Center for Logistics Information Technology (LIT)” hosted by the Ministry of Education & Human Resources Development.
References
1. Lee, C. C.: Fuzzy Logic in Control Systems: Fuzzy Logic Controller, Part I, II. IEEE Transaction on Systems, Man, and Cybernetics 20, (1990) 404-435
2. Tsoukalas, L. H. and Uhrig, R. E.: Fuzzy and Neural Approaches in Engineering. John Wiley & Sons, New York (1997)
3. Yen, J. and Langari, R.: Fuzzy Logic: Intelligence, Control, and Information. Prentice Hall, NJ, (1999)
4. Mamdani, E. H. and Assilian, S.: An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller. International Journal of Man-Machine Studies 7, (1975) 1-13
5. Haralick, R. M. and Shapiro, L. G.: Computer and Robot Vision. Addison Wesley, MA (1992)
6. Jain, R., Kasturi, R. and Schunck, B. G.: Machine Vision. McGraw-Hill, New York, (1995) 30-33
A New Pre-processing Method for Multi-channel Echo Cancellation Based on Fuzzy Control*
Xiaolu Li1, Wang Jie2, and Shengli Xie1
1 Electronic and Communication Engineering, South China University of Technology, 510641, China
2 Control Science and Engineering Post-doc workstation, South China University of Technology, 510641, China
[email protected]
Abstract. The essential problem of multi-channel echo cancellation is caused by the strong correlation between the two-channel input signals. Pre-processing methods are commonly used to decorrelate them, and the degree of decorrelation depends on the nonlinear coefficient α. In most research, however, α is constant. In real applications the cross-correlation varies, and α should be adjusted with the correlation, but there is no precise mathematical formula relating them. In this paper, the proposed method applies fuzzy logic to choose α so that the communication quality and convergence performance can be assured with only a small increase in computation. Simulations also show the validity of the method.
1 Description of Problem

In hands-free mobile radiotelephone or teleconference systems, cancelling the echo is essential to assure communication quality. At present, adaptive cancellation is the main technique used in multi-channel echo cancellation. The essential problem of adaptive multi-channel echo cancellation is the strong correlation between the input signals [1,2], which leads to poor convergence performance. Researchers have therefore proposed many pre-processing methods [1-4], in which the main problem is how to choose the nonlinear transforming coefficient $\alpha$. Figure 1 shows the block diagram of a stereophonic echo canceller with two added nonlinear pre-processing units. Let $\mathbf{x}_i(n) = [x_i(n), x_i(n-1), \ldots, x_i(n-L+1)]^T$, $i = 1,2$, denote the outputs of the microphones at time $n$ in the remote room, and let $\mathbf{h}_i = [h_{i,0}, h_{i,1}, \ldots, h_{i,L-1}]^T$ denote the true echo paths in the local room and
*
The work is supported by the Guang Dong Province Science Foundation for Program of Research Team (grant04205783), the National Natural Science Foundation of China (Grant 60274006), the Natural Science Key Fund of Guang Dong Province, China (Grant 020826), the National Natural Science Foundation of China for Excellent Youth (Grant 60325310) and the Trans-Century Training Program, the Foundation for the Talents by the State Education Commission of China.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 837 – 840, 2005. © Springer-Verlag Berlin Heidelberg 2005
Fig. 1. Block of stereophonic echo canceller with nonlinear pre-processing units
$\hat{\mathbf{w}}_i(n) = [\hat{w}_{i,0}(n), \hat{w}_{i,1}(n), \ldots, \hat{w}_{i,L-1}(n)]^T$ denote the adaptive FIR filters at time $n$, where $i = 1,2$ and $L$ is the length of both the impulse responses and the adaptive filters. The echo signal at time $n$ is $y(n)$. To improve convergence, one pre-processing method nonlinearly transforms $x_1$ and $x_2$ and uses the results as the filter inputs, i.e.

$$x_i'(n) = x_i(n) + \alpha f[x_i(n)] \tag{1}$$

where $f$ is a nonlinear function, e.g. a half-wave rectifier. By adjusting the nonlinear coefficient $\alpha$, we can change the degree of distortion that is introduced.
2 Adjustment of Nonlinear Component Based on Fuzzy Control

The larger $\alpha$ is, the larger the added nonlinear component and the faster the filters converge, but a large $\alpha$ may degrade speech quality. Conversely, a small $\alpha$ preserves speech quality but may not improve convergence much. The adjustment of $\alpha$ should therefore be made subject to preserving speech quality: a large $\alpha$ should be adopted when the correlation between the input signals is strong, and a small $\alpha$ when the correlation is weak. This relationship comes from experience, and there is no precise mathematical formula for it. We therefore propose an LMS-type algorithm that applies fuzzy logic to the nonlinear pre-processing for stereophonic echo cancellation. Fuzzy logic is used to adjust $\alpha$ so that communication quality and convergence performance are assured at a small additional computational cost. The input of the fuzzy inference system (FIS) is the correlation coefficient $\rho_{x_1 x_2}$ of the input signals, and the output is the nonlinear component coefficient $\alpha$. Three linguistic values, small (S), medium (M) and large (L), are defined to describe the magnitude of both $\rho_{x_1 x_2}$ and $\alpha$. The complete fuzzy rules are shown in Table 1.

Table 1. Fuzzy Control Rules
Rules    Input $\rho_{x_1 x_2}$    Output $\alpha$
R0       L                         L
R1       M                         M
R2       S                         S
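The rule base of Table 1 can be sketched as a small single-input fuzzy inference system. The paper does not give the membership functions or the defuzzifier, so the triangular sets on the magnitude of the correlation coefficient, the output centres (0.1, 0.3, 0.5), the weighted-average defuzzification, and the function name `fis_alpha` are all illustrative assumptions.

```python
def fis_alpha(rho, centres=(0.1, 0.3, 0.5)):
    """Map a correlation coefficient to a nonlinear coefficient alpha.

    Assumptions (not from the paper): triangular memberships on |rho|
    in [0, 1], singleton output centres for S, M, L, and weighted-average
    defuzzification.
    """
    r = min(abs(rho), 1.0)
    # Input memberships: small, medium, large.
    mu_s = max(0.0, 1.0 - r / 0.5)
    mu_m = max(0.0, 1.0 - abs(r - 0.5) / 0.5)
    mu_l = max(0.0, (r - 0.5) / 0.5)
    # Rules R2, R1, R0 of Table 1: S -> S, M -> M, L -> L.
    weights = (mu_s, mu_m, mu_l)
    return sum(w * c for w, c in zip(weights, centres)) / sum(weights)


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(rho, fis_alpha(rho))
```

With these assumed sets the three memberships always sum to one, so the output interpolates linearly between the centres and grows monotonically with the correlation, which is the qualitative behaviour the rules describe.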
Finally, the proposed algorithm reduces to the following equations:

$$\rho_{x_1 x_2}(n) = \frac{\operatorname{cov}(x_1(n), x_2(n))}{\sqrt{D(x_1(n))\,D(x_2(n))}}; \quad \alpha(n) = \mathrm{FIS}(\rho_{x_1 x_2}(n)) \tag{2}$$

$$x_i'(n) = x_i(n) + \alpha(n) f[x_i(n)] \tag{3}$$

$$\mathbf{x}_i'(n) = [x_i'(n), x_i'(n-1), \ldots, x_i'(n-L+1)]^T \tag{4}$$

$$d(n) = \mathbf{x}^T(n)\,\mathbf{h}(n), \quad e(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}(n) \tag{5}$$

$$\mathbf{w}_i(n+1) = \mathbf{w}_i(n) + 2\mu\,\frac{e(n)\,\mathbf{x}_i'(n)}{\mathbf{x}_1'^T(n)\,\mathbf{x}_1'(n) + \mathbf{x}_2'^T(n)\,\mathbf{x}_2'(n)}, \quad i = 1,2 \tag{6}$$
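The pre-processing and the normalized update of the adaptive filters can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the remote-room filters, the echo paths, the fixed `alpha`, the small regularizer `eps`, and the choice to generate the echo from the pre-processed signals are all assumptions made for the demonstration.

```python
import numpy as np


def half_wave(x):
    """Half-wave rectifier, one possible choice of the nonlinear function f."""
    return np.maximum(x, 0.0)


def preprocess(x, alpha):
    """Nonlinear pre-processing: x'(n) = x(n) + alpha * f[x(n)]."""
    return x + alpha * half_wave(x)


def nlms_step(w1, w2, r1, r2, d, mu=0.25, eps=1e-8):
    """One two-channel normalized LMS update on pre-processed regressors."""
    e = d - (r1 @ w1 + r2 @ w2)
    norm = r1 @ r1 + r2 @ r2 + eps  # eps avoids division by zero (assumed)
    return w1 + 2.0 * mu * e * r1 / norm, w2 + 2.0 * mu * e * r2 / norm, e


def simulate(alpha=0.3, L=8, n_samples=4000, seed=0):
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n_samples + L)
    # Assumed remote room: one source seen through two different FIR paths,
    # which makes the two channels strongly correlated.
    x1 = np.convolve(s, [1.0, 0.5, 0.2])[: n_samples + L]
    x2 = np.convolve(s, [0.8, -0.3, 0.4])[: n_samples + L]
    x1p, x2p = preprocess(x1, alpha), preprocess(x2, alpha)
    # Assumed local-room echo paths (random, for illustration only).
    h1, h2 = rng.standard_normal(L), rng.standard_normal(L)
    w1, w2 = np.zeros(L), np.zeros(L)
    errors = np.empty(n_samples)
    for n in range(n_samples):
        # Window of the L most recent pre-processed samples, newest first.
        r1 = x1p[n : n + L][::-1]
        r2 = x2p[n : n + L][::-1]
        d = r1 @ h1 + r2 @ h2  # noise-free echo (assumption)
        w1, w2, errors[n] = nlms_step(w1, w2, r1, r2, d)
    return errors


if __name__ == "__main__":
    err = simulate()
    print("early MSE:", np.mean(err[:200] ** 2))
    print("late MSE: ", np.mean(err[-200:] ** 2))
```

In a full system `alpha` would be recomputed at each step from the estimated correlation coefficient through the fuzzy inference system instead of being held fixed.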
3 Simulations

Simulations show that when the correlation is strong, $\alpha(n)$ is also large; when the correlation is weakened by background noise or by changes in the speakers' positions, the designed fuzzy system gives a small $\alpha(n)$ to reduce speech distortion. Fig. 2 shows the good convergence performance of the mean square error of the adaptive filters. We adopt the mean square error (MSE) as the criterion:

$$\mathrm{MSE} = 10 \log_{10} E[e^2(n)].$$
Fig. 2. Mean square error of the adaptive filters
4 Conclusion

The essential problem of multi-channel echo cancellation is the strong correlation between the input signals, and pre-processing methods are commonly used to decorrelate them. In real applications, the cross-correlation of the two-channel input signals varies with time and is also influenced by background noise. Most research nevertheless keeps α constant, although α should be adjusted with the correlation. The relationship between the two comes from experience, and there is no precise mathematical formula for it. The method proposed in this paper applies fuzzy logic to the nonlinear pre-processing for stereophonic echo cancellation so that communication quality and convergence performance are assured at a small additional computational cost. Simulations confirm the validity of the method.
References
1. M. Mohan Sondhi, Dennis R. Morgan, Joseph L. Hall, Stereophonic acoustic echo cancellation - an overview of the fundamental problem, IEEE Signal Processing Letters. 2 (1995) 148-151
2. Jacob Benesty, Dennis R. Morgan and M. Mohan Sondhi, A better understanding and improved solution to the problems of stereophonic acoustic echo cancellation, ICASSP. (1995) 303-305
3. Benesty J, Morgan D and Sondhi M. A better understanding and an improved solution to the problem of stereophonic acoustic echo cancellation. IEEE Trans. Speech Audio Processing. 6 (1998) 156-165
4. Suehiro Shimauchi and Shoji Makino, Stereo projection echo canceller with true echo path estimation, ICASSP. (1995) 3059-3063
Robust Adaptive Fuzzy Control for Uncertain Nonlinear Systems Chen Gang, Shuqing Wang, and Jianming Zhang National Key Laboratory of Industrial Control Technology, Institute of Advanced Process Control, Zhejiang University, Hangzhou, 310027, P. R. China [email protected]
Abstract. Two different fuzzy control approaches are proposed for a class of nonlinear systems with mismatched uncertainties, transformable to the strict-feedback form. A fuzzy logic system (FLS) is used as a universal approximator to approximate unstructured uncertain functions, and the bounds of the reconstruction errors are estimated online. By employing special design techniques, the controller singularity problem is completely avoided in both approaches. Furthermore, all the signals in the closed-loop systems are guaranteed to be semi-globally uniformly ultimately bounded, and the outputs of the system are proved to converge to a small neighborhood of the desired trajectory. The control performance can be guaranteed by an appropriate choice of the design parameters. In addition, the proposed fuzzy controllers are highly structural and particularly suitable for parallel processing in practical applications.
1 Introduction

Because an FLS can uniformly approximate a nonlinear function over a compact set to any degree of accuracy, it provides an alternative approach to the modeling and design of nonlinear control systems. It offers a way to combine both the available mathematical description of the system and linguistic information into the controller design in a uniform fashion. To improve the performance and stability of FLS-based control, the synthesis of robust fuzzy controllers has received much attention. Recently, some design schemes that combine the backstepping methodology with adaptive FLS have been reported. To avoid the controller singularity problem, the control gain is often assumed to be a known function [1], [2]. However, this assumption cannot be satisfied in many cases. In [3], an FLS is used to approximate the unknown control gain function, but the controller singularity problem is not solved. The possible controller singularity problem can be avoided when the sign of the control gain is known [4]. The problem becomes more difficult if the sign of the control gain is unknown. Another problem is that some tedious analysis is needed to determine regression matrices [1], [3]. These approaches are therefore complicated and difficult to use in practice. In this note, we present two approaches to solve the aforementioned problems. One is the robust adaptive fuzzy tracking control (RAFTC) for low order systems with known sign of the control gain. The other is the robust fuzzy tracking control (RFTC) proposed for high order systems with unknown sign of the control gain. The control schemes presented in this note have several advantages. First, they can easily incorporate linguistic information about the system into the controller design through if-then rules. The controllers are highly structural and particularly suitable for parallel processing in practical applications. Second, the controller singularity problem is completely avoided. The outline of this note is as follows. In Section 2, the formulation of our robust tracking problem for nonlinear systems with mismatched uncertainties is presented. RAFTC and RFTC are developed in Section 3 and Section 4, respectively. Finally, the note is concluded in Section 5.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 841 – 850, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 System Description and Problem Statement

Consider the $n$th-order nonlinear system of the form

$$\dot{\xi} = f(\xi) + \Delta q(\xi) + b\,g(\xi)\,u, \quad y = h(\xi), \tag{1}$$

where $\xi \in R^n$ is the state, $u \in R$ is the input, $y \in R$ is the output, and $h$ is a smooth function on $R^n$. $f$ and $g$ are known functions with $g(\xi) \neq 0$, $\forall \xi \in R^n$. $b$ is an unknown parameter with $b \neq 0$. $\Delta q(\xi)$ represents uncertainties due to many factors, such as modeling errors, parameter uncertainties, disturbances, and so on. The control objective is to make the output $y$ track the desired trajectory $y_r(t)$ in the presence of bounded uncertainties.

Assumption 1. The desired output trajectory $y_r(t)$ and its derivatives up to $n$th order are known and bounded.

Assumption 2. The uncertainties $\Delta q(\xi)$ satisfy the structural coordinate-free condition $ad_z \Delta q \in \varphi_i$, $\forall z \in \varphi_i$, $0 \le i \le n-2$. The distributions $\varphi_i = \operatorname{span}\{g, ad_f g, \ldots, ad_f^i g\}$, $0 \le i \le n-1$, are involutive and of constant rank $i+1$.

The nominal form of (1) is described by

$$\dot{\xi} = f(\xi) + b\,g(\xi)\,u, \quad y = h(\xi). \tag{2}$$

Lemma 1. If the system (2) has relative degree $n$ and Assumption 2 is satisfied, there exists a global diffeomorphism $x = \phi(\xi)$ transforming the system (1) into the form

$$\dot{x}_i = x_{i+1} + \Delta_i(x_1,\ldots,x_i), \quad 1 \le i \le n-1, \qquad \dot{x}_n = \gamma(x) + b\,\beta(x)\,u + \Delta_n(x), \qquad y = x_1. \tag{3}$$
3 Design of RAFTC

Assumption 3. The sign of the unknown parameter $b$ is known. Furthermore, $b$ satisfies $b_0 < |b| < b_1$ for some unknown positive constants $b_0$, $b_1$.

In this note, we consider an FLS consisting of the product-inference rule, singleton fuzzifier, center-average defuzzifier, and Gaussian membership functions. Based on the universal approximation theorem, given a compact set $\Omega_{x_i} \subset R^i$, the unknown function $\Delta_i$ can be expressed as

$$\Delta_i(x_1,\ldots,x_i) = w_i^{*T} h_i(x_1,\ldots,x_i) + v_i(x_1,\ldots,x_i), \tag{4}$$

where $w_i^*$ is an optimal weight vector; $h_i$ is the fuzzy function vector; and the reconstruction error $v_i$ is bounded, i.e., there exists an unknown constant $\rho_i > 0$ such that $|v_i| < \rho_i$.
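The linear-in-weights form in (4) can be illustrated numerically. The sketch below builds the fuzzy function vector for an FLS with Gaussian memberships, product inference and center-average defuzzification; the rule centers, widths and weights, as well as the names `fuzzy_basis` and `fls_output`, are hypothetical values chosen only for the example.

```python
import numpy as np


def fuzzy_basis(x, centers, widths):
    """Fuzzy function vector h(x) with one entry per rule.

    centers, widths: arrays of shape (n_rules, n_inputs) (assumed layout).
    Each rule fires with the product of its Gaussian memberships, and the
    center-average defuzzifier normalizes the firing strengths.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    fire = np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))
    # Note: if x is far from every center, fire may underflow to zero.
    return fire / np.sum(fire)


def fls_output(x, weights, centers, widths):
    """FLS output w^T h(x), linear in the adjustable weights."""
    return weights @ fuzzy_basis(x, centers, widths)


if __name__ == "__main__":
    centers = np.array([[-1.0], [0.0], [1.0]])  # three rules, one input
    widths = np.full((3, 1), 0.5)
    weights = np.array([-1.0, 0.0, 1.0])
    for x in (-1.0, 0.0, 0.5, 1.0):
        print(x, fls_output([x], weights, centers, widths))
```

Because the output is linear in `weights`, only the weight vector needs to be adapted online once the memberships are fixed, which is what makes the adaptive laws of the following design tractable.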
The design procedure consists of $n$ steps. At each step $i$ ($1 \le i < n$), direct adaptive fuzzy control techniques are employed to design a fictitious controller $\alpha_i$. A robustness term is introduced to compensate for the residual uncertainties. At the $n$th step, the actual control $u$ appears and the design is completed. To save space, we omit these steps and directly give the resulting adaptive controller

$$\begin{aligned}
e_1 &= x_1 - y_r, \quad e_i = x_i - \alpha_{i-1}, \quad i = 2,\ldots,n, \\
\alpha_1 &= -k_1 e_1 - \hat{w}_1^T h_1 - \varphi_1 + \dot{y}_r, \\
\alpha_i &= -k_i e_i - \hat{w}_i^T h_i - \varphi_i - e_{i-1} + \dot{\alpha}_{i-1}, \quad i = 2,\ldots,n-1, \\
u &= -\hat{q}\,\beta^{-1}\left(\gamma + k_n e_n + \hat{w}_n^T h_n + \varphi_n + e_{n-1} - \dot{\alpha}_{n-1}\right), \\
\varphi_i &= \hat{\rho}_i \tanh(e_i/\varepsilon_i), \quad i = 1,\ldots,n,
\end{aligned} \tag{5}$$

where $k_i > 0$ and $\varepsilon_i > 0$ for $i = 1,\ldots,n$ are design parameters; $q = b^{-1}$; and $\hat{w}_i$, $\hat{\rho}_i$, and $\hat{q}$ denote the estimates of $w_i^*$, $\rho_i$, and $q$, respectively. It is well known that for conventional adaptive laws, unmodeled dynamics, disturbances, and reconstruction errors may lead to parameter drift and even instability. Consider the constraint region $\Omega_{w_i} = \{w_i : \|w_i\| \le c_i\}$ for the approximation parameter vector $w_i$ with constant $c_i > 0$. $\hat{w}_i$, $\hat{\rho}_i$, and $\hat{q}$ are updated according to

$$\begin{aligned}
\dot{\hat{w}}_i &= Q_i\left(h_i e_i - \sigma_i \hat{w}_i\right), \\
\dot{\hat{\rho}}_i &= -k_c \ell_i\left(\hat{\rho}_i - \rho_i^0\right) + \ell_i e_i \tanh(e_i/\varepsilon_i), \\
\dot{\hat{q}} &= g_1\left(\operatorname{sgn}(b)\, e_n \phi_1 - \eta \hat{q}\right),
\end{aligned} \tag{6}$$
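where $\phi_1 = \gamma + k_n e_n + \hat{w}_n^T h_n + \varphi_n + e_{n-1} - \dot{\alpha}_{n-1}$; $Q_i = Q_i^T$ is a positive definite matrix; $\ell_i > 0$, $k_c > 0$, $g_1 > 0$, and $\eta > 0$ are constants; $\rho_i^0$ is the initial estimate of $\rho_i$; and the switching parameter $\sigma_i$ is chosen as

$$\sigma_i = \begin{cases}
0, & \text{if } \|\hat{w}_i\| < c_i \text{ or } \left(\|\hat{w}_i\| \ge c_i \text{ and } \hat{w}_i^T h_i e_i \le 0\right), \\[4pt]
\dfrac{\left(\|\hat{w}_i\|^2 - c_i^2\right)\hat{w}_i^T h_i e_i}{\left(\delta_i^2 + 2\delta_i c_i\right)\|\hat{w}_i\|^2}, & \text{if } \|\hat{w}_i\| \ge c_i \text{ and } \hat{w}_i^T h_i e_i > 0,
\end{cases} \tag{7}$$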
with a small constant $\delta_i > 0$. According to (5), we obtain the error dynamics of the closed-loop system

$$\dot{E} = -KE - \tilde{W}^T H + M - B + SE + D, \tag{8}$$

where $E = (e_1,\ldots,e_n)^T$, $K = \operatorname{diag}(k_1,\ldots,k_n)$, $H = (h_1^T,\ldots,h_n^T)^T$, $\tilde{W} = \operatorname{diag}(\tilde{w}_1,\ldots,\tilde{w}_n)$, $\tilde{w}_i = \hat{w}_i - w_i^*$, $M = (v_1,\ldots,v_n)^T$, $B = (\varphi_1,\ldots,\varphi_n)^T$, $D = (0,\ldots,b\tilde{q}\phi_1)^T$, $\tilde{q} = q - \hat{q}$, and $S \in R^{n \times n}$ has only the nonzero elements $s_{i,i+1} = 1$, $s_{i+1,i} = -1$, $i = 1,\ldots,n-1$.

Theorem 1. Consider the closed-loop system consisting of (3) satisfying Assumption 3, controller (5), and the adaptive laws (6). For bounded initial conditions, 1) all signals in the closed-loop system are bounded; 2) the tracking error $e_1$ can be kept as small as possible by adjusting the controller parameters in a known form.

Proof. Define $Q = \operatorname{diag}(Q_1,\ldots,Q_n)$, $L = \operatorname{diag}(\ell_1,\ldots,\ell_n)$, $\rho = (\rho_1,\ldots,\rho_n)^T$, $\tilde{\rho} = \rho - \hat{\rho}$, $\hat{\rho} = (\hat{\rho}_1,\ldots,\hat{\rho}_n)^T$, $\hat{W} = \operatorname{diag}(\hat{w}_1,\ldots,\hat{w}_n)$. Consider the following Lyapunov function candidate

$$V = \frac{1}{2}E^T E + \frac{1}{2}\operatorname{tr}\left(\tilde{W}^T Q^{-1}\tilde{W}\right) + \frac{1}{2}\tilde{\rho}^T L^{-1}\tilde{\rho} + \frac{|b|}{2g_1}\tilde{q}^2. \tag{9}$$
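The time derivative of $V$ along the trajectory of (8) satisfies

$$\begin{aligned}
\dot{V} &= E^T\left(-KE - \tilde{W}^T H + M - B + SE + D\right) + \operatorname{tr}\left(\tilde{W}^T Q^{-1}\dot{\hat{W}}\right) - \tilde{\rho}^T L^{-1}\dot{\hat{\rho}} - \frac{|b|}{g_1}\tilde{q}\,\dot{\hat{q}} \\
&\le -E^T K E - \sum_{i=1}^{n}\sigma_i\left(\hat{w}_i - w_i^*\right)^T\hat{w}_i + k_c\sum_{i=1}^{n}\tilde{\rho}_i\left(\hat{\rho}_i - \rho_i^0\right) + |b|\,\eta\,\tilde{q}\hat{q} + \sum_{i=1}^{n}\rho_i\varepsilon_i.
\end{aligned} \tag{10}$$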
Referring to (7) for $i = 1$, $\sigma_1 = 0$ if the first condition is true. If $\|\hat{w}_1\| \ge c_1$ and $\hat{w}_1^T h_1 e_1 > 0$, then $\sigma_1(\hat{w}_1 - w_1^*)^T\hat{w}_1 \ge 0$, because $\|\hat{w}_1\| \ge \|w_1^*\|$ and $\sigma_1 > 0$. By the same procedure, we obtain $\sigma_i(\hat{w}_i - w_i^*)^T\hat{w}_i \ge 0$, $i = 2,\ldots,n$. Therefore, $\sum_{i=1}^{n}\sigma_i(\hat{w}_i - w_i^*)^T\hat{w}_i \ge 0$. Consequently,

$$\dot{V} \le -\lambda_{\min}(K)\|E\|^2 - \frac{1}{2}k_c\|\tilde{\rho}\|^2 - \frac{1}{2}|b|\,\eta\,\tilde{q}^2 + \frac{1}{2}k_c\sum_{i=1}^{n}\left(\rho_i - \rho_i^0\right)^2 + \frac{\eta}{2b_0} + \sum_{i=1}^{n}\rho_i\varepsilon_i, \tag{11}$$

where $\lambda_{\min}(K)$ denotes the minimum eigenvalue of $K$. We see that $\dot{V}$ is negative whenever $E \notin \Omega_E = \{E : \|E\| \le \sqrt{c_0/\lambda_{\min}(K)}\}$, or $\tilde{\rho} \notin \Omega_{\tilde{\rho}} = \{\tilde{\rho} : \|\tilde{\rho}\| \le \sqrt{2c_0/k_c}\}$, or $\tilde{q} \notin \Omega_{\tilde{q}} = \{\tilde{q} : |\tilde{q}| \le \sqrt{2c_0/(|b|\eta)}\}$, where

$$c_0 = \frac{k_c}{2}\sum_{i=1}^{n}\left(\rho_i - \rho_i^0\right)^2 + \frac{\eta}{2b_0} + \sum_{i=1}^{n}\rho_i\varepsilon_i.$$

According to a standard Lyapunov theorem extension, this demonstrates the uniform ultimate boundedness of $E$, $\tilde{\rho}$, and $\tilde{q}$. Since $\hat{w}_i$ ($i = 1,\ldots,n$) and $y_r^{(i)}$ ($i = 0,\ldots,n$) are bounded, all fictitious functions $\alpha_i$ ($i = 1,\ldots,n-1$) and the control input $u$ are bounded. Consequently, we can conclude that all signals in the closed-loop system are bounded. According to (11), the tracking error satisfies $\lim_{t\to\infty}|e_1(t)| \le \sqrt{c_0/\lambda_{\min}(K)}$. A small tracking error can be achieved by increasing the control gains $k_i$ and decreasing $\varepsilon_i$ and $\eta$. The parameter $k_c$ offers a tradeoff between the magnitudes of $\tilde{\rho}$ and $e_1$. It is also shown that the closer $\rho_i^0$ ($i = 1,\ldots,n$) are to $\rho_i$ ($i = 1,\ldots,n$), the smaller $\tilde{\rho}$ and $e_1$ become.
The RAFTC is suitable for the low order system with known sign of control gain. For the high order system, the online computation burden will be heavy. In the next section, we will present an approach for the high order system. The control laws have the adaptive mechanism with minimal learning parameterizations. Furthermore, a priori knowledge of the control gain sign is not required.
4 Design of RFTC

The design procedure is briefly given as follows.

Step $i$ ($1 \le i < n$): According to (4), the fictitious controllers are chosen as

$$\begin{aligned}
\alpha_1 &= -k_1 e_1 - w_1^T h_1 - \varphi_1 + \dot{y}_r, \\
\alpha_i &= -k_i e_i - w_i^T h_i - \varphi_i - e_{i-1} + \dot{\alpha}_{i-1}, \quad i = 2,\ldots,n-1,
\end{aligned} \tag{12}$$

where the nominal vector $w_i$ is designed and fixed by a priori knowledge. There exist constants $\rho_{wi}$ and $\rho_{vi}$ such that $\|w_i - w_i^*\| \le \rho_{wi}$ and $|v_i| \le \rho_{vi}$. The robustness term $\varphi_i$ is given by $\varphi_i = \hat{\rho}_{wi}\|h_i\|\tanh(\|h_i\| e_i/\varepsilon_{wi}) + \hat{\rho}_{vi}\tanh(e_i/\varepsilon_{vi})$, where $\hat{\rho}_{wi}$ and $\hat{\rho}_{vi}$ denote the estimates of $\rho_{wi}$ and $\rho_{vi}$, respectively. $k_i > 0$, $\varepsilon_{wi} > 0$, and $\varepsilon_{vi} > 0$ are design parameters. $\hat{\rho}_{wi}$ and $\hat{\rho}_{vi}$ are updated as follows:
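$$\begin{aligned}
\dot{\hat{\rho}}_{wi} &= -\sigma_{wi}\left(\hat{\rho}_{wi} - \rho_{wi}^0\right) + r_{wi}\, e_i \|h_i\| \tanh(\|h_i\| e_i/\varepsilon_{wi}), \\
\dot{\hat{\rho}}_{vi} &= -\sigma_{vi}\left(\hat{\rho}_{vi} - \rho_{vi}^0\right) + r_{vi}\, e_i \tanh(e_i/\varepsilon_{vi}),
\end{aligned} \tag{13}$$

where $\sigma_{wi}$, $r_{wi}$, $\sigma_{vi}$, $r_{vi}$ are positive constants.

Step $n$: In the final step, we design the actual controller. Since the sign of the control gain is unknown, we introduce a Nussbaum-type gain in the controller design. The actual controller is chosen as

$$\begin{aligned}
u &= N(\omega)\,\alpha_n/\beta(x), \quad \dot{\omega} = \alpha_n e_n, \\
\alpha_n &= k_n e_n + \gamma + w_n^T h_n + \varphi_n + e_{n-1} - \dot{\alpha}_{n-1}.
\end{aligned} \tag{14}$$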
$N(\omega)$ is a Nussbaum-type function which has the following properties [6]

$$\limsup_{s\to\infty}\int_0^s N(\omega)\,d\omega = +\infty; \quad \liminf_{s\to\infty}\int_0^s N(\omega)\,d\omega = -\infty. \tag{15}$$

In this note, the Nussbaum function $N(\omega) = \exp(\omega^2)\cos(\pi\omega/2)$ is considered. It should be pointed out that the Nussbaum-type gain technique was first proposed in [6] for a class of first-order linear systems. Subsequently, the method was generalized to higher order linear systems [7] and nonlinear systems [8], [9]. According to (12) and (14), we obtain the following error dynamics of the closed-loop system

$$\dot{E} = -KE - \tilde{W}^T H + M - B + SE + D, \tag{16}$$

where $E = (e_1,\ldots,e_n)^T$, $K = \operatorname{diag}(k_1,\ldots,k_n)$, $H = (h_1^T,\ldots,h_n^T)^T$, $\tilde{W} = \operatorname{diag}(\tilde{w}_1,\ldots,\tilde{w}_n)$, $\tilde{w}_i = w_i - w_i^*$, $M = (v_1,\ldots,v_n)^T$, $B = (\varphi_1,\ldots,\varphi_n)^T$, $D = (0,\ldots,(bN(\omega)+1)\alpha_n)^T$, and $S \in R^{n\times n}$ has only the nonzero elements $s_{i,i+1} = 1$, $s_{i+1,i} = -1$, $i = 1,\ldots,n-1$.

Theorem 2. Suppose Assumption 1 is satisfied. Consider the closed-loop system consisting of (3), controller (12), (14), and the parameter updating laws (13). Given a compact set $\Omega_n \subset R^n$, for any $x(0) \in \Omega_n$, the errors $e_i$, $i = 1,\ldots,n$, and the parameter estimates $\hat{\rho}_{wi}$, $\hat{\rho}_{vi}$, $i = 1,\ldots,n$, are uniformly ultimately bounded (UUB).
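As a numerical aside on the Nussbaum function $N(\omega) = \exp(\omega^2)\cos(\pi\omega/2)$ used in controller (14), the short sketch below evaluates it and checks its sign on the intervals that matter for the boundedness argument: negative inside $(4M+1, 4M+3)$ and positive inside $(4M-1, 4M+1)$. The function name `nussbaum` and the sampled points are illustrative choices, and the check is not part of the proof.

```python
import math


def nussbaum(omega):
    """N(omega) = exp(omega^2) * cos(pi * omega / 2).

    The exponential overflows a double for |omega| above roughly 26,
    so this direct form is only meant for small arguments.
    """
    return math.exp(omega ** 2) * math.cos(math.pi * omega / 2.0)


if __name__ == "__main__":
    for m in range(3):
        mid_neg = 4 * m + 2  # midpoint of [4M+1, 4M+3]
        mid_pos = 4 * m      # midpoint of [4M-1, 4M+1]
        print(m, nussbaum(mid_pos) > 0, nussbaum(mid_neg) < 0)
```

The alternating sign combined with the rapidly growing envelope $\exp(\omega^2)$ is what lets the gain $N(\omega)$ eventually take the correct sign and sufficient magnitude whatever the sign of $b$.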
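Proof. Define $R_w = \operatorname{diag}(r_{w1},\ldots,r_{wn})$, $R_v = \operatorname{diag}(r_{v1},\ldots,r_{vn})$, $\rho_w = (\rho_{w1},\ldots,\rho_{wn})^T$, $\hat{\rho}_w = (\hat{\rho}_{w1},\ldots,\hat{\rho}_{wn})^T$, $\tilde{\rho}_w = \hat{\rho}_w - \rho_w$, $\rho_v = (\rho_{v1},\ldots,\rho_{vn})^T$, $\hat{\rho}_v = (\hat{\rho}_{v1},\ldots,\hat{\rho}_{vn})^T$, $\tilde{\rho}_v = \hat{\rho}_v - \rho_v$. Consider the following Lyapunov function candidate

$$V = \frac{1}{2}E^T E + \frac{1}{2}\tilde{\rho}_w^T R_w^{-1}\tilde{\rho}_w + \frac{1}{2}\tilde{\rho}_v^T R_v^{-1}\tilde{\rho}_v. \tag{17}$$

The time derivative of $V$ along the trajectory of (16) is given by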
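$$\dot{V} = E^T\left(-KE - \tilde{W}^T H + M - B\right) + (bN + 1)\dot{\omega} + \tilde{\rho}_w^T R_w^{-1}\dot{\hat{\rho}}_w + \tilde{\rho}_v^T R_v^{-1}\dot{\hat{\rho}}_v.$$

By completing the squares

$$\begin{aligned}
\tilde{\rho}_{wi}\left(\hat{\rho}_{wi} - \rho_{wi}^0\right) &= \frac{1}{2}\left(\hat{\rho}_{wi} - \rho_{wi}\right)^2 + \frac{1}{2}\left(\hat{\rho}_{wi} - \rho_{wi}^0\right)^2 - \frac{1}{2}\left(\rho_{wi} - \rho_{wi}^0\right)^2, \\
\tilde{\rho}_{vi}\left(\hat{\rho}_{vi} - \rho_{vi}^0\right) &= \frac{1}{2}\left(\hat{\rho}_{vi} - \rho_{vi}\right)^2 + \frac{1}{2}\left(\hat{\rho}_{vi} - \rho_{vi}^0\right)^2 - \frac{1}{2}\left(\rho_{vi} - \rho_{vi}^0\right)^2,
\end{aligned}$$

we obtain

$$\dot{V} \le -E^T K E - \sum_{i=1}^{n}\left(\frac{\sigma_{wi}}{2r_{wi}}\tilde{\rho}_{wi}^2 + \frac{\sigma_{vi}}{2r_{vi}}\tilde{\rho}_{vi}^2\right) + (bN(\omega)+1)\dot{\omega} + \varsigma,$$

where

$$\varsigma = \sum_{i=1}^{n}\left(\rho_{wi}\varepsilon_{wi} + \rho_{vi}\varepsilon_{vi} + \frac{\sigma_{wi}}{2r_{wi}}\left(\rho_{wi} - \rho_{wi}^0\right)^2 + \frac{\sigma_{vi}}{2r_{vi}}\left(\rho_{vi} - \rho_{vi}^0\right)^2\right).$$

Thus

$$\dot{V} \le -\lambda V + \varsigma + (bN(\omega)+1)\dot{\omega}, \tag{18}$$

where $\lambda = \min\{2k_1,\ldots,2k_n, \sigma_{w1},\ldots,\sigma_{wn}, \sigma_{v1},\ldots,\sigma_{vn}\}$. Solving the inequality (18) yields

$$0 \le V(t) \le \frac{\varsigma}{\lambda} + e^{-\lambda t}V(0) + \int_0^t (bN(\omega)+1)\,\dot{\omega}\,e^{-\lambda(t-\tau)}\,d\tau, \quad \forall t \ge 0. \tag{19}$$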
We first show that $\omega(t)$ is bounded on $[0, t_f)$ by seeking a contradiction. Define

$$P(\omega(0), \omega(t)) = \int_0^t (bN(\omega)+1)\,e^{-\lambda(t-\tau)}\,d\omega(\tau), \quad \forall t \in [0, t_f).$$

Two cases need to be considered: 1) $\omega(t)$ has no upper bound and 2) $\omega(t)$ has no lower bound.

Case 1: Suppose that $\omega(t)$ has no upper bound on $[0, t_f)$, i.e., there exists a monotone increasing sequence $\{\omega(t_i)\}$ such that $\lim_{i\to\infty}\omega(t_i) = +\infty$ as $\lim_{i\to\infty} t_i = t_f$.

First, we consider the case $b > 0$. Suppose that $4M+1 > \omega(0)$, where $M$ is an integer. We have

$$P(\omega(0), 4M+1) = \int_0^{t_1}(bN(\omega)+1)\,e^{-\lambda(t_1-\tau)}\,d\omega(\tau) \le \left(b\,e^{(4M+1)^2} + 1\right)\left(4M+1-\omega(0)\right). \tag{20}$$

Noting that $N(\omega)$ is negative on the intervals $[4M+1, 4M+3]$, we have

$$\begin{aligned}
P(4M+1, 4M+3) &= \int_{t_1}^{t_2}(bN(\omega)+1)\,e^{-\lambda(t_2-\tau)}\,d\omega(\tau) \\
&\le b\,e^{-\lambda(t_2-t_1)}\int_{4M+1.5}^{4M+2.5}N(\sigma)\,d\sigma + 2 \\
&\le -\frac{\sqrt{2}}{2}\,b\,e^{-\lambda(t_2-t_1)+(4M+1.5)^2} + 2.
\end{aligned} \tag{21}$$

According to (20), (21), we have

$$P(\omega(0), 4M+3) \le e^{(4M+1)^2}\left(-\frac{\sqrt{2}}{2}\,b\,e^{-\lambda(t_2-t_1)+4M+1.25} + b\left(4M+1-\omega(0)\right) + \left(4M+3-\omega(0)\right)e^{-(4M+1)^2}\right).$$
Hence, $P(\omega(0), 4M+3) \to -\infty$ as $M \to \infty$, which yields a contradiction in (19).

Then, we consider the case $b < 0$. Suppose that $4M-1 > \omega(0)$, then

$$P(\omega(0), 4M-1) = \int_0^{t_1}(bN(\omega)+1)\,e^{-\lambda(t_1-\tau)}\,d\omega(\tau) \le \left(|b|\,e^{(4M-1)^2} + 1\right)\left(4M-1-\omega(0)\right). \tag{22}$$

Noting that $N(\omega)$ is positive on the intervals $[4M-1, 4M+1]$, we have

$$\begin{aligned}
P(4M-1, 4M+1) &= \int_{t_1}^{t_2}(bN(\omega)+1)\,e^{-\lambda(t_2-\tau)}\,d\omega(\tau) \\
&\le b\,e^{-\lambda(t_2-t_1)}\int_{4M-0.5}^{4M+0.5}N(\sigma)\,d\sigma + 2 \\
&\le \frac{\sqrt{2}}{2}\,b\,e^{-\lambda(t_2-t_1)+(4M-0.5)^2} + 2.
\end{aligned} \tag{23}$$

According to (22), (23), we have

$$P(\omega(0), 4M+1) \le e^{(4M-1)^2}\left(\frac{\sqrt{2}}{2}\,b\,e^{-\lambda(t_2-t_1)+4M-0.75} + |b|\left(4M-1-\omega(0)\right) + \left(4M+1-\omega(0)\right)e^{-(4M-1)^2}\right).$$

Hence, $P(\omega(0), 4M+1) \to -\infty$ as $M \to \infty$, which yields a contradiction in (19).
Thus, we conclude that $\omega(t)$ is upper bounded on $[0, t_f)$.

Case 2: Employing a method similar to that of Case 1, we can prove that $\omega(t)$ is lower bounded on $[0, t_f)$.

Therefore, $\omega(t)$ must be bounded on $[0, t_f)$. As an immediate result, $V(t)$ is bounded on $[0, t_f)$. All the signals in the closed-loop system are bounded, so $t_f \to \infty$. This proves that all the signals of the closed-loop system are UUB, i.e., for any compact set $\Omega_n \subset R^n$, there exists a controller such that, as long as $x(0) \in \Omega_n$, $e_i$, $\hat{\rho}_{wi}$, and $\hat{\rho}_{vi}$ are UUB. Correspondingly, the tracking error $e_1(t)$ satisfies

$$|e_1(t)| \le \sqrt{2\left(\frac{\varsigma}{\lambda} + \int_0^t (bN(\omega)+1)\,\dot{\omega}\,e^{-\lambda(t-\tau)}\,d\tau + e^{-\lambda t}V(0)\right)}.$$

Thus, by appropriately adjusting the design parameters, the tracking error $e_1(t)$ can be kept as small as possible.
Remark 1. For the low order system with mismatched uncertainties and known sign of the control gain, we present a new RAFTC algorithm. To avoid possible divergence of the on-line tuning of the FLS, a new adaptive law is proposed to make sure that the FLS parameter vectors are tuned within a prescribed range. By a special design technique, the controller singularity problem is avoided.

Remark 2. For the high order system with mismatched uncertainties and unknown sign of the control gain, we present a new RFTC algorithm. The main feature of the algorithm is the adaptive mechanism with minimal learning parameters. The online computation burden is kept to a minimum. By employing the Nussbaum gain design technique, the controller singularity problem is avoided completely.

Remark 3. Both algorithms are well suited to practical implementation. The controllers and the parameter adaptive laws are highly structural. Such a property is particularly suitable for parallel processing and hardware implementation in practical applications.
5 Conclusions

In this note, the tracking control problem has been considered for a class of nonlinear systems with mismatched uncertainties, transformable to the strict-feedback form. By combining the backstepping design technique with fuzzy set theory, RAFTC for the low order system with known sign of the control gain and RFTC for the high order system with unknown sign of the control gain are proposed. Both algorithms completely avoid the controller singularity problem. The proposed controllers are highly structural and particularly suitable for parallel processing in practical applications. Furthermore, the RFTC algorithm has an adaptive mechanism with minimal learning parameterizations. It is shown that the proposed algorithms can guarantee the semi-global uniform ultimate boundedness of all the signals in the closed-loop system. In addition, the tracking error can be reduced to arbitrarily small values by suitably choosing the design parameters.
References
1. Lee, H., Tomizuka, M.: Robust Adaptive Control Using a Universal Approximator for SISO Nonlinear Systems. IEEE Trans. Fuzzy Syst. 8 (2000) 95-106
2. Jagannathan, S., Lewis, F.L.: Robust Backstepping Control of a Class of Nonlinear Systems Using Fuzzy Logic. Inform. Sci. 123 (2000) 223-240
3. Wang, W.Y., Chan, M.L., Lee, T.T., Liu, C.H.: Adaptive Fuzzy Control for Strict-Feedback Canonical Nonlinear Systems with H∞ Tracking Performance. IEEE Trans. Syst. Man, Cybern. 30 (2000) 878-885
4. Yang, Y.S., Feng, G., Ren, J.: A Combined Backstepping and Small-Gain Approach to Robust Adaptive Fuzzy Control for Strict-Feedback Nonlinear Systems. IEEE Trans. Syst. Man, Cybern. 34 (2004) 406-420
5. Polycarpou, M.M., Ioannou, P.A.: A Robust Adaptive Nonlinear Control Design. Automatica. 32 (1996) 423-427
6. Nussbaum, D.R.: Some Remarks on a Conjecture in Parameter Adaptive Control. Syst. Contr. Lett. 3 (1983) 243-246
7. Mudgett, D.R., Morse, A.S.: Adaptive Stabilization of Linear Systems with Unknown High Frequency Gains. IEEE Trans. Automat. Contr. 30 (1985) 549-554
8. Ge, S.S., Wang, J.: Robust Adaptive Neural Control for a Class of Perturbed Strict Feedback Nonlinear Systems. IEEE Trans. Neural Networks. 13 (2002) 1409-1419
9. Ding, Z.T.: Adaptive Control of Nonlinear System with Unknown Virtual Control Coefficients. Int. J. Adapt. Control Signal Process. 14 (2000) 505-517
Intelligent Fuzzy Systems for Aircraft Landing Control Jih-Gau Juang1, Bo-Shian Lin1, and Kuo-Chih Chin2 1 Department of Communication and Guidance Engineering, National Taiwan Ocean University, Keelung 20224, Taiwan [email protected] [email protected] 2 ASUSTeK Computer Inc, Taipei 101, Taiwan [email protected]
Abstract. The purpose of this paper is to investigate the use of evolutionary fuzzy neural systems to aircraft automatic landing control and to make the automatic landing system more intelligent. Three intelligent aircraft automatic landing controllers are presented that use fuzzy-neural controller with BPTT algorithm, hybrid fuzzy-neural controller with adaptive control gains, and fuzzy-neural controller with particle swarm optimization, to improve the performance of conventional automatic landing system. Current flight control law is adopted in the intelligent controller design. Tracking performance and adaptive capability are demonstrated through software simulations.
1 Introduction In a flight, take-off and landing are the most difficult operations in regard to safety issues. The automatic landing system of an airplane is enabled only under limited conditions. If severe wind disturbances are encountered, the pilot must handle the aircraft due to the limits of the automatic landing system. The first Automatic Landing System (ALS) was made in 1965. Since then, most aircraft have been installed with this system. The ALS relies on the Instrument Landing System (ILS) to guide the aircraft into the proper altitude, position, and approach angle during the landing phase. Conventional automatic landing systems can provide a smooth landing which is essential to the comfort of passengers. However, these systems work only within a specified operational safety envelope. When the conditions are beyond the envelope, such as turbulence or wind shear, they often cannot be used. Most conventional control laws generated by the ALS are based on the gain scheduling method [1]. Control parameters are preset for different flight conditions within a specified safety envelope which is relatively defined by Federal Aviation Administration (FAA) regulations. According to FAA regulations, environmental conditions considered in the determination of dispersion limits are: headwinds up to 25 knots; tailwinds up to 10 knots; crosswinds up to 15 knots; moderate turbulence, wind shear of 8 knots per 100 feet from 200 feet to touchdown [2]. If the flight conditions are beyond the preset envelope, the ALS is disabled and the pilot takes over. An inexperienced pilot may not be able to guide the aircraft to a safe landing at airport. L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 851 – 860, 2005. © Springer-Verlag Berlin Heidelberg 2005
China Airlines Flight 642 had a hard landing at Hong Kong International Airport on 22 August 1999. The wing was broken during the impact, which killed 3 passengers and injured 211 people. After a 15-month investigation, a crash report was released on November 30, 2000. It showed that the crosswind-correction software 907 on the Boeing MD-11 had a defect. Boeing later confirmed this software problem and replaced the crosswind-correction software on nearly 190 MD-11s with the 908 version. According to Boeing's report [3], 67% of accidents by primary cause are due to human factors and 5% are attributed to weather factors. By phase of flight, 47% of accidents occur during final approach or landing. It is therefore desirable to develop an intelligent ALS that expands the operational envelope to include safer responses under a wider range of conditions. The goal of this study is for the proposed intelligent automatic landing controllers to relieve human operators and guide the aircraft to a safe landing in a wind disturbance environment. In this study, robustness of the proposed controller is obtained by choosing optimal control gain parameters that allow the controller to tolerate a wide range of disturbances. In 1995, Kennedy and Eberhart presented a new evolutionary computation algorithm, the real-coded Particle Swarm Optimization (PSO) [4]. PSO is one of the latest population-based optimization methods; it does not use filtering operations (such as crossover and mutation), and the members of the entire population are maintained throughout the search procedure. This method was developed through the simulation of a social system and has been found to be robust in solving continuous nonlinear optimization problems [5-7]; it is suitable for determining the control parameters that give the aircraft better adaptive capability in severe environments.
Recently, some researchers have applied intelligent concepts such as neural networks and fuzzy systems to landing control to increase the flight controller's adaptivity to different environments [8-12]. Most of them do not consider the robustness of the controller to wind disturbances [8-10]. In [11], a PD-type fuzzy control system is developed for automatic landing control of both a linear and a nonlinear aircraft model. Adaptive control for a wide range of initial conditions has been demonstrated successfully. The drawback is that the authors only set up the wind disturbance at the initial condition; persistent wind disturbance is not considered. In [12], wind disturbances are included but the neural controller is trained for a specific wind speed. Robustness for a wide range of wind speeds has not been considered. In previous works [13-14], we have applied neural networks to automatic landing control. Environment adaptive capability was improved, but the rate of convergence is very slow. Here, we present three learning schemes, a fuzzy-neural controller with the BPTT algorithm, a fuzzy-neural controller with adaptive control gains, and a fuzzy-neural controller with the PSO algorithm, to guide the aircraft to a safe landing and make the controller more robust and adaptive to the ever-changing environment.
2 Aircraft Landing System

The pilot descends from the cruise altitude to an altitude of approximately 1200 ft above the ground. The pilot then positions the airplane so that it is on a heading towards the runway centerline. When the aircraft approaches the outer airport marker, which is about 4 nautical miles from the runway, the glide path signal is intercepted (as shown in Fig. 1). As the airplane descends along the glide path, its pitch, attitude and speed must be controlled. The aircraft maintains a constant speed along the flight path. The descent rate is about 10 ft/sec and the pitch angle is between -5 and +5 degrees. Finally, as the airplane descends to 20 to 70 feet above the ground, the glide path control system is disengaged and a flare maneuver is executed. The vertical descent rate is decreased to 2 ft/sec so that the landing gear is able to dissipate the energy of the impact at landing. The pitch angle of the airplane is then adjusted, between 0 and 5 degrees for most aircraft, which allows a soft touchdown on the runway surface.
Fig. 1. Glide path and flare path (altitude versus runway position: glide path from 1200 ft, flare path from about 50 ft to touchdown at 0 ft)
A simplified model of a commercial aircraft that moves only in the longitudinal and vertical plane is used in the simulations for ease of implementation [12]. To make the ALS more intelligent, reliable wind profiles are necessary. Two spectral turbulence forms, modeled by von Karman and Dryden, are most commonly used for aircraft response studies. In this study the Dryden form [12] was used for ease of demonstration. Figure 2 shows a turbulence profile with a wind speed of 30 ft/sec at 510 ft altitude.
Fig. 2. Turbulence profile (wind gust velocity components in ft/sec versus time in sec: longitudinal, solid; vertical, dashed)
3 Landing Controller Design
In this study, the aircraft maintains a constant speed along the flight path, so we assume that the change in throttle command is zero. The aircraft is thus controlled solely by the pitch command. Detailed descriptions can be found in [12].
3.1 Fuzzy-Neural Controller with BPTT Algorithm
In this section, we first design and analyze the performance of the fuzzy-neural controller for auto-landing in severe wind conditions. The learning process is shown in Fig. 3, where AM is the aircraft model, FNNC is the fuzzy modeling neural network controller, and LIAM is the linearized inverse aircraft model [13]. Every learning cycle consists of all stages from $S_0$ to $S_k$. Weight changes in the FNNC are updated in batch mode. The controller is trained by the backpropagation through time (BPTT) algorithm. The inputs of the fuzzy-neural controller are altitude, altitude command, altitude rate, and altitude rate command. The output of the controller is the pitch command. The detailed structure of the fuzzy modeling network can be found in [14-15]. The fuzzy-neural controller starts learning without any control rule. The LIAM calculates the error signals that are back-propagated through the controller in each stage [13].
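Fig. 3. Learning Process (stages $S_0, S_1, \ldots, S_k$ linked by controller blocks $C_0, \ldots, C_{k-1}$, with error signals $e_1, \ldots, e_k$, weight changes $\Delta C_0, \ldots, \Delta C_{k-1}$, and desired state $S_d$)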
In the simulations, successful touchdown landing conditions are defined as follows:
$$-3 \le \dot{h}(T) \le 0 \ \text{ft/sec}, \qquad 200 \le \dot{x}(T) \le 270 \ \text{ft/sec},$$
$$-300 \le x(T) \le 1000 \ \text{ft}, \qquad -1 \le \theta(T) \le 5 \ \text{degrees},$$
where $T$ is the time at touchdown. Initial flight conditions are: $h(0) = 500$ ft, $\dot{x}(0) = 235$ ft/sec, $x(0) = 9240$ ft, and $\gamma_0 = -3$ degrees. With the wind turbulence speed at 30 ft/sec, the horizontal position at touchdown is 418.5 ft, horizontal velocity is 234.7 ft/sec, vertical speed is -2.7 ft/sec, and pitch angle is -0.08 degrees, as shown in Fig. 4 to Fig. 6. Table 1 shows the results for different wind turbulence speeds. The controller can successfully guide the aircraft through wind speeds of 0 ft/sec to 45 ft/sec, while the conventional controller can only reach 30 ft/sec [12].
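The touchdown criteria above amount to four interval checks. The following is a minimal sketch of such a check, applied to the touchdown values reported for the 30 ft/sec case; the class name `Touchdown` and the function `is_safe_landing` are hypothetical names introduced only for illustration and are not part of the original simulation code.

```python
from dataclasses import dataclass


@dataclass
class Touchdown:
    """State at touchdown time T (illustrative container, not from the paper)."""
    x: float       # horizontal position, ft
    x_dot: float   # horizontal velocity, ft/sec
    h_dot: float   # vertical speed, ft/sec (negative means descending)
    theta: float   # pitch angle, degrees


def is_safe_landing(td: Touchdown) -> bool:
    """Return True when all four touchdown conditions stated in the text hold."""
    return (
        -3.0 <= td.h_dot <= 0.0
        and 200.0 <= td.x_dot <= 270.0
        and -300.0 <= td.x <= 1000.0
        and -1.0 <= td.theta <= 5.0
    )


# Values reported in the text for a wind turbulence speed of 30 ft/sec.
reported = Touchdown(x=418.5, x_dot=234.7, h_dot=-2.7, theta=-0.08)
print(is_safe_landing(reported))
```

A descent rate outside the window, for example a vertical speed of -3.5 ft/sec, would be rejected by the same function.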
Fig. 4. Aircraft altitude and command
Fig. 5. Aircraft vertical velocity and command
Fig. 6. Aircraft pitch angle and command
Table 1. The results from using different turbulence strength

Wind speed (ft/sec)   Landing point (ft)   Aircraft vertical speed (ft/sec)   Pitch angle (degree)
10                    580.3                -2.2                               -0.81
20                    541.8                -2.3                               -0.63
30                    418.5                -2.7                               -0.08
40                    457.3                -2.3                               -0.06
45                    247.9                -2.9                                0.35

3.2 Fuzzy-Neural Controller with Adaptive Control Gains
In the previous section, the control gains of the pitch autopilot in the glide-slope phase and the flare phase were fixed, and robustness of the fuzzy-neural controller was achieved by the BPTT training scheme. In this section, a neural network generates adaptive control gains for the pitch autopilot. Different wind disturbances are the inputs of the neural network, as in Fig. 7. The fuzzy-neural controller is trained by BP instead of BPTT. With the wind turbulence speed at 50 ft/sec, the horizontal position at touchdown is 547 ft, horizontal velocity is 235 ft/sec, vertical speed is -2.5 ft/sec, and pitch angle is 0.34 degrees, as shown in Fig. 8 to Fig. 10. Table 2 shows the results for different wind turbulence speeds. The controller can successfully guide the aircraft through wind speeds of 34 ft/sec to 58 ft/sec.
Fig. 7. Learning process – with wind disturbance NN (stages $S_0, \ldots, S_k$ linked by controller blocks $C_0, \ldots, C_{k-1}$, each supplied with the control gains $k_\theta$ and $k_q$)
Fig. 8. Aircraft altitude and command
Fig. 9. Aircraft vertical velocity and command
Fig. 10. Aircraft pitch angle and command

Table 2. The results from using different turbulence strength

Wind speed (ft/sec)   Landing point (ft)   Aircraft vertical speed (ft/sec)   Pitch angle (degree)
34                    427.9                -2.9                               0.15
40                    346.8                -2.9                               0.10
45                    853.5                -2.8                               0.20
50                    547.5                -2.5                               0.34
58                    943.7                -2.8                               0.25
3.3 Fuzzy-Neural Controller with Particle Swarm Optimization
In the PSO algorithm, each member is called “particle”, and each particle flies around in the multi-dimensional search space with a velocity, which is constantly updated by the particle’s own experience and the experience of the particle’s neighbors or the experience of the whole swarm. Each particle keeps track of its coordinates in the problem space, which are associated with the best solution (fitness) it has achieved so far. This value is called pbest. Another best value that is tracked by the global version of the particle swarm optimizer is the overall best value, and its location, obtained so far by any particle in the population. This location is called gbest. At each time step,
the particle swarm optimization concept consists of changing the velocity of each particle toward its pbest and gbest locations. Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward the pbest and gbest locations. The turbulence strength increases progressively during the parameter search. The purpose of this procedure is to search for more suitable control gains $k_\theta$ and $k_q$ in the glide path and flare path. With the wind turbulence speed at 50 ft/sec, the horizontal position at touchdown is 505 ft, horizontal velocity is 235 ft/sec, vertical speed is -2.7 ft/sec, and pitch angle is 0.21 degrees, as shown in Fig. 11 to Fig. 13. Table 3 shows the results for different wind turbulence speeds. The controller can successfully guide the aircraft through wind speeds of 30 ft/sec to 90 ft/sec. With the same wind turbulence speed of 50 ft/sec, Fig. 14 shows the absolute height error using the fuzzy-neural controller with adaptive control gains (dashed line) and the hybrid fuzzy-neural controller with PSO (solid line), respectively. It indicates that the performance of the hybrid fuzzy-neural controller with PSO is much better.
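The pbest/gbest update described above can be sketched in a few lines. The following is a generic global-best PSO, not the authors' implementation: the inertia weight `w`, the acceleration constants `c1` and `c2`, the swarm size, and the quadratic `toy_cost` standing in for a landing-performance measure of the gains $k_\theta$ and $k_q$ are all assumptions made for illustration, and the progressive increase of turbulence strength during the search is not modeled.

```python
import random


def pso(fitness, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box `bounds` with a global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Separate random numbers weight the pull toward pbest and gbest.
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cost(gains):
    """Hypothetical stand-in cost with its minimum at k_theta = 2, k_q = 1."""
    k_theta, k_q = gains
    return (k_theta - 2.0) ** 2 + (k_q - 1.0) ** 2


best_gains, best_cost = pso(toy_cost, bounds=[(0.0, 10.0), (0.0, 10.0)])
print(best_gains, best_cost)
```

In a landing study, `toy_cost` would be replaced by a simulation run that scores the touchdown achieved with the candidate gains.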
Fig. 11. Aircraft altitude and command
Fig. 12. Aircraft vertical velocity and command
Fig. 13. Aircraft pitch angle and command

Table 3. The results from using different turbulence strength

Wind speed (ft/sec)   Landing point (ft)   Aircraft vertical speed (ft/sec)   Pitch angle (degree)
30                    467.2                -2.4                               -0.18
40                    403.7                -2.5                                0.06
50                    505.3                -2.7                                0.21
70                    543.0                -2.5                                0.91
90                    771.4                -2.8                                1.31
Fig. 14. Performance of Fuzzy-Neural Controller
4 Conclusions
For the safe landing of an aircraft with a conventional controller, the wind speed limit of turbulence is 30 ft/sec. In this study, control gains are selected by a combination of a nonlinear control design, a neural network, and particle swarm optimization. Comparisons of different control schemes are given. The hybrid fuzzy-neural controller with adaptive control gains can overcome turbulence up to 58 ft/sec. The fuzzy-neural controller with BPTT and the fuzzy-neural controller with the PSO algorithm can reach 45 ft/sec and 90 ft/sec, respectively. The fuzzy-neural controller with the PSO algorithm has the best performance, and the convergence rate is also improved. The purpose of this paper has been achieved.
Acknowledgement
This work was supported by the National Science Council, Taiwan, ROC, under Grant NSC 93-2213-E-019-007.
References
1. Buschek, H., Calise, A.J.: Uncertainty Modeling and Fixed-Order Controller Design for a Hypersonic Vehicle Model. Journal of Guidance, Control, and Dynamics. 20 (1997) 42-48
2. Federal Aviation Administration: Automatic Landing Systems. AC 20-57A (1971)
3. Boeing Publication: Statistical Summary of Commercial Jet Airplane Accidents. Worldwide Operations 1959-1999. (2000)
4. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. Proceedings of IEEE International Conference on Neural Networks. 4 (1995) 1942-1948
5. Shi, Y., Eberhart, R.C.: Empirical Study of Particle Swarm Optimization. Proceedings of the 1999 Congress on Evolutionary Computation. (1999) 1945-1950
6. Angeline, P.J.: Using Selection to Improve Particle Swarm Optimization. Proceedings of IEEE International Conference on Evolutionary Computation. (1998) 84-89
7. Zheng, Y.L., Ma, L., Zhang, L., Qian, J.: On the Convergence Analysis and Parameter Selection in Particle Swarm Optimization. Proceedings of the Second IEEE International Conference on Machine Learning and Cybernetics. (2003) 1802-1807
8. Izadi, H., Pakmehr, M., Sadati, N.: Optimal Neuro-Controller in Longitudinal Autolanding of a Commercial Jet Transport. Proceedings of IEEE International Conference on Control Applications. (2003) 1-6
9. Chaturvedi, D.K., Chauhan, R., Kalra, P.K.: Application of Generalized Neural Network for Aircraft Landing Control System. Soft Computing. 6 (2002) 441-118
10. Ionita, S., Sofron, E.: The Fuzzy Model for Aircraft Landing Control. Proceedings of AFSS International Conference on Fuzzy Systems. (2002) 47-54
11. Nho, K., Agarwal, R.K.: Automatic Landing System Design Using Fuzzy Logic. Journal of Guidance, Control, and Dynamics. 23 (2000) 298-304
12. Jorgensen, C.C., Schley, C.: A Neural Network Baseline Problem for Control of Aircraft Flare and Touchdown. Neural Networks for Control. (1991) 403-425
13. Juang, J.G., Chang, H.H., Chang, W.B.: Intelligent Automatic Landing System Using Time Delay Neural Network Controller. Applied Artificial Intelligence. 17 (2003) 563-581
14. Juang, J.G.: Fuzzy Neural Networks Approaches for Robotic Gait Synthesis. IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics. 30 (2000) 594-601
15. Horikawa, S., Furuhashi, T., Uchikawa, Y.: On Fuzzy Modeling Using Fuzzy Neural Networks with the Back-Propagation Algorithm. IEEE Transactions on Neural Networks. 3 (1992) 801-806
Scheduling Design of Controllers with Fuzzy Deadline*
Hong Jin, Hongan Wang, Hui Wang, and Danli Wang
Institute of Software, Chinese Academy of Sciences, Beijing 100080
{hjin, wha, hui.wang, dlwang}@iel.iscas.ac.cn
Abstract. Because some timing constraints of a controller task may not be as fully determined as a real-time system engineer expects, the scheduling of such a task with uncertain attributes usually cannot be handled simply by the classic methods used in real-time systems. The model of a controller task with fuzzy deadline and its scheduling are studied. The concept of dedication and the scheduling policy of largest dedication first are proposed. Simulation shows that the scheduling of controller tasks with fuzzy deadline can be implemented by using the proposed method, while the control performance cost is guaranteed.
1 Introduction
Existing academic research on the co-design of control and scheduling assumes that all timing constraints (e.g., sampling period, deadline, etc.) are known [1][2][3][4]. However, some timing constraints of a control loop usually depend on the requirements for controlled process dynamics and loop performance; e.g., an imprecise clock, overload, or a computer fault can each lead to a relative change in the sampling period. Moreover, computer hardware, the control algorithm, the real-time operating system, the scheduling algorithm, and network delay can all cause delay and jitter in the control system [3]. Controller tasks with fuzzy deadline commonly appear in control problems, although a control engineer does not care about how a controller task is implemented in computer-controlled systems. Linguistic terms can be used to describe timing constraints, e.g., a response time of less than 0.1 seconds and a sampling period of 2 seconds. The latter means that 2±0.1% seconds can be considered correct and acceptable. Another natural interpretation is that the deadline is allowed to vary slightly around the sampling period, which is precise [5], and the small variation can be considered random, uncertain, or fuzzy. So, it is important and meaningful to study the scheduling of control tasks with uncertain/fuzzy attributes in computer-controlled systems. However, existing research on the scheduling of tasks with uncertain/fuzzy attributes has considered neither control tasks as scheduling objects nor the performance cost of a control system [5][6]. How to use limited computing resources to schedule controller tasks with fuzzy deadline, under the precondition of assuring control performance, is the problem addressed in this paper. Time-driven sampling is considered. The proposed scheduling algorithm of largest dedication first is introduced in Section 3.

* This work is supported by China NSF under Grant No. 60374058, 60373055 and 60373056.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 861 – 864, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Controller Model
2.1 Controller Task
In computer-controlled systems [1][3], the difference between the reference input r(t) and the system output y(t) is used as the input of a controller, which recalculates the manipulated variable used as the input of the controlled system G(s) every P seconds. The control aim is to make y(t) approach r(t) as quickly as possible. The state update of the manipulated variable (Update State) and the calculation of the system output (Calculate Output) make up the closed loop of the whole system. In the co-design of control and scheduling, the simplest model assumption about a controller task, $T_i$, is that $T_i$ is periodic and has a fixed sampling period $P_i$, a known worst-case execution time, and a hard deadline $d_i$. Its release/arrival time is assumed to be zero in this paper. Moreover, the execution time of $T_i$ is denoted as $C_i$. In the following discussion, the deadline $d_i$ is assumed to be fuzzy uncertain.
2.2 Fuzzy Deadline
For a fuzzy description of the deadline, which can take any value in some interval with a probability, there are many continuous fuzzy numbers to choose from, e.g., trapezoid/triangle or truncated normal fuzzy numbers. The continuous trapezoid deadline is used here [6]. Let $d_i \sim \mathrm{Trapezoid}(a_i, e_i, f_i, b_i)$, where $a_i \le e_i \le f_i \le b_i \le P_i$. Let $\mu_i(t)$ be the $[a_i, b_i]$-cut trapezoid membership function of $d_i$; then $\mu_i(t)$ is equal to $(t - a_i) h_i / (e_i - a_i)$ for $a_i$ ... 0 for all $t$. Therefore $h_i(z(t)) \ge 0$, $i = 1, 2, \ldots, r$, and
$\sum_{i=1}^{r} h_i(z(t)) = 1$ for all $t$. Then the PDC scheme is used to design the fuzzy controller that stabilizes the fuzzy system (2). The idea of the PDC scheme is to design a compensator for each rule of the T-S fuzzy model. The resulting overall fuzzy controller, which is nonlinear in general, is a fuzzy blending of the individual linear controllers. The fuzzy controller shares the same fuzzy sets with the fuzzy system (2), so the $i$th control rule for the fuzzy system (2) is described as follows:
Rule $i$: IF $z_1(t)$ is $M_{i1}$ and … and $z_p(t)$ is $M_{ip}$ THEN $u(t) = -F_i x(t)$, $i = 1, 2, \ldots, r$,   (4)
where $F_i$ is the local state feedback gain matrix. The overall state feedback fuzzy control law is represented by
$$u(t) = -\frac{\sum_{i=1}^{r} w_i(z(t)) F_i x(t)}{\sum_{i=1}^{r} w_i(z(t))} = -\sum_{i=1}^{r} h_i(z(t)) F_i x(t) \qquad (5)$$
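The blending in (5) can be illustrated numerically. The following is a minimal sketch for a single-input system, assuming the rule firing strengths $w_i$ have already been evaluated; the function name `pdc_control` and the example numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pdc_control(weights, gains, x):
    """Blend local state-feedback laws u = -F_i x with normalized weights h_i.

    weights: firing strengths w_i(z(t)), one per rule (non-negative)
    gains:   local gain row vectors F_i, one per rule
    x:       current state vector
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one rule must fire")
    h = [w / total for w in weights]          # normalized weights, sum to 1
    u = 0.0
    for h_i, F_i in zip(h, gains):
        u -= h_i * sum(f * xj for f, xj in zip(F_i, x))
    return h, u


# Illustrative two-rule example (values are not taken from the paper).
weights = [0.75, 0.25]
gains = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
h, u = pdc_control(weights, gains, x)
print(h, u)
```

Because the weights are normalized before blending, the result is the same whether the rules report raw firing strengths or already-normalized memberships.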
Now, substituting (5) into (3), we can obtain the following closed-loop system:
$$\dot{x}(t) = \sum_{i=1}^{r} h_i^2(z(t))\, G_{ii}\, x(t) + \sum_{i,j=1,\ i<j}^{r} h_i(z(t))\, h_j(z(t))\, (G_{ij} + G_{ji})\, x(t) \qquad (6)$$
where $G_{ij} = A_i - B_i F_j$, $i, j = 1, 2, \ldots, r$ and $i < j \le r$. The design of the state feedback fuzzy controller is to determine the local feedback gains $F_i$ such that the above closed-loop fuzzy system is asymptotically stable. Based on stability theory, the equilibrium of the fuzzy system described by (6) is asymptotically stable in the large if there exists a common symmetric positive definite matrix $P$ such that the following matrix inequalities are satisfied:
$$P > 0 \qquad (7)$$
$$G_{ii}^T P + P G_{ii} < 0 \qquad (8)$$
$$(G_{ij} + G_{ji})^T P + P (G_{ij} + G_{ji}) < 0 \qquad (9)$$
where $G_{ij} = A_i - B_i F_j$, $i, j = 1, 2, \ldots, r$ and $i < j \le r$.
F. Liu and J. Chen
According to the Schur complement, it is easy to find that the matrix inequalities (7), (8) and (9) are equivalent to the following LMIs [3], respectively:
$$X > 0 \qquad (10)$$
$$A_i X + X A_i^T - B_i Y_i - Y_i^T B_i^T < 0 \qquad (11)$$
$$A_i X + X A_i^T + A_j X + X A_j^T - B_i Y_j - Y_j^T B_i^T - B_j Y_i - Y_i^T B_j^T < 0 \qquad (12)$$
where $X = P^{-1}$, $Y_i = F_i X$, $i, j = 1, 2, \ldots, r$ and $i < j \le r$.
The above conditions are LMIs with respect to the variables $X$ and $Y_i$. This feasibility problem can be solved very efficiently by means of the recently developed interior-point methods. Furthermore, if matrices $X$ and $Y_i$ exist that satisfy these inequalities, then the feedback gains are given by $F_i = Y_i X^{-1}$.
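The step from (8) to (11) can be written out explicitly. The following is a sketch of the standard change of variables, using only the definitions $X = P^{-1}$ and $Y_i = F_i X$ given above; it multiplies (8) on both sides by $X$ and is offered as an illustration rather than as the authors' own derivation.

```latex
\begin{align*}
G_{ii}^{T} P + P G_{ii} < 0
  &\iff X \left( G_{ii}^{T} P + P G_{ii} \right) X < 0
     && (X = P^{-1} > 0,\ X = X^{T}) \\
  &\iff X G_{ii}^{T} + G_{ii} X < 0 \\
  &\iff X (A_i - B_i F_i)^{T} + (A_i - B_i F_i) X < 0 \\
  &\iff A_i X + X A_i^{T} - B_i Y_i - Y_i^{T} B_i^{T} < 0
     && (Y_i = F_i X).
\end{align*}
```

Applying the same multiplication to (9), with $G_{ij} + G_{ji} = A_i + A_j - B_i F_j - B_j F_i$, gives (12) term by term.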
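3 Pipeline System Control
Consider the following series arrangement of two identical gravity-flow tanks equipped with outlet pipes (see Fig. 1). The dynamic equations of this system are described as follows:
$$\begin{aligned}
\dot{x}_1 &= \frac{A_p g}{L} x_2 - \frac{K_f}{\rho A_p^2} x_1^2 \\
\dot{x}_2 &= \frac{1}{A_t} \left( F_{C\max}\, \alpha^{-(1-u)} - x_1 \right) \\
\dot{x}_3 &= \frac{A_p g}{L} x_4 - \frac{K_f}{\rho A_p^2} x_3^2 \\
\dot{x}_4 &= \frac{1}{A_t} (x_1 - x_3)
\end{aligned} \qquad (13)$$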
where $x_1$ and $x_3$ are the volumetric flow rates of liquid leaving the tanks via the pipes, and $x_2$ and $x_4$ are the heights of the liquid in the tanks, respectively. $F_{C\max} = 2$ m³/s is the maximum value of the volumetric rate of fluid entering the first tank, $g = 9.81$ m/s² is the gravitational acceleration constant, $L = 914$ m is the length of the pipes, $K_f = 4.41$ N s²/m³ is the friction factor, $\rho = 998$ kg/m³ is the density of the liquid, $A_p = 0.653$ m² is the cross-sectional area of the pipes, and $A_t = 10.5$ m² is the cross-sectional area of each of the tanks. The parameter $\alpha = 5$ is the rangeability parameter of the valve and the control input $u$ is the valve position.
Fuzzy Control of Nonlinear Pipeline Systems with Bounds on Output Peak
Fig. 1. A series of two gravity flow tanks
In order to represent the system (13) in the format given by (1), we express the control input term via the following auxiliary variable $v$:
$$v = F_{C\max}\, \alpha^{-(1-u)} \qquad (14)$$
Then the system (13) can be transformed into the following form:
$$\dot{x} = f(x) + g(x) v \qquad (15)$$
where
$$f(x) = \begin{bmatrix} \dfrac{A_p g}{L} x_2 - \dfrac{K_f}{\rho A_p^2} x_1^2 \\[2mm] -\dfrac{1}{A_t} x_1 \\[2mm] \dfrac{A_p g}{L} x_4 - \dfrac{K_f}{\rho A_p^2} x_3^2 \\[2mm] \dfrac{1}{A_t} (x_1 - x_3) \end{bmatrix}, \qquad g(x) = \begin{bmatrix} 0 \\ \dfrac{1}{A_t} \\ 0 \\ 0 \end{bmatrix}.$$
Using the linearization method introduced in [5], the system (15) can be represented by the following T-S fuzzy model. This linearization method overcomes the disadvantage of the Taylor series approach and adopts a local linearization idea to provide a new linearization method for nonlinear systems.
Rule 1: IF $x_4(t)$ is about 1 THEN $\dot{x}(t) = A_1 x(t) + B_1 v(t)$
Rule 2: IF $x_4(t)$ is about 5 THEN $\dot{x}(t) = A_2 x(t) + B_2 v(t)$
where
$$A_1 = \begin{bmatrix} -0.0153 & 0.0091 & 0.0017 & 0.0021 \\ -0.0952 & 0 & 0 & 0 \\ 0.0017 & 0.0021 & -0.0153 & 0.0091 \\ 0.0952 & 0 & -0.0952 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -0.0370 & 0.0101 & 0.0011 & 0.0031 \\ -0.0952 & 0 & 0 & 0 \\ 0.0011 & 0.0031 & -0.0370 & 0.0101 \\ 0.0952 & 0 & -0.0952 & 0 \end{bmatrix}, \quad B_1 = B_2 = \begin{bmatrix} 0 \\ 0.0952 \\ 0 \\ 0 \end{bmatrix}.$$
Selecting membership functions of the form $h_1(x_4) = 1 - \dfrac{x_4 - 1}{4}$, $h_2(x_4) = 1 + \dfrac{x_4 - 5}{4}$, and solving LMIs (10), (11), (12), a feasible solution is obtained:
$$X = \begin{bmatrix} 1551.5 & -394.8 & 1284.8 & -383.5 \\ -394.8 & 1822.4 & 69.9 & -1028.4 \\ 1284.8 & 69.9 & 1762.9 & 403.4 \\ -383.5 & -1028.4 & 403.4 & 4982.0 \end{bmatrix},$$
$F_1 = [3.9787 \ \ 4.0388 \ \ -4.1664 \ \ 1.4611]$, $F_2 = [4.3154 \ \ 4.1636 \ \ -4.4364 \ \ 1.5346]$.
Then the fuzzy state feedback controller is constructed as follows:
$$u = -(h_1 F_1 + h_2 F_2)(x - x_d) + v_d \qquad (16)$$
where $(x_d, v_d)$ is the desired operating point of the system (15), chosen as $x_d = [1.8389 \ \ 5.0 \ \ 1.8389 \ \ 5.0]^T$, $v_d = 1.8389$. To test the performance of the controller design, the controller (16) is applied to system (15); the simulation results are shown in Fig. 2 for the initial condition $x_0 = [1.3 \ \ 2.5 \ \ 1.3 \ \ 2.5]^T$. From Fig. 2, it can be seen that the designed fuzzy controller globally stabilizes the system (15).
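The operating point can be checked against the tank model directly. The following is a minimal sketch that evaluates the right-hand side of (15) at $(x_d, v_d)$ with the stated physical parameters and evaluates the controller (16); it assumes that (13) reads as a head term minus a quadratic friction term for each pipe and that the membership functions are $h_1 = 1 - (x_4 - 1)/4$ and $h_2 = 1 + (x_4 - 5)/4$. The function names are hypothetical.

```python
# Physical parameters stated in the text (SI units).
A_P, A_T = 0.653, 10.5      # pipe and tank cross-sectional areas, m^2
G, L_PIPE = 9.81, 914.0     # gravity (m/s^2) and pipe length (m)
K_F, RHO = 4.41, 998.0      # friction factor and liquid density

F1 = [3.9787, 4.0388, -4.1664, 1.4611]
F2 = [4.3154, 4.1636, -4.4364, 1.5346]
X_D = [1.8389, 5.0, 1.8389, 5.0]
V_D = 1.8389


def tank_rhs(x, v):
    """Right-hand side f(x) + g(x) v of the two-tank model (assumed reading of (13))."""
    x1, x2, x3, x4 = x
    head = A_P * G / L_PIPE
    fric = K_F / (RHO * A_P ** 2)
    return [
        head * x2 - fric * x1 ** 2,
        (v - x1) / A_T,
        head * x4 - fric * x3 ** 2,
        (x1 - x3) / A_T,
    ]


def memberships(x4):
    """Rule weights for 'x4 about 1' and 'x4 about 5' (assumed linear form)."""
    return 1.0 - (x4 - 1.0) / 4.0, 1.0 + (x4 - 5.0) / 4.0


def fuzzy_control(x):
    """Controller (16): blended state feedback around the operating point."""
    h1, h2 = memberships(x[3])
    err = [xi - xd for xi, xd in zip(x, X_D)]
    blended = [h1 * a + h2 * b for a, b in zip(F1, F2)]
    return -sum(f * e for f, e in zip(blended, err)) + V_D


residual = tank_rhs(X_D, V_D)
print(max(abs(r) for r in residual))          # close to zero at the operating point
print(fuzzy_control([1.3, 2.5, 1.3, 2.5]))    # control value at the initial condition
```

The small residual reflects the four-digit rounding of the reported flow rate 1.8389.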
Fig. 2. Simulation results of pipeline system without output constraints
However, the dynamical response of the liquid height $x_2$ is not satisfactory. To obtain better dynamical performance, output constraints are introduced into the control system. Assume that $z = Cx$ is the controlled output vector, where $x = [x_1 \ x_2 \ x_3 \ x_4]^T$ is the state vector and the matrix $C \in R^{1 \times 4}$ is selected as required. Here, the aim is to limit the liquid height $x_2$, so we choose $C = [0 \ 1 \ 0 \ 0]$, that is, $z = x_2$. Under the known initial condition $x_0$ and the upper limit $\xi > 0$, the system (15) is asymptotically stable in the large and satisfies the condition $\|z\| \le \xi$ if there exists a matrix $X$ such that LMIs (10)-(12) and the following matrix inequalities are satisfied:
$$\begin{bmatrix} X & x_0 \\ x_0^T & \xi \end{bmatrix} > 0 \qquad (17)$$
$$\begin{bmatrix} X & X C^T \\ C X & \xi \end{bmatrix} > 0 \qquad (18)$$
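Why two such inequalities bound the output peak can be sketched as follows. The argument below is a standard Schur-complement reasoning, not taken from the paper; it assumes that (17) and (18) are the block inequalities $\begin{bmatrix} X & x_0 \\ x_0^T & \xi \end{bmatrix} > 0$ and $\begin{bmatrix} X & XC^T \\ CX & \xi \end{bmatrix} > 0$ with $X > 0$, and that $V(x) = x^T X^{-1} x$ does not increase along closed-loop trajectories, which is what the stability conditions guarantee.

```latex
\begin{align*}
\text{(17)} &\iff \xi - x_0^{T} X^{-1} x_0 > 0
   \;\Longrightarrow\; x(t)^{T} X^{-1} x(t) \le x_0^{T} X^{-1} x_0 < \xi , \\
\text{(18)} &\iff \xi - C X C^{T} > 0 , \\
|z(t)|^{2} &= \left| \left( C X^{1/2} \right) \left( X^{-1/2} x(t) \right) \right|^{2}
   \le \left( C X C^{T} \right) \left( x(t)^{T} X^{-1} x(t) \right) < \xi^{2} .
\end{align*}
```

The last line uses the Cauchy-Schwarz inequality. For $C = [0 \ 1 \ 0 \ 0]$, the quantity $C X C^T$ is simply the (2,2) entry of $X$.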
For comparison with Fig. 2, we choose the same initial condition and desired operating point as in Fig. 2. Since this method has a certain conservativeness, several values of $\xi$ can be tried, such as $\xi = 24$. Then, by solving LMIs (10)-(12) and (17), (18), the following feasible solution is obtained:
$$X = \begin{bmatrix} 322.7 & -5.1 & 290.4 & 8.4 \\ -5.1 & 17.4 & -0.6 & -13.9 \\ 290.4 & -0.6 & 370.7 & 119.7 \\ 8.4 & -13.9 & 119.7 & 1008.4 \end{bmatrix},$$
$F_1 = [4.0886 \ \ 75.1448 \ \ -4.3597 \ \ 1.5083]$, $F_2 = [4.1009 \ \ 75.1490 \ \ -4.3691 \ \ 1.5094]$.
Using the above feedback gains $F_1$, $F_2$ to construct a fuzzy controller, the simulation results are depicted in Fig. 3. From Fig. 3, it can be seen that the overshoot of the liquid height $x_2$ is reduced, while the other system response curves are not affected.
Fig. 3. Simulation results of pipeline system with output constraints
4 Conclusions
In this paper, the control problem for a nonlinear pipeline system described by the T-S fuzzy model is studied. It is shown that the nonlinear system can be represented by a set of local models, and the controllers for each local model are designed via the PDC scheme. It is also shown that the stability conditions can be transformed into convex problems in terms of LMIs. To improve the response performance of the system, constraints are introduced on the system output variable. The simulation results demonstrate that the method is effective.
References
1. Tanaka, K., Sugeno, M.: Stability Analysis and Design of Fuzzy Control Systems. Fuzzy Sets and Systems, Vol.45, (1992) 135-156
2. Wang, H.O., Tanaka, K., Griffin, M.F.: Parallel Distributed Compensation of Nonlinear Systems by Takagi-Sugeno Fuzzy Model. Proc. FUZZ-IEEE/IFES'95, (1995) 531-538
3. Boyd, S., El Ghaoui, L., Feron, E., Balakrishnan, V.: Linear Matrix Inequalities in System and Control Theory. Philadelphia, PA: SIAM, (1994)
4. Sira-Ramirez, H., Angulo-Nunez, M.I.: Passivity-Based Control of Nonlinear Chemical Processes. Int. J. Control, Vol.68, (1997) 971-996
5. Teixeira, M.C.M., Zak, S.H.: Stabilizing Controller Design for Uncertain Nonlinear Systems Using Fuzzy Models. IEEE Trans. Fuzzy Syst., Vol.7, (1999) 133-142
Grading Fuzzy Sliding Mode Control in AC Servo System
Hu Qing1, Qingding Guo2, Dongmei Yu1, and Xiying Ding2
1 School of Electrical Engineering, Shenyang University of Technology, Postalcode 110023, No.58 Xinghua South Street, Tiexi District, Shenyang, Liaoning, P.R. China {aqinghu, yu_dm163}@163.com
2 School of Electrical Engineering, Shenyang University of Technology, Postalcode 110023, No.58 Xinghua South Street, Tiexi District, Shenyang, Liaoning, P.R. China {guoqd, dingxy}@sut.edu.cn
Abstract. In this paper, a strategy of grading fuzzy sliding mode control (FSMC) applied to an AC servo system is presented. It combines fuzzy logic with sliding mode control, which can reduce chattering without decreasing the system robustness. At the same time, exponent approaching control is added by grading. The control strategy makes the response of the system quick and free of overshoot. The proposed method is simulated in MATLAB 6.5 to demonstrate its feasibility, and a good control effect is obtained.
1 Introduction
Since the 1990s, the combination of fuzzy logic control and sliding mode control has been widely researched and applied. The most significant aspect is FSMC characterized by illumination [1], [2]. The combination of fuzzy control and sliding mode control can solve the existing problems and keep the system stable effectively. To solve the problem of chattering, this paper presents a method that uses fuzzy reasoning to decide the magnitude of the control. At the same time, the exponent approaching method is adopted in the start-up process of the system, which speeds up the response of the system.
2 System Analysis and Controller Design
2.1 Design of Sliding Mode Controller
The control method is chosen according to the value of $s$: either fuzzy sliding mode control or exponent approaching control is adopted. The speed differential equation of the permanent magnet synchronous motor (PMSM) is as follows [3]:
$$\frac{d\omega_r}{dt} = -\frac{B}{J} \omega_r + \frac{P_n}{J} K_t i_q - \frac{P_n}{J} T_L \qquad (1)$$
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 977 – 980, 2005. © Springer-Verlag Berlin Heidelberg 2005
where $J$, $\omega_r$, $B$, $T_L$, $P_n$ and $i_q$ denote inertia, angular velocity, friction factor, load torque, number of pole pairs and quadrature-axis current, respectively, and $K_t$ is a constant. Suppose $x_1 = \theta_r - \theta$ and $x_2 = \dot{x}_1 = -\omega_r$, where $\theta_r$ is the reference position of the rotor and $\theta$ is the actual position of the rotor. The state equation is
$$\begin{cases} \dot{x}_1 = x_2 \\ \dot{x}_2 = -a x_2 - b u + F(t) \end{cases} \qquad (2)$$
where $a = B_0/J_0$, $b = K_t P_n/J_0$, $F(t) = d_0 T_L + g \cdot d\omega_r/dt + h \omega_r$, and $u = i_q$ is the input. $J_0$ and $B_0$ are the ideal values, $d_0 = P_n/J_0$, $g = \Delta J/P_n$, $h = \Delta B/P_n$, and $\Delta J$, $\Delta B$ are the actual variations of $J$ and $B$. The sliding mode line is defined as $s = c x_1 + x_2$. The following control rules are used:
$$\begin{cases} u = u_1 & \text{if } s \ge s_0 \\ u = u_2 & \text{if } s < s_0 \end{cases} \qquad (3)$$
where $u_1$ is the exponent approaching control and $u_2$ is the fuzzy sliding mode control. Calculation of the exponent control rules: supposing $u = u_1$, $u_1$ can be derived as:
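$$u_1 = \frac{1}{b} \left[ (c - a + k) x_2 + c \cdot k \cdot x_1 + \varepsilon \cdot \mathrm{sign}(s) \right] \qquad (4)$$
where $k$ and $\varepsilon$ are the control constants. Calculation of the FSMC rules: the reaching condition $s \cdot \dot{s} < 0$ must be satisfied. Suppose $u = u_2$; substituting (2) into $s \cdot \dot{s} < 0$, we have
$$\begin{cases} u_2 > \dfrac{1}{b} (c - a) x_2 + \dfrac{1}{b} F & \text{when } s > 0 \\[2mm] u_2 < \dfrac{1}{b} (c - a) x_2 + \dfrac{1}{b} F & \text{when } s < 0 \end{cases} \qquad (5)$$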
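The exponent approaching control (4) can be checked numerically against the reaching law it encodes. The following is a minimal sketch, assuming the disturbance term $F(t)$ is zero and taking the motor and control parameters reported in the simulation section as the nominal values $J_0$, $B_0$; with these assumptions, substituting $u_1$ into (2) should give $\dot{s} = -k s - \varepsilon\, \mathrm{sign}(s)$. The function names are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def approach_control(x1, x2, a, b, c, k, eps):
    """Exponent approaching control u1 of (4) and the sliding variable s."""
    s = c * x1 + x2
    u1 = ((c - a + k) * x2 + c * k * x1 + eps * sign(s)) / b
    return s, u1


# Nominal values (assumed equal to the simulation parameters in the text).
J0, B0, KT, PN = 0.0013, 0.0026, 0.0124, 2
a = B0 / J0
b = KT * PN / J0
c, k, eps = 5.0, 10.0, 0.2

# Initial error state for a position reference of 5 and a rotor at rest.
x1, x2 = 5.0, 0.0
s, u1 = approach_control(x1, x2, a, b, c, k, eps)

# Derivative of s along (2) with F(t) = 0: s_dot = c*x2 + (-a*x2 - b*u1).
s_dot = c * x2 - a * x2 - b * u1
reaching_law = -k * s - eps * sign(s)
print(s, s_dot, reaching_law)
```

The two printed derivative values agree, which shows that (4) is the control that imposes an exponential approach of $s$ toward zero when the disturbance is neglected.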
Define $u_2 = u_{eq} + u_k$. The control law superposes the equivalent control $u_{eq}$ and the switching control $u_k$. $u_{eq}$ is the control effort needed to keep the state trajectory on the sliding mode line. The role of $u_k$ is to drive any initial state point on the phase plane to the sliding mode line in a finite interval, so that the reaching condition is satisfied.
2.2 Design of Fuzzy Sliding Mode Controller
The physical meaning of the fuzzy controller is that the output is inferred from the distance between the state point and the sliding mode line and from the derivative of that distance. Choice of membership functions: the domains of $s$, $\dot{s}$ and $u_k$ are chosen as [−3, 3]. On this domain, 7 fuzzy subsets are defined: [NB, NM, NS, ZO, PS, PM, PB]. Their membership functions are shown in Fig. 1. As shown in Fig. 1, the membership functions are concentrated near ZO and the degree of reasoning is stronger there, which makes the control effort soft.
Fig. 1. Fuzzy sets on input and output variables domain

Table 1. Fuzzy reasoning rules (inputs $s$ and $\dot{s}$, output $u_k$)

      NB   NM   NS   ZO   PS   PM   PB
NB    NB   NB   NB   NB   NS   PB   PB
NM    NB   NB   NM   NM   NM   PB   PB
NS    NB   NM   NS   NS   PS   PM   PB
ZO    NB   NM   NB   ZO   PB   PM   PB
PS    NB   NM   NS   PS   PS   PM   PB
PM    NB   NB   PM   PM   PM   PB   PB
PB    NB   NB   PS   PB   PB   PB   PB
Choice of control rules: the rules in Table 1 are adopted, where $s$ is the distance between the state point and the sliding mode line and $\dot{s}$ is its rate of change.
3 Simulations of the System
Mathematical model and parameters of an AC servo motor: G(s) = Pn/(J+B), where Pn = 2, J = 0.0013 kg·m², B = 0.0026 Nm/(rad ⋅ s), and Kt = 0.0124. Choice of control parameters: c = 5, K1 = 3/4, K2 = 3/100, K3 = 5/3, where K1 and K2 are the quality factors and K3 is the anti-quality factor; s0 = 10, k = 10, ε = 0.2. Simulation parameters: the reference position is θr = 5 and the simulation step is 0.01 s. At 3 s, a step overload is added. PID parameters: KP = 10, KI = 2, KD = 0.5. Output responses of the system with a step disturbance: Fig. 2 and Fig. 3 show PID control and FSMC with a step overload, respectively. Compared with the traditional PID method, as shown in the figures, FSMC has no overshoot, a quick response, and strong robustness. Simulations of reducing chattering: Fig. 4 is the phase trajectory of traditional sliding mode control and Fig. 5 is the phase trajectory of FSMC. FSMC clearly reduces chattering.
Fig. 2. The PID control with step overload
Fig. 4. Error state vector of sliding mode
Fig. 3. The FSMC with step overload
Fig. 5. Error state vector of fuzzy sliding mode
4 Conclusions
The simulations show that the grading FSMC presented in this paper reduces chattering by controlling the output with the sliding mode variables $s$ and $\dot{s}$. It remains robust to parameter variations and disturbances. Moreover, the exponent approaching control is added by grading, so the response of the system is sped up and a very good control effect is obtained.
References
1. Wang, F.Y.: Sliding Mode Variable Structure Control. Mechanics Industry Press, Beijing (1995)
2. Ha, Q.P., Rye, D.C., Durrant-Whyte, H.F.: Fuzzy Moving Sliding Mode Control with Application to Robotic Manipulators. Automatica, 35 (1999) 607-616
3. Guo, Q.D., Wang, C.Y.: AC Servo System. Mechanics Industry Press, Beijing (1994)
A Robust Single Input Adaptive Sliding Mode Fuzzy Logic Controller for Automotive Active Suspension System
Ibrahim B. Kucukdemiral1, Seref N. Engin1, Vasfi E. Omurlu2, and Galip Cansever1
1 Yildiz Technical University, Department of Electrical Engineering, Turkey {beklan, nengin, cansever}@yildiz.edu.tr
2 Yildiz Technical University, Department of Mechanical Engineering, Turkey [email protected]
Abstract. The proposed controller in this paper, which combines the capability of fuzzy logic with the robustness of sliding mode controller, presents prevailing results with its adaptive architecture and proves to overcome the global stability problem of the control of nonlinear systems. Effectiveness of the controller and the performance comparison are demonstrated with chosen control techniques including PID and PD type self-tuning fuzzy controller on a quarter car model which consists of component-wise nonlinearities.
1 Introduction
There are two major objectives in the study of Automotive Active Suspension Systems (AASSs): to improve ride comfort by reducing the vertical acceleration of the sprung mass, and to increase the road-holding ability of the vehicle by providing adequate suspension deflections. To overcome the problems that originate from the complexity and nonlinearity of vehicle systems, various kinds of Fuzzy Logic Controllers (FLCs) have been suggested [1,2,3,4,5]. In this work, we propose a novel, robust, simple and industrially applicable FLC with a single state feedback for AASSs, where the stability of the controller is secured in the sense of Lyapunov. A further benefit is that the rule-base of the proposed controller does not need to be tuned by an expert, since an adaptation mechanism takes responsibility for the tuning. Online simulations of the proposed system verify that the proposed suspension controller performs much better than the other methods considered, namely the passive suspension, the classical Proportional-Integral-Derivative (PID) controller and the PD-type self-tuning fuzzy controller (STFPD), when tested with standard bumps and random road profiles.
2 Modelling the Active Suspension System
A typical active suspension system for a quarter car model is illustrated in Fig. 1. It is assumed that the tire never leaves the ground. The actuator force is denoted by $f_u$. The nonlinear spring and damping forces are given by

$$f_s = k_s (z_u - z_s) + \frac{k_s}{4}\,(z_u^3 - z_s^3), \qquad f_b = b_s\,|\dot z_u - \dot z_s|\,(\dot z_u - \dot z_s). \tag{1}$$

Fig. 1. Quarter car model used

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 981–986, 2005. © Springer-Verlag Berlin Heidelberg 2005
Here the damping and spring coefficients are $b_s = 1000$ N·s/m and $k_s = 45000$ N/m, respectively. The car body displacement $z_s$, the wheel displacement $z_u$ and the road displacement $z_r$ are all measured from the static equilibrium position. Neglecting nonlinear effects for now, the dynamic equations of the quarter car active suspension system are

$$\begin{aligned} m_s \ddot z_s &= k_s (z_u - z_s) + b_s (\dot z_u - \dot z_s) + f_u, \\ m_u \ddot z_u &= -k_s (z_u - z_s) - b_s (\dot z_u - \dot z_s) - f_u + k_t (z_r - z_u), \end{aligned} \tag{2}$$

where $m_s = 250.3$ kg and $m_u = 30.41$ kg are the masses of the car body and the wheel, respectively, $\ddot z_s$ denotes the acceleration of the car body and $k_t = 150000$ N/m is the tire spring constant.
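The model above can be sketched numerically as follows. This is a minimal illustration, not the authors' simulation environment: it uses the parameter values quoted in the text and the force definitions of (1), and it assumes that the spring and damper forces act on the body in the direction of $z_u - z_s$ so that they are restoring. The fixed-step Runge-Kutta integrator, the step size and the function names are assumptions made only for this example.

```python
import numpy as np

# Parameter values quoted in the text
MS, MU = 250.3, 30.41                    # car body and wheel mass [kg]
KS, BS, KT = 45000.0, 1000.0, 150000.0   # spring, damper and tire coefficients

def suspension_forces(zs, dzs, zu, dzu, nonlinear=True):
    """Spring and damper forces; Eq. (1) if nonlinear, linear terms otherwise."""
    rel, drel = zu - zs, dzu - dzs
    if nonlinear:
        fs = KS * rel + (KS / 4.0) * (zu ** 3 - zs ** 3)
        fb = BS * abs(drel) * drel
    else:
        fs = KS * rel
        fb = BS * drel
    return fs, fb

def rhs(state, zr, fu=0.0, nonlinear=True):
    """Time derivative of the state [zs, dzs, zu, dzu] for road height zr."""
    zs, dzs, zu, dzu = state
    fs, fb = suspension_forces(zs, dzs, zu, dzu, nonlinear)
    ddzs = (fs + fb + fu) / MS
    ddzu = (-fs - fb - fu + KT * (zr - zu)) / MU
    return np.array([dzs, ddzs, dzu, ddzu])

def rk4_step(state, zr, dt, fu=0.0, nonlinear=True):
    """One classical Runge-Kutta step with the road held constant over dt."""
    k1 = rhs(state, zr, fu, nonlinear)
    k2 = rhs(state + 0.5 * dt * k1, zr, fu, nonlinear)
    k3 = rhs(state + 0.5 * dt * k2, zr, fu, nonlinear)
    k4 = rhs(state + dt * k3, zr, fu, nonlinear)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(road, dt=1e-3, steps=5000, nonlinear=True):
    """Passive (fu = 0) response from rest; road(t) returns the road height."""
    state = np.zeros(4)
    for i in range(steps):
        state = rk4_step(state, road(i * dt), dt, nonlinear=nonlinear)
    return state

# Usage: passive response of the linear model to a 1 cm road step
final_state = simulate(lambda t: 0.01, nonlinear=False)
```

An active controller would supply a nonzero `fu` at each step; the passive case shown here corresponds to the open-loop suspension used as a baseline in the comparison.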
3 Design of a Robust Single-Input Adaptive Fuzzy Sliding Mode Controller for AASS
In general, a compact dynamic equation for the AASS can be regarded as a second-order differential equation such as

$$\ddot z_s(t) = -\frac{B}{M}\dot z_s(t) - \frac{K}{M} z_s(t) - \frac{T_d}{M} + \frac{F_u}{M} \equiv A_p \dot z_s(t) + K_p z_s(t) + D_p T_d + M_p u(t) \tag{3}$$

where $M$ is the total mass of the body, $B$ is the damping coefficient, $K$ is the stiffness coefficient, $T_d$ is the total unknown load disturbance and $F_u$ is the control force applied to the system. The error of the sprung mass displacement is $e(t) = z_s(t) - z_d(t)$, where $z_d(t)$ is the desired displacement of the car body. The first step in sliding mode controller design is choosing the sliding surface. Therefore the following sliding surface is employed:

$$s(t) = \dot z_s(t) - \int_0^t \left[\ddot z_d(\tau) - k_1 \dot e(\tau) - k_2 e(\tau)\right] d\tau. \tag{4}$$
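As a brief check on the role of the surface (4), differentiating it with respect to time and using $e = z_s - z_d$ gives the error dynamics that govern the sliding motion. This is only a worked step under the definitions above:

```latex
\begin{aligned}
\dot s(t) &= \ddot z_s(t) - \left[\ddot z_d(t) - k_1 \dot e(t) - k_2 e(t)\right] \\
          &= \ddot e(t) + k_1 \dot e(t) + k_2 e(t).
\end{aligned}
```

Hence, whenever $\dot s = 0$ is maintained, the tracking error obeys $\ddot e + k_1 \dot e + k_2 e = 0$, which decays to zero for positive $k_1$ and $k_2$.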
Assuming that the system dynamics are well known, a feedback linearization control law $u^*(t)$ can be applied. However, this control law is not directly applicable in practice. One way to overcome this difficulty is to imitate the feedback linearization law with an adaptive fuzzy logic controller such as

$$u_{fz}(s) = \frac{\sum_{r=1}^{N} w_r \theta_r}{\sum_{r=1}^{N} w_r} \tag{5}$$

where $\theta_r$, $r = 1, 2, \ldots, N$, are the discrete singleton control signals, treated as adjustable parameters, and $w_r$ is the firing strength of the $r$-th rule. In the present work, Gaussian membership functions are chosen for the fuzzification of $s$, and $N$ is chosen to be 7. With $\theta_r$ as adjustable parameters, (5) can be written as $u_{fz} = \Theta^T \Delta$, where $\Theta = [\theta_1, \theta_2, \ldots, \theta_N]^T$ and $\Delta = [\delta_1, \delta_2, \ldots, \delta_N]^T$ are the corresponding vectors. According to the universal approximation property of fuzzy systems, $u^*(t)$ can be approximated by a fuzzy system of the form (5) with an approximation error $\varepsilon$ bounded by $\rho^*$, such that $u^*(t) = u^*_{fz}(t) + \varepsilon = \Theta^{*T} \Delta + \varepsilon$. Let $\hat u_{fz} = \hat\Theta^T \Delta$ be the approximation of $u^*(t)$ and $\hat\Theta$ the approximation of $\Theta^*$. Moreover, in order to compensate for the approximation error $\hat u_{fz} - u^*$, we add a variable structure controller term to the control signal, which results in $u(t) = \hat u_{fz} + u_{vs}$. Then, after some algebraic manipulations, one obtains the error dynamics $\ddot e + k_1 \dot e + k_2 e = M_p[\hat u_{fz} + u_{vs} - u^*] = \dot s$. Denoting $\hat u_{fz} - u^* = \hat u_{fz} - u^*_{fz} - \varepsilon$ as $\tilde u_{fz}$ and $\hat\Theta - \Theta^*$ as $\tilde\Theta$, we obtain $\tilde u_{fz} = \tilde\Theta^T \Delta - \varepsilon$. The variable structure controller is defined as $u_{vs} = -\hat\rho \cdot \mathrm{sat}(s/\Phi)$, where $\hat\rho$ is the variable switching gain, $\Phi$ is the thickness of the boundary layer, chosen as 0.0008 for the present application, and sat is the saturation function. If $\rho^*$ denotes the equivalent gain and $\hat\rho$ denotes the estimated gain, then the switching gain error is $\tilde\rho = \hat\rho - \rho^*$. In order to achieve minimum approximation error and to guarantee the existence of the sliding mode, we choose the Lyapunov function candidate

$$V(s, \tilde\Theta, \tilde\rho) = \frac{1}{2} s^2 + \frac{M_p}{2\gamma_1} \tilde\Theta^T \tilde\Theta + \frac{M_p}{2\gamma_2} \tilde\rho^2 \tag{6}$$

where $\gamma_1$ and $\gamma_2$ are positive constants. Then the time derivative of $V$ along the trajectory is

$$\dot V(s, \tilde\Theta, \tilde\rho) = s\, M_p \left[-\hat\rho \cdot \mathrm{sat}(s/\Phi) - \varepsilon\right] + \frac{M_p}{\gamma_2} (\hat\rho - \rho^*)\, \dot{\hat\rho}. \tag{7}$$

Then, choosing $\dot{\hat\rho} = \gamma_2\, \mathrm{sat}(s/\Phi)$ and taking into consideration that $\dot{\tilde\rho} = \dot{\hat\rho}$ and $\dot{\tilde\Theta} = \dot{\hat\Theta} = \gamma_1 \cdot s \cdot \Delta$, one can reduce (7) to $\dot V = -M_p |s| (\rho^* - |\varepsilon|) \le 0$. Finally, by using Barbalat's lemma, one can show that the stability of the proposed controller and adaptation laws is achieved in the sense of Lyapunov. The final structure of the proposed controller is shown in Fig. 2.
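The control signal $u = \hat u_{fz} + u_{vs}$ described above can be sketched as follows. This is a minimal illustration under stated assumptions: the membership function centres and width on $s$ are hypothetical values (the text only says that seven Gaussian functions are used), $\Delta$ is taken as the vector of normalised firing strengths implied by (5), and the adaptation laws are deliberately left out. The initial $\Theta$ and the boundary layer thickness $\Phi$ are the values quoted in the paper.

```python
import numpy as np

N_RULES = 7
PHI = 0.0008  # boundary layer thickness quoted in the text
# Initial adjustable vector quoted in the simulation section
THETA0 = np.array([5000.0, 3000.0, 1000.0, 0.0, -1000.0, -3000.0, -5000.0])

# Hypothetical Gaussian membership functions on s (assumed, not from the paper)
CENTRES = np.linspace(-0.03, 0.03, N_RULES)
WIDTH = 0.01

def firing_vector(s):
    """Normalised firing strengths delta_r = w_r / sum(w_r) for the input s."""
    log_w = -0.5 * ((s - CENTRES) / WIDTH) ** 2
    w = np.exp(log_w - log_w.max())  # shift avoids 0/0 when |s| is large
    return w / w.sum()

def sat(x):
    """Saturation function: x inside [-1, 1], clipped outside."""
    return float(np.clip(x, -1.0, 1.0))

def control(s, theta, rho_hat):
    """Return u = u_fz + u_vs and the firing vector for sliding variable s."""
    delta = firing_vector(s)
    u_fz = float(theta @ delta)          # singleton fuzzy output, Eq. (5)
    u_vs = -rho_hat * sat(s / PHI)       # variable structure term
    return u_fz + u_vs, delta

# Usage: control signal for a small positive sliding variable
u, delta = control(0.005, THETA0, rho_hat=100.0)
```

With only one input, the rule base has $N = 7$ entries, which is the reduction in rule count that the single-input structure is meant to provide.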
Fig. 2. Block diagram of the proposed controller
4 Simulation Results
To evaluate the proposed controller, a simulation environment of the AASS is created for the quarter car model with component-wise nonlinearities. The damper and spring nonlinearities in the system dynamics equations are chosen as in actual components. A vehicle speed of 72 km/h is chosen and two types of road profile are prepared for evaluating controller performance: a standard bump-type surface profile of 10 cm length × 10 cm height, and a random road profile generated to simulate a stabilized road with 1 cm × 1 cm pebbles. Open loop, PID and STFPD are employed along with the proposed controller. The rule-bases for the main FLC and the self-tuning mechanism are chosen as described in [6]. The PID controller is first tuned with the well-known Ziegler-Nichols method and then refined through numerous repeated simulations around the previously obtained acceptable PID values [7]. Although the initial rule table of the proposed controller is not important because of its self-organizing structure, the initial value of the adjustable vector is chosen as $\Theta = [5000\ 3000\ 1000\ 0\ {-1000}\ {-3000}\ {-5000}]^T$. The values of $k_1$ and $k_2$ are chosen to be 10.1 and 0.16, respectively. First, the bump-type road profile is applied to the system for the four types of controller. Car body displacements are plotted in Fig. 3. The proposed controller clearly produces the shortest response time, 0.85 s, and the lowest peak value, 0.4 cm. The open-loop response keeps oscillating for 25 s and has a high peak value. The PID controller decreases the peak value while shortening the response time. The STFPD, on the other hand, still has a higher peak value and a slower response than the proposed controller. For the chosen PID parameters, the classical PID performs better than the PD-type self-tuning fuzzy controller.
Fig. 3. Car Body Displacement for a simulated bump (Proposed controller: solid bold; passive suspension: dotted; PID controller: dash; STFPD controller: dot-dash)
Fig. 4. Body displacement for random road profile (Proposed controller: solid bold; passive suspension: dotted; PID controller: dot-dash; STFPD controller: dash)
As a second experiment, the proposed controller and the comparison controllers are tested using the random road profile. Fig. 4 shows the responses of all four controllers for this road profile; the proposed sliding mode controller clearly outperforms the other controllers. A summary of all responses to the two road conditions is given in Table 1.
5 Conclusion
A novel single-input adaptive fuzzy sliding mode controller is proposed and successfully employed to control an AASS with component-wise nonlinearities. The strategy is robust and industrially applicable since it has a single-input FLC as the main controller. Thus, the rule base of the FLC is drastically smaller than that of traditional FLCs. In addition, the efficiency of the controller is improved by combining it with a sliding mode compensator, which also has an adaptive structure. The stability of the proposed scheme is achieved in the sense of Lyapunov. In order to demonstrate the effectiveness of the proposed method, the controller is applied to the suspension system and compared with the passive suspension, the PID controller and the STFPD. The road profiles tested are a simulated random road surface and a bump. The simulation results show that the proposed control scheme improves the ride comfort considerably when compared to the aforementioned controller structures.

Table 1. Comparison of the controller performances

Controller            z̈s rms, random road   z̈s rms, bumpy road   First peak (m)   Settling time, bumpy road (s)
Proposed              0.1288                 1.2538                0.0040           0.85
Passive (open loop)   0.4994                 1.6369                0.0133           25
STFPD                 0.3209                 1.4876                0.0135           6
PID                   0.2936                 1.4097                0.0112           4.15
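To put a number on "considerably", the relative improvements can be computed directly from Table 1. The short sketch below only re-expresses the tabulated values as percentage reductions; the dictionary keys and the helper name are illustrative choices, not part of the paper.

```python
# Values transcribed from Table 1
TABLE1 = {
    "Proposed": {"acc_random": 0.1288, "acc_bump": 1.2538, "peak_m": 0.0040, "settle_s": 0.85},
    "Passive":  {"acc_random": 0.4994, "acc_bump": 1.6369, "peak_m": 0.0133, "settle_s": 25.0},
    "STFPD":    {"acc_random": 0.3209, "acc_bump": 1.4876, "peak_m": 0.0135, "settle_s": 6.0},
    "PID":      {"acc_random": 0.2936, "acc_bump": 1.4097, "peak_m": 0.0112, "settle_s": 4.15},
}

def reduction(metric, baseline="Passive", controller="Proposed"):
    """Percentage reduction of a Table 1 metric relative to a baseline."""
    base = TABLE1[baseline][metric]
    return 100.0 * (base - TABLE1[controller][metric]) / base

# Usage: improvement of the proposed controller over each alternative
for other in ("Passive", "STFPD", "PID"):
    print(other, round(reduction("acc_random", baseline=other), 1))
```

Relative to the passive suspension, the tabulated rms body acceleration of the proposed controller is about 74 % lower on the random road and about 23 % lower on the bump.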
References
1. Ting, C.S., Li, T.H.S., Kung, F.C.: Design of fuzzy controller for active suspension system. Mechatronics. 5 (1993) 457–470
2. Yeh, E.C., Tsao, Y.J.: A fuzzy preview control scheme of active suspension for rough road. Int. J. Vehicle Des. 15 (1994) 166–180
3. Rao, M.V.C., Prahald, V.: A tunable fuzzy logic controller for vehicle-active suspension system. Fuzzy Sets and Syst. 85 (1997) 11–21
4. D’Amato, F.J., Viassolo, D.E.: Fuzzy control for active suspensions. Mechatronics. 10 (2000) 897–920
5. Huang, S.J., Lin, W.C.: Adaptive fuzzy controller with sliding surface for vehicle suspension control. IEEE Trans. on Fuzzy Syst. 11 (2003) 550–559
6. Mudi, R.K., Pal, N.R.: A robust self-tuning scheme for PI- and PD-type fuzzy controllers. IEEE Trans. on Fuzzy Syst. 7 (1999) 2–16
7. Omurlu, V.E., Engin, S.N., Kucukdemiral, I.B.: A Robust Single Input Adaptive Sliding Mode Fuzzy Logic Controller for a Nonlinear Automotive Suspension System. MED’04 Conference, June 6-9, (2004), Kusadasi, Aydin, Turkey.
Construction of Fuzzy Models for Dynamic Systems Using Multi-population Cooperative Particle Swarm Optimizer
Ben Niu¹˒², Yunlong Zhu¹, and Xiaoxian He¹˒²
¹ Shenyang Institute of Automation, Chinese Academy of Sciences, 110016, Shenyang, China
² School of Graduate, Chinese Academy of Sciences, 100039, Beijing, China
{niuben, ylzhu}@sia.cn
Abstract. A new fuzzy modeling method using the Multi-population Cooperative Particle Swarm Optimizer (MCPSO) for identification and control of nonlinear dynamic systems is presented in this paper. In MCPSO, the population consists of one master swarm and several slave swarms. The slave swarms execute Particle Swarm Optimization (PSO) or its variants independently to maintain the diversity of particles, while the particles in the master swarm enhance themselves based on their own knowledge and also the knowledge of the particles in the slave swarms. MCPSO is used to automatically design a fuzzy identifier and a fuzzy controller for nonlinear dynamic systems. The proposed algorithm is shown to outperform PSO and some other methods in identifying and controlling dynamic systems.
1 Introduction
The identification and control of nonlinear dynamical systems has long been a challenging problem in the control area. Since the output of a dynamic system is a nonlinear function of past outputs, past inputs, or both, and the exact order of the system is often unknown, identifying and controlling such a system is much more difficult than for a static system. Therefore, soft computing methods such as neural networks [1-3], fuzzy neural networks [4-6] and fuzzy inference systems [7] have been developed to cope with this problem. Recently, recurrent networks have become a popular approach for the identification and control of temporal problems. Many types of recurrent networks have been proposed, among which two widely used categories are recurrent neural networks (RNN) [3, 8, 9] and recurrent fuzzy networks (RFNN) [4, 10]. On the other hand, fuzzy inference systems have been developed to provide successful results in identifying and controlling nonlinear dynamical systems [7, 11]. Among the different fuzzy modeling techniques, Takagi and Sugeno's (T-S) type fuzzy controllers have gained much attention due to their simplicity and generality [7, 12]. The T-S fuzzy model describes a system by a set of local linear input-output relations, and this fuzzy model can approximate highly nonlinear dynamical systems.
The bottleneck in the construction of a T-S model is the identification of the antecedent membership functions, which is a nonlinear optimization problem. Typically, both the premise parameters and the consequent parameters of the T-S fuzzy model are adjusted by using gradient descent optimization techniques [12-13]. Those methods are sensitive to the choice of the initial parameters, easily get stuck in local minima, and have poor generalization properties. This hampers the a posteriori interpretation of the optimized T-S model. The advent of evolutionary algorithms (EAs) has attracted considerable interest in the construction of fuzzy systems [14-16]. In [15], [16], EAs have been applied to learn both the antecedent and consequent parts of fuzzy rules, and models with both fixed and varying numbers of rules have been considered. Compared to traditional gradient-based computation, evolutionary algorithms provide a more robust and efficient approach for the construction of fuzzy systems. Recently, a new evolutionary computation technique, the particle swarm optimization (PSO) algorithm, was introduced by Kennedy and Eberhart [17, 18], and has already come to be widely used in many areas [19-21]. As mentioned by Angeline [22], the original PSO, while successful in the optimization of several difficult benchmark problems, has problems in controlling the balance between exploration and exploitation, namely when fine tuning around the optimum is attempted. In this paper we try to deal with this issue by introducing a multi-population scheme, which consists of one master swarm and several slave swarms. The slave swarms evolve independently to supply new promising particles (the positions giving the best fitness values) to the master swarm as evolution goes on. The master swarm updates the particle states based on the best position discovered so far by all the particles, both in the slave swarms and in the master swarm itself.
The interactions between the master swarm and the slave swarms control the balance between exploration and exploitation and maintain the population diversity, even when the search is approaching convergence, thus reducing the risk of convergence to local sub-optima. The paper is devoted to a novel fuzzy modeling strategy for fuzzy inference systems for identification and control of nonlinear dynamical systems. We use the MCPSO algorithm to design the T-S type fuzzy identifier and fuzzy controller for nonlinear dynamic systems, and the performance is compared to other methods to demonstrate its effectiveness. The paper is organized as follows. Section 2 gives a review of PSO and a description of the proposed algorithm MCPSO. Section 3 describes the T-S model and a detailed design algorithm of the fuzzy model by MCPSO. In Section 4, simulation results of one nonlinear plant identification problem and one nonlinear dynamical system control problem using fuzzy inference systems based on MCPSO are presented. Finally, conclusions are drawn in Section 5.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 987–1000, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 PSO and MCPSO
2.1 Particle Swarm Optimization
Particle Swarm Optimization (PSO) is inspired by natural concepts such as fish schooling, bird flocking and human social relations. The basic PSO is a population-based optimization tool, where the system is initialized with a population of random solutions and searches for optima by updating generations. In PSO, the potential solutions, called particles, fly in a D-dimensional search space with a velocity that is dynamically adjusted according to the particle's own experience and that of its neighbors. The location and velocity of the $i$-th particle are represented as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, respectively. The best previous position of the $i$-th particle is recorded and represented as $P_i = (P_{i1}, P_{i2}, \ldots, P_{iD})$, which is also called pbest. The index of the best particle among all the particles in the population is represented by the symbol $g$, and $P_g$ is called gbest. At each time step $t$, the particles are manipulated according to the following equations:

$$v_i(t+1) = v_i(t) + R_1 c_1 (P_i - x_i(t)) + R_2 c_2 (P_g - x_i(t)) \tag{1}$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{2}$$

where $R_1$ and $R_2$ are random values within the interval [0, 1], and $c_1$ and $c_2$ are acceleration constants. In Eqn. (1), the portion of the adjustment to the velocity influenced by the individual's own pbest position is considered the cognition component, and the portion influenced by gbest is the social component. A drawback of this version of PSO is the lack of a mechanism for controlling the magnitude of the velocities, which fosters the danger of swarm explosion and divergence. To solve this problem, Shi and Eberhart [23] later introduced an inertia term $w$ by modifying (1) to become:

$$v_i(t+1) = w \times v_i(t) + R_1 c_1 (P_i - x_i(t)) + R_2 c_2 (P_g - x_i(t)) \tag{3}$$

They proposed that a suitable selection of $w$ provides a balance between global and local exploration, thus requiring fewer iterations on average to find a sufficiently good solution. As originally developed, $w$ often decreases linearly from about 0.9 to 0.4 during a run. In general, the inertia weight $w$ is set according to the following equation:

$$w = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \times iter \tag{4}$$

where $iter_{max}$ is the maximum number of iterations, and $iter$ is the current iteration number.
2.2 Multi-population Cooperative Particle Swarm Optimization
The foundation of PSO is the hypothesis that social sharing of information among conspecifics offers an evolutionary advantage. It reflects the cooperative relationship among the individuals (fish, bird, insect) within a group (school, flock, swarm). This, however, is not the whole picture in nature. In natural ecosystems, many species have developed cooperative interactions with other species to improve their survival. Such cooperative co-evolution, also called symbiosis, can be found in organisms ranging from cells (e.g., eukaryotic organisms probably resulted from the mutualistic interaction between prokaryotes and some cells they infected) to higher animals (e.g., African tick birds obtain a steady food supply by cleaning parasites from the skin of giraffes, zebras, and other animals) [24, 25]. Inspired by the phenomenon of symbiosis in natural ecosystems, a master-slave mode is incorporated into PSO, and the Multi-population (species) Cooperative Particle Swarm Optimization (MCPSO) is thus presented. In our approach, the population consists of one master swarm and several slave swarms. The symbiotic relationship between the master swarm and the slave swarms can keep the right balance of exploration and exploitation, which is essential for the success of a given optimization task. The master-slave communication model is shown in Fig. 1; it is used to assign fitness evaluations and maintain algorithm synchronization. Independent populations (species) are associated with nodes, called slave swarms. Each node executes a single PSO or one of its variants, including the update of location and velocity and the creation of a new local population. When all nodes are ready with their new generations, each node sends its best local individual to the master node. The master node selects the best of all received individuals and evolves according to the following equations:
$$v_i^M(t+1) = w\, v_i^M(t) + R_1 c_1 (p_i^M - x_i^M(t)) + R_2 c_2 (p_g^M - x_i^M(t)) + R_3 c_3 (p_g^S - x_i^M(t)) \tag{5}$$

$$x_i^M(t+1) = x_i^M(t) + v_i^M(t+1) \tag{6}$$

where $M$ represents the master swarm, $c_3$ is the migration coefficient, and $R_3$ is a uniform random sequence in the range [0, 1]. Note that the particle's velocity update in the master swarm is associated with three factors:
i. $p_i^M$: previous best position of the particle in the master swarm.
ii. $p_g^M$: best global position of the master swarm.
iii. $p_g^S$: previous best position of the slave swarms.

Fig. 1. The master-slave model (slave nodes 1, 2, ..., k each send their local best to the master)

As shown in Eqn. (5), the first term of the summation represents the inertia (the particle keeps moving in the direction it had previously moved), the second term represents memory (the particle is attracted to the best point in its trajectory), the third term represents cooperation (the particle is attracted to the best point found by all particles of the master swarm) and the last represents information exchange (the particle is attracted to the best point found by the slave swarms). The pseudocode for the MCPSO algorithm is listed in Fig. 2.

Algorithm MCPSO
Begin
  Initialize all the populations
  Evaluate the fitness value of each particle
  Repeat
    Do in parallel
      Node i, 1 ≤ i ≤ K   // K is the number of slave swarms
    End Do in parallel
    Barrier synchronization   // wait for all processes to finish
    Select the fittest local individual $p_g^S$ from the slave swarms
    Evolve the master swarm   // update the velocity and position using (5) and (6), respectively
    Evaluate the fitness value of each particle
  Until a termination condition is met
End

Fig. 2. Pseudocode for the MCPSO algorithm
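The master-slave scheme described above can be sketched in a few dozen lines. The following is a minimal serial illustration, not the authors' implementation: the slave swarms are stepped one after another instead of in parallel, the random factors are drawn per dimension, the position update uses the freshly updated velocity (the standard PSO form), and the class and function names, the search range and the minimisation convention are all assumptions made for this example. The default coefficients are the values quoted for Example 1 of the paper.

```python
import numpy as np

class Swarm:
    """One PSO population with its personal and swarm-level best records."""

    def __init__(self, rng, size, dim, lo, hi):
        self.x = rng.uniform(lo, hi, (size, dim))
        self.v = np.zeros((size, dim))
        self.pbest = self.x.copy()
        self.pbest_f = np.full(size, np.inf)
        self.gbest = self.x[0].copy()
        self.gbest_f = np.inf

    def evaluate(self, f):
        """Update pbest and gbest for a cost function f (lower is better)."""
        fx = np.array([f(p) for p in self.x])
        better = fx < self.pbest_f
        self.pbest[better] = self.x[better]
        self.pbest_f[better] = fx[better]
        k = int(np.argmin(self.pbest_f))
        if self.pbest_f[k] < self.gbest_f:
            self.gbest = self.pbest[k].copy()
            self.gbest_f = float(self.pbest_f[k])

    def move(self, rng, w, c1, c2, c3=0.0, slave_best=None):
        """Inertia-weight update; the c3 term is added for the master only."""
        shape = self.x.shape
        v = (w * self.v
             + rng.random(shape) * c1 * (self.pbest - self.x)
             + rng.random(shape) * c2 * (self.gbest - self.x))
        if slave_best is not None:
            v += rng.random(shape) * c3 * (slave_best - self.x)
        self.v = v
        self.x = self.x + v

def mcpso(f, dim, n_slaves=3, size=20, iters=100, lo=-5.0, hi=5.0,
          w_max=0.35, w_min=0.1, c1=1.5, c2=1.5, c3=0.8, seed=0):
    """Minimise f with one master swarm and n_slaves slave swarms."""
    rng = np.random.default_rng(seed)
    slaves = [Swarm(rng, size, dim, lo, hi) for _ in range(n_slaves)]
    master = Swarm(rng, size, dim, lo, hi)
    for swarm in slaves + [master]:
        swarm.evaluate(f)
    for it in range(iters):
        w = w_max - (w_max - w_min) * it / iters   # linearly decreasing inertia
        for swarm in slaves:                       # serial stand-in for "in parallel"
            swarm.move(rng, w, c1, c2)
            swarm.evaluate(f)
        best_slave = min(slaves, key=lambda s: s.gbest_f)
        master.move(rng, w, c1, c2, c3, best_slave.gbest)
        master.evaluate(f)
    return master.gbest, master.gbest_f

# Usage: minimise a simple quadratic test function
best_x, best_f = mcpso(lambda x: float(np.sum(x ** 2)), dim=2)
```

To use the paper's fitness, defined as the reciprocal of the RMSE, with this sketch, the RMSE itself can be passed as the cost `f`, since maximising 1/RMSE and minimising RMSE select the same particle.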
3 Fuzzy Model Based on MCPSO
3.1 T-S Fuzzy Model Systems
In this paper, the fuzzy model suggested by Takagi and Sugeno is employed to represent a nonlinear system. A T-S fuzzy system is described by a set of fuzzy IF-THEN rules that represent local linear input-output relations of nonlinear systems. The overall system is then an aggregation of all such local linear models. More precisely, a T-S fuzzy system is formulated in the following form:

$$R^l:\ \text{if } x_1 \text{ is } A_1^l \text{ and } \ldots\ x_n \text{ is } A_n^l, \text{ then } \hat y^l = \alpha_0^l + \alpha_1^l x_1 + \ldots + \alpha_n^l x_n, \tag{7}$$

where $\hat y^l$ $(1 \le l \le r)$ is the output due to rule $R^l$ and $\alpha_i^l$ $(0 \le i \le n)$, called the consequent parameters, are the coefficients of the linear relation in the $l$-th rule, which are to be identified. $A_i^l(x_i)$ are the fuzzy variables defined by the following Gaussian membership function:

$$A_i^l(x_i) = \exp\left[-\frac{1}{2}\left(\frac{x_i - m_i^l}{\sigma_i^l}\right)^2\right], \tag{8}$$

where $1 \le l \le r$, $1 \le i \le n$, $x_i \in R$, and $m_i^l$ and $\sigma_i^l$ represent the center (or mean) and the width (or standard deviation) of the Gaussian membership function, respectively. $m_i^l$ and $\sigma_i^l$ are adjustable parameters called the premise parameters, which are to be identified. Given an input $(x_1^0(k), \ldots, x_n^0(k))$, the final output $\hat y(k)$ of the fuzzy system is inferred as follows:

$$\hat y(k) = \frac{\sum_{l=1}^{r} \hat y^l(k) \left(\prod_{i=1}^{n} A_i^l(x_i^0(k))\right)}{\sum_{l=1}^{r} \left(\prod_{i=1}^{n} A_i^l(x_i^0(k))\right)} = \frac{\sum_{l=1}^{r} \hat y^l(k)\, w^l(k)}{\sum_{l=1}^{r} w^l(k)}, \tag{9}$$

where the weight strength $w^l(k)$ of the $l$-th rule is calculated by:

$$w^l(k) = \prod_{i=1}^{n} A_i^l(x_i^0(k)). \tag{10}$$
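The inference in (7) to (10) maps directly onto array operations. The sketch below is a minimal illustration with assumed array shapes: `m` and `sigma` hold the premise parameters with one row per rule, and `alpha` holds the consequent parameters with the constant term $\alpha_0^l$ in the first column. The function name and argument layout are choices made for this example.

```python
import numpy as np

def ts_output(x, m, sigma, alpha):
    """T-S fuzzy output for one input vector.

    x:     input vector, shape (n,)
    m:     membership centres, shape (r, n)
    sigma: membership widths, shape (r, n)
    alpha: consequent coefficients [alpha_0, ..., alpha_n], shape (r, n + 1)
    """
    x = np.asarray(x, dtype=float)
    memberships = np.exp(-0.5 * ((x - m) / sigma) ** 2)   # Eq. (8)
    w = memberships.prod(axis=1)                          # Eq. (10)
    y_rule = alpha[:, 0] + alpha[:, 1:] @ x               # Eq. (7)
    return float(np.sum(w * y_rule) / np.sum(w))          # Eq. (9)

# Usage: two rules on one input, blending the constants 0 and 10
m = np.array([[-1.0], [1.0]])
sigma = np.array([[1.0], [1.0]])
alpha = np.array([[0.0, 0.0], [10.0, 0.0]])
y_mid = ts_output([0.0], m, sigma, alpha)
```

At the midpoint between the two centres both rules fire equally, so the output is the average of the two local models; moving the input toward one centre shifts the output toward that rule's consequent.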
3.2 Fuzzy Model Strategy Based on MCPSO
The detailed design algorithm of the fuzzy model by MCPSO is introduced in this section. The overall learning process can be described as follows:
(1) Parameter representation
In our work, the parameter matrix, which consists of the premise parameters and the consequent parameters described in Section 3.1, is defined as a two-dimensional matrix, i.e.,

$$\begin{bmatrix} m_1^1 & \sigma_1^1 & \cdots & m_n^1 & \sigma_n^1 & \alpha_0^1 & \alpha_1^1 & \cdots & \alpha_n^1 \\ m_1^2 & \sigma_1^2 & \cdots & m_n^2 & \sigma_n^2 & \alpha_0^2 & \alpha_1^2 & \cdots & \alpha_n^2 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & & \vdots \\ m_1^r & \sigma_1^r & \cdots & m_n^r & \sigma_n^r & \alpha_0^r & \alpha_1^r & \cdots & \alpha_n^r \end{bmatrix}$$

The size of the matrix is $D = r \times (3n + 1)$.
(2) Parameter learning
a) In MCPSO, the master swarm and the slave swarms work with the same parameter settings except for the velocity update equation. Initially, $N \times n$ ($N \ge 2$, $n \ge 2$) individuals forming the population are randomly generated and divided into $N$ swarms (one master swarm and $N - 1$ slave swarms); here $n$ denotes the swarm size. Each swarm contains $n$ individuals with random positions and velocities in $D$ dimensions. These individuals may be regarded as particles in terms of PSO. In the T-S fuzzy model, the number of rules $r$ should be assigned in advance. In addition, the maximum number of iterations, the maximum and minimum inertia weights $w_{max}$ and $w_{min}$, the learning parameters $c_1$ and $c_2$, and the migration coefficient $c_3$ should be assigned in advance. After initialization, new individuals of the next generation are created by the following steps.
b) For each particle, evaluate the desired optimization fitness function in $D$ variables. The fitness function is defined as the reciprocal of the RMSE (root mean square error), which is used to evaluate the individuals within a population of potential solutions. Considering the single-output case for clarity, our goal is to minimize the error function

$$RMSE = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left(y_p(k+1) - y_r(k+1)\right)^2} \tag{11}$$

where $K$ is the total number of time steps, $y_p(k+1)$ is the inferred output and $y_r(k+1)$ is the desired reference output.
c) Compare the evaluated fitness value of each particle with its pbest. If the current value is better than pbest, then set the current location as the pbest location in the D-dimensional space.
d) Furthermore, if the current value is better than gbest, then reset gbest to the current index in the particle array. This step is executed in parallel for both the master swarm and the slave swarms.
e) In each generation, after step d) is executed, the best-performing particle $p_g^S$ among the slave swarms should be marked.
f) Update the velocity and position of all the particles in the $N - 1$ slave swarms according to Eqn. (3) and Eqn. (2), respectively (suppose that $N - 1$ populations of SPSO with the same parameter setting are involved in MCPSO as the slave swarms).
g) Update the velocity and position of all the particles in the master swarm according to Eqn. (5) and Eqn. (6), respectively.
(3) Termination condition
The computations are repeated until the premise parameters and consequent parameters have converged. It should be noted that after the operations in the master swarm and the slave swarms, the values of an individual may exceed their reasonable range. Assume that the domain of the $i$-th input variable has been found to be $[\min(x_i), \max(x_i)]$ from the training data. Then the domains of $m_i^l$ and $\sigma_i^l$ are defined as $[\min(x_i) - \delta_i, \max(x_i) + \delta_i]$ and $[d_i - \delta_i, d_i + \delta_i]$, respectively, where $\delta_i$ is a small positive value defined as $\delta_i = (\max(x_i) - \min(x_i))/10$, and $d_i$ is the predefined width of the Gaussian membership function, set to $(\max(x_i) - \min(x_i))/r$, where $r$ is the number of fuzzy rules.
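The bookkeeping in steps (1) to (3) can be sketched as three small helpers: unpacking a particle into the rows of the parameter matrix, clipping the premise parameters back into their domains, and computing the RMSE-based fitness. This is a hedged illustration; the function names, the flattened row-major particle layout and the small `eps` guard against division by zero are assumptions made for this example.

```python
import numpy as np

def decode(particle, r, n):
    """Split a particle of length r*(3n+1) into centres, widths and consequents."""
    P = np.asarray(particle, dtype=float).reshape(r, 3 * n + 1)
    m = P[:, 0:2 * n:2]        # m_1 ... m_n (interleaved with the widths)
    sigma = P[:, 1:2 * n:2]    # sigma_1 ... sigma_n
    alpha = P[:, 2 * n:]       # alpha_0 ... alpha_n
    return m, sigma, alpha

def premise_bounds(X, r):
    """Domains of the centres and widths from training inputs X, shape (samples, n)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    delta = (hi - lo) / 10.0
    d = (hi - lo) / r
    return (lo - delta, hi + delta), (d - delta, d + delta)

def clip_premise(particle, X, r):
    """Return a copy of the particle with premise parameters inside their domains."""
    n = X.shape[1]
    P = np.array(particle, dtype=float).reshape(r, 3 * n + 1)
    (m_lo, m_hi), (s_lo, s_hi) = premise_bounds(X, r)
    P[:, 0:2 * n:2] = np.clip(P[:, 0:2 * n:2], m_lo, m_hi)
    P[:, 1:2 * n:2] = np.clip(P[:, 1:2 * n:2], s_lo, s_hi)
    return P.ravel()

def rmse(y_model, y_ref):
    """Root mean square error between model and reference outputs, Eq. (11)."""
    diff = np.asarray(y_model, dtype=float) - np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def fitness(y_model, y_ref, eps=1e-12):
    """Reciprocal of the RMSE; eps is an assumed guard against a zero error."""
    return 1.0 / (rmse(y_model, y_ref) + eps)

# Usage: a particle for r = 2 rules and n = 2 inputs has 2 * (3*2 + 1) = 14 entries
m, sigma, alpha = decode(np.arange(14), r=2, n=2)
```

Note that the lower width bound $d_i - \delta_i$ is positive only for $r < 10$, which holds for the rule counts used in the examples ($r = 4$ and $r = 5$).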
4 Illustrative Examples
In this section, two nonlinear dynamic applications, one identification example and one control example, are used to validate the capability of the fuzzy inference systems based on MCPSO to handle temporal relationships. The main reason for using these dynamic systems is that they are known to be stable in the bounded-input bounded-output (BIBO) sense.
A. Dynamic System Identification
The systems to be identified are dynamic systems whose outputs are functions of past inputs and past outputs. For this dynamic system identification, a series-parallel model is adopted as the identification configuration, shown in Fig. 3.
Example 1: The plant to be identified in this example is governed by the difference equation [2, 4]:

$$y_p(k+1) = f[y_p(k), y_p(k-1), y_p(k-2), u(k), u(k-1)], \tag{12}$$

where

$$f[x_1, x_2, x_3, x_4, x_5] = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_3^2 + x_2^2}. \tag{13}$$

Here, the current output of the plant depends on three previous outputs and two previous inputs. Unlike the authors in [1], who applied a feedforward neural network with five input nodes for feeding appropriate past values of $y_p(k)$ and $u(k)$, we use only the current input $u(k)$ and the output $y_p(k)$ as the inputs to identify the output of the plant $y_p(k+1)$. In training the fuzzy model using MCPSO for the nonlinear plant, we use only ten epochs, with 900 time steps in each epoch. Similar to the inputs used in [2, 3], the input is an independent and identically distributed (iid) uniform sequence over [-2, 2] for about half of the 900 time steps and a single sinusoid given by $1.05 \sin(\pi k/45)$ for the remaining time steps. In applying MCPSO to this plant, the number of swarms $N = 4$ and the population size of each swarm $n = 20$ are chosen, i.e., 80 individuals are initially randomly generated in a population. The number $r$ of fuzzy rules is set to 4. For the master swarm, the inertia weights $w_{max}$, $w_{min}$, the acceleration constants $c_1$, $c_2$, and the migration coefficient $c_3$ are set to 0.35, 0.1, 1.5, 1.5 and 0.8, respectively. In the slave swarms the inertia weights and the acceleration constants are the same as those used in the master swarm. To show the superiority of MCPSO, the fuzzy identifiers designed by PSO are also applied to the same identification problem. In PSO, the population size is set to 80 and the initial individuals are the same as those used in MCPSO. For fair comparison, the other parameters, $w_{max}$, $w_{min}$, $c_1$, $c_2$, are the same as those defined in MCPSO. To see the identified result, the following input, as used in [3, 4], is adopted for testing:

$$u(k) = \begin{cases} \sin(\pi k/25), & k < 250 \\ 1.0, & 250 \le k < 500 \\ -1.0, & 500 \le k < 750 \\ 0.3 \sin(\pi k/25) + 0.1 \sin(\pi k/32) + 0.6 \sin(\pi k/10), & 750 \le k < 1000. \end{cases} \tag{14}$$
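For readers who want to reproduce the test scenario, the plant (12), (13) and the test input (14) can be written down directly. The sketch below assumes zero initial conditions, $y_p(0) = y_p(-1) = y_p(-2) = 0$ and $u(-1) = 0$, which the text does not state; the function names are illustrative.

```python
import math

def f(x1, x2, x3, x4, x5):
    """Plant nonlinearity of Eq. (13)."""
    return (x1 * x2 * x3 * x5 * (x3 - 1.0) + x4) / (1.0 + x3 * x3 + x2 * x2)

def u_test(k):
    """Piecewise test input of Eq. (14)."""
    if k < 250:
        return math.sin(math.pi * k / 25)
    if k < 500:
        return 1.0
    if k < 750:
        return -1.0
    return (0.3 * math.sin(math.pi * k / 25)
            + 0.1 * math.sin(math.pi * k / 32)
            + 0.6 * math.sin(math.pi * k / 10))

def simulate_plant(u, n_steps):
    """Return [y_p(0), ..., y_p(n_steps)] for an input function u(k), Eq. (12)."""
    y = {-2: 0.0, -1: 0.0, 0: 0.0}          # assumed zero initial conditions
    for k in range(n_steps):
        u_prev = u(k - 1) if k > 0 else 0.0  # assumed u(-1) = 0
        y[k + 1] = f(y[k], y[k - 1], y[k - 2], u(k), u_prev)
    return [y[k] for k in range(n_steps + 1)]

# Usage: desired output for the 1000-step test signal
y_desired = simulate_plant(u_test, 1000)
```

A fuzzy identifier would then be scored by the RMSE between its one-step prediction of $y_p(k+1)$, computed from $u(k)$ and $y_p(k)$ only, and this desired sequence.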
Fig. 3. Identification of nonlinear plant using MCPSO

Fig. 4. Identification results using MCPSO in Example 1, where the solid curve denotes desired output and the dotted curve denotes the actual output (axes: output versus time)

Table 1. Performance comparisons with different methods for Example 1

Method         RSONFIN   RFNN     TRFN-S   PSO      MCPSO
RMSE (train)   0.0248    0.0114   0.0084   0.0386   0.0146
RMSE (test)    0.0780    0.0575   0.0346   0.0372   0.0070
996
B. Niu, Y. Zhu, and X. He
Fig.4 shows the desired output (denoted as a solid curve) and the inferred output obtained by using MCPSO (denoted as a dotted curve) for the testing input signal. Table 1 gives the detailed identification results using different methods, where the results of the methods RSONFIN, RFNN and TRFN-S come from literature [5]. From the comparisons, we see that the fuzzy controller designed by MCPSO is superior to the method using RSONFIN, and is slight inferior to the methods using RFNN and TRFN-S. However, among the three types of methods, it achieves the highest identification accuracy in the test part, which demonstrates its better generalized ability. The results of MCPSO identifier also demonstrate the improved performance compared to the results of the identifiers obtained by PSO. The abnormal phenomenon that the test RMSE is smaller than the train RMSE using fuzzy identifier based on PSO and MCPSO may attributes to the well-regulated input data in test part (during time steps [250, 500] and [500, 750], the input data is equal to a constant). B. Dynamic System Control
Compared with linear systems, for which there now exists considerable theory regarding adaptive control, very little is known concerning adaptive control of plants governed by nonlinear equations. It is in the control of such systems that we are primarily interested in this section. Based on MCPSO, a fuzzy controller is designed for the control of dynamical systems. The control configuration and input-output variables of the MCPSO fuzzy controller are shown in Fig. 5, and the controller is applied to one MISO (multi-input-single-output) plant control problem in the following example. Comparisons with other control methods are also presented.

Example 2: The controlled plant is the same as that used in [2] and [5] and is given by
\[
y_p(k+1) = \frac{y_p(k)\, y_p(k-1)\,\bigl(y_p(k) + 2.5\bigr)}{1 + y_p^2(k) + y_p^2(k-1)} + u(k).
\tag{15}
\]
In designing the fuzzy controller using MCPSO, the desired output $y_r$ is specified by the following 250 pieces of data:
\[
y_r(k+1) = 0.6\, y_r(k) + 0.2\, y_r(k-1) + r(k), \quad 1 \le k \le 250,
\]
\[
r(k) = 0.5\sin(2\pi k/45) + 0.2\sin(2\pi k/15) + 0.2\sin(2\pi k/90).
\]
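The plant of Eq. (15) and the reference model above are simple recursions, so they can be written down directly. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the initial conditions $y_r(0) = y_r(1) = 0$ are assumed because the text does not give them.

```python
import math


def plant_step(y_k: float, y_km1: float, u_k: float) -> float:
    """One step of the plant in Eq. (15): returns y_p(k+1)."""
    return y_k * y_km1 * (y_k + 2.5) / (1.0 + y_k ** 2 + y_km1 ** 2) + u_k


def reference_input(k: int) -> float:
    """Training reference input r(k)."""
    return (0.5 * math.sin(2 * math.pi * k / 45)
            + 0.2 * math.sin(2 * math.pi * k / 15)
            + 0.2 * math.sin(2 * math.pi * k / 90))


def reference_trajectory(n_steps: int = 250, y0: float = 0.0, y1: float = 0.0):
    """Return [y_r(0), ..., y_r(n_steps + 1)].

    y0 and y1 are assumed initial conditions (not stated in the text).
    """
    y = [y0, y1]
    for k in range(1, n_steps + 1):
        y.append(0.6 * y[k] + 0.2 * y[k - 1] + reference_input(k))
    return y


y_ref = reference_trajectory()
```

A controller would then be evaluated by feeding its output u(k) into `plant_step` and comparing the resulting $y_p$ with `y_ref`.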
The inputs to the MCPSO fuzzy controller are $y_p(k)$ and $y_r(k)$, and the output is $u(k)$. There are five fuzzy rules in the MCPSO fuzzy controller, i.e., r = 5, resulting in a total of 35 free parameters. Other parameters in applying MCPSO are the same as those used in Example 1. The fuzzy controller designed by PSO is also applied to the MISO control problem, and the parameters used in PSO are also the same as those defined in Example 1. The evolution is run for 100 generations and is repeated for 50 runs. The averaged best-so-far RMSE value over 50 runs for each generation is shown in Fig. 6.
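The "averaged best-so-far RMSE" curve can be computed as follows. This is a minimal sketch with hypothetical function names and toy data; it assumes that each run records one RMSE value per generation.

```python
def best_so_far(rmse_per_generation):
    """Running minimum of the RMSE values recorded in one run."""
    curve, best = [], float("inf")
    for value in rmse_per_generation:
        best = min(best, value)
        curve.append(best)
    return curve


def average_best_so_far(runs):
    """Average the best-so-far curves of several runs, generation by generation."""
    curves = [best_so_far(run) for run in runs]
    n_generations = len(curves[0])
    return [sum(c[g] for c in curves) / len(curves) for g in range(n_generations)]


# Toy example: two runs of three generations each (made-up numbers).
averaged = average_best_so_far([[0.5, 0.6, 0.3], [0.4, 0.2, 0.3]])
```

In the setting described above, `runs` would hold 50 lists of 100 values each.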
Fig. 5. Dynamical system control configuration with MCPSO fuzzy controller

[Figure 6: plot of RMSE (vertical axis, 0 to 1) versus iteration (horizontal axis, 0 to 100) for MCPSO and PSO]

Fig. 6. Average best-so-far RMSE in each generation for PSO and MCPSO in Example 2
From the figure, we can see that MCPSO converges faster than PSO and obtains a better result. In fact, because of the competitive relationship among the slave swarms, the master swarm is not influenced much when a certain slave swarm gets stuck at a local optimum. Avoiding premature convergence allows MCPSO to continue searching for the global optimum. The best and averaged RMSE for the 50 runs after 100 generations of training for each run are listed in Table 2, where the results of the methods GA and HGAPSO are from [6]. It should be noted that the TRFN controller designed by HGAPSO (or GA) is evolved for 100 generations and repeated for 100 runs in [6]. To test the performance of the designed fuzzy controller, another reference input r(k) is given by
\[
r(k) = 0.3\sin(2\pi k/50) + 0.2\sin(2\pi k/25) + 0.4\sin(2\pi k/60), \quad 251 \le k \le 500.
\]
[Figure 7: two panels, (a) and (b), each plotting output (vertical axis, -4 to 4) versus time step (horizontal axis, 0 to 250)]

Fig. 7. The tracking performance of the MCPSO controller in Example 2 for (a) training and (b) test reference output, where the desired output is shown as a solid curve and the actual output as a dotted curve
The best and averaged control performance for the test signal over 50 runs is also listed in Table 2. From the comparison results, we can see that the fuzzy controller based on MCPSO clearly outperforms those based on GA and PSO, especially in the test results, and reaches a control level comparable to that of the TRFN controller based on HGAPSO. To demonstrate the control performance of the MCPSO fuzzy controller for the MISO control problem, one control result of MCPSO is shown in Fig. 7 for both the training and the test reference output.
Table 2. Performance comparisons with different methods for Example 2

Method    RMSE (train mean)   RMSE (train best)   RMSE (test mean)   RMSE (test best)
GA        0.2150              0.1040              —                  —
PSO       0.1364              0.0526              0.1526             0.1024
HGAPSO    0.0890              0.0415              —                  —
MCPSO     0.1024              0.0518              0.1304             0.0704
5 Conclusions

This paper proposed a multi-population cooperative particle swarm optimizer to identify the T-S fuzzy model for processing nonlinear dynamic systems. In the simulation part, we applied the suggested method to design a fuzzy identifier for a nonlinear dynamic plant identification problem and a fuzzy controller for a nonlinear dynamic plant control problem. To demonstrate the effectiveness of the proposed algorithm, MCPSO, its performance was compared with that of several typical methods for dynamical systems.
Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 70431003) and the National Basic Research Program of China (No. 2002CB312200). The first author would like to thank Prof. Q.H. Wu of Liverpool University for many valuable comments. Helpful discussions with Dr. B. Ye, Dr. L.Y. Yuan and Dr. S. Liu are also gratefully acknowledged.
References

1. Narendra, K. S., Parthasarathy, K.: Adaptive identification and control of dynamical systems using neural networks. In: Proc. of the 28th IEEE Conf. on Decision and Control, Vol. 2. Tampa, Florida, USA (1989) 1737-1738
2. Narendra, K. S., Parthasarathy, K.: Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Networks 1 (1990) 4-27
3. Sastry, P. S., Santharam, G., Unnikrishnan, K. P.: Memory neural networks for identification and control of dynamical systems. IEEE Trans. Neural Networks 5 (1994) 306-319
4. Lee, C. H., Teng, C. C.: Identification and control of dynamic systems using recurrent fuzzy neural networks. IEEE Trans. Fuzzy Syst. 8 (2000) 349-366
5. Juang, C. F.: A TSK-type recurrent fuzzy network for dynamic systems processing by neural network and genetic algorithms. IEEE Trans. Fuzzy Syst. 10 (2002) 155-170
6. Juang, C. F.: A hybrid of genetic algorithm and particle swarm optimization for recurrent network design. IEEE Trans. Syst. Man Cyber. B 34 (2004) 997-1006
7. Tseng, C. S., Chen, B. S., Uang, H. J.: Fuzzy tracking control design for nonlinear dynamical systems via T-S fuzzy model. IEEE Trans. Fuzzy Syst. 9 (2001) 381-392
8. Chow, T. W. S., Yang, F.: A recurrent neural-network-based real-time learning control strategy applying to nonlinear systems with unknown dynamics. IEEE Trans. Industrial Electronics 45 (1998) 151-161
9. Gan, C., Danai, K.: Model-based recurrent neural network for modeling nonlinear dynamic systems. IEEE Trans. Syst., Man, Cyber. 30 (2000) 344-351
10. Juang, C. F., Lin, C. T.: A recurrent self-constructing neural fuzzy inference network. IEEE Trans. Neural Networks 10 (1999) 828-845
11. Wang, L. X., Mendel, J. M.: Back-propagation fuzzy systems as nonlinear dynamic systems identifiers. In: Proc. IEEE Int. Conf. Fuzzy Syst., San Diego, USA (1992) 1409-1418
12. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its application. IEEE Trans. Syst., Man, Cyber. 15 (1985) 116-132
13. Tanaka, K., Ikeda, T., Wang, H. O.: A unified approach to controlling chaos via an LMI-based fuzzy control system design. IEEE Trans. Circuits and Systems 45 (1998) 1021-1040
14. Karr, C. L.: Design of an adaptive fuzzy logic controller using a genetic algorithm. In: Proc. of 4th Int. Conf. Genetic Algorithms, San Diego, USA (1991) 450-457
15. Wang, C. H., Hong, T. P., Tseng, S. S.: Integrating fuzzy knowledge by genetic algorithms. IEEE Trans. Evol. Comput. 2 (1998) 138-149
16. Ishibuchi, H., Nakashima, T., Murata, T.: Performance evaluation of fuzzy classifier systems for multi dimensional pattern classification problems. IEEE Trans. Syst., Man, Cyber. B 29 (1999) 601-618
17. Eberhart, R. C., Kennedy, J.: A new optimizer using particle swarm theory. In: Proc. of Int. Sym. Micro Mach. Hum. Sci., Nagoya, Japan (1995) 39-43
18. Kennedy, J., Eberhart, R. C.: Particle swarm optimization. In: Proc. of IEEE Int. Conf. on Neural Networks, Piscataway, NJ (1995) 1942-1948
19. Zhang, C., Shao, H., Li, Y.: Particle swarm optimization for evolving artificial network. In: Proc. of IEEE Int. Conf. Syst., Man, Cyber., Vol. 4. Nashville, Tennessee, USA (2000) 2487-2490
20. Engelbrecht, A. P., Ismail, A.: Training product unit neural networks. Stability Control: Theory Appl. 2 (1999) 59-74
21. Mendes, R., Cortez, P., Rocha, M., Neves, J.: Particle swarms for feedforward neural network training. In: Proc. of Int. Joint Conf. on Neural Networks, Honolulu, USA (2002) 1895-1899
22. Angeline, P. J.: Evolutionary optimization versus particle swarm optimization: philosophy and performance difference. In: Proc. of the 7th Annual Conf. on Evolutionary Programming, San Diego, USA (1998) 601-610
23. Shi, Y., Eberhart, R. C.: A modified particle swarm optimizer. In: Proc. of IEEE Int. Conf. on Evolutionary Computation, Anchorage, USA (1998) 69-73
24. Moriarty, D., Miikkulainen, R.: Reinforcement learning through symbiotic evolution. Machine Learning 22 (1996) 11-32
25. Wiegand, R. P.: An analysis of cooperative coevolutionary algorithms. PhD thesis, George Mason University, Fairfax, Virginia, USA (2004)
Human Clustering for a Partner Robot Based on Computational Intelligence

Indra Adji Sulistijono^{1,2} and Naoyuki Kubota^{1,3}

1 Department of Mechanical Engineering, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, Tokyo 192-0397, Japan
2 Electronics Engineering Polytechnic Institute of Surabaya – ITS (EEPIS-ITS), Kampus ITS Sukolilo, Surabaya 60111, Indonesia
[email protected]
3 “Interaction and Intelligence”, PRESTO, Japan Science and Technology Corporation (JST)
[email protected]
Abstract. This paper proposes computational intelligence for the perceptual system of a partner robot. The robot requires the capability of visual perception to interact with a human. Basically, a robot should perform moving object extraction, clustering, and classification for the visual perception used in the interaction with a human. In this paper, we propose a total system for human clustering for a partner robot that uses a long-term memory, k-means, and a self-organizing map; a fuzzy controller is used for the motion output. The experimental results show that the partner robot can perform the human clustering.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1001-1010, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction

Recently, personal robots have been designed to entertain or to assist a human owner. Such a robot should be able to recognize a human, to interact with a human in natural communication, and to learn the interaction style with the human. Visual perception is especially important, because vision can carry much information for the interaction with the human. Image-based human tracking might play a prominent role in the next generation of surveillance systems and human-computer interfaces. Estimating the pose of the human body in a video stream is a difficult problem because of the significant variations in the appearance of the object throughout the sequence [16]. The robot might infer the intention of the human by using a map it has built, because a task depends on its environmental conditions. Furthermore, the robot should learn not only an environmental map, but also human gestures or postures in order to communicate with the human. Previous studies have proposed various image processing methods for robotic visual perception, such as differential filters, moving object detection, and pattern recognition [14,15]. In order to perform pattern recognition, the robot needs patterns or templates, but a human-friendly robot cannot know the patterns or templates for the persons beforehand. Therefore, the robot must learn patterns or templates through the interaction with the human.

For this purpose, we propose a method for visual tracking of a human. First of all, the robot extracts a human from an image taken by a built-in CCD camera. Since we assume that a human can move, a moving object is a candidate for the human. A long-term memory is used in order to extract a human from the taken image. The differential filter is used for detecting a moving object. Next, the color combination pattern is extracted by k-means. The robot classifies the detected human by a self-organizing map (SOM) based on the color pattern. The position of the human is obtained from the SOM node. Finally, the robot moves toward the detected human.

This paper is organized as follows. Section 2 explains a vision-based partner robot, and proposes a human recognition method based on the visual perception. Section 3 shows several experimental results and Section 4 summarizes the paper.
2 Visual Perception

2.1 A Partner Robot

We developed a partner robot, MOBiMac, as shown in Figure 1. This robot is designed to be used as a personal computer as well as a partner. Two CPUs are used for the PC and for the robotic behaviors, and the robot has two servo motors, four ultrasonic sensors, and a CCD camera. The ultrasonic sensors have a measuring range of 2000 mm. The robot can therefore take various behaviors such as collision avoidance, human approaching, and visual tracking.
[Figure 1: the robot (labels: CCD camera, ultrasonic sensor, PC unit) and a block diagram of its control architecture (blocks: CCD camera, wireless LAN, V25 CPU, control unit, RS-232C, host computer, sensor board, 8-channel ultrasonic sensor, PWM board, DC motors, UPP board, wireless keyboard and mouse, rotary encoder)]

Fig. 1. A partner robot, MOBiMac, and its control architecture
The robot takes an image from the CCD camera and extracts a human. If the robot detects the human, it extracts the color patterns of the human and then takes a visual tracking behavior (see also Figure 1). Figure 2 shows the total architecture of the visual tracking process; the detailed procedure is explained in the following.
[Figure 2: flow diagram with the stages: image taking; differential filter; k-means for color extraction; SOM for feature extraction; SOM for human clustering; combining for positioning; fuzzy inference; motion output]

Fig. 2. Total architecture of visual tracking
2.2 Differential Filter

In order to detect a human, we draw on recent topics in psychology [3-7]. In this paper, a long-term memory based on expectation is used to detect a human, who is regarded as a moving object. The robot has a long-term memory used as a background image GI(t) composed of $g_i$ (i = 1, 2, ..., l). A temporal image TI(t) is generated using the differences between the image at t and GI(t-1). Each pixel has a belief value $b_i$ satisfying 0.010 and
\[
\sum_{\rho=1}^{s} \Phi_\rho P_\rho + G_{jk}^T P_i + P_i G_{jk} < 0
\tag{10}
\]
where $G_{jk} = A_j - B_j F_k$ and i, j, k, ρ = 1, 2. P is a common positive definite matrix and satisfies $P^T = P$.

(Proof) The proof is omitted due to lack of space.

With the given $\Phi_\rho$, the regional feedback gains $F_j$ can be determined with the PDC scheme and the LMI approach proposed in ref. [4]. With this condition, the closed-loop FSM system can guarantee both stability and smoothness.

Remark: It is difficult to determine the $\phi_\rho$'s so as to satisfy (9). Reference [3] proposes a new PDC design for the case in which the time derivatives of the membership functions are computable from the states. This paper employs the algorithm proposed in ref. [3].
4 Example

Consider a fuzzy switching system whose FSM model rules and controller rules are as follows:

Model rules:
Regional Rule i: If $x_2(t)$ is $N_i$ Then
Local Plant Model j: If $x_2(t)$ is $M_{ij}$ Then $\dot{x}(t) = A_{ij} x(t) + B_{ij} u(t)$

Controller rules:
Regional Rule i: If $x_2(t)$ is $N_i$ Then
Local Controller Model j: If $x_2(t)$ is $M_{ij}$ Then $u = -F_{ij} x(t)$

where i = 1, 2; j = 1, 2. $x(t) = [x_1(t); x_2(t)]$ is the state variable and the $N_i$ are fuzzy sets. The regional membership functions are assigned as follows: x2 (t )0 .

There are some constraints on choosing the parameters k and b, as follows: (1) |x/α(x)| δE, where δ is around 1. Input variables after contraction should still be in the UOD. (2) 0 0) is a parameter.

Expanding the joint probability distribution function $P(x_k, y_k)$ in an orthogonal form based on the product of $P(x_k)$ and $P(y_k)$, the following expression for the conditional probability distribution function $P(y_k|x_k)$ can be derived:
\[
P(y_k|x_k) = \frac{P(x_k, y_k)}{P(x_k)} = P(y_k) \sum_{r=0}^{R} \sum_{s=0}^{S} A_{rs}\, \theta_r^{(1)}(x_k)\, \theta_s^{(2)}(y_k),
\tag{2}
\]
\[
A_{rs} = \langle \theta_r^{(1)}(x_k)\, \theta_s^{(2)}(y_k) \rangle, \quad (A_{00} = 1,\ A_{r0} = A_{0s} = 0\ (r, s = 1, 2, \cdots)),
\tag{3}
\]
where $\langle\ \rangle$ denotes the averaging operation on the variables. The linear and non-linear correlation information between $x_k$ and $y_k$ is reflected hierarchically in each expansion coefficient $A_{rs}$. The functions $\theta_r^{(1)}(x_k)$ and $\theta_s^{(2)}(y_k)$ are orthonormal polynomials with the weighting functions $P(x_k)$ and $P(y_k)$ respectively, and can be constructed by using Schmidt's orthogonalization [6]. Though (2) is originally an infinite series expansion, a finite expansion series with r ≤ R and s ≤ S is adopted because only finite expansion coefficients are available and the consideration of the expansion coefficients from the first few terms is usually sufficient in practice. Since the objective system contains an unknown specific signal and unknown structure, the expansion coefficients $A_{rs}$, which express hierarchically the statistical relationship between $x_k$ and $y_k$, must be estimated on the basis of the fuzzy observation $z_k$. Considering the expansion coefficients $A_{rs}$ as an unknown parameter vector a:
\[
a = (a_1, a_2, \cdots, a_I) = (a_{(1)}, a_{(2)}, \cdots, a_{(S)}), \quad a_{(s)} = (A_{1s}, A_{2s}, \cdots, A_{Rs}), \quad (s = 1, 2, \cdots, S),
\tag{4}
\]
the following simple dynamical model is introduced for the simultaneous estimation of the parameters with the specific signal $x_k$:
\[
a_{k+1} = a_k, \quad (a_k = (a_{1,k}, a_{2,k}, \cdots, a_{I,k}) = (a_{(1),k}, a_{(2),k}, \cdots, a_{(S),k})),
\tag{5}
\]
where I (= RS) is the number of unknown expansion coefficients to be estimated. On the other hand, based on the correlative property in the time domain for the specific signal fluctuating with non-Gaussian property, the following time transition model for the specific signal is generally established:
\[
x_{k+1} = F x_k + G u_k,
\tag{6}
\]
where $u_k$ is the random input with mean 0 and variance $\sigma_u^2$. The two parameters F and G are estimated by using a system identification method [8]. A method to estimate $x_k$ based on the fuzzy observation $z_k$ is derived in this study by introducing a fuzzy probability theory and an expansion series expression of the conditional probability distribution function.
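To make the role of the expansion coefficients in (3) more concrete, the following sketch estimates $A_{rs}$ from paired samples. It is an illustration under assumptions, not the estimation algorithm of this paper: the weighting functions $P(x_k)$ and $P(y_k)$ are replaced by the empirical distributions of the samples, the orthonormal polynomials are built by Gram-Schmidt orthogonalization of the monomials 1, x, x^2, ..., and all function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def orthonormal_basis(samples, degree):
    """Gram-Schmidt on 1, x, ..., x^degree under <u v> = mean(u * v).

    Returns basis[r][i] = theta_r(samples[i]). The samples must contain
    more than `degree` distinct values, otherwise a norm becomes zero.
    """
    basis = []
    for r in range(degree + 1):
        v = [x ** r for x in samples]
        for b in basis:
            proj = mean([vi * bi for vi, bi in zip(v, b)])
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = mean([vi * vi for vi in v]) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis


def expansion_coefficients(xs, ys, R, S):
    """Sample estimate of A_rs = <theta_r(x) theta_s(y)> for r <= R, s <= S."""
    theta_x = orthonormal_basis(xs, R)
    theta_y = orthonormal_basis(ys, S)
    return [[mean([a * b for a, b in zip(theta_x[r], theta_y[s])])
             for s in range(S + 1)]
            for r in range(R + 1)]


# Toy data with an exactly linear relation y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.0 * x + 1.0 for x in xs]
A = expansion_coefficients(xs, ys, R=2, S=2)
```

For this toy data the sketch reproduces the properties stated in (3), $A_{00} = 1$ and $A_{r0} = A_{0s} = 0$, and gives $A_{11} = 1$ because the relation between x and y is perfectly linear.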
3 State Estimation Based on Fuzzy Observation
In order to derive an estimation algorithm for a specific signal $x_k$, based on the successive observations of fuzzy data $z_k$, we focus our attention on Bayes' theorem [9] for the conditional probability distribution. Since the parameter $a_k$ is also unknown, the conditional probability distribution of $x_k$ and $a_k$ is considered:
\[
P(x_k, a_k|Z_k) = \frac{P(x_k, a_k, z_k|Z_{k-1})}{P(z_k|Z_{k-1})},
\tag{7}
\]
where $Z_k$ (= {$z_1, z_2, \cdots, z_k$}) is the set of observation data up to time k. After applying fuzzy probability [10] to the right side of (7) and expanding it in a general form of the statistical orthogonal expansion series [11], the conditional probability density function $P(x_k, a_k|Z_k)$ can be expressed as:
\[
P(x_k, a_k|Z_k) = \frac{\int \mu_{z_k}(y_k)\, P(x_k, a_k, y_k|Z_{k-1})\, dy_k}{\int \mu_{z_k}(y_k)\, P(y_k|Z_{k-1})\, dy_k}
= \frac{\sum_{l=0}^{\infty} \sum_{\mathbf{m}=0}^{\infty} \sum_{n=0}^{\infty} B_{l\mathbf{m}n}\, P_0(x_k|Z_{k-1})\, P_0(a_k|Z_{k-1})\, \psi_l^{(1)}(x_k)\, \psi_{\mathbf{m}}^{(2)}(a_k)\, I_n(z_k)}{\sum_{n=0}^{\infty} B_{00n}\, I_n(z_k)}
\tag{8}
\]
with
\[
I_n(z_k) = \int \mu_{z_k}(y_k)\, P_0(y_k|Z_{k-1})\, \psi_n^{(3)}(y_k)\, dy_k,
\tag{9}
\]
\[
B_{l\mathbf{m}n} = \langle \psi_l^{(1)}(x_k)\, \psi_{\mathbf{m}}^{(2)}(a_k)\, \psi_n^{(3)}(y_k) \,|\, Z_{k-1} \rangle, \quad (\mathbf{m} = (m_1, m_2, \cdots, m_I)).
\tag{10}
\]
The functions $\psi_l^{(1)}(x_k)$, $\psi_{\mathbf{m}}^{(2)}(a_k)$ and $\psi_n^{(3)}(y_k)$ are the orthogonal polynomials of degrees l, $\mathbf{m}$ and n with weighting functions $P_0(x_k|Z_{k-1})$, $P_0(a_k|Z_{k-1})$ and $P_0(y_k|Z_{k-1})$, which can be artificially chosen as the probability density functions describing the dominant parts of $P(x_k|Z_{k-1})$, $P(a_k|Z_{k-1})$ and $P(y_k|Z_{k-1})$. Based on (8), and using the orthonormal relationships of $\psi_l^{(1)}(x_k)$ and $\psi_{\mathbf{m}}^{(2)}(a_k)$, the recurrence algorithm for estimating an arbitrary (L, $\mathbf{M}$)th order polynomial type function $f_{L,\mathbf{M}}(x_k, a_k)$ of $x_k$ and $a_k$ can be derived as follows:
\[
\hat{f}_{L,\mathbf{M}}(x_k, a_k) = \langle f_{L,\mathbf{M}}(x_k, a_k) \,|\, Z_k \rangle
= \frac{\sum_{l=0}^{L} \sum_{\mathbf{m}=0}^{\mathbf{M}} \sum_{n=0}^{\infty} C_{l\mathbf{m}}^{L\mathbf{M}}\, B_{l\mathbf{m}n}\, I_n(z_k)}{\sum_{n=0}^{\infty} B_{00n}\, I_n(z_k)},
\tag{11}
\]
where $C_{l\mathbf{m}}^{L\mathbf{M}}$ is the expansion coefficient determined by the equality:
\[
f_{L,\mathbf{M}}(x_k, a_k) = \sum_{l=0}^{L} \sum_{\mathbf{m}=0}^{\mathbf{M}} C_{l\mathbf{m}}^{L\mathbf{M}}\, \psi_l^{(1)}(x_k)\, \psi_{\mathbf{m}}^{(2)}(a_k).
\tag{12}
\]
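In order to make the general theory for the estimation algorithm more concrete, the well-known Gaussian distribution is adopted as $P_0(x_k|Z_{k-1})$, $P_0(a_k|Z_{k-1})$ and $P_0(y_k|Z_{k-1})$, because this probability density function is the most standard one:
\[
P_0(x_k|Z_{k-1}) = N(x_k; x_k^*, \Gamma_{x_k}), \quad
P_0(a_k|Z_{k-1}) = \prod_{i=1}^{I} N(a_{i,k}; a_{i,k}^*, \Gamma_{a_{i,k}}), \quad
P_0(y_k|Z_{k-1}) = N(y_k; y_k^*, \Omega_k)
\tag{13}
\]
with
\[
N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl\{-\frac{(x-\mu)^2}{2\sigma^2}\Bigr\},
\]
\[
x_k^* = \langle x_k|Z_{k-1} \rangle, \quad \Gamma_{x_k} = \langle (x_k - x_k^*)^2|Z_{k-1} \rangle,
\]
\[
a_{i,k}^* = \langle a_{i,k}|Z_{k-1} \rangle, \quad \Gamma_{a_{i,k}} = \langle (a_{i,k} - a_{i,k}^*)^2|Z_{k-1} \rangle,
\]
\[
y_k^* = \langle y_k|Z_{k-1} \rangle, \quad \Omega_k = \langle (y_k - y_k^*)^2|Z_{k-1} \rangle.
\tag{14}
\]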
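Then, the orthonormal functions with the three weighting probability density functions in (13) can be given by Hermite polynomials [12]:
\[
\psi_l^{(1)}(x_k) = \frac{1}{\sqrt{l!}} H_l\Bigl(\frac{x_k - x_k^*}{\sqrt{\Gamma_{x_k}}}\Bigr), \quad
\psi_{\mathbf{m}}^{(2)}(a_k) = \prod_{i=1}^{I} \frac{1}{\sqrt{m_i!}} H_{m_i}\Bigl(\frac{a_{i,k} - a_{i,k}^*}{\sqrt{\Gamma_{a_{i,k}}}}\Bigr), \quad
\psi_n^{(3)}(y_k) = \frac{1}{\sqrt{n!}} H_n\Bigl(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\Bigr).
\tag{15}
\]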
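The orthonormality of the functions in (15) under a Gaussian weight can be checked numerically. The sketch below assumes that $H_n$ denotes the probabilists' Hermite polynomial (the convention under which the factor $1/\sqrt{n!}$ normalizes the polynomials for a Gaussian weight); the function names and the numerical values of the mean and variance are illustrative assumptions.

```python
import math


def hermite(n: int, u: float) -> float:
    """Probabilists' Hermite polynomial via He_{j+1}(u) = u He_j(u) - j He_{j-1}(u)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, u
    for j in range(1, n):
        h_prev, h = h, u * h - j * h_prev
    return h


def psi(n: int, y: float, mean: float, var: float) -> float:
    """Normalized Hermite function of Eq. (15) for a Gaussian weight N(y; mean, var)."""
    return hermite(n, (y - mean) / math.sqrt(var)) / math.sqrt(math.factorial(n))


def gaussian(y: float, mean: float, var: float) -> float:
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def inner_product(m, n, mean, var, half_width=12.0, steps=4000):
    """Riemann-sum approximation of the integral of psi_m psi_n N(y; mean, var)."""
    sd = math.sqrt(var)
    lo = mean - half_width * sd
    dy = 2 * half_width * sd / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * dy
        total += psi(m, y, mean, var) * psi(n, y, mean, var) * gaussian(y, mean, var)
    return total * dy


# Illustrative values: mean 70 and variance 25 (e.g. a sound level in dB).
same = inner_product(2, 2, 70.0, 25.0)       # close to 1
different = inner_product(1, 3, 70.0, 25.0)  # close to 0
```

The first value is close to 1 and the second close to 0, which is the orthonormal relationship used to derive (11).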
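Accordingly, by considering (1), (13) and (15), (9) can be given by
\[
I_n(z_k) = \frac{e^{K_3(z_k)}}{\sqrt{2K_1\Omega_k}} \int \frac{1}{\sqrt{\pi/K_1}} \exp\Bigl\{-\frac{(y_k - K_2(z_k))^2}{1/K_1}\Bigr\}
\cdot \frac{1}{\sqrt{n!}} \sum_{r=0}^{n} d_{nr}\, H_r\Bigl(\frac{y_k - K_2(z_k)}{\sqrt{1/2K_1}}\Bigr)\, dy_k
\tag{16}
\]
with
\[
K_1 = \frac{2\alpha\Omega_k + 1}{2\Omega_k}, \quad
K_2(z_k) = \frac{2\alpha\Omega_k z_k + y_k^*}{2\alpha\Omega_k + 1}, \quad
K_3(z_k) = K_1\Bigl\{K_2^2(z_k) - \frac{2\alpha\Omega_k z_k^2 + y_k^{*2}}{2\alpha\Omega_k + 1}\Bigr\},
\tag{17}
\]
where the fuzzy data $z_k$ are reflected in $K_2(z_k)$ and $K_3(z_k)$. Furthermore, $d_{nr}$ (r = 0, 1, 2, ..., n) are the expansion coefficients in the equality:
\[
H_n\Bigl(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\Bigr) = \sum_{r=0}^{n} d_{nr}\, H_r\Bigl(\frac{y_k - K_2(z_k)}{\sqrt{1/2K_1}}\Bigr).
\tag{18}
\]
By considering the orthonormal condition of the Hermite polynomials [12], (16) can be expressed as follows:
\[
I_n(z_k) = \frac{e^{K_3(z_k)}}{\sqrt{2K_1\Omega_k\, n!}}\, d_{n0},
\tag{19}
\]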
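where a few concrete expressions of $d_{n0}$ in (19) are as follows:
\[
d_{00} = 1, \quad
d_{10} = \frac{1}{\sqrt{\Omega_k}}\,(K_2(z_k) - y_k^*), \quad
d_{20} = \frac{1}{2K_1\Omega_k} + \frac{(K_2(z_k) - y_k^*)^2}{\Omega_k} - 1,
\]
\[
d_{30} = \frac{3(K_2(z_k) - y_k^*)}{2K_1\Omega_k^{3/2}} - \frac{3(K_2(z_k) - y_k^*)}{\sqrt{\Omega_k}} + \frac{(K_2(z_k) - y_k^*)^3}{\Omega_k^{3/2}},
\]
\[
d_{40} = \frac{3}{(2\Omega_k K_1)^2} + \frac{3\{(K_2(z_k) - y_k^*)^2 - \Omega_k\}}{\Omega_k^2 K_1}
+ \frac{(K_2(z_k) - y_k^*)^4 - 6\Omega_k (K_2(z_k) - y_k^*)^2 + 3\Omega_k^2}{\Omega_k^2}.
\tag{20}
\]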
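The closed form (19) can be cross-checked numerically for n = 0, where $d_{00} = 1$ and hence $I_0(z_k) = e^{K_3(z_k)} / \sqrt{2K_1\Omega_k}$. The sketch below assumes a Gaussian-type membership function $\mu_{z_k}(y_k) = \exp\{-\alpha (y_k - z_k)^2\}$; this form is consistent with the expressions for $K_1$, $K_2$ and $K_3$ in (17), but it is stated here as an assumption. The function names and numerical values are illustrative only.

```python
import math


def k_terms(z, y_star, omega, alpha):
    """K1, K2, K3 of Eq. (17)."""
    k1 = (2 * alpha * omega + 1) / (2 * omega)
    k2 = (2 * alpha * omega * z + y_star) / (2 * alpha * omega + 1)
    k3 = k1 * (k2 ** 2
               - (2 * alpha * omega * z ** 2 + y_star ** 2) / (2 * alpha * omega + 1))
    return k1, k2, k3


def i0_closed_form(z, y_star, omega, alpha):
    """Eq. (19) with n = 0 and d_00 = 1."""
    k1, _, k3 = k_terms(z, y_star, omega, alpha)
    return math.exp(k3) / math.sqrt(2 * k1 * omega)


def i0_numeric(z, y_star, omega, alpha, half_width=12.0, steps=4000):
    """Riemann sum of the integral in Eq. (9) for n = 0.

    The membership function exp(-alpha * (y - z)^2) is an assumed form.
    """
    sd = math.sqrt(omega)
    lo = y_star - half_width * sd
    dy = 2 * half_width * sd / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * dy
        membership = math.exp(-alpha * (y - z) ** 2)
        density = math.exp(-(y - y_star) ** 2 / (2 * omega)) / math.sqrt(2 * math.pi * omega)
        total += membership * density
    return total * dy


closed = i0_closed_form(z=4.0, y_star=3.0, omega=2.0, alpha=0.5)
numeric = i0_numeric(z=4.0, y_star=3.0, omega=2.0, alpha=0.5)
```

For these values, $K_1 = 3/4$, $K_2 = 11/3$ and $K_3 = -1/6$, so the closed form equals $e^{-1/6}/\sqrt{3}$, which the numerical integral reproduces.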
Using the property of conditional expectation and (2), the two variables $y_k^*$ and $\Omega_k$ in (20) can be expressed in functional forms of the predictions of $x_k$ and $a_k$ at a discrete time k (i.e., the expectation values of arbitrary functions of $x_k$ and $a_k$ conditioned on $Z_{k-1}$), as follows:
\[
y_k^* = \langle y_k|Z_{k-1} \rangle
= \Bigl\langle \sum_{r=0}^{\infty} \sum_{s=0}^{1} e_{1s}\, A_{rs}\, \theta_r^{(1)}(x_k) \,\Big|\, Z_{k-1} \Bigr\rangle
= \sum_{s=0}^{1} e_{1s} \langle A_{(s),k}\, \Theta(x_k)|Z_{k-1} \rangle,
\tag{21}
\]
\[
\Omega_k = \langle (y_k - y_k^*)^2|Z_{k-1} \rangle
= \Bigl\langle \sum_{r=0}^{\infty} \sum_{s=0}^{2} e_{2s}\, A_{rs}\, \theta_r^{(1)}(x_k) \,\Big|\, Z_{k-1} \Bigr\rangle
= \sum_{s=0}^{2} e_{2s} \langle A_{(s),k}\, \Theta(x_k)|Z_{k-1} \rangle
\tag{22}
\]
with
\[
A_{(s),k} = (0, a_{(s),k}), \ (s = 1, 2, \cdots), \quad A_{(0),k} = (1, 0, 0, \cdots, 0), \quad
\Theta(x_k) = (\theta_0^{(1)}(x_k), \theta_1^{(1)}(x_k), \cdots, \theta_R^{(1)}(x_k))^T,
\tag{23}
\]
where T denotes the transpose of a matrix. The coefficients $e_{1s}$ and $e_{2s}$ in (21) and (22) are determined in advance by expanding $y_k$ and $(y_k - y_k^*)^2$ in the following orthogonal series forms:
\[
y_k = \sum_{i=0}^{1} e_{1i}\, \theta_i^{(2)}(y_k), \quad
(y_k - y_k^*)^2 = \sum_{i=0}^{2} e_{2i}\, \theta_i^{(2)}(y_k).
\tag{24}
\]
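Furthermore, using (2) and the orthonormal condition of $\theta_i^{(2)}(y_k)$, each expansion coefficient $B_{l\mathbf{m}n}$ defined by (10) can be obtained through a calculation process similar to that of (21) and (22), as follows:
\[
B_{l\mathbf{m}n} = \Bigl\langle \psi_l^{(1)}(x_k)\, \psi_{\mathbf{m}}^{(2)}(a_k) \int \psi_n^{(3)}(y_k)\, P(y_k|x_k)\, dy_k \,\Big|\, Z_{k-1} \Bigr\rangle
= \sum_{s=0}^{n} e_{ns} \langle \psi_l^{(1)}(x_k)\, \psi_{\mathbf{m}}^{(2)}(a_k)\, A_{(s),k}\, \Theta(x_k)|Z_{k-1} \rangle,
\tag{25}
\]
\[
\Bigl(\psi_n^{(3)}(y_k) = \sum_{i=0}^{n} e_{ni}\, \theta_i^{(2)}(y_k), \quad e_{ni}:\ \text{appropriate coefficients}\Bigr).
\]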
In the above, the expansion coefficient $B_{l\mathbf{m}n}$ can be given by the predictions of $x_k$ and $a_k$. Finally, by considering (5) and (6), the prediction step to perform the recurrence estimation can be given for an arbitrary function $g_{L,\mathbf{M}}(x_{k+1}, a_{k+1})$ of (L, $\mathbf{M}$)th order, as follows:
\[
g_{L,\mathbf{M}}^*(x_{k+1}, a_{k+1}) = \langle g_{L,\mathbf{M}}(x_{k+1}, a_{k+1})|Z_k \rangle
= \langle g_{L,\mathbf{M}}(F x_k + G u_k, a_k)|Z_k \rangle.
\tag{26}
\]
The above equation means that the predictions of xk+1 and ak+1 at a discrete time k are given in the form of estimates for the polynomial functions of xk and ak . Therefore, by combining the estimation algorithm of (11) with the prediction algorithm of (26), the recurrence estimation of the specific signal can be achieved.
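For the first two moments of the specific signal, the prediction step (26) reduces to a familiar form. The sketch below is a simplified illustration rather than the full algorithm: it assumes that $u_k$ has zero mean and is uncorrelated with $x_k$ given $Z_k$, and the function name and numerical values are hypothetical.

```python
def predict_signal(x_est, var_est, F, G, sigma_u2):
    """Predicted mean and variance of x_{k+1} under x_{k+1} = F x_k + G u_k.

    Assumes u_k has zero mean, variance sigma_u2, and is uncorrelated
    with x_k given the observations up to time k.
    """
    x_pred = F * x_est                                 # <x_{k+1} | Z_k>
    var_pred = F * F * var_est + G * G * sigma_u2      # <(x_{k+1} - x_pred)^2 | Z_k>
    return x_pred, var_pred


# Illustrative numbers only.
x_pred, var_pred = predict_signal(x_est=70.0, var_est=4.0, F=0.9, G=1.0, sigma_u2=2.0)
```

Because of (5), the corresponding prediction for the parameters is simply the current estimate, with unchanged variance.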
4 Application to Psychological Evaluation for Noise Annoyance
Finding the quantitative relationship between human noise annoyance and the physical sound level of environmental noises is important from the viewpoint of noise assessment. In particular, in the evaluation of a regional sound environment, an investigation based on questionnaires to the regional inhabitants is often carried out when experimental measurement at every instant and at every point in the whole area of the region is difficult. Therefore, it is very important to estimate the sound level based on human noise annoyance data. It has been reported in psychological acoustics that human noise annoyance can be distinguished on 7 annoyance scores, for instance, 1. very calm, 2. calm, 3. mostly calm, 4. little noisy, 5. noisy, 6. fairly noisy, 7. very noisy [2]. Road traffic noise was recorded by use of a sound level meter and a data recorder, and the recorded tape was replayed through an amplifier and a loudspeaker in a laboratory room. Six female subjects (A, B, ..., F) aged 22-24 with normal hearing ability judged one score among the 7 noise annoyance scores (i.e., 1, 2, ..., 7) every 5 s, according to their impressions of the loudness at each moment, using the 7 categories from very calm to very noisy. Two kinds of data (Data 1 and Data 2) were used, namely, sound level data of road traffic noise with mean values of 71.4 dB and 80.2 dB. The proposed method was applied to the estimation of the time series $x_k$ of the sound level of road traffic noise based on the successive judgments $z_k$ of human annoyance scores, after regarding $z_k$ as fuzzy observation data. Figure 1 shows one of the estimated results of the waveform fluctuation of the sound level. In this figure, the horizontal axis shows the discrete time k of the estimation process, and the vertical axis represents the sound level. For comparison, the estimated result obtained by a method without considering fuzzy theory is also shown in this figure. More specifically, an algorithm similar to our previously reported methods [6,7], based on expansion expressions of Bayes' theorem without considering the membership function in (8), is applied to estimate the specific signal $x_k$. There are great discrepancies between the true values and the estimates based on the method without considering the membership function, while the proposed method precisely estimates the waveform of the sound level with rapidly changing fluctuation. The root mean squared errors of the estimation are shown in Table 1 (for Data 1) and Table 2 (for Data 2). The proposed method gives more accurate estimates than the method without fuzzy theory in nearly all cases; the only exception is subject F for Data 2, where the two errors are practically equal (4.65 dB and 4.64 dB).
[Figure 1: plot of the sound level $x_k$ in dBA (vertical axis, 25 to 100) versus k (horizontal axis, 0 to 120); legend: true values, estimated results by considering fuzzy theory, estimated results without considering fuzzy theory]

Fig. 1. Estimation results of the fluctuation waveform of the sound level based on the successive judgment on human annoyance scores by subject A (for Data 1)
Table 1. Root mean squared error of the estimation in [dB] (for Data 1)

Subject            A      B      C      D      E      F
Proposed Method    3.65   3.63   4.51   4.62   4.89   4.56
Compared Method    7.55   4.10   15.8   5.06   5.13   5.75

Table 2. Root mean squared error of the estimation in [dB] (for Data 2)

Subject            A      B      C      D      E      F
Proposed Method    4.59   4.26   4.82   6.80   7.49   4.65
Compared Method    10.7   7.79   4.96   14.6   11.6   4.64
5 Conclusion
In this paper, based on observed data with fuzziness, a new method for estimating the specific signal for sound environment systems with unknown structure has been proposed. The proposed estimation method has been realized by introducing a system model of conditional probability type and a fuzzy probability theory. The proposed method has been applied to the estimation of an actual sound environment, and it has been experimentally verified that better results are obtained as compared with the method without considering fuzzy theory.

The proposed approach is quite different from the traditional standard approaches. It is still at an early stage of development, and a number of practical problems are yet to be investigated in the future. These include:
(i) Application of the proposed state estimation method to a diverse range of practical estimation problems for sound environment systems with unknown structure and fuzzy observation.
(ii) Extension of the proposed method to cases with multi-dimensional state variables and multi-source configurations.
(iii) Finding an optimal number of expansion terms in the proposed estimation algorithm of expansion expression type.
(iv) Extension of the proposed theory to the actual situation in which external noise (e.g., background noise) exists.
References

1. Ohta, M., Ikuta, A.: An acoustic signal processing for generalized regression analysis with reduced information loss based on data observed with amplitude limitation. Acustica 81 (1995) 129-135
2. Namba, S., Kuwano, S., Nakamura, T.: Rating of road traffic noise using the method of continuous judgement by category. J. Acoust. Soc. Japan 34 (1978) 29-34
3. Kalman, R. E.: A new approach to linear filtering and prediction problem. Trans. ASME, Series D, J. Basic Engineering 82 (1960) 35-45
4. Kalman, R. E., Bucy, R.: New results in linear filtering and prediction theory. Trans. ASME, Series D, J. Basic Engineering 83 (1961) 95-108
5. Kushner, H. J.: Approximations to optimal nonlinear filter. IEEE Trans. Autom. Control AC-12 (1967) 546-556
6. Ohta, M., Yamada, H.: New methodological trials of dynamical state estimation for the noise and vibration environmental system - Establishment of general theory and its application to urban noise problems. Acustica 55 (1984) 199-212
7. Ikuta, A., Ohta, M.: A state estimation method of impulsive signal using digital filter under the existence of external noise and its application to room acoustics. IEICE Trans. Fundamentals E75-A (1992) 988-995
8. Eykhoff, P.: System identification: parameter and state estimation. John Wiley & Sons (1974)
9. Suzuki, Y., Kunitomo, N.: Bayes statistics and its application. Tokyo University Press (1989) 50-52
10. Zadeh, L. A.: Probability measures of fuzzy events. J. Math. Anal. Appl. 23 (1968) 421-427
11. Ohta, M., Koizumi, T.: General treatment of the response of a nonlinear rectifying device to a stationary random input. IEEE Trans. Inf. Theory IT-14 (1968) 595-598
12. Cramer, H.: Mathematical methods of statistics. Princeton University Press (1951) 133, 221-227
Preventing Meaningless Stock Time Series Pattern Discovery by Changing Perceptually Important Point Detection

Tak-chung Fu^{1,2,†}, Fu-lai Chung^{1}, Robert Luk^{1}, and Chak-man Ng^{2}

1 Department of Computing, The Hong Kong Polytechnic University, Hong Kong.
{cstcfu, cskchung, csrluk}@comp.polyu.edu.hk
2 Department of Computing and Information Management, Hong Kong Institute of Vocational Education (Chai Wan), Hong Kong.
[email protected]
Abstract. Discovery of interesting or frequently appearing time series patterns is one of the important tasks in various time series data mining applications. However, recent research has criticized the discovery of subsequence patterns in time series by clustering approaches as meaningless. This is due to the presence of trivially matched subsequences when the time series subsequences are formed by the sliding window method. The objective of this paper is to propose a threshold-free approach that improves the method for segmenting long stock time series into subsequences using a sliding window. The proposed approach filters the trivially matched subsequences by detecting changes in the Perceptually Important Points (PIPs) and reduces the dimension by PIP identification.
1 Introduction
When time series data are divided into subsequences, interesting patterns can be discovered, and the data become easier to query, understand and mine. Therefore, the discovery of frequently appearing time series patterns, called surprising patterns in [1], has become one of the important tasks in various time series data mining applications. A common technique for time series pattern discovery is clustering. However, applying clustering to time series subsequences in order to discover frequently appearing patterns has recently been criticized as meaningless [2]. The reason is that when a sliding window with a fixed window size is used to discretize a long time series into subsequences, trivial match subsequences always exist, and their existence leads to the discovery of patterns that are derivations of a sine curve. A subsequence is said to be a trivial match when it is similar to an adjacent subsequence formed by the sliding window: the best matches to a subsequence, apart from itself, tend to be the subsequences that begin just one or two points to the left or the right of the original subsequence [3]. Therefore, it is necessary to prevent the over-counting of these trivial matches. For example, in Fig.1, the shapes of S1, S2 and S3 are similar to a head and shoulders (H&S) pattern, while the shape of S4 is completely different from them. Therefore, S2 and S3 should be considered as trivial matches to S1, and only S1 and S4 should be considered in this case.
† Corresponding Author.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1171–1174, 2005. © Springer-Verlag Berlin Heidelberg 2005
[Figure: four subsequences S1, S2, S3 and S4 of a time series, with the identified PIPs marked]
Fig. 1. Trivial match in time series subsequences (with PIPs identified in each subsequence)
References [3,4] define the most frequently appearing pattern in a time series P (called the most significant motif, or 1-motif, in [3]) as the subsequence S1 that has the highest count of non-trivial matches. Accordingly, the Kth most frequently appearing pattern (the Kth most significant motif, or K-motif) in P is the subsequence SK that has the Kth highest count of non-trivial matches and satisfies D(SK, Si) > 2R for all 1 ≤ i < K. However, it is difficult to define the threshold R that distinguishes trivial from non-trivial matches: it is case dependent and there is no general rule for choosing its value. Furthermore, reference [2] suggested applying a classic clustering algorithm, in place of subsequence time series clustering, to cluster only the motifs discovered by a K-motif detection algorithm.
The objective of this paper is to develop a threshold-free approach that improves the method for segmenting long stock time series into subsequences with a sliding window. The goal is redefined as filtering out all the trivially matched subsequences formed by the sliding window. The remaining subsequences are considered non-trivial matches and are passed on to the discovery of frequently appearing patterns.
2 The Proposed Frequently Appearing Pattern Discovery Process
Given a time series $P = \{p_1, \ldots, p_m\}$ and a sliding window of fixed width $w$, a set of time series segments $W(P) = \{S_i = [p_i, \ldots, p_{i+w-1}] \mid i = 1, \ldots, m-w+1\}$ can be formed. To identify trivial matches among the subsequences formed by the sliding window, a method based on detecting changes in the identified Perceptually Important Points (PIPs) is introduced. PIP identification was first proposed in reference [5] for dimensionality reduction and pattern matching of stock time series. It identifies the critical points of the time series, since the general shape of a stock time series is typically characterized by a few points.
The PIPs identified in two consecutive subsequences are compared. If the same set of PIPs is identified in both, a trivial match has occurred and the second subsequence can be ignored. Otherwise, the two subsequences are a non-trivial match and both should be considered as subsequence candidates for the pattern discovery process. This process runs from the first subsequence obtained with the sliding window to the end of the series. In Fig.1, the same set of PIPs is identified in the subsequences S1, S2 and S3. Therefore, the matches of S2 and S3 with S1 should be considered trivial, and S2 and S3 should be filtered out. On the other hand, the set of PIPs obtained from subsequence S4 is different from that of subsequence S3, which means that they are a non-trivial match. Therefore, S1 and S4 are identified as the subsequence candidates.
After all the trivially matched subsequences are filtered out, the remaining subsequences are considered non-trivial matches and serve as the candidates for the further discovery of frequently appearing patterns. They are the input patterns for training the clustering process, to which the k-means clustering technique can be applied. The trained algorithm is expected to produce a set of patterns $M_1, \ldots, M_k$ that represent different structures or time series patterns of the data, where $k$ is the number of output patterns. Although the clustering procedure can be applied directly to the subsequence candidates, it is quite time consuming when a large number of data points (high dimension) is considered. By compressing the input patterns with the PIP identification algorithm, dimensionality reduction can be achieved (Fig.2).
[Figure: three panels, each showing discovered patterns before and after PIP compression: (a) patterns discovered from the original 14 data points and compressed to 9 data points; (b) patterns discovered from the original 30 data points and compressed to 9 data points; (c) patterns discovered from the original 90 data points and compressed to 9 data points]
Fig. 2. Examples of dimensionality reduction based on PIP identification process
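The filtering step of Section 2 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the text does not spell out the PIP procedure of [5], so the sketch assumes that the two end points of a window are the initial PIPs and that each further PIP is the point with the largest vertical distance to the line joining its neighbouring PIPs. It also assumes that two consecutive windows have "the same set of PIPs" when their interior PIPs (end points excluded) fall on the same positions of the original series. The names `find_pips` and `nontrivial_candidates` are hypothetical.

```python
def vertical_distance(series, left, right, i):
    """Vertical distance from point i to the line joining points left and right."""
    slope = (series[right] - series[left]) / (right - left)
    return abs(series[left] + slope * (i - left) - series[i])


def find_pips(series, start, end, n_pips):
    """Sorted positions (in the whole series) of up to n_pips PIPs in series[start..end]."""
    pips = [start, end]                      # assumption: end points are the first PIPs
    while len(pips) < n_pips:
        best_i, best_d = None, -1.0
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, left, right, i)
                if d > best_d:
                    best_i, best_d = i, d
        if best_i is None:                   # no interior point left to promote
            break
        pips.append(best_i)
        pips.sort()
    return pips


def nontrivial_candidates(series, w, n_pips):
    """Start positions of the windows whose interior PIPs differ from the previous window's."""
    kept, previous = [], None
    for s in range(len(series) - w + 1):
        interior = set(find_pips(series, s, s + w - 1, n_pips)[1:-1])
        if interior != previous:             # PIPs changed: non-trivial match, keep it
            kept.append(s)
        previous = interior
    return kept


if __name__ == "__main__":
    spike = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0]
    print(nontrivial_candidates(spike, w=6, n_pips=3))
```

In the small `spike` series, the windows starting at positions 1 and 2 keep the same interior PIP (the spike at position 3) as the window starting at 0, so they are dropped, which mirrors the roles of S2 and S3 relative to S1 in Fig.1. The PIP positions returned by `find_pips` are also what a compressed, 9-point representation of a window would be built from.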
3 Experimental Result
The ability of the proposed pattern discovery algorithm is evaluated in this section. A synthetic time series is used, generated by combining 90 patterns, each 89 data points long, from three common technical patterns in stock time series as shown in Fig.8 (i.e. 30 × 3). The total length of the time series is 7912 data points. Three sets of subsequence candidates were prepared for the clustering process: (i) original, the subsequences formed by the sliding window with w = 89, which is the set claimed to be meaningless in reference [2]; (ii) motifs, the K-motifs [2,3] formed from the time series, where K is set to 500 based on the method suggested in [3]; and (iii) the proposed PIP method, the subsequence candidates filtered by detecting the change of PIPs, with 9 PIPs used.
The number of pattern candidates and the time needed for pattern discovery are reported in Fig.3a. Only 281 motifs were formed, while about half of the subsequences were filtered out by detecting the change of PIPs. The proposed PIP method is much faster than the other two approaches because the subsequences are compressed from 89 data points to 9 data points, which greatly reduces the dimension for the clustering process. Fig.3b shows the final patterns discovered. Six groups were formed by each of the approaches, and the set of patterns derived from the proposed approach is the one most similar to the pattern templates used to form the time series. On the other hand, the patterns discovered from the original subsequences do not seem closely related to the patterns used to construct the time series. Although the motifs approach can also discover the patterns used to construct the time series, it smooths out the critical points of those patterns. Also, uptrends, downtrends and a group of miscellaneous patterns are discovered by all the approaches.

(a)
Candidate set   No. of candidates   Time
(i) original    7833                1:02:51
(ii) motifs     281                 0:02:33
(iii) PIP       3550                0:00:03

(b) [Figure: patterns discovered by each approach]
Fig. 3. (a) Number of patterns and time needed for pattern discovery by using different pattern candidates and (b) Pattern discovered (i) original, (ii) motifs and (iii) PIP

To sum up, meaningless patterns are discovered by applying the clustering process to the time series subsequences (i), whereas both the motifs approach and the proposed approach can partially solve this problem by filtering out the trivially matched subsequences. However, it is still difficult to determine the starting point of the patterns, which leads to the discovery of shifted patterns. Additionally, the proposed approach can preserve the critical points of the patterns discovered and speed up the discovery process.
4 Conclusion
In this paper, a frequently appearing pattern discovery process for stock time series based on detecting changes in Perceptually Important Points (PIPs) is proposed. The proposed method tackles the main problem of discovering meaningless subsequence patterns with the clustering approach. A threshold-free approach is introduced to filter out the trivially matched subsequences, which would otherwise cause the discovery of meaningless patterns. As demonstrated in the experimental results, the proposed method can discover the patterns hidden in the stock time series, and at the same time it speeds up the discovery process by reducing the dimension and captures the critical points of the frequently appearing patterns. We are now working on the problem of determining the optimal number of PIPs for representing the time series subsequences, and the results will be reported in a forthcoming paper.
References
1. Keogh, E., Lonardi, S., Chiu, Y.C.: Finding Surprising Patterns in a Time Series Database in Linear Time and Space. Proc. of ACM SIGKDD (2002) 550–556
2. Keogh, E., Lin, J., Truppel, W.: Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research. Proc. of ICDM (2003) 115–122
3. Lin, J., Keogh, E., Lonardi, S., Patel, P.: Finding Motifs in Time Series. In: Workshop on Temporal Data Mining, at the ACM SIGKDD (2002) 53–68
4. Patel, P., Keogh, E., Lin, J., Lonardi, S.: Mining Motifs in Massive Time Series Databases. Proc. of the ICDM (2002) 370–377
5. Chung, F.L., Fu, T.C., Luk, R., Ng, V.: Flexible Time Series Pattern Matching Based on Perceptually Important Points. In: Workshop on Learning from Temporal and Spatial Data at IJCAI (2001) 1–7
Discovering Frequent Itemsets Using Transaction Identifiers
Duckjin Chai, Heeyoung Choi, and Buhyun Hwang
Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Kwangju, Korea
{djchai, hychoi}@sunny.chonnam.ac.kr [email protected]
Abstract. In this paper, we propose an efficient algorithm that generates frequent itemsets with only one database scan. A frequent itemset is a set of items that are included together in at least as many transactions as a given minimum support. While scanning the database of transactions, our algorithm generates a table holding the 1-frequent items and a list of transactions for each 1-frequent item, and generates 2-frequent itemsets by using a hash technique. k(k ≥ 3)-frequent itemsets can then be found simply by checking whether, for all the (k − 1)-frequent itemsets used to generate a k-candidate itemset, the number of common transactions in their lists is greater than or equal to the minimum support. The experimental analysis of our algorithm shows that it can generate frequent itemsets more efficiently than the FP-growth algorithm.
1
Introduction
As an information extraction method, data mining is a technology for analyzing a large amount of accumulated data to obtain information and knowledge valuable for decision-making. Because data mining is used to produce information and knowledge helpful in generating profit, it is widely used in various industrial domains such as telecommunication, banking, retailing and distribution for market basket analysis, fraud detection, customer classification, and so on [1,4,11]. Data mining technologies include association rule discovery, classification, clustering, summarization, and sequential pattern discovery [1,4]. Association rule discovery, one of the most actively studied areas, is a technique for investigating the possibility of simultaneous occurrence of data. In this paper, we study a technique for discovering association rules that describe the associations among data.
Most previous studies, such as [2,3,5,6,7,8,9,10], have adopted an Apriori-like heuristic approach: if any k-itemset, where k is the number of items in the itemset, is not frequent in the database, a (k + 1)-itemset containing that k-itemset is never frequent. The essential idea is to iteratively generate the set of (k + 1)-candidate itemsets from the k-frequent itemsets (for k ≥ 1), and to check whether each (k + 1)-candidate itemset is frequent in the database.
This work was supported by Institute of Information Assessment(ITRC).
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1175–1184, 2005. © Springer-Verlag Berlin Heidelberg 2005
The Apriori heuristic can achieve good performance by significantly reducing the size of the candidate sets. However, in situations with prolific frequent itemsets, long itemsets, or quite low minimum support thresholds, where the support is the number of transactions containing an itemset, an Apriori-like algorithm may still suffer from nontrivial costs [6].
[6] proposed the FP-growth algorithm, which uses a novel tree structure (FP-tree). It discovers frequent itemsets by scanning the database twice without generating candidate itemsets. FP-growth builds the FP-tree and discovers frequent itemsets by checking all nodes of the tree. To construct the FP-tree, FP-growth incurs a nontrivial cost for pruning non-frequent items and sorting the remaining frequent items in each transaction. Moreover, there can be a space problem, because new nodes have to be inserted into the FP-tree frequently whenever the items in a transaction differ from the items on the existing paths of the tree.
In this paper, we propose an algorithm that computes the collection of frequent itemsets with only one database scan. We call this algorithm FTL (Frequent itemsets extraction algorithm using a TID (Transaction IDentifier) List table). In its single database scan, FTL discovers the 1-frequent items and constructs a TID List table in which each row consists of a frequent itemset and the list of transactions that include it. Simultaneously, FTL computes the frequency of 2-itemsets by using a hash function with two 1-frequent items as parameters, which greatly reduces the computing cost of generating 2-frequent itemsets. k(k ≥ 3)-frequent itemsets are discovered by using the TID List table.
Most association rule discovery algorithms spend most of their time scanning a massive database. Therefore, the fewer database scans an algorithm needs, the more efficient it is. Consequently, FTL can considerably reduce the computing cost since it scans the database only once.
The remainder of this paper is organized as follows. Section 2 presents the FTL algorithm. Section 3 shows the performance of FTL through simulation experiments. Section 4 summarizes our study.
2
FTL Algorithm
In association rule discovery, the most efficient method is to discover frequent itemsets with only one database scan. However, previous algorithms have had to find frequent itemsets sequentially, from 1-frequent items up to the final k-frequent itemsets, because frequent itemsets cannot be found easily in a massive database. Therefore, previous algorithms tried to reduce the number of database scans and the number of candidate itemsets. In this section, we propose an algorithm that can find frequent itemsets with only one database scan.
Discovering frequent itemsets means finding sets of items that occur together in a number of transactions that satisfies a given minimum support. Consequently, if there is a data structure holding information about the transactions in which each 1-frequent item is included, we can find frequent itemsets by searching that data structure without a further database scan.
We generate such a data structure, which is called the TID List table. Each row of the TID List table consists of a 1-frequent item and the list of transactions containing it. The TID List table is used to generate k(k ≥ 3)-frequent itemsets. The 1-frequent items are discovered when the TID List table is constructed, and 2-frequent itemsets are extracted using a hash table that is generated while the database is scanned. The TID List table and the hash table are therefore constructed simultaneously during the scan. When the count value of a 2-itemset stored in a bucket of the hash table is at least the minimum support, the 2-itemset becomes a 2-frequent itemset.
Many algorithms for generating candidate itemsets have focused on generating fewer candidate itemsets [3,10]. In particular, since the number of 2-candidate itemsets is generally large, reducing it improves mining performance considerably [10]. Our algorithm generates k-candidate itemsets only for k ≥ 3, so its performance benefits from not generating 2-candidate itemsets at all. Candidate itemsets are generated with the method used in Apriori [3]. Whether a generated candidate itemset is frequent is checked by probing the TID List table: if the number of TIDs common to all items in the candidate itemset is at least the given minimum support, the candidate itemset becomes a frequent itemset.

Table 1. A transaction database as running example
TID   Items
100   f, a, c, d, g, i, m, p
200   a, b, c, f, l, m, o
300   b, f, h, j, o
400   b, c, k, s, p
500   a, f, c, e, l, p, m, n

2.1
The Generation of 1-Frequent Items and the TID List Table
All items of a frequent itemset are included together in transactions, each of which is represented as one record in the database. That is, the k items contained in a k-frequent itemset are all included in the same transactions. For example, if an itemset {A, B, C} is frequent with support 3, at least three transactions include all of the items A, B, and C. For each item, the FTL algorithm computes the list of transactions that include it and stores the list in the TID List table. Table 2 shows the items and their TID lists for the example of Table 1. 1-frequent items can be discovered by comparing the length of the TID list of each item with the minimum support. The 1-frequent items of Table 2 that satisfy the minimum support (50%) are shown in Table 3.
Table 2. TID List table
Item   TID List              Length
a      100, 200, 500         3
b      200, 300, 400         3
c      100, 200, 400, 500    4
d      100                   1
e      500                   1
f      100, 200, 300, 500    4
g      100                   1
h      300                   1
i      100                   1
j      300                   1
k      400                   1
l      200, 500              2
m      100, 200, 500         3
n      500                   1
o      200, 300              2
p      100, 400, 500         3
s      400                   1
Table 3. 1-TID List table for 1-frequent items
1-frequent item   TID List              Support
a                 100, 200, 500         3
b                 200, 300, 400         3
c                 100, 200, 400, 500    4
f                 100, 200, 300, 500    4
m                 100, 200, 500         3
p                 100, 400, 500         3
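The single scan over the running example can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `scan_once` is hypothetical, a Python dictionary keyed by an ordered pair stands in for the two-dimensional bucket array that the same scan fills for 2-itemsets, and a minimum support of 50% of 5 transactions is taken to mean at least 3 transactions.

```python
from collections import defaultdict
from itertools import combinations

# Transaction database of Table 1: TID -> items (one character per item).
DB = {
    100: "facdgimp",
    200: "abcflmo",
    300: "bfhjo",
    400: "bcksp",
    500: "afcelpmn",
}
MIN_COUNT = 3  # assumption: 50% of 5 transactions is 2.5, so at least 3


def scan_once(db):
    """One pass over db: build every item's TID list and count every 2-itemset."""
    tid_lists = defaultdict(list)
    pair_counts = defaultdict(int)      # stands in for the buckets array[x][y]
    for tid in sorted(db):              # visiting TIDs in order keeps each list sorted
        items = sorted(set(db[tid]))
        for item in items:
            tid_lists[item].append(tid)
        for x, y in combinations(items, 2):   # x < y, so {a,b} and {b,a} share a bucket
            pair_counts[(x, y)] += 1
    return tid_lists, pair_counts


tid_lists, pair_counts = scan_once(DB)
one_frequent = {i: t for i, t in tid_lists.items() if len(t) >= MIN_COUNT}
two_frequent = sorted(p for p, c in pair_counts.items() if c >= MIN_COUNT)

print(sorted(one_frequent))   # the 1-frequent items
print(two_frequent)           # the 2-frequent itemsets
```

On this data the 1-frequent items come out as a, b, c, f, m and p, with the TID lists of Table 3. Counting pairs for all items, rather than only for 1-frequent ones, is a choice of the sketch: during the single pass it is not yet known which items will turn out to be frequent.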
2.2
The Generation of 2-Frequent Itemsets Using a Hash Technique
Using a hash function with two parameters, the FTL algorithm computes the frequency of 2-itemsets and selects the 2-frequent itemsets from them. The two parameters represent two items. While the TID List table is being constructed during the database scan, we also generate a bucket for each 2-itemset using a hash function that takes the two items as its parameters. Whenever a bucket is referenced by the hash function, the bucket count is increased by 1. Table 4 shows the frequency of the 2-itemsets calculated by the hash function f(x, y) = array[x][y], where x and y are items. Since a pattern in association rule discovery is not a sequential one, the items in an itemset are commutative; that is, the itemset {a, b} is the same as the itemset {b, a}. From here on we keep the items of an itemset in alphabetical order. The collection of buckets is represented by a two-dimensional array. For example, the frequency count of the itemset {a, b} is stored in array[a][b]. If the frequency count in the bucket of an itemset is at least the minimum support, the itemset is extracted as a 2-frequent itemset. In this way we obtain the 2-frequent itemsets listed in Table 4, assuming a minimum support of 50 percent.
If we used a hash technique to generate all frequent itemsets, the computing cost would be very low. But the space overhead can be high, since on the order of $n^k$ buckets are required, where n is the total number of items in the database and k is the number of items in an itemset. The number of buckets increases exponentially with k, so there is a tradeoff between computing overhead and space overhead. However, if the hash technique is used partially, frequent itemsets can be computed effectively. In particular, using it for the generation of 2-frequent or 3-frequent itemsets may improve the performance of an algorithm, since most of the computing cost is spent on generating 2-frequent or 3-frequent itemsets.

Table 4. Hash table having the frequency of 2-itemsets (hash function: f(x, y) = array[x][y])
index: {a,f} {a,c} {a,d} {a,g} {a,i} {a,m} {a,p} {a,b} {a,l} {a,o} {a,e} {a,n}
count:   3     3     1     1     1     3     2     1     2     1     1     1
index: {b,c} {b,f} {b,l} {b,m} {b,o} {b,k} {b,s} {b,p} {c,f} {c,d} {c,g} {c,i}
count:   2     1     1     1     1     1     1     1     3     1     1     1
index: {c,m} {c,p} {c,l} {c,o} {c,k} {c,s} {c,e} {c,n} {d,f} {d,g} {d,i} {d,m}
count:   3     3     2     1     1     1     1     1     1     1     1     1
index: {d,p} {f,g} {f,i} {f,m} {f,p} {f,l} {f,o} {f,h} {f,j} {f,o} {f,e} {f,n}
count:   1     1     1     3     2     2     1     1     1     1     1     1
index: {g,i} {g,m} {g,p} {i,m} {i,p} {l,m} {l,o} {l,n} {l,p} {m,p} {m,o} {m,n}
count:   1     1     1     1     1     2     1     1     1     2     1     1
index: {h,j} {h,o} {j,o} {k,s} {p,s} {e,p} {e,m} {e,n} {e,l} {p,n}
count:   1     1     1     1     1     1     1     1     1     1
2-frequent itemsets: {a,f} {a,c} {a,m} {c,f} {c,m} {c,p} {f,m}
count:                 3     3     3     3     3     3     3

2.3
The Generation of k(k > 2)-Frequent Itemsets Using TID List Table
In the FTL algorithm, candidate itemsets are generated by using the Apriori-gen procedure of the Apriori algorithm. FTL uses the (k − 1)-TID List table to find k-frequent itemsets (k > 3). The (k − 1)-TID List table holds information about the transactions that include each (k − 1)-frequent itemset. Therefore, we can calculate the support of each candidate itemset by counting the transactions common to the (k − 1)-frequent itemsets used to generate the k-candidate itemset.
The k-TID List table consists of TID lists, each of which is the list of transactions containing one k-frequent itemset. The k-TID List table is constructed from the (k − 1)-TID List table, since k-frequent itemsets are formed only by combining (k − 1)-frequent itemsets in the (k − 1)-TID List table. Generally, the k-TID List table is smaller than the (k − 1)-TID List table when k is greater than 3. The number of comparisons therefore decreases as the TID List table becomes smaller and smaller, which can be seen by examining Tables 6 and 7. Table 5 is an example of generating 3-frequent itemsets by using the 1-TID lists of Table 3 and the 2-frequent itemsets of Table 4. The 2-frequent itemsets are {a,f}, {a,c}, {a,m}, {c,f}, {c,m}, {c,p}, and {f,m}, as shown in Table 4.
Table 5. Generation of 3-frequent itemsets using 1-TID List table and 2-frequent itemsets
3-candidate itemset   Item   TID List              Common TIDs
{a, c, f}             a      100, 200, 500         100, 200, 500
                      c      100, 200, 400, 500
                      f      100, 200, 300, 500
{a, c, m}             a      100, 200, 500         100, 200, 500
                      c      100, 200, 400, 500
                      m      100, 200, 500
{a, f, m}             a      100, 200, 500         100, 200, 500
                      f      100, 200, 300, 500
                      m      100, 200, 500
{c, f, m}             c      100, 200, 400, 500    100, 200, 500
                      f      100, 200, 300, 500
                      m      100, 200, 500

Table 6. 3-TID List table for 3-frequent itemsets
3-frequent itemset   TID List         Support
{a, c, f}            100, 200, 500    3
{a, c, m}            100, 200, 500    3
{a, f, m}            100, 200, 500    3
{c, f, m}            100, 200, 500    3
By using the Apriori-gen algorithm, FTL generates the 3-candidate itemsets {a, c, f}, {a, c, m}, {a, f, m}, and {c, f, m}. As shown in Table 5, each of these candidate itemsets is included in transactions 100, 200, and 500. They therefore become 3-frequent itemsets, since they satisfy the minimum support of 50%, and we obtain Table 6 as the 3-TID List table. Next, we generate the 4-candidate itemset {a, c, f, m} from the 3-frequent itemsets, this time using the 3-TID List table of Table 6. The 4-candidate itemset {a, c, f, m} is generated from the first two 3-frequent itemsets, {a, c, f} and {a, c, m}. Since the remaining two 3-frequent itemsets, {a, f, m} and {c, f, m}, are contained in the 4-candidate itemset {a, c, f, m}, they need not be considered. We have to compute the number of common transactions in the TID lists of the two 3-frequent itemsets {a, c, f} and {a, c, m}, and check whether that number satisfies the support. For calculating the number of common transactions that include all items of a k-candidate itemset, the (k − 1)-TID List table is therefore more effective than the 1-TID List table. As shown in Table 7, since the TID lists of the 3-frequent itemsets {a, c, f} and {a, c, m} both consist of transactions 100, 200, and 500, the 4-candidate itemset {a, c, f, m} is included in transactions 100, 200, and 500 and satisfies the minimum support of 50%.
Table 7. Generation of 4-frequent itemsets using 3-TID List table
4-candidate itemset   3-frequent itemset   TID List         Common TIDs
{a, c, f, m}          {a, c, f}            100, 200, 500    100, 200, 500
                      {a, c, m}            100, 200, 500
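The level-by-level step illustrated in Tables 5 to 7 can be sketched as below. It is an illustration under assumptions, not the paper's implementation: candidates are formed with the usual Apriori-style join (two itemsets sharing their first k − 2 items) and prune (every (k − 1)-subset must be frequent), and the function name `next_level` is hypothetical. For simplicity the sketch first derives TID sets for the 2-frequent itemsets and then always intersects the TID sets of the two joined itemsets, whereas Table 5 intersects 1-TID lists for k = 3; both give the same common TIDs.

```python
from itertools import combinations

# 1-TID lists of Table 3 and the 2-frequent itemsets of Table 4.
TID = {
    "a": {100, 200, 500}, "b": {200, 300, 400}, "c": {100, 200, 400, 500},
    "f": {100, 200, 300, 500}, "m": {100, 200, 500}, "p": {100, 400, 500},
}
L2 = [("a", "c"), ("a", "f"), ("a", "m"), ("c", "f"), ("c", "m"), ("c", "p"), ("f", "m")]
MIN_COUNT = 3  # assumption: 50% of 5 transactions, rounded up


def next_level(prev):
    """prev maps each (k-1)-frequent itemset (sorted tuple) to its set of TIDs."""
    found = {}
    for x, y in combinations(sorted(prev), 2):
        if x[:-1] != y[:-1]:            # join: the first k-2 items must agree
            continue
        cand = x + (y[-1],)
        # prune: every (k-1)-subset of the candidate must itself be frequent
        if any(sub not in prev for sub in combinations(cand, len(x))):
            continue
        common = prev[x] & prev[y]      # transactions containing all items of cand
        if len(common) >= MIN_COUNT:
            found[cand] = common
    return found


level = {p: TID[p[0]] & TID[p[1]] for p in L2}   # TID sets of the 2-frequent itemsets
levels = []
while level:                                      # stop when nothing frequent remains
    levels.append(level)
    level = next_level(level)

for k, lv in enumerate(levels, start=2):
    print(k, sorted(lv))
```

With the running example, the candidates {c, f, p} and {c, m, p} are pruned because {f, p} and {m, p} are not 2-frequent, which leaves exactly the four 3-candidate itemsets of Table 5 and then the single 4-candidate itemset of Table 7.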
Algorithm 1. FTL: TID List construction and frequent itemset extraction
Input: a transaction database DB and a minimum support threshold ε
Output: the complete set of frequent itemsets

Procedure TIDList&2-frequent(DB)
  for each transaction do
    generate 1-TID List table
    generate bucket using hash function
  for all buckets do
    if count value of bucket ≥ ε then
      insert to 2-frequent itemsets

Procedure k-frequent(TID List, k-frequent itemsets)
  generate (k + 1)-candidate itemsets
  for all (k + 1)-candidate itemsets do
    compare TIDs of each item
    if the number of the common TIDs ≥ ε then
      insert to (k + 1)-frequent itemsets
      update TID List table
  if candidate itemsets can be generated then
    call k-frequent(TID List, (k + 1)-frequent itemsets)
Thus, the 4-candidate itemset becomes a 4-frequent itemset. Since no k-candidate itemsets (k > 4) can be generated, the FTL algorithm terminates.
Let the number of items in an itemset be m and the length of a TID list be n. Since each TID list in the TID List table is sorted by TID, the time for calculating the support of an itemset is $O(mn)$; if the lists were not sorted, the time would be $O(m \cdot n^2)$. For all items in an itemset, the TID List table is searched from the first position of each TID list. Whenever a TID common to all of their TID lists is found, a count is increased by one. The search continues until the count is equal to the minimum support or one of the TID lists ends. If the count is equal to the minimum support, the itemset becomes a frequent itemset. This procedure can considerably reduce the computing time.
The FTL algorithm generates, in only one database scan, a TID List table that holds the 1-frequent items and the lists of TIDs of the transactions including them. The performance of FTL can be improved since k-frequent itemsets (k > 2) can be extracted by searching only the (k − 1)-TID List table. The FTL algorithm is shown in Algorithm 1.
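The support test on sorted TID lists described above can be sketched as follows. The function name `reaches_support` is hypothetical and the code is only one possible reading of the procedure: it walks the first list and keeps one forward-only cursor in each of the other lists, so every cursor moves at most n positions in total, and it stops as soon as the count reaches the minimum support or one list is exhausted.

```python
def reaches_support(tid_lists, min_count):
    """True if at least min_count TIDs occur in every one of the sorted tid_lists."""
    first, others = tid_lists[0], tid_lists[1:]
    pos = [0] * len(others)             # one forward-only cursor per remaining list
    count = 0
    for tid in first:
        in_all = True
        for j, lst in enumerate(others):
            while pos[j] < len(lst) and lst[pos[j]] < tid:
                pos[j] += 1             # skip TIDs smaller than the current one
            if pos[j] == len(lst):
                return False            # a list ended: no further common TID exists
            if lst[pos[j]] != tid:
                in_all = False
                break
        if in_all:
            count += 1
            if count == min_count:
                return True             # support reached: the rest is not read
    return False


a = [100, 200, 500]
b = [200, 300, 400]
c = [100, 200, 400, 500]
f = [100, 200, 300, 500]
m = [100, 200, 500]
print(reaches_support([a, c, f, m], 3))   # the itemset {a, c, f, m} of Table 7
print(reaches_support([a, b], 3))
```

Because of the early exit, the function only answers whether an itemset is frequent. Recording the full list of common TIDs for the next TID List table would require continuing to the end of the lists, which is a design choice this sketch leaves open.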
[Figure: run time (sec.) of FP-growth and FTL plotted against support threshold (%), from 0.1 to 3, on the D1 data set]
Fig. 1. Scalability with support threshold (data set: D1)
3
Experimental Evaluation and Performance Study
In this section, we present a performance comparison of FTL with a recently proposed efficient method, the FP-growth algorithm. The experiments are performed on a 2-GHz Pentium PC with 1 GB of main memory, running Microsoft Windows XP. All the programs are written in Microsoft Visual C++ 6.0. Note that we implemented both algorithms to the best of our knowledge based on the published reports, on the same machine, and compared them in the same running environment. Please also note that run time here means the total execution time, i.e., the period between input and output, rather than the CPU time measured in the experiments of some of the literature. As the generator of the data sets, we used the generator used for the performance experiments in previous papers. The data generator can be downloaded from the following URL.
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
We report experimental results on two data sets. The first one is T20.I10.D10K with 1K items, which is denoted as D1. In this data set, the average size of transactions and the maximal potential size of frequent itemsets are 20 and 10, respectively, while the number of transactions is set to 10K. The second data set, denoted as D2, is T20.I10.D100K with 1K items; its average transaction size and maximal potential size of frequent itemsets are again 20 and 10, respectively. Both data sets contain exponentially many frequent itemsets as the support threshold goes down, with long frequent itemsets as well as a large number of short frequent itemsets.
The scalability of FTL and FP-growth as the support threshold decreases from 3% to 0.1% is shown in Fig.1 and Fig.2. Each graph shows the run time of the FTL algorithm and the FP-growth algorithm while increasing the support threshold. Fig.1 and Fig.2 are the results for data sets D1 and D2, respectively. As shown in Fig.1 and Fig.2, both FP-growth and FTL have good performance when the support threshold is pretty low, but FTL is better. This results from the fact that the cost of generating frequent itemsets in FTL is less than that in FP-growth and that the database is scanned only once. According to the increase of support threshold, the run time difference between the two algorithms becomes much bigger.

[Figure: run time (sec.) of FP-growth and FTL plotted against support threshold (%), from 0.1 to 3, on the D2 data set]
Fig. 2. Scalability with support threshold (data set: D2)

To test the scalability with the number of transactions, experiments on data set D2 are used. The support threshold is set to 0.5%. The results are presented in Fig.3. Both FTL and FP-growth show linear scalability with the number of transactions from 20K to 80K. As the number of transactions grows, the difference between the two methods becomes larger and larger. In a database with a large number of frequent items, the set of candidate itemsets can become quite large. However, FTL compares only two records of the TID List table, as shown in Table 7, and can therefore find frequent itemsets quickly. This explains why FTL has an advantage when the number of transactions is large.
4
Conclusions
We proposed the FTL algorithm, which reduces the number of database scans to one and thus discovers frequent itemsets efficiently. FTL generates the TID List table in one database scan and simultaneously calculates the frequency of 2-itemsets using a hash technique. Since the k-TID List table holds the information about the transactions that include each k-frequent itemset, the TID List table is a compact database from which k-frequent itemsets can be calculated. One of the advantages of FTL is that k-frequent itemsets can be computed with only one scan of the database. In FTL, it is easy and efficient to test whether each k-candidate itemset is frequent or not. Since (k + 1)-candidate itemsets are generated from k-frequent itemsets, a method for efficiently pruning candidate itemsets is still needed to improve efficiency further. Our simulation experiments have shown that FTL is more efficient than the FP-growth algorithm on the two given data sets.

[Figure: run time (sec.) of FP-growth and FTL plotted against the number of transactions (K), from 20 to 80]
Fig. 3. Scalability with number of transactions
Incremental DFT Based Search Algorithm for Similar Sequence

Quan Zheng, Zhikai Feng, and Ming Zhu

Department of Automation, University of Science and Technology of China, Hefei, 230027, P.R. China
[email protected] [email protected]
Abstract. This paper first presents a new algorithm for computing the expansion distance of time sequence data on the time domain. With a time complexity of O(n×m), it preserves similarity under shifting and scaling of a time sequence along the Y axis. A second algorithm computes the expansion distance on the frequency domain and searches for similar subsequences in a long time sequence with a time complexity of only O(n×fc); its efficiency makes it suitable for online implementation, and it adapts to the extended definition of time sequence data expansion distance. An incremental DFT algorithm is also provided for time sequence data and linearly weighted time sequence data. It performs dimension reduction on each window of a long sequence and lowers the traditional O(n×m×fc) cost to O(n×fc).
1 Introduction

Time sequence data mining is widely applicable, and similarity comparison of time sequence data is one of its fundamental problems. Besides the Euclidean technique, current methods for similarity comparison and fast similar subsequence searching include frequency domain methods [1],[2],[3],[4], segmentation methods [5],[6], and the waveform descriptive language method [7]. Previous studies introduced the concept of time sequence expansion distance to preserve the similarity of time sequence data after linear shifting. Major studies on expanded similar sequence searching have been conducted by Chu et al. [8] and Agrawal et al. [9]. However, the distance proposed in [8] is asymmetric, which may lead to results that contradict common sense, and the algorithm is essentially a costly distance computation on the time domain. The similar sequence searching algorithm in [9] has two drawbacks: 1. it uses a simple normalization technique to handle shifting and scaling in subsequence similarity comparison, which is not universally applicable; 2. it is complicated and costly.

In this paper, the basic frequency domain method is extended to the search for expanded similar sequences. The main points are as follows. First, an analytical result is provided for computing the time sequence data expansion distance on the time domain. Second, a new method is given for computing the expansion distance based on a frequency domain analytical solution, together with a fast search technique for similar sequences. Because it computes on a dimension-reduced frequency domain, the technique is highly efficient, with a time complexity of O(n×fc), and adapts to the expansion distance of time sequence data. Third, in similar subsequence searching, DFT dimension reduction is necessary for each window of a long sequence. The incremental DFT for each window of a long sequence and the incremental DFT for linearly weighted time sequence data proposed below reduce the time complexity from the traditional O(n×m×fc) to O(n×fc).

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1185–1191, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Expanded Time Sequence Data Distance and Its Analytical Solution
Definition 1: The expanded asymmetric distance between two one-dimensional time series of the same length m, $x=[x_0,x_1,\ldots,x_{m-1}]^T$ and $y=[y_0,y_1,\ldots,y_{m-1}]^T$, is defined as:

$$d(x,y)=\min_{a,b}\Big(\sum_{i=0}^{m-1}(x_i-a y_i-b)^2\Big)^{1/2}. \tag{1}$$
The advantage of this definition is that it preserves the similarity of time sequence data under scaling and shifting, so it applies to transducers with different scaling and shifting amounts. This distance is asymmetric, which contradicts common sense; therefore another definition of time sequence data distance is given in [1], as the minimum of the two asymmetric distances d(x,y) and d(y,x). Although Chu et al. [8] proposed an algorithm that maps the time sequence data to a shifting-eliminated plane where the distance is computed, the method is over-complicated. This paper proposes a direct analytical method that computes the expansion distance from the optimal values of the parameters a and b.

Theorem 1: The optimal analytical solution of the asymmetric distance between two one-dimensional time sequences x and y of length m is:

$$d(x,y)=\Big(\sum_{i=0}^{m-1}(x_i-a_t y_i-b_t)^2\Big)^{1/2}, \tag{2}$$

where

$$a_t=\frac{\sum_{i=0}^{m-1}x_i y_i-m\,\bar{x}\,\bar{y}}{\sum_{i=0}^{m-1}y_i^2-m\,\bar{y}^2},\qquad b_t=\bar{x}-a_t\,\bar{y},$$

and $\bar{x}$, $\bar{y}$ denote the means of x and y.
The symmetric distance is the minimum of d(x,y) and d(y,x). The time complexity of the corresponding similar sequence search algorithm is O(m) for whole-sequence matching and O(n×m) for subsequence searching, where n is the length of the long time sequence and m is the length of the subsequence. By computing the distance on the time domain, this algorithm avoids searching the (a,b) plane and obtains the analytic solution of the expansion distance directly from the values of the time sequence data.
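The closed form of Theorem 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the fallback to a = 0 when y is constant (so that the denominator of a_t vanishes) is a choice made here because the text does not cover that case.

```python
import math

def asymmetric_distance(x, y):
    """Return (d(x, y), a_t, b_t) using the closed-form optimum of Theorem 1."""
    m = len(x)
    if m == 0 or m != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - m * mean_x * mean_y
    syy = sum(yi * yi for yi in y) - m * mean_y * mean_y
    # Assumption: for a constant y the scale is undefined, so use a = 0.
    a_t = sxy / syy if syy > 1e-12 else 0.0
    b_t = mean_x - a_t * mean_y
    d = math.sqrt(sum((xi - a_t * yi - b_t) ** 2 for xi, yi in zip(x, y)))
    return d, a_t, b_t

def symmetric_distance(x, y):
    """Minimum of the two asymmetric distances d(x, y) and d(y, x)."""
    return min(asymmetric_distance(x, y)[0], asymmetric_distance(y, x)[0])
```

For example, if x is an exact scaled and shifted copy of y, such as x = 2y + 5, the routine recovers a_t = 2 and b_t = 5 and a distance of zero up to rounding. Each call costs O(m), which matches the complexity stated for matching two sequences of equal length.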
3 Computing Time Sequence Data Distance on the Frequency Domain

The similar sequence search algorithm based on the time domain analytic solution of Theorem 1 works on the time domain, so it is costly and unsuitable for online application, whereas frequency domain methods are generally not applicable to the expansion distance of time sequence data. Our aim is to extend the frequency domain method so that it adapts to the definition of time sequence data expansion distance.

Lemma 1: Let $X_f$ be the Fourier coefficient of the time sequence data x, and let y = ax + b be the time sequence obtained from x by a linear transformation. Then the f-th Fourier coefficient $Y_f$ of the time sequence data y is:
$$Y_f=aX_f+\frac{b}{\sqrt{m}}\,\frac{1-e^{cfm}}{1-e^{cf}}, \tag{3}$$

where $c=-\frac{j2\pi}{m}$, and $X_f$, $Y_f$ are the f-th frequency components of the time sequence data x and y, respectively.

Theorem 2: The expanded asymmetric distance between the time sequence data x and y is approximately:

$$d(x,y)\approx\Big(\sum_{f=0}^{f_c-1}\Big|X_f-a_tY_f-\frac{b_t}{\sqrt{m}}\,\frac{1-e^{cfm}}{1-e^{cf}}\Big|^2\Big)^{1/2}, \tag{4}$$

where

$$a_t=\frac{\sum_{f=0}^{f_c-1}(X_f\oplus Z_f)\sum_{f=0}^{f_c-1}(Y_f\oplus Z_f)-\sum_{f=0}^{f_c-1}(X_f\oplus Y_f)\sum_{f=0}^{f_c-1}(Z_f\oplus Z_f)}{\Big(\sum_{f=0}^{f_c-1}(Y_f\oplus Z_f)\Big)^2-\sum_{f=0}^{f_c-1}(Y_f\oplus Y_f)\sum_{f=0}^{f_c-1}(Z_f\oplus Z_f)},$$

$$b_t=\frac{\sum_{f=0}^{f_c-1}(X_f\oplus Y_f)-a_t\sum_{f=0}^{f_c-1}(Y_f\oplus Y_f)}{\sum_{f=0}^{f_c-1}(Y_f\oplus Z_f)}.$$

Here $f_c$ is the limiting frequency; $Z_f=\frac{1-e^{cfm}}{\sqrt{m}\,(1-e^{cf})}$ is a complex number sequence introduced for convenience; $X_f$ is the f-th Fourier coefficient of the time sequence data x, and $Y_f$ is the f-th Fourier coefficient of the time sequence data y; the function $\oplus$ is a mapping from two complex numbers to a real number, namely the product of their real parts plus the product of their imaginary parts.
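The computation in Theorem 2 can be sketched as follows, under several assumptions that are not stated in the text: the DFT is normalized by $1/\sqrt{m}$, $Z_f$ is obtained as the transform of the constant sequence of ones (which avoids the 0/0 form of the closed expression at f = 0), and $b_t$ is taken from the second normal equation of the same least-squares problem, which gives the same value as the expression above whenever $\sum(Y_f\oplus Z_f)\neq 0$. All function names are hypothetical.

```python
import cmath
import math

def dft_coeffs(x, fc):
    """First fc coefficients X_f = (1/sqrt(m)) * sum_t x_t * exp(c*f*t), c = -2*pi*j/m."""
    m = len(x)
    c = -2j * math.pi / m
    return [sum(x[t] * cmath.exp(c * f * t) for t in range(m)) / math.sqrt(m)
            for f in range(fc)]

def oplus(u, v):
    """Product of the real parts plus product of the imaginary parts."""
    return u.real * v.real + u.imag * v.imag

def freq_expansion_distance(x, y, fc):
    """Approximate d(x, y) from the first fc Fourier coefficients; returns (d, a_t, b_t)."""
    m = len(x)
    X, Y = dft_coeffs(x, fc), dft_coeffs(y, fc)
    Z = dft_coeffs([1.0] * m, fc)  # transform of the constant sequence
    xy = sum(oplus(p, q) for p, q in zip(X, Y))
    xz = sum(oplus(p, r) for p, r in zip(X, Z))
    yy = sum(oplus(q, q) for q in Y)
    yz = sum(oplus(q, r) for q, r in zip(Y, Z))
    zz = sum(oplus(r, r) for r in Z)
    det = yz * yz - yy * zz
    # Assumption: if Y is proportional to Z the scale is undefined, so use a = 0.
    a_t = (xz * yz - xy * zz) / det if abs(det) > 1e-12 else 0.0
    b_t = (xz - a_t * yz) / zz
    d = math.sqrt(sum(abs(p - a_t * q - b_t * r) ** 2 for p, q, r in zip(X, Y, Z)))
    return d, a_t, b_t
```

With fc equal to the full length m, the result coincides with the time domain distance by Parseval's theorem; with a small fc it is only an approximation built from the low-order coefficients.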
The corresponding subsequence search algorithm performs the incremental DFT and the expansion distance computation with the frequency domain analytic solution at the same time, and uses the result for similarity comparison. Its time complexity is O(n×fc). Compared with similar subsequence searching on the time domain, whose time complexity is O(n×m), this algorithm is more efficient and suitable for online application, because fc ranges from 2 to 5, which is 2 to 3 orders of magnitude lower than m. Furthermore, this algorithm preserves the similarity of time sequence data after linear shifting and therefore adapts to the expanded definition of distance. For simplicity, the similar subsequence search algorithm on the frequency domain that uses the incremental DFT and handles shifting and scaling is henceforth called the extended frequency domain method.
4 Incremental Fourier Transform of Time Sequence Data and Linear Weighted Time Sequence Data

For similar subsequence searching, the algorithm described in Section 3 requires a discrete Fourier transform of each subsequence window. With the traditional DFT formulae, obtaining the fc low-order Fourier coefficients takes O(n×m×fc) time, which is costly. We now present an incremental Fourier transform algorithm that greatly improves efficiency and is suitable for online application. The long time sequence x is divided into n−m+1 overlapping time windows of length m. $xw_i$ denotes the i-th window, and the capitalized $XW_{i,f}$ denotes the f-th frequency component of that window.

Theorem 3: The relation between $XW_{i,f}$, the f-th Fourier coefficient of the time window $xw_i$, and $XW_{i-1,f}$, the f-th Fourier coefficient of the previous time window, is:

$$XW_{i,f}=\frac{XW_{i-1,f}}{e^{cf}}+\Delta_{i,f}, \tag{5}$$

where

$$\Delta_{i,f}=\frac{1}{\sqrt{m}}\Big(xw_{i,m}\,e^{cfm}-\frac{xw_{i-1,0}}{e^{cf}}\Big)=\frac{1}{\sqrt{m}}\Big(x_{i+m}\,e^{cfm}-\frac{x_{i-1}}{e^{cf}}\Big).$$
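The sliding-window update can be sketched as below. The indexing is an assumption made for this sketch and may differ from the subscripts printed above: window i is taken to cover x[i], ..., x[i+m-1], its transform is $(1/\sqrt{m})\sum_{t=0}^{m-1}x_{i+t}e^{cft}$, and the update is re-derived for that convention. The function names are hypothetical, and the input is assumed to contain at least m points.

```python
import cmath
import math

def window_dft(x, i, m, fc):
    """Direct DFT of window x[i:i+m]: (1/sqrt(m)) * sum_t x[i+t] * exp(c*f*t)."""
    c = -2j * math.pi / m
    return [sum(x[i + t] * cmath.exp(c * f * t) for t in range(m)) / math.sqrt(m)
            for f in range(fc)]

def sliding_dft(x, m, fc):
    """First fc DFT coefficients of every length-m window, updated incrementally."""
    c = -2j * math.pi / m
    sqrt_m = math.sqrt(m)
    rot = [cmath.exp(c * f) for f in range(fc)]       # e^{cf}
    wrap = [cmath.exp(c * f * m) for f in range(fc)]  # e^{cfm}, equal to 1 up to rounding
    cur = window_dft(x, 0, m, fc)                     # only the first window costs O(m * fc)
    out = [cur]
    for i in range(1, len(x) - m + 1):
        entering, leaving = x[i + m - 1], x[i - 1]
        # Drop the point that left the window, add the new one, rotate by e^{-cf}.
        cur = [(cur[f] + (entering * wrap[f] - leaving) / sqrt_m) / rot[f]
               for f in range(fc)]                    # O(fc) per window
        out.append(cur)
    return out
```

Each step touches only the point entering and the point leaving the window, so the per-window cost is O(fc) instead of O(m×fc). In long runs the rounding error of such a recurrence can accumulate, so a periodic direct recomputation may be advisable.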
In some cases, points of the time sequence data x that are closer to the current time (m−1) are regarded as more important than more distant points. For convenience, we introduce a forgetting function f(t) that weights the contribution of each point to the distance, see (6).

$$f(t)=z+kt=(1-km+k)+kt. \tag{6}$$
Definition 2: The linear forgetting distance $d_w(x,y)$ between two one-dimensional time sequences x and y of length m is:

$$d_w(x,y)=\Big(\sum_{t=0}^{m-1}(x_t-y_t)^2 f(t)^2\Big)^{1/2}. \tag{7}$$
In the time sequence, the t-th datum in the i-th window is denoted $xw_{i,t}$, and the datum after weighting is $xw'_{i,t}$. They are related by:

$$xw'_{i,t}=xw_{i,t}\,f(t)=xw_{i,t}\,(1-km+k+kt). \tag{8}$$
From Definition 2,

$$d_w(x,y)=\Big(\sum_{t=0}^{m-1}(x_t-y_t)^2 f(t)^2\Big)^{1/2}=\Big(\sum_{t=0}^{m-1}(x'_t-y'_t)^2\Big)^{1/2}=d(x',y').$$

Therefore computing the weighted distance between two subsequences is equivalent to computing the Euclidean distance between the two weighted time sequences. According to Parseval's theorem, we may take the first few frequency components of the weighted time sequence data to compute an approximate distance, which allows fast similar sequence search. The remaining issue is how to obtain the Fourier coefficients of each window after linear weighting in an incremental manner. Let the time window $xw'_i$ be the time window $xw_i$ after weighting. Given the DFT coefficients $XW_{i-1,f}$ of the previous window, its linearly weighted Fourier coefficients $XW'_{i-1,f}$, and the auxiliary coefficients $XWT_{i-1,f}$, the task is to obtain the DFT coefficients $XW'_{i,f}$ of the current linearly weighted data window incrementally, where

$$XWT_{i,f}=\frac{1}{\sqrt{m}}\sum_{t=0}^{m-1}t\,xw_{i,t}\,e^{cft}.$$

The following lemmas can be obtained:
Lemma 2:

$$XWT_{i,f}=\frac{1}{e^{cf}}\big(XWT_{i-1,f}-XW_{i-1,f}\big)+\frac{1}{\sqrt{m}}\,\frac{(m-1)\,xw_{i,m-1}\,e^{cfm}+xw_{i-1,0}}{e^{cf}}$$
Lemma 3: The f-th Fourier coefficient $XW'_{i,f}$ of the time sequence window $xw'_i$ after linear forgetting is:

$$XW'_{i,f}=(1-km+k)\,XW_{i,f}+k\,XWT_{i,f}$$

An incremental algorithm for the Fourier coefficients $XW'_{i,f}$ of the time sequence window after linear forgetting follows, namely Theorem 4. It is easy to prove by combining Theorem 3, Lemma 2 and Lemma 3.

Theorem 4: The recursion formulae for the incremental computation of the Fourier coefficients of the linear forgetting time sequence window are given in (9)-(13):
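$$XW_{0,f}=\frac{1}{\sqrt{m}}\sum_{t=0}^{m-1}xw_{0,t}\,e^{cft}. \tag{9}$$

$$XWT_{0,f}=\frac{1}{\sqrt{m}}\sum_{t=0}^{m-1}t\,xw_{0,t}\,e^{cft}. \tag{10}$$

$$XW_{i,f}=\frac{XW_{i-1,f}}{e^{cf}}+\frac{1}{\sqrt{m}}\Big(xw_{i,m}\,e^{cfm}-\frac{xw_{i-1,0}}{e^{cf}}\Big). \tag{11}$$

$$XWT_{i,f}=\frac{1}{e^{cf}}\big(XWT_{i-1,f}-XW_{i-1,f}\big)+\frac{1}{\sqrt{m}}\,\frac{(m-1)\,xw_{i,m-1}\,e^{cfm}+xw_{i-1,0}}{e^{cf}}. \tag{12}$$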
$$XW'_{i,f}=(1-km+k)\,XW_{i,f}+k\,XWT_{i,f}. \tag{13}$$
Once the weighted Fourier coefficients of each window are obtained, the approximate weighted distance of time sequence data can be computed on the dimension-reduced frequency domain, which makes similar sequence searching highly efficient. The time complexity of the incremental DFT algorithm is O(n×fc), much lower than the O(n×m×fc) time complexity of the traditional DFT algorithm.
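The recursion for linearly weighted windows can be sketched as follows. The sketch rests on assumptions of its own: window i covers x[i], ..., x[i+m-1], the transform is normalized by $1/\sqrt{m}$, and the updates are written for that convention, so the subscripts need not match those printed in (9)-(13). The function names are hypothetical. The auxiliary sequence XWT carries the time-weighted sum, and the weighted coefficients are the linear combination $(1-km+k)\,XW+k\,XWT$.

```python
import cmath
import math

def weighted_window_dft(x, i, m, fc, k):
    """Direct DFT of window x[i:i+m] after weighting by f(t) = (1 - k*m + k) + k*t."""
    c = -2j * math.pi / m
    z = 1 - k * m + k
    return [sum(x[i + t] * (z + k * t) * cmath.exp(c * f * t) for t in range(m))
            / math.sqrt(m) for f in range(fc)]

def weighted_sliding_dft(x, m, fc, k):
    """Weighted coefficients XW' of every length-m window, computed incrementally."""
    c = -2j * math.pi / m
    sqrt_m = math.sqrt(m)
    z = 1 - k * m + k
    rot = [cmath.exp(c * f) for f in range(fc)]       # e^{cf}
    wrap = [cmath.exp(c * f * m) for f in range(fc)]  # e^{cfm}, equal to 1 up to rounding
    # Initial window: plain transform XW and time-weighted transform XWT.
    xw = [sum(x[t] * cmath.exp(c * f * t) for t in range(m)) / sqrt_m
          for f in range(fc)]
    xwt = [sum(t * x[t] * cmath.exp(c * f * t) for t in range(m)) / sqrt_m
           for f in range(fc)]
    out = [[z * xw[f] + k * xwt[f] for f in range(fc)]]
    for i in range(1, len(x) - m + 1):
        entering, leaving = x[i + m - 1], x[i - 1]
        # XWT is updated first because it needs XW of the previous window.
        xwt = [(xwt[f] - xw[f] + ((m - 1) * entering * wrap[f] + leaving) / sqrt_m) / rot[f]
               for f in range(fc)]
        xw = [(xw[f] + (entering * wrap[f] - leaving) / sqrt_m) / rot[f]
              for f in range(fc)]
        out.append([z * xw[f] + k * xwt[f] for f in range(fc)])
    return out
```

Both recurrences cost O(fc) per window, so the weighted coefficients of all n−m+1 windows are obtained in O(n×fc) after the initial O(m×fc) set-up.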
5 Experiment

The running times of the extended frequency domain method and the time domain method are compared in Table 1 and Table 2. In Table 1, the length of the time sequence is n = 200000; in Table 2, the length of the subsequence is m = 2000; the limiting frequency in both tables is fc = 3. The results indicate that the extended frequency domain method takes about 1/10 to 1/50 of the time of the time domain method, greatly improving the efficiency of the search algorithm. The former is also adaptable to the definition of expansion distance of time sequence.

Table 1. Running time of the time domain method and the extended frequency domain method for different subsequence lengths

Subsequence length m    Time domain method (sec.)    Extended frequency domain method (sec.)
500                     32.32                        3.391
1000                    64.44                        3.375
1500                    98.98                        3.359
2000                    132.36                       3.406
2500                    164.97                       3.390

Table 2. Running time of the time domain method and the extended frequency domain method for different sequence lengths

Sequence length n       Time domain method (sec.)    Extended frequency domain method (sec.)
50000                   32.65                        0.844
100000                  69.53                        1.687
150000                  100.79                       2.547
200000                  132.36                       3.406
250000                  169.06                       4.297
6 Conclusion

In this paper, an analytical algorithm is proposed for computing the time sequence data expansion distance on the frequency domain, offering new techniques for similar subsequence searching. Experiments show that this algorithm is more efficient than the time domain based algorithm, is suitable for online application, and adapts to the definition of time sequence data expansion distance. An incremental DFT algorithm is also provided for time sequence data and linearly weighted time sequence data, which greatly improves the efficiency of DFT dimension reduction on each window of a long sequence, lowering the traditional O(n×m×fc) time complexity to O(n×fc).
References

1. R. Agrawal, C. Faloutsos and A. Swami: Efficient similarity search in sequence database. In FODO, Evanston, Illinois, October (1993) 69-84
2. C. Faloutsos, M. Ranganathan and Y. Manolopoulos: Fast subsequence matching in time series databases. Proc. ACM SIGMOD, Minneapolis, MN, May 25-27 (1994) 419-429
3. D. Rafiei and A.O. Mendelzon: Efficient retrieval of similar time sequences using DFT. In FODO, Kobe, Japan (1998) 203-212
4. K.P. Chan and A.W.C. Fu: Efficient time series matching by wavelets. In ICDE, Sydney, Australia (1999) 126-133
5. E. Keogh and P. Smyth: A probabilistic approach to fast pattern matching in time series databases. Proceedings of the Third Conference on Knowledge Discovery in Databases and Data Mining, AAAI Press, Menlo Park, CA (1997) 24-30
6. E. Keogh and M.J. Pazzani: An enhanced representation of time series which allow fast and accurate classification, clustering and relevance feedback. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA (1998) 239-241
7. R. Agrawal, G. Psaila, E.L. Wimmers and M. Zait: Querying shapes of histories. Proceedings of the 21st VLDB Conference, Zurich, Switzerland (1995) 502-514
8. K.K.W. Chu and M.H. Wong: Fast time-series searching with scaling and shifting. In Proceedings of the 18th ACM Symposium on Principles of Database Systems, Philadelphia, PA (1999) 237-248
9. R. Agrawal, K.I. Lin, H.S. Sawhney and K. Shim: Fast similarity search in the presence of noise, scaling and translation in time-series databases. In Proc. 1995 Int. Conf. Very Large Data Bases (VLDB'95), Zurich, Switzerland (1995) 490-501
Computing High Dimensional MOLAP with Parallel Shell Mini-cubes

Kong-fa Hu, Chen Ling, Shen Jie, Gu Qi, and Xiao-li Tang

Department of Computer Science and Engineering, Yangzhou University
[email protected]
Abstract. MOLAP is an important application on multidimensional data warehouses. In MOLAP, range queries are often executed on an aggregate cube computed by a pre-aggregation technique. A cube with d dimensions can generate $2^d$ cuboids, but in a high-dimensional cube it might not be practical to build all these cuboids. In this paper, we propose a multi-dimensional hierarchical fragmentation of the fact table based on multiple dimension attributes and their dimension hierarchical encoding. This method partitions the high-dimensional data cube into shell mini-cubes. The proposed data allocation and processing model also supports parallel I/O and parallel processing as well as load balancing for disks and processors. We have compared the shell mini-cube method experimentally with existing methods such as the partial cube and the full cube. The results show that the mini-cube algorithms proposed in this paper are more efficient than the existing ones.
1 Introduction

Data warehouses integrate massive amounts of data from multiple sources and are primarily used for decision support purposes. Since the advent of data warehousing and online analytical processing (OLAP), the data cube has played an essential role in the implementation of fast OLAP operations [1]. Materialization of a data cube is a way to pre-compute and store multi-dimensional aggregates so that multi-dimensional analysis can be performed on the fly. Many efficient cube computation algorithms have been proposed for this task, such as BUC [2], H-cubing [3], and Star-cubing [4]. These methods work well for low-dimensional cubes in traditional data warehouses. For a high-dimensional data cube, however, materializing the full cube is too costly in both computation time and storage space. For example, a data cube of 100 dimensions, each with 10 distinct values, may contain as many as $11^{100}$ aggregate cells. Although the adoption of the Iceberg cube [4], Condensed cube [5], Dwarf [6] or approximate cube [7] delays the explosion, it does not solve the fundamental problem: no feasible data cube can be constructed for such data sets. This paper addresses the problem of developing an efficient algorithm to perform OLAP on such data sets. We propose a multi-dimensional hierarchical fragmentation of the fact table based on multiple dimension attributes and their dimension hierarchical encoding. Such an approach permits a significant reduction of processing and I/O overhead for many queries by restricting the number of fragments to be processed for both the fact table and bitmap data. This method also supports parallel I/O and parallel processing as well as load balancing for disks and processors.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1192–1196, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Shell Mini-cubes

OLAP queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. An OLAP query that accesses a large number of fact table tuples stored in no particular order may incur many more I/Os, causing a prohibitively long response time. To illustrate the method, a tiny data cube PRT, shown in Table 1, is used as a running example.

Table 1. A sample data cube with two measure values

TID   DimProduct                     dimRegion                      dimTime              Measure
      Category   Class   Product     Country   Province   City      Year   Month   Day   Count   SaleNum
1     Office     OA      Computer    China     Jiangsu    Nanjing   1998   1       1
2     Office     OA      Computer    China     Jiangsu    Nanjing   1998   1       2
3     Office     OA      Computer    China     Jiangsu    Yangzhou  1998   1       2
4     Office     OA      Computer    China     Jiangsu    Yangzhou  1998   1       3
…     …          …       …           …         …          …         …      …       …     …       …
The cube PRT has three dimensions (P, R, T). From the PRT cube, we would compute eight cuboids: {(P,R,T), (P,R,All), (P,All,T), (All,R,T), (P,All,All), (All,R,All), (All,All,T), (All,All,All)}. In general, a cube with d dimensions $(D_1,D_2,\ldots,D_d)$ and $|D_i|$ distinct values for each dimension $D_i$ generates $2^d$ cuboids and $\prod_{i=1}^{d}(|D_i|+1)$ cells. In a high-dimensional cube with many cuboids, however, it might not be practical to build all these indices, and the number of cuboids is larger still once the dimension hierarchies are considered. We can therefore partition the dimensions of the high-dimensional cube into independent groups, called cube segments. For example, for a database of 30 dimensions, $D_1, D_2, \ldots, D_{30}$, we first partition the 30 dimensions into 10 fragments (mini-cubes) of size 3: $(D_1,D_2,D_3), (D_4,D_5,D_6), \ldots, (D_{28},D_{29},D_{30})$. For each mini-cube, we compute its full data cube while recording the inverted indices. For example, in the mini-cube $(D_1,D_2,D_3)$, we would compute eight cuboids: $\{(D_1,D_2,D_3),\ldots,(All,All,All)\}$. An inverted encoding index is retained for each cell in the cuboids. The benefit of this model can be seen by a simple calculation. For a base cuboid of 30 dimensions, only 8×10 = 80 cuboids need to be computed under the above shell fragment partition. Compared with $C_{30}^{3}+C_{30}^{2}+C_{30}^{1}=4525$ cuboids for the partial cube shell of size 3 and $2^{30}\approx 10^{9}$ cuboids for the full cube, the saving is enormous. We propose a novel hierarchical encoding on each dimension table, called dimension hierarchical encoding. It is constructed as a path of values corresponding to dimension attributes that belong to a common hierarchy.
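The counts in the simple calculation above can be reproduced with a short script. It is only an arithmetic check: the function names are hypothetical, and the handling of a leftover fragment when the number of dimensions is not a multiple of the fragment size is an assumption that the text does not discuss.

```python
from math import comb, prod

def full_cube_cuboids(d):
    """Number of cuboids of a full cube with d dimensions."""
    return 2 ** d

def shell_cuboids(d, size):
    """Cuboids of the partial cube shell: all group-bys on 1..size dimensions."""
    return sum(comb(d, k) for k in range(1, size + 1))

def mini_cube_cuboids(d, size):
    """Cuboids when d dimensions are split into disjoint fragments of the given size.

    Assumption: a leftover fragment of r < size dimensions contributes 2**r cuboids.
    """
    fragments, r = divmod(d, size)
    return fragments * 2 ** size + (2 ** r if r else 0)

def full_cube_cells(cardinalities):
    """Number of cells of a full cube: product of (|D_i| + 1) over all dimensions."""
    return prod(c + 1 for c in cardinalities)

mini = mini_cube_cuboids(30, 3)
shell = shell_cuboids(30, 3)
full = full_cube_cuboids(30)
cells_100_dims = full_cube_cells([10] * 100)
```

The extra 1 in each factor of `full_cube_cells` accounts for the All value of a dimension, which is why 100 dimensions with 10 distinct values each give $11^{100}$ cells.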
Definition 1 (Dimension hierarchical encoding). The dimension hierarchical member encoding of the hierarchy $L^i_j$ on the dimension $D_i$ is $B_{L^i_j}: \mathrm{dom}(L^i_j) \to \{\ldots \mid b_i \in \{0,1\},\ i = 0, \ldots, k-1\}$. The dimension hierarchical encoding $B_{D_i}$ of each member on the dimension $D_i$ is defined as formula 1.

$B_{D_i} = (\ldots((B_{L^i_1}$ …

… $1 - \delta$. (5)
It means that a sampling algorithm must yield a good approximation of $fr(x,D)$ with reasonable probability. The difference between (4) and (5) is just that (4) is an absolute error measure, while (5) is a relative error measure. In this paper, $Pr(x) = 1-\delta$. Therefore, if $\delta < \frac{1}{2}$, the sampling ensemble works. According to the Hoeffding bound, for all $\delta > 0$ and $0 < \varepsilon < 1$, if the sample size n satisfies inequality (6)/(7), then it satisfies (4)/(5), respectively [10].

$$n>\frac{1}{2\varepsilon^2}\ln\frac{2}{\delta} \tag{6}$$

$$n>\frac{1}{2\,fr(x,D)^2\,\varepsilon^2}\ln\frac{2}{\delta} \tag{7}$$

Since we require $\delta < \frac{1}{2}$, the sample size n must satisfy:

$$n>\frac{1}{2\varepsilon^2}\ln 4 \tag{8}$$

$$n>\frac{1}{2\,fr(x,D)^2\,\varepsilon^2}\ln 4 \tag{9}$$
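To get a feeling for the magnitudes involved, the Hoeffding-based bounds can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, the default δ = 1/2 reproduces the ln 4 factor of (8) and (9), and the values chosen for ε and for the support are arbitrary examples.

```python
import math

def hoeffding_abs(eps, delta=0.5):
    """Smallest integer n with n > ln(2/delta) / (2 * eps^2), the absolute-error bound."""
    return math.floor(math.log(2 / delta) / (2 * eps ** 2)) + 1

def hoeffding_rel(eps, support, delta=0.5):
    """Smallest integer n with n > ln(2/delta) / (2 * support^2 * eps^2), the relative-error bound."""
    return math.floor(math.log(2 / delta) / (2 * support ** 2 * eps ** 2)) + 1

# Example values (assumptions): eps = 0.01 for the absolute measure,
# eps = 0.1 and a support of 0.77% for the relative measure.
n_abs = hoeffding_abs(0.01)
n_rel = hoeffding_rel(0.1, 0.0077)
```

Because the support enters the relative bound squared in the denominator, small supports drive the required sample size up sharply.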
So, we have:

Theorem 1. For all $0 < \varepsilon < 1$, when the sample size n satisfies inequality (8)/(9), the sampling ensemble works well in terms of the probable error measure (4)/(5).

As a rule, since we do not know the real support $fr(x,D)$ of a frequent pattern, we replace it by its minimum, the support threshold, so that the sample size satisfies all conditions we might encounter. As with the result in [9], this bound is a heavy overestimate. However, it provides useful theoretical guidance, especially for large-scale databases. Similarly, according to the Chernoff bound, we have:

Corollary 1. For all $0 < \varepsilon < 1$, when the sample size n satisfies inequality (10)/(11), the sampling ensemble works well in terms of the probable error measure (4)/(5).

$$n>\frac{3\,fr(x,D)}{\varepsilon^2}\ln 4 \tag{10}$$

$$n>\frac{3}{fr(x,D)\,\varepsilon^2}\ln 4 \tag{11}$$

According to the Central Limit Theorem in statistics, we have:

Corollary 2. For all $0 < \varepsilon < 1$, when the sample size n satisfies inequality (12)/(13), the sampling ensemble works well in terms of the probable error measure (4)/(5).

$$n>\frac{fr(x,D)\,(1-fr(x,D))}{\varepsilon^2}\Big(\Phi^{-1}\Big(\frac{3}{4}\Big)\Big)^2 \tag{12}$$

$$n>\frac{1-fr(x,D)}{fr(x,D)\,\varepsilon^2}\Big(\Phi^{-1}\Big(\frac{3}{4}\Big)\Big)^2 \tag{13}$$

where $\Phi(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y}e^{-x^2/2}\,dx$ is the normal distribution function. (12)/(13) is the best result under worst-case analysis. Since a sampling ensemble is in fact a kind of Monte Carlo algorithm, we have:

Corollary 3. When the sample size of a sampling ensemble is larger than the theoretical lower bound, the probability of correct answers increases exponentially with the number of individual samples in the ensemble.

3.2 Bias-Variance Decomposition
To understand where the performance of a sampling ensemble originates and to explain why it works, in this section we use the bias-variance decomposition [13-16] to analyze the sample error of an ensemble. The sample error is measured by $Cerror(S,D)+\bar{\xi}_{S,D}$ in this section.
Let the error induced by the individual samples in an ensemble be:

$$\bar{E}=\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)\cup L(S_i)}|fr(x,D)-fr(x,S_i)| \tag{14}$$

where N is the total number of individual samples in the ensemble. To keep the derivation simple, we ignore normalization factors in this section; for example, the factor $\frac{1}{|L(D)\cap L(S_i)|}$ is ignored in (14). Similarly, the error of a sampling ensemble is:

$$\bar{E}_b=\sum_{x\in L(D)\cup L(S_{bag})}|fr(x,D)-fr(x,S_{bag})| \tag{15}$$

where $S_{bag}$ is the final answer set of the ensemble. This is just the bias of the ensemble. The variance of the ensemble is:

$$\bar{E}_v=\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(S_i)\cup L(S_{bag})}|fr(x,S_i)-fr(x,S_{bag})| \tag{16}$$

It measures the diversity of the answers on the sample sets in an ensemble. Observe that

$$\begin{aligned}
\bar{E}&=\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)\cup L(S_i)}|fr(x,D)-fr(x,S_i)|\\
&\le\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)\cup L(S_i)\cup L(S_{bag})}|fr(x,D)-fr(x,S_i)|\\
&\le\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)\cup L(S_i)\cup L(S_{bag})}|fr(x,D)-fr(x,S_{bag})|+\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)\cup L(S_i)\cup L(S_{bag})}|fr(x,S_{bag})-fr(x,S_i)|\\
&=\bar{E}_b+\bar{E}_v+E
\end{aligned}$$

where we set

$$E=\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(S_i)-(L(D)\cup L(S_{bag}))}|fr(x,D)-fr(x,S_{bag})|+\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)-(L(S_i)\cup L(S_{bag}))}|fr(x,S_{bag})-fr(x,S_i)|$$

We have:

Theorem 2. The sample error of the sampling ensemble, $\bar{E}_b$, and the average sample error of the individual samples, $\bar{E}$, satisfy the following inequality:

$$\bar{E}_b\ge\bar{E}-\bar{E}_v-E \tag{17}$$

Although we cannot conclude that $\bar{E}_b\le\bar{E}$, the larger the right-hand side of (17), the larger $\bar{E}_b$ will be.
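Inequality (17) can be checked numerically on toy data. The sketch below makes several assumptions for illustration only: supports are stored in dictionaries over a common universe of itemsets, the frequent set L(·) is obtained by thresholding those supports, and the ensemble answer $S_{bag}$ is formed by averaging the supports of the individual samples, which is a hypothetical combination rule rather than the one defined in the paper. The inequality itself follows from the triangle inequality and does not depend on that rule.

```python
def frequent(fr, minsup):
    """L(.): itemsets whose support reaches the threshold."""
    return {x for x, s in fr.items() if s >= minsup}

def abs_diff(fr_a, fr_b, itemsets):
    """Sum of |fr_a(x) - fr_b(x)| over the given itemsets."""
    return sum(abs(fr_a[x] - fr_b[x]) for x in itemsets)

def decomposition(fr_d, fr_samples, fr_bag, minsup):
    """Return (E_bar, E_b, E_v, E) following (14)-(16) and the residual term E."""
    l_d, l_bag = frequent(fr_d, minsup), frequent(fr_bag, minsup)
    n = len(fr_samples)
    e_bar = e_v = resid = 0.0
    for fr_s in fr_samples:
        l_s = frequent(fr_s, minsup)
        e_bar += abs_diff(fr_d, fr_s, l_d | l_s)
        e_v += abs_diff(fr_s, fr_bag, l_s | l_bag)
        resid += abs_diff(fr_d, fr_bag, l_s - (l_d | l_bag))
        resid += abs_diff(fr_bag, fr_s, l_d - (l_s | l_bag))
    e_b = abs_diff(fr_d, fr_bag, l_d | l_bag)
    return e_bar / n, e_b, e_v / n, resid / n

# Hypothetical supports of four itemsets in the database and in three samples.
fr_d = {"a": 0.50, "b": 0.30, "c": 0.05, "ab": 0.20}
fr_samples = [
    {"a": 0.55, "b": 0.25, "c": 0.12, "ab": 0.18},
    {"a": 0.45, "b": 0.35, "c": 0.04, "ab": 0.08},
    {"a": 0.52, "b": 0.28, "c": 0.06, "ab": 0.22},
]
# Assumed combination rule: average the supports over the individual samples.
fr_bag = {x: sum(s[x] for s in fr_samples) / len(fr_samples) for x in fr_d}

e_bar, e_b, e_v, resid = decomposition(fr_d, fr_samples, fr_bag, minsup=0.1)
holds = e_b >= e_bar - e_v - resid - 1e-12
```

On any such data the flag `holds` is true, since the bound is a consequence of the triangle inequality applied itemset by itemset.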
For the sample error defined by $Cerror(S,D)+\bar{\xi}_{S,D}$, if we substitute $\bar{fr}(\cdot,\cdot)$ for $fr(\cdot,\cdot)$, a result similar to (17) is obtained. For the two databases S and D,

$$\bar{fr}(x,D)=\begin{cases}1 & x\in L(D)\ \text{but}\ x\notin L(S)\\ 0 & x\notin L(D)\ \text{but}\ x\in L(S)\\ fr(x,D) & x\in L(S)\cap L(D)\end{cases}$$

It means that we give a penalty to a false positive or a false negative frequent pattern. For the three databases S, $S_{bag}$ and D,

$$\bar{fr}(x,D)=\begin{cases}1 & x\in(L(D)\cap L(S))\ \text{or}\ x\in(L(D)\cap L(S_{bag}))\ \text{or}\ x\in(L(S)\cap L(S_{bag})),\ \text{but}\ x\notin L(S)\cap L(S_{bag})\cap L(D)\\ fr(x,D) & x\in L(S)\cap L(S_{bag})\cap L(D)\\ 0 & \text{others}\end{cases}$$

It means that the penalty is given to the minority frequent patterns in L(D), L(S) and $L(S_{bag})$. Thus, the following equation holds:

$$\sum_{x\in L(D)\cup L(S)}|\bar{fr}(x,D)-\bar{fr}(x,S)|=Cerror(S,D)+\alpha\,\bar{\xi}_{S,D}$$

where $\alpha=\frac{|L(S)\cup L(D)|}{|L(S)\cap L(D)|}$. Using similar techniques, we have:

Corollary 4. The sample error of the sampling ensemble, $\bar{e}_b$, and the average sample error of the individual samples, $\bar{e}$, satisfy the following inequality:

$$\bar{e}_b\ge\bar{e}-\bar{e}_v-e \tag{18}$$

where

$$\bar{e}=\frac{1}{N}\sum_{i=1}^{N}\big(Cerror(D,S_i)+\bar{\xi}_{D,S_i}\big) \tag{19}$$

$$\bar{e}_b=Cerror(D,S_{bag})+\bar{\xi}_{D,S_{bag}} \tag{20}$$

$$\bar{e}_v=\frac{1}{N}\sum_{i=1}^{N}\big(Cerror(S_i,S_{bag})+\bar{\xi}_{S_i,S_{bag}}\big) \tag{21}$$

and

$$e=\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(S_i)-(L(D)\cup L(S_{bag}))}|\bar{fr}(x,D)-\bar{fr}(x,S_{bag})|+\frac{1}{N}\sum_{i=1}^{N}\ \sum_{x\in L(D)-(L(S_i)\cup L(S_{bag}))}|\bar{fr}(x,S_{bag})-\bar{fr}(x,S_i)|$$
According to (18), the performance of an ensemble is influenced by the individual samples in the ensemble. It is better to reduce $\bar{e}$ and to enlarge $\bar{e}_v$. That is, the more accurate and the more diverse the answers on the samples in an ensemble are, the more accurate the ensemble is likely to be (this result is supported by the experiments in the next section).
4 Some Experiments
In this section, we use both a synthetic and a real-world benchmark database to test the performance of sampling ensembles. The synthetic database is generated using code from the IBM QUEST project [1], with the same parameter settings as in [1]. We choose T10I4D100k, denoted D1, as the test database. The real-world database is a large sales database from Blue Martini Software Inc., named BMS-POS and denoted D2. To make a fair comparison, in all cases we use the Apriori implementation written by Christian Borgelt¹ to compute the frequent itemsets and the fast sequential random sampling algorithm, Method D in [19], to perform sampling. For both databases, the support threshold is 0.77%, to ensure that there are neither too many nor too few frequent itemsets. Because of space limitations, we only show the results of sampling ensembles with 3 individual samples at a 1% sampling ratio on D1 and D2, respectively (Table 1). These results indicate that $Cerror(S,D)$, $\bar{\xi}_{S,D}$ and $V_{S,D}$ are all reduced (similar results are also obtained at other sampling ratios).

Table 1. The three individual samples with sampling ratio 1%

(a) The result on D1

Name         False   Missed   Cerror     ξ̄_{S,D}       V_{S,D}
S1           126     45       0.289831   0.00220286    0.00186835
S2           173     47       0.345369   0.00243885    0.00202185
S3           186     48       0.36       0.00246875    0.00179169
3-Ensemble   79      40       0.201507   0.00140448    0.00160673

(b) The result on D2

Name         False   Missed   Cerror     ξ̄_{S,D}       V_{S,D}
S1           301     164      0.225745   0.001261      0.0016519
S2           257     180      0.212239   0.00138286    0.00128904
S3           307     128      0.206259   0.00117324    0.00113885
3-Ensemble   251     108      0.146169   0.000758604   0.00066939
Figure 1 shows the cardinal errors of sampling ensembles with different numbers of individual samples at a 5% sampling ratio for D1 and a 1% sampling ratio for D2. According to Figure 1, the cardinal error decreases gradually as the number of individual samples increases. This means that we can obtain very accurate answers at a small (fixed) sample size when the number of individual samples is chosen properly. The difficult problem of sample size selection, discussed by several researchers, can thus be avoided or converted into the problem of selecting the number of individual samples at a small sample size.

¹ http://fuzzy.cs.uni-magdeberg.de/∼borgelt/
Fig. 1. The cardinal errors of ensembles with different numbers of individual samples: (a) D1 with 5% sampling ratio; (b) D2 with 1% sampling ratio
Since, according to Theorem 2, more diverse individual samples may make an ensemble more accurate, we also test the performance of sampling ensembles whose individual samples do not overlap (because of space limitations, we only show the results on D1 with the 5% sampling ratio in Table 2). According to the experiments, the sample error of a sampling ensemble with non-overlapping individual samples is in general lower than that of a sampling ensemble that uses sampling Method D directly (which causes overlap) at the same sample size.

Table 2. The sample error of a sampling ensemble on D1

(a) Overlapping sampling with Method D

Name         Cerror      ξ̄_{S,D}       V_{S,D}
3-Ensemble   0.0987654   0.000938128   0.000818714
5-Ensemble   0.0770833   0.000781941   0.00061186
7-Ensemble   0.0623701   0.000674501   0.000539261

(b) Non-overlapping sampling with Method D

Name         Cerror      ξ̄_{S,D}       V_{S,D}
3-Ensemble   0.0792683   0.000847682   0.000667883
5-Ensemble   0.0695297   0.000655385   0.000504559
7-Ensemble   0.053719    0.00060131    0.000456445
5 Conclusion
One approach to improving the scalability of data mining algorithms is to take a random sample and mine it. However, it is difficult to determine the appropriate sample size for a given accuracy. Many researchers have dedicated themselves to solving this problem, but they seldom consider how to improve the accuracy of answers at a small sample size. In this paper, we give a sampling ensemble method that improves the accuracy of answers at a fixed small sample size, and we discuss some pertinent theoretical problems using machine learning methods. Both the theoretical analysis and the experiments on real data show that the sampling ensemble works well.
References
1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In VLDB'94, Santiago, Chile (1994) 487-499
2. Zaki, M. J., Hsiao, C.-J.: CHARM: An Efficient Algorithm for Closed Association Rule Mining. Technical Report 99-10, Computer Science Dept., Rensselaer Polytechnic Institute (1999)
3. Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns Without Candidate Generation. In SIGMOD'00, Dallas, TX (2000) 1-12
4. Chen, B., Haas, P., Scheuermann, P.: A New Two-phase Sampling Based Algorithm for Discovering Association Rules. In ACM SIGKDD'02, Edmonton, Alberta, Canada (2002) 462-468
5. Bronnimann, H., Chen, B., et al.: Efficient Data Reduction with EASE. In ACM SIGKDD'03, Washington, D.C., USA (2003) 59-68
6. Parthasarathy, S.: Efficient Progressive Sampling for Association Rules. In ICDE'02, Maebashi City, Japan (2002) 354-361
7. Jia, C-Y., Gao, X-P.: Multi-scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules. To appear, Journal of Computer Science and Technology.
8. Toivonen, H.: Sampling Large Databases for Association Rules. In VLDB'96, Mumbai (Bombay), India (1996) 134-145
9. Zaki, M. J., Parthasarathy, S., Li, W., Ogihara, M.: Evaluation of Sampling for Data Mining of Association Rules. In Proceedings of the 7th Workshop on Research Issues in Data Engineering, Birmingham, UK (1997) 42-50
10. Watanabe, O.: Simple Sampling Techniques for Discovery Science. IEICE Trans. Information and Systems, E83-D(1) (2000) 19-26
11. John, G. H., Langley, P.: Static Versus Dynamic Sampling for Data Mining. In ICDM'96, AAAI Press (1996) 367-370
12. Domingo, C., Gavalda, R., Watanabe, O.: Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Data Mining and Knowledge Discovery, An International Journal, Vol. 6(2). (2002) 131-152
13. Breiman, L.: Bagging Predictors. Machine Learning, Vol. 24. (1996) 123-140
14. Breiman, L.: Bias, Variance, and Arcing Classifiers. The Annals of Statistics, Vol. 26(3). (1998) 801-849
15. Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, Vol. 1857. Springer-Verlag, Berlin Heidelberg New York (2000) 1-15
16. Krogh, A., Vedelsby, J.: Neural Network Ensembles, Cross Validation, and Active Learning. Advances in Neural Information Processing Systems 7 (1995) 231-238 (2000) 267-279
17. Esposito, R., Saitta, L.: Monte Carlo Theory As An Explanation of Bagging and Boosting. In IJCAI'03, Acapulco, Mexico (2003) 499-504
18. Valiant, L. G.: A Theory of the Learnable. Communications of the ACM 27. (1984) 1134-1142
19. Vitter, J. S.: An Efficient Algorithm for Sequential Random Sampling. ACM Transactions on Mathematical Software, Vol. 13(1). (1987) 58-67
Distributed Data Mining on Clusters with Bayesian Mixture Modeling¹
M. Viswanathan, Y.K. Yang, and T.K. Whangbo
College of Software, Kyungwon University, San-65, Bokjeong-Dong, Sujung-Gu, Seongnam-Si, Kyunggi-Do, South Korea
{murli, ykyang, tkwhangbo}@kyungwon.ac.kr
Abstract. Distributed Data Mining (DDM) generally deals with the mining of data within a distributed framework such as local area and wide area networks. One strong case for DDM systems is the need to mine for patterns in very large databases. This requires partitioning or splitting databases into smaller sets that can be mined locally over distributed hosts. Data distribution implies communication costs associated with the need to combine the results from processing local databases. This paper considers the development of a DDM system on a cluster. Specifically, we approach the problem of data partitioning for data mining. We present a prototype system for DDM that uses a data partitioning mechanism based on Bayesian mixture modeling. Results from a comparison with standard techniques show plausible support for our system and its applicability.
1 Introduction
Data mining research is continually coming up with improved tools and methods to deal with distributed data. There are mainly two scenarios in distributed data mining (DDM):
• A database is naturally distributed geographically, and data from all sites must be used to optimize the results of data mining.
• A non-distributed database is too large to process on one machine because of processing and memory limits, and must be broken up into smaller chunks that are sent to individual machines to be processed.
In this paper we consider the latter scenario [3]. This paper considers the use of a Bayesian clustering system (SNOB) [1] for dividing large databases into meaningful partitions. These variable-sized partitions are then distributed, using a standard MPI-based system, to a number of local hosts on a cluster. Each local host runs data mining algorithms on the data partition it receives, thus producing a local model. These models are combined into a global model by a model aggregation scheme. The accuracy of the global model is verified through a cross-validation technique.
¹ This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1207 – 1216, 2005. © Springer-Verlag Berlin Heidelberg 2005
In this paper we consider DDM as a four-stage process.
• Data Partitioning – The large dataset must be partitioned into smaller subsets of data with the same domain according to some function f. The choice of algorithm depends on the user, on the DDM system requirements and on the resources available. Communication needs, message exchange and processing costs are all considerations of this task.
• Data Distribution – Once the database has been partitioned, the data must be distributed, usually over a LAN. An appropriate distribution scheme must be devised depending on the number of clusters, the size of the clusters, the content of the clusters, the size of the LAN and the LAN communication speeds. For example, the user may decide to distribute similar clusters (this measure of similarity is non-trivial) on the same local host machine to save on communication costs, or may decide to distribute one cluster per host (assuming the required number of hosts is available on the LAN) to save on processing costs through fully parallel processing.
• Data Modelling – Once the data is distributed on the LAN, local data models have to be constructed. Many different classification tools and algorithms are widely available. The choice of classification algorithm depends on the nature of the data (numerical, nominal or a combination of both), the size of the data and the intended nature and purpose of the models.
• Model Aggregation – All the local models must be combined and aggregated in some optimal way so as to produce one global model that describes the whole database. Model aggregation attempts to achieve one of the fundamental aims of DDM: to produce a final global data model from the distributed database that matches the efficiency and effectiveness of a data model developed from undistributed data. Some existing techniques rely on methods such as voting, combining and meta-classifiers. To develop a good and efficient model aggregation framework, one must analyse communication needs and costs as well as dataset sizes and distributions. For example, if local datasets are reasonably large, it would be undesirable to transfer them in their entirety across hosts, since the communication costs incurred would effectively halt the whole system; in this case it might be better to transfer sample subsets of the data, or just descriptive data [7,8].
2 Data Partitioning and Clustering
As suggested earlier, we consider DDM in the case where the existing database is too big to process on a single machine because of memory and storage constraints. In this case there is a great need for some intelligent mechanism to break the database into smaller groups of data.
2.1 SNOB
SNOB is a system developed for cluster analysis using mixture modeling by Minimum Message Length (MML) [1]. SNOB aims to discover the natural classes in the data by categorising data sets based on their underlying numerical distributions. It does this using the assumption that if it can correctly categorize the data, then the data can be described most efficiently (i.e. using the minimum message length). SNOB uses MML induction, a scale-invariant Bayesian technique based on information theory [2]. To summarize, SNOB considers a database to be made up of many objects or instances. Each instance has a number of different attributes, with each attribute having a particular value. We can think of this as a population of instances in a space in which each attribute is a different dimension or variable. SNOB assumes that the nature, number and range of the attributes are known. The attributes are also assumed to be uncorrelated and independent of each other. SNOB attempts to divide the population into groups or classes such that each class is tight and simple, while ensuring that the classes' attribute distributions differ significantly from one another. SNOB takes an inductive inference approach to data clustering [2].
2.2 Data Partitioning Sub-system
Our system employs SNOB in the initial process of data partitioning. One of our objectives in this project is to investigate whether SNOB-based clustering is appropriate and efficient for partitioning in the framework of DDM. Figure 1 depicts the partitioning aspect of our DDM system, with SNOB at the core of the partitioning operation. Once SNOB has been used to cluster the data and generate a report file containing details of the clusters, the complete database is divided into the appropriate clusters using a standard program.
Fig. 1. The partitioning component of the DDM system using SNOB (diagram: Main Database, SNOB Clustering, Cluster Report, Partition Program, Cluster 1 to Cluster K)
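The partition step can be illustrated with a short sketch. It assumes that the cluster report has already been parsed into one cluster label per record; the report format itself is not described here, so the `assignments` list, the function name and the toy records are all hypothetical.

```python
from collections import defaultdict


def partition_by_cluster(records, assignments):
    """Split records into clusters given one cluster label per record."""
    if len(records) != len(assignments):
        raise ValueError("each record needs exactly one cluster label")
    clusters = defaultdict(list)
    for record, label in zip(records, assignments):
        clusters[label].append(record)
    return dict(clusters)


# Toy database (hypothetical) and the labels a clustering step might report.
records = [("a", 1.0), ("b", 5.2), ("c", 0.9), ("d", 5.0)]
assignments = [1, 2, 1, 2]
clusters = partition_by_cluster(records, assignments)
print({label: len(members) for label, members in clusters.items()})
```

Because the clusters follow the data, the resulting partitions are of variable size, unlike a fixed-size split.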
To test the predictive accuracy of the models generated at the end of the DDM process, we retain 10% of the data records, randomly selected from the database, for cross-validation purposes. The test dataset is removed from the original dataset, so we are finally left with the un-clustered dataset and the individual clusters, each containing only training data (for classification), plus a separate dataset used for testing purposes only.
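A minimal sketch of the 10% hold-out follows. The function name, the fixed seed and the use of Python's `random` module are illustrative assumptions; the text does not say how the random selection was implemented.

```python
import random


def hold_out_split(records, test_fraction=0.10, seed=0):
    """Randomly remove a fraction of the records to serve as the test set."""
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = round(len(records) * test_fraction)
    test_indices = set(indices[:n_test])
    # Preserve the original record order inside both subsets.
    train = [r for i, r in enumerate(records) if i not in test_indices]
    test = [r for i, r in enumerate(records) if i in test_indices]
    return train, test


data = list(range(50))  # stand-in for 50 database records
train, test = hold_out_split(data)
print(len(train), len(test))
```

The training part is what gets clustered and classified; the test part is kept aside and only used when the rules are validated.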
3 Distributing the Data
Distributing data over a network such as a LAN is no trivial task. Many considerations must be taken into account before deciding on the distribution strategy. We need to consider the number, size and nature of the clusters and the purpose of distributing the data in order to come up with an optimal distribution scheme. To complicate matters further, every dataset is different and will be clustered differently, so a different distribution scheme will be optimal for each dataset. This problem is outside the scope of this project. As a simple solution we use the Message Passing Interface (MPI) to distribute the data in our DDM system. MPI is a library specification for message passing, proposed and developed as a standard by a broadly based committee of vendors, implementers and users. MPI is used by programs for inter-process communication and was designed for high performance on both massively parallel machines and workstation clusters [5]. As can be seen in Figure 2, the complete (un-clustered) training dataset resides on the master machine (ID 0) and each cluster is exported to a different host machine (ID > 0). The first cluster is sent to host 1, the second cluster to host 2, and so on. Although this is not the optimal distribution, it still classifies the clusters in parallel and yields an overall speedup in the data mining task. It should be noted that in our DDM system the distribution occurs automatically in conjunction with the Data Modelling (classification) phase. In concept, Data Distribution and Data Modelling are two distinct steps; in this project, however, they are combined into one using MPI and the classification algorithm.
Fig. 2. The MPI Architecture (diagram: Complete Training Data Set; Cluster 1, Cluster 2, …, Cluster i)
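The mapping of clusters to hosts can be sketched without any MPI calls. The sketch below assumes a round-robin wrap-around when there are more clusters than worker hosts; the text only says that cluster 1 goes to host 1, cluster 2 to host 2 and so on, so the wrap-around rule and the function name are assumptions.

```python
def assign_clusters_to_hosts(num_clusters, num_worker_hosts):
    """Map cluster numbers 1..N to worker host IDs 1..H (host 0 is the master)."""
    if num_clusters < 0 or num_worker_hosts < 1:
        raise ValueError("need at least one worker host and a non-negative cluster count")
    assignment = {}
    for cluster in range(1, num_clusters + 1):
        # Cluster 1 -> host 1, cluster 2 -> host 2, ...; wrap around when hosts run out.
        host = (cluster - 1) % num_worker_hosts + 1
        assignment.setdefault(host, []).append(cluster)
    return assignment


plan = assign_clusters_to_hosts(num_clusters=5, num_worker_hosts=3)
print(plan)
```

In an MPI program, the process with rank 0 would keep the complete training set and each worker rank would process the clusters listed under its own ID.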
4 Local Data Mining with C4.5
Creating data models in order to predict new, unseen data is the main task of data mining. Various types of models can be constructed; we use a classification model based on the C4.5 decision tree learning system. C4.5rules is a classification-rule learning algorithm developed by Quinlan that uses decision trees to represent its model of the data. It aims to find a function that maps data to a predefined class correctly, thus accurately predicting and classifying previously unseen examples. The function is a decision tree; it is built from a training data set and evaluated using a test data set, both of which are usually subsets of the entire data. The algorithm attempts to find the simplest decision tree that can describe the structure of the data [6]. The implementation of C4.5 used for this project is by Ross Quinlan [6]. The original source code was changed to incorporate distribution via MPI and to output human-readable rule files. As a result, the full dataset is classified by C4.5 on the master host and each cluster is classified by C4.5 on a different host in the network. If the network has too few hosts, some hosts classify more than one cluster. The working directory is copied to all hosts involved in the classification process, and the algorithm is run from each host's local copy of this directory. Once the C4.5 program terminates, all the files from all the hosts are copied back to the original directory on the master machine. Thus, as mentioned in the previous section, Data Modelling is combined with Data Distribution in our DDM system, and our C4.5 classification algorithm is a DDM classification algorithm. Figure 3 shows how C4.5 and MPI work together in our DDM system.
Fig. 3. Distributed Data Modelling using MPI and the C4.5 classification algorithm (diagram: Clusters 1 to k pass from the Master Machine through the Message Passing Interface to the C4.5 Algorithm, giving a cluster rule set on each of hosts 1 to k and the full rule set on master machine 0)
5 Model Aggregation Using a Voting Scheme
Model Aggregation is the key step in the DDM process, since it uses all the distributed data and its associated learned knowledge to generate the all-inclusive and comprehensive data model for the original and complete database. With such a model we can predict the outcome of unseen new data introduced to our system. The Data Aggregation task itself is an area of extensive research, and many methods and techniques have been developed [7, 8]. We have chosen a simple but effective voting technique whereby the classification models (rules) are validated against the unseen test data by means of positive and negative votes. A positive vote is an unseen data record that is classified correctly by a classification rule, and a negative vote is an incorrectly classified record. Once the votes are summed up and the rules are thereby ranked, the system can choose which rules to discard and which to keep; the rules that are kept form the global data model.
Fig. 4. The Data Aggregation Process (diagram: Data Models A to E)
In terms of our DDM system, voting works as follows. Each rule is tested on each data record in the test dataset and is given a positive vote for a correct prediction, a negative vote for a false prediction, and no vote at all if the rule's conditions do not hold. After all the records in the test dataset have been scanned, the error rate is computed as: (1)
However, if there are no votes at all, the error rate is considered unknown and is set to the value -1.
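The voting procedure can be sketched as follows. The formula for the error rate is not reproduced in the text above, so this sketch assumes the natural reading: the error rate is the number of negative votes divided by the total number of votes cast, which is undefined (and reported as -1) when the rule never fires. The rule and record representations, like the function name, are hypothetical.

```python
def rule_error_rate(rule, test_records, class_key="class"):
    """Vote on one rule over the test set and return its error rate, or -1.0 if it never votes."""
    positive = 0
    negative = 0
    for record in test_records:
        # A rule votes only when every one of its conditions holds for the record.
        fires = all(record.get(attr) == value
                    for attr, value in rule["conditions"].items())
        if not fires:
            continue
        if record[class_key] == rule["predicts"]:
            positive += 1
        else:
            negative += 1
    votes = positive + negative
    if votes == 0:
        return -1.0  # unknown error rate
    return negative / votes  # assumed form of the error rate


rule = {"conditions": {"outlook": "sunny"}, "predicts": "no"}
test_records = [
    {"outlook": "sunny", "class": "no"},
    {"outlook": "sunny", "class": "yes"},
    {"outlook": "rain", "class": "yes"},
    {"outlook": "sunny", "class": "no"},
]
error = rule_error_rate(rule, test_records)
print(round(error, 3))
```

Here the rule fires on three records and is wrong on one of them; the record with `outlook` equal to `rain` casts no vote.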
6 Distributed Data Mining – System Components
The whole DDM system can finally be described in terms of the individual components discussed in the sections above. Our system takes a dataset and uses SNOB to partition it into clusters. The clusters are distributed over the LAN using MPI. Data models are developed for each cluster dataset using the classification algorithm C4.5. Finally, the system uses a voting scheme to aggregate all the data models. The final global classification data model comprises the top three rules for each class (where available). Our DDM system can be described by the algorithm in Figure 5.

Report    Snob report file
N         Number of clusters produced by Snob
Data      Complete dataset
MaxHosts  Maximum number of hosts on LAN
Ci        Cluster i
Ri        Rule set of cluster i
GModel    Global classification model

1: (Report, N) = Snob(Data);
2: Partition(Data, Report);
3: For i = 1 to N
4:     Ri = C4.5Classify(MPI(Ci, MaxHosts));
5: For i = 1 to N
6:     Validate(Ri);
7: GModel = Aggregate(R1 ∪ R2 ∪ … ∪ RN);

Fig. 5. Functional algorithm
Note that MPI is used in conjunction with the known maximum number of hosts (line 4) to classify the clusters in parallel using the C4.5 classification algorithm. If the number of clusters exceeds the available number of hosts then some hosts will classify multiple clusters (using MPI). Also the aggregation model scans all Rule files (line 7) from all clusters and picks the best rules out of the union of all cluster rule sets.
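The aggregation step, which keeps the top three rules for each class from the union of all cluster rule sets, can be sketched as below. Ranking by ascending error rate and discarding rules whose error rate is unknown (-1) are assumptions, as are the tuple layout and the rule identifiers.

```python
def aggregate_top_rules(rule_sets, per_class=3):
    """Pick the lowest-error rules per class from the union of all rule sets.

    Each rule is a tuple (rule_id, predicted_class, error_rate).
    """
    by_class = {}
    for rules in rule_sets:
        for rule_id, predicted_class, error_rate in rules:
            if error_rate < 0:
                continue  # unknown error rate: the rule received no votes
            by_class.setdefault(predicted_class, []).append((error_rate, rule_id))
    # Sort by error rate (ties broken by rule id) and keep the best few.
    return {cls: [rule_id for _, rule_id in sorted(scored)[:per_class]]
            for cls, scored in by_class.items()}


cluster_1_rules = [("c1-r1", "A", 0.10), ("c1-r2", "A", 0.40), ("c1-r3", "B", -1.0)]
cluster_2_rules = [("c2-r1", "A", 0.25), ("c2-r2", "A", 0.05), ("c2-r3", "B", 0.30)]
global_model = aggregate_top_rules([cluster_1_rules, cluster_2_rules])
print(global_model)
```

A class with fewer than three validated rules simply contributes the rules it has, matching the "where available" qualification.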
Fig. 6. DDM System Components
During the classification phase we also classify the original un-clustered dataset and produce rules modelling this data. To finally ascertain whether our DDM system is efficient, we compare our Global Model to this un-clustered Data Model: we compare the top three rules for each class from this model with the rules from the Global Model. If the global model is over 90% accurate in comparison to the un-clustered data model, we consider this a good result and conclude that our DDM system is efficient.
7 DDM System Evaluation and Conclusions
Our DDM system was applied to relatively small databases in order to test the effectiveness of the concept. The system was used to mine (classify) the original database as well as the clustered partitions. The DDM system was designed to output classification rules for the original database and for the clustered datasets. For the partitioned clusters, the rules were validated and aggregated using our voting scheme. Five real-world databases were compared, taking into account the top classification rules for each class from the original database. Thus we could compare the predictive accuracy of the model derived from the original datasets with that of the aggregated global model from the partitioned database. As an example of the empirical evaluation, we present experiments on the 'Demo' database, which is taken from Data for Evaluating Learning in Valid Experiments (Delve) [13]. The top three rules for each class from the cluster-derived rules are compared against the rules mined from the un-partitioned dataset. The aim of this evaluation is to determine the effect of the clustering process on the efficiency of our classification model and its predictive accuracy. The total average error for the rules from the original dataset is 42.8%, while for the clustered partitions it is 41.83%.
Fig. 7. Accuracy of Un-clustered Rules versus Clustered Rules (bar chart: Error Rate Average, axis from 0 to 120, for Class 1 to Class 5; series: Unclustered Rules and Clustered Rules)
From the chart in Figure 7 it can be seen that, for the first two classes, the clustered rules have a lower average error rate than the un-clustered ones. Rules for class three have approximately equal average error rates for both sets of rules, and for the last two classes the clustered rules have greater average error rates than the un-clustered ones. If we average out these results, both sets of rules have approximately equal total average error rates; graphed, they would be two straight lines around the 42% average error rate mark. In the experimental evaluation, the total average error rate over all datasets was 30.86% for the un-partitioned data and 35.46% for the clustered data. Overall, in our experiments the clustering caused a loss of accuracy of 4.6 percentage points in our data modeling. Considering the other costs involved in standard schemes, we think that this drop in accuracy is not very significant. In general, our evaluation gives plausible evidence supporting SNOB in the framework of Distributed Data Mining. Many issues in this project need further research and enhancement to improve the overall DDM system performance. The use of SNOB to cluster the entire database may be questioned on the premise that it is probably impossible to apply clustering to the entire database because of memory and processor limitations. Granting this, our primary objective is to demonstrate the effectiveness of SNOB in producing meaningful partitions that enable us to reduce the costs of model aggregation. In other words, SNOB could be used to create partitions from arbitrary subsets of the very large database. A more efficient aggregation model may yield higher predictive accuracy and hence a better DDM system.
References
1. Wallace C. S., Dowe D. L.: MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 10(1), January (2000) 73-83
2. Wallace C. S., Freeman P. R.: Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society series B., 49(3), (1987) 240-265
3. Park B. H., Kargupta H.: Distributed data mining: Algorithms, systems, and applications. In Data Mining Handbook, To be published, (2002)
4. Kargupta H., Hamzaoglu I., Stafford B.: Scalable, Distributed Data Mining Using an Agent Based Architecture. Int. Conf. on Knowledge Discovery and Data Mining, August (1997) 211-214
5. Message Passing Interface Forum: MPI: A message-passing interface standard. International Journal of Supercomputer Applications, Vol. 8(3/4), (1994) 165-414
6. Quinlan, J. R.: C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann (1993)
7. Prodromidis A., Chan P.: Meta-learning in Distributed Data Mining Systems: Issues and Approaches. In Hillol Kargupta and Philip Chan, editors, Advances of Distributed Data Mining. MIT/AAAI Press, (2000)
8. Chan P., Stolfo S. J.: A Comparative Evaluation of Voting and Meta-learning on Partitioned Data. In Proceedings of Twelfth International Conference on Machine Learning, (1995) 90–98
9. Subramonian R., Parthasarathy S.: An Architecture for Distributed Data Mining. In Fourth International Conference of Knowledge Discovery and Data Mining, New York, NY, (1998) 44–59
10. Fayyad U.M., Piatetsky-Shapiro G., Smyth P.: From data mining to knowledge discovery: an overview. In: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy. Advances in Knowledge Discovery & Data Mining, AAAI/MIT, (1996) 1-34
11. Grigorios T., Lefteris A., Ioannis V.: Clustering Classifiers for Knowledge Discovery from Physically Distributed Databases. Data and Knowledge Engineering, 49(3), June (2004) 223–242
12. Kargupta H., Park B., Hershberger D., Johnson E.: Collective Data Mining: A New Perspective towards Distributed Data Mining. In Advances in Distributed and Parallel Knowledge Discovery, MIT/AAAI Press, (2000) 133–184
13. Neal, R. M.: Assessing relevance determination methods using DELVE. In C. M. Bishop (editor), Neural Networks and Machine Learning, Springer-Verlag (1998) 97-129
A Method of Data Classification Based on Parallel Genetic Algorithm
Yuexiang Shi 1, 2, 3, Zuqiang Meng 2, Zixing Cai 2, and B. Benhabib 3
1 School of information engineer, XiangTan University, XiangTan 411105, China
2 School of information engineer, Central South University, ChangSha 410082, China
3 Department of Mechanical and Industrial Engineering, Toronto University, Ontario, Canada M5S 3G8
Abstract. An effective genetic coding is designed by constructing a complete classification rule set. This coding makes it possible to exploit the strengths of genetic algorithms in data classification. The genetic algorithm is run in parallel, which improves classification and the ability to handle very large data sets. The algorithm overcomes some defects of current classification algorithms. An analysis of experimental results is given to illustrate the effectiveness of the algorithm.
1 Introduction
With the development of information technology and management, data mining has become a new field of Computer Science since the 1990s [1]. Today, data mining is used in large enterprises, telecommunications, commerce, banking and so on, and these fields have benefited greatly from it. ID3 [2] and C4.5 [3] were the earliest algorithms for data classification. These algorithms and their variants are based on decision trees, but they have some unavoidable defects, such as poor scalability [4] and adaptability. It is difficult to obtain the best decision tree using only the information gain heuristic, because a global search strategy is lacking. In fact, constructing the best decision tree is an NP-hard problem [5]. Parallel genetic algorithms, however, are particularly suited to such problems. This paper presents a technique based on a parallel genetic algorithm for classifying data. It can improve the ability of classification algorithms to handle massive data.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1217 – 1222, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Related Definitions
Definition 1 (Data set). For a specific data mining system, the data to be processed is called the data set, written $U = \{S_1, S_2, \dots, S_{size}\}$. An object in U is called a data element; it is also called a sample, example or object. The value size is the number of elements in the data set.
Definition 2 (Length of attribute). The number of attributes used to describe a data element is called the attribute length.
Definition 3 (Training set). The set of data elements used to build the prediction model is called the training set. It is written T, with $T \subseteq U$.
Theorem 1. For a data set U with attribute length h and attribute variables $A_1, A_2, \dots, A_h$, the depth of a constructed decision tree is no more than h. (Proof omitted.)
Fig. 1. The decision tree only with one node (diagram labels: node Aj; branches 1, 2, Xj)
Fig. 2. The decision tree with multi-nodes (diagram labels: nodes Aj and Ai; branches 1, 2, Xj and 1, 2, Xi)
Definition 4 (Decision tree). A tree is called a decision tree about $B\ (\subseteq U)$ if every interior node represents a test on an attribute of B, every branch represents one outcome of the test, and every leaf represents the distribution of one class.
Definition 5 (Classification rule). An IF-THEN rule r is called a classification rule about $B\ (\subseteq U)$ if the antecedent of r is a satisfiable formula (SAT) built from the attributes of B and the consequent is a judgment proposition that holds for the data elements satisfying the antecedent. Evidently, one classification rule corresponds to one class C, written $C = r(B)$.
Definition 6 (Complete classification rule set). A set R of classification rules about $B\ (\subseteq U)$ is called a complete classification rule set about B if $B = \bigcup_{r \in R} r(B)$.
Theorem 2. For a given data set U, there must be a complete classification rule set about U; that is, there exists R such that $U = \bigcup_{r \in R} r(U)$. (Proof omitted.)
Data classification builds a prediction model from a limited training set T by supervised learning. The model is used to describe the remaining data set U, that is, to classify U in advance. This model is a complete classification rule set R about U. Assume that T is extracted from U and represents U well enough; if U is very large (GB or TB), then T is large too. The research goal is to obtain the best complete classification rule set R about U from such massive data while avoiding the problems mentioned above. To solve this problem, a genetic algorithm is used to search for it.
3 The Design of Genetic Algorithm for Data Classification *Coding. From theorem 1, for the attribute length h of data set U, the depth of any decision tree is no more than h. Like that, the number of SAT’s pre-part is also no more than h from classification regulation export. So for any regulation r, it can be described as IF-THEN: h+1) all are IF P1 AND P2 AND AND Ph THE Ph+1, Among that, Pi(i=1,2 judgment proposition and corresponds to the test attribute Ai. If there is no test about this attribute, the test was presented with TRUE. Assume Ai have different values: a1 , a 2 , " , a xi . T was divided into n class: C1 , C 2 , " , C n . So the proposition Pi can be expressed as: the value of Ai is. ji{1,2, ,Xi}. This item was coded as ji. If the value of Pi is TRUE, the item’s code is star ‘*’. Proposition Ph+1 was presented as: this data element that fits the pre-part SAT belongs to class Cy. This item was coded as j’ and j ∈ {1,2, " , n} . So the IF-THEN regulation r was coded as: j1 j 2 " j h j ' and called as chromosome θ r of regulation r. For the θ r , the bit’s value not equal ‘*’ was called valid genetic bit. The number of valid genetic was called the valid length about this chromosome and was written as genetic( θ r ). It was also called as valid genetic bit and valid genetic length based on regulation r. genetic( θ r ) was written as genetic(r). j1 , j 2 ,", j h j '∈ {1,2,", max{max{ X i | i = 1,2,", h}, n},*}h +1 . So one completed class regulation set can be coded as:
{ j1(1) j 2(1) " j h(1) j (1)' , j1( 2 ) j 2( 2 ) " j h( 2 ) j ( 2)' , " , j1( q ) j 2( q ) " j h( q ) j ( q )' } Among these, q represents the size of R and has the relation with specific R. The valid genetic length of classification regulation set was defined as:
genetic( R ) = max{genetic( j1(1) j 2(1) " j h(1) j (1)' ), genetic( j1( q ) j 2( q ) " j h( q ) j ( q )' )} *Group set. Group is an individual set consisted with classification regulation. It was present as:
P = {{ j1(1) j2(1) " jh(1) j (1)' , j1( 2 ) j2( 2 ) " jh( 2 ) j ( 2 )' , " , j1( q ) j2( q ) " jh( q ) j ( q )'} | i = 1,2, " , Psize }
Psize represents the group size. *Adapt function. Adapt function is a judge standard given by the algorithm for the individual and shown like +. One excellent completed classification regulation can exactly class for U. There is a contact between classification regulation set and valid genetic bits. According above, for individual S ∈ P , the adapt function was defined as:
$$f(S) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 (1 - x_4),$$

where

$x_1 = genetic(S)$ is the valid genetic length of S;

$x_2 = \sum_{r \in S} genetic(r)$ is the total number of valid genetic bits;

$x_3 = |S|$ is the size of S, that is, the number of classification rules in S;

$x_4$ is the misclassification probability of the rule set S.
$w_1, \dots, w_4$ are the weight coefficients of the adapt function.

*Genetic Algorithm

Select. Monte Carlo selection is used.

Crossover. For any $S_1, S_2 \in P$, randomly select one rule from each of $S_1$ and $S_2$ and draw a random positive integer pos. A single-point crossover is then performed at position pos of the two selected chromosomes. At the same time, the crossover probability $P_c$ is adjusted dynamically: when the individuals tend toward a single pattern, $P_c$ is raised to strengthen the optimizing ability of the algorithm. Set

$$P_c = -k (f_{min} - f_{avg}) / (f_{max} - f_{avg} + \varepsilon),$$

where $f_{max}$ is the largest adapt value at present, $f_{min}$ is the smallest, and $f_{avg}$ is the average; $k \in (0, 1]$, and $\varepsilon$ is a small positive real number that keeps the denominator from being zero.

Variation. Variation mainly ensures that the whole search space can be reached and keeps the diversity of the individuals. Randomly select $S \in P$ and a rule $r \in S$, randomly select one of its valid genetic bits (say bit j), add a random positive integer ram to it, and take the result modulo $X_j$ (the number of values of attribute $A_j$). The variation probability $P_m$ is 0~0.05 [6].
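The coding scheme and the adapt function of this section can be illustrated with a minimal sketch. It is not the authors' implementation: the toy rules, records and labels are hypothetical, the valid length is counted literally over every bit of the chromosome that is not '*', and the misclassification probability $x_4$ is estimated under the assumption that the first matching rule decides the class and that a record matched by no rule counts as an error. The default weights are the ones reported in the experiment section.

```python
from typing import Sequence, Union

Gene = Union[int, str]   # attribute-value index, or "*" when the attribute is not tested
Rule = Sequence[Gene]    # h condition genes j_1..j_h followed by the class gene j'


def valid_length(rule: Rule) -> int:
    """genetic(r): number of bits of the chromosome that are not '*'."""
    return sum(1 for gene in rule if gene != "*")


def matches(rule: Rule, record: Sequence[int]) -> bool:
    """True when every tested condition gene equals the record's value index."""
    return all(g == "*" or g == v for g, v in zip(rule[:-1], record))


def error_rate(rules, records, labels) -> float:
    """x_4 (assumption: first matching rule predicts; no match counts as an error)."""
    wrong = 0
    for record, label in zip(records, labels):
        predicted = next((r[-1] for r in rules if matches(r, record)), None)
        wrong += predicted != label
    return wrong / len(records)


def fitness(rules, records, labels, w=(0.01, 0.0001, 0.01, 0.95)) -> float:
    """f(S) = w1*x1 + w2*x2 + w3*x3 + w4*(1 - x4)."""
    x1 = max(valid_length(r) for r in rules)   # valid genetic length of S
    x2 = sum(valid_length(r) for r in rules)   # total number of valid genetic bits
    x3 = len(rules)                            # number of rules in S
    x4 = error_rate(rules, records, labels)    # misclassification probability
    return w[0] * x1 + w[1] * x2 + w[2] * x3 + w[3] * (1 - x4)


# Hypothetical toy individual with h = 2 attributes and n = 2 classes.
rules = [(1, "*", 1), ("*", 2, 2)]
records = [(1, 1), (2, 2), (2, 1)]
labels = [1, 2, 1]
score = fitness(rules, records, labels)
```

In this toy case the third record is matched by no rule, so one of three records is misclassified.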
4 Parallel Genetic Algorithms

There are four kinds of parallel genetic algorithm. Taking the characteristics of the database system into consideration, this article selects the coarse-granularity parallel model (CPGA), in which the population is distributed over several nodes, called acnodes here. The method extracts a training set T from the data set U and generates $P_{size}$ decision trees from T using C4.5. From these trees, $P_{size}$ complete classification rule sets are derived, divided into several subgroups, and sent to the acnodes as initial colonies. Each colony then evolves independently until a certain condition is met (for example, a time interval or a number of generations). The acnodes then exchange individuals with each other and continue to evolve until the algorithm converges. The manner of exchange between the acnodes is one of the key points of the algorithm. A stochastic exchange method was presented in [8] and is shown in Fig. 3 (as an example, Pi in the figure represents an acnode, i = 1, ..., 5; the dotted lines represent possible data exchanges). This article adopts that method: although it may be inconvenient to implement, it avoids the problem of premature convergence [8].
Fig. 3. Improved CPGA (acnodes P1, P2, P3, P4, P5)
5 Experiment Results

The experiment is based on data from a subsidiary of the China Petro Co. The sample attributes mainly include control parameters such as temperature, compressive stress, and metal. The results are divided into 5 kinds, which indicate 5 product classes of different quality: particularly excellent, excellent, common, not good, and waste. The experiment compares CMPGA, which is based on the parallel genetic algorithm, with CMGA, which is based on the common genetic algorithm. The size of the training set T is 500 and the size of the testing set is 600. The initial value of the crossover probability is 0.7 and its dynamic adjustment coefficient k is 0.6; the initial value of the variation probability $P_m$ is 0.03; the weight coefficients $w_1$, $w_2$, $w_3$ and $w_4$ are 0.01, 0.0001, 0.01 and 0.95 respectively. Convergence of the algorithm is controlled by a threshold: the algorithm is considered to have converged when the classification error rate is less than or equal to 0.30%. CMPGA uses the same parameters as above; in addition, the acnodes exchange outstanding individuals every 20 generations. The parallel environment is composed of 3 PCs connected by a network, and message transfer between processes is based on traditional socket programming. The result for CMPGA is 40%-65% higher than that for CMGA.
6 Conclusions

The article constructs complete classification rule sets, realizes their genetic coding, and thereby improves the operability of the algorithm. On the one hand, the effective introduction of the parallel genetic algorithm improves the data-management and global-optimization ability of the system. On the other hand, the implicit parallelism of the genetic algorithm enlarges the number of patterns processed from n to $O(n^3)$. The method thus greatly improves the scalability of the classification algorithm and is suited to mining large databases.
References

1. Li li, Xu zhanweng. An algorithm of classification regulation and trend regulation. MiniMicro system, 2000, 21(3): 319-321
2. Quinlan J R. Induction of decision trees. Machine Learning, 1986, (1): 81-106
3. Quinlan J R. C4.5 Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers, Inc. 1993
4. Han Jianwei, Kamber M. Data Mining: Concept and technology. Peking: Mechanical industry publishers. 2001.8
5. Hong J R. AE1: An extension approximate method for general covering problem. International Journal of Computer and Information Science, 1985, 14(6): 421-437
6. Xi Yugong, Zi Tianyou. Summary of genetic algorithm. Control theory and application. 1996, 13(6): 697-708
7. Wang Hongyan. Parallel genetic algorithm research development. Computer Science, 1999, 26(6): 48-53
8. Meng Zuqiang, Zheng Jinghua. Parallel genetic algorithm based on shared memory and communication. Computer engineering and application. 2000, 36(5): 72-74
Rough Computation Based on Similarity Matrix Huang Bing1, Guo Ling2, He Xin2, and Xian-zhong Zhou3 1
Department of Computer Science & Technology, Nanjing Audit University, Nanjing 210029, China 2 Department of Automation, Nanjing University of Science & Technology, Nanjing 210094, China 3 School of Engineering Management, Nanjing University, Nanjing 210093, China
Abstract. Knowledge reduction is one of the most important tasks in rough set theory, and most types of reductions in this area are based on complete information systems. However, many information systems are not complete in real world. Though several extended relations have been presented under incomplete information systems, not all reduction approaches to these extended models have been examined. Based on similarity relation, the similarity matrix and the upper/lower approximation reduction are defined under incomplete information systems. To present similarity relation with similarity matrix, the rough computational methods based on similarity relation are studied. The heuristic algorithms for non-decision and decision incomplete information systems based on similarity matrix are proposed, and the time complexity of algorithms is analyzed. Finally, an example is given to illustrate the validity of these algorithms presented.
1 Introduction The rough set theory proposed by Pawlak[1] is a relatively new mathematic tool to deal with imprecise and uncertain information. Because of the successful application of rough set theory to knowledge acquisition, intelligent information processing, pattern recognition and data mining, many researchers in these fields are very interested in this new research topic since it offers opportunities to discover useful knowledge in information systems. In rough set theory, a table called information system or information table is used as a special kind of formal language to represent knowledge. In an information system, if all the attribute values of each object are known, this information system is called complete. Otherwise, it is called incomplete. Obviously, it is very difficult to ensure completion of an information system due to the uncertainty of information. As we know, not all of attributes are indispensable in an information system. The approach to removing all redundant attributes and preserving the classification or decision ability of the system is called attribute reduction or knowledge reduction. Though knowledge reduction has been an important issue in rough set theory, most of knowledge reductions are based on complete information systems [2-4]. To deal with incomplete information, several extended models to the classical rough set are L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1223 – 1231, 2005. © Springer-Verlag Berlin Heidelberg 2005
proposed [5-6]. In these extended theories, equivalence relation is relaxed to tolerance relation, similarity relation or general binary relation, respectively. In [7], several Definitions of knowledge reductions based on tolerance relation were presented. However, corresponding definitions and approaches to knowledge reduction based on similarity relation were not studied. In this paper, we aim at the approach to attribute reduction based on similarity relation in incomplete information systems. The remainder of this paper is organized as follows. Section 2 reviews rough set model based on similarity relation in incomplete systems and Section 3 gives the definitions of lower approximation reduction and lower approximation matrix in incomplete data tables, and proposes an algorithm for lower approximation reduction. Similar to Section 3, the corresponding contents are studied in incomplete decision tables in Section 4. An illustrative example is given in Section 5 and Section 6 concludes this paper.
2 Incomplete Information Systems and Similarity Relation

Definition 1. $S = (U, A = C \cup D, V, f)$ is called an information system, where U is a non-empty and finite set of objects called the universe, C is a non-empty and finite set of conditional attributes, D is a finite set of decision attributes, and $C \cap D = \emptyset$. In general, D is a set containing a single element. The information function f is a map from $U \times A$ onto V, a set of attribute values. If every object in the universe has a value on every attribute, the information system is complete. Otherwise, it is incomplete. Furthermore, if $D = \emptyset$, it is a non-decision information system, otherwise a decision table. For simplicity, we assume that all values of every object on the decision attributes are known.

Definition 2. In an incomplete information system $S = (U, A = C \cup D, V, f)$, where $x, y \in U$, the object x is called similar to y with respect to $B \subseteq A$ if, for all $b \in B$,

$$f(x, b) = * \ \vee\ f(x, b) = f(y, b),$$

where * denotes a missing value. The relation between x and y is denoted as $S_B(x, y)$.

Obviously, the similarity relation is reflexive and transitive, but not symmetric.

Definition 3. $R_B(x) = \{ y \in U : S_B(y, x) \}$; $R_B^{-1}(x) = \{ y \in U : S_B(x, y) \}$.

$R_B(x)$ denotes the set of all objects in U which are similar to x with respect to B, and $R_B^{-1}(x)$ denotes the set of all objects in U to which x is similar with respect to B. Obviously, for all $b \in B \subseteq A$, $R_B(x) \subseteq R_{B \setminus \{b\}}(x)$ and $R_B^{-1}(x) \subseteq R_{B \setminus \{b\}}^{-1}(x)$. In general, $R_B(x) \neq R_B^{-1}(x)$.
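Definition 4. $S = (U, A = C \cup D, V, f)$ is an incomplete information system. The upper approximation $\overline{B}(X)$ and lower approximation $\underline{B}(X)$ of $X \subseteq U$ with respect to $B \subseteq A$ are defined as follows:

$$\overline{B}(X) = \bigcup_{x \in X} R_B(x), \qquad \underline{B}(X) = \{ x \in U : R_B^{-1}(x) \subseteq X \}.$$

Theorem 1. For all $b \in B \subseteq A$ and $X \subseteq U$, $\overline{B}(X) \subseteq \overline{B \setminus \{b\}}(X)$ and $\underline{B \setminus \{b\}}(X) \subseteq \underline{B}(X)$.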
Based on the similarity relation, we can define the upper/lower approximation reduction.
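Definitions 2 to 4 can be made concrete with a short sketch. The four-object table below is hypothetical and only serves as an illustration; the function names are likewise illustrative choices, and '*' marks a missing value as in Definition 2.

```python
def similar(x, y, B):
    """S_B(x, y): x is similar to y on B (each value of x is '*' or equals y's)."""
    return all(x[b] == "*" or x[b] == y[b] for b in B)


def R(U, i, B):
    """R_B(x_i): the objects that are similar to x_i."""
    return {j for j in U if similar(U[j], U[i], B)}


def R_inv(U, i, B):
    """R_B^{-1}(x_i): the objects to which x_i is similar."""
    return {j for j in U if similar(U[i], U[j], B)}


def upper(U, X, B):
    """Upper approximation: union of R_B(x) over x in X."""
    return set().union(*(R(U, i, B) for i in X))


def lower(U, X, B):
    """Lower approximation: objects x whose R_B^{-1}(x) lies inside X."""
    return {i for i in U if R_inv(U, i, B) <= X}


# Hypothetical incomplete table with attributes a and b.
U = {
    1: {"a": 1, "b": "*"},
    2: {"a": 1, "b": 0},
    3: {"a": "*", "b": 0},
    4: {"a": 2, "b": 1},
}
B = ["a", "b"]
X = {1, 2}
low, up = lower(U, X, B), upper(U, X, B)
```

Here object 1 is similar to object 2 but not conversely, which shows the lack of symmetry, and the lower approximation is contained in X, which in turn is contained in the upper approximation.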
3 Similarity Matrix and Approximation Reduction for Non-decision Incomplete Information Systems

Definition 5. $S = (U, C, V, f)$ is a non-decision incomplete information system, where $U = \{x_1, x_2, \dots, x_n\}$. The lower approximation matrix of $B \subseteq C$ is defined as

$$\underline{M}_B = (\underline{m}_{ij}^B)_{n \times n}, \qquad \underline{m}_{ij}^B = \begin{cases} 1, & x_j \in R_B^{-1}(x_i) \\ 0, & \text{else.} \end{cases}$$

Definition 6. $S = (U, C, V, f)$ is a non-decision incomplete information system, where $U = \{x_1, x_2, \dots, x_n\}$. The upper approximation matrix of $B \subseteq C$ is defined as

$$\overline{M}_B = (\overline{m}_{ij}^B)_{n \times n}, \qquad \overline{m}_{ij}^B = \begin{cases} 1, & x_j \in R_B(x_i) \\ 0, & \text{else.} \end{cases}$$
Theorem 2. The upper/lower approximation matrices have the following properties:

(1) $\underline{m}_{ii}^B = \overline{m}_{ii}^B = 1$, $1 \le i \le n$;

(2) $\underline{m}_{ij}^B = 1,\ \underline{m}_{jl}^B = 1 \Rightarrow \underline{m}_{il}^B = 1$; $\overline{m}_{ij}^B = 1,\ \overline{m}_{jl}^B = 1 \Rightarrow \overline{m}_{il}^B = 1$;

(3) $\underline{m}_{ij}^B = 1 \Leftrightarrow \overline{m}_{ji}^B = 1$.

Proof. (1) and (2) follow directly from the reflexivity and transitivity of the similarity relation. (3) holds since $x_j \in R_B^{-1}(x_i) \Leftrightarrow x_i \in R_B(x_j)$.

Definition 7. $S = (U, C, V, f)$ is a non-decision incomplete information system, where $U = \{x_1, x_2, \dots, x_n\}$ and $B \subseteq C$. If for all $x_i \in U$ ($1 \le i \le n$), $R_B^{-1}(x_i) = R_C^{-1}(x_i)$ (respectively $R_B(x_i) = R_C(x_i)$), B is called a lower (respectively upper) approximation consistent set. If B is a lower/upper approximation consistent set and no proper subset of B is, B is called a lower/upper approximation reduct. According to (3) in Theorem 2, we only discuss the lower approximation reduct.
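Definition 8. For two matrices $M_1 = (m_{ij}^1)_{m \times n}$ and $M_2 = (m_{ij}^2)_{m \times n}$, $M_1 \le M_2 \Leftrightarrow m_{ij}^1 \le m_{ij}^2$ for all $1 \le i \le m$, $1 \le j \le n$; $M_1 < M_2 \Leftrightarrow M_1 \le M_2$ and there exists $m_{ij}^1 < m_{ij}^2$.

Theorem 3. If $E \subset F \subseteq C$, then $\underline{M}_F \le \underline{M}_E$.

Theorem 4. $B \subseteq C$ is a lower similarity reduct $\Leftrightarrow$ $\underline{M}_B = \underline{M}_C$ and, for all $b \in B$, $\underline{M}_B < \underline{M}_{B \setminus \{b\}}$ holds.

Definition 9. $S = (U, C, V, f)$ is a non-decision incomplete system, where $B \subseteq C$. If $\underline{M}_B < \underline{M}_{B \setminus \{b\}}$, we say b is lower-approximation indispensable in B. The set of all lower-approximation indispensable attributes in C is called the core of $S = (U, C, V, f)$, denoted as $Core(C) = \{ c \in C : \underline{M}_{C \setminus \{c\}} > \underline{M}_C \}$.

Theorem 5. If $Red(C)$ denotes the set of all lower-approximation reducts of $S = (U, C, V, f)$, then $Core(C) = \bigcap Red(C)$.

Proof. For every $c \in Core(C)$, $\underline{M}_{C \setminus \{c\}} > \underline{M}_C$. If there exists a lower approximation reduct B such that $c \notin B$, then $B \subseteq C \setminus \{c\}$, hence by Theorem 3 $\underline{M}_B \ge \underline{M}_{C \setminus \{c\}} > \underline{M}_C$, which contradicts $\underline{M}_B = \underline{M}_C$. On the other hand, for every $c \in \bigcap Red(C)$, if $c \notin Core(C)$, then $\underline{M}_{C \setminus \{c\}} = \underline{M}_C$, thus there is a subset of $C \setminus \{c\}$ which is a lower approximation reduct, and this reduct does not contain c. This is a contradiction.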
Definition 10. $S = (U, C, V, f)$ is a non-decision incomplete information system. Let $\underline{M}_B = (\underline{m}_{ij}^B)_{n \times n}$ be the lower approximation matrix of $B \subseteq C$. The lower approximation significance of $b \in C \setminus B$ with respect to B and of $b \in B$ in B are defined respectively as

$$Sig_B(b) = |\underline{M}_B| - |\underline{M}_{B \cup \{b\}}| \quad \text{and} \quad Sig_{B \setminus \{b\}}(b) = |\underline{M}_{B \setminus \{b\}}| - |\underline{M}_B|,$$

where $|\underline{M}_B| = |\{ \underline{m}_{ij}^B : \underline{m}_{ij}^B = 1,\ 1 \le i, j \le n \}|$ is the number of entries of $\underline{M}_B$ equal to 1.

According to Definition 5 and 6, the lower and upper approximation reduct are both minimal conditional attribute subsets, which preserve the lower and upper approximation of every decision class respectively.

Definition 11. If $Sig_B(b) > 0$, b is called significant with respect to B. If $Sig_{B \setminus \{b\}}(b) > 0$, b is called significant in B.
Theorem 6. $Core(C) = \{ c \in C : Sig_{C \setminus \{c\}}(c) > 0 \}$.

An algorithm to compute a lower approximation reduct of an incomplete data table is given as follows.

Algorithm 1.

Input: an incomplete data table $S = (U, C, V, f)$
Output: a lower approximation reduct of $S = (U, C, V, f)$

1. Compute $\underline{M}_C$;
2. Compute $Core(C) = \{ c \in C : Sig_{C \setminus \{c\}}(c) > 0 \}$. Let $B = Core(C)$;
   (1) Judge whether $\underline{M}_B = \underline{M}_C$. If it holds, go to 3;
   (2) Select $c_0 \in \{ c : Sig_B(c) = \max_{b \in C \setminus B} Sig_B(b) \}$, and let $B = B \cup \{c_0\}$. Go to (1).
3. Let k be the number of attributes added to $B = Core(C)$ in step 2, denoted as $\{ c_{i_j} : j = 1, 2, \dots, k \}$, where j is the order of addition. Judge whether $Sig_{B \setminus \{c_{i_j}\}}(c_{i_j}) = 0$ in reverse order. If it holds, then $B = B \setminus \{c_{i_j}\}$.
4. Output B.
Notes: Attributes are added step by step, starting from the core, until the attribute set satisfies the condition of a lower approximation consistent set. The significance of an attribute is used as the measure for choosing which attribute to add: the greater the significance, the more important the attribute. This heuristic aims at a reduct with as few elements as possible. The last step of Algorithm 1 checks whether each added non-core attribute is dispensable, to ensure that the final result is a reduct.

We analyze the time complexity of Algorithm 1 as follows. In step 1, the time complexity is $O(|C||U|^2)$. In step 2, we need to compute all $\underline{M}_{C \setminus \{c\}}$ ($c \in C$) and compare them with $\underline{M}_C$, so its time complexity is $O(|C|^2|U|^2)$. The time complexity of (1) is $O(|C|^2|U|^2)$, and that of (2) is $O(|C|^2|U|^2)$ since we need to find the matrices of every $B \cup \{c\}$ and compute the number of nonzero elements. In the worst case, the algorithm loops $|C|$ times and the time complexity of step 2 is $O(|C|^3|U|^2)$. Therefore, the time complexity of Algorithm 1 is $O(|C|^3|U|^2)$.
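The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, matrices are recomputed on every call instead of being cached, and ties in the significance are broken by attribute order, which the paper does not specify. The data are the condition attributes of the ten objects listed in Table 1 of the example section.

```python
STAR = "*"
ATTRS = ["c1", "c2", "c3", "c4", "c5"]
# Condition-attribute values of x1..x10 as listed in Table 1.
TABLE = [
    (1, 0, 1, 1, 0),
    (1, 0, 1, STAR, STAR),
    (1, 1, STAR, STAR, STAR),
    (1, STAR, STAR, STAR, STAR),
    (1, STAR, 0, STAR, 1),
    (0, 1, 0, 0, 1),
    (0, 1, 0, STAR, STAR),
    (0, 0, STAR, STAR, STAR),
    (0, STAR, STAR, STAR, STAR),
    (0, STAR, 1, STAR, 0),
]


def lower_matrix(B):
    """Lower approximation matrix: entry (i, j) is 1 iff x_i is similar to x_j on B."""
    cols = [ATTRS.index(b) for b in B]
    return tuple(
        tuple(int(all(xi[c] == STAR or xi[c] == xj[c] for c in cols)) for xj in TABLE)
        for xi in TABLE
    )


def ones(B):
    """|M_B|: number of entries equal to 1."""
    return sum(map(sum, lower_matrix(B)))


def core():
    """Attributes whose removal from C enlarges the matrix (Theorem 6)."""
    return [c for c in ATTRS if ones([a for a in ATTRS if a != c]) > ones(ATTRS)]


def reduct():
    target = lower_matrix(ATTRS)                       # step 1
    B, added = core(), []                              # step 2
    while lower_matrix(B) != target:                   # step 2 (1)
        rest = [c for c in ATTRS if c not in B]
        best = max(rest, key=lambda c: ones(B) - ones(B + [c]))   # step 2 (2)
        B.append(best)
        added.append(best)
    for c in reversed(added):                          # step 3: drop redundant additions
        without = [a for a in B if a != c]
        if ones(without) == ones(B):
            B = without
    return sorted(B)                                   # step 4


result = reduct()
```

On this table the sketch returns the core {c1, c2} and the significances 8, 2 and 10 for c3, c4 and c5, so c5 is added and the output is {c1, c2, c5}.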
4 Similarity Matrix and Upper/lower Approximation Reduction for Incomplete Decision Table
Definition 12. $S = (U, A = C \cup D, V, f)$ is an incomplete decision table, where $U = \{x_1, x_2, \dots, x_n\}$. Define the lower and upper generalized decision functions as $\underline{d}_B(x_i) = \{ f(x_j, d) : x_j \in R_B^{-1}(x_i) \}$ and $\overline{d}_B(x_i) = \{ f(x_j, d) : x_j \in R_B(x_i) \}$ respectively.

Definition 13. $S = (U, A = C \cup D, V, f)$ is an incomplete decision table, where $U = \{x_1, x_2, \dots, x_n\}$ and $B \subseteq C$. B is a relative lower approximation consistent set if for all $x_i \in U$, $|\underline{d}_C(x_i)| = 1 \Rightarrow |\underline{d}_B(x_i)| = 1$, and B is a relative upper approximation consistent set if for all $x_i \in U$, $\overline{d}_B(x_i) = \overline{d}_C(x_i)$.

If B is a relative lower/upper approximation consistent set and no proper subset of B is, B is a relative lower/upper reduct. According to the above definitions, a relative lower/upper approximation reduct is a minimal condition attribute set that keeps the lower/upper approximation of each decision class unchanged. In other words, they keep all certain and possible rules unchanged, respectively.

Definition 14. $S = (U, C \cup D, V, f)$ is an incomplete decision table, where $U = \{x_1, x_2, \dots, x_n\}$. The relative lower approximation matrix of $B \subseteq C$ is defined as $\underline{D}_B = (\underline{d}_{ij}^B)_{n \times n}$, where

$$\underline{d}_{ij}^B = \begin{cases} 1, & |\underline{d}_B(x_i)| = 1,\ f(x_j, d) = f(x_i, d) \\ 0, & \text{else.} \end{cases}$$

Definition 15. $S = (U, C \cup D, V, f)$ is an incomplete decision table, where $U = \{x_1, x_2, \dots, x_n\}$. The relative upper approximation matrix of $B \subseteq C$ is defined as $\overline{D}_B = (\overline{d}_{ij}^B)_{n \times n}$, where

$$\overline{d}_{ij}^B = \begin{cases} 1, & f(x_j, d) \in \overline{d}_B(x_i) \\ 0, & \text{else.} \end{cases}$$

Theorem 7. (1) $B \subseteq C$ is a relative lower approximation reduct $\Leftrightarrow$ $\underline{D}_B = \underline{D}_C$ and for all $b \in B$, $\underline{D}_{B \setminus \{b\}} < \underline{D}_B$;

(2) $B \subseteq C$ is a relative upper approximation reduct $\Leftrightarrow$ $\overline{D}_B = \overline{D}_C$ and for all $b \in B$, $\overline{D}_{B \setminus \{b\}} > \overline{D}_B$.
Similarly, we give the matrix representation of the relative upper and lower approximation cores as follows:

Definition 16. (1) Relative lower approximation core: $\underline{Core}_D(C) = \{ c \in C : \underline{D}_{C \setminus \{c\}} < \underline{D}_C \}$; (2) relative upper approximation core: $\overline{Core}_D(C) = \{ c \in C : \overline{D}_{C \setminus \{c\}} > \overline{D}_C \}$.

Theorem 8. $\underline{Core}_D(C) = \bigcap \underline{Red}_D(C)$, where $\underline{Red}_D(C)$ is the set of all relative lower approximation reducts. $\overline{Core}_D(C) = \bigcap \overline{Red}_D(C)$, where $\overline{Red}_D(C)$ is the set of all relative upper approximation reducts.

Definition 17. The relative lower and upper approximation significance of $b \in C \setminus B$ with respect to B are defined as $\underline{Sig}_B^D(b) = |\underline{D}_{B \cup \{b\}}| - |\underline{D}_B|$ and $\overline{Sig}_B^D(b) = |\overline{D}_B| - |\overline{D}_{B \cup \{b\}}|$ respectively.

An algorithm to compute a relative lower approximation reduct for an incomplete decision table is given as follows.

Algorithm 2.
Input: an incomplete decision table $S = (U, C \cup D, V, f)$
Output: a relative lower approximation reduct of $S = (U, C \cup D, V, f)$.

1. Compute $\underline{D}_C$;
2. Compute $\underline{Core}_D(C) = \{ c \in C : |\underline{D}_C| - |\underline{D}_{C \setminus \{c\}}| > 0 \}$. Let $B = \underline{Core}_D(C)$;
   (1) Judge whether $\underline{D}_B = \underline{D}_C$. If it holds, go to 3;
   (2) Select $c_0 \in \{ c \in C \setminus B : \underline{Sig}_B^D(c) = \max_{b \in C \setminus B} \underline{Sig}_B^D(b) \}$ and let $B = B \cup \{c_0\}$. Then go to (1).
3. Let k be the number of attributes added to $B = \underline{Core}_D(C)$ in step 2, denoted as $\{ c_{i_j} : j = 1, 2, \dots, k \}$, where j is the order of addition. Judge whether $\underline{Sig}_{B \setminus \{c_{i_j}\}}^D(c_{i_j}) = 0$ in reverse order. If it holds, then $B = B \setminus \{c_{i_j}\}$.
4. Output B.

The time complexity of Algorithm 2 is $O(|C|^3|U|^2)$, by an analysis similar to that of Algorithm 1. The algorithm for the relative upper approximation reduct can be given in a similar way.
5 Example Analysis

An incomplete information system is given in Table 1, where $U = \{x_1, x_2, \dots, x_{10}\}$, $C = \{c_1, c_2, c_3, c_4, c_5\}$, $D = \{d\}$.

Table 1.

U\A   c1   c2   c3   c4   c5   d
x1    1    0    1    1    0    1
x2    1    0    1    *    *    1
x3    1    1    *    *    *    2
x4    1    *    *    *    *    2
x5    1    *    0    *    1    1
x6    0    1    0    0    1    2
x7    0    1    0    *    *    2
x8    0    0    *    *    *    1
x9    0    *    *    *    *    1
x10   0    *    1    *    0    2
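Without considering the decision attribute $D = \{d\}$, we use Algorithm 1 to obtain the lower approximation reduct of the incomplete data table. We compute $B = Core(C) = \{c_1, c_2\}$ and

$$Sig_B(c_3) = |\underline{M}_B| - |\underline{M}_{B \cup \{c_3\}}| = 8, \quad Sig_B(c_4) = |\underline{M}_B| - |\underline{M}_{B \cup \{c_4\}}| = 2, \quad Sig_B(c_5) = |\underline{M}_B| - |\underline{M}_{B \cup \{c_5\}}| = 10.$$

Since $c_5$ has the greatest significance, it is added to B, and $\underline{M}_{B \cup \{c_5\}} = \underline{M}_C$. So $B = \{c_1, c_2, c_5\}$ is the minimal reduct. Similarly, with Algorithm 2 we obtain the corresponding relative lower approximation reduct, which is $\{c_1, c_2, c_3\}$ or $\{c_1, c_2, c_5\}$.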
6 Conclusions Knowledge reduction is always a hot topic in rough set theory. However, most of knowledge reductions are based on complete information systems. Because of the existence of incomplete information in real world, the classical theory model of rough sets is extended to that based on similarity relation. In this paper, based on similarity relation, the upper/lower approximation reduction is defined and their approaches to corresponding reduction are examined. Finally, the example analysis shows the validity of this approach.
References 1. Pawlak Z. Rough sets. International Journal of Computer and Information Sciences 1982;11:341-356. 2. Ju-Sheng Mi, Wei-Zhi Wu, Wen-Xiu Zhang. Approaches to approximation reducts in inconsistent decision tables. In G. Wang et al. (Eds.): Rough Set, Fuzzy Sets, Data Mining, and Granular Computing, 2003; 283-286. 3. Wen-Xiu Zhang, Ju-Sheng Mi, Wei-Zhi Wu. Approaches to knowledge reductions in inconsistent systems. International Journal of Intelligent Systems 2003; 18: 989-1000. 4. Zhang Wen-Xiu, Leung Yee, Wu Wei-Zhi. Information systems and knowledge discovery. Beijing: Science Press; 2003. 5. Kryszkiewicz, M. Rough set approach to incomplete information systems. Information Sciences 1998; 112, 39-49. 6. R. Slowinski, D. Vanderpooten. A generalized definition of rough approximations based on similarity. IEEE Transaction on Knowledge and Data Engineering 2000; 12: 331-336. 7. Zhou Xian-zhong Huang Bing. Rough set-based attribute reduction under incomplete information systems. Journal of Nanjing University of Science & Technology, 2003; 27: 630-635.
The Relationship Among Several Knowledge Reduction Approaches Keyun Qin1 , Zheng Pei2 , and Weifeng Du1 1
Department of Applied Mathematics, Southwest Jiaotong University, Chengdu, Sichuan 610031, China [email protected] 2 College of Computers & Mathematical-Physical Science, Xihua University, Chengdu, Sichuan, 610039, China [email protected]
Abstract. This paper is devoted to the discussion of the relationship among some reduction approaches of information systems. It is proved that the distribution reduction and the entropy reduction are equivalent, and each distribute reduction is a d reduction. Furthermore, for consistent information systems, the distribution reduction, entropy reduction, maximum distribution reduction, distribute reduction, approximate reduction and d reduction are all equivalent.
1
Introduction
Rough set theory(RST), proposed by Pawlak [1], [2], is an extension of set theory for the study of intelligent systems characterized by insufficient and incomplete information. The successful application of RST in a variety of problems have amply demonstrated its usefulness. One important application of RST is the knowledge discovery in information system (decision table). RST operates on an information system which is made up of objects for which certain characteristics (i.e., condition attributes) are known. Objects with the same condition attribute values are grouped into equivalence classes or condition classes. The objects are each classified to a particular category with respect to the decision attribute value, those classified to the same category are in the same decision class. Using the concepts of lower and upper approximations in RST, the knowledge hidden in the information system may be discovered. One fundamental aspect of RST involves the searching for some particular subsets of condition attributes. By such one subset the information for classification purpose provides is equivalent to (according to a particular standard) the condition attribute set done. Such subsets are called reducts. To acquire brief decision rules from information systems, knowledge reduction is needed. Knowledge reduction is performed in information systems by means of the notion of a reduct based on a specialization of the general notion of independence due to Marczewski [3]. In recent years, more attention has been paid to knowledge reduction in information systems in rough set research. Many types L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1232–1241, 2005. c Springer-Verlag Berlin Heidelberg 2005
of knowledge reduction and their applications have been proposed for inconsistent information systems in the area of rough sets [4], [5], [6], [7], [8], [9], [10], [11], [12]. The first knowledge reduction approach, due to [6], is carried out through discernibility matrices and discernibility functions. This kind of reduction is based on the positive region of the universe and we call it d reduction. For inconsistent information systems, Kryszkiewicz [7] proposed the concepts of distribution reduction and distribute reduction. Zhang [5] proposed the concepts of maximum distribution reduction and approximate reduction and provided new approaches to knowledge reduction in inconsistent information systems. Furthermore, some approaches to knowledge reduction based on the variable precision rough set model were proposed [13]. Information entropy is a measure of the information involved in a system. Based on conditional information entropy, some knowledge reduction approaches in information systems were proposed in [4]. This paper is devoted to the discussion of the relationship among some reduction approaches of information systems. It is proved that the distribution reduction and the entropy reduction are equivalent, and each distribute reduction is a d reduction. Furthermore, for consistent information systems, the distribution reduction, entropy reduction, maximum distribution reduction, distribute reduction, approximate reduction and d reduction are all equivalent.
2
Preliminaries and Notations
An information system is a quadruple $S = (U, AT \cup \{d\}, V, f)$, where
(1) U is a non-empty finite set and its elements are called objects of S.
(2) AT is the set of condition attributes and d is the decision attribute of S.
(3) $V = \bigcup_{q \in AT \cup \{d\}} V_q$, where $V_q$ is a non-empty set of values of attribute $q \in AT \cup \{d\}$, called the domain of the attribute q.
(4) $f : U \times (AT \cup \{d\}) \to V$ is a mapping, called the description function of S, such that $f(x, q) \in V_q$ for each $(x, q) \in U \times (AT \cup \{d\})$.

Let $S = (U, AT \cup \{d\}, V, f)$ be an information system and $A \subseteq AT$. The indiscernibility relation ind(A) on U derived from A, defined by $(x, y) \in ind(A)$ if and only if $\forall a \in A$, $f(x, a) = f(y, a)$, is an equivalence relation and hence (U, ind(A)) is a Pawlak approximation space. We denote by $[x]_A$ the equivalence class with respect to ind(A) that contains x, and by U/A the set of these equivalence classes. For each $X \subseteq U$, according to Pawlak [1], the upper approximation $\overline{A}(X)$ and lower approximation $\underline{A}(X)$ of X with respect to A are defined as

$$\overline{A}(X) = \{ x \in U \mid [x]_A \cap X \neq \emptyset \}, \qquad \underline{A}(X) = \{ x \in U \mid [x]_A \subseteq X \}. \tag{1}$$

Based on the approximation operators, Skowron proposed the concept of d reduction of an information system.

Definition 1. Let $S = (U, AT \cup \{d\}, V, f)$ be an information system and $A \subseteq AT$. The positive region $pos_A(d)$ of d with respect to A is defined as $pos_A(d) = \bigcup_{X \in U/d} \underline{A}(X)$.
Definition 2. Let $S = (U, AT \cup \{d\}, V, f)$ be an information system and $A \subseteq AT$. A is called a d consistent subset of S if $pos_A(d) = pos_{AT}(d)$. A is called a d reduction of S if A is a d consistent subset of S and each proper subset of A is not a d consistent subset of S.

All the d reductions can be carried out through discernibility matrices and discernibility functions [6]. Let $S = (U, AT \cup \{d\}, V, f)$ be an information system and $B \subseteq AT$, $x \in U$. We introduce the following notations:

$U/d = \{D_1, \dots, D_r\}$;
$\mu_B(x) = (D(D_1/[x]_B), \dots, D(D_r/[x]_B))$;
$\gamma_B(x) = \{ D_j ;\ D(D_j/[x]_B) = \max_{q \le r} D(D_q/[x]_B) \}$;
$\delta_B(x) = \{ D_j ;\ D_j \cap [x]_B \neq \emptyset \}$;
$\eta_B = \frac{1}{|U|} \sum_{j=1}^{r} |\overline{B}(D_j)|$;

where $D(D_j/[x]_B) = \frac{|D_j \cap [x]_B|}{|[x]_B|}$ is the inclusion degree of $[x]_B$ in $D_j$.

For inconsistent information systems, Kryszkiewicz [7] proposed the concepts of distribution reduction and distribute reduction. Based on this work, Zhang [5] proposed the concepts of maximum distribution reduction and approximate reduction. Furthermore, the judgement theorems and discernibility matrices with respect to those reductions were obtained. These reductions are based on the concept of inclusion degree.
Definition 3. Let S = (U, AT ∪ {d}, V, f ) be an information system, A ⊆ AT . (1) A is called a distribution consistent set of S if μA (x) = μAT (x) for each x ∈ U . A is called a distribution reduction of S if A is a distribution consistent set of S and no proper subset of A is distribution consistent set of S. (2) A is called a maximum distribution consistent set of S if γA (x) = γAT (x) for each x ∈ U . A is called a maximum distribution reduction of S if A is a maximum distribution consistent set of S and no proper subset of A is maximum distribution consistent set of S. (3) A is called a distribute consistent set of S if δA (x) = δAT (x) for each x ∈ U . A is called a distribute reduction of S if A is a distribute consistent set of S and no proper subset of A is distribute consistent set of S. (4) A is called a approximate consistent set of S if ηA = ηAT . A is called a approximate reduction of S if A is a approximate consistent set of S and no proper subset of A is approximate consistent set of S. [5] proved that the concepts of distribute consistent set and approximate consistent set are equivalent, a distribution consistent set must be a distribute consistent set and a maximum distribution consistent set. Let S = (U, AT ∪ {d}, V, f ) be an information system, A ⊆ AT and U/AT = {Xi ; 1 ≤ i ≤ n},
U/A = {Yj ; 1 ≤ j ≤ m},
U/d = {Zl ; 1 ≤ l ≤ k}.
The conditional information entropy H(d|A) of d with respect to A is defined as

$$H(d|A) = -\sum_{j=1}^{m} \Big( p(Y_j) \cdot \sum_{l=1}^{k} p(Z_l|Y_j) \log(p(Z_l|Y_j)) \Big),$$

where $p(Y_j) = \frac{|Y_j|}{|U|}$, $p(Z_l|Y_j) = \frac{|Z_l \cap Y_j|}{|Y_j|}$ and $0 \log 0 = 0$. Based on conditional information entropy, Wang [4] proposed the concept of entropy reduction for information systems.
Definition 4. Let $S = (U, AT \cup \{d\}, V, f)$ be an information system and $A \subseteq AT$. A is called an entropy consistent set of S if H(d|A) = H(d|AT). A is called an entropy reduction of S if A is an entropy consistent set of S and no proper subset of A is an entropy consistent set of S.
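The conditional entropy and the entropy consistency test of Definition 4 can be sketched as follows. The four-row decision table is hypothetical, the function name is an illustrative choice, and base-2 logarithms are an assumption, since the text does not fix the base of log. Classes $Z_l$ that do not meet a block $Y_j$ never appear in the counts, which realises the convention 0 log 0 = 0.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, A, d):
    """H(d|A) = -sum_j p(Y_j) * sum_l p(Z_l|Y_j) * log p(Z_l|Y_j)."""
    blocks = defaultdict(Counter)            # one Counter of decision values per class Y_j
    for row in rows:
        blocks[tuple(row[a] for a in A)][row[d]] += 1
    n = len(rows)
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())          # |Y_j|
        for c in counts.values():            # c = |Z_l ∩ Y_j| > 0
            h -= (size / n) * (c / size) * log2(c / size)
    return h


# Hypothetical inconsistent decision table with condition attributes a, b.
rows = [
    {"a": 0, "b": 0, "d": "y"},
    {"a": 0, "b": 1, "d": "y"},
    {"a": 1, "b": 0, "d": "y"},
    {"a": 1, "b": 0, "d": "n"},
]
h_full = conditional_entropy(rows, ["a", "b"], "d")
h_a = conditional_entropy(rows, ["a"], "d")
h_b = conditional_entropy(rows, ["b"], "d")
```

In this toy table {a} has the same conditional entropy as {a, b} and is therefore an entropy consistent set, while {b} has a strictly larger entropy and is not.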
3
The Relationship Among Knowledge Reduction Approaches
In this section, we discuss the relationship among knowledge reduction approaches. In what follows we assume that S = (U, AT ∪ {d}, V, f ) is an information system, A ⊆ AT and U/AT = {Xi ; 1 ≤ i ≤ n},
U/A = {Yj ; 1 ≤ j ≤ m},
U/d = {Zl ; 1 ≤ l ≤ k}.
Theorem 5. Let Yj = ∪t∈Tj Xt , 1 ≤ j ≤ m, where Tj is an index set. For each 1 ≤ l ≤ k, |Zl ∩ Yj |log(
|Zl ∩ Yj | |Zl ∩ Xt | )≤ ). |Zl ∩ Xt |log( |Yj | |Xt | t∈Tj
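Proof. Let $T_j^1 = \{t \in T_j ; Z_l \cap X_t \neq \emptyset\}$. If $T_j^1 = \emptyset$, then $Z_l \cap X_t = \emptyset$ for each $t \in T_j$ and hence $Z_l \cap Y_j = \emptyset$, so the conclusion holds. If $T_j^1 \neq \emptyset$, by $\ln x \le x - 1$, it follows that

$$
\begin{aligned}
&|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} - \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|} \\
&= \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap Y_j|}{|Y_j|} - \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|} \\
&= \sum_{t \in T_j^1} |Z_l \cap X_t| \log\frac{|Z_l \cap Y_j||X_t|}{|Y_j||Z_l \cap X_t|} \\
&\le \sum_{t \in T_j^1} |Z_l \cap X_t| \Big( \frac{|Z_l \cap Y_j||X_t|}{|Y_j||Z_l \cap X_t|} - 1 \Big) \log e \\
&= \frac{\log e}{|Y_j|} \sum_{t \in T_j^1} \big( |Z_l \cap Y_j||X_t| - |Y_j||Z_l \cap X_t| \big) \\
&= \frac{\log e}{|Y_j|} \Big( |Z_l \cap Y_j| \sum_{t \in T_j^1} |X_t| - |Y_j||Z_l \cap Y_j| \Big) \\
&= \frac{\log e}{|Y_j|} \Big( |Z_l \cap Y_j| \big( |Y_j| - \sum_{t \in T_j - T_j^1} |X_t| \big) - |Y_j||Z_l \cap Y_j| \Big) \le 0.
\end{aligned}
$$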
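The inequality of Theorem 5 can be checked numerically. The following is only a sketch on made-up block sizes: `z_counts[t]` stands for |Z_l ∩ X_t| and `x_sizes[t]` for |X_t|, both hypothetical inputs, and natural logarithms are assumed.

```python
from math import log


def xlog_ratio(a, b):
    """a * log(a / b) with the convention 0 log 0 = 0."""
    return 0.0 if a == 0 else a * log(a / b)


def theorem5_sides(z_counts, x_sizes):
    """Return (lhs, rhs) of Theorem 5 for one Y_j that is the union of blocks X_t."""
    lhs = xlog_ratio(sum(z_counts), sum(x_sizes))  # |Z_l ∩ Y_j| log(|Z_l ∩ Y_j| / |Y_j|)
    rhs = sum(xlog_ratio(z, x) for z, x in zip(z_counts, x_sizes))
    return lhs, rhs


# Unequal ratios |Z_l ∩ X_t| / |X_t|: the inequality is strict.
lhs, rhs = theorem5_sides([1, 3, 0], [2, 3, 4])
print(lhs <= rhs)

# Equal ratios (1/2 in every block): both sides coincide.
lhs_eq, rhs_eq = theorem5_sides([1, 2], [2, 4])
print(abs(lhs_eq - rhs_eq) < 1e-12)
```

The second call illustrates the equality case that Theorem 6 and Theorem 7 rely on: the two sides agree when every block X_t has the same proportion of Z_l as Y_j.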
Theorem 6. H(d|AT ) = H(d|A) if and only if

$$|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} = \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|},$$

for each 1 ≤ l ≤ k and 1 ≤ j ≤ m, where $Y_j = \cup_{t \in T_j} X_t$, 1 ≤ j ≤ m, and $T_j$ is an index set.

Proof.

$$H(d|AT) = -\sum_{i=1}^{n} \Big( p(X_i) \cdot \sum_{l=1}^{k} p(Z_l|X_i) \log\big(p(Z_l|X_i)\big) \Big) = -\frac{1}{|U|} \sum_{l=1}^{k} \sum_{i=1}^{n} |Z_l \cap X_i| \log\frac{|Z_l \cap X_i|}{|X_i|},$$

$$H(d|A) = -\sum_{j=1}^{m} \Big( p(Y_j) \cdot \sum_{l=1}^{k} p(Z_l|Y_j) \log\big(p(Z_l|Y_j)\big) \Big) = -\frac{1}{|U|} \sum_{l=1}^{k} \sum_{j=1}^{m} |Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|}.$$

The sufficiency is trivial because each A equivalence class is just a union of some AT equivalence classes. Necessity: For each 1 ≤ l ≤ k and 1 ≤ j ≤ m, by Theorem 5,

$$|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} \le \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|},$$

and hence

$$\sum_{j=1}^{m} |Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} \le \sum_{i=1}^{n} |Z_l \cap X_i| \log\frac{|Z_l \cap X_i|}{|X_i|}.$$

If there exist 1 ≤ l ≤ k and 1 ≤ j ≤ m such that

$$|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} < \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|},$$

then, consequently,

$$\sum_{j=1}^{m} |Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} < \sum_{i=1}^{n} |Z_l \cap X_i| \log\frac{|Z_l \cap X_i|}{|X_i|},$$

and hence H(d|AT ) < H(d|A), a contradiction.

Theorem 7. A ⊆ AT is a distribution reduction of S if and only if A is an entropy reduction of S.
Proof. We need only prove that A is a distribution consistent set if and only if A is an entropy consistent set.

Sufficiency: Assume that A is a distribution consistent set. For each 1 ≤ l ≤ k and 1 ≤ j ≤ m, let $Y_j = [x]_A$. We notice that $J_{x,A} = \{[y]_{AT} ; [y]_{AT} \subseteq [x]_A\}$ is a partition of $[x]_A$. By $|Z_l \cap [x]_A| = \sum_{[y]_{AT} \in J_{x,A}} |Z_l \cap [y]_{AT}|$, it follows that

$$\frac{|Z_l \cap [y]_{AT}|}{|[y]_{AT}|} = \frac{|Z_l \cap [y]_A|}{|[y]_A|} = \frac{|Z_l \cap [x]_A|}{|[x]_A|},$$

for each $[y]_{AT} \in J_{x,A}$ and hence

$$|Z_l \cap [x]_A| \log\frac{|Z_l \cap [x]_A|}{|[x]_A|} = \sum_{[y]_{AT} \in J_{x,A}} |Z_l \cap [y]_{AT}| \log\frac{|Z_l \cap [y]_{AT}|}{|[y]_{AT}|}.$$

It follows by Theorem 6 that H(d|AT ) = H(d|A) and A is an entropy consistent set.

Necessity: Assume that A is an entropy consistent set. For each x ∈ U , $J_{x,A} = \{[y]_{AT} ; [y]_{AT} \subseteq [x]_A\}$ forms a partition of $[x]_A$. Let $J = J_{x,A} - \{[x]_{AT}\}$. For each 1 ≤ l ≤ k, by Theorem 6, it follows that

$$|Z_l \cap [x]_A| \log\frac{|Z_l \cap [x]_A|}{|[x]_A|} = \sum_{[y]_{AT} \in J_{x,A}} |Z_l \cap [y]_{AT}| \log\frac{|Z_l \cap [y]_{AT}|}{|[y]_{AT}|},$$

and hence

$$\sum_{[y]_{AT} \in J_{x,A}} |Z_l \cap [y]_{AT}| \log\frac{|Z_l \cap [x]_A|}{|[x]_A|} = \sum_{[y]_{AT} \in J_{x,A}} |Z_l \cap [y]_{AT}| \log\frac{|Z_l \cap [y]_{AT}|}{|[y]_{AT}|},$$

$$\sum_{[y]_{AT} \in J_{x,A}} |Z_l \cap [y]_{AT}| \log\frac{|Z_l \cap [x]_A||[y]_{AT}|}{|[x]_A||Z_l \cap [y]_{AT}|} = 0,$$

that is,

$$
\begin{aligned}
&-|Z_l \cap [x]_{AT}| \log\frac{|Z_l \cap [x]_A||[x]_{AT}|}{|[x]_A||Z_l \cap [x]_{AT}|} \\
&= \sum_{[y]_{AT} \in J} |Z_l \cap [y]_{AT}| \log\frac{|Z_l \cap [x]_A||[y]_{AT}|}{|[x]_A||Z_l \cap [y]_{AT}|} \\
&\le \sum_{[y]_{AT} \in J} |Z_l \cap [y]_{AT}| \Big( \frac{|Z_l \cap [x]_A||[y]_{AT}|}{|[x]_A||Z_l \cap [y]_{AT}|} - 1 \Big) \log e \\
&= \frac{\log e}{|[x]_A|} \sum_{[y]_{AT} \in J} \big( |Z_l \cap [x]_A||[y]_{AT}| - |[x]_A||Z_l \cap [y]_{AT}| \big) \\
&= \frac{\log e}{|[x]_A|} \Big( |Z_l \cap [x]_A| \big( |[x]_A| - |[x]_{AT}| \big) - |[x]_A| \big( |Z_l \cap [x]_A| - |Z_l \cap [x]_{AT}| \big) \Big) \\
&= \frac{\log e}{|[x]_A|} \big( |[x]_A||Z_l \cap [x]_{AT}| - |[x]_{AT}||Z_l \cap [x]_A| \big).
\end{aligned}
$$
It follows that

$$\ln\frac{|Z_l \cap [x]_A||[x]_{AT}|}{|[x]_A||Z_l \cap [x]_{AT}|} \ge \frac{|Z_l \cap [x]_A||[x]_{AT}|}{|[x]_A||Z_l \cap [x]_{AT}|} - 1.$$

By $\ln a \le a - 1$ for each a > 0,

$$\ln\frac{|Z_l \cap [x]_A||[x]_{AT}|}{|[x]_A||Z_l \cap [x]_{AT}|} = \frac{|Z_l \cap [x]_A||[x]_{AT}|}{|[x]_A||Z_l \cap [x]_{AT}|} - 1,$$

and hence

$$\frac{|Z_l \cap [x]_A||[x]_{AT}|}{|[x]_A||Z_l \cap [x]_{AT}|} = 1,$$

because a = 1 is the unique root of $\ln a = a - 1$, that is

$$\frac{|[x]_{AT} \cap Z_l|}{|[x]_{AT}|} = \frac{|[x]_A \cap Z_l|}{|[x]_A|}$$

and A is a distribution consistent set.

Theorem 8. A is an entropy consistent set if and only if

$$\frac{|[x]_A \cap [x]_d|}{|[x]_A|} = \frac{|[x]_{AT} \cap [x]_d|}{|[x]_{AT}|}$$

for each x ∈ U .

Proof. By Theorem 7, the necessity is trivial. Sufficiency: We prove that

$$|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} = \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|},$$

for any 1 ≤ l ≤ k and 1 ≤ j ≤ m and finish the proof by Theorem 6, where $Y_j = \cup_{t \in T_j} X_t$. If $Z_l \cap Y_j = \emptyset$, then $Z_l \cap X_t = \emptyset$ for each $t \in T_j$ and the conclusion holds. If $Z_l \cap Y_j \neq \emptyset$, suppose that $x \in Z_l \cap Y_j$; it follows that $Z_l = [x]_d$ and $Y_j = [x]_A$. Let $J_{x,A} = \{[z]_{AT} ; [z]_{AT} \subseteq [x]_A\}$. Consequently,

$$
\begin{aligned}
&|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} - \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|} \\
&= |[x]_d \cap [x]_A| \log\frac{|[x]_d \cap [x]_A|}{|[x]_A|} - \sum_{[z]_{AT} \in J_{x,A}} |[x]_d \cap [z]_{AT}| \log\frac{|[x]_d \cap [z]_{AT}|}{|[z]_{AT}|} \\
&= \sum_{[z]_{AT} \in J_{x,A}} |[x]_d \cap [z]_{AT}| \log\frac{|[x]_d \cap [x]_A|}{|[x]_A|} - \sum_{[z]_{AT} \in J_{x,A}} |[x]_d \cap [z]_{AT}| \log\frac{|[x]_d \cap [z]_{AT}|}{|[z]_{AT}|} \\
&= \sum_{[z]_{AT} \in J_{x,A}} |[x]_d \cap [z]_{AT}| \log\frac{|[x]_d \cap [x]_A||[z]_{AT}|}{|[x]_A||[x]_d \cap [z]_{AT}|}.
\end{aligned}
$$
Assume that $u \in [x]_d \cap [z]_{AT}$; it follows that $u \in [x]_A$ and hence $[x]_d = [u]_d$, $[z]_{AT} = [u]_{AT}$ and $[x]_A = [u]_A$. Consequently,

$$\frac{|[x]_d \cap [x]_A||[z]_{AT}|}{|[x]_A||[x]_d \cap [z]_{AT}|} = \frac{|[u]_d \cap [u]_A||[u]_{AT}|}{|[u]_A||[u]_d \cap [u]_{AT}|} = 1,$$

and hence

$$|Z_l \cap Y_j| \log\frac{|Z_l \cap Y_j|}{|Y_j|} = \sum_{t \in T_j} |Z_l \cap X_t| \log\frac{|Z_l \cap X_t|}{|X_t|}.$$

Theorem 9. Let S = (U, AT ∪ {d}, V, f ) be an information system and A ⊆ AT . If A is a distribute consistent set of S, then A is a d consistent set of S.

Proof. Let A ⊆ AT be a distribute consistent set of S and $U/d = \{D_1, D_2, \cdots, D_r\}$. For each $x \in pos_{AT}(d)$, it follows that $[x]_{AT} \subseteq [x]_d$ and hence $\delta_{AT}(x) = \{D_j ; D_j \cap [x]_{AT} \neq \emptyset\} = \{[x]_d\} = \delta_A(x)$. Assume that $[x]_d = D_j$; it follows that $D_l \cap [x]_A = \emptyset$ for each l ≤ r, l ≠ j, that is $[x]_A \subseteq D_j = [x]_d$ and $x \in \underline{A}([x]_d) \subseteq pos_A(d)$. It follows that $pos_{AT}(d) \subseteq pos_A(d)$. The inclusion $pos_A(d) \subseteq pos_{AT}(d)$ is trivial.
4
Knowledge Reduction for Consistent Information Systems
An information system S = (U, AT ∪ {d}, V, f ) is called consistent if $[x]_{AT} \subseteq [x]_d$ for each x ∈ U . In this section, we discuss knowledge reductions for consistent information systems. We will prove that the concepts of distribution reduction, approximate reduction and d reduction are equivalent for consistent information systems.

Theorem 10. Let S = (U, AT ∪ {d}, V, f ) be an information system. S is consistent if and only if $pos_{AT}(d) = U$.

Proof. If S is consistent, then $[x]_{AT} \subseteq [x]_d$ for each x ∈ U and hence $x \in \underline{AT}([x]_d) \subseteq \cup_{X \in U/d} \underline{AT}(X) = pos_{AT}(d)$, that is $pos_{AT}(d) = U$. If $pos_{AT}(d) = U$, then $x \in pos_{AT}(d)$ for each x ∈ U and hence $x \in \underline{AT}([x]_d)$, that is $[x]_{AT} \subseteq [x]_d$.

Theorem 11. Let S = (U, AT ∪ {d}, V, f ) be an information system. S is consistent if and only if $\delta_{AT}(x) = \{[x]_d\}$ for each x ∈ U .

Proof. If S is consistent, then $[x]_{AT} \subseteq [x]_d$ for each x ∈ U and hence $\delta_{AT}(x) = \{[x]_d\}$. If $\delta_{AT}(x) = \{[x]_d\}$ for each x ∈ U , then $[x]_{AT} \cap [y]_d = \emptyset$ for each $[y]_d \neq [x]_d$, that is $[x]_{AT} \subseteq [x]_d$ and S is consistent.

Theorem 12. Let S = (U, AT ∪ {d}, V, f ) be an information system. S is consistent if and only if H(d|AT ) = 0.
Proof. Assume that $U/AT = \{X_1, X_2, \cdots, X_n\}$ and $U/d = \{Y_1, Y_2, \cdots, Y_m\}$. It follows that

$$H(d|AT) = -\sum_{i=1}^{n} \Big( p(X_i) \cdot \sum_{j=1}^{m} p(Y_j|X_i) \log\big(p(Y_j|X_i)\big) \Big) = -\sum_{i=1}^{n} \sum_{j=1}^{m} \frac{|Y_j \cap X_i|}{|U|} \log\frac{|Y_j \cap X_i|}{|X_i|}.$$

If S is consistent, then for each i (1 ≤ i ≤ n), there exists a unique j (1 ≤ j ≤ m) such that $X_i \subseteq Y_j$, and hence $\frac{|Y_j \cap X_i|}{|X_i|} = 1$ or $\frac{|Y_j \cap X_i|}{|X_i|} = 0$; consequently, H(d|AT ) = 0. If H(d|AT ) = 0, then

$$-\sum_{i=1}^{n} \sum_{j=1}^{m} \frac{|Y_j \cap X_i|}{|U|} \log\frac{|Y_j \cap X_i|}{|X_i|} = 0.$$

Since every term of the double sum is non-positive, it follows that

$$\sum_{j=1}^{m} \frac{|Y_j \cap X_i|}{|U|} \log\frac{|Y_j \cap X_i|}{|X_i|} = 0$$

for each i (1 ≤ i ≤ n), that is, there exists j (1 ≤ j ≤ m) such that $X_i \subseteq Y_j$, and S is consistent.

Theorem 13. Let S = (U, AT ∪ {d}, V, f ) be a consistent information system and A ⊆ AT .
(1) A is an entropy consistent set if and only if S = (U, A ∪ {d}, V, f ) is consistent.
(2) A is an approximate consistent set if and only if S = (U, A ∪ {d}, V, f ) is consistent.
(3) A is a positive domain consistent set if and only if S = (U, A ∪ {d}, V, f ) is consistent.

By this theorem, for consistent information systems, the concepts of distribution reduction, entropy reduction, maximum distribution reduction, distribute reduction, approximate reduction and d reduction are all equivalent.
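Theorem 12 and part (1) of Theorem 13 can be checked on a small example. The sketch below uses a hypothetical consistent decision table with AT = {a, b, c}; the helper names and the base-2 logarithm are assumptions made for illustration only.

```python
from collections import defaultdict
from math import log2


def blocks(rows, attrs):
    """Equivalence classes of the rows with respect to attrs."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row)
    return list(groups.values())


def is_consistent(rows, attrs, d):
    """[x]_A ⊆ [x]_d for every x: each A-class carries a single decision value."""
    return all(len({r[d] for r in blk}) == 1 for blk in blocks(rows, attrs))


def cond_entropy(rows, attrs, d):
    """H(d|A) over the A-classes, with 0 log 0 = 0."""
    n, h = len(rows), 0.0
    for blk in blocks(rows, attrs):
        for zl in blocks(blk, [d]):  # non-empty intersections Z_l ∩ Y_j only
            p = len(zl) / len(blk)
            h -= (len(blk) / n) * p * log2(p)
    return h


# Hypothetical consistent table: all AT-classes are singletons.
table = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
for subset in (["a", "b", "c"], ["a", "b"], ["a"], ["b"], ["c"]):
    print(subset, is_consistent(table, subset, "d"), cond_entropy(table, subset, "d"))
```

For every subset printed, the entropy is zero exactly when the reduced system (U, A ∪ {d}) is consistent, which is the behaviour the two theorems describe.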
Acknowledgements The authors are grateful to the referees for their valuable comments and suggestions. This work has been supported by the National Natural Science Foundation of China (Grant No. 60474022).
References
1. Pawlak, Z.: Rough Sets. International Journal of Computer and Information Science. 11 (1982) 341–356
2. Pawlak, Z. (ed.): Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Boston (1991)
3. Marczewski.: A General Scheme of Independence in Mathematics. Bulletin de L Academie Polonaise des Sciences–Serie des Sciences Mathematiques Astronomiques et Physiques. 6 (1958) 731–736
4. Wang, G, Y., Yu, H., Yang, D, C.: Decision Table Reduction Based on Conditional Information Entropy. Chinese Journal of Computers (in Chinese). 25 (2002) 759–766
5. Zhang, W, X., Mi, J, S., Wu, W, Z.: Knowledge Reductions in Inconsistent Information Systems. Chinese Journal of Computers (in Chinese). 26 (2003) 12–18
6. Skowron, A., Rauszer, C.: The Discernibility Matrices And Functions in Information System. Intelligent Decision Support Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers, Dordrecht (1992)
7. Kryszkiewicz.: Comparative Study of Alternative Type of Knowledge Reduction in Inconsistent Systems. International Journal of General Systems. 16 (2001) 105–120
8. Beynon, M.: Reducts within The Variable Precision Rough Set Model: A Further Investigation. European Journal of Operational Research. 134 (2001) 592–605
9. Quafatou, M.: RST: A Generalization of Rough Set Theory. Information Sciences. 124 (2000) 301–316
10. Zheng, P., Keyun, Q.: Obtaining Decision Rules And Combining Evidence Based on Modal Logic. Progress in Natural Science (in Chinese). 14 (2004) 501–508
11. Zheng, P., Keyun, Q.: Intuitionistic Special Set Expression of Rough Set And Its Application in Reduction of Attributes. Pattern Recognition and Artificial Intelligence (in Chinese). 17 (2004) 262–266
12. Slowinski, R., Zopounidis, Dimitras, A. I.: Prediction of Company Acquisition in Greece by Means of The Rough Set Approach. European Journal of Operational Research. 100 (1997) 1–15
13. Jusheng, M., Weizhi, W., Wenxiu, Z.: Approaches to Knowledge Reduction Based on Variable Precision Rough Set Model. Information Sciences. 159 (2004) 255–272
Rough Approximation of a Preference Relation for Stochastic Multi-attribute Decision Problems Chaoyuan Yue, Shengbao Yao, Peng Zhang, and Wanan Cui Department of Control Science and Engineering, Huazhong University of Science and Technology,Wuhan, Hubei, 430074, China
Abstract. Multi-attribute decision problems where the performances of the alternatives are random variables are considered in this paper. The suggested approach grades the probabilities of preference of one alternative over another with respect to the same attribute. Based on the graded probabilistic dominance relation, the pairwise comparison information table is defined. The global preferences of the decision maker can be seen as a rough binary relation. The present paper proposes to approximate this preference relation by means of the graded probabilistic dominance relation with respect to the subsets of attributes.
1
Introduction
Multi-attribute decision making (MADM) is widely applied in many fields such as military affairs, economy and management. When dealing with MADM, one is usually confronted with uncertainty. We suppose in this paper that the uncertainty arises because the performance evaluations of the alternatives on each attribute are random variables with known probability distributions. Problems of this kind are called stochastic multi-attribute decision making problems. Based on the results in [1]-[3], this short paper presents a method to solve such problems. In our approach, we define a dominance relation by grading the probabilities of preference of one alternative over another with respect to the same attribute. The global preference of the DM is approximated by means of the graded probabilistic dominance relation with respect to the subsets of attributes. This paper is organized as follows. In the next section the graded probabilistic dominance relation with respect to an attribute is introduced, and the global preference is approximated by means of this relation. Section 3 is devoted to the generation of decision rules and Section 4 concludes the paper.
2
Rough Approximation of a Preference Relation
The multi-attribute problem considered in this paper can be represented by an <A, Q, E> model, where A is a finite set of potential alternatives $a_i$ (i = 1, 2, ..., n), $Q = \{q_1, q_2, \cdots, q_m\}$ is a finite set of attributes and E is the set of evaluations $X_{ik}$ expressed by the probability density function $f_{ik}(x_{ik})$, which describes the performance of alternative $a_i$ with respect to the attribute $q_k$.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1242–1245, 2005. © Springer-Verlag Berlin Heidelberg 2005

For any alternatives $a_i$ and $a_j$ in A, the possibility of preference of $a_i$ over $a_j$ with respect to $q_k$ can be quantified by the probability $p_{ij}^k = p(X_{ik} \ge X_{jk})$. Supposing that the random variables $X_{ik}$ and $X_{jk}$ are independent, we have

$$p_{ij}^k = \iint_{x_{ik} \ge x_{jk}} f_{ij}^k(x_{ik}, x_{jk})\, dx_{ik}\, dx_{jk} = \iint_{x_{ik} \ge x_{jk}} f_{ik}(x_{ik}) f_{jk}(x_{jk})\, dx_{ik}\, dx_{jk},$$

where $f_{ij}^k(x_{ik}, x_{jk})$ is the joint probability density function of the random variables $X_{ik}$ and $X_{jk}$. Notice that $0 \le p_{ij}^k \le 1$ and $p_{ij}^k + p_{ji}^k = 1$ hold. $p_{ij}^k$ measures the strength of preference of $a_i$ over $a_j$ with respect to $q_k$. In order to distinguish the strength, we propose to grade the preference according to the value of $p_{ij}^k$. The following set I of intervals is defined:

I = {[0, 0.15], (0.15, 0.3], (0.3, 0.45], (0.45, 0.55), [0.55, 0.7), [0.7, 0.85), [0.85, 1.0]}

In terms of the above partition, we define the set $T_{q_k}$ of binary graded probabilistic dominance relations on A:

$$T_{q_k} = \{D_{q_k}^h, h \in H\},$$

where $q_k \in Q$ and H = {−3, −2, −1, 0, 1, 2, 3}. The degrees h in the relation $a_i D_{q_k}^h a_j$ correspond one to one, in increasing order, to the intervals of I that $p_{ij}^k$ may belong to. Due to $p_{ij}^k + p_{ji}^k = 1$, for all $a_i, a_j \in A$ we have $a_i D_{q_k}^h a_j \iff a_j D_{q_k}^{-h} a_i$.

In order to represent the preferential information provided by the DM, we shall use the pairwise comparison table (PCT) introduced by Greco et al. [2]. The preferential information concerns a set B ⊂ A of so-called reference actions, with respect to which the DM is willing to express his/her attitude through pairwise comparisons. Let Q be the set of attributes (condition attributes) describing the alternatives, and D the decision attribute. The decision table is defined as the 4-tuple $T = \langle U, Q \cup D, V_Q \cup V_D, g \rangle$, where U ⊆ B × B is a finite set of pairs of alternatives, $V_Q = \bigcup_{q_k \in Q} V_{q_k}$ is the domain of the condition attributes, $V_D$ is the domain of the decision attribute, and $g : U \times (Q \cup D) \to V_Q \cup V_D$ is a total function. This function is such that:
(1) $g[(a_i, a_j), q] = h$, if $a_i D_q^h a_j$, q ∈ Q, $(a_i, a_j) \in U$;
(2) $g[(a_i, a_j), D] = P$, if $a_i$ is preferred to $a_j$, $(a_i, a_j) \in U$;
(3) $g[(a_i, a_j), D] = N$, if $a_i$ is not preferred to $a_j$, $(a_i, a_j) \in U$,
where "$a_i$ is preferred to $a_j$" means "$a_i$ is at least as good as $a_j$". Generally, the decision table can be presented as in Table 1.

Table 1. Pairwise comparison table.

           | q1               | q2               | ··· | qm               | D
(ai , aj ) | g[(ai , aj ), q1 ] | g[(ai , aj ), q2 ] | ··· | g[(ai , aj ), qm ] | g[(ai , aj ), D] = P
···        | ···              | ···              | ··· | ···              | ···
(as , at ) | g[(as , at ), q1 ] | g[(as , at ), q2 ] | ··· | g[(as , at ), qm ] | g[(as , at ), D] = N
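The grading of preference probabilities can be sketched as follows. The paper does not fix a distribution family; the code assumes, purely for illustration, that the evaluations are independent normal random variables, in which case p(X_i ≥ X_j) has a closed form. The function names and the numeric parameters are hypothetical.

```python
from math import erf, sqrt


def prob_prefer(mu_i, sigma_i, mu_j, sigma_j):
    """p(X_i >= X_j) for independent normal X_i, X_j (normality is an assumption)."""
    z = (mu_i - mu_j) / sqrt(sigma_i ** 2 + sigma_j ** 2)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z


def grade(p):
    """Map p to the degree h in H = {-3, ..., 3} following the interval set I."""
    if p <= 0.15:
        return -3
    if p <= 0.30:
        return -2
    if p <= 0.45:
        return -1
    if p < 0.55:
        return 0
    if p < 0.70:
        return 1
    if p < 0.85:
        return 2
    return 3


# Hypothetical evaluations of two alternatives on one attribute.
p_ij = prob_prefer(10.5, 1.0, 10.0, 1.0)
p_ji = prob_prefer(10.0, 1.0, 10.5, 1.0)
print(grade(p_ij), grade(p_ji))
```

Because p_ij + p_ji = 1 and the intervals of I are symmetric around 0.5, the two printed degrees are opposite, which matches the equivalence a_i D^h a_j ⟺ a_j D^{-h} a_i.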
The binary relation P defined on A is the comprehensive preference relation of the DM. In this paper, it is supposed that P is a complete and antisymmetric binary relation on B, i.e., 1) for all $a_i, a_j \in B$, $a_i P a_j$ and/or $a_j P a_i$; 2) $a_i P a_j$ and $a_j P a_i$ together imply $a_i = a_j$. In order to approximate the global preference of the DM, the following dominance relation with respect to a subset of the condition attributes is defined:

Definition 3.1. Given $a_i, a_j \in A$, $a_i$ positively dominates $a_j$ by degree h with respect to the set of attributes R, denoted by $a_i D_{+R}^h a_j$, if and only if $a_i D_{q_k}^f a_j$ with f ≥ h, for all $q_k \in R$; $a_i$ negatively dominates $a_j$ by degree h with respect to the set of attributes R, denoted by $a_i D_{-R}^h a_j$, if and only if $a_i D_{q_k}^f a_j$ with f ≤ h, for all $q_k \in R$.

The dominance relations defined above satisfy the following property:

Property 3.1. If $(a_i, a_j) \in D_{+R}^h$, then $(a_i, a_j) \in D_{+S}^k$ for each S ⊆ R, k ≤ h; if $(a_i, a_j) \in D_{-R}^h$, then $(a_i, a_j) \in D_{-S}^k$ for each S ⊆ R, k ≥ h.

In the suggested approach, we propose to approximate the global preference relation by the $D_{+R}^h$ and $D_{-R}^h$ dominance relations. Therefore, P is seen as a rough binary relation. The lower approximations of preferences, denoted by $R_*(P)$ and $R_*(N)$, and the upper approximations of preferences, denoted by $R^*(P)$ and $R^*(N)$, are respectively defined as:

$$R_*(P) = \bigcup_{h \in H} \{(D_{+R}^h) \cap U \subseteq P\}, \qquad R^*(P) = \bigcap_{h \in H} \{(D_{+R}^h) \cap U \supseteq P\};$$

$$R_*(N) = \bigcup_{h \in H} \{(D_{-R}^h) \cap U \subseteq N\}, \qquad R^*(N) = \bigcap_{h \in H} \{(D_{-R}^h) \cap U \supseteq N\}.$$

Let

$$h_{\min}^+(R) = \min\{h \in H : (D_{+R}^h) \cap U \subseteq P\}, \qquad h_{\max}^+(R) = \max\{h \in H : (D_{+R}^h) \cap U \supseteq P\},$$

$$h_{\min}^-(R) = \min\{h \in H : (D_{-R}^h) \cap U \supseteq N\}, \qquad h_{\max}^-(R) = \max\{h \in H : (D_{-R}^h) \cap U \subseteq N\}.$$

According to Property 3.1 and the definitions of the approximations of the preferences, the following conclusion can be obtained.

Theorem 3.1.

$$R_*(P) = D_{+R}^{h_{\min}^+(R)} \cap U, \qquad R^*(P) = D_{+R}^{h_{\max}^+(R)} \cap U;$$

$$R_*(N) = D_{-R}^{h_{\max}^-(R)} \cap U, \qquad R^*(N) = D_{-R}^{h_{\min}^-(R)} \cap U.$$

3
Decision Rules
The rough approximations of preferences can serve to induce a generalized description of the alternatives contained in the information table in terms of "if ..., then ..." decision rules. We will consider the following two kinds of decision rules:
1. If $a_i D_{+R}^h a_j$, then $a_i P a_j$, denoted by $a_i D_{+R}^h a_j \Longrightarrow a_i P a_j$;
2. If $a_i D_{-R}^h a_j$, then $a_i N a_j$, denoted by $a_i D_{-R}^h a_j \Longrightarrow a_i N a_j$.
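The dominance relations and the four approximations can be computed directly from a pairwise comparison table. The sketch below is an illustration only: the PCT is stored as a hypothetical dictionary mapping each pair to its degrees and its decision, and the alternatives `a1`, `a2`, `a3` and attributes `q1`, `q2` are made up.

```python
H = range(-3, 4)  # degrees h in H = {-3, ..., 3}


def d_plus(pct, R, h):
    """Pairs of U in D^h_{+R}: every attribute in R has degree f >= h."""
    return {pair for pair, (deg, _) in pct.items() if all(deg[q] >= h for q in R)}


def d_minus(pct, R, h):
    """Pairs of U in D^h_{-R}: every attribute in R has degree f <= h."""
    return {pair for pair, (deg, _) in pct.items() if all(deg[q] <= h for q in R)}


def approximations(pct, R):
    """Lower/upper approximations of P and N as unions/intersections over h."""
    P = {pair for pair, (_, dec) in pct.items() if dec == "P"}
    N = set(pct) - P
    plus = [d_plus(pct, R, h) for h in H]
    minus = [d_minus(pct, R, h) for h in H]
    lower_P = set().union(*[s for s in plus if s <= P])
    upper_P = set(pct).intersection(*[s for s in plus if s >= P])
    lower_N = set().union(*[s for s in minus if s <= N])
    upper_N = set(pct).intersection(*[s for s in minus if s >= N])
    return lower_P, upper_P, lower_N, upper_N


# Hypothetical PCT over B = {a1, a2, a3}; degrees satisfy g(ai, aj) = -g(aj, ai).
pct = {
    ("a1", "a2"): ({"q1": 2, "q2": 1}, "P"),
    ("a2", "a1"): ({"q1": -2, "q2": -1}, "N"),
    ("a1", "a3"): ({"q1": 1, "q2": -1}, "P"),
    ("a3", "a1"): ({"q1": -1, "q2": 1}, "N"),
    ("a2", "a3"): ({"q1": -1, "q2": -2}, "N"),
    ("a3", "a2"): ({"q1": 1, "q2": 2}, "P"),
}
lower_P, upper_P, lower_N, upper_N = approximations(pct, ["q1", "q2"])
print(sorted(lower_P), sorted(upper_P))
```

On this toy table the lower approximation of P coincides with D^0_{+R} ∩ U, the relation at the smallest degree whose pairs are all in P, in line with Theorem 3.1; each pair in a lower approximation supports a rule of the corresponding kind.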
Definition 4.1. If there is at least one pair $(a_u, a_v) \in D_{+R}^h \cap U$ [$D_{-R}^h \cap U$] such that $a_u P a_v$ [$a_u N a_v$], and $a_s P a_t$ [$a_s N a_t$] holds for each pair $(a_s, a_t) \in D_{+R}^h \cap U$ [$D_{-R}^h \cap U$], then $a_i D_{+R}^h a_j \Longrightarrow a_i P a_j$ [$a_i D_{-R}^h a_j \Longrightarrow a_i N a_j$] is accepted as a $D_{++}$-decision rule [$D_{--}$-decision rule].

Definition 4.2. A $D_{++}$-decision rule $a_i D_{+R}^h a_j \Longrightarrow a_i P a_j$ will be called minimal if there is not any other rule $a_i D_{+S}^k a_j \Longrightarrow a_i P a_j$ such that S ⊆ R, k ≤ h; a $D_{--}$-decision rule $a_i D_{-R}^h a_j \Longrightarrow a_i N a_j$ will be called minimal if there is not any other rule $a_i D_{-S}^k a_j \Longrightarrow a_i N a_j$ such that S ⊆ R, k ≥ h.

The following Theorem 4.1 expresses the relationships between decision rules and the approximations of preferences P and N . Both Theorem 4.1 and Theorem 4.2 can be useful for the induction of the decision rules.

Theorem 4.1.
(1) If $a_i D_{+R}^h a_j \Longrightarrow a_i P a_j$ is a minimal $D_{++}$-decision rule, then $R_*(P) = D_{+R}^h \cap U$;
(2) If $a_i D_{-R}^h a_j \Longrightarrow a_i N a_j$ is a minimal $D_{--}$-decision rule, then $R_*(N) = D_{-R}^h \cap U$.

Theorem 4.2. Assuming U = B × B with B ⊆ A, the following conclusions hold:
(1) If $a_i D_{+R}^h a_j \Longrightarrow a_i P a_j$ is a minimal $D_{++}$-decision rule and h > 0, then $a_i D_{-R}^{-h} a_j \Longrightarrow a_i N a_j$ is a minimal $D_{--}$-decision rule;
(2) If $a_i D_{+R}^h a_j \Longrightarrow a_i P a_j$ is a minimal $D_{++}$-decision rule and h ≤ 1, then $a_i D_{-R}^{-1} a_j \Longrightarrow a_i N a_j$ is a minimal $D_{--}$-decision rule.

4
Conclusion
In this paper a new rough set method for stochastic multi-attribute decision problems was presented. It is based on the idea of approximating a preference relation represented in a PCT by graded probabilistic dominance relations. This methodology supplies some very meaningful ”if..., then...” decision rules, which synthesize the preferential information given by the DM and can be suitably applied to obtain a recommendation for the choice or ranking problem. Further research will tend to refine this approach for application.
References
1. Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)
2. Greco, S., Matarazzo, B., Slowinski R.: Rough approximation of a preference relation by dominance relations. European Journal of Operational Research. 117 (1999) 63-83
3. Zaras, K.: Rough approximation of a preference relation by a multi-attribute dominance for deterministic, stochastic and fuzzy decision problems. European Journal of Operational Research. 159 (2004) 196-206
Incremental Target Recognition Algorithm Based on Improved Discernibility Matrix Liu Yong, Xu Congfu, Yan Zhiyong, and Pan Yunhe College of Computer Science, Zhejiang University, Hangzhou 310027, China
Abstract. An incremental target recognition algorithm based on the improved discernibility matrix in rough set theory is presented. Comparative experiments have been carried out in our "Information Fusion System for Communication Interception Information (IFS/CI2 )". The experimental results show that the new algorithm is more efficient than the previous algorithm.
1
Introduction
In communication interception, it is difficult to recognize kinematic targets, such as a communication broadcast station and its embarked platform, accurately and efficiently. The traditional target recognition approach is based on probabilities [1]. As a purely mathematical method, it obtains its results through complex calculation. In this paper, we provide a new approach to target recognition based on data mining. In our Information Fusion System for Communication Interception Information (IFS/CI2 ) [2], the system obtains the identity, attributes, location and so on, analyzes the communication parameters, the characteristics of the communication, neighbor information, manual information etc., and then evaluates the battlefield situation and the threat level. Target recognition can therefore be regarded as a data mining process: decision rules for targets are mined from the attributes of the previous raw data, such as the location of the targets and the broadcast frequency of the targets, and the target type of newly arriving data items is then deduced from the previous mining results, which are normally described as decision rules.

IFS/CI2 adopts a hierarchical fusion model, which is divided into two parts: the junior associate module and the senior fusion module. The former obtains the type of the corresponding station, its number, platform, network station and so on, by associating the communication interception data in time and location; the latter obtains the network station attributes, the identity of the station, the types and deployment of arms and so on, and then evaluates the battlefield situation and the threat level. Since Rough Set theory [3] is effective in data classification, it works as an important tool in the junior associate module of IFS/CI2 . The literature [4] briefly discusses a kinematic target recognition and tracking algorithm based on Rough Set theory, and obtains a satisfactory recognition ratio for kinematic targets.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1246–1255, 2005. © Springer-Verlag Berlin Heidelberg 2005

However, the simulation results show that the efficiency of the target recognition algorithm in [4] is not ideal. The essential reason is that the non-incremental algorithm of [4] has the following shortcomings:
(1) The complexity of the non-incremental algorithm is high. The communication interception information is incremental: the data cannot all be obtained at once but are produced in batches. When a new batch of data arrives, the non-incremental algorithm in [4] has to re-process all the data, both the historical data and the new batch, so a great deal of computation time is wasted.
(2) The non-incremental algorithm cannot make full use of the existing classification results and rules. There is no need to compute from scratch when a new batch of data arrives. In fact, the new data affect only a few classification results and rules, and only these need to be modified.

Therefore, this paper uses the improved discernibility matrix [5] to build an incremental target recognition algorithm, with a strong emphasis on improving target recognition efficiency while keeping the recognition ratio. Furthermore, IFS/CI2 adopts many subjective, experience-based rules and lacks an objective criterion for verifying their validity and accuracy; this paper introduces a confidence factor to support the quantitative evaluation of rules. The algorithm presented in this paper can process inconsistent data [6] efficiently. It can obtain both certain rules, also called consistent rules, and uncertain rules, also called inconsistent rules.
2
Incremental Algorithm for Target Recognition
This section focuses on the incremental platform recognition algorithm in IFS/CI2 . Since the principle and process of the incremental station recognition algorithm are similar to those of the platform recognition algorithm, the former will not be discussed. Other algorithms in IFS/CI2 , such as the data condense algorithm, the target network recognition algorithm and the numerical scatter algorithm, have been discussed in [2,8]. Before presenting the incremental algorithm, the confidence factor of a rule and the categories of incremental data are introduced.
2.1
Confidence Factor and Support Factor of Rule
Since the algorithm presented in this paper extracts both consistent and inconsistent rules, we must first introduce the confidence factor of rules, especially to deal with inconsistent rules. Usually, consistent rules can be denoted by:

< Condition 1|Condition 2| ... |Condition n → Conclusion >

For inconsistent rules, there may be different conclusions although the condition attributes and their combination are the same, so it is necessary to introduce a confidence factor α (0 < α ≤ 1) to distinguish the inconsistent rules. The factor α is the probability of the rule's conclusion given the same condition attributes. The notation is:

< Condition 1|Condition 2| ... |Condition n → Conclusion, α >

Evidently, the confidence factor of a consistent rule is α ≡ 1, while the confidence factor of an inconsistent rule is adjusted during the execution of the algorithm. For example, an inconsistent rule that may appear in communication countermeasures is:

< there are 4 or more stations in the platform on the ground → the platform is headquarter, 0.95 >
The confidence factor is computed on the basis of the Rough Operator [8,9] in rough set theory. The Rough Operator is the conditional probability of the decision ψ, given the prior probability of the condition ϕ. It can be defined as:

$$p(\psi) = \sum \big( p(\phi) \cdot u(\phi, \psi) \big) = \sum p(\phi \wedge \psi) \qquad (1)$$

If the data to be recognized in the database match multiple platform type recognition rules, it is necessary to improve the above formula of the Rough Operator. The improved one is:

$$p'(\psi) = \sum \big( \alpha \cdot ( p(\phi) \cdot u(\phi, \psi) ) \big) = \sum \big( \alpha \cdot p(\phi \wedge \psi) \big) \qquad (2)$$

Here, α is the confidence factor corresponding to the matched rule. We compute the improved Rough Operator of each matched rule. The larger the Rough Operator is, the higher the matching ratio of the rule, and we choose the rule with the largest Rough Operator to judge the platform type. Normally, there is some noise and distortion in the data from the sensors, which leads to wrong rules being derived from the distorted data. In our algorithm, we introduce a measure similar to that of the association rule algorithm [10], named the support factor of rules, μ, to filter out the noise rules.
2.2
Definition of Improved Discernibility Matrix and Classes of Incremental Data
In the incremental platform recognition algorithm, we adopt the improved discernibility matrix [5], which is defined as follows:

Definition 1 (Improved Discernibility Matrix). In the information system S = (U, A), where U denotes the domain and A denotes the attribute set composed of condition and decision attributes, suppose B ⊆ A, $E_i, E_j \in U/IND(B)$, i, j = 1, 2, ..., n = |U/IND(B)|; $X_k \in U/IND(D)$, where D is the decision attribute, k = 1, 2, ..., m, m = |U/IND(D)|. The improved discernibility matrix is an n × n square matrix $M_S(B) = (M_S(i, j))_{n \times n}$, 1 ≤ i, j ≤ n = |U/IND(B)|. The entry $M_S(i, j)$ is defined as follows: if $E_i \subseteq X_k$, $E_j \subseteq X_k$ and i ≠ j, then $M_S(i, j) = NULL$; else $M_S(i, j) = \{a \in B : a(E_i) \neq a(E_j)\}$, i, j = 1, 2, ..., n.

Here, NULL means that the difference between the two corresponding items can be neglected. U/IND(B) and U/IND(D) denote the classifications of the domain by the condition attributes B and the decision attribute D. From the above definition, it is obvious that the improved discernibility matrix is a symmetric matrix, and the NULL values in this matrix can be ignored when calculating the discernibility functions and the comparative discernibility functions, so the computational complexity is decreased greatly.

Furthermore, incremental platform recognition depends on the classification of incremental data. New data are classified according to the consistency of their condition and decision attributes with the existing rules. Consider the information system S = (U, A), and suppose M is the rule set, containing rules $\phi_i \to \varphi_i$, where i is an element of U , $\phi_i$ is the antecedent and $\varphi_i$ is the consequent. There are four possible cases when a new item of data is added to the information system S. They are defined as follows:

Definition 2 (CS category). The newly added datum x belongs to the CS category if and only if $\exists (\phi \to \varphi) \in M$ such that $\phi_x \to \phi$ and $\varphi_x = \varphi$.

Definition 3 (CN category). The newly added datum x belongs to the CN category if and only if $\forall (\phi \to \varphi) \in M$, $\varphi_x \neq \varphi$.

Definition 4 (CC category). The newly added datum x belongs to the CC category if and only if x does not belong to the CN category, and some y ∈ U satisfies $\phi_x \equiv \phi_y$ and $\varphi_x \neq \varphi_y$.

Definition 5 (PC category). The newly added datum x belongs to the PC category if and only if x does not belong to the CN category, and y ∈ U satisfies φx = φy .
2.3
Incremental Platform Recognition Algorithm
The raw data are incremental, and the previous algorithm [4] must process all the data from scratch, so it is difficult for it to deal with large volumes of data in real time. A platform recognition algorithm that can process incremental data in real time is therefore required. The incremental platform recognition algorithm gives each platform recognition rule a confidence factor. If several rules match a new data item, we compute the improved Rough Operator by formula (2) and select the rule with the largest Rough Operator for recognition. At the same time, the improved discernibility matrix is employed to extract the recognition rules contained in the data. When new data are imported, the classification of incremental data defined above is used to judge and process them.

The incremental platform recognition algorithm consists of three sub-algorithms: the main recognition algorithm, the original data rule-extracting algorithm and the incremental data rule-extracting algorithm. The original data rule-extracting algorithm deals with the static data at the beginning of the recognition and is the same as our previous approach [4]. The incremental data rule-extracting algorithm processes the incremental data according to their categories, as defined in Section 2.2.

The recognition algorithm is executed after the data condense algorithm, which combines multiple items of data into one item. In the data condense algorithm, all the data items describing the same target at one timestamp are merged into one item, and the unused attributes of the raw data are removed from the database. After attribute reduction, there will usually be some conflicts, also called inconsistent conditions: after data condensing, the database contains more than one item whose condition attributes are all identical while their decision attributes differ. This may be caused by the attributes removed in the condense algorithm. The recognition rules generated from such inconsistent data are incompatible. In our recognition approach, the condition equivalence classes are calculated as $E_i \in U/IND(C \cup D)$, that is, the decision attributes are included when generating the equivalence classes. The conflicting items are thus classified into different equivalence classes and produce different rules, and we can choose the rule that has the largest confidence factor.
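The improved discernibility matrix of Definition 1 can be sketched in a few lines. This is an illustrative reading of the definition, not the system's implementation: the records, the attribute names `freq`, `loc`, `type` and the function name are hypothetical, and `None` plays the role of NULL.

```python
from collections import defaultdict


def improved_discernibility_matrix(rows, B, d):
    """Return (classes, matrix): classes are the value tuples of U/IND(B);
    matrix[i][j] is None (NULL) when E_i and E_j lie in the same decision class,
    otherwise the set of attributes in B on which E_i and E_j differ."""
    decisions = defaultdict(set)  # condition class -> decision values it contains
    for row in rows:
        decisions[tuple(row[a] for a in B)].add(row[d])
    classes = list(decisions)
    n = len(classes)
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            di, dj = decisions[classes[i]], decisions[classes[j]]
            same_decision_class = i != j and len(di) == 1 and di == dj
            if not same_decision_class:
                matrix[i][j] = {a for a, vi, vj in zip(B, classes[i], classes[j])
                                if vi != vj}
    return classes, matrix


# Hypothetical condensed interception records.
records = [
    {"freq": "high", "loc": "ground", "type": "hq"},
    {"freq": "high", "loc": "air", "type": "hq"},
    {"freq": "low", "loc": "ground", "type": "relay"},
]
classes, m = improved_discernibility_matrix(records, ["freq", "loc"], "type")
print(m[0][1], m[0][2], m[1][2])
```

The NULL entry for the two `hq` classes is what saves work: pairs that already agree on the decision contribute nothing to the discernibility function, so they can be skipped when rules are extracted.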
3 Comparison of Practice Experiment
In IFS/CI2 the processed data arrive batch by batch, so they can be regarded as a typical incremental data sequence. In an information processing system it is important that the system runs efficiently and in real time. The most important criterion for the efficiency of this kind of system is the response time, i.e., the time consumed from the initial data to the completion of processing of the latest data. We designed and implemented an experiment that compares the incremental algorithm and the non-incremental algorithm on response time. The experiment ran on a PC with a Pentium IV 866 MHz CPU, 256 MB of RAM and a 40 GB hard disk under Windows 2000 Server. The results are shown in Figure 1 and Figure 2; in both figures the time axis is the response time of the fusion center. The results indicate that there is no notable difference between the two algorithms when the data size is small. When the data size becomes large, however, the incremental platform recognition algorithm is more efficient than the non-incremental algorithm, because it need not process the platform data in the database from the beginning.
Algorithm 1. Main recognition algorithm
Data: α0, μ0, preliminary parameter database P(C, D), where C is the attribute set of the parameters and D is the possible decision.
Result: The set of recognition rules whose support factor is larger than μ0 and whose confidence factor is larger than α0, and the recognition result for each data item.
begin
  Step 1. Data pretreatment. Divide the preliminary database P(C, D) into a number of object equivalence classes by condition attribute set C: $E_i \in U/\mathrm{IND}(C \cup D)$, $i = 1, 2, \ldots, |U/\mathrm{IND}(C \cup D)|$. Divide P(C, D) into a number of decision equivalence classes on decision attribute set D: $X_j \in U/\mathrm{IND}(D)$, $j = 1, 2, \ldots, |U/\mathrm{IND}(D)|$.
  while the preliminary database is not empty do
    Step 2. Data recognition.
    if the preliminary data item has no decision then
      Match the data item against the existing rules in the rule database to judge the type of platform it belongs to. If several rules match the data item, compute the rough operator of every rule by formula (2), choose the rule with the largest rough operator, and take its platform type.
    Step 3. Rule extracting.
    if the preliminary data item has a decision then
      if the data item is an original one (as opposed to incremental data) then
        Call Algorithm 2 (Original Data Rule-extracting Algorithm)
      if the data item is an incremental one then
        Call Algorithm 3 (Incremental Data Rule-extracting Algorithm)
end
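The control flow of Algorithm 1 can be sketched as follows. This is a hedged illustration, not the authors' implementation: the rule layout (`conditions`, `decision`, `score`), the `incremental` flag and the two callback parameters are hypothetical, and the stored `score` merely stands in for the rough operator of formula (2), which is not reproduced here.

```python
def recognize(item, rules):
    """Return the decision of the best-scoring rule that matches item, or None if no rule matches."""
    matching = [r for r in rules
                if all(item.get(a) == v for a, v in r["conditions"].items())]
    if not matching:
        return None
    return max(matching, key=lambda r: r["score"])["decision"]

def process(stream, rules, extract_original, extract_incremental):
    """Dispatch each item as in Steps 2 and 3: recognize it, or extract rules from it."""
    results = []
    for item in stream:
        if item.get("decision") is None:        # Step 2: item without a decision
            results.append(recognize(item, rules))
        elif item.get("incremental", False):    # Step 3: incremental item with a decision
            extract_incremental(item, rules)
        else:                                   # Step 3: original item with a decision
            extract_original(item, rules)
    return results

# Hypothetical rule database: two rules match the first query, the higher score wins.
rules = [
    {"conditions": {"band": "X"}, "decision": "fighter", "score": 0.6},
    {"conditions": {"band": "X", "speed": "high"}, "decision": "bomber", "score": 0.8},
]
print(recognize({"band": "X", "speed": "high"}, rules))
```

Passing the two rule-extracting procedures as callbacks keeps the dispatch logic independent of how Algorithms 2 and 3 are realized.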
Algorithm 2. Original Data Rule-extracting Algorithm
Data: α0, μ0, data items in the preliminary parameter database P(C, D).
Result: The recognition rules.
begin
  Step 1. Compute the improved discernibility matrix $M_S(i, j)$, $i, j = 1, 2, \ldots, n$:
    if $E_i \subseteq X_k$, $E_j \subseteq X_k$ and $i \neq j$ then $M_S(i, j) = \mathrm{NULL}$;
    else $M_S(i, j) = \{a \in C : a(E_i) \neq a(E_j)\}$.
  Step 2. Compute the relative discernibility function $f(E_i)$ of each equivalence class $E_i$.
  Step 3. Extract rules from $f(E_i)$:
    if $E_i \subseteq X_j$ then get the consistent rule, with $\alpha = 1$;
    if $E_i \not\subseteq X_j$ and $E_i \cap X_j \neq \emptyset$ then get the inconsistent rule, whose confidence is
      $$\alpha = \frac{|E_i \cap X_j|}{|E_i|}$$
    and whose support factor is
      $$\mu = \frac{|E_i \cap X_j|}{|U|}$$
  Step 4. Put the rules whose support factor is larger than μ0 and whose confidence factor is larger than α0 into the rule database of the recognition algorithm.
end
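Steps 1 and 3 of Algorithm 2 can be illustrated with a small Python sketch. It rests on assumptions that should be read as such: each equivalence class is given as a dictionary of condition values with a single decision label, `None` plays the role of NULL, and the function names and sample data are hypothetical.

```python
def discernibility_matrix(classes, decisions, attrs):
    """Improved discernibility matrix: None for distinct classes in the same decision class,
    otherwise the set of condition attributes on which the two classes differ."""
    n = len(classes)
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and decisions[i] == decisions[j]:
                continue  # same decision class: nothing to discern, entry stays None
            matrix[i][j] = {a for a in attrs if classes[i][a] != classes[j][a]}
    return matrix

def confidence_and_support(e_i, x_j, universe_size):
    """Confidence |E_i ∩ X_j| / |E_i| and support |E_i ∩ X_j| / |U| of the rule E_i -> X_j."""
    inter = len(e_i & x_j)
    return inter / len(e_i), inter / universe_size

# Hypothetical equivalence classes described by two condition attributes.
classes = [
    {"speed": "high", "band": "X"},
    {"speed": "low", "band": "X"},
    {"speed": "high", "band": "S"},
]
decisions = ["fighter", "ship", "fighter"]
m = discernibility_matrix(classes, decisions, ["speed", "band"])
alpha, mu = confidence_and_support({1, 2, 3}, {2, 3, 4, 5}, 10)
print(m[0][1], m[0][2], alpha, mu)
```

A rule would then be kept only if `alpha` exceeds α0 and `mu` exceeds μ0, as in Step 4.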
Fig. 1. Response Time of Non-incremental Approach in Platform Data set of IFS/CI2
Algorithm 3. Incremental Data Rule-extracting Algorithm
Data: α0, μ0, incremental data item R in the preliminary parameter database P(C, D).
Result: The recognition rules.
begin
  if incremental data R belongs to the CS category or the CC category then
    Suppose $R \in E_i$ (object equivalence class). Consider $E_i$; there are the following two situations:
    (1) When $E_i \not\subseteq X_j$ (decision equivalence class) and $E_i \cap X_j \neq \emptyset$, the confidence of $Des(E_i, C) \rightarrow Des(X_j, D)$ is changed to
      $$\alpha = \frac{|E_i \cap X_j| + 1}{|E_i| + 1}$$
    and the support factor is changed to
      $$\mu = \frac{|E_i \cap X_j| + 1}{|U| + 1}$$
    For every other $X_k$ with $k \neq j$ and $E_i \cap X_k \neq \emptyset$, the confidence of the rule $Des(E_i, C) \rightarrow Des(X_k, D)$ is changed to
      $$\alpha = \frac{|E_i \cap X_k|}{|E_i| + 1}$$
    and the support factor is changed to
      $$\mu = \frac{|E_i \cap X_k|}{|U| + 1}$$
    (2) When $E_i \subseteq X_k$, the rule database is not changed.
  if incremental data R belongs to the CN category or the PC category then
    Add a new row below the last row of the improved discernibility matrix $M_S(n, n)$ and a new column after the last column, which gives a new improved discernibility matrix $M_S(n + 1, n + 1)$. Compute the new discernibility function and the new relative discernibility functions from the new matrix, and use them to extract rules.
  Put the rules whose support factor is larger than μ0 and whose confidence factor is larger than α0 into the rule database of the recognition algorithm.
end
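The update in situation (1) only touches three counters per rule, which a short Python sketch makes explicit. The class `RuleStats`, the function `add_item` and the sample counts are hypothetical; the sketch assumes that the new item joins an existing class E_i and that a rule for its decision already exists.

```python
from dataclasses import dataclass

@dataclass
class RuleStats:
    inter: int   # |E_i ∩ X_j|, items of E_i carrying the rule's decision
    e_size: int  # |E_i|
    u_size: int  # |U|

    @property
    def confidence(self):
        return self.inter / self.e_size

    @property
    def support(self):
        return self.inter / self.u_size

def add_item(stats_by_decision, new_decision):
    """Update every rule of one class E_i when a new item with new_decision joins E_i."""
    for decision, stats in stats_by_decision.items():
        if decision == new_decision:
            stats.inter += 1   # numerator grows only for the matching decision class
        stats.e_size += 1      # |E_i| + 1 for every rule of this class
        stats.u_size += 1      # |U| + 1 for every rule of this class

# Hypothetical class E_i with 4 items (3 fighter, 1 bomber) in a universe of 10 items.
stats = {"fighter": RuleStats(3, 4, 10), "bomber": RuleStats(1, 4, 10)}
add_item(stats, "fighter")
print(stats["fighter"].confidence, stats["bomber"].confidence)
```

No earlier data item is revisited, which is the reason the incremental variant avoids recomputation from scratch.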
4 Conclusion
The incremental recognition algorithm has several advantages over the non-incremental one [4]:
– It processes data incrementally, so the recognition time decreases markedly when dealing with incremental data.
– It can filter noise and distorted data by introducing the support factor of rules.
– It can deal with the inconsistent condition, which often occurs after data condensing, by using the confidence factor of rules.
The experimental results show that the incremental recognition algorithm is more efficient than the previous one while keeping the same recognition ratio.
Fig. 2. Response Time of Incremental Approach in Platform Data set of IFS/CI2
Acknowledgements. This work was sponsored by the National Science Foundation of China (No. 60402010), the Zhejiang Province Science Foundation (No. M603169) and the Advanced Research Project of China Defense Ministry (No. 413150804), and was partially supported by the Aerospace Research Foundation (No. 2003-HT-ZJDX-13).
References
1. Yaakov Bar-Shalom: Tracking Methods in a Multitarget Environment. IEEE Transactions on Automatic Control. 4 (1978) 618–626
2. Xu Congfu, Pan Yunhe: IFS/CI2: An intelligent fusion system of communication interception information. Chinese Journal of Electronics and Information Technology. 10 (2002) 1358–1365
3. Pawlak Z.: Rough set. International Journal of Computer and Information Science. 5 (1982) 341–356
4. Liu Yong, Xu Congfu, Pan Yunhe: A New Approach for Data Fusion: Implement Rough Set Theory in Dynamic Objects Distinguishing and Tracing. IEEE International Conference on Systems, Man & Cybernetics, Hague (2004) 3318–3323
5. A. Skowron and C. Rauszer: The discernibility matrices and functions in information systems. Fundamenta Informaticae. 2 (1991) 331–362
6. Shan N., Ziarko W.: An incremental learning algorithm for constructing decision rules. In: Kluwer.R.S.(eds.): Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer-Verlag (1994) 326–334
7. Liu Yong, Xu Congfu, Li Xuelan, Pan Yunhe: A dynamic incremental rule extracting algorithm based on the improved discernibility matrix. The 2003 IEEE International Conference on Information Reuse and Integration, USA (2003) 93–97
8. Liu Qing, Huang Zhaohua, Yao Liwen: Rough Set Theory: Present State and Prospects. Chinese Journal of Computer Science. 4 (1997) 1–5
9. Liu Qing, Huang Zhaohua, Liu Shaohui, Yao Liwen: Decision Rules with Rough Operator and Soft Computing of Data Mining. Chinese Journal of Computer Research and Development. 7 (1999) 800–804
10. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules in Large Databases. In: VLDB’94, Morgan Kaufmann (1994) 487–499
Problems Relating to the Phonetic Encoding of Words in the Creation of a Phonetic Spelling Recognition Program
Michael Higgins and Wang Shudong
Department of Kansei Design Engineering, Faculty of Engineering and Technology, Yamaguchi University, 2-16-1 Tokiwadai, Ube City, Yamaguchi Prefecture, Japan
{higginsm, peterwsd}@yamaguchi-u.ac.jp
Abstract. A relatively new area of research is centering on the phonetic encoding of information. This paper deals with the possible computer applications of the Sound Approach© English phonetic alphabet. The authors review some preliminary research into a few of the more promising approaches to applying machine learning to this phonetic alphabet for computer spell-checking, computer speech recognition, etc. Applying mathematical approaches to the development of a data-based phonetic spelling recognizer, and to speech recognition technology for language pronunciation training in which the speech recognizer allows a large margin of pronunciation accuracy, the authors delineate the parameters of the current research and point out the direction of both the continuation of the current project and future studies.
1 Introduction
In 1993-1994, Dr. Michael Higgins of Yamaguchi University, Japan developed and did initial testing on a new system of phonetic spelling of the sounds in English as an aid to learning better English pronunciation and improving listening and spelling skills in English for Japanese students of English. The method, subsequently entitled “A Sound Approach”, has been proven to be a very effective English phonetic system [1]. The Sound Approach (SA) alphabet represents without ambiguity all sounds appearing in the pronunciation of English language words, and does so without using any special or unusual symbols or diacritical marks. SA uses only normal English letters that can be found on any keyboard, but arranges them so that consistent combinations of letters always represent the same sound. For example, for the word “this”, instead of the IPA (International Phonetic Alphabet) symbols /ðis/, SA uses /dhis/ to express the pronunciation. Consequently, any spoken word can be uniquely expressed as a sequence of SA alphabet symbols, and pronounced properly when read by a reader who knows the SA alphabet. For instance, the sentence “One of the biggest problems with English is the lack of consistency in spelling,” is written in SA (showing word stress) as: “Wun uv dhu BI-gust PRAA-blumz widh EN-glish iz dhu lak uv kun-SIS-tun-see in SPEL-ing.”
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1256 – 1260, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Project Development
Due to representational ambiguity and the insufficiency of English language characters to portray their sounds phonetically in an adequate and efficient way, the relationship between a word expressed in the SA alphabet and its possible spellings is one to many. That is, each SA sequence of characters can be associated with a number of possible, homophonic sequences of English language characters (e.g. “tuu” is equivalent to “to”, “too”, and “two”). However, within a sentence usually only one spelling for a spoken word is possible. The major challenge in this context is the recognition of the proper spelling of a homophone/homonym given in the SA language. In addition to the obvious speech recognition OS that would eventually follow, automated recognition of the spelling has the potential for development of SA-based phonetic text editors, which would not require the user to know the spelling rules of the language but only to be able to pronounce a word within a relatively generous margin of error and to express it in the simple phonetic SA-based form. Speech recognition parameters could also be adjusted to accommodate the wider margin of error inherent in SA, which is based on International Broadcast Standard English. As SA is phonetic (i.e., a clear one-to-one correlation of sound to spelling), the words themselves could be ‘regionally tuned’, so that users could select the regional accent they are accustomed to in much the same way that we currently select the keyboard by language groupings. In other words, someone from Australia could select Australia as their text editor and phonetically spell the word ‘day’ as ‘dai’ without contextual ambiguity. (See Fig. 1)

I’d like to go to the hospital again today.
1) Ai’d laik tuu go tuu dhu HAAS-pi-tul u-GIN tuu-DEI. (International Standard Broadcast English)
2) Aa’d lak tu go tu dhu HAAS-pi-tul u-GEE-un tu-DEI. (Generic Southern US English)
3) Oi’d laik tuu go tuu dhu HOS-pi-tul u-GEIN tuu-DAI. (Australian-English)
Fig. 1. Regionally tuned sample sentences
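The one-to-many relation between an SA form and its standard spellings can be pictured as a simple lookup table. The sketch below is only an illustration: the dictionary name and helper functions are hypothetical, and the three entries are limited to SA forms and spellings that appear in this paper.

```python
# SA form -> candidate standard spellings (entries taken from the examples in this paper).
SA_TO_SPELLINGS = {
    "tuu": ["to", "too", "two"],
    "ai": ["aye", "eye", "I"],
    "dhis": ["this"],
}

def candidates(sa_word):
    """Return all standard spellings recorded for an SA form (empty list if unknown)."""
    return SA_TO_SPELLINGS.get(sa_word.lower(), [])

def is_ambiguous(sa_word):
    """True when context is needed to choose among several spellings."""
    return len(candidates(sa_word)) > 1

print(candidates("tuu"), is_ambiguous("tuu"), is_ambiguous("dhis"))
```

Only the ambiguous entries need a decision table; unambiguous SA forms can be re-coded directly.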
The approach adopted in this project involves the application of rough sets [2] in the development of a data-based word spelling recognizer. In this part, the techniques of rough sets, supported by rough-set based analytical software such as KDD-R [3], would be used to analyze the classificatory adequacy of the decision tables, to minimize them, and to extract the classification (decision) rules to be used in spelling recognition. While the initial identification and minimization of the required number of information inputs in such decision tables would be one of the more labor-intensive aspects of the project, it should be emphasized that the later stages of minimization and rule extraction would be automated to a large degree, and adaptive in the sense that the inclusion of new spoken word-context combinations would result in regeneration of the classification rules without human intervention. In this sense the system would have some automated learning ability, allowing for continuous expansion as more and more experience is accumulated during use [4]. The adaptive pattern classification part of the system development is absolutely key to the successful deployment of the system. This aspect of the system, however, only becomes
important when some of the more essential problems are solved. Given that, let us briefly outline how we plan to use rough sets in our approach. In the word spelling recognition problem, in addition to the problem of homophones, one of the difficulties is the fact that many spoken words given in SA form correspond to a number of English language words given in a standard alphabet. To resolve, or at least reduce, this ambiguity, the context information must be taken into account. That is, the recognition procedure should involve words possibly appearing before, and almost certainly after, the word to be translated into Standard English orthography. In the rough-set approach this will require the construction of a decision table for each word. In the decision table, the possible information inputs would include context words surrounding the given word and other information such as the position of the word in the sentence, and so on. (See Appendix) Several extensions of the original rough sets theory have been proposed recently to better handle probabilistic information occurring in empirical data, in particular the Variable Precision Rough Sets (VPRS) model [4], which serves as a basis of the software system KDD-R [3] to be used in this project.

In the preliminary testing of SA completed in 1997, a selection of homonyms was put into representative sentences. The words in the sentences were assigned numbers (features) according to a simple, and relatively unrefined, grammatical protocol. These numbers were then inserted into decision tables, and using KDD-R it was found that the computer could accurately choose the correct spelling of non-dependent homonyms (i.e., those homonyms for which the simple grammatical protocol was unable to determine the correct spelling from the context) 83.3% of the time, as in the sentence, “The ayes/eyes have it.” With dependent homonyms, as in the sentence, “We ate eight meals,” the computer could accurately choose the correct spelling more than 98% of the time [4].

Besides the above usages of SA, we are also experimenting with building HMM/GMM modules of speech recognizers based on SA for an online English pronunciation training system, currently just for Japanese learners of English. As an SA-encoded speech recognizer allows a larger scope of parameter baseline limitation than any existing speech recognizer, it would more accurately catch the real errors of English pronunciation. For example, many acceptable /f/ pronunciations in Japanese English words are judged as wrong, or cannot be recognized, in the current English pronunciation training system using ASR technology [5]. The “regionally tuned” feature of SA can also be used effectively for foreign language pronunciation training. Usually a non-native speaker’s English pronunciation is a mixture of accents of American English, British English, Australian English, etc., and is of course colored by his/her own mother tongue. In this case, if ASR modules are built on SA, users’ input sounds will be encoded to SA with a large margin of speech signal parameters. From the SA codes, spoken words are re-coded to regular text on the basis of the one-to-one correspondence of SA to regular words, so the speech recognition rates will be greatly improved. Additionally, the ASR software training time will be much shorter. Therefore, the speech of foreign language learners who read continuous sentences into a computer will be more easily recognized and translated into correct texts (STT). The English learner also does not need to worry about what kind of English he/she has to speak.
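To make the decision-table construction concrete, the following Python sketch encodes the words preceding a target word with the codes of the grammatical protocol listed in the Appendix. It is an illustration under assumptions: the tiny lexicon, the function name `context_row` and the five-word window are hypothetical choices, and unknown words fall back to "0", which a real system would have to distinguish from "no word".

```python
# Codes follow the grammatical protocol in the Appendix; this small lexicon is hypothetical.
LEXICON = {"he": "2", "gave": "1", "me": "2", "the": "5", "his": "8", "injured": "1"}

def context_row(words, target_index, window=5):
    """Codes of the `window` words before the target, left-padded with '0' (none)."""
    before = words[max(0, target_index - window):target_index]
    codes = [LEXICON.get(w.lower(), "0") for w in before]
    return ["0"] * (window - len(codes)) + codes

# "He gave me the eye." with the target word at index 4.
row = context_row(["He", "gave", "me", "the", "eye"], 4)
print(row)  # condition attributes for positions -5 .. -1
```

Each such row, together with the correct spelling as the decision attribute, forms one line of the decision table for the SA form in question.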
3 Current Challenges
The adaptive pattern classification part of the system development presents one of the largest difficulties. As alluded to above, a major difficulty with the suggested approach is that re-coding every word in the English language into Sound Approach coding is impractical, simply because of the size of the problem. In addition, it is difficult at this time to accommodate any practical public machine learning whereby users could add new words to the Sound Approach© coded dictionary, which is the way normal spell check systems overcome the problem of an initially small dictionary. To ease this problem, an already phonetically coded dictionary using IPA symbols as pronunciation guides has been temporarily adopted, and an interface to link the dictionary (IPA) coding with the Sound Approach coding is being developed. This will obviously save a lot of time, as so many words have already been encoded. However, such dictionaries do not contain the phonetic coding of all the inflections of a word, e.g. go, goes, went, or play, playing, played, plays. Therefore, we are still faced with having to encode many words by hand before the system can be used as a practical phonetic spell checker. This problem must be solved before we can seriously consider other issues, such as how to process words ‘in context’.
4 Prospects and Conclusion
If these particular problems can be adequately addressed in the coming year, the authors see no major difficulty in completing the classification of the homonyms into decision tables and, using that as a base, developing a protocol for converting words written in Standard English or IPA symbols into SA characters for ease of encoding. In this way, the interactive database of SA to Standard or Standard to SA could be completed in a relatively short amount of time. This will then make it possible to complete the spelling recognition project with a more complete regional tunability and, in turn, pave the way to complete voice recognition capability.
References
1. Higgins, M.L, with Higgins M.L and Shima, Y.: Basic Training in Pronunciation and Phonics: A Sound Approach, vol. 19, number 4, The Language Teacher (1995) 4-8.
2. Ziarko, W.: Rough Sets, Fuzzy Sets and Knowledge Discovery. Springer Verlag (1994)
3. Ziarko, W. and Shan, N.: KDD-R: A Comprehensive System for Knowledge Discovery Using Rough Sets. Proceedings of the International Workshop on Rough Sets and Soft Computing, San Jose (1994) 164-173.
4. Higgins, M.L, with Ziarko, W.: Computerized Spelling Recognition of Words Expressed in the Sound Approach. New Directions in Rough Sets, Data Mining, and Granular-Soft Computing: Proceedings, 7th International Workshop, RSFDGrC '99. Lecture Notes in Artificial Intelligence 1711, Springer Tokyo (1999) 543-550
5. Goh Kawai and Keikichi Hirose: A Call system for teaching the duration and phone quality of Japanese Tokushuhaku. Proceedings of the Joint Conference of the ICA (International Conference on Acoustics) and ASA (Acoustical Society of America) (1998) 2981-2984
Appendix

Values of the observations (Grammatical Protocol):
0: none
1: verb
2: noun/pronoun
3: adjective
4: adverb
5: article
6: connective
7: number
8: possessive
a: let, please, etc.
b: will, shall, can (modals), etc.
c: prepositions
Fig. 2. Values

Fig. 3. Sample Table 1b - Reduct: “ai”

Table 1. Sample Table 1a: “ai” (IPA symbol: aI); (Sound Spelling: ai)

Head Word   Sentence Number   -5   -4   -3   -2   -1   Spelling
1           15                2    b    1    2    c    aye
2           16                7    2    c    2    1    aye
3           17                0    0    0    0    0    aye
4           18                0    0    0    0    5    ayes
5           19                0    0    2    1    8    eye
6           20                0    2    1    2    5    eye
7           21                0    0    0    0    8    eyes
8           22                0    0    0    0    5    eyes
9           23                0    0    2    1    1    eyes
10          24                1    1    c    1    5    eye
11          25                0    0    0    0    2    eyed
12          26                0    0    0    5    2    i
13          27                0    0    0    0    0    I

Sample Sentences:
15. “I’ll love you for aye.”
16. “All those in favor say, ‘aye’.”
17. “Aye, Captain.”
18. “The ‘ayes’ have it.”
19. “He injured his eye at work.”
20. “He gave me the eye.”
21. “Her eyes are blue.”
22. “The eyes have it.”
23. “She’s making eyes at me.”
24. “I’m going to keep an eye on you.”
25. “He eyed the situation carefully before he went in.”
26. “The letter i comes after the letter h and before j.”
27. “I want to go out tonight.”
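Rows of Table 1 that share all condition values but differ in spelling are the cases the grammatical protocol alone cannot decide. A minimal Python sketch for finding them is given below; the tuple encoding of the rows and the function name `conflicts` are hypothetical, while the values are those of the five context columns (-5 to -1) and the spelling column for sentences 15 to 27.

```python
from collections import defaultdict

# (sentence number, codes at positions -5..-1, spelling), read from Table 1.
ROWS = [
    (15, "2b12c", "aye"), (16, "72c21", "aye"), (17, "00000", "aye"),
    (18, "00005", "ayes"), (19, "00218", "eye"), (20, "02125", "eye"),
    (21, "00008", "eyes"), (22, "00005", "eyes"), (23, "00211", "eyes"),
    (24, "11c15", "eye"), (25, "00002", "eyed"), (26, "00052", "i"),
    (27, "00000", "I"),
]

def conflicts(rows):
    """Return condition patterns that occur with more than one spelling."""
    seen = defaultdict(set)
    for number, codes, spelling in rows:
        seen[codes].add((number, spelling))
    return {codes: sorted(group) for codes, group in seen.items()
            if len({spelling for _, spelling in group}) > 1}

print(conflicts(ROWS))
```

With these rows the sketch reports two conflicting patterns: sentences 18 and 22 (“The ayes/eyes have it”) and sentences 17 and 27, where no preceding word is available.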
Diversity Measure for Multiple Classifier Systems
Qinghua Hu and Daren Yu
Harbin Institute of Technology, Harbin, China
[email protected]
Abstract. Multiple classifier systems have become a popular classification paradigm for strong generalization performance. Diversity measures play an important role in constructing and explaining multiple classifier systems. A diversity measure based on relation entropy is proposed in this paper. The entropy will increase with diversity in ensembles. We introduce a technique to build rough decision forests, which selectively combine some decision trees trained with multiple reducts of the original data based on the simple genetic algorithm. Experiments show that selective multiple classifier systems with genetic algorithms get greater entropy than those of the top-classifier systems. Accordingly, good performance is consistently derived from the GA based multiple classifier systems although accuracies of individuals are weak relative to top-classifier systems, which shows the proposed relation entropy is a consistent diversity measure for multiple classifier systems.
1 Introduction
In the last decade, multiple classifier systems (MCS) have become a popular technique for building pattern recognition machines [4, 5]. Such a system constructs several distinct classifiers and then combines their predictions. It has been observed that the objects misclassified by one classifier are not necessarily misclassified by another, which suggests that different classifiers potentially offer complementary information. The paradigm goes by several names from different viewpoints, such as neural network ensemble, committee machine, and decision forest. Several techniques have been exploited to construct a multiple classifier system. The most widely used one is resampling, which selects a subset of the training data with different algorithms. Resampling can be roughly grouped into two classes. The first generates a series of training sets from the original training set and then trains a classifier with each subset. The second uses different feature sets in training the classifiers; the random subspace method and feature selection have been reported in the literature [2, 4]. The performance of a multiple classifier system depends not only on the power of the individual classifiers in the system but also on the independence between individuals [5, 6]. Diversity plays an important role in combining multiple classifiers: it guides MCS users in designing a good ensemble and helps explain the success of an ensemble system. Diversity may be interpreted from different angles, such as independence, orthogonality or complementarity [7, 8]. Kuncheva pointed out that diversity is generally beneficial but is not a substitute for accuracy [6].
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1261 – 1265, 2005. © Springer-Verlag Berlin Heidelberg 2005
Since pairwise measures cannot reflect the diversity of an MCS as a whole, a novel diversity measure for the whole system, called relation entropy, is presented in this paper. It is built on the pairwise measures.
2 Relation Entropy
Here we first introduce two classical pairwise diversity measures, the Q-statistic and the correlation coefficient. Given a multiple classifier system with n individual classifiers $\{C_1, C_2, \ldots, C_i, \ldots, C_n\}$, the joint output of two classifiers $C_i$ and $C_j$, $1 \le i, j \le n$, can be represented in a 2 × 2 table as shown in Table 1.

Table 1. The relation table with classifiers $C_i$ and $C_j$

                     $C_j$ correct (1)   $C_j$ wrong (0)
$C_i$ correct (1)    $N^{11}$            $N^{10}$
$C_i$ wrong (0)      $N^{01}$            $N^{00}$

Yule introduced the Q-statistic for two classifiers, defined as

$$Q_{ij} = \frac{N^{11} N^{00} - N^{10} N^{01}}{N^{11} N^{00} + N^{10} N^{01}}$$

The correlation coefficient $\rho_{ij}$ is defined as

$$\rho_{ij} = \frac{N^{11} N^{00} - N^{10} N^{01}}{\sqrt{(N^{11} + N^{10})(N^{01} + N^{00})(N^{11} + N^{01})(N^{10} + N^{00})}}$$

Computing the Q-statistic or the correlation coefficient for each pair of the n classifiers produces a matrix $M = (r_{ij})_{n \times n}$. Here $r_{ii} = 1$, $r_{ij} = r_{ji}$ and $|r_{ij}| \le 1$. Therefore the matrix $|M|$ is a fuzzy similarity relation matrix. The greater the value $|r_{ij}|$, $i \neq j$, the stronger the relation between $C_i$ and $C_j$ and the weaker the independence between the two classifiers. The matrix M captures the total relation of the classifiers in the MCS.

Given a set of classifiers $C = \{C_1, C_2, \ldots, C_n\}$, let R be a fuzzy relation on C. It can be denoted as a relation matrix $(R_{ij})_{n \times n}$, where $R_{ij}$ is the relation degree between $C_i$ and $C_j$ with respect to relation R. The larger $R_{ij}$ is, the stronger the relation between $C_i$ and $C_j$. For the correlation coefficient, $R_{ij}$ denotes the degree of correlation between $C_i$ and $C_j$. If $R_{ij} > R_{ik}$, we say that $C_i$ and $C_j$ are more indiscernible than $C_i$ and $C_k$.
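Both pairwise measures follow directly from the four counts of the 2 × 2 relation table. The Python sketch below is a minimal illustration, assuming each classifier's output on a common test set is given as a list of 1 (correct) and 0 (wrong); the function names and the sample vectors are hypothetical, and degenerate tables with a zero denominator are not handled.

```python
import math

def contingency(a, b):
    """Counts N11, N10, N01, N00 for two correctness vectors of equal length."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return n11, n10, n01, n00

def q_statistic(a, b):
    """Yule's Q-statistic of two classifiers."""
    n11, n10, n01, n00 = contingency(a, b)
    return (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)

def correlation(a, b):
    """Correlation coefficient of two classifiers' correctness vectors."""
    n11, n10, n01, n00 = contingency(a, b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical correctness vectors of two classifiers on eight test samples.
c_i = [1, 1, 1, 0, 0, 1, 0, 1]
c_j = [1, 1, 0, 0, 1, 1, 0, 0]
print(q_statistic(c_i, c_j), correlation(c_i, c_j))
```

For these vectors the counts are N11 = 3, N10 = 2, N01 = 1 and N00 = 2, so the Q-statistic is (6 - 2) / (6 + 2) = 0.5.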
Definition 1. Let R be a fuzzy relation over a set C, and let $w_i$ be the weight of $C_i$ in the ensemble system, with $0 \le w_i \le 1$ and $\sum_i w_i = 1$. For all $C_i \in C$, we define the expected relation degree of $C_i$ to all $C_j \in C$ with respect to R as

$$\pi(C_i) = \sum_{j=1}^{n} w_j \cdot r_{ij}$$

Definition 2. The information quantity of the relation degree of $C_i$ is defined as

$$I(C_i) = -\log_2 \pi(C_i)$$

It is easy to show that the larger $\pi(C_i)$ is, the more strongly $C_i$ is related to the other classifiers in the ensemble system and the smaller $I(C_i)$ is. The measure $I(C_i)$ thus describes the relation degree of $C_i$ to all classifiers in system C with respect to relation R.

Definition 3. Given any relation R between individuals in a multiple classifier system and a weight factor series w of C, the relation entropy of the pair is defined as

$$H_w(R) = \sum_{C_i \in C} w_i \cdot I(C_i) = -\sum_{C_i \in C} w_i \log_2 \pi(C_i)$$

The relation entropy gives the total diversity of a multiple classifier system if the relations used represent the similarity of the outputs of the individual classifiers. This measure not only takes the relations between classifiers into account but also incorporates the weight factors of the individual classifiers in the ensemble. The proposed entropy can be applied to a number of pairwise similarity measures for multiple classifier systems, such as the Q-statistic, the correlation coefficient and so on.
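Definitions 1 to 3 translate into a few lines of Python. The sketch below is an illustration under assumptions: the relation matrix is passed as a nested list with entries already in [0, 1] (for example the absolute values of a pairwise measure), the weights sum to 1, and the function name `relation_entropy` is hypothetical.

```python
import math

def relation_entropy(r, w):
    """Relation entropy H_w(R) = -sum_i w_i * log2(pi(C_i)), with pi(C_i) = sum_j w_j * r[i][j]."""
    h = 0.0
    for i, w_i in enumerate(w):
        pi_i = sum(w_j * r[i][j] for j, w_j in enumerate(w))
        if w_i > 0:  # classifiers with zero weight contribute nothing
            h -= w_i * math.log2(pi_i)
    return h

n = 4
uniform = [1 / n] * n
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # unrelated classifiers
all_ones = [[1.0] * n for _ in range(n)]                                   # identical classifiers
print(relation_entropy(identity, uniform), relation_entropy(all_ones, uniform))
```

The two extreme cases bracket the measure: four mutually unrelated classifiers with equal weights give $\log_2 4 = 2$, while four fully related classifiers give 0, so a larger entropy corresponds to a more diverse ensemble.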
3 Experiments
Searching for the optimal ensemble of a multiple classifier system involves combinatorial optimization, a kind of problem on which genetic algorithms perform well. Some experiments were conducted with UCI data. The numbers of reducts range between 5 and 229. All the trees are trained with the CART algorithm; two-thirds of the samples in each class are selected as the training set, and the rest form the test set. For simplicity, 20 reducts are randomly extracted from the reduct set of each data set that has more than 20 reducts, and the subsequent experiments are conducted on these 20 reducts. The accuracies of the different decision forests are shown in Table 2. GAS denotes the forests selected by the genetic algorithm; TOP denotes the forests built from the best trees. We find that the GAS ensembles achieve a consistent improvement on all data sets relative to the systems combining the best classifiers. The entropies based on the Q-statistic and the correlation coefficient for the two kinds of ensembles are shown in Table 3. As the entropies represent the total diversity in the systems, we find that the GAS based ensembles consistently capture more diversity than the top-classifier ensembles.
Table 2. Comparison of decision forests

Data     GAS size   GAS accuracy   TOP size   TOP accuracy
BCW      10         0.9766         10         0.92642
Heart    6          0.8857         6          0.85714
Ionos    8          0.9901         8          0.94059
WDBC     7          0.9704         7          0.94675
Wine     7          1.00           7          0.97917
WPBC     9          0.75           9          0.70588
Table 3. Relation entropy of multiple classifier systems

         Q-statistic          Correlation coefficient
Data     TOP      GAS         TOP      GAS
BCW      0.1385   0.2252      0.6348   0.7719
Heart    0.0031   0.4011      0.0698   0.9319
IONOS    0.0978   0.2313      0.7689   1.2639
WDBC     0.1780   0.2340      1.0296   1.2231
Wine     0        0.0593      0.8740   1.3399
WPBC     1.0541   1.1466      1.7425   1.8319
4 Conclusion
Diversity in multiple classifier systems plays an important role in improving classification accuracy and robustness, as the performance of an ensemble depends not only on the power of the individuals in the system but also on the independence between them. Diversity measures can guide users in selecting classifiers and help explain the success of a multiple classifier system. A total diversity measure for multiple classifier systems is proposed in this paper. The measure computes the information entropy represented by a relation matrix. If the Q-statistic or the correlation coefficient is employed, the information quantity reflects the diversity of the individuals. We compare two kinds of rough decision forest based multiple classifier systems on 9 UCI data sets. GA based selective ensembles achieve a consistent improvement on all tasks compared with the ensembles of the best classifiers. Correspondingly, we find that the diversity of GAS, measured by the proposed entropy based on the Q-statistic and the correlation coefficient, is consistently greater than that of the top-classifier ensembles, which shows that the proposed entropy can be used to explain the advantage of GA based ensembles.
References
1. Ghosh J.: Multiclassifier systems: Back to the future. Multiple classifier systems. Lecture notes in computer science, Vol. 2364. Springer-Verlag, Berlin Heidelberg (2002) 1-15
2. Zhou Z., Wu J., Tang W.: Ensembling neural networks: Many could be better than all. Artificial intelligence 137 (2002) 239-263
3. Ho Tin Kam: Random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (1998) 8, 832-844
4. Czyz J., Kittler J., Vandendorpe L.: Multiple classifier combination for face-based identity verification. Pattern recognition. 37 (2004) 7: 1459-1469
5. Kuncheva L. I.: Diversity in multiple classifier systems. Information fusion. 6 (2005) 3-4
6. Kuncheva L. I. et al.: An experimental study on diversity for bagging and boosting with linear classifiers. Information fusion. 3 (2002) 245-258
7. Hu Qinghua, Yu Daren: Entropies of fuzzy indiscernibility relation and its operations. International journal of uncertainty, fuzziness and knowledge based systems. 12 (2004) 575-589
8. Hu Qinghua, Yu Daren, Wang Mingyang: Constructing rough decision forests. The tenth conference on rough sets, fuzzy sets, data mining and granular computing. 2005
A Successive Design Method of Rough Controller Using Extra Excitation Geng Wang, Jun Zhao, and Jixin Qian Institute of Systems Engineering, Zhejiang University, Hangzhou, 310027, China [email protected]
Abstract. This paper presents an efficient design method for improving the control performance of a rough controller. Because the input-output data from historical process operation may not be sufficiently informative, extra testing signals are used to excite the process and acquire enough data to reflect the control laws of the operator or the existing controller. Using data from successive excitation tests or from excellent operation by operators, the rules can be updated and enriched, which helps to improve the performance of the rough controller. The effectiveness of the proposed method is demonstrated through two simulation examples emulating PID control and Bang-Bang control, respectively.
1 Introduction
In industrial processes, it is usually very difficult to obtain quantitative models of a complex process. In such cases it is necessary to observe the operation of experts or experienced operators and discover the rules governing their control actions. Rough set theory [1],[2],[3] provides a methodology for generating rules from process data, which makes it possible to set up a decision-making utility that approximates the operator’s knowledge about how to control the system. Such a processor of decision rules is referred to as a “rough controller” [4]. The rough controller has been applied to industrial control successfully [5-11]. However, the result of rough control is coarse. One reason is that the rules extracted from historical operating data are usually not sufficient: they may not provide full knowledge for controlling the outputs, so the performance of the rough controller may be poor. An improved approach to designing a rough controller is proposed in this paper. First, extra testing signals are used to excite the system to acquire sufficient data reflecting the strategy of operators or experts, which is similar to model identification; next, rough sets are used to extract rules from the collected data; then the rules are updated and enriched using data from successive excitation tests or excellent operations by operators; finally, the rough controller is designed based on the rules. This method is applied to emulate two control schemes, a PID control and a Bang-Bang control, both with good control performance.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1266 – 1270, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Design Method of Rough Controller
2.1 Data Acquirement and Excitation
The proposed method to acquire data is shown in Fig. 1. Extra testing signals are imposed to excite the process. The signals can be pseudo-random signals or step signals. Both the control signals and the output signals are recorded.
[Figure: block diagram with labels Testing signals, Plant operators, Control signals, Process, Output signals]
Fig. 1. Data collection system
The testing signals should ensure that the system is excited sufficiently, so that the acquired data cover the whole input-output space and completely reflect the actions of the operator. For example, in the PID controller example, the error between set point and output signal and the change in error are chosen as condition attributes. A pseudo-random signal is used as the testing signal to ensure sufficient magnitude and frequency content. In the Bang-Bang controller example, a multi-step signal with different magnitudes is used, so that the whole input-output space is covered and the control laws of Bang-Bang control can be reproduced exactly.
2.2 Rough Set Analysis
The goal of the analysis stage is to generate rules. The procedure is as follows:
1. Select condition attributes and decision attributes, and discretize these attributes manually.
2. Check and remove redundant data.
3. Check contradictory data and process them with the weighted average method.
4. Generate minimal decision rules from the data.
2.3 Rough Rules Update and Complement
The rules derived from a single test may not be sufficient, so one or more additional tests need to be performed, and operators’ excellent operations can also be referenced. These tests can use the same type of testing signals, but with different magnitude or frequency. Data acquired from these tests or from operators’ excellent operations are used to generate new rules, and these rules are used to update or enrich the original knowledge base successively. The method proceeds through the following operations. Let U denote the rule set in the original knowledge base and let x’ be a new rule extracted from a new test.
1. If ∀ x ∈ U, f(x’, ci) ≠ f(x, ci), then x’ is a new rule and is appended to set U.
2. If ∃ x ∈ U, f(x’, ci) = f(x, ci) and f(x’, dj) = f(x, dj), then x’ is ignored.
3. If ∃ x ∈ U, f(x’, ci) = f(x, ci) and f(x’, dj) ≠ f(x, dj), then x’ is processed by the weighted average method together with these rules x, as follows:

$$V_d = \frac{\sum_{i=1}^{k} V_{d_i} n_i}{\sum_{i=1}^{k} n_i} \qquad (1)$$
where $k$ is the number of the decisions, $V_{d_i}$ the value of the $i$-th decision, $n_i$ the number of its corresponding objects, and $V_d$ the final decision. Take the PID controller example: 16 tests in total are performed successively. The numbers of rules after each update are shown in Table 1. It can be seen that the rules obtained from the first several tests alone are not sufficient, and that the rule number increases with successive tests. Testing can be terminated when the rule number no longer changes.

Table 1. The number of the rules after each test

Test         1    2    3    4    5    6    7    8
Rule Number  73   97   106  110  110  112  112  112
Test         9    10   11   12   13   14   15   16
Rule Number  113  113  114  115  115  115  115  115
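The three update cases of Section 2.3 and the weighted average of Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `update_rule_base`, the dictionary layout, and the idea of storing a support count next to each decision value are all hypothetical choices made for the example.

```python
def update_rule_base(rule_base, conditions, decision, support=1):
    """Merge one new rule into rule_base and report which case applied.

    rule_base maps a tuple of discretized condition values to a
    [decision_value, support] pair (hypothetical layout).
    """
    if conditions not in rule_base:
        # Case 1: no stored rule has these condition values.
        rule_base[conditions] = [decision, support]
        return "appended"
    old_decision, old_support = rule_base[conditions]
    if old_decision == decision:
        # Case 2: same conditions and same decision, nothing to add.
        return "ignored"
    # Case 3: same conditions, different decision; weighted average of Eq. (1).
    total = old_support + support
    merged = (old_decision * old_support + decision * support) / total
    rule_base[conditions] = [merged, total]
    return "averaged"


rules = {}
update_rule_base(rules, (1, 2), 4.0, support=3)  # new rule
update_rule_base(rules, (1, 2), 8.0, support=1)  # contradicts it, so merge
```

Repeating the call for every rule extracted from a new test, and stopping once a whole test leaves the rule count unchanged, would mirror the termination criterion described for Table 1.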
2.4 Rough Controller Design Based on Rules
The basic structure of the rough controller [4] is shown in Fig. 2. It consists of the following parts: rough A/D converter, inference engine, and knowledge base. The knowledge base is the central part and contains the rules derived from test data by rough set analysis.
[Figure: block diagram with labels Forward feed signals, Knowledge base, Rough A/D converter, Input signals, Feedback signals, Output signals, Inference engine, Process, Disturbing signals]
Fig. 2. Rough control system
3 Simulation
Two examples, emulating a PID controller and a Bang-Bang controller respectively, are used to validate the proposed method. In the examples, the PID controller and the Bang-Bang controller play the role of the plant operator.
Fig. 3. Comparison between rough controller and PID controller with step input from 0 to 0.25 (Left: output, Right: control move)
In the PID controller example, the rough controller is expected to respond to any input or disturbance distributed in [-0.3, 0.3] like the PID controller. Fig. 3 shows the responses of the rough controller and the PID controller when tracking a step input signal from 0 to 0.25. The control move of the rough controller differs from that of the PID controller, but the control performances of the two controllers are very similar.
Fig. 4. Comparison between Bang-Bang controller and rough controller with step input from 0 to 2.9 (Left: output, Right: control move)
Bang-Bang control is a special type of control in which the control move is only allowed to take discrete values, which is very similar to operators’ control actions. In the Bang-Bang controller example, the rough controller is expected to respond to any input or disturbance distributed in [0, 5] like the Bang-Bang controller. The responses of the rough controller and the Bang-Bang controller to a step input from 0 to 2.9 are shown in Fig. 4. The control scheme of the rough controller (e.g. two switches in time) is the same as that of the Bang-Bang controller. The small steady-state error compared with the Bang-Bang controller is due only to the discretization of the signal magnitude from 2.9 to 3. When the magnitude of the input signal is the same as that of the exciting signal used in the testing procedure, the response of the rough controller is identical to that of the Bang-Bang controller.
4 Conclusions A method of designing a rough controller based on the rough set theory is presented in this paper. The method provides a new idea of a control system capable of emulating the human operator’s decision process.
Acknowledgement
The authors gratefully acknowledge financial support from the China National Key Basic Research and Development Program under Grant 2002CB312200.
References
1. Pawlak, Z.: Rough sets. International Journal of Information and Computer Science, 11 (1982)
2. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht, The Netherlands (1991)
3. Mrózek, A.: Rough Sets and Dependency Analysis among Attributes in Computer Implementations of Expert’s Inference Models. International Journal of Man-Machine Studies, 30 (1989)
4. Mrózek, A., Planka, L., Kedziera, J.: The methodology of rough controller synthesis. Proceeding of the Fifth IEEE International Conference on Fuzzy Systems, (1996) 1135-1139
5. Mrózek, A.: Rough sets in computer implementation of rule-based control of industrial process. Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers, (1992) 19-31
6. Mrózek, A., Planka, L.: Rough sets in industrial applications. Rough Sets in Knowledge Discovery 2: Applications, Case Studies and Software Systems, Physica-Verlag, Heidelberg, (1998) 214-237
7. Planka, L., Mrózek, A.: Rule-based stabilization of the inverted pendulum. Computational Intelligence, 11 (1995) 348-356
8. Lambert-Torres, G.: Application of rough sets in power system control center data mining. Power Engineering Society Winter Meeting, 1 (2002) 627-631
9. Munakata, T.: Rough control: A perspective. Rough Sets and Data Mining: Analysis of Imprecise Data. Kluwer Academic Publishers, (1997) 77-88
10. Huang, J.J., Li, S.Y., Man, C.T.: A T-S type of rough fuzzy controller based on process input-output data. Proceedings of IEEE Conference on Decision and Control, 5 (2003) 4729-4734
11. Peters, J.F., Skowron, A., Suraj, Z.: An application of rough set methods in control design. Fundamenta Informaticae, 43 (2000) 269-290
A Soft Sensor Model Based on Rough Set Theory and Its Application in Estimation of Oxygen Concentration Xingsheng Gu and Dazhong Sun Research Institute of Automation, East China University of Science & Technology, Shanghai, 200237, P. R. China [email protected]
Abstract. Soft sensor modeling is currently attracting considerable research attention. In the process of establishing soft sensor models, how to select the secondary variables is still an unresolved question. In this paper, rough set theory is used to select the secondary variables from the initial sample data. The method is used to build a soft sensor model that estimates the oxygen concentration in a regeneration tower, and good results are obtained.
1 Introduction
In many chemical processes, owing to the limitations of measurement devices, it is often difficult to measure some important process variables [1] (e.g. product composition variables and product concentration variables), or the variables of interest can only be obtained with long measurement delays. But these variables are often used as feedback signals for quality control [2]. If they are not measured quickly and accurately, the quality of the product can be badly affected. In this case, a soft sensor can be used: a model is built between easily and frequently obtained variables (secondary variables) and the important process variables (primary variables), and the model output stands in for the values of the primary variables. Soft sensing is a newly developing technique based on inferential control. An important step of soft sensor modeling is the selection of secondary variables that have a functional relationship with the primary variable, but the optimal method for selecting secondary variables is still unresolved. Rough set theory (RST) was proposed by Pawlak as a new method of dealing with fuzzy and uncertain knowledge. In this paper, we apply rough set theory to the selection of secondary variables in order to propose a new approach to soft sensing.
2 Rough Set Theory [3]
An information system is $S = (U, A, V, f)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty and finite set of objects, $A = \{a_1, a_2, \ldots, a_n\}$ is a nonempty and finite set of attributes, $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain set of $a$, and $f : U \times A \to V$ is an information function such that
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1271 – 1276, 2005. © Springer-Verlag Berlin Heidelberg 2005
$f(x, a) \in V_a$ for all $a \in A$ and $x \in U$ [3, 4]. Let $P \subseteq A$; an indiscernibility relation $ind(P)$ is defined as:

$$ind(P) = \{(x, y) \in U \times U \mid \forall a \in P,\ f(x, a) = f(y, a)\} \qquad (1)$$
The indiscernibility relation $ind(P)$ is an equivalence relation on $U$. The set of all equivalence classes of the relation $ind(P)$ is expressed as

$$U / ind(P) = \{X_1, X_2, \ldots, X_m\} \qquad (2)$$

where $X_i$ ($i = 1, 2, \ldots, m$) is called the $i$-th equivalence class, with $X_i \subseteq U$, $X_i \neq \Phi$, $X_i \cap X_j = \Phi$ ($i \neq j$; $i, j = 1, 2, \ldots, m$), and $\bigcup_{i=1}^{m} X_i = U$ [7]. The family of equivalence classes is also called the classification of $U$. $K = (U, ind(P))$ is called a knowledge base.
Knowledge (attribute) reduction is one of the kernel parts of rough set theory. Some attributes may be redundant; a redundant attribute can be removed from the information system. Reduction and core are two basic concepts of knowledge reduction.
Definition 1: Suppose $R \subseteq A$ is a set of attributes and $r \in R$. The attribute $r$ is redundant if $ind(R) = ind(R - \{r\})$; otherwise $r$ is indispensable in $R$.
If every $r \in R$ is indispensable in $R$, then $R$ is independent; otherwise $R$ is dependent.
Definition 2: Suppose $P \subseteq R$. If $P$ is independent and $ind(P) = ind(R)$, then $P$ is called a reduction of $R$, expressed as $red(R)$. $R$ may have many reductions; the set of indispensable attributes of $R$ that are included in all the reductions is called the core of $R$, expressed as $core(R) = \bigcap red(R)$.
Definition 3: Suppose $P$ and $Q$ are two equivalence relations on $U$. $pos_P(Q)$ is called the positive region of $Q$ with respect to $P$ and can be expressed as:

$$pos_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$$

where $\underline{P}X$ is the $P$-lower approximation of $X$, i.e. the union of the equivalence classes of $P$ that are contained in $X$.
A decision table is a special information system $S = (U, A, V, f)$ where $A = C \cup D$, $C \cap D = \Phi$, $C$ is the condition attribute set, $D$ is the decision attribute set and $\Phi$ is the empty set. It is a two-dimensional table in which each row is an object and each column is an attribute. Suppose $c \in C$ is a condition attribute; the significance of $c$ with respect to $D$ is defined as:

$$\sigma_{CD}(c) = \frac{|pos_C(D)| - |pos_{C - \{c\}}(D)|}{|U|} \qquad (3)$$

where $|U|$ represents the cardinality of $U$.
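The definitions of the indiscernibility partition, the positive region, and the significance measure of Eq. (3) translate almost directly into code. The following Python sketch is illustrative only: the function names and the four-object toy table are hypothetical and are not taken from the paper's data.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes of ind(attrs): objects with equal values on attrs."""
    classes = defaultdict(set)
    for obj, row in rows.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def positive_region(rows, cond, dec):
    """Union of the ind(cond) classes that lie inside a single ind(dec) class."""
    decision_classes = partition(rows, dec)
    pos = set()
    for block in partition(rows, cond):
        if any(block <= d_class for d_class in decision_classes):
            pos |= block
    return pos


def significance(rows, cond, dec, c):
    """Eq. (3): drop in the size of the positive region when c is removed."""
    reduced = [a for a in cond if a != c]
    full = len(positive_region(rows, cond, dec))
    without_c = len(positive_region(rows, reduced, dec))
    return (full - without_c) / len(rows)


# Hypothetical toy decision table: the decision d simply copies c2.
TOY = {
    "x1": {"c1": 1, "c2": 1, "d": 1},
    "x2": {"c1": 1, "c2": 2, "d": 2},
    "x3": {"c1": 2, "c2": 1, "d": 1},
    "x4": {"c1": 2, "c2": 2, "d": 2},
}
SIG_C1 = significance(TOY, ["c1", "c2"], ["d"], "c1")
SIG_C2 = significance(TOY, ["c1", "c2"], ["d"], "c2")
```

In this toy table removing `c1` leaves the positive region untouched, so its significance is zero, whereas removing `c2` empties the positive region.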
3 Soft Sensor Modeling Method In this paper, we consider using the input and output sample data to build a soft sensor model for a MISO system based on the rough set theory. It includes three steps. 3.1 Discretization of Continuous Attribute Values
The input and output data of a real process are continuous. Before rough set theory is used to judge the significance of an input attribute with respect to the output attribute, the continuous attribute values must be transformed into discrete values expressed as 1, 2, …, n. There exist a number of discretization methods, such as the fuzzy c-means clustering method [5] and the equal-width-interval method. In this paper, we apply the equal-width-interval method to the input and output attributes. The equal-width-interval method is as follows: for an attribute $x \in [x_{min}, x_{max}]$, divide the range of $x$ into $n_i$ parts, with partition points $Y_j$, $j \in \{1, 2, \ldots, n_i\}$. If $|x - Y_i| = \min_j \{|x - Y_j|\}$, $i = 1, 2, \ldots, n_i$, the discrete value of $x$ is $i$. Substituting the discrete attribute values for the continuous attribute values yields the decision table [4].
3.2 Removing the Redundant Condition Attributes [3, 4]
The significance measure defined above is used to judge each input attribute with respect to the output attribute and to remove the redundant condition attributes. $C = \{C_1, C_2, \ldots, C_r\}$ ($r$ is the number of input attributes) is the set of condition attributes and $D = \{D_1, D_2, \ldots, D_m\}$ ($m$ is the number of output attributes) is the set of output attributes. For each input attribute, compute $\sigma_{CD}(C_i)$, $i = 1, 2, \ldots, r$. If $\sigma_{CD}(C_i)$ is large, the condition attribute $C_i$ is significant to $D$; otherwise $C_i$ is less significant to $D$. If $\sigma_{CD}(C_i) = 0$, $C_i$ is insignificant to $D$.
3.3 Building Soft Sensor Model
There are several ways to build soft sensor models, such as mechanism analysis, regression analysis, artificial neural networks, and fuzzy techniques. In this paper, we use the Elman network to build the soft sensor model. The Elman network is a dynamic recurrent neural network, as shown in figure 1 [6]. In an Elman network, in addition to the input, hidden and output units, there are also context units. The feedforward weights $W_{ih}$, $W_{ch}$ and $W_{ho}$ are modifiable. The recurrent weights $W_{hc}$ are fixed at a constant value of one. The output vector is $Y(k) \in R^m$ and the input vector is $u(k-1)$, a real-valued vector.
[Figure: network diagram with labels U(k-1), Wih, Who, Whc, Wch, X(k), Xc(k), Y(k), context]
Fig. 1. Elman network
$X(k) \in R^n$ is the hidden-layer output vector, and $X_c(k)$ is the context-layer output, which has the same dimension $n$ because it holds the previous hidden state. The relationship between the input and output can be represented as

$$X(k) = F(W_{ch} X_c(k) + W_{ih} u(k-1))$$
$$X_c(k) = X(k-1)$$
$$Y(k) = G(W_{ho} X(k))$$
$F$ and $G$ are the activation functions of the hidden units and output units, respectively, and the dynamic back-propagation (DBP) learning rule is used to train the Elman network.
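The three network equations describe a simple recurrence that can be sketched without any library. The following Python sketch covers only the forward pass and rests on assumptions that the text does not state: $F$ is taken to be tanh, $G$ the identity, the initial context is zero, and the names `elman_step` and `elman_run` are hypothetical. Training by dynamic back-propagation is not shown.

```python
import math


def matvec(W, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def elman_step(W_ih, W_ch, W_ho, u_prev, x_context):
    """One time step: returns the hidden state X(k) and the output Y(k)."""
    pre = [c + i for c, i in zip(matvec(W_ch, x_context), matvec(W_ih, u_prev))]
    x = [math.tanh(p) for p in pre]   # F = tanh (assumed)
    y = matvec(W_ho, x)               # G = identity (assumed)
    return x, y


def elman_run(W_ih, W_ch, W_ho, inputs):
    """Feed u(0), u(1), ... through the network and collect Y(1), Y(2), ..."""
    x_context = [0.0] * len(W_ch)     # assumed zero initial context
    outputs = []
    for u_prev in inputs:
        x, y = elman_step(W_ih, W_ch, W_ho, u_prev, x_context)
        outputs.append(y)
        x_context = x                 # X_c(k + 1) = X(k)
    return outputs


# Hypothetical weights: 1 input, 2 hidden units, 1 output.
W_IH = [[1.0], [0.0]]
W_CH = [[0.0, 0.0], [1.0, 0.0]]
W_HO = [[1.0, 1.0]]
OUT = elman_run(W_IH, W_CH, W_HO, [[0.5], [0.0]])
```

With these weights the second output depends on the first input only through the context units, which is the memory effect that distinguishes the Elman network from a static feedforward model.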
4 Rough Set Based Soft Sensor Model for Oxygen Concentration
In this section, we use rough set theory to select the secondary variables and then build a soft sensor for the oxygen concentration in a regeneration tower. Oxygen concentration is an important variable of the regeneration tower. According to the analysis of the technological mechanisms, oxygen concentration is related to 15 variables, as shown in figure 2. Variable 16 is the oxygen concentration. Variables 1 to 4 are gas fluxes. Variables 5 and 6 are gas pressures. Variables 7 to 13 are gas temperatures. Variable 14 is the flux of cycling catalyst. Variable 15 is the coking of the catalyst. But these 15 variables are not equally significant to the oxygen concentration, and some of them may be removable from the initial data set. Rough set theory is a strong tool for this task. In this application, $C = \{C_1, C_2, \ldots, C_{15}\}$ and $D = \{D_1\}$. 200 samples are used to judge the significance of each input variable to the oxygen concentration. Using the above method, we obtain the results shown in Table 2.
Fig. 2. Regeneration Tower
Table 2. Significance of condition attributes to decision attribute

σCD(C1)   σCD(C2)   σCD(C3)   σCD(C4)   σCD(C5)
0         0         0.003     0         0.01
σCD(C6)   σCD(C7)   σCD(C8)   σCD(C9)   σCD(C10)
0         0         0         0         0
σCD(C11)  σCD(C12)  σCD(C13)  σCD(C14)  σCD(C15)
0.01      0         0.005     0         0.005
Fig. 3. The training result of the Elman network
Fig. 4. The generalization result of the Elman network
Table 2 shows that C3, C5, C11, C13 and C15 are significant to D. These five variables are used as the input variables to train the Elman network. The structure of the network is 6-13-1. The training and generalization results are shown in Figure 3 and Figure 4. The root mean square errors for the training and test samples are 0.0184 and 0.0197 respectively, which shows that the modeling accuracy is high.
5 Conclusions
In this paper, we used rough set theory to judge the significance of the initially selected secondary variables with respect to the primary variables in soft sensing. On the basis of rough set theory, less significant or insignificant variables can be removed. The method is applied to the estimation of oxygen concentration in a regeneration tower, and satisfactory results are obtained. The simulation results show the effectiveness of the proposed method.
References
1. Zhong, W., Yu, J.S.: MIMO Soft Sensors for Estimating Product Quality with On-line Correction. Chemical Engineering Research & Design, Transaction of the Institute of Chemical Engineers. 2000, 78(A): 612-620
2. Tham, M.T., Morris, A.J. et al.: Soft-sensing: a Solution to the Problem of Measurement Delays. Chem Eng Res Des, 1989, 67: 547-554.
3. Zhang, W.X., Wu, W.Z., et al.: Rough Set Theory and Methods. Beijing: Science Press, 2001.
4. Luo, J.X., and Shao, H.H.: Selecting Secondary Measurements for Soft Sensor Modeling Using Rough Set Theory. Proceedings of the 4th World Congress on Intelligent Control and Automation, Press of East China University of Science and Technology, 2002: 415-419.
5. Li, M., and Zhang, H.G.: Research on the Method of Neural Network Modeling Based on Rough Sets Theory. Acta Automatica Sinica, 2002, 28(1): 27-33.
6. Pham, D.T., and Liu, X.: Training of Elman Networks and Dynamic System Modeling. International Journal of Systems Science, 1996, 27(2): 221-226.
A Divide-and-Conquer Discretization Algorithm Fan Min, Lijun Xie, Qihe Liu, and Hongbin Cai College of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610051, China {minfan, xielj, qiheliu, caihb}@uestc.edu.cn
Abstract. The problem of real value attribute discretization can be converted into the reduct problem in the Rough Set Theory, which is NP-hard and can be solved by some heuristic algorithms. In this paper we show that the straightforward conversion is not scalable and propose a divide-and-conquer algorithm. This algorithm is fully scalable and can reduce the time complexity dramatically, especially when integrated with the tournament discretization algorithm. Parallel versions of this algorithm can be easily written, and their complexity depends on the number of objects in each subtable rather than the number of objects in the initial decision table. There is a tradeoff between the time complexity and the quality of the discretization scheme obtained, and this tradeoff can be made by adjusting the number of subtables, or equivalently, the number of objects in each subtable. Experimental results confirm our analysis and indicate appropriate parameter settings.
1 Introduction
The majority of machine learning algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones [1]. The discretization step determines how coarsely we want to view the world [2]. The problem of real value attribute discretization can be converted into the reduct problem in the Rough Set Theory [3][5][6]. But some existing algorithms are not scalable in practice [2][6] when the decision table has many continuous attributes and/or many possible attribute values. Recently we have proposed an algorithm called the tournament discretization algorithm for situations where the number of attributes is large. In this paper we propose a divide-and-conquer algorithm that can dramatically reduce the time complexity and is applicable especially for situations where the number of objects in the decision table is large. By integrating these two algorithms, we can essentially cope with decision tables of any size. A parallel version of this algorithm can be easily written and run to further reduce the time complexity. There is a tradeoff between the time complexity and the quality of the discretization scheme obtained, and this tradeoff can be made by adjusting the number of subtables, or equivalently, the number of objects in each subtable. Experimental results confirm our analysis and indicate appropriate parameter settings.
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1277–1286, 2005. © Springer-Verlag Berlin Heidelberg 2005
The rest of this paper is organized as follows: in Section 2 we enumerate relevant concepts about decision tables, discretization schemes and discernibility. In Section 3 we analyze existing rough set approaches for discretization and point out their scalability problem. In Section 4 we present and analyze our divide-and-conquer discretization algorithm. Some experimental results are given in Section 5. Finally, we conclude and point out further research work in Section 6.
2 Preliminaries
In this section we enumerate relevant concepts after Nguyen [3] and Komorowski [2], and propose the definition of discernibility for attributes and cuts.
2.1 Decision Tables
Formally, a decision table is a triple S = (U, A, {d}) where d ∉ A is called the decision attribute and elements of A are called conditional attributes. a : U → Va for any a ∈ A ∪ {d}, where Va is the set of all values of a, called the domain of a. For the sake of simplicity, throughout this paper we assume that U = {x1, . . . , x|U|}, A = {a1, . . . , a|A|} and d : U → {1, . . . , r(d)}. Table 1 lists a decision table.

Table 1. A decision table S

U    a1   a2   a3   d
x1   1.1  0.2  0.4  1
x2   1.3  0.4  0.2  1
x3   1.5  0.4  0.5  1
x4   1.5  0.2  0.2  2
x5   1.7  0.4  0.3  2

2.2 Discretization Schemes
We assume $V_a = [l_a, r_a) \subset \mathbb{R}$ to be a real interval for any $a \in A$. Any pair $(a, c)$ where $a \in A$ and $c \in \mathbb{R}$ is called a cut on $V_a$. Let $P_a$ be a partition of $V_a$ (for $a \in A$) into subintervals $P_a = \{[c^a_0, c^a_1), [c^a_1, c^a_2), \ldots, [c^a_{k_a}, c^a_{k_a+1})\}$ where $l_a = c^a_0 < c^a_1 < \ldots < c^a_{k_a} < c^a_{k_a+1} = r_a$ and $V_a = [c^a_0, c^a_1) \cup [c^a_1, c^a_2) \cup \ldots \cup [c^a_{k_a}, c^a_{k_a+1})$. Hence any partition $P_a$ is uniquely defined by, and often identified with, the set of cuts $\{(a, c^a_1), (a, c^a_2), \ldots, (a, c^a_{k_a})\} \subset A \times \mathbb{R}$. Any set of cuts $P = \bigcup_{a \in A} P_a$ defines from $S = (U, A, \{d\})$ a new decision table $S^P = (U, A^P, \{d\})$ called the $P$-discretization of $S$, where $A^P = \{a^P : a^P(x) = i \Leftrightarrow a(x) \in [c^a_i, c^a_{i+1})$ for $x \in U$ and $i \in \{0, \ldots, k_a\}\}$. $P$ is called a discretization scheme of $S$. While selecting cut points, we only consider midpoints of adjacent attribute values; e.g., in the decision table listed in Table 1, $(a_1, 1.2)$ is a possible cut while $(a_1, 1.25)$ is not.
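The midpoint rule for candidate cuts and the interval index used by the P-discretization can be illustrated on the decision table of Table 1. The Python sketch below is only an illustration: the names `candidate_cuts` and `discretize` are hypothetical, and the rounding to four decimals is an implementation convenience rather than part of the definition.

```python
from bisect import bisect_right

# The decision table S of Table 1: three continuous attributes, decision d.
TABLE_1 = [
    {"a1": 1.1, "a2": 0.2, "a3": 0.4, "d": 1},
    {"a1": 1.3, "a2": 0.4, "a3": 0.2, "d": 1},
    {"a1": 1.5, "a2": 0.4, "a3": 0.5, "d": 1},
    {"a1": 1.5, "a2": 0.2, "a3": 0.2, "d": 2},
    {"a1": 1.7, "a2": 0.4, "a3": 0.3, "d": 2},
]


def candidate_cuts(table, attributes):
    """All cuts (a, c) with c the midpoint of two adjacent distinct values of a."""
    cuts = []
    for a in attributes:
        values = sorted({row[a] for row in table})
        for low, high in zip(values, values[1:]):
            # Rounding only hides floating-point noise in the midpoint.
            cuts.append((a, round((low + high) / 2, 4)))
    return cuts


def discretize(value, cut_points):
    """Index i of the interval [c_i, c_{i+1}) containing value."""
    return bisect_right(sorted(cut_points), value)


ALL_CUTS = candidate_cuts(TABLE_1, ["a1", "a2", "a3"])
```

For Table 1 this yields seven candidate cuts in total: three on a1, one on a2, and three on a3.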
2.3 Discernibility
The ability to discern between perceived objects is important for constructing many entities like reducts, decision rules or decision algorithms [8]. Discernibility is often addressed through the discernibility matrix and the discernibility function. In this subsection we propose definitions of discernibility for both attributes and cuts from another point of view to facilitate further analysis. Given a decision table S = (U, A, {d}), the discernibility of any attribute a ∈ A is defined as the set of object pairs discerned by a, formally given by

$$DP(a) = \{(x_i, x_j) \in U \times U \mid d(x_i) \neq d(x_j),\ i < j,\ a(x_i) \neq a(x_j)\}, \qquad (1)$$

where i < j is required to ensure that the same pair does not appear in the same set twice. The discernibility of an attribute set B ⊆ A is the union of the discernibility of each attribute, namely,

$$DP(B) = \bigcup_{a \in B} DP(a). \qquad (2)$$

Similarly, the discernibility of a cut (a, c) is defined by the set of object pairs it can discern,

$$DP((a, c)) = \{(x_i, x_j) \in U \times U \mid d(x_i) \neq d(x_j),\ i < j,\ a(x_i) < c \le a(x_j) \text{ or } a(x_j) < c \le a(x_i)\}. \qquad (3)$$

The discernibility of a cut set P is the union of the discernibility of each cut, namely,

$$DP(P) = \bigcup_{(a, c) \in P} DP((a, c)). \qquad (4)$$
It is worth noting that maintaining discernibility is the most strict requirement used in the rough set theory because it implies no loss of information [11].
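The discernibility sets of an attribute and of a cut can be computed directly from their definitions. The sketch below does so for the decision table of Table 1; the function names are hypothetical, and object pairs are reported by their 1-based indices (i, j) with i < j.

```python
from itertools import combinations

# The decision table S of Table 1.
TABLE_1 = [
    {"a1": 1.1, "a2": 0.2, "a3": 0.4, "d": 1},
    {"a1": 1.3, "a2": 0.4, "a3": 0.2, "d": 1},
    {"a1": 1.5, "a2": 0.4, "a3": 0.5, "d": 1},
    {"a1": 1.5, "a2": 0.2, "a3": 0.2, "d": 2},
    {"a1": 1.7, "a2": 0.4, "a3": 0.3, "d": 2},
]


def differently_decided_pairs(table):
    """Yield (i, xi, j, xj) for all i < j whose decision values differ."""
    indexed = list(enumerate(table, start=1))
    for (i, xi), (j, xj) in combinations(indexed, 2):
        if xi["d"] != xj["d"]:
            yield i, xi, j, xj


def dp_attribute(table, a):
    """Pairs discerned by attribute a: different decision and different a-value."""
    return {(i, j) for i, xi, j, xj in differently_decided_pairs(table)
            if xi[a] != xj[a]}


def dp_cut(table, a, c):
    """Pairs discerned by cut (a, c): the cut point lies between the two values."""
    return {(i, j) for i, xi, j, xj in differently_decided_pairs(table)
            if xi[a] < c <= xj[a] or xj[a] < c <= xi[a]}


DP_A1 = dp_attribute(TABLE_1, "a1")
DP_CUT = dp_cut(TABLE_1, "a1", 1.4)
```

Here the attribute a1 cannot discern x3 from x4, because both have the value 1.5, and the single cut (a1, 1.4) additionally fails to separate x3 from x5.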
3 Related Works and Their Limitations
3.1 The Problem Conversion
In this subsection we briefly explain the main idea of the rough set approach for discretization [3], and we point out that this conversion is independent of the definition of reduction and the reduction algorithm employed. Given a decision table S = (U, A, {d}), denote the cut set containing all possible cuts by C(A). Construct a new decision table S(C(A)) = (U, C(A), {d}) called the C(A)-cut attribute table of S, where ∀ct = (a, c) ∈ C(A), ct : U → {0, 1}, and

$$ct(x) = \begin{cases} 0, & \text{if } a(x) < c; \\ 1, & \text{otherwise.} \end{cases} \qquad (5)$$

Table 2 lists the C(A)-cut attribute table, denoted by S(C(A)), of the decision table S listed in Table 1. We have proven the following two theorems [6]:
Table 2. S(C(A))

U    (a1, 1.2)  (a1, 1.4)  (a1, 1.6)  (a2, 0.3)  (a3, 0.25)  (a3, 0.35)  (a3, 0.45)  d
x1   0          0          0          0          1           1           0           1
x2   1          0          0          1          0           0           0           1
x3   1          1          0          1          1           1           1           1
x4   1          1          0          0          0           0           0           2
x5   1          1          1          1          1           0           0           2
Theorem 1. DP(C(A)) = DP(A). (6)
Theorem 2. For any cut set P, DP(A^P) = DP(P). (7)
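The construction of the cut attribute table by Eq. (5) and the statement of Theorem 1 can both be checked numerically on the running example. The sketch below rebuilds the rows of S(C(A)) from Table 1 and compares the object pairs discerned before and after the conversion; it is a check on one small table with hypothetical helper names, not a proof.

```python
from itertools import combinations

# Table 1 as (a1, a2, a3) value tuples plus the decision column.
OBJECTS = [(1.1, 0.2, 0.4), (1.3, 0.4, 0.2), (1.5, 0.4, 0.5),
           (1.5, 0.2, 0.2), (1.7, 0.4, 0.3)]
DECISIONS = [1, 1, 1, 2, 2]

# All candidate cuts C(A) as (attribute index, cut point) pairs.
CUTS = [(0, 1.2), (0, 1.4), (0, 1.6), (1, 0.3), (2, 0.25), (2, 0.35), (2, 0.45)]


def cut_row(obj, cuts):
    """Eq. (5): one 0/1 value per cut, 0 if a(x) < c and 1 otherwise."""
    return tuple(0 if obj[a] < c else 1 for a, c in cuts)


def discerned(rows, decisions):
    """Pairs (i, j), i < j, with different decisions that differ in some column."""
    return {(i + 1, j + 1)
            for i, j in combinations(range(len(rows)), 2)
            if decisions[i] != decisions[j]
            and any(p != q for p, q in zip(rows[i], rows[j]))}


CUT_TABLE = [cut_row(obj, CUTS) for obj in OBJECTS]
SAME_DISCERNIBILITY = discerned(CUT_TABLE, DECISIONS) == discerned(OBJECTS, DECISIONS)
```

The rows of `CUT_TABLE` reproduce Table 2, and on this example both tables discern the same six object pairs.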
Therefore the discretization problem (constructing A^P from A) is converted into the reduction problem (selecting P from C(A)). Nguyen [3][5] integrated the conversion process with the reduction process and employed the Boolean approach. According to the theorems above, this conversion maintains the discernibility of decision tables, hence it is independent of the definition of reduction and of the reduction algorithm employed. For example, if the definition of reduction requires that the positive region is maintained, i.e., the decision tables (U, C(A), {d}) and (U, P, {d}) have the same positive region, then S and S^P would have the same positive region; if the definition of reduction requires that the generalized decision is maintained, i.e., ∂_{C(A)} = ∂_P, then ∂_A = ∂_{A^P}.
3.2 The Scalability Problem of Existing Approaches
Although the above-mentioned approach seems perfect because the discretization problem has been converted to another problem which is solved by many efficient algorithms, using it directly is not scalable in practice when the decision table has many continuous attributes and/or many possible attribute values. The decision table S(C(A)) = (U, C(A), {d}) has |U| rows and |C(A)| = O(|A||U|) columns, which may be very large. For example, in the data table WDBC of the UCI library [10] (stored in the file uci/breast-cancer-wisconsin/wdbc.data.txt), there are 569 objects and 31 continuous attributes, each attribute having 300 - 569 different attribute values, and S(C(A)) should contain 15,671 columns, which is simply not supported by the ORACLE system running on our computer. Nguyen [4] also proposed a very efficient algorithm based on the Boolean approach. But from our point of view, this approach is not flexible because only object pairs discerned by given cuts are used as heuristic information. Moreover, this approach may not be applicable to mixed-mode data [7]. In previous work [6] we developed the tournament discretization algorithm. This algorithm proceeds in rounds; during round i the discretization schemes
of decision tables containing $2^i$ conditional attributes (possibly fewer for the last one) are computed on the basis of cut sets constructed in round $i - 1$, resulting in $\lceil |A| / 2^i \rceil$ cut sets. In this process the number of candidate cuts of the current decision table can be kept relatively low. Using this algorithm we have computed a discretization scheme of WDBC, but it took our computer 10,570 seconds for the plain version and 1,886 seconds for the parallel version. Both are rather long. Moreover, this algorithm may be invalid when the number of possible cuts of any attribute exceeds 1000, namely, the upper bound on the number of columns supported by the ORACLE system.
4 A Divide-and-Conquer Discretization Algorithm
We can apply the divide-and-conquer approach to the other dimension of the decision table, namely, the number of objects. In this section we first propose the algorithm structure, then analyze the parameter setting.
4.1 The Algorithm Structure
Our discretization algorithm is listed in Fig. 1.

DivideAndConquerDiscretization (S = (U, A, {d}))
{input: A decision table S.}
{output: A discretization scheme P.}
Step 1. divide S into K subtables;
Step 2. compute discretization schemes of subtables;
Step 3. compute the discretization scheme P of S based on discretization schemes of subtables;

Fig. 1. A Divide-and-Conquer Discretization Algorithm
Generally, in Step 1 we require the family of subtables to be a partition of S. Namely, let the set of subtables be $\{S_1, S_2, \ldots, S_K\}$ where $S_i = (U_i, A, \{d\})$ for all $i \in \{1, 2, \ldots, K\}$, $\bigcup_{i=1}^{K} U_i = U$, and $U_i \cap U_j = \emptyset$ for all $i, j \in \{1, 2, \ldots, K\}$ with $i \neq j$. Moreover, we require all subtables except the last one to have the same size, namely, $|U_1| = |U_2| = \ldots = |U_{K-1}| = \lceil |U|/K \rceil$. We have these requirements because:
1. Our algorithm is intended for data tables with a large number of rows, so subtables sharing the same row are not preferred;
2. Loss of rows, i.e., some rows not being included in any subtable, may incur too much loss of information. However, for very large data tables we may relax this requirement;
3. Subtables of almost the same size are preferred from both statistical and implementation points of view.
Moreover, constructing subtables from adjacent rows, e.g., $U_1 = \{u_1, u_2, \ldots, u_{\lceil |U|/K \rceil}\}$, is discouraged, because many data tables such as IRIS are sorted by decision attribute value. We propose the following scheme to meet these requirements. First, select a prime number p such that $|U| \,\%\, p \neq 0$. Then generate a set of numbers $N = \{n_1, n_2, \ldots, n_{|U|}\}$ where $n_j = (j \cdot p) \,\%\, |U| + 1$ for all $j \in \{1, 2, \ldots, |U|\}$. Because $|U| \,\%\, p \neq 0$ and p is prime, p and |U| are coprime, so it is easy to prove that $N = \{1, 2, \ldots, |U|\}$. Finally, we let

$U_i = \{u_{n_{(i-1)\lceil |U|/K \rceil + 1}}, u_{n_{(i-1)\lceil |U|/K \rceil + 2}}, \ldots, u_{n_{i \lceil |U|/K \rceil}}\}$

for all $i \in \{1, 2, \ldots, K-1\}$ and

$U_K = \{u_{n_{(K-1)\lceil |U|/K \rceil + 1}}, u_{n_{(K-1)\lceil |U|/K \rceil + 2}}, \ldots, u_{n_{|U|}}\}$.
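The index scheme just defined can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the `start` parameter are assumptions introduced here. With j running from 1 to |U| as in the formula, the sequence for |U| = 8, p = 3 begins at 4; the ordering used in the |U| = 8, K = 2, p = 3 example in the text is obtained when j starts from 0, which visits the same indices in a rotated order.

```python
import math


def permuted_indices(n, p, start=1):
    """Return n_j = (j * p) % n + 1 for n consecutive values of j.

    p is assumed to be a prime that does not divide n; the gcd check below
    is the property that actually makes the result a permutation of 1..n.
    """
    if math.gcd(n, p) != 1:
        raise ValueError("p must be coprime with the number of objects")
    return [(j * p) % n + 1 for j in range(start, start + n)]


def split_indices(order, k):
    """Cut the permuted order into k blocks; all but the last have ceil(n/k) items.

    Assumes k is small relative to len(order), so the last block is not empty.
    """
    size = math.ceil(len(order) / k)
    blocks = [order[i * size:(i + 1) * size] for i in range(k - 1)]
    blocks.append(order[(k - 1) * size:])
    return blocks


if __name__ == "__main__":
    print(permuted_indices(8, 3))            # j = 1..8, as in the formula
    print(permuted_indices(8, 3, start=0))   # j = 0..7, ordering of the text's example
    print(split_indices(permuted_indices(8, 3, start=0), 2))
```

Because each block takes consecutive entries of the permuted order, its objects are spread across the original row order rather than taken from adjacent rows.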
For example, if |U| = 8, K = 2 and p = 3, then U1 = {u1, u4, u7, u2}, U2 = {u5, u8, u3, u6}. It is easy to see that the objects of any subtable are spread evenly over S with no particular bias, and that this scheme of constructing subtables can easily break any bias of S by choosing an appropriate p (relatively large primes such as 73 or 97 are preferred). In Step 2 any discretization algorithm could be employed; in this paper, to keep a unified form, only Nguyen's algorithm [3] (employed while |A| ≤ 8) and our tournament discretization algorithm [6] (employed while |A| > 8) are considered. In Step 3 we use the same idea mentioned in Subsection 3.1. Instead of using C(A), we use the cuts selected from all subtables, i.e., $\bigcup_{i=1}^{K} P_i$ where $P_i$ is the discretization scheme of $S_i$.

4.2 Time Complexity Analysis
The time required for Step 1 is negligible compared with that of Step 2, and the time required for Step 3 is also negligible if K is not very large. It is worth noting that Step 2 can easily be distributed over K computers/processors and run in parallel. To make the relationship with the respective decision table explicit, we write P(S) instead of P for the discretization scheme of S, and $P(S_1)$ for the discretization scheme of $S_1$. Obviously, the time complexity of computing the discretization scheme of any subtable is equal to that of $S_1$. The time complexity of the most efficient reduction algorithm is $O(M^2 N \log N)$ where N is the number of objects and M is the number of attributes [9]. We have developed an entropy-based algorithm with the same complexity. In the following we assume that this algorithm is employed and give the time complexity of different algorithms or combinations of algorithms.

If we apply Nguyen's algorithm [3] directly to S, because S(C(A)) has O(|A||U|) cut attributes, the time complexity of computing P(S) is

$O(|A|^2 |U|^3 \log |U|)$.    (8)

If we apply the tournament discretization algorithm [6] directly to S, the time complexity of computing P(S) is

$O(|A| |U|^3 \log |U|)$,    (9)

which is reduced to

$O(\log |A| \cdot |U|^3 \log |U|)$    (10)

if the parallel mechanism is employed and there are $\lceil |A|/2 \rceil$ computers/processors to use [6].

If we apply Nguyen's algorithm [3] to subtables, the time complexity of computing $P(S_1)$ is

$O(|A|^2 (|U|/K)^3 \log(|U|/K))$,    (11)

which is also the time complexity of the parallel version of computing P(S) if there are K computers/processors to use. The time complexity of the plain version of computing P(S) is

$O(K |A|^2 (|U|/K)^3 \log(|U|/K))$.    (12)

If we apply the tournament discretization algorithm [6] to subtables, the time complexity of computing $P(S_1)$ is

$O(|A| (|U|/K)^3 \log(|U|/K))$,    (13)

which is reduced to

$O(\log |A| \cdot (|U|/K)^3 \log(|U|/K))$    (14)

if the parallel mechanism is employed and there are $\lceil |A|/2 \rceil$ computers/processors to use. This is also the time complexity of computing P(S) if there are $\lceil |A|/2 \rceil \cdot K$ computers/processors to use. If the parallel mechanism is not employed, the time complexity of computing P(S) is

$O(K |A| (|U|/K)^3 \log(|U|/K))$.    (15)

By comparing equation (8) with (14) or (15), it is easy to see that our algorithms make great progress in reducing the time complexity of the discretization algorithm.

4.3 Tradeoff for Deciding Suitable Parameter
According to the above analysis, a larger K incurs lower time complexity. But a larger K, or equivalently a smaller |U|/K, has some drawbacks. When we divide S into subtables, we essentially lose some candidate cuts. For example, if we divide S listed in Table 1 into 2 subtables S1 = ({x1, x2}, {a1, a2, a3}, d) and S2 = ({x3, x4, x5}, {a1, a2, a3}, d), the cuts (a1, 1.4), (a3, 0.35) and (a3, 0.45) will be lost. When subtables are large enough, the loss of cuts may be trivial and negligible. But when subtables are relatively small, this kind of loss may be unacceptable. In the extreme case K = |U|, every subtable contains exactly one object, and there is no candidate cut at all. Obviously, the appropriate K may vary in direct proportion to |U| across applications, so it is more suitable to investigate an appropriate setting of |U|/K. In fact, if we keep |U|/K in a certain range for different applications, then according to equations (11), (13) and (14) the time complexities of the parallel versions are not influenced by |U|. We will analyze this issue through examples in the next section.
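To make the effect of K on the bounds concrete, one can divide bound (8) by bound (12), i.e., compare applying Nguyen's algorithm directly to S with applying its plain (non-parallel) version to K subtables. The following is only a worked ratio of the asymptotic expressions; it ignores constant factors and the cost of Steps 1 and 3, so it should not be read as a predicted speedup.

```latex
\frac{|A|^2\,|U|^3 \log |U|}
     {K\,|A|^2 \left(\frac{|U|}{K}\right)^{3} \log\frac{|U|}{K}}
  \;=\; K^{2}\cdot\frac{\log |U|}{\log |U| - \log K}
```

For instance, with |U| = 569 and K = 4 the ratio is roughly 20. The ratio grows at least quadratically in K, which is the time-complexity side of the tradeoff; the loss of candidate cuts described above is the other side.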
5 Experimental Results
We are developing a tool called Rough set Developer's Kit (RDK) using the Java platform and the ORACLE system, both to test our algorithms and to serve as a basis for application development. For convenience we ran our algorithm on a notebook PC with an Intel Centrino 1.4 GHz CPU and 256 MB of memory. The reduction algorithm employed throughout this paper is entropy based, and the discretization algorithm for subtables is the tournament discretization algorithm. Table 3 lists some results for the WDBC data set. Specifically, K = 1 indicates that we do not divide S into subtables. $POS(S^P)$ denotes the number of objects in the positive region of the discretized decision table $S^P$. Total time is the time used by the plain version of our algorithm, while Parallel time is the time used by the parallel version. Processors is the number of computers/processors required to run the parallel version. Step 3 time is the time required to execute Step 3 of our algorithm. All times are in seconds.

Experiment analysis:
1. While K ≤ 5, the positive region of $S^P$ is the same as that of S, i.e., all 569 objects are in the positive region. This no longer holds for K > 5. The difference is essential from the Rough Set point of view.
2. If the discretization quality is estimated by the number of selected cuts |P|, suboptimal results could be obtained, especially when K ≤ 6.
3. The total time and the parallel time decrease as K increases, but this trend does not continue significantly when K > 4. This is partly because more subtables incur more overhead.
4. The run time of Step 3 is no longer negligible when K ≥ 7.
5. In this example, K = 4 is the best setting.
6. For general situations, we recommend that |U|/K be between 100 and 150 (corresponding to K = 4 or 5 in this example), because subtables of such size tend to maintain most cuts while remaining easy to discretize.
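The recommendation in item 6 can be checked with a few lines of arithmetic. This sketch only restates numbers given in the text (569 objects, the 100 to 150 range, and the total times reported for K = 1, 4 and 5); the helper name `recommended_k` is an assumption introduced here for illustration.

```python
N_OBJECTS = 569  # number of objects in WDBC, as stated in the text


def recommended_k(n_objects, low=100, high=150, max_k=12):
    """Return every K whose average subtable size |U|/K lies in [low, high]."""
    return [k for k in range(1, max_k + 1) if low <= n_objects / k <= high]


# Total times in seconds for the plain version, as reported for WDBC.
TOTAL_TIME = {1: 10570, 4: 1395, 5: 1189}

if __name__ == "__main__":
    print(recommended_k(N_OBJECTS))  # prints [4, 5]
    for k in (4, 5):
        # Measured speedup of the plain version relative to K = 1.
        print(k, round(TOTAL_TIME[1] / TOTAL_TIME[k], 2))
```

The measured gain of the plain version for K = 4 is well below the ratio of the asymptotic bounds, which is consistent with the overheads mentioned in item 3.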
Table 3. Some results of the WDBC data set

K    POS(S^P)  |P|  Total time  Parallel time  Processors  Step 3 time
1    569       11   10,570      1,886          16          0
2    569       13   2,858       278            32          11
3    569       13   2,354       207            48          51
4    569       12   1,395       107            64          25
5    569       11   1,189       80             80          24
6    550       13   1,225       108            96          34
7    550       18   920         46             112         16
8    567       13   865         52             128         26
9    566       15   942         58             144         34
10   554       12   994         46             160         21
11   565       12   1,093       76             176         49
12   565       13   912         36             192         21

6 Conclusions and Further Works
In this paper we proposed a divide-and-conquer discretization scheme that divides the given decision table into subtables and combines the discretization schemes of the subtables. When integrated with the tournament discretization algorithm, this algorithm can discretize decision tables of any size. Moreover, the time complexity of the parallel versions of our algorithm is influenced by $|U_1|$ rather than |U| if there are $K = \lceil |U|/|U_1| \rceil$ processors/computers to use. We have also suggested, through an example, that $|U_1|$ be between 100 and 150. Further research includes applying our algorithms, along with these parameter settings, to applications in order to test their validity.
References
1. L. Kurgan and K.J. Cios.: CAIM Discretization Algorithm. IEEE Transactions on Knowledge and Data Engineering 16(2) (2004) 145–153.
2. J. Komorowski, Z. Pawlak, L. Polkowski, A. Skowron.: Rough sets: A Tutorial. S. Pal, A. Skowron (Eds.) Rough Fuzzy Hybridization (1999) 3–98.
3. H.S. Nguyen and A. Skowron.: Discretization of Real Value Attributes. Proc. of the 2nd Joint Annual Conference on Information Science, Wrightsville Beach, North Carolina (1995) 34–37.
4. H.S. Nguyen.: Discretization of Real Value Attributes, Boolean Reasoning Approach. PhD thesis, Warsaw University, Warsaw, Poland (1997).
5. H.S. Nguyen.: Discretization Problem for Rough Sets Methods. RSCTC'98, LNAI 1424 (1998) 545–552.
6. F. Min, H.B. Cai, Q.H. Liu, F. Li.: The Tournament Discretization Algorithm, already submitted to ISPA 2005.
7. F. Min, S.M. Lin, X.B. Wang, H.B. Cai.: Attribute Extraction from Mixed-Mode Data, already submitted to ICMLC 2005.
8. R.W. Swiniarski, A. Skowron.: Rough Set Methods in Feature Selection and Recognition. Pattern Recognition Letters 24 (2003) 833–849.
9. Liu Shao-Hui, Sheng Qiu-Jian, Wu Bin, Shi Zhong-Zhi, Hu Fei.: Research on Efficient Algorithms for Rough Set Methods. Chinese Journal of Computer 26(5) (2003) 524–529 (in Chinese).
10. C.L. Blake, C.J. Merz.: UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html, UC Irvine, Dept. Information and Computer Science, 1998.
11. M. Kryszkiewicz.: Comparative Studies of Alternative Type of Knowledge Reduction in Inconsistent Systems. International Journal of Intelligent Systems 16(1) (2001) 105–120.
A Hybrid Classifier Based on Rough Set Theory and Support Vector Machines*

Gexiang Zhang¹, Zhexin Cao², and Yajun Gu³

¹ School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, Sichuan, China
[email protected]
² College of Profession and Technology, Jinhua 321000 Zhejiang, China
³ School of Computer Science, Southwest University of Science and Technology, Mianyang 621002 Sichuan, China
Abstract. Rough set theory (RST) can mine useful information from large amounts of data and generate decision rules without prior knowledge. Support vector machines (SVMs) have good classification performance and good fault tolerance and generalization capabilities. To inherit the merits of both RST and SVMs, this paper proposes a hybrid classifier called rough set support vector machines (RS-SVMs) to recognize radar emitter signals. RST is used as a preprocessing step to improve the performance of SVMs. A large number of experimental results show that RS-SVMs achieve lower recognition error rates than SVMs and have stronger classification and generalization capabilities than SVMs, especially when the number of training samples is small. RS-SVMs are greatly superior to SVMs.
1 Introduction

For many practical problems, including pattern matching and classification [1,2], function approximation [3], data clustering [4] and forecasting [5], Support Vector Machines (SVMs) have drawn much attention and have been applied successfully in recent years. The subject of SVMs covers emerging techniques that have proven successful in many applications traditionally dominated by neural networks [6]. An interesting property of the SVM is that it is an approximate implementation of the structural risk minimization induction principle, which aims at minimizing a bound on the generalization error of a model rather than minimizing the mean square error over the data set [6]. The SVM is considered a good learning method that can overcome the internal drawbacks of neural networks [7,8,9]. Although SVMs have a strong capability for recognizing patterns and good fault tolerance and generalization capabilities, they cannot reduce the input data and select the most important information. Rough Set Theory (RST) can compensate for this deficiency of SVMs effectively. RST, introduced by Zdzisław Pawlak [10] in his seminal paper of 1982, is a new mathematical approach to uncertain and vague data analysis and is also a new fundamental theory of soft computing [11]. In recent years, RST has become an attractive and promising research topic. Because RST can mine useful information from large amounts of data and generate decision rules without prior knowledge [12,13], it is widely used in many fields [10-16], such as knowledge discovery, machine learning, pattern recognition and data mining. RST has strong capabilities for qualitative analysis and rule generation, so it is introduced here to preprocess the input data of SVMs and extract the key elements that serve as their inputs. In our prior work, an Interval-Valued Attribute Discretization approach (IVAD) was presented to process the continuous interval-valued features of radar emitter signals [17]. RST was combined with neural networks to design rough neural networks, and experimental results verified that rough neural networks are superior to neural networks [18]. Unfortunately, neural networks have some unsolved problems, such as over-learning, local minima and the choice of network structure, in particular the difficulty of determining the number of nodes in the hidden layers [7,8,9]. So this paper combines SVMs with RST to design a hybrid classifier called Rough Set Support Vector Machines (RS-SVMs). The new classifier inherits the merits of both RST and SVMs. Experimental results show that the introduction of RST not only enhances the recognition rates and recognition efficiency of SVMs, but also strengthens their classification and generalization capabilities. This paper is organized as follows. Section 2 gives the feature selection method using RST. Section 3 presents a hybrid classifier based on RST and SVMs. Simulation results are analyzed in Section 4. Conclusions are drawn in Section 5.

* This work was supported by the National EW Laboratory Foundation (No.NEWL51435QT22 0401).
L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1287-1296, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Feature Selection Method

RST can only deal with discrete attributes. In engineering applications, especially in pattern recognition and machine learning, the features obtained by feature extraction usually vary within a certain range (interval values) rather than taking fixed values, for reasons such as heavy noise. Existing discretization methods based on cut-splitting cannot effectively handle an information system that contains interval attribute values, whereas IVAD can discretize interval-valued continuous features well. So IVAD is first used to process the features. The key problem of IVAD is to choose a good class-separability criterion function. When an attribute value varies within a certain range, it generally follows a certain distribution. Without loss of generality, suppose the distribution is approximately Gaussian. This paper uses the following class-separability criterion function in feature discretization:

$J = 1 - \frac{\int f(x)\, g(x)\, dx}{\sqrt{\int f^2(x)\, dx \cdot \int g^2(x)\, dx}}$    (1)
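As a rough numerical illustration of criterion (1), the sketch below evaluates J for two Gaussian densities f and g using a simple trapezoid rule. It is not the IVAD algorithm of [17]; the function names, the integration range and the step count are assumptions chosen only for this example.

```python
import math


def gaussian(mean, std):
    """Return the density function of a normal distribution."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf


def integrate(func, lo, hi, steps=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


def separability(f, g, lo, hi):
    """Criterion (1): one minus the normalized overlap of f and g."""
    cross = integrate(lambda x: f(x) * g(x), lo, hi)
    norm_f = integrate(lambda x: f(x) ** 2, lo, hi)
    norm_g = integrate(lambda x: g(x) ** 2, lo, hi)
    return 1.0 - cross / math.sqrt(norm_f * norm_g)


if __name__ == "__main__":
    f = gaussian(0.0, 1.0)
    print(separability(f, gaussian(0.0, 1.0), -10.0, 12.0))   # identical: close to 0
    print(separability(f, gaussian(2.0, 1.0), -10.0, 12.0))   # partial overlap
    print(separability(f, gaussian(10.0, 1.0), -10.0, 20.0))  # well separated: close to 1
```

For two Gaussians with equal standard deviation σ and means a distance Δ apart, the overlap term has the closed form exp(−Δ²/(4σ²)), so J moves from 0 for identical distributions toward 1 as they separate.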
In (1), f(x) and g(x) represent the probability distribution functions of the attribute values of two objects in the universe U of a decision system. Using the criterion function and the discretization algorithm in [17], the interval-valued features can be discretized effectively. After the continuous features have been discretized, RST methods can be used to select the most discriminatory feature subset from the original feature set, which is composed of a large number of features. This paper uses an attribute reduction method based on the discernibility matrix and logic operations [12,13] to reduce the discretized decision table. The detailed reduction algorithm is as follows.

Step 1 Compute the discernibility matrix $C_D$ of the decision table.
Step 2 For every nonempty element $c_{ij}$ ($c_{ij} \neq 0$, $c_{ij} \neq \emptyset$) of the discernibility matrix $C_D$, construct the corresponding disjunctive logic normal form

$L_{ij} = \bigvee_{a_i \in c_{ij}} a_i$    (2)

Step 3 Take the conjunction of all disjunctive normal forms $L_{ij}$ to obtain the conjunctive normal form L, i.e.,

$L = \bigwedge_{c_{ij} \neq 0,\ c_{ij} \neq \emptyset} L_{ij}$    (3)

Step 4 Transform the conjunctive normal form L into the disjunctive normal form $L'$, obtaining $L' = \bigvee_i L_i$.
Step 5 Read off the results of attribute reduction. In the disjunctive normal form $L'$, each conjunctive term corresponds to one attribute reduction of the decision table, and the attributes contained in that term constitute the set of condition attributes after reduction.

Although RST finds all the reducts of the information system, having multiple solutions makes it difficult to decide which features to feed to the classifiers in automatic recognition. So this paper introduces the complexity of feature extraction to solve the problem. The complexity is measured by the time consumed by feature extraction. Among all reducts of the information system, the feature subset with the lowest complexity is taken as the final feature subset.
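Steps 1 to 5 and the final cost-based choice can be sketched on a toy decision table. Note the swapped-in technique: instead of symbolically transforming the conjunctive normal form into a disjunctive one, this sketch enumerates attribute subsets by increasing size and keeps the minimal ones that intersect every discernibility entry, which yields the same reducts but is only practical for a handful of attributes. The table, the function names and the extraction costs are hypothetical.

```python
from itertools import combinations


def discernibility_clauses(rows, decisions):
    """Steps 1-2: one clause (set of differing attributes) per object pair
    with different decisions; empty entries are skipped."""
    clauses = set()
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            differing = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if differing:
                clauses.add(differing)
    return clauses


def all_reducts(attributes, clauses):
    """Steps 3-5 by brute force: minimal attribute sets hitting every clause."""
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            candidate = set(subset)
            if any(r <= candidate for r in found):
                continue  # a smaller reduct is already contained in it
            if all(candidate & clause for clause in clauses):
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Hypothetical discretized decision table with three condition attributes.
    rows = [
        {"a1": 0, "a2": 0, "a3": 0},
        {"a1": 0, "a2": 1, "a3": 1},
        {"a1": 1, "a2": 0, "a3": 1},
        {"a1": 1, "a2": 1, "a3": 0},
    ]
    decisions = [0, 1, 1, 0]
    reducts = all_reducts(["a1", "a2", "a3"], discernibility_clauses(rows, decisions))
    print(reducts)
    # Hypothetical feature-extraction times used to pick the final subset.
    cost = {"a1": 1.0, "a2": 1.0, "a3": 5.0}
    print(min(reducts, key=lambda r: sum(cost[a] for a in r)))
```

On this table the conjunctive form is (a2 ∨ a3) ∧ (a1 ∨ a3), whose disjunctive form a3 ∨ (a1 ∧ a2) gives the two reducts {a3} and {a1, a2}; with the assumed costs the second one is selected.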
3 Rough Set Support Vector Machines

SVMs have good classification, fault-tolerance and generalization capabilities. However, SVMs cannot select and reduce the input data. If the dimensionality of the input vector is very high, the training and testing times of SVMs will be very long. Moreover, a high-dimensional input feature vector usually contains redundant data, which may lower the classification and generalization performance of SVMs. So it is necessary to preprocess the input data of SVMs. Fortunately, RST can mine useful information from a large number of features and eliminate the redundant features without any prior knowledge. To bring its strong capabilities for qualitative analysis and rule generation to SVMs, this section uses RST as a preprocessing step of SVMs to design a hybrid classifier called RS-SVMs. The structure of RS-SVMs is shown in Fig. 1. The steps of designing RS-SVMs are as follows.
Step 1 Training samples are used to construct a decision table in which all attributes are represented by interval-valued continuous features.
Step 2 IVAD is employed to discretize the decision table, yielding the discretized decision table.
Step 3 Attribute reduction methods are applied to the discretized decision table. Attribute reduction usually yields multiple solutions simultaneously, so the complexity of feature extraction is introduced to select, from the multiple reduction results, the final feature subset with the lowest cost. After this selection, the final decision rule is obtained.
Step 4 According to the final decision rule, the Naïve Scaler algorithm [12] is applied to the attribute table discretized by IVAD to decide the number and positions of the cutting points. All cutting-point values are then computed in terms of the attribute table before discretization by IVAD, and the discretization rule, i.e., the preprocessing rule of SVMs, is generated.
Step 5 The training samples are processed using the preprocessing rule and are then used as inputs to train the SVMs.
Step 6 When the SVM classifiers are tested using testing samples or are used in practical applications, the input data are first processed using the preprocessing rule and are then applied as inputs to the trained SVMs.

[Figure: block diagram with the elements Training samples, Testing samples, Decision table, IVAD, Discretized decision table, Reduction methods, Complexity of features, Decision rules, Naïve Scaler Algorithm, Preprocessing rules, Untrained SVMs, Trained SVMs and Output]
Fig. 1. Structure of RS-SVMs
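Steps 5 and 6 apply the same preprocessing rule to training and testing samples before they reach the SVM. The following is a minimal sketch of that one stage, assuming the rule is stored as a list of cut points per selected feature; the cut values and the function `apply_rule` are hypothetical and only illustrate how continuous values are mapped to interval indices.

```python
from bisect import bisect_right

# Hypothetical preprocessing rule: the selected features and their cut points.
RULE = [("a5", [0.2, 0.6]), ("a10", [1.5])]


def apply_rule(sample, rule=RULE):
    """Keep only the selected features and replace each value by the index
    of the interval (between consecutive cut points) that contains it."""
    return [bisect_right(cuts, sample[name]) for name, cuts in rule]


if __name__ == "__main__":
    # Features not in the rule (here a1) are simply dropped.
    samples = [
        {"a1": 9.9, "a5": 0.35, "a10": 2.0},
        {"a1": 3.1, "a5": 0.10, "a10": 0.7},
    ]
    print([apply_rule(s) for s in samples])  # prints [[1, 1], [0, 0]]
```

Using one shared rule for both phases is what keeps the inputs of the trained SVMs consistent between training and testing.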
SVMs were originally designed for binary classification. How to extend them effectively to multiclass classification is still an ongoing research issue [19]. Currently there are two types of approaches for multiclass SVMs. One constructs and combines several binary classifiers, while the other directly considers all data in one optimization formulation [19]. The former approach mainly includes three methods: one-versus-rest (OVR) [19,20], one-versus-one (OVO) [19,21] and support vector machines with binary tree architecture (BTA) [22]. Experimental results [1-9,19-22] show that combining several binary classifiers is a valid and practical way of solving multiclass classification problems.
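The three decompositions differ in how many binary SVMs they train. The counts below are the standard ones for these schemes and are not stated in the paper; the function name is an assumption made for this small illustration.

```python
def binary_classifier_counts(n_classes):
    """Number of binary SVMs trained by each multiclass decomposition."""
    return {
        "OVR": n_classes,                         # one classifier per class
        "OVO": n_classes * (n_classes - 1) // 2,  # one classifier per class pair
        "BTA": n_classes - 1,                     # one per internal node of a binary tree
    }


if __name__ == "__main__":
    # Ten classes, matching the number of radar emitter signals in Section 4.
    print(binary_classifier_counts(10))  # prints {'OVR': 10, 'OVO': 45, 'BTA': 9}
```

The number of classifiers alone does not determine training time, because each OVO classifier is trained only on the samples of its two classes.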
4 Simulations

We choose 10 radar emitter signals for the simulation experiments. The 10 signals are denoted by x1, x2, ..., x10, respectively. In our prior work, 16 features were extracted from the 10 radar emitter signals [23,24,25]. The 16 features are denoted by a1, a2, ..., a16, respectively. After discretization and reduction, the final result a5, a10 is obtained. For comparison, several other feature selection approaches are also applied: the Resemblance Coefficient method (RC) [25], the Class-Separability method (CS) [26], the Satisfactory Criterion method (SC) [27], Sequential Forward Search using a distance criterion (SFS) [28], Sequential Floating Forward Search using a distance criterion (SFFS) [28] and the New Method of Feature Selection (NMFS) [29]. The results obtained using RC, CS, SC, SFS, SFFS and NMFS are a2a7, a4a15, a5a12, a1a4, a4a5 and a6a7, respectively. To test the classification performance of the results obtained by the 7 feature selection methods, BTA is used to construct SVM classifiers to recognize the 10 radar emitter signals. The average accurate recognition rates obtained using the 7 feature selection methods are shown in Table 1.

Table 1. Comparison of recognition rates (RR) obtained by 7 methods

Methods   RC     CS     SC     SFS    SFFS   NMFS   Proposed
Features  a2a7   a4a15  a5a12  a1a4   a4a5   a6a7   a5a10
RR (%)    93.89  87.69  95.12  63.27  84.73  77.68  95.32
Table 1 shows that the average recognition rate of the proposed method is higher than those of the other 6 methods, which indicates that the feature selection method using RST is superior to them. The experimental results also show that the introduced discretization method is feasible and valid. The classification and generalization capabilities of RS-SVMs are compared with those of SVMs in the following experiments. Six classifiers, namely OVR-SVM, OVO-SVM, BTA-SVM, OVR-RS-SVM, OVO-RS-SVM and BTA-RS-SVM, are employed to recognize the 10 radar emitter signals. The inputs of the 6 classifiers are the feature subset selected by the proposed method, i.e., the two features a5 and a10. Two performance criteria, recognition error rate and recognition efficiency, are used to evaluate the classifiers. Recognition efficiency includes training time (Trt) and testing time (Tet). The samples in the training group are used to train the 6 classifiers, and the samples in the testing group are then used to test the trained classifiers. Statistical results of 100 experiments using the 6 classifiers are shown in Table 2. To compare the training time and the classification and generalization capabilities of SVMs with those of RS-SVMs, training sets of 10, 20, 30, 40 and 50 samples are respectively used to train OVR-SVM, OVO-SVM, BTA-SVM and OVR-RS-SVM, OVO-RS-SVM, BTA-RS-SVM. Testing samples of 5 dB, 10 dB, 15 dB and 20 dB are respectively used to test the trained SVMs and RS-SVMs. After 100 experiments, the curves of the average recognition rates (ARR) obtained using OVR-SVM and OVR-RS-SVM, OVO-SVM and OVO-RS-SVM, and BTA-SVM and BTA-RS-SVM are shown in Fig. 2, Fig. 3 and Fig. 4, respectively. The average training times (ATT) of OVR-SVM and OVR-RS-SVM, OVO-SVM and OVO-RS-SVM, and BTA-SVM and BTA-RS-SVM are shown in Fig. 5, Fig. 6 and Fig. 7, respectively. All experiments were run on a personal computer (P-IV, CPU: 2.0 GHz, memory: 256 MB).

Table 2. Experimental result comparison of 6 classifiers (%)

             SVMs                      RS-SVMs
Signals      OVR     OVO    BTA       OVR     OVO    BTA
x1           0       0      20.36     0.40    0      0
x2           13.86   0      26.00     0       0      0
x3           0       0      0         0       0      0
x4           0       0      0         0       0      0
x5           0       0      0         0       0      0
x6           0       0      0         0       0      0
x7           0       0      0         0       0      0
x8           73.20   0      0         0       0      0
x9           0.33    0      0         0       0      0
x10          0       0      0.39      0       0      0
Error rate   8.74    0      4.68      0.04    0      0
Trt (sec.)   754.74  51.96  101.14    878.11  52.58  108.53
Tet (sec.)   233.58  6.56   4.22      232.38  27.64  25.02
[Figure: average recognition rate versus number of training samples (10 to 50) for OVR-SVM and OVR-RS-SVM]
Fig. 2. ARR of OVR-SVM and OVR-RS-SVM
From Table 2 and Fig. 2 to Fig. 7, several conclusions can be drawn as follows. (1) Table 2 shows that the recognition rates of the 3 RS-SVM classifiers (OVR-RS-SVM, OVO-RS-SVM and BTA-RS-SVM) are not lower than those of the 3 SVM classifiers (OVR-SVM, OVO-SVM and BTA-SVM). When the number of training samples is 50, the three RS-SVM classifiers and the OVO-SVM classifier are good classifiers for recognizing the 10 radar emitter signals using the selected features.
[Figure: average recognition rate versus number of training samples (10 to 50) for OVO-SVM and OVO-RS-SVM]
Fig. 3. ARR of OVO-SVM and OVO-RS-SVM

[Figure: average recognition rate versus number of training samples (10 to 50) for BTA-SVM and BTA-RS-SVM]
Fig. 4. ARR of BTA-SVM and BTA-RS-SVM

[Figure: average training time (s) versus number of training samples (10 to 50) for OVR-SVM and OVR-RS-SVM]
Fig. 5. ATT of OVR-SVM and OVR-RS-SVM

[Figure: average training time (s) versus number of training samples (10 to 50) for OVO-SVM and OVO-RS-SVM]
Fig. 6. ATT of OVO-SVM and OVO-RS-SVM

[Figure: average training time (s) versus number of training samples (10 to 50) for BTA-SVM and BTA-RS-SVM]
Fig. 7. ATT of BTA-SVM and BTA-RS-SVM
(2) Table 2 and Fig. 5 to Fig. 7 show that RS-SVM classifiers need slightly more time than SVM classifiers because the discretization procedure in RS-SVMs consumes a little time. (3) Fig. 2 to Fig. 4 show that the recognition error rates of the 3 RS-SVM classifiers are lower than those of the 3 SVM classifiers when the number of training samples varies from 10 to 50. The experimental results indicate that the classification and generalization capabilities of RS-SVM classifiers are much stronger than those of SVM classifiers, especially when the number of training samples is small. Fig. 2 to Fig. 4 also show that the classification and generalization capabilities of RS-SVM classifiers trained with 10 samples correspond to those of SVM classifiers trained with 50 samples. That is to say, RS-SVM classifiers with 10 training samples are superior to SVM classifiers with 50 training samples because the former have much lower recognition error rates and much shorter training and testing times than the latter. Among the 3 RS-SVM classifiers, the OVO-RS-SVM classifier is the best in terms of recognition rate and recognition efficiency. (4) When the number of training samples is 50, OVO-SVM seems to be the best of the 6 classifiers according to the evaluation criteria. However, Fig. 3 and Fig. 6 indicate that OVO-RS-SVM is superior to OVO-SVM when the number of training samples decreases. (5) To reach the same values of the evaluation criteria, RS-SVM classifiers need much shorter training and testing times than SVM classifiers because they need fewer training samples. Therefore, the above analysis indicates that the introduction of rough set theory decreases the recognition error rates and enhances the recognition efficiency of SVM classifiers, and strengthens their classification and generalization capabilities.
5 Conclusions

This paper combines RST with SVMs to design a hybrid classifier. RST is used to preprocess the input data of SVMs in both the training procedure and the testing procedure. Because RST selects the most discriminatory features from a large number of features and eliminates the redundant ones, the preprocessing step enhances the efficiency of SVMs in the training and testing phases and strengthens their classification and generalization capabilities. Experimental results verify that RS-SVMs are much superior to SVMs in recognition capability and recognition efficiency. The proposed hybrid classifier is promising for other applications, such as image recognition, speech recognition and machine learning.
References
1. Osareh, A., Mirmehdi, M., Thomas, B., and Markham, R.: Comparative Exudate Classification Using Support Vector Machines and Neural Networks. Lecture Notes in Computer Science, Vol.2489. (2002) 413-420
2. Foody, G.M., Mathur, A.: A Relative Evaluation of Multiclass Image Classification by Support Vector Machines. IEEE Transactions on Geoscience and Remote Sensing, Vol.42, No.6. (2004) 1335-1343
3. Ma, J.S., Theiler, J., and Perkins, S.: Accurate On-line Support Vector Regression. Neural Computation, Vol.15, No.11. (2003) 2683-2703
4. Ben-Hur, A., Horn, D., Siegelmann, H.T., and Vapnik, V.: Support Vector Clustering. Journal of Machine Learning Research, Vol.2, No.2. (2001) 125-137
5. Kim, K.J.: Financial Time Series Forecasting Using Support Vector Machines. Neurocomputing, Vol.55, No.1. (2003) 307-319
6. Dibike, Y.B., Velickov, S., and Solomatine, D.: Support Vector Machines: Review and Applications in Civil Engineering. Proc. of the 2nd Joint Workshop on Application of AI in Civil Engineering, (2000) 215-218
7. Kecman, V.: Learning and Soft Computing, Support Vector Machines, Neural Networks and Fuzzy Logic Models. The MIT Press, Cambridge, MA (2001)
8. Wang, L.P. (Ed.): Support Vector Machines: Theory and Application. Springer-Verlag, Berlin Heidelberg New York (2005)
9. Samanta, B.: Gear Fault Detection Using Artificial Neural Networks and Support Vector Machines with Genetic Algorithms. Mechanical Systems and Signal Processing, Vol.18, No.3. (2004) 625-644
10. Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences, Vol.11, No.5. (1982) 341-356
11. Lin, T.Y.: Introduction to the Special Issue on Rough Sets. International Journal of Approximate Reasoning, Vol.15, No.4. (1996) 287-289
12. Wang, G.Y.: Rough Set Theory and Knowledge Acquisition. Xi'an Jiaotong University Press, Xi'an (2001)
13. Walczak, B., Massart, D.L.: Rough Sets Theory. Chemometrics and Intelligent Laboratory Systems, Vol.47, No.1. (1999) 1-16
14. Dai, J.H., Li, Y.X.: Study on Discretization Based on Rough Set Theory. Proc. of the first Int. Conf. on Machine Learning and Cybernetics, (2002) 1371-1373
15. Roy, A., Pal, S.K.: Fuzzy Discretization of Feature Space for a Rough Set Classifier. Pattern Recognition Letter, Vol.24, No.6. (2003) 895-902
16. Mitatha, S., Dejhan, K., Cheevasuvit, F., and Kasemsiri, W.: Some Experimental Results of Using Rough Sets for Printed Thai Characters Recognition. International Journal of Computational Cognition, Vol.1, No.4. (2003) 109-121
17. Zhang, G.X., Hu, L.Z., and Jin, W.D.: Discretization of Continuous Attributes in Rough Set Theory and Its Application. Lecture Notes in Computer Science, Vol.3314. (2004) 1020-1026
18. Zhang, G.X., Hu, L.Z., and Jin, W.D.: Radar Emitter Signal Recognition Based on Feature Selection Algorithm. Lecture Notes in Artificial Intelligence, Vol.3339. (2004) 1108-1114
19. Hsu, C.W., Lin, C.J.: A Comparison of Methods for Multiclass Support Vector Machines. IEEE Transaction on Neural Networks, Vol.13, No.2. (2002) 415-425
20. Rifkin, R., Klautau, A.: In Defence of One-Vs-All Classification. Journal of Machine Learning Research, Vol.5, No.1. (2004) 101-141
21. Platt, J.C., Cristianini, N., and Shawe-Taylor, J.: Large Margin DAG's for Multiclass Classification. Advances in Neural Information Processing Systems, Vol.12. (2000) 547-553
22. Cheong, S.M., Oh, S.H., and Lee, S.Y.: Support Vector Machines with Binary Tree Architecture for Multi-class Classification. Neural Information Processing-Letters and Reviews, Vol.2, No.3. (2004) 47-51
23. Zhang, G.X., Hu, L.Z., and Jin, W.D.: Intra-pulse Feature Analysis of Radar Emitter Signals. Journal of Infrared and Millimeter Waves, Vol.23, No.6. (2004) 477-480
24. Zhang, G.X., Rong, H.N., Jin, W.D., and Hu, L.Z.: Radar Emitter Signal Recognition Based on Resemblance Coefficient Features. Lecture Notes in Artificial Intelligence, Vol.3066. (2004) 665-670
25. Zhang, G.X., Jin, W.D., and Hu, L.Z.: Resemblance Coefficient and a Quantum Genetic Algorithm for Feature Selection. Lecture Notes in Artificial Intelligence, Vol.3245. (2004) 155-168
26. Zhang, G.X., Hu, L.Z., and Jin, W.D.: Quantum Computing Based Machine Learning Method and Its Application in Radar Emitter Signal Recognition. Lecture Notes in Artificial Intelligence, Vol.3131. (2004) 92-103
27. Zhang, G.X., Jin, W.D., and Hu, L.Z.: A Novel Feature Selection Approach and Its Application. Lecture Notes in Computer Science, Vol.3314. (2004) 665-671
28. Mitra, P., Murthy, C.A., and Pal, S.K.: Unsupervised Feature Selection Using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, No.3. (2002) 301-312
29. Lü, T.J., Wang, H., and Xiao, X.C.: Recognition of Modulation Signal Based on a New Method of Feature Selection. Journal of Electronics and Information Technology, Vol.24, No.5. (2002) 661-666
A Heuristic Algorithm for Maximum Distribution Reduction

Xiaobing Pei and YuanZhen Wang

Department of Computer Science, HuaZhong University of Science & Technology, Wuhan, Hubei 430074, China
[email protected]
Abstract. Attribute reduction is one of the basic topics in the study of decision tables, and computing the optimal attribute reduction has been proved to be NP-complete. Many algorithms for the optimal attribute reduction have been proposed for consistent decision tables, but most decision tables are in fact inconsistent. In this paper, a judgment theorem for maximum distribution reduction is obtained and the significance of attributes in a decision table is defined; from these, a polynomial-time heuristic algorithm for the optimal maximum distribution reduction is proposed. Experimental results show that the algorithm is effective and efficient.
1 Introduction

Rough set theory [1], first proposed by Professor Pawlak in 1982, is an excellent mathematical tool for handling imprecise and uncertain knowledge. It has been successfully applied in many fields, such as data mining and decision support [3][4]. Attribute reduction is one of the basic topics in rough set theory. An information system or a decision table usually contains irrelevant and superfluous knowledge, which makes it hard to obtain concise and meaningful decisions. When reducing attributes, we should eliminate the irrelevant and superfluous knowledge without losing essential information about the original data in the decision table. Computing the optimal attribute reduction is an NP-complete problem [2].

The main algorithms proposed so far for the optimal reduction are based on the positive region [5], mutual information [6], attribute frequency [7], and attribute ordering [8], and they apply to consistent decision tables. Most decision tables, however, are in fact inconsistent. Zhang et al. [9] introduced the concept of maximum distribution reduction in decision tables and proposed an algorithm, based on the discernibility matrix, that finds the set of all maximum distribution reductions. That algorithm is not efficient, because its time complexity is exponential.

In this paper, a judgment theorem for maximum distribution reduction is given and the significance of attributes is defined, from which a polynomial-time heuristic algorithm for the optimal maximum distribution reduction is proposed. Finally, experimental results for this algorithm are given.

L. Wang and Y. Jin (Eds.): FSKD 2005, LNAI 3613, pp. 1297–1302, 2005. © Springer-Verlag Berlin Heidelberg 2005
2 Basic Concepts of Rough Set Theory

In this section, we introduce only the basic rough set notation used in the paper.

A decision table is defined as $S = \langle U, A, V, f \rangle$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set of objects; $A = C \cup D$ is a set of attributes, with $C$ the condition attributes and $D$ the decision attributes; $V = \bigcup_{a \in A} V_a$, with $V_a$ the value set of attribute $a$; and $f: U \times A \to V$ is a total function such that $f(x_i, a) \in V_a$ for each $a \in A$ and $x_i \in U$.

Every $B \subseteq A$ defines an equivalence relation $ind(B)$, called the indiscernibility relation:

$$ind(B) = \{(x, y) \in U \times U : f(x, a_k) = f(y, a_k),\ \forall a_k \in B\}.$$

Thus $U/ind(B) = \{[x]_B : x \in U\}$ is a set of equivalence classes, where $[x]_B = \{y : (x, y) \in ind(B)\}$ is the equivalence class of an object $x$ with respect to $B$. If $ind(C) \subseteq ind(D)$, the decision table $S$ is said to be consistent; otherwise it is inconsistent. Certain rules can be generated from a consistent decision table, whereas an inconsistent decision table also yields possible (uncertain) rules.

Let $U/ind(D) = \{D_1, D_2, \ldots, D_r\}$. The maximum distribution information vector of $x \in U$ with respect to the attribute set $C$ is defined by $Dinf_C(x) = (d_1, d_2, \ldots, d_r)$, where

$$d_j = \begin{cases} 1, & P(D_j/[x]_C) = \max\{P(D_k/[x]_C) \mid k = 1, 2, \ldots, r\}, \\ 0, & P(D_j/[x]_C) \ne \max\{P(D_k/[x]_C) \mid k = 1, 2, \ldots, r\}, \end{cases}$$

and

$$P(D_j/[x]_C) = \frac{|D_j \cap [x]_C|}{|[x]_C|}.$$

For a subset $X$ of $U$, if $Dinf_C(x) = Dinf_C(y)$ for every $x, y \in X$, then the maximum distribution information vector of $X$ with respect to $C$ is defined by $Dinf_C(X) = Dinf_C(y)$, where $y \in X$.

Let $B$ be a subset of $C$. If $Dinf_B(x) = Dinf_C(x)$ for every $x \in U$, then $B$ is called a maximum distribution consistent set. If $B$ is a maximum distribution consistent set and no proper subset of $B$ is a maximum distribution consistent set, then $B$ is called a maximum distribution reduction. A decision table may have more than one maximum distribution reduction [9]. A maximum distribution reduction is thus a subset of the condition attributes that preserves the same maximum decision rules as the full set of condition attributes.
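The definitions above can be made concrete with a small sketch. The toy table `ROWS`, the attribute names `a`, `b`, `d`, and the helper names `partition`, `decision_classes`, and `max_distribution_vector` are illustrative assumptions, not taken from the paper. Counts are compared instead of the ratios $P(D_j/[x]_C)$, which gives the same maximum because all ratios for one class share the denominator $|[x]_C|$.

```python
from collections import defaultdict

# Hypothetical toy decision table: condition attributes a, b; decision attribute d.
ROWS = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "no"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
]


def partition(rows, attrs):
    """Return U/ind(attrs) as lists of object indices."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def decision_classes(rows, dec):
    """Return the decision values that label D_1, ..., D_r, in sorted order."""
    return sorted({tuple(row[d] for d in dec) for row in rows})


def max_distribution_vector(rows, block, dec, classes):
    """Return Dinf for one equivalence class given as a list of indices."""
    counts = [
        sum(1 for i in block if tuple(rows[i][d] for d in dec) == cls)
        for cls in classes
    ]
    top = max(counts)
    # d_j is 1 exactly when D_j attains the largest share of the class.
    return tuple(1 if c == top else 0 for c in counts)


blocks = partition(ROWS, ["a", "b"])
classes = decision_classes(ROWS, ["d"])
vectors = [max_distribution_vector(ROWS, b, ["d"], classes) for b in blocks]
# The table is consistent only if every condition class has a single decision.
consistent = all(
    len({tuple(ROWS[i][d] for d in ["d"]) for i in b}) == 1 for b in blocks
)
```

On this table the first condition class mixes both decisions, so the table is inconsistent, and the last class is a tie in which both components of the vector are 1.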
3 Judgment Theorem of Maximum Distribution Reduction

In this section, the judgment theorem for maximum distribution reduction is obtained.

Theorem 1: Let $S = \langle U, A, V, f \rangle$ be a decision table with $A = C \cup D$, where $C$ and $D$ are the condition attributes and decision attributes respectively, and let $B \subseteq C$ be an attribute set. Let $U/ind(D) = \{D_1, D_2, \ldots, D_r\}$, $U/ind(C) = \{X_1, X_2, \ldots, X_n\}$, $U/ind(B) = \{Y_1, Y_2, \ldots, Y_m\}$, and for each $j \le m$ let $Y_j = X'_{j1} \cup X'_{j2} \cup \cdots \cup X'_{jt_j}$, where $X'_{ji} \in U/ind(C)$, $1 \le i \le t_j$. Then $B$ is a maximum distribution consistent set with respect to $C$ if and only if, for each $j$ ($1 \le j \le m$),

$$Dinf_C(X'_{j1}) = \cdots = Dinf_C(X'_{jt_j}).$$

Proof: If $B$ is a maximum distribution consistent set, then $Dinf_B(x) = Dinf_C(x)$ for each $x \in U$. Therefore $Dinf_C(X'_{ji}) = Dinf_B(Y_j)$ for each $X'_{ji}$ ($i = 1, 2, \ldots, t_j$), that is, $Dinf_C(X'_{j1}) = \cdots = Dinf_C(X'_{jt_j})$.

On the other hand, for each $x \in U$ there exist $j$ and $i$ such that $x \in X'_{ji} \in U/ind(C)$ and $x \in Y_j \in U/ind(B)$. Suppose that there exists $l \le r$ such that $P(D_l/X'_{ji}) = \max\{P(D_k/X'_{ji}) \mid k = 1, 2, \ldots, r\}$, where $i = 1, 2, \ldots, t_j$. Since $Dinf_C(X'_{j1}) = \cdots = Dinf_C(X'_{jt_j})$, it is easy to conclude that

$$\frac{|D_l \cap X'_{ji}|}{|X'_{ji}|} > \frac{|D_m \cap X'_{ji}|}{|X'_{ji}|} \quad \text{for all } m \le r,\ m \ne l.$$

Thus we conclude that

$$\frac{\sum_{k=1}^{t_j} |D_l \cap X'_{jk}|}{\sum_{k=1}^{t_j} |X'_{jk}|} > \frac{\sum_{k=1}^{t_j} |D_m \cap X'_{jk}|}{\sum_{k=1}^{t_j} |X'_{jk}|}, \quad m \le r,\ m \ne l.$$

Because the classes $X'_{jk}$ partition $Y_j$, the left-hand side equals $P(D_l/Y_j)$ and the right-hand side equals $P(D_m/Y_j)$. (If several decision classes attain the maximum, they do so in every $X'_{ji}$ with equality among themselves, and the same summation argument applies.) Therefore $Dinf_C(X'_{ji}) = Dinf_B(Y_j)$, that is, $Dinf_B(x) = Dinf_C(x)$ for each $x \in U$. Thus $B$ is a maximum distribution consistent set with respect to $C$.

From Theorem 1 we obtain the following judgment theorem for maximum distribution reduction.

Theorem 2: Let $S = \langle U, A, V, f \rangle$ be a decision table with $A = C \cup D$, where $C$ and $D$ are the condition attributes and decision attributes respectively, and let $B \subseteq C$ be an attribute set. Then $B$ is a maximum distribution reduction if and only if 1) $B$ is a maximum distribution consistent set; and 2) for each $a \in B$ there is a $j$ ($1 \le j \le m$) such that $Dinf_C(X'_{j1}) = \cdots = Dinf_C(X'_{jt_j})$ does not hold, where $U/ind(B - \{a\}) = \{Y_1, Y_2, \ldots, Y_m\}$, $Y_j = X'_{j1} \cup X'_{j2} \cup \cdots \cup X'_{jt_j}$ for $j \le m$, and $X'_{ji} \in U/ind(C)$, $1 \le i \le t_j$.

Proof: The proof follows directly from Theorem 1.
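As an illustration of how the two theorems can be used as tests, the following sketch checks the Theorem 1 condition by scanning each class of $U/ind(B)$ for conflicting $Dinf_C$ vectors, and compares the outcome with the definition $Dinf_B(x) = Dinf_C(x)$. The toy table and all function names are hypothetical, and the hash-based grouping is an implementation choice rather than the paper's procedure.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy table: condition attributes a, b, c; decision attribute d.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
COND, DEC = ["a", "b", "c"], ["d"]


def dinf_per_object(rows, attrs, dec):
    """Return the maximum distribution vector of every object under attrs."""
    classes = sorted({tuple(r[d] for d in dec) for r in rows})
    counts = defaultdict(Counter)
    for r in rows:
        counts[tuple(r[a] for a in attrs)][tuple(r[d] for d in dec)] += 1
    result = []
    for r in rows:
        c = counts[tuple(r[a] for a in attrs)]
        top = max(c.values())
        result.append(tuple(int(c[k] == top) for k in classes))
    return result


def is_consistent_set(rows, B, C, dec):
    """Theorem 1 test: all C-classes inside one B-class share one Dinf_C."""
    vec_c = dinf_per_object(rows, C, dec)
    seen = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in B)
        if seen.setdefault(key, vec_c[i]) != vec_c[i]:
            return False
    return True


def matches_definition(rows, B, C, dec):
    """Direct definition: Dinf_B(x) equals Dinf_C(x) for every object x."""
    return dinf_per_object(rows, B, dec) == dinf_per_object(rows, C, dec)


def is_reduction(rows, B, C, dec):
    """Theorem 2 test: B is consistent and no single attribute can be dropped."""
    return is_consistent_set(rows, B, C, dec) and all(
        not is_consistent_set(rows, [b for b in B if b != a], C, dec) for a in B
    )


# Subsets of C on which the theorem test and the definition agree (all of them here).
agree = all(
    is_consistent_set(ROWS, list(B), COND, DEC)
    == matches_definition(ROWS, list(B), COND, DEC)
    for k in range(1, len(COND) + 1)
    for B in combinations(COND, k)
)
```

In this example `["a", "c"]` passes both tests, while the full set `["a", "b", "c"]` is consistent but not a reduction because `b` can be dropped.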
4 Heuristic Algorithm for Maximum Distribution Reduction

4.1 The Significance of Attributes

Definition 1: Let $S = \langle U, A, V, f \rangle$ be a decision table with $A = C \cup D$, where $C$ and $D$ are the condition attributes and decision attributes respectively. The significance of an attribute $a \in C$ is defined by

$$SGF(a, C, D) = Card(D_{C - \{a\}}), \quad \text{where } D_{C - \{a\}} = \{[x]_C \mid Dinf_C([x]_C) \ne Dinf_{C - \{a\}}(x),\ x \in U\},$$

and $Card(D_{C - \{a\}})$ denotes the cardinality of the set $D_{C - \{a\}}$.

The larger $SGF(a, C, D)$ is, the more important attribute $a$ is with respect to $C$. In this paper, we use $SGF(a, C, D)$ as heuristic information in the search for the optimal reduction.

4.2 Heuristic Algorithm for the Optimal Maximum Distribution Reduction

A heuristic algorithm for the optimal maximum distribution reduction is proposed based on Theorem 2 and Definition 1.

Algorithm 1: A heuristic algorithm for the optimal maximum distribution reduction
Input: a decision table $S = \langle U, A, V, f \rangle$ with $A = C \cup D$, where $C$ is the set of condition attributes and $D$ is the set of decision attributes;
Output: the optimal maximum distribution reduction of decision table $S$.
Step 1. Calculate $U/ind(C) = \{X_1, X_2, \ldots, X_n\}$; calculate $Dinf_C(X_j)$, where $X_j \in U/ind(C)$, $j \le n$;
Step 2. Calculate $SGF(a, C, D)$ for each $a \in C$ and sort the attribute set $C$ in ascending order of $SGF(a, C, D)$;
Step 3. Let $RED = C$;
Step 4. Scanning the attribute set $RED$ from back to front, judge whether each attribute $a \in RED$ is required:
Step 4.1 if $SGF(a, RED, D) = 0$, then $RED = RED - \{a\}$;
Step 5. Return $RED$;
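The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy table, the names `sgf` and `heuristic_reduction`, and the hash-based grouping are all hypothetical, and the sort direction and scan order are taken literally from Step 2 and Step 4 as written.

```python
from collections import Counter, defaultdict

# Hypothetical toy table: condition attributes a, b, c; decision attribute d.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
COND, DEC = ["a", "b", "c"], ["d"]


def dinf_per_object(rows, attrs, dec):
    """Return the maximum distribution vector of every object under attrs."""
    classes = sorted({tuple(r[d] for d in dec) for r in rows})
    counts = defaultdict(Counter)
    for r in rows:
        counts[tuple(r[a] for a in attrs)][tuple(r[d] for d in dec)] += 1
    result = []
    for r in rows:
        c = counts[tuple(r[a] for a in attrs)]
        top = max(c.values())
        result.append(tuple(int(c[k] == top) for k in classes))
    return result


def sgf(rows, a, attrs, dec):
    """Count the attrs-classes whose vector changes when attribute a is dropped."""
    rest = [b for b in attrs if b != a]
    full = dinf_per_object(rows, attrs, dec)
    reduced = dinf_per_object(rows, rest, dec)
    changed = {
        tuple(row[b] for b in attrs)
        for i, row in enumerate(rows)
        if full[i] != reduced[i]
    }
    return len(changed)


def heuristic_reduction(rows, cond, dec):
    """Follow Steps 2 to 4: sort by significance, then prune back to front."""
    order = sorted(cond, key=lambda a: sgf(rows, a, cond, dec))  # ascending
    red = list(order)
    for a in reversed(order):
        # An attribute with zero significance in the current RED is not required.
        if sgf(rows, a, red, dec) == 0:
            red.remove(a)
    return red


significance = {a: sgf(ROWS, a, COND, DEC) for a in COND}
reduct = heuristic_reduction(ROWS, COND, DEC)
```

On this table attribute `b` has significance 0 and is the only attribute removed, so the sketch returns the attributes `a` and `c`.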
In most cases $|D| = 1$, where $|D|$ denotes the cardinality of the decision attribute set $D$. The time complexity of step 1 is $O(|C|^2|U|^2)$; the time complexity of step 2 is $O(|C|^2|U|^2)$; the time complexity of step 4 is $O(|C|^2|U|^2)$. Therefore the time complexity of algorithm 1 is $O(|A|^2|U|^2)$ according to |C|