English Pages 227 Year 2002
THESIS ON INFORMATICS AND SYSTEM ENGINEERING
Transparent Fuzzy Systems: Modeling and Control
Andri Riid
TALLINN TECHNICAL UNIVERSITY FACULTY OF INFORMATION TECHNOLOGY DEPARTMENT OF COMPUTER CONTROL
Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Engineering in Tallinn Technical University
© Andri Riid, 2002
Abstract

During the last twenty years, fuzzy logic has been successfully applied to many modeling and control problems. One reason for this success is that fuzzy logic provides a human-friendly and understandable knowledge representation that can be utilized in expert knowledge extraction and implementation. It is observed, however, that transparency, which is vital for undistorted information transfer, is not a default property of fuzzy systems. Moreover, applying algorithms that identify fuzzy systems from data will most likely destroy any semantics a fuzzy system had after initialization. This thesis thoroughly investigates the issues related to transparency. Fuzzy systems are generally divided into two classes, and it is shown here that different definitions of transparency apply to these classes. For standard fuzzy systems that use fuzzy propositions in IF-THEN rules, explicit transparency constraints have been derived. Based on these constraints, exploitation and modification schemes for existing identification algorithms are suggested. In addition, a new algorithm for training standard fuzzy systems is proposed, with considerable potential to reduce the gap between accuracy and transparency in fuzzy modeling. For 1st order Takagi-Sugeno (TS) systems that are interpreted in terms of local linear models, such conditions cannot be derived because of the system architecture and its undesirable interpolation properties. It is, however, possible to solve the transparency preservation problem in the context of modeling with another proposed method that benefits from rule activation degree exponents. 1st order TS systems that admit valid interpretation of local models as linearizations of the modeled system are useful, for example, in gain-scheduled control.

Transparent standard fuzzy systems, on the other hand, are vital to the branch of intelligent control that seeks solutions by emulating the reasoning and decision processes of human beings and is not limited to knowledge-based fuzzy control. By performing local inversion of the modeled system, it is possible to extract relevant control information, which is demonstrated with the application of fed-batch fermentation. The more a fuzzy controller resembles the expert's role in a control task, the higher the implementation benefit of the fuzzy engine. For example, a hierarchy of fuzzy (and non-fuzzy) controllers simulates an existing hierarchy in the human decision process and leads to improved control performance. Another benefit of hierarchy is that it assumes problem decomposition. This is especially important with fuzzy logic, where a large number of system variables leads to an exponential explosion of rules (the curse of dimensionality) that makes controller design extremely difficult or even impossible. The advantages of hierarchical control are illustrated with truck backer-upper applications.
Kokkuvõte (Summary)

Over recent decades, fuzzy logic has been successfully applied to a variety of control and modeling problems. One key to this success has been that the representation of information in fuzzy logic is close to the representation used in the decision mechanisms that people rely on in everyday life. It must be kept in mind, however, that transparency, the property of fuzzy systems that is an important precondition for the success of many of these applications, is not guaranteed by default. Likewise, when algorithms capable of identifying fuzzy systems from a data set are used, there is no guarantee whatsoever that the result is a transparent fuzzy model. This thesis concentrates on issues related to the transparency of fuzzy systems. Fuzzy systems are conventionally divided into two classes, and the thesis shows that different definitions of transparency apply to these classes. For classical fuzzy systems, in which IF-THEN rules relate fuzzy propositions, the transparency conditions can be stated in explicit form. On the basis of these conditions, the properties and applicability of existing identification algorithms are evaluated. In addition, a novel algorithm has been developed that can reduce the existing gap between accuracy and transparency in fuzzy modeling. For 1st order Takagi-Sugeno fuzzy systems, explicit transparency conditions cannot be given, but a solution can be found in the context of modeling, using a second method developed in the thesis. Transparency of 1st order Takagi-Sugeno systems is useful, for example, in the methodology known as gain scheduling. The field of application of transparent classical fuzzy systems is wider still, extending to control methods that seek solutions by emulating human decision and reasoning processes and that are not limited to knowledge-based control.

Through limited inversion of a linguistic model of the process, important control information can be obtained; the fermentation process control application presented in the thesis is an example. The more the controller's task resembles the role of an expert, the greater the benefit of fuzzy logic. A hierarchy of controllers copies the actual hierarchy in the human decision process and ensures improved control quality. Since constructing a hierarchical control system presupposes decomposition of the problem, the benefit in the field of fuzzy logic is even greater, because fuzzy control is especially sensitive to a large number of control parameters. The advantages of a hierarchical control system are demonstrated with the example of a vehicle backing-up system.
Acknowledgements

First I would like to thank my supervisor, prof. Ennu Rüstern, for introducing me to the subject, providing excellent working conditions and giving continuous support throughout the studies. Special thanks go to my ex-colleague Mati Pirn for many fruitful discussions in the early stages of the work. I am even more grateful to Raul Isotamm, who did a lot of work on the implementation of the algorithms described in the thesis, and to the other students I supervised during those years, who all contributed to my work in one way or another. Andres Rähni and colleagues in the Department of Computer Control also deserve a mention here. I would also like to gratefully mention other researchers all over the world who have made their papers available online or sent their papers at my modest request, as well as the people who stand behind www.researchindex.org. In this corner of the world it is sometimes difficult to obtain relevant scientific information, and the cooperation of all such people has been of great help. Many thanks to prof. em. Hanno Sillamaa for proofreading the first draft of the manuscript and pointing out numerous mistakes and ways in which the work could be improved. I am indebted to my family. What one may accomplish in terms of professional career is quite meaningless compared to the importance of having children and not ruining their lives. At least, this is what I think.
Andri Riid Tallinn, April-September 2001, December 2001, February-March 2002
Contents

1 Introduction
  1.1 General background
  1.2 Problem statement
  1.3 Original contribution
  1.4 Outline of the thesis
2 Fuzzy systems
  2.1 Fuzzy sets
  2.2 Basic properties of fuzzy sets
  2.3 Fuzzy partition
  2.4 Operations on fuzzy sets and fuzzy logic
  2.5 Fuzzy systems
  2.6 Rule base properties
  2.7 Inference examples
  2.8 Takagi-Sugeno fuzzy systems
  2.9 Design of fuzzy systems
  2.10 Summary
3 Interpolation and transparency in fuzzy systems
  3.1 Transparency and interpretability
  3.2 Transparency of standard fuzzy systems
  3.3 Interpolation in standard systems
    3.3.1 Role of defuzzification
    3.3.2 Role of MF type
    3.3.3 Role of inference parameters
    3.3.4 Interpolation in multidimensional space
  3.4 Interpolation in 1st order TS systems
  3.5 Transparency of 1st order TS systems
  3.6 Relationship between 0th and 1st order TS systems
  3.7 Summary
4 Fuzzy modeling
  4.1 Introduction
  4.2 Fuzzy systems as universal approximators
  4.3 Selection of input-output data
  4.4 Rule-based approaches
    4.4.1 Fuzzy template modeling
    4.4.2 Yager-Filev fuzzy template modeling algorithm
    4.4.3 Rule weights in modeling
    4.4.4 Wang-Mendel rule extraction
  4.5 Least squares method
  4.6 Gradient descent
    4.6.1 Gradient descent for fuzzy systems
    4.6.2 The learning process
    4.6.3 Convergence issues and higher order methods
    4.6.4 Overfitting
  4.7 Clustering algorithms
    4.7.1 Extraction of fuzzy rules and membership functions
    4.7.2 Clustering example
  4.8 Genetic Algorithms
  4.9 Transparency protection
    4.9.1 Transparency protection of 0th order TS systems and standard fuzzy systems
    4.9.2 Transparency protection of 1st order TS systems
  4.10 Comparison of gradient-based methods
    4.10.1 Modeling of a SISO system
    4.10.2 Modeling of a TISO system
  4.11 Modeling of large systems
  4.12 Summary and conclusions
5 Fuzzy control
  5.1 Introduction
  5.2 Fuzzy setpoint controllers
  5.3 Fusion of fuzzy and PID control
  5.4 Inversion of fuzzy systems
    5.4.1 Numerical inversion of fuzzy systems
    5.4.2 Non-numerical inversion of fuzzy systems
    5.4.3 Control by inverting a fuzzy model
  5.5 Control example
  5.6 Stability issues
  5.7 Summary and conclusions
6 Applications
  6.1 Introduction
  6.2 Backing up the truck and truck-and-trailer
    6.2.1 Truck backer-upper
    6.2.2 Backing up the truck and trailer
  6.3 Control of a fed-batch fermentation
    6.3.1 Control system for fed-batch fermentation process with a single substrate feed
    6.3.2 Fed-batch fermentation control (two substrate process)
  6.4 Conclusions and comments
7 Conclusions
  7.1 Transparency conditions
  7.2 Transparent modeling algorithms
  7.3 Transparent fuzzy control
  7.4 Suggestions for further research
References
Symbols and abbreviations
List of publications
Appendix A
Appendix B
Appendix C
Appendix D
Transparent fuzzy systems: modeling and control
1 Introduction

"Artificial intelligence is the science of making machines do things that would require intelligence if done by men"
Marvin Minsky

This thesis summarizes the author's research experience and the principal results achieved in the field of fuzzy modeling and control during the last six years. The ability of fuzzy logic to abstract and to explain the complex behavior of systems in linguistic terms has been the driving force behind the research. The introductory chapter describes the general background and the research problem, and explains what is to be expected from the rest of the thesis.
1.1 General background

Perhaps the biggest dream of mankind is that of an artificial human being, or a thinking machine created by humans. Why this is so is open to speculation. Perhaps the creation of such a machine would raise humans to the position of godlike beings to whom nothing is impossible. In science fiction (the particular branch of fiction that explicitly expresses our fears and expectations about the future), this theme has been prominent from the very beginning. The history of artificial intelligence (AI) is closely connected to the history of the digital computer. There is, however, a fundamental difference between the digital computer and the human mind. From the very beginning, computer programs were superior to human beings in both speed and accuracy when solving complex mathematical problems, e.g. differential equations. On the other hand, it is very difficult to construct robot programs that can see and move well enough to handle ordinary things like children's building blocks: to stack them up, take them down, rearrange them, and put them in boxes.
The problem does not derive from the inadequacy of sensors and actuators alone. The key issue is that human thinking is predominantly inexact. This inexactness is, however, essential for the management of real-world systems: it underlies the crucial ability to summarize data and focus on decision-relevant information. It is also the very opposite of what computers do. Thus, special AI techniques are needed to imitate the human being. Alan Turing, one of the early prominent figures in the field of AI, was among the first to consider the philosophical issues of AI, e.g. the definition and criterion of intelligence. Fifty years later, these questions still have no final answer. Many believe that important attributes of intelligence are self-learning and self-awareness. The self-learning problem is partly solved by specifically designed (mostly supervised) learning algorithms that allow AI programs to improve themselves, although essentially only the most primitive learning tasks can be solved that way. Self-awareness is believed to be an emergent property, i.e. similarly to critical mass, it will appear once a sufficient amount of mass (intelligence) has been accumulated.

Of the AI techniques to emerge during the last 50 years, two stand out. Artificial neural networks (ANN), which are biologically inspired, as is much of AI, are based on a loose analogy to the presumed workings of a brain and share some important characteristics with the brain. First, as its name suggests, a neural network consists of a network of at least partially connected, simple processing elements. In the biological brain, each processing element is called a neuron. These biological neurons have a body (the soma, which contains the nucleus), a set of dendrites, an axon, and a set of synaptic buttons. The components of an artificial neuron are direct analogs of the components of an actual neuron.
Each of the inputs (dendrites in an actual neuron) is modified by a weight whose function is analogous to that of the synaptic junction in a biological neuron. The processing element consists of two parts. The first part simply sums the weighted inputs, whereas the second part is a nonlinear filter, usually called the activation function (most typically a threshold or sigmoidal function), through which the combined signal flows. These artificial neurons are usually organized into a sequence of layers. Neural networks perform two major functions: learning and recall. Recall is the process of accepting an input stimulus and producing an output response in accordance with the network weight structure (the weights of the network represent "distributed knowledge"). Learning is the process of adapting the connection weights to produce the desired output vector in response to a stimulus vector presented to the input buffer. Typically the learning is supervised, i.e. another stimulus is presented at the output buffer, representing the desired response to the given input. Note also that recall is an integral part of the learning process, since the desired response must be compared to the actual output of the network in order to create an error function.
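The two-part processing element and the recall and learning functions described above can be sketched for a single neuron as follows. This is only a minimal illustration under stated assumptions: the function names, the learning rate and the use of the delta rule (one gradient step on the squared error) are choices made here for illustration and are not taken from the thesis.

```python
import math


def sigmoid(z):
    """Sigmoidal activation function (the nonlinear filter)."""
    return 1.0 / (1.0 + math.exp(-z))


def threshold(z):
    """Threshold activation function, the other typical choice."""
    return 1.0 if z >= 0.0 else 0.0


def recall(inputs, weights, activation=sigmoid):
    """Recall: sum the weighted inputs, then pass the sum through the activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    combined = sum(w * x for w, x in zip(weights, inputs))
    return activation(combined)


def learn_step(inputs, weights, desired, rate=0.5):
    """One supervised learning step for a sigmoid neuron (delta rule, assumed here).

    Recall is part of learning: the actual output is computed first and
    compared with the desired response to form the error.
    """
    actual = recall(inputs, weights, sigmoid)
    error = desired - actual
    # Derivative of the sigmoid at the current output is actual * (1 - actual).
    delta = error * actual * (1.0 - actual)
    return [w + rate * delta * x for w, x in zip(weights, inputs)]


stimulus = [1.0, 0.5]          # hypothetical input vector
weights = [0.2, -0.4]          # hypothetical initial weights
before = recall(stimulus, weights)
weights_after = learn_step(stimulus, weights, desired=1.0)
after = recall(stimulus, weights_after)
```

With these hypothetical numbers the weighted sum is zero before training, so the sigmoid output sits at its midpoint, and a single learning step moves the output towards the desired response.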
However, if the workings (including the learning processes) of the human brain are to be simulated using ANN, drastic simplifications must be adopted. For the engineer, ANNs are just design techniques that draw inspiration from the workings of the brain; they are not meant to simulate the brain. A present-day artificial neural network is very simple compared to an actual brain, is not self-aware and does not "think"! At this point one might ask: where is the point? The answer is a somewhat unexpected paradox: much "expert" adult thinking is basically much simpler than what happens in a child's ordinary play! It can be harder to be a novice than to be an expert. This is because, sometimes, what an expert needs to know and do can be quite simple; it may only be very hard to discover, or learn, in the first place.

Another important AI technique is fuzzy logic. Whereas ANNs simulate the physical aspect of the human brain, fuzzy logic, a multi-valued logic as opposed to Aristotelian logic, imitates the thinking model of humans. Fuzziness is a property of language. Fuzzy logic is used for reasoning about inherently vague or uncertain concepts and provides a representation scheme and a calculus for dealing with them. At bottom, fuzzy logic is a generalization of classical (or Aristotelian) logic, in which a concept can possess a degree of truth anywhere between 0.0 and 1.0. Aristotelian logic applies only to concepts that are completely true (having degree of truth 1.0) or completely false (having degree of truth 0.0). Such generalization makes possible the manipulation of terms such as "large," "warm," and "fast," which can simultaneously be seen to belong partially to two or more different, contradictory sets of values. Most applications of fuzzy logic utilize it as the underlying logic system for fuzzy (expert) systems that use a collection of fuzzy membership functions and fuzzy IF-THEN rules, instead of Boolean logic, to reason about data. This could be compared to a very high-level programming language, where the program consists of IF-THEN rules and compilation or interpretation results in a nonlinear inference algorithm.

The inventor of fuzzy logic, L.A. Zadeh, originally devised the technique as a means for solving problems in the soft sciences, particularly those that involved interactions between humans, and/or between humans and machines. It is interesting to note that the actual applications of fuzzy logic are far afield from Zadeh's original notion of help for the soft sciences: they lie not in high-level artificial intelligence but rather in low-level machine control. Fuzzy logic is especially useful for situations in which conventional logic technologies are not effective, such as systems and devices that cannot be precisely described by mathematical models, those that have significant uncertainties or contradictory conditions, and linguistically controlled devices or systems.
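The comparison of a fuzzy system to a program made of IF-THEN rules can be made concrete with a small sketch. Everything specific in it is an assumption made for illustration: the temperature terms, the triangular membership functions, the three rules, and the weighted-average inference are hypothetical choices, not the inference scheme developed later in the thesis.

```python
def triangle(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms for a temperature in degrees Celsius.
TERMS = {
    "cold": (-10.0, 5.0, 20.0),
    "warm": (5.0, 20.0, 35.0),
    "hot": (20.0, 35.0, 50.0),
}

# The "program": IF temperature is <term> THEN heater power is <value> percent.
RULES = [("cold", 100.0), ("warm", 40.0), ("hot", 0.0)]


def degrees_of_truth(temperature):
    """Degree of truth, between 0.0 and 1.0, of each linguistic term."""
    return {name: triangle(temperature, *abc) for name, abc in TERMS.items()}


def infer(temperature):
    """Combine the rule conclusions, weighted by how true each premise is."""
    truth = degrees_of_truth(temperature)
    total = sum(truth[term] for term, _ in RULES)
    if total == 0.0:
        raise ValueError("no rule applies to this input")
    return sum(truth[term] * value for term, value in RULES) / total


truth_at_14 = degrees_of_truth(14.0)
power_at_14 = infer(14.0)
```

In this sketch a temperature of 14 degrees is partly "cold" and partly "warm" at the same time, and the resulting heater power lies between the conclusions of the two active rules. The mapping from temperature to power is nonlinear overall even though each rule is a simple statement.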
Today, while some scientists attempt to create life-like robots and others create life, or something very close to it, by using computers to program "organisms" that can "move", "see", "feed", "reproduce", and "die", the AI-inspired techniques mentioned above are already in use. Over the last twenty years, machines incorporating neural networks and/or fuzzy logic have earned their living by making scientific, medical and financial decisions. This particular branch of AI has grown so prominent that a special term, computational intelligence, has been coined for it. In a broader view, computational intelligence can be considered but the first way station on the road to human-friendly integrated AI systems.

Fuzzy logic and neural networks may, at first glance, seem to have very little in common: the former is a generalized "multi-valued logic", while the latter is a structure consisting of one or more small, interconnected processing elements. Neural networks can perform effective function approximation, but how any individual weight contributes to the system output is unclear: the ANN obtained from the learning process cannot be interpreted, so we cannot check whether the solution is plausible. For the same reason, we cannot initialize an ANN with prior knowledge in any meaningful way; the learning must always start from scratch. Nor can an ANN learn anything without training data. Fuzzy systems, on the other hand, are suitable for incorporating prior knowledge and experience and are transparent to interpretation (under what conditions will be explained in this thesis), but without knowledge they are pretty useless. The differences, however, can be used to advantage. The general idea is to combine the two in a manner that results in the best of both techniques. Some examples that exploit the complementary relationship are:

1. neural networks may be trained to generate membership values for a fuzzy logic membership function;
2. fuzzy logic functions may be used to "fine tune" a neural network training algorithm;
3. fuzzy logic functions or conditionals may encapsulate the input and/or output layers of a neural network, i.e., inputs are "filtered" through a fuzzy function before entering the neural network, and/or outputs are "filtered" through a fuzzy function after being processed by the network;
4. fuzzy logic functions may access data stored within neural network-based (associative) memories;
5. fuzzy conditional statements may be used to activate subsets of a neural network-based system, and vice versa.

Further hybrids of fuzzy logic, neural networks, and other techniques (e.g. genetic algorithms) are possible, depending on what is best for a particular application. Especially fruitful has been the crossover of fuzzy logic and neural networks in which fuzzy systems are trained by a learning algorithm derived from neural network theory. To facilitate that, a fuzzy system is usually represented as a special multilayer (usually five-layer) feedforward neural network. In such "neuro-fuzzy" networks, connection weights and propagation and activation functions differ from those of common neural networks. It must be noted, however, that the cooperation between fuzzy logic and neural networks is usually unidirectional, i.e. neuro-fuzzy systems can be initialized with prior knowledge but quickly lose valid interpretation during training. This is because the learning procedure of a neuro-fuzzy system does not take the semantic properties of the underlying fuzzy system into account. Moreover, there is no general agreement about these "semantic properties" and how exactly they should be taken into account.

This is where the current thesis fits in. It assumes that the most attractive property of fuzzy systems lies in their ability to process information both linguistically and numerically, and that ignoring the linguistic aspect reduces fuzzy logic to another black-box technique with its full potential unused. Linguistic interpretation (when valid) is a rather powerful tool for analyzing numerical data and can be used to obtain useful information about the modeled unknown system. This is just one of many potential applications of transparent fuzzy systems (the term stands for a property that allows valid linguistic interpretation). There is a tradeoff between interpretability and adaptability, which is perhaps one of the reasons why most research has focused on the adaptation properties of neuro-fuzzy systems without giving due weight to transparency.

Fuzzy logic, and fuzzy control in particular, have been subjects of rather harsh criticism from the very beginning. Some have seen this as a typical conflict between a well-established conventional theory and a new emerging paradigm. For instance, the term "fuzzy" has been repeatedly discredited as misleading in itself.
Further accusations include the claim that anything that can be done with fuzzy logic can be done equally well with classical logic and probability theory. Fuzzy control has been criticized for its ad-hoc design method and for the inability to provide stability analysis for fuzzy control. The former problem is somewhat solved by recent developments in the field. Ironically, the lack of stability analysis derives primarily from the fact that fuzzy control techniques enable us to design the controller without a mathematical model, which is generally considered a virtue. Supporters of fuzzy techniques, on the other hand, have sometimes made clearly unrealistic predictions and claims. Two typical claims are that fuzzy control provides more robustness than conventional control and that fuzzy control is more suitable for controlling nonlinear processes. Whether this holds actually depends more on the particular application and the configuration of the controller than on fuzziness, and no general proof of either claim can be provided. The controversy is not helped by the fact that current fuzzy technology is often compared with poor implementations of traditional control technology. The truth is probably somewhere in between: fuzzy logic is certainly not the universal cure for all the troubles of the world, but it is difficult to deny its considerable potential for practical applications, basically because such a design method is closer to human thinking and perception and reduces development time.
1.2 Problem statement

As stated in section 1.1, fruitful cooperation between neural networks and fuzzy logic has been established, but on closer inspection the cooperation appears to be rather unidirectional. Neural network learning algorithms can be applied to fuzzy systems, but the resulting systems are more neural networks than fuzzy systems in the sense that their parameters lose physical meaning and cannot be interpreted correctly. So, it turns out that in most cases we are able to make use of prior knowledge and experience only before training, where it is used with fuzzy logic to obtain a better initial state of the network. We have no further means of checking what has happened to the original knowledge and how it has been modified in the training process. This presents a challenge: could the learning potential of neural networks be used without losing the transparency of the system? The interest is not purely academic. The generalization and abstraction abilities of human beings allow us to control processes that are still beyond the capabilities of automatic control. If mechanisms that preserve transparency could be established, many aspects of fuzzy modeling and control would naturally need revision, which is exactly what is attempted in this thesis. Transparency preservation in self-adaptive control systems could by itself be a topic for another thesis, and it is quite clear that the current work at best answers only some of the questions; many implications of transparency for modeling and control therefore require further consideration. Hopefully, the thesis manages to establish a firm foundation and clear perspectives for further research.
1.3 Original contribution

The main original contributions of the thesis are the definition of fuzzy system transparency and the transparency constraints for fuzzy system parameters derived from this definition. We are, however, able to derive explicit constraints only for standard and 0th order Takagi-Sugeno (TS) systems. Transparency of 1st order TS systems suffers from undesirable interpolation properties, and no meaningful constraints can be derived for the consequent parameters of this type of system. Nevertheless, a solution can be provided in the context of modeling. Our contribution is a highly flexible method that uses rule activation degree exponents to give more importance to relevant rules and to reduce the contribution of irrelevant ones. In the end this results in the identification of local models that admit valid interpretation as local linearizations of the modeled system.
The transparency-accuracy tradeoff is similarly observable with standard and 0th order TS systems, and the possibilities of reducing the gap between transparency and accuracy have been thoroughly explored in this thesis. A gradient-based optimization method that is applicable to standard fuzzy systems is derived (Appendix D). The proposed algorithm uses the interpolation properties of standard fuzzy systems to provide a more efficient transparent approximation. Other contributions are of a complementary and derivative nature. We seek to formalize a methodology that allows us to emulate the control and decision processes of human beings with maximum efficiency. This task requires in-depth research and evaluation of modeling and control techniques. Modeling techniques with potential for transparent modeling are reviewed and their suitability for transparent modeling is evaluated. Modifications of original algorithms are suggested or adopted from the works of other authors where necessary. A hybrid hierarchical control architecture is found especially suitable for the implementation of transparent control. In this architecture, control knowledge is expressed by the IF-THEN rules of the supervisor, and low-level control is carried out by a PID-type controller. The control knowledge can be obtained from experts or through analysis of the model of the controlled process, e.g. by local linguistic inversion. The applications of the truck backer-upper and fed-batch fermentation clearly illustrate the advantages of such control.
1.4 Outline of the thesis

The thesis is organized as follows. The next chapter gives an overview of fuzzy set theory, fuzzy logic (in the narrower sense) and the theory of fuzzy systems to the extent necessary to understand the rest of the thesis. Chapter 3 introduces the concept of transparency and proposes a set of constraints for transparency preservation. Additionally, the interpolation properties of fuzzy systems are reviewed, because the two topics are closely related and because system transparency provides a more systematic approach to the analysis of interpolation in fuzzy systems. The chapter provides the foundation for the rest of the thesis. Chapter 4 lists a variety of existing approximation algorithms with built-in transparency protection and suggests methods for transparency protection with incremental algorithms. Moreover, two new algorithms (in sections 4.9.1 and 4.9.2, respectively) that reduce the gap between transparency and accuracy in fuzzy modeling are introduced. Chapter 5 gives a brief overview of fuzzy control techniques. Of particular interest are the analytical and linguistic inversion techniques of fuzzy systems, which, as shown, greatly benefit from system transparency. The principles of local linguistic inversion are introduced in section 5.4.2. In chapter 6, two control applications that require system transparency for success and make use of a hierarchical architecture of the control system are described, and the control results are provided. The final chapter summarizes the results of the thesis and points out subjects for further research. For the reader's convenience, the background of the particular subtopics (transparency, fuzzy modeling, fuzzy control) is surveyed in the chapter introductions.
2 Fuzzy systems
2.1 Fuzzy sets
L.A. Zadeh (1965) introduced the concept of fuzzy sets and the respective theory, which can be regarded as an extension of classical set theory. In classical set theory an element x is either a member or a non-member of A, a subset of the universe X. The membership µA(x) of x in A is thus given by:
\[ \mu_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases} \tag{2.1} \]
Real life presents a number of situations where crisp membership (2.1) is not flexible enough for the accurate description of sets because it forces an abrupt transition from absolute membership to non-membership. A typical example is the problem where, given the age of a person, we are required to determine whether he or she is young (e.g. in order to calculate some health risk). In other words, the set of young people has to be defined. It seems obvious that people younger than 20 years are unconditionally young, whereas people older than 40 are clearly middle-aged. The age range between 20 and 40 years, however, is a different matter. A gradual transition seems more reasonable here and is implemented by allowing the membership degree to be chosen from the interval [0,1] (Fig. 2.1). In theoretical works fuzzy sets are often represented by sets of ordered pairs
\[ \mu_A(x) = \{\mu_1/x_1,\ \mu_2/x_2,\ \ldots,\ \mu_n/x_n\}, \tag{2.2} \]
where each value of x is paired with its membership value into A. Although such representation is very flexible, allowing arbitrary MF shape, obviously much storage space is required if it comes to practical issues and therefore in application-oriented works functional representation dominates:
\[ \mu_A(x) = f(x) \tag{2.3} \]
Fig. 2.1. Fuzzy set. [Plot of the membership degree µ(age), from 0 to 1, against age over the range 20 to 50.]
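The difference between crisp membership (2.1) and the functional representation (2.3) can be made concrete with a minimal sketch of the set "young". The breakpoints 20 and 40 follow the text; the linear transition between them and the crisp limit of 30 are assumptions made for this illustration only.

```python
def mu_young(age: float) -> float:
    """Fuzzy membership in "young", functional representation as in (2.3).

    Breakpoints 20 and 40 follow the text; the linear transition
    between them is an assumption of this sketch.
    """
    if age <= 20.0:
        return 1.0
    if age >= 40.0:
        return 0.0
    return (40.0 - age) / 20.0


def crisp_young(age: float, limit: float = 30.0) -> float:
    """Crisp membership in the sense of (2.1); the limit is hypothetical."""
    return 1.0 if age <= limit else 0.0


if __name__ == "__main__":
    for age in (18, 25, 30, 35, 45):
        print(age, mu_young(age), crisp_young(age))
```

The crisp version jumps from 1 to 0 at the chosen limit, whereas the fuzzy version assigns intermediate degrees to the ages between 20 and 40.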
2.2 Basic properties of fuzzy sets
Here, only some basic properties of fuzzy sets that are needed to understand the rest of the thesis are given.
a) The height of a fuzzy set A, hgt(A), is defined by
\[ \operatorname{hgt}(A) = \sup_{x \in X} \mu_A(x) \tag{2.4} \]
Fuzzy sets with a height equal to 1 are called normal.
b) The core of a fuzzy set is a crisp subset of X:
\[ \operatorname{core}(A) = \{x \in X \mid \mu_A(x) = 1\} \tag{2.5} \]
Normal, piecewise continuous and convex (see below) fuzzy sets whose core consists of one value only are called fuzzy numbers. In contrast, fuzzy sets that satisfy the first three conditions but whose core consists of more than one value are called fuzzy intervals.
c) The support of a fuzzy set is another crisp subset of X:
\[ \operatorname{supp}(A) = \{x \in X \mid \mu_A(x) > 0\} \tag{2.6} \]
If the support of a fuzzy set is finite, it is called compact support. A convex fuzzy set is characterized by
\[ \forall x_1, x_2, x_3 \in X,\ x_1 \le x_2 \le x_3 \rightarrow \mu_A(x_2) \ge \min(\mu_A(x_1), \mu_A(x_3)). \tag{2.7} \]
Figure 2.2. Height, support and core of a fuzzy set. [Trapezoidal membership function µA(x) with breakpoints a, b, c, d on the x axis; hgt(A) is marked on the vertical axis, core(A) spans b to c and supp(A) spans a to d.]
Fuzzy sets used in applications are generally convex fuzzy numbers or intervals. Most often piecewise linear standard functions are used, such as trapezoid or triangular membership functions (see A.3, A.5 in Appendix A). Trapezoidal membership function (MF) is determined by four parameters a ≤ b ≤ c ≤ d, where a = min(supp(A)), b = min(core(A)), c = max(core(A)), d = max(supp(A)) (Fig. 2.2). Triangular MF can then be considered a special case of described function, with b = c. The second group of MFs are “smooth” like Gaussian MF, determined by two parameters (A.2). Gaussian MF is differentiable and has compact representation.
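The membership functions and properties described above can be sketched as follows. The exact forms (A.2), (A.3) and (A.5) of Appendix A are not reproduced here, so the parameterizations below (in particular the Gaussian with a center and a width sigma) are common textbook forms and should be read as assumptions. Height, core and support are evaluated on a finite grid, which only approximates (2.4)-(2.6).

```python
import math


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF with a <= b <= c <= d: support from a to d, core from b to c."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def triangle(x, a, b, d):
    """Triangular MF as the special case b = c of the trapezoid."""
    return trapezoid(x, a, b, b, d)


def gaussian(x, center, sigma):
    """Gaussian MF with two parameters (assumed parameterization)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def height(mf, grid):
    """Grid approximation of hgt(A), cf. (2.4)."""
    return max(mf(x) for x in grid)


def core(mf, grid):
    """Grid points with full membership, cf. (2.5)."""
    return [x for x in grid if mf(x) == 1.0]


def support(mf, grid):
    """Grid points with nonzero membership, cf. (2.6)."""
    return [x for x in grid if mf(x) > 0.0]


if __name__ == "__main__":
    grid = range(0, 11)
    mf = lambda x: trapezoid(x, 2, 4, 6, 9)
    print(height(mf, grid), core(mf, grid), support(mf, grid))
```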
2.3 Fuzzy partition
Let us assume that we have to solve a problem where the sole definition of the fuzzy set "young" is not accurate enough and all people have to be assigned to sets like young, middle-aged and old. The resulting partition (Fig. 2.3) is called a fuzzy partition and consists of fuzzy sets that are identified through the linguistic labels (terms) assigned to them. There is considerable overlap between the fuzzy sets. The advantage of fuzzy sets over crisp ones becomes clearer here. Partial membership (fuzziness) allows the description of concepts in which the boundary between having a property and not having a property is not sharp (e.g. it would be really difficult to say whether a person of age 60 is old or middle-aged). Moreover, by using fuzzy sets and their linguistic labels we are able to move from numbers to abstractions (or the opposite), which is natural for human beings but is otherwise difficult to formulate mathematically.
Figure 2.3. A fuzzy partition. [The linguistic variable AGE with the linguistic labels (terms) young, middle-aged and old; the corresponding membership functions µA(x) are plotted over the numerical values of the base variable x (age), ranging from 0 to 100.]
It is usually desired that each value of x has a nonzero membership value for at least one fuzzy set:
\[ \forall x \in X,\ \exists i:\ \mu_{A_i}(x) > 0 \tag{2.8} \]
or, in alternative transcription:
\[ \forall x \in X:\ \sum_{s=1}^{S} \mu_{A_s}(x) > 0, \tag{2.9} \]
where S is the number of fuzzy subsets that make up the partition. A partition satisfying (2.9) has the coverage property. A particularly interesting partitioning type (for reasons that become clear in chapter 3) is the one for which
\[ \sum_{s=1}^{S} \mu_{A_s}(x) = 1,\ \forall x \in X, \tag{2.10} \]
often referred to as a fuzzy partition (or Ruspini partition). In a fuzzy partition the total membership of each x is equal to 1, while each x can belong to at most two fuzzy subsets. The partition in Fig. 2.3 appears to be a fuzzy partition. Semantic soundness of the partition is a rather empirically determined quality. A partition can be considered semantically sound if the fuzzy sets that form the partition are convex and normal and "sufficiently" distinct, and if the number of subsets per variable is relatively small (maximum values from 7 to 10 have been suggested (de Oliveira 1999)). Another empirical property, semantic consistency, measures the consistency between the intuitive semantics of linguistic labels and the corresponding fuzzy subsets to which the labels are assigned. Note that the semantics of the linguistic labels is not always utilized (plain labels like "mf1", "mf2", etc., which carry little information, are sometimes used). The semantics of a linguistic item depends on its context; consequently, the semantic consistency of a partition also depends on the context of the problem. Imagine a fuzzy set having nonzero membership for values in the range [30, 40], labeled as "young". The semantic consistency of the partition then depends on the definition of the universal set X, e.g. with X = [30, 100] we have a semantically consistent partition but with X = [0, 40] semantic inconsistency is detected. The linguistic ordering of fuzzy sets also plays a role here, e.g. the center parameter of the fuzzy subset corresponding to the linguistic label "old" should always be greater than the one corresponding to the linguistic label "young".
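Conditions (2.9) and (2.10) are easy to check numerically on a grid. The sketch below does so for a three-term partition of age; the breakpoints of the trapezoids are hypothetical and chosen only so that neighbouring sets cross at membership 0.5. A grid check is a spot check, not a proof over the whole universe X.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal MF with a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical partition of the base variable age over X = [0, 100].
PARTITION = {
    "young": lambda x: trapezoid(x, 0, 0, 20, 40),
    "middle-aged": lambda x: trapezoid(x, 20, 40, 50, 70),
    "old": lambda x: trapezoid(x, 50, 70, 100, 100),
}


def total_membership(partition, x):
    """Sum of memberships of x over all subsets of the partition."""
    return sum(mf(x) for mf in partition.values())


def has_coverage(partition, grid):
    """Coverage property (2.9), checked on the grid points only."""
    return all(total_membership(partition, x) > 0.0 for x in grid)


def is_ruspini(partition, grid, tol=1e-9):
    """Condition (2.10), checked on the grid points only."""
    return all(abs(total_membership(partition, x) - 1.0) <= tol for x in grid)


if __name__ == "__main__":
    grid = range(0, 101)
    print(has_coverage(PARTITION, grid), is_ruspini(PARTITION, grid))
```

Shifting one of the sets (e.g. moving "old" to start at 60) keeps coverage but breaks (2.10), which is the kind of damage that unconstrained tuning of membership function parameters can cause.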
2.4 Operations on fuzzy sets and fuzzy logic
We consider the extensions of the basic set operations known from classical set theory: intersection, union and complement. Unlike in classical theory, these extensions are not uniquely defined, because a membership function can take any value in the interval [0, 1]. The general forms of intersection and union are represented by triangular norms (T-norms) and triangular conorms (T-conorms or S-norms), respectively. A T-norm is a two-place function from [0,1] × [0,1] to [0,1] satisfying the following criteria:
T(a, 1) = a (one identity)
T(a, b) ≤ T(c, d), whenever a ≤ c, b ≤ d (monotonicity)
T(a, b) = T(b, a) (commutativity)
T(T(a, b), c) = T(a, T(b, c)) (associativity)
The conditions defining an S-norm (T-conorm), S: [0,1] × [0,1] → [0, 1], are
S(a, 0) = a (zero identity)
S(a, b) ≤ S(c, d), whenever a ≤ c, b ≤ d (monotonicity)
S(a, b) = S(b, a) (commutativity)
S(S(a, b), c) = S(a, S(b, c)) (associativity)
The complement of a fuzzy set A is defined by a function c satisfying
c(0) = 1, c(1) = 0 (boundary)
c(a) < c(b), whenever a > b (order reversing)
c(c(a)) = a (involution)
The most common t-norms are minimum (2.11) and product (2.12). See also Fig 2.4.
\[ A \cap B = \min(\mu_A(x), \mu_B(x)) \tag{2.11} \]
\[ A \cap B = \mu_A(x)\,\mu_B(x) \tag{2.12} \]
The most common s-norms are maximum (2.13) and probabilistic sum (2.14). See also Fig 2.5.
\[ A \cup B = \max(\mu_A(x), \mu_B(x)) \tag{2.13} \]
\[ A \cup B = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x) \tag{2.14} \]
Figure 2.4. Minimum (left) and product of two fuzzy sets (right).
Figure 2.5. Maximum (left) and probabilistic sum of two fuzzy sets (right).
In applications, interestingly, a plain sum of fuzzy sets (obviously not an s-norm) is a far more common choice than probabilistic sum, although the sum may result in a supernormal fuzzy set with height greater than one. The latter, however, is considered a minor problem in practice (Jager 1995). Typically, the complement of a fuzzy set A is defined as
\[ \bar{A} = 1 - \mu_A(x). \tag{2.15} \]
Fig. 2.6. Complement of a fuzzy set.
As classical set theory serves as the basis for classical logic, fuzzy set theory serves as the basis for fuzzy logic. The set-theoretic operations on fuzzy sets are the basis for fuzzy logical operations: the operations defined for sets (union, intersection and complement) have corresponding logical operations (or, and and not, respectively), which are similarly based on the s-norm, the t-norm and the conditions given for the complement.
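The operators of this section and their defining conditions can be spot-checked numerically. The sketch below implements (2.11)-(2.15) on membership degrees and tests the identity, commutativity, associativity and monotonicity conditions, together with closure on [0, 1], on a coarse grid. The helper name `satisfies_axioms` is hypothetical, and passing the check on a grid is of course weaker than a proof.

```python
import itertools

GRID = [i / 4 for i in range(5)]  # 0, 0.25, 0.5, 0.75, 1


def t_min(a, b):
    return min(a, b)          # (2.11)


def t_prod(a, b):
    return a * b              # (2.12)


def s_max(a, b):
    return max(a, b)          # (2.13)


def s_probsum(a, b):
    return a + b - a * b      # (2.14)


def complement(a):
    return 1.0 - a            # (2.15)


def satisfies_axioms(op, identity, tol=1e-12):
    """Check the norm conditions on GRID.

    identity is 1 for a t-norm and 0 for an s-norm.
    """
    for a, b, c in itertools.product(GRID, repeat=3):
        if not 0.0 <= op(a, b) <= 1.0:                      # closure on [0, 1]
            return False
        if abs(op(a, identity) - a) > tol:                   # identity element
            return False
        if abs(op(a, b) - op(b, a)) > tol:                   # commutativity
            return False
        if abs(op(op(a, b), c) - op(a, op(b, c))) > tol:     # associativity
            return False
        if b <= c and op(a, b) > op(a, c) + tol:             # monotonicity
            return False
    return True


if __name__ == "__main__":
    print(satisfies_axioms(t_min, 1.0), satisfies_axioms(t_prod, 1.0))
    print(satisfies_axioms(s_max, 0.0), satisfies_axioms(s_probsum, 0.0))
    print(satisfies_axioms(lambda a, b: a + b, 0.0))  # plain sum leaves [0, 1]
```

The plain sum fails only the closure check, which corresponds to the supernormal fuzzy sets mentioned above.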
2.5 Fuzzy systems
Fuzzy set theory and fuzzy logic provide the means for constructing fuzzy systems (Zadeh 1973). A fuzzy system consists of a number of rules that specify the linguistic relation between the linguistic labels of the input and output variables of the system. A fuzzy rule (2.16) is a statement where the premise and the consequent consist of fuzzy propositions, i.e. statements like "x is big" that connect a variable with a linguistic label defined for that variable.
IF U1 is A1r AND U2 is A2r … AND Ui is Air … AND UN is ANr
THEN V1 is B1r AND V2 is B2r … AND Vj is Bjr … AND VM is BMr
OR … (2.16)
Air and Bjr denote the linguistic labels of the ith linguistic input variable Ui and the jth linguistic output variable Vj (i = 1 … N, j = 1 … M), respectively, associated with the rth rule (r = 1 … R). With (2.16) the relationship in linguistic terms is given. In order to associate it with numerical values, a special six-step inference algorithm is used. The inference algorithm implements the mapping between the linguistic variables Ui, Vj and the corresponding base variables xi, yj. We assume for a moment that the system under consideration is multi-input/single-output (MISO).
1. The inference mechanism operates on fuzzy sets to produce fuzzy sets. Normally the inputs to the fuzzy system are crisp and thus have to be converted to fuzzy sets. This first step of the inference algorithm, during which the fuzzy representation of the crisp input is created, is therefore called fuzzification. Uncertainty, imprecision or inaccuracy in the inputs can be modeled by using fuzzy numbers (e.g. triangular fuzzification) to represent the inputs. Most often, however, singleton fuzzification is used, with a membership function defined by a fuzzy singleton (A.1), because alternative fuzzification methods add computational complexity to the process and the need for them has not been that well justified.
2. In order to evaluate the premise propositions in numerical terms (proposition matching), the membership value of each input xi in the fuzzy set µir is to be determined.
Proposition matching is defined as τir = hgt(µi' ∩ µir), where µi' is the ith fuzzy input; in the case of singleton fuzzification this reduces to
\[ \tau_{ir} = \mu_{ir}(x_i) \tag{2.17} \]
As the fuzzy singleton can be regarded as just a different representation of a crisp number, with singleton fuzzification the fuzzification procedure is embedded into proposition matching.
3. The operator AND that concatenates the premise propositions obviously corresponds to a t-norm. Using the appropriate t-norm, the activation degree (also degree of fulfillment, firing strength) of a rule is calculated. The procedure is called premise conjunction.
\[ \tau_r = \bigcap_{i=1}^{N} \tau_{ir} \tag{2.18} \]
4. The operator THEN corresponds to implication. In classical logic, implication is defined by A → B = ¬A ∪ B, which in fuzzy logic could be expressed as fuzzy implication using negation (2.15) and the maximum s-norm:
\[ A \rightarrow B = \max(1 - \mu_A(x), \mu_B(x)) \tag{2.19} \]
This (material) implication is, however, rarely used in fuzzy system applications. Usually conjunction (t-norm) is preferred, and the output fuzzy set is thus determined by
\[ F_r(y) = \tau_r \cap \gamma_r, \tag{2.20} \]
where γr denotes the output membership function associated with the rth rule. This preference is basically due to undesirable interpolation properties characteristic of material implication (explored e.g. in (Jager 1995)); (2.20) is also much easier to implement. Babuska (1997) states that while material implication represents the unidirectional relationship "A implies B", t-norm should be interpreted as a nondirectional relationship "it is true that A holds and B holds".
5. The rules are then aggregated by using an s-norm for the aggregation operator OR. Note that with (2.19) a t-norm must be used (Jager 1995).
\[ F(y) = \bigcup_{r=1}^{R} F_r(y) \tag{2.21} \]
Fuzzification (2.17), proposition matching, premise conjunction (2.18), implication (2.20) and aggregation (2.21) result in the fuzzy output of the system:
\[ F(y) = \bigcup_{r=1}^{R} \left( \left( \bigcap_{i=1}^{N} \mu_{ir}(x_i) \right) \cap \gamma_r \right) \tag{2.22} \]
Consider a two-input/single-output fuzzy system for illustration (Fig. 2.7).
Fig 2.7. Steps of inference algorithm and corresponding linguistic operators. [Two rules, "IF U1 is A11 AND U2 is A21 THEN V1 is B1" OR "IF U1 is A12 AND U2 is A22 THEN V1 is B2", annotated with the inference steps: proposition matching for each proposition, premise conjunction for AND, implication for THEN and aggregation for OR.]
When it comes to multi-input/multi-output (MIMO) systems, the AND operator in the rule consequent is not treated as logical and (implemented by a t-norm); instead, each fuzzy output F(yj) is evaluated independently by
\[ F(y_j) = \bigcup_{r=1}^{R} \left( \left( \bigcap_{i=1}^{N} \mu_{ir}(x_i) \right) \cap \gamma_{jr} \right) \tag{2.23} \]
The latter implies that any MIMO fuzzy system can be decomposed to M MISO fuzzy systems. 6. Because in practice we deal with crisp rather than fuzzy values, the crisp representation of the fuzzy output must be derived finally. This is normally done by averaging technique called defuzzification (inverse operation to fuzzification). Two common defuzzification methods are center-of-gravity (CoG) and mean-of-maxima (MoM). CoG (often referred to as center-of-area defuzzification in the case of onedimensional sets) is actually the same method employed to calculate the center of gravity of a mass. The difference is that point masses are replaced by the membership values.
\[ Y_{cog}(F(y)) = \frac{\int_Y y\,F(y)\,dy}{\int_Y F(y)\,dy} \tag{2.24} \]
In practice CoG is usually applied in discrete form:
\[ Y_{cog}(F(y)) = \frac{\sum_{q=1}^{Q} F(y_q)\,y_q}{\sum_{q=1}^{Q} F(y_q)}. \tag{2.25} \]
Mean-of-maxima defuzzification belongs to the class of indexed (or threshold) defuzzification methods that discard the part of a fuzzy output where membership values are below a certain threshold level. For MoM defuzzification only the part of the fuzzy output that yields maximum membership is taken into account:
\[ Y_{mom}(F(y)) = \frac{1}{q} \sum_{j \in J^*} y_j, \tag{2.26} \]
where J* denotes the set of indices of the points at which F(y) attains its maximum and q is the number of its elements. Putting (2.22) into (2.25) we obtain
\[ y = Y_{cog}(F(y)) = \frac{\left[ \bigcup_{r=1}^{R} \tau_r \cap \Gamma_r \right] \cdot Y^T}{\left[ \bigcup_{r=1}^{R} \tau_r \cap \Gamma_r \right] \cdot \mathbf{1}}, \tag{2.27} \]
where Γr = [γr(y1) γr(y2) ... γr(yq) ... γr(yQ)], Y = [y1 y2 ... yq ... yQ] and 1 is a unitary column vector of Q elements. Note that with product chosen as t-norm and sum chosen as s-norm
\[ y = Y_{cog}(F(y)) = \frac{\left[ \sum_{r=1}^{R} \tau_r \cdot \Gamma_r \right] \cdot Y^T}{\left[ \sum_{r=1}^{R} \tau_r \cdot \Gamma_r \right] \cdot \mathbf{1}} = \frac{\sum_{r=1}^{R} \tau_r \sum_{q=1}^{Q} \gamma_r(y_q)\,y_q}{\sum_{r=1}^{R} \tau_r \sum_{q=1}^{Q} \gamma_r(y_q)} = \frac{\sum_{q=1}^{Q} \sum_{r=1}^{R} \tau_r\,\gamma_r(y_q)\,y_q}{\sum_{q=1}^{Q} \sum_{r=1}^{R} \tau_r\,\gamma_r(y_q)} \tag{2.28} \]
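The two defuzzification methods can be sketched on a sampled fuzzy output as follows. CoG follows the discrete form (2.25); MoM is read here as the average of the points yq at which F attains its maximum. The function names and the sample values are hypothetical.

```python
def cog(ys, F):
    """Discrete center-of-gravity defuzzification, cf. (2.25)."""
    den = sum(F)
    if den == 0:
        raise ValueError("fuzzy output is empty (all memberships are zero)")
    return sum(f * y for y, f in zip(ys, F)) / den


def mom(ys, F):
    """Mean-of-maxima: average of the points where F is maximal."""
    peak = max(F)
    maxima = [y for y, f in zip(ys, F) if f == peak]
    return sum(maxima) / len(maxima)


if __name__ == "__main__":
    ys = [0.0, 1.0, 2.0, 3.0, 4.0]   # sampled output domain
    F = [0.0, 0.5, 1.0, 1.0, 0.0]    # sampled fuzzy output F(y_q)
    print(cog(ys, F), mom(ys, F))
```

CoG takes the whole shape of F into account, whereas MoM ignores everything below the maximum, so the two results generally differ for asymmetric outputs.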
For reasons that become clearer in the next chapter, two special cases of (2.28) are of special interest to us:
(i) output MFs are fuzzy singletons (A.1);
(ii) output MFs are symmetrical triangles (A.4).
It can be shown (see Appendix B for details) that in the first case (2.28) reduces to
\[ y = \frac{\sum_{r=1}^{R} \tau_r b_r}{\sum_{r=1}^{R} \tau_r}, \tag{2.29} \]
whereas in the second case
\[ y = \frac{\sum_{r=1}^{R} \tau_r b_r s_r}{\sum_{r=1}^{R} \tau_r s_r}. \tag{2.30} \]
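The reduction of (2.28) to (2.30) can be checked numerically. In the sketch below, br is taken to be the center of the rth output triangle and sr its half-width; these readings are assumptions (Appendix B is not reproduced here), but since sr enters numerator and denominator alike, any quantity proportional to the width gives the same result. The grid is chosen so that the triangles are sampled symmetrically.

```python
def singleton_output(tau, b):
    """Singleton case (2.29): weighted mean of the positions b_r."""
    return sum(t * br for t, br in zip(tau, b)) / sum(tau)


def triangle_output(tau, b, s):
    """Symmetric-triangle case (2.30): rules additionally weighted by s_r."""
    num = sum(t * br * sr for t, br, sr in zip(tau, b, s))
    den = sum(t * sr for t, sr in zip(tau, s))
    return num / den


def sym_triangle(center, half_width):
    """Symmetric triangular output MF (assumed parameterization)."""
    return lambda y: max(0.0, 1.0 - abs(y - center) / half_width)


def prod_sum_cog(tau, gammas, ys):
    """Direct evaluation of (2.28) on the grid ys."""
    num = sum(t * sum(g(y) * y for y in ys) for t, g in zip(tau, gammas))
    den = sum(t * sum(g(y) for y in ys) for t, g in zip(tau, gammas))
    return num / den


if __name__ == "__main__":
    tau = [0.2, 0.8]                       # hypothetical activation degrees
    b, s = [2.0, 6.0], [1.0, 2.0]          # hypothetical centers and half-widths
    ys = [i * 0.25 for i in range(41)]     # grid over [0, 10]
    gammas = [sym_triangle(c, w) for c, w in zip(b, s)]
    print(prod_sum_cog(tau, gammas, ys), triangle_output(tau, b, s))
    print(singleton_output(tau, b))
```

The simplified forms avoid the summation over the Q grid points altogether, which is where their computational advantage comes from.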
A popular representation of fuzzy systems is depicted in Fig. 2.8, where (to summarize what we have established) rule base stores a set of logical IF-THEN rules defined on the system variables, data base stores a set of MFs of fuzzy labels of rules used in the rule set. These two bases can be regarded as the linguistic layer (or knowledge base) of the system.
Fig. 2.8. A generic fuzzy system. [Block diagram with the blocks Fuzzifier, Fuzzy Inference Engine, Defuzzifier and Knowledge Base (Rule Base, Data Base).]
Inference layer consists of fuzzifier (that converts a set of crisp variables into a set of fuzzy variables to enable the application of logical rules), fuzzy inference engine (that is an algorithm that calculates the extent to which each rule is activated and combines these into fuzzy system output) and defuzzifier (that converts a set of fuzzy variables into crisp values in order to enable the output of the fuzzy system to be applied to another non-fuzzy system). Fuzzy inference engine, rule base and data base can be regarded as the reasoning block of a fuzzy system.
2.6 Rule base properties
In (2.16)-(2.23) we refer to linguistic labels (and the respective fuzzy subsets) by their occurrence in the rth rule, which probably gives the impression that each fuzzy subset defined for a particular linguistic variable is used only once in the rule base. Generally this is not the case: the number of "slots" in rules exceeds the number of unique linguistic labels and a fuzzy subset is usually associated with several rules. Assuming that each input variable Ui is partitioned into Si fuzzy subsets, each output variable Vj is partitioned into Tj fuzzy subsets and the fuzzy system consists of R rules, a separate structure that defines the mapping between rule-oriented notation and variable-oriented notation is needed. In MATLAB Fuzzy Logic Toolbox, for example, the information is stored in an R × (N + M) matrix, each element of which, mrp, is the index of the membership function of either an input (if p ≤ N) or an output (if p > N) variable associated with the rth rule.
\[ \begin{bmatrix} m_{11} & \cdots & m_{1p} & \cdots & m_{1,M+N} \\ \vdots & & \vdots & & \vdots \\ m_{r1} & \cdots & m_{rp} & \cdots & m_{r,M+N} \\ \vdots & & \vdots & & \vdots \\ m_{R1} & \cdots & m_{Rp} & \cdots & m_{R,M+N} \end{bmatrix} \tag{2.31} \]
The maximum number of rules of a system is given by
\[ R_{max} = \prod_{i=1}^{N} S_i. \tag{2.32} \]
The comparison of the actual number of rules with Rmax is a good indicator of whether the rule base is properly defined:
a) R < Rmax implies that one or several rules possible with the given input partition are undefined (incomplete rule base).
b) R > Rmax implies that there are several rules with equivalent antecedents that are associated with
• same consequent subsets, resulting in a redundant rule base;
• unique consequent subsets, resulting in an inconsistent rule base.
c) R = Rmax is usually the desired situation.
Variable-oriented transcription of the fuzzy inference algorithm (2.33) is also possible, assuming that R = Rmax and that each output fuzzy set γr is associated with one rule only (which means that the number of output MFs is equal to R). At first glance this would seem impractical, but such a combinatorial rule base is actually quite common in fuzzy modeling because of the lack of adequate rule training algorithms.
\[ F(y) = \bigcup_{j_1=1}^{S_1} \bigcup_{j_2=1}^{S_2} \cdots \bigcup_{j_N=1}^{S_N} \left( \left( \bigcap_{i=1}^{N} \mu_i^{j_i}(x_i) \right) \cap \gamma_r \right) \tag{2.33} \]
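The rule base checks above can be automated on the index matrix (2.31). The sketch below is slightly stricter than comparing R with Rmax alone: it groups the rules by antecedent, so that a rule base with both a missing and a duplicated antecedent is not mistaken for a complete one. The function name, the dictionary keys and the example rule base are hypothetical.

```python
from collections import defaultdict
from math import prod


def analyse_rule_base(rules, n_inputs, subsets_per_input):
    """Classify a rule base given as rows of the matrix (2.31).

    rules: list of rows [m_r1, ..., m_r,N+M] of MF indices.
    subsets_per_input: [S_1, ..., S_N].
    """
    r_max = prod(subsets_per_input)                 # (2.32)
    by_antecedent = defaultdict(list)
    for row in rules:
        by_antecedent[tuple(row[:n_inputs])].append(tuple(row[n_inputs:]))
    return {
        "R": len(rules),
        "R_max": r_max,
        # some antecedent combination has no rule
        "incomplete": len(by_antecedent) < r_max,
        # same antecedent and same consequent appear more than once
        "redundant": any(len(c) != len(set(c)) for c in by_antecedent.values()),
        # same antecedent with different consequents
        "inconsistent": any(len(set(c)) > 1 for c in by_antecedent.values()),
    }


if __name__ == "__main__":
    # Hypothetical two-input, single-output rule base with S1 = S2 = 2.
    rules = [[1, 1, 2], [1, 2, 1], [2, 1, 3], [2, 2, 2]]
    print(analyse_rule_base(rules, 2, [2, 2]))
```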
Because of the difficulties in fuzzy system modeling some authors (e.g. (Tong 1978), (Kosko 1992a)) have employed rule weights. According to this strategy, each rule is assigned a rule weight Wr ∈ [0,1] that is involved in the calculation of the rth rule's output (2.20) and is said to express the relevance, credibility or probability of the rule:
\[ F_r(y) = W_r \tau_r \cap \gamma_r. \tag{2.34} \]
In most cases the sum of the weights of the rules with equivalent antecedents is required to be equal to 1. A weightless fuzzy system (2.22) can then be regarded as a special case of weighted fuzzy systems, with Wr = 1 for all r. There are three basic reasons why rule weights should be avoided.
1) The number of rules is increased by an order of magnitude:
\[ R_{max} = T \prod_{i=1}^{N} S_i. \]
2) Interpretation of the rules is made difficult, partly because of the increased number of rules, partly because no good explanation as to how to interpret these weights exists.
3) Any tuning action achieved by adjusting the weights can also be accomplished by modifying membership function parameters.
A more detailed discussion about this issue is available in (Nauck and Kruse 1998).
2.7 Inference examples
According to Sections 2.5 and 2.6, the inference algorithm that establishes the numerical mapping between the fuzzy system variables consists of six steps. To apply the algorithm, the association between the rule-oriented notation (that indexes the fuzzy subsets in respect to their occurrence in the rth rule) used for convenience in description of the algorithm and variable-oriented notation (that numbers the fuzzy subsets in respect to which system variable they belong) must be created. Let us present a simple and illustrative example. Let the system have two inputs and single output, let there be two subsets for both inputs and three for the output variable. The non-redundant, consistent and complete rule base would then be (in variable-oriented notation):
1. IF U1 is A11 AND U2 is A21 THEN V is B2
2. IF U1 is A11 AND U2 is A22 THEN V is B1
3. IF U1 is A12 AND U2 is A21 THEN V is B3
4. IF U1 is A12 AND U2 is A22 THEN V is B2
Thus, the variable-to-rule mapping matrix (2.31) appears as
\[ \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 2 & 2 \end{bmatrix}. \]
It is possible to depict this fuzzy system as a network structure (Fig. 2.9). Each layer of the network represents the respective step of the inference algorithm. In fuzzification/proposition matching layer the input membership function parameters are stored. The arrows that indicate fuzzy data flow are bold, crisp data flow is depicted with normal lines. The mapping (2.31) is determined by connections between the 1st and 2nd layer and the connections between output MFs and implication layer.
Fig. 2.9. Network representation of a fuzzy system. [Layers from left to right: inputs x1, x2; proposition matching (µ11, µ12, µ21, µ22); premise conjunction (∩); implication (∩, with the output MFs γ1, γ2, γ3); aggregation (∪), yielding F(y).]
We observe how output value y is inferred for the given input values x1 = x1* , x 2 = x 2* when i) inference operators for conjunction, implication and aggregation are minimum, minimum and maximum, respectively and center-of-gravity defuzzification is used (Fig. 2.10) ii) product-product-sum inference is combined with mean-of-maxima defuzzification (Fig.2.11).
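Both inference schemes can be sketched for the four-rule example. The rule indices are those of the mapping matrix (2.31) given for this example; the membership functions, however, are hypothetical (the input subsets form a simple partition of [0, 1], and the output subsets are symmetric triangles centered at 0, 1 and 2), and the output domain is discretized into 201 points.

```python
# Rows of the mapping matrix (2.31) for the four-rule example:
# (index of x1 subset, index of x2 subset, index of output subset).
RULES = [(1, 1, 2), (1, 2, 1), (2, 1, 3), (2, 2, 2)]

YS = [q / 100 for q in range(201)]  # discretized output domain [0, 2]


def in_mf(j, x):
    """Hypothetical input MFs on [0, 1]: subset 1 is 1 - x, subset 2 is x."""
    return 1.0 - x if j == 1 else x


def out_mf(k, y):
    """Hypothetical output MFs B1, B2, B3: triangles centered at 0, 1, 2."""
    return max(0.0, 1.0 - abs(y - (k - 1)))


def cog(ys, F):
    """Discrete center-of-gravity defuzzification (2.25)."""
    return sum(f * y for y, f in zip(ys, F)) / sum(F)


def mom(ys, F):
    """Mean-of-maxima: average of the points where F is maximal."""
    peak = max(F)
    maxima = [y for y, f in zip(ys, F) if f == peak]
    return sum(maxima) / len(maxima)


def product(a, b):
    return a * b


def plain_sum(a, b):
    return a + b


def infer(x1, x2, conj, impl, aggr, defuzz):
    """Steps 2 to 6 of the inference algorithm with singleton fuzzification."""
    F = [0.0] * len(YS)
    for j1, j2, k in RULES:
        tau = conj(in_mf(j1, x1), in_mf(j2, x2))          # (2.17), (2.18)
        F = [aggr(f, impl(tau, out_mf(k, y)))             # (2.20), (2.21)
             for f, y in zip(F, YS)]
    return defuzz(YS, F)


def min_max_cog(x1, x2):
    return infer(x1, x2, min, min, max, cog)


def prod_sum_mom(x1, x2):
    return infer(x1, x2, product, product, plain_sum, mom)


if __name__ == "__main__":
    for x in [(0.5, 0.5), (0.0, 1.0), (0.2, 0.7)]:
        print(x, min_max_cog(*x), prod_sum_mom(*x))
```

Swapping the operator arguments of `infer` is enough to obtain other combinations, e.g. product-sum inference with CoG defuzzification.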
Fig 2.10. Min-max inference with COG defuzzification. [One row of panels per rule: the input MFs A11, A12 over x1 and A21, A22 over x2, evaluated at x1* and x2* (membership values µ11(x1), µ12(x1), µ21(x2), µ22(x2)), and the output MFs B1, B2, B3 over y with the activation degrees τ1 to τ4 marked; the final panel shows the aggregated output and the crisp result y*.]
Fig 2.11. Prod-sum inference with MOM defuzzification. [One row of panels per rule: the input MFs A11, A12 over x1 and A21, A22 over x2, evaluated at x1* and x2* (membership values µ11(x1), µ12(x1), µ21(x2), µ22(x2)), and the output MFs B1, B2, B3 over y with the activation degrees τ1 to τ4 marked; the final panel shows the aggregated output and the crisp result y*.]
2.8 Takagi-Sugeno fuzzy systems
The fuzzy systems observed so far belong to the class of standard (also linguistic, Mamdani) fuzzy systems. Standard fuzzy systems appear particularly useful when the human-machine interface is under observation, because it is the linguistic nature of the system that makes the information stored in the fuzzy system intuitively understandable and, vice versa, gives us the possibility to implement our knowledge about the system. On the other hand, there is an acknowledged lack of efficient data-driven modeling algorithms that could be applied to standard fuzzy systems. Not satisfied with the situation, Takagi and Sugeno (1985) came up with an alternative rule format (2.35) in order to make automated tuning possible and to reduce the number of fuzzy rules needed to model a system.
IF U1 is A1r AND U2 is A2r … AND Ui is Air … AND UN is ANr
THEN yr = p0r + p1rx1 + … + pirxi + … + pNrxN (2.35)
In Takagi-Sugeno (TS) rules the consequent fuzzy proposition is replaced by an affine linear function of the inputs, and each rule can be considered a local linear model; the local models are then blended together by means of aggregation to form the overall output y. The redefinition of the fuzzy system influences the 4th step of the inference algorithm, implication:
\[ F_r(y) = \tau_r \cap y_r = \tau_r \cap \left( p_{0r} + \sum_{i=1}^{N} p_{ir} x_i \right). \tag{2.36} \]
The 6th step, defuzzification, is also modified. With TS systems the implication and aggregation operators are product and sum, respectively, using which center-of-gravity defuzzification reduces to an algorithm that is known as fuzzy c-means defuzzification (FcM). FcM, in fact, combines the aggregation and defuzzification into one operation and is thus more than a defuzzification method (Jager 1995).
\[ y = Y_{fcm}(F(y)) = \frac{\sum_{r=1}^{R} \tau_r \left( p_{0r} + \sum_{i=1}^{N} p_{ir} x_i \right)}{\sum_{r=1}^{R} \tau_r} \tag{2.37} \]
A special case of the consequent function where the offset p0r = 0, r = 1...R, results in a homogeneous TS system. Another and particularly interesting special case of TS systems is obtained if the consequent function is a constant (pir = 0 for all i = 1…N, r = 1...R); thus (2.35) reduces to (2.38) and (2.37) reduces to (2.39).
IF U1 is A1r AND U2 is A2r … AND Ui is Air … AND UN is ANr
THEN yr = p0r (2.38)
\[ y = Y_{fcm}(F(y)) = \frac{\sum_{r=1}^{R} \tau_r p_{0r}}{\sum_{r=1}^{R} \tau_r} \tag{2.39} \]
It is easy to see the complete equivalence between singleton standard fuzzy systems (2.29) and 0th order TS systems (2.39). Sometimes FcM defuzzification is applied to standard fuzzy systems so that before performing the weighted sum, each output fuzzy set is represented by its numerical representation br, which is normally chosen to be the center of gravity of the given output set. This, however, is more or less equivalent to replacing the fuzzy system with the corresponding singleton system (or 0th order TS system) because output MF parameters other than their crisp representation have no influence on the system output. Another logical conclusion is that 0th order TS systems (as opposed to ordinary, 1st order TS systems) retain linguistic interpretability in the manner of standard fuzzy systems while possessing those attractive properties of TS systems that open the way for automated determination of system parameters from data. Like standard fuzzy systems, TS systems can also be depicted as a network structure but in this case the analogy with neural networks is even more obvious (Fig. 2.12). In particular, equivalence between 0th order TS systems and radial basis neural networks has been shown (Jang 1993b).
Fig. 2.12. 1st order TS system in network representation. [Inputs x1, x2; membership functions µ11, µ12, µ21, µ22; product nodes (Π); nodes labeled N; consequent functions p11x1 + p21x2 + p01 and p12x1 + p22x2 + p02; summation node (Σ) producing y.]
This equivalence means that many techniques (both in modeling and control) developed for neural networks can be adopted by TS systems.
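The TS inference formula (2.37) and its 0th order special case (2.39) can be sketched as follows. The function name, the parameter layout (one tuple (p0r, p1r, ..., pNr) per rule) and the numerical values are hypothetical; the activation degrees τr are assumed to have been computed beforehand by proposition matching and premise conjunction.

```python
def ts_output(x, taus, params):
    """Output of a TS system according to (2.37).

    x:      input values [x_1, ..., x_N]
    taus:   activation degrees [tau_1, ..., tau_R]
    params: one tuple (p_0r, p_1r, ..., p_Nr) per rule
    """
    num = 0.0
    for tau, p in zip(taus, params):
        y_r = p[0] + sum(p_i * x_i for p_i, x_i in zip(p[1:], x))  # rule consequent
        num += tau * y_r
    return num / sum(taus)


if __name__ == "__main__":
    x = [1.0, 2.0]
    taus = [0.25, 0.75]

    # 1st order system: local linear models y1 = 1 + 2*x1, y2 = x2.
    print(ts_output(x, taus, [(1.0, 2.0, 0.0), (0.0, 0.0, 1.0)]))

    # 0th order system (2.39): all slopes are zero, only p_0r remains.
    print(ts_output(x, taus, [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]))
```

With all slopes set to zero the computation coincides with the weighted mean of singletons in (2.29), which is the equivalence noted in the text.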
2.9 Design of fuzzy systems
Two major applications of fuzzy systems are fuzzy modeling and fuzzy control. Two general sources of information for building fuzzy systems are the prior knowledge and data (measurements). Prior knowledge can be of a rather approximate nature that usually originates from "experts", e.g. process designers or operators, who are asked to express their knowledge in the form of fuzzy rules. Hence, such fuzzy systems can be regarded as fuzzy expert systems. For many processes, data about the process operation is recorded in a daily routine. If this is not the case, special experiments can be designed to obtain the relevant data. Building fuzzy systems from data involves special algorithms designed for that task. The acquisition or tuning of fuzzy systems by means of data is usually termed fuzzy identification.
There is certain parallelism with classical system modeling - knowledge based design is somewhat analogous to first principle modeling while fuzzy identification belongs to the same class with statistical methods used in system identification. In classical modeling quite often a combined approach is used where we use physics to write down a general differential equation that is believed to represent the system behavior and then experiments are performed to determine certain system parameters or functions. Similar combined approach is widely used in fuzzy system design.
Design of fuzzy systems may be seen as a general algorithm consisting of six steps (Yager and Filev, 1994):
i) Selection of the input and output variables;
ii) Selection of the appropriate reasoning mechanism for the formalization of the fuzzy model;
iii) Determination of the universes of discourse;
iv) Determination of the linguistic labels into which the variables are partitioned;
v) Formation of the set of linguistic rules that represent the relationships between the system variables;
vi) Evaluation of system adequacy.
Quite often this algorithm results in a preliminary fuzzy system only, if during the system evaluation phase it is revealed that the performance index (e.g. root-mean-square error) of the system is not what was expected. Steps i-v can then be regarded as the structure determination of the fuzzy system, and a further parameter identification phase is needed during which the membership function parameters obtain supposedly optimal values. If input-output data that reflects the optimal behavior of the system is supplied, it is natural to apply data-driven techniques, observed in more detail in chapter 4. In some cases, however, we need to return to the beginning.
With complex systems, it is not always clear which variables should be used as inputs to the model. Prior knowledge, insight into process behavior and the purpose of modeling are the typical sources of information for this choice. For the selection of the appropriate reasoning mechanism (including the specification of system type, inference operators, defuzzification method and MF types) the deciding factors are again the purpose of modeling and the type of available knowledge. The identification algorithms may also play a role here: with derivative-based identification algorithms, for example, the fuzzy inference algorithm must be differentiable and thus the inference operators are predetermined. Computational cost may also be an issue, because some inference schemes are computationally more expensive than others, e.g. CoG defuzzification compared to MoM or FcM (in this sense, simplified inference algorithms such as (2.30) are important). Not the least of the deciding factors are the interpolation properties of the system, which are determined by the reasoning mechanism. Fuzzy system design is quite application-dependent and an exact design algorithm cannot be defined. The general guidelines, however, provide a reliable framework for the design of fuzzy systems.
2.10 Summary In this chapter the basics of fuzzy set theory and fuzzy logic were considered (sections 2.1-2.4) that appear as extensions to crisp set theory and Aristotelian logic, respectively, and serve as the basis for building fuzzy systems. The presented material constitutes only a small part of the huge body of fuzzy set theory and fuzzy logic but there is more than enough in order to understand the rest of the thesis. Fuzzy systems allow the processing of information in linguistic terms that is expressed in the form of IF-THEN rules and is built on the analogy with human reasoning. Besides the linguistic layer, information processing takes also place at numerical level, using the special inference algorithm. Inference algorithm is a six-step procedure with a large degree of flexibility (there exists a large family of inference operators and fuzzification and defuzzification methods) that creates unique input-output mapping between the system (base) variables. This unique architecture of fuzzy systems makes them useful for man-machine interaction problems and makes possible to use human experience and knowledge usually expressed in vague terms otherwise difficult to implement. In addition to purely linguistic fuzzy systems where all variables are partitioned into fuzzy sets, there exists another form of fuzzy rules where consequent part is a linear function of inputs known as Takagi-Sugeno rules (see section 2.8). TS systems have become increasingly popular because their inference algorithm is mathematically less complex and allows acquisition of control/modeling techniques from other fields of research of more analytical character. 28
Transparent fuzzy systems: modeling and control
Particularly attractive are 0th order TS systems (which at the same time can be regarded as singleton standard fuzzy systems) because of their intuitively understandable rule base and computationally inexpensive inference algorithm. Similarly attractive are the inference properties of standard fuzzy systems with symmetrical triangular MFs. Fuzzy system design issues were only briefly considered here; we return to them in the following chapters, specifically in chapter 4, which is dedicated to fuzzy identification, i.e. the acquisition of fuzzy models from training data.
3 Interpolation and transparency in fuzzy systems
3.1 Transparency and interpretability
The use of the term transparency in the present work is based on (Brown and Harris 1994), where transparency is defined as a property that enables us to understand the influence of each system parameter on the system output, as well as on (Setnes et. al. 1998), where fuzzy systems are characterized as being transparent to interpretation. Fuzzy system transparency is closely related to the concept of linguistic interpretability, but these are not matching terms and, in our opinion, it is very important to see the distinction. Interpretability is a property of fuzzy systems that exists by default, being established with linguistic rules and the fuzzy sets associated with these rules; even the rules of 1st order TS systems can be interpreted. Transparency, on the other hand, is not a default property of fuzzy systems; it is a measure of how valid or how reliable the linguistic interpretation of the system is. It will be shown in this chapter that for standard fuzzy systems and 0th order TS systems transparency has a binary character; for 1st order TS systems it is a continuous variable. Most authors, however, do not make this distinction; some of them do not pay attention to transparency at all and consequently assume that transparency, like interpretability, is a default property of fuzzy systems (sometimes regarded as characteristic of standard and 0th order TS systems only, as in (Nauck et. al.
1996); others do emphasize that transparency of fuzzy systems is not guaranteed by default (Yin 2000), (Babuska 2000) but use the two terms in parallel. There is yet another aspect of the problem that sometimes gets mixed up with transparency of fuzzy systems: the readability of fuzzy rules, which basically boils down to the overall complexity of the system. Improving readability by using a moderate number of variables, rules and fuzzy subsets, or by avoiding inconsistencies in the rule base, is undoubtedly useful but has little in common with transparency as understood in this thesis. We concentrate on low-level transparency that grows out of conformity between the linguistic layer and the inference layer of a fuzzy system. This conformity enables us to "see" through the inference layer and is the precondition for making fuzzy systems both predictable and reliable in their behavior. In fact, very few authors (Lotfi et. al. 1996), (Oliveira 1999), (Yin 2000), (Babuska 2000) have investigated the latter issue in any detail. The most important of these works is perhaps (Oliveira 1999), which lists a set of properties (moderate number of MFs; natural zero positioning, normality, coverage and distinguishability of MFs) that fuzzy systems should meet and proposes mathematically formulated constraints for preserving the last two, incorporated into the cost function of the gradient descent learning algorithm. These works dealing with low-level transparency, however, aim for a certain balance between transparency and accuracy, and the results can generally be applied only to a limited class of systems/algorithms. On the other hand, there are even fewer works concentrating on the transparency problem of 1st order TS systems (Yen et. al. 1998), (Bikdash 1999), (Fiordaliso 2000). Our aim is therefore to unite all these efforts into a general definition of fuzzy system transparency.
It is claimed that "currently there exists no well-established definition of transparency of a fuzzy system" and "there are no definite criteria for the distinguishability of a fuzzy partition" (Yin 2000). Hopefully, the solutions proposed to these problems in the present chapter help to fill the void. Once the transparency conditions for fuzzy systems are defined, the interpolation properties of fuzzy systems can be revised in a more systematic manner. Interpolation and transparency can be regarded as two sides of a coin; therefore it would be unreasonable to ignore the interpolation aspect in the present chapter. The transparency conditions can be easily satisfied if fuzzy systems are obtained through manual design. The key problem in fuzzy modeling (and control) is that transparency is generally lost when fuzzy systems are identified from data. Transparency conditions serve as the basis for establishing the transparency protection mechanisms that are discussed in the next chapter.
It is also important to point out that although specific interest in fuzzy system transparency is not very prominent in academic circles, there are and always have been authors who use transparent fuzzy systems (according to the general definition proposed in this thesis) in their research, not to mention the everyday practitioners of fuzzy logic control. E.g. (Setnes et. al. 1998) and (Jager 1995) are listed here as important sources of inspiration. As the conclusion to this introduction it must be stressed that transparency is certainly not a universal requirement for fuzzy systems. When the fuzzy system is used as a black box and its interpretation is the least of the concerns of its end user, the transparency aspect can be freely ignored. It must be noted, however, that system transparency is generally a useful property that provides additional means for control system or model validation and, in some cases, facilitates the application of transparency-based control methods. One of the aims of the thesis is to demonstrate this through the applications.
3.2 Transparency of standard fuzzy systems
Let us consider the properties listed in (Oliveira 1999). It is arguable whether coverage and natural zero positioning have anything to do with transparency (Babuska 2000). Normality, on the other hand, is the standard assumption in fuzzy systems. Distinguishability of input MFs, which is in turn directly related to the overlap of input MFs, is, however, vital to transparency, as shown in the following. Note that the conclusions are valid for 0th order TS systems, too. The overlap of input MFs is also one of the most important factors influencing interpolation in fuzzy systems. It is reported (Shaw 1998) that a suggested minimum of 25% and a maximum of 75% have been established experimentally. Frequently, 50% overlap is a reasonable compromise. The effect of overlap on the interpolation can be most conveniently observed in two-dimensional space, which we do by constructing five otherwise equivalent SISO fuzzy systems, made up of 6 rules, with 0%, 25%, 50%, 75% and 100% overlap degree, respectively. Although the other system parameters (including minimum t-norm, maximum s-norm and CoG defuzzification) remain the same, in each case quite a different result is obtained (Fig. 3.1). With 0% overlap, no interpolation occurs, hence the system is actually non-fuzzy and its output abruptly switches from one rule centroid to another. With 25% overlap the input intervals for which the output has a constant value are still present, but some interpolation between the neighboring rules occurs. With 50% overlap, the interval where the system output is the explicit contribution of the given rule is reduced to a single point. With larger overlap, however, at least two rules contribute simultaneously for any given input, thus the system output is always the result of interpolation. This makes the contribution of a given rule invisible in the system output. We suggest that such a feature is not a desirable one. The phenomenon is driven to the extreme with 100%
overlap where all rules are fully activated simultaneously and system output has a constant value, equal to the centroid of the union of output fuzzy sets.

[Figure 3.1. Left: system output y versus x for 0%, 25%, 50%, 75% and 100% overlap. Right: input MFs over x for each overlap degree. Rule base shown in the figure:
IF x is mf1 THEN y is mf3
IF x is mf2 THEN y is mf1
IF x is mf3 THEN y is mf2
IF x is mf4 THEN y is mf4
IF x is mf5 THEN y is mf5
IF x is mf6 THEN y is mf3]

Fig. 3.1. Overlap degree of input MFs (right) and its influence on system output (left).
Let us consider again the case of 50% overlap and refer to the point in input-output space where the rule under observation is fully activated and the output is the explicit contribution of that rule as the transparency checkpoint. When the overlap is equal to or smaller than 50%, transparency checkpoints do exist. Closer inspection reveals that the input coordinate of the transparency checkpoint is equal to the center of the fired MF (where µ(x) = 1). Building on the analogy, the desired output y at the transparency checkpoint would also be the center of the respective output MF, where γ(y) = 1. This ensures that the interpretation of the rule that we obtain by combining the information from the rule base and the MF definition base corresponds well to the inferred numerical values. This is what we call transparency. The idea of transparency checkpoints extends to MISO and MIMO systems and is covered by the following definition.

Definition: the rth rule of the standard MIMO fuzzy system (2.27) is transparent if its activation degree

$$\tau_r = \bigcap_{i=1}^{N} \mu_{ir}(x_i) = 1, \qquad (3.1)$$

results in system output

$$y_j = b_{jr}, \quad j = 1, \dots, M, \qquad (3.2)$$

where $b_{jr}$ is the center of the output MF $\gamma_{jr}$ associated with the activated rule.
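The definition can be illustrated numerically. The following is a minimal sketch, assuming a SISO 0th order TS system with triangular input MFs and weighted-mean inference (for which the conclusions of this section also hold); the function names, the partition and the singleton values are illustrative assumptions and are not taken from the thesis.

```python
def tri(x, a, b, c):
    """Triangular MF with support (a, c) and core at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def ts0_output(x, mfs, singletons):
    """Weighted-mean output of a SISO 0th order TS system."""
    taus = [tri(x, *mf) for mf in mfs]
    return sum(t * p for t, p in zip(taus, singletons)) / sum(taus)


def is_transparent(mfs, singletons, tol=1e-9):
    """Check (3.1)-(3.2): at the center of each MF (tau_r = 1)
    the output must equal the consequent of that rule."""
    for (_, b, _), p in zip(mfs, singletons):
        if abs(ts0_output(b, mfs, singletons) - p) > tol:
            return False
    return True


# Hypothetical six-rule system on the input range [0, 10].
centers = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
singletons = [1.0, -3.0, -1.0, 2.0, 3.0, 1.0]

# Supports reach exactly to the neighboring centers (the 50% overlap case).
half_overlap = [(b - 2.0, b, b + 2.0) for b in centers]
# Supports extend past the neighboring centers (larger overlap).
wide_overlap = [(b - 3.0, b, b + 3.0) for b in centers]
```

With the first partition every rule has a transparency checkpoint; with the wider one the neighboring rules are still active at each MF center, so the output there is an interpolated value and the check fails.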
[Figure 3.2: system output y versus x, with input MFs mf1-mf6 along the x axis and output MFs mf1-mf5 along the y axis.]

Fig. 3.2. Transparency checkpoints, depicted by markers.
A standard fuzzy system (2.27) can be regarded as transparent only if all its rules are transparent (Fig. 3.2). In order to preserve input transparency (3.1) with triangular input MFs (4.35), that is, to guarantee the existence of transparency checkpoints, the following conditions apply:

$$c_{i,s-1} \le b_{is} \le a_{i,s+1}, \quad i = 1, \dots, N; \; s = 2, \dots, S_i - 1. \qquad (3.3)$$

In order to preserve output transparency in case of CoG defuzzification (2.24), (3.4) must be satisfied:

$$\frac{\int_{y_{min}}^{y_{max}} y\,\gamma_{jr}(y)\,dy}{\int_{y_{min}}^{y_{max}} \gamma_{jr}(y)\,dy} = b_{jr} \qquad (3.4)$$

(3.4) implies that output MFs must be symmetrical. Note that with MoM defuzzification, however, (3.4) would not be necessary. Next we generalize the transparency conditions so that they can be applied universally to other types of MFs. A more general formulation of (3.3) is as follows:

$$\forall x \in X: \; \sum_{s=1}^{S_i} \mu_{is}(x_i) \le 1 \qquad (3.5)$$

Note that if the sum in (3.5) is strictly equal to 1, a fuzzy partition (2.12) is established.
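Conditions (3.3) and (3.5) are straightforward to test for a given input partition. The sketch below assumes triangular MFs stored as (a, b, c) tuples ordered by their centers; the helper names and the grid-based check of (3.5) are illustrative assumptions rather than part of the thesis.

```python
def tri(x, a, b, c):
    """Triangular MF with support (a, c) and core at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def satisfies_3_3(mfs):
    """Condition (3.3) for one input: c_{s-1} <= b_s <= a_{s+1}
    for every interior MF s = 2, ..., S-1."""
    for s in range(1, len(mfs) - 1):
        c_prev = mfs[s - 1][2]
        b_cur = mfs[s][1]
        a_next = mfs[s + 1][0]
        if not (c_prev <= b_cur <= a_next):
            return False
    return True


def max_membership_sum(mfs, lo, hi, steps=1000):
    """Largest sum of membership degrees on a uniform grid over [lo, hi];
    (3.5) requires this value not to exceed 1."""
    best = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        best = max(best, sum(tri(x, *mf) for mf in mfs))
    return best


centers = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
partition = [(b - 2.0, b, b + 2.0) for b in centers]   # satisfies both conditions
too_wide = [(b - 3.0, b, b + 3.0) for b in centers]    # violates both conditions
```

The grid check is only a sampled approximation of the "for all x" in (3.5); for piecewise linear MFs it is sufficient to include the MF breakpoints in the grid.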
(3.4) rewritten in general form:

$$\Upsilon_{cog}(\gamma_{jr}(y_j)) = \frac{\int_{y_{min}}^{y_{max}} y\,\gamma_{jr}(y)\,dy}{\int_{y_{min}}^{y_{max}} \gamma_{jr}(y)\,dy} = \mathrm{core}(\gamma_{jr}(y_j)) \qquad (3.6)$$
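The role of symmetry in (3.4) and (3.6) can be checked numerically. The sketch below approximates the centroid of a single output MF with a midpoint rule; the function names and the two example MFs are assumptions chosen for illustration only.

```python
def tri(y, a, b, c):
    """Triangular MF with support (a, c) and core at b."""
    if y <= a or y >= c:
        return 0.0
    if y <= b:
        return (y - a) / (b - a)
    return (c - y) / (c - b)


def cog(mf, y_min, y_max, n=20000):
    """Midpoint-rule approximation of the centroid in (3.4)."""
    h = (y_max - y_min) / n
    num = 0.0
    den = 0.0
    for k in range(n):
        y = y_min + (k + 0.5) * h
        m = mf(y)
        num += y * m
        den += m
    return num / den


def symmetric(y):
    return tri(y, 1.0, 2.0, 3.0)     # core at 2, symmetrical


def asymmetric(y):
    return tri(y, 1.0, 2.0, 5.0)     # core at 2, longer right slope
```

For the symmetrical MF the centroid coincides with the core, so (3.4) holds; for the asymmetrical one the centroid of the triangle is (1 + 2 + 5)/3, which is shifted away from the core, and output transparency under CoG is lost.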
It must be taken into account that with several MF types such as Gaussian (A.2), (3.5) cannot be achieved because of non-compact support. This means that in order to achieve transparency, input MFs must satisfy certain conditions that follow. With fuzzy number-like MFs, defined by three parameters a, b and c, the following conditions must be satisfied:

$$a \le b \le c, \quad a = \min(\mathrm{supp}(A)), \quad b = \mathrm{core}(A), \quad c = \max(\mathrm{supp}(A)) \qquad (3.7)$$

With fuzzy interval-like MFs, defined by four parameters a, b, c and d, the following conditions must be satisfied:

$$a \le b \le c \le d, \quad a = \min(\mathrm{supp}(A)), \quad b = \min(\mathrm{core}(A)), \quad c = \max(\mathrm{core}(A)), \quad d = \max(\mathrm{supp}(A)) \qquad (3.8)$$
If the use of smooth MFs is prescribed, then a possible choice is a spline-based MF satisfying (3.8), such as the square spline (A.6) or the cubic spline (A.7).

[Figure 3.3: membership degree µ(x) versus x, with the MF parameters a, b, c and d marked on the x axis.]

Fig. 3.3. Comparison of cubic and square spline based MFs.

Fig. 3.3 demonstrates that the actual numerical difference between cubic and square spline based MFs is quite small.
3.3 Interpolation in standard systems
If a standard fuzzy system is transparent, we are able to predict its output at transparency checkpoints. Between these points, however, the output is the result of interpolation that takes place between individual rules. The nature of interpolation is determined by fuzzy system parameters: defuzzification method, inference operators and shape of membership functions. In the next few sections these factors are addressed separately.
3.3.1 Role of defuzzification
We observe the influence of the basic defuzzification methods on fuzzy system output by using the SISO system from section 3.2 with 50% overlap, leaving all other parameters intact. The most obvious is the effect of MoM defuzzification, which results in stepwise output. This is the reason why MoM, although computationally inexpensive, is seldom used in modeling, where we usually expect smooth interpolation between the transparency checkpoints. With this method, the system is also insensitive to all other parameters that otherwise influence the interpolation, and the multi-level relay that a fuzzy system with MoM defuzzification amounts to could just as well be implemented with classical set theory (if the output fuzzy sets are symmetrical).

[Figure 3.4: system output y versus x.]

Fig. 3.4. Output interpolated by MoM (normal), FcM (bold) and CoG (dashed) defuzzification methods.
As noted in section 2.8, FcM defuzzification transforms the original system to the 0th order TS system and the resulting interpolation between the transparency checkpoints is linear. The latter may be regarded a desirable property.
Finally, in case of CoG, interpolation results in a curve, and its exact shape is determined by other parameters, most notably by the relative magnitude of the output fuzzy sets. If two output MFs are of equal size, the interpolated output curves around the linear interpolation, intersecting it at the midpoint. If one of the output MFs is larger than the other, then the interpolation is "drawn" in the direction of the larger set, as shown in Fig. 3.5.

[Figure 3.5: three panels, supp(A) < supp(B), supp(A) = supp(B) and supp(A) > supp(B), each showing the output MFs A and B with centers bA and bB on the y axis and the interpolated output between bA and bB as a function of x.]

Fig. 3.5. Interpolation with CoG defuzzification (linear interpolation between the transparency checkpoints is depicted by dashdot).
3.3.2 Role of MF type
On the basis of their shape, MFs can be divided into the following subcategories:
a) piecewise linear MFs (i.e. triangular and trapezoidal MFs)
b) smooth MFs (spline-based MFs)
Another classification is based on whether the core of the MF is a single point or not:
a) fuzzy numbers (triangular MF, 3-parameter spline-based MF)
b) fuzzy intervals (trapezoid MF, 4-parameter spline-based MF)
[Figure 3.6: two panels of system output y versus x (above) with the corresponding input MFs mf1-mf6 (below).]

Fig. 3.6. Use of trapezoid or smooth MFs instead of triangular ones and its influence on interpolation.

[Figure 3.7: two panels of system output y versus x with curves labeled "minimum" and "product" (above); output MFs mf1 and mf2 over y (below).]

Fig. 3.7. Product vs. minimum implication with sum aggregation (above left), product vs. minimum implication with maximum aggregation (above right), output MFs (below).
With fuzzy intervals, a zone of insensitivity forms around the transparency checkpoint. The length of the zone is proportional to the size of the core of the contributing MF (rule). The effect is quite similar to the one we experienced with overlap smaller than 50%. The interpolation outside these transparency zones deviates significantly from the original interpolation (Fig. 3.6, left). If smooth MFs are used, additional non-linearity (Fig. 3.6, above right) is introduced. As with fuzzy intervals in the previous case, the deviation from the original interpolation depends on the deviation of the input MFs from the original input partition, being proportional to the support of the contributing fuzzy set.

3.3.3 Role of inference parameters

It is interesting to note that inference parameters influence the interpolation significantly only if CoG defuzzification is used. MoM was already found to be insensitive to all other parameters of fuzzy systems from the interpolation viewpoint; with FcM the following characteristics occur:
a) output membership functions are crisp, thus $\tau_r \cdot p_{0r} \equiv \min(\tau_r, p_{0r})$;
b) $\sum_{r=1}^{R} \tau_r \cdot p_{0r} \equiv \max(\tau_1 \cdot p_{01}, \dots, \tau_R \cdot p_{0R})$ if the output singletons $p_{0r}$ do not match, which is generally true for 0th order TS systems because they are usually used in a configuration where each rule is assigned a unique singleton $p_{0r}$ (combinatorial rule base).
Thus, we conclude that the nature of interpolation in the case of 0th order TS systems depends little on inference operators (the premise conjunction operator plays a small role). With CoG defuzzification, both aggregation and implication operators have an impact on interpolation. Note that the output MFs in Fig. 3.7 (below) are not equal in size and that they overlap (that is where aggregation by sum and by maximum differs). It is clear that product implication provides smoother interpolation, and that with maximum aggregation the interpolated output deviates more from the linear interpolation than with sum aggregation.

3.3.4 Interpolation in multidimensional space
Although the conclusions about interpolation issues based on observations made on SISO systems can basically be generalized to MISO systems, some substantial differences exist. E.g. the interpolation that is linear in SISO 0th order TS systems is not linear in multidimensional space (Fig. 3.8). The reason is simple: the output that is interpolated from four neighboring rules (transparency checkpoints) is not linear because a planar surface is determined by
three freely chosen points. When generalized, it turns out that the difference between the number of neighboring rules and number of points that defines linear interpolation increases with every extra input added and linear interpolation is possible only if those two are equal, i.e. in SISO case (Table 3.1).
[Figure 3.8: interpolated output surface y over the inputs x1 and x2.]

Fig. 3.8. Interpolation between 4 rules in a MISO 0th order TS system.

Table 3.1. Why linear interpolation occurs only in 2-dimensional space

No. of inputs    No. of transparency checkpoints    No. of points defining the linear interpolation
1                2                                  2
2                4                                  3
3                8                                  4
…                …                                  …
N                2^N                                N+1
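The second row of Table 3.1 can be reproduced with a few lines of code. The sketch below assumes a two-input 0th order TS system with four rules located at the corners of the unit square, complementary linear MFs on each input and product conjunction; all names and singleton values are illustrative assumptions.

```python
def ts0_two_inputs(x1, x2, p):
    """Output of a 4-rule 0th order TS system on the unit square.
    p[i][j] is the singleton of the rule whose checkpoint is the corner (i, j)."""
    mu1 = [1.0 - x1, x1]          # MFs of input 1 (fuzzy partition on [0, 1])
    mu2 = [1.0 - x2, x2]          # MFs of input 2
    num = 0.0
    den = 0.0
    for i in range(2):
        for j in range(2):
            tau = mu1[i] * mu2[j]  # product conjunction (assumption)
            num += tau * p[i][j]
            den += tau
    return num / den


def plane_through_three(x1, x2, p):
    """Plane defined by the three checkpoints (0,0), (1,0) and (0,1)."""
    return p[0][0] + (p[1][0] - p[0][0]) * x1 + (p[0][1] - p[0][0]) * x2


# Hypothetical singletons; the fourth checkpoint (1, 1) does not lie on the plane.
singletons = [[0.0, 1.0],
              [1.0, 3.0]]
# Same system, but with the fourth singleton chosen on the plane.
planar = [[0.0, 1.0],
          [1.0, 2.0]]
```

Three checkpoints fix the plane, so the fourth singleton can be matched only if it happens to lie on it; otherwise the interpolated surface is curved, as in Fig. 3.8.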
3.4. Interpolation in 1st order TS systems
We defined transparency conditions for standard and 0th order TS fuzzy systems which guarantee that for the full activation of any given rule we are able to predict system output correctly. For each rule such a transparency checkpoint can be easily found. Beyond those checkpoints system output is the result of interpolation that occurs between the individual rules; the nature of the interpolation depends on many system features, including defuzzification method, type and shape of MFs and inference operators, but remains predictable. Violation of the transparency conditions results in a non-transparent system and predictability is lost. The nature of the transparency conditions implies that transparency of these kinds of systems is of a binary nature. Interpolation in 1st order TS systems differs significantly from that in standard or 0th order TS systems. Here each rule by itself represents a linear relationship
between the system variables. The overall output is a combination of those local linear models by means of interpolation.

[Figure 3.9: two panels showing output y versus x for two local models (above) and the input MFs mf1 and mf2 with the intersection point q marked on the x axis (below).]

Fig. 3.9. V-type interpolation (left) and S-type interpolation (right).
The interpolation issues of TS systems have been analyzed in (Babuska et. al. 1994) and (Babuska et. al. 1996), which distinguish S-type and V-type interpolation. The type of interpolation depends on the coefficients of the local model $y_r$ (Fig. 3.9). If the intersection point q of two interpolating local models falls into the interpolation area ($d_i < q < c_{i+1}$), V-type interpolation is the case. Otherwise, S-type interpolation occurs. Basic conclusions about the different interpolation types are summarized in Table 3.2.

Table 3.2. Comparison of V- and S-type interpolation.

Interpolation type      Interpolation properties        Application area
S-type interpolation    Intuitively expected results    Stepwise and possibly discontinuous function approximation
V-type interpolation    Some undesirable properties     Continuous, smooth function approximation
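For a SISO system the distinction can be sketched in a few lines. The code below assumes two neighboring local models of the form y = p0 + p1·x and takes the bounds of the interpolation area as explicit arguments; the function name, the treatment of parallel models and the example coefficients are assumptions made for illustration.

```python
def interpolation_type(p0_i, p1_i, p0_j, p1_j, lo, hi):
    """Classify the interpolation between two local models
    y = p0_i + p1_i * x and y = p0_j + p1_j * x.

    lo, hi: bounds of the interpolation area between the two rules.
    Returns "V" if the models intersect inside (lo, hi), otherwise "S".
    """
    if p1_i == p1_j:
        # Parallel (or identical) models have no single intersection
        # point; treated here as S-type (an assumption of this sketch).
        return "S"
    q = (p0_j - p0_i) / (p1_i - p1_j)   # intersection point of the two lines
    return "V" if lo < q < hi else "S"
```

For example, the models y = x and y = 2 - x intersect at q = 1; with an interpolation area of (0.5, 1.5) this gives V-type interpolation, while y = x and y = -4 + 2x intersect at q = 4, outside that area, which gives S-type interpolation.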
According to Table 3.2, neither of the interpolation types has a clear advantage over the other. In (Babuska et. al. 1996), however, preference seems to be given to the V-type, and the weighted-mean defuzzification algorithm is replaced by another functional, the smoothing maximum. This replacement can be considered a
deviation from the "classic" TS inference algorithm, and is not accepted here because of the computational complexity of the smoothing maximum. The distinction between V- and S-type interpolation is given in general terms by Babuska. According to it, the interpolation between a pair of affine rules ($R_i$, $R_j$) is of the V-type if and only if

$$\Omega_{ij} \cap S_{ij} \ne \emptyset \quad \text{and} \quad \Omega_{ij} \cap (C_i \cup C_j) = \emptyset, \qquad (3.9)$$

where $\Omega_{ij}$ denotes the intersection of the consequents of $R_i$ and $R_j$ projected on x, the vector of input variables; $S_{ij}$ denotes the support of the intersection of affine membership functions associated with these rules and $C_i$, $C_j$ denote the cores of the respective affine rules. With clear preference given to V-type interpolation, (3.9) should be maintained for all rules throughout the training process. We observe how to apply (3.9) to TS systems with a single input and then with two inputs to illustrate the complexity of the problem. First, let us consider a SISO TS system. Assuming that we are employing four-parameter input MFs ($\mu(a) = \mu(d) = 0$, $\mu(b) = \mu(c) = 1$, $a \le b \le c \le d$), (3.9) is satisfied if for two neighboring local models $y_i = p_{0i} + p_{1i}x$ and $y_{i+1} = p_{0,i+1} + p_{1,i+1}x$ the following holds:
ci
δ ,
(4.21)
k =1
as the rule validation criterion. (4.21) is not the optimal solution to the singularity problem because a different value of δ results in a different number of rules, and usually we do not know beforehand which value of δ is optimal. A more general way to overcome the problem is to use the recursive Kalman filter, which finds the mean square estimate of the solution to (4.20) sequentially:

$$\theta(l+1) = \theta(l) - P(l+1)\Phi^T(l+1)\,(y - \Phi(l)\theta(l)), \quad l = 0, \dots, L-1, \qquad (4.22)$$

where

$$P(l+1) = \frac{P(l)\Phi^T(l+1)\Phi(l+1)P(l)}{1 + \Phi(l+1)P(l)\Phi^T(l+1)}. \qquad (4.23)$$
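A sequential estimate of this kind can be sketched with the widely used textbook form of recursive least squares (gain vector, parameter update, covariance update). The sign and index conventions of that textbook form are an assumption of this sketch and may differ from (4.22)-(4.23) as printed; the function names, the initial value of P and the singleton values are likewise illustrative.

```python
def tri(x, a, b, c):
    """Triangular MF with support (a, c) and core at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def regressor(x, mfs):
    """Normalized activation degrees of all rules for input x."""
    taus = [tri(x, *mf) for mf in mfs]
    total = sum(taus)
    return [t / total for t in taus]


def rls_fit(rows, targets, n_params, p_init=1e6):
    """Textbook recursive least squares, one data pair at a time."""
    n = n_params
    theta = [0.0] * n
    P = [[p_init if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in zip(rows, targets):
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(n))
        gain = [v / denom for v in Pphi]
        err = y - sum(phi[i] * theta[i] for i in range(n))
        theta = [theta[i] + gain[i] * err for i in range(n)]
        # P stays symmetric, so Pphi also serves as phi^T P here.
        P = [[P[i][j] - gain[i] * Pphi[j] for j in range(n)] for i in range(n)]
    return theta


centers = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
mfs = [(b - 2.0, b, b + 2.0) for b in centers]
true_p = [0.333, -3.5, -1.5, 2.0, 3.0, 0.333]   # illustrative singletons

# Training data without any sample that activates the fifth rule.
xs = [k / 10.0 for k in range(101) if tri(k / 10.0, *mfs[4]) == 0.0]
rows = [regressor(x, mfs) for x in xs]
targets = [sum(f * p for f, p in zip(row, true_p)) for row in rows]
estimate = rls_fit(rows, targets, len(mfs))
```

Because the regressor entry of the fifth rule is zero for every sample, its parameter is never updated and keeps its initial value of zero, while the remaining singletons are recovered from the data.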
The Kalman filter is characterized by fast convergence, which is due to the adaptive learning rate determined by the R × R matrix P. Application of (4.20) or, alternatively, (4.22)-(4.23) effectively minimizes the root-mean-squared error and results in the extraction of optimal consequent parameters for the given input partition and the given data set. The properties of the partition and of the data set (e.g. data scarcity, presence of noise) have, however, a serious influence on the modeling error and, moreover, on the validity of the model, which is demonstrated through the following examples. We are mostly concerned with the last issue (i.e. validity of the model). First we observe the generalization properties of the model obtained by least squares estimation (i.e. how accurate the model reaction is when it is presented with new data that was not included in the training data set). Let us compare two approaches (removal of rules not supported by data and the use of the Kalman filter) by using the 0th order TS system to generate output values for x ∈ [0, 10], discretized with the step 0.1. Our next step is to remove from the training data set all data for which τ5 > 0, i.e. {x ∈ X | min(supp(mf5)) < x < max(supp(mf5))}. With this, the given potential rule is no longer supported by data. With data missing from the training data set, the algorithm is not able to discover the true input-output relationship in the poorly defined region. Each approach has, however, a different solution to this problem. The first one removes the fifth rule and identifies five singletons for the five remaining rules. Recursive
algorithm preserves all rules but assigns a zero value to the one for which there was no data. The identified consequent constants are listed in Table 4.3.

Table 4.3. Parameters of the identified models.

Rule no.    p0r of model 1    p0r of model 2
1           0.333             0.333
2           -3.5              -3.5
3           -1.5              -1.5
4           2                 2
5           (rule removed)    0
6           0.333             0.333

[Figure 4.7: two panels of system output y versus x.]

Fig. 4.7. Generalization properties of the scarce model obtained with LSE (left). Output of the Kalman filter is depicted with circles, system with a rule removed with crosses. The figure at right depicts the situation where there exists a considerable gap in the identification data (between the 3rd and 4th rule) and the identified output singleton introduces a large approximation error (bold line) for unseen data (dashed line).
When presented with data that was not included in the training data set, model 1 produces stepwise interpolation between the neighboring rules in the underdetermined region. An extra problem is faced if x equals the value where τ5 = 1 (the transparency checkpoint of the missing rule) and therefore no rule is activated. Software packages tend to produce the value that is the average of the range of the output variable in such a case. Model 2, on the other hand, uses the zero value that was assigned to the 5th rule (Fig. 4.7, left). Neither solution can be considered very good, but model 1 is more predictable in the sense that the output value (apart from the transparency checkpoint) is defined using neighboring rules. In the second case there may exist a large difference between the zero value and the actual output range, and system output would be strongly biased in such a case. It is important to note that with systems where the overlap of input MFs is greater, the described problem is not that acute. Larger overlap simply ensures
that the number of rules not supported by data is low. From the approximation point of view, non-transparent fuzzy systems therefore have a clear advantage over transparent ones in the given problem. The described situation is not the only shortcoming of the least squares method when using scarce data. Potentially even more dangerous is the situation where a relatively small portion of data is missing and the input partition is near-optimal. We observe the case where the peak of the 4th input MF is shifted to the right by 4% of the input range and 26 points of data between the 4th and 5th rule are removed. Training results in a model with zero error, but the output consequent for the underdetermined region obtains a value that is well outside the working range of the system, and when the model is presented with unseen data, substantial deviation from the original system is observed (Fig. 4.7, right). Presence of noise in the data set does not significantly alter the result if it is reasonably distributed, because the method minimizes the mean error, but even a single point that strongly deviates from the general pattern (e.g. a false measurement) has a dramatic influence on the modeling result (Fig. 4.8).

[Figure 4.8: system output y versus x, with one data point annotated "False measurement".]

Fig. 4.8. The effect of noise and false measurements on LSE. Original relationship (normal line), approximated relationship (bold line).
Finally, we need to derive the algorithm for 1st order TS systems. If the output parameter vector for a 1st order TS system is in the following form

$$\theta = [p_{01}, p_{02}, \dots, p_{0R}, p_{11}, p_{12}, \dots, p_{1R}, \dots, p_{N1}, \dots, p_{NR}]^T \qquad (4.24)$$

and

$$\Phi = \begin{bmatrix}
\phi_1(1) & \phi_2(1) & \dots & \phi_R(1) & \phi_1(1)x_1(1) & \dots & \phi_r(1)x_i(1) & \dots & \phi_R(1)x_N(1) \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
\phi_1(k) & \phi_2(k) & \dots & \phi_R(k) & \phi_1(k)x_1(k) & \dots & \phi_r(k)x_i(k) & \dots & \phi_R(k)x_N(k) \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
\phi_1(K) & \phi_2(K) & \dots & \phi_R(K) & \phi_1(K)x_1(K) & \dots & \phi_r(K)x_i(K) & \dots & \phi_R(K)x_N(K)
\end{bmatrix} \qquad (4.25)$$

Then (4.24) is obtained by using (4.20).
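The layout of one row of (4.25) can be sketched as follows. The function names are illustrative, and the normalized activation degrees are assumed to be already computed for the sample; multiplying the row by a parameter vector ordered as in (4.24) gives the model output for that sample.

```python
def design_row(phi, x):
    """One row of (4.25) for a single sample.

    phi: normalized activation degrees [phi_1, ..., phi_R]
    x:   input values [x_1, ..., x_N]
    The row is ordered as [phi_1..phi_R, phi_1*x_1..phi_R*x_1, ...,
    phi_1*x_N..phi_R*x_N], matching the parameter order of (4.24).
    """
    row = list(phi)
    for xi in x:
        row.extend(f * xi for f in phi)
    return row


def ts1_output(phi, x, theta):
    """Model output for one sample as the product of the row and theta."""
    return sum(a * b for a, b in zip(design_row(phi, x), theta))


# Hypothetical example with R = 2 rules and N = 2 inputs.
phi = [0.25, 0.75]
x = [2.0, -1.0]
theta = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]   # [p01, p02, p11, p12, p21, p22]
```

In this example the first local model evaluates to 1 + 0.5·2 + 3·(-1) = -1 and the second to 2 - 2 + 0 = 0, so the weighted output is 0.25·(-1) + 0.75·0 = -0.25, which is also what the row-vector product returns.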
4.6 Gradient descent
The gradient descent (GD) parameter adjusting method is based on the minimization of the error (cost) function

$$\varepsilon = \frac{1}{2}\,[y - \tilde{y}]^2, \qquad (4.26)$$

where $y$ denotes the output of the model and $\tilde{y}$ is the reference output. The history of the method goes back to 1960, when Widrow and Hoff introduced the adaline rule and applied it to the McCulloch-Pitts neuron, which brought learning to neural networks (Widrow and Hoff 1960). It was later shown in (Minsky and Papert 1969) that McCulloch-Pitts neurons and the adaline rule can solve only a limited group of learning problems, namely linearly separable problems, and the interest faded. The discovery of the back-propagation technique for multilayer perceptrons (Werbos 1974), popularized later in (Rumelhart et. al. 1986), renewed the interest.
4.6.1 Gradient descent learning rules for fuzzy systems
The update rule for a given system parameter ξ that minimizes the error ε is obtained through differential calculus, provided that the error function (4.26) is differentiable:

$$\Delta\xi = -\eta\,\frac{\partial \varepsilon}{\partial \xi}, \qquad (4.27)$$

where ξ is the updated parameter and η is the learning rate. The key idea in training fuzzy systems with back-propagation is to regard a fuzzy system as a feedforward network and then to use the chain rule to determine the gradients of the output errors of the fuzzy system with respect to its parameters. Among the first people to apply back-propagation to fuzzy systems were Wang and Mendel (0th order TS system with Gaussian input MFs) (Wang
and Mendel 1992b) and Nomura (triangular MFs) (Nomura et. al. 1992); in (Guely and Siarry 1993), several other MF types are considered. The requirement of differentiability suggests that GD can be applied to 0th and 1st order TS systems with product-product-sum inference (applicability extends to (2.30) but is not considered in the present section). The derivation procedures for the learning rules are given in Appendix C. For the kth input-output pattern, the value of the error function is computed as

$$\varepsilon(k) = \frac{1}{2}\,[y(k) - \tilde{y}(k)]^2, \qquad (4.28)$$

being half the squared difference between the kth reference value $\tilde{y}(k)$ and the model response $y(k)$ for the given input pattern, which is obtained from the inference function. Parameter updates are computed using (4.27) and applying the chain rule. We start with 0th order TS systems (2.39). The learning task is to identify new consequent parameters $p_{0r}(l+1)$ and input MF parameters (e.g. $a_{ir}(l+1)$, $b_{ir}(l+1)$ and $c_{ir}(l+1)$ when using triangular MFs). The learning rule for the output parameters $p_{0r}$ is

$$p_{0r}(l+1) = p_{0r}(l) - \eta\,(y(k) - \tilde{y}(k))\,\frac{\tau_r(k)}{\sum_{r=1}^{R} \tau_r(k)}. \qquad (4.29)$$
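The consequent update (4.29) can be sketched for a SISO 0th order TS system with a fixed triangular input partition. The data are generated from a hypothetical reference system; the function names, learning rate, number of epochs and singleton values are assumptions of this sketch.

```python
def tri(x, a, b, c):
    """Triangular MF with support (a, c) and core at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def model(x, mfs, p):
    """Weighted-mean output of a SISO 0th order TS system."""
    taus = [tri(x, *mf) for mf in mfs]
    return sum(t * q for t, q in zip(taus, p)) / sum(taus)


def train_consequents(data, mfs, p_init, eta=0.5, epochs=200):
    """Pattern-by-pattern application of the learning rule (4.29)."""
    p = list(p_init)
    for _ in range(epochs):
        for x, y_ref in data:
            taus = [tri(x, *mf) for mf in mfs]
            total = sum(taus)
            y = sum(t * q for t, q in zip(taus, p)) / total
            for r, tau in enumerate(taus):
                p[r] -= eta * (y - y_ref) * tau / total
    return p


centers = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
mfs = [(b - 2.0, b, b + 2.0) for b in centers]
target = [1.0, -3.0, -1.0, 2.0, 3.0, 1.0]          # hypothetical reference singletons
data = [(k / 10.0, model(k / 10.0, mfs, target)) for k in range(101)]
learned = train_consequents(data, mfs, [0.0] * len(mfs))
```

Since the input MFs are kept fixed and the data are noise-free, the learned singletons approach the reference ones; with noisy data the result would depend on the learning rate and on the order in which patterns are presented.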
The learning rules for input MF parameters depend on the type of MFs used. It must also be taken into account that if transparent MFs are piecewise continuous, a different learning rule is derived for each continuous region. For the parameters of triangular MFs (A.3) the following learning rules are obtained:

if $a_{ir}(l) < x_i(k) < b_{ir}(l)$

$$a_{ir}(l+1) = a_{ir}(l) - \eta\,(y(k) - \tilde{y}(k))\,(p_{0r}(l) - y(k))\,\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k) - b_{ir}(l)}{(x_i(k) - a_{ir}(l))\,(b_{ir}(l) - a_{ir}(l))}$$

$$b_{ir}(l+1) = b_{ir}(l) - \eta\,(y(k) - \tilde{y}(k))\,(p_{0r}(l) - y(k))\,\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{a_{ir}(l) - b_{ir}(l)} \qquad (4.30)$$
if $b_{ir}(l) < x_i(k) < c_{ir}(l)$

$$\begin{aligned}
b_{ir}(l+1) &= b_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{c_{ir}(l)-b_{ir}(l)} \\
c_{ir}(l+1) &= c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\bigl(c_{ir}(l)-x_i(k)\bigr)\bigl(c_{ir}(l)-b_{ir}(l)\bigr)}
\end{aligned} \tag{4.31}$$

If $x_i(k) > c_{ir}(l)$ or $x_i(k) < a_{ir}(l)$, no learning occurs. This is also true for the points $x_i(k) = a_{ir}(l)$, $x_i(k) = b_{ir}(l)$ and $x_i(k) = c_{ir}(l)$, because the derivative does not exist there. Moreover, for each MF

$$a_{ir}(l+1) < b_{ir}(l+1) < c_{ir}(l+1) \tag{4.32}$$

has to be satisfied in order to preserve the physical meaning of the parameters. If an update would violate (4.32), the respective update rule cannot be applied. Similar restrictions must be taken into account with other types of MFs as well. Extending the learning rules (4.30)-(4.31) to trapezoid MFs (A.5) is a matter of rewriting them as (4.33)-(4.34):

if $a_{ir}(l) < x_i(k) < b_{ir}(l)$
$$\begin{aligned}
a_{ir}(l+1) &= a_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\bigl(x_i(k)-a_{ir}(l)\bigr)\bigl(b_{ir}(l)-a_{ir}(l)\bigr)} \\
b_{ir}(l+1) &= b_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{a_{ir}(l)-b_{ir}(l)}
\end{aligned} \tag{4.33}$$

if $c_{ir}(l) < x_i(k) < d_{ir}(l)$

$$\begin{aligned}
c_{ir}(l+1) &= c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{d_{ir}(l)-c_{ir}(l)} \\
d_{ir}(l+1) &= d_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-c_{ir}(l)}{\bigl(d_{ir}(l)-x_i(k)\bigr)\bigl(d_{ir}(l)-c_{ir}(l)\bigr)}
\end{aligned} \tag{4.34}$$

The learning rules of square spline MF (A.6) parameters are given by (4.35)-(4.38).
if $a_{ir}(l) < x_i(k) < (a_{ir}(l) + b_{ir}(l))/2$

$$\begin{aligned}
a_{ir}(l+1) &= a_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2\bigl(x_i(k)-b_{ir}(l)\bigr)}{\bigl(b_{ir}(l)-a_{ir}(l)\bigr)\bigl(x_i(k)-a_{ir}(l)\bigr)} \\
b_{ir}(l+1) &= b_{ir}(l) + \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2}{b_{ir}(l)-a_{ir}(l)}
\end{aligned} \tag{4.35}$$

if $(a_{ir}(l) + b_{ir}(l))/2 < x_i(k) < b_{ir}(l)$

$$\begin{aligned}
a_{ir}(l+1) &= a_{ir}(l) + \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(b_{ir}(l)-x_i(k)\bigr)^{2}}{\mu_{ir}(x_i(k))\bigl(b_{ir}(l)-a_{ir}(l)\bigr)^{3}} \\
b_{ir}(l+1) &= b_{ir}(l) + \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(b_{ir}(l)-x_i(k)\bigr)\bigl(x_i(k)-a_{ir}(l)\bigr)}{\mu_{ir}(x_i(k))\bigl(b_{ir}(l)-a_{ir}(l)\bigr)^{3}}
\end{aligned} \tag{4.36}$$

if $c_{ir}(l) < x_i(k) < (c_{ir}(l) + d_{ir}(l))/2$

$$\begin{aligned}
c_{ir}(l+1) &= c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(c_{ir}(l)-x_i(k)\bigr)\bigl(x_i(k)-d_{ir}(l)\bigr)}{\mu_{ir}(x_i(k))\bigl(d_{ir}(l)-c_{ir}(l)\bigr)^{3}} \\
d_{ir}(l+1) &= d_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(c_{ir}(l)-x_i(k)\bigr)^{2}}{\mu_{ir}(x_i(k))\bigl(d_{ir}(l)-c_{ir}(l)\bigr)^{3}}
\end{aligned} \tag{4.37}$$

if $(c_{ir}(l) + d_{ir}(l))/2 < x_i(k) < d_{ir}(l)$

$$\begin{aligned}
c_{ir}(l+1) &= c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2}{d_{ir}(l)-c_{ir}(l)} \\
d_{ir}(l+1) &= d_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2\bigl(x_i(k)-c_{ir}(l)\bigr)}{\bigl(d_{ir}(l)-c_{ir}(l)\bigr)\bigl(d_{ir}(l)-x_i(k)\bigr)}
\end{aligned} \tag{4.38}$$

The learning rules for the linear coefficients of 1st order TS systems (2.37) are given by (4.39).
$$p_{ir}(l+1) = p_{ir}(l) - \eta\,\bigl(y(k) - \tilde{y}(k)\bigr)\,\frac{x_i(k)\,\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)} \tag{4.39}$$
To obtain the learning rules of the input MFs of 1st order TS systems, the term $(p_{0r} - y)$ in (4.30)-(4.38) must be replaced by

$$p_{0r} + \sum_{i=1}^{N} p_{ir}x_i - y \tag{4.40}$$
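As an illustration of how the triangular rules (4.30)-(4.31) and the ordering constraint (4.32) interact, the following sketch trains the MFs of a single-input 0th order TS system for one data pair. It is a simplified reading of the rules above, not the implementation used in this thesis: the single input, the function names and the numeric values are assumptions made for the example.

```python
def tri(x, a, b, c):
    """Triangular MF with feet a, c and peak b."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def model(x, mfs, p0):
    """Single-input 0th order TS output; tau_r equals the MF value."""
    tau = [tri(x, *mf) for mf in mfs]
    return sum(t * p for t, p in zip(tau, p0)) / sum(tau)


def update_triangle(x, mf, common):
    """Apply (4.30) or (4.31) to one MF.

    `common` stands for eta * (y - y_ref) * (p0r - y) * tau_r / sum(tau).
    """
    a, b, c = mf
    if a < x < b:                      # rising edge, (4.30)
        new = (a - common * (x - b) / ((x - a) * (b - a)),
               b - common / (a - b),
               c)
    elif b < x < c:                    # falling edge, (4.31)
        new = (a,
               b - common / (c - b),
               c - common * (x - b) / ((c - x) * (c - b)))
    else:                              # outside the support or at a kink
        return mf
    # (4.32): skip the step if it would destroy the ordering a < b < c
    return new if new[0] < new[1] < new[2] else mf


def train_step(x, y_ref, mfs, p0, eta):
    """One incremental update of all MFs for a single data pair."""
    tau = [tri(x, *mf) for mf in mfs]
    total = sum(tau)
    if total == 0.0:                   # no rule fires, nothing to learn
        return mfs
    y = sum(t * p for t, p in zip(tau, p0)) / total
    return [update_triangle(x, mf, eta * (y - y_ref) * (p - y) * t / total)
            for mf, p, t in zip(mfs, p0, tau)]


mfs = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]   # hypothetical initial partition
p0 = [0.0, 1.0]                             # hypothetical consequents
x, y_ref = 0.25, 0.5
y_before = model(x, mfs, p0)
mfs_new = train_step(x, y_ref, mfs, p0, eta=0.1)
y_after = model(x, mfs_new, p0)
```

Note that the check of (4.32) only keeps each MF well-formed; it does nothing to keep neighbouring MFs consistent with each other.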
The learning rules (4.29), (4.30)-(4.31) and (4.33)-(4.39), along with (4.40), allow us to train 0th and 1st order TS systems with three different kinds of input MFs. Note that application of the given algorithms will result in a non-transparent fuzzy system. The possibilities to preserve transparency of the modeled systems are considered in section 4.9.

4.6.2 The learning process

The training algorithm consisting of (4.29) and (4.30)-(4.31) (triangular MFs) performs an error back-propagation procedure: to train $p_{0r}$, the "normalized" error $(y - \tilde{y})/\sum_{r=1}^{R}\tau_r$ is back-propagated to the layer of $p_{0r}$ (Fig. 4.9), for which $\tau_r$ are the inputs. To train the MF parameters, the above-mentioned "normalized" error multiplied by $(p_{0r} - y)\tau_r$ is back-propagated to the processing unit of layer 1 whose output is $x_i$. MF parameters are then updated by the respective update rules using the back-propagated values and the rest of the variables, which can be obtained locally.
Fig. 4.9. Zeroth order TS system in network representation.
Thus, training is a two-pass procedure: in the forward pass y is computed for a given input, and in the backward pass the network parameters are trained. When presented with a training data set consisting of K data pairs, the question arises how exactly to apply GD. One may take one pair of data and train all system parameters until the error for this data pair is sufficiently small, then proceed with the next pair. Typical practice is, however, to cycle through the data many times, taking one step with the gradient algorithm for each data pair (each cycle is called a training epoch).

Another question is when exactly to update the parameters. The usual practice (incremental mode) is to update them after the presentation of each training example. Another possibility is to apply the update rule after the presentation of all training examples that constitute an epoch (batch mode):

$$\Delta\xi = \eta\sum_{k=1}^{K}\frac{\partial\varepsilon(k)}{\partial\xi} \tag{4.41}$$

(4.41) implies that the cost function in this case is the sum of squared errors:

$$\varepsilon = \sum_{k=1}^{K}\varepsilon(k) = \sum_{k=1}^{K}\frac{1}{2}\left[y(k) - \tilde{y}(k)\right]^{2} \tag{4.42}$$
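The difference between the two modes can be sketched as follows. To keep the example short, only the consequents of a 0th order TS system are trained and the activation degrees are treated as given data; the function names and the toy data set are hypothetical. The accumulated step of (4.41) is subtracted from the parameters, which is an assumption about the sign convention.

```python
def output(tau, p0):
    """0th order TS output for given activation degrees."""
    return sum(t * p for t, p in zip(tau, p0)) / sum(tau)


def grad(tau, p0, y_ref):
    """Gradient of eps(k) with respect to every p0r, cf. (4.29)."""
    err = output(tau, p0) - y_ref
    total = sum(tau)
    return [err * t / total for t in tau]


def epoch_incremental(data, p0, eta):
    """Update the parameters after every training example."""
    for tau, y_ref in data:
        g = grad(tau, p0, y_ref)
        p0 = [p - eta * gi for p, gi in zip(p0, g)]
    return p0


def epoch_batch(data, p0, eta):
    """Accumulate the gradient over the epoch, then update once (4.41)."""
    acc = [0.0] * len(p0)
    for tau, y_ref in data:
        g = grad(tau, p0, y_ref)
        acc = [a + gi for a, gi in zip(acc, g)]
    return [p - eta * a for p, a in zip(p0, acc)]


def sse(data, p0):
    """Cost function (4.42)."""
    return sum(0.5 * (output(tau, p0) - y_ref) ** 2 for tau, y_ref in data)


# Hypothetical noise-free data generated by consequents [1, 3]
data = [([1.0, 0.0], 1.0), ([0.5, 0.5], 2.0), ([0.0, 1.0], 3.0)]
p_inc, p_bat = [0.0, 0.0], [0.0, 0.0]
for _ in range(50):
    p_inc = epoch_incremental(data, p_inc, eta=0.5)
    p_bat = epoch_batch(data, p_bat, eta=0.5)
```

On this small consistent data set both modes reach the same solution; the differences discussed in the text become relevant with noisy data and non-convex error surfaces.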
The incremental mode of training makes the search in parameter space stochastic in nature, which, in turn, makes it less likely for the back-propagation algorithm to be trapped in a local minimum. The use of batch mode of training, on the other hand, provides a more accurate estimate of the gradient vector. In the final analysis, however, the effectiveness of a training mode depends on the problem at hand (Haykin 1994).

Training is conducted until the stop criterion is satisfied, e.g. the desired error value is achieved, the change of parameters is smaller than some specified threshold value, or the change of error has become very small (even though the parameter changes are still larger than the threshold values).

4.6.3 Convergence issues and higher order methods

The described algorithm is quite simple, but in the general case there is no guarantee that it will converge to an optimal solution. There are several issues associated with the implementation of the algorithm that influence convergence. First of all, the question of training data selection arises (this applies universally to all data-driven techniques). The gradient descent algorithm does not add rules to the system or delete them, thus the estimate of R has a direct impact on the learning result. The initial values of the trained parameters are also important. It is useful to have them close to where they should be. The usual problem is that we do not know where they should be.
One of the most important factors for convergence is the learning rate η. The typical choice is to keep the learning rate constant during the learning process. Usually, however, we do not know the optimal value of η, or a variable η would be preferable. If the learning rate is too small, the training process is very slow and may become trapped in a local minimum because it is not able to "climb" over the local peaks on the error surface (Fig. 4.10, right). A larger η increases the learning speed, but the algorithm may similarly become stuck when the change in parameter value needed to reach the optimum error value is smaller than the step size (Fig. 4.10, left). The solutions that have been suggested to improve the performance can be divided into three categories: momentum terms, adaptive learning rates and higher-order algorithms.
Fig. 4.10. Typical reasons for getting stuck in local minimum: learning rate is too large (left), learning rate is too small (right).
The simplest method to increase the rate of learning while avoiding the danger of oscillation is to include a momentum term in the delta rule (Rumelhart et al. 1986)

$$\Delta p(l+1) = \alpha\,\Delta p(l) + \eta\,\frac{\partial\varepsilon}{\partial p}, \tag{4.43}$$
where 0 ≤ α < 1 is the momentum constant. The inclusion of momentum tends to accelerate descent in steady downhill directions and has a stabilizing effect in directions where the gradient oscillates in sign.

Proposed learning rate adaptation techniques are divided into global and local adaptation of the learning rate. Global adaptation of η uses a single learning rate value for all adaptable parameters. The "Search then Converge" method (Darken and Moody 1991) is one of the best-performing learning rate adaptation techniques. In this method
$$\eta(l) = \eta_0\,\frac{1 + \dfrac{c}{\eta_0}\dfrac{l}{l_0}}{1 + \dfrac{c}{\eta_0}\dfrac{l}{l_0} + l_0\left(\dfrac{l}{l_0}\right)^{2}}, \tag{4.44}$$
where $\eta_0$ is the initial value of the learning rate, c is a constant and $l_0 \gg 0$ is another constant with typical values in the range $100 \le l_0 \le 500$. For $l \gg l_0$, $\eta(l)$ decreases in proportion to $1/l$.

Heuristic methods can be used, too. In (Jang 1993a) the following rules are used: a) if the error measure undergoes 4 consecutive reductions, increase η by 10%; b) if the error measure undergoes 2 consecutive combinations of one increase and one decrease, decrease η by 10%.

The basic descent algorithm adjusts the parameters in the steepest descent direction (the negative of the gradient). This is the direction in which the cost function decreases most rapidly. It does not necessarily produce the fastest convergence, however. In the conjugate gradient algorithms (Fletcher and Reeves 1964) the search is performed along conjugate directions, which generally produces faster convergence than searching along steepest descent directions:
$$\mathbf{p}(l+1) = \mathbf{p}(l) + \eta\,\mathbf{d}(l), \tag{4.45}$$

where d is the direction vector and p is the parameter vector. All the conjugate gradient algorithms start out by searching in the steepest descent direction on the first iteration

$$\mathbf{d}(0) = -\mathbf{g}(0), \tag{4.46}$$

where g is the gradient vector. Each successive direction vector is then computed as a linear combination of the current gradient vector and the previous direction vector

$$\mathbf{d}(l) = -\mathbf{g}(l) + \beta(l)\,\mathbf{d}(l-1). \tag{4.47}$$

There are several variations of conjugate gradient algorithms, distinguished by the manner in which the constant β is computed. With Fletcher-Reeves formula

$$\mathbf{d}(0) = -\mathbf{g}(0), \qquad \beta(l) = \frac{\mathbf{g}^{T}(l)\,\mathbf{g}(l)}{\mathbf{g}^{T}(l-1)\,\mathbf{g}(l-1)}. \tag{4.48}$$
With Polyak-Ribiere formula (Polyak 1969)
$$\beta(l) = \frac{\mathbf{g}^{T}(l)\left[\mathbf{g}(l) - \mathbf{g}(l-1)\right]}{\mathbf{g}^{T}(l-1)\,\mathbf{g}(l-1)} \tag{4.49}$$
Newton's method is an alternative to the conjugate gradient methods for fast optimization. The basic step of Newton's method is

$$\Delta\mathbf{p} = -\mathbf{H}^{-1}\mathbf{g}, \tag{4.50}$$
in which the Hessian matrix H (second derivatives) must be computed. Computation of H and its inverse is expensive. There is also no guarantee that H is nonsingular. There is a class of algorithms that are based on Newton's method but do not require calculation of second derivatives. These are called quasi-Newton methods (Dennis and Schnabel 1983), (Battiti 1992). They use an approximation of the Hessian matrix that is updated at each iteration of the algorithm. Like the quasi-Newton methods, the Levenberg-Marquardt algorithm (Marquardt 1963) is designed to approach second-order training speed without having to compute the Hessian matrix. Application of higher order methods is not very common in fuzzy modeling so far. Only a few applications have been reported, e.g. (Jang 1996), (Männle 2000).
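Returning to the first-order remedies discussed in this section, the momentum rule (4.43) and the "Search then Converge" schedule (4.44) take only a few lines each. The sketch below assumes a scalar parameter and assumes that the step Δp is subtracted from the parameter; the function names and the quadratic test function are hypothetical.

```python
def search_then_converge(l, eta0, c, l0):
    """Learning rate schedule (4.44)."""
    r = l / l0
    return eta0 * (1 + (c / eta0) * r) / (1 + (c / eta0) * r + l0 * r * r)


def momentum_descent(grad, p, eta, alpha, steps):
    """Gradient descent with the momentum term of (4.43)."""
    delta = 0.0
    for _ in range(steps):
        delta = alpha * delta + eta * grad(p)   # (4.43)
        p = p - delta                           # move against the gradient
    return p


# Hypothetical error function eps(p) = (p - 3)^2 with gradient 2(p - 3)
p_final = momentum_descent(lambda p: 2.0 * (p - 3.0), p=0.0,
                           eta=0.1, alpha=0.5, steps=100)

eta_start = search_then_converge(0, eta0=0.1, c=1.0, l0=100)
eta_late = search_then_converge(10 ** 6, eta0=0.1, c=1.0, l0=100)
```

At l = 0 the schedule returns η0, and for very large l the product η(l)·l approaches c, which is the "converge" phase of the method.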
4.6.4 Overfitting

One of the known problems characteristic to incremental training is called overfitting. The error on the training data set may be driven to a very small value, but when new data is presented to the system, the error is large. The system has memorized the learning examples very well but it has not learned how to react to new situations, i.e. its generalization properties are rather bad.

From neural network theory, several techniques for improving generalization are known. First, there is the consideration that the number of adjustable parameters should be just large enough to provide an adequate fit (if the number is too small we deal with underfitting). The problem is that it is difficult to know beforehand how complex a model is required for a specific application.

Another method that can be easily applied to fuzzy systems is early stopping. In this technique, the available data is divided into two subsets. The first subset, the training set, is used for updating the parameters. The second subset is the validation set. The error on the validation set is monitored during the training process. Normally both the training error and the validation error will decrease in the initial phase of training. When the overfitting phenomenon occurs, however, the error on the validation set will typically begin to rise. When the validation error increases for a specified number of iterations, the training is stopped, and the system with the minimum validation error is returned.
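The early stopping procedure can be written as a generic loop around any training step. In the sketch below, `step` and `val_error` are hypothetical callables supplied by the user (one training epoch and the validation error, respectively), and "increases for a specified number of iterations" is interpreted as `patience` consecutive epochs without improvement, which is one possible reading.

```python
import copy


def train_with_early_stopping(params, step, val_error, patience, max_epochs):
    """Return the parameters with the smallest validation error seen."""
    best = copy.deepcopy(params)
    best_err = val_error(params)
    bad_epochs = 0
    for _ in range(max_epochs):
        params = step(params)                  # one epoch on the training set
        err = val_error(params)                # monitor the validation set
        if err < best_err:
            best, best_err, bad_epochs = copy.deepcopy(params), err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # validation error keeps rising
                break
    return best, best_err


# Toy illustration: training pushes p upwards, validation prefers p = 1
best_p, best_e = train_with_early_stopping(
    0.0,
    step=lambda p: p + 0.1,
    val_error=lambda p: (p - 1.0) ** 2,
    patience=3,
    max_epochs=100,
)
```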
Yet another approach known from neural network theory is regularization. This involves modifying the performance index by adding a term that causes the network to have smaller weights and biases. Adaptive fuzzy systems can be considered more regular and thus generally less sensitive to overfitting than neural networks, because the trained parameters of fuzzy systems have physical meaning and are therefore bounded to the actual operating range of the system. This is especially true for transparent systems, thus transparency can be considered a protective measure against overfitting, as is demonstrated in the following example.
Fig. 4.11. Generalization properties of gradient descent algorithm. GD with transparency protection (left), unconstrained gradient (right). Dashed line depicts the desired performance, normal line is the model output and crosses denote the training samples.
Here we use the system from section 4.5 for the generation of test data. We use only a few samples and introduce noise, so that the data is both scarce and noisy, which makes the overfitting phenomenon probable. Next we apply unconstrained gradient descent (section 4.6.1) and the Jager algorithm (Jager 1995), which preserves the transparency of the system (see section 4.10 for details). Finally we test both obtained models against the noise-free original system. The results are depicted in Fig. 4.11. Unconstrained gradient descent reduces the RMSE to 0.0782, but this comes at the expense of generalization properties, as the RMSE of 0.2079 on the original data set demonstrates. The respective errors with the transparent algorithm (0.1464 and 0.1852) differ less, and the modeled relationship is smoother.
4.7 Clustering algorithms

A cluster is a group of objects that are mathematically more similar to one another than to members of the other clusters. Clustering is the detection of subspaces (clusters) of the data space. The potential of clustering algorithms to reveal the underlying structures in data can be exploited for partitioning the input space of fuzzy systems or for constructing the rule base along with the definition of MFs (product space clustering).

There are many clustering methods available, which can be divided into two subcategories: hard and fuzzy clustering methods. Hard clustering methods (e.g. the hard c-means clustering algorithm (Duda and Hart 1973)) are based on classical set theory and require that an object either does or does not belong to a cluster. Fuzzy clustering methods, on the other hand, allow objects to belong to several clusters simultaneously, with different membership degrees. For many real-world problems a fuzzy partitioning of the underlying space is considered more realistic than hard clustering, especially in association with fuzzy systems.

A large family of fuzzy clustering algorithms is based on minimization of the fuzzy c-means objective function J (Dunn 1974):

$$J = \sum_{k=1}^{K}\sum_{h=1}^{H}(\mu_{hk})^{m}\,d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h), \tag{4.51}$$
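where H is the number of clusters, $\mu_{hk}$ is the membership degree of the kth observation in the hth cluster and $\boldsymbol{\nu}_h$ are the cluster centers. z(k) denotes the kth observation of input-output data (4.2), being a row vector in matrix Z. The distance measure used in (4.51) is defined as

$$d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h) = (\boldsymbol{\nu}_h - \mathbf{z}(k))\,\mathbf{A}\,(\boldsymbol{\nu}_h - \mathbf{z}(k))^{T} \tag{4.52}$$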
The minimization of the c-means functional can be solved by using a variety of available methods. The most widely used method is the fuzzy c-means (FCM) algorithm, an iterative optimization approach proposed in (Bezdek 1981). According to this algorithm, the cluster prototypes (h = 1,…, H) are computed by

$$\boldsymbol{\nu}_h(l) = \frac{\sum_{k=1}^{K}\mu_{hk}(l-1)^{m}\,\mathbf{z}(k)}{\sum_{k=1}^{K}\mu_{hk}(l-1)^{m}}, \tag{4.53}$$

where l is the iteration number. In the next step the distances are found for all clusters and for all data objects
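$$d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h(l)) = (\boldsymbol{\nu}_h(l) - \mathbf{z}(k))\,\mathbf{A}\,(\boldsymbol{\nu}_h(l) - \mathbf{z}(k))^{T}, \tag{4.54}$$

where h = 1,…, H, k = 1,…, K. Next, the partition matrix U is updated according to

$$\mu_{hk}(l) = \frac{1}{\displaystyle\sum_{j=1}^{H}\left(\frac{d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h(l))}{d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_j(l))}\right)^{\frac{1}{m-1}}}, \tag{4.55}$$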
h = 1,…, H, k = 1,…, K. The procedure is repeated by returning to (4.53) until $\lVert \mathbf{U}(l) - \mathbf{U}(l-1) \rVert < \varepsilon$. Convergence of the algorithm is proved in (Bezdek 1980). A singularity occurs when $d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h(l)) = 0$ for some z(k) and one or more cluster prototypes $\boldsymbol{\nu}_h$. In this case 0 is assigned to each $\mu_{hk}$ in the given column for which $d_A^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h(l)) > 0$, and the membership is distributed arbitrarily among the remaining $\mu_{hk}$ so that $\sum_{h=1}^{H}\mu_{hk} = 1$ for the given k.
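The iteration (4.53)-(4.55), including the treatment of the singular case, can be sketched as follows. The sketch assumes A = I, measures the change of U by its largest element-wise difference, and splits the membership equally among coinciding prototypes; these choices, the function names and the toy data are assumptions made for illustration.

```python
import random


def sqdist(z, v):
    """Squared Euclidean distance, i.e. (4.54) with A = I."""
    return sum((a - b) ** 2 for a, b in zip(z, v))


def fcm(Z, H, m=2.0, tol=1e-6, max_iter=100, seed=0):
    rng = random.Random(seed)
    K, n = len(Z), len(Z[0])
    # random partition matrix U (H x K), each column normalized to sum to one
    U = [[rng.random() for _ in range(K)] for _ in range(H)]
    for k in range(K):
        s = sum(U[h][k] for h in range(H))
        for h in range(H):
            U[h][k] /= s
    V = []
    for _ in range(max_iter):
        # (4.53): prototypes as membership-weighted means of the data
        V = []
        for h in range(H):
            w = [U[h][k] ** m for k in range(K)]
            V.append([sum(wk * z[i] for wk, z in zip(w, Z)) / sum(w)
                      for i in range(n)])
        # (4.55): membership update
        U_new = [[0.0] * K for _ in range(H)]
        for k in range(K):
            d = [sqdist(Z[k], V[h]) for h in range(H)]
            zero = [h for h in range(H) if d[h] == 0.0]
            if zero:                           # singular case
                for h in zero:
                    U_new[h][k] = 1.0 / len(zero)
            else:
                for h in range(H):
                    U_new[h][k] = 1.0 / sum((d[h] / dj) ** (1.0 / (m - 1.0))
                                            for dj in d)
        change = max(abs(U_new[h][k] - U[h][k])
                     for h in range(H) for k in range(K))
        U = U_new
        if change < tol:                       # stop criterion on U
            break
    return V, U


# Two well-separated hypothetical groups of points
Z = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
V, U = fcm(Z, H=2)
lo, hi = sorted(V)
```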
The shape of the clusters is determined by the choice of A in the distance measure (4.52). Typically, A = I, which induces the standard Euclidean norm. The Euclidean norm induces hyperspherical clusters, i.e. clusters whose surfaces of constant membership are hyperspheres. A can also be defined as the inverse of the n×n sample covariance matrix of Z, i.e. $\mathbf{A} = \mathbf{R}^{-1}$, with

$$\mathbf{R} = \frac{1}{K}\sum_{k=1}^{K}(\mathbf{z}(k) - \bar{\mathbf{z}})^{T}(\mathbf{z}(k) - \bar{\mathbf{z}}), \tag{4.56}$$

where $\bar{\mathbf{z}}$ denotes the sample mean of the data. Such A induces hyperellipsoidal clusters with arbitrary orientation, but the common limitation of clustering algorithms based on a fixed distance norm is that such a norm forces the objective function to prefer clusters of that shape even if they are not present.

Matrix A can be adapted by using estimates of the data covariance, as in the Gustafson-Kessel (GK) algorithm (Gustafson and Kessel 1979). The difference between the GK algorithm and the classical FCM algorithm is that each cluster has its own norm-inducing matrix $\mathbf{A}_h$, resulting in

$$d_{A_h}^{2}(\mathbf{z}(k), \boldsymbol{\nu}_h(l)) = (\boldsymbol{\nu}_h(l) - \mathbf{z}(k))\,\mathbf{A}_h\,(\boldsymbol{\nu}_h(l) - \mathbf{z}(k))^{T}, \tag{4.57}$$

where
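$$\mathbf{A}_h = \left[\rho_h \det(\mathbf{F}_h)\right]^{\frac{1}{n}}\mathbf{F}_h^{-1} \tag{4.58}$$

and

$$\mathbf{F}_h = \frac{\sum_{k=1}^{K}(\mu_{hk}(l-1))^{m}\,(\mathbf{z}(k) - \boldsymbol{\nu}_h(l))^{T}(\mathbf{z}(k) - \boldsymbol{\nu}_h(l))}{\sum_{k=1}^{K}(\mu_{hk}(l-1))^{m}} \tag{4.59}$$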
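To make (4.57)-(4.59) concrete, the following sketch computes the fuzzy covariance matrix, the norm-inducing matrix and the resulting distance for one cluster. It is restricted to two-dimensional data (n = 2) so that the determinant and inverse can be written in closed form; this restriction, the function names and the sample data are assumptions made for the illustration.

```python
def fuzzy_covariance(Z, u, v, m=2.0):
    """F_h of (4.59) for 2-D data; u holds the memberships in one cluster."""
    w = [uk ** m for uk in u]
    s = sum(w)
    F = [[0.0, 0.0], [0.0, 0.0]]
    for wk, z in zip(w, Z):
        e = [z[0] - v[0], z[1] - v[1]]
        for i in range(2):
            for j in range(2):
                F[i][j] += wk * e[i] * e[j] / s
    return F


def gk_norm_matrix(F, rho=1.0):
    """A_h of (4.58) for n = 2, using the closed-form 2x2 inverse."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    scale = (rho * det) ** 0.5 / det           # (rho det F)^(1/n) times 1/det
    return [[scale * F[1][1], -scale * F[0][1]],
            [-scale * F[1][0], scale * F[0][0]]]


def gk_sqdist(z, v, A):
    """Distance (4.57) between observation z and prototype v."""
    e = [z[0] - v[0], z[1] - v[1]]
    return sum(e[i] * A[i][j] * e[j] for i in range(2) for j in range(2))


# Hypothetical cluster that is elongated along the first coordinate
Z = [[-2.0, 0.0], [-1.0, 0.1], [0.0, -0.1], [1.0, 0.1], [2.0, 0.0]]
u = [1.0] * len(Z)
v = [sum(z[i] for z in Z) / len(Z) for i in range(2)]
F = fuzzy_covariance(Z, u, v)
A = gk_norm_matrix(F, rho=1.0)
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d_along = gk_sqdist([v[0] + 1.0, v[1]], v, A)
d_across = gk_sqdist([v[0], v[1] + 1.0], v, A)
```

The determinant of the norm-inducing matrix equals ρh, which is the volume constraint mentioned below, and a unit displacement along the elongated direction of the cluster yields a much smaller distance than the same displacement across it.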
An advantage of the GK algorithm over FCM is that GK can detect clusters of different shape and orientation in one data set (although, due to the constraint on cluster volume, it can only find clusters of approximately equal volumes). It is, however, computationally more expensive than FCM.

The number of clusters H has the most severe influence on convergence and on the resulting partition matrix U. The weighting exponent m > 1 is also quite important, as it determines the "fuzziness" of the clusters. If m approaches one from above, the partition becomes hard. If m → ∞, the partition becomes maximally fuzzy, i.e. $\mu_{hk} = 1/H$.

Usually the partition matrix U and the cluster centers are initialized with random values. One possibility to improve the convergence of the fuzzy clustering algorithm is to use special clustering algorithms to return the initial estimates of cluster centers, e.g. mountain clustering (Yager and Filev 1994b) or subtractive clustering (Chiu 1994).

The mountain method is a grid-based process for identifying the approximate locations of cluster centers in data sets with clustering tendencies. In the 1st step the object space is discretized to generate the potential cluster centers $\mathbf{n}_m$. The 2nd step uses the data to construct the mountain function

$$M(\mathbf{n}_m) = \sum_{k=1}^{K} e^{-\alpha\, d(\mathbf{n}_m, \mathbf{z}(k))}, \tag{4.60}$$

where z(k) is the kth data observation, α is a positive constant and $d(\mathbf{n}_m, \mathbf{z}(k))$ is a distance measure between $\mathbf{n}_m$ and z(k), typically computed as

$$d(\mathbf{n}_m, \mathbf{z}(k)) = (\mathbf{z}(k) - \mathbf{n}_m)(\mathbf{z}(k) - \mathbf{n}_m)^{T}. \tag{4.61}$$
Consequently, the closer a data point is to a node, the bigger its contribution to the node's score. The higher the mountain function value at a node, the larger its potential to become a cluster center. The 3rd step of the algorithm is to use the mountain function to generate the cluster centers. The node with the maximum total score, denoted $\mathbf{n}_l^{*}$, is selected as the first cluster center. In order to get the next cluster, the
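effect of the current cluster must be eliminated. This is carried out by revising the mountain function:

$$M_{l+1}(\mathbf{n}_m) = M_l(\mathbf{n}_m) - M_l^{*}\,e^{-\beta\, d(\mathbf{n}_l^{*}, \mathbf{n}_m)}, \tag{4.62}$$

where $M_l^{*}$ is the value of the mountain function at the selected node $\mathbf{n}_l^{*}$. The process is repeated until $M_{l+1}^{*} < \delta$ (stop criterion) and results in the determination of l cluster centers.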
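The three steps of the mountain method can be sketched compactly. The grid, the constants α, β and δ, the function names and the data below are hypothetical; the distance is taken as in (4.61) and the peak value is tested against δ before each center is accepted.

```python
import math


def d(n, z):
    """Distance measure (4.61)."""
    return sum((a - b) ** 2 for a, b in zip(n, z))


def mountain_centers(Z, nodes, alpha, beta, delta):
    """Return grid nodes selected as cluster centers."""
    # (4.60): mountain function value at every grid node
    M = [sum(math.exp(-alpha * d(n, z)) for z in Z) for n in nodes]
    centers = []
    while True:
        i = max(range(len(nodes)), key=M.__getitem__)
        peak = M[i]
        if peak < delta:                       # stop criterion
            break
        centers.append(nodes[i])
        # (4.62): remove the influence of the newly found center
        M = [Mj - peak * math.exp(-beta * d(nodes[i], n))
             for Mj, n in zip(M, nodes)]
    return centers


# One-dimensional toy data with groups around 1 and 4
Z = [[0.9], [1.0], [1.0], [1.1], [3.9], [4.0], [4.1]]
nodes = [[0.5 * i] for i in range(11)]         # grid 0.0, 0.5, ..., 5.0
centers = mountain_centers(Z, nodes, alpha=1.0, beta=1.0, delta=0.5)
```

The number of grid nodes grows exponentially with the dimension of the data space, which is the main practical drawback of the method.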
Subtractive clustering is a computationally less expensive extension of the mountain method. It assumes that each data point z(k) is a potential cluster center and calculates a measure of the potential $P_l$ for each data point based on the density of surrounding data points. Thus the number of potential clusters equals the number of data points and does not grow exponentially with the number of variables.

$$P_l(\mathbf{z}(i)) = \sum_{k=1}^{K} e^{-d^{2}(\mathbf{z}(i), \mathbf{z}(k))}, \quad i = 1, \ldots, K \tag{4.63}$$
The algorithm selects the data point with the highest potential as the lth cluster center

$$P_l(\mathbf{z}_l^{*}) = \max_{1 \le k \le K} P_l(\mathbf{z}(k)), \tag{4.64}$$

and then destroys the potential of data points near the lth cluster center using

$$P_{l+1}(\mathbf{z}(k)) = P_l(\mathbf{z}(k)) - P_l(\mathbf{z}_l^{*})\,e^{-\beta\, d^{2}(\mathbf{z}(k), \mathbf{z}_l^{*})}. \tag{4.65}$$
This process repeats until the potential of all data points falls below a threshold, e.g. Pl +1 (z *l ) P1 (z 1* )