Lecture Notes in Economics and Mathematical Systems
Founding Editors: M. Beckmann, H.P. Künzi

Managing Editors: Prof. Dr. G. Fandel, Fachbereich Wirtschaftswissenschaften, Fernuniversität Hagen, Feithstr. AVZ II, Hagen, Germany; Prof. Dr. W. Trockel, Institut für Mathematische Wirtschaftsforschung (IMW), Universität Bielefeld, Universitätsstr., Bielefeld, Germany

Editorial Board: A. Basile, A. Drexl, H. Dawid, K. Inderfurth, W. Kürsten, U. Schittko
Philippe Mathieu, Bruno Beaufils, Olivier Brandouy (Eds.)
Artificial Economics: Agent-Based Methods in Finance, Game Theory and Their Applications
Editors

Prof. Philippe Mathieu, LIFL, USTL, Cité Scientifique, Villeneuve d'Ascq Cedex, philippe.mathieu@lifl.fr

Dr. Bruno Beaufils, LIFL, USTL, Cité Scientifique, Villeneuve d'Ascq Cedex, bruno.beaufils@lifl.fr
Prof. Olivier Brandouy, CLAREE, USTL, Avenue du Peuple Belge, Lille Cedex, France, olivier.brandouy@univ-lille1.fr
Library of Congress Control Number
ISSN, ISBN: Springer Berlin Heidelberg New York

... and a synchronous run of the market price with the fundamental price (with λ(0.97, 1.03]). The sum of all λ is chosen to be equal to or less than 1; e.g. λ(0, 0.85] = 0.137 means that in at least 13.7% of the time periods the price pt of the risky asset deviates from p* by at least -15%. Over all simulations, overvaluation occurs more rarely than undervaluation. The reason for this is that the fundamental price reflects the valuation of a risk-neutral investor, whereas the market price is determined by the supply and demand of agents who use a risk-averse utility function. One exception to this market behavior can be observed if supply and demand would result in a negative market price (which is prohibited; the price is then set equal to zero). Figures 2 and 3 show the price evolution and the wealth ratio for the simulation run with k = 1 over 150000 periods. High fluctuations occur with a wealth ratio higher than 0.7. As assumed before, a trend is visible that the wealth ratio increases within the simulation run, because the parameter setting empowers agents to build up monopolies. Table 1 shows the calculated key ratios for the different market phases in this simulation run.

Table 1. Simulation run without taxes and trading restrictions, k = 1
Phase   pt       pmax     pmin    p̄t
1       99.86    105.92   66.64   100.15
2       84.31    105.91   21.77   100.28
3       99.43    266.42   0       100.16
4       105.17   214.35   0       100.28
5       87.92    183.10   0       99.99
6       98.87    148.83   0       100.08
7       98.70    116.98   85.65   99.71
8       103.28   196.98   0       99.88
9       96.38    144.40   0       99.56
10      99.75    137.38   0       99.77
11      100.43   132.07   72.13   100.27
12      95.02    139.31   0       100.00
13      94.31    168.51   0       100.08
14      100.22   149.16   11.04   100.08
15      96.12    140.78   0       100.30
16      98.78    137.46   85.16   100.06
17      99.89    135.42   80.15   99.93
18      103.56   155.02   72.65   99.87
19      102.69   161.37   0       100.15
20      97.61    141.82   0       99.73
> 0. The agent then selects one of these rules with a random process proportional to their strength. The action a_{ik} associated with the selected rule gives the agent's decision. If no rules are activated by the current market state, then the agent's decision is to stay unchanged (h_i(t) = 0 and o_i(t) = 0). At the end of the time step, the agent updates the previously activated rules according to how much money they would have made him earn, giving:

s_{ik}(t+1) = (1 - c) s_{ik}(t) + c a_{ik} (p(t+1) - (1 + r) p(t) + d(t+1))

The parameter c controls the speed at which the rules' strength is updated. Each time the genetic algorithm is run, the worst rules (i.e. those with the smallest strength) are deleted. They are replaced by new rules generated using a classical genetic process: the best rules are selected to be the parents of the new rules. A new rule can be generated either by mutation (only one bit of the parent's chromosome is changed) or by a crossover process (reproduction between two parent rules). This mechanism permits, on the one hand, deleting the rules that do not make the agent earn money and, on the other, building up new rules from good genetic material. This process aims to increase the adaptation of the agents to the market activity. There are two types of agents in our simulations: some who try to stay as close as possible to the fundamental value of the stock (referred to as fundamentalist agents in the following) and some who try to make the maximum profit without taking care of the fundamental value (referred to as speculator agents in the following). This is a point that largely distinguishes our work from those previously cited. We think that in [3] and [7] one issue is that the decision rules of agents are excessively dominated by randomness: whatever the market statements are, the corresponding action is decided randomly. It is true that along the market activity, the evolving process selects the best responses to those statements, but nothing guarantees that
the corresponding actions are relevant with respect to an economic logic. For example, it is very probable that although a stock is mispriced (say, undervalued), the agents will never try to arbitrage this spread (here, by buying it). The other issue is that technical statements as well as fundamental statements are mixed together and no typical behavior is clearly observable. We try to improve the agent model by defining a minimal economic logic that guides each subpopulation's actions: fundamentalists try to arbitrage any price deviation whereas speculators ground their decisions on subjective, technical information. As said before, the main characteristic of the fundamentalist agents is that they take appropriate decisions considering the spread between the observed prices and the fundamental value. Let us consider the composition of the chromosome and what kind of statements are coded inside.

Bit   Market indicator
1     pt/vft > 0.2
2     pt/vft > 0.4
3     pt/vft > 0.6
4     pt/vft > 0.8
5     pt/vft > 1.0
6     pt/vft > 1.2
7     pt/vft > 1.4
8     pt/vft > 1.6
9     pt/vft > 1.8
10    pt/vft > 2.0

Fig. 1. Fundamentalists' chromosome

pt/vft ∈ [0.0, 0.8]           ⇒  1 (buy)
pt/vft ∈ [1.2, 2.0]           ⇒ -1 (sell)
pt/vft ∈ [0.0, γ], γ > 0.8    ⇒  0 (stay unchanged)
pt/vft ∈ [γ, 2.0], γ < 1.2    ⇒  0 (stay unchanged)
pt/vft ∈ [0.0, 2.0]           ⇒  0 (stay unchanged)

Fig. 2. Rules for fundamentalist rationalization
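These rules admit a compact implementation. The sketch below assumes the rationalization procedure simply overwrites the action of a newly generated rule with the only action consistent with Fig. 2; the function and variable names are ours, not the original SF-ASM code:

def rationalize_fundamentalist(ratio):
    """Return the only action consistent with the rules of Fig. 2.

    `ratio` is pt/vft, the observed price over the fundamental value;
    1 codes a buy, -1 a sell and 0 stay unchanged.
    """
    if ratio <= 0.8:   # clearly underpriced: arbitrage by buying
        return 1
    if ratio >= 1.2:   # clearly overpriced: arbitrage by selling
        return -1
    return 0           # ambiguous middle band: stay unchanged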
Let us consider the seventh gene; the corresponding statement, depending on its value {1, 0, #}, is: The price {is, is not, is or is not} at least forty percent above the fundamental value. We have added a rationalization procedure to the original SF-ASM. This procedure aims to achieve a minimal economic rationality for the agents. Fundamentalists are assumed to arbitrage significant spreads between fv and p, that is, to bid for underpriced shares and to ask for overpriced stock. This procedure is based on the rules presented in Fig. 2. One has to keep in mind that this procedure is run each time a new rule is generated (consequently, when the genetic algorithm is initialized and run). Let us now consider the second subpopulation: the speculator agents. As said before, those agents do not arbitrage prices but rather try to make profit using trends or subjective knowledge. Therefore, their chromosome is constructed using this kind of market representations, as shown in Table 1.
Table 1. Speculators' chromosome

Bit   Market indicator
1     pt > pt-1
2     pt > pt-2
3     pt > (1/5) Σ pi, i ∈ [t-5, t-1]
4     pt > (1/10) Σ pi, i ∈ [t-10, t-1]
5     pt > (1/100) Σ pi, i ∈ [t-100, t-1]
6     pt > (1/250) Σ pi, i ∈ [t-250, t-1]
7     pt > (1/2)[min pi + max pi], i ∈ [t-10, t-1]
8     pt > (1/2)[min pi + max pi], i ∈ [t-100, t-1]
9     pt > (1/2)[min pi + max pi], i ∈ [t-250, t-1]
The chromosome is thought to code a general sentiment on the market trend, which is very different from the identification of a market state. What we mean here is that this trend is supposed to constrain the attitudes of the agents that want to exploit it, not with an arbitrage strategy but rather by following it. Hence, if the general sentiment is a bull market, a rational behavior for a speculator agent is to buy (symmetrically, if the market is bear, the rational behavior is to sell). We have coded this logic in the speculator rationalization. To form a global sentiment on the market trend, we simply appreciate the dominant trend given by the indicators or groups of indicators. The decision making process for speculator agents is relatively complex and can be divided into two major steps. For bits 1, 2, 7, 8 and 9, we simply consider whether the belief of the agent validates the condition or not. Let us consider the example of bit 8: we explicitly test whether the price is over or under the median of the interval bounded by the highest and the lowest quotations during the last 100 days. If the price is above this median, it is thought that the price will decrease; alternatively, if it is under it, it is believed that the price will rise. For instance, this last situation pushes the agent to bid for new shares. Bits 3 to 6 receive a special treatment: bits 3 and 4 are considered together, as well as bits 5 and 6. The first pair allows the estimation of a short-range trend while the second pair allows the estimation of a long-range trend. In each pair, bit_i is the first one (bit number 3 or bit number 5) while bit_{i+1} is the second one (bit number 4 or bit number 6). To appreciate the trend, one has to consider the situation of the current price relative to those bits. As an example, let us consider the situation where the chromosome's bits 3 and 4 are respectively 0 and 1. In this case, it is false to assert that the current price is above the moving average over the past five days whereas it is clearly above the moving average over the past ten days. We therefore consider that this information is not sufficiently clear to influence the decision, and bid and ask positions have to be weighted with the same absolute-value scalar: 0.5. When those bits are respectively 1 and 1, the trend is clearly bull and the agents will be tempted to follow it, i.e. to buy. The nine possibilities for each pair are summed up in Table 2. A first step in the speculators' rationalization process is then achieved: our agent can form an initial belief on the possible tendency of the market by summing the values of each indicator. One has to keep in mind that some of them are positive (giving bid signals), negative (ask signals) or null (do nothing). If the number of positive
Table 2. Speculators' rationalization when i ∈ {3, 5}: each of the nine possible value pairs (bit_i, bit_{i+1}) over {1, 0, #} is mapped to a partial rationalization signal; clearly bull pairs give 1, clearly bear pairs give -1, uninformative pairs give 0, and mixed pairs are weighted by {0.5, -0.5}.
signals is dominating, the initial belief will be that the price will probably rise, and the corresponding behavior will be to bid. Symmetrically, dominating negative signals lead to asking and null signals lead to staying unchanged. One can easily imagine that such a logic may lead to constantly growing or falling markets: bull signals are followed by bid positions that push the price up. Why should this tendency break down? According to [14], one major indicator observed by traders is market liquidity. The idea is that operators are very concerned with the possibility of clearing their positions (to sell when they hold stocks or to buy if they are short). This implies that minimum volumes have to be realized at each time step. When the market becomes illiquid, agents may be stuck with their shares. Therefore, they follow the market if and only if they are confident in the liquidity level of the market. This point has been included in the agents' logic with the following rules (a sketch follows the list):
• each agent has her own threshold above which she considers that the market is insufficiently liquid to clear her positions;
• when this threshold is reached, she adopts a position opposite to the one she would have adopted without considering this threshold; in this way, she reverses her investment strategy so as to get out of the market.
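A minimal sketch of this liquidity-fear rule; the argument names and the scalar illiquidity measure are illustrative, since the paper leaves the exact proxy open:

def speculator_order(trend_signal, illiquidity, threshold):
    """Apply the liquidity-fear rule on top of the trend signal.

    `trend_signal` is +1 (bid), -1 (ask) or 0 (stay unchanged),
    obtained by summing the indicator values; `threshold` is this
    agent's own liquidity-fear threshold.
    """
    if illiquidity > threshold:
        # Market judged insufficiently liquid to clear positions later:
        # reverse the strategy so as to get out of the market.
        return -trend_signal
    return trend_signal  # liquidity is fine: follow the trend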
4 Experimental Schedule and Results

As the model contains many numerical parameters, we have chosen to vary only the ones which directly impact the global price dynamics, that is to say the speculator agents' proportion and their liquidity-fear parameters. The other ones are kept constant, as they can be seen as more technical model parameters. All of the following experiments are realized on a time range of 10000 iterations. However, as the genetic algorithm used by the agents to adapt themselves to the market needs a learning period, only iterations between time steps 2000 and 10000 are shown. All statistics are computed on this range unless stated otherwise. As our primary goal is to study the influence of the speculator agents' proportion on price dynamics, we first run an experiment without speculators (i.e. only with fundamentalists). This first experiment allows us to validate our fundamentalist agent model by matching the experimental results with the ones obtained by [15] with the original SF-ASM model. This experiment is also used as a comparison base with the other ones, as it represents the baseline price dynamics of our model (i.e. the least variant price series). Other experiments are realized by gradually increasing the speculator agents' proportion in the population and by adjusting their liquidity
fear. Many experiments have been run, but we only detail here the ones with the most significant results.
4.1 A Fundamentalist Market

Figure 3 represents the price and fundamental value motions when the market is only made of fundamentalist agents. The two series perfectly overlap.
Fig. 3. Market dynamics with fundamentalist agents
The first step to test whether those motions are consistent with what happens in real stock markets consists in testing whether they are driven by non-stationary processes or not. The appropriate test to seek a random-walk process is an Augmented Dickey-Fuller (ADF) unit root test. Both fundamental values and prices have to be random walks if we want to qualify the simulations as realistic, since the immense part of academic research attests such motions for modern, real stock market dynamics. In the following tests, the null hypothesis is "the time series presents one unit root" (H0) while the alternative is "the time series has no unit root" (H1). Table 3 reports the results of those tests. The interpretation is the following: if the t-statistic is less than the critical value, one can reject H0 against the one-sided alternative H1, which is not possible in our case.

Table 3. ADF Unit Root Tests.

Test                           Time series   t-Statistic   Prob.*
Augmented Dickey-Fuller test   fund. val.    -2.3594       0.4010
Augmented Dickey-Fuller test   price         -2.4154       0.3713

Critical values: 1% level: -3.9591; 5% level: -3.4103; 10% level: -3.1269. *MacKinnon one-sided p-values.
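Such a test is easy to reproduce on the simulated series. A minimal sketch in Python with statsmodels, assuming `series` holds one of the simulated time series; the trend-plus-constant specification (regression="ct") is our guess from the reported critical values:

from statsmodels.tsa.stattools import adfuller

def adf_report(series, name):
    """Augmented Dickey-Fuller test; H0: the series has a unit root."""
    stat, pvalue, _, _, crit, _ = adfuller(series, regression="ct")
    print(f"{name}: t-statistic = {stat:.4f}, p-value = {pvalue:.4f}")
    for level, value in crit.items():
        # H0 is rejected only if the t-statistic is below this value
        print(f"  {level} critical value: {value:.4f}")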
A Johansen co-integration test shows that prices and fundamental values co-evolve. We also observe that the spread between prices and fundamental values remains very weak (between -3.22% and +2.72%, with a 0.02% mean and a 1.11% standard deviation). This baseline experiment therefore exhibits some interesting results if one considers its proximity with real market dynamics. It also shows that bounded-rationality agents can make a random-walk motion emerge, which is characteristic of efficient prices on stock markets. This result is already documented by [15], [2]. Nevertheless, our contribution is to obtain such results with agents following rules that make sense, which was less evident in the original studies.

4.2 A Mixed Market
Figure 4 represents price and fundamental value motions when the market is made of 25% of fundamentalist agents and 75% of speculators.
Fig. 4. Market dynamics with 25% fundamentalists and 75% speculators

It appears that the market is more volatile than when it is flooded with fundamentalists, which is an expected result. If one considers the statistical properties of the price motion globally (on the complete sample), the null hypothesis of a random walk can be rejected with a very low risk (p < 3%). This result is understandable as the agent population is composed of a majority of speculators. However, on smaller samples (for example on the time range from 2000 to 3000), the result of the test is inverted: the market is in a period where it behaves as if it follows a random walk. In such periods, the price and the fundamental value motions are co-integrated, which shows that the market follows the fundamental value dynamics. In Table 4, we report some basic statistics related to the spreads between observed prices and fundamental values. It clearly appears that prices are much more volatile in the second regime (with speculators) than in the first one (standard, maximum and minimum deviations). The over-returns mean is also strictly positive. Moreover, the returns do not follow a Normal distribution.
Table 4. Price deviations relative to the fundamental value

                    Global sample         Critical sub-sample
Speculators prop.   0%         75%        0%         75%
Mean                0.032079   2.135464   0.152548   3.180839
Median              0.116219   1.259602   0.150780   1.450169
Std. Dev.           2.067116   3.809237   2.114522   5.535200
Skewness           -0.200899   1.191886  -0.230643   1.228508
Kurtosis            2.421358   4.609048   2.236049   3.489872
On a critical period where we can visually identify a bubble, for example during the time period 5000-5400, prices still follow a random walk. Table 4 reports price deviations during this critical event. Here the standard deviation is greater than the one observed on the complete sample. A bubble is hence characterized by a great deviation between the stock price and its fundamental value during a long time range. This typical dynamic, obtained with 75% of speculators and 25% of fundamentalists, can be found with other sets of parameters as long as the speculator agents' proportion is large (> 70%). In the speculative regime (when speculators compose the main part of the population), we obtain a highly volatile price dynamic with bubbles and crashes. These phenomena would be undetectable if we could not watch the fundamental value. Moreover, as the prices follow a random walk most of the time, nothing can distinguish such a dynamic from the one observed with a fundamentalist population except the comparison between the prices and the fundamental value. Hence, there could be speculative bubbles in real markets while the technical efficiency properties would be respected.
5 Conclusion

In our simulations, we obtain price dynamics specific to our two agent populations. These behaviors were designed to illustrate two main economic logics: the first follows the classical economic theory, which is grounded on agents arbitraging differences between the fundamental values and the current stock prices, whereas the second is mainly based on ideas from the Keynesian theory of speculation. The first market dynamic is obtained when the agent population is only composed of fundamentalists. We show that in this case, the price dynamics follows a random walk which co-evolves with the fundamental values. This first result can be related to the ones of [15]: inductive agents in bounded rationality can make efficient prices emerge. The difference here is that fundamentalists only ground their decisions on classic market indicators and that these decisions are made following consistent behavioral rules, which is not the case in many simulated stock markets. When speculator agents compose the main part of the agent population, we obtain another type of dynamics: prices still follow a random walk process, but during
some periods, the system reaches a critical state. This critical state is characterized by the emergence of a new phenomenon: the stock starts to be more and more overpriced (bubble) before falling back violently to its fundamental value (crash). Moreover, these market dynamics are very volatile. The next steps in our research could be to introduce a third agent behavior which would act as a market regulator to arbitrage the market and prevent bubbles from happening. This could for example be realized by introducing a behavior that would punctually decrease the market liquidity to force the speculators to reverse their decisions. One can also imagine studying the impact of social interaction between agents on market dynamics, to see whether it would arbitrage the price deviations away or amplify them.
References

1. B. Arthur. Inductive reasoning and bounded rationality: the El Farol problem. American Economic Review, 84:406-411, 1994.
2. B. Arthur. Inductive reasoning and bounded rationality: the El Farol problem. American Economic Review, 84:406-411, 1994.
3. B.W. Arthur, J.H. Holland, B. LeBaron, R.G. Palmer, and P. Tayler. Asset pricing under endogenous expectations in an artificial stock market. In W.B. Arthur, S.N. Durlauf, and D. Lane, editors, The Economy as an Evolving Complex System II, pages 15-44, 1997.
4. N. Ehrentreich. A corrected version of the Santa Fe Institute artificial stock market model. Working Paper, Martin-Luther-Universität Halle-Wittenberg, Dept of Banking and Finance (Germany), September 2003.
5. S. Focardi, S. Cincotti, and M. Marchesi. Self-organization and market crashes. Journal of Economic Behavior and Organization, 49(2):241-267, 2002.
6. L. Gulyas, B. Adamcsek, and A. Kiss. An early agent-based stock market: replication and participation. Proceedings of the NEU 2003, 2003.
7. L. Gulyas, B. Adamcsek, and A. Kiss. An early agent-based stock market: replication and participation. Proceedings of the NEU 2003, 2003.
8. N.F. Johnson, D. Lamper, P. Jefferies, M.L. Hart, and S. Howison. Application of multi-agent games to the prediction of financial time-series. Oxford Financial Research Centre Working Papers Series No 2001mf04, 2001.
9. J.M. Keynes. The General Theory of Employment, Interest and Money. MacMillan, London, 1936.
10. B. LeBaron. Experiments in evolutionary finance. Working Paper, University of Wisconsin - Madison, August 1995.
11. B. LeBaron. Evolution and time horizons in an agent-based stock market. Macroeconomic Dynamics, 5(2):225-254, 2001.
12. B. LeBaron. Building the Santa Fe artificial stock market. Working Paper, Brandeis University, June 2002.
13. H. Levy, M. Levy, and S. Solomon. A microscopic model of the stock market: cycles, booms, and crashes. Economics Letters, 45(1):103-111, May 1994.
14. A. Orléan. Le pouvoir de la finance. 1999.
15. R.G. Palmer, W.B. Arthur, J.H. Holland, B. LeBaron, and P. Tayler. Artificial economic life: a simple model of a stock market. Physica D, 75:264-274, 1994.
16. P.A. Samuelson. Proof that properly anticipated prices fluctuate randomly. Industrial Management Review, (6):41-49, 1965.
Traders Imprint Themselves by Adaptively Updating their Own Avatar

Gilles Daniel^1, Lev Muchnik^2, and Sorin Solomon^3

^1 School of Computer Science, University of Manchester, UK
^2 Department of Physics, Bar Ilan University, Ramat Gan, Israel
^3 Racah Institute of Physics, Hebrew University of Jerusalem, and Lagrange Laboratory for Excellence in Complexity, ISI Foundation, Torino
Simulations of artificial stock markets were considered as early as 1964 [20] and multi-agent ones were introduced as early as 1989 [10]. Starting in the early 90's [18, 13, 21], collaborations of economists and physicists produced increasingly realistic simulation platforms. Currently, the market stylized facts are easily reproduced and one has now to address the realistic details of the Market Microstructure and of the Traders' Behaviour. This calls for new methods and tools capable of bridging smoothly between simulations and experiments in economics. We propose here the following Avatar-Based Method (ABM). The subjects implement and maintain their Avatars (programs encoding their personal decision making procedures) on NatLab, a market simulation platform. Once these procedures are fed in a computer-feedable format, they can be operationally used as such without the need for belabouring, interpreting or conceptualising them. Thus ABM short-circuits the usual behavioural economics experiments that search for the psychological mechanisms underlying the subjects' behaviour. Finally, ABM maintains a level of objectivity close to classical behaviourism while extending its scope to subjects' decision making mechanisms. We report on experiments where Avatars designed and maintained by humans from different backgrounds (including real traders) compete in a continuous double-auction market. Instead of viewing this as a collectively authored computer simulation, we consider it rather as a new type of computer-aided experiment. Indeed we consider the Avatars as a medium on which the subjects can imprint and refine interactively representations of their internal decision making processes. Avatars can be objectively validated (as carriers of a faithful replica of the subject's decision making process) by comparing their actions with the ones that the subjects would take in similar situations. We hope this unbiased way of capturing the adaptive evolution of real subjects' behaviour may lead to a new kind of behavioural economics experiments with a high degree of reliability, analysability and reproducibility.
1 Introduction

In the last decade, generic stylized facts were reproduced with very simple agents by a wide range of models [3, 12, 14, 6, 16, 8]. By the very nature of their generic properties, those models teach us little about the real particular effects taking place as a result of real particular conditions within the market. In order to understand such specific market phenomena, one may need to go beyond "simple-stupid" traders' behaviour [1]. Thus the task of the present generation of models is to describe and explain the observed collective market phenomena in terms of the actual behaviour of the individuals. For a long while, classical economics assumed individuals were homogeneous and behaved rationally. Thus it was not necessary to study real people's behaviour since (presumably) there is only one way to be rational. Even after the conditions of rationality and homogeneity were relaxed, many models did so by postulating arbitrary departures not necessarily based on actual experiments. When the connection to the real subjects' behaviour was considered [11], an entire host of puzzles and paradoxes appeared even in the simplest artificial (laboratory) conditions. Thus the inclusion of real trader behaviour in the next generation of models and simulations is hampered by the inexistence of comprehensive, systematic, reliable data. Given the present state of the art in psychological experiments, where even the behaviour of single subjects is difficult to assess, we are led to look for alternative ways to elicit the necessary input for agent-based market modelling. In this paper we propose a way out of this impasse. Rather than considering the computer as a passive receiver of the behavioural information elicited by psychological experiments, we use the computer itself as an instrument to extract some of the missing information. More precisely, we ask the subjects to write and update adaptively, between simulation runs (or virtual trading sessions), their own avatars. By gradual corrections, those avatars converge to satisfactory representations of the subjects' behaviour, in situations created by their own collective co-evolution. The fact that the co-evolution takes place through the intermediary of the avatars' interaction provides an objective, detailed documentation of the process. More importantly, the dialogue with the avatars, their actions and their collective consequences assist the subjects in expressing in a more and more precise way their take on the evolving situation, and validate the avatar as an expression of the subject's internal decision mechanisms. Ultimately, the avatar becomes the objective repository of the subject's decision making process. Thus we extend, with the help of computers, the behaviourist realm of objectivity to a new area of decision making dynamics. Classical behaviourism limits legitimate research access to external overt behaviour, restraining its scope to the external effects produced by a putative mental dynamics. The method above enables us to study the subjects' decision making dynamics without relying on ambiguous records of overt subject behaviour or on subjective introspective records of their mental state and motivations. Far from invalidating the psychological experimental framework, the present method offers psychological experiments a wide new source of information in probing the human mind. The competitive, ego-engaging character of the realistic NatLab
market platform [17] puts humans in very interesting, authentic and revealing situations in a well controlled and documented environment. Thus standard psychological techniques can exploit it, e.g. by interviewing the subjects before and after their updated strategies are applied and succeed (or fail!).
2 Avatars

The program sketched in the previous section suggests a behavioural finance viewpoint, in a realistic simulation framework. More precisely, the avatars acting in such an environment are able to elicit from the subjects operationally precise and arbitrarily refined descriptions of their decision processes. In particular, by analysing the successive avatar versions that passed the validation of their owner, one can learn how the owner behaves in this market environment, how (s)he designs his/her strategies, how (s)he decides to depart from them, how (s)he updates them iteratively, etc. Thus the new environment acquires a mixed computational and experimental laboratory character. In this respect, the present study owes to previous research that involved simulations and experiments combining human beings and artificial agents, in real time [5] or off-line [15, 2]; see [7] for a review of computational vs experimental laboratories. The heart of the new simulation-experimentation platform is the co-evolving set of Avatars. They constitute both the interacting actors and the medium for recording the chronicles of the emergent collective dynamics of the subjects. As a medium for capturing cognitive behaviour, the avatars help extend the behaviourist objectivity criteria to processes that until now would be considered off-limits. We achieve this by eliciting from humans operational instructions for reaching decisions that they want implemented by their market representatives, the avatars. There is an important twist in this procedure: we are not trying to obtain from the subjects reports of their internal state of mind and its evolution; we are just eliciting instructions for objective actions in specific circumstances. They are however formulated in terms of conditional clauses that capture the users' intentionality, evaluations, preferences and internal logics.

2.1 Principle
At the beginning of a run, every participant designs his own avatar, which is used as a basis to generate an entire family of artificial agents whose individuality is expressed by various (possibly stochastically generated) values of their parameters. The resulting set of artificial agents compete against each other in our market environment; see Fig. 1. We use many instances, rather than a single instance, of the avatar for each subject, for the following reasons:
• having a realistic number of traders that carry a certain strategy, trading policy or behaviour profile;
• having enough statistics on the performance of each avatar and information on the actual distribution of this performance.
Fig. 1. The Avatar-Based Method: Human subjects design their avatar in the NatLab environment. From each avatar, a family of artificial agents is generated and included in the market.

Once the population of agents is generated, a first simulation run is performed. A typical run lasts about 10 minutes of CPU time, which may represent years of trading on the artificial time scale. At the end of each run, the results are processed and presented to the participants. In our experiments until now, both private and public information were made available. In particular, the price (and volume) trajectory, the (relative) individual wealth in terms of cash holdings, stock holdings, and their evolution were publicly displayed. The avatar codes were also disclosed and the participants were asked to describe publicly their strategy and the design of their avatar. After being presented with the results (whether full or only public information) of the previous run, the participants are allowed to modify their own avatar and submit an upgraded version for the next run, as described in Fig. 2. The goal of this iterative process, co-evolving the subjects' thinking with computer simulations, is to converge in two respects; the subject understands better and better:
• the consequences of his/her own strategy;
• how to get the avatars to execute it faithfully.
2.2 Comparison Between Approaches

In this section, we discuss the relevance of our method in the context of other works in economics. Economics spans a wide range of fields and approaches. In the table displayed in Fig. 3, the four rows classify the activities in terms of their context and environment, starting with the DESK at the bottom of the table, extending to the use of computers, then to the laboratory and ultimately to the real unstructured world.
(Flowchart: in each iteration, the participant designs or improves an avatar, submits it, and publicly explains the avatar design, learning and adapting between runs; the facilitator collects the avatars, generates the agent families, runs the simulation, post-processes the data and displays it.)
Fig. 2. Iterative process: participants design and improve their avatar in between every simulation run
Wide Wild World. Classical methods: field studies, econometrics, behavioural economics, WWW-based large-scale experiments. Avatar-Based Method: remote extraction of avatars from special distant subjects (important bank executives, nationals of various cultures, etc.).

Lab and Computer. Classical methods: interactive subject experiments and single-subject cognitive experiments (lab); agent-based and social-network simulations, computational finance (computer). Avatar-Based Method: intimate dialogue between subjects and computer via avatar updates alternated with NatLab market runs; validation.

Desk. Classical methods: numerical model solving, game theory and other analytical work, economic analysis. Avatar-Based Method: relaxing in a controlled way the assumptions of the models; expressing operationally qualitative descriptions.

Fig. 3. Positioning the Avatar-Based Method
The two columns of the table refer to the usual methods and to the "Avatar-Based Method" (ABM from now on). One sees that, in our view, the ABM constitutes a rather uniform way to treat economic behaviour and is capable of bridging application areas that were until now disjoint. This is clearly the case for the LAB and COMPUTER rows, where we even erased the separation line, but it has implications for the other rows too. For instance the Avatars, especially within the NatLab environment, have already been used to extend to more realistic conditions some theoretical models (row 4 of the table) and results [16]. At the other extreme (rows 1-2 in the table), the Avatar-Based Method can help correct a perennial problem of economic studies: the biased and sometimes unrepresentative profile of the involved subjects. Indeed, it is very difficult to involve in those studies decision making officials from financial institutions or traders. Substituting them with BA undergrads is hardly a step towards realistic emulation of the real world. It is much more likely that these important players, rather than coming to a lab, will agree to provide the elements for creating their Avatars. Similar problems can be solved by including in the ABM experiments subjects from faraway cultures or environments, without the necessity for distant travels and without separating them from their usual motivations and environment. Moreover, the information provided once by such hard-to-access subjects can now be used repeatedly by playing their Avatars. Thus ABM has a good chance to bridge the gap between field studies and lab experiments too (rows 1-2 in the table). In fact, as opposed to experiments that do not involve a market mechanism with capital gains and losses, in NatLab incompetent, non-representative subjects will naturally be eliminated since their Avatars lose their capital very quickly. Another point on which the ABM procedures offer new hope is the well-known problem of subject motivation. Within the usual experimental frameworks, it is very difficult to motivate subjects, especially competent, important people. From our experience, the realistic NatLab framework and the direct identification of the subjects with their Avatars' successes and failures lead to a very intensive and enthusiastic participation of the subjects, even for experiments that last a few days. In fact, beyond the question of prestige, even seasoned professionals reported having gained new insights into their own thinking during the sessions. Another promise that ABM is yet to deliver is that by isolating and documenting the Avatar updates at discrete times, one will be able to contribute to neighbouring cognitive fields such as learning.
3 Method Validation

A piece of software is not an experimental set-up. With all its power, the value of the platform and of the "Avatar-Based Experiments" method has to be realized in real life, and an elaborate technical and procedural set-up has to be created. The basic condition for the very applicability of our method is the humans' capability to faithfully, precisely and consistently express their decision making in terms of computer-feedable procedures. Thus we concentrated our first validation efforts in this
direction, adapting platform and procedural features to accommodate humans. Many other experimental aspects have to be standardised and calibrated, but in those experiments we concentrated on this crucial sine qua non issue. We can conclude at this stage that while there are humans (even professional economists and successful traders) who could not express their "system" in a computer-feedable format ("buy low, sell high"), by and large the participants in our experiments were able to confirm at some point that their avatar behaved in agreement with their own behaviour. This happened even with subjects with no particular computer (or market) skills.

3.1 Experimental Setup
Our experiment features a continuous double-auction implemented on the NatLab simulation platform. Every participant received extensive support from a computer scientist to implement his/her avatar in C++ on the platform.

NatLab platform. NatLab has the capability to simulate in great detail the continuous-time, asynchronous real world [15]. Bilateral and multilateral communication between agents outside and in parallel with the market is made possible by NatLab. However, given that this experiment focuses mainly on the participants' behaviour, we kept the market mechanism (the rules of the game) as simple as possible, while retaining the concept of continuous double-auction, essential to understand the price formation dynamics. NatLab was initially engineered as a simulation platform but it is now used in three distinct directions:
1. the platform provides a realistic framework for the individuals to act within. Providing this "reality" is independent of whether one is interested in its characteristics; it just allows an interactive, continuous extraction of information from each of the participants, thereby refining our understanding of their approach, reactions and decision mechanisms;
2. the platform is part of a recent wide effort to understand the emergence of collective complex dynamics out of interacting agents with well defined, relatively simple individual behaviour; and
3. the platform, due to its realistic features and its asynchronous, continuous-time microstructure, is a reliable way to reproduce and maybe in the future predict real market behaviour.

Market microstructure. Our market implements a continuous double-auction mechanism, where agents can submit, asynchronously and at any time, limit or market orders to a single public book. Orders are sorted by price and then by time, as on the NYSE for instance. Every agent acts as a simple trader, and we do not include brokers or market makers at this stage. In this simple setup, agents balance their portfolio between a risky asset (a stock, which may or may not distribute a dividend) and a riskless one (a bond, which may or may not yield an interest rate). Agents can communicate with each other through pairwise messages, and react to external news according to an idiosyncratic sensitivity.
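As an illustration of this price-time priority, a toy single-unit book might look as follows; this is a sketch of the matching rule only, since the actual NatLab book is written in C++ and is far richer:

import heapq
import itertools

class OrderBook:
    """Toy single public book with price-time priority (single-unit orders)."""

    def __init__(self):
        self._arrival = itertools.count()  # arrival rank breaks price ties
        self.bids = []                     # max-heap via negated prices
        self.asks = []                     # min-heap

    def submit(self, side, price, trader):
        """Limit order; returns (price, maker, taker) if it trades."""
        if side == "bid":
            if self.asks and price >= self.asks[0][0]:
                best = heapq.heappop(self.asks)
                return (best[0], best[2], trader)   # trade at resting ask
            heapq.heappush(self.bids, (-price, next(self._arrival), trader))
        else:
            if self.bids and price <= -self.bids[0][0]:
                best = heapq.heappop(self.bids)
                return (-best[0], best[2], trader)  # trade at resting bid
            heapq.heappush(self.asks, (price, next(self._arrival), trader))
        return None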
Avatars. We organise our experiment as a competition between participants through the intermediary of their avatars. Avatars generate, by assigning values to their parameters, families of agents that act as independent (but possibly interacting) individuals in the market. The subjects' aim in each run is to generate a family of artificial agents that performs well against the other families throughout the simulation run. A typical simulation run is exhibited in Fig. 4. Families were compared by their average wealth, but an average utility (given some utility function) or a certain bonus for minimising risk could be used in the future. We give our participants total liberty while implementing their avatar. They can define their own time horizon and design trading strategies as simple or complex as needed, but in the future we may tax agents with heavy data processing by imposing a fine or a specific time lag in the order execution.
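A sketch of how such a family might be generated, assuming multiplicative noise on the avatar's parameters; the names and the 10% jitter are illustrative, not NatLab's actual API:

import random

def generate_family(avatar_cls, base_params, size=1000, jitter=0.1):
    """Instantiate a family of artificial agents from one avatar.

    Every agent runs the avatar's decision code with stochastically
    perturbed parameter values.
    """
    family = []
    for _ in range(size):
        params = {name: value * random.uniform(1 - jitter, 1 + jitter)
                  for name, value in base_params.items()}
        family.append(avatar_cls(**params))
    return family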
3.2 Preliminary Results

We have run two sets of experiments so far, with different participants including practitioners (real traders) and academics: economists, physicists, psychologists or computer scientists. Each experiment included seven participants. The first experiment took place on July 19-31, 2004, in Lyon, during the SCSHS Summer School on Models for Complex Systems in Human and Social Sciences organised by the Ecole Normale Superieure de Lyon. The second was organised on January 12-16, in Turin, during the Winter Seminar on Agent-based Models of Financial Markets organised by the ISI Foundation. A typical run, with a preliminary analysis of the price time series and the relative evolution of populations, is presented in Fig. 4. We report here on some of the non-trivial aspects of the participants' behaviour during the experiments, while creating and updating their avatars.

Imprinting oneself. We noticed, especially at the beginning of the process, that some of our participants encountered difficulties expressing themselves in terms of computer-feedable strategies. However, this improved dramatically during the iterative process itself. This is clearly linked to the learning process that one has to face while performing any experiment, especially computerised ones.

Conscious / unconscious decisions. The very nature of our method barely allows such things as intuition, improvisation or unconscious decisions to be operationally expressed in the avatar. In fact, after a few runs, avatars capture exclusively the conscious part of our subjects' decision making process. Since we do not know to what extent market dynamics are driven by unconscious choices, it would be interesting to design a double experiment, comparing subjects and their own avatars in the same market microstructure.
Fig. 4. Typical run with 7 avatars, 1000 agents each, for above 350 000 transaction ticks. (a) Autocorrelation functions: absence of raw returns autocorrelation and long-term autocorrelation of volatility, defined as absolute returns, as observed in empirical data [12]; (b) Normality plot: fat-tailed distribution of returns; (c) Price trajectory; (d) Relative wealth of agent populations, measuring the relative success of competing avatars; (e) Stock holdings: some strategies are clearly buy-and-hold, others interact with each other; and (f) Cash holdings.

Convergence. There are two different but related convergence processes that took place during the successive iterations: the first was the convergence of the avatar's behaviour to its creator's intended strategy, while the second involved the evolution of the subject's strategy itself to beat the other participants. While it appeared relatively easy after a couple of runs to get an avatar successfully reproducing its creator's initial intended behaviour, subjects, driven by competition, kept refining and complexifying their strategies.
Strategies. An interesting panel of strategies was proposed and grown by the participants, which could loosely be termed: random trader, momentum trader, oscillatory trader, diversified Bollinger-bands trader, volume seeker, neural-network-based trader and evolutionary trader. Practitioners clearly distinguished themselves by their ability to think out of the box, the creativity of their strategies, their high analysis power and their ability to quickly understand what was going on and spot opportunities to arbitrage other participants' strategies. We also observed the emergence of cooperation between participants to hunt for the leader, trying to bring down the winning strategy by copying and modifying it or even custom-designing new strategies for this specific purpose.

Fundamental value. In the two experiments we ran, our computer simulation featured a closed artificial market, with no creation of stocks, no distribution of dividends and no interest rate associated with the riskless asset, cash. In those conditions, we observed that after a transition period, characterised by high volumes, during which assets were heavily reallocated between agents, the price kept fluctuating around a steady-state equilibrium price. This price, emerging from the interactions between heterogeneous, relatively risk-averse agents, was generally different from the fundamental value we could have expected from rational agents with homogeneous preferences.
4 Conclusion

The rapidly growing field of agent-based computational finance comes naturally as a complementary approach to the other finance subfields: behavioural finance, laboratory experiments, econometrics, game theory, etc. The field is definitely out of its infancy and a rather wide range of choices is available to academics and practitioners who wish to define and test concrete, real and realistic systems or new models of individual and market behaviour. The next step is to set common standards for the platforms that propose to represent and simulate artificial financial markets [9, 4, 19, 2]. One possible goal is to transform them into virtual or even real laboratories capable of implementing and testing, in realistic conditions, arbitrarily sophisticated experiments. One way to solve the problems of realistic trader behaviour is the Avatar-Based Method introduced in the present paper. Even though many obstacles to realizing its ambitions have not even been uncovered yet, the method is already providing new insight; even if its main ambitions were to remain unfulfilled, it is guaranteed to provide fresh, unexpected and valuable material to the existing methods. Among the fundamental issues which the ABM can address is the mystery of price formation, by providing in great detail, reliability and reproducibility the traders' decision making mechanisms. Occasionally the Avatars are going to be caught unprepared and inadequate to deal with some instances that were not foreseen by their
owners. By virtue of this very fact, they will become effective labels for the emergence of novelty in the market. Thus in such instances, even in its failure, the ABM will provide precious behavioural and conceptual information. ABM can serve as a design tool for practitioners in the development of new trading strategies and the design of trading automata. Moreover, we hope that this approach will provide new ways to address some of the fundamental problems underlying the economics field:
• how people depart from rationality;
• how out-of-equilibrium markets achieve efficiency or fail to;
• how extreme events due to a shifting composition of market participants could be anticipated.
The experiments we ran, beyond eliciting information, provided a very special and novel framework of interaction between practitioners and academics. Thus NatLab and ABM might have an impact on the community by providing a common language and vocabulary to bring together academics and much-needed practitioners. As a consequence, it appears necessary to gather interdisciplinary projects that would house within the same team the psychologists who run experiments on people's behaviour, the computer scientists who canonise this behaviour into artificial agents, the practitioners who relate those experiments to real markets, and the economists who assess the consequences in terms of policy making.
Acknowledgements

We would like to thank the participants of our experiments for their time and commitment, together with the participants of the Seminar on (Un)Realistic Simulations of Financial Markets at the ISI Foundation, Turin, Italy, on April 1-5, 2005, for their enlightening comments, from which this paper largely benefited. We are also largely indebted to Alessandro Cappellini and Pietro Terna for their views and experience on online laboratory experiments on stock markets, as well as to Martin Hosnisch, Diana Mangalagiu and Tom Erez. The research of SS was supported in part by a grant from the Israeli Academy of Science, and the research of LM was supported by a grant from the Centre for Complexity Science. All errors are our own responsibility.
References

1. R. Axelrod. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, 1997.
2. K. Boer, M. Polman, A. Bruin, and U. Kaymak. An agent-based framework for artificial stock markets. In 16th Belgian-Dutch Conference on Artificial Intelligence (BNAIC), 2004.
3. P. Bak, M. Paczuski, and M. Shubik. Price variations in a stock market with many agents. Physica A, 246:430-453, 1997.
4. A.N. Cappellini. Esperimenti su mercati finanziari con agenti naturali ed artificiali. Master's thesis, Dipartimento di Scienze Economiche e Finanziarie, Facolta di Economia, Universita di Torino, Italy, 2003.
5. A. Cappellini. Avatar e simulazioni. Sistemi intelligenti, 1:45-58, 2005.
6. R. Cont and J.-P. Bouchaud. Herd behaviour and aggregate fluctuations in financial markets. Macroeconomic Dynamics, 4:170-196, 2000.
7. J. Duffy. Agent-based models and human subject experiments. Computational Economics 0412001, Economics Working Paper Archive at WUSTL, December 2004. Available at http://ideas.repec.org/p/wpa/wuwpco/0412001.html.
8. I. Giardina and J.-P. Bouchaud. Bubbles, crashes and intermittency in agent based market models. The European Physical Journal B, 31:421-437, 2003.
9. B.I. Jacobs, K.N. Levy, and H. Markowitz. Financial market simulations. Journal of Portfolio Management, 30th Anniversary, 2004.
10. G. Kim and H. Markowitz. Investment rules, margin, and market volatility. Journal of Portfolio Management, 16(1):45-52, 1989.
11. D. Kahneman and A. Tversky. Prospect theory: an analysis of decision under risk. Econometrica, 47(2):263-292, 1979.
12. Y. Liu, P. Gopikrishnan, P. Cizeau, M. Meyer, C. Peng, and H.E. Stanley. Statistical properties of the volatility of price fluctuations. Physical Review E, 60:1390-1400, 1999.
13. H. Levy, M. Levy, and S. Solomon. Microscopic Simulation of Financial Markets: From Investor Behavior to Market Phenomena. Berkeley, CA: Academic Press, 2000.
14. T. Lux and M. Marchesi. Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397:498-500, 1999.
15. L. Muchnik and S. Solomon. Statistical mechanics of conventional traders may lead to non-conventional market behavior. Physica Scripta, T106:41-47, 2003.
16. L. Muchnik, F. Slanina, and S. Solomon. The interacting gaps model: reconciling theoretical and numerical approaches to limit-order models. Physica A, 330:232-239, 2003.
17. L. Muchnik. Simulating emergence of complex collective dynamics in the stock markets.
http://shum.huji.ac.il/~sorin/ccs/Lev-Thesis.pdf.
18. R.G. Palmer, W.B. Arthur, J.H. Holland, B. LeBaron, and P. Tayler. Artificial economic life: a simple model of a stock market. Physica D, 75:264-274, 1994.
19. M. Shatner, L. Muchnik, M. Leshno, and S. Solomon. A continuous time asynchronous model of the stock market; beyond the LLS model. In Economic Dynamics from the Physics Point of View. Physikzentrum Bad Honnef, Germany, 2000.
20. G.J. Stigler. Public regulation of the securities market. Journal of Business, 37(2):117-142, 1964.
21. P. Terna. SUM: a surprising (un)realistic market - building a simple stock market structure with Swarm. In Computing in Economics and Finance. Society for Computational Economics, 2000.
Learning in Models
Learning in Continuous Double Auction Market

Marta Posada, Cesareo Hernandez, and Adolfo Lopez-Paredes

University of Valladolid, E.T.S. de Ingenieros Industriales, Paseo del Cauce s/n, 47011 Valladolid, Spain posada@eis.uva.es
Summary. We start from the fact that individual behaviour is always mediated by social relations. A heuristic is not good or bad, rational or irrational, in itself, but only relative to an institutional environment. Thus, for a given environment, the Continuous Double Auction (CDA) market, we examine the performance of alternative intelligent agents, in terms of market efficiency and individual surplus. In CDA markets traders face three non-trivial decisions: How much should they bid or ask for their own tokens? When should they place a bid or an ask? And when should they accept an outstanding order of some other trader? Artificially intelligent traders have been used to explore the properties of the CDA market. But, in all previous works, agents have a fixed bidding strategy during the auction. In our simulations we allow the soft agents to learn not only how much they should bid or ask, but also about possible switching between the alternative strategies. We examine the emergence or not of Nash equilibria, with a bottom-up approach. Our results confirm that although market efficiency is an ecological property and it is robust against intelligent agents, convergence and volatility depend on the learning strategy. Furthermore, our results are at odds with the results obtained from a top-down approach, which claimed the existence of Nash equilibria.
1 CDA Market and Artificial Agents

The CDA is the dominant institution for the real-world trading of equities, CO2 emission permits, derivatives, etc. In the CDA, a buyer can submit a price at which he is willing to buy (make a bid), and a seller can submit a price at which he is willing to sell (make an ask). If another buyer bids a higher price, it becomes the market bid; if another seller asks a lower price, it becomes the market ask. A buyer is free to accept the market ask; a seller is free to accept the market bid. Such acceptances consummate a binding transaction. Trades take place as new bids and asks arrive. The auction continues for a specified period of time. The convergence and efficiency properties of the CDA have been the subject of interest among experimental economists, beginning with the seminal work of Smith [10]. In experimental economics (EE) the experiments directly reveal agents'
aggregated behaviour; they do not provide information about their individual decision rules and the impact of these rules on individual and aggregate performance. Simulation and game-theoretic analyses partially overcome this limitation. Game-theoretic studies on the double auction generally adopt the framework of static games with incomplete information, for which the equilibrium solution is the Bayesian Nash equilibrium. However, the CDA is a complex dynamic system and we need a framework that captures the basic dynamics of the system. As Kirman and Vriend [7] argued, if we want to understand the dynamics of interactive market processes and the emergent properties of the evolving market structures and outcomes, it might pay to analyze explicitly how agents interact with each other and how information spreads through the market. A natural way to do this is following an agent-based computational economics (ACE) approach (for reasons see Vriend [14]). The first shocking result from the ACE approach to CDA markets was that Zero-Intelligence (ZI) agents with random offers may lead to high market efficiency, thus proving that the institutional design is robust against individual learning (Sunder [11]). This result is in agreement with the eighteenth-century classical philosophers, Adam Smith and David Hume, and later on with the Austrians, who claimed that spontaneous order may be an outcome of the individual interactions. Ecological learning and social learning beat individual intelligent learning.
2 Intelligent Agents in the CDA Market

Although Zero-Intelligence agents may achieve high market efficiency, ZI agents perform poorly when they are competing with agents with learning capacity (Tesauro and Das [12]). It seems intuitively reasonable that one should endow the agents in a CDA with both intelligence and adaptation (social learning). In a series of subsequent papers, non-zero-intelligence agents have been proposed to analyze the performance of the CDA market: ZIP agents, GD agents, etc.

A GD agent (Gjerstad and Dickhaut [4]) chooses the offer that maximizes his surplus, defined as the product of the gain from trade and the belief that some agent will accept the offer. GD agents use the history HM of the recent market activity (the bids and asks leading to the last M trades: ABL(b), accepted bids less than b; AL(b), asks less than b; RBG(b), rejected bids greater than b; etc.) to calculate a belief function. Interpolation is used for prices at which no orders or trades are registered in HM. For example, the belief function of a buyer is:

q(b) = (ABL(b) + AL(b)) / (ABL(b) + AL(b) + RBG(b))    (1)
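As an illustration, the belief (1) can be computed directly from the stored history; the following minimal Python sketch assumes the history is kept as lists of (price, accepted) pairs, a data layout and function name that are our assumptions, not part of the original specification:

    # Sketch of the GD buyer belief of Eq. (1). bids and asks are lists of
    # (price, accepted) tuples taken from the history HM.
    def buyer_belief(b, bids, asks):
        abl = sum(1 for p, acc in bids if acc and p <= b)      # accepted bids <= b
        al = sum(1 for p, acc in asks if p <= b)               # asks <= b
        rbg = sum(1 for p, acc in bids if not acc and p >= b)  # rejected bids >= b
        denom = abl + al + rbg
        return (abl + al) / denom if denom > 0 else 0.0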
ZIP agents (Cliff and Bruten [2]) have a mark-up that determines the price at which the agent is willing to buy or sell. The agents learn to modify this profit margin with adaptive learning. For example, the profit margin of a buyer is:

μ = (howMuchBid_{t-1} + Δ_t) / ReservePrice − 1    (2)
where Δ_t is calculated using the individual trader's learning rate (β), the momentum learning coefficient (γ) and the difference between the target bid and the bid placed in the last round (howMuchBid_{t-1}) in the following way:

Δ_t = γΔ_{t-1} + (1 − γ)β(targetBid − howMuchBid_{t-1})    (3)
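A compact sketch of the ZIP buyer update, combining (2) and (3); the default β, γ and the calling convention are illustrative placeholders, not the calibration used in the experiments:

    # One ZIP learning step for a buyer: Eq. (3) updates the momentum term,
    # Eq. (2) gives the implied profit margin.
    def zip_step(how_much_bid, delta, target_bid, reserve_price,
                 beta=0.3, gamma=0.05):
        delta = gamma * delta + (1 - gamma) * beta * (target_bid - how_much_bid)
        new_bid = how_much_bid + delta
        margin = new_bid / reserve_price - 1          # Eq. (2)
        return new_bid, delta, margin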
In our analysis we consider one more bidding strategy, the Kaplan strategy (K), which won the Santa Fe tournament (Rust et al [9]). The basic idea behind this strategy can be summarized as follows: wait in the background and let the others negotiate. Kaplan buyers will bid at the best ask only when one of the following three conditions is met:

1. The fraction of time remaining in the period is less than 10%.
2. The best ask is less than the minimum trade price in the previous period.
3. The best ask is less than the maximum trade price in the previous period, the ratio of the bid-ask spread to the best ask is less than 10%, and the expected profit is more than 2%.

We have reproduced these results for homogeneous populations of agents with fixed learning strategies during the auction. Figure 1 presents the time series of transaction prices during three sessions (100 rounds per session). Learning and intelligence play an important role in the convergence of transaction prices to the competitive equilibrium price. GD agents learn very soon to trade at a price very close to the competitive equilibrium price; their transactions are made in the first rounds of each period. ZIP agents take more time than GD agents both to exchange and to learn. K agents must be parasitic on the intelligent agents to trade and to obtain profit: if all traders in the market are K agents, no trade will take place.

Although learning and convergence to the Nash equilibrium have been widely studied (Kirman [6]), there are few applications to the analysis of learning strategies in a CDA market (Walsh et al [15]). Walsh et al [15] found two Nash equilibrium points when these three types of agents (GD, K and ZIP) rival each other in a CDA market and the agents' strategies are fixed by the modeller. But the question we put forward in this work is: if the agents
Fig. 1. Transaction price time series for different homogeneous populations of agents during three sessions (100 rounds per session)
can adjust their strategies in an adaptive way, closer to behaviour in the real world, will the Nash equilibrium be achieved?
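Before turning to that question, the three Kaplan trigger conditions of Sect. 2 can be summarized as a simple predicate; this is a rough sketch, in which the argument names and the expected-profit measure are our assumptions:

    # Returns True when a Kaplan buyer snipes the current best ask.
    def kaplan_buyer_bids(best_ask, best_bid, time_left_frac,
                          min_price_prev, max_price_prev, expected_profit):
        closing = time_left_frac < 0.10                      # condition 1
        juicy = best_ask < min_price_prev                    # condition 2
        small_spread = (best_ask < max_price_prev
                        and (best_ask - best_bid) / best_ask < 0.10
                        and expected_profit > 0.02)          # condition 3
        return closing or juicy or small_spread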
3 Agents that Learn how to Choose Between Alternative Strategies

Our goal is to examine the emergence of Nash efficient solutions with a bottom-up approach. The agents in our model can achieve high profits, and they are robust with respect to various opponent strategies. Each agent chooses a strategy from a set of three alternatives (GD, K and ZIP) at the start of each period. The initial strategy is chosen randomly. In subsequent periods, each agent learns to change his strategy looking for the best bidding strategy in the following way. To take this decision, each trader only knows his own reservation prices and the information generated in the market; he knows neither the bidding strategy nor the profit achieved in a market session by other agents.

An agent will consider changing his strategy if his profit is less than the profit from the previous period. The agent considers whether he could have reached higher profits following an alternative strategy. To this end he assesses the transactions he made in the past, both the transactions in which he took an active bidding role and those where he passively accepted the ask or bid of another trader (Fig. 2). In the first case, there are three possible alternatives for the buyer (a sketch of the resulting evaluation follows this list):

1. The bid of the alternative strategy was lower than the minimum transaction price of the period. In this case the buyer will assume that no seller would have accepted it.
2. The bid of the alternative strategy was lower than the realized bid, but greater than the minimum transaction price for that period. Then the buyer will consider that the bid would have been accepted and that he could have obtained greater profits (the profit is represented in the figure by a blue bar).
3. The bid of the alternative strategy was greater than the realized bid. Then he could have obtained lower profits.

In the second case, there are only two possibilities:

1. The bid of the alternative strategy was lower than the seller's ask. The buyer would have rejected the ask, with no profit.
2. The bid of the alternative strategy was greater than the seller's ask. Then he could have obtained the same profit whatever the value of the bid was.

If an agent has not traded yet, he will consider the transactions made by the other agents and proceed with the same criteria discussed above. If an agent has no information at all, and there are no open orders, he will change his strategy in a random way.
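A rough sketch of this counterfactual evaluation for a buyer follows; the bookkeeping (one record per past transaction, holding the bid the alternative strategy would have placed) is our assumption, not the authors' code:

    # Profit the buyer would have earned last period under an alternative
    # strategy, following the five cases listed above.
    def counterfactual_profit(records, value):
        # records: dicts with keys 'active' (did we place the winning bid?),
        # 'ask' (accepted ask, passive case), 'min_price' (period minimum
        # transaction price) and 'alt_bid' (the alternative strategy's bid)
        profit = 0.0
        for r in records:
            if r['active']:
                if r['alt_bid'] >= r['min_price']:   # the bid would have traded
                    profit += value - r['alt_bid']
            else:
                if r['alt_bid'] >= r['ask']:         # same trade at the ask
                    profit += value - r['ask']
        return profit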
4 Results and Discussion

We have found that no matter what the initial population of agents is, the final composition of agents remains in a central region, where no strategy seems to dominate. The results of the simulation show that not only the proportion of strategies is relevant; the distribution of strategies between buyers and sellers is important as well. Let us comment on these results in detail.

4.1 Results when the Agents Have Different but Fixed Strategies

We have simulated two scenarios: one in which the agents are not allowed to change their strategies during the auction, with the aim of revealing the learning patterns corresponding to each strategy, and another in which the agents can change their strategy to increase their profits. This individual goal diversity should lead to an increase in market efficiency. We represent the strategy space by a two-dimensional simplex grid with vertices corresponding to the pure strategies (Fig. 3): all ZIP (point a), all GD (point d) and all K (point f). We draw three regions to represent the populations that have a dominant strategy, in the sense that more than 33% of the agents use it: ZIP agents are the majority in the red region (abc), GD agents in the blue region (bde), and K agents in the yellow region (cef).
Fig. 2. Buyers' algorithm to change the type of strategy for each trading period
Fig. 3. Market efficiency over the strategy space in the first and the last session of the experimental run when the agents have fixed strategies during the auction

Figure 3 presents the average market efficiency of all the populations in the strategy space in the first and the last session of the experimental run (100 rounds per session) when the agents have fixed strategies during the auction. Note that in the first period the CDA market efficiency is near 100% in almost all populations of the strategy space. On the other hand, the topography has not changed substantially after fifteen auction sessions. Our results confirm that market efficiency is an ecological property, and thus robust against alternative intelligent agents. However, the CDA market efficiency is very low when most of the agents are of the K type (yellow region). The volatility of the market efficiency is low, but it increases when the proportion of K agents is very high.

The reason is that not only the proportion of strategies matters; the distribution of this proportion between the two market sides, buyers and sellers, is important as well. Let us analyze this fact in further detail. In figure 4 we show the price evolution and the market efficiency when 50% are K agents and 50% are GD agents and they have fixed strategies during the auction. When no side is silent, efficiency is high and the transaction prices approach the equilibrium better.
Fig. 4. Price dynamics and market efficiency when 50% are K agents and 50% are GD agents in a CDA

When this 50% of K agents are all buyers, the transaction prices are below the competitive equilibrium price and the market efficiency is low. When they are all sellers, the transaction prices are above the competitive equilibrium price and the market efficiency is low as well. We find equivalent results in experimental economics: contracts tend to be executed to the disadvantage of the side having the price initiative. But in that setting it is not possible to control the parasitic behaviour of the real agents, and this control has to be exercised by the conductor of the auction through the auction rules. The institutions in which one side of the market is free to accept but is not permitted to make an offer are labelled offer auctions (if buyers are not permitted to make bids) and bid auctions (if sellers are not permitted to make asks). This is a good example of the equivalence between an institution with agents free to choose their strategies and another institution in which individual agents are forced to maintain a given strategy.
4.2 Results when the Agents Have Different Strategies and Can Change them

We start the simulation with a proportion of 50% of K agents, but we allow the agents to change their strategies to increase their profits, so that they can move to ZIP or GD all along the experimental run. As can be seen from Figure 5, the transaction prices are very near the competitive equilibrium price and there is an increase in market efficiency (near 100%) in all cases. Figure 6 presents the evolution of transaction prices, market efficiency and proportion of strategies when we start the simulation with a proportion of 50% of K agents and 50% of ZIP agents. The coloured bars indicate the percentage of the alternative strategies in the population (blue for GD, yellow for K and red for ZIP), and we can observe the dynamics of the change in strategies along the auction sessions. In fact, comparing figures 5 and 6, it seems that the final proportions of strategies strongly depend upon the path, the initial strategy composition of the population, and the proportion of strategies of buyers and sellers.

We have found that the final proportion of agents' strategies remains in a central region, where no strategy seems to dominate. Some GD agents and ZIP agents consider whether they could have reached higher profit following a K strategy. But they are well aware that if the number of parasitic agents increases too much, they will
Fig. 5. Price dynamics and market efficiency when initially K agents (50%) and GD agents (50%) change their strategy in a CDA market
Fig. 6. Price dynamics and market efficiency when initially K agents (50%) and ZIP agents (50%) change their strategy in a CDA market
decrease their profits and the market efficiency, thus limiting their free-riding opportunity. This behaviour resembles passive investment in the stock market: if there are too many passive investors, there will not be enough arbitrage and the stock market will lose efficiency.

Our results are at odds with those by Walsh et al [15], who claimed, from a top-down approach, the existence of two Nash equilibria. We do not find such results in our simulations. We want to remark that in Walsh's work the agents have fixed strategies during the auction for each population of the strategy space. They calculate the Nash equilibrium from a rule that is applied externally once the auctions have finished: the new percentage of a specific type of agent is the last percentage plus an amount proportional to the difference between the strategy's profit and the market's average profit. This is a major drawback of the top-down approach, since the proportion of the strategies does not emerge; it is forced by the modellers. We think that the agents should have the freedom to change their strategies, and it is not realistic to assume that the agents know the group average profit for each strategy.

We would like to interpret our results as a contribution to clarifying a crucial issue in economics: the relationship between methodological individualism and social knowledge (Arrow [1]), as one of the reviewers pointed out.
With this focus, our results and previous works, mainly the seminal contribution of Sunder, confirm that ZI traders are an important but only a first historical step toward using computer simulations with artificially intelligent traders to explore the robustness and structural properties of markets. These simulations, which are the wind tunnels of economics, can give us interesting clues and discoveries on the relationship between individual and social learning. Individual and social learning are interrelated (Vriend [14] and Gigerenzer [3]) and consequently a learning process is not good or bad, rational or irrational, but only relative to an institutional environment. In general terms, a distributed system with autonomous agents that are not very instructed but are socially aware can exhibit high intelligence as a system. The whole purpose of market design is to achieve this market efficiency even with traders that may not be much instructed. Our instruments for boundedly rational agents should explicitly make methodological individualism and social knowledge compatible views (Lopez et al. [8]). To this end, a bottom-up approach such as agent-based simulation can be very useful; hence the opportunity of this symposium and, hopefully, of this paper as well.
Acknowledgements

We wish to thank three anonymous referees for their helpful comments.
References

1. Arrow K (1994) Methodological individualism and social knowledge. American Economic Review 84(2): 1-9.
2. Cliff D, Bruten J (1997) Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets. HP-1997-141, Hewlett Packard Laboratories, Bristol, England.
3. Gigerenzer G (2005) Striking a blow for sanity in theories of rationality. In: Augier M, March JG (eds) Models of a Man: Essays in Memory of Herbert A. Simon, 389-410. MIT Press, Cambridge.
4. Gjerstad S, Dickhaut J (1998) Price formation in double auctions. Games and Economic Behavior 22: 1-29.
5. Gode D, Sunder S (1993) Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. Journal of Political Economy 101: 119-137.
6. Kirman A, Salmon A (eds) (1993) Learning and rationality in economics. Blackwell, Oxford.
7. Kirman A, Vriend N (2001) Evolving market structure: an ACE model of price dispersion and loyalty. Journal of Economic Dynamics and Control 25: 459-502.
8. Lopez A, Hernandez C, Pajares J (2002) Towards a new experimental socio-economics. Complex behaviour in bargaining. Journal of Socio-Economics 31: 423-429.
9. Rust J, Miller J, Palmer R (1993) Behavior of trading automata in computerized double auctions. In: Friedman D, Rust J (eds) The Double Auction Market: Institutions, Theories and Evidence. Addison-Wesley, New York.
10. Smith V (1962) An experimental study of competitive market behavior. Journal of Political Economy 70: 111-137.
11. Sunder S (2005) Markets as artifacts: Aggregate efficiency from zero-intelligence traders. In: Augier M, March JG (eds) Models of a Man: Essays in Memory of Herbert A. Simon, 501-519. MIT Press, Cambridge.
12. Tesauro G, Das R (2001) High performance bidding agents for the continuous double auction. In: Proceedings of the ACM Conference on Electronic Commerce (EC-01), 206-209. Tampa, Florida.
13. Vriend N (1996) Rational behavior and economic theory. Journal of Economic Behavior and Organization 29: 263-285.
14. Vriend N (2000) An illustration of the essential difference between individual and social learning, and its consequences for computational analyses. Journal of Economic Dynamics and Control 24: 1-19.
15. Walsh W, Das R, Tesauro G, Kephart J (2002) Analyzing complex strategic interactions in multi-agent systems. In: Game Theoretic and Decision Theoretic Agents, 109-118. AAAI Press, Menlo Park.
Firms Adaptation in Dynamic Economic Systems

Lilia Rejeb¹ and Zahia Guessoum²

¹ MODECO-CRESTIC, University of Reims, Lilia.Rejeb@poleia.lip6.fr
² OASIS-LIP6, University of Paris VI, Zahia.Guessoum@lip6.fr
1 Introduction

Evolutionary economic systems are often large-scale, dynamic, open and characterized by strong competition. The survival of firms in such complex systems relies on their decision process and also on the behavior of the other firms. Pajares [9] shows that in evolutionary economics, learning is the central issue of every model. Firms thus have to continuously revise their strategies according to their experience [5]. They have to select the most suitable actions according to their local data and a partial perception of their dynamic environment. It is important, then, to find a learning technique allowing firms to construct a model of their environment, to update it according to their experience, and to foresee the possible consequences of a decision before acting on it. Learning classifier systems (LCS) provide a good solution to model the firm decision. They combine reinforcement learning and evolutionary computing to produce adaptive behavior. They allow a firm to:

• build a model of its environment through a set of rules,
• use this model to anticipate the value of each action before adopting it, and
• evaluate its actions.
The first applications of LCS were introduced by Holland [8] [12, 13]. However, recent research has proposed improvements of LCS such as ZCS [16] and XCS [17]. We propose to use XCS to model a firm's decision process. Unlike LCS and ZCS, XCS can construct a complete and accurate model of the environment of the firm through efficient generalizations. Moreover, XCS can develop a readable set of rules which helps to explain the evolution of the environment. The purpose of this paper is to build an XCS-based multi-agent system representing adaptive firms and their interactions. Each firm uses XCS for its learning. This XCS-based multi-agent system makes it possible to explain the firm behavior and the global system behavior. The study of adaptive firms allows displaying the advantages and drawbacks of using XCS to model learning in multi-agent systems. We show that the
performance of firms can be improved by using a more precise representation of the parameters of the firm. The paper is organized as follows. Section 2 presents the firm model. Section 3 describes the adaptive firms: it gives an overview of XCS and presents the modeling of firms by XCS-based agents. Section 4 presents an overview of the realized experiments, and Section 5 gives a short overview of related work.
2 The Firm Model

Our economic system is characterized by a set of firms in indirect interaction via a competitive market. To model the firms, we use a resource-based approach [11]. This approach regards a firm as a collection of physical and human resources, and stipulates that the survival of a firm depends on its use of its resources. The firm is thus defined by the parameters X, Y, K, B and S, where:

• X is a set of resources,
• Y is a set of performances (profitability Y_t[1] and market performance Y_t[2]); the performance of a firm is measured using the statistical Lisrel model,
• K is a capital,
• B is a budget, used to update the resources,
• S is a set of strategies; each strategy defines a method for distributing the budget among the different resources according to their priorities.

The behavior of the firm is described for each period by the following steps (a schematic sketch is given after this list):

• perception of the environment and update of the competition model,
• update of the internal parameters such as the capital and the budget,
• decision making, which corresponds to the choice of a strategy according to the current context,
• update of the performances of the firm.
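A compact, runnable sketch of this firm state and its four per-period steps is given below; all numerical rules in it are placeholder assumptions (the paper's Lisrel-based performance measure is not reproduced):

    import random
    from dataclasses import dataclass

    @dataclass
    class Firm:
        X: list      # resource levels
        Y: list      # performances [profitability, market performance]
        K: float     # capital
        B: float     # budget
        S: list      # strategies: budget shares over the resources

        def step(self, avg_Y):
            doing_well = self.Y[0] >= avg_Y[0]        # 1. perception
            self.B = 0.1 * self.K                     # 2. internal update
            shares = self.S[0] if doing_well else random.choice(self.S)  # 3. decision
            self.X = [x + self.B * s for x, s in zip(self.X, shares)]
            self.Y = [0.01 * sum(self.X) + random.gauss(0, 0.1), self.Y[1]]  # 4. performance
            self.K += self.Y[0]

    firm = Firm(X=[1.0] * 8, Y=[0.0, 0.0], K=100.0, B=0.0,
                S=[[0.5, 0.5, 0, 0, 0, 0, 0, 0], [0.125] * 8])
    for _ in range(10):
        firm.step(avg_Y=[0.5, 0.5])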
The context of a firm is determined by the firm's internal parameters (K, B, X, and Y_t) and by its perception of the environment, which is strongly competitive and non-stationary. At each time period, firms can either join or leave the market. A firm leaves the market either when its performance decreases over a number of successive periods or when its capital decreases and reaches an exit threshold. However, it is impossible for the firm to dispose of all the information about its rivals. Thus, the firm has to use its previous experience to disambiguate the current state of the environment and foresee the consequences of its possible strategies in order to choose the adequate one. Each firm is represented by an agent. The perception component of an agent uses the internal parameters of the firm and its environment to build a representation of its current context. The decision process allows the agent to choose the most suitable strategy in a given context.
3 Adaptive Firms

The decision process of the firm is represented by XCS. The first section gives an overview of the learning classifier system XCS, and the second section presents the XCS-based firm.

3.1 XCS

XCS is a learning classifier system defined by Wilson [17]. Knowledge is represented by standard "condition-action" rules called classifiers. Each classifier is also characterized by three parameters:

• the prediction p, which corresponds to the average estimated reward when the classifier is used;
• the prediction error e, which estimates the error in the prediction of the classifier;
• the fitness F, which evaluates the prediction quality, i.e., the average accuracy of the prediction given by p. This quality measure distinguishes XCS from its predecessors.
A genetic algorithm is used to update this set of classifiers, whereas a Q-learning technique is used to assign a value to the prediction p. In XCS, p corresponds to the accuracy of the prediction of the reward. Moreover, XCS includes generalization, which allows representing knowledge in a more compact manner by generalizing similar situations. XCS chooses an action either by exploration (random choice) or by exploitation (choice of the action having the greatest value of PS_i). The execution of the chosen action in the environment generates a reward, which is used by the reinforcement learning component of XCS to evaluate the classifiers. The reward can be immediate (single-step XCS) or obtained at the end of a chain of actions (multiple-step XCS). This reward is used to update the parameters characterizing the classifiers in XCS (p, e and F). The update of these parameters is done by the reinforcement learning component of XCS. The following formulas are applied in the presented order:

p_clj = p_clj + β(PS_i − p_clj)    (1)

where β is the learning coefficient of XCS, cl_j is the classifier j using the chosen action and PS_i is the average prediction of the chosen action a_i;

e_clj = e_clj + β(|PS_i − p_clj| − e_clj)    (2)

where e_clj is the prediction error of the classifier j. This prediction error is then used to update the fitness and the accuracy k:

k_j = (e_j / e_0)^(−v) if e_j > e_0, and k_j = 1 otherwise.    (3)
Lilia Rejeb and Zahia Guessoum
kj * num F,=F,+p{^^::::z^-F^
(4)
where • • •
clj is the classifier j using the chosen action a^, eo and v are parameters specific to XCS allowing to determine A;, nurrij is the numerosity of the classifier j . One step of XCS is described in the algorithm defined in table 1 ^.
Table 1. An experiment (one step) of XCS

do {
1. perception of the state of the environment;
2. determination of the set [M] of the classifiers matching the environment state; if [M] is empty, covering takes place;
3. determination of the system prediction array [SP];
4. choice of the action "a" according to [SP], either by exploration (random choice) or by exploitation (choice of the action having the best value in [SP]);
5. execution of the action "a" in the environment and determination of the reward R;
6. generation of the action set [A] gathering the classifiers in [M] having "a" as action;
7. reception of the reward R;
8. evaluation of the classifiers and application of the genetic algorithm if possible, either in [A] if the reward is immediate or in [A]_{-1} if the reward is not immediate;
} while the end of the problem is not reached.
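For concreteness, the parameter updates of Eqs. (1)-(4) applied to an action set [A] can be sketched as follows; classifiers are represented as dicts, and the default β, e_0 and v are generic values rather than the paper's settings:

    # Update p, e, k and F for every classifier in the action set,
    # in the order given by Eqs. (1)-(4).
    def update_action_set(action_set, PS, beta=0.2, e0=0.01, v=5.0):
        for cl in action_set:
            cl['p'] += beta * (PS - cl['p'])                           # Eq. (1)
            cl['e'] += beta * (abs(PS - cl['p']) - cl['e'])            # Eq. (2)
            cl['k'] = (cl['e'] / e0) ** (-v) if cl['e'] > e0 else 1.0  # Eq. (3)
        total = sum(cl['k'] * cl['num'] for cl in action_set)
        for cl in action_set:
            cl['F'] += beta * (cl['k'] * cl['num'] / total - cl['F'])  # Eq. (4)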
3.2 XCS-Based Firms

XCS-based firms are obtained by the integration of XCS into the agent representing a firm. XCS receives the perception of the context of the firm. It applies steps 1-4 of the algorithm in Table 1 to determine the adequate action. This action is applied by the firm and gives a reward which is sent back to XCS to update the parameters of the classifiers. The reward is immediate; thus, we use one-step XCS. This section presents the perception coding of the firm context and the reward functions used.
¹ For more details see "An algorithmic description of XCS" [4].
Fig. 1. XCS-based firm

The Context
The diversity of the firm context parameters and their types (real, fuzzy, integer), together with the continuous and non-stationary character of the system, makes it difficult to delimit the definition domains of the parameters. Hence, the recent representation methods for learning classifier systems, such as real intervals and S-expressions [18], are not suitable to model the firm context. A unification method to homogenize the representation of the firm parameters is then required. We therefore propose a unification method based on the decomposition of the definition domains of the parameters into m intervals. Each interval is characterized by a fuzzy value giving a rough estimate of the corresponding parameter. We opt for fuzzy granulation as it mimics human reasoning and the manipulation of information resulting from perception [21]. The definition domain of each parameter is described by symbolic values such as (very small, small, medium, large). These fuzzy values are then translated into a binary coding to obtain a homogeneous representation of the firm parameters (the method will be presented in detail in the full paper). This method is general and independent of the application. The action in XCS corresponds to the firm's strategy. The number of strategies is fixed at the beginning by the economist. Table 2 represents a firm classifier and its translation into a classifier usable by XCS. This classifier associates strategy 1 with the defined context; its use in this context gives an estimated reward (prediction) of 0.5, a prediction error of 0.01 and a fitness of 100.
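A minimal sketch of this interval coding follows; the 4-bit width matches Table 2, and the assumption that the bounds of the definition domain are known is ours:

    # Map a parameter value in [lo, hi] to the binary code of its interval.
    def encode(value, lo, hi, m=8, bits=4):
        step = (hi - lo) / m
        idx = min(int((value - lo) / step), m - 1)   # interval index 0..m-1
        return format(idx, '0{}b'.format(bits))

    print(encode(150.0, 0.0, 1000.0))   # a 'small' value -> '0001'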
Usual rewardfiinctionsare discrete. They correspond to the allocation of a positive value when the action gives good results and 0 otherwise [18]. However, in real-life applications and in the firm problem, an action that results in a great improvement of the performances must not be recompensed as an action that results in a small improvement. We propose then to use a fiinction varying according to the context. Thisfiinctioncould be defined, either by considering the individual performances of thefirm,or by collective ones taking into account the other firms.
Table 2. Classifier representing the firm context

Firm context classifier                               Classifier in XCS
K is small                                            0001
B is medium                                           0010
X = {x1 very small, x2 small, x3 medium,              0000, 0001, 0010, 0000,
     x4 very small, x5 very small, x6 very small,     0000, 0000, 0000, 0000
     x7 very small, x8 very small}
Y = {y1 small, y2 small}                              0001, 0001
Average_K is large                                    0011
Average_B is very large                               0111
NbFirms is very small                                 0000
Average_Y = {Aver_y1 medium, Aver_y2 small}           0010, 0001
Action = Strategy 1; parameters: p = 0.5, e = 0.01, F = 100
The Individual Reward Function

We model this reward according to the performance variation. It is defined by an aggregation of the variations of the performances of the firm:

reward = aggreg(Y_t[1] − Y_{t-1}[1], Y_t[2] − Y_{t-1}[2])    (5)

where Y_t[1] corresponds to the profitability, Y_t[2] corresponds to the market performance and aggreg is an aggregation operator; the average operator is used in this work.

The Collective Reward Function

Peres-Uribe [10] notes that it may be profitable to consider the effect of an agent on the other agents to measure its performance improvement. We propose therefore to model this collective performance by the relative performance of the firm proposed by Durand [6]. The relative performance considers the past performances of the firm and the competition state. It evaluates the position of the firm with respect to its rivals in the market. It is defined by:

RelPerf_t = f(A, B, C, D)    (6)

where:

• A corresponds to the variation rate of profitability:

A = (Y_t[1] − Y_{t-1}[1]) / Y_{t-1}[1]    (7)
• B corresponds to the firm's profitability in comparison to the average profitability in the market:

B = (Y_t[1] − averageY_1) / averageY_1    (8)

where averageY_1 is the average profitability in the market;

• C is the evolution of the market performance:

C = (Y_t[2] − Y_{t-1}[2]) / Y_{t-1}[2]    (9)

• D is an index that gives a premium to the best market performer at time t:

D = (Y_t[2] − Min(Y_t[2]_i)) / Min(Y_t[2]_i)    (10)

where i ∈ [1, k] and k is the total number of firms. The reward function then corresponds to the variation of this relative performance. It is expressed by:

reward = RelPerf_t − RelPerf_{t-1}    (11)
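Under the assumption that f in (6) aggregates A, B, C and D by simple averaging (the operator used for the individual reward), the collective reward can be transcribed directly:

    # Eqs. (6)-(11); Y[0] is profitability, Y[1] market performance.
    def relative_performance(Y_t, Y_prev, avg_profit, min_market_perf):
        A = (Y_t[0] - Y_prev[0]) / Y_prev[0]               # Eq. (7)
        B = (Y_t[0] - avg_profit) / avg_profit             # Eq. (8)
        C = (Y_t[1] - Y_prev[1]) / Y_prev[1]               # Eq. (9)
        D = (Y_t[1] - min_market_perf) / min_market_perf   # Eq. (10)
        return (A + B + C + D) / 4.0                       # Eq. (6), f = average

    def collective_reward(relperf_t, relperf_prev):
        return relperf_t - relperf_prev                    # Eq. (11)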
4 Experiments

For the implementation of the XCS-based firms, we use the XCS framework proposed by Butz and Wilson [4], which we integrate into the agent-based framework DIMA [7]. The first series of experiments compares the rule-based and XCS-based firms. The second series of experiments performs a sensitivity analysis of the performances and learning of XCS with respect to the coding precision and the reward function. The XCS parameters are:

• the population size N is set to 6000, to allow the system to represent all the possible classifiers when generalization is not used,
• the generalization probability #_probability = 0.5,
• the learning rate β = 0.001,
• the crossover rate = 0.8,
• the mutation rate = 0.02,
• the minimum error = 0.01,
• the genetic algorithm frequency θ_gen = 10,
• the exploration probability = 0.5,
• the exploration/exploitation selection mechanism is the same for all the firms.
The simulations were replicated 20 times. The obtained results are the average values. We compare the learning time or convergence of the classifiers and their performance improvement.
4.1 XCS-Based Firms Versus Rule-Based Firms

To validate the XCS-based firm architecture, we compare the evolution of a population of 300 XCS-based firms and a population of 300 rule-based firms. Rule-based firms are endowed with a set of static rules; these rules choose the strategy according to a comparison of the performances of the firm with those of the market. To focus on the influence of the decision process, we endow the two kinds of firms with the same initial parameters. XCS-based firms use the individual reward function (see Section 3.3).
Fig. 2. Rule-based firms versus XCS-based firms
Table 3. Comparison of firms' resistance

Age       XCS-based firm   Rule-based firm
Maximum   142              123
Average   23               19
Median    22               17
Figure 2 compares the average capital of the two populations of firms, and Table 3 compares the firms' age. Based upon Figure 2 and Table 3, we can conclude that XCS-based firms have better performances and are more resistant. Their capital often stabilizes well above that of the rule-based firms. This is due to their ability to improve their classifiers. However, they go through a difficult initial phase: their initial classifier base is empty, so their early difficulties can be explained by the random choice of strategies.

4.2 Coding Influence

We focus in these experiments on the influence of the coding precision. We compare two populations of 300 firms having the same initial parameters and differing by their representation
precision. The first population uses a decomposition of the definition domains into 8 intervals, while the second uses a decomposition into 16 intervals.

Table 4. Comparison of firms' resistance for different representations

Age       8-interval representation   16-interval representation
Maximum   209                         230
Average   25                          27
Table 4 shows that the use of a more precise representation improves the firms' resistance. A less precise representation, by contrast, can lead to classifier overgeneralization: classifiers can become too general to be explained by economists.
Fig. 3. Influence of coding precision on the evolution of the classifier populations
Figure 3 compares the number and the evolution of the classifiers of the two kinds of firms, using respectively 8 and 16 intervals. The number of classifiers of firms using 16 intervals stabilizes later than that of firms using the 8-interval representation. However, the classifier population is richer with 16 intervals, so more environment states are considered. A more precise decomposition of the definition domains thus gives a better representation of the environment and consequently better performances and resistance. However, it is costly in terms of learning time. Moreover, using more precise representations remains an open problem when the space of environment states is large, owing to the limited length of classifiers in XCS.

4.3 Individual vs Collective Reward Function

We compare in these experiments two populations of 300 firms. The first reinforces the classifiers according to its performance enhancement without considering the other firms. The second reinforces the classifiers according to the firm's relative position in the market. These two populations have the same initial parameters and the
same representation; they differ only by their reward function. As we are in a multi-agent context, the aim of these experiments is to verify whether it is sufficient for firms to consider the other firms only in their perception, or whether it is necessary to consider them also in the definition of the reward.
Fig. 4. Collective versus individual reward

Figure 4 shows that the difference between the performances of the firms using the two reward functions is not important. On average the collective reward function does not greatly improve the firm performance: the average improvement is about 1%. The comparison of their resistance confirms this result. We can conclude that considering the other firms only in its perception is sufficient for the firm to make good decisions. These experiments respond to the question of Peres-Uribe about the utility of a collective performance improvement: a complete perception of the environment is more important than a collective performance-improvement function in learning multi-agent systems.
5 Related Work

Many techniques have been used to model learning economic agents, such as genetic algorithms [3, 19], neural networks [2], and reinforcement learning techniques (classifier systems [12, 13, 19] and Q-Learning [15]). Neural networks work as black boxes: they cannot explain the firm behavior, as is needed by economic researchers. Moreover, their complexity grows with the number of environment states [14]. Q-Learning is also unsuited to model the firm's decision, since it suffers from convergence problems in non-stationary environments and large state-action spaces. It is also inefficient in capturing the regularities of the environment and consequently cannot avoid the exponential explosion in time and space [18]. Genetic algorithms do not fit our firm model, as they do not allow the firm to take advantage of its previous experience to construct expectations about the environment [19]: actions are evaluated only after their actual use, which can have bad consequences for firms. Learning classifier systems have been widely used in learning economic systems [12, 13, 19, 1]. The first works [12, 13] used the LCS model of Holland. These
systems are characterized by a set of internal messages constituting a kind of memory, and by the use of strengths to evaluate the classifiers. The learning speed of these systems is, however, very slow. More recent works such as [19, 1] use XCS. Yildizoglu [19] used XCS to model firm learning in the Nelson and Winter model. The parameter representation of these firms is a binary string indicating whether the corresponding parameters improved or not, and the reward function is based on capital productivity and profit; however, this system was only tried in homogeneous environments. Bagnall [1] used XCS to make offers for electricity generation; this system was only tried in stationary environments.
6 Conclusion

In this paper an example of the application of adaptive multi-agent systems to the simulation of economic models was presented. XCS was used to model the adaptive firm's decision process. It provides the firm with the capacity to construct a complete model of its environment despite limited access to information about its rivals. Moreover, XCS allows the firm to reason about the environment to choose the most suitable action. The use of XCS also meets the needs of economic researchers, as it can help in understanding adaptation to a changing environment. The aim of our future work is to improve XCS learning in very dynamic environments. The first perspective of this work is to control the exploration-exploitation activation in XCS. The second perspective is to enlarge the perception of the firm to take into account the organizational structure of the market (organizational forms).
References

1. Bagnall AJ (2004) A multi-agent model of the UK market in electricity generation. In: Bull L (ed) Applications of Learning Classifier Systems, Studies in Fuzziness and Soft Computing 15.
2. Barr J, Saraceno F (2002) A computational theory of the firm. Journal of Economic Behavior and Organization 49: 345-361.
3. Bruderer E, Singh JV (1996) Organizational evolution, learning, and selection: A genetic-algorithm-based model. Journal of Economic Behavior and Organization 39: 1216-1231.
4. Butz MV, Wilson SW (2001) An algorithmic description of XCS. In: Lanzi PL, Stolzmann W, Wilson SW (eds) Advances in Learning Classifier Systems. Lecture Notes in Artificial Intelligence 2321: 253-272. Springer Verlag.
5. Chen Y, Su XM, Rao JH, Xiong H (2000) Agent-based microsimulation of economy from a complexity perspective. In: Gan R (ed) The ITBM2000 Conference - Information Technology for Business Management. Beijing, China.
6. Durand R, Guessoum Z (2001) Competence systemics and survival, simulation and empirical analysis. Competence 2000 Conference. Helsinki, Finland.
7. Guessoum Z, Briot JP (1999) From active objects to autonomous agents. IEEE Concurrency 7: 68-76.
8. Holland JH (1991) Artificially adaptive agents in economic theory. American Economic Review 81: 365-370.
9. Pajares J, Hernandez-Iglesias C, Lopez-Paredes A (2004) Modeling learning and R&D in innovative environments: a cognitive multi-agent approach. Journal of Artificial Societies and Social Simulation 7.
10. Peres-Uribe A, Hirsbrunner B (2000) The risk of exploration in multi-agent learning systems: a case study. In: Proc. Agents-00 Joint Workshop on Learning Agents, Barcelona, June 2000.
11. Penrose ET (1959) The theory of the growth of the firm. Basil Blackwell.
12. Rivero SLDM, Storb BH, Wazlawick RS (1999) Economic theory, anticipatory systems and artificial adaptive agents. BEJE (Brazilian Electronic Journal of Economics) 2.
13. Schulenburg S, Ross P (2000) An adaptive agent based economic model. In: Lanzi PL, Stolzmann W, Wilson SW (eds) Learning Classifier Systems: From Foundations to Applications. Lecture Notes in Artificial Intelligence 1813: 265-284. Springer Verlag.
14. Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
15. Tesauro G, Kephart JO (2002) Pricing in agent economies using multi-agent Q-Learning. Autonomous Agents and Multi-Agent Systems 5: 289-304.
16. Wilson SW (1994) ZCS: A zeroth level classifier system. Evolutionary Computation 2: 1-18.
17. Wilson SW (1995) Classifier fitness based on accuracy. Evolutionary Computation 3: 149-175.
18. Wilson SW (2000) State of XCS classifier system research. LNAI 1813: 63-83.
19. Yildizoglu M (2001) Modeling adaptive learning: R&D strategies in the model of Nelson and Winter. In: DRUID's Nelson and Winter Conference. Aalborg, Denmark.
20. Yildizoglu M (2000) Connecting adaptive behaviour and expectations in models of innovation: The potential role of artificial neural networks. European Journal of Economics and Social Systems 15.
21. Zadeh L (2001) A new direction in AI: Toward a computational theory of perceptions. AI Magazine 22: 73-84.
Firm Size Dynamics in a Cournot Computational Model

Francesco Saraceno¹ and Jason Barr²

¹ Corresponding author. Observatoire Français des Conjonctures Économiques, 69 Quai d'Orsay, 75007 Paris. Tel: +33 1 44 18 54 93. Fax: +33 1 44 18 54 88. [email protected]
² Rutgers University, Newark. jmbarr@rutgers.edu
1 Introduction

This paper explores firm size dynamics, with the firm modelled as a type of artificial neural network (ANN). Two firms/networks compete at two different levels. The first level, which has been explored in detail in other work (Barr and Saraceno (BS), 2004; 2005), is Cournot competition between two neural networks. In this paper, this level of competition is essentially in the background, while the main form of strategic interaction regards firm size dynamics. The firm, while playing the repeated Cournot game, has to make long run decisions about its size, which affects not only its own profits, but those of its rival as well.

Our previous research showed that firm size, which was left exogenous, is an important determinant of performance in an uncertain environment. Here we reverse the perspective: taking as given both the learning process and the dependence of firm profit on size and environmental complexity, we endogenize firm size in order to investigate whether simple adjustment rules succeed in yielding the optimal size (defined as the result of a best response dynamics). The computational requirements needed to discover the optimal network size may be quite expensive for the firm, so we explore two simpler adjustment rules. The first ("the isolationist") has the firm adjusting its size simply based on past profit. An "imitationist" firm, instead, adjusts its size if the rival has larger profits. We also study the firm dynamics resulting from a combination of the two. To our knowledge, no other paper has developed the issue of long run firm growth in an adaptive setting. Our main findings are:
• In the isolationist case the firm's long run size is a nonlinear function of environmental complexity, its own initial size, its rival's initial size and the adjustment rate parameter. Application of this simple rule yields an inverse relationship between long run size and environmental complexity.
• In the imitationist case, the "drive to imitation" has to be large enough for the two firms to converge to the same size; higher complexity is associated with more instability and higher long run firm sizes.
• Via regression analysis we measure the effects of initial conditions on long run dynamics. We find that own initial size is positively related to long run size; the rival's initial size has a negative effect for small initial sizes, but a positive effect for larger initial sizes. The adjustment parameters are positively related with long run size.
• Finally, we show that our simple dynamics very rarely converge to the 'best response' outcome. The isolationist parameter seems to be the only one that plays a role in guaranteeing such convergence.
Our paper fits within the agent-based literature on information processing (Chang and Harrington, forthcoming), which models organizations as collections of agents that process data. As no single agent can process all the data necessary for modern corporations to function, there is a need for managers with the task of information processing (Chandler, 1977; Radner, 1993). Typical models are concerned with the relationship between the structure of the network and the corresponding performance or cost (DeCanio and Watkins, 1998; Van Zandt, 1998). In our paper the network tries to map signals about the economic environment to both demand and its rival's output decision. Unlike other information processing models, we explicitly include strategic interaction: a firm's ability to learn affects the competitor's pay-off. Thus a firm must adapt its structure to maximize efficiency in learning, and to respond to its rival's actions. Firms in our paper employ simple adjustment rules, routines, that are satisfactory rule-of-thumb alternatives to computationally expensive optimizing behaviors (Simon, 1982; Nelson and Winter, 1982).
2 Neural Networks as a Model of the Firm

In BS (2002) we argued that when focusing on the information processing task of firms, computational learning theory may give useful insights and modelling techniques. Among the many possible learning machines, we focused on artificial neural networks as models of the firm, because of the intuitive mapping between their parallel processing structure and firm organization. Neural networks, like other learning machines, can generalize from experience to unseen problems, i.e., they recognize patterns. Firms do the same: the know-how acquired over time is used in tackling new, related problems (learning by doing). A specific feature of ANNs is their parallel and decentralized processing: ANNs are composed of multiple units that process relatively simple tasks in parallel, resulting in the ability to process very complex tasks.³ In the same way, firms are often composed of different units working autonomously on specific assignments, coordinated by a management that merges the results of these simple operations in order to design complex strategies.

³ We discussed the details of ANNs in BS (2002, 2004, 2005); for an extensive treatment of the subject the reader is referred to Skapura (1996).
The parallel between firms and learning algorithms also shows up in the trade-off linked to complexity. Small firms are likely to attain a rather imprecise understanding of the environment they face, but they act quickly and are able to design decent strategies with small amounts of experience. Larger and more complex firms, on the other hand, produce more sophisticated analyses, but they need time to implement their strategies. Thus, the optimal firm structure may only be determined in relation to the environment, and it is likely to change with it. In BS (2002, 2004, 2005) we showed, by means of simulations, that the trade-off between speed and accuracy generates a profit curve that is hump-shaped in firm size. We also showed that environmental complexity and optimal firm size are positively related. These results reappeared when we applied the model to Cournot competition. Here we leave the learning process in the background, and focus on endogenous changes of network size.
3 Setup of the Model

The background neural network model is taken from BS (2004, 2005). Two firms competing in quantities face a linear demand function whose intercept is unobserved. They observe a set of environmental variables that are related to demand, and have to learn the mapping between the two. Furthermore, firms have to learn their rival's output choice. BS (2005) shows that in general firms/neural networks are able to learn how to map environmental factors to demand, which allows them to converge to the Nash equilibrium. We further showed that the main determinants of firm profitability are, on the one hand, firm sizes (i.e., the numbers of processing units of the two firms, m_1 and m_2), and on the other environmental complexity, modelled as the number of environmental factors affecting demand (n). These facts may be captured by a polynomial in the three variables:

π_i = f(m_1, m_2, n),  i = 1, 2.    (1)

To obtain a specific numerical form for equation (1), we simulated the Cournot learning process with randomly drawn firm sizes (m_1, m_2 ∈ [2, 20]) and complexity (n ∈ [5, 50]), recording each time the profit of the two firms. With this data set we ran a regression that yields the specific polynomial relating profits to size and complexity:⁴

π_1 = 271 + 5.93m_1 − 0.55m_1² + 0.007m_1³ + 0.49m_2 − 0.37m_1m_2 − 2.2n + 0.0033n² + 0.007m_1m_2n − 0.016m_2n.    (2)

Figure 1 shows the hump shape of profit with respect to own size discussed above. We report three curves, corresponding to small, medium and large opponent's size.

⁴ In previous work other variables affected profit; here we hold them constant. The appendix details the process yielding the profit equation (table 3); notice that the setup is symmetric, so that either firm could be used.
Fig. 1. Firm 1 profit vs. size (m_2 = 2, solid line; m_2 = 10, crosses; m_2 = 20, diamonds). Complexity is fixed at n = 10
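For reference, the estimated polynomial (2) can be evaluated directly; this is a sketch, and the coefficient digits are those recovered from the degraded print above:

    # Profit of firm 1 as a function of own size, rival size and complexity.
    def profit(m1, m2, n):
        return (271 + 5.93 * m1 - 0.55 * m1**2 + 0.007 * m1**3
                + 0.49 * m2 - 0.37 * m1 * m2
                - 2.2 * n + 0.0033 * n**2
                + 0.007 * m1 * m2 * n - 0.016 * m2 * n)

    # profit along own size for a medium-sized rival, as in Fig. 1
    curve = [round(profit(m, 10, 10), 1) for m in range(2, 21)]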
4 The Best Response Function and Network Size Equilibrium

In this section we discuss the firms' best response functions and the corresponding equilibria. Given the functional form for profit of equation (2), we can derive the best response function in size by setting the derivative of profit with respect to own size equal to zero; this yields the following solution for firm i:

m_i^br(m_-i, n) = 16.9 ± 2.26 √(2.6m_-i − 0.058n·m_-i + 3.9)

The 'best response' function generally has more than one solution (real and/or complex). Nevertheless, for values of m_-i and n in the admissible range (m_i ∈ [2, 20], n ∈ [5, 50]), the solution is unique and decreasing. The Network Size Equilibrium (NSE) is given by the intersection of the two firms' best responses (figure 2); these equilibria are stable as, in spite of the complexity of the profit function, the best response is quasi-linear. Notice that increasing complexity shifts the best response functions and the NSE upwards. Optimal firm size, m_i*, is a function of environmental complexity. Exploiting symmetry, we have (figure 3):

m_1* = m_2* = m* = 23.5 − 0.15n − 4.5 √(14.1 − 0.34n + 0.001n²)    (3)

If we substitute the optimal value given by equation (3) into the profit equation (2), we obtain a decreasing relationship between profits and equilibrium firm size (also plotted in figure 3). To conclude, the best response dynamics yields a unique and stable equilibrium. Furthermore, we were able to show that firm size in equilibrium is increasing in complexity, while profit is decreasing. In the next section we turn to simpler firm size adjustment rules, more realistic in that they require lower computational capacity for the firm.
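Numerically, the NSE can be found by iterating the best response from any admissible starting size. The sketch below uses the decreasing branch of the best response as recovered above; its fixed points reproduce the m* values reported later in Table 2 (roughly 7.2 for n = 10):

    from math import sqrt

    def best_response(m_rival, n):
        return 16.9 - 2.26 * sqrt(2.6 * m_rival - 0.058 * n * m_rival + 3.9)

    def nse(n, m0=10.0, iters=100):
        m1 = m2 = m0
        for _ in range(iters):
            m1, m2 = best_response(m2, n), best_response(m1, n)
        return m1, m2

    print(nse(10))   # -> approximately (7.2, 7.2)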
Fig. 2. Best response functions (n = 10, diamonds; n = 25, crosses; n = 40, solid lines) for firms 1 and 2. The Network Size Equilibria are given by the intersection of the lines
Fig. 3. Profit (solid line, left axis) at equilibrium is decreasing in environmental complexity. Equilibrium size (crosses, right axis) is increasing
5 Adaptive Adjustment Dynamics

As discussed above, firms often face a large amount of information to process. The amount of information firms are assumed to possess in standard economic theory is remarkable: cost functions, profit functions, and the effect of a rival's decisions on profits. If we depart from the full information case, the cost to obtain such knowledge may be significant. This is why firms engage in routines and adaptive behavior. The best response function in section 4 is quite complex, and we assumed that the firm knows it; more specifically, the firm is assumed to know the expected maximal profit obtainable for the entire range of its rival's choice of network size. In addition, in a world in which production and market conditions constantly change, past information may quickly become irrelevant. This means that even if a firm has perfect knowledge of its
best response function at a certain point in time, that function may quickly become outdated. Even when the firm possesses the computational capabilities necessary to compute the best response, it may not be efficient to actually do so. In this section, we explore relatively simple dynamics that only assume knowledge of the firm's own profits and its opponent's. We explore adjustment dynamics following a general rule of thumb, using the profit function generated in section 3:

m_{i,t} = m_{i,t-1} + β(π_{i,t-1} − π_{i,t-2}) + αI_i [(m_{-i,t-1} − m_{i,t-1})(π_{-i,t-1} − π_{i,t-1})]    (4)

β represents the sensitivity of firm size to own profit growth: a firm experiencing profit growth will increase its size by β(π_{i,t-1} − π_{i,t-2}) units. The parameter α captures the "drive to imitation"; I_i is an indicator function taking the value of 1 if the opponent's profit is larger than the firm's, and a value of 0 otherwise:

I_i = 1 if π_{-i,t-1} > π_{i,t-1}, and I_i = 0 otherwise.

To study the long run outcomes of these dynamics we regress the long run size m_1(100) on the initial sizes, the adjustment parameters and complexity. We also include a dummy variable S[m_1(0) > m_2(0)], equal to 1 if m_1(0) > m_2(0) and 0 otherwise, since we found it to be statistically significant.

Table 1. Regression results for adjustment dynamics. All variables statistically significant at the 99% or greater confidence level. Dependent variable: m_1(100)

Variable             Coef.     Variable              Coef.      Variable       Coef.
m_1(0)               0.517     β                     159.342    β · m_1(0)     -2.042
[m_1(0)]²            0.009     α · m_1(0)            -3.110     β · m_2(0)     3.907
m_2(0)               -0.289    S[m_1(0) > m_2(0)]    0.447      n              -0.040
[m_2(0)]²            0.031     α                     58.159     n · β          1.401
m_1(0) · m_2(0)      -0.020    Constant              3.157      n · m_1(0)     0.006
Nobs.                5000      R²                    0.878
From the regression table we can draw the following conclusions:

• Increasing own initial size increases long run size, whereas the opponent's initial size has different effects: negative for low values of m_2(0), and positive for large values.
• α and β have positive effects on firm size, but interact negatively with initial size.
• Relative initial size (the dummy variable) has a positive effect: if firm 1 starts larger than firm 2, it will have a larger long run size.
• Increasing environmental complexity has a negative effect. The reason is that it reduces profits and thus long run size.
• Several interaction effects capture the non-linear relationship between the independent variables and long run firm size. For example, both own initial size and the adjustment parameters have positive effects, but there are also off-setting interaction effects: α · m_1(0) and β · m_1(0) both have negative coefficients.
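To make the dynamics concrete, rule (4) can be simulated directly with the profit polynomial of section 3 (the profit function from the earlier sketch); the parameter values below are illustrative draws from the ranges used in the next section:

    # Iterate the adjustment rule (4) for two firms; sizes are kept in
    # the admissible range [2, 20].
    def simulate(m1, m2, n, alpha=0.1, beta=0.05, periods=100):
        prof1_prev, prof2_prev = profit(m1, m2, n), profit(m2, m1, n)
        for _ in range(periods):
            prof1, prof2 = profit(m1, m2, n), profit(m2, m1, n)
            im1 = alpha * (m2 - m1) * (prof2 - prof1) if prof2 > prof1 else 0.0
            im2 = alpha * (m1 - m2) * (prof1 - prof2) if prof1 > prof2 else 0.0
            m1 += beta * (prof1 - prof1_prev) + im1
            m2 += beta * (prof2 - prof2_prev) + im2
            m1, m2 = max(2.0, min(20.0, m1)), max(2.0, min(20.0, m2))
            prof1_prev, prof2_prev = prof1, prof2
        return m1, m2

    print(simulate(8.0, 12.0, n=10))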
7 Convergence to Nash Equilibrium

The previous section shows that the long run size of the firm is determined by several variables; convergence to the Nash equilibrium is not guaranteed by the simple adaptive dynamics that we study in this paper. This section investigates the conditions under which it takes place. We made random draws of the relevant parameters (α, β ∈ [0, 0.2], m_i(0) ∈ [2, 20]) and ran the dynamics, retaining only the runs that ended with both firms within one node of the Nash value (i.e., m_i(50) ∈ [m* − 0.5, m* + 0.5]). This was done one million times for each complexity value n ∈ {5, 10, ..., 45}. Table 2 reports the success rate, the average α and β, the average initial m1, and its mode. The first result that emerges is that only a very small number of runs converged close to the NSE: the success ratio does not even attain half a percentage point, and it was particularly low for intermediate complexity values.
Table 2. Convergence to Nash equilibrium for different complexity levels. Standard deviations in parentheses

n    (m*)     Succ.     α              β              m1(0)           mod[m1(0)]
5    (6.85)   0.492%    0.122 (0.052)  0.082 (0.042)  8.758 (5.380)
10   (7.19)   0.413%    0.115 (0.057)  0.087 (0.056)  7.897 (5.267)
15   (7.58)   0.296%    0.093 (0.056)  0.082 (0.075)  4.137 (1.476)
20   (8.01)   0.190%    0.106 (0.053)  0.009 (0.007)  4.845 (1.792)
25   (8.53)   0.207%    0.106 (0.054)  0.011 (0.007)  4.972 (1.896)
30   (9.15)   0.247%    0.105 (0.053)  0.012 (0.008)  5.408 (2.137)
35   (9.91)   0.293%    0.105 (0.053)  0.014 (0.010)  5.700 (2.347)
40   (10.93)  0.354%    0.110 (0.050)  0.016 (0.011)  6.304 (2.583)
45   (12.50)  0.356%    0.119 (0.048)  0.021 (0.014)  6.967 (2.948)
Thus, we find that even extremely simple and commonsensical adjustment rules, while allowing for convergence to a steady state value, do not yield the equilibrium 'full information' outcome. This result calls for a careful assessment of the conditions under which a full information outcome may be seen as plausible (the 'as if' hypothesis). The mode and mean of initial size do not change significantly with complexity; thus the increase in complexity, and in the associated Nash size, does not require larger initial firms for convergence to happen. In fact, the only variable that seems to vary significantly is β: increasing complexity requires increasing reactivity to own profit to yield convergence to the NSE.
8 Discussion and Conclusion

This paper has presented a model of the firm as an artificial neural network, and explored the long run size dynamics of firms/neural networks playing a Cournot game in an uncertain environment. We looked at long-run firm size resulting from two types of simple adaptive rules: the 'isolationist' and the 'imitationist.' These dynamics were compared to a benchmark 'best response' case. First, we found that when firms use simple adjustment rules, long run firm size is a function of initial firm size, the rival's initial size, environmental complexity and the adjustment rate parameters; these variables interact in a non-linear way. We also found that only under very precise initial conditions and parameter values does the firm converge to the NSE. The reason is that the dynamics we consider tend to settle rapidly on a path that depends on initial conditions. The simple rules generally yield suboptimal long run equilibria, thus suggesting caution in taking the Nash equilibrium as a focal point of simple dynamics. We further find that when firms use simple adjustment rules, environmental complexity has a negative effect on size.
This is explained by the negative correlation between profits and complexity. More efficient information processing and more complex adjustment rules would play a positive role in the long run profitability of the firm, and would therefore deserve an investment of resources.
Appendix. Derivation of the Profit Function

This appendix briefly describes the process leading to equation (2), which is left in the background in the main text.

Cournot Competition in an Uncertain Environment

Two Cournot duopolists face the demand function p_t = γ_t − (q_1t + q_2t). Assume that production costs are zero. Then, the Nash equilibrium is q^N = γ_t/3. Firms do not observe γ_t, but know that it depends on a set of observable environmental variables x ∈ {0,1}^n:
γ(x_t) = Σ_{k=1..n} w_k·x_kt
Each period, the firm/neural network views an environmental vector x and uses this information to estimate the value of γ(x). Note that γ(x_t) can be interpreted as a weighted sum of the presence or absence of environmental features. To measure the complexity of the information processing problem, we define environmental complexity as the number of bits in the vector, n ∈ [5, 50]. Thus, in each period:
1. Firms/networks observe a randomly chosen environmental state vector x, and use it to estimate a value of the intercept parameter, γ̂_t. They also estimate the rival's choice of output, q̂_−i,t.
2. Firms then observe the true value of γ_t and of q_−i,t, and use this information to determine the errors ε_1t = (γ̂_t − γ_t)² and ε_2t = (q̂_−i,t − q_−i,t)².
3. Based on these errors, firms update the weights in the network.
This process repeats for T = 250 iterations, and the average profit is
π̄_i = (1/T)·Σ_{t=1..T} q_it·(γ_t − (q_1t + q_2t)).
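To make the process concrete, the following is a minimal sketch of the repeated game and the average-profit computation above. The estimators are abstracted into callables standing in for the neural networks (whose weight-update step is omitted), and we assume each firm plays the Cournot best response to its own estimates, q_i = (γ̂ − q̂_−i)/2, which the appendix does not spell out:

```python
import random

def average_profit(estimate1, estimate2, gamma_of, n, T=250):
    """Average profit of firm 1 over T rounds of the uncertain Cournot game.

    estimate1(x) and estimate2(x) return (gamma_hat, rival_output_hat);
    gamma_of(x) returns the true intercept for environmental state x.
    """
    total = 0.0
    for _ in range(T):
        x = [random.randint(0, 1) for _ in range(n)]   # environmental state
        g1, q2_hat = estimate1(x)
        g2, q1_hat = estimate2(x)
        # Assumed output rule: Cournot best response to own estimates.
        q1 = max(0.0, (g1 - q2_hat) / 2.0)
        q2 = max(0.0, (g2 - q1_hat) / 2.0)
        gamma = gamma_of(x)                            # observed ex post
        total += q1 * (gamma - (q1 + q2))              # firm 1 profit
    return total / T
```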
Regression Results for Profit

Equation (2) was derived by using the model described in the preceding appendix. We built a data set by making random draws of n ∈ [5, 50] and m_i ∈ [2, 20], and ran the Cournot competition process for T = 250 iterations, recording the average profit of the two firms and the values of m1, m2, and n. This was repeated 10,000 times in order to obtain a large data set. We then ran a regression to obtain a precise polynomial form for profit as a function of sizes and environmental complexity. Table 3 gives the complete results of the regression, which is reflected in equation (1).
Table 3. Profit function for firm 1. All coefficients stat. sig. at 99% or greater confidence level. Dependent variable: 10,000·π1

Variable    Coef.    | Variable   Coef.    | Variable   Coef.
constant    270.6    | m2         0.490    | n²         0.003
m1          5.932    | m1·m2      -0.304   | m1·n       0.007
m1²         -0.375   | n          -2.201   | m2·n       -0.016
m1³         0.007    | R²         0.864    |
Nobs. 10,000         |                     |
Case-Studies and Applications
Emergence of a Self-Organized Dynamic Fishery Sector: Application to Simulation of the Small-Scale Fresh Fish Supply Chain in Senegal

Jean Le Fur

Institut de Recherche pour le Developpement, Centre de Recherche Halieutique Mediterraneenne et Tropicale (CRH), Av. Jean Monnet, BP 171, 34203 Sete Cedex, France
[email protected]
1 Introduction
The artisanal fishery sector in Senegal is a complex system of fishermen and fish traders acting in close interaction. Indeed, in the overall marine fishery sector, several ethnic groups with different behaviors target more than a hundred fish species using nineteen types of fishing gear (Laloe and Samba 1990). Once fishermen land their catches, another complex set of fish traders is in charge of the food product distribution (Chaboud 1985). Seafood can then be sold on the beach, processed, transported to the various markets of the country or brought to Dakar, the capital, for export. The overall dynamics observed are the result of composite biological, technological and socio-economic interactions. A historical review of this fishery sub-sector (Chauveau and Samba 1990) pointed out that management changes introduced in the small-scale fishery more often than not led to unexpected effects. Indeed, some management measures introduced for a given purpose often led to undesirable consequences in other parts of the exploitation that were not concerned by the given measure. The system appears to be an archetype of complexity. From a management perspective, explanations may be sought for the conditions under which such a complex sector achieves organization and stability, despite the multiple dependencies existing between the different components. Answering such a question could help to better depict the possible derived consequences of a given management measure introduced into this sector. Studies and models of trade/market/price systems of self-organization and equilibrium usually offer detailed insights into one or a few aspects of the market or chain dynamics, such as price fixing (Gale 1987, Donangelo et al. 2000, Calvo-Armengol 2003, Zhu 2004), trade-offs and negotiation mechanisms (Faratin et al. 2002, Lemke 2004), interaction schemes and the network configuration of markets (Guriev and Shakhova 1996, Iori 2002, Vriend 2004),
decentralization effects (Calvo-Armengol 2003), multiple levels of exchange such as in supply chain models (Kaihara 2001, Dangelmaier et al. 2002), and the interaction between prices and consumption (Nagel et al. 2004). In a world like the fishery sector, a given market can be considered as a local phenomenon embedded and connected in a variety of others, which all together constitute the overall sector dynamics. This means that all of the cited aspects of the dynamics may be of importance and should simultaneously play a role in the overall dynamics. Moreover, the studies presented are usually described using abstract situations where localization and distances are not considered. In the real world of the fishery trade sector, transport costs are clearly a significant part of fish price fixing, and the choices for one place or another constitute key factors of the supply and demand dynamics. Furthermore, the resource-harvesting dynamics should clearly influence the dynamics of supply and fish market prices. Again, these multiple factors all intervene simultaneously at different levels and scales. The question then remains of the ability and means of a complex trading sector to converge, in such a context, towards self-organization and exhibit any equilibrium or steady state. A multi-agent computer model restricted to the fresh fish supply chain in Senegal has been developed to study this question. The model is based on the representation of the agents' choices, actions and interactions. The multi-agent formalism has been used since it easily permits modeling of decentralized processes as well as studying adaptive or emerging phenomena (Railsback 2001). These latter are indeed felt to be the possible keys for the emergence of self-organization or equilibrium. Moreover, the multi-agent formalism looks very suitable for studying systems where negotiation and multi-criteria functions play central roles (Oliveira et al. 1999).
2 Model Presentation

2.1 Outline
The model is based on a cybernetic (Ashby 1964) perception of the domain. In this approach, the fishery sector is considered as a set of four interconnected networks within which circulate money, fish, human agents and information. These flows may be interconnected at some points where matter can be exchanged (e.g., money exchanged for fish, activity for money). From this viewpoint, looking for a sustainable exploitation of Senegalese resources may consist in maintaining these flows. These overall dynamics are formalized at a finer scale using a diversified combination of local actions and interactions: the human groups in charge of the exploitation (fishermen, fish traders) constitute the concrete link between biological,
technical, economic and social dynamics. The agents are endowed with various behaviors allowing them to obtain information from their environment, make choices about it and produce several actions. For these agents, the ability to respond to changes in their environment hinges on their ability to choose between alternatives, and their ability to negotiate with other agents. The combination of the different actions produces flows of fish, currencies, human agents and activity and, finally, the overall dynamics (production, richness) of the fresh fish channel.
2.2 Class Structure
To investigate this composite issue, the object-oriented design of the multi-agent model leads to a class hierarchy where each sub-class 'is-a-kind-of' its superclass. The classes that have been retained in the model of the Senegalese exploitation are presented in Fig. 1.
Fig. 1: Computer constituents (classes) selected to formalize the domain. Grey boxes show examples of the object characterization for each corresponding class (note: the "memory" field refers to a separate storage class not figured in the hierarchy)

The overall fishery exploitation is composed of four main classes: the communities conducting the various flows, the knowledge they can access for this purpose, the places in which they operate and the living fish stocks they harvest. These major classes are divided into more specialized ones to obtain a sufficient (parsimonious) description of what composes the fishery exploitation. In each of the eleven terminal classes, several objects are differentiated. Each object in a given class is given a set of attributes that are defined by the class to which it
belongs (four examples are presented in Fig. 1). The values of these objects' attributes document either the relationships with other objects (pointer fields) or specific characteristics (valuated fields).

2.3 Functional Representation
Upon this structure a multi-agent formalism, as described in Ferber (1999), has been developed. The Active-Community class contains the active agents (fishermen and fish traders) of the food supply chain. These objects can elaborate information and produce actions through nested message sending and replies. In this model, for example, if a fishing community needs to know the traders' demand for a given species in a given port, it sends the corresponding message to the port's agent. The port sends the message "species' needs" to each of the fish traders currently in this place. Each fish trader then conducts an internal evaluation of its requirements for this species and replies by sending a message back to the port. The port accumulates the answers and, after a compilation, can reply back to the fishing community. At a higher level of combination, some basic activities of the various agents in the exploitation can be formalized. The example in Fig. 2 represents fishermen's actions once they have gone fishing and come back to land their catches:
Fig. 2: Flow chart of the 'go to port' action performed by a fishing community agent (e.g. after fishing). Each box corresponds to an 'order' (set of messages) sent by the fishing community to itself

Each fisherman agent arriving in a port tries to sell its fish. If it succeeds, it memorizes the results of its action and then stands by; the next time it has to act, it will choose to go to sea. If the transaction does not succeed, because there are no fish traders for its fish or the negotiation did not succeed, fishermen look for another port (action 'change port') using a decision process sub-model derived from
Le Fur (1995, 1998): according to the information it can gather from other objects and agents, the community elaborates a set of alternative places where it can go. For each of these places, it evaluates the opportunity with respect to several criteria; in this case, the opportunity to go to one port or the other will depend on traders' demand, species prices, transport costs, and confidence in one or the other place. After comparing the opportunities of the alternatives, either it finds a better place to sell its fish and then goes to the new port, where the whole process starts again, or it does not find a better place and stands by for the next fishing trip. If, during its standby, fish trader agents happen to arrive in this port, selling may occur. To account for the whole exploitation activity, four similar processes have been formalized: two for the fishermen agents, 'go fishing' and the 'go to port' just described, and two for the fish traders' agents, 'go to port' and 'go to market'. At each time step, each active agent, fisherman or fish trader, has to choose one of its two moves; going to the same place is allowed as a stand-by. Depending on its stocks (fish, money) and the results of its preceding choices, it goes to a given place whose characteristics conduct it to a specific 'action'. For example, a fisherman arriving at a fishing zone fishes, a fish trader in a port tries to buy fish, etc. Any action conducted leads to a subsequent 'evaluation' of its result. This evaluation may lead to a change or no change in the next action aimed at by the active agent.

Interaction and Transaction
When two communities, a buyer's and a seller's, happen to meet in a given place, negotiation may occur, followed or not by a transaction. A sub-model has been developed to formalize this mechanism. Transaction sub-model: since in Senegal bargaining is the rule for exchange, the transaction sub-model represents selling by private contract between the different communities. At the beginning of the transaction, the selling community (fisherman, fish trader) obtains information from its surroundings, evaluates the cost caused by the previous activity (moving, buying, fishing) and proposes its price. The buyer (fish trader, customer) considers its previous costs or needs and puts forth its own proposition. The final price of the transaction will be a value between the seller's lowest price and the buyer's highest price. In decision theory, given A a set of acts and E the possible states of an environment, the possible consequences C can be described through a probability distribution. A rule of thumb (e.g., Charreton and Bourdaire, 1985) establishes the possible mean of this distribution (i.e., the final price) as: ⅓ · [maximum of the distribution + minimum of the distribution + mode]. Following this scheme, the transaction price will be, for example in a port: ⅓ · [fishermen's price + traders' price + final price in the port during the last transaction concerning this species]. If the final price is acceptable to both partners, the transaction occurs and the price changes in the port. In a given time step, the evolution of the traders' arrivals in the port and their successive transactions generates the port's fish price dynamics. These fluctuations will
again intervene in the agents' choices. This procedure is duplicated in the market places, where transactions occur between fish traders and consumers. Since the agents may own fish and currencies, their moves lead to the activation of the various flows constituting the exploitation. Moreover, depending on the place where an agent operates, it can come into interaction with other agents and decide whether or not to exchange matter (work into fish, fish into currency, etc.). In this way, the accuracy of the moves and of the interactions is a condition for an accurate subsequent action.
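The bargaining rule described above can be sketched in a few lines (an illustration under our own naming, not the original implementation):

```python
def negotiate_price(seller_price, buyer_price, last_price):
    """Settlement rule: 1/3 * (seller's proposal + buyer's proposal +
    final price of the last local transaction for this species).

    As a simplification we treat seller_price as the seller's lowest
    acceptable price and buyer_price as the buyer's highest one.
    Returns the settled price, or None if no transaction occurs.
    """
    price = (seller_price + buyer_price + last_price) / 3.0
    if seller_price <= price <= buyer_price:
        return price   # becomes the place's new reference price
    return None
```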
2.4 Simulation Process
Simulation scenarios are built using data available in the literature (Chaboud 1985, Chaboud and Kebe 1990, CRODT 1989, 1990). The whole Senegalese exploitation was first instantiated, leading to a scenario with a system composed of 126 fishing community agents accounting for 3193 fishing teams, 1421 trader agents, 14 markets, 9 ports, 13 fishing zones, 5 fishing gear types, 6 vehicle types and 21 types of fish species. For practical reasons, simulations have been conducted with subsets of this configuration. In this study, the scenario accounts for the North coast of Senegal with only gillnets and lines, 2 ports and 8 markets. At the beginning of the simulation, active fish traders (trucks) and fishermen (canoes) are positioned on their current places. Each community agent is given 45,000 CU for 15 days (where CU is a currency unit such that 1 CU approximates 1000 Senegalese CFA francs). The simulation then proceeds step by step, with one time step equivalent to a fortnight. At each time step the external sources of fluctuation are documented: the natural resources produce fish on one side of the system, and consumers are provided with money and consumption needs on the other side. Each active community is then allowed to produce an action. Depending on their environment and their preceding choices and results, fishermen and fish traders move to one or another type of place, port or market, and try to fish, sell or buy. At the beginning of a simulation and depending on the initial scenario, the communities introduced into the "virtual exploitation" may not fit the particular environment simulated. For example, a fisherman with bad information will not go where fish traders are waiting, another may look for unavailable species, a fish trader may go to distant markets and incur high transport costs, etc. Depending on their actions, some communities may thus lose money. If a community, through its activity, happens to lose more than 10,000 CU over the 15 preceding steps, it leaves the fishery. The most indebted agents leave the 'virtual exploitation' first and, from time to time, only the fittest communities remain in the exploitation.
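This fitness-selection rule lends itself to a direct sketch (the data layout and names are ours, for illustration):

```python
def select_leavers(balances_by_community, window=15, threshold=10_000):
    """Communities that lost more than 10,000 CU over the last 15 time
    steps leave the fishery, most indebted first.

    balances_by_community maps a community id to its per-step money
    balances.
    """
    losses = {
        cid: balances[-window - 1] - balances[-1]
        for cid, balances in balances_by_community.items()
        if len(balances) > window
    }
    return sorted(
        (cid for cid, loss in losses.items() if loss > threshold),
        key=lambda cid: losses[cid],
        reverse=True,   # most indebted leave first
    )
```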
3 Simulation Results
Owing to the structure of the multi-agent system, it is possible to obtain an insight into the various levels of the fishery system dynamics.

• At the finest scale, activities, decision processes, negotiations and transactions between agents can be traced precisely. Traces show diversified choices and actions through time and from one agent to the other. The listing in Fig. 3 presents a snapshot of a fishing community during the transaction phase of its activity.

    5 pml-ky-2 leave the Kayar_sea and [transport: 2CU] arrive at Kayar
    Price given for rays by pml-ky-2: .220
    Price given for sea bream by pml-ky-2: .130
      by kayar-40: .043   after negotiation: .097
    5 pml-ky-2 sell (each) 200kg/day of sea bream to 3 kayar-40 and earn 19.4 CU/pers./day
    Price given for red bream by pml-ky-2: .177
    Price given for grouper by pml-ky-2: .488
      by kayar-75: .014   after negotiation: .315   kayar-75 refuses transaction.
      by kayar-40: .399   after negotiation: .444
    5 pml-ky-2 sell (each) 41kg/day of groupers to 3 kayar-40 and earn 18.2 CU/pers./day
    pml-ky-2 has no cuttlefish to sell.
    5 pml-ky-3 leave the Kayar_sea and [transport: 2CU] arrive at Kayar
Fig. 3: An example of the simulated transactions. At the local scale, the actions performed by the agents are traced by the computer. In this example, information has been filtered to keep the selling task trace only (explanation given in the text)

In this example, the fishing community agent named 'pml-ky-2' comes back to the 'Kayar' port object after having fished in the 'Kayar_sea' marine place object. It tries to sell its 'rays', but no fish trader is interested in this species, so nothing happens. The fishermen community then tries to sell its 'sea bream'. Taking into account its previous costs, the quantity owned and the port's price, it proposes a price (.130). The fish traders' agent named 'kayar-40' is interested in buying and makes an offer. After "bargaining" the price is negotiated to .097 and the two communities proceed to the transaction. For the 'red bream' species, the fishermen do not find any traders. For the 'grouper' species, the 'kayar-75' traders' agent proposes a very low price (.014); the negotiated price remains too high and the trader refuses the exchange. The former traders' community, 'kayar-40', is also interested in the 'grouper' species and here the negotiation succeeds. At the end of
negotiation, fish traders ask for the 'cuttlefish' species, but the fishermen community did not target this species. Nevertheless, this may conduct the community to later choose a fishing place where it can catch this species. Thereafter another fishing community, 'pml-ky-3', arrives at 'Kayar' and the process goes on.

• At a higher scale, local indicators can be studied. For example, from time to time and depending on the moves of the various communities, supply and demand fluctuate from one site to the other and, through negotiations, the species prices change. The example in Fig. 4 presents an emerging co-evolution of the grouper price dynamics in a port and its nearest market.
Fig. 4: Price changes of the grouper species in two related places (a port and a nearby market): consumer's price at the Louga market and ex-vessel price at the port of Saint-Louis, over simulation time steps (fortnights). Results from a simulation

Dependencies appear between the two places. The comparison of temporal changes in prices shows higher prices in the market than in the port. Moreover, fluctuations in the port propagate with a delay to the market. The response curve shows a sharp increase followed by a slow price decay. This corresponds to a period of shortage, when prices increase, followed by a period of over-supply, when prices fall as new providers/sellers arrive. Some unexplained fluctuations also appear, such as the amplification of the prices observed between t20 and t50 followed by a decrease around t80. In the model, since the whole set of ports and markets is interconnected, changes occurring in a given place object (e.g., disaffection for a port or a market) can lead to changes in other places, thus causing these indirect, and often difficult to figure out, variations.

• Finally, at the overall scale, various flows can be monitored in the simulation. The example in Fig. 5 shows a possible evolution of the simulated fishery.
Fig. 5: Self-organization of the virtual exploitation (North coast and markets of Senegal, with gillnets and lines only): fishermen and fish traders population sizes, ports' richness (CU) and unsold quantities, over time steps (fortnights). Results from a simulation

At the beginning of the simulation, fish landings are greater than the buying capacity of the fish traders. The unsold quantities are too high and fishermen lose money. The exploitation is indebted overall (ports' richness). Through time, fishermen who are not able to support the loss leave the fishery (fishermen population size). The fish traders' number remains stable. In the middle of the simulation, the number of fishermen reaches a low level at which it is fitted to the trading capacity of the fish traders. The exploitation is, from that time on, composed of many fish traders with small buying capacities and a few fishermen communities providing the exact demand. The four dynamics become stationary: the unsold quantities tend to zero, and the exploitation richness is positive and stable.
4 Discussion

Following the classification by Straskraba (2001), the simulated fishery sector exhibits both self-adaptation (tuning parameters without modifying structure) and self-organization (connecting or disconnecting a diversified set of relationships). Some simulations demonstrated the model's sensitivity to the agents' choice criteria, the order in which the agents act and the initial conditions of the simulations. As these factors modify the dynamics, the results of the simulation cannot
be closely related to observed real events; they just display a possible situation arising from an approximation of the real context. The validation of the model is conducted by comparing the activity of the virtual exploitation at each functional level with observed patterns: fishermen go fishing, fish traders arrive in the right market at the right time, the negotiation process is rational and reliable compared to what is known of the bargaining process, prices are realistic compared to those observed, related places co-evolve, etc. The simultaneous consistency of the dynamics observed at the various levels is a factor that reinforces confidence in the emergent dynamics observed.
5 Conclusion
In the course of a simulation, the communities' objectives change depending on their various activities (fishing, selling, moving, buying, consuming). From one objective to another, from one type of community to another and from one environment to another, the decision processes lead to different choices. The resulting sum of the activities modifies the context (i.e., the environments) through time and, by feedback, influences the various evaluations processed by the agents. Even in such a complex multivariate system there exist some combinations for which the system is sustainable. The agents' diversity and the multiplicity of their local actions provide a large degree of freedom to the multi-agent system. This feature can contribute to making sustainable combinations available. Moreover, when associated with a simple process of agent fitness selection, this 'distributed-diversity' feature also provides the ability for the simulated system to converge autonomously towards a correct parameter combination.
References

Ashby WR (1964) Introduction to Cybernetics. Methuen, New York
Calvo-Armengol A (2003) A decentralized market with trading links. Mathematical Social Sciences 45:83-103
Chaboud C (1985) Le mareyage au Senegal. CRODT-ISRA, doc sci 87, 112p
Chaboud C, Kebe M (1990) Commercialisation du poisson de mer dans les regions interieures du Senegal (donnees statistiques). CRODT-ISRA, contrat FAO 695 TCP/SEN/6653(t), septembre 1990, 300p
Charreton R, Bourdaire JM (1985) La decision economique. Presses universitaires de France, coll Que sais-je?, ISBN 2-13-039042-0, 125p
Chauveau JP, Samba A (1990) Un developpement sans developpeurs? Historique de la peche artisanale maritime et des politiques de developpement de la peche au Senegal. Doc ISRA, serie Reflexions et Perspectives, 20p
CRODT (1989) Statistiques de la peche artisanale maritime senegalaise en 1987. Arch Centr Rech Oceanogr Dakar-Thiaroye, no 175, juillet 1989, 85p
CRODT (1990) Recensements de la peche artisanale maritime senegalaise entre Djifere et Saint-Louis, mai et septembre 1987. Arch Centr Rech Oceanogr Dakar-Thiaroye, no 181, juillet 1990, 49p
Dangelmaier W, Franke H, Pape U (2002) A Multi-Agent-Concept in Supply Chain Management. In: International Manufacturing Leaders Forum, Adelaide, 8-10 February 2002
Donangelo R, Hansen A, Sneppen K, Souza S (2000) Modelling an imperfect market. Physica A 283:469-478
Faratin P, Sierra C, Jennings NR (2002) Using similarity criteria to make issue trade-offs in automated negotiations. Artificial Intelligence 142:205-237
Ferber J (1999) Multi-agent systems: an introduction to distributed artificial intelligence. Addison-Wesley, Harlow, Great Britain, 509p
Gale D (1987) Limit theorems for markets with sequential bargaining. J Economic Theory 43(1):20-54
Guriev S, Shakhova M (1996) Self-Organization of Trade Networks in an Economy with Imperfect Infrastructure. In: Schweitzer F (ed) Self-Organization of Complex Structures: From Individual to Collective Dynamics. Gordon and Breach Scientific Publishers, London
Iori G (2002) A microsimulation of traders activity in the stock market: the role of heterogeneity, agents' interactions and trade frictions. J Econ Behavior & Organization 49:269-285
Kaihara T (2001) Supply chain management with market economics. Int J Production Economics 73:5-14
Laloe F, Samba A (1990) La peche artisanale au Senegal: ressource et strategies de peche. Etudes et Theses, Paris, Orstom, 395p
Le Fur J (1995) Modeling adaptive fishery activities facing fluctuating environments: an artificial intelligence approach. AI Appl Agric Nat Res Environ Sci 9(1):85-97
Le Fur J (1998) Modeling fishery activity facing change: application to the Senegalese artisanal exploitation system. In: Durand MH, Cury P, Mendelssohn R, Roy C, Bakun A, Pauly D (sci eds) Global vs local changes. Orstom coll Colloques et Seminaires, Paris, pp 481-502
Lemke RJ (2004) Dynamic bargaining with action-dependent valuations. J Econom Dynamics & Control 28:1847-1875
Nagel K, Shubik M, Strauss M (2004) The importance of timescales: simple models for economic markets. Physica A 340:668-677
Oliveira E, Fisher K, Stepankova O (1999) Multi-agent systems: which research for which applications. Robotics and Autonomous Systems 27:91-106
Railsback SF (2001) Concepts from complex adaptive systems as a framework for individual-based modeling. Ecol Modelling 139:47-62
Straskraba M (2001) Natural control mechanisms in models of aquatic ecosystems. Ecol Modelling 140:195-205
Vriend N (2004) ACE models of market organization. Rev Econ Industrielle 107:63-74
Zhu J (2004) A buyer-seller game model for selection and negotiation of purchasing bids: extension and new models. European J Operational Research 154:150-156
Multi-Agent Model of Trust in a Human Game

Catholijn M. Jonker¹, Sebastiaan Meijer², Dmytro Tykhonov¹ and Tim Verwaart²

¹ Radboud University Nijmegen, Montessorilaan 3, Nijmegen, The Netherlands
  {c.jonker, d.tykhonov}@nici.ru.nl
² Wageningen UR, Burg. Patijnlaan 19, Den Haag, The Netherlands
  {sebastiaan.meijer, tim.verwaart}@wur.nl
Summary. Individual-level trust is formalized within the context of a multi-agent system that models human behaviour with respect to trust in the Trust and Tracing game. This is a trade game on commodity supply chains and networks, designed as a research tool and to be played by human players. The model of trust is characterised by its learning ability, its probabilistic nature, and the way experience influences trust. The validity of the trust model is tested by comparing simulation results with aggregated results of human players. More specifically, the simulations show the same effects of selected parameters (confidence, tracing cost, and the trust update coefficient) on observable game statistics (the numbers of cheats, traces, certificates, and guarantees) as human plays do.
1 Introduction
People from different cultures differ significantly with respect to uncertainty avoidance, individualism, mutual caretaking and other traits [1]. Personal traits and human relations affect the forming and performance of institutional frameworks in society; important economic institutional forms are supply chains and networks [2]. The Trust and Tracing game [3] is a research tool designed to study human behaviour with respect to trust in commodity supply chains and networks in different institutional and cultural settings. The game, played by human participants, is used both as a tool for data gathering and as a tool to make participants reflect on their daily experiences. Although the game has been played numerous times, the number of sessions that can be played with humans is limited: it is expensive and time-consuming to acquire participants [4], and one needs many sessions to control for variances between groups [5]. Multi-agent simulation can to some extent overcome these disadvantages in two ways: it can validate models of behaviour induced from game observations, and it can be a tool in the selection of useful configurations for games with humans (test design). Validation of the models we designed was done at the aggregated level using computer simulations. Simulation results were compared to a set of hypotheses based on observations of human games and conventional economic rationality.
This paper presents a multi-agent model of the Trust and Tracing game. It is an instrument in the research method presented in Section 2. Section 3 provides a brief description of the game and results from human sessions. Section 4 describes the agent architecture and models for buyer's behaviour and trust. In Section 5 we illustrate the validity of the approach by experimental results from multi-agent simulations. Section 6 presents the main conclusions of the paper.
2 Method
Our research uses the methodological cycle described in Fig. 1. It started in the upper left corner with the human game environment. The first series of sessions led to a number of observed individual and aggregated tendencies in the human game. On the basis of the observed tendencies and conventional economic theories, a multi-agent model was designed and implemented in a simulated environment. In this environment, sessions were simulated using the same settings as the initial human sessions. Through verification of the aggregated tendencies we have been able to prove the gross validity of our model, and the fruitfulness of our approach.
Fig. 1. Methodological cycle

In current and future work more variations of the setting (including the current one) will be tested in both the human and the simulated environment. This will lead either to further adjustments of the multi-agent model or to more variations to test. By testing large numbers of settings quickly in the simulated environment we can select the more interesting settings for the human sessions, and thus save research time. The long-term result will, hopefully, be a fully validated model of trust with respect to situations comparable to the Trust and Tracing game, where validation is reached at both the agent and the aggregated level.
3 The Trust and Tracing Game

This section provides a brief description of the Trust and Tracing game; an extensive description is available in [3]. Observations from sessions played are discussed at the end of this section. The focus of study is on trust in a business partner when acquiring or selling commodities with invisible quality. There are five roles: traders (producers, middlemen and retailers), consumers and a tracing agency. Typically there are 4 producers, 4 middlemen, 4 retailers and 8 consumers, to reflect the multiple steps and oligopoly character of most supply networks. The real quality of a commodity is known by producers only. Sellers may deceive buyers with respect to quality, to gain profits. Buyers have either to rely on information provided by sellers (trust) or to request a formal quality assessment from the Tracing Agency (trace). This costs a tracing fee for the buyer if the product is what the seller stated (honest). The agency will punish untruthful sellers by a fine. Results of tracing are reported to the requestor only or by public disgrace, depending on the game configuration. A strategy to be a truthful seller is to ask for a trace before selling the product; sellers then use the tracing report as a quality certificate. Middlemen and retailers have an added value for the network through their ability to trace a product more cheaply than a consumer can. The game is played in a group of 12 up to 25 persons. Commodities usually flow from producers to middlemen, from middlemen to retailers and from retailers to consumers. Players receive 'monopoly' money upfront. Producers receive sealed envelopes representing lots of commodities. Each lot is of a certain commodity type (represented by the colour of the envelope) and of either low or high quality (represented by a ticket hidden in the envelope). The envelopes may only be opened by the tracing agency, or at the end of the game to count the points collected by the consumers (Table 1). The consumer who has collected the most points is the winner in the consumer category; in the other categories the player with maximal profit wins.
This section provides a brief description of the Trust and Tracing game; an extensive description is available in [3]. Observations from sessions played are discussed at the end of this section. The focus of study is on trust in a business partner when acquiring or selling commodities with invisible quality. There are five roles: traders (producers, middlemen and retailers), consumers and a tracing agency. . Typically there are 4 producers, 4 middlemen, 4 retailers and 8 consumers, to reflect the multiple steps and oligopoly character of most supply networks. The real quality of a commodity is known by producers only. Sellers may deceive buyers with respect to quality, to gain profits. Buyers have either to rely on information provided by sellers (Trust) or to request a formal quality assessment at the Tracing Agency (Trace). This costs a tracing fee for the buyer if the product is what the seller stated (honest). The agency will punish untruthful sellers by a fine. Results of tracing are reported to the requestor only or by public disgrace depending on the game configuration. A strategy to be a truthful seller is to ask for a trace before selling the product. Sellers use the tracing report as a quality certificate. Middleman and Retailers have an added value for the network by their ability to trace a product cheaper than a consumer can. The game is played in a group of 12 up to 25 persons Commodities usually flow from producers to middlemen, from middlemen to retailers and from retailers to consumers. Players receive 'monopoly' money upfront. Producers receive sealed envelopes representing lots of commodities. Each lot is of a certain commodity type (represented by the colour of the envelope) and of either low or high quality (represented by a ticket covered in the envelope). The envelopes may only be opened by the tracing agency, or at the end of the game to count points collected by the consumers (table 1). The player who has collected most points is the winner in the consumer category. In the other categories the player with maximal profit wins. Table 1. Consumer satisfaction points by commodity type and quality Quality Low High
Blue 1 2
Type Red 2 6
Yellow 3 12
Sessions played until 2005 provided many insights ([3] and unpublished). We mention three applicable here:
1. Dutch groups (with a highly uncertainty-tolerant culture [1]) tend to forget about tracing and bypass the middlemen and retailers, as they don't add value. This gives the producers a large chance to be opportunistic: few traces lead to more deceits.
2. American groups tend to prefer guaranteed products. They quickly find out that the most economic way to obtain them is to purchase a traced product and to let
the middlemen do the trace, as this is the cheapest step. After initial tracing of any lot, middlemen start to take samples as relationships establish.
3. Participants who know and trust each other beforehand tend to start trading faster and trace less. The indignation afterwards about deceits that had not been found out during the game is higher in these groups than it is when participants do not know each other.
4 Agent Architecture and Buyer's Model
The agent architecture for simulation of the Trust and Tracing game has been described in [6]. The models for cheating are discussed in [7]. The types of agents acting in the simulated game are trading agents (producers, middlemen, retailers, and consumers) and the tracing agent. The architecture of the tracing agent is straightforward: it reports the real quality of a product lot to the requestor, informs the sellers that a trace has been requested and penalizes untruthful sellers. In this paper we focus on the trading agents, and in particular on their behaviour as buyers, entailing the trust-or-trace decision.
Fig. 2. Agent process composition

Trading agents have processes for initialization, goal determination, trading (which entails the cheating decision in the case of selling and the trust-or-trace decision in the case of buying), trust management and stock control. In the goal determination process an agent decides to buy or to sell, depending on its role and stock position, and selects a partner at random, weighted by the success or failure of previous negotiations with particular partners.
The trading process is based on the algorithm presented in [8]. This approach to multi-attribute simultaneous negotiations is based on utility function theory. Negotiation partners send complete bids (a set of negotiation object attributes with assigned values) to each other. Once an agent has received a bid it can accept it, respond with an alternative bid, or cancel the negotiation. Agents evaluate their own and their partner's bids using a generalized utility function that is a weighted linear combination of particular attribute evaluation functions. The weights represent the preferences of each agent. In this case the utility function uses normalized values of income and risk (which are calculated from the negotiation object attributes). The buyer's utility function involves individual experience-based trust in the seller as an argument to estimate the risk of being deceived. Modeling of trust for this purpose and experience-based updating of trust, as part of the trust management process, is the subject of subsection 4.1. Subsection 4.2 explains the utility function and the way it can be used to represent an agent's preferences or buying strategies. Subsection 4.3 treats the tracing decision.

4.1. Trust Models

In the literature a variety of definitions of trust phenomena can be found. The common factor in these definitions is that trust is a complex issue relating belief in honesty, trustfulness, competence and reliability of the trusted system actors, see e.g. [9, 10, 11, 12]. Furthermore, the definitions indicate that trust depends on the context in which interaction occurs or on the observer's point of view. According to Ramchurn et al. [10] trust can be conceptualized in two directions when designing agents and multi-agent systems:
• Individual-level trust: an agent's beliefs about the honesty of its interaction partner(s);
• System-level trust: system regulation protocols and mechanisms that enforce agents to be trustworthy in interactions.
In this paper we address problems and models for individual-level trust, as our simulation environment already has system-level trust mechanisms, such as the tracing agency, that encourage trading agents to be trustworthy. Defining trust as a probability allows relating it to risk. Josang and Presti [12] analyse the relation between trust and risk and define reliability trust as the "trusting party's probability estimate of success of the transaction". This allows for considering economic aspects; agents may decide to trade with low-trust partners if the loss in case of deceit is low. An important subprocess of the agent's trust management process is the trust update based on tracing results. The current model uses the trust update schema proposed in [13]:

g(ev, tv) = d·tv + (1 − d)·ev    (1)
where tv is the current trust value, ev is the experience value, and d is the ratio that introduces the memory effect of the trust update function. This function has the following properties: monotonicity, positive and negative trust extension, and
strict positive and negative progression. This model is suitable because (1) it models learning, which is necessary because experience is the only source of information; (2) it has a probabilistic nature, useful in the calculation of risk; and (3) it has a memory effect and allows inflation of experience. Each agent maintains the level of trust it has in the other agents with respect to their role as a supplier, and uses tracing results to update its trust. A trace revealing deceit has a negative effect on trust in the partner as a supplier; if a supplier is found truthful by tracing, this strengthens trust. The tracing agent has two different modes, to be set by the game leader: (1) with confidential reports of deceit to the requesting agent only, and (2) with public disgrace of deceivers, where all agents are informed when a deceiver has been punished. Experience values were assigned taking into account empirical data suggesting that "it appears easier to destroy trust than to build trust" [14]: negative experience has a stronger impact on trust than positive experience. This assumption is reflected in the experience evaluation values ev(pos) = 0.5 and ev(neg) = −1. The value of d and the initial value of tv are agent parameters set by the game leader; usually d = 0.4 and tv = 0.5.
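A minimal sketch of the update, with the parameter values quoted above as defaults (names are ours):

```python
def update_trust(tv, truthful, d=0.4):
    """Trust update of equation (1): g(ev, tv) = d*tv + (1 - d)*ev.

    truthful is the outcome of tracing a lot bought from this supplier;
    ev(pos) = 0.5 and ev(neg) = -1, so deceit weighs harder than honesty.
    """
    ev = 0.5 if truthful else -1.0
    return d * tv + (1.0 - d) * ev
```

Starting from the usual tv = 0.5, a confirmed truthful lot leaves trust at 0.4·0.5 + 0.6·0.5 = 0.5, while a single revealed deceit drops it to 0.4·0.5 − 0.6 = −0.4, illustrating the negative trust extension.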
Buyer's Model
Negotiation skill is an important capability of agents since it enables them to efficiently achieve their goals. In the T&T game the trading process is used to achieve trade deals. The negotiation system employed in the simulation is based on utility functions. The utility function for buyers involves the risk of being deceived when buying (stated) high quality commodities. Depending on trust in seller (belief about the opponent) and risk-attitude (personal trait of buyer), the buyer can try to reduce risk. Risk can be eliminated by demanding a quality certificate or reduced by a money-back guarantee. The attributes of a transaction are product type, stated quality, price, and certificate or money-back guarantee. The buyer's utility function is a weighted sum of normalized functions of price, satisfaction difference between high and low quality (for consumers) or expected turnover (for others), and risk (estimate based on trust in seller, guarantee and prices):
+ >^2/exp ected _ turnover {expccted _ tumover(bid')) + wj^.^, {risk^eiier (bid') The weight factors implement buyer's strategies. For quality-minded buyers that are willing to pay to ensure high quality, both W2 and W3 are high relative to Wi, for instance . The opportunistic buyer prefers high quality for low price but is prepared to accept uncertainty, for instance . The suspicious buyer follows an what-you-see-is-what-you-get strategy, represented for instance by . Effective price is the total amount of money that the buyer has to pay:
price_effective(bid) = price_negotiated(bid) + cost_transaction(bid)    (3)
where cost_transaction(bid) represents an extra cost for the buyer that depends on the type of partner and is taken from the transaction cost matrix defined by the game leader (Table 2 gives an example).

Table 2. Example of a transaction cost matrix

                          Buyer
Seller       Producer   Middleman   Retailer   Consumer
Producer     10         2           4          8
Middleman    100        10          2          4
Retailer     100        100         10         2
Consumer     100        100         100        10
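Equation (3) with the matrix of Table 2 reduces to a lookup; a sketch (the dict layout and names are ours):

```python
# Transaction cost matrix of Table 2, indexed as COST[seller][buyer].
TRANSACTION_COST = {
    "producer":  {"producer": 10,  "middleman": 2,   "retailer": 4,   "consumer": 8},
    "middleman": {"producer": 100, "middleman": 10,  "retailer": 2,   "consumer": 4},
    "retailer":  {"producer": 100, "middleman": 100, "retailer": 10,  "consumer": 2},
    "consumer":  {"producer": 100, "middleman": 100, "retailer": 100, "consumer": 10},
}

def effective_price(negotiated_price, seller_role, buyer_role):
    """Equation (3): the negotiated price plus the role-dependent extra cost."""
    return negotiated_price + TRANSACTION_COST[seller_role][buyer_role]
```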
Expected turnover is the average of the agent's beliefs about the minimal and maximal future selling price of the commodity to be bought. For consumers, expected turnover is replaced by the satisfaction level. The buyer's risk represents the estimate of probable losses for a given trade partner and trade conditions. It is calculated as the product of the probability of deceit and the cost in case of deceit:

risk_seller(bid) = p_deceit(bid) · cost_deceit(bid)    (4)
The probability of deceit is greater than zero only if the stated commodity quality is high and the lot is not certified. If these conditions are satisfied, then the probability of deceit is estimated as the complement of the buyer's trust in the seller:

p_deceit(bid) = q(bid) · c(bid) · (1 − trust(seller))    (5)

where q(bid) = 1 if the stated quality is high and c(bid) = 1 if the lot is not certified (both are 0 otherwise).
The costs in case of deceit are estimated for middlemen and retailers as the sum of the fine for untruthfully reselling a product and, only if no guarantee is provided, the loss of value, which is assumed to be proportional to the loss of consumer satisfaction value taken from Table 1. The formula for middlemen and retailers is:

cost_deceit(bid) = fine_reselling + loss_of_value(bid)    (6)

where

loss_of_value(bid) = (1 − g(bid)) · price_effective(bid) · (1 − ratio_satisfaction(bid))    (7)

and g represents the guarantee function: g(bid) = 1 if the bid involves a money-back guarantee, and g(bid) = 0 otherwise, so the loss of value vanishes when a guarantee is provided. For consumers the cost in case of deceit is also assumed to be proportional to the loss of satisfaction value, but they do not risk a fine, so for consumers:

cost_deceit(bid) = (1 − g(bid)) · price_effective(bid) · (1 − ratio_satisfaction(bid))    (8)
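Putting equations (4) through (8) together, the buyer's risk estimate can be sketched as follows (field names are ours, not the authors'):

```python
def buyer_risk(bid, trust_in_seller, fine=0.0):
    """Risk term of equations (4)-(8): expected loss from being deceived.

    bid is a dict with keys 'quality' ('high' or 'low'), 'certified' (bool),
    'guarantee' (bool), 'effective_price' and 'satisfaction_ratio' (the
    low/high value ratio from Table 1); fine > 0 for middlemen and
    retailers, 0 for consumers.
    """
    # Deceit is possible only for uncertified, stated-high-quality lots (5).
    if bid["quality"] != "high" or bid["certified"]:
        return 0.0
    p_deceit = 1.0 - trust_in_seller
    # A money-back guarantee removes the loss of value (7).
    loss = 0.0
    if not bid["guarantee"]:
        loss = bid["effective_price"] * (1.0 - bid["satisfaction_ratio"])
    return p_deceit * (fine + loss)   # equation (4) with (6) or (8)
```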
This subsection presented the buyer's model. The seller's utility function partially mirrors the buyer's model; it considers effective price and risk as attributes, see [7].

4.3. Tracing Decision
For buyers, trading entails the trust-or-trace decision. In human interaction this decision depends on factors that are not sufficiently well understood to incorporate into a multi-agent system; hearing a person speak and visual contact, for instance, significantly influence the estimate of the partner's truthfulness [15]. In order not to completely disregard these intractable factors, the trust-or-trace decision is modeled as a random process instead of a deterministic one. In our model the agglomerate of all these intractable factors is called the confidence factor. The distribution involves experience-based trust in the seller, the value ratio of high versus low quality, the cost of tracing, and the buyer's confidence factor. Tracing reveals the real quality of a commodity. The tracing agent executes the tracing and punishes cheaters, as well as traders reselling bad commodities in good faith. The tracing agent only operates on request and requires a tracing fee. Agents may request a trace for two different reasons. First, they may want to assess the real quality of a commodity they bought. Second, they may provide the tracing result as a quality certificate when reselling the commodity; the decision to request a trace for this second reason originates from the negotiation process. This subsection focuses on the tracing decision for the first reason. Several factors, shown in Fig. 3, influence the tracing decision to be made after buying a commodity. First of all, the tracing decision is based on the buyer's trust in the seller. Trust is modelled as a subjective evaluation of the probability that the seller would not cheat on the buyer. It is updated using tracing results: positive tracing results increase trust in the seller, negative ones decrease it. Then the satisfaction ratio (see Table 1) of the commodity is considered: the buyer will rather trace more valuable products than products with a small satisfaction ratio, because the damage would be greater.
Fig. 3. Tracing decision model
Tracing costs also influence the decision, so a middleman is more likely to trace than a consumer: the tracing fee depends on the depth to be traced, so tracing is cheaper for a middleman than for a consumer. Confidence is an internal characteristic that determines the preference of a particular player to trust rather than trace, represented as a value on the interval [0,1]. The following expressions are used to make the tracing decision:

tracing_level(bid) = (1 − trust(seller(bid))) · (1 − ratio_satisfaction(bid)) · tracing_cost_ratio(bid) · (1 − confidence)    (9)

where tracing_level(bid) is a value on the interval [0,1] that represents an evaluation of the tracing preference for a given bid, and

tracing_cost_ratio(bid) = price_effective(bid) / (tracing_cost + price_effective(bid))    (10)

The tracing decision depends on the following rule:

if tracing_level(bid) > rnd then trace    (11)

where rnd is a random number in [0,1]. If an agent has decided to trace the product, it sends a tracing request message to the tracing agent. Once the tracing result has been received, the agent updates its trust belief about the seller and adds the product to its stock.
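Equations (9) through (11) combine into a compact stochastic decision; a sketch with our own parameter names:

```python
import random

def decide_trace(trust, satisfaction_ratio, effective_price, tracing_cost,
                 confidence):
    """Stochastic trust-or-trace decision of equations (9)-(11)."""
    cost_ratio = effective_price / (tracing_cost + effective_price)   # (10)
    tracing_level = ((1.0 - trust) * (1.0 - satisfaction_ratio)
                     * cost_ratio * (1.0 - confidence))               # (9)
    return tracing_level > random.random()                            # (11)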
5 Experimental Results
A group of experts possessing empirical knowledge of the game formulated the conceptual model of the Trust and Tracing game's system dynamics on an aggregated level using Vennix' group model building [15], and used it to formulate hypotheses about the effect of selected parameters {confidence, tracing cost, and trust update coefficient^trust, represented by din equation 1) on observable game statistics (number of cheats, traces, certificates, Sind guarantees). The experts used experiences from over 40 sessions with the game (during its development, testing and real world application phase) and knowledge from case studies from literature to express the following hypotheses. As an example we present the hypotheses about confidance: Hypotheses about the effects of confidence. 1. Increasing confidence decreases tracing. A highly confident buyer makes fewer traces as he thinks that his buying mechanism is taking care of risks. Confidence is present in our agent's tracing model and defines threshold for tracing. 2. Increasing confidence increases cheating, because honesty will not be corrected. High confidence means that players perform fewer traces. This means
that sellers experience low numbers of fines, which should decrease their level of honesty. A low level of honesty makes cheating more probable.
3. Increasing confidence decreases certificates. Confidence has an inverse impact on the tracing rate. High confidence decreases the tracing rate and consequently decreases the number of cheats found. This keeps average trust high, which in turn decreases the number of certificates.
4. Increasing confidence increases guarantees. Because high confidence makes average trust higher, it reduces the risk of providing a guarantee and consequently increases the number of guarantees provided.
Computer simulations were performed with populations of 15 agents: 3 producers, 3 middlemen, 3 retailers, 6 consumers. Game sessions are performed in continuous real time and depend only on the performance of the computer. Agents can be involved in only one transaction at a time. This organization allows (future) combining of artificial and human agents in one game session. Values of free parameters were selected uniformly from their definition intervals to confirm the model's capability to reproduce the desired input-output relationships and to explore their sensitivities. Figure 5 presents results of experiments performed for two values of confidence, 0.1 and 0.9, across populations with various risk-taking attitudes, respectively: no increased-risk-takers (denoted as "neutral" on the x-axis of the charts in figures 3, 4, 5), 1 out of 3 risk-takers (denoted as "2:1"), 2 out of 3 risk-takers (denoted as "1:2"), and all risk-takers (denoted as "high"). For risk-taking agents the weights in (1) were set to w1 = 0.4, w2 = 0.4, w3 = 0.2; for agents with a neutral risk attitude the weights were w1 = 0.2, w2 = 0.4, w3 = 0.4.
Fig. 5. Percentages of traces, cheats, certificates, and guarantees versus confidence level (high vs. low) for populations with different risk-taking attitudes (neutral, 2:1, 1:2, high)
if B^x_{t-1} + ω^x_t ≥ I^x_{t-1} + χ^x_t then
    set S^x_t ← I^x_{t-1} + χ^x_t
    set I^x_t ← 0
    set B^x_t ← B^x_{t-1} + ω^x_t - (I^x_{t-1} + χ^x_t)
    set O^x_t ← B^x_t + m^x_{t+1}
else
    set S^x_t ← B^x_{t-1} + ω^x_t
    set I^x_t ← I^x_{t-1} + χ^x_t - (B^x_{t-1} + ω^x_t)
    set B^x_t ← 0
    if I^x_t ≥ m^x_{t+1} then O^x_t ← 0 else O^x_t ← m^x_{t+1} - I^x_t
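Read as code, the update step above amounts to the following minimal Python sketch, our own paraphrase of the reconstructed rule with hypothetical variable names (S the shipment sent, I the inventory, B the backorder, ω the incoming order, χ the incoming shipment, m the demand forecast):

    def update_firm(I_prev, B_prev, incoming_order, incoming_shipment, forecast):
        """One week of a firm's Beer Game bookkeeping (sketch)."""
        demand = B_prev + incoming_order          # backorders plus new orders
        stock = I_prev + incoming_shipment        # on-hand plus arriving goods
        if demand >= stock:
            shipped = stock                       # ship everything available
            inventory, backorder = 0, demand - stock
            order = backorder + forecast          # reorder backlog plus forecast
        else:
            shipped = demand                      # satisfy all demand
            inventory, backorder = stock - demand, 0
            order = max(0, forecast - inventory)  # order only what the forecast needs
        return shipped, inventory, backorder, order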
Our experimental conditions can be written as a five-tuple (D, F, IS, L, T). Each term is an independent variable in our experiment, where D = the length of delay, which is either D1 = 1 week or D3 = 3 weeks, F = the number of firms,
which is either F2 = 2 firms or F4 = 4 firms, IS = inventory strategy, which is either W = the weighted moving average or S = the standard moving average, L = the length of the moving average in weeks, which is either L3 = 3 weeks or L5 = 5 weeks, and T = the time series of the customer's order distribution, which is shown in Table 1.

Table 1. Time series of customer's order distributions

t    1   2   3   4   5   6  ...  48  49  50
T1  11  11  11  11  17  17  ...  17  17  17
T2  17  17  17  17  11  11  ...  11  11  11
T5  11  11  11  11  12  13  ...  55  56  57
T6  57  57  57  57  56  55  ...  13  12  11
TC  11  11  11  11  11  11  ...  11  11  11
These independent variables in our experiment are summarized in Fig. 3. Thus, our experiment is a 2 x 2 x 2 x 2 x 5 design. As in Moyaux et al. [5], the Beer Game is repeated for 50 weeks in our simulation.

D = The length of delay: D1 (one week), D3 (three weeks)
F = The number of firms: F2 (two), F4 (four)
IS = Inventory strategy: W (weighted moving average), S (standard moving average)
L = The length of weeks taken for determining the moving average: L3 (three weeks), L5 (five weeks)
T = Time series of the customer's order distribution: T1 (Pattern 1 in Moyaux et al. [5]), T2 (Pattern 2 in Moyaux et al. [5]), T5 (Pattern 5 in Moyaux et al. [5]), T6 (Pattern 6 in Moyaux et al. [5]), TC (constant, always 11)

Fig. 3. Independent variables in our experiment
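A quick way to verify that this design yields 80 conditions is to enumerate the Cartesian product of the factor levels, as in the following sketch (our own illustration):

    from itertools import product

    D = ["D1", "D3"]                    # length of delay
    F = ["F2", "F4"]                    # number of firms
    IS = ["W", "S"]                     # inventory strategy
    L = ["L3", "L5"]                    # moving-average length
    T = ["T1", "T2", "T5", "T6", "TC"]  # customer's order pattern

    conditions = list(product(D, F, IS, L, T))
    print(len(conditions))              # 2*2*2*2*5 = 80 experimental conditions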
4 Results

Of the 80 experimental conditions, there is no significant effect of the factors IS, L, and T. So, from now on, we use IS = W, L = L5, and T = TC as a typical case. Our results are summarized in Fig. 5. There are four panes in Fig. 5, and each pane corresponds to one combination of D and F, where I^x_t is firm x's inventory at week t.
Fig. 4. Time series of inventory, backorder, order, and moving average (panels for the 2nd distributor and the factory; horizontal axis: week)
Fig. 5. Time series of each firm's inventory level in the (W, L5, TC) case (one pane per combination of D and F: (D1,F2), (D1,F4), (D3,F2), (D3,F4); vertical axis: inventory; horizontal axis: week)

Note that only in the (D3, F4) case, after t > 32, the inventory level of the first distributor, I^{d1}_t, exceeds that of the factory, I^f_t. This confirms qualitatively the existence of a counterexample for the bullwhip effect. In all of our data, similar counterexamples are obtained only when the combination (D3, F4) is used. This is rather counterintuitive, because such counterexamples arise precisely in the environment where the number of firms is larger and the length of delay is longer. Although there could be many quantitative definitions of the bullwhip effect, a natural one is that the mean inventory level of an upstream firm is larger than that of a downstream firm. Another is that the variance of the inventory level of the upstream firm is larger than that of the downstream firm. To judge whether our results constitute a counterexample for the bullwhip effect in both senses quantitatively, descriptive statistics of the inventory level of each firm in the TC case are shown in Table 2. One can easily see in Table 2 that the mean and standard deviation (SD) of the inventory level of the first distributor are both larger than those of the factory. So our results also constitute a counterexample for the bullwhip effect quantitatively, in both senses. To judge whether such a counterexample may disappear in the long run, we also ran a simulation over 30,000 weeks. The result of this simulation is shown in Fig. 6. After the 5,000th week, the inventory level of the first distributor becomes lower than that of the factory, but it again periodically exceeds that of the factory after the 10,000th week, even though the customer's order distribution is constant in this case. So the counterexample for the bullwhip effect does not disappear even in the long run.
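Both quantitative definitions are easy to operationalize. The following sketch (our own illustration, with hypothetical variable names) checks them for a pair of inventory series:

    from statistics import mean, pvariance

    def bullwhip_holds(downstream_inventory, upstream_inventory):
        """True if the upstream firm's inventory dominates the downstream
        firm's in both the mean sense and the variance sense."""
        mean_sense = mean(upstream_inventory) > mean(downstream_inventory)
        variance_sense = pvariance(upstream_inventory) > pvariance(downstream_inventory)
        return mean_sense and variance_sense

    # A counterexample is any adjacent pair for which this returns False, e.g.
    # bullwhip_holds(first_distributor_series, factory_series) in the (D3, F4) runs.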
Table 2. Descriptive statistics of inventory level of each firm

          Retailer   2nd distributor   1st distributor   Factory
Mean        1389.3            8774.9           26082.4   22397.8
SD          1758.1           10545.9           30521.1   25574.6
Median          12                 0                11         0
Mode             0                 0                 0         0
Min.             0                 0                 0         0
Max.          3781             22118             63990     53016
Sum          69463            438743           1304121   1119889
Fig. 6. Inventory level in the long run in the (D3, F4, W, L5, TC) case (horizontal axis: week, 0 to 30,000)
5 Conclusions

In our experiment with a supply chain using the Beer Game, the number of firms in the supply chain (two or four firms) and the length of the delay between firms (one or three weeks) were compared in multiagent simulations. In our experiments, we found a counterexample for the bullwhip effect, such that the inventory level of an upstream firm was not always larger than that of a downstream firm. In addition, contrary to our intuition, such a counterexample was frequently observed under the conditions that (1) the number of firms in the supply chain was larger (four firms), and (2) the length of delay was longer (three weeks). One may criticize our result because we employed a very simple inventory strategy in our simulation. It is, of course, worthwhile to run another series of experiments with more sophisticated strategies. One may also find it unrealistic that there is no capacity limit on the inventory level of each firm in our experiment. Incorporating such restrictions in our environment is also worth considering in future research.
References

1. Kawagoe T, Wada S (2005). "An Experimental Evaluation of the Bullwhip Effect in a Supply Chain". The Third Annual Conference of the European Social Simulation Association (ESSA) 2005, Accepted.
2. Kawagoe T, Wada S (2005). "How to Define the Bullwhip Effect". North American Association for Computational Social and Organizational Science (NAACSOS) 2005, Accepted.
3. Lee HL, Padmanabhan V, Whang S (1997). "The Bullwhip Effect in Supply Chains". Sloan Management Review, 38, 93-102.
4. Moyaux T, Chaib-draa B, D'Amours S (2004). "Experimental Study of Incentives for Collaboration in the Quebec Wood Supply Game". Management Science, submitted.
5. Moyaux T, Chaib-draa B, D'Amours S (2004). "Multi-Agent Simulation of Collaborative Strategy in a Supply Chain". Autonomous Agents & Multi-Agent Systems (AAMAS) 2004.
6. Cachon GP, Netessine S (2003). "Game Theory in Supply Chain Analysis". In D. Simchi-Levi, S. D. Wu, and Z.-J. M. Shen, eds. Supply Chain Analysis in the eBusiness Era. Kluwer.
Bottom-Up Approaches
Collective Efficiency in Two-Sided Matching

Tomoko Fuku^1, Akira Namatame^1, and Taisei Kaizouji^2

^1 {g43036,nama}@nda.ac.jp, Department of Computer Science, National Defense Academy, Yokosuka, Kanagawa 239-8686, Japan
^2 kaizoji@icu.ac.jp, Division of Social Sciences, International Christian University, 3-10-2 Osawa, Mitaka, Tokyo 181-8585, Japan
Summary. Gale and Shapley originally proposed the two-sided matching algorithm known as the Deferred Acceptance Algorithm (DA). It is a very elegant method, but it has a drawback: if men propose, it produces the stable matching that is best for men and worst for women, and vice versa. In this paper, we propose a new algorithm with compromise that produces balanced matchings which are almost optimal for both sides. It is an important issue how far agents seek their own interests in a competitive environment. There is overwhelming evidence that people are also motivated by concerns for fairness and reciprocity. We will show that compromise, which is individually irrational, improves the welfare of the whole group. The reasonable compromise level is obtained as a function of the size of the group so that the social utility is maximized. We also obtain large-scale properties of the proposed algorithm.
1 Introduction
Some researchers have started to take a direct role in issues of market design; for example, labor markets, as venues for bilateral trading, require proper matching. Markets evolve, but they are also designed, and the design of markets is extremely complex. The complexity of market design comes from many factors, especially the strategic behaviors of participants. A market is two-sided if there are two sets of agents, and if an agent from one side of the market can be matched only with an agent from the other side. One of the main functions of many markets is to match one kind of agent with another: e.g. students and colleges, workers and firms, marriageable men and women. A two-sided matching model was introduced by (Gale and Shapley 1962), who invented the deferred acceptance algorithm and focused on college admissions and marriage. They proposed that a matching (of students and colleges, or men and women) could be regarded as stable only if it left
no pair of agents on opposite sides of the market who were not matched to each other but would both prefer to be. A natural application of two-sided matching models is to labor markets. (Shapley and Shubik 1972) showed that the properties of stable matching are robust to generalizations of the model which allow both matching and wage determination to be considered together. (Kelso and Crawford 1982) showed how far these results can be generalized when firms, for example, may have complex preferences over the composition of their workforce. Two-sided matching models have proved useful in the empirical study of labor markets, starting with the demonstration in (Roth 1984). Subsequent work has identified natural experiments which show that labor markets organized so as to produce unstable matchings suffer from certain kinds of difficulties which are largely avoided in comparable markets organized to produce stable matchings. This work combines the traditions of cooperative and noncooperative game theory, by considering how the strategic environment faced by market participants influences the stability of the resulting market outcome. Much of two-sided matching theory is concerned with determining the conditions under which stable matchings exist, and with what algorithms these matchings can be achieved. A two-sided matching can be regarded as stable if it leaves no pair of agents on opposite sides of the market who are not matched to each other but would both prefer to be. The relationship between the concept of Pareto optimality and the stability of a matching has also been investigated. Pareto optimality requires that no change exists that betters every individual in the population. The concept of a stable matching is stronger than that of a Pareto optimal matching, in that every stable matching is Pareto optimal, but not every Pareto optimal matching is stable. Pareto optimality requires that no two individuals wish to elope together and would receive the consent of their partners. Stable matching, by contrast, requires that no two individuals wish to elope together, whether or not their partners would consent. For instance, consider two groups of marriageable men and women. A stable matching μ is defined as M-optimal if no male prefers any other stable matching; F-optimality is defined analogously. However, the M-optimal matching is not only the best stable matching for the males, it is also always the worst stable matching for the females. In fact, male and female preferences conflict in this way over any pair of stable matchings, not just the M- and F-optimal ones. In this paper, we propose a new algorithm for two-sided matching with some compromise. We discuss the self-interested hypothesis vs. the human sociality hypothesis. It is an important issue how far agents seek their own interest in a competitive environment. There is overwhelming evidence that people are also motivated by concerns for fairness and reciprocity. We show that compromise, an individually irrational behavior, improves the welfare of others. We also obtain large-scale properties of some two-sided matching algorithms. We show that some compromise by individuals increases global welfare. The optimal compromise level is designed so that the social utility is maximized.
2 A Formulation of Two-Sided Matching Problem
There are two disjoint sets of agents: a group of men {m1, ..., mn} and a group of women {f1, ..., fn}. Associated with each side is the number of positions they have to offer. Agents on each side of the market have (transitive and strict) preferences over agents on the other side. Gale and Shapley presented a simple model in which college applicants have ordinal preferences over schools, and colleges have ordinal preferences over applicants. How, given these preferences, could college applicants be matched to schools so as to leave both the students and the colleges as satisfied as possible? The authors derived a clever algorithm (described below) designed to create efficient pairings. Their algorithm matches students and schools in such a way that no student wishes to leave her current school for an alternative institution that would be willing to admit her. Subsequent authors expanded upon Gale and Shapley's work, extending their theoretical framework while applying two-sided matching theory to problems ranging from labor markets to human courtship and marriage. Consider a population, each member of which falls into one of two sets: the set of all males M = {m1, m2, m3, ..., mn} and the set of all females F = {f1, f2, f3, ..., fn}. Let each individual mi or fj have a list of strict pairing preferences P over the individuals in the other set. For example, a female fj might have preferences P(fj) = {m1, m4, fj, m2, m3}, meaning that male m1 would be her best choice, m4 would be her second choice, and she would rather remain 'single' (represented by pairing with herself, fj) than form a pair with either m2 or m3. A matching is simply a list of all the pairings in the population (where having oneself for a mate means that one remains single). We indicate the mate of an individual x under matching μ by μ(x) for short. Now we are ready to consider the notion of the stability of a matching (Knuth 1962). An individual is said to block the matching μ if he or she prefers remaining single to taking the mate assigned by μ. A pair m and f are said to block the matching μ if they are not matched by μ, but prefer one another to their mates as assigned by matching μ. Put another way, given matching μ, a blocking pair is a pair that would willingly abandon their mates as determined by μ and elope instead with one another. Finally, the matching μ is defined as stable if it is not blocked by any individual or pair of agents.
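As an illustration of these definitions, the following sketch (our own, not from the paper) checks whether a matching is blocked by any individual or pair; preferences are ranked lists in which a person's own name marks the "rather stay single" cutoff, and the matching maps each person to a mate (themselves if single):

    def prefers(pref, a, b):
        """True if a appears before b in the preference list pref.
        Anyone not listed is treated as least preferred."""
        rank = {p: i for i, p in enumerate(pref)}
        n = len(pref)
        return rank.get(a, n) < rank.get(b, n)

    def is_stable(matching, prefs, males, females):
        # An individual blocks if he or she prefers being single (self)
        # to the assigned mate.
        for person in males + females:
            if prefers(prefs[person], person, matching[person]):
                return False
        # A pair (m, f) blocks if both prefer each other to their mates.
        for m in males:
            for f in females:
                if (matching[m] != f
                        and prefers(prefs[m], f, matching[m])
                        and prefers(prefs[f], m, matching[f])):
                    return False
        return True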
3 Deferred Acceptance Algorithm and Its Properties
What happens when preferences are not uniform? One of the most remarkable results from two-sided matching theory is that, even under non-uniform preferences, a stable matching (or set of stable matchings) exists in every monogamous matching system. To prove this, it is sufficient to describe an algorithm by which a stable matching can be constructed for any such system. We need not suppose that
pairing actually occurs by this algorithm in the system we are considering. Rather, the algorithm simply serves as a tool in the proof that a stable matching exists. Below, we outline the deferred acceptance algorithm (DA). The matching procedure proceeds repeatedly through the following steps. Each male not currently engaged displays to his favorite female among those who have not already rejected him; if no acceptable females remain, he remains unmated. Each female who has received one or more courtship displays in this round rejects all but her highest-ranked acceptable male; this may involve rejecting a previously engaged male. After a large number of rounds, no new displays will be made. At this point, the algorithm terminates. All females are paired with the male to whom they are currently engaged; individuals not engaged remain unmated. The matching μ generated in this way is easily seen to be stable. No male wishes to leave his mate at μ for a female who prefers him to her mate at μ, because each male reached his current mate by sequentially courting females in order of preference. No female wishes to leave her mate at μ for a male who prefers her to his mate at μ, because she will have already received a courtship display from any male who is not matched to a female that he prefers to her. Reversing the algorithm, so that the females display and the males accept or reject courtships, will also lead to a stable matching; this matching may differ from the one found by the male-courtship form of the algorithm. However, the set of individuals remaining unmated is the same in every stable matching of any given monogamous mating system. As mentioned above, the Deferred Acceptance Algorithm produces either (1) the man-optimal stable matching or (2) the woman-optimal stable matching. That is, (1) the matching h_M produced by the deferred acceptance algorithm with men proposing is the M-optimal stable matching, and (2) the W-optimal stable matching is the matching h_W produced when the women propose (Gale and Shapley). It has also emerged that the best outcome for one side of the market is the worst for the other: the M-optimal stable matching is the worst for women, and the W-optimal stable matching is the worst for men (Knuth 1972). It may be helpful to look at this problem with a concrete example. Consider a group of women (Ann, Betty and Carol) and a group of men (Dave, Eddy and Frank). Their preferences are given in Table 1 (3 represents the highest preference and 1 the lowest). In this matching system, there are two stable matchings. One (call it μ1) pairs Betty with Dave, Ann with Eddy, and Carol with Frank. The other, μ2, pairs Betty with Dave, Ann with Frank, and Carol with Eddy. Any other matching will allow at least one blocking individual or pair. In fact, since Betty and Dave are one another's best choices, any matching μ which does not pair them together will be blocked by this pair.
Table 1. Preference relations of the three women (Ann, Betty, Carol) and the three men (Dave, Eddy, Frank); 3 represents the highest preference and 1 the lowest. Betty and Dave rank each other highest.

Table 2. Preference relations with Betty and Dave removed from Table 1 (rows: Ann, Carol; columns: Eddy, Frank)
If we are interested in finding all stable matchings, we can remove Betty and Dave from the preference lists, yielding the reduced preference lists given in Table 2. When all preferences are uniform, that is, when all males have the same preferences over females and vice versa, it is easy to see that a unique stable matching exists. To see this for monogamous matching systems, label the members of each sex by the preferences of the other sex (so that the best-ranking male m1 is the best choice of the females, m2 is the second choice, etc.). Under this system, there is only one possible stable matching.
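The male-proposing form of the deferred acceptance algorithm is compact enough to express as code. The sketch below is our own illustration (assuming complete preference lists and equally sized sides, and ignoring the option of staying single); it returns the M-optimal stable matching.

    def deferred_acceptance(men_prefs, women_prefs):
        """Men propose; women tentatively hold their best suitor so far.
        Preferences are dicts mapping a person to an ordered list, best first."""
        free_men = list(men_prefs)               # men not currently engaged
        next_choice = {m: 0 for m in men_prefs}  # index of next woman to court
        engaged_to = {}                          # woman -> man she currently holds

        while free_men:
            m = free_men.pop()
            w = men_prefs[m][next_choice[m]]     # favorite not yet tried
            next_choice[m] += 1
            if w not in engaged_to:
                engaged_to[w] = m                # w holds her first suitor
            else:
                rival = engaged_to[w]
                ranking = women_prefs[w]
                if ranking.index(m) < ranking.index(rival):
                    engaged_to[w] = m            # w trades up, rival is rejected
                    free_men.append(rival)
                else:
                    free_men.append(m)           # w keeps rival, m keeps courting
        return {m: w for w, m in engaged_to.items()}

Swapping the roles of the two sides (women propose, men hold) yields the women-proposing version and hence the W-optimal stable matching.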
4 Proposed New Algorithms
Before we state our algorithms, we define utility measures. In this section, we propose two new algorithms based on the hypothesis of bounded rationality. We propose a new algorithm for two-sided matching with some compromise at the individual level, and we obtain large-scale properties of the proposed algorithm. It is an important issue how far agents seek their own interests in a competitive environment. There is overwhelming evidence that people are also motivated by concerns for fairness and reciprocity. We will show that compromise, which is individually irrational, improves the welfare of the whole group. The
reasonable compromise level is obtained as a function of the size of the group so that the social utility is maximized.
There are two disjoint sets of agents: a group of men {m1, ..., mn} and a group of women {f1, ..., fn}. Associated with each side is the number of positions they have to offer, and agents on each side of the market have (transitive and strict) preferences over agents on the other side. We use the following notation:

N : the size of each group
T : the preference level of the partner to be matched (1 <= T <= N)

... a constant capturing retooling and adjustment costs to be sustained each time period the production process starts; finally, c > 0 parameterizes bankruptcy costs. The optimal capital stock, obtained from (1), turns out to be a (non-linear) decreasing function of the interest rate and an increasing (linear) function of the net worth: K*_it = (2 - g r_it)/(2c).

3. A graph (network) G is a set of links between the agents, formally G ⊆ N x N.

^ The number of equilibria seems to grow exponentially with N. N = 6 is the minimum case with a wide multiplicity of equilibria, however limited to a tractable number (~ 20).
A link is then a (directed) couple of elements from N: g_{a,b} = (a,b) ∈ G; a link may also be indicated by the Greek letters η, ζ, θ, ... 𝒢 will denote the set of all possible G on N. We call graph architecture the equivalence class in 𝒢 that can be obtained with permutations of the elements of N. Most of the functions and properties in the paper will be invariant under permutations of the elements, so we will consider graphs with this equivalence class in mind. Subgraph of G will be a synonym of subset; we will also write G\A = {(a,b) : (a,b) ∈ G, a ∉ A, b ∉ A} when A ⊆ N, and by another abuse of notation G\a = G\{a}. Clearly G\A ⊆ G for all A ⊆ N. A graph is undirected if g_{a,b} ∈ G ⟹ g_{b,a} ∈ G, and irreflexive if g_{a,a} ∉ G for all a ∈ N. From now on we will consider only undirected and irreflexive graphs, calling l(a) the number of links involving a and L(G) the total number of links in G (so that Σ_{a∈N} l(a) = 2·L(G)). Every G on N defines a topology on it. A path H_{a,b} in G between a and b is an ordered set of agents (a, a_2, ..., a_n, b), n ∈ ℕ, such that {g_{a,a_2}, g_{a_2,a_3}, ..., g_{a_n,b}} ⊆ G. λ_G(H_{a,b}) = |H_{a,b}| - 1 is the length of the path. H_{a,a} is a cycle (in irreflexive graphs λ_G(H_{a,a}) > 1). The distance between a and b in G is d_G(a,b) = min{λ_G(H_{a,b})} if defined; otherwise d_G(a,b) = ∞ if no path H_{a,b} exists. The diameter of a graph is D_G = max{d_G(a,b) : a, b ∈ N}. If d_G(a,b) < ∞ (i.e. there is a path between a and b) we say that a and b are connected (we will write a ⋈_G b). The definition of cluster is consequential: Γ_G(a) = {(a,b) : a ⋈_G b} ⊆ G. Here the cluster is a subgraph; we will extend the meaning of Γ so as to include also the implied subset of N, whose elements are all and only the ones appearing in the couples of Γ. We will write Γ ⊑ G to mean that Γ is a cluster in G. G is connected if D_G < ∞, which implies l(a) > 0 for every a ∈ N. Each agent sustains a cost k > 0 for any link she has; considering the global economy, every link will then cost 2·k to the aggregate N agents. The value of k may change for the purpose of comparative statics, but never within a single game. Since the presence of links will be the only source of loss for the agents, we will consider gross and net profits, depending on whether the cost of connections is considered or not. A value function is a relation V : 𝒢 → ℝ: a global profit is associated to every possible network. If A ⊆ N we will write V_G(A) to indicate V({(a,b) ∈ A x A : (a,b) ∈ G}). A value function is anonymous if it is invariant under permutations of N; it is strictly
super-additive if for all G ∈ 𝒢 and all J, K ⊆ G nonempty and disjoint: V(J) + V(K) < V(J ∪ K).
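To make the topological definitions concrete, here is a small self-contained sketch (our own illustration, not the paper's code) that computes d_G and the diameter D_G for an undirected, irreflexive graph stored as a set of couples:

    from collections import deque

    def neighbours(G, a):
        # links involving a; in an undirected graph both (a,b) and (b,a) are stored
        return {b for (x, b) in G if x == a}

    def distance(G, a, b):
        """d_G(a,b): length of the shortest path, or infinity if none exists."""
        seen, frontier = {a}, deque([(a, 0)])
        while frontier:
            node, dist = frontier.popleft()
            if node == b:
                return dist
            for nxt in neighbours(G, node):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return float("inf")

    def diameter(G, N):
        """D_G = max distance over all pairs; finite iff G is connected."""
        return max(distance(G, a, b) for a in N for b in N)

    # Example: a line over three agents; every link appears in both directions,
    # so L(G) is half the number of stored couples.
    N = {1, 2, 3}
    G = {(1, 2), (2, 1), (2, 3), (3, 2)}
    assert distance(G, 1, 3) == 2 and diameter(G, N) == 2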