Springer Series in Synergetics
Eric Bertin
Statistical Physics of Complex Systems A Concise Introduction 3rd Edition
Springer Series in Synergetics

Series Editors
Henry D. I. Abarbanel, Institute for Nonlinear Science, University of California, San Diego, CA, USA
Dan Braha, New England Complex Systems Institute, Cambridge, MA, USA
Péter Érdi, Center for Complex Systems Studies, Kalamazoo College, USA; Hungarian Academy of Sciences, Budapest, Hungary
Karl J. Friston, Institute of Cognitive Neuroscience, University College London, London, UK
Hermann Haken, Center of Synergetics, University of Stuttgart, Stuttgart, Germany
Viktor Jirsa, Centre National de la Recherche Scientifique (CNRS), Université de la Méditerranée, Marseille, France
Janusz Kacprzyk, Systems Research, Polish Academy of Sciences, Warsaw, Poland
Kunihiko Kaneko, Research Center for Complex Systems Biology, The University of Tokyo, Tokyo, Japan
Scott Kelso, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, USA
Markus Kirkilionis, Mathematics Institute and Centre for Complex Systems, University of Warwick, Coventry, UK
Jürgen Kurths, Nonlinear Dynamics Group, University of Potsdam, Potsdam, Germany
Ronaldo Menezes, Computer Science Department, University of Exeter, Exeter, UK
Andrzej Nowak, Department of Psychology, Warsaw University, Warsaw, Poland
Hassan Qudrat-Ullah, Decision Sciences, York University, Toronto, ON, Canada
Linda Reichl, Center for Complex Quantum Systems, University of Texas, Austin, TX, USA
Frank Schweitzer, System Design, ETH Zurich, Zurich, Switzerland
Didier Sornette, Entrepreneurial Risk, ETH Zurich, Zurich, Switzerland
Stefan Thurner, Section for Science of Complex Systems, Medical University of Vienna, Vienna, Austria

Editor-in-Chief
Peter Schuster, Theoretical Chemistry and Structural Biology, University of Vienna, Vienna, Austria
Springer Series in Synergetics

Founding Editor: H. Haken

The Springer Series in Synergetics was founded by Hermann Haken in 1977. Since then, the series has evolved into a substantial reference library for the quantitative, theoretical and methodological foundations of the science of complex systems. Through many enduring classic texts, such as Haken's Synergetics and Information and Self-Organization, Gardiner's Handbook of Stochastic Methods, Risken's The Fokker–Planck Equation or Haake's Quantum Signatures of Chaos, the series has made, and continues to make, important contributions to shaping the foundations of the field. The series publishes monographs and graduate-level textbooks of broad and general interest, with a pronounced emphasis on the physico-mathematical approach.
More information about this series at http://www.springer.com/series/712
Eric Bertin
Statistical Physics of Complex Systems A Concise Introduction Third Edition
Eric Bertin LIPhy, CNRS and Université Grenoble Alpes Grenoble, France
ISSN 0172-7389 ISSN 2198-333X (electronic) Springer Series in Synergetics ISBN 978-3-030-79948-9 ISBN 978-3-030-79949-6 (eBook) https://doi.org/10.1007/978-3-030-79949-6 1st edition: © The Author(s) 2012 2nd edition: © Springer International Publishing Switzerland 2016 3rd edition: © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Célia, Aurélien and Noémie
Preface to the Third Edition
After a very concise first edition in booklet format, and an already expanded second edition, it seemed relevant to further expand this book into a third edition providing the reader with a broader coverage of applications of the statistical physics formalism to different types of complex systems. In this spirit, Chaps. 3 and 4 of the second edition have been expanded and reorganized into four new chapters. In this third edition, Chap. 3 explicitly focuses on physical systems driven out of equilibrium and presents different approaches useful to describe such systems, either through exact solutions of the corresponding stochastic models or using systematic approximation schemes. Chapter 4 addresses the application of the statistical physics formalism to the modeling of social systems, and a number of simple models are presented, together with their analytic solutions. Particular emphasis is put on the notion of phase transition in this context, to unify different phenomena like urban segregation or traffic congestion. Chapter 5 deals with population dynamics and evolution in biology, keeping the discussion at an elementary level. Some key notions like fixation probability and fitness landscape are introduced, using a simple model of dynamics with selection and mutation. Spatial clustering is also discussed. Then Chap. 6 discusses complex networks, starting from an elementary presentation of different classes of static random networks, and further addressing dynamical phenomena taking place on complex networks, like epidemic or rumor propagation, as well as dynamical models of neural networks. Although some of these topics were already present in the second edition, the current third edition offers a clear-cut and much broader coverage of them, through dedicated chapters. In addition to these new chapters, Chap. 2 has also been slightly expanded with respect to the second edition, with a new section on random walk return times and a brief discussion of stochastic calculus. Chapters 5 and 6 of the second edition are now relabeled as Chaps. 7 and 8, with a few updates in Chap. 7. Last but not least, some exercises have also been included in each chapter of this new edition, with the corresponding solutions gathered at the end of the book. While the book has now expanded to more than 250 pages, I have tried to make it as accessible as possible to a broad readership interested in understanding the underlying mathematical formalism of the statistical physics of complex systems. I have also kept in mind the idea of providing the reader with an overview of topics
of interest both in the core of statistical physics and in more interdisciplinary applications, providing a useful introduction to more advanced and specialized sources. The reader interested in going deeper into specific topics is referred to a number of further references, listed for convenience at the end of each chapter.

Grenoble, France
April 2021
Eric Bertin
Preface to the Second Edition
The first edition of this book was written on purpose in a very concise, booklet format, to make it easily accessible to a broad interdisciplinary readership of science students and research scientists with an interest in the theoretical modeling of complex systems. Readers were assumed to typically have some Bachelor-level background in mathematical methods, but no a priori knowledge of statistical physics. A few years after this first edition, it seemed relevant to significantly expand it to a full—though still relatively concise—book format in order to include a number of important topics that were not covered in the first edition, thereby raising the number of chapters from three to six. These new topics include non-conserved particles, evolutionary population dynamics, networks (Chapter 4), properties of both individual and coupled simple dynamical systems (Chapter 5), as well as probabilistic issues like convergence theorems for the sum and the extreme values of a large set of random variables (Chapter 6). A few short appendices have also been included, notably to give some technical hints on how to perform simple stochastic simulations in practice. In addition to these new chapters, the first three chapters have also been significantly updated. In Chap. 1, the discussions of phase transitions and of disordered systems have been slightly expanded. The most important changes in these previously existing chapters concern Chap. 2. The Langevin and Fokker–Planck equations are now presented in separate subsections, including brief discussions of the case of multiplicative noise, the case of more than one degree of freedom, and the Kramers–Moyal expansion. The discussion of anomalous diffusion now focuses on heuristic arguments, while the presentation of the Generalized Central Limit Theorem has been postponed to Chap. 6. Chapter 2 then ends with a discussion of several aspects of the relaxation to equilibrium. Finally, Chap. 3 has also undergone some changes, since the presentation of the Kuramoto model has been deferred to Chap. 5, in the context of deterministic systems. The remaining material of Chap. 3 has then been expanded, with discussions of the Schelling model with two types of agents, of the dissipative zero-range process, and of assemblies of active particles with nematic symmetries. Although the size of this second edition is more than twice that of the first one, I have tried to keep the original spirit of the book, so that it could remain accessible
to a broad, non-specialized readership. The presentations of all topics are limited to concise introductions, and are kept to a relatively elementary level—not avoiding mathematics, though. The reader interested in learning more on a specific topic is then invited to look at other sources, like specialized monographs or review articles.

Grenoble, France
May 2016
Eric Bertin
Preface to the First Edition
In recent years, statistical physics started to attract the interest of a broad community of researchers in the field of complex systems science, ranging from biology to social sciences, economics, or computer science. More generally, a growing number of graduate students and researchers feel the need to learn some basic concepts and questions coming from other disciplines, leading, for instance, to the organization of recurrent interdisciplinary summer schools. The present booklet is partly based on the introductory lecture on statistical physics given at the French Summer School on Complex Systems held in both Lyon and Paris during the summers of 2008 and 2009, and jointly organized by two French Complex Systems Institutes, the "Institut des Systèmes Complexes Paris Ile de France" (ISCPIF) and the "Institut Rhône-Alpin des Systèmes Complexes" (IXXI). This introductory lecture was aimed at providing the participants with a basic knowledge of the concepts and methods of statistical physics, so that they could later on follow more advanced lectures on diverse topics in the field of complex systems. The lecture has been further extended in the framework of the second year of the Master in "Complex Systems Modelling" of the Ecole Normale Supérieure de Lyon and Université Lyon 1, whose courses take place at IXXI. It is a pleasure to thank Guillaume Beslon, Tommaso Roscilde, and Sébastian Grauwin, who were also involved in some of the lectures mentioned above, as well as Pablo Jensen for his efforts in setting up an interdisciplinary Master course on complex systems, and for the fruitful collaboration we have had over the last years.

Lyon, France
June 2011
Eric Bertin
Contents
1 Equilibrium Statistical Physics
  1.1 Microscopic Dynamics of a Physical System
    1.1.1 Conservative Dynamics
    1.1.2 Properties of the Hamiltonian Formulation
    1.1.3 Many-Particle System
    1.1.4 Case of Discrete Variables: Spin Models
  1.2 Statistical Description of an Isolated System at Equilibrium
    1.2.1 Notion of Statistical Description: A Toy Model
    1.2.2 Fundamental Postulate of Equilibrium Statistical Physics
    1.2.3 Computation of Ω(E) and S(E): Some Simple Examples
    1.2.4 Distribution of Energy Over Subsystems and Statistical Temperature
  1.3 Equilibrium System in Contact with Its Environment
    1.3.1 Exchanges of Energy
    1.3.2 Canonical Entropy
    1.3.3 Exchanges of Particles with a Reservoir: The Grand-Canonical Ensemble
  1.4 Phase Transitions and Ising Model
    1.4.1 Ising Model in Fully Connected Geometry
    1.4.2 Ising Model with Finite Connectivity
    1.4.3 Renormalization Group Approach
  1.5 Disordered Systems and Glass Transition
    1.5.1 Theoretical Spin-Glass Models
    1.5.2 A Toy Model for Spin Glasses: The Mattis Model
    1.5.3 The Random Energy Model
  1.6 Exercises
  References

2 Non-stationary Dynamics and Stochastic Formalism
  2.1 Markovian Stochastic Processes and Master Equation
    2.1.1 Definition of Markovian Stochastic Processes
    2.1.2 Master Equation and Detailed Balance
    2.1.3 A Simple Example: The One-Dimensional Random Walk
  2.2 Langevin Equation
    2.2.1 Phenomenological Approach
    2.2.2 Basic Properties of the Linear Langevin Equation
    2.2.3 More General Forms of the Langevin Equation
    2.2.4 Relation to Random Walks
  2.3 Fokker–Planck Equation
    2.3.1 Continuous Limit of a Discrete Master Equation
    2.3.2 Kramers–Moyal Expansion
    2.3.3 More General Forms of the Fokker–Planck Equation
    2.3.4 Stochastic Calculus
  2.4 Anomalous Diffusion: Scaling Arguments
    2.4.1 Importance of the Largest Events
    2.4.2 Superdiffusive Random Walks
    2.4.3 Subdiffusive Random Walks
  2.5 First Return Times, Intermittency, and Avalanches
    2.5.1 Statistics of First Return Times to the Origin of a Random Walk
    2.5.2 Application to Stochastic On–Off Intermittency
    2.5.3 A Simple Model of Avalanche Dynamics
  2.6 Fast and Slow Relaxation to Equilibrium
    2.6.1 Relaxation to Canonical Equilibrium
    2.6.2 Dynamical Increase of the Entropy
    2.6.3 Slow Relaxation and Physical Aging
  2.7 Exercises
  References

3 Models of Particles Driven Out of Equilibrium
  3.1 Driven Steady States of a Particle with Langevin Dynamics
    3.1.1 Non-zero Flux Solution of the Fokker–Planck Equation
    3.1.2 Ratchet Effect in a Time-Dependent Asymmetric Potential
    3.1.3 Active Brownian Particle in a Potential
  3.2 Dynamics with Creation and Annihilation of Particles
    3.2.1 Birth–Death Processes and Queueing Theory
    3.2.2 Reaction–Diffusion Processes and Absorbing Phase Transitions
    3.2.3 Fluctuations in a Fully Connected Model with an Absorbing Phase Transition
  3.3 Solvable Models of Interacting Driven Particles on a Lattice
    3.3.1 Zero-Range Process and Condensation Phenomenon
    3.3.2 Dissipative Zero-Range Process and Energy Cascade
    3.3.3 Asymmetric Simple Exclusion Process
  3.4 Approximate Description of Driven Frictional Systems
    3.4.1 Edwards Postulate for the Statistics of Configurations
    3.4.2 A Shaken Spring-Block Model
    3.4.3 Long-Range Correlations for Strong Shaking
  3.5 Collective Motion of Active Particles
    3.5.1 Derivation of Continuous Equations
    3.5.2 Phase Diagram and Instabilities
    3.5.3 Varying the Symmetries of Particles
  3.6 Exercises
  References

4 Models of Social Agents
  4.1 Dynamics of Residential Moves
    4.1.1 A Simplified Version of the Schelling Model
    4.1.2 Condition for Phase Separation
    4.1.3 The "True" Schelling Model: Two Types of Agents
  4.2 Traffic Congestion on a Single Lane Highway
    4.2.1 Agent-Based Model and Statistical Description
    4.2.2 Congestion as an Instability of the Homogeneous Flow
  4.3 Symmetry-Breaking Transition in a Decision Model
    4.3.1 Choosing Between Stores Selling Fresh Products
    4.3.2 Mean-Field Description of the Model
    4.3.3 Symmetry-Breaking Phase Transition
  4.4 A Dynamical Model of Wealth Repartition
    4.4.1 Stochastic Coupled Dynamics of Individual Wealths
    4.4.2 Stationary Distribution of Relative Wealth
    4.4.3 Effect of Taxes
  4.5 Emerging Properties at the Agent Scale Due to Interactions
    4.5.1 A Simple Model of Complex Agents
    4.5.2 Collective Order for Interacting Standardized Agents
  4.6 Exercises
  References

5 Stochastic Population Dynamics and Biological Evolution
  5.1 Motivation and Goal of a Statistical Description of Evolution
  5.2 Selection Dynamics Without Mutations
    5.2.1 Moran Model and Fisher's Theorem
    5.2.2 Fixation Probability
    5.2.3 Fitness Versus Population Size: How Do Cooperators Survive?
  5.3 Effect of Mutations on Population Dynamics
    5.3.1 Quasi-static Evolution Under Mutations
    5.3.2 Notion of Fitness Landscape
    5.3.3 Selection and Mutations on Comparable Time Scales
    5.3.4 Biodiversity Under Neutral Mutations
  5.4 Real Space Neutral Dynamics and Spatial Clustering
    5.4.1 Local Population Fluctuations in the Absence of Diffusion
    5.4.2 Can Diffusion Smooth Out Local Population Fluctuations?
  5.5 Exercises
  References

6 Complex Networks
  6.1 Basic Types of Complex Networks
    6.1.1 Random Networks
    6.1.2 Small-World Networks
    6.1.3 Preferential Attachment
  6.2 Dynamics on Complex Networks
    6.2.1 Basic Description of Epidemic Spreading: The SIR Model
    6.2.2 Epidemic Spreading on Heterogeneous Networks
    6.2.3 Rumor Propagation on Social Networks
  6.3 Formal Neural Networks
    6.3.1 Modeling a Network of Interacting Neurons
    6.3.2 Asymmetric Diluted Hopfield Model
    6.3.3 Perceptron and Constraint Satisfaction Problem
  6.4 Exercises
  References

7 Statistical Description of Dissipative Dynamical Systems
  7.1 Basic Notions on Dissipative Dynamical Systems
    7.1.1 Fixed Points and Simple Attractors
    7.1.2 Bifurcations
    7.1.3 Chaotic Dynamics
  7.2 Deterministic Versus Stochastic Dynamics
    7.2.1 Qualitative Differences and Similarities
    7.2.2 Stochastic Coarse-Grained Description of a Chaotic Map
    7.2.3 Statistical Description of Chaotic Systems
  7.3 Globally Coupled Oscillators and Synchronization Transition
    7.3.1 The Kuramoto Model of Coupled Oscillators
    7.3.2 Synchronized Steady State
    7.3.3 Coupled Non-linear Oscillators and "Oscillator Death" Phenomenon
  7.4 A General Approach for Globally Coupled Dynamical Systems
    7.4.1 Coupling Low-Dimensional Dynamical Systems
    7.4.2 Description in Terms of Global Order Parameters
    7.4.3 Stability of the Fixed Point of the Global System
  7.5 Exercises
  References

8 A Probabilistic Viewpoint on Fluctuations and Rare Events
  8.1 Global Fluctuations as a Random Sum Problem
    8.1.1 Law of Large Numbers and Central Limit Theorem
    8.1.2 Generalization to Variables with Infinite Variances
    8.1.3 Case of Non-identically Distributed Variables
    8.1.4 Case of Correlated Variables
    8.1.5 Coarse-Graining Procedures and Law of Large Numbers
  8.2 Rare and Extreme Events
    8.2.1 Different Types of Rare Events
    8.2.2 Extreme Value Statistics
    8.2.3 Statistics of Records
  8.3 Large Deviation Functions
    8.3.1 A Simple Example: The Ising Model in a Magnetic Field
    8.3.2 Explicit Computations of Large Deviation Functions
    8.3.3 A Natural Framework to Formulate Statistical Physics
  8.4 Exercises
  References

Appendix A: Dirac Distributions
Appendix B: Numerical Simulations of Markovian Stochastic Processes
Appendix C: Drawing Random Variables with Prescribed Distributions
Solutions of the Exercises
Chapter 1
Equilibrium Statistical Physics
Systems composed of many particles involve a very large number of degrees of freedom, and it is most often uninteresting—not to say hopeless—to try to describe in a detailed way the microscopic state of the system. The aim of statistical physics is rather to restrict the description of the system to a few relevant macroscopic observables, and to predict the average values of these observables, or the relations between them. A standard formalism, called “equilibrium statistical physics”, has been developed for systems of physical particles having reached a statistical steady state in the absence of external drive (like heat flux or shearing forces for instance). In this first chapter, we shall discuss some of the fundamentals of equilibrium statistical physics. Section 1.1 describes the elementary mechanical notions necessary to describe a system of physical particles. Section 1.2 introduces the basic statistical notions and fundamental postulates required to describe in a statistical way a system that exchanges no energy with its environment. The effect of the environment is then taken into account in Sect. 1.3, in the case where the environment does not generate any sustained energy flux in the system. Applications of this general formalism to the description of collective phenomena and phase transitions are presented in Sect. 1.4. Finally, the influence of disorder and heterogeneities, which are relevant in physical systems, but are also expected to play an essential role in many other types of complex systems, is briefly discussed in Sect. 1.5. For further reading on these topics related to equilibrium statistical physics (especially for Sects. 1.2 to 1.4), we refer the reader to standard textbooks, like, e.g., Refs. [1, 2, 6, 9].
1.1 Microscopic Dynamics of a Physical System

1.1.1 Conservative Dynamics

In the framework of statistical physics, an important type of dynamics is the so-called conservative dynamics in which energy is conserved, meaning that friction forces are absent, or can be neglected. As an elementary example, consider a particle constrained to move on a one-dimensional horizontal axis x, and attached to a spring, the latter being pinned to a rigid wall. We consider the position x(t) of the particle at time t, as well as its velocity v(t). The force F exerted by the spring on the particle is given by

F = -k(x - x_0),    (1.1)

where x_0 corresponds to the position of repose of the particle, for which the force vanishes. For convenience, we shall in the following choose the origin of the x-axis such that x_0 = 0. From the basic laws of classical mechanics, the motion of the particle is described by the evolution equation

m \frac{dv}{dt} = F,    (1.2)

where m is the mass of the particle. We have neglected all friction forces, so that the force exerted by the spring is the only horizontal force (the gravity force, and the reaction force exerted by the support, do not have horizontal components in the absence of friction). In terms of the x variable, the equation of motion (1.2) reads

m \frac{d^2 x}{dt^2} = -k x.    (1.3)

The generic solution of this equation is

x(t) = A \cos(\omega t + \phi),    \omega = \sqrt{\frac{k}{m}}.    (1.4)

The constants A and φ are determined by the initial conditions, namely, the position and velocity at time t = 0.

The above dynamics can be reformulated in the so-called Hamiltonian formalism. Let us introduce the momentum p = mv and the kinetic energy E_c = ½mv². In terms of momentum, the kinetic energy reads E_c = p²/2m. The potential energy U of the spring, defined by F = -dU/dx, is given by U = ½kx². The Hamiltonian H(x, p) is defined as

H(x, p) = E_c(p) + U(x).    (1.5)

In the present case, this definition yields

H(x, p) = \frac{p^2}{2m} + \frac{1}{2} k x^2.    (1.6)

In the Hamiltonian formulation, the equations of motion read¹

\frac{dx}{dt} = \frac{\partial H}{\partial p},    \frac{dp}{dt} = -\frac{\partial H}{\partial x}.    (1.7)

On the example of the particle attached to a spring, these equations give

\frac{dx}{dt} = \frac{p}{m},    \frac{dp}{dt} = -k x,    (1.8)

from which one recovers Eq. (1.3) by eliminating p. Hence, it is seen on the above example that the Hamiltonian formalism is equivalent to the standard law of motion (1.2).

¹ For a more detailed introduction to the Hamiltonian formalism, see, e.g., Ref. [5].
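As a minimal numerical illustration (not from the book: the Python/NumPy implementation, the leapfrog integrator and the parameter values are choices made here for the sake of the example), Hamilton's equations (1.8) can be integrated step by step and compared with the analytic solution (1.4):

import numpy as np

# Sketch: integrate dx/dt = p/m, dp/dt = -k x (Eq. 1.8) with a leapfrog scheme
# and compare with x(t) = A cos(omega t + phi), Eq. (1.4).
m, k = 1.0, 4.0                    # illustrative parameters
omega = np.sqrt(k / m)
x, p = 1.0, 0.0                    # initial condition: A = 1, phi = 0
dt, n_steps = 1e-3, 10_000

for _ in range(n_steps):
    p -= 0.5 * dt * k * x          # half kick: dp/dt = -k x
    x += dt * p / m                # drift:     dx/dt = p/m
    p -= 0.5 * dt * k * x          # half kick

t = n_steps * dt
print("numerical x(t):", x)
print("analytic  x(t):", np.cos(omega * t))

The leapfrog update is used here because it respects the Hamiltonian structure of the dynamics, so that the numerical energy stays close to its initial value over long times.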
1.1.2 Properties of the Hamiltonian Formulation

Energy conservation. The Hamiltonian formulation has interesting properties, namely, energy conservation and time-reversal invariance. We define the total energy E(t) as E(t) = H(x(t), p(t)) = E_c(p(t)) + U(x(t)). It is easily shown that the total energy is conserved during the evolution of the system²

\frac{dE}{dt} = \frac{\partial H}{\partial x} \frac{dx}{dt} + \frac{\partial H}{\partial p} \frac{dp}{dt}.    (1.9)

Using Eq. (1.7), one has

\frac{dE}{dt} = \frac{\partial H}{\partial x} \frac{\partial H}{\partial p} + \frac{\partial H}{\partial p} \left( -\frac{\partial H}{\partial x} \right) = 0,    (1.10)

so that the energy E is conserved. This is confirmed by a direct calculation on the example of the particle attached to a spring:

E(t) = \frac{1}{2m} p(t)^2 + \frac{1}{2} k x(t)^2 = \frac{1}{2m} m^2 \omega^2 A^2 \sin^2(\omega t + \phi) + \frac{1}{2} k A^2 \cos^2(\omega t + \phi).    (1.11)

Given that ω² = k/m, one finds

E(t) = \frac{1}{2} k A^2 \left[ \sin^2(\omega t + \phi) + \cos^2(\omega t + \phi) \right] = \frac{1}{2} k A^2,    (1.12)

which is indeed a constant.

Time-reversal invariance. Another important property of the Hamiltonian dynamics is its time reversibility. To illustrate the meaning of time reversibility, let us imagine that we film the motion of the particle with a camera, and that we project it backward. If the backward motion is also a possible motion, meaning that nothing is unphysical in the backward projected movie, then the equations of motion are time reversible. More formally, we consider the trajectory x(t), 0 ≤ t ≤ t_0, and define the reversed time t' = t_0 - t. Starting from the equations of motion (1.7) expressed with t, x, and p, time reversal is implemented by replacing t with t_0 - t', x with x' and p with -p', yielding

-\frac{dx'}{dt'} = -\frac{\partial H}{\partial p'},    \frac{dp'}{dt'} = -\frac{\partial H}{\partial x'}.    (1.13)

Changing the overall sign in the first equation, one recovers Eq. (1.7) for the primed variables, meaning that the time-reversed trajectory is also a physical trajectory. Note that time reversibility holds only as long as friction forces are neglected. The latter break time-reversal invariance, and this explains why our everyday life experience seems to contradict time-reversal invariance. For instance, when a glass falls down onto the floor and breaks into pieces, it is hard to believe that the reverse trajectory, in which pieces would come together and the glass would jump onto the table, is also a possible trajectory, as nobody has ever seen this phenomenon occur. In order to reconcile macroscopic irreversibility and microscopic reversibility of trajectories, the point of view of statistical physics is to consider that the reverse trajectory is possible, but has a very small probability to occur as only very few initial conditions could lead to this trajectory. So in practice, the corresponding trajectory is never observed.

Phase-space representation. Finally, let us mention that it is often convenient to consider the Hamiltonian dynamics as occurring in an abstract space called "phase space" rather than in real space. Physical space is described in the above example by the coordinate x. The equations of motion (1.7) allow the position x and momentum p of the particle to be determined at any time once the initial position and momentum are known. So it is interesting to introduce an abstract representation space containing both position and momentum. In this example, it is a two-dimensional space, but it could be of higher dimension in more general situations. This representation space is often called "phase space". For the particle attached to the spring, the trajectories in this phase space are ellipses. Rescaling the coordinates in an appropriate way, one can transform the ellipse into a circle, and the energy can be identified with the square of the radius of the circle. To illustrate this property, let us define the new phase-space coordinates X and Y as

X = \sqrt{\frac{k}{2}} \, x,    Y = \frac{p}{\sqrt{2m}}.    (1.14)

Then the energy E can be written as

E = \frac{p^2}{2m} + \frac{1}{2} k x^2 = X^2 + Y^2.    (1.15)

As the energy is fixed, the trajectory of the particle is a circle of radius √E in the (X, Y)-plane.

² The concept of energy, introduced here on a specific example, plays a fundamental role in physics. Although any precise definition of the energy is necessarily formal and abstract, the notion of energy can be thought of intuitively as a quantity that can take very different forms (kinetic, electromagnetic, or gravitational energy, but also internal energy exchanged through heat transfers) in such a way that the total amount of energy remains constant. Hence, an important issue is to describe how energy is transferred from one form to another. For instance, in the case of the particle attached to a spring, the kinetic energy E_c and potential energy U of the spring are continuously exchanged, in a reversible manner. In the presence of friction forces, kinetic energy would also be progressively converted, in an irreversible way, into internal energy, thus raising the temperature of the system.
1.1.3 Many-Particle System

In a more general situation, a physical system is composed of N particles in a three-dimensional space. The position of particle i is described by a vector x_i, and its velocity by v_i, i = 1, ..., N. In the Hamiltonian formalism, it is often convenient to introduce generalized coordinates q_j and momenta p_j which are scalar quantities, with j = 1, ..., 3N: (q_1, q_2, q_3) are the components of the vector x_1 describing the position of particle 1, (q_4, q_5, q_6) are the components of x_2, and so on. Similarly, (p_1, p_2, p_3) are the components of the momentum vector mv_1 of particle 1, (p_4, p_5, p_6) are the components of mv_2, etc. With these notations, the Hamiltonian of the N-particle system is defined as

H(q_1, \ldots, q_{3N}, p_1, \ldots, p_{3N}) = \sum_{j=1}^{3N} \frac{p_j^2}{2m} + U(q_1, \ldots, q_{3N}).    (1.16)

The first term in the Hamiltonian is the kinetic energy, and the last one is the potential (or interaction) energy. The equations of motion read

\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j},    \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j},    j = 1, \ldots, 3N.    (1.17)

The properties of energy conservation and time-reversal invariance also hold in this more general formulation, and are derived in the same way as above. As an illustration, typical examples of interaction energy U include

• U = 0: case of free particles.
• U = -\sum_{i=1}^{N} h_i \cdot x_i: particles interacting with an external field, for instance, the gravity field or an electric field.
• U = \sum_{i \neq i'} V(x_i - x_{i'}): pair interaction potential.
1.1.4 Case of Discrete Variables: Spin Models

As a simplified picture, a spin may be thought of as a magnetization S associated with an atom. The dynamics of spins is ruled by quantum mechanics (the theory that governs particles at the atomic scale), which is outside the scope of the present book. However, in some situations, the configuration of a spin system can be represented in a simplified way as a set of binary "spin variables" s_i = ±1, and the corresponding energy takes the form

E = -J \sum_{\langle i,j \rangle} s_i s_j - h \sum_{i=1}^{N} s_i.    (1.18)

The parameter J is the coupling constant between spins, while h is the external magnetic field. The first sum corresponds to a sum over nearest neighbor sites on a lattice, but other types of interaction could be considered. This model is called the Ising model. It provides a qualitative description of the phenomenon of ferromagnetism observed in metals like iron, in which a spontaneous macroscopic magnetization appears below a certain critical temperature. In addition, the Ising model turns out to be a very useful model to illustrate some important concepts of statistical physics. In what follows, we shall consider the words "energy" and "Hamiltonian" as synonyms, and the corresponding notations E and H as equivalent.
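As an illustration (a minimal sketch, not from the book: the use of Python/NumPy, the square-lattice geometry and the periodic boundary conditions are assumptions made here), the energy (1.18) of a given spin configuration can be evaluated as follows:

import numpy as np

# Sketch: Ising energy (1.18) of a spin configuration on an L x L square
# lattice with periodic boundaries; each nearest-neighbor bond is counted once
# via the shifts along the two lattice directions.
def ising_energy(spins, J=1.0, h=0.0):
    bonds = spins * np.roll(spins, 1, axis=0) + spins * np.roll(spins, 1, axis=1)
    return -J * bonds.sum() - h * spins.sum()

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))   # a random configuration
print(ising_energy(spins, J=1.0, h=0.5))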
1.2 Statistical Description of an Isolated System at Equilibrium

1.2.1 Notion of Statistical Description: A Toy Model

Probabilistic concepts will be used as key tools throughout the book, and we start here with a brief and very elementary intuitive introduction of the notion of probability, in the context of the dynamics of a system. Let us consider a toy model in which a particle is moving on a ring with L sites. Time is discretized, meaning that, for instance, every second the particle moves to the next site. The motion is purely deterministic: given the position at time t = 0, one can compute the position i(t) at any later time. Now assume that there is an observable ε_i on each site i. It could be, for instance, the height of the site, or any arbitrary observable that characterizes the state of the particle when it is at site i. A natural question would be to know what the average value

\langle \varepsilon \rangle = \frac{1}{T} \sum_{t=1}^{T} \varepsilon_{i(t)}    (1.19)

is after a large observation time T. Two different approaches to this question can be proposed:

• Simulate the dynamics of the model on a computer, and measure directly ⟨ε⟩.
• Use the concept of probability as a shortcut, and write

\langle \varepsilon \rangle = \sum_{i=1}^{L} P_i \, \varepsilon_i,    (1.20)

where the probability P_i to be on site i is defined as

P_i = \frac{\text{time spent on site } i}{\text{total time } T},    (1.21)

namely, the fraction of time spent on site i. The probability P_i can be calculated or measured by simulating the dynamics, but it can also be estimated directly: if the particle has turned a lot of times around the ring, the fraction of time spent on each site is the same, P_i = 1/L. Hence, all positions of the particle are equiprobable, and the average value ⟨ε⟩ is obtained as a flat average over all sites. Of course, more complicated situations may occur, and the concept of probability remains useful beyond the simple equiprobability situation described above.
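The two approaches listed above can be compared directly. The following sketch (not from the book; the ring size, the observable values and the observation time are arbitrary choices) simulates the deterministic hopping and checks that the time average (1.19) approaches the flat average obtained from Eq. (1.20) with P_i = 1/L:

import numpy as np

# Sketch: deterministic motion on a ring of L sites; compare the time
# average (1.19) with the flat average over sites (equiprobability).
rng = np.random.default_rng(1)
L = 10
eps = rng.uniform(size=L)          # arbitrary observable eps_i on each site

T = 100_000
i = 0                              # initial position
time_average = 0.0
for _ in range(T):
    i = (i + 1) % L                # hop to the next site every time step
    time_average += eps[i]
time_average /= T

print(time_average, eps.mean())    # the two averages agree for large T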
1.2.2 Fundamental Postulate of Equilibrium Statistical Physics

We consider a physical system composed of N particles. The microscopic configuration of the system is described by (x_i, p_i = mv_i), i = 1, ..., N, or s_i = ±1, i = 1, ..., N, for spin systems. The total energy E of the system, given, for instance, for systems of identical particles by

E = \sum_{i=1}^{N} \frac{p_i^2}{2m} + U(x_1, \ldots, x_N)    (1.22)

or for spin systems by

E = -J \sum_{\langle i,j \rangle} s_i s_j - h \sum_{i=1}^{N} s_i,    (1.23)

is constant as a function of time (or may vary within a tiny interval [E, E + δE], in particular for spin systems). Accordingly, starting from an initial condition with energy E, the system can only visit configurations with the same energy. In the
absence of further information, it is legitimate to postulate that all configurations with the same energy as the initial one have the same probability to be visited. This leads us to the fundamental postulate of equilibrium statistical physics:

Given an energy E, all configurations with energy E have equal non-zero probabilities. Other configurations have zero probability.

The corresponding probability distribution is called the microcanonical distribution or microcanonical ensemble for historical reasons (a probability distribution can be thought of as describing an infinite set of copies—an ensemble—of a given system). A quantity that plays an important role is the "volume" Ω(E) occupied in phase space by all configurations with energy E. For systems with continuous degrees of freedom, Ω(E) is the area of the hypersurface defined by fixing the energy E. For systems with discrete configurations (spins), Ω(E) is the number of configurations with energy E. The Boltzmann entropy is defined as

S(E) = k_B \ln \Omega(E),    (1.24)

where k_B = 1.38 × 10^{-23} J/K is the Boltzmann constant. This constant has been introduced both for historical and practical reasons, but from a theoretical viewpoint, its specific value plays no role, so that we shall set it to k_B = 1 in the following (this could be done, for instance, by working with specific units of temperature and energy such that k_B = 1 in these units). The notion of entropy is a cornerstone of statistical physics. First introduced in the context of thermodynamics (the theory of the balance between mechanical energy transfers and heat exchanges), entropy was later on given a microscopic interpretation in the framework of statistical physics. Basically, entropy is a measure of the number of available microscopic configurations compatible with the macroscopic constraints. More intuitively, entropy can be interpreted as a measure of "disorder" (disordered macroscopic states often correspond to a larger number of microscopic configurations than macroscopically ordered states), though the precise correspondence between the two notions is not necessarily straightforward. Another popular interpretation, in relation to information theory, is to consider entropy as a measure of the lack of information on the system: the larger the number of accessible microscopic configurations, the less information is available on the system (in an extreme case, if the system can be with equal probability in any microscopic configuration, one has no information on the state of the system). Let us now give a few simple examples of computation of the entropy.
1.2.3 Computation of Ω(E) and S(E): Some Simple Examples

Paramagnetic spin model. We consider a model of independent spins, interacting only with a uniform external field. The corresponding energy is given by

E = -h \sum_{i=1}^{N} s_i,    s_i = ±1.    (1.25)

The phase space (or here simply configuration space) is given by the list of values (s_1, ..., s_N). The question is to know how many configurations have a given energy E. In this specific example, it is easily seen that fixing the energy E amounts to fixing the magnetization M = \sum_{i=1}^{N} s_i. Let us denote as N_+ the number of spins with value +1 ("up" spins). The magnetization is given by M = N_+ - (N - N_+) = 2N_+ - N, so that fixing M is in turn equivalent to fixing N_+. From basic combinatorial arguments, the number of configurations with a given number of "up" spins reads as

\Omega = \frac{N!}{N_+! \,(N - N_+)!}.    (1.26)

Using the relation

N_+ = \frac{1}{2} \left( N - \frac{E}{h} \right),    (1.27)

one can express Ω as a function of E:

\Omega(E) = \frac{N!}{\left[ \frac{1}{2}\left(N - \frac{E}{h}\right) \right]! \, \left[ \frac{1}{2}\left(N + \frac{E}{h}\right) \right]!}.    (1.28)

The entropy S(E) is given by

S(E) = \ln \Omega(E) = \ln N! - \ln \left[ \frac{1}{2}\left(N - \frac{E}{h}\right) \right]! - \ln \left[ \frac{1}{2}\left(N + \frac{E}{h}\right) \right]!    (1.29)

Using Stirling's approximation, valid for large N,

\ln N! \approx N \ln N - N,    (1.30)

one finds

S(E) = N \ln N - \frac{N - E/h}{2} \ln \frac{N - E/h}{2} - \frac{N + E/h}{2} \ln \frac{N + E/h}{2}.    (1.31)
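As a quick numerical check (a sketch with arbitrary parameters, not taken from the book), the Stirling-approximated entropy (1.31) can be compared with the exact ln Ω(E) obtained from Eq. (1.28) using the log-Gamma function:

import numpy as np
from math import lgamma

# Sketch: exact entropy ln Omega(E) from Eq. (1.28) (via lgamma, since
# ln N! = lgamma(N + 1)) versus the Stirling approximation (1.31).
def exact_entropy(N, E, h=1.0):
    n_plus = 0.5 * (N - E / h)          # Eq. (1.27)
    n_minus = N - n_plus
    return lgamma(N + 1) - lgamma(n_plus + 1) - lgamma(n_minus + 1)

def stirling_entropy(N, E, h=1.0):
    a, b = 0.5 * (N - E / h), 0.5 * (N + E / h)
    return N * np.log(N) - a * np.log(a) - b * np.log(b)

N, h = 1000, 1.0
E = -200.0                              # some energy with |E| < N h
print(exact_entropy(N, E, h), stirling_entropy(N, E, h))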
Perfect gas of independent particles. As a second example, we consider a gas of independent particles confined into a cubic container of volume V = L³. The generalized coordinates q_j satisfy the constraints

0 \le q_j \le L,    j = 1, \ldots, 3N.    (1.32)

The energy E comes only from the kinetic contribution:

E = \sum_{j=1}^{3N} \frac{p_j^2}{2m}.    (1.33)

The accessible volume in phase space is the product of the accessible volume for each particle, times the area of the hypersphere of radius \sqrt{2mE}, embedded in a 3N-dimensional space. The area of the hypersphere of radius R in a D-dimensional space is

A_D(R) = \frac{D \pi^{D/2} R^{D-1}}{\Gamma\left(\frac{D}{2} + 1\right)},    (1.34)

where \Gamma(x) = \int_0^{\infty} dt \, t^{x-1} e^{-t} is the Euler Gamma function (a generalization of the factorial to real values, satisfying \Gamma(n) = (n-1)! for integer n ≥ 1). Hence, the accessible volume Ω_V(E) is given by

\Omega_V(E) = \frac{3N \pi^{3N/2}}{\Gamma\left(\frac{3N}{2} + 1\right)} \left(\sqrt{2m}\right)^{3N-1} V^N E^{\frac{3N-1}{2}}.    (1.35)

The corresponding entropy reads, assuming N ≫ 1,

S_V(E) = \ln \Omega(E) = S_0 + \frac{3N}{2} \ln E + N \ln V    (1.36)

with

S_0 = \ln \frac{3N \pi^{3N/2} \left(\sqrt{2m}\right)^{3N}}{\Gamma\left(\frac{3N}{2} + 1\right)}.    (1.37)

Note that, in principle, some corrections need to be included to take into account quantum effects, namely, the fact that quantum particles are indistinguishable. This allows, in particular, Ω(E) to be made dimensionless, thus rendering the entropy independent of the system of units chosen. Quantum effects are also important in order to recover the extensivity of the entropy, that is, the fact that the entropy is proportional to the number N of particles. In the present form, N ln N terms are present, making the entropy grow faster than the system size. This is related to the so-called Gibbs paradox. However, we shall not describe these effects in more detail here, and refer the reader to standard textbooks [1, 2, 6, 9].
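The hypersphere-area formula (1.34) is easy to sanity-check numerically. A minimal sketch (not from the book) compares it with the familiar D = 2 and D = 3 results:

from math import gamma, pi

# Sketch: check Eq. (1.34) against the circle circumference (D = 2, 2 pi R)
# and the sphere area (D = 3, 4 pi R^2).
def A(D, R):
    return D * pi ** (D / 2) * R ** (D - 1) / gamma(D / 2 + 1)

R = 2.0
print(A(2, R), 2 * pi * R)
print(A(3, R), 4 * pi * R**2)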
1.2.4 Distribution of Energy Over Subsystems and Statistical Temperature

Let us consider an isolated system, with fixed energy and number of particles. We then imagine that the system is partitioned into two subsystems S_1 and S_2, the two subsystems being separated by a wall which allows energy exchanges, but not exchanges of matter. The total energy of the system E = E_1 + E_2 is fixed, but the energies E_1 and E_2 fluctuate due to thermal exchanges. For a fixed energy E, let us evaluate the number Ω(E_1|E) of configurations of the system such that the energy of S_1 has a given value E_1. In the absence of long-range forces in the system, the two subsystems can be considered as statistically independent (apart from the total energy constraint), leading to

\Omega(E_1|E) = \Omega_1(E_1) \, \Omega_2(E - E_1),    (1.38)

where Ω_k(E_k) is the number of configurations of S_k. The most probable value E_1^* of the energy E_1 maximizes by definition Ω(E_1|E), or equivalently ln Ω(E_1|E):

\left. \frac{\partial \ln \Omega(E_1|E)}{\partial E_1} \right|_{E_1^*} = 0.    (1.39)

Combining Eqs. (1.38) and (1.39), one finds

\left. \frac{\partial \ln \Omega_1}{\partial E_1} \right|_{E_1^*} = \left. \frac{\partial \ln \Omega_2}{\partial E_2} \right|_{E_2^* = E - E_1^*}.    (1.40)

Thus, it turns out that two quantities defined independently in each subsystem are equal at equilibrium. Namely, defining

\beta_k \equiv \left. \frac{\partial \ln \Omega_k}{\partial E_k} \right|_{E_k^*},    k = 1, 2,    (1.41)

one has β_1 = β_2. This is the reason why the quantity β_k is called the statistical temperature of S_k. In addition, it can be shown that for large systems, the common value of β_1 and β_2 is also equal to

\beta = \frac{\partial S}{\partial E}    (1.42)

computed for the global isolated system. To identify the precise link between β and the standard thermodynamic temperature, we notice that in thermodynamics, one has for a system that exchanges no work with its environment

dE = T \, dS,    (1.43)

which indicates that β = 1/T (we recall that we have set k_B = 1). This is further confirmed on the example of the perfect gas, for which one finds, using Eq. (1.36),

\beta \equiv \frac{\partial S}{\partial E} = \frac{3N}{2E},    (1.44)

or equivalently

E = \frac{3N}{2\beta}.    (1.45)

Besides, one has from the kinetic theory of gases

E = \frac{3}{2} N T    (1.46)

(which is nothing but equipartition of energy), leading again to the identification β = 1/T. Hence, in the microcanonical ensemble, one generically defines temperature T through the relation

\frac{1}{T} = \frac{\partial S}{\partial E}.    (1.47)

We now further illustrate this relation on the example of the paramagnetic crystal that we already encountered earlier. From Eq. (1.31), one has

\frac{1}{T} = \frac{\partial S}{\partial E} = \frac{1}{2h} \ln \frac{N - E/h}{N + E/h}.    (1.48)

This last equation can be inverted to express the energy E as a function of temperature, yielding

E = -N h \tanh \frac{h}{T}.    (1.49)

This relation has been obtained by noticing that x = tanh y is equivalent to

y = \frac{1}{2} \ln \frac{1+x}{1-x}.    (1.50)

In addition, from the relation E = -Mh, where M = \sum_{i=1}^{N} s_i is the total magnetization, one obtains as a by-product

M = N \tanh \frac{h}{T}.    (1.51)
1.3 Equilibrium System in Contact with Its Environment

1.3.1 Exchanges of Energy

Realistic systems are most often not isolated, but they rather exchange energy with their environment. A natural idea is then to describe the system S of interest as a macroscopic subsystem of a large isolated system S ∪ R, where R is the environment or energy reservoir. The total energy E_tot = E + E_R is fixed. A configuration C_tot of the total system can be written as C_tot = (C, C_R), where C is a configuration of S and C_R is a configuration of R. The total system S ∪ R is isolated and at equilibrium, so that it can be described within the microcanonical framework:

P_tot(C_tot) = \frac{1}{\Omega_{tot}(E_{tot})},    C_tot = (C, C_R).    (1.52)

To obtain the probability of a configuration C of S, one needs to sum P_tot(C_tot) over all configurations C_R of R compatible with the total energy E_tot, namely,

P(C) = \sum_{C_R : E_R = E_{tot} - E(C)} P_{tot}(C, C_R) = \frac{\Omega_R(E_{tot} - E(C))}{\Omega_{tot}(E_{tot})}.    (1.53)

We introduce the entropy of the reservoir S_R(E_R) = ln Ω_R(E_R). Under the assumption that E(C) ≪ E_tot, one has

S_R(E_{tot} - E(C)) \approx S_R(E_{tot}) - E(C) \left. \frac{\partial S_R}{\partial E_R} \right|_{E_{tot}}.    (1.54)

One also has

\left. \frac{\partial S_R}{\partial E_R} \right|_{E_{tot}} \approx \left. \frac{\partial S_R}{\partial E_R} \right|_{E_R^*} = \frac{1}{T},    (1.55)

where T is the temperature of the reservoir. Altogether, we have

P(C) = \frac{\Omega_R(E_{tot})}{\Omega_{tot}(E_{tot})} \, e^{-E(C)/T}.    (1.56)

Note that the prefactor Ω_R/Ω_tot depends on the total energy E_tot, while we would like P(C) to depend only on the energy E of the system considered. This problem can however be by-passed by noticing that the distribution P(C) should be normalized to unity, namely, \sum_C P(C) = 1. Introducing the partition function

Z = \sum_C e^{-E(C)/T},    (1.57)

one can then eventually rewrite the distribution P(C) in the form

P(C) = \frac{1}{Z} e^{-E(C)/T},    (1.58)

which is the standard form of the canonical distribution. The partition function Z is a useful tool in statistical physics. For instance, the average energy ⟨E⟩ can be easily computed from Z:

\langle E \rangle = \sum_C P(C) E(C) = \sum_C E(C) \frac{1}{Z} e^{-E(C)/T} = \frac{1}{Z} \sum_C E(C) e^{-\beta E(C)} = -\frac{1}{Z} \frac{\partial Z}{\partial \beta} = -\frac{\partial \ln Z}{\partial \beta}.    (1.59)

Instead of Z, one may also use the "free energy" F defined as

F = -T \ln Z.    (1.60)

Let us give a simple example of computation of Z, in the case of the paramagnetic spin model. The partition function is given by

Z = \sum_{\{s_i = \pm 1\}} e^{-\beta E(\{s_i\})},    (1.61)

with E({s_i}) = -h \sum_{i=1}^{N} s_i. Hence, one has

Z = \sum_{\{s_i = \pm 1\}} e^{\beta h \sum_{i=1}^{N} s_i} = \sum_{\{s_i = \pm 1\}} \prod_{i=1}^{N} e^{\beta h s_i} = \prod_{i=1}^{N} \left( \sum_{s = \pm 1} e^{\beta h s} \right),    (1.62)

so that one finds

Z = \left( e^{\beta h} + e^{-\beta h} \right)^N.    (1.63)

Turning to the average energy, one has

\langle E \rangle = -\frac{\partial \ln Z}{\partial \beta} = -N \frac{\partial}{\partial \beta} \ln \left( e^{\beta h} + e^{-\beta h} \right),    (1.64)

so that one obtains, recalling that β = 1/T,

\langle E \rangle = -N h \tanh \frac{h}{T}.    (1.65)
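As a numerical check (a sketch with arbitrary N, h and T, not from the book), the partition function (1.61) can be evaluated by brute-force enumeration of the 2^N spin configurations and compared with the closed forms (1.63) and (1.65):

import itertools
import numpy as np

# Sketch: brute-force partition function of the paramagnet and check of
# Z = (exp(beta h) + exp(-beta h))^N and <E> = -N h tanh(h/T).
N, h, T = 8, 0.7, 1.5
beta = 1.0 / T

Z, E_avg = 0.0, 0.0
for spins in itertools.product([-1, 1], repeat=N):
    E = -h * sum(spins)                 # energy (1.25) of this configuration
    w = np.exp(-beta * E)               # Boltzmann weight
    Z += w
    E_avg += E * w
E_avg /= Z

print(Z, (np.exp(beta * h) + np.exp(-beta * h)) ** N)   # Eq. (1.63)
print(E_avg, -N * h * np.tanh(h / T))                   # Eq. (1.65)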
It is interesting to note that the above equation has exactly the same form as Eq. (1.49), provided that one replaces E, which is fixed in the microcanonical ensemble, by its average value E in the canonical ensemble. This property is an example of a general property called the “equivalence of ensembles”: in the limit of large systems, the relations between macroscopic quantities are the same in the different statistical ensembles, regardless of which quantity is fixed and which one is fluctuating through exchanges with a reservoir. The interpretation of this important property is basically that fluctuating observables actually have very small relative fluctuations for large system sizes. This property is deeply related to the Law of Large Numbers and to the Central Limit Theorem—see Chap. 8. Indeed, the relative fluctuations (quantified by the standard deviation normalized by the number of terms) of a sum of independent and identically distributed random variables go to zero when the number of terms in the sum goes to infinity. Note that the equivalence of ensembles generally breaks down in the presence of long-range interactions in the systems. Another example where the computation of Z is straightforward is the perfect gas. In this case, one has
L
Z = 0
= L 3N
dq1 . . . 3N
= VN Given that
dq3N
∞
∞
dp1 . . .
−∞
0 ∞
j=1 −∞
L
∞ −∞
dp3N e−β
3N j=1
p2j /2m
dp j e−βp j /2m 2
dp e−βp
2
/2m
3N .
−∞
∞
(1.66)
dp e−βp
2
/2m
−∞
one finds
Z=V
N
=
2π m β
2π m , β 3N2
.
(1.67)
(1.68)
Computing the average energy leads to E = −
3N 3 ∂ ln Z = = NT ∂β 2β 2
(1.69)
yielding another example of ensemble equivalence, as this result has the same form as Eq. (1.45). Equation (1.69) is also an example of the general relation of energy equipartition, valid for all quadratic degrees of freedom. More precisely, the equipartition relation states that, in the canonical ensemble, any individual degree of freedom x with a quadratic energy 21 λx 2 has an average energy
⟨½ λx²⟩ = ½ k_B T,    (1.70)

where we have temporarily reintroduced the Boltzmann constant k_B (otherwise set to k_B = 1) to comply with standard formulations.
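The equipartition relation (1.70) is also easy to test numerically. The following minimal sketch (illustrative, with arbitrary values of λ and T, and k_B = 1) samples a single quadratic degree of freedom from its canonical distribution, which is Gaussian with variance T/λ, and checks that ⟨½λx²⟩ ≈ T/2:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, n_samples = 2.5, 1.7, 200_000

# Canonical distribution for E(x) = (1/2) lam x^2 is Gaussian with variance T/lam
x = rng.normal(0.0, np.sqrt(T / lam), size=n_samples)
mean_energy = 0.5 * lam * np.mean(x**2)
print(mean_energy, T / 2)   # both values are close to T/2, as in Eq. (1.70)
```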
1.3.2 Canonical Entropy

As we have seen above, the microcanonical entropy is defined as S(E) = ln Ω(E). This definition is clearly related to the equiprobability of accessible microscopic configurations, since it is based on a counting of accessible configurations. A natural question is then to know how to define the entropy in more general situations. A generic definition of entropy has appeared in information theory, namely,

S = −Σ_C P(C) ln P(C),    (1.71)

where the sum is over all accessible configurations of the system. This entropy is called the Boltzmann–Gibbs, von Neumann, or Shannon entropy depending on the context. This definition of entropy is moreover consistent with the microcanonical one: if P(C) = 1/Ω(E) for configurations of energy E, and P(C) = 0 otherwise, one finds

S = −Σ_{C: E(C)=E} (1/Ω(E)) ln(1/Ω(E)) = ln Ω(E).    (1.72)

In this general framework, the canonical entropy reads

S_can = −Σ_C P_can(C) ln P_can(C) = Σ_C (1/Z) e^{−βE(C)} (ln Z + βE(C)) = ln Z + β⟨E⟩.    (1.73)
Recalling that the free energy F is defined as F = −T ln Z, one thus has T S = −F + ⟨E⟩, which is nothing but the well-known relation F = ⟨E⟩ − T S. Another standard thermodynamic relation may be found using ⟨E⟩ = −∂ln Z/∂β:

S = ln Z − β ∂ln Z/∂β = ln Z + T ∂ln Z/∂T = ∂/∂T (T ln Z),    (1.74)
so that one finds the standard thermodynamic relation

S_can = −∂F/∂T.    (1.75)
1.3.3 Exchanges of Particles with a Reservoir: The Grand-Canonical Ensemble

Similar to what was done to obtain the canonical ensemble from the microcanonical one by allowing energy exchanges with a reservoir, one can further allow exchanges of particles with a reservoir. The corresponding situation is called the grand-canonical ensemble. We thus consider a macroscopic system S exchanging both energy and particles with a reservoir R. The total system S ∪ R is isolated, with total energy E_tot and total number of particles N_tot fixed:

E + E_R = E_tot,   N + N_R = N_tot.    (1.76)

Generalizing the calculations made in the canonical case, one has (with K a normalization constant)

P_GC(C) = K Ω_R(E_R, N_R) = K Ω_R(E_tot − E(C), N_tot − N(C)) = K exp[S_R(E_tot − E(C), N_tot − N(C))].

As E(C) ≪ E_tot and N(C) ≪ N_tot, one can expand the entropy S_R(E_tot − E(C), N_tot − N(C)) to first order:

S_R(E_tot − E(C), N_tot − N(C)) = S_R(E_tot, N_tot) − E(C) ∂S_R/∂E_R |_{E_tot,N_tot} − N(C) ∂S_R/∂N_R |_{E_tot,N_tot}.    (1.77)

As before, the derivative ∂S_R/∂E_R is identified with 1/T. We also introduce a new parameter, the chemical potential μ, defined as

μ = −T ∂S_R/∂N_R    (1.78)
(the T factor and the minus sign are conventional). Similar to the temperature which takes equal values when subsystems exchanging energy have reached equilibrium, the chemical potential takes equal values in subsystems exchanging particles, when equilibrium is attained. Gathering all the above results and notations, one finds that
P_GC(C) = (1/Z_GC) exp[−(1/T) E(C) + (μ/T) N(C)],    (1.79)

which is the standard form of the so-called grand-canonical distribution. The normalization constant Z_GC, defined by

Z_GC = Σ_C exp[−(1/T) E(C) + (μ/T) N(C)],    (1.80)

is called the grand-canonical partition function.
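As an illustration of Eqs. (1.79) and (1.80) (a hypothetical toy model introduced here for the example, not taken from the text), consider N_s independent sites that are either empty or occupied by one particle of energy ε; the grand-canonical partition function then factorizes, and a brute-force enumeration can be compared with the closed form:

```python
import itertools
import numpy as np

# Hypothetical illustration: N_s independent sites, each empty or occupied by
# one particle of energy eps; E(C) = eps * N(C) for an occupation vector C.
N_s, eps, mu, T = 5, 0.4, 0.1, 1.0

Z_gc, N_avg = 0.0, 0.0
for occ in itertools.product([0, 1], repeat=N_s):
    N_part = sum(occ)
    E = eps * N_part
    w = np.exp((-E + mu * N_part) / T)   # weight of Eq. (1.79)
    Z_gc += w
    N_avg += N_part * w
N_avg /= Z_gc

# Factorized closed forms for independent sites
Z_exact = (1.0 + np.exp((mu - eps) / T)) ** N_s
N_exact = N_s / (np.exp((eps - mu) / T) + 1.0)
print(Z_gc, Z_exact)
print(N_avg, N_exact)
```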
1.4 Phase Transitions and Ising Model

Phase transitions correspond to a sudden change of behavior of the system when varying an external parameter across a transition point. This phenomenon may be of interest in complex systems well beyond physics, and is generically associated with collective effects. To illustrate this last property, let us briefly come back to the paramagnetic model defined in Sect. 1.2.3, for which the mean magnetization per spin is given by

m ≡ M/N = tanh(h/T).    (1.81)

The magnetization is non-zero only if there is a non-zero external field which tends to align the spins. A natural question is thus to know whether one could obtain a non-zero magnetization by including interactions tending to align spins with one another (and not with respect to an external source). In this spirit, let us consider the standard (interaction) energy of the Ising model, in the absence of an external field:

E_Ising = −J Σ_{⟨i,j⟩} s_i s_j,   J > 0,    (1.82)

where the sum runs over pairs of neighboring spins. This interaction energy is minimized when all spins are parallel. To compute the mean magnetization per spin, one would need either to compute the partition function in the presence of an external magnetic field and take the derivative of the free energy with respect to the field, or to compute directly the mean magnetization from its definition. In any case, this is a very complicated task as soon as the space dimension D is larger than one, and the exact calculation has been achieved only in dimensions D = 1 and D = 2. The results can be summarized as follows:
• D = 1: m = 0 for all T > 0, so that there is no phase transition at finite temperature. Calculations are relatively easy.
• D = 2: there is a phase transition at a finite critical temperature Tc, namely, m = 0 for T ≥ Tc and m ≠ 0 for T < Tc. Calculations are however very technical.
• D ≥ 3: no analytical solution is known, but numerical simulations show that there is a phase transition at a finite temperature that depends on D.
1.4.1 Ising Model in Fully Connected Geometry

An interesting benchmark model, which can be shown analytically to exhibit a phase transition, is the fully connected Ising model, whose energy is defined as

E_fc = −(J/N) Σ_{i<j} s_i s_j + E_0,    (1.83)

where the sum is over all pairs of spins in the system. The 1/N prefactor is included in order to keep the energy per spin finite in the large N limit. The term E_0 is added for later convenience, and is arbitrary at this stage (it does not modify the canonical distribution). Considering the magnetization M = Σ_{i=1}^N s_i, one has, given that s_i² = 1,

M² = 2 Σ_{i<j} s_i s_j + N,    (1.84)

from which one finds

E_fc = −(J/2N)(M² − N) + E_0 = −(J/2N) M² + J/2 + E_0.    (1.85)
Choosing E_0 = −J/2, and introducing the magnetization per spin m = M/N, one finds

E_fc = −(J/2) N m².    (1.86)

One possible way to detect the phase transition is to compute the probability distribution P(m) of the magnetization, by summing over all configurations having a given magnetization m:

P(m) = (1/Z) Σ_{C: m(C)=m} e^{−βE(C)} = (1/Z) e^{S(m) + ½βJNm²},    (1.87)
where Ω(m) = e^{S(m)} is the number of configurations with magnetization m. Using the relation

Ω(m) = N! / (N₊! N₋!)    (1.88)

with

N₊ = (N/2)(1 + m),   N₋ = (N/2)(1 − m),    (1.89)
one obtains for S(m) = ln Ω(m), using Stirling's formula for large N,

S(m) = −N [ (1+m)/2 ln(1+m) + (1−m)/2 ln(1−m) − ln 2 ].    (1.90)

Hence, from Eq. (1.87) it turns out that P(m) can be written as

P(m) = e^{−N f(m)}    (1.91)

with f(m) given by

f(m) = (1+m)/2 ln(1+m) + (1−m)/2 ln(1−m) − (J/2T) m² + f_0(T),    (1.92)

where f_0(T) is a temperature-dependent constant, ensuring that the minimum value reached by f(m) is 0, to be consistent with the normalization of P(m). The function f(m) is called a large deviation function, or in this specific context a Landau free energy function. Hence the magnetization m_0 that maximizes the probability distribution P(m) corresponds to a minimum of f(m). Moreover, fluctuations around m_0 are exponentially suppressed with N. For high temperature T, the term J/T is small, and the entropic contribution to f(m) dominates, leading to m_0 = 0. To understand what happens when the temperature is progressively lowered, it is useful to expand f(m) for small values of m, up to order m⁴, leading to

f(m) = f_0(T) + (1/2)(1 − J/T) m² + (1/12) m⁴ + O(m⁶).    (1.93)
One can then distinguish two different cases:
• If T ≥ Tc ≡ J, f(m) has only one minimum, at m = 0.
• If T < Tc, f(m) has two symmetric minima ±m_0. These minima are obtained as solutions of the equation df/dm = 0:

df/dm = (1 − J/T) m + (1/3) m³ = −(J/T − 1) m + (1/3) m³ = 0.    (1.94)

The non-zero solutions are m = ±m_0 with

m_0 = √3 (J/T − 1)^{1/2} = √3 ((Tc − T)/T)^{1/2}.    (1.95)

This expression is quantitatively valid only for T close to Tc, but at a qualitative level it correctly describes the behavior of m_0 in the whole range T < Tc. It can be checked easily that the solution m = 0 corresponds in this case to a local maximum of f(m), and thus to a local minimum of P(m) (Fig. 1.1).
Fig. 1.1 Left, main plot: large deviation function f (m), for temperature T = 1.2, 0.98, 0.9, and 0.8 from top to bottom (Tc = 1). Two symmetric minima appear for T < Tc , indicating the onset of magnetized states. Inset: zoom on the temperature range close to Tc ; f (m) is plotted for T = 0.999, 0.995, 0.99, and 0.98 from top to bottom. Right: magnetization m(T ) as a function of temperature (Tc = 1)
Hence, there is a phase transition at T = Tc ≡ J , Tc being called the critical temperature. The most probable magnetization m 0 is called the “order parameter of the phase transition”, as the phase transition is characterized by the onset of a non-zero value of m 0 . In addition, the order parameter varies as m 0 ∼ (Tc − T )β for T close to Tc (T < Tc ), with β = 1/2 here. The exponent β is an example of critical exponent, and the value β = 1/2 is called the “mean-field value” of β, for reasons that will become clear in the next section. The notation β is standard for the critical exponent associated with the order parameter, and should not be confused with the inverse temperature β = 1/T . An important remark is that the average value m of the magnetization is still zero for T < Tc , since the two values ±m 0 of the magnetization have the same probability. However, for a large system, the time needed to switch between states m 0 and −m 0 becomes very large (at least if one uses a local spin-flip dynamics), and the time-averaged magnetization over a typical observation time window is non-zero, and equal either to m 0 or to −m 0 .
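The behavior described above can be reproduced numerically by minimizing the large deviation function (1.92) on a grid of magnetization values (a minimal sketch added for illustration, with J = 1 so that Tc = 1; the constant f_0(T) is irrelevant for the location of the minimum):

```python
import numpy as np

def f(m, T, J=1.0):
    """Large deviation function of Eq. (1.92), up to the constant f_0(T)."""
    return 0.5 * ((1 + m) * np.log(1 + m) + (1 - m) * np.log(1 - m)) \
        - J * m**2 / (2 * T)

m_grid = np.linspace(-0.999, 0.999, 20001)
for T in [1.2, 1.0, 0.95, 0.8]:
    m0 = abs(m_grid[np.argmin(f(m_grid, T))])
    # Mean-field law (1.95), only quantitative close to Tc = 1
    m0_mf = np.sqrt(3 * (1 - T) / T) if T < 1 else 0.0
    print(T, m0, m0_mf)
```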
1.4.2 Ising Model with Finite Connectivity We now come back to the Ising model in a finite-dimensional space of dimension D. As mentioned above, the analytical solution is hard to obtain in dimension D = 2, and is not known in higher dimensions. However, useful approximations have been developed, the most famous one being called the mean-field approximation. The reason why the fully connected model can be easily solved analytically is that its energy E is a function of the magnetization m only, as seen in Eq. (1.86). When
the model is defined on a finite-dimensional lattice, this property no longer holds, and the energy reads

E = −(J/2) Σ_{i=1}^N s_i ( Σ_{j∈V(i)} s_j ),    (1.96)

where V(i) is the set of neighboring sites of site i. The factor 1/2 comes from the fact that a given link of the lattice now appears twice in the sum. This last expression can be rewritten as

E = −D J Σ_{i=1}^N s_i ⟨s_j⟩_{V(i)},    (1.97)

⟨s_j⟩_{V(i)} being the local magnetization per spin of the set of neighbors V(i):

⟨s_j⟩_{V(i)} ≡ (1/2D) Σ_{j∈V(i)} s_j.    (1.98)
The parameter D is the space dimension, and the number of neighbors of a given site i is 2D, given that we consider hypercubic lattices (square lattice in D = 2, cubic lattice in D = 3, ...). As a first approximation, one could replace the local magnetization per spin of the set of neighbors by the global magnetization per spin of the whole system, m = N^{−1} Σ_{i=1}^N s_i:

⟨s_j⟩_{V(i)} → m.    (1.99)

This approximation leads to the following expression of the energy:

E ≈ E_mf = −D J m Σ_{i=1}^N s_i = −D J N m²,    (1.100)

where the subscript "mf" stands for "mean-field" approximation. Then E_mf depends only on the magnetization m, and has a form similar to the energy E_fc of the fully connected model. One can define an effective coupling J_mf = 2D J so that the forms of the two energies become exactly the same, namely,

E_mf = −(1/2) J_mf N m².    (1.101)
Now it is clear that the results of the fully connected model can be applied to the present mean-field approximation, yielding a phase transition at Tcmf = Jmf = 2D J . For T > Tcmf , m 0 = 0 while for T < Tcmf , but close to Tcmf , m 0 ∼ (Tcmf − T )1/2 . Qualitatively, the approximation is expected to be valid for large space dimension D. It can be shown, using more involved arguments, that for D ≥ 4, the approximation is semi-quantitatively valid, in the sense that the value β = 1/2 of the critical expo-
nent, obtained from the approximation, is correct. However, the value of the critical temperature is not correctly predicted by the mean-field approximation, namely, Tc ≠ Tc^mf. For D < 4, the value of β differs from the mean-field value 1/2, and the mean-field approximation breaks down. For D = 3, numerical simulations indicate that β ≈ 0.31, and for D = 2, the exact solution yields β = 1/8. Finally, for D = 1, m_0 = 0 for all temperatures T, so that the exponent β is not defined [7]. The discrepancy mentioned above between mean-field predictions and results obtained in low-dimensional systems mainly comes from the presence of fluctuations of the local magnetization Σ_{j∈V(i)} s_j. Since, on the other hand, exact solutions are very hard to obtain, there is a need for a different approach that could be generic enough and could be centered on the issue of correlation, which is at the heart of the difficulties encountered. This is precisely the aim of the renormalization group approach.
1.4.3 Renormalization Group Approach

A standard observation on finite-dimensional systems exhibiting a continuous phase transition is that the correlation length diverges when the temperature approaches the critical temperature Tc. The correlation length is defined through the correlation function

C_ij = ⟨(s_i − m_0)(s_j − m_0)⟩ = ⟨s_i s_j⟩ − m_0².    (1.102)

As soon as the distance r = d_ij between sites i and j is large with respect to the lattice spacing a, the correlation function generally becomes isotropic, C_ij = C(r). In addition, the large-distance behavior of C(r) is often of the form

C(r) ~ (1/r^α) e^{−r/ξ},   α > 0,    (1.103)
which defines the correlation length ξ . The latter diverges for T → Tc . This is the reason why direct calculations in the range T ≈ Tc are very difficult, due to the strong correlation between spins. A natural idea is to look for an approach that could reduce in some way the intensity of correlations, in order to make calculations tractable. This is basically the principle of the renormalization group (RG) approach, in which one progressively integrates out small-scale degrees of freedom. The idea is that at the critical point, structures are present at all scales, from the lattice spacing to the system size. A RG transform may intuitively be thought of as defocusing the picture of the system, so that fine details become blurred. This method is actually very general, and could be relevant in many fields of complex system sciences, given that issues like large-scale correlations and scale invariance or fractals are often involved in complex systems. For definiteness, let us however consider again the Ising model. To implement the RG ideas in a practical way, one could make blocks of spins and define an
effective spin for each block, with effective interactions with the neighboring blocks. The effective interactions are defined in such a way that the large-scale properties are the same as for the original (non-renormalized) model. This is done in practice by conserving the partition function, namely, Z′ = Z (in the present section, the prime denotes renormalized quantities). One would then like to define a renormalized interaction constant J′ such that

H′ = −J′ Σ_{⟨B1,B2⟩} S_B1 S_B2,    (1.104)

where B1 and B2 are generic labels for the blocks (the sites of the renormalized lattice). The problem is that very often the RG transform generates new effective couplings, like next-nearest-neighbor couplings, that were absent in the original model, and the number of couplings keeps increasing with the number of iterations of the RG transform. However, in some simple cases, the transformation can be performed exactly, without increasing the number of coupling constants, as we shall see later on. Yet, let us first emphasize the practical interest of the RG transform. We already mentioned that one of the main difficulties comes from the presence of long-range correlations close to the critical point. Through the RG transform, the lattice spacing becomes a′ = 2a (if one makes blocks of linear size 2a). On the contrary, the correlation length remains unchanged, since the large-scale properties remain unaffected by the RG transform. Hence, the correlation length expressed in units of the lattice spacing, namely ξ/a, decreases by a factor of 2 in the transformation, to become

ξ′/a′ = (1/2) (ξ/a).    (1.105)
Thus, upon iterations of the RG transform, the effective Hamiltonian becomes such that ξ ∼ a, so that standard approximation schemes (mean-field, ...) can be used. One then needs to follow the evolution of the coupling constant J under iterations. This is called the renormalization flow. An explicit example can be given with the one-dimensional Ising chain, using a specific RG transform called the decimation procedure [11]. We start with the energy (or Hamiltonian)

H = Σ_{i=1}^N H_{i,i+1}(s_i, s_{i+1}),    (1.106)

where the local interaction term H_{i,i+1}(s_i, s_{i+1}) is given by

H_{i,i+1}(s_i, s_{i+1}) = −J s_i s_{i+1} + c.    (1.107)
Note that periodic boundary conditions are understood. The constant c plays no role at this stage, but it will be useful later on in the renormalization procedure. The basic
idea of the decimation procedure is to perform, in the partition function, a partial sum over the spins of, say, odd indices in order to define renormalized coupling constants J′ and h′. Then summing over the values of the spins with even indices yields the partition function Z′ of the renormalized model, which is, by definition of the renormalization procedure, equal to the initial partition function Z. To be more explicit, one can write Z as

Z = Σ_{{s_{2j}}} Σ_{{s_{2j+1}}} exp[−βH({s_i})],    (1.108)

where Σ_{{s_{2j}}} (resp. Σ_{{s_{2j+1}}}) indicates a sum over all possible values of the N/2 variables {s_{2j}} (resp. {s_{2j+1}}). Equation (1.108) can then be rewritten in the following form:

Z = Σ_{{s_{2j}}} exp[−βH′({s_{2j}})],    (1.109)

where H′({s_{2j}}) is the renormalized Hamiltonian, defined by

exp[−βH′({s_{2j}})] = Σ_{{s_{2j+1}}} exp[−βH({s_i})].    (1.110)
Assuming that the renormalized Hamiltonian can be decomposed into a sum of local terms

H′({s_{2j}}) = Σ_{j=1}^{N/2} H′_{j,j+1}(s_{2j}, s_{2j+2}),    (1.111)

we get from Eq. (1.110) the relation

Π_{j=1}^{N/2} exp[−βH′_{j,j+1}(s_{2j}, s_{2j+2})]
  = Σ_{{s_{2j+1}}} Π_{j=1}^{N/2} exp[−βH_{2j,2j+1}(s_{2j}, s_{2j+1}) − βH_{2j+1,2j+2}(s_{2j+1}, s_{2j+2})]
  = Π_{j=1}^{N/2} Σ_{s_{2j+1}} exp[−βH_{2j,2j+1}(s_{2j}, s_{2j+1}) − βH_{2j+1,2j+2}(s_{2j+1}, s_{2j+2})],    (1.112)

where, in the last line, the sum runs over the single variable s_{2j+1}, the index j being fixed within the product. This last relation is satisfied if, for any given j = 1, ..., N/2, and any given values of s_{2j} and s_{2j+2},
exp[−βH′_{j,j+1}(s_{2j}, s_{2j+2})] = Σ_{s_{2j+1}=±1} exp[−βH_{2j,2j+1}(s_{2j}, s_{2j+1}) − βH_{2j+1,2j+2}(s_{2j+1}, s_{2j+2})].    (1.113)

Further assuming that H′_{j,j+1}(s_{2j}, s_{2j+2}) takes the form

H′_{j,j+1}(s_{2j}, s_{2j+2}) = −J′ s_{2j} s_{2j+2} + c′,    (1.114)

where J′ and c′ are the renormalized parameters, one obtains

exp[βJ′ s_{2j} s_{2j+2} − βc′] = Σ_{s_{2j+1}=±1} exp[βJ (s_{2j} s_{2j+1} + s_{2j+1} s_{2j+2}) − 2βc].    (1.115)

Introducing the reduced variable³

u = e^{−4βJ},    (1.116)

Equation (1.115) leads to the following recursion relation:

u′ = 4u / (1 + u)².    (1.117)
Let us denote as ξ_nd the dimensionless correlation length

ξ_nd = ξ/a.    (1.118)

Then from Eq. (1.105) the recursion relation for ξ_nd reads

ξ′_nd = (1/2) ξ_nd,    (1.119)

from which one deduces that the fixed points of the renormalization procedure, which satisfy ξ′_nd = ξ_nd, can only be ξ_nd = ∞ or ξ_nd = 0. The latter is called the trivial fixed point, as it corresponds to the limit situation where no correlation is present in the system. In contrast, the fixed point ξ_nd = ∞ corresponds to the critical fixed point, where correlation extends over the whole system size. As ξ_nd decreases through iteration of the RG transform, the critical fixed point ξ_nd = ∞ is unstable, while the trivial fixed point ξ_nd = 0 is stable. Coming back to the iteration relation Eq. (1.117), let us first look for the fixed points of this equation, namely, the solutions of
³ We do not follow here the evolution of the constant c under renormalization, and rather focus on the evolution of the physically relevant coupling constant J.
u = 4u / (1 + u)².    (1.120)
The value u = 0 is obviously a solution, and it is easy to check that u = 1 is the other positive solution (u = −3 is the third solution, but in view of Eq. (1.116), we are looking for positive solutions only). Then, to identify which one of the two fixed points is the critical point, we need to investigate the stability of each fixed point under iteration. The stability is studied by introducing a small variation δu around a given fixed point u_1, namely, u = u_1 ± δu, and writing the evolution equation for δu to leading order. For u_1 = 0, one finds, with u = δu,

δu′ = 4δu/(1 + δu)² ≈ 4δu,   δu > 0,    (1.121)

so that δu increases upon iteration: the fixed point u_1 = 0 is unstable, and thus corresponds to the critical fixed point. Besides, the fixed point u_1 = 1 is easily checked to be stable. Using u = 1 − δu, we have

1 − δu′ = 4(1 − δu)/(2 − δu)²,    (1.122)

leading after a second-order expansion in δu to

δu′ ≈ (1/4) δu².    (1.123)
Hence, δu converges to 0 upon iteration, confirming the stability of the fixed point u 1 = 1. Coming back to the critical fixed point, and recalling the definition Eq. (1.116), one sees that u 1 = 0 corresponds to an infinite value of J/T . In the above framework, this case is interpreted as an infinite coupling limit, as the iteration was made on J . However, the fixed point can also be interpreted as a zero-temperature fixed point, keeping the coupling constant J fixed. A sketch of the corresponding renormalization flow is presented in the top panel of Fig. 1.2. This one-dimensional example is, of course, only a very simple case, which can be solved through other more direct methods. However, it is a good illustration of the way the concept of RG can be implemented in practice. In two- or three-dimensional models, exact treatments like the above one are most often not available. Yet, many approaches based on different approximation schemes have been developed. A typical situation in dimension D > 1 is that there is a finite value K c of the ratio K = J/T which corresponds to a critical fixed point, and both values K = 0 and K = ∞ correspond to trivial fixed points, where no correlation is present (see bottom panel of Fig. 1.2). Quite importantly, linearizing the iteration equation in the vicinity of the critical fixed point allows the determination of the critical exponent β as well as other critical exponents. In the Ising chain studied above, this is not possible because the critical temperature is zero, so that there is no extended temperature region where the
Fig. 1.2 Sketch of the renormalization flow, in terms of the reduced coupling constant J/T . In all cases, the zero coupling (or infinite temperature) point is a trivial fixed point, but the position of the critical fixed point may differ from one case to the other. Top: one-dimensional Ising model; the critical fixed point corresponds to infinite coupling (or zero temperature). Bottom: fully connected Ising model, or Ising model in dimension D ≥ 2; the critical fixed point corresponds to a finite value of the reduced coupling, implying a finite critical temperature for a given coupling
magnetization is non-zero. But this approach turns out to be relevant in dimension higher than one. As an illustration of the emergence of a critical fixed point with a finite coupling K_c, let us briefly consider again the fully connected Ising model (which, as seen above, can be studied by more direct means than the renormalization group method). The energy of the fully connected Ising model reads H_fc = −½ J N m² + c, where c is an arbitrary constant. Below, we denote as K = βJ the dimensionless coupling constant (with β the inverse temperature). We perform a very simple renormalization procedure that consists in integrating out a single spin, going from a system of N + 1 spins to a system of N spins. Denoting as K′ and c′ the parameters of the renormalized system, the renormalization transformation reads

exp[½ K′ N m² + c′] = Σ_{S_{N+1}=±1} exp[½ K (N m + S_{N+1})²/(N + 1) + c]    (1.124)
with m = N^{−1} Σ_{i=1}^N S_i. Expanding the square in the r.h.s. of Eq. (1.124) and taking the logarithm of both sides, one obtains in the large N limit

(1/2)(K′ − K) N m² + c′ = −(1/2) K m² + ln cosh(K m) + ln 2 + c.    (1.125)
As for the one-dimensional Ising model, we do not follow the constant term c and focus on the coupling constant K . Expanding the hyperbolic cosine to order m 2 , we get from the coefficients of the m 2 terms
(K′ − K) N = −K + K².    (1.126)
Note that higher-order terms in m, like m⁴, are also generated in this transformation; however, we do not study their role here. The renormalization flow is often characterized by a parameter ℓ which is additive when several renormalization transformations are successively performed. In a single transformation, this parameter varies by δℓ = ln b, where b > 1 is the scale factor of the transformation. Here, b = (N + 1)/N, so that for large N, δℓ = 1/N. Considering K as a function of ℓ, we can thus rewrite Eq. (1.126) as

dK/dℓ = −K + K².    (1.127)
It is easy to check that K c = 1 is a critical (unstable) fixed point, while K = 0 and K = ∞ are trivial (stable) fixed points. Hence, we recover the phase transition at K c = 1 obtained by direct calculations in Sect. 1.4.1.
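Both renormalization flows can be visualized with a few lines of code (an illustrative sketch with arbitrarily chosen initial conditions): iterating the decimation map (1.117) drives any initial 0 < u < 1 toward the trivial fixed point u = 1, while a simple Euler integration of Eq. (1.127) shows that K flows away from K_c = 1 on both sides:

```python
# 1D Ising decimation, Eq. (1.117): any 0 < u < 1 flows to the trivial fixed point u = 1
for u0 in [1e-4, 0.01, 0.5]:
    u = u0
    for _ in range(20):
        u = 4 * u / (1 + u) ** 2
    print(u0, "->", u)        # approaches u = 1, i.e. J/T -> 0

# Fully connected model, Eq. (1.127): dK/dl = -K + K^2, unstable fixed point at K_c = 1
for K0 in [0.9, 1.0, 1.1]:
    K, dl = K0, 1e-3
    for _ in range(2000):     # Euler integration up to l = 2
        K += dl * (-K + K ** 2)
    print(K0, "->", K)        # moves away from K_c = 1 on both sides
```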
1.5 Disordered Systems and Glass Transition

Generally speaking, disordered systems are systems where each particle or agent has specific properties, which are qualitatively the same for all of them, but differ quantitatively from one to the other. In theoretical models, these quantitative properties are most often drawn from a given probability distribution for each particle or agent, and remain constant in time. Disordered systems should be very relevant in complex system studies, like social science, for instance, as each human being has specific skills or tastes. In a physical context, the concept of disordered system could seem less natural, since all particles within a given class are identical. In this case, disorder rather comes from the possibly random positions of the particles. A standard example is that of magnetic impurities (that carry a magnetic moment or spin) diluted in a non-magnetic material. The interaction between magnetic atoms (which have to be described in the framework of quantum mechanics) is mediated by the non-magnetic atoms, and acquires an oscillatory behavior, depending on the distance r_ij between the two spins:

H = −Σ_{⟨i,j⟩} J(r_ij) s_i s_j.    (1.128)
The interaction constant J (ri j ) is a given function of the distance ri j , which oscillates around 0, thus taking both positive and negative values. The amplitude of the oscillations decays as a power law of the distance. As the distances between atoms are random, the interactions between atoms have a random sign, which is the basic property of spin glasses.
1.5.1 Theoretical Spin-Glass Models In order to propose a simplified model for spin-glass materials, it has been proposed to replace the positional disorder by an interaction disorder, the magnetic atoms being now situated on a regular lattice. To this purpose, one considers an Ising-like model in D dimensions, where the spins are placed at each node of a lattice. Spins on neighboring sites (i, j) interact through a coupling constant Ji j , drawn from a distribution P(J ). As the couplings Ji j are kept constant in time, one speaks about quenched disorder. This model is called the Edwards–Anderson model [4]. In D = 1, the Edwards–Anderson model is qualitatively equivalent to the standard Ising model, up to a redefinition of the spins. In D > 1, analytical solutions are not known, and results have thus been obtained through numerical simulations. A fully connected version, called the Sherrington–Kirkpatrick model, has been proposed and solved [12], but the techniques involved are already rather difficult, even at the level of the fully connected model. The main qualitative picture [10] emerging from these models is that below a given energy level, the phase space decomposes into a lot of valleys, or metastable states, from which it takes a very long time to escape. In order to give a flavor of the basic properties of spin-glass models, we present below two of the simplest models of spin glasses, namely, the Mattis model and the random energy model.
1.5.2 A Toy Model for Spin Glasses: The Mattis Model

Spin-glass models are very complicated to study analytically, due to the presence of disorder. There is a simplified case, however, in which explicit calculations can be done relatively easily, while preserving some of the main features of spin glasses. This is the so-called Mattis model [8]. This model consists in a spin glass on an arbitrary lattice, composed of N spins. The energy of a configuration (s_1, ..., s_N) takes the usual form

H = −Σ_{⟨i,j⟩} J_ij s_i s_j − h Σ_i s_i,    (1.129)

where, as usual, the sum is carried over the nearest neighbors on the lattice. The simplification with respect to generic spin glasses comes from the form assumed for the disordered couplings J_ij:

J_ij = J ε_i ε_j,    (1.130)

where ε_i = ±1 are independent quenched random variables. Introducing new spin variables σ_i = ε_i s_i, the energy reads
H = −J Σ_{⟨i,j⟩} σ_i σ_j − h Σ_i ε_i σ_i.    (1.131)
In the following, we shall focus on the case of zero external field, h = 0. Let us introduce the mean magnetization per spin m, defined as

m ≡ (1/N) Σ_i ⟨s_i⟩,    (1.132)

where ⟨s_i⟩ means a "thermal" average, that is, an average over all spin configurations, for given values of the disorder. Then it is easy to show, by averaging over the disorder represented by the variables ε_i, that the disorder-averaged magnetization is equal to zero, m̄ = 0 (the overbar denotes an average over the disorder). This can be done as follows. Introducing the variables σ_i, let us rewrite the magnetization as

m = (1/N) Σ_i ε_i ⟨σ_i⟩.    (1.133)

Performing the average over the disorder, we get

m̄ = (1/N) Σ_i (ε_i ⟨σ_i⟩) averaged over the disorder.    (1.134)
For the case h = 0 considered here, the energy expressed in terms of the variables σ_i takes the standard Ising form

H = −J Σ_{⟨i,j⟩} σ_i σ_j    (1.135)

and is thus independent of the disorder. Hence, the thermal average ⟨σ_i⟩ is also independent of the disorder, so the disorder-averaged magnetization can be rewritten as

m̄ = (1/N) Σ_i ε̄_i ⟨σ_i⟩.    (1.136)

Since ε̄_i = 0, one readily obtains that the disorder-averaged magnetization is equal to zero. To go one step further, it is also possible to show, using similar methods, that the (zero-field) disorder-averaged susceptibility reads

χ ≡ ∂m̄/∂h |_{h=0} = (1 − q)/(k_B T),    (1.137)
where q ≡ the disorder average of ⟨s_i⟩², equal to ⟨σ_i⟩². We thus find that the susceptibility diverges at T = 0 only, confirming that there is no phase transition at T > 0. The parameter q is generically called the Edwards–Anderson order parameter, and it is non-zero as soon as individual spins have non-zero average values, even if these values take random signs from one spin to the other, and sum up to a zero global magnetization.
1.5.3 The Random Energy Model

Another very simple disordered model, which already captures a lot of the phenomenology of realistic disordered systems, is the Random Energy Model (REM) [3]. We have seen above in the Mattis model how to calculate magnetic properties like the average magnetization or the susceptibility. However, even in this simplified setting of the Mattis model it remains difficult to evaluate thermodynamic quantities like the free energy or the entropy. The aim of introducing the REM is precisely to be able to compute such thermodynamic quantities in a simple way, and to understand how these quantities behave in the presence of a glass transition. Evaluating thermodynamic quantities requires the computation of the partition function, which for disordered systems is itself a random variable depending on the statistics of the energies of each configuration. The REM makes the strong assumption that correlations between the energies of different configurations can be neglected. Let us come to the precise definition of the model. The REM has 2^N configurations, labeled by an index α = 1, ..., 2^N. It can be thought of as a spin model, with N spins s_i = ±1, although spins are not explicitly described in this model. To each configuration α is attached a time-independent energy E_α, chosen at random from the distribution

P(E) = (1/√(πN J²)) exp(−E²/(N J²)).    (1.138)

All the energies E_α are statistically independent random variables. The variance of E in the distribution (1.138) is taken to be proportional to N in order to ensure that the average energy of the model is extensive, i.e., proportional to N (see below). The fact that the variance of the energy has to be proportional to N is also easily seen by evaluating this variance in spin-glass models with explicit disordered couplings between spins, like the Mattis model [see Eq. (1.129)]. To determine thermodynamic properties, we introduce the density of states n(E). To be more specific, we denote as n(E)dE the number of configurations with energy in the interval [E, E + dE], so that n(E) is the density of configurations with energy E. The density n(E) is a random quantity, but its fluctuations are small if its average n̄(E) is large, so that n(E) ≈ n̄(E) in this regime. By definition, P(E) = n̄(E)/2^N, so that n̄(E) = 2^N P(E), leading to
n̄(E) = exp[N ln 2 − E²/(N J²)] = exp[N(ln 2 − ε²/J²)],    (1.139)
where the energy density ε = E/N has been introduced. One sees that if ln 2 − ε²/J² > 0, corresponding to |ε| < ε_0 = J√(ln 2), n̄(E) is exponentially large with N, so that there is a large number of configurations at energy density ε, and the assumption n(E) ≈ n̄(E) is justified. In contrast, if ln 2 − ε²/J² < 0, which corresponds to |ε| > ε_0, n̄(E) is exponentially small with N. This means that in most samples, there are no configurations at energy density |ε| > ε_0. The non-zero but small value of n̄(E) comes from the contribution to the average value of very rare and atypical samples, which include some configurations with exceptionally low (or high) energy. We can now evaluate the partition function of the REM, defined as

Z = Σ_{α=1}^{2^N} e^{−E_α/T}.    (1.140)
As all the energies E_α are random variables, the partition function Z is also a random variable, which fluctuates from one realization of the disorder to another. Yet, we can evaluate the typical value of Z as follows:

Z ≈ Z_typ = ∫_{−ε_0}^{ε_0} dε ñ(ε) e^{−Nε/T},    (1.141)

with the notation ñ(ε) = N n(Nε). In Eq. (1.141), we have replaced ñ(ε) by its average value for |ε| < ε_0, and by 0 for |ε| > ε_0, consistently with the above discussion. We can then write, using Eqs. (1.139) and (1.141),

Z_typ = ∫_{−ε_0}^{ε_0} dε e^{N g(ε)}    (1.142)

with

g(ε) = ln 2 − ε²/J² − ε/T.    (1.143)
The function g(ε) is illustrated in Fig. 1.3. In the large N limit, we can evaluate Z_typ through a saddle-point calculation, namely,

Z_typ ∼ e^{N g_max(ε_0)},    (1.144)
where gmax (ε0 ) is the maximum value of g(ε) over the interval [−ε0 , ε0 ] (in practice, over the interval [−ε0 , 0] since the maximum is always reached for a negative energy
Fig. 1.3 Illustration of the function g(ε) for T > Tg (full line), T = Tg (dashed line) and T < Tg (dot-dashed), with J = 1. The function g(ε) is plotted over the interval [−ε0 , 0]; the value ε = −ε0 is indicated by the vertical dotted line. For T > Tg , the maximum ε∗ of g(ε) satisfies −ε0 < ε∗ < 0, while for T ≤ Tg , the maximum of g(ε) over the interval [−ε0 , 0] is reached at the lower bound ε = −ε0
value). Let us first consider the maximum ε* of g(ε) over the entire real line. Taking the derivative of g(ε), one has

g′(ε) = −2ε/J² − 1/T.    (1.145)

From g′(ε) = 0, we find

ε* = −J²/(2T).    (1.146)

As g(ε) is a parabola with negative curvature, it is increasing for ε < ε* and decreasing for ε > ε*. If ε* > −ε_0, then g_max(ε_0) = g(ε*), so that

Z_typ ∼ e^{N g(ε*)}.    (1.147)

The condition ε* > −ε_0 translates into T > Tg, where the so-called glass transition temperature Tg is defined as

Tg = J / (2√(ln 2)).    (1.148)

For ε* < −ε_0, or equivalently T < Tg, g(ε) is a decreasing function of ε over the entire interval [−ε_0, ε_0], so that g_max(ε_0) = g(−ε_0), and

Z_typ ∼ e^{N g(−ε_0)}.    (1.149)
From these estimates of Z_typ, the free energy F = −T ln Z_typ and the entropy S = −∂F/∂T can be computed. For T > Tg, one finds

F = −N ( T ln 2 + J²/(4T) ),    (1.150)

leading for the entropy to

S = N ( ln 2 − J²/(4T²) ).    (1.151)

For T < Tg, we have

F = −T N g(−ε_0) = −N J √(ln 2).    (1.152)

The free energy does not depend on temperature in this range, so that the corresponding entropy vanishes:

S = −∂F/∂T = 0,   T < Tg.    (1.153)

It can also be checked that the entropy given in Eq. (1.151) for T > Tg vanishes continuously for T → Tg. Hence, the temperature Tg corresponds to a glass transition temperature, where the entropy goes to zero when lowering the temperature down to Tg, and remains zero below Tg. Actually, to make the statement sharper, only the entropy density S/N goes to zero for T < Tg, in the infinite N limit. Computing subleading corrections to the entropy, one finds that the entropy S is independent of N, but nonzero, for T < Tg. The entropy is then intensive in this temperature range, meaning that only a finite number of configurations, among the 2^N ones a priori available, are effectively occupied: the system is trapped in the lowest energy configurations.
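The glass transition of the REM can be observed directly in a small numerical experiment (a sketch added here, with N = 20 and J = 1 chosen arbitrarily): drawing the 2^N energies from the distribution (1.138) and computing the entropy from S = ln Z + ⟨E⟩/T shows the entropy density dropping to nearly zero below Tg = J/(2√ln 2) ≈ 0.6, in agreement with Eqs. (1.151) and (1.153):

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 20, 1.0
# P(E) in Eq. (1.138) is Gaussian with zero mean and variance N J^2 / 2
E = rng.normal(0.0, np.sqrt(N / 2) * J, size=2 ** N)

for T in [2.0, 1.0, 0.7, 0.5, 0.3]:
    w = np.exp(-(E - E.min()) / T)           # shift by E_min for numerical stability
    Z = w.sum()
    E_avg = (E * w).sum() / Z
    S = np.log(Z) - E.min() / T + E_avg / T  # S = ln Z + <E>/T, undoing the shift
    S_theory = N * np.log(2) - J**2 * N / (4 * T**2)   # Eq. (1.151), valid for T > Tg
    print(T, S / N, max(S_theory, 0.0) / N)
```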
1.6 Exercises

1.1 Free energy and entropy
Evaluate the free energy and the entropy of the following models:
(a) A set of N harmonic oscillators defined by a real variable x_i, with total energy E = Σ_{i=1}^N ½ λ x_i², λ being the stiffness of the harmonic oscillator;
(b) A paramagnetic spin model with spins s_i = ±1 and total energy E = −Σ_{i=1}^N h s_i.

1.2 Energy equipartition
(a) Derive the energy equipartition relation given in Eq. (1.70) for a single degree of freedom x with quadratic energy ½λx² in the canonical ensemble.
(b) Generalize the result to the case of a system with N degrees of freedom x_i (i = 1, ..., N) with a total energy E = Σ_{i=1}^N ½ λ_i x_i², where each degree of freedom has a specific stiffness λ_i.
(c) Consider a one-dimensional lattice with N particles at positions y_j = ja + u_j (j = 1, ..., N), where a is the lattice spacing and u_j the displacement of particle j with respect to its rest position. As a simple model of a crystal, neighboring particles are linked by a linear spring, resulting in a total energy E = Σ_{j=1}^N ½ λ (u_{j+1} − u_j)². Due to the elastic interactions, the total energy is no longer a sum of terms depending on a single degree of freedom, and the displacements u_j are correlated. In this case, the energy equipartition relation cannot be applied in a direct way. However, this problem can be by-passed by introducing the complex Fourier modes

û_q = Σ_{j=1}^N u_j e^{iqja},   q = 2kπ/N, k = 0, ..., N − 1,    (1.154)

associated with the displacements u_j. Equivalently, u_j is expressed in terms of the Fourier modes as u_j = N^{−1} Σ_q û_q e^{−iqja}. Compute the total energy in terms of the Fourier amplitudes û_q and show that these amplitudes satisfy an energy equipartition relation.

1.3 Ising chain
Consider the one-dimensional Ising model in a magnetic field, defined by the energy

E = −J Σ_{i=1}^N s_i s_{i+1} − h Σ_{i=1}^N s_i,    (1.155)

with spin variables s_i = ±1 (i = 1, ..., N); J is the spin coupling constant and h is the magnetic field.
(a) Evaluate the partition function Z using the transfer matrix method, i.e., show that Z can be written in the form Z = Tr T^N, where T is a 2 × 2 matrix (called the transfer matrix) to be specified, and Tr is the trace operator.
(b) Deduce the free energy per spin f = lim_{N→∞} −(T/N) ln Z, where T is the temperature.
(c) Evaluate the magnetization m and the susceptibility χ = ∂m/∂h.

1.4 Paramagnetic model in a random field
Consider a paramagnetic spin model in a quenched random field defined by the energy

E = −Σ_{i=1}^N h_i s_i,    (1.156)

with spin variables s_i = ±1, and where the parameters h_i are independent and identically distributed quenched random variables drawn from a Gaussian distribution with zero mean and variance h_0². For a given sample of the random field h_i, the magnetization is defined as m = N^{−1} Σ_i ⟨s_i⟩, where ⟨s_i⟩ is the average value of the spin s_i at temperature T.
(a) Determine the magnetization m averaged over the disorder (i.e., over the values of the random field h_i). (b) Write the expression of the Edwards–Anderson parameter q (the disorder average of ⟨s_i⟩², see Sect. 1.5.2) as an integral, and determine simple approximate expressions of q in both the high-temperature and low-temperature limits.
References

1. Balescu, R.: Equilibrium and Nonequilibrium Statistical Mechanics. Wiley, New York (1975)
2. Chandler, D.: Introduction to Modern Statistical Mechanics. Oxford (1987)
3. Derrida, B.: Random-energy model: an exactly solvable model of disordered systems. Phys. Rev. B 24, 2613 (1981)
4. Edwards, S.F., Anderson, P.W.: Theory of spin glasses. J. Phys. F 5, 965 (1975)
5. Goldstein, H.: Classical Mechanics. Addison-Wesley, Boston (1980)
6. Hill, T.L.: An Introduction to Statistical Thermodynamics. Dover, New York (1986)
7. Le Bellac, M.: Quantum and Statistical Field Theory. Oxford Science Publications, Oxford (1992)
8. Mattis, D.C.: Solvable spin systems with random interactions. Phys. Lett. A 56, 421 (1976)
9. McQuarrie, D.A.: Statistical Mechanics. Harper and Row, Manhattan (1976)
10. Mézard, M., Parisi, G., Virasoro, M.A.: Spin-Glass Theory and Beyond. World Scientific, Singapore (1987)
11. Nelson, D.R., Fisher, M.E.: Soluble renormalization groups and scaling fields for low-dimensional Ising systems. Ann. Phys. (N.Y.) 91, 226 (1975)
12. Sherrington, D., Kirkpatrick, S.: Solvable model of a spin-glass. Phys. Rev. Lett. 35, 1792 (1975)
Chapter 2
Non-stationary Dynamics and Stochastic Formalism
In the first chapter of this book, we have considered the stationary properties of physical systems composed of a large number of particles, using as a fundamental statistical object the joint distribution of all the degrees of freedom of the system (for instance, positions and velocities, or spin variables). This steady state is expected to be reached after a transient regime, during which the macroscopic properties of the system evolve with time. Describing the statistical state of the system during this transient regime is also certainly of interest. However, there is no known simple postulate (similar to the postulate of equiprobability of configurations having a given energy) to characterize the N -particle probability distribution in this time-dependent regime. Still, one can resort to the generic mathematical formalism of stochastic processes in order to describe statistically the time evolution of some specific variables of interest, like the position or velocity of a probe particle immersed in a fluid. This formalism is presented in Sect. 2.1, in the simplest case of Markov processes. The example of the random evolution of a single degree of freedom in a noisy environment (diffusive motion), leading to the Langevin and Fokker–Planck equations, is discussed, respectively, in Sects. 2.2 and 2.3. In addition, there exists situations in which this random evolution can be much faster or much slower than a priori expected, leading to anomalous diffusion. A brief account of scaling arguments allowing for a qualitative understanding of anomalous diffusion is given in Sect. 2.4. Then first return time properties of random walks are discussed in Sect. 2.5, in connection to applications like stochastic on–off intermittency and avalanche dynamics. Finally, generic issues regarding the convergence to equilibrium statistics in the framework of Markovian stochastic processes are presented in Sect. 2.6. Interestingly, the equilibrium distribution may not exist in some cases, leading to an endless relaxation called aging regime. An example of such a situation is also provided.
2.1 Markovian Stochastic Processes and Master Equation

2.1.1 Definition of Markovian Stochastic Processes

Let us start with some basic considerations on stochastic processes. For more advanced reading on this topic, we refer the reader, for instance, to Ref. [20]. Roughly speaking, a stochastic process is a dynamical process whose evolution is random, and depends on the presently occupied state and possibly on the history of the system. Considering first a discrete-time process (t = 0, 1, 2, ...), with a finite number N of configurations C, we denote as T(C_{t+1}|C_t, C_{t−1}, ..., C_0) the probability for the process to jump to a new configuration C_{t+1} between times t and t + 1, given the whole history (C_t, C_{t−1}, ..., C_0). Note that C_{t+1} can a priori be any of the N possible configurations, including the configuration C_t itself. The transition probability T(C_{t+1}|C_t, C_{t−1}, ..., C_0) can be considered as a conditional probability, so that the following normalization condition holds:

Σ_{C_{t+1}} T(C_{t+1}|C_t, C_{t−1}, ..., C_0) = 1.    (2.1)
Such a stochastic process is said to be Markovian if the transition probability T(C_{t+1}|C_t, C_{t−1}, ..., C_0) depends only on the configuration C_t occupied at time t, and not on previously occupied configurations. In short, Markovian processes are said to be "memoryless". The transition probability is then defined without explicit reference to time t. In the following, we denote as T(C′|C) the transition probability from configuration C to configuration C′. This transition probability satisfies the normalization condition

Σ_{C′} T(C′|C) = 1.    (2.2)
The above definition of discrete-time Markovian stochastic processes (also called Markov chains) can be rather straightforwardly extended to several other cases of practical importance. First, the number of discrete configurations can be infinite, and this case is recovered by taking the limit N → ∞ in the above definition. If configurations are no longer discrete, but are defined by a continuous variable y, a probability density T̃(y′|y) needs to be introduced, in such a way that T̃(y′|y)dy′ is the probability to choose a new configuration in the interval [y′, y′ + dy′], starting from a given configuration y. The equivalent of the normalization condition Eq. (2.2) now reads

∫_{−∞}^{∞} T̃(y′|y) dy′ = 1.    (2.3)
Another generalization consists in replacing the discrete-time steps by a continuoustime evolution. Interestingly, continuous-time dynamics can be obtained from the discrete-time dynamics in the limit of a vanishing time step. Hence, instead of using
a time step Δt = 1 as above, we now take an infinitesimal step Δt = dt. In order to obtain a meaningful limit when dt → 0, the transition probabilities T(C′|C) from configuration C to configuration C′ have to scale with dt in the following way:

T(C′|C) = W(C′|C) dt + O(dt²)   if C′ ≠ C,
T(C|C) = 1 − Σ_{C′(≠C)} W(C′|C) dt + O(dt²),    (2.4)

where W(C′|C) is independent of dt. In other words, the evolution of continuous-time Markovian stochastic processes is characterized by transition rates W(C′|C), such that W(C′|C)dt is the probability for the process to go from configuration C to a new configuration C′ in a time interval [t, t + dt]. Finally, in the case of a continuous-time process represented by a continuous variable y, a density of transition rates w(y′|y) should be defined, in such a way that w(y′|y)dy′ dt is the probability for the process to reach a value in the interval [y′, y′ + dy′] at time t + dt, starting from a value y at time t. Beyond formal definitions and calculations, it is also important to be able to simulate Markovian stochastic processes on a computer. A brief account of elementary simulation methods is provided in Appendix B.
2.1.2 Master Equation and Detailed Balance

The master equation describes the time evolution of the probability to occupy a given configuration. The simplest situation corresponds to discrete time and discrete configurations. The evolution of the probability P_t(C) to occupy configuration C at time t is given by

P_{t+1}(C) = Σ_{C′} T(C|C′) P_t(C′).    (2.5)
The probability P_{t+1}(C) is thus simply a sum over all possible configurations C′ of the probability to go from C′ to C, weighted by the probability to occupy the configuration C′ at time t. It is easy to check, by summing over all configurations C and using the normalization equation (2.2), that Eq. (2.5) conserves the normalization of the probability P_t(C), namely, if Σ_C P_t(C) = 1, then Σ_C P_{t+1}(C) = 1. For continuous configurations y, a density p_t(y) has to be introduced (i.e., p_t(y) dy is the probability that the configuration at time t belongs to the interval [y, y + dy]), and the evolution equation reads

p_{t+1}(y) = ∫_{−∞}^{∞} T̃(y|y′) p_t(y′) dy′.    (2.6)
The evolution of continuous-time processes can be derived from this discrete-time equation, using again the limit of a vanishing time step dt. Considering a continuous-time process with discrete configurations, we denote as P(C, t) the probability to be in configuration C at time t. Combining Eqs. (2.4) and (2.5), we get

P(C, t + dt) = Σ_{C′(≠C)} W(C|C′) dt P(C′, t) + ( 1 − Σ_{C′(≠C)} W(C′|C) dt ) P(C, t).    (2.7)

Expanding the left-hand side of this last equation to first order in dt as

P(C, t + dt) = P(C, t) + dP/dt (C, t) dt + O(dt²),    (2.8)

we eventually find, in the limit dt → 0, that the probability P(C, t) evolves according to the master equation:

dP/dt (C, t) = −P(C, t) Σ_{C′(≠C)} W(C′|C) + Σ_{C′(≠C)} W(C|C′) P(C′, t).    (2.9)
The first term in the right-hand side can be interpreted as a "loss" term (i.e., the sum of all the possibilities to exit configuration C), while the second term can be thought of as a "gain" term (the sum of all the possibilities to arrive at configuration C, starting from any other configuration). A similar equation is obtained in the case of continuous configurations y for the probability density p(y, t), by basically replacing discrete sums by integrals in Eq. (2.9):

∂p/∂t (y, t) = −p(y, t) ∫_{−∞}^{∞} dy′ w(y′|y) + ∫_{−∞}^{∞} dy′ w(y|y′) p(y′, t).    (2.10)
Further generalization to multidimensional configurations y = (y1 , . . . , yn ) is also straightforward. From now on, we will work mainly with discrete configurations as far as formal and generic calculations are concerned, keeping in mind that the continuous variable case can be obtained by switching from discrete to continuous notations. An interesting property of continuous-time master equations is the notion of detailed balance, which is related to the steady-state (i.e., time-independent) solution of the master equation. From Eq. (2.9), a time-independent solution P(C) satisfies, for all configurations C,
Σ_{C′(≠C)} [−W(C′|C) P(C) + W(C|C′) P(C′)] = 0.    (2.11)
It may happen, for some specific stochastic processes, that the term between brackets vanishes for all C′, namely,
43
W (C |C) P(C) = W (C|C ) P(C ).
(2.12)
This situation is referred to as detailed balance. Processes satisfying detailed balance are much easier to handle analytically. Besides this practical advantage, detailed balance also plays an important role in the stochastic modeling of microscopic physical processes (i.e., at the molecular scale). This is due to the fact that detailed balance can be interpreted as the stochastic counterpart of the microreversibility property satisfied by the Hamiltonian dynamics (see Sect. 1.1). Indeed, the probability to observe, once a statistical steady state is reached, an elementary trajectory from C at time t to C′ at time t + dt is W(C′|C) dt P(C), while the probability to observe the reverse trajectory is W(C|C′) dt P(C′). The equality of these two probabilities, to be thought of as a statistical microreversibility, precisely yields the detailed balance relation (2.12). Hence, in order to model, at a coarse-grained level, the dynamics of a microscopic physical system through a Markovian stochastic process, it is natural to assume that the process satisfies detailed balance (in addition to the appropriate conservation laws, like energy conservation).
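As an illustration of the detailed balance relation (2.12) (a sketch added here; the three-state energies and the Metropolis-like choice of rates are assumptions made for the example, not prescribed by the text), one can build transition rates that satisfy Eq. (2.12) with respect to a Boltzmann distribution and check that the master equation (2.9) relaxes to that distribution:

```python
import numpy as np

T = 1.5
E = np.array([0.0, 0.8, 2.0])               # energies of three configurations (assumed)
P_eq = np.exp(-E / T); P_eq /= P_eq.sum()   # target equilibrium distribution

# Metropolis-like rates W[a, b] = W(a|b) = min(1, exp(-(E_a - E_b)/T)) for a != b
n = len(E)
W = np.array([[min(1.0, np.exp(-(E[a] - E[b]) / T)) if a != b else 0.0
               for b in range(n)] for a in range(n)])

# Detailed balance check: W(C'|C) P(C) = W(C|C') P(C')
for a in range(n):
    for b in range(n):
        assert np.isclose(W[a, b] * P_eq[b], W[b, a] * P_eq[a])

# Master equation dP/dt = M P, with M = W - diag(total exit rates)
M = W - np.diag(W.sum(axis=0))
P = np.array([1.0, 0.0, 0.0])
dt = 0.01
for _ in range(20000):
    P = P + dt * M @ P
print(P, P_eq)   # P has relaxed to the equilibrium distribution
```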
2.1.3 A Simple Example: The One-Dimensional Random Walk

A simple and illustrative example of a stochastic process is the one-dimensional random walk, where a "particle" moves at random on a one-dimensional lattice. Let us consider first the discrete-time case: a particle can take only discrete positions x = ..., −2, −1, 0, 1, 2, ... on a line. Between times t and t + 1, the particle randomly jumps to one of the two neighboring sites, so that x_{t+1} = x_t + ε_t, with ε_t = ±1 with equal probabilities. The random variables ε_t and ε_{t′}, with t ≠ t′, are independent and identically distributed. An illustration of a random walk is given in Fig. 2.1.
Fig. 2.1 Illustration of a random walk
The average value and the variance of this process can be derived straightforwardly. We first note that ⟨x_{t+1}⟩ = ⟨x_t⟩, so that ⟨x_t⟩ = ⟨x_0⟩ for all t (the notation ⟨...⟩ denotes an ensemble average, that is, an average over a very large number of samples of the same process; it may thus depend on time). For instance, if the walk starts with probability 1 from x_0 = 0, then all subsequent averages ⟨x_t⟩ = 0. Let us now compute the variance of the process, defined as

Var(x_t) = ⟨x_t²⟩ − ⟨x_t⟩².    (2.13)

We assume for simplicity that ⟨x_t⟩ = 0, so that Var(x_t) = ⟨x_t²⟩ (the generalization to ⟨x_t⟩ ≠ 0 is however straightforward). From x_{t+1} = x_t + ε_t, we get

x_{t+1}² = x_t² + 2 x_t ε_t + 1,    (2.14)

taking into account that ε_t² = 1. Computing the ensemble average of Eq. (2.14) yields

⟨x_{t+1}²⟩ = ⟨x_t²⟩ + 2⟨x_t ε_t⟩ + 1,    (2.15)

using the fact that x_t depends only on ε_{t′} with t′ < t, so that x_t and ε_t are independent random variables. As ⟨ε_t⟩ = 0, it follows that ⟨x_{t+1}²⟩ = ⟨x_t²⟩ + 1, so that ⟨x_t²⟩ = ⟨x_0²⟩ + t. If x_0 = 0 with probability 1, one has ⟨x_0²⟩ = 0 and ⟨x_t²⟩ = t. This means that the typical position reached by the walk after t steps is of the order of √t. The present random walk problem bears a direct relationship to the Central Limit Theorem [9, 17] (see Chap. 8). As the position x_t of the random walk can be expressed as x_t = Σ_{t′=0}^{t−1} ε_{t′}, where (ε_0, ..., ε_{t−1}) are independent and identically distributed random variables, the distribution of the position of the random walk can be approximated for large time t, using the Central Limit Theorem, as

P(x, t) ≈ (1/√(2πt)) e^{−x²/2t}.    (2.16)
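These predictions are straightforward to check by direct simulation (an illustrative sketch with arbitrary sample sizes): the empirical variance of many independent walks grows linearly in t, and the fraction of walks within two standard deviations matches the Gaussian estimate (2.16):

```python
import numpy as np

rng = np.random.default_rng(2)
n_walks, t_max = 20000, 1000

steps = rng.choice([-1, 1], size=(n_walks, t_max))
x = np.cumsum(steps, axis=1)                         # x_t for each walk, t = 1 ... t_max

for t in [10, 100, 1000]:
    print(t, x[:, t - 1].mean(), x[:, t - 1].var())  # mean ~ 0, variance ~ t

# Gaussian prediction (2.16): fraction of walks with |x_t| < 2 sqrt(t) should be ~ 0.954
t = t_max
frac = np.mean(np.abs(x[:, t - 1]) < 2 * np.sqrt(t))
print(frac)
```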
Alternatively, one may endow the random walk problem with a continuous-time dynamics. Labeling with an integer n the sites of the lattice, the transition rate W(n′|n) from site n to site n′ is given by

W(n′|n) = ν/2   if n′ = n ± 1,
W(n′|n) = 0     otherwise,    (2.17)

where ν is a characteristic frequency (the inverse of a time scale) of the process. The master equation reads

dP_n/dt = −Σ_{n′(≠n)} W(n′|n) P_n(t) + Σ_{n′(≠n)} W(n|n′) P_{n′}(t).    (2.18)
2.1 Markovian Stochastic Processes and Master Equation
45
Replacing the transition rates by their expression given in Eq. (2.17), one finds ν ν d Pn = −ν Pn (t) + Pn+1 (t) + Pn−1 (t). dt 2 2
(2.19)
The evolution of the probability distribution Pn (t) can be evaluated from Eq. (2.19), for instance, by integrating it numerically. However, one may be interested in making analytical predictions in the large time limit, and such a discrete-space equation is not easy to handle in this case. To this aim, it is thus useful to use a procedure called “continuous limit”, through which the discrete-space equation (2.19) can be approximated by a partial differential equation. To be more specific, let us call a the lattice spacing (which was set above to a = 1). At large time t 1/ν, the distribution Pn (t) is expected to vary over spatial scales much larger than the lattice spacing a; in other words, one has (2.20) |Pn+1 (t) − Pn (t)| Pn (t). Plotting Pn (t) as a function of space, it thus appears essentially continuous. Hence, we postulate the existence of a distribution p(x, t) of the continuous variable x, such that the discrete-space distribution can be approximated as Pn (t) ≈ a p(na, t). The prefactor a is included to ensure a correct normalization, n Pn (t) = 1. Indeed, one has for a → 0 ∞ Pn (t) = a p(na, t) → p(x, t) d x. (2.21) n
n
−∞
For ∞ consistency, it is thus necessary to assume that p(x, t) is normalized such that −∞ p(x, t) d x = 1. Replacing Pn (t) by a p(na, t) in the master equation (2.19), one obtains ν ν ∂p (x, t) = −νp(x, t) + p(x + a, t) + p(x − a, t). ∂t 2 2
(2.22)
As a is small, one can expand p(x ± a, t) to second order in a, leading to p(x ± a, t) = p(x, t) ± a
∂p a2 ∂ 2 p (x, t) + (x, t) + O(a 3 ). ∂x 2 ∂x2
(2.23)
The linear terms in a appearing in Eq. (2.22) cancel out, so that this equation reduces to νa 2 ∂ 2 p ∂p (x, t) = (x, t), (2.24) ∂t 2 ∂x2 which is called the diffusion equation. This equation appears in numerous problems in physics, like the diffusion of heat in a material, or the diffusion of dye in water for instance. The coefficient 21 νa 2 has to take a finite value D > 0 for the equation to be well defined. As the lattice spacing a goes to zero, it is thus necessary that ν
46
2 Non-stationary Dynamics and Stochastic Formalism
simultaneously goes to infinity, which means that the “microscopic” process appears very fast on the scale of observation. Equation (2.24) has several simple solutions of interest. For instance, if the diffusing particle is bound to stay on a segment [−L , L], the long time limit distribution is a flat and time-independent distribution over the segment, p(x) = (2L)−1 . In other words, diffusion tends to flatten, or smoothen, the distribution. In contrast, if the particle can diffuse on the entire line without bound, the distribution p(x, t) never reaches a steady-state regime, but rather enters a scaling regime in which the distribution keeps broadening with time, with a well-defined Gaussian shape: p(x, t) = √
1 4π Dt
e−x
2
/4Dt
.
(2.25)
Note the analogy with the result obtained from the Central Limit Theorem in the discrete-time case—see Eq. (2.16).
2.2 Langevin Equation 2.2.1 Phenomenological Approach The above random walk example was quite simple to investigate, but had little explicit connection with physical systems. We now present another standard example based on a physical phenomenology. Let us imagine a probe particle immersed in a fluid, such that the size of the particle is small at the macroscopic scale, but still much larger than the typical size of the molecules of the fluid. For the sake of simplicity, we restrict the presentation to a one-dimensional system, but the more realistic threedimensional situation would follow the same line. We choose the mass of the probe particle as the unit mass. The acceleration of the particle is then governed by the force Fcoll exerted by the collisions with the other particles: dv = Fcoll , (2.26) dt where v is the velocity of the probe particle. Since the collisional force Fcoll is strongly fluctuating, the basic idea is to decompose it into a (velocity-dependent) average force, and a purely fluctuating (or noise) part: Fcoll = Fcoll |v + ξ(t).
(2.27)
Here, the average force Fcoll |v is computed as an average over a large number of samples of the process, conditioned to a given value v of the velocity. By definition, the noise ξ(t) has zero mean, ξ(t) = 0. In principle, the statistics of the noise could also depend on the velocity v. We assume here that it is independent of v, but we
2.2 Langevin Equation
47
will come back to this point below. To proceed further, it is necessary to choose a specific model for both Fcoll and ξ(t). The average force Fcoll can be interpreted as an effective friction force, which slows down the probe particle; it is thus natural to choose, as a first approximation, a linear friction force Fcoll = −γ v, with γ > 0 a friction coefficient. Then, a model of the noise should be given. Besides the property ξ(t) = 0, its two-time correlation should be specified. Intuitively, one expects collisions occurring at different times to be essentially uncorrelated, so that one should have ξ(t) ξ(t ) = 0 for |t − t | τcol , where τcol is the typical duration of a collision. Taking into account time-translation invariance, the correlation function of ξ(t) may thus be written as (2.28) ξ(t) ξ(t ) = C(t − t ), where C(u) is an even function that decays on a characteristic time scale τcol , and converges rapidly to zero when |u| → ∞. In practice, the dynamics of the velocity v of the probe particle occurs on a time scale γ −1 that is much larger than τcol . In this limit, the only quantity that plays a role in the dynamics of v is the integral of the correlation function that we denote as , ∞ C(t) dt . (2.29)
= −∞
Since the detailed shape of C(t) plays no role in this limit, it is convenient to replace C(t) by a delta function δ(t − t ), keeping the same value of the integral (a basic introduction to Dirac delta function can be found in Appendix A). Altogether, Eq. (2.26) can be rewritten as follows: dv = −γ v + ξ(t), dt with ξ(t) = 0,
ξ(t) ξ(t ) = δ(t − t ) .
(2.30)
(2.31)
Such an equation is called a linear Langevin equation, with additive white noise. The equation is called linear because the deterministic part of the dynamics, the term −γ v, is linear with respect to the variable v. Additive noise simply means that the noise is introduced in the equation as an additive term which is independent of the variable v. We will see in Sect. 2.2.3 more complicated situations where the noise term is coupled to the dynamical variable.
48
2 Non-stationary Dynamics and Stochastic Formalism
2.2.2 Basic Properties of the Linear Langevin Equation We now study some elementary properties of the linear Langevin equation (2.30), namely, the ensemble averages v(t) and v(t)2 . For simplicity, we take as initial condition a fixed value v(0) = v0 . We first note, computing the ensemble average of Eq. (2.30): d v(t) = −γ v(t), (2.32) dt that the ensemble-averaged velocity v(t) obeys the same equation as the nonaveraged velocity, except that noise is now absent. This property is specific to the linear Langevin equation, and would not be present if we had included a non-linear dependence on v in the friction force—e.g., Fcoll = −γ v − γ3 v3 . The solution of Eq. (2.32) is a decaying exponential: v(t) = v0 e−γ t .
(2.33)
More interestingly, the effect of the noise has a deep impact on the evolution of the variance of the velocity, Var[v(t)] = v(t)2 − v(t)2 . In order to compute v(t)2 , we first determine the explicit time dependence of v(t), considering ξ(t) as an arbitrary given function. Following standard mathematical methods, the general solution of Eq. (2.30) is given by the sum of the general solution of the homogeneous equation (i.e., the noiseless equation) and of a particular solution of the full equation. The general solution of the homogeneous equation is vh (t) = A0 e−γ t , where A0 is an arbitrary constant. In order to determine a particular solution, one can use the so-called “variation of the constant” method, which indicates that such a solution should be looked for in the form vp (t) = A(t) e−γ t , that is, simply replacing the constant A0 in the solution vh (t) of the homogeneous equation by a function A(t) to be determined. Inserting vp (t) in Eq. (2.30), we get d A −γ t e = ξ(t) dt
whence the solution
t
A(t) =
(2.34)
eγ t ξ(t ) dt
(2.35)
0
follows—since we look for a particular solution at this stage, there is no need to add a constant term to Eq. (2.35). Altogether, one finds for v(t), taking into account the initial condition v(0) = v0 , v(t) = v0 e−γ t + e−γ t
t 0
Computing v(t)2 yields
eγ t ξ(t ) dt .
(2.36)
2.2 Langevin Equation
v(t) = 2
v02 e−2γ t
49
+e
−2γ t
t
γ t
e ξ(t ) dt
2 + 2v0 e
−2γ t
0
t
eγ t ξ(t ) dt . (2.37)
0
Now taking an ensemble average, the last term vanishes because ξ(t) = 0, and we get
t 2 2 2 −2γ t −2γ t γ t + e e ξ(t ) dt v(t) = v0 e . (2.38) 0
The first term on the right-hand side of Eq. (2.38) is precisely v(t)2 , so that Var[v(t)] = e
−2γ t
t
γ t
e ξ(t ) dt
2 .
(2.39)
0
The square of the integral can be expanded as a product of two integrals, which in turn can be converted into a double integral:
t
γ t
e ξ(t ) dt
2
0
t
γ t
0
t
e ξ(t ) dt eγ t ξ(t ) dt 0 0 t t dt dt eγ (t +t ) ξ(t )ξ(t ) =
=
(2.40) (2.41)
0
so that Eq. (2.39) eventually turns into Var[v(t)] = e−2γ t
t
dt
0
t
dt eγ (t +t ) ξ(t )ξ(t )
(2.42)
0
(we recall that the ensemble average can be permuted with linear operations like integrals or derivatives). Using the expression (2.31) of ξ(t )ξ(t ), we get Var[v(t)] = e−2γ t
t 0
dt
t
dt eγ (t +t ) δ(t − t ).
(2.43)
0
The second integral in Eq. (2.43) can easily be computed, thanks to the properties of the delta function, leading to
t
dt eγ (t +t ) δ(t − t ) = e2γ t .
(2.44)
0
We thus eventually find, after integration of Eq. (2.43), Var[v(t)] =
1 − e−2γ t . 2γ
(2.45)
50
2 Non-stationary Dynamics and Stochastic Formalism
Hence, the variance starts from a zero value at t = 0 (the value v0 at t = 0 is nonrandom), and progressively grows until reaching the asymptotic limit /(2γ ). As v(t) → 0 when t → ∞, the variance reduces to v2 at large time, and this value can be identified with the equilibrium average. It is known from equilibrium statistical physics (see Sect. 1.2.4) that 21 v2 eq = 21 k B T (equipartition relation), where T is the temperature of the surrounding liquid, and k B the Boltzmann constant—we recall that the mass of the probe particle was set to unity.1 Hence, equilibrium statistical physics imposes a relation between the two phenomenologically introduced coefficients
and γ , namely, = 2γ k B T . In addition, using a slight generalization of the above calculation, it is also straightforward to show that for large times (t, t γ −1 ), the correlation of v decays exponentially,
−γ |t−t | e , (2.46) v(t)v(t ) = 2γ which reduces to the infinite time limit of Eq. (2.45) for t = t . Before considering more general forms of Langevin equations, it is useful to say a word on the practical way to integrate numerically the Langevin equation (2.30). Similar to ordinary differential equations, one needs to discretize the equation using small time steps t. This discretization is obtained by computing the integral of Eq. (2.30) over a time interval [t, t + t], yielding
t+t
v(t + t) = v(t) − γ v(t)t +
ξ(t )dt
(2.47)
t
t+t v(t )dt ≈ v(t)t, valid for small t. From Eq. (2.31), with the approximation t t+t the quantity W ≡ t ξ(t )dt is a random variable with zero mean and variance
t+t
(W ) = 2
dt t
t+t
dt ξ(t )ξ(t ) = t .
(2.48)
t
From its definition, W can be interpreted as a sum of a very large number of statistically independent contributions, so that it is natural to assume that the distribution of W is Gaussian from the Central Limit Theorem—see Chap. 8. Hence, we assume that W is a Gaussian random variable with zero mean and variance t (more rigorous justifications can be found, e.g., in Ref. [8]). Discretized stochastic trajectories can thus be numerically obtained from Eq. (2.47), drawing at each time step a new random value of W (values of W at different time steps are statistically independent). Time-dependent average values of observables are then obtained by averaging a given observable (e.g., v(t) or v(t)2 ) over many independent stochastic trajectories with the same initial condition.
1
Reintroducing the mass m, the equipartition relation reads 21 mv 2 eq = 21 k B T .
2.2 Langevin Equation
51
2.2.3 More General Forms of the Langevin Equation We have studied in the previous section the simplest version of the Langevin equation, namely, the linear Langevin equation with additive noise. We now wish to briefly mention several important generalizations of this equation. From now on, we will generically call x the variable evolving according to the Langevin equation. It may be any type of “mesoscopic” physical observable, which is sensitive to the presence of fluctuations, but evolves on time scales that are sufficiently large for a Langevin type of description to be relevant. In the example of the probe particle discussed in Sect. 2.2.1, this means that the probe particle is much heavier than the molecules it collides with, but still much lighter than, say, a grain of sand which would not feel any fluctuations in the force exerted by the surrounding fluid (if the fluid is at rest). A first generalization of the linear Langevin equation is to consider a non-linear deterministic term in the equation: dx = Q(x) + ξ(t), dt
(2.49)
where Q(x) is an arbitrary function of x. The white noise ξ(t) still satisfies ξ(t) = 0 and ξ(t)ξ(t ) = δ(t − t ). In the following subsections, we will generically consider this case when dealing with the Langevin equation. More generally, one may consider Langevin equations coupling an arbitrary number N of variables xi , d xi = Q i (x1 , . . . , x N ) + ξi (t), dt
i = 1, . . . , N ,
(2.50)
where the N stochastic variables ξi (t) are white noises with correlations ξi (t)ξ j (t ) = i j δ(t − t ) ,
i, j = 1, . . . , N .
(2.51)
Another important generalization of the Langevin equation is the one with multiplicative noise, in contrast to additive noise. For a single variable, the Langevin equation with multiplicative noise takes the generic form dx = Q(x) + B(x) ξ(t) dt
(2.52)
with a white noise ξ(t) of unit amplitude, ξ(t)ξ(t ) = δ(t − t ) (the amplitude
previously considered can be reabsorbed into the function B). In this case, the amplitude of the noise term B(x) ξ(t) depends on the value of the variable x. This is important, for instance, in the modeling of absorbing phase transitions (see Chap. 3) where fluctuations vanish in the absorbing state, when there are no more particles in the system. The specificity of the multiplicative Langevin equation (2.52) is that it requires the specification of a discretization scheme in order to be well defined, as different discretization schemes lead to different results. A rigorous description
52
2 Non-stationary Dynamics and Stochastic Formalism
of Eq. (2.52) can be achieved within the mathematical framework of stochastic calculus [8]. Here, we however stick to a heuristic viewpoint and simply describe the two main interpretations of Eq. (2.52) at an elementary level. From a mathematical perspective, the most natural interpretation is the Ito one, corresponding to the following discretization of Eq. (2.52). Introducing an increasing sequence of discrete times ti = it, the Ito interpretation corresponds to x(ti+1 ) = x(ti ) + Q x(ti ) t + B x(ti ) Wi ,
(2.53)
where Wi is a Gaussian random variable with zero mean and variance t. An alternative interpretation, commonly used in the physics community, is the Stratonovich one, which is expressed in terms of the discretized equation x(ti+1 ) = x(ti ) + Q x(ti ) t + B
x(ti ) + x(ti+1 ) Wi . 2
(2.54)
Note that this last equation is implicit in terms of x(ti+1 ) and needs to be expanded up to order t for practical use (see below). Both formulations lead to consistent interpretations of Eq. (2.52). The choice of one or the other may in some cases be related to the problem at hand. For instance, starting from a problem with a finite, but small correlation time of the noise, the correct interpretation in the white noise limit is the Stratonovich one [8]. To understand why the two discretizations (2.53) and (2.54) lead to different results, one can expand the noise term in Eq. (2.54) as
1 1 B x(ti ) + xi Wi = B x(ti ) Wi + B x(ti ) xi Wi + o(t) (2.55) 2 2 with the notation x(ti ). Now the point is that Wi ∼ (t)1/2 , and xi = x(ti+1 ) −1/2 thus xi = B x(ti ) Wi + o(t ). It follows that
1 1 B x(ti ) + xi Wi = B x(ti ) Wi + B x(ti ) B x(ti ) (Wi )2 + o(t) , 2 2 (2.56) not be neglected because it is of the same order where (Wi )2 ∼ t andcan thus in t as the drift term Q x(ti ) t of the Langevin equation. Hence, an additional term 21 B x(ti ) B x(ti ) (Wi )2 appears at order t in the Stratonovich scheme with respect to the Ito scheme, and both discretizations are thus not equivalent. Besides, another type of generalization consists in changing the properties of the noise, assuming that the noise has a finite correlation time. A typical case is that of an exponentially decaying correlation function ξ(t)ξ(t ) =
−|t−t |/τ , e 2τ
(2.57)
2.2 Langevin Equation
53
which reduces to a white noise in the limit τ → 0. For a non-zero τ , such a noise is often called a “colored noise”, and it may be obtained, for instance, from an underlying Langevin equation with white noise, as illustrated in Eq. (2.46). Then an equation like dv = −γ v + ξ(t) (2.58) dt yields for the mean-square velocity in the limit t → ∞, generalizing the calculation made in Sect. 2.2.1,
v(t)2 = . (2.59) 2γ (1 + γ τ ) As discussed above, white noise is often used to model equilibrium systems, and the ratio /γ is in this case related to temperature. In contrast, colored noise is relevant when modeling out-of-equilibrium systems, and no generic relation between , γ , τ , and macroscopic parameters of the problem (like temperature in the equilibrium case) is known. As a last generalization of the Langevin equation, we can eventually mention the case of Lévy noise. When discussing the discretization Eq. (2.48) of the Langevin equation, we concluded that the noise increment W could be interpreted as the sum of many infinitesimal noise terms, leading to a Gaussian statistics for W , according to the Central Limit Theorem. Yet, as described in Chap. 8, the standard Central Limit Theorem yielding the Gaussian distribution is valid only for random variables with a finite variance. For variables with infinite variance (typically corresponding to distributions with slowly decaying power-law tails), it is replaced by a Generalized Central Limit Theorem, yielding the so-called Lévy stable distributions as limit distributions (see Chap. 8). It is also possible to define a Langevin noise based on the Lévy statistics. The interested reader may refer to the review [13]. Qualitatively, the Lévy noise corresponds to a noise dominated by rare and large fluctuations, while fluctuations in the Gaussian noise typically remain of the order of the standard deviation. We shall come back to the effect of Lévy statistics on random walks in Sect. 2.4.
2.2.4 Relation to Random Walks After having introduced the Langevin equation from a physical perspective (that of a probe particle immersed in a fluid), it is interesting to present the Langevin equation from another perspective, that of random walks. With this aim in mind, we come back to the random walk model introduced in Sect. 2.1.3 and generalize it by including a small bias in the displacements. We consider a discrete-time dynamics with a time step t, and we call a the lattice spacing. At time t + t, the new position xt+t is chosen according to xt+t = xt + t , where t is given by
54
2 Non-stationary Dynamics and Stochastic Formalism
⎧ ⎨ a with prob. t = −a with prob. ⎩ 0 with prob.
ν 1 + aq(xt ) t , 2 ν 1 − aq(xt ) t , 2
(2.60)
1 − νt .
Note that the above dynamical rules can be interpreted as a discretized version of a continuous-time dynamics, as seen from the presence of the time step t and from the allowed value t = 0. Let us define xt ≡ xt+t − xt . The dynamical rules xt+t = xt + t can be rewritten as xt t = t t
(2.61)
which is the analog of Eq. (2.26), provided that xt is interpreted as a velocity; t /t then plays the role of a random force. Computing the average value of this “force”, we find using Eq. (2.60) t = a 2 νq(xt ). (2.62) t Note that the average is taken over t , for a fixed value of xt . Let us now consider the fluctuating part of the “force”, and define ξt =
1 (t − t ), t
(2.63)
which is thus the discrete-time analog of ξ(t) introduced in Sect. 2.2.1. We wish to evaluate the correlation of ξt , given by ξt ξt =
1 − t ) . ( − )( t t t (t)2
(2.64)
For t = t , ξt ξt is thus equal to zero, as t and t are independent random variables. If t = t , one has ξt ξt = Var(t )/(t)2 . Introducing (k, k ) through t = kt and t = k t, Eq. (2.64) reads ξt ξt =
1 Var(t ) δk,k , (t)2
(2.65)
where δk,k is the Kronecker delta symbol, equal to 1 if k = k and to zero otherwise. Evaluating the variance of t , we find Var(t ) = a 2 νt + O(t 2 ),
(2.66)
so that to leading order in t, ξt ξt = a 2 ν
δk,k . t
(2.67)
2.2 Langevin Equation
55
This expression is the analog of Eq. (2.31), and the role played by τcol in the physical approach to the Langevin equation (see Sect. 2.2.1) is now played by t. To provide further evidence for this correspondence, we point out that δk,k /t can be interpreted as the discretized version of the Dirac distribution. Indeed, from the definition of the Kronecker delta symbol, one can write for an arbitrary function f ∞
t f (k t)
k =−∞
δk,k = f (kt), t
(2.68)
which is precisely the discretized version of the fundamental property (A.1) of the Dirac delta function. Hence, taking the limit t → 0 (and then the limits a → 0 and ν → ∞ with a 2 ν fixed), one can reformulate the above biased random walk problem as a Langevin equation, namely, dx = Q(x) + ξ(t), dt
(2.69)
where Q(x) ≡ a 2 νq(x), and where the noise ξ(t) satisfies ξ(t) = 0,
ξ(t) ξ(t ) = δ(t − t ) .
(2.70)
2.3 Fokker–Planck Equation The Fokker–Planck equation describes the evolution of the probability distribution p(x, t) of a variable x obeying a Langevin equation. It can be derived in various ways, one of the simplest being to start from the above biased random walk problem, and to derive the continuous limit of the master equation, following the same lines as for the derivation of the diffusion equation—see Sect. 2.1.3. For a detailed account of the Fokker–Planck equation, the reader is referred to the specific book by Risken [19] on this topic, or to the more general books on stochastic processes by van Kampen [20] and by Gardiner [8].
2.3.1 Continuous Limit of a Discrete Master Equation Starting from the biased random walk model of Sect. 2.2.4, we consider the continuous-time version of the model, and write the corresponding transition rates W (n |n), where n = x/a is an integer labeling the sites of the one-dimensional lattice:
56
2 Non-stationary Dynamics and Stochastic Formalism
⎧ν if n = n + 1 ⎪ ⎪ 2 (1 + aqn ) ⎪ ⎪ ⎨ if n = n − 1 W (n |n) = ν2 (1 − aqn ) ⎪ ⎪ ⎪ ⎪ ⎩ 0 otherwise.
(2.71)
To lighten notations, we have denoted q(na) as qn . Formally, one can write the transition rates as W (n |n) =
ν ν (1 + aqn )δn ,n+1 + (1 − aqn )δn ,n−1 . 2 2
(2.72)
The master equation then reads ν ν d Pn = −ν Pn (t) + (1 + aqn−1 ) Pn−1 (t) + (1 − aqn+1 ) Pn+1 (t). dt 2 2
(2.73)
We now take the continuous limit of this master equation. Writing, as in Sect. 2.1.3, Pn (t) = a p(na, t), where p(x, t) is a distribution of the continuous variable x sat∞ isfying −∞ p(x, t) d x = 1, we have ∂p ν ν (x, t) = −ν p(x, t) + [1 + a q(x − a)] p(x − a, t) + [1 − a q(x + a)] p(x + a, t). ∂t 2 2
(2.74)
Expanding p(x ± a, t) and q(x ± a) to second order in a, we get a2 ∂ 2 p ∂p (x, t) + (x, t) + O(a 2 ), ∂x 2 ∂x2 a 2 q (x) + O(a 2 ). q(x ± a) = q(x) ± a q (x) + 2
p(x ± a, t) = p(x, t) ± a
(2.75) (2.76)
Gathering results, one then finds, keeping only terms up to order a 2 in Eq. (2.74): ∂p a2ν ∂ 2 p ∂p (x, t) = −a 2 ν q(x) − a 2 ν q (x) p(x, t) + . ∂t ∂x 2 ∂x2
(2.77)
We note that a 2 ν is related both to the diffusion coefficient D introduced in Sect. 2.1.3, and to the coefficient characterizing the correlation of the noise in Sect. 2.2.1: a 2 ν = 2D = .
(2.78)
In order to have a well-defined continuous limit, one must here again take the limits a → 0 and ν → ∞ in such a way that a 2 ν converges to a finite value. Defining Q(x) = q(x), Eq. (2.77) then reads as ∂2 p ∂ ∂p (x, t) = − Q(x) p(x, t) + (x, t) . ∂t ∂x 2 ∂x2
(2.79)
2.3 Fokker–Planck Equation
57
This equation is called a Fokker–Planck equation. It describes, from another perspective, the same random process as the Langevin equation (2.69). Note that the Fokker–Planck equation (2.79) can be rewritten as a balance equation ∂J ∂p (x, t) = − (x, t), ∂t ∂x
(2.80)
where J is the probability current J (x, t) = Q(x) p(x, t) −
∂p (x, t) , 2 ∂x
(2.81)
which contains an advective part and a diffusive part. Before discussing a more general derivation of the Fokker–Planck equation in Sect. 2.3.2, we first provide two simple physical examples of applications of this equation. As a first example, we come back to the probe particle studied in Sect. 2.2.1. In this case, the variable x is replaced by the velocity v, and the bias function is given by Q(v) = −γ v. The Fokker–Planck equation reads ∂2 p ∂ ∂p (v, t) = γ v p(v, t) + , ∂t ∂v 2 ∂v2
(2.82)
where the coefficients and γ are related through = 2γ k B T . It can be checked that the solution of this equation, with initial condition p(v, t = 0) = δ(v − v0 ) [i.e., the initial velocity is non-random and equal to v0 ], is given by (v − v0 e−γ t )2 −2γ t −1/2 . exp − p(v, t) = 2π k B T 1 − e 2k B T (1 − e−2γ t )
(2.83)
One can check that the mean velocity v and the variance Var[v(t)] correspond to the ones calculated from the Langevin equation—see Eqs. (2.33) and (2.45). This process, namely, a random walk confined by a quadratic potential, is also called Ornstein–Uhlenbeck process. Another standard example of Fokker–Planck equation is that associated with the (overdamped) Langevin dynamics of a particle, described by its position x in one dimension, in a potential energy U (x). The Langevin dynamics reads x˙ = −U (x) + ξ(t), with ξ(t) a Gaussian white noise of amplitude = 2γ k B T . The associated Fokker–Planck equation is Eq. (2.79), with Q(x) = −U (x). Its equilibrium solution corresponds to the stationary solution with zero current, J = 0. Given the expression Eq. (2.81) of the current, and specializing to Q(x) = −U (x), the equilibrium solution is readily obtained as peq (x) = p0 e−U (x)/k B T ,
(2.84)
58
2 Non-stationary Dynamics and Stochastic Formalism
where p0 is a normalization factor. Equation (2.84) is nothing but the equilibrium Boltzmann–Gibbs distribution for a particle in a potential energy U (x), at temperature T . Hence, the Fokker–Planck equation correctly describes the equilibrium distribution, and also allows one to get information on the relaxation to equilibrium, as illustrated above in the case of the velocity distribution of a probe particle.
2.3.2 Kramers–Moyal Expansion More generally, the Fokker–Planck equation may be derived from an arbitrary master equation provided the random variable performs only small jumps. Let us consider a stochastic Markov process defined by transition rates W (x |x). The master equation reads ∂p (x, t) = d x [W (x|x ) p(x , t) − W (x |x) p(x, t)], (2.85) ∂t where the integration range is the domain of definition of the variable x. Introducing the notation T (y, x) ≡ W (x + y|x), the master equation (2.85) can be rewritten as ∂p (x, t) = ∂t
dy [T (y|x − y) p(x − y, t) − T (y, x) p(x, t)] .
(2.86)
It is then possible to expand the dependence on x − y of T (y|x − y) p(x − y, t) in powers of y around x − y = x, leading to the Taylor series expansion T (y|x − y) p(x − y, t) =
∞ (−y)n ∂ n [T (y, x) p(x, t)] n! ∂ x n n=0
(2.87)
with the convention that the zeroth-order derivative of a function is the function itself. Reporting expansion (2.87) into Eq. (2.86) and exchanging the order of integrals and derivatives, we get the so-called Kramers–Moyal expansion [8, 19, 20] of the master equation, ∞ ∂p (−1)n ∂ n (x, t) = [αn (x) p(x, t)], (2.88) ∂t n! ∂ x n n=1 where we have defined αn (x) ≡
dy y n W (x + y|x) .
(2.89)
Truncating this expansion to second order in the derivatives, one obtains the Fokker– Planck equation
2.3 Fokker–Planck Equation
59
∂ ∂p ∂2 (x, t) = − [α1 (x) p(x, t)] + 2 [α2 (x) p(x, t)] . ∂t ∂x ∂x
(2.90)
The coefficients α1 (x) and α2 (x) play the same role as the quantities Q(x) and /2 appearing in Eq. (2.79). Note that the coefficient α2 (x) is here a function of x, while its counterpart was assumed to be constant in Eq. (2.79).
2.3.3 More General forms of the Fokker–Planck Equation The Fokker–Planck equation can also be generalized to an arbitrary number of coupled variables. Let us consider a set of N variables xi (t) obeying the Langevin dynamics d xi = Q i (x1 , . . . , x N ) + ξi (t), i = 1, . . . , N , (2.91) dt where the N random noises ξi (t) are correlated according to ξi (t)ξ j (t ) = i j δ(t − t ) .
(2.92)
The associated Fokker–Planck equation governing the evolution of the probability distribution P(x1 , . . . , x N , t) reads N N 1 ∂ ∂2 P ∂P =− Qi P +
i j , ∂t ∂ xi 2 i, j=1 ∂ xi ∂ x j i=1
(2.93)
where we have dropped the arguments of the functions Q i and P to lighten notations. In addition, the Fokker–Planck equation can also be generalized to the case of multiplicative noise in the associated Langevin equation. For simplicity, we present here only the case of a single variable, but the interested reader may find more details, e.g., in Ref. [8]. We start from the multiplicative Langevin equation (2.52) that we rewrite here for clarity, dx = Q(x) + B(x) ξ(t) . (2.94) dt We recall that the white noise ξ(t) has a unit amplitude, and satisfies ξ(t)ξ(t ) = δ(t − t ). As we have seen in Sect. 2.2.3, such a multiplicative Langevin equation may be interpreted in different ways, and the interpretation scheme considered has to be specified. In the Ito interpretation, the associated Fokker–Planck equation is given by 1 ∂2 ∂ ∂p (x, t) = − Q(x) p(x, t) + B(x)2 p(x, t) , 2 ∂t ∂x 2 ∂x while in the Stratonovich interpretation, the Fokker–Planck equation reads
(2.95)
60
2 Non-stationary Dynamics and Stochastic Formalism
1 ∂ ∂ ∂p (x, t) = − Q(x) p(x, t) + ∂t ∂x 2 ∂x
B(x)
∂ B(x) p(x, t) ∂x
.
(2.96)
As briefly discussed in Sect. 2.2.3, the choice of the interpretation framework depends on the problem considered. To model systems with small, but non-zero correlation time of the noise, as when an external noise is applied on an otherwise deterministic system (e.g., turbulent wind acting on a bridge), the Stratonovich interpretation is the relevant one, hence its widespread use in physics and engineering for instance. Even if the noise is added to the intrinsic dynamics of the system, the noise term may be multiplicative due to a non-trivial coupling with the degrees of freedom of the system. Alternatively, one may use a Langevin equation to describe at a coarsegrained level an intrinsically discrete process. This is the case, for instance, for chemical reactions, or more generically for any microscopic process described by a master equation (stochastic jump process). In this case, the noise is not an additional effect that may be taken into account or not, but it is deeply rooted in the intrinsically discrete nature of the process. This corresponds to the Kramers–Moyal derivation of the Fokker–Planck equation from a master equation, and leads to the Ito interpretation of the resulting Fokker–Planck equation, Eq. (2.90). In this last equation, α1 (x) and α2 (x) play the same role as Q(x) and 21 B(x)2 in the Fokker–Planck equation (2.95). In other words, the Ito and Stratonovich interpretations of the Langevin and Fokker–Planck equations can be associated with the modeling of two different types of system, a dynamics with an “external” noise for the Stratonovich interpretation, and a dynamics with an “internal” noise for the Ito interpretation. However, it may happen that one is working on a Langevin or Fokker–Planck equation with a given interpretation (Ito or Stratonovich, as determined, e.g., from the physics of the problem), but that the other interpretation would be more convenient for the formal treatment of the problem. Indeed, Stratonovich and Ito interpretations have distinct formal advantages, especially regarding the Langevin equation. When working with a Langevin equation in the Stratonovich interpretation, one can use the standard rules of calculus to manipulate the equation. For instance, in the Stratonovich interpretation, one can rewrite the Langevin equation (2.94) as [20] Q(x) 1 dx = + ξ(t) , B(x) dt B(x)
(2.97)
and define the new variable y(t) such that dy = d x/B(x). In terms of the variable y, Eq. (2.94) reads Q x(y) dy + ξ(t), = (2.98) dt B x(y) which is thus a Langevin equation with additive noise, for which there is a uniquely defined associated Fokker–Planck equation. The latter is precisely the Stratonovich Fokker–Planck equation (2.96) up to the change of variable from x to y. In contrast, if one starts from the Langevin equation (2.94) in the Ito interpretation, it is no longer possible to use the standard rules of differential calculus, which
2.3 Fokker–Planck Equation
61
have to be replaced by the rules of stochastic calculus like Ito’s lemma, as discussed in Sect. 2.3.4. On the other side, the Ito interpretation of the Langevin equation has the advantage that the noise term is uncorrelated with the variable x(t), implying that B(x) ξ(t) = B(x)ξ(t) = 0. This property is no longer true in the Stratonovich interpretation. It may thus be of interest in some situations to transform a Stratonovich formulation into an Ito formulation, or the other way round. To establish this connection, we use the property B
∂ ∂ (B P) = (B 2 P) − B B P , ∂x ∂x
(2.99)
with B = d B/d x. It follows, for instance, that the Stratonovich Fokker–Planck equation (2.96) can be rewritten as an Ito Fokker–Planck equation (2.95) with a modified drift term 1 Q(x) = Q(x) + B (x)B(x) . (2.100) 2 Hence, in this case, the Stratonovich and Ito Fokker–Planck equations describe the same stochastic process, but they have different drift terms Q(x) and Q(x), respectively. In contrast, in their original forms Eqs. (2.95) and (2.96) where they have the same drift term Q(x), they describe two different stochastic processes, corresponding, respectively, to the Ito and Stratonovich interpretations of the multiplicative Langevin equation (2.94).
2.3.4 Stochastic Calculus In this book, we only discuss the Langevin equation and its relation to the Fokker– Planck equation in an informal way from a mathematical viewpoint. A rigorous formulation, based on the so-called stochastic calculus, can be found, for instance, in [8, 16]. An important notion studied in this mathematical framework is that of a function of a variable obeying a Langevin equation. Let us consider the general Langevin equation (2.52), x˙ = Q(x) + B(x) ξ(t), interpreted herein the Ito scheme, with ξ(t)ξ(t ) = δ(t − t ). Now define a new variable y(t) = f x(t) , where f is a regular function. The question is: how to express the time derivative of the function y(t)? If the stochastic process x(t) was differentiable, the time derivative y˙ would ˙ It may also be written in simply be given by the usual chain rule: y˙ = f (x) x. differential form, dy = f (x) d x, where d x and dy are increments of the variables x and y over a time increment dt. It turns out that for a stochastic process x(t) obeying a Langevin equation in the Ito interpretation, the chain rule has to be generalized to dy = f (x) d x +
1 f (x)B(x)2 dt . 2
(2.101)
62
2 Non-stationary Dynamics and Stochastic Formalism
This equation is called Ito’s Lemma [8, 16], or sometimes the stochastic chain rule [5]. Since d x is a stochastic increment, the term f (x) d x in Eq. (2.101) has to be interpreted consistently in the Ito scheme. Note that we use here on purpose the notations d x, dy, and dt instead of x, y, and t as in previous sections, to emphasize that Eq. (2.101) is a rigorous mathematical result, and not a mere discretization. We now try to sketch very informally the main ideas behind the derivation of Eq. (2.101), and we thus switch back to discretized equations. Over a time increment t, the process x(t) changes by x = x(t + t) − x(t). Defining the increment y = f x(t) + x − f x(t) , an expansion to second order in x yields 1 y = f x(t) x + f x(t) (x)2 + o(x 2 ) . 2
(2.102)
We wish to collect only contributions up to order t when t → 0. This means that to evaluate (x)2 , we need to retain only the contribution to x of order t 1/2 , namely, x = B(x) W , where W is a Gaussian random variable of variance t. Dropping the explicit t dependence to lighten notations, it follows that y = f (x) x +
1 f (x) B(x)2 (W )2 + o(t) . 2
(2.103)
Taking the limit t → 0, one thus obtains a result quite similar to Eq. (2.101). However, there is an important difference: the squared stochastic increment (W )2 in Eq. (2.103) is replaced by dt in Eq. (2.101). This means we should have t instead of (W )2 in Eq. (2.103) to fully understand the stochastic chain rule Eq. (2.101), at the heuristic level discussed here. The replacement of (W )2 by t is sometimes referred to as a substitution rule [5]. Its validity is a priori not obvious, because (W )2 is a distributed random variable, and only its average (W )2 = t. The standard deviation of (W )2 is also proportional to t, so that the amplitude of fluctuations is comparable to the mean. Then, how can one understand this substitution rule? Mathematically, it results from the definition of a stochastic integral and from the notion of convergence in L 2 -norm [8, 16]. In a sense, it can be interpreted as a generalization of the Law of Large Numbers (see Chap. 8). To avoid technicalities and simply get the essence of the argument, let us consider a simplified version of the problem in which a random variable z evolves according to z i = (Wi )2 , where i labels the time steps of the discretized dynamics. Integrating this relation between t = 0 and T and assuming for simplicity that z(0) = 0, we get T /t
z(T ) =
(Wi )2 .
(2.104)
i=0
The fluctuations of the random variable z(T ) can be quantified by evaluating its variance,
2.3 Fokker–Planck Equation
63 T /t
Var[z(T )] =
Var[(Wi )2 ],
(2.105)
i=0
where we have used that the squared increments (Wi )2 are statistically independent. Given that Var[(Wi )2 ] = 2(t)2 , we obtain Var[z(T )] = 2T t. Keeping the duration T of the trajectory fixed, and sending the discretization time step t to zero, we find that Var[z(T )] → 0. Hence, the fluctuations of z(T ) can be neglected, so T /t that z(T ) can be replaced by its average z(T ) = n=0 t = T . So the integrated variable z(T ) obtained by summing the fluctuating variables (Wi )2 is the same as if one had substituted (Wi )2 by its average t. This is the basic interpretation of the substitution rule, which has to be understood in terms of convergence of integrated random variables. The general argument to justify the substitution rule in Eq. (2.103) is more technical, but its essence is similar. The interested reader may refer to Refs. [8, 16] for more details.
2.4 Anomalous Diffusion: Scaling Arguments In the previous sections, we have introduced basic notions on random walks and some related types of stochastic processes. An important feature of standard random walks is that transitions between sites are characterized by well-defined time and length scales: the length scale is the lattice spacing a, and the time scale is the time step t for discrete-time dynamics, or the inverse of the frequency ν for continuous-time dynamics—see Sect. 2.1.3. One can also consider random walks with a continuous distribution of jump length x, instead of a random walk on a lattice where x is constrained to be an integer multiple of the lattice spacing a. Yet, as long as the second moment (x)2 of the jump length remains finite, the basic properties of the random walk remain the same. In particular, for an unbiased random walk such that x = 0, the mean-square displacement x(t)2 is proportional to t independently of the detailed shape of the distribution P(x). In other words, the typical displacement xtyp (t) is of the order of t 1/2 . However, if the (symmetric) distribution P(x) is broad enough so that (x)2 is infinite, the typical displacement xtyp (t) generically grows faster than t 1/2 , often as a power t β with β > 1/2. Such a random walk is called superdiffusive, as it moves faster than a diffusive walk. This happens, in particular, when P(x) has power-law tails 1 , |x| → ∞ (2.106) P(x) ∼ |x|1+α with 0 < α < 2 (the symbol ∼ here means asymptotic proportionality, i.e., constant prefactors are not made explicit). Similarly, instead of considering broadly distributed jumps, one may consider a broad distribution of the time τ elapsed between two successive jumps, leading to an intermittent dynamics. Such random walks are
64
2 Non-stationary Dynamics and Stochastic Formalism
generically called continuous-time random walks. If the distribution ψ(τ ) is such that its first moment τ is infinite (which typically happens when ψ(τ ) ∼ 1/τ 1+α with 0 < α < 2), the dynamics is slowed down, and the typical displacement grows more slowly than t 1/2 , often as a power t β with β < 1/2. Such a random walk is called subdiffusive. A well-defined mathematical formalism exists to properly deal with such continuous-time random walks—see, e.g., [4, 13]. Some aspects of these anomalous random walks can also be studied in the framework of the Generalized Central Limit Theorem, which is introduced in Chap. 8. Here, we simply wish to provide the reader with some simple scaling arguments that can be used to understand some of the basic properties of anomalous random walks.
2.4.1 Importance of the Largest Events Qualitatively, the reason why anomalous random walks have a typical displacement that scales differently from t 1/2 is that extreme events (very large jumps, or very large time lags between two jumps) start to play an important role. This is actually a major property of broad distributions. In this subsection, we would like first to give a flavor, in intuitive terms, of why these large events acquire a significant statistical weight. To do so, we consider a positive random variable x with a distribution p(x) having a power-law tail 1 x → ∞. (2.107) p(x) ∼ 1+α , x When α is lowered, the distribution (2.107) becomes broader and broader, with a “heavy tail” that contains a significant part of the probability weight. In other words, very large values of x have a significant probability to be drawn from the distribution, and such large values play an essential role in the sum. We focus on the regime where this effect is the strongest, which corresponds to α < 1. Indeed, in this range of α, the average value x itself becomes infinite. Considering N random values xi , i = 1, . . . , N drawn from the Ndistribution p(x), xi . The typical we wish to compare the largest value in the set {xi } to the sum i=1 value of the maximum max(xi ) can be evaluated as follows. Let us define FNmax (z) ≡ Prob max(x1 , . . . , x N ) < z .
(2.108)
From the independence property of the xi ’s, one has
FNmax (z)
=
N
z
p(x) d x −∞
N ˜ = 1 − F(z) ,
(2.109)
2.4 Anomalous Diffusion: Scaling Arguments
65
˜ we have defined the complementary cumulative distribution F(z) ≡ where ∞ p(x) d x. As the typical value of max(x , . . . , x ) is large for large N , we can 1 N z ˜ approximate F(z) by its asymptotic behavior at large z: c ˜ F(z) ≈ α, z
z → ∞,
(2.110)
where we have introduced explicitly the proportionality constant c. It follows that N cN ˜ ≈− α ln 1 − F(z) z so that
α
FNmax (z) ≈ e−cN /z .
(2.111)
(2.112)
In other words, FNmax (z) can be rewritten in the scaling form FNmax (z) ≈
z , N 1/α
(2.113)
−α
with (u) = e−cu , which indicates that the typical value of max(xi ) is of the order of N 1/α , as FNmax (z) increases from 0 to 1 around z ≈ N 1/α . Note that Eq. (2.113) is precisely a definition of the notion of typical value: it is the value by which the variable needs to be rescaled in the expression of the probability distribution (either the cumulative probability distribution or the probability density). The observation that thetypical value of z is of the order of N 1/α has important N xi . Intuitively, one expects the typical value of the consequences on the sum i=1 sum to be proportional to the number N of terms. If α > 1, N 1/α N for large N , so that the largest term remains much Nsmaller than the sum. In contrast, if α < 1, xi is of the order of N breaks down, as the N 1/α N , and the assumption that i=1 sum is necessarily greater than its largest term (we recall that all terms are positive). It can be shown using a more refined argument that the sum is of the order of the largest term itself, namely, N xi ∼ N 1/α . (2.114) i=1
It is then customary to say that the largest term “dominates” the sum. For 1 < α < 2, the situation is slightly more subtle: the largest term remains much N smaller than the sum, consistently with the finiteness of x which implies i=1 x i ∼ N x. However, the fluctuations of x remain large, as witnessed by the divergence of the variance of x, which prevents the Central Limit Theorem for being applicable—see Chap. 8. Hence, the fluctuations of the sum are also large.
66
2 Non-stationary Dynamics and Stochastic Formalism
2.4.2 Superdiffusive Random Walks The above behavior of the statistics of a sum of broadly distributed random variables has important consequences for anomalous diffusion processes. Let us start with the superdiffusive case, which corresponds to a broad distribution of jump sizes. We consider a discrete-time random walk evolving according to xt+1 = xt + u t , where u t is drawn from a symmetric distribution p(u). We assume that space is continuous, and that the variables {u t } are independent and identically distributed random variables. Accordingly, the position xt is given by xt =
t−1
ut ,
(2.115)
t =0
where we have assumed that x0 = 0. The present problem is thus directly related to problems of random sums. The symmetry of the distribution p(u) implies that u t = 0, from which xt = 0 follows. If u 2 is finite, one has xt2 =
u t u t = tu 2 ,
(2.116)
t ,t
where we have used the fact that the variables u t and u t are statistically independent for t = t , implying u t u t = 0. Hence, the mean-square displacement xt2 is linear in t, which corresponds to a normal diffusive process. In contrast, if the distribution p(u) is broad, with an infinite variance, the above reasoning fails, since the average values appearing in Eq. (2.116) are infinite. Let us consider for definiteness a distribution p(u) such that p(u) ∼
1 , |u|1+α
u → ±∞ ,
(2.117)
with α < 2, so that u 2 is infinite—see Fig. 2.2 for an illustration. We can however use a scaling argument inspired by Eq. (2.116), by using typical values instead of average values: t−1 xtyp (t)2 ∼ u 2t , (2.118) t =0
where we have neglected cross terms u t u t (t = t ). The typical value of the square displacement is thus the typical value of a sum of t independent random variables yt ≡ u 2t . Using the relation du P(y) = p(u) , dy
(2.119)
2.4 Anomalous Diffusion: Scaling Arguments 400
200
xt
Fig. 2.2 Illustration of a superdiffusive random walk, with a power-law distribution of jump sizes, of parameter α = 0.8—see Eq. (2.117). The largest jump is of the same order as the total displacement, whatever the chosen time window
67
0
-200 0
200 0
4000
t
6000
8000
10000
one finds that the distribution P(y) also has a power-law tail, satisfying P(y) ∼
1 y
1+ α2
,
y → ∞.
(2.120)
Combining Eqs. (2.114) and (2.118), one then deduces that xtyp (t) ∼ t 1/α .
(2.121)
Hence, the random walk has an anomalous behavior of its typical displacement, xtyp (t) ∼ t β , characterized by an exponent β = α1 > 21 (since α < 2). The walk is thus superdiffusive. Note that a more rigorous derivation of this result can be carried out using the Generalized Central Limit Theorem, introduced in Chap. 8.
2.4.3 Subdiffusive Random Walks On the contrary, subdiffusive walks have (in the simplest cases) a well-defined jump length, but exhibit strong local trapping effects, so that the sojourn times on a given site become broadly distributed, instead of being fixed to a value t as in the above superdiffusive example. We thus consider a random walk process in which the time lag τ between two jumps is itself a random variable τ following a distribution p(τ ), with a tail τ → ∞ (0 < α < 1). (2.122) p(τ ) ∼ 1/τ 1+α , After a time τ , the walker jumps to one of the two neighboring sites, namely, xt+τ = xt + t , where t = ±1 with equal probabilities. An illustration is provided in Fig. 2.3. Here again, the behavior of the random walk can be understood through a simple typ scaling argument. After N steps, the typical displacement x N of the walker is of
68
120 80 xt
Fig. 2.3 Illustration of a subdiffusive random walk, with a power-law distribution of sojourn times of parameter α = 0.8, as defined in Eq. (2.122). The largest sojourn time is typically a finite fraction of the total time window
2 Non-stationary Dynamics and Stochastic Formalism
40 0 0
2×10
5
4×10
5
t
6×10
5
8×10
5
1×10
6
√ the order of N . To relate N to the actual time t, one can observe that time t is the sum of the N sojourn times τi at the i th position. Hence, using the estimate given in Eq. (2.114), N τi ∼ N 1/α (2.123) t= i=1
whence the scaling N ∼ t α follows. Combining this relation with x N ∼ N 1/2 , we finally obtain (2.124) xtyp (t) ∼ t α/2 . typ
The exponent β characterizing the anomalous diffusion is thus β = α2 < 21 (we recall that α < 1). The random walk is therefore slower than normal diffusion or, in other words, subdiffusive. Note that this case is not a direct application of the Generalized Central Limit Theorem, but there exist rigorous methods to derive the above scaling behaviors [13]. To conclude this section on anomalous diffusion, it is interesting to mention a related situation which leads to a different scaling exponent. In the above subdiffusive walk, the lag time τ between two successive jumps is drawn anew at each step, and is independent of all the previous values of τ . However, thinking of the physical situation of a particle randomly evolving in a complex (though one-dimensional) potential energy landscape, one may identify the local minima of the potential with the sites of a lattice. Jumps from one local minimum to a neighboring one occur through thermal activation over the energy barrier separating the two minima. The sojourn time in a minimum, given by an Arrhenius law, is exponential with respect to the energy barrier, and is thus very sensitive to the value of the barrier. The presence of an even relatively moderate range of values of the energy barriers may lead in some regime to broad distributions of sojourn times. However, the main difference with the case studied above is that the sojourn time is approximately the same at each visit on the same site. This situation can be qualified as a “frozen disorder”. The successive time lags are thus no longer statistically independent, as was the case
2.4 Anomalous Diffusion: Scaling Arguments
69
above. Hence, the scaling argument needs to be modified accordingly. We still have √ typ that the typical displacement after N steps is x N ∼ N . This means that the √ number of distinct sites visited by the walk during N steps is also of the order of N . Each √ √ of these sites has been visited of the order of N / N = N times. Hence, the total time t elapsed after N steps can be roughly approximated as t∼
√
√
N
N
τi
(2.125)
i=1
√ (the upper bound in the sum should be understood as the integer part of√ N ). The to the (fixed) sojourn times in the N distinct variables τi under the sum correspond √ visited sites, while the factor N in front of the sum is the typical number of visits per site. Altogether, we have if p(τ ) ∼ 1/τ 1+α when τ → ∞ (with α < 1) that t∼ It follows that
√
√
N
1+ α1
.
(2.126)
N ∼ t α/(1+α) , so that we finally obtain xtyp (t) ∼ t β ,
β=
1 α < , 1+α 2
(2.127)
resulting again in a subdiffusive motion, but with an exponent β different from the exponent α/2 obtained in Eq. (2.124). Interestingly, one has α α > 1+α 2
(2.128)
so that the presence of fixed sojourn times on each site (frozen disorder) actually leads to a faster motion than the “annealed” case where the sojourn √ time is drawn anew at each visit. This can be understood from the fact that only N random variables τi are drawn in the frozen disorder case, while N sojourn times are picked up in the “annealed” case. Drawing more random variables typically leads to the exploration of longer sojourn times τ , which slows down the dynamics.
2.5 First Return Times, Intermittency, and Avalanches We have just seen that broad, power-law distributions naturally appear in the context of anomalous diffusion, either for the jump size distribution in the case of superdiffusion, or for the time lag distribution in the case of subdiffusion. In contrast, for standard random walks with normal diffusion, jump and time lag distributions are narrow. Yet, standard random walks also exhibit broad distributions when considering another observable, that is, the first return time to the origin. In this section, we
70
2 Non-stationary Dynamics and Stochastic Formalism
present some basic results on the distribution of first return times to the origin for the one-dimensional random walk, and then discuss as illustrations two models of physical relevance where the broad statistics of first return times plays a key role.
2.5.1 Statistics of First Return Times to the Origin of a Random Walk We are interested in the statistics of the time (i.e., the number of steps) needed by the walker to reach a given position for the first time. In some applications, it may correspond to the time to reach a target, for instance, in chemistry the time for a molecule to reach a reacting site. We consider here for the sake of simplicity a discrete-time symmetric random walk on a one-dimensional infinite lattice, with sites labeled by n ∈ Z. At each time step, the walker moves from site n to one of the two sites n ± 1 with probability 21 . Since the lattice is invariant by translation, we can set the target at position x = 0. We define pz (n) as the probability that the walk reaches the position x = 0 for the first time after exactly n steps, starting from a given position x = z. Since the walk is symmetric, we can restrict to the case z > 0 without loss of generality. The probability pz (n) satisfies a simple recursion relation pz (n + 1) =
1 1 pz+1 (n) + pz−1 (n), 2 2
(2.129)
because to reach x = 0 after n + 1 steps starting from site z, the walker makes a first step to z ± 1 with probability 21 , and from z ± 1 needs to reach x = 0 in n steps (note that we have used here the Markov property of the walk, i.e., the loss of memory of the past positions). A convenient technical method [6] to solve the recursion relation is to introduce the generating function ∞ pz (n) s n , (2.130) Uz (s) = n=0
where s is a real number such that |s| ≤ 1. The recursion relation for the generating function Uz (s) reads s s (2.131) Uz (s) = Uz+1 (s) + Uz−1 (s) . 2 2 In addition, Uz (s) has to satisfy the boundary conditions U0 (s) = 1 and the condition that for all z, |Uz (s)| ≤ 1, so that Uz (s) remains bounded when z → ∞. The solution of Eq. (2.131) satisfying these boundary conditions reads Uz (s) = λ(s)z with
(2.132)
2.5 First Return Times, Intermittency, and Avalanches
λ(s) =
71
1 1 − 1 − s2 . s
(2.133)
The simplest case corresponds to z = 1, and is called first return time problem. From the definition Eq. (2.130) of the generating function, the probability p(s) can be read from the series expansion of U1 (s) = λ(s). Since λ(s) is an odd function of s, all even coefficients pz (n) with n = 2k in the expansion vanish. This property has a simple interpretation: it is easy to see that starting from site x = 1, one needs an odd number of steps to reach x = 0. Performing the explicit expansion of λ(s), one eventually finds (2n)! . (2.134) p1 (2n − 1) = 2n 2 (2n − 1)(n!)2 One can determine the asymptotic behavior of p1 (2n − 1) for n → ∞ by using the Stirling formula, √ (n → ∞), (2.135) n! ∼ 2π n n n e−n eventually leading to p1 (2n − 1) ∼ √
1 4π n 3/2
(n → ∞).
(2.136)
Hence, discarding the zero probabilities for even values of n, one may state in short that the first return time probability p1 (n) behaves as a decaying power law with exponent − 32 . An immediate consequence is that in an unbounded domain (the infinite lattice considered here), the mean first-passage time n is infinite. This is due to very long excursions where the walker may go arbitrarily far from the origin before coming z back to it. It is interesting to note, however, that ∞ n=0 pz (n) = Uz (1) = λ(1) = 1, meaning that the total probability to return to the origin is equal to one. In other words, the probability that the walker never comes back to origin by escaping infinitely far away is equal to zero for a symmetric random walk. This may no longer be true in other situations, for instance, for a biased random walk that has a higher probability to go to the right than to the left [6]. In this case, the walker has a non-zero probability to escape at infinity and never come back to the origin, and the power-law distribution p1 (n) with exponent − 23 acquires an exponential cut-off, whose value diverges when the bias goes to zero. A corollary of the result concerning the symmetric walk is that the probability that return to the origin during the first n time steps decays as ∞ the walk did not −1/2 p (n) ∼ n , discarding prefactors. This probability is sometimes called n =n+1 1 “survival probability”, by analogy with the situation when the walk is absorbed when reaching the boundary x = 0. Note that the above results are not specific to discrete-space and discrete-time random walks. The return time τ to the origin of a symmetric one-dimensional Brownian motion (i.e., a continuous-time and continuous-space diffusive process) is also distributed according to p(τ ) ∼ τ −3/2 for τ → ∞. This can be shown, for
72
2 Non-stationary Dynamics and Stochastic Formalism
instance, by taking the continuous limit of lattice spacing a and time step t going to zero, with the diffusion coefficient D = a 2 /(2t) held fixed.
2.5.2 Application to Stochastic On–Off Intermittency A relatively simple, though non-trivial application of the probability of first return to the origin is the model of stochastic on–off intermittency, which describes the random continuous dynamics of a physical variable x which spends broadly distributed time durations close to x = 0. A typical trajectory consists in potentially very long rest phases where x ≈ 0, separated by bursts of activity where x = 0 and fluctuations become visible. One of the simplest models describing such an intermittent dynamics is the following Langevin dynamics: dx = ε + ξ(t) x − x 3 , dt
(2.137)
where ε is a (dimensionless) control parameter, and ξ(t) is a Gaussian, deltacorrelated white noise. The coefficient of the cubic term has been set to one by an appropriate choice of units. The noise term ξ(t)x is understood in the Stratonovich convention. Equation (2.137) is a stochastic version, with multiplicative noise, of the simple deterministic dynamics dx = εx − x 3 , dt
(2.138)
which describes a pitchfork bifurcation. For ε < 0, x = 0 is the only stable fixed point, while for √ ε > 0 the fixed point x = 0 becomes unstable and two stable fixed points x = ± ε appear. The value ε = 0 corresponds to the bifurcation point, where the dynamics is most sensitive to fluctuations. This is why it is particularly interesting to add multiplicative noise to the deterministic dynamics, as in Eq. (2.137). To better understand, at a qualitative level, the behavior of this Langevin equation, it is convenient to start by looking at the linearized dynamics, for small values of x, dx = ε + ξ(t) x . dt
(2.139)
If x(t = 0) > 0, the dynamics leaves x(t) > 0 for all subsequent times (this is also true for the full non-linear dynamics). One may thus introduce y(t) = ln x(t), and the linearized equation (2.139) can be reformulated as a simple Langevin dynamics on y, dy = ε + ξ(t) . (2.140) dt
2.5 First Return Times, Intermittency, and Avalanches
73
Note that we have used the standard rules of calculus for the change of variables because the noise is defined in Stratonovich convention. If the noise was defined in the Ito convention, one would have to use Ito calculus, as very briefly introduced in Sect. 2.3.4. One sees from Eq. (2.140) that the dynamics of y is a biased Brownian motion, with bias ε. The bifurcation point ε = 0 of the deterministic dynamics thus corresponds to a symmetric Brownian motion in terms of y, as long as one stays in the validity regime of linearized equation (2.139). This symmetry of the y dynamics under the change y → −y is broken when including the non-linear term in the dynamics of x. The full non-linear Langevin equation (2.137) then turns into the following dynamics of the variable y, for ε = 0, dy = −e2y + ξ(t) . dt
(2.141)
For negative y, the exponential term rapidly becomes negligible when |y| increases, and one recovers Eq. (2.140). In contrast, for y > 0 the exponential term becomes dominant as y increases, and it acts as a strong effective “recoil force”, preventing y to take large positive values. At a coarser level, the dynamics of y may be approximated as a free diffusion on the negative part of the real axis, with a reflecting boundary condition at y = 0. In this approximation, the dynamics of y thus consists of excursions on the negative real axis, starting from y = 0 and coming back to y = 0 after a random time τ , distributed according to p(τ ) ∼ τ −3/2 for τ → ∞, as discussed in Sect. 2.5.1. In terms of the x variable, the dynamics thus consists of long rest phases with x ≈ 0 (y negative and large), separated by bursts of activity where x ∼ 1 (y close to 0). The distribution of the duration τ of rest phases is broad, with a power-law tail ∼ τ −3/2 . This scenario is called stochastic on–off intermittency, and its properties are deeply related to the return time properties of random walk to the origin. Finally, it is possible to generalize the analysis beyond white noise, and one finds in this case that the intermittency phenomenon is controlled by the value of the noise spectrum at zero frequency [1]. Using jump processes, it is also possible to generalize the stochastic on–off intermittency scenario so as to introduce an extended range of control parameter where intermittency takes place, with a continuously varying exponent of the power-law distribution of rest phase durations [2]. In addition, note that beside stochastic on–off intermittency, different scenarios of deterministic on–off intermittency have been put forward. The interested reader is referred, for instance, to Refs. [7, 10, 18].
2.5.3 A Simple Model of Avalanche Dynamics Another physical illustration of return time properties of random walks is the statistics of avalanches, at least in the simplest scenarios. By avalanche, we mean here a correlated sequence of sudden events by which energy is released, as in an earth-
74
2 Non-stationary Dynamics and Stochastic Formalism
quake, for instance (we do not consider here snow avalanches, which correspond to a different phenomenon). Such avalanches are also present in many everyday life soft amorphous materials like foams or pastes (e.g., toothpaste), under slow shear deformation. When observed on small scales, the flow of these dense materials appears heterogeneous. Their deformation is locally not affine, and occurs through spatially localized plastic events involving at most a few tens of particles. These plastic events take place in an otherwise essentially elastic matrix made of the surrounding particles. This small-scale phenomenology motivates the use of mesoscopic elastoplastic models, where plastic events randomly take place on the sites of a lattice [15]. Such plastic events have two main features. First, their dynamics is effectively a threshold dynamics, in the sense that a plastic event occurs if the local shear stress (or, in other word, the local elastic energy stored) exceeds a threshold. Second, during the plastic event, stress is released and at least part of it is redistributed to the rest of the system through a long-range, power-law stress propagator. After this fast redistribution process, other local areas that were below the threshold may end up being above the threshold and yield, in turn redistributing stress across the system and generating further overstressed regions that relax through a plastic event. This is the essence of avalanches, which consists of potentially very long sequences of plastic events that are induced by the previous ones. The avalanche stops when the last plastic event no longer triggers further events. The size S of an avalanche is the number of plastic events that have been subsequently generated by the first event without increasing the external load (which means that the duration of an avalanche is much shorter than the characteristic time of the externally applied deformation). The distribution of the size S is often observed to have a power-law behavior P(S) ∼ 1/S 1+α for large S, up to a large cut-off size induced either by the finite system size, or a non-zero shear rate. In two-dimensional elastoplastic models, the exponent α is close to 0.3 [12], while in the mean-field scenario α = 21 [11]. We discuss below a toy model accounting in the simplest possible way of the mean-field scenario of avalanches, slightly rephrasing the argument presented in Ref. [11]. In agreement with some simple elastoplastic models like the Picard model [15], we assume that once the local stress on a site exceeds the threshold, it does not yield immediately, but yields after a random (exponentially distributed) time. We call overstressed sites the lattice sites of the model where the local stress exceeds the threshold, and where the plastic yielding did not occur yet. Given that yielding events occur at random continuous times, only one event occurs at time (no events occur simultaneously). We denote as n k the number of overstressed sites after k yielding events. Note that the first overstressed site is generated by the externally applied deformation, while later overstressed sites are induced by successive stress redistributions (and not by the external drive). To keep the presentation at an elementary level, we oversimplify the model and replace the explicit stress redistribution by a random creation of overstressed sites. To be specific, we assume that a yielding event randomly generates either zero or two overstressed sites, with equal probability. 
The number of overstressed sites then
2.5 First Return Times, Intermittency, and Avalanches
75
evolves stochastically according to n k+1 = n k ± 1, thus performing a symmetric random walk. The initial condition is n 0 = 1 (the first overstressed site is induced by the external load), and the avalanche stops when there are no more overstressed sites (i.e., n k = 0). The end of the avalanche thus precisely corresponds to the first return time of the random walk n k to the origin, where the number k of yielding events plays the role of time in the random walk. The avalanche size S is thus nothing but the “time” k ∗ at which the walk comes back to the origin for the first time. We have seen in Sect. 2.5.1 that the distribution of the first return time of a random walk to the origin has a power-law tail with an exponent −3/2. It follows that the avalanche size also has a power-law behavior at large S, P(S) ∼ 1/S 3/2 , hence the mean-field value α = 21 mentioned above. In the above argument, the fact that the number of overstressed sites induced by a yielding event can only be zero or two is obviously a rather strong restriction with respect to the true dynamics resulting from stress redistribution. The only reason for this quite artificial assumption is to allow for a direct mapping to a purely symmetric random walk as the one studied in Sect. 2.5.1. Yet, this assumption is not crucial, and was rather made for a pedagogical purpose. The result remains the same for a more general statistics of the number of overstressed sites generated by a yielding event, provided that the average number of overstressed sites created is equal to one so that the random walk remains statistically unbiased. In case the random walk would acquire a small bias, the distribution of avalanche size would be exponentially cut off beyond a large characteristic avalanche size.
2.6 Fast and Slow Relaxation to Equilibrium 2.6.1 Relaxation to Canonical Equilibrium Up to now, we have mostly considered steady-state statistical properties, although we already mentioned time-dependent situations when considering random walks. Here, we wish to explicitly discuss, in generic terms, the convergence of the probability distribution of configurations to the equilibrium distribution. With this aim in mind, let us consider a stochastic process with n energy states E i , i = 1, . . . , n. The continuous-time stochastic process is defined by transition rates W ji from configuration i with energy E i , to configuration j with energy E j (i = j) —we use here matrix notations for the transition rates for reasons that will become clear below. These transition rates are assumed to obey the following detailed balance relation: W ji e−β Ei = Wi j e−β E j
(2.142)
for all pairs (i, j) (with β the inverse temperature), so that the equilibrium probability distribution reads
76
2 Non-stationary Dynamics and Stochastic Formalism eq
Pi =
1 −β Ei e , Z
Z=
n
e−β E j .
(2.143)
j=1
The master equation (2.9) governing the probability Pi (t) to be in configuration i at time t reads with the current notations d Pi Wi j P j (t) − W ji Pi (t) . = dt j ( j=i)
(2.144)
Defining the coefficient Wii as Wii = −
W ji ,
(2.145)
j ( j=i)
we can rewrite the master equation as d Pi = Wi j P j (t), dt j=1 n
(2.146)
which shows that this equation takes the form of a matrix equation. We may thus rewrite the equation more formally as d P(t) = WP(t), dt
(2.147)
where P(t) is the vector of components (P1 (t), . . . , Pn (t)) and W is the matrix of elements (Wi j ), i, j = 1, . . . , n. In this form, finding the stationary distribution amounts to an eigenvalue problem, namely, finding the eigenvector Peq associated with the eigenvalue λ = 0 of the matrix W. However, this matrix reformulation potentially provides more information than just the stationary distribution. The other, non-zero, eigenvalues (that can be shown to be negative) precisely describe the relaxation of the distribution to the equilibrium value. Diagonalizing the matrix W, one eventually obtains n eλ j t Q( j) , (2.148) P(t) = j=1
where the λ j ’s are the eigenvalues of the matrix W (labeled in decreasing order such that λ1 = 0 and λn < . . . < λ2 < 0), and the quantities Q( j) are vectors depending on the eigenvectors of W and on the initial distribution P(t = 0). It is clear from with the Eq. (2.148) that P(t) converges to Q(1) when t → ∞, so that Q(1) identifies n Pi (t) = 1 equilibrium distribution Peq . Note that the normalization condition i=1 implies that, for j > 1,
2.6 Fast and Slow Relaxation to Equilibrium n
77 ( j)
Qi
= 0.
(2.149)
i=1
At large enough times, only the most slowly decreasing term in Eq. (2.148) contributes on top of the equilibrium distribution, and the probability distribution can be approximated as (2.150) P(t) ≈ Peq + e−λ2 t Q(2) . These generic properties can easily be illustrated in the case of a two-state systems (n = 2). Given the normalization constraint P1 (t) + P2 (t) = 1, the evolution of the probabilities can be expressed only in terms of P1 (t), leading to d P1 = − W12 + W21 P1 (t) + W12 dt
(2.151)
whose solution is readily obtained as P1 (t) = P1 + e−αt (P1 (0) − P1 ) eq
eq
(2.152)
with α = W12 + W21 and assuming that the detailed balance relation (2.142) holds eq
P1 =
W12 e−β E1 = −β E . 1 + e −β E 2 W12 + W21 e
(2.153)
In this simple case, the relaxation to equilibrium is purely exponential at all times, and not only asymptotically at large times.
2.6.2 Dynamical Increase of the Entropy Another way to characterize the relaxation to equilibrium is to show that relaxation is accompanied by an increase of entropy, in agreement with the second law of thermodynamics. With this aim in mind, we first need to introduce a time-dependent entropy defined as P(C, t) ln P(C, t). (2.154) S(t) = − C
This definition closely follows definition (1.71). Coming back to usual notations for the transition rates, we assume that W (C |C) = W (C|C ), which is a specific form of the detailed balance relation (2.12) associated with a uniform equilibrium distribution over a subset of configurations (as in the microcanonical ensemble). Under this assumption, one can show that S(t) is an increasing function of time. Let us start by computing the time derivative of the entropy:
78
2 Non-stationary Dynamics and Stochastic Formalism
dP dP dS =− (C, t) ln P(C, t) − (C, t). dt dt dt C C The last term cancels out due to the normalization condition the master equation, one has
C
(2.155)
P(C, t) = 1. Using
dS =− − W (C |C)P(C, t) + W (C|C )P(C , t) ln P(C, t) dt C C (=C) ln P(C, t) W (C |C)P(C, t) − W (C|C )P(C , t) . (2.156) = C,C (C=C )
Exchanging the notations C and C in the last equation, we also have dS = ln P(C , t) W (C|C )P(C , t) − W (C |C)P(C, t) . dt C,C (C=C )
(2.157)
Summing Eqs. (2.156) and (2.157), and using the detailed balance property W (C |C) = W (C|C ), we obtain 1 dS = ln P(C , t) − ln P(C, t) P(C , t) − P(C, t) W (C|C ). dt 2 C,C (C=C ) (2.158) As [ln P(C , t) − ln P(C, t)] and [P(C , t) − P(C, t)] have the same sign, one concludes that dS ≥ 0. (2.159) dt This is one possible statement, in the context of stochastic processes, of the second law of thermodynamics. Moreover, in the stationary state, d S/dt = 0, and one necessarily has for all pairs (C, C ) either Pst (C) = Pst (C ) or W (C|C ) = 0, where Pst (C) is the stationary probability distribution. One then recovers, consistently with the detailed balance assumption W (C |C) = W (C|C ), the postulate of equilibrium statistical mechanics stating that mutually accessible configurations have the same probability. More generally, for Markovian stochastic processes described by master equation ˜ (2.9), it is always possible to define a functional S({P(C, t)}) that increases with time, without need for detailed balance or microreversibility properties [20]. The ˜ general definition of S({P(C, t)}) is ˜ =− S(t)
C
P(C, t) ln
P(C, t) . Pst (C)
(2.160)
2.6 Fast and Slow Relaxation to Equilibrium
79
A drawback of this definition is that the stationary distribution Pst (C) needs to be ˜ which in many cases restricts the usefulness of the known in order to define S, ˜ Yet, a simple application of the generalized definition (2.160) of the functional S. entropy is the case of the canonical ensemble. Using Pst ∝ e−β E , one finds ˜ = β Feq + S(t) − βE(t), S(t)
(2.161)
where Feq is the equilibrium free energy, and S(t) the time-dependent entropy defined in Eq. (2.154). This definition suggests to introduce a time-dependent free energy F(t) as F(t) = E(t) − T S(t) (2.162) with T = β −1 the temperature. In this way, one has ˜ = β Feq − F(t) , S(t)
(2.163)
leading to d F/dt ≤ 0. Hence, the time-dependent free energy is a decreasing function of time. This result is consistent with the standard thermodynamic result that the free energy of a system in contact with a thermostat can only decrease under spontaneous evolution.
2.6.3 Slow Relaxation and Physical Aging Although many systems converge to a stationary state on times shorter than or comparable to the observation time, it turns out that some systems do not reach a steady state and keep evolving on time scales that can be very large compared to standard observation times. This is the case, for instance, of glasses, which keep aging for years or more, as well as in laser cooling experiments [3]. It is also likely that aging mechanisms, or slow relaxation effects, play a significant role in many different types of complex systems. Even though the aging mechanisms may differ from one situation to the other, it is certainly of interest to investigate one of the simplest known aging phenomena, illustrated by the trap model, which we describe here within a generic formalism that does not rely on a specific physical realization. Let us consider a model system in which to each configuration C is associated a given lifetime τ . This lifetime τ is the mean time spent in configuration C before moving to another configuration. As we consider only temporal aspects of the dynamics, and not other types of observables (energy, magnetization,...), we simply label the configurations by their lifetime τ . We then choose a simple form for the transition rate W (τ |τ ), namely, 1 (2.164) W (τ |τ ) = ψ(τ ). τ
80
2 Non-stationary Dynamics and Stochastic Formalism
The function ψ(τ ) is the a priori probability distribution of the configurations τ , meaning that the selected new ∞configuration is chosen completely at random. From the normalization condition 0 dτ ψ(τ ) = 1, we have
∞
dτ W (τ |τ ) =
0
1 , τ
(2.165)
so that the characteristic escape time from a configuration with lifetime τ is precisely τ , as it should. For simplicity, we also assume that all lifetimes τ are greater than a value τ0 that we set to τ0 = 1 in the following. The master equation then reads ∞ ∞ ∂P (τ, t) = −P(τ, t) dτ W (τ |τ ) + dτ W (τ |τ )P(τ , t) ∂t 1 1 ∞ 1 dτ = − P(τ, t) + ψ(τ ) P(τ , t). (2.166) τ τ 1 At equilibrium, the probability to be in a configuration with lifetime τ is proportional to τ and to the a priori distribution ψ(τ ) of configurations: Peq (τ ) = where τ is defined as
τ =
∞
1 τ ψ(τ ), τ
(2.167)
dτ τ ψ(τ ).
(2.168)
1
Similar to the case of anomalous diffusion discussed in Sect. 2.4, the key ingredient that determines the behavior of the process is the shape of the tail of the lifetime distribution ψ(τ ). The most interesting situation corresponds to a distribution ψ(τ ) with a power-law tail. Here, for simplicity, we take a distribution with a pure powerlaw form, namely, α τ > 1 (α > 0). (2.169) ψ(τ ) = 1+α , τ An example of physical realization is the case of a particle trapped into potential wells of random depth E, with an exponential distribution ρ(E) =
1 −E/E0 e E0
(E > 0) .
(2.170)
The lifetime τ is given by the standard Arrhenius law τ = τ0 e E/T ,
(2.171)
where τ0 = 1 is a microscopic time scale. Using the relation ψ(τ )|dτ | = ρ(E)|d E|, one precisely finds the form (2.169) for ψ(τ ), with α = T /E 0 .
2.6 Fast and Slow Relaxation to Equilibrium
81
In the case α > 1, τ is finite, but if α ≤ 1 then τ is infinite, so that the equilibrium distribution (2.167) does not exist, as it is not normalizable. As a result, no stationary state can be reached, and the system keeps drifting toward configurations with larger and larger lifetimes τ . It is then of interest to determine the time-dependent probability distribution P(τ, t) in the long time regime. We postulate the following scaling form: P(τ, t) =
1 τ φ . t t
(2.172)
From the normalization condition of P(τ, t), one has
∞
dτ P(τ, t) =
1
1 t
∞
dτ φ
τ
1
t
= 1,
(2.173)
from which one gets, with the change of variable u = τ/t,
∞
du φ(u) = 1.
(2.174)
1/t
As φ(u) does not depend explicitly on time t, the above condition cannot be satisfied for all t. But we are looking for an asymptotic large-t solution, so that we impose that Eq. (2.174) is satisfied in the infinite t limit, namely,
∞
du φ(u) = 1.
(2.175)
0
As a result, the scaling form (2.172) is an approximate solution that becomes exact when t → ∞. Equation (2.172) yields for the time derivative of P(τ, t): 1 τ τ τ ∂P =− 2φ − 3φ , ∂t t t t t
(2.176)
where φ is the derivative of φ. Multiplying Eq. (2.176) by t 2 , one obtains, with the notations u = τ/t and v = τ /t, 1 − φ(u) − uφ (u) = − φ(u) + ψ(ut) t u
∞ 1/t
dv φ(v). v
(2.177)
Using the specific form (2.169) of ψ(τ ), we find
∞ 1 dv α 1− φ(u) + uφ (u) + 1+α t −α φ(v) = 0. u u 1/t v
(2.178)
For the above equation to be well defined in the infinite t limit in which it is supposed to be valid, the explicit t-dependence has to cancel out. One thus needs to have
82
2 Non-stationary Dynamics and Stochastic Formalism
∞
1/t
dv φ(v) ∼ t α , v
t → ∞,
(2.179)
which requires that φ(v) has the following asymptotic form at small v: φ(v) ≈
φ0 , vα
v → 0.
(2.180)
Here, φ0 is an unknown constant, to be determined later on from the normalization condition of φ(u). The master equation is then finally written as the following differential equation:
1 φ0 1− φ(u) + uφ (u) + 1+α = 0. u u
(2.181)
This equation is a linear inhomogeneous differential equation, and its solution can be found using standard techniques. The solution of Eq. (2.181) satisfying the normalization condition (2.175) reads [14] sin(π α) 1 −1/u e φ(u) =
(α) u
1/u
dv vα−1 ev ,
(2.182)
0
∞ where (α) = 0 x α−1 e−x d x is the Euler Gamma function. It is rather easy to show that φ(u) ∼ u −α for u → 0 as expected, and that φ(u) ∼ u −1−α for u → ∞, leading for P(τ, t) to P(τ, t) ∝ τ ψ(τ ), P(τ, t) ∝ ψ(τ ),
τ t, τ t.
(2.183) (2.184)
These asymptotic behaviors can be interpreted rather easily: configurations with lifetimes τ t have been visited a large number of times, so that they are quasiequilibrated; in contrast, configurations with lifetimes τ t have been visited at most once, and the precise value of τ is not yet felt by the dynamics. In the physical example of the trap model defined by Eqs. (2.170) and (2.171), the aging regime occurs for α = T /E 0 < 1, so that Tg ≡ E 0 turns out to be the glass transition temperature. For T < Tg , the energy distribution p(E, t), obtained from P(τ, t) by a simple change of variable, takes the scaling form p(E, t) = (E − T ln t). A logarithmic drift toward larger energy barriers is thus observed, as illustrated in Fig. 2.4. The average trap depth is given by E(t) ≈ E 1 + T ln t, where E 1 is a temperature-dependent constant.
(2.185)
2.7 Exercises 0.4 -1
10 P(E,t)
0.3
P(E,t)
Fig. 2.4 Energy barrier distribution in the aging regime of the trap model (T = Tg /2), for different times t = 106 (full line), t = 107 (dashed line) and t = 108 (dot-dashed). The distribution drifts toward larger energy barriers, logarithmically with time. The inset shows the same data on a semi-logarithmic scale, to visualize the exponential tails
83
10
-2
0.2 -3
10 0
0
10
5
15
E
0.1
0
5
10
15
20
30
25
E
2.7 Exercises 2.1 Detailed balance with respect to the canonical equilibrium distribution Find simple transition rates obeying the canonical detailed balance relation Eq. (2.12). It may be helpful to consider transition rates W (C |C) that depend on the configurations C and C only through the energy difference E = E(C ) − E(C). 2.2 Random walk with memory Consider a discrete-time random walk defined by the two variables xt and vt as xt+1 = xt + vt ,
vt+1 = αvt + ξt
(2.186)
with a parameter 0 < α < 1, and a random variable ξt drawn at each time t from an arbitrary distribution with zero mean and variance σ 2 . The initial condition at time t = 0 is x0 = v0 = 0. The random walk may be considered to have memory in the sense that the displacement at time t + 1 is correlated with the displacement at time t, at odds with elementary random walks. Yet in terms of the couple of variables (xt , vt ) the dynamics is Markovian. Determine the mean-square displacement xt2 for any time t > 0. Hint: Write a matrix equation for the vector X t = (xt2 , vt2 )T (the superscript “T ” indicates a transpose), and find a particular solution proportional to (t, 1)T . Then from the linearity of the equation deduce the general solution. 2.3 Linear Langevin equation with white or colored noise (a) Show that for a velocity v(t) obeying the linear Langevin equation (2.30), the time correlation v(t)v(t ) decays exponentially as described by Eq. (2.46). (b) Show that if the noise ξ(t) is not a white noise, but an Ornstein–Uhlenbeck process with correlation time τ , ξ(t)ξ(t ) =
D −|t−t |/τ e , τ
(2.187)
84
2 Non-stationary Dynamics and Stochastic Formalism
the variance v2 of the velocity in steady state is given by Eq. (2.59). 2.4 Fokker–Planck equation in the Stratonovich interpretation Show that the multiplicative Langevin equation (2.52) in the Stratonovich interpretation leads to Fokker–Planck equation (2.96). Hint: Use the fact that in the Stratonovich scheme, usual rules of calculus can be applied to the Langevin equation; define the variable y through dy = d x/B(x) and show that this variable obeys the additive Langevin equation (2.98). Conclude on the Fokker–Planck equation governing the probability distribution P(x). 2.5 Fully biased random walk in a disordered environment Consider a particle performing a fully biased random walk on a one-dimensional lattice in the presence of quenched disorder. The disorder is encoded in random trapping times τi on each site i, drawn from a broad distribution ψ(τ ) ∼ τ0α /τ 1+α with 0 < α < 1. The particle stays a time τi on site i and then jumps to site i + 1 (displacements are fully biased due to the implicit presence of a strong external field). Using simple scaling arguments, show that the typical displacement of the particle after a time t scales like t γ , with an exponent γ to be determined. 2.6 Decrease of the time-dependent free energy Starting from the generic master equation (2.9) under the assumption of detailed balance with respect to the canonical equilibrium distribution, show that the timedependent free energy defined in Eq. (2.162) is a decreasing function of time.
References 1. Aumaître, S., Pétrélis, F., Mallick, K.: Low-frequency noise controls on-off intermittency of bifurcating systems. Phys. Rev. Lett. 95, 064101 (2005) 2. Bertin, E.: On-off intermittency over an extended range of control parameter. Phys. Rev. E 85, 042104 (2012) 3. Bertin, E., Bardou, F.: From laser cooling to aging: a unified Lévy flight description. Amer. J. Phys. 76, 630 (2008) 4. Bouchaud, J.P., Georges, A.: Anomalous diffusion in disordered media: statistical mechanisms, models and physical applications. Phys. Rep. 195, 127 (1990) 5. Cugliandolo, L., Lecomte, V.: Rules of calculus in the path integral representation of white noise Langevin equations: the Onsager-Machlup approach. J. Phys. A Math. Theor. 50, 345001 (2017) 6. Feller, W.: An Introduction to Probability Theory and its Applications, vol. I, 3rd edn. Wiley, New York (1968) 7. Fujisaka, H., Yamada, T.: A new intermittency in coupled dynamical systems. Prog. Theor. Phys. 74, 918 (1985) 8. Gardiner, C.W.: Stochastic Methods. Springer, Berlin (2009) 9. Gnedenko, B.V., Kolmogorov, A.N.: Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Boston (1954) 10. Heagy, J.F., Platt, N., Hammel, S.M.: Characterization of on-off intermittency. Phys. Rev. E 49, 1140 (1994)
References
85
11. Jagla, E.A.: Avalanche-size distributions in mean-field plastic yielding models. Phys. Rev. E 92, 042135 (2015) 12. Liu, C., Ferrero, E.E., Puosi, F., Barrat, J.L., Martens, K.: Driving rate dependence of avalanche statistics and shapes at the yielding transition. Phys. Rev. Lett. 116, 065501 (2016) 13. Metzler, R., Klafter, J.: The random walk’s guide to anomalous diffusion: a fractional dynamics approach. Phys. Rep. 339, 1 (2000) 14. Monthus, C., Bouchaud, J.P.: Models of traps and glass phenomenology. J. Phys. A Math. Gen. 29, 3847 (1996) 15. Nicolas, A., Ferrero, E.E., Martens, K., Barrat, J.L.: Deformation and flow of amorphous solids: a review of mesoscale elastoplastic models. Rev. Mod. Phys. 90, 045006 (2018) 16. Oksendal, B.: Stochastic Differential Equations: An Introduction With Applications, 6th edn. Springer, Berlin (2010) 17. Petrov, V.V.: Sums of Independent Random Variables. Springer, Berlin (1975) 18. Platt, N., Spiegel, E.A., Tresser, C.: On-off intermittency: a mechanism for bursting. Phys. Rev. Lett. 70, 279 (1993) 19. Risken, H.: The Fokker-Planck Equation. Springer, Berlin (1996) 20. Van Kampen, N.G.: Stochastic Processes in Physics and Chemistry. North Holland (1992)
Chapter 3
Models of Particles Driven Out of Equilibrium
Up to now, we have mainly considered physical systems, in which elementary units are implicitly atoms or molecules. In this case, the laws of motion of the individual particles are known, and the main difficulty consists in being able to change the scale of description, going from particle scale to system size. However, our everyday life experience tells us that many familiar systems are composed of interacting macroscopic units, and thus behave very differently from atoms or molecules: examples range from sand piles, foams, bacteria colonies, animal flocks, to quote only a few examples. The nonequilibrium character of these systems leads to a wealth of interesting behaviors [36, 39]. In such cases, it is clear that the interacting macroscopic objects (sand grains, bubbles, bacteria, etc.) cannot be described in the same way as atoms or molecules. The difficulties encountered when trying to apply a statistical physics approach to such assemblies of macroscopic units are then twofold. On the one hand, a model should be given for the dynamics of individual units, and it is not always clear how reliable such modeling is to describe realistic systems, given the simplifications and assumptions made. On the other hand, reasonable models of the dynamics of individual objects usually do not have similar conservation laws and time-reversal symmetry as the Hamiltonian dynamics of molecular systems. In particular, time-reversal symmetry is systematically broken for the dynamics of macroscopic objects, or more generally for driven systems, including small-scale ones. This lack of time-reversal symmetry may have far reaching consequences [45]. Hence it may be hard, even in specific cases, to build a statistical physics approach from a postulate similar to the hypothesis of equiprobability of configurations having the same energy. Interesting attempts in this direction, notably in the context of granular matter, have however been proposed [22], and are briefly discussed below. In this chapter, we illustrate on several classes of models how different statistical physics techniques can be devised, in specific cases, to describe systems driven in a nonequilibrium steady state as well as assemblies of interacting macroscopic units [4]. We start in Sect. 3.1 by describing the simple situation of a single particle or unit driven into a nonequilibrium steady state by different types of applied forces, e.g., external forces or active forces. In Sect. 3.2, we turn to systems where parti© Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6_3
87
88
3 Models of Particles Driven Out of Equilibrium
cles or more general units are not conserved and may appear or disappear, leading notably to the interesting phenomenology of absorbing phase transitions that we discuss at mean-field level, after having introduced the generic formalism of birth– death processes. Coming back to systems with conserved particles, we then discuss the statistical description of their spatial properties. For some carefully designed nonequilibrium stochastic models of interacting particles, it is possible to determine exactly the full N -particle probability distribution, as discussed in Sect. 3.3 on a few standard examples (Zero-Range Process and Asymmetric Simple Exclusion Process). Yet, finding the exact N -body distribution remains the exception rather than the rule, and in many cases approximation schemes are needed. We discuss in Sect. 3.4 how a relatively general approximation strategy can be divised in the context of systems of frictional particles like granular matter. Finally, we present in Sect. 3.5 another strategy to describe spatial properties of driven systems of interacting particles, based on a reduction of information to a one-particle distribution and then further to hydrodynamic fields. This is the generic goal of the so-called kinetic theory, valid when interactions are limited to binary “collisions”, that are very localized in space and time. We illustrate this generic and versatile approach on the example of collective motion of active particles.
3.1 Driven Steady States of a Particle with Langevin Dynamics We investigate here a few examples of a single-particle driven into a nonequilibrium steady state by different types of mechanisms.
3.1.1 Non-zero Flux Solution of the Fokker–Planck Equation We first consider the general solution of the one-dimensional Fokker–Planck equation for a particle with arbitrary position-dependent drift and diffusion coefficients, as given in Eq. (2.90) as a result of the Kramers–Moyal expansion. This equation can be recast into a continuity equation ensuring the conservation of probability, ∂J ∂p (x, t) = − (x, t), ∂t ∂x
(3.1)
where the probability current J (x, t) is given by (up to a redefinition of α1 ) 1 ∂p . J (x, t) = α1 (x) p(x, t) − α2 (x) 2 ∂x
(3.2)
3.1 Driven Steady States of a Particle with Langevin Dynamics
89
The steady-state condition ∂ p/∂t = 0 yields a uniform and time-independent current, J (x, t) = J0 . Equilibrium corresponds to the specific case J0 = 0, and one finds in this case x C 2α1 (y) exp , (3.3) dy peq (x) = α2 (x) α2 (y) where C is a normalization constant. For an overdamped Langevin dynamics γ
dx = −U (x) + ξ(t) dt
(3.4)
describing the stochastic motion of a particle in a potential U (x), with γ a friction coefficient, and ξ(t) a Gaussian white noise satisfying ξ(t) = 0 ,
ξ(t)ξ(t ) = 2γ k B T δ(t − t ) ,
(3.5)
one has α1 (x) = −U (x)/γ and α2 (x) = 2k B T /γ , leading to the Boltzmann distribution peq (x) ∝ exp[−U (x)/k B T ]. Turning to a nonequilibrium situation with non-zero current, we consider the case when a non-conservative force f is added to the conservative force −U (x), yielding for the Langevin equation γ
dx = −U (x) + f + ξ(t) . dt
(3.6)
To allow for a stationary current, we impose periodic boundary conditions. We set γ = 1 to lighten notations. Defining the system on an interval [0, L], we thus get p(0, t) = p(x, t), and we can set U (0) = U (L) = 0 without loss of generality. We now have α1 (x) = −U (x) + f , and α2 (x) = 2k B T . Looking for the nonequilibrium stationary distribution pst (x) associated with a non-zero current J0 , we get using standard integration procedures pst (x) =
1 Z
1−K
x
dy eβU (y)−β f y e−βU (x)+β f x
(3.7)
0
with β = 1/k B T , and K = β J0 Z can be interpreted as a normalized current. The L constant Z is fixed by the normalization condition 0 pst (x) d x = 1. The parameter K (and thus the current J0 ) is not a free parameter but is determined as a function of the non-conservative force f by the periodic boundary condition pst (0) = pst (L), leading to −1 L K = 1 − e−β f L dy eβU (y)−β f y . (3.8) 0
The expression of the current J in terms of the non-conservative force f follows.
90
3 Models of Particles Driven Out of Equilibrium
3.1.2 Ratchet Effect in a Time-Dependent Asymmetric Potential We have just seen in Sect. 3.1.1 a simple and natural mechanism to generate a particle current, by applying a non-conservative force on the particle, which breaks the space reversal (i.e., left/right) symmetry. We now briefly discuss a more subtle way to generate a current without explicitly applying a non-conservative force, using the so-called ratchet effect described below. One if its simplest instantiations is that of the on-off potential, in which a potential U (x) is randomly switched on and off with a rate λ. During the on periods, the potential is applied on the particle and the Langevin dynamics reads dx = −U (x) + ξ(t) (3.9) γ dt while during the off periods, the particle simply experiences a Brownian motion, γ ddtx = ξ(t). The statistics of the Gaussian white noise ξ(t) are the same in the on and off periods and are given by Eq. (3.5). The average duration of both on and off periods is τ = 1/λ. We assume that τ is large enough so that equilibration with or without potential can be reached to a good approximation during both the on and off phases. We now try to evaluate the average current resulting from this ratchet dynamics, considering the system to be defined on a one-dimensional interval [0, L], with periodic boundary conditions. As in the case of an applied non-conservative force, an important issue is to break the space reversal symmetry. This implies the use of a potential that is not symmetric under space reversal. As a simple and illustrative example, we consider a periodic potential U (x + L) = U (x), with U (x) = −bx/L for 0 < x < L. This discontinuous periodic potential may be understood as a limit of a continuous potential having a very steep part around x = 0. During the off phase, motion is purely diffusive with no drift, so that the average displacement of the particle is zero. In contrast, during the on phase, the average displacement is nonzero. Assuming quasi-equilibration during the off phase, the average position at the beginning of the on phase is xoff = L2 . At the end of the on phase, assuming again quasi-equilibration with a Boltzmann statistics, the average position of the particle is 1 − (1 + b) e−b L. (3.10) xon = b(1 − e−b ) Hence during the on phase, the particle has moved on average by a distance x = xon − xoff . Note that this seemingly obvious result is actually not exact, as it neglects the probability that the particle has crossed the boundaries, either on the left (x = 0) or on the right (x = L). This assumption is valid only in the small temperature limit, when the drift dominates over diffusion, and activated barrier crossing can be neglected on a time scale τ . Note, however, that diffusion cannot be completely neglected, as its role is essential in the off phase to restore a spatially uniform probability distribution.
3.1 Driven Steady States of a Particle with Langevin Dynamics
91
The average current J¯ is then given by the average total displacement over the off and on phases, yielding
x J¯ = . (3.11) 2τ For τ → ∞, the current vanishes as expected, because this corresponds to the equilibrium limit. A finite τ is thus necessary to reach a current. As already mentioned, the above simple argument relies on the assumption that τ is large enough so that quasi-equilibration is reached. A smaller value of τ still leads to a non-zero current, but it is then more difficult to evaluate. Note finally that in the limit τ → 0, the current also vanishes, because the particle has no time to relax in the potential U (x). An optimal value of τ leading to a maximal current is thus expected.
3.1.3 Active Brownian Particle in a Potential We have considered above a drive by a uniform non-conservative force, and by a random switching on and off of a non-symmetric potential. Another way to drive a particle far from equilibrium is to apply a self-propulsion force to the particle. This type of driving mechanism is particularly relevant in dimension higher than one, i.e., in two and three dimensions. Such particles are called active particles, because they are able to extract energy from their environment (e.g., chemical energy) and to convert it into mechanical motion. Such active systems have attracted a lot of attention in the last fifteen years, both at the experimental and theoretical levels [36]. For instance, a number of experiments on active colloids [12, 13, 27, 40, 41, 46], self-propelled millimetric grains [18, 33, 38], or bacteria [42, 52], has been performed. On the theoretical side, minimal models like Active Brownian Particles (ABP) or Run-and-Tumble Particles (RTP) [14] have been introduced and studied under different experimentally relevant protocols. We discuss below the example of an ABP in a potential in two dimensions [3]. Note that calculations are very similar for a RTP. In two dimensions, an ABP is a point-like particle characterized by its position r = (x, y) and by an angle θ that gives the orientation of the self-propulsion force. In the presence of an external potential U (r), the dynamics reads ∂r = v0 e(θ ) − κ∇U , ∂t
(3.12)
where κ is a mobility coefficient and v0 e(θ ) stands for the self-propulsion “force” (written here in velocity units by multiplying the force by κ); v0 is the fixed particle speed, and e(θ ) is the unit vector in the direction θ . The operator ∇ is the gradient operator, ∇ = (∂x , ∂ y ). The angle θ diffuses with an angular diffusion coefficient DR , meaning that dθ = ξ(t) , dt
ξ(t) = 0 , ξ(t)ξ(t ) = 2DR δ(t − t ) .
(3.13)
92
3 Models of Particles Driven Out of Equilibrium
The statistics of the position r and of the angle θ is described by the probability density P(r, θ ), which obeys the following Fokker–Planck equation ∂2 P ∂P (r, θ ) = −∇ · v0 e(θ ) − κ∇U (r) P(r, θ ) + DR 2 . ∂t ∂θ
(3.14)
To proceed further, it is convenient to expand the distribution P(r, θ ) onto angular Fourier modes. Defining the angular Fourier mode f k (r, t), with k integer, as
2π
f k (r, t) =
dθ P(r, θ, t) eikθ ,
(3.15)
0
the Fokker–Planck equation (3.14) transforms into ∂ fk v0 (∂x + i∂ y ) f k−1 + (∂x − i∂ y ) f k+1 + κ∇ · ( f k ∇U ) − k 2 DR f k , =− ∂t 2 (3.16) with i the imaginary unit (i 2 = −1). We aim at determining the stationary probability density ρ(r), defined by integrating P(r, θ ) over the angle θ . It turns out that ρ(r) = f 0 (r), the angular Fourier mode of order zero. We consider a limit of high angular diffusion coefficient DR , and consider that the density mode ρ(r), which is a conserved quantity, relaxes much more slowly than the angular modes f k with k = 0. At a formal level, the relevant limit to consider is to take both DR and v0 to infinity, keeping fixed the quantity D = v02 /2DR which corresponds to an effective spatial diffusion coefficient. This limit is called the diffusive limit. Taking advantage of the fast relaxation of modes f k with k = 0, one can re-express these modes in terms of ρ(r) and its derivatives to leading order in 1/DR . This allows us to close Eq. (3.16) for k = 0 in terms of ρ, eventually leading to the evolution equation [30]
dρ D
ρ = D∇ · ∇ ρ + dt 8DR κD + ∇ · [ ρ∇U + (∇ρ · ∇)∇U ] + κ∇ · (ρ∇U ) . DR
(3.17)
The stationary profile ρ(r) satisfying Eq. (3.17) can be searched for perturbatively to first order in 1/DR in the form 1 ρ(r) ∝ exp −φ0 (r) − φ1 (r) . DR
(3.18)
For simplicity, we assume that the potential U (r) is invariant along the direction y, namely, U (r) = U (x). We get in this case for φ0 (x) and φ1 (x) the explicit expressions
3.1 Driven Steady States of a Particle with Langevin Dynamics
κ U (x) , D κ 13κ 2 2 7κ 3 x U (x) + dz U (z)3 . φ1 (x) = − U (x) − 8 16D 8D 2 0
φ0 (x) =
93
(3.19a) (3.19b)
At zeroth order in 1/DR in this diffusive limit, one recovers an effective Boltzmann equilibrium, with an effective temperature Teff = D/(κk B ). However, the distribution departs from the Boltzmann distribution once the first-order correction is included. It is interesting to note that the correction is non-local with respect to the potential U (x), in the sense that ρ(x) does not depend only on U (x), but also on U (x ) for x = x.
3.2 Dynamics with Creation and Annihilation of Particles Up to now, we have considered examples where the interacting particles are stable objects that exist at all time. However, there exist situations where particles are no longer conserved at all time and can be created or annihilated. This is the case, for instance, when considering chemical reactions, where a molecule of a given chemical species can be transformed into one of another species. Another example comes from theoretical biology, when describing the dynamics of a population of living beings who have a finite lifetime and are able to reproduce, leading to fluctuations of the population size. Quite importantly, this type of particle number fluctuations typically differs from that of the grand-canonical ensemble considered in Sect. 1.3.3, where particles are stable objects exchanged with a reservoir, also leading to particle number fluctuations in the system of interest. In this latter case, the global conservation of particle number in the global system (system of interest plus reservoir) leads to a specific structure of the distribution of microscopic configuration, with a chemical potential describing the effect of the reservoir. The definition of the chemical potential is deeply rooted in the existence of an underlying conservation law for the total number of particles. In contrast, we consider here systems where the number of particles fluctuates due to the creation and annihilation of particles, without any underlying conservation law. This leads to new phenomena, like the existence of absorbing phase transition between fluctuating, “active” states and empty absorbing states where no particles are present in the system. We first present the simplest generic model of a process with creation and annihilation of particles, called birth–death process, and describe its generic solution. Then we turn to reaction–diffusion processes and present a mean-field description of such a process, to illustrate its phenomenology. Finally, we discuss the fluctuation statistics in a fully connected model with an absorbing phase transition.
94
3 Models of Particles Driven Out of Equilibrium
3.2.1 Birth–Death Processes and Queueing Theory A birth–death process is a very simple process in which a stochastic variable n counts the number of particles (or agents, or any other type of entities) in the system. The number n is changed to n + 1 if a particle is added (birth process), and to n − 1 if a particle is removed (death process). Given the current state n, the birth rate is called λn (for any n ≥ 0), and the death rate is called μn (for any n ≥ 1). The evolution of the probability pn (t) to have n particles is given by the set of equations dp0 = μ1 p1 − λ0 p0 , dt dpn = λn−1 pn−1 + μn+1 pn+1 − (λn + μn ) pn dt
(3.20) (n ≥ 1) .
(3.21)
The equilibrium distribution is obtained by setting the time derivatives to zero in Eqs. (3.20) and (3.21). Solving the resulting recursion equations, one finds for the equilibrium distribution ⎡ eq p0
= ⎣1 +
k ∞ λ j−1 k=1 j=1
pneq
= p0
n λk−1 k=1
μk
⎤−1 ⎦
(3.22)
(n ≥ 1) .
(3.23)
μj
Quite importantly, this distribution is defined only when the denominator in the defi k λ j−1 eq nition of p0 takes a finite value, meaning when the series ∞ k=1 j=1 μ j converges. When this series diverges, no stationary distribution exists, meaning that the probability pn (t) keeps evolving indefinitely in time, with typically a probability that concentrates on larger and larger values of n. Birth–death processes are of interest in many applications, in particular in queueing theory as briefly described below. A simple example is when the birth and death rates do not depend on n, namely, λn = λ and μn = μ. This case is relevant, for instance, to the modeling of queues at a desk: λ is the rate at which people randomly arrive in the queue, and μ is the rate at which people exit the queue after having been served at the desk. Both arrivals and exits are Poissonian processes, meaning that the distributions of times τar between successive arrivals and τex between successive exits (τex is also called the service time) are exponential distributions, par (τar ) = λ e−λτar ,
pex (τex ) = μ e−μτex .
(3.24)
Another property of Poisson processes is that the number m of events (e.g., arrivals or exits) during a time window of duration T is distributed according to a Poisson distribution,
3.2 Dynamics with Creation and Annihilation of Particles
Par (m) =
(λT )m −λT e , m!
95
Pex (m) =
(μT )m −μT e . m!
(3.25)
eq
Coming back to the equilibrium distribution pn , it reads in the case of n-independent birth and death rates (3.26) pneq = (1 − ρ) ρ n (n ≥ 0) , where we have set ρ = λ/μ, and on condition that ρ < 1. The average value n and the variance Var(n) are given by n =
ρ , 1−ρ
Var(n) =
ρ . (1 − ρ)2
(3.27)
When λ > μ (i.e., ρ < 1), no equilibrium state exists, and the number n grows indefinitely because addition of particles (births) dominates over the withdrawal of particles (deaths). The general formula (3.23) however leads to a well-defined equilibrium when λ > μ on condition that the birth rate λn is zero above a threshold n 0 (in other words, if there is a maximum possible number n 0 of particles in the system). Another situation of interest is when the birth rate λ0 is equal to zero. This means that the configuration n = 0 is an absorbing state: once the state n = 0 is reached, it is no longer possible to escape it. In this case, the equilibrium state is trivial, and eq eq simply reads p0 = 1 and pn = 0 for n ≥ 1. This means that after an infinitely long time, the system always ends up in the state n = 0. However, the transient leading to this final state may be very long and may present non-trivial features. The study of more general processes of this type, exhibiting an absorbing state, is the goal of the next subsection.
3.2.2 Reaction–Diffusion Processes and Absorbing Phase Transitions Reaction–diffusion processes are simplified models describing the evolution with time of an assembly of different types of molecules that diffuse and chemically react upon encounter. Particle types are usually described by letters, ‘A’, ‘B’, ‘C’, etc. Transition rates in reaction–diffusion models are often written in terms of chemical reactions, like A + B → C, or 2 A → ∅. Many reaction–diffusion models are defined on a lattice, in such a way that all particles sit on a node of the lattice. A microscopic configuration of the system is then given by the list of the numbers n iA , n iB , n iC , etc. of particles A, B, C,... on node i. In some models, an exclusion principle is present, so that at most one particle can lie on a given site. In this case, it is convenient to represent the configuration of the system by introducing a local variable qi = 0, 1, 2, 3,... corresponding, respectively, to having zero particle, a particle A, a particle B, or a particle C, for instance, on site i.
96
3 Models of Particles Driven Out of Equilibrium
In the following, we consider reaction–diffusion models with a single type of particles, denoted as A, and we discuss two different types of mean-field approaches. The first one is the most common one, which simply consists in writing non-linear evolution equations for the density field ρ of particles A. This approach is a meanfield one in the sense that some correlations between particles are neglected. In addition, such an approach describes local average values and does not account for density fluctuations. To deal with fluctuations, a second approach is to consider a fully connected model, in which any particle can interact with any other particle, and to determine the distribution of the number of particles in the system. Such an approach, accounting for fluctuations, however looses spatial information. It effectively amounts to working in an infinite-dimensional space, similarly to the fully connected Ising model. For definiteness, we will consider a simple example of reaction–diffusion model, described by the following three reactions: A → 2 A with rate κ, A→∅ 2A → A
with rate ν, with rate λ.
(3.28) (3.29) (3.30)
The precise meaning of the rates κ, ν and λ will appear below. We further assume that particles A diffuse in space with a diffusion coefficient D. We now wish to determine an evolution equation for the density field ρ(r, t) describing the average number of particles in a small volume around point r. The rate of change of the density resulting from reaction (3.28) is simply given by κρ, since for each particle already present in the system, a new particle is created with probability κ per unit time. Similarly, the rate of change associated with reaction (3.28) is equal to −νρ, corresponding to a decrease of density with rate ν. These contributions to the evolution of ρ do not involve any approximation, being linear terms. Approximations become necessary when dealing with interactions between particles. This is the case for reaction (3.30), which involves the encounter of two particles. Strictly speaking, the probability to find two particles at the same point r is a quantity that depends on correlations between the positions of particles and that can thus not be expressed in a direct way as a function of the density field. As a first approximation, one can however neglect correlations and simply express the probability to find two particles in r at time t as the square of the density ρ(r, t). The rate of change of the density resulting from reaction (3.30) thus simply reads −λρ 2 . Taking also into account the diffusion of particles, we end up with the following evolution equation for the density field, ∂ρ = (κ − ν)ρ − λρ 2 + D ρ, ∂t
(3.31)
3.2 Dynamics with Creation and Annihilation of Particles
97
where is the Laplacian operator.1 The diffusion term tends to smooth out spatial heterogeneities of the density field. In the limit of a uniform field, Eq. (3.31) reduces to ∂ρ (3.32) = (κ − ν)ρ − λρ 2 . ∂t When κ < ν, the state ρ = 0 is the only stationary state, and it is linearly stable as can be checked easily by linearizing Eq. (3.32) around ρ = 0. In constrast, for κ > ν, the state ρ = 0 becomes unstable, and a new stationary state emerges, given by κ −ν (κ > ν). (3.33) ρ0 = λ A straightforward stability analysis shows that this state ρ0 is linearly stable. The transition occurring at κ = ν is called an absorbing phase transition. The state ρ = 0 is denoted as the absorbing phase for κ < ν, while the phase ρ0 , present for κ > ν, is called the active phase. Absorbing phase transitions constitute one of the major types of out-of-equilibrium phase transitions. Similarly to equilibrium phase transitions, they are characterized by a diverging correlation length ξ , and by a set of critical exponents, leading to the identification of universality classes. An important difference with respect to equilibrium phenomena is the role played by time, since a detailed characterization of absorbing phase transitions involves space-time trajectories, leading to the introduction of a correlation time τ . For a d-dimensional system, the phase transition is thus characterized as a (d + 1)-dimensional process. Denoting as ε the control parameter of the transition (ε ≡ κ − ν in Eq. (3.31)), the stationary density ρ0 in the active phase ε > 0 scales as ρ0 ∼ εβ , which defines the exponent β. Two other critical exponents are associated with the correlation length and time, which, respectively, scale as ξ ∼ |ε|−ν⊥ and τ ∼ |ε|−ν|| . Notations ν⊥ and ν|| are standard and come from the geometrical interpretation of the spatiotemporal process in a (d + 1)-dimensional space. Universality classes are determined (in the simplest cases) by the set of critical exponents (β, ν⊥ , ν|| ). Note that similarly to equilibrium phase transitions, the exponents characterizing the critical divergence of length and time correlations are equal above and below the transition. The prominent universality class for absorbing phase transitions is called Directed Percolation, often abbreviated as DP. Other universality classes also exist, for instance, for systems with conservation laws [31]. The reaction–diffusion process described by Eqs. (3.28), (3.29) and (3.30) is a typical example of a process belonging to the DP universality class [31]. Equation (3.31) is thus a mean-field representation of the DP class. One sees from Eq. (3.33) that the mean-field value of the β exponent for the DP class is β MF = 1. The mean-field value of the exponent ν|| can also be easily determined from Eq. (3.32). In the inactive phase ε ≡ κ − ν < 0, the density decays to zero and is thus asymptotically described by the linearized equation ∂ρ/∂t = ερ (with ε < 0), whose solution is The Laplacian operator is defined as = ∂ 2 /∂ x 2 in one dimension, = ∂ 2 /∂ x 2 + ∂ 2 /∂ y 2 in two dimensions, and = ∂ 2 /∂ x 2 + ∂ 2 /∂ y 2 + ∂ 2 /∂z 2 in three dimensions.
1
98
3 Models of Particles Driven Out of Equilibrium
ρ(t) ∝ e−|ε|t .
(3.34)
As the correlation time is defined by ρ(t) ∝ e−t/τ , it follows from Eq. (3.34) that τ = |ε|−1 , resulting in a mean-field exponent ν||MF = 1. Finally, scaling arguments performed on Eq. (3.31) allow the mean-field value of the exponent ν⊥ to be determined as well, yielding ν⊥MF = 21 [31]. Using field-theoretical arguments, one can show that these mean-field exponents are correct in space dimension d ≥ dc = 4, where dc is the upper critical dimension. For d < 4, the exponents β, ν⊥ and ν|| differ from their mean-field values.
3.2.3 Fluctuations in a Fully Connected Model with an Absorbing Phase Transition The description of reaction–diffusion processes through Eq. (3.31) is purely deterministic, and provides no information about fluctuations, for instance, the fluctuations of the total number of particles in the system. To go beyond this purely deterministic description, we now study the Markov process associated with the reaction rules (3.28), (3.29), and (3.30) in a fully connected geometry, meaning that any pair of particles in the system has at any time the same probability to react. In this very simplified setting, the only stochastic variable in the system is thus the total number n of particles (by contrast, a density field ρ(r, t) is used in the above deterministic mean-field description). The stochastic evolution of the number n of particles under the reactions (3.28) to (3.30) is described by the following transition rates: W (n + 1|n) = κn W (n − 1|n) = νn +
(3.35) λ n(n − 1) . V
(3.36)
Note that in this fully connected geometry, one needs to rescale the rate λ by the volume V of the system in order to obtain a well-defined infinite volume limit. The probability Pn (t) to have a number n of particles at time t then obeys the master equation λ d Pn = − (κ + ν)n + n(n − 1) Pn + κ(n − 1)Pn−1 dt V λ + ν(n + 1) + n(n + 1) Pn+1 , V
(3.37)
with the convention that P−1 = 0. Let us first note that the absorbing state Pnas defined by P0as = 1 and Pnas = 0 for n ≥ 1 is a solution of Eq. (3.37) for all values of the parameters. This solution is stable by definition of the absorbing state: once all particles have been annihilated, there is no way to create new particles and thus
3.2 Dynamics with Creation and Annihilation of Particles
99
to change state. This result seems to contradict the mean-field result of Sect. 3.2.2, according to which an active phase is present for κ > ν. The reason for this is that the mean-field approach neglects fluctuations and that the absorbing state may be reached when κ > ν only through atypically large fluctuations, as shown below. This suggests that the active phase is actually a long-lived metastable state, with as we will see, a lifetime that diverges exponentially with the volume of the system. The metastable active state can be approximately described by a probability Pnms such that P0ms = 0, thus excluding the state with no particle. From Eq. (3.37), we have that d P0 = ν P1 . (3.38) dt Hence the assumption of a metastable state is consistent only if P1ms is very small, so that P0 remains close to zero for a very long time. In the large volume limit, we assume that the probability distribution Pnms takes a large deviation form [49] (see Sect. 8.3 for more details): Pnms ∝ e−V φ(ρ)
(n > 0),
(3.39)
where ρ = n/V is the (fluctuating) density, and φ(ρ) is a large deviation function. Note that due to the normalization of the probability density, one has φ(ρ) ≥ 0 for all ρ. In order to use the large deviation form of Pnms in Eq. (3.37), we need to determine ms , which can be done by a first-order expansion of φ, namely, Pn±1
ms ∝ e−V φ(ρ± V ) ≈ e−V φ(ρ) e∓φ (ρ) . Pn±1 1
(3.40)
Expressing n as a function of ρ everywhere in Eq. (3.37), one finds in the limit V → ∞, after rearranging the terms
(νρ + λρ 2 ) e−2φ (ρ) − [(κ + ν)ρ + λρ 2 ] e−φ (ρ) + κρ = 0 .
(3.41)
The fact that a well-defined equation is obtained in the limit V → ∞ shows that the assumption of a large deviation form for the distribution Pnms is consistent. The quadratic equation (3.41) for the variable e−φ (ρ) can be solved, yielding two potential solutions, the relevant one being e−φ (ρ) =
κ . ν + λρ
(3.42)
φ (ρ) = ln
ν + λρ . κ
(3.43)
We thus end up with
We already see from this equation that for κ > ν, the solution ρ0 found in Eq. (3.33) yields φ (ρ0 ) = 0, and thus corresponds to the most probable value of ρ, consistently with the deterministic mean-field picture developed in Sect. 3.2.2. That ρ0 is indeed a
100
3 Models of Particles Driven Out of Equilibrium
3
κ = 0.5 κ = 1.5 κ = 2.5
2
φ(ρ)
Fig. 3.1 Illustration of the large deviation function φ(ρ) for different values of κ (ν = λ = 1). The active phase (κ > 1) corresponds to the situation where the minimum of φ(ρ) is reached for ρ > 0.
1
0 0
1
ρ
2
3
minimum of φ (a maximum of the probability distribution) can be checked explicitly using the results below. Integrating φ (ρ) in Eq. (3.43) and choosing the integration constant such that φ(ρ0 ) = 0, we get φ(ρ) = −(ρ − ρ0 )(1 + ln κ) +
κ 1 (ν + λρ) ln(ν + λρ) − ln κ . λ λ
(3.44)
The behavior of the function φ(ρ) is illustrated in Fig. 3.1. As mentioned above, for κ > ν the function φ(ρ) has a minimum (equal to zero) for ρ = ρ0 . It follows that φ(0) > 0, so that P1ms ≈ e−V φ(0) is exponentially small with the volume. The assumption of a metastable active state is thus consistent, according to Eq. (3.38). The lifetime τms of the metastable state can be estimated as the inverse of the rate of increase of P0 , leading to τms ≈
1 V φ(0) e . ν
(3.45)
The lifetime is thus found to increase exponentially with the volume of the system, which justifies the fact to consider the active phase as a “true” phase of the system in the large size limit. In constrast, for κ < ν, the function φ(ρ) has no minimum for ρ > 0 but is minimum for ρ = 0. As a result, φ(0) = 0 so that P1ms remains of the order of 1 even for large volume V . Hence P0 increases rapidly according to Eq. (3.38), and the system converges to the absorbing state in a relatively short time. The assumption of a metastable active state is thus no longer consistent in this case. One thus recovers the absorbing phase transition occurring at κ = ν. One of the interests of the present large deviation approach is to be able to describe the fluctuations of the number of particles in the active phase. Expanding φ(ρ) around its maximum to second order in ρ − ρ0 , we can compute the variance σρ2 = (ρ − ρ0 )2 in the Gaussian approximation, yielding for the relative standard deviation
3.2 Dynamics with Creation and Annihilation of Particles
σρ λ = ρ0 κ −ν
κ . λV
101
(3.46)
Hence the relative amplitude of fluctuations is inversely proportional to the square root of the volume, as expected from the Central Limit Theorem (see Sect. 8.1.1). One may also note from Eq. (3.46) that the relative amplitude of density fluctuations diverges when κ − ν → 0. To conclude on this model, it is also of interest to briefly comment on the analogies and differences with the fluctuations of the number of particles in the equilibrium grand-canonical ensemble (see Sect. 1.3.3), in which particles are exchanged with a reservoir. Assuming, for instance, that particles are randomly distributed among a large number of boxes, the entropy reads S(ρ) = −Vρ ln ρ, where V is here the number of boxes. Taking into account the contribution of the chemical potential μ characterizing the particle reservoir, one finds a large deviation function φGC (ρ) = ρ ln ρ − μρ + φ0 ,
(3.47)
¯ = 0, ρ¯ being the average density. Note where the constant φ0 is chosen such that φ(ρ) that we have set the temperature to T = 1 as it plays no role here. In spite of some similarities, it is thus not possible to directly map the nonequilibrium large deviation function φ(ρ) given in Eq. (3.44) to an equilibrium one as given in Eq. (3.47), showing again that an absorbing phase transition is a genuine nonequilibrium phenomenon.
3.3 Solvable Models of Interacting Driven Particles on a Lattice Let us now turn to a different type of situation, involving alternative techniques. In many cases equilibrium-type methods are not sufficient to solve nonequilibrium models, due, for instance, to the presence of fluxes in the system. One must then resort to other kinds of approaches. Among possible approaches, one can consider simple enough stochastic models for which an exact solution of the master equation can be found in the steady state, although detailed balance is not satisfied. A simple and prominent example of such type of models is the so-called Zero-Range Process (ZRP) [24], that we describe below. Another well-known example of exactly solvable nonequilibrium model is the Asymmetric Simple Exclusion Process (ASEP), for which the derivation of the solution is however much more technical [17] (see Sect. 3.3.3).
102
3 Models of Particles Driven Out of Equilibrium
Fig. 3.2 Sketch of the ZRP in the one-dimensional geometry, with asymmetric bulk dynamics and periodic boundary conditions
3.3.1 Zero-Range Process and Condensation Phenomenon In the ZRP, N particles are randomly placed on the L sites of a one-dimensional lattice with periodic boundary conditions,2 and can jump from site i to one of the neighboring sites i + 1 and i − 1 (with the conventions L + 1 ≡ 1 and 0 ≡ L). Motion is in general biased, which generates a current of particles along the ring. The interaction between particles is taken into account through the fact that the probability per unit time to jump from site i to one of the neighboring sites depends on the current number n i of particles on site i (and only on site i, hence the name Zero-Range Process). To be more specific, the probability to jump to site i + 1 is pu(n i ), and the probability to jump to site i − 1 is qu(n i ), with p + q = 1 and u(n i ) an arbitrary positive function of n i . A sketch of the model is displayed on Fig. 3.2. In the following, we restrict calculations to the fully biased case p = 1 and q = 0 in order to lighten notations. Yet calculations are performed in the same way for q > 0. A configuration of the ZRP is given by the set C = (n 1 , . . . , n L ) of the occupation numbers of all sites. The transition rate W (C |C) can be written formally as W (n 1 , . . . , n L |n 1 , . . . , n L ) =
L
u(n i ) δni ,ni −1 δni+1 ,n i+1 +1
δn j ,n j ,
(3.48)
j =i,i+1
i=1
where δn ,n is the Kronecker symbol, equal to 1 if n = n, and to 0 otherwise. Using this form of the transition rate, one can write the corresponding master equation (see Sect. 2.1), which we do not display here to lighten the presentation. It can be shown [24] that the steady-state distribution takes a factorized form 1 P(n 1 , . . . , n L ) = Z
L
f (n i ) δ j n j ,N ,
(3.49)
i=1
where the Kronecker delta symbol accounts for the conservation of the total number of particles. Inserting this form into the master equation, one obtains the expression of f (n): 2
We consider here for simplicity the ring geometry, but the ZRP can actually be defined on an arbitrary graph [25].
3.3 Solvable Models of Interacting Driven Particles on a Lattice
f (n) =
⎧ n ⎨ k=1 ⎩
1 u(k)
103
if n ≥ 1, (3.50) if n = 0.
1
Note that the model can also be defined in such a way as to obtain any desired function f (n) in the steady-state distribution: one simply needs to choose u(n) = f (n − 1)/ f (n), for n ≥ 1. One of the interesting properties of the ZRP is the presence of a condensation transition, where a finite fraction of the total number of particles gather on a single site. Such a phenomenon appears in the case of a function f (n) decaying as a power law, f (n) ∼ 1/n α , or equivalently u(n) = 1 + α/n + o(1/n). The single-site distribution can be obtained by considering the rest of the system as a reservoir of particles, a situation similar to the canonical ensemble at equilibrium. Assuming the system to be homogeneous, the single-site distribution is then given by p(n) = c f (n) e−μn ,
(3.51)
where μ is the effective chemical potential of the reservoir. The normalization constant c is determined by ∞ 1 = f (n) e−μn . (3.52) c n=0 The convergence of this sum requires that μ > 0 (or μ ≥ 0 if α > 1). The average density ∞ ρ = n = c n f (n) e−μn (3.53) n=1
is a decreasing function of μ, which thus reaches its maximum value ρc for μ → 0: ρc = c
∞ n=1
n f (n) ∼
∞ 1 . α−1 n n=1
(3.54)
Hence ρc is infinite if α ≤ 2, and finite if α > 2. As a result, if α > 2, a homogeneous density of particles cannot exceed a finite density ρc . If, on the contrary, one imposes a density ρ0 > ρc , by including in the system a number of particles N > Lρc , the dynamics will necessarily evolve toward a non-homogeneous state. It can be shown [24] that the resulting state is composed of a “fluid phase”, homogeneous at density ρc , and a “condensate”, that is a single site containing a macroscopic number of particles L(ρ0 − ρc ). Applications of this model range from a vibrated granular matter (each site corresponding to a vibrated urn containing grains), to road traffic (sites being sections of roads), or network dynamics (that is, the dynamics of attachment and detachment
104
3 Models of Particles Driven Out of Equilibrium
of links on the network) [24]. Note that the ZRP is a very simplified model, so that mapping it to more realistic situations often implies approximations.
3.3.2 Dissipative Zero-Range Process and Energy Cascade To go beyond the simple Zero-Range Process discussed above, it is interesting to consider situations where on the one hand the geometry is more complex than the simple one-dimensional ring, and on the other hand particles are exchanged with reservoirs. The case of a single reservoir can be treated in a simple way by replacing the Kronecker δ in Eq. (3.49) by an exponential factor exp(−μ j n j ), where μ is the chemical potential of the reservoir. One finds in this case that the system remains homogeneous for all values of the chemical potential: the condensation transition occurs only in the case when the total number of particles is fixed. A more interesting situation corresponds to the case of two reservoirs. Then, depending on the properties of the reservoirs, a steady flux of particles may be observed from one reservoir to the other. Alternatively, one may also interpret the particles as fixed amounts of energy that can be exchanged between different nodes of the lattice —the chemical potential is then replaced by an inverse temperature. This is the interpretation we retain below, where we combine a non-trivial geometry (a tree geometry) with the presence of two reservoirs. The model is defined as follows [6]. The network on which the dynamics takes place is a tree composed of M successive levels. At each level j < M, each site has a number m > 1 of forward branches linked to a node at level j + 1 —see Fig. 3.3. The number of nodes at a level j is equal to m j−1 . Nodes are thus labeled by their level index j, and by a further index i = 1, . . . , m j−1 distinguishing the different nodes present at the same level. It is convenient to think of each level of the tree as a different length scale. Large length scales correspond to the top of the tree and include a small number of degrees of freedom. On the contrary, the bottom of the tree describes small length scales and is associated with a large number of degrees of freedom. To associate a length scale with each level j of the tree, it is convenient to also introduce a quantity k j = m j−1 , interpreted as a “pseudo-wave number” in a physics terminology. The corresponding length scale is then given by j = 1/k j .
Fig. 3.3 Sketch of the dissipative model, illustrating the tree geometry in the case of M = 3 levels and m = 3 forward branches per node
3.3 Solvable Models of Interacting Driven Particles on a Lattice
105
By connecting a high-temperature reservoir at large scales, and a low-temperature one at small scales, the resulting model may be thought of as a dissipative system, where energy is injected at large scales, and dissipated at small scales. Physical examples of this broad class of systems include, for instance, hydrodynamic turbulence [26], wave turbulence in fluids or in plasma [51], and vibrating plates [11, 19]. Of course, the dissipative ZRP that we consider here is only a toy model that cannot account for all the complexity of these realistic systems. We assume that each node can carry an energy taking only discrete values proportional to a finite amount ε0 , so that the energy ε j,i at node ( j, i) is equal to n j,i ε0 , with n j,i an integer. The configuration of the system is thus described by the list of all the n j,i ’s. Energy transfer within the tree proceeds by moving an amount ε0 of energy along any branch linking level j and j + 1 with a rate (probability per unit time) ν j = νk αj , where ν is a constant parameter. Energy injection is modeled by −1 to the level j = 1. The connecting an energy reservoir at temperature Text = βext frequency of exchanges is chosen to be equal to ν for simplicity (but the model can be equally solved for an arbitrary value of this coupling to the reservoir). To model dissipation, we assume that energy is randomly withdrawn from any node at level M with a rate M . By solving the associated master equation, it can be shown that the stationary probability distribution Pst ({n j,i }) takes the form M m 1 −β j n j,i ε0 e , Z j=1 i=1 j−1
Pst ({n j,i }) =
(3.55)
where the parameters β j are effective inverse temperatures associated with level j of the tree, and Z is a normalization factor. For the distribution Pst ({n j,i }) to solve the master equation of the model, the inverse temperatures β j have to satisfy the following set of equations, expressed in terms of the parameters z j = exp(−β j ε0 ): ν j−1 (z j−1 − z j ) − mν j (z j − z j+1 ) = 0,
j = 2, . . . , M − 1
(3.56)
with the boundary conditions ν(e−βext ε0 − z 1 ) − mν1 (z 1 − z 2 ) = 0, ν M−1 (z M−1 − z M ) = M z M .
(3.57) (3.58)
These equations can also be interpreted as the local balance of the diffusive currents ν j (z j − z j+1 ) and the dissipative current M z M . Note that if the dissipation M is set to zero, no more current flows in the system, leading to an equilibrium solution with β1 = . . . = β M = βext . Solving Eqs. (3.56) to (3.58), two different temperature profiles may be obtained in the large M limit depending on the value of α. For α < −1, the temperature profile slowly converges to the constant value β j = βext imposed by the reservoir. In contrast, for α > −1, the temperature profile converges
106
3 Models of Particles Driven Out of Equilibrium
Fig. 3.4 Dissipated flux as a function of α, for different values D = 10−x of the dissipation coefficient. The full line corresponds to the limit D → 0. Parameters: m = 2, ν = 0.1, βext = 1, ε0 = 1
0.04
Φ
0.03
x=4 x=6 x=8 x=10 x=12
0.02 0.01 0 -3
-2
-1
0
α
1
2
in the large M limit to a genuine nonequilibrium profile given by neq
βj
=
1 1 (1 + α) ln k j + βext + ln c, ε0 ε0
(3.59)
with c = 1 + m − m −α . Note that the nonequilibrium profile β j is determined only by parameters characterizing energy injection and transfer, and not by parameters related to the dissipation mechanism. Interestingly, the temperature profile is continuous as a function of α, in the sense that β j → βext when α → −1+ . To better understand the origin of this change of behavior around α = −1, one can compute the mean energy flux crossing the system, neq
= ν e−βext ε0 − e−β1 ε0 .
(3.60)
In the limit M → ∞, this flux is found to be =
ν (m − m −α )e−βext ε0 c
(3.61)
for α > −1 and = 0 for α < −1. The energy flux is plotted as a function of α in Fig. 3.4, showing a continuous transition at the value α = −1. Hence the emergence of a non-uniform temperature profile across scales is associated with the existence of a finite injected and dissipated energy flux across the system. To sum up, the system reaches a quasi-equilibrium state in the large size limit for α < −1, while it reaches a true nonequilibrium state, with a finite energy flux, for α > −1. A peculiar feature of this nonequilibrium state is that it explicitly depends on the internal dynamics characterized by α, as can be seen on Eq. (3.59). In contrast, the quasiequilibrium state obtained for α < −1 does not depend on α (at least asymptotically, for M → ∞); it is only characterized by the temperature βext of the reservoir. We have thus seen that the presence of a non-vanishing flux is a key ingredient to generate a nonequilibrium steady state. At a qualitative level, the need for a flux
3.3 Solvable Models of Interacting Driven Particles on a Lattice
107
to sustain a nonequilibrium steady state may be considered as a relatively general feature of complex systems, which is not specific to the physical model considered here, and may be found in very different contexts, like, for instance, in economics. As an elementary example, a store cannot maintain its activity if the flux of sold products is too low. This can be understood within the framework of a very simplified model of store, in which only one type of product is sold. Let us call φ the flux of products, that is, the number of products sold per unit time. The total profit made by the store is simply P = φ p − C0 , where p is the difference between buying and selling prices of products, and C0 is the fixed cost per unit time for running the store (rent for the building, employee wage, etc.). Of course, the activity of the store can be maintained only if the profit P is positive. As a result, φ has to be larger than φc = C0 / p. Hence a store is a nonequilibrium system which needs a flux (of sold product) larger than a threshold value to maintain a steady state.
3.3.3 Asymmetric Simple Exclusion Process In the Zero-Range Processes considered above, the distribution P({n i }) of the N degrees of freedom takes a simple form, simply being a product of the marginal distributions pi (n i ) (with additionally a conservation of the total number of particles in the absence of a reservoir). However, this simplicity is the exception rather than the rule, and in many cases, the N -body distribution is too complicated to be determined explicitly. Yet, there are examples of lattice particle models where the distribution is much more complex than a simple product of marginal distribution, but where the complexity of the distribution can still be “tamed” through an appropriate parameterization that we describe below. This class of problems includes the so-called Asymmetric Simple Exclusion Process (ASEP) with boundary reservoirs. The model is defined on a one-dimensional lattice with L sites. Particles stochastically hop from site i to site i + 1 with a rate (probability per unit time) p and to site i − 1 with a rate q (we do not assume here that p and q sum up to 1 as these are rates and not directly probabilities). However, the dynamics is constrained by the exclusion rule stipulating that no more than one particle can lie on each lattice site. As a result, hopping to a neighboring site is possible only if the target site is empty. A sketch of the dynamics is displayed on Fig. 3.5. Boundary reservoirs are modeled as appropriate rates of particle injection and withdrawal at the boundary sites. To be specific, particles are randomly injected on the leftmost site i = 1 with a rate α (on condition that this site is empty) and are randomly withdrawn from the rightmost site i = L with a rate γ (Fig. 3.5). A microscopic configuration C is given by the set of occupation numbers C = (n 1 , . . . , n L ), where n i = 0 if there is no particle on site i, and n i = 1 if there is one particle (there cannot be more than one particle on a site). Formally, the transition rate W (C |C) can be written as
108
3 Models of Particles Driven Out of Equilibrium
q
α
γ
p
nL
ni
n1
Fig. 3.5 Sketch of the dynamics of the ASEP model illustrating the possible moves to empty neighboring sites, and the effect of the boundary reservoirs
W ({n i }|{n i })
=
L−1
p δni ,ni −1 δni+1 ,n i+1 +1 δn i+1 ,0
+
δn j ,n j
j =i,i+1
i=1 L
q δni ,ni −1 δni−1 ,n i−1 +1 δn i−1 ,0
δn j ,n j
j =i,i−1
i=2
+ αδn 1 ,n 1 +1 δn 1 ,0
L j=2
δn j ,n j + γ δn L ,n L −1
L−1
δn j ,n j .
(3.62)
j=1
The transition rate of the ASEP is relatively similar to that of the Zero-Range Process [see Eq. (3.48)], but the transition rate (3.62) additionally takes into account the constraint that a transition takes place only when the target site is empty. Writing the master equation defined from the transition rates (3.62), one can try to find the steadystate solution for the N -body distribution P(n 1 , . . . , n L ). With this aim in mind, it is convenient to use the following parameterization of the distribution P(n 1 , . . . , n L ), called Matrix Product Ansatz (MPA): P({n i }) =
1 L M(n 1 )M(n 2 ) . . . M(n L ) , Z
(3.63)
having introduced the matrix-valued function M(n). The linear operator L is used to transform matrices into real values. In practical situations, the operator L is chosen in the form L(M) = W |M|V (3.64) for systems in contact with reservoir as considered here. The notations W | and |V , borrowed from quantum mechanics, correspond to two vectors acting on the matrix M; W | is a row vector and |V is a column vector. If one would instead consider a closed systems with periodic boundary conditions, the operator L would be chosen as the trace operator, L(M) = Tr(M), consistently with the circular permutation symmetry of the system. The MPA parametrization (3.63) is a generalization of the fully factorized distribution (like in the ZRP) that keeps some formal advantages of the factorization properties using matrices instead of real numbers, but also includes correlations (while the fully factorized form does not).
3.3 Solvable Models of Interacting Driven Particles on a Lattice
109
The constant Z , introduced to normalize the distribution, reads C≡ M(n). Z = W |C L |V ,
(3.65)
n
Although a priori not intuitive, the MPA form Eq. (3.63) of the distribution P(n 1 , . . . , n L ) can be shown to hold for stochastic particle lattice models in one dimension under relatively mild assumptions, including in particular the fact that the rules remain local (jumps are short-ranged) [32]. Unfortunately, this existence proof does not provide us with a general way to explicitly determine the matrix M(n), which has to be found on a case-by-case basis. Since the occupation number n i takes only two values, determining the function M(n) simply consists of determining two matrices D ≡ M(1) and E ≡ M(0). In what follows, we assume without loss of generality that p = 1, which simply amounts to choosing p −1 as the time unit. Plugging the MPA ansatz (3.63) into the steady-state master equation, one obtains that the matrices D and E have to obey an algebraic equation [35]: D E − q E D = (1 − q)(D + E).
(3.66)
Additional relations should also be fulfilled by the vectors W | and |V W |E =
1−q W | , α
D|V =
1−q |V . γ
(3.67)
Several explicit representations of the matrices D and E and of the vectors W | and |V have been found [9, 10, 17, 23, 35, 44]. In many cases, infinite-dimensional matrices and vectors have to be used. Yet, some finite-dimensional representations have been found when the parameters of the model satisfy particular relations. In the seemingly simpler case p = 1 and q = 0 (called TASEP for Totally Asymmetric Simple Exclusion Process), it can actually be shown that only infinite-dimensional representations exist [17]. Among these, an elegant representation of the matrices D and E reads as [17] ⎛
1 ⎜0 ⎜ ⎜ D = ⎜0 ⎜0 ⎝ .. .
1 1 0 0
0 1 1 0
⎞ 0 ··· 0 ⎟ ⎟ 1 ⎟ ⎟, 1 ⎟ ⎠ .. .
⎛
1 ⎜1 ⎜ ⎜ E = ⎜0 ⎜0 ⎝ .. .
0 1 1 0
0 0 1 1
⎞ 0 ··· 0 ⎟ ⎟ 0 ⎟ ⎟. 1 ⎟ ⎠ .. .
(3.68)
The associated boundary vectors W | and |V take the form W | = κ(1, a, a 2 , a 3 , . . . ), with parameters a, b and κ given by
|V = κ(1, b, b2 , b3 , . . . )T
(3.69)
110
3 Models of Particles Driven Out of Equilibrium
a=
1−γ 1−α , b= , κ= α γ
α+γ +1 . αγ
(3.70)
Note that this representation yields divergences in some parameter regimes, where other representations may be more convenient [17]. As mentioned above, the advantage of the MPA parametrization is that it keeps a factorized form, although in terms of more complicated and generally noncommuting objects. These properties are useful, for instance, to evaluate the mean density profile or the different correlation functions. For example, the mean density n i can be evaluated by summing P(n 1 , . . . , n L ) over all occupation numbers n j ( j = i), which leads to W |C i−1 DC L−i |V , (3.71) n i = W |C L |V having defined C = D + E. In the same way, one finds for the two-point correlation function n i n j ( j > i), n i n j =
W |C i−1 DC j−i−1 DC L− j |V . W |C L |V
(3.72)
Beyond these purely static quantities characterizing the spatial repartition of particles in the system, one can also look at dynamical quantities, for instance, the mean particle current on a given link (i, i + 1). In steady state, this current J is independent of i in the present one-dimensional setting. In mathematical terms, the current J reads (recalling that p = 1, and keeping here an arbitrary q) J = n i (1 − n i+1 ) − q(1 − n i )n i+1 .
(3.73)
Using the formal MPA parametrization without explicit knowledge of the matrices D and E, one finds for the current J=
W |C i−1 (D E − q E D)C L− j−1 |V . W |C L |V
(3.74)
This expression of J can be simplified thanks to the algebraic relation Eq. (3.66), yielding W |C L−1 |V J = (1 − q) . (3.75) W |C L |V In many cases, the quantity W |C L |V follows for large values of L the scaling form [17] (3.76) W |C L |V ∼ AL z λ L
3.3 Solvable Models of Interacting Driven Particles on a Lattice
1 0.8
MC LD
0.6 γ
Fig. 3.6 Phase diagram of the TASEP model in terms of the rates of transfer with the reservoirs, indicating the low-density phase (LP), the high-density phase (HP), and the maximal current phase (MC)
111
0.4 HD
0.2 0
0
0.2
0.4
α
0.6
0.8
1
with A, z and λ some L-independent parameters. In the limit L → ∞, Eq. (3.75) asymptotically leads to the relation J = (1 − q)/λ. From the knowledge of the current J , one can establish the phase diagram of the ASEP model. Restricting here for simplicity to the totally asymmetric case (q = 0, TASEP), three phases can be distinguished [17]: (i) for (α > 21 and γ > 21 ), one finds a maximal current phase with J = 14 ; (ii) for (α < 21 and γ > α), one has a lowdensity phase where the current, equal to J = α(1 − α), is fixed by the left reservoir (injection mechanism); (iii) for (γ < 21 and γ < α), one has a high-density phase where the current, given by J = γ (1 − γ ), is fixed by the right reservoir (dissipation mechanism). As a result, the line α = γ < 21 in the phase diagram corresponds to a first-order phase transition [17]. The phase diagram of the TASEP model is displayed in Fig. 3.6. In the case q > 0, a qualitatively similar phase diagram is found [44]. Finally, it can also be shown thanks to the MPA form of the distribution P({n i }) that the stationary state of the ASEP model exhibits long-range correlations over the system size [16].
3.4 Approximate Description of Driven Frictional Systems We have seen in the previous section how to compute exact solutions of N -particle stochastic models on a few examples. However, in most cases it is not possible to determine exact stationary solutions of the model, and it is thus of interest to devise approximation schemes relevant to specific classes of systems. As an example of this type of approach, we now briefly discuss the approximate description of granular matter put forward by Edwards and coworkers [20–22, 37]. Granular matter consists of a class of materials made of “grains” of macroscopic size (e.g., a fraction of a millimeter) which experience solid friction when put in contact one with the other. As
112
3 Models of Particles Driven Out of Equilibrium
a result, the contact area between grains is able to support static or dynamic tangential forces, making the properties of this type of materials different from that of usual materials. Another important difference is that the thermodynamic temperature plays no role in granular materials, because the potential energy of a grain (or, to be more specific, the work needed to lift a grain by a height comparable to its diameter) is much larger than the thermal energy, and thermal fluctuations alone are not able to move grains.3 It is thus necessary to inject energy mechanically, for instance, by shaking the container in which sand is placed. A standard protocol used in this context is to shake the grains for some given duration, and to let the grains relax to a mechanically stable configuration (i.e., an arrangement of grains such that all forces exerted on any given grain sum up to zero). By repeating this shaking and relaxation cycle a large number of times, the system explores a large number of mechanically stable configurations, thus allowing one to measure average values of some observables like the height of a granular pile, or the forces exerted by the grains on the walls of the container. The goal of the approach developed by Edwards and coworkers is to postulate a relatively simple form for the statistics of mechanically stable configurations, also called “blocked configurations” for short in this context.
3.4.1 Edwards Postulate for the Statistics of Configurations The basic idea of the Edwards statistics is to assume that all configurations that are not mechanically stable have zero probability, and that blocked configurations have a uniform probability, provided additional constraints like energy or volume are also properly taken into account. In practice, this often amounts to assuming an equilibrium-like distribution, restricted to blocked configurations [2, 8, 20–22, 37] P(C) =
E(C) V (C) 1 exp − F (C), − Z Teff X
(3.77)
where C is a configuration of the system, typically the set of all positions of the grains. The parameters Teff and X are nonequilibrium thermodynamic parameters called effective temperature and compactivity, respectively. The function F (C) indicates whether the configuration C is mechanically stable or not: F (C) = 1 for a blocked configuration, and F (C) = 0 otherwise. This restriction of the probability measure to blocked configurations is the main formal difference with usual equilibrium statistical physics. As we will see on an explicit example below, this restriction to mechanically stable configurations is at the origin of a new and interesting phenomenology. Yet, the function F (C) often has a complicated expression which makes calculations very difficult beyond mean-field type approximations. 3
However, note that temperature may play a role in the mechanics of grains either in the small frictional contact areas between grains, or by dilating or contracting the grains if the temperature slightly fluctuates. But the resulting displacements generally remain much smaller than the grain diameter.
3.4 Approximate Description of Driven Frictional Systems
113
ext
fi
ξi xi
x i+1
Fig. 3.7 Sketch of the spring-block model
3.4.2 A Shaken Spring-Block Model As an example of application of the Edwards approach, we consider a simple frictional one-dimensional model consisting of masses sliding on a substrate and linked by springs [28]. To be more specific, the model is composed of N + 1 masses of mass m located at position xi (t) on a line, and neighboring masses are linked by a spring, as illustrated in Fig. 3.7. All springs are identical, and characterized by the same stiffness k and rest length 0 . The masses are subjected to the elastic forces exerted by the springs, to their weight, and a solid friction exerted by the substrate, that resists motion. The force exerted on mass i by the spring linking masses i and i + 1 is k(xi+1 − xi − 0 ). The rest length 0 is assumed to be large to avoid crossings of masses. When a mass is at rest, it can start moving only if the sum of the forces exerted by the two neighboring springs exceeds the weight mg (where g is the intensity of gravity) multiplied by the static friction coefficient μ; otherwise the mass stays at rest. A blocked state thus corresponds to a configuration of the mass positions such that the sum of the forces exerted by springs on each mass is less than μmg. To sample blocked states, one can use a periodic driving protocol to inject energy in the system through a strong external random force for a duration τ and then let the system relax to a blocked state. Repeating the driving and relaxation cycles many times, many blocked configurations are explored. When a mass moves at velocity x˙i , the substrate exerts a dynamic friction force equal to −mgμd sign(x˙i ), where μd is the dynamic friction coefficient. The specificity of the dynamical friction force is that it does not depend on the speed, but only on the direction of motion through the sign of the velocity. The equations of motion read m x¨i = −mgμd sign(x˙i ) + k(xi+1 + xi−1 − 2xi ) + f iext ,
(3.78)
where f iext is the external force which is applied only during the driving phase. Since by translating all masses by the same distance, the configuration of the system remains the same, we characterize the configuration by the list of spring elongations ξi ≡ xi+1 − xi − 0 . As explained above, a blocked configuration is characterized by the condition k|ξi+1 − ξi | < μs mg for all i = 1, . . . , N − 1. To lighten notations, it is convenient to choose units such that k = 1 and mg = 1. Following the Edwards approach, we postulate that the statistics of configurations C = (ξ1 , . . . , ξ N ) can be approximately
114
3 Models of Particles Driven Out of Equilibrium
described by the distribution (3.77). In this one-dimensional setting, the volume is N ξi . Since we do not impose actually the total length L = x N +1 − x1 = N 0 + i=1 any confinement or any constraint on the total length of the system, the total length is not expected to bias the distribution, and we thus assume an infinite compactivity, X → ∞. The total elastic energy is the sum of the elastic energy 21 xi2 of all springs. One thus simply needs to provide an explicit expression of the function F (ξ1 , . . . , ξ N ) that indicates whether a configuration is blocked or not. Since the condition for a configuration to be blocked is that the extensions ξi satisfy |ξi+1 − ξi | < μs for all i = 1, . . . , N , the function F (ξ1 , . . . , ξ N ) can be expressed as F (ξ ) =
N
(μs − |ξi+1 − ξi |),
(3.79)
i=1
with the Heaviside function defined by (x) = 1 for x > 0 and (x) = 0 for x < 0. In the Edwards approximation, the probability P(ξ1 , . . . , ξ N ) to sample the configuration (ξ1 , . . . , ξ N ) in the driving-relaxation cycling protocol is thus given by ! " N N 1 2 1 P(ξ1 , . . . , ξ N ) = exp − ξ (μs − |ξi+1 − ξi |) , Z 2Teff i=1 i i=1
(3.80)
where Z is a normalization factor, or effective partition function, determined by imposing that dξ1 . . . dξ N P(ξ1 , . . . , ξ N ) = 1. The effective temperature Teff is actually an auxiliary parameter that needs to be linked to a physical observable that can be directly measured in a numerical simulation, namely, the average energy per spring ε = 21 ξ 2 . This can be done as follows. Defining the inverse temperature −1 βeff = Teff , the energy density ε is obtained by derivation of the effective free energy density F = −(Nβeff )−1 ln Z , ε=−
∂ ln λmax . ∂βeff
(3.81)
The partition function can be evaluated using a transfer operator technique. Let us define the operator T by T [ f ](x) =
∞
−0
dy T (x, y) f (y),
(3.82)
where the function T (x, y) is a symmetric kernel given by T (x, y) = e−βeff x
2
/4
Then the partition function Z reads as
(μs − |x − y|) e−βeff y
2
/4
.
(3.83)
3.4 Approximate Description of Driven Frictional Systems
115
Z=
dξ1 . . . dξ N T (ξ1 , ξ2 ) T (ξ2 , ξ3 ) · · · T (ξ N −1 , ξ N ) T (ξ N , ξ1 ) ,
(3.84)
which can be formally written as Z = Tr(T N ), where Tr is the trace operator. To be more specific, we have defined the operator T n by iteration, T n+1 [ f ] = T n T [ f ] , and the trace functional Tr(A) = d x A[δx ](x) for an operator A[ f ], where δx is the Dirac delta distribution at point x. In the large N limit, the effective free energy can be expressed in terms of the largest eigenvalue λmax (βeff ) of the operator T , as F = −Teff ln λmax . The largest eigenvalue λmax can be determined numerically after discretizing the operator into a large matrix [28]. One eventually finds that the energy 1/2 density ε is proportional to Teff at low energy and proportional to Teff at high energy.
3.4.3 Long-Range Correlations for Strong Shaking When the external forces applied during the driving cycles are strong, the energy density of the blocked configurations reached after relaxation is large. It has been observed numerically in this regime that spring elongations ξi are correlated over a distance (measured in numbers of springs) that becomes large and turns out to be proportional to the energy density ε [28]. Defining the spring correlation function Ci j = ξi ξ j , Ci j is numerically observed to take the scaling form Ci j = C˜
|i − j| (ε)
(3.85)
which allows for a precise definition of the correlation length (ε), which satisfies ∼ ε. This scaling of the correlation length with the energy can be reproduced using the transfer operator technique [28]. At a heuristic level, it may also be recovered in a simple way in the framework of a Gaussian field theory. Using the following (rough) Gaussian approximation, 1 |ξi+1 − ξi |2 , (μs − |ξi+1 − ξi |) → √ exp − 4μ2s π
(3.86)
one can eventually rewrite the partition function in the form of a Gaussian multiple integral Z∝
dξ1 . . . dξ N e−H(ξ1 ,...,ξ N ) ,
(3.87)
where H(ξ1 , . . . , ξ N ) is an effective quadratic Hamiltonian, " ! N N 1 1 2 2 H(ξ ) = ξi + (ξi+1 − ξi ) . βeff 2 2μ2s i=1 i=1
(3.88)
116
3 Models of Particles Driven Out of Equilibrium
Taking a continuum limit, one ends up with a one-dimensional Gaussian field theory, for which standard results are known in the literature (see, e.g., [34]). These results are obtained by working with the Fourier transform ξq of the elongations ξi . One obtains 1/2 1/2 in this way ∼ Teff . Since one also has ε ∼ Teff in this high energy regime, we recover the numerical result ∼ ε. Beyond the precise scaling ∼ ε, the interest of the result is to show that the model exhibits a critical point at infinite temperature. This is in striking contrast with standard equilibrium results, where one-dimensional systems with short-range interactions can only have a critical point at zero temperature. This shows that in spite of the equilibrium-like flavor of the Edwards distribution, which looks like an equilibrium Boltzmann–Gibbs distribution but with an effective temperature, this distribution structurally differs from the equilibrium one. The reason is that the restriction to blocked configurations deeply modifies the statistics of configurations and introduces correlations between neighboring variables ξi .
3.5 Collective Motion of Active Particles Active particles are particles able to sustain a continuous motion thanks to some external energy input. This concept is used by physicists to describe, for instance, the motion of animals, bacteria, or more recently different types of self-driven colloids [36]. A very schematic model of active particle is a point-like particle with a velocity vector of constant modulus, but arbitrary direction. In the simplest cases, the direction of motion of the particles just diffuses randomly, and one speaks about active Brownian particles (see Sect. 3.1.3). Different types of interactions may be included between active particles, like repulsion forces, for instance. In this case, the interplay between self-propulsion and repulsion leads to a phase separation, with the formation of dense clusters, as if the particles had effective attractive interactions [15]. Besides, other types of interactions which are specific to self-propelled particles can be included. This is the case in particular of velocity-alignment interactions. A paradigmatic model for this type of interactions between active particles is the so-called Vicsek model that has been extensively studied through numerical simulations [29, 48, 50]. A transition from disordered motion when the density of active particles is low, to ordered collective motion when the density is high, has been reported. This transition exhibits some properties similar to that of phase transitions observed in physical systems. It is also possible to develop analytical approaches, either by postulating phenomenological equations of motion at the macroscopic scale (hydrodynamic equations) [47], or by using a Boltzmann approach to derive such hydrodynamic equations [7]. We present here a brief summary of the results obtained from the latter approach. We consider self-propelled point-like particles moving on a continuous twodimensional space, with a velocity vector v of fixed magnitude v0 (to be chosen as the speed unit) in a reference frame. The velocity of the particles is simply defined
3.5 Collective Motion of Active Particles
θ
θ’ η
117
θ1
θ
θ2
θ’ 1 θ 2’
η1 η2
Fig. 3.8 Schematic representation of the dynamics of the model. Left: self-diffusion of the velocity angle; Right: binary collision. See text for notations
by the angle θ between v and a given reference direction. Particles move in straight line, following their velocity vector, until they experience either a self-diffusion event (a random scattering), or a binary collision that tends to align the velocities of the two particles (see Fig. 3.8). Self-diffusion events are defined as follows: the velocity angle θ of any particle is changed with a probability λ per unit time to a value θ = θ + η, where η is a Gaussian noise with distribution p0 (η) and variance σ02 . Binary collisions, that are the only interactions between particles, occur when the distance between two particles becomes less than d0 (in the following, we set d0 = 21 ). The velocity angles θ1 and θ2 of the two particles are then changed into θ1 = θ + η1 and θ2 = θ + η2 , as shown on Fig. 3.8. In the last expression, θ = arg(eiθ1 + eiθ2 ) is the average angle, and η1 and η2 are independent Gaussian noises with the same distribution p(η) and variance σ 2 . Note that these binary collisions are different from the collisions in usual gases, as in this latter case, collisions are ruled by energy and momentum conservation laws. In the following, we take for simplicity identical distributions p0 (η) and p(η); a single parameter σ thus characterizes the amplitude of the noise.
3.5.1 Derivation of Continuous Equations A useful mathematical tool to describe statistically the dynamics of the system is the one-particle phase-space distribution f (r, θ, t), namely, the probability to find a particle at position r and with a velocity angle θ , at time t. The evolution of this one-particle phase-space distribution is ruled by the Boltzmann equation, which reads ∂f (r, θ, t) + e(θ ) · ∇ f (r, θ, t) = Idif [ f ] + Icol [ f ]. ∂t
(3.89)
The functionals Idif [ f ] and Icol [ f ], respectively, account for the self-diffusion and collision phenomena. The vector e(θ ) is the unit vector in the direction θ . The diffusion functional Idif [ f ] is given by Idif [ f ] = −λ f (r, θ, t) + λ
∞ −∞
dη p(η) f (r, θ − η, t) .
(3.90)
118
3 Models of Particles Driven Out of Equilibrium
The evaluation of the collision term Icol [ f ] is more subtle. We know that two particles collide if their distance becomes less than the interaction range d0 . In the frame of particle 1, particle 2 has a velocity v2 = e(θ2 ) − e(θ1 ). Hence, particles that collide with particle 1 between t and t + dt are those that lie, at time t, in a rectangle of length |v2 | dt and of width 2d0 , yielding for the collision functional (the collision area does not change going back to the lab frame) π dθ |e(θ ) − e(θ )| f (r, θ , t) Icol [ f ] = − f (r, θ, t) −π π ∞ π dθ1 dθ2 dη p(η) |e(θ2 ) − e(θ1 )| + −π
−π
(3.91)
−∞
× f (r, θ1 , t) f (r, θ2 , t)δ2π (θ + η − θ ), with θ = arg(eiθ1 + eiθ2 ), and δ2π a generalized Dirac distribution taking into account the periodicity of angles. One can check that the uniform angular distribution f (r, θ, t) = ρ/2π is a solution of Eq. (3.89) for an arbitrary constant density ρ, and for any value of the noise amplitude σ . In order to deal with convenient physical quantities, we introduce the hydrodynamic density and velocity fields ρ(r, t) and u(r, t):
π
dθ f (r, θ, t) , π 1 u(r, t) = dθ f (r, θ, t) e(θ ) . ρ(r, t) −π
ρ(r, t) =
(3.92)
−π
(3.93)
Integrating the Boltzmann equation (3.89) over θ , one directly obtains the continuity equation for ρ(r, t): ∂ρ + ∇ · (ρu) = 0. (3.94) ∂t The operator ∇ is the vectorial differential operator4 of components (∂/∂ x, ∂/∂ y). The derivation of a hydrodynamic equation for the velocity field is less straightforward and involves an approximation scheme. The reader is referred to Ref. [7, 43] for more details on the derivation. The principle of the derivation is to expand the distribution f (r, θ, t) into angular Fourier modes according to5
4
More explicitly, Eq. (3.94) reads ∂ ∂ρ ∂ + (ρu x ) + (ρu y ) = 0, ∂t ∂x ∂y
where (u x , u y ) are the components of the vector u. Here, i is a complex number such that i 2 = −1.
5
(3.95)
3.5 Collective Motion of Active Particles
f (r, θ, t) =
119 ∞ 1 f k (r, t) e−ikθ , 2π k=−∞
(3.96)
where the Fourier coefficients f k are defined as f k (r, t) =
π
−π
dθ f (r, θ, t) eikθ .
(3.97)
Note that f 0 (r, t) is nothing but the local density ρ(r, t). Note also that for all k, f −k = f k∗ (the star denotes the complex conjugate), since f (r, θ, t) is real. The Boltzmann equation can in turn be expanded into Fourier modes, leading to ∞ v0 ∂ fk + (∂ f k−1 + ∂ ∗ f k+1 ) = −(1 − Pk ) f k + Jk,q f q f k−q , ∂t 2 q=−∞
(3.98)
where we have used the shorthand notation ∂ and ∂ ∗ for the complex differential operators ∂ ∂ ∂ ∂ +i , ∂∗ ≡ −i . (3.99) ∂≡ ∂x ∂y ∂x ∂y The coefficient Jk,q is given by Jk,q = Pk (σ )Ik,q − I0,q , where Ik,q is defined by the integral # x# 1 π # # Ik,q = d x #sin # e−iq x+ikx/2 (3.100) π −π 2 ∞ with Pk (σ ) = −∞ dη Pσ (η) eikη the Fourier transform of the noise distribution (restricted here to integer values of k). One has 0 ≤ Pk (σ ) ≤ 1 and Pk (0) = 1, ∀k. For a Gaussian noise distribution, the Fourier transform has the simple form Pk (σ ) = e−σ
k /2
2 2
.
(3.101)
To proceed further, it is necessary to identify the linear instability of the disordered state, corresponding to an isotropic distribution for which f k = 0 for all k = 0. This linear instability occurs for a critical value ρc which depends on the intensity of the noise. For an average density ρ0 slightly above ρc , one assumes that the distribution f (r, θ, t) is still close to an isotropic distribution and makes a scaling ansatz for the different fields at stake, as a function of a small parameter characterizing the distance to the threshold density, f 1 ∼ ,
f 2 ∼ 2 , ρ − ρ0 ∼ .
(3.102)
One also needs to make similar assumptions about the space and time derivatives. A coherent scaling ansatz turns out to be
120
3 Models of Particles Driven Out of Equilibrium
∂ ∼ ,
∂ ∼ ∂t
(3.103)
consistently with the propagative nature of the dynamics. The Boltzmann equation, expressed in Fourier modes, is then truncated to order 3 , neglecting terms of order 4 or higher. This results in two coupled equations for f 1 and f 2 . The Fourier coefficient f 1 is a complex number having real and imaginary parts equal to components of the pseudo-momentum field w(r, t) = ρ(r, t)u(r, t). Hence this is the relevant field we need to keep in the description to characterize the emergence of polar order. The second Fourier coefficient f 2 actually has a much faster relaxation dynamics than f 1 . Using this separation of time scales, one can express f 2 as a function of f 1 , and obtain in this way a closed equation for the evolution of f 1 (which is however still coupled to the density ρ, the other relevant field). Mapping complex numbers onto vectors, we end up with the following hydrodynamic equations: 1 ∂w + γ (w · ∇)w = − ∇(ρ − κw2 ) ∂t 2 + (μ − ξ w2 )w + ν w − κ(∇ · w)w ,
(3.104)
with the Laplacian operator,
≡
∂2 ∂2 + 2. 2 ∂x ∂y
(3.105)
It is interesting to give a physical interpretation of the different terms appearing in this hydrodynamic equation. The first term in the r.h.s. of Eq. (3.104) can be interpreted as a pressure gradient, considering p = 21 (ρ − κw2 ) as an effective pressure. The second term accounts for the local relaxation of w, while the third term is analogous to the standard viscous term appearing in the Navier–Stokes equation describing usual fluids. Finally, the last term corresponds to a feedback on the flow from compressibility effects. The different coefficients appearing in Eq. (3.104) can be computed explicitly as a function of the microscopic parameters of the model. They are given by [7] −1
$ % 4 14 2 1 2 2 λ 1 − e−2σ0 + ρ + e−2σ , 4 π 15 3 8ν 16 2 2 + 2e−2σ − e−σ /2 , γ = π 15 8ν 4 2 2 + 2e−2σ + e−σ /2 , κ= π 15 $ % 4 2 2 2 − λ 1 − e−σ0 /2 , μ = ρ e−σ /2 − π 3 ν=
(3.106) (3.107) (3.108) (3.109)
3.5 Collective Motion of Active Particles
ξ =
64ν π2
121
1 2 2 2 e−σ /2 − + e−2σ . 5 3
(3.110)
Note that ν, γ and κ are always positive; μ can change sign and ξ > 0 whenever μ > 0.
3.5.2 Phase Diagram and Instabilities Turning to the study of the spontaneous onset of collective motion in the present model, we look for possible instabilities of the spatially homogeneous flow, that is the appearance of a uniform, non-zero, velocity field u (or pseudo-momentum field w). Considering a time-dependent, but spatially homogeneous flow, we get ∂w = (μ − ξ w2 )w. ∂t
(3.111)
Obviously, w = 0 is a solution for arbitrary values of the coefficients. However, √ this solution becomes unstable for μ > 0, when a non-zero solution w0 = μ/ξ e appears (e is a unit vector pointing in an arbitrary direction). From the expression (3.109) of μ, it turns out that μ = 0 corresponds to a threshold value ρt given by π λ(1 − e−σ0 /2 ) . 4(e−σ 2 /2 − 23 ) 2
ρt =
(3.112)
The transition line defined by ρt in the plane (ρ, σ ) is plotted on Fig. 3.9. The instability is seen to occur at any density, provided the noise is low enough. The transition line saturates at a value σt = (2 ln 23 )1/2 ≈ 0.90. Further instabilities leading to more complicated patterns, like traveling solitary waves are also observed, both at the level of the hydrodynamic equations [7] and in numerical simulations of the Vicsek model [29]. These instabilities occur in the parameter region denoted as B in Fig. 3.9.
3.5.3 Varying the Symmetries of Particles A similar approach can be used in models having different symmetries from the ones considered here. For instance, considering an experiment with vibrated rice grains, one observes that these vibrated elongated grains have a tendency to move back and forth along their main axis preferentially. Similarly to the self-propelled particles considered in the previous section, such particles can also be attributed a direction characterized by an angle θ (we are still considering a two-dimensional problem), but their internal symmetry renders the directions θ and θ + π equivalent. Interactions
122
3 Models of Particles Driven Out of Equilibrium
1 A
0.8 0.6
σ
B C
0.4 0.2
0
0
1
2
ρ
3
4
5
Fig. 3.9 Phase diagram of the hydrodynamic equation (3.104) in the noise-density plane (λ = 1, d0 = 0.5, v0 = 1). A transition line (full line) separates the domains with zero hydrodynamic velocity (region A), from the domain where collective motion occurs (regions B and C). In region C, homogeneous motion is stable (except if one goes to high density and low noise, in which case a further instability—not shown here—appears). In region B, homogeneous motion is unstable, and one observes solitary waves of high density moving over a disordered background.
between grains also tend to align close-by grains, but they need to take into account the equivalence of the directions θ and θ + π called nematic symmetry. To model such an experimental situation, we consider a model of point-like particles quite similar to the one introduced previously, but taking into account the nematic symmetry. At each elementary time step, particles move randomly either in the θ or in the θ + π direction with equal probability. Upon a collision, the angles θ1 and θ2 of the two particles are changed into θ1 = θ + η1 and θ2 = θ + η2 , where ˜ ˜ now θ = arg(ei θ1 + ei θ2 ) is the average angle obtained by choosing θ˜1 = θ1 [π ] and ˜θ2 = θ2 [π ] such that |θ˜2 − θ˜1 | < π . As before, η1 and η2 are independent Gaussian 2 noises with the same distribution p(η) and variance σ 2 . This model can be studied following the same lines [5, 43] as for the polar case by writing a Boltzmann equation, which takes a form similar to Eq. (3.89), but includes diffusion terms (of second order in space derivative) instead of the drift terms (of first order in space derivative), since particles are diffusing with no net motion. Integrating the Boltzmann equation over the angles yields the following evolution equation for the density ρ (making an appropriate choice of units): 1 1 ∂ρ = ρ + Re(∂ ∗2 f 2 ) , ∂t 2 2
(3.113)
where f 2 is the complex field describing nematic order. To derive an evolution equation for this order parameter f 2 , a procedure similar to the one used in Sect. 3.5.1 is used. Namely, one expands the Boltzmann equation in Fourier space, identifies the linear instability threshold and determines a consistent scaling ansatz close to
3.5 Collective Motion of Active Particles
123
threshold. Note that quite importantly, all odd Fourier coefficients f 1 , f 3 , etc. are equal to zero due to the nematic symmetry. The relevant order parameter is thus the second Fourier coefficient f 2 , and keeping the next Fourier coefficient f 4 in the truncation procedure turns out to be important to obtain the non-linear terms saturating the instability. The relevant scaling ansatz reads in this case f 2 ∼ ,
f 4 ∼ 2 , ρ − ρ0 ∼ , ∂ ∼ ,
∂ ∼ 2, ∂t
(3.114)
now corresponding to a diffusive space-time scaling, in line with the diffusive nature of the dynamics. After truncation and closure, one eventually obtains the following equation for the nematic field f 2 ∂ f2 1 1 = (μ − ξ | f 2 |2 ) f 2 + f 2 + ∂ 2 ρ, ∂t 2 4
(3.115)
where the coefficients μ and ξ can be expressed as a function of the density and of microscopic parameters of the models [5]. Here again, μ is negative below a threshold density ρt (σ ), and positive above. It is interesting to note that this equation is a generalization of the standard Ginzburg–Landau equation [1], with a coupling to the density field in the term linear in f 2 . The phase diagram of this equation is quite similar to that of the polar case, replacing polar order by nematic order. In particular, the linear instability line in the noise-density plane has a shape similar to that shown in Fig. 3.9 for the polar case. One of the main differences is that the high-density traveling bands appearing in the polar case are replaced in the nematic case by high-density bands that do not move and that are now oriented along the axis of order (while polar order is perpendicular to the band in the polar case). Quite importantly, these bands are themselves unstable for large enough systems, leading to a regime of spatiotemporal chaos in which long-range nematic order cannot build up; order is only restored going to higher density or lower noise. Note that other cases with “mixed” symmetries can also be considered [43], for instance, self-propelled particles interacting nematically, a case sometimes called “self-propelled rods” with experimental realizations, e.g., in biological systems like colonies of bacteria [42].
3.6 Exercices 3.1 Run-and-Tumble particle in a potential A Run-and-Tumble particle in two dimensions moves at speed v0 along the unit vector e(θ ). The angle θ is randomly drawn anew from a uniform angular distribution with a probability per unit time λ. In the presence of an external potential U (r), the dynamics of the position r = (x, y) of the Run-and-Tumble particle is given by Eq. (3.12), similarly to the Active Brownian particle. Assuming a large tumbling rate λ, derive the perturbative expression of the stationary profile ρ(x), to first order
124
3 Models of Particles Driven Out of Equilibrium
in 1/λ, considering a potential U that is invariant in the y-direction. Calculations can be performed following the same steps as done in Sect. 3.1.3 for the Active Brownian Particle. 3.2 Derivation of the Poisson distribution Show that the Poisson distribution P(n, T ) describing the statistics of the number n of events over a duration T for a Poisson process with rate λ is expressed as P(n, T ) =
(λT )n −λT e . n!
(3.116)
Hint: obtain a differential equation on P(n, T ) by relating the distribution P(n, T + dT ) to P(n, T ) and P(n − 1, T ) in the limit dT → 0, and show that the above expression of the Poisson distribution is solution of the differential equation. 3.3 Absorbing phase transition and birth–death process The fully connected model introduced in Sect. 3.2.3 in order to discuss fluctuations in an absorbing phase transition can be slightly generalized by introducing a small but non-zero transition rate W (1|0) = ε to create one particle from the empty state. In this way the stationary state is no longer the empty state. Determine the steadystate probability distribution Pn using the general results on birth–death processes described in Sect. 3.2.1. By eventually taking the limit ε → 0, recover the results of Sect. 3.2.3 that were obtained considering transient states in the model with ε = 0. 3.4 Frictional masses attached to a spring under tapping dynamics Consider N particles of mass m attached to a spring of stiffness λ and subjected to a dry friction force with a substrate, characterized by a friction coefficient μ (static and dynamic friction coefficients are considered equal). The particles are constrained to move on a one-dimensional axis x and can be characterized by their position xi on this axis. Particles are subjected to a periodic tapping dynamics: during each period, particles are first strongly shaken, and then let relax to a mechanically stable (or blocked) configuration. Particles do not interact with each other. (a) Write the evolution equation for the position of mass xi , and identify the range of blocked configurations. (b) Using the Edwards prescription, write the probability of a configuration (x1 , . . . , x N ) of the N particles, assuming an infinite compactivity, but a finite effective temperature Teff . (c) Evaluate the effective free energy per particle in the infinite N limit, and deduce a modified energy equipartition relation linking the average energy per spring to the effective temperature. 3.5 Mean-field Fokker–Planck equation for interacting active particles Consider a large number N of active Brownian particles subjected to alignment interactions with other particles that lie closer than an interaction range d0 . The angle θi of particle i obeys the following dynamics
3.6 Exercices
125
dθi = γ sin(θ j − θi ) + ξi (t) dt j∼i
(3.117)
with γ the interaction strength and ξi (t) a white noise satisfying ξi (t)ξ j (t ) = 2DR δi j δ(t − t ).
(3.118)
The notation j ∼ i in Eq. (3.117) means a sum over particles j within the interaction range of particle i. (a) Write the mean-field Fokker–Planck equation obtained under the approximation that the two-particle distribution can be factorized as a product of single-particle distributions f (r, θ, t). (b) Assuming space homogeneity [i.e., f (r, θ, t) = f (θ, t)], derive the evolution equation for the mean collective flux of particles w = ρu (where u is the collective velocity of active particles) using similar approximations as in Sect. 3.5.1. It is convenient to introduce the angular Fourier modes f k (t) associated with f (θ, t), and to neglect f k for |k| ≥ 3.
References 1. Aranson, I.S., Kramer, L.: The world of the complex Ginzburg-Landau equation. Rev. Mod. Phys. 74, 99 (2002) 2. Barrat, A., Kurchan, J., Loreto, V., Sellitto, M.: Edwards’ measures for powders and glasses. Phys. Rev. Lett. 85, 5034 (2000) 3. Bechinger, C., Di Leonardo, R., Löwen, H., Reichhardt, C., Volpe, G., Volpe, G.: Active particles in complex and crowded environments. Rev. Mod. Phys. 88, 045006 (2016) 4. Bertin, E.: Theoretical approaches to the steady-state statistical physics of interacting dissipative units. J. Phys. A: Math. Theor. 50, 083001 (2017) 5. Bertin, E., Chaté, H., Ginelli, F., Mishra, S., Peshkov, A., Ramaswamy, S.: Mesoscopic theory for fluctuating active nematics. New J. Phys. 15, 085032 (2013) 6. Bertin, E., Dauchot, O.: Far-from-equilibrium state in a weakly dissipative model. Phys. Rev. Lett. 102, 160601 (2009) 7. Bertin, E., Droz, M., Grégoire, G.: Hydrodynamic equations for self-propelled particles: Microscopic derivation and stability analysis. J. Phys. A Math. Theor. 42, 445001 (2009) 8. Bi, D.P., Henkes, S., Daniels, K.E., Chakraborty, B.: The statistical physics of athermal materials. Annu. Rev. Condens. Matter Phys. 6, 63 (2015) 9. Blythe, R.A., Evans, M.R.: Nonequilibrium steady states of matrix-product form: a solver’s guide. J. Phys. A Math. Theor. 40, R333 (2007) 10. Blythe, R.A., Evans, M.R., Colaiori, F., Essler, F.H.L.: Exact solution of a partially asymmetric exclusion model using a deformed oscillator algebra. J. Phys. A Math. Gen. 33, 2313 (2000) 11. Boudaoud, A., Cadot, O., Odille, B., Touzé, C.: Observation of wave turbulence in vibrating plates. Phys. Rev. Lett. 100, 234504 (2008) 12. Bricard, A., Caussin, J.B., Desreumaux, N., Dauchot, O., Bartolo, D.: Emergence of macroscopic directed motion in populations of motile colloids. Nature 503, 95 (2013) 13. Buttinoni, I., Bialké, J., Kümmel, F., Löwen, H., Bechinger, C., Speck, T.: Dynamical clustering and phase separation in suspensions of self-propelled colloidal particles. Phys. Rev. Lett. 110, 238301 (2013)
126
3 Models of Particles Driven Out of Equilibrium
14. Cates, M.E., Tailleur, J.: When are active Brownian particles and run-and-tumble particles equivalent? Consequences for motility-induced phase separation. EPL 101, 20010 (2013) 15. Cates, M.E., Tailleur, J.: Motility-induced phase separation. Annu. Rev. Condens. Matter Phys. 6, 219 (2015) 16. Derrida, B.: Non-equilibrium steady states: fluctuations and large deviations of the density and of the current. J. Stat. Mech. P07023 (2007) 17. Derrida, B., Evans, M.R., Hakim, V., Pasquier, V.: Exact solution of a 1d asymmetric exclusion model using a matrix formulation. J. Phys. A Math. Gen. 26, 1493 (1993) 18. Deseigne, J., Dauchot, O., Chaté, H.: Collective motion of vibrated polar disks. Phys. Rev. Lett. 105, 098001 (2010) 19. Düring, G., Josserand, C., Rica, S.: Weak turbulence for a vibrating plate: can one hear a Kolmogorov spectrum? Phys. Rev. Lett. 97, 025503 (2006) 20. Edwards, S.F., Grinev, D.V.: Statistical mechanics of vibration-induced compaction of powders. Phys. Rev. E 58, 4758 (1998) 21. Edwards, S.F., Mounfield, C.C.: The statistical mechanics of granular systems composed of elongated grains. Physica A 210, 279 (1994) 22. Edwards, S.F., Oakeshott, R.B.S.: Theory of powders. Physica A 157, 1080 (1989) 23. Essler, F.H.L., Rittenberg, V.: Representations of the quadratic algebra and partially asymmetric diffusion with open boundaries. J. Phys. A Math. Gen. 29, 3375 (1996) 24. Evans, M.R., Hanney, T.: Nonequilibrium statistical mechanics of the zero-range process and related models. J. Phys. A Math. Gen. 38, R195 (2005) 25. Evans, M.R., Majumdar, S.N., Zia, R.K.P.: Factorized steady states in mass transport models on an arbitrary graph. J. Phys. A Math. Gen. 39, 4859 (2006) 26. Frisch, U.: Turbulence. Cambridge University Press, Cambridge (1995) 27. Gomez-Solano, J.R., Blokhuis, A., Bechinger, C.: Dynamics of self-propelled Janus particles in viscoelastic fluids. Phys. Rev. Lett. 116, 138301 (2016) 28. Gradenigo, G., Ferrero, E.E., Bertin, E., Barrat, J.L.: Edwards thermodynamics for a driven athermal system with dry friction. Phys. Rev. Lett. 115, 140601 (2015) 29. Grégoire, G., Chaté, H.: Onset of collective and cohesive motion. Phys. Rev. Lett. 92, 025702 (2004) 30. Guioth, J., Bertin, E.: Lack of an equation of state for the nonequilibrium chemical potential of gases of active particles in contact. J. Chem. Phys. 150, 094108 (2019) 31. Hinrichsen, H.: Nonequilibrium critical phenomena and phase transitions into absorbing states. Adv. Phys. 49, 815 (2000) 32. Krebs, K., Sandow, S.: Matrix product eigenstates for one-dimensional stochastic models and quantum spin chains. J. Phys. A Math. Gen. 30, 3165 (1997) 33. Kudrolli, A., Lumay, G., Volfson, D., Tsimring, L.S.: Swarming and swirling in self-propelled polar granular rods. Phys. Rev. Lett. 100, 058001 (2008) 34. Le Bellac, M.: Quantum and Statistical Field Theory. Oxford Science Publications, Oxford (1992) 35. Mallick, K., Sandow, S.: Finite-dimensional representations of the quadratic algebra: applications to the exclusion process. J. Phys. A Math. Gen. 30, 4513 (1997) 36. Marchetti, M.C., Joanny, J.F., Ramaswamy, S., Liverpool, T.B., Prost, J., Rao, M., Simha, R.A.: Soft active matter. Rev. Mod. Phys. 85, 1143 (2013) 37. Mehta, A., Edwards, S.F.: Statistical mechanics of powder mixtures. Physica A 157, 1091 (1989) 38. Narayan, V., Menon, N., Ramaswamy, S.: Nonequilibrium steady states in a vibrated-rod monolayer: tetratic, nematic, and smectic correlations. J. Stat. Mech. (2006) 39. 
Nicolis, G., Prigogine, I.: Self-organization in nonequilibrium systems: from dissipative structures to order through fluctuations. J. Wiley, New York (1977) 40. Palacci, J., Cottin-Bizonne, C., Ybert, C., Bocquet, L.: Sedimentation and effective temperature of active colloidal suspensions. Phys. Rev. Lett. 105, 088304 (2010) 41. Palacci, J., Sacanna, S., Steinberg, A.P., Pine, D.J., Chaikin, P.M.: Living crystals of lightactivated colloidal surfers. Science 339, 936 (2013)
References
127
42. Peruani, F., Starruss, J., Jakovljevic, V., Søgaard-Andersen, L., Deutsch, A., Bär, M.: Collective motion and nonequilibrium cluster formation in colonies of gliding bacteria. Phys. Rev. Lett. 108, 098102 (2012) 43. Peshkov, A., Bertin, E., Ginelli, F., Chaté, H.: Boltzmann-Ginzburg-Landau approach for continuous descriptions of generic Vicsek-like models. Eur. Phys. J. Spec. Top. 223, 1315 (2014) 44. Sandow, S.: Partially asymmetric exclusion process with open boundaries. Phys. Rev. E 50, 2660 (1994) 45. Seifert, U.: Stochastic thermodynamics, fluctuations theorems and molecular machines. Rep. Prog. Phys. 75, 126001 (2012) 46. Theurkauff, I., Cottin-Bizonne, C., Palacci, J., Ybert, C., Bocquet, L.: Dynamic clustering in active colloidal suspensions with chemical signaling. Phys. Rev. Lett. 108, 268303 (2012) 47. Toner, J., Tu, Y.: Long-range order in a two-dimensional dynamical XY model: how birds fly together. Phys. Rev. Lett. 75, 4326 (1995) 48. Toner, J., Tu, Y., Ramaswamy, S.: Hydrodynamics and phases of flocks. Ann. Phys. (N.Y.) 318, 170 (2005) 49. Touchette, H.: The large deviation approach to statistical mechanics. Phys. Rep. 478, 1 (2009) 50. Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., Shochet, O.: Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett. 75, 1226 (1995) 51. Zakharov, V.E., L’vov, V.S., Falkovich, G.: Kolmogorov Spectra of Turbulence I: Wave Turbulence. Springer, Berlin (1992) 52. Zhou, S., Sokolov, A., Lavrentovich, O.D., Aranson, I.S.: Living liquid crystals. Proc. Natl. Acad. Sci. USA 111, 1265 (2014)
Chapter 4
Models of Social Agents
We have discussed in the previous chapter models of interacting macroscopic units like sand grains or bacteria. We would like now to go one step further and consider along the same logics stochastic models of social agents. Of course, true human beings are incredibly complex and have consciousness, so that they cannot, in general, be described as agents having only a handful of features. However, this description in terms of “social atoms” [25] may be relevant when considering very constrained situations like walking in a crowd [3, 28] or driving on a highway [2, 34], when a limited number of actions or decisions can be made. Models involving such very simplified agents in interaction already capture at a qualitative (or semi-quantitative) level some interesting and non-trivial collective behaviors that may emerge out of a large assembly of agents [7, 9, 12]. In this chapter, we discuss a few examples of models of social agents in which collective effects appear, with a special emphasis on phase transitions. Indeed, although the notion of phase transition has been originally introduced in the context of condensed matter physics [14], it turns out to be also a very relevant concept in the framework of models of social agents [7]. In this spirit, we discuss, in particular, the urban segregation phenomenon in the Schelling model (Sect. 4.1), the congestion of highway traffic (Sect. 4.2) and a model of competition between shops selling fresh products (Sect. 4.3). We then discuss in Sect. 4.4 the emergence of power-law distributions of wealth among individuals, and describe a minimal model accounting for this phenomenon. Finally, we explore in Sect. 4.5 a tentative way to take into account a greater complexity of agents within a statistical physics framework, and briefly discuss the interplay of internal and collective complexities in a minimal setting.
© Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6_4
129
130
4 Models of Social Agents
4.1 Dynamics of Residential Moves A standard example of complex system dynamics is the Schelling model which represents in a schematic way the dynamics of residential moves in a city [19, 20, 30, 32]. The Schelling model is actually a particular instance of the more general checkerboard model proposed slightly earlier by Sakoda [31], which unfortunately remained mostly unrecognized [22]. In the Schelling model, the city is modeled as a checkerboard, divided into cells. Two types of agents (say red and green) live in the city. They reside in the cells of the checkerboard, with at most one agent in each cell. Agents characterize their degree of satisfaction regarding their environment by a utility, which is a given function (the same for all agents) of the number of agents of the same type in their neighborhood. The neighborhood can be defined in different ways. One possibility would be to consider the set of nearest neighbors. However, most studies rather use the Morse neighborhood, that is, the 3 × 3 (or sometimes 5 × 5) square surrounding the current cell. Before moving, an agent chooses at random an empty cell, and evaluates the utility u new associated with this new location. The agent compares this quantity to the utility u old of his present location, by computing the utility difference u = u new − u old . The move is then accepted with probability 1/(1 + e−u/T ). Here, T is a parameter analogous to the temperature in physical systems that characterizes the influence of other factors, like the presence of facilities, shops, or friends that are not explicitly taken into account in the model, but could bias the decision of moving or not. At low T , and for a large class of utility functions such that agents have a (possibly slight) preference for being with agents of the same type, a segregation phenomenon is observed when simulating the model numerically: two types of domains form, namely, domains with a majority of red agents and domains with a majority of green agents. Quite surprisingly, this segregation phenomenon seems quite robust, and is also observed in the case where agents have a marked preference for mixed neighborhood. The Schelling model in its standard form is very hard to solve analytically, and solutions are not presently known. The reason for these difficulties is mainly that the neighborhoods of two neighboring cells overlap, generating complicated correlations in the system. In order to find an analytical solution, a standard strategy is to define a variant of the model on a specific geometry that avoids these correlations. This strategy was, for instance, successful in the Ising model, by introducing a fully connected version of the model (see Sect. 1.4.1): assuming that all spins interact together, the phase transition could be obtained analytically in a simple way. A straightforward application of this idea to the Schelling model a priori seems to lead to a deadlock. If an agent evaluates its utility by considering the whole city as its neighborhood, this utility will not change when moving within the city. A more interesting strategy is then to divide the city into a large number of blocks, so that agents evaluate their utility within blocks, and move from blocks to blocks. In this way, correlations between blocks may be suppressed.
4.1 Dynamics of Residential Moves
131
4.1.1 A Simplified Version of the Schelling Model In order to implement this strategy, we consider a model [20], with a single type of agents to further simplify the derivation (the case of two different types of agents is briefly discussed below). The segregation phenomenon then corresponds to the formation of domains of different densities. The city is divided into a large number Q of blocks, each block containing H cells (a cell may be thought of as representing a flat). We assume that each cell can contain at most one agent, so that the number n q of agents in a given block q (q = 1, . . . , Q) satisfies n q ≤ H . A microscopic configuration C of the city corresponds to the knowledge of the state (empty or occupied) of each cell. For each block q, we also introduce the density of agents ρq = n q /H . Each agent has the same utility function u(ρq ), which describes the degree of satisfaction concerning the density of the block it is living in. The collective utility is defined as the total utility of all the agents in the city: U (C) = q n q u(ρq ). A dynamical rule allows the agents to move from one block to another. At each time step, one picks up at random an agent and a vacant cell, within two different blocks. The agent moves in that empty cell with probability: W (C |C) =
1 , 1 + e−u/T
(4.1)
where C and C are the configurations before and after the move, respectively, and u is the variation of the individual utility of the chosen agent, associated with the proposed move. The parameter T has the same interpretation as in the standard Schelling model. It is interesting at this stage to emphasize the difference between the present model and standard physical approaches. It could seem at first sight that the utility is simply the equivalent, up to a sign reversal, of the energy in physics. In the present model, however, an economics perspective is adopted, so that the agents are considered as purely selfish. They make decisions only according to their own utility change u, and do not consider the potential impact of their decision on the other agents. In contrast, in physical models, the probability for a particle to move depends on the energy variation of the whole system, and the effect on the other particles is thus taken into account from the outset. This has important consequences, as we shall see below. We wish to find the stationary probability distribution P(C) of the microscopic configurations C. This is not an easy task in general. Yet, if we were able to show that a detailed balance relation holds in this model, we would straightforwardly get the solution. Let us assume that the individual cost u can be written as u = F(C) − F(C ), where F is a function on configuration space1 . From Eq. (4.1), we find that the dynamics satisfies a detailed balance relation: The relation u = F(C) − F(C ) is non-trivial, because the utility of a single agent cannot be computed from the sole knowledge of the system configuration; one also needs to know who is the considered agent, and this information is not included in the configuration C.
1
132
4 Models of Social Agents
W (C |C)P(C) = W (C|C )P(C ),
(4.2)
with a distribution P(C) given by P(C) =
1 −F(C)/T e , Z
(4.3)
where Z is the analog of a partition function. It can be shown that a function F satisfying this condition is given by F(C) = −
nq m . u H q m=0
(4.4)
To characterize the “segregation” phenomenon, the full statistical information on the occupation number of each cell is not necessary. Instead, an aggregated description in terms of densities of the blocks turns out to be more useful. Such a coarse-grained description is obtained by aggregating all configurations with the same number of agents in each block. As there are H !/[n!(H − n)!] ways of ordering n agents in H cells, we obtain the following coarse-grained probability distribution:
H ˜ 1 , . . . , n Q ) = K˜ exp − P(n f˜(n q ) , T q
(4.5)
with K˜ a normalization constant, and where we have introduced the function f˜: nq n!(H − n)! 1 m T ln − . u f˜(n) = H H! H m=0 H
(4.6)
The above expression suggests to consider the limit of large H in order to get a continuous formulation for f˜. Keeping constant the density of each block ρq = n q /H (ρq hence becoming a continuous variable) and expanding the factorials using Stirling’s formula ln n! ≈ n ln n − n, valid for large n, one obtains for H → ∞ n q !(H − n q )! 1 ln → ρq ln ρq + (1 − ρq ) ln(1 − ρq ) . H H!
(4.7)
Similarly, the last term in the expression of f˜ converges to an integral: nq
ρq 1 m → u u(ρ )dρ . H m=0 H 0
(4.8)
4.1 Dynamics of Residential Moves
133
˜ 1 , . . . , n Q ) turns into a probaIn terms of density ρq , the stationary distribution P(n Q bility density P(ρ1 , . . . , ρ Q ) given by (with q=1 ρq = Qρ0 held fixed): ⎛
⎞ Q H P(ρ1 , . . . , ρ Q ) = K exp ⎝− f (ρq )⎠ , T q=1
(4.9)
where K is a normalization constant, and where the function f (ρ) is defined as
f (ρ) = T ρ ln ρ + T (1 − ρ) ln(1 − ρ) −
ρ
u(ρ )dρ .
(4.10)
0
Q f (ρq ) may be called a potential, or a large The function (ρ1 , . . . , ρ Q ) = q=1 deviation function. It is also the analog of the free energy functions used in physics. The configurations (ρ1 , . . . , ρ Q ) that minimize the potential (ρ1 , . . . , ρ Q ) under the Q ρq are the most probable to come up. In the limit H → ∞, constraint of fixed q=1 these configurations are the only ones that appear in the stationary state, as the probability of other configurations vanishes exponentially with H .
4.1.2 Condition for Phase Separation Focusing on the large H case, the problem gets back to finding the set (ρ1 , . . . , ρ Q ) which minimizes the potential (ρ1 , . . . , ρ Q ) with the constraint q ρq fixed. We are interested in knowing whether the stationary state is statistically homogeneous or inhomogeneous. Following standard physics textbooks methods [10], the homogeneous state at density ρ0 is unstable against a phase separation if there exists two densities ρ1 and ρ2 such that γ f (ρ1 ) + (1 − γ) f (ρ2 ) < f (ρ0 ) .
(4.11)
The parameter γ (0 < γ < 1) corresponds to the fraction of blocks that would have a density ρ1 in the segregated state. This condition simply means that the value of the potential is lower for the segregated state than for the homogeneous state, so that the segregated state has a much larger probability to occur. Geometrically, inequality (4.11) corresponds to requiring that f (ρ) is a non-convex function of ρ. The values of ρ1 and ρ2 are obtained by minimizing γ f (ρ1 ) + (1 − γ) f (ρ2 ) over all possible values of ρ1 and ρ2 , with γ determined by the mass conservation γρ1 + (1 − γ)ρ2 = ρ0 . The corresponding geometrical construction is called the common tangent construction (see Fig. 4.1). We now try to translate the convexity condition (4.11) into a condition on the utility function u(ρ). Phase separation occurs if there is a range of density for which f (ρ) is concave, namely, f (ρ) < 0. We thus compute the second derivative of f ,
Fig. 4.1 Sketch of phase separation: the system of density ρ0 splits into two phases of densities ρ1 and ρ2 to lower its free energy. The densities ρ1 and ρ2 are determined from the common tangent construction (dashed line) on the free energy curve f (ρ) (full line). The free energy of the phase-separated system is given by the value of the tangent at ρ = ρ0
4 Models of Social Agents
f(ρ)
134
ρ1
ρ0
ρ2
yielding f (ρ) =
T − u (ρ) . ρ(1 − ρ)
(4.12)
For a given utility function, the sign of f (ρ) can be checked explicitly. We note that in the limit T → 0, f (ρ) = −u (ρ), so that the homogeneous state is stable (i.e., f (ρ) > 0) if u(ρ) is a monotonously decreasing function of ρ. The specific form of the utility function is an input of the model, and it can be postulated on a phenomenological basis, or rely on a theory of the interactions among agents. In order to analyze an explicit example of a non-linear utility function, we consider the peaked utility function defined as u(ρ) =
⎧ ⎨ 2ρ ⎩
if ρ ≤
1 2
2(1 − ρ) if ρ >
1 2
(4.13)
which is maximum for ρ = 21 (see left panel of Fig. 4.2). The expression of f (ρ) can be easily deduced from u(ρ), and is illustrated on the right panel of Fig. 4.2 for different values of T . To study the stability of the homogeneous phase, we look at the sign of f (ρ). One has for ρ < 1/2 f (ρ) =
T −2, ρ(1 − ρ)
(4.14)
f (ρ) =
T +2. ρ(1 − ρ)
(4.15)
and for ρ > 1/2:
It is easy to check that f (ρ) is minimum for ρ →
1− , the corresponding value being 2
135
1
0
0.8
-0.2
0.6
-0.4
f(ρ)
u(ρ)
4.1 Dynamics of Residential Moves
0.4
-0.6
0.2
-0.8
0 0
0.2
0.4
ρ
0.6
0.8
1
-1 0
0.2
0.4
ρ
0.6
0.8
1
Fig. 4.2 Left: utility function defined in Eq. (4.13). Right: corresponding effective free energy f (ρ), for different values of temperature, T = 0, 0.2, 0.5, and 0.8 (from top to bottom), illustrating that f (ρ) becomes non-convex for T < 0.5, which leads to a phase separation
lim− f (ρ) = 4T − 2 .
(4.16)
ρ→ 21
Thus, for T > 1/2, the function f (ρ) is convex on the whole interval 0 < ρ < 1 as f (ρ) > 0 on this interval, and the homogeneous phase is stable. On the contrary, for T < 1/2, there exists an interval of density ρ where f (ρ) is concave ( f (ρ) < 0), so that in the stationary state, the system is split into two phases with different densities. The surprising phenomenon here is that a phase separation occurs even in the case ρ = 1/2, although all agents have a significant preference for a half-filled neighborhood. This can be understood intuitively as follows. At a small, but non-zero temperature T , small fluctuations of density in the blocks are possible. Let us assume that we start from the homogeneous state of density ρ = 1/2, with some small fluctuations of this density around the mean value 1/2. If a block has a density smaller that 1/2, then this block becomes less attractive for the agents living in it. So some agents will start to move to the most attractive blocks which have exactly the density 1/2. In doing so, the initial block becomes less and less attractive, thus making more and more agents leave it. This avalanche process, which is related to the selfish behavior of the agents, qualitatively explains the instability of the homogeneous state with density 1/2. Interestingly, this model can be slightly generalized to take into account a degree of “altruism” of the agents, in the sense that agents may also partially take into account the cost imposed by their moves to the neighboring agents. This can be done by replacing u in Eq. (4.1) by a cost C = u + α(U − u),
(4.17)
where 0 ≤ α ≤ 1 is a weighting parameter, and U is the total variation of utility of all agents in the city. The effective free energy f (ρ) given in Eq. (4.10) is then
136
4 Models of Social Agents
changed into
ρ
f (ρ) = T ρ ln ρ + T (1 − ρ) ln(1 − ρ) + αρu(ρ) − (1 − α)
u(ρ )dρ . (4.18)
0
The case α = 1 is actually very similar to what happens in physics, if one maps the agents’ utility onto the opposite of an energy. The introduction of this parameter α has important consequences on the dynamics. It can be shown, in particular, that there exists a threshold value αc such that for α > αc , the phase separation disappears and the system remains homogeneous in steady state [20].
4.1.3 The “True” Schelling Model: Two Types of Agents To get closer to Schelling’s original model, we briefly mention the case where two types of agents are included. We thus introduce agents of two colors such as “red” and “green”. The two types of agents are labeled with subindexes “R” and “G” in the following. To keep the model solvable, we assume that agents of each type are only concerned by the density of agents of the same type in their neighborhood. Hence, the utility functions depend only on the density of the same type of agents, namely, u R (ρ R ) and u G (ρG ). We consider here the original “selfish” dynamics corresponding to α = 0. The effective free energy defined in Eq. (4.10) can be generalized to f (ρ R , ρG ) = −T ρ R ln ρ R − T ρG ln ρG − T (1 − ρ R − ρG ) ln(1 − ρ R − ρG )
ρR
ρG + u R (ρ )dρ + u G (ρ )dρ . (4.19) 0
0
In order to determine the equilibrium configurations of the model, one needs to find the set {ρq R , ρqG } maximizing the potential F(ρ1R , . . . , ρ Q R , ρ1G , . . . , ρ QG ) =
f (ρq R , ρqG )
(4.20)
q
under the constraints that the total number of agents of each type is fixed, that is, q ρq R = Qρ0R and q ρqG = Qρ0G , where ρ0G and ρ0R are, respectively, the overall densities of “green” and “red” agents. Due to the constraint that the total density of agents (disregarding their type) has to remain less than one, this model including two types of agents does not reduce to two uncoupled models composed of a single type of agents. It is however possible to compute the stationary states. We consider again the small “temperature” limit T → 0, and assume for simplicity that the overall densities of “red” and “green” agents are equal, ρ0R = ρ0G = ρ0 /2. One then finds a segregated state in which each block contains a single type of agents, at density ρ0 [20].
4.1 Dynamics of Residential Moves
137
Another generalization involving two types agents is to consider agents with the same utility function, but one group with α = 0 (“selfish agents”) and another group with α = 1 (“altruistic” agents). It has been shown that in this case, even a small fraction of altruistic agents may have an important effect (somewhat analogous to “catalytic” effects in chemistry) to increase the overall utility [26].
4.2 Traffic Congestion on a Single Lane Highway Another commonlife situation when a phenomenon akin to a phase transition appears is road traffic. When the car density becomes too high on a highway, for instance, traffic jams may form. In physical terms, this corresponds to a phase separation between a high-density slow-moving phase and a low-density fast-moving phase. The simplest example of phase separation in physics is the liquid gas phase separation. At the microscopic level, liquid–gas phase separation occurs because the interaction force between particles has an attractive part, so that particles tend to form a dense phase in some parameter regimes. For the same reason, purely repulsive particles do not phase separate at equilibrium. However, the situation is different out of equilibrium, and it has been discovered that breaking detailed balance by including a self-propulsion force may lead to a phase separation scenario called motility-induced phase separation, which has been observed both in numerical models and in experiments on self-propelled colloids [13]. A slightly different but related phenomenon occurs in the case of road traffic. To be specific, we refer here to single lane road traffic on a highway so that there is no traffic light, and in between two entrances or exits so that the number of cars is conserved. In terms of modeling, a car is not so different from a particle: both are described by their position and velocity as a function of time. There are, however, at least two important differences even in the crudest models of traffic. First, a car is a driven object, subjected to a self-propulsion force, and its motion is quite different from, e.g., Brownian particle motion, as the car always moves in the same direction (at least on a highway). Second, a car has a driver who adjusts the speed of the car depending on its environment. On a given type of road with a fixed allowed maximal speed, the driver adjusts the speed mostly as a function of the distance to the car in front. As a first approximation, this behavioral rule leads to a preferred speed which is a function of the local car density. Of course, all drivers are not identical, and their preferred speed as a function of density should slightly vary from one to the other. In addition, drivers take into account the information regarding cars in front much more than that regarding cars behind, and this fore-aft asymmetry may lead to further effects that are neglected in the present model by looking only at the local density.
138
4 Models of Social Agents
4.2.1 Agent-Based Model and Statistical Description We are now in a position to precisely define the model. We consider a large number of cars driving on a single lane, without entrances or exits on the portion of road considered. Car i has a velocity vi (t) = d xi /dt, where xi (t) is its position at time t. The driver adjusts the velocity vi so as to reach the desired speed u(ρ), where ρ = ρ(xi ) is the local car density seen by the driver of car i. The evolution equation of the velocity vi is assumed to be dvi = −γ vi − u(ρ) + ξi (t) , dt
(4.21)
where γ is the relaxation rate of the car speed (inverse of its relaxation time), and ξi (t) is a Gaussian white noise modeling the fact that the driver not only needs to follow the car flow at speed u(ρ), but also has to adjust the speed of the car not to collide with the car in front, leading to speed fluctuations. This effect becomes stronger when the density is increased, and we assume ξi (t) = 0 , ξi (t)ξi (t ) = 2D(ρ) δ(t − t ) ,
(4.22)
where the diffusion coefficient D(ρ) is an increasing function of ρ, going to zero when ρ → 0. The target speed u(ρ), the relaxation rate γ, and the diffusion coefficient D(ρ) are assumed for simplicity to be the same for all cars. This is for sure an oversimplification, but this toy model already contains the main ingredients allowing for traffic jams to occur, as we will see below. We define the one-particle phase-space distribution f (x, v, t) as the probability to find a car at position x, with velocity v, at time t. The evolution equation for f (x, v, t) is a Fokker–Planck equation, schematically written as
∂ ∂ ∂ ∂f + x˙ f + v˙ f = ∂t ∂x ∂v ∂x
D(ρ)
∂f ∂x
,
(4.23)
where the overdot indicates a time derivative; (x, ˙ v) ˙ are the components of the oneparticle phase-space velocity. Replacing (x, ˙ v) ˙ by their explicit expressions, one gets ∂ ∂ ∂ ∂f + vf − γ v − u(ρ) f = ∂t ∂x ∂v ∂x
∂f D(ρ) ∂x
.
(4.24)
It is convenient to define the density field ρ(x) and car flux field w(x) as follows:
ρ(x, t) =
∞
−∞
dv f (x, v, t) ,
w(x, t) =
∞
−∞
dv v f (x, v, t) .
(4.25)
Note that the ensemble average local velocity is given by v = w/ρ, but we do not use this quantity explicitly below. With these definitions, we obtain by integrating
4.2 Traffic Congestion on a Single Lane Highway
139
Eq. (4.24) over v the usual continuity equation for the density, ∂ ∂ρ ∂w + = ∂t ∂x ∂x
∂ρ D(ρ) ∂x
.
(4.26)
This equation is not closed in the sense that it involves two different fields ρ and w. We thus need a second equation to characterize the evolution of w. Multiplying Eq. (4.24) by v and integrating it over v, we get after some algebra ∂ ∂w ∂ + 2u(ρ)w − ρu(ρ)2 + γ w − ρu(ρ) = ∂t ∂x ∂x
D(ρ)
∂w ∂x
.
(4.27)
Note that Eq. (4.27) is not exact, as the second term on the LHS has been obtained via a linear expansion in v − u(ρ), assuming that f (x, v, t) is peaked around v = u(ρ). This approximation is needed to express the equation in terms of the fields ρ and ∞ w only. In its exact form, it involves the second-order moment −∞ dv v 2 f (x, v, t). More generally, the equation on the n th moment involves moments up to order n + 1, so that the hierarchy of moments can never be closed exactly. The coupled equations (4.26) and (4.27) can be studied (e.g., by numerical integration) to determine whether traffic jams form or not in this model. However, a simple analytical approach can be obtained by assuming that the relaxation rate γ is large. This is not a very restrictive assumption in itself, it rather means we consider only phenomena occurring on time scales significantly larger than 1/γ. Rewriting Eq. (4.27) as ∂ ∂ ∂w + γw = γρu(ρ) − 2u(ρ)w − ρu(ρ)2 + ∂t ∂x ∂x
D(ρ)
∂w ∂x
,
(4.28)
we see that w actually closely follows the evolution of the RHS of Eq. (4.28). If the latter evolves on time scales larger than 1/γ, the time derivative can be neglected and w can be approximated as 1 ∂ 1 ∂ 2u(ρ)w − ρu(ρ)2 + w = ρu(ρ) − γ ∂x γ ∂x
∂w D(ρ) ∂x
.
(4.29)
Unfortunately, this last equation is not an explicit expression of w, because w also appears on the RHS of Eq. (4.29). Yet, considering Eq. (4.29) as an expansion of w to first order in the small parameter 1/γ, one can replace w on the RHS by its zerothorder approximation, w = ρu(ρ) + O(γ −1 ), which leads to an explicit expression of w, 1 ∂ ∂ 1 ∂ ρu(ρ)2 + D(ρ) ρu(ρ) . (4.30) w = ρu(ρ) − γ ∂x γ ∂x ∂x Injecting this last expression in Eq. (4.26) eventually leads to a closed equation on ρ,
140
4 Models of Social Agents
∂ ∂ρ ∂ 2 ρu(ρ) + D(ρ) ρu(ρ) + D(ρ) . ∂x ∂x ∂x (4.31) Interestingly, the RHS of Eq. (4.31) contains some non-trivial terms that could have hardly been guessed from phenomenological considerations. These new terms come on top of the simple advection and diffusion terms that were present from the outset in Eq. (4.26) and have a simple interpretation. 1 ∂2 ∂ ∂ρ + ρu(ρ) = ∂t ∂x γ ∂x 2
4.2.2 Congestion as an Instability of the Homogeneous Flow We are now equipped to study whether traffic jams form in the present model of road traffic. A traffic jam typically appears when the homogeneous flow at a uniform density ρ becomes linearly unstable, meaning that a small perturbation around the homogeneous state gets amplified and eventually makes the flow strongly nonhomogeneous. To test the linear stability of the homogeneous flow, we linearize the evolution equation around a uniform profile ρ = ρ0 . We thus set ρ(x) = ρ0 + δρ(x) and expand Eq. (4.31) to linear order in δρ. We obtain 2 ∂ ∂ 3 1 ∂ 1 ∂ δρ = − ρu(ρ) δρ + ρu(ρ)2 + D(ρ0 ) δρ + ρu(ρ) δρ , 2 ∂t ∂x γ ∂x γ ∂x 3 (4.32) with the shorthand notations d ρu(ρ) , ρu(ρ) = dρ |ρ0
d ρu(ρ)2 = ρu(ρ)2 . dρ |ρ0
(4.33)
Since this is a linear equation in δρ, it can be studied by taking its Fourier transform. In practice, this boils down to considering a complex perturbation δρ(x) = δρ0 est+iq x .
(4.34)
The complex growth rate s is then obtained as i 1 s = −iq ρu(ρ) − q 2 D(ρ0 ) + ρu(ρ)2 − q 3 ρu(ρ) . γ γ
(4.35)
A linear instability is obtained when the real part of s is positive, so that the amplitude of the perturbation δρ increases with time. We have Re(s) = −q
2
1 D(ρ0 ) + ρu(ρ)2 γ
meaning that the sign of the effective diffusion coefficient
(4.36)
4.2 Traffic Congestion on a Single Lane Highway
Deff (ρ) = D(ρ) +
141
1 d ρu(ρ)2 γ dρ
(4.37)
controls the stability of the homogeneous state. When Deff (ρ) > 0, the perturbation regresses exponentially and the homogeneous state is stable. In contrast, when Deff (ρ) < 0, the perturbation grows exponentially and the homogeneous state is unstable. As a concrete illustration, we choose the simple form u(ρ) = u 0 (1 − ρ/ρm ) for the target speed, where u 0 and ρm are two parameters. The density ρm is the maximal possible density of cars on the road, given the fact that cars have a non-zero extension and cannot overlap. We also choose the simple form D(ρ) = D1 ρ, with D1 a parameter, assumed to be small if needed. One finds that the effective diffusion coefficient Deff (ρ) is positive for 0 < ρ < ρc and ρs < ρ < ρm , so that the homogeneous state is linearly stable on these ranges of density. In contrast, Deff (ρ) < 0 for ρc < ρ < ρs , leading to a linear instability of the homogeneous state. Choosing time and length units such that u 0 = 1 and γ = 1, the densities ρc and ρs , solutions of Deff (ρ) = 0, are given for small D1 by ρc 1 1 = + D1 + O(D12 ) , ρm 3 6
ρs 1 = 1 − D1 + O(D12 ) . ρm 2
(4.38)
This instability of the homogeneous state is expected to lead to the formation of a traffic jam, that is, a phase separation between a high-density phase at density ρh and a low-density phase at density ρ , similarly to standard phase separation scenarios like the liquid–gas phase separation. The densities ρ and ρh lie in the stability intervals of the homogeneous state, meaning that they satisfy 0 < ρ < ρc and ρs < ρh < ρm . At equilibrium, or in equilibrium-like models like the Schelling model, the densities ρ and ρh are determined by the double tangent construction illustrated on Fig. 4.1, because they are found as solutions of a variational problem. In the present traffic flow model, no simple variational formulation is available, and it is not clear how to determine the densities ρ and ρh analytically. Nevertheless, it is likely that ρ is close to ρc , and that ρh is close to ρs (in other words, the binodal points are expected to be close to the corresponding spinodal points). Under this assumption, one can plot the qualitative phase diagram of the model. When studying traffic flow either experimentally or theoretically, it is customary to represent the average flow of cars w = ρu(ρ) as a function of the car density ρ. Such a diagram is called in this context the fundamental diagram of traffic flow [34]. For the model at hand, the fundamental diagram is sketched in Fig. 4.3. The stable states are indicated in full lines, while the unstable homogeneous state is indicated by a dashed line. For simplicity, and in the absence of more detailed information, we have neglected the distinction between binodal and spinodal points when plotting the graph. The right part of the diagram has a characteristic shape called “inverse λ shape” in the traffic flow literature (see, e.g., [34]). Note that as usual, the relative fractions α and αh = 1 − α of the low- and high-density phases, respectively, are obtained by the conservation law α ρ + (1 − αh )ρm = ρ. Then in the phase-separated state, the
Fig. 4.3 Sketch of the fundamental diagram of traffic flow: plot of the car flux ρu(ρ) as a function of ρ. The full line indicates the stable state, corresponding to a congested flow for ρc < ρ < ρs and to a homogeneous flow otherwise. The dashed line indicates the unstable homogeneous flow
4 Models of Social Agents
ρ u(ρ)
142
0 0
ρc
ρ
ρs
ρm
average car flow w is evaluated as a weighted average of car flows in the low- and high-density phases, respectively, w = α ρ u(ρ ) + αh ρh u(ρh ). To conclude, we note that the above model is just a simple example of traffic flow models that fits into the general statistical physics framework presented in this book. Many different models of traffic flow have been considered in the literature, often postulating continuum equations on phenomenological grounds [2, 21, 34], or only partly taking into account driver-level behaviors. Particular attention has been paid to the consistency of the models, ensuring that information does not travel faster than car speed, or taking into account the fact that drivers mostly focus their attention onto cars in front. Derivations of macroscopic equations for the traffic flow have also been proposed on the basis of kinetic theory [16, 23, 24]. In addition to such continuous-space models of traffic flow, some lattice models have also been proposed, where cars jump from one lattice site to the next at discrete times [17, 18, 33]. For specific choices of the dynamical rules, such lattice models have the advantage that their stationary probability distribution can be worked out exactly, but their solution is technical involved. Considered models of this type are often generalizations with parallel update rules of the ASEP model described in Sect. 3.3.3, which was defined with asynchronous updates.
4.3 Symmetry-Breaking Transition in a Decision Model We have seen above in the Schelling model and in the road traffic model that phase transitions may occur in models of social agents, for instance, in the form of a discontinuous phase transition leading to a phase separation. We discuss below a different example leading to a continuous phase transition associated with a spontaneous symmetry breaking. The model deals with a set of agents buying fresh products in two competing stores, and are more satisfied with fresher products. They thus choose with higher probably the shop they believe from past experience to sell fresher prod-
4.3 Symmetry-Breaking Transition in a Decision Model
143
ucts. This typically means that the other shop sells less products, and its available products become less fresh. This is the essence of the feedback mechanism at play in this model, leading to a symmetry breaking phase transition.
4.3.1 Choosing Between Stores Selling Fresh Products The model is composed of N agents which randomly visit one of two shops selling similar products, which differ only by their age τ [27]. The number of products is fixed and equal to N p in each shop. Each time a product is sold, it is immediately replaced by a new one having age τ = 0. Products are characterized by a freshness observable called h(τ ), ranging between 0 and 1, with the value 1 corresponding to a fully fresh product. For simplicity, we choose in the following a simple exponential form h(τ ) = e−τ /τa , with τa the typical time over which products age. At each time step, separated by a duration δt = τ0 /N , an agent i is randomly selected and chooses to visit the first shop with a probability Pi =
1 , 1 + e−σi /T
(4.39)
while it visits the second shop with the complementary probability 1 − Pi . When visiting a store, the agent buys a product at random. In Eq. (4.39), T is a parameter akin to a temperature, as in the Schelling model, and the quantity σi = Si1 − Si2 quantifies the difference in satisfaction (or utility) between the two shops. After visiting shop j = 1 or 2, agent i updates its satisfaction regarding the corresponding shop as (4.40) Si j = αSi j + (1 − α)h(τ ) , where α ∈ (0, 1) may be interpreted as a memory parameter. Numerical simulations of the model show that at high temperature both shops reach the same activity level (measured as the average number of customers by unit time), whereas at low temperature, one of the shops has a high activity (and very fresh products) and the other shop has a low activity (and old products) [27].
4.3.2 Mean-Field Description of the Model It is interesting to look for a simple mean-field description of the model, allowing for an analytical characterization of the phase transition. The basic idea of the mean-field approach is here to consider a “representative agent” [1], whose characteristics are the average characteristics of the population. Using Eq. (4.40) and assuming that 0 < 1 − α 1, we obtain a continuous-time evolution equation for σ
144
4 Models of Social Agents
dσ = ν −σ(t) + h 1 (t) − h 2 (t) , dt
(4.41)
where we have introduced ν = (1 − α)/τ0 . The quantity h j (t) is the average freshness of products in shop j. To evaluate this quantity, we need to determine the age distribution of products in each store. The distribution φ j (τ , t) of the age τ of products in shop j evolves according to ∂φ j ∂φ j (τ , t) = − (τ , t) − λ j (t) φ j (τ , t) , ∂t ∂τ
(4.42)
where λ j dt is the mean fraction of products that have been sold in a time dt in shop j. Given the fraction p j of agents visiting the store j, the parameter λ j can be expressed as Npj , (4.43) λj = N p τ0 which is nothing but the average flux of customers N p j /τ0 normalized by the number N p of products in the shop. Equation (4.42) admits an exponential steady state φstj (τ ) = λ j e−λ j τ ,
(4.44)
leading for the average freshness h j to
∞
hj = 0
dτ φstj (τ ) e−τ /τa
R = 1+ 2pj
−1
,
(4.45)
having introduced the parameter R=
2N p τ0 . N τa
(4.46)
The parameter R plays the role of a control parameter, and characterizes the dynamical regime of the shops. For small R, the flux of customers is high and products are rapidly sold and renewed, thus remaining fresh on average. By contrast, the opposite regime of a large R instead corresponds to a situation where the flux of customers is low, and products get old before being sold. Although the expression (4.45) should, in principle, be corrected to take into account non-stationary effects, for simplicity we neglect these corrections which have no effect on the determination of the stationary state of the model. To simplify the expression of the evolution equation (4.41), it is convenient to replace the satisfaction difference σ by a magnetization-like variable m = 21 ( p1 − p2 ), according to the following relation:
4.3 Symmetry-Breaking Transition in a Decision Model
σ = T ln
1+m 1−m
145
.
(4.47)
In terms of the variable m, the evolution equation (4.41) can be rewritten as −1 −1 R dm R γ(1 − m 2 ) 1−m + 1+ − 1+ . = T ln dt 2T 1+m 1+m 1−m (4.48) We are now in a position to analyze the phase transition behavior of the model.
4.3.3 Symmetry-Breaking Phase Transition In the present framework, a symmetry-breaking phase transition is signaled by the onset of a stable fixed point m = 0 of the dynamical equation (4.48). We thus expand the right-hand side of Eq. (4.48) to order m 3 , leading to dm = am − bm 3 , dt with
γ a= T
R −T (1 + R)2
γ b= T
R 2 (2 + R) 2T − (1 + R)4 3
(4.49) .
(4.50)
One easily sees that the coefficient a of the linear term in Eq. (4.49) vanishes for T = Tc , with R Tc = . (4.51) (1 + R)2 With this definition, one has a = γ(Tc − T )/T . For T > Tc , the coefficient a is negative, implying that the state m = 0 is a linearly stable fixed point (see Chap. 7 for more details on fixed points and their stability in dynamical systems). In contrast, for T < Tc the coefficient a becomes positive, meaning that the fixed point m = 0 is linearly unstable. This implies that a symmetry-breaking phase transition takes place in the statistical physics language (which is justified by the fact that the model involves a large number of agents). From a dynamical system perspective, this corresponds to a bifurcation (see Chap. 7). To determine whether the non-zero value of m emerges continuously from m = 0 or if it appears in a discontinuous way at a non-zero value of m, one needs to look at the coefficient b of the cubic term in Eq. (4.49). If b > 0√ when a > 0, the cubic term stabilizes the instability and a non-zero value of m ≈ a/b emerges continuously from m = 0 at the transition. If on the contrary one has b < 0 at the transition, the cubic term also contributes to the instability, and higher order terms have to be taken into account to stabilize the transition, which is discontinuous in this case. Evaluating b(Tc ), one finds that b(Tc ) > 0 for R > R ∗ and b(Tc ) < 0
146
4 Models of Social Agents
for R < R ∗ , where R ∗ is given by R∗ =
√ 3 − 1.
(4.52)
As a result, the symmetry-breaking transition is continuous for R > R ∗ and discontinuous for R < R ∗ [27]. Interestingly, it is possible to generalize this model by introducing a price on top of the freshness. Assuming that the price is strongly decreased for old products, one obtains in some cases an oscillatory regime where m oscillates as a function of time, each shop becoming periodically more attractive for the customers [35].
4.4 A Dynamical Model of Wealth Repartition The way wealth is shared among individuals is known to be very uneven, in the sense that a large part of the total wealth is typically owned by a small fraction of individuals. More quantitatively, the wealth distribution often has a power-law tail with a relatively small exponent (called Pareto distribution in this context), corresponding to a relatively important probability of large individual wealth. Although real socioeconomic mechanisms at play are likely to be very complex and multifactorial, one may wonder if some simple mathematical model could already capture some stylized facts like the emergence of a power-law distribution. In addition, such a simple model could allow one to relate the exponent of the wealth distribution to the parameters of the model describing the dynamics of individual wealth, thus potentially shedding light on how to possibly make the distribution less uneven.
4.4.1 Stochastic Coupled Dynamics of Individual Wealths As an illustrative example of a dynamical model of wealth distribution, we consider the model introduced in Ref. [8]. The model consists of N agents having each an individual wealth Wi (i = 1, . . . , N ) following the dynamics dWi = m + ξi (t) Wi + Ji j W j − J ji Wi . dt j (=i)
(4.53)
The first term on the RHS describes the individual random gain or loss (of relative mean m) generated by investments for instance. The Gaussian white noise ξ(t), satisfying ξ(t)ξ(t ) = 2σ 2 δ(t − t ), describes stochastic fluctuations of the gain or loss process. Technically, the multiplicative noise term is understood here in the Stratonovich sense. The second term on the RHS describes monetary exchanges between agents. The coupling parameter Ji j , which has the dimension of a rate (i.e.,
4.4 A Dynamical Model of Wealth Repartition
147
inverse of a time), describes the fact that agent j buys a product from agent i. The probability to buy a product is assumed to be proportional to the wealth of the buyer. This term corresponds to a redistribution process that conserves the total wealth (note the formal similarity with a master equation describing probability conservation), while the first term on the RHS does not conserve the total wealth. The model can be studied numerically for arbitrary couplings J_ij. To get a simple analytical solution, we consider here only the simplest, mean-field version of the model where all couplings are equal, J_ij = J/N. Note the 1/N scaling, which is introduced here to keep a relaxation time of order one for large N. For this mean-field model, the individual wealth dynamics reads

dW_i/dt = (m − J) W_i + W_i ξ_i(t) + J W̄ ,   (4.54)
where the average wealth W̄ is defined as

W̄ = (1/N) Σ_{i=1}^{N} W_i .   (4.55)
This is a random variable, but due to the Law of Large Numbers (see Sect. 8.1.1), it converges for N → ∞ to the ensemble average ⟨W⟩. In the following, we implicitly take the large-N limit and no longer distinguish W̄ and ⟨W⟩.
We first try to determine the time evolution of the average wealth ⟨W⟩. For reasons that will become clear below, it is convenient to transform the Stratonovich mean-field Langevin equation (4.54) into an Ito Langevin equation, using the transformation rule Eq. (2.100) for the drift term. This leads to the following equation:

dW_i/dt = (m − J + σ²) W_i + W_i ξ_i(t) + J W̄   (Ito).   (4.56)
Formally, this simply corresponds to adding a contribution σ²W_i to the drift term. But the important point, as discussed in Sect. 2.2.3, is that the multiplicative noise term W_i ξ_i(t) is interpreted in different ways in the Stratonovich and Ito conventions. Taking the average of Eq. (4.56), we have to evaluate ⟨W_i ξ_i(t)⟩, which factorizes in the Ito convention, leading to

⟨W_i ξ_i(t)⟩ = ⟨W_i⟩⟨ξ_i(t)⟩ = 0   (4.57)
since ⟨ξ_i(t)⟩ = 0. We thus obtain a simple evolution equation for ⟨W⟩,

d⟨W⟩/dt = (m + σ²)⟨W⟩ ,   (4.58)

leading to the exponential time dependence
⟨W⟩(t) = ⟨W⟩_0 e^{(m+σ²)t} .   (4.59)
The average wealth ⟨W⟩(t) may thus increase or decrease exponentially with time, depending on both the average and the variance of investment returns.
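As a minimal numerical check of this prediction (a sketch, not taken from the text), one can integrate the Ito form (4.56) of the mean-field dynamics with an Euler–Maruyama scheme; all parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not taken from the text)
N, m, J, sigma = 5000, 0.02, 0.1, 0.2
dt, T_final = 1e-3, 5.0

W = np.ones(N)                      # initial wealths W_i(0) = 1
for _ in range(int(T_final / dt)):
    Wbar = W.mean()
    # white noise increment consistent with <xi xi'> = 2 sigma^2 delta(t - t')
    xi_dt = rng.normal(0.0, np.sqrt(2 * sigma**2 * dt), size=N)
    # Ito drift of Eq. (4.56): (m - J + sigma^2) W_i + J Wbar, plus multiplicative noise W_i xi_i
    W += dt * ((m - J + sigma**2) * W + J * Wbar) + W * xi_dt

print("simulated <W>(T)   :", W.mean())
print("Eq. (4.59) predicts:", np.exp((m + sigma**2) * T_final))
```

The simulated average wealth should agree with e^{(m+σ²)T} within the statistical fluctuations of the finite sample.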
4.4.2 Stationary Distribution of Relative Wealth

To determine the shape of the wealth distribution, it is convenient to introduce the normalized (or relative) wealth

w_i = W_i / W̄(t) ,   (4.60)
which may be expected to have a stationary distribution at large time. Coming back to the Stratonovich convention, we get from Eq. (4.53)

dw_i/dt = −(J + σ²) w_i + w_i ξ_i(t) + J .   (4.61)
Using the results of Sect. 2.3.3, the corresponding (Stratonovich) Fokker–Planck equation reads ∂p(w)/∂t = −∂J(w)/∂w, with a probability current J(w) given by

J(w) = [−(J + σ²) w + J] p(w) − σ² w ∂/∂w [w p(w)] .   (4.62)

Note that we have dropped the index i since all w_i's play the same role. The equilibrium distribution p_eq(w), corresponding to the zero-current condition J(w) = 0, is obtained by integration as [8]

p_eq(w) = (1/Z) e^{−(μ−1)/w} / w^{1+μ}   (w > 0),   (4.63)

with the single parameter μ defined as

μ = 1 + J/σ² .   (4.64)
The normalization constant Z is given by Z = Γ(μ)/(μ − 1)^μ, with Γ(μ) = ∫_0^∞ dt t^{μ−1} e^{−t} the Euler Gamma function. The distribution p_eq(w) thus has a power-law tail ∼ 1/w^{1+μ}, with μ > 1 so that the average is finite (contrary to the case 0 < μ ≤ 1). One can easily check from the full expression (4.63) of p_eq(w) that ⟨w⟩ = 1 as it should, due to the definition (4.60) of w. The expression (4.64) of the exponent μ shows that, not surprisingly, favoring exchanges (i.e., increasing the coupling J) increases μ and thus reduces the inequalities in wealth distribution. Conversely,
increasing the amplitude of individual gain and loss strategies (i.e., increasing σ) decreases μ and enhances inequalities in wealth distribution.
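The stationary distribution (4.63) can also be checked numerically. The sketch below uses illustrative parameters and the Ito version of Eq. (4.61), which a short calculation (the same drift-correction rule as in Eq. (4.56)) reduces to dw_i/dt = J(1 − w_i) + w_i ξ_i(t); the Hill estimator of the tail exponent is only a rough, finite-sample check.

```python
import numpy as np

rng = np.random.default_rng(1)

J, sigma = 0.5, 1.0                 # illustrative values, giving mu = 1 + J/sigma^2 = 1.5
n_agents, dt, steps = 20_000, 2e-3, 25_000

w = np.ones(n_agents)
for _ in range(steps):
    xi_dt = rng.normal(0.0, np.sqrt(2 * sigma**2 * dt), size=n_agents)
    # Ito form of Eq. (4.61): the sigma^2 w drift correction cancels, leaving dw = J(1 - w) dt + w dxi
    w += J * (1.0 - w) * dt + w * xi_dt

tail = np.sort(w)[-201:]            # 200 largest relative wealths above the threshold tail[0]
mu_hill = 1.0 / np.mean(np.log(tail[1:] / tail[0]))   # Hill estimator of the tail exponent
print("Hill estimate of mu:", round(mu_hill, 2), "   prediction 1 + J/sigma^2 =", 1 + J / sigma**2)
```

The estimated exponent should come out close to the prediction μ = 1 + J/σ², up to the bias introduced by the finite threshold and the statistical noise of the sample.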
4.4.3 Effect of Taxes

The above toy model does not take into account any institutional redistribution mechanism like taxes. It is possible to include taxes in the following way. An income tax can be introduced by continuously taking away a fraction φ of the income dW_i/dt. The income tax may be partly redistributed, and we assume for simplicity that an equal fraction f/N (with f < 1) of the total income tax is redistributed to each agent. Similarly, a capital tax could be introduced by taking away a fraction of the wealth of each agent and redistributing part of it. However, to lighten the presentation we focus here on the case of the income tax only. The dynamics of the wealth W_i then reads

dW_i/dt = (m − J) W_i + W_i ξ_i(t) + J W̄ − φ dW_i/dt + f φ dW̄/dt ,   (4.65)
where we have used again the Stratonovich convention. Performing similar calculations as above, one finds that the average wealth ⟨W⟩ grows exponentially, ⟨W⟩(t) = ⟨W⟩_0 e^{γt}, with γ given by [8]

γ = [m + σ²/(1 + φ)] / [1 + φ(1 − f)] .   (4.66)
The equilibrium distribution of the normalized wealth w_i = W_i/W̄ still has a power-law tail [8], p_eq(w) ∼ 1/w^{1+μ}, with an exponent μ now given by

μ = 1 + (J/σ²)(1 + φ) + [f φ(1 + φ)/σ²] × [m + σ²/(1 + φ)] / [1 + φ(1 − f)] .   (4.67)
The exponent μ is plotted as a function of φ in Fig. 4.4, for different values of f. As may have been expected intuitively, the income tax leads to a reduction of the wealth inequalities (i.e., μ increases). Not surprisingly either, this effect is stronger if a larger fraction f of the income tax is redistributed. To conclude, let us mention that this type of wealth distribution model can be generalized in different ways, for instance, by making it non-mean-field [8], by taking into account risk aversion, or by including social protection policies which favor in some way the agents with the lowest wealth during an exchange [11].
Fig. 4.4 Plot of the exponent μ as a function of the income tax rate φ, for different values of the redistributed fraction f (f = 0, 0.2, 0.4, 0.6, increasing from bottom to top). A larger exponent μ, and thus a less unequal wealth distribution, is found when increasing either φ or f (parameters: J = 0.5, σ = 1, m = 1)
4.5 Emerging Properties at the Agent Scale Due to Interactions

We now explore an important aspect of the statistical physics of social agents. Although statistical physics tends to consider very simplified models, one should not completely discard the intrinsic complexity of human beings when considering models of social agents. As a first step in this direction, we consider a model in which agents have a large number of internal configurations, and we explore whether interactions between agents could in some cases lead to an apparent simplification of agents, whereby some common characteristics of agents could emerge. In a second step, we further study how the emergence of a common characteristic at an agent scale can in turn lead to some form of collective organization in the presence of appropriate interactions. Using simple spin-like models, we illustrate how interactions between complex particles or agents having many internal configurations can induce a change of properties at the agent scale, like generating a non-zero spin. Interactions between these emerging spins can in turn lead to a spontaneous symmetry breaking at the global scale. In other words, interactions can modify the state of the system both at the microscopic and at the macroscopic scales.
4.5.1 A Simple Model of Complex Agents

We consider a model composed of N interacting agents, each of them having many internal states [5]. These internal states are characterized by a configuration C ∈ {1, ..., H + 1}, where H is a fixed but large number. A spin-like variable S_i(C) ∈ {0, 1} indicates whether or not the agent presents a given characteristic when in configuration C; if the characteristic is present, then S_i(C) = 1. We assume that a
single configuration (labeled as C = 1 for simplicity) allows for the characteristic to appear. Hence, S_i(1) = 1 whereas S_i(C) = 0 for C ∈ {2, ..., H + 1}. The idea is that since the characteristic is absent in most configurations, it should be essentially unobserved, unless a strong dynamical bias, that could result from interactions between agents, increases the probability of configuration C = 1.
It is customary in social models to assume that the dynamics of agents is ruled by a utility function u_i(C) that encodes the behavior of agents and their interactions. Apart from the inherent stochasticity of the dynamics, agents tend to choose new configurations in such a way as to maximize their individual utility. Following standard models like the one of Ref. [29], we assume an Ising-like form for the utility u_i of agent i,

u_i = (K/N) Σ_{j(≠i)} S_i S_j .   (4.68)

From this utility function, one can then build a transition rate for an agent to change configuration,

W(C′|C) = 1 / (1 + e^{−Δu_i/T})   (4.69)

with Δu_i = u_i(C′) − u_i(C). Note that the configurations of all other agents are kept fixed in the transition. This is an important point because the utility of an agent depends not only on its internal configuration, but also on the configurations of all other agents. The stochasticity of the dynamics is quantified by the parameter T, akin to temperature in physics.
It is useful to reexpress the utility variation Δu_i of a given agent i as the variation of a global quantity that we call E, and which plays a role similar to energy (up to a sign) in physics. In the present model, E takes the form

E = (K/2N) Σ_{i,j (i≠j)} S_i S_j .   (4.70)

Note that E differs from the total utility U = Σ_i u_i, because E = ½U. Although the form (4.70) resembles the energy of the Ising model, a significant difference is that the microscopic configurations of the model are not defined by the values of the spins as in the Ising model, but rather by the underlying configurations C; the spin of an agent is a function of C. This difference has an important impact on the counting of configurations, as discussed below. Thanks to the form (4.69) of the transition rate, and to the property Δu_i = ΔE (which, importantly, is valid for all i while E is independent of i), one finds that the equilibrium distribution is P(C_1, . . . , C_N) ∝ e^{E/T} ,
(4.71)
because the transition rate satisfies a detailed balance property with respect to this distribution.
We wish to study the possible emergence, at a statistical level, of the characteristic encoded by the spin value S_i = 1. With this aim in mind, we introduce an order parameter

q = (1/N) Σ_{i=1}^{N} S_i .   (4.72)

Interestingly, the pseudo-energy E can be expressed in terms of q for large N, as

E = N ε(q) = (1/2) N K q² .   (4.73)
The probability density P(q) is evaluated by summing the equilibrium distribution P(C_1, . . . , C_N) over all configurations (C_1, . . . , C_N) compatible with the selected value of q, which leads to

P(q) = Σ_{C_1,...,C_N} P(C_1, . . . , C_N) δ(q(C_1, . . . , C_N) − q) .   (4.74)
The notation δ indicates here a Kronecker delta, equal to 1 if its integer argument is equal to 0, and equal to 0 otherwise. The distribution P(q) can be rewritten as

P(q) ∝ e^{N[S(q) + βε(q)]} ,   (4.75)
where S(q) is the entropy counting the number of configurations associated with a given value of q. The entropy is evaluated from the number Ω(q) of such configurations, according to

S(q) = (1/N) ln Ω(q) .   (4.76)

The number Ω(q) is obtained from a simple counting argument as

Ω(q) = [N! / (N_0! N_1!)] H^{N_0} ,   (4.77)
having defined the numbers N0 and N1 of agents, respectively, having spin values Si = 0 and 1. It follows that the distribution P(q) can be written for large N in a so-called large deviation form (see Chap. 8), P(q) ∝ exp[N f (q)]
(4.78)
in which the function f (q) is called a rate function or large deviation function. An important point is that the function f (q) is independent of N . It is the analog of the Landau free energy defined in the physics of phase transitions. Given that q = N1 /N , we end up with the following expression of f (q):
f(q) = −q ln q − (1 − q) ln(1 − q) + (1/2) β K q² + (1 − q) ln H ,   (4.79)

Fig. 4.5 Plot of f(q) for various temperatures (from bottom to top, T > T_S, T = T_S, and T < T_S). A discontinuous phase transition is observed: for T > T_S the maximum of f(q) is found for q close to 0, while for T < T_S it is found at q close to 1 (parameter values: T = 0.6, 0.5429, and 0.5; H = 10^4 and K = 10)
which is illustrated in Fig. 4.5. We observe graphically that a discontinuous transition occurs at a finite temperature T_S: the most probable value q* is q* ≈ 0 at high temperature (T > T_S) and q* ≈ 1 at low temperature (T < T_S). These results can be obtained more formally as follows. Equating the derivative f′(q) to zero and looking for the solution q* that maximizes f(q), we identify the transition temperature as

T_S = K / (2 ln H) .   (4.80)
Including finite-H corrections, one finds for T > T_S that q* ≈ 1/H, while for T < T_S one has q* ≈ 1 − H e^{−K/T}. This transition can be interpreted by saying that at low temperature, or equivalently for strong coupling, agents may be considered as “standardized” [5]: with high probability they are all in the configuration C = 1, for which S_i = 1, and thus exhibit the corresponding characteristic. Hence, they have in a way lost their complexity due to the interactions. In contrast, at high temperature, they keep their complexity and visit rather evenly their many internal configurations. This emergence of a characteristic at the individual agent level is to be contrasted with the standard scenario of collective phenomena in statistical physics, where interactions result in emerging phenomena at a global scale, simultaneously involving many agents. Here, instead, interactions lead to changes at the level of individual agents, through a sort of feedback mechanism mediated by the interactions. Yet, we will see below how this standardization of agents may precisely allow for more standard collective phenomena to take place.
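These statements are easy to verify numerically from the explicit rate function (4.79). The short sketch below, using the parameter values quoted in the caption of Fig. 4.5, locates the maximum of f(q) on a grid and shows the jump of q* across T_S = K/(2 ln H).

```python
import numpy as np

K, H = 10.0, 1e4
T_S = K / (2 * np.log(H))                  # Eq. (4.80): T_S is about 0.5429 for these values

q = np.linspace(1e-6, 1 - 1e-6, 200_001)

def f_of_q(T):
    """Rate function of Eq. (4.79)."""
    return -q*np.log(q) - (1 - q)*np.log(1 - q) + K*q**2/(2*T) + (1 - q)*np.log(H)

for T in (0.6, T_S + 1e-3, T_S - 1e-3, 0.5):
    q_star = q[np.argmax(f_of_q(T))]
    print(f"T = {T:.4f}   q* = {q_star:.4f}")
```

Just above T_S the maximum sits at q* ≈ 0, and just below T_S it jumps to q* ≈ 1, as expected for a discontinuous transition.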
4.5.2 Collective Order for Interacting Standardized Agents

We have seen that interactions between agents can lead, in some parameter range, to an emerging characteristic at the individual agent level. In the above minimal model, standardized agents all have the same value of the characteristic. However, if the characteristic could take more than a single value, as is the case, for instance, for a spin variable s = ±1, adding some new interactions in the system may allow for some collective organization through a symmetry-breaking transition. This is precisely what we would like to illustrate here.
To illustrate this mechanism, we consider a generalization of the agent model, in which the variable S_i(C) is allowed to take three distinct values, −1, 0, and 1. We still wish that most configurations C are such that S_i(C) = 0. As a minimal generalization of the above model, we label configurations such that S_i(1) = 1 and S_i(2) = −1, while S_i(C) = 0 for C = 3, . . . , H + 2 (where we recall that H is a large number). In this way, the presence of a characteristic is encoded by |S_i|, whereas the possibility of a symmetry breaking would result from the interactions of the spins S_i. To include both interactions between the |S_i| and between the S_i, we consider the following utility:
u_i = (K/N) Σ_{j(≠i)} |S_i| |S_j| + (J/N) Σ_{j(≠i)} S_i S_j ,   (4.81)
and keep the transition rate in the same form as in Eq. (4.69). Following similar steps as in the initial version of the model, we find as previously that

P(C_1, . . . , C_N) ∝ e^{E/T}   (4.82)
with now a new expression for the pseudo-energy E,

E = (K/2N) Σ_{i,j (i≠j)} |S_i| |S_j| + (J/2N) Σ_{i,j (i≠j)} S_i S_j ,   (4.83)
a form which suggests connections with the spin-1 model studied by Blume, Emery, and Griffiths [6] in a condensed matter context. As mentioned above, we are now interested in two coupled phenomena: the standardization of agents, and the possible self-organization of collective order through a spontaneous symmetry-breaking mechanism. We thus need to introduce two distinct order parameters to quantify these phenomena: the order parameter q used above and the magnetization m as in the Ising model,

q = (1/N) Σ_{i=1}^{N} |S_i| ,   m = (1/N) Σ_{i=1}^{N} S_i .   (4.84)
It is straightforward to see that |m| ≤ q. In the large-N limit, E can be reexpressed in terms of these order parameters,

E = N ε(q, m) = (1/2) N K q² + (1/2) N J m² .   (4.85)
To study both the standardization of agents and the possible onset of a non-zero magnetization, it is convenient to introduce the joint distribution P(q, m), which takes a large deviation form

P(q, m) ∝ e^{N f(q,m)}   (4.86)

with f(q, m) = S(q, m) + βε(q, m). Here, as previously, S(q, m) is an entropy counting the number of configurations compatible with given values of the two order parameters. Following similar counting arguments as above, we end up with the following expression of the rate function f(q, m):

f(q, m) = −(1 − q) ln(1 − q) − (1/2)(q − m) ln(q − m) − (1/2)(q + m) ln(q + m) + (1/2) β K q² + (1/2) β J m² + (1 − q) ln H + q ln 2 .   (4.87)
Looking for the local maxima, which are solutions of ∂f/∂q = ∂f/∂m = 0 under the constraint |m| ≤ q, one finds the following results. Defining the temperatures

T_S = K / (2 ln(H/2)) ,   T_M = J ,   (4.88)
T_S corresponds to the temperature at which agents get “standardized”, similarly to the first version of the model; T_M is the temperature at which spins would order in the Ising model, that is, assuming that agents are already standardized. Hence, in the case T > T_S, the stable macroscopic state corresponds to m = 0, q ≈ 0. In the opposite case T < T_S, the state q ≈ 1 instead becomes stable, and a magnetization appears for T < min(T_S, T_M). If T_M < T_S, the onset of magnetization m from zero is continuous for T < T_M. In contrast, if T_S < T_M, the onset of magnetization is discontinuous for T < T_S. These two situations are illustrated in Fig. 4.6. For completeness, let us mention that the above presentation has been slightly simplified for the sake of clarity. The coupling between both order parameters actually leads, in the case T_S < T_M, to a slight shift of the temperature T_S with respect to the expression (4.88), but this simplification does not modify the phase diagram. As a final remark, we note that this model can be generalized to include idiosyncratic preferences in the form of a (quenched) disordered utility [4]. The resulting model shares some similarities with the random energy model for disordered systems [15]. Its behavior exhibits a crossover between a phase dominated by interactions, as in the above (non-disordered) model, and a phase dominated by disorder where interactions are subdominant.
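The phase diagram can be reproduced numerically by maximizing the rate function (4.87) on a grid. The sketch below uses illustrative couplings with T_M < T_S (the situation of the left panel of Fig. 4.6) and is only meant to show how q* and m* behave across the two temperatures.

```python
import numpy as np

H = 1e4
K, J = 10.0, 0.3                     # illustrative couplings; here T_M = J < T_S
T_S, T_M = K / (2 * np.log(H / 2)), J

q = np.linspace(1e-4, 1 - 1e-4, 400)
m = np.linspace(0.0, 1 - 1e-4, 400)
Q, M = np.meshgrid(q, m, indexing="ij")
mask = M <= Q - 1e-9                 # constraint |m| <= q (m >= 0 by symmetry)

def f_qm(T):
    """Rate function of Eq. (4.87), with -inf outside the allowed region."""
    with np.errstate(invalid="ignore", divide="ignore"):
        val = (-(1 - Q)*np.log(1 - Q) - 0.5*(Q - M)*np.log(Q - M)
               - 0.5*(Q + M)*np.log(Q + M)
               + (K*Q**2 + J*M**2)/(2*T) + (1 - Q)*np.log(H) + Q*np.log(2))
    return np.where(mask, val, -np.inf)

for T in (1.2*T_S, 0.9*T_S, 0.9*T_M, 0.5*T_M):
    i, j = np.unravel_index(np.argmax(f_qm(T)), Q.shape)
    print(f"T = {T:.3f}   q* = {q[i]:.3f}   m* = {m[j]:.3f}")
```

As T is lowered, q* first jumps from ≈ 0 to ≈ 1 at T_S with m* = 0, and only below T_M does m* grow continuously from zero, in agreement with the scenario described above.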
Fig. 4.6 Phase diagram of the model, showing the order parameters q and m versus temperature T. Left: if T_M < T_S, the magnetization continuously increases from zero below T_M, thanks to the standardization transition occurring at the higher temperature T_S. Right: opposite case T_S < T_M, where the temperature has to be lowered below T_S to observe the discontinuous onset of a non-zero magnetization. In both cases, the transition on the parameter q is strongly discontinuous, with a jump from q ≈ 0 to q ≈ 1
4.6 Exercises

4.1 Schelling model
(a) Find examples of utility functions that do not lead to segregation in the Schelling model defined in Sect. 4.1, in the limit of zero temperature T and for α = 0. (b) Still considering the limit T → 0, find an example of a utility function that leads to segregation if the cooperativity parameter α < α_0 and no segregation if α > α_0, for a given α_0 to be specified (with the condition 0 < α_0 < 1).

4.2 Model of traffic flow
Considering the model of traffic flow introduced in Sect. 4.2, study the role of the density-dependent diffusion coefficient D(ρ) regarding the phase diagram (ρu, ρ), called the fundamental diagram of traffic flow in this context. What happens if D(ρ) = 0 for all ρ? Study the phase diagram in qualitative terms for a generic D(ρ).

4.3 Pricing of perishable goods
In Sect. 4.3, a model of perishable goods sold in two competing shops has been introduced. In this model, the goods are characterized only by their freshness. The model can be generalized in a simple way by introducing a price p = f(h) that depends on the freshness h. Customers take both the price and the freshness into account in their satisfaction, which is now updated as

S_ij = αS_ij + (1 − α)[(1 − g)h − gp] ,
(4.89)
where g is a parameter satisfying 0 < g < 1, sometimes called the greed parameter, and S_ij is the satisfaction of customer i regarding shop j. Find a simple form of the function f(h) defining the price that avoids the symmetry-breaking phase transition.

4.4 Wealth distribution with capital tax
The model of wealth distribution defined in Sect. 4.4 has been studied in the case of an income tax, which is partly redistributed. Consider a variant of the model by replacing the income tax by a capital tax, meaning that a fraction φ of the current wealth is continuously taken away as a tax, and that a fraction f of this collected tax is redistributed. Evaluate the exponent μ characterizing the wealth distribution p(w) ∼ 1/w^{1+μ} (w → ∞) in this model.
References

1. Anderson, S.P., De Palma, A., Thisse, J.F.: Discrete Choice Theory of Product Differentiation. MIT Press (1992)
2. Aw, A., Rascle, M.: Resurrection of “second order” models of traffic flow. SIAM J. Appl. Math. 60, 916 (2000)
3. Bain, N., Bartolo, D.: Dynamic response and hydrodynamics of polarized crowds. Science 363, 46 (2019)
4. Bertin, E.: Emergence of simple characteristics for heterogeneous complex social agents. Symmetry 12, 1281 (2020)
5. Bertin, E., Jensen, P.: In social complex systems, the whole can be more or less than (the sum of) the parts. C. R. Phys. 20, 329 (2019)
6. Blume, M., Emery, V.J., Griffiths, R.B.: Ising model for the λ transition and phase separation in He3-He4. Phys. Rev. A 4, 1071 (1971)
7. Bouchaud, J.P.: Crises and collective socio-economic phenomena: simple models and challenges. J. Stat. Phys. 151, 567 (2013)
8. Bouchaud, J.P., Mézard, M.: Wealth condensation in a simple model of economy. Physica A 282, 536 (2000)
9. Bouchaud, J.P., Mézard, M., Dalibard, J. (eds.): Complex Systems. Elsevier (2007)
10. Callen, H.: Thermodynamics and an Introduction to Thermostatistics. J. Wiley, New York (1985)
11. Cardoso, B.H.F., Gonçalves, S., Iglesias, J.R.: Wealth distribution models with regulations: dynamics and equilibria. Physica A 551, 24201 (2020)
12. Castellano, C., Fortunato, S., Loreto, V.: Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591 (2009)
13. Cates, M.E., Tailleur, J.: Motility-induced phase separation. Annu. Rev. Condens. Matter Phys. 6, 219 (2015)
14. Chaikin, P.M., Lubensky, T.C.: Principles of Condensed Matter Physics. Cambridge University Press, Cambridge (2000)
15. Derrida, B.: Random-energy model: an exactly solvable model of disordered systems. Phys. Rev. B 24, 2613 (1981)
16. Dimarco, G., Tosin, A.: The Aw–Rascle traffic model: Enskog-type kinetic derivation and generalisations. J. Stat. Phys. 178, 178 (2020)
17. Evans, M.R.: Bose–Einstein condensation in disordered exclusion models and relation to traffic flow. EPL 36, 13 (1996)
18. Evans, M.R., Rajewsky, N., Speer, E.R.: Exact solution of a cellular automaton for traffic. J. Stat. Phys. 95, 45 (1999)
19. Gauvin, L., Nadal, J.P., Vannimenus, J.: Schelling segregation in an open city: a kinetically constrained Blume–Emery–Griffiths spin-1 system. Phys. Rev. E 81, 066120 (2010)
20. Grauwin, S., Bertin, E., Lemoy, R., Jensen, P.: Competition between collective and individual dynamics. Proc. Natl. Acad. Sci. USA 106, 20622 (2009)
21. Greenberg, J.M.: Extensions and amplifications of a traffic model of Aw and Rascle. SIAM J. Appl. Math. 62, 729 (2002)
22. Hegselmann, R.: Thomas C. Schelling and James M. Sakoda: the intellectual, technical and social history of a model. JASSS 20, 15 (2017)
23. Helbing, D.: Gas-kinetic derivation of Navier–Stokes-like traffic equations. Phys. Rev. E 53, 2366 (1996)
24. Helbing, D., Hennecke, A., Shvetsov, V., Treiber, M.: Micro- and macro-simulation of freeway traffic. Math. Comput. Model. 35, 517 (2002)
25. Jensen, P.: The politics of physicists’ social models. C. R. Phys. 20, 380 (2019)
26. Jensen, P., Matreux, T., Cambe, J., Larralde, H., Bertin, E.: Giant catalytic effect of altruists in Schelling’s segregation model. Phys. Rev. Lett. 120, 208301 (2018)
27. Lambert, G., Chevereau, G., Bertin, E.: Symmetry-breaking phase transition in a dynamical decision model. J. Stat. Mech. P06005 (2011)
28. Moussaïd, M., Helbing, D., Theraulaz, G.: How simple rules determine pedestrian behavior and crowd disasters. Proc. Natl. Acad. Sci. USA 108, 6884 (2011)
29. Phan, D., Gordon, M.B., Nadal, J.P.: Social interactions in economic theory: an insight from statistical mechanics. In: Nadal, J.P., Bourgine, P. (eds.) Cognitive Economics. Springer (2004)
30. Rogers, T., McKane, A.J.: A unified framework for Schelling’s model of segregation. J. Stat. Mech. P07006 (2011)
31. Sakoda, J.M.: The checkerboard model of social interaction. J. Math. Sociol. 1, 119 (1971)
32. Schelling, T.C.: Dynamic models of segregation. J. Math. Sociol. 1, 143 (1971)
33. Schreckenberg, M., Schadschneider, A., Nagel, K., Ito, N.: Discrete stochastic models for traffic flow. Phys. Rev. E 51, 2939 (1995)
34. Siebel, F., Mauser, W.: On the fundamental diagram of traffic flow. SIAM J. Appl. Math. 66, 1150 (2006)
35. Yi, S.D., Baek, S.K., Chevereau, G., Bertin, E.: Symmetry restoration by pricing in a duopoly of perishable goods. J. Stat. Mech. P11001 (2015)
Chapter 5
Stochastic Population Dynamics and Biological Evolution
In this chapter, we turn to the case of biological populations (of bacteria, for instance) that evolve generation after generation under the combined effect of selection and random mutations. This is of course a topic of very general interest, with a broad literature, most of it outside physics journals. Giving even a brief summary of what has been done in this field is clearly not possible in a few pages. The reader interested in this field is referred, for instance, to the introductory review [3] or to the more advanced books [2, 5]. Our more modest goal here is to give a flavor of the important similarities and differences between the dynamics of an assembly of physical units and the evolution dynamics of a biological population.
The chapter is organized as follows. After briefly motivating the goal of a stochastic description of population dynamics and evolution in Sect. 5.1, we describe in Sect. 5.2 the selection dynamics in the absence of mutations, using the Moran model as one of the simplest formal illustrations of this phenomenon. We discuss in particular Fisher’s theorem for fitness evolution as well as the notion of fixation probability. We then consider in Sect. 5.3 what happens when mutations come into play, leading either to a notion of fitness landscape at low mutation rate, or to mixed populations at high mutation rate. Finally, real-space clustering properties arising in population dynamics are discussed in Sect. 5.4, in the simple framework of neutral evolution, when all individuals have the same fitness.
5.1 Motivation and Goal of a Statistical Description of Evolution

On the shortest time scales we are considering here, a biological population evolves through the birth and death of individuals. The number of individuals is thus not conserved, similarly to the reaction–diffusion processes we have described in Sect. 3.2.
However, the focus of studies of biological evolution is not on the statistics of the number of individuals in the population but rather on the statistics of the characteristics of these individuals. Contrary to the simple particles considered in reaction–diffusion processes, which carry no information, individuals in a biological population are characterized, even in the simplest models, by a “genome” that encodes their genetic information. This genetic information has two main roles in the models. First, it determines the fitness of an individual, that is, its relative capacity to have offspring (“children”) as compared to other individuals in the population. The higher its fitness, the more offspring an individual is statistically expected to have: this is the selection process. The fitness characterizes the degree of adaptation of an individual to its environment. Second, the genetic information is transmitted to the offspring, which thus carry the same genetic information as their ancestors, up to rare mutations (“errors” in copying the genetic code). Such mutations are also essential for the long-term evolution of the population and its adaptation to changes in the environment [14, 22].
Since the primary interest is not in fluctuations of population size but in statistics of genetic information, a usual trick is to use the following type of dynamics. Considering a population of N individuals, an individual is randomly chosen with a probability per unit time proportional to its fitness. Once chosen, it gives birth to an offspring which replaces another individual, randomly chosen with uniform probability. With some small probability, the offspring may also be affected by a mutation. Such rules hence (somewhat artificially) ensure a constant population size, making analytical treatments easier [16].
Forgetting about the underlying individuals in the model, one can map the above model of birth and death onto a model of N genomes subjected to a jump dynamics: the genome of the newborn individual simply replaces that of the dead individual. Note that this change of genome should not be confused with mutations. Mutations correspond to (often small) changes in the genome of an offspring with respect to its direct ancestor. In contrast, we are here formally replacing a genome, in the list of genomes of the population, by the genome of another individual, without any filiation between the two underlying individuals.
For concreteness, let us denote as σ_i the genome of individual i. In practice, σ_i is generically an ordered list σ_i = (s_{i,1}, . . . , s_{i,L}) of L symbols belonging to a finite alphabet, for instance, s_{i,j} ∈ {A, T, G, C} as in DNA, or s_{i,j} ∈ {0, 1} in simplified models. We will, however, not refer to the detailed structure of the variables σ_i in the following, but simply use the assumption that σ_i takes a finite number of discrete values.
Given the stochastic nature of the above population dynamics, standard methods of nonequilibrium statistical physics suggest describing the dynamics of the population with a master equation for the probability P(C, t) that the population has a configuration C = (σ_1, . . . , σ_N), characterizing the list of genomes of all the individuals in the population. Such an approach however leads to a complicated master equation that cannot be solved easily, and is thus not very helpful in practice.
An alternative solution is to describe the stochastic dynamics at the level of an individual genome σ , instead of considering explicitly the full population. Yet, due
to the presence of effective interactions between genomes generated by the fitness, transition rates between different values of σ are not predefined, time-independent functions. This results in a non-linear master equation instead of the standard linear one, as explained below. A further difficulty is to be able to determine the evolution of the probability distribution P(σ, t) under the combined effect of birth–death processes and mutations. To simplify the problem, one may assume that the mutation rate is very low, leading to a separation of time scales between the relatively fast renewal of genomes through births and deaths of individuals, and the much slower mutation dynamics. We thus first consider the convergence to a steady state for the dynamics in the absence of mutations, and in a second step describe the quasi-static evolution of this steady state under a very low mutation rate, before eventually dealing with a larger mutation rate.
5.2 Selection Dynamics Without Mutations

5.2.1 Moran Model and Fisher’s Theorem

We start by considering the dynamics of a population of fixed size N under the selection birth–death process without mutations, a model called the Moran model [16]. The fitness is assumed to depend only on the genome σ, and is denoted as f(σ). The continuous-time dynamics is defined as follows. An individual with genome σ is randomly chosen with a probability per unit time proportional to its fitness f(σ). The chosen individual then gives birth to an offspring having exactly the same genome σ, due to the absence of mutations. This offspring replaces another individual with genome σ′, randomly chosen among the population with uniform probability.
Our goal is to write down an effective master equation for the probability distribution P(σ, t) of genomes at time t. We first need to determine the transition rate W(σ|σ′) from a genome σ′ to a genome σ. The process is driven by the choice of the individual with genome σ which gives birth to an offspring, so that the transition rate does not depend on the genome σ′ of the replaced individual. The transition rate W(σ|σ′) is simply proportional to the fitness f(σ) and to the number n(σ, t) of individuals with genome σ at time t in the population. We thus end up with

W_t(σ|σ′) = f(σ) n(σ, t)/N ,   (5.1)
where the factor 1/N has been included to normalize the time scale (a frequency unit could be included here, but we have assumed the transition rates to be dimensionless). Note that we have made explicit in the notation the time dependence of the transition rate. In this form, the transition rate W_t(σ|σ′) is however not well defined, due to the fact that n(σ, t) is actually a random variable, depending on the global configuration of the population. This problem is nevertheless solved in the infinite population size
limit N → ∞, for which Eq. (5.1) reduces to

W_t(σ|σ′) = f(σ) P(σ, t) .   (5.2)
Now W_t(σ|σ′) is no longer a random variable, but is determined in a self-consistent way by the solution of the master equation

∂P(σ, t)/∂t = Σ_{σ′(≠σ)} [W_t(σ|σ′) P(σ′, t) − W_t(σ′|σ) P(σ, t)] ,   (5.3)
the price to pay being that this master equation is now non-linear. In order to find the steady-state solution of Eq. (5.3), it is natural to first look at the simplest type of solutions, namely, solutions satisfying detailed balance (see Sect. 2.1.2). In the present case, detailed balance corresponds to the following equality (we drop the explicit time dependence of the transition rates since we are considering a steady-state solution)

W(σ|σ′) P(σ′) = W(σ′|σ) P(σ) .   (5.4)

Taking into account the expression (5.2) of the transition rates, Eq. (5.4) leads to the self-consistent solution

P(σ) = (1/Z) f(σ) P(σ) ,   (5.5)

where Z is a normalization factor. This equation implies that P(σ) is non-zero only over the configurations σ having a common value f_0 of the fitness f(σ). In the following, we assume for simplicity that all genomes σ have distinct fitnesses f(σ). Under this assumption, the stationary probability distribution P(σ) concentrates on a single genome σ_0,

P(σ) = δ_{σ,σ_0} .   (5.6)

This phenomenon, by which the whole population acquires the same genome, is called fixation of the genome σ_0. A dynamical view on the fixation phenomenon can be obtained by computing the rate of change of the mean fitness

⟨f⟩ = Σ_σ f(σ) P(σ, t) .   (5.7)
Starting from Eqs. (5.2) and (5.3), one has

d⟨f⟩/dt = Σ_{σ,σ′} f(σ)[f(σ) − f(σ′)] P(σ, t) P(σ′, t) .   (5.8)

Note that we have dropped the constraint σ′ ≠ σ in the sum, since the corresponding term is equal to zero. Expanding the term between brackets in Eq. (5.8) directly
yields

d⟨f⟩/dt = ⟨f²⟩ − ⟨f⟩² .   (5.9)
Hence, in the absence of mutation, the rate of change of the mean fitness in an infinite population is equal to the variance of the fitness across the population. This important result is known as Fisher’s theorem of natural selection. It shows, in particular, that the mean fitness of the population cannot decrease, and remains constant only when the variance of the fitness vanishes, corresponding to a population in which all individuals have the same fitness. If all genomes have different fitnesses, a single genome is thus selected in the long-time limit, and one recovers the phenomenon of fixation described above. A natural question is then to know the probability of fixation of a given genome σ_0 starting from a heterogeneous population in which many different genomes are present. Obviously, a genome with a high fitness f(σ_0) should have a higher probability of fixation than a genome with a low fitness. Giving a quantitative answer for an arbitrary initial condition is however a complicated problem. In the following subsection, we will discuss a specific case in which a relatively simple answer can be given.
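In the infinite-population limit, the master equation (5.3) with the rates (5.2) reduces, by a one-line computation, to the replicator-like form dP(σ)/dt = P(σ)[f(σ) − ⟨f⟩], and Fisher's theorem can be checked directly by integrating this equation. The sketch below uses arbitrary illustrative fitness values.

```python
import numpy as np

rng = np.random.default_rng(2)

n_genomes = 50
f = rng.uniform(0.5, 1.5, n_genomes)       # arbitrary distinct fitnesses
P = np.full(n_genomes, 1.0 / n_genomes)    # initially uniform population

dt, steps = 1e-3, 200_000
for step in range(steps):
    mean_f = P @ f
    dP = dt * P * (f - mean_f)             # dP(sigma)/dt = P(sigma) [f(sigma) - <f>]
    if step % 50_000 == 0:
        var_f = P @ f**2 - mean_f**2
        d_meanf_dt = dP @ f / dt           # numerical rate of change of <f>
        print(f"t = {step*dt:6.1f}   d<f>/dt = {d_meanf_dt:.6f}   Var(f) = {var_f:.6f}")
    P += dP

best = np.argmax(P)
print("dominant genome has the maximal fitness:", np.isclose(f[best], f.max()),
      "   P_max =", round(float(P.max()), 3))
```

At every time, the rate of change of ⟨f⟩ equals the fitness variance, and at long times the distribution concentrates on the genome of highest fitness, as stated by Fisher's theorem and the fixation scenario above.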
5.2.2 Fixation Probability

The fixation probability can be evaluated in a relatively simple way if one considers an initially homogeneous population of individuals having a genome σ_1 in which a single individual with a genome σ_2 (different from σ_1) is introduced [11]. The new genome σ_2 may, for instance, result from a random mutation of σ_1, or be the genome of an external individual joining the initially homogeneous group (as when a plant is brought from another continent). We call N the total number of individuals in the population. For brevity, the fitnesses f(σ_1) and f(σ_2) are denoted as f_1 and f_2, respectively.
Under the population dynamics, the number n of genomes σ_2 in the population evolves in a stochastic manner. When an offspring replaces another individual, several situations may occur: (i) the offspring has genome σ_2 and replaces a genome σ_1, in which case n increases by one; (ii) the offspring has genome σ_1 and replaces a genome σ_2, in which case n decreases by one; (iii) the genome of the offspring is the same as the replaced one, so that n does not change. In case (i), the probability that the offspring has genome σ_2 is proportional to f_2 n (each of the n genomes σ_2 has a probability proportional to f_2 to be chosen in order to give an offspring), while the probability that the randomly replaced individual has genome σ_1 is proportional to N − n (it is picked in a uniform way among the N − n genomes σ_1). We define the probabilities T(n + 1|n) and T(n − 1|n) as the conditional probabilities that n increases or decreases by 1, given that n varies. This means that we do not need to take into account the case (iii) above when n does not change. In this way, we look at the effective dynamics of n. It follows that
T (n + 1|n) + T (n − 1|n) = 1, while this would not be true if one explicitly takes into account the probability that the value of n does not change in a single step. However, note that this effective evolution of n no longer occurs in real time, but according to an effective time that may be related in a non-linear way to the real time. This has no consequences here since we only look at the final state reached, that is, the fixation probability. The probability T (n + 1|n) to increase n by one (case (i) above) is then given by T (n + 1|n) = Cn f 2 n(N − n),
(5.10)
where Cn is a normalization constant. Similarly, in case (ii), the probability that the offspring has genome σ1 is proportional to f 1 (N − n), while the probability that the randomly replaced individual has genome σ2 is proportional to n. The probability T (n − 1|n) that n decreases by one then reads T (n − 1|n) = Cn f 1 n(N − n) .
(5.11)
The normalization constant C_n is determined by the condition that T(n + 1|n) + T(n − 1|n) = 1, yielding

T(n + 1|n) = f_2/(f_1 + f_2) ,   T(n − 1|n) = f_1/(f_1 + f_2) .   (5.12)
The problem thus boils down to a random walk for the integer n in the interval 0 ≤ n ≤ N , with transition rates given by Eq. (5.12). The walk stops when it reaches either n = 0 or n = N . For brevity, we introduce the notations q ≡ T (n + 1|n), and hence 1 − q = T (n − 1|n). With probability one, the walk eventually reaches either n = 0 (in which case the mutated genome σ2 disappears from the population) or n = N (in which case the genome σ2 is fixed). The probability that the walk does not reach any boundary after an infinite number of steps is zero. The fixation probability is the probability for the walk to reach n = N starting from n = 1. To compute the fixation probability, let us introduce more generally the probability Pf (m) that the walk first reaches n = N starting from a position n = m. The fixation probability is then simply Pf (1). The interest of introducing Pf (m) lies in the fact that this quantity obeys a recursion relation, namely, Pf (m) = q Pf (m + 1) + (1 − q) Pf (m − 1)
(5.13)
with 1 ≤ m ≤ N − 1. The interpretation of this relation is very simple. Starting from position n = m, the walk can either jump to m + 1 with probability q and then have a probability P_f(m + 1) to eventually reach n = N, or jump to m − 1 with probability 1 − q and have a probability P_f(m − 1) to reach N. Equation (5.13) can be rewritten as

P_f(m + 1) − P_f(m) = r [P_f(m) − P_f(m − 1)]   (5.14)
with

r = (1 − q)/q = f_1/f_2 ,   (5.15)
where we have taken into account the definition Eq. (5.12) of the transition rates. By summation, P_f(m) can be obtained from Eq. (5.14) as

P_f(m) = P_f(0) + [P_f(1) − P_f(0)] Σ_{k=0}^{m−1} r^k   (5.16)
for m = 1, . . . , N. Computing explicitly the geometric sum, and taking into account the boundary conditions P_f(0) = 0 and P_f(N) = 1, one obtains

P_f(m) = (1 − r^m)/(1 − r^N) ,   (5.17)

from which the fixation probability follows,

P_f(1) = (1 − r)/(1 − r^N) .   (5.18)
Note that although we may have in mind a beneficial mutation, that is, f_2 > f_1, the fixation probability (5.18) is valid whatever the values of f_1 and f_2. With the above notation r = f_1/f_2, one has for large N that P_f(1) ≈ 1 − r if f_2 > f_1, and P_f(1) ≈ (r − 1)/r^N ≪ 1 if f_2 < f_1. Hence, a beneficial mutation (f_2 > f_1) has a finite probability of fixation, while a deleterious mutation (f_2 < f_1) has a very small fixation probability that goes to zero when the population size N goes to infinity.
Finally, let us mention that another extension of the Moran model takes into account the geographical structuring of populations, for instance, over close-by islands. In this generalization, an individual which dies may be replaced by another individual coming from another island. The model may be defined with an arbitrary connectivity over a set of islands. The fixation probability can be evaluated exactly for specific types of graphs [12].
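The result (5.18) is easily cross-checked against a direct Monte Carlo simulation of the biased random walk with the jump probabilities (5.12); the code below is a sketch with illustrative values of N and of the fitnesses.

```python
import numpy as np

rng = np.random.default_rng(3)

def p_fix_exact(N, f1, f2):
    r = f1 / f2
    return (1 - r) / (1 - r**N)                 # Eq. (5.18)

def p_fix_montecarlo(N, f1, f2, trials=10_000):
    q = f2 / (f1 + f2)                          # Eq. (5.12): probability that n -> n + 1
    fixed = 0
    for _ in range(trials):
        n = 1
        while 0 < n < N:
            n += 1 if rng.random() < q else -1
        fixed += (n == N)
    return fixed / trials

N, f1, f2 = 50, 1.0, 1.1                        # a beneficial mutation
print("exact       :", round(p_fix_exact(N, f1, f2), 4))
print("Monte Carlo :", round(p_fix_montecarlo(N, f1, f2), 4))
```

The empirical fixation frequency agrees with the exact formula within the statistical error of the finite number of trials.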
5.2.3 Fitness Versus Population Size: How Do Cooperators Survive?

Before discussing in more detail the effect of mutations on the selection dynamics, we briefly discuss an interesting application of the expression (5.18) of the fixation probability to the issue of cooperativity, which is relevant, for instance, in the bacterial world. Cooperativity means here that a group of individuals (cooperators) spend part of their resources to produce a common good that is useful to the whole population. In contrast, non-cooperators only spend their resources for their own
needs, and thus have a higher fitness than cooperators, while they can benefit from the common good produced by cooperators. Since genomes with higher fitness are selected by evolution, it would be expected that cooperators, having a lower fitness, would eventually disappear [4]. It is thus difficult to understand why cooperative behavior can still be observed in bacteria, for instance. Different types of explanations have been proposed, relying notably on more complex (e.g., multilevel [19, 21]) selection processes, or on an enhanced fitness for cooperators, which would come from indirect effects (i.e., different from the mere availability of resources). Although such scenarios may well be valid, a simpler argument can also be given considering only single-level selection processes and no enhanced fitness for cooperators. The argument takes into account stochastic effects and the role of population size. For f_1 = 1 and f_2 = 1 + s, the fixation probability (5.18) reads, assuming a small selection pressure Ns ≪ 1,

P_f = 1/N + s/2 .   (5.19)

Hence, on top of the direct contribution of the relative fitness to the fixation probability, a contribution 1/N of purely stochastic origin (i.e., independent of the fitness) also appears, and cooperators may take advantage of this stochastic contribution to increase their fixation probability relative to non-cooperators. In short, the reason is that by producing a common good useful to the whole population, cooperators may reach a population size N_c significantly larger than the population size N_nc of non-cooperators. If the increase of population size is large enough, it may compensate for the smaller fitness and eventually result in a larger fixation probability, that is, a larger invasion capacity [10]:

1/N_c + s_c/2 > 1/N_nc + s_nc/2 ,   (5.20)

hence P_f^c > P_f^nc. This effect is one of the simplest possible explanations for the natural observation of cooperators. It is likely to be combined with other, more complicated effects in real situations.
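A quick numerical check of the weak-selection approximation (5.19) against the exact expression (5.18), for a few illustrative values of N and s, also shows how the approximation degrades once Ns is no longer small.

```python
def p_fix_exact(N, s):
    r = 1.0 / (1.0 + s)                    # r = f1/f2 with f1 = 1 and f2 = 1 + s
    return (1.0 - r) / (1.0 - r**N)        # Eq. (5.18)

def p_fix_weak_selection(N, s):
    return 1.0 / N + s / 2.0               # Eq. (5.19), valid for N*s << 1

for N, s in [(20, 0.004), (50, 0.001), (100, 0.02)]:   # in the last case N*s = 2
    print(f"N = {N:4d}  s = {s:6.3f}   exact = {p_fix_exact(N, s):.5f}   1/N + s/2 = {p_fix_weak_selection(N, s):.5f}")
```

For Ns well below one the two expressions agree to within a fraction of a percent, while for Ns of order one the 1/N + s/2 estimate visibly underestimates the exact fixation probability.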
5.3 Effect of Mutations on Population Dynamics

5.3.1 Quasi-static Evolution Under Mutations

We have seen above that an initial population with heterogeneous genomes and fitnesses across individuals evolves under a simple population dynamics with selection (but no mutation) to a homogeneous population where all individuals have the same genome. Under the time scale separation hypothesis mentioned above, one can assume that the typical time between mutations is much larger than the time
needed for the population to relax to a single genome. In this framework, the effect of successful mutations (those that reach fixation) can thus be conceived simply as jumps between different values of the genome —while mutations that do not reach fixation can simply be neglected. Since mutations are random, one needs to resort to a stochastic description, involving transition rates from one value of the genome to another. We denote as Q(σ ) the distribution of the genome σ . Let us emphasize the different interpretation of the distribution Q(σ ) with respect to the distribution P(σ ) introduced in Sect. 5.2.1. The function P(σ ) describes the distribution of genomes across the (a priori heterogeneous) population, in the limit of a large population size where the fluctuations of this distribution can be neglected. By contrast, the distribution Q(σ ) describes a large ensemble of populations, each population being homogeneous with a single genome σ . In order to describe the dynamics of the distribution Q(σ ), we need to determine the probability per unit time of the fixation of a mutated genome. This probability per unit time is the product of the mutation rate ν (the probability per unit time that a random mutation occurs) and of the probability that the new genome eventually reaches fixation. The fixation probability has been evaluated in Sect. 5.2.2, see Eq. (5.18). Considering the long time scale dynamics through which the genome of the homogeneous population changes under mutation and fixation of the mutated genome, the transition rate from σ1 to σ2 reads Wm (σ2 |σ1 ) = ν
(1 − f_1/f_2) / (1 − (f_1/f_2)^N) ,   (5.21)
with ν the mutation rate. These transition rates govern the evolution of the distribution Q(σ, t) according to

∂Q(σ, t)/∂t = Σ_{σ′(≠σ)} [W_m(σ|σ′) Q(σ′, t) − W_m(σ′|σ) Q(σ, t)] .   (5.22)
One can easily check that the transition rate (5.21) satisfies a detailed balance relation of the form [18]

W_m(σ_2|σ_1) f_1^{N−1} = W_m(σ_1|σ_2) f_2^{N−1} .   (5.23)

As a result, the stationary probability distribution Q(σ), reached in the infinite time limit, is proportional to f(σ)^{N−1},

Q(σ) = (1/Z) f(σ)^{N−1} ,   (5.24)
where Z is a normalization factor. Let us emphasize that N is simply an external parameter in this long time scale dynamics of the genome σ , since the population is homogeneous and the definition of σ does not involve N . This is to be contrasted, for instance, with the full configuration (σ1 , . . . , σ N ) of a heterogeneous population, which depends on N .
5.3.2 Notion of Fitness Landscape

Equation (5.24) suggests an interesting analogy with equilibrium statistical physics [18]. Defining ε(σ) = −ln f(σ), one can rewrite Eq. (5.24) as

Q(σ) = (1/Z) e^{−β_eff ε(σ)} .   (5.25)
This form of the distribution Q(σ) shows a clear analogy to the equilibrium distribution in statistical physics, provided one interprets ε(σ) as an effective energy, and β_eff = N − 1 as an effective inverse temperature. Having an effective temperature which depends on N may be surprising at first sight, but let us emphasize again that N can here be considered as an external parameter, as explained above. This property has important consequences. In the infinite size limit, the effective temperature T_eff = β_eff^{−1} is equal to zero, and the distribution concentrates on the lowest energy states, that is, on the genomes with the highest fitness. For a finite population size, fluctuations around these states of highest fitness are allowed, and these fluctuations become larger when the population size is decreased.
By analogy with the potential energy landscape of physical systems, it is customary to speak about the “fitness landscape” in the context of biological evolution modeling. Note that, strictly speaking, the fitness landscape characterizes a single genome; it is simply a representation of the value of the fitness as a function of the genome value. However, the notion of fitness landscape is more useful to describe a population, especially under the simplifying assumption that the population is homogeneous and can be described by a single genome. It is only by considering a population that a dynamics in the fitness landscape can be defined, and we have seen that the population size N plays a key role, as being essentially the inverse effective temperature. In a complex fitness landscape, a small population size (relatively high effective temperature) may help to reach higher values of the fitness by escaping local maxima in the fitness landscape, thanks to fluctuations that may temporarily decrease the fitness.
As mentioned earlier, the genome σ is often described in evolution models as a sequence of symbols, σ = (s_1, . . . , s_L), with in the simplest models s_j ∈ {0, 1} or s_j ∈ {−1, 1}. The integer L is, in general, assumed to be large. In the context of these models, there is thus a natural mapping between the fitness landscape and the energy landscape of spin models in physics. In particular, mappings to disordered spin models like spin-glass models have been proposed [3]. Typical examples of fitness functions directly inspired by disordered spin models include the analog of the random-field paramagnetic model
f(σ) = Σ_{i=1}^{L} h_i s_i + F_0 ,   σ = (s_1, . . . , s_L),   (5.26)
(5.26)
f(σ) = Σ_{i_1,i_2,...,i_p} J_{i_1,i_2,...,i_p} s_{i_1} s_{i_2} · · · s_{i_p} + F_0 ,   (5.27)
where p ≥ 3 is an integer parameter of the model. The constant F_0 is included to ensure that the fitness remains positive. The parameters J_{i_1,i_2,...,i_p} are (time-independent) random coupling constants that couple the p spins (s_{i_1}, . . . , s_{i_p}). The distribution of these coupling constants, in general, depends on the total number L of spins. The sum is performed over all sets of p spins among the L spins.
The random-field model (5.26) is a simple realization of a so-called “Fujiyama landscape” [3], in which there is a single maximum in the landscape, reached under evolution from any initial genome. By contrast, the p-spin model yields a complicated fitness landscape that includes many local maxima in which the population may get trapped during evolution. A popular alternative to the p-spin model is the so-called NK-landscape, which associates with each of the L spins s_i a set of K (typically randomly chosen) “neighbors”, with K < L. The values of these neighbor spins determine the contribution of spin s_i to the total fitness, according to
f(σ) = (1/L) Σ_{i=1}^{L} J̃_i(s_i, s_{i_1}, s_{i_2}, . . . , s_{i_K}) .   (5.28)
The parameters J̃_i(s_i, s_{i_1}, s_{i_2}, . . . , s_{i_K}) are quenched random variables that take statistically independent values for each configuration (s_i, s_{i_1}, . . . , s_{i_K}). One of the main differences with the p-spin model is that in the NK-model, each spin s_i interacts with a single set of K spins, considered as its neighbors. In contrast, in the p-spin model a given spin s_i interacts with all possible sets of p − 1 other spins.
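As an illustration of these ideas (a sketch, not taken from the book), one can sample the stationary distribution (5.24) on the random-field landscape (5.26) with a Metropolis rule whose acceptance ratio is (f_new/f_old)^{N−1}. This is not the fixation dynamics (5.21)–(5.22) itself, but it shares the same stationary distribution, and it shows how a larger population size (lower effective temperature) pushes the population closer to the fitness maximum. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

L_genome, F0 = 30, 50.0
h = rng.uniform(-1.0, 1.0, L_genome)        # quenched random fields of Eq. (5.26)
f_max = np.abs(h).sum() + F0                # fitness of the optimal genome, s_i = sign(h_i)

def fitness(s):
    return h @ s + F0

def mean_fitness(N, sweeps=2000):
    """Metropolis sampling of Q(sigma) proportional to f(sigma)**(N-1), Eq. (5.24)."""
    s = rng.choice([-1, 1], L_genome)
    samples = []
    for sweep in range(sweeps):
        for _ in range(L_genome):
            i = rng.integers(L_genome)
            f_old = fitness(s)
            s[i] *= -1                      # propose a single spin flip (a mutation)
            if rng.random() > (fitness(s) / f_old) ** (N - 1):
                s[i] *= -1                  # reject the proposed flip
        if sweep > sweeps // 2:
            samples.append(fitness(s))
    return np.mean(samples)

for N in (2, 10, 100):
    print(f"N = {N:4d}   <f> = {mean_fitness(N):6.2f}   (maximal fitness = {f_max:.2f})")
```

The average sampled fitness increases with N toward the maximal value of the landscape, while for small N the genome fluctuates widely away from the optimum, in line with the effective-temperature interpretation.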
5.3.3 Selection and Mutations on Comparable Time Scales

We have seen in Sect. 5.3.1 how to describe the combined effect of selection and mutations when the rate of mutation is very low and the selection dynamics can be summarized by the fixation probability of the mutated genome. Under this time scale separation assumption, it was possible to describe the effective dynamics, on long time scales, of homogeneous populations of genomes. In the present subsection, we now consider the case when selection and mutations operate on comparable time scales, no longer allowing for a time scale separation assumption. As a result, it is no longer possible to focus the description on homogeneous populations of genomes, but heterogeneous populations have to be taken into account, which significantly raises the level of difficulty if one aims at describing an arbitrary number of possible genomes σ. To keep technicalities at a minimal level, we restrict our study to the case of only two possible genomes (or alleles) σ_1 and σ_2. Although this restriction looks superficially similar to the one made when studying the fixation probability,
the assumption made here is actually more restrictive. When studying the fixation probability, one focuses on one particular mutation among many possible different mutations, and since one starts from a homogeneous population, one ends up with only two different genomes. However, this does not mean that only two genomes are potentially accessible to mutations. In contrast, here we make the restrictive assumption that only two genomes σ_1 and σ_2 are accessible.
As a consequence, the heterogeneous population is fully described by the number n of individuals with genome σ_2, among the N individuals of the population. In the quasi-static regime obtained for a mutation rate ν → 0, only the configurations n = 0 and n = N were relevant, while here the full distribution P_n for 0 ≤ n ≤ N has to be determined. At odds with Sect. 5.2.2, where we considered for simplicity a discrete-time dynamics, we consider here a continuous-time dynamics, which allows us to treat selection and mutations as two processes acting in parallel, the corresponding transition rates simply being additive. Without loss of generality, we set f_1 = 1 and f_2 = f to lighten the notation, which simply amounts to a particular choice of time unit. In line with the notations used for the birth–death processes in Sect. 3.2.1, we denote as λ_n the probability per unit time to increase n by one (i.e., replacing a genome σ_1 by σ_2), and by μ_n the probability to decrease n by one (replacing a genome σ_2 by a genome σ_1). The transition rates λ_n and μ_n are given by

λ_n = (N − n)(f n/N + ν) ,   μ_n = n((N − n)/N + ν) .   (5.29)
The stationary distribution P_n is known from birth–death process theory, see Eq. (3.23). For the process at hand, it reads for 1 ≤ n ≤ N,

P_n = P_0 Π_{k=1}^{n} (N − k + 1)[f(k − 1) + νN] / [k(N − k + νN)] ,   (5.30)
where P_0 is determined by the normalization condition Σ_{n=0}^{N} P_n = 1. The exact expression (5.30) of P_n can easily be evaluated numerically on a computer. Simpler asymptotic expressions can also be found in different limits. For a finite population size N and a very small mutation rate ν such that νN ≪ 1, one finds to leading order

P_n ≈ P_0 ν f^{n−1} N² / [n(N − n)]   (1 ≤ n ≤ N − 1) ,   (5.31)
and P_N ≈ P_0 f^N. This expression provides the first correction in ν to the quasi-static limit where only P_0 and P_N are non-zero, and the distribution P_n remains strongly peaked at n = 0 and n = N. In the opposite limit νN ≫ 1 (where ν is still small), the logarithm of the product in Eq. (5.30) can be approximated by an integral, and one finds P_n ≈ C e^{N g(n/N)}, where C is a normalization constant, and the function g(x) is given by
g(x) = x ln f + (ν/f)[1 + ln(f x)] + ν[1 + ln(1 − x)].   (5.32)
At odds with the very small mutation rate limit νN ≪ 1, here the distribution P_n is sharply peaked around a maximum value n*, with 0 < n*/N < 1, determined by maximizing g(x). A simple expression of x* = n*/N is obtained when f = 1 + s with |s| ≪ 1,

x* = 1/2 + s/(8ν),   (5.33)

indicating a well-mixed population of genomes due to a high mutation rate, with a small bias of the population toward the individuals with highest fitness. As a side remark, note that at a qualitative level, a somewhat similar situation occurs, for low (or even zero) mutation rate, in the case of a fluctuating environment (an issue that may play a role, for instance, to understand some aspects of protein allostery [6]). Fitness strongly depends on the environment, and environmental changes lead to fitness changes. As a toy model, one may consider a model where the values of the fitness f_1 and f_2 are randomly exchanged with some probability rate α. Although we do not study this model in detail, one may expect the exchange of fitness values to be qualitatively similar to a high mutation rate, which reshuffles the fitnesses in the population. In this case, one thus also expects a mixed population, with a probability distribution P̃_n peaked around an intermediate value ñ* equal to N/2 if the average time spent in both fitness configurations (f_1, f_2) = (1, f) and (f, 1) is equal. An asymmetry in the time spent in these configurations is expected to lead to a value ñ* ≠ N/2.
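The remark above that Eq. (5.30) can easily be evaluated numerically may be illustrated by the following short Python sketch (added here as an illustration, not part of the original text; the parameter values are arbitrary). It builds P_n iteratively from the product in Eq. (5.30) and compares the location of its maximum with the estimate (5.33), valid in the high mutation regime νN ≫ 1 with weak selection.

```python
import numpy as np

def stationary_distribution(N, f, nu):
    """Unnormalized P_n/P_0 from Eq. (5.30), then normalized over n = 0..N."""
    P = np.zeros(N + 1)
    P[0] = 1.0
    for k in range(1, N + 1):
        ratio = (N - k + 1) * (f * (k - 1) + nu * N) / (k * (N - k + nu * N))
        P[k] = P[k - 1] * ratio
    return P / P.sum()

# high mutation regime nu*N >> 1, weak selection f = 1 + s (arbitrary values)
N, s, nu = 1000, 0.02, 0.05
P = stationary_distribution(N, 1 + s, nu)
print("numerical maximum  x* =", P.argmax() / N)
print("estimate (5.33)    x* =", 0.5 + s / (8 * nu))
```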
5.3.4 Biodiversity Under Neutral Mutations We have examined above a simple example of selection dynamics under mutations, and we found that for low enough mutation rate, only individuals with the highest fitness remain. To simplify the discussion, we have restricted the argument to the case of two different possible genomes. However, in more realistic situations, mutations may lead to many different possible genomes. One may argue that genomes with a lower fitness will rapidly be eliminated. However, there may be a significant number of possible genomes with the same fitness. Changing from one to the other corresponds to neutral mutations. Since they all have the same fitness, selection cannot discriminate between them in a deterministic way. Several such genomes thus coexist in the population over long time scales, and eventually disappear due to a random extinction. This type of scenario has been investigated in the framework of neutral models of biodiversity, where different genomes are interpreted as different species. A natural question in this context is the average number of species in the population, which is a measure of biodiversity.
We consider the following model of neutral biodiversity [13, 20]. The population is composed of a fixed number of individuals, all having the same fitness. When an individual dies, which occurs with a unit rate, it is immediately replaced by the offspring of another individual chosen at random. Most of the time, the offspring has the same genome as its ancestor and thus belongs to the same species, but there is also a small probability ν ≪ 1 that a mutation occurs, leading to the first individual of a new species that may spread out later on. We are interested in the dynamics of a given species, and we define the probability P_k(t) to have k individuals of the same species at time t among the N individuals of the population. The number of individuals of a given species follows a stochastic birth–death process (as defined in Sect. 3.2.1), and the probability P_k(t) is thus ruled by the following master equation [see also Eq. (3.21)]:

dP_k/dt = λ_{k−1} P_{k−1} + μ_{k+1} P_{k+1} − (λ_k + μ_k) P_k   (k ≥ 1).   (5.34)
The birth rate λ_k and the death rate μ_k are, respectively, given by [20]

λ_k = (N − k) k (1 − ν) / [N(N − 1)],   (5.35)
μ_k = k [N − k + ν(k − 1)] / [N(N − 1)].   (5.36)
These definitions of the birth and death rates can be understood as follows. The number k of individuals in the considered species increases by one if an individual of another species dies, with probability (N − k)/N, and is replaced by an offspring of an individual from the considered species, chosen with probability k/(N − 1) among the remaining N − 1 individuals. Taking into account the probability 1 − ν that no mutation occurs (otherwise the new individual does not contribute to the considered species), one finds the expression (5.35) of the birth rate λ_k. On the other side, k decreases by one when an individual of the considered species dies, with probability k/N, and is replaced either by an individual of another species with probability (N − k)/(N − 1), or by a mutated offspring of an individual from the same species, which occurs with a probability ν(k − 1)/(N − 1). Gathering terms, one obtains the expression (5.36) of μ_k. The stationary distribution P_k^st is given by the general theory of birth–death processes, see Eq. (3.23), and one has P_k^st = δ_{k,0}. In other words, in steady state, the species has disappeared. In this model, species are thus transient entities that randomly appear and disappear. A natural question is thus to know the average number of species in the population at an arbitrary given time, and one may expect that this average number of species may reach a stationary state even though each species has a finite lifetime. As a convenient auxiliary variable, we define the average number φ_k of species having k individuals (1 ≤ k ≤ N). This quantity is obtained by considering that the species has randomly appeared at a time t′ < t, and its number of individuals has evolved from 1 at time t′ to k at time t. A new species has a probability ν dt′
to appear in a time interval [t′, t′ + dt′], because an individual dies with probability dt′ in this time interval and each of the N − 1 remaining individuals may then give rise to a mutated offspring with probability ν/(N − 1). The subsequent evolution of the number of individuals is given by the master equation (5.34), with initial condition P_k(t′) = δ_{k,1}. Since the birth and death rates λ_k and μ_k are time independent, the solution depends only on the time difference t − t′, and we simply write it P_k(t − t′). The average number φ_k of species with k individuals is then given by [20]

φ_k = ν ∫_{−∞}^{t} P_k(t − t′) dt′,   (5.37)
which with a simple change of variables can be rewritten as φ_k = ν ∫_{0}^{∞} P_k(t) dt. We thus indeed find that φ_k is time independent. By integrating (5.34) over the time interval [0, ∞], one finds

λ_{k−1} φ_{k−1} + μ_{k+1} φ_{k+1} − (λ_k + μ_k) φ_k = 0   (k ≥ 2).   (5.38)
Hence, the quantity φ_k satisfies the same recursion relation as the stationary distribution of a birth–death process, and it thus reads

φ_k = (B/k) ∏_{j=1}^{k} λ_{j−1}/μ_j,   (5.39)
where B is a normalization parameter to be determined. Introducing the parameter θ as

θ = (N − 1)ν / (1 − ν),   (5.40)

one can then write φ_k as

φ_k = (B/k) ∏_{j=1}^{k} (N − j + 1)/(N − j + θ).   (5.41)
As φ_k is not a probability distribution, it is not normalized to one, but rather satisfies the normalization condition ∑_{k=1}^{N} k φ_k = N, meaning that the average population size is N. Defining

Z_N = ∑_{k=1}^{N} ∏_{j=1}^{k} (N − j + 1)/(N − j + θ),   (5.42)

one can easily obtain the recursion relation

Z_{N+1} = [(N + 1)/(N + θ)] (Z_N + 1),   (5.43)
with the initial condition Z_1 = 1/θ. It follows that Z_N = N/θ, so that B = θ. Now that φ_k is completely determined, the average number S_N of species is obtained as S_N = ∑_{k=1}^{N} φ_k. It can be shown [20] that S_N also satisfies a simple recursion relation S_{N+1} = S_N + θ/(N + θ), so that

S_N = ∑_{n=0}^{N−1} θ/(θ + n).   (5.44)
Approximating for large N the above sum by an integral, and recalling that ν ≪ 1, one finds the rough asymptotic estimates

S_N ≈ 1 + νN ln N   (νN ≪ 1),   (5.45)
S_N ≈ 1 + νN ln(1/ν)   (νN ≫ 1).   (5.46)
Note that although it is perfectly legitimate to take the limit νN ≫ 1 in the framework of the present model, the assumption of a neutral dynamics is better justified in the low mutation rate limit νN ≪ 1, according to the results of Sect. 5.3.3.
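As a quick numerical check (again a sketch added for illustration, with arbitrarily chosen values of N and ν), one can evaluate the exact sum (5.44) and compare it with the rough estimates (5.45) and (5.46) in the two regimes νN ≪ 1 and νN ≫ 1.

```python
import numpy as np

def average_species_number(N, nu):
    """Exact S_N from Eq. (5.44), with theta = (N - 1) * nu / (1 - nu)."""
    theta = (N - 1) * nu / (1 - nu)
    n = np.arange(N)
    return np.sum(theta / (theta + n))

N = 10**5
for nu in (1e-7, 1e-3):                    # nu*N << 1 and nu*N >> 1, respectively
    S_exact = average_species_number(N, nu)
    if nu * N < 1:
        S_approx = 1 + nu * N * np.log(N)        # Eq. (5.45)
    else:
        S_approx = 1 + nu * N * np.log(1 / nu)   # Eq. (5.46)
    print(f"nu={nu:g}:  exact S_N = {S_exact:.3f},  estimate = {S_approx:.3f}")
```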
5.4 Real Space Neutral Dynamics and Spatial Clustering We have up to now discussed population dynamics in terms of numbers of individuals having certain internal characteristics. We also mentioned that the selection dynamics under the effect of mutations can be represented in an abstract space called fitness landscape. We would like now to briefly discuss how population dynamics may also lead to interesting structures in real space in the form of clusters. With this aim in mind, we focus on the simplest situation of a neutral population dynamics with all individuals having the same fitness (whether or not they have the same genome is unimportant in this discussion). The basic ingredient leading to non-trivial effects is that, contrary to physical particles, the individuals considered here have a finite lifetime, and birth and death events play a key role in the resulting spatial clustering. This phenomenon is common in nature and has been reported in diverse examples ranging from phytoplankton [15] to forest trees [1, 17]. Although the heterogeneity of the environment is expected to play a potential role in this phenomenon, clustering has also been observed in a well-controlled experimental setup using a population of micro-organisms [8] where a homogeneous environment was maintained.
5.4.1 Local Population Fluctuations in the Absence of Diffusion To model this situation, we consider a simple model of micro-organisms having a finite lifetime. Dead organisms are withdrawn, while birth of a new organism is assumed to proceed by division (as in cell division), whereby a single micro-organism divides into two independent organisms. We assume that the birth rate is equal to the death rate, so that the average number of micro-organisms is conserved, but their actual number may fluctuate (at odds with, e.g., the Moran model where the number of individuals is fixed by construction of the model). To qualitatively understand the clustering phenomenon, let us first imagine that micro-organisms are initially distributed into a large number L of boxes from which they cannot escape (i.e., diffusion between boxes is neglected). Due to birth and death events, the number n_i of micro-organisms in box i fluctuates, and performs a symmetric random walk with an absorbing boundary at n_i = 0 (when all micro-organisms in a box are dead, no birth can take place). Let us focus on a given box and simply call n its number of micro-organisms. We call α the birth rate, also equal to the death rate. The transition rate W(n′|n) is given by W(n + 1|n) = W(n − 1|n) = αn, and W(n′|n) = 0 otherwise. The probability P_n(t) to have n micro-organisms in the box at time t obeys the following master equation:

dP_n/dt = α(n − 1) P_{n−1}(t) + α(n + 1) P_{n+1}(t) − 2αn P_n(t)   (n ≥ 1),   (5.47)
dP_0/dt = α P_1(t).   (5.48)

It is straightforward to show that the first two moments ⟨n(t)⟩ and ⟨n(t)²⟩ satisfy the evolution equations

d⟨n⟩/dt = 0,   d⟨n²⟩/dt = 2α⟨n⟩.   (5.49)

Assuming that at t = 0, the number of micro-organisms is fixed to a (non-random) value n = n_0, one finds

⟨n(t)⟩ = n_0,   ⟨n(t)²⟩ = n_0² + 2αn_0 t.   (5.50)
It follows that the variance Var[n(t)] = ⟨n(t)²⟩ − ⟨n(t)⟩² is given by

Var[n(t)] = 2αn_0 t.   (5.51)
Hence, the average number of micro-organisms remains fixed, while its variance grows linearly with time [7], meaning that fluctuations become larger and larger. We are talking here about formal ensemble averages, but they can also be interpreted as averages over a large number of boxes. This means that in some boxes, all micro-organisms have disappeared (n = 0), while in others n has become large. A subtle
balance of these very different time evolutions occurs, in such a way that the average number ⟨n(t)⟩ keeps a constant value. Looking at the spatial distribution of micro-organisms among boxes, one thus sees a strong heterogeneity: some boxes are empty while others are full, which is precisely the essence of the clustering phenomenon.
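The linear growth of the variance, Eq. (5.51), can be checked by integrating the master equation (5.47)–(5.48) numerically. The sketch below (added as an illustration, not part of the original text) truncates the state space at a large value n_max and uses a simple explicit Euler scheme; the truncation level and time step are arbitrary choices.

```python
import numpy as np

alpha, n0, n_max = 1.0, 20, 400     # birth/death rate, initial population, truncation
dt, t_final = 2e-4, 5.0

n = np.arange(n_max + 1)
P = np.zeros(n_max + 1)
P[n0] = 1.0                         # initial condition P_n(0) = delta_{n, n0}

for _ in range(int(round(t_final / dt))):
    dP = -2 * alpha * n * P                  # loss term of Eqs. (5.47)-(5.48)
    dP[1:] += alpha * n[:-1] * P[:-1]        # gain from n-1 (a birth)
    dP[:-1] += alpha * n[1:] * P[1:]         # gain from n+1 (a death)
    P = P + dt * dP                          # explicit Euler step

mean = np.dot(n, P)
var = np.dot(n**2, P) - mean**2
print("<n>    =", round(mean, 3), "(expected", n0, ")")
print("Var[n] =", round(var, 3), "(expected 2*alpha*n0*t =", 2 * alpha * n0 * t_final, ")")
```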
5.4.2 Can Diffusion Smooth Out Local Population Fluctuations? In the above simplified argument, we have completely neglected the diffusion of micro-organisms. Allowing for diffusion, micro-organisms can, for instance, jump to neighboring boxes, which means, in particular, that an empty box can be filled again, or that a box containing a large number of micro-organisms can see its population decrease due to migrations. One may thus expect that diffusion at least partly smooths out the clustering phenomenon. A study of the influence of diffusion may also be performed in the above discrete framework, where space is divided into boxes. For simplicity, we restrict here calculations to the one-dimensional case. Since micro-organisms move from one box to another, it is no longer possible to write a closed and exact evolution equation for the probability P_n of a single box population n_i. Instead, one needs to consider the joint probability P(n_1, ..., n_L) of the L boxes. Introducing the probability rate β to jump from box i to box i + 1 or i − 1 (we assume for simplicity periodic boundary conditions L + 1 ≡ 1), the evolution equation for the joint probability P(n_1, ..., n_L) reads [9]

dP/dt(n_1, ..., n_L) = β ∑_{i=1}^{L} (n_i + 1) P(n_1, ..., n_{i−1}, n_i + 1, n_{i+1} − 1, n_{i+2}, ..., n_L)
  + β ∑_{i=1}^{L} (n_i + 1) P(n_1, ..., n_{i−2}, n_{i−1} − 1, n_i + 1, n_{i+1}, ..., n_L)
  + α ∑_{i=1}^{L} [ (n_i − 1) P(n_1, ..., n_{i−1}, n_i − 1, n_{i+1}, ..., n_L) + (n_i + 1) P(n_1, ..., n_{i−1}, n_i + 1, n_{i+1}, ..., n_L) ]
  − 2(α + β) ∑_{i=1}^{L} n_i P(n_1, ..., n_L).   (5.52)
Multiplying Eq. (5.52) by n_i and summing over all possible configurations (n_1, ..., n_L), one finds an evolution equation for the average local population,

d⟨n_i⟩/dt = β (⟨n_{i−1}⟩ + ⟨n_{i+1}⟩ − 2⟨n_i⟩),   (5.53)
which is nothing but a space-discretized diffusion equation. Note that, as expected, the birth and death rate α does not appear in this equation, since the numbers of births and deaths exactly compensate on average. Assuming a uniform initial distribution among boxes, ⟨n_i(0)⟩ = n_0, the average local population remains uniform at later times, ⟨n_i(t)⟩ = n_0.
Richer information is obtained by computing the spatial correlation function u_k = ⟨n_i n_{i+k}⟩ − n_0², which is independent of i due to space translation invariance (valid for a uniform local population ⟨n_i(t)⟩ = n_0). It is also easy to show that u_{−k} = u_k. The average ⟨n_i n_{i+k}⟩ is evaluated by multiplying Eq. (5.52) by n_i n_{i+k} and summing over all configurations (n_1, ..., n_L). One finds that the correlation function u_k also satisfies a diffusion-like equation, but now with respect to the variable k [9],

du_k/dt = 2β (u_{k−1} + u_{k+1} − 2u_k)   (k ≠ 0).   (5.54)
The evolution equation for u_0 takes a slightly different form,

du_0/dt = 4β(u_1 − u_0) + (α − 4β) n_0.   (5.55)
The time-dependent solution of the coupled Eqs. (5.54) and (5.55) can be expressed exactly in terms of Bessel functions [9]. This explicit form allows the asymptotic behavior at large time to be determined, and one finds that the correlation u_k(t) diverges as αn_0 √(t/4β) when t goes to infinity [9]. In particular, the variance Var[n(t)] = u_0(t) diverges as t^{1/2}. This divergence is slower than the linear divergence of the variance found above in the absence of diffusion, but it still indicates that fluctuations eventually become arbitrarily large, so that the clustering phenomenon remains present in a one-dimensional geometry in the presence of diffusion. This calculation can be generalized to higher dimensions, and one finds that the qualitative picture of clustering remains essentially the same in a two-dimensional geometry [9], the latter being particularly relevant to experiments with bacteria on a substrate (see, e.g., [8]). In the two-dimensional case, one finds, in particular, that the variance Var[n(t)] diverges logarithmically in time when t goes to infinity. Hence, fluctuations are again present, but they diverge much more slowly than in one dimension, or than in the absence of diffusion. However, they remain clearly observable on experimental time scales [8]. Yet, the general trend is that diffusion tends to be more efficient at smoothing out local population fluctuations when the space dimension is increased. For space dimension d ≥ 3, the variance Var[n(t)] converges to a finite value at large times [9]. As a final comment, one may wonder why we used a discretization of space into boxes to describe diffusing micro-organisms. After all, the diffusion is continuous in space, and one may wish to use a continuum description in terms of the density ρ(r, t) of micro-organisms at point r at time t. Since the birth and death rates are equal, one would usually write a simple diffusion equation for the time evolution of ρ(r, t),

∂ρ/∂t = D ∆ρ,   (5.56)

where D is the diffusion coefficient and ∆ is the Laplacian in d dimensions. Such an evolution equation predicts that the density profile converges to a flat profile which does not account for the clustering phenomenon. The reason is that Eq. (5.56)
actually describes the average density profile, and not the actual, fluctuating profile. To include fluctuations in the description, it is customary to add a Langevin noise term to Eq. (5.56), yielding

∂ρ/∂t(r, t) = D ∆ρ(r, t) + ξ(r, t).   (5.57)
In standard physics problems, the number of particles is locally conserved, and the noise is constrained to be the divergence of a fluctuating current, ξ(r, t) = ∇ · J(r, t). However, here the birth and death of micro-organisms break the local conservation of the number of particles, so that the noise is no longer constrained to be a conserved noise. Considering a non-conserved Gaussian white noise ξ(r, t), with a variance proportional to ρ(r, t) (the noise results from the birth and death processes which are more frequent for a large local density), one may expect that the clustering phenomenon could at least be qualitatively described within the continuum framework.
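As a rough numerical illustration of the discrete model of Sect. 5.4.2 (a sketch added here, not part of the original text, with arbitrary parameter values), one can simulate the dynamics directly at the particle level: each micro-organism independently divides at rate α, dies at rate α, and hops to a neighboring box at rate β in each direction. The variance of the box populations, which starts from zero for a uniform initial condition, is then seen to grow in time, in line with the persistence of clustering discussed above.

```python
import random
from collections import Counter

alpha, beta = 1.0, 1.0            # birth/death rate and hopping rate per direction
L, n0, t_final = 200, 10, 20.0    # number of boxes, initial occupancy, final time

random.seed(1)
particles = [i for i in range(L) for _ in range(n0)]   # positions of all particles
t = 0.0
while t < t_final and particles:
    total_rate = len(particles) * 2 * (alpha + beta)
    t += random.expovariate(total_rate)
    j = random.randrange(len(particles))
    r = random.random() * (2 * alpha + 2 * beta)
    if r < alpha:                                  # division: add a copy in the same box
        particles.append(particles[j])
    elif r < 2 * alpha:                            # death: remove the particle
        particles[j] = particles[-1]
        particles.pop()
    elif r < 2 * alpha + beta:                     # hop to the right (periodic boundaries)
        particles[j] = (particles[j] + 1) % L
    else:                                          # hop to the left
        particles[j] = (particles[j] - 1) % L

counts = Counter(particles)
n = [counts.get(i, 0) for i in range(L)]
mean = sum(n) / L
var = sum(x * x for x in n) / L - mean**2
print("mean occupancy:", mean, "  variance:", var, "  (initial variance was 0)")
```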
5.5 Exercises 5.1 Kimura diffusion equation Consider a population of fixed size N with two types of individuals A and B evolving according to a birth–death process defined by a birth rate λ_n and a death rate μ_n given by

λ_n = ν(N − n) (n/N)(1 + s),   μ_n = νn (N − n)/N,   (5.58)

where n is the number of A individuals in the population and N − n the number of B individuals. Show by taking a continuum limit that for large N the probability P_n(t) can be approximated by a continuous distribution p(x, t) satisfying the so-called Kimura diffusion equation [11, 14],

∂p/∂t(x, t) = −Ns ∂/∂x [x(1 − x) p] + ∂²/∂x² [x(1 − x) p],   (5.59)
with x = n/N. The excess fitness s is assumed to be small, so that Ns remains of the order of 1. 5.2 Population dynamics with a high mutation rate The probability to have n individuals with genome σ_2 in a population of N individuals with two accessible genomes σ_1 and σ_2 is obtained from birth–death process theory in Eq. (5.30). Derive from this distribution the expression (5.32) of the large deviation function g(x), and show that its maximum x* is given by Eq. (5.33) in the high mutation regime νN ≫ 1 (with N ≫ 1 and ν ≪ 1). 5.3 Model of biodiversity The quantity φ_k defined in Eq. (5.41) is the average number of species with k individuals,
and its expression is given up to an unknown prefactor B. Introducing the auxiliary quantity Z_N defined in Eq. (5.42), derive the recursion relation (5.43) and use it to determine the value of B. 5.4 Clustering in real space neutral dynamics (a) Derive Eqs. (5.54) and (5.55) from the master equation (5.52). (b) Show that the noisy diffusion equation (5.57) leads to giant density fluctuations and thus to clustering, focusing on the one-dimensional case. Hint: Introduce the spatial Fourier transform ρ̂(q, t) of the density field ρ(x, t), defined as

ρ̂(q, t) = ∫ dx ρ(x, t) e^{iqx}.   (5.60)
Show that ρ̂(q, t) obeys a Langevin equation, and deduce ⟨|ρ̂(q, t)|²⟩. Conclude on the variance of the density field, Var[ρ] = ∫ dx ⟨ρ(x, t)²⟩, using Parseval's identity.
References
1. Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Núnez, P., Aguilar, S., Valencia, R., Villa, G., Muller-Landau, H.C., Losos, E., Hubbell, S.P.: Beta-diversity in tropical forest trees. Science 295, 666 (2002)
2. Crow, J.F., Kimura, M.: An Introduction to Population Genetics Theory. The Blackburn Press, Caldwell (2009)
3. Drossel, B.: Biological evolution and statistical physics. Adv. Phys. 50, 209 (2001)
4. Dugatkin, L.A.: The Altruism Equation. Princeton University Press, Princeton (2006)
5. Ewens, W.J.: Mathematical Population Genetics. Springer, Berlin (2004)
6. Hemery, M., Rivoire, O.: Evolution of sparsity and modularity in a model of protein allostery. Phys. Rev. E 91, 042704 (2015)
7. Houchmandzadeh, B.: Clustering of diffusing organisms. Phys. Rev. E 66, 052902 (2002)
8. Houchmandzadeh, B.: Neutral clustering in a simple experimental ecological community. Phys. Rev. Lett. 101, 078103 (2008)
9. Houchmandzadeh, B.: Theory of neutral clustering for growing populations. Phys. Rev. E 80, 051920 (2009)
10. Houchmandzadeh, B.: Fluctuation driven fixation of cooperative behavior. BioSystems 127, 60 (2015)
11. Houchmandzadeh, B., Vallade, M.: Alternative to the diffusion equation in population genetics. Phys. Rev. E 82, 051913 (2010)
12. Houchmandzadeh, B., Vallade, M.: The fixation probability of a beneficial mutation in a geographically structured population. New J. Phys. 13, 3020 (2011)
13. Hubbell, S.P.: The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton (2001)
14. Kimura, M.: Solution of a process of random genetic drift with a continuous model. Proc. Natl. Acad. Sci. USA 41, 144 (1955)
15. Martin, A.: The kaleidoscope ocean. Philos. Trans. Royal Soc. A 363, 2873 (2005)
16. Moran, P.A.P.: The Statistical Processes of Evolutionary Theory. Oxford University Press, New York (1962)
17. Nekola, J.C., White, P.S.: The distance decay of similarity in biogeography and ecology. J. Biogeogr. 26, 867 (1999)
18. Sella, G., Hirsh, A.E.: The application of statistical physics to evolutionary biology. Proc. Natl. Acad. Sci. USA 102, 9541 (2005)
19. Traulsen, A., Nowak, M.A.: Evolution of cooperation by multilevel selection. Proc. Natl. Acad. Sci. USA 103, 10952 (2006)
20. Vallade, M., Houchmandzadeh, B.: Analytical solution of a neutral model of biodiversity. Phys. Rev. E 68, 061902 (2003)
21. Wilson, D.S.: The group selection controversy: history and current status. Annu. Rev. Ecol. Syst. 14, 159 (1983)
22. Wright, S.: The differential equation of the distribution of gene frequencies. Proc. Natl. Acad. Sci. USA 31, 382 (1945)
Chapter 6
Complex Networks
In previous chapters, we have considered in different situations the dynamics of particles, agents, or individuals in a population. Although this was not necessarily made explicit, such interacting entities often generate a complex network of interactions, which evolves in time. A case of particular importance is the contact network in a population of individuals, which plays a very important role in the propagation of epidemics, for instance. There are many other examples of networks in our modern technological world, like the Internet (network of connected computers), the web (network of hyperlinks between websites), or different types of transportation networks (railways, airline connections between airports, etc.) [1, 2, 11, 18]. In such situations, the regular lattice geometry physicists are familiar with is no longer relevant, and less regular networks with more complex connectivity patterns have to be considered and characterized. In particular, complex networks including some type of randomness have proven useful in the description of real-world data [1, 11, 48]. In the last two decades, this field has seen very intense research activity, partly driven by the increase of computer power (both in terms of memory and computation speed) and the somewhat related availability of large datasets. In many situations of interest, networks rearrange by creating or removing links on a time scale much longer than that of the dynamics occurring on the network (e.g., transportation, or internet traffic). In these situations, the dynamics occurs on an essentially static network, and it is thus of interest to characterize the statistical properties of static networks [11]. In this chapter, we first discuss in Sect. 6.1 some basic statistical characterizations of static random networks, considering both homogeneous and heterogeneous random networks. We then discuss in Sect. 6.2 the case of epidemic spreading as a typical example of dynamics on a network, where network heterogeneity plays a key role. The analogy with rumor propagation on social networks is also briefly discussed. Finally, in Sect. 6.3 we turn to another important application of random networks, by considering formal neural networks. We discuss the asymmetric diluted Hopfield model of associative memory and briefly introduce
classification problems on the example of the Perceptron model, which leads us to mention the related and more general issue of constraint satisfaction problems.
6.1 Basic Types of Complex Networks 6.1.1 Random Networks A network (or graph) is basically defined by a set of N nodes (also called vertices, or sites), and a list of links (or edges) between these nodes. One can define a variable g_{ij} which is equal to 1 when there is a link between i and j, and equal to 0 otherwise. Links may be directed or not. A directed link means that the link is defined from i to j. Hence the fact that i is linked to j does not imply that j is linked to i; to do so, a second link has to be drawn from j to i. If links are directed, the network is called a directed graph. From the knowledge of lattices in statistical physics, we are more familiar with undirected graphs. For undirected graphs, the matrix formed by the coefficients g_{ij} is symmetric by construction. For a directed graph, the matrix g_{ij} is not necessarily symmetric, but it may also be symmetric in some limiting cases, if all links between two sites come in pairs with opposite orientations. This limiting case is, however, of little interest, as it boils down to having an undirected graph. So in practice, directed graphs have asymmetric matrices g_{ij}. For a random graph, one has to assign to each possible realization of a graph with N nodes (that is, to each matrix g_{ij}) a given probability. The simplest type of random network is the so-called Erdös–Rényi random network. Two variants of this network actually exist. The original Erdös–Rényi model consists in assigning an equal probability to all possible graphs with N nodes and M undirected links, and a zero probability to graphs having a number of links different from M. Although conceptually simple, this model is not the most convenient one for practical calculations, and one often uses instead an alternative version of the model. In this second variant, one builds a random graph by including a link between each pair (i, j) of nodes independently with a probability p. Hence the number of links is on average ⟨M⟩ = pN(N − 1)/2, but fluctuations around this value are allowed. The difference between these two versions of the model is somewhat similar to the difference between the microcanonical and canonical ensembles introduced in equilibrium statistical physics. In this latter context, calculations are more convenient in the canonical ensemble. Similarly, calculations of graph properties are easier in the second version of the model with independent probabilities to have a link between two nodes. In the version of the Erdös–Rényi model having independent random links on each pair of nodes with probability p, the total number M of links is a random variable with a binomial distribution
P(M) = [M_max! / (M!(M_max − M)!)] p^M (1 − p)^{M_max − M},   (6.1)
where M_max = N(N − 1)/2 is the number of pairs of nodes. The interpretation of the binomial distribution (6.1) is simple. The probability that a link is present on a given pair of nodes is p, and the probability to have no link is 1 − p. Hence the probability to have M links at given positions (i.e., on given pairs of nodes) and no links elsewhere is p^M (1 − p)^{M_max − M}, due to statistical independence. The combinatorial factor in Eq. (6.1) simply counts the number of possible positions that the M links can occupy. In the large N limit, the distribution P(M) takes a large deviation form

P(M) ≈ e^{−M_max Φ(y)}   (0 < y < 1),   (6.2)

with y = M/M_max and

Φ(y) = y ln(y/p) + (1 − y) ln[(1 − y)/(1 − p)].   (6.3)
The fluctuations of M on a scale √M_max ∼ N are described by a Gaussian distribution of mean value ⟨M⟩ = pN²/2 and variance ⟨(M − ⟨M⟩)²⟩ = pN²/2. Another simple quantity of interest is the degree k of a node, defined as the number of nodes to which a given node is connected. This degree k ranges by definition between 0 and N − 1. The degree distribution is also binomial, like the distribution of the total number of links, and reads

P_d(k) = [(N − 1)! / (k!(N − 1 − k)!)] p^k (1 − p)^{N−1−k}.   (6.4)
This distribution is easily interpreted as follows. A given node can potentially be connected to any of the N − 1 other nodes. The probability to have a degree k is thus given by the probability p^k to have k links, times the probability (1 − p)^{N−1−k} that there is no link to the N − 1 − k remaining nodes. The combinatorial factor in Eq. (6.4) then simply counts the number of ways to choose the k nodes among N − 1 to connect the links. From Eq. (6.4), the average degree is equal to ⟨k⟩ = p(N − 1) (≈ pN for large N). Hence it diverges with the size of the graph, meaning that in a large graph, any node is connected to a large number of other nodes. In many applications however, one is interested in large random graphs with a fixed (relatively low) average degree. The solution is then to choose a probability p that depends on N, namely, p = z/N, where z is a constant. The average number of neighbors is then equal to z for large N. In this case, the distribution (6.4) simplifies, for large N, to a Poisson distribution [44]

P_d(k) = (z^k / k!) e^{−z},   (6.5)
which is independent of N. Under the assumption p = z/N, the distribution P(M) of the total number of links takes for large N a large deviation form which differs from Eq. (6.2), namely,

P(M) ≈ e^{−N Φ̃(x)}   (6.6)

with x = M/N, and

Φ̃(x) = x ln(2x/z) − x + z/2.   (6.7)
The distribution (6.6) concentrates around the average value ⟨M⟩ = zN/2. Note that the function Φ̃(x) is now defined over the entire positive real axis, while the function Φ(y) introduced in Eq. (6.3) is restricted to the interval 0 < y < 1. Fluctuations of M on a scale √N are still described by Gaussian statistics, with a variance ⟨(M − ⟨M⟩)²⟩ = zN/2.
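The Poisson form (6.5) of the degree distribution is easily checked by direct sampling. The sketch below (added as an illustration, not part of the original text, with arbitrary values of N and z) builds an Erdös–Rényi graph in its second version, linking each pair of nodes independently with probability p = z/N, and compares the empirical degree histogram with Eq. (6.5).

```python
import math
import random

N, z = 2000, 4.0          # number of nodes and target average degree
p = z / N
random.seed(0)

degree = [0] * N
for i in range(N):
    for j in range(i + 1, N):          # each pair (i, j) is linked with probability p
        if random.random() < p:
            degree[i] += 1
            degree[j] += 1

for k in range(9):
    empirical = degree.count(k) / N
    poisson = z**k * math.exp(-z) / math.factorial(k)   # Eq. (6.5)
    print(f"k={k}:  empirical {empirical:.3f}   Poisson {poisson:.3f}")
```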
6.1.2 Small-World Networks Another important class of complex networks, which plays an important role in the modeling of real-world data, is the small-world network introduced by Watts and Strogatz [56]. The precise definition of the Watts–Strogatz model can be found, for instance, in [1]. Here, we only sketch the main idea which is common to networks with small-world properties. It consists in associating properties of "metric networks" (networks where only nodes closer than a given distance in the embedding Euclidean space are linked) with properties of random networks, like the fact that the length of the shortest path (counted in number of links along the network) remains relatively small even for large networks. To do so, one basically starts from a "metric network" (e.g., a lattice) and adds to it with a small probability some random links between arbitrary (typically distant) nodes. Instead of adding links, one may also "rewire" the network, that is, choose randomly a link (i, j) and replace it by a link (i, k), where k has been chosen randomly among all nodes (possibly with some constraints). In this way, the number of links is conserved, which may be of interest in some cases. The motivation for introducing such networks notably comes from the wish to have complex networks with a high clustering coefficient (as observed in many real-world networks), while classical Erdös–Rényi networks have a low clustering coefficient. The (local) clustering coefficient quantifies the tendency of the neighbors of a given node i_0 to be connected between themselves. If a node has k neighbors, these neighbors can have at most n_max = k(k − 1)/2 undirected links between them (note that links to the original node i_0 are not counted). If the actual number of links between the neighbors of the node i_0 is n (again excluding links with i_0), the local clustering coefficient is given by c(i_0) = n/n_max. An average clustering coefficient c̄ can be defined by averaging c(i_0) over the nodes i_0. Regular networks like lattices have a relatively large (meaning a finite fraction of unity) average clustering coefficient, while the average shortest path length is large for a large graph. In contrast, random
graphs of the Erdös–Rényi type have a low (much smaller than one) average clustering coefficient, as well as a small average shortest path length. Yet, many real-world networks have both a relatively large clustering coefficient and a small shortest path length, which motivated the introduction of small-world networks satisfying this property. As a more concrete illustration of the interest of small-world networks, let us consider the following real-world application related to transportation networks. Let us imagine that we try to analyze the railway network on the scale of a large country, or of a continent. Nodes of the network are the cities in which there is a railway station, and the links are the railways between cities. Such a network is constrained by the two-dimensional geometry of the surface of the Earth, and is thus expected to be of metric type: there are most often no direct railways between very distant cities. Now imagine that one is not interested only in the railway network, but more generally in the transportation means between cities. One will thus also include in the network the airplane lines between large cities having an airport. Including these airplane lines drastically changes the properties of the network. With only railways, the length of the shortest path along the network typically grows as the Euclidean distance (on the surface of the Earth) between the nodes. Including the airplane lines, the "length" of the shortest path (understood here as the time needed to travel along the path) grows much more slowly with distance. This is in line with our common life experience. For instance, in our modern world, the time needed to travel 10000 km is much less than one hundred times the duration of a 100 km trip. Note however that this was not the case more than one century ago, when only ground transportation was available.
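To make this comparison more concrete, the following sketch (added as an illustration, not part of the original text; the network size and rewiring fraction are arbitrary) measures the average clustering coefficient c̄ and the average shortest path length on a ring where each node is linked to its nearest neighbors, before and after a small fraction of the links are rewired at random, in the spirit of the Watts–Strogatz construction.

```python
import random
from collections import deque

def average_clustering(adj):
    """Average of the local clustering coefficient c(i) = n / n_max over all nodes."""
    total = 0.0
    for i, nbrs in enumerate(adj):
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_shortest_path(adj, n_sources=100):
    """Average BFS distance from a sample of source nodes (graph assumed connected)."""
    total, count = 0, 0
    for s in random.sample(range(len(adj)), n_sources):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

def ring_lattice(N, m):
    """Metric network: each node is linked to its m nearest neighbors on each side."""
    adj = [set() for _ in range(N)]
    for i in range(N):
        for d in range(1, m + 1):
            j = (i + d) % N
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, fraction):
    """Replace a fraction of the links (i, j) by links (i, k) with k chosen at random."""
    N = len(adj)
    edges = [(i, j) for i in range(N) for j in adj[i] if i < j]
    for (i, j) in random.sample(edges, int(fraction * len(edges))):
        k = random.randrange(N)
        if k != i and k not in adj[i]:
            adj[i].remove(j); adj[j].remove(i)
            adj[i].add(k); adj[k].add(i)
    return adj

random.seed(2)
N, m = 1000, 5
for label, graph in [("ring lattice        ", ring_lattice(N, m)),
                     ("rewired (2% of links)", rewire(ring_lattice(N, m), 0.02))]:
    print(label, " c_bar =", round(average_clustering(graph), 3),
          " <shortest path> =", round(average_shortest_path(graph), 2))
```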
6.1.3 Preferential Attachment We have seen in Eq. (6.5) that the degree distribution is a Poisson distribution, which implies that its variance is equal to the average value ⟨k⟩. As a result, typical fluctuations around ⟨k⟩ are of the order of ⟨k⟩^{1/2}. It is thus very unlikely to observe a node with a degree much larger than the average value. Yet, power-law distributions of degrees have been reported in many real networks [1]; such networks have been called "scale-free". Even in cases when the degree distribution does not follow a power law, it still significantly differs from the Poisson distribution in most cases. A generic mechanism to account for this broader distribution has been proposed by Barabási and Albert [1] (see also the more general proportional growth model proposed by Simon [15, 54]). This mechanism relies on two basic ingredients. The first one is to model the dynamics which builds the graph, instead of simply looking at the final graph. Hence the graph is built step by step, by successively adding new nodes and connecting each new node to one or several previous nodes according to some stochastic rules. The second ingredient is related to a specific property of these stochastic rules and is called preferential attachment. The idea is that the new node tends to attach preferentially to existing nodes that already have a high degree. A
simple way to do that is to assume that the probability to attach the new node to an existing node i is proportional to its degree k_i. The connectivity of the added node is fixed to a given value m, meaning that m new links are randomly attached to the existing nodes according to the preferential attachment rule. Numerical simulations of this model show that the degree distribution has a power-law tail proportional to k^{−3} for large k [1]. This is again in stark contrast with the classical Erdös–Rényi random graph which has a Poisson degree distribution, which decays faster than any power-law distribution at large k. Interestingly, this power law k^{−3} can also be predicted using relatively simple analytical arguments [1]. Let us denote as N_k(t) the average number of nodes with k edges at time t. We assume that time is continuous and that on average one node is added per unit time. According to the preferential attachment rule, the probability for each of the m links of a new node to attach at time t to a given existing node of degree k is equal to

q_k(t) = k / ∑_{k′} k′ N_{k′}(t).   (6.8)
Taking into account the fact that m links are attached to each new node, the average number N_k(t) evolves according to

dN_k/dt = m N_{k−1}(t) q_{k−1}(t) − m N_k(t) q_k(t) + δ_{k,m}.   (6.9)
Equation (6.9) is a balance equation formally similar to a master equation, except that the total number of nodes is not constant in time, contrary to a total probability which is constrained to remain equal to 1. The first term in Eq. (6.9) accounts for the increase of Nk due to the attachment of a new link to a node of degree k − 1, which thus becomes of degree k. The second term describes the decrease of Nk due to the attachment of a new link to a node of degree k, which thus becomes of degree k + 1. Finally, the last term accounts for the newly added node, which has degree m and contributes only to Nm . At large time, the total number of nodes satisfies
∑_k N_k(t) = t,   (6.10)
and the total number of links is equal to mt. Since a link is attached to two nodes, the average degree of the nodes is equal to 2m, so that for large time
∑_k k N_k(t) = 2mt.   (6.11)
The degree distribution P(k), assumed to be time-independent in the long time regime considered here, is given by

P(k) = N_k(t) / ∑_k N_k(t),   (6.12)
so that N_k(t) = t P(k), taking into account Eq. (6.10). In this regime, Eq. (6.9) can be rewritten as

P(k) = (1/2)(k − 1)P(k − 1) − (1/2) k P(k) + δ_{k,m}.   (6.13)
At large times, all nodes have at least a degree m, since newly added nodes have a degree m, and initial nodes have been connected to added nodes and have a large degree (one may also assume that the initial nodes all have a degree of at least m). We thus assume that in the large time regime, N_k = 0 for k < m. Then Eq. (6.13) reduces for k > m to the following recursion relation

P(k) = [(k − 1)/(k + 2)] P(k − 1),   (6.14)
while the case k = m provides the condition P(m) = 2/(m + 2). Solving this recursion equation leads to

P(k) = 2m(m + 1) / [k(k + 1)(k + 2)],   (6.15)

from which the large k behavior P(k) ∼ k^{−3} follows. Note that, as briefly mentioned above, the present preferential attachment model is actually a specific case of the classical and more general "proportional growth" model proposed by Simon [54] to account for the emergence of power-law distributions in many different contexts from simple growth rules (see also [52]). In the context of complex networks, the stochastic growth rules of the Simon model generalize the preferential attachment rules and are able to describe a power-law degree distribution P(k) ∼ k^{−γ} with an arbitrary exponent γ > 2 [15]. To conclude this section, we further note that an interesting mapping between the random dynamics of networks and the dynamics of the Zero-Range Process has also been proposed [24]. In this case, the number of nodes is fixed, and the dynamics rather proceeds through the rewiring of links. Without entering into details, the basic idea of this mapping is that rewiring a link is similar to moving a particle from one node to another, as if a particle was attached to the end of each link. Modeling preferential attachment actually requires a generalization of the Zero-Range Process, called the Misanthrope process, in which the transfer of a particle from a site to another depends on the numbers of particles on both the departure and arrival sites. With a preferential attachment dynamics, which favors rewiring to nodes with a high degree, one may obtain under some conditions a condensation transition similar to the one of the Zero-Range Process (see Sect. 3.3.1). This condensation corresponds in the network to the onset of a hub, that is a node to which a finite fraction of all the links is attached. Note that the above mapping is actually not exact and requires some approximations. An exact, though more complicated mapping of a directed network to a
Zero-Range process with many different types of particles has also been proposed [7]. We have considered here very basic network models, mostly undirected. Directed graphs also play an important role. Besides, each link of the network may also be associated with a “weight”, like the passenger traffic on a transportation line (train, plane,...); such graphs are called weighted networks. In addition to static graphs (or graph ensembles), we have briefly seen one type of dynamics which is the network growth, as in the Albert–Barabási model. This is actually just a way to build the graph, but the final graph which is studied is also static. Studying the dynamics of graphs rather consists in looking at graphs which evolve dynamically, for instance, due to some rewiring dynamics.
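Coming back to the preferential attachment rule, the k^{−3} tail of Eq. (6.15) is easy to observe in a direct simulation of the growth process. The sketch below (added as an illustration, not part of the original text, with arbitrary values of N and m) uses the common trick of drawing attachment targets from a list in which each node appears once per link it carries, which automatically implements an attachment probability proportional to the degree.

```python
import random
from collections import Counter

def preferential_attachment(N, m, seed=0):
    """Grow a graph node by node; each new node attaches m links to existing nodes
    chosen with probability proportional to their degree (preferential attachment)."""
    random.seed(seed)
    targets_pool = []                 # each node appears in this list once per link it carries
    degree = Counter()
    for i in range(m + 1):            # start from a small complete core of m+1 nodes
        for j in range(i + 1, m + 1):
            degree[i] += 1; degree[j] += 1
            targets_pool += [i, j]
    for new in range(m + 1, N):
        chosen = set()
        while len(chosen) < m:        # drawing from the pool implements the degree bias
            chosen.add(random.choice(targets_pool))
        for target in chosen:
            degree[new] += 1; degree[target] += 1
            targets_pool += [new, target]
    return degree

N, m = 200000, 3
hist = Counter(preferential_attachment(N, m).values())
print("  k     empirical P(k)   Eq. (6.15)")
for k in (3, 5, 10, 20, 50):
    print(f"{k:4d}     {hist.get(k, 0) / N:.2e}         "
          f"{2 * m * (m + 1) / (k * (k + 1) * (k + 2)):.2e}")
```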
6.2 Dynamics on Complex Networks We have just discussed the elementary properties of complex networks and briefly seen possible dynamical mechanisms that may generate them. Another important dynamical aspect consists in the dynamics that may occur on the graph. In this case the graph itself is often considered as static, but both dynamical aspects could be combined, and one may consider a dynamics taking place on a rearranging graph, potentially giving rise to an interesting and non-trivial interplay between both types of dynamics. For instance, one may investigate the number of passengers transported by airplanes along regular commercial airlines [10, 18]. The graph is that of airline connections between airports. It may be considered as a static graph on short time scales, but on longer time scales airline connections may be added or removed depending on the passenger traffic, thus leading to rearrangements of the network. In the last two decades, researchers have notably focused on specific types of dynamical processes occurring on complex networks, like the spreading of an epidemic or of a rumor in a population. In this case the network represents the contacts between individuals, along which a disease or a piece of information may be transmitted. We discuss below a simple model of epidemic spreading, the Susceptible-Infected-Recovered (SIR) model. We start with the elementary mean-field version of the model, and then discuss how the model can be generalized to heterogeneous networks, leading to significant changes in the phenomenology [11, 48]. We then briefly discuss how rumor propagation may be described in a similar framework, albeit with a few important differences.
6.2.1 Basic Description of Epidemic Spreading: The SIR Model Epidemic spreading through a population is obviously an important issue, and simple models attempting a description of this phenomenon have been proposed a long time ago [6, 9, 20]. Similar frameworks have also been proposed for the spreading of computer viruses [8, 30, 49]. Basic descriptions rely on a classification of individuals in the population as susceptible (S), infected (I) and in some cases recovered (R). Susceptible individuals have not yet contracted the disease, and they can become infected if they are in contact with an infected individual, which is by definition contagious. At a later stage, the infected individual may spontaneously recover from the disease after some time. The recovered individual is no longer contagious and no longer susceptible (it cannot be infected anymore). Alternatively, the infected individual may evolve again to a susceptible state and eventually be infected again. The simplest version of the model is the susceptible-infected (SI) model, which aims at describing the early stages of the epidemic spreading. Slightly more involved versions include the susceptible-infected-susceptible (SIS) model as well as the susceptible-infected-recovered (SIR) model. In the SIS model, the epidemic spreads out and the fraction of infected individuals eventually stabilizes to a non-zero value, leading to a permanent epidemic (an endemic state). On the other side, the SIR model describes a situation when the individuals are protected from the epidemic once they have recovered, leading to an eventual disappearance of the epidemic at long time. We focus below on the description of the SIR model, but the SI and SIS models are treated in a similar way. Models of epidemic spreading actually belong to the class of reaction–diffusion models, as described in Sect. 3.2.2. They can be described by analogues of chemical reactions. For the SIR model, one has

S + I → 2I   with rate β,   (6.16)
I → R   with rate μ.   (6.17)
In words, a susceptible individual S in contact with an infected individual I becomes infected with a rate β. In addition, an infected individual spontaneously recovers with a rate μ. Note that if initially no infected individual is present, there is no dynamics and all individuals remain susceptible. This is the analogue of the absorbing state in standard reaction–diffusion processes. To describe the dynamics quantitatively, we introduce the time-dependent fractions ρS (t), ρI (t) and ρR (t) of susceptible, infected and recovered individuals, respectively. By definition, ρS (t) + ρI (t) + ρR (t) = 1. At a mean-field level, the evolution equations for ρS (t), ρI (t) and ρR (t) take the simple form:
dρ_S/dt = −β⟨k⟩ ρ_I ρ_S,   (6.18)
dρ_I/dt = −μρ_I + β⟨k⟩ ρ_I ρ_S,   (6.19)
dρ_R/dt = μρ_I.   (6.20)
Note that the contamination rate β is reweighted by the number k of contacts of a given individual, which in a mean-field spirit has been replaced by the average number ⟨k⟩ of contacts, thereby neglecting degree fluctuations in the contact network. It is straightforward to check that this dynamics conserves the sum ρ_S + ρ_I + ρ_R, so that there are only two independent dynamical equations. The above dynamical equations are non-linear. However, it is of interest to study the first stage of evolution of the epidemic by linearizing the equations for small ρ_I and ρ_R, when almost all individuals are in the susceptible state (ρ_S ≈ 1). One then finds a closed linearized evolution equation for ρ_I,

dρ_I/dt = (β⟨k⟩ − μ) ρ_I.   (6.21)

Hence a threshold appears in the linearized dynamics. If β⟨k⟩ < μ, then the fraction ρ_I of infected individuals decays exponentially to zero, and the epidemic does not spread out, because infected individuals recover fast enough and do not have time to infect a significant number of other individuals while they are contagious. In contrast, if β⟨k⟩ > μ, the fraction ρ_I of infected individuals grows exponentially with time, ρ_I(t) ∝ e^{t/τ}, since infected individuals contaminate a larger number of other individuals. The growth time τ, given by

τ = 1/(β⟨k⟩ − μ),   (6.22)
is called the typical outbreak time [6, 11]. Note that, quite importantly, at very early stages when only a tiny number of individuals are infected, Eq. (6.21) is no longer valid because stochastic effects cannot be neglected. The starting point of the exponential growth thus occurs randomly in time and cannot be predicted from Eq. (6.21). At later stages, the exponential growth of the fraction of infected individuals saturates, and the full non-linear dynamics has to be taken into account. An exact expression of ρ_I(t) can be found in the SI model (i.e., when the recovery rate μ = 0), namely,

ρ_I(t) = 1/(1 + e^{−β⟨k⟩t}),   (6.23)

up to an arbitrary time translation. Hence the fraction of infected individuals goes to one when t → ∞, and the whole population is eventually infected. For the SIR model, an exact expression of ρ_I(t) has been found through a non-linear time reparameterization [26]. This expression involves an integral that needs to be evaluated numerically or through an accurate analytical approximant [33]. Here,
[Fig. 6.1: Plot of the fractions ρ_S, ρ_I and ρ_R of susceptible, infected and recovered individuals, respectively, as a function of time (parameters: β⟨k⟩ = 1, μ = 0.2).]
for the sake of simplicity, we only mention the following elementary approximation of ρ_I(t) for the SIR model, obtained by a simple non-linear interpolation between the early stage exponential growth ρ_I(t) ∝ e^{t/τ} and the late stage exponential decay ρ_I(t) ∝ e^{−μt} that can be obtained exactly by linearizing Eq. (6.19) in both regimes:

ρ_I(t) ≈ e^{(β⟨k⟩−μ)t} / (1 + e^{β⟨k⟩t}).   (6.24)
The fraction of infected individuals thus first increases, before decaying again to zero at long time. Within this approximation, the fraction ρS (t) of susceptible individuals may then be obtained by integrating numerically Eq. (6.18). The fraction ρR (t) of recovered individuals is then finally obtained from the relation ρR (t) = 1 − ρS (t) − ρI (t). As an illustration, the three quantities ρS (t), ρI (t) and ρR (t) evaluated according to the above approximation are plotted in Fig. 6.1.
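The mean-field equations (6.18)–(6.20) are straightforward to integrate numerically. The sketch below (added as an illustration, not part of the original text) uses a simple Euler scheme with the parameters of Fig. 6.1, β⟨k⟩ = 1 and μ = 0.2, and an arbitrary small initial fraction of infected individuals.

```python
beta_k, mu = 1.0, 0.2            # beta*<k> and recovery rate mu, as in Fig. 6.1
dt, n_steps = 1e-3, 40000        # Euler time step and number of steps (t_final = 40)

rho_S, rho_I, rho_R = 1.0 - 1e-4, 1e-4, 0.0   # small initial seed of infected individuals
for step in range(1, n_steps + 1):
    d_S = -beta_k * rho_I * rho_S                 # Eq. (6.18)
    d_I = -mu * rho_I + beta_k * rho_I * rho_S    # Eq. (6.19)
    d_R = mu * rho_I                              # Eq. (6.20)
    rho_S += dt * d_S
    rho_I += dt * d_I
    rho_R += dt * d_R
    if step % 5000 == 0:                          # print every 5 time units
        t = step * dt
        print(f"t = {t:5.1f}   rho_S = {rho_S:.3f}   rho_I = {rho_I:.3f}   "
              f"rho_R = {rho_R:.3f}   sum = {rho_S + rho_I + rho_R:.3f}")
```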
6.2.2 Epidemic Spreading on Heterogeneous Networks We have described the mean-field scenario of the SIR model, in which the number of contacts between agents is assumed constant. However, in many situations, the network of contacts may be very heterogeneous [27]. For instance, some people may stay at home most of the time and have a low number of contacts, while on the other side people working in a hospital, for instance, may have a large number of contacts (and, additionally, may have a larger probability to be infected because the fraction of infected people is locally higher than in the general population). In this subsection, we describe a way to take into account the heterogeneity of contact networks, that is, the basic fact that individuals with more contacts will spread the epidemic more efficiently, a feature not taken into account in the mean-field approach. We mostly follow the presentation given in Ref. [11].
To take connectivity into account, we consider the degree distribution P_d(k) of the contact network, that is the probability that a node on the contact network has k neighbors. Less formally, in the present context P_d(k) is simply the probability that an individual is in contact with k other individuals (we assume that k ≥ 1, i.e., all individuals have at least one contact). To describe the epidemic spreading on a heterogeneous network, one has to condition the fractions of susceptible, infected, and recovered individuals to their number k of contacts. We thus introduce the fraction ρ_k^I(t) of infected individuals among the subpopulation of individuals having k contacts. Similar fractions ρ_k^S(t) and ρ_k^R(t) can be defined for the susceptible and recovered individuals, respectively. The fraction of infected individuals at the global population level is then defined as

ρ_I = ∑_k P_d(k) ρ_k^I,   (6.25)
and similar definitions hold for the fractions of susceptible and recovered individuals. Extending the standard SIR framework, the dynamics of ρ_k^I reads [14, 37, 42, 43]

dρ_k^I/dt = −μρ_k^I + βk ρ_k^S θ_k.   (6.26)
The first term on the rhs of Eq. (6.26) is the spontaneous decay rate to the recovered state, as for the mean-field dynamics. The second term, which describes the contamination of susceptible individuals, is more subtle. Not surprisingly, it takes into account the actual number k of contacts (instead of the average ⟨k⟩ as in the mean-field scenario), as well as the fraction ρ_k^S of susceptible individuals having k contacts. The subtle point comes from the quantity θ_k, which is the probability per contact of being infected. To be more specific, θ_k is the probability that a randomly chosen contact of a susceptible individual having k contacts is infected. In the mean-field scenario, θ_k is simply equal to the global fraction ρ_I of infected individuals. Here, however, we need to take into account the information at hand. First, there are in principle correlations between degrees of neighboring nodes on a network, and we denote as P(k′|k) the conditional probability of the degree k′ of neighboring nodes of a node with degree k. In the limiting case where no correlations are present between the degrees of neighboring nodes, one has

P(k′|k) = k′ P_d(k′) / ⟨k⟩,   (6.27)
where ⟨k⟩ = ∑_{k′} k′ P_d(k′) is the average node degree, used here as the correct normalization of the distribution P(k′|k). The conditional distribution P(k′|k) is thus independent of k in this case. Yet, note that even in the absence of degree correlations one has P(k′|k) ≠ P_d(k′), because nodes with a larger degree have a higher probability (proportional to their degree) to be connected to any given node. In addition, a second piece of information to be taken into account is that the node
with degree k′ is also in contact with the original node of degree k, which is assumed to be in the susceptible state. Hence it means that the conditional probability that the node of degree k′ is infected, given that it is connected to at least one susceptible node, is not exactly ρ_{k′}^I, but should be smaller. In particular, if k′ = 1, the node is only connected to a susceptible node and cannot be infected. If k′ = 2, it can be infected, but still with a somewhat smaller probability. On the other side, for large k′ the fact that one of the neighbors is known to be susceptible should essentially not affect the probability to be infected. As a simple ansatz satisfying the above behavior, we write the probability that a node of degree k′ is infected, given that it is connected to a susceptible node, as (1 − 1/k′) ρ_{k′}^I. Then the probability θ_k is obtained by averaging over k′, yielding

θ_k = ∑_{k′=1}^{∞} [(k′ − 1)/k′] ρ_{k′}^I P(k′|k).   (6.28)

When correlations between node degrees are absent as in Eq. (6.27), θ_k becomes independent of k and, dropping the index k, its expression reduces to

θ = (1/⟨k⟩) ∑_{k′=1}^{∞} (k′ − 1) ρ_{k′}^I P_d(k′).   (6.29)
A closed evolution equation for θ can be obtained in the early stage of the epidemic by taking the time derivative of θ and using Eq. (6.26). Under the assumption that ρ_k^I ≪ 1 (and thus θ ≪ 1) and that ρ_k^R ≪ 1, one can approximate ρ_k^S ≈ 1 and obtain the linearized evolution equation for θ,

dθ/dt = θ/τ,   τ = ⟨k⟩ / [β⟨k²⟩ − (β + μ)⟨k⟩].   (6.30)
It follows that θ(t) ∝ e^{t/τ}, in turn leading to ρ_k^I ∝ e^{t/τ}, with a k-dependent prefactor. The condition for the epidemic outbreak, namely, the positivity of the growth rate τ^{−1}, reads [11]

β/μ > ⟨k⟩ / (⟨k²⟩ − ⟨k⟩).   (6.31)

For a Poissonian degree statistics [see Eq. (6.5)], which corresponds to the simplest random networks, Var(k) ≡ ⟨k²⟩ − ⟨k⟩² = ⟨k⟩, so that one recovers the mean-field criterion β/μ > 1/⟨k⟩. However, for very heterogeneous networks, ⟨k²⟩ may be much larger than ⟨k⟩ and the epidemic threshold becomes much lower than its mean-field value. In the limit of an infinite size network, with a power-law degree distribution, one may even have an infinite value for ⟨k²⟩ while ⟨k⟩ keeps a finite value. In this case, the epidemic outbreak occurs for any value of β and μ. Of course, real networks are always finite and then so is ⟨k²⟩, but the effect of network heterogeneity on decreasing the epidemic threshold can be very strong. Further, it can also be shown that the epidemic spreading proceeds by a sort of cascade, infecting first
nodes with the largest degrees and thus progressively spreading toward nodes with smaller and smaller degrees [12, 13].
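As a quick numerical illustration of Eq. (6.31), the following minimal Python sketch (added here for illustration, not part of the original text; the distributions, exponents and cutoff are arbitrary choices) computes the threshold (β/μ)_c = ⟨k⟩/(⟨k²⟩ − ⟨k⟩) for truncated power-law degree distributions and shows how it collapses as the distribution becomes more heterogeneous.

```python
import numpy as np

def threshold(k, pk):
    # Epidemic threshold (beta/mu)_c = <k> / (<k^2> - <k>), Eq. (6.31)
    k1 = np.sum(k * pk)
    k2 = np.sum(k ** 2 * pk)
    return k1 / (k2 - k1)

k = np.arange(1, 10**5 + 1)          # degrees up to an (arbitrary) cutoff

for gamma in (2.2, 2.5, 3.0):
    pk = k ** (-gamma)               # truncated power-law degree distribution (illustrative)
    pk /= pk.sum()
    print(f"gamma = {gamma}: threshold = {threshold(k, pk):.4f}, <k> = {np.sum(k * pk):.2f}")

# For a Poissonian distribution one has <k^2> - <k> = <k>^2,
# so the mean-field value 1/<k> is recovered.
```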
6.2.3 Rumor Propagation on Social Networks

We now briefly discuss a different type of propagation phenomenon on networks, namely, the propagation of a rumor on a social network. It has been argued that rumor propagation on a social network could in some circumstances be modeled in a similar way as epidemic spreading, with however some important differences, as discussed below. A first difference is that a rumor is propagated intentionally, at odds with viral diseases, but it is not clear that this aspect alone should lead to differences at the crude level of modeling we are considering here. Note that rumor is intended here in a very broad sense: it could be any piece of knowledge (including, for instance, a behavioral habit) that is propagated to other individuals one is in contact with. Along this line, a fast propagation of some type of information, for instance, the knowledge on how to use a newly available technology, may be desirable. This is at odds with epidemic spreading, where one tries to slow down the spreading as much as possible. Keeping in mind the formal analogy with epidemics, one may be tempted to define different states of the individuals [20–22, 36]. In a social context, the equivalent of the susceptible individual is now an “ignorant” individual (I), who has not yet learned the information that may propagate. Then, the analogue of an infected individual is called a “spreader” (S), i.e., an individual that can propagate rumors.¹ These two social states would allow one to define an analogue of the SI model of epidemics. It is interesting, though, to go one step further and to define the analogue of a recovered state, namely, individuals who are aware of the propagated knowledge, but no longer spread it themselves. Such individuals are sometimes called “stiflers”, and are denoted by the letter R (S already being used). Having specified the analogy, one may propose a dynamics of transitions between these categories. The following dynamics has been proposed,
\[ I + S \to 2S \quad \text{with rate } \lambda, \qquad (6.32) \]
\[ S + R \to 2R \quad \text{with rate } \alpha, \qquad (6.33) \]
\[ S + S \to R + S \quad \text{with rate } \alpha. \qquad (6.34) \]
It is then straightforward to write mean-field equations for the dynamics of the fractions ρI , ρS and ρR of ignorant, spreader and stifler individuals, respectively, along the same lines as in the case of epidemic spreading [see Eqs. (6.18), (6.19) and (6.20)]. One finds
¹ Unfortunately, these rather natural denominations invert the roles of the notations S and I with respect to the epidemic case, possibly leading to some confusion.
\[ \frac{d\rho_I}{dt} = -\lambda \langle k\rangle \rho_S \rho_I \,, \qquad (6.35) \]
\[ \frac{d\rho_S}{dt} = \lambda \langle k\rangle \rho_S \rho_I - \alpha \langle k\rangle \rho_S (\rho_S + \rho_R) \,, \qquad (6.36) \]
\[ \frac{d\rho_R}{dt} = \alpha \langle k\rangle \rho_S (\rho_S + \rho_R) \,. \qquad (6.37) \]
The main difference with the equations describing the epidemic spreading is that the transition from spreader to stifler is no longer spontaneous like the transition from infected to recovered, but is triggered by the contact of a spreader either with another spreader or with a stifler. An important consequence is that the associated term in Eq. (6.36), being non-linear, disappears from the linearized dynamics at early stages of the propagation, when both ρ_S and ρ_R are small. In this regime, the linearized equation for ρ_S reads
\[ \frac{d\rho_S}{dt} = \lambda \langle k\rangle \rho_S \,, \qquad (6.38) \]
leading to an early stage exponential growth of ρ_S for any propagation rate λ > 0. Hence there is no threshold for the spreading of the rumor, at odds with the SIR model of epidemic spreading where the coefficient β (the equivalent of λ) has to be larger than a threshold value. Beyond mean field, it is also possible to study the effect of network heterogeneity on rumor propagation. The effect is less spectacular here than in the SIR model, because of the absence of threshold for rumor spreading. In addition, network heterogeneity is much more difficult to tackle analytically, even for networks with uncorrelated degrees. However, network heterogeneity may still produce interesting effects. While the presence of hubs (i.e., nodes with a large connectivity) naturally leads to an enhanced propagation of the epidemic, their effect on the rumor spreading is more complex. On one side, hubs favor rumor spreading like for epidemics, but on the other side, they also increase the number of contacts among spreaders or between spreaders and stiflers, thereby increasing the number of stiflers who no longer propagate the rumor. This non-linear amplification effect is absent from the SIR model, where the node degree has no effect on recovery. It is also of interest to determine the final fraction of individuals that have been reached by the information or rumor propagation. This quantity is obtained as the infinite time limit ρ_R^∞ of the fraction of stiflers [55]. To lighten notations, we define ρ_a^∞ = lim_{t→∞} ρ_a(t) and ρ_a^{-∞} = lim_{t→-∞} ρ_a(t) (a = I, S or R). Using ρ_S + ρ_R = 1 − ρ_I and assuming ρ_S^∞ = 0, Eq. (6.35) can be formally integrated as
\[ \rho_I^{\infty} = \rho_I^{-\infty}\, e^{-\lambda \langle k\rangle \int_{-\infty}^{\infty} \rho_S(t)\, dt} \,. \qquad (6.39) \]
On the other hand, Eq. (6.37) can be transformed into
\[ \frac{d\rho_R}{dt} = \alpha \langle k\rangle \rho_S + \frac{\alpha}{\lambda} \frac{d\rho_I}{dt} \,. \qquad (6.40) \]
By integrating this last equation over the whole real axis, using the initial conditions ρ_R^{-∞} = 0 and ρ_I^{-∞} = 1 as well as again the final condition ρ_S^∞ = 0, we get
\[ \alpha \langle k\rangle \int_{-\infty}^{\infty} \rho_S(t)\, dt = \left(1 + \frac{\alpha}{\lambda}\right) \rho_R^{\infty} \,. \qquad (6.41) \]
Combining Eqs. (6.39) and (6.41) and using the relation ρ_I^∞ = 1 − ρ_R^∞, we finally obtain a simple closed equation for ρ_R^∞ [11, 55],
\[ \rho_R^{\infty} = 1 - e^{-\beta \rho_R^{\infty}} \,, \qquad (6.42) \]
with β = 1 + λ/α. This last equation can be rewritten as f(ρ_R^∞) = 0, introducing the function f(x) = x − 1 + e^{−βx}. The value ρ_R^∞ = 0 is always a possible solution, and we are interested in the existence of a positive solution ρ_R^∞ > 0. As f(0) = 0 and f′(0) < 0, while f(x) → ∞ when x → ∞, at least one non-trivial solution ρ_R^∞ > 0 exists. Expanding f(x) to second order in x for x → 0, we get that for λ/α ≪ 1,
\[ \rho_R^{\infty} \approx \frac{2\lambda}{\alpha} \,. \qquad (6.43) \]
In the opposite limit λ/α ≫ 1, one obtains
\[ \rho_R^{\infty} \approx 1 - e^{-1-\lambda/\alpha} \,. \qquad (6.44) \]
One thus finds that the final fraction of individuals reached by the rumor continuously increases from 0 to 1 as the ratio λ/α is increased.
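The closed equation (6.42) has no explicit solution, but it is easy to solve numerically. The short Python sketch below (an illustration added here, not part of the original text; parameter values are arbitrary) iterates the fixed-point relation ρ → 1 − e^{−βρ} starting from a non-zero guess and compares the result with the two limiting behaviors (6.43) and (6.44).

```python
import numpy as np

def final_fraction(lam_over_alpha, n_iter=20000):
    beta = 1.0 + lam_over_alpha          # beta = 1 + lambda/alpha, Eq. (6.42)
    rho = 0.5                            # non-zero initial guess avoids the trivial root rho = 0
    for _ in range(n_iter):
        rho = 1.0 - np.exp(-beta * rho)  # fixed-point iteration of Eq. (6.42)
    return rho

for r in (0.01, 0.1, 1.0, 5.0):          # r = lambda/alpha (illustrative values)
    small_r = 2 * r                      # limit (6.43), valid for r << 1
    large_r = 1 - np.exp(-1 - r)         # limit (6.44), valid for r >> 1
    print(f"lambda/alpha = {r}: rho_R = {final_fraction(r):.4f}, "
          f"small-r approx = {small_r:.4f}, large-r approx = {large_r:.4f}")
```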
6.3 Formal Neural Networks

6.3.1 Modeling a Network of Interacting Neurons

One of the most important and complicated examples of dynamics occurring on a complex network is probably that of the brain, which may be thought of as a large number of neurons connected by synapses. Neurons have an electrical activity, which is transmitted by synapses to other neurons in a unidirectional way, meaning that the information cannot flow backwards along the same synapse. Quite importantly, synapses may either be excitatory or inhibitory. If a synapse connects a neuron i to another neuron j, an increase of electrical activity in neuron i enhances the activity of j if the synapse is excitatory, while it tends to suppress it if the synapse is inhibitory. It is convenient to quantify the electrical activity of neurons by a function v_i(t) (i = 1, ..., N, where N is the number of neurons). The stimulus exerted on neuron i by other neurons j connected to i can be written, assuming additivity,
\[ S_i(t) = \sum_{j \in C_i} J_{ij}(t)\, v_j(t) \,, \qquad (6.45) \]
where Ji j (t) is the synaptic strength (that may depend on time) and Ci is the set of neurons connected to i. Then vi (t) is assumed to evolve according to a non-linear dynamics that depends on the stimulus Si (t) [47]. A description of the time evolution of the synaptic strength Ji j (t) should also be given. The set of all these evolution equations provides a simplified model of brain, which is already extremely complex. In order to try to understand some aspects of the collective behavior of neurons, much simpler models that already capture key features of the underlying complexity have been proposed. For instance, the Hopfield model (discussed below in Sect. 6.3.2) tries to understand how the memory of externally received information can be encoded into the synaptic strengths and recovered through the dynamics of the electrical activity vi (t). The Hopfield model uses the language of spin-glasses (see Sect. 1.5) for which a formalism has been developed to deal with the complexity resulting from the presence of quenched disorder. Obviously, the mapping to a spinglass framework also induces some significant limitations and simplifications, and the resulting models cannot address all questions of biological relevance. They nevertheless constitute interesting attempts to tame part of the complexity of real neural networks. The basic idea of the Hopfield model is to replace the continuous voltages vi (t) by binary variables corresponding to zero or maximal electrical activity, and to replace the time-dependent synaptic strengths by time-independent couplings that explicitly encode some (random) stored patterns, with the aim that the dynamics of the binary variables could retrieve these stored patterns under some circumstances. For a short introductory lecture on the modeling of neural networks in a spin-glass framework, see, e.g., Ref. [47]. A much more detailed account can be found, for instance, in the classical book by Amit [3].
6.3.2 Asymmetric Diluted Hopfield Model

We now describe in more detail a specific version of the Hopfield model, the asymmetric diluted Hopfield model. The Hopfield model (or Little–Hopfield model) [28, 29, 35] is a spin model composed of N spins σ_i = ±1 (i = 1, ..., N), with random coupling constants J_ij encoding some stored patterns. Due to the analogy with spin-glass models, the binary variables describing neural activity are chosen to take ±1 values, instead of the more natural values 0 and 1, but this just corresponds to a linear change of variables. The original version of the Hopfield model [28, 29, 35] considered symmetric couplings in a fully connected geometry, and its properties can be analyzed in the framework of equilibrium spin-glasses [4, 5]. Versions of the Hopfield model defined with asymmetric couplings, which no longer map to equilibrium spin-glasses, have also been considered [23, 46]. We consider here a diluted asymmetric version of the Hopfield model [23], in which most couplings J_ij = 0 and non-zero couplings are most often not symmetric, meaning that J_ij and J_ji may
be different. The asymmetry of the couplings implies that the model is not defined by a Hamiltonian as usual equilibrium spin-glass models, but through stochastic dynamical rules. It is useful to define a local field h_i as
\[ h_i = \sum_{j\, (\ne i)} J_{ij} \sigma_j \,. \qquad (6.46) \]
For the sake of simplicity, we focus here on a parallel update rule, but asynchronous updates could equally be considered [23]. In the parallel stochastic update scheme, the spin σ_i(t+1) = s with probability [1 + e^{−2sβh_i(t)}]^{-1} (s = ±1), where β is an inverse temperature (it may be considered as an effective temperature rather than a genuine thermodynamic temperature). It follows that the average value of σ_i(t+1) is given by
\[ \langle \sigma_i(t+1) \rangle = \tanh\big(\beta h_i(t)\big) \,. \qquad (6.47) \]
The specificity of the Hopfield model is that coupling constants actually encode a number of stored patterns. We define p patterns ξ^μ, μ = 1, ..., p, with values ξ_i^μ = ±1 on site i = 1, ..., N. All pattern variables ξ_i^μ are independent random variables that do not depend on time, therefore constituting a quenched disorder in the model. The coupling J_ij is then defined in terms of the patterns ξ^μ as [23]
\[ J_{ij} = C_{ij} \sum_{\mu=1}^{p} \xi_i^{\mu} \xi_j^{\mu} \,, \qquad (6.48) \]
where C_ij is a (time-independent) random variable equal to 1 with probability λ/N, and equal to 0 with probability 1 − λ/N, where λ is a fixed parameter, assumed to be smaller than N. Note that C_ij and C_ji are independent random variables, so that in most instances J_ij ≠ J_ji. Hence the random constants C_ij encode the asymmetry as well as the dilution (most coupling constants J_ij = 0) in this version of the model. Note also that the original version of the Hopfield model was symmetric, with C_ij = 1/N for all i, j. A consequence of the dilution property is that the number K of sites j connected to site i (in the sense that J_ij ≠ 0) is a random variable distributed, in the large N limit, according to a Poisson distribution (see also Sect. 6.1.1)
\[ P_d(K) = \frac{\lambda^K}{K!}\, e^{-\lambda} \,. \qquad (6.49) \]
In particular we have for the average value K̄ = λ (we use an overbar here to denote the average over the couplings, to distinguish it from the dynamical average ⟨· · ·⟩ over the spin dynamics). At a qualitative level, the idea of the Hopfield model is that if the initial spin configuration has a non-zero overlap (i.e., some similarity) with one of the stored patterns, say ξ^1, then it will evolve to a steady state which keeps a non-zero overlap with this pattern (while if the pattern was not stored, the overlap would progressively disappear). To be more quantitative, we define the overlap m(t)
of the average spin configuration ⟨σ_i(t)⟩ with the pattern ξ^1,
\[ m(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^1 \langle \sigma_i(t) \rangle \,. \qquad (6.50) \]
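Before turning to the analytical treatment, the retrieval property can be checked by a direct simulation. The following Python sketch (added for illustration; the system size and parameter values are arbitrary choices and are not taken from Ref. [23]) builds the diluted asymmetric couplings of Eq. (6.48), runs the parallel stochastic dynamics, and monitors the instantaneous overlap with the first stored pattern, starting from a noisy version of that pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

N, lam, alpha, beta_tilde = 2000, 50, 0.3, 3.0   # illustrative values
p = int(alpha * lam)                 # number of stored patterns, p = alpha * K_bar
beta = beta_tilde / lam              # scaling beta = beta_tilde / K_bar, Eq. (6.56)

xi = rng.choice([-1, 1], size=(N, p))             # stored patterns xi_i^mu
C = (rng.random((N, N)) < lam / N).astype(float)  # asymmetric dilution C_ij
np.fill_diagonal(C, 0.0)
J = C * (xi @ xi.T)                               # couplings, Eq. (6.48)

sigma = xi[:, 0].copy()                           # start close to pattern 1 ...
flip = rng.random(N) < 0.3                        # ... with 30% of the spins flipped
sigma[flip] *= -1

for t in range(30):
    m = np.mean(xi[:, 0] * sigma)                 # overlap with pattern 1, cf. Eq. (6.50)
    print(t, round(m, 3))
    h = J @ sigma                                 # local fields, Eq. (6.46)
    prob_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # parallel stochastic update rule
    sigma = np.where(rng.random(N) < prob_up, 1, -1)
```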
We assume that m(t) takes for t = 0 a non-zero macroscopic value (i.e., a value that remains non-zero in the limit N → ∞), while the initial overlaps with the other stored patterns ξ^μ (μ = 2, ..., p) vanish in the macroscopic limit. We then try to derive an evolution equation for m(t). Using Eq. (6.47), m(t+1) can be written as
\[ m(t+1) = \frac{1}{N} \sum_{i=1}^{N} \tanh\big( \xi_i^1 \beta h_i(t) \big) \,, \qquad (6.51) \]
where ξ_i^1 = ±1 can be written inside the hyperbolic tangent using parity properties. Defining the variable h̃_i(t) = ξ_i^1 h_i(t), and assuming that for all i, h̃_i(t) is randomly drawn from the same distribution P(h̃, t), we obtain in the limit N → ∞ that Eq. (6.51) can be rewritten as
\[ m(t+1) = \int_{-\infty}^{\infty} d\tilde{h}\, P(\tilde{h}, t) \tanh(\beta \tilde{h}) \,, \qquad (6.52) \]
by replacing the empirical average over i by an ensemble average, in the spirit of the Law of Large Numbers (see Sect. 8.1.1). For a given site i, let us define the K sites j_1, ..., j_K connected to i (i.e., such that J_{ij} ≠ 0). We recall that K is a random variable distributed according to the Poisson distribution (6.49). Using Eqs. (6.46) and (6.48), we obtain h̃_i(t) = Σ_{r=1}^{K} x_{r,i}, where we have defined
\[ x_{r,i} = \xi_{j_r}^1 \sigma_{j_r}(t) + \sum_{\mu=2}^{p} \xi_i^1 \xi_i^{\mu} \xi_{j_r}^{\mu} \sigma_{j_r}(t) \,. \qquad (6.53) \]
For large λ, K is typically large, and h̃_i(t) is a sum of a large number of identical random variables. One thus expects that the distribution P(h̃, t) converges to a Gaussian distribution in this limit. To quantitatively determine this distribution, we evaluate the mean and variance of the random variable x_{r,i}. To simplify the presentation, we use the annealed approximation, meaning that we average both on the spin variables σ_j = ±1 and on the pattern variables ξ_i^μ = ±1 (in principle, the pattern variables are quenched random variables and should be averaged over only at the end of the calculation). For the sake of simplicity, we also assume that the local overlap m_j(t) = ⟨σ_j(t)⟩ is site-independent, and is thus equal to the average overlap m(t). Within this approximation, one easily obtains x̄_{r,i} = m(t) and Var(x_{r,i}) = p − m(t)². One can also show that x_{r,i} and x_{r′,i′} are uncorrelated for (i, r) ≠ (i′, r′).
For large λ, the Poissonian distribution P_d(K) is sharply peaked around the average value K̄, and we can then approximate K by K̄. Knowing the first two cumulants x̄_{r,i} and Var(x_{r,i}), the Gaussian random variable h̃ can be conveniently parametrized as
\[ \tilde{h} = \bar{K} m + \sqrt{\bar{K}(p - m^2)}\, z \,, \qquad (6.54) \]
where z is a centered Gaussian random variable with unit variance. This result comes from the Central Limit Theorem (see Sect. 8.1.1). It follows that Eq. (6.52) can be rewritten as m(t+1) = f(m(t)), with a function f(m) defined as
\[ f(m) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dz\, e^{-z^2/2} \tanh\Big[ \beta \big( \bar{K} m + \sqrt{\bar{K}(p - m^2)}\, z \big) \Big] \,. \qquad (6.55) \]
To get a smooth function f(m) for large K̄, it is useful to consider that p is large as well, and that β is small (high effective temperature). We thus assume the following scaling relations between p, β and K̄,
\[ p = \alpha \bar{K} \,, \qquad \beta = \frac{\tilde{\beta}}{\bar{K}} \,, \qquad (6.56) \]
with α and β̃ two fixed parameters, and K̄ ≫ 1 (note that the limit N → ∞ is taken first, before the large K̄ limit). Under these scaling assumptions, the function f(m) defined in Eq. (6.55) converges in the large K̄ limit to the smooth function [23]
\[ f(m) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dz\, e^{-z^2/2} \tanh\big[ \tilde{\beta} (m + \sqrt{\alpha}\, z) \big] \,. \qquad (6.57) \]
It is easy to show that f(−m) = −f(m), as expected from general symmetry considerations. In the long time limit, the dynamics of m(t) converges to a fixed point satisfying f(m) = m. One has f(0) = 0, so that m = 0 is always a fixed point. Further, as f(m) → 1 when m → ∞, a non-zero fixed point m* is expected if f′(0) > 1. Evaluating the derivative f′(0), one finds with the change of variable y = β̃√α z that
\[ f'(0) = \frac{1}{\sqrt{2\pi\alpha}} \int_{-\infty}^{\infty} dy\, e^{-y^2/(2\tilde{\beta}^2\alpha)} \big[ 1 - (\tanh y)^2 \big] \,. \qquad (6.58) \]
For large β̃ (taken after the limit K̄ → ∞), one finds f′(0) = √(2/(πα)). The condition f′(0) = 1 thus defines the critical value α_c = 2/π ≈ 0.637 [23]. For α > α_c, f′(0) < 1 and there is no fixed point solution m* ≠ 0. In contrast, for α < α_c, f′(0) > 1 and a fixed point solution m* ≠ 0 exists. By further expanding f(m) in powers of m for small m, it can be shown that m* ∼ (α_c − α)^{1/2}, for 0 < α_c − α ≪ 1 [23]. We recall that α compares the average connectivity K̄ with the number p of stored patterns. Hence when p < α_c K̄, the memory of the stored pattern ξ^1 is at least partially recovered dynamically, in the sense that the spin state reached dynamically
has a non-zero overlap with the selected stored pattern. By contrast, when p > α_c K̄, the pattern is not recovered dynamically. In a sense, the memory capacity is exceeded. Interestingly, for α ≪ α_c, a small initial overlap m of the spin configuration σ_i with the stored pattern ξ_i^1 leads to an almost full dynamical recovery of the pattern, since m* ≈ 1.
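The self-consistent map m(t+1) = f(m(t)) of Eq. (6.57) is also easy to explore numerically. The sketch below (an added illustration; the quadrature order and parameter values are arbitrary choices) evaluates the Gaussian average by Gauss–Hermite quadrature and iterates the map, so that the non-zero fixed point can be seen to disappear close to α_c = 2/π ≈ 0.637 when β̃ is large.

```python
import numpy as np

# Nodes/weights for integrals of the form int dz exp(-z^2/2) g(z)
z_nodes, z_weights = np.polynomial.hermite_e.hermegauss(200)

def f(m, alpha, beta_tilde):
    # f(m) of Eq. (6.57), with the Gaussian average done by quadrature
    vals = np.tanh(beta_tilde * (m + np.sqrt(alpha) * z_nodes))
    return np.sum(z_weights * vals) / np.sqrt(2.0 * np.pi)

def fixed_point(alpha, beta_tilde, m0=0.8, n_iter=500):
    m = m0
    for _ in range(n_iter):
        m = f(m, alpha, beta_tilde)
    return m

for alpha in (0.3, 0.5, 0.6, 0.7):
    print(f"alpha = {alpha}: m* = {fixed_point(alpha, beta_tilde=30.0):.3f}")
# Below alpha_c ~ 0.637 a non-zero overlap survives; above it, m* drops to (nearly) zero.
```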
6.3.3 Perceptron and Constraint Satisfaction Problem

We have seen above how some basic aspects of memory could be modeled in terms of formal neural networks, within the underlying framework of spin-glass theory. Another application of neural networks concerns classification problems, and the simplest model of neural network addressing this issue is the so-called Perceptron model. Classification tasks can be easily understood with the following simple example. Suppose you have two sets of images all in the same format, the first set being pictures of cats and the second set pictures of dogs. In a computer, these images are stored as vectors of numbers. Now the question is: could a formal neural network classify these pictures (i.e., the corresponding vectors) into two sets, the pictures of cats and the pictures of dogs? In practice, one would give an image as input, and the neural network would give a binary variable as output, say −1 for pictures of cats and 1 for pictures of dogs. Of course, if such a classification by a neural network is possible, it should require some training. So the actual procedure would rather be to train the network on a set of pictures whose category is known, in order to set the parameters of the neural network. Once this training is performed, one could try new pictures as inputs (without changing the parameters of the network) and see whether the output correctly predicts the category of the picture. This general procedure is precisely the one used in the Perceptron model, which is defined as follows. A set of p training patterns (ξ^1, ..., ξ^p) is introduced, where each ξ^k is a vector with N components ξ_i^k (i = 1, ..., N). In spirit, N could be the number of pixels in a black and white picture, for instance, and ξ_i^k the corresponding pixel value. Yet, statistical physicists have mostly focused on the specific situation where ξ^k takes binary values, say ξ_i^k = ±1 (in the picture example, this would be an extreme case where pixels can only be black or white, with no intermediate gray scale). The patterns are classified into two classes, and we define a binary variable σ_k (k = 1, ..., p) which indicates the class the pattern belongs to: σ_k = −1 if pattern ξ^k belongs to the first class, while σ_k = 1 if it belongs to the second class. The Perceptron is defined by the following output function f, which associates a binary output f(ξ) = ±1 with an arbitrary input vector pattern ξ (with binary components ξ_i = ±1) [39],
\[ f(\xi) = \operatorname{sign}\left( \sum_{i=1}^{N} w_i \xi_i \right) \,. \qquad (6.59) \]
Here the parameters w_i are N adjustable weights (analogous to the synaptic strengths) that need to be determined using the set of training patterns in order to satisfy the following conditions,
\[ f(\xi^k) = \sigma_k \,, \qquad k = 1, \ldots, p \,. \qquad (6.60) \]
Once the weights w_i have been determined from the training set, the function f(ξ) can be used to make predictions on new patterns. Note that the training phase where weights are progressively adjusted to match the output is an example of the more general notion of supervised learning. Determining the weights w_i that match conditions (6.60) is a difficult task that has to be performed numerically and requires powerful algorithms when N is large. Such efficient algorithms have been known for a long time when the weights w_i are allowed to take real values [51]. However, the situation is more complicated when the weights can take only binary values w_i = ±1 (this restricted choice is motivated in part by the fact that some real synapses have a finite number of discrete stable states [45, 50]). For such binary weights, it has been shown theoretically that the training phase of the perceptron could be successfully performed, in the limit N → ∞, for p/N ≡ α < α_c ≈ 0.83 [32]. This means that for α < α_c a set of weights w_i satisfying conditions (6.60) in principle exists, although finding it in practice for large N is difficult. Efficient algorithms allowing this task to be solved even for large systems have been developed in the last 20 years [17]. They are based on the so-called message passing procedures [38]. Roughly speaking, message passing procedures consist in writing self-consistency relations on some probability distributions of the individual variables. In their simplest instances, like the Belief Propagation algorithm, the self-consistency relations are exact only on tree-like graphs. Yet, generalizations taking into account loops or correlations have also been proposed [16, 57]. Let us mention that finding the weights w_i that satisfy Eq. (6.60) is a typical example of a general problem called constraint satisfaction problem. A more generic instance of constraint satisfaction problem may be formulated in terms of logical clauses as follows [39]. Consider N Boolean variables x_i, taking by definition one of the two logical values x_i = T (“true”) or x_i = F (“false”). This is equivalent to having binary spin variables taking ±1 values, but the Boolean formulation is more natural in this context. The constraints to be satisfied take the form of logical clauses that typically combine some of the variables x_i, or their negation x̄_i, using logical “OR” functions, like, for instance, x_{i_1} ∨ x̄_{i_2} ∨ ... ∨ x_{i_K} if the clause involves K distinct variables. If there exists a configuration of the N Boolean variables x_i such that all clauses are valid, the list of constraints is said to be “SAT” (for “satisfied”), while it is “UNSAT” otherwise. In the latter case one may try to find configurations of variables x_i that satisfy the largest possible number of clauses. Note that satisfiability problems are related to graph theory through the so-called factor graph representation [39], which translates the list of constraints into a graph. This representation involves heterogeneous graphs with different types of nodes and links: nodes may represent either a variable or a logical OR function, and links
connecting a variable to a function encode whether the variable is negated or not [34]. Satisfiability problems have been studied either according to a worst case analysis [19] or by studying typical cases [38]. The latter approach focuses on typical realizations of the random clauses, while the former also includes atypical realizations that are rare for large N. Statistical physics methods developed in the field of spin-glasses are well-suited to study typical realizations of the random clauses [39]. An important example of problems of this type is the so-called random K-SAT problem, in which all p clauses involve exactly K variables x_i randomly chosen among the N variables. Each selected variable x_i is negated with a probability 1/2. It is again convenient to define α = p/N and to take the large N limit at fixed α. In the case K = 3, for instance, one finds for almost all realizations of the random clauses that for α < α_c ≈ 4.3 all clauses are satisfied, while for α > α_c the clauses cannot all be simultaneously satisfied [25, 40, 41]. For finite N, the transition takes the form of a smooth crossover as a function of α [31, 53]. The sharp transition observed in the infinite N limit has been interpreted as a phase transition, for which statistical physics tools are well-suited [40, 41]. The interested reader may find more information, for instance, in the book by Mézard and Montanari [38].
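For small systems, the binary-weight perceptron can be probed by brute force. The Python sketch below (added as an illustration; N, the pattern numbers and the sample size are arbitrary choices) enumerates all binary weight vectors of a small perceptron and estimates the fraction of random training sets satisfying conditions (6.60), for values of α = p/N around the asymptotic capacity α_c ≈ 0.83; finite-size effects are of course strong at such small N.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N = 15                                           # odd N avoids ties in the sign function
W = np.array(list(product((-1, 1), repeat=N)))   # all 2^N binary weight vectors

def separable(xi, sigma):
    # Does some w in {-1,1}^N satisfy sign(w . xi^k) = sigma_k for all k? (Eqs. 6.59-6.60)
    outputs = np.sign(W @ xi.T)                  # shape (2^N, p)
    return np.any(np.all(outputs == sigma, axis=1))

n_samples = 50
for p in (5, 10, 13, 16):                        # alpha = p/N straddles the predicted 0.83
    ok = sum(separable(rng.choice([-1, 1], size=(p, N)),
                       rng.choice([-1, 1], size=p)) for _ in range(n_samples))
    print(f"alpha = {p/N:.2f}: satisfiable fraction = {ok / n_samples:.2f}")
```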
6.4 Exercises

6.1 Moments of node degree
Consider an Erdös–Rényi random network with probability p = z/N of connecting two nodes. In the large N limit, the degree distribution P_d(k) is Poissonian; its expression is given in Eq. (6.5). Evaluate the average degree ⟨k⟩ and the variance Var(k) = ⟨k²⟩ − ⟨k⟩².

6.2 Final fraction of recovered individuals in the SIR model
Drawing inspiration from a somewhat similar calculation made for rumor propagation in Sect. 6.2.3, find a closed equation for the final fraction ρ_R^∞ of recovered individuals in the standard SIR model introduced in Sect. 6.2.1 (ρ_R^∞ measures the total fraction of individuals that have been infected during the epidemics). Determine ρ_R^∞ perturbatively close to the epidemic threshold.

6.3 SIS model on heterogeneous network
Generalize the reasoning made for the SIR model on a heterogeneous network to the SIS model briefly introduced in Sect. 6.2.1. In the SIS model, individuals can only be in the susceptible or infected state, and infected individuals become susceptible again with a rate μ. Write the evolution equations for ρ_k^I and determine the epidemic threshold and the growth rate, under the assumption of uncorrelated node degrees.

6.4 Perceptron model for small N
Solve the Perceptron problem of finding the weights w_i = ±1 satisfying Eqs. (6.59) and (6.60) on a specific example in the simple case N = 3 and p = 2, having chosen
some values of the patterns ξik = ±1 and of the variables σk = ±1 defining the two classes of patterns. One can set, for instance, σ1 = 1 and σ2 = −1.
References 1. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47 (2002) 2. Amaral, L.A.N., Scala, A., Barthélémy, M., Stanley, H.E.: Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97, 11149 (2000) 3. Amit, D.J.: Modeling Brain Function: The World of Attractor Neural Networks. Cambridge University Press, Cambridge (1992) 4. Amit, D.J., Gutfreund, H., Sompolinsky, H.: Spin-glass models of neural networks. Phys. Rev. A 32, 1007 (1985) 5. Amit, D.J., Gutfreund, H., Sompolinsky, H.: Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett. 55, 1530 (1985) 6. Anderson, R.M., May, R.M.: Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford (1992) 7. Angel, A.G., Hanney, T., Evans, M.R.: Condensation transitions in a model for a directed network with weighted links. Phys. Rev. E 73, 016105 (2006) 8. Aron, J.L., Gove, R.A., Azadegan, S., Schneider, M.C.: The benefits of a notification process in addressing the worsening computer virus problem: results of a survey and a simulation model. Comput. Secur. 20, 693 (2001) 9. Bailey, N.T.: The mathematical theory of infectious diseases. Griffin (1975) 10. Barrat, A., Barthélemy, M., Pastor-Satorras, R., Vespignani, A.: The architecture of complex weighted networks. Proc. Natl. Acad. Sci. USA 101, 3747 (2004) 11. Barrat, A., Barthélémy, M., Vespignani, A.: Dynamical Processes on Complex Networks. Cambridge University Press, Cambridge (2008) 12. Barthélemy, M., Barrat, A., Pastor-Satorras, R., Vespignani, A.: Velocity and hierarchical spread of epidemic outbreaks in complex scale-free networks. Phys. Rev. Lett. 92, 178701 (2004) 13. Barthélemy, M., Barrat, A., Pastor-Satorras, R., Vespignani, A.: Dynamical patterns of epidemic outbreaks in complex heterogeneous networks. J. Theor. Biol. 235, 275 (2005) 14. Boguñá, M., Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in complex networks with degree correlations. Lect. Notes Phys. 652, 127 (2003) 15. Bornholdt, S., Ebel, H.: World wide web scaling exponent from Simon’s 1955 model. Phys. Rev. E 64, 035104(R) (2001) 16. Braunstein, A., Mézard, M., Zecchina, R.: Survey propagation: an algorithm for satisfiability. Rand. Struct. Algor. 27, 201 (2005) 17. Braunstein, A., Zecchina, R.: Learning by message passing in networks of discrete synapses. Phys. Rev. Lett. 96, 030201 (2006) 18. Colizza, V., Barrat, A., Barthélemy, M., Vespignani, A.: The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl. Acad. Sci. USA 103, 2015 (2006) 19. Cook, S.A.: The complexity of theorem-proving procedures. In: Proceedings of the 3rd Annual ACM Symposium on the Theory of Computing, p. 151 (1971) 20. Daley, D.J., Gani, J.: Epidemic Modelling: An Introduction. Cambridge University Press, Cambridge (2000) 21. Daley, D.J., Kendall, D.G.: Epidemics and rumours. Nature 204, 1118 (1964) 22. Daley, D.J., Kendall, D.G.: Stochastic rumours. IMA J. Appl. Math. 1, 42 (1965) 23. Derrida, B., Gardner, E., Zippelius, A.: An exactly solvable asymmetric neural network model. EPL 4, 167 (1987)
24. Evans, M.R., Hanney, T.: Nonequilibrium statistical mechanics of the zero-range process and related models. J. Phys. A Math. Gen. 38, R195 (2005) 25. Friedgut, E.: Sharp thresholds of graph properties, and the K-Sat problem. J. Amer. Math. Soc. 12, 1017 (1999) 26. Harko, T., Lobo, F.S.N., Mak, M.K.: Exact analytical solutions of the Susceptible-InfectedRecovered (SIR) epidemic model and of the SIR model with equal death and birth rates. Appl. Math. Comput. 236, 184 (2014) 27. Hethcote, H.W., Yorke, J.A.: Gonorrhea: transmission and control. Lect. Notes Biomath. 56, 1 (1984) 28. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554 (1982) 29. Hopfield, J.J.: Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 81, 3088 (1984) 30. Kephart, J.O., White, S.R., Chess, D.M.: Computers and epidemiology. IEEE Spect. 30, 20 (1993) 31. Kirkpatrick, S., Selman, B.: Critical behavior in the satisfiability of random Boolean expressions. Science 264, 1297 (1994) 32. Krauth, W., Mézard, M.: Storage capacity of memory networks with binary couplings. J. Phys. France 50, 3057 (1989) 33. Kröger, M., Schlickeiser, R.: Analytical solution of the SIR-model for the temporal evolution of epidemics. Part A: time-independent reproduction factor. J. Phys. A: Math. Theor. 53, 505601 (2020) 34. Kschischang, F.R., Frey, B.J., Loeliger, H.A.: Factor graphs and the sumproduct algorithm. IEEE Trans. Inf. Theory 47, 498 (2001) 35. Little, W.A.: The existence of persistent states in the brain. Math. Biosci. 19, 101 (1974) 36. Maki, D.P., Thompson, M.: Mathematical Models and Applications, with Emphasis on the Social, Life and Management Sciences. Prentice-Hall, Hoboken (1973) 37. May, R.M., LLoyd, A.L.: Infection dynamics on scale-free networks. Phys. Rev. E 64, 066112 (2001) 38. Mézard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009) 39. Mézard, M., Mora, T.: Constraint satisfaction problems and neural networks: a statistical physics perspective. J. Physiol. Paris 103, 107 (2008) 40. Mézard, M., Parisi, G., Zecchina, R.: Analytic and algorithmic solution of random satisfiability problems. Science 297, 812 (2003) 41. Monasson, R., Zecchina, R., Kirkpatrick, S., Selman, B., Troyansky, L.: Determining computational complexity from characteristic phase transitions. Nature 400, 133 (1999) 42. Moreno, Y., Pastor-Satorras, R., Vespignani, A.: Epidemic outbreaks in complex heterogeneous networks. Eur. Phys. J. B 26, 521 (2002) 43. Newman, M.E.J.: Spread of epidemic disease on networks. Phys. Rev. E 66, 016128 (2002) 44. Newmann, M.E.J., Strogatz, S.H., Watts, D.J.: Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118 (2001) 45. O’Connor, D.H., Wittenberg, G.M., Wang, S.S.H.: Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proc. Natl. Acad. Sci. USA 102, 9679 (2005) 46. Parisi, G.: Asymmetric neural networks and the process of learning. J. Phys. A 19, L675 (1986) 47. Parisi, G.: Attractor neural networks (1994). arxiv:cond-mat/9412030 48. Pastor-Satorras, R., Castellano, C., Van Mieghem, P., Vespignani, A.: Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925 (2015) 49. Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200 (2001) 50. 
Petersen, C.C.H., Malenka, R.C., Nicoll, R.A., Hopfield, J.J.: All-or-none potentiation at Ca3Ca1 synapses. Proc. Natl. Acad. Sci. USA 95, 4732 (1998) 51. Rosenblatt, F.: Principles of Neurodynamics: Perceptions and the Theory of Brain Mechanisms. Spartan Books, New York (1962)
52. Saichev, A., Malevergne, Y., Sornette, D.: Theory of Zipf’s law and beyond. Lecture Notes in Economics and Mathematical Systems, vol. 632. Springer, Berlin (2009) 53. Selman, B., Kirkpatrick, S.: Critical behavior in the computational cost of satisfiability testing. Artif. Intell. 81, 273 (1996) 54. Simon, H.A.: On a class of skew distribution functions. Biometrika 42, 425 (1955) 55. Sudbury, A.: The proportion of the population never hearing a rumor. J. Appl. Prob. 22, 443 (1985) 56. Watts, D.J., Strogatz, S.H.: Collective dynamics of “small-world” networks. Nature 393, 440 (1998) 57. Yedidia, J.S., Freeman, W.T., Weiss, Y.: Advances in Neural Information Processing Systems (NIPS), chap. Generalized belief propagation, p. 689. MIT Press, Cambridge (2001)
Chapter 7
Statistical Description of Dissipative Dynamical Systems
Although we have up to now mostly focused on stochastic descriptions of nonequilibrium systems, deterministic descriptions are also a widely used tool in complex system modeling. In the beginning of Chap. 1, we have already discussed some examples of deterministic dynamics, when describing, for instance, the time evolution of mechanical systems and some preliminary aspects of the construction of statistical physics. From Chap. 2 on, we have switched to a stochastic description of the systems under consideration, as this type of description turns out to be very convenient in a statistical physics context. In the present chapter, we come back to deterministic dynamical systems to briefly introduce some basic properties of this type of dynamics in the case of a dissipative dynamics (Sect. 7.1), with in particular the emergence of bifurcations, or even chaos, when varying a control parameter. We will also discuss how probabilistic tools may be of some relevance to describe chaotic deterministic systems (Sect. 7.2). In a second stage, we discuss how the coupling of a large number of dynamical systems having different parameter values may lead to a non-trivial collective behavior, like the synchronization of coupled oscillators (Sect. 7.3), or the global restabilization of unstable individual dynamical units (Sect. 7.4).
7.1 Basic Notions on Dissipative Dynamical Systems

7.1.1 Fixed Points and Simple Attractors

In this first subsection, we focus on dynamical systems with continuous-time dynamics. The notions introduced here can be defined in a similar way for discrete-time dynamical systems. We will briefly discuss such discrete-time systems in Sect. 7.1.3, when introducing the notion of chaotic dynamics.
In a continuous-time description, deterministic systems are described by an ordinary differential equation
\[ \frac{dx}{dt} = F(x(t)) \,. \qquad (7.1) \]
We focus here on dissipative dynamical systems, for which trajectories concentrate at large time onto attractors, which are manifolds of smaller dimension than the full configuration space. Attractors may be of different types: fixed points, limit cycles, or chaotic attractors. Alternatively, invariant manifolds may also be unstable, in which case they are sometimes called repellors. Let us start with the simplest example of an invariant manifold, namely, the notion of fixed point. If a constant x^(0) is such that F(x^(0)) = 0, then x(t) = x^(0) is a time-independent solution of Eq. (7.1), and x^(0) is said to be a fixed point of the dynamics. Fixed points can then be classified into two main categories according to their linear stability properties. If a value x slightly away from x^(0) tends to converge with time to x^(0), then x^(0) is said to be a linearly stable fixed point. In the opposite case, the distance between x(t) and x^(0) grows with time, and x^(0) is said to be a linearly unstable fixed point. Mathematically, one writes x(t) = x^(0) + ε(t), assuming ε(t) to be small, and linearizes Eq. (7.1), leading to
\[ \frac{d\varepsilon}{dt} = F'(x^{(0)})\, \varepsilon \,, \qquad (7.2) \]
where F′(x) is the derivative of F(x), taking into account the fact that F(x^(0)) = 0. The linear stability of the fixed point x^(0) is simply given by the sign of F′(x^(0)): the fixed point is linearly stable for F′(x^(0)) < 0, while it is linearly unstable for F′(x^(0)) > 0. In the case where F′(x^(0)) = 0, the fixed point is said to be marginally stable, and stability is actually determined by the first non-zero term in the expansion of F(x^(0) + ε) in powers of ε. Note also that a linearly stable fixed point may be unstable with respect to large enough perturbations. In this case, the fixed point is said to be non-linearly unstable. These basic notions can be easily generalized to the deterministic dynamics of several coupled degrees of freedom, described by a set of ordinary differential equations characterizing the evolution of N degrees of freedom x_i(t),
\[ \frac{dx_i}{dt} = F_i(x_1, \ldots, x_N) \,, \qquad i = 1, \ldots, N \,. \qquad (7.3) \]
The notions of fixed points and of their stability can be introduced in the same way as above. If (x_1^(0), ..., x_N^(0)) is such that F(x_1^(0), ..., x_N^(0)) = 0, then x_i(t) = x_i^(0) is a time-independent solution of Eq. (7.3) for all i = 1, ..., N, and the point (x_1^(0), ..., x_N^(0)) is said to be a fixed point. The stability is studied by introducing a small perturbation around the fixed point:
\[ x_i(t) = x_i^{(0)} + \varepsilon_i(t) \,, \qquad (7.4) \]
leading after linearization of Eq. (7.1) to
\[ \frac{d\varepsilon_i}{dt} = \sum_{j=1}^{N} \frac{\partial F_i}{\partial x_j}\big(x_1^{(0)}, \ldots, x_N^{(0)}\big)\, \varepsilon_j \,, \qquad i = 1, \ldots, N \,. \qquad (7.5) \]
Determining the stability of the fixed point is then more involved mathematically than in the case of a single degree of freedom. We first note that Eq. (7.5) can be rewritten more formally in terms of the vector E = (ε_1, ..., ε_N)^T (the superscript T denotes the matrix transpose) and the matrix M of elements
\[ M_{ij} = \frac{\partial F_i}{\partial x_j}\big(x_1^{(0)}, \ldots, x_N^{(0)}\big) \,. \qquad (7.6) \]
With these notations, Eq. (7.5) then reads
\[ \frac{dE}{dt} = M E \,. \qquad (7.7) \]
The matrix M is sometimes called the stability matrix. Using the standard tools of linear algebra, the stability of the fixed point is given by the eigenvalue λ_M of the matrix M having the largest real part, denoted as Re λ_M. If Re λ_M < 0, the fixed point is linearly stable, while if Re λ_M > 0 the fixed point is linearly unstable. The case Re λ_M = 0 corresponds to a marginally stable fixed point, whose actual stability is given by terms of higher order than the linear terms retained in Eq. (7.5). If the eigenvalues are all real, the linearized dynamics simply corresponds to a sum of exponential functions of time. If some of the eigenvalues are complex (they need to appear as pairs of complex conjugate values), then the linearized dynamics also includes oscillations. As mentioned above, a stable fixed point is actually the simplest example of the more general notion of attractor. An attractor is, generically speaking, a subset of phase space (i.e., the space in which the vector x_1, ..., x_N is defined) onto which the dynamics concentrates at long time. These attractors can be classified according to their dimension. A stable fixed point is thus a zero-dimensional attractor, and a limit cycle is a one-dimensional attractor; attractors of higher dimension can also exist. Obviously, the dimension of the attractor cannot be larger than the dimension of phase space, that is, the number N of dynamical variables. In some situations, the dynamics may remain very irregular even at long times, and is then called chaotic; we shall provide later on a more accurate definition of chaotic dynamics. For dissipative systems, chaotic dynamics leads to "strange attractors" having a fractal dimension. We shall come back to the description of chaotic dynamics in Sect. 7.1.3, though without describing strange attractors.
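As a concrete illustration of this linear stability analysis, the following short Python sketch (added here; the example system and the finite-difference step are arbitrary choices) estimates the stability matrix of Eq. (7.6) by finite differences for a two-variable system and checks the sign of the real parts of its eigenvalues.

```python
import numpy as np

def stability_matrix(F, x0, eps=1e-6):
    # Numerical Jacobian M_ij = dF_i/dx_j at the fixed point x0, cf. Eq. (7.6)
    x0 = np.asarray(x0, dtype=float)
    N = len(x0)
    M = np.zeros((N, N))
    for j in range(N):
        dx = np.zeros(N)
        dx[j] = eps
        M[:, j] = (F(x0 + dx) - F(x0 - dx)) / (2 * eps)
    return M

# Illustrative example: a damped oscillator-like system with a fixed point at the origin
mu = -0.1
F = lambda x: np.array([x[1], -x[0] + mu * x[1]])

eigvals = np.linalg.eigvals(stability_matrix(F, [0.0, 0.0]))
print(eigvals, "stable" if np.all(eigvals.real < 0) else "unstable")
```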
7.1.2 Bifurcations

When the dynamics of the system depends on an external control parameter, the stability of the fixed points, as well as their locations, generically depends on this control parameter. Most often, the location of the fixed point varies continuously with the control parameter, so that it is possible to "follow" the evolution of the fixed points. Moreover, a stable fixed point may become unstable when the control parameter is varied beyond a critical value. Such a change of stability, which is often accompanied by the onset of one or several new attractors (two stable fixed points, or a limit cycle, for instance), is called a bifurcation [4]. Let us illustrate this notion on the simple example of the non-linear dynamics of a single variable x(t), described by an equation
\[ \frac{dx}{dt} = f(x, \mu) \,, \qquad (7.8) \]
where μ is the control parameter. We assume for simplicity that x = 0 is a fixed point for all values of μ (otherwise, one simply needs to redefine x through a shift). This implies that f(0, μ) = 0. We further assume that f is odd, meaning that f(−x, μ) = −f(x, μ). Expanding f(x, μ) around x = 0, one generically obtains
\[ f(x, \mu) = \alpha(\mu) x - \beta(\mu) x^3 + O(x^5) \,, \qquad (7.9) \]
where α(μ) and β(μ) are two functions of μ, that we assume for simplicity to be continuous. Note that even terms in x vanish due to the parity properties of f. A bifurcation occurs when there exists a value μ_0 such that α(μ) changes sign at μ = μ_0. Without loss of generality, we assume that α(μ) < 0 for μ < μ_0 and α(μ) > 0 for μ > μ_0. Then, a linear stability analysis of the fixed point x = 0 leads to
\[ \frac{dx}{dt} = \alpha(\mu)\, x \,. \qquad (7.10) \]
Hence x = 0 is a stable fixed point for μ < μ_0, and an unstable fixed point for μ > μ_0, so that a bifurcation occurs for μ = μ_0. If the coefficient β(μ) appearing in Eq. (7.9) is strictly positive for μ ≥ μ_0, a pair of symmetric fixed points appears when μ > μ_0 at x = ±x_0, with
\[ x_0 = \sqrt{\frac{\alpha(\mu)}{\beta(\mu)}} \qquad (\alpha(\mu) > 0) \,. \qquad (7.11) \]
It can be checked easily that these new fixed points are linearly stable. The continuity of α(μ) for μ → μ0 implies that the fixed points ±x0 continuously emerge from 0 when μ is increased above μ0 . Such a bifurcation is called a supercritical bifurcation. If by contrast β(μ) < 0 for μ > μ0 , the determination of the emerging stable fixed points involves the terms of order x 5 in the expansion given in Eq. (7.9), or more
Fig. 7.1 Sketch of two standard types of bifurcations, occurring at a value μ0 of the control parameter μ. Lines indicate the fixed points x0 of the dynamics (full line: stable fixed point; dashed line: unstable fixed point). Left: supercritical bifurcation; two symmetric stable fixed points appear continuously from x0 = 0 for μ > μ0 . Right: subcritical bifurcation; stable non-zero fixed points appear at a finite distance from x0 = 0 for a value μ∗ < μ0 , indicated by a vertical dotted line. Linear stability of the fixed point x0 = 0 is lost only for μ > μ0
generally the lowest order stabilizing terms. In this case, the new fixed points emerge at finite values ±x_0^* > 0 when μ → μ_0^+. Such a transition is called a subcritical bifurcation (Fig. 7.1). Note that there are strong formal analogies between the simple bifurcations we have just presented and the Landau theory of phase transitions, as described in Sect. 1.4. Note also that in systems with more than one degree of freedom, more complex bifurcations may occur, like the Hopf bifurcation in which a limit cycle appears when a fixed point becomes unstable as a control parameter is varied. Here again, notions of supercritical and subcritical bifurcations may be introduced depending on whether the limit cycle emerges continuously from a point or with a finite size, respectively. The supercritical Hopf bifurcation is described by the following dynamics of a complex variable z,
\[ \dot{z} = (\alpha(\mu) + i\omega)\, z - (1 + iq)\, |z|^2 z \,, \qquad (7.12) \]
where μ is again a control parameter. Note that the real part of the coefficient of the cubic term has been set to one by an appropriate choice of units, without loss of generality. Complex number notations are here just a simple and compact way to write an equation for a two-dimensional dynamical system with variables x and y, writing z = x + iy. It also allows one to conveniently switch from the Cartesian to a polar representation z = r e^{iθ} in terms of the amplitude and phase, which is convenient to describe the limit cycle. In terms of these polar variables, the dynamics reads
\[ \dot{r} = \alpha(\mu)\, r - r^3 \,, \qquad \dot{\theta} = \omega - q r^2 \,. \qquad (7.13) \]
If α(μ) < 0, the fixed point r = 0 (or z = 0) is stable. If α(μ) > 0, this fixed point becomes unstable, and the dynamics converges at large time to a limit cycle defined by r = r_0 ≡ √α(μ) and θ = (ω − q r_0²) t + θ_0. Finally, a subcritical bifurcation to a limit cycle may be obtained with an equation of the form
\[ \dot{z} = (\alpha + i\omega)\, z + (1 + iq)\, |z|^2 z - (\gamma + is)\, |z|^4 z \,, \qquad (7.14) \]
with γ > 0.
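A direct numerical integration of the normal form (7.12) makes the supercritical Hopf scenario explicit. The sketch below (an added illustration; the Euler scheme, step size and parameter values are crude, arbitrary choices) shows that |z| relaxes to 0 for α < 0 and to the limit-cycle radius √α for α > 0.

```python
import numpy as np

def integrate_hopf(alpha, omega=1.0, q=0.5, z0=0.01 + 0j, dt=1e-3, n_steps=200_000):
    # Euler integration of the supercritical Hopf normal form, Eq. (7.12)
    z = z0
    for _ in range(n_steps):
        z += dt * ((alpha + 1j * omega) * z - (1 + 1j * q) * abs(z) ** 2 * z)
    return abs(z)

for alpha in (-0.5, 0.2, 0.8):
    expected = np.sqrt(alpha) if alpha > 0 else 0.0
    print(f"alpha = {alpha}: |z| -> {integrate_hopf(alpha):.3f} "
          f"(expected limit-cycle radius {expected:.3f})")
```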
7.1.3 Chaotic Dynamics

To discuss the notion of chaotic dynamics, it is actually more convenient to use the framework of discrete-time dynamics. In this case, time takes only integer values and the value x_{t+1} of a dynamical variable at time t+1 is given as a function of its value x_t at time t,
\[ x_{t+1} = f(x_t) \,. \qquad (7.15) \]
The function f(x) is called a map, as it maps the interval of definition of the variable x onto itself. To emphasize the discreteness of time, we write it as a subindex. The discrete-time dynamics may be interpreted as a periodic sampling of an underlying continuous-time dynamics. Yet, this does not need to be the case, and a discrete-time dynamics can also be considered in its own right, independently of any continuous-time dynamics. The notion of fixed point, limit cycle and chaotic dynamics can be similarly defined for discrete-time dynamics. A fixed point x^(0) is such that
\[ x^{(0)} = f(x^{(0)}) \,. \qquad (7.16) \]
The stability is tested by introducing a small perturbation around the fixed point, x_t = x^(0) + ε_t, yielding
\[ \varepsilon_{t+1} = f'(x^{(0)})\, \varepsilon_t \,. \qquad (7.17) \]
The fixed point is thus linearly stable when ε converges to zero, that is when |f′(x^(0))| < 1. In the opposite case |f′(x^(0))| > 1, the fixed point is linearly unstable. In the simple case of the discrete-time dynamics of a single degree of freedom as we consider here, a limit cycle consists in a finite set of q values (x_1^c, ..., x_q^c) such that
\[ f(x_1^c) = x_2^c \,, \quad f(x_2^c) = x_3^c \,, \; \ldots, \; f(x_q^c) = x_1^c \,. \qquad (7.18) \]
If the dynamics does not converge in time to a fixed point or to a limit cycle, it may be chaotic, with an apparently erratic behavior. Chaoticity is defined by the fact that the distance between two initially close points increases exponentially with time; the growth rate is called the Lyapunov exponent. Note that this notion is different from that of an unstable fixed point, since we are considering arbitrary close-by initial
Fig. 7.2 Illustration of the shape of the function f(x) for λ = 5 (dot-dashed line), λ = 10 (dashed line) and λ = 15 (full line)
points, rather than the neighborhood of a fixed point. More generally, for a system with N degrees of freedom, there are N Lyapunov exponents. The system is chaotic if the largest Lyapunov exponent is positive. As a simple illustration of the emergence of chaotic dynamics, let us consider the following map,
\[ f(x) = \lambda x \left(x - \tfrac{1}{2}\right)(x - 1) + x \,, \qquad (7.19) \]
where λ is a parameter taken in the interval 0 < λ < 16 to ensure that 0 < f(x) < 1 for 0 < x < 1 (negative values of λ would also satisfy this constraint in some range, but we focus here on positive values of λ). This map is illustrated on Fig. 7.2. Fixed points, satisfying f(x^(0)) = x^(0), are readily given by x^(0) = 0, 1/2 and 1. The two fixed points x^(0) = 0 and x^(0) = 1 are unstable since f′(0) = f′(1) = 1 + λ/2 > 1. The stability of the fixed point x^(0) = 1/2 is more interesting as it depends on λ. We have
\[ f'\left(\tfrac{1}{2}\right) = 1 - \frac{\lambda}{4} \,, \qquad (7.20) \]
so that |f′(1/2)| < 1 for 0 < λ < 8 and |f′(1/2)| > 1 for λ > 8. As a result, the fixed point x^(0) = 1/2 is linearly stable for λ < 8 and linearly unstable for λ > 8. In this latter case, we can check by numerical simulations toward which kind of attractor the dynamics converges. We see that for λ = 10, a limit cycle is obtained, while for λ = 15, a chaotic dynamics is observed (see Fig. 7.3). By chaotic, we simply mean here that the dynamics appears to be very irregular. A more quantitative statement would require a numerical evaluation of the Lyapunov exponent.
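Such a numerical evaluation is straightforward for a one-dimensional map, since the Lyapunov exponent is the average of ln|f′(x_t)| along the orbit. The Python sketch below (added for illustration; the initial condition and number of iterations are arbitrary) applies this to the map (7.19) and yields a positive exponent for λ = 15.

```python
import numpy as np

lam = 15.0
f  = lambda x: lam * x * (x - 0.5) * (x - 1.0) + x      # the map of Eq. (7.19)
df = lambda x: lam * (3 * x**2 - 3 * x + 0.5) + 1.0     # its derivative f'(x)

x = 0.3
lyap = 0.0
n_steps = 100_000
for _ in range(n_steps):
    lyap += np.log(abs(df(x)))   # accumulate log of the local stretching factor
    x = f(x)
print("Lyapunov exponent estimate:", lyap / n_steps)    # positive for lambda = 15
```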
Fig. 7.3 Top left: Convergence to a fixed point for λ = 5. Top right: Convergence to a limit cycle (oscillation between two different points) for λ = 10. Bottom: case λ = 15, showing a chaotic behavior (Left: same time window as on the top panels; Right: larger time window)
7.2 Deterministic Versus Stochastic Dynamics

7.2.1 Qualitative Differences and Similarities

An interesting notion to discuss qualitative similarities and differences between deterministic and stochastic dynamics is that of chaotic walk. Let us define a walk y(t) through the relation
\[ y_{t+1} = y_t + (2 x_t - 1) \,, \qquad (7.21) \]
where the variable x_t, obeying Eq. (7.15) with the function f(x) given in Eq. (7.19), has been rescaled into 2x_t − 1 so that it spans the entire interval [−1, 1]. We choose λ = 15 to be in the chaotic regime of the map. An illustration of the chaotic walk is shown on the left panel of Fig. 7.4, and it turns out to be visually similar to a random walk, at least at first sight. Looking more carefully, one may, however, notice some anticorrelation between the steps, in the sense that positive steps are more often followed by negative steps than by positive ones. To make the comparison
Fig. 7.4 Left: Chaotic walk (full line); a random walk (dashed line) is shown for comparison. Right: Mean displacement ⟨y_t⟩ and mean square displacement ⟨y_t²⟩ of the chaotic walk, obtained by averaging over many trajectories having different initial conditions. Similarly to what would be obtained for a random walk, the average displacement is zero for all time, and the mean square displacement increases linearly with time
more quantitative, one can compute the mean displacement ⟨y_t⟩ and the mean square displacement ⟨y_t²⟩ (see right panel of Fig. 7.4). The average is taken over an ensemble of trajectories with different initial conditions, so that averages depend on time. More precisely, the initial position is given by y_0 = 0, and the initial value x_0 is uniformly sampled from the interval (0, 1), taking a set of regularly spaced values over this interval. We observe that the mean displacement is almost equal to zero up to some small fluctuations, while the mean square displacement turns out to be linear in time, again up to some small fluctuations. These results coincide with the results obtained from a random walk, showing that random systems and chaotic systems share some common properties. In a sense, this is not very surprising since, in practice, random processes are simulated on a computer using random number generators that are nothing but chaotic deterministic processes. However, let us stress that random number generators need to be tuned to satisfy the required properties of statistical independence and uniformity of the generated numbers. We see here that, taking an arbitrary map to compute a chaotic walk, we already obtain without any fine tuning some basic properties that are similar to those of random systems. In the following section, we discuss in more detail this similarity between chaotic and random systems.
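These ensemble averages are easy to reproduce. The sketch below (added here; the ensemble size and trajectory length are arbitrary choices) builds the chaotic walk of Eq. (7.21) for many regularly spaced initial conditions and prints the late-time mean displacement and the diffusion-like growth of the mean square displacement.

```python
import numpy as np

lam = 15.0
f = lambda x: lam * x * (x - 0.5) * (x - 1.0) + x   # chaotic map, Eq. (7.19)

n_traj, n_steps = 2000, 200
x = np.linspace(0.001, 0.999, n_traj)               # ensemble of initial conditions x_0
y = np.zeros(n_traj)                                # walk positions, y_0 = 0, Eq. (7.21)
mean_disp, mean_sq = [], []
for t in range(n_steps):
    y += 2 * x - 1                                  # one step of the chaotic walk
    x = f(x)
    mean_disp.append(y.mean())
    mean_sq.append((y ** 2).mean())

print("late-time <y_t>  ~", round(mean_disp[-1], 2))          # close to zero
print("<y_t^2>/t at late times ~", round(mean_sq[-1] / n_steps, 3))  # roughly constant (diffusive)
```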
7.2.2 Stochastic Coarse-Grained Description of a Chaotic Map

We consider again the chaotic map x_{t+1} = f(x_t) introduced in Sect. 7.1.3, with the function f(x) given in Eq. (7.19). We have previously determined average values
of a few observables in the chaotic regime of this map. Here we wish to go one step further and to determine the full histogram of the values of xt . This histogram is displayed on Fig. 7.5 (left panel), for a value λ = 15 corresponding to a chaotic regime. The histogram has been built following standard methods, namely, dividing the interval (0, 1) into a relatively large number of subintervals, called “bins”, and determining for a long trajectory the relative number of values of xt that are contained in each bin. An important remark at this stage is that the dynamics of the map is purely deterministic only if the value of xt is known with an infinite accuracy. To illustrate this issue, we may use the bins not only to build the histogram, but also to define an effective dynamics using coarse-grained configurations. In terms of these new configurations, the dynamics is no longer deterministic, because the evolution starting from a given bin at time t may lead to several distinct bins at time t + 1, while deterministic dynamics would require a single target bin. Stated otherwise, the knowledge of the initial coarse-grained configuration is not enough to determine the configuration at any later time. Interestingly, this property is also a characteristic property of stochastic systems, and we will now elaborate more quantitatively on this comparison. With this aim in mind, we define an auxiliary stochastic model as follows. Having defined a partition of the interval (0, 1) into bins, we start by measuring the frequency of occurrence F( j → k) of direct (i.e., in a single step) transitions from bin j to bin k for any pair ( j, k). In a second stage, we define the auxiliary stochastic model as a Markov chain on the set of bins, with transition probabilities T ( j → k) chosen to be equal to the frequencies F( j → k) measured in the chaotic dynamics. Simulating this Markov chain, we can also determine the corresponding empirical histogram of x(t). The result is shown on the right panel of Fig. 7.5. A striking similarity is observed with the original histogram obtained from the chaotic dynamics. This similarity indicates that in practical situations, in which data necessarily have a finite resolution, it may be difficult to distinguish a deterministic process from a stochastic one. These difficulties may actually be overcome by using more sophisticated tools to characterize the nature of the dynamics. For instance, under the assumption of a deterministic dynamics, one may try to characterize the dimension of the attractor. If this procedure is applied to a stochastic signal, the resulting dimension of the attractor would be found to be very large (in principle infinite). Yet, this type of analysis is, like the more naive ones previously discussed, limited in practice by the finite amount of available data (typically a finite set of points). To take this hard fact into account, an interesting proposition has been put forward, namely, to characterize the deterministic or stochastic nature of a signal relatively to a given scale of resolution [3].
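The construction of the auxiliary Markov chain described above can be made concrete with a few lines of code. The Python sketch below (an added illustration; the number of bins and the trajectory length are arbitrary choices) measures the transition frequencies F(j → k) between bins along a chaotic trajectory, uses them as transition probabilities T(j → k), and compares the histogram generated by the Markov chain with the histogram of the chaotic map itself.

```python
import numpy as np

lam = 15.0
f = lambda x: lam * x * (x - 0.5) * (x - 1.0) + x   # chaotic map, Eq. (7.19)
n_bins, n_steps = 50, 100_000
rng = np.random.default_rng(0)

# Long chaotic trajectory and its coarse-grained (binned) version
x, traj = 0.3, np.empty(n_steps)
for t in range(n_steps):
    traj[t] = x
    x = f(x)
b = np.minimum((traj * n_bins).astype(int), n_bins - 1)

# Transition frequencies F(j -> k) between bins, used as Markov transition probabilities
counts = np.zeros((n_bins, n_bins))
np.add.at(counts, (b[:-1], b[1:]), 1)
T = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

# Simulate the auxiliary Markov chain and compare the two histograms
state, visits = b[0], np.zeros(n_bins)
for _ in range(n_steps):
    visits[state] += 1
    state = rng.choice(n_bins, p=T[state])
print(np.round(np.bincount(b, minlength=n_bins) / n_steps, 3))   # chaotic map histogram
print(np.round(visits / n_steps, 3))                             # Markov chain histogram
```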
217
2
2
1.5
1.5 p(x)
p(x)
7.2 Deterministic Versus Stochastic Dynamics
1
0.5
0 0
1
0.5
0.2
0.4
x
0.6
0.8
1
0 0
0.2
0.4
x
0.6
0.8
1
Fig. 7.5 Left: Histogram of the values of xt , using the deterministic evolution xt+1 = f (xt ), in the case λ = 15. Right: Histogram obtained from the effective stochastic process mimicking the deterministic one (see text). Both histograms turn out to be very similar
7.2.3 Statistical Description of Chaotic Systems To go beyond the numerical analysis presented above, let us discuss the statistical approach to chaotic systems from a more theoretical perspective. Although chaotic systems are deterministic, they can be described by tools that are similar to that used for stochastic systems, namely, probability distributions. In the case of deterministic systems, probabilistic aspects do not come from the evolution in itself, but rather from the fact that one follows an ensemble of trajectories, determined by a set of initial conditions. Hence one follows the evolution of a distribution pt (x) of configurations x as a function of time t, under the deterministic dynamics. We consider here the case of a deterministic dynamics defined by a map xt+1 = f (xt ), and we assume without loss of generality that the variable x is defined over the interval (0, 1) (any other interval, even unbounded, can be mapped onto the interval (0, 1) through some— possibly non-linear—transform). Formally, the evolution of the distribution pt (x) is given by the equation
1
pt+1 (x) =
d x pt (x ) δ f (x ) − x .
(7.22)
0
1 Note that the distribution pt (x) is normalized according to 0 p(x) d x = 1. Now one has to identify, for each value of x, the list of values xi , i = 1, . . . , n(x) such that f (xi ) = x. Then, thanks to the standard properties of Dirac delta functions (see Appendix A), one can rewrite n(x) δ f (x ) − x = i=1
1 δ(x − xi ) | f (xi )|
(7.23)
7 Statistical Description of Dissipative Dynamical Systems
1
1
0.8
0.8
p(x)
f(x)
218
0.6
0.6
0.4
0.4
0.2
0.2
0 0
0.2
0.4
x
0.6
0.8
1
0 0
0.2
0.4
x
0.6
0.8
1
Fig. 7.6 Left: Illustration of the shape of the tent map for a = 0.3 (full line) and a = 0.8 (dashed line). Right: Stationary probability distribution measured numerically for the tent map given in Eq. (7.26), confirming the prediction of a uniform distribution (full line: a = 0.3; dashed line: a = 0.8). Both distributions are indistinguishable
so that Eq. (7.22) now reads, after integration of the Delta distributions, pt+1 (x) =
n(x) pt (xi ) . | f (xi )| i=1
(7.24)
The stationary distribution is then obtained by assuming that pt (x) does not depend on t, yielding the following equation for the stationary distribution p(x), p(x) =
n(x) p(xi ) . | f (xi )| i=1
(7.25)
This equation is in general complicated to solve, because it is non-local (recall that xi is a function of x). In most cases, n(x) is piecewise constant, which slightly simplifies the problem, although finding the solution on each interval where n is constant remains potentially difficult. As a simple illustration, let us consider the following map, sometimes called (generalized) tent map: f (x) =
⎧ ⎨
if 0 ≤ x ≤ a,
x a
⎩ 1−x 1−a
(7.26) if a ≤ x ≤ 1.
This map is illustrated in the left panel of Fig. 7.6. In this case, the equation f (xi ) = x has two solutions for all x (hence n(x) = 2). These solutions are given by x1 = ax,
x2 = 1 − (1 − a)x
(7.27)
7.2 Deterministic Versus Stochastic Dynamics
219
and one has | f (x1 )| = a and | f (x2 )| = 1 − a, so that Eq. (7.25) reads p(x) = a p(ax) + (1 − a) p 1 − (1 − a)x .
(7.28)
A constant value of p(x) is obviously a solution of this equation, and this constant 1 has to be p(x) = 1 from the normalization condition 0 p(x) d x = 1. Note that the same uniform distribution is obtained whatever the value of a. Whether Eq. (7.28) admits other solutions is not easy to verify, but one can resort to numerical simulations to check whether this uniform distribution is the one asymptotically reached by the dynamics. Note that such a comparison requires the hypothesis of ergodicity, whereby ensemble averages are assumed to be equal to time averages. Under this assumption, we show on the right panel of Fig. 7.6 that the histogram computed from numerical simulations agrees with the predicted uniform distribution p(x) = 1, meaning either that the solution is unique, or that at least it is the dynamically selected solution. The example has been chosen here to yield a simple enough analytic solution. Of course, and although the distribution is found here not to depend on the parameter a, not all maps yield a uniform stationary distribution. We have seen, for instance, in Fig. 7.5 that the map f (x) given in Eq. (7.19) gives a non-uniform distribution.
7.3 Globally Coupled Oscillators and Synchronization Transition Up to now, we have discussed some basic properties of dissipative dynamical systems, focusing on low-dimensional systems, that is, systems with a small number of degrees of freedom. An interesting generalization, much in the spirit of statistical physics, is to couple a large number of dynamical systems, each of them possibly having its own characteristics, as defined by some individual control parameter, for instance. The coupled system is then a high-dimensional dynamical system that may exhibit a rich behavior. In this spirit, we now focus on a phenomenon emerging from the global coupling of low-dimensional dynamical systems, namely, the synchronization transition, through which coupled oscillators with distinct natural frequencies oscillate in phase, with a common frequency, if the coupling is strong enough. The synchronization phenomenon occurs in many different real systems in the fields of biology, chemistry, physics, or even social sciences [1, 11]. The paradigmatic model for the synchronization transition is the Kuramoto model, that we describe below.
7.3.1 The Kuramoto Model of Coupled Oscillators The Kuramoto model [7] consists in a set of N oscillators of phase θ j , evolving according to the coupled equations
220
7 Statistical Description of Dissipative Dynamical Systems
dθ j K jk sin(θk − θ j ), = ωj + dt k=1 N
j = 1, . . . , N ,
(7.29)
where ω j is the natural frequency of oscillator j, and K jk is the coupling constant between oscillators j and k. Applications of the Kuramoto model range from chemical oscillators to neural networks, laser arrays or Josephson junctions [1]. We shall here mostly follow the presentation of this model given in Ref. [1], and refer the reader to this specialized review for further details. The simplest version of the Kuramoto model is obtained by choosing uniform (mean-field type) couplings K jk = K /N , such that any pair of oscillators has the same coupling. The 1/N scaling is included so that the sum of all coupling terms does not trivially dominate the natural frequency in Eq. (7.29). The evolution of θ j is then given by N dθ j K = ωj + sin(θk − θ j ), dt N k=1
j = 1, . . . , N .
(7.30)
In order to characterize the possible synchronization of the oscillators resulting from the coupling terms, it is convenient to introduce the complex order parameter r eiψ defined as N 1 iθk e . (7.31) r eiψ = N k=1 In the absence of synchronization, the (mean) value of this order parameter is equal to zero, while the presence of synchronization is indicated by a value r > 0, the phase ψ corresponding to the “average” phase of the oscillators. It is convenient to reformulate Eq. (7.30) as dθ j = ω j + K r sin(ψ − θ j ), dt
j = 1, . . . , N ,
(7.32)
using the fact that from Eq. (7.31), r ei(ψ−θ j ) =
N 1 i(θk −θ j ) e N k=1
(7.33)
for any j, and taking the imaginary part of Eq. (7.33). We shall now focus on the limit of a very large number of coupled oscillators, N → ∞. In this case, the natural frequencies are described by the density g(ω), which means that the fraction of oscillators having a natural frequency ω j in ∞the infinitesimal range [ω, ω + dω] is g(ω)dω. The density g(ω) is normalized as −∞ g(ω)dω = 1. By an appropriate transform ∞ θ → θ − t, it is possible to redefine the model in such a way that ω ≡ −∞ ωg(ω)dω = 0. This can be interpreted as looking at
7.3 Globally Coupled Oscillators and Synchronization Transition
221
the oscillators in a rotating frame. In this frame, synchronization appears as a timeindependent value of the average phase ψ, with r > 0. The statistics of the phases of oscillators having a given frequency ω is encoded into the time-dependent ∞ probability distribution ρ(θ |ω, t). This distribution, normalized according to −∞ ρ(θ |ω, t)dθ = 1, describes the statistics of a set of identical oscillators having different initial conditions. Taking into account Eq. (7.32), the evolution of the distribution ρ(θ |ω, t) is governed by the equation1 ∂ρ ∂ ω + K r sin(ψ − θ ) ρ(θ |ω, t) = 0 . (θ |ω, t) + ∂t ∂θ
(7.34)
In the infinite N limit considered here, the expression (7.31) of the order parameter reduces to
π
∞ r eiψ = eiθ ≡
dθ −π
−∞
dω eiθ ρ(θ |ω, t)g(ω) .
(7.35)
In the following, we look for steady-state solutions and study whether the oscillators get synchronized or not in this regime, depending on the coupling strength K .
7.3.2 Synchronized Steady State In order to find the steady-state solution of the model, we need to find for all frequency ω the time-independent distribution ρ(θ |ω) solution of Eq. (7.34), in which r and ψ are time-independent values self-consistently determined from Eq. (7.35). It can easily be checked that the uniform distribution ρ(θ |ω) = (2π )−1 , which leads to r = 0, is a solution of Eq. (7.34) for all coupling strength K . This solution corresponds to a complete lack of synchronization between oscillators. While such a situation is likely to be relevant at low coupling, it is, however, possible that other solutions exist if the coupling strength K is strong enough. To look for such possible solutions, we start from a given value of the order parameter r eiψ with r > 0, determine the solution of Eq. (7.34) for these values of r and ψ, and then check whether a self-consistent solution of Eq. (7.35) can be found. We first note that if a stationary solution with global phase ψ exists, then another steady-state solution of phase ψ + α can be obtained by shifting all the phases θ j by the same amount α. Hence we can restrict our study to the case ψ = 0, the other cases being deduced by a simple phase shift. Under this assumption, the steady-state solution of Eq. (7.34) satisfies ω − K r sin θ ρ(θ |ω) = C,
1
(7.36)
This equation may be thought of as a Fokker–Planck equation (see Sect. 2.3) in the zero noise limit.
222
7 Statistical Description of Dissipative Dynamical Systems
where C is a constant. The condition ρ(θ |ω) ≥ 0 implies that such a solution exists only if |ω| ≥ K r . The case |ω| = K r is further excluded as it would lead to a nonnormalizable distribution ρ(θ |ω). As a result, one finds ω2 − (K r )2 1 , ρ(θ |ω) = 2π |ω − K r sin θ |
|ω| > K r .
(7.37)
If |ω| ≤ K r , the distribution (7.37) is no longer valid. We leave aside the discussion of the marginal case |ω| = K r , which plays no role in the following, and focus on the situation |ω| < K r . In this case, the evolution equation (7.32) has two fixed points, solutions of ω − K r sin θ = 0 . (7.38) To check the linear stability of a fixed point θ0 , we set θ = θ0 + , with 1. Expanding Eq. (7.32) to linear order in , we get d = −(K r cos θ0 ) , dt
(7.39)
so that the fixed point θ0 is stable if cos θ0 > 0 and unstable if cos θ0 < 0. Taking into account Eq. (7.38), the stable fixed point is thus given by θ0 = sin−1
ω . Kr
(7.40)
The distribution ρ(θ |ω) associated with this fixed point solution is a Dirac delta function (see Appendix A), that is an infinitely peaked solution around the fixed point: |ω| < K r . (7.41) ρ(θ |ω) = δ θ − sin−1 (ω/K r ) , Now that we have determined ρ(θ |ω) for both |ω| < K r and |ω| > K r , we can self-consistently determine r from Eq. (7.35), setting ψ = 0:
r =
π
Kr
dθ −π
+
−K r
ω2
dω eiθ δ θ − sin−1 (ω/K r ) g(ω)
− (K r )2 2π
π
dθ −π
dω |ω|>K r
eiθ g(ω) . |ω − K r sin θ |
(7.42)
Let us now further assume that g(ω) is an even function, that is g(−ω) = g(ω) for all ω (which is consistent with the assumption ω = 0). Using the symmetries of the sine function, it can be shown that the second integral in Eq. (7.42) is equal to zero. The first integral can be computed thanks to the properties of the δ function, Namely,
b d x f (x) δ(x − x0 ) = f (x0 ) (7.43) a
7.3 Globally Coupled Oscillators and Synchronization Transition
223
for any function f , provided that a < x0 < b. One thus finds, exchanging the order of integration between θ and ω:
r=
Kr
dω g(ω) ei sin
−1
(ω/K r )
−K r
.
(7.44)
Using the parity of g(ω), the imaginary part of the integral vanishes, and Eq. (7.44) reduces to
Kr r =2 dω g(ω) cos sin−1 (ω/K r ) . (7.45) 0
Performing the change of variable ω = K r sin x, one eventually finds the following self-consistent equation, taking into account the assumption r > 0
π/2
d x (cos x)2 g(K r sin x) =
0
1 . 2K
(7.46)
The solutions of this equation depend on some generic properties of the function g(ω). In the following, we assume that g(ω) has its maximum at ω = 0, that is for all ω = 0, g(ω) < g(0). Denoting as I (r ) the integral on the left-hand side of Eq. (7.46), we have for all r > 0, I (r ) < I (0). Hence if the coupling constant K is such that (2K )−1 > I (0), Eq. (7.46) has no solution for r , while a solution r > 0 exists for (2K )−1 < I (0). This defines the critical coupling K c = [2I (0)]−1 , above which a solution r > 0 exists. Expanding g(ω) for small ω as 1 g(ω) = g(0) − |g (0)|ω2 + O(ω4 ) , 2
(7.47)
with g (0) < 0, one finds after some algebra the following relation, for 0 < K − Kc Kc, 16(K − K c ) . (7.48) r≈ π K c4 |g (0)| The above result is valid for any regular function g(ω) having its maximum at ω = 0. In the specific case ω0 1 g(ω) = , (7.49) π ω02 + ω2 where ω0 > 0 is a constant, one finds K c = 2ω0 , and the solution of Eq. (7.46) can be given explicitly for all r , namely, r=
1−
2ω0 . K
(7.50)
224
7 Statistical Description of Dissipative Dynamical Systems
Finally, it can be shown that the synchronized solution r > 0 corresponds to the stable state of the system for K > K c [1].
7.3.3 Coupled Non-linear Oscillators and “Oscillator Death” Phenomenon Up to now, we have considered coupled oscillators where each individual oscillator is purely described by a phase variable θ j . However, more realistic oscillators are described both by an amplitude and a phase variable. In general, the non-linear dynamics of an individual oscillator is such that it converges to a limit cycle. Along the limit cycle, the dynamics can effectively be described only by a phase. Yet, when introducing a coupling between oscillators, the amplitude dynamics of individual oscillators should also be impacted. The phase-only dynamics of the Kuramoto model is thus valid only in the limit when the attraction to the limit cycle is strong as compared to the intensity of the coupling between oscillators. In this subsection, we thus investigate what happens when the attraction to the limit cycle is not strong enough, and the full amplitude and phase non-linear dynamics of each oscillator has to be taken into account. As we will see, this situation leads to a new phenomenology. To account for both phase and amplitude variables, we assume that each individual oscillator is characterized by a complex variable z j = |z j |eiθ j ( j = 1, . . . , N ) where |z j | describes the amplitude of the oscillation, and θ j its phase. In the absence of coupling between oscillators, the dynamics of the complex variable z j is assumed to follow a weakly non-linear dynamics [6, 9] corresponding to the normal form of the Hopf supercritical bifurcation described in Eq. (7.12) z˙ j = (1 + iω j )z j − |z j |2 z j .
(7.51)
Note that the imaginary part of the coefficient of the cubic term on the right-hand side of Eq. (7.51) has been set to zero to simplify the presentation, but the calculation can be performed in the same way keeping this imaginary part non-zero. We essentially follow here the presentation made in Ref. [6], although we have slightly simplified the original dynamics used there (which included an imaginary part in the coefficient of the cubic term). The value z j = 0 is a fixed point of the dynamics, because z˙ j = 0 when z j = 0. However, it is easy to see that this fixed point is unstable. Linearizing the dynamics close to z j = 0 leads to z˙ j = (1 + iω j )z j , whose solutions are z j (t) ∝ e(1+iω j )t which diverge with time. The full non-linear dynamics Eq. (7.51) then evolves to a limit cycle z j (t) = eiω j t , and thus becomes a pure phase oscillator akin to that used in the Kuramoto model. Convergence to the limit cycle can be shown, for instance, by looking at the dynamics of the squared amplitude u j = |z j |2 , which reads u˙ j = 2(u j − u 2j ) .
(7.52)
7.3 Globally Coupled Oscillators and Synchronization Transition
225
An elementary stability analysis shows that u = 0 is an unstable fixed point, while u = 1 is a stable fixed point. We now wish to introduce couplings between oscillators, in the same spirit as in the Kuramoto model. Following Ref. [6], we consider the following set of N coupled oscillators, with mean-field type couplings, N K z˙ j = (1 + iω j )z j − |z j | z j + (z k − z j ) , N k=1 2
(7.53)
where K > 0 is the (rescaled) coupling constant between oscillators. On top on having a very simple form, the coupling term in Eq. (7.53) is a natural generalization of the one of the Kuramoto model, in the following sense. Writing z j = r j eiθ j , the imaginary part of Eq. (7.53) reads as r j θ˙ j = ω j r j +
N K sin(θk − θ j ) . N k=1
(7.54)
Setting the amplitude r j = 1, one recovers the dynamics of the Kuramoto model, Eq. (7.30). However, an important difference with the Kuramoto model is that the real part of Eq. (7.53) yields a dynamics for the amplitudes r j , and that in the presence of a non-zero coupling K , the value r j = 1 is no longer a fixed point of the dynamics. The dynamics of the model defined by Eq. (7.53) can thus not be reduced to that of the Kuramoto model, because of the dynamics of the amplitudes. Our main focus when studying the Kuramoto model was the characterization of the synchronization transition, through which the oscillation of a global order parameter sets in due to the presence of a strong enough coupling, and although individual oscillators have different frequencies. Here, we will not concentrate on the synchronization transition, but rather focus on the new phenomenology that emerges by taking into account the amplitude degree of freedom of individual oscillators. We will see in particular that although in the absence of coupling each oscillator converges to a limit cycle, the presence of the coupling may lead to a disappearance of oscillations, a phenomenon that has been called “oscillator death” [1, 6, 10]. At variance with the synchronization transition that regards a collective degree of freedom (the order parameter), the oscillator death phenomenon is actually a feedback of the coupling onto individual oscillators. Having in mind the aim to study this oscillator death phenomenon, we first note that the value z j = 0 for all j is a fixed point of the coupled dynamics Eq. (7.53). To study the stability of this fixed point, we consider the linearized dynamics around the fixed point z j = 0, which can be written as N K z˙ j = (1 − K + iω j )z j + zk . N k=1
(7.55)
226
7 Statistical Description of Dissipative Dynamical Systems
Defining P(t) =
1 N
N
k=1 z k ,
Eq. (7.55) can be integrated to give,
z j (t) = z j (0) e(1−K +iω j )t + K
t
e(1−K +iω j )(t−s) P(s) .
(7.56)
0
If K ≤ 1, the first term in the right-hand side of Eq. (7.56) diverges (if K < 1), or remains of constant amplitude (if K = 1). This implies that perturbations around the fixed point z j = 0 do not decay to zero, and the fixed point is unstable in this case. In the following, we thus assume that K > 1 to allow for the possibility that z j = 0 becomes a stable fixed point. Summing Eq. (7.56) over j and changing the integration variable s into t − s, one finds an integral equation for the function P(t),
t
P(t) = K
e(1−K )s eiω j s P(t − s) ,
(7.57)
0
with e
iω j s
N 1 iω j s = e . N k=1
(7.58)
We now take the limit of an infinite number N of oscillators, with ∞ frequencies ω distributed according to a probability density g(ω), normalized as −∞ g(ω) dω = 1. In the limit N → ∞, one thus finds
∞ iω j s eiωs g(ω) dω = G(s) , (7.59) e → −∞
where G(s) is the Fourier transform of the density g(ω). From now on, we assume that g(ω) is a symmetric density, centered on ω = 0. Note that even if the case of a nonzero average frequency ω is often physically more meaningful, a change of variable z j → z j eiω t turns the original problem into one with zero average frequency. Looking for exponential solutions P(t) = P0 eλt of Eq. (7.57) in the limit t → ∞, we find that the growth rate λ is solution of the equation
∞ 0
e(1−K −λ)s G(s) ds =
1 . K
(7.60)
If all solutions λ of Eq. (7.60) have a negative real part, Re(λ) < 0, then the fixed point z j = 0 for all j is stable; otherwise it is unstable. Finding the solutions of Eq. (7.60) requires the knowledge of the function G(s), and thus to specify the probability density of frequencies g(ω). As a simple example, we consider the Lorentzian density g(ω) given in Eq. (7.49) that we already introduced in the study of the Kuramoto model. In this case, G(s) = e−ω0 |s| , and Eq. (7.60) has a single solution λ = 1 − ω0 , which is real. The growth rate λ is thus negative for ω0 > 1. Hence the fixed point z j = 0 is stable for a coupling constant K > 1 and a width ω0 > ω0c = 1 of the
7.3 Globally Coupled Oscillators and Synchronization Transition
227
Lorentzian density g(ω) given in Eq. (7.49). The oscillator death phenomenon thus occurs for a strong enough coupling between oscillators, and for a strong enough spread of frequencies. This Lorentzian case is particularly simple because the critical width ω0c of the frequency density g(ω) is independent of the coupling K . For more general densities g(ω), ω0c depends on K . For example, in the case of the frequency density [6] g(ω) =
ω4
g0 , + ω04
(7.61)
the growth rate λ can also be determined exactly (one actually finds two distinct solutions in this case), and the fixed point z j = 0 is stable for K > 1 and √ 1 ω0 > ω0c = √ 1 + 2K − 1 . 2
(7.62)
The critical width ω0c thus depends explicitly on the coupling K , which is also the situation for a generic frequency density g(ω). We have focused here on the oscillator death phenomenon because it is a new phenomenon that cannot occur in the framework of the Kuramoto model, when the amplitudes of oscillators are not taken into account. However, the present model of coupled non-linear oscillators defined by Eq. (7.53) exhibits a rich phenomenology that is not limited to the oscillator death phenomenon. The phenomenon of synchronization is also present in this model, as well as other types of dynamical bifurcations. The interested reader is referred, for instance, to Refs. [2, 8, 9].
7.4 A General Approach for Globally Coupled Dynamical Systems 7.4.1 Coupling Low-Dimensional Dynamical Systems We have seen above that one possible consequence of coupling non-linear oscillators was in some cases the restabilization of the “zero fixed point”, a phenomenon called oscillator death. We now explore more generally this phenomenon and show that the restabilization by the coupling of an unstable fixed point of the individual dynamical systems is actually present in a wide variety of problems and is not restricted to coupled non-linear oscillators. In the following, we restrict ourselves to large, globally coupled dynamical systems, in which a given individual dynamical system interacts with all the others in the same way. In principle, such a large system can also be studied purely in the framework of dynamical systems, by determining, for instance, the attractors in the high-dimensional phase space. However, such a task is in general very complicated.
228
7 Statistical Description of Dissipative Dynamical Systems
Also, in the spirit of statistical physics, one may wish to reduce the description of the system to a small number of global variables (or “order parameters”), based on the knowledge of the individual dynamical systems. A general procedure, involving some approximation scheme, has been developed to obtain such a reduced description in terms of a small number of average variables [5]. We will illustrate this procedure on the following globally coupled model, composed of N dynamical systems each defined in terms of two variables x j (t) and y j (t), for j = 1, . . . , N . The variables x j (t) and y j (t) obey the deterministic dynamics given by dx j = τ j g(x j , y j ) + k(X − x j ) dt dy j = τ j h(x j , y j ) + k(Y − y j ), dt
(7.63) (7.64)
where X (t) = N −1 Nj=1 x j (t) and Y (t) = N −1 Nj=1 y j (t) are, respectively, the instantaneous average values of x j (t) and y j (t) over the population of dynamical systems. The functions g(x, y) and h(x, y) are at this stage arbitrary given functions that do not depend on j. Heterogeneity is incorporated in the model by introducing a specific time scale τ j > 0 for each dynamical system j. The last terms in Eqs. (7.63) and (7.64) are global coupling terms that constrain x j and y j to remain close to the population averages X and Y . The constant k is called the coupling constant. In order to illustrate the usefulness of the order parameter expansion method, we will investigate how the coupling between dynamical systems may change the stability of the fixed point with respect to the uncoupled case k = 0 (or in other words, of the individual dynamics). With this aim in mind, we assume that (x, y) = (0, 0) is an unstable fixed point of the uncoupled system (k = 0). This implies in particular that g(0, 0) = h(0, 0) = 0, and that at least one of the two eigenvalues λ1 and λ2 of the stability matrix (see Sect. 7.1.1) has a positive real part. Then, for any value of the coupling k, the configuration (x j , y j ) = (0, 0), j = 1, . . . , N (implying X = Y = 0) is also a fixed point of the dynamics. The question we wish to address now is whether this fixed point of the global system remains unstable when the coupling constant is increased.
7.4.2 Description in Terms of Global Order Parameters A generic approximation method to obtain a closed set of equations for a reduced number of global order parameters is the following. One starts by expanding Eqs. (7.63) and (7.64) perturbatively into the deviations of the variables x j , y j and τ j with respect to their population average values. Note that we assume that the coupling constant k is strong enough so that the global system is in a coherent regime where each individual system remains close to the population average value. We introduce the notations
7.4 A General Approach for Globally Coupled Dynamical Systems
x j = X + δx j ,
y j = Y + δy j , τ j = τ + δτ j ,
229
(7.65)
N N where τ = N −1 Nj=1 τ j . Hence one has by definition j=1 δx j = j=1 δy j = N δτ = 0. By appropriately rescaling the functions g and h, we can set τ = 1, j j=1 without affecting the existence and the stability of the fixed point (0, 0) of the individual dynamical systems. The order parameters consist in the population average parameters X and Y , and possibly a small number of other parameters that remain to be defined if needed. Expanding Eq. (7.63), one gets d ∂g ∂g dX + δx j = g(X, Y ) + g(X, Y ) δτ j + (X, Y ) δx j + (X, Y ) δy j dt dt ∂x ∂y ∂g ∂g (X, Y ) δτ j δx j + (X, Y ) δτ j δy j − kδx j , + (7.66) ∂x ∂y where we have kept only linear terms with respect to δx j and δy j . A similar equation is obtained by expanding Eq. (7.64), simply exchanging δx j and δy j , and replacing g by h. Let us denote by a j the population average of any quantity a j , namely, a j ≡ N −1 Nj=1 a j . Taking the population average of Eq. (7.66), we see that linear terms in δτ j , δx j and δy j cancel out. Taking also into account the analogous equation for Y , we are thus left with the equations dX = g(X, Y ) + dt dY = h(X, Y ) + dt
∂g ∂g (X, Y ) V + (X, Y ) W ∂x ∂y ∂h ∂h (X, Y ) V + (X, Y ) W, ∂x ∂y
(7.67) (7.68)
where we have introduced the new order parameters V = δτ j δx j ,
W = δτ j δy j .
(7.69)
To close the set of Eqs. (7.67) and (7.68), one needs to find evolution equations for the new global variables V and W . The equation for V is obtained by multiplying Eq. (7.66) by δτ j and averaging over the population. Applying a similar procedure to obtain an equation for W , we eventually get dV = σ 2 g(X, Y ) + dt dW = σ 2 h(X, Y ) + dt
∂g ∂g (X, Y ) V + (X, Y ) W − kV ∂x ∂y ∂h ∂h (X, Y ) V + (X, Y ) W − kW, ∂x ∂y
(7.70) (7.71)
where σ 2 = δτ j2 . Hence we have obtained a closed set of four equations for the four-order parameters X , Y , V , and W . Note that to close the equations, we have neglected terms proportional to δτ j2 δx j or δτ j2 δy j .
230
7 Statistical Description of Dissipative Dynamical Systems
7.4.3 Stability of the Fixed Point of the Global System Having obtained Eqs. (7.67), (7.68), (7.70) and (7.71), we can now proceed to the stability analysis of the global fixed point (X, Y, V, W ) = (0, 0, 0, 0), corresponding to a situation in which all individual dynamical systems are at the point (x j , y j ) = (0, 0). We thus linearize the set of Eqs. (7.67), (7.68), (7.70) and (7.71), yielding a matrix equation d Z = M Z, (7.72) dt where we have introduced the vector Z = (X, Y, V, W )T , and M is the stability matrix. To lighten notations, we define g1 =
∂g ∂g ∂h ∂h (0, 0), g2 = (0, 0), h 1 = (0, 0), h 2 = (0, 0). ∂x ∂y ∂x ∂y
(7.73)
The stability matrix M is then given by ⎛
g1 ⎜ h1 M=⎜ ⎝σ 2 g1 σ 2h1
⎞ g2 g1 g2 h2 h1 h2 ⎟ ⎟ . 2 σ g2 g1 − k g2 ⎠ σ 2h2 h1 h2 − k
(7.74)
One needs to compute the eigenvalues of the matrix M in order to determine the stability of the global system. Interestingly, one sees that M has a natural block structure in terms of the 2 × 2 matrix M2 which describes the linear stability of the individual, uncoupled dynamical system (x, y), namely,
g g M2 = 1 2 h1 h2
.
(7.75)
Using this block matrix, M reads
M2 M2 M= σ 2 M2 M2 − kI2
,
(7.76)
with I2 the two-dimensional identity matrix. Diagonalizing M2 , there exists a matrix P2 such that ν 0 (7.77) P2−1 M2 P2 = D2 ≡ 1 0 ν2 where the eigenvalues ν1 and ν2 are the two solutions of the equation ν 2 − (g1 + h 2 )ν + g1 h 2 − h 1 g2 = 0 .
(7.78)
7.4 A General Approach for Globally Coupled Dynamical Systems
231
Note that in what follows, we actually do not need to determine the matrix P2 explicitly. Let us introduce the 4 × 4 matrix Q defined in terms of blocks as
P2 0 Q= 0 P2
.
(7.79)
Its inverse matrix Q−1 is simply given by
P2−1 0 = 0 P2−1
−1
Q
.
(7.80)
˜ = Q−1 MQ, which reads Now we can compute the matrix M ˜ = M or more explicitly
D2 D2 σ 2 D2 D2 − kI2
,
(7.81)
⎛
⎞ 0 ν1 0 ν1 ⎜ 0 ν2 ⎟ ν2 ⎟ ˜ = ⎜ 20 M ⎝σ ν1 0 ν1 − k 0 ⎠ . 0 σ 2 ν2 0 ν2 − k
(7.82)
˜ and M are related by a similarity transform, they share the same eigenvalues. Since M ˜ are determined by solving the equation det(M ˜ − λI) = 0, The eigenvalues λ of M with I the 4 × 4 identity matrix. After some straightforward algebra, one finds ˜ − λI) = [(ν1 − λ)(ν1 − k − λ) − σ 2 ν1 ] [(ν2 − λ)(ν2 − k − λ) − σ 2 ν2 ] . det(M (7.83) Hence the eigenvalues are solutions of one of the following equations, for i = 1 or 2, (7.84) (νi − λ)(νi − k − λ) − σ 2 νi = 0, yielding the four eigenvalues λi,±
k = νi − ± 2
k2 + σ 2 νi2 , 4
i = 1, 2.
(7.85)
Note that the quantity under the square root may be a complex number. We are interested in checking whether the presence of the coupling between individual dynamical systems may restabilize the fixed point X = Y = 0, which is assumed to be unstable in individual systems. The stability criterion for the global system is that the four eigenvalues λi,± have a negative real part. It turns out that this happens when the eigenvalues νi are complex (and thus necessarily complex conjugate) [5], so that we set ν1,2 = β ± iγ (β, γ real), with β > 0 due to the assumed instability of the fixed
232
7 Statistical Description of Dissipative Dynamical Systems
point for individual uncoupled systems. The stability criterion Re λi,± < 0 can be re-expressed as k k2 + σ 2 (β + iγ )2 < − β . (7.86) Re 2 4 One can immediately check that if γ = 0, condition (7.86) cannot be fulfilled, so that complex eigenvalues ν1,2 are indeed required. It also appears clearly that heterogeneity is needed too: if all the dynamical systems share the same time constant τ j , one has σ = 0 and condition (7.86) cannot hold. Assuming γ β, and performing asymptotic expansions of Eq. (7.86) for both k γ and k γ , one finds that the fixed point (X, Y, V, W ) = (0, 0, 0, 0) is stable in the range of coupling values 2β(1 + σ ) < k < (γ 2 − β 2 )
σ2 . β
(7.87)
Hence a sufficiently strong coupling (together with the presence of heterogeneity) is able to restabilize the unstable fixed point present in individual uncoupled systems, generalizing the “oscillator death” phenomenon observed for coupled oscillators and discussed in Sect. 7.3.3. This restabilization effect has been observed in different types of coupled dynamical systems where the dynamics in the absence of coupling converges to a limit cycle after getting away from the unstable fixed point [5]. However, note that for very large values of the coupling, the global fixed point is again unstable.
7.5 Exercices 7.1 Stability of the fixed points of a map Consider the map xt+1 = f (xt ) with a function f (x) defined on the interval [0, 1] and taking values in the same interval. Determine the fixed points of the map and their stability for the following functions f : (a) f (x) = 4x(1 − x); (b) f (x) = 1 − αx + αx 2 where α is a fixed parameter in the range 1 < α < 4. 7.2 Stationary distribution of a chaotic map Determine the stationary distribution p(x) of the chaotic map xt+1 = f (xt ) defined by the function f (x) = 2x if 0 ≤ x ≤ 21 and f (x) = 2x − 1 if 21 < x ≤ 1. 7.3 Kuramoto model with two frequencies Study the Kuramoto model for oscillator synchronization defined with two frequencies ω0 and −ω0 , corresponding to a frequency distribution g(ω) =
1 1 δ(ω − ω0 ) + δ(ω + ω0 ) . 2 2
(7.88)
7.5 Exercices
233
Show that the synchronization transition becomes discontinuous in this case. 7.4 Oscillator death phenomenon For the coupled oscillators model studied in Sect. 7.3.3 and leading to the oscillator death phenomenon, consider the frequency distribution g(ω) given in Eq. (7.61). Determine the growth rate λ for the linearized dynamics around the fixed point z j = 0, and recover the stability condition given in Eq. (7.62) for the width ω0 of the frequency distribution g(ω).
References 1. Acebrón, J.A., Bonilla, L.L., Pérez Vicente, C.J., Ritort, F., Spigler, R.: The Kuramoto model: a simple paradigm for synchronization phenomena. Rev. Mod. Phys. 77, 137 (2005) 2. Bonilla, L.L., Casado, J.M., Morillo, M.: Self-synchronization of populations of nonlinear oscillators in the thermodynamic limit. J. Stat. Phys. 48, 571 (1987) 3. Cencini, M., Falcioni, M., Olbrich, E., Kantz, H., Vulpiani, A.: Chaos or noise: difficulties of a distinction. Phys. Rev. E 62, 427 (2000) 4. Crawford, J.D.: Introduction to bifurcation theory. Rev. Mod. Phys. 63, 991 (1991) 5. De Monte, S., d’Ovidio, F., Mosekilde, E.: Coherent regimes of globally coupled dynamical systems. Phys. Rev. Lett. 90, 054102 (2003) 6. Ermentrout, G.B.: Oscillator death in populations of “all to all” coupled nonlinear oscillators. Phys. D 41, 219 (1990) 7. Kuramoto, Y.: Chemical Oscillations. Waves and Turbulence. Springer, New York (1984) 8. Matthews, P.C., Mirollo, R.E., Strogatz, S.H.: Dynamics of a large system of coupled nonlinear oscillators. Phys. D 52, 293 (1991) 9. Matthews, P.C., Strogatz, S.H.: Phase diagram for the collective behavior of limit-cycle oscillators. Phys. Rev. Lett. 65, 1701 (1990) 10. Mirollo, R.E., Strogatz, S.H.: Amplitude death in an array of limit-cycle oscillators. J. Stat. Phys. 60, 245 (1990) 11. Strogatz, S.H.: SYNC: The Emerging Science of Spontaneous Order. Hyperion, New York (2003)
Chapter 8
A Probabilistic Viewpoint on Fluctuations and Rare Events
Statistical physics obviously bears some strong connection with probability theory. Indeed, the very basis of statistical physics is to associate a probability with each microscopic configuration of a system. When dealing with dynamics, the mathematical theory of stochastic Markov processes also plays a central role, as we have seen in Chap. 2. In the present chapter, we wish to introduce some further aspects of probability theory which are of relevance to statistical physics, namely, the properties of the sum and of the extreme values of a set of random variables. Theorems on sums of random variables are actually cornerstones of statistical physics, as they constitute the underlying reason why macroscopic variables take non-random values even though microscopic variables are random. The chapter is organized as follows. In Sect. 8.1 we introduce key aspects of sums of random variables, including the Law of Large Numbers as well as the standard and generalized forms of the Central Limit Theorem. Then in Sect. 8.2 we briefly consider the statistics of extreme values and of records in a set of random variables. Finally, we briefly discuss in Sect. 8.3 the notion of large deviations, which plays an increasing role in modern statistical physics.
8.1 Global Fluctuations as a Random Sum Problem A generic problem of interest in statistical physics is to determine the statistics of fluctuating global observables, like the total energy, magnetization, or number of particles, which are obtained as the sum of a large number of individual contributions associated with small regions of the system. Probabilistic theorems describing the behavior of random sums are thus of great importance in statistical physics.
© Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6_8
235
236
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
8.1.1 Law of Large Numbers and Central Limit Theorem We start by discussing two cornerstones of the probabilistic theory of random sums, namely, the Law of Large Numbers and the Central Limit Theorem. Both of them deal with the statistical properties of sums of random variables. Roughly speaking, the Law of Large Numbers describes the fact that the empirical average of a series of random variables converges to the theoretical average (the expectation, in probabilistic terms). On the other side, the Central Limit Theorem characterizes the tiny fluctuations of the empirical average around the expectation. To formulate these two theorems, we consider a set of independent and identically distributed random variables (x 1 , . . . , x N ), drawn from a common distribution p(x), and we define the sum S N = Nj=1 x j . Law of Large Numbers. Let us define the rescaled sum s N = S N /N , which is nothing but the empirical average of the set of variables (x1 , . . . , x N ). The probability distribution of s is denoted as N (s). Under the assumption that the distribution p(x) has a finite average value m = x p(x) d x, the Law of Large Numbers states that the distribution N (s) converges to a Dirac distribution, N (s) → δ(s − m),
N →∞
(8.1)
(see Ref. [10] for a mathematically more rigorous formulation). In other words, the random variable s N converges to the non-random value m when N → ∞. The Law of Large Numbers has important applications in statistical physics. The fact that global observables like the total energy or total magnetization in a large system composed of many degrees of freedom has only tiny fluctuations precisely comes from the Law of Large Numbers. In addition, the fact that the probability of an event can be measured as the empirical frequency of appearance of the event over a large number of independent realizations is also a consequence of the Law of Large Numbers. As an example, let us briefly discuss how the cumulative distribution x0 p(x) d x of a random variable x can be evaluated in this way. function F(x0 ) = −∞ One starts by introducing an auxiliary variable y = θ (x0 − x), where x0 is an arbitrary constant, and θ is the Heaviside function, equal to 1 for a positive argument and to 0 otherwise. Applying the Law of Large Numbers to the variable y, one finds that the empirical average s N (x0 ) = N −1 Nj=1 θ (x0 − x j ) converges to θ (x0 − x) = F(x0 ) when N → ∞. Since this result is valid for any x0 , s N (x0 ) is thus an estimator of the cumulative probability distribution F(x0 ). Central Limit Theorem. Knowing that the empirical average s N converges to its expectation m, one can wonder about the amplitude and the shape of the fluctuations m. Assuming that the distribution p(x) has a finite second moment of s N around x 2 = x 2 p(x) d x, one easily finds that the variance of s N is equal to σ 2 /N , a finite where σ 2 is the variance of x (note that a finite second moment also implies √ first moment). Fluctuations of s N around m are thus of the order of 1/ N . To characterize the shape of these fluctuations (or equivalently, of the fluctuations of S N = Nj=1 x j around its expectation N m), we introduce the rescaled variable
8.1 Global Fluctuations as a Random Sum Problem
zN =
237
sN − m SN − N m , √ √ = 1/ N N
(8.2)
which by definition has a variance equal to σ 2 . The Central Limit Theorem states that the distribution N (z) of the variable z N converges to a centered Gaussian distribution of variance σ 2 , 1 2 2 e−z /2σ , N (z) → √ 2 2π σ
N → ∞.
(8.3)
Let us emphasize that both the Law of Large Numbers and the Central Limit Theorem rely on the assumptions that the variables are independent, identically distributed, and have a finite first moment, as well as a finite second moment in the case of the Central Limit Theorem. When at least one of these assumptions breaks down, the theorems are no longer valid. In practice, the convergence results given in Eqs. (8.1) and (8.3) remain valid when the random variables xi ’s are not too strongly correlated and have distributions that do not strongly differ one from the other. In the case of strongly correlated variables, and/or strongly non-identically distributed random variables, the limit distributions may differ from that given in Eqs. (8.1) and (8.3). A scaling different from Eq. (8.2) may also be needed in some cases to obtain a convergence to a well-defined limit distribution. This is also true when the assumption of a finite variance is broken: as soon as σ 2 is infinite, the limit distribution becomes nonGaussian, and a scaling different from Eq. (8.2) is needed to reach a limit distribution. This situation is described by the Generalized Central Limit Theorem that is stated below.
8.1.2 Generalization to Variables with Infinite Variances As we have just mentioned, a generalization of the Central Limit Theorem is required when the variables considered have an infinite variance. Before stating the Generalized Central Limit Theorem, let us first provide typical examples of probability distributions having infinite variances. Such laws typically have (at least approximately) a power-law tail, and for the sake of simplicity, we briefly discuss here only the case of a pure power-law distribution p(x) =
α x0α , x 1+α
x ≥ x0 (α > 0)
(8.4)
with p(x) = 0 for x < x0 (a lower bound is necessary to make the distribution normalizable). A distribution like Eq. (8.4) is sometimes called a Pareto distribution. The second moment of the distribution reads
238
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
∞
x 2 =
x 2 p(x) d x =
x0
∞ x0
α x0α d x, x α−1
(8.5)
which converges on condition that α > 2. If α ≤ 2, the second moment x 2 is infinite. As a result, the Central Limit Theorem only applies to sets of independent and identically distributed random variables (x1 , . . . , x N ) distributed according to Eq. (8.4) if α > 2. For α ≤ 2 a generalization of the theorem is needed. Before presenting the Generalized Central Limit Theorem, we first need to introduce the Lévy distribution L(z; α, β), which is defined for 0 < α ≤ 2 and −1 ≤ β ≤ 1 through its characteristic function (that is, the Fourier transform of the probability density) ˆ L(k; α, β) ≡
∞
L(z; α, β) eikz dz = exp − |k|α 1 − iβ sgn(k) ϕ(k, α) −∞
with ϕ(k, α) =
⎧ ⎨ tan ⎩
πα 2
(8.6)
if α = 1 , (8.7)
2 π
ln |k| if α = 1 .
The Lévy distribution L(z; α, β) is illustrated in Fig. 8.1, for β ≥ 0. Distributions with β < 0 are the symmetric of the ones for β > 0 with respect to the Y -axis, since L(z; α, −β) = L(−z; α, β). In practice, the Lévy distribution L(z; α, β) is obtained by inverting the Fourier transform (8.6), leading to the integral representation 1 L(z; α, β) = π
∞
α
dk e−k cos(βk α ϕ(k, α) − kz) .
(8.8)
0
The integral in Eq. (8.8) then has to be evaluated numerically. We now provide the formulation of the Generalized Central Limit Theorem. We denote again as (x1 , . . . , x N ) a set of N independent and identically distributed random ∞variables drawn from a distribution p(x) having an infinite second moment −∞ x 2 p(x) d x. The cumulative distribution function is defined as x F(x) ≡ −∞ p(x ) d x . We define for a given set of constants {a N } and {b N } the rescaled sum of the variables xi , ⎛ ⎞ N 1 ⎝ zN = x j − aN ⎠ . bN j=1
(8.9)
The Generalized Central Limit Theorem [10, 12] states that for a suitable choice of the rescaling parameters a N and b N , the distribution N (z) of the rescaled sum z N converges to the Lévy distribution L(z; α, β) if the following conditions are satisfied
8.1 Global Fluctuations as a Random Sum Problem
10
L(z;α,β)
0.5 10 0.4 0.3
10
-1
0.3
10
-2
10
-3
1
10
100
10
0.2
-1
-2
10
L(z;α,β)
0.6
239
-3
-4
10
-5
1
10
100
0.1
0.2 0.1 0
-8
-4
z
0
4
8
0 -8
-4
0
z
4
8
Fig. 8.1 Illustration of the Lévy distribution L(z; α, β) for different values of α and β. Left: α = 0.5, and β = 0 (dashed line), 0.5 (full line) and 1 (dot-dashed). The inset shows the positive tails of these distributions, which behave as 1/x 1+α (the dotted line indicates a slope −1.5). Right: α = 1.5, and β = 0 (dashed line), 0.5 (full line) and 1 (dot-dashed). Inset: positive tails of the same distributions (dotted line: slope −2.5)
F(−x) 1−β = x→∞ 1 − F(x) 1+β 1 − F(x) + F(−x) = r α. ∀r > 0, lim x→∞ 1 − F(r x) + F(−r x) lim
(8.10) (8.11)
Let us now briefly comment on this Generalized Central Limit Theorem. First, it is worth noticing that the parameter α in the Lévy distribution L(z; α, β) has the same interpretation as in the example of the Pareto distribution discussed in Eq. (8.4): it characterizes the power-law decay of the tail of the distribution. In fact, both the Lévy distribution and the distribution p(x) from which the summed variables are drawn decay at large x as an inverse power law with exponent 1 + α. The parameter β, on the other side, characterizes the asymmetry of the Lévy distribution. A value β = 0 corresponds to a symmetric distribution L(−z; α, 0) = L(z; α, 0), while for β > 0 (resp. β < 0) the positive (resp. negative) tail carries a higher probability weight than the opposite tail. The parameters α and β of the limit distribution can be determined from the original distribution p(x), according to Eqs. (8.10) and (8.11). In practice, simpler criteria can, however, be used, as discussed now. Let us first notice that the distribution p(x) actually has two tails, one in +∞ and one in −∞. The parameter α is related to the tail with the slowest decay. Let us assume that p(x) behaves as c1 , x 1+α1 c2 p(x) ∼ , |x|1+α2 p(x) ∼
x → +∞ ,
(8.12)
x → −∞ ,
(8.13)
240
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
with 0 < α1 , α2 ≤ 2. Then the parameter α is given by α = min(α1 , α2 ). The parameter β is related to the relative weight of the two tails of p(x). If α1 < α2 , the positive tail is dominant and β = 1. This is also what happens if one sums positive variables, that is, if p(x) = 0 for x < 0. More generally, this is the case when p(−x)/ p(x) → 0 when x → +∞. Conversely, if the negative tail is dominant (α2 > α1 , and more generally p(−x)/ p(x) → 0 when x → −∞), one finds β = −1. In cases when both tails have comparable weights, that is, α1 = α2 so that p(−x)/ p(x) goes to a finite value when x → ∞, the parameter β satisfies −1 < β < 1, and is given by β=
c1 − c2 , c1 + c2
(8.14)
where c1 and c2 are the prefactors of the power-law decays of the tails, as given in Eqs. (8.12) and (8.13). The choice of the parameters a N and b N also depends on α. When α > 1, the mean value x is finite, so that a N is simply equal to N x (with possibly an additive constant). When α ≤ 1, x is infinite, so that there is no point in characterizing the fluctuations around the mean. One then takes a N = 0 (or again, possibly a constant non-zero value) for α < 1 (the case α = 1 involves logarithmic corrections). The Generalized Central Limit Theorem may then more naturally be interpreted, for α ≤ 1, as a generalization of the Law of Large Numbers. On the other side, the coefficient b N formally takes a similar expression, namely, b N ∝ N 1/α , for all values of α (0 < α < 2). The limit case α = 2 corresponds to the standard rescaling used in the Central Limit Theorem.
8.1.3 Case of Non-identically Distributed Variables In the previous subsection, we focused on the simplest case of independent and identically distributed random variables. As we already emphasized, this is a strong assumption whose validity can be questioned in many applications. Going beyond this assumption requires to consider either correlated variables (strictly speaking, dependent variables), or independent non-identically distributed variables. Of course, in a general situation, the random variables would be both correlated and non-identically distributed. We start by considering the case of independent, but non-identically distributed random variables. By definition, the probability distribution of such random variables factorizes into a product of non-identical functions of the individual variables (called the marginal distributions): PN (x1 , ..., x N ) =
N j=1
p j (x j ).
(8.15)
8.1 Global Fluctuations as a Random Sum Problem
241
Such a factorization property makes the analytical treatment easier, so that some of the results obtained for independent and identically distributed random variables can tentatively be generalized in the present framework. In particular, a generalized form of the Central Limit Theorem exists, if the following condition, called the Lindeberg condition [10], is satisfied. Consider a set (x1 , . . . , x N ) of N independent random variables with probability distribution p j (x), j = 1, . . . , N , with finite first and second moments. We denote as m j ≡ x j the first moment of x j , and as σ j2 ≡ x 2j − m 2j its variance. We further introduce the rescaling parameters
aN =
N
m j,
j=1
⎛ ⎞ 21 N bN = ⎝ σ j2 ⎠ ,
(8.16)
j=1
as well as the rescaled sum ⎛ ⎞ N 1 ⎝ zN = x j − aN ⎠ , b N j=1
(8.17)
which has a distribution N (z). Note that by definition of the rescaling parameters a N and b N , the variable z N has zero mean and unit variance. The distribution N (z) converges to a Gaussian distribution if and only if the Lindeberg condition is satisfied, namely, N 1 dv v 2 p j v + x j = 0 (8.18) lim 2 N →∞ b N j=1 |v|>b N for all > 0. Intuitively, the Lindeberg condition means that the individual terms in the sum become infinitesimal with respect to the sum in the limit of an infinite number of terms. It is clear that the Lindeberg condition holds for independent and identically distributed random variables, in which case the condition simply reads 1 lim N →∞ b2 1
|v|>b N
dv v 2 p (v + x) = 0,
(8.19)
√ using b N = b1 N (a condition valid only for independent and identically distributed variables). The fact that b N → ∞ ensures that condition (8.19) is satisfied. Coming back to the general case of non-identically distributed variables, it is worth noticing that the Lindeberg condition implies that the variance b N of the sum diverges when the number of terms goes to infinity. This can be checked be showing that if b N is bounded, the Lindeberg condition cannot be satisfied. As b N is by definition an increasing function of N , assuming that it is bounded implies that b N goes to a finite limit when N → ∞. As a result, all the integrals that are summed in Eq. (8.18) become independent of N (although they generically depend on j) in the large N limit. All these integrals are non-negative by definition. For small enough ε, the
242
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
integral corresponding to j = 1 is strictly positive, so that the sum in Eq. (8.18) has to be larger than a strictly positive bound. Since the prefactor 1/b2N converges, by assumption, to a finite limit, the limit in Eq. (8.18) cannot be equal to zero. In cases when the Lindeberg condition does not hold, N (z) converges to a nonGaussian limit distribution, which depends on the specific problem at hand. As an explicit example of non-Gaussian distribution appearing in this context, one can mention the following simple 1/ f -noise model [2]. In this model, one considers a random time signal h(t) that is discretized into a sequence of values h k , k = 0, . . . , N − 1. Note that we consider t as a time for the sake of simplicity, but t could alternatively be interpreted as a space coordinate in a one-dimensional system. The discretized signal h k can be analyzed through a discrete Fourier transform, defining the complex Fourier amplitude N −1 1 cn = √ h e−2iπ fn N =0
(8.20)
associated with the frequency f n = n/N , n = 0, . . . , N − 1. As a simplification, the present 1/ f -noise model assumes that the Fourier coefficients cn are statistically independent random variables, with real and imaginary parts distributed according to Gaussian distributions of variance σn2 = κ/n for n > 0 (hence the name 1/ f noise, since the frequency is proportional to n). The fluctuations of the signal are characterized by its empirical variance (or “roughness”) EN =
N −1
(h − h)2 ,
(8.21)
=0
which is also a random variable. Using Parseval’s theorem, the empirical variance may be rewritten as N −1 |cn |2 . (8.22) EN = n=1
In this way, E N also appears as the total energy (or integrated spectrum) of the signal. Note that the Fourier coefficient c0 , which is proportional to the average value of the signal, plays no role here and thus does not appear in Eq. (8.22), since we focus on fluctuations around the mean value. At this stage, we see that E N turns out to be a sum of independent, but non-identically distributed variables u n ≡ |cn |2 . One can show, from the Gaussian distributions of the real and imaginary parts of cn , that the distribution of u n is exponential, p˜ n (u n ) = nκ e−nκu n .
(8.23)
8.1 Global Fluctuations as a Random Sum Problem
243
It follows that variance of u n is given by κ 2 /n 2 , according to which the variance the N −1 Var(E N ) = n=1 Var(u n ) goes to a finite limit when N → ∞. As a result, the Lindeberg condition does not hold, showing that the limit distribution is non-Gaussian. In order to determine the limit distribution when N → ∞, we rescale the energy − E N )/σ , where σ 2 is the infinite N limit of the variance of E N , into ε = (E N 2 2 2 that is σ = ∞ n=1 κ /n . The distribution of ε N is denoted as N (ε). In order to determine N (ε), it is convenient to define the characteristic function χ N (ω) =
∞
−∞
dε N (ε) e−iωε .
(8.24)
Since the variables u n are independent, the characteristic function of the sum is simply the product of the characteristic functions of the variables u n , n = 1, . . . , N − 1. Taking the limit N → ∞, the characteristic function χ N (ω) converges to a limit function χ∞ (ω), which reads [2, 5] ∞ iω −1 iω . χ∞ (ω) = 1+ exp nσ κ nσ κ n=1
(8.25)
The expression can be put into a more manageable form using the relation (1 + z) = e−γ E z
∞ e z/n , 1 + nz n=1
(8.26)
with γ E = 0.577 . . . the Euler constant, and where z is an arbitrary complex number satisfying z = −1, −2, ∞. . . [13]. The Euler Gamma function appearing in Eq. (8.26) is defined as (x) = 0 dt t x−1 e−t . The inverse Fourier transform of χ∞ (ω) can be computed explicitly, yielding a Gumbel distribution ∞ (ε) = exp[−(bε + γ E ) − e−(bε+γ E ) ],
π b= √ . 6
(8.27)
This is a rather surprising result as the Gumbel distribution is known to appear usually in the context of extreme value statistics, as described below. The fact that it also appears in the present model which bears no obvious relation to extreme value statistics can actually be understood in a simple way [4, 5], which we, however, do not detail here. One of the main interests of the present 1/ f -noise model is to illustrate the fact that correlated random variables (the original signal h k ) can in some cases be converted, through a Fourier transform, into a sequence of independent, but non-identically distributed variables (the amplitudes cn ), which allows for a simpler analytical treatment. Of course, the assumption that the Fourier amplitudes are statistically independent random variables is an approximation with respect to realistic systems, but it already captures the onset of a non-Gaussian limit distribution for the total energy, which
244
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
is a result of interest. In more general cases, however, a problem of sum of correlated random variables cannot be so easily converted into a problem of independent random variables, and one has to deal directly with the correlated case.
8.1.4 Case of Correlated Variables Determining the limit distribution of the sum of a sequence of correlated variables may be a difficult task. There exist, however, results for specific classes of correlated random variables, like martingale differences [10], or functionals of stationary Gaussian sequences [7, 18]. On a less rigorous basis, arguments have also been put forward in the physics literature to generalize the Central Limit Theorem to some classes of correlated random variables, for instance, by considering “deformed products” [3] or related notions based on the non-extensive entropy formalism [20]. Results based on a class of random variables with a joint probability expressed as a product of matrix functions (instead of real functions in the case of independent random variables) have also been proposed recently [1], generalizing the type of distributions found in the ASEP model (see Sect. 3.3.3). From a heuristic point of view, the emergence of non-Gaussian distributions for strongly correlated variables can be understood as follows. Let us consider a large system in which microscopic degrees of freedom are correlated over a typical length ξ , smaller than the system size L. One can then virtually decompose the system into boxes of linear size of the order of ξ , so that correlations between different boxes are weak. In many cases, the global variable of interest can be decomposed as a sum of contributions from each box. Then, the global observable can be approximated as a sum of N = (L/ξ ) D independent random variables, where N is the number of boxes (and D is the space dimension). Assuming that the local observable in each box has a finite variance, the distribution of the sum tends to a Gaussian distribution when the number of boxes N goes to infinity. This is the case, for instance, if the correlation length ξ is fixed and the system size L goes to infinity. Yet, it happens in some physical systems that the correlation length is proportional to the system size L, so that ξ/L takes a finite value when L → ∞ (this is the case, for instance, in generalizations of the 1/ f -noise problem [5]). In this situation, the number of independent boxes remains finite, and the Central Limit Theorem does not hold so that the limit distribution of the sum is not Gaussian. Beyond this type of heuristic arguments, some rigorous results exist in particular for Gaussian sequences of random variables. Such sequences are defined by the following joint probability distribution ⎞ ⎛ N det R 1 ⎠, exp ⎝− xi Ri−1 PN (x1 , . . . , x N ) = j xj (2π ) N /2 2 i, j=1 √
(8.28)
8.1 Global Fluctuations as a Random Sum Problem
245
where R is a positive-definite matrix of determinant det R, and Ri−1 j are the elements −1 of the inverse matrix R . Note that for simplicity, we focus here on the case of centered variables, for which xi = 0. Generalization to non-centered variables is, however, straightforward. If there exists a function r (m) such that the matrix elements of R satisfy Ri j = r (|i − j|), the Gaussian sequence is said to be stationary. In this case, the marginal distribution of xi is a centered Gaussian distribution of variance r (0), while the two-point correlation xi xi+m is equal to r (m). If a large Gaussian stationary sequence (x1 , . . . , x N ) is characterized by a corre−α lation function r (m) with a power-law decay N at large distance, r (m) ∼ m (α > 0), xi converges [as usual, up to a rescalthen the distribution of the sum S N = i=1 ing z N = (S N − a N )/b N ] to a Gaussian distribution for all values of α > 0 [7, 18]. Hence for Gaussian sequences, even very strong correlations do not prevent the distribution of the sum from converging to a Gaussian limit. For stationary sequences of non-Gaussian correlated variables, a Gaussian distribution of the sum is obtained at least when the sum nm=1 r (m) converges for n → ∞ [6]. Deviations from the Gaussian limit distribution may be obtained by considering non-Gaussian sequences of random variables, with strong enough correlations. Generic results exist for variables obtained as non-linear transforms yi = ψ(xi ) of the Gaussian variables xi [7, 18]. These results rely on the expansion of the function ψ(x) onto the basis of Hermite polynomials. The detailed presentation of this result goes beyond the scope of the present book. However, a simple application of the theorem can be provided. In the case when yi = xi2 − 1 [17], the theorem states that the limit distribution of the rescaled sum z N is Gaussian as long as α > 1/2, while it is non-Gaussian when α < 1/2. The corresponding non-Gaussian limit distribution is known through its cumulant expansion.
8.1.5 Coarse-Graining Procedures and Law of Large Numbers As mentioned above, the Law of Large Numbers and the Central Limit Theorem have many applications in statistical physics. We have seen, for instance, in Chap. 2 direct applications of the Central Limit Theorem in the context of random walks. Standard random walks are described by the standard Central Limit Theorem, while random walks with broad distributions of jump size (i.e., with infinite variance) are described by the Generalized Central Limit Theorem. Accordingly, the distribution of the position of the random walk is Gaussian in the first case and is a Lévy distribution in the second case. Here, we would like to briefly address the role of the Law of Large Numbers in the derivation of large-scale equations describing continuous fields like the density field. Let us consider a model similar to the Zero-Range Process, though we do not explicitly specify the dynamics. The model is defined on a one-dimensional lattice with L sites and describes identical particles hopping between different sites. The
246
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
number of particles on site i = 1, . . . , L is denoted as n i . In order to coarse-grain the model, we further split the lattice into boxes containing a number of sites, such that 1 L. A given box is labeled by an almost continuous variable x = i 0 /L, where i 0 is the site at the center of the box. The width of the boxes is equal to x = 1/L. We then define a coarse-grained density field as ρ(x) =
1 ni . x i∈B(x)
(8.29)
Turning to dynamics, the model is defined in such a way that particles can be exchanged from any site within a given box to any site within neighboring boxes. Given a time interval [t, t + t], we introduce the number φi, j (t, t) of particles transferred between i and j (with i and j belonging to different boxes) in this interval, counted positively from i to j and negatively from j to i (i < j). Coarse-graining this particle transfer at the level of boxes leads us to introduce the quantity Q(x, t, δt) =
φi, j (t, t) .
(8.30)
i∈B(x), j∈B(x+x)
The balance of particle transfers then reads ρ(x, t + t) − ρ(x, t) =
1 Q(x − x, t, t) − Q(x, t, t) . x
(8.31)
We now assume that the random variables n i corresponding to different sites (whether in the same box or not) are statistically independent, with the same distribution —this property is true, for instance, in the Zero-Range Process, assuming it is in contact with a reservoir of particles. We further assume that the statistical independence property is still valid when the system is close to a stationary state. As a result, the Law of Large Numbers can be applied, in the large box size limit, to the density of particles in each box since it is defined as a sum of a large number of independent and identically distributed variables. Hence it follows that the random variable ρ(x) can be identified with its ensemble average value ρ(x). In the same way, and under a similar assumption of statistical independence, the coarse-grained transfer of particles Q(x, t, t) can be identified with its ensemble average value Q(x, t, t). This allows for further simplifications, since Q(x, t, t) is, for sufficiently small t, proportional to t: Q(x, t, t) = J (x, t) t .
(8.32)
Expanding Eq. (8.31) to first order in t and x, we eventually get ∂ρ ∂J (x, t) = − (x, t) ∂t ∂x
(8.33)
8.1 Global Fluctuations as a Random Sum Problem
247
which is the general form of the continuity equation for the density field, in one dimension —see also Eq. (3.94) for the two-dimensional version in the case of self-propelled particles. The above reasoning illustrates how the Law of Large Numbers can be of key importance in order to justify coarse-graining procedures into continuous fields obeying deterministic equations like Eq. (8.33). To conclude this discussion, two remarks are in order. First, the above model was build in an ad hoc way, to fulfill all requirements in order to safely apply the Law of Large Numbers. One generally faces several difficulties when trying to coarse-grain more generic models: the random variables may not be independent and identically distributed, and the flux of particles between boxes may not be a sum of many contributions, but rather a small number of local terms through the box boundaries. In this case, more sophisticated methods need to be used to perform the coarse-graining. Second, it is worth noticing that one can go beyond the Law of Large Numbers and try to describe the tiny fluctuations of the coarse-grained fields using the Central Limit Theorem. This is done in practice by adding a Gaussian noise term to the flux J [16]. The density field then remains a stochastic variable, but it is possible to decompose it into an average value plus some fluctuations whose evolution can be characterized through the continuity equation.
8.2 Rare and Extreme Events 8.2.1 Different Types of Rare Events In full generality, rare events are simply events with a low probability (one sometimes says “in the tail of the probability distribution”). Yet, to deserve interest, what is often called a rare event is implicitly an event that, on top of occurring unfrequently, may also have a significant impact on the evolution of the system, if the event is relative to some observable defined on a given system. Beyond these very generic characterizations, the notion of rare events may actually refer to several types of events. A first type of rare event is that associated to the crossing of a threshold or an energy barrier, for instance, like in chemical reaction pathways or during a local rearrangement of a dense amorphous material. If the threshold or barrier is high, the typical time to overcome it is very large, and the crossing event thus has a very low probability. Yet, this crossing is key to the dynamics and allows the systems to explore different regions of phase space, thereby allowing the chemical reaction between molecules to occur, or the material to deform and to relax its stress. Hence this is typically a situation in which a rare event has a noticeable impact on the dynamics of the system. A second type of rare event corresponds to extreme values, that is to maximum or minimum values in a set of random variables or in practice empirical data, and the related notion of records. These important notions are, respectively, the subjects of the next two subsections (Sects. 8.2.2 and 8.2.3). Moreover, a third type of rare event
248
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
may be identified as the effect of extreme events on a sum of broadly distributed random variables, as we have seen in Chap. 2 in the case of anomalous diffusion, and in the present chapter when discussing the Generalized Central Limit Theorem in Sect. 8.1.2. Finally, a last type of rare event corresponds to extremely rare events that cannot be observed in practice unless a control parameter of the system is tuned to make them become typical. This situation is one of the basic motivations to introduce the notion of large deviation function that characterizes events that have a probability which decreases exponentially with system size (typically like its volume). This is the topic of Sect. 8.3.
8.2.2 Extreme Value Statistics In this section, we provide some standard results on the statistics of extreme values, namely, the maximum or minimum value in a set of random variables. For simplicity, it is convenient to restrict the study to the case of maximum values, as the case of minimum values can be mapped onto the one of maximum values through a change of sign. Let us consider a sequence (x1 , . . . , x N ) of independent and identically distributed random variables, with probability distribution p(x), called the parent distribution. It is useful to also introduce the cumulative distribution, defined as x p(x ) d x , (8.34) F(x) = −∞
which is the probability that the random variable is smaller than a value x. For a given sequence of variables (x1 , . . . , x N ), one can define the maximum value in the set, (8.35) y N = max(x1 , . . . , x N ). We denote as FNmax (y) the probability that the maximum y N is smaller than a given value y. By definition of the maximum, xi ≤ y N for all i. Hence FNmax (y) is equal to the probability that all the variables xi are smaller than y. Since we are considering independent and identically distributed random variables, FNmax (y) can simply be written as (8.36) FNmax (y) = F(y) N . In the same spirit as when looking for the limit distribution of the rescaled sum in the context of the Central Limit Theorem, we are interested here in the limit distribution of the rescaled maximum of the set (x1 , . . . , x N ). Introducing a sequence of rescaling parameters a N and b N , we first define the rescaled maximum as zN =
yN − aN . bN
(8.37)
8.2 Rare and Extreme Events
249
0.8
10
Gumbel Fréchet Weibull
10
-1
p(z)
p(z)
0.6
0
0.4
10
-2
0.2 0 -4
0
z
4
8
10
-3
-4
0
z
4
8
Fig. 8.2 Left: Illustration of the Gumbel (full line), Fréchet (dashed line), and Weibull (dot-dashed) probability densities. Both the Fréchet and Weibull distributions have a parameter μ = 2. Right: Same data on semi-logarithmic scale
The question is then to know whether, for a suitable choice of a N and b N , the cumulative distribution HN (z) of the rescaled maximum z N converges to a limit distribution. Standard results of extreme value statistics [11, 14] lead to the following statement. Depending on the distribution p(x) of the variables xi , the limit cumulative distribution, lim N →∞ HN (z), can take three different forms: • Hf (z) = exp −z −μ for z > 0 and 0 for z ≤ 0 (Fréchet distribution); μ • Hw (z) = exp(−(−z) ) for z < 0 and 1 for z ≥ 0 (Weibull distribution); • Hg (z) = exp −e−z (Fisher-Tippett–Gumbel or Gumbel distribution). In these distributions, μ is a positive parameter, related to the parent distribution p(x). The probability densities pf (z), pw (z) and pg (z) are obtained from the cumulative distributions Hf (z), Hw (z) and Hg (z) by taking the derivative. As an example, the Gumbel probability density is given by pg (z) = exp −z − e−z .
(8.38)
An illustration of the Fréchet, Weibull, and Gumbel distributions is provided in Fig. 8.2. Note that the cumulative distributions Hf (z), Hw (z) and Hg (z) can also be formulated in a compact way using a single expression Hγ (z ) = exp −(1 + γ z )−1/γ , 1 + γ z > 0,
(8.39)
where z = az + b, a and b being some constant parameters which depend on γ . The case γ > 0 corresponds to the Fréchet distribution, with the relation μ = 1/γ . The case γ < 0 instead corresponds to the Weibull distribution, for which μ = −1/γ . Finally, the case γ = 0, to be interpreted as the limit γ → 0 in Eq. (8.39), corresponds to the Gumbel distribution. The different distributions Hf (z), Hw (z) and Hg (z) are selected according to some asymptotic large x properties of the parent distribution p(x). If p(x) decays
250
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
as a power law p(x) ∼ 1/x 1+μ when x → ∞ (μ > 0), the limit distribution is the Fréchet one with the same exponent μ as the one characterizing the parent distribution p(x). If rather the variable x is bounded by a constant A in the sense that p(x) = 0 for x > A, and p(x) behaves as a power law close to x = A, namely, p(x) ∼ (A − x)μ−1 with μ > 0, the limit distribution is the Weibull one. Finally, in the case when the distribution p(x) decays faster than any power law, either for x going to infinity or for x going to a finite bound A, the limit distribution is the Gumbel one. It is customary to say that the distribution p(x) belongs either to the Fréchet, Weibull, or Gumbel class according to the limit distribution of the maximum. A typical example of a distribution p(x) belonging to the Gumbel class is the exponential distribution p(x) = λ exp(−λx). Let us illustrate in this simple case how the limit distribution can be derived. The corresponding cumulative probability distribution is F(x) = 1 − exp(−λx) for x > 0 and F(x) = 0 for x ≤ 0. Then the cumulative probability distribution FNmax (y) of the maximum is given by N FNmax (y) = F(y) N = 1 − e−λy .
(8.40)
The goal is to find rescaling parameters a N and b N such that the distribution of the rescaled maximum z N = (y N − a N )/b N converges to a limit distribution, in this case the Gumbel distribution. Considering Eq. (8.40), one finds by inspection that a correct rescaling is obtained by choosing a N = (ln N )/λ and b N = 1/λ. Inverting the definition of z N to get y N = a N + b N z, we have the following convergence property: FNmax (a N
N e−z + b N z) = 1 − → exp −e−z N
(8.41)
when N → ∞. The derivation of the Gumbel limit distribution starting from an arbitrary cumulative distribution F(x) is more complicated, but the spirit of the derivation remains the same.
8.2.3 Statistics of Records A notion closely related to that of extreme value is that of record. A record in a sequence of random variables (x1 , x2 , . . . , xi , . . . ) occurs at the nth step when the value xn is larger than all previous values xi , i = 1, . . . , n − 1 (for simplicity, we focus here on upper records; lower records are obtained symmetrically as minimal values). Of course, the record value xn is also the maximum value of the set of variables (x1 , . . . , xn ), but the question asked in extreme value statistics and in record statistics are slightly different. In extreme value statistics, one considers a fixed number N of variables, (x1 , . . . , x N ), and asks about the statistics of the maximum value y N = max(x1 , . . . , x N ) in the set; the limit N → ∞ is eventually taken. Note that the order of the variables in the set (x1 , . . . , x N ) plays no role. In record statistics,
8.2 Rare and Extreme Events
251
one rather looks at the occurrence of successive records, so that the order of variables in a given sample matters, and the sequence does not have a fixed length, but is rather considered to be infinite from the outset. One thus defines the kth record in a recursive way. The first variable x1 defines the first record r1 . Then one looks at the next variables (x2 , x3 , . . . ) in the sequence for the occurrence of the second record r2 , that is, the first variable x j ( j > 1) such that x j > r1 . This occurs for a value j = n 2 , and we have r2 = xn 2 . In the same way, one defines recursively the record rk = xn k , which is the first variable in the sequence exceeding the previous record rk−1 . There are typically two types of quantities that can be investigated in the framework of record statistics: first, the statistics of the “time” n k at which the kth record occurs; second, the statistics of the records rk themselves. In the latter case, one may be interested in looking for the limit distribution of the variable rk in the limit k → ∞, up to a suitable rescaling as done in the case of extreme value statistics. We will consider here only the most basic statistical properties of n k . Interestingly, for independent and identically distributed random variables xi , these properties do not depend on the probability distribution p(x) but are universal [11, 15]. Let us start by considering the probability Pn that the nth variable in the sequence, xn , is a record. This probability reads Pn =
∞
−∞
p(xn )F(xn )n−1 d xn ,
(8.42)
x where F(x) = −∞ p(x) d x is the cumulative distribution of x. Eq. (8.42) is simply obtained by averaging over xn the probability F(xn )n−1 that the n − 1 other variables x1 , . . . , xn−1 are smaller than xn . Noting that d F(xn )n = np(xn )F(xn )n−1 d xn
(8.43)
we easily obtain that Pn = n1 , independently of the distribution p(x). An immediate consequence is that the average number Nn of records occurring up to “time” n is given by n n 1 (8.44) Pk = Nn = k k=1 k=1 which in the large n limit behaves logarithmically to leading order, Nn ≈ ln n + γE ,
(8.45)
where γE ≈ 0.577 is the Euler constant. In contrast, the asymptotic limit distribution of the records rk depends on the distribution p(x) of the variables in the sequence, but only through classes of limit distributions, similarly to the case of extreme value statistics. One can here again
252
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
split the distributions p(x) into three different classes, namely, Gumbel, Fréchet and Weibull, depending on their asymptotic behavior, which allows one to define the parameter γ in a similar way as in extreme value statistics, see Eq. (8.39). As above, the Gumbel class corresponds to distributions p(x) decaying faster than any power law (typically exponentially), the Fréchet class to distributions decaying as a power law at infinity, while the Weibull class describes distributions behaving as a power law close to an upper bound. The limit distributions are, however, different from that obtained in extreme value statistics. Let us introduce the rescaled kth record r k − ak , bk
zk =
(8.46)
where ak and bk are suitably chosen rescaling parameters. We denote as Rk (z) the cumulative distribution of z k . If the distribution p(x) is in the Gumbel class (γ = 0), the limit distribution limk→∞ Rk (z) is given by [11] Rg (z) = (z),
(8.47)
where (z) is the integrated normal distribution, (z) =
z
e−y
2
/2
dy.
(8.48)
−∞
In other words, the limit distribution is simply a Gaussian (or normal) distribution. For the Fréchet class (γ > 0), the limit distribution reads Rf (z) = (γ ln x), x > 0
(8.49)
thus corresponding to a (positive) lognormal distribution. Conversely, for the Weibull class (γ < 0), the limit distribution is given by Rw (z) = γ ln(−x) , x < 0
(8.50)
which corresponds to a (negative) lognormal distribution.
8.3 Large Deviation Functions We have already encountered the notion of large deviation form of a probability distribution, for instance, in the case of phase transitions (Sect. 1.4), reaction–diffusion processes (Sect. 3.2), or random networks (Sect. 6.1). However, this form only appeared as a formal property in these previous examples, and we wish to discuss here the interest and interpretation of such a form.
8.3 Large Deviation Functions
253
8.3.1 A Simple Example: The Ising Model in a Magnetic Field To illustrate the notion of large deviation function and the relevance to describe extremely rare events, let us consider a simple example, the effect of a magnetic field h on an Ising model at high temperature, well above the ferromagnetic transition temperature. In this case, the coupling energy between spins can be safely neglected with respect to the thermal energy (i.e., J/T 1). Let us first consider the case of a zero magnetic field (h = 0). By simply counting the configurations having a macroscopic magnetization N 1 si , (8.51) m= N i=1 where N is the number of spins, one obtains for the probability distribution P(m) P(m, h = 0) ∝ e−N cm
2
(8.52)
for not too large values of m, with some constant c > 0 [see Eqs. (1.91) and (1.93)]. Hence any value m = 0 has a vanishingly small probability to be observed in the thermodynamic limit N → ∞. Equation (8.52) is a simple example of large deviation form of a distribution. More generally, a distribution function P(x) has a large deviation form if it takes the asymptotic form, for large N , P(x) ∝ e−N φ(x) ,
(8.53)
where φ(x) is called the large deviation function, or rate function. A more rigorous definition can be written as φ(x) = − lim
N →∞
1 ln P(x) . N
(8.54)
In this general setting, N may be the number of particles, of spins, or the volume of the system. In the case of the paramagnetic model, Eq. (8.52) yields for the large deviation function φ(m, h = 0) = cm 2 . In the presence of a magnetic field h = 0, one then finds P(m) ∝ e−N cm
2
+N hm/kT
∝ e−N c(m−m 0 ) , m 0 = 2
h 2ckT
(8.55)
or in other words φ(m, h) = c(m − m 0 )2 . Hence in the presence of a magnetic field, the magnetization m 0 , which was extremely rare and in practice unobserved for h = 0, becomes the typical value. Varying an external control parameter thus makes typical a value of the observable that was extremely rare otherwise. The interest of the notion of large deviation function, therefore, partly resides in this property.
254
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
Characterizing the extremely low probability of a random variable is not so interesting in itself: whether the probability of a given event is 10−40 or 10−100 does not make much difference, as the event will never be observed in practice. However, knowing this very low probability enables one to predict the effect of an external control parameter like a magnetic field, which acts as a simple exponential reweighting of the zero field probability: P(m, h) ∝ P(m, 0) e N hm .
(8.56)
A review of the use of large deviation functions in a statistical physics context can be found in Ref. [19].
8.3.2 Explicit Computations of Large Deviation Functions Large deviation functions can be computed thanks to the Gärtner–Ellis theorem, which can be (loosely) stated as follows [19]. Given a set of random variables x N indexed by an integer N and defined over an interval (a, b), the distribution p N (x) takes a large deviation form (8.57) p N (x) ∝ e−N φ(x) if the following scaled cumulant generating function λ(k) = lim
N →∞
1 lne N kx N , N
(8.58)
with k real, exists (i.e., takes finite values) over some interval of k, possibly the whole real axis. Then the large deviation function exists and is given by the Legendre– Fenchel transform of λ(k), φ(x) = sup[kx − λ(k)].
(8.59)
k
At a heuristic level, this relation can be understood as follows, assuming the validity of the large deviation form Eq. (8.57). To compute λ(k), one first needs to evaluate e
N kx N
=
b
d x e N [kx−φ(x)] .
(8.60)
a
When k is such that the maximum xk∗ of kx − φ(x) falls within the interval (a, b), the integral can be evaluated in the large N limit through a saddle-point approximation, a
b
∗
∗
d x e N [kx−φ(x)] ∼ e N [kxk −φ(xk )]
(8.61)
8.3 Large Deviation Functions
leading to
255
λ(k) = kxk∗ − φ(xk∗ ) = sup [kx − φ(x)] .
(8.62)
x∈(a,b)
Hence λ(k) is the Legendre–Fenchel transform of φ(x). Inverting this transform precisely yields Eq. (8.59). A simple application of this theorem is provided by the case of the sum of independent and identically distributed random variables (u 1 , . . . , u N ) of distribution P(u). Defining x N as the empirical mean of the variables u i , xN =
N 1 ui , N i=1
(8.63)
we can test whether the distribution p N (x) of x N takes a large deviation form. Following the Gärtner–Ellis theorem, we compute λ(k), yielding λ(k) = lneku ,
(8.64)
where the brackets here mean an average over the distribution P(u). The large deviation function is then obtained by Eq. (8.59). For example, for an exponential distribution P(u) = e−u , we have λ(k) = − ln(1 − k) and thus φ(x) = x − 1 − ln x (x > 0).
8.3.3 A Natural Framework to Formulate Statistical Physics Large deviation functions turn out to be a natural language for statistical physics, as can be already seen at equilibrium. We have seen in particular when studying equilibrium phase transitions that the distribution of magnetization in the mean-field Ising model takes a large deviation form P(m) ∝ e−N f (m) ,
(8.65)
where f (m) is given in Eq. (1.92). This function has been seen to provide useful information on the phase transition. This is actually another example of the usefulness of large deviation functions. In this mean-field case, the computation of the large deviation function is easy (which is not the case in general as soon as there are correlations—or interactions—in the system), thus providing a direct characterization of the phase transition. Hence determining the whole probability distribution of events that are for most of them unobservable is actually one of the easiest ways to compute the physically observed values. This also has the further advantage to predict the two symmetric most probable values of the magnetization, while a direct computation of the mean magnetization would result in an average over the two symmetric values, hence to m = 0.
256
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
The importance of large deviation functions in equilibrium statistical physics also comes from the fact that basic quantities like the phase-space volume N (E) or partition functions Z N (T ) take large deviation forms Z N (T ) ∝ e−N f (T )/kT
N (E) ∝ e N s(ε) ,
(8.66)
showing that the entropy per degree of freedom s(ε) (with ε = E/N ) and the (rescaled) free energy f (T )/kT play the role of large deviation functions (although in a less restricted sense than that previously introduced, since N (E) and Z N (T ) are not probability distributions). Turning to out-of-equilibrium situations, we have seen an example of the use of a large deviation function in a nonequilibrium context when discussing absorbing phase transitions (Sect. 3.2), as well as networks (Sect. 6.1). More generally, there has been several attempts to use large deviation functions in nonequilibrium models in order to generalize the equilibrium notion of free energy [8, 9]. Such attempts, however, go much beyond the scope of the present book and will not be discussed here.
8.4 Exercises 8.1 Derivation of convergence theorems Derive the Law of Large Numbers and the Central Limit Theorem for N independent and identically distributed random variables xi drawn from the same probability distribution p(x), with finite mean m and variance σ 2 (the finite variance assumption is actually not needed for the Law of Large Numbers). Hint: use characteristic functions as well as appropriately rescaled Nglobal random variables. To be more xi , the appropriate rescaled specific, considering a random sum S N = i=1 √ variable is s N = S N /N for the Law of Large Numbers, and z N = (S N − N m)/ N for the Central Limit Theorem. 8.2 Generalization of the Law of Large Numbers for a mixture Generalize the Law of Large Numbers to the case of a mixture of independent and identically distributed random variables with distinct distributions. In this case the joint probability distribution of the variables (x1 , . . . , x N ) reads 1 1 p1 (xi ) + p2 (xi ) . 2 i=1 2 i=1 N
P(x1 , . . . , x N ) =
N
The appropriate rescaled random variable is still z N = S N /N , with S N =
(8.67) N i=1
xi .
8.3 Link between extreme values and sums of random variables Show that the extreme value statistics of a set of N independent and identically distributed random variables distributed according to the exponential distri-
8.4 Exercises
257
bution p(x) = λe−λx can be mapped to the problem of the sum of independent but non-identically distributed random variables. Hint: order the random variables (x1 , . . . , x N ) and introduce the intervals between successive ordered variables. 8.4 Examples of large deviation functions Large deviation functions characterizing the sum of random variables can be determined using the Gärtner–Ellis theorem. Here we focus on independent and identically distributed (iid) random variables. The procedure consists in evaluating the scaled cumulant generating function and then performing an inverse Legendre–Fenchel transform. Find the explicit N expressions of the large deviation function I (s) of the xi in the following cases: rescaled sum s = N −1 i=1 (a) xi are positive real random variables distributed according to a gamma distribution p(x) = μ2 x e−μx ; (b) xi are positive integer random variables distributed according to Poisson distribution p(n) = e−α α n /n!.
References 1. Angeletti, F., Bertin, E., Abry, P.: General limit distributions for sums of random variables with a matrix product representation. J. Stat. Phys. 157, 1255 (2014) 2. Antal, T., Droz, M., Györgyi, G., Rácz, Z.: 1/f noise and extreme value statistics. Phys. Rev. Lett. 87, 240601 (2001) 3. Baldovin, F., Stella, A.L.: Central limit theorem for anomalous scaling due to correlations. Phys. Rev. E 75, 020101(R) (2007) 4. Bertin, E.: Global fluctuations and Gumbel statistics. Phys. Rev. Lett. 95, 170601 (2005) 5. Bertin, E., Clusel, M.: Generalised extreme value statistics and sum of correlated variables. J. Phys. A Math. Gen. 39, 7607 (2006) 6. Bouchaud, J.P., Georges, A.: Anomalous diffusion in disordered media: statistical mechanisms, models and physical applications. Phys. Rep. 195, 127 (1990) 7. Breuer, P., Major, P.: Central limit theorems for non-linear functionals of Gaussian fields. J. Multivar. Anal. 13, 425 (1983) 8. Derrida, B., Lebowitz, J.L., Speer, E.R.: Free energy functional for nonequilibrium systems: an exactly solvable case. Phys. Rev. Lett. 87, 150601 (2001) 9. Derrida, B., Lebowitz, J.L., Speer, E.R.: Exact free energy functional for a driven diffusive open stationary nonequilibrium system. Phys. Rev. Lett. 89, 030601 (2002) 10. Feller, W.: An Introduction to Probability Theory and its Applications, vol. II, 2nd edn. Wiley, New York (1971) 11. Galambos, J.: The Asymptotic Theory of Extreme Order Statistics. Wiley, New York (1987) 12. Gnedenko, B.V., Kolmogorov, A.N.: Limit Distributions for Sums of Independent Random Variables. Addisson-Wesley (1954) 13. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. 5th edn. Academic Press, London (1994) 14. Gumbel, E.J.: Statistics of Extremes. Dover Publications (2004) 15. Krug, J.: Records in a changing world. J. Stat. Mech. P07001 (2007) 16. Ortiz de Zarate, J.M., Sengers, J.V.: Hydrodynamic Fluctuations in Fluids and Fluid Mixtures. Elsevier (2006) 17. Rosenblatt, M.: Independence and dependence. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, p. 431. University of California (1961)
258
8 A Probabilistic Viewpoint on Fluctuations and Rare Events
18. Taqqu, M.S.: Convergence of iterated process of arbitrary Hermite rank. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 50, 53 (1979) 19. Touchette, H.: The large deviation approach to statistical mechanics. Phys. Rep. 478, 1 (2009) 20. Umarov, S., Tsallis, C., Steinberg, S.: On a q-Central Limit Theorem consistent with nonextensive statistical mechanics. Milan J. Math. 76, 307 (2008)
Appendix A
Dirac Distributions
The Dirac distribution δ(x) can be thought of as a function ∞being equal to zero for all x = 0, and being infinite for x = 0, in such a way that −∞ δ(x) d x = 1. The main interest of the Dirac distribution is that for an arbitrary function f ,
∞
−∞
f (x) δ(x − x0 ) d x = f (x0 ),
(A.1)
where x0 is an arbitrary constant. In other words, once inserted in an integral, the Dirac distribution precisely picks up the value of the integrand associated with the value of the variable around which it is peaked. The following property, related to changes of variables in the calculation of integrals, also proves useful. Suppose one needs to compute the integral I (a) =
xmax
d x g(x) δ f (x) − a ,
(A.2)
xmin
where g(x) is an arbitrary function. Such integrals appear, for instance, in the computation of the probability distribution of the variable y = f (x), assuming that the random variable x has a probability distribution g(x). However, this calculation is more general, and does not require the function g(x) to be normalized to 1, or even to be normalizable. To compute an integral such as I (a), the following transformation rule is used, n(a) 1 δ x − xi (a) , (A.3) δ f (x) − a = | f xi (a) | i=1 where x1 (a), . . . , xn(a) (a) are the solutions of the equation f (x) = a over the integration interval (xmin , xmax ). One thus ends up with the following expression for I (a)
© Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6
259
260
Appendix A: Dirac Distributions
I (a) =
xmax
dx xmin
n(a) i=1
g(x) δ x − xi (a) , | f xi (a) |
(A.4)
leading after integration of the delta distributions to n(a) g xi (a) . I (a) = | f xi (a) | i=1
(A.5)
Appendix B
Numerical Simulations of Markovian Stochastic Processes
In this appendix, we briefly describe some elementary methods to simulate Markovian stochastic processes. We first describe the easiest case of discrete-time processes, and then move on to continuous-time processes.
B.1 Discrete-Time Processes A discrete-time Markovian stochastic process (also called Markov chain) is characterized by the list of transition probabilities T (C |C). We assume here that the process involves a finite number M of discrete configurations, which is often the case in practice. Configurations can thus be labeled as (C1 , . . . , C M ). To simulate the stochastic dynamics, one needs to know how to choose a new configuration C among (C1 , . . . , C M ) starting from an arbitrary configuration C. The new configuration C has to be chosen randomly with a probability T (C |C). This can be done in practice in the following way. For a given configuration C, let us define the variables ai i ai = T (C j |C) , i = 1, . . . , M. (B.1) j=1
One thus has by definition a M = 1. It is also convenient to define a0 = 0. We have for all i = 1, . . . , M that ai − ai−1 = T (Ci |C). Drawing a random number u uniformly distributed over the interval (0, 1], the probability that this random number falls between ai−1 and ai is precisely T (Ci |C), the length of the interval. Hence one simply has to determine i such that ai−1 < u ≤ ai , and to pick up the corresponding configuration Ci . In this way, the configuration Ci is indeed selected with a probability T (Ci |C). An efficient procedure to find the value i such that ai−1 < u ≤ ai is to use a dichotomic algorithm. One starts from j = E(M/2), where E(x) is the integer part of x, and tests if a j < u or a j ≥ u. If a j < u, the correct i satisfies j + 1 ≤ i ≤ © Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6
261
262
Appendix B: Numerical Simulations of Markovian Stochastic Processes
M, and one takes as a new trial value j the middle of the interval, namely, j = E((M + j + 1)/2). In contrast, if a j ≥ u, the correct i satisfies 1 ≤ i ≤ j, and the new trial value is j = E(( j + 1)/2). By iteration, one rapidly converges to the value i satisfying ai−1 < u ≤ ai .
B.2 Continuous-Time Processes Continuous-time Markovian processes are characterized by transition rates W (C |C), with C = C. We assume here again that the process involves a finite number M of discrete configurations. Starting from a given configuration C j , the questions are: (i) what is the probability to select the configuration Ci (i = j)? (ii) what is the time lag τ until the jump to the new configuration Ci ? The answer to point (i) is quite natural: configurations are selected with a probability proportional to the transition rates, meaning that the probability to choose configuration Ci starting from configuration C j is W (Ci |C j ) . (B.2) P(Ci |C j ) = k(k= j) W (C k |C j ) Concerning point (ii), the time lag τ is a random variable following an exponential distribution p(τ ) = λ e−λτ , (B.3) where λ is the total “activity” λ=
W (Ci |C j ) .
(B.4)
i (i= j)
Hence to simulate the dynamics of a continuous-time Markovian stochastic process, one has to draw a random number τ according to the exponential distribution (B.3), and to select a new configuration i with the probability P(Ci |C j ) given in Eq. (B.2). The procedure to select the configuration is thus very similar to the one used in discrete-time processes. The way to draw a random variable from an exponential distribution is explained in Appendix C. The algorithm to simulate continuous-time Markovian stochastic processes is sometimes called the Gillespie algorithm.
Appendix C
Drawing Random Variables with Prescribed Distributions
Standard random number generators provide independent and identically distributed (pseudo-) random variables with a uniform distribution over the interval (0, 1)— whether the boundaries 0 and 1 are included or not in the interval has to be checked case by case for each generator. The question encountered in practical simulations of stochastic processes is to be able to generate a random variable x with an arbitrary prescribed probability distribution p(x), based on the uniform random number generator at hand. We describe below two methods enabling one to do so. More details can be found, for instance, in the standard textbook Numerical Recipes [1].
C.1 Method Based on a Change of Variable The simplest method is based on a change of variable. For simplicity, we assume that the variable x is defined over an interval (a, b), where −∞ ≤ a < b ≤ +∞. Let us define the variable u = F(x) (a < x < b) (C.1)
with
x
F(x) ≡
p(x ) d x
(C.2)
a
the cumulative distribution function of x. The probability distribution of u is denoted as P(u), and is defined over the interval (0, 1). The standard relation P(u)|du| = p(x)|d x| connecting the distributions of u and x can be rewritten as P(u) =
p(x) . |du/d x|
(C.3)
From Eq. (C.1), we get du/d x = p(x), so that we end up with P(u) = 1. Hence Eq. (C.1) connects a uniformly distributed variable to the desired variable x, and one © Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6
263
264
Appendix C: Drawing Random Variables with Prescribed Distributions
can simply generate x by drawing a uniform random number u and computing x = F −1 (u),
(C.4)
where F −1 is the reciprocal function of F. In practice, this method is useful only when an analytical expression of F −1 is available, which already covers a number of usual cases of interest, like random variables with exponential or power-law distributions. For instance, an exponential distribution p(x) = λ e−λx
(x > 0)
(C.5)
with λ > 0 can be simulated using the change of variable x =−
1 ln(1 − u) . λ
(C.6)
Since u and (1 − u) have the same uniform distribution, one can in principle replace (1 − u) by u in the r.h.s. of Eq. (C.6). However, one needs to pay attention to the fact that the argument of the logarithm has to be non-zero, which guides the choice between u and (1 − u), depending on whether 0 or 1 is excluded by the random number generator. Similarly, a power-law distribution p(x) =
αx0α x 1+α
(x > x0 )
(C.7)
with α > 0, can be simulated using x = x0 (1 − u)−1/α .
(C.8)
Here again, the same comment about the choice of u or (1 − u) applies. Many other examples where this method is applicable can be found. When no analytical expression of the reciprocal function F −1 is available, one could think of using a numerical estimate of this function. There are, however, other more convenient methods that can be used in this case, as the rejection method described below. Before describing this generic method, let us mention a generalization of the change of variable method, which as an important application allows for the simulation of a random variable with a Gaussian distribution. Instead of making a change of variable on a single variable, one can consider couples of random variables: (x1 , x2 ) = F (u 1 , u 2 ), where u 1 and u 2 are two independent uniform random numbers. It can be shown [1] that the following choice
−2 ln u 1 cos(2π u 2 ) ,
x2 = −2 ln u 1 sin(2π u 2 ) x1 =
(C.9)
Appendix C: Drawing Random Variables with Prescribed Distributions
265
leads to a pair of independent Gaussian random variables x1 and x2 , each with distribution 1 2 (C.10) p(x) = √ e−x /2 . 2π In practice, one often needs a single Gaussian variable at a time and uses only one of the variables (x1 , x2 ). A Gaussian variable y of mean m and variance σ can be obtained by the simple rescaling y = m + σ x, where x satisfies the distribution (C.10).
C.2 Rejection Method An alternative method, which is applicable to any distribution, is the rejection method that we now describe. Starting from an arbitrary target distribution p(x) defined over an interval (a, b) (where a and/or b may be infinite), one first needs to find an auxiliary positive function G(x) satisfying the three following conditions: (i) for all x such b that a < x < b, G(x) ≥ p(x); (ii) a G(x) d x is finite; (iii) one is able to generate numerically a random variable x with distribution p(x) ˜ = b a
G(x) G(x ) d x
(a < x < b) ,
(C.11)
through another method, for instance, using a change of variable. Then the rejection method consists in two steps. First, a random number x is generated according to the distribution p(x). ˜ Second, x is accepted with probability p(x)/G(x); this is done by drawing a uniform random number u over the interval (0, 1), and accepting x if u < p(x)/G(x). The geometrical interpretation of the rejection procedure is illustrated in Fig. C.1. That the resulting variable x is distributed according to p(x) can be shown using the following simple reasoning. Let us symbolically denote as A the event of drawing the variable x according to p(x), ˜ and as B the event that x is subsequently accepted. We are interested in the conditional probability P(A|B), that is, the probability distribution of the accepted variable. One has the standard relation P(A|B) =
P(A ∪ B) . P(B)
(C.12)
The joint probability P(A ∪ B) is simply the product of the probability p(x) ˜ and the acceptance probability p(x)/G(x), yielding from Eq. (C.11) P(A ∪ B) = b a
p(x) G(x ) d x
.
(C.13)
266
Appendix C: Drawing Random Variables with Prescribed Distributions
p(x), G(x)
0.4 0.3
G(x) p(x) P1
0.2 0.1
P2
Acceptance area
0 0
1
Rejection area
2
3
4
5
x Fig. C.1 Illustration of the rejection method, aiming at drawing a random variable according to the normalized probability distribution p(x) (full line). The function G(x) (dashed line) is a simple upper bound of p(x) (here, simply a linear function). A point P is randomly drawn, with uniform probability, in the area between the horizontal axis and the function G(x). If P is below the curve defining the distribution p, its abscissa x is accepted (point P1 ); it is otherwise rejected (point P2 ). The random variable x constructed in this way has probability density p(x)—see text
Then, P(B) is obtained by summing P(A ∪ B) over all events A, yielding P(B) = a
b
dx b a
p(x) G(x ) d x
= b a
1 G(x ) d x
.
(C.14)
Combining Eqs. (C.12), (C.13) and (C.14) eventually leads to P(A|B) = p(x). From a theoretical viewpoint, any function satisfying conditions (i), (ii), and (iii) is appropriate. Considering the efficiency of the numerical computation, it is, however, useful to minimize the rejection rate, equal from Eq. (C.14) to r = 1− b a
1 G(x) d x
.
(C.15)
b Hence the choice of the function G(x) should also try to minimize a G(x) d x, to make it relatively close to 1 if possible. Note that G(x) does not need to be a close upper approximation of p(x) everywhere, only the integral of G(x) matters. Reference 1. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes, the Art of Scientific Computing, 3rd edn. Cambridge University Press, Cambridge (2007)
Solutions of the Exercises
The solutions of the exercises given at the end of each chapter are reported here. Note that throughout the solutions of the exercises, we use for convenience the notation β = 1/kB T for the inverse temperature, and set the Boltzmann constant to kB = 1 by an appropriate choice of units, unless otherwise stated.
Exercises of Chap. 1 1.1 Free energy and entropy Given the partition function Z in the canonical ensemble at temperature T , the free energy is defined as F = −T ln Z and the entropy can be evaluated as S = −∂ F/∂ T . (a) For the set of harmonic oscillators, the partition function reads Z=
N i=1
∞
−∞
d xi e− 2 βλxi = 1
2
2π βλ
N /2
.
(D.1)
The free energy F and the entropy S are then obtained as 2π T N NT ln , S= F =− 2 λ 2
2π T ln +1 . λ
(D.2)
(b) Similarly for the paramagnetic spin model one finds for the partition function N Z = 2 cosh(βh) ,
(D.3)
leading for the free energy and the entropy to F = −N T ln 2 cosh(βh) , S = N ln 2 cosh(βh) − Nβh tanh(βh) . © Springer Nature Switzerland AG 2021 E. Bertin, Statistical Physics of Complex Systems, Springer Series in Synergetics, https://doi.org/10.1007/978-3-030-79949-6
(D.4) 267
268
Solutions of the Exercises
1.2 Energy equipartition (a) By definition, in the canonical ensemble, the average energy is computed as E = −∂ ln Z /∂β. Here for a single degree of freedom with a quadratic energy 21 λx 2 , the partition function boils down to Z = 2π/βλ, see Exercise 1.1. The relation
1 2 1 λx = kB T 2 2
(D.5)
straightforwardly follows. (b) The above calculation still applies to the case of non-interacting harmonic oscillators, since the degrees of freedom xi are statistically independent. One thus has for all i 1 1 2 λi x = kB T , (D.6) 2 i 2 which is the precise meaning of the equipartition relation: all quadratic degrees of freedom have the same average energy although their stiffness may differ. (c) Using the definition of the Fourier amplitudes u(q), ˆ one obtains u j+1 − u j =
1 −iqa (e − 1) uˆ q e−iq ja . N q
(D.7)
The square (u j+1 − u j )2 can then be expressed as a double sum over two indices q and q . Performing the sum over j and exchanging the sums over j and over (q, q ), one has to evaluate N e−i(q+q ) ja = N δq+q ,0 , (D.8) j=1
where δn,n is the Kronecker delta, equal to 1 if n = n and 0 otherwise. It follows that q = −q in the sum, and the total energy reads N 1 j=1
2
λ(u j+1 − u j )2 =
1 λ 1 − cos(qa) |uˆ q |2 , N q
(D.9)
2 2 + uˆ q,I , with uˆ q,R and uˆ q,I , respectively, the real and imaginary where |uˆ q |2 = uˆ q,R parts of the complex Fourier amplitude uˆ q . Now the total energy decomposes into a sum of quadratic terms depending on a single degree of freedom (the real or imaginary part of uˆ q ) and the equipartition relation can be applied to each degree of freedom.
1.3 Ising chain (a) Using periodic boundary conditions (N + 1 ≡ 1), the total energy of the Ising chain can be written as
Solutions of the Exercises
269
E(s1 , . . . , sn ) = −
N i=1
J si si+1 +
h (si + si+1 ) 2
.
(D.10)
The partition function Z = {si } e−β E(s1 ,...,sn ) can then be expressed as the trace Z = TrT N of the N th power of a 2 × 2 matrix T called transfer matrix given by
β(J +h) eβ J e . T = eβ J eβ(J −h)
(D.11)
Note that the two indices of the transfer matrix T are the two spins si and si+1 , and the elements of the matrix T are
h (D.12) T (s, s ) = exp J ss + (s + s ) . 2 Introducing the two eigenvalues λ1 and λ2 (with λ1 > λ2 > 0), Z is then given by Z = λ1N + λ2N . The eigenvalues are solutions of [eβ(J +h) − λ] [eβ(J −h) − λ] − e−2β J = 0 so that λ1,2 = e
βJ
2 −4β J cosh βh ± sinh βh + e .
(D.13)
(D.14)
(b) It follows that the free energy per spin f is given in the limit N → ∞ by f = −T ln λ1 , so that
f = −J − T ln cosh βh + sinh2 βh + e−4β J . (D.15) (c) Using the definition of the average magnetization m, one sees that it can be obtained from the free energy per spin f as m = −T
sinh βh ∂f = . 2 ∂h (sinh βh + e−4β J )1/2
(D.16)
The magnetization is thus equal to zero for all temperature in the absence of external field, h = 0. Then the susceptibility follows by differentiating m with respect to h, χ=
∂m βe−4β J cosh(βh) = . ∂h (sinh2 βh + e−4β J )3/2
(D.17)
One asymptotically finds different results depending on h in the small temperature limit (β → ∞),
270
Solutions of the Exercises
χ ∼ 4β exp(−2β(2J + h)) → 0
if h > 0 ,
(D.18)
χ ∼ β exp(2β J ) → ∞
if h = 0 .
(D.19)
Hence the two limits h → 0 and T → 0 do not commute. 1.4 Paramagnetic model in a random field (a) For a given realization of the random fields h i , the magnetization reads m=
N 1 tanh(βh i ) . N i=1
(D.20)
Averaging over the disorder (the corresponding average is indicated by an overbar), one has ∞ N 1 1 2 2 m= tanh(βh i ) = √ dh e−h /2h 0 tanh(βh) = 0, N i=1 2π h 0 −∞
(D.21)
where the last equality follows by parity considerations. (b) The Edwards–Anderson parameter q = m 2 reads q=
1 tanh(βh i ) tanh(βh j ) . N 2 i, j
(D.22)
For i = j, the fields h i and h j are statistically independent so that tanh(βh i ) tanh(βh j ) = tanh(βh i ) tanh(βh j ) = 0 .
(D.23)
It follows that for N → ∞, q=
∞ N 1 1 2 2 2 tanh(βh i ) = √ dh e−h /2h 0 [tanh(βh)]2 . N i=1 2π h 0 −∞
(D.24)
For a high temperature T (small β) such that βh 0 1, one finds by expanding the hyperbolic tangent to leading order β2
q≈√ 2π h 0
∞ −∞
dh h 2 e−h
2
/2h 20
h2 =√ 0 . 2π T 2
In the opposite limit T → 0 (β → ∞), [tanh(βh)]2 → 1 so that q → 1.
(D.25)
Solutions of the Exercises
271
Exercises of Chap. 2 2.1 Detailed balance with respect to the canonical equilibrium distribution Detailed balance reads, in the equilibrium canonical ensemble,
W (C |C) e−β E(C) = W (C|C ) e−β E(C ) .
(D.26)
Assuming that W (C |C) = f (E) with E = E(C ) − E(C), we get f (−E) = f (E) eβE .
(D.27)
A standard form satisfying this relation is f (E) =
ν 1 + eβE
(D.28)
called the Glauber rate (the parameter ν is a frequency scale). A more general form satisfying (D.27) is ν f (E) = , (D.29) αβE (1 + e )1/α but it is rarely used in practice for α = 1. However, note that by taking the limit α → ∞, we get (D.30) f (E) = min 1, e−βE , called the Metropolis rate, which also obeys Eq. (D.27) and is often used in practice in numerical simulations. More generally, a transition rate satisfying the detailed balance relation (D.26) can be parameterized as W (C |C) = f (E) g E(C) + E(C )
(D.31)
with an arbitrary function g, and a function f satisfying (D.27). 2.2 Random walk with memory The vector X t = (xt2 , vt2 )T satisfies the recursion relation X t+1 = AX t + B,
(D.32)
where the matrix A and the vector B are given by
1 1 A= 0 α2
,
0 B= . 1
A particular solution of this equation is given by
(D.33)
272
Solutions of the Exercises (p)
Xt (p)
Writing X t = Yt + X t
=
1 1 − α2
t . 1
(D.34)
one finds Yt+1 = AYt and thus Yt = At Y0 . It follows that (p) (p) X t = At X 0 − X 0 + X t .
(D.35)
Evaluating the powers of A, one finds ⎛
1
At = ⎝ 0 One eventually finds xt2 = −
1−α 2t 1−α 2
α
⎞ ⎠
(D.36)
2t
1 − α 2t t + . (1 − α 2 )2 1 − α2
(D.37)
2.3 Linear Langevin equation with white or colored noise The Langevin equation reads dv = −γ v + ξ(t) . dt
(D.38)
(a) For a white noise satisfying ξ(t)ξ(t ) = δ(t − t ), the calculation of the correlation function v(t)v(t ) closely follows the calculation of v(t)2 done in Sect. 2.2.2, and one finds v(t)v(t ) =
−γ |t−t | e . 2γ
(D.39)
D −|t−t |/τ e , τ
(D.40)
(b) For a colored noise ξ(t) satisfying ξ(t)ξ(t ) =
the integral form of the variance given in Eq. (2.42) remains valid. Evaluating the latter integral using the noise correlation (D.40) one obtains v(t)2 =
D . γ (1 + γ τ )
(D.41)
2.4 Fokker–Planck equation in the Stratonovitch interpretation One considers the multiplicative Langevin equation in the Stratonovitch scheme, dx = Q(x) + B(x)ξ(t) , dt
(D.42)
Solutions of the Exercises
273
with ξ(t) a white noise satisfying ξ(t) = 0 and ξ(t)ξ(t ) = 2 δ(t − t ) (the diffusion coefficient is taken as unity, as it can be absorbed into the factor B(x) > 0). In the Stratonovitch interpretation, usual rules of differential calculus apply. One can then divide the Langevin equation (D.42) by B(x) and define a change of variable y(x) through dy 1 = , (D.43) dx B(x) leading to an additive Langevin equation for the variable y, dy ˜ = Q(y) + ξ(t) dt
(D.44)
˜ with Q(y) = Q(x)/B(x). For the Langevin equation with additive noise there is no ambiguity on the determination of the associated Fokker–Planck equations, which reads ∂ 2 p˜ ∂ ˜ ∂ p˜ (y, t) = − Q(y) p(y, ˜ t) + 2 . (D.45) ∂t ∂y ∂y One can now come back to the original variable x in the Fokker–Planck equation using the relations dx p(y, ˜ t) = p(x, t) , dy
∂ dx ∂ = ∂y dy ∂ x
(D.46)
with d x/dy = B(x). After simplification by an overall factor B(x), one finds ∂ ∂ ∂ ∂p (x, t) = − Q(x) p(x, t) + B(x) B(x) p(x, t) ∂t ∂x ∂x ∂x
(D.47)
which is the Stratonovitch form of the Fokker–Planck equation, as given in Eq. (2.96). 2.5 Fully biased random walk in a disordered environment After N steps, the particle has moved over a distance x = N a where N is the lattice spacing since displacements are fully biased. The time elapsed is the sum of the trapping times N t= τi , (D.48) i=1
where the trapping times τi are drawn from a distribution ψ(τ ) ∼ τ0α /τ 1+α for τ → ∞, with τ0 a characteristic time scale. For 0 < α < 1 and large N the sum typically scales as N 1−1/α 1/α τi ∼ τ0 t . (D.49) i=1
274
Solutions of the Exercises
Eliminating N , one finds that the displacement x scales with time as x ∼
a α t . τ0α
(D.50)
2.6 Decrease of the time-dependent free energy It is convenient to rewrite the time-dependent free energy F = E − T S (with E the ˜ where energy, T the temperature and S the time-dependent entropy) as F = −T S, ˜ =− S(t)
P(C, t) ln P(C, t) e E(C)/T .
(D.51)
C
Then following the same steps as in Sect. 2.6.2, one arrives at 1 d S˜ = dt 2
ln Q(C , t) − ln Q(C, t) Q(C , t) − Q(C, t) W (C|C ) e−E(C )/T ,
C,C (C =C )
(D.52) with the shorthand notation Q(C, t) = P(C, t) e E(C)/T , and where we have used the canonical detailed balance property W (C |C) e−E(C)/T = W (C|C ) e−E(C )/T . As the factors [ln Q(C , t) − ln Q(C, t)] and [Q(C , t) − Q(C, t)] have the same sign, their product is positive and the sum in Eq. (D.52) is positive, meaning that ˜ is an increasing function of time, that reaches its maximum when the canonical S(t) equilibrium is attained (corresponding to Q(C, t) independent of C and of time). As a result, the time-dependent free energy F = −T S˜ is a decreasing function of time that reaches its minimum at equilibrium.
Exercises of Chap. 3 3.1 Run-and-Tumble particle in a potential A Run-and-Tumble particle in a potential U (r) is described by the following master equation (the reader is referred to Sect. 3.1.3 for notations) 2π ∂P λ (r, θ) = −∇ · v0 e(θ) − κ∇U (r) P(r, θ) + dθ P(r, θ ) − λP(r, θ). ∂t 2π 0
(D.53)
Expanding P(r, θ ) into angular Fourier modes f k (r), one obtains v0 ∂ fk =− (∂x + i∂ y ) f k−1 + (∂x − i∂ y ) f k+1 + κ∇ · ( f k ∇U ) − λ f k 1 − δk,0 . ∂t 2 (D.54)
Solutions of the Exercises
275
Calculations for the Run-and-Tumble particle then closely follow that done in Sect. 3.1.3 for an Active Brownian particle. For large tumbling rate λ, one expands the stationary density profile ρ(r) to first order in 1/λ, as
1 ρ(r) ∝ exp −φ0 (r) − φ1 (r) . λ
(D.55)
Assuming that the potential U (r) is invariant along the direction y [U (r) = U (x)], one eventually obtains for φ0 (x) and φ1 (x) κ U (x) , D x κ κ2 2 κ3 φ1 (x) = − U (x) − d x U (x )3 , U (x) + 2 4D 2D 2 0
φ0 (x) =
(D.56a) (D.56b)
where the effective diffusion coefficient D = v02 /2λ is kept fixed when taking the large λ limit. 3.2 Derivation of the Poisson distribution The Poisson distribution P(n, T ) gives the probability that n events occur during a time duration T , for a Poisson process with rate λ (i.e., an event occurs with probability λdt in an infinitesimal time interval dt). For a small dT , one can relate the Poisson distribution associated with durations T + dT and T as follows: P(n, T + dT ) = P(n, T ) (1 − λdT ) + P(n − 1) λdT + o(dT ) ,
(D.57)
for n ≥ 1, and P(0, T + dT ) = P(0, T ) (1 − λdT ) + o(dT ). The first term in the rhs of Eq. (D.57) is the probability that there were n events in the time interval [0, T ] and no event in [T, T + dT ]. Conversely, the second term in the rhs of Eq. (D.57) is the probability that there were n − 1 events in the time interval [0, T ] and 1 event in [T, T + dT ]. The probability to have more than one event in [T, T + dT ] is of higher order in dT , and can be neglected when dT → 0. One thus obtains the following differential equations dP (n, T ) = −λP(n, T ) + λP(n − 1, T ) dT dP (0, T ) = −λP(0, T ) . dT
(n ≥ 1) ,
(D.58) (D.59)
These equations are supplemented by the initial condition P(n, 0) = δn,0 . One can then easily check that the Poisson distribution P(n, T ) =
(λT )n −λT e n!
is a solution of Eqs. (D.58) and (D.59) satisfying the initial condition.
(D.60)
276
Solutions of the Exercises
3.3 Absorbing phase transition and birth–death process The fully connected reaction–diffusion model of Sect. 3.2.3 is generalized to W (n + 1|n) = κn (n ≥ 1) , W (1|0) = ε , λ W (n − 1|n) = νn + n(n − 1) (n ≥ 1) , V
(D.61) (D.62)
with a small parameter ε > 0. The stationary distribution Pn can then be computed from the results on birth–death processes described in Sect. 3.2.1, yielding for n ≥ 1 pn = p0
n ε κ(k − 1) , ν k=2 νn + λk(k − 1)/V
(D.63)
with the convention that the “empty” product (n = 1) is equal to 1. For a large volume V , the product can be approximated by the exponential of an integral, pn ≈ p(ρ) = p0
ρ ε ε κ ≡ p0 e V f (ρ) , exp V dρ ln ν ν + λρ ν 0
(D.64)
with ρ = n/N and ρ = k/V . For κ < ν, the function f (ρ) is maximum at ρ = 0, with f (0) = 0. Taking the limit ε → 0, one finds p0 = 1 and pn = 0 for n ≥ 1. In contrast, when κ > ν, f (ρ) is maximum for ρ = ρ ∗ ≡ (κ − ν)/λ. As long as ∗ εe V f (ρ ) 1, the probability distribution remains peaked around ρ ∗ . Hence by taking the limit V → ∞ before the limit ε → 0, one observes an absorbing phase transition: the distribution is peaked at ρ = 0 for κ < ν and at ρ = ρ ∗ > 0 for κ > ν. One thus recovers in a different way the results of Sect. 3.2.3. 3.4 Frictional masses attached to a spring under tapping dynamics (a) The equation of motion of particle i reads
d xi d 2 xi + f i (t), m 2 = −λxi + μmg sign dt dt
(D.65)
where f i (t) is the external driving force applied during the tapping phases. Blocked configurations are static configurations satisfying λ|xi | < μmg (where mg is the weight of a particle of mass m), meaning that the force tangential to the frictional contact is less than μ times the normal force. Hence blocked configurations correspond to |xi | < a with a length a = μmg/λ. (b) According to the Edwards prescription, the probability of a configuration is nonzero only for blocked configurations, and is then given by an effective Boltzmann factor taking into account the potential energy and an effective temperature Teff . Given that energy is here additive, we get with the notation β = 1/Teff PN (x1 , . . . , x N ) =
N i=1
p(xi )
(D.66)
Solutions of the Exercises
277
with a one-body distribution p(x) given by 1 −βλx 2 /2 e if |x| < a , Z1 p(x) = 0 otherwise .
p(x) =
(D.67) (D.68)
The one-body partition function reads Z1 =
a
d x e−βλx
2
/2
.
−a
(D.69)
(c) The average energy is given by E=
N Z1
Using the change of variable y = tion,
a
dx −a
λ 2 −βλx 2 /2 x e . 2
(D.70)
√ βλ x, one finds a generalized equipartition rela-
1 E = N Teff f 2
a
λ Teff
,
(D.71)
.
(D.72)
with a function f (u) defined as u u f (u) = −u
d x x 2 e−βλx
−u
2
/2
d x e−βλx 2 /2
One has f (u) ∼ 13 u 2 for u → 0 and f (u) → 1 when u → ∞, so that 1 N Teff 2 1 E ≈ NT∗ 2 E≈
for Teff T ∗
(D.73)
for Teff T ∗ .
(D.74)
Hence at low effective temperature, the usual form of the equipartition relation is recovered (though with an effective temperature instead of the true thermodynamic temperature), while at high effective temperature, energy saturates to an upper value, determined by the friction coefficient. 3.5 Mean-field Fokker–Planck equation for interacting active particles (a) The mean-field Fokker–Planck equation governing the evolution of the oneparticle phase-space density f (r, θ, t) reads ∂2 f ∂ ∂f + v0 e(θ ) · ∇ f − τ f + DR 2 , ∂t ∂θ ∂θ
(D.75)
278
Solutions of the Exercises
with e(θ ) the unit vector in the direction θ , v0 the particle speed, and where τ (θ ) is the average torque exerted by neighboring particles, τ (θ ) = π d02
π −π
dθ K (θ − θ ) f (θ ) .
(D.76)
Here K (θ − θ ) = γ sin(θ − θ ) is the torque exerted by a given neighboring particle. (b) Assuming that f (r, θ, t) is spatially homogeneous, f (r, θ, t) = f (θ, t), one introduces the angular Fourier expansion
∞ 1 f (θ, t) = f k (t) e−ikθ , 2π k=−∞
f k (t) =
π −π
dθ f (θ, t) eikθ .
(D.77)
Projecting Eq. (D.75) onto Fourier modes, one gets d fk = −DR k 2 f k + ik dt
π
−π
dθ eikθ τ (θ, t) f (θ, t) .
(D.78)
Note that f 0 = ρ and that Eq. (D.78) for k = 0 implies that the particle density ρ is constant. Introducing the Fourier expansion (D.77) of f (θ, t) in the integral appearing in Eq. (D.78) as well as in the definition (D.76) of τ , one finds after some straightforward algebra (using the change of variable θ˜ = θ − θ to evaluate the double integral on θ and θ ):
π
−π
dθ eikθ τ (θ, t) f (θ, t) =
∞ 1 ˆ K −q f q f k−q , 2π q=−∞
(D.79)
where Kˆ q is the Fourier coefficient of order q of the two-body torque K (θ ), Kˆ q =
π
−π
dθ eikθ K (θ ) .
(D.80)
In the following we neglect f k for |k| ≥ 3 (see Sect. 3.5 for a justification of this approximation). Under this assumption, Eq. (D.78) yields for k = 1 and 2 γ γ d f1 = (ρ − ρc ) f 1 − f 1∗ f 2 , dt 2 2 d f2 2 = −4DR f 2 + γ f 1 , dt
(D.81) (D.82)
where we have defined the critical density ρc = 2DR /γ above which the isotropic state f 1 = f 2 = 0 becomes linearly unstable. For ρ larger than but close to ρc , f 1 is a slow mode while f 2 has a fast relaxation as compared to f 1 . Hence one can
Solutions of the Exercises
279
make a quasi-steady-state approximation on f 2 by neglecting d f 2 /dt, leading to f 2 = γ f 12 /(4DR ). Hence d f1 γ2 γ | f 1 |2 f 1 . = (ρ − ρc ) f 1 − dt 2 DR
(D.83)
Turning to vectorial notations, the evolution equation for the average particle flux w = (Re f 1 , Im f 1 ) reads dw γ2 γ ||w||2 w . = (ρ − ρc )w − dt 2 DR
(D.84)
The corresponding stationary solution w0 for ρ > ρc has an arbitrary direction and a norm given by ||w|| =
(ρ − ρc )DR . 2γ
(D.85)
Exercises of Chap. 4 4.1 Schelling model (a) Referring to the notations of Sect. 4.1, the absence of segregation is characterized by the fact that the second derivative f (ρ) > 0 for all density ρ ∈ (0, 1). In the zero temperature limit, f (ρ) = −u (ρ) where u(ρ) is the utility function, so that the condition for no segregation reads u (ρ) < 0 for all ρ. For instance, the simple utility function u(ρ) = u 0 (1 − ρ) with u 0 an arbitrary positive constant leads to a homogeneous state (no segregation). (b) Introducing the cooperativity (or “altruism”) parameter α according to Eq. (4.17), the second derivative f (ρ) reads, using Eq. (4.18), f (ρ) = (3α − 1)u (ρ) + αρu (ρ) .
(D.86)
Then taking, for instance, the simple utility function u(ρ) = u 1 ρ with u 1 > 0 leads to f (ρ) = (3α − 1)u 1 . Hence for 0 < α < 13 , f (ρ) < 0 and there is segregation (or phase separation), while for 13 < α < 1 one has f (ρ) > 0 and no segregation. 4.2 Model of traffic flow If D(ρ) = 0 for all ρ, the restabilization density ρs becomes equal to the maximal density ρm and the fundamental diagram no longer takes the characteristic inverse λ shape. On the other side, any form of the diffusion coefficient D(ρ) such that D(ρm ) > 0 restabilizes the flow at high density and leads to ρs < ρm . 4.3 Pricing of perishable goods A simple condition can be found by assuming that the contribution of the price p to the satisfaction precisely balances the contribution of the freshness h, that is
280
Solutions of the Exercises
p=
1−g h. g
(D.87)
However, this is a strong condition that can be relaxed. Assuming for simplicity a linear relation p = f (h) = ah so that the average of the price over the freshness can be easily calculated, a more refined condition can be found following the same steps as in Sect. 4.3. 4.4 Wealth distribution with capital tax Introducing a capital tax, the Langevin equation of the wealth distribution model reads [1] dWi = (m − J )Wi + Wi ξi (t) + J W − φWi + f φW . (D.88) dt Following the same reasoning as in Sect. 4.4, one finds for the growth rate γ = m + σ 2 − φ(1 − f )
(D.89)
and for the exponent μ of the rescaled wealth distribution μ=1+
J + φf . σ2
(D.90)
Exercises of Chap. 5 5.1 Kimura diffusion equation One writes the master equation of the birth–death process dpn = λn−1 pn−1 + μn+1 pn+1 − (λn + μn ) pn dt
(1 ≤ n ≤ N − 1) .
(D.91)
Defining x = n/N and p(x, t) = N Pn (t), one can expand Eq. (D.91) in powers of 1/N . The fitness difference s is assumed to be small, of order 1/N . Neglecting all terms of order higher than 1/N 2 (including, for instance, terms like s/N 2 ), one obtains the Kimura diffusion equation, ∂ ∂2 ∂p (x, t) = −N s [x(1 − x) p] + 2 [x(1 − x) p] . ∂t ∂x ∂x
(D.92)
5.2 Population dynamics with a high mutation rate The probability to have n individuals with genome σ2 in a population of N individuals with two accessible genomes σ1 and σ2 is given in Eq. (5.30). For N 1, it is convenient to divide the numerator and the denominator of each factor by N 2 . Then the logarithm of Pn can be approximated as an integral,
Solutions of the Exercises
281
ln Pn ≈ ln P0 + N
n/N
d x ln 0
(1 − x)( f x + ν) . x(1 − x + ν)
(D.93)
The condition ν N 1 is required to avoid discreteness effects at the boundary x = 0. Evaluating explicitly the integral in Eq. (D.93), one finds the expression (5.32) of the large deviation function g(x). It is then straightforward to determine the maximum x ∗ of the function g(x), and one ends up with Eq. (5.33). 5.3 Model of biodiversity The following sum of products ZN =
k N N − j +1 N − j +θ k=1 j=1
(D.94)
has been introduced in Sect. 5.3.4 as a useful auxiliary quantity. To determine its value, it is convenient to derive a recursion relation satisfied by Z N . One first notes that Z 1 = 1/θ . Then for N ≥ 1, one can write Z N +1 as N +1 k
Z N +1 =
N +1 N − j +2 + . N +θ N − j +1+θ k=2 j=1
(D.95)
Using the changes of indices k = k − 1 and j = j − 1, one obtains the recursion relation N +1 1 + ZN . (D.96) Z N +1 = N +θ The solution of this recursion relation satisfying the initial condition Z 1 = 1/θ is easily checked to be Z N = N /θ . N kφk = N translates into B Z N = N , Finally, the normalization condition k=1 implying B = θ . 5.4 Clustering in real space neutral dynamics Considering a one-dimensional system on the segment 0 < x < L, the density field evolves according to ∂ 2ρ ∂ρ (x, t) = D 2 + ξ(x, t), (D.97) ∂t ∂x where ξ(t) is a non-conservative spatiotemporal noise, ξ(x, t)ξ(x , t ) = δ(x − x ) δ(t − t ).
(D.98)
It is convenient to introduce the spatial Fourier expansion ρ(x, t) =
∞ 1 ρˆq (t) e−iq x L n=−∞
2π n q= , q integer L
(D.99)
282
Solutions of the Exercises
with
L
ρˆq (t) =
d x ρ(x, t) eiq x .
(D.100)
0
One has in particular ρ(0, ˆ t) = L ρ, ¯ where ρ¯ is the average density. In terms of Fourier transform, Eq. (D.97) reads
with
d ρˆq = −Dq 2 ρˆq + ξq (t) dt
(D.101)
ξq (t)ξq∗ (t ) = δq,q δ(t − t ),
(D.102)
where the star indicates a complex conjugate. Hence each mode ρˆq follows a Langevin equation independent of the other modes. The average value |ρq |2 is evaluated following the same steps as in Sect. 2.2.2, and one finds |ρq |2 =
2π . Dq 2
(D.103)
Note that there is a factor of 2 with respect to the results of Sect. 2.2.2 because one has to add up the contributions from the real and imaginary parts of ρq . One is interested in the variance Var(ρ) of the density of individuals in the system. For normal fluctuations, Var(ρ) is proportional to 1/L and thus decays to zero when the system size goes to infinity. Here one obtains 1 Var(ρ) = L
L 0
d x ρ(x, t)2 − ρ¯ 2 =
1 |ρq |2 , L 2 q=0
(D.104)
where we have used the Parseval identity, which follows directly from the definition of the Fourier transform. For large L, the sum over q can be approximated by an integral, also using Eq. (D.103), ∞ 2π L L 2 2 . |ρq | ≈ dq = 2π 2π/L Dq 2 2π D q=0
(D.105)
It follows that Var(ρ) = /(2π D) is independent of L for large L, and thus does not decay to zero, indicating anomalously large density fluctuations.
Solutions of the Exercises
283
Exercises of Chap. 6 6.1 Moments of node degree For the Erdös-Rényi random network with probability p = z/N of connecting two nodes, the node degree k has a Poisson distribution Pd (k) =
z k −z e , k!
(D.106)
when the number N of nodes goes to infinity. The average value k is given by k =
∞
k Pd (k) = e−z
k=0
∞
∞
zk zk = ze−z = z. (k − 1)! k! k=0
k=1
(D.107)
For the second moment k 2 , one finds k 2 =
∞
k 2 Pd (k) = e−z
k=0
∞ k=1
∞
k
d z ze = z + z 2 . = ze−z dz
zk zk = ze−z (k + 1) (k − 1)! k! k=0 (D.108)
It follows that the variance Var(k) is given by Var(k) = k 2 − k2 = z .
(D.109)
More generally, moments of arbitrary order can be conveniently computed using the moment generating function μ(s) = e−sk , whose expansion in powers of s generates the moments at all order,1 μ(s) =
∞ (−s)n n=0
n!
k n .
(D.110)
For the Poisson distribution (D.106), one finds μ(s) =
∞ k=0
e
−sk
Pd (k) = e
−z
∞ (ze−s )k k=0
k!
= e z(e
−s
−1)
.
(D.111)
Expanding μ(s) in powers of s to order s 2 , one finds
1
Provided that moments exist and that the series has a non-zero convergence radius, which is the case for a Poisson distribution.
284
Solutions of the Exercises
μ(s) = 1 − sz +
s2 (z + z 2 ) + o s 2 . 2
(D.112)
Hence one recovers in this way the results k = z and k 2 = z + z 2 , and higher moments can be computed in a systematic way from the expansion of μ(s). 6.2 Final fraction of recovered individuals in the SIR model ±∞ = limt→±∞ ρI,R (t). Dividing Eq. (6.18) by ρS and We define the notations ρI,R integrating over the whole real axis, one finds ln ρS∞
−
ln ρS−∞
= −βk
∞ −∞
ρI (t) dt .
(D.113)
On the other side, integrating Eq. (6.20) over the real axis leads to ρR∞ − ρR−∞ = μ
∞ −∞
ρI (t) dt .
(D.114)
Combining Eqs. (D.113) and (D.114), and using ρS−∞ = 1, ρR−∞ = 0 and ρS∞ = 1 − ρR∞ (i.e., ρI∞ = 0), one finds ∞
ρR∞ = 1 − e−βkρR /μ .
(D.115)
For βk/μ < 1, the only solution of Eq. (D.115) is ρR∞ = 0, as expected since the dynamics remains below the epidemic threshold. In contrast, for βk/μ > 1, a nonzero solution of Eq. (D.115) exists and can be obtained perturbatively for βk/μ close to 1, assuming that ρR∞ is small. One then finds ρR∞ ≈
2μ(βk − μ) . (βk)2
(D.116)
6.3 SIS model on heterogeneous network Similarly to the SIR model, the dynamics of the fraction ρkI of infected individuals on nodes of degree k reads dρkI = −μρkI + βkρkS θk . dt
(D.117)
However, at variance with the SIR model, here ρkS = 1 − ρkI , leading to dρkI = (βk − μθk )ρkI + βkθk . dt
(D.118)
The quantity θk is the probability that a randomly chosen contact of a susceptible individual having k contacts is infected. Here, since an individual becomes susceptible again after being infected, the fact to be in contact with a susceptible individual
Solutions of the Exercises
285
does not obviously reduce the probability of being infected as it was the case for the SIR model. Hence it is natural to model θk as θk =
∞
ρkI P(k |k) .
(D.119)
k =1
In the absence of correlations between node degrees, P(k |k) = k Pd (k )/k, leading to ∞ 1 I θ= k ρ Pd (k ) , (D.120) k k =1 k where θk is simply written as θ since it no longer depends on k. Taking the time derivative of θ and using Eq. (D.118), one finds for the early stage linearized dynamics (θ 1, ρkI 1)
k 2 dθ = β −μ θ. (D.121) dt k Epidemic growth is thus obtained for βk 2 − μk > 0, and the corresponding characteristic growth time is given by τ=
k . βk 2 − μk
(D.122)
6.4 Perceptron model for small N Consider, for instance, the following elementary example of the Perceptron model with p = 2 patterns ξ k (k = 1 or 2) each having N = 3 components ξik (i = 1, 2 or 3). Take, for instance, ξ 1 = (1, −1, −1) and ξ2 = (−1, −1, 1), as well as σ1 = 1 and σ2 = −1. To solve the Perceptron problem, one then needs to find three weight parameters (w1 , w2 , w3 ) such that w1 − w2 − w3 > 0 ,
−w1 − w2 + w3 < 0 .
(D.123)
One can check that the choices (w1 , w2 , w3 ) = (1, 1, −1) and (1, −1, −1) both satisfy the constraints given in Eq. (D.123).
Exercises of Chap. 7 7.1 Stability of the fixed points of a map (a) The fixed points satisfying f (x) = x are x1 = 0 and x2 = 43 . The derivative f reads f (x) = 4 − 8x so that f (0) = 4 and f 43 = −2. Hence | f (0)| > 1 and | f 43 | > 1: both fixed points x1 = 0 and x2 = 34 are unstable. (b) The fixed points are x1 = α1 and x2 = 1. The derivative f is given by f (x) =
286
Solutions of the Exercises
−α + 2αx, so that f α1 = 2 − α and f (1) = α. Therefore, the fixed point x1 = α1 is stable for 1 < α < 3 and unstable for 3 < α < 4, while the fixed point x2 = 1 is unstable for all α in the range 1 < α < 4. 7.2 Stationary distribution of a chaotic map The equation f (y) = x has two solutions y for a given x, namely, y1 = x/2 and y2 = (1 + x)/2. The stationary distribution p(x) satisfies the equation
p(y2 ) 1 x 1 1+x p(y1 ) + = p + p . p(x) = | f (y1 )| | f (y2 )| 2 2 2 2
(D.124)
It is easy to check that the uniform distribution p(x) = 1 (the value of the constant being fixed by normalization) is a solution of Eq. (D.124). 7.3 Kuramoto model with two frequencies For a frequency distribution g(ω) =
1 1 δ(ω − ω0 ) + δ(ω + ω0 ) 2 2
(D.125)
Equation (7.45) boils down to r=
1−
w02 , K 2r 2
(D.126)
under the condition that K r > ω0 . Eq. (D.126) has two real solutions when K > 2ω0 . The solution of Eq. (D.126) satisfying the constraint K r > ω0 is ⎞ ⎛ 4ω02 ⎠ 1⎝ r= 1+ 1− 2 . 2 K
(D.127)
For K = 2ω0 (1 + ε) with 0 < ε 1, the expression of r simplifies to r≈
1 + 2
ε . 2
(D.128)
The synchronization transition is thus discontinuous, since the order parameter r takes a non-zero value when ε → 0. 7.4 Oscillator death phenomenon We consider the following distribution of frequencies g(ω) given by g(ω) =
ω4
g0 , + ω04
(D.129)
for the coupled oscillators defined in Eq. (7.53). As argued in Sect. 7.3.3, we focus on the regime K > 1, where the oscillator death phenomenon may happen. Defining
Solutions of the Exercises
287
the Fourier transform G(s) of g(ω), G(s) =
∞
−∞
eiωs g(ω) dω ,
(D.130)
the growth rate λ of the linearized dynamics around the fixed point z j = 0 is solution of the equation [see Eq. (7.60)]
∞
e(1−K −λ)s G(s) ds =
0
1 . K
(D.131)
For g(ω) defined in Eq. (D.129), the Fourier transform G(s) reads [2] G(s) = e
√ −|s|ω0 / 2
sω0 |s|ω0 . cos √ + sin √ 2 2
(D.132)
Evaluating explicitly the integral in Eq. (D.131), one obtains the following equation √ √ λ2 + λ( 2ω0 − 2 + K ) + ω02 − 2ω0 + 1 − K = 0 .
(D.133)
This equation has two (real or complex) solutions λk (k = 1, 2), that can be determined explicitly but have lengthy expressions. The stability condition is that both solutions satisfy Reλk < 0. Whether the solutions are real or complex conjugate values, the condition Reλk < 0 for k = 1 and 2 is equivalent to the condition that λ1 + λ2 < 0 and λ1 λ2 > 0. The sum and product are directly read off from the coefficients of Eq. (D.133), √ λ1 + λ2 = − 2ω0 + 2 − K ,
λ1 λ2 = ω02 −
√ 2ω0 + 1 − K ,
(D.134)
and we look for the range of values of ω0 satisfying these constraints, for a given K > 1. The condition λ1 λ2 > 0 leads to √ 1 ω0 > ω0c = √ 1 + 2K − 1 . 2
(D.135)
One can check that for ω0 > ω0c , one has λ1 + λ2 < 0 so that the stability criterion is fulfilled.
Exercises of Chap. 8 8.1 Derivation of convergence theorems Consider N independent and identically distributed (iid) random variables x j ( j = 1, . . . , N ), with probability distribution (i.e., probability density function) p(x), with
288
Solutions of the Exercises
mean value m and variance σ 2 . One defines the (random) sum S N = useful to introduce the characteristic function ∞ d x p(x) eiq x , χ (q) = eiq x =
N j=1
x j . It is
(D.136)
−∞
which is nothing but the Fourier transform of p(x); the function χ (q) fully characterizes the probability distribution. (a) Law of Large Number: taking s N = S N /N , the characteristic function of s N reads χ N (q) = eiqs N =
N
! eiq x j /N = χ
j=1
q N N
,
(D.137)
where the last equality results from the iid property. The small q expansion of the characteristic function χ (q) reads χ (q) = 1 + iqm −
q2 2 x + o q 2 . 2
(D.138)
It follows that N 1 iqm +o χ N (q) = 1 + → eiqm (N → ∞) . N N
(D.139)
As eiqm is the characteristic function of the Dirac distribution δ(s − m), one concludes that the distribution PN (s) of the random variable s N converges to P∞ (s) = δ(s − m) when N → ∞. (b) Central Limit Theorem: the reasoning follows a similar path, using now the √ rescaled variable z N = (S N − N m)/ N . One finds for the characteristic function χ˜ N (q) of the variable z N :
N √ q . χ˜ N (q) = eiqz N = e−iqm/ N χ √ N
(D.140)
Using the expansion (D.138) of χ (q), one finds
q 2σ 2 +o χ˜ N (q) = 1 − 2N
1 N
N
→ e−q
2
σ 2 /2
(N → ∞) ,
(D.141)
which is the characteristic function of a centered Gaussian distribution of variance σ 2 . Hence the distribution P˜N (z) of the variable z N converges to 1 2 2 e−z /(2σ ) . P˜∞ (z) = √ 2 2π σ
(D.142)
Solutions of the Exercises
289
8.2 Generalization of the Law of Large Numbers for a mixture For a mixture of iid random variables, defined as 1 1 p1 (x j ) + p2 (x j ) 2 j=1 2 j=1 N
P(x1 , . . . , x N ) =
N
(D.143)
one can follow the same steps as for the derivation of the standard Law of Large Numbers, and one finds for the characteristic function χ N (q) of the rescaled sum s N = N −1 Nj=1 x j that χ N (q) →
1 iqm 1 1 iqm 2 + e e 2 2
(N → ∞) ,
(D.144)
∞ with m k = −∞ d x x pk (x) (k = 1, 2). Hence the distribution PN (s) of s N converges when N → ∞ to 1 1 P∞ (s) = δ(s − m 1 ) + δ(s − m 2 ) , (D.145) 2 2 meaning that the empirical average s N no longer converges to a single, well-defined value as in the case of the standard Law of Large Numbers. 8.3 Link between extreme values and sums of random variables Consider N independent and identically distributed (iid) random variables x j > 0 (1 ≤ j ≤ N ), with distribution P(x), and define the ordered set of random variables xn = xσ (n) , with σ (n) a permutation such that x1 ≥ x2 ≥ · · · ≥ x N . Then in particular x1 = max(x1 , . . . , x N ). Introduce the difference y j = x j − x j+1 (1 ≤ j ≤ N − 1)
(D.146)
as well as, for notational convenience, y N = x N . It follows that max (xn ) =
1≤ j≤N
N
yj ,
(D.147)
j=1
meaning that an extreme value problem is mapped to a problem of random sum, where in general the variables y j are not independent and not identically distributed. If P(x) = λ e−λx , the joint distribution PN (y1 , . . . , y N ) is evaluated by ordering the random variables x j ,
∞
PN (y1 , . . . , y N ) = λ N N !
d x N e−λx N . . .
0
× δ(y N − x N )
∞
d x1 e−λx1
x2 N −1 j=1
δ y j − (x j − x j+1 ) .
(D.148)
290
Solutions of the Exercises
The integration interval for the variable x j is [x j+1 , ∞] (1 ≤ j ≤ N − 1). With the change of variables z j = x j − x j+1 (1 ≤ j ≤ N − 1) in Eq. (D.148), the different integrals factorize and one finds: PN (y1 , . . . , y N ) =
N
kλ e−kλyk .
(D.149)
k=1
Hence starting from an exponential distribution P(x), the random variables y j are independent (but not identically distributed). According to known results of extreme value statistics (see Sect. 8.2.2), the mapping between random variables x j and y j shows that the sum of the variables y j is distributed according to a Gumbel distribution. This result can actually be extended to a generalized Gumbel distribution by slightly modifying the distribution of the variables y j [3]. 8.4 Examples of large deviation functions The rescaled sum s = N −1 Nj=1 x j of random variables x j often takes a large deviation form (D.150) PN (s) ∼ e−N φ(s) which may be interpreted as a refined form of the Law of Large Numbers and of the Central Limit Theorem. The Gärtner–Ellis theorem briefly presented in Sect. 8.3 indicates that the large deviation function φ(s) is linked to the scaled cumulant generating function λ(k) through a Legendre–Fenchel transform λ(k) = sup[ks − φ(s)] .
(D.151)
s
Inverting this Legendre–Fenchel transform leads to the expression of φ(s), φ(s) = sup[ks − λ(k)] .
(D.152)
k
To find φ(s), one thus determines k0 such that λ (k0 ) = s, leading to φ(s) = k0 s − λ(k0 ) .
(D.153)
For iid random variables x j with distribution p(x), λ(k) = ekx , where the brackets means an average over the random variable x. (a) Real positive random variable x with distribution p(x) = μ2 x e−μx : In this case,
k (k < μ) , (D.154) λ(k) = −2 ln 1 − μ so that λ (k) = 2/(μ − k). The solution k0 of the equation λ (k0 ) = s is k0 = (μs − 2)/s, yielding μs φ(s) = μs − 2 − 2 ln . (D.155) 2
Solutions of the Exercises
291
(b) Integer positive random variable n with Poisson distribution p(n) = e−α α n /n!: Here λ(k) = α(ek − 1), leading to k0 = ln(s/α) for the solution of λ (k0 ) = s. One then finds for φ(s), s (D.156) φ(s) = s ln − s + α . α References 1. Bouchaud, J.P., Mézard, M.: Wealth condensation in a simple model of economy. Phys. A 282, 536 (2000)
2. Ermentrout, G.B.: Oscillator death in populations of “all to all” coupled nonlinear oscillators. Phys. D 41, 219 (1990)
3. Bertin, E.: Global fluctuations and Gumbel statistics. Phys. Rev. Lett. 95, 170601 (2005)