Fundamental Theories of Physics 212
Naoto Shiraishi
An Introduction to Stochastic Thermodynamics: From Basic to Advanced
Fundamental Theories of Physics Volume 212
Series Editors Henk van Beijeren, Utrecht, The Netherlands Philippe Blanchard, Bielefeld, Germany Bob Coecke, Oxford, UK Dennis Dieks, Utrecht, The Netherlands Bianca Dittrich, Waterloo, ON, Canada Ruth Durrer, Geneva, Switzerland Roman Frigg, London, UK Christopher Fuchs, Boston, MA, USA Domenico J. W. Giulini, Hanover, Germany Gregg Jaeger, Boston, MA, USA Claus Kiefer, Cologne, Germany Nicolaas P. Landsman, Nijmegen, The Netherlands Christian Maes, Leuven, Belgium Mio Murao, Tokyo, Japan Hermann Nicolai, Potsdam, Germany Vesselin Petkov, Montreal, QC, Canada Laura Ruetsche, Ann Arbor, MI, USA Mairi Sakellariadou, London, UK Alwyn van der Merwe, Greenwood Village, CO, USA Rainer Verch, Leipzig, Germany Reinhard F. Werner, Hanover, Germany Christian Wüthrich, Geneva, Switzerland Lai-Sang Young, New York City, NY, USA
The international monograph series “Fundamental Theories of Physics” aims to stretch the boundaries of mainstream physics by clarifying and developing the theoretical and conceptual framework of physics and by applying it to a wide range of interdisciplinary scientific fields. Original contributions in well-established fields such as Quantum Physics, Relativity Theory, Cosmology, Quantum Field Theory, Statistical Mechanics and Nonlinear Dynamics are welcome. The series also provides a forum for non-conventional approaches to these fields. Publications should present new and promising ideas, with prospects for their further development, and carefully show how they connect to conventional views of the topic. Although the aim of this series is to go beyond established mainstream physics, a high-profile and open-minded Editorial Board will evaluate all contributions carefully to ensure a high scientific standard.
Naoto Shiraishi
College of Arts and Sciences
University of Tokyo
Meguro City, Tokyo, Japan
ISSN 0168-1222   ISSN 2365-6425 (electronic)
Fundamental Theories of Physics
ISBN 978-981-19-8185-2   ISBN 978-981-19-8186-9 (eBook)
https://doi.org/10.1007/978-981-19-8186-9

© Springer Nature Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
In the past three decades, the field of stochastic thermodynamics has been formulated and extensively developed. Stochastic thermodynamics extends the framework of thermodynamics, which was originally aimed at macroscopic objects, to small fluctuating systems, including Brownian particles in laser traps and molecular motors in living systems. Experimental advances that allow these small systems to be controlled accurately have driven further development of stochastic thermodynamics. In conventional thermodynamics for macroscopic systems, entropy is a key quantity that characterizes the irreversibility of processes. Several relations involving entropy and related quantities, such as the second law of thermodynamics, take the form of inequalities. In stochastic thermodynamics, we define entropy production, which characterizes the irreversibility of processes in small systems. Unlike in conventional thermodynamics, entropy production satisfies not only inequalities but also equalities, even under highly nonequilibrium conditions. Celebrated examples are the fluctuation theorem and the Jarzynski equality, which take a striking exponential form. These equalities reveal a hidden symmetric structure of entropy, which sheds new light on thermodynamic irreversibility. Moreover, novel inequalities that give tighter bounds than the second law of thermodynamics have also been discovered in stochastic thermodynamics. These inequalities offer a fresh view of thermodynamic irreversibility, showing that fundamental thermodynamic constraints exist beyond the second law. In most of these inequalities, thermodynamic irreversibility is connected to some measure of the speed of a process, which usually lies outside the scope of conventional thermodynamics. This textbook aims to provide a comprehensive view of stochastic thermodynamics as developed over the last three decades.
Important research topics in stochastic thermodynamics, including the fluctuation theorem, information thermodynamics, and the thermodynamic uncertainty relation, are each explained in one or more dedicated chapters. This textbook also covers a variety of important universal relations and models in stochastic thermodynamics, including stochastic efficiency, waiting-time statistics, the Hatano-Sasa relation, the Harada-Sasa relation, Brownian motors and the flashing ratchet, autonomous free energy transducers, efficiency at maximum power, and speed limit inequalities.
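To make the “exponential form” of these equalities concrete, here is a minimal sketch in standard notation. The symbols are assumed for illustration and may differ from those used in later chapters:

- σ is the entropy production along a single trajectory;
- W is the work done on the system;
- ΔF is the free-energy difference between the initial and final equilibrium states;
- β is the inverse temperature of the bath;
- ⟨·⟩ is the average over trajectories.

Because e^{-x} is convex, Jensen's inequality turns each equality into a familiar second-law-type inequality.

```latex
\begin{align*}
  % Integral fluctuation theorem, then Jensen's inequality for the convex function e^{-x}
  1 = \langle e^{-\sigma} \rangle &\ge e^{-\langle \sigma \rangle}
    &&\Longrightarrow\quad \langle \sigma \rangle \ge 0, \\
  % Jarzynski equality, assuming the system starts in equilibrium at inverse temperature \beta
  e^{-\beta \Delta F} = \langle e^{-\beta W} \rangle &\ge e^{-\beta \langle W \rangle}
    &&\Longrightarrow\quad \langle W \rangle \ge \Delta F.
\end{align*}
```

The first line recovers the second law for the average entropy production. The second recovers the statement that, on average, the work done on the system cannot fall below the free-energy change. The equalities themselves carry more information than these averaged inequalities.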
Readers are assumed to be familiar with conventional thermodynamics and basic linear algebra; no further background is required. This textbook is written in a self-contained manner, and no prior knowledge of information theory or stochastic processes is assumed.

I am grateful to Ken Funo, Sosuke Ito, Kyogo Kawaguchi, Takumi Matsumoto, Takahiro Sagawa, Keiji Saito, Hiroyasu Tajima, Hal Tasaki, and Shumpei Yamamoto, who collaborated with me on some of my works related to the subjects of this textbook. Some of these joint works are explained in detail in this textbook. I also thank Kay Brandner, Massimiliano Esposito, Takashi Hara, Masaru Hongo, Jordan Horowitz, Masato Itami, Eiki Iyoda, Yuki Izumida, Kiyoshi Kanazawa, Kazuya Kaneko, Eiro Muneyuki, Naoko Nakagawa, Takenobu Nakamura, Yohei Nakayama, Takahiro Nemoto, Juan Parrondo, Shin-ichi Sasa, Udo Seifert, Ken Sekimoto, Akira Shimizu, Frédéric van Wijland, David Wolpert, and Kaoru Yamamoto for stimulating discussions, which deepened my understanding of stochastic thermodynamics. I express my gratitude to Kay Brandner, Amit Kumar Chatterjee, Andreas Dechant, Hisao Hayakawa, Jordan Horowitz, Masato Itami, Sosuke Ito, Yuki Izumida, Kiyoshi Kanazawa, Kyogo Kawaguchi, Ikumi Kobayashi, Keigo Masaki, Urei Miura, Kunimasa Miyazaki, Kunihiko Mori, Takashi Mori, Yohei Nakayama, Takahiro Nemoto, Takahiro Sagawa, Shin-ichi Sasa, Udo Seifert, Ken Sekimoto, Akira Shimizu, Kohei Takuma, Hiroyasu Tajima, Hal Tasaki, David Wolpert, Kaoru Yamamoto, and Subaru Yoshimoto for carefully reading this textbook and for their helpful comments and advice.

Meguro City, Japan
July 2022
Naoto Shiraishi
Contents
1 Background
1.1 Aims of Stochastic Thermodynamics
1.2 Overview of This Textbook
1.2.1 Overview of Part I
1.2.2 Overview of Part II
1.2.3 Overview of Part III
1.2.4 Overview of Part IV
1.2.5 How to Read This Textbook?
1.3 Notation, Terminologies and Remarks

Part I Basic Framework

2 Stochastic Processes
2.1 Markov Process and Discrete-Time Markov Chain
2.2 Continuous Time Markov Jump Process on Discrete System
2.3 Convergence Theorem
2.4 Formal Introduction of Markov Process

3 Stochastic Thermodynamics
3.1 Shannon Entropy
3.1.1 Stochastic Entropy
3.1.2 Shannon Entropy
3.2 Definition of Heat
3.2.1 Time-Reversal Symmetry of Equilibrium State
3.2.2 Heat in Discrete-State Systems and Detailed-Balance Condition
3.3 Entropy Production
3.4 Differences Between Conventional Thermodynamics and Stochastic Thermodynamics
3.4.1 Summary of Conventional Thermodynamics
3.4.2 Summary of Stochastic Thermodynamics
3.4.3 Entropy
3.4.4 Reversible Adiabatic Processes
3.4.5 How to Derive Results for Macroscopic Systems from Stochastic Thermodynamics

4 Stochastic Processes in Continuous Space
4.1 Mathematical Foundations
4.1.1 Wiener Process
4.1.2 Stochastic Differential Equations and Integrals
4.1.3 Differential Chapman-Kolmogorov Equation
4.2 Description of Langevin Dynamics
4.2.1 Langevin Equation
4.2.2 Experimental Verification of Langevin Description
4.3 Heat in Langevin System
4.4 Entropy Production and Mean Local Velocity
4.5 Multi-dimensional Cases
4.6 Discretization and Continuum Limit
4.6.1 Decomposition of Operator
4.6.2 Discretization of the Stochastic Part
4.6.3 Discretization of the Deterministic Part
4.6.4 Space Discretization and Time Discretization

Part II Equalities

5 Fluctuation Theorem
5.1 Detailed Fluctuation Theorem
5.1.1 Stochastic Case
5.1.2 Deterministic Case
5.2 Integral Fluctuation Theorem
5.2.1 Integral Fluctuation Theorem
5.2.2 Jarzynski Equality
5.3 Entropy Production as Phase Volume Change and Expression with KL Divergence
5.3.1 Kullback-Leibler Divergence
5.3.2 Phase Volume
5.3.3 Deterministic Case
5.3.4 Stochastic Case
5.3.5 Absolute Irreversibility
5.4 Thermodynamic Quantities with Strong-Coupling

6 Reduction from Fluctuation Theorem to Other Thermodynamic Relations
6.1 Second Law of Thermodynamics
6.1.1 Standard Derivation of the Second Law
6.1.2 Large Deviation Analysis
6.2 Fluctuation-Dissipation Theorem
6.2.1 Fluctuation-Dissipation Theorem at Zero Frequency
6.2.2 Fluctuation-Dissipation Theorem with Finite Frequency
6.2.3 Higher-Order Relations
6.2.4 Difference from Conventional Linear Response Theory
6.3 Onsager Reciprocity Theorem

7 Fluctuation-Theorem-Type Equalities
7.1 Hatano-Sasa Relation
7.1.1 Dual Transition
7.1.2 Hatano-Sasa Relation and Generalized Second Law
7.1.3 Framework of Steady State Thermodynamics
7.1.4 Hatano-Sasa Inequality and Monotonicity of Kullback-Leibler Divergence
7.2 Entropy Production Under Coarse-Graining
7.2.1 Case Without Nonequilibrium Driving
7.2.2 Case with Nonequilibrium Driving and Hidden Entropy Production
7.2.3 Invariance of Extended Entropy Through Coarse-Graining

8 Various Aspects of Symmetry in Entropy Production
8.1 Introduction to Large Deviation Property and Generating Function
8.1.1 Moments and Cumulants
8.1.2 Counting Field
8.1.3 Large Deviation Theory and Rate Function
8.1.4 Gärtner-Ellis Theorem
8.2 Lebowitz-Spohn Fluctuation Theorem
8.2.1 Symmetry in Cumulant Generating Function of Entropy Production
8.2.2 Fluctuation-Dissipation Theorem Derived from the Symmetry of Cumulant Generating Function
8.3 Waiting Time Statistics
8.3.1 Martingale Property
8.3.2 First Passage Time Statistics
8.4 Work-Heat Rate Function and Stochastic Efficiency
8.4.1 Stochastic Current and Stochastic Efficiency
8.4.2 Carnot Efficiency as Least Probable Efficiency

9 Information Thermodynamics
9.1 Maxwell’s Demon Problem
9.1.1 Maxwell’s Original Problem Setting
9.1.2 Breakthrough by Szilard
9.1.3 Arguments by Brillouin and Gabor
9.1.4 Arguments by Landauer and Bennett
9.1.5 Is Maxwell’s Demon Problem Solved?
9.2 Second Law of Information Thermodynamics
9.2.1 Mutual Information
9.2.2 Second Law of Information Thermodynamics
9.2.3 Clarification of Maxwell’s Demon
9.3 Sagawa-Ueda Relation
9.3.1 Sagawa-Ueda Relation
9.3.2 Additivity
9.4 Problem of Autonomous Maxwell’s Demon
9.4.1 Autonomous Maxwell’s Demon: 4-State Model
9.4.2 Second Law of Information Thermodynamics in General Information Processes
9.4.3 Limitation of Sagawa-Ueda Relation
9.5 Partial Entropy Production and IFT for General Information Processes
9.5.1 Partial Entropy Production
9.5.2 Fluctuation Theorem for Partial Entropy Production
9.5.3 Fluctuation Theorem for General Information Processes
9.6 Another Extension: Ito-Sagawa Relation
9.6.1 Bayesian Network
9.6.2 Transfer Entropy
9.6.3 Ito-Sagawa Relation and Its Derivation
9.7 Remarks
9.7.1 Partial Entropy Production with Broken Time-Reversal Symmetry
9.7.2 Inequality for Partial Entropy Production
9.7.3 Definition of Heat in Discrete-Time Markov Chains
9.7.4 Information Reservoir

10 Response Relation Around Nonequilibrium Steady State
10.1 Fluctuation-Response Relation at Stalling State
10.1.1 Fluctuation-Response Relation on Current at Stalling State
10.1.2 Fluctuation-Response Relation on Time-Symmetric Current at Stalling State
10.2 Response Theory of Stationary Distribution
10.2.1 Expression of Stationary Distribution by Matrix-Tree Theorem
10.2.2 Response Equality and Inequality for Stationary Distribution
10.3 Remarks
10.3.1 Alternative Proof of Eq. (10.5) Based on the Generating Function
10.3.2 Proof of Eq. (10.46)

11 Some Results on One-Dimensional Overdamped Langevin Systems
11.1 Path Probability of Dynamics
11.1.1 Onsager-Machlup Functional
11.1.2 Fluctuation Theorem and Hatano-Sasa Relation
11.1.3 Harada-Sasa Relation
11.2 Stationary State of One-Dimensional Overdamped Langevin Systems
11.2.1 Expression of Stationary Distribution
11.2.2 Generating Function of Velocity
11.2.3 Diffusion Constant and Mobility

Part III Intermission: Interesting Models

12 Externally-Controlled Systems: Flashing Ratchet and Pump
12.1 Ratchet and Asymmetric Pumping
12.1.1 Flashing Ratchet and Curie Principle
12.1.2 Reversible Transport
12.2 Hidden Pumping

13 Direction of Transport
13.1 Brownian Motor and Adiabatic Piston
13.1.1 Brownian Motor
13.1.2 Adiabatic Piston Problem
13.1.3 Heuristic Argument
13.2 Parrondo’s Paradox
13.2.1 Problem and Examples
13.2.2 Similarity to Simpson’s Paradox

14 Stationary Systems: From Brownian Motor to Autonomous Macroscopic Engines
14.1 Autonomous Ratchet Model
14.1.1 Feynman’s Ratchet
14.1.2 Büttiker-Landauer Model
14.1.3 Unattainability of Carnot Efficiency
14.2 Small Autonomous Models Attaining the Carnot Efficiency
14.3 Macroscopic Autonomous Engines
14.3.1 Setup and Its Coarse-Grained Description
14.3.2 Maximum Efficiency
14.3.3 Attainability of Carnot Efficiency
14.4 Necessary Condition to Attain Carnot Efficiency
14.4.1 Questions
14.4.2 General Principle
14.4.3 Nonlinear Tight-Coupling Window

Part IV Inequalities

15 Efficiency at Maximum Power
15.1 Endoreversible Processes and Curzon-Ahlborn Efficiency
15.2 Onsager Matrix Approach
15.3 Linear Expansion with Velocity
15.4 Remarks

16 Trade-Off Relation Between Efficiency and Power
16.1 Carnot Efficiency and Finite Power: Prelude
16.1.1 No Restriction from General Frameworks
16.1.2 Model Analyses
16.2 Trade-Off Relation Between Heat Current and Entropy
16.2.1 Main Inequalities
16.2.2 Proofs
16.3 Trade-Off Relation Between Efficiency and Power
16.4 Notion of Finite Speed and Finite Power
16.4.1 Inherent Time Scale
16.4.2 Time-Scale Separation
16.5 Remarks
16.5.1 Inequality for General Conserved Quantities
16.5.2 Evaluation of

17 Thermodynamic Uncertainty Relation
17.1 Thermodynamic Uncertainty Relation
17.1.1 Main Claim
17.1.2 Proof Based on Generalized Cramér-Rao Inequality
17.2 TUR-Type Inequalities
17.2.1 Generalization of Thermodynamic Uncertainty Relation
17.2.2 Kinetic Uncertainty Relation
17.2.3 The Optimal TUR-Type Inequality
17.2.4 Attainability of Equality in TUR-Type Inequalities
17.3 Thermodynamic Uncertainty Relation for Ballistic Transport with Broken Time-Reversal Symmetry
17.4 Remarks
17.4.1 TUR in Langevin Systems
17.4.2 Alternative Derivation of Thermodynamic Uncertainty Relation with Large Deviation Techniques
17.4.3 Weaker Relation Derived from Time-Reversal Symmetry
17.4.4 Statistical Meaning of the Cramér-Rao Inequality and the Fisher Information

18 Speed Limit for State Transformation
18.1 Geometric Viewpoint for Speed Limit Inequalities
18.2 Speed Limit for Overdamped Langevin System
18.2.1 Linear Expansion by Speed
18.2.2 Optimal Bound with Wasserstein Distance
18.3 Speed Limit for General Markov Processes on Discrete States
18.3.1 Speed Limit with Entropy Production
18.3.2 Numerical Demonstration
18.3.3 Optimal Speed Limit with Pseudo Entropy Production
18.4 Remarks
18.4.1 Quantum Speed Limit for Isolated Systems

19 Variational Aspects of Entropy Production
19.1 Variational Expression of Entropy Production Rate and Bounds in Relaxation Processes
19.1.1 Variational Expression of Entropy Production Rate
19.1.2 Bound for Relaxation Processes
19.1.3 Generalization
19.2 Variational Expression of Excess and Housekeeping Entropy Productions
19.2.1 Excess/Housekeeping Decomposition
19.2.2 Bound of Excess Entropy Production in Relaxation Processes
19.3 Entropy Production with Inappropriate Initial Distribution

Part V Notes and History

20 Notes and History
20.1 Notes and History of Part I
20.2 Notes and History of Part II
20.3 Notes and History of Part III
20.4 Notes and History of Part IV

Appendix A: Derivation of Eqs. (14.15) and (14.18)
Appendix B: Proof of Eq. (16.49)
Appendix C: Evaluation of  in Several Systems
References
Index
Chapter 1
Background
1.1 Aims of Stochastic Thermodynamics

Stochastic thermodynamics is an extension of thermodynamics to small fluctuating systems such as Brownian particles, molecular motors in living systems, and mesoscopic quantum dots. Within this framework, novel relations, including the celebrated fluctuation theorem and related results [1–6], have been discovered. These relations reveal a hidden symmetric structure of thermodynamic irreversibility in nonequilibrium fluctuations. In the last two decades, stochastic thermodynamics has attracted the interest of many physicists. Besides these relations, one basic but important achievement of stochastic thermodynamics is to establish how to define thermodynamic quantities (e.g., heat, work, and entropy) in stochastic fluctuating systems and which relations (e.g., the first and second laws of thermodynamics) hold among them. This is a highly nontrivial task from the standpoint of conventional macroscopic thermodynamics and statistical mechanics. In fact, not so long ago, some even considered that small stochastic systems might escape the restrictions of conventional macroscopic thermodynamics¹. This view is also reflected in many proposals of perpetual motion machines of the second kind. The formulations developed in stochastic thermodynamics have refuted it.

We now summarize the main motivations of stochastic thermodynamics, which stem from two roots. The first is based on experimental observations of small fluctuating systems, from biochemical systems to quantum mesoscopic systems. Here we take molecular motors in biological systems as a prominent example. From the viewpoint of thermodynamics and statistical mechanics, molecular motors can be regarded as engines converting the chemical potential of resources such as ATP into mechanical force.
However, molecular motors apparently differ from conventional heat engines in many aspects. Molecular motors are so small that thermal fluctuation affects them, which sometimes disturbs and sometimes helps their motion. This is in clear contrast to conventional heat engines, which do not fluctuate. In addition, molecular motors work autonomously under fluctuation, whereas conventional heat engines are operated externally and deterministically. Interestingly, careful experiments have revealed that many molecular motors, such as F1-ATPase [8, 9], kinesins² [11], and myosins [12], operate with high efficiency compared to our power plants³. Therefore, why and how molecular motors achieve high efficiency, and what significantly distinguishes molecular motors from conventional heat engines, are important questions in nonequilibrium statistical mechanics.

¹ For example, Feyerabend [7] states that the Brownian particle can violate the second law of thermodynamics.

© Springer Nature Singapore Pte Ltd. 2023 N. Shiraishi, An Introduction to Stochastic Thermodynamics, Fundamental Theories of Physics 212, https://doi.org/10.1007/978-981-19-8186-9_1

The second motivation is a theoretical interest in understanding what structures or fundamental relations exist. One of the central problems in nonequilibrium statistical mechanics is the foundation of thermodynamic irreversibility from a microscopic viewpoint. The problem of how the arrow of time emerges from reversible microscopic dynamics⁴ has been debated since Boltzmann. Aside from this, several relations in nonequilibrium statistical mechanics rely on both thermodynamics and microscopic dynamics. For example, the fluctuation-dissipation relation was first derived from consistency with the second law of thermodynamics [13] and later from microscopic dynamics [14, 15]. The original derivation of the Onsager reciprocity theorem [16] combined thermodynamic phenomenology with the reversibility of microscopic dynamics. Although all of these relations are well established, it is fruitful to clarify which aspects of thermodynamics and nonequilibrium statistical mechanics are reflected in them, which will also help us unveil novel relations. Discovering universal relations in nonequilibrium systems is also important. Unlike equilibrium and near-equilibrium systems, nonequilibrium systems admit only a few known relations.
If we succeed in clarifying universal relations, they might offer clues for a comprehensive characterization of nonequilibrium dynamics, which is an ultimate goal of nonequilibrium statistical physics.

Stochastic thermodynamics has answered these two sets of questions, at least partially. Regarding the first set, stochastic thermodynamics and stochastic energetics are formulated as a thermodynamic framework for small stochastic systems, which ensures that, in this respect, molecular motors are no different from conventional heat engines. In addition, many ratchet models [17] confirm that the unidirectional motion of molecular motors in a stochastic environment is not a surprising phenomenon. Some researchers, including biophysicists, argue that the special feature of molecular motors lies in their use of information [18]. Molecular motors consist of several subsystems, and experimental observations [18] suggest that one of the subsystems behaves as if it measures another subsystem and changes its motion depending on the measurement outcome, which is a kind of information process⁵. This interpretation matches the intuitive picture that molecular motors work by harnessing thermal fluctuation. Information thermodynamics serves as a stage to analyze the connection between information and thermodynamic quantities. Autonomy is another characteristic of molecular motors, in sharp contrast to conventional externally controlled engines. Since quasistatic control cannot be realized under autonomous conditions, whether the Carnot efficiency is achievable is a nontrivial question. Recent studies have revealed the conditions for autonomous systems to achieve the Carnot efficiency. In addition, recent progress in stochastic thermodynamics has elucidated several trade-offs between entropy production and other quantities, including the speed of operations and current fluctuations. These inequalities enable us to define a novel type of efficiency, which may serve as a guiding principle for molecular motors in their evolutionary history.

Regarding the second set of questions, various universal relations have been derived. In particular, the fluctuation theorem provides a clear understanding of thermodynamic irreversibility. It sheds new light on the importance of time-reversal symmetry and microscopic reversibility, and various relations on entropy production in fact rephrase this symmetry. In addition, known macroscopic laws in thermodynamics and nonequilibrium statistical mechanics, including the second law of thermodynamics, the fluctuation-dissipation theorem, and the Onsager reciprocity theorem, are reproduced by the fluctuation theorem. It is well known that the second law of thermodynamics and other thermodynamic relations are violated if a process involves information processing, which is called the problem of Maxwell's demon. Information thermodynamics establishes how to construct a thermodynamic framework for a single subsystem with information processes by introducing the mutual information. From a more abstract viewpoint, information thermodynamics pushes forward the idea of additive decomposition of entropy production, where we decompose the total entropy production into small components, each of which individually satisfies thermodynamic relations.

Recently, a number of inequalities, not equalities, have been proposed in stochastic thermodynamics. These inequalities suggest that entropy production also plays the role of a fundamental limitation on the speed of operations. The notion of speed is not fully captured in conventional thermodynamics. The modern framework of stochastic thermodynamics connects two important concepts: the speed of dynamics and thermodynamic irreversibility.

² We, however, note that some recent experiments on kinesins reported that the efficiency of kinesins is not so high [10].
³ The efficiencies of some molecular motors are around 0.7–0.9, whereas those of power plants are usually less than 0.5.
⁴ Both Hamiltonian dynamics and unitary evolution are reversible.
⁵ In the case of F1-ATPase, the α₃β₃ unit surrounds the γ shaft. The α₃β₃ unit has three main stable states, which act as three different potential landscapes for the γ shaft. The experiment [18] reports that the α₃β₃ unit changes its state as if it measured the state of the γ shaft and performs feedback control depending on the measurement outcome.

1.2 Overview of This Textbook

This textbook consists of four parts. Part I is devoted to mathematical foundations and definitions of basic quantities in stochastic thermodynamics. We introduce stochastic processes both on discrete states and in continuous space (Langevin systems). In addition, we describe how to define thermodynamic quantities (heat, work, and entropy) in small stochastic systems. In Part II, we present various equalities in stochastic thermodynamics. The most important one is the fluctuation theorem, which unveils an unexpected symmetry in nonequilibrium fluctuations. In fact, most of the equalities in stochastic thermodynamics can be regarded as variants of the fluctuation theorem. Another important achievement is information thermodynamics, which combines information theory and stochastic thermodynamics. With this framework, we can analyze the role of information in thermodynamic processes with measurement and feedback operations. In particular, we solve the problem of Maxwell's demon with this framework. Part III is exceptional in that we mainly treat concrete toy models, not universal relations. One-directional transport by external operations and autonomous free energy transducers are the two main subjects of this part. We first present numerous toy models showing interesting behaviors, and then seek the general principles behind them. In Part IV, we present various inequalities in stochastic thermodynamics. One important inequality shown in this part is the thermodynamic uncertainty relation, which connects entropy production and fluctuations around nonequilibrium stationary states. Other inequalities mainly concern the relationship between entropy production and the speed of dynamics, which elucidates a novel aspect of thermodynamic irreversibility.
1.2.1 Overview of Part I

In Part I, we prepare mathematical tools and explain the basic framework of stochastic thermodynamics. Readers who are familiar with Markov processes, the local detailed-balance condition, and the definitions of quantities in stochastic thermodynamics can start from Part II without reading this part.

In Chap. 2, we introduce Markov processes and Markov jump processes on discrete states. In stochastic systems, our main interest is the probability distribution $\boldsymbol{p}$ on microscopic discrete states, which evolves according to the master equation

$$\frac{d}{dt}\boldsymbol{p}(t) = R\,\boldsymbol{p}(t),$$

where $R$ is the transition rate matrix. Under some reasonable assumptions on $R$, the stationary distribution $\boldsymbol{p}^{\mathrm{ss}}$ satisfying $\frac{d}{dt}\boldsymbol{p}^{\mathrm{ss}} = R\,\boldsymbol{p}^{\mathrm{ss}} = 0$ uniquely exists, and any initial distribution converges to this stationary distribution. We prove these results in Sect. 2.3.

We then review the framework of stochastic thermodynamics on discrete states in Chap. 3. We first introduce the Shannon entropy and the stochastic entropy (surprisal) in Sect. 3.1, and then define heat, work, and entropy production in stochastic systems. The entropy production σ is the most important quantity in stochastic thermodynamics, as it quantifies the degree of thermodynamic irreversibility of processes. At the same time, we introduce key ideas: the time-reversal symmetry of equilibrium states and the (local) detailed-balance condition, which are employed in the characterization of heat. At the end of this chapter (Sect. 3.4), we clarify the differences between conventional thermodynamics and stochastic thermodynamics.

In Chap. 4, we treat stochastic processes and stochastic thermodynamics in continuous space. Note that this textbook mainly treats discrete systems, and continuous systems appear only in Chap. 11 and Sects. 12.1, 17.4.1, and 18.2. Therefore, readers who do not plan to read these sections can skip this chapter. Unlike the case of discrete states, stochastic processes in continuous space require careful treatment. A continuous stochastic variable $\hat{x}$ evolves according to a stochastic differential equation of the following form (a general form of Langevin equations):

$$\frac{d\hat{x}}{dt} = a(\hat{x}(t), t) + b(\hat{x}(t), t)\,\hat{\xi}(t),$$

where $\hat{\xi}(t)$ is white Gaussian noise satisfying $\langle\hat{\xi}(t)\rangle = 0$ and $\langle\hat{\xi}(t)\hat{\xi}(t')\rangle = \delta(t - t')$. The problem lies in the rule for the product of $b(\hat{x}(t), t)$ and $\hat{\xi}(t)$. In Sect. 4.1.2, we introduce two important product rules, the Itô product and the Stratonovich product, and explain how the choice of product rule affects stochastic dynamics. We also introduce the corresponding Fokker-Planck equation, which describes the time evolution of the probability distribution $P(x, t)$. The problem of the product rule also appears in the definition of heat in Langevin systems. As shown in Sect. 4.3, the definition of heat must employ the Stratonovich product to satisfy the first law of thermodynamics. In Sect. 4.6, we demonstrate how to discretize stochastic processes in continuous space into those on discrete states, which allows us to carry the results for discrete systems over to continuous systems.
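The relaxation to a unique stationary distribution described above can be checked numerically on a small example. The following Python sketch assumes an arbitrary three-state system with hand-picked transition rates (the values in `RATES` are hypothetical, not taken from the text). It builds the transition rate matrix $R$ with $R_{w'w}$ equal to the rate from $w$ to $w'$ and diagonal entries equal to minus the escape rates, and integrates the master equation with a simple forward-Euler step from two different initial distributions.

```python
# Minimal sketch: relax a 3-state Markov jump process to its stationary state by
# integrating the master equation dp/dt = R p with forward-Euler steps.
# The rates are hypothetical illustrative values, not taken from the text.

# RATES[(w, w2)] = transition rate from state w to state w2
RATES = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 2.0,
         (2, 1): 1.0, (2, 0): 0.3, (0, 2): 0.7}
N_STATES = 3


def rate_matrix(rates, n):
    """Build R with R[w2][w] = rate w -> w2 (w != w2) and R[w][w] = -(escape rate of w)."""
    R = [[0.0] * n for _ in range(n)]
    for (w, w2), k in rates.items():
        R[w2][w] += k   # probability flows into w2 ...
        R[w][w] -= k    # ... and out of w
    return R


def apply(R, p):
    """Return the vector R p."""
    return [sum(R[i][j] * p[j] for j in range(len(p))) for i in range(len(R))]


def evolve(R, p, dt=0.01, t_total=50.0):
    """Forward-Euler integration of dp/dt = R p; columns of R sum to 0, so sum(p) is conserved."""
    for _ in range(int(round(t_total / dt))):
        dp = apply(R, p)
        p = [pi + dt * dpi for pi, dpi in zip(p, dp)]
    return p


R = rate_matrix(RATES, N_STATES)
p_a = evolve(R, [1.0, 0.0, 0.0])   # start entirely in state 0
p_b = evolve(R, [0.0, 0.0, 1.0])   # start entirely in state 2
print([round(x, 6) for x in p_a])
print([round(x, 6) for x in p_b])
print(max(abs(x) for x in apply(R, p_a)))   # residual of R p_ss = 0
```

Both initial distributions end up at the same vector, here $(5/28, 8/28, 15/28) \approx (0.178571, 0.285714, 0.535714)$, which one can confirm by solving $R\boldsymbol{p}^{\mathrm{ss}} = 0$ by hand. Forward Euler is chosen only for transparency; because every column of $R$ sums to zero, each step conserves total probability.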
1.2.2 Overview of Part II

In Part II, we present various equalities in stochastic thermodynamics. Two important achievements, the fluctuation theorem and information thermodynamics, are presented in this part. We introduce and prove the fluctuation theorem and the Jarzynski equality in Chap. 5. The integral fluctuation theorem (IFT) is expressed as
$$\langle e^{-\hat{\sigma}}\rangle = 1,$$

where $\hat{\sigma}$ is the entropy production and $\langle\cdot\rangle$ represents the ensemble average. Notably, this equality holds in general nonequilibrium processes, even far from equilibrium. One big advantage of the fluctuation theorem is that it reproduces known relations in nonequilibrium statistical mechanics. We derive the second law of thermodynamics (Sect. 6.1), the fluctuation-dissipation theorem (Sect. 6.2), and the Onsager reciprocity theorem (Sect. 6.3) from the fluctuation theorem. Beyond reproducing existing relations, the fluctuation theorem also yields novel ones: we derive a higher-order fluctuation-dissipation theorem in Sect. 6.2.3. We note that the connection between the IFT and the fluctuation-dissipation theorem extends to IFT-type equalities. We present some extensions of the fluctuation-dissipation theorem to nonequilibrium stationary states in Sect. 10.1.

The form of the IFT has now become a standard way to manifest the existence of a thermodynamic structure, because we can derive various thermodynamic relations from IFT-type equalities, as mentioned above. We present some important IFT-type equalities in Chap. 7. In Sect. 7.1, we explain a generalized form of the fluctuation theorem, the Hatano-Sasa relation, which is a pioneering work in steady-state thermodynamics. In Sect. 7.2, we treat systems in which fast variables are coarse-grained. While the entropy production is preserved under coarse-graining in equilibrium cases, it generally decreases in nonequilibrium cases. This decrease is characterized by an IFT-type equality.

The fluctuation theorem reflects a hidden symmetric property of entropy production. This symmetric property takes various forms and appears in various situations besides the IFT, which is the subject of Chap. 8. In Sect. 8.2, we see this symmetry in the cumulant generating function. In Sect. 8.3, we consider waiting-time statistics of entropy production.
In particular, we show that the stochastic variable $e^{-\hat{\sigma}}$ is a martingale in stationary systems. In Sect. 8.4, we introduce stochastic efficiency and show that the least probable stochastic efficiency is the Carnot efficiency. This fact is shown by combining a geometric interpretation of stochastic entropy with the fluctuation theorem.

Chapter 9 is devoted to another important achievement of stochastic thermodynamics, information thermodynamics. We start from the problem of Maxwell's demon. We review the arguments of Maxwell, Szilard, Brillouin, Landauer, and Bennett in Sect. 9.1. Although some papers and books state that memory erasure is crucial to understanding Maxwell's demon, we show that this argument is somewhat secondary (Sect. 9.2.3). To clarify this point, we need to formulate information thermodynamics. We introduce the Sagawa-Ueda relation (Sect. 9.3)

$$\langle e^{-\hat{\sigma} + \Delta\hat{I}}\rangle = 1,$$

where $\Delta\hat{I}$ is the change in the mutual information between the system of interest and another system (e.g., a memory). Accordingly, we obtain the generalized second law $\sigma \ge \Delta I$. The Sagawa-Ueda relation reveals that if mutual information is exchanged between the system and another system, thermodynamic relations must be modified.
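The claim that the IFT holds arbitrarily far from equilibrium can be illustrated by exact enumeration on a toy model. The sketch below is not the setting used in the book: it assumes a discrete-time Markov chain on three states with fixed energies (hypothetical values in `ENERGIES`), Metropolis transition probabilities that satisfy detailed balance at inverse temperature `BETA`, and a single time step starting from a nonequilibrium distribution. Along each one-step trajectory $w_0 \to w_1$ it takes $\hat{\sigma} = \ln p_0(w_0) - \ln p_1(w_1) + \beta\hat{Q}$, with the heat $\hat{Q} = E_{w_0} - E_{w_1}$ counted as positive when released into the bath (the sign convention listed in Sect. 1.3).

```python
import math

# Hypothetical toy model (assumption): discrete-time Markov chain, fixed energies,
# Metropolis transition probabilities, k_B = 1.
BETA = 1.0
ENERGIES = [0.0, 1.0, 2.5]
N = len(ENERGIES)


def metropolis_matrix(energies, beta):
    """T[j][i] = probability of a jump i -> j in one step; satisfies detailed balance."""
    n = len(energies)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                T[j][i] = min(1.0, math.exp(-beta * (energies[j] - energies[i]))) / n
        T[i][i] = 1.0 - sum(T[j][i] for j in range(n) if j != i)   # stay put otherwise
    return T


T = metropolis_matrix(ENERGIES, BETA)
p0 = [0.6, 0.3, 0.1]                                   # nonequilibrium initial distribution
p1 = [sum(T[j][i] * p0[i] for i in range(N)) for j in range(N)]

avg_exp_minus_sigma = 0.0
avg_sigma = 0.0
for i in range(N):          # initial state w0 = i
    for j in range(N):      # final state w1 = j
        prob = p0[i] * T[j][i]                          # probability of the path i -> j
        if prob == 0.0:
            continue
        heat = ENERGIES[i] - ENERGIES[j]                # heat released into the bath
        sigma = math.log(p0[i]) - math.log(p1[j]) + BETA * heat
        avg_exp_minus_sigma += prob * math.exp(-sigma)
        avg_sigma += prob * sigma

print(avg_exp_minus_sigma)
print(avg_sigma)
```

The first printed number equals 1 up to rounding error, while the second, the mean entropy production, is strictly positive because the initial distribution is not the canonical one, consistent with the second law $\langle\hat{\sigma}\rangle \ge 0$ that follows from the IFT via Jensen's inequality.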
Information thermodynamics is expected to capture the informational aspects of biological systems, including molecular motors. However, the Sagawa-Ueda relation applies only to systems with external control, and autonomous systems such as biological ones are out of its scope. To overcome this problem, we present two extensions of information thermodynamics that cover general information processes. In Sect. 9.5, we introduce the partial entropy production, which is a decomposition of the entropy production $\hat{\sigma}$ into contributions from each possible transition. Notably, the partial entropy production also satisfies an IFT-type equality. Applying this idea to composite systems, we find a generalization of the Sagawa-Ueda relation for general information processes. In Sect. 9.6, we present another type of generalization: relations on causal networks. We introduce the transfer entropy, which is a kind of conditional mutual information, and using it, we show the Ito-Sagawa relation.

In Chap. 10, we investigate relations on response functions around nonequilibrium stationary states. From the vast literature seeking to extend the fluctuation-response relation to nonequilibrium stationary states, we pick up results around nonequilibrium stalling states. In Sect. 10.1, we show that the fluctuation-response relation holds around nonequilibrium stalling states in the same form as the conventional one. As another topic, in Sect. 10.2, we derive some equalities and inequalities on the response of the stationary distribution.

Chapter 11 is an exceptional chapter, where we treat overdamped Langevin systems, not Markov jump processes on discrete states. Thanks to the Gaussian property of the noise, we can compute the path probability of overdamped Langevin systems in a simple form, which is called the Onsager-Machlup functional (Sect. 11.1.1). Using this expression, in Sect. 11.1.3, we derive the Harada-Sasa relation, which states that the violation of the fluctuation-response relation in the nonequilibrium stationary state is directly related to the stationary heat dissipation. The Harada-Sasa relation is useful in experiments, since the stationary heat dissipation is not easy to observe directly, whereas both the fluctuation and the response are measurable. We also present explicit forms of the stationary distribution, the diffusion coefficient, and the mobility in one-dimensional overdamped Langevin systems by employing the techniques of the cumulant generating function (Sect. 11.2).
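To get a feel for the size of the information term in the Sagawa-Ueda relation, the following sketch computes the mutual information $I = \sum_{x,m} p(x,m)\ln\frac{p(x,m)}{p(x)p(m)}$ created by a noisy measurement. The setup is a hypothetical example: a binary state $x$ taking each value with probability 1/2, read out as $m$ with error probability ε. The result is in nats, consistent with $k_B = 1$.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;M) in nats from a joint probability table joint[x][m]."""
    px = [sum(row) for row in joint]                                   # marginal of X
    pm = [sum(joint[x][m] for x in range(len(joint)))                  # marginal of M
          for m in range(len(joint[0]))]
    info = 0.0
    for x, row in enumerate(joint):
        for m, pxm in enumerate(row):
            if pxm > 0.0:                  # 0 * ln 0 is taken as 0
                info += pxm * math.log(pxm / (px[x] * pm[m]))
    return info


def noisy_binary_measurement(eps):
    """Hypothetical joint table: uniform binary state X read out with error probability eps."""
    return [[0.5 * (1 - eps), 0.5 * eps],
            [0.5 * eps, 0.5 * (1 - eps)]]


for eps in (0.0, 0.1, 0.5):
    print(eps, mutual_information(noisy_binary_measurement(eps)))
```

An error-free measurement yields ln 2 ≈ 0.693, a measurement that is wrong half the time yields zero, and in between the closed form $\ln 2 + \varepsilon\ln\varepsilon + (1-\varepsilon)\ln(1-\varepsilon)$ is reproduced.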
1.2.3 Overview of Part III

Part III is an intermission, where we introduce various interesting models. Unlike the other parts, most of the arguments in this part (except Sect. 14.4) are model-dependent and not universal.

The subject of Chap. 12 is one-directional transport by external driving. The Curie principle states that if the driving is symmetric, we cannot induce asymmetric, one-directional transport. However, the converse is not always true: some asymmetric driving cannot induce one-directional transport. We first introduce a flashing ratchet, which is a simple model of one-directional transport by a switching potential. We then examine the possibility of reversible one-directional transport. One stimulating model, the hidden pump model, realizes (apparently) reversible one-directional transport with finite speed.

In Chap. 13, we see that determining the direction of transport is not an easy task. We first consider a simple model of Brownian motors in Sect. 13.1. We construct a composite system by rigidly connecting a rectangular object and a wedge-shaped object placed in baths at different temperatures. This composite object moves steadily in one direction, accompanied by heat flow from hot to cold. This model reproduces the adiabatic piston as a limiting case. We present a heuristic and qualitative (partially semi-quantitative) argument to determine the direction of motion of the composite object. In addition, in Sect. 13.2, a strange composite system called Parrondo's game is analyzed. In this game (system), two subsystems each realize transport in the same direction, whereas their composite system realizes transport in the opposite direction.

The subject of Chap. 14 is the behavior of autonomous free energy transducers (stationary cross-transport) from the viewpoint of maximum efficiency with a finite temperature difference (or finite chemical potential difference). We first examine two famous models, Feynman's ratchet and the Büttiker-Landauer model, which convert heat flow into work (Sect. 14.1). Although these models are sometimes claimed to achieve the Carnot efficiency in the quasistatic limit, we show that they in fact fail to do so even in that limit. The difficulty for autonomous systems to achieve the Carnot efficiency lies in the fact that all variables in autonomous engines inevitably fluctuate, which may cause undesired heat leakage and thus suppress the efficiency. The Carnot efficiency is not expected to be attainable by all autonomous engines, and in fact we reveal that autonomous engines attain the Carnot efficiency only under severe conditions. In Sect. 14.2, we briefly review some small autonomous models that attain the Carnot efficiency. In Sect. 14.3, we introduce a model of a macroscopic autonomous engine converting a chemical potential difference into mechanical work, whose efficiency is less than the Carnot efficiency in moderate setups but reaches the Carnot efficiency with singular transition rates. In Sect. 14.4, we derive a general necessary condition for autonomous engines to attain the Carnot efficiency. We show that a certain type of singularity is inevitable to attain the Carnot efficiency, as suggested in the previous section. We then clarify the difference between autonomous engines of finite size and those in the thermodynamic limit by introducing the viewpoint of a nonlinear tight-coupling window.
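The counterintuitive reversal in Parrondo's game can be reproduced with a few lines of code. The sketch below assumes the widely used capital-dependent version; the win probabilities and the bias `EPS` follow the standard Harmer-Abbott choice and are an assumption here, since the model analyzed in Sect. 13.2 may differ in detail. Game A is a slightly unfavorable coin; game B uses a bad coin when the capital is a multiple of 3 and a good coin otherwise. Because both games depend only on the capital modulo 3, the mean drift follows from the stationary distribution of a three-state Markov chain, computed here by power iteration.

```python
# Sketch of Parrondo's capital-dependent games (Harmer-Abbott parameters; an assumption,
# not necessarily the exact model of Sect. 13.2).
EPS = 0.005


def stationary(P, iters=5000):
    """Stationary distribution of a discrete-time chain with P[j][i] = prob(i -> j), by power iteration."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(P[j][i] * p[i] for i in range(n)) for j in range(n)]
    return p


def drift(win_prob):
    """Mean capital gain per round; win_prob[r] is the win probability when capital mod 3 == r."""
    P = [[0.0] * 3 for _ in range(3)]
    for r, pw in enumerate(win_prob):
        P[(r + 1) % 3][r] += pw          # win: capital + 1
        P[(r - 1) % 3][r] += 1.0 - pw    # lose: capital - 1
    pi = stationary(P)
    return sum(pi[r] * (2.0 * win_prob[r] - 1.0) for r in range(3))


game_a = [0.5 - EPS] * 3
game_b = [0.1 - EPS, 0.75 - EPS, 0.75 - EPS]
mixed = [0.5 * (a + b) for a, b in zip(game_a, game_b)]   # play A or B with prob. 1/2 each

for name, game in (("A", game_a), ("B", game_b), ("random A/B", mixed)):
    print(name, round(drift(game), 4))
```

With these parameters, games A and B each drift downward (game A at exactly −2·EPS per round), whereas choosing A or B at random each round drifts upward, by roughly +0.016 per round.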
1.2.4 Overview of Part IV

In Part IV, we present various inequalities in stochastic thermodynamics. Particularly important results in this part are the thermodynamic uncertainty relation and some inequalities manifesting the trade-off between the entropy production and the speed of dynamics.
Three chapters, Chaps. 15, 16, and 18, are devoted to investigating the relationship between entropy production (or related quantities) and the speed of processes (or related quantities) from different perspectives. In Chap. 15, we consider the efficiency at maximum power, which is known to be bounded by half the Carnot efficiency in the linear response regime. We derive this result in three different setups: endoreversible thermodynamics, linear irreversible thermodynamics for stationary systems, and a linear expansion in velocity. In Chap. 16, we prove the trade-off relation between power and efficiency. It is plausible to expect that an engine with large power inevitably involves much dissipation, which implies lower efficiency. In particular, a heat engine at the Carnot efficiency is expected to have zero power. However, the conventional frameworks, thermodynamics and linear irreversible thermodynamics, do not formally prohibit the coexistence of finite power and the Carnot efficiency. To resolve this controversy, we employ the framework of stochastic thermodynamics. In Sect. 16.2, we first derive a trade-off relation between the heat current and the entropy production,

$$|J^q| \le \sqrt{\Theta\,\dot{\sigma}},$$

with a coefficient $\Theta$, and by applying it we obtain a trade-off inequality between power and efficiency:

$$\frac{W}{\tau} \le \bar{\Theta}\,\beta_L\,\eta(\eta_C - \eta),$$

where $\bar{\Theta}$ is a coefficient (the time average of $\Theta$), $\beta_L$ is the inverse temperature of the cold bath, and $\eta_C$ is the Carnot efficiency. This inequality clearly shows that an engine with high efficiency has lower maximum power, and in particular an engine at the Carnot efficiency has zero power.

In Chap. 17, we present an important inequality in stochastic thermodynamics, the thermodynamic uncertainty relation. The thermodynamic uncertainty relation states that the entropy production in stationary systems is bounded from below in terms of the relative fluctuation of any current as

$$\sigma\,\frac{\mathrm{Var}(\mathcal{J}_d)}{(\mathcal{J}_d^{\mathrm{ss}})^2} \ge 2,$$

where $\mathcal{J}_d$ is a cumulative current and $\mathrm{Var}(\cdot)$ represents the variance. The thermodynamic uncertainty relation is now understood as a special case of the generalized Cramér-Rao inequality. We present its proof with this approach in Sect. 17.1.2. Extensions of the thermodynamic uncertainty relation and its optimality are discussed in Sect. 17.2.

We consider speed limit inequalities for classical stochastic systems in Chap. 18. A speed limit inequality is a trade-off relation between the time length of the process (i.e., the speed) and some quantity regarded as the cost of quick state transformation. For both overdamped Langevin systems (Sect. 18.2) and Markov jump processes on discrete states (Sect. 18.3), we find that the entropy production bounds the speed of state transformation. The latter speed limit inequality is expressed as

$$\frac{L(\boldsymbol{p}(0), \boldsymbol{p}(\tau))^2}{2\sigma\langle A\rangle_\tau} \le \tau,$$

where $L(\boldsymbol{p}, \boldsymbol{p}')$ is a distance between two probability distributions, $\langle A\rangle_\tau$ is the time-averaged activity (the average number of jumps per unit time), and $\tau$ is the time length of the state transformation. Both this chapter and Chap. 16 show that entropy production, a quantifier of thermodynamic irreversibility, is also essential in the discussion of the maximum speed of dynamics. In the final chapter (Chap. 19), we look at variational aspects of entropy production. Variational relations play the role of both equalities and inequalities: a variational relation is an inequality whose equality is achievable. We show three variational aspects of entropy production.
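The thermodynamic uncertainty relation can be checked by hand on the simplest current-carrying system: a continuous-time biased random walk on a one-dimensional lattice with forward rate $k_+$ and backward rate $k_-$ (a hypothetical example chosen because everything is analytic). The net number of jumps $\mathcal{J}$ in time $\tau$ has mean $(k_+ - k_-)\tau$ and variance $(k_+ + k_-)\tau$, and under local detailed balance each forward jump produces entropy $\ln(k_+/k_-)$, so $\sigma = (k_+ - k_-)\tau\ln(k_+/k_-)$. The sketch evaluates the left-hand side $\sigma\,\mathrm{Var}(\mathcal{J})/\langle\mathcal{J}\rangle^2$.

```python
import math


def tur_product(k_plus, k_minus, tau=1.0):
    """sigma * Var(J) / <J>^2 for a continuous-time biased random walk observed for time tau.

    Forward jumps (rate k_plus) and backward jumps (rate k_minus) are independent Poisson
    processes, so the net displacement J has mean (k+ - k-) tau and variance (k+ + k-) tau.
    Assuming local detailed balance, sigma = (k+ - k-) tau ln(k+/k-).
    """
    mean = (k_plus - k_minus) * tau
    var = (k_plus + k_minus) * tau
    sigma = (k_plus - k_minus) * tau * math.log(k_plus / k_minus)
    return sigma * var / mean ** 2


for k_plus in (1.01, 1.5, 3.0, 20.0):
    print(k_plus, tur_product(k_plus, 1.0))
```

The product equals $(k_+ + k_-)\ln(k_+/k_-)/(k_+ - k_-)$, independent of $\tau$; it approaches the bound 2 as $k_+/k_- \to 1$ and grows far from equilibrium, so for this model the bound is tight only near equilibrium.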
1.2.5 How to Read This Textbook

For readers interested only in Markov jump processes, a course consisting of Chaps. 2 and 3 (basics), Chaps. 5 and 6 (fluctuation theorem), Sect. 7.1 (Hatano-Sasa relation), Sects. 9.2 and 9.3 (basics of information thermodynamics; Sagawa-Ueda relation), Sects. 16.2 and 16.3 (trade-off relation between efficiency and power), Sects. 17.1 and 17.2 (thermodynamic uncertainty relation), and Sect. 18.3 (classical speed limit inequality) will provide concise but sound learning. Advanced readers are invited to read further, depending on their interests: Sect. 7.2 (entropy production under coarse-graining), Sect. 8.3 (waiting-time statistics), Sect. 8.4 (stochastic efficiency), Sect. 9.5 (information thermodynamics for general information processes), Sect. 9.6 (Ito-Sagawa relation), Sect. 10.1.1 (fluctuation-dissipation theorem at nonequilibrium stalling states), Sect. 13.1 (Brownian motors), Chap. 15 (efficiency at maximum power), and Sects. 19.1 and 19.2 (variational expressions of entropy production rate and excess entropy production rate).

For readers interested in Langevin systems, a short course with Chap. 4 (stochastic processes and stochastic thermodynamics in continuous space), Chap. 11 (various results in Langevin systems, including the fluctuation theorem), and Sect. 18.2 (a speed limit inequality for Langevin systems) will serve as a brief overview of Langevin systems.

We here classify the results in this textbook according to whether they require the local detailed-balance condition. The following results are proven without requiring the local detailed-balance condition:

• The fluctuation theorem (5.14).
• The Hatano-Sasa relation (7.10).
• The IFT-type equality for hidden entropy production⁶ (7.39).
• The Sagawa-Ueda relation (9.28).
• The IFT-type equality for partial entropy production (9.61).
• The trade-off inequality between efficiency and power (16.65).
• The classical speed limit inequality (18.34).
• The variational expression of entropy production rate (19.1).
• The Kolchinsky-Wolpert relation (19.41).

⁶ This relation, however, requires some assumptions weaker than the local detailed-balance condition.

Here, the numbers in parentheses are equation numbers. These relations only use the fact that a system attached to a heat bath relaxes to the equilibrium state (the canonical distribution), and thus they are highly universal. In contrast, the following results are proven using the local detailed-balance condition. Although these relations also hold in various systems, their degree of universality is lower than that of the previous ones.

• The Speck-Seifert relation (7.17).
• The martingale property of entropy production (8.47).
• The least probable stochastic efficiency (8.61).
• The fluctuation-dissipation theorem for stalling states (10.5).
• The thermodynamic uncertainty relation (17.4).
• The variational expressions of excess and housekeeping entropy production (19.22) and (19.25).
1.3 Notation, Terminologies and Remarks

Throughout this textbook, we frequently use the following notation, terminologies, and setups without any explicit explanation.

Quantities
• A bracket ⟨·⟩ represents an ensemble average of a stochastic quantity.
• A quantity with a hat, $\hat{A}$, is a stochastic variable. The same symbol without a hat denotes its ensemble average, $A := \langle\hat{A}\rangle$.
• Calligraphic symbols are reserved for time-integrated quantities (e.g., $\mathcal{J} := \int_0^\tau dt\, J(t)$).
• A derivative $\frac{\partial f}{\partial x}$ is also expressed as $\partial_x f$.

States
• We usually use the symbol w to represent a discrete state. The label of a state appears in the subscript⁷, as in $w_i$. We frequently use the shorthand notation i to refer to state $w_i$.
• The probability distribution of state w at time t is denoted by P(w, t). In a vector representation, we write $p_w(t)$. The probability distribution of state $w_i$ is also denoted by $p_i(t)$.
• $p^{\mathrm{eq}}_w$ represents the equilibrium distribution of state w, and $p^{\mathrm{ss}}_w$ represents the stationary (steady-state) distribution of state w.

⁷ Note that for systems in continuous space, we use subscripts to represent the step number.
• When we represent a vector, we employ a bold font, as in $\boldsymbol{p}$.
• $\bar{w}$ represents the time-reversed state of state w, in which parity-odd variables (e.g., momentum) are multiplied by −1.

Dynamics (on discrete states)
• We use the term "Markov jump processes" for processes in continuous time and "Markov chains" for processes in discrete time. Unless explicitly noted otherwise, this textbook mainly treats continuous-time Markov jump processes on discrete states.
• We refer to the matrix R in the master equation for Markov jump processes (Eq. (2.10)) as the "transition rate matrix", and to the matrix T in the time evolution equation of Markov chains (Eq. (2.4)) as the "transition probability matrix".
• In a Markov jump process, the transition rate from a state w to another state w′ at time t is written as $P_{w\to w';t}$. The corresponding element of the transition rate matrix is written as $R_{w'w}$. Since the former symbol is preferable for representing path probabilities, we mainly use $P_{w\to w';t}$ in Part II⁸ and mainly use $R_{w'w}$ in Part IV.
• The escape rate from a state w at time t is denoted by $e_{w,t}$.
• The transition rate in the time-reversed system (i.e., if there exists a parity-odd field such as a magnetic field in the original system, we invert its sign in the time-reversed system) is denoted by $P^{\dagger}_{w\to w';t}$.
• The transition rate with a tilde, $\tilde{P}_{w\to w';t}$, represents the dual transition rate defined in Eq. (7.2).
• For a transient process, we usually set its time interval as 0 ≤ t ≤ τ.
• In a stochastic trajectory of a Markov jump process, the number of jumps is written as N. The n-th jump occurs at $t = t^n$, and the state changes from $w^{n-1}$ to $w^n$; the superscript represents the order of jumps. We write $t^0 = 0$ and $t^{N+1} = \tau$ for convenience.

Langevin systems (in continuous space)
• We usually use the symbol x to represent a state in continuous space.
• When we consider a time-discretized version of Langevin systems, the position (state) and the time at the n-th step are denoted by $x_n$ and $t_n$, respectively.
• The white Gaussian noise (see Sect. 4.1.1) is denoted by $\hat{\xi}(t)$. Its discretization with time interval Δt is written as $\hat{\xi}_{\Delta t}(t)$.
• The rule of product in stochastic differential equations (see Sect. 4.1.2.1) is denoted by α.
• The Itô, Stratonovich, and anti-Itô products (see Sects. 4.1.2.2 and 4.2.1) are represented by ·, ◦, and a separate symbol, respectively.

⁸ The exception is Chap. 10, where we use the symbol $R_{w'w}$, since in that chapter we treat response functions in stationary systems with this symbol.
1.3 Notation, Terminologies and Remarks
Others

• The number of particles is denoted by $M$. Each particle is labeled with $m$.
• The number of heat baths is denoted by $k$. Each bath is labeled with $\nu$.
• The energy of state $w$ is written as $E_w$.
• The heat current is positive for energy transfer from the system to a bath. Correspondingly, heat is positive when it is released into a bath.
• The entropy production is denoted by $\sigma$.
• The entropy production rate is denoted by $\dot{\sigma}$.
• The inverse temperature is denoted by $\beta$. We normalize the Boltzmann constant $k_B$ to 1.
• The imaginary unit is denoted by $i$.
• The density matrix is denoted by $\rho$.
• The Onsager matrix is denoted by $L$.
• The work is denoted by $W$, which is positive when work is extracted by an external agent.
• The spatial dimension is denoted by $D$.

We also use the following abbreviations in this textbook:
• CE: Carnot efficiency
• CGACE: coarse-grained autonomous Carnot engine
• DFT: detailed fluctuation theorem
• EMP: efficiency at maximum power
• FDT: fluctuation-dissipation theorem
• FT: fluctuation theorem
• IFT: integral fluctuation theorem
• TUR: thermodynamic uncertainty relation
Part I
Basic Framework
Chapter 2
Stochastic Processes
In this chapter, we introduce a mathematical framework, stochastic processes, which is used to describe small stochastic systems. In most of this textbook, we consider stochastic processes with discrete states, and this chapter is devoted to such processes. Mathematical foundations of stochastic processes in continuous space are presented in Chap. 4. A stochastic process is a time evolution in a probabilistic manner. The dynamics of a Brownian particle is a celebrated example of a stochastic process, where the movement of the particle is caused by collisions with an enormous number of small water molecules. Since we do not know the detailed positions and momenta of the water molecules, the dynamics of the Brownian particle can be predicted only in a probabilistic form. Another famous toy model of stochastic processes is a random walk on a one-dimensional lattice: a person on the lattice moves left or right by one site with probability one half each (e.g., by flipping a coin). Various biophysical systems, including molecular motors and other elaborate proteins, can also be described as stochastic processes.
2.1 Markov Process and Discrete-Time Markov Chain

We here introduce an important class of stochastic processes, Markov processes. Roughly speaking, a Markov process is a stochastic process whose dynamics is determined only by its latest state. Although it can be defined in both discrete time and continuous time, for simplicity we first explain the Markov property by taking stochastic processes in discrete time steps. Consider a stochastic process in $N$ steps. Through this process, we obtain a stochastic sequence of $N+1$ states, $(w^0, w^1, w^2, \dots, w^N)$, where $w^i$ is the state at the $i$-th step.¹ This sequence is regarded as a trajectory of states. We call the transition from the state at the $(n-1)$-th step to that at the $n$-th step the $n$-th transition, or the transition at the $n$-th step.
1 We denote the initial state by $w^0$ for convenience.
© Springer Nature Singapore Pte Ltd. 2023 N. Shiraishi, An Introduction to Stochastic Thermodynamics, Fundamental Theories of Physics 212, https://doi.org/10.1007/978-981-19-8186-9_2
In general stochastic processes, the probability distribution of the state at the $n$-th step conditioned on all the past states $w^0, w^1, \dots, w^{n-1}$ differs from that conditioned only on the last state $w^{n-1}$:
$$P(w^n|w^{n-1}, w^{n-2}, \dots, w^1, w^0) \neq P(w^n|w^{n-1}). \tag{2.1}$$
In contrast, in some cases, these two coincide for all $n$:
$$P(w^n|w^{n-1}, w^{n-2}, \dots, w^1, w^0) = P(w^n|w^{n-1}), \tag{2.2}$$
which means that only the latest state affects the present probability distribution, and the other past states do not affect it directly.² We call this type of stochastic process a Markov process. In the case of continuous time, the last state coincides with the present state, and thus the Markov property claims that the transition probability at time $t$ is characterized only by the present probability distribution at $t$. The formal definition of a Markov process is provided in Sect. 2.4. A Markov process with discrete time steps is particularly called a Markov chain. We consider a Markov chain on discrete states, where the possible states of the system are labeled as $w_1, w_2, \dots, w_M$, which are also written as $1, 2, \dots, M$ for brevity. This stochastic process can be described by a transition probability matrix.

Definition: Transition probability matrix
A matrix $T$ is a transition probability matrix if all the matrix elements satisfy the following two conditions:
• Nonnegativity: $T_{ij} \ge 0$ for all $i, j$.
• Normalization condition: The sum of each column is 1, that is,
$$\sum_i T_{ij} = 1 \tag{2.3}$$
for any $j$.
2 Of course, the other past states $w^{n-2}, \dots, w^0$ can affect the present distribution through the latest state $w^{n-1}$.
Definition: Time evolution of Markov chain
Let $\boldsymbol{p}^n$ be the probability distribution at the $n$-th step. The dynamics of the probability distribution is given by the following time evolution equation:
$$p^n_j = \sum_i T_{ji}\,p^{n-1}_i. \tag{2.4}$$
An element $T_{ij}$ of the transition probability matrix provides the probability that the state at the next step is $i$ under the condition that the present state is $j$. We also express the above time evolution equation as
$$\boldsymbol{p}^n = T\boldsymbol{p}^{n-1}. \tag{2.5}$$
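The update rule (2.4)-(2.5) is easy to try numerically. The following is a minimal sketch in plain Python (no external libraries), assuming the matrix is stored as a list of rows so that `T[i][j]` is the probability of moving to state $i$ from state $j$; the 3-state matrix `T_example` and the helper names are hypothetical choices for illustration, not taken from the text.

```python
# Minimal sketch of the Markov-chain update (2.4)-(2.5), p^n = T p^{n-1}.
# Convention (as in the text): T[i][j] = probability of moving to i given the
# present state j, so every column of T sums to one.

def is_transition_probability_matrix(T, tol=1e-12):
    """Check nonnegativity and the column normalization condition (2.3)."""
    K = len(T)
    nonnegative = all(T[i][j] >= 0.0 for i in range(K) for j in range(K))
    normalized = all(
        abs(sum(T[i][j] for i in range(K)) - 1.0) < tol for j in range(K)
    )
    return nonnegative and normalized


def step(T, p):
    """One step of Eq. (2.4): p^n_j = sum_i T_ji p^{n-1}_i."""
    K = len(p)
    return [sum(T[j][i] * p[i] for i in range(K)) for j in range(K)]


def evolve(T, p0, n_steps):
    """Apply Eq. (2.5) n_steps times starting from the distribution p0."""
    p = list(p0)
    for _ in range(n_steps):
        p = step(T, p)
    return p


# Hypothetical 3-state example (not from the text); each column sums to 1.
T_example = [
    [0.5, 0.2, 0.0],
    [0.5, 0.5, 0.4],
    [0.0, 0.3, 0.6],
]
```

For instance, `evolve(T_example, [1.0, 0.0, 0.0], 50)` returns a vector that stays nonnegative and sums to one up to rounding, as guaranteed by nonnegativity and the normalization condition (2.3).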
We require two conditions on the transition probability matrix in order to keep $\boldsymbol{p}$ a well-defined probability distribution. The nonnegativity is necessary to keep the probability distribution $p_i$ nonnegative.³ The normalization condition is necessary to keep the probability distribution normalized as $\sum_i p_i = 1$. In fact, the sum of all the elements of $\boldsymbol{p}^n$ is computed as
$$\sum_j p^n_j = \sum_{i,j} T_{ji}\,p^{n-1}_i = \sum_i p^{n-1}_i \Big(\sum_j T_{ji}\Big), \tag{2.6}$$
and thus, provided $\sum_i p^{n-1}_i = 1$, the right-hand side is identically one for every distribution only if⁴ $\sum_j T_{ji} = 1$ for all $i$. If the transition probability depends on time, the transition probability matrix at the $n$-th step is identified by the superscript $n$ as $T^n$. The elements of the transition probability matrix at the $n$-th step are also written in the form of a conditional probability as
$$T^n_{ij} = P((w_i, n)|(w_j, n-1)), \tag{2.7}$$
where $P((w_i, n)|(w_j, n-1))$ is the probability that the state at the $n$-th step is $w_i$ under the condition that the state at the $(n-1)$-th step is $w_j$.
3 If $T_{ij}$ is negative, then applying $T$ to the probability distribution with $p^{n-1}_j = 1$ and $p^{n-1}_k = 0$ ($k \neq j$) makes the $i$-th element of the probability distribution at the $n$-th step negative: $p^n_i = T_{ij} < 0$.
4 If $\sum_j T_{ji} = c \neq 1$ for some $i$, then by setting $p^{n-1}_i = 1$ and $p^{n-1}_k = 0$ for $k \neq i$, we have $\sum_j p^n_j = \sum_j T_{ji} = c \neq 1$.
2.2 Continuous Time Markov Jump Process on Discrete System

We next consider a stochastic process with discrete states and continuous time, which we call a Markov jump process. This textbook mainly treats continuous-time Markov jump processes, since discrete-time Markov chains sometimes require additional technical care. Discrete-time Markov chains are discussed in the latter half of the next section, in Sect. 9.6, and in the latter half of Sect. 17.2.1. We note that relations in discrete-time Markov chains are safely extended to continuous-time Markov jump processes. We introduce continuous-time Markov jump processes in an intuitive and nonrigorous manner: we construct them by taking the short-time limit of discrete-time Markov chains. A more formal characterization is presented in Sect. 2.4.

We construct a stochastic process in the time interval $0 \le t \le \tau$. To this end, we first discretize the time interval $0 \le t \le \tau$ into $N$ steps with a short time interval $\Delta t$ (i.e., $N\Delta t = \tau$). On this discrete time, we introduce a discrete-time Markov chain with the transition probability matrix $T$ given by⁵
$$T^n_{ij} = \begin{cases} R_{ij}(n\Delta t)\,\Delta t & i \neq j, \\ 1 - \sum_{k(\neq j)} R_{kj}(n\Delta t)\,\Delta t & i = j, \end{cases} \tag{2.8}$$
where $R_{ij}(t)$ with $i \neq j$ is a fixed real-valued function.⁶ By taking the $\Delta t \to 0$ and $N \to \infty$ limit while keeping $N\Delta t = \tau$, we obtain a continuous-time Markov jump process. The time evolution equation of the probability distribution $\boldsymbol{p}(t)$ is given by the following master equation.

Definition: Transition rate matrix
A matrix $R$ is a transition rate matrix if it satisfies the following two conditions:
• Nonnegativity: $R_{ij} \ge 0$ for all off-diagonal elements (i.e., $i \neq j$).
• Normalization condition: The sum of elements in each column is 0:
$$\sum_i R_{ij} = 0 \tag{2.9}$$
for any $j$.
5 By taking $\Delta t$ sufficiently small, all the matrix elements of $T$ become nonnegative.
6 We implicitly assume that the function $R_{ij}(t)$ is continuous almost everywhere.
Definition: Master equation
Consider a continuous-time Markov jump process on discrete states. The time derivative of $p_i(t)$ is given by the following equation, called a master equation:
$$\frac{dp_i(t)}{dt} = \sum_j R_{ij}(t)\,p_j(t), \tag{2.10}$$
where R(t) is a transition rate matrix at time t.
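As a rough numerical illustration of how the master equation arises from the short-time transition matrix (2.8), the sketch below integrates Eq. (2.10) for a time-independent two-state rate matrix by repeatedly applying $T = I + R\Delta t$ (an explicit Euler scheme) and compares the result with the closed-form solution. The rates `a`, `b`, the final time `tau`, and the function names are hypothetical; the only assumption carried over from the text is the convention $R_{ji} = P_{i\to j}$.

```python
import math

# Minimal sketch: integrate the master equation (2.10) for a time-independent
# rate matrix R by repeatedly applying T = I + R*dt, i.e. Eq. (2.8).
# dt must be small enough that each diagonal entry 1 + R_jj*dt is nonnegative
# (cf. footnote 5).

def euler_propagator(R, dt):
    K = len(R)
    return [[(1.0 if i == j else 0.0) + R[i][j] * dt for j in range(K)]
            for i in range(K)]


def solve_master_equation(R, p0, tau, n_steps):
    """Approximate p(tau) from p(0) = p0 using n_steps steps of size tau/n_steps."""
    T = euler_propagator(R, tau / n_steps)
    K = len(p0)
    p = list(p0)
    for _ in range(n_steps):
        p = [sum(T[i][j] * p[j] for j in range(K)) for i in range(K)]
    return p


# Hypothetical two-state example: rate a for state 1 -> 2, rate b for 2 -> 1.
# R[j][i] = P_{i->j}; every column sums to zero.
a, b = 1.0, 2.0
R2 = [[-a, b],
      [a, -b]]
tau = 1.5
# Closed-form solution of dp_1/dt = -a p_1 + b p_2 with p(0) = (1, 0).
exact_p1 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * tau)
```

With `n_steps = 10_000`, `solve_master_equation(R2, [1.0, 0.0], tau, 10_000)[0]` should agree with `exact_p1` to well within $10^{-4}$; the discretization error shrinks proportionally to $\Delta t$.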
We also refer to the transition rate matrix as the transition matrix for short. Unlike the case of the transition probability matrix, the diagonal elements of a transition matrix are nonpositive, and the sum of each column is zero, not one. This difference comes from the fact that the left-hand side of the master equation (2.10) is the time derivative of the probability distribution, not the probability distribution itself. The transition probability per unit time that the state $i$ jumps to another state $j$ at time $t$, denoted by $P_{i\to j;t}$, is equal to the off-diagonal element of the transition rate matrix:
$$R_{ji}(t) = P_{i\to j;t} \quad \text{for } i \neq j. \tag{2.11}$$
The negative of the diagonal element $R_{jj}(t)$ of a transition matrix is called the escape rate of state $j$. The escape rate characterizes the staying probability $P_{\rm rem}(j; 0, \tau)$, which is the probability that the system remains in the state $j$ from time $t = 0$ to $t = \tau$.

Definition: Escape rate
The escape rate of state $j$ is defined as
$$e_{j,t} := \sum_{i(\neq j)} P_{j\to i;t}. \tag{2.12}$$
The quantity $e_{j,t}\,\Delta t$ represents the probability (to leading order in $\Delta t$) that the system in state $j$ at time $t$ jumps to another state between $t$ and $t + \Delta t$. By construction, we have $R_{jj}(t) = -e_{j,t}$.

Theorem: Staying probability
The staying probability that the state remains at $j$ from time $0$ to $\tau$, denoted by $P_{\rm rem}(j; 0, \tau)$, is given by
$$P_{\rm rem}(j; 0, \tau) = e^{-\int_0^\tau e_{j,t}\,dt}. \tag{2.13}$$
Note that if $P_{i\to j;t}$ is time independent, then this is the same as the well-known property of the Poisson process that the waiting time follows an exponential distribution.

Proof We divide the time interval $\tau$ into $N$ steps of length $\Delta t$ as $\tau = N\Delta t$ and take the $N \to \infty$ and $\Delta t \to 0$ limit while keeping $\tau$ fixed. By definition of the escape rate, the staying probability is calculated as
$$\begin{aligned} P_{\rm rem}(j; 0, \tau) &= \prod_{n=0}^{N-1}\left(1 - e_{j,n\Delta t}\,\Delta t + O(\Delta t^2)\right) \\ &= \prod_{n=0}^{N-1}\left(e^{-e_{j,n\Delta t}\,\Delta t} + O(\Delta t^2)\right) \\ &= e^{-\sum_{n=0}^{N-1} e_{j,n\Delta t}\,\Delta t} + O(\Delta t). \end{aligned} \tag{2.14}$$
In the last line, we used $O(N\Delta t^2) = O(\Delta t)$. Taking the $N \to \infty$ and $\Delta t \to 0$ limit, we obtain the desired equality (2.13). The term $O(\Delta t^2)$ in the first line of Eq. (2.14) has two origins. The first is the change of the escape rate in time. We choose $e_{j,n\Delta t}$ as a representative of the escape rate $e_{j,t}$ in the time interval $n\Delta t \le t < (n+1)\Delta t$. The actual escape rate may change within this interval and differ from $e_{j,n\Delta t}$; the resulting difference is evaluated as $O(\Delta t^2)$. The second is the possibility that two or more jumps occur in $n\Delta t \le t < (n+1)\Delta t$, whose contribution is also evaluated as $O(\Delta t^2)$.
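The limit in Eq. (2.14) can be watched numerically. The sketch below multiplies the per-step no-jump probabilities $1 - e_{j,n\Delta t}\Delta t$ and compares the product with Eq. (2.13); the escape rate $e_{j,t} = 1 + 0.5\sin t$ is a hypothetical choice made only because its integral has a simple closed form.

```python
import math

# Numerical check of Eqs. (2.13)-(2.14): the product of no-jump probabilities
# (1 - e_{j, n dt} dt) approaches exp(-int_0^tau e_{j,t} dt) as dt -> 0.

def escape_rate(t):
    """Hypothetical time-dependent escape rate e_{j,t} (not from the text)."""
    return 1.0 + 0.5 * math.sin(t)


def staying_probability_discrete(rate, tau, n_steps):
    """First line of Eq. (2.14) with the O(dt^2) terms dropped."""
    dt = tau / n_steps
    prob = 1.0
    for n in range(n_steps):
        prob *= 1.0 - rate(n * dt) * dt
    return prob


tau = 2.0
# Closed form of int_0^tau (1 + 0.5 sin t) dt.
integral = tau + 0.5 * (1.0 - math.cos(tau))
exact = math.exp(-integral)  # Eq. (2.13)

for n_steps in (10, 100, 1000, 10_000):
    approx = staying_probability_discrete(escape_rate, tau, n_steps)
    print(n_steps, approx, abs(approx - exact))
```

The printed error should shrink roughly by a factor of ten each time `n_steps` grows by ten, consistent with the $O(\Delta t)$ remainder in the last line of Eq. (2.14).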
2.3 Convergence Theorem

Consider a continuous-time Markov jump process with a transition rate matrix $R$ of size $K \times K$. We suppose that $R$ is time-independent. We identify the $K$ states with the integers $1, 2, \dots, K$ for brevity. We shall investigate the properties of the stationary distribution. Since a physical system usually relaxes to a unique stationary state after a long time, the existence and uniqueness of the stationary distribution are strongly expected in plausible setups. In the following, we prove that a moderate $R$ has a unique stationary distribution, and that any initial state converges to this stationary distribution in the long-time limit. To state the conditions on transition rate matrices and transition probability matrices, we first introduce several properties of matrices.
Definition: Connectivity
A $K \times K$ transition rate matrix $R$ satisfies connectivity⁷ if for any positive integers $i, j \le K$ ($i \neq j$) there exists a sequence of integers $a_1, a_2, \dots, a_n$ such that $a_1 = i$, $a_n = j$, and $R_{a_{l+1}a_l} > 0$ for $1 \le l \le n-1$. We define the connectivity of transition probability matrices $T$ in a similar manner. Connectivity means that for any $j$ and $k$, there exists a path from $j$ to $k$ with nonzero transition probability. For example, the matrix
$$R = \begin{pmatrix} -2 & 3 & 0 & 0 \\ 0 & -3 & 1 & 0 \\ 0 & 0 & -1 & 2 \\ 2 & 0 & 0 & -2 \end{pmatrix} \tag{2.15}$$
is a transition rate matrix with connectivity. In contrast, the matrix
$$R = \begin{pmatrix} -4 & 0 & 0 & 0 \\ 1 & -5 & 1 & 5 \\ 1 & 2 & -4 & 2 \\ 2 & 3 & 3 & -7 \end{pmatrix} \tag{2.16}$$
is a transition rate matrix without connectivity, because the states 2, 3, and 4 cannot go to the state 1. Another example of a transition rate matrix without connectivity is
$$R = \begin{pmatrix} -2 & 3 & 0 & 0 \\ 2 & -3 & 0 & 0 \\ 0 & 0 & -2 & 4 \\ 0 & 0 & 2 & -4 \end{pmatrix}, \tag{2.17}$$
where the set of states 1 and 2 and that of states 3 and 4 are separated as isolated islands.

Theorem: Existence and uniqueness of stationary distribution
Consider a transition rate matrix $R$ with connectivity. Then, there uniquely exists a positive vector $\boldsymbol{p}$ (up to a constant factor) satisfying $R\boldsymbol{p} = 0$. Here, we call a vector $\boldsymbol{p}$ a positive vector if $p_i > 0$ holds for all $i$. By normalizing the above vector $\boldsymbol{p}$ as a probability distribution (i.e., $\sum_i p_i = 1$), the obtained $\boldsymbol{p}$ is called the stationary distribution of $R$, denoted by $\boldsymbol{p}^{\rm ss}$. In the case of discrete-time Markov chains, we can obtain the same result by just setting $R = T - I$, where $T$ is the transition probability matrix and $I$ is the identity matrix.

7 A matrix with connectivity is also called irreducible.
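The connectivity criterion and the theorem can be checked numerically on the matrices (2.15)-(2.17). The sketch below tests connectivity by a breadth-first search over allowed jumps, and computes $\boldsymbol{p}^{\rm ss}$ by repeatedly applying the elimination step (2.18) and the back-substitution (2.20) used in the proof of this theorem. It assumes $R$ has connectivity (so every diagonal element is strictly negative) and uses 0-based state labels; the function names are hypothetical. For (2.15), solving $R\boldsymbol{p} = 0$ by hand gives $\boldsymbol{p}^{\rm ss} = (3/14, 1/7, 3/7, 3/14)$, which the recursion should reproduce.

```python
from collections import deque

# Conventions: states are 0-based, and R[b][a] > 0 means the jump a -> b is
# allowed (R_ji = P_{i->j}, Eq. (2.11)).

def has_connectivity(R):
    """Connectivity: every state is reachable from every other state."""
    K = len(R)
    for start in range(K):
        seen = {start}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in range(K):
                if b != a and R[b][a] > 0 and b not in seen:
                    seen.add(b)
                    queue.append(b)
        if len(seen) < K:
            return False
    return True


def eliminate_last_state(R):
    """Eq. (2.18): R'_ij = R_ij - R_{i,k+1} R_{k+1,j} / R_{k+1,k+1} for i, j <= k."""
    k = len(R) - 1
    return [[R[i][j] - R[i][k] * R[k][j] / R[k][k] for j in range(k)]
            for i in range(k)]


def stationary_distribution(R):
    """Normalized positive solution of R p = 0, built recursively as in the proof."""
    K = len(R)
    if K == 1:
        return [1.0]
    k = K - 1
    p_reduced = stationary_distribution(eliminate_last_state(R))
    # Eq. (2.20): p_{k+1} = -(1 / R_{k+1,k+1}) sum_j R_{k+1,j} p'_j
    p_last = -sum(R[k][j] * p_reduced[j] for j in range(k)) / R[k][k]
    p = p_reduced + [p_last]
    total = sum(p)
    return [x / total for x in p]


# Eqs. (2.15)-(2.17) from the text.
R_215 = [[-2, 3, 0, 0], [0, -3, 1, 0], [0, 0, -1, 2], [2, 0, 0, -2]]
R_216 = [[-4, 0, 0, 0], [1, -5, 1, 5], [1, 2, -4, 2], [2, 3, 3, -7]]
R_217 = [[-2, 3, 0, 0], [2, -3, 0, 0], [0, 0, -2, 4], [0, 0, 2, -4]]
```

The recursion mirrors the induction in the proof: each call removes the last state and the back-substitution restores it, so the cost is modest for small $K$. For large systems one would more likely use a standard linear solver.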
The positivity of $\boldsymbol{p}$ ensures that we can set $\boldsymbol{p}$ as a probability distribution by normalizing it. In most textbooks, this theorem and the convergence theorem shown later are derived by using the Perron-Frobenius theorem.⁸ However, as shown below, these theorems can be proven in an elementary way without resorting to the Perron-Frobenius theorem.

Proof We prove this by mathematical induction on the size $K$ of a transition matrix. The proposition is the following: for any $K \times K$ transition matrix $R$ with connectivity, a positive vector $\boldsymbol{p}$ satisfying $R\boldsymbol{p} = 0$ uniquely exists (up to a constant factor). For $K = 2$ it is easy to show this proposition.⁹ We assume that the proposition holds for $K = k$. For a given $(k+1)\times(k+1)$ transition rate matrix $R$ with connectivity, we construct a $k \times k$ transition rate matrix $R'$ with connectivity from $R$ as
$$R'_{i,j} := R_{i,j} - \frac{R_{i,k+1}R_{k+1,j}}{R_{k+1,k+1}}. \tag{2.18}$$
We note $R_{k+1,k+1} < 0$ due to the connectivity.¹⁰ We shall verify that $R'$ is a transition matrix (i.e., satisfying nonnegativity and the normalization condition) with connectivity. Nonnegativity of the off-diagonal elements of $R'$ is a direct consequence of $R_{k+1,k+1} < 0$ and $R_{i,j}, R_{i,k+1}, R_{k+1,j} \ge 0$. The normalization condition $\sum_{i=1}^k R'_{i,j} = 0$ for all $j$ is shown as
$$\sum_{i=1}^k R'_{i,j} = \sum_{i=1}^k R_{i,j} - \sum_{i=1}^k \frac{R_{i,k+1}R_{k+1,j}}{R_{k+1,k+1}} = -R_{k+1,j} + \frac{R_{k+1,k+1}R_{k+1,j}}{R_{k+1,k+1}} = 0, \tag{2.19}$$
where we used the normalization condition for $R$ in the second equality. The connectivity is shown as follows. We consider connectivity paths (sequences of integers) in $R$. For any $i$ and $j$, if the path $[a_1, a_2, \dots, a_n]$ from $i$ to $j$ does not contain $k+1$, then this path is also a connectivity path in $R'$. If the path contains the state $k+1$ as $a_m$, then the path $[a_1, a_2, \dots, a_{m-1}, a_{m+1}, \dots, a_n]$ is a connectivity path in $R'$, because both $R_{a_{m+1},k+1}$ and $R_{k+1,a_{m-1}}$ are nonzero and thus $R'_{a_{m+1},a_{m-1}}$ is also nonzero.

We first show the existence: the transition rate matrix $R$ has at least one positive vector $\boldsymbol{p}$ satisfying $R\boldsymbol{p} = 0$. By assumption, the $k \times k$ matrix $R'$ constructed above has a positive vector $\boldsymbol{p}'$ satisfying $R'\boldsymbol{p}' = 0$. Using this $\boldsymbol{p}'$, we construct $\boldsymbol{p}$ as
$$p_i = \begin{cases} p'_i & : i \le k, \\ -\dfrac{1}{R_{k+1,k+1}}\displaystyle\sum_{j=1}^k R_{k+1,j}\,p'_j & : i = k+1. \end{cases} \tag{2.20}$$

8 The Perron-Frobenius theorem states that a nonnegative matrix $A$ with strong connectivity, with eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots$, satisfies the following: (i) $\lambda_1$ is real, (ii) the first inequality is strict (i.e., $\lambda_1 > |\lambda_2|$), and (iii) the eigenvector corresponding to $\lambda_1$ can be chosen such that all its components are real and positive.
9 A $2 \times 2$ transition rate matrix with connectivity can be expressed as $R = \begin{pmatrix} -a & b \\ a & -b \end{pmatrix}$ with $a, b > 0$. This matrix has a unique positive vector $v = \begin{pmatrix} b/(a+b) \\ a/(a+b) \end{pmatrix}$ which satisfies $Rv = 0$.
10 Connectivity ensures that for any $i$ there exists $j \neq i$ such that $R_{j,i} > 0$. Then, $R_{i,i} = -\sum_{j \neq i} R_{j,i}$ and $R_{j,i} \ge 0$ for $j \neq i$ imply that all the diagonal elements of $R$ are strictly negative: $R_{i,i} < 0$ for any $i$.
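By inserting this relation and Eq. (2.18) into the master equation, we confirm that $\boldsymbol{p}$ is indeed the desired positive vector: $R\boldsymbol{p} = 0$. We next show the uniqueness: if two positive vectors $\boldsymbol{p}^a$ and $\boldsymbol{p}^b$ satisfy $R\boldsymbol{p}^a = 0$ and $R\boldsymbol{p}^b = 0$, then these two vectors are equal up to a constant factor. Defining the restriction of $\boldsymbol{p}^a$ to $1 \le i \le k$ (i.e., $(p^a_1, p^a_2, \dots, p^a_k)^{t}$) as $\boldsymbol{p}'^a$, the matrix $R'$ constructed in the same way satisfies $R'\boldsymbol{p}'^a = 0$ because
$$\begin{aligned} \sum_{j=1}^k R'_{i,j}\,p^a_j &= \sum_{j=1}^k R_{i,j}\,p^a_j - \frac{R_{i,k+1}\sum_{j=1}^k R_{k+1,j}\,p^a_j}{R_{k+1,k+1}} \\ &= -R_{i,k+1}\,p^a_{k+1} - \frac{R_{i,k+1}\left(-R_{k+1,k+1}\,p^a_{k+1}\right)}{R_{k+1,k+1}} = 0, \end{aligned} \tag{2.21}$$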
where we used $R\boldsymbol{p}^a = 0$ in the second line. Following a similar argument, we arrive at $R'\boldsymbol{p}'^b = 0$. Since $R'\boldsymbol{p}' = 0$ has a unique positive solution up to a constant factor, we find $\boldsymbol{p}'^a = c\,\boldsymbol{p}'^b$ ($c$ is a constant factor). Then, the $(k+1)$-th elements of $\boldsymbol{p}^a$ and $\boldsymbol{p}^b$ are uniquely determined as $p^a_{k+1} = -\big(\sum_{i=1}^k R_{k+1,i}\,p^a_i\big)/R_{k+1,k+1} = -\big(\sum_{i=1}^k R_{k+1,i}\,c\,p^b_i\big)/R_{k+1,k+1} = c\,p^b_{k+1}$, which follows from $R\boldsymbol{p}^a = 0$ and $R\boldsymbol{p}^b = 0$. This shows the uniqueness $\boldsymbol{p}^a = c\,\boldsymbol{p}^b$. Hence, the proposition holds for $K = k+1$.

We next show that any initial state converges to the unique stationary state in the long-time limit. For convenience of explanation, we here mainly treat a discrete-time Markov chain instead of a continuous-time Markov jump process. To state the theorem, we introduce strong connectivity¹¹ of matrices.

Definition: Strong connectivity
A nonnegative matrix $T$ has strong connectivity¹² if there exists a positive integer $m$ such that all elements of $(T)^m$ ($T$ to the $m$-th power) are positive. Here we denote the $m$-th power of $T$ by $(T)^m$ in order to distinguish it from the transition matrix at the $m$-th step, $T^m$. A simple example of a matrix which satisfies connectivity but does not satisfy strong connectivity is
$$T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{2.22}$$
This matrix induces periodic shuttling between states 1 and 2. Strong connectivity excludes such situations.

11 This property is also called primitivity or aperiodicity.
12 It is easy to show that a transition probability matrix with strong connectivity has connectivity.

Theorem: Convergence theorem (discrete-time Markov chain)
Consider a discrete-time Markov chain with a $K \times K$ transition probability matrix $T$ satisfying strong connectivity. Then, for any initial probability distribution $\boldsymbol{p}^0$, the probability distribution at the $n$-th step, $\boldsymbol{p}^n := (T)^n\boldsymbol{p}^0$, converges to the unique stationary distribution in the infinite-step limit:
$$\lim_{n\to\infty}\boldsymbol{p}^n = \boldsymbol{p}^{\rm ss}. \tag{2.23}$$
In the case of continuous-time Markov jump processes, the convergence theorem holds under a weaker condition, the connectivity of the transition rate matrix.

Theorem: Convergence theorem (continuous-time Markov jump process)
Consider a continuous-time Markov jump process with a $K \times K$ transition rate matrix $R$ satisfying connectivity. Then, for any initial probability distribution $\boldsymbol{p}(0)$, the probability distribution at time $t$, $\boldsymbol{p}(t) := e^{Rt}\boldsymbol{p}(0)$, converges to the unique stationary distribution in the infinite-time limit:
$$\lim_{t\to\infty}\boldsymbol{p}(t) = \boldsymbol{p}^{\rm ss}. \tag{2.24}$$
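A small experiment makes the role of strong connectivity concrete. The sketch below repeatedly applies the periodic matrix (2.22) and a hypothetical all-positive $2\times 2$ matrix `T_strong` (chosen for illustration, not from the text) to the same initial distribution: the first keeps oscillating, while the second approaches its stationary distribution $(2/3, 1/3)$, obtained by solving $(T - I)\boldsymbol{p} = 0$.

```python
# Illustration of the convergence theorem (2.23) and of why strong connectivity
# matters. T_periodic is Eq. (2.22); T_strong is a hypothetical example.

def apply_power(T, p0, n):
    """Return (T)^n p0 by repeated matrix-vector multiplication."""
    K = len(p0)
    p = list(p0)
    for _ in range(n):
        p = [sum(T[i][j] * p[j] for j in range(K)) for i in range(K)]
    return p


T_periodic = [[0.0, 1.0],
              [1.0, 0.0]]   # connectivity, but no strong connectivity
T_strong = [[0.9, 0.2],
            [0.1, 0.8]]     # all entries positive, so m = 1 works

p0 = [1.0, 0.0]
# Alternates between [1.0, 0.0] and [0.0, 1.0]: no limit exists.
print([apply_power(T_periodic, p0, n) for n in range(4)])
# Approaches the stationary distribution (2/3, 1/3) of T_strong.
print(apply_power(T_strong, p0, 200))
```

The second eigenvalue of `T_strong` is $0.7$, so the distance to $(2/3, 1/3)$ decays like $0.7^n$; for `T_periodic` the second eigenvalue is $-1$, which is exactly the situation excluded by strong connectivity.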
We first prove the convergence theorem for discrete-time Markov chains (2.23), and then show how to reduce the convergence theorem for continuous-time Markov jump processes to that for discrete-time Markov chains.

Proof (for discrete-time Markov chains) Let $V_0 \subseteq \mathbb{R}^K$ be the vector subspace such that $\boldsymbol{v} \in V_0$ means $\sum_i v_i = 0$. If $\boldsymbol{v} \in V_0$, then $T\boldsymbol{v} \in V_0$ because $\sum_i (T\boldsymbol{v})_i = \sum_{i,j} T_{ij} v_j = \sum_j v_j = 0$. In other words, the action of $T$ keeps $V_0$ inside itself. We shall show that for any $\boldsymbol{v} \in V_0$, $\lim_{n\to\infty}(T)^n\boldsymbol{v} = 0$, which is equivalent to the convergence theorem because any initial distribution $\boldsymbol{p}^0$ can be decomposed as $\boldsymbol{p}^0 = \boldsymbol{p}^{\rm ss} + \boldsymbol{v}^0$ with $\boldsymbol{v}^0 \in V_0$. Due to the strong connectivity, there exists $m$ such that all elements of $(T)^m$ are positive. Let $\mu := \min_{i,j}((T)^m)_{ij}$ be the minimum matrix element¹³ of $(T)^m$, and introduce a matrix $M$ such that $M_{ij} = \mu$ for all $i, j$. We construct a new matrix $S$ as
$$S := (T)^m - M. \tag{2.25}$$
By construction, all the matrix elements of $S$ are nonnegative and satisfy $\sum_i S_{ij} = 1 - K\mu < 1$ for any $j$. In addition, for any $\boldsymbol{v} \in V_0$, $S$ and $(T)^m$ have the same action,
$$S\boldsymbol{v} = (T)^m\boldsymbol{v}, \tag{2.26}$$
because $M\boldsymbol{v} = 0$. Combining these two facts, we find that application of $(T)^m$ monotonically decreases the $L^1$-norm of the vector, $|\boldsymbol{v}|_1 := \sum_i |v_i|$, as
$$\left|(T)^m\boldsymbol{v}\right|_1 = |S\boldsymbol{v}|_1 \le \sum_{i,j} S_{ij}|v_j| = (1-K\mu)\,|\boldsymbol{v}|_1. \tag{2.27}$$
This directly implies
$$\lim_{n\to\infty}\left|((T)^m)^n\boldsymbol{v}^0\right|_1 \le \lim_{n\to\infty}(1-K\mu)^n\left|\boldsymbol{v}^0\right|_1 = 0. \tag{2.28}$$
Since $|T\boldsymbol{v}|_1 \le \sum_{i,j} T_{ij}|v_j| = |\boldsymbol{v}|_1$ also holds for any $\boldsymbol{v}$, the norm cannot grow between multiples of $m$ steps, and hence $(T)^n\boldsymbol{v}^0 \to 0$, which completes the proof.¹⁴

13 This minimum satisfies $\mu \le 1/K$.
Proof (for continuous-time Markov jump processes) Given a transition matrix $R$ with connectivity, we consider the matrix $(I + R\Delta t)^m$ with an integer $m$. Since $R$ has connectivity and all the matrix elements of $I + R\Delta t$ are nonnegative (for sufficiently small $\Delta t$), for any $(i, j)$ there exists $m' < K$ such that the $(i, j)$ element of $(I + R\Delta t)^{m'}$ is strictly positive. Noting that all the diagonal elements of $I + R\Delta t$ are strictly positive, the $(i, j)$ element of $(I + R\Delta t)^m$ is also strictly positive for any $m \ge m'$. Hence, we conclude that all the matrix elements of $U = (I + R\Delta t)^K$ are strictly positive. Since $U$ propagates the probability distribution as $U\boldsymbol{p}(t) = \boldsymbol{p}(t + K\Delta t)$, we can construct a discrete-time Markov chain as $\boldsymbol{p}^n := \boldsymbol{p}(nK\Delta t)$ with the transition probability matrix $U$. Hence, by setting $(T)^m$ as $U$, we succeed in reducing the convergence theorem for continuous-time Markov jump processes to that for discrete-time Markov chains, which completes the proof.
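The key inequality (2.27) can be spot-checked with $m = 1$. The sketch below takes a hypothetical all-positive, column-normalized $3\times 3$ matrix (so $\mu = 0.1$ and $1 - K\mu = 0.7$), draws random vectors in $V_0$, and records the largest observed ratio $|T\boldsymbol{v}|_1/|\boldsymbol{v}|_1$. A random test like this is only a sanity check of the bound, not a substitute for the proof.

```python
import random

# Spot check of the L1 contraction (2.27) with m = 1: for an all-positive,
# column-stochastic T with smallest entry mu, |T v|_1 <= (1 - K mu) |v|_1
# for every v whose components sum to zero. T is a hypothetical example.

T = [[0.6, 0.2, 0.3],
     [0.3, 0.5, 0.3],
     [0.1, 0.3, 0.4]]
K = len(T)
mu = min(min(row) for row in T)   # smallest matrix element (0.1 here)
factor = 1.0 - K * mu             # 1 - K mu (0.7 here)


def l1_norm(v):
    return sum(abs(x) for x in v)


def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def random_zero_sum_vector(rng, size):
    """A random element of V_0 (components sum to zero)."""
    v = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    mean = sum(v) / size
    return [x - mean for x in v]


rng = random.Random(0)
worst_ratio = 0.0
for _ in range(10_000):
    v = random_zero_sum_vector(rng, K)
    worst_ratio = max(worst_ratio, l1_norm(matvec(T, v)) / l1_norm(v))
print(worst_ratio, "<=", factor)
```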
2.4 Formal Introduction of Markov Process

In this section, we introduce Markov processes and Markov jump processes in a more formal and mathematical manner than in Sect. 2.1. Readers who are not interested in mathematical characterizations can skip this section. Let $S$ be a space of states $w$. A stochastic process is formulated as a probability theory of paths of states $w(t)$ in $0 \le t \le \tau$. Here, a trajectory of a state of the system is regarded as a function¹⁵ $[0, \tau] \to S$, which maps time (a real number) to a state. The probability (density) of these paths (functions) is assigned first. By integrating the probability distribution of paths over the states at all times except $t$, we obtain the probability distribution at time $t$, $P(w(t); t)$. In a similar manner, we obtain joint probability distributions at multiple times and conditional probability distributions. A Markov process is a specific class of stochastic process whose path probability satisfies the following condition.

14 We here use the fact that $|\boldsymbol{v}|_1 = 0$ holds only when $\boldsymbol{v} = 0$.
15 Here, we, of course, should restrict the class of possible functions to moderate (e.g., measurable) ones.

Definition: Markov process
Consider a stochastic process. Let $P((w, t)|(w', t'), \cdots)$ be the conditional probability that the state at time $t$ is $w$ under the condition that the state at time $t'$ is $w'$ and $\cdots$. Then, if the equality
$$P((w, t)|(w', t'), (w'', t''), \cdots) = P((w, t)|(w', t')) \tag{2.29}$$
holds for any $t > t' > t'' > \cdots$, we call this stochastic process a Markov process.

This definition states that the probability distribution at time $t$ depends only on the state at the latest time $t'$, and the past states before $t'$ are irrelevant. This definition of the Markov property is valid regardless of whether the state space and the time steps are discrete or continuous. In the remainder of this textbook, we mainly use the symbol $w$ for systems with discrete states and $x$ for systems with a continuous state space. We next introduce a Markov jump process, a stochastic process with discrete states and continuous time. Again, the path probability is assigned first. Let $p_i(t)$ be the probability of the state $w_i$ at time $t$. Owing to the Markov property, the time derivative of $\boldsymbol{p}(t)$ depends only on the present probability distribution $\boldsymbol{p}(t)$, and thus is given by a function of $\boldsymbol{p}(t)$ at the same $t$. We define the transition rate of the jump from $w_i$ to $w_j$ at time $t$ as¹⁶
$$P_{i\to j;t} := \lim_{\Delta t\to 0}\frac{P((w_j, t+\Delta t)|(w_i, t))}{\Delta t}, \tag{2.30}$$
where the numerator on the right-hand side, $P((w_j, t+\Delta t)|(w_i, t))$, is determined from the given path probability. By setting
$$R_{ji}(t) = P_{i\to j;t} \tag{2.31}$$
for $i \neq j$ and $R_{ii}(t) = -\sum_{k(\neq i)} P_{i\to k;t}$, the probability distribution $\boldsymbol{p}(t)$ satisfies the following master equation:
$$\frac{dp_i(t)}{dt} = \sum_j R_{ij}(t)\,p_j(t). \tag{2.32}$$

16 In the following definitions, we assume that a proper limit of the right-hand side of Eq. (2.30) exists. This assumption is satisfied in physically plausible settings.
By construction, the matrix R satisfies both nonnegativity and the normalization condition.
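Trajectories of a time-independent Markov jump process can be generated directly from the transition rates (2.11) and the escape rate (2.12). The sketch below uses the Gillespie (stochastic simulation) algorithm, a standard technique that is not introduced in the text: wait an exponentially distributed time with rate $e_w$ (consistent with Eq. (2.13) for constant rates), then jump from $w$ to $j$ with probability $R_{jw}/e_w$. Function names are hypothetical, and states are labeled $0, \dots, K-1$.

```python
import random

# Gillespie sampling of a time-independent Markov jump process.
# Convention: R[j][i] = P_{i->j} for i != j (Eq. (2.11)).

def sample_trajectory(R, w0, tau, rng):
    """Return jump times [0, t^1, t^2, ...] and states [w^0, w^1, ...] on [0, tau]."""
    K = len(R)
    t, w = 0.0, w0
    times, states = [0.0], [w0]
    while True:
        targets = [j for j in range(K) if j != w and R[j][w] > 0]
        if not targets:
            break  # no outgoing transitions: the state never changes again
        rates = [R[j][w] for j in targets]
        escape = sum(rates)                          # escape rate e_w, Eq. (2.12)
        t += rng.expovariate(escape)                 # exponential waiting time
        if t > tau:
            break
        w = rng.choices(targets, weights=rates)[0]   # jump w -> j w.p. R[j][w] / e_w
        times.append(t)
        states.append(w)
    return times, states


def occupation_fractions(times, states, tau, K):
    """Fraction of [0, tau] spent in each state along one trajectory."""
    occupied = [0.0] * K
    for start, end, w in zip(times, times[1:] + [tau], states):
        occupied[w] += end - start
    return [x / tau for x in occupied]


# Eq. (2.15), with states relabeled 0..3.
R_215 = [[-2, 3, 0, 0], [0, -3, 1, 0], [0, 0, -1, 2], [2, 0, 0, -2]]
rng = random.Random(1)
times, states = sample_trajectory(R_215, 0, 20_000.0, rng)
print(occupation_fractions(times, states, 20_000.0, 4))
```

For a long trajectory, the time-averaged occupation fractions of this connected example should approach the stationary distribution $(3/14, 1/7, 3/7, 3/14)$ of (2.15), in line with the convergence theorem.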
Chapter 3
Stochastic Thermodynamics
In this chapter, we demonstrate how thermodynamics is implemented in stochastic Markov processes. The thermodynamic properties of a small stochastic system are guaranteed by the attached thermal baths. The Markov property of the stochastic dynamics of the system is fulfilled when the thermal baths are always in equilibrium.¹ Here, we can apply conventional thermodynamics and equilibrium statistical mechanics to the heat baths. This textbook mainly treats continuous-time Markov jump processes on discrete states, and in this chapter we investigate these processes in particular. We consider systems in continuous space in the next chapter (Chap. 4). In stochastic thermodynamics, it is crucial to define thermodynamic quantities on each single stochastic trajectory. We define both entropy and heat on a single trajectory consistently with thermodynamic relations. We shall show how to define these quantities and how thermodynamic relations are reproduced in Markov jump processes. In Sect. 3.1, we introduce stochastic entropy and Shannon entropy. In Sect. 3.2, we define the amount of heat consistently with the first and second laws of thermodynamics. Combining them, in Sect. 3.3 we introduce entropy production, which is the most important quantity in stochastic thermodynamics. We remark, however, that some definitions and setups in stochastic thermodynamics are slightly different from those in conventional thermodynamics. In Sect. 3.4, we clarify the difference between stochastic thermodynamics and conventional macroscopic thermodynamics, and discuss how to interpret the results in stochastic thermodynamics as those in conventional macroscopic thermodynamics.
1 Suppose the system of interest shows non-Markov dynamics. In that case, we usually formulate this system such that the initial state of the baths (or sometimes of the system) is in the Gibbs distribution, which guarantees thermodynamic properties. Here we suppose that the dynamics of the composite system of the system and baths is given by deterministic dynamics (Hamiltonian dynamics or unitary evolution).
© Springer Nature Singapore Pte Ltd. 2023 N. Shiraishi, An Introduction to Stochastic Thermodynamics, Fundamental Theories of Physics 212, https://doi.org/10.1007/978-981-19-8186-9_3
3.1 Shannon Entropy

3.1.1 Stochastic Entropy

Consider a probabilistic trial with $M$ possible events $x_1, x_2, \dots, x_M$ with respective probabilities $P(x_j)$ ($j = 1, 2, \dots, M$), which we also abbreviate as $p_j$. We first define the surprisal,² or stochastic entropy, of each event, which measures how much we are surprised if this event occurs. The surprisal of an event $x_i$ is defined as³
$$s(x_i) := -\ln p_i, \tag{3.1}$$
which takes its minimum, zero, if the event $x_i$ invariably occurs, and takes a large value if the event $x_i$ rarely occurs. The surprisal $s$ is the unique function of the probability of an event (up to a constant factor) satisfying the following properties of a measure of surprise.

Theorem: Uniqueness of the measure of surprise
Let $f(p)$ be a function of the probability $p$ of an event. Then, a function $f$ satisfying the following two properties is unique, namely the surprisal $s$ up to a constant factor:
• $f(p)$ is a continuous function of $p$.
• $f(p)$ is additive for independent events⁴: $f(pp') = f(p) + f(p')$.
Proof Let $a^{n/m} = b$ ($n, m \in \mathbb{N}$). The additivity implies $f(a) = m f(a^{1/m})$ and $f(b) = n f(a^{1/m})$, which leads to $f(b) = (n/m)\,f(a)$. By setting $a$ to Napier's constant $e$, any $x$ satisfying $\ln x \in \mathbb{Q}$ is calculated as
$$f(x) = f(e)\ln x. \tag{3.2}$$
Since the set of rational numbers is dense in the real numbers, the continuity of $f$ tells that⁵ $f(x) \propto -\ln x$ for all $x > 0$. We remark that although surprisal itself is defined for a single stochastic trial, we need to perform many trials to estimate the value of surprisals experimentally, in order to know the probability distribution.

2 Some textbooks on information theory call this quantity self-information or information content.
3 We take Napier's constant $e$ as the base of logarithms. This choice is natural in physics, whereas the base is usually set to two in information theory.
4 If two events $x$ and $y$ are independent of each other and occur with probabilities $p$ and $p'$, respectively, then the probability that both $x$ and $y$ occur is $pp'$.
5 We here put the minus sign in order to make the surprisal a nonnegative and decreasing function of $p$.
Fig. 3.1 A plot of the binary entropy $H(p) = -p\ln p - (1-p)\ln(1-p)$. The binary entropy takes its maximum $\ln 2 = 0.6931\cdots$ at $p = 1/2$.
3.1.2 Shannon Entropy

We next define the Shannon entropy (the measure of uncertainty) as the expectation value of surprisal:

Definition: Shannon entropy [19]
Suppose that events $x_1, x_2, \dots, x_M$ occur with probabilities $p_1, p_2, \dots, p_M$, respectively. Then the Shannon entropy of $x$ with probability $\boldsymbol{p}$ is defined as the average of surprisal:
$$H(x) := -\sum_j p_j \ln p_j. \tag{3.3}$$
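Eqs. (3.1) and (3.3) translate directly into code. The following is a minimal sketch using natural logarithms, as in the text; the function names are hypothetical, and events with zero probability are skipped, following the usual convention $0 \ln 0 = 0$. It also covers the two-event case, the binary entropy (3.4), and the coin-tossing values quoted in the paragraph that follows.

```python
import math

# Minimal sketch of the surprisal (3.1), the Shannon entropy (3.3), and the
# binary entropy (3.4), all with natural logarithms as in the text.

def surprisal(p):
    """s = -ln p for an event of probability 0 < p <= 1."""
    return -math.log(p)


def shannon_entropy(probs):
    """H = sum_j p_j s_j; events with p_j = 0 contribute 0 (limit p ln p -> 0)."""
    return sum(p * surprisal(p) for p in probs if p > 0)


def binary_entropy(p):
    """Shannon entropy of two events with probabilities p and 1 - p."""
    return shannon_entropy([p, 1.0 - p])


print(shannon_entropy([0.5, 0.5]))   # fair coin: ln 2
print(shannon_entropy([1/3, 2/3]))   # biased coin
print(surprisal(0.2 * 0.5), surprisal(0.2) + surprisal(0.5))  # additivity
```

The last line illustrates the additivity property used in the uniqueness theorem: the surprisal of two independent events equals the sum of their individual surprisals.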
The larger the Shannon entropy is, the harder it is to predict which event occurs. As an example, let us consider coin tossing. When the probabilities of head and tail are $1/2$ each, we cannot predict at all what will happen, and the Shannon entropy is $\frac{1}{2}\ln 2 + \frac{1}{2}\ln 2 = \ln 2 \simeq 0.6931\cdots$ in this case. In contrast, when the probability of the head is $1/3$ and that of the tail is $2/3$, we can predict that the tail is likely to occur, and the Shannon entropy is $\frac{1}{3}\ln 3 + \frac{2}{3}\ln\frac{3}{2} = 0.6365\cdots$, which is smaller than the former. For probability distributions on two different events, as in the above coin tossing, the Shannon entropy is given by the binary entropy function (see Fig. 3.1):
$$H(p) = -p\ln p - (1-p)\ln(1-p), \tag{3.4}$$
where $p$ is the probability of one of the events. The binary entropy function takes its maximum $\ln 2$ at $p = 1/2$, and takes its minimum 0 at $p = 0$ and $p = 1$. In general, the Shannon entropy is maximized when all the events occur with equal probabilities. The Shannon entropy is determined by the probability distribution of events, which means that the Shannon entropy measures its inherent uncertainty. Then, one may ask how to define the measure of uncertainty when we know some “information” on what event is likely to occur. The answer is the conditional Shannon entropy:

Definition: Conditional Shannon entropy
The Shannon entropy of $x$ under the condition $y$ is defined as
$$H(x|y) := -\sum_{i,j} P(x_i, y_j)\ln P(x_i|y_j) = -\sum_j P(y_j)\sum_i P(x_i|y_j)\ln P(x_i|y_j). \tag{3.5}$$
Here, $P(x_i|y_j)$ is the conditional probability of $x_i$ under the condition that $y_j$ occurs.
To grasp its characteristics, let us again consider coin-tossing, this time with three different coins: coin A with probability of head $1/3$, coin B with probability of head $2/3$, and coin C with probability of head $1/2$. We use coins A and B with probability $1/5$ each, and coin C with probability $3/5$ (see Table 3.1). If we do not know which coin is tossed, the total probability (including which coin is used) that the result is head is $\frac15\cdot\frac13 + \frac15\cdot\frac23 + \frac35\cdot\frac12 = \frac12$, and the Shannon entropy is $\ln 2 \simeq 0.6931\cdots$. In contrast, if we know which coin is tossed, the uncertainty of the result decreases when coin A or B is tossed. When coin A or B is tossed, the Shannon entropy under this condition is $\frac13\ln 3 + \frac23\ln\frac32 = 0.6365\cdots$. When coin C is tossed, the Shannon entropy under this condition is still $\ln 2 \simeq 0.6931\cdots$. Thus, the average Shannon entropy of the result of coin-tossing $x$ under the condition of the tossed coin $y$ is given by
$$\frac15\cdot 0.6365\cdots + \frac15\cdot 0.6365\cdots + \frac35\cdot 0.6931\cdots \simeq 0.6705\cdots. \tag{3.6}$$
As shown above, the Shannon entropy of the result of coin-tossing decreases once we obtain the information on which coin is used. The inequality $H(x|y) \le H(x)$ holds not only for coin-tossing but in general. Intuitively, it tells us that, on average, the Shannon entropy decreases (or stays the same) when we obtain some information.
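The computation behind Eq. (3.6) can be reproduced with a few lines of Python, using the coin probabilities listed in Table 3.1. This is a minimal sketch. The dictionary layout is an illustrative choice, and the conditional entropy is evaluated through the second form of Eq. (3.5), namely $\sum_j P(y_j)\,H(x|y=y_j)$.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Table 3.1: probability of head for each coin, and probability that each coin is used.
p_head = {"A": 1/3, "B": 2/3, "C": 1/2}
p_coin = {"A": 1/5, "B": 1/5, "C": 3/5}

# H(x): entropy of the outcome when we do not know which coin is tossed.
prob_head = sum(p_coin[c] * p_head[c] for c in p_coin)
h_x = entropy([prob_head, 1 - prob_head])

# H(x|y) = sum_j P(y_j) H(x | y = y_j), the second form of Eq. (3.5).
h_x_given_y = sum(p_coin[c] * entropy([p_head[c], 1 - p_head[c]]) for c in p_coin)

print(f"P(head) = {prob_head:.4f}")                       # P(head) = 0.5000
print(f"H(x) = {h_x:.4f}, H(x|y) = {h_x_given_y:.4f}")    # H(x) = 0.6931, H(x|y) = 0.6705
assert h_x_given_y <= h_x   # the inequality of Eq. (3.7)
```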
Table 3.1 The probability of head and tail for each biased coin, and the probability that each coin is used
          Head   Tail   Probability that the coin is used
coin A    1/3    2/3    1/5
coin B    2/3    1/3    1/5
coin C    1/2    1/2    3/5
Theorem: Monotonicity of Shannon entropy by conditionalization
For any stochastic variables $X$ and $Y$,
$$H(X|Y) \le H(X) \tag{3.7}$$
is satisfied.
We shall prove this inequality by using the nonnegativity of the Kullback-Leibler divergence in Sect. 5.3.1.
3.2 Definition of Heat
3.2.1 Time-Reversal Symmetry of Equilibrium State
Before turning to the definition of heat, in this subsection we summarize physical observations on equilibrium states. Mathematical definitions of thermodynamic quantities are presented in the next subsection.
We first introduce some terms and symbols related to time-reversal. We call a variable parity-odd if it changes its sign under the time-reversal operation. A typical example of a parity-odd variable is momentum. If we invert the direction of time, a particle moves in the opposite direction with the same speed, so the momentum changes its sign. A variable that is unchanged under time-reversal is called parity-even. We denote the time-reversal state of $w$ by $\bar w$. For example, in a one-dimensional system with a single particle, the state is characterized by the position and the momentum of the particle, $w = (x, p)$, and its time-reversal state is $\bar w = (x, -p)$. The time-reversal of dynamics is defined in a similar manner. If there is a field with broken time-reversal symmetry, such as a magnetic field or a rotating field with the Coriolis force, we invert the sign of that field in the time-reversal evolution (e.g., the magnetic field $B$ is inverted to $-B$). The time-reversal transition rate of $R$ and the time-reversal path probability of $P$ are denoted by $R^\dagger$ and $P^\dagger$, respectively. Similarly, we denote a quantity $A$ in the time-reversal process by $A^\dagger$.
An equilibrium system does not change its state during its time evolution, which implies the absence of any macroscopic current. This observation motivates us to characterize the equilibrium state as a state with no direction of time: a trajectory of the time evolution of the system and its time-reversal trajectory occur with the same probability. This characterization can be proven for deterministic dynamics. If a time evolution $w_0 \to w_1$ exists, its time-reversal evolution $\bar w_1 \to \bar w_0$ also exists. In addition, in the microcanonical ensemble, the principle of equal a priori probabilities guarantees that these two initial states appear with the same probability: $P^{eq}(w_0) = P^{eq}(\bar w_1)$. Combining these two facts, we find that a trajectory $w_0 \to w_1$ and its time-reversal trajectory $\bar w_1 \to \bar w_0$ occur with the same probability. In other words, if we film the dynamics of a system in equilibrium, we cannot distinguish the film running forward from the film running in reverse. If we regard stochastic processes in physical systems as generated by coarse-graining microscopic deterministic processes,⁶ the above observation is also valid for stochastic processes:⁷
Requirement: Time-reversal symmetry of equilibrium states
In equilibrium systems, for any $w$ and $w'$, a transition $w \to w'$ and its time-reversal transition $\bar w' \to \bar w$ occur with the same probability:
$$p^{eq}_w P_{w\to w'} = p^{eq}_{\bar w'} P^\dagger_{\bar w'\to\bar w}. \tag{3.8}$$
Here, $p^{eq}$ is an equilibrium distribution whose details have not yet been specified. We remark that the term “time-reversal symmetry” has two slightly different meanings. In the first meaning, the time-reversal process inverts the signs of parity-odd variables (e.g., momentum) and parity-odd fields (e.g., a magnetic field). If the time-reversed system follows the same time-evolution equation, we say that the system satisfies time-reversal symmetry. In this sense, all Hamiltonian dynamics and their coarse-grained descriptions satisfy time-reversal symmetry. The requirement Eq. (3.8) manifests time-reversal symmetry in this sense, and this is the meaning we adopt when we say “the fluctuation theorem (or entropy production) manifests the time-reversal symmetry”. In the second meaning, the time-reversal process does not invert any signs. In this case, systems with momentum or a magnetic field do not satisfy time-reversal symmetry. The term “broken time-reversal symmetry” is usually used in this latter sense.
6 This assumption may be violated for quantum systems with measurement backaction. Otherwise, however, the assumption is very plausible.
7 We remark that for a system attached to two baths simultaneously, we should decompose the transition rates into the contribution from each bath (and that of the Hamilton dynamics), as in Sect. 4.6.
The argument above (a stochastic process is obtained by coarse-graining microscopic deterministic dynamics) also implies the time-reversal invariance of escape rates:⁸
Requirement: Time-reversal invariance of escape rate
In stochastic processes describing physical systems induced by heat baths, the escape rate is invariant under time-reversal:
$$e_{w,t} = e^\dagger_{\bar w,t}. \tag{3.9}$$
In a microscopic description, the energy is the sum of the potential energy and the kinetic energy. The former depends only on the positions of the particles, and the latter contains the momentum in the quadratic form $p^2$, which is invariant under sign inversion. Hence the energy is invariant under time-reversal.
Requirement: Time-reversal invariance of energy
The energy is invariant under time-reversal:
$$E_w = E_{\bar w} = E^\dagger_w. \tag{3.10}$$
From the perspective of physics, the heat release⁹ associated with a transition $w \to w'$ is defined as the energy difference
$$Q_{w\to w'} := E_w - E_{w'}. \tag{3.11}$$
In the next subsection, we shall rewrite this quantity in terms of transition rates.
3.2.2 Heat in Discrete-State Systems and Detailed-Balance Condition
Consider a stochastic process on discrete states induced by a single heat bath. If our system is in continuous space, we apply the decomposition and discretization technique explained in Sect. 4.6. In this subsection and the next section, we suppose that the time evolution of the system is given by the master equation Eq. (2.10).
8 Roughly speaking, this is confirmed by setting $w' = w$ in Eq. (3.8).
9 The heat release is sometimes called heat dissipation. However, heat dissipation in this sense is not dissipation. Heat is released even in reversible processes, whereas dissipation means that a process is irreversible, so reversible processes involve no dissipation. Confusion on this point appears in the debate on Maxwell's demon, which is discussed in Sect. 9.2.3.
From a mathematical perspective, we define the amount of heat as follows. We assume here that the map from a state to its time-reversal, $w \mapsto \bar w$, and the time-reversal transition rate $R^\dagger$ are properly given.
Definition: Heat in stochastic processes
Consider a transition $w \to w'$ induced by a heat bath with inverse temperature $\beta$. Then the heat released from the system to the heat bath in this transition is given as
$$Q_{w\to w'} := E_w - E_{w'} = \frac{1}{\beta}\ln\frac{P_{w\to w'}}{P^\dagger_{\bar w'\to\bar w}}. \tag{3.12}$$
In stochastic thermodynamics, we frequently use the inverse temperature $\beta := 1/T$ instead of the temperature $T$. We set the Boltzmann constant to 1.
This definition is consistent with the physical observations and requirements in the previous subsection. In fact, for systems attached to a single heat bath, we can derive the above definition by accepting (i) the time-reversal symmetry of equilibrium states (3.8), (ii) the time-reversal invariance of energy (3.10), and (iii) that the equilibrium distribution is the canonical distribution $p^{eq}_w \propto e^{-\beta E_w}$. Indeed, substituting (iii) into Eq. (3.8) and using $E_{\bar w'} = E_{w'}$ gives $P_{w\to w'}/P^\dagger_{\bar w'\to\bar w} = p^{eq}_{\bar w'}/p^{eq}_w = e^{\beta(E_w - E_{w'})}$. If the system is attached to multiple heat baths, we can justify the above definition by supposing (counterfactually) that only the heat bath inducing this transition is attached to the system and applying the same argument. Here we assume that the transition rates and the amount of heat release do not change when the other heat baths are removed.
Suppose that a system is attached to a single heat bath and has neither odd variables nor fields with broken time-reversal symmetry. Then the requirement (3.8) reduces to the following detailed-balance condition:
Definition: Detailed-balance condition
A transition rate is said to satisfy the detailed-balance condition if the relation
$$p^{ss}_w P_{w\to w'} = p^{ss}_{w'} P_{w'\to w} \tag{3.13}$$
holds for any $w$ and $w'$, where $p^{ss}_w$ is the stationary distribution of $w$.
Since a continuous system can be properly discretized, as shown in Sect. 4.6, the definition of heat proposed in this subsection covers both discrete and continuous stochastic systems. This definition of the detailed-balance condition is favored by mathematicians specializing in stochastic processes and probability theory, who prefer to treat states abstractly without going into their detailed physical structure, such as the parity of the variables in a state. The definition of the detailed-balance condition in Eq. (3.13) meets this preference.
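As a sanity check of Eqs. (3.12) and (3.13), the sketch below builds a hypothetical three-state system without odd variables (so $\bar w = w$ and $P^\dagger = P$) whose rates satisfy Eq. (3.12). It then relaxes the master equation numerically with a simple Euler scheme and compares the result with the canonical distribution. The energy levels, the rate prefactor `gamma`, and the symmetric form $P_{w\to w'} = \gamma e^{\beta(E_w - E_{w'})/2}$ are assumptions chosen for illustration. Eq. (3.12) fixes only the ratio of forward and backward rates.

```python
import math

beta = 1.0                   # inverse temperature (Boltzmann constant set to 1)
energies = [0.0, 0.5, 1.5]   # hypothetical energy levels E_w
gamma = 1.0                  # hypothetical overall rate scale
n = len(energies)

# Rates R[w][v] for w -> v (w != v). This symmetric choice is one assumption among
# many; any rates whose forward/backward ratio obeys Eq. (3.12) would do.
R = [[gamma * math.exp(beta * (energies[w] - energies[v]) / 2) if w != v else 0.0
      for v in range(n)] for w in range(n)]

def heat(w, v):
    """Heat released in w -> v via Eq. (3.12); no odd variables, so P^dagger = P."""
    return math.log(R[w][v] / R[v][w]) / beta

def step(p, dt):
    """One Euler step of the master equation dp_v/dt = sum_w (p_w R_wv - p_v R_vw)."""
    return [p[v] + dt * sum(p[w] * R[w][v] - p[v] * R[v][w] for w in range(n) if w != v)
            for v in range(n)]

p = [1.0 / n] * n            # start from the uniform distribution
for _ in range(20000):       # integrate up to t = 20
    p = step(p, 1e-3)

z = sum(math.exp(-beta * e) for e in energies)
canonical = [math.exp(-beta * e) / z for e in energies]
print("stationary:", [round(x, 6) for x in p])
print("canonical: ", [round(x, 6) for x in canonical])
print("Q(0->2) =", heat(0, 2), " E_0 - E_2 =", energies[0] - energies[2])

# Detailed balance, Eq. (3.13): p_w R_wv == p_v R_vw for every pair of states.
max_violation = max(abs(p[w] * R[w][v] - p[v] * R[v][w])
                    for w in range(n) for v in range(n) if w != v)
print("max detailed-balance violation:", max_violation)
```

The relaxed distribution coincides with the canonical one and each pair of fluxes balances separately. This is the behavior the detailed-balance condition describes for a single bath without odd variables.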
Suppose a system is attached to multiple baths, and the transitions induced by each bath satisfy Eq. (3.13) with the corresponding temperature. In that case, we say that the system satisfies the local detailed-balance condition.
Definition: Local detailed-balance condition
A system satisfies the local detailed-balance condition if, for any heat bath $\nu$ with inverse temperature $\beta_\nu$ and any transition $w \to w'$ induced by the heat bath $\nu$, the following relation is satisfied:
$$E_w - E_{w'} = \frac{1}{\beta_\nu}\ln\frac{P^\nu_{w\to w'}}{P^\nu_{w'\to w}}. \tag{3.14}$$
We remark that the term “detailed-balance condition” is sometimes used with slightly different meanings. In one usage, it refers to a property not of transition rates but of a probability distribution. In this sense, a probability distribution $p_w$ is said to satisfy the detailed-balance condition if $p_w P_{w\to w'} = p_{w'} P_{w'\to w}$ holds for any $w, w'$. This definition is equivalent to Eq. (3.13) together with the requirement that the probability distribution is the stationary distribution. In another usage, the term includes the time-reversal process, and the relation
$$p^{ss}_w P_{w\to w'} = p^{ss}_{\bar w'} P^\dagger_{\bar w'\to\bar w} \tag{3.15}$$
is regarded as the definition of the detailed-balance condition. In this case, the detailed-balance condition implies that only a single heat bath is attached to the system. In yet another usage, “detailed-balance condition” has the same meaning as the local detailed-balance condition Eq. (3.14). In this textbook, we use the term detailed-balance condition only in the sense of Eq. (3.13). In addition, some literature uses “local detailed-balance condition” to include the time-reversal process. In this sense, the local detailed-balance condition is essentially the same as the requirement of time-reversal symmetry in equilibrium systems, Eq. (3.8). When reading a paper or a textbook on stochastic thermodynamics, one should check which definition is meant by “detailed-balance condition” and “local detailed-balance condition.”
If a system has a time-dependent control parameter, changing that parameter induces work extraction or work consumption. Let $E^\lambda_w$ be the energy at a state $w$ with a control parameter $\lambda$. Then the work is given as follows:
Definition: Work in stochastic processes in discrete states
Suppose that the system is at a state $w$. The work extraction per unit time is given by
$$\dot{\hat W} = -\frac{dE^{\lambda(t)}_w}{d\lambda}\frac{d\lambda(t)}{dt}. \tag{3.16}$$
The average work extraction per unit time is thus given by
$$\dot W = -\sum_w \frac{dE^{\lambda(t)}_w}{d\lambda}\frac{d\lambda(t)}{dt}\,p_w(t). \tag{3.17}$$
This definition of work satisfies the law of energy conservation (the first law of thermodynamics):
$$W + Q = -\Delta E, \tag{3.18}$$
where $Q$ is the average heat release and $\Delta E$ is the average change in the energy of the system.
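The first law (3.18) can be checked numerically for a hypothetical two-level system with energies $E_0 = 0$ and $E_1 = \lambda(t)$, where $\lambda$ is ramped linearly. The sketch below assumes a split time step. First, the control parameter is changed at fixed $p$, which discretizes the work rate (3.17). Then one Euler step of the master equation is taken at fixed $\lambda$, with heat counted as $E_w - E_{w'}$ per jump as in Eq. (3.11). With this splitting, $W + Q = -\Delta E$ holds step by step up to rounding error. The protocol, rates, and step size are illustrative assumptions.

```python
import math

beta = 1.0
tau, steps = 2.0, 4000
dt = tau / steps

def lam(t):
    """Hypothetical protocol: lambda ramps linearly from 0.5 to 2.0 over [0, tau]."""
    return 0.5 + 1.5 * t / tau

def levels(l):
    """Hypothetical two-level system: E_0 = 0 and E_1 = lambda."""
    return [0.0, l]

def rates(E):
    """Rates with R[w][v] / R[v][w] = exp(beta (E_w - E_v)), as in Eq. (3.12)."""
    return [[math.exp(beta * (E[w] - E[v]) / 2) if w != v else 0.0
             for v in range(2)] for w in range(2)]

p = [0.5, 0.5]
E = levels(lam(0.0))
energy_start = sum(pw * ew for pw, ew in zip(p, E))
W = Q = 0.0
for k in range(steps):
    # (a) change lambda at fixed p: work extraction, a discretization of Eq. (3.17)
    E_new = levels(lam((k + 1) * dt))
    W -= sum(p[w] * (E_new[w] - E[w]) for w in range(2))
    E = E_new
    # (b) one Euler step of the master equation at fixed lambda; each jump w -> v
    #     releases heat E_w - E_v, weighted by its probability flux
    R = rates(E)
    Q += dt * sum(p[w] * R[w][v] * (E[w] - E[v])
                  for w in range(2) for v in range(2) if w != v)
    p = [p[v] + dt * sum(p[w] * R[w][v] - p[v] * R[v][w] for w in range(2) if w != v)
         for v in range(2)]

delta_E = sum(pw * ew for pw, ew in zip(p, E)) - energy_start
print(f"W = {W:.6f}, Q = {Q:.6f}, W + Q = {W + Q:.6f}, -Delta E = {-delta_E:.6f}")
```

Raising the upper level costs work, so the extracted work $W$ comes out negative.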
3.3 Entropy Production
We now introduce the most important quantity in stochastic thermodynamics, the entropy production. The entropy production is defined as the sum of the entropy increase of the system and that of the attached baths, and it quantifies the degree of irreversibility of a process. We first define the average entropy production.
Definition: Entropy production (average)
Consider a process in a small system attached to $k$ heat baths with inverse temperatures $\beta_1, \dots, \beta_k$ during $0 \le t \le \tau$. The average entropy production is defined as
$$\sigma := \sum_{\nu=1}^k \beta_\nu Q_\nu + H(p(\tau)) - H(p(0)), \tag{3.19}$$
where $Q_\nu$ is the average heat release to the $\nu$-th bath, $H$ is the Shannon entropy, and $p(\tau)$ and $p(0)$ are the final and the initial probability distributions. The first term of Eq. (3.19) represents the entropy increase of the baths, since in thermodynamics $Q/T$ is the entropy increase of a heat bath at temperature $T$ that absorbs heat $Q$.
The second and third terms represent the entropy increase of the system, where we employ the Shannon entropy as the entropy of the system. If the composite of the main system and the baths is isolated, the second law of thermodynamics (the monotonic increase of entropy in an isolated system) reads $\sigma \ge 0$, which is proven in Sect. 5.3.
We next define thermodynamic quantities for each single stochastic trajectory. The energy exchanged with the baths, namely the heat, is well defined for a single trajectory. In contrast, the Shannon entropy is defined only for an ensemble, not for a single trajectory. To define the entropy change of the system along a single trajectory, we recall from Sect. 3.1.1 that the Shannon entropy is the ensemble average of the stochastic entropy (surprisal).
Definition: Entropy production (stochastic)
Consider a process in a small system attached to $k$ heat baths with inverse temperatures $\beta_1, \dots, \beta_k$ during $0 \le t \le \tau$. The stochastic entropy production for a single stochastic trajectory is defined as
$$\hat\sigma := \sum_{\nu=1}^k \beta_\nu \hat Q_\nu + \hat s(w(\tau);\tau) - \hat s(w(0);0), \tag{3.20}$$
where $\hat Q_\nu$ is the heat released to the $\nu$-th bath along this trajectory, and $w(\tau)$ and $w(0)$ are the final and the initial states. In addition, $\hat s(w;t) := -\ln p_w(t)$ is the stochastic entropy (surprisal) of a state $w$ at time $t$. We put the hat $\hat\cdot$ on stochastic variables, and a symbol without the hat represents its ensemble average. We use this notation throughout this textbook. With this definition, the ensemble average of the stochastic entropy production indeed reproduces the average entropy production: $\sigma = \langle\hat\sigma\rangle$. Seifert [20] emphasized that the stochastic entropy in Eq. (3.20) should be taken with respect to the realized distributions at $t = 0$ and $t = \tau$, which are operationally accessible quantities. Since the heat $\hat Q_\nu$ can be expressed by the ratio of transition rates as in Eq. (3.12), the entropy production $\hat\sigma$ can be written in terms of quantities of the stochastic process. Note that the definition (3.20) remains valid for non-Markovian dynamics and for deterministic dynamics.
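To illustrate that averaging Eq. (3.20) over trajectories reproduces Eq. (3.19), the following Monte Carlo sketch samples trajectories of a hypothetical two-level system with the Gillespie algorithm, starting far from equilibrium. It assumes $\beta = 1$, rates obeying Eq. (3.12), and the stochastic entropy $\hat s(w;t) = -\ln p_w(t)$ evaluated with the exact solution of the two-state master equation. The exact average uses the fact that the mean net number of jumps $0 \to 1$ equals $p_0(0) - p_0(\tau)$. The sample average is expected to agree with $\sigma$ only up to statistical error.

```python
import math
import random

beta = 1.0
E = [0.0, 1.0]                              # hypothetical energy levels E_0, E_1
k01 = math.exp(beta * (E[0] - E[1]) / 2)    # rate 0 -> 1
k10 = math.exp(beta * (E[1] - E[0]) / 2)    # rate 1 -> 0; k01 / k10 obeys Eq. (3.12)
p0_init, tau = 0.1, 1.0                     # initial probability of state 0, final time

def p0(t):
    """Exact solution of the two-state master equation for the probability of state 0."""
    p0_ss = k10 / (k01 + k10)
    return p0_ss + (p0_init - p0_ss) * math.exp(-(k01 + k10) * t)

def surprisal(w, t):
    """Stochastic entropy s(w; t) = -ln p_w(t) of the realized distribution."""
    q = p0(t)
    return -math.log(q if w == 0 else 1.0 - q)

def trajectory_sigma(rng):
    """Sample one trajectory with the Gillespie algorithm and return Eq. (3.20)."""
    w = 0 if rng.random() < p0_init else 1
    w_start, t, heat = w, 0.0, 0.0
    while True:
        t += rng.expovariate(k01 if w == 0 else k10)   # waiting time in state w
        if t > tau:
            break
        heat += E[w] - E[1 - w]                        # heat released by the jump
        w = 1 - w
    return beta * heat + surprisal(w, tau) - surprisal(w_start, 0.0)

def entropy(q):
    """Shannon entropy of the two-state distribution (q, 1 - q)."""
    return -sum(x * math.log(x) for x in (q, 1.0 - q) if x > 0)

rng = random.Random(1)
n_samples = 40000
sigma_mc = sum(trajectory_sigma(rng) for _ in range(n_samples)) / n_samples

# Average heat: (E_0 - E_1) times the mean net number of 0 -> 1 jumps.
heat_avg = (E[0] - E[1]) * (p0_init - p0(tau))
sigma_exact = beta * heat_avg + entropy(p0(tau)) - entropy(p0_init)
print(f"<sigma_hat> = {sigma_mc:.3f}, sigma = {sigma_exact:.3f}")
```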
We note that the entropy production rate, that is, the increase of the average entropy production per unit time, has a simple expression in terms of probabilities and transition rates.
Theorem: Entropy production rate
The entropy production rate $\dot\sigma := d\sigma/dt$ is written as
$$\dot\sigma = \sum_{w,w'} p_w P_{w\to w'}\ln\frac{p_w P_{w\to w'}}{p_{w'} P^\dagger_{\bar w'\to\bar w}}. \tag{3.21}$$
The right-hand side can be decomposed into the contributions from the system and from the baths as
$$\frac{dH(p(t))}{dt} = \sum_{w,w'} p_w P_{w\to w'}\ln\frac{p_w}{p_{w'}}, \tag{3.22}$$
$$\sum_{\nu=1}^k \beta_\nu \dot Q_\nu = \sum_{w,w'} p_w P_{w\to w'}\ln\frac{P_{w\to w'}}{P^\dagger_{\bar w'\to\bar w}}, \tag{3.23}$$
where $\dot Q_\nu$ is the heat release to the $\nu$-th bath per unit time, and we used the master equation Eq. (2.10) in the first line. Since Eq. (3.21) takes the form of the Kullback-Leibler divergence, the nonnegativity of the Kullback-Leibler divergence (proven in Sect. 5.3.1) directly implies
$$\dot\sigma \ge 0, \tag{3.24}$$
which leads to the second law of thermodynamics, $\sigma = \int_0^\tau dt\,\dot\sigma(t) \ge 0$. This expression is useful not only for proving the second law but also for deriving many other relations, some of which we will see in Part IV.
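Equations (3.21) to (3.23) can be checked numerically. The sketch below assumes a hypothetical three-state system without odd variables, in which every pair of states is coupled to two baths with different inverse temperatures, and each bath's rates obey the local detailed-balance condition (3.14). Transitions induced by different baths are treated as separate channels, so the sums in Eqs. (3.21) and (3.23) run over baths as well as over pairs $w \ne w'$. The check confirms the decomposition at an arbitrary distribution. It also shows that $\dot\sigma$ stays positive in the non-equilibrium steady state, where $dH/dt = 0$.

```python
import math

energies = [0.0, 1.0, 2.5]   # hypothetical energy levels E_w
betas = [1.0, 0.5]           # hypothetical baths nu = 0, 1 with different temperatures
n = len(energies)
pairs = [(w, v) for w in range(n) for v in range(n) if w != v]

def rate(nu, w, v):
    """Rate of w -> v through bath nu; obeys local detailed balance, Eq. (3.14)."""
    return math.exp(betas[nu] * (energies[w] - energies[v]) / 2)

def dp_dt(p):
    """Master equation with the rates of both baths added."""
    return [sum(p[w] * rate(nu, w, v) - p[v] * rate(nu, v, w)
                for nu in range(len(betas)) for w in range(n) if w != v)
            for v in range(n)]

def sigma_dot(p):
    """Eq. (3.21), summed over baths; no odd variables, so P^dagger = P and w-bar = w."""
    return sum(p[w] * rate(nu, w, v)
               * math.log(p[w] * rate(nu, w, v) / (p[v] * rate(nu, v, w)))
               for nu in range(len(betas)) for (w, v) in pairs)

def entropy_rate(p):
    """dH/dt = -sum_v ln p_v dp_v/dt, the left-hand side of Eq. (3.22)."""
    return -sum(math.log(pv) * dv for pv, dv in zip(p, dp_dt(p)))

def bath_entropy_rate(p):
    """sum_nu beta_nu Qdot_nu, with heat E_w - E_v released per jump w -> v."""
    return sum(betas[nu] * p[w] * rate(nu, w, v) * (energies[w] - energies[v])
               for nu in range(len(betas)) for (w, v) in pairs)

p_start = [0.6, 0.3, 0.1]    # an arbitrary, non-stationary distribution
print(sigma_dot(p_start), entropy_rate(p_start) + bath_entropy_rate(p_start))

p = list(p_start)
for _ in range(20000):       # Euler integration up to t = 20
    p = [pv + 1e-3 * dv for pv, dv in zip(p, dp_dt(p))]
print("steady state:", p)
print("sigma_dot:", sigma_dot(p), " dH/dt:", entropy_rate(p))
```

In the steady state all entropy production comes from the baths. Heat keeps flowing from the hotter bath to the colder one through the system, so $\dot\sigma$ does not vanish.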
3.4 Differences Between Conventional Thermodynamics and Stochastic Thermodynamics
We have seen in the previous sections that several thermodynamic relations still hold in small stochastic systems. Nevertheless, some differences remain between conventional thermodynamics and stochastic thermodynamics. Despite their similarity, some quantities have slightly different definitions, which may lead to different results. In this section, we first clarify these differences and then show how to derive results for macroscopic systems from stochastic thermodynamics.
3.4.1 Summary of Conventional Thermodynamics
We first summarize the framework of conventional thermodynamics for macroscopic systems. Conventional thermodynamics is formulated in the following setup:
• Observable: Observables are restricted to macroscopic ones. Denoting the volume of the system by $V$, all observables may fluctuate by an amount of order $o(V)$.
• State: A state is specified by the values of macroscopic observables.¹⁰ If the values of the macroscopic (extensive) observables of two states differ by $o(V)$, the two states are regarded as the same state in the thermodynamic sense and satisfy the same thermodynamic relations.
• Operation: There are several ways of setting up operations. In one, an external agent performs external operations without any fluctuation. In another, external operations performed by an external agent can fluctuate by an amount of order $o(V)$. In the latter case, two operations differing by $o(V)$ should give the same thermodynamic prediction.
In addition, thermodynamic quantities are defined as follows:
• Work: Work extraction through an operation is defined as the mechanical work done through the change in macroscopic parameters (e.g., movement of a piston).
• Heat: Heat is defined as the difference between the change in internal energy and the extracted work:¹¹ $Q = -W - \Delta U$.
The most important point is that quantities of order $o(V)$ are considered negligible. For example, a process whose entropy production is nonzero but of order $o(V)$ is regarded as a reversible process.
3.4.2 Summary of Stochastic Thermodynamics
We next summarize the framework of stochastic thermodynamics. Stochastic thermodynamics is formulated in the following setup:
• Observable: Observables can be microscopic ones. In stochastic thermodynamics, measurements and controls with arbitrarily high resolution are assumed to be possible. As a result, differences of order $O(1)$ with respect to the volume $V$ are detectable.
• State: In stochastic thermodynamics, a probability distribution over possible microscopic states is sometimes regarded as a state. For example, we frequently consider a process with a given initial probability distribution, which plays the role of the initial state in macroscopic thermodynamics.
• Operation: In stochastic thermodynamics, an external agent can perform any microscopic operation on the system.
In addition, thermodynamic quantities are defined as follows:
• Work: Work extraction through an operation is defined as the mechanical work done through the change in control parameters (e.g., changing energy levels).¹²
• Heat: Heat is defined as the difference between the change in internal energy and the extracted work: $Q := -W - \Delta E$.
The crucial difference from conventional thermodynamics is that quantities of order $O(1)$ are detectable. For example, a process with finite entropy production of order $O(1)$ is regarded as an irreversible process. Some differences come from this difference in resolution. One example is the existence of a reversible adiabatic process, which is discussed in Sect. 3.4.4. Other differences come from which operations are allowed, as discussed in the next subsection.
10 A macroscopic gas system is usually specified by a set of extensive variables $(E, V, N)$, where $E$ is the energy, $V$ is the volume, and $N$ is the number of particles.
11 This definition employs the first law of thermodynamics. If we can measure the energy of a bath directly, we can define the amount of heat more directly.
3.4.3 Entropy
In stochastic thermodynamics, we define the entropy of the system as the Shannon entropy. One may object that the Shannon entropy is a poor definition of thermodynamic entropy, because it does not increase in any adiabatic process,¹³ including non-quasistatic ones,¹⁴ whereas the thermodynamic entropy of conventional thermodynamics generally increases in such processes. To answer this objection, we give one possible argument for why the Shannon entropy is a good definition of entropy (a measure of irreversibility) in stochastic thermodynamics, although not in conventional thermodynamics.
The irreversibility in thermodynamics comes from the restriction on possible operations. To see this, consider the converse. If an external agent could perform arbitrary microscopically fine-tuned controls on the whole world, there would be no irreversible process, because the Hamilton equation is reversible, and this agent could perform a process violating the second law of thermodynamics.¹⁵ Of course, we cannot perform such a miraculous operation in practice, and we can perform operations only within a set of allowed operations. This restriction leads to irreversibility: a change of states $X \to Y$ is possible within this set of operations, but its reverse $Y \to X$ is not. In conventional thermodynamics, the set of allowed operations consists of all macroscopic operations, and we cannot control a single microscopic molecule. In contrast, in stochastic thermodynamics, the set of allowed operations consists of all operations on the system, but not on the baths. This set includes microscopically fine-tuned operations on the system, such as controlling a single molecule. The irreversibility in stochastic thermodynamics comes from the fact that we cannot control the baths: we cannot recover the information on the past states of the system that has escaped to the baths.
We now return to the problem of defining entropy. In conventional thermodynamics, adiabatic processes are in general irreversible for an agent restricted to macroscopic operations, and thus the entropy of conventional thermodynamics should increase in such processes. In contrast, in stochastic thermodynamics, all adiabatic processes are reversible within the set of allowed operations, because we can perform any operation on the system. Thus the entropy of stochastic thermodynamics should be invariant in adiabatic processes. This is why the Shannon entropy serves as a good definition of entropy in stochastic thermodynamics, but not in conventional thermodynamics.
12 For deterministic Hamilton dynamics with a control parameter $\lambda$, work is given by $\int dt\,\partial H(w;\lambda)/\partial\lambda \cdot \dot\lambda(t)$.
13 In thermodynamics, a process is called adiabatic if the energy changes only through the change in control parameters.
14 The Hamilton equation preserves phase-space volume, and thus the Shannon entropy is invariant under Hamilton dynamics. This argument also holds for quantum unitary evolution.
15 More precisely, if an irreversible adiabatic process $X \to Y$ exists in a macroscopic system, then the agent can construct its reversal process $Y \to X$ by employing the recurrence theorem [21, 22] and shortcuts to adiabaticity [23–26]. The recurrence theorem states that if we wait long enough, the system returns to a state arbitrarily close to the initial state. The shortcut to adiabaticity states that for any given state evolution $X_1 \to X_2$ within a time interval $\tau$, there exists a Hamiltonian that conveys $X_1 \to X_2$ within an arbitrarily short time interval $\tau' \ll \tau$. Combining these two, we construct a process that transforms the system back to the initial state within a short time interval.
Fig. 3.2 Examples of a deterministic operation (left panel, “Deterministic”) and an operation with fluctuation of order $o(V)$ (right panel, “With fluctuation o(V)”). Both panels plot the control parameter $\lambda$ against time $t$. Operations like the left one are usually assumed in stochastic thermodynamics, and operations like the right one are realized in macroscopic thermodynamics. Under these two operations, all macroscopic quantities take the same value, because the two operations are equivalent in the thermodynamic sense.
3.4.4 Reversible Adiabatic Processes
Adiabatic processes in small stochastic systems require technical care that is unnecessary in conventional thermodynamics. In conventional thermodynamics, the temperature of the end state of a quasistatic adiabatic process is well defined, so we can attach a heat bath at the same temperature to the system without dissipation. In contrast, in stochastic thermodynamics, Liouville's theorem implies that the final distribution is in general not a canonical distribution at any temperature, even if the adiabatic process is quasistatic [27, 28]. Suppose that the initial distribution is a canonical distribution with inverse temperature $\beta$. While the control parameter changes from $\lambda(0)$ to $\lambda(\tau)$, the adiabatic process maps the state from $w(0)$ to $w(\tau)$, which we denote by $w(\tau) = M(w(0))$. Denoting the energy at a state $w$ with a control parameter $\lambda$ by $E^\lambda_w$, the initial distribution is given by
$$P(w;0) = \frac{e^{-\beta E^{\lambda(0)}_w}}{Z^{\beta,\lambda(0)}}, \tag{3.25}$$
and by Liouville's theorem the final distribution is written as
$$P(M(w);\tau) = \frac{e^{-\beta E^{\lambda(0)}_w}}{Z^{\beta,\lambda(0)}}. \tag{3.26}$$
For the final distribution to be a canonical distribution as well, with another inverse temperature $\beta'$ and normalization constant $Z'$,
$$P(M(w);\tau) = \frac{e^{-\beta' E^{\lambda(\tau)}_{M(w)}}}{Z'}, \tag{3.27}$$
every state $w$ should satisfy
$$E^{\lambda(\tau)}_{M(w)} = \frac{\beta}{\beta'}E^{\lambda(0)}_w \tag{3.28}$$
(up to an additive constant independent of $w$, which can be absorbed into $Z'$). This condition means that all energy levels are stretched or shrunk by the same ratio, which is not expected to occur in physical setups. Without Eq. (3.28), it is impossible to achieve a reversible process between two equilibrium states with different temperatures, even in the quasistatic limit.¹⁶
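The condition (3.28) can be illustrated with a discrete toy model, assumed here for illustration. A bijection $M$ on three states stands in for Liouville's theorem and carries each probability $P(w;0)$ unchanged to $M(w)$, as in Eq. (3.26). The sketch tests whether the transported distribution is canonical for the final levels by checking that $\ln p_w + \beta' E_w$ is the same for all states. The energy levels, the map `M`, and the helper names are hypothetical. Uniform stretching of the levels should give a canonical distribution with $\beta' = \beta/2$, unequal stretching should not, and the Shannon entropy is unchanged in both cases.

```python
import math

def canonical(energies, beta):
    """Canonical distribution exp(-beta E_w) / Z."""
    z = sum(math.exp(-beta * e) for e in energies)
    return [math.exp(-beta * e) / z for e in energies]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def fitted_beta(energies, probs, tol=1e-9):
    """Return beta' if probs_w = exp(-beta' E_w) / Z' for every w, otherwise None.

    beta' is read off from states 0 and 1 (assumed to have different energies);
    ln p_w + beta' E_w must then be the same for all w, so any w-independent
    constant is absorbed into Z'.
    """
    b = -(math.log(probs[1]) - math.log(probs[0])) / (energies[1] - energies[0])
    offsets = [math.log(p) + b * e for p, e in zip(probs, energies)]
    return b if max(offsets) - min(offsets) < tol else None

beta = 1.0
E_initial = [0.0, 1.0, 3.0]   # hypothetical levels E_w^{lambda(0)}
M = [2, 0, 1]                 # hypothetical bijection: state w is carried to M[w]

def push_forward(p):
    """Eq. (3.26): the probability of w at t = 0 is carried to M(w) at t = tau."""
    out = [0.0] * len(p)
    for w, pw in enumerate(p):
        out[M[w]] = pw
    return out

def final_levels(scales):
    """Hypothetical final levels with E^{lambda(tau)}_{M(w)} = scales[w] * E^{lambda(0)}_w."""
    out = [0.0] * len(E_initial)
    for w, e in enumerate(E_initial):
        out[M[w]] = scales[w] * e
    return out

p_initial = canonical(E_initial, beta)
p_final = push_forward(p_initial)
E_uniform = final_levels([2.0, 2.0, 2.0])     # every level stretched by 2
E_nonuniform = final_levels([2.0, 2.0, 1.0])  # levels stretched unequally

print(fitted_beta(E_uniform, p_final))        # close to beta / 2
print(fitted_beta(E_nonuniform, p_final))     # None: not canonical
print(entropy(p_initial), entropy(p_final))   # equal: Shannon entropy is conserved
```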
3.4.5 How to Derive Results for Macroscopic Systems from Stochastic Thermodynamics
In Chaps. 14 and 16, we derive some results for macroscopic systems using methods of stochastic thermodynamics. We briefly explain how this interpretation can be justified despite the differences discussed above.
16 Dinis, Martínez, Roldán, Parrondo, and Rica [29] proposed a microadiabatic process in which the system interacts with infinitely many baths at slightly different temperatures while conserving the Shannon entropy of the system.
We suppose that the relation obtained in stochastic thermodynamics is meaningful in the thermodynamic limit, that is, it concerns quantities of order $O(V)$. If one adopts the second way of defining operations (allowing fluctuations in thermodynamic operations), one should take additional care about this difference in the definition of operations. However, as shown below, the difference does not matter in the end. Suppose that we have a relation among quantities of order $O(V)$ for a deterministic operation $\lambda(t)$ in stochastic thermodynamics (see Fig. 3.2). This process is one realization of the thermodynamic process $\lambda(t)$ that includes fluctuations of order $o(V)$. Since all processes regarded as the same thermodynamic process¹⁷ should give the same results in the thermodynamic sense, we can safely conclude that the thermodynamic operation $\lambda(t)$ also satisfies the obtained relation in the thermodynamic sense (i.e., at order $O(V)$, neglecting corrections of order $o(V)$).
17 For example, two processes $\lambda(t)$ and $\lambda'(t)$ with $|\lambda(t) - \lambda'(t)| = o(V)$ for any $t$ are regarded as the same thermodynamic process.
Chapter 4
Stochastic Processes in Continuous Space
In this chapter, we consider a stochastic particle in continuous space. A prominent example is a Brownian particle, which is briefly explained at the beginning of Sect. 2.4. Various biological systems moving in continuous space are also stochastic systems in continuous space. For simplicity, we treat the case with a single variable, although our analyses extend easily to the case with multiple variables.¹ We denote a stochastic variable in continuous space by the symbol $\hat x$.
4.1 Mathematical Foundations
4.1.1 Wiener Process
We now introduce the most basic continuous stochastic Markov process, the Wiener process. The Wiener process $\hat W(t)$ describes free Brownian motion. We introduce the Wiener process here in an intuitive and non-rigorous manner.
Definition: Wiener process
The Wiener process $\hat W(t)$ is defined as a process satisfying
$$P(\hat W(t+\Delta t) = x \mid \hat W(t) = x') = \frac{1}{\sqrt{2\pi\Delta t}}\, e^{-\frac{(x-x')^2}{2\Delta t}} \tag{4.1}$$
for any $t$ and $\Delta t$, with the initial distribution given by the Dirac delta function, $P(x,0) = \delta(x)$.
1 The textbook by Gardiner [30] describes the case with multiple variables.
© Springer Nature Singapore Pte Ltd. 2023 N. Shiraishi, An Introduction to Stochastic Thermodynamics, Fundamental Theories of Physics 212, https://doi.org/10.1007/978-981-19-8186-9_4
Proving the existence of the Wiener process is an important but nontrivial problem in probability theory. In this textbook, however, we accept its existence on the basis of an intuitive argument instead of a rigorous proof. Readers interested in the mathematical foundations are referred to a standard textbook of probability theory.
Consider a discrete-time random walk on a one-dimensional lattice² with lattice constant $a$. In every time step of length $\tau$ (written $\Delta t$ in Eq. (4.3) and below), a particle hops to one of its two nearest-neighbor sites with probability one half each. This is described as a Markov chain with transition probability
$$T_{x\to x+a} = T_{x\to x-a} = \frac12. \tag{4.2}$$
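The moments used in Eq. (4.3) can be computed exactly by propagating the distribution of the random walk (4.2) with rational arithmetic. This is a minimal sketch, and the lattice constant $a = 1/10$ and the number of steps are illustrative choices. It shows that $\langle x\rangle$ stays at 0 while $\langle x^2\rangle$ increases by exactly $a^2$ per step, which is the origin of the ratio $a^2/\Delta t$.

```python
from fractions import Fraction

def propagate(dist, a):
    """One step of Eq. (4.2): from x, hop to x - a or x + a with probability 1/2 each."""
    new = {}
    for x, p in dist.items():
        for y in (x - a, x + a):
            new[y] = new.get(y, Fraction(0)) + p / 2
    return new

def moments(dist):
    """Return (<x>, <x^2>) of a distribution stored as {site: probability}."""
    mean = sum(x * p for x, p in dist.items())
    msd = sum(x * x * p for x, p in dist.items())
    return mean, msd

a = Fraction(1, 10)                  # hypothetical lattice constant
dist = {Fraction(0): Fraction(1)}    # P(x, 0) = Kronecker delta at x = 0
for n in range(1, 6):
    dist = propagate(dist, a)
    mean, msd = moments(dist)
    print(n, mean, msd)              # <x> stays 0; <x^2> grows by a^2 = 1/100 per step
```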
By setting the initial distribution as $P(x,0)=\delta_{x,0}$, the average value of $x$ is always 0. Here, since the space is discrete, the right-hand side is not the Dirac delta function (for a continuous variable) but the Kronecker delta (for a discrete variable). The diffusion coefficient3 is given by
$$\sum_x x^2\,\frac{P(x,t+\Delta t)-P(x,t)}{\Delta t} = \frac{1}{\Delta t}\sum_x\left[\frac{(x-a)^2+(x+a)^2}{2}-x^2\right]P(x,t) = \frac{a^2}{\Delta t}, \tag{4.3}$$
and the probability distribution converges to a Gaussian distribution by the central limit theorem.4 Taking the limits $\Delta t\to0$ and $a\to0$ while keeping the diffusion coefficient $a^2/\Delta t=1$, we identify this process with the Wiener process.
We now introduce white Gaussian noise as a limit of increments of the Wiener process. Note that the white Gaussian noise defined below is, mathematically, not a function but a generalized function.
Definition: White Gaussian noise
The white Gaussian noise $\hat{\xi}(t)$ is given by
$$\hat{\xi}(t) := \lim_{\Delta t\to0}\frac{\hat{W}(t+\Delta t)-\hat{W}(t)}{\Delta t}. \tag{4.4}$$
For later convenience, we define the discretized white Gaussian noise with a small time interval $\Delta t$ as
$$\hat{\xi}_{\Delta t}(t) := \hat{W}(t+\Delta t)-\hat{W}(t). \tag{4.5}$$
2 We fix the units of length and time beforehand; both $a$ and $\Delta t$ are dimensionless in these units.
3 The diffusion coefficient $D$ is defined as $D := d\langle x^2\rangle/dt$ for continuous-time random walks.
4 The distribution can be approximated explicitly by means of Stirling's formula.
In some literature, this quantity $\hat{\xi}_{\Delta t}(t)$ is also denoted by $\Delta\hat{W}(t)$. By construction, the white Gaussian noise satisfies
$$\langle\hat{\xi}(t)\rangle = 0, \tag{4.6}$$
$$\langle\hat{\xi}(t)\hat{\xi}(t')\rangle = 0 \quad \text{for } t\neq t', \tag{4.7}$$
and reproduces the Wiener process as
$$\hat{W}(\tau) = \int_0^\tau dt\,\hat{\xi}(t). \tag{4.8}$$
We formally write the above relation as
$$d\hat{W}(t) = \hat{\xi}(t)\,dt. \tag{4.9}$$
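On the basis of Eq. (4.7), we suppose the relation $\langle\hat{\xi}(t)\hat{\xi}(t')\rangle = k\,\delta(t-t')$ and calculate the value of the coefficient $k$. Since Eq. (4.1) holds for any $\Delta t$, setting $t=0$ and $\Delta t=\tau$ gives
$$\langle\hat{W}(\tau)^2\rangle = \tau. \tag{4.10}$$
On the other hand, the left-hand side is also calculated as
$$\langle\hat{W}(\tau)^2\rangle = \int_0^\tau dt\int_0^\tau dt'\,\langle\hat{\xi}(t)\hat{\xi}(t')\rangle = k\tau. \tag{4.11}$$
These two relations imply $k=1$:
$$\langle\hat{\xi}(t)\hat{\xi}(t')\rangle = \delta(t-t'). \tag{4.12}$$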
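A quick numerical check of Eqs. (4.10) and (4.12) can be sketched in Python, assuming that the discretized noise of Eq. (4.5) is sampled as independent Gaussian increments of variance $\Delta t$ (as follows from Eq. (4.1)). The helper names `wiener_increments` and `check_moments` are hypothetical. Since $\hat{\xi}(t)\approx\hat{\xi}_{\Delta t}(t)/\Delta t$, Eq. (4.12) corresponds to $\langle\hat{\xi}_{\Delta t}(t_n)\hat{\xi}_{\Delta t}(t_m)\rangle\approx\Delta t\,\delta_{nm}$; the code divides the sampled correlations by $\Delta t$, so they should come out near 1 (equal times) and 0 (different times).

```python
import math
import random


def wiener_increments(n_steps, dt, rng):
    """Discretized white Gaussian noise of Eq. (4.5): independent
    Gaussian increments with mean 0 and variance dt."""
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def check_moments(tau=1.0, n_steps=100, n_paths=4000, seed=1):
    """Monte Carlo estimates of <W(tau)^2> and of the noise correlations.

    Returns (<W(tau)^2>, <xi_0 xi_0>/dt, <xi_0 xi_1>/dt). The last two are
    about 1 and 0, i.e. <xi_dt(t_n) xi_dt(t_m)> is about dt * delta_nm.
    """
    rng = random.Random(seed)
    dt = tau / n_steps
    w_sq = same = diff = 0.0
    for _ in range(n_paths):
        xi = wiener_increments(n_steps, dt, rng)
        w_tau = sum(xi)              # W(tau) as a sum of increments, cf. Eq. (4.8)
        w_sq += w_tau ** 2
        same += xi[0] * xi[0]        # equal times
        diff += xi[0] * xi[1]        # different times
    return w_sq / n_paths, same / n_paths / dt, diff / n_paths / dt


w2, same, diff = check_moments()
print(w2, same, diff)   # roughly 1.0 (= tau), 1.0 and 0.0
```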
There is a more intuitive but non-rigorous way to construct the white Gaussian noise and the Wiener process: we first discretize time, then define the white Gaussian noise as a small but finite noise, and finally define the Wiener process by summing up the white Gaussian noise. This approach is similar to the aforementioned idea of an infinitesimally small random walk explained on p. 48.
Notably, the square of the increment of the Wiener process satisfies $(d\hat{W}(t))^2 = dt$ without taking the ensemble average. More precisely, an integral with $(d\hat{W}(t))^2$ is equivalent to an ordinary time integral in the sense of the mean-square limit.5
Theorem: Differential form of the Wiener process
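Let $f(t)$ be an arbitrary continuous function. Then the Wiener process satisfies the following relations:
$$\lim_{\Delta t\to0}\left\langle\left(\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(n\Delta t)^2 f(n\Delta t)-\int_0^\tau dt\,f(t)\right)^2\right\rangle = 0, \tag{4.13}$$
$$\lim_{\Delta t\to0}\left\langle\left(\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(n\Delta t)\,\Delta t\, f(n\Delta t)\right)^2\right\rangle = 0, \tag{4.14}$$
$$\lim_{\Delta t\to0}\left\langle\left(\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(n\Delta t)^k f(n\Delta t)\right)^2\right\rangle = 0, \tag{4.15}$$
where $N:=\tau/\Delta t$ and $k\ge3$. We formally write these relations as
$$(d\hat{W}(t))^2 = dt, \tag{4.16}$$
$$d\hat{W}(t)\,dt = (d\hat{W}(t))^k = 0. \tag{4.17}$$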
Intuitively speaking, the averaging is performed not over many realizations but by summing up many pieces of noise of infinitesimal duration, which is why these relations hold without taking the ensemble average.
Proof We show the proof of Eq. (4.13); the others can be shown in a similar manner. By the definition of the Wiener process (4.1) and of the discretized white Gaussian noise (4.5), the fourth moment of $\hat{\xi}_{\Delta t}$ is calculated as
$$\langle\hat{\xi}_{\Delta t}^4\rangle = \int dw\,w^4\cdot\frac{1}{\sqrt{2\pi\Delta t}}\,e^{-\frac{w^2}{2\Delta t}} = 3\Delta t^2. \tag{4.18}$$
We thus have
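$$\langle\hat{\xi}_{\Delta t}^2-\Delta t\rangle = 0, \tag{4.19}$$
$$\left\langle\left(\hat{\xi}_{\Delta t}^2-\Delta t\right)^2\right\rangle = 2\Delta t^2. \tag{4.20}$$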
5 We say that $\lim_{\Delta t\to0}g_{\Delta t}=g$ holds in the sense of the mean-square limit if and only if $\lim_{\Delta t\to0}\langle(g_{\Delta t}-g)^2\rangle = 0$.
By defining
$$D_{\Delta t} := \sum_{n=0}^{N-1} f(n\Delta t)\,\Delta t - \int_0^\tau dt\,f(t), \tag{4.21}$$
which can be made arbitrarily small by taking $\Delta t$ small, the left-hand side of Eq. (4.13) is evaluated as
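$$\begin{aligned}
&\lim_{\Delta t\to0}\left\langle\left(\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(n\Delta t)^2 f(n\Delta t)-\int_0^\tau dt\,f(t)\right)^2\right\rangle\\
&=\lim_{\Delta t\to0}\left\langle\left(\sum_{n=0}^{N-1}\left(\hat{\xi}_{\Delta t}(n\Delta t)^2-\Delta t\right)f(n\Delta t)+D_{\Delta t}\right)^2\right\rangle\\
&=\lim_{\Delta t\to0}\left[\sum_{n=0}^{N-1}\left\langle\left(\hat{\xi}_{\Delta t}(n\Delta t)^2-\Delta t\right)^2\right\rangle f(n\Delta t)^2+O(D_{\Delta t})\right]\\
&=\lim_{\Delta t\to0}\left[O(\Delta t)+O(D_{\Delta t})\right]\\
&=0,
\end{aligned}\tag{4.22}$$
where we used Eqs. (4.19) and (4.20), the independence of the noise at different times, and $O(D_{\Delta t})=O(\Delta t)$.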
In the remainder of this textbook, we formally write $\hat{\xi}_\tau(t)^2$ to denote not the simple square $(\hat{\xi}_\tau(t))^2$ but the sum
$$\hat{\xi}_\tau(t)^2 := \sum_{n=0}^{N_\tau-1}\left(\hat{\xi}_{\Delta t}(t+n\Delta t)\right)^2 \tag{4.23}$$
with $N_\tau:=\tau/\Delta t$, which converges to $\tau$ without taking the ensemble average.
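That Eq. (4.23) converges without an ensemble average can be observed on a single simulated path. The sketch below (Python, standard library; `path_sums` and the seed are illustrative assumptions) accumulates the sums of Eqs. (4.13) to (4.15) with $f=1$ along one path: the sum of squares approaches $\tau$, while the mixed sum and the cubic sum shrink as $\Delta t$ decreases. The cubic sum uses $|\hat{\xi}_{\Delta t}|^3$, a slightly stricter check than Eq. (4.15) with $k=3$.

```python
import math
import random


def path_sums(tau=1.0, n_steps=100_000, seed=2):
    """For ONE sampled path (no ensemble average), return the sums in
    Eqs. (4.13)-(4.15) with f = 1:
    sum xi^2 (-> tau), sum xi*dt (-> 0) and sum |xi|^3 (-> 0)."""
    rng = random.Random(seed)
    dt = tau / n_steps
    sq = lin = cube = 0.0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, math.sqrt(dt))
        sq += xi * xi          # (dW)^2 = dt
        lin += xi * dt         # dW dt = 0
        cube += abs(xi) ** 3   # (dW)^3 = 0; |.| makes the check stricter
    return sq, lin, cube


for n in (100, 10_000, 100_000):
    print(n, path_sums(n_steps=n))
```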
4.1.2 Stochastic Differential Equations and Integrals
4.1.2.1 Singularity
We next define differentials and integrals involving the Wiener process. Since the Wiener process is singular almost everywhere, it requires more careful treatment than ordinary integrals and differentials.
Let $f(t,\hat{W}(t))$ be a function that depends directly on the value of the Wiener process $\hat{W}(t)$. The integral of $f(t,\hat{W}(t))$ with $d\hat{W}(t)$ is defined as
$$\int_0^\tau d\hat{W}(t)\,f(t,\hat{W}(t)) := \lim_{\Delta t\to0}\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(t_n)\,f\big(t'_n,\hat{W}_{\Delta t}(t'_n)\big). \tag{4.24}$$
Here, we defined $t_n:=n\Delta t$ and $t'_n:=t_n+\alpha\Delta t$ with $0\le\alpha\le1$. We leave $\alpha$ unspecified for the moment. We also defined the modified Wiener process discretized with time step $\Delta t$ as
$$\hat{W}_{\Delta t}(t_n) := \sum_{m=0}^{n-1}\hat{\xi}_{\Delta t}(t_m), \tag{4.25}$$
$$\hat{W}_{\Delta t}(t'_n) := \hat{W}_{\Delta t}(t_n)+\alpha\,\hat{\xi}_{\Delta t}(t_n). \tag{4.26}$$
This raises the question of how to choose $t'_n$ (i.e., $\alpha$), which is arbitrary. In ordinary integrals, the choice of $t'_n$ does not affect the result after taking the $\Delta t\to0$ limit. In contrast, an integral with the Wiener process depends on the choice of $t'_n$ even after taking the $\Delta t\to0$ limit. To see this, consider $f(t,\hat{W}(t))=\hat{W}(t)$ as a simple example. In this case, we have6
$$\left\langle\int_0^\tau d\hat{W}(t)\,\hat{W}(t)\right\rangle = \lim_{\Delta t\to0}\sum_{n=0}^{N-1}\left\langle\hat{\xi}_{\Delta t}(t_n)\left(\hat{W}_{\Delta t}(t_n)+\alpha\,\hat{\xi}_{\Delta t}(t_n)\right)\right\rangle = \alpha\tau, \tag{4.27}$$
which clearly depends on the choice of $\alpha$. We remark that this problem stems not from stochasticity but from singularity. In fact, a similar problem occurs even in deterministic dynamics with a singularity.7 Consider the deterministic differential equation
$$\frac{dx(t)}{dt} = x(t)\,\delta(t-1) \tag{4.28}$$
with the initial condition $x(0)=1$. Arbitrariness again appears in the product with the delta function at $t=1$: the rule of product at $t=1$ affects the time evolution as
$$\lim_{t\to1}x(t)\,\delta(t-1) = (1-\alpha)\left(\lim_{t\to1-0}x(t)\right)\delta(t-1)+\alpha\left(\lim_{t\to1+0}x(t)\right)\delta(t-1). \tag{4.29}$$
6 We remark that $\hat{W}_{\Delta t}(t_n)$ contains no term of $\hat{\xi}_{\Delta t}(t_n)$, and hence $\langle\hat{\xi}_{\Delta t}(t_n)\hat{W}_{\Delta t}(t_n)\rangle=0$ holds.
7 See also the Ph.D. thesis of Kanazawa [31].
Fig. 4.1 Solutions of the differential equation Eq. (4.28) with α = 0 and α = 1/2 (plot of x(t) against t)
Solving this differential equation under the rule of product Eq. (4.29), we have $x(t)=1$ for $t<1$ and
$$x(t) = 1+\frac{1}{1-\alpha}\quad(t>1), \tag{4.30}$$
which clearly depends on the choice of $\alpha$ (see Fig. 4.1). In summary, if a differential equation for $x$ contains a product of a function of $x$ and a singular function, we must specify the rule of product; otherwise, the differential equation may be ill-defined or unrealistic. The choice of the rule of product relies on additional information about the stochastic dynamics (e.g., what type of dynamics underlies the stochastic process). We examine which rule of product works well for overdamped Langevin systems with space-dependent temperature and space-dependent viscosity in Sect. 4.2.1, and in the definition of heat in Sect. 4.3.
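The $\alpha$ dependence in Eq. (4.27) shows up clearly in a Monte Carlo estimate. In this Python sketch (the function name, seed, and sample sizes are assumptions chosen for illustration), the noise is sampled as Gaussian increments and $f$ is evaluated at $\hat{W}_{\Delta t}(t'_n)$ as in Eq. (4.26); the averages should come out close to $\alpha\tau$ with $\tau=1$.

```python
import math
import random


def alpha_integral_mean(alpha, tau=1.0, n_steps=100, n_paths=3000, seed=3):
    """Monte Carlo estimate of the ensemble average in Eq. (4.27):
    < sum_n xi_n * (W_n + alpha * xi_n) >  with f(t, W) = W.

    alpha = 0 evaluates W before each increment, alpha = 1 after it.
    """
    rng = random.Random(seed)
    dt = tau / n_steps
    acc = 0.0
    for _ in range(n_paths):
        w = 0.0          # W_dt(t_n): built from past increments only, Eq. (4.25)
        s = 0.0
        for _ in range(n_steps):
            xi = rng.gauss(0.0, math.sqrt(dt))
            s += xi * (w + alpha * xi)   # W_dt(t'_n) of Eq. (4.26)
            w += xi
        acc += s
    return acc / n_paths


results = {alpha: alpha_integral_mean(alpha) for alpha in (0.0, 0.5, 1.0)}
print(results)   # each value should be close to alpha * tau = alpha
```

Because the same seed is reused for every $\alpha$, the three estimates share the same noise, which makes their differences nearly deterministic: each differs from the $\alpha=0$ value by about $\alpha\sum_n\hat{\xi}_{\Delta t}(t_n)^2\approx\alpha\tau$.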
4.1.2.2 Itô Integral and Stratonovich Integral
We now introduce two important rules of product. The first, corresponding to $\alpha=0$ in Eq. (4.24), is called the Itô integral or Itô's rule of product. The second, corresponding to $\alpha=1/2$ in Eq. (4.24), is called the Stratonovich integral or Stratonovich product.
Definition: Itô integral
The Itô integral, denoted by the product sign “·”, is defined as follows:
$$\int_0^\tau d\hat{W}(t)\cdot f(t,\hat{W}(t)) := \lim_{\Delta t\to0}\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(t_n)\,f\big(t_n,\hat{W}_{\Delta t}(t_n)\big). \tag{4.31}$$
Definition: Stratonovich integral
The Stratonovich integral, denoted by the product sign “◦”, is defined as follows8:
$$\int_0^\tau d\hat{W}(t)\circ f(t,\hat{W}(t)) := \lim_{\Delta t\to0}\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(t_n)\,f\left(t_n,\hat{W}_{\Delta t}\left(\frac{t_n+t_{n+1}}{2}\right)\right). \tag{4.32}$$
The definition of the Itô integral emphasizes time ordering: a new noise $\hat{\xi}_{\Delta t}$ is generated and acts on $f$ step by step, and $\hat{\xi}_{\Delta t}$ is multiplied by the value of $f$ just before this noise is generated. In other words, a stochastic process given by the Itô integral, $\hat{X}(\tau):=\int_0^\tau d\hat{W}(t)\cdot f(t,\hat{W}(t))$, is a martingale.9 In contrast, the definition of the Stratonovich integral emphasizes averaging: when the noise $\hat{\xi}_{\Delta t}$ is multiplied, the argument of $f$ takes the mean of its values just before and just after the noise $\hat{\xi}_{\Delta t}$ is generated. The Itô integral and the Stratonovich integral are connected through a simple relation, which follows from the Taylor expansion:
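$$\begin{aligned}
\hat{\xi}_{\Delta t}(t_n)\,f\left(t_n,\hat{W}_{\Delta t}\left(\frac{t_n+t_{n+1}}{2}\right)\right)
&=\hat{\xi}_{\Delta t}(t_n)\,f\left(t_n,\hat{W}_{\Delta t}(t_n)+\frac12\hat{\xi}_{\Delta t}(t_n)\right)\\
&=\hat{\xi}_{\Delta t}(t_n)\,f\big(t_n,\hat{W}_{\Delta t}(t_n)\big)+\frac12\hat{\xi}_{\Delta t}(t_n)^2\left.\frac{\partial f(t_n,W)}{\partial W}\right|_{W=\hat{W}_{\Delta t}(t_n)}+o(\Delta t)\\
&=\hat{\xi}_{\Delta t}(t_n)\,f\big(t_n,\hat{W}_{\Delta t}(t_n)\big)+\frac12\Delta t\left.\frac{\partial f(t_n,W)}{\partial W}\right|_{W=\hat{W}_{\Delta t}(t_n)}+o(\Delta t).
\end{aligned}\tag{4.34}$$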
This directly implies a useful relation connecting the Itô product and the Stratonovich product:
$$d\hat{W}(t)\circ f(t,\hat{W}(t)) = d\hat{W}(t)\cdot f(t,\hat{W}(t))+\frac12\,dt\left.\frac{\partial f(t,W)}{\partial W}\right|_{W=\hat{W}(t)}. \tag{4.35}$$
8 Some textbooks define the Stratonovich integral as
$$\lim_{\Delta t\to0}\sum_{n=0}^{N-1}\hat{\xi}_{\Delta t}(t_n)\,\frac{f\big(t_n,\hat{W}_{\Delta t}(t_n)\big)+f\big(t_n,\hat{W}_{\Delta t}(t_{n+1})\big)}{2}. \tag{4.33}$$
These two definitions are equivalent and give the same results.
9 A stochastic process $\hat{X}(t)$ is called a martingale if the expectation value of its increment $\hat{X}(t+\Delta t)-\hat{X}(t)$, conditioned on all its past states $\hat{X}(t')$ ($t'\le t$), is zero. See also Sect. 8.3.1 for further information on martingales.
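The difference between the two rules can be seen on a single path with $f(t,\hat{W})=\hat{W}$, for which Eq. (4.35) predicts a gap of $\tau/2$ between the Stratonovich and Itô integrals. The Python sketch below (the function name and seed are illustrative) uses the pre-point rule of Eq. (4.31) and the midpoint rule of Eq. (4.32). The Stratonovich sum reproduces the ordinary-calculus result $\hat{W}(\tau)^2/2$ up to rounding, because the midpoint sum telescopes, while the Itô sum is close to $(\hat{W}(\tau)^2-\tau)/2$.

```python
import math
import random


def ito_and_stratonovich(tau=1.0, n_steps=100_000, seed=4):
    """Itô (Eq. (4.31)) and Stratonovich (Eq. (4.32)) sums of
    int_0^tau dW(t) W(t) along ONE sampled path; also returns W(tau)."""
    rng = random.Random(seed)
    dt = tau / n_steps
    w = 0.0
    ito = strat = 0.0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, math.sqrt(dt))
        ito += xi * w                   # f evaluated before the increment
        strat += xi * (w + 0.5 * xi)    # f evaluated at the midpoint
        w += xi
    return ito, strat, w


ito, strat, w_tau = ito_and_stratonovich()
# Stratonovich: W^2/2 exactly (telescoping); Itô: about (W^2 - tau)/2;
# their gap is about tau/2, as Eq. (4.35) predicts with df/dW = 1.
print(strat - w_tau ** 2 / 2, ito - (w_tau ** 2 - 1.0) / 2, strat - ito)
```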
We note that the function $f(t,\hat{W}(t))$ symbolically contains the argument $\hat{W}(t)$, but it need not be a function of $\hat{W}(t)$ itself; it may be a function of its increments $\hat{\xi}_{\Delta t}(t_0),\hat{\xi}_{\Delta t}(t_1),\ldots,\hat{\xi}_{\Delta t}(t_n)$. In this case, the derivative $\partial f(t,\hat{W}(t))/\partial\hat{W}$ is interpreted as $\partial f(t,\hat{\xi}_{\Delta t}(t_0),\ldots,\hat{\xi}_{\Delta t}(t_n))/\partial\hat{\xi}_{\Delta t}(t_n)$. The key property of the Itô integral is non-anticipation: $d\hat{W}(t)$ and $f(t,\hat{W}(t))$ are uncorrelated, and thus the expectation value of their product vanishes.
Theorem: Non-anticipation of Itô integral
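The time integral of the product of $d\hat{W}(t)$ and $f(t,\hat{W}(t))$ has zero ensemble average:
$$\left\langle\int_0^\tau d\hat{W}(t)\cdot f(t,\hat{W}(t))\right\rangle = \int_0^\tau\langle d\hat{W}(t)\rangle\,\langle f(t,\hat{W}(t))\rangle = 0. \tag{4.36}$$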
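Non-anticipation can be checked by Monte Carlo. The following Python sketch (illustrative function name, seed, and sample sizes) takes $f(\hat{W})=e^{\hat{W}}$. Eq. (4.36) predicts a vanishing mean for the Itô integral, while Eq. (4.35) together with $\langle e^{\hat{W}(t)}\rangle=e^{t/2}$ predicts a Stratonovich mean of $\frac12\int_0^\tau e^{t/2}dt=e^{\tau/2}-1\approx0.649$ for $\tau=1$.

```python
import math
import random


def mean_integrals(tau=1.0, n_steps=100, n_paths=4000, seed=5):
    """Monte Carlo means of the Itô and Stratonovich integrals of f(W) = exp(W)."""
    rng = random.Random(seed)
    dt = tau / n_steps
    ito_acc = strat_acc = 0.0
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            xi = rng.gauss(0.0, math.sqrt(dt))
            ito_acc += xi * math.exp(w)               # f just before the increment
            strat_acc += xi * math.exp(w + 0.5 * xi)  # f at the midpoint, Eq. (4.32)
            w += xi
    return ito_acc / n_paths, strat_acc / n_paths


ito_mean, strat_mean = mean_integrals()
# Itô mean ~ 0 (Eq. (4.36)); Stratonovich mean ~ e^{tau/2} - 1 via Eq. (4.35)
print(ito_mean, strat_mean, math.exp(0.5) - 1.0)
```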
The key property of the Stratonovich product is that the conventional chain rule in differential form holds.
Theorem: Chain rule for Stratonovich product
With the Stratonovich product, the differential satisfies the ordinary chain rule:
$$df(t,\hat{W}(t)) = \frac{\partial f}{\partial t}\,dt+\frac{\partial f}{\partial\hat{W}}\circ d\hat{W}(t). \tag{4.37}$$
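The precise meaning of Eq. (4.37) is
$$\begin{aligned}
\Delta f(t,\hat{W}(t)) &:= f\big(t+\Delta t,\hat{W}_{\Delta t}(t)+\hat{\xi}_{\Delta t}(t)\big)-f\big(t,\hat{W}_{\Delta t}(t)\big)\\
&= \frac{\partial f}{\partial t}\,\Delta t+\left.\frac{\partial f}{\partial\hat{W}_{\Delta t}}\right|_{\hat{W}_{\Delta t}=\hat{W}_{\Delta t}(t)+\hat{\xi}_{\Delta t}(t)/2}\hat{\xi}_{\Delta t}(t)+o(\Delta t).
\end{aligned}\tag{4.38}$$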
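Proof (of Eq. (4.37)) The Taylor expansion of $f(t+\Delta t,\hat{W}_{\Delta t}(t)+\hat{\xi}_{\Delta t}(t))$ up to $O(\Delta t)$ leads to the desired relation:
$$\begin{aligned}
f\big(t+\Delta t,\hat{W}_{\Delta t}(t)+\hat{\xi}_{\Delta t}(t)\big)
&= f+\frac{\partial f}{\partial t}\,\Delta t+\left.\frac{\partial f}{\partial\hat{W}_{\Delta t}}\right|_{\hat{W}_{\Delta t}=\hat{W}_{\Delta t}(t)}\hat{\xi}_{\Delta t}(t)+\frac12\left.\frac{\partial^2 f}{\partial\hat{W}_{\Delta t}^2}\right|_{\hat{W}_{\Delta t}=\hat{W}_{\Delta t}(t)}\hat{\xi}_{\Delta t}(t)^2+o(\Delta t)\\
&= f+\frac{\partial f}{\partial t}\,\Delta t+\left.\frac{\partial f}{\partial\hat{W}_{\Delta t}}\right|_{\hat{W}_{\Delta t}=\hat{W}_{\Delta t}(t)+\hat{\xi}_{\Delta t}(t)/2}\hat{\xi}_{\Delta t}(t)+o(\Delta t),
\end{aligned}\tag{4.39}$$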
where the arguments $(t,\hat{W}_{\Delta t}(t))$ of $f$ are omitted. We used the facts that $(d\hat{W}_{\Delta t}(t))^2=O(dt)$ and $d\hat{W}_{\Delta t}(t)\,dt=o(dt)$.
In contrast, the Stratonovich integral does not satisfy the non-anticipation relation in general,
$$\left\langle\int_0^\tau d\hat{W}(t)\circ f(t,\hat{W}(t))\right\rangle\neq0, \tag{4.40}$$
and the Itô product does not satisfy the ordinary chain rule:
$$df(t,\hat{W}(t))\neq\frac{\partial f}{\partial t}\,dt+\frac{\partial f}{\partial\hat{W}}\cdot d\hat{W}(t). \tag{4.41}$$
The extension of the chain rule to Itô products is called Itô's lemma, which is presented in the next subsection. A peculiarity of stochastic integrals and stochastic differential equations is that the noise term $\hat{\xi}_{\Delta t}$ is regarded as being of order $O(\sqrt{\Delta t})$.10 Owing to this, squares of small quantities usually make non-negligible contributions of order $O(\Delta t)$, which leads to second-order correction terms in the analysis of stochastic integrals and stochastic differential equations.
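The chain rule (4.37) and its failure for the Itô product (4.41) can be seen with $f(\hat{W})=\hat{W}^3$ along one simulated path. In the Python sketch below (names and seed are illustrative), the Stratonovich sum of $3\hat{W}^2\circ d\hat{W}$ reproduces $\hat{W}(\tau)^3$, whereas the Itô sum of $3\hat{W}^2\cdot d\hat{W}$ misses the term $\int 3\hat{W}\,dt=\int\frac12 f''(\hat{W})\,dt$; adding that term back restores agreement, which anticipates Itô's lemma in the next subsection.

```python
import math
import random


def cubic_chain_rule(tau=1.0, n_steps=100_000, seed=6):
    """Along ONE path with W(0) = 0, return W(tau)^3 together with the
    Stratonovich sum of 3W^2 o dW, the Itô sum of 3W^2 . dW, and the
    Itô correction int 3W dt = int (1/2) f''(W) dt for f(W) = W^3."""
    rng = random.Random(seed)
    dt = tau / n_steps
    w = 0.0
    strat = ito = correction = 0.0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, math.sqrt(dt))
        strat += 3.0 * (w + 0.5 * xi) ** 2 * xi   # midpoint evaluation
        ito += 3.0 * w ** 2 * xi                  # pre-point evaluation
        correction += 3.0 * w * dt                # (1/2) f''(W) dt
        w += xi
    return w ** 3, strat, ito, correction


w3, strat, ito, correction = cubic_chain_rule()
print(w3 - strat)                  # ~0: ordinary chain rule, Eq. (4.37)
print(w3 - (ito + correction))     # ~0: Itô sum plus the correction term
print(w3 - ito)                    # generally O(1): close to the correction
```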
4.1.2.3 Stochastic Differential Equation
Using these two rules, we can define two types of stochastic differential equations, the Itô type and the Stratonovich type:
Definition: Stochastic differential equation
The stochastic differential equations of the Itô type and the Stratonovich type are respectively defined as
$$\frac{d\hat{x}}{dt} = a(\hat{x}(t),t)+b(\hat{x}(t),t)\cdot\hat{\xi}(t), \tag{4.42}$$
$$\frac{d\hat{x}}{dt} = a(\hat{x}(t),t)+b(\hat{x}(t),t)\circ\hat{\xi}(t), \tag{4.43}$$
where $a(x,t)$ and $b(x,t)$ are given functions corresponding to the mobility and the diffusion coefficient.
10 The noise term $\hat{\xi}_{\Delta t}$ is understood as the time integral of the white noise $\hat{\xi}(t)$ over a time interval $\Delta t$. This point is important in dimensional analysis: the noise term $\hat{\xi}(t)$ in stochastic differential equations has the dimension of the inverse square root of time (i.e., $[\text{time}]^{-1/2}$).
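Here, Eqs. (4.42) and (4.43) are formal expressions of the following stochastic equations:
$$\Delta\hat{x}_{n+1} = a(\hat{x}_n,t_n)\,\Delta t+b(\hat{x}_n,t_n)\,\hat{\xi}_{\Delta t}(t_n), \tag{4.44}$$
$$\Delta\hat{x}_{n+1} = a(\hat{x}_n,t_n)\,\Delta t+b\left(\frac{\hat{x}_n+\hat{x}_{n+1}}{2},t_n\right)\hat{\xi}_{\Delta t}(t_n), \tag{4.45}$$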
where we considered a corresponding discrete-time stochastic process with interval $\Delta t$ and defined the $n$-th time step as $t_n:=n\Delta t$. We also defined the state at the $n$-th step as $\hat{x}_n:=\hat{x}(t_n)$ and its displacement as $\Delta\hat{x}_{n+1}:=\hat{x}_{n+1}-\hat{x}_n$. We remark that the definition of the Stratonovich type is recursive, since $\hat{x}_{n+1}$ appears on both sides. A noise is called additive if $b(\hat{x}(t),t)$ is independent of $\hat{x}(t)$, and multiplicative otherwise.11 In the case of additive noise, the stochastic dynamics is independent of the rule of product.
We first see how a Stratonovich type stochastic differential equation is transformed into an Itô type one. Setting $f(t,\hat{W}(t))$ in Eq. (4.35) as $b(\hat{x}(t),t)$,12 we obtain13
$$\begin{aligned}
d\hat{W}(t)\circ b(\hat{x}(t),t) &= d\hat{W}(t)\cdot b(\hat{x}(t),t)+\frac12\,dt\,\frac{\partial\hat{x}(t)}{\partial\hat{W}(t)}\frac{\partial b(\hat{x}(t),t)}{\partial\hat{x}(t)}\\
&= d\hat{W}(t)\cdot b(\hat{x}(t),t)+\frac12\,dt\,b(\hat{x}(t),t)\frac{\partial b(\hat{x}(t),t)}{\partial\hat{x}(t)},
\end{aligned}\tag{4.46}$$
which implies that the following two stochastic differential equations are equivalent:
$$\frac{d\hat{x}}{dt} = a(\hat{x}(t),t)+\frac12\,b(\hat{x}(t),t)\frac{\partial b(\hat{x}(t),t)}{\partial\hat{x}(t)}+b(\hat{x}(t),t)\cdot\hat{\xi}(t), \tag{4.47}$$
$$\frac{d\hat{x}}{dt} = a(\hat{x}(t),t)+b(\hat{x}(t),t)\circ\hat{\xi}(t). \tag{4.48}$$
More generally, denoting the rule of product with parameter $\alpha$ by $\times_\alpha$, the following two stochastic differential equations are equivalent:
11 In the case of multiplicative noise, the noise is multiplied by a quantity ($b(\hat{x}(t),t)$) that depends on the same noise; this is why such noise is called multiplicative. In contrast, in the case of additive noise, the noise term $b(t)\hat{\xi}(t)$ is simply added to the term $a(\hat{x}(t),t)$ in the stochastic differential equation.
12 Here, $\hat{x}(t)=\int dt'\,a(\hat{x}(t'),t')+\int d\hat{W}(t')\circ b(\hat{x}(t'),t')$ is a function of $\hat{\xi}_{\Delta t}(t_0),\ldots,\hat{\xi}_{\Delta t}(t_n)$.
13 In the second term of this relation, $b(\hat{x}(t),t)\,\partial b(\hat{x}(t),t)/\partial\hat{x}(t)$, we need not worry about the rule of product, because this quantity is multiplied by the small quantity $dt$, and thus the difference caused by the rule of product contributes only at higher order.
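$$\frac{d\hat{x}}{dt} = a(\hat{x}(t),t)+\alpha\,b(\hat{x}(t),t)\frac{\partial b(\hat{x}(t),t)}{\partial\hat{x}(t)}+b(\hat{x}(t),t)\cdot\hat{\xi}(t), \tag{4.49}$$
$$\frac{d\hat{x}}{dt} = a(\hat{x}(t),t)+b(\hat{x}(t),t)\times_\alpha\hat{\xi}(t). \tag{4.50}$$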
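The equivalence of Eqs. (4.47) and (4.48) can be tested numerically. As an illustrative choice (not from the text), take $a=0$ and $b(x)=\sigma x$, for which the Stratonovich equation has the exact solution $x_0e^{\sigma\hat{W}(t)}$ by the ordinary chain rule. The Python sketch below (hypothetical function name and parameters) integrates the implicit Stratonovich update (4.45) by fixed-point iteration and the Itô update (4.44) with the extra drift $\frac12 b\,\partial b/\partial x=\frac12\sigma^2x$, both driven by the same noise; all three results should nearly coincide.

```python
import math
import random


def integrate_both(x0=1.0, sigma=0.5, tau=1.0, n_steps=20_000, n_iter=5, seed=7):
    """Integrate dx/dt = b(x) o xi with b(x) = sigma * x (and a = 0) in two ways
    driven by the SAME noise:
      * the Stratonovich scheme Eq. (4.45), whose implicit update is solved by
        fixed-point iteration started from an explicit Euler guess;
      * the Itô scheme Eq. (4.44) applied to the equivalent Eq. (4.47), whose
        extra drift is (1/2) b db/dx = (1/2) sigma^2 x.
    Returns (stratonovich, ito_equivalent, exact), where exact = x0 exp(sigma W).
    """
    def b(x):
        return sigma * x

    rng = random.Random(seed)
    dt = tau / n_steps
    x_s = x_i = x0
    w = 0.0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, math.sqrt(dt))
        x_new = x_s + b(x_s) * xi                       # explicit predictor
        for _ in range(n_iter):
            x_new = x_s + b(0.5 * (x_s + x_new)) * xi   # midpoint rule, Eq. (4.45)
        x_s = x_new
        x_i += 0.5 * sigma ** 2 * x_i * dt + b(x_i) * xi  # Eq. (4.44) for Eq. (4.47)
        w += xi
    return x_s, x_i, x0 * math.exp(sigma * w)


x_strat, x_ito, x_exact = integrate_both()
print(x_strat, x_ito, x_exact)   # three nearly equal numbers
```

Fixed-point iteration is just one simple way to handle the recursion in Eq. (4.45); each iteration contracts the error by a factor of about $\sigma|\hat{\xi}_{\Delta t}|/2$, so a few iterations suffice for small $\Delta t$.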
We next consider the differential of a function of $\hat{x}(t)$. In a manner similar to the derivation of the chain rule Eq. (4.37), the differential $df(\hat{x})$ of a function $f(\hat{x})$ is expressed as follows.
Theorem: Itô's lemma
Consider a stochastic differential equation
$$\frac{d\hat{x}(t)}{dt} = a(\hat{x}(t),t)+b(\hat{x}(t),t)\times_\alpha\hat{\xi}(t) \tag{4.51}$$
whose rule of product (i.e., $\alpha$) is arbitrary. Then any twice-differentiable function $f$ satisfies
$$df(\hat{x}(t)) = f'(\hat{x}(t))\cdot d\hat{x}(t)+\frac12\,b(\hat{x}(t),t)^2 f''(\hat{x}(t))\,dt \tag{4.52}$$
$$= f'(\hat{x}(t))\circ d\hat{x}(t), \tag{4.53}$$
where we defined $d\hat{x}(t):=a(\hat{x}(t),t)\,dt+b(\hat{x}(t),t)\times_\alpha d\hat{W}(t)$ with the same rule of product.14 The first line is called Itô's lemma. We emphasize that the above relations hold regardless of the rule of product in the stochastic differential equation Eq. (4.51). For example, if the stochastic differential equation Eq. (4.51) follows the Stratonovich rule (i.e., $\alpha=1/2$), Itô's lemma Eq. (4.52) for the Itô product still holds. We also remark that even if the noise is additive (i.e., $b(\hat{x}(t),t)$ is independent of $\hat{x}(t)$), the second term on the right-hand side of Eq. (4.52) does not vanish. This means that the conventional chain rule does not hold even for additive noise unless we employ the Stratonovich product.
Proof We write the relation as a difference equation, where $df(\hat{x}(t))$ corresponds to $f(\hat{x}_{n+1})-f(\hat{x}_n)$. The Taylor expansion of $f(\hat{x}_{n+1})$ then leads to Itô's lemma:
$$\begin{aligned}
f(\hat{x}_{n+1})-f(\hat{x}_n) &= f'(\hat{x}_n)\,\Delta\hat{x}_{n+1}+\frac12 f''(\hat{x}_n)\,\Delta\hat{x}_{n+1}^2+o(\Delta t)\\
&= f'(\hat{x}_n)\,\Delta\hat{x}_{n+1}+\frac12 f''(\hat{x}_n)\,b(\hat{x}(t_n),t_n)^2\,\Delta t+o(\Delta t),
\end{aligned}\tag{4.54}$$
where we used
$$\Delta\hat{x}_{n+1}^2 = b(\hat{x}(t_n),t_n)^2\,\Delta t+o(\Delta t). \tag{4.55}$$
14 To be precise, $d\hat{x}(t)$ is defined as $\hat{x}_{n+1}-\hat{x}_n$.
To obtain Eq. (4.53), we use the transformation rule Eq. (4.35).
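Itô's lemma yields moment equations. For the additive-noise example $a(x)=-\gamma x$, $b=\sigma$ (an illustrative choice, not from the text), averaging Eq. (4.52) with $f(x)=x^2$ and using non-anticipation gives $d\langle\hat{x}^2\rangle/dt=-2\gamma\langle\hat{x}^2\rangle+\sigma^2$, so that $\langle\hat{x}^2\rangle$ approaches $\sigma^2/(2\gamma)$ from $\hat{x}(0)=0$. The constant $\sigma^2$ comes entirely from the second term of Eq. (4.52), which survives even though the noise is additive. The Python sketch below compares this prediction with a simulation based on the discrete update (4.44); names, seed, and step sizes are assumptions.

```python
import math
import random


def ou_second_moment(gamma=1.0, sigma=1.0, t_end=4.0, dt=0.02, n_paths=3000, seed=8):
    """Discrete update Eq. (4.44) for dx/dt = -gamma*x + sigma*xi with x(0) = 0;
    returns the sample mean of x(t_end)^2."""
    rng = random.Random(seed)
    n_steps = round(t_end / dt)
    acc = 0.0
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += -gamma * x * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        acc += x * x
    return acc / n_paths


def ito_lemma_prediction(gamma=1.0, sigma=1.0, t=4.0):
    """Solution of d<x^2>/dt = -2*gamma*<x^2> + sigma^2 with <x^2>(0) = 0,
    obtained by averaging Itô's lemma (4.52) with f(x) = x^2."""
    return sigma ** 2 / (2 * gamma) * (1 - math.exp(-2 * gamma * t))


m2 = ou_second_moment()
pred = ito_lemma_prediction()
print(m2, pred)   # both close to 0.5
```

The simulated value carries a small bias of order $\Delta t$ from the time discretization (about 1% for these parameters) in addition to sampling noise.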
4.1.2.4 Fokker-Planck Equation
We have examined the dynamics of a stochastic variable $\hat{x}$ in detail. We now proceed to the time evolution of the probability distribution $P(x,t)$. The probability distribution of stochastic variables follows a deterministic differential equation called the Fokker-Planck equation.
Theorem: Fokker-Planck equation (Itô type)
Consider a system described by the Itô type stochastic differential equation Eq. (4.42). Then the probability distribution $P(x,t)$ of this system evolves as
$$\frac{\partial P(x,t)}{\partial t} = \left[-\frac{\partial}{\partial x}a(x,t)+\frac12\frac{\partial^2}{\partial x^2}b(x,t)^2\right]P(x,t). \tag{4.56}$$
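Note that the differential operator $\partial/\partial x$ acts not only on $a(x,t)$ and $b(x,t)$ but also on $P(x,t)$.
Proof Let $f(x)$ be a twice-differentiable function and $\langle f(\hat{x})\rangle_t$ be its expectation value with respect to the probability distribution $P(x,t)$. Then the ensemble average of $f(\hat{x})$ at time $t+\Delta t$ with small $\Delta t$ is calculated as
$$\begin{aligned}
\langle f(\hat{x})\rangle_{t+\Delta t} &= \langle f(\hat{x}+\Delta\hat{x})\rangle_t\\
&= \left\langle f(\hat{x})+\frac{df(\hat{x})}{dx}\Delta\hat{x}+\frac12\frac{d^2f(\hat{x})}{dx^2}\Delta\hat{x}^2+o(\Delta\hat{x}^2)\right\rangle_t\\
&= \int dx\left[f(x)+\left(\frac{df(x)}{dx}a(x,t)+\frac12\frac{d^2f(x)}{dx^2}b(x,t)^2\right)\Delta t\right]P(x,t)+o(\Delta t)\\
&= \langle f(\hat{x})\rangle_t+\int dx\,f(x)\left[-\frac{d}{dx}\big(a(x,t)P(x,t)\big)+\frac12\frac{d^2}{dx^2}\big(b(x,t)^2P(x,t)\big)\right]\Delta t+o(\Delta t),
\end{aligned}\tag{4.57}$$
where the last equality follows from integration by parts. Inserting
$$\langle f(\hat{x})\rangle_{t+\Delta t}-\langle f(\hat{x})\rangle_t = \int dx\,f(x)\,\frac{\partial P(x,t)}{\partial t}\,\Delta t+o(\Delta t) \tag{4.58}$$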
into the above relation, we arrive at
$$\int dx\,f(x)\left[\frac{\partial P(x,t)}{\partial t}+\frac{d}{dx}\big(a(x,t)P(x,t)\big)-\frac12\frac{d^2}{dx^2}\big(b(x,t)^2P(x,t)\big)\right]\Delta t = 0 \tag{4.59}$$
up to $O(\Delta t)$. Since this relation holds for any function $f(x)$, the bracket multiplying $f(x)$ must vanish, which is Eq. (4.56).
Since Eqs. (4.47) and (4.48) are equivalent, we can readily transform Eq. (4.56) into the Fokker-Planck equation for a Stratonovich type stochastic differential equation.15
Theorem: Fokker-Planck equation (Stratonovich type)
Consider a system described by the Stratonovich type stochastic differential equation Eq. (4.43). Then the probability distribution evolves as
$$\frac{\partial P(x,t)}{\partial t} = \left[-\frac{\partial}{\partial x}a(x,t)+\frac12\frac{\partial}{\partial x}b(x,t)\frac{\partial}{\partial x}b(x,t)\right]P(x,t). \tag{4.60}$$
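Equations (4.56) and (4.60) predict different stationary states for the same $a$ and $b$ when the noise is multiplicative. For the illustrative choice $a(x)=-x$, $b(x)=\sqrt{1+x^2}$ (not from the text), setting the probability current to zero gives $P\propto(1+x^2)^{-2}$ for the Itô equation and $P\propto(1+x^2)^{-3/2}$ for the Stratonovich equation. The Python sketch below checks with nested central differences that each density makes the corresponding right-hand side vanish, and that the Stratonovich density is not stationary under Eq. (4.56). All function names are hypothetical.

```python
import math


def a(x):
    """Example drift (an assumption for illustration): a(x) = -x."""
    return -x


def b(x):
    """Example multiplicative noise amplitude (assumption): b(x) = sqrt(1 + x^2)."""
    return math.sqrt(1.0 + x * x)


def deriv(g, x, h=1e-3):
    """Central finite-difference approximation of dg/dx at x."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


def rhs_ito(p, x):
    """Right-hand side of Eq. (4.56): -(a p)' + (1/2) (b^2 p)''."""
    def flux(y):
        return a(y) * p(y) - 0.5 * deriv(lambda z: b(z) ** 2 * p(z), y)
    return -deriv(flux, x)


def rhs_strat(p, x):
    """Right-hand side of Eq. (4.60): -(a p)' + (1/2) (b (b p)')'."""
    def flux(y):
        return a(y) * p(y) - 0.5 * b(y) * deriv(lambda z: b(z) * p(z), y)
    return -deriv(flux, x)


def p_ito(x):
    """Unnormalized zero-current stationary solution of Eq. (4.56) for this a, b."""
    return (1.0 + x * x) ** -2.0


def p_strat(x):
    """Unnormalized zero-current stationary solution of Eq. (4.60) for this a, b."""
    return (1.0 + x * x) ** -1.5


for x in (-1.0, 0.3, 2.0):
    # First two values are ~0; the last shows p_strat is NOT stationary for (4.56).
    print(x, rhs_ito(p_ito, x), rhs_strat(p_strat, x), rhs_ito(p_strat, x))
```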
4.1.3 Differential Chapman-Kolmogorov Equation
So far we have restricted our attention to a specific form of Markov processes: stochastic differential equations of the form Eq. (4.42).16 In this subsection, we investigate the most general form of equation that covers all possible Markovian stochastic processes. As a result, a Markovian stochastic process is shown to be a composition of a process driven by the white Gaussian noise $\hat{\xi}(t)$ and a process driven by Poisson noise. Poisson noise supplies delta-function-type impulses (kicks) according to a Poisson process, which induce discrete displacements of the stochastic variables. If we require stochastic paths to be continuous, the stochastic differential equation Eq. (4.42) in fact covers all possible Markov dynamics. In other words, restricting ourselves to stochastic differential equations of the form Eq. (4.42) does not overlook any continuous Markov process.
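The composition of Gaussian and Poisson noise described above can be sketched numerically. The Python example below rests on assumptions that are not in the text: unit white-noise amplitude, kicks of fixed size $h$ at rate $\lambda$, and a Bernoulli approximation of the Poisson process with kick probability $\lambda\Delta t$ per step (accurate to first order in $\Delta t$). The endpoint of such a jump-diffusion path should have mean $\lambda h\tau$ and variance close to $\tau+\lambda h^2\tau$, the first term coming from the continuous part and the second from the jumps. Function names are hypothetical.

```python
import math
import random


def jump_diffusion_endpoint(rng, tau=1.0, n_steps=100, rate=2.0, kick=0.5):
    """One path of dx = dW + (kicks of size `kick` at rate `rate`), x(0) = 0.
    In each step of length dt a kick occurs with probability rate*dt,
    which approximates the Poisson process to first order in dt."""
    dt = tau / n_steps
    x = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, math.sqrt(dt))   # continuous (Wiener) part
        if rng.random() < rate * dt:          # discontinuous (Poisson) part
            x += kick
    return x


def sample_moments(n_paths=4000, seed=9, **kw):
    """Sample mean and (unbiased) sample variance of the endpoint x(tau)."""
    rng = random.Random(seed)
    xs = [jump_diffusion_endpoint(rng, **kw) for _ in range(n_paths)]
    mean = sum(xs) / n_paths
    var = sum((v - mean) ** 2 for v in xs) / (n_paths - 1)
    return mean, var


# Expected: mean = rate*kick*tau = 1.0, variance close to tau + rate*kick^2*tau = 1.5
mean, var = sample_moments()
print(mean, var)
```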
15 We remark that this expression holds only in the case of a single variable. In the case of multiple variables, the Fokker-Planck equation for a Stratonovich type stochastic differential equation is much more complicated.
16 A Stratonovich type stochastic differential equation Eq. (4.43) can be transformed into an Itô type one.
Theorem: Differential Chapman-Kolmogorov equation
We assume that the following three limits exist for any $\varepsilon>0$:
1. $$\lim_{\Delta t\to0}\frac{P(x,t+\Delta t|z,t)}{\Delta t} = W(x|z,t) \tag{4.61}$$ for $|x-z|\ge\varepsilon$.
2. $$\lim_{\Delta t\to0}\frac{1}{\Delta t}\int_{|x-z|<\varepsilon}dx\,(x-z)\,P(x,t+\Delta t|z,t) = A(z,t)+O(\varepsilon). \tag{4.62}$$
is satisfied.
It is noteworthy that higher-order moments of $x-z$ vanish in the $\varepsilon\to0$ limit. In fact, the $k$-th moment ($k\ge3$) is calculated as
$$\lim_{\Delta t\to0}\frac{1}{\Delta t}\int_{|x-z|<\varepsilon}dx\,(x-z)^k\,P(x,t+\Delta t|z,t)$$