Typicality Reasoning in Probability, Physics, and Metaphysics
Table of contents :
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
1.1 Typicality
1.2 Typicality Explanations
1.3 The Boltzmannian Framework
1.4 Brute Facts
References
Part I Probability
2 Typicality in Probability Theory
2.1 Expectation Value and Typical Values
2.2 Law of Large Numbers
2.2.1 The √N Law
2.2.2 The Central Limit Theorem
2.3 Subjective Probabilities and Propensities
2.3.1 Subjective Probabilities
2.3.2 Stochastic Laws
References
3 Cournot's Principle
3.1 Formulations of Cournot's Principle
3.2 On the Rationality of Cournot's Principle
3.2.1 Moral Certainty
3.2.2 Does Nature Have to Obey Cournot's Principle?
3.2.3 Black Swans and Pascal's Wager
3.2.4 The Lottery Paradox and Rational Belief
CP and the Stability Theory of Belief
References
4 A Typicality Theory of Probability
4.1 The Coin Toss
4.1.1 Normal Numbers as a Model for Coin Tossing
Law of Large Numbers for Rademacher Functions
Biased Coins
4.2 Typical Frequencies
4.3 Probabilities for Singular Events
4.3.1 Rational Credences from Statistics
References
5 The Mentaculus: Typicality Versus Humean Chances
5.1 Typicality Versus Humean Chances
5.1.1 A True Regularity Theory of Chance
Principal Principle Versus Cournot's Principle
Probability Versus Typicality Measures
5.2 Epistemology and Metaphysics of Typicality Measures
5.3 Justification of Typicality Measures
5.3.1 Stationarity, Uniformity, Symmetry
A Geometric View of Stationarity
Invariance Under Symmetries
5.3.2 How to Choose a Typicality Measure
5.4 Typicality for Humeans
References
6 The Structure of Typicality
6.1 A Theory of ``Small'' and ``Big'' Sets
6.2 Criteria for Typicality
6.3 Typicality Measures
6.3.1 Equivalence of Typicality Measures
Absolute Continuity
Total Variation and Typicality Thresholds
A Bound from Densities
References
Part II Physics
7 From the Universe to Subsystems
7.1 The Hamiltonian Picture
7.2 Probabilities in Classical Mechanics
7.2.1 Ideal Gas: The Maxwell Distribution
7.2.2 The Coin Toss Again
7.3 Deterministic Subsystems
7.3.1 The Stone Throw
References
8 Boltzmann's Statistical Mechanics
8.1 The Second Law of Thermodynamics
8.1.1 The Typicality Account
8.1.2 Macroscopic Irreversibility
Past Hypothesis and the Thermodynamic Arrow
8.1.3 The Role of the Typicality Measure
8.1.4 On the Boltzmann Entropy
8.2 The Status of Macroscopic Laws
8.2.1 Derivation of Typicality Laws
8.3 Boltzmann vs. Gibbs
8.3.1 Empirical Equivalence of Equilibrium Values
8.3.2 Derivation of the Equilibrium Ensembles
References
9 It's Complicated: The Relationship between Physics and Mathematics
9.1 The Pernicious Influence of Ergodic Theory
9.2 Proof and Explanation
References
10 Boltzmann Equation and the H-Theorem
10.1 Kinetic Equations
10.1.1 Molecular Chaos
10.2 The H-Theorem as a Typicality Result
10.2.1 The Stoßzahlansatz
10.2.2 Irreversibility of the Boltzmann Equation
References
11 Past Hypothesis and the Arrow of Time
11.1 The Easy and the Hard Problem of Irreversibility
11.1.1 The Status of the Past Hypothesis
11.2 Thermodynamic Arrow Without a Past Hypothesis
11.2.1 Past Hypothesis and Self-Location
Predictions and Retrodictions
The Mystery of Our Low-Entropy Universe
11.3 Entropy of a Classical Gravitating System
11.3.1 Typical Evolutions of a Gravitating System
11.4 Gravity and Typicality from a Relational Point of View
11.4.1 Shape Complexity and a Gravitational Arrow
11.4.2 Entropy as an Absolutist Concept
Conclusion: Can We Dispense with the Past Hypothesis?
References
12 Causality and the Arrow of Time
12.1 Causal Explanations as Typicality Explanations
12.2 Causal and Epistemic Asymmetry
12.2.1 Asymmetry of Records
12.2.2 Asymmetry of Influences
References
13 Quantum Mechanics
13.1 The Measurement Problem
13.1.1 Connecting the Wave Function to the World
13.2 Born's Rule and the Measurement Process
13.2.1 Typicality and Observation
13.3 Observable Operators as Statistical Book-Keepers
13.4 Quantum Equilibrium: Probabilities in Bohmian Mechanics
13.4.1 Bohmian Mechanics
The Typicality Measure
Effective Wave Functions for Subsystems
Quantum Equilibrium
13.4.2 Absolute Uncertainty
Why Determinism?
13.4.3 Thermodynamic Arrow in Bohmian Mechanics
13.5 Born's Rule in the Many-Worlds Theory
13.5.1 Probabilities of What?
13.5.2 Everett's Typicality Argument
13.5.3 Living and Dying in the Multiverse
References
Part III Beyond Physics
14 Other Applications of Typicality
14.1 Typicality and Well-Posedness
14.2 Typicality and Fine-Tuning
14.2.1 The Flatness Problem
14.2.2 Fine-Tuning of the Natural Constants
14.3 Typicality in Mathematics
References
15 Special Science Laws
15.1 Ontology of Special Sciences
15.1.1 Probability and Causation in Special Sciences
15.2 Special Science Laws as Typicality Laws
15.2.1 The Hierarchy of Sciences
15.3 Is Life Atypical?
References
16 Typicality and the Metaphysics of Laws
16.1 What Are the Laws of Nature?
16.2 Typicality in Metaphysics
16.2.1 Ontological Possibility
16.2.2 Typicality and the Case Against Humeanism
16.3 Typical Humean Worlds Have No Laws
16.3.1 The Chaitin Model
16.3.2 From the Toy Model to the Real World
Finite Systematizations
Indeterministic Laws
16.4 On the Uniformity of Nature
References
A Time-Reversal Invariance
Newtonian Mechanics
Electrodynamics
Quantum Mechanics
Bohmian Mechanics
On the Role of Metaphysics
B Proof of Theorems
B.1 Computation of the Gravitational Entropy
B.2 Atypicality of Lawfulness Among Possible Humean Worlds
References
Index

NEW DIRECTIONS IN THE PHILOSOPHY OF SCIENCE

Typicality Reasoning in Probability, Physics, and Metaphysics Dustin Lazarovici

New Directions in the Philosophy of Science

Series Editor Lydia Patton Department of Philosophy Virginia Tech Blacksburg, VA, USA

These are exciting times for scholars researching the philosophy of science. There are new insights into the role of mathematics in science to consider; illuminating comparisons with the philosophy of art to explore; and new links with the history of science to examine. The entire relationship between metaphysics and the philosophy of science is being re-examined and reconfigured. Since its launch in 2012, Palgrave’s New Directions in the Philosophy of Science series has, under the energetic editorship of Steven French, explored all these dimensions. Topics covered by books in the series during its first eight years have included interdisciplinary science, the metaphysics of properties, chaos theory, the social epistemology of research groups, scientific composition, quantum theory, the nature of biological species, naturalism in philosophy, epidemiology, stem cell biology, scientific models, natural kinds, and scientific realism. Now entering a new period of growth under the enthusiastic guidance of a new Editor, Professor Lydia Patton, this highly regarded series continues to offer the ideal home for philosophical work by both early career researchers and senior scholars on the nature of science which incorporates novel directions and fresh perspectives. The members of the editorial board of this series are: Holly Andersen, Philosophy, Simon Fraser University (Canada) Otavio Bueno, Philosophy, University of Miami (USA) Anjan Chakravartty, University of Notre Dame (USA) Steven French, Philosophy, University of Leeds (UK) series editor Roman Frigg, Philosophy, LSE (UK) James Ladyman, Philosophy, University of Bristol (UK) Michela Massimi, Science and Technology Studies, UCL (UK) Sandra Mitchell, History and Philosophy of Science, University of Pittsburgh (USA) Stathis Psillos, Philosophy and History of Science, University of Athens (Greece) For further information or to submit a proposal for consideration, please contact Brendan George on [email protected]

Dustin Lazarovici

Typicality Reasoning in Probability, Physics, and Metaphysics

Dustin Lazarovici Humanities and Arts Department Technion - Israel Institute of Technology Haifa, Israel

ISSN 2947-6828 ISSN 2947-6836 (electronic) New Directions in the Philosophy of Science ISBN 978-3-031-33447-4 ISBN 978-3-031-33448-1 (eBook) https://doi.org/10.1007/978-3-031-33448-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Cover illustration: Cover image by Nirel Matsil This Palgrave Macmillan imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.

To Detlef Dürr (1951–2021)

Acknowledgments

This book is, in many ways, my take on and elaboration of insights I owe to people who I am lucky to call my teachers. First and foremost, my mentor and friend Detlef Dürr. The ideas developed in this book have typically been inspired by him, either directly or indirectly. Sheldon Goldstein is the person who has done most to introduce the modern concept of typicality. Tim Maudlin and Nino Zanghì played important roles in deepening my understanding of it. Glenn Shafer has been my primary source on the history of the closely related Cournot principle. I am just as lucky to have outstanding scholars like David Z Albert and Barry Loewer engaging with me from an opposing side of many debates, constantly pushing me to sharpen my position and arguments. I am very grateful to Michael Esfeld for making my transition to philosophy possible through many years of support and everything I have learned from him. Among the many other teachers, colleagues, collaborators, and friends who have influenced my work throughout the years, I want to thank the following in particular: Jeff Barrett, Julian Barbour, Christian Beck, Jean Bricmont, Eddy Keming Chen, Dirk-André Deckert, Uri Eran, Mario Hubert, Ohad Nachtomy, Peter Pickl, Paula Reichert, Christian Sachse, Stefan Teufel, Isaac Wilhelm, and Gerhard Winkler (2014).


This book grew out of research I undertook at the Université de Lausanne, and I would like to thank Michael Esfeld, Barry Loewer, and Sheldon Goldstein for their advice at that time. A very inspiring and productive stage of the doctorate was a research stay at Columbia University, kindly hosted by David Z Albert. For this, I gratefully acknowledge funding by the Swiss National Science Foundation (SNSF) Doc.Mobility Fellowship P1LAP1_184150.

Some sections are based on previous publications. Material from publications with co-authors was used with their consent.

• Lazarovici, D. (2023). Typicality versus Humean probabilities as the foundation of statistical mechanics. In B. Loewer, B. Weslake, & E. Winsberg (Eds.), The probability map of the universe: Essays on David Albert’s Time and Chance. Harvard University Press.
• Dürr, D., & Lazarovici, D. (2020). Understanding quantum mechanics: The world according to modern quantum foundations. Chapters 3 & 6. Springer International Publishing. ISBN 978-3-030-40067-5.
• Lazarovici, D., & Reichert, P. (2020). Arrow(s) of time without a past hypothesis. In V. Allori (Ed.), Statistical mechanics and scientific explanation: Determinism, indeterminism and laws of nature. World Scientific. ISBN 978-981-121-171-3.
• Lazarovici, D. (2019). On Boltzmann versus Gibbs and the equilibrium in statistical mechanics. Philosophy of Science, 86(4), 785–793. https://doi.org/10.1086/704983.
• Oldofredi, A., Lazarovici, D., Deckert, D.-A., & Esfeld, M. (2016). From the universe to subsystems: Why quantum mechanics appears more stochastic than classical mechanics. Fluctuation and Noise Letters, 15(03), 1640002. https://doi.org/10.1142/S0219477516400022.
• Lazarovici, D., & Reichert, P. (2015). Typicality, irreversibility and the status of macroscopic laws. Erkenntnis, 80(4), 689–716.

Many thanks to Stephen Lyle for the excellent copy editing. Any remaining typos or misspellings are mine alone. Special thanks go to Sir Roger Penrose, who gave me permission to reproduce his wonderful drawing included as Fig. 1.1 in this book.


List of Figures

Fig. 1.1 God picking the initial conditions of the universe with a fine needle. Drawing by Sir Roger Penrose (1989, p. 343, Fig. 7.19). ©Oxford University Press. Reproduced with permission of the Licensor through PLSclear. All rights reserved
Fig. 2.1 Bernoulli distribution with p = 1/2 and N = 40 (left), N = 400 (right)
Fig. 2.2 Branching structure of possible histories after four coin tosses. The weights of the branches are probabilities assigned by a stochastic law
Fig. 4.1 Area under the graphs of the Rademacher functions rk for k = 1, 2, 3. The pre-images are half-open intervals
Fig. 5.1 Sketch of the solution space and its parameterization by time slices. The trajectory X is evaluated at times s and t, respectively. The flow Φt,s yields the corresponding transition map
Fig. 7.1 The coin toss outcome χi at time t is determined by microscopic initial conditions X in the macrostate M0 and deterministic dynamics represented by the flow Φt,0. Thus, χi(X) is obtained by evolving X and coarse-graining X(t) to the macro-event heads or tails
Fig. 8.1 Partition of phase space into macro-regions. Size differences are much more extreme than depicted
Fig. 8.2 Thermodynamic evolution of an expanding gas
Fig. 9.1 Typical entropy curves of macroscopic systems on thermodynamic time scales (left) and ergodic time scales (right). On the right, periods of maximal entropy are vastly longer than depicted
Fig. 11.1 Typical entropy curve for a Carroll universe. Arrows indicate the arrows of time on both sides of the global entropy minimum (Janus point)
Fig. 11.2 Self-location hypothesis in the fluctuation scenario (upper image) and Big Bang scenario (lower image) with bounded entropy. In the upper image, time scales are much longer than below, and periods of equilibrium much longer than depicted
Fig. 11.3 Top: evolution of the shape complexity CS found by numerical simulation for N = 1000 and Gaussian initial data. Bottom: schematic depiction (not found by numerical simulation) of three corresponding configurations on Newtonian spacetime. Source: Barbour et al. (2015)
Fig. 12.1 Typical microstates in the intermediate macro-region M1 = Mact evolve into a higher-entropy region M2 in both time directions. Only a small subset of microstates (light gray area) have evolved from the lower-entropy state M0 in the past; an equally small subset (shaded area) will evolve into M0 in the future. The actual microstate (cross) has evolved from the lower-entropy state in the past; only its future time evolution corresponds to the typical one relative to the macrostate Mact. For simplicity, the diagram assumes that macrostates are invariant under the time-reversal transformation ((q, p) → (q, −p) in classical mechanics)
Fig. 13.1 Branching Many-Worlds histories after three spin measurements. Successive arrows indicate successive outcomes. Adapted from Barrett (2016)

List of Tables

Table 4.1 Absolute and relative number of 0–1-sequences of length n = 1000 with k zeros. The given values are approximate. Note that the binomial coefficient (n choose k) is symmetric about n/2
Table 13.1 Comparison: Boltzmann distribution as the typical distribution of classical subsystems in thermal equilibrium (left column) and Born distribution as the typical distribution of Bohmian subsystems in quantum equilibrium (right column)

1 Introduction

Consider the following regularities that we observe in the world:

A. Apples do not spontaneously jump up from the ground onto the tree.
B. Rocks thrown on Earth fly along (roughly) parabolic trajectories.
C. The relative frequency of heads in a long series of fair coin tosses comes out (approximately) 1/2.

These regularities are all of a different kind. C is a statistical regularity. B a mechanical phenomenon. A turns out to be an instance of the second law of thermodynamics. All three regularities strike us as law-like. Arguably, they are even among the more basic experiences founding our belief in a lawful cosmos. And yet, none of them is nomologically necessary under the kind of fundamental microscopic laws that we take to hold in our universe. Indeed, given the huge number of microscopic degrees of freedom and the chaotic nature of their dynamics (small variations in the initial conditions can lead to vastly different evolutions), the microscopic laws put very few constraints on what is physically possible on macroscopic scales. It is possible for particles in the ground to move in such a coordinated way as to push an apple up in the air (we know that because the time-reversed process is common and the microscopic laws are time-reversal invariant). It is possible for a balanced coin to land on heads every single time it is tossed. And it is possible, as Albert (2015, p. 1) so vividly points out, for a flying rock to “suddenly [eject] one of its trillions of elementary particulate constituents at enormous speed and careen off in an altogether different direction, or (for that matter) spontaneously disassemble itself into statuettes of the British royal family, or (come to think of it) recite the Gettysburg Address.”

Assuming deterministic laws, a physical event is nomologically possible if and only if there exist microscopic initial conditions—in the last resort, for the universe as a whole—that evolve under the dynamics in such a way as to realize the event. (I refer to “initial conditions” for simplicity, but one need not commit to a distinguished moment in or even the direction of time.) A necessary condition for the laws to be true is that they admit one solution that describes the actual micro-history of our universe. However, finding this exact solution to account for observed phenomena is neither feasible nor satisfying. Given our limited epistemic access to the microscopic state of the universe—or any complex system, for that matter—as well as the computational challenges involved in solving the equations of motion, we need a different inferential procedure from the microscopic laws to salient macroscopic regularities.

But even if we did know the exact initial conditions and could predict the entire history of the universe deterministically, it would be utterly bewildering if law-like regularities such as the ones stated above turned out to be merely accidental, contingent on a very particular, fine-tuned microconfiguration of the universe. In other words, even if we were Laplacian demons and could verify that dynamical laws + initial conditions make (let’s say) the second law of thermodynamics true in our world, we should care for some additional fact or principle that makes it counterfactually robust and gives it more nomological authority.

The answer proposed in this book is a simple one. The regularities that we observe in the world do not hold for all initial conditions—i.e., in all nomologically possible worlds—but in the overwhelming majority of them. They are, as we shall say, typical.

1.1 Typicality

The basic definition of typicality is as follows.

Definition 1 Let Ω be a set—the domain or reference set of the typicality statements—and Π a set of properties that the elements of Ω could possess. A property P ∈ Π is typical within Ω if nearly all members of Ω instantiate P. The property is atypical within Ω if ¬P is typical, i.e., if nearly none of the members of Ω instantiate P.

Ω could be a set of possible worlds in which case the properties are propositions that are true or false at each.¹ In any case, every P ∈ Π can be identified with a subset of Ω (the extension of a predicate or the set of worlds at which a proposition is true) and is typical if that subset exhausts nearly all of Ω. An archetypical example of a typicality statement is: The property of being irrational is typical within the set of real numbers. Being rational is atypical in ℝ.²

One will often encounter statements of the form “a typical x ∈ Ω is P,” e.g., “a typical real number is irrational.” I will adopt this convenient way of speaking from time to time, but we must be aware that it involves an abuse of language. No element of a reference set Ω can be typical or atypical per se; it can only be typical or atypical with respect to a particular property (Maudlin, 2020). For instance, the real number √2 instantiates the typical property of being irrational but the atypical property of being algebraic. There is no such thing as a typical number or the set of typical numbers on which something could be predicated.

¹ I have no stake in the debate about whether propositions genuinely are properties of worlds, but the identification allows for a more unified presentation.
² There are various ways to express the atypicality of rational numbers in precise mathematical terms, the most obvious being that ℚ ⊂ ℝ is a countable subset of an uncountable set or that it is a set of Lebesgue measure zero.


“Typical,” just like the locution “nearly all,” is in the first instance vague and context dependent but can receive a more rigorous formalization within a particular theoretical context. The question then arises as to how we should quantify whether a subset of Ω contains “nearly all” elements. If Ω is a finite set, then by simple counting. In almost every context, all but ten out of a billion is nearly all. If Ω is a continuum—as is usually the case in physical applications—we usually employ a natural measure μ in the sense of mathematical measure theory:

    Typ(P) :⇔ μ({x ∈ Ω : P(x)}) ≈ μ(Ω).    (1.1)

What makes a measure “natural,” how to understand “≈,” and what other ways there are to formalize typicality are among the questions that will be addressed in this book. For now, the most helpful answer is that we do not need to worry too much. In general, all reasonable measures will agree on whether or not P is typical. The details of the measure are not important as its role is only to identify “very big” or “very small” subsets of Ω. Tim Maudlin (private communication) provides the following instructive analogy: The Sahara desert is nearly all sand. One may ask back: “Nearly all by what measure? In terms of surface area, or volume, or metric tons, or …?” But the question betrays that one did not get the point. The answer is all of them. Or any of them. By any reasonable standard, the Sahara is nearly all sand. What is typical is so overwhelming in numbers that it leaves little room for ambiguity.

Remark 1 I am using the locution nearly all for “all except for a set of very small measure,” in distinction to almost all, which is common mathematical terminology for “all except for a set of measure zero.” The statement almost all real numbers are irrational would thus be correct (with respect to the Lebesgue measure and, more generally, any non-discrete measure on ℝ), but this standard of typicality is too strong for most applications in physics and other real-world contexts.
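To make the counting criterion concrete, here is a minimal sketch (my own illustration, not an example from the book) for a finite reference set: Ω is the set of all 2²⁰ heads/tails sequences of length 20, encoded as integers, and the property under consideration is “the relative frequency of heads lies within 0.2 of 1/2.” Typicality is then decided by brute-force counting, exactly as in the finite case of Eq. (1.1); the sequence length and the window width are arbitrary illustrative choices.

```python
def fraction_with_property(omega, has_property):
    """Counting criterion of Definition 1 / Eq. (1.1) for a finite reference
    set: the share of elements of omega that instantiate the property."""
    return sum(1 for x in omega if has_property(x)) / len(omega)

n = 20
omega = range(2**n)  # each integer encodes one heads/tails sequence of length n
near_half = lambda s: abs(bin(s).count("1") / n - 0.5) <= 0.2

print(fraction_with_property(omega, near_half))
# ~0.96: by exhaustive counting, nearly all of the 2^20 sequences have a
# frequency of heads within 0.2 of 1/2; the complementary set is "very small"
# in precisely the sense relevant for typicality statements.
```

For longer sequences the dominance becomes far more extreme (cf. Table 4.1 for n = 1000), and for a continuum of initial conditions the same role is played by the measure μ in Eq. (1.1).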

It is important to distinguish two types of typicality statements. First, for a reference set that is an ensemble of actual entities or events: It is typical for ravens to be black. It is typical for lottery tickets to be losers. It is typical for calendar days not to be leap days (although today, at the time of writing, happens to be one). Such typicality facts are more familiar but of secondary interest to our further discussion. The typicality statements we will focus on instead have a decidedly modal character, with a reference set of possibilia. In particular, the most relevant typicality statements in physics refer, first and foremost, to what obtains in most nomologically possible worlds, not to what obtains most of the time in the actual one. Example: For nearly all possible initial micro-conditions, a gas concentrated in a small part of a sealed container will expand homogeneously over the accessible volume.

The intuition behind typicality is the same for both types of statements, but misunderstandings are possible when the relevant domain is left implicit. The statement “The 11:45 train from Lausanne to Geneva is typically on time” is arguably true when referring to what happens on most days but false when referring to what happens on a particular day in most possible worlds. (Although someone fond of both metaphysics and the Swiss Federal Railways might argue that punctuality is an essential property of Swiss trains.)

When applied to a reference class of possible worlds, typicality figures in a way of reasoning about contingency. If a fact about the world is contingent, it means that it could have been different. But not all contingent facts are equally surprising, or counterfactually robust, or deserving of an explanation. Some facts stand out in that they make our world extremely special. Some facts could have been different, but only if God—metaphorically speaking—had meticulously arranged things in the world to make it so (Fig. 1.1). In recent years, several papers have explored how such typicality facts can ground explanations, predictions, and rational belief, both in everyday life and in the context of fundamental physics and statistical mechanics.³ I will expand on this in detail in the course of this book.

3 For instance Dürr et al. (2017); Goldstein (2001, 2012); Hubert (2021); Maudlin (2007, 2020); Volchan (2007); Wilhelm (2022a).


Fig. 1.1 God picking the initial conditions of the universe with a fine needle. Drawing by Sir Roger Penrose (1989, p. 343, Fig. 7.19). ©Oxford University Press. Reproduced with permission of the Licensor through PLSclear. All rights reserved

Typicality in the modal sense is weaker than necessity and stronger than possibility. Cum grano salis, we can understand it as a modal operator Typ such that

    □p → Typ(p) → ♦p.    (1.2)

NB, □p → p, whereas Typ(p) does not logically imply that p actually obtains. Typicality explanations are thus not deductive-nomological explanations. They are instead based on the following rationality principle:⁴

Rationality Principle of Typicality: Suppose we accept a theory T and find that our world has some salient property P.

⁴ This formulation draws from valuable conversations with Tim Maudlin. For a slightly different formulation of the rationality principle, see Wilhelm (2022b).


If P is typical according to T, there is nothing left to explain. It is irrational to wonder further why our world is typical with respect to instantiating P.

If P is atypical according to T, we should look for additional explanation or, in the last resort, revise or reject the theory.

There is some debate among advocates of typicality about whether typicality facts are also predictive, that is, whether we should endorse a rationality principle like: If P is typical according to our theory T, we should expect P to obtain. I hold that typicality facts are predictive, but the difference between a phenomenon being predicted by T and its being conclusively explained if actually observed strikes me as slim, to begin with. The reason for the debate will be addressed in Sect. 1.4.

It is essential to understand the role that the physical theory T plays in a typicality explanation. This is first and foremost to determine the reference set Ω of nomological possibilities, i.e., possible microscopic histories, with respect to which certain (macroscopic or coarse-grained) properties come out as typical or atypical. We will later see that the theory with its dynamical laws also plays an important role in suggesting natural typicality measures. But this only strengthens the point that the explanatory work is done by the laws and the modal structure they determine—not by any specific choice of measure (a great many measures will agree on whether P is typical) and certainly not by any particular initial micro-conditions. Typicality thus figures in a way of reasoning about the laws, and it is the laws that ultimately ground (or fail to ground) explanations.

Typicality explanations are unifying and reductive as scientific explanations should be: A small set of relatively simple laws makes a large variety of phenomena typical. But it is not my goal to convince the reader that typicality explanations are a subspecies of a more familiar kind. On the contrary, I will make a case for typicality as a fundamental way of reasoning and argue, in particular, that both causal and probabilistic explanations are based on typicality.


1.2 Typicality Explanations

While I dismissed typicality statements about actual ensembles as less interesting, it pays off to think about the extent to which, e.g., the typicality of blackness among ravens explains the blackness of a particular raven, say, the raven sitting just above my chamber door. Why is this raven black? Because nearly all ravens are! does not leave us entirely satisfied, and it certainly makes sense to ask further why nearly all ravens are black. What does not make sense is to insist on an explanation specific to the raven sitting above my chamber door. Once we understand why blackness is typical among ravens—the generic explanation—there is no interesting story left to tell about this particular specimen. A white raven, in contrast, would prompt us to account for its deviation from the norm. It is like the song “Why is this night different from all other nights?” initiating the retelling of the Passover story to explain the traditions of the Seder dinner. On a typical night, there is just not much to sing about.

In physics, explanations end with the laws. The laws constrain the possibility space of the world. And if a feature of our world turns out to be typical with respect to the nomic possibilities, there is nothing left to explain. It is irrational to wonder further why our world is, in that particular respect, like nearly all possible worlds described by our theory. The laws of nature are the “generic explanation” across possible worlds, the answer to the question of why a certain phenomenon or regularity is typical. The specific account that is (in principle) provided by the microscopic history of our world is both unnecessarily detailed and entirely contingent on the exact microscopic boundary conditions of the universe—which are themselves unexplained and most likely unexplainable. Ultimately, even the awesome powers of Laplace’s demon would accomplish little more than verifying the truism that P because the initial conditions of the universe were such that P. Why does the second law of thermodynamics hold in our universe? Because it holds in nearly all possible worlds allowed by the fundamental laws! should leave us deeply satisfied. It is the best and most conclusive explanation we could hope for.

What is typical need not necessarily happen, and what is atypical is not impossible. But assuming our world to be a typical “model” of the laws is basically a necessity of thought. We do it routinely and all the time, if only implicitly. When we are confident not to suffocate even though all the air molecules might happen to assemble on the opposite side of the room, we assume typical behavior. When we infer the existence of a tree from our observation of the tree, we neglect the possibility of atypical fluctuations in the electromagnetic field creating an illusion. When we consider Newtonian mechanics falsified by quantum phenomena, it is not because Newtonian laws would make the interference patterns in the double-slit experiment or the violations of Bell’s inequalities impossible, but because they make them atypical. In fact, more authors have questioned the laws of logic in light of such phenomena than entertained “explanations” based on atypical initial conditions. On the one hand, it is a criterion imposed by us on our theories that they must make the relevant phenomena typical (or at least not atypical). But to the extent that we believe in discovering true laws of nature, we have to trust that our empirical evidence is not merely the product of atypical initial conditions. Otherwise, all bets are off.

Typicality Versus Probability

The earlier examples of typicality in physics may already suggest that the concept is related to, and often conflated with, that of probability. Indeed, one of the primary goals of this book is to clarify the distinctions and interrelations between the two. Let me start here with two observations:

1. Contrary to probability, typicality does not come in numerical degrees. As we will discuss in more detail, typicality is relative to a context determined by the reference set Ω and a set Π of relevant properties/propositions. But in any given context, a property P can be typical, atypical, or neither; it cannot be more or less typical than some other P′ ∈ Π. Even if we use a normalized measure—technically a probability measure—to explicate “nearly all” or “nearly none,” we are not committing to giving meaning to the exact number that this measure assigns to subsets of Ω. The only values relevant for typicality statements are ≈ 1 and ≈ 0.


2. Typicality statements do not presuppose any sort of randomness or indeterminism, nor do they refer to, or depend on, anyone’s knowledge, ignorance, or degrees of belief. Typicality statements can refer to frequencies (when the reference set is a finite ensemble), but the more interesting ones do not. When we analyze a physical theory, the relevant typicality results express objective facts about the modal structure of the laws. These modal facts come with certain normative implications for what we should expect or consider to be explained, but they are not themselves of an epistemic nature.

Put plainly, when you hear “typical” or “atypical,” think “very big” or “very small” sets. When you hear “probable” or “improbable,” consult four centuries of debate about what it could mean. Still, the basic yet revolutionary idea of predicting outcomes by counting possibilities stood at the beginning of probability theory and is also the prototype of typicality reasoning.

It is sometimes criticized that by being content with a typicality explanation—or puzzled about atypical facts—we are making an unwarranted inference from typicality to probability, as if, let’s say, the fact that a subset of initial conditions is small implies that it is unlikely for one of them to be picked out. Counterexamples to such an inference are readily produced. The bullseye makes up a small fraction of a dartboard’s surface area, but how likely it is to be hit depends on the skills (and intentions) of the player throwing darts. Almost all real numbers are irrational, but if we ask a person on the street to name one, we would not be surprised if she picked a rational number, as those are more familiar to people.

One might point out that, in these examples, we are already beginning to invoke additional explanations (an agent’s dexterity with darts or people’s familiarity with certain numbers) to account for otherwise remarkable outcomes. The more important observation, however, is that when it comes to the initial conditions of the universe, there is no one making the pick, no God throwing darts onto the universe’s phase space. Typicality is the right concept for reasoning about the universe precisely because it is free of such connotations. If a feature of our world is atypical according to a theory, it means that our world is—in that particular respect—unlike the vast majority of worlds instantiating its laws, the vast majority of the theory’s “models.” It is this fact alone that challenges the theory and creates explanatory pressure. No further inference to probability is made, needed, or even meaningful.

Some critics object that we should withhold all a priori expectations about our world, that no single feature of our universe can be puzzling per se. This is not an unreasonable stance in general, but speaking of typicality reasoning in terms of “a priori expectations” is misleading. The physical laws determine the modal structure of the world and thereby typical and atypical features. And the laws are not a priori, but their empirical content and adequacy must be assessed by typicality reasoning since fine-tuned initial conditions could produce virtually any macroscopic evidence whatsoever. What is true is that typicality reasoning draws to some extent on pre-theoretic intuitions (about the meaning of “typical”), but we cannot let a theory completely define its own standards of success.

Probabilities are ill-suited for judging a (deterministic) law hypothesis precisely because no initial conditions of the universe are “likely” or “unlikely.” In contrast, it makes sense to speak of probabilities associated with throwing darts or generating “random” numbers. But those are physical processes (leaving aside the issues of human consciousness and free will) whose outcome distributions have to be explained on the basis of physical laws. A core thesis of this book is that such explanations are grounded in typicality. The laws make certain statistical regularities typical. These are the objective probabilities we encounter in the world. The objection that “typicality does not imply probability” thus has it backward. Typicality is the more fundamental concept, and probabilistic intuitions are based on typicality facts.

1.3 The Boltzmannian Framework

The appropriate technical framework for typicality statements in physics is that of Ludwig Boltzmann’s statistical mechanics. Later chapters will discuss statistical mechanics in the narrower sense. But beyond its traditional applications in physics, Boltzmann’s framework can be seen as a general framework for understanding macro-on-micro supervenience.


On the micro-physical level, we have a state space Ω comprising the microscopic degrees of freedom of the physical system (the universe, in the last resort) and a deterministic law defining a vector field on Ω whose integral curves are the possible evolutions of the microscopic state. The general solution of the equations of motion is described by a flow Φ : ℝ × ℝ × Ω → Ω such that X(t) = Φt,s(ω) is the unique solution with initial condition X(s) = ω. In classical mechanics, Ω ≅ ℝ³ᴺ × ℝ³ᴺ is the phase space of N particles, comprising the positions and momenta of each. A microstate thus corresponds to a point X = (q, p) in Ω, where q = (q1, q2, …, qN) and p = (p1, p2, …, pN) are the particle positions and momenta, respectively, and the laws of motion are given by a Hamiltonian vector field of the form

    d/dt X = (q̇, ṗ) = (∂H/∂p, −∂H/∂q).

There is also a natural measure λ on Ω, the uniform Liouville measure corresponding to the intuitive phase space volume. This measure has the special property of being stationary, i.e., conserved by the Hamiltonian dynamics. This means that λ(Φt,s A) = λ(A) for any (measurable) A ⊆ Ω and s, t ∈ ℝ, so that subsets of Ω maintain their size under time evolution. Mathematically, the triple (Ω, Φt,s, λ) forms a dynamical system. While a stationary measure is not an absolute necessity for a typicality analysis—as mentioned before, many measures and even structures short of an additive measure (see Chap. 6) could express what we mean by “large” and “small” sets—we will discuss why it is a natural and desirable feature.

A key concept introduced by Boltzmann is the distinction between microstates and macrostates. Whereas a system’s microstate X ∈ Ω is given by a complete specification of its microscopic degrees of freedom—in classical mechanics, the positions and momenta of all its constituent particles—a macrostate M is specified in terms of coarse-graining variables that are functions of X and characterize the system on macroscopic scales. A macrostate is completely determined by the microstate, that is, M = M(X), but one and the same macrostate can be realized by a large (in general infinite) number of different microstates, all of which “look macroscopically the same.”
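As a toy illustration of how unevenly microstates are distributed over macrostates—my own drastically simplified example, not one worked out in the book—take N = 1,000 labeled particles coarse-grained only by how many of them occupy the left half of a box. Each macrostate “k particles on the left” is realized by C(N, k) microstates, and the near-equilibrium macro-region dwarfs the extreme ones:

```python
from math import comb, log10

N = 1000                 # particles, each either in the left or the right half
total = 2**N             # number of microstates under this crude coarse-graining

# Near-equilibrium macro-region: between 45% and 55% of the particles on the left.
equilibrium = sum(comb(N, k) for k in range(450, 551))

# Extreme macro-region: all N particles on the left (a single microstate).
all_left = comb(N, N)

print(equilibrium / total)                   # ~0.999: nearly all microstates
print(round(log10(equilibrium / all_left)))  # ~301: a factor of more than 10^300
```

This is the sense in which the size differences between macro-regions are “much more extreme than depicted” in sketches like Fig. 8.1.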

A microstate thus corresponds to one point in Ω, while a macrostate corresponds to a subset ΩM ⊆ Ω containing all the microstates that realize M. We thus obtain a partition of the microscopic state space into macro-regions corresponding to (approximately) constant values of the relevant macro-variables. And as the laws determine the micro-history X(t), they also determine the macro-history M(X(t)). Typical examples of macro-variables in statistical mechanics are volume, pressure (∼ mean force per unit area), or temperature (∼ mean kinetic energy per degree of freedom). Approximate values of these variables may describe the macrostate of an ideal gas, and one finds that the relation P · V ∝ N · T (the ideal gas law for N particles) is typical, i.e., holds for a great majority of possible microstates.⁵

Going beyond statistical mechanics in this narrow sense, it is a basic premise of our discussions that the truth value of any physical proposition ϕ is determined by the micro-history of the universe. We can fix a time t0 at which we consider the possible initial conditions of the universe so that every ω ∈ Ω determines a micro-history X(t) := Φt,t0(ω), t ∈ ℝ. (It is common to speak of initial conditions, but t0 is really arbitrary.) ϕ can now be identified with a characteristic function

    χϕ : Ω → {0, 1} = {false, true},    (1.3)

or, equivalently, with the “macro-region”

    Ω[ϕ] = {ω ∈ Ω : χϕ(ω) = 1}.    (1.4)

In modal logic, the set Ω[ϕ] would be called the intension of ϕ. We see a close connection between the Boltzmannian framework and possible worlds semantics. More importantly, we begin to see the formal physical underpinning of typicality statements: Typ(ϕ) if and only if Ω[ϕ] is a “very big” subset of Ω, where “very big” is usually understood in terms of a natural typicality measure, e.g., the Liouville measure in classical mechanics. It is not an overstatement (as I hope this book will make clear) that the primary goal of analyzing a micro-physical theory is to derive such typicality results.
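To see how a typicality claim of the form Typ(ϕ) can be probed numerically, here is a rough Monte Carlo sketch—my own toy model, not an example from the book—for the gas-expansion proposition mentioned in Sect. 1.1: non-interacting particles start in the left tenth of a one-dimensional box with randomly sampled positions and velocities, evolve freely with reflecting walls, and ϕ says that at a later time every tenth of the box holds roughly its fair share of particles. The sampled fraction of initial conditions realizing ϕ estimates μ(Ω[ϕ]) and comes out close to 1; all parameters are arbitrary illustrative choices.

```python
import random

def reflect(x: float, length: float = 1.0) -> float:
    """Position in [0, length] of a free particle after reflections at the walls."""
    x %= 2 * length
    return 2 * length - x if x > length else x

def phi(n_particles: int = 500, t: float = 50.0, bins: int = 10, tol: float = 0.5) -> bool:
    """Sample one microstate omega and evaluate chi_phi(omega): particles start in
    the left tenth of the box with random positions and velocities; return True if,
    at time t, every tenth of the box holds its expected share within +/- tol."""
    xs = [random.uniform(0.0, 0.1) for _ in range(n_particles)]
    vs = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    finals = [reflect(x + v * t) for x, v in zip(xs, vs)]
    counts = [0] * bins
    for x in finals:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = n_particles / bins
    return all(abs(c - expected) < tol * expected for c in counts)

trials = 200
print(sum(phi() for _ in range(trials)) / trials)  # estimate of mu(Omega[phi]), close to 1
```

Of course, this only illustrates the logic of such an estimate; the substantive typicality results discussed in later chapters are established analytically, not by sampling.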

⁵ The proportionality factor is the Boltzmann constant kB = 1.380649 × 10⁻²³ J K⁻¹.


Remark 2 (Time-Indexed Propositions) If ϕ is a tensed proposition whose truth value depends only on the present microstate of the universe, we can define the time-indexed proposition ϕt, t ∈ ℝ, by

    ω ∈ Ω[ϕt] :⇔ Φt,t0(ω) ∈ Ω[ϕ] ⇔ ω ∈ (Φt,t0)⁻¹ Ω[ϕ].    (1.5)

(Φt,t0)⁻¹ Ω[ϕ] is the set of microstates at t0 that will (or did) realize ϕ at time t. It is, of course, a consequence of the deterministic laws that the truth value of any proposition referring to any time t is determined by the initial conditions of the universe or, more generally speaking, the microstate at any other time t0.

.

Remark 3 (Conditional Typicality) Many typicality statements refer not to the reference set of all possible initial conditions (micro-histories) but to a subset restricted by pertinent macroscopic boundary conditions. Sometimes, the need for such a restriction is obvious. If we are interested in a physical regularity involving falling apples, we consider only possible worlds in which apples exist. Other times, we only want to say that P is typical given Q. In either case, we write, e.g., .Typ(P | Q) and consider ∩Q) the conditional measure .λ(P | Q) = λ(P . (Here and in the future, I λ(Q) omit the distinction between a proposition and its characteristic set.) As we generalize this framework beyond classical mechanics, the implicit metaphysical assumption—which is really an assumption about how a theoretical formalism connects to empirical facts—is that the theory has a microscopic ontology of matter that could, in some sense, coarse-grain to the world that we experience. Theories of this kind are also known as primitive ontology theories.6 With few arguable exceptions, theories without a primitive ontology are also not of a shape and form that would allow for a “realistic” interpretation as objective descriptions of the physical world.

6 See, e.g., Allori (2013). The term “primitive ontology” goes back to Dürr et al. (1992). See also Bell’s notion of “local beables” in (Bell, 2004, Chap. 7).


A question that will pop up now and again is what determines the “right” partition of phase space, that is, both the relevant set of macrovariables and the range of values that still count as the same macrostate. There is no easy answer. It does not follow analytically from the microscopic theory that we should partition microstates into different levels of pressure and temperature, let alone into “apples” and “oranges.” It is an additional part of the scientific enterprise to find variables and predicates that allow us to identify regularities and that do a good job carving nature at its joints. The scales at which we experience the world are undoubtedly relevant to the coarse-graining of microstates, but this does not lead to subjectivism or anthropocentrism as sometimes claimed. A pattern is only discernible at certain resolutions—you won’t see the picture if you look at individual pixels or from so far that all structure is blurred. Still, the patterns we care about in the natural sciences are in nature whether we recognize them or not. And the results of a typicality analysis depend on us only in the trivial sense that we decide what questions to ask of the theory. Our intentions, beliefs, or epistemic limitations play no role in the answers that the theory yields in response.

1.4 Brute Facts

Typicality reasoning always applies in a specific context characterized by a domain or reference set Ω and a set Π of properties or propositions that we want to reason about. In physics, Ω is determined by the relevant theory T as the phase space of (initial) micro-conditions that parameterize possible worlds. What determines Π is less clear-cut. Π can be a set of regularities described as laws of a second (effective or phenomenological) theory T̃. This would be a context of theory reduction. The reduction of T̃ by T is successful if the T-laws (perhaps together with suitable macroscopic auxiliary assumptions) make the regularities described by the T̃-laws (perhaps in a suitable approximation) typical. The two paradigmatic examples that will be discussed in this book are the reduction of thermodynamics by the classical mechanics of point particles and (lesser-known but even more straightforward) the reduction of the statistical formalism of quantum mechanics to another deterministic theory of point particles, Bohmian mechanics.

As important as these examples are, typicality reasoning is not restricted to the task of theory reduction. But Π must be restricted or at least qualified. The problem is that, as soon as we go to a very fine-grained description, every possible world is atypical with respect to some of its features. Our universe is certainly atypical with respect to its exact microscopic configuration, or the exact number of stars in our galaxy, or the precise sequence of numbers drawn in the Powerball lottery. That the reader is reading this very sentence at this particular time and place is an atypical event but hardly one that challenges our best theories of nature. So why do some atypical features of our world cry out for explanation or even falsify established theories while others seem unproblematic and acceptably brute?

Stating the rationality principle associated with typicality, I referred somewhat evasively to “salient” properties. But the question of what makes a feature of our world “salient,” a valid target of scientific explanation, evades a simple answer. A partial answer is that science is usually concerned with the explanation of robust phenomena and regularities rather than individual data points.⁷ For instance, if we shoot electrons through a double slit onto a photographic screen, the formation of an interference pattern is a typical phenomenon predicted by quantum mechanics. The exact configuration of impact points in the experiment is always atypical. But it is also not reproducible, not the explanatory target of physicists, and different quantum theories (that are considered empirically equivalent) disagree on whether the individual impact points are even in principle determined by physical laws.

Some help in making this phenomenon/data distinction precise may come from algorithmic information theory, where we find various proposals for quantifying the amount of “non-random information” contained in a string of numerical data (e.g., sophistication (Koppel, 1995) or

⁷ Although across different scientific disciplines, one person’s data point may be another one’s phenomenon.


effective complexity (Gell-Mann & Lloyd, 1996)). The idea, in a nutshell, would be that a “phenomenon” is a sufficiently compressible pattern in the data that remains once we filter out the noise. This can be a fruitful perspective, but the technical details raise more questions than they answer since several choices are involved in the specification of any such measure. One should not hope to find a clear-cut computational solution to a very subtle issue.

Another relevant observation is that many of the previous examples of atypicality would allow for further explanation, were we interested in one. The reader might be able to provide reasons for deciding to read this book at this particular time and place. A detailed description of the history of star formation in our galaxy could account for the number of stars found today. The drawback of such explanations is that they generally trace one atypical event back to other atypical events. It is the usual conundrum of causal explanations, in particular, that one can always keep on asking, “what caused the cause?”

Things stand differently if one wonders why the initial microstate of our universe was exactly X (when there is a continuum of other possibilities, all of which are not X). This is a form of the measure zero problem that every point on a continuum forms a null set (with respect to any continuous measure), including the one corresponding to the actual “outcome.” Here, no further explanation seems possible, and there is something about the question that strikes me as deeply irrational. It is the same unease that one might feel about questioning the lottery numbers. It had to be something, after all, and any outcome would have been atypical. Sidney Morgenbesser famously responded to the ontological question: Why is there something rather than nothing? with: “If there were nothing, you’d be still complaining!” In reference to this bon mot, I call the kind of pseudo-problems we want to avoid Morgenbesser cases: Why is F(@) = X in our universe? If it were anything else, you’d still be complaining!

Hence, the following addendum to the previously formulated rationality principle seems both necessary and compelling: An atypical event requires explanation only if a typical outcome would have been possible. Otherwise, it is a brute fact. Let me try to make this more precise.


Definition 2 Let F : Ω → R^k be a somewhat natural function (“macro”-variable) on the state space Ω. The formulas P_y(ω) : F(ω) = y express brute facts if Typ(¬P_y) for all y ∈ R^k. This definition extends to formulas like P_y(ω) : ‖F(ω) − y‖ < δ for fixed δ > 0, or vague propositions of the form P_y^≈ : F(ω) ≈ y.

What do I mean by a somewhat “natural” function? We have to exclude very contrived and ad hoc partitions of Ω that could always be constructed to make any fact “brute.” That is, we have to exclude functions that do nothing but pick out artificially carved-up subsets of Ω. The following condition might not be the final word of wisdom, but it does a fairly good job: We require that the number of possible values that F can attain is much larger than the number of constants entering its definition.

Now, if P_y expresses an atypical but brute fact, it is still rational to expect ¬P_y(@) for y picked out in advance. But if P_y(@) happens to be true, it does not require a further explanation, at least not based on typicality reasoning. Consider, in contrast, the proposition: The second law of thermodynamics holds at w. We can restate it (ignoring some subtleties to be discussed later) as the y = 0 case in the family

Q_y : inf (dS/dt) ≥ y,

so that Q_0 states that the entropy S is non-decreasing. Then Typ(Q_0) (whereas Typ(¬Q_y) for y < 0), as we will explain in Chap. 8. The failure of the second law of thermodynamics would thus not be an acceptably brute fact but severely challenge established theories, arguably to the point of falsifying them.8

8 I recall the famous quote from Arthur Eddington (1928, p. 74): “If someone points out to you that your pet theory of the universe is in disagreement with Maxwell’s equations—then so much the worse for Maxwell’s equations. If it is found to be contradicted by observation—well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.”


This analysis also highlights the importance of proper coarse-graining. If F : Ω → R is a continuous or very fine-grained variable, we are generally interested in events of the form

P_{y,δ}(ω) : F(ω) ∈ (y − δ, y + δ),    (1.6)

where δ must be large enough that a range of typical values exists. For instance, we do not seek to explain why the relative frequency of heads in a long series of coin tosses is exactly r ∈ Q but why it is approximately 1/2.9 Similarly, that the number of stars in the Milky Way is exactly N (whatever N may be) is a brute fact, but that the number is somewhere between 10^9 and 10^14 is, very plausibly, typical for a galaxy of its size.
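To put rough numbers on this, here is a minimal Python sketch of my own (assuming SciPy is available; the choice N = 10,000 is arbitrary): any exact frequency in a long coin-toss series has small probability, while the coarse-grained event picked out by a δ of order N^(−1/2) (cf. footnote 9) is typical.

from scipy.stats import binom

N = 10_000
# probability of one specific exact outcome: exactly N/2 heads
print(binom.pmf(N // 2, N, 0.5))                                        # about 0.008
# probability that the relative frequency lies within 1/2 +/- 1/sqrt(N), i.e., two standard deviations
print(binom.cdf(N // 2 + 100, N, 0.5) - binom.cdf(N // 2 - 101, N, 0.5))  # about 0.95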

References

Albert, D. Z. (2015). After physics. Cambridge, Massachusetts: Harvard University Press.
Allori, V. (2013). Primitive ontology and the structure of fundamental physical theories. In A. Ney & D. Z. Albert (Eds.), The wave function: Essays on the metaphysics of quantum mechanics (pp. 58–75). New York: Oxford University Press.
Bell, J. S. (2004). Speakable and unspeakable in quantum mechanics (2nd ed.). Cambridge: Cambridge University Press.
Dürr, D., Froemel, A., & Kolb, M. (2017). Einführung in die Wahrscheinlichkeitstheorie als Theorie der Typizität. Berlin: Springer.
Dürr, D., Goldstein, S., & Zanghì, N. (1992). Quantum equilibrium and the origin of absolute uncertainty. Journal of Statistical Physics, 67(5–6), 843–907.
Eddington, A. S. (1928). The nature of the physical world. New York: Macmillan.
Gell-Mann, M., & Lloyd, S. (1996). Information measures, effective complexity, and total information. Complexity, 2(1), 44–52.

9 Typically, δ = O(N^(−1/2)) for a series of N trials, see Sect. 2.2.


Goldstein, S. (2001). Boltzmann’s approach to statistical mechanics. In J. Bricmont, D. Dürr, M. C. Galavotti, G. Ghirardi, F. Petruccione, & N. Zanghì (Eds.), Chance in physics: Foundations and perspectives (pp. 39–54). Berlin: Springer.
Goldstein, S. (2012). Typicality and notions of probability in physics. In Y. Ben-Menahem & M. Hemmo (Eds.), Probability in physics. The Frontiers Collection (pp. 59–71). Berlin: Springer.
Hubert, M. (2021). Reviving frequentism. Synthese, 199(1), 5255–5284.
Koppel, M. (1995). Structure. In R. Herken (Ed.), The universal Turing machine: A half-century survey (2nd ed., pp. 403–419). Wien: Springer.
Maudlin, T. (2007). What could be objective about probabilities? Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 38(2), 275–291.
Maudlin, T. (2020). The grammar of typicality. In V. Allori (Ed.), Statistical mechanics and scientific explanation: Determinism, indeterminism and laws of nature. World Scientific.
Penrose, R. (1989). The emperor’s new mind: Concerning computers, minds, and the laws of physics. Oxford: Oxford University Press.
Volchan, S. B. (2007). Probability as typicality. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 38(4), 801–814.
Wilhelm, I. (2022a). Typical: A theory of typicality and typicality explanation. The British Journal for the Philosophy of Science, 73(2), 561–581.
Wilhelm, I. (2022b). The typical principle. The British Journal for the Philosophy of Science. Advance online publication. https://doi.org/10.1086/723240.

Part I Probability

2 Typicality in Probability Theory

In questions of a practical nature, we may be forced to consider events whose probability is more or less close to unity as certain and events whose probability is small as impossible. Accordingly, one of the most important tasks of probability theory is to identify those events whose probabilities are close to unity or zero. —Andrey Markov, Wahrscheinlichkeitsrechnung (1912, p. 12)1

Although one of this book’s primary goals is to clarify the formal, conceptual, and metaphysical differences between typicality and probability, I will start by discussing typicality in the context of standard probability theory. As a first approximation, it is not inappropriate to read “typical” as a synonym for “very probable”; the more the philosophical and interpretative questions come into focus, the more sharply we have to draw the distinction. Since the mathematical theory per se does not tell us much about what probability means, there is no point in insisting, on this level of abstraction, that very high probability means something different from typicality.

1 Translation from German by D.L.


2.1 Expectation Value and Typical Values

While probability can be a subtle, controversial, and downright mysterious concept, its mathematical theory is a rather sober business. In the axiomatic tradition of Kolmogorov, probability theory is essentially the theory of normalized measures.

Definition 3 A probability space is a triple (Ω, A, P) consisting of a set Ω of elementary events, a sigma-algebra A of measurable subsets,2 and a non-negative, countably additive set function P on A, a.k.a. a measure, normalized to P(Ω) = 1. The measure P(A) ∈ [0, 1] of a set A ∈ A is called “the probability of A.”

2 The sigma-algebra is the set of subsets of Ω on which the measure is defined. It is required to be closed under complements and countable unions and contain Ω itself (the “certain event”).

In general, the elements of Ω represent mere possibilities—in our further discussions, Ω will often correspond to the set of possible initial conditions of a physical system—but a probability space can also be used to describe the actual distribution of a statistical ensemble.

A random variable (for simplicity, we consider only real-valued ones) is a measurable3 function X : Ω → R, mapping the elementary events to numerical values. Mark Kac (1959, p. 22) called this “a horrible and misleading terminology.” Indeed, the term “random variable” suggests something indeterminate or chance-like, when a random variable is really just a function—usually a coarse-graining one, meaning that many different ω ∈ Ω are mapped to the same outcome. In the language of statistical mechanics, we will call such coarse-graining functions macrovariables.

3 X is measurable (more precisely, Borel measurable) if the pre-images of open sets in R are measurable in Ω, i.e., elements of the sigma-algebra A.

Given a random variable X : Ω → R, the integral

E(X) = ∫_Ω X(ω) dP(ω)    (2.1)


with respect to the probability measure is called the expectation value of X. When referring to an actual statistical distribution, it is the statistical mean or average. The expectation value, as the name suggests, is considered to have a predictive quality, even though it need not correspond to the most likely value or even a possible one. In fact, the expectation value is only a reasonable prediction if significant deviations from it are very unlikely, that is, if

P(|X − E(X)| > ε) ≤ δ    (2.2)

for reasonably small values of ε and δ. In other words, the expectation value yields a sensible prediction only insofar as it provides a good approximation to typical values, that is, to a range of outcomes that will obtain with very high probability. In the first place, it is such typical values rather than the expectation value that we should take as the prediction of a probabilistic model. We can readily see from (2.2) that there is, in general, a trade-off between the smallness of ε and δ which, moreover, express inherently vague notions of “significant” deviations and “very low” probability. This vagueness pops up in different forms and places, so it’s important to emphasize from the outset that there is no way around it. Probabilistic (as well as typicality) reasoning always involves some degree of vagueness and pragmatism. We have to deal with it, no matter what.

A mathematical quantity expressing how much a random variable X fluctuates around its expectation value is the variance

V(X) := E[(X − E(X))²].    (2.3)

The square root of the variance is called the standard deviation and is commonly denoted by σ. The relevance of the variance can be seen from a simple application of the Chebyshev inequality:

P(|X − E(X)| > ε) ≤ (1/ε²) ∫ |X − E(X)|² dP = V(X)/ε².    (2.4)


Simply put, the smaller the variance, the more “weight” is concentrated in a smaller range of values around the mean. Small variance thus ensures a narrow and hence predictive range of typical values.
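The following minimal simulation sketch (my own illustration, assuming NumPy; the exponential distribution is merely a convenient test case) compares actual deviation probabilities with the Chebyshev bound (2.4):

import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)   # E(X) = 1, V(X) = 1
mean, var = X.mean(), X.var()

for eps in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(X - mean) > eps)   # estimated P(|X - E(X)| > eps)
    print(f"eps={eps}: empirical {empirical:.3f} <= Chebyshev bound {var / eps**2:.3f}")

The bound holds in every case but is far from tight, which is one reason why the sharper concentration estimates discussed in Sect. 2.2 matter in practice.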

2.2 Law of Large Numbers

The law of large numbers—the central result in probability theory—should be understood in exactly this manner. If we consider a family X_1, ..., X_N of uncorrelated and identically distributed variables (say, the outcomes of N independent dice rolls) together with their empirical mean

m_emp^N := (1/N) Σ_{i=1}^N X_i    (2.5)

(e.g., the average number rolled with the dice), the variance of the sum is additive (order N) while the pre-factor 1/N enters quadratically, i.e., as N^(−2). Hence, (2.4) becomes

P(|m_emp^N − E(m_emp^N)| > ε) ≤ v/(Nε²),    (2.6)

where v = V(X_i) is the variance, which is the same for all X_i. In words, the probability that the empirical mean deviates by more than ε from its expectation value—here also called the theoretical mean—is less than δ(ε, N) := v/(Nε²).

Again, the trade-off between ε (characterizing a small range of values around the theoretical mean) and δ(ε, N) (the bound on the probability of larger deviations) is evident from (2.6). The eponymous “large number” is the ensemble size N. In view of (2.6), it must be large to ensure not only that the result is correct but that it is relevant.

For Bernoulli variables X_i ∈ {0, 1} (each outcome either obtains or not), m_emp^N is the relative frequency. The law of large numbers then states that, for large N, typical relative frequencies lie in a small range of values around the theoretical mean.


For instance, for the standard coin toss model with P(X_i = 1) = P(X_i = 0) = 1/2, say X_i = 1 stands for “heads” and X_i = 0 for “tails” on the i’th trial, (2.5) counts the relative frequency of “heads” in a series of N tosses. The law of large numbers then tells us that typical values of this relative frequency deviate only slightly from 1/2 if the series of tosses is long enough.

It is not an original observation that the empirical import of probabilities is to be found in such cases dealing with statistical regularities, i.e., frequencies, in large ensembles. Together with the understanding that the relevant predictions are always based on typicality—that our model or theory predicts what it deems “overwhelmingly likely”—we see why the law of large numbers that gives us typical relative frequencies plays such a central role in connecting the probability calculus to the world. There is no sharp, universal threshold above which an event counts as “overwhelmingly likely” (an issue that will occupy us for a while). But when we speak of typical values, we mean that the possibility of atypical events—those lying outside the predicted range—can be considered negligible. An atypical event occurring after all (for instance, 800 times heads in a series of 1000 tosses) would be the kind of event that compels us to revise or reject our model (e.g., conclude that the coin is biased) rather than to merely shrug our shoulders and say: “Well, I guess anything is possible.” Of course, in practice, one may choose to be a more aggressive forecaster and report a smaller range of predictions for the price of greater uncertainty. Whether this is a reasonable thing to do depends not only on the rarity or improbability of the neglected outcomes but also on their potential impact.
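For the coin-toss example, a small numerical sketch of my own (assuming NumPy and SciPy; the sample sizes are arbitrary) makes the contrast explicit: typical relative frequencies cluster tightly around 1/2, while 800 heads in 1000 tosses is so improbable under the fair-coin model that its occurrence would rationally discredit that model.

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
N, runs = 1000, 100_000
freqs = rng.binomial(N, 0.5, size=runs) / N            # relative frequency of heads per series

print(np.mean(np.abs(freqs - 0.5) <= 0.05))             # fraction of runs within [0.45, 0.55]: about 0.998
print(binom.sf(799, N, 0.5))                            # P(at least 800 heads | fair coin): roughly 1e-85
print(binom.sf(799, N, 0.8))                            # under a coin biased to p = 0.8: about 0.5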

2.2.1 The √N Law

As we saw in the derivation of (2.6), the standard deviation of a sum Σ_{i=1}^N X_i of N independent variables (not normalized) is of order √N (if typical values of the variables are of order 1). This provides a general rule of thumb for the magnitude of typical deviations from the theoretical mean. The theoretical mean itself is of order N. Typical relative deviations are thus of order 1/√N.


The reason why statistical mechanics works so well is that it deals with extremely large values of N, usually the number of microscopic degrees of freedom in a macroscopic system. In fact, the most important constant in statistical mechanics is not Boltzmann’s but Avogadro’s constant, N_A = 6.02214076 × 10^23 mol^(−1), which is the number of molecules in one mole of a given substance, e.g., in 18 g of water (H2O) or 32 g of oxygen (O2). Hence, the number of microscopic constituents in a macroscopic system is generally of the order of N ∼ 10^24. This huge number manifests the separation of scales between the microscopic and macroscopic regimes, which makes the inherent vagueness emphasized above—the trade-off between ε and δ(ε, N)—unproblematic in practice. Simply put, huge N gives us enough wiggle room to choose both ε and δ small enough that typical fluctuations are negligible in magnitude while large fluctuations are negligible in probability. Negligible, that is, on the observational scales relevant to the pertinent macro-phenomena, say, the dispersion of heat in a metal, or the relations between temperature, volume, and pressure that we find in a weakly interacting gas. There is no reason to expect analogous statistical “laws” if we consider very fine-grained variables, extremely long time scales, or systems with few degrees of freedom. The continual outpour of publications presenting this as a foundational problem rather than a natural consequence of statistical mechanics can be somewhat disheartening.
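A quick numerical check of the √N law (my own sketch, assuming NumPy; die rolls are simulated via their face counts to keep memory use small):

import numpy as np

rng = np.random.default_rng(2)
faces = np.arange(1, 7)
for N in (100, 10_000, 1_000_000):
    counts = rng.multinomial(N, [1 / 6] * 6, size=2000)   # face counts for 2000 series of N rolls
    sums = counts @ faces                                  # total of each series; the mean is 3.5 * N
    rel_fluct = sums.std() / (3.5 * N)
    print(f"N={N:>9}: relative fluctuation {rel_fluct:.1e}   (1/sqrt(N) = {N ** -0.5:.1e})")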

Stronger LLN Estimates. Assuming statistical independence, stronger LLN estimates than (2.6) can be obtained from the general form of the Chebyshev inequality

P(|Z − E(Z)| > ε) ≤ E[(Z − E(Z))^m] / ε^m,   m ∈ N,    (2.7)

with Z = m_emp^N = (1/N) Σ_{i=1}^N X_i, depending on the regularity of the random variables, in particular, up to which m the m’th moments E(X_i^m) remain finite.


For particularly “nice” variables (especially Bernoulli variables), one can even obtain exponential bounds, e.g., of the form

P(|m_emp^N − E(m_emp^N)| > ε) ≤ e^(−const. · Nε²).    (2.8)

Relevant results go under such names as Bernstein inequalities, Hoeffding’s inequality, or Chernoff bounds. These are just the most basic examples of so-called concentration inequalities, providing rigorous estimates for typical values of a random variable on which the probabilistic weight concentrates.
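To see how much sharper such exponential estimates are, here is a small numerical comparison of my own (assuming NumPy): the Chebyshev-type bound (2.6) versus Hoeffding’s inequality, a standard instance of (2.8) for {0,1}-valued variables, against a simulated tail probability for the fair coin.

import numpy as np

rng = np.random.default_rng(3)
p, eps = 0.5, 0.05
for N in (100, 1_000, 10_000):
    freqs = rng.binomial(N, p, size=200_000) / N
    simulated = np.mean(np.abs(freqs - p) > eps)      # estimated P(|m_emp - p| > eps)
    chebyshev = p * (1 - p) / (N * eps**2)            # bound (2.6) with v = p(1 - p)
    hoeffding = 2 * np.exp(-2 * N * eps**2)           # Hoeffding's inequality
    print(f"N={N:>6}: simulated {simulated:.1e}   Chebyshev {chebyshev:.1e}   Hoeffding {hoeffding:.1e}")

The polynomial bound decays only like 1/N, while the exponential bound collapses rapidly; for the largest N, the deviation event does not occur at all in 200,000 simulated series.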

Kolmogorov’s Zero-One Law. It is a general feature of large families of mutually independent random variables that they partition Ω into very large/small sets corresponding to typical/atypical properties. In an idealized form, this is manifested in Kolmogorov’s zero-one law: Let (X_n)_{n≥1} be a family of independent random variables on (Ω, A, P). An event A ∈ A is called a tail event for the family (X_n)_{n≥1} if, for all k ∈ N, its occurrence depends only on (X_n)_{n≥k}. Informally speaking, tail events are not sensitive to the values of individual X_n (more precisely, finite subsets of the infinite family of variables) but express “asymptotic properties.” Let σ(X_n : n ≥ 1) be the smallest sigma-algebra containing all such tail events. Then

P(A) = 0 or P(A) = 1, for all A ∈ σ(X_n : n ≥ 1).    (2.9)

A prominent instance of this general result is the strong law of large numbers for asymptotic frequencies in the limit of infinitely many trials. Since the strong LLN involves such (hypothetical) limits, its relevance is more theoretical than practical.

2.2.2 The Central Limit Theorem

Another fundamental result in probability theory is the central limit theorem. It states that, if (X_i)_{i≥1} is an independent and identically distributed family of random variables with expectation μ and variance v, the probability distribution of

√(N/v) ((1/N) Σ_{i=1}^N X_i − μ)    (2.10)

converges, for N → ∞, to a normal distribution with mean 0 and variance 1. This is another kind of law of large numbers. Morally, it means that, for large N, the probability distribution of the empirical mean m_emp^N is approximately Gaussian, centered around μ with standard deviation σ = √(v/N). This distribution becomes more and more peaked with growing sample size N, its weight being concentrated within a few standard deviations of order 1/√N.

Fig. 2.1 Bernoulli distribution with p = 1/2 and N = 40 (left), N = 400 (right)

The central limit theorem applies to independent coin tosses. Figure 2.1 shows the probability distribution for the total number of heads in a series of N tosses, for N = 40 and N = 400, respectively. We can see the Gaussian shape emerging. More importantly, we can see that the distribution is essentially concentrated on outcomes for which the total number of heads deviates by not more than √N (two standard deviations) from N/2, i.e., the relative frequency is within [1/2 ± 1/√N]. Indeed, under the normal distribution that we approximate for large N, about 95% of the weight is concentrated within two standard deviations from the mean, i.e., in [μ ± 2σ]; 99.73% is within 3σ, and 99.999943% is within 5σ, which, in our case, is still only a deviation of ±5/(2√N) from the relative frequency 1/2.
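These Gaussian weights are easy to verify numerically (a short sketch of mine, assuming SciPy is available):

from scipy.stats import norm

for k in (1, 2, 3, 5):
    # probability mass of a standard normal within k standard deviations of the mean
    print(f"within {k} sigma: {norm.cdf(k) - norm.cdf(-k):.8f}")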


It is a characteristic feature of the Gaussian distribution that it decays very quickly away from the mean, so that strong outliers are atypical. This is one of its many nice properties, that it has “light tails,” as one says. The real world is not always so nice, and it has been argued that many catastrophic prediction failures in finance and other socio-economic domains have resulted from unjustified assumptions of Gaussianity that lead to underestimating the probability of extreme events (Taleb, 2010). This serves as a word of caution but also to re-emphasize that expectation values are not always a good guidepost for what to expect.

If we understand typical values as the relevant predictions, it also becomes immediately clear that a probabilistic model yields, in general, a range of predictions or prediction interval (though not necessarily a single connected one). In some unfortunate cases, this range may be so large that the model is not very predictive at all. In particularly nice cases, the range of typical values will be narrow and centered around the expectation value, so that the latter becomes a good stand-in for the prediction. This applies, in particular, in the regime of the law of large numbers or the central limit theorem, but it can fail spectacularly for systems with strong correlations or variables with large, or even unbounded, variance.

An analogous observation applies to actual statistical distributions. If we consider a sample consisting of 1000 nurses and Elon Musk, the average net worth and the typical net worth will differ very significantly. (The median net worth would be a better approximation to the latter, but this is not always the case.) In such instances, one has to consider very carefully which statistical quantity is relevant for practical decision-making. The following is a textbook example4 of a game with positive expectation value in which the player typically loses money in the long run. In the limit of N → ∞ rounds, the expectation value is even infinite, while the risk of ruin is 100%. Should one take the bet if one had to commit to a game of (say) N = 10,000 rounds? If we agreed that one should not, we would seem to agree that rational expectations are based on typical rather than expected outcomes.

4 Figuratively and literally, see, e.g., Georgii (2004, Ex. 5.8); for a recent philosophical discussion in the context of typicality, see Maudlin (2020).


St. Petersburg Paradox (A game with infinite expectation value in which you will typically go bankrupt). Consider a biased coin for which the probability of heads is p ∈ (1/3, 1/2). The player starts with a positive capital of X_0 dollars. In each round, she doubles her capital if the outcome is heads but loses half if the outcome is tails. The expectation value for the (n + 1)’st round is thus

E(X_{n+1}) = p · 2E(X_n) + (1 − p) · E(X_n)/2 = (3p/2 + 1/2) E(X_n) = (3p/2 + 1/2)^{n+1} X_0 → +∞ for n → ∞.

However, after a large number N of rounds, the typical tallies of heads and tails obtained in total are approximately Np and N(1 − p), respectively, resulting in a capital of

X_N ≈ 2^{Np} × 2^{−N(1−p)} × X_0 = 2^{−N(1−2p)} X_0,

which tends to zero for N → ∞.
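A simulation sketch of the game (my own, assuming NumPy; the parameter choices p = 0.45, X_0 = 1, and N = 100 are illustrative) shows the divergence between expected and typical outcomes:

import numpy as np

rng = np.random.default_rng(4)
p, N, players = 0.45, 100, 100_000
heads = rng.binomial(N, p, size=players)            # number of heads for each simulated player
capital = 2.0 ** (2 * heads - N)                     # X_N = 2^heads * 2^-(N - heads), with X_0 = 1

print("theoretical E(X_N):", (1.5 * p + 0.5) ** N)   # (3p/2 + 1/2)^N, about 1e7 for these values
print("median (typical) capital:", np.median(capital))   # about 2^(-N(1-2p)) = 2^-10, i.e., roughly 0.001
print("sample mean capital:", capital.mean())

Typically, even the sample mean over 100,000 simulated players falls orders of magnitude short of the theoretical expectation, because that expectation is carried by extremely rare runs of heads; this is another face of the gap between expected and typical values.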

2.3 Subjective Probabilities and Propensities

The discussion so far has made use of one (and only one) philosophical principle relating very high probability to empirical predictions. This is a form of Cournot’s principle, which will be the focus of the upcoming chapter. Beyond that, I have omitted the question of what probabilities actually mean or refer to. The debate about the interpretation of probability spans centuries, and this book won’t do justice to either the extensive history or the variety of positions defended in contemporary literature. In principle, I believe that different notions of probability (when clearly distinguished) can peacefully coexist in separate contexts. However, in the context of natural science, especially physics, probabilities must somehow be connected to natural laws on the one side and empirical facts on the other. Before getting to my view of how this is achieved, I want to address two alternative notions of probability that will not be the focus of our further discussions.


2.3.1 Subjective Probabilities

The classical dialectic in the debate about the interpretation of probability is that of objective versus subjective conceptions, where the latter understand probability in terms of epistemic uncertainty or degrees of belief. Depending on how strongly one takes the relevant credences to be constrained by objective rationality principles and evidence, the term “subjective probabilities” may not be entirely accurate. However, other umbrella terms such as Bayesianism can get equally fuzzy around the edges. Subjective probabilities certainly have their place in everyday reasoning, even in some special sciences and areas of philosophy. When I say: “There is a 30% chance that my ex will respond to my text message,” I mean: “My credence that my ex will respond to my text message is (roughly) 30%.” My estimate may be based on objective factors and prior experience but ultimately expresses a personal degree of belief. (Hopefully, I’m not so desperate as to produce a statistically relevant sample of messages.)

Probabilities in physics serve an epistemic and behavior-guiding function, among other things. But first and foremost, physics should seek to explain why, as a matter of fact, certain statistical regularities obtain, be it in a series of coin tosses, among gas particles in a box, or in sophisticated scattering experiments. If our theory is able to explain and predict such objective regularities, I begin to see a rationale for assigning corresponding credences to individual events, say, the outcome of my next coin toss. I fail to see, however, what the dispersion of heat or the creation of an interference pattern in the double-slit experiment could have to do with anyone’s ignorance or degrees of belief.

Undoubtedly, explanations and predictions have a psychological dimension. We make inferences based on our best available theories, which lead us to expect certain future events or ease our wonderment about past observations. It is thus tempting to think that such inferences take epistemic or doxastic states as input and that we deduce knowledge or belief from prior knowledge or belief. This idea is wrongheaded, though. A physical explanation should be a statement about natural facts and laws.


The pertinent inferences will involve empirical evidence that we possess, but only in the form of physical macrostates, not mental states. The epistemic or doxastic dimension comes in at a later point and lies, strictly speaking, outside the purview of physics. It concerns the normative implications of physical/nomological facts, rationality principles that provide a link from objective physical results to rational expectations rather than from one epistemic state to another. Of course, to solve deterministic equations of motion for a physical system, we need to know the system’s initial state. But the prediction thus provided by the theory involves the deterministic laws and the state of the system—not the laws and our knowledge or beliefs about the state of the system. The content of the theory is “If the state at time t = 0 is X, then the state at T will be X(T),” not: “If you believe that the state at t = 0 is X, you (should) believe that the state at T will be X(T).” The latter statement might be in some sense entailed by the former, but it is not a statement about nature.

Yet, what would be easily recognized as a category mistake in the case of such initial value problems has become a popular way of thinking about physics when it deals with statistical phenomena. With the consequence, one could half-jokingly say, that some physicists are talking only about their own ignorance.

This is all to argue that subjective probabilities in physics would be off-target even if they made sense. A satisfying answer to why, say, the laws of thermodynamics hold cannot involve concepts like ignorance, credence, information, or belief. There is also reason to question whether it makes sense to represent an agent’s epistemic state as a probability measure on a continuum such as the phase space of a micro-physical theory. It would be quite the formidable agent, possessing exact credences—specified to infinitely many decimal places—for the entire sigma-algebra of events; uncountably many subsets of possible microscopic configurations, most of them not even definable in human terms.

To end on a more conciliatory note, let me mention the recent book by Jean Bricmont (2022), which gets a lot right in its attempted synthesis of subjective and objective probabilities. Bricmont’s objective Bayesianism, combined with the language of typicality, has much in common with the view that will be presented here. So much, in fact, that I think its subjectivist undertones can be removed without loss.


2.3.2 Stochastic Laws

Throughout our further discussions, I won’t say much about indeterministic theories or, more precisely, fundamentally stochastic laws that involve irreducible randomness. There are interesting philosophical questions about the probabilities—or maybe propensities (Popper, 1959)—that such laws would describe, in particular, the question of what they are actually doing in the world. I take these questions seriously—seriously enough that I don’t think indeterministic probabilities are easier to comprehend than deterministic ones—but they are of a different nature than the questions I want to focus on. The one important observation I do want to make is that, even if the fundamental laws of physics assigned probabilities to individual events, their empirical and explanatory import would ultimately be based on a form of typicality reasoning or Cournot principle of the negligible event. Indeed, the necessity of such a principle comes out particularly clearly in the indeterministic case.

Let us consider again the example of a series of N coin tosses (the reader may also think of spin measurements on a spin-1/2 particle if she prefers), conceived as intrinsically random events. This is to say that, for each trial, the fundamental laws of nature do not determine a unique outcome but only assign a probability of 1/2 to the possible outcomes 1 (heads or spin up) and 0 (tails or spin down). The possibilities for the first four iterations of the experiment are depicted in Fig. 2.2. We obtain a branching structure of possible histories with the laws determining the probabilistic weight of each branch. Any history, that is, any conceivable sequence of outcomes is possible and (in this case) equally likely. So in what sense is the law even informative? How does it make any empirical predictions at all?

We know the answer already: What we should take such a stochastic law to predict (and explain) is not any individual history but the empirical regularities to which it assigns a very high cumulative probability. In the present example, all possible branches have equal probability, but for large N, the set of branches in which the relative frequencies of 0’s and 1’s are approximately 1/2 sums up to a total probability that is very nearly one.
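To quantify this claim (my own sketch, assuming SciPy; the tolerance 0.01 is an arbitrary choice), one can add up the probabilistic weight of all branches whose relative frequency of heads lies within 1/2 ± 0.01:

from scipy.stats import binom

for N in (100, 1_000, 10_000, 100_000):
    lo, hi = int(0.49 * N), int(0.51 * N)
    weight = binom.cdf(hi, N, 0.5) - binom.cdf(lo - 1, N, 0.5)   # total weight of "typical" branches
    print(f"N={N:>7}: cumulative probability {weight:.6f}")

The cumulative probability climbs toward one as N grows, which is exactly the sense in which the stochastic law singles out typical frequencies.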

Fig. 2.2 Branching structure of possible histories after four coin tosses. The weights of the branches are probabilities assigned by a stochastic law

In other words, for the law to have any empirical implications, we must understand it as saying that this typical frequency of 1/2 will, in fact, be observed. Or, conversely, that an atypical distribution (such as 0 coming out on every single trial), though not impossible, will not obtain. If it did, we would—and should—consider the law to be empirically falsified. That is despite the fact that the law allows for possible worlds in which the nomic probabilities differ significantly from the instantiated distribution. From a modal realist perspective, those are worlds in which physicists would come to justified but incorrect conclusions about what the laws actually are, being “misled by a cosmic run of bad luck” as Maudlin (2007, p. 277) puts it. “But there is no insurance policy against bad epistemic luck. If the world came into existence just a few years ago, in the appropriate state (apparent fossils in place, etc.), we will never know it: Bad luck for us” (ibid.).


The issue that probabilistic laws would not (or barely) constrain nomic possibilities would be only more pronounced if we considered stochastic micro-dynamics, since the world’s micro-constituents evolving at random could realize any conceivable macro-history. The reader might have noticed that this is very much analogous to the situation I described at the very beginning for deterministic laws. Even if initial conditions determine a unique history of the world, these initial conditions are unknowable in practice and contingent in principle. Special microscopic conditions, however, could realize even the wildest macroscopic history. The difference between stochastic and deterministic dynamics is that the latter do not assign probabilities to possible histories. What is typical according to a deterministic law is what obtains for nearly all possible initial conditions. On the one hand, this raises an additional question about what measure we should use to quantify these possibilities.5 On the other hand, it rids us entirely of the problem of what the nomic probabilities are supposed to refer to in the world.

References

Bricmont, J. (2022). Making sense of statistical mechanics. Undergraduate Lecture Notes in Physics. Cham: Springer International Publishing.
Georgii, H.-O. (2004). Stochastik: Einführung in die Wahrscheinlichkeitstheorie und Statistik (2nd ed.). Berlin: De Gruyter.
Kac, M. (1959). Statistical independence in probability, analysis, and number theory. The Carus Mathematical Monographs. Washington, DC: Mathematical Association of America.
Markov, A. A. (1912). Wahrscheinlichkeitsrechnung. Leipzig, Berlin: B.G. Teubner.
Maudlin, T. (2007). What could be objective about probabilities? Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 38(2), 275–291.

5 In fact, this question might also arise in stochastic theories if the laws determine only transition probabilities but no probability distribution over initial states.


Maudlin, T. (2020). The grammar of typicality. In V. Allori (Ed.), Statistical mechanics and scientific explanation: Determinism, indeterminism and laws of nature. World Scientific.
Popper, K. R. (1959). The propensity interpretation of probability. The British Journal for the Philosophy of Science, 10(37), 25–42.
Taleb, N. N. (2010). The black swan: The impact of the highly improbable (2nd ed.). Random House Publishing Group.

3 Cournot’s Principle

Debates about the metaphysical interpretation of probabilities often lose sight of the more prosaic question of how to make a connection between probabilistic results and observable facts in the world. We are thereby faced with the following dilemma:

Only probabilistic facts follow from probability theory.
There are no genuinely probabilistic facts in the world (any possible event either occurs or not).
∴ No facts about the world follow from probability theory.

The second premise could be denied by admitting something like propensities into the physical ontology. But we can replace the word “facts” with “empirical facts” and end up with an analogous conclusion: logically, no empirical facts follow from probabilistic ones. In particular, we cannot deduce from probability theory that the frequency with which a repeatable event occurs will approach its probability; at best that it will do so with high probability. To avoid running in circles, we thus need some kind of bridge principle providing a link between probability theory


and the world. A principle that captures the extra-logical inference from probabilistic results to physical/empirical predictions. An obvious idea is to postulate a frequentist principle (FP) like:

(FP) If the probability of an event A is p, the event will occur approximately N × p times on a large number N of independent trials.

This has been called the empirical law of chance, a name that seems to go back to the influential Italian textbook by Castelnuovo (1919). While there is little disagreement that FP captures most of the empirical content of probabilities, it has several issues that put its viability as a fundamental principle in question. I would count neither its fallibility nor its vagueness among them (we may have to live with both when it comes to probabilistic predictions) were it not for the fact that we use the probability calculus to estimate what deviations of the relative frequency from the theoretical probability to expect for a given number of trials. This suggests that, as far as empirical predictions go, some qualified version of FP should have the status of a theorem rather than an axiom. The second major problem is that it is rarely clear, outside of more or less controlled “random experiments,” if and how an event can be embedded in an ensemble of “independent trials.” The final problem is that certain complex events surely cannot. In particular, the principle makes no sense on the universal level, i.e., when applied to a proposition about our universe as a whole. This may not worry the pragmatist but should disqualify FP for anyone who wants to understand, for instance, how statistical mechanics explains the thermodynamic arrow of time.

Andrey Kolmogorov, the father of modern probability theory, was very explicit about where he saw the connection between mathematical formalism and the world of experience. His philosophy and its intellectual-historical context are discussed in detail in the wonderful paper by Shafer and Vovk (2006). In Chapter 1, Section 2 of his Grundbegriffe (1933), aptly titled “Das Verhältnis zur Erfahrungswelt” (The connection to the


world of experience), Kolmogorov sets out two principles that are used to motivate and give flesh to the axioms of probability:

Under certain conditions, that we will not go into further here, we may assume that an event A that does or does not occur under conditions S is assigned a real number P(A) with the following properties:

A. One can be practically certain that if the system of conditions S is repeated a large number of times, n, and the event A occurs m times, then the ratio m/n will differ only slightly from P(A).

B. If P(A) is very small, then one can be practically certain that the event A will not occur on a single realization of the conditions S.

We recognize Principle A as a version of the frequentist principle (FP). Principle B is a version of Cournot’s principle (CP) that I implicitly applied throughout the previous discussion of probability theory. It is not entirely clear why Kolmogorov saw the need to include both since Principle A can be derived from Principle B by the law of large numbers! Part of the answer is that, even if Principle A is redundant, it “has an independent role in Kolmogorov’s story […] because it comes into play at a point that precedes the adoption of the axioms and hence the derivation of Bernoulli’s theorem [the law of large numbers]: it is used to motivate the axioms” (Shafer & Vovk, 2006, p. 92). Certainly, Kolmogorov was also mindful of the fact that relative frequency will approach probability under conditions for which we might not be able to prove a law of large numbers (i.e., in particular, establish good enough independence). But there is no indication that Postulate A is supposed to apply even in cases in which there is a high probability that m/n differs significantly from P(A).

This is to say that Cournot’s principle (CP) will ground the frequentist principle whenever the latter is justified. But CP is more general, more fundamental, and the only link to the world that we actually need.

3.1 Formulations of Cournot’s Principle

Cournot’s principle has been somewhat forgotten in modern times but has a long tradition in the philosophy of probability, with some version of it being endorsed by Hadamard, Fréchet, Lévy, and Borel, among others (see Martin (1996); Shafer and Vovk (2006) on the history of CP). An unfortunate historical fact is that the formulation provided by its namesake sounds plainly wrong, at least with today’s meaning of the operative term:

A physically impossible event is one whose probability is infinitely small. This remark alone gives substance – an objective and phenomenological value – to the mathematical theory of probability. (Cournot, 1843)

It is clear from his examples that by “physically impossible event,” Cournot did not mean an event forbidden by natural laws:1

We consider it physically impossible that a material cone remains in equilibrium on its apex; that an impulsion communicated to a sphere is exactly directed along a straight line passing through its centre and therefore does not lead to any rotation; that the centre of a disc falling on a floor covered with square tiles lands on the intersection of the diagonals of a tile; that an angle-measuring instrument is exactly centred; that a balance is rigorously exact; that a certain measure rigorously conforms to the standard, etc. (ibid.)

Interestingly, these are all examples in which it is evident from the geometry of the problem alone that, out of a continuum of possible configurations, only a vanishingly small and very particular subset would realize the events in question. This does not mean that these events cannot happen, but it explains why they don’t (cf. Lipton (2004, p. 31) for a similar kind

1 Doubling down on confusing terminology, Cournot refers to those as “mathematically or metaphysically impossible.”


of non-causal explanation). With regard to the rotating sphere, Cournot elaborates:

[I]f a sphere collides with a body moving in space, because of causes independent from the presence of that sphere in a certain place, it is physically impossible, and it never happens, that among the infinitely many possible directions of that body the causes of its motion lead to its exactly passing through the centre of the sphere. (ibid.)

So what Cournot means by “physically impossible” is very close to what we mean today by “atypical” in the non-probabilistic sense in which the term was introduced in Chap. 1. Not an impossible event, but one whose realization is opposed by an overwhelming majority of different physical possibilities.2 It means something objective, a fact about the nature of “the things themselves” (Cournot, 1843) that the probability calculus can refer to. However, Cournot does not spend much time defending or qualifying his claim that such events never actually happen. And to be fair, it is pretty much common sense until subjected to philosophical scrutiny and someone complains that any exact angle at which the moving body could hit the sphere would be just as “physically impossible.”

When Cournot’s principle was rediscovered in the twentieth century, it was usually (para)phrased as the principle that an event with very small probability will not happen, often with the qualification that the respective event must be picked out in advance on a single trial. It is also in this form that the principle was later mocked, to great effect, by subjectivists like de Finetti (1974) since, taken at face value, it still runs into the obvious objection that very unlikely events are not impossible and some do, in fact, happen. We already saw a more careful formulation in the work of Kolmogorov: If an event has a very low probability, “then one can be practically certain that the event will not occur.” Roughly equivalent (by contraposition): if an event has a very high probability, we should expect it to occur. Other

2 Cournot’s examples are special (and particularly intuitive) in that the relevant possibility space can be reduced to a simple geometric degree of freedom. In general, we have to think in terms of a more abstract phase space and adapt the notion of (relative) sizes of sets accordingly.


authors have cast CP in more decision-theoretic terms, for instance: If an event has very high probability, “we should act as if [its] occurrence was certain” (Richter (1966, p. 52); my translation). What such formulations begin to grasp is the normative character of Cournot’s principle, its status as a rationality principle. What they omit—but is crucial—are the implications of an atypical event occurring after all. For even though a probabilistic model or theory is not logically inconsistent with such occurrences, they are generally the kind of events that call for additional explanation and thus challenge our theoretical assumptions. It is not impossible for a fair die to land on six 80 times on 100 trials, but the rational conclusion would be that the die is loaded. Hence, my proposed formulation of Cournot’s principle mirroring the rationality principles we set out for typicality:

CP Reformulated
Let T be a (probabilistic) model or theory and A a possible event. We shall say that T predicts A if it assigns to it a probability very close to 1. Cournot’s principle then consists in the following set of axioms:
(i) If T predicts A and A occurs, then T explains the occurrence of A.
(ii) If T predicts A and one accepts T, then one should expect A to occur.
(iii) If T predicts A and ¬A occurs, then one should not accept T (but revise, amend, or reject it) unless ¬A is an acceptably brute fact in the sense of Sect. 1.4, i.e., if it belongs to a natural partition of the probability space with respect to which every possible outcome is atypical.

The caveat regarding acceptably brute facts is in lieu of the methodological rule that A should be picked out in advance. Both deal with the “measure zero problem” that every possible outcome is atypical with respect to a sufficiently fine partition of the elementary event space. If we reason about a possible outcome picked out before observation or before performing the relevant experiment, it will generally not be a brute fact belonging to an extremely fine-grained partition. This methodological rule is, however, of limited scope since a theory can also be challenged by the unanticipated discovery of pre-existing facts.
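To attach a number to the loaded-die example above (a sketch of my own, assuming SciPy is available), one can compute the probability, under the fair-die hypothesis, of landing on six at least 80 times in 100 trials:

from scipy.stats import binom

print(binom.sf(79, 100, 1 / 6))   # roughly 1e-43: far below any conventional rejection threshold

So axiom (iii) licenses rejecting the fair-die model rather than shrugging off the outcome.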


My favorite example of an application of Cournot’s principle comes from Martin Scorsese’s movie Casino (1995). Ace Rothstein, played by the great Robert DeNiro, is a gambling expert hired to manage a Las Vegas casino controlled by the mafia. In a pivotal scene of the movie, he gets into the following exchange with his employee (Don Ward) in charge of overseeing the slot machines. I apologize in advance for reproducing the colorful language.

Ace Rothstein: Four reels, sevens across on three $15,000 jackpots. Do you have any idea what the odds are?
Don Ward: Shoot, it’s gotta be in the millions, maybe more.
A.R.: Three f***in’ jackpots in 20 minutes? Why didn’t you pull the machines? Why didn’t you call me?
D.W.: Well, it happened so quick, three guys won; I didn’t have a chance…
A.R. [interrupts]: You didn’t see the scam? You didn’t see what was going on?
D.W.: Well, there’s no way to determine that…
A.R.: Yes there is! An infallible way, they won!
D.W.: Well, it’s a casino! People gotta win sometimes.
A.R. [grows more irritated]: Ward, you’re pissing me off. Now you’re insulting my intelligence; what you think I am, a f***in’ idiot? You know goddamn well that someone had to get into those machines and set those f***in’ reels. The probability of one four-reel machine is a million and a half to one; the probability of three machines in a row; it’s in the billions! It cannot happen, would not happen, you f***in’ momo! What’s the matter with you?

3.2 On the Rationality of Cournot’s Principle

3.2.1 Moral Certainty

Cournot’s principle stands in the philosophical tradition of moral certainty, which describes a degree of certainty that falls short of absolute metaphysical/logical/mathematical certainty but must nonetheless be considered sufficient for practical purposes or the purposes of a particular field of inquiry. The distinction goes back to Aristotle, who explains that it would be unreasonable to hold moral philosophy to the same standard of proof as mathematics (Nicomachean Ethics 1094b). Thus, in the literal sense, moral certainty refers to the degree of certainty with which moral


truths can be known. In the more general and relevant sense, it means as much as beyond a reasonable doubt, where unreasonable doubt amounts to a violation of epistemic and sometimes even ethical norms, albeit not the laws of logic. This also comes out clearly with Leibniz, who writes:

Certainty might be taken to be knowledge of a truth such that to doubt it in a practical way would be insane; and sometimes it is taken even more broadly, to cover cases where doubt would be very blameworthy (N.E. 445, quoted after Leibniz (1765/1982))

While Leibniz already invoked probabilistic notions, the connection with a mathematical theory of probability is first made explicit in Jakob Bernoulli’s seminal Ars Conjectandi (1713). Bernoulli defines something as “morally certain if its probability is so close to certainty that the shortfall is imperceptible” and “morally impossible if its probability is no more than the amount by which moral certainty falls short of complete certainty.” He goes on to explain:

Because it is only rarely possible to obtain full certainty, necessity and custom demand that what is merely morally certain be taken as certain. It would therefore be useful if fixed limits were set for moral certainty by the authority of the magistracy—if it were determined, that is to say, whether 99/100 certainty is sufficient or 999/1000 is required…

Both the pragmatic and normative aspects of probability reasoning are discernible in this statement. As is the problem of vagueness, i.e., of fixing a threshold value for “moral certainty.” I will say more about this soon.

3.2.2 Does Nature Have to Obey Cournot’s Principle?

Émile Borel referred to the principle that “events with sufficiently small probability don’t happen” as “the only law of chance” (Borel, 1948). While this characterization is, again, too strong if read as claiming the impossibility of atypical events, Borel’s “law of chance” always had a multifaceted meaning as a bridge law, a rationality principle, and an empirical generalization that beautifully captures the subtle status of


Cournot’s principle. But these many facets also contain a tension between the factual and normative aspects of the principle. If CP amounted only to the rule that very unlikely events rarely happen (this is sometimes called the weak Cournot principle), it would not ground the belief that what we observe in a particular case is not one of these rare or unusual instances. But we make this assumption all the time, not least when we infer theoretical probabilities from observed frequencies. On the other hand, if CP were only an epistemic norm—a claim about what we should believe or expect—would we not lose the link to the physical world? There it is, the usual dialectic between the objective and epistemic nature of probability. Here, it essentially comes down to the question: Who has to abide by Cournot’s principle, rational agents or nature itself?

The synthesis comes with the role that CP plays in the dynamics of theory acceptance and rejection, most explicitly in the practice of statistical hypothesis testing: If the probability of an observation O under a hypothesis H is very low, we reject the hypothesis. Somewhat oversimplified, P(O | H) is the infamous p-value. The standard convention in special sciences sets the threshold for rejecting a “null-hypothesis” in a single study at p = 0.05 (and it is currently debated whether this value is too large); in particle physics, the “gold standard” is 5 Gaussian standard deviations or roughly 3·10^(−7). But no matter how huge the sample or how often an experiment is reproduced, no p-value, however small, would make it impossible for a true hypothesis to be falsely rejected (called a type I error in statistics). The upshot is that rational scientists would never accept a theory that makes the relevant phenomena atypical—even if that theory were, in fact, true.

Whether we have to admit the possibility that our world is an atypical “model” of the true theory depends, to a large extent, on our metaphysical views about the laws of nature. The Humean best system account, which regards laws as optimal summaries of contingent regularities in the world (see Chaps. 5 and 16), offers a convenient way out: A probabilistic law can get it wrong some of the time (i.e., assign a very low probability to an event that actually happens), but it cannot be wrong a lot of the time, or else it would not be part of (or deducible from) the best systematization of the world. I agree with this view when it comes to effective probabilistic models as opposed to “fundamental”


laws of nature, but I am a modal realist in regard to the latter. For me, it is not a conceptual truth that the actual world does not correspond to an atypical instantiation of the laws but a foundational belief of its scientific investigation. I admit the possibility of a “type I error” on the universal scale in the sense that I admit the possibility of many other skeptical scenarios. But in its deepest and most abstract sense, Cournot’s principle—and typicality reasoning, in general—expresses a necessary trust in the rationality and comprehensibility of nature, a trust that “God is subtle but he is not malicious,” as Einstein put it.3 In the end, every good scientist accepts the rationality of the scientific method while also being aware of the possibility of error. For Humeans, this epistemic humility is only due to limited data (if she knew all concrete physical facts, a sensible physicist could not be wrong about the laws). To my mind, it is the right attitude toward nature, regardless of observational limitations. In practice, it won’t make much of a difference since we are, in fact, limited beings.

3 From Oxford Essential Quotations, 5th ed., Ratcliffe (2016).

3.2.3 Black Swans and Pascal’s Wager

While philosophers are prone to worry about the possible, no matter how implausible, overconfidence is the cardinal sin of the practitioner when it comes to probabilistic forecasts. Thus, it has been convincingly argued (e.g., in Silver (2012); Taleb (2010)) that many catastrophic prediction failures, especially in economic, social, and environmental sciences, come from neglecting the possibility of low-probability but high-impact events (so-called black swans): market crashes, political revolutions, environmental disasters, etc. This lesson may seem to go against the rationality of Cournot’s principle—the principle of the negligible event, as it has been called—and favor decision-making based on expected utility.

A few remarks are in order here. First, pragmatic considerations may very well figure in choosing the threshold for “very low probability” and thus identifying the range of typical versus negligible outcomes. If the stakes are high, the bar for applying CP should also be set high. Second, one must be careful in identifying the relevant events to which CP is applied. Even the cunning investor who successfully bets on rare events that the market tends to underestimate is following CP: She is betting on the typical regularity that unexpected market events happen occasionally. Finally, we saw with the St. Petersburg paradox that, in the most extreme cases in which typicality and expected utility pull in opposite directions, it is fairly obvious that typicality wins out as a guide to rational decision-making.

In the end, the difference between a cautious forecaster and a paranoic (or delusionist) is that the former will apply CP eventually. If we could never neglect the possibility of extreme events based on their minuscule probability alone, we would constantly have to buy into Pascalian wagers of sorts since there is essentially no upper bound on the magnitude of conceivable catastrophes (or windfalls).

3.2.4 The Lottery Paradox and Rational Belief

As a normative bridge principle grounding predictions and explanations, Cournot’s principle has a similar status to David Lewis’s “Principal Principle” (Lewis, 1980), which we will discuss in more detail in Chap. 5. In a nutshell, the Principal Principle postulates that we should align credences, i.e., degrees of belief, with the objective physical probabilities. Lewis thus assigns the same epistemic and behavior-guiding function to all probability values, whereas it is characteristic of the view associated with Cournot that statements of “very high” and “very low” probability are privileged with respect to their physical content and normative implications. If one wants to conceive of Cournot’s principle in doxastic terms, one could say that it does not tie probabilities to degrees of belief but “very high probability” to belief simpliciter. The problem that then arises is that rational belief is usually assumed to be closed under conjunction,

$$\mathrm{Bel}(A) \wedge \mathrm{Bel}(B) \Longrightarrow \mathrm{Bel}(A \wedge B), \qquad (3.1)$$
while typicality or “very high probability” is not (unless it referred only to events of measure 1, which would make CP of very limited practical use). Clearly, the probability of $A_1 \cap A_2 \cap \ldots \cap A_n$ could get arbitrarily small with increasing n, even if that of each $A_i$ is very large. We are thus facing the infamous lottery paradox (Kyburg, 1961): It is very likely for any lottery ticket to be a loser. Hence, I should believe that ticket 1 will lose, that ticket 2 will lose, …, and that ticket N will lose. But then, by (3.1), I should believe “ticket 1 will lose, ticket 2 will lose, …, and ticket N will lose”—which is certainly false if one of the tickets will be drawn for sure. Such cases in which a conjunction of typical facts ceases to be typical, or even becomes atypical, threaten to make Cournot’s principle inconsistent with normal doxastic logic. A possible reaction is to deny that rational belief—or at least the notion of expectation associated with CP—is closed under conjunction. This was, in fact, the original point of the lottery paradox, whose author later taunted the insistence on (3.1) as “conjunctivitis” (Kyburg, 1970). The preface paradox (Makinson, 1965) hits a similar note. In essence, it seems perfectly reasonable to believe that at least one of your beliefs is false. If we think in terms of explanations, which were central to our formulation of CP, it is equally plausible that, based on nuclear physics, no further explanation is required for the fact that plutonium atom 1 hasn’t decayed within the past hour, that plutonium atom 2 hasn’t decayed within the past hour, and so on, but that no atoms decaying within 1 kg of Pu would very much call for additional explanation (maybe we have discovered a new isotope?). We face, of course, a “sorites problem” (what threshold amount of Pu would make no decay “puzzling”?), but we established at the outset that probabilistic reasoning is vague. So the bullet of rejecting (3.1) is not a particularly hard one to bite. However, what degrees of belief (coupled with probabilistic logic) get right, and what comes closer to solving the paradoxes rather than biting a bullet, is that we tolerate different levels of uncertainty in different contexts.
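A rough back-of-the-envelope calculation makes the plutonium example quantitative. This is only a sketch under assumptions of my own (Pu-239, a half-life of about 24,100 years, a 1 kg sample); the text itself does not fix these numbers.

```python
import math

# Illustrative assumptions (not from the text): Pu-239, half-life ~24,100 years,
# a 1 kg sample, molar mass ~239 g/mol.
half_life_hours = 24_100 * 365.25 * 24
decay_prob = math.log(2) / half_life_hours        # per atom, per hour (~3.3e-9)
atoms = 1000 / 239 * 6.022e23                     # ~2.5e24 atoms in 1 kg

# For any single atom, "no decay within the hour" is utterly typical:
print(f"P(one atom survives the hour) ≈ {1 - decay_prob:.12f}")

# For the conjunction over all atoms, ln P = atoms * ln(1 - p) ≈ -atoms * p:
log10_p_no_decay = -atoms * decay_prob / math.log(10)
print(f"log10 P(no atom in 1 kg decays) ≈ {log10_p_no_decay:.2e}")   # about -3.6e15
```

Each individual non-decay is as typical as can be, while the conjunction over a macroscopic sample has a probability whose decimal expansion begins with trillions of zeros—exactly the kind of fact that would call for additional explanation.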


CP and the Stability Theory of Belief

Hannes Leitgeb (2014, 2017) tackled the problem of reconciling normal doxastic logic and degrees of belief (satisfying the axioms of probability) with the Lockean thesis (Foley, 1992): There exists a threshold $r \geq \frac{1}{2}$ such that any proposition A is believed if and only if the credence in A is at least r. Formally,

$$(LT)\quad \mathrm{Bel}(A) \Longleftrightarrow P(A) \geq r. \qquad (3.2)$$

In a nutshell, Leitgeb shows that both (3.2) and (3.1) can be maintained for the price of admitting that belief is context-sensitive, i.e., that the threshold value r is relative to P and a (countable) set $\Pi$ of relevant propositions partitioning our probability space. In his stability theory of belief, there then exists, in any given context, a unique proposition $B_W$ of stably high probability ($P(B_W \mid A) > \frac{1}{2}$ for all A with $A \cap B_W \neq \emptyset$) such that $\mathrm{Bel}(B) \Longleftrightarrow B_W \subseteq B$. Obviously, $B_W = \bigcap_{\mathrm{Bel}(A)} A$, and r can be set equal to $P(B_W)$. In general, the finer the partition $\Pi$, the larger r has to be chosen to obtain a consistent set of beliefs. For instance, in the scenario of the lottery paradox, one may care either about the outcome that participant 1 wins ($\Pi = \{\{w_1\}, \{w_2, \ldots, w_N\}\}$) or about the outcome that someone wins ($\Pi = \{\emptyset, \{w_1, \ldots, w_N\}\}$), but it is harder to imagine a realistic decision problem that would require us to represent both at once in the same logical algebra. The first partition $\Pi$ is relevant to the gambler, who had better act as if his ticket is not going to win (instead of buying a boat in anticipation of his imminent fortune). The second is relevant to the lottery company, which had better be prepared to pay out the jackpot. Although the connection is rarely made (neither Foley nor Leitgeb mentions Cournot), there are obvious parallels between the Lockean thesis and Cournot’s principle. The two major differences are that

1. LT starts out with subjective probability, while CP is usually concerned with the epistemic implications of objective probability assignments, such as probabilities derived from a scientific theory without taking anyone’s epistemic state as input.


2. While LT requires only $r \geq \frac{1}{2}$, this is too generous for appealing to CP. In terms that are nowadays more commonly associated with law, a threshold probability of $\frac{1}{2}$ would correspond to the standard of a “preponderance of evidence,” while CP is about certainty “beyond a reasonable doubt.”

These are the main reasons why Leitgeb’s stability theory of belief does not carry over one-to-one for our purposes. Still, there is precedent and independent motivation for regarding Cournot’s principle as context dependent. In general, we do not care about all possible events $A \subset \Omega$, at least not all at once. Especially in statistical mechanics, when $\Omega$ is microphysical phase space, most measurable subsets are just arbitrary collections of microstates that do not correspond to any meaningful macro-event. Instead, a specific context of inquiry will be associated with a limited set of macroscopic variables and propositions that lead to the relevant partition of $\Omega$. And in each context, there can be a different $\delta_j$ such that CP applies when

$$P(A) > 1 - \delta_j, \quad A \in \Pi_j. \qquad (3.3)$$

In principle, $\Pi$ and $\delta$ can be balanced such that this condition is closed under logical conjunction and CP will cohere with Leitgeb’s theory of belief. In particular, there then exists a strongest typicality fact entailing all others.4 Often, different contexts will be associated with different scales of time, area, or sample size. The probability of an earthquake with a Richter magnitude $\geq 10$ occurring next week in Sacramento is negligible (given the absence of seismic indicators for one); the probability of such an earthquake hitting California within the next 100 years or so is not. In general, different scales will be relevant for different epistemic agents, e.g., individual property owners as opposed to insurance companies or government regulators.

4 Cf. our Proposition 3 in Chap. 6.


In a scientific context—when one could say that the relevant epistemic agent is the scientific community—we generally care about robust phenomena and statistical regularities (facts that are not acceptably brute). It is in such contexts that a theory really sticks its neck out and says that atypical events will not happen—or else we should reject the theory. The class of phenomena that theories are tasked with explaining will, however, differ across scientific disciplines, and the threshold for typicality seems to be different, in an interesting hierarchical way, for regularities falling under the purview of fundamental physics or the various special sciences.5 We will return to this point in Chap. 15.

Borel actually made proposals for the orders of magnitude characterizing negligible probabilities:

• $p < 10^{-6}$ on the individual human scale,
• $p < 10^{-15}$ on the terrestrial scale,
• $p < 10^{-50}$ on the cosmic scale.

He argues, for example: “In the ordinary conduct of his life, every man usually neglects probabilities whose order of magnitude is less than $10^{-6}$, that is, one millionth, and we will even find that a man who would constantly take such unlikely possibilities into account would quickly become a maniac or even a madman” (Borel, 1939, p. 6). Despite occasional excursions into more worldly territory, we are mostly concerned with what Borel describes as negligible probabilities on the super-cosmic scale and associates with “the demands of scientists and philosophers.” Those are “probabilities that are to be assessed by a power of ten whose negative index is in excess of one million, and may even be as high as billions. Such are the probabilities encountered in the kinetic theory of gases and in thermodynamics; they are also those involved in Jeans’ Miracle (the water that is placed in a burning-hot oven and changes into ice) or in the typewriting miracle (the exact reproduction, by pure chance, of a thousand-page volume, or even of a library)” (Borel, 1963, p. 120).
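To see that Borel’s “super-cosmic” exponents are the right order of magnitude for the typewriting miracle, here is a minimal sketch with illustrative figures of my own for the page count, characters per page, and alphabet size:

```python
import math

# Illustrative assumptions: a 1000-page volume with ~2000 characters per page,
# typed by choosing uniformly at random from an alphabet of 64 symbols.
pages, chars_per_page, alphabet = 1000, 2000, 64
total_chars = pages * chars_per_page

# Probability of reproducing the exact text by pure chance: (1/64)^total_chars.
log10_probability = -total_chars * math.log10(alphabet)
print(f"P(exact reproduction) = 10^({log10_probability:,.0f})")
# The negative exponent is about 3.6 million -- "in excess of one million,"
# which is the regime Borel describes.
```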

5 Which is in line with what the stability theory of belief would suggest if a more specialized science corresponds to a lower level of detail and hence larger partition cells.


References

Bernoulli, J. (1713). Ars Conjectandi. Impensis Thurnisiorum Fratrum.
Borel, É. (1939). Valeur pratique et philosophie des probabilités. Paris: Gauthier-Villars.
Borel, É. (1948). Le hasard. Paris: Presses universitaires de France.
Borel, É. (1963). Probability and certainty. London: Walker.
Castelnuovo, G. (1919). Calcolo delle probabilità. Albrighi, Segati & C.
Cournot, A. A. (1843). Exposition de la théorie des chances et des probabilités. L. Hachette.
de Finetti, B. (1974). Theory of probability: A critical introductory treatment (Vol. 1). New York: Wiley.
Foley, R. (1992). The epistemology of belief and the epistemology of degrees of belief. American Philosophical Quarterly, 29(2), 111–124.
Kolmogoroff, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der Mathematik und ihrer Grenzgebiete, 1. Folge. Berlin: Springer-Verlag.
Kyburg, H. E. (1961). Probability and the logic of rational belief. Wesleyan University Press.
Kyburg, H. E. (1970). Conjunctivitis. In M. Swain (Ed.), Induction, acceptance, and rational belief (pp. 55–82). Dordrecht: D. Reidel.
Leibniz, G. W. (1982). New essays on human understanding (abridged ed.). Cambridge: Cambridge University Press.
Leitgeb, H. (2014). The stability theory of belief. Philosophical Review, 123(2), 131–171.
Leitgeb, H. (2017). The stability of belief: How rational belief coheres with probability. Oxford: Oxford University Press.
Lewis, D. (1980). A subjectivist’s guide to objective chance. In W. L. Harper, R. Stalnaker, & G. Pearce (Eds.), IFS: Conditionals, belief, decision, chance and time, The University of Western Ontario Series in Philosophy of Science (pp. 267–297). Dordrecht: Springer Netherlands.
Lipton, P. (2004). Inference to the best explanation (2nd ed.). London: Routledge.
Makinson, D. C. (1965). The paradox of the preface. Analysis, 25(6), 205–207.
Martin, T. (1996). Probabilités et critique philosophique selon Cournot. Librairie Philosophique J. Vrin.
Ratcliffe, S. (Ed.). (2016). Oxford essential quotations (Vol. 1). Oxford University Press.
Richter, H. (1966). Wahrscheinlichkeitstheorie. Number 86 in Grundlehren der mathematischen Wissenschaften (2nd ed.). Berlin: Springer-Verlag.
Shafer, G., & Vovk, V. (2006). The sources of Kolmogorov’s Grundbegriffe. Statistical Science, 21(1), 70–98.
Silver, N. (2012). The signal and the noise: Why so many predictions fail—but some don’t. New York: Penguin.
Taleb, N. N. (2010). The Black Swan: The impact of the highly improbable (2nd ed.). Random House Publishing Group.

4 A Typicality Theory of Probability

This chapter will introduce our proposed interpretation of deterministic probabilities as typical relative frequencies. In the language of typicality, this approach was laid out by Goldstein (2012) and expanded in the textbooks of Dürr et al. (2017) and Dürr and Lazarovici (2020, Chap. 3) that our discussion will partly follow. Hubert (2021) defends a very similar view under the name typicality frequentism. In addition, many landmark works in the history of probability could be claimed as precedents—one may even make a case that Kolmogorov’s axiomatization of modern probability theory was proposed with a similar view in mind—but it would be a matter of historical debate how far these claims can go. It is also in this chapter that we begin to decouple the notion of typicality from that of “very high probability” and free it from unnecessary connotations of ignorance or chanciness. What is typical is simply what obtains in the overwhelming majority of possible cases. This gives Cournot’s principle an immediate intuitive appeal that further validates its fundamental status.


4.1 The Coin Toss

We start from the paradigmatic example of a “random experiment,” the repeated tossing of a fair coin. A sequence of coin tossings, say of length $n = 1000$, can be viewed as a 0–1-sequence, where 0 stands for heads and 1 for tails. Now consider the following mathematical facts:

1. The total number of possible 0–1-sequences of length 1000 is $2^{1000}$.
2. The number of sequences of length $n = 1000$ with exactly k heads is $\binom{n}{k}$.
3. The values of this combinatorial factor for different k are shown in Table 4.1. We care, in particular, about the relative number of sequences with $k \approx 500$ (equal distribution of 0 and 1) versus those with $k \ll 500$.

We see that $\binom{1000}{300}$ differs from $\binom{1000}{500}$ by a huge factor of $10^{36}$. More generally, the number of sequences with a roughly equal distribution of 0’s and 1’s is overwhelmingly greater than the number of sequences with a distinctly uneven distribution. In fact, we can readily estimate from the bottom row of the table that sequences with $k \in [500 \pm 50]$ make up nearly all possible sequences; those with fewer than 450 1’s or 0’s contribute almost nothing to the total number. Thus, all sequences are different; some consist almost entirely of 0’s, others contain twice as many 1’s, etc. However, among the set of possibilities, we find a typical regularity that comes with large numbers (the large number here being $n = 1000$): in nearly all possible sequences, 0 and 1 (heads and tails) appear with roughly equal frequency. This equidistribution is typical. Evidently, this is just a pedestrian version of the law of large numbers. Noteworthy is that we did nothing but count.

Table 4.1 Absolute and relative number of 0–1-sequences of length $n = 1000$ with k zeros. The given values are approximate. Note that $\binom{n}{k}$ is symmetric about $n/2$

  k                              100         200        300        400        450       480       500
  $\binom{1000}{k}$              ~10^139     ~10^215    ~10^263    ~10^290    ~10^297   ~10^299   ~10^299
  $\binom{1000}{k}/2^{1000}$     ~1/10^161   ~1/10^85   ~1/10^37   ~1/10^11   ~1/10^4   ~1/100    ~1/40
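Since the argument involves nothing but counting, it can be checked directly with exact integer arithmetic. The following sketch (not part of the original text) reproduces the ratio $\binom{1000}{500}/\binom{1000}{300} \sim 10^{36}$ and the claim that sequences with $k \in [500 \pm 50]$ make up nearly all of the $2^{1000}$ possibilities:

```python
from math import comb

n = 1000
total = 2 ** n

# Ratio between the even and a markedly uneven distribution (cf. Table 4.1):
print(comb(n, 500) // comb(n, 300))      # a 36-digit number: the factor of ~10^36

# Fraction of all 0-1-sequences whose number of heads lies in [450, 550]:
near_even = sum(comb(n, k) for k in range(450, 551))
print(near_even / total)                 # ~0.9986 -- nearly all sequences

# Fraction with fewer than 450 (or, by symmetry, more than 550) heads:
print(1 - near_even / total)             # ~0.0014 -- almost nothing
```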

Another typicality fact is that nearly all possible sequences are irregular. This is to say that they look nothing like $0101010101\ldots$, but that the occurrences of 0 and 1 seem unpredictable. There are different ways to make this precise, e.g., in terms of complexity, but a simple argument is the following: Let’s say we have $2^6 = 64$ orthographic symbols at our disposal, including mathematical symbols and a blank character. Then there are $\sum_{k=0}^{m} (64)^k \approx 2^{6m}$ possible sentences of length m or less, compared to $2^n$ possible 0–1-sequences of length n, which is a far greater number if $n \gg m$. Therefore, the vast majority of sequences do not allow for any description that is significantly shorter than the sequence itself. Some form of irregularity or unpredictability is characteristic of what we would call “random” behavior, and here we see that (apparently) random behavior is itself a typical phenomenon. Notably, this is true regardless of whether the process producing the sequence is deterministic or intrinsically stochastic. A stochastic coin-toss law can produce very regular sequences like $0101010101\ldots$ but only with very low probability. And a deterministic law can typically produce very irregular sequences that pass all statistical tests for randomness. (A concrete example will be discussed below.) This is also why the question of whether our world is, in fact, deterministic or indeterministic can never be settled on empirical grounds alone.

Irregular behavior in deterministic systems is closely related to the concept of chaos or dynamical instability. For actual coin-toss experiments, we intuitively know what we have to do to produce “random” outcomes: The coin must be sent spinning and whirling, enough to make the result unpredictable, because the smallest change in the initial (angular) momentum imparted to the coin by the tossing hand can lead to a different outcome, from heads to tails or vice versa. The motion of the coin becomes chaotic, in the sense that small “causes” can have large effects. This is very important for the appearance of random behavior, not just because of practical unpredictability but also because chaotic dynamics tend to produce some form of statistical independence that allows the law of large numbers to take effect.
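The counting argument for irregularity in the paragraph above is equally easy to make concrete. A small sketch (illustrative values of n and m of my own choosing) bounds the fraction of 0–1-sequences that admit a short description:

```python
n, m = 1000, 100   # sequence length; maximal description length (64 symbols each)

descriptions = sum(64 ** k for k in range(m + 1))    # ~ 2^(6m) possible descriptions
sequences = 2 ** n                                   # possible 0-1-sequences

# Each description can pick out at most one sequence, so the fraction of
# sequences admitting a description of length <= m is at most:
print(descriptions / sequences)                      # ~ 4e-121
```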


4.1.1 Normal Numbers as a Model for Coin Tossing

Simple counting, as we just did to determine what is typical, won’t do the job if we want to take the coin toss seriously as a physical process whose outcome is determined by dynamical laws and initial conditions. The initial conditions for the coin are positions and (angular) momenta, continuous variables for which we can no longer count the possibilities. So how can we say which possibilities are many and which are few if there is an infinite number either way? By using a measure on the continuum, a typicality measure. In a somewhat realistic physical analysis, this would be a measure on phase space—which is 12-dimensional if we consider the coin as a Newtonian rigid body and roughly $10^{24}$-dimensional if we consider it microscopically, as a collection of particles. Naturally, such an analysis is very difficult to perform (virtually impossible in the microscopic case). Instead, we shall consider a mathematical model that is highly instructive and easy to analyze in a rigorous way. We will not smuggle in any “randomness” due to external perturbations but conceive our model universe as a closed system in which coin tosses occur as a perfectly deterministic process. Let $\Omega$ denote the physical state space of the system. For any initial condition $x \in \Omega$ (at some fixed time $t_0$), we obtain a sequence of coin-tossing outcomes that is completely determined by x. The results of the individual tosses are given by coarse-graining functions on $\Omega$ mapping an initial condition x to the value 0 or 1 that represents the outcome of the i’th toss. In our model, we take $\Omega = [0, 1)$ so that $x \in [0, 1)$ is a real number in the unit interval. We are now looking for functions (“random variables”) that map this interval to the value set $\{0, 1\}$ and capture the characteristic features of coin tossing, in particular, the idea of statistical independence. The coarse-graining must produce this independence in interplay with a natural measure. When probability theory was being worked out as a mathematical discipline, a central question was whether there are natural examples of such coarse-graining functions or whether one would have to rely on contrived ad hoc constructions. The following realization, though somewhat forgotten in modern days, proved to be a huge stepping stone.

Fig. 4.1 Area under the graphs of the Rademacher functions $r_k$ for $k = 1, 2, 3$. The pre-images are half-open intervals

Definition 4 (Rademacher Functions) Represent each $x \in [0, 1)$ in binary form:

$$x = 0.x_1 x_2 x_3 \ldots, \quad x_k \in \{0, 1\}, \quad x = \sum_{k \geq 1} x_k 2^{-k},$$

so that $x_k \in \{0, 1\}$ is the k-th digit in the binary expansion of x.1 Now for $k \in \mathbb{N}$, consider the functions

$$r_k : [0, 1) \to \{0, 1\}; \quad x \mapsto x_k, \qquad (4.1)$$

mapping the real number x to its k’th binary digit. These are called Rademacher functions.

The first three Rademacher functions are sketched in Fig. 4.1. We can use these functions to model coin tossing (cf. Kac 1959, Chap. 2). Think of $r_k(x)$ as representing the outcome of the k’th coin toss determined by the solution of the equations of motion for the initial conditions x (plus coarse-graining as we care only about which side of the coin is facing up). The binary expansion of the real numbers thus plays the role of the deterministic physical laws.

1 With the convention that we use $0.10000\ldots$ instead of $0.01111\ldots$ etc., when the expansion is ambiguous.


The Rademacher functions capture our intuitive understanding of independence. The values of $r_k(x)$ for $k \leq n$, i.e., the first n binary digits of x, tell us something about x (e.g., $r_1(x) = 1 \Rightarrow x \in [1/2, 1)$) but imply nothing about the $(n+1)$-th binary digit or any digit after that. The precise mathematical concept of statistical independence, however, is not a feature of the macro-variables or the dynamics alone but is defined in terms of a measure.

As macro-variables (i.e., coarse-graining functions), the Rademacher functions partition their domain $\Omega$ into “cells,” the pre-image sets $r_k^{-1}(\delta)$, $\delta \in \{0, 1\}$, corresponding to the set of initial conditions leading to the outcome $\delta$ on the k’th trial. In this case, we have a particularly clear intuition about the size or content of these cells: They are just disjoint unions of intervals, and the content of an interval $[a, b)$ is its length $\lambda([a, b)) := b - a$. This leads to the construction of the Lebesgue measure $\lambda$ on $\mathbb{R}^n$ and then to the general mathematical concept of measures, but the prototype of all measures is the intuitive content. With respect to this natural measure, it is straightforward to check that

$$\lambda\left( r_k^{-1}(\delta_k) \cap r_l^{-1}(\delta_l) \right) = \lambda\left( r_k^{-1}(\delta_k) \right) \lambda\left( r_l^{-1}(\delta_l) \right) = \frac{1}{4}, \quad \forall k \neq l, \; \delta_{k,l} \in \{0, 1\}. \qquad (4.2)$$

This product structure (4.2) defines statistical independence, and we see that it can indeed arise as a natural feature of coarse-graining variables. The independence of the Rademacher functions can almost literally be seen from Fig. 4.1. The coarse-graining yields a very distinct partition; the pre-images of $r_k$ for different k mix or intertwine in an extremely orderly fashion to realize (4.2). This is the ideal case, the paradigmatic example of statistical independence. In more realistic physical models, the “mixing” would be much less clean and harder to picture, let alone prove.
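The product structure (4.2) can be verified by elementary interval counting. The following sketch (the helper function is my own, not from the text) computes the Lebesgue measure of the relevant pre-images exactly from the dyadic interval structure:

```python
from fractions import Fraction

def preimage_measure(digits):
    """Exact Lebesgue measure of {x in [0,1) : x_k = d for every (k, d) in digits},
    where x_k is the k-th binary digit of x."""
    K = max(k for k, _ in digits)
    count = 0
    for j in range(2 ** K):                      # dyadic intervals [j/2^K, (j+1)/2^K)
        if all(((j >> (K - k)) & 1) == d for k, d in digits):
            count += 1
    return Fraction(count, 2 ** K)

# Single digits: lambda(r_k^{-1}(d)) = 1/2 for every k and d.
print(preimage_measure([(3, 1)]))                                 # 1/2

# Pairs of distinct digits: the measures multiply, as in (4.2).
print(preimage_measure([(2, 1), (5, 0)]))                         # 1/4
print(preimage_measure([(2, 1)]) * preimage_measure([(5, 0)]))    # 1/4
```

Because the pre-images are finite unions of dyadic intervals, the computation is exact; no sampling or approximation is involved.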

Law of Large Numbers for Rademacher Functions

With this groundwork, it is a standard mathematical exercise to derive a law-of-large-numbers result for the Rademacher functions. We consider again the empirical mean

$$m^n_{\mathrm{emp}}(x) := \frac{1}{n} \sum_{k=1}^{n} r_k(x),$$

i.e., the relative frequency of 1’s in the first n binary digits of $x \in [0, 1)$. Due to statistical independence, we can directly apply (2.6) to obtain

$$\lambda\left( \left\{ x \in [0, 1) : \left| m^n_{\mathrm{emp}}(x) - \frac{1}{2} \right| > \epsilon \right\} \right) \leq \frac{1}{4 n \epsilon^2}, \quad \forall \epsilon > 0. \qquad (4.3)$$

We can phrase this result in various ways:

• The set of $x \in [0, 1)$ for which 1 and 0 do not appear roughly equally often in the binary expansion has negligible measure.
• For the overwhelming majority of $x \in [0, 1)$, the relative frequency of 1’s and 0’s is approximately $1/2$.
• A relative frequency of $m^n_{\mathrm{emp}} \approx \frac{1}{2}$ is typical in $\Omega$.
• We can call this typical value $p = \frac{1}{2}$ of $m^n_{\mathrm{emp}}$ the probability of “heads” (and, analogously, “tails”), which corresponds to the intuitive Laplace probabilities for coin tossing.

Let us make sure that the transfer from this mathematical model to the relevant physical situation is clear. Each $x \in \Omega$ corresponds to a possible initial condition, i.e., a nomologically possible world, instantiating an outcome sequence of heads and tails that is uniquely determined by the physical dynamics. For some initial conditions, almost all the coins land on tails; for others, heads come out with much higher frequency. However, nearly all possible initial conditions manifest the statistical regularity that the relative frequency of heads and tails in a long series of tosses is approximately $1/2$. This is an objective “non-random” fact about the possible worlds allowed by the physical laws, just as the typical distribution of binary digits is an objective fact about real numbers. In standard textbook terminology, one would call

$$P(k, \delta) := \lambda\left( r_k^{-1}(\delta) \right) \qquad (4.4)$$

“the probability” of the outcome $\delta \in \{0, 1\}$ on the k’th trial. But what would be the point of using a term with so much philosophical baggage for something as prosaic as the size of intervals? It also bears emphasizing that while (4.4) is a natural mathematical concept—technically the image measure of $\lambda$ under the random variable $r_k$—it played no role in the final analysis and the relevant statement about typical relative frequencies. One might object that (4.3) is nonetheless a mathematical consequence of $P(k, 1) = P(k, 0) = \frac{1}{2}$, $\forall k$, i.e., of the Lebesgue measure assigning equal weight to the sets of initial conditions leading to the outcomes “heads” and “tails” on individual trials. This is correct in the sense that our simple derivation based on (2.6) exploited the fact that the Rademacher functions are i.i.d. (independent and identically distributed) under $\lambda$. But this is only a sufficient—not a necessary—condition for a law of large numbers to hold. Indeed, many measures other than the Lebesgue measure would make it true that $\mu\left( \left| m^n_{\mathrm{emp}}(x) - \frac{1}{2} \right| > \epsilon \right) \approx 0$ for large n, i.e., agree on the typical relative frequencies while assigning different weights to the individual pre-image sets. In other words, typical relative frequencies—which we identify as the physically relevant probabilities—are very robust against variations of the typicality measure. Mathematically, it is possible to define measures on $[0, 1)$ with respect to which “typical” relative frequencies are very different from $1/2$. But those are measures that differ radically from $\lambda$—in the limit $n \to \infty$, they would have to be singular measures, i.e., concentrated on Lebesgue null sets—and could hardly be mistaken for sensible typicality measures. Mathematically, it is also possible to put a delta-measure on $x = 0$ and say that almost all numbers are identically zero, but aside from technical jargon, this is merely an abuse of language. While the relevant typicality result depends in no way on the details of the Lebesgue measure, it is a particularly nice feature of this measure that the typical empirical distribution coincides with the theoretical expectation $\mathbb{E}(m^n_{\mathrm{emp}})$ and even the theoretical distribution (4.4) of the individual events $r_k$. This “statistical transparency” (Goldstein, 2012) is a non-trivial feature of the “dynamics” that make our random variables statistically independent and identically distributed under $\lambda$. The natural typicality measures we use in physics often exhibit statistical transparency
(in ideal situations). This is part of what makes them so natural. But it distinguishes these particular measures only as representatives of a large class of typicality measures that are all empirically equivalent.

Biased Coins

Evidently, somewhere in our model, the assumption must have entered that the coin is “fair.” That was by using the binary expansion of numbers (together with the Rademacher functions) to model the physical dynamics, not by using $\lambda$ as our typicality measure. A biased coin—say, one with an uneven distribution of mass—would correspond to a different evolution from initial conditions to outcomes and thus to a different partition of $\Omega$. The important point is that probabilities (typical relative frequencies) other than $1/2$ arise from a difference in the physical dynamics, not by choosing different typicality measures.2 Special “macroscopic” boundary conditions can also lead to different typical regularities. If we impose the boundary condition $x < 2^{-500}$ (so that $r_k(x) = 0$, $\forall k \leq 500$), the first $n = 1000$ digits of the binary expansion would typically contain about three times as many 0’s as 1’s. This corresponds to what we would call a non-equilibrium situation in statistical mechanics, and in a very simplified sense, we can even see convergence to equilibrium in this model: the relative frequencies of 0 and 1 start out in non-equilibrium, with an overpopulation of 0’s, and typically approach the equidistribution as n increases toward infinity.

We have, of course, a strong a priori intuition about the tossing of a fair coin. The symmetry of the coin should manifest itself in equal probabilities for heads and tails. Often, a principle of indifference is invoked to make this connection, as if the statistical equidistribution of heads and tails comes about because we have insufficient reason to prefer one side of the coin over the other. In fact, the connection between the symmetry of the coin, the symmetries of the physical laws, and the approximately equal frequencies of heads and tails is made by typicality. The physical symmetries are statistically manifested in typical models of the theory. In Sect. 5.3, we will see that a crucial argument for the naturalness of the uniform typicality measure on classical phase space is, in fact, not its uniformity per se but its invariance under the symmetries of Galilean spacetime, matching those of the dynamical laws.

2 To adapt our toy model accordingly, consider the b-adic expansion of $x \in [0, 1)$, i.e., $x = \sum_{k \geq 1} x_k b^{-k}$ for any $b \geq 2$, and the macro-variables $\tilde{r}_k(x) = 0$ if $x_k \in \{0, \ldots, m-1\}$ and $\tilde{r}_k(x) = 1$ if $x_k \in \{m, \ldots, b-1\}$, for any $0 \leq m < b$. The typical relative frequency of “heads” (i.e., $\tilde{r} = 0$) is then $\approx \frac{m}{b}$, where typicality is still understood with respect to the Lebesgue measure.
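The b-adic model from footnote 2 above is easy to explore numerically. The sketch below (function name and parameters are my own) uses the fact that sampling x uniformly from $[0, 1)$ is equivalent to sampling its base-b digits independently and uniformly; the observed frequency of “heads” then comes out near $m/b$, as claimed:

```python
import random

def typical_heads_frequency(n, b, m):
    """Relative frequency of 'heads' among the first n base-b digits of a
    uniformly sampled x in [0,1); a digit counts as heads if it is < m
    (this plays the role of the macro-variable r~_k from the footnote)."""
    # Sampling x with respect to the Lebesgue measure is equivalent to sampling
    # its base-b digits independently and uniformly from {0, ..., b-1}.
    digits = (random.randrange(b) for _ in range(n))
    return sum(d < m for d in digits) / n

random.seed(1)
print([typical_heads_frequency(n=10_000, b=10, m=3) for _ in range(5)])
# each entry is close to m/b = 0.3, the typical relative frequency of heads
```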

4.2 Typical Frequencies

The theory that has emerged from our discussion relates probabilities to relative frequencies but is different from traditional frequentism or hypothetical frequentism. Slightly oversimplified, frequentists try to define the probability p of a (repeated) event as

$$p = \frac{1}{N} \sum_{i=1}^{N} X_i, \qquad (4.5)$$

while hypothetical frequentists consider the limit of infinitely many (hypothetical) trials

$$p = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} X_i. \qquad (4.6)$$

The theory proposed here understands probability as typical relative frequency, that is,

$$\mu\left( \left| \frac{1}{N} \sum_{i=1}^{N} X_i - p \right| > \epsilon \right) \approx 0, \qquad (4.7)$$

where $\mu$ is a typicality measure and $\epsilon$ is a small non-negative number.


Evidently, the conceptual distinction between typicality and probability measures (which will be further elaborated on) is essential here; otherwise, (4.7) would be circular as a definition of probability. Probabilities, however, refer to statistical regularities, while the typicality measure is defined on sets of possible (initial) micro-conditions and only used to identify which statistical regularities obtain for an overwhelming majority of them. Just like the Lebesgue measure on the set of reals, it has nothing to do with frequencies, credences, or any particular element being picked out “at random.” This is indeed essential if we refer to the initial conditions of the universe, as any account of objective probabilities ultimately has to if it is not to beg the question. There are various objections against finite and hypothetical frequentism—Hájek (1996, 2009) formulates 15 against each. But what makes them complete non-starters for a fundamental physical analysis is that there are no meaningful frequencies—not even hypothetical ones—when we speak about the universe as a whole.3

Returning to the definition of probabilities as typical relative frequencies, we see that (4.7) will, in general, not determine a unique number p but a small range $[p - \epsilon, p + \epsilon]$ of typical frequencies. It is only in the theoretical limit $n \to \infty$ that we can expect the typical value to become sharp. I consider this to be a feature rather than a bug. Ultimately, most interpretations of probability fail to make sense of exact real numbers as more than a convenient mathematical abstraction. What matters, however, is that the axioms of probability are satisfied in an appropriate sense. Here, we find that typical relative frequencies are positive (since $\frac{1}{N} \sum_{i=1}^{N} X_i \geq 0$) and that the typical relative frequency of the sure event is one (since $\frac{1}{N} \sum_{i=1}^{N} 1 = 1$). We must, however, insist that the reported typicality result should be as strong as possible, i.e., we should say that the probability of the sure event is one rather than “approximately 0.9999999” (which is technically true but silly).

3 For a comprehensive discussion of how typicality frequentism avoids the main objections against finite and hypothetical frequentism, I refer the reader to Hubert (2021).


Finally, typical relative frequencies for mutually exclusive events A and B are additive in the following sense:4

$$\mu\left( \left| \frac{1}{N} \sum_{i=1}^{N} X_{A_i} - p \right| > \epsilon_A \right) = \delta_A \approx 0, \quad \mu\left( \left| \frac{1}{N} \sum_{i=1}^{N} X_{B_i} - q \right| > \epsilon_B \right) = \delta_B \approx 0$$
$$\Rightarrow\; \mu\left( \left| \frac{1}{N} \sum_{i=1}^{N} X_{A_i \vee B_i} - (p + q) \right| > \epsilon_A + \epsilon_B \right) \leq \delta_A + \delta_B \approx 0. \qquad (4.8)$$

If we called $(p - \epsilon)$ and $(p + \epsilon)$ the “lower” and “upper probabilities,” respectively, this would be familiar from theories of imprecise probabilities. However, while most such theories refer to subjective probabilities, I submit that objective physical probabilities are (except in idealized limits) unsharp. As we saw in our discussion of the law of large numbers, the range of typical frequencies tends to get narrower with increasing sample size N. Physics deals with very robust phenomena (huge N) and thus very precise probabilities, while more specialized sciences usually study regularities with fewer instances, for which the relevant probabilities are less sharp. Notably, this is not just an epistemic or methodological claim but a physical one. The typical relative frequencies that can be grounded in the fundamental laws of nature are much less sharp for macroeconomic than for thermodynamic regularities.

4 This follows from the fact that $X_{A_i \vee B_i} = X_{A_i} + X_{B_i}$ for mutually exclusive events and thus $\left| \frac{1}{N} \sum_{i=1}^{N} X_{A_i \vee B_i} - (p + q) \right| \leq \left| \frac{1}{N} \sum_{i=1}^{N} X_{A_i} - p \right| + \left| \frac{1}{N} \sum_{i=1}^{N} X_{B_i} - q \right|$.

The Significance of Theoretical Limits

In most situations in which we speak of probability, the ensemble size N is not exactly known or even fixed.

For instance, what do we have in mind when we speak of the probabilities for coin tossing: $N = 1000$, $N = 10^6$, or maybe N = # coin tosses actually occurring throughout the history of our universe? For the colloquial use, it doesn’t really matter. As long as N is reasonably large, the typical relative frequencies are approximately $1/2$. However, if we want to be more precise about “typical” and “approximately,” we need at least a ballpark figure for the ensemble size. Could we also say that the typical relative frequencies for coin tossing are approximately 0.499 and 0.501? We could, but only when referring to a coin-toss experiment in which the sample size is not too large. In contrast, the typical limit frequencies for $N \to \infty$, here $\frac{1}{2}$, are distinguished by the fact that they are a good reference point for the range of typical values for arbitrarily large N. Following Goldstein (2012), we could call these limits theoretical probabilities, as opposed to physical probabilities, though only if this is not misunderstood as introducing two distinct philosophical concepts. The theoretical limits are just a useful means to identify typical relative frequencies, both in the technical-mathematical sense—when they simplify proofs or calculations—and in the pragmatic-linguistic sense—when theoretical probabilities are used as shorthand for a small range of nearby frequencies.

Speaking of theoretical limits, we should not let (4.6) stand. From probability theory, a convergence of the empirical mean as in (4.6) can only be obtained as a typicality result. In particular, in the sense of the weak law of large numbers (“convergence in probability”)

$$\forall \epsilon > 0: \; \lim_{N \to \infty} \mu\left( \left| \frac{1}{N} \sum_{i=1}^{N} \chi_i - p \right| > \epsilon \right) = 0, \qquad (4.9)$$

or in the sense of the strong law of large numbers (“almost sure convergence”)

$$\mu\left( \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \chi_i = p \right) = 1. \qquad (4.10)$$

With the latter, we are really in the realm of mathematical abstraction, though. Unless infinite ensembles are physically possible, there is no reference class of possible worlds for which the event $\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \chi_i = p$ makes sense.

4.3 Probabilities for Singular Events

The role of a typicality measure is not to assign an exact value to every possible event or phase space region but only to identify “very large” and “very small” sets of initial conditions. Probabilities, on the other hand, refer to statistical regularities rather than singular events. These statistical regularities are predicted and explained by the physical laws that make them typical. The typicality account thus explains what objective probabilities mean, why they are compatible with determinism, and how they are grounded in the fundamental laws. One may, however, wonder how this view could make sense of a statement like: “The probability of event A: My dog gets sick, given B: He eats the piece of chocolate I dropped on the floor is p.” Well, to the extent that this conditional probability is meant to express more than a subjective credence, we must interpret the statement by embedding the respective singular events into a statistical ensemble. For instance, this and that fraction of dogs get sick if they eat this or that amount of chocolate per kilogram body weight. It also seems possible to decompose the events into a finer-grained description that is then part of a statistical regularity, e.g., the rate at which a dog’s intestinal tract can metabolize theobromine or, finer still, the interaction rates of certain molecules. I am convinced that the intuition that singular macro-events could or should have an objective physical probability, in addition to a deterministic micro-description, comes from such possibilities of embedding or decomposition—which are, notably, non-unique and always require further context and analysis. This non-uniqueness—the fact that there can be many plausible ways to embed an event into a statistical ensemble—is the root of the so-called reference class problem (see Hájek (2006) for a detailed discussion). But it
is only a fundamental impasse (as opposed to a practical complication) if one insists on a notion of objective probability that pertains to individual events. According to the typicality view, (deterministic) physical probabilities always refer to typical ensembles. One and the same event may indeed be part of multiple statistical regularities, and which one we find most informative or take to guide our actions will depend on pragmatic considerations, not least on which regularities we are able to identify in the first place. If I kept track of the initial orientation of a coin and were sensitive enough to narrow its initial angular momentum down to a small interval $[L_1, L_2]$, I would be able to place the respective coin flip in a statistical ensemble whose typical frequencies might differ from $1/2$. Conceptually, this would pose a problem if we tried to associate a physical probability with each individual coin flip. There is, however, no contradiction between the statements “the typical relative frequency of heads in a long series of coin tosses is $1/2$” and “the typical relative frequency of heads in a long series of coin tosses with initial angular momentum $L \in [L_1, L_2]$ is $1/3$.” Notably, no promise is made that using the additional information leads to a more accurate prediction for that particular trial, but it will typically pay off in the long run.

4.3.1 Rational Credences from Statistics

One might still worry that the typicality theory is doing too little by being, so to speak, unopinionated about most singular events. It may be true that the laws of physics don’t predict probabilities for, let’s say, the next presidential election, but some pollsters and political scientists certainly do. One could understand these probabilities subjectively, and the role of professional forecasters is to provide educated guesses. But if there is an objective sense in which such predictions can be more justified or less, it should be at least partially grounded in physical facts. To see how, we must take a brief look at what pollsters actually do besides guesswork. Here is an idealized example:


Setup We imagine an exit poll for a presidential election. Every single voter is assigned a number from 1 to N, and a pollster picks a sample of 1000 participants using a random number generator.

Result Out of the 1000 participants, 480 respond that they voted for the party A candidate, while 520 voted for the candidate of party B.

Fair Sampling Hypothesis Each voter had an equal and independent chance of being polled. Hence, for each interview, the probability of picking a party A voter is equal to the actual share $p = \frac{k}{N}$ of votes that party A has received.5

Mathematical Fact Under the above assumption, the sampling can be described as a Bernoulli process with an unknown probability p. From the polling result, one can then compute a probability of approximately 0.8 that the actual percentage of party A voters lies below 50%.

Prediction We should believe with 80% confidence that the candidate of party A has lost the popular vote.

It is important to note that the probabilistic nature of the prognosis comes only from the “fair sampling hypothesis.” The chances it refers to can be interpreted as rational credences, but ones justified by the typical regularity that the number generator produces fair samples in the long run (that is, all possible numbers with roughly equal frequency and in “random” order). What the poll, together with the mathematical theory of probability, accomplishes is then to base a credence for a complicated singular event—the outcome of a presidential election—on a typical regularity concerning relatively simple and regular events—the outputs of a random number generator. This is as far as the physical basis of the election prognosis goes. To my mind, it is far enough.

5 I am ignoring the difference between sampling with or without “replacement,” which is negligible if the population is much larger than the sample size. In practice, of course, one wouldn’t allow for the same voter to be picked more than once so that the surveyable population is shrinking.
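A small simulation sketch (illustrative numbers and helper names of my own; not the pollster’s actual procedure) makes the role of the fair sampling hypothesis explicit: the election outcome is a fixed fact, and all the probabilistic spread in the prognosis comes from the random selection of the sample.

```python
import random

def exit_poll(true_share, sample_size):
    """Fair sampling: each interviewee is a party-A voter with probability true_share."""
    return sum(random.random() < true_share for _ in range(sample_size))

random.seed(3)
true_share = 0.48        # the actual (fixed!) share of party-A votes -- illustrative
polls = [exit_poll(true_share, 1000) for _ in range(10_000)]

# How often would a poll like the observed one (480 or fewer A-voters) occur,
# and how often would a poll wrongly suggest that A won the popular vote?
print(sum(p <= 480 for p in polls) / len(polls))   # roughly 0.5 for true_share = 0.48
print(sum(p > 500 for p in polls) / len(polls))    # roughly 0.1: a poll can mislead
```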


Needless to say, this oversimplified example doesn’t do justice to the craftsmanship involved in designing a good poll or statistical model under realistic conditions.6 What it demonstrates, however, is that there is no need for a physical probability associated with the election outcome per se, nor would such a probability do us any good. My point is in opposition to “imperialistic” (Loewer, 2012) views, like the Humean Mentaculus, according to which statistical-mechanical probabilities exist for any conceivable macro-event, and every probabilistic forecast should aim at matching those chances. The imperialistic view is doing both too much and too little. On the one hand, it insists on physical probabilities for singular events that have no empirical content or relation to actual scientific practice. On the other hand, it is of little help in explaining how some forecasts not based (solely) on statistical mechanics can nonetheless be justified. The typicality view is more modest about the scope of physical probabilities. But sometimes modesty truly is a virtue.

6 I also avoid being dragged into the religious war about Bayesian vs. non-Bayesian statistics any deeper than by saying that the natural allies of the typicality view are among the non-Bayesians.

References

Dürr, D., Froemel, A., & Kolb, M. (2017). Einführung in die Wahrscheinlichkeitstheorie als Theorie der Typizität. Berlin: Springer.
Dürr, D., & Lazarovici, D. (2020). Understanding quantum mechanics: The world according to modern quantum foundations. New York: Springer International Publishing.
Goldstein, S. (2012). Typicality and notions of probability in physics. In Y. Ben-Menahem & M. Hemmo (Eds.), Probability in physics. The Frontiers Collection (pp. 59–71). Berlin: Springer.
Hájek, A. (1996). “Mises redux”—Redux: Fifteen arguments against finite frequentism. Erkenntnis, 45(2), 209–227.
Hájek, A. (2006). The reference class problem is your problem too. Synthese, 156, 563–585.
Hájek, A. (2009). Fifteen arguments against hypothetical frequentism. Erkenntnis, 70(2), 211–235.
Hubert, M. (2021). Reviving frequentism. Synthese, 199(1), 5255–5284.
Kac, M. (1959). Statistical independence in probability, analysis, and number theory. The Carus Mathematical Monographs. Washington, DC: Mathematical Association of America.
Loewer, B. (2012). The emergence of time’s arrows and special science laws from physics. Interface Focus, 2(1), 13–19.

5 The Mentaculus: Typicality Versus Humean Chances

But that there is no science of the accidental is obvious; for all science is either of that which is always or of that which is for the most part. — Aristotle, Metaphysics (1027a)

The objective probabilities we encounter in the world are the results of physical processes. Grounding them in the fundamental micro-physical laws is thus the task of statistical mechanics, broadly construed. Of course, in the narrower sense, we think of statistical mechanics as being primarily concerned with thermodynamic regularities, from gas laws to the second law of thermodynamics and the entropic arrow of time. But the key foundational question is then, nonetheless, where and how probabilities enter the fundamentally deterministic picture. For the following discussion, I will focus on classical statistical mechanics, based on Newtonian microdynamics, which provides the general blueprint.
In anticipation of Part II of this book, we note that there is a reasonably widespread agreement that the following holds true as a mathematical statement (see, e.g., Albert 2000; Bricmont 1995; Carroll 2010; Goldstein 2012; Lazarovici and Reichert 2015; Penrose 1989): There exists a small (low-entropy) region $M_0$ in the phase space $\Gamma$ of the universe such that the uniform Liouville measure1 $\lambda$ assigns high weight to initial conditions in $M_0$ which lead to micro-evolutions instantiating the thermodynamic regularities (in particular, the second law of thermodynamics) and other salient regularities (about coin tosses, stone throws, etc.) that we observe in the world. That is, if we denote this set of “good” initial conditions by $M_0^* \subset M_0$, it holds true that $\lambda(M_0^*)/\lambda(M_0) \approx 1$. In recent lectures, which I had the pleasure of attending, David Albert called this the fundamental theorem of statistical mechanics (FTSM), a name I will gladly borrow, even though the FTSM is not literally a theorem in the sense of a rigorously proven result.

Some people will find it preposterous to refer to the initial conditions of the universe in order to account for something like the motion of a rock or the cooling of a cup of coffee. Well, in practice, we don’t. In principle, however, even the best-isolated subsystem is part of a larger system with which it has at some point interacted. Hence, if we make postulates about the initial conditions of individual subsystems, we commit redundancy and risk inconsistency.2 Any attempt at a fundamental account must therefore speak about the universe as a whole. An important question is, of course, why the mathematical statement seems so compelling despite being practically impossible to prove for anything more than highly simplified and idealized models. This question will be addressed in Chap. 8, which discusses Boltzmann’s statistical mechanics in detail. Here, I shall focus on the physical and philosophical interpretation of the FTSM (assuming its validity), in particular, on the meaning and status of the measure figuring in it.

1 If we can conditionalize on the constant total energy, the relevant measure is, more precisely, the induced microcanonical measure on the energy surface.
2 To adopt an expression from John Bell (2004, p. 166).


David Z Albert (2000, 2015) and Barry Loewer (2007, 2012) have developed a popular and well-worked-out view in the context of the Humean best system account of laws (BSA), adapting David Lewis’s theory of Humean chances (Hoefer, 2019; Lewis, 1980, 1994; Loewer, 2001, 2004). In a nutshell, the BSA regards the laws of nature as the best systematization of contingent regularities in the world, “best” in terms of striking an optimal balance between simplicity and strength (informativeness) in summarizing the “Humean mosaic.” According to Albert and Loewer, the best system laws of our world consist in

1. The deterministic microscopic dynamics.
2. The Past Hypothesis postulating a low-entropy initial macrostate of the universe (see Chap. 11 for a detailed discussion).
3. A probability measure $P = \frac{\lambda}{\lambda(M_0)}$ on the Past Hypothesis macro-region $M_0$.

This probability measure does not refer to any propensities or intrinsically random events in the world. It is a particular kind of bookkeeping device whose inclusion in the best systematization is justified by the fact that it comes at relatively little cost in simplicity but makes the system much more informative, precisely because it accounts—via the FTSM—for the laws of thermodynamics, the entropic arrow of time, and many other macroscopic regularities. Loewer introduced the name “Mentaculus” for this best system candidate, a reference to the Coen brothers’ movie A Serious Man (2009), in which an oddball character is trying to devise “a probability map of the entire universe.” As a philosophical proposal, the Mentaculus (oddball or not) is certainly appealing as it promises to reconcile objective probabilities with deterministic micro-dynamics. Based on Humean supervenience, it allows us to include a probability law while avoiding metaphysical worries about what the respective probabilities are supposed to do in the world. Like all Humean laws, they don’t do anything; they only describe. Moreover, the Mentaculus provides the basis for a sophisticated account of counterfactuals, records, and even compatibilist free will (Loewer, 2020), the details of which are beyond the scope of this chapter.


In contrast, the view defended in this book is that the Liouville measure on the initial macro-region should be understood as a typicality measure. It is not an additional law of nature, but pertains to a way of reasoning about the laws. Its technical role is to determine “very large” and “very small” sets of possible initial conditions, but no physical or epistemic meaning is attached to the exact numbers that it assigns to various phase space regions. The FTSM is thus interpreted as a typicality statement rather than a probabilistic one. It simply says that the relevant regularities obtain in nearly all possible worlds (consistent with the dynamical laws and the Past Hypothesis). The notion of probability is applied to typical statistical regularities in the world, but not to the fundamental measure on the phase space of the universe. Despite these disagreements, the two views share a lot of common ground. They agree, in particular, on the Boltzmannian approach to statistical mechanics, how it reduces macroscopic regularities to microscopic laws via the FTSM, and the very limited role that subjectivist notions like ignorance or information play in that story. The disagreements concern primarily the meaning and status of the phase space measure and the scope of objective physical probabilities in general. This is a good starting point for a comparative analysis3 that allows me to further elaborate on the typicality view by contrasting it with the interpretation of the phase space measure as a Humean chance. An obvious contrast I have to draw right away is that the latter is tied to a particular metaphysics of laws—Humean supervenience and the BSA—while typicality is not. The point of this chapter is not to litigate Humeanism in general. Instead, I will argue that one should embrace typicality even as a Humean, while there are additional motivations if one holds an anti-Humean view of natural laws.

3 See also Lazarovici (2023) and, in response, the comments of David Albert in the same volume.

5.1 Typicality Versus Humean Chances

When one asks a Humean to explain the regularity theory of chance in five minutes, one will likely hear something along the following lines:4 In our world, we find an irregular pattern of coin-toss outcomes. Providing a complete list of every single outcome would be very informative but not at all simple. Saying that some outcomes are heads and some are tails would be simple but not at all informative. The statement that the probability of heads and tails is 50% strikes an optimal balance between simplicity and strength. It summarizes the statistical pattern by telling us that both outcomes occur in irregular order but with a relative frequency of $1/2$ throughout the history of the world.

Fair enough, but in the Mentaculus theory, probabilities mean something different. First and foremost, the probability $P(A)$ of an event A (at time t) is the value that the fundamental probability measure $P$ assigns to the set $\Phi_{0,t}(A) \cap M_0$ of initial micro-conditions that evolve under the dynamical laws to realize the respective event (cf. Albert 2015, p. 8). The probability measure is thus supposed to contain an enormous amount of information, far beyond the summary of statistical patterns. In fact, it will assign a (conditional) probability to any physical proposition about the world: a probability that my dog gets sick if he eats a piece of chocolate, or that your favorite football team wins the next Super Bowl (given the current state of the NFL), or that the United States elect a female president in 2028 (given the current state of American politics). The epistemic and behavior-guiding function of these predictions is then based on a normative principle, the Principal Principle (PP), which posits that we should align our initial credences with these objective Humean probabilities. Formally:

$$C(A \mid P(A) = x) = x, \qquad (5.1)$$

where C is the credence function, or, for conditional probabilities,

$$C(A \mid B \wedge P(A \mid B) = x) = x. \qquad (5.2)$$

4 David Albert has a great way of telling this story, but I couldn’t do it justice if I tried to reproduce it verbatim.

There are other variants of the PP proposed in the literature and debates about what constitutes “admissible information” that one can conditionalize on (Hall, 1994, 2004; Lewis, 1994; Loewer, 2004), but these subtleties will not be relevant here. The best system account contains at least a hint (that some authors have tried to turn into a rigorous argument, see, e.g., Hoefer (2019, Chap. 4)) as to why we should follow the Principal Principle. Since the probability measure $P$ is part of the best systematization of the world, its outputs are bound to be accurate—maybe not perfectly so, but as much as the trade-off with simplicity will permit. This would start to make sense if it were clear what Humean chances tell us about the world in the first place. When the Mentaculus predicts, let’s say, a (conditional) probability of 30% for the United States electing a female president in 2028, how does the measure 0.3 assigned to a set of initial conditions evolving into a female president summarize what actually happens? The Lewis–Loewer theory agrees, after all, that there are no genuinely probabilistic facts in the world. Every possible event either occurs or does not, and whether it does is entailed by initial conditions and deterministic dynamics. So what are such single-case probabilities supposed to inform us about? It seems obvious to me that a Humean law can inform us only about facts that it supervenes on in the first place. If it is more accurate to assign a probability of 30% than of 60%, there must be concrete physical facts in the world that make it so. And these facts must be relevant for evaluating the strength or fit of a probability measure as it competes for a place in the best system. In other words, assigning a probability of 30% to the particular event must be part of what makes that system best. However, a great many probability measures would assign a probability close to 1 to the thermodynamic and other statistical regularities, yet a chance very different from 0.3 to the United States electing a female president in 2028. By some standard, these measures might not be as simple as the Liouville measure—which is why they are not elevated to a Humean law—but this does not make them less reliable predictors of the 2028 election.


Some authors have read Lewis as suggesting that the probability law is supposed to fit the macro-history of the world by assigning as high a probability as possible to any event that actually occurs and as low a probability as possible to any event that does not occur (while being constrained by the criterion of simplicity). This would explain why the Humean chance of an event provides the best possible guess about whether that event occurs. But it cannot really work this way. In competing for the best system, being a good predictor of presidential elections does not gain you as many points for "strength" as predicting the second law of thermodynamics. And assigning a probability of 1/2 to individual coin tosses (which may look like the law is completely undecided about their outcomes) is actually informative because it implies a very high probability for relative frequencies ∼1/2 in the long run. (The complex event is much more informative than any singular outcome.) At the end of the day, the best system probability law will be one that informs us about robust regularities and global patterns in the world—by assigning to them a probability near one—while the fit to singular events will count for little to nothing in the trade-off with simplicity. In sum, the Humean regularity theory holds that there are certain "chancemaking patterns" (Lewis, 1994) on which a measure supervenes, while "probabilities" for a great many other events—in fact, for all measurable subsets of phase space, most of which do not even correspond to meaningful macro-events—come out for free. I submit that these chancemaking patterns are what the measure actually predicts (by making them typical!), while the numbers assigned to any other odd events have no physical meaning and play no legitimate epistemic role (except to the extent that they provide a guess for typical frequencies).

An interesting Humean response is that the probability of a singular event is meaningful (at least in the sense of possible worlds semantics) because P(A) = x is true in all and only those worlds whose best systematization implies P(A) = x. The probability may not express anything about the event A per se, but there is something about the structure of the mosaic as a whole that makes this particular value true (Lewis, 1980). Actually, though, it is only the criterion of simplicity that would make such probability assignments into theorems of the best system since many other probability measures would systematize the same regularities. In particular, if, given the dynamical laws and the Past Hypothesis, two probability measures P and P̃ are equivalent in terms of strength while P̃ loses out in terms of simplicity, there is no possible world in which P̃ replaces P as part of the best systematization. (Unless the standard of simplicity is oddly contingent on the details of our universe.) Therefore, a proposition like "the best system probability of event A is P(A) = x rather than P̃(A) = y" does not restrict the set of possible worlds any further than to those instantiating the typical regularities on which P and P̃ agree. Plainly put, both measures (and many others) have the same physical content because single-case probabilities have none.

5.1.1 A True Regularity Theory of Chance

In a nutshell, Humean chances are supposed to be efficient summaries of statistical regularities. Then they turn out to refer, first and foremost, to a measure on sets of possible initial micro-conditions of the universe. What has one to do with the other? In most cases, nothing at all (is exactly my point). In many relevant cases, however, the connection between a statistical pattern instantiated by a series S = (A_i)_{1≤i≤N} of similar events and the probability P(A_i) = p of the individual events that make up the pattern is provided by a law of large numbers (LLN), e.g., a result of the form

P( { x ∈ M_0 : | (1/N) ∑_{i=1}^{N} 1_{A_i}(x) − p | > ε } ) ∝ 1/(ε²N) ≈ 0.    (5.3)

1_{A_i}(x) is the characteristic function mapping each possible initial microstate x to 1 if the event A_i occurs (for the micro-history with initial condition x) and to 0 if it does not. The standard proof of the law of large numbers would require that the events are uncorrelated under P and make use of the fact that p comes out as the expectation value of the empirical distribution m^N_emp = (1/N) ∑_{i=1}^{N} 1_{A_i}(x). In the end, however, the role of the measure in (5.3) is only to tell us that a particular set of initial conditions—the initial conditions that would lead to significant deviations from the statistical pattern—is negligibly small. And at this point, it doesn't matter where the number p came from, whether it agrees with the individual P(A_i) or not, and whether we gave it any meaning as a probability in the first place. Its significance as a frequency describing a typical statistical pattern is established by, rather than assumed in, the law-of-large-numbers result (5.3).

Philosophically, it is thus unnecessary and misleading to think of (5.3) as a consequence of the single-case probabilities determined by P. It is the other way around: What a law-of-large-numbers result does, in effect, is to reduce theoretical probabilities to typical relative frequencies. These typical frequencies are all that the fundamental laws can or need to inform us about. In particular, if our best theory tells us that (with "near certainty") roughly 1/2 of the coin tosses result in heads, we can justify the rationality of assigning credences about individual tosses accordingly. For instance, by appealing to Dutch-book arguments (if I accept bets of less than 2:1 on each one of these events, I can be almost certain to lose money in the long run) or maybe by invoking a principle of indifference with regard to the individual event in the pattern that we are about to observe (Schwarz, 2014). In any case, the idea that the Mentaculus provides a shortcut from the fundamental laws of nature to individual chance prescriptions for all conceivable events might be philosophically appealing, but it is ultimately too simplistic to pan out. The typicality view is in no way committed to Humean metaphysics, but when combined with a regularity theory, it puts the latter back on its feet. As originally advertised, probabilities are summaries of statistical patterns that the best system predicts rather than abstract weights that it assigns to sets of possible initial conditions.
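The quantitative content of (5.3) is easy to check numerically. The following is a minimal Monte Carlo sketch (my own toy setup, with uniform sampling standing in for the typicality measure and a fair coin for the events): the weight of the "deviation set" of initial conditions shrinks at least as fast as the Chebyshev-type bound 1/(ε²N).

```python
# Minimal sketch of the LLN statement (5.3): the weight of initial conditions
# whose empirical frequency deviates from p by more than epsilon shrinks
# roughly like 1/(epsilon^2 N). (Toy model with a fair coin.)
import numpy as np

rng = np.random.default_rng(1)
p, eps = 0.5, 0.05

for N in (100, 1_000, 10_000):
    # 50,000 "initial conditions", each fixing the outcomes of N coin tosses
    heads = rng.binomial(N, p, size=50_000)
    emp_freq = heads / N
    deviation_weight = np.mean(np.abs(emp_freq - p) > eps)
    chebyshev_bound = p * (1 - p) / (eps**2 * N)
    print(f"N={N:6d}: weight of deviation set = {deviation_weight:.4f}"
          f"  (bound 1/(eps^2 N) scale: {chebyshev_bound:.4f})")
```

As N grows, the set of "bad" initial conditions becomes negligible, which is all the measure is needed for here.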

Principal Principle Versus Cournot's Principle

The upshot of our discussion is that the Principal Principle—or at least the instances of PP that could have a basis in physics—can and should be reduced to Cournot's principle (CP), which we introduced in Chap. 3. In contrast to PP, CP does not try to ground credences in any odd value that the phase space measure assigns to any odd collection of microstates, but regards only typicality statements—"probabilities" near one or zero—as predictions of events occurring or not occurring. Many typical events are complex, corresponding to (statistical) regularities that are thus reduced to fundamental laws. While the Humean theory of objective chance is traditionally associated with PP (Lewis even regarded PP as non-negotiable), it is very much, if not more, compatible with CP: If the Humean probability of an event is very high, we can be almost certain that this event actually happens. Why? Because this is what the best system is trying to tell us, because the way it summarizes relevant regularities in the world is to assign them a measure very close to one.

Ironically, a version of what Lewis (1994) called the "big bad bug" of his theory of objective chance can serve to vindicate even the strongest form of CP (in some cases). The Mentaculus will assign a very small yet non-zero probability to the universe evolving on an entropy-decreasing trajectory. However, if our universe did evolve on an entropy-decreasing trajectory, the Mentaculus would not be its best systematization, given that so many important features of our universe depend on its entropic history. Hence, the fact that the Mentaculus assigns a near-zero probability to the anti-thermodynamic evolution of the universe, together with the premise that the Mentaculus is the best system for our universe, makes the entropy increase a logical certainty. On the other hand, if we are talking about an event that the best system could conceivably get wrong, it is quite immaterial whether it predicts a probability of 10^{-100} or 10^{-10^{10}}. Our residual uncertainty about whether the event obtains after all does not come from anything the best system tells us about the world, but from the possibility that it just had to take this miss in the trade-off with simplicity. In any case, the concrete physical information that the Mentaculus provides is to be found, first and foremost, in statements of measure close to 1 and 0, while the rationality of aligning credences with any odd value of the Humean probability is spurious, at best.

It also bears emphasizing that the only way to test a probabilistic law is by applying Cournot's principle, that is, by rejecting the law hypothesis if we observe phenomena to which it assigns a negligibly low chance. I do not claim that single-case probabilities are meaningless just because they cannot be empirically tested, but have argued that the Humean theory fails to give them meaning as deterministic chances—except to the extent that they can be reduced to typical frequencies. Advocates of the Mentaculus often stress that their probability measure is "empirical," yet it is supposed to provide information far beyond what is empirically testable. I don't think they can have it both ways.
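For concreteness, here is a minimal sketch of what "testing a probabilistic law via Cournot's principle" looks like as a procedure. The coin-toss setting and the sigma threshold are illustrative choices of mine (in the spirit of the 5σ standard discussed later in this chapter), not a recipe taken from the literature.

```python
# Minimal sketch of Cournot-style testing: reject the law hypothesis if the
# observed outcome falls in a set to which the law assigns negligible weight.
# (Illustrative thresholds; normal approximation to the binomial.)
from math import sqrt

def atypical_under_law(observed_heads: int, n_tosses: int, p: float = 0.5,
                       sigma_threshold: float = 5.0) -> bool:
    """True if the observation is atypical under the hypothesis of a p-coin."""
    mean = n_tosses * p
    std = sqrt(n_tosses * p * (1 - p))
    z = abs(observed_heads - mean) / std
    return z > sigma_threshold          # analogous to a 5-sigma standard

# A fair-coin law survives typical data but is rejected by data it deems
# overwhelmingly unlikely:
print(atypical_under_law(observed_heads=520, n_tosses=1000))   # False: keep the law
print(atypical_under_law(observed_heads=900, n_tosses=1000))   # True: reject the law
```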

Probability Versus Typicality Measures

The next step from probability to typicality comes by emphasizing the following insight: If all we need on the fundamental level are "probabilities" close to 1 and 0, then a whole lot of different measures could do the job. If we don't like the Liouville measure, how about putting a (truncated) Gaussian measure on M_0? In fact, we can tweak the measure in almost any way we like. Any measure that doesn't differ radically from Liouville will make a statement analogous to the FTSM true and thus imply the same thermodynamic and statistical regularities. We cannot be too extreme, of course. A delta-measure concentrated on an anti-entropic microstate would, evidently, lead to very different predictions. However, as Maudlin (2007, p. 286) concludes, against this backdrop, "our concerns about how to pick the 'right' probability measure to represent the possible initial states …or even what the 'right' measure means, very nearly evaporate."

An important observation is that probabilities (or weights, to use a more neutral term) close to 1 or 0 are very robust against variations of the underlying measure. Suppose the measure μ has a density with respect to the uniform Liouville measure λ (i.e., dμ = f dλ, so that μ(A) = ∫_A f dλ for any measurable A). If λ(A) ≈ 0 but μ(A) ≫ 0, then f must differ drastically from unity on the very small set A. In contrast, for μ(B) = 0.4 while λ(B) = 0.3, μ needs to deviate only mildly from λ over the larger set B (see (6.23) in Chap. 6 for a rigorous statement). Here, "large" and "small" are understood with respect to the Liouville measure, but this does not make the argument circular. The point is that it would require radical deviations from the Liouville measure to come to different conclusions about typicality facts, while relatively small variations of the phase space measure can lead to significantly different probability assignments for other events.

At least for the sake of argument, David Albert is willing to concede that we could consider best system candidates that involve an entire set or equivalence class of probability measures, with the stipulation that the theory endorses all and only those probability statements on which these measures (more or less) agree (Albert, 2015, footnote 2). While this is in the spirit of typicality, it strikes me as conceding an option that is set up to fail within the BSA since an entire set or equivalence class of measures is neither simpler nor more informative than the Liouville measure (I have only argued that it is equally informative). There is nothing wrong with using the Liouville measure as the simplest and most natural choice; just use it with the understanding that it is not a probability measure but a typicality measure. Its role and purpose are to designate events as "typical" (measure ≈ 1), "atypical" (measure ≈ 0), or neither, while the precise numerical assignments have no physical significance.

Part of what makes the Liouville measure natural is that, in particularly nice (or idealized) situations, the measure assigned to individual events will coincide with their typical frequencies on repeated trials. This statistical transparency has to do with how the Liouville measure tends to produce statistical independence in the interplay with Hamiltonian micro-dynamics. We may thus use the Mentaculus "probabilities" as a good guess for the typical long-term frequencies, as long as we keep in mind that only the latter are empirically relevant and bona fide predictions of the physical laws. My first proposal to friends of the Mentaculus thus sums up as follows: Regard the phase space measure as a Humean law but interpret it as a typicality measure tied to Cournot's principle instead of a probability measure tied to the Principal Principle. This is a pretty modest proposal (Callender (2007) comes close to making a similar one) as it leaves the epistemological and metaphysical status of the measure untouched. It is also just a first step toward understanding the more subtle aspects of typicality.
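The robustness argument about weights near 0 and 1 is easy to illustrate numerically. The densities below are hypothetical choices of mine on the unit interval (standing in for a phase-space region); the point is only that a density must blow up on a small set to give it appreciable weight, whereas a mild density already shifts intermediate weights noticeably.

```python
# Minimal sketch: weights near 0 or 1 are robust under moderate changes of
# the density f (d mu = f d lambda); intermediate weights are not.
# (Hypothetical densities on [0, 1] for illustration.)
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_001)
dx = x[1] - x[0]

f_mild = 1 + 0.5 * np.sin(2 * np.pi * x)     # bounded between 0.5 and 1.5
f_peaked = np.where(x < 1e-4, 1.0, 0.0)      # concentrated on a tiny set
f_peaked /= f_peaked.sum() * dx              # normalize to a probability density

def mu(f, indicator):
    """Approximate mu(A) = integral over A of f d(lambda)."""
    return float((f * indicator).sum() * dx)

tiny = (x < 1e-4).astype(float)                      # lambda(A) ~ 1e-4
moderate = ((x > 0.2) & (x < 0.5)).astype(float)     # lambda(B) = 0.3

print("tiny set:     mild mu =", round(mu(f_mild, tiny), 6),
      "  peaked mu =", round(mu(f_peaked, tiny), 3))     # ~1e-4 vs. ~1.0
print("moderate set: mild mu =", round(mu(f_mild, moderate), 3))   # ~0.40 instead of 0.3
```

Only a density that deviates from unity by a factor of order 1/λ(A) on the small set can upset a typicality fact, while the mildly varying density already changes an intermediate "probability" from 0.3 to about 0.4.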


5.2 Epistemology and Metaphysics of Typicality Measures

It is a prima facie attractive feature of the BSA that it can assign the same epistemological and metaphysical status to the phase space measure and the dynamical laws. That the measure does not "govern" or genuinely constrain physical possibilities need not bother the Humean. In her view, neither do dynamical laws. Both are bookkeeping devices for the Humean mosaic. Both are primarily empirical hypotheses and acquire lawhood if they turn out to be axioms of the best systematization of our world. But a typicality measure does not have the same status as a Humean (probability) law:

• A Humean law is supposed to summarize regularities in the world. Typicality statements summarize, first and foremost, the modal structure of the laws. They do not refer directly to the actual world but to the fact that a certain feature is typical among all nomologically possible ones.
• Humean chances, in particular, are supposed to be descriptions provided by the laws. Typicality pertains to a way of reasoning about the laws, a crucial way in which the laws ground explanations or point to phenomena that call for one. Using a typicality measure is the continuum equivalent of simply counting states if the phase space of our theory is finite.
• According to the BSA, the Humean mosaic is the truth-maker of the laws. At least in my view of typicality, a choice of typicality measure can be reasonable or justified, but there are no concrete physical facts that make it, strictly speaking, true. (I am inclined to the view that there are objective normative facts that make it true, but that goes beyond the scope of this discussion.)
• According to the BSA, the dynamical postulates and the phase space measure have the same epistemic status. A typicality measure is epistemically much more robust than the dynamical hypotheses. In particular, it is never justified to change the typicality measure in light of new empirical evidence before modifying the laws of motion (or other theoretical postulates, e.g., about boundary conditions).

Quine's picture of a "web of belief" (Quine, 1951) is useful to illustrate the sense in which a typicality measure is "less empirical" than the dynamical postulates. The latter are closer to the edges of the web, while some notion of typicality—which finds its mathematical expression in the measure—lies in between the dynamical postulates and the logical inference rules. While it is ultimately the theoretical system as a whole that is challenged by empirical evidence, the typicality measure is never the first node to adjust. One reason is that, because typicality judgments are so robust against variations of the measure, any revision would have to be radical. A typicality measure is not the kind of theoretical tool that can be incrementally updated or modified only on small scales. In effect, changing the typicality measure to save our equations of motion from falsification would amount to the unreasonable claim that we were right about the microscopic dynamics of the universe but completely wrong in the way we have been characterizing sets containing nearly all possible initial conditions.

There is a more important reason why the typicality measure must be less empirical or epistemically more robust than the dynamical postulates. It could not play the role it does in assessing the dynamical postulates if it were an equally malleable hypothesis. David Albert, as one of the main proponents of the Mentaculus, makes the case better than anyone that microscopic dynamics alone are barely predictive because special initial conditions could produce even the wildest macroscopic phenomena (Albert, 2015, Chap. 1). By the same token, given any somewhat complex micro-dynamics and virtually any macroscopic phenomena in the world, there will be some measure that makes the phenomena "typical" or sufficiently likely. Treating the typicality measure on the same footing as the dynamical laws would thus give us too many moving parts that could be fit to the data. It would not only increase the risk of a tie for the best system—since simplicity of the measure and of the dynamical equations might pull in opposite directions—but, worse, make it nearly impossible to test micro-dynamics against empirical evidence.


Example: Fitting Probability Measures. We consider a slight modification of our coin-toss model discussed in Chap. 4. Let Γ = (−1, 1) ⊂ R be the phase space and r_k the k'th Rademacher function, i.e., the k'th digit in the binary expansion of the real number x ∈ Γ. We interpret r_k as a macro-variable describing the outcome of the k'th trial in a long series of coin tosses. Now consider the following family of truncated Gaussians as probability measures on Γ: For n ∈ N, let N(0, σ²(n)) be the normal distribution with mean 0 and standard deviation σ(n) := (1/10)(1/2)^n and

μ_n := 1_{(−1,1)} N(0, σ²(n)) / ‖ 1_{(−1,1)} N(0, σ²(n)) ‖_1,    (5.4)

where ‖ 1_{(−1,1)} N(0, σ²(n)) ‖_1 is the normalization. All these measures are equally simple but concentrated (to more than 10σ) on the interval I(n) := (−(1/2)^n, (1/2)^n), on which r_k(x) = 0, ∀k ≤ n. That is, for any n ∈ N, the measure μ_n makes it overwhelmingly likely that the first n coin tosses result in tails. Hence, no matter how dominant the occurrence of tails, we could always fit the statistical regularity without revising the dynamics of the model.
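A minimal sketch of the arithmetic behind this example (using the error function for the Gaussian tail; the specific values printed are just a numerical check of the claim that each μ_n concentrates to more than 10σ on I(n)):

```python
# Minimal check of the "Fitting Probability Measures" example: each truncated
# Gaussian mu_n with sigma(n) = (1/10)(1/2)^n on Gamma = (-1, 1) puts nearly
# all of its weight on I(n) = (-(1/2)^n, (1/2)^n), the set on which the first
# n binary digits are 0, i.e. the first n tosses come up tails.
from math import erf, erfc, sqrt

def normal_mass(a, b, sigma):
    """Mass of N(0, sigma^2) on the interval (a, b)."""
    return 0.5 * (erf(b / (sigma * sqrt(2))) - erf(a / (sigma * sqrt(2))))

for n in (1, 5, 10, 20):
    sigma = 0.1 * 0.5**n
    half_width = 0.5**n                                  # I(n) = +/- 10 sigma
    normalization = normal_mass(-1.0, 1.0, sigma)        # truncation to (-1, 1)
    outside = erfc(half_width / (sigma * sqrt(2))) / normalization
    print(f"n={n:2d}: mu_n(Gamma \\ I(n)) ~ {outside:.2e}")   # ~1.5e-23 for every n
```

The weight outside I(n) is of order 10^{-23} for every n, so "the first n tosses are all tails" is indeed made overwhelmingly likely by μ_n, whatever n we need to fit.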

In actual scientific practice, typicality judgments do have a privileged epistemic status that the regularity theory fails to capture or account for.5 In particular, atypicality is precisely the standard by which dynamical theories are reasonably rejected as empirically inadequate. Think again of the double-slit experiment as a falsification of classical mechanics (it is not impossible for Newtonian particles to create interference patterns, it is atypical) or the 5σ-standard commonly used in particle physics. Interestingly, this applies in pretty much the same way to deterministic laws as to intrinsically stochastic ones. The difference is that, in the latter case, falsifying a dynamical and a probabilistic hypothesis is one and the same, while for deterministic theories, it is primarily the dynamical postulates that stand trial. And this is only possible if the choice of typicality measure is epistemically more robust or—in a sense that remains to be discussed—tied to the dynamical laws. In any case, some notion of typicality must be part of the backdrop against which law hypotheses are evaluated rather than an additional law hypothesis in its own right.

5 Marc Lange (2009) makes a similar point when he argues for "degrees of necessity" in laws, but typicality is not a law, and degrees of nomic necessity are not the right concept here.

5.3 Justification of Typicality Measures

For this reason, most advocates of typicality do not consider the typicality measure to be an independent postulate of the physical theory, even though it is one from a strictly logical perspective. A good analogy is still the counting of states if the phase space of the theory were finite. While it might not be logically entailed by the dynamical equations, it would be rather odd to insist that it amounts to an additional law of nature. Of course, on a continuum, there is more ambiguity about how to compare the content of sets. So we cannot avoid the question of what it is that determines the right typicality measure or at least an appropriate class of measures that agree on the relevant typicality facts. I will provide two answers before attempting a synthesis.

One answer I have already alluded to is that typicality is not defined by a measure but has a pre-theoretic meaning that a measure can succeed in capturing or fail to capture. To succeed, a measure must provide a somewhat natural formalization of what we mean by "overwhelmingly large" or "negligibly small" sets. This criterion is not very sharp, but typicality, as we discussed, is very forgiving. We might disagree on the naturalness of the Liouville measure, but unless you try hard to construct a measure that is concentrated on a very particular phase space region, we won't disagree on the relevant typicality facts. The concept of typicality has, in any case, a certain vagueness. Just as it is impossible to fix a general threshold measure above which a set contains "nearly all initial conditions," it would be wrongheaded to insist on rigid criteria that qualify or disqualify a measure to be used as a typicality measure. In the framework of classical (Hamiltonian) mechanics, the Liouville measure is clearly a reasonable choice, while a delta-measure is clearly not, but a certain gray area in between is unavoidable. Indeed, a family of Gaussian measures with standard deviation σ ∈ (0, +∞) interpolates between the two, with the distributions becoming ever more uniform for σ → ∞ and more and more peaked for σ → 0 (see the example above). Again, it would be misguided to ask for a sharp threshold value for σ below which the Gaussians cease to be acceptable typicality measures. Still, this does not mean that the concept of typicality is ill-conceived or that its vagueness is problematic in practice. The bottom line is that typicality statements ground explanations and predictions iff they are made with respect to a reasonable standard for "large" versus "small" sets. And while it seems impossible to formalize what makes a measure reasonable or unreasonable, we can generally tell them apart when we see them.

Another response puts less emphasis on the flexibility of a typicality measure and more on a criterion that ties it to the dynamical laws. The typicality measure should be stationary under the dynamics, which is to say that the size of a set of microstates does not change as the microstates evolve in time. By imposing this condition, the dynamical laws constrain the choice of typicality measure, and we guarantee, in particular, that typicality statements do not depend on the time at which we consider "initial conditions." I will provide further explanation and justification for the stationarity condition very soon. Fortunately or unfortunately, as long as we are dealing with classical mechanics, there are many ways to justify the consensus measure (cf. Bricmont 2022, Chap. 6.8) since the simplest and most intuitive choice—the uniform measure on phase space—is also stationary under the Hamiltonian dynamics (though not uniquely so). Even the naive principle of indifference can claim to get this measure right.

There is, however, a notable example where stationarity and uniformness part ways. In Bohmian quantum mechanics, the natural typicality measure grounding the statistical predictions of quantum mechanics is given by the |Ψ|²-density on configuration space, induced by the universal wave function Ψ (Dürr et al. (1992), reprinted as Chap. 2 in Dürr et al. (2013); see our detailed discussion in Chap. 13). This measure is stationary (more precisely, equivariant) under the particle dynamics and even uniquely determined as such (Goldstein & Struyve, 2007). However, since we know little about the wave function of the universe, the justification for this typicality measure can hardly lie in pre-theoretic intuitions about the shape of the |Ψ|²-density. It might well be sharply peaked or in other ways highly non-uniform across configuration space. Let us, therefore, look at criteria that unify the typicality measures for Newtonian and Bohmian mechanics. They will provide a good guide for justifying typicality measures in general.

5.3.1 Stationarity, Uniformity, Symmetry

While the following discussion is somewhat technical, the basic point is simple. So far, we have mostly talked about measures on the set of possible initial conditions of the universe. In fact, the relevant reference class for typicality statements—what we actually want to quantify—is not microstates but nomologically possible worlds. Initial conditions are just the natural way to parameterize solution trajectories that arise from deterministic dynamics. A stationary measure on the relevant phase space allows us to quantify sets of possible micro-histories without distinguishing an "initial time" or any other preferred moment at which we cut across the respective solution trajectories.

There is one subtlety here that can lead to disagreements about whether the relevant measure in classical mechanics is actually stationary. Proponents of the Mentaculus tend to think of the fundamental probability measure on phase space Γ as one that is uniform over the Past Hypothesis macro-region M_0 ⊂ Γ and zero outside. This is not a stationary measure on Γ because weight will "flow out" of the initial macro-region and disperse all over phase space. The typicality view refers, in general, to the stationary Liouville measure on the entire phase space, which is then conditionalized on the initial macrostate M_0. The distinction will become less relevant with the following considerations that focus on the solution space rather than the phase space.

A Geometric View of Stationarity

Let S be the set of solution trajectories for the microscopic dynamics (consistent with the Past Hypothesis) in the state space Γ ≅ R^n. For any t ∈ R, let ε_t : S → Γ, X ↦ X(t) be the map evaluating the trajectory X at time t. These maps can be understood as charts, turning the solution set S into an n-dimensional differentiable manifold.6 The transition maps between different charts are then ε_t ∘ ε_s^{-1} = Φ_{t,s}, where Φ_{t,s} is the flow arising as the general solution of the laws of motion (Fig. 5.1). The easiest way to define a measure μ on the solution space S is in one of these charts, let's say ε_0. Indeed, a possible viewpoint is that there exists a time in the history of the universe, e.g., the Big Bang, that is distinguished for parameterizing solutions by initial data. The other view, emphasized by our geometric notation, is that the choice of the time slice is arbitrary, amounting to one of many equivalent coordinatizations of the solution space. Under a transition map, i.e., a change of coordinates, the measure transforms by a pullback, μ_t = Φ_{t,0}#μ_0. That is, μ_t(A) = μ_0(Φ_{t,0}^{-1} A) for any measurable A ⊂ Γ, where μ_t is the measure represented in the chart ε_t. Now, a measure on Γ is stationary if and only if it has the same form in every time-chart:

μ_t := Φ_{t,0}#μ_0 = μ_0,  ∀t ∈ R.    (5.5)

[Fig. 5.1: Sketch of the solution space and its parameterization by time slices. The trajectory X is evaluated at times s and t, respectively. The flow Φ_{t,s} yields the corresponding transition map.]

6 In principle, some solutions may exist only on a finite time interval so that the charts are only locally defined. Here, I assume the global existence of solutions for simplicity. Also, if one insists on admitting only trajectories that start out in an open region Γ_0 ⊊ Γ, then ε_t will only map into Φ_{t,0}(Γ_0) ⊊ Γ. Since the flow is diffeomorphic, this corresponds to an open subset of R^n on which the charts are still well defined.
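Condition (5.5) can be checked numerically in a simple model. The following sketch uses the harmonic oscillator with H = (p² + q²)/2 (my choice of toy system, with mass and frequency set to 1), whose flow is a rotation in phase space: pushing a uniform sample forward in time leaves the weight of any test region unchanged, as stationarity demands.

```python
# Minimal Monte Carlo check of stationarity (5.5) for a harmonic oscillator:
# its phase-space flow is a rotation, so the uniform (Liouville) measure on
# the invariant disk q^2 + p^2 <= 1 is pushed forward onto itself.
import numpy as np

rng = np.random.default_rng(2)

# Sample uniformly from the unit disk in phase space (rejection sampling).
pts = rng.uniform(-1, 1, size=(400_000, 2))
pts = pts[(pts**2).sum(axis=1) <= 1.0]

def flow(qp, t):
    """Exact harmonic-oscillator flow: q' = p, p' = -q."""
    q, p = qp[:, 0], qp[:, 1]
    return np.column_stack([q * np.cos(t) + p * np.sin(t),
                            -q * np.sin(t) + p * np.cos(t)])

def in_test_region(qp):            # an arbitrary fixed region of phase space
    return (qp[:, 0] > 0.3) & (qp[:, 1] > 0.0)

for t in (0.0, 1.0, 7.5):
    evolved = flow(pts, t)
    print(f"t={t:4.1f}: pushforward weight of test region = {in_test_region(evolved).mean():.4f}")
# The three numbers agree up to sampling error: the measure is stationary.
```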

Equivariance is the next best thing if the dynamics is itself time dependent. Concretely, in the case of Bohmian quantum mechanics, the particle dynamics are determined by the universal wave function Ψ_t which, in turn, evolves according to the Schrödinger equation. Nonetheless, we have

|Ψ_t|² dⁿx = Φ_{t,0}# ( |Ψ_0|² dⁿx ),  ∀t ∈ R,    (5.6)

so that the typicality measure has the same functional form in terms of Ψ_t for any time t ∈ R. In conclusion, a stationary or equivariant measure on the state space Γ induces a canonical measure on the solution space S, i.e., a measure that can be defined without distinguishing a set of coordinates in the form of a particular time-chart.

Uniformity, on the other hand, is a metric property. It requires that

μ(B(x, r)) = μ(B(y, r)),  ∀x, y ∈ Γ, r > 0,    (5.7)

where B(x, r) is the ball of radius r around x. The uniformity of a measure is thus only defined with respect to a metric. However, even if the state space Γ comes equipped with a metric, it does not induce a canonical metric on S. While there are ways to define one (e.g., via cylinder sets), it would amount to an additional structure in need of further justification. In any case, even the Liouville measure in classical mechanics is uniform on the "wrong" space, namely on phase space rather than the solution space of the theory. Since we ultimately want to make typicality statements with respect to possible worlds, uniformity of the phase space measure is per se of questionable relevance.
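For a concrete sense of condition (5.7): a minimal one-dimensional check (the Gaussian parameters are arbitrary placeholders) showing that Lebesgue measure gives every ball of a fixed radius the same weight, while a Gaussian measure does not.

```python
# Minimal check of uniformity (5.7) in one dimension: Lebesgue measure is
# uniform, a Gaussian measure is not. (Hypothetical sigma and centers.)
from math import erf, sqrt

def gaussian_ball_mass(center, r, sigma=1.0):
    """N(0, sigma^2)-mass of the interval (center - r, center + r)."""
    z = lambda v: erf(v / (sigma * sqrt(2)))
    return 0.5 * (z(center + r) - z(center - r))

r = 0.5
for x in (0.0, 1.0, 3.0):
    lebesgue_mass = 2 * r                     # independent of the center x
    print(f"center {x}: Lebesgue = {lebesgue_mass:.3f}, "
          f"Gaussian = {gaussian_ball_mass(x, r):.3f}")
```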

Invariance Under Symmetries

The uniformity of the Liouville measure, however, has to do with another and very significant feature of the typicality measure, namely its invariance under Galilean symmetries. After all, a translation in phase space amounts to a translation in space and/or momentum, i.e., into another inertial frame. More abstractly, a dynamical symmetry corresponds to an isomorphism T : Γ → Γ that commutes with the dynamical flow, i.e., Φ_{t,s}(Tx) = T Φ_{t,s}(x). This then induces a canonical transformation T* : S → S on the solution space by T* = ε_t^{-1} ∘ T ∘ ε_t, which is independent of t.7 The most important symmetries of classical mechanics are those of Galilean spacetime:

(q_i, p_i)_{1≤i≤N} ⟶ (q_i + a, p_i)    (Translation)
(q_i, p_i)_{1≤i≤N} ⟶ (R q_i, R p_i)    (Rotation)
(q_i, p_i)_{1≤i≤N} ⟶ (q_i + u t, p_i + m_i u)    (Galilean boost)

They all correspond to Euclidean transformations—rotations or translations—on phase space in canonical coordinates, thus leaving the uniform measure invariant. Consequently (as is straightforward to check), the induced measure on S is invariant under the corresponding symmetry transformations on the solution manifold. In Bohmian mechanics, the issue is a little more subtle in that the wave function transforms non-trivially under Galilean symmetries, namely as (Dürr & Teufel, 2009):

Ψ_t(q_1, …, q_N) ⟶ Ψ_t(q_1 − a, …, q_N − a)    (Translation)
Ψ_t(q_1, …, q_N) ⟶ Ψ_t(R⁻¹q_1, …, R⁻¹q_N)    (Rotation)
Ψ_t(q_1, …, q_N) ⟶ exp[ (i/ħ) ∑_{i=1}^{N} m_i (u·q_i − ½u²t) ] Ψ_t(q_1 − u t, …, q_N − u t)    (Galilean boost)

While this can be motivated by purely dynamical considerations, we see immediately that the |Ψ|²-density is covariant under these transformations, ensuring again that the induced measure on the relevant solution space S does not change under symmetry transformations.

7 Proof: ε_t^{-1} T ε_t = ε_s^{-1} Φ_{s,t} T Φ_{t,s} ε_s = ε_s^{-1} Φ_{s,t} Φ_{t,s} T ε_s = ε_s^{-1} T ε_s.
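The "straightforward to check" claim for the classical case reduces to a statement about Jacobians, which the following minimal sketch makes explicit (single particle in the plane; the rotation angle is an arbitrary placeholder): the Galilean transformations act as affine maps on phase space whose linear part has determinant 1, so phase-space volume, and with it the uniform measure, is preserved.

```python
# Minimal sketch: Galilean transformations preserve phase-space volume because
# their linear parts have Jacobian determinant 1 (the constant shifts a, u*t,
# m*u do not affect the Jacobian).
import numpy as np

theta = 0.4                                   # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

linear_parts = {
    "translation (q, p) -> (q + a, p)":        np.eye(4),
    "rotation    (q, p) -> (Rq, Rp)":          np.block([[R, np.zeros((2, 2))],
                                                         [np.zeros((2, 2)), R]]),
    "boost       (q, p) -> (q + ut, p + mu)":  np.eye(4),
}

for name, L in linear_parts.items():
    print(f"{name}: |det J| = {abs(np.linalg.det(L)):.6f}")   # all equal 1
```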

5.3.2 How to Choose a Typicality Measure

We saw that the typicality measures in both classical mechanics and Bohmian mechanics are justified and tied to the dynamics by two precise mathematical features: stationarity/equivariance and invariance/covariance under spacetime symmetries. However, at least in classical mechanics, these conditions are not sufficient to determine the measure uniquely. Depending on how generously we interpret "covariance," they may not even rule out evidently inadequate choices such as a delta-measure concentrated on a stationary microstate. At the end of the day, part of what makes a typicality measure compelling is that it does not seem overly biased, contrived, or ad hoc. And once we understand typicality as a way of reasoning, some reliance on good intellectual taste seems both appropriate and unavoidable. Attempts to axiomatize typicality measures (Werndl, 2013) strike me as well intentioned but misguided. Not just because the particular proposals are uncompelling but because typicality is, by its nature, not a formalistic concept. Even the technical criteria of symmetry and stationarity should be understood as justifications for the typicality measure, not as axioms or analytical necessities. In general, once the form of the dynamical laws is fixed, there are only a few reasonable choices for the typicality measure (at least modulo "equivalence"), if not a unique one. In practice, difficulties arise not because of an abundance of good options, but in theories, like relativistic field theories, that make it difficult, for mathematical reasons, to construct any natural measure at all.

When faced with more than one reasonable option for the typicality measure, we can and should appeal to empirical adequacy as a tiebreaker. Since two measures must differ drastically to disagree on typicality facts, they will give rise to very different predictions. Certainly, observing that a measure makes the right phenomena typical goes some way toward justifying its use. But empirical criteria cannot be the first or only ones we appeal to. For all the reasons discussed earlier, the path to typicality is never just fitting a measure to empirical data. In conclusion, we have a three-step test that a measure should pass (in that order) to be particularly distinguished as a typicality measure:

1. The measure should be natural and, more precisely, natural to express the meaning of "nearly all" trajectories on the theory's solution space.
2. The measure should be well adapted to the dynamics by being stationary/equivariant and invariant/covariant under the relevant symmetries.
3. Then and only then can we invoke empirical adequacy if we still have to justify a choice from several alternatives.

5.4 Typicality for Humeans

The appeal to "justified" or "reasonable" measures seems to be what bothers Humeans most about typicality. Or maybe more profoundly, the status of typicality judgments as neither empirical nor deductive, which may seem to place them in the nether lands of traditional analytical philosophy. My first proposal for Humeans did not insist on this subtlety. It was a rather moderate revision of the Mentaculus, interpreting the fundamental phase space measure of statistical mechanics as a typicality measure instead of a probability measure, but leaving its status as an axiom of the best systematization untouched. The role of the measure is thus to summarize regularities in the world by making them typical—in concert with the dynamical laws and the Past Hypothesis. I believe that this fits the empiricist bent of Humeanism very well, more so than the probability law of the original Mentaculus, whose theorems—i.e., chances for any conceivable event—are mostly devoid of empirical content.

My second, more ambitious offer to Humeans is to think of typicality along the following lines. If the laws of nature are the axioms of the best system—the system that strikes the best balance between simplicity and informativeness in summarizing the world—then a key criterion for the informativeness of law hypotheses is that they make relevant macro-phenomena typical. The typicality measure itself is not a Humean law (or candidate law); it is a means of assessing the informativeness of (candidate) laws. But the question "typical by what measure?" should not cause more headaches than the question "simple by what measure?" In fact, it should cause less because which of two candidate theories is simpler is often a close call while typicality judgments rarely are. In any case, the right standard of simplicity cannot supervene on the mosaic as part of the best system, or else it could not play its role in adjudicating which system is best in the first place. Naturally, we judge a theory by standards that are, to some extent, pre-theoretical. And at the end of the day, the promise that the BSA can ground objective laws of nature must be based on the hope that all reasonable "measures" will agree on which system is best. The status of "simplicity" and "typicality" is similar in these respects.

I am not trying to make a tu quoque argument complaining that simplicity is fuzzy. My point, on the contrary, is that there are good reasons why neither judgments of simplicity nor typicality fall neatly into the categories of a priori or empirical and why the analyzability of both concepts goes only so far. The BSA's concept of simplicity operates in a context of justification. When we say that one theory is simpler than another, we are giving a reason for accepting the former over the latter. We are making an epistemic value judgment that is neither inferred from data nor derived from the theory. The status of typicality is even more subtle as it plays a role in both systematization (of regularities) and justification (of the theory). But in the latter context, it too operates in the normative "space of reasons" (Sellars, 1962), being integral to the process of accepting and rejecting candidate laws and thus necessarily pre-theoretical to some extent. It seems to be this normative character of typicality that does—but shouldn't—make most Humeans uncomfortable.


While the meaning of typicality is constant across theories, a typicality measure is a technical tool that allows us to formalize and derive typicality statements in a way adapted to the dynamical framework at hand. According to my proposal for Humeans, dynamical laws and the appropriate typicality measure still come as a package, although more loosely tied up than in the Mentaculus. The dynamics constrains appropriate typicality measures by requirements like stationarity and symmetry. The measure allows us to analyze which regularities the micro-dynamics predicts as typical and thus assess their informative strength. As a result, a Humean can still maintain that the statistical-mechanical predictions of the best system are reliable by definition, and that the fundamental laws must make the relevant regularities of our world typical, or else they would not be the best systematization of our world.

Conclusion

To sum the discussion up, there are essentially three disagreements between the Mentaculus of Albert and Loewer and the typicality account—aside from the fact that the latter is not necessarily tied to Humean metaphysics. The first concerns the scope of the phase space measure, whether the exact numbers it assigns to every physical event are meaningful as "probabilities." The second concerns the rationality principle expressing its epistemic and behavior-guiding function, Lewis's Principal Principle versus a form of Cournot's principle that cares only about "very large" and "very small" sets. The most profound disagreement is about the metaphysical and epistemological status of the measure, whether it is an additional law, an empirical hypothesis on par with the dynamics, or a way of analyzing and reasoning about the laws. These points of contention leave room for compromise, e.g., maintaining that the measure is a (Humean) law but interpreting it as a typicality rather than a probability measure. As so often, though, the extremal positions are the most interesting ones, and the view defended in this book comes down on the opposite side of Albert and Loewer on all of the issues just mentioned. Still, the philosophical disagreements should not cloud the fact that, for all practical purposes and concerning, in particular, the role and understanding of statistical mechanics, the two views have much more in common than what divides them.


References

Albert, D. Z. (2000). Time and chance. Cambridge, Massachusetts: Harvard University Press.
Albert, D. Z. (2015). After physics. Cambridge, Massachusetts: Harvard University Press.
Bell, J. S. (2004). Speakable and unspeakable in quantum mechanics (2nd ed.). Cambridge: Cambridge University Press.
Bricmont, J. (1995). Science of chaos or chaos in science? Annals of the New York Academy of Sciences, 775(1), 131–175.
Bricmont, J. (2022). Making sense of statistical mechanics. Undergraduate lecture notes in physics. Cham: Springer International Publishing.
Callender, C. (2007). The emergence and interpretation of probability in Bohmian mechanics. Studies in History and Philosophy of Modern Physics, 38, 351–370.
Carroll, S. (2010). From eternity to here. New York: Dutton.
Dürr, D., & Teufel, S. (2009). Bohmian mechanics: The physics and mathematics of quantum theory. Berlin: Springer.
Dürr, D., Goldstein, S., & Zanghì, N. (1992). Quantum equilibrium and the origin of absolute uncertainty. Journal of Statistical Physics, 67(5–6), 843–907.
Dürr, D., Goldstein, S., & Zanghì, N. (2013). Quantum physics without quantum philosophy. Berlin: Springer.
Goldstein, S. (2012). Typicality and notions of probability in physics. In Y. Ben-Menahem & M. Hemmo (Eds.), Probability in physics. The Frontiers Collection (pp. 59–71). Berlin: Springer.
Goldstein, S., & Struyve, W. (2007). On the uniqueness of quantum equilibrium in Bohmian mechanics. Journal of Statistical Physics, 128(5), 1197–1209.
Hall, N. (1994). Correcting the guide to objective chance. Mind, 103(412), 505–518.
Hall, N. (2004). Two mistakes about credence and chance. Australasian Journal of Philosophy, 82(1), 93–111.
Hoefer, C. (2019). Chance in the world: A Humean guide to objective chance (1st ed.). New York: Oxford University Press.
Lange, M. (2009). Laws and lawmakers: Science, metaphysics, and the laws of nature. Oxford: Oxford University Press.
Lazarovici, D. (2023). Typicality versus Humean probabilities as the foundation of statistical mechanics. In B. Loewer, E. Winsberg, & B. Weslake (Eds.), The probability map of the universe: Essays on David Albert's Time and Chance. Cambridge: Harvard University Press.
Lazarovici, D., & Reichert, P. (2015). Typicality, irreversibility and the status of macroscopic laws. Erkenntnis, 80(4), 689–716.
Lewis, D. (1980). A subjectivist's guide to objective chance. In W. L. Harper, R. Stalnaker, & G. Pearce (Eds.), IFS: Conditionals, belief, decision, chance and time, The University of Western Ontario Series in Philosophy of Science (pp. 267–297). Dordrecht: Springer Netherlands.
Lewis, D. (1994). Humean supervenience debugged. Mind, 103(412), 473–490.
Loewer, B. (2001). Determinism and chance. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 32(4), 609–620.
Loewer, B. (2004). David Lewis's Humean theory of objective chance. Philosophy of Science, 71(5), 1115–1125.
Loewer, B. (2007). Counterfactuals and the second law. In H. Price & R. Corry (Eds.), Causation, physics, and the constitution of reality. Russell's republic revisited (pp. 293–326). Oxford: Oxford University Press.
Loewer, B. (2012). Two accounts of laws and time. Philosophical Studies, 160(1), 115–137.
Loewer, B. (2020). The Consequence Argument meets the Mentaculus. Preprint: http://philsci-archive.pitt.edu/17328/.
Maudlin, T. (2007). What could be objective about probabilities? Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 38(2), 275–291.
Penrose, R. (1989). The emperor's new mind: Concerning computers, minds, and the laws of physics. Oxford: Oxford University Press.
Quine, W. V. (1951). Main trends in recent philosophy: Two dogmas of empiricism. The Philosophical Review, 60(1), 20–43.
Schwarz, W. (2014). Proving the principal principle. In A. Wilson (Ed.), Chance and temporal asymmetry (pp. 81–99). Oxford University Press.
Sellars, W. (1962). Philosophy and the scientific image of man. In R. Colodny (Ed.), Frontiers of science and philosophy (pp. 35–78). Pittsburgh: University of Pittsburgh Press.
Werndl, C. (2013). Justifying typicality measures of Boltzmannian statistical mechanics and dynamical systems. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 44(4), 470–479.

6 The Structure of Typicality

The following chapter is rather technical and can be skipped without great loss if the reader is eager to get to the meat, namely, applications of typicality in physics. Its main goal is to further disentangle the concepts of typicality and probability. Since typicality results are usually formulated in terms of measure theory—most conveniently with normalized measures—they are easily confounded with probabilistic results. This can happen by mistake or quite deliberately out of the conviction that "typical" is just another word for "very probable," despite attempts by some authors (like myself) to make it into a bigger deal. In fact, typicality and probability are formally, conceptually, and metaphysically distinct (cf. Wilhelm 2022). This chapter will focus on the formal distinctions, in particular on the fact that the logical structure of typicality is different from, and more minimal than, that of probability. It will discuss non-measure-theoretic criteria for typicality but also elaborate on the fact that large classes of measures are essentially equivalent for the purpose of typicality reasoning.


6.1 A Theory of “Small” and “Big” Sets

We recall: a property P ∈ Π is typical within a reference set Ω if the great majority of elements in Ω instantiate P. This is to say that the extension A_P = {x ∈ Ω : P(x)} is a (very) large subset of Ω or, equivalently, that the complement A_P^c is (very) small. Symbolically,

Typ(P) ↔ BIG(A_P) ↔ SMALL(A_P^c).

This means that formalizing "typical" in a given context does not necessarily require an additive measure or even a set function assigning numerical values to subsets of Ω. All we need is a precise notion of "big", respectively "small" sets. Measures in the sense of mathematical measure theory are but one way—albeit a very natural and powerful one—to obtain such a notion. I will flesh this out by proposing an axiomatization of small sets. I want to emphasize right away that this is not intended as a complete analysis of typicality. In particular, I regard the axioms as necessary but by no means sufficient for a reasonable realization of the concept. The exercise is somewhat akin to the abstract definition of a topology. It identifies minimal requirements for a set structure to do the job (of supporting a notion of connectedness, continuity, convergence, etc.) but thereby admits realizations (like the trivial topology in which anything converges to everything) that are purely academic and remote from the intuitions that guided us in the first place. The deeper point is that typicality is not a formalistic concept but has a semantic and normative dimension that eludes a formal logical analysis.

Definition 5 (Small Sets) Let Ω be a non-empty set and Π ⊆ P(Ω) be an algebra of subsets that we want to evaluate as small, large, or neither. (Ω, Π) forms the context of a typicality argument. We call S ⊂ Π a system of small sets if it satisfies the following axioms:

(i) ∅ ∈ S.
(ii) A ∈ S, Π ∋ B ⊆ A ⇒ B ∈ S.
(iii) A, B ∈ S ⇒ (A ∪ B)^c ∉ S.


To make this more perspicuous, we can define two set predicates on Π by

SMALL(A) :⇔ A ∈ S
BIG(A) :⇔ A^c ∈ S.

That is, a set is BIG if and only if its complement is SMALL (but note that ¬SMALL does not imply BIG). In terms of these predicates, the axioms read

(i) SMALL(∅)
(ii) SMALL(A), Π ∋ B ⊆ A ⇒ SMALL(B)
(iii) SMALL(A), SMALL(B) ⇒ ¬BIG(A ∪ B).

From these three axioms, we can immediately derive the following rules:1

Lemma 1 For all A, B ∈ Π

(a) BIG(Ω)
(b) BIG(A), A ⊆ B ∈ Π ⇒ BIG(B)
(c) SMALL(A), SMALL(B) ⇒ SMALL(A ∩ B)
(d) BIG(A), BIG(B) ⇒ BIG(A ∪ B)
(e) BIG(A), BIG(B) ⇒ ¬SMALL(A ∩ B)
(f) SMALL(A) ⇒ ¬BIG(A), BIG(A) ⇒ ¬SMALL(A)

In some contexts, it makes sense to consider a stronger notion of "smallness," let's call it SMALL∗, which satisfies

(i*) SMALL∗(∅), ¬SMALL∗(Ω)
(ii*) SMALL∗(A), Π ∋ B ⊆ A ⇒ SMALL∗(B)
(iii*) SMALL∗(A), SMALL∗(B) ⇒ SMALL∗(A ∪ B).

The crucial difference between SMALL and SMALL∗ lies in axioms (iii) and (iii*), respectively. While the former only required that the union of two small sets is not a big set, the stronger axiom (iii*) requires that the union of small sets is still small. SMALL∗ sets thus form an ideal, corresponding to the mathematical notion of negligible sets. In many cases, SMALL∗ will even be closed under countable unions. While this can be quite significant for technical purposes, I shall refrain from differentiating a further set predicate (say, SMALL∗∗) for the sake of simplicity.

1 Proofs: (a) By (i) since Ω = ∅^c. (b) By (ii) since B^c ⊆ A^c. (c) By (ii) since A ∩ B ⊆ A. (d) From (c) since (A ∪ B)^c = A^c ∩ B^c. (e) By (iii), since SMALL(A^c), SMALL(B^c) ⇒ ¬BIG(A^c ∪ B^c) and A^c ∪ B^c = (A ∩ B)^c, hence ¬BIG(A^c ∪ B^c) ⇔ ¬SMALL(A ∩ B). (f) By (i) and (iii), since SMALL(A) ⇒ ¬BIG(A ∪ ∅ = A). The second part follows with BIG(A) ⇔ SMALL(A^c).
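Since the axioms are purely set-theoretic, they are easy to check mechanically in a finite toy context. The following is a minimal sketch (the choice of Ω and of the counting threshold is mine, anticipating criterion (6.1) below): the counting notion of smallness satisfies (i)–(iii) but, as stated, violates (iii*).

```python
# Minimal check of Definition 5 for the counting criterion SMALL(A) iff |A| < k
# on a small finite Omega. (Omega and k are arbitrary illustrative choices.)
from itertools import chain, combinations

Omega = frozenset(range(12))
k = 4                                    # threshold with k <= |Omega| / 3

def small(A): return len(A) < k
def big(A):   return small(Omega - A)

def subsets(S):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(S, r) for r in range(len(S) + 1))]

smalls = [A for A in subsets(Omega) if small(A)]

assert small(frozenset())                                          # axiom (i)
assert all(small(B) for A in smalls for B in subsets(A))           # axiom (ii)
assert all(not big(A | B) for A in smalls for B in smalls)         # axiom (iii)

# But (iii*) fails: the union of two small sets need not be small.
A, B = frozenset({0, 1, 2}), frozenset({3, 4, 5})
print(small(A), small(B), small(A | B))    # True True False
```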

6.2 Criteria for Typicality

A trivial realization of SMALL∗ (and a fortiori SMALL) is SMALL∗(A) ⇔ A = ∅. It corresponds to calling a property "typical" if and only if it is instantiated by all elements of the reference set. Not very interesting, except maybe to demonstrate consistency of the axioms. Let us look at more useful ways to identify small sets and thus formalize typicality.

By Cardinality

1. Ω a finite set with |Ω| = n.

SMALL(A) :⇔ |A| < k, for some fixed k ≤ n/3.    (6.1)

This most basic and intuitive notion of smallness, defined by simple counting, satisfies axioms (i)–(iii) but not (iii*).

2. Ω an infinite set.

SMALL∗(A) :⇔ |A| < |Ω|.    (6.2)

For example, finite subsets of a countably infinite set (which is closed under finite unions) or countable subsets of an uncountably infinite set (which is closed under countable unions).

Measure Theory

(Ω, A, μ) a measure space with sigma-algebra A ⊇ Π and μ not trivial (μ(Ω) ≠ 0).


• Strong measure-theoretic notion:

SMALL∗(A) ⇔ μ(A) = 0.    (6.3)

Null sets are closed under countable unions. BIG∗ sets in this strong measure-theoretic sense contain almost all elements of Ω. In physics, this is the appropriate standard of (a)typicality when we deal with the thermodynamic limit or similar mathematical idealizations. Otherwise, it is too strong.

• Weak measure-theoretic notion:

SMALL(A) :⇔ μ(A)/μ(Ω) < ε, for some fixed ε ≤ 1/3.    (6.4)

BIG sets in this sense contain nearly all elements of Ω.

• Family of measures: M a set of measures on (Ω, A).

SMALL(A) :⇔ sup_{μ∈M} μ(A)/μ(Ω) ≤ ε.    (6.5)

In other words, SMALL(A) iff μ(A) ≤ ε, ∀μ ∈ M.

• Non-normalizable measures: The previous definitions allow for the case μ(Ω) = ∞ with the conventions ∞/∞ = 1 and 1/∞ = 0. Hence, (6.4) becomes

SMALL∗(A) :⇔ μ(A) < ∞ = μ(Ω),    (6.6)

which is closed under finite unions.

Dimensionality

• Ω a manifold.

SMALL∗(A) :⇔ dim(Ā) < dim(Ω),    (6.7)


where dim denotes the dimension and Ā the topological closure of A. The definition extends to normal topological spaces using the topological dimension.

Topology

• Ω a topological space.

SMALL∗(A) :⇔ A is a nowhere dense set.    (6.8)

A set A is nowhere dense if its closure has empty interior, i.e., if for any neighborhood U ⊆ Ω, there exists a non-empty open set V ⊆ U such that A ∩ V = ∅. The complement of a nowhere dense set is an open dense set. Nowhere dense sets form an ideal.

• A set is called meager (also thin or Baire first category) if it is a countable union of nowhere dense sets. If Ω is a Baire space,2 (6.8) can be weakened to

SMALL∗(A) :⇔ A is a meager set,    (6.9)

which is even closed under countable unions. .Q ⊂ R is an example of a set that is meager but dense (and thus, in particular, not nowhere dense). Smallness defined in terms of cardinality or dimensionality is, in general, stronger than the strong measure-theoretic definition in terms of null sets. More precisely: • If .Ω is uncountably infinite and .μ is zero on singletons, all countable sets have measure zero (while the converse is not generally true). • If .Ω ∼ = Rn , all measurable subsets of dimension .< n have Lebesgue measure zero. The topological and measure-theoretic notions are orthogonal and, in some sense, dual to one another. The nowhere denseness of a set bears no relation to its measure (except that a nowhere dense set cannot have 2A

Baire space is a topological space in which countable unions of closed sets with empty interior have empty interior. This ensures that .Ω is not itself meager.

6 The Structure of Typicality

109

full measure if .μ is strictly positive on open sets). In particular, there exist not only dense sets of Lebesgue measure zero (e.g., .Q ⊂ R) but also nowhere dense subsets of the unit interval with measure arbitrarily close to 1 (“fat Cantor sets”).3 .R can even be partitioned into two disjoint sets, c c .R = A ∪ A , such that A is meager and .A is a Lebesgue null set. This incommensurability of topological and measure-theoretic criteria poses a threat to my claim that typicality is one unified concept and that reasonable standards of typicality can be more or less strict or suitable for a particular context but won’t pull in completely opposite directions. My view on the matter is that the topological notions of meagerness or nowhere denseness are not reasonable standards for typicality in the sense discussed in this book. Roughly speaking, topology is about closeness and separation of points, not about their quantity. If a set .A ⊂ Ω is nowhere dense, it means that its points do not accumulate in .Ω (every open subset of .Ω contains points that are topologically separated from A). When we talk about the initial conditions of a physical system, this does capture a sense of counterfactual robustness but one that is different from, and complementary to, typicality. A property that holds for all but a nowhere dense set of initial conditions is robust against sufficiently small perturbations. But the “bad” initial conditions can still be all over the place and make up an arbitrarily large portion of the phase space .Ω. A deeper reason for the structural similarity between meager sets and null sets is manifested in the Sierpinski–Erdös Duality Theorem: Assuming the truth of the continuum hypothesis (which is undecidable in standard

3 The

best-known example of a nowhere dense set with positive measure is the Smith–Volterra– Cantor set. It is constructed as follows: In the first step, remove from the unit interval .[0, 1] the middle open interval of length .1/4, leaving .S1 = [0, 3/8] ∪ [5/8, 1]. In the n’th step, remove from each of the remaining .2n−1 intervals the middle open interval of  n length . 41 , leaving a union .Sn of .2n closed connected intervals.  The Smith–Volterra–Cantor set is then .S∞ := ∞ n=1 Sn . .S∞ is closed as a countable intersection of closed sets. And it contains no open interval (every interval is broken up at some step of the construction), hence it has an empty interior and is nowhere ∞ ∞ n−1   1 2 = = 21 from the unit dense. However, we removed in total a set of measure . 22n 2n+1 n=1

interval, so that .λ(S∞ ) = 21 .

n=1

110

D. Lazarovici

ZFC set theory), there exists a self-inverse function .f : R → R that maps meager sets to Lebesgue null sets and vice versa. That is, .f (A) is meager if and only if A is a Lebesgue null set, and .f (A) is null if and only if A is meager.
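To make the construction in footnote 3 concrete, here is a minimal numerical sketch (my own illustration, not part of the text): it tracks how much Lebesgue measure the Smith–Volterra–Cantor construction removes at each step, confirming that the remaining measure converges to 1/2 even though the limiting set is nowhere dense.

```python
# Measure bookkeeping for the Smith-Volterra-Cantor ("fat Cantor") construction:
# at step n, 2**(n-1) middle open intervals of length (1/4)**n are removed.
def svc_remaining_measure(steps: int) -> float:
    remaining = 1.0  # Lebesgue measure of the unit interval [0, 1]
    for n in range(1, steps + 1):
        remaining -= 2 ** (n - 1) * (1 / 4) ** n  # total length removed at step n
    return remaining

for steps in (1, 2, 5, 10, 30):
    print(steps, svc_remaining_measure(steps))
# Output approaches 0.5: a nowhere dense set of Lebesgue measure 1/2.
```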

6.3 Typicality Measures

I will present some results about the possibility of characterizing small sets—and thus typicality—in measure-theoretic terms. The positive results may shed some light on the naturalness of the measure-theoretic criteria. But the negative results are equally important. It is in those cases, when small sets cannot be identified with sets of small measure, that the formal and logical distinction between "(a)typical" and "very (im)probable" becomes manifest.

The first theorem specifies the condition under which a system of small sets can be identified with the null sets of an appropriate measure.

Theorem 1 Let σ(S) be the sigma-algebra generated by S, i.e., the smallest sigma-algebra containing all small sets. There exists a measure μ on σ(S) such that

SMALL(A) ⇔ μ(A) = 0   (6.10)

if and only if SMALL(·) is closed under countable unions.⁴

⁴ Proof: For any measure μ, the sets of measure zero form a sigma-ideal that is closed under countable unions. This follows from the σ-subadditivity of measures: If (A_i)_{i≥1} is a countable family with μ(A_i) = 0 for all i ≥ 1, then μ(⋃_{i≥1} A_i) ≤ ∑_{i=1}^∞ μ(A_i) = 0. Conversely, suppose that SMALL(·) is closed under countable unions. Then the set function μ̃(A) := 0 if SMALL(A), μ̃(A) := 1 if BIG(A), extends to a (normalized) measure μ on σ(S).

In practice, the measure obtained from this theorem may not be very useful. It is only constructed on a small sigma-algebra (the sigma-algebra generated by SMALL sets), which need not even contain all of Π (i.e., all the properties or propositions that initially defined the context of our typicality reasoning). The measure is nonetheless sufficient to identify all small and big sets, while any set A ∈ Π \ σ(S) is neither.

The following negative result holds for the weak measure-theoretic criterion (6.4), which is the one most commonly used for typicality arguments.

Theorem 2 A measure μ on (Ω, A) is called non-atomic if μ(A) > 0 ⇒ ∃B ⊂ A : 0 < μ(B) < μ(A). Let (Ω, A, μ) be a non-atomic measure space. Let ε ≤ (1/2) μ(Ω) and

SMALL(A) ⇔ μ(A) < ε,  A ∈ A.   (6.11)

Then there exist A, B ∈ A with SMALL(A), SMALL(B) but ¬SMALL(A ∪ B). Hence, SMALL cannot be closed under unions (i.e., satisfy the axioms of SMALL*) if defined on the entire sigma-algebra.⁵

⁵ Proof: For non-atomic measures, Sierpinski's theorem provides a kind of intermediate-value theorem: there exists, for any A ∈ A and p ∈ [0, μ(A)], a B ∈ A with μ(B) = p. Since μ(Ω) ≥ 2ε, there thus exists a measurable B with μ(B) = (3/4)ε. And since μ(B^c) ≥ (5/4)ε, there exists A ⊂ B^c with μ(A) = (3/4)ε. Hence, μ(A) = μ(B) < ε. But because A and B are disjoint, μ(A ∪ B) = μ(A) + μ(B) = (6/4)ε > ε.

Theorem 2 is the reason why not all notions of "smallness" can be reduced to the measure-theoretic criterion (6.4). Wilhelm (2022) provides the following example.

Example Consider Ω = ℕ with the cardinality criterion for smallness, i.e., SMALL*(A) ⇐⇒ |A| < ∞. For n ≥ 1, let A_n = {1, ..., n}. Then (A_n)_{n∈ℕ} is an ascending sequence of finite (and thus small) sets with ⋃_{n=1}^∞ A_n = ℕ. Now suppose there were a measure μ with μ(A_n) < ε < μ(ℕ) for all n. By the upward continuity of measures, we would have μ(ℕ) = μ(⋃_{n=1}^∞ A_n) = lim_{n→∞} μ(A_n) ≤ ε and hence a contradiction.

It is, however, crucial on what domain we are looking for a measure. Theorem 2 states that the condition SMALL(A) ⇔ μ(A) < ε cannot be closed under unions on the whole sigma-algebra of measurable subsets of Ω. As discussed in Sect. 3.2.4, it is often still possible to reconcile (6.4) (the weak measure-theoretic criterion) with axiom (iii*) (closedness under unions) by restricting our considerations to a more limited algebra Π ⊂ A of events that constitute the context of our typicality reasoning. This was a key insight behind Leitgeb's stability theory of belief, whose main result is summarized in the following theorem (cf. Leitgeb 2017, Thm. 7).

Theorem 3 Suppose Π is finite/countable and

BIG(A) ⇐⇒ μ(A) ≥ 1 − ε,  A ∈ Π

for a normalized measure μ. Then the following two conditions are equivalent:

(i) BIG(·) is closed under finite/countable intersections.
(ii) There exists a smallest BIG set B_Ω such that

∀A ∈ Π : BIG(A) ⇐⇒ B_Ω ⊆ A.   (6.12)

Evidently, B_Ω is the intersection of all BIG sets. And it is also μ-stable, meaning that, for any A ∈ Π with μ(A) > 0 and B_Ω ∩ A ≠ ∅,

μ(B_Ω | A) ≥ 1/2.   (6.13)

In particular, ¬SMALL(B_Ω | A) for all compatible A ∈ Π.⁶

⁶ Proof: Assuming (i), B_Ω := ⋂_{A : BIG(A)} A is the desired set satisfying (ii). Assuming (ii), let (A_i)_{i≥1} be a finite/countable collection of BIG sets. Then ∀i ≥ 1 : BIG(A_i) ⇐⇒ ∀i ≥ 1 : B_Ω ⊆ A_i ⇐⇒ B_Ω ⊆ ⋂_i A_i ⇐⇒ BIG(⋂_i A_i). Hence, BIG(·) is closed under intersections. To prove (6.13), we consider, for μ(A) > 0, the conditional probability μ(B | A) := μ(B ∩ A)/μ(A) = μ(B ∩ A)/(μ(B ∩ A) + μ(B^c ∩ A)) = (1 + μ(B^c ∩ A)/μ(B ∩ A))^(−1). Now we use the following lemma: If X ∈ Π with X ⊂ B_Ω, then μ(X) > ε. For otherwise, μ(X^c) ≥ 1 − ε ⇒ BIG(X^c), but B_Ω ⊄ X^c, in contradiction to (ii). We conclude: B_Ω ∩ A ⊂ B_Ω ⇒ μ(B_Ω ∩ A) > ε, while μ(B_Ω^c ∩ A) ≤ μ(B_Ω^c) ≤ ε, and thus μ(B_Ω | A) > 1/2.
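The following brute-force sketch (a toy example of my own, with an assumed weighting and ε) illustrates Theorem 3 on a small finite space: it checks that BIG(·) is closed under intersections, computes the smallest BIG set B_Ω as the intersection of all BIG sets, and verifies the stability property (6.13).

```python
from itertools import chain, combinations

# Toy setup (assumed for illustration): Omega = {0,...,4}, Pi = full power set.
weights = {0: 0.55, 1: 0.25, 2: 0.15, 3: 0.04, 4: 0.01}  # a normalized measure mu
eps = 0.1

def mu(A):
    return sum(weights[x] for x in A)

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

Pi = powerset(weights)
big_sets = [A for A in Pi if mu(A) >= 1 - eps]

# Condition (i): BIG is closed under intersections.
closed = all((A & B) in big_sets for A in big_sets for B in big_sets)

# Condition (ii): the smallest BIG set is the intersection of all BIG sets.
B_Omega = frozenset.intersection(*big_sets)
print("closed under intersections:", closed)
print("B_Omega =", sorted(B_Omega), "with mu(B_Omega) =", mu(B_Omega))

# Stability (6.13): mu(B_Omega | A) >= 1/2 for every A with mu(A) > 0 that meets B_Omega.
stable = all(mu(B_Omega & A) / mu(A) >= 0.5
             for A in Pi if mu(A) > 0 and B_Omega & A)
print("mu-stable:", stable)
```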


6.3.1 Equivalence of Typicality Measures

I have repeatedly insisted that, since the role of a typicality measure is only to identify "very large" resp. "very small" sets, a great many measures can do the job. The goal of this section is to make more precise in what sense different measures can be regarded as equivalent qua typicality measures.

Absolute Continuity

Let (Ω, A) be a measurable space and μ, ν two measures on it. ν is called absolutely continuous with respect to μ (notation ν ≪ μ) if

μ(A) = 0 ⇒ ν(A) = 0,  ∀A ∈ A.   (6.14)

Assuming that the measures are σ-finite (which means that Ω can be written as a countable union of sets with finite measure), the Radon–Nikodym theorem states that ν ≪ μ if and only if ν has a density with respect to μ. This means that there exists a non-negative, μ-integrable function g (unique up to sets of measure zero) such that

ν(A) = ∫_A g dμ,  ∀A ∈ A.   (6.15)

Obviously, ν ∼ μ :⇐⇒ ν ≪ μ ∧ μ ≪ ν defines an equivalence relation of measures determining the same null sets—and thus the same SMALL* sets in the sense of (6.3).
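As a minimal numerical sketch (my own example, not the author's), take μ = N(0,1) and ν = N(1,4) on ℝ: the two Gaussian measures are mutually absolutely continuous, so they have exactly the same null sets, and the Radon–Nikodym density g = dν/dμ is simply the ratio of their densities. The code recovers ν(A) as ∫_A g dμ for a test interval.

```python
# Radon-Nikodym density between two equivalent (mutually absolutely continuous) measures.
from scipy.stats import norm
from scipy.integrate import quad

mu_dist = norm(loc=0, scale=1)   # reference measure mu = N(0, 1)
nu_dist = norm(loc=1, scale=2)   # second measure  nu = N(1, 4)

g = lambda x: nu_dist.pdf(x) / mu_dist.pdf(x)   # density g = d(nu)/d(mu)

a, b = 0.0, 1.0                                  # test set A = [0, 1]
nu_direct = nu_dist.cdf(b) - nu_dist.cdf(a)      # nu(A) computed directly
nu_via_density, _ = quad(lambda x: g(x) * mu_dist.pdf(x), a, b)  # nu(A) = int_A g d(mu)

print(nu_direct, nu_via_density)   # the two values agree (up to quadrature error)
```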

Total Variation and Typicality Thresholds

Absolute continuity thus provides a precise notion of equivalence of typicality measures with respect to the strong measure-theoretic criterion, i.e., when we say that P is typical in Ω iff P(x) for almost all x ∈ Ω. When we admit exception sets of small but positive measure—as we do

in most contexts—"typical" is less sharp, and the equivalence of measures must be taken with a grain of salt.

In the situation of Theorem 3, when typicality is determined by a smallest big set, the issue is still clear. Given B_Ω such that ∀A ∈ Π : BIG(A) ⇐⇒ B_Ω ⊆ A, we have

∀A ∈ Π : BIG(A) ⇐⇒ μ(A) ≥ μ(B_Ω)   (6.16)

for any measure μ with B_Ω ⊄ A ∈ Π ⇒ μ(A) < μ(B_Ω). Simply put, typicality is already determined by the set-theoretic structure of Π, and a great many measures will be able to identify the "big sets" with μ(B_Ω) as threshold size.

In practice, however, we usually find ourselves in the opposite situation of not knowing the proposition B_Ω (if it exists) but relying on a natural measure to identify typicality facts. In this situation, it rarely makes sense to specify a sharp threshold measure for "big" respectively "small" sets, i.e., an exact value for ε in (6.4), a priori. Instead, we should understand SMALL (and thus typicality) as a vague predicate, which is, in fact, better captured by the informal condition SMALL(A) ⇐⇒ μ(A) ≈ 0. Instead of a sharp threshold measure, it is more appropriate to specify some ε > 0 such that μ(A) < ε is sufficient for regarding A as negligibly small in the given context. That is,

μ(A) < ε ⇒ SMALL(A),  A ∈ Π.   (6.17)

Similarly, we can specify ε < Υ < 1 − ε such that μ(A) < Υ is a necessary condition, i.e.,

SMALL(A) ⇒ μ(A) < Υ,   (6.18)

while remaining agnostic about the range μ(A) ∈ [ε, Υ]. If we are primarily interested in identifying small sets in the sense of (6.17), then, again, a large class of measures can do the job. Unless two measures differ drastically from one another, they will agree that certain subsets of Ω are very small. But what does it mean for two measures to differ "drastically"?


Let me assume normalized measures from here on for simplicity. A natural metric on the space of normalized measures on (Ω, A) is the total variation distance

d_TV(μ, ν) = sup_{A∈A} |μ(A) − ν(A)|.   (6.19)

That is, the total variation distance between μ and ν is the maximal difference between the values that the two measures can assign to one and the same set. Evidently, if d_TV(μ, ν) = δ < ε, then

ν(A) < ε − δ ⇒ μ(A) < ε.   (6.20)

In other words, if μ(A) < ε is sufficient for SMALL(A), then ν(A) < ε − δ will be sufficient as well.
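Here is a small brute-force sketch (my own toy example, with assumed numbers) on a finite space: it computes d_TV both as the sup over all subsets and via the equivalent half-L¹ formula, and then checks the implication (6.20) for a given ε.

```python
from itertools import chain, combinations

# Two normalized measures (toy values) on a small finite space Omega = {0,...,5}.
p = [0.30, 0.25, 0.20, 0.15, 0.07, 0.03]
q = [0.32, 0.22, 0.21, 0.13, 0.08, 0.04]

subsets = list(chain.from_iterable(combinations(range(len(p)), r)
                                   for r in range(len(p) + 1)))

def measure(w, A):
    return sum(w[i] for i in A)

# Total variation distance: sup over all sets, which equals half the l1-distance.
d_tv_sup = max(abs(measure(p, A) - measure(q, A)) for A in subsets)
d_tv_l1 = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
print(d_tv_sup, d_tv_l1)   # the two formulas agree up to floating-point rounding

# Implication (6.20): if q(A) < eps - delta, then p(A) < eps.
eps, delta = 0.15, d_tv_sup
holds = all(measure(p, A) < eps for A in subsets if measure(q, A) < eps - delta)
print("implication (6.20) holds:", holds)
```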

A Bound from Densities

A bound in total variation is actually stronger than necessary for our purposes. Given a normalized measure μ as reference, all we need is a bound on

sup { |ν(A) − μ(A)| : μ(A) < ε }.   (6.21)

If ν ≪ μ with density g, then (6.21) can be written as sup_{μ(A) < ε} | ∫_A (g − 1) dμ |.

λ_E ( { X : | (1/N) ∑_{i=1}^{N} 1_{v_{i,x} ∈ [a,b]}(X) − ∫_a^b ρ_MB(v_x) dv_x | > ε } ) → 0,  N → ∞,   (7.12)

with

ρ_MB(v_x) := ( 2π k_B T / m )^(−1/2) exp( − m v_x² / (2 k_B T) ).   (7.13)


The corresponding distribution for the three-dimensional velocity vector v is thus

ρ_MB(v) d³v := ( 2π k_B T / m )^(−3/2) exp( − m v² / (2 k_B T) ) d³v.   (7.14)

The derivation of (7.11)–(7.14) requires little more than standard calculus and measure theory. The interesting philosophical question is what this mathematical result means.

The function ρ_MB(v) is called the Maxwellian or Maxwell–Boltzmann distribution. It is a probability density describing a distribution of particle velocities. Despite the appearance of a probability distribution, there is nothing intrinsically random about the velocities of particles in a gas. The velocity (and position) of every single particle is comprised in the microstate X, whose evolution follows a deterministic equation of motion. There are possible X for which the actual velocity distribution differs significantly from the Maxwellian. For instance, microstates for which all particles move with one and the same velocity, or microstates for which a few very fast particles account for almost the entire kinetic energy. But these states are extremely special ones. The crucial and remarkable fact expressed by (7.12) is that, for large N, the overwhelming majority of possible microstates X is such that the distribution of velocities in the gas is (approximately) Maxwellian. What constitutes an "overwhelming majority of microstates" is made precise in terms of the stationary measure λ_E. The Maxwell distribution is thus derived from the microscopic theory as a statistical regularity manifested for typical micro-configurations. Ludwig Boltzmann expressed this reasoning as follows:

The ensuing, most likely state, which we call that of the Maxwellian velocity distribution, since it was Maxwell who first found the mathematical expression in a special case, is not an outstanding singular state, opposite to which there are infinitely many more non-Maxwellian velocity distributions, but it is, on the contrary, distinguished by the fact that by far the largest number of possible states have the characteristic properties of the Maxwellian distribution, and that compared to this number the amount of possible velocity distributions that deviate significantly from Maxwell's is


vanishingly small. The criterion of equal possibility or equal probability of different distributions is thereby always given by Liouville’s theorem (Boltzmann, 1896, p. 252, translation by the author).

It is crucial to appreciate that, while three different measures appear in the mathematical expression (7.12), their status is very different (cf. Goldstein 2012). We have:

• The actual (empirical) distribution ρ_emp[X] dv = (1/N) ∑_{i=1}^{N} 1_{v_{i,x} ∈ dv}(X), yielding the ratio of particles with v_x ∈ dv as a function of the microstate X.
• The theoretical (Maxwellian) distribution ρ_MB dv ∝ exp( − (1/(k_B T)) m v²/2 ) dv.
• The typicality (microcanonical) measure λ_E.

Equation (7.12) thus tells us that ρ_emp ≈ ρ_MB for typical microstates X ∈ Γ_E. This characterizes the equilibrium distribution for an ideal gas. Notably, the Maxwellian ρ_MB and the empirical distribution ρ_emp refer to the ensemble of particles in the box, whereas the microcanonical measure does not refer to an ensemble of boxes but is used to define typicality.

I should also emphasize again the very limited extent to which knowledge, information, credence, or any other subjectivist notions play a role in the analysis. It is an objective fact that, for nearly all possible microstates, the distribution of velocities in an ideal gas is approximately Maxwellian. This typicality fact has nothing to do with quantifying our ignorance and would not be any less true if we somehow did have complete knowledge of the system's microstate.

So, should we expect to find a given gas system in equilibrium because it is the typical macrostate? Indeed, we should. In actuality, we will sometimes observe gases out of equilibrium, but then we would infer that the system was fairly recently subject to external interactions, thus invoking an explanation for its atypical state. This is all based on the rationality principles of typicality reasoning laid out in Chap. 1.
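To make (7.12) tangible, here is a Monte Carlo sketch (my own illustration, with an idealized monatomic ideal gas in arbitrary units): drawing the momenta of a single microstate uniformly from the constant-energy sphere (positions are irrelevant for the velocity statistics), the empirical fraction of particles with v_x in a given interval already matches the Maxwellian prediction closely for N = 10,000.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
N, m, E = 10_000, 1.0, 15_000.0   # particle number, mass, total kinetic energy (arbitrary units)
kT = 2 * E / (3 * N)              # from E = (3/2) N k_B T

# One "typical" microstate: momenta drawn uniformly from the energy shell sum(p^2) = 2 m E
# (draw a 3N-dimensional Gaussian vector and rescale it onto the sphere).
p = rng.normal(size=3 * N)
p *= np.sqrt(2 * m * E) / np.linalg.norm(p)
vx = p.reshape(N, 3)[:, 0] / m    # x-components of the particle velocities

a, b = 0.0, 1.0
empirical = np.mean((vx >= a) & (vx <= b))                                  # rho_emp weight of [a, b]
maxwellian = 0.5 * (erf(b / sqrt(2 * kT / m)) - erf(a / sqrt(2 * kT / m)))  # integral of rho_MB over [a, b]
print(empirical, maxwellian)      # close for large N, as the typicality statement (7.12) asserts
```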


7.2.2 The Coin Toss Again

Analogous reasoning can be applied to more mundane examples like the repeated tossing of a coin. It is a statistical regularity found in our universe that the relative frequency of heads or tails in a long series of fair coin tosses comes out as approximately 1/2. Since coin tossing is guided by the same laws as all other physical processes, this statistical regularity has to be explained on the basis of the fundamental microscopic theory (here, classical mechanics). It is not a new kind of law that holds over and above the deterministic micro-dynamics.

We have already seen what such an explanation would look like. We denote by χ_i(X) ∈ {0, 1} the outcome of the i-th coin toss in a long series of N trials. Since classical mechanics is deterministic, the outcome of every single trial is determined, through the fundamental laws of motion, by suitable initial conditions X. Obviously, the functions χ_i are very coarse-grained. We do not care about the exact configuration of atoms making up the coin; we do not even care about the exact position or orientation of the coin. We only ask which side is facing up as the coin lands on the floor. This defines our macroscopic variable (Fig. 7.1).

To keep the discussion focused on physics, it helps to exclude all human involvement and think of the coin being tossed, not by human hand, but by a coin-tossing machine, i.e., a mechanical device that takes a coin, tosses it, registers heads or tails, takes the coin again, tosses it again, and so on. We may imagine this system as isolated; that is, the machine is set up at some initial time t_0, resulting in certain initial micro-conditions, and from there on, everything runs its deterministic course. A stereotypical explanation of the coin-tossing probabilities would now be a result of the form

λ ( { | (1/N) ∑_{i=1}^{N} χ_i(X) − 1/2 | > ε }  |  M_0 ) ≤ δ(ε, N),   (7.15)

where M_0 is the initial macrostate of the coin-tossing machine and δ(ε, N) becomes arbitrarily small with increasing N. The variable X thus ranges over possible initial conditions of our coin-tossing machine, and


Fig. 7.1 The coin toss outcome χ_i at time t is determined by microscopic initial conditions X in the macrostate M_0 and deterministic dynamics represented by the flow Φ_{t,0}. Thus, χ_i(X) is obtained by evolving X and coarse-graining X(t) to the macro-event heads or tails

(7.15) says that, if N is sufficiently large, the set of initial conditions for which the relative frequency of heads deviates significantly from 1/2 is extremely small. Such initial conditions are not impossible, but atypical. Conversely, long-term frequencies close to 1/2 are not necessary, but typical. By now, the reader will have certainly made the connection with our previously discussed model in terms of Rademacher functions and also recognized (7.15) as a law-of-large-numbers result.

Still, we could not be entirely satisfied with an explanation based on typical initial conditions of our coin-tossing machine. Even if we accept the idealization of a perfectly isolated machine, its initial microstate is itself the result of physical processes (the process of setting up the machine, for instance) and thus determined by the physical laws and suitable initial conditions. Why should these initial conditions be such that the device ends up in one of the "good" microstates that produce a roughly equal number of heads and tails? If we think this through till the end, we must eventually speak about the universe as a whole—the only truly closed system—and treat the universe as one big coin-tossing machine.
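The Rademacher-function picture can be turned into a toy "coin-tossing machine" in a few lines (my own sketch, not the author's model): the initial condition is a number X ∈ [0,1), the deterministic dynamics is the doubling map, and the i-th outcome is just the i-th binary digit of X. Estimating the Lebesgue (typicality) measure of the "bad" initial conditions shows it shrinking with N, in the spirit of (7.15).

```python
import random

random.seed(1)

def coin_outcomes(n_tosses: int) -> list:
    """A Lebesgue-typical initial condition X in [0,1), represented exactly by its
    first n binary digits; the doubling map X -> 2X mod 1 reads these digits off
    one by one, so digit i is the outcome of the i-th 'toss'."""
    x_bits = random.getrandbits(n_tosses)   # uniform over n-bit dyadic rationals
    return [(x_bits >> (n_tosses - 1 - i)) & 1 for i in range(n_tosses)]

def bad_fraction(num_samples: int, n_tosses: int, eps: float) -> float:
    """Estimate the measure of initial conditions whose relative frequency of
    heads deviates from 1/2 by more than eps (the 'atypical' set in (7.15))."""
    bad = 0
    for _ in range(num_samples):
        freq = sum(coin_outcomes(n_tosses)) / n_tosses
        bad += abs(freq - 0.5) > eps
    return bad / num_samples

for n in (10, 100, 1000):
    print(n, bad_fraction(num_samples=2000, n_tosses=n, eps=0.1))
# The exceptional set shrinks rapidly with N: large deviations become atypical.
```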


The point here is of conceptual rather than practical relevance. We know, of course, that classical mechanics does not accurately describe the entire universe, and we would never be able to prove a cosmological version of (7.15) even if it did. Still, if we want to argue that Newtonian theory is sufficient to explain coin toss statistics—and it is hard to see why relativistic or quantum effects should be relevant here—we must conceive of these statistics as a regularity of a Newtonian universe. In the context of our limited mathematical treatments, we might be more comfortable speaking of “model universes” rather than “models of the universe.” That’s perfectly fine as long as these model universes follow the relevant deterministic laws and we don’t smuggle in some external source of “randomness.”

7.3 Deterministic Subsystems

When we think of classical mechanics, the first applications that come to mind seem very different from the coin toss or the velocity distribution of molecules in a gas. For instance, when we want to predict the trajectory of a stone thrown on Earth, we can get very accurate results by solving a simple deterministic equation without being embarrassed by our ignorance of the exact initial microstate of the stone or its environment. There are two conditions satisfied in this case that allow us to do so:

1. The external forces, that is, the influences of the rest of the universe neglected in our calculations, are very small compared to the gravitational attraction between stone and Earth. This is because other gravitating bodies are either very far away or have very small masses compared to our planet. Formally, this is to say that

V_ext ≈ 0,   (7.16)

which allows us to treat the system (stone, Earth) as an autonomous Newtonian system for most practical purposes (a rough numerical comparison follows the list).

2. The evolution of the relevant macroscopic variable—here, the stone's center of mass—is fairly robust against variations of the microscopic


initial conditions. In other words, small changes in the initial microconditions (typically) have only a small effect on the stone’s trajectory. This is why our ignorance about the exact microscopic configuration of the stone (or planet Earth, or the person throwing the stone) does not prevent us from making reliable predictions.
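To put rough numbers on condition 1 (my own back-of-the-envelope sketch with standard astronomical values): the gravitational accelerations imparted on the stone by the Moon and the Sun are smaller than the Earth's pull by several orders of magnitude, and they largely cancel against the common acceleration of the Earth anyway.

```python
# Rough comparison of gravitational accelerations acting on the stone (SI units).
G = 6.674e-11          # gravitational constant
bodies = {
    # name: (mass in kg, distance from the stone in m)
    "Earth": (5.972e24, 6.371e6),     # distance ~ Earth's radius
    "Moon":  (7.342e22, 3.844e8),     # mean Earth-Moon distance
    "Sun":   (1.989e30, 1.496e11),    # mean Earth-Sun distance
}

acc = {name: G * mass / dist**2 for name, (mass, dist) in bodies.items()}
for name, a in acc.items():
    print(f"{name}: {a:.2e} m/s^2  (ratio to Earth: {a / acc['Earth']:.1e})")
# Earth ~ 9.8 m/s^2, Sun ~ 6e-3 m/s^2, Moon ~ 3e-5 m/s^2: the external pulls are tiny,
# and they act on the Earth and the stone almost equally, so V_ext ~ 0 is an excellent
# approximation for the (stone, Earth) subsystem.
```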

7.3.1 The Stone Throw

Nonetheless, even in this case, our predictions are, strictly speaking, typicality results. Atypical events in the environment or among the microscopic constituents of the stone would lead to very different flight trajectories. For instance, as Albert (2015, p. 1) reminds us, the stone could be "suddenly ejecting one of its trillions of elementary particulate constituents at enormous speed and careening off in an altogether different direction." It would thus be quite appropriate to cast our mechanical prediction in a form very similar to the probabilistic statements (7.12) or (7.15). For instance, denoting by q(t) the computed center-of-mass trajectory (depending on its initial position and momentum) and by q̃(t) the actual one (depending on the initial condition X of the universe), an "honest" statement might take the form:

λ ( { X : sup_{t∈[0,T]} |q̃(t) − q(t)| > ε }  |  M_0 ) ≈ 0,   (7.17)

where M_0 is the initial macrostate, which includes approximate initial conditions for the stone but also our macroscopic evidence justifying the description of the system (stone, Earth) as an isolated Newtonian subsystem.

It is because atypical events generally don't happen that our "deterministic" predictions are reliable. But just as it is easy to mistake unpredictability for indeterminacy, the predictability of certain macroscopic regularities—from the stone throw to the oscillations of a pendulum to planetary motions—can create an illusion of necessity when what we actually observe and tacitly assume is typical behavior.


As Erwin Schrödinger emphasized, even the proverbial clockwork must ultimately be understood in the "statistical picture" and thus as a typical phenomenon:

Whether the motion of a clock is to be assigned to the dynamical or to the statistical type of lawful events (to use Planck's expressions) depends on our attitude. In calling it a dynamical phenomenon we fix attention on the regular going that can be secured by a comparatively weak spring, which overcomes the small disturbances by heat motion, so that we may disregard them. But if we remember that without a spring the clock is gradually slowed down by friction, we find that this process can only be understood as a statistical phenomenon. However insignificant the frictional and heating effects in a clock may be from the practical point of view, there can be no doubt that the second attitude, which does not neglect them, is the more fundamental one, even when we are faced with the regular motion of a clock that is driven by a spring. For it must not be believed that the driving mechanism really does away with the statistical nature of the process. The true physical picture includes the possibility that even a regularly going clock should all at once invert its motion and, working backward, rewind its own spring—at the expense of the heat of the environment. (Schrödinger [1944] 2012, pp. 82–83)

What Schrödinger alludes to in the last sentence is the possibility—though an atypical one—of a violation of the second law of thermodynamics. The second law and its microscopic explanation will be the focus of our next chapter.

References

Albert, D. Z. (2015). After physics. Cambridge, Massachusetts: Harvard University Press.
Boltzmann, L. (1896). Entgegnung auf die wärmetheoretischen Betrachtungen des Hrn. E. Zermelo. Wiedemanns Annalen, 57, 773–784.


Goldstein, S. (2012). Typicality and notions of probability in physics. In Y. Ben-Menahem & M. Hemmo (Eds.), Probability in physics. The Frontiers Collection (pp. 59–71). Berlin: Springer.
Schilpp, P. (Ed.) (1949). Albert Einstein: Philosopher-scientist. Number VII in The Library of Living Philosophers (1st edn.). Evanston, Illinois: The Library of Living Philosophers Inc.
Schrödinger, E. ([1944] 2012). What is life? (reprint edn.). Cambridge: Cambridge University Press.

8 Boltzmann’s Statistical Mechanics

This chapter will discuss statistical mechanics in the manner of Ludwig Boltzmann (1844–1906). I will focus, in particular, on Boltzmann’s account of thermodynamic irreversibility and the second law of thermodynamics—the most profound legacy of the great Austrian physicist, in which the role of typicality comes out particularly clearly. In contemporary philosophical literature, this account is sometimes referred to as “neo-Boltzmannian,” a term that strikes me as overly flattering to both Boltzmann’s critics and his contemporary defenders. This is not to diminish the important contributions of Bricmont (1995); Carroll (2010); Goldstein (2001); Lebowitz (1993a,b); Penrose (1989), and others to whom we owe the modern presentations and elaborations of Boltzmann’s ideas. I believe these authors would agree with me that the key concepts and insights were all there in Boltzmann’s original work and have stood the test of time, despite ongoing but largely unnecessary controversies and misunderstandings.1

1 See Bricmont (1995) and Lazarovici and Reichert (2015) for replies to some of the most common ones.


8.1 The Second Law of Thermodynamics

Our discussion is concerned with the explanation of the irreversible thermodynamic behavior of macroscopic systems. "Thermodynamic behavior" refers to the ubiquitous phenomenon that physical systems, prepared or created in a non-equilibrium state and then suitably isolated from their environment, evolve to, and then stay in, a distinguished macroscopic configuration which is their equilibrium state. Familiar examples are the expansion of a gas, the dissipation of heat, and the mixing of milk and coffee. This empirical regularity is captured by the second law of thermodynamics, which posits the monotonic increase of a macroscopic variable of state called entropy that attains its maximum value in equilibrium. One of the main tasks of statistical mechanics is to explain this macroscopic regularity on the basis of the more fundamental laws guiding the behavior of the system's microscopic constituents.

We recall a key concept of Boltzmann's statistical mechanics which is the distinction between microstates and macrostates. Whereas the microstate X(t) of a system is given by the complete specification of its microscopic degrees of freedom, the macrostate M(t) is specified in terms of physical variables that characterize the system on macroscopic scales (e.g., volume, pressure, temperature, and magnetization). A system's microstate completely determines its macrostate, M(t) = M(X(t)), but one and the same macrostate is realized by many different microstates for which the macro-variables attain the same or (for the relevant purpose) indistinguishable values. We thus obtain a partition or coarse-graining of the microscopic state space into macrostates.

Turning to the phase space picture of classical Hamiltonian mechanics for an N-particle system, a microstate corresponds to one point X = (q, p) in phase space Γ ≅ R^(3N) × R^(3N), q = (q_1, q_2, ..., q_N) being the position and p = (p_1, p_2, ..., p_N) the momentum coordinates of the N particles, while a macrostate M corresponds to an entire region Γ_M ⊆ Γ of phase space, viz., the set of microstates that realize M. The microscopic laws of motion are such that any initial microstate X_0 determines the complete micro-evolution X(t) = Φ_{t,0}(X_0) of


the system—represented by a unique trajectory in phase space going through X_0—thereby also determining the macro-evolution M(X(t)) as the microstate passes through different macro-regions.

These concepts are pretty much forced on us if we accept the supervenience of macroscopic facts on microscopic facts. And they are essential to appreciating the challenge at hand. The second law of thermodynamics describes an empirical regularity of the macro-evolution of physical systems. This macro-evolution is always determined by the evolution of the microstate, which follows exact deterministic laws of motion. The goal of statistical mechanics is thus to explain the empirical regularity expressed by a macroscopic law on the basis of the more fundamental microscopic theory. When it comes to the second law of thermodynamics, this seems like a quite formidable task, as it requires us to reconcile the irreversibility of thermodynamic behavior with the time-reversal invariance of the microscopic laws of motion (see Appendix A for a discussion of time-reversal symmetry). Simply put, the microscopic laws would allow any process to occur in reverse, while the second law tells us that thermodynamic processes never do.

The task was nonetheless accomplished by Ludwig Boltzmann in the second half of the nineteenth century. His account is based on two essential insights:

1. The identification of entropy S with the logarithm of the phase space volume corresponding to a system's current macrostate. Formally,

S := k_B ln |Γ_M(X)|,   (8.1)

where k_B is the Boltzmann constant and |Γ_M| denotes the phase space volume of the macro-region Γ_M, the set of all microstates X ∈ Γ that realize the macrostate M. Phase space volume is captured by the Liouville measure or the induced micro-canonical measure if we consider systems of fixed total energy.

2. The understanding that the separation of scales between the microscopic and the macroscopic level leads to enormous differences in the phase space volume corresponding to states with different values of entropy. In particular, we generally find that the equilibrium region—by definition, the region of maximum entropy—is vastly larger than any other macro-region, so large, in fact, that it exhausts almost the entire phase space volume. In other words, nearly every microstate is an equilibrium state. Or again, the equilibrium values are the typical values of the relevant macro-variables. In the simplest case, when the particles can be considered independent and the macro-variables are sums or averages of one-constituent functions, say of the form F(X) = (1/N) ∑_{i=1}^{N} f(q_i, p_i), this is simply the law of large numbers at work: on a set of nearly full measure, (relative) fluctuations in the value of F are of not much higher order than 1/√N.

The two points are related as follows. Entropy is an extensive variable of state, meaning that it scales like N (the number of microscopic constituents) for fixed values of the other macro-variables. Notable entropy differences are thus of order N (times the Boltzmann constant), and the differences in phase space measure of the corresponding macro-regions of order exp(N). If we now recall that N ∼ 10^24 for macroscopic systems (from Avogadro's constant), we see that different entropy levels correspond to enormous differences in phase space measure. In other words, we generally find that, for systems with a large number of degrees of freedom, the coarse-graining of microstates into macrostates does not correspond to a partition of phase space into regions of roughly the same size, but into regions whose sizes vary by a great many orders of magnitude, with the equilibrium region being the largest by far (Fig. 8.1).

Fig. 8.1 Partition of phase space into macro-regions (the largest region being the equilibrium region). Size differences are much more extreme than depicted
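A back-of-the-envelope sketch (mine, using the crude counting in which each particle independently contributes a factor of 2 for being in the left or right half of a box) shows how an entropy difference of order N·k_B corresponds to a phase space ratio of order exp(N):

```python
import math

N = 1e24                    # microscopic constituents of a macroscopic gas
k_B = 1.380649e-23          # Boltzmann constant in J/K

# Crude counting: giving each particle the whole box instead of one half multiplies
# the number of available spatial configurations by 2 per particle.
log10_ratio = N * math.log10(2)       # log10 of the volume ratio 2**N
delta_S = k_B * N * math.log(2)       # corresponding Boltzmann entropy difference

print(f"volume ratio ~ 10**(10**{math.log10(log10_ratio):.1f})")  # roughly 10**(10**23.5)
print(f"entropy difference ~ {delta_S:.1f} J/K")                   # a modest ~10 J/K
# The ratio is a number with about 3 * 10**23 decimal digits, while the entropy
# difference is thermodynamically unremarkable: this is the separation of scales at work.
```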


Given such a partition of phase space, we can already understand that the thermodynamic behavior we want to explain is not a feature of some very particular micro-dynamics. On the contrary, a microevolution would have to be very peculiar to avoid carrying a microstate into larger and larger macro-regions—corresponding to an increase in entropy—and finally into the equilibrium region, where it spends most of the time throughout its history. This is why Boltzmann’s account is extremely robust against the details of the microscopic theory, giving us an understanding of thermodynamic behavior as a virtually universal feature of macroscopic systems. And yet, it cannot be true that all initial micro-conditions lead to an evolution of increasing (or non-decreasing) entropy. This is a straightforward consequence of the time-reversal symmetry of the microscopic dynamics, as was famously pointed out by Josef Loschmidt in his “reversibility objection.” Lebowitz rightly warned us, quoting Ruelle, that Boltzmann’s ideas are “at the same time simple and rather subtle” (Lebowitz, 1993b, p. 7).

8.1.1 The Typicality Account

Let us approach the subtleties in the explanation of thermodynamic irreversibility by discussing the paradigmatic example of an expanding gas in a box. We thus consider a system of N ≈ 10^24 particles—interacting by a short-range potential (or not at all in the model of an ideal gas)—which are confined to a finite volume within a box with reflecting walls. Assume that we find or prepare the gas in the macrostate M_2 sketched below (Fig. 8.2), that is, a particle configuration that looks, macroscopically, like a gas filling about half of the accessible volume. What kind of macroscopic evolution should we expect for this system?

Fig. 8.2 Thermodynamic evolution of an expanding gas

A simple combinatorial argument shows that the overwhelming majority of microstates that the system could evolve into look, macroscopically, like M_eq, i.e., like a gas homogeneously distributed over the entire box. In fact, one can readily conclude that (in agreement with our general reasoning above) the phase space volume corresponding to this equilibrium macrostate M_eq is about 2^N ≈ 10^(10^24) times(!) larger than the phase space volume corresponding to a macrostate like M_2.

Hence, as the particles move with different velocities in different directions, scattering from each other and occasionally from the walls, the system's microstate wanders around on an erratic path in the high-dimensional phase space, and we should expect that this path will soon end up in the equilibrium region Γ_Meq and stay there for the foreseeable future. Fluctuations out of equilibrium, e.g., from M_eq back into M_2, are possible. They must, in fact, occur for almost all initial conditions according to the Poincaré recurrence theorem.² However, as Boltzmann (1896) already explained (see also Ehrenfest 1907), the typical time scales for such large fluctuations are so astronomical—many orders of magnitude greater than the age of our universe—that they have no empirical relevance.

² Poincaré recurrence theorem: Let A ⊂ Γ be a set of positive Liouville measure. Then, for almost all X ∈ A, there exists a sequence of times t_i → ∞ such that X(t_i) = Φ_{t_i,0}(X) ∈ A, ∀i ∈ ℕ.

It was also clear to Boltzmann (at least after Loschmidt's objection) that there are initial conditions in Γ_M2 for which the system will not exhibit thermodynamic behavior but follow an anti-thermodynamic trajectory of decreasing entropy. If we consider a macrostate M_1 of even lower entropy (e.g., the one depicted in Fig. 8.2), the time-reversal symmetry of the microscopic laws implies that, for every solution evolving from Γ_M1 into Γ_M2, there exists another solution carrying a microstate from Γ_M2 into the lower-entropy region Γ_M1. (Indeed, we only have to take the solutions


that have evolved from Γ_M1 into Γ_M2 and reverse all particle velocities.) However, the initial micro-conditions in Γ_M2 that lead to such an anti-thermodynamic evolution are atypical relative to all possible microstates realizing M_2; they form a subset whose measure is a tiny fraction of |Γ_M2|. The correct statement is thus that nearly all microstates in Γ_M2 will evolve into the equilibrium region Γ_Meq, while only a very small subset of "bad" initial conditions will exhibit the anti-thermodynamic evolution into a lower-entropy state. In other words, a thermodynamic evolution of increasing entropy is typical given the initial non-equilibrium macrostate M_2.

It is actually more appropriate to think of the set of all possible micro-evolutions with initial conditions in Γ_M2 rather than any individual trajectory. The dynamics of a system of N ≈ 10^24 particles is very chaotic, in the sense that small variations of the initial conditions can lead to very different time evolutions. Under the microscopic dynamics, the set of microstates realizing M_2 at the initial time will thus disperse over phase space (respectively a submanifold compatible with the constants of motion), with the overwhelming majority of microstates ending up in the equilibrium region and only a small fraction of "bad" initial configurations evolving into the comparably tiny macro-regions of equal or lower entropy.

Remark 4 (On the Notion of Chaos) The notion of "chaos" is difficult to exhaust with rigorous mathematical definitions. It is clear that some form of dynamical instability is characteristic of thermodynamic systems with many degrees of freedom, and various formal concepts try to capture this feature in some neat, rigorous, mathematically polished way. The fruitfulness of these concepts in certain areas of mathematics has contributed to the idea that one of them in particular must play a central role in the foundations of statistical mechanics and be identified as the precise dynamical assumption underlying Boltzmann's account of the second law. However, as emphasized before, the explanation of thermodynamic behavior is extremely robust against the details of the microscopic model and doesn't hinge on any narrowly conceived property of the dynamics. In particular, the relevant systems might easily fail to be ergodic, or mixing, or have everywhere positive Lyapunov exponents—to throw around some


mathematical jargon—though their overall behavior would have to be completely qualitatively different from what it is generally understood to be in order to render the typicality account irrelevant.

8.1.2 Macroscopic Irreversibility

While my presentation of Boltzmann's account already incorporated his response to Loschmidt and other critics, the solution to the "reversibility paradox" remains to be explicitly spelled out. We argued that it is typical for microstates realizing a low-entropy macrostate to undergo a thermodynamic evolution of increasing entropy and converge to equilibrium (and typical for equilibrium states to remain in equilibrium for a very long time). Crucially, this typicality statement refers to initial conditions relative to the initial macrostate. In other words, the relevant reference set, when discussing convergence to equilibrium from a non-equilibrium macrostate M_2, is the set Γ_M2 of possible microstates that coarse-grain to M_2. In terms of overall phase space volume, a non-equilibrium macrostate occupies a vanishingly small fraction of phase space, to begin with. Thus, more technically speaking, the typicality measure must be conditionalized on the initial macrostate.

The time symmetry of the microscopic laws is now manifested in the fact that the phase space volume occupied by the "good" initial conditions in Γ_M2—those for which the system will relax into equilibrium—is just as large as the measure of "bad" initial conditions in Γ_eq for which the system will evolve out of equilibrium into the macrostate M_2. In other words, over any given period of time, there are just as many solutions that evolve into equilibrium as solutions evolving out of equilibrium into a lower-entropy state. The first case, however, is typical for microstates in Γ_M2, i.e., for systems in non-equilibrium, whereas the anti-thermodynamic evolution is atypical for microstates in Γ_eq, that is, with respect to all possible equilibrium configurations. It is this fact and this fact alone that grounds the irreversibility of the second law of thermodynamics. And what breaks the time symmetry is only the assumption (or preparation) of a special, i.e., low-entropy initial macrostate.


There is another manifestation of time symmetry that is not broken by Boltzmann’s typicality account. Relative to a low-entropy initial macrostate, entropy increase is typical in both time directions. In other words, it is typical for the system to evolve into equilibrium in the future but also to have evolved from an equilibrium state in the past. This is not a concern as long as we are talking about subsystems that have been prepared or created in a low-entropy state (and cannot be regarded as closed systems prior to this initial time). However, it does give rise to a further puzzle as soon as we talk about the thermodynamic history of our universe and how we should infer its past from present evidence.

Past Hypothesis and the Thermodynamic Arrow

Indeed, by identifying special macroscopic boundary conditions as the origin of the thermodynamic asymmetry, the typicality account is shifting the puzzle from why it is that non-equilibrium systems converge to equilibrium, to why it is that we find systems in such special states in the first place. Note that with respect to all possible microstates, nearly all configurations realize a state for which the system is in equilibrium, will be in equilibrium for most of its future, and has been in equilibrium for most of its past. This situation—which would be typical tout court—is indeed a time-symmetric one.

As long as we are preoccupied with boxes of gas or melting ice cubes or the like, their low-entropy states are usually attributable to influences from outside, i.e., to the fact that these systems are part of some larger system (usually containing a physicist, or a freezer, or the like) from which they branched off at some point to undergo a (more or less) autonomous evolution as (more or less) isolated subsystems. This presupposes, however, that these larger systems were themselves out of equilibrium; otherwise, they could not have given rise to branching subsystems with less than maximal entropy without violating the second law. If we think this through to the end, we arrive at the question of why it is that we find our universe in such a special state, far away from thermodynamic equilibrium, and how to justify our belief that


its state was even more special the farther we go back in time. This is what Goldstein (2001, p. 49) calls the "hard part of the problem [of irreversibility]," and it concerns, broadly speaking, the origin of irreversibility and the thermodynamic arrow of time in our universe. Dealing with the "hard problem" will require us to confront the role and status of the Past Hypothesis (Albert, 2000) postulating a very-low-entropy beginning of our universe, which we will do in Chap. 11. It will also lead us to the question of whether the arrow of time per se can be reduced to the thermodynamic asymmetry, that is, whether the temporal direction of the low-entropy boundary condition is determinative of what we experience as the past.

8.1.3 The Role of the Typicality Measure

In Boltzmann's account of the second law, typicality is understood in terms of the Liouville measure λ, corresponding to the intuitive phase space volume. More precisely, for a perfectly isolated system with total energy E, we have to consider the microcanonical measure λ_E on the hypersurface Γ_E ⊂ Ω of constant energy E, to which the motion of the system is confined in virtue of energy conservation. I usually omit this distinction for the sake of simplicity and merely refer to "phase space" and the "measure" or "volume" of its subsets. In any case, a crucial property of both the Liouville and the microcanonical measure is their stationarity under the microscopic time evolution, which ensures that phase space volume is conserved by the dynamics and that macro-regions do not change in size.

Returning to the above presentation of the typicality account, we note that the measure serves two distinct purposes:

1. To establish that the phase space region corresponding to the macrostate M_2 is very much larger than the phase space region corresponding to the macrostate M_1 and that the phase space region corresponding to the equilibrium macrostate M_eq is very much larger than the macro-region of M_2, so large that it occupies almost the entire phase space volume.


It is easy to learn about this "dominance of the equilibrium state" (Frigg, 2011) yet hard to appreciate the scale of proportions involved. Just think of the ratio 10^(10^24) : 1 that we encountered for the gas model, which is beyond anything we could intuitively grasp. For reference, the volume of the observable universe compared to that of a hydrogen atom is about 10^110 : 1.

Together with the stationarity of the phase space measure, the dominance of the equilibrium state already entails that typical solution trajectories are in equilibrium by far most of the time (Reichert, 2023). This "qualitatively ergodic behavior" is not what we need since it implies nothing relevant about the convergence to equilibrium, but it goes in the right direction.

2. To define a notion of typicality relative to the system's initial macrostate, allowing us to assert that nearly all microstates in the non-equilibrium region Γ_M2 will evolve into equilibrium, while nearly all equilibrium configurations will stay in equilibrium for the foreseeable future.

Regarding the meaning of "nearly all," one must note that it is only in the idealized situation of a thermodynamic limit (where the number of microscopic degrees of freedom tends to infinity) that one can expect the exception set of "bad" configurations to be of measure zero. For realistic systems, the atypicality of such configurations means that they form a set of extremely small (yet positive) measure compared to that of the respective macro-region. In fact, stationarity allows us to estimate the ratio of good versus bad microstates in Γ_M2 in the following sense. Let Γ_Mlow be the region of phase space corresponding to states of (significantly) lower entropy, and B ⊂ Γ_M2 be the set of "bad" initial conditions in Γ_M2 that will have evolved into Γ_Mlow after a given time T. Then Φ_{T,0}(B) ⊆ Γ_Mlow and thus |B| = |Φ_{T,0}(B)| ≤ |Γ_Mlow|. Hence, |B| : |Γ_M2| ≤ |Γ_Mlow| : |Γ_M2| ≈ 1 : 10^(10^24).

There are thus two typicality statements involved in Boltzmann's account of the second law. First, equilibrium configurations are typical in Γ, that is, with respect to all possible microstates. This is an observation about the kind of phase space partitions that natural sets of macro-variables give rise to. Second, micro-configurations converging to


equilibrium and hence leading to thermodynamic macro-behavior are typical relative to a non-equilibrium initial macrostate. This second, conditional typicality statement is more subtle. Morally, it is a consequence of the first but notoriously difficult to prove for somewhat realistic models.

8.1.4 On the Boltzmann Entropy

The Boltzmann entropy (8.1) is a strange kind of macro-variable. Strictly speaking, there is no such thing as the Boltzmann entropy. Since it is defined in terms of the size of the system's macro-region, the Boltzmann entropy depends on the partition of phase space and thus on both the choice of macro-variables and the coarse-graining of their values into macrostates.

When we are talking gas theory, there is a natural set of macro-variables—U (internal energy), V (volume), T (temperature), p (pressure)—two of which can be varied independently while the other two will relax to their typical, i.e., equilibrium values under those constraints. In terms of these variables, one finds that the (relative) Boltzmann entropy of the respective equilibria is the statistical-mechanical analog of the thermodynamic Clausius entropy defined by

dS_C = (1/T) δQ|_rev,   (8.2)

where δQ|_rev is the (infinitesimal) heat transfer along a reversible thermodynamic process. Note that the Clausius formula determines only entropy changes but no absolute value for S_C.

The entropy (both Clausius's and Boltzmann's) is as much an objective variable of state as volume or temperature. Still, the statistical mechanical perspective has given rise to the idea that entropy, if not all of thermodynamics, is somehow subjective or anthropomorphic, dependent on our epistemic limitations and the way we perceive the world. There is a serious


question hiding behind a deep confusion:³ What determines the relevant partition of phase space into macrostates? The answer (only a partial one, but the important part) is the phenomena we want to explain, not our ignorance about the system's microstate. Say we want to explain the phenomenon of ferromagnetism by studying an Ising model. Whether a magnet will stick to your fridge has nothing, absolutely nothing, to do with what you know or don't know about its atomic spins. Most thermodynamic concepts do not even make sense on microscopic scales, but there is also nothing particularly human about the scales on which they do make sense and figure in highly stable and predictive regularities. They reach (at least) from nanostructures to single cells and all the way to cosmology.

³ The confusion has other sources that I can't go into here, from misunderstandings of Maxwell's demon to formal analogies between the Gibbs entropy and the information-theoretic Shannon entropy that lead to confounding the two concepts.

True but old news is that one would not see thermodynamic irreversibility if one looked at the world on truly microscopic scales. And that one could make gases contract or heat flow from colder to hotter bodies if one had precise control over ∼10^24 microscopic degrees of freedom. True but uninteresting is that, with respect to a partition of phase space into sets of measure zero—say, every single microstate corresponds to a different "macrostate"—the corresponding Boltzmann entropy (if one can call it that) would always be −∞ (if defined at all). What would be the point of such a partition?

Even if one takes the first step of introducing sensible macro-variables, it is a simple consequence of statistical mechanics that their exact values will fluctuate (unless they are constants of motion and the system perfectly isolated). The "first-order argument" one will often hear is that nearby values of the macro-variables coarse-grain to the same macrostate because they are empirically indistinguishable. While not wrong, it might be better to speak of describing a physical system at a certain resolution instead. In any case, statistical mechanics allows us to estimate fluctuations, e.g., through the variance and higher moments of the macro-variables, and thus the resolution at which statistical regularities manifest and with


what amount of noise. These are themselves objective results—typicality results, of course. Recall, once again, the rule of thumb that typical fluctuations are of order (F − E(F))/E(F) ≈ 1/√N for macro-variables of the form F(X) = ∑_{i=1}^{N} f_i(x_i). This means, in particular, that at resolutions sensitive to such minuscule fluctuations, thermodynamic equilibrium is never really stationary.

This basic fact has also been used to question, if not the objectivity of macrostates, the concept of the Boltzmann equilibrium as the dominant macrostate. For instance, Lavis (2005) follows Boltzmann's combinatorial analysis of the gas in a box, in which the one-constituent state space is partitioned into finitely many cells, and the macro-variable is the distribution of corresponding occupation numbers. He then observes—considering, for instance, the simple case of N = 8 particles distributed over m = 4 cells—that while the "most likely" occupation (2, 2, 2, 2) (meaning, every cell contains exactly two particles) corresponds to more phase space volume than, say, (3, 2, 2, 1), there are 12 possible permutations of (3, 2, 2, 1), all describing non-equilibrium "macrostates". Hence, he continues, the sum of the measures of such degenerate states exceeds that of the largest "macrostate" (2, 2, 2, 2), which Lavis incorrectly identifies with the Boltzmann equilibrium (cf. Lazarovici and Reichert 2015).

To cut a long story short, Lavis identifies macrostates with exact occupation numbers only to discover that, if we say a gas is "out of equilibrium" when a cell contains even a single atom more than the others, the gas will be "out of equilibrium" a whole lot. This is pretty much analogous to the observation that an exact equidistribution of numbers of eyes in a series of dice rolls is not typical (the probability even goes to 0 for N → ∞). Boltzmann (1896) himself draws this analogy, saying that "far more possible combinations lead to an approximately equal number of ones, twos, etc." (p. 776; translation and emphasis mine). Yet somehow, this triviality has caught on as a serious objection to Boltzmann, at least in the philosophical literature.

In a similar vein, Werndl and Frigg (2015) criticize both Goldstein (2001) and Penrose (1989) for incorrectly inferring the dominance of the equilibrium state (the equilibrium region exhausting a majority of phase space volume) from its prevalence (the equilibrium region being larger


than any non-equilibrium region) "by calculating that the ratio between the measure of the equilibrium macro-region and the macro-region of a standard non-equilibrium state is of order 10^N" (p. 22). The objection is, again, that the measure of the non-equilibrium regions could sum up to a total that exceeds the volume of the equilibrium state. But if we recall that N ∼ 10^24 for macroscopic systems, we must wonder how many different macrostates Werndl and Frigg want to distinguish and why. Sure, with respect to a phase space partition fine enough to distinguish ∼10^(10^24) "macrostates," the state of maximum Boltzmann entropy will generally not be dominant and hence not much of an "equilibrium." The only reproach one could make to Goldstein and Penrose is that their discussions implicitly assume macrostates deserving of the name.
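The following sketch (my own) puts numbers on this: for Lavis's toy case (N = 8, m = 4), the exact equidistribution (2,2,2,2) is indeed outweighed by the permutations of (3,2,2,1); but as soon as "equilibrium" is defined with a sensible tolerance, the equilibrium region swallows nearly all of the m^N configurations as N grows.

```python
import random
from math import factorial, prod

def multinomial(ns):
    """Number of microstates (assignments of labeled particles) with occupation numbers ns."""
    return factorial(sum(ns)) // prod(factorial(n) for n in ns)

# Lavis's toy case: N = 8 particles over m = 4 cells.
print(multinomial((2, 2, 2, 2)))        # 2520: the single most likely occupation
print(12 * multinomial((3, 2, 2, 1)))   # 20160: its 12 permutations jointly outweigh it

# A coarser, more sensible macrostate: all occupation numbers within 20% of N/m.
random.seed(0)

def in_equilibrium(N, m=4, tol=0.2):
    counts = [0] * m
    for _ in range(N):
        counts[random.randrange(m)] += 1   # place each particle in a random cell
    return all(abs(c - N / m) <= tol * N / m for c in counts)

for N in (8, 100, 10_000):
    frac = sum(in_equilibrium(N) for _ in range(2000)) / 2000
    print(N, frac)
# The fraction of configurations counting as "equilibrium" tends to 1 as N grows.
```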

8.2 The Status of Macroscopic Laws

In the old theory of thermodynamics, Clausius's second law

dS/dt ≥ 0 (for isolated systems)   (8.3)

was understood as a law of nature, just as the name suggests. In addition to a general skepticism toward the atomistic theory, this understanding was the major obstacle to accepting the reduction achieved by Boltzmann's statistical mechanics.

The Boltzmannian analysis forces us to make two concessions with respect to the nomological status of the second law of thermodynamics. First, it cannot hold necessarily, that is, for all possible initial micro-conditions. Second, it cannot hold forever since, even with proper coarse-graining, the Boltzmann entropy will fluctuate on time scales approaching those of the Poincaré cycles—to values as low as one likes if one could just


wait long enough. Hence, the second "law" holds only typically and on empirically relevant time scales.⁴

⁴ Glenn Shafer has pointed out to me that it doesn't even make sense to say that a fluctuating quantity "increases" or "decreases" without a specification of the time scales to which such a statement refers.

Some publications are still putting a lot of emphasis on the fact that Boltzmann's second law is not exact, distinguishing, for instance, "thermodynamic-like behavior"—associated with the fluctuating Boltzmann entropy—from the "thermodynamic behavior" that was associated with the (supposedly) strictly non-decreasing Clausius entropy. But physicists have understood this point for more than a century, starting with Boltzmann himself. Clausius's second law was not the full story, and the thermodynamic behavior we observe in nature is the one that statistical mechanics accounts for. (Not that it makes a difference for the kind of practical applications, especially to heat engines, that Clausius was actually concerned with.)

Philosophically, the more interesting aspect of the statistical nature of thermodynamics is not that laws once thought to be exact turn out to be merely probabilistic or approximate, but that the supposed laws turn out to be contingent rather than necessary. According to the microscopic theory, the initial condition of our universe could have been such that systems, prepared or created in a low-entropy state, regularly end up in one of the "bad" micro-configurations that undergo an anti-thermodynamic evolution of decreasing entropy. That is to say that there are possible Newtonian universes in which gases occasionally contract rather than expand, broken eggs sometimes piece themselves together, and heat often flows from colder to hotter bodies. In these possible universes, it is not true that such events are "very unlikely" because they happen all the time.

If we accept the microscopic laws as (more) fundamental, we have to concede that macroscopic laws like the second law of thermodynamics—even in an approximate or statistical form—are strictly speaking no laws at all, in that they lack nomological necessity. And yet, it is more than a mere contingency, more than a factum brutum that thermodynamic regularities hold in our universe. The question we must ask is therefore this: In what

8 Boltzmann’s Statistical Mechanics

151

sense do the fundamental laws ground such regularities? What concept is weaker than necessity and captures the nomological status of the second law of thermodynamics? The answer is typicality. Something is nomologically necessary if it obtains in all possible worlds permitted by the fundamental laws. The second law of thermodynamics—like many other macro-regularities that strike us as “law-like”—falls short of that, but not by a lot. They do not hold in all nomologically possible worlds but in nearly all of them. The standard notions of probability are ill-suited to do the job. Epistemic probabilities are no substitute for nomological necessity, not even a poor one. The strength of my belief in the second law of thermodynamics is completely beside the point. Frequentist probabilities might be appropriate to describe the explanandum: over some characteristic time interval, the Boltzmann entropy of an isolated system in non-equilibrium very likely increases. But frequentist probabilities are question-begging when referring to initial conditions of subsystems and meaningless when referring to those of the universe. If we take the deterministic theory seriously, we have to concede that there is nothing more random about the creation or preparation of a subsystem with certain initial conditions than about the evolution of an isolated system once prepared. To defer the source of “randomness” to the outside—from the box of gas to the shaky hands of the experimentalist or to external perturbations of the subsystem—is merely to pass the buck. But the buck must stop eventually. The universe as a whole is what it is. It exists once and only once. There is nothing before and nothing outside. And we either live in a universe in which the second law holds (for the universe as a whole and across its branching subsystems), or we do not. Typicality is just what the doctor ordered. It is a modal concept, expressing objective facts about the space of nomic possibilities. These facts provide for the relevant relation between microscopic laws and macroscopic regularities: The microscopic laws make the regularities typical. And they ground explanations and predictions, thus serving the epistemic and behavior-guiding functions that theorems of the laws are supposed to serve. Finally, typicality captures a sense of counterfactual robustness in that the regularity obtains not only for the actual initial conditions of the universe but for nearly all possible ones.


Thermodynamic “laws” and other macroscopic regularities are typical regularities under the fundamental microscopic laws. It is this fact that characterizes the relevant reduction and grounds its own law-like status.

8.2.1 Derivation of Typicality Laws

In the philosophy of science, the often-criticized yet very persistent models of Nagelian reduction and deductive-nomological explanations have established the idea that the relationship between a microscopic theory and a macroscopic regularity should be one of logical entailment. The macroscopic law (suitably translated into the language of the more fundamental theory) should be deduced from the more fundamental theory plus appropriate “auxiliary assumptions.” While not entirely wrong, this naive understanding misses the crucial role that initial conditions play in an account of macroscopic phenomena. For what is it to deduce, e.g., the thermodynamic behavior of a gas from the Newtonian laws of particle dynamics? Is it to show that there exists at least one microscopic configuration for which the gas will relax to equilibrium? Is it to show that it will happen for all possible initial states? The weakness of the first and the falsity of the second statement must severely question the adequacy of purely deductive schemes of reduction. Consider an inference of the form

\forall x \, \bigl( F(x) \Rightarrow G(x) \bigr),

where x ranges over possible microscopic realizations of the system and the predicate G is a suitable formulation of “exhibiting thermodynamic behavior.” Then, the antecedent F(x) would have to contain some clause equivalent to “The initial conditions of the system x are such that G(x),” which makes the inference too trivial to be explanatory. Of course, there are initial conditions for which the gas will expand. There are also initial conditions for which the gas will contract. And initial conditions for which the gas will transform into a banana. The point is that one could almost always maintain, for virtually any macroscopic property G, that G(x) because the initial conditions were such that G(x).


The only thing that can provide explanatory value in this context is the assertion of typicality. To establish that G is not a feature of certain special and fine-tuned micro-conditions but a physical property that the system would instantiate for the great majority of possible initial states (under relevant macroscopic boundary conditions). This is what it means to derive a macroscopic “law.” And it is also to ensure that the explanatory work (all the heavy lifting, at least) is really done by the reducing, more fundamental laws and not by initial micro-conditions. Notably, the relevant statement is now, logically and grammatically, a statement about G rather than any particular x. To derive the second law of thermodynamics is thus not to state a set of assumptions about an individual system from which its thermodynamic behavior can be deduced but to establish the relevant phenomenon as a typical feature of the microscopic laws. The same is true for other macroscopic regularities. Probabilistic schemes may give the appearance of strictly deductive explanations. If the explanandum is cast in probabilistic terms, it could be derived from the dynamical laws plus suitable probabilistic assumptions. Or so it seems. We must not fall for the trick of changing the meaning of “probability” somewhere along the way. Objective probabilities may describe the explanandum as a statistical regularity but not ground a satisfying explanation (for reasons already discussed). Epistemic probabilities may be used to reason about initial conditions but do not describe any physical phenomenon. The Humean chances discussed in Chap. 5 come closest to squaring the circle (if one grants that Humean laws are at all explanatory), but there I argued that Humean chances express typical frequencies whenever they are meaningful. Without a resort to typicality, the much weaker relation of supervenience faces the same problem as logical entailment. To say that a macroscopic regularity supervenes on the microscopic laws is to say that the relevant macroscopic features could not have been different without a difference in the microscopic laws. And this is patently false. For different micro-configurations, the same microscopic laws with the same macroscopic boundary conditions could give rise to entirely different macro-behavior. Supervenience holds true only if the target of reduction is correctly understood as a typicality statement: There cannot be any


difference in the typical macroscopic features without a difference in the microscopic laws (or the relevant macroscopic boundary conditions).

What else is left to say? Not much, I believe. To understand that a certain regularity is typical and still wonder why we observe this regularity in nature is to wonder why our universe is not exceedingly special; why it is, in the relevant respect, like the overwhelming majority of possible universes allowed by the fundamental laws. And while I wouldn’t know how to answer—except again with Einstein’s bon mot that “God is subtle, but not malicious”—the very question strikes me as utterly uncompelling. Explanations have to end somewhere. If we can establish that a given property is typical for a certain kind of physical system, it should remove any wonder or puzzlement as to why we find such systems instantiating the said property in nature. We should consider this regularity to be conclusively explained by the microscopic theory. Similarly, if we establish that a property is typical for a certain kind of system, we should expect to find this property instantiated in systems of the said kind. It thus constitutes a prediction of the theory. In this fashion, typicality figures in a fundamental and indispensable way not just in theory reduction but in the analysis of physical theories in general. Indeed, since our situation in the world is necessarily one in which our evidence is compatible with a plurality of (microscopic) states of affairs, the relevant explanations and predictions that we extract from fundamental physics are almost always typicality results.

8.3 Boltzmann vs. Gibbs

Although it does not quite fit with the rest of this chapter, I have to mention the other influential framework of statistical mechanics that goes back to J.W. Gibbs. The key difference between Boltzmann’s and Gibbs’s statistical mechanics is often characterized as one between an individualist and an ensemblist approach (cf. Goldstein 2019). Boltzmann’s statistical mechanics assigns micro- and macrostates to individual systems. This sometimes raises the question of how probability theory can be applied, but the question is answered by the concept of typicality, which then grounds objective probabilities as typical frequencies. In Gibbsian statistical mechanics, probability measures are interpreted as ensemble distributions and thus taken to describe the state of an (actual or hypothetical) collection of systems. Alternatively, there is a long (but in my view misguided) tradition of subjectivist interpretations (see Uffink (2011) for a historical overview). Both raise the question of what Gibbsian predictions imply for observations on individual systems. It is this question that I want to address, at least for equilibrium distributions.

8.3.1 Empirical Equivalence of Equilibrium Values

In the Boltzmannian framework of statistical mechanics, the macro-variables take (approximately) constant values on the equilibrium region of phase space, which are thus revealed by suitable measurements on a system in equilibrium—a system, that is, whose actual microstate is in the equilibrium state. In the Gibbsian framework, equilibrium is a property of an ensemble represented by a stationary probability distribution ρ on phase space Γ. And it is usually (though maybe somewhat carelessly) said that the prediction for a measurement of a macro-variable f on an individual ensemble system is given by the phase average

\langle f \rangle = \int_\Gamma f(x)\, \rho(x)\, dx,    (8.4)

where x ∈ Γ are the phase space coordinates. This quantity is also called the ensemble average or simply the expectation value of f.

Given these conceptual differences, Werndl and Frigg (2017) pose the question of whether and when Boltzmann and Gibbs yield equivalent predictions for equilibrium values of macroscopic observables. Their paper begins by mentioning the “Khinchin condition,” which it briefly characterizes as the phase function having “small dispersion for systems with a large number of constituents.” This is essentially the correct answer, and I argued elsewhere why the rest of Werndl and Frigg’s discussion is based on fundamental misunderstandings of Boltzmannian statistical mechanics (Lazarovici, 2019).


Indeed, a sufficiently small dispersion of the macro-variable means precisely that typical values (the Boltzmann equilibrium value) are close to the ensemble average (the Gibbsian equilibrium value). If dispersion is measured by the variance of f, this is just our (2.4) from Chap. 2. Another way to formulate the Khinchin condition—now from a Boltzmannian perspective—is to say that there exists a unique Boltzmann equilibrium whose corresponding macro-region exhausts almost the entire phase space volume. Formally,

\mu_\rho\bigl(\Gamma_{\mathrm{eq}}\bigr) = \int_\Gamma \mathbb{1}\{ f(x) \in (\xi \pm \Delta\xi) \}\, \rho(x)\, dx = 1 - \epsilon,    (8.5)

where Δξ ≪ |ξ| and ε ≪ 1. For then, the macro-variable f takes an (approximately) constant value—the Boltzmannian equilibrium value ξ ± Δξ—on a set of measure close to 1—the Boltzmannian equilibrium region Γ_eq. Hence, the phase average (8.4) will be close to the Boltzmannian equilibrium value (provided f is somewhat well behaved and its values don’t suddenly “explode” outside the equilibrium region).5 The existence of a dominant Boltzmann equilibrium is the generic case in statistical mechanics, so that Boltzmann and Gibbs make (in general) equivalent predictions for systems in the respective equilibria. If one considers the Boltzmannian formulation as the more fundamental one, this also explains why Gibbsian phase averaging yields relevant predictions for individual systems. Simply put, the macro-variables are essentially constant across most of the ensemble, and the average thus reflects this typical value.

5 A rigorous estimate:

\bigl| \langle f \rangle - (1-\epsilon)\xi \bigr| \le \int_{\Gamma_{\mathrm{eq}}} |f(x) - \xi|\, \rho(x)\, dx + \int_{\Gamma \setminus \Gamma_{\mathrm{eq}}} |f(x)|\, \rho(x)\, dx \le (1-\epsilon)\Delta\xi + \epsilon \sup_{x \in \Gamma \setminus \Gamma_{\mathrm{eq}}} |f(x)|,

and thus ⟨f⟩ ≈ ξ assuming ε sup_{x ∈ Γ\Γ_eq} |f(x)| ≪ |ξ|.

There are also famous and well-studied cases in which (8.5) does not hold. For instance, in the two-dimensional Ising model without external field (that Werndl and Frigg briefly mention), it makes sense to speak of two Boltzmann equilibria below the critical temperature, corresponding to a positive or negative magnetization, respectively. The distribution ρ is, however, symmetric under a flip of all spins, thus yielding an average magnetization of zero. There is nothing mysterious about this fact, as long as we keep in mind that the Gibbsian value refers to an ensemble average. In particular, treatments of the Ising model rarely try to draw interesting conclusions from the phase averages. Instead, one usually studies phase transitions at the critical temperature by fixing either +1 or −1 boundary conditions (referring to the polarization of spins at the edge of the lattice), thus implicitly picking one of the two magnetization states.

Cum grano salis, the Khinchin condition in the sense of uniqueness and dominance of the Boltzmannian equilibrium is not only sufficient6 but also necessary for the empirical equivalence of Boltzmannian and Gibbsian equilibrium predictions. If the condition does not hold, it means that we have either no Boltzmann equilibrium—and thus no Boltzmann equilibrium value—or multiple Boltzmann equilibria, so that the phase average will approximate an average of the respective values rather than any one in particular.7 A more relevant observation is, however, the following: If the Khinchin condition is violated, it means that there is a high probability of finding macro-values that differ significantly from the phase average, so that this phase average, as a prediction for individual measurements, is highly dubious in the first place. This becomes clearer if we also consider the ensemble variance

(\Delta f)^2 := \int \bigl( f(x) - \langle f \rangle \bigr)^2 \rho(x)\, dx    (8.6)

and identify the Gibbsian prediction with ⟨f⟩ ± Δf.

6 Together with some appropriate bound on the variation of the macro-variable.

7 Unless that average just happens to correspond itself to one of the Boltzmann equilibrium values.


Finally, I have to address the common misconception that the empirical relevance of Gibbsian phase averages has something to do with Birkhoff’s ergodic theorem, which establishes equality between the phase average (8.4) and the time average

\lim_{T \to \infty} \frac{1}{T} \int_0^T f(x(t))\, dt    (8.7)

for almost all initial conditions. The argument is that measurements are not instantaneous but require a prolonged interaction between system and measurement device and will thus reveal a value close to (8.7). This is wrong because the time scales on which a macroscopic system could exhibit ergodic behavior and explore a large portion of phase space are much too long to be empirically relevant (see Goldstein 2001).
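To make the Khinchin point concrete, here is a minimal numerical sketch (my own illustration, not from the text): for a macro-variable that averages a one-particle quantity over n particles, the ensemble dispersion shrinks roughly like 1/√n, so the Gibbsian phase average and the Boltzmannian typical value become empirically indistinguishable. The canonical velocity sampling, the function name, and all parameter values are illustrative assumptions; NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

def macro_values(n_particles, n_ensemble=200, beta=1.0, mass=1.0):
    """For an ensemble of n-particle systems with canonically distributed
    velocities, return the per-system values of the macro-variable
    f = (1/n) * sum_i (m/2) |v_i|^2  (mean kinetic energy per particle)."""
    sigma = np.sqrt(1.0 / (beta * mass))
    return np.array([
        0.5 * mass * np.mean(np.sum(rng.normal(0.0, sigma, size=(n_particles, 3))**2, axis=1))
        for _ in range(n_ensemble)
    ])

for n in [10, 1_000, 100_000]:
    f = macro_values(n)
    # Gibbsian prediction: the ensemble average <f>.  Boltzmannian prediction:
    # the (essentially unique) typical value.  They agree for large n because
    # the relative dispersion decays like 1/sqrt(n) -- the Khinchin condition.
    print(f"n = {n:6d}   <f> = {f.mean():.4f}   relative dispersion = {f.std() / f.mean():.5f}")
```

If the per-system values were bimodal instead, as for the magnetization of the low-temperature Ising model, the same ensemble average would fall between the two typical values and cease to be a sensible prediction for individual measurements.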

8.3.2 Derivation of the Equilibrium Ensembles

To further clarify the relationship between the Gibbsian and Boltzmannian framework (and validate the view of the latter as more fundamental), let us look at how equilibrium ensemble distributions would be derived within Boltzmann’s statistical mechanics. The most important examples are:

• The microcanonical ensemble for isolated subsystems with Hamiltonian h and constant energy E_S,

\rho_m(\omega) = Z_m^{-1}\, \delta(h - E_S).    (8.8)

• The canonical ensemble for subsystems that can exchange energy (heat) with their environment,

\rho_c(\omega) = Z_c^{-1} \exp(-\beta h(\omega)).    (8.9)

This is also called a Boltzmann distribution. The normalization factor Z_c is the canonical partition function.

• The grand canonical ensemble for subsystems that can exchange both energy and particles with the environment,

\rho_g(\omega) = Z_g^{-1} \exp(-\beta[h(\omega) - \mu N(\omega)]),    (8.10)

where N(ω) is the number of particles in the subsystem with microstate ω and the constant μ is called the chemical potential.

These are equilibrium states in the sense of Gibbs, i.e., stationary distributions under the subsystem dynamics generated by h. They also maximize the Gibbs entropy

S_G(\rho) = -k_B \int \rho(\omega) \log \rho(\omega)\, d\omega    (8.11)

under the relevant constraints (e.g., constant particle numbers and fixed mean energy in the case of the canonical ensemble). This is a useful but deceptive feature since the Gibbs entropy is not the right concept to understand the second law of thermodynamics and convergence to equilibrium (see Goldstein et al. 2020), not least because it is always constant under a Hamiltonian time evolution.

To understand these equilibrium distributions from a Boltzmannian perspective, we have to take the notion of ensembles seriously. Suppose that our universe includes an ensemble of n independent subsystems with the same Hamiltonian h. We split the phase space coordinates into X = (ω_1, …, ω_n, X_env), where ω_i are the coordinates of subsystem i and X_env the remaining degrees of freedom of the environment. If f is a macro-variable for the subsystems, we consider its empirical mean across the ensemble, i.e.,

f_{\mathrm{emp}}^{n}(X) := \frac{1}{n} \sum_{i=1}^{n} f(\omega_i).    (8.12)

f could be a characteristic function f(ω) = 1_A(ω), in which case (8.12) is an empirical distribution proper, yielding the relative frequency of ensemble systems with microstate in A.


A derivation of an ensemble distribution ρ would now be a typicality result

\mathrm{Typ}\Bigl( f_{\mathrm{emp}}^{n} \approx \int f(\omega)\, \rho(\omega)\, d\omega \Bigr),

more precisely, a law-of-large-numbers result of the form

\lambda_E \Bigl( X \in \Gamma_E : \Bigl| f_{\mathrm{emp}}^{n} - \int f(\omega)\, \rho(\omega)\, d\omega \Bigr| > \epsilon \Bigr) \le \delta(f, \epsilon, n), \quad \text{with } \delta(f, \epsilon, n) \to 0,\ n \to \infty.    (8.13)

λ_E is the microcanonical measure on the phase space Γ_E of the “universe,” serving as the typicality measure. This is the general form of a justification of a statistical hypothesis. We have already seen a special case of (8.13), namely the Maxwellian velocity distribution of an ideal gas (also a kind of canonical ensemble) for which the desired result can be obtained analytically. Textbook derivations of the equilibrium ensembles (see, e.g., Schwabl (2006, Chap. 2.6 ff.) or Bricmont (2022, Chap. 6.6)) are not generally cast in this form but can and should be read as arguments for (8.13). There are more subtle questions about the relationship between Boltzmann and Gibbs, especially for non-equilibrium systems. But in general, Gibbsian statistical mechanics can be understood as pertaining to typical subsystem ensembles in the Boltzmannian sense.
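As a rough numerical counterpart to (8.13) (my own sketch, not the book’s derivation): if one draws velocities uniformly from a constant-energy sphere (a stand-in for the microcanonical measure of an ideal gas), then for almost every such microstate the empirical distribution of a single velocity component is close to the Maxwellian predicted by the canonical ensemble. Function names, bin choices, and the Gaussian stand-in for the Maxwellian are illustrative assumptions; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

def microcanonical_velocities(n):
    """One microstate: velocities drawn uniformly from the energy sphere
    sum_i v_i^2 = n (unit mass, unit temperature), a stand-in for the
    microcanonical measure of an ideal gas."""
    v = rng.normal(size=n)
    return v * np.sqrt(n) / np.linalg.norm(v)

def deviation_from_maxwellian(v, bins=np.linspace(-3, 3, 13)):
    """Sup-distance between the empirical one-particle velocity histogram
    and the Maxwellian (here: standard Gaussian) density."""
    emp, edges = np.histogram(v, bins=bins, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return np.max(np.abs(emp - np.exp(-0.5 * centers**2) / np.sqrt(2 * np.pi)))

# A handful of independently drawn microstates: for large n, (almost) every
# one of them exhibits a near-Maxwellian empirical distribution -- the
# law-of-large-numbers/typicality statement behind (8.13).
for n in [100, 10_000, 1_000_000]:
    devs = [deviation_from_maxwellian(microcanonical_velocities(n)) for _ in range(5)]
    print(f"n = {n:7d}   max deviation over 5 microstates: {max(devs):.3f}")
```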


References

Albert, D. Z. (2000). Time and chance. Cambridge, Massachusetts: Harvard University Press.
Boltzmann, L. (1896). Entgegnung auf die wärmetheoretischen Betrachtungen des Hrn. E. Zermelo. Wiedemanns Annalen, 57, 773–784.
Bricmont, J. (1995). Science of chaos or chaos in science? Annals of the New York Academy of Sciences, 775(1), 131–175.
Bricmont, J. (2022). Making sense of statistical mechanics. Undergraduate lecture notes in physics. Cham: Springer International Publishing.
Carroll, S. (2010). From eternity to here. New York: Dutton.
Ehrenfest, P. a. T. (1907). Begriffliche Grundlagen der Statistischen Auffassung in der Mechanik. In F. Klein & C. Müller (Eds.), Mechanik (pp. 773–860). Wiesbaden: Vieweg+Teubner Verlag.
Frigg, R. (2011). Why typicality does not explain the approach to equilibrium. In M. Suárez (Ed.), Probabilities, causes and propensities in physics. Synthese Library (pp. 77–93). Dordrecht: Springer Netherlands.
Goldstein, S. (2001). Boltzmann’s approach to statistical mechanics. In J. Bricmont, D. Dürr, M. C. Galavotti, G. Ghirardi, F. Petruccione, & N. Zanghì (Eds.), Chance in physics: Foundations and perspectives (pp. 39–54). Berlin: Springer.
Goldstein, S. (2019). Individualist and ensemblist approaches to the foundations of statistical mechanics. The Monist, 102(4), 439–457.
Goldstein, S., Lebowitz, J. L., Tumulka, R., & Zanghì, N. (2020). Gibbs and Boltzmann entropy in classical and quantum mechanics. In Statistical mechanics and scientific explanation (pp. 519–581). World Scientific.
Lavis, D. A. (2005). Boltzmann and Gibbs: An attempted reconciliation. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 36(2), 245–273.
Lazarovici, D. (2019). On Boltzmann versus Gibbs and the equilibrium in statistical mechanics. Philosophy of Science, 86(4), 785–793.
Lazarovici, D. & Reichert, P. (2015). Typicality, irreversibility and the status of macroscopic laws. Erkenntnis, 80(4), 689–716.
Lebowitz, J. L. (1993a). Boltzmann’s entropy and time’s arrow. Physics Today, 46(9), 32–38.
Lebowitz, J. L. (1993b). Macroscopic laws, microscopic dynamics, time’s arrow and Boltzmann’s entropy. Physica A, 194(1–4), 1–27.
Penrose, R. (1989). The emperor’s new mind: Concerning computers, minds, and the laws of physics. Oxford: Oxford University Press.
Reichert, P. (2023). Essentially ergodic behaviour. The British Journal for the Philosophy of Science, 74(1), 57–73.
Schwabl, F. (2006). Statistical mechanics. Advanced texts in physics. Berlin: Springer.
Uffink, J. (2011). Subjective probability and statistical physics. In Probabilities in physics. Oxford: Oxford University Press.
Werndl, C. & Frigg, R. (2015). Rethinking Boltzmannian equilibrium. Philosophy of Science, 82(5), 1224–1235.
Werndl, C. & Frigg, R. (2017). Mind the gap: Boltzmannian versus Gibbsian equilibrium. Philosophy of Science, 84(5), 1289–1302.

9 It’s Complicated: The Relationship between Physics and Mathematics

The intellectual attractiveness of a mathematical argument, as well as the considerable mental labor involved in following it, makes mathematics a powerful tool of intellectual prestidigitation – a glittering deception in which some are entrapped, and some, alas, entrappers. — Jack Schwartz, The Pernicious Influence of Mathematics on Science, 1966

Here is a joke: How can you tell that your janitor studied philosophy of physics? You complain that the heat is not working, and he asks you to check if your living room has ergodic properties. A joke is never good if one has to explain the punchline. But since I have limited comedic ambitions, I am going to do exactly that. The dispersion of heat in a volume, such as your living room, is a thermodynamic process, an instance of the second law, in fact. It can be phenomenologically described by the heat equation but is well understood, from a more fundamental point of view, in terms of particle motion. The reduction of the phenomenological law to the micro-dynamics of particles falls under the purview of statistical mechanics. And it is a widespread belief in the philosophical (but also part of the physical) literature that the explanation


of thermodynamic behavior—if not the success of statistical mechanics, in general,—rests on the assumption of ergodicity (or stronger properties higher up the “ergodic hierarchy,” see Frigg and Kronz (2016)). If this were so, physics would not provide good reasons to expect the dispersion of heat emitted from a radiator if the room, qua physical system, failed to be ergodic. But the very question of whether a living room is an ergodic system strikes me as, well, comical. There is a sense in which the answer is obviously negative: A living room is not a perfectly isolated system and a fortiori not an ergodic system. There is another sense in which the answer is bound to depend on our mathematical modeling of the room (and practically impossible to prove for any but the most idealized models).1 It’s like asking about the second homotopy group of a cow. For a spherical cow, I can tell you.

9.1 The Pernicious Influence of Ergodic Theory

Fortunately, ergodic properties are completely irrelevant to the dispersion of heat from a radiator. Let us recall what ergodicity is all about. In the modern literature,2 ergodicity is introduced as a property of dynamical systems. A dynamical system is ergodic if every invariant set has measure 1 or 0. A measurable subset A ⊆ Γ is invariant under the time evolution if Φ_t(A) = A, ∀t. Confusingly, a particular solution trajectory (i.e., a flow line) of the dynamical system can also be called ergodic. Then it means that the proportion of time that the trajectory spends (over its entire history) in any region of phase space corresponds to the measure of that region. Formally,

\lim_{T \to \infty} \frac{1}{T} \int_0^T \mathbb{1}_A(X(t))\, dt = \lambda_E(A),    (9.1)

where 1_A(x) is the characteristic function of A ⊆ Γ_E and λ_E is the microcanonical measure. This, in turn, is essentially equivalent to the statement that the solution trajectory comes arbitrarily close to every single point in Γ_E, thus establishing the connection with Boltzmann’s original (quasi-)ergodic hypothesis.3 The celebrated theorem of Birkhoff (1931) establishes that typical solutions of an ergodic system—in the strong sense of all solutions except for a set of initial conditions with measure zero—are ergodic trajectories.4

In the literature on foundations of statistical mechanics, ergodicity has been assigned various tasks: to justify the choice of the microcanonical measure, to account for the empirical relevance of Gibbsian ensemble averages,5 or to account for thermodynamic behavior and the convergence to equilibrium. All of these ideas are misguided for different reasons, but for now, I shall focus on the claim that ergodicity is relevant for the explanation of the second law of thermodynamics. Frigg and Werndl (2011) provide the following argument:

Consider an initial condition x that lies on an ergodic solution. The dynamics will carry x to [the equilibrium region] Γ_{M_eq} and will keep it there most of the time. The system will move out of the equilibrium region every now and then and visit non-equilibrium states. Yet since these are small compared to Γ_{M_eq}, it will only spend a small fraction of time there. Hence the entropy is close to its maximum most of the time and fluctuates away from it only occasionally. Therefore, ergodic solutions behave [thermodynamic]-like. (p. 633)

1 One could, for instance, consider a hard-sphere gas in a closed ellipsoid with perfectly reflecting walls; see Bunimovich (1979); Sinai (1970) for ergodic properties of analogous two-dimensional models.

2 The “ergodic hypothesis” was first introduced by Ludwig Boltzmann but didn’t even appear in his second lectures on gas theory (1896). The concept was later revived, in modern form, by the groundbreaking works of Birkhoff, von Neumann, and Khinchin that established ergodic theory as a productive—and admittedly very elegant—field of mathematics, whose physical relevance is, however, questionable.

3 See, for instance, the Ehrenfests (1907) on Boltzmann’s ergodic hypothesis, or Sklar (1973).

4 Frigg and Werndl (2011) advocate instead for a weaker notion of “epsilon-ergodicity,” which only requires an ergodic evolution for all initial micro-conditions except for a set of positive measure ≤ ε. This is doing nothing to avoid our following objections: the uselessness of epsilon-ergodicity is only more obvious since non-equilibrium macro-regions have tiny measure to begin with and may thus be entirely included in this exception set.

5 Cf. Sklar (1973) and our remarks in Sect. 8.3.

In brief, ergodicity is not sufficient for thermodynamic behavior because:

(a) Ergodicity of trajectories is a time-symmetric property (the time reversal of an ergodic solution is also an ergodic solution) and thus cannot account for thermodynamic irreversibility.

(b) Infinite-time averages imply nothing about the behavior of the system on empirically relevant time scales. The characteristic time scale associated with irreversible thermodynamic behavior is that of a system’s relaxation time (the time it typically takes to reach equilibrium), which may be seconds for the spreading of a gas, minutes for the cooling of a hot bowl of soup, many years for the decay of radon, and many billions of years for the heat death of the universe. But all this is just the blink of an eye compared to the time scales associated with ergodic behavior (see Fig. 9.1). Ergodic time scales, the time scales, that is, on which trajectories begin to “wind around” the energy hypersurface and explore even the smallest (macro-)regions, are those of the Poincaré cycles, which were already estimated by Boltzmann to be about 10^{10^{10}} years(!) for the gas model—exceeding the age of our universe by many orders of magnitude.

Fig. 9.1 Typical entropy curves of macroscopic systems on thermodynamic time scales (left) and ergodic time scales (right). On the right, periods of maximal entropy are vastly longer than depicted


Arnold and Avez, in their standard work on ergodic theory, put it very concisely (1968, p. 77, footnote 17):

Statistical mechanics deals with asymptotic behavior as N → +∞ (N = number of particles) and not as t → +∞ for fixed N.

And ergodicity is not necessary for thermodynamic behavior because:

(a′) We do not care if and for how long a micro-trajectory visits every measurable subset of phase space. Only the phase space regions associated with the partition into macrostates are relevant to the system’s macro-evolution.

(b′) A trajectory need not densely cover all of phase space to spend most of the time in the equilibrium region, any more than a person’s travel route needs to densely cover the surface of the Earth to spend most of the time outside the Vatican. In fact, that typical trajectories spend most of the time in the equilibrium region follows simply from the dominance of the equilibrium state (and the stationarity of the phase space measure) and does not require ergodicity (Reichert, 2023).

In effect, despite the prima facie plausibility of the argument articulated by Werndl and Frigg, an ergodic evolution of the microstate has nothing to do with thermodynamic macro-behavior. There is certainly something about ergodicity as a property of dynamical systems, as opposed to individual solutions, that is in the right spirit as it captures a notion of chaos and implies the absence of dynamical “barriers” that prevent solutions from reaching the equilibrium region. But also from this perspective, ergodicity is doing both too much—in requiring that typical trajectories visit every open subset of phase space—and too little—in implying nothing about typical evolutions on empirically relevant time scales. Toy models like the Kac ring or the Ehrenfest urn are very instructive to see that ergodicity is not only unnecessary for convergence to equilibrium but almost entirely beside the point. For good discussions of these models, see Bricmont (1995, 2001).
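Here is a minimal sketch of the Kac ring model just mentioned (my own toy implementation; site number, marker density, and step count are arbitrary choices). The dynamics is deterministic, time-reversible, and exactly periodic, hence not ergodic in any interesting sense, yet the color imbalance relaxes to its equilibrium value for typical marker configurations.

```python
import random

def kac_ring(n_sites=10_000, marker_fraction=0.1, steps=200, seed=0):
    """Simulate the Kac ring: balls on a ring move one site per tick and
    flip color when crossing a marked edge.  The dynamics is deterministic,
    time-reversible, and periodic (full recurrence after 2*n_sites steps),
    hence not ergodic -- yet the color imbalance relaxes to zero."""
    rng = random.Random(seed)
    markers = [rng.random() < marker_fraction for _ in range(n_sites)]
    colors = [1] * n_sites          # start far from equilibrium: all "black"
    history = []
    for _ in range(steps):
        history.append(sum(colors) / n_sites - 0.5)   # "entropy-like" imbalance
        # each ball moves from site i-1 to site i, flipping at marked edges
        colors = [colors[i - 1] ^ markers[i - 1] for i in range(n_sites)]
    return history

if __name__ == "__main__":
    for t, delta in enumerate(kac_ring()):
        if t % 20 == 0:
            print(f"step {t:4d}   color imbalance {delta:+.3f}")
```

Running it shows the imbalance decaying towards zero long before the guaranteed recurrence at 2 × n_sites steps.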


Let us briefly discuss another ergodic property that is stronger than ergodicity but somewhat more illuminating. A dynamical system (Γ, μ, Φ_t) is called mixing if

\lim_{t \to \infty} \mu\bigl(A \cap \Phi_{-t}(B)\bigr) = \mu(A)\,\mu(B)    (9.2)

for all measurable A, B ⊆ Γ. We now consider a macro-variable F which is an average of one-particle quantities, i.e., a function of the form F(x) = \frac{1}{N}\sum_{i=1}^{N} f(x_i), x = (x_1, …, x_N) ∈ Γ. The equilibrium region then corresponds to

B = \Bigl\{ x \in \Gamma : \Bigl| \frac{1}{N} \sum_{i=1}^{N} f(x_i) - \mathbb{E}(F) \Bigr| < \epsilon \Bigr\},

for some very small ε. This is once again the law of large numbers: micro-configurations for which the value of F deviates only slightly from the theoretical mean exhaust the great majority of phase space volume. More precisely,

\mu(B) \ge 1 - \frac{V(f)}{\epsilon^2 N}.    (9.3)

We are, however, interested in systems that start out in a non-equilibrium macro-region A, which is to say that, at any time t > 0, we care only about equilibrium configurations that have evolved from the set A at t = 0. Such a boundary condition leads, in general, to correlations between the particles. But now we use the mixing property to conclude “convergence to equilibrium” in the following sense:

\mu\bigl(\Phi_t(A) \cap B\bigr) = \mu\bigl(A \cap \Phi_{-t}(B)\bigr) \xrightarrow{\ t \to \infty\ } \mu(A)\,\mu(B) \ge \mu(A)\Bigl( 1 - \frac{V(f)}{\epsilon^2 N} \Bigr).    (9.4)


This means that, in the limit .t → ∞ (and for large N ), nearly all initial microstates in A end up in the equilibrium region B. This result (notably a typicality result) is as elegant as it is physically irrelevant. On the one hand, because realistic systems are hardly “mixing.” On the other hand, because—yet again—nothing of empirical relevance follows from infinite-time limits. This is in notable contrast to the quantitative estimate in terms of the particle number N on the right-hand side. Simply put, the .N → ∞ limit coming from the LLN is physically relevant, while the .t → ∞ limit coming from the mixing property is— absent additional results about the convergence rate—pure mathematical abstraction. Just like the weaker notion of ergodicity, mixing does succeed in capturing, in a very Platonic sense, some idea of chaotic behavior that is both plausible and relevant for realistic macro-systems. After a certain number of scatterings, the particles “forget” the system’s origin in the non-equilibrium region A and acquire some form of statistical independence. Hence, the configuration will start to look more and more like a typical configuration (relative to the entire phase), i.e., an equilibrium state. However, just like ergodicity, the mixing property (9.2) is doing both too much and too little to account for the relevant phenomenon of convergence to equilibrium. What we would really need is .μ(Фt (A) ∩ B) ≈ μ(A)μ(B) for t of the order of the system’s relaxation time. This is not the kind of result that follows from the very generic and abstract properties of dynamical systems. To prove a precise theorem for some interesting model, we will have to leave the Platonic realm of ergodic theory and get our hands dirty with very hard analysis and cumbersome epsilonics. Or we can live with less than rigorous proofs and appreciate the fact that Boltzmann’s typicality account provides everything we need to understand thermodynamic behavior as a virtually universal feature of macroscopic systems.
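For a concrete (if physically toy-like) feel for (9.2), one can estimate μ(A ∩ Φ_{-t}(B)) by Monte Carlo for a simple mixing system, the doubling map x ↦ 2x mod 1 on the unit interval with Lebesgue measure. This is my own illustration; the sets A and B, the bit precision, and the sample size are arbitrary choices, and nothing here bears on the time-scale worry raised above; the toy map mixes absurdly fast.

```python
import random

rng = random.Random(2)

def mixing_estimate(t, n_samples=200_000, bits=128):
    """Monte Carlo estimate of mu(A ∩ T^-t(B)) for the doubling map
    T(x) = 2x mod 1 on [0,1) with Lebesgue measure, A = [0, 0.5), B = [0.3, 0.6).
    Points are stored as k / 2^bits so the map can be iterated exactly."""
    mod = 1 << bits
    hits = 0
    for _ in range(n_samples):
        k = rng.getrandbits(bits)
        x0 = k / mod                      # initial point, uniform on [0,1)
        xt = ((k << t) % mod) / mod       # T^t(x0), computed exactly
        if x0 < 0.5 and 0.3 <= xt < 0.6:
            hits += 1
    return hits / n_samples

mu_A, mu_B = 0.5, 0.3
for t in [0, 1, 3, 10, 30]:
    print(f"t = {t:2d}   mu(A ∩ T^-t B) ≈ {mixing_estimate(t):.3f}   "
          f"(mixing limit mu(A)mu(B) = {mu_A * mu_B:.3f})")
```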

9.2 Proof and Explanation

The deeper moral here is that when it comes to the difficult problem of macro-to-micro reduction, mathematical physics is, in many ways, the art of the possible. Evidently, we cannot just solve the equations of motion for N ≈ 10^{24} particles to check for the desired macro-behavior. Hence, it lies in the nature of the problem that rigorous results are rare and difficult to come by. Instead, we make simplifications, approximations, and idealizations. We use cut-offs, rescalings, and infinite limits—alongside various technical assumptions that allow us to derive certain estimates or apply theorems that are already part of our mathematical toolbox. Often, such assumptions take precedence in the statement of a mathematical result, while the crucial ideas behind the strategy of proof get lost in technical details. In consequence, not all proofs are explanatory, and not all explanations can be turned into rigorous proofs. Discerning mathematical abstractions and technical crutches from physical insights that do actual explanatory work is a very subtle task at which the philosophy of physics too often fails.

When we discuss the foundations of statistical mechanics, it is tempting to look at mathematical publications and read the premises of the reported results as the relevant axioms or auxiliary assumptions for some sort of deductive-nomological explanation—especially when they come in such a simple and elegant form as ergodic properties do. This literal-mindedness about mathematics is, however, counterproductive, leading us further away from true understanding. Rigorous theorems in statistical mechanics are extremely valuable in refining, substantiating, or challenging our physical understanding. But insisting on mathematical rigor is not always a testament to a rigorous mind. Hand-waving about the fundamental postulates of a theory should not be acceptable (cf. the discussion of the quantum measurement problem in Chap. 13); but when it comes to applying a theory in complex situations, arguments based on educated intuition can be more instructive than precise yet sterile proof. For while it lies in the nature of a logical deduction that the truth of the conclusion depends rigidly on the truth


of the premises, it is essential to a good physical explanation that it is reasonably stable under perturbations of its underlying assumptions— especially when they are themselves the result of approximations and idealizations (cf. Schwartz 1966). Boltzmann’s account of thermodynamic irreversibility is an explanation, or explanatory scheme, not a proof. It leaves many details to be filled out for particular examples, but its generality and robustness is what makes it so powerful. In the philosophical literature, the account has nonetheless come under attack for its lack of mathematical rigor and the alleged failure to make its assumptions about the microscopic dynamics explicit (Frigg, 2009, 2011; Frigg & Werndl, 2011, 2012; Uffink, 2007). Frigg and Werndl (2012) even go as far as declaring that the typicality account is “mysterious” because the “connection with the dynamics” is unclear (p. 918). Jos Uffink writes on a similar note: [I]n order to obtain any satisfactory argument why the system should tend to evolve from non-equilibrium states to the equilibrium state, we should make some assumptions about its dynamics. In any case, judgments like ‘reasonable’ or ‘ridiculous’ remain partly a matter of taste. The reversibility objection is a request for mathematical proof (which, as the saying goes, is something that even convinces an unreasonable person). (2007, p. 61)

Looking at these objections in more detail, one finds that they are at least partially based on a misunderstanding of what the typicality account actually argues for (Lazarovici & Reichert, 2015). That aside, the critics seem to insist that any satisfactory account of the second law must involve a precise mathematical assumption about the dynamics of macroscopic systems that logically implies their thermodynamic behavior (see also Frigg and Werndl (2011, p. 632)). This request strikes me as naive, and I have tried to explain why a “reasonable person” will sometimes settle for less than rigorous proof. In any case, the promise of ergodic programs old and new was that the dynamics of trillions of interacting particles can be abstracted to a simple mathematical feature of the phase space flow that is both rigorous and universal across relevant systems. I would be elated if such a feature existed but see no reason why it should.


One of the key insights to be gained from Boltzmann’s analysis is precisely that thermodynamic behavior does not rely on any special, narrowly defined feature of the microscopic time evolutions. Simply put, the role of the dynamics is merely to carry a great majority of the microstates in the vanishingly small non-equilibrium region reasonably quickly into the rest of phase space corresponding to thermodynamic equilibrium. And this is so much weaker and so much more plausible as an assumption about the dynamics of relevant macroscopic systems that it is hard to see how it could be further elucidated by a rigid technical premise. It is like asking what precise mathematical assumption about oceanic currents implies that most bottles thrown somewhere into the Atlantic won’t end up on the shore of Newport, New Jersey. It is simply a bad question, unless asked as a half-decent joke. It is indeed the absence of thermodynamic behavior that would point to some remarkable feature of the dynamics (like attractors or hidden conserved quantities) warranting further investigation. In principle, I agree with Frigg and Werndl (2011) that the ideal result, from a technical point of view, would be yet another typicality statement: that typical Hamiltonians, within a broad class of relevant interacting models, lead to thermodynamic behavior and convergence to equilibrium. I just don’t think that such a result is in the cards, at least for now. And I don’t think that our physical understanding of the second law of thermodynamics hinges on it in any meaningful sense.

References

Arnold, V. I., & Avez, A. (1968). Ergodic problems of classical mechanics. The Mathematical Physics Monograph Series. W.A. Benjamin (1st ed.).
Birkhoff, G. D. (1931). Proof of the ergodic theorem. Proceedings of the National Academy of Sciences of the United States of America, 17, 656–660.
Bricmont, J. (1995). Science of chaos or chaos in science? Annals of the New York Academy of Sciences, 775(1), 131–175.
Bricmont, J. (2001). Bayes, Boltzmann and Bohm: Probabilities in physics. In Chance in physics. Lecture Notes in Physics (pp. 3–21). Berlin: Springer.
Bunimovich, L. A. (1979). On the ergodic properties of nowhere dispersing billiards. Communications in Mathematical Physics, 65(3), 295–312.
Ehrenfest, P. a. T. (1907). Begriffliche Grundlagen der Statistischen Auffassung in der Mechanik. In F. Klein & C. Müller (Eds.), Mechanik (pp. 773–860). Wiesbaden: Vieweg+Teubner Verlag.
Frigg, R. (2009). Typicality and the approach to equilibrium in Boltzmannian statistical mechanics. Philosophy of Science, 76(5), 997–1008.
Frigg, R. (2011). Why typicality does not explain the approach to equilibrium. In M. Suárez (Ed.), Probabilities, causes and propensities in physics. Synthese Library (pp. 77–93). Dordrecht: Springer Netherlands.
Frigg, R. & Kronz, F. (2016). The ergodic hierarchy. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Summer 2016 ed.).
Frigg, R. & Werndl, C. (2011). Explaining thermodynamic-like behavior in terms of epsilon-ergodicity. Philosophy of Science, 78(4), 628–652.
Frigg, R. & Werndl, C. (2012). Demystifying typicality. Philosophy of Science, 79(5), 917–929.
Lazarovici, D. & Reichert, P. (2015). Typicality, irreversibility and the status of macroscopic laws. Erkenntnis, 80(4), 689–716.
Reichert, P. (2023). Essentially ergodic behaviour. The British Journal for the Philosophy of Science, 74(1), 57–73.
Schwartz, J. (1966). The pernicious influence of mathematics on science. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Studies in logic and the foundations of mathematics. Logic, Methodology and Philosophy of Science (Vol. 44, pp. 356–360). Elsevier.
Sinai, Y. G. (1970). Dynamical systems with elastic reflections. Russian Mathematical Surveys, 25(2), 137.
Sklar, L. (1973). Statistical explanation and ergodic theory. Philosophy of Science, 40(2), 194–212.
Uffink, J. (2007). Compendium of the foundations of classical statistical physics. In Philosophy of physics (pp. 923–1074). Elsevier.

10 Boltzmann Equation and the H-Theorem

Although the formula engraved on Boltzmann’s tombstone is equation (8.1), connecting the entropy of a system with the measure associated with its macrostate, his name is at least as intimately associated with the Boltzmann equation and the H -theorem, describing, in a more quantitative manner, convergence to equilibrium for a low-density gas. This H theorem is of great interest in light of our previous discussion, because it can be seen as a concrete implementation of the general scheme that we introduced as the “typicality account.” By elaborating on this connection, I also want to counter two common misconceptions that may have arisen from Boltzmann’s first presentation of the H -theorem but persisted despite his more refined argumentation in later writings. The first is manifested in the charge that the H theorem begs the question as an account of thermodynamic irreversibility because the derivation of the Boltzmann equation is based on a timeasymmetric assumption about the micro-dynamics. The second, more basic misunderstanding is that the H -theorem and the typicality account are somehow competing accounts of entropy increase and convergence to


equilibrium. Witness, for instance, Huw Price, who writes with respect to the latter: In essence, I think –although he himself does not present it in these terms– what Boltzmann offers is an alternative to his own famous H -theorem. The H -theorem offers a dynamical argument that the entropy of a nonequilibrium system must increase over time, as a result of collisions between its constituent particles. […] The statistical approach does away with this dynamical argument altogether. (Price, 2002, p. 27)

Similarly, the pertinent entry in the Stanford Encyclopedia of Philosophy (Uffink, 2017) presents Boltzmann’s work as a series of rather incoherent (and ultimately inconclusive) attempts to explain thermodynamic irreversibility.

I am convinced that the reason why Boltzmann did not present the “statistical approach” as an alternative to the H-theorem is simply that it is no such thing. Understood correctly, there is a clear conceptual continuity between the H-theorem and the typicality account, so the latter does not appear as a break with Boltzmann’s earlier work but as a distillation of its essence (see also Goldstein (2001), Goldstein and Lebowitz (2004)).

To understand the H-theorem correctly, we need to discuss the concept of distribution functions and kinetic equations in general.1

1 For a good introduction to these topics, see also Davies (1977); for more detailed mathematical treatments, e.g., Spohn (1991), Villani (2002).

We recall that the microstate of an N-particle system is represented by a point X = (q_1, …, q_N; p_1, …, p_N) in the 6N-dimensional phase space Γ, comprising the position and momenta of its constituent particles. Assuming “identical” particles, i.e., that all particles are of the same type, the same state—modulo permutations—can also be represented as N points in the six-dimensional μ-space, whose coordinates correspond to the position and velocity of a single particle, i.e., X → {(q_1, v_1), …, (q_N, v_N)}, with v_i := p_i/m and m the particle mass.

Many results in many-body physics and statistical mechanics, most famously Boltzmann’s H-theorem, are concerned with the evolution of a function f_X(q, v) on this μ-space, which provides an efficient description of the most important (macroscopic) characteristics of a system in the microstate X. This function is the empirical distribution or coarse-grained density of points in μ-space. We can think of dividing μ-space into small cells—whose dimension is large enough to contain a great number of particles, yet very small compared to the resolution of macroscopic observations—and counting the number of particles in each. For fixed q and v, f_X(q, v) then corresponds to the fraction of particles in the cell around (q, v). In the limit where the size of the cells goes to zero, the coarse-grained empirical distribution becomes the microscopic distribution

\mu_X^N := \frac{1}{N} \sum_{i=1}^{N} \delta(q - q_i)\, \delta(v - v_i).    (10.1)

Note that if A(q, v) is some function on μ-space,

\int A(q, v)\, \mu_X^N(q, v)\, dq\, dv = \frac{1}{N} \sum_{i=1}^{N} A(q_i, v_i)    (10.2)

returns its average value for the microstate X. Analogously, the expression ∫ A(q, v) f_X(q, v) dq dv is a coarse-grained average, describing the system at a lower resolution, so to speak. Although f_X(q, v) is technically a probability density (just as μ_X is technically a probability measure), there is nothing random about it. Instead, we should think of X ↦ f_X itself as a kind of macro-variable, a coarse-graining function of microstates.

In particular, we can compute the Boltzmann entropy associated with any such distribution. Suppose we divide (a compact subset of) μ-space into m ≪ N cells (C_1, …, C_m) of equal size |C| and denote by N_k the number of particles in the cell C_k, k ∈ {1, …, m}. By simple combinatorics, there are

\frac{N!}{N_1! \cdots N_m!}    (10.3)

ways to distribute the particles over the cells that lead to the same occupation numbers. Using the Stirling approximation n! ≈ \sqrt{2\pi n}\,(n/e)^n and N_1 + … + N_m = N, we find for the Boltzmann entropy associated with f_X:

S \approx k_B \Bigl( N |C|\, \mathrm{const.} - \sum_{k=1}^{m} N_k \log(N_k) \Bigr).    (10.4)

Writing N_k = N f_k |C| with f_k the density of particles in the cell C_k, we thus have S ≈ const. − k_B N |C| \sum_{k=1}^{m} f_k \log(f_k), and see Boltzmann’s famous H-functional

H[f_t] = \int f(t, q, v) \log f(t, q, v)\, dq\, dv    (10.5)

emerging in the continuum limit, whence a decrease in H(t) corresponds to an increase in the Boltzmann entropy of the macrostate described by f_X(q, v).
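As a sanity check on the combinatorics (my own numerical sketch; the sample distributions, grid, and particle number are arbitrary choices, and NumPy is assumed): the exact log of the multinomial factor in (10.3) is well approximated by a Stirling-type expression of the kind behind (10.4), and the cell-wise sum over f_k log f_k is a Riemann sum for the H-functional (10.5).

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(3)

def occupation_entropy(v, cells):
    """Boltzmann entropy (in units of k_B, up to constants) of the coarse-grained
    distribution: exact log of the multinomial factor N!/(N_1!...N_m!) versus the
    Stirling-type approximation N log N - sum_k N_k log N_k (cf. (10.4)), plus a
    Riemann-sum estimate of H[f] = ∫ f log f for the empirical cell density f."""
    counts, edges = np.histogram(v, bins=cells)
    counts = counts[counts > 0]
    n = counts.sum()
    exact = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    stirling = n * np.log(n) - np.sum(counts * np.log(counts))
    width = np.diff(edges)[0]
    f = counts / (n * width)
    h = np.sum(width * f * np.log(f))
    return exact, stirling, h

n_particles = 100_000
cells = np.linspace(-5, 5, 51)
for label, sample in [("Maxwellian", rng.normal(0, 1, n_particles)),
                      ("uniform (non-equilibrium)", rng.uniform(-1, 1, n_particles))]:
    exact, stirling, h = occupation_entropy(sample, cells)
    print(f"{label:28s}  S_exact = {exact:12.1f}   S_Stirling = {stirling:12.1f}   H[f] = {h:+.3f}")
```

In this toy comparison, the Maxwellian sample has the lower H and, correspondingly, the larger combinatorial entropy.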

10.1 Kinetic Equations

Since f_X is a function of the microstate, tracking its time evolution would essentially require us to solve the microscopic equations of motion for all N particles. A hopeless task, even if the coarse-graining forgives small imprecisions. To obtain a more tractable description of the system, one thus considers instead a continuum limit of f_X, in which the sizes of the cells tend to zero while N tends to infinity. Indeed, the distribution function becomes a truly powerful concept when one studies effective models in which a continuous function f(t, q, v) follows an autonomous time evolution given by a partial differential equation, a so-called kinetic equation, of the form

\partial_t f + p \cdot \nabla_q f + K \cdot \nabla_p f = (\partial_t f)_{\mathrm{coll}}.    (10.6)


Here, K is a force term describing long-range interactions and (∂_t f)_coll the collision term characteristic of the Boltzmann equation. The classical ansatz for the collision term is

(\partial_t f)_{\mathrm{coll}}(q, v) = \int W(v_1, v_2; v_3, v) \bigl[ f(t, q, v_1) f(t, q, v_2) - f(t, q, v_3) f(t, q, v) \bigr]\, dv_1\, dv_2\, dv_3    (10.7)

with an appropriate scattering kernel W(v_1, v_2; v_1′, v_2′), giving the probability per unit time that a collision of two particles with velocities v_1 and v_2 results in velocities v_1′ and v_2′, respectively. The standard form of the Boltzmann equation is (10.6) with the force term K set to zero.

Important examples of kinetic equations without collision term are so-called Vlasov or mean-field equations2 with a force term

K(t, q) = -\int \nabla V(q - q')\, f(t, q', v')\, dq'\, dv'    (10.8)

for an interaction potential V . The idea behind these models is that every particle “feels” the average force exerted by the current particle distribution. In any case, the continuous distribution .f (t) arising as a solution of the kinetic equation (10.6) is supposed to approximate (in the limit of large particle numbers) the actual empirical distribution .fX(t) of the N particle system that evolves according to the “true” micro-dynamics. To derive the kinetic equation, i.e., justify the effective model based on the more fundamental microscopic theory, is thus to prove a statement of the following kind: Let .f (t, q, v) be a solution of (10.6) with the boundary condition .f (0, q, v) = f0 (q, v). If the (continuous) density .f0 is a good approximation to the empirical distribution .fX of the initial microstate X, then .f (t) will be a good approximation to the empirical distribution .fX(t) of

the time-evolved microstate X(t).

    f_0      --(kinetic-equation time evolution)-->      f(t)
     ≈                                                    ≈
  f_{X(0)}   --(microscopic time evolution)-->         f_{X(t)}    (10.9)

2 An equation of this type was introduced by A.A. Vlasov (1938, 1968) in his work on plasma physics and even earlier by J.H. Jeans (1915) in the context of Newtonian stellar dynamics.

For somewhat realistic interactions, this won’t be true for all initial configurations X with .fX ≈ f0 but only for typical ones (see, e.g., Hauray and Jabin (2015); Lazarovici and Pickl (2017) for pertinent results about mean-field equations). The derivation of a kinetic equation is thus, in general, a typicality result.
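As a toy illustration of the mean-field idea behind (10.8) (entirely my own example; the Gaussian pair potential and all numbers are arbitrary, and NumPy is assumed): replacing f(t, q′, v′) by the empirical distribution of N particles turns the force integral into an average over particle positions, the force that "every particle feels."

```python
import numpy as np

rng = np.random.default_rng(5)

def mean_field_force(q, particles, grad_V):
    """Empirical counterpart of the Vlasov force term (10.8): the average of
    -grad V(q - q_j) over the current particle positions q_j."""
    return -np.mean(grad_V(q - particles), axis=0)

# Toy 1D example with a smooth (repulsive) pair potential V(r) = exp(-r^2/2)
grad_V = lambda r: -r * np.exp(-0.5 * r**2)

positions = rng.normal(loc=0.0, scale=1.0, size=100_000)   # stand-in for f(t, q)
for q in [-2.0, 0.0, 2.0]:
    print(f"q = {q:+.1f}   K(q) ≈ {mean_field_force(q, positions, grad_V):+.4f}")
```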

Weak Convergence

Mathematically, the relevant approximation is made precise in terms of the weak topology on the space of probability measures. For a sequence (μ_k)_k of normalized measures, weak convergence to ν is denoted by μ_k ⇀ ν and means that

\int \varphi(x)\, d\mu_k(x) \to \int \varphi(x)\, d\nu(x), \quad k \to \infty,

for all bounded and continuous functions φ : ℝ^n → ℝ. A convenient metric inducing this topology is the first Wasserstein metric, which can be defined as:

W_1(\mu, \nu) := \sup_{\varphi} \Bigl\{ \int \varphi(x)\, d\mu(x) - \int \varphi(x)\, d\nu(x) \;:\; \sup_{x \neq y} \frac{|\varphi(x) - \varphi(y)|}{|x - y|} = 1 \Bigr\}.

Hence, a small Wasserstein distance between two measures implies approximately equal averages when integrating a somewhat well-behaved “macro-variable” (one that doesn’t vary too quickly). In fact, such a metric allows us to compare a continuous density f(t) directly with the discrete, microscopic distribution μ_{X(t)}, making the “step function” f_{X(t)} (which depends on a partition of μ-space into cells) dispensable for technical purposes. We nonetheless keep the focus on f_{X(t)}, as it makes the coarse-graining nature of the Boltzmannian distribution function more evident.
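For one-dimensional samples, the first Wasserstein distance is easy to compute, which gives a quick feel for the sense in which the empirical distribution of N "particles" converges weakly to the underlying density f_0 as N grows. This sketch is my own and assumes SciPy; the large reference sample is used as a stand-in for the continuous density.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(4)

# Proxy for the continuous density f0: a very large reference sample.
reference = rng.normal(size=2_000_000)

for n in [100, 10_000, 1_000_000]:
    empirical = rng.normal(size=n)   # "microstate" with N i.i.d. particles
    w1 = wasserstein_distance(empirical, reference)
    print(f"N = {n:8d}   W1(empirical, f0) ≈ {w1:.4f}")
```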

One can also take a different perspective on the derivation of kinetic equations that deals—in mathematical lingo—with ensembles or “random initial conditions,” but is best understood as aiming more directly at typicality results. This approach considers measures on the N-particle phase space Γ rather than distributions on the reduced μ-space. Suppose that, at t = 0, the particles are identically and independently distributed according to f_0, that is, according to the product measure F_0^N = ⊗^N f_0 on Γ. If F is evolved with the N-particle flow determined by the microscopic dynamics, it satisfies the Liouville equation

\partial_t F_t^N + \sum_{i=1}^{N} p_i \cdot \nabla_{q_i} F_t^N + \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq i} \nabla V(q_i - q_j) \cdot \nabla_{p_i} F_t^N = 0.    (10.10)

Now one would like to establish that, under this time evolution, the particles remain “approximately independent” with F_t^N ≈ ⊗^N f_t, where f_t is the solution of the kinetic equation (10.6) with initial condition f_0.

10.1.1 Molecular Chaos

How is this approximation understood mathematically? We still want to look at the thermodynamic limit N → ∞, but (F_t^N)_{N∈ℕ} is now a sequence of measures on different phase spaces. To speak about convergence, one thus considers the reduced k-particle marginals

{}^{(k)}F_t^N(q_1, \ldots, q_k, v_1, \ldots, v_k) := \int F_t^N(q_1, \ldots, q_N, v_1, \ldots, v_N)\, d^3q_{k+1} \cdots d^3q_N\, d^3v_{k+1} \cdots d^3v_N,    (10.11)

obtained by integrating out the degrees of freedom of N − k particles. For any k ∈ ℕ, ({}^{(k)}F_t^N)_{N>k} is now a sequence of measures on the same k-particle phase space, and the goal is to prove

{}^{(k)}F_t^N \rightharpoonup \otimes^k f_t, \quad N \to \infty,    (10.12)

for all k ∈ ℕ. (In fact, it suffices to prove (10.12) for k = 1, 2.) Equation (10.12) is the modern mathematical definition of molecular chaos. Under mild assumptions about the initial distribution f_0, it is equivalent to the “deterministic” result sketched in the diagram 10.9 for typical initial conditions with respect to the product measure F_0^N = ⊗^N f_0.

Why not the stationary Liouville or microcanonical measure? For large N, F_0^N is concentrated on microstates X for which f_X ≈ f_0 (see Fournier and Guillin (2014) for a rigorous result) and thus basically equivalent to the uniform measure conditionalized on the initial macro-region M_{f_0} = {X ∈ Γ : f_X ≈ f_0}. This is our non-equilibrium boundary condition: that the system’s empirical distribution at t = 0 is well approximated by the continuous distribution function f_0. However, because of the manifest statistical independence of the particles, F_0^N is a mathematically much more convenient choice. Still, if we think in terms of the microcanonical measure and recall from (7.11) that its k-particle marginals (for N ≫ k) are Maxwellian distributions, this may already indicate how molecular chaos for the Boltzmann equation could establish convergence to equilibrium: Typical initial conditions in M_{f_0} evolve into the equilibrium region M_{f_eq} characterized by an (approximately) Maxwellian distribution of velocities.


Remark 5 (Scaling Limits) The derivation of a kinetic equation always requires an appropriate rescaling of the microscopic dynamics to ensure that the relevant physical quantities remain of constant order in the limit N → ∞. Conceptually, this is best understood as a dimensional rescaling of the time, position, and/or momentum coordinates. For the Vlasov equation, the relevant regime is the mean-field scaling V → (1/N)V, which ensures that the total mass/charge of the system remains of order 1. This corresponds to tracking the time evolution on large (macroscopic) time scales, i.e., in rescaled coordinates t' = N^{-1/2} t, p' = N^{1/2} p. To derive the Boltzmann equation, one has to make some ansatz for the particle collisions in the microscopic dynamics. The simplest (interesting) one is the hard-sphere model. In any case, the relevant scaling regime is the Boltzmann–Grad limit, in which the scattering radius scales as r(N) ∼ N^{-1/2}, keeping the mean free path λ ∝ (N r²)^{-1} constant.
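A trivial numerical sanity check of the Boltzmann–Grad scaling (my own illustration; the reference radius r0 is arbitrary): with r(N) ∼ N^{-1/2}, the combination N·r², and hence the mean free path, stays of constant order as N grows.

# Boltzmann-Grad scaling: with r(N) = r0 * N**(-1/2), the quantity N * r**2
# (and hence the mean free path) stays of constant order as N grows.
r0 = 0.5  # arbitrary reference radius
for N in (10**3, 10**6, 10**9):
    r = r0 * N ** -0.5
    print(f"N = {N:>10d}   r(N) = {r:.3e}   N*r(N)^2 = {N * r**2:.3f}")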

10.2 The H-Theorem as a Typicality Result

While kinetic equations are routinely and successfully applied in many areas of physics, the rigorous justification of the continuous model can be an awfully hard mathematical problem. For the Boltzmann equation, the landmark result of Lanford (1975) establishes molecular chaos only for a very short time interval (a fraction of the particles' mean free time). Subsequent results have extended the proof to a larger class of scattering potentials but not yet overcome this crucial limitation. Instead of going into the technical details of these modern mathematical proofs, let us take a more informal look at Boltzmann's reasoning behind the H-theorem. The goal of the H-theorem is to show the convergence of an initial non-equilibrium distribution f_0(q, v) to the Maxwell distribution f_eq(q, v). We have already seen from (7.12) that the Maxwell distribution corresponds to the equilibrium state, i.e., the typical value of the macro-variable X → f_X. In other words, while the coarse-grained distribution f_X may be different for different microscopic configurations


X, it is, in fact, (more or less) the same for the overwhelming majority of possible microstates, namely (approximately) of the form

f_X(q, v) ∝ e^{−(1/2) β m v²},

for a constant β that corresponds to the inverse temperature of the system. Note that the distribution having no q-dependence means that the gas is homogeneously distributed over the entire volume, with no correlations between position and velocities, i.e., with uniform temperature. We can already see the connection with the general typicality account. Typical non-equilibrium microstates evolve into the Maxwellian distribution because the vast majority of possible microstates exhibit a Maxwellian velocity distribution. This crucial insight does not appear explicitly in Boltzmann's H-theorem, however. It is rather based on the following three propositions:

(1) For a low-density gas, the time evolution of f_{X(t)}(q, v) is well described by an effective kinetic equation, the Boltzmann equation.

(2) For a solution f(t, q, v) of the Boltzmann equation, the H-function H[f(t)] = ∫ f(t, q, v) log f(t, q, v) dq dv is monotonically decreasing in t. Recall that in (10.4), we already identified this H-function as a (negative) measure of the Boltzmann entropy.

(3) The H-function reaches its minimum for the Maxwell distribution f_eq(q, v). Together with (2), this implies, in particular, that the Maxwell distribution is a stationary solution of the Boltzmann equation.

Propositions (2) and (3) are fairly standard mathematical results. The crux of the matter is proposition (1). When Boltzmann first presented the H-theorem in 1872, he argued that a dilute gas must evolve in accord with his equation; he later had to qualify this statement, claiming, in effect, only that it would do so typically and on empirically relevant time scales. Indeed, proposition (1), and therefore the H-theorem as a whole, must be understood as a typicality statement.
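As a small numerical illustration of proposition (3) (my own sketch, not from the text): among velocity distributions with the same mean kinetic energy, the Maxwellian one minimizes the H-functional. The code compares a one-dimensional Maxwellian with a uniform distribution of equal variance; grid and parameters are arbitrary.

import numpy as np

def H(f, v):
    # H[f] = integral of f log f dv on a uniform grid; the integrand is set to 0 where f = 0.
    dv = v[1] - v[0]
    safe_f = np.where(f > 0, f, 1.0)
    return np.sum(np.where(f > 0, f * np.log(safe_f), 0.0)) * dv

sigma2 = 1.0                                  # both distributions have variance (mean energy) sigma2
v = np.linspace(-8.0, 8.0, 8001)

f_maxwell = np.exp(-v**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

a = np.sqrt(3 * sigma2)                       # uniform density on [-a, a] with the same variance
f_uniform = np.where(np.abs(v) <= a, 1.0 / (2 * a), 0.0)

print(f"H[Maxwellian]      = {H(f_maxwell, v):.4f}")   # analytically -0.5*log(2*pi*e*sigma2), about -1.419
print(f"H[uniform, same E] = {H(f_uniform, v):.4f}")   # analytically -log(2a), about -1.242 (larger)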


10.2.1 The Stoßzahlansatz

Boltzmann's original derivation of his namesake equation was famously based on the Stoßzahlansatz or the assumption of molecular chaos. "Assumption" is not a perfectly accurate translation of the German word Ansatz. Whereas the first is often used synonymously with a logical premise, the latter has a distinctly pragmatic character. A better translation would be "working hypothesis," a plausible (though oversimplified) guess that is, in the first instance, validated by its success, but would ultimately require a deeper justification. Boltzmann's derivation is thus a brilliant physical argument but never claimed to be a rigorous proof from first principles. The Stoßzahlansatz is an assumption about the relative frequencies of collisions between the particles in the gas. Denoting by dN(t, q; v_1, v_2) the number of collisions happening near q in a small time interval around t between particles with velocities (approximately) v_1 and v_2, the Stoßzahlansatz is:

dN(t, q; v_1, v_2) ∝ N² f(t, q, v_1) f(t, q, v_2) |v_1 − v_2| dt dq dv_1 dv_2.   (10.13)

Simply put, the relative frequency of collisions occurring in the cell around q between particles of different velocities is proportional to the densities of particles with the respective velocities near q. The scattering probability being proportional to the product of f(t, q, v_1) and f(t, q, v_2) means that particles of different velocities are statistically independent as they contribute to the collisions. This is, more specifically, the meaning of molecular chaos. What Boltzmann did prove is that, if and as long as the assumption of molecular chaos and hence the Stoßzahlansatz are valid, the Boltzmann equation will hold (as a good approximation to the evolution of the empirical distribution under the actual micro-dynamics). The H-theorem thus hinges on the question of whether and in what sense the assumption of molecular chaos is justified. For the purpose of illustration, let us imagine that we could freeze the system at time t = 0 and arrange the position and momentum of every


single particle before letting the clock run and the system evolve in time.3 Which particles are going to collide and how they are going to collide is completely determined by these initial conditions and the microscopic laws of motion. We could, for instance, arrange the initial configuration in such a way that slow particles will almost exclusively scatter with other slow particles and fast particles with other fast particles. But such initial conditions are obviously very special ones. For typical configurations coarse-graining to the initial distribution f_0(q, v), we will find that the relative frequencies with which particles of different velocities meet for the first collision are roughly proportional to the density of particles with the respective velocities, i.e., given by (10.13). This is nothing more and nothing less than the law of large numbers. The validity of (10.13) at the initial time is thus, like all LLN results, a typicality statement and, as such, another mathematical fact. The critical issue is whether molecular chaos propagates with the microscopic dynamics. Assume that after a brief time interval Δt, for which the Boltzmann equation is valid, the continuous distribution has evolved into f(Δt, q, v). How do we know that (10.13) is still a good approximation for all but a small set of initial conditions? It is still true that (10.13) is satisfied for typical microscopic configurations realizing the current distribution, i.e., counting all possible configurations that coarse-grain to f(Δt, q, v). But we cannot count all these configurations since the relevant microstates are those that evolved from the macro-region realizing the initial distribution f_0(q, v). Mathematically, this constraint translates into a loss of statistical independence at times t > 0, making it prima facie questionable whether a law-of-large-numbers statement for the collisions, i.e., (10.13), is still valid. Boltzmann's Stoßzahlansatz is thus the assumption that statistical independence is sufficiently well preserved under the microscopic time evolution, or, in other words, that the relative frequency of collisions is always the typical one with respect to the current distribution function.

3 There is no issue here as to whether we let the clock run "forward" or "backward"—the problem is symmetric with respect to the time evolution in both directions.
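The law-of-large-numbers point can be made vivid with a crude Monte Carlo sketch (my own, with arbitrary parameters): pairing the particles of a typical configuration blindly for their first collision, the relative frequencies of slow-slow, fast-fast, and mixed encounters come out as the products of the respective densities.

import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000

# A "typical" initial configuration: particle speeds drawn i.i.d. from some distribution f_0;
# call a particle "fast" if its speed exceeds the median, "slow" otherwise.
speeds = np.abs(rng.normal(size=N))
fast = speeds > np.median(speeds)
p = fast.mean()

# Pair the particles at random for their first "collision".
perm = rng.permutation(N)
a, b = fast[perm[0::2]], fast[perm[1::2]]

print(f"fast-fast pairs: {np.mean(a & b):.4f}   (product of densities: {p**2:.4f})")
print(f"slow-slow pairs: {np.mean(~a & ~b):.4f}   (product of densities: {(1 - p)**2:.4f})")
print(f"mixed pairs:     {np.mean(a ^ b):.4f}   (product of densities: {2 * p * (1 - p):.4f})")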


A rigorous derivation of the Boltzmann equation would require proof of this assumption, i.e., of the propagation of molecular chaos (in the Boltzmann–Grad limit). The precise mathematical statement would likely take the form (10.12). This would be a monumental mathematical achievement, a sure claim to fame and a Fields Medal (for anyone young enough to qualify). But it is the technical difficulty of the problem, not the truth of the desired result, that experts grapple with. Based on physical intuition, the empirical success of the Boltzmann equation, and various encouraging (if only partial) mathematical results, there is little doubt that Boltzmann's assumption, though idealized, is justified. Given that the microscopic dynamics are very chaotic, that the number of particles in a gas is huge, and the gas (by assumption) very dilute so that problematic re-collisions are rare, it is highly plausible that the relative frequencies of scatterings will not become too special—in the sense of deviating significantly from the expectation values (10.13)—unless the initial micro-configuration was very special. All this is subject to the important caveat (of which Boltzmann was well aware) that, unless one considers the thermodynamic limit of infinitely many particles, molecular chaos and (10.13) will hold at best approximately and for all but a small set of "bad" initial conditions, that this approximation will get worse with time, and that the approximation is only good enough until it isn't. Eventually, typical systems will exhibit fluctuations out of equilibrium, at which point their evolution is no longer described by the Boltzmann equation.

A Toy Model for the Boltzmann Equation

We consider a system of N ≫ 1 balls. At the beginning, n of the balls are black and m := N − n are white. When two like-colored balls collide, they change their color; otherwise they stay the same:

w + w → b + b
b + b → w + w   (10.14)
b + w → b + w


Note that these dynamics are time-symmetric. For our model, we consider discrete time steps, assuming that k ≪ N collisions occur in each round. Now we make the following "Stoßzahlansatz": the probability that a black/white ball enters a collision corresponds to the current fraction of black/white balls. The expected numbers of collisions in each round are thus:

k (n/N)² collisions b + b,
k (m/N)² collisions w + w,
2k (n/N)(m/N) collisions b + w.   (10.15)

Consequently, the expected change in the number of black and white balls is

n → n + 2k [(m/N)² − (n/N)²],
m → m + 2k [(n/N)² − (m/N)²],

and taking the difference:

(n − m) → (n − m) − (4k/N²)(n² − m²) = (1 − 4k/N)(n − m),

where we used (n² − m²) = (n + m)(n − m) = N(n − m). This is iterated in each round. For large N, typical evolutions will be close to this theoretical expectation and, hence, after T ∈ ℕ time steps,

(n − m)(T) ≈ (1 − 4k/N)^T (n − m)(0) → 0,  T → ∞.   (10.16)

We thus have convergence to equilibrium: the number of black and white balls in the system tends toward equidistribution; and if we start with an unequal number of black and white balls – i.e., in non-equilibrium – the time-symmetric scattering dynamics (10.14) will lead to the irreversible macro-evolution (10.16). However, small deviations from the expectation values will add up over time, leading to fluctuations out of equilibrium. At that point, the effective equation (10.16) is no longer valid.
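Below is a minimal simulation sketch of this toy model (my own illustration; N, k, T, and the initial imbalance are arbitrary). Collision pairs are drawn at random in each round, in line with the "Stoßzahlansatz" above, and the actual evolution of n − m is compared with the effective prediction (10.16).

import numpy as np

rng = np.random.default_rng(3)

N, k, T = 10_000, 100, 200          # number of balls, collisions per round, number of rounds
balls = np.zeros(N, dtype=int)      # 0 = white, 1 = black
balls[: 9 * N // 10] = 1            # start far from equilibrium: 90% black
d0 = 2 * balls.sum() - N            # initial value of n - m

actual, effective = [], []
for t in range(1, T + 1):
    idx = rng.choice(N, size=2 * k, replace=False)     # k random collision pairs per round
    for i, j in zip(idx[0::2], idx[1::2]):
        if balls[i] == balls[j]:                       # like-colored balls both change color
            balls[i] ^= 1
            balls[j] ^= 1
    actual.append(2 * balls.sum() - N)                 # current n - m
    effective.append((1 - 4 * k / N) ** t * d0)        # prediction of the effective law (10.16)

for t in (1, 50, 100, 200):
    print(f"round {t:>3d}:  n-m (simulation) = {actual[t-1]:>6d}   n-m (10.16) = {effective[t-1]:>8.1f}")
# Agreement is good while n - m is large; once it has decayed, fluctuations of order sqrt(N) dominate,
# which is exactly the caveat noted in the text.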


10.2.2 Irreversibility of the Boltzmann Equation

With all that said, let us summarize how Boltzmann's H-theorem fits into his general typicality argument. While the micro/macro distinction does not appear as prominently in the formulation of the H-theorem, it is crucial that the function f_t(q, v) pertains to a coarse-grained description of the system, thus distinguishing a macro-region consisting of microstates whose particle distribution is well approximated by f_t. Convergence to equilibrium is then established for typical initial conditions relative to this initial non-equilibrium region. And the equilibrium state—characterized by the Maxwell distribution—is, as always, distinguished by the fact that it is the one realized by an overwhelming majority of all possible microstates. Although the focus on the Stoßzahlansatz is justified from a technical point of view, I contend that the tendency to equilibrium is first and foremost explained by this dominance of the equilibrium state (cf. the Boltzmann quote in Chap. 7). The point of molecular chaos is somewhat subsidiary to this insight, namely to substantiate the intuition that the "most likely" evolutions will thus carry a non-equilibrium configuration into the overwhelmingly large equilibrium region. Finally, we understand that the irreversibility of the Boltzmann equation (as an effective description of a system's macro-evolution) is—as usual—a consequence of the fact that non-equilibrium configurations converging to equilibrium are typical with respect to the corresponding macrostate, whereas microstates leading to the time-reversed evolution are atypical relative to the equilibrium state, i.e., relative to all micro-configurations coarse-graining to f_eq(q, v). The same holds true with respect to any macrostate that the evolution passes through on the way. It is sometimes claimed that the Stoßzahlansatz is a manifestly time-asymmetric assumption in that the incoming rather than the outgoing velocities are assumed to be independently distributed according to the current density function (see, in particular, Uffink (2007, p. 117)). This then results in the charge that Boltzmann failed to reconcile the time-reversible micro-dynamics with the time-irreversible macro-evolution described by the Boltzmann equation. The claim about the asymmetry


of the Stoßzahlansatz is technically correct but off-target, and the misunderstanding seems to be mostly due to the critics’ failure to recognize the Stoßzahlansatz as a typicality statement. As a (conditional) typicality statement, the Stoßzahlansatz is equally justified for the time evolution in both time directions. The origin of the time-asymmetry is, as always, the special boundary condition, i.e., the assumption of a non-equilibrium initial distribution .f0 . Relative to this low-entropy state at .t0 , an increase in entropy (decrease of H ) is typical— still in both time directions away from .t0 . But along the corresponding solutions, the microstates are necessarily atypical, relative to their current macro-distribution, with respect to their evolution in the direction of the lower-entropy “initial” state. In other words, the atypical evolution toward the low-entropy boundary condition—for which molecular chaos does not hold—is explained by the low-entropy boundary condition. So, of course, the Stoßzahlansatz or assumption of molecular chaos breaks the time symmetry in the sense that it applies to the thermodynamic evolution but not to the reversed motion. This does not mean, however, that Boltzmann smuggled in a time-asymmetry over and above the one resulting from the assumption of a non-equilibrium distribution at .t0 . If the terms “incoming” and “outgoing” velocities are misleading, we should speak instead of velocities “toward”, respectively “away from” the non-equilibrium boundary condition (in a temporal sense). Since statistical independence is a typicality property, molecular chaos can only hold for the time evolution away from the said boundary condition. For evolutions toward it, the boundary constraints will necessarily impose strong, seemingly conspiratorial correlations among the particle motions. The real question is why the low-entropy boundary conditions that we find or are able to prepare are always “past” rather than “future” ones— and thus why only the Boltzmann and not the “anti-Boltzmann” equation is empirically relevant. This question, however, goes beyond the scope of the H -theorem. It is a question about the arrow of time and the boundary conditions of our universe that we will address in Chap. 11.


References

Davies, P. C. W. (1977). The physics of time asymmetry. Berkeley and Los Angeles: University of California Press.
Fournier, N., & Guillin, A. (2014). On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162, 1–32.
Goldstein, S. (2001). Boltzmann's approach to statistical mechanics. In J. Bricmont, D. Dürr, M. C. Galavotti, G. Ghirardi, F. Petruccione, & N. Zanghì (Eds.), Chance in physics: Foundations and perspectives (pp. 39–54). Berlin: Springer.
Goldstein, S., & Lebowitz, J. L. (2004). On the (Boltzmann) entropy of nonequilibrium systems. Physica D: Nonlinear Phenomena, 193(1), 53–66.
Hauray, M., & Jabin, P.-E. (2015). Particle approximation of Vlasov equations with singular forces: Propagation of chaos. Annales scientifiques de l'École normale supérieure, 48(4), 891–940.
Jeans, J. H. (1915). On the theory of star-streaming and the structure of the universe. Monthly Notices of the Royal Astronomical Society, 76(2), 70–84.
Lanford, O. E., III (1975). Time evolution of large classical systems. In J. Moser (Ed.), Dynamical systems, theory and applications. Lecture Notes in Physics (Vol. 38, pp. 1–111). Berlin: Springer.
Lazarovici, D., & Pickl, P. (2017). A mean field limit for the Vlasov–Poisson system. Archive for Rational Mechanics and Analysis, 225(3), 1201–1231.
Price, H. (2002). Burbury's last case: The mystery of the entropic arrow. In C. Callender (Ed.), Time, reality & experience (pp. 19–56). Cambridge: Cambridge University Press.
Spohn, H. (1991). Large scale dynamics of interacting particles. Berlin: Springer.
Uffink, J. (2007). Compendium of the foundations of classical statistical physics. In Philosophy of physics (pp. 923–1074). Elsevier.
Uffink, J. (2017). Boltzmann's work in statistical physics. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Spring 2017 ed.). Metaphysics Research Lab, Stanford University.
Villani, C. (2002). A review of mathematical topics in collisional kinetic theory. In S. Friedlander & D. Serre (Eds.), Handbook of mathematical fluid dynamics (Vol. 1, pp. 71–74). North-Holland: Elsevier Science.
Vlasov, A. A. (1938). On vibration properties of electron gas. Journal of Experimental and Theoretical Physics, 8(3), 291.
Vlasov, A. A. (1968). The vibrational properties of an electron gas. Soviet Physics Uspekhi, 10(6), 721.

11 Past Hypothesis and the Arrow of Time

11.1 The Easy and the Hard Problem of Irreversibility

In what seems like a terminological nod to Chalmers' (1995) problems of consciousness, Sheldon Goldstein (2001) introduces the distinction between the easy and the hard part of the problem of irreversibility. The easy part of the problem is: Why do isolated systems in a state of low entropy evolve into states of higher entropy (but not the other way around)? The answer, which goes back to Boltzmann's work in statistical mechanics, was discussed in Chap. 8. From a technical point of view, the easy problem of irreversibility can still be arbitrarily hard if one seeks to obtain rigorous results about the convergence to equilibrium in realistic physical models. It is easy in the sense that Boltzmann's account is conceptually well understood and successfully applied in physics and mathematics. The hard problem begins with the question: Why do we find systems in low-entropy states to begin with if such states are atypical? Often, the answer is that we prepared them, creating low-entropy subsystems for the price of entropy increase in their environment. But why then is the entropy of this


environment so low—most strikingly in the sense that it allows us to exist? If one thinks this through to the end1 one comes to the conclusion that the universe as a whole must be in a low-entropy state and that it must have evolved from a state of even lower entropy in the distant past. The latter assumption is necessary to avoid the absurd conclusion that our present macrostate—which includes all our memories and records of the past—is much more likely the product of a fluctuation out of equilibrium than of the low entropy past that our memories and records ostensibly record. In other words, only with this assumption does Boltzmann’s account “make it plausible not only that the paper will be yellower and ice cubes more melted and people more aged and smoke more dispersed in the future, but that they were less so (just as our experience tells us) in the past” (Albert, 2015, p. 5). For excellent discussions of this issue, see also Feynman (1967, Chap. 5) and Carroll (2010). With this bigger picture in mind, the hard problem of irreversibility is thus to explain the existence of a thermodynamic arrow of time in our universe. And the standard account today involves the assumption of a very special (i.e., very low-entropy) initial macrostate for which Albert (2000) coined the now-famous term Past Hypothesis (PH). The thermodynamic asymmetry of the universe has thus been reduced to asymmetric boundary conditions. But the status of the Past Hypothesis is highly controversial. Is the very low-entropy beginning of the universe not itself a mystery crying out for an explanation?

11.1.1 The Status of the Past Hypothesis

In the literature, by and large three different stances have been taken toward the status of the Past Hypothesis.

1. The low-entropy beginning of the universe calls for an explanation.
2. The low-entropy beginning of our universe is a brute fact that does not require or allow for any explanation.

1 As few have done as meticulously as Penrose (1989, Chap. 7).


3. The Past Hypothesis is a law of nature (and therefore does not require or allow for further explanation).

The first point of view is based on typicality reasoning. Boltzmann's analysis allows us to conclude that typical microstates, relative to a low-entropy initial macrostate, undergo a thermodynamic evolution of increasing entropy. But it also tells us that an initial state far away from thermodynamic equilibrium is itself atypical, relative to all possible microstates, since a low-entropy macro-region makes up a tiny fraction of the overall phase space volume. For our universe, Penrose (1989) estimates the measure of the relevant macro-region (relative to the available phase space volume) to be at most 1 : 10^{10^{123}}, a mind-bogglingly small number.2 There is thus already an internal tension in the Boltzmannian account when it comes to the hard problem of irreversibility. And we have said that atypical facts are generally the kind of facts that cry out for further explanation. Indeed, the low-entropy beginning of our universe seems like a clearer case of "fine-tuning" than most that have preoccupied cosmologists, although not in the sense of tuning particular parameters to a very specific range of values. Still, the necessity of a Past Hypothesis implies that our universe looks very different from typical models of the fundamental laws of nature. This fact alone creates explanatory pressure. To me, it is even one of the deepest mysteries that we face in physics. The second view is well articulated by Craig Callender (2004a,b). While Callender is sympathetic to regarding PH as a (Humean) law, he argues more broadly that there is no single feature of facts—such as being atypical—that makes them require explanation and that the conceivable strategies to further ground PH don't promise a more satisfying conclusion than accepting it as a brute fact. Notably, a low-entropy initial state does not qualify as an acceptably brute fact according to the analysis proposed in Sect. 1.4 (since a universe starting out in thermodynamic

2 This is what Penrose's illustration, Fig. 1.1 of this book, was originally used to illustrate.


equilibrium would have been typical). Nonetheless, Boltzmann himself eventually arrived at a similar conclusion to Callender:

    The second law of thermodynamics can be proved from the mechanical theory if one assumes that the present state of the universe, or at least that part which surrounds us, started to evolve from an improbable state and is still in a relatively improbable state. This is a reasonable assumption to make, since it enables us to explain the facts of experience, and one should not expect to be able to deduce it from anything more fundamental. (Boltzmann, 1897)

The third stance is most prominently advocated by David Albert (2000) and Barry Loewer (2007) in the context of the Humean best system account of laws. According to their view, PH expresses a contingent fact but is elevated to a law of nature by virtue of playing the role of an axiom in the best systematization of the world. This best system would have the form of the “Mentaculus” (Loewer, 2012) discussed in Chap. 5, i.e., consist of microscopic dynamical laws, the Past Hypothesis, and a probability measure on the PH macro-region. Notably, though, the proposition that Albert wants to grant the status of a law is not that the universe started in any low-entropy state. The Past Hypothesis, in its current form, is rather a placeholder for “the macrocondition …that the normal inferential procedures of cosmology will eventually present to us” (Albert, 2000, p. 96). Ideally (I suppose), physics will one day provide us with a nice, simple, and informative characterization of the initial macrostate of our universe—maybe something along the lines of Roger Penrose’s Weyl curvature conjecture (Penrose, 1989)—that would strike us as “law-like.” But this is also what many advocates of option 1 hope for as an explanation of the PH. One may question, as Callender does, how much explanatory value it would ultimately add, but also how much is already accomplished by calling PH a law if we have to rely on future physics to fill in the details.


This part of the debate has been somewhat confused by tying the option “PH as a law” to the best system account of lawhood.3 One would do well to disentangle the claim that the hard problem of irreversibility is already solved by the Mentaculus from the claim that only Humeanism could accommodate a law for cosmological boundary conditions if we found a suitable candidate. Then one should reject both. A postulate saying little more than that the initial entropy of the universe was tiny does not seem like a satisfying conclusion to the project of physics. But if we had a more law-like characterization of the relevant boundary conditions, we would still have the option to interpret them as a Humean law or as nomologically necessary in a metaphysically more robust, i.e., non-Humean, sense (see, in particular, Chen and Goldstein (2022) for the latter option). Non-Humeans don’t have to be committed to only dynamical laws, as Chen and Goldstein make clear.

11.2 Thermodynamic Arrow Without a Past Hypothesis

In recent years, Sean Carroll together with Jennifer Chen (2004; see also Carroll (2010)), and Julian Barbour together with Tim Koslowski and Flavio Mercati (2013, 2014, 2015; see also Barbour (2020)) independently put forward audacious proposals to explain the arrow of time without the postulate of a special initial state. The goal, in other words, is to (dis)solve the hard problem of irreversibility by establishing the existence of an arrow of time in the universe as typical. While Barbour's arrow is not, strictly speaking, an entropic one but connected to a concept of shape complexity, Carroll's account is largely based on the Boltzmannian picture, although with a crucial twist. For this reason, I will focus on Carroll's proposal first, before comparing it to the theory of Barbour et al. in Sect. 11.4. The crucial assumption of Carroll and Chen is that the universe has no equilibrium state so that its Boltzmann entropy can increase without

3 As Loewer does in his beautiful essay "Two accounts of laws and time" (2012a), which compares the Mentaculus account to Maudlin's primitivism about laws (Maudlin, 2007).


Fig. 11.1 Typical entropy curve for a Carroll universe. Arrows indicate the arrows of time on both sides of the global entropy minimum (Janus point)

bound. Indeed, if there are macrostates of arbitrarily (but not infinitely4) high entropy, then every macrostate is a non-equilibrium state from which the entropy can increase in both time directions. A typical entropy curve (one hopes) would thus be roughly "U-shaped," attaining its global minimum at some moment in time and growing monotonically (modulo rare fluctuations) in both directions from this vertex (Fig. 11.1). Barbour et al. (2015) describe such a profile as "one-past-two-futures," the idea being that observers on each branch of the curve would identify the direction of the entropy (or complexity) minimum—which the authors call the Janus point—as their past. In other words, there would be two future-eternal epochs making up the total history of the universe, with the respective arrows of time pointing in opposite directions. This picture is common to the ideas of both Carroll and Barbour and constitutes a bold yet plausible departure from the "gas in the box" paradigm that is still guiding most discussions of the thermodynamic history of our universe (Barbour, 2017, 2020). A drawback of Carroll's

4 I won't go there, but Carroll and Chen (2004) argue that even infinite phase space volumes could be meaningfully compared.


proposal, in particular, is that it has to assume a non-normalizable phase space measure, one that assigns an infinite volume to the relevant phase space and thus allows for an unbounded Boltzmann entropy (and also avoids the Poincaré recurrence theorem). This leads to certain ambiguities in the statistical analysis, as Goldstein et al. (2016) show. And while it is plausible that an eternal universe with no equilibrium state would exhibit the desired thermodynamic behavior (since there are always vastly larger and larger macro-regions, corresponding to higher and higher entropy values, that the microstate can evolve into), the details of the dynamics and the phase space partition must play a greater role than in the familiar Boltzmannian account. For instance, the measure of low-entropy macro-regions could sum up to arbitrarily large values, exceeding those of the high-entropy regions, or the high-entropy macro-regions could be arbitrarily remote in phase space, so that the dynamics does not carry low-entropy configurations into high-entropy regions on relevant time scales. The two main questions we need to ask are therefore:

• Are there plausible dynamical laws (and macro-variables) that make such entropy curves typical?
• And would this suffice to ground sensible inferences about the thermodynamic history of our universe without assuming something akin to a Past Hypothesis?

Let us address the first question first. The original idea of Carroll and Chen (2004) is as fascinating as it is speculative. The authors propose a model of eternal spontaneous inflation in which the late stages of a universe approach de Sitter space and give birth to new baby universes (or "pocket universes") growing out of fluctuations in an inflation field. The creation of a new universe would then increase the overall entropy of the multiverse, while the baby universes themselves would typically start in an inflationary state that has much lower entropy than a standard Big Bang universe. This means, in particular, that our observed universe can be in a low-entropy state, with an even lower-entropy past, even if the state of the multiverse as a whole is arbitrarily high up the entropy curve. The details of this model are beyond both the scope of this book and my area of expertise, but so far they include neither concrete dynamical laws nor a precise definition of the relevant entropy.


In more recent talks, Carroll discusses a simple toy model—essentially an ideal gas without a box—in which a system of N non-interacting particles can expand freely in empty space. The only macro-variables considered are the constant total energy E = (1/2) Σ_{i=1}^N p_i² and the moment of inertia I = Σ_{i=1}^N q_i² (in the center-of-mass frame), providing a measure for the expansion of the system.5 It is then not difficult to see that I will attain a global minimum at some moment in time, from which it grows without bound for t → ±∞ (see (11.8) below). The same will hold for the corresponding Boltzmann entropy, since a given value of I corresponds to a sphere of radius √I in the position coordinates, while all momenta are constant, lying on a sphere of radius √(2E). The entropy curve will thus have the desired U-shape with a minimum when the particle configuration is most dense. For a detailed discussion of this toy model, see Reichert (2012) and Goldstein et al. (2016). Baby universes might be a little too speculative, and freely expanding particles might be too simplistic. However, Paula Reichert and I have argued that there exists a very familiar and relevant theory that realizes Carroll's entropy model: Newtonian gravity. Before I present our analysis, let us turn to the second question and see whether Carroll's entropy model could succeed in explaining the Past Hypothesis away.

5 For simplicity, I set the particle masses equal to 1.
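A short numerical sketch of this toy model (my own illustration; particle number and initial data are arbitrary, masses are set to 1 as in the text): under free motion, q_i(t) = q_i(0) + p_i t, so the moment of inertia I(t) is an upward-opening parabola, attaining a single global minimum and growing without bound in both time directions while E stays constant.

import numpy as np

rng = np.random.default_rng(4)
N = 1000

# Random initial positions and momenta (unit masses), shifted to the center-of-mass frame.
q0 = rng.normal(size=(N, 3))
p = rng.normal(size=(N, 3))
q0 -= q0.mean(axis=0)
p -= p.mean(axis=0)

E = 0.5 * np.sum(p**2)                        # conserved total energy

def I(t):
    return np.sum((q0 + p * t) ** 2)          # moment of inertia under free evolution

t_star = -np.sum(q0 * p) / np.sum(p**2)       # vertex of the parabola I(t)
for t in (-10.0, -1.0, t_star, 1.0, 10.0):
    print(f"t = {t:>7.2f}   I(t) = {I(t):>12.1f}")
print(f"total energy E = {E:.1f};  I(t) is minimal at t = {t_star:.3f} and grows without bound as t -> ±∞")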

11.2.1 Past Hypothesis and Self-Location

To approach this question and clarify the role of the PH in the standard account, we should disentangle two often confounded issues:

(i) Given the fundamental (dynamical) laws of nature, what do typical macro-histories of the universe look like? In particular, is the existence of a thermodynamic arrow typical?
(ii) Given our knowledge about the present state of the universe, what can we infer about its past and future?


The answer to question (i) will depend on dynamical laws as well as cosmological considerations. If we have an eternal universe and finite maximal entropy, a typical macro-history will be in thermodynamic equilibrium almost all the time, but also exhibit arbitrarily deep fluctuations into low-entropy states, leading to periods with a distinct entropy gradient, i.e., local thermodynamic arrows. This fluctuation scenario was, in fact, one of Boltzmann’s attempts to resolve the hard problem of irreversibility (Boltzmann, 1896). However, to assume a fluctuation as the origin of our thermodynamic arrow is highly unsatisfying; Feynman (1967, p. 115) even calls the fluctuation hypothesis “ridiculous.” The reason is that fluctuations which are just deep enough to account for our present macrostate would be much more frequent than fluctuations producing an even lower-entropy past from which the current state could have evolved in accordance with the second law. We would thus have to conclude that we are currently experiencing a local entropy minimum, that our present state—including all our records and memories—is the product of a “random” fluctuation rather than a lower-entropy past. Feynman makes the further point that the fluctuation scenario leads not only to absurd conclusions about the past but also to wrong conclusions about the present state of the universe since we would have to assume that our current fluctuation is not deeper than necessary to explain the evidence we already have. If we dig in the ground and find a dinosaur bone, we should not expect to find other bones nearby. If we stumble upon a book about Napoleon, we should not expect to find other books containing the same information about some French emperor named Napoleon. The most extreme form of this reductio ad absurdum is the Boltzmann brain problem (see, e.g., Carroll (2010) for a nice discussion): A fluctuation that is just deep enough to account for your empirical evidence would produce only your brain, floating in space, with the rest of the universe at equilibrium. You should thus conclude that this is, by far, the most likely state of the universe you currently experience.

202

D. Lazarovici

Fig. 11.2 Self-location hypothesis in the fluctuation scenario (upper image) and Big Bang scenario (lower image) with bounded entropy. In the upper image, time scales are much longer than below, and periods of equilibrium much longer than depicted

The only escape in such a fluctuation scenario would seem to be an additional postulate—a close cousin of the Past Hypothesis6 —that the present macrostate is not the bottom of an entropy fluctuation but has been preceded by a sufficiently long period of entropy increase. In this context, the PH would thus serve a self-locating function, taking the form of an indexical proposition that locates our present state on the upward slope of a particularly deep fluctuation (Fig. 11.2). The now-standard account assumes a bounded entropy and a relatively young universe—about 13.8 billion years old according to standard

6 As my colleague and friend Saakshi Dulani put it.


cosmological estimates. In this setting (we interpret the Big Bang as the actual beginning of time), a typical macro-history would be stationary. The universe would start out in equilibrium and not exhibit significant entropy fluctuations on the time scale of ∼10^{10} years. Therefore, we need PH—the postulate of low-entropy boundary conditions at the Big Bang—to account for the universe having any thermodynamic arrow at all. A self-locating proposition is still crucial and hidden in the assumption of a young universe. Winsberg (2012) makes it explicit in what he calls the "Near Past Hypothesis" (NPH), which states that the present state of the universe lies in-between the low-entropy beginning and the time of first relaxation into equilibrium. Without the NPH—but assuming a future-eternal universe—we would essentially be back in a fluctuation scenario with all its Boltzmann-brain-like absurdities. That is because, after the universe first reaches equilibrium, there will still be infinitely many fluctuations into the macrostate we currently experience (and which includes all our evidence for a beginning of the universe about 13.8 billion years ago). And it would still be much more likely that we find ourselves in one of those fluctuations than on the initial upward slope originating in the low-entropy Big Bang (cf. Loewer (2020)).

7 For completeness, we could also discuss the option of a temporally finite universe with unbounded entropy, but this model does not seem to add much of interest.


indexical clause—stating that our present state is sufficiently high up the entropy curve—or characterize the state of minimal entropy, i.e., the Janus point, of the universe. (In the first case, it would locate the present moment within the history of an eternal universe; in the latter, it would locate the actual universe within the space of nomologically possible ones.) But it is not obvious why the Carroll model would lead to the conclusion that our current state is at or near the entropy minimum. And the issue actually belongs to our second question—how to make inferences about the past and future—to which I shall now turn.

Predictions and Retrodictions

The most straightforward response is the following method of statistical reasoning: Observe the current state of the universe (respectively a suitably isolated subsystem), restrict the relevant probability (more correctly, typicality) measure to the corresponding macro-region in phase space, and use the conditional measure to make inferences about the history of the system. I shall call this naive evidential reasoning, reviving a terminology introduced in an unpublished 2011 draft of Goldstein et al. (2016). The negative connotation is warranted because we know that, while this method works well for predictions—inferences about the future—it leads to absurd, if not self-refuting, conclusions when applied to retrodictions—i.e., inferences about the past.

11 Past Hypothesis and the Arrow of Time

205

Indeed, it follows from the Boltzmannian analysis that, in a system with a thermodynamic arrow, the evolution toward the future (the direction of entropy increase) looks like a typical evolution relative to any intermediate macrostate, while the actual microstate is necessarily atypical with respect to its evolution toward the entropic past. This is just the inverse of the familiar “paradox” that entropy increase in both time directions is typical relative to any non-equilibrium macrostate. In other words, if our theory tells us that the existence of a thermodynamic arrow is typical tout court, it also tells us not to expect typical behavior relative to the present macrostate in the time direction of decreasing entropy. In a universe with a thermodynamic arrow, a sensible method for making inferences about the past is not naive evidential reasoning but the following method of inference to the best explanation. Instead of asking what past state is typical given the present macrostate (or looking at the conditional probabilities .P(M0 | Mact ) for past states .M0 ), we should ask what past state would typically evolve into the present one, i.e., “bet” on macrostates .M0 that maximize .P(Mact | M0 ). If we find a dinosaur bone, we should infer a past state containing a dinosaur. If we find history books with information about Napoleon, we should infer a past state containing a French emperor by the name of Napoleon. In particular, considering the current macrostate of the universe as a whole, the fact that it has evolved from a lower-entropy state in the past is inferred, rather than assumed, by this kind of abductive reasoning. By now, it should be clear that the debate is not about whether the assertion of a low-entropy past is true but about whether it is an axiom. And the upshot of our discussion is that, if the existence of a thermodynamic arrow in the universe turns out to be typical, our lowentropy past can be reasonably inferred from empirical evidence and our best theory of nature (as any knowledge about our place in the history of the universe arguably should be). There is another way to make the case against naive evidential reasoning: Naive evidential reasoning applied toward both past and future will always lead to the conclusion that the current macrostate is the (local) entropy minimum. However, if we know that we observe a universe (or any other system) with a thermodynamic arrow, we also know that this

206

D. Lazarovici

conclusion would be wrong almost all the time. That is, it would be wrong unless we happened to observe a very special period in the history of the universe in which the universe is close to its entropy minimum. Goldstein, Tumulka, and Zanghì provide a mathematical analysis of this issue in the context of Carroll’s toy model of freely expanding particles (Goldstein et al., 2016). Their discussion shows that the two opposing ways of reasoning about the past—based on typical microstates relative to the current macrostate versus typical time periods relative to the history of an eternal universe characterized by a .U-shaped entropy curve– come down to different ways of regularizing the unbounded typicality measure by choosing an appropriate cut-off. Thus, while Goldstein et al. concur in dismissing the first option, a mathematical ambiguity remains as a result of the non-normalizable measure that Carroll’s solution to the hard problem of irreversibility has to assume.

The Mystery of Our Low-Entropy Universe Another objection to this solution goes as follows: Does the fact that the entropy of the universe could be arbitrarily high not make its lowentropy present, and even lower-entropy past, only more mysterious? In other words, does it not lead us to expect a Janus point entropy much higher than it actually was? To my mind, the Carroll model precludes any a priori expectation of what the entropy of the universe should be. If it can be arbitrarily (but not infinitely) high, any possible value could be considered mysteriously low by skeptics. This is a prototypical “Morgenbesser case” as discussed in Sect. 1.4: Why is the entropy of our universe so damn low? If it were any higher, you’d still be complaining! The possibility of arbitrarily high Janus point entropies thus makes its actual value, whatever it was, an acceptably brute fact. I admit, however, that divergent intuitions about this question are possible. And the somewhat paradoxical situation that any finite range of entropy values would be atypical is once again the result of the nonnormalizable phase space measure. I will have to leave it at that as far as the discussion of the Carroll model is concerned, exploring instead in an

11 Past Hypothesis and the Arrow of Time

207

upcoming section how the relationalist theory of Barbour et al. may be able to resolve the issue. But first, I owe the reader further evidence that our discussion has not been purely speculative since a .U-shaped entropy curve seems to be indeed a typical feature of Newtonian gravitating universes.

11.3 Entropy of a Classical Gravitating System Gravity is particularly relevant to understanding the low-entropy beginning of our universe. Penrose (1989) argues that “all the remarkable lowness of entropy that we find about us—and which provides this most puzzling aspect of the second law—must be attributed to the fact that vast amounts of entropy can be gained through the gravitational contraction of diffuse gas into stars” (p. 322). “The entropy might have been given to us as ‘low’ in many other different ways,” but it seems to be a nonequilibrium gravitational state that is the source of our thermodynamic arrow. Ultimately, this state has to be characterized in terms of general relativity (as Penrose tries to do with his Weyl curvature conjecture), if not a future theory of quantum gravity, but understanding the entropic history of our universe from the point of view of Newtonian gravity is a good start. There is a lot of confusion and controversy about the statistical mechanics of classical gravitating systems, despite the fact that statistical methods are commonly and successfully applied in areas of astrophysics that are essentially dealing with the Newtonian N -body problem (see, e.g., (Heggie & Hut, 2003)). (An excellent paper clearing up much of the confusion is Wallace (2010)); see Callender (2010) for some problematic aspects of the statistical mechanics of gravitating systems and Padmanabhan (1990) for a mathematical treatment.) Some examples of common claims are: (a) Boltzmann’s statistical mechanics is not applicable to systems in which gravity is the dominant force. (b) The Boltzmann entropy of a classical gravitating system is ill defined or infinite.

208

D. Lazarovici

(c) An entropy-increasing evolution for a gravitating system is exactly opposite to that of an ideal gas. While the tendency of the latter is to expand into a uniform configuration, the tendency of the former is to clump into one big cluster. I believe that the first two statements are false, while the third is an oversimplification. Rather than arguing against these claims in the abstract, I shall provide a demonstration of the contrary by proposing an analysis of a classical gravitating system in the framework of Boltzmann’s statistical mechanics (based on joint work with Paula Reichert in Lazarovici and Reichert (2020)). We start by looking at the naive calculation, along the lines of the standard textbook computation for an ideal gas, that finds the Boltzmann entropy of the classical gravitating system to be infinite (see, e.g., Kiessling (2001)). For N gravitating particles with mass m in a volume V , we have S(E, N , V ) := kB log|Г(E, N , V )| ⎡ ⎤     3N 3N 1 = kB log ⎣ 3N δ H − E d q d p⎦ , h N!

.

V N R3N

(11.1) with N



p2i Gm2 .H (q, p) = − 2m 1≤i 0, when compensated by a corresponding rescaling of time, .t → α 3/2 t. This is called dynamical similarity. However, the characteristic scale .σ (t) of our N -particle universe changes with time. Hence, if we eliminate this scale “by hand,” viz. by a time-dependent coordinate transformation q .q → , the resulting dynamics can be formulated on shape space but σ (t) will no longer have the standard Newtonian form. Instead, the dynamics become non-autonomous (time-dependent) with scale acting essentially like friction (Barbour et al., 2014). How to capture this “time-dependence” without reference to Newtonian time? Barbour et al. make use of the fact that the dilatational momentum .D = 21 I˙ is monotonically increasing (.I¨ > 0 by (11.8)) and can thus be used as a physical time parameter. We should note that D itself is not a scale-invariant quantity and thus not a bona fide relational “clock,” but this is a concession we have to make in order to formulate Newtonian dynamics on shape space. In any case, we now observe that .D = 0 marks the unique and hence global minimum of I , which characterizes the scale of a Newtonian configuration. This central time is thus the midpoint between a period of contraction and a period of expansion, or better (though this remains to be justified) the Janus point between two periods of expansion with respect to opposite arrows of time. The central time, .D = 0, also provides a natural reference point for parameterizing solutions of the shape space theory in terms of mid-point data on the shape phase space .T ∗ S.12

11 Precisely the point of Newton's famous bucket experiment and rotating spheres argument against Leibniz; see Maudlin (2012) for an excellent discussion that comes down on Newton's side and Barbour (2001) for a relationalist defense.
12 Mathematically, this is the cotangent bundle of shape space S, just as Hamiltonian phase space is the cotangent bundle of Newtonian configuration space.


This has been a very brief summary of profound ideas and non-trivial mathematics. But let’s recollect and note that, after all the steps described so far, we still have a Hamiltonian system—albeit a non-autonomous one—on an even-dimensional phase space for which there exists a canonical stationary measure. There is, however, one further redundancy from the relational point of view: The dynamical similarity mentioned above that allows for a simultaneous rescaling of D and corresponding shape momenta (the shape configuration variables are already scale invariant). Simply put, two solutions are considered physically identical if they correspond to the same curve in shape space, only run through at different “speeds.” Thus, factoring out the absolute magnitude of the shape momenta at central time, we reduce the phase space that parameterizes solutions by one further dimension. The resulting space .P T ∗ S (mathematically, the projective cotangent bundle of shape space S) is compact and will therefore have a finite total volume with respect to natural measures. That is despite the fact that it still describes an unconstrained system—we do not have to put the universe in a box, so to speak. And this is where the relational formulation, that is, the elimination of absolute degrees of freedom, really starts to pay off since a normalizable measure will allow for a statistical analysis that avoids the ambiguities we encountered in the Carroll model. Still, it comes at certain costs. First, the construction of the measure on .P T ∗ S is no longer canonical but involves the choice of a metric on shape space, which is used to remove the redundancy of dynamically similar solutions by fixing the norm of the shape momenta at the central time to 1.13 Second, the resulting measure is no longer stationary but distinguishes .D = 0 and the corresponding parameterization of solutions by mid-point data for typicality statements. (Although it is questionable whether stationarity is still a natural desideratum in a relational theory without an external time to synchronize different solution trajectories.)

13 Barbour et al. (2015) choose the (arguably) simplest scale-invariant metric, ds² = I(q)^{-1} Σ_{i=1}^N m_i dq_i², but other choices would be possible and lead to different typicality results.


In any case, when all is said and done, Barbour et al. have constructed a normalizable measure .με on .P T ∗ S, which they take to be the natural typicality measure for the gravitational theory on shape space. And while their construction seems compelling enough, the justification of this typicality measure (and typicality measures for relational mechanics, in general) remains a critical point that would require a more in-depth discussion than I am able to provide here.14 For now, let us accept the proposed typicality measure and see where it takes us.

11.4.1 Shape Complexity and a Gravitational Arrow

To describe the macro-evolution of a gravitating system on shape space, Barbour et al. introduce a dimensionless (scale-invariant) macro-variable C_S which they call shape complexity:

C_S = −V · √I.   (11.15)

Comparison with (11.6) and (11.7) (setting E = 0) shows a close relationship between shape complexity and the gravitational entropy (with respect to the macro-variables I and U = −V) that we computed on absolute phase space: S(E = 0, I, U) ∝ N log(√I · C_S). Recalling our qualitative discussion of this entropy—or noting that C_S ≈ R/r, where R is the largest and r the smallest inter-particle distance—we can see that low shape complexity corresponds to dense (on the scale of r) homogeneous states in absolute space, while high shape complexity indicates "structure"—dilute configurations of multiple clusters. Considering the simplest case of 3-particle shape space, the configuration of minimal shape complexity is the equilateral triangle, while the configuration of maximal shape complexity corresponds to "binary coincidences" in which the distance between two particles—relative to their distance to the third—is zero. This suggests that 3-particle configurations with high (but not maximal) shape complexity will contain a

14 Dürr et al. (2019) provide an insightful discussion of typicality measures on shape space, although focusing on the quantum case.


Fig. 11.3 Top: evolution of the shape complexity C_S found by numerical simulation for N = 1000 and Gaussian initial data. Bottom: schematic depiction (not found by numerical simulation) of three corresponding configurations on Newtonian spacetime. Source: Barbour et al. (2015)

Above, we discussed the typical evolution of −V · I and found it to be roughly parabolic or U-shaped. Analogously, one can conclude that the evolution of C_S = −V · √I will typically exhibit a V-shaped profile, with a global minimum at central time D = 0, from which it grows roughly linearly (modulo fluctuations) in both time directions (see Fig. 11.3). In the terminology of Barbour, Koslowski, and Mercati, this defines two opposite gravitational arrows of time with the Janus point as their common past. Note that these are not entropic arrows, although our previous discussion suggests that the evolution of the shape complexity on shape space will generally align with the evolution of the gravitational entropy (11.7) on absolute phase space. A remarkable feature of the relational theory is, however, that it reveals the origin of the gravitational arrow to be dynamical rather than statistical. The negative of the shape complexity corresponds to the potential that generates the gravitational dynamics on shape space.

11 Past Hypothesis and the Arrow of Time

221

There is thus a dynamical tendency toward higher values of C_S (lower values of the shape potential), and the configurations of maximal shape complexity act, in fact, as attractors.

Turning to the statistical analysis of the shape space theory, we are interested in determining typical values of C_S at the Janus point. To this end, we consider the measure assigned to mid-point data (Janus point configurations) with low shape complexity on the one hand,

    C_S ∈ [C_min, αC_min] := J_1,    (11.16)

and high shape complexity on the other,

    C_S ∈ (αC_min, ∞) := J_∞.    (11.17)

Here, 1 < α ≪ ∞ is some positive constant, and C_min is the smallest possible value of C_S. The key result of Barbour et al. (not rigorously proven but substantiated by the three-particle case as well as numerical experiments for large N) is that already for small values of α (more precisely, α < 2 for large N),

    μ_ε(J_∞) / μ_ε(PT*S) ≈ 0,    (11.18)

and consequently

    μ_ε(J_1) / μ_ε(PT*S) ≈ 1.    (11.19)

In other words, typical Janus point configurations have low shape complexity. This is a spectacular result. It means that it is typical for the universe to be in a very homogeneous state at the beginning of our macro-history (∼ Big Bang); one that looks like a very-low-entropy state from the absolutist point of view. Regardless of the philosophical merits of relationalism, the theory of Barbour, Koslowski, and Mercati thus comes with two great virtues.


First, it provides a sensible, normalizable measure of the set of possible micro-evolutions that still establishes an arrow of time as typical. Second, if one accepts this measure as a typicality measure, the theory resolves the two potential puzzles we were left with in the Carroll model: the "mysteriously" low entropy of our universe and the justification for locating our present state far away from the entropy minimum.

11.4.2 Entropy as an Absolutist Concept

Observing the close relationship between (11.15) and (11.7), we may wonder whether we could compute the Boltzmann entropy associated with the shape complexity or other scale-invariant macro-variables on absolute phase space. Interestingly, the answer is negative, and the reason is the following simple result.

Proposition 1 Let μ be a measure on R^n (equipped with its Borel sigma-algebra) which is homogeneous of degree d, i.e., μ(λA) = λ^d μ(A) for any measurable A ⊂ R^n and all λ > 0. Let F : R^n → R^m be a measurable function which is homogeneous of degree k, i.e., F(λx) = λ^k F(x), ∀x ∈ R^n. Then we have for any measurable value set J ⊂ R^m:

    μ({x | F(x) ∈ λ^k J}) = λ^d μ({x | F(x) ∈ J}).    (11.20)

Proof

    μ({x | F(x) ∈ λ^k J}) = μ({λx | F(x) ∈ J}) = λ^d μ({x | F(x) ∈ J}).

From this, we can immediately conclude

Corollary 1 If the measure μ is homogeneous of degree d ≠ 0 and F is homogeneous of degree 0 (i.e., scale-invariant), then

    μ(F^{-1}(J)) ∈ {0, +∞}.    (11.21)


Proof Applying (11.20) with k = 0 and d ≠ 0 yields μ(F^{-1}(J)) = λ^d μ(F^{-1}(J)) for any λ > 0. □

Hence, using a homogeneous phase space measure—such as the Liouville measure on Γ ≅ R^{6N}—macro-regions defined in terms of scale-invariant macro-variables must have measure zero or infinity, meaning that the corresponding Boltzmann entropy would be ill-defined. This suggests that the concept of entropy is intimately linked to absolute scales and thus not manifestly relational. Note, in particular, that expansion and heating—processes that are paradigmatic for entropy increase (especially, but not exclusively, in our analysis of gravitating systems)—require absolute scales of distance and velocity, respectively. This emphasizes once again that the gravitational arrow of Barbour et al. is not an entropic arrow, although it matches—maybe accidentally, maybe for reasons that remain to be better understood—the entropic arrow that we identified on absolute phase space with respect to the macro-variables I and U = −V. The result also poses an interesting question for the relationalist: either the concept of entropy is meaningful only for subsystems—for which the environment provides extrinsic scales—or one has to explain why the entropy of the universe is a useful and important concept despite the fact that it is related to degrees of freedom that are, strictly speaking, unphysical, corresponding to mere gauge in the shape space formalism. Barbour's view is decidedly the former, while I regard the entropy of the universe as a highly significant concept whose status in the relationalist theory deserves to be clarified.
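To spell out the application just made (my own gloss, with the dilation chosen for illustration): under D_α(q, p) = (αq, αp), the Liouville measure μ_L on Γ ≅ R^{6N} satisfies μ_L(D_α A) = α^{6N} μ_L(A), so it is homogeneous of degree d = 6N ≠ 0; the shape complexity, read as a function on phase space, satisfies C_S(αq) = (−α^{-1}V(q)) · (α√I(q)) = C_S(q), so it is homogeneous of degree 0. Corollary 1 then gives μ_L({C_S ∈ J}) ∈ {0, +∞} for every value set J, so a Boltzmann entropy ∝ log μ_L({C_S ∈ J}) cannot be defined.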

Conclusion: Can We Dispense with the Past Hypothesis?

Carroll's model of unbounded entropy and Barbour's concept of relational shape complexity show the prospect of explaining the arrow of time as a typical feature of our universe without postulating a low-entropy initial macrostate. That is, in other words, as typical tout court rather than just typical relative to atypical boundary conditions. Questions remain about both approaches, not least whether and how they can be generalized to dynamical theories beyond Newtonian gravity. But the Past Hypothesis


that long appeared both necessary and deeply puzzling (at least based on typicality reasoning) might turn out to have been just a temporary crutch.

References

Aaronson, S., Carroll, S. M., & Ouellette, L. (2014). Quantifying the rise and fall of complexity in closed systems: The coffee automaton. arXiv:1405.6903 [cond-mat, physics:gr-qc, physics:nlin].
Albert, D. Z. (2000). Time and chance. Cambridge, Massachusetts: Harvard University Press.
Albert, D. Z. (2015). After physics. Cambridge, Massachusetts: Harvard University Press.
Barbour, J. (2001). The discovery of dynamics. Oxford: Oxford University Press.
Barbour, J. (2003). Scale-invariant gravity: Particle dynamics. Classical and Quantum Gravity, 20, 1543–1570.
Barbour, J. (2017). Arrows of time in unconfined systems. In R. Renner & S. Stupar (Eds.), Time in physics. Tutorials, Schools, and Workshops in the Mathematical Sciences (pp. 17–26). Cham: Springer International Publishing.
Barbour, J. (2020). The Janus point: A new theory of time. New York: Basic Books.
Barbour, J., Koslowski, T., & Mercati, F. (2013). A gravitational origin of the arrows of time. arXiv:1310.5167 [astro-ph, physics:gr-qc].
Barbour, J., Koslowski, T., & Mercati, F. (2014). Identification of a gravitational arrow of time. Physical Review Letters, 113(18), 181101.
Barbour, J., Koslowski, T., & Mercati, F. (2015). Entropy and the typicality of universes. arXiv:1507.06498 [gr-qc].
Boltzmann, L. (1896). Vorlesungen über Gastheorie (Vol. 1). Leipzig: J. A. Barth.
Boltzmann, L. (1897). Zu Hrn. Zermelos Abhandlung 'Über die mechanische Erklärung irreversibler Vorgänge'. Annalen der Physik, 60, 392–398.
Callender, C. (2004a). Measures, explanations and the past: Should 'special' initial conditions be explained? The British Journal for the Philosophy of Science, 55(2), 195–217.
Callender, C. (2004b). There is no puzzle about the low-entropy past. In C. Hitchcock (Ed.), Contemporary debates in philosophy of science (pp. 240–255). Oxford: Blackwell.
Callender, C. (2010). The past hypothesis meets gravity. In G. Ernst & A. Hüttemann (Eds.), Time, chance, and reduction: Philosophical aspects of statistical mechanics (pp. 34–58). Cambridge: Cambridge University Press.
Carroll, S. (2010). From eternity to here. New York: Dutton.
Carroll, S. M. & Chen, J. (2004). Spontaneous inflation and the origin of the arrow of time. Manuscript arXiv:hep-th/0410270.
Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200–219.
Chen, E. K. & Goldstein, S. (2022). Governing without a fundamental direction of time: Minimal primitivism about laws of nature. In Y. Ben-Menahem (Ed.), Rethinking the concept of law of nature: Natural order in the light of contemporary science. Jerusalem Studies in Philosophy and History of Science (pp. 21–64). Cham: Springer International Publishing.
Dürr, D., Goldstein, S., & Zanghì, N. (2019). Quantum motion on shape space and the gauge dependent emergence of dynamics and probability in absolute space and time. Journal of Statistical Physics, 180, 92–134.
Feynman, R. (1967). The character of physical law. M.I.T. Press.
Goldstein, S. (2001). Boltzmann's approach to statistical mechanics. In J. Bricmont, D. Dürr, M. C. Galavotti, G. Ghirardi, F. Petruccione, & N. Zanghì (Eds.), Chance in physics: Foundations and perspectives (pp. 39–54). Berlin: Springer.
Goldstein, S., Tumulka, R., & Zanghì, N. (2016). Is the hypothesis about a low entropy initial state of the universe necessary for explaining the arrow of time? Physical Review D, 94(2), 023520.
Heggie, D. & Hut, P. (2003). The gravitational million-body problem. Cambridge University Press.
Kiessling, M. K.-H. (2001). How to implement Boltzmann's probabilistic ideas in a relativistic world? In J. Bricmont, G. Ghirardi, D. Dürr, F. Petruccione, M. C. Galavotti, & N. Zanghì (Eds.), Chance in physics: Foundations and perspectives. Lecture Notes in Physics (pp. 83–100). Berlin: Springer.
Lazarovici, D. & Reichert, P. (2020). Arrow(s) of time without a past hypothesis. In V. Allori (Ed.), Statistical mechanics and scientific explanation: Determinism, indeterminism and laws of nature. World Scientific.
Loewer, B. (2007). Counterfactuals and the second law. In H. Price & R. Corry (Eds.), Causation, physics, and the constitution of reality: Russell's republic revisited (pp. 293–326). Oxford: Oxford University Press.
Loewer, B. (2012). Two accounts of laws and time. Philosophical Studies, 160(1), 115–137.
Loewer, B. (2020). The Mentaculus vision. In V. Allori (Ed.), Statistical mechanics and scientific explanation: Determinism, indeterminism, and laws of nature. World Scientific.
Marchal, C. & Saari, D. G. (1976). On the final evolution of the n-body problem. Journal of Differential Equations, 20(1), 150–186.
Maudlin, T. (2007). The metaphysics within physics. Oxford: Oxford University Press.
Maudlin, T. (2012). Philosophy of physics: Space and time. Princeton: Princeton University Press.
Padmanabhan, T. (1990). Statistical mechanics of gravitating systems. Physics Reports, 188(5), 285–362.
Penrose, R. (1989). The emperor's new mind: Concerning computers, minds, and the laws of physics. Oxford: Oxford University Press.
Pollard, H. (1967). The behavior of gravitational systems. Journal of Mathematics and Mechanics, 17(6), 601–611.
Reichert, P. (2012). Can a parabolic-like evolution of the entropy of the universe provide the foundation for the second law of thermodynamics? Master's thesis, LMU, Munich.
Saari, D. G. (1971). Improbability of collisions in Newtonian gravitational systems. Transactions of the American Mathematical Society, 162, 267–271.
Wallace, D. (2010). Gravity, entropy, and cosmology: In search of clarity. The British Journal for the Philosophy of Science, 61(3), 513–540.
Winsberg, E. (2012). Bumps on the road to here (from eternity). Entropy, 14(3), 390–406.

12 Causality and the Arrow of Time

The law of causality, I believe, like much that passes muster among philosophers, is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm. — Bertrand Russell, On the notion of cause, 1912

While causal intuitions are integral to our experience of the world, corresponding relations between states or events are conspicuously absent in its micro-physical description. If the fundamental laws are bi-deterministic (they don't even have to be time-symmetric), the complete state of the universe at any moment in time¹ determines its complete state at any other time. We have, in other words, a symmetric relation of entailment or necessitation and should resist the temptation to superimpose causal notions in order to reify philosophical prejudices. The challenge is then to justify causal talk on the macroscopic level.

¹ Or, relativistically, on any Cauchy hypersurface of spacetime.


One may find it surprising that this is a task for statistical mechanics (broadly speaking), at least if one is used to thinking of statistical mechanics as dealing with "random" phenomena and of randomness as pretty much the opposite of causality. It is less surprising if one understands statistical mechanics as the framework for identifying typical phenomena.

A standard example of a causal relation is: "The ball hitting the window causes the window to break." However, it is not true that a ball (with this and that momentum) hitting a window (with such and such material properties) will necessarily break the window. There are certainly microscopic states realizing the given macrostate for which the glass will resist the impact. By the same token, it is not necessarily true that the window would not have broken had it not been hit by the ball. Even in the absence of macroscopic "backup causes" (like a stone hitting instead of the ball), special micro-configurations of the window alone could lead to it spontaneously bursting into a thousand pieces. What is very plausibly true is that typical microstates, coarse-graining to the window and the ball flying toward it, evolve into microstates coarse-graining to a splintered window with a ball on the other side of it. And that, without any impact on the window, it would have typically stayed intact.

With this in mind, I propose the following typicality analysis of causation. Suppose B is an atypical macro-event given background conditions C, i.e., Typ(¬B | C), where conditional typicality is understood in terms of a conditionalized typicality measure. We shall say that a macro-event A strongly causes B (under the "ceteris paribus" conditions C) if

    Typ(B | A ∩ C),    (12.1)

i.e., if A makes the occurrence of B typical. We shall say that A weakly causes B (under the "ceteris paribus" conditions C) if

    ¬Typ(¬B | A ∩ C),    (12.2)

i.e., if A makes the occurrence of B not atypical.


Strong causation is the sense in which kicking a ball at rest causes it to move. Weak causation is the sense in which a COVID infection causes the death of an otherwise healthy person.²

These causal relations are also defined for merely possible events. It is not presupposed that A and/or B actually occur. The events A, B, and C must be conceived as macrostates of one physical system (e.g., the system ball + window, rather than both separately). In general, A and B will refer to macrostates at different times, but no a priori assumption is made about their temporal order. It is not a premise of the analysis that a cause must precede its effect.

When A and B are actual events, both weak and strong causation incorporate a kind of Hume counterfactual, which is not if A had not happened, B would not have happened—the truth value of this counterfactual is underdetermined if A and B are characterized in terms of macroscopic variables—but if A had not happened, B would typically not have happened.

It would also make sense to consider the case that A makes a state B typical that is otherwise neither typical nor atypical.³ This would be the sense in which cheating on a decisive die roll causes it to land on six. It supports only the weaker counterfactual, if A had not happened, B happening would not have been typical. But such examples are less relevant to our further discussion, and I shall refrain from introducing a third type of causation.

What does not make sense is to speak of causing an event that will typically occur anyway. For instance, given the current state of our solar system, it is typical that the sun will rise tomorrow. If I perform a ritual dance to summon the sun, its rising is still typical, but my dance would not qualify as a cause.

² In many cases, weak causes will correspond to what one might intuitively think of as "partial causes," but if A ∩ A′ strongly causes B, it does not follow that either A or A′ weakly causes B.

³ That is, ¬Typ(B | C) and ¬Typ(¬B | C) but Typ(B | A ∩ C).


Some remarks on inference rules, or rather their possible failure (ceteris paribus conditions omitted):

• Typ(B | A) and Typ(E | B) does not imply Typ(E | A). Transitivity fails when the instances of A that typically lead to B are among the exceptional instances of B that fail to realize E. For example, a fire in my home will typically result in my smoke detector going off. Setting fire to my smoke detector will typically result in a fire in my home. But setting fire to my smoke detector does not typically result in the smoke detector going off. (A toy numerical illustration follows after this list.)

• Typ(E | A) and Typ(E | B) does not imply Typ(E | A ∩ B). For instance, an overdose of blood-pressure-lowering medication may cause death, and an overdose of blood-pressure-increasing medication may cause death, whereas an overdose of both will not.

• Typ(E | A) and Typ(E | B) implies ¬Typ(¬E | A ∪ B) but not Typ(E | A ∪ B).⁴ In other words, if A and B strongly cause E, then A ∪ B causes E at least weakly. To understand why A ∪ A′ might fail to be a strong cause, note that A ∪ A′ is equivalent to A ∪ (A′ \ A). But the microstates realizing A′ and E might include many that realize A, as well. To stay with the morbid examples: A plane crash might typically result in death, and severe physical injury might typically result in death, but severe physical injury OR a plane crash without severe injury does not typically result in death.

⁴ Proof: Let μ be the typicality measure and w.l.o.g. μ(A) ≥ μ(B):

    μ(E | A ∪ B) = μ(E ∩ (A ∪ B)) / μ(A ∪ B) = [μ(E ∩ A) + μ(E ∩ B) − μ(E ∩ A ∩ B)] / μ(A ∪ B)
                 ≥ [μ(E ∩ A) + μ(E ∩ B) − μ(E ∩ B)] / [μ(A) + μ(B)] ≥ μ(E ∩ A) / (2μ(A)) = (1/2) μ(E | A).


• Typ(E | A), Typ(E | B) ⇒ Typ(E | A ∪ B) is valid if E ∩ A ∩ B = ∅, that is, intuitively speaking, if A and B bring about E in distinct ways.⁵
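To make the first of these failure modes (the transitivity failure) concrete, here is a minimal sketch of my own in Python, using a finite counting measure with invented set sizes; Typ is read as "conditional measure at least 1 − ε," with ε = 0.05 chosen arbitrarily for illustration.

```python
from fractions import Fraction

# Toy counting-measure model of the smoke-detector example (all set sizes invented).
# B ("fire in my home"): 100,000 microstates, 99,000 of which realize E ("detector goes off").
# A ("setting fire to the smoke detector"): 100 microstates; 95 of them lie in B, but all 95
# sit among B's exceptional, E-failing states (the fire starts at the detector); of A's
# remaining 5 states, only 2 realize E.
B, E_and_B = 100_000, 99_000
A, A_and_B, E_and_A = 100, 95, 2

eps = Fraction(1, 20)                               # typicality threshold 1 - eps (chosen arbitrarily)
typ = lambda cond_measure: cond_measure >= 1 - eps  # Typ(X | Y) as a threshold on the conditional measure

print(typ(Fraction(A_and_B, A)))   # Typ(B | A): 95/100  -> True
print(typ(Fraction(E_and_B, B)))   # Typ(E | B): 99/100  -> True
print(typ(Fraction(E_and_A, A)))   # Typ(E | A): 2/100   -> False (transitivity fails)
```

The instances of A that typically lead to B are deliberately placed among B's exceptional states, which is exactly the mechanism described in the first bullet point.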

12.1 Causal Explanations as Typicality Explanations

According to the proposed analysis, causal relations are not fundamental and not instantiated by microscopic interactions. Given the state of the world at any time t, the fundamental laws determine the state of the world at any other time, sooner or later. If the laws are time symmetric, there is an additional sense in which they make no difference between "past" and "future." Not only can we infer the microstate at t_1 from the microstate at t_2 (as much as the other way around), but the dynamics evolving one into the other has the same mathematical form.

Causal relations, on the other hand, are understood in terms of typical evolutions between macrostates. And these relations will manifest themselves asymmetrically in systems which—like our universe—have a thermodynamic arrow of time. We already discussed how this follows from Boltzmann's statistical mechanics: If we consider the history of a system with an entropic gradient, the evolution toward the entropic past (lower entropy) will always be atypical relative to any macrostate that the system passes through. Only in the direction of entropy increase can we expect to see an evolution that is typical relative to the "present" macrostate (see Fig. 12.1).

⁵ Since μ(E ∩ A ∩ B) = 0, we then have for any measure μ:

    μ(E | A ∪ B) = μ(E ∩ (A ∪ B)) / μ(A ∪ B) = [μ(E ∩ A) + μ(E ∩ B)] / μ(A ∪ B)
                 ≥ [μ(E | A)μ(A) + μ(E | B)μ(B)] / [μ(A) + μ(B)]
                 ≥ min{μ(E | A), μ(E | B)} · [μ(A) + μ(B)] / [μ(A) + μ(B)] = min{μ(E | A), μ(E | B)}.



Fig. 12.1 Typical microstates in the intermediate macro-region M_1 = M_act evolve into a higher-entropy region M_2 in both time directions. Only a small subset of microstates (light gray area) have evolved from the lower-entropy state M_0 in the past; an equally small subset (shaded area) will evolve into M_0 in the future. The actual microstate (cross) has evolved from the lower-entropy state in the past; only its future time evolution corresponds to the typical one relative to the macrostate M_act. For simplicity, the diagram assumes that macrostates are invariant under the time-reversal transformation ((q, p) → (q, −p) in classical mechanics)

Thus, in a universe with a thermodynamic arrow, causal relations in the sense of (12.1) will generally only be instantiated between past causes and future effects (with respect to the thermodynamic arrow), and causal inferences of the form

    A, Typ(B | A) ⇝ B    (12.3)

will only be successful for predictions, i.e., when B lies in the entropic future of A. I emphasize again that this is not a premise but a conclusion of the analysis. Of course, in the ordinary way of speaking, it is possible to cause a lower-entropy state, e.g., when a freezer causes water to turn into an ice cube. However, to apply (12.1), we have to look at the larger picture: A room containing a freezer with water typically evolves into a room with a freezer, an ice cube, and a slightly increased temperature—the latter being


a state of higher entropy. The room may be considered as a closed system for the purpose of the analysis, but the system refrigerator + water alone may not. The emission of heat into the environment must be taken into account to see the thermodynamic arrow.

Since in a universe with a thermodynamic arrow only the macro-evolution into the future will be typical relative to the present macrostate, a sensible way of making retrodictions, i.e., inferences about the past, is not (12.3) (from a present "cause" to a past "effect") but abductive reasoning, by which I mean the following method of "inference to the best explanation":

    B, Typ(¬B), Typ(B | A) ⇝ A    (12.4)

In other words, rather than asking what past state (or states) would be typical given the present macrostate, we should ask what past state (or states) would make our present macrostate typical. Comparison with (12.1) shows that this corresponds to inferring a past cause for a present event. I am not able to provide a complete analysis of what makes an explanation good or best, but the following principles of parsimony seem like reasonable criteria:

(i) If A and A′ are possible explanations for B with A′ ⊊ A, then we should infer A rather than A′. In other words, we should not commit to an explanation that is more specific than necessary.

(ii) Assuming constant background conditions C, if both A and A′ are possible explanations for B but Typ(A | C) while ¬Typ(A′ | C) (A is typical given C but A′ is not) or Typ(¬A′ | C) while ¬Typ(¬A | C) (A′ is atypical given C but A is not), then A is preferable to A′.

This is the point where, when we see hoof prints in the ground, we infer a horse rather than a unicorn (based on known zoological facts C). Analogously to (12.4), we can make abductive inferences about weak causes in the sense of (12.2):

    B, Typ(¬B), ¬Typ(¬B | A) ⇝ A    (12.5)


Other things being equal, a strong cause is a better explanation than a weak one:

(iii) If Typ(B | A) while ¬Typ(¬B | A′) but also ¬Typ(B | A′), we should infer A rather than A′ (other things being equal).

However, this preference can pull in opposite directions to both (i) and (ii) above. I suggest that (iii) generally trumps (i) but is trumped by (ii). That is, a typical (or not atypical) weak cause is a better explanation than a non-typical (or atypical) strong cause, but a more specific strong cause is preferable to a less specific weak one if both are typical/non-typical/atypical given the relevant background assumptions.

To summarize, causal relations are defined between macrostates and in terms of typicality. In particular, causal inferences and explanations are a form of typicality inference and explanation—notably based on conditional typicality. Features of the world that are typical tout court, i.e., with respect to all nomologically possible worlds, not just given some other contingent state of affairs, are explained in a more profound and conclusive sense by the laws of nature alone.

12.2 Causal and Epistemic Asymmetry

Since the thermodynamic asymmetry corresponds, so to speak, to an asymmetry of typicality, one might worry that we committed what Price (1996) calls a "temporal double standard" in accounting for the thermodynamic arrow in the first place. This account was based on the fact that a thermodynamic evolution of the universe is typical relative to a low-entropy initial macrostate. But the same evolution (just viewed in reverse) is atypical with respect to a future (final?) macrostate. And have I not insisted that explanations based on atypicality are unacceptable, that atypical facts generally cry out for further explanation? Indeed I have, but the atypical evolution of the universe toward its entropic past is explained by the low-entropy boundary condition, i.e., the Past Hypothesis. There is not much more to be said here, unless one is unhappy with this explanation. And there are reasons to be unhappy, since a low-entropy initial condition is itself atypical relative to the total phase space of the


theory. In Chap. 11, we thus discussed the prospect of making do without the assumption of a special initial state and establishing the existence of a thermodynamic arrow as typical tout court. Then, it would also be typical (tout court) that the universe's macro-evolution toward the entropic past looks atypical (relative to any intermediate macrostate), and we could be truly satisfied.

In any case, it should be emphasized that, if one holds a reductive view about the direction of time, the direction of the lower-entropy states is not a priori identified as the past. The aim is instead to reduce what we experience as the difference between past and future to the thermodynamic asymmetry and/or the asymmetric boundary conditions. An opposite point of view is advanced by Tim Maudlin (2007), who regards the direction and, more precisely, the passage of time as metaphysically primitive. For Maudlin, it lies in the nature of time and laws that explanations go from past or initial to future or final states.

    So we have the following situation: if the asymmetrical treatment of the 'initial' and 'final' boundary conditions of the universe is a reflection of the fact that time passes from the initial to the final, then the entropy gradient, instead of explaining the direction of time, is explained by it. […] If we are to maintain that typicality arguments have any explanatory force –and it is very hard to see how we can do without them– then there must be some account of why they work only in one temporal direction. Why are microstates, except at the initial time, always atypical with respect to backward temporal evolution? And it seems to me that we have such an explanation: these other microstates are products of a certain evolution, an evolution guaranteed (given how it started) to produce exactly this sort of atypicality. This sort of explanation requires that there be a fact about which states produce which. That is provided by a direction of time: earlier states produce later ones. Absent such a direction, there is no account of one global state being a cause and another an effect, and so no account of which evolutions from states should be expected to be atypical and typical in which directions. If one only gets the direction of causation from the distribution of matter in the spacetime, but needs the direction of causation to distinguish when appeals to typicality are and are not acceptable, then I don't see how one could appeal to typicality considerations to explain the distribution of matter, which is what we want to do. (pp. 131–134)


I agree with Maudlin's profound observation of an intimate connection between the asymmetries of entropy, typicality, and causation but disagree on questions of priority. In particular, I do not think there is any physical sense in which a microstate X(t_0) produces X(t_+) but not X(t_−) (for t_+ > t_0 > t_−). There is a sense in which the macrostate M(t_0) produces M(t_+) but not M(t_−), namely that

    Typ(M(t_+) | M(t_0))  but  Typ(¬M(t_−) | M(t_0))    (12.6)

because of the entropic arrow. I believe that this macroscopic asymmetry is sufficient for capturing our experience of a causal asymmetry, but extrapolating it to the fundamental (microscopic) level is unwarranted. At least, I want to follow the reductive program for the direction of time and see how far we get. My starting point will be the typicality-based methods of causal and abductive inferences that we found appropriate for predictions and retrodictions, respectively. (That is, præ and retro with respect to the entropic arrow.) To what extent can this asymmetry of inferences account for our experience of an arrow of time?

The question, to be clear, does not concern temporal qualia⁶ or the phenomenology of passage. The challenge we can address without having to tackle the hard problem of consciousness is to identify a physical basis for temporal phenomena that are characteristic of the perceived distinctions between future and past. There is a sense in which the mind will be involved with its capacity for theoretical reasoning, but it won't be part of the physical story. Several authors (e.g., Albert (2000); Callender (2016); Price (1996); Reichenbach (1956)) have highlighted two phenomena, in particular, that must be accounted for:

1. We have records (including memories) of the past but not of the future.
2. We can influence the future but not the past—or have, at least, the very strong impression that this is so.

⁶ On this issue, see Farr (2020); Ismael (2011).


I want to show that the entropic arrow, via the corresponding modes of typicality inferences toward the entropic past and future, provides a plausible physical basis for these “psychological” arrows of time.

12.2.1 Asymmetry of Records

We find a dinosaur bone in the ground⁷ and conclude that a past macrostate that would have typically evolved into the present state containing a dinosaur bone is a state containing a dinosaur. The bone is thus a record of a dinosaur in the past, based on abductive reasoning in the sense of (12.4). The usual objection that a state containing a dinosaur is much more unlikely than a random fluctuation producing a bone has no basis in our analysis, not least because it never associated the measure of a macro-region with an intrinsic probability. We can also predict, for the distant future, that the bone will further decay (as almost everything else), but this seems less exciting than a dinosaur.

In some cases, however, the idea that a system's present state tells us more about its past than about its future might indeed be unwarranted. A half-melted ice cube is arguably as much a "record" of an entire ice cube in the past as of a puddle of water in the future. Why do we find fossil records from animals that have existed in the past but not from animals that will exist in the future? Because dying and decaying is an entropy-increasing, i.e., thermodynamically irreversible process that typically occurs in one time direction only.

Remark 6 I have to emphasize that, when talking about inferences to past or future macrostates, I am not referring to a rigorous deductive procedure. The claim is that an inference from a present bone to a past (but not a future) dinosaur is physically justified—based on typicality reasoning and the entropic arrow—not that it is compelled by logic. Note, in particular, that what candidate states we consider in the first place depends on the pertinent macro-variables or predicates that carve them out.

⁷ This example is adapted from Feynman (1967, Chap. 5).


And there is nothing in the fundamental laws per se that tells us to partition phase space into dinosaur and non-dinosaur states. As I said before, it is an additional part of the scientific enterprise—in this case, paleontology—to devise a language that helps us make sense of the world. In principle, we could always evolve a macro-region B back in time to find Typ(B | Φ_{−t}B). But Φ_{−t}B is a miscellaneous collection of microstates spread across phase space that does not correspond to any meaningful or empirically discernible macro-event.

The dinosaur example is somewhat special in that it can be conceived in terms of the thermodynamic history of the dinosaur alone. In contrast, many records are interesting precisely because they tell us something about interactions with other systems. When we expose a photographic film to take a picture, its final record state has lower entropy than its initial, unexposed state. Still, if we find a photograph and ask what past state would have typically produced it, we can infer a film in a camera being exposed to light reflected from the scenery depicted in the image. There is no autonomous evolution of the photograph—at least none I can think of—that would make its current state typical. Hence, we infer that it was part of a larger system in which it interacted and from which it branched off in the past.

In the first instance, when we observe a subsystem B at time t_0, we don't know with which other systems, if any, it has interacted, or will interact, at other times. However, we infer past interactions under the constraint that they should typically produce the present state of B, while no such constraint is justified for future interactions. Someone could come and throw the photograph into a fire. The evolution of a heap of ash into a photograph is atypical. Hence, we can conclude that the photograph has not been burned to ash at any time t < t_0, but not that it won't be burned at some time t > t_0 (because, to repeat, the thermodynamic asymmetry implies that macro-evolutions toward the entropic past are atypical relative to the macrostate at t).

I submit that human memory—whatever its exact neurophysiological basis—must work according to this abductive principle. To be reliable, the event recalled from (or represented in) a particular brain state B (a macrostate) must be an event that makes this brain state typical. Given the entropic arrow, a past interaction with another system A might make my


current brain state typical (relative to that past state of A ⊕ B), but future interactions generally won't. Thus, if memories are "produced" by external events, they must be events in the entropic past of the corresponding brain state.

The analysis just sketched is different from, but not incompatible with, that of David Albert's seminal Time and Chance (2000) (see also Loewer (2012) and Albert (2015)). Albert argues that a record is not one instantaneous state from which we infer a state in the past but two diachronic states—a ready state and a record state—from which we infer events occurring in between. If we consider a frictionless billiard game and know that the black ball had momentum p_0 at time t_0 and momentum p_1 ≠ p_0 at t_1 > t_0, we can conclude that it experienced a collision at some time between t_0 and t_1 (Albert, 2015, p. 37 ff.). There is no entropic arrow in this idealized (frictionless) example. However, when we observe the state p_0, we do not know the ball's momentum at t_1. When we observe the state p_1, we remember p_0 at t_0 and can conclude that the ball must have collided in the meantime. What makes p_0 the "ready state" and p_1 the "record state" is that we possess information about the former by the time we observe the latter. And we use this information to make inferences to events preceding the record state.

At least in such cases, Albert's analysis strikes me as spot-on. But how does it fare as an account of the epistemic asymmetry more broadly? An obvious concern is that Albert's argument involves a circularity: Our records are records of the past and not of the future because we know more about the past than about the future. And we know more about the past than about the future because we have records of the past but not of the future. What breaks the circle, according to Albert, is the Past Hypothesis, which he calls "the mother …of all ready conditions" (Albert, 2000, p. 118). But it is not at all clear how cosmological boundary conditions, which are so far removed and of which we know very little, are supposed to figure into an inference about billiard balls. Albert's response is that "some crude, foggy, partly unconscious, radically incomplete, but nonetheless perfectly serviceable acquaintance with the consequence of the past hypothesis and the statistical postulate and the microscopic equations of motion will very plausibly have been hard-wired into the


cognitive apparatus of any well-adapted biological species" (Albert, 2015, p. 39). Plausible maybe, but the explanation itself remains foggy and incomplete.

I believe that the typicality analysis brings us closer to an explanation of why biological species adapted to the thermodynamic asymmetry would remember the past and not the future. Modulo terminological differences, it can be viewed as complementary to the story of Albert's Time and Chance. Some macrostates, including brain states, hold information about the past via abductive inferences grounded in the thermodynamic asymmetry: They point to past conditions that would have typically produced them. Some of those, in turn, will play the role of "ready conditions" in the sense of Albert's analysis (including in situations where entropic considerations are negligible), whence knowledge about the past grounds further knowledge about the past.

12.2.2 Asymmetry of Influences

Let us now consider the asymmetry of influences that is equally important to how we experience the direction of time. We have at least an illusion of agential control over a limited number of physical degrees of freedom, first and foremost over our body and then, by extension, our immediate surroundings with which we can physically interact (cf. Loewer (2020b)).⁸ As limited as this (perceived) control may be, it is enough to make a difference between macro-configurations whose typical evolutions are strikingly divergent. Just think of David Lewis's example of President Nixon deciding whether or not to push the atomic button (Lewis, 1979). A slight movement of the finger may cause—or not cause—a nuclear war.

⁸ I note for the record that I am a compatibilist about free will, but nothing in the following discussion hinges on that.


I have already explained why, in systems with a thermodynamic arrow, causal relations manifest asymmetrically, with causes lying in the entropic past of their effects. Our causal intuitions—in particular, the intuition that our choices make a difference in the world—are based on situations in which a macrostate M_0 (e.g., a push of the launch button of a missile system) typically and actually evolves into a macrostate M_1 (e.g., the launch of a missile). Toward the entropic past, however, the actual evolution is atypical relative to M_0, the macrostate manifesting the agent's choice. And an atypical macro-evolution can look arbitrarily strange, conspiratorial, and unpredictable, like a broken vase spontaneously reassembling into a whole. This is one reason why we don't perceive the same kind of regular conjunctions between "present" choices and preceding events.

Notably, nothing about this argument undercuts a counterfactual like "if Nixon had pressed the button, then the initial microstate of the universe would have been different." If the laws are deterministic, this counterfactual is true, period. It just isn't constitutive of a causal influence in the sense of (12.1), which defines them in terms of typical macro-histories.

As we have seen, the thermodynamic asymmetry then justifies two different modes of inference from present to future and past macrostates, respectively. On the one hand, we ask: What future macro-history would have typically resulted from Nixon pushing or not pushing the button? On the other hand, we ask: What past macro-history would have typically resulted in Nixon pushing or not pushing the button? In both time directions, the inferred states can and usually will depend on the agent's choice. If we think of retrodiction as an "inference to the best explanation," it seems obvious that a different explanandum (present state) will generally require a different explanans (past cause), but one might worry that the terminology is doing too much work in dismissing this counterfactual dependence as unremarkable.

An important difference between the two modes of inference is that macrostates typically evolving into Nixon pushing or not pushing the button must include the state of Nixon's brain—to which the agent himself has no direct epistemic access. We can introspect about the reasons for our decisions but rarely about their physical causes. It is then an observation of psychological rather than metaphysical significance that the macrostates which are causally related to our choices (in the sense of a typical evolution from one into the other) are largely external when they lie in the entropic future but largely internal—involving, in particular,


our brain states—when they lie in the entropic past. This at least begins to explain why a causal inference of the form (12.1) "feels" so different from an abductive inference of the form (12.4); why the former rather than the latter would be associated with the impression—and be it only a stubborn illusion—of having an impact on the world.

Albert (2000, 2015) and Loewer (2007, 2012, 2020a) argue that the possible macro-histories are more constrained toward the past because they must converge in the Past Hypothesis macro-region. This is why the counterfactual dependence of the past on present choices is more limited and the future appears more open. Again, their argument is neither part of my analysis nor contradicted by it. It might well provide an additional piece to the puzzle, although I hesitate to assign such a prominent role to the Past Hypothesis because of my hopes, articulated in the last chapter, that we might be able to get rid of it. In any case, the account of Albert and Loewer reduces, not the causal to the thermodynamic asymmetry but both to the Past Hypothesis. And since they interpret the Past Hypothesis as a law of nature, there is a sense in which the asymmetry of influences gets more nomological oomph⁹ compared to my analysis in which it is one or two steps further removed from the fundamental laws. I am not unhappy about this aspect since I have long been convinced that causation is only an effective and somewhat anthropomorphic concept whose physical basis goes only so deep.

⁹ To the extent that Humean laws have any.

References

Albert, D. Z. (2000). Time and chance. Cambridge, Massachusetts: Harvard University Press.
Albert, D. Z. (2015). After physics. Cambridge, Massachusetts: Harvard University Press.
Callender, C. (2016). Thermodynamic asymmetry in time. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (winter 2016 ed.). Metaphysics Research Lab, Stanford University.
Farr, M. (2020). Explaining temporal qualia. European Journal for Philosophy of Science, 10(1), 8.
Feynman, R. (1967). The character of physical law. M.I.T. Press.
Ismael, J. (2011). Temporal experience. In C. Callender (Ed.), The Oxford handbook of philosophy of time. Oxford University Press.
Lewis, D. (1979). Counterfactual dependence and time's arrow. Noûs, 13(4), 455–476.
Loewer, B. (2007). Counterfactuals and the second law. In H. Price & R. Corry (Eds.), Causation, physics, and the constitution of reality: Russell's republic revisited (pp. 293–326). Oxford: Oxford University Press.
Loewer, B. (2012). Two accounts of laws and time. Philosophical Studies, 160(1), 115–137.
Loewer, B. (2020a). The consequence argument meets the Mentaculus. Preprint: http://philsci-archive.pitt.edu/17328/.
Loewer, B. (2020b). The Mentaculus vision. In V. Allori (Ed.), Statistical mechanics and scientific explanation: Determinism, indeterminism, and laws of nature. World Scientific.
Maudlin, T. (2007). The metaphysics within physics. Oxford: Oxford University Press.
Price, H. (1996). Time's arrow and Archimedes' point: New directions for the physics of time. Oxford: Oxford University Press.
Reichenbach, H. (1956). The direction of time. Mineola, New York: Dover Publications.
Russell, B. (1912). On the notion of cause. Proceedings of the Aristotelian Society, 13, 1–26.

13 Quantum Mechanics

Assuming the success of efforts to accomplish a complete physical description, the statistical quantum theory would, within the framework of future physics, take an approximately analogous position to the statistical mechanics within the framework of classical mechanics. — Albert Einstein in (Schilpp, 1949, p. 672)

Our discussions have been based almost exclusively on deterministic laws. Some might object that this is not very naturalistic. Aren't our best physical theories, viz. quantum theories, indeterministic? The short answer is that they most likely are not. While it is folklore that intrinsic randomness takes hold in quantum mechanics, the claim does not stand up well to scrutiny. Standard quantum mechanics involves only one precise dynamical equation—the Schrödinger equation describing the time evolution of the wave function—and this equation is perfectly deterministic. Randomness comes in only with the infamous collapse postulate according to which the "measurement" of an "observable" produces one of the possible outcomes with probabilities given by the Born rule (related to the famous |ψ|² probability distribution). But what distinguishes "measurements" from other physical interactions? What are


the precise physical conditions under which the Schrödinger evolution gets suspended in favor of the probabilistic state reduction? The collapse postulate is hopelessly vague, as no one pointed out more clearly than John Bell (despite earlier complaints by Einstein and others).

    It would seem that the theory is exclusively concerned about 'results of measurement', and has nothing to say about anything else. What exactly qualifies some physical systems to play the role of 'measurer'? Was the wavefunction of the world waiting to jump [collapse] for thousands of millions of years until a single-celled living creature appeared? Or did it have to wait a little longer, for some better qualified system …with a Ph.D.? If the theory is to apply to anything but highly idealised laboratory operations, are we not obliged to admit that more or less 'measurement-like' processes are going on more or less all the time, more or less everywhere? Do we not have jumping then all the time? (Bell, 2004, p. 216)

13.1 The Measurement Problem

To understand what is wrong with textbook quantum mechanics, we have to speak about the measurement problem. The measurement problem is essentially Schrödinger's cat paradox, but its most rigorous formulation is due to Maudlin (1995). It is the logical inconsistency of the following three premises:

(1) The wave function ψ of a system provides a complete description of its physical state.
(2) The time evolution of the wave function always follows a linear Schrödinger equation of the form iℏ ∂_t ψ = Ĥψ.
(3) Measurements usually have unique outcomes (though not always the same when the measured systems are described by the same wave function).

It is not hard to see that these three premises are mutually inconsistent, for instance, as follows. Suppose a system is described by a linear combination of wave functions ϕ_1 and ϕ_2, and a measurement apparatus can display either "ϕ_1" or "ϕ_2" after interacting with the system. This apparatus is also a physical system described by a wave function.


After all, we conceive of the apparatus as consisting of atoms and molecules (or more elementary particles), and if those are all described by a wave function, then this wave function must also describe the apparatus as a whole. This means that our measurement device has states Φ_1 and Φ_2—pointer positions "1" and "2" corresponding to wave packets with disjoint support in configuration space—and a ready state Φ_0, such that

    ϕ_i Φ_0  ⟶  ϕ_i Φ_i   (Schrödinger evolution).    (13.1)

The Schrödinger time evolution (13.1), however, is linear. Therefore, a system wave function

    ϕ = c_1 ϕ_1 + c_2 ϕ_2,   c_1, c_2 ∈ ℂ,   |c_1|² + |c_2|² = 1,    (13.2)

evolves into

    ϕ Φ_0 = (c_1 ϕ_1 + c_2 ϕ_2) Φ_0  ⟶  c_1 ϕ_1 Φ_1 + c_2 ϕ_2 Φ_2   (Schrödinger evolution).    (13.3)

The superposition

    c_1 ϕ_1 Φ_1 + c_2 ϕ_2 Φ_2    (13.4)

describes an entangled state between the measured system and the measurement device in which the pointer position seems to be "1" and "2" at the same time. "Pointer" here is just a stand-in for whatever indicates the result of the measurement—a cat being dead or alive, a detector clicking or not clicking, etc. Strictly speaking, the mathematics alone doesn't tell us what to make of such a superposed wave function (e.g., that the sum in (13.4) should be read as a logical and). But if (13.4) is supposed to provide a complete description of the total system, it cannot, on some occasions, describe a state in which the measurement device indicates the outcome "1" and, on others, a state in which it indicates the outcome "2".
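Here is a minimal numerical sketch of the linearity argument. The two-level system, the three-level "apparatus," and the particular unitary U are my own illustrative choices, not taken from the text.

```python
import numpy as np

def basis(dim, k):
    # Return the k-th standard basis vector of C^dim.
    e = np.zeros(dim, dtype=complex)
    e[k] = 1.0
    return e

phi = [basis(2, 0), basis(2, 1)]               # system states, playing the roles of phi_1, phi_2
Phi = [basis(3, 0), basis(3, 1), basis(3, 2)]  # apparatus: Phi_0 (ready), Phi_1 and Phi_2 (pointer states)

# A unitary "premeasurement" interaction realizing phi_i Phi_0 -> phi_i Phi_i, cf. (13.1):
# a permutation of the six product basis states (hence unitary and, crucially, linear).
U = np.eye(6, dtype=complex)
U[[0, 1]] = U[[1, 0]]    # swaps phi_1 Phi_0 <-> phi_1 Phi_1
U[[3, 5]] = U[[5, 3]]    # swaps phi_2 Phi_0 <-> phi_2 Phi_2

for i in (0, 1):         # check (13.1) on the two inputs phi_1 Phi_0 and phi_2 Phi_0
    assert np.allclose(U @ np.kron(phi[i], Phi[0]), np.kron(phi[i], Phi[i + 1]))

# Feeding in the superposition (13.2), linearity alone yields the entangled state (13.4).
c1, c2 = 1 / np.sqrt(2), 1j / np.sqrt(2)
final = U @ np.kron(c1 * phi[0] + c2 * phi[1], Phi[0])
expected = c1 * np.kron(phi[0], Phi[1]) + c2 * np.kron(phi[1], Phi[2])
assert np.allclose(final, expected)
print("Linearity forces the entangled superposition c1*phi_1 Phi_1 + c2*phi_2 Phi_2")
```

Any linear evolution satisfying (13.1) for both inputs is forced into this entangled output, which is exactly the point of (13.3).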


Hence, if we insist that a measurement of (13.2) has a unique outcome (though not always the same), we must conclude that either the state description provided by the wave function is incomplete or the linear Schrödinger evolution of the wave function is not universally valid. Since assumptions (1)–(3) lead to a contradiction, any consistent formulation of quantum mechanics has to reject at least one of them.

1. If we reject assumption (1), i.e., that the wave function of a system provides a complete description of its physical state, we have to specify additional variables that complete the state description. The most natural way to do so is realized in a quantum theory called Bohmian mechanics or de Broglie–Bohm theory. Bohmian mechanics postulates an ontology of point particles so that the state of a system is given by its actual particle configuration in addition to its wave function. For an N-particle system, this is a pair (ψ, Q), with Q ∈ ℝ^{3N} the configuration of particles in physical space. The role of the wave function ψ is first and foremost to guide the motion of the particles. This is expressed by a precise mathematical law in which the wave function enters. The measurement problem is solved because every system has a well-defined spatial configuration at all times, given by the positions of its constituent particles. The wave function of a measurement device may be in a superposition (13.4), but the actual configuration Q describes a pointer pointing either to the left or to the right.

2. If we reject assumption (2), i.e., that the evolution of the wave function is described by a linear equation, we have to specify what (kind of) equation describes it instead. This leads to the class of objective collapse theories whose prototype is known as GRW (after Ghirardi et al. (1986)). Objective collapse theories modify the linear Schrödinger equation by a stochastic collapse term, which is such that systems with few degrees of freedom hardly ever collapse (thus behaving as they would according to Schrödinger's equation), while macroscopic superpositions are typically destroyed almost instantly. The crucial point is that, in contrast to orthodox quantum mechanics, the collapse of the wave function is described by a precise mathematical law which is valid at all times; it is not an ad hoc postulate presupposing a distinguished status of "measurements" or "observers."


3. Rejecting assumption (3) and admitting that measurements have, in general, no definite outcome leads to Many-Worlds theories that accept macroscopic superpositions like (13.4) as describing the coexistence of equally real states, e.g., a measurement device pointing left and a measurement device pointing right. The key insight, going back to Hugh Everett III, is that this is not per se in contradiction with our experience if we take the quantum theory seriously on all scales. That is because the superposition does not stop with the measurement device; it will come to include the experimentalist, the laboratory, indeed the entire universe. In the last resort, all possible measurement outcomes are realized in different "worlds," corresponding to decoherent branches of the universal wave function. This decoherence—basically the separation of branches on "configuration space"—is an irreversible process (think thermodynamic irreversibility), and the linearity of the Schrödinger evolution ensures that different world branches do not interact.

These three theories (or classes of theories) are also called "quantum theories without observers" because they provide an objective physical description in which "the observer" assumes no a priori distinguished role. They are not without alternatives—at least not when it comes to the details—but the most serious proposals for what non-relativistic quantum mechanics could be. Of little interest are the various "interpretations" of quantum mechanics that implicitly or explicitly deny at least one of the assumptions (1)–(3) without taking the step of including additional state variables, replacing the Schrödinger equation, or admitting a coherent Many-Worlds picture. This includes the old Copenhagen-style quantum mechanics that can be understood as choosing options one and two some of the time—insisting that the wave function provides a complete state description for some kinds of systems but not for others, or that the Schrödinger equation is valid for some kinds of interactions but not for others—with a shifty demarcation line somewhere between the microscopic and macroscopic worlds.


13.1.1 Connecting the Wave Function to the World

If, from an orthodox perspective, the measurement problem is acknowledged at all, then it is usually as a challenge to improve upon the unprofessional vagueness of the collapse postulate. In the spirit of "interpreting" quantum mechanics, this call for precision is not met by rigorous mathematics—as in objective collapse theories—but in prose. The idea is that, if we could only provide a better answer to Bell's question of what qualifies an interaction as a "measurement" or a system as an "observer" (maybe consciousness?), the problem would be solved.¹

Aside from missing the point of Bell's criticism, such attempts to interpret the measurement problem away are also missing the deeper issue: How does the wave function (or the quantum formalism in general) connect at all to the world that we experience? How does a complex-valued field on configuration space (or maybe some abstract vector in Hilbert space) describe a cat or the pointer of a measurement device that we perceive as material objects in three-dimensional space? The real question, in other words, is not what to make of a superposed wave function of "dead cat" and "alive cat," but what a wave function has to do with a cat in the first place.

1 Witness, for instance, David Mermin’s defense of the so-called QBist interpretation:

    Albert Einstein famously asked whether a wavefunction could be collapsed by the observations of a mouse. Bell expanded on that, asking whether the wavefunction of the world awaited the appearance of a physicist with a PhD before collapsing. The QBist answers both questions with “no.” A mouse lacks the mental facility to use quantum mechanics to update its state assignments on the basis of its subsequent experience, but these days even an undergraduate can easily learn enough quantum mechanics to do just that. (Mermin, 2012)


If we look at an honest textbook such as Cohen-Tannoudji et al. (1991), we find axioms like the following (numbering in the original):

2. Every measurable physical quantity Q is described by an operator $\hat{Q}$; this operator is called an observable.
3. The only possible result of the measurement of a physical quantity Q is one of the eigenvalues of the corresponding observable $\hat{Q}$.
4. When the physical quantity Q is measured on a system in the normalized state $\psi$, the probability $P(q_n)$ of obtaining the non-degenerate eigenvalue $q_n$ of the corresponding observable $\hat{Q}$ is
$$P(q_n) = \Big|\int \phi_n^{*}\, \psi\Big|^{2},$$
where $\phi_n$ is the normalized eigenvector of $\hat{Q}$ associated with the eigenvalue $q_n$.

I will say more about what the role and status of the “observable operators” actually are. The point here is that textbook quantum mechanics relies on problematic axioms about “measurements” or “observations” long before we arrive at the contradiction between the collapse postulate (Cohen-Tannoudji’s postulate 5) and the linear Schrödinger evolution (postulate 6), axioms that already require us to accept “measurements” as primitive and hardly apply outside of more or less controlled experiments. Even within a controlled laboratory context, it is actually remarkable that the procedure works as well as it does; that, in so many cases, physicists will figure out what operators to use and what matching experiments to perform, even though it does not—and cannot—follow from an analysis of the theory. (In some cases, it is not clear what to compute if one has only the operator formalism to work with, for instance, if one asks about the distribution of arrival times (see, e.g., Das & Dürr (2019); Vona et al. (2013)): How long will it take an electron to hit a detector screen after leaving the particle source? But this question is beyond the scope of our discussion.) Vagueness and ambiguity are certainly unacceptable in the dynamical laws of a theory. But why should they be more acceptable when it comes


to the way the theoretical formalism connects to physical facts in the first place? In order to provide a precise and objective description of nature— including but not limited to measurement processes—modern quantum theories have, by and large, followed two different strategies. One is the primitive ontology program,2 which admits physical variables, over and above the wave function, that represent the microscopic constituents of matter in space and time. Such theories—with Bohmian mechanics as the prime example—thus relieve the wave function from the burden of representing matter, its role being instead a dynamical one for the evolution of the primitive ontology. The other strategy can be subsumed under the moniker of quantum state functionalism (see Ney (2021) for a detailed discussion). It tries to develop an objective description of nature by locating three-dimensional objects as patterns in the wave function (Albert, 2013, 2015) or a more abstract notion of quantum state (see, in particular, the spacetime state realism of Wallace and Timpson (2010) and Wallace (2012)). In contrast to the primitive ontology program, the basic relation between the fundamental ontology and manifest macro-objects is then not one of mereological composition but one of functional enactment. Sophisticated versions of Everettian quantum mechanics, i.e., Many-Worlds theories, generally fall into this camp. I have concerns about the viability of the functionalist approach (cf. Lazarovici (2020), Maudlin (2010)) but will suppress them for the remainder of this discussion. The key insight behind both approaches is that, if a theory has well-defined dynamical laws and a clear ontology in terms of which it can describe cats and dogs and pointers on measurement devices, there can be no measurement problem. The description of nature provided by such a theory can be wrong in that it does not match the empirical facts, but it cannot be paradoxical. In particular, whether measurements result in definite outcomes (as in Bohmian mechanics) or in “many worlds” (as in Everettian quantum mechanics) or in some

2 See Allori et al. (2008), Allori et al. (2014), Allori (2013), Esfeld (2014a), Esfeld (2020); cf. Bell (2004, Chap. 4) on “local beables” and the notion of “primary ontology” in Maudlin (1997, 2019).


unrecognizable mess is not postulated or interpreted but inferred from an analysis of the theory. Objective collapse theories are generally regarded as a third option for solving the measurement problem (which is how I introduced them myself) but fall into either one of the two camps just described when it comes to the role of the wave function. The original GRW theory (now sometimes called GRW0) is a theory about the wave function alone. It modifies the linear Schrödinger evolution to suppress macroscopic superpositions like that of “dead cat” and “alive cat.” But there is still a difference between a cat and the wave function of a cat—even a collapsed one—and GRW0 faces the same challenges as Everettian quantum mechanics in making the connection. Nowadays, it is thus common to equip the GRW theory with a primitive ontology. One proposal (due to John Bell (2004, Chap. 22)) regards the collapse centers themselves as the primitive ontology—discrete “matter flashes” in space and time that constitute macro-objects (GRWf). Another proposal (due to Ghirardi et al. (1995)) uses the wave function to define a continuous mass density field in physical space (GRWm).3 In any case, the wave function then assumes a dynamical role for the evolution of the primitive ontology—much as it does in Bohmian mechanics—only that the evolution is now intrinsically stochastic. Thus, if the measurement problem is understood as the problem of connecting the wave function to observable facts, the collapse dynamics add little to the solutions provided by Bohmian mechanics or modern versions of Many-Worlds (cf. Esfeld (2018)). Empirically, though, they might just turn out to be correct. For spontaneous collapse theories make certain predictions that differ from those of unitary quantum theories (like Bohm and Everett), in which the superposition principle is valid on all scales. There is currently an increased interest in testing for those predictions, wherein the game is to narrow the possible range of two free parameters—the collapse frequency and the localization length—that enter the collapse law. So far, there is no evidence for a breakdown of unitarity, but the remaining parameter range becomes increasingly hard to probe.

3 An option that is also available for Many-Worlds; see Allori et al. (2011).


Where does this leave us in regard to determinism? Both Bohmian and Everettian quantum mechanics are fundamentally deterministic. Out of the serious solutions to the measurement problem, only objective collapse theories require real, irreducible randomness. Somewhat ironically, this is also the class of theories whose predictions differ, in principle, from what are considered to be predictions of standard quantum mechanics. Hence, one might go as far as saying that if quantum mechanics is indeterministic, it is not exact, and if quantum mechanics is exact, it is not indeterministic.

13.2 Born’s Rule and the Measurement Process

    It seems to me that the term “probability” is often abused nowadays. […] A statement of probability presupposes full reality of its subject. No reasonable person will make a guess as to whether Caesar’s die on the Rubicon had a five on top. Quantum mechanics sometimes acts as if probability statements were to be applied to events with blurry reality.
    — Erwin Schrödinger, Letter to A. Einstein on Nov. 18, 1950 (translation D.L.)

A common first reaction to the measurement problem is to insist that the wave function was never meant to describe the actual physical facts but that only its statistical interpretation according to the Born rule is significant. A superposition like (13.4) should thus be read as saying that the measurement outcome is “1” with probability |c1 |2 or “2” with probability |c2 |2 , nothing more and nothing less. It is this statistical law, after all, that is empirically confirmed with great precision. Fair enough, but merely pointing to the Born rule does not solve the measurement problem. According to the Schrödinger equation, the wave function at the end of the experiment is always (13.4). If this wave function provides a complete description of system and apparatus, the outcome of the measurement will always be the same. So again, either the linear Schrödinger equation is incorrect or the wave function is incomplete, and we are missing precisely those physical variables whose probability distribution the Born rule is supposed to describe (cf. the “problem of statistics” in Maudlin (1995)). That said, let us apply the Born rule to our measurement scenario and see what we can gather from it. We describe the configuration space


of the complete system by coordinates $q = (x, y)$, where $x \in \mathbb{R}^k$ are the coordinates of the measured system and $y \in \mathbb{R}^m$ those of the measurement device. The result of our measurement interaction is the superposition (13.4), where $\Phi_1$ and $\Phi_2$ are concentrated on different pointer configurations and have thus essentially disjoint support in configuration space ($\operatorname{supp}\Phi_1 \cap \operatorname{supp}\Phi_2 \approx \emptyset$). According to Born’s rule, we then have

$$P(\text{pointer points to 1}) = \int_{\operatorname{supp}\Phi_1} |c_1 \varphi_1 \Phi_1 + c_2 \varphi_2 \Phi_2|^2 \, \mathrm{d}^k x\, \mathrm{d}^m y \tag{13.5}$$

$$= |c_1|^2 \int_{\operatorname{supp}\Phi_1} |\varphi_1 \Phi_1|^2 \, \mathrm{d}^k x\, \mathrm{d}^m y + |c_2|^2 \int_{\operatorname{supp}\Phi_1} |\varphi_2 \Phi_2|^2 \, \mathrm{d}^k x\, \mathrm{d}^m y + 2\,\mathrm{Re}\Big( c_1^{*} c_2 \int_{\operatorname{supp}\Phi_1} (\varphi_1 \Phi_1)^{*}\, \varphi_2 \Phi_2 \, \mathrm{d}^k x\, \mathrm{d}^m y \Big) \tag{13.6}$$

$$\approx |c_1|^2 \int |\varphi_1 \Phi_1|^2 \, \mathrm{d}^k x\, \mathrm{d}^m y = |c_1|^2. \tag{13.7}$$
Here, we used the fact that Ф1 is (just about) zero on supp Ф2 and vice versa so that the interference term (13.6) is zero (or nearly so). This suppression of the interference term is also called decoherence. The probability of the outcome “1” is thus p1 = |c1 |2 , and the probability of the outcome “2” is p2 = |c2 |2 , just as the rules of textbook quantum mechanics suggest. However, before getting into philosophical debates about what these probabilities mean, different quantum theories will disagree about what events they even refer to. In Bohmian mechanics, Born’s rule provides a probability distribution for actual particle configurations. p1 is thus the probability that, at the end of our measurement process, the pointer— composed of particles—points to the left, indicating the measurement outcome “1”.
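The suppression of the interference term is easy to see numerically. The following sketch is my own illustration, not part of the book's argument: the wave functions are toy Gaussians on one-dimensional x- and y-grids, all names and parameter values are arbitrary choices, and the integrals in (13.5)-(13.7) are evaluated over the (crude) support of Φ1.

```python
# Minimal numerical sketch of (13.5)-(13.7): with (nearly) disjoint pointer
# states, the interference term is negligible and P(pointer at 1) ~ |c1|^2.
# All wave functions and parameters are illustrative toy choices.
import numpy as np

x = np.linspace(-10, 10, 400)   # "system" coordinate
y = np.linspace(-10, 10, 400)   # "pointer" coordinate
dx, dy = x[1] - x[0], y[1] - y[0]

def gaussian(u, mean, width):
    g = np.exp(-(u - mean) ** 2 / (4 * width ** 2))
    return g / np.sqrt(np.sum(np.abs(g) ** 2) * (u[1] - u[0]))  # normalize

phi1, phi2 = gaussian(x, -2.0, 0.5), gaussian(x, +2.0, 0.5)   # system states
Phi1, Phi2 = gaussian(y, -5.0, 0.5), gaussian(y, +5.0, 0.5)   # pointer states
c1, c2 = np.sqrt(0.3), np.sqrt(0.7) * np.exp(0.4j)            # |c1|^2 + |c2|^2 = 1

# Total wave function on the (x, y) grid and the region supp Phi1 (pointer "1")
Psi = c1 * np.outer(phi1, Phi1) + c2 * np.outer(phi2, Phi2)
support1 = y < 0                                              # crude support of Phi1

P1 = np.sum(np.abs(Psi[:, support1]) ** 2) * dx * dy          # l.h.s. of (13.5)
cross = 2 * np.real(
    np.conj(c1) * c2
    * np.sum(np.conj(np.outer(phi1, Phi1))[:, support1]
             * np.outer(phi2, Phi2)[:, support1]) * dx * dy
)                                                             # interference term

print(f"P(pointer points to 1) = {P1:.6f}   |c1|^2 = {abs(c1)**2:.6f}")
print(f"interference term      = {cross:.2e}")
```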


This corresponds to the natural way of speaking, though not one that orthodox quantum mechanics could justify.4 Instead, it would have to insist that p1 is a probability for finding the pointer pointing left if we “measure” its position, but decidedly not the probability that it actually points left, regardless of whether anyone is looking. In the GRW theory, Born’s rule provides a good approximation to the probability distribution for the “peak” of the collapsed wave function. Strictly speaking, individual collapse events are associated with a particular particle degree of freedom and a “collapse center” in three-dimensional space. But for a system with a huge number of degrees of freedom, a great many collapse events will typically occur in a very short period of time, localizing the wave function ever better in configuration space. Unless the collapse centers themselves are interpreted as the ontology of the theory (GRWf ), p1 is thus, first and foremost, the probability that (13.4) collapses onto a wave function localized in the support of Ф1 . The interpretation of the Born rule in the Many-Worlds theory is difficult—since all possible outcomes actually occur—and will be postponed for a little while. An interesting proposal (though not the one I will endorse) understands probabilities in terms of self-locating uncertainty, e.g., p1 as a credence for finding ourselves on a world branch in which the measurement outcome was “1.” This brief comparison shows that it is impossible to have a meaningful discussion about probabilities in quantum mechanics without specifying which theory (or “interpretation”) one has in mind. This applies, in fact, to almost any subject in quantum physics. The instrumentalist may not care about such discussions. After all, for most practical purposes, there is a widespread agreement on how to apply the “quantum recipe” (Maudlin, 2020)—the mathematical apparatus and loose set of rules that constitute the textbook formalism—to obtain statistical predictions that can be compared with experiment. One just has to be suspicious of claims that anything profound about the nature of quantum probabilities follows from this recipe or the experiments alone.

4 If it even allows us to apply quantum mechanics to macroscopic systems like measurement devices.


Remark 7 (Shenanigans with the Density Matrix) Above, I said that orthodox quantum mechanics does not justify an understanding of probabilities as referring to observation-independent facts, even when speaking about macroscopic systems like cats or measurement devices. I have to emphasize that no justification is provided by considering a reduced density matrix of the system and pointing out that it converges to a mixed state in the limit of perfect (environmental) decoherence. The sleight of hand here is based on the fact that the same mathematical device can be used for different things. In particular, it can be used to describe a series of measurements in which the state of the system cannot be reliably prepared, e.g., the wave function of the system is $\varphi_1$ with probability $p_1$ and $\varphi_2$ with probability $p_2$ but we don’t know which in each individual case. Then, the statistics of the experiment can be described by the density matrix
$$\rho_M = p_1 |\varphi_1\rangle\langle\varphi_1| + p_2 |\varphi_2\rangle\langle\varphi_2|,$$
which represents a “proper mixture.” One can also associate a similar-looking density matrix to a subsystem by starting out with a “pure state”
$$\rho = |c_1 \varphi_1 \Phi_1 + c_2 \varphi_2 \Phi_2\rangle\langle c_1 \varphi_1 \Phi_1 + c_2 \varphi_2 \Phi_2|$$
for the subsystem and its environment (this would correspond to the entangled wave function $c_1 \varphi_1 \Phi_1 + c_2 \varphi_2 \Phi_2$) and averaging over (formally, tracing out) the degrees of freedom of the environment. The vanishing of off-diagonal terms proportional to $|\varphi_1\rangle\langle\varphi_2|$ or $|\varphi_2\rangle\langle\varphi_1|$ is then a standard way to see (or even define) decoherence. The fallacy is now to pretend that as soon as this reduced density matrix of the subsystem starts to look like
$$\rho_{\mathrm{red}} \approx |c_1|^2 |\varphi_1\rangle\langle\varphi_1| + |c_2|^2 |\varphi_2\rangle\langle\varphi_2|,$$
now describing a so-called improper mixture, the same kind of ignorance interpretation as for $\rho_M$ is suddenly justified. This is wrong for many reasons, but it is, in particular, one of the more striking examples of the lack of intellectual rigor (or honesty) in many treatments of quantum mechanics.
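The numerical coincidence that invites this fallacy can be spelled out in a few lines of linear algebra. This is my own toy illustration, not the author's; the two-level system, the environment states, and all variable names are arbitrary stand-ins.

```python
# Toy illustration (not from the book): proper mixture vs. reduced density
# matrix of an entangled state. Numerically identical under full decoherence,
# yet they describe very different physical situations.
import numpy as np

phi1 = np.array([1.0, 0.0])                  # system states (2-level toy system)
phi2 = np.array([0.0, 1.0])
c1, c2 = np.sqrt(0.3), np.sqrt(0.7)

def reduced_density_matrix(Env1, Env2):
    """rho_red = Tr_env |Psi><Psi| for Psi = c1 phi1 (x) Env1 + c2 phi2 (x) Env2."""
    Psi = c1 * np.kron(phi1, Env1) + c2 * np.kron(phi2, Env2)
    rho = np.outer(Psi, Psi.conj())
    d_env = len(Env1)
    rho = rho.reshape(2, d_env, 2, d_env)
    return np.trace(rho, axis1=1, axis2=3)   # partial trace over the environment

# "Proper" mixture: ignorance about which of phi1, phi2 was prepared
rho_M = abs(c1)**2 * np.outer(phi1, phi1) + abs(c2)**2 * np.outer(phi2, phi2)

# Perfect decoherence: orthogonal environment states -> same matrix as rho_M
rho_red = reduced_density_matrix(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(np.allclose(rho_red, rho_M))           # True

# Partial decoherence: overlapping environment states leave off-diagonal terms
overlap = np.array([np.sqrt(0.9), np.sqrt(0.1)])
print(np.round(reduced_density_matrix(np.array([1.0, 0.0]), overlap), 3))
```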

13.2.1 Typicality and Observation

There is another interesting point to make about the calculation (13.5). Suppose $\varphi_1$ corresponds to a concentrated wave packet of a particle moving to the left and $\varphi_2$ to a wave packet of the same particle moving to the right, say in a wave guide. Let $\Phi_1, \Phi_2$ correspond to the states of


a detector clicking on the left and right arms, respectively, to indicate the detection of the particle. Now at least in Bohmian mechanics, it makes sense to ask: What is the probability that the particle is actually on the left-hand side when the detector clicks on the right-hand side? So $\varphi_1$ has support in a region $L \subset \mathbb{R}^3$ (in the left arm of the wave guide), while $\Phi_2$ is concentrated on a region $R \subset \mathbb{R}^m$ of detector configurations indicating a detection in the right arm. And if X denotes the actual position of the particle and Y the actual configuration of the detector, we find

$$P(X \in L,\, Y \in R) = \int_{L \times R} |c_1 \varphi_1 \Phi_1 + c_2 \varphi_2 \Phi_2|^2\, \mathrm{d}^k x\, \mathrm{d}^m y \approx 0, \tag{13.8}$$

since $\varphi_2$ is zero on $L$, while $\Phi_1$ is (approximately) zero on $R$, hence both $\varphi_1\Phi_1$ and $\varphi_2\Phi_2$ are (approximately) zero on $L \times R$. Simply put, if you look where the particle is, you will typically find the particle where it is. However, it is quite realistic to assume that the detector states $\Phi_1$ and $\Phi_2$ have long “tails”—i.e., some overlap in configuration space—so that $P(X \in L, Y \in R)$ is not exactly zero but only nearly so (as indicated by the $\approx$ symbol in (13.8)). Hence, there is a non-zero probability that the particle is on the left while the detector clicks on the right. If our detector is somewhat accurate, this probability will be negligibly small, but the atypical outcome is nonetheless possible according to the theory. And the same is true, in principle, if we replace the particle with the moon and the detector with a human observer looking at the night sky. It is possible, yet atypical, that I see the moon to my right when it is actually to my left. This is not a peculiarity of Bohmian mechanics (or only in the sense that, in contrast to orthodox quantum mechanics, it allows us to speak of actual positions and analyze how they correlate with actual records). Also according to classical electrodynamics, it is possible, yet atypical, that I see the moon to my right, although it is, in fact, to my left because what I see is a very special fluctuation in the electromagnetic field. It is also possible, yet atypical, that I hold a thermometer (or my hand) in hot water but register a very low temperature because all the fast particles happen to be moving away from it.


The analysis of the quantum measurement process is thus an excellent illustration of the fact that atypicality can always undermine the reliability of observations. Consequently, any inference from empirical evidence has to rely on typicality reasoning, more precisely, on the assumption that the evidence was not produced by an atypical or very low probability event.

13.3 Observable Operators as Statistical Book-Keepers

In textbook quantum mechanics, much ado is made about the so-called observable operators on Hilbert space. These are supposed to represent measurable quantities with their eigenvalues corresponding to the possible outcome values. The role of these operators as the connection between the quantum state and empirical data is axiomatic. Little or no explanation is given as to why a certain quantity or measurement procedure should be associated with a particular operator—or any operator at all. The aim of this somewhat technical section is to explain how, in more precise quantum theories like those mentioned before, observable operators arise as nothing but convenient book-keepers for measurement statistics—an understanding that could have already been gathered from John von Neumann’s seminal Mathematische Grundlagen der Quantenmechanik (1932). The arguments are easiest to understand in the context of Bohmian mechanics5 but require only the Born rule, the linear evolution of the wave function during the measurement process, and a clear ontology in terms of which the theory can account for the spatiotemporal distribution of matter—including pointer positions, display readings, clicking detectors, or whatever else records the outcome of a measurement. An ideal quantum measurement is then an interaction between a system S and a measurement device D resulting in one of several macroscopically discernible configurations of D (“pointer positions”), which

5 See, in particular, Dürr et al. (2004) (reprinted as Chap. 3 in Dürr et al. (2013)) for a rigorous treatment.


are correlated with possible states of S. Schematically, the interaction between the measured system and measurement device is such that, under the Schrödinger evolution $\varphi_i \Phi_0 \longrightarrow \varphi_i \Phi_i$, the wave function $\Phi_0$ is concentrated on pointer configurations corresponding to the “ready state” of the measurement device and the $\Phi_i$ are concentrated on macroscopically disjoint configurations indicating different measurement results. Since the Schrödinger evolution is linear, a superposition

$$\varphi = \sum_{i \geq 1} c_i \varphi_i, \qquad c_i \in \mathbb{C}, \quad \sum_{i \geq 1} |c_i|^2 = 1 \tag{13.9}$$

results in

$$\varphi \Phi_0 = \sum_{i \geq 1} c_i \varphi_i \Phi_0 \;\xrightarrow{\text{Schrödinger evolution}}\; \sum_{i \geq 1} c_i \varphi_i \Phi_i. \tag{13.10}$$

The coupling of a system to a measurement device thus leads to a canalization of the wave function into decoherent (orthogonal) branches corresponding to different measurement outcomes. And it follows from the conservation of probability that, if the “pointer states” $\Phi_i$ are orthogonal, the system wave components $\varphi_i$ must be orthogonal as well. Now let $P_i$ denote the orthogonal projection onto $\varphi_i$; in Dirac notation, $P_i = |\varphi_i\rangle\langle\varphi_i|$. Then we immediately obtain

$$\langle \varphi, P_i \varphi\rangle = |c_i|^2, \tag{13.11}$$

corresponding to the Born probability for the pointer position i (i.e., $Y \in \operatorname{supp}\Phi_i$) as computed in (13.7). If the measurement value indicated by the pointer position i is $\alpha_i \in \mathbb{R}$, the expectation value for outcomes of the measurement experiment is

$$\sum_{i \geq 1} \alpha_i |c_i|^2 = \Big\langle \varphi, \sum_{i \geq 1} \alpha_i P_i\, \varphi \Big\rangle = \langle \varphi, \hat{A}\varphi\rangle \tag{13.12}$$

with the self-adjoint operator

$$\hat{A} = \sum_{i \geq 1} \alpha_i P_i. \tag{13.13}$$

Mathematically, the right-hand side of (13.13) is the spectral decomposition of $\hat{A}$.

Example: Spin Measurement

An instructive example is the measurement of spin on a spin-1/2 particle. If one sends a spinor wave function $\psi_0 = \varphi_0(x)\binom{\alpha}{\beta}$, $|\alpha|^2 + |\beta|^2 = 1$ (in the z-spin eigenbasis) through a Stern–Gerlach magnet oriented in z-direction, the wave function splits into spatially separating parts

$$\psi_t = \varphi_+(x, t)\begin{pmatrix}\alpha\\ 0\end{pmatrix} + \varphi_-(x, t)\begin{pmatrix}0\\ \beta\end{pmatrix}.$$

$\varphi_+$ is deflected in the positive z-direction and $\varphi_-$ in the negative z-direction, and the experiment is such that after a sufficient amount of time, the two wave parts will no longer overlap. The probability for measuring spin UP or spin DOWN is now simply the probability that the particle is found in the support of $\varphi_+$ (above the symmetry axis), respectively $\varphi_-$ (below the symmetry axis), when registered by a detector or on a photographic plate. (In Bohmian mechanics, we wouldn’t have to involve the detector in this case but could simply ask where the particle actually is.) Using Born’s rule, we compute

$$P(\text{Spin UP}) = P(X \in \operatorname{supp}\varphi_+) = |\alpha|^2 \int |\varphi_+(x, t)|^2\, \mathrm{d}^3 x = |\alpha|^2,$$
$$P(\text{Spin DOWN}) = P(X \in \operatorname{supp}\varphi_-) = |\beta|^2 \int |\varphi_-(x, t)|^2\, \mathrm{d}^3 x = |\beta|^2.$$

These probabilities can be read off easily from the projections onto the spin components $\binom{1}{0}$, respectively $\binom{0}{1}$. Written in matrix form, we have

$$P_+ = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \qquad P_- = \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix},$$

and immediately obtain

$$\langle\psi, P_+\psi\rangle = |\alpha|^2, \qquad \langle\psi, P_-\psi\rangle = |\beta|^2. \tag{13.14}$$

The expectation value is, accordingly,

$$\frac{\hbar}{2}P(\text{spin UP}) - \frac{\hbar}{2}P(\text{spin DOWN}) = \frac{\hbar}{2}\langle\psi, P_+\psi\rangle - \frac{\hbar}{2}\langle\psi, P_-\psi\rangle = \Big\langle\psi, \frac{\hbar}{2}(P_+ - P_-)\psi\Big\rangle = \Big\langle\psi, \frac{\hbar}{2}\sigma_z\psi\Big\rangle.$$

Here, the Pauli matrix $\frac{\hbar}{2}\sigma_z$, commonly called the “z-spin observable,” appears as the book-keeping operator associated with the experiment.
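For readers who like to check such computations by machine, here is a minimal numerical version of the example. It is my addition, not the book's; the values of α and β are arbitrary, and ħ is set to 1.

```python
# The spin example in code (illustrative values; hbar set to 1): the Born
# probabilities |alpha|^2, |beta|^2 and the expectation value of the
# "z-spin observable" (hbar/2) sigma_z as a statistical book-keeper.
import numpy as np

HBAR = 1.0
alpha, beta = 0.6, 0.8j            # |alpha|^2 + |beta|^2 = 1
psi = np.array([alpha, beta])

P_plus  = np.array([[1, 0], [0, 0]], dtype=complex)   # projection onto spin UP
P_minus = np.array([[0, 0], [0, 1]], dtype=complex)   # projection onto spin DOWN
sigma_z = P_plus - P_minus

p_up   = np.vdot(psi, P_plus  @ psi).real    # = |alpha|^2, cf. (13.14)
p_down = np.vdot(psi, P_minus @ psi).real    # = |beta|^2

expectation = np.vdot(psi, (HBAR / 2) * sigma_z @ psi).real
print(p_up, p_down)                              # 0.36 0.64
print(expectation, (HBAR / 2) * (p_up - p_down)) # both -0.14
```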

Let us now consider a more general measurement scheme that also applies in less idealized situations. We associate an experimental situation with a coarse-graining function (a macro-variable) F on configuration space, mapping the possible configurations of system + measurement device to a set $\mathcal{A} \subset \mathbb{R}$ of measurement values. As before, we split the total configuration $q \in \mathcal{Q}$ into $q = (x, y)$, where x ranges over possible configurations of the measured system and y over those of the measurement device. A measurement process now consists of the following steps:

1. The system wave function couples to a measurement apparatus:
$$\varphi(x) \longrightarrow \Psi(x, y) = \varphi(x)\Phi(y).$$

2. The total wave function of system + apparatus evolves according to the appropriate Schrödinger equation describing the measurement interactions:
$$\Psi_0 \longrightarrow \Psi_T(x, y).$$
By the Born rule, $\Psi_T$ defines a probability distribution over the possible configurations:
$$\Psi_T(x, y) \Longrightarrow \rho^{\Psi_T}(x, y) = |\Psi_T|^2(x, y).$$

3. We are only interested in the distribution of the macro-variable F, coarse-graining configurations to measurement outcomes:
$$\mathbb{P}_\varphi(A) := \mathbb{P}_{\Psi_T}\big(F^{-1}(A)\big), \quad \text{for } A \subset \mathcal{A}. \tag{13.15}$$

Altogether, the sequence results in a positive, sesquilinear map from system wave functions to outcome probabilities. This means that (according to a general theorem in functional analysis) it can be written as

$$\varphi \mapsto \mathbb{P}_\varphi(\cdot) =: \langle\varphi, O(\cdot)\varphi\rangle \geq 0, \tag{13.16}$$

where $\langle\,,\,\rangle$ is the scalar product on Hilbert space and $O(A)$ is a self-adjoint operator for any (measurable) value set $A \subset \mathcal{A}$. The so-constructed family of operators $(O(A))_{A \subset \mathcal{A}}$ defines a generalized “observable,” a so-called positive-operator-valued measure (POVM). The term “measure” becomes clear if we note two properties that follow immediately from (13.15) and (13.16):

$$O(\mathcal{A}) = \mathbb{P}_{\Psi_T}\big(F^{-1}(\mathcal{A})\big) = 1, \tag{13.17}$$
$$O(A \cup B) = O(A) + O(B), \quad \text{for disjoint } A, B \subset \mathcal{A}. \tag{13.18}$$

Thus, a POVM defines for any $\varphi$ a probability measure on $\mathcal{A}$ (more precisely, on the Borel sigma-algebra of $\mathcal{A}$) given by (13.16). Only in nice cases, when the operators $O(A)$ are orthogonal projections, meaning that they satisfy the additional property

$$O(A \cap B) = O(A)\,O(B), \quad \text{for } A, B \subset \mathcal{A}, \tag{13.19}$$

does the positive-operator-valued measure (POVM) become a projection-valued measure (PVM), which corresponds to the spectral decomposition


of a self-adjoint operator and thus to the textbook notion of a “quantum observable”. Three points, in particular, are important to take away. First, that the Born rule for positions is sufficient to ground the entire measurement formalism of quantum mechanics. Second, the operator formalism is only a convenient mathematical toolkit for summarizing the outcome distributions of measurement experiments. But since the details of the experiment—from the state of the apparatus to the measurement Hamiltonian to the relevant coarse-graining—are so completely absorbed into the associated POVM or PVM, one can see how observable operators could have developed a life of their own. In fact, and this is point three, the “observable values” emerge, in general, only through the measurement process and the canalization of the wave function into macroscopically decoherent branches. Not in any mysterious sense of “emergence,” just as a result of physical interactions characteristic of measurement experiments. Most of the confusion about “quantum logic,”, “metaphysical indeterminism,” “contextuality,” etc. arises only if one tries to think of the observables as fundamental or as representing intrinsic properties that a system might possess prior to, or independently of, the measurement process (cf. Bell (2004); Daumer et al. (1996); Dürr & Lazarovici (2020); Lazarovici et al. (2018)).
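The POVM construction can also be played through in finite dimensions. The sketch below is my own toy model, not the book's: the "measurement" unitary, the ready state, and the coarse-graining F are arbitrary stand-ins. It builds the operators O({outcome}) from these ingredients and checks normalization, positivity, and the agreement of ⟨φ, O(A)φ⟩ with the directly computed Born probability.

```python
# Finite-dimensional sketch of the POVM construction: a random "measurement"
# unitary U couples system and apparatus, a coarse-graining F maps joint basis
# states to outcome values, and O(A) = <Phi0| U^dag 1_{F^-1(A)} U |Phi0> acts on
# the system alone. Everything here (dimensions, U, F) is an arbitrary toy model.
import numpy as np

rng = np.random.default_rng(0)
d_sys, d_app = 3, 4
dim = d_sys * d_app

# Random unitary (QR decomposition of a complex Gaussian matrix)
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(M)

Phi0 = np.zeros(d_app); Phi0[0] = 1.0            # "ready state" of the apparatus
F = rng.integers(0, 2, size=dim)                 # coarse-graining: outcome 0 or 1

def O(outcome):
    """Book-keeping operator O({outcome}) acting on the system Hilbert space."""
    proj = np.diag((F == outcome).astype(float)) # indicator of F^-1({outcome})
    emb = np.kron(np.eye(d_sys), Phi0.reshape(-1, 1))   # |phi> -> |phi> (x) |Phi0>
    return emb.conj().T @ U.conj().T @ proj @ U @ emb

O0, O1 = O(0), O(1)
print(np.allclose(O0 + O1, np.eye(d_sys)))        # normalization, cf. (13.17)
print(np.all(np.linalg.eigvalsh(O0) >= -1e-12))   # positivity

# Born probability computed directly vs. via the book-keeping operator
phi = rng.normal(size=d_sys) + 1j * rng.normal(size=d_sys)
phi /= np.linalg.norm(phi)
Psi_T = U @ np.kron(phi, Phi0)                    # state after the measurement
p_direct = np.sum(np.abs(Psi_T) ** 2 * (F == 1))
p_operator = np.vdot(phi, O1 @ phi).real
print(np.isclose(p_direct, p_operator))           # True
```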

13.4 Quantum Equilibrium: Probabilities in Bohmian Mechanics

If there is anything all quantum theories agree on, it is that empirical predictions are based on Born’s rule associating the wave function $\psi$ with the famous $|\psi|^2$ probability distribution. This is why they can disagree on so much else and still be empirically equivalent for most practical purposes. In textbook quantum mechanics, the Born rule is introduced as a postulate whose exact meaning remains obscure. I will now discuss how, in Bohmian mechanics, it can be derived as a typicality result following the quantum equilibrium analysis of Dürr, Goldstein, and Zanghì (1992, reprinted as Chap. 2 in Dürr et al. (2013)), which was anticipated by Bell


(2004, Chap. 15). This typicality analysis is in many ways the realization par excellence of the Boltzmannian program that already allowed us to ground classical statistical mechanics in the Newtonian theory of point particles. Sharing the same primitive ontology of point particles, Bohmian mechanics is ideal for highlighting the conceptual continuity between probabilities in classical mechanics and quantum mechanics—contrary to the popular belief that the latter must be of a fundamentally different kind.

13.4.1 Bohmian Mechanics

In Bohmian mechanics, the state of a closed system of N particles is completely described by a pair $(Q, \Psi)$, where $Q = (Q_1, \ldots, Q_N) \in \mathbb{R}^{3N}$ represents the spatial configuration of the particles and $\Psi$ is the wave function on their configuration space $\mathbb{R}^{3N}$. The theory is then defined by two dynamical equations.

1. The evolution of the wave function $\Psi$ follows the Schrödinger equation
$$i\hbar\,\partial_t \Psi_t = \hat{H}\Psi_t, \tag{13.20}$$
where $\hat{H}$ is the Hamiltonian of the system.

2. The evolution of the particle configuration follows a first-order differential equation in which the wave function $\Psi_t$ enters to determine a velocity field $v^{\Psi_t}$ for the particles. More precisely, the particle configuration evolves according to the guiding equation
$$\frac{\mathrm{d}}{\mathrm{d}t}Q(t) = v^{\Psi_t}(Q(t)) := \frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla\Psi_t}{\Psi_t}(Q(t)), \tag{13.21}$$
where $\nabla$ is the gradient on the 3N-dimensional configuration space, Im denotes the imaginary part, and I assume, for simplicity, particles of equal mass m (otherwise read m as a mass matrix).
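As a concrete illustration of the guiding equation (13.21), and not part of the book's own exposition, the following sketch integrates the Bohmian trajectory of a single particle in one dimension, guided by a freely spreading Gaussian packet; units with ħ = m = 1 and all parameter values are arbitrary choices.

```python
# A minimal numerical sketch of the guiding equation (13.21) for one particle
# in one dimension, guided by a freely spreading Gaussian wave packet.
# Units with hbar = m = 1; all parameter values are illustrative.
import numpy as np

HBAR = 1.0
M = 1.0
SIGMA0 = 1.0  # initial width of the packet

def psi(x, t):
    """Freely evolving Gaussian wave packet centered at the origin."""
    st = SIGMA0 + 1j * HBAR * t / (2 * M * SIGMA0)
    return (2 * np.pi * st**2) ** (-0.25) * np.exp(-x**2 / (4 * SIGMA0 * st))

def velocity(x, t, dx=1e-5):
    """Bohmian velocity field v = (hbar/m) Im( dpsi/dx / psi ), cf. (13.21)."""
    dpsi = (psi(x + dx, t) - psi(x - dx, t)) / (2 * dx)
    return (HBAR / M) * np.imag(dpsi / psi(x, t))

def trajectory(x0, t_max=5.0, dt=1e-3):
    """Integrate dX/dt = v(X, t) with a simple Euler scheme."""
    x = x0
    for t in np.arange(0.0, t_max, dt):
        x = x + dt * velocity(x, t)
    return x

# Trajectories guided by the same wave function fan out as the packet spreads:
for x0 in [-1.0, -0.5, 0.5, 1.0]:
    print(f"X(0) = {x0:+.2f}  ->  X(5) = {trajectory(x0):+.3f}")
```

For this particular packet, the trajectories simply scale with the spreading width of $|\psi_t|^2$, so positions initially distributed according to $|\psi_0|^2$ remain $|\psi_t|^2$-distributed, anticipating the equivariance of the typicality measure discussed below.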


Every fundamental theory is a theory of the universe as a whole (cf. Chap. 7). For Bohmian mechanics, this means that, on the fundamental level, there is only one wave function—the universal wave function— guiding the evolution of all the particles together. We then need a procedure for getting from this universal level to an effective description of subsystems. The issue is particularly subtle in the context of quantum physics, where a new kind of holism takes hold. Due to entanglement and the non-separability of the wave function, one needs to carefully analyze when and how subsystems allow for an autonomous description. Large distances or weak interactions are no longer enough to ensure that external influences are negligible. This is manifested, in particular, in Bell’s theorem (Bell, 2004, Chaps. 2, 16, 24), showing that certain statistical correlations predicted by quantum mechanics and well confirmed in experiments (e.g., Hensen et al. (2015)) cannot be explained without assuming some kind of nonlocal influence between distant events. In Bohmian mechanics, this nonlocality becomes explicit in the guiding equation (13.21) formulated on configuration space, which is such that the velocity of any one particle depends, in general, on the positions of all the other particles at the same time. Fortunately, we will see that Bohmian mechanics also allows for the necessary analysis of when and how subsystems can be described independently in terms of their own effective wave function.

The Typicality Measure

Bohmian mechanics is a deterministic theory. Given the wave function and particle configuration of the universe at time $t_0$, the evolution is completely and uniquely determined for all times.6 Since we do not (in

6 For rigorous results about the existence and uniqueness of solutions, see Berndl et al. (1995), Teufel and Tumulka (2005).


fact, cannot) know the exact particle configuration, we need to look at typical regularities grounded in the Bohmian laws to see their empirical import.7 Given the universal wave function $\Psi$, the natural typicality measure $\mu^\Psi$ on the configuration of the universe is the measure with density $\rho^\Psi(q) = |\Psi(q)|^2$. It is distinguished as the unique equivariant measure for the Bohmian particle dynamics.8 Equivariance is the natural generalization of stationarity (see Sect. 5.3) and ensures that typical sets remain typical and atypical sets remain atypical under the Bohmian time evolution. More precisely, if $\Phi^\Psi_{t,0}$ is the flow on configuration space induced by the guiding equation (13.21), then

$$\mu^\Psi(A) := \int_A |\Psi_0|^2\, \mathrm{d}^{3N}q = \int_{\Phi^\Psi_{t,0}(A)} |\Psi_t|^2\, \mathrm{d}^{3N}q \tag{13.22}$$

holds for any measurable $A \subseteq \mathbb{R}^{3N}$. Indeed, one way to guess the Bohmian guiding law (13.21) is from the continuity equation

$$\partial_t \rho^\Psi = -\nabla \cdot j^\Psi =: -\nabla \cdot (\rho^\Psi v^\Psi), \tag{13.23}$$

where $j^\Psi = \frac{\hbar}{2im}\big(\Psi^{*}\nabla\Psi - \Psi\nabla\Psi^{*}\big) = \frac{\hbar}{m}\,\mathrm{Im}\big(\Psi^{*}\nabla\Psi\big)$ is the quantum flux that can be derived from the Schrödinger equation. The $|\Psi|^2$-density is thus transported along the vector field $v^{\Psi_t}$ that defines the dynamics of the particle configuration. Note that the guiding law is ill-defined for configurations at which $\Psi = 0$, so it makes sense that the typicality measure assigns no weight to these singular points. Due to the uniqueness of the equivariant measure, the situation in Bohmian mechanics is more satisfying than in classical mechanics, where the Liouville measure was just the most natural among an infinite number of stationary measures.
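A quick finite-difference spot check (my own, using the same kind of free Gaussian packet as in the trajectory sketch above, with ħ = m = 1 and arbitrary parameters) confirms that the |ψ|²-density and the Bohmian velocity field indeed satisfy the continuity equation (13.23):

```python
# Numerical spot check of the continuity equation (13.23) for a free Gaussian
# packet: d/dt |psi|^2 + d/dx (|psi|^2 v) ~ 0, up to finite-difference error.
import numpy as np

HBAR = M = 1.0
SIGMA0 = 1.0

def psi(x, t):
    st = SIGMA0 + 1j * HBAR * t / (2 * M * SIGMA0)
    return (2 * np.pi * st**2) ** (-0.25) * np.exp(-x**2 / (4 * SIGMA0 * st))

def rho(x, t):
    return np.abs(psi(x, t)) ** 2

def current(x, t, eps=1e-5):
    """Quantum flux j = rho * v with v = (hbar/m) Im(psi'/psi)."""
    dpsi = (psi(x + eps, t) - psi(x - eps, t)) / (2 * eps)
    return rho(x, t) * (HBAR / M) * np.imag(dpsi / psi(x, t))

x = np.linspace(-3, 3, 13)
t, dt, dx = 0.7, 1e-5, 1e-4
residual = (rho(x, t + dt) - rho(x, t - dt)) / (2 * dt) \
           + (current(x + dx, t) - current(x - dx, t)) / (2 * dx)
print(np.max(np.abs(residual)))   # ~ 0 (compare with rho, j of order 0.1)
```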

7 I am using the “epistemic” argument for simplicity, but it is really a shorthand for the various reasons discussed throughout this book for why the empirical import of microscopic laws is always found in typical regularities.
8 That depends only locally on $\Psi$ and its derivatives; see Goldstein and Struyve (2007).


Effective Wave Functions for Subsystems

The typicality measure defined in terms of the universal wave function does not express probabilities and is not what the Born rule is about. Probabilities refer to statistical distributions of subsystems. To get to the latter, we have to take a closer look at how Bohmian mechanics treats subsystems of the universe. Suppose that the subsystem consists of $n < N$ particles. We then split the configuration space into $\mathbb{R}^{3N} = \mathbb{R}^{3n} \times \mathbb{R}^{3(N-n)}$, so that, writing $q = (x, y)$, the x-coordinates describe the degrees of freedom of the subsystem and the y-coordinates the possible configurations of its environment, i.e., the rest of the universe. Analogously, we split the actual particle configuration into $Q = (X, Y)$. To pass from the fundamental (universal) theory to a description of the subsystem, we take the universal wave function $\Psi_t(q) = \Psi_t(x, y)$ and plug into the y-argument the actual configuration $Y(t)$ of the environment. The resulting

$$\psi_t^Y(x) := \Psi_t(x, Y(t)) \tag{13.24}$$

is called the conditional wave function. It is now a wave function for the x-system only but still involves an explicit dependence on $Y(t)$. However, in many relevant situations, the subsystem will dynamically decouple from its environment. We say that the subsystem has an effective wave function $\varphi$ if the universal wave function takes the form

$$\Psi(x, y) = \varphi(x)\chi(y) + \Psi^{\perp}(x, y), \tag{13.25}$$

where $\chi$ and $\Psi^{\perp}$ have macroscopically disjoint y-supports and $Y \in \operatorname{supp}\chi$, so that $\Psi^{\perp}(x, Y) = 0$. (This is, notably, much weaker than assuming that $\Psi$ has a product structure, which is almost never justified.) This means that we can forget about the “empty” branch $\Psi^{\perp}(x, y)$ and describe the subsystem in terms of its own wave function $\varphi$ (normalized to $\int |\varphi(x)|^2\, \mathrm{d}^{3n}x = 1$).
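A toy discretization (mine, not the author's) may help to make (13.24) and (13.25) concrete: on a small grid of x- and y-values, a universal wave function of the form φ(x)χ(y) + Ψ⊥(x, y) with disjoint y-supports yields a conditional wave function Ψ(·, Y) that is proportional to φ whenever the actual environment configuration Y lies in the support of χ.

```python
# Toy illustration of conditional and effective wave functions, (13.24)-(13.25).
# "Configuration space" is a small grid: 5 x-values for the subsystem and
# 6 y-values for its environment. All numbers are arbitrary toy choices.
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 5, 6

phi = rng.normal(size=nx) + 1j * rng.normal(size=nx)      # subsystem branch
chi = np.array([1.0, 0.5j, 0.2, 0.0, 0.0, 0.0])           # supported on y = 0,1,2
Psi_perp = np.zeros((nx, ny), dtype=complex)
Psi_perp[:, 4:] = rng.normal(size=(nx, 2))                 # supported on y = 4,5 only

Psi = np.outer(phi, chi) + Psi_perp                        # universal wave function

def conditional_wave_function(Y):
    """psi^Y(x) = Psi(x, Y), cf. (13.24); normalized for comparison."""
    psi = Psi[:, Y]
    return psi / np.linalg.norm(psi)

Y_in_supp_chi = 1      # actual environment configuration inside supp(chi)
psi_Y = conditional_wave_function(Y_in_supp_chi)
# psi^Y equals phi up to normalization and a global phase: phi is "effective".
print(np.allclose(np.abs(np.vdot(psi_Y, phi / np.linalg.norm(phi))), 1.0))  # True

Y_in_supp_perp = 5     # environment configuration inside the "empty" branch
print(np.round(conditional_wave_function(Y_in_supp_perp), 2))  # no longer ~ phi
```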


In terms of this effective wave function, the guiding equation for subsystem particles reads

$$\dot{X}(t) = \frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla\varphi}{\varphi}(X(t)), \tag{13.26}$$

to be compared with (13.21). If we can furthermore assume that the interaction between subsystem and environment (from the Hamiltonian in (13.20)) is negligible, that is,

$$V_{\mathrm{ext}}(x, y)\,\varphi(x)\chi(y) \approx 0, \tag{13.27}$$

the effective wave function will satisfy its own autonomous Schrödinger evolution. (From the point of view of the subsystem, this part of the interaction potential, coupling x and y degrees of freedom, is an external potential; condition (13.27) thus corresponds to (7.16) in classical mechanics.) Hence, in these situations, the equations describing the subsystem take the same form as the fundamental Bohmian laws for the universe. Effective wave functions are the Bohmian counterparts of the usual wave functions in quantum mechanics and the wave functions to which Born’s rule generally refers. The definitions (13.24) and (13.25) also give rise to a precise and utterly unmysterious sense in which the wave function of a subsystem collapses (while the universal wave function always obeys a linear Schrödinger equation). For instance, as a result of the measurement process (13.1), when

$$\psi^Y(x) = c_1\varphi_1(x)\Phi_1(Y) + c_2\varphi_2(x)\Phi_2(Y) = c_i\varphi_i(x)\Phi_i(Y), \quad \text{if } Y \in \operatorname{supp}\Phi_i,\ i = 1, 2,$$

we see that $\varphi_1$ or $\varphi_2$ becomes the new effective wave function of the system depending on the measurement outcome, i.e., the actual pointer position of the measurement device.


Quantum Equilibrium

We proceed with our statistical analysis by considering the conditional measure

$$\mu^\Psi(X \in \mathrm{d}^{3n}x \mid Y) = \frac{|\Psi(x, Y)|^2\, \mathrm{d}^{3n}x}{\int |\Psi(x, Y)|^2\, \mathrm{d}^{3n}x} = |\psi^Y(x)|^2\, \mathrm{d}^{3n}x, \tag{13.28}$$

where the conditional wave function $\psi^Y$ is now normalized (and we keep in mind that, in situations described by (13.25), it becomes an effective wave function). This equation already holds a deep insight to which we shall return later. But conditionalizing on Y, the configuration of the entire rest of the universe, is not very practicable. To get to something more useful, we exploit the fact that many different Ys yield one and the same wave function for the subsystem (and thus also one and the same conditional measure (13.28)). Collecting all those Ys, a simple identity for conditional probabilities9 yields

$$\mu^\Psi(X \in \mathrm{d}^{3n}x \mid \psi^Y = \varphi) = |\varphi|^2\, \mathrm{d}^{3n}x, \tag{13.29}$$

where the measure is now only conditionalized on the fact that the wave function of the subsystem is .ψ Y = ϕ. From this formula, we can derive a law-of-large-numbers result for ensembles of subsystems. We consider an ensemble .X = (X1 , . . . , XM ) of M independent subsystems with the same wave function .ϕ. .Xi ∈ R3n denotes the actual configuration of the i’th subsystem.10 So we have

9 Let $B = \bigcup_i B_i$ with the $B_i$ pairwise disjoint and $\mu(A \mid B_i) = a$ for all $B_i$. Then we have $\mu(B)\,a = \sum_i \mu(A \mid B_i)\,\mu(B_i) = \sum_i \mu(A \cap B_i) = \mu(A \cap B)$ and thus $\mu(A \mid B) = \frac{\mu(A \cap B)}{\mu(B)} = a$.
10 The analysis for “time-like ensembles,” i.e., consecutive measurements on the same system, is mathematically more involved and carried out in Dürr et al. (1992).


$$\psi^Y(x_1, x_2, \ldots, x_M) = \prod_{i=1}^{M} \varphi(x_i),$$

and (13.29) becomes

$$\mu^\Psi\Big( Q = (X, Y) : X_1 \in \mathrm{d}^{3n}x_1, \ldots, X_M \in \mathrm{d}^{3n}x_M \,\Big|\, \psi^Y = \otimes^M \varphi \Big) = \prod_{i=1}^{M} |\varphi(x_i)|^2\, \mathrm{d}^{3n}x_i. \tag{13.30}$$

We are now looking at a product measure for which the law of large numbers can be readily applied. For any $A \subseteq \mathbb{R}^{3n}$, the indicator function $\mathbb{1}_{\{X_i \in A\}}$ is 1 if the configuration $X_i$ is in A and 0 otherwise. The function $\rho_{\mathrm{emp}}^{M}[A] = \frac{1}{M}\sum_{i=1}^{M} \mathbb{1}_{\{X_i \in A\}}$ thus returns, for any universal configuration Q, the relative frequency of subsystems whose particle configurations are in A. With (5.3) we get, for any $A \subseteq \mathbb{R}^{3n}$ and $\epsilon > 0$,

$$\mu^{\Psi_t}\!\left( Q : \left| \frac{1}{M}\sum_{i=1}^{M} \mathbb{1}_{\{X_i \in A\}}(Q) - \int_A |\varphi(x)|^2\, \mathrm{d}^{3n}x \right| \geq \epsilon \right) \leq \delta(\epsilon, M), \quad \text{with } \delta(\epsilon, M) \to 0,\ M \to \infty. \tag{13.31}$$

That is, for nearly all possible configurations of the universe, the particle configurations in an ensemble of subsystems with wave function $\varphi$ are (approximately) distributed according to $|\varphi|^2$. We can make use of the equivariance to formulate the result in terms of typical initial conditions of the universe:

$$\mu^{\Psi_0}\!\left( Q : \left| \frac{1}{M}\sum_{i=1}^{M} \mathbb{1}_{\{X_i \in A\}}(Q(t)) - \int_A |\varphi_t(x)|^2\, \mathrm{d}^{3n}x \right| \geq \epsilon \right) \leq \delta(\epsilon, M), \quad \text{with } \delta(\epsilon, M) \to 0,\ M \to \infty. \tag{13.32}$$
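To get a feel for the law-of-large-numbers statement (13.31), here is a small Monte Carlo sketch (my own, with an arbitrary toy wave function): M subsystem configurations are drawn independently from a |φ|²-distribution, and the empirical frequency of configurations in a set A is compared with the Born probability of A for growing M.

```python
# Monte Carlo illustration of (13.31): empirical frequencies in an ensemble of
# |phi|^2-distributed subsystem configurations approach the Born probability
# of a set A, with fluctuations shrinking roughly like 1/sqrt(M).
# One-dimensional toy wave function; all choices are illustrative.
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(42)

def sample_from_born(M):
    """Draw M subsystem configurations from |phi|^2, here a 50/50 mixture of
    two unit-width Gaussians centered at -2 and +2 (a toy 'two-bump' packet)."""
    centers = rng.choice([-2.0, 2.0], size=M)
    return rng.normal(loc=centers, scale=1.0)

# The set A = [1, infinity); its Born probability, integral over A of |phi|^2:
born_prob = 0.5 * 0.5 * erfc((1 - 2) / sqrt(2)) + 0.5 * 0.5 * erfc((1 + 2) / sqrt(2))

for M in [100, 10_000, 1_000_000]:
    X = sample_from_born(M)
    rho_emp = np.mean(X >= 1.0)         # relative frequency of configurations in A
    print(f"M = {M:>9}: empirical = {rho_emp:.4f}, Born = {born_prob:.4f}, "
          f"|diff| = {abs(rho_emp - born_prob):.4f}")
```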


Table 13.1  Comparison: Boltzmann distribution as the typical distribution of classical subsystems in thermal equilibrium (left column) and Born distribution as the typical distribution of Bohmian subsystems in quantum equilibrium (right column)

                                        Boltzmann                         Born
Generator of particle dynamics          H = H_1 + H_ext                   Ψ = φχ + Ψ^⊥
Typicality measure                      δ(H − E) d^{3N}q d^{3N}p          |Ψ|^2 d^{3N}q
Typical empirical distribution (≈)      Z^{-1} exp(−βH_1)                 |φ|^2

In any case, we have found

$$\rho_{\mathrm{emp}}[\mathrm{d}x] \overset{\mathrm{Typ}}{\approx} |\varphi|^2\, \mathrm{d}x, \tag{13.33}$$

for a sufficiently large ensemble of subsystems with conditional/effective wave function .ϕ. The Bohmian laws thus make Born statistics typical. This justifies Born’s statistical hypothesis and clarifies its meaning. Just as in classical mechanics, probabilities in Bohmian quantum mechanics refer to typical relative frequencies. It is instructive to compare this result with (7.12) and the derivation of the Boltzmann distribution in classical mechanics (see Table 13.1). In essence, it is Boltzmann’s statistical mechanics applied to two different theories. Just as the canonical ensemble characterizes classical subsystems in thermal equilibrium, Born’s statistical hypothesis characterizes Bohmian subsystems in quantum equilibrium. And with respect to this feature, i.e., the statistical distribution of particle positions, our universe is indeed in equilibrium. A mathematically helpful but didactically unfortunate fact is that, in contrast to the classical case, the typicality measure in Bohmian mechanics has the same functional form in terms of the universal wave function as the typical empirical distributions in terms of the wave functions of subsystems. If one does not appreciate Boltzmann’s argument and the different meaning and status of the mathematical objects involved, this can easily give rise to the impression that the derivation of Born’s rule is circular: .Ψ 2 in, .ϕ 2 out; what’s the big deal? So let me summarize again: The role of the measure .μΨ defined in terms of the universal wave function is to define typicality. It does not describe an “ensemble


of universes” (whatever that would mean), nor is it an expression of our ignorance. This typicality measure is distinguished before all others by stationarity, more precisely, equivariance. By contrast, the .|ϕ|2 -measure given in terms of the conditional or effective wave function refers to statistical distributions in ensembles of “identically prepared” subsystems. .ρemp is the actual distribution of particle configurations in such an ensemble of subsystems with wave function .ϕ. And the fact that .ρemp ≈ |ϕ|2 is typical with respect to .μΨ , i.e., holds for nearly all possible initial configurations of a Bohmian universe, is neither postulated nor assumed but proven. Several non-trivial features of the deterministic Bohmian laws play into this result, and all known empirical predictions of quantum mechanics follow from it. A pretty big deal indeed.

13.4.2 Absolute Uncertainty

So, probabilities in Bohmian quantum mechanics are what they were in classical mechanics: Typical relative frequencies grounded in deterministic laws. Nonetheless, there appear to be striking differences between classical and quantum mechanics that we need to account for. In particular, also in Bohmian mechanics, one cannot do better than making empirical predictions based on Born statistics, while in classical physics, we rely on the statistical method only some of the time. We discussed in Chap. 7 why this is somewhat deceptive; why, at the end of the day, even the ostensibly deterministic phenomena predicted by Newtonian mechanics are statistical in nature and predicted as typical regularities of many-particle systems. Still, the question remains as to why the quantum realm appears to us so much more random and unpredictable. Part of the answer is trivial. Quantum mechanics is primarily used to describe microscopic systems, while Newtonian mechanics is successfully applied on macroscopic scales. Macroscopic predictions are bound to be more robust against our ignorance about the micro-conditions. Of course, we should think of quantum mechanics as the more fundamental theory from which classical mechanics should emerge in an appropriate “classical limit.” In Bohmian mechanics, this refers to situations in which the Bohmian trajectories look approximately Newtonian on macroscopic


scales (see Allori et al. (2002) or (Dürr & Teufel, 2009, Chap. 9)). This means, however, that the successful “deterministic” predictions of classical mechanics are also—and more fundamentally—predictions of Bohmian mechanics. Furthermore, we have seen that the predictability of a system depends on our ability to describe it, at least effectively, as independent of external influences. The nonlocality of quantum mechanics makes this a particularly delicate issue. Newtonian gravity is also nonlocal, but only in a milder sense. Forces fall off quickly with increasing distance (and gravity is very weak, to begin with) so that parts of the universe can often be described as practically autonomous systems. Quantum mechanics, by contrast, has a distinctly holistic character. In Bohmian mechanics, this is manifested in nonlocal dynamics, in which the entire configuration of particles is guided by a common wave function. Quantum entanglement (or what Maudlin (2011) calls the “quantum connection”) is universal and does not fall off with distance. This makes it much more difficult to consider subsystems as isolated while ignoring the influence of the rest of the universe. Fortunately, many relevant situations allow for an autonomous Bohmian description of a subsystem in terms of an effective wave function. Einstein’s worry that nonlocality would make the investigation of nature by local experiments impossible (see Einstein (1948)) was not borne out. Still, we have to be careful since the effective wave function depends implicitly on the environment configuration (e.g., on the procedure used to prepare the state in an experiment) via (13.25). More precisely (and more profoundly), the information that we can possess about the configuration of a Bohmian subsystem is restricted by the theorem of absolute uncertainty (Dürr et al., 1992), which has no analog in classical physics. “Information” here is understood very prosaically as a correlation between the configuration of the subsystem and the configuration of some other system—a brain, a measurement device, a notebook—that could constitute a record. Absolute uncertainty is then a direct consequence of the conditional probability formula (13.28): All external records about the subsystem are included in the particle configuration Y of the rest of the universe and thus already taken into account (i.e., conditionalized on) in (13.28), which leads to Born’s rule for the distribution of particle positions. The theorem of absolute


uncertainty thus states that, if the wave function of a subsystem is .ϕ, an external observer cannot have more information about the particle configuration of that system than is provided by the .|ϕ|2 -distribution. Conversely, this means that if we perform additional measurements to determine the particle positions with greater accuracy, the effective wave function of the system will become more and more peaked.11 As a consequence, the gradient in the guiding equation (13.21) will induce higher and higher possible velocities that vary ever stronger depending on the exact particle configuration X. Less uncertainty about the particle positions at time .t0 thus implies more uncertainty about the positions at .t0 + T —this is the source of Heisenberg’s uncertainty principle. Absolute uncertainty is a feature of quantum equilibrium. And this may be the deepest explanation for our epistemic limitations in the quantum realm: Our universe is in quantum equilibrium, but macroscopically in (thermodynamic) non-equilibrium. And it is non-equilibrium that allows for more informative correlations between subsystems. Just as Newtonian subsystems in thermal equilibrium are always Boltzmann distributed (with different temperatures and effective Hamiltonians), Bohmian subsystems are always Born distributed (with different effective wave functions). In conclusion, the “randomness” of quantum mechanics is the result of quantum equilibrium and quantum nonlocality, which are such that a system becomes immediately more chaotic as we try to determine the micro-conditions with greater accuracy. This forces us to resort much more routinely to probabilistic reasoning than in classical physics. For quantum systems, Born’s rule provides—provably—as good a description as we can get of a universe in quantum equilibrium.

11 Absolute uncertainty is sometimes misunderstood as implying the impossibility of locating Bohmian particles with arbitrary precision. What it actually implies is the impossibility of locating Bohmian particles with arbitrary precision without affecting their quantum state.


Why Determinism?

Since the empirical predictions of Bohmian mechanics are based on the Born rule, i.e., the quantum equilibrium distributions, they will agree with the predictions of standard quantum mechanics whenever the latter are unambiguous. I could now point to cases in which standard quantum mechanics is ambiguous12 and to others in which the phenomena are more naturally interpreted in Bohmian terms.13 However, at the end of the day, the testable predictions of the Bohmian theory come from its statistical analysis rather than initial value problems for individual trajectories. That the phenomena are necessarily “random” in this sense is often regarded as one of the great innovations of quantum mechanics. Demonstrably false are the claims that this must be the result of some fundamental, even metaphysical indeterminacy. True is that quantum theory entails rigorous limits on our epistemic access to the microcosm as manifested in the theorem of absolute uncertainty (see Cowan & Tumulka (2016) for an analogous result about collapse theories). But the idea that, as a practical matter, the situation was very different in classical mechanics has always struck me as naive. Who has ever thought it feasible to determine the initial conditions of Newtonian particles with infinite precision? And did classical statistical mechanics not already confront us with phenomena that are random in every practical sense? What Einstein said about Brownian motion in his 1910 lecture “Über das Boltzmann’sche Prinzip und einige unmittelbar aus demselben fliessender Folgerungen”14

12 E.g., the above-mentioned problem of arrival times (Das & Dürr, 2019), or “Wigner’s friend” gedankenexperiments, in which quantum measurements are performed on human observers (Lazarovici & Hubert, 2019).
13 Such as weak measurements of particle trajectories; see, e.g., Dürr and Lazarovici (2020, Chap. 8) for a discussion.
14 Archived by Physikalische Gesellschaft Zürich. http://www.pgz.ch/history/einstein/index.html


thus translates perfectly to the situation in Bohmian mechanics, if not in physics in general: If we now conclude by asking once again the question, “Are the observable physical facts completely causally linked with one another?” we must firmly deny it. […] According to the theory, one would need, in order to [compute the trajectories], to know the position and velocity of every single molecule, which seems impossible. Nevertheless, the laws of mean values that have proven themselves all over, as well as the statistical laws of fluctuations applicable in those areas of subtle effects, convince us that we must adhere to the principle of a complete causal link between the occurrences in the theory, even if we cannot hope to ever obtain direct confirmation of this view through refined observations of nature. (Translation D.L.)

What gives us trust in the microscopic theory is not primarily the intuitive appeal that determinism and particle trajectories may have, but the naturalness and coherence with which it grounds the empirical (statistical) phenomena. For most practical purposes, however, we could do just as well by postulating, rather than deriving, a bunch of phenomenological (probabilistic) rules—which is essentially what textbook quantum mechanics is doing. The final irony is that, while Einstein’s derivation of Brownian motion proved instrumental in fostering the acceptance of the atomic theory at the beginning of the twentieth century, the Copenhagen school of quantum mechanics soon thereafter led to a regress into the Machian positivism that called atomic particles a “figment of the imagination” (Hirngespinste). No physicist today will flat out deny the existence of atoms and more elementary particles, but many will at least pay lip service to the idea that “particle” refers only to some abstract state characterized by its observable properties.

278

D. Lazarovici

distinguishable from Bohmian mechanics (but see Tilloy and Wiseman (2021) for a surprising caveat), while various “interpretations” of the textbook formalism that insist on some form of irreducible randomness operate at a very different standard of conceptual clarity and mathematical rigor.

13.4.3 Thermodynamic Arrow in Bohmian Mechanics If our universe, conceived as a Bohmian universe, is in quantum equilibrium, where does the thermodynamic arrow come from? The received view is that it comes from the universal wave function, which is in a non-equilibrium state—while relative to this wave function, the particle configuration is in quantum equilibrium. To make this more precise, I will briefly summarize the generalization of Boltzmann’s statistical mechanics to quantum states, according to Goldstein et al. (2010). We can consider a wave function .Ψ as a microstate in a Hilbert space .H. More precisely, we shall restrict the system to a finite-dimensional energy shell, corresponding to a degenerate eigenvalue of the Hamiltonian. Furthermore, we consider a partition H=



.



(13.34)

α∈A

of the Hilbert space (energy shell) into orthogonal subspaces of varying though finite dimension, usually determined by a set of relevant observables. The subspaces correspond to the Boltzmannian macro-regions, and their respective quantum Boltzmann entropy is S(Hα ) := kB log (dim Hα ) ,

.

(13.35)

where .dim Hα is the dimension of the subspace .Hα . The equilibrium region is, as always, the region of maximal entropy with .dim Heq ≈ dim H. In contrast to classical mechanics, however, a quantum state can be in a superposition of different macrostates, i.e., .Ψ need not lie entirely in any of the subspaces making up the partition. Thus, we shall say that

13 Quantum Mechanics

279

Ψ realizes the macrostate corresponding to .Hα iff .〈Ψ | Pα | Ψ0 〉 ≈ 1, where .Pα is the projection onto .Hα . In Bohmian mechanics, it also makes sense to say that the macrostate is determined by the branch of the wave function that actually guides the particle configuration.

.

Remark 8 (Decoherence) The branching of the wave function, i.e., decoherence, is itself one of the most important examples of a thermodynamically irreversible process. Heuristically, this can be understood as follows. Reversing the thermodynamic evolution of a Newtonian N -particle system would require an exact reversal of N particle velocities. Similarly, bringing two macroscopic wave packets back into interference—i.e., to overlap on configuration space—would require precise control over N phases of the one-particle wave components. Simply put, each dimension of configuration space is one along which the wave packets might fail to “meet.” Unfortunately, I am not aware of any treatment that makes the connection with the quantum Boltzmann entropy (13.35) precise. Returning to the thermodynamic arrow in Bohmian mechanics, one makes a past hypothesis for the universal wave function—that it has started out in a macro-region of very low (quantum) Boltzmann entropy and is still in a low-entropy state. It is fascinating to think about how, according to this picture, the phenomena of our world arise from the interplay between quantum equilibrium (on the level of the particles) and thermodynamic non-equilibrium (on the level of the wave function).15 Of course, it also raises the usual worries about the Past Hypothesis, in this case, about the atypicality of our universe with respect to its quantum state. Moreover, Bohmian mechanics opens up the interesting possibility of a stationary wave function of the universe, satisfying a constraint Schrödinger equation of the form Hˆ Ψ = 0

.

(13.36)

15 Cf. Dürr et al. (2013, p. 65). See Chen (2021) for an interesting proposal to unite the two levels by conceiving the universal quantum state as represented by a density matrix.


(similar to the Wheeler–DeWitt equation in canonical quantum gravity). In contrast to orthodox quantum theories, Bohmian mechanics would not be hit by the "problem of time" (e.g., Kiefer (2015)). The particle configuration and effective wave functions of subsystems would, in general, evolve even if the universal wave function did not. This option seems particularly attractive if one favors a nomological interpretation of the wave function according to which Ψ is part of the physical laws rather than a beable over and above the particles (Dürr et al., 1997; Esfeld, 2014b; Esfeld et al., 2014; Goldstein & Zanghì, 2013).

But how could there be an arrow of time if the wave function is stationary and the particles are in quantum equilibrium? There is no good answer, only an intriguing research program. Could we combine quantum equilibrium with the ideas of Julian Barbour and Sean Carroll (Chap. 11) that establish macroscopic irreversibility without the assumption of a special initial state and maybe even without reference to an external time? If we understand time relationally, that is, with respect to the change in a physical parameter θ (playing the role of a universal "clock"), the conditional wave function Ψ_t^θ = Ψ(·, θ(t)) will evolve, even if Ψ itself doesn't. (Here, t is an arbitrary parameterization for the Bohmian dynamics of θ and not physical time.) The desired result would thus be roughly the following: a natural, possibly non-normalizable, stationary wave function of the universe that makes a "thermodynamic" arrow with respect to an appropriate physical time parameter typical.

13.5 Born’s Rule in the Many-Worlds Theory Hugh Everett III is the father of the Many-Worlds theory, although the name was only later introduced by Bryce DeWitt, and there is some historical dispute about whether it does justice to Everett’s intended interpretation (Barrett, 2011). Undisputed is Everett’s insistence that we must take quantum theory seriously on all scales. He introduced the concept of the universal wave function that now plays a fundamental role in all quantum theories without observers (Everett, 1956). Everett recognized that the shifty split between the microscopic quantum regime and the macroscopic classical regime could not stand if quantum mechanics was


supposed to provide a coherent description of nature. In contrast to David Bohm (1952a,b), however, he refused to introduce additional variables into the theory but insisted on “pure wave mechanics,” defined only in terms of the universal wave function and the linear Schrödinger equation. Today, it is generally accepted that such a theory leads to a many-worlds picture, in which decoherent branches of the wave function describe a multitude of different but coexisting macro-histories. One may certainly find such a theory bizarre or extravagant. John Bell (2004) called it “above all …extravagantly vague” (p. 194). His main point of criticism was not the many worlds per se but how the theory is supposed to describe any world at all. Everettians share the belief that if we look at the wave function or quantum state the right way, we can locate cats and dogs and measurement devices with pointer positions in it, but they disagree, at times fundamentally, on what the right way of looking is. Everett’s original relative-state formulation (see (Barrett, 2018)), which was still based on observable operators and plagued by the “preferred basis problem,” is of little relevance today. The most honest attempts at discerning the empirical content of Everettian quantum mechanics fall under the aforementioned program of quantum state functionalism. But these attempts can still disagree on such basic questions as whether the fundamental arena of the theory is three-dimensional space (respectively four-dimensional spacetime) or the 3N -dimensional space on which the universal wave function is defined. It is thus only with reservations that one could speak of the Many-Worlds theory. And it is not a trivial concession to make—although I am going to make it—that pure wave mechanics can make contact with observable physical reality in the first place.

13.5.1 Probabilities of What? The challenge I will focus on instead is how to ground the statistical predictions of quantum mechanics, i.e., the Born rule, in Everettian quantum mechanics. The problem is not that Everettian quantum mechanics is deterministic (there is only one equation, the Schrödinger equation, which is deterministic). Large parts of this book have been devoted to


discussing how objective probabilities can be grounded in deterministic laws, including the derivation of Born's rule in Bohmian mechanics. The problem that arises in the context of Many-Worlds is probabilities of what? In particular, what could it even mean to ask about the probability for a specific measurement outcome if, according to the theory, all possible outcomes actually occur?

An obvious idea is that the probability of a certain measurement outcome refers to the relative frequency of worlds in which the said outcome occurs. In the popular version of the Many-Worlds interpretation, it is assumed that, e.g., a spin measurement on a "particle" in the spin state ψ = α|↑z〉 + β|↓z〉 (with |α|² + |β|² = 1, α, β ≠ 0) will result in exactly two worlds, one in which the outcome is spin UP and the other in which the outcome is spin DOWN. The problem is then that this "branch counting" is clearly at odds with the probabilities predicted by quantum mechanics. The relative frequency of each outcome would always be 1/2, which agrees with the Born probabilities |α|² and |β|² only in the special case α = β = 1/√2. According to more sophisticated (decoherence-based) versions of Everettian quantum mechanics, the number of distinct world branches is not even well defined, and naive branch counting is a nonstarter.16 This might be metaphysically unsettling—we must not merely accept that there are two cats at the end of Schrödinger's experiment but that there are an indefinite number of them—but it explains why we don't end up with the wrong statistics that would result from simply counting discrete worlds.

Finding it hard to locate interesting probabilities in the Everettian multiverse, the next obvious idea is to locate them in our minds, i.e., interpret them subjectively. For instance, after I perform a spin

16 As Wallace (2012) explains: "[T]here is no sense in which [decoherence] phenomena lead to a naturally discrete branching process: as we have seen in studying quantum chaos, while a branching structure can be discerned in such systems, it has no natural 'grain'. To be sure, by choosing a certain discretization of (configuration-)space and time, a discrete branching structure will emerge, but a finer or coarser choice would also give branching. And there is no 'finest' choice of branching structure: as we fine-grain our decoherent history space, we will eventually reach a point where interference between branches ceases to be negligible, but there is no precise point where this occurs. As such, the question 'How many branches are there?' does not, ultimately, make sense." (pp. 99–100)


measurement—but before I look at the detector to see the result—I do not know if I find myself on a branch in which the detector registered “spin UP” or on a branch in which the detector registered “spin DOWN.” What should my credence be for one or the other? If someone offers me a 2:1 bet on “spin UP,” should I accept it? The chances, in this case, arise from my self-locating uncertainty (Sebens & Carroll, 2018; Vaidman, 1998). I do not know what world within the Everettian multiverse my present self inhabits, and the goal of a theoretical analysis would be to show that it is rational to assign degrees of belief according to the Born rule. Other authors, most notably Deutsch (1999) and Wallace (2012), have taken a more decision-theoretic perspective, trying to argue that it is rational to act (and, in particular, bet) in accordance with the probabilistic predictions of quantum mechanics. In this vein, Wallace proposes a set of ten axioms to justify the use of the branch amplitudes squared for calculating expected utilities in decision problems. Maudlin (2014) points out that these axioms do not allow a rational agent to split a payoff among two or more of her descendants, i.e., to see any utility in the option all of the above. “If one were mischievous, one might even put it this way: Wallace’s ‘rationality axioms’ entail that one should behave as if one believes that Everettian quantum theory is false” (p. 804). I am not going to pay any of these approaches the attention they would deserve based on their ingenuity alone. My remarks from Sect. 2.3 apply as to why I think epistemic probabilities are missing the point. What is at stake here is the empirical adequacy of the Many-Worlds theory. First and foremost, the theory has to account for well-established statistical regularities, not physicists’ beliefs or betting behaviors. Certain credences and decisions may be rational in virtue of the theory’s physical predictions, but we first have to figure out what the relevant predictions are.
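For what it is worth, the 2:1 bet can be made arithmetically explicit. The following minimal sketch (my own, with made-up stakes) simply computes the Born-weighted expected payoff; it illustrates the decision-theoretic framing without endorsing it.

```python
def expected_payoff(alpha_sq: float, win: float = 2.0, loss: float = -1.0) -> float:
    """Born-weighted expected payoff of accepting a hypothetical 2:1 bet on
    'spin UP': win 2 units on UP, lose 1 unit on DOWN (made-up stakes)."""
    return alpha_sq * win + (1.0 - alpha_sq) * loss

for alpha_sq in (0.2, 1 / 3, 0.5, 0.8):
    ep = expected_payoff(alpha_sq)
    print(f"|alpha|^2 = {alpha_sq:.3f}: expected payoff = {ep:+.3f} "
          f"-> {'accept' if ep > 0 else 'decline'}")
# Break-even at |alpha|^2 = 1/3; weighting both branches equally ("branch
# counting") would instead recommend accepting for any alpha.
```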

13.5.2 Everett’s Typicality Argument In light of the ongoing debates about the justification of Born’s rule in Everettian quantum mechanics, it might be a surprising claim that Everett’s original account (presented in his 1956 doctoral thesis and summarized in his short 1957 paper) is still the most satisfying by far.


Everett’s account is based on a typicality argument—and thus on objective probability assignments—not unlike the one we discussed in Bohmian mechanics. Therein, the .|Ψ|2 -measure determined by the universal wave function (i.e., the branch amplitudes squared) defines a typicality measure on world branches which is used to identify statistical regularities that hold in the vast majority of branching histories. Probabilities are once again typical relative frequencies, except that typicality is understood with respect to an actual ensemble of worlds existing within the Everettian multiverse. As Everett explained: We wish to make quantitative statements about the relative frequencies of the different possible results of observation—which are recorded in the memory—for a typical observer state; but to accomplish this we must have a method for selecting a typical element from a superposition of orthogonal states. […] The situation here is fully analogous to that of classical statistical mechanics, where one puts a measure on trajectories of systems in the phase space by placing a measure on the phase space itself, and then making assertions …which hold for “almost all” trajectories. […] However, for us a trajectory is constantly branching (transforming from state to superposition) with each successive measurement. To have a requirement analogous to the “conservation of probability” in the classical case, we demand that the measure assigned to a trajectory at one time shall equal the sum of the measures of its separate branches at a later time. This is precisely the additivity requirement which we imposed and which leads uniquely to the choice of square-amplitude measure. (Everett, 1957, pp. 460–461)

Just like Boltzmann in classical statistical mechanics and Dürr, Goldstein, and Zanghì in Bohmian mechanics, Everett appeals to a form of stationarity to justify the choice of typicality measure. More precisely, he stipulates three requirements that distinguish the measure uniquely (see Barrett (2016) for an excellent discussion).

1. It should be a positive function of the complex-valued coefficients associated with the branches of the superposed wave function.
2. It should be a function of the amplitudes of the coefficients alone.


3. It should satisfy the following additivity requirement: If a branch b is decomposed into a collection {bi} of sub-branches, the measure assigned to b should be the sum of the measures assigned to the sub-branches bi.

This last additivity condition can be understood diachronically as stationarity: the weight assigned to a world at any given time equals the sum of the weights assigned to its branching histories at later times. This also assures that the weight of a world branch does not change upon the splitting of other branches. Understood synchronically, the additivity condition does away with the problem that the notion of a world branch is unsharp. Whether one regards some component of the wave function as corresponding to one world (in which, let's say, a particular measurement outcome occurs) or further subdivides it into two or ten or a million distinct world branches (with the same measurement outcome, but possibly different with respect to a finer-grained description), the total measure remains the same. In other words, the amplitude-squared weight assigned to a class of worlds with a certain characteristic is well defined, even if the number of worlds in that class is not.

To see how the typicality argument proceeds, we consider the paradigmatic example of a series of spin measurements performed on identically prepared electrons in the state

ϕ = α|↑z〉 + β|↓z〉,  |α|² + |β|² = 1.

Of course, we have to assume (though this is not at all trivial to justify) that we can start with such wave functions of subsystems to which the Born rule is actually applied. Now we denote by |⇑〉 and |⇓〉 the states of the measurement device—and, in the last resort, the rest of the universe—that have registered "spin up" and "spin down," respectively. After the first measurement, the joint (and, ultimately, universal) wave function will be in the decoherent superposition

Ψ = α|↑z〉₁|⇑〉 + β|↓z〉₁|⇓〉,    (13.37)


where index 1 indicates the first round of the experiment. Notably, this decomposition of the universal wave function corresponds to a very coarse-grained partition of the Everettian multiverse. In particular, no assumption is made about how many numerically distinct copies of the measurement device indicating "spin up" a term like α|⇑〉|↑z〉₁ represents or even whether there is a well-defined number. With the second measurement, the wave function splits anew:

Ψ = α²|↑z〉₂|↑z〉₁|⇑⇑〉 + βα|↓z〉₂|↑z〉₁|⇓⇑〉 + αβ|↑z〉₂|↓z〉₁|⇑⇓〉 + β²|↓z〉₂|↓z〉₁|⇓⇓〉.

The first three steps of the branching process are shown in Fig. 13.1. The conservation of the measure in each branch can be readily verified. For instance, along the history on the very left, we have after the second measurement:

|α|⁴ + |α|²|β|² = |α|²(|α|² + |β|²) = |α|².

After n rounds of spin measurements, the total weight of branches in which the outcome "spin up" was registered exactly k times is (n choose k) |α|^{2k} |β|^{2(n−k)}. Writing |α|² =: p and |β|² = 1 − p, we recognize this as a Bernoulli process with n independent trials and "success" probability p. According to the law of large numbers, the typical relative frequencies for spin UP for large n are thus k/n ≈ p = |α|², matching the Born statistics predicted by standard quantum mechanics.
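The concentration claimed here can be checked directly. Below is a small Python sketch (my own illustration, with a made-up value |α|² = 0.8 and tolerance δ = 0.05) that sums the binomial branch weights: the amplitude-squared measure of branches whose relative frequency of UP lies within δ of |α|² approaches 1 as n grows, whereas merely counting branches concentrates the frequency at 1/2.

```python
from math import comb

def weight_near(n, p, center, delta):
    """Total amplitude-squared branch measure C(n,k) p^k (1-p)^(n-k),
    summed over k with |k/n - center| <= delta."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(k / n - center) <= delta)

def count_near(n, center, delta):
    """Fraction of the 2^n branches (counted, not weighted) with
    |k/n - center| <= delta."""
    hits = sum(comb(n, k) for k in range(n + 1) if abs(k / n - center) <= delta)
    return hits / 2**n

alpha_sq, delta = 0.8, 0.05  # made-up Born weight |alpha|^2 and tolerance
for n in (10, 100, 1000):
    print(f"n = {n:4d}: weight near 0.8 = {weight_near(n, alpha_sq, alpha_sq, delta):.4f}, "
          f"count near 0.8 = {count_near(n, alpha_sq, delta):.4f}, "
          f"count near 0.5 = {count_near(n, 0.5, delta):.4f}")
# Typicality with respect to the amplitude-squared measure yields relative
# frequency ~ |alpha|^2, while branch counting concentrates the frequency at 1/2.
```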

Fig. 13.1 Branching Many-Worlds histories after three spin measurements. Successive arrows indicate successive outcomes. Adapted from Barrett (2016)


13.5.3 Living and Dying in the Multiverse What have we accomplished in regard to the "probability problem"? Everett's analysis establishes that quantum statistics hold across typical histories of the constantly branching multiverse. One would now like to conclude with an empirical prediction and say something like: "Hence, I should expect to experience a typical history in which the Born statistics hold." But the indexical I does not pick out an individual with a unique future history. My current branch will split repeatedly, and there are going to be future versions of me who experience very different statistics. Those who regard the justification of Born's rule primarily as a problem of decision-making can conclude that they should follow the Born rule to maximize utility among typical future selves. However, as argued before, this is not tantamount to an empirical prediction and somewhat beside the point.

I see no way around the conclusion that the Many-Worlds theory lacks a certain predictive quality. When we ask what statistical regularity we will observe, the answer is always that any possible sequence of outcomes will be observed by some future versions of ourselves. I believe, however, that Everett's typicality argument successfully grounds post-factum explanations. When I lie on my deathbed and wonder why I have experienced a history consistent with quantum mechanics, I will die in peace knowing that this is typical, that nearly all Many-Worlds histories—in the most natural sense of "nearly all" that the theory allows—manifest statistics consistent with Born's statistical hypothesis.

Wilhelm (2022) makes the interesting observation that this typicality explanation is manifestly distinct from probabilistic explanations if we agree that the latter presuppose that only one of various alternatives is actually realized: "[I]n Everettian quantum mechanics, the various possible outcomes of any given experiment all obtain. Everett himself makes this point: it would be a mistake, he says, to think of just one outcome as obtaining, to the exclusion of the others. So the sequences of outcomes other than the one invoked in the explanandum …occur too. But in probabilistic explanations, that cannot happen. In probabilistic explanations, the event invoked in


the explanandum is the only outcome, of the various possible mutually exclusive outcomes, that occurs.”

One might try to evade Wilhelm's argument by falling back on self-locating probabilities: Only one of the copies of D.L. existing in the multiverse is the branch-indexical I. But me being me doesn't seem like the right explanandum. There is no self-locating uncertainty in the deathbed scenario; I know what life I have lived and hence what branch of the multiverse I have inhabited. For better or worse, the typicality explanation ends with the fact that the Born rule holds across the great majority of world branches. To ask, further, about the probability that I find myself on any one of the branches (as if my ego had been somehow thrown at random into the multiverse) strikes me as redundant at best and meaningless at worst.

Finally, one must wonder why modern Everettians have almost universally dismissed Everett's account of the Born rule. The most common objection is that Everett's derivation involves a circularity—an understandable misconception if this derivation is not appreciated as a typicality argument in the spirit of Boltzmann. David Wallace, in his authoritative book The Emergent Multiverse, expresses the objection very pointedly:

In his original paper (1957) [Everett] proved that if a measurement is repeated arbitrarily often, the combined mod-squared amplitude of all branches on which the relative frequencies are not approximately correct will tend to zero. And of course this is circular: it proves not that mod-squared amplitude equals relative frequency, but only that mod-squared amplitude equals relative frequency with high mod-squared amplitude. Substitute 'probability' for 'mod-squared amplitude', though, and the circularity should sound familiar; indeed, Everett's theorem (as is well known) is just the Law of Large Numbers transcribed into quantum mechanics. So the circularity in Everett's argument is just the circularity in the simplest form of frequentism, disguised by unfamiliar language. (Wallace, 2012, p. 127)

But Everett does not argue that probability equals relative frequency with high probability. He argues that relative frequencies equal mod-squared amplitudes in nearly all world branches, and there is nothing circular about that. The result is even optimal in the following sense: Born


statistics are the empirical regularity we need to explain, and to establish that they hold in nearly all branches of the multiverse is the best we can hope for, since the Born rule will certainly be false in some. In any case, Everett's argument is not based on the simplest form of frequentism but on typicality, a concept he explicitly appeals to. One aspect of the circularity objection is analogous to the spurious "|ψ|² in, |ψ|² out" objection against the quantum equilibrium analysis in Bohmian mechanics, which I already addressed. That the natural typicality measure has the same functional form (in terms of the universal wave function) as typical empirical distributions (in terms of the wave functions of subsystems) is a non-trivial feature of the theory and its dynamics, something that God may have arranged to punish people who don't pay attention to the distinction between the two concepts. In particular, we could not have run the same argument with branch amplitudes to the power k ≠ 2 as typicality measure to deduce that typical frequencies approximate branch amplitudes to the power k. Everett's typicality account is thus neither conceptually nor logically circular, and its mathematical simplicity should not blind us to the fact that it is quite profound.
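The last point is easy to verify numerically. A minimal sketch (my own, with randomly chosen sub-branch amplitudes): if a branch of amplitude c splits into sub-branches c_i with Σ|c_i|² = |c|², then Σ|c_i|^k reproduces |c|^k for k = 2 but not, in general, for any other exponent; only the amplitude-squared measure satisfies Everett's additivity requirement.

```python
import numpy as np

rng = np.random.default_rng(1)

# A branch with amplitude c, decomposed into sub-branches whose squared
# amplitudes sum to |c|^2 (as decoherent branching requires).
c = 0.6 + 0.3j
subs = rng.normal(size=5) + 1j * rng.normal(size=5)
subs *= abs(c) / np.linalg.norm(subs)  # enforce sum_i |c_i|^2 = |c|^2

for k in (1.0, 2.0, 3.0):
    lhs = np.sum(np.abs(subs) ** k)    # measure assigned to the sub-branches
    rhs = abs(c) ** k                  # measure assigned to the parent branch
    print(f"k = {k}:  sum_i |c_i|^k = {lhs:.4f},  |c|^k = {rhs:.4f}")
# Only k = 2 reproduces the parent measure for arbitrary decompositions,
# which is Everett's additivity (stationarity) requirement.
```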

References Albert, D. Z. (2013). Wave function realism. In A. Ney & D. Z. Albert (Eds.), The wave function (pp. 52–57). New York: Oxford University Press. Albert, D. Z. (2015). After physics. Cambridge, Massachusetts: Harvard University Press. Allori, V. (2013). Primitive ontology and the structure of fundamental physical theories. In A. Ney & D. Z. Albert (Eds.), The wave function: Essays on the metaphysics of quantum mechanics (pp. 58–75). New York: Oxford University Press. Allori, V., Dürr, D., Goldstein, S., & Zanghì, N. (2002). Seven steps towards the classical world. Journal of Optics B: Quantum and Semiclassical Optics, 4(4), 482–488.


Allori, V., Goldstein, S., Tumulka, R., & Zanghì, N. (2008). On the common structure of Bohmian mechanics and the Ghirardi-Rimini-Weber theory. British Journal for the Philosophy of Science, 59(3), 353–389. Allori, V., Goldstein, S., Tumulka, R., & Zanghì, N. (2011). Many worlds and Schrödinger’s first quantum theory. The British Journal for the Philosophy of Science, 62(1), 1–27. Allori, V., Goldstein, S., Tumulka, R., & Zanghì, N. (2014). Predictions and primitive ontology in quantum foundations: A study of examples. British Journal for the Philosophy of Science, 65(2), 323–352. Barrett, J. (2018). Everett’s relative-state formulation of quantum mechanics. In E. N. Zalta (Ed.). The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University (winter 2018 edn.) Barrett, J. A. (2011). Everett’s pure wave mechanics and the notion of worlds. European Journal for Philosophy of Science, 1(2), 277–302. Barrett, J. A. (2016). Typicality in pure wave mechanics. Fluctuation and Noise Letters, 15(03), 1640009. Bell, J. S. (2004). Speakable and unspeakable in quantum mechanics (2nd ed.). Cambridge: Cambridge University Press. Berndl, K., Dürr, D., Goldstein, S., Peruzzi, G., & Zanghì, N. (1995). On the global existence of Bohmian mechanics. Communications in Mathematical Physics, 173(3), 647–673. Bohm, D. (1952a). A suggested interpretation of the quantum theory in terms of “hidden” variables. 1. Physical Review, 85(2), 166–179. Bohm, D. (1952b). A suggested interpretation of the quantum theory in terms of “hidden” variables. 2. Physical Review, 85(2), 180–193. Chen, E. K. (2021). Quantum mechanics in a time-asymmetric universe: On the nature of the initial quantum state. The British Journal for the Philosophy of Science, 72(4), 1155–1183. Cohen-Tannoudji, C., Diu, B., & Laloe, F. (1991). Quantum mechanics (Vol. 1, 1st ed.). New York: Wiley. Cowan, C. W. & Tumulka, R. (2016). Epistemology of wave function collapse in quantum physics. British Journal for the Philosophy of Science, 67, 405–434. Das, S. & Dürr, D. (2019). Arrival time distributions of spin-1/2 particles. Scientific Reports, 9(1), 1–8. Daumer, M., Dürr, D., Goldstein, S., & Zanghì, N. (1996). Naive realism about operators. Erkenntnis, 45(2), 379–397.


Deutsch, D. (1999). Quantum theory of probability and decisions. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 455(1988), 3129–3137. Dürr, D., Goldstein, S., & Zanghì, N. (1997). Bohmian mechanics and the meaning of the wave function. In R. S. Cohen, M. Horne, & J. J. Stachel (Eds.). Experimental metaphysics: Quantum mechanical studies for Abner Shimony (Vol. 1). Boston Studies in the Philosophy and History of Science (pp. 25–38). Netherlands: Springer. Dürr, D., Goldstein, S., Norsen, T., Struyve, W., & Zanghì, N. (2013). Can Bohmian mechanics be made relativistic? Proceedings of the Royal Society A, 470, 2162. Dürr, D., Goldstein, S., & Zanghì, N. (1992). Quantum equilibrium and the origin of absolute uncertainty. Journal of Statistical Physics, 67 (5–6), 843– 907. Dürr, D., Goldstein, S., & Zanghì, N. (2004). Quantum equilibrium and the role of operators as observables in quantum theory. Journal of Statistical Physics, 116 (1), 959–1055. Dürr, D. & Lazarovici, D. (2020). Understanding quantum mechanics: The world according to modern quantum foundations. New York: Springer International Publishing. Dürr, D. & Teufel, S. (2009). Bohmian mechanics: The physics and mathematics of quantum theory. Berlin: Springer. Earman, J. (2007). Aspects of determinism in modern physics. In J. Butterfield, & J. Earman (Eds.). Philosophy of physics. Handbook of the philosophy of science (pp. 1369–1434). Amsterdam: North-Holland. Einstein, A. (1948). Quanten-Mechanik und Wirklichkeit. Dialectica, 2, 320– 324. Esfeld, M. (2014a). The primitive ontology of quantum physics: Guidelines for an assessment of the proposals. Studies in History and Philosophy of Modern Physics, 47, 99–106. Esfeld, M. (2014b). Quantum Humeanism, or: Physicalism without properties. The Philosophical Quarterly, 64(256), 453–470. Esfeld, M. (2018). Collapse or no collapse? What is the best ontology of quantum mechanics in the primitive ontology framework? In S. Gao (Ed.). Collapse of the wave function: Models, ontology, origin, and implications (pp. 167–184). Cambridge: Cambridge University Press.


Esfeld, M., Lazarovici, D., Hubert, M., & Dürr, D. (2014). The ontology of Bohmian mechanics. British Journal for the Philosophy of Science, 65(4), 773– 796. Esfeld, M. (2020). From the measurement problem to the primitive ontology programme. In V. Allori, A. Bassi, D. Dürr, & N. Zanghi (Eds.). Do wave functions jump? Perspectives of the work of GianCarlo Ghirardi (pp. 95–108). Springer Nature. Everett, H. (1956). The theory of the universal wave function. Ph.D. thesis. Everett, H. (1957). “Relative state” formulation of quantum mechanics. Reviews of Modern Physics, 29(3), 454–462. Ghirardi, G. C., Grassi, R., & Benatti, F. (1995). Describing the macroscopic world: Closing the circle within the dynamical reduction program. Foundations of Physics, 25(1), 5–38. Ghirardi, G. C., Rimini, A., & Weber, T. (1986). Unified dynamics for microscopic and macroscopic systems. Physical Review D, 34(2), 470–491. Goldstein, S., Lebowitz, J. L., Mastrodonato, C., Tumulka, R., & Zanghì, N. (2010). Approach to thermal equilibrium of macroscopic quantum systems. Physical Review E, 81(1), 011109. Goldstein, S. & Struyve, W. (2007). On the uniqueness of quantum equilibrium in Bohmian mechanics. Journal of Statistical Physics, 128(5), 1197–1209. Goldstein, S. & Zanghì, N. (2013). Reality and the role of the wave function in quantum theory. In D. Dürr, S. Goldstein, & N. Zanghì (Eds.). Quantum physics without quantum philosophy (pp. 263–278). Berlin: Springer. Hensen, B., Bernien, H., Dréau, A. E., Reiserer, A., Kalb, N., Blok, M. S., Ruitenberg, J., Vermeulen, R. F. L., Schouten, R. N., Abellán, C., Amaya, W., Pruneri, V., Mitchell, M. W., Markham, M., Twitchen, D. J., Elkouss, D., Wehner, S., Taminiau, T. H., & Hanson, R. (2015). Loophole-free Bell inequality violation using electron spins separated by 1.3 kilometres. Nature, 526 (7575), 682–686. Kiefer, C. (2015). Does time exist in quantum gravity? Philosophical Problems in Science (Zagadnienia Filozoficzne w Nauce), 59(59), 7–24. Lazarovici, D. (2020). Position measurements and the empirical status of particles in Bohmian mechanics. Philosophy of Science, 87 (3), 409–424. Lazarovici, D. & Hubert, M. (2019). How quantum mechanics can consistently describe the use of itself. Scientific Reports, 9(1), 470. Lazarovici, D., Oldofredi, A., & Esfeld, M. (2018). Observables and unobservables in quantum mechanics: How the no-hidden-variables theorems support the Bohmian particle ontology. Entropy, 20(5), 381.


Maudlin, T. (1995). Three measurement problems. Topoi, 14, 7–15. Maudlin, T. (1997). Descrying the world in the wave function. The Monist, 80(1), 3–23. Maudlin, T. (2010). Can the world be only wave-function? In S. Saunders, J. Barrett, A. Kent, & D. Wallace (Eds.). Many worlds? Everett, quantum theory, and reality (pp. 121–143). Oxford: Oxford University Press. Maudlin, T. (2011). Quantum non-locality and relativity (3rd ed.). WileyBlackwell. Maudlin, T. (2014). Critical study David Wallace, the emergent multiverse: quantum theory according to the Everett interpretation. Oxford University Press, 2012, 530 + xv pp. Noûs, 48(4), 794–808. Maudlin, T. (2019). Philosophy of physics: Quantum theory. Princeton: Princeton University Press. Maudlin, T. (2020). The grammar of typicality. In V. Allori (ed.). Statistical mechanics and scientific explanation: Determinism, indeterminism and laws of nature. World Scientific. Mermin, N. D. (2012). Commentary: Quantum mechanics: Fixing the shifty split. Physics Today, 65(7), 8–10. Ney, A. (2021). The world in the wave function: A metaphysics for quantum physics. Oxford: Oxford University Press. Schilpp, P. (Ed.) (1949). Albert Einstein: Philosopher-scientist. Number VII in The Library of Living Philosophers. The Library of Living Philosophers Inc. (1st edn.). Evanston, Illinois. Sebens, C. T. & Carroll, S. M. (2018). Self-locating uncertainty and the origin of probability in Everettian quantum mechanics. The British Journal for the Philosophy of Science, 69(1), 25–74. Teufel, S. & Tumulka, R. (2005). Simple proof for global existence of Bohmian trajectories. Communications in Mathematical Physics, 258(2), 349–365. Tilloy, A. & Wiseman, H. M. (2021). Non-Markovian wave-function collapse models are Bohmian-like theories in disguise. Quantum, 5, 594. Vaidman, L. (1998). On schizophrenic experiences of the neutron or why we should believe in the many-worlds interpretation of quantum theory. International Studies in the Philosophy of Science, 12(3), 245–261. Vona, N., Hinrichs, G., & Dürr, D. (2013). What does one measure when one measures the arrival time of a quantum particle? Physical Review Letters, 111(22), 220404. Wallace, D. (2012). The emergent multiverse: Quantum theory according to the Everett interpretation. Oxford: Oxford University Press.


Wallace, D. & Timpson, C. G. (2010). Quantum mechanics on spacetime I: Spacetime state realism. The British Journal for the Philosophy of Science, 61(4), 697–727. Wilhelm, I. (2022). Typical: A theory of typicality and typicality explanation. The British Journal for the Philosophy of Science, 73(2), 561–581.

Part III Beyond Physics

14 Other Applications of Typicality

In this short chapter, I will discuss applications of typicality beyond statistical mechanics and probability theory. On the one hand, this will emphasize the wide scope and philosophical potential of typicality. On the other hand, the appeal to probabilistic concepts is very dubious in the following examples, so they should help to further clarify the distinction between typicality and probability.

14.1 Typicality and Well-Posedness Deterministic laws are, in general, given as differential equations of motion that lend themselves to a well-posed initial value problem. The terminology goes back to Hadamard (1902), who required of un problème bien posé that (i) a solution exists, (ii) the solution is unique, and (iii) the solution depends continuously on the initial data. At least the first two conditions are certainly necessary for the laws to determine a unique history (for a closed physical system) given suitable initial data at some time t0. A famous example of the failure of (ii) in the framework of classical mechanics is Norton's dome (Norton, 2008).


Technically, what happens in this case is that the Hamiltonian vector field fails to be Lipschitz-continuous, thus violating a premise of the Picard–Lindelöf theorem establishing existence and uniqueness of solutions of ordinary differential equations. Condition (i), existence of solutions, usually means global existence, i.e., that the solution is defined for all times (or at least all t ≥ t0). A solution X(t) that cannot be extended beyond a bounded (or half-bounded) interval (t−, t+) indicates the formation of a singularity at which the equations of motion break down. In Newtonian gravity, this happens when two point particles collide and the gravitational force between them diverges:

lim_{r→0} G m₁m₂ / r² = +∞.
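The role of the Lipschitz condition can be seen in a textbook toy example (my own sketch, not Norton's equation itself): the right-hand side √|x| is continuous but not Lipschitz at x = 0, and the initial value x(0) = 0 admits at least two solutions.

```python
import numpy as np

def f(x):
    """Right-hand side dx/dt = sqrt(|x|): continuous, not Lipschitz at x = 0."""
    return np.sqrt(np.abs(x))

t = np.linspace(0.0, 2.0, 201)

# Two distinct solutions of dx/dt = f(x) with the same initial datum x(0) = 0:
solutions = {
    "trivial": np.zeros_like(t),    # x(t) = 0
    "nontrivial": t**2 / 4.0,       # x(t) = t^2/4, so dx/dt = t/2 = sqrt(x)
}

for name, x in solutions.items():
    residual = np.max(np.abs(np.gradient(x, t) - f(x)))  # finite-difference check
    print(f"{name:10s}: x(0) = {x[0]:.1f}, max |dx/dt - sqrt(|x|)| = {residual:.2e}")
# Uniqueness fails because a premise of the Picard-Lindelof theorem (Lipschitz
# continuity of the vector field) is violated at x = 0.
```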

However, it is known that initial conditions leading to such collision singularities are atypical; they form a set of Lebesgue measure zero that is also topologically meager (Saari, 1971, 1973). More surprising is the possibility of non-collision singularities in the N-body problem, where particles escape to infinity in finite time.1 Solutions with non-collision singularities are known to exist for N ≥ 5 (Xia, 1992), but not for N ≤ 3 (Painlevé, 1897). Whether they exist for N = 4 is an open mathematical problem. Ironically, this is the only N for which proof exists that initial conditions resulting in a non-collision singularity (if there are any) must form a set of Lebesgue measure zero. Saari conjectures that this holds true for all N ≥ 4 (Saari, 2005, p. 221), and intuitively, it seems clear that only very conspiratorial behavior could lead to particles being accelerated to infinity in finite time.

For the guiding equation of Bohmian mechanics, the solution theory is more settled. For sufficiently "nice" wave functions Ψ, singularities can occur only if the particle configuration runs into a node of the wave function where the velocity field diverges. However, equivariance of the |Ψ|²-measure under the Bohmian flow already implies that Bohmian trajectories tend to avoid regions where Ψ ≈ 0. And indeed, global existence of solutions has been proven for almost all initial conditions (i.e., a set of measure 1) relative to this natural typicality measure (Berndl et al., 1995; Teufel and Tumulka, 2005).

1 The time reversal of such solutions—corresponding to particles suddenly appearing in space—has also been discussed as an example of Newtonian indeterminism, see Earman (1986, Chap. 3).


So we note that, when singularities are mathematically possible (which they are in any fundamental theory I am aware of ), mathematical physicists aim at proving the existence and uniqueness of solutions for almost all initial conditions, i.e., as typical in the strong sense of “all initial conditions except for a set of measure zero.” Such results are generally regarded as satisfactory, establishing that the laws are sound and deterministic—despite pathological “counterexamples” that receive more attention in the philosophical than in the physical literature. There is certainly an empirical rationale here. On the one hand, atypical micro-configurations leading to singularity formation would be impossible to create in practice. On the other hand, no empirical evidence could ever justify a belief that a physical system actually is in one of the “bad” microstates that run into a singularity. However, if we want to take the equations seriously as candidates for fundamental laws of nature, beliefs and practical limitations offer only so much comfort. One would rather conclude that singular solutions do not correspond to genuine physical possibilities. A singularity, after all, does not mean that the evolution of the universe suddenly stops, but that the laws themselves cease to make sense. Taking this stance, we are engaging in typicality reasoning, not with respect to a reference class of nomologically possible worlds, but with respect to a mathematical solution space. And we are satisfied with the mathematical expression of the law—and willing to accept that the relation between formal solutions and possible worlds is not one-to-one— provided that unphysical solutions are also atypical. (This is different from some people’s willingness to dismiss even large classes of well-defined solutions if they conflict with their metaphysical prejudices.) Probability does not support analogous reasoning. Impossible initial conditions are not unlikely but impossible. And whether a particular solution runs into a singularity is not random but a mathematical fact. The situation as described for Newtonian gravity and Bohmian mechanics should be compared with general relativity (GR), where singularity theorems establish the existence of spacetime singularities (in the form of geodesic incompleteness) under very general conditions (see, e.g., Hawking and Ellis (1973)). In other words, singularities in GR seem


to be generic rather than atypical. Although some of these singularity theorems are even “predictive”—in that they are taken to establish the inevitability of a Big Bang—they must be considered as negative results from a foundational perspective, pointing to an intrinsic limitation of GR and the need for a more fundamental theory of spacetime.2

14.2 Typicality and Fine-Tuning One could say that an atypical feature of the universe requires a fine-tuning of the initial micro-conditions. The possible initial states realizing it are exceedingly few and special. We have to accept some such features as brute facts. But when they correspond to a robust phenomenon that warrants explanation, there is little debate, at least among physicists, that fine-tuning is bad—if only for the reason that a theory could be fine-tuned to account for almost anything. In the literature, however, the term "fine-tuning" is more often used in different contexts, especially when referring to the specific values of physical parameters, including ones that have not been "produced" by a dynamical evolution but have the status of a constant of nature. This raises further questions about whether or not typicality reasoning substantiates claims of fine-tuning problems.

14.2.1 The Flatness Problem A famous example of an alleged fine-tuning problem is the flatness problem in standard Big Bang cosmology (which is said to have been resolved by inflationary cosmology). The puzzle, in a nutshell, is that the energy density ρ of our universe is very close to the "critical value" ρc required for a flat spatial geometry on cosmological scales. Moreover, the energy density departs very quickly from the critical value as the universe expands. The ratio ρ/ρc is commonly denoted by Ω, and while Ω ≈ 1 today, the deviation from unity would have had to be about 10^60 times

2 To be clear, for our universe, general relativity is certainly a more fundamental theory than Newtonian gravity, but the latter seems to be self-consistent in a way that GR is not.


smaller at the Planck time, shortly after the Big Bang. More precisely, from the Friedmann equations, one obtains

(Ω⁻¹ − 1) ρ a² = −3kc² / (8πG),    (14.1)

where a is the scale factor of the universe and k ∈ {−1, 0, +1} indicates negative, flat, or positive curvature, respectively. The right-hand side of (14.1) is constant. The energy density ρ on the left-hand side decreases as a⁻³ for a matter-dominated universe and even as a⁻⁴ for a radiation-dominated one. Thus, if the universe has expanded 10^60-fold, (Ω⁻¹ − 1) must have increased accordingly.
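A back-of-the-envelope sketch (my own, with hypothetical round numbers) of the scaling implied by (14.1): since the right-hand side is constant, (Ω⁻¹ − 1) grows like a for matter domination and like a² for radiation domination, so a small deviation from flatness today requires an enormously smaller one in the early universe.

```python
# Scaling implied by Eq. (14.1): (1/Omega - 1) * rho * a^2 = const.
# With rho ~ a^-3 (matter) or rho ~ a^-4 (radiation), the deviation from
# flatness grows like a or a^2, respectively, as the universe expands.

def deviation_then(dev_now, a_then, a_now=1.0, era="radiation"):
    """|1/Omega - 1| at scale factor a_then, given its value today (toy model)."""
    power = 2 if era == "radiation" else 1
    return dev_now * (a_then / a_now) ** power

dev_now = 1e-2  # hypothetical present-day deviation |1/Omega - 1| ~ 0.01
for a_then in (1e-10, 1e-20, 1e-30):
    print(f"a = {a_then:.0e} (radiation era): |1/Omega - 1| ~ "
          f"{deviation_then(dev_now, a_then):.1e}")
# A 10^30-fold expansion during radiation domination alone corresponds to a
# deviation ~10^60 times smaller; the precise factor quoted in the text
# depends on the detailed expansion history.
```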

302

D. Lazarovici

Having said this, .Ω is not an independent degree of freedom, and a proper typicality analysis would have to ask whether the observed value of .Ω is atypical with respect to the possible initial configurations of the matter and metric fields. Unfortunately, whenever we are dealing with a field theory like GR, the fundamental state space .Г is infinite-dimensional, which makes the construction of a natural typicality measure difficult (see Curiel (2015) for a discussion of this issue).3 There exists, however, a canonical measure—the GHS measure—on the reduced phase space (“minisuperspace”) of the Friedmann–Lemaître–Robertson– Walker models used in standard cosmology. And with respect to this measure, a flat universe turns out to be typical: “Thus for arbitrarily large expansions (and long times), and for arbitrarily low values of the energy density, the canonical measure implies that almost all solutions of the Friedmann–Robertson–Walker scalar equations have negligible spatial curvature and hence behave as .k = 0 models.” (Hawking and Page, 1988, pp. 803–4) “[T]he measure is entirely concentrated on exactly flat universes; universes with nonvanishing spatial curvature are a set of measure zero. […] Therefore, our interpretation is clear: almost all universes are spatially flat.” (Carroll and Tam, 2010, p. 18)

The statement of Carroll and Tam seems a little too strong. In a critical discussion, McCoy (2017) points out that, without an “artificial” regularization of the measure, the result is rather that, for any value .κ∗ of the curvature parameter .κ = ak2 , FLRW spacetimes with .κ < κ∗ form a set of infinite measure, while those with .κ ≥ κ∗ form a set of finite measure. This is a perfectly valid standard of typicality as we discussed in Chap. 6. In particular, a typicality measure—in contrast to a probability measure—does not have to be normalizable. The correct statement, however, is then that “nearly flat” (rather than “exactly flat”) spacetimes are typical—which is a perfectly satisfying result. 3 Intuitively,

it is easy to understand why a generalization of the Lebesgue measure to infinite dimensions doesn’t exist. If we think of volume naively as width .× height .× length, etc., every infinite-dimensional set would have measure 0, 1, or .∞.

14 Other Applications of Typicality

303

McCoy objects that the threshold value .κ∗ for “nearly flat” is arbitrary, so that any observed curvature could be deemed either typical or not small enough. I don’t see this problem. On the one hand, if we are asking, “Why does the large-scale structure of our universe look so flat?” the explanandum is vague, to begin with. On the other hand, as argued before, the relevant explanandum raising worries about fine-tuning is structure formation, not flatness per se, which means that the upper bound .κ∗ is not arbitrary. The universe must have been flat enough to allow for structure formation, and according to the GHS measure, a flat enough universe is typical. In sum, I disagree with McCoy’s conclusion that, mainly because of its non-normalizability, the GHS measure “cannot be used to make typicality arguments in this context” (McCoy, 2017, p. 1251). Whether we are justified in using this particular typicality measure is a more complicated debate (see, e.g., Schiffrin and Wald (2012)) that I won’t be able to settle here. Typicality avoids most of the conceptual problems associated with probabilistic reasoning in cosmology. It also mitigates the technical problems since we don’t have to insist on normalized measures and thus regularization procedures that tend to cause ambiguities. Technical challenges nonetheless remain, as it is far from clear how to construct a natural typicality measure for theories like general relativity.

14.2.2 Fine-Tuning of the Natural Constants Another famous fine-tuning problem pertains to the constants of nature. For instance, if the relative strength of the electromagnetic and strong nuclear force had not been very close to what it actually is, heavy elements (beyond hydrogen) could not have formed in stellar fusion processes. Either most of the hydrogen would have been burned in the very early universe or stellar nucleosynthesis would have been much less efficient (see, e.g., Barrow and Tipler (1986); Lewis and Barnes (2016)). Notably, the explanandum here is not that the value of the fine structure constant .α (which determines the strength of electromagnetic 1 interactions) is close to . 137 . To wonder about that strikes me as an exercise in numerology. What physics should be able to explain is the

304

D. Lazarovici

existence of heavy elements in our universe and, on a more fine-grained level, the statistical regularities regarding nuclear reactions that make their formation likely. Physical explanations, however, usually end with the fundamental laws, and the constants (with their specific values) are arguably part of them. In any case, universes with a different fine structure constant or Higgs mass are not nomologically possible worlds in the sense discussed so far, viz., worlds parameterized by initial conditions for the dynamical quantities. If someone complains that our universe is atypical with regard to the natural constants, we must therefore ask for clarification as to which reference set this typicality judgment is based upon. Most likely, it assumes a broader notion of nomological possibility, one that allows the constants to vary. But why then stop with the constants? Why not consider quantum field theories with different fields/gauge groups as representing physical possibilities? Why not different equations entirely? The idea might be that physical laws should correspond to mathematical structures—including the likes of symmetry groups but excluding numbers. The phobia of numbers, in turn, can have many reasons (e.g., the worry that their appearance in the laws might commit us to Platonism via an indispensability argument), but discussing them would take us too far. To be clear, there is little debate about the fact that theories with fewer free parameters are preferable and those with too many suspicious. But as long as constants appear in our best theories of nature, I am skeptical of the intuition that they represent something more contingent than other parts of the laws.

14.3 Typicality in Mathematics The wide scope of typicality is well illustrated by its use in “pure” mathematics. Typicality results are remarkably common in various mathematical disciplines. What seems to be lacking is a unified theory of the concept and a broader appreciation of its relevance. The following discussion is a modest attempt to improve upon this situation, but much remains to be done.

14 Other Applications of Typicality

305

One problem is the parallel use of various technical terms obscuring the fact that they express one and the same concept. Most commonly found are mathematical statements that hold almost everywhere in some measure space or for almost all elements of a set of points, numbers, functions, etc. More rarely, “small” exception sets are referred to as negligible sets. Not a bad terminology given that, in many contexts, such sets are literally irrelevant for the realization of some more “coarse-grained” property. For instance, changing finitely many elements of an infinite sequence does not affect its convergence properties. Measurable functions that differ only on a null-set of points are indistinguishable by Lebesgue integration (and thus identified in equivalence classes as elements of an p .L space). The notion of a generic property or set can also be found in the mathematical literature, more frequently (in my impression) than typical property, though they are used synonymously. Depending on the context, these notions can be explicated in terms of measure theory, cardinalities, or topological properties (e.g., meager sets; see Chap. 6). Finally, also in mathematics, typicality results often come disguised as probabilistic statements—erroneously, as some of the following examples are meant to demonstrate. Here Are Some Examples of Typicality Results in Mathematics 1. Almost all real numbers are irrational/transcendental/uncomputable. This is true in the sense of “all except for countably many” and a fortiori also in the sense of “all except for a Lebesgue null set.” 2. Almost all real numbers are normal numbers, meaning that the digits .0, . . . , (b − 1) appear with equal frequency if the numbers are expanded in the integer basis .b ≥ 2. Although there are many uncountable non-normal numbers, they form a set of measure zero. The first rigorous proof is due to (Borel, 1909). 3. A monotone function .f : (a, b) → R is almost everywhere differentiable. [Lebesgue’s theorem for the differentiability of monotone functions] 4. A bounded function .f : [a, b] → R is Riemann-integrable if and only if it is almost everywhere continuous. [Riemann–Lebesgue theorem]

306

D. Lazarovici

5. Given m linearly independent vectors .{v1 , . . . , vm } in a vector space V of dimension .n > m, almost all vectors are linearly independent of .{v1 , . . . , vm }. This is true because .x ∈ V is linearly dependent if and only if it lies in the m-dimensional subspace spanned by .{v1 , . . . , vm }. 6. A typical quadratic matrix is invertible. In the vector space .Rn×n (or n×n .C ), singular (non-invertible) matrices form an .n2 − 1-dimensional subspace and thus also a set of measure zero. Invertible matrices are also typical in the topological sense of forming an open and dense set. (Open because it is the pre-image of .R \ {0} under the continuous determinant function and dense because every singular matrix can be approximated by invertible ones.) 7. Almost all values of a smooth map between smooth manifolds are regular values. [Sard’s theorem] Given a smooth map .f : M → N , the critical set .X ⊂ M consists of those points x at which the differential .df (x) has a rank .< dim(N ). Sard’s theorem states that the image .f (X)—the set of critical values—has Lebesgue measure 0 in N (while X itself may be large). 8. Khinchin’s theorem on Diophantine approximations. Let .ψ : Z+ → R+ be a non-increasing function. A real number x is called p .ψ-approximable if there exist infinitely many rationals . such that q    p  ψ(q)  . x − . <  q q

(14.2)

Whether an (irrational) number is .ψ-approximable depends ∞ on the function .ψ. Khinchin (1926) proved that, if the series . q=1 ψ(q) diverges, almost all real numbers (in the sense of Lebesgue measure) are .ψ-approximable, and if the series converges, then almost none are. A more general statement about such “Diophantine approximations” is the Duffin–Schaeffer conjecture (Duffin and Schaeffer, 1941), a proof of which was very recently announced (Koukoulopoulos and Maynard, 2019).

14 Other Applications of Typicality

307

9. Typical graphs are asymmetric. A famous theorem by Erd˝os and Rényi (1963) establishes the following (even stronger) result: Let .Ω(n) be the set of (non-directed) graphs with n vertices (for any .n > 0, there are a total of .|Ω(n)| = n 2(2) such graphs). .Г ∈ Ω(n) is called symmetric if it has a non-trivial automorphism group, i.e., if there exists a non-trivial permutation of its vertices that leaves the graph invariant. For .∈ > 0, let .A(n, ∈) ⊂ Ω(n) be the set of graphs that cannot be transformed into a symmetric one by changing at most . n(1−∈) edges (it is always possible to obtain a 2 n−1 symmetric graph with at most . 2 changes). Then .

|Ω(n) \ A(n, ∈)| = 0. n→∞ |Ω(n)| lim

10. Every orthonormal basis in high dimensions is uniformly distributed over the unit sphere. This refers to the following result of Goldstein et al. (2017): Let .V n be an n-dimensional (real or complex) Hilbert space with n n .n ≥ 4. Let .S(V ) = {x ∈ V : ‖x‖ = 1} be the unit sphere in .V n with the uniform measure (i.e., the normalized surface area) .λ. Let n ∼ .G = O(n) or .U(n) the orthogonal or unitary group on .V and .μG the uniform (Haar) measure on G. Then, for any orthonormal basis 1 .B = {b1 , . . . , bn } and .∈, δ > 0 with .n ≥ 2 δ ∈  μG

.

    #(B ∩ R(A))    R∈G: − λ(A) ≤ δ ≥ 1 − ∈, n (14.3)

for every Borel measurable set .A ⊂ S(V n ). This may not be so easily recognizable as a typicality result since it is a statement about all orthonormal bases. Indeed, any two orthonormal bases differ by a rotation (an orthogonal or unitary transformation), so if the vectors of one basis are uniformly distributed over the sphere, it should be true for all. The tricky question, however, is what it means for n discrete points on the sphere to be “uniformly

308

D. Lazarovici

distributed.” And this is where Goldstein et al. invoke a typicality property: Given any (measurable) .A ⊂ S(V n ) and its congruent (i.e., rotated) sets .R(A), R ∈ G, nearly all of them are such that the fraction of base vectors contained in .R(A) is approximately equal to the fraction of surface area that .R(A) occupies on the sphere. It is not uncommon for mathematicians to use probabilistic language when stating some such results. Indeed, Erdös and Rényi formulate their theorem in terms of the “probability” that a graph can be transformed into a symmetric one. Goldstein et al. use the language of typicality very prominently but also refer to the test sets .R(A) as “random rotations” and announce (in the abstract) that any orthonormal basis in high dimensions “will pass the random test [for uniformity] with probability close to 1.” In a purely technical sense, these statements are perfectly correct, and the proofs of the results may indeed draw a lot from probability theory. Conceptually, though, the reference to probability is misplaced and should be abandoned in favor of typicality. Goldstein et al. define in terms of typical test sets what it means for a set of points on the sphere to be “uniformly distributed.” Then they prove that any orthonormal basis in high dimensions is uniformly distributed over the sphere, not that an orthonormal basis is probably uniformly distributed, or something like that. In particular, if one disagreed with their choice of a uniform measure on the rotation group, one would not disagree about the likelihood of finding a uniformly distributed orthonormal basis but about the very meaning of “uniformly distributed.” Helpfully, the authors draw explicit parallels between their result and other “typicality theorems about spheres in high dimensions” that are clearly non-probabilistic such as “most of the area of a sphere is near the equator” and “most of the volume of the unit ball is near the surface” (p. 703). In their work on asymmetric graphs, Erdös and Rényi formulate the n assumption that “all possible .2(2) graphs should have the same probability to be chosen” when they actually establish—in the most ordinary sense of counting—that for large n, a certain property, viz. being asymmetric, is shared by the great majority of graphs. Their theorem is, in fact, not about

choosing any graph but a typicality result about the set of all non-directed graphs of finite order. If we want to apply the theorem to a real-life situation in which a graph (or some structure isomorphic to a graph) is actually chosen or produced, a probabilistic language may become appropriate. But then we are leaving the purely mathematical realm and need to analyze whether the assumption of a uniform probability is justified for the respective process. If graphs are produced by a physical process, the question becomes whether a uniform probability is typical under the relevant physical dynamics. If the graphs are produced by asking people on the street to draw one, the probability distribution seems like a question for an empirical social study rather than mathematics. Interestingly, though, there is a partially mathematical explanation for the fact that an analogous poll asking participants to name a real number might produce mostly algebraic ones despite their being atypical in ℝ. It lies in the paradox that, although almost all, i.e., all except for countably many, numbers are transcendental, we can know (or specify, or construct) almost none of them since the set of numbers that could be described in some language, or expressed in a closed formula, or produced by a finite algorithm, is only countably infinite. Barry Loewer once asked me whether the fact that transcendental numbers are typical in ℝ is supposed to explain the fact that π is a transcendental number. It is a difficult question to what extent pure mathematics, in general, is in the business of answering why questions. To my mind, the typicality of transcendental numbers does make the transcendentality of π both unsurprising and prima facie plausible, but reasonable minds may disagree. Barry’s question was surely meant to cast doubt on my claim that typicality facts in physics are explanatory but actually points us to a crucial difference between physics and mathematics. The relevant typicality facts in physics have a modal character (referring to a reference set of possible worlds or initial conditions) that is completely absent in the mathematical context. Physics studies the laws of nature with all their modal structure but must ultimately explain the phenomena of the actual world. The explanatory connection between laws and contingent phenomena

is thereby provided by typicality. Mathematics studies abstract sets or structures—much more than their constituting elements—and finds that many of their interesting and useful features are typicality facts. But even when mathematics studies individual objects, it deals with logical necessity, not contingency. If it feels like the typicality of transcendental numbers adds nothing to the proof that π is transcendental (Lindemann, 1882), it is because the latter already establishes a necessary truth. Why it is that typicality facts in mathematics strike us as interesting in themselves deserves further contemplation. The first step, however, is to recognize typicality results as such.
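As a concrete illustration of the Erdős–Rényi result in item 9, here is a minimal Monte Carlo sketch in Python. It assumes that the networkx library is available; the function names are mine and serve illustration only. Since sampling from G(n, 1/2) amounts to drawing a labeled graph uniformly from Ω(n), the estimated fraction of asymmetric graphs should approach 1 as n grows (it is still well below 1 for small n, reflecting the asymptotic character of the theorem).

import itertools

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher


def is_asymmetric(G):
    """True iff G has no non-trivial automorphism (only the identity map)."""
    gm = GraphMatcher(G, G)
    # The identity is always an automorphism; G is asymmetric exactly when
    # the automorphism iterator yields a single mapping.
    return sum(1 for _ in itertools.islice(gm.isomorphisms_iter(), 2)) == 1


def fraction_asymmetric(n, samples=200, seed=0):
    """Estimate the fraction of labeled n-vertex graphs that are asymmetric."""
    # G(n, 1/2) puts the uniform distribution on the set of labeled graphs.
    hits = sum(
        is_asymmetric(nx.gnp_random_graph(n, 0.5, seed=seed + i))
        for i in range(samples)
    )
    return hits / samples


for n in (6, 10, 14, 18):
    print(f"n = {n:2d}: estimated fraction of asymmetric graphs = {fraction_asymmetric(n):.2f}")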

References

Barrow, J. D., & Tipler, F. J. (1986). The anthropic cosmological principle. Oxford: Oxford University Press.
Berndl, K., Dürr, D., Goldstein, S., Peruzzi, G., & Zanghì, N. (1995). On the global existence of Bohmian mechanics. Communications in Mathematical Physics, 173(3), 647–673.
Borel, E. (1909). Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo (1884–1940), 27(1), 247–271.
Carroll, S. M. & Tam, H. (2010). Unitary evolution and cosmological fine-tuning. Manuscript arXiv:1007.1417 [hep-th].
Coles, P. & Ellis, G. (1997). Is the universe open or closed?: The density of matter in the universe. Cambridge Lecture Notes in Physics (1st ed.). Cambridge: Cambridge University Press.
Curiel, E. (2015). Measure, topology and probabilistic reasoning in cosmology. arXiv:1509.01878 [gr-qc, physics:math-ph, physics:physics].
Duffin, R. J. & Schaeffer, A. C. (1941). Khintchine’s problem in metric Diophantine approximation. Duke Mathematical Journal, 8(2), 243–255.
Earman, J. (1986). A primer on determinism. The Western Ontario Series in Philosophy of Science. Netherlands: Springer.
Erdős, P. & Rényi, A. (1963). Asymmetric graphs. Acta Mathematica Academiae Scientiarum Hungarica, 14(3), 295–315.

Goldstein, S., Lebowitz, J. L., Tumulka, R., & Zanghì, N. (2017). Any orthonormal basis in high dimension is uniformly distributed over the sphere. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 53(2), 701–717.
Hadamard, J. (1902). Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, 13, 49–52.
Hawking, S. W. & Ellis, G. F. R. (1973). The large scale structure of space-time. Cambridge University Press.
Hawking, S. W. & Page, D. N. (1988). How probable is inflation? Nuclear Physics B, 298(4), 789–809.
Helbig, P. (2012). Is there a flatness problem in classical cosmology? Monthly Notices of the Royal Astronomical Society, 421(1), 561–569.
Khinchin, A. (1926). Zur metrischen Theorie der diophantischen Approximationen. Mathematische Zeitschrift, 24(1), 706–714.
Koukoulopoulos, D. & Maynard, J. (2019). On the Duffin-Schaeffer conjecture. arXiv:1907.04593 [math].
Lake, K. (2005). The flatness problem and Λ. Physical Review Letters, 94(20), 201102.
Lewis, G. F. & Barnes, L. A. (2016). A fortunate universe. Cambridge University Press.
Lindemann, F. (1882). Über die Zahl π. Mathematische Annalen, 20, 213–225.
McCoy, C. D. (2017). Can typicality arguments dissolve cosmology’s flatness problem? Philosophy of Science, 84(5), 1239–1252.
Norton, J. D. (2008). The Dome: An unexpectedly simple failure of determinism. Philosophy of Science, 75(5), 786–798.
Painlevé, P. (1897). Leçons sur la théorie analytique des équations différentielles. Paris: A. Hermann.
Saari, D. G. (1971). Expanding gravitational systems. Transactions of the American Mathematical Society, 156, 219–240.
Saari, D. G. (1973). Improbability of collisions in Newtonian gravitational systems. II. Transactions of the American Mathematical Society, 181, 351–368.
Saari, D. G. (2005). Collisions, rings, and other Newtonian N-body problems. Nr. 104 in CBMS Regional Conference Series in Mathematics. American Mathematical Society.
Schiffrin, J. S. & Wald, R. M. (2012). Measure and probability in cosmology. Physical Review D, 86(2), 023521.
Teufel, S. & Tumulka, R. (2005). Simple proof for global existence of Bohmian trajectories. Communications in Mathematical Physics, 258(2), 349–365.
Xia, Z. (1992). The existence of noncollision singularities in Newtonian systems. Annals of Mathematics, 135(3), 411–468.

15 Special Science Laws

This chapter will address the special sciences, in particular the reduction of special science laws to microphysical laws. I will focus mostly on the example of biology because more specialized sciences, e.g., social or economic ones, begin to involve human agency (but see Wagner (2020) for a discussion of typicality in this context), while the boundary between physics and chemistry can be blurry.

15.1 Ontology of Special Sciences

The view of special science laws that I am about to sketch is ontologically reductive. The fundamental ontology of the world is that of fundamental physics. Genes or tigers or ecosystems are nothing over and above this physical ontology but have to be located in it—usually by providing appropriate functionalist definitions and then identifying physical systems that (typically) fulfill that functional role. My view, however, is not explanatorily reductive. Biological phenomena are much better explained in the language of biology than in terms of atomic trajectories. Special sciences thus exist not only as a poor substitute for physics when the

complexity of a system makes a complete physical description unfeasible, but have explanatory autonomy. The key to this balancing act between reductionism and autonomy is the Boltzmannian framework of macro-to-micro reduction. A system’s macrostate supervenes on its microstate, which, in turn, evolves according to the (deterministic) microphysical laws. But the microscopic theory does not determine the relevant macro-variables or the partition of phase space that characterize macrostates in the first place. By introducing its own concepts and vocabulary, a special science introduces a particular set of macro-variables and a way of coarse-graining the physical phase space into, say, biological states. I am using the word “macro-variable” loosely here. In general, it won’t be a nice mathematical function of the microphysical variables, but even a term like “tiger” or “X chromosome” partitions the phase space into microscopic configurations that do or do not realize the relevant macroscopic (or mesoscopic) properties—with some fuzziness around the edges. In any case, no physical theory tells us that it is useful to coarse-grain microscopic configurations into tiger and non-tiger states. It is a genuine achievement of special sciences to devise theoretical concepts that allow us to identify and systematize salient regularities. Compared to thermodynamic regularities, those identified by biology or macroeconomics will be more limited in scope, and more contingent on the (physical) macrostate of our universe that allows tigers and markets to exist in the first place. Still, there is a sense in which special sciences can succeed at carving some part or aspect of nature “at its joints.” It is then quite appropriate to think of special science laws in roughly Humean or even Super-Humean terms. From the former, I adopt the idea that “lawhood” is simply bestowed upon the axioms of the best systematization of salient regularities. From so-called Super-Humeanism (see, e.g., Esfeld and Deckert (2017)), I adopt the idea that theoretical concepts and structures figuring in such laws need not refer directly to (fundamental) properties or entities but can be introduced because they allow for a more efficient systematization. An even better reference point is Barry Loewer’s package deal account (Loewer, 2021), according to which the laws and the “natural properties” that the laws refer to supervene together on the world as its best systematization. While I find Humeanism metaphysically untenable when applied to fundamental

physics (this is the subject of the final chapter), a best system account strikes me as plausible in the context of special sciences which deal with non-fundamental laws and entities. Plausible against the backdrop that there is a fundamental physical ontology as the ultimate truth-maker of naturalistic propositions and fundamental non-Humean laws that ground special science laws as typical regularities. By and large, I thus see the status of biology vis-à-vis physics as follows: 1. There are genuine biological regularities in the world. Our ability to identify and systematize regularities in the world depends on our theoretical language. Philosophers might inquire about “natural properties” that carve nature at its joints (Lewis, 1983). For a more technical perspective, we can think of regularities in terms of algorithmic information theory, that is, roughly in terms of compressible data sets (cf. the “Chaitin model” in Sect. 16.3). Whether there is an algorithm producing the data set that is significantly shorter than the data set itself depends, in general, on the (programming) language in which algorithms can be written. Thus, there are regularities in the world that can only be identified, or at least systematized, in the language of biology (rather than the language of physics), even though they are instantiated in the physical ontology. While biological terms are in principle translatable into physical terms—by suitable functional definitions or Ramseyfication—the functional definition of a gene, or a cell, or a tiger in terms of elementary particles is extremely complex. The translation into the language of physics would thus come at a very high cost. In addition, there is the issue of multiple realizability (see Esfeld and Sachse (2007) for a good discussion), i.e., one and the same biological term may be realized by different physical states in different instances. Hence, a set of physical events may not instantiate any regular pattern (the data set may be “incompressible”, or nearly so), unless we introduce appropriate biological predicates and macro-variables. 2. What makes “genes” part of biological laws is their role in the best systematization of biological regularities. For the best system, I adopt (for now) the Mill–Ramsey–Lewis criterion of striking an optimal

balance between simplicity and strength. I am open, even sympathetic, to including additional metrics or theoretical virtues, but this is beyond the scope of the present discussion. In any case, since we are concerned with the systematization of genuine biological regularities in the sense discussed above, the “nomic status” of the theoretical entity gene is provided by biology. This status is what distinguishes genes from spurious or unnatural concepts à la grue emeralds that could also be defined in functional physical terms. 3. However, every concrete proposition about genes (and every biological proposition, in general) has a physical truth-maker. Any proposition about the constitution, behavior, or interaction of biological entities is ultimately a proposition about the physical world and thus made true or false by physical facts. In particular, every biological system is also a physical system and must, therefore, obey the laws of physics. No biological law could ever contradict the physical laws. The upshot is that there is nothing over and above the physical facts that makes a biological fact a fact (this is the reductive part of the account). But there is something beyond physics that makes a fact a biological fact— with emphasis on the logos part (this being the autonomy of the special sciences). While I take the blame for the particular view just sketched, my claims to originality are fairly limited, as I am drawing a lot from Loewer’s package deals, the Canberra plan for metaphysics (see Esfeld (2020b) for a recent discussion), and also Daniel Dennett’s concept of real patterns (Dennett, 1991). There are, however, several reasons why I hesitate to adopt the concept of “real patterns” for the ontology of non-fundamental entities. First, because it may carry the connotation of something abstract, while I am pretty sure of genes, and damn sure of tigers, that they are concrete entities. Second, because my view is rather conservative in that it is reductive, specific to special natural sciences, and (as I will explain below) hierarchical. In contrast, the main focus of Dennett (1991) is beliefs, which fall into the mental/normative domain (with respect to which I do not advocate a reductive view), while in more recent literature, “real patterns” are often tied to metaphysics that are either radically nonreductionist and non-hierarchical (“rainforest realism,” see Ladyman and

Ross (2007)), or take the reductive strategy a step too far, viz. beyond a primitive ontology of matter in motion. Finally, the concept of real patterns is often invoked in debates about different forms of realism, and I don’t feel like I have much to contribute to these debates, nor that they are particularly productive for my present purposes. Genes are real, of course, but they are not fundamental. If we want to call them patterns, then they are patterns instantiated in the physical ontology.1 But the same is true of grue emeralds if the term succeeds in referring at least once. Their different status is due to the fact that genes figure in biological laws, while grue emeralds do not figure in gemological ones. There is certainly a pragmatic element involved in this distinction (genes are not “more real” than grue emeralds, they are just more useful). But here, I would share in the usual hope of best system accounts that one candidate theory will be objectively best in its respective domain (whether we can decide it in practice or not). I am not sure that this is true, but even less convinced of the need to concede to relativism. If two biologists disagree about which part of a DNA sequence is the gene for blue eyes, then they either disagree about the meaning of “gene” or one of their theories will prove superior in systematizing blue-eye heredity.

1 Whether the correct metaphysical relation between a gene and, let’s say, a configuration of elementary particles is one of identity or grounding is too subtle a question for me.

15.1.1 Probability and Causation in Special Sciences

We can now connect this discussion of special sciences to our analysis of probability and causation in terms of typicality. In fact, I believe that causal explanations are much more relevant to special sciences than to (fundamental) physics. There is, however, a prima facie tension between the view that biological entities or properties are causally efficacious and the view that they are ontologically reducible to micro-physical entities or properties. What does it mean, for instance, that a gene mutation increases the fitness of a certain biological form when the survival and reproductive success of every individual is determined by the physical laws governing its microscopic constituents? When there exists, in principle, a complete

description of natural evolution in terms of atomic trajectories? It would seem like the biological explanation is either wrong or redundant – unless we had a genuine case of causal overdetermination. Well, I am not worried about causal overdetermination in this case because I don’t believe in causal relations on the micro-physical level. According to the analysis proposed in Chap. 12, causal relations can only hold between two macrostates A and B, namely in the sense that one macrostate makes the other typical: Typ(B | A) while ¬Typ(B). Such macrostates could be specified in biological (or other special science) terms. If biological predicates can be translated into the language of physics (as claimed), they can be conceived as coarse-graining functions on the microscopic state space, i.e., as Boltzmannian macro-variables. Our analysis of causation, and also the typicality theory of probability, thus carry over to “biological states”. For instance, the fact that an individual has developed the phenotypical trait P makes it typical that S: it survives long enough to reproduce in the environmental conditions E:

Typ(S | E ∧ P), whereas ¬Typ(S | E).    (15.1)

Or, if {a_1, . . . , a_N} is a population with genotype A and {b_1, . . . , b_M} a population with genotype B (i.e., two statistical ensembles), then the reproduction rate of A may typically be greater than the reproduction rate of B, i.e.,

Typ( (1/N) ∑_{i=1}^{N} S(a_i) > (1/M) ∑_{j=1}^{M} S(b_j) | E ).    (15.2)

Typicality here is still understood in the usual physical sense of a phenomenon obtaining for nearly all possible (initial) micro-conditions (but note the later remarks about the context dependence of typicality). There is thus no need for “causal emergentism” or a new source of “randomness.” Causal explanations in the special sciences are a form of causal inference as discussed in the physical context in Chap. 12. And objective probabilities

in special sciences mean what they mean in physics, namely typical relative frequencies. According to the typicality theory of causation, there is also nothing mysterious or troublesome about one and the same macro-event B having both a biological cause A and a physical cause Ã. This can easily happen, not only if Ã ⊂ A (i.e., if physics provides a “more detailed” account), but also if the macrostates A and Ã pertain to different partitions of phase space, defined with respect to different sets of macro-variables.
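To give the statistical reading of (15.2) a concrete, if toy, form, here is a minimal Python sketch. It samples survival indicators for two hypothetical populations and checks how often the empirical reproduction rate of genotype A exceeds that of genotype B; the survival probabilities, population sizes, and names are illustrative assumptions, not anything specified in the text.

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical survival probabilities for genotypes A and B under fixed
# environmental conditions E (illustrative numbers only).
P_SURVIVE_A, P_SURVIVE_B = 0.6, 0.5
N = M = 1000       # population sizes
RUNS = 10_000      # number of sampled "micro-conditions"


def reproduction_rate(p_survive, size):
    """Empirical mean of the survival indicators S for one sampled population."""
    return rng.binomial(1, p_survive, size).mean()


ordering_holds = sum(
    reproduction_rate(P_SURVIVE_A, N) > reproduction_rate(P_SURVIVE_B, M)
    for _ in range(RUNS)
)
print(ordering_holds / RUNS)  # very close to 1: the ordering in (15.2) is typical

On this toy model, the ordering fails only in an atypically small fraction of runs, which is exactly the sense in which (15.2) states a typicality fact rather than an exceptionless law.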

15.2 Special Science Laws as Typicality Laws

Special science laws are usually understood as ceteris paribus laws (CP-laws). In contrast to fundamental physical laws, they are not universally true but hold under specific circumstances that exclude interfering factors. The main problem with this concept of (exclusive) CP-laws is that it seems impossible to provide a complete specification of all potential interferences to be excluded (especially in the language of the respective science) without falling into the tautology that the law L holds except in circumstances where it doesn’t. Hence the charge that CP-laws are in danger of being either false or trivial (Hempel, 1988; Lange, 1993). I think the problem arises at least in part from the attempt to model special science laws after fundamental physical laws when they should really be understood as typical regularities. Thermodynamic laws would have been a better example to follow if Boltzmann’s reduction to statistical mechanics had been more widely appreciated. To ground explanations, predictions, and counterfactuals, an effective (non-fundamental) law need not state conditions that make its instances necessary. It only has to be specific enough about its domain—and robust enough against small perturbations—to make the regularities typical. A limited number of definite ceteris paribus clauses will thus belong to the description and systematization of the respective regularity (including the macro-conditions we have to conditionalize on), while the indefinable range of other potential interferences is negligible by virtue of being atypical. The understanding of special science laws as typical regularities (and in the

framework of Boltzmannian statistical mechanics) thus leads naturally to the conclusion expressed by Marc Lange (2002):

To discover the law that all F’s are G, ceteris paribus, scientists obviously must understand what factors qualify as ‘disturbing’. But they needn’t identify all of the factors that can keep an F from being G. They needn’t know of factors that, when present, cause only negligible deviations from strict G-hood, or factors that, although capable of causing great departures from G-hood, arise with negligible frequency in the range of cases with which the scientists are concerned. (p. 411)

The typicality view is also similar to, but different from, normality theories, which understand CP-laws as laws that hold under normal conditions, that is, simply put, “conditions that normally, usually, mostly obtain” (Spohn, 2008, p. 278). Spohn ultimately rejects this characterization, taking a more epistemic turn to explicate normality conditions in terms of doxastic states and degrees of belief (more precisely, “ranking functions”; see Spohn (2002, 2014)). Both definitions of normality differ from typicality. On the one hand, typicality refers first and foremost to what obtains for most possible micro-conditions, i.e., in most nomologically possible worlds, not to what obtains most of the time in the actual world. It is then a theorem rather than a definition that a repeatable typical event will typically obtain most of the time. On the other hand, what is typical does not depend on anyone’s expectations or beliefs. It is the other way round: typicality facts guide rational expectations and beliefs.

15.2.1 The Hierarchy of Sciences

The theory of a hierarchy of modern sciences, often attributed to Auguste Comte (1830), has great intuitive appeal. While the issue can become messy and controversial when one gets into the weeds—and the structure of science is arguably more like a branching tree than a pyramid—it is by and large correct to say that biological facts reduce to chemical facts, and chemical facts to physical ones. The various levels of this hierarchy are often associated with different scales of size or complexity, different degrees of generality or fundamentality, and sometimes different

standards of rigor and predictive uncertainty. Here, I want to argue that this hierarchy of sciences is well captured and explained by different “levels” of typicality. We have already discussed the context-sensitivity of typicality. Simply put, depending on the relevant set of propositions, there can be a different order of magnitude for ε such that a proposition P is deemed typical if μ(P) > 1 − ε with respect to a designated typicality measure μ. And it seems correct to say that this “threshold” for typicality is very high in the context of physics (e.g., for thermodynamic regularities), lower in the context of chemistry, lower still for propositions relevant to biology, and so on. All propositions are ultimately translatable into the language of the micro-physical theory, where they correspond to certain macro-regions in the fundamental phase space (the set of microstates X for which the proposition P is true). However, which macro-regions are considered “large” or “small” is relative to the partition of phase space thus induced, i.e., the context of the typicality reasoning. In a very loose sense, we could associate these “levels of typicality” with different degrees of certainty. What truly matters, however, are the rationality principles underlying typicality reasoning, in particular, typicality as a guide to which facts require further explanation. A regularity that is typical in the biological context does not warrant further biological explanation but may still be reducible to finer-grained chemical or physical regularities. And an atypical biological phenomenon calls, in the first place, for additional biological explanation (and may ultimately compel us to revise or reject our biological theory) but will rarely, if ever, rise to the level of an atypical event by the standard of physics, to challenge our fundamental theories. In addition to fitting with scientific practice, this hierarchy of typicality also has a basis in mathematics if we think of regularities in statistical terms. As we pass to larger and larger scales and more and more specialized sciences, we are dealing with ever-smaller ensembles. Oversimplifying somewhat, social regularities are instantiated in systems of ∼ 10²–10⁹

people, physiological regularities are instantiated in systems of ∼ 10⁶–10¹⁴ cells, and thermodynamic regularities are instantiated in systems of ∼ 10²⁰–10⁸⁰ atomic particles. If we now look at the law of large numbers

P(|relative frequency − theoretical mean| > ε) = δ ≲ const./(ε²N),    (15.3)

as the prototype of a typicality result, we see (or recall from Chap. 2) that a larger sample size N allows for smaller values of ε and δ. Conversely, smaller sample size means greater “uncertainty,” that is, both a broader range ε of typical values, and a larger measure δ of atypical events (those for which the empirical distribution differs significantly from the theoretical one). In conclusion, it is not just an epistemic or methodological issue that biological predictions seem less reliable and precise than thermodynamic ones. Nature and the very scope of the respective regularities make it so.

There is another way to understand the hierarchy of sciences, not by reducing each special science directly to physics, but by reducing each higher-level theory to the next lower level. It is helpful to start with an intra-physical example. We can describe a ball as a rigid Newtonian body with 6 degrees of freedom (3 for the position of its center of mass and 3 rotational degrees of freedom). This is a very coarse-grained description, ignoring the ball’s internal (microscopic) degrees of freedom. Technically, we are thereby passing from the microscopic phase space of ∼ 10²⁴ particles to a reduced 12-dimensional phase space (6 spatial degrees of freedom and their conjugate momenta) which is good enough to account for many mechanical regularities. In doing so, we are ignoring the fact that the 12 effective degrees of freedom are actually macro-variables on the microscopic phase space (e.g., the ball’s center of mass R[X(t)] = (∑_{k=1}^N m_k)⁻¹ ∑_{i=1}^N m_i q_i) and implicitly assume that each configuration in the reduced phase space is realized by typical micro-states, disregarding micro-conditions for which the ball would suddenly decay, or shoot off an ultra-fast particle while veering in the opposite direction, or perform other shenanigans. Such

atypical micro-states would cease to realize a “ball” in the sense relevant to the higher-level description as a rigid body. On the other hand, if we consider a system of N balls (whose interactions can be approximated by elastic collisions, let’s say), their common effective phase space (12N-dimensional) will be equipped with its own natural measure that allows us to identify typical regularities—now with respect to initial conditions of N “hard spheres.” A similar thing happens as we go from elementary physics to molecular physics, to chemistry, to biology, and so on. Under the right macro-conditions (that we will have to conditionalize on), typical configurations of quarks realize nucleons, typical configurations of nucleons and electrons realize atoms, typical configurations of O, N, H, C atoms realize cytosine, guanine, adenine, or thymine molecules, and typical configurations of those nucleobases (plus deoxyribose and some organic phosphates) realize DNA. In each step, we are passing to a reduced state space, ignoring internal degrees of freedom by assuming typical behavior of the more fundamental constituents. Admittedly, as we move further away from physics and into more qualitative territory, the relevant “state space” becomes elusive, and we may no longer have a quantitative conception of a system’s degrees of freedom and dynamics that would allow us to formulate precise typicality statements. Still, taken with a grain of salt, we can say that chemical regularities are instantiated by typical physical systems (of the right kind to typically realize the relevant chemical properties), biological regularities are instantiated by typical chemical systems, medical regularities by typical biological systems, and so on and so forth.2 In each step, we are multiplying the possibilities of atypical events on the lower levels, so that the relevant regularities are grounded in the fundamental laws by a weaker standard of typicality.

2 Wagner (2020) even applies the concepts of typicality and “degrees of freedom” in a quite compelling manner to social sciences.
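Returning to the law of large numbers bound (15.3): the following minimal Python sketch (coin tosses, with sample sizes chosen purely for illustration) shows how the spread of relative frequencies around the mean shrinks with ensemble size, roughly like 1/√N. This is the quantitative sense in which regularities carried by small ensembles come with larger ε and δ than thermodynamic ones.

import numpy as np

rng = np.random.default_rng(0)


def frequency_spread(n_trials, n_runs=2000):
    """Standard deviation of the relative frequency of heads across many runs."""
    freqs = rng.binomial(n_trials, 0.5, n_runs) / n_trials
    return freqs.std()


# Larger ensembles give sharper typical frequencies, roughly like 1/sqrt(N),
# in line with the bound (15.3).
for n in (10**2, 10**4, 10**6):
    print(f"N = {n:>7}: spread of the relative frequency ≈ {frequency_spread(n):.5f}")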

15.3 Is Life Atypical?

In addition to our previous considerations, it is important to note that specialized sciences deal primarily, if not exclusively, in conditional typicality. For instance, economic regularities may be typical given the existence of market economies, but the majority of physically possible universes will arguably contain no markets at all. (Pace Marx, a period of capitalism is not that inevitable.) As discussed in Chap. 11, typicality statements in physics might also have to be conditionalized on the Past Hypothesis—the assumption of a very low-entropy initial macrostate of the universe—but the conditions under which economic sciences apply are much more restrictive than that. And if we regard the Past Hypothesis as genuinely restricting nomic possibilities, we can say that thermodynamic regularities are indeed typical tout court, whereas economic regularities are at best typical given very particular macro-conditions that hold at specific times and places.

And what about biology? Is the very existence of biological phenomena an atypical feature of our universe? This is a profound and challenging question, not least because it concerns our place as intelligent beings in the cosmos. The main issue, to be clear, is not whether life is ubiquitous in our universe but whether typical universes allowed by the fundamental laws of nature contain any (complex) life at all. I don’t have much expertise to offer on this question, but, by and large, three conclusions seem plausible: 1. The existence of life in the universe is typical because the thermodynamic evolution of the universe—the way in which entropy typically increases—is somehow conducive to the creation of complex subsystems with self-replicating entities that get Darwinian evolution started. (A much-noted exploration of this idea is due to England (2013). Erwin Schrödinger’s What is Life? (1944) was a milestone in relating the question of life to thermodynamic considerations.) 2. The existence of life is typical—or at least not atypical—merely because of the “large numbers.” That is, even though the environmental conditions and the physical processes necessary for the origin of life

are extremely special, the universe is so big (and so old) that they are bound to occur somewhere. 3. Life is atypical, requiring very fine-tuned initial conditions of the universe. Let us suppose that the last conclusion is correct and the existence of biological systems (or at least somewhat complex ones) is an atypical phenomenon according to the fundamental laws of physics. What are the implications? Would the fact that our very existence is atypical not undermine the rationality of typicality arguments altogether? I believe that, based on typicality reasoning, two different stances could be adopted. The first would be to accept that the phenomenon of life is a challenge to our current best theories. We cannot be content with the fact that the existence of life in our universe does not require a flat-out violation of the physical laws; its atypicality compels us to look for additional physical principles—or better theories altogether—that don’t make life atypical. Some may appeal to a (strong) anthropic principle to meet the explanatory burden (see Barrow and Tipler (1986) for a classic reference), but its value is deservedly controversial. The other option is more sobering: from the point of view of fundamental physics, the existence of life is an atypical feature of our universe, but it is not a bona fide physical phenomenon that warrants a physical explanation. It is, in other words, a brute fact—consistent with the fundamental laws of nature but purely accidental. To me, this would point to more than a strict division of labor between physics and biology. It would mean that the existence of life—our existence—is insignificant in the great cosmic scheme of things, a footnote in the book of nature. The idea that our universe is fine-tuned for life is, of course, a popular argument for God, at least a deistic one, who does not have to intervene in the course of nature but set up things very meticulously to bring about intelligent life. If the reader will indulge the religious imagery, I would put it the other way round. Whether God, upon creating the universe, cared about the existence of human beings (more than, let’s say, about the shape of a particular sand dune changing in the wind) is tantamount to the question of whether God set up natural laws that make our existence typical.

References

Barrow, J. D., & Tipler, F. J. (1986). The anthropic cosmological principle. Oxford: Oxford University Press.
Comte, A. (1830). Cours de philosophie positive. Bachelier.
Dennett, D. C. (1991). Real patterns. Journal of Philosophy, 88(1), 27–51.
England, J. L. (2013). Statistical physics of self-replication. The Journal of Chemical Physics, 139(12), 121923.
Esfeld, M. (2020b). Super-Humeanism: The Canberra plan for physics. In D. Glick, G. Darby, & A. Marmodoro (Eds.), The foundation of reality: Fundamentality, space, and time (Chapter 6). Oxford, New York: Oxford University Press.
Esfeld, M. & Deckert, D.-A. (2017). A minimalist ontology of the natural world. Routledge Studies in the Philosophy of Mathematics and Physics. Oxford: Routledge.
Esfeld, M. & Sachse, C. (2007). Theory reduction by means of functional subtypes. International Studies in the Philosophy of Science, 21(1), 1–17.
Hempel, C. G. (1988). Provisoes: A problem concerning the inferential function of scientific theories. Erkenntnis, 28(2), 147–164.
Ladyman, J. & Ross, D. (2007). Every thing must go: Metaphysics naturalized. Oxford: Oxford University Press.
Lange, M. (1993). Natural laws and the problem of provisos. Erkenntnis, 38(2), 233–248.
Lange, M. (2002). Who’s afraid of ceteris-paribus laws? Or: How I learned to stop worrying and love them. Erkenntnis, 57(3), 407–423.
Lewis, D. (1983). New work for a theory of universals. Australasian Journal of Philosophy, 61(4), 343–377.
Loewer, B. (2021). The package deal account of laws and properties (PDA). Synthese, 199(1), 1065–1089.
Spohn, W. (2002). Laws, ceteris paribus conditions, and the dynamics of belief. Erkenntnis, 57(3), 373–394.
Spohn, W. (2008). Causation, coherence and concepts: A collection of essays. Springer Science & Business Media.
Spohn, W. (2014). The epistemic account of ceteris paribus conditions. European Journal for Philosophy of Science, 4(3), 385–408.
Wagner, G. (2020). Typicality and minutis rectis laws: From physics to sociology. Journal for General Philosophy of Science, 51, 447–458.

16 Typicality and the Metaphysics of Laws

It would seem unreasonable …if the whole universe and each and every part of it were in order…, while there were nothing of the kind in the principles. — Theophrastus, Metaphysics 7 a 10¹

¹ Quoted after Finkelberg (2017, p. 59).

16.1 What Are the Laws of Nature?

Over the past few decades, the best system account has developed into a popular, maybe even dominant, position regarding the metaphysics of the laws of nature. In a nutshell, this view holds that the laws of nature are merely descriptive, an efficient summary of contingent regularities that we find in the world. Metaphysically, it is based on the thesis of Humean supervenience—named in honor of David Hume’s denial of necessary connections—that David Lewis (1986a) famously characterized as “the doctrine that all there is to the world is a vast mosaic of local matters of particular fact, just one little thing and then another.” The laws of

nature are then supposed to supervene on this Humean mosaic as the deductive system that strikes the optimal balance between simplicity and informativeness in describing the world. The Humean “regularity view” of laws is opposed to the “governing view,” in its various forms, according to which the fundamental laws play an active role in guiding, producing, or constraining the history of the universe. I take the main contemporary contenders to be dispositional essentialism (Bird, 2007)—which grounds the laws of nature in dispositional properties instantiated by the fundamental entities—and nomic primitivism (Maudlin, 2007a), which admits “laws of nature” as a primitive ontological category and laws as fundamental entities in the ontology of the world. My discussion will only be concerned with fundamental laws whose domain is the entire physical history of the world. And it will only defend the minimal anti-Humean thesis that laws constrain that history (see Chen & Goldstein (2022) for such a minimalist version of nomic primitivism). If the view that laws “produce” entails more than that (as Schaffer (2016) argues) or is tied to a particular metaphysics of time (see Loewer (2012b) versus Maudlin (2007a)), it will require additional arguments. There is a way to phrase the debate between Humean and anti-Humean metaphysics that I find misleading and of little interest (for reasons that will become clearer in the course of the discussion): Laws can determine regularities (as their instances), and regularities can determine laws (as their best systematization), and so the question is, what comes first and what is more fundamental, the laws or the regularities? “What grounds what?” is how one would put it, more properly, in contemporary metaphysics (see, e.g., Schaffer (2008, 2009)). One might then be skeptical about one of the two grounding relations and choose sides accordingly, e.g., deny that there could ever be an objectively best systematization or find it mysterious how natural laws are supposed to “govern” anything. These are not my main concerns, however, and I shall grant that both the regularity and the governing view of laws are conceptually sound. Instead, I consider the debate between Humean and anti-Humean metaphysics to be first and foremost a debate about fundamental ontology—whether there is more to the fabric of the world than the Humean mosaic—and

the relevant choice to be one between ontological parsimony and other theoretical virtues. In this debate, Humeans have had remarkable success in defending a counterintuitive position against all objections that have been thrown their way. In recent years, criticisms of the best system account have focused, in particular, on the lack of explanatory power of Humean laws (e.g., Lange (2013); Maudlin (2007a)), the alleged subjectivity of the best system (Armstrong, 1983; Carroll, 1994), or the commitment to a separable ontology which is put into question by the entanglement structure found in quantum mechanics (Maudlin, 2007a). Humeans have resisted all of these attacks with at least some degree of persuasiveness (see, e.g., Cohen and Callender (2009); Hall (2015); Lewis (1994); Loewer (1996, 2012b); for the application of Humeanism to (Bohmian) quantum mechanics see Bhogal and Perry (2017); Callender (2015); Esfeld (2014b); Esfeld et al. (2014); Miller (2014)). This is not to say that these debates are settled or that the objections have no merit. But I believe they have not quite managed to capture the manifest implausibility of Humean metaphysics and turn it into a decisive argument for modal realism. The aim of this chapter is to do just that. It will thereby elaborate on a fairly common anti-Humean intuition, which is to look at the astonishing order in our cosmos, the regularity of nature expressed by the simple and elegant laws that physics discovers, and wonder: How likely is it that these regularities come about by pure chance? One author articulating this argument is John Foster in his book “The Divine Lawmaker” (2004):

What is so surprising about the situation envisaged – the situation in which things have been gravitationally regular for no reason – is that there is a certain select group of types, such that (i) these types collectively make up only a tiny portion of the range of possibilities, so that there is only a very low prior epistemic probability of things conforming to one of these types when outcomes are left to chance … (p. 68)

The objection to Humean laws as “cosmic flukes” has also been raised by Strawson (2014) and recently defended by Filomeno (2021). But the

argument, phrased in probabilistic terms, does not succeed. Humeans have several good points to make in response:

1. We don’t have to account for why the law of gravitation—or any other particular law described by physics—holds in our universe. Anti-Humean theories can’t explain this, either. The debate is about what it is to be a law, not why the laws of our world are what they are.

2. What do you mean by “chance”? The thesis of Humean supervenience holds that the history of the universe, the distribution of “local particular facts,” is contingent. But contingency, or the absence of a further metaphysical ground, is not the same as randomness. In fact, Humean metaphysics is opposed to all intuitions about the mosaic being “produced” by a chancy process—particles performing random motions, or God playing blindfolded darts and throwing local particulars into spacetime, or anything of this kind.

3. Where do your “prior probabilities” come from? What determines the right probability distribution over possible worlds? All successful applications of probability theory come from within science. And according to the most prominent Humean account (see Chap. 5), the fundamental probability measure grounding probabilistic predictions and rational priors is itself part of the best system that supervenes on the Humean mosaic. In other words, all meaningful probabilities are determined by the actual world, and there can be no justified priors entailing that the world, as it is, is unlikely.

These points are well taken. In particular, I very much agree that any reference to probability is dubious in a metaphysical context, not to speak of the fact that subjectivist principles—like the often-invoked principle of indifference—can fail us even in much mundaner ones. The concept of typicality, however, is a perfect fit for the issue at hand.

16.2 Typicality in Metaphysics

Why does typicality avoid the objections raised against probability? For one, typicality statements are extremely robust against variations of the measure, so much so that, in many cases, the question of how to pick

the right measure or what it even means to be the “right” measure doesn’t even arise (Maudlin, 2007b, p. 286). Recall our very first example of a typicality statement: almost all real numbers are irrational. That is, being irrational is typical within the set ℝ of reals. In what sense is this proposition true? First and foremost, in terms of cardinalities. The set of real numbers is uncountably infinite, while the subset of rational numbers is only countably infinite. Therefore, |ℝ \ ℚ| / |ℝ| = 1 and |ℚ| / |ℝ| = 0. This is a very precise and uncontroversial sense in which almost all real numbers are irrational. In principle, nothing more needs to be said here, but since we will use it later on—and since it is more familiar from applications in physics—we can spell out typicality in terms of a typicality measure. It is then natural to consider the uniform Lebesgue measure on ℝ, which makes it true that all real numbers except for a subset of measure zero are irrational. Notably, the Lebesgue measure on ℝ is not normalizable, so it cannot be mistaken for a probability measure. But the uniform measure might seem suspicious as it reeks too much of a “principle of indifference.” Fair enough, we can pick virtually any other measure we like. Any non-discrete measure, that is, any measure that is zero on singletons, will agree that ℚ ⊂ ℝ is a null set, i.e., that almost all numbers are irrational.2 In effect, we assume nothing more than that a one-element subset is vanishingly small compared to an uncountably infinite set. There is thus a very innocent and intuitive sense in which all reasonable measures agree on the typicality of irrational numbers. We saw in previous chapters that typicality statements in physics usually admit exception sets of very small but positive measure. For the present discussion, the example of irrational numbers is quite appropriate since we can work with the stricter standard of typicality (allowing only exception sets of measure zero) that is even more robust against the choice of measure and yields stronger results than can be realistically obtained in physics. The second crucial difference between typicality and probability is that typicality is not tied to ignorance, randomness, or indeterminism.

2 By σ-additivity, a measure can be non-zero on countable sets if and only if it is non-zero on some one-element sets.

The typicality of irrational numbers is an objective fact. It has nothing to do with anyone’s credences or a priori expectations, nor with some number being picked at random or picked out at all. When applied to a reference class of possible worlds, typicality figures in a way of reasoning about contingency—and contingency, if anything, is central to Humean metaphysics. Having applied typicality reasoning to various questions in physics and statistical mechanics, I am going to argue that it extends to a powerful way of reasoning in metaphysics. The typicality fact that the best system account has to confront is then the following: Typical Humean worlds have no Humean laws. Almost all possible mosaics have no regularities but are worlds of irreducible complexity that do not allow for any meaningful systematization. (This will be rigorously proven for deterministic laws and in a more handwaving manner for probabilistic ones.) The challenge, to be clear, is not to account for why we find these particular laws in our universe, but why we find any laws at all. And based on typicality reasoning, we must conclude that, if we live in a world regular enough to be described by physical laws, this fact requires explanation in the form of something in the fundamental ontology that makes it so.

16.2.1 Ontological Possibility

The orthographical symbols are twenty-five in number. This finding made it possible, three hundred years ago, to formulate a general theory of the Library and solve satisfactorily the problem which no conjecture had deciphered: the formless and chaotic nature of almost all the books. — Jorge Luis Borges, The Library of Babel

While most typicality statements require some mathematical tool like measures to give precise meaning to the locution “almost all,” their truthmaker—and what is ultimately doing the explanatory work—is not the measure but the reference class with respect to which properties come out as typical or atypical (or neither). In physics, this reference class is the set of nomologically possible worlds determined by the pertinent laws and usually parameterized by initial conditions. And we saw that

an important—if not the most important—way in which these laws, however conceived, explain or predict is by delimiting a set of nomological possibilities that makes certain phenomena typical. Conversely, we never accept explanations based on extremely special and fine-tuned initial conditions since those could “explain” almost anything. Analogously, if we want to apply typicality reasoning in a metaphysical context—evaluating the merits of a Humean versus anti-Humean metaphysics—we need a reference class of possible worlds that is determined by the respective ontologies and does not a priori coincide with nomic possibilities. The relevant reference class that I propose is generated as follows: Fix the fundamental ontology of the world as postulated by a metaphysical theory, that is, the fundamental entities with their essential properties, and consider all their possible configurations, i.e., possible distributions of contingent properties (such as spatiotemporal relations) over these “individuals.” Possible worlds thus generated are sometimes called Wittgenstein worlds, in reference to the following passage of the Tractatus: 2.0271 The object is the fixed, the existent; the configuration is the changing, the variable. 2.0272 The configuration of the objects forms the atomic fact. […] 2.04 The totality of existent atomic facts is the world.

Allowing for “augmentation” and “contraction”—adding individuals (but not universals) beyond those that exist or removing some that do— the set of Wittgenstein worlds is extended to “Armstrong worlds” (Kim, 1986) and the theory of modality known as combinatorialism (Armstrong (1986, 1989); see Sider (2005) for a recent discussion). The present discussion will not require augmentations and contractions, and if we consider the option that the laws of nature are themselves among the fundamental “entities” that exist in our world, adding or removing them would defeat the purpose. Hence, we shall keep the basic furniture of our world fixed, both in type and in number. To be clear, my goal here is not to defend this or any other version of combinatorialism as a full-blown theory of metaphysical possibility. Instead, let us call the relevant notion

334

D. Lazarovici

of modality ontological possibility, the crucial point being that a world is ontologically possible (according to a metaphysical theory) if it has the same fundamental ontology as posited for ours. Here are some examples of the use of ontological possibility. If the fundamental ontology of the world consists in point particles moving in space, it is ontologically necessary for all material objects to be spatially localized. If the fundamental ontology of the world consists of N permanent point particles, it is ontologically impossible for any object to be composed of more than N parts. According to a SuperHumean theory of space or spacetime (Huggett, 2006), it is ontologically possible for spacetime to have more than four dimensions. According to a functionalist theory of the mind—but not according to theories that postulate “minds” as ontological primitives—consciousness is ontologically contingent. Why should we care about this form of modality that I have just defined and that appears to have little precedent in the philosophical literature? Most basically, because the set of ontologically possible worlds amounts to a standard semantic interpretation of what a hypothesis about the fundamental ontology of the world means. Intuitively, because the fundamental entities that we believe to exist should have a distinguished epistemic and explanatory role over those that are merely possible or conceivable. Most importantly, because ontological possibility is the form of modality that captures the real disagreement between Humean and anti-Humean metaphysics. Humeans and anti-Humeans will agree on nomological possibilities (if they agree on our best physical theories), and they may agree or disagree on metaphysical possibilities for all kinds of philosophical reasons that can go beyond their stance on natural laws. Humeans, however, are committed to a principle of unrestricted recombination (Lewis, 1986b): It is possible to change the configuration of fundamental entities or properties in any part of the Humean mosaic while holding fixed the rest of the mosaic. This is the positive content of Humean metaphysics, the flip side of the negative theology regarding necessary connections and all kinds of “non-Humean whatnots.” The main anti-Humean positions, on the other hand, hold that there exists something in the actual world—be it essential dispositional

16 Typicality and the Metaphysics of Laws

335

properties or primitive laws—that restricts combinations, that makes it impossible, let’s say, for a world to have the same fundamental ontology as ours but a distribution of masses incompatible with the law of gravitational attraction. Notably, the relevant ontological commitment is not to some non-Humean laws, no matter how silly or complex, but to the fundamental laws that physics discovers in our universe (and of which, as of today, we have only partial or approximate knowledge). The anti-Humean positions I have in mind also include the view that the manifestations of the laws are essential to them, i.e., that a non-Humean law is the same in every world in which it exists. The different meanings of “nomological possibility” under a Humean and anti-Humean understanding are then manifested in the fact that, according to the latter but not the former, ontologically possible worlds form a subset of the nomologically possible ones. (Of course, many anti-Humeans go as far as holding that nomic possibility coincides with metaphysical possibility, but this is an unnecessarily strong assumption for our purposes.)

16.2.2 Typicality and the Case Against Humeanism With such a reference class of ontologically possible worlds, typicality can play a similar role in metaphysics as it does in the physical sciences. Any law hypothesis in physics designates a set of nomologically possible worlds. This set must contain the actual world for the proposed law to have any chance of being true. However, this is not sufficient for us to judge the law hypothesis as satisfying or explanatory or even empirically adequate. Very plausibly, there are Newtonian universes in which particles form interference patterns whenever they are shot through a double slit and recorded on a screen. These and other quantum phenomena that come as close as it gets to falsifying Newtonian mechanics are not made impossible by Newtonian laws; they just come out as atypical. On the other hand, whenever we succeed in explaining (macroscopic) phenomena based on the fundamental (microscopic) laws, we show that they are typical, that they obtain in nearly all nomologically possible worlds (possibly restricted by certain macroscopic boundary conditions). Among the typical features of our world are statistical regularities, which


is where objective probabilities come into play (see Chaps. 4 and 5). And if Bohmian mechanics is true, we even understand why the phenomena of quantum mechanics are typical in this sense (Chap. 13). The case of the thermodynamic arrow discussed in Chaps. 8 and 11 is a particularly interesting example. It is argued, based on the insights of Boltzmann’s statistical mechanics, that nearly all possible micro-histories, relative to a low-entropy initial macrostate, correspond to an evolution of increasing entropy. However, it is atypical for the universe to be in a low-entropy state, to begin with. This is why it’s considered necessary to invoke the Past Hypothesis as an additional theoretical postulate—leading to the debate about whether this Past Hypothesis is of the right kind to be a basic axiom, even an additional law, or whether it cries out for further explanation. The way in which we evaluate a physical theory based on typicality reasoning is thus roughly the following: We consider the set of nomologically possible worlds determined by the laws it postulates and require that the relevant features of our world—the phenomena that are the target of physical explanation—come out as typical (or, at the very least, not as atypical). If our world corresponds, in the relevant respects, to an extremely special and fine-tuned model of the theory, we amend or reject the theory. If we did not follow this standard, we would lose all means to test a theory against empirical evidence, because special initial conditions could account for almost anything. Atypical phenomena, in other words, are not logically inconsistent with the theory but create an epistemically unstable situation. And refusal to move means, in effect, giving up on a rational understanding of the world. The idea that our world just happens to be, in the relevant respects, an atypical model of our theory is unacceptable in science. I believe that this rationality principle is so deeply rooted in scientific thought that it is rarely made explicit, let alone questioned. In fact, more authors have questioned the rules of logic (see, e.g., Putnam (1969)) than entertained “explanations” based on atypicality. I submit that the same way of reasoning should apply in metaphysics when we judge theories about the fundamental ontology of the world. If we want to assess what explanatory work an ontological hypothesis is


doing and how it matches the world that we live in, we should consider the corresponding set of ontologically possible worlds and require, at the very least, that the features of our world which fall under the purview of the proposed metaphysics do not come out as atypical. We will never get around the problem of empirical underdetermination, but this does not mean that there are no rational standards by which ontological commitments can be tested against the world. Typicality provides such a standard. And if we ignored it, we could postulate virtually any ontology we like—as long as it gives us enough "degrees of freedom" to play around with—and claim that they are arranged in precisely such a way as to ground, realize, or serve as the supervenience base of whatever structure we identify in nature. In other words, except for being logically inconsistent, both physical and metaphysical theories can hardly do worse (in their respective domains) than make the relevant features of our world atypical. And when we consider proposals for the metaphysics of laws, the "lawfulness" of the world is undoubtedly a relevant feature.

Very much related is another precedent from science, namely that of typicality as a necessary condition for a successful reduction. We accept the reduction of the thermodynamic theory of gases to the kinetic theory of particles—including the ontological reduction of gases to particle configurations—because the atomistic theory makes the thermodynamic phenomena typical. We accept that water is H₂O because large H₂O systems typically realize the functional properties of water. Conversely, since special micro-configurations could realize almost anything, the typicality standard prevents spurious reductions of the form: Assume that X has the right configuration to reduce Y, then reduce Y to X. Humean supervenience has essentially this character: Assuming that the Humean mosaic is exactly as if governed by laws, we can reduce the laws to the mosaic.

It is, admittedly, difficult to find genuine examples of typicality reasoning in metaphysics that do not rely on natural laws (and hence nomic possibilities). However, it is not too much of a stretch to revisit Leibniz's psychophysical parallelism, which denied the possibility of causal interactions between body and mind, and ask: Why did Leibniz need to invoke his (in)famous doctrine of pre-established harmony to account for the correlation of physical and mental events? Could he not have simply


declared it a brute fact that physical and mental states happen to evolve in perfect coordination? No, because this claim is patently absurd; because without God’s synchronization and in the absence of any modal connection, the correlation of physical and mental events would be not only unexplained but atypical. Clearly, there are countless more ways in which the mental and physical history of the world could be in discord than in harmony.3 And clearly, discord is, therefore, what Leibniz’s ontology of monads would entail without God’s pre-established harmony. The upshot is that, because a mere denial of mind–body connections would make their conformity atypical (though not impossible) and because giving up on this conformity would lead to absurdity, it is unacceptable to stop at denying mind–body connections without invoking an additional metaphysical principle that accounts for their coordination. Humeanism, as I am now going to prove, makes the lawfulness of the world atypical, the “harmony” between physical events at different times and places that allows for their systematization by Humean laws. As a consequence, we can accept Humean metaphysics and be anti-realists about laws,4 or we can believe in physical laws and admit something into the fundamental ontology that accounts for their existence. What we cannot do, based on the rationality principles of typicality, is to buy into what most Humeans are selling: That the ontology of our world is a Humean one and that our world is an atypical instantiation of a Humean ontology—atypical with respect to its lawfulness, the very feature at the center of the Humean account.

16.3 Typical Humean Worlds Have No Laws

We turn to proving the main theorem of this chapter. In brief, typical Humean worlds have no laws. I will begin with a simple toy model that I call the Chaitin model, after Gregory Chaitin (2007), who, based on ideas that strike me as very Humean,5 proposed a connection between scientific practice and algorithmic information theory.

3 A judgment we can make with high confidence even though we have nothing like a "probability measure" on mental states.
4 Which is not completely absurd; maybe Nancy Cartwright (1983) is right, and we never had good reason to believe that the laws of nature are exactly and universally true.
5 Interestingly, Chaitin himself attributes them to Leibniz.

16.3.1 The Chaitin Model

In our model, a Humean world—with the totality of physical facts—is represented by an infinite sequence of 0's and 1's. Assuming a principle of unrestricted recombinations, the set of ontologically possible worlds thus corresponds to $W = \{0,1\}^{\mathbb{N}}$, the set of all 0–1-sequences. The Kolmogorov complexity of a sequence $w \in W$ is defined as the length of the shortest algorithm that generates it. If $w$ has finite Kolmogorov complexity, i.e., can be produced by a finite algorithm, it is called algorithmically compressible. For instance, the sequence $w_0 = 0101010101\ldots$ can be generated by an algorithm like

    while True:
        print("01")

an infinite loop repeating 01, so that it is algorithmically compressible with Kolmogorov complexity 22 or less. An algorithm producing the sequence $w \in W$ corresponds to a systematization of $w$, a candidate for the Humean laws that provide the optimal summary of this world. In the spirit of the best system account, we can think of the algorithms' lengths as the measure of their simplicity (cf. Wheeler (2016)), although our argument will not require laws to be particularly simple; they only have to be finite.

One problem, also familiar from the best system account, is that the length of an algorithm depends on the language in which it is written.6 We will call two languages $L_1$ and $L_2$ intertranslatable if there exists a finite set of rules translating any algorithm in $L_1$ into an algorithm in $L_2$ and vice versa. It is easy to see that intertranslatability is an equivalence relation and that the Kolmogorov complexity of a sequence with respect to any two intertranslatable languages differs at most by a finite constant. Hence, algorithmic compressibility is well defined on these equivalence classes.

It is well known that the best system account would be trivial without some restriction on the admissible languages in which the systematizations can be formulated. For otherwise, the best system would simply consist in a primitive predicate F such that $F(w)$ is true if and only if $w$ is the actual world @, see Lewis (1983, p. 367). The restriction we impose is that the language of the systematization be intertranslatable with some language known to humanity. This seems very generous, certainly more so than if we assumed a privileged language of "natural predicates."7 Hence, let $\mathcal{L}$ be the set of finite algorithms ("possible laws") in any language intertranslatable with some language known to humanity, and $W^* \subset W$ be the corresponding set of compressible sequences. We call any $w \in W^*$ a lawful world. Now, the following are simple mathematical facts:

• The set $W$ of "possible Humean mosaics" is uncountably infinite (its cardinality is that of the continuum): $|W| = 2^{\aleph_0} > \aleph_0$.
• The set $\mathcal{L}$ is countably infinite: $|\mathcal{L}| = \aleph_0$. (There are at most countably many admissible languages and countably many finite algorithms that can be formulated in each. A countable union of countable sets is countable.)
• The set of compressible sequences ("lawful worlds") cannot be greater than the set of possible algorithms ("laws"): $|W^*| \le |\mathcal{L}| = \aleph_0$. (Since each algorithm generates at most one sequence.)
• We conclude: $\frac{|W^*|}{|W|} = 0$. That is, almost all sequences are algorithmically incompressible. Or again, almost all Humean worlds have no laws.

As in the example of irrational numbers, we could also express typicality in terms of a measure rather than cardinalities. It then holds true that $\mu(W^* \subset W) = 0$ with respect to all measures on $W$ that are zero on one-element subsets. In conclusion, "lawfulness" is atypical among Humean worlds under any reasonable interpretation of the concept.

6 The short example above is written in Python.
7 In the toy model, this might correspond to defining compressibility with respect to a fixed universal Turing machine.
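Since Kolmogorov complexity itself is uncomputable, a quick way to get a feel for the dichotomy is to use an off-the-shelf compressor as a crude proxy. The following sketch is my own illustration, not part of the argument; the sequence length and the choice of zlib as compressor are arbitrary assumptions. It contrasts a rule-generated "world" like $w_0$ with one produced by unrestricted recombination, i.e., by coin flips:

    import os
    import zlib

    def compression_ratio(data: bytes) -> float:
        """Compressed size divided by original size -- a crude proxy for compressibility."""
        return len(zlib.compress(data, 9)) / len(data)

    n = 100_000  # length of the toy "mosaic" (in bytes, for simplicity)

    # A "lawful" world: generated by a short rule, namely repeating the pattern 01.
    lawful_world = b"01" * (n // 2)

    # A "typical" Humean world: unrestricted recombination, i.e., random bits.
    typical_world = os.urandom(n)

    print("lawful world: ", compression_ratio(lawful_world))   # roughly 0.003
    print("typical world:", compression_ratio(typical_world))  # roughly 1.0

The lawful sequence compresses to a fraction of a percent of its length, while the randomly generated one does not compress at all; an empirical shadow, so to speak, of the cardinality argument above.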

16.3.2 From the Toy Model to the Real World

While the Chaitin model is instructive, the real world is obviously not a sequence of numbers, and the fundamental laws of nature are not just algorithms for data compression. They are, first and foremost, dynamical laws for the microscopic constituents of the world. In order to extend our previous result to realistic physical laws—focusing, for now, on deterministic ones—we proceed as follows. We take a slice V of the mosaic that is sufficiently extended in space and time to fix, not only initial conditions for any deterministic dynamics, but also the values of all free parameters, like constants of nature, that may appear in their formulation. V could be the actual history of our universe up to some time t, but a great many other choices will do, as well. Then there exist at most countably many deterministic laws (if any) compatible with the facts in V—each determining a unique history for the rest of the universe—but uncountably many Humean possibilities to complete the mosaic. Hence, we conclude that, whatever the facts in V, it is atypical for the rest of the Humean mosaic to be constituted in a way that is consistent with a deterministic law (formulated in any language, formal or natural, that we could ever hope to understand).

As a corollary, we obtain this: Assuming Humean supervenience, any deterministic system that can describe a world up to time t will typically fail to be true at later times. This gives a precise form to the argument that Humeanism cannot sustain inductive inferences (Armstrong, 1983; Dretske, 1977). Admittedly, the principle of induction is hard to justify in general, but Humeanism undermines it, making it irrational to expect anything but its failure.

While this (a)typicality result seems serious enough, it is, strictly speaking, a conditional claim "given one part of the mosaic." In general, there are already uncountably many possibilities for the "initial data," that is, an uncountable infinity of worlds consistent with every single deterministic law. At this point, we need some measure theory, after all, to obtain an unconditional typicality result. As always, we assume that one-element subsets (and hence, by $\sigma$-additivity, countable subsets) of an uncountable set have measure zero. In addition, we require only that this remains true if we conditionalize on the configuration of the Humean mosaic in V and count the possible configurations in some distant region U. This is certainly legitimate considering the Humean principle of free combinations, which holds that one puts no restrictions on the other. Our assumption is actually much weaker than independence of the configurations in V and U, i.e., that the measure factorizes.

There is one technical subtlety involved in the proof that is given in Appendix B.2. It is due to the fact that we are potentially conditionalizing on a null set. But the argument essentially concludes as follows. Denote by $w_U$ the configuration of the mosaic in a spacetime region U. There are uncountably many possibilities for $w_U$, but (by the previous argument) at most countably many consistent with a deterministic law and the "boundary condition" $w_V$. Hence, $\mu(w_U \text{ consistent with a law} \mid w_V) \equiv 0$ (for a suitable choice of U and V) and thus, with $W^*$ the set of lawful Humean worlds,

$$\mu(W^*) \le \int \mu(w_U \text{ consistent with a law} \mid w_V)\,\mathrm{d}\mu(w_V) = 0$$

according to any reasonable measure. We conclude as follows.

Theorem 5  It is atypical for Humean worlds to be consistent with any deterministic systematization.

Philosophically, the notion of a "reasonable measure" is doing important work here. Mathematically, it is possible to define other measures, but those are so clearly biased or ad hoc that they cannot play the role of a typicality measure. Mathematically, one could also put a delta-measure on the reals and say that "almost all real numbers are zero." But this statement would only be true in the technical sense in which the locution "almost all" is used in measure theory. In any other sense of the words, it is merely an abuse of language. The point is that a typicality statement will have rational implications if and only if it is made with respect to a reasonable notion of "almost all" (or "large" versus "small" sets of possible worlds). And I claim that the assumptions of our theorem are so weak and well motivated that they exhaust all measures that could pass for "reasonable." To deny the conclusion of the theorem is to either deny that a one-element subset is vanishingly small compared to an uncountably infinite set (which seems absurd) or presuppose extremely strong "correlations" between different parts of the mosaic (which means, in effect, to deny Humeanism).
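To see the counting behind Theorem 5 in miniature, here is a deliberately crude finite caricature (my own illustration, not the proof given in Appendix B.2): worlds are bit-strings, the admissible "deterministic laws" are all update rules that fix the next bit as a function of the preceding k bits with k ≤ 3, and the "slice V" is a fixed initial segment. Each law determines at most one continuation of V, so lawful continuations form a vanishing fraction of all continuations:

    from itertools import product

    K_MAX = 3      # maximal "memory" of the toy laws
    N_PAST = 8     # length of the fixed slice V (the "boundary condition" w_V)
    N_FUTURE = 24  # length of the remaining mosaic w_U

    def lawful_continuations(past):
        """All futures of `past` determined by some order-k update rule, k <= K_MAX."""
        futures = set()
        for k in range(1, K_MAX + 1):
            # A "law" of order k is a function {0,1}^k -> {0,1}, encoded by its truth table.
            for table in product((0, 1), repeat=2 ** k):
                world = list(past)
                for _ in range(N_FUTURE):
                    index = int("".join(map(str, world[-k:])), 2)
                    world.append(table[index])
                futures.add(tuple(world[N_PAST:]))
        return futures

    past = (0, 1, 1, 0, 1, 0, 0, 1)       # an arbitrary configuration of the slice V
    lawful = lawful_continuations(past)
    total = 2 ** N_FUTURE                 # all Humean continuations of V

    print(f"lawful continuations: {len(lawful)} out of {total}")
    print(f"fraction: {len(lawful) / total:.1e}")  # on the order of 1e-5

Enlarging the class of admissible laws only adds countably many (here: finitely many) continuations, while the number of possible continuations grows exponentially with the length of the remaining mosaic; this is the finite analog of the conditional measure above being zero.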

Finite Systematizations

The theorem about the atypicality of lawful Humean worlds relies on the assumption that there are uncountably many possible configurations of the mosaic or an infinite number of physical facts that the laws would have to summarize. On what basis could this assumption be denied? One could insist that the world is finitary, i.e., that space and time are finite and discrete, and that there are no continuous degrees of freedom in the physical ontology. While this cannot be ruled out in principle, it constitutes a very strong a priori commitment and a revisionary stance with respect to contemporary physics. Alternatively, one could insist that the laws of nature do not have to provide a complete systematization of the world but only summarize a particular, limited subset of events—e.g., measurement results or empirical observations—that is plausibly finite. This second option essentially amounts to instrumentalism, the view that laws are efficient book-keepers of empirical data rather than universal truths about the world.

In any case, if laws had to account only for a finite number of physical facts, it would still be true that typical Humean worlds are more or less irreducibly complex—meaning that they cannot be systematized by laws that are significantly simpler than a complete list of the relevant events—but only with respect to a more limited set of languages in which the systems can be formulated. (Think of the Chaitin model and the question of whether the Kolmogorov complexity of a finite sequence is significantly lower than the length of that sequence.) One could thus retreat to the idea that the regularity of our universe is not objective but that (instrumentalist) laws, and the regular patterns they summarize, exist because we have adapted our conceptual and mathematical tools to the world that we inhabit (cf. Wenmackers (2016)). Although I find this view uncompelling—how could we even exist without a high degree of order in nature?—I am not going to argue against it in more detail. If Humeans conceded that their view of laws is de facto instrumentalism (or requires a revisionist stance on physics), we would have a very different debate.

Indeterministic Laws

The issue becomes more complicated if we consider the possibility of indeterministic laws. On the one hand, a probabilistic law could be logically consistent with any mosaic whatsoever (unless there were real propensities in the world that the law is supposed to summarize). It is usually argued that it could not be the best system for a world whose frequencies have little to do with the law's probabilities. But the argument is based on the premise that there would then be a probability law with much better "fit," which begs the question as to whether there is any good systematization at all. On the other hand, one could make the case that typical Humean worlds are actually well described by something like Brownian motion8—which is technically a stochastic law but one describing pure noise rather than any kind of regular order.

8 That is, a Wiener process for the distribution of the fundamental ontology.

For a probabilistic law to be predictive and support something resembling causal inferences, it must describe robust correlations between relevant events (Lewis (1980) talks, in particular, about "history to chance conditionals"), that is, expressions of the form $P(A \mid B)$ where the conditional probability for A depends non-trivially on B. In a world like ours, the history of the Galaxy should make it reasonably likely that the Earth will still be in its solar orbit 10 seconds from now. Kicking a ball from the left/right should make it likely that the ball flies off to the right/left. In general, distributions of masses and charges in different spacetime regions must be conspicuously correlated in order to manifest law-like regularities. Could such correlations be typical, or at least not atypical, among ontologically possible Humean worlds? Not if we take Humean metaphysics seriously in its denial of modal connections. If the principle of unrestricted recombinations allows us to "count" the possible configurations of one part of the mosaic independently of the configuration of any other part of the mosaic, robust correlations would be atypical. There might be some islands of order—just as meaningful text will occasionally appear among the random typings of a monkey—but no regularities that persist through space and time.

This is, of course, a less rigorous argument than the theorem about deterministic laws. And the typicality statement has to rely on much more specific typicality measures with strong independence properties. However, at the end of the day, I don't expect the contentious point to be whether Humean metaphysics fares much better with respect to probabilistic laws than deterministic ones. As with instrumentalism, committing to indeterminism a priori does not seem like an escape that the best system account of laws would survive in its current form.
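A toy simulation can make the point about missing correlations vivid. This is again my own illustration, with arbitrarily chosen region sizes: if the facts in two regions are recombined freely, i.e., sampled independently, then law-like correlations between them are atypical, and almost every sampled world shows only the small statistical fluctuations one expects from noise.

    import numpy as np

    rng = np.random.default_rng(0)

    n_worlds = 10_000    # independently sampled toy mosaics
    region_size = 200    # number of "local facts" per spacetime region

    # Unrestricted recombination: the facts in region U are sampled
    # independently of the facts in region V.
    region_V = rng.integers(0, 2, size=(n_worlds, region_size))
    region_U = rng.integers(0, 2, size=(n_worlds, region_size))

    # Empirical correlation between the two regions in each sampled world.
    corr = np.array([np.corrcoef(v, u)[0, 1] for v, u in zip(region_V, region_U)])

    print("mean correlation:        ", corr.mean())                      # close to 0
    print("typical fluctuation:     ", corr.std())                       # about 1/sqrt(region_size)
    print("worlds with |corr| > 0.3:", int(np.sum(np.abs(corr) > 0.3)))  # essentially none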

16.4 On the Uniformity of Nature

With the caveats just discussed, it is a mathematical fact that the existence of Humean laws is atypical for Humean worlds. This typicality fact does not depend on our beliefs or a priori expectations, but it has certain implications for what we should believe, accept, or seek to explain; in this case, that we cannot accept Humean metaphysics and believe in a lawful universe without seeking an explanation for its lawfulness. On the other hand, the explanation provided by non-Humean laws is not a bona fide typicality explanation because non-Humean laws make their regularities not just typical but necessary (which is strictly stronger). The primary role of typicality in the argument is thus not to sustain an explanation but to establish that one is required, that the price for declaring the history of the universe to be entirely contingent is unreasonably high. However, what applies here, as well as to bona fide


typicality explanations, is that they do not have to involve an interesting “mechanism” by which the explanandum comes about. In statistical mechanics, there is no causal mechanism by which entropy increases, nor is there any need for one. Here, there is no interesting story left to tell about how laws govern or how dispositions bring about their manifestation;9 the point is that they are a natural part of an ontology that doesn’t make the existence of regularities in the world miraculous. This explanatory virtue of non-Humean laws comes from their modal force, from the way in which they restrict the possibility space of the world. In contrast, the idea that non-Humean laws fare better in explaining their particular instances has made the modal realist position vulnerable to the virtus dormitiva objection that any explanation they provide over and above the regularity theory is trivial or circular: Why do masses attract each other? Because they have the disposition to attract each other. Or again, because it is a law that masses attract each other. In contemporary literature (see, e.g., Emery (2019)), such statements are often spelled out in terms of grounding relations or as in virtue of explanations, which makes them manifestly non-circular but still ring hollow to people not already sold on the merits of these metaphysical concepts. Indeed, the impactful argument of Loewer (2012b)—which not only rejects the charge that Humean laws are not explanatory but puts anti-Humeans on the defensive—was to insist on a distinction between scientific and metaphysical explanations, suggesting that the latter are ipso facto unscientific and thus somehow suspect. Thinking in terms of typicality—and applying the rationality principles that we apply in science—one understands that the point of non-Humean laws is not to provide an additional metaphysical ground for individual instances but to account for why our world is lawful in the first place. At the end of the day, one can only go so far in compelling someone to accept a particular way of reasoning and the epistemic norms that come with it. Some readers may deny that typicality facts have any philosophical implications, not even grant that Humeanism makes the lawfulness of our world surprising or remarkable. But there is no shame in sharing, at least,

9 "It is the business of laws to govern," as Schaffer (2016) puts it.


in a sense of wonder about the order of our cosmos (after all, according to Aristotle, the sense of wonder is the beginning of philosophy). The following passage from one of Albert Einstein's letters to Maurice Solovine comes to mind:

    You find it strange that I consider the comprehensibility of the world (to the extent that we are authorized to speak of such a comprehensibility) as a miracle or as an eternal mystery. Well, a priori one should expect a chaotic world which cannot be grasped by the mind in any way. One could (yes one should) expect the world to be subjected to law only to the extent that we order it through our intelligence. Ordering of this kind would be like the alphabetical ordering of the words of a language. By contrast, the kind of order created by Newton's theory of gravitation, for instance, is wholly different. Even if the axioms of the theory are proposed by man, the success of such a project presupposes a high degree of ordering of the objective world, and this could not be expected a priori. That is the "miracle" which is being constantly reinforced as our knowledge expands. There lies the weakness of positivists and professional atheists who are elated because they feel that they have not only successfully rid the world of gods but "barred the miracles." (Cited from Einstein (1987, pp. 132–33))

What, to their credit, distinguishes most Humeans from the “positivists and professional atheists” that Einstein talks about, is some recognition that the best system account of laws has to rely on nature being “kind to us” (Lewis, 1994, p. 479), on “a high degree of ordering of the objective world” that could not, by any means, be expected a priori. But this kindness of nature is so stupendous and is doing so much work in the best system account that it is highly unsatisfying, if not intellectually dishonest, to leave it as a footnote or some sort of auxiliary assumption without any basis in the metaphysical theory. If Humeans tried to give it more flesh and spell it out as a metaphysical principle that makes the uniformity of the world typical (or necessary),10 their account would be much more sound but also start to look a lot more like anti-Humeanism.

10 A metaphysical analog of the Past Hypothesis in physics, so to speak.


On the other hand, some authors have argued that anti-Humean metaphysics fare no better in explaining the uniformity of nature (Hildebrand, 2013). In this vein, advocates of the regularity theory might admit that Humeanism fails to account for a lawful universe but deny that anti-Humean positions have an explanatory advantage. In the language of typicality, the relevant argument goes roughly as follows: Even if the ontology of our world contained laws (or some form of modal connections) that necessitate simple universal regularities, this very fact would be atypical as well. In almost all worlds in which non-Humean laws exist, the laws are too strange or complex to allow for any meaningful systematization. Hence, the typicality argument can be turned just as well against the anti-Humean theories.

It is not clear that this typicality statement is true. At least, modal realism does not entail the possibility of arbitrarily complex laws in the sense in which Humean metaphysics entails the possibility of arbitrarily complex mosaics. It also seems relevant to note that if we change the configuration of a lawful mosaic only slightly,11 it will, in general, no longer be a lawful mosaic. If we change a simple law only slightly, it will still be a simple law. This is to say that the "degrees of freedom" of a law are very different from those of the world, and the question of what possibilities we must admit with respect to the type "law of nature" strikes me as a very difficult one. Hildebrand (2013) takes nomic primitivism to mean that there exists a primitive lawhood operator "It is a law that…," which can attach to any proposition P, no matter how gruesome or unnatural. But this is not at all how physical laws are formulated or what the main anti-Humean theories actually commit to.

11 In a topological sense, e.g., in a small spacetime region, not in the Humean sense of "closeness of worlds" that tries to hold the laws fixed by fiat.

It is also unclear what the reference class of the above typicality statement is supposed to be. Certainly not ontologically possible worlds as I introduced them, probably metaphysically possible worlds under a liberal interpretation of metaphysical possibility that regards the fundamental ontology of the world as contingent. But the argument is thereby shifting the debate from ontology to meta-ontology, from the question: "What is the fundamental ontology of our world (and does it contain the laws that physics discovers)?" to "Why is the fundamental ontology (here, specifically, the laws) what it is?" I am not sure if this is a good and tractable question. It is, in any case, not the question we set out to debate. It might be worth exploring the idea of meta-laws that constrain the possible non-Humean laws (Lange, 2009), but then the account ends with the meta-laws, which may leave the inquirer equally unsatisfied.

To my mind, the physicist has done her job when she succeeds in grounding the phenomena in fundamental laws (by showing that they are typical with respect to nomological possibility), and the metaphysician has done hers when she succeeds in grounding those laws in the fundamental ontology. Humeanism does not succeed—because it makes the existence of laws atypical—and we must be wary of attempts to move the goalposts in response. As an analogy, consider the claim that matter moves along three spatial dimensions because space, however conceived, is three-dimensional. (It is possible yet atypical that space has more dimensions,12 but all motion happens to occur along a three-dimensional submanifold.) But why does space have three dimensions when it could, at least mathematically, have arbitrarily many? I don't know, and this was not the issue.

12 Or no dimensionality at all, see Lazarovici (2018) for a typicality argument against so-called Super-Humeanism.

To be clear, the goal of this discussion was never to defend anti-Humeanism as an a priori thesis. No one, I think, holds the view that our world must contain some primitive laws or dispositions, even if they govern only the growth of beetroots or account for no meaningful regularities at all. My belief in non-Humean laws is very much contingent on the success of the scientific enterprise. And if I woke up tomorrow and found that the law of gravitation no longer holds, I would float through the air and admit that Humeanism was probably correct all along. Anti-Humean metaphysics does not relieve us of wonder and amazement about the simple and elegant laws that we discover in our universe. But the existence of something over and above the Humean mosaic is an ontological conclusion that we draw from this discovery—with good


reason, as this chapter has argued in detail. That may be as far as we can go. However, if there were a chance to take the explanation one step further, to understand why the laws are what they are, we should, by all means, follow the evidence where it leads us. It could, in any case, lead us only further away from Humeanism.

References

Armstrong, D. M. (1983). What is a law of nature? Cambridge: Cambridge University Press.
Armstrong, D. M. (1986). The nature of possibility. Canadian Journal of Philosophy, 16(4), 575–594.
Armstrong, D. M. (1989). A combinatorial theory of possibility. Cambridge: Cambridge University Press.
Bhogal, H., & Perry, Z. R. (2017). What the Humean should say about entanglement. Noûs, 51(1), 74–94. DOI 10.1111/nous.12095.
Bird, A. (2007). Nature's metaphysics: Laws and properties. New York: Oxford University Press.
Callender, C. (2015). One world, one beable. Synthese, 192(10), 3153–3177.
Carroll, J. W. (1994). Laws of nature. Cambridge: Cambridge University Press.
Cartwright, N. (1983). How the laws of physics lie. Oxford: Oxford University Press.
Chaitin, G. J. (2007). Thinking about Gödel and Turing: Essays on complexity, 1970–2007. Singapore: World Scientific.
Chen, E. K., & Goldstein, S. (2022). Governing without a fundamental direction of time: Minimal primitivism about laws of nature. In Y. Ben-Menahem (Ed.), Rethinking the concept of law of nature: Natural order in the light of contemporary science. Jerusalem Studies in Philosophy and History of Science (pp. 21–64). Cham: Springer International Publishing.
Cohen, J., & Callender, C. (2009). A better best system account of lawhood. Philosophical Studies, 145(1), 1–34.
Dretske, F. I. (1977). Laws of nature. Philosophy of Science, 44(2), 248–268.
Einstein, A. (1987). Letters to Solovine. Philosophical Library.
Emery, N. (2019). Laws and their instances. Philosophical Studies, 176(6), 1535–1561.
Esfeld, M. (2014b). Quantum Humeanism, or: Physicalism without properties. The Philosophical Quarterly, 64(256), 453–470.
Esfeld, M., Lazarovici, D., Hubert, M., & Dürr, D. (2014). The ontology of Bohmian mechanics. British Journal for the Philosophy of Science, 65(4), 773–796.
Filomeno, A. (2021). Are non-accidental regularities a cosmic coincidence? Revisiting a central threat to Humean laws. Synthese, 198(6), 5205–5227.
Finkelberg, A. (2017). Heraclitus and Thales' conceptual scheme: A historical study. Leiden: Brill.
Hall, N. (2015). Humean reductionism about laws of nature. In A companion to David Lewis, Chap. 17 (pp. 262–277). Wiley.
Hildebrand, T. (2013). Can primitive laws explain? Philosophers' Imprint, 13(15), 1–15.
Huggett, N. (2006). The regularity account of relational spacetime. Mind, 115(457), 41–73.
Kim, J. (1986). Possible worlds and Armstrong's combinatorialism. Canadian Journal of Philosophy, 16(4), 595–612.
Lange, M. (2009). Laws and lawmakers: Science, metaphysics, and the laws of nature. Oxford: Oxford University Press.
Lange, M. (2013). Grounding, scientific explanation, and Humean laws. Philosophical Studies, 164, 255–261.
Lazarovici, D. (2018). Against fields. European Journal for Philosophy of Science, 8(2), 145–170.
Lewis, D. (1980). A subjectivist's guide to objective chance. In W. L. Harper, R. Stalnaker, & G. Pearce (Eds.), IFS: Conditionals, belief, decision, chance and time. The University of Western Ontario Series in Philosophy of Science (pp. 267–297). Dordrecht: Springer Netherlands.
Lewis, D. (1983). New work for a theory of universals. Australasian Journal of Philosophy, 61(4), 343–377.
Lewis, D. (1986a). On the plurality of worlds. Oxford: Blackwell.
Lewis, D. (1986b). Philosophical papers (Vol. 2). Oxford: Oxford University Press.
Lewis, D. (1994). Humean supervenience debugged. Mind, 103(412), 473–490.
Loewer, B. (1996). Humean supervenience. Philosophical Topics, 24, 101–127.
Loewer, B. (2012b). Two accounts of laws and time. Philosophical Studies, 160(1), 115–137.
Maudlin, T. (2007a). The metaphysics within physics. Oxford: Oxford University Press.
Maudlin, T. (2007b). What could be objective about probabilities? Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 38(2), 275–291.
Miller, E. (2014). Quantum entanglement, Bohmian mechanics, and Humean supervenience. Australasian Journal of Philosophy, 92, 567–583.
Putnam, H. (1969). Is logic empirical? In R. S. Cohen & M. W. Wartofsky (Eds.), Boston studies in the philosophy of science: Proceedings of the Boston colloquium for the philosophy of science 1966/1968. Boston Studies in the Philosophy of Science (pp. 216–241). Dordrecht: Springer Netherlands.
Schaffer, J. (2008). Causation and laws of nature: Reductionism. In T. Sider, J. Hawthorne, & D. W. Zimmerman (Eds.), Contemporary debates in metaphysics (pp. 82–107). Blackwell.
Schaffer, J. (2009). Spacetime the one substance. Philosophical Studies, 145(1), 131–148.
Schaffer, J. (2016). It is the business of laws to govern. Dialectica, 70(4), 577–588.
Sider, T. (2005). Another look at Armstrong's combinatorialism. Noûs, 39(4), 679–695.
Strawson, G. (2014). The secret connexion: Causation, realism, and David Hume (Revised edition). Oxford: Oxford University Press.
Wenmackers, S. (2016). Children of the cosmos. In A. Aguirre, B. Foster, & Z. Merali (Eds.), Trick or truth? The mysterious connection between physics and mathematics. The Frontiers Collection (pp. 5–20). Cham: Springer International Publishing.
Wheeler, B. (2016). Simplicity, language-dependency and the best system account of laws. Theoria: An International Journal for Theory, History and Foundations of Science, 31(2), 189–206.

A Time-Reversal Invariance

Our discussion of the second law of thermodynamics has repeatedly emphasized the time-reversal invariance of the microscopic dynamics without making precise what this symmetry actually is and how relevant laws exhibit it. I have the impression that the concept of time reversal is rather uncontroversial in physics, though not so among philosophers of physics. Hence, some remarks (beyond Newtonian mechanics) may be in order.

Newtonian Mechanics

Consider an N-particle system following a Newtonian equation of motion

$$m\ddot{X}(t) = F(X(t)), \qquad (A.1)$$

where $X(t) = (x_1(t), \ldots, x_N(t)) \in \mathbb{R}^{3N}$ is the (spatial) configuration at time $t$, $m = \operatorname{diag}(m_1, \ldots, m_N)$ the mass matrix, and $F$ an N-particle force field. Let $X(t)$, $t \in [0, T]$, be a solution of (A.1). We call

$$\bar{X}(t) := X(T - t), \quad t \in [0, T], \qquad (A.2)$$

the time reversal of $X(t)$, and the evolution of $X(t)$ reversible if $\bar{X}(t)$ is also a solution of (A.1), i.e., a possible Newtonian history. To verify that this is indeed the case, we take the time derivatives

$$\dot{\bar{X}}(t) = \frac{\mathrm{d}}{\mathrm{d}t}X(T-t) = -\dot{X}(T-t), \qquad \ddot{\bar{X}}(t) = \frac{\mathrm{d}^2}{\mathrm{d}t^2}X(T-t) = \ddot{X}(T-t), \qquad (A.3)$$

and using the fact that $X(t)$ is a solution of (A.1), we find

$$m\ddot{\bar{X}}(t) = m\ddot{X}(T-t) = F(X(T-t)) = F(\bar{X}(t)). \qquad (A.4)$$

Hence, all Newtonian evolutions are reversible and we therefore call laws of the form (A.1) time-reversal invariant.

Remark 9 (On Time Reversal)
1. Since the law is also time-translation invariant, it does not matter if we consider $\bar{X}(t) := X(T-t)$ or $\bar{X}(t) := X(-t)$ as the time reversal of $X(t)$.
2. The time reversal of $X(t)$ goes through the same spatial configurations in opposite order, while the velocities are reversed (as we can see from (A.3)).
3. Time-reversal invariance would not hold in the presence of dissipative forces $F = F(X, \dot{X})$. But such Newtonian force laws are usually considered to be effective (e.g., friction) rather than candidates for a "fundamental" law.
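For readers who like to see the symmetry at work numerically, here is a small sanity check of (A.1)–(A.4). It is my own sketch, using a harmonic force as a hypothetical stand-in for F and the time-symmetric velocity-Verlet integrator: evolving the velocity-reversed final state retraces the original history back to its starting point.

    import numpy as np

    def velocity_verlet(x, v, force, m, dt, steps):
        """Integrate m * x'' = force(x) with the time-symmetric velocity-Verlet scheme."""
        for _ in range(steps):
            a = force(x) / m
            x = x + v * dt + 0.5 * a * dt ** 2
            v = v + 0.5 * (a + force(x) / m) * dt
        return x, v

    m, k = 1.0, 4.0
    force = lambda x: -k * x                 # harmonic force as a stand-in for F in (A.1)

    x0, v0 = np.array([1.0]), np.array([0.3])
    dt, steps = 1e-3, 5000

    # Forward evolution, then evolution of the velocity-reversed final state.
    xT, vT = velocity_verlet(x0, v0, force, m, dt, steps)
    x_back, v_back = velocity_verlet(xT, -vT, force, m, dt, steps)

    # The reversed evolution retraces the original history: we recover the initial
    # configuration, with reversed velocity, up to floating-point error.
    print(np.allclose(x_back, x0), np.allclose(v_back, -v0))  # expected: True True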


Electrodynamics

For simplicity, we shall not consider the full Maxwell–Lorentz theory but only a single charge in an external electromagnetic field $(E(t,x), B(t,x))$. The Lorentz force law then reads:

$$m\ddot{x}(t) = q\bigl[E(t, x(t)) + \dot{x}(t) \times B(t, x(t))\bigr]. \qquad (A.5)$$

Let $x(t)$, $t \in [0, T]$, be a solution of (A.5) and $\bar{x}(t) := x(T-t)$ its time reversal. Analogously, the "naive" time reversal of the electromagnetic field would be $(\bar{E}(t), \bar{B}(t)) := (E(T-t), B(T-t))$. With (A.3), we find that $\bar{x}(t)$ satisfies

$$m\ddot{\bar{x}}(t) = q\bigl[\bar{E}(t, \bar{x}(t)) - \dot{\bar{x}}(t) \times \bar{B}(t, \bar{x}(t))\bigr]. \qquad (A.6)$$

This is not the Lorentz force law (A.5) for the time-reversed fields $(\bar{E}(t), \bar{B}(t))$ because of the minus sign in front of the velocity-dependent term. However, $\bar{x}(t)$ is a solution of (A.5) with the electromagnetic field

$$\bigl(\widetilde{E}(t), \widetilde{B}(t)\bigr) := \bigl(E(T-t), -B(T-t)\bigr). \qquad (A.7)$$

And it is this transformation that is also consistent with time reversal in the field equations that I omitted here (although it would not be hard to see from the Maxwell equations $\nabla \times E = -\partial_t B$ and $\nabla \times B = \mu_0 j + \frac{1}{c^2}\partial_t E$ why the B-field picks up a minus sign under a reparameterization $t \to T-t$). Therefore, it is widely agreed that classical electrodynamics is time-reversal invariant but that the fields transform non-trivially under time reversal, namely into $(\widetilde{E}, \widetilde{B})$ rather than $(\bar{E}, \bar{B})$. A notable dissent comes from Albert (2000), who argues that the fields are part of the physical state and that time-reversal invariance would require that the exact same sequence of states could unfold in reversed order. I will briefly give my take on the issue below and otherwise refer to Struyve (2022) for an insightful discussion.
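The field transformation (A.7) can likewise be checked numerically. The following sketch is my own illustration, with hypothetical constant field values: it propagates a charge according to (A.5) and then tests whether the time-reversed history satisfies the Lorentz force law with the naively reversed fields or with $(E, -B)$.

    import numpy as np

    q, m = 1.0, 1.0
    E_field = np.array([0.0, 0.2, 0.0])    # hypothetical constant external fields
    B_field = np.array([0.0, 0.0, 1.0])

    def acc(v, E, B):
        """Right-hand side of the Lorentz force law (A.5), divided by m."""
        return (q / m) * (E + np.cross(v, B))

    # Integrate v' = acc(v) with classical RK4; for constant fields, x drops out.
    dt, steps = 1e-3, 5000
    v = np.zeros((steps + 1, 3))
    v[0] = [1.0, 0.0, 0.0]
    for i in range(steps):
        k1 = acc(v[i], E_field, B_field)
        k2 = acc(v[i] + 0.5 * dt * k1, E_field, B_field)
        k3 = acc(v[i] + 0.5 * dt * k2, E_field, B_field)
        k4 = acc(v[i] + dt * k3, E_field, B_field)
        v[i + 1] = v[i] + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

    a = np.array([acc(vi, E_field, B_field) for vi in v])  # accelerations along the solution

    # Time reversal: velocities flip sign, accelerations do not (cf. (A.3)).
    v_rev = -v[::-1]
    a_rev = a[::-1]

    def residual(B_candidate):
        """Mean violation of m*a = q*(E + v x B) along the time-reversed history."""
        rhs = q * (E_field + np.cross(v_rev, B_candidate))
        return np.mean(np.linalg.norm(m * a_rev - rhs, axis=1))

    print("with naively reversed B:", residual(B_field))    # large: (A.5) is violated
    print("with B -> -B, cf. (A.7):", residual(-B_field))   # ~ 0: (A.5) is satisfied

The residual vanishes (up to floating-point error) only for the transformation (A.7), in line with the discussion above.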


Quantum Mechanics

The fundamental dynamical equation in quantum mechanics is the Schrödinger equation

$$i\hbar\,\partial_t \psi = H\psi, \qquad (A.8)$$

describing the time evolution of the wave function $\psi$. Given a solution $\psi(t)$, $t \in [0, T]$, of (A.8), we consider the naive time reversal $\bar{\psi}(t) := \psi(T-t)$, $t \in [0, T]$, and see that it satisfies

$$-i\hbar\,\partial_t \bar{\psi}(t) = H\bar{\psi}(t). \qquad (A.9)$$

Again, this is not the "right" equation, but off by a minus sign due to the fact that the Schrödinger equation is of first order in t. However, taking the complex conjugate of (A.9) (noting the imaginary unit on the left-hand side and that the Hamiltonian is real), we obtain

$$i\hbar\,\partial_t \bar{\psi}^*(t) = H\bar{\psi}^*(t). \qquad (A.10)$$

In other words, the naive time reversal $\bar{\psi}(t) = \psi(T-t)$ does not solve the Schrödinger equation, but its complex conjugate

$$\widetilde{\psi}(t) := \psi^*(T-t) \qquad (A.11)$$

A Time-Reversal Invariance

357

ψ, φ = φ, ψ. However, for the non-positivist, there would seem to be a difference between an empirical invariance and a fundamental symmetry of nature. The orthodox view is thus somewhat unsatisfying.

.

Bohmian Mechanics As usual, things are much clearer in Bohmian mechanics. Bohmian mechanics is a theory about the motion of point particles (just like classical mechanics), so what needs to be reversible is, again, the evolution of the particle configuration .X(t) = (x1 (t), . . . , xN (t)) ∈ R3N , which now follows the guiding equation h¯ ψ ∗ ∇ψ ˙ X(t) = v ψ (X(t)) = Im ∗ (X(t)), m ψ ψ

.

(A.12)

defined in terms of the wave function. Given a solution .X(t), t ∈ [0, T ] ¯ of (A.12), one readily checks that its time reversal .X(t) solves ˜ ˙¯ ˙¯ X(t) = v ψ (X(t)),

.

(A.13)

i.e., the guiding equation for the time-reversed wave function (A.11). Therefore, Bohmian mechanics has time-reversal symmetry.

On the Role of Metaphysics The situation in Bohmian mechanics is quite analogous to that in classical electrodynamics: the evolution of the particle configuration is “exactly” reversible, but there are additional degrees of freedom—here the wave function, there the fields—that transform in a canonical yet non-trivial manner. One metaphysical view distinguishes the primitive ontology (PO)— the fundamental constituents of matter postulated by a theory—from degrees of freedom that belong to the dynamical or nomological structure of theory, whose role, in other words, is first and foremost a dynamical

358

D. Lazarovici

one for the evolution of the PO. Under symmetry transformation in general and time reversal in particular, the history of the PO has to be invariant (resp. covariant in the natural way), while the dynamical structure can transform non-trivially. Elsewhere, I defended the view that, in Bohmian mechanics and classical electrodynamics, the particles alone are the primitive ontology, while the wave function and electromagnetic field fall into the latter category of dynamical structure (Lazarovici (2018); Esfeld et al. (2014); for an even more radical view, see the “minimalist ontology” of Esfeld and Deckert (2017)). In any case, I believe that the debate about which theories have bona fide time-reversal symmetry is ultimately a debate about ontology. Let me end with a remark that may seem to come out of the blue but further illustrates the relevance of metaphysical considerations. If one holds that time itself has no primitive direction, then time reversal can only be understood as a passive symmetry transformation, since the idea that one could hold the direction of time fixed while reversing the order of events unfolding in it is meaningless. This passive transformation should be understood as follows. We parameterize physical histories by a continuum of real numbers. But this mathematical continuum has a surplus structure in that it is totally ordered, while time as such is not. The structure of time involves only a betweenness relation (a triadic relation between points in time), but no dyadic order relation of “earlier than” and “later than.” (This corresponds to a “C-theory” of time in the sense of McTaggart (1908); see also Farr (2020) for a good discussion.) Thus, reversing the orientation of our time parameterization (by .X(t) → X(−t)) is merely a different choice of gauge, which does not correspond to any difference in the world. It yields, in other words, a different mathematical representation of the same physical history. Notably, I did not claim that time-reversal symmetry of the fundamental laws implies the absence (or even irrelevance) of a fundamental arrow of time, say, on the basis that time reversal with respect to such an arrow would be a difference without a difference. Such arguments are ultimately begging the question (see Maudlin (2007, Chap. 4)). Timereversal invariant laws do not require a fundamental directionality of time. But a time-reversal transformation entails at least a global reversal of

A Time-Reversal Invariance

359

velocities (.v → −v). And whether or not this amounts to a genuine physical difference (e.g., the difference between an expanding and a contracting universe) depends on whether or not there is a fundamental direction of time.

B Proof of Theorems

B.1

Computation of the Gravitational Entropy

We prove our estimate of the gravitational entropy as a function of the macro-variables .E, I , U = −V , as introduced in Chap. 11. More precisely, the following. Theorem 4 We are interested in the phase space volume of the macro-region .Г(E, I ± ∈I , U ± ∈U ), i.e.,   1 3N d p d3N q δ H (q, p) − E 3N N !h

Gm2 ≤ (1 + ∈)U 1 (1 − ∈)U ≤ |qi − qj | . i N2 and .N ≥ 4, where .C = (2m) , with 3N 2(N!)h 3N−1 .Ω the surface area of the .(3N − 1)-dimensional unit sphere.

For non-negative E, this can be simplified by using .(E +(1+∈)U )n ≤ (1 + ∈)n (E + U )n ≤ e∈n (E + U )n and .(E + (1 − ∈)U )n ≥ (1 − ∈)n (E + U )n ≥ e−2∈n (E + U )n , for .∈ < 21 . This yields the bounds stated as Theorem 4 in Chap. 11

B Proof of Theorems

363

Proof We first perform the integral over the momentum variables and are left with 3N −2

(2m) 2 . Ω3N−1 2N !h3N



d q E+ 3N

i