English Pages 368 [389] Year 2016
PHYSICS RESEARCH AND TECHNOLOGY
QUANTUM MECHANICS: PRINCIPLES, NEW PERSPECTIVES, EXTENSIONS AND INTERPRETATION
REVISED EDITION
No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.
OLAVO LEOPOLDINO DA SILVA FILHO
New York
Copyright © 2016 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175 Web Site: http://www.novapublishers.com
NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.
Library of Congress Cataloging-in-Publication Data
Names: Silva Filho, Olavo Leopoldino da, author.
Title: Quantum mechanics : principles, new perspectives, extensions and interpretation / Olavo Leopoldino da Silva Filho (Universidade de Brasília, Instituto de Física, Núcleo de Física Estatística e Física Matemática, Campus Universitário Darcy Ribeiro, Brazil).
Other titles: Physics research and technology.
Description: Revised edition. | Hauppauge, New York : Nova Science Publishers, Inc., [2016] | Series: Physics research and technology | Includes bibliographical references and index.
Identifiers: LCCN 2016023926 (print) | LCCN 2016026950 (ebook) | ISBN 9781634855709 (softcover) | ISBN 1634855701 (softcover) | ISBN 9781634855747 (eBook)
Subjects: LCSH: Quantum theory.
Classification: LCC QC174.12 .S524 2016 (print) | LCC QC174.12 (ebook) | DDC 530.12--dc23
LC record available at https://lccn.loc.gov/2016023926
Published by Nova Science Publishers, Inc., New York
CONTENTS

Preface
List of Tables
List of Figures

PART I. PRINCIPLES

Chapter 1. Historical Background: XIXth Century and Beyond
1.1. The Early XIXth Century
1.2. Waves and the Electromagnetic Theory
1.3. Matter and Atomism in the XIXth Century
1.4. Physics in the XXth Century
1.5. The “Impossible” Reduction
1.6. The Structure of the 1927 Interpretation: Reduction Revisited

Chapter 2. The Characteristic Function Derivation
2.1. The Axioms and the Formalism
2.2. Quantization in Spherical Coordinates: An Example
2.3. Quantization in Generalized Coordinates
2.4. Generalization to Mathieu Canonical Transformations
2.5. Connection to Bohr-Sommerfeld Rules
2.6. Connections with Feynman’s Path Integral Approach

Chapter 3. The Entropy Derivation
3.1. The Derivation
3.2. Connections with the Characteristic Function Derivation
3.3. Three-Dimensional Derivation
3.4. The Phase-Space Probability Density Function
3.5. Average Values Calculated Upon Phase Space
3.6. Entropy Maximization by F(q,p;t)

Chapter 4. The Stochastic Derivation
4.1. The Stochastic Approach
4.2. Connections with Previous Derivations
4.3. Ensemble and Time Averages: The Ergodic Assumption
4.4. Consequences of the Stochastic Derivation
4.5. The Origin of Stochastic Behavior

Chapter 5. Quantum Mechanics and the Central Limit Theorem
5.1. The Central Limit Theorem for One Random Variable
5.2. Phase Space Derivation
5.3. Connection with the Characteristic Function Derivation
5.4. The CLT Derivation of the Schrödinger Equation
5.5. Ensemble and Single Systems Revisited
5.6. Ontological Duality as a False Problem
5.7. The Dispersion Relations
5.8. Limitations of the Proof

Chapter 6. Langevin Equations for Quantum Mechanics
6.1. Previous Results
6.2. Langevin Stochastic Analysis
6.3. The Harmonic Oscillator
6.4. The Morse Oscillator
6.5. The 3D-Harmonic Oscillator
6.6. The Hydrogen Atom
6.7. The “Classical” Limit
6.8. Extrinsic and Intrinsic Interpretations
6.9. Further Developments

PART II. NEW PERSPECTIVES

Chapter 7. Classical Representation of the Spin
7.1. The Classical Bidimensional Oscillator
7.2. The Spin Eigenfunctions
7.3. On Rotations by 4π
7.4. Derivation of the Pauli Equation
7.5. The Stern-Gerlach Experiment
7.6. Experimental Assessment to Non-Commuting Observables

Chapter 8. Operator Formation and Phase Space Distributions
8.1. The Notion of a Probability Distribution in Quantum Mechanics
8.2. Operator Formation in Quantum Mechanics
8.3. The Dynamical Equations
8.4. On Distributions Assuming Negative Values
8.5. Why FO(q,p,t)?

Chapter 9. On Reality, Locality and Bell’s Inequalities
9.1. Von Neumann’s Theorem and Bell’s Argument
9.2. On Bohmian Mechanics and Hidden Variable Theories
9.3. Non-Locality: What It Means?
9.4. Bohmian Equations: Wholeness
9.5. On Determinism

Chapter 10. Indistinguishability
10.1. Some Trivial Results on Counting
10.2. Gibbs' Paradox and the Distinction between Classical and Quantum Worlds
10.3. A Derivation of the Correct Boltzmann Weight
10.4. How Come?

PART III. RELATIVISTIC EXTENSION

Chapter 11. Special and General Relativistic Quantum Mechanics
11.1. Special Relativistic Equations
11.2. General Relativistic Quantum Mechanics
11.3. Application: The General Relativistic "Free" Particle Problem
11.4. The Negative Mass Conjecture
11.5. Consequences of the Relativistic Approach

PART IV. INTERPRETATION

Chapter 12. The Interpretation of Quantum Mechanics
12.1. The Copenhagen Interpretation: Reduction of the Wave Packet
12.2. David Bohm’s Interpretation: Wholeness
12.3. The Statistical Interpretation: Ensembles
12.4. The Stochastic Interpretation: Randomness
12.5. The Relative State Interpretation: Many Worlds
12.6. Consistent Histories Interpretation: Events
12.7. The Present Interpretation
12.8. Closing Remarks on Physics, Weirdness and Garbage Collectors

References
Author's Contact Information
Index
PREFACE

Interpretations of Quantum Mechanics abound. Since 1927, when the so-called Copenhagen Interpretation was presented following the pioneering works of Heisenberg, Bohr, Dirac, Schrödinger and others, there have appeared interpretations using concepts like hidden variables, many worlds, consistent histories, and many others, some of them minor variants of the Copenhagen Interpretation, others defying it. In almost all cases these interpretations are not consistent with each other. One is then led to the question of which of them is to be accepted. This can be a tricky question; indeed, given the formal apparatus of a theory (its syntax), there may be a number of interpretations (its semantics) consistent with it, a situation that can be called hermeneutic indeterminacy. Of course, the syntactical apparatus narrows the semantic possibilities and one should expect only some interpretations to be acceptable. How many such interpretations will find grounds in the underlying syntax is never known in advance. Quantum Mechanics, for instance, seems to accept many. There may be cases, however, in which the syntax restricts the set of interpretations to a single one, leaving an unambiguous and stable interpretation of the theory. In the history of science one may find some examples of such a situation: the most prominent one was the dispute between conservationists and dynamists that had been going on since the beginning of Newtonian mechanics. Each side accused the other of being “metaphysical”, a word that became anathema to modern science: forces were metaphysical entities for the conservationists, and energy was the metaphysical entity for the dynamists. However, in the middle of the XIXth century, the work-energy theorem appeared within the realm of thermodynamics (its syntax) and it was quickly realized that the two ways of considering physical problems were translatable into one another.
Thus, “metaphysical” should apply either to both sides or to neither. Translatability thus imposed the dissolution of the dispute and the unification of the two perspectives into a single approach to physical problems. The lesson here is that disputes occurring at the semantic level can sometimes be solved or dissolved at the syntactical level with coercive power. That is, the syntactical developments are so unambiguous that physicists, given their previous commitments to other criteria (such as simplicity) for selecting physical theories, are compelled to accept the implications clearly posed by the syntactical framework. A primary and indisputable commitment of physicists with respect to theory choice concerns the number of constructs necessary to establish a theory. Thus, the transition from Ptolemaic to Copernican astronomy was based not only upon the fact that the latter needed far fewer epicycles to describe planetary motion (although it still needed some) but also, and mainly, upon the fact that it dispensed with the heavy edifice of Aristotelian physics. Thus, if we apply this criterion to interpretations, any interpretation of a theory that allows us to understand its syntactical structure using far fewer constructs (a lighter semantics) should be the preferred one. If this preferred interpretation conflicts with others, then, depending on the economy, clarity and predictive power that it lends to the syntactical apparatus, it will acquire coercive power too. The question then is whether one can find an interpretation of quantum mechanics that shows this kind of coercive effect. This book intends to answer this question affirmatively. Furthermore, the main objective of this book is to present in great detail all the formal developments and their unambiguous interpretation in order to show, rather than abstractly argue for, its coercive power.
However, the reader should not expect abstruse mathematical formalism. Indeed, we purposely chose to keep formal developments as simple as possible, since we believe that less involved formal developments restrict even more the set of possible interpretations consistent with them. Any graduate student is therefore fully prepared, from the formal point of view, to understand all the content of this book. This constraint also favoured an axiomatic approach, since in an axiomatized theory the whole interpretation must be included in the axioms and must be inherited by the results derived from them. As we will soon show, quantum mechanics can be completely described by only three postulates.
Moreover, there will not be any semantic axiom; that is, the axioms will be restricted to the syntactical structure, and their symbols, when interpreted in the usual way agreed upon by physicists, will give rise to the semantics of quantum mechanics. In any case, philosophical considerations will never be stated without their underlying syntactical grounds, to avoid empty physical discourse.
The book is organized in the following way:
PART I - In the first chapter we make a brief and non-exhaustive historical assessment of the physics of the XIXth century. The main objective is to disclose the steps in the dispute between atomism and wave theory that prevailed during the whole nineteenth century, ending with the conclusion that matter should be composed of atoms and radiation should consist of waves. We then address the impact of the crucial experiments that led physicists to the interpretation of quantum mechanics in terms of duality and its companion constructs: observers, indeterminacy, complementarity, etc. This composes the conceptual background against which the developments of the other chapters should be understood. The second chapter commences the formal developments with the presentation of the axioms and the derivation of the Schrödinger equation (within the present approach, the Schrödinger equation is not one of the postulates). Interpretation is kept to a minimum in the first chapters, except where it is needed to furnish the interpretation of the symbols of the syntactical apparatus. In these cases we show that once the symbols of the axioms are interpreted, the interpretation of all other derived results is fixed. We also apply the derivation method for the Schrödinger equation (which is nothing but a quantization procedure) to show that the present approach avoids the embarrassing problem of having to quantize in Cartesian coordinates and only then change to the preferred generalized coordinate system.
Canonical transformations are also addressed, and the Bohr-Sommerfeld quantization rules are shown to be mathematically derivable from our axioms in the appropriate domain of application. This latter result, together with other developments, is then used to show how the crucial experiments showing the undulatory behavior of matter could have been explained using only a corpuscular ontology. This will be our first assessment of duality. At the end of the chapter we also show that the Feynman quantization procedure, based upon Path Integrals, is mathematically equivalent to ours. The third chapter presents another derivation of the Schrödinger equation that is shown to be formally equivalent to the one made in chapter two. This strategy is used to build around the quantum formalism as many mathematically equivalent perspectives as possible. Thus, while in chapter two the primary quantity is the characteristic function (density matrix), in chapter three the notion of entropy plays the most prominent role. The proof of mathematical equivalence between the two derivation methods closes the loop with respect to the possible interpretations of the syntactical constructs and furnishes the analytical form of the phase-space probability density function. The use of a derivation based upon different physical constructs allows us to widen our understanding of the underlying interpretation. For example, the notion of entropy is akin to notions of fluctuations, ergodicity, etc., which are not that obvious within the characteristic function formalism. Chapter four presents the stochastic derivation of the Schrödinger equation, and its mathematical equivalence to all previous derivations is again shown. Stochastic behavior makes obvious what the entropy approach only touches upon: the importance of correctly accounting for fluctuations in the interpretation of quantum mechanics.
Since the formalism is shown to be reducible to the proposed axioms and since the complete interpretation must be included (beyond mathematical equivalence) in the axioms, new understandings of the semantics of the quantum domain are revealed. Fluctuations or independent random phenomena are then linked in a straightforward manner to the Central Limit Theorem. The connection between the Central Limit Theorem and quantum mechanics is the subject of chapter five. In this chapter we develop, although not in a fully systematic way, most of the interpretation of the formalism. The Central Limit Theorem is also connected with the Fokker-Planck equation emerging from a random movement of particles by means of Langevin equations. In the sixth chapter we show the Langevin equations that reproduce all the results of quantum mechanics. We present these equations and show, using simulations, that they correctly describe quantum phenomena. Most of the results of the previous chapters are in fact seen in operation by means of the simulations. The semantics developed in the previous chapter is then reassessed in the light of these new results to strengthen its coercive power. In this chapter we also use these Langevin equations to work out the notion of “classical limit”, to be appropriately understood in the framework of the present interpretation. The simulations are used to show the process by which this limit takes place.
PART II - There are a number of phenomena often used in physics textbooks to establish the abyss between Classical Physics and Quantum Mechanics. In chapters seven through ten we address some of the most important arguments supporting this view and show (always resting on the mathematical formalism) that they can be seen from a totally different perspective, with opposite conclusions about the relations between Classical Physics and Quantum Mechanics.
The case regarding the spin is exemplary and we study it in detail in chapter seven. In chapter eight we present a discussion about operator formation and phase-space distribution functions and equations within the realm of the present approach. Issues about non-locality and Bell’s theorem are addressed in chapter nine. In chapter ten we present a fresh view on the Fermi-Dirac and Bose-Einstein weight functions and their relations to Boltzmann’s weight.
PART III - Chapter eleven deals with the special and general relativistic extensions of the previous formalism. The use of an axiomatic approach is then justified not only from the point of view of the interpretation, but also from the syntactical perspective, since extending an axiomatized theory simply means extending its axioms and repeating the already known derivation steps. This chapter, however, may be skipped without any harm by those interested solely in interpretation issues.
PART IV - The last chapter will put the results concerning the interpretation into a more systematic format and will argue for its persuasive power. Indeed, this book should be judged by the amount of persuasion it imparts on the reader.
LIST OF TABLES

Table 1. Comparison between the energy partition function and the momentum partition function as defined in the present work.
Table 2. Phase space probability density functions for various values of the quantum number n for the harmonic oscillator problem.
Table 3. Half-densities, momentum fluctuations and energies for some states of the hydrogen atom problem given by the quantum numbers in the first column (ξ=cosθ).
Table 4. Phase space probability density functions for some levels of the hydrogen atom (ξ=cosθ).
Table 5. Various averages for the harmonic oscillator problem using the superposition of probability amplitudes and F(q,p;t) calculated from them. Only the results for F(q,p;t) are shown because they are all identical to those obtained using the probability amplitudes of the usual calculation.
Table 6. Momentum fluctuations for the first four quantum levels of the harmonic oscillator. Two superpositions are also shown: the first is a superposition of states n=0 and n=1 and the second is a superposition of states n=2 and n=4.
Table 7. The phase-space probability distribution for the four quantum levels of the harmonic oscillator. One example for a superposition of the levels n=0 and n=1 is also shown.
Table 8. For N=1 we used Np=400, Ntraj=10^5, γ=10.0 and τ=0.005. Expected values of the statistical moments are shown in parentheses.
Table 9. Values of the energies, expression of the momentum fluctuations for the Morse oscillator with D=16 and y=exp(-q).
Table 10. Kinetic energy fluctuations for various states of the one-dimensional harmonic oscillator.
Table 11. Values of λ, m and N for various half-integral spins.
Table 12. Properties of the distribution functions (taken from Lee, op. cit., p. 166).
Table 13. Usual counting process for Boltzmann's, Bose-Einstein's and Fermi-Dirac's distributions.
Table 14. Possible combinations of mass and charge signs allowed by Nature for integral spin particles. The related amplitudes are also shown.
Table 15. Possible combinations allowed by Nature for particles with half-integral spins. The possibilities introduced by parity are not shown.
Table 16. Possible combinations of matter and antimatter with different signs for the charge, spin and parity.
Table 17. Detailed account of the possible combinations allowed by Nature for particles with half-integral spin.
LIST OF FIGURES

Figure 2.1. An experimental setup similar to the Davisson-Germer arrangement to measure the phenomenon of particle diffraction. Most of the particles are reflected with the angle of incidence, but there are some that are reflected with different angles.
Figure 2.2. The Ewald sphere (circle for two dimensions) allowing us to determine what planes in the crystal structure will be responsible for the resulting Bragg-Laue diffraction pattern.
Figure 2.3. One-slit diffraction experiment. A source sends particles one at a time perpendicularly to a screen with an aperture of size a at its center. The particles pass through the slit and collide with a detector. The discrete momentum transfer between the slit and the particles is responsible for the quantized result.
Figure 2.4. The Ewald sphere construction for an aperture.
Figure 2.5. A sketch of an experiment on double slit interference of particles. Each incoming particle interacts with one single slit and exchanges linear momentum with it by a quantized amount, hitting the detector at some spatially quantized positions.
Figure 2.6. Schematic representation of a Feynman path compared to the most probable trajectory of a corpuscle. On the right there is a representation of the fluctuations related to this Feynman path.
Figure 3.1. The solutions to the Liouville equation for states n=1,2 of the harmonic oscillator. The solutions are everywhere zero, except for points close to high values of the fluctuations.
Figure 3.2. Left column: the phase space probability density functions for n=0 and n=1. Right column: the average fluctuation profiles defined upon configuration space.
Figure 3.3. Left column: the phase space probability density functions for n=1 and n=2. Right column: the average fluctuation profiles defined upon configuration space.
Figure 3.4. Contours of the Hamiltonian and of the probability density function for the level n=1 of the harmonic oscillator problem. For the Hamiltonian, each contour represents a constant energy shell, while for the probability density function each contour represents a constant probability density.
Figure 4.1. Ensemble and time averages. The vertical line represents an average at some time t over all the systems composing the ensemble, while the horizontal line represents an average over one single system in the time interval Δt.
Figure 4.2. The behavior of a wave packet scattered by a potential barrier (taken from [70]).
Figure 5.1. Three trials of random movements in one dimension. The steps are marked on the horizontal axis as $y_j^{(k)}$, while their (equal) sum $y$ is represented on the vertical axis.
Figure 5.2. Schematic representation of random walks on phase space. On the left panel we present the drawing of five trials of random walks. On the right panel we present one possible probability distribution on phase space constructed by taking only the end points of innumerable ($\approx 10^4$) random walks. These end points are just the results of the sums $\pi_k^l = \sum_j p_{k,j}^l$ for the momentum variable (all k).
Figure 6.1. The parallel between the usual random walk situation and the present analysis.
Figure 6.2. The sampling of phase space by momentum fibers indexed by configuration space coordinates. We also show how the sampling is done (ensemble approach): we let our particles begin at any point of the phase space and, after some definite time given by $n\tau$ ($n = 6$ in this figure), we make our statistics over the fiber using the characteristic function given in (6.8).
Figure 6.3. Density functions in configuration and momentum spaces for various levels of the harmonic oscillator (ensemble simulations).
Figure 6.4. Fluctuation profiles of the variable q for the harmonic oscillator states n=1 and n=2 (ensemble simulation).
Figure 6.5. Fluctuation profiles for the levels n=1 and n=2 of the harmonic oscillator. The isolated peaks show the vicinities of the regions in which the fluctuation diverges (ensemble simulation).
Figure 6.6. Configuration and momentum space probability densities for various values of the parameter c. States n=1 and n=2 are shown. Note that the momentum probability density function is very insensitive to the values of c.
Figure 6.7. Profiles of the fluctuation structure terms for the states n=1 and n=2 of the harmonic oscillator. (a) for n=1, (---) c=0, (++) c=0.01 and (--) c=0.001; (b) for n=2, (---) c=0, (++) c=0.15, (--) c=0.05.
Figure 6.8. Single system simulations of the states n=1,2,3 of the harmonic oscillator. The parameter values for each state are shown in the figure.
Figure 6.9. Evolution of the configuration space probability density profiles for the state n=1 of the harmonic oscillator using E=5.0 as the energy for the initial conditions. The other parameters are kept fixed as γ=17, τ=0.005, c=0.0005 and Ntraj=50000.
Figure 6.10. Configuration space probability densities for the state n=1 of the harmonic oscillator for various values of the energy E used to calculate the random initial conditions $(q_0^{(k)}, p_0^{(k)})$ in ensemble simulations.
Figure 6.11. (a) Bohmian trajectories, (b) contours of the phase space probability densities and (c) simulation points for the states n=1 and 2.
Figure 6.12. The variance for various quantum mechanical states of the harmonic oscillator.
Figure 6.13. Correlation profile for the simulations of the previous figure.
Figure 6.14. Calculation of γ from the stochastic simulations.
Figure 6.15. Quantum states for the Morse oscillator with D=16. The structure of the Morse potential is also shown.
Figure 6.16. Probability density functions in configuration space for four levels of the Morse oscillator (n,m)=(7,7), (6,7), (5,3) and (4,1), with D=16. The parameter values are also shown.
Figure 6.17. Results of the simulations of the three-dimensional harmonic oscillator in the ground state. On the left is the configuration space probability distribution for some non-zero value of the parameter c. On the right is the same distribution for c=0. The effect of the centrifugal terms is evident.
Figure 6.18. The probability distribution profile for the (2,0,0) state of the three-dimensional harmonic oscillator.
Figure 6.19. Radial probability densities in configuration space for the states (1,0,0) and (2,0,0) of the hydrogen atom. The parameter values are also shown.
Figure 6.20. Fluctuation profiles for two states: (1,0,0) and (2,0,0). The line refers to the exact values (c=0), while the crosses refer to values for c=0.005.
Figure 6.21. The behavior of the probability density functions in configuration space for the states n=0,1 for three different values of the masses. The particle tends to be increasingly confined to the position q=0, since the energy is kept constant and the mass is increased.
Figure 6.22. The phase-space contours for different values of the energy for the harmonic oscillator problem (upper left) and the Morse oscillator problem (lower left). The ‘probability densities’ related to these movements are shown in the upper right and lower right positions, respectively.
Figure 6.23. The filling of the phase space for the n=0 state of the harmonic oscillator for various values of γ. As γ increases, the fluctuations tend to fill the phase space in a non-homogeneous way, thus giving the final probability density function.
Figure 6.24. Phase space filling for a single system simulation of the harmonic oscillator problem (n=0) for γ=1 and 40,000 points in the trajectory. Red and blue regions mean greater or smaller probability density, respectively.
Figure 6.25. Newtonian limit based upon the ideas of the Correspondence Principle.
Figure 7.1. Spin densities for various values of the quantum numbers $S$ and $m$. For $S = 3/2$, $m = 1/2$ a possible visualizable material representation of the model is also shown.
Figure 7.2. Conformal transform map given by $z = \sqrt{Z}$.
Figure 7.3. Riemann surfaces for the conformal map of Figure 7.2 and the ellipse.
Figure 7.4. Schematic representation of the spin particle in its reference system $o$, and the laboratory system $O$. The direction of the magnetic field B is also shown.
Figure 7.5. The Stern-Gerlach experiment with an oven O, a pair of magnets M and a screen S. Characteristic distances are shown in the picture. Magnetic field lines B are also shown. In the Stern-Gerlach experiment the silver was heated to about 1300 K. The atoms emerge through slits which collimate them. Stern and Gerlach verified that their velocity distribution was Maxwellian. Typical speeds were about 1 km/s and $L \cong 10$ cm. Thus an atom spent about $10^{-4}$ s in it.
Figure 7.6. The correct realization of the Stern-Gerlach experimental setup to produce the magnetic field given in (7.33). The magnetic field is, thus, of a quadrupole nature.
Figure 7.7. The relations between the coordinate system related to internal degrees of freedom ($oxyz$) and the coordinate system related to the translational movement of the particle as a whole ($OXYZ$).
Figure 7.8. The expected experimental outcome in a Stern-Gerlach experiment using pinholes as collimators.
Figure 7.9. The experimental result of a Stern-Gerlach experiment. See also [93].
Figure 7.10. The supposed theoretical prediction of the outcome of the Stern-Gerlach experiment. Two dots on the detectors’ screen.
Figure 8.1. Solutions for the first excited state of the Harmonic Oscillator: (a) momentum space density function, (b) Wigner’s distribution and (c) phase space contours of Wigner’s distribution (the gray area corresponds to the region where Wigner’s distribution becomes negative).
Figure 8.2. (a) Probability contours and (b) probability density function for the harmonic oscillator in the first excited state.
Figure 8.3. Change of the behavior of the phase space probability density function with respect to the parameter γ. Each frame was produced using 100,000 points in one trajectory and τ=0.0005.
List of Figures Figure 8.4.
Figure 8.5. Figure 8.6. Figure 9.1.
Figure 9.2. Figure 9.3.
Figure 9.4.
Figure 9.5. Figure 10.1. Figure 10.2. Figure 10.3. Figure 10.4. Figure 11.1.
Figure 11.2.
Bohm’s trajectories, probability density functions and equiprobability contours for the harmonic oscillator (levels n=0,1 and 2). Wigner’s distributions for the first three excited states of the harmonic oscillator. Momentum probability density functions as marginal distributions to FW and FO, respectively. Plot of the right hand side of Bell’s inequality as compared with its limiting value 2 to show that this inequality is violated for the experimental set up of the text. Calculation of the projections of the magnetic moment of the SternGerlach particle for the emission of each single particle. The choice of the Stern-Gerlach magnets to measure particle’s 1 spin (the smaller sphere on the left of the singlet initial state represented by the bigger sphere) and particle’s 2 spin (on the right). Axis is in the z-direction, while axes and are inclined by with respect to the z-direction, and axis is inclined by with respect to the z-direction. Frames 1 to 4 show different combinations of these axes (or magnet orientations) to correlate the results of the measured spin of both particles. plot of the right hand side of Bell’s inequality as compared with its limiting value 2 to show that this inequality is violated for the experimental set up of the text. Artistic representation of a purely classical experimental set to test Bell’s theorem for Classical Physics. How to arrange nine geometrical objects with respect to their color (even if we know that they have different shapes). ways to combine nine geometrical objects into three degenerate stages with degeneration numbers equal to 2, 2 and 3. Asymptotic behavior of Bose-Einstein's an Fermi-Dirac's weights as the density of states goes to infinity. the same example of combination of colors, but with a continuous range of shades for each color. evolution of the system with respect to the parameter τ and the way the probability density function must be built. 
The clocks represent the usual synchronization process done within the Special Theory of Relativity. an example of the interpretation of a flux through the time "surface". The particle is at rest in the four-volume (time passes for it as for the time surface). At some instant (b) the particle interacts with the field and particles and antiparticles are created. Particles flow to the left, staying within the 4-volume, while antiparticles flow to the right, trasversing the time surface and, thus, producing a time-flux.
xix
246 247 249
257 258
263
263 265 276 279 281 282
294
296
xx Figure 11.3.
Figure 12.1.
Figure 12.2.
Olavo Leopoldino da Silva Filho Fluctuations of energy from the particle subsystem to the field subsystem and vice-versa. The two particles shown are exchanging photons in their electromagnetic interaction. When (a) one photon leaves particle 2 towards particle 1 but has not arrived at particle 1, energy is transferred from the particle subsystem to the field subsystem. When (b) the photon is absorbed by particle 1, then energy is transferred from the field subsystem back to the particle subsystem, making energy fluctuate. The realistic situation is when (c) a very large number of photons is leaving both particles while another large number of them is being absorbed by the particles, making the fluctuation profile a very complex one. Everett's branching for the double slit experiment (in which the measured state is simply a position over the photographic plate). Only three systems are shown being measured. Quantum Mechanics is correct if averages over parallel universes correspond to averages taken in the same universe along time, for each branch, after a sufficient large number of measurements. A synthesis of the conceptual net developed in the present book.
297
340 348
PART 1. PRINCIPLES
Chapter 1
HISTORICAL BACKGROUND - XIXTH CENTURY AND BEYOND

One of the most pervasive discussions beginning just after the advent of Newtonian Mechanics was the one concerning the interpretation of light in corpuscular or undulatory terms. Atomism and undulatory theory then began a dispute that would last at least four centuries¹. This dispute was at the heart of many discoveries of XIXth century physics, and its developments laid the ground for the advent of quantum mechanics in the XXth century. In fact, the XIXth century was the period in which physicists were most confident that the nature of matter had been shown to be corpuscular and that the nature of light had been shown to be undulatory. This strong confidence, together with the many experiments made in the last quarter of the XIXth century, laid the conceptual conditions for the interpretation of quantum mechanics put forth in 1927, with its central assumption of some sort of duality.

In this chapter we briefly depict the historical process that, during the whole XIXth century, brought about the conceptions of these two natures: corpuscular and undulatory. Thus, two main theories are at the center of our concerns: electromagnetic theory and atomism, the latter mainly represented by the kinetic theory of gases [1].

This historical assessment should be considered as a mere clarification of the Zeitgeist of the time, since it intends to be neither exhaustive nor detailed, as a book strictly interested in such historical analyses should be. However, it is of much importance for the appreciation of the remaining chapters. Indeed, a historical description may show that what one considers the inevitable and linear completion of some cultural development may have been, in fact, a choice, more or less arbitrary depending on the subject, reached after much wavering and intellectual pain.
In some sense, we will be looking at the XIXth century and the beginning of the XXth century from a very particular perspective: the creation of the conditions of possibility for the acceptance of a principle of wave-particle duality. The chapter begins with some introductory remarks on the state of affairs before the XIXth century, and proceeds with a sketch of the developments of wave theory within the realm of electromagnetic theory. Finally, we present the history of the struggle of both physical and chemical atomism for their acceptance as sound approaches to the description of matter. In the final sections we address Heisenberg, Bohr, Schrödinger, Born, Einstein and others in their almost desperate attempts to find some worldview that could encompass all the developments of the XIXth century while still agreeing with the new experiments. In these last sections we will use as much as possible the voice of the protagonists of the endeavor leading to the interpretation of 1927. We hope that this will put their astonishing results in the correct historical and conceptual perspective.

¹ In fact, many authors would place the beginning of this dispute much further back in time, when Stoicism sustained the notion of phlegm (a pervasive background forming space) while Epicureanism defended the notion of atoms and void. For our purposes, we may begin with the modern version of this dispute.
1.1. THE EARLY XIXTH CENTURY

If we can view mechanics as a history of the conceptions of space and time throughout the centuries, it is equally true that we can approach electromagnetic theory and optics as the development of the dispute about the nature and properties of light. The rectilinear propagation of light and the phenomena of reflection and refraction were already known to the ancient philosophers interested in the nature of light. Some of the earliest systematic studies of the characteristics of light were made by Greek philosophers (e.g. Empedocles, ca. 490-430 B.C.) and mathematicians (Euclid, ca. 300 B.C.). The Greeks of that time already knew the law of reflection (equal angles). The Middle Ages were a less fruitful period for the study of the nature and properties of light. The subject was revived by the so-called modern philosophers, in particular by René Descartes (1596-1650), who developed many ideas about the nature of light, mostly from his metaphysical perspective.

“...when someone holds a torch in the darkness and, moving it from side to side, looks at a mirror standing half a kilometer away, he will be able to say if he feels the movement in his hands before seeing it in the mirror... If there were a possibility to detect a temporal lapse in such an experiment, I must admit that all my philosophy would completely fall apart.” (Descartes, letter to Beekman, August 22, 1622, my translation).
Indeed, for Descartes the notion of void (vacuum) was meaningless and light was essentially a pressure transmitted through an elastic medium which constitutes, rather than fills, all space; for Descartes, there was no difference between extension and matter, the latter being only a way for us to conceive the former. He thought that colors, for example, arise from the different rotational movements and velocities of certain parts of the extension. Descartes made some important contributions to the field of geometric optics [2,3].

The advent of the experimental method allowed the modern philosophers to put the field of geometric optics on solid grounds, both from the theoretical and the empirical points of view. Galileo himself published a report, in 1638, about his attempts to measure the velocity of light. In this report Galileo tells us that he positioned himself at the top of a hill with a lamp in his hand, while an assistant stayed at the top of another hill with another lamp. On seeing the light of Galileo’s lamp, the assistant was to remove the cloth covering his own lamp. In this way Galileo thought he would be able to measure the velocity of light by knowing the time needed for the light to return to him.

The law of refraction was experimentally discovered in 1621 by Willebrord Snell (Snellius, 1591-1626). As early as 1657, Pierre de Fermat (1601-1665) enunciated his principle of least time, by which light always takes the least possible time to travel from one point of space to another.

Despite its developments, geometric optics was (and is) unable to decide the main controversy that lasted for more than three centuries, that is, the controversy about the nature of light: corpuscular, as many physicists with Newtonian leanings would defend, or undulatory, as physicists like Christiaan Huygens (1629-1695) and others would sustain. It should be noted, however, that many phenomena capable of distinguishing the corpuscular and undulatory views about the nature of light were already experimentally known at the time of Newton; for instance, the phenomenon of interference by thin films, nowadays known as “Newton's rings”, was discovered independently by Robert Boyle (1627-1691) and Robert Hooke (1635-1703). In fact, Hooke also observed the phenomenon in which light may be found in the geometric shadow of illuminated bodies. This phenomenon had already been reported by Francesco Maria Grimaldi (1618-1663).

Hooke may be considered the precursor of Huygens’ geometrical principle concerning the construction of wavefronts, since he was the first to sustain that light should consist of fast vibrations propagating quickly (or possibly instantaneously) through the medium in such a way that each vibration would generate, at each instant of time, secondary spherical waves. The debate about the nature of light also implied the dispute about the nature of space (void or ether), and this would have many implications for the physics of the XIXth and XXth centuries. Hooke also tried to explain the phenomena of refraction and color dispersion, but the latter was characterized by Newton (1642-1727) in 1666, who showed that white light could be split into component colors using a prism and that each particular color should be associated with a specific change of direction (the dispersion relations, as we would say nowadays).
The difficulties of the undulatory theory in furnishing, at that time, a good explanation of the polarization of light were a major obstacle to its acceptance. This phenomenon could be understood by using the corpuscular approach and the model by which the corpuscles of light should have “sides”². In a number of ways, including the political one, Newtonian atomism played a considerable role in favor of the corpuscular perspective.

² It is interesting to stress what 'model' means in this specific case. It means a way to lend rationality to phenomena, although it is not part of the physical theory itself, since it is never made part of its syntax. This 'morphological' model, however, would be translated into the dynamic one which attributes intrinsic momentum to photons and, by this, would become part of modern physical theory.

Huygens was most certainly the most prominent follower of Hooke’s ideas, and his principle was fundamental in giving credibility to the undulatory model, since it was able to explain the phenomena of light reflection and refraction in a systematic way. Huygens was also responsible for the interpretation of double refraction, discovered in 1669 by Erasmus Bartholinus (1625-1698). His interpretation was based upon the hypothesis that, in the crystal, besides the primary spherical waves, there should also be a secondary ellipsoidal wave. It was during these investigations on double refraction that Huygens discovered the polarization of light. Since, at the time of Huygens, light was not acknowledged to be a transversal wave, an explanation of its possible polarization was unavailable, and it was Newton who gave an (acceptable) interpretation of the phenomenon by assuming that the corpuscles of light have “sides” [4] (see, for example, Question 26 in the third book of Opticks). The interpretation of polarization, together with the authority of Newton, contributed to a hegemonic acceptance of the corpuscular model for the nature of light, even though the undulatory model had won the support of some intellectually significant figures, such as Leonhard Euler (1707-1783).

The discovery that the velocity of light is finite was made in 1675, when Olaf Römer (1644-1710) showed that the anomaly in the Jupiter satellites could be explained with this hypothesis. Indeed, Römer reported a variation in the observed times at which the moon Io, in its orbit around Jupiter, was eclipsed by the planet; by correlating this variation with the relative positions of the Earth and Jupiter, he estimated the time light needs to cross the diameter of the Earth's orbit (22 min, reasonably close to the modern value of about 17 min). Despite this, the debates about the finite character of the velocity of light continued through the first quarter of the XVIIIth century until, in 1729, James Bradley (1693-1762) finally decided the issue when he published his calculations of the velocity of light from aberration phenomena caused by the orbital movement of the Earth (his findings were quite close to those of Römer). This discovery would surely have been devastating to Descartes, had he been alive at the time.

From the point of view of the dispute between the corpuscular and the undulatory models, the situation was most favorable to the corpuscular one until the first quarter of the XIXth century. As early as 1801, Thomas Young (1773-1829) enunciated the principle of interference and explained, in undulatory terms, the colors obtained in thin films. However, his explanations were still excessively qualitative and could not overturn the convictions of the physicists about the adequacy of the corpuscular model.
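The comparison between Römer's 22 minutes and the modern value can be made concrete with a short calculation. The following is only an illustrative sketch: the value of the astronomical unit and the modern speed of light are present-day constants that were not available to Römer, and the function names are hypothetical.

```python
# Illustrative sketch: what speed of light does a given crossing time of the
# Earth's orbit imply? The constants below are modern values (assumptions of
# this example), not figures taken from Römer's own work.

AU_M = 1.495978707e11              # mean Earth-Sun distance, in metres
SPEED_OF_LIGHT_M_S = 299_792_458.0 # modern value of the speed of light


def speed_from_crossing_time(minutes):
    """Speed implied if light crosses the orbit's diameter (2 AU) in `minutes`."""
    return 2.0 * AU_M / (minutes * 60.0)


def crossing_time_minutes(speed_m_s):
    """Time, in minutes, needed to cross the orbit's diameter at `speed_m_s`."""
    return 2.0 * AU_M / speed_m_s / 60.0


romer_speed = speed_from_crossing_time(22.0)
modern_minutes = crossing_time_minutes(SPEED_OF_LIGHT_M_S)

print(f"Speed implied by a 22 min crossing: {romer_speed:.3e} m/s")
print(f"Fraction of the modern value:       {romer_speed / SPEED_OF_LIGHT_M_S:.2f}")
print(f"Modern crossing time:               {modern_minutes:.1f} min")
```

Under these assumptions the 22-minute figure corresponds to roughly three quarters of the modern speed of light, which is why the estimate can be regarded as being of the right order of magnitude.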
At approximately this time, Étienne Louis Malus (1775-1812) discovered the polarization of light by reflection, when he observed that light reflected from a window and incident on a piece of crystal produced an image by double refraction that varied in intensity when the crystal was rotated with respect to the line of view. The corpuscular model, however, was still being developed by its followers, such as Pierre Simon de Laplace (1749-1827) and Jean-Baptiste Biot (1774-1826). Laplace was very influential at that time in the French Academy of Sciences and proposed a contest about the problem of diffraction. He was hoping that a corpuscular explanation would be found, proving the greater adequacy of the corpuscular model. His expectations were greatly frustrated when, despite strong opposition, the prize was given to Augustin Jean Fresnel (1788-1827), who dealt with the diffraction phenomenon based solely on the undulatory model and also announced a series of experiments that, in the course of a few years, would discredit the corpuscular model.

Curiously enough, Poisson, who was a follower of the corpuscular model, noted that the results of Fresnel, if correct, should give rise to an unthinkable phenomenon: if light were made to pass through a region occupied by an opaque disk, a bright point should appear in the center of the shadow produced by the disk! Poisson then argued that this absurd result should be considered as evidence that there was something wrong with the hypotheses assumed by Fresnel, particularly with respect to the undulatory nature of light. To the despair of the followers of the corpuscular model, Dominique François Arago (1786-1853) made an experiment that confirmed the existence of the bright point mentioned by Poisson. This moved the undulatory model from a descriptive to a predictive status, giving it a much stronger power of persuasion.
Together with Arago, Fresnel studied the interference of two polarized rays and found (1816) that two rays polarized perpendicularly to one another do not interfere. Since such a discovery cannot be accommodated within an undulatory model with longitudinal waves, it was concluded (Young, 1817) that a transversal model for light propagation must be assumed, thus removing the major obstacle to the acceptance of the undulatory model³. This assumption, however, was very critical for the hypothesis of a luminiferous ether since, as Fresnel (one of its greatest enthusiasts) noted, only a solid should be able to sustain transversal waves. Fresnel was also responsible for the hypothesis (1821) that heat could be explained by a molecular theory of matter. This fact shows that Fresnel was assuming an undulatory model for light, while advocating the corpuscular model for matter (atomism). Models for a solid ether were developed by Fresnel and led him to find laws (now bearing his name) that explain the behavior of the intensity and polarization of light produced by reflection and refraction. Much of the theory of the solid ether was developed by Louis Marie Henri Navier (1785-1836) and Augustin-Louis Cauchy (1789-1857).

By the 1850s the corpuscular theory of light was almost completely discredited. The final stroke came with the experiment made independently by Foucault and Fizeau, and Breguet, showing that the velocity of light was smaller in denser media. Indeed, the corpuscular model would use linear momentum conservation to arrive at its results on the velocity of light, and these results lead to greater velocities in denser media.

Although many problems were solved by the undulatory model in the first half of the XIXth century, the foundations of optics were still considered unsatisfactory. This fact should serve as an example that the foundations of a physical theory usually come much later than its applications are established, and that these foundations are better suited to tell us what Nature is than to solve practical problems.
We are still in a period of time before the final establishment of the electromagnetic theory by Maxwell which would link the fields of optics, electricity and magnetism and give the undulatory model its final completion.
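The opposite predictions of the two models for the velocity of light in a denser medium, mentioned above in connection with the experiments of Foucault and Fizeau, can be sketched in a few lines. The notation is an assumption of this illustration and does not come from the text: $\theta_1$ and $\theta_2$ are the angles of incidence and refraction, $v_1$ and $v_2$ the velocities in the rarer and in the denser medium, and $n > 1$ the measured ratio of the sines (Snell's law).

```latex
% Corpuscular model: the force at the interface is normal to it, so the
% tangential component of the corpuscle's momentum (hence velocity) is conserved.
\[
  v_1 \sin\theta_1 = v_2 \sin\theta_2
  \quad\Longrightarrow\quad
  \frac{\sin\theta_1}{\sin\theta_2} = \frac{v_2}{v_1} = n
  \quad\Longrightarrow\quad
  v_2 = n\, v_1 > v_1 .
\]
% Undulatory model (Huygens' construction or Fermat's principle of least time):
% the wavefront must stay continuous along the interface.
\[
  \frac{\sin\theta_1}{v_1} = \frac{\sin\theta_2}{v_2}
  \quad\Longrightarrow\quad
  \frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2} = n
  \quad\Longrightarrow\quad
  v_2 = \frac{v_1}{n} < v_1 .
\]
```

Both models reproduce the same law of refraction, so geometric optics alone cannot separate them; only a direct measurement of the velocity in the denser medium, such as the one described above, discriminates between the two.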
³ One should note here the difference between the notion of transversal waves and “sides” of particles of light. The concept of transversal waves is blended into the formal apparatus of wave theory (and will gain this status even more with the advent of electromagnetic theory), while the mechanical notion of particles’ sides has no such blending. This is why we call the latter only a model of intelligibility, while we call the former a physical model.

1.2. WAVES AND THE ELECTROMAGNETIC THEORY

The rise of electromagnetic theory has its own complex history. Our interest here rests on the importance of the completion of the electromagnetic theory to the debate between corpuscular and undulatory views (in the realm of a theory of light). Together with the notion of waves, one of the most prominent new notions was that of a field.

One of the first physicists to develop the notion of field was Michael Faraday (1791-1867), although not in mathematical terms, since Faraday was rather crude with respect to formalizations. Faraday was a disciple of Humphry Davy (1778-1829) and became aware of the 1819 experiment of Hans Christian Oersted (1777-1851) in which an electric current deflects a magnetic needle. He became interested in the relations between electricity and magnetism and, in 1821, he discovered magnetic rotation, having even built what could be called a precursor of the electric motor.

With his discovery, in 1831, of magnetic induction, he began to formulate his questions in terms of lines of force, instead of solely in terms of force. Being a fierce adversary of the notion of “action at a distance”, he began to work with the notion of lines of force (as being truly existent, in the same sense as matter). The notion of line of force in Faraday’s terms is quite complex and is based upon a notion of atom derived from Boscovich, in which atoms were considered entities without extension but capable of transmitting forces (Boscovich used these notions to amalgamate Newton's idea of material corpuscles and the Leibnizian monads). Around the middle of the XIXth century, Faraday was considering the lines of force in quite realistic terms, as being capable of being polarized in such a way as to explain the polarizability of materials. Faraday’s explanation for his law of induction, for example, is based on the assumption that when we move a conductor through a magnetic field (or vice-versa), we cut the lines of force, which react and generate an electric current inside the conductor.

At the time of Faraday’s death (1867), his works were making a strong impression upon two great physicists of the XIXth century: William Thomson (Lord Kelvin of Largs, 1824-1907) and James Clerk Maxwell (1831-1879). These two scientists took upon themselves the task of putting Faraday’s intuitions into strict mathematical terms. Thomson was, indeed, the first to acknowledge the richness of Faraday’s works (although initially he had not taken them too seriously). His friendship with Stokes allowed Thomson to build mathematical models to study electric and magnetic interactions, but at the cost of abandoning the concept of lines of force, replaced by the notion of torsion in elastic media. In 1847 Thomson wrote that:

“What I have written is just a sketch of a mathematical analogy. I didn’t intend even to suggest its use as the foundation of a theory about the propagation of electric and magnetic forces, which, if completely established, would express as a necessary result the connection between the electric and magnetic forces (...) If such a theory could be discovered, it will also explain, when considered in connection with the undulatory theory of light, the effect of the magnetism on polarized light.” (William Thomson. Letter from Thomson to Faraday, June 11, 1847; cited in [5]).
Thomson had much influence over Maxwell, particularly in introducing him to Faraday’s work and convincing him that the intuitions of Faraday could be fruitful. In 1855-56 Maxwell published his work On Faraday’s Lines of Force, in which the field theory imagined by Faraday is placed on sound mathematical grounds. To accomplish that, Maxwell used a model based on fluid behavior that he considered only as a useful, although imaginary, analogy. In his later studies, Maxwell absorbed Thomson’s influence and began to use the rotatory model, based upon Thomson’s notion of vortices, for which Maxwell sought a mechanical foundation. Around 1860 Maxwell's mechanical model for Thomson’s vortices had already been abandoned and only the mathematical formulation remained.

Maxwell’s equations, published in his epoch-making work A Treatise on Electricity and Magnetism [6] in 1873, unified electricity, magnetism and optics. Within its framework the equations for light propagation are unambiguously represented by wave equations, and this seemed to settle the problem of the nature of light (at the time linked to the notion of ether). As everyone knows, this wasn’t the last word on the subject. Atomism was being developed during the whole XIXth century within the realm of a theory of matter. Maxwell himself was one of its most brilliant contributors. As we will soon see, these two distinct fields of knowledge (atomic theory of matter and undulatory theory of light) remained stable for a very short period of time. Soon there appeared experiments in which matter interacts with light, and those experiments intermingled everything again.
1.3. MATTER AND ATOMISM IN THE XIXTH CENTURY

In the previous sections we tried to briefly present the historical process by which physicists were trying to know the intimate nature of light. In the present section we are interested in the historical process that took place in parallel with it: the struggle to understand the nature of matter. To some extent, the quest here is quite similar to the one about the nature of light: scientists were interested in knowing whether matter is a continuum or whether it is made of discrete, more elementary corpuscles (atoms).

This problem, indeed, may be traced back to the fifth century B.C., when it was discussed by Greek atomists like Leucippus and Democritus, and it may be found presented in great detail in the preserved work of Lucretius [7], written long after their deaths. The efforts of all these philosophers to understand nature using atomism were revived in the modern era by the natural philosopher Pierre Gassendi (1592-1655), a contemporary of Descartes. The work of Lucretius was recovered in the XVth century and Gassendi had access to it. In fact, in England, Francis Bacon said, in his work On the Philosophy of Democritus, Parmenides and Telesius, that “the atomic theory is distinguished among all others developed by him [Democritus] as the chain that better connects the best parts of the physical philosophy of the ancients and the modern world”.

Until the XVIIth century, the discrete structure of matter could be considered as a mere philosophical hypothesis, without any empirical support. In the XVIIIth century, some considerations about the nature of heat, set forth by Daniel Bernoulli in 1738, put these speculations in a format suited for scientific investigation. However, it was not before the XIXth century, with the pioneering works of Lavoisier, Dalton, and others, that atomism began to search systematically for its experimental verification; one that would establish it as a true physical model of Nature.
We can distinguish, for pedagogical reasons, two types of atomism: chemical and physical. Although chemical atomism began at the dawn of the XIXth century, physical considerations on the atomist hypothesis began (disregarding the work of Bernoulli) around the second half of the XIXth century, within the realm of the kinetic theory of gases, revived mainly from Bernoulli's intuitions. In what follows we briefly describe both the chemical and the physical atomisms. The interested reader may refer to [1] for more details.
1.3.1. Chemical Atomism

Although the atomic model had already been considered by physicists of the stature of Newton and Boyle at the end of the XVIIth century and the beginning of the XVIIIth century, what is nowadays considered the rise of a systematic, scientific modern atomism may be traced back to the discoveries made in the transition between the XVIIIth and XIXth centuries by Antoine-Laurent Lavoisier (1740-1794). His first discovery, that water has a compound structure, was deadly to the “four-elements philosophy” of the ancient Greeks (tracing back to Empedocles). This discovery was later confirmed by the works of Henry Cavendish (1731-1810) and Joseph Priestley (1733-1804), which also proved the compound nature of air. The second discovery of Lavoisier was the conservation of mass, together with the appreciation of gravimetric studies in chemical analyses.

As early as the beginning of the XIXth century there appeared a number of important studies in this same line of thought, such as the one related to the Law of Constant Proportions of Joseph-Louis Proust (1754-1826), published in 1806, and the Law of Multiple Proportions of John Dalton (1766-1844), published between the years 1802 and 1804. In 1808, by publishing his work A New System of Chemical Philosophy, Dalton put forth his convictions on the existence of invisible and indestructible entities that would be the primary constituents of matter. One of the greatest advances of this early atomism was its assumption that atoms should be characterized by their weight.

Soon there appeared other ways to classify the elements by means of their supposed atomic constituents, the most important of them being the one due to Jöns Jacob Berzelius (1779-1848). His classification, essentially based on the first letter of the element’s Latin name, made evident an important characteristic of the laws of Proust and Dalton: they give us only the ratios in which the atoms are present in the compounds, but they give us no clue to their true structure, an ambiguity that increases for more complex substances. This ambiguity opened the possibility of a double interpretation of the experimental results: the atomist interpretation that we have just described and an equivalentist interpretation, whose origin can be traced back to 1792 and is due to Jeremias Benjamin Richter (1762-1807). He, as early as 1791, had already discovered the Law of reciprocal ratios.
The notion of equivalence was formally introduced with this name by William Wollaston (1766-1828) in his work Synoptic Scale of Chemical Elements. Another source of confusion was due to the nomenclature used at the time. Dalton, among others, did not distinguish among the (modern) notions of atoms and molecules. The distinction between these two concepts began to be made from the hypothesis of Amédée Avogadro (1776-1850) and the volumetric laws of Joseph-Louis Gay-Lussac (1778-1850). These laws were developed in 1809 and implied in transposing to gases the results of the laws of Dalton for liquids and solids. The Avogadro hypothesis (that equal volumes of all gases in the same conditions of temperature and pressure have the same number of molecules) led him to distinguish among simple molecules (atoms) and compound molecules. For instance, Avogadro’s law allows one to understand the oxygen gas as being formed by molecules of 𝑂2 and not by atoms of 𝑂. Although having such an impressive beginning as a quantitative chemistry, the atomic theory (or atomic hypothesis at that time) continued to be greatly criticized. One of the most fierce critics of the atomic approach was the french chemist Jean-Baptiste Dumas (18001884). The basic reason for the rejection of the atomist approach was that one should “refuse to whatever speculation too far from the observed facts” (an idea that was also present in XX 𝑡ℎ century physics, although mitigated). Scientific Positivism was largely based on these ideas. New advances due to the use of the atomist approach continued to appear in the first half of the XIXth century. An example was the Law of Dulong and Petit (Pierre Louis Dulong,
1785-1838 and Alexis Thérèse Petit, 1791-1820), announced in 1819. Another nice example was the more precise determination of the atomic weights of new elements, done by Berzelius, Auguste Laurent (1807-1853) and Charles Gerhardt (1816-1856), and the construction of an enlarged list of elements that, in 1869, was given its systematic structure by Dmitri Ivanovich Mendeleev (1834-1907) in the form of the Periodic Table. Besides the developments regarding the electromagnetic theory, the first half of the XIXth century also saw the electrochemical studies of Alessandro Volta (1745-1827), Sir Humphry Davy (1778-1829) and Michael Faraday (1791-1867). Those discoveries supported the assumption that the forces binding the atoms in substances were of electrical origin, giving rise to a whole theoretical approach to the electroaffinity of atoms, mainly developed by Berzelius. Much of Faraday’s work was directed to the effects of galvanic currents on chemical compounds, especially electrolytes. His technique furnished a different way to find relative atomic weights. Indeed, the implications of Faraday’s discovery in 1833 of the second Law of electrochemistry (for a given quantity of electricity, the amount of substance deposited on an electrode in an electrochemical process is proportional to its equivalent weight) were revolutionary, since they implied the existence of a quantum of electricity (the electron). In 1860, the distinction between atoms and molecules, initially a source of confusion, was made clear at a congress in Karlsruhe, Germany. After 1860 science continued to present evidence of the adequacy of the atomic hypothesis, although an experimental verification would have to wait until the last decade of the XIXth century.
Indeed, after the first half of the XIXth century, the development of the atomic hypothesis and its ever more successful application to phenomena was driven mainly by the physical approach, although other important discoveries were also made in the realm of chemistry, such as the rise of structural chemistry, in which substances began to be considered from the point of view of their internal structure, that is, of the way atoms were bound to each other. However, Positivism (in its various forms) continued to influence some important scientists, such as Marcellin Berthelot (1827-1907), Ernst Mach (1838-1916), Wilhelm Ostwald (1853-1932) and Pierre Duhem (1861-1916).
1.3.2. Physical Atomism
The theory of heat was another field for possible applications of the atomic hypothesis. For more than a century, until 1780, heat was generally accepted as a macroscopic manifestation of atomic movements (mainly due to the work of Boyle), although without a sound theoretical basis. During the period extending from 1780 to 1825 the caloric theory (heat as an imponderable fluid) was generally accepted and, after a transition period (1825-1847), the atomic theory was revived in connection with a dynamic theory of heat. Thermodynamics had a major development during this time, since most of its conclusions do not depend upon considerations about the microscopic constitution of bodies. Clearly, some properties of matter depend in an obvious way on its microscopic constitution and thus even thermodynamics was about to reach a point at which atomist considerations would become unavoidable. In fact, as early as 1819, with the empirical Law of Dulong and Petit, one already had a situation in which the knowledge of the dynamical properties of matter and
the use of the equipartition theorem of energy would be needed for a full explanation of the underlying phenomena. Around 1850, the Kinetic Theory of Gases rose from considerations that emerged in the transition period regarding the reality of atoms. This theory began, in particular, with the works of John Herapath (1790-1868), who proposed that the properties of a gas were a result of the vis viva of its constituent particles. At that time, however, there was still confusion about the identification of the vis viva with the kinetic energy, and Herapath found that the product 𝑝𝑉 was proportional to 𝑇², instead of 𝑇. In the same period (1845), J. J. Waterston (1811-1883) tried (in vain) to publish a work that was rediscovered some time later (1891) by Lord Rayleigh, in which one can discern a first enunciation of the Equipartition Theorem, later generalized by Maxwell. In all cases, the Kinetic Theory of Gases intended to furnish a dynamic basis to the theory of heat based upon the hypothesis of an underlying microscopic reality of atoms. It created a whole field of investigations that already presupposed the existence of atoms and which, from a quantitative perspective, gave information about molecular sizes, velocities, etc. In a word, the ontology of the field was mature. From this moment on, the atomic hypothesis began to furnish predictions that could be confronted with experiments. The atomic hypothesis thus became an Atomic Theory. Indeed, in 1847, James Prescott Joule (1818-1889) gave his famous lecture at St. Ann’s and presented for the first time his Kinetic Theory of Gases, by means of which he was able to calculate the specific heat of various gases and managed to find the velocity of hydrogen atoms at a certain temperature 𝑇.
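The kind of estimate mentioned above can be illustrated with a short worked calculation. This is only a sketch in modern notation, not Joule's own derivation: the symbols 𝑁, 𝑚, ρ, ⟨𝑣²⟩ and 𝑘_B, as well as the numerical values for hydrogen gas at 0 °C and 1 atm, are assumptions introduced here for illustration.

```latex
% Pressure of an ideal gas of N particles of mass m in a volume V
% (elementary kinetic argument; modern notation, assumed here)
pV = \tfrac{1}{3}\, N m \langle v^{2} \rangle
\quad\Longrightarrow\quad
v_{\mathrm{rms}} = \sqrt{\langle v^{2} \rangle} = \sqrt{\frac{3p}{\rho}},
\qquad \rho = \frac{N m}{V}.

% Hydrogen gas at 0 °C and 1 atm (modern tabulated values, assumed)
p \approx 1.013 \times 10^{5}\ \mathrm{Pa}, \qquad
\rho \approx 0.0899\ \mathrm{kg\,m^{-3}}
\quad\Longrightarrow\quad
v_{\mathrm{rms}} \approx 1.8 \times 10^{3}\ \mathrm{m\,s^{-1}}.

% Equipartition: mean translational kinetic energy per particle
\tfrac{1}{2}\, m \langle v^{2} \rangle = \tfrac{3}{2}\, k_{B} T
\quad\Longrightarrow\quad
pV = N k_{B} T \;\propto\; T .
```

The last line also shows where a result such as Herapath's can come from: if the temperature is associated with the momentum 𝑚𝑣 of the particles rather than with their kinetic energy, the same pressure relation gives 𝑝𝑉 proportional to 𝑇² instead of 𝑇.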
After the work of Joule, there appeared the works of August Krönig (1822-1879) and Rudolf Clausius (1822-1888), the latter having published the important work entitled The Nature of the Motion Which we Call Heat (1857), in which he reached his own conclusions (independently of Herapath) about the importance of the notion of vis viva in the Kinetic Theory of Heat and, most importantly, recognized that besides the translational degrees of freedom of the molecules one should also consider, in the application of the Equipartition Theorem, the internal degrees of freedom. Another important contribution of Clausius was his understanding that the velocities calculated for the molecules or atoms of gases were only average velocities, which introduced the notion of probability (and statistics) into the account of thermal phenomena. In another important work entitled On the Mean Lengths and Paths, published in 1858, Clausius introduced the concept of mean free path to refute a criticism recurrent at that time: if the molecules of a gas were to travel in rectilinear paths, then, given their high velocities, different gases should mix much faster than was actually observed. In his calculations, Clausius took the velocities of all molecules to be equal, since the notion of a velocity distribution was not yet available. The concept of a velocity distribution would be developed only in 1860 by Maxwell in the work Illustrations of the Dynamical Theory of Gases. In 1862, Maxwell published his most important work about the Kinetic Theory of Gases, entitled On the Dynamic Theory of Gases, which contained a theory of the viscosity of fluids and allowed the calculation of atomic diameters, together with transport properties such as diffusion. In the end, however, Clausius rejected the statistical approach to which he himself had given birth.
Around 1870 there was no evidence that the atoms of the physicists were the same as the atoms of the chemists. In 1871, Maxwell published his Theory of Heat, in a period which also saw the appearance of Ludwig Boltzmann (1844-1906). Boltzmann, in 1868, extended Maxwell’s statistical approach to cases in which there are external potentials or forces. In 1872 he derived his famous integro-differential equation, which allowed the interpretation of the mechanism by which gases tend to equilibrium (on average) and also introduced the notion of irreversibility. The problem of irreversibility was extensively treated in his work, published in 1877, entitled On the Relation between the Second Law of Thermodynamics and the Theory of Probability, but it was unable to change the perspective of the community. The notion of entropy, however, appeared somewhat earlier. Josiah Willard Gibbs (1839-1903) published in 1875 a work about this concept, in which he rectified the notion used by Maxwell. In 1902 he published his ideas in the work entitled Elementary Principles in Statistical Mechanics Developed with Special Reference to the Rational Foundations of Thermodynamics, in which the field of modern statistical mechanics is settled almost as we know it today (in general terms). He used the notion of ensemble and introduced the notions of microcanonical, canonical and grand canonical systems. The quantum of electricity (the electron) was, as we have already said, implicit in much of the work on electrochemistry, but the hypothesis of an elementary electric charge was posed by Hendrik A. Lorentz and the electron was discovered in 1894-97 by J. J. Thomson. The evolution of vacuum techniques allowed J. J. Thomson to find the ratio 𝑒/𝑚 of the electron and to show that it was of the order of 2000 to 4000 times greater than the one found for the hydrogen atom.
Experiments connected with molecular spectra, done mainly between 1860 and 1870, once understood in part (since their complete understanding would have to wait for Quantum Mechanics), were able to fill a gap between the chemical atom and the physical one, showing the conceptual unity allowed by a discontinuous (atomic) notion of the nature of matter. In 1905, Einstein showed, with his work on Brownian movement, that the random movement of a particle suspended in a fluid was due to the movement of the molecules of that fluid. In some sense we may say that the old Greek idea of the existence of atoms was never totally lost [5]. We developed these historical assessments of the struggle to understand the nature of light and the nature of matter to provide not only a clearer understanding of the developments of the XXth century in Quantum Mechanics, but also, and mainly, to furnish a background against which to follow the feelings of the physicists involved in the rise of Quantum Theory. As we will see in the next section, the beginning of the XXth century was a period of enormous efforts (both technical and existential) to overcome the difficulties that were set forth by the XIXth century, especially with respect to the certainty then developed about the two natures: of light and of matter. Only against this historical framework can the greatness of the works of Heisenberg and others be fully understood (whether or not we agree with them about possible interpretations).
1.4. PHYSICS IN THE XXTH CENTURY
At the end of the XIXth century, the electromagnetic theory was a sound field of physics. It had already acquired a precise experimental status with regard to the propagation of light and related phenomena. However, it began to present its own failures in connection with phenomena that involve both the electromagnetic field and matter (the former conceived as a continuum and the latter as discrete, as we have described). The phenomena of absorption and emission were first discovered by Joseph von Fraunhofer (1787-1826) in connection with the dark lines of the solar spectrum (from 1814 to 1817). These lines were interpreted only in 1861, based upon the experiments of Robert Wilhelm Bunsen (1811-1899) and Gustav Kirchhoff (1824-1887). This marks the beginning of the history of Spectral Analysis, developed from the assumption that each gaseous element has a characteristic spectrum. The situations related to Spectral Analysis were obviously not exclusively a problem of optics, since they also demanded a model of the nature of matter. It is precisely at this point that the two approaches developed in previous sections clashed and required a unified view. Moreover, since the gases were always assumed to be at specific temperatures, the Kinetic Theory of Gases had to be considered too. We could say, indeed, that the questions about the emission spectra were central to the unification of the various fields of research of the XIXth century, a special example of which is the problem of the black body radiation spectrum, solved by Max Planck (1858-1947) in 1900, inaugurating the whole field of Quantum Mechanics.
Planck’s explanation for the black body experiment was based on a hypothesis about the nature of the electromagnetic field that was incompatible with the notion, accepted at that time, of the continuous nature of the electromagnetic wave (Planck himself was at pains to accept his own assumption; after all, he was a man of the XIXth century). Later on, Einstein would develop an explanation for the photoelectric effect that also assumed as necessary that the electromagnetic field is composed of discrete entities, or is a result of their behavior. The difference between the two explanations, however, was that with Planck’s it was the energy that should be discrete, while with Einstein’s it was also the very ontological entities that compose light that should be discrete (the photon). The hypothesis of the quantum of energy or radiation was later used by Einstein, Compton, Duane, Bohr, Debye and many others to explain other phenomena by assuming a corpuscular nature for the electromagnetic radiation. However, there were still the experiments of interference and diffraction, among others, that seemed to depend necessarily on the assumption of a wavelike, continuous structure for radiation. These excruciating results destroyed all the confidence so painstakingly built in the XIXth century. They also seemed to block an interpretation of Nature in terms of a monist ontology, that is, one that depends on only one construct, be it a continuous ether or discrete atoms. When considered against the framework described in previous sections, these results seem almost brutal. In the next section we will present the major developments that led to the formalism and the interpretation of Quantum Mechanics. We will base our presentation on the beautiful work
of Jagdish Mehra [8] that rests upon the voices of the protagonists of this drama. For a more complete presentation, the reader is referred to [8].
1.5. THE “IMPOSSIBLE” REDUCTION
In this section we try to reconstruct historically the process by which Bohr and Heisenberg arrived at their principles of Complementarity and Uncertainty. We also aim at furnishing the general scheme from which one can understand the logical role that such principles play in the development of the interpretation of Quantum Mechanics that appeared at the Fifth Solvay Conference (Brussels, October 1927). It is important to say that, despite the great success achieved in the years 1900 to 1922 with the explanations of various phenomena (e.g. black body radiation, photoelectric effect, Compton effect, specific heat of solids, etc.), Quantum Mechanics was far from being a physical theory. As with Newtonian mechanics and classical Electromagnetism, physicists always search for a calculation scheme, together with a reasonable interpretation of it, that allows one to assess the underlying physical phenomena quantitatively and qualitatively. Although this section does not describe the successful explanations of the years before 1925, since they are very well known to any undergraduate student, it is important to point out that these explanations completely lack the systematic character that should be expected from a physical theory. In fact, each problem of this period had to be tackled on its own terms, by specific postulates that were difficult to transfer to other problems. In the case of Bohr’s solution for the hydrogen atom (or the Bohr-Sommerfeld quantization rules) the problem is even more complex, since there was no clue as to what allowed one to write such rules, and they seemed to contradict other rules coming from the classical theories. This search for unity was the endeavor of many great physicists at the beginning of the XXth century, and their voices and feelings are what we briefly present in the following. Much wavering should be expected... Again.
1.5.1. Early Discussions between Bohr and Einstein
Albert Einstein published his work about the emission of radiation in 1916-17 and noticed that a statistical residue seemed essential to the theory. This could point to a nondeterministic nature for the world that would shake, in Einstein’s opinion, the grounds of the notion of causality. With the photoelectric effect and the black body radiation (the Compton effect had not yet been discovered), the continuous nature of electromagnetic radiation became disputable. These were the two main problems afflicting Einstein: the irremovable statistical character of the approach to these phenomena and the problem of the continuum. In 1920 he wrote to Max Born: “I myself do not believe that the solution to the quanta has to be found by giving up the continuum. Similarly, it could be assumed that one could arrive at general relativity
by giving up the coordinate system. I believe now, as before, that one has to look for redundancy in determination by using differential equations so that the solutions themselves no longer have the character of continuum. But how?” (Ref. [9], p. 21)
Einstein, thus, was searching for a description of quantum phenomena through a differential equation that would preserve in mathematical terms, by its continuous character, the relation of causality between events but, on the other hand, would lead to the discrete solutions that were experimentally observed. This is exactly what Schrödinger would find later on. In these early times, Einstein and Bohr had not yet met, and the formulations of Quantum Mechanics by Born, Heisenberg, Jordan, Dirac and Schrödinger were still far ahead. However, in the same period, Niels Bohr already believed that, although the classical and quantum theories are connected in an asymptotic way by means of the Correspondence Principle, they were irreconcilable. Einstein, on the other hand, found this idea repugnant. He wrote to Max Born, in 1919: “The quantum theory gives me a feeling very much like yours. One really ought to be ashamed of its success, because it has been obtained in accordance with the Jesuit maxim: ‘Let not thy left hand know what thy right hand doeth’.” (Ref. [9], pp. 10, 11)
At this time, Einstein believed that a complete theory of light should embody both the corpuscular and undulatory perspectives. This was a natural consequence of his constraints of keeping the continuum and causality while, at the same time, maintaining the idea of the radiation quanta. Bohr, in these early years of the XXth century, defended the classical undulatory theory. He, as well as Planck, thought about the light quanta as only an interesting hypothesis, without a referent in the world. In this period Bohr searched for a complete break with the ideas of classical mechanics. Bohr and Einstein met for the first time around 1920. The difference between the two great physicists was at its height around 1922, when the Compton effect was discovered. This effect necessarily implied the existence of the light quanta. Both Bohr and Planck believed that a theory based only upon corpuscles would create enormous difficulties for the explanation of phenomena related to the Electromagnetic Theory. Nevertheless, Bohr wanted to explain the Compton effect without the hypothesis of the light quanta. He tried this with the young physicist John Slater and the physicist Hendrik Kramers by postulating a quantum theory of radiation in which there would not be strict energy conservation. Conservation principles were the backbone of classical physics, and this proposal was considered heretical by Einstein and Wolfgang Pauli. Einstein wrote to Born on 29 April 1924: “Bohr’s opinion about radiation is of great interest. But I should not be forced into abandoning strict causality without defending it more strongly than I have so far. I find the idea quite intolerable that an electron exposed to radiation should choose of its own free will, not only its moment to jump off, but also its direction. In that case, I would rather be a cobbler, or even an employee in a gambling house, than a physicist. It is true,
my attempts to give tangible form to the quanta have foundered again and again, but I am far from giving up hope.” (Ref.[9], p. 82)
He also wrote to Paul Ehrenfest on May 1, 1924: “A final abandonment of strict causality is very hard to me to tolerate.”(quoted in [10], p. 124)
In 1925, however, Walther Bothe and Hans Geiger carried out an experiment on the Compton effect at the level of single electrons, demonstrating strict conservation of energy and linear momentum in this phenomenon. Bohr was then forced to abandon his approach. He wrote to Rutherford saying how absolutely miserable he felt. From this time on (1925), Bohr had accepted Einstein’s idea of the quantum of light, but was very concerned about the difficulties of applying the notions of classical physics to quantum theory. In a letter to Einstein in 1927, Bohr wrote: “[the concepts of classical physics] give us only the choice between Scylla and Charybdis, depending on whether we direct our attention to the continuous or discontinuous features of the description.” (Quoted in [10], p. 125)⁴
These were the times of the birth of the Complementarity Principle (which we will address more thoroughly in what follows). Note that its formulation is strongly based upon the old discussion about the continuous and the discontinuous. Bohr also sent Einstein a copy of Heisenberg’s work on the Uncertainty Relations. He was trying to connect the Uncertainty Principle with his own discussion with Einstein, and told Einstein in his letter that Heisenberg, with his principle, had shown that inconsistencies could be avoided only because “limitations of our concepts coincide with the limitations of our capacities of observation”. Thus, in April of 1927 Bohr had already foreseen his Complementarity Principle. He wrote to Einstein on April 13, 1927 that: “In view of this new formulation [Heisenberg’s indeterminacy relations] it becomes possible to reconcile the requirement of energy conservation with the implications of the wave theory of light, since according to the character of the description the different aspects of the problem never manifest themselves simultaneously.” (quoted in [10], pp. 125, 126)
⁴ In Greek mythology, Scylla and Charybdis were a pair of monsters who lived on opposite ends of the Strait of Messina between Italy and Sicily. Scylla was a six-headed beast with three rows of sharp teeth in each head. When ships passed close by her, she struck out to grab and eat unwary sailors. Charybdis was a dangerous whirlpool across the strait from Scylla. Ships sailing the strait were almost certain to be destroyed by one of the monsters.
1.5.2. Three Roads to the Quantum
The first encounter between Werner Heisenberg and Niels Bohr took place in Göttingen, during a series of lectures given by the latter from June 12 to June 22, 1922. Arnold Sommerfeld took two of his most brilliant students to attend these lectures: Wolfgang Pauli and Werner Heisenberg. Heisenberg made a very good impression on Bohr and, as a result, as Heisenberg recollects: “When the discussion was over, Bohr came to me and suggested that we should go for a walk together on the Hainberg outside Göttingen. Of course, I was very willing. (...) That discussion which took us back and forth over Hainberg’s wooded heights was the first thorough discussion I can remember on the fundamental physical and philosophical problems of modern atomic theory, and it has certainly had a decisive influence on my later career. For the first time I understood that Bohr’s view of atomic theory was much more skeptical than that of many other physicists, e.g. Sommerfeld, at that time, and that his insight into the structure of the theory was not a result of a mathematical analysis of the basic assumptions, but rather of an intense occupation with the actual phenomena, such that it was possible for him to sense the relationships intuitively, rather than derive them formally.” [11]
Bohr invited Heisenberg to go to Copenhagen for a few weeks in the following winter and after that, possibly, to stay on a scholarship to work there for a longer period. Heisenberg was very proud of the invitation and also very happy, for he knew that his good friend Wolfgang Pauli was going to Copenhagen at the end of 1922. Although his arrival in Denmark was somewhat delayed, he stayed in contact with Bohr’s Institute, mainly through his discussions with Pauli about his progress on the anomalous Zeeman effect. Heisenberg finally arrived in Copenhagen at Easter, 1924. He went prepared with strong criticisms of Bohr’s Correspondence Principle but soon became one of its fiercest defenders. Heisenberg returned to Copenhagen at the end of 1924. There he worked with Bohr and Kramers on some specific problems of atomic theory, for which he was trying to formulate the Correspondence Principle in terms of equations from which new results could be derived. This search for a higher degree of formalization was inherited from the Göttingen physicists, and it was quite a happy accident that he, fond of formal approaches, had come into contact with Bohr, a highly intuitive physicist without the same enthusiasm for formalization. Heisenberg did manage to establish the Correspondence Principle with more precision, and this increased his belief in the approach being pursued at Copenhagen. As Heisenberg later recalled, he was expecting that “Perhaps it would be possible one day, simply by clever guessing, to achieve the passage to a complete mathematical scheme of quantum mechanics.” [12]
In 1925, Heisenberg returned to Göttingen as Privatdozent for the summer semester. In Göttingen, Heisenberg tried to find the spectral lines of hydrogen based on the Correspondence Principle, but in this specific task he failed. He then concluded that the difficulties with the quantization rules were of a more general nature and that they should be dealt with before everything else. These difficulties, he believed, were due not so much to a departure of quantum physics from classical mechanics as to the need for a rupture with the kinematics upon which classical mechanics was based. We can hear Bohr’s own views echoing here.
From this presupposition, Heisenberg arrived at a quite new idea: he postulated that the classical equations for the electron could be kept, but that the kinematic interpretation of the quantity 𝑥(𝑡) as a position depending on time should be rejected. He expressed 𝑥(𝑡) as a Fourier series in terms of the coefficients and frequencies that corresponded to the transition between one state 𝑛 and another state 𝑛 − 𝑚. He also motivated the introduction of the transition amplitude 𝑎(𝑛, 𝑛 − 𝑚) by saying that the intensities, and thus the probabilities, proportional to |𝑎(𝑛, 𝑛 − 𝑚)|², were the observable quantities, in contrast with the function 𝑥(𝑡). The importance of the idea of using only observable quantities in physical theories had been exhaustively discussed in Göttingen ever since Mach, Einstein and Minkowski introduced it. Born, Pauli, Jordan and Heisenberg discussed it for quite a long time in the context of Quantum Theory. However, it was Heisenberg who made the idea of using only observable quantities the guiding philosophical principle for his quantum-theoretical interpretation of the kinematic variables. In the end, Heisenberg was capable of showing that the product of two quantum variables 𝑥(𝑡) and 𝑦(𝑡), written in terms of their Fourier series and reinterpreted according to his prescriptions, should obey a non-commutative rule. With this interpretation, the Correspondence Principle was embraced in the very foundations of the theory.⁵ At that time, physicists were not acquainted with formalisms that involved noncommuting variables, and Heisenberg was confused about it. Despite this, Heisenberg was able to solve, in the beginning of July, a specific problem. Earlier, in June 1925, Heisenberg had a serious attack of hay fever and took some days of vacation in Helgoland. There he gave clearer boundaries to his ideas, solving two important problems.
He also proved that his reinterpretation of the kinematic variables would imply a strict law of energy conservation, something that had become of much importance since the Bothe-Geiger experiment on the Compton effect. He became very excited about his semi-formal scheme. When travelling back to Göttingen, Heisenberg took the time to visit Pauli, now in Hamburg, of whom he was very fond. Pauli heard his ideas and encouraged him to persevere. In the following weeks, Heisenberg and Pauli exchanged many letters and in July, 1925, Heisenberg sent Pauli the final version of his manuscript. Pauli had a favorable opinion of it. In the middle of July Heisenberg gave his work to Max Born and asked him to take a look at it. Born read the paper and became fascinated. He recalled that “I... was soon so involved in it that I thought the whole day and could hardly sleep at night... In the morning I suddenly saw the light: Heisenberg’s symbolic multiplication was nothing but the matrix calculus, well known to me since my student days from the lectures of Rosanes in Breslau” (see Ref. [13], Vol. 3, Chapter I)
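Born's remark can be made concrete with a short worked sketch of the multiplication rule. The notation below is an assumption chosen to match the amplitudes 𝑎(𝑛, 𝑛 − 𝑚) used above, not a transcription of Heisenberg's or Born's original formulas; the frequencies ω(𝑛, 𝑛 − 𝑚) and the summation index 𝑘 are introduced here only for illustration.

```latex
% Each classical Fourier component of x(t) is replaced by a quantity
% labelled by a transition n -> n-m (two indices instead of one)
x(t) \;\longrightarrow\; \bigl\{\, x(n, n-m)\, e^{\, i \omega(n,\, n-m)\, t} \,\bigr\}

% Transition frequencies combine additively (combination principle)
\omega(n, n-k) + \omega(n-k, n-m) = \omega(n, n-m)

% The only product that respects this frequency rule is a sum over
% intermediate states, i.e. the row-by-column rule of matrix algebra
(xy)(n, n-m) = \sum_{k} x(n, n-k)\; y(n-k, n-m)

% In general the result depends on the order of the factors
xy \neq yx

% Commutation relation as later written by Born and Jordan
% (q: coordinate, p: momentum, 1: unit matrix)
pq - qp = \frac{h}{2\pi i}\, \mathbf{1}
```

Read with 𝑛 as a row index and 𝑛 − 𝑚 as a column index, the third line is exactly the product of two matrices, which is what Born recognized.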
⁵ Indeed, Heisenberg sustained the prescription of using only observable entities without perceiving that his notion of “observable” was quite different from the one coming from the Special Theory of Relativity.
This was the beginning of a strong process of formalization of Quantum Mechanics. Born put Heisenberg’s conditions in matrix notation, in the form in which they are presently known. On coming back from Hanover, Born persuaded Jordan to help him with his work, and this collaboration led to their formulation of Quantum Mechanics, completed on September 27, 1925. This work contained a summary of all matrix methods, the interpretation of Heisenberg’s multiplication in terms of matrices, the proof of energy conservation and the proof of Bohr’s frequency conditions. Other developments in the direction of the completion of the Quantum Theory immediately began with a collaboration between Born, Heisenberg and Jordan. Their work became known as the Drei-Männer-Arbeit. This work contained the essentials of the whole formal apparatus of modern Quantum Mechanics. Wolfgang Pauli quickly applied the new formalism to the hydrogen problem. This was a moment of triumph for the new Quantum Theory, and Niels Bohr wrote a letter to Rutherford saying that all the reasons for his feelings of misery of the previous spring were now gone. However, slightly before the Born-Heisenberg-Jordan paper was published in the Zeitschrift für Physik (January of 1926), Paul Adrien Maurice Dirac published his own work on Quantum Mechanics in the Proceedings of the Royal Society. In July, 1925, after he had sent his manuscript to Max Born, Heisenberg went to Leyden and Cambridge. In Leyden he was a guest of Paul Ehrenfest with whom, together with Uhlenbeck and Goudsmit, he had many conversations about spectroscopy. Uhlenbeck and Goudsmit were about to propose the hypothesis of the electron spin. Heisenberg left Leyden and went to Cambridge, where he was a guest of R. H. Fowler. Privately, he mentioned to Fowler his new ideas about the reformulation of the kinematic variables. Fowler became very interested and asked Heisenberg for a copy of his work as soon as it was available. When he had it in hand, he sent it to Dirac. At that time, Dirac was working with the Hamiltonian approach to Classical Mechanics and was trying to extend this formalism to Quantum Mechanics. At first he was not able to see the relation of Heisenberg’s formalism to his Hamiltonian approach and did not pay much attention to the paper.
Later, when he took another look at it, it suddenly became clear to him that Heisenberg’s idea had furnished the key to all “the mystery”. In the weeks that followed, Dirac tried to connect the reinterpretation of the kinematic variables with the formal apparatus of Classical Hamiltonian Mechanics. During a long walk in September 1925 it became clear to him that the quantum analogue of the Poisson bracket should be the commutator. This allowed Dirac to stay within the safe domain of the Hamiltonian approach. He then presented his new results to Fowler, who immediately understood their importance. Fowler then managed to have Dirac’s paper, entitled The Fundamental Equations of Quantum Mechanics, published as soon as possible. In this work, Dirac briefly presented Heisenberg’s ideas, making the formalism clearer and more elegant. He anticipated all the most essential results of the papers by Born and Jordan and by Born, Jordan and Heisenberg, and derived many other results. He also applied his formalism to the hydrogen atom, citing the calculations made by Pauli, which had not yet been published. He also found the splitting of the spectral lines of atoms in the presence of a magnetic field, in agreement with the experiments.
However, another line of reasoning leading to the formalization of Quantum Mechanics was being developed. Since 1921, Erwin Schrödinger had been at the University of Zurich. Around the summer of 1925, Schrödinger was very unhappy with his stay in Zurich and was anxious to return to Austria. He was also very distressed by the work of Heisenberg, Born and Jordan on matrix Quantum Mechanics. He wrote:
“I was discouraged, if not repelled by what appeared to me a rather difficult method of transcendental algebra, defying any visualization.” (See Ref. [13], Vol. 5, Part 2, Chapter IV, section 5.)
He then decided to approach atomic physics with a formalism that could be considered a genuine alternative to the formalism of Heisenberg, Born and Jordan and to that of Dirac, but that could also contribute to the interpretation of quantum phenomena. In four communications to the Annalen der Physik, submitted from the end of January to the end of June 1926 under the title Quantization as an Eigenvalue Problem, Schrödinger developed his theory of Undulatory Mechanics. He postulated his fundamental equation and solved the hydrogen spectrum, with help on the mathematical details from Hermann Weyl. With his communications, Schrödinger furnished the grounds to treat all those problems of atomic physics that no one had been able to approach using the ideas Bohr had used to treat the hydrogen atom. Schrödinger’s work matched very neatly the ideas of Einstein already mentioned. Schrödinger soon realized, however, that his approach and that of Heisenberg and Born do not conflict, but are complementary. After the publication of Schrödinger’s works, undulatory Quantum Mechanics was successfully applied to a wide range of problems. It became clear that the theory could be extended to deal with problems that Schrödinger had not imagined.
1.5.3. The Beginning of the Battle on the Interpretation
Schrödinger was very fond of interpretation issues. His main criticism of the matrix approach was its inability to provide a clear visualization of quantum phenomena. In the fourth part of his communications he developed an interpretation for the function 𝜓(𝑥, 𝑡), called the wave-function. His interpretation has electromagnetic leanings since, according to Schrödinger, the density of electric charge in the case of many-particle systems could be thought of as given by: “𝜓∗𝜓 [which] is a kind of weight function in the system’s configuration space. The wave mechanical configuration is a superposition of many, strictly speaking of all, point-mechanical configurations kinematically possible. Thus, each point-mechanical configuration contributes to the true wave mechanical configuration with a certain weight, which is given precisely by 𝜓∗𝜓.” (See Ref. [13], Vol. 5, Part 2, Chapter IV, p. 797)
While for macroscopic systems and movements the weight function occupies a very small region of space and should be irrelevant to the overall physical behavior, the distribution of the density 𝜓∗(𝑟, 𝑡)𝜓(𝑟, 𝑡) over some microscopic region should play a fundamental role. Thus, this quantity could make the difference for quantum mechanical problems, while being irrelevant to classical situations. It is important to note that, according to this approach, the notion of particle would depend on the notion of wave, since particle behavior would be the manifestation of an underlying wave nature at smaller scales (given by the wave-function).
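In present-day notation, this electrodynamic reading of the weight function can be sketched as follows. This is only an illustrative reconstruction, not a transcription of Schrödinger's own formulae: the number of particles N, the charges e_k and the use of a Dirac delta to project configuration space onto ordinary space are assumptions introduced here.

```latex
% Weight function on the 3N-dimensional configuration space
\[
  w(\mathbf{r}_1,\dots,\mathbf{r}_N,t)
    = \psi^{*}(\mathbf{r}_1,\dots,\mathbf{r}_N,t)\,
      \psi(\mathbf{r}_1,\dots,\mathbf{r}_N,t)
\]
% Charge density in ordinary space contributed by particle k (assumed charge e_k):
% every configuration that places particle k at the point r contributes with weight w
\[
  \rho_k(\mathbf{r},t)
    = e_k \int \delta^{3}(\mathbf{r}-\mathbf{r}_k)\,
      w(\mathbf{r}_1,\dots,\mathbf{r}_N,t)\;
      d^{3}r_1 \cdots d^{3}r_N ,
  \qquad
  \rho(\mathbf{r},t) = \sum_{k=1}^{N} \rho_k(\mathbf{r},t)
\]
```

For a single particle the integral collapses and the expression reduces to ρ(𝑟, 𝑡) = e 𝜓∗(𝑟, 𝑡)𝜓(𝑟, 𝑡), so that the wave-function alone fixes how the charge is spread over space.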
He was later forced to change his perspective by assuming that the vibrations represented by 𝜓(𝑟, 𝑡) should be connected to the “very real effective electrodynamic fluctuations of the spatial electric density”. Surely, this new interpretation weakened his argument for visualization and reality, but Schrödinger, in a letter to Wien, said that: “this does not matter at all. If one can only control, with their help [i.e., of the 𝜓 vibrations], distributions and fluctuations of electricity, which are real in the highest sense, then one may be allowed to call them a substitute concept in the same sense as one speaks of the electrodynamic potentials, of which only the derivatives can be observed.” (See letter from Schrödinger to Wien, June 18, 1926, Wien Collection, Deutsches Museum, Munich.)
Schrödinger’s ideas were most favorably received by Planck, Einstein, Lorentz, Sommerfeld and Wien, among others. They considered the use of differential equations to approach Quantum Mechanics truly important. Heisenberg, on the other hand, believed that there should be only one scheme for the solution of problems in Quantum Mechanics and that, once this scheme was found, there should be no room for any other. Heisenberg was much distressed by Schrödinger’s theory and hoped it was wrong. When, in June 1926, Born applied Schrödinger’s method to problems of atomic collisions (leading him to the interpretation of the wave-function in statistical terms), Heisenberg reproached him. In 1968, Heisenberg recalled very vividly a lecture by Schrödinger that he had attended: “In July, 1926, Schrödinger was invited to Munich by Wilhelm Wien to report on his theory. The experimental physicists in Munich, headed by Wien, were enthusiastic about the possibility that now perhaps this ‘quantum mystery of atomic physics’ might be dealt with, and one would be able to return to the classical concepts of honest fields, such as one had learned from Maxwell’s theory. I listened to this lecture by Schrödinger, as I was then staying with my parents in Munich for the vacation; and I was quite horrified by his interpretation, because I simply could not believe it. I objected (in the discussion) that with such an interpretation one would not even be able to explain Planck’s radiation law. But the general opinion at that time was extremely hostile toward my objection. Wien answered me very harshly in that he could understand how I felt about the fact that the whole quantum jumping, the matrices and all that had become superfluous; anyway, it would be better for me to leave the field to Schrödinger, who would certainly solve all the difficulties in the [near] future. This was not very encouraging; I did not have the slightest chance to get across my point of view in the discussion.” (See Ref. [13], Vol. 5, Part 2, Chapter IV.5, p. 803)
And, in a letter to Wolfgang Pauli in June he wrote: “the more I ponder on the physical part of Schrödinger’s theory, the more detestable I find it. One should imagine the rotating electron, whose charge is distributed over the entire space and which has an axis in a fourth dimension. What Schrödinger writes on the visualizability of his theory... I find rubbish. The great achievement of Schrödinger’s theory is the calculation of matrix elements.”[14]
Bohr invited Schrödinger, in a letter of September 11, 1926, to come to Copenhagen, and he arrived there in October of the same year. His arrival was so eagerly awaited that, according to Heisenberg: “Bohr’s discussions with Schrödinger began at the railway station and were continued daily from early morning until late at night. Schrödinger stayed at Bohr’s house so that nothing would interrupt the conversations.” (See Ref. [15], pp. 73-75)
Heisenberg recalled that Schrödinger’s main line of attack was against the notion of “quantum jumps”. He put forth a number of objections to this idea, trying to show that one would reach many contradictions if one were to use ordinary concepts. Bohr agreed with Schrödinger on this, but argued that it was not a proof against quantum jumps, only a statement that they cannot be imagined using the concepts we develop to understand our daily life and the experiments of Classical Physics. In this sense, it would be these concepts that needed reformulation, another clear manifestation of Bohr’s inclination to promote a conceptual rupture between the classical and quantum worlds. Schrödinger continued to resist abandoning these usual concepts and argued that his approach allowed an adequate idea of the phenomenon without appealing to quantum jumps. Bohr, however, disagreed and rebutted by saying that both Einstein and Planck had used the discontinuity in the energy, which implies that the atom can assume only certain discrete values of this variable and, furthermore, that the atom can change the value of this variable only in discontinuous ways from time to time. The continuous and the discrete show themselves here again, although we are already far from the XIXth century. Schrödinger himself accepted that these relations were not fully understood, but also argued that Bohr and his followers had likewise failed to present a satisfactory interpretation of Quantum Mechanics. Bohr agreed, saying that inconsistencies did persist. He kept insisting on the fundamental role of the concept of quantum jumps for the interpretation of Quantum Mechanics, driving Schrödinger to the point of exasperation, when he declared: “if all this quantum jumping were here to stay, I should be sorry I ever got involved with quantum theory.”
To which Bohr replied: “But the rest of us are extremely grateful that you did; your wave mechanics has contributed so much to the mathematical clarity and simplicity that it represents a gigantic advance over all previous forms of quantum mechanics.” (See Ref. [15], pp. 73-75)
Heisenberg also recalled that the continuing discussions exhausted Schrödinger: “After a few days Schrödinger fell ill, perhaps as a result of his enormous effort; in any case, he was forced to keep to his bed with a feverish cold. While Mrs. Bohr nursed him and brought in tea and cake, Niels Bohr kept sitting on the edge of the bed talking to Schrödinger: ‘But you must surely admit that...’ (...) [behaving as] an almost remorseless fanatic.” (See Ref. [15], pp. 73-75)
Heisenberg depicted Bohr as almost a “remorseless fanatic”, and said that Bohr felt it his obligation to convince his miserable guest that the approach being taken at Copenhagen was the most adequate one. However, as Heisenberg said: “no real understanding could be expected since, at that time, neither side was able to offer a complete and coherent interpretation of quantum mechanics. For all that, we in Copenhagen felt convinced toward the end of Schrödinger’s visit that we were on the right track, though we fully realized how difficult it would be to convince even leading physicists that they must abandon all attempts to construct perceptual models of atomic processes.” (See Ref. [15], pp. 73-75)
Schrödinger’s position after his encounter with Bohr became clear in this letter to Wien: “Quite certainly, the point of view of [using] visualizable pictures, which de Broglie and I assume, has not been carried through nearly far enough in order to render an account of the most important facts [of atomic theory]. It is of course probable that here and there a wrong path was taken that must now be abandoned. But that, even if one is Niels Bohr, one could possibly say at this point: the visualizable wave pictures work as little as the visualizable point [particle] models, there being something in the results of observation which cannot be grasped by our erstwhile way of thinking; this I do not believe. I believe even less since for me the comprehensibility of the external processes in nature is an axiom, say, in the sense: to grasp experience means nothing more than establishing the best possible organization among the different facts of experience.” (See letter from Schrödinger to Wien, October 21, 1926, in the Wien Collection, Deutsches Museum, Munich.)
Schrödinger considered it premature to abandon such general concepts as space, time and causality, concepts that had been preserved in the General Theory of Relativity, even though Quantum Mechanics was completely new and surprising. Schrödinger wrote to Wien: “I can only say that I don’t care for this whole play of waves, if it should turn out to be nothing more than a comfortable computational device to evaluate matrix elements.” (See Wien Collection, Deutsches Museum, Munich.)
In the end, what Schrödinger was not willing to leave aside was the role of the interpretation (and visualization) of physical models in our understanding of the physical world, models that can be conceived only with the aid of our ordinary way of thinking. As we will see in a future section, the efforts of the physicists in Copenhagen to provide an interpretation of Quantum Mechanics were responsible for the appearance of the Heisenberg uncertainty relations and Bohr’s ideas about complementarity, which would become the backbone of the Interpretation of 1927.
1.5.4. The Statistical Interpretation of Max Born
As mentioned in the previous subsection, in the summer of 1926 Born used Schrödinger’s formalism to develop a quantum mechanical explanation for a collision process between a free particle and an atom. This type of problem (as with the Compton effect, for example) tends to favor a corpuscular model. Thus, to make this corpuscular perspective compatible with the undulatory formalism of Schrödinger, Max Born interpreted the product 𝜓∗(𝑟, 𝑡)𝜓(𝑟, 𝑡) as the probability of an electron approaching the atom. In these explanations, Born favored the corpuscular model and tried to associate a wave function with particles. It is important to note that, with this maneuver, Born subverted Schrödinger’s ontology. While for Schrödinger the corpuscles are only epiphenomena of an underlying undulatory nature of matter (and this nature is the element of reality), for Born the corpuscles are what has primary ontological reality, and the wave is only a statistical way to represent the probability related to the behavior of these corpuscles. Born was following a path that could be traced back to the XIXth century, in which the statistical approach used in the Kinetic Theory of Gases also connected probabilistic descriptions to the corpuscular model. The new concept, obviously, is that the probability related to the phenomenon behaves like a wave. In this way, the ontology may be kept based on a corpuscular model and the undulatory pattern may be connected to the behavior of the particles (as an ensemble, for example; we will return to this problem later on in this book). Born in his work said: “it was almost self-understood to regard |𝜓|² as the probability density of particles.” (My bold. See Ref. [16], p. 285)
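Born's reading can be condensed, in present-day textbook notation, as in the sketch below. The normalization condition, the region V, the choice of the z axis for the incoming beam and the elastic scattering amplitude f(θ, φ) are standard modern conventions assumed here for illustration; they are not taken from the text above.

```latex
% Probability of finding the particle in the volume element d^3r around r
\[
  dP(\mathbf{r},t) = |\psi(\mathbf{r},t)|^{2}\, d^{3}r ,
  \qquad
  \int |\psi(\mathbf{r},t)|^{2}\, d^{3}r = 1
\]
% Probability of finding it inside a finite region V
\[
  P(V,t) = \int_{V} |\psi(\mathbf{r},t)|^{2}\, d^{3}r
\]
% Collision problem (elastic case): far from the atom the stationary solution is an
% incoming plane wave plus an outgoing spherical wave
\[
  \psi(\mathbf{r}) \;\sim\; e^{ikz} + f(\theta,\varphi)\,\frac{e^{ikr}}{r} ,
  \qquad
  \frac{d\sigma}{d\Omega} = |f(\theta,\varphi)|^{2}
\]
```

The incoming plane wave is not normalizable, so in the collision problem |𝜓|² is read as a relative density of particles in the beam rather than as an absolute probability. The statistical content sits in |f|², which gives the relative number of particles scattered into each direction.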
The reader should be aware of the difference between behavior and nature, since it is crucial to the present book too. One may have an undulatory (probabilistic) behavior connected to a corpuscular nature, as in Born’s interpretation, or one may have a corpuscular behavior connected to an undulatory nature, as in Schrödinger’s interpretation. As we will see, the Duality Principle, upon which Bohr would base his thinking, says that: (a) the nature of light and matter is both undulatory and corpuscular (the stronger, ontological version) or (b) the behavior of light and matter is both undulatory and corpuscular (the weaker version). In both cases, stronger or weaker, there is a collapse between the notions of behavior and nature: they are because they behave, or they behave because they are (even when we are not interested in coping with the “because-clause”). In any case, there is no distinction between behaving and being. Thus, for Schrödinger the electron was in reality scattered through a region represented by those points of space where the wave function has appreciable value and, because of this, it would be basically a wave, not a particle. The weights to which he referred would be the contributions of each point of space to the final charge of the electron. On the other hand, for Born the electron was in reality simply a particle, which could occupy (ideally) only a point in space, but could occupy these points with statistical weights. The difference between the meanings of “weight” for Born and Schrödinger represents the difference in the respective ontological perspectives mentioned above. Schrödinger was confronted with the probabilistic interpretation of the wave-function soon after its conception, and in a letter to Wien he confided his concerns about it. Schrödinger, however, remained unyielding in his conceptions and wrote to Max Born:
“I have, however, the impression that you and others, who essentially share your opinion, are too deeply under the spell of those concepts [stationary states, quantum jumps, etc.], which have obtained civil rights in our thinking in the last dozen years; hence you cannot do full justice to an attempt to break away from this scheme of thought. (...) What is before my eyes, is only one thesis: one should not, even if a hundred trials fail, give up the hope of arriving at the goal – I do not say by means of classical pictures, but by logically consistent conceptions of the real structure of space-time processes. It is extremely probable that this is possible.” (See Ref. [13], Vol. 5, Part 2, Chapter IV, p. 829)
The work of Born paved the way to the final establishment of the interpretation of 1927. However, against his reduction, by which one would need only an ontology of particles, there remained the stubborn phenomena of interference and diffraction of matter and radiation, for which only undulatory explanations were available.
1.5.5. The Formulation of the Uncertainty and Complementarity Principles
The problem of the interpretation of Quantum Mechanics had been of increasing concern to Bohr since 1923, when the question about the nature of radiation became crucial to the understanding of the Compton effect. For Heisenberg, who had anxiously pressed for the abandonment of classical concepts such as electronic orbits in atoms, the problem of the interpretation of the formalism came later, in 1925, when he thought about the simultaneous existence of discrete energies for bound electrons in atoms and continuous energy spectra for free electrons moving along well-defined paths. He knew that the complete determination of the momentum (Δ𝑝 = 0) of a free electron would imply an infinite Δ𝑞. At the end of 1926, Heisenberg returned to the question of the space-time description of the electron’s behavior in the atom. Pauli had reminded him that Schrödinger’s wave function should also be considered as defined in momentum space, 𝜓(𝑝), in the same way as it could be defined in configuration space. To Pauli’s statement, Heisenberg answered: “The fundamental equivalence of 𝑝 and 𝑞 pleases me very much. Thus, in the wave formulation, the equation 𝑝𝑞 − 𝑞𝑝 = ℎ/2𝜋𝑖 always corresponds to the fact that it makes no sense to speak of a monochromatic wave at a definite moment. (...) If the [spectral] line may be taken as being not too sharp, i.e., the time interval is not too small, that of course makes sense. Analogously, there is no point in talking about the position of a particle of a definite velocity. However, it makes sense if one does not consider the velocity and the position too accurately. It is quite clear that, macroscopically, it is meaningful to talk about the position and velocity of a body.” (Heisenberg to Pauli, October 28, 1926, [14])
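Pauli's remark and Heisenberg's reply can be illustrated in one dimension. The sketch below is written in modern notation and rests on assumptions made here: ħ = ℎ/2𝜋, a symmetric Fourier convention, the representation of 𝑝 as a differential operator acting on 𝜓(𝑞), and an arbitrary sharp momentum value p₀.

```latex
% The same state written in configuration space and in momentum space
\[
  \tilde{\psi}(p) = \frac{1}{\sqrt{2\pi\hbar}}
      \int_{-\infty}^{\infty} \psi(q)\, e^{-ipq/\hbar}\, dq ,
  \qquad
  \psi(q) = \frac{1}{\sqrt{2\pi\hbar}}
      \int_{-\infty}^{\infty} \tilde{\psi}(p)\, e^{ipq/\hbar}\, dp
\]
% In the wave formulation p acts as (hbar/i) d/dq, which reproduces pq - qp = h/(2 pi i)
\[
  (pq - qp)\,\psi
    = \frac{\hbar}{i}\left[ \frac{d}{dq}\bigl(q\,\psi\bigr) - q\,\frac{d\psi}{dq} \right]
    = \frac{\hbar}{i}\,\psi
    = \frac{h}{2\pi i}\,\psi
\]
% A completely sharp momentum (Delta p = 0) is a monochromatic wave, spread uniformly over all q
\[
  \tilde{\psi}(p) = \delta(p - p_0)
  \;\Longrightarrow\;
  \psi(q) = \frac{1}{\sqrt{2\pi\hbar}}\, e^{ip_0 q/\hbar} ,
  \qquad
  |\psi(q)|^{2} = \frac{1}{2\pi\hbar} \ \text{ for all } q
\]
```

Since |𝜓(𝑞)|² is the same at every point, no finite interval of 𝑞 is singled out. This is the sense in which Δ𝑝 = 0 for a free electron implies an infinite Δ𝑞.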
The discussions between Bohr and Heisenberg, sometimes stormy, began just after Schrödinger had left Copenhagen and continued during all the winter months. Pauli was kept informed of the course of these discussions. Heisenberg wrote to him:
“During these months I spoke with Bohr almost daily about the fundamental problems of quantum theory. Bohr sought to make the duality between the wave picture and the corpuscular picture as the starting point of the physical interpretation, while I tried to derive my conclusions—without the help of wave mechanics—by appealing only to quantum mechanics and Dirac’s transformation theory.” (See Ref. [15], p. 45)
As Heisenberg recalled: “Bohr and I tried from different angles and therefore it was difficult to agree. Whenever Bohr could give an example in which I couldn’t find the answer, then it was clear that we had not understood what the actual situation was... Shortly after Christmas, we both were in a kind of despair. In some way we couldn’t agree and so we were a bit angry about it. So about mid-February 1927, Bohr left for a skiing vacation in Norway. Earlier he had thought about taking me with him, but then he decided against it. He wanted to be alone and think, while I stayed in Copenhagen.” (See W. Heisenberg, AHQP Interview, February 25, 1963, p. 16; conversations with J. Mehra, Geneva, July 1962)
On February 23, 1927, Heisenberg wrote a long letter to Pauli with an analysis of the commutation relation. He stated that it has the following physical interpretation: given the exact momentum 𝑝 of an electron in an atom, its position is completely undetermined, and vice versa [14]. Heisenberg discussed the thought experiment for the observation of an electron by means of a 𝛾-ray microscope. He asked Pauli for his severe criticism, but Pauli’s reaction was extremely positive: he approved of Heisenberg’s idea of the uncertainty principle and thought that this interpretation endowed Quantum Mechanics with a coherent physical meaning. After receiving Pauli’s impressions, Heisenberg wrote to Bohr: “I myself have worked very vigorously during recent weeks in order to carry through the program about which we talked before your departure. I believe that I have fully succeeded. The case, in which 𝑝 as well as 𝑞 are given certain accuracy, can be formulated without going beyond the Dirac, Jordan mathematics.... Further, one finds that the transition from ‘micro- to macromechanics’ can be understood very easily: classical mechanics is entirely a part of quantum mechanics. As for the old question concerning ‘statistical or causal law’, the situation is this: one cannot say that quantum mechanics is statistical. However, one can obtain only statistical results, if one wants to calculate ‘future events’ from the ‘present’, since one cannot take into account all the initial conditions of the present.” (See Ref. [17], p. 128)
In his paper, submitted to the Zeitschrift für Physik on March 23, Heisenberg also turned his attention to the question of visualization, together with the theory’s status of completion. He began his considerations saying: “We regard a physical theory to be perceptual [only] if we can think of the experimental consequences of this theory qualitatively in all simple cases, and [when] we have recognized that the application of this theory never leads to inner contradictions. (...) Quantum mechanics arose exactly out of the attempt to break away from those habitual kinematic concepts, and substitute in their place concrete experimentally given magnitudes. Since this seems to have been achieved, the mathematical scheme of quantum mechanics would therefore not require a revision. All concepts, which are needed to describe a mechanical system in classical physics, can also be defined – analogously to classical notions – exactly for atomic processes. The experiments, which serve to define them, also lead to an inherent indeterminacy, if we demand from them the simultaneous determination of two canonically conjugate quantities (...) In this case, the quantum theory can be closed compared to the relativity theory. (...) The more accurately the position is determined, the more uncertain is the momentum and vice versa; in this we find a directly visualizable explanation of the relation 𝑝𝑞 − 𝑞𝑝 = ℎ/2𝜋𝑖.” (See Ref. [18], p. 172)
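The link between the commutation relation and the indeterminacy can be made quantitative. The inequality sketched below is the later standard form, usually attributed to Kennard and Robertson, and not the order-of-magnitude estimate of Heisenberg's own paper. Here Δ𝑞 and Δ𝑝 denote standard deviations in a given state, ħ = ℎ/2𝜋, and the Gaussian wave packet with width parameter σ is an example chosen here for illustration.

```latex
% The relation quoted in the text, rewritten as a commutator
\[
  pq - qp = \frac{h}{2\pi i} = -\,i\hbar
  \quad\Longleftrightarrow\quad
  [q,p] \equiv qp - pq = i\hbar
\]
% General inequality for two observables A and B, applied to q and p
\[
  \Delta A\,\Delta B \;\ge\; \tfrac{1}{2}\,\bigl|\langle [A,B] \rangle\bigr|
  \quad\Longrightarrow\quad
  \Delta q\,\Delta p \;\ge\; \frac{\hbar}{2} = \frac{h}{4\pi}
\]
% Example: a Gaussian wave packet attains the lower bound
\[
  \psi(q) = (2\pi\sigma^{2})^{-1/4}\, e^{-q^{2}/4\sigma^{2}}
  \;:\qquad
  \Delta q = \sigma , \quad
  \Delta p = \frac{\hbar}{2\sigma} , \quad
  \Delta q\,\Delta p = \frac{\hbar}{2}
\]
```

Shrinking σ sharpens the position and broadens the momentum in exact proportion, which is the quantitative counterpart of "the more accurately the position is determined, the more uncertain is the momentum".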
When Bohr returned from his vacation in Norway, he was not particularly satisfied with Heisenberg’s approach. As Heisenberg recalled: “When Bohr came back I showed him [my] paper... and I showed him Pauli’s reaction. I did realize that Bohr was a bit upset about it because he still felt that it was not quite clear what I had written – not in every way clear. At the same time he saw Pauli’s reaction, and he knew Pauli was very critical, so he felt it should, in some way, be right. (...) The main point was that Bohr wanted to take this dualism between waves and corpuscles as the central point of the problem. (...) I [on the other hand] would say: ‘We have a consistent mathematical scheme and [it] tells us everything which can be observed. [There is] nothing in nature which cannot be described by this mathematical scheme.’ It was a different way of looking at the problem because Bohr would not like to say that nature imitates a mathematical scheme, that nature does only things which fit into a mathematical scheme.” (W. Heisenberg, AHQP Interview, February 25, 1963, p. 18)
In fact, what Bohr was not willing to accept, and he was followed by Schrödinger and others on this point, was a mathematical formalism that did not come accompanied by a complete interpretation, free of internal contradictions. Bohr surely thought that this completion of the interpretation would result from the necessary inclusion of the wave-particle duality. The discussions between Bohr and Heisenberg (assisted by O. Klein) about thought experiments and their interpretation continued for several weeks, frequently leading to misunderstandings. At the end of May 1927 their points of view converged, and after Pauli’s visit to Copenhagen in July 1927 harmony between them was restored. Heisenberg’s interpretation of the dispersion relations (syntax) was the Uncertainty Principle (semantics), and it was quickly accepted as the real core of the new theory. For Bohr, on the other hand, Heisenberg’s uncertainty relations were “a confirmation of the conceptions he had been groping for long before Heisenberg derived his principle from the Dirac-Jordan transformation theory. True, Heisenberg’s work prompted Bohr to give his thoughts on complementarity a consistent and final formulation, but these thoughts (...) can be traced back at least to July, 1925.” (Ref. [17], p. 345). In public, Bohr first discussed his ideas about complementarity in a lecture given on September 16, 1927 at a congress commemorating the hundredth anniversary of Alessandro Volta’s death. Bohr emphasized the distinction between a classical description of a natural phenomenon, based upon the hypothesis that it can be observed without being significantly disturbed, and the quantum description of atomic processes, to which a quantum discontinuity should be attributed. He thus considered the causal space-time description according to the quantum postulate as follows: “Characteristic of the quantum theory is the acknowledgement of a fundamental limitation in our classical physical ideas when applied to atomic phenomena. Just on account of this situation, however, we meet intricate difficulties when attempting to formulate the contents of the quantum theory in terms of concepts borrowed from classical theories. Still it would appear that the essence of the theory may be expressed through the postulate that any atomic process open to direct observation involves an essential element of discontinuity or rather individuality completely foreign to the classical ideas and symbolized by Planck’s quantum of action. This postulate at once implies a resignation as regards the causal space-time coordination of atomic phenomena.” (Niels Bohr, Manuscript on ‘Fundamental problems of the quantum theory.’ September 13, 1927. Bohr Archives: cited in [18], pp. 156, 157).
Thus Bohr arrived at the conclusion that the situation in atomic physics could only be described in terms of dual complementary perspectives that, in classical physics, exclude one another. In Quantum Mechanics, however, the uncertainty relations would guarantee that no contradiction would appear in the application of the Complementarity Principle to Nature. The Uncertainty Principle would exclude the possibility of both the corpuscular and the undulatory behavior showing up simultaneously.
1.6. THE STRUCTURE OF THE 1927 INTERPRETATION: REDUCTION REVISITED
The previous section showed the historical steps towards the formulation of the interpretation of 1927 from the point of view of the dispute between corpuscular and undulatory models for light and matter. It seems clear that these steps were the reverberation of the dispute that took place during the XIXth century, a dispute that physicists thought settled at the end of that century but whose solution was not able to survive the experiments in which light interacted with matter. This is why Bohr sought an explanation encompassing the wave-particle duality. There were situations in which light and matter may behave in an undulatory or a corpuscular manner (but not both), and these experimental situations could not be explained by a reduction at the ontological level to only corpuscles or only waves. No ontology based solely upon one of these concepts (corpuscles or waves) was able to pass all the experiments. Thus, only an interpretation that could include both concepts would be adequate to the formalism of Quantum Mechanics. But no classical interpretation would fit this condition, since the corpuscle and wave concepts imply mutually contradictory characteristics (localized, delocalized, etc.). It then follows, quite naturally indeed, that one must break with classical concepts. However, there remains the problem of finding an interpretation that breaks with classical concepts in a coherent and consistent way. Bohr could have broken with the principle of contradiction (in logic⁶), but this would have been impossible to imagine at that point and would have been quite disastrous to the making of Physics, one could say. This is the role played by the Complementarity Principle. By introducing time into its formulation, the two mutually contradictory behaviors may be accommodated within a single picture. The logical principle of non-contradiction (in an epistemic reading) states that no state of affairs can correspond to a denotative sentence of the type ‘𝑝 and not 𝑝’ at the same time. Thus, by assuming that the two concepts do not appear at the same time, the contradiction is lifted. The concepts are then complementary in the sense that they exhaust the possibilities for the behavior of physical systems: in set-theoretical words, they refer to disjoint behaviors whose union is the universe of all behaviors. This would not be an acceptable solution, however, if one could not find a syntactical ground for the Complementarity Principle. In fact, if a syntactical foundation of this principle could not be found, the principle would look simply like an escape, in which one gives up trying to find the ontological reduction and assumes that the two perspectives are equally good; time then enters into the solution only as the manifestation of the commitment to the principle of non-contradiction (this was the claim of Schrödinger cited before). This could be thought of as the reason why Bohr, who had been thinking about the Complementarity Principle since 1925, announced it only when he had in hand the (syntactical) dispersion relations and the (semantic) Uncertainty Principle⁷.
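The logical structure of this argument can be written schematically. The symbols below are labels introduced here purely for illustration and are not the author's notation: U is the universe of all behaviors, W and C are the sets of undulatory and corpuscular behaviors, and W(s, t), C(s, t) state that the system s exhibits the corresponding behavior at time t.

```latex
% Complementarity in set-theoretical words: disjoint behaviors that exhaust all possibilities
\[
  W \cap C = \varnothing , \qquad W \cup C = U
\]
% Non-contradiction (epistemic reading): no system shows both behaviors at the same time
\[
  \neg\,\exists s\;\exists t\;\bigl[\, W(s,t) \wedge C(s,t) \,\bigr]
\]
% What the time index leaves open: the same system may show each behavior at different times
\[
  W(s,t_1) \wedge C(s,t_2) , \qquad t_1 \neq t_2
\]
```

The third line is consistent with the second only because the two predicates carry different time arguments, which is the precise sense in which introducing time lifts the contradiction.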
There are (at least) three major ways in which the (syntactic) dispersion relations can be (semantically) interpreted: the ontological, the epistemological and the statistical[19]:

Claim 1 (ontological) Conjugate kinematical quantities such as position and momentum cannot simultaneously exist with well-defined values; that is, physical entities do not have well-defined position and momentum (or any other pair of conjugate kinematical quantities). Heisenberg’s relations should be understood as “determining the maximum precision in the definition of the energy and momentum of the physical entities related to the undulatory field.”[20]

Claim 2 (epistemological) Conjugate kinematical quantities such as position and momentum cannot simultaneously be experimentally determined. Any use of the words ‘position’ and ‘velocity’ implying more precision than the one given by the dispersion relations is as much deprived of meaning as words whose meaning were not defined.[21]

Claim 3 (statistical) Heisenberg’s dispersion relations regarding conjugate kinematical quantities imply that there is a minimum value for the product of the statistical dispersions in the measurements of these quantities.[22]

The first claim assumes duality as inscribed in nature itself; the second and third assume duality as the outcome of experimental (epistemological) possibilities. However, the second claim assumes that duality may appear for a single system, while the last claim states that it is the outcome of a number of measurements. The statistical claim is congenial to an ontology based upon only particles, while the first claim is committed to an ontology of both particles and waves (or something else that could behave as particle or wave, depending upon the situation); the second claim has no ontological commitment. One can find problems in each one of these claims, but we do not intend to proceed with this line of thought (at this point).

As a last comment, we have said that the corpuscular interpretation of Born was blocked by the fact that there were experimental situations in which the behavior of matter could be explained only by resorting to a wavelike description (diffraction and interference experiments, for instance). This is an epistemic blocking, obviously, in such a way that if one finds a manner to explain, say, the so-called wavelike experiments using only corpuscular constructs, the blocking is dissolved. The fact that single systems may behave in a way explainable by wavelike constructs sets an epistemic blocking to the third claim mentioned above, which is compatible with corpuscular ontological reductions. If one can show that the wavelike interpretations of interference and diffraction may be replaced by corpuscular interpretations, and that there is a way to explain the wavelike behavior (epistemology) of single systems even assuming them to be (ontology) corpuscular, then this would represent an enormous ontological reduction. It is important to stress that all individual measurements of physical systems always furnish a corpuscular pattern (for instance, an electron is never measured as a wave in a single instant of time, the instant of measurement). This is why it was necessary to postulate, later on, the axiom about the reduction of the wave-packet for those approaches that use the notion of waves in an essential manner (epistemological and ontological approaches).

⁶ Nowadays there are logics that internally accept contradiction, such as the paraconsistent logics of Newton da Costa. Such logics were unavailable at the time of Bohr.
⁷ It is another problem, of course, to accept that the dispersion relations and their interpretation in terms of the Uncertainty Principle do give grounds to the Complementarity Principle. The connection seems feeble, to say the least.
On the other hand, an explanation of a typical “undulatory” phenomenon had been available since 1923, presented (independently) by Compton and Duane in the same journal[23]. These authors show how it would be possible to explain the pattern obtained in Bragg-Laue diffraction experiments, assuming a corpuscular nature for the physical entities involved and the quantization of momentum exchange between these particles and a crystal. We will show in a future chapter that the same lines may be followed to explain all other ‘undulatory’ phenomena. The reasons why this path was not chosen at the time can be the subject of some investigation in the sociology of science and will not concern us here. The fact is that other physicists tried to follow this path later on, with little success[24], which shows that this is not an easy task. This, in fact, is something worth stressing: sometimes one formalism may take us to the correct solution much more easily than another. Thus, the wave formalism gives us the solution for interference and diffraction problems without any need to consider the actual physical interactions taking place at the slit(s); the calculations are just a matter of analysing phase differences. If any corpuscular formalism based on the exchange of momentum at the slits is to be considered, the actual way by which this exchange of momentum takes place must be taken explicitly into account. It is natural, then, that one formalism takes precedence over the other until the latter shows, in its own terms, the same success. The last epistemic blocking, regarding the fact that single systems behave in a way explainable by wave-like constructs, will also be addressed. The specific strategy will be to insist on the difference between behaving (which needs a time interval to be characterized) and being (which can be defined at an instant of time). We will show (by formal approach,
computational simulation, etc.) that it is possible for a particle to be a corpuscle at each instant of time and also to behave as a wave during some time interval (the ergodic hypothesis will play a major role here). This mimics Bohr’s solution (inclusion of time to dissolve contradiction) but with completely different intentions and results. While Bohr’s solution implies an ontological commitment to physical entities being both corpuscles and waves (or something else that can behave in both ways), the present approach is committed to physical entities being only corpuscles that can behave in both ways. This approach will, then, produce an enormous reduction of the ontological entities needed in the interpretation of Quantum Mechanics and will thus show itself to be much simpler: no complementarity principle, no duality principle, no need for ontological uncertainty, no need for observers, no need for reduction of the wave-packet, etc. In fact, the only ontological constructs that will be needed are corpuscles and fluctuations, the latter being a usual statistical concept, and the overall interpretation will be given only in terms of usual classical concepts and statistical constructs, like “correlation”. This is not the moment, however, to spend an enormous amount of time on pseudo-philosophical discussions about these issues. Physicist-like pseudo-philosophical erudition is quite frequently too close to a fraud, and citations of Kant or Heidegger in physics discussions are usually quite upsetting, when not merely boring. The rest of this book is intended precisely to show, beyond reasonable doubt, with the tools of physical and mathematical reasoning, that the epistemic blocking mentioned above may be removed.
This necessarily implies a reduction of the ontological commitments, a simpler and visualizable approach and the abandonment of almost all the constructs (such as “duality”, “conscious observers”, “complementarity”, “uncertainty”, etc.) that lent Quantum Mechanics its (many times celebrated) weirdness. The rest comes with the principle of simplicity that has pervaded Physics since Galileo. At the end of the book the reader will be able to understand quantum phenomena with only the constructs usual to classical physics and the notion of randomness. The syntax of Quantum Mechanics will be kept invariant, although other formal developments will be made, but the semantics will completely change by means of the mentioned reductions. These reductions at the semantic level (along with the principle of simplicity) are the ones responsible for the (expected and intended) coercive character of the present approach. But coercion here should not be understood as meaning a lack of freedom; quite the opposite: it should be thought of as freedom in its utmost degree, for it is restrained only by intellectual honesty, which is freedom itself.
Chapter 2
THE CHARACTERISTIC FUNCTION DERIVATION

As we have seen, Quantum Mechanics, both at the syntactic and semantic levels, was initially the result of very unsystematic and different trials. This is, indeed, what one should expect for the birth of a physical theory. It was not different with Electromagnetic theory or with the theory of Relativity. They all arose from trials coming from different assumptions, perspectives, etc. and were given their mature form afterwards, when the work of a Maxwell or an Einstein purged the theories of unnecessary elements. Mechanics, to cite the most famous example, had a Galileo and a Newton: the former received the unsystematic and dispersed results of the Middle Ages, already defiant of Aristotle’s physics, and freed Mechanics from its Aristotelian chains; the latter put it into its mature form. Coincidentally or not, the cleansing produced in the examples mentioned above was always due to the ability of those great scientists to write down the theories in an axiomatic format. Of course, the axiomatization of a physical theory cannot be a sufficient condition for its cleansing, for the axioms themselves may contain many unnecessary elements. In any case, it is a very good point of departure, since in an axiomatized theory all the syntactical and semantic elements are contained (as an embryo) in the axioms. Furthermore, if these axioms are based on results of another established physical theory, then the derived theory inherits the interpretation of its symbols from the interpretation of the “parent” theory with absolute certainty. Quantum Mechanics has already been put into an axiomatic format. However, the fact that the Schrödinger equation is one of its postulates hinders some of the developments that, one may think, would be possible if this equation were mathematically derived from other more basic and well-known equations.
In this chapter we will try to show that it is possible to make such an axiomatization of Quantum Mechanics, beginning with the mathematical derivation of the Schrödinger equation. The way in which we make this axiomatization will be the source of innumerable consequences that will be worked out in the following chapters.¹

¹ Most of the developments of this chapter were published in the papers: L.S.F. Olavo, Foundations of Quantum Mechanics: non-relativistic theory, Physica A 262, 197-214 (1999), and L.S.F. Olavo, Foundations of Quantum Mechanics (II): equilibrium, Bohr-Sommerfeld rules and duality, Physica A 271, 260-302 (1999). Some results were changed by the kind of hindsight produced by the author's own maturation process.
2.1. THE AXIOMS AND THE FORMALISM

We begin our axiomatization of Quantum Mechanics by presenting the axioms and showing that they allow us to mathematically derive the Schrödinger equation.

Axiom 1 For an isolated system, the joint phase-space probability density function related to any Quantum Mechanical phenomenon obeys the Liouville equation

\[ \frac{dF(q,p;t)}{dt} = 0; \tag{2.1} \]

Axiom 2 The infinitesimal Fourier transformation, defined by

\[ Z(q,\delta q;t) = \int_{-\infty}^{+\infty} F(q,p;t)\, e^{ip\,\delta q/\ell}\, dp, \tag{2.2} \]

where $\ell$ is a universal parameter with dimensions of angular momentum, can be applied to the description of a non-relativistic quantum system, and the function $Z(q,\delta q;t)$ can be written as

\[ Z(q,\delta q;t) = \psi^{*}\!\left(q-\frac{\delta q}{2};t\right)\psi\!\left(q+\frac{\delta q}{2};t\right). \tag{2.3} \]

$Z(q,\delta q;t)$ is then a characteristic function.

Before we proceed with the derivation of the Schrödinger equation from these two axioms, two comments are in order. Firstly, one may be quite astonished by the first axiom, for everyone knows that the Liouville equation is an equation of classical physics. Amazing as it may be, an axiom must be judged by its capability of endowing us with the tools to arrive at our final results; this means that the reader should suspend possible amazement until the consequences of the axiom are worked out (in connection with the other axioms of the approach) to see if the axiom in question can really deliver what is expected from it (the derivation of the Schrödinger equation). Secondly, one may ask why the mere use of a Fourier transform is put as an axiom of the approach. The fact is that, in the development of our derivation, we will make the assumption that the quantity $\delta q$ is infinitesimal (that is why we called the transformation infinitesimal). We will also assume the decomposition shown in (2.3), which may be quite restrictive. The nature of this infinitesimal quantity and its significance, although addressed in the next chapters, will be clarified only in the fifth and sixth chapters of this book, when we will show its connection to the Central Limit Theorem and to Langevin equations, respectively.

That being said, we return to our derivation. Equation (2.1) may be written as

\[ \frac{dF(q,p;t)}{dt} = \frac{\partial F}{\partial t} + \frac{p}{m}\frac{\partial F}{\partial q} - \frac{\partial V}{\partial q}\frac{\partial F}{\partial p} = 0, \]
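where it is already assumed that the underlying forces may be written as the gradient of a potential function (calculations will be done in one dimension for clarity; all the results may be easily generalized to more dimensions). We now apply transformation (2.2) to this last equation and use the fact that $F(q,p;t)$ is a probability density function to put

\[ \left[ F(q,p;t)\, e^{ip\,\delta q/\ell} \right]_{p=-\infty}^{p=+\infty} = 0, \]

so that $\int pF e^{ip\,\delta q/\ell}dp = -i\ell\,\partial Z/\partial(\delta q)$ and, after an integration by parts, $\int (\partial F/\partial p)\, e^{ip\,\delta q/\ell}dp = -(i\,\delta q/\ell)\,Z$. We thus arrive at the equation

\[ -\frac{\ell^{2}}{m}\frac{\partial^{2} Z}{\partial q\,\partial(\delta q)} + \delta q\,\frac{\partial V}{\partial q}\,Z = i\ell\,\frac{\partial Z}{\partial t}, \tag{2.4} \]

which is the differential equation for the characteristic function $Z(q,\delta q;t)$. We can write the function $Z(q,\delta q;t)$ in terms of the functions $\psi(q;t)$ as in (2.3) and, since $\psi(q;t)$ is, in general, a complex function, put

\[ \psi(q;t) = R(q;t)\, e^{iS(q;t)/\ell}. \tag{2.5} \]

Expanding up to second order in $\delta q$, we find the result

\[ Z(q,\delta q;t) = \left\{ R(q;t)^{2} + \left(\frac{\delta q}{2}\right)^{2}\left[ R(q;t)\frac{\partial^{2} R}{\partial q^{2}} - \left(\frac{\partial R}{\partial q}\right)^{2} \right] \right\} \exp\left( \frac{i\,\delta q}{\ell}\frac{\partial S}{\partial q} \right). \tag{2.6} \]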
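The truncation in (2.6) can be illustrated numerically. The following is only a sketch under stated assumptions: the Gaussian amplitude `R`, the quadratic phase `S` and the choice ℓ = 1 are illustrative and do not come from the text. For this particular choice the phase difference is reproduced exactly, so the error of (2.6) comes from the amplitude alone and should shrink like the fourth power of δq.

```python
import cmath
import math

ELL = 1.0  # universal parameter of Axiom 2, set to 1 (assumed units)

def R(q):
    """Illustrative amplitude (an assumption, not taken from the text)."""
    return math.exp(-q * q / 2.0)

def dR(q):
    return -q * R(q)

def d2R(q):
    return (q * q - 1.0) * R(q)

def S(q):
    """Illustrative phase, quadratic in q (also an assumption)."""
    return 0.7 * q + 0.3 * q * q

def dS(q):
    return 0.7 + 0.6 * q

def psi(q):
    """Amplitude in the polar form (2.5)."""
    return R(q) * cmath.exp(1j * S(q) / ELL)

def z_exact(q, dq):
    """Characteristic function written as the product (2.3)."""
    return psi(q - dq / 2.0).conjugate() * psi(q + dq / 2.0)

def z_second_order(q, dq):
    """Second-order expression (2.6)."""
    amplitude = R(q) ** 2 + (dq / 2.0) ** 2 * (R(q) * d2R(q) - dR(q) ** 2)
    return amplitude * cmath.exp(1j * dq * dS(q) / ELL)

def truncation_error(q, dq):
    return abs(z_exact(q, dq) - z_second_order(q, dq))

if __name__ == "__main__":
    for dq in (0.2, 0.1, 0.05):
        print(dq, truncation_error(0.5, dq))
```

Halving `dq` should reduce the printed error by a factor close to 16 for this choice of `R` and `S`; a phase with a nonvanishing third derivative would add an error of third order in δq.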
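It is this expansion only up to second order that gives the second axiom its “infinitesimal” character, and the prescription to stop at the second order is the important content of this axiom. Now we put expression (2.6) into (2.4) and separate the real and imaginary terms to find the equations

\[ \frac{\partial R^{2}}{\partial t} + \frac{\partial}{\partial q}\left[ \frac{R(q;t)^{2}}{m}\frac{\partial S(q;t)}{\partial q} \right] = 0 \tag{2.7} \]

and

\[ \frac{i\,\delta q}{\ell}\frac{\partial}{\partial q}\left[ \frac{\partial S}{\partial t} + \frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^{2} + V(q) - \frac{\ell^{2}}{2mR(q;t)}\frac{\partial^{2} R}{\partial q^{2}} \right] = 0. \tag{2.8} \]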
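The substitution of (2.6) into (2.4) can be spelled out order by order in δq. The block below is a sketch of that bookkeeping, not a quotation of the author's calculation: primes denote derivatives with respect to q, and the overall factor that multiplies the bracket in (2.8) is dropped because it does not vanish.

```latex
\begin{align*}
Z &= \Big[R^{2} + \tfrac{\delta q^{2}}{4}\big(RR'' - R'^{2}\big)\Big]\,
     e^{i\,\delta q\,S'/\ell},\\[4pt]
\frac{\partial Z}{\partial t} &=
   \Big[\partial_t R^{2} + \tfrac{i\,\delta q}{\ell}\,R^{2}\,\partial_t S'
   + O(\delta q^{2})\Big]\, e^{i\,\delta q\,S'/\ell},\\[4pt]
\frac{\partial^{2} Z}{\partial q\,\partial(\delta q)} &=
   \Big[\tfrac{i}{\ell}\,\partial_q\big(R^{2}S'\big)
   + \tfrac{\delta q}{2}\,\partial_q\big(RR'' - R'^{2}\big)
   - \tfrac{\delta q}{\ell^{2}}\,R^{2}S'S''
   + O(\delta q^{2})\Big]\, e^{i\,\delta q\,S'/\ell}.\\[8pt]
\text{Order } \delta q^{0}:&\quad
   -\tfrac{i\ell}{m}\,\partial_q\big(R^{2}S'\big) = i\ell\,\partial_t R^{2}
   \;\Longrightarrow\;
   \partial_t R^{2} + \partial_q\!\Big(\tfrac{R^{2}}{m}S'\Big) = 0,\\[4pt]
\text{Order } \delta q^{1}:&\quad
   -\tfrac{\ell^{2}}{2m}\,\partial_q\big(RR'' - R'^{2}\big)
   + \tfrac{1}{m}R^{2}S'S'' + V'R^{2} = -R^{2}\,\partial_t S'.\\[4pt]
\text{Using }\;&\partial_q\big(RR'' - R'^{2}\big) = R^{2}\,\partial_q\!\Big(\tfrac{R''}{R}\Big):
   \quad
   R^{2}\,\partial_q\Big[\partial_t S + \tfrac{1}{2m}S'^{2} + V
   - \tfrac{\ell^{2}}{2m}\tfrac{R''}{R}\Big] = 0.
\end{align*}
```

The zeroth-order terms are purely imaginary and give the continuity equation (2.7); the first-order terms are real and give the bracket of (2.8).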
The first equation may be identified as a continuity equation, precisely by the kind of semantic inheritance to which we have already referred. Indeed, since we have written the characteristic function as in (2.3) and also put $\psi$ as in (2.5), we immediately find that

\[ R(q;t)^{2} = \lim_{\delta q\to 0} Z(q,\delta q;t) = \psi^{*}(q;t)\,\psi(q;t) = \int_{-\infty}^{+\infty} F(q,p;t)\,dp, \]

which must be a probability density function defined upon configuration space, since $F(q,p;t)$ is a probability density function defined upon phase space. This means that $\psi(q;t)$ must be a probability amplitude, an interpretation inherited from the axioms. It is also easy to show that

\[ \frac{R(q;t)^{2}}{m}\frac{\partial S(q;t)}{\partial q} = -\frac{i\ell}{m}\lim_{\delta q\to 0}\frac{\partial Z(q,\delta q;t)}{\partial(\delta q)} = \int_{-\infty}^{+\infty} \frac{p}{m}\,F(q,p;t)\,dp, \]
which gives equation (2.7) its unambiguous interpretation as a continuity equation. The second equation is a total derivative with respect to $q$ and may thus be integrated to give

\[ \frac{\partial S}{\partial t} + \frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^{2} + V(q) - \frac{\ell^{2}}{2mR(q;t)}\frac{\partial^{2} R}{\partial q^{2}} = f(t), \tag{2.9} \]

in which the function $f(t)$ is arbitrary. Since we can redefine $S(q;t)$ as

\[ S'(q;t) = S(q;t) - \int_{0}^{t} f(t')\,dt' \]

to cancel out the right-hand side of the previous equation, we may just consider that $f(t)=0$ without loss of generality. Equations (2.9) with $f(t)=0$ and (2.7) are then fully equivalent to the Schrödinger equation

\[ -\frac{\hbar^{2}}{2m}\frac{\partial^{2}\psi}{\partial q^{2}} + V(q)\,\psi(q;t) = i\hbar\,\frac{\partial\psi(q;t)}{\partial t}, \tag{2.10} \]
since if we replace the definition (2.5) in (2.10) and collect the real and imaginary terms (and make $\ell=\hbar$ to “discover” the value of our universal parameter²), we arrive at the same results (2.7) and (2.9). This is the complete derivation and it does not depend upon any kind of abstruse mathematics, although the nature of the expansion up to second order must still be clarified. As a first tip on what this second order expansion means, the reader is referred to the works of Smoluchowski on the description of physical systems presenting fluctuations, or to traditional derivations of the random walk problem, where exactly the same approach is used, with the same second order expansion, but with different axioms[25].

A controversy has appeared in the literature about the equivalence between equations (2.7) and (2.9) and equation (2.10). The argument arose[26] within the realm of the Madelung explanations leading to Bohm’s expressions which are, essentially, (2.7) and (2.9). According to [26]:

“Once we allow $S$ to be many valued, however, there is nothing in the Madelung equations to constrain $\psi = R(q;t)e^{iS(q;t)}$ to be single valued. For a generic solution to these equations, it will not be, and the connection to the Schrödinger equation breaks down. (...) In the context of the Madelung equations [(2.7) and (2.9)], there is no requirement that $\psi$ [constructed after we have solved (2.7) and (2.9)] be single valued.” (See [26], p. 1614)

² One should remember that no derivation process can simply “find” the universal parameter of a theory. A similar situation can be found in Gravitation (where $G$ is experimentally obtained) or Electromagnetism.
The author then presents the analysis of a simple problem of a particle in a well-behaved two-dimensional central potential; the Schrödinger equation implies that $R_a(r)$ is the solution of the radial equation

\[ \left[ -\frac{1}{2}\frac{d^{2}}{dr^{2}} + \frac{a^{2}}{2r^{2}} + V(r) \right] R_a(r) = E_a R_a(r). \]

He thus concludes:

“If we insist that $\psi$ be single valued, we have that $\psi_a(r,\phi) = \psi_a(r,\phi+2\pi j)$ ($j$ an integer), and this implies that $a$ must be an integer. And indeed, all of the solutions $\rho_a = |R_a(r)|^{2}$ and $v_a = a\hat{\phi}/r$ [where $v_a = \nabla S_a$] satisfy the Madelung equations for the potential $V$, regardless of whether $a$ is an integer. (...) The solutions of the [Madelung] equations contain the solutions of the Schrödinger equation as a proper subset, and smoothly interpolate between them. Clearly $(\rho_a, v_a)$ only corresponds to a single-valued solution of the Schrödinger equation when $a$ is an integer. The angular momentum is $a$, which takes on a continuum of values, and the energy can also be shown to assume a continuum of values.” (See [26], pp. 1614-1615)
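The single-valuedness condition in the quotation can be made concrete with a few lines of code. This is a minimal sketch, assuming the phase S_a = aφ that follows from v_a = ∇S_a = aφ̂/r and units in which ℏ = 1; the function names are illustrative only.

```python
import cmath
import math

def angular_factor(a, phi):
    """Angular part exp(i*a*phi) of psi_a = R_a(r) * exp(i*a*phi), with hbar = 1."""
    return cmath.exp(1j * a * phi)

def single_valued_mismatch(a, phi=0.3):
    """|psi_a(phi + 2*pi) - psi_a(phi)| for the angular factor alone."""
    return abs(angular_factor(a, phi + 2.0 * math.pi) - angular_factor(a, phi))

if __name__ == "__main__":
    for a in (1, 2, 0.5, 1.25):
        print(a, single_valued_mismatch(a))
```

The mismatch vanishes (up to rounding) only for integer `a`, while the density and the velocity field built from the same `a` do not see the difference, which is the point of the quoted argument.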
The problem is with the fact that the Schrödinger equation, whenever 𝜓 is considered as a probability amplitude, implies the fact that 𝜓 must be single valued, imposing restrictions upon the identification of the mean value of 𝑝 with the derivative ∂𝑆(𝑞)/ ∂𝑞. Since in the Madelung approach the equations (2.7) and (2.9) are the primary equations there is no reason to even consider this function 𝜓, and thus to impose on it an interpretation of a probability amplitude, the argument goes. It is quite obvious that this argument doesn’t apply to the present derivation, since the symbol 𝜓 inherited its interpretation of a probability amplitude from the axioms and thus it inherited also its single valued character. The Madelung equation (2.9), at which we arrive, is only another representation of the Schrödinger equation.
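As an illustration of this equivalence, one can take a known solution of the Schrödinger equation and check numerically that its modulus and phase satisfy (2.7) and (2.9) with f(t) = 0. The sketch below uses the standard coherent state of the harmonic oscillator, which is an assumption introduced here for illustration (it does not appear in the text), in units with m = ℏ = ω = 1. The amplitude `R` is left unnormalized because both equations are insensitive to a constant factor in R.

```python
import math

M = 1.0      # mass (assumed units)
HBAR = 1.0   # Planck constant (assumed units)
Q0 = 1.3     # initial displacement of the packet (arbitrary illustrative value)
H = 1e-4     # finite-difference step

def V(q):
    """Harmonic potential with unit frequency."""
    return 0.5 * q * q

def R(q, t):
    """Modulus of the coherent state (unnormalized)."""
    return math.exp(-0.5 * (q - Q0 * math.cos(t)) ** 2)

def S(q, t):
    """Phase of the coherent state."""
    return -0.5 * t - q * Q0 * math.sin(t) + 0.25 * Q0 ** 2 * math.sin(2.0 * t)

def d_dq(f, q, t):
    return (f(q + H, t) - f(q - H, t)) / (2.0 * H)

def d_dt(f, q, t):
    return (f(q, t + H) - f(q, t - H)) / (2.0 * H)

def d2_dq2(f, q, t):
    return (f(q + H, t) - 2.0 * f(q, t) + f(q - H, t)) / (H * H)

def continuity_residual(q, t):
    """Left-hand side of (2.7)."""
    def density(x, s):
        return R(x, s) ** 2
    def current(x, s):
        return R(x, s) ** 2 * d_dq(S, x, s) / M
    return d_dt(density, q, t) + d_dq(current, q, t)

def hamilton_jacobi_residual(q, t):
    """Left-hand side of (2.9) with f(t) = 0 and ell = hbar."""
    quantum_term = HBAR ** 2 / (2.0 * M) * d2_dq2(R, q, t) / R(q, t)
    return d_dt(S, q, t) + d_dq(S, q, t) ** 2 / (2.0 * M) + V(q) - quantum_term

if __name__ == "__main__":
    print(continuity_residual(0.4, 0.9), hamilton_jacobi_residual(0.4, 0.9))
```

Both residuals should vanish up to the finite-difference error, whatever point (q, t) is chosen.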
2.1.1. Heisenberg Dispersion Relations

From the previous derivation, we must find the Heisenberg relations as a simple theorem. Note that $Z(q,\delta q;t)$ is a characteristic function depending upon two variables. It is quite simple to define the position and momentum operators related to the definition of this function. After all, characteristic functions are introduced as statistical tools exactly to allow one to write statistical moments in terms of operators. We have seen that (we put $\ell=\hbar$ from now on)

\[ -i\hbar\lim_{\delta q\to 0}\frac{\partial Z(q,\delta q;t)}{\partial(\delta q)} = \int_{-\infty}^{+\infty} p\,F(q,p;t)\,dp \]

and it is quite obvious that

\[ \lim_{\delta q\to 0} q\,Z(q,\delta q;t) = \int_{-\infty}^{+\infty} q\,F(q,p;t)\,dp, \]

where we are again using the semantic interpretation of the quantities inherited from the Liouville equation. Thus, with respect to the characteristic function $Z(q,\delta q;t)$, the momentum and position operators are

\[ \hat{p}' = -i\hbar\lim_{\delta q\to 0}\frac{\partial}{\partial(\delta q)} \tag{2.11} \]

and

\[ \hat{q}' = q. \tag{2.12} \]

Note, however, that these two operators commute with each other, that is, $[\hat{p}',\hat{q}'] = 0$, since $q$ and $\delta q$ are independent variables. Thus, we have immediately that

\[ \Delta q'\,\Delta p' \geq 0, \]

where $\Delta q'$ and $\Delta p'$ are the dispersions (root mean square deviations) related to the two variables $q$ and $p$, respectively. However, we may also ask what would be the representation of the operators $\hat{p}$ and $\hat{q}$ with respect to the probability amplitudes $\psi(q;t)$. From the two expressions (2.11) and (2.12) we easily find that we must have

\[ \hat{p} = -i\hbar\frac{\partial}{\partial q}; \qquad \hat{q} = q \tag{2.13} \]

(just expand $Z(q,\delta q;t)$ in $\delta q$ and differentiate it as in (2.11) to find the result). However, with these expressions, the two operators no longer commute, for we have

\[ [\hat{p},\hat{q}] = -i\hbar, \tag{2.14} \]

which are Heisenberg’s dispersion relations. This last result implies, as everyone knows, that

\[ \Delta q\,\Delta p \geq \hbar/2. \tag{2.15} \]
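The bound (2.15) can be illustrated with the operators (2.13) acting on a concrete amplitude. The sketch below assumes a real Gaussian amplitude, for which the bound is known to be saturated, and units with ℏ = 1; the grid parameters and function names are illustrative choices, not part of the text.

```python
import math

HBAR = 1.0  # assumed units

def gaussian_amplitude(q, sigma):
    """Real, normalized Gaussian probability amplitude with position dispersion sigma."""
    norm = (2.0 * math.pi * sigma ** 2) ** -0.25
    return norm * math.exp(-q * q / (4.0 * sigma ** 2))

def dispersions(sigma, q_max=12.0, n=4001):
    """Return (Delta q, Delta p) computed with q-hat = q and p-hat = -i*hbar*d/dq."""
    step = 2.0 * q_max / (n - 1)
    grid = [-q_max + k * step for k in range(n)]
    psi = [gaussian_amplitude(q, sigma) for q in grid]
    mean_q = sum(q * a * a for q, a in zip(grid, psi)) * step
    mean_q2 = sum(q * q * a * a for q, a in zip(grid, psi)) * step
    # psi is real, so <p> = 0 and <p^2> = hbar^2 * integral of (d psi / dq)^2.
    dpsi = [(psi[k + 1] - psi[k - 1]) / (2.0 * step) for k in range(1, n - 1)]
    mean_p2 = HBAR ** 2 * sum(d * d for d in dpsi) * step
    return math.sqrt(mean_q2 - mean_q ** 2), math.sqrt(mean_p2)

if __name__ == "__main__":
    delta_q, delta_p = dispersions(0.8)
    print(delta_q, delta_p, delta_q * delta_p)
```

For any width `sigma` the product of the two dispersions should come out close to ℏ/2; non-Gaussian amplitudes would give a larger product.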
These last arguments show that the possibility of writing the characteristic function 𝑍(𝑞, 𝛿𝑞; 𝑡) as the product shown in (2.3) is not devoid of an important physical content, since
it would imply the transition from a dispersionless formalism (in two variables 𝑞, 𝛿𝑞) to a formalism with a minimum nonzero dispersion (in one variable 𝑞). Thus, quantum phenomena must be those obeying the Liouville equation [but integrated in momentum space, because of the use of the transformation (2.2)] in which the characteristic function may be written as in (2.3), that is, in which there are unavoidable dispersions in the kinematic variables. The full interpretation of this result is the objective of this book. The rest of the formal apparatus of Quantum Mechanics follows naturally from the Schrödinger equation and does not need to be developed here. The mathematical fact that the Schrödinger equation was derived from a classical equation (the Liouville equation) opens a wide avenue to an interpretation in complete disagreement with the Heisenberg-Bohr assumption that classical concepts should be abandoned. Moreover, the derivation shows that Born’s interpretation of 𝜓 is a direct consequence of the axioms (by inheritance of the interpreted symbols) and it is unambiguous that the theory refers to particles, since this is the content of the function 𝐹(𝑞, 𝑝; 𝑡). This also means at this point that the theory must also be considered as a theory for ensembles (but we will show in the next chapter and, mainly in chapter six, that this is not the whole story). However, since we are at the syntactical level, the epistemic blocking we have already mentioned in the previous chapters imposed by the “wave-like” experiments of interference and diffraction still persists and must be considered. Since these phenomena are completely explainable from the perspective of the Schrödinger equation, which is the natural result of the mathematical derivation from a particle-like equation, they must have a corpuscular interpretation. 
Despite this obvious conclusion, we must actually find this interpretation (physics is not a mere mathematical theory; to know only that a solution exists is nothing but a psychological relief to us). The derivation took us from a corpuscular equation (Liouville’s) to an undulatory one (Schrödinger’s), and the answer to the dissolution of the duality problem, which we showed to be a cornerstone of the assumptions of Bohr and Heisenberg, must lie somewhere among the steps of the mathematical derivation. We will present these corpuscular interpretations later on in this chapter, but a full understanding of this transition will be the subject of the whole book. Before passing to this explanation of the “wave-like” experiments, we will address one extremely important problem of Quantum Mechanics that has been treated throughout the years (by most physicists) as irrelevant. It is the problem of quantizing in generalized curvilinear orthogonal coordinates (throughout the book, quantization means the transition from the Liouville to the Schrödinger equation). We show in the next section that, as expected, to quantize a problem in generalized orthogonal curvilinear coordinates one simply has to write the axioms in the desired coordinate system and proceed as above with the mathematics. We hope that this application of the formalism, with its non-trivial algebraic aspects, will favorably impress the reader as to the soundness of the previous derivation.
2.2. QUANTIZATION IN SPHERICAL COORDINATES: AN EXAMPLE

It is almost a shame that to quantize a system (to write down its Schrödinger equation) according to some orthogonal coordinate system one has first to write down its Schrödinger equation in Cartesian coordinates and then change to the desired orthogonal system. This would imply the embarrassing conclusion that the whole formalism depends upon an arbitrary coordinate system, which is preposterous. There have been attempts in the literature to overcome these difficulties[27],[28],[29], but even these approaches are permeated with additional suppositions, as in [27] and [28], where the author has to postulate that the total quantum-mechanical momentum operator $p_{q_i}$ corresponding to the generalized coordinate $q_i$ is given by

\[ p_{q_i} = -i\hbar\frac{\partial}{\partial q_i} \tag{2.16} \]

and also has to write the classical Hamiltonian (the kinetic energy term) as

\[ H = \frac{1}{2m}\sum_{ik} p_{q_i}^{*}\, g^{ik}\, p_{q_k}. \tag{2.17} \]
These approaches seem rather unsatisfactory, for we would like to derive our results using only first principles, without having to add more postulates to the theory. This problem may appear because Quantum Mechanics, as developed in textbooks, is not a theory with a clearly discernible set of axioms[30]. Indeed, the rules (2.16) and (2.17) could be considered as part of the fuzzy set of axioms one could append to the principles of Quantum Mechanics. On the other hand, if we do have an axiomatic approach, then every formal aspect of the theory should be contained in the axioms (or else they are not a complete set of axioms). The problem of quantization in generalized curvilinear orthogonal coordinate systems must therefore also be contained in the axioms, in such a way that, having written the axioms in the desired system, one must find the Schrödinger equation in this same coordinate system. This is an imposition (and in fact quite a strong one) upon the set of axioms.

For the sake of clarity and simplicity, we will begin our exposition on quantization in generalized orthogonal coordinates with an example on spherical coordinate quantization. We then generalize our results in the next section to any orthogonal coordinate system. We will suppose a spherically symmetric potential, but this is irrelevant to the mathematical derivations. We begin by rewriting our two axioms in the appropriate coordinate system as:

Axiom 1 For an isolated system, the joint phase-space probability density function related to any Quantum Mechanical phenomenon obeys the Liouville equation

\[ \frac{dF(r,\theta,\phi,p_r,p_\theta,p_\phi;t)}{dt} = 0; \tag{2.18} \]

Axiom 2 The infinitesimal Fourier transformation, defined as

\[ Z(\vec{r},\delta\vec{r};t) = \int_{-\infty}^{+\infty} F(r,\theta,\phi,p_r,p_\theta,p_\phi;t)\, e^{i\vec{p}\cdot\delta\vec{r}/\hbar}\, d^{3}p, \tag{2.19} \]

can be applied to the description of any non-relativistic quantum system, and the function $Z(\vec{r},\delta\vec{r};t)$ can be written as

\[ Z(\vec{r},\delta\vec{r};t) = \psi^{*}\!\left(\vec{r}-\frac{\delta\vec{r}}{2};t\right)\psi\!\left(\vec{r}+\frac{\delta\vec{r}}{2};t\right). \tag{2.20} \]

$Z(\vec{r},\delta\vec{r};t)$ is then a characteristic function.

The classical Hamiltonian in spherical coordinates is given by

\[ H = \frac{1}{2m}\left( p_r^{2} + \frac{p_\theta^{2}}{r^{2}} + \frac{p_\phi^{2}}{r^{2}\sin^{2}\theta} \right) + V(r) \]

and the Liouville equation becomes

\[ \frac{\partial F}{\partial t} + \frac{p_r}{m}\frac{\partial F}{\partial r} + \frac{p_\theta}{mr^{2}}\frac{\partial F}{\partial\theta} + \frac{p_\phi}{mr^{2}\sin^{2}\theta}\frac{\partial F}{\partial\phi} - \left( \frac{\partial V}{\partial r} - \frac{p_\theta^{2}}{mr^{3}} - \frac{p_\phi^{2}}{mr^{3}\sin^{2}\theta} \right)\frac{\partial F}{\partial p_r} + \frac{p_\phi^{2}}{mr^{2}\sin^{2}\theta}\cot\theta\,\frac{\partial F}{\partial p_\theta} = 0. \tag{2.21} \]
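The coefficients of the momentum derivatives in (2.21) are the rates −∂H/∂r and −∂H/∂θ given by Hamilton's equations, and this can be checked numerically. The sketch below assumes m = 1 and an illustrative central potential V(r) = −1/r; neither choice comes from the text, and any smooth V(r) could be used instead.

```python
import math

M = 1.0  # mass (assumed units)

def potential(r):
    """Illustrative central potential (an assumption, not from the text)."""
    return -1.0 / r

def potential_derivative(r):
    return 1.0 / (r * r)

def hamiltonian(r, theta, p_r, p_theta, p_phi):
    """Classical Hamiltonian in spherical coordinates."""
    kinetic = (p_r ** 2 + p_theta ** 2 / r ** 2
               + p_phi ** 2 / (r ** 2 * math.sin(theta) ** 2)) / (2.0 * M)
    return kinetic + potential(r)

def rates_from_liouville(r, theta, p_r, p_theta, p_phi):
    """Coefficients of dF/dp_r and dF/dp_theta as written in (2.21)."""
    sin2 = math.sin(theta) ** 2
    rate_p_r = -(potential_derivative(r)
                 - p_theta ** 2 / (M * r ** 3)
                 - p_phi ** 2 / (M * r ** 3 * sin2))
    rate_p_theta = p_phi ** 2 / (M * r ** 2 * sin2) / math.tan(theta)
    return rate_p_r, rate_p_theta

def rates_from_hamiltonian(r, theta, p_r, p_theta, p_phi, h=1e-6):
    """-dH/dr and -dH/dtheta by central finite differences."""
    dH_dr = (hamiltonian(r + h, theta, p_r, p_theta, p_phi)
             - hamiltonian(r - h, theta, p_r, p_theta, p_phi)) / (2.0 * h)
    dH_dtheta = (hamiltonian(r, theta + h, p_r, p_theta, p_phi)
                 - hamiltonian(r, theta - h, p_r, p_theta, p_phi)) / (2.0 * h)
    return -dH_dr, -dH_dtheta

if __name__ == "__main__":
    state = (1.5, 0.9, 0.2, 0.7, -0.4)
    print(rates_from_liouville(*state))
    print(rates_from_hamiltonian(*state))
```

The two printed pairs should agree up to the finite-difference error, for any phase-space point away from the polar axis.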
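Now, the infinitesimal transformation in (2.19) can be easily constructed. Note that we have

\[ \delta\vec{r} = \delta r\,\hat{r} + r\,\delta\theta\,\hat{\theta} + r\sin\theta\,\delta\phi\,\hat{\phi}, \]

where $(\hat{r},\hat{\theta},\hat{\phi})$ are the unit normals, and

\[ \vec{p} = p_r\,\hat{r} + \frac{p_\theta}{r}\,\hat{\theta} + \frac{p_\phi}{r\sin\theta}\,\hat{\phi}, \]

giving

\[ \vec{p}\cdot\delta\vec{r} = p_r\,\delta r + p_\theta\,\delta\theta + p_\phi\,\delta\phi, \tag{2.22} \]

which is a general feature of what are called Mathieu’s transformations (see Ref. [31], pp. 201-204), which form a subset of the canonical transformations. Point transformations are a particular case of Mathieu’s transformations; we will return to this topic later on in this chapter. The relation between the momenta in Cartesian and spherical coordinates is given by

\[
\begin{aligned}
p_x &= p_r\sin\theta\cos\phi + \frac{p_\theta}{r}\cos\theta\cos\phi - \frac{p_\phi}{r}\frac{\sin\phi}{\sin\theta},\\
p_y &= p_r\sin\theta\sin\phi + \frac{p_\theta}{r}\cos\theta\sin\phi + \frac{p_\phi}{r}\frac{\cos\phi}{\sin\theta},\\
p_z &= p_r\cos\theta - \frac{p_\theta}{r}\sin\theta,
\end{aligned}
\]

and the Jacobian relating the two volume elements is given by

\[ dp_x\,dp_y\,dp_z = \|J\|_p\, dp_r\,dp_\theta\,dp_\phi \]

and thus

\[ \|J\|_p = \frac{1}{r^{2}\sin\theta} \tag{2.23} \]

(since the Jacobian of the coordinate transformation is $r^{2}\sin\theta$, the canonical invariance of the phase-space volume element means that the Jacobian of the momentum transformation must be given by the previous expression in spherical coordinates). With results (2.22) and (2.23) we may write

\[ Z(\vec{r},\delta\vec{r};t) = \int F(\vec{r},\vec{p};t)\exp\left[ \frac{i}{\hbar}\left( p_r\,\delta r + p_\theta\,\delta\theta + p_\phi\,\delta\phi \right) \right]\frac{dp_r\,dp_\theta\,dp_\phi}{r^{2}\sin\theta}. \]

We now impose this transformation upon the Liouville equation (2.21) to find³

\[
\begin{aligned}
&-\frac{\hbar^{2}}{m}\left[ \frac{1}{r^{2}}\frac{\partial}{\partial r}\left( r^{2}\frac{\partial Z}{\partial(\delta r)} \right) + \frac{1}{r^{2}\sin\theta}\frac{\partial}{\partial\theta}\left( \sin\theta\frac{\partial Z}{\partial(\delta\theta)} \right) + \frac{1}{r^{2}\sin^{2}\theta}\frac{\partial^{2} Z}{\partial\phi\,\partial(\delta\phi)} \right]\\
&\quad+\frac{\hbar^{2}}{m}\left[ \frac{\delta r}{r^{3}}\frac{\partial^{2} Z}{\partial(\delta\theta)^{2}} + \frac{\delta r}{r^{3}\sin^{2}\theta}\frac{\partial^{2} Z}{\partial(\delta\phi)^{2}} + \frac{\delta\theta\cot\theta}{r^{2}\sin^{2}\theta}\frac{\partial^{2} Z}{\partial(\delta\phi)^{2}} \right] + \delta r\,\frac{\partial V}{\partial r}\,Z = i\hbar\,\frac{\partial Z}{\partial t}.
\end{aligned}
\tag{2.24}
\]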
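The two ingredients of this transformation, the invariance (2.22) and the momentum Jacobian (2.23), can be checked numerically from the Cartesian components alone. The following is a minimal sketch; the function names and the sample values are illustrative assumptions, and the unit vectors are the usual ones of spherical coordinates.

```python
import math

def unit_vectors(theta, phi):
    """Cartesian components of the spherical unit vectors."""
    r_hat = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
    theta_hat = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    phi_hat = (-math.sin(phi), math.cos(phi), 0.0)
    return r_hat, theta_hat, phi_hat

def cartesian_momentum(r, theta, phi, p_r, p_theta, p_phi):
    """(p_x, p_y, p_z) from the canonical momenta (p_r, p_theta, p_phi)."""
    r_hat, theta_hat, phi_hat = unit_vectors(theta, phi)
    return tuple(p_r * a + (p_theta / r) * b + (p_phi / (r * math.sin(theta))) * c
                 for a, b, c in zip(r_hat, theta_hat, phi_hat))

def cartesian_displacement(r, theta, phi, d_r, d_theta, d_phi):
    """Cartesian components of delta r for small (d_r, d_theta, d_phi)."""
    r_hat, theta_hat, phi_hat = unit_vectors(theta, phi)
    return tuple(d_r * a + r * d_theta * b + r * math.sin(theta) * d_phi * c
                 for a, b, c in zip(r_hat, theta_hat, phi_hat))

def momentum_jacobian(r, theta, phi):
    """Determinant of d(p_x, p_y, p_z)/d(p_r, p_theta, p_phi); the map is linear in the momenta."""
    c1 = cartesian_momentum(r, theta, phi, 1.0, 0.0, 0.0)
    c2 = cartesian_momentum(r, theta, phi, 0.0, 1.0, 0.0)
    c3 = cartesian_momentum(r, theta, phi, 0.0, 0.0, 1.0)
    cross = (c2[1] * c3[2] - c2[2] * c3[1],
             c2[2] * c3[0] - c2[0] * c3[2],
             c2[0] * c3[1] - c2[1] * c3[0])
    return sum(a * b for a, b in zip(c1, cross))

if __name__ == "__main__":
    r, theta, phi = 1.7, 0.8, 2.1
    p = cartesian_momentum(r, theta, phi, 0.3, -0.5, 0.9)
    d = cartesian_displacement(r, theta, phi, 0.01, 0.02, -0.03)
    print(sum(a * b for a, b in zip(p, d)), momentum_jacobian(r, theta, phi))
```

The dot product should equal p_r δr + p_θ δθ + p_φ δφ for the chosen values, and the determinant should equal 1/(r² sin θ).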
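To proceed with the calculations we must write $Z(\vec{r},\delta\vec{r};t)$ in spherical coordinates. We know that it must be written in Cartesian coordinates as

\[
Z(\vec{r},\delta\vec{r};t) = \left\{ R^{2} + \frac{R}{4}\sum_{i,j=1}^{3}\delta x_i\,\delta x_j\,\frac{\partial^{2} R}{\partial x_i\,\partial x_j} - \frac{1}{4}\sum_{i,j=1}^{3}\delta x_i\,\delta x_j\,\frac{\partial R}{\partial x_i}\frac{\partial R}{\partial x_j} \right\}\exp\left[ \frac{i}{\hbar}\left( \delta x\frac{\partial S}{\partial x} + \delta y\frac{\partial S}{\partial y} + \delta z\frac{\partial S}{\partial z} \right) \right],
\]

where we used the fact that $\psi(\vec{r};t)$ can be written as in (2.5) and also the fact that

\[
\begin{aligned}
\partial_x &= \sin\theta\cos\phi\,\partial_r + \frac{1}{r}\cos\theta\cos\phi\,\partial_\theta - \frac{1}{r}\frac{\sin\phi}{\sin\theta}\,\partial_\phi,\\
\partial_y &= \sin\theta\sin\phi\,\partial_r + \frac{1}{r}\cos\theta\sin\phi\,\partial_\theta + \frac{1}{r}\frac{\cos\phi}{\sin\theta}\,\partial_\phi,\\
\partial_z &= \cos\theta\,\partial_r - \frac{1}{r}\sin\theta\,\partial_\theta,
\end{aligned}
\]

where $\partial_u$ is an abbreviation for $\partial/\partial u$. Thus, in spherical coordinates, the characteristic function becomes

\[
\begin{aligned}
Z(\vec{r},\delta\vec{r};t) = \Bigg\{ R^{2} &+ \frac{R}{4}\Bigg[ \delta r^{2}\frac{\partial^{2} R}{\partial r^{2}} + \delta\theta^{2}\left( \frac{\partial^{2} R}{\partial\theta^{2}} + r\frac{\partial R}{\partial r} \right) + \delta\phi^{2}\left( \frac{\partial^{2} R}{\partial\phi^{2}} + r\sin^{2}\theta\frac{\partial R}{\partial r} + \cos\theta\sin\theta\frac{\partial R}{\partial\theta} \right)\\
&\qquad + 2\,\delta r\,\delta\theta\left( \frac{\partial^{2} R}{\partial r\,\partial\theta} - \frac{1}{r}\frac{\partial R}{\partial\theta} \right) + 2\,\delta r\,\delta\phi\left( \frac{\partial^{2} R}{\partial r\,\partial\phi} - \frac{1}{r}\frac{\partial R}{\partial\phi} \right) + 2\,\delta\theta\,\delta\phi\left( \frac{\partial^{2} R}{\partial\theta\,\partial\phi} - \cot\theta\frac{\partial R}{\partial\phi} \right) \Bigg]\\
&- \frac{1}{4}\Bigg[ \delta r^{2}\left( \frac{\partial R}{\partial r} \right)^{2} + \delta\theta^{2}\left( \frac{\partial R}{\partial\theta} \right)^{2} + \delta\phi^{2}\left( \frac{\partial R}{\partial\phi} \right)^{2} + 2\,\delta r\,\delta\theta\,\frac{\partial R}{\partial r}\frac{\partial R}{\partial\theta} + 2\,\delta r\,\delta\phi\,\frac{\partial R}{\partial r}\frac{\partial R}{\partial\phi} + 2\,\delta\theta\,\delta\phi\,\frac{\partial R}{\partial\theta}\frac{\partial R}{\partial\phi} \Bigg] \Bigg\}\\
&\times\exp\left[ \frac{i}{\hbar}\left( \delta r\frac{\partial S}{\partial r} + \delta\theta\frac{\partial S}{\partial\theta} + \delta\phi\frac{\partial S}{\partial\phi} \right) \right].
\end{aligned}
\tag{2.25}
\]

³ The calculations leading to the results that follow are quite simple, involving only derivatives, sums, etc. However, they are so numerous that they call for the use of algebraic computation. The whole calculation was carried out with the algebraic computation program Maple®.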
Substituting this expression into (2.24) and collecting zeroth and first order terms in the infinitesimal elements, we find the following two equations

\[ \delta\vec{r}\cdot\frac{\partial}{\partial\vec{r}}\left[ \frac{\partial S}{\partial t} + \frac{1}{2m}(\nabla S)^{2} + V(r) - \frac{\hbar^{2}}{2mR}\nabla^{2}R \right] = 0 \tag{2.26} \]

and

\[ \frac{\partial R^{2}}{\partial t} + \nabla\cdot\left( \frac{R^{2}}{m}\nabla S \right) = 0, \]

all written in spherical coordinates (the gradient and the Laplacian differential operators). It is then possible to show that these last two equations are equivalent to the Schrödinger equation given by

\[ -\frac{\hbar^{2}}{2m}\nabla^{2}\psi(\vec{r},t) + V(r)\,\psi(\vec{r},t) = i\hbar\,\frac{\partial\psi(\vec{r},t)}{\partial t}, \]

also written in spherical coordinates. To see this, one needs only to write the last equation in spherical coordinates, write $\psi = R(\vec{r},t)\exp(iS(\vec{r},t)/\hbar)$, substitute this result in the previous equation and subtract the overall expression from the one coming from (2.26). This ends our derivation.

The important thing to note here is the non-trivial algebraic relations involved in the derivation. Equation (2.24) is already very complicated, and the substitution of the extremely complicated expression (2.25) turns the problem into a very long and intricate (although direct) algebraic problem (this is why we have used algebraic computation; this is an example of a problem in which it is very easy to make a foolish algebraic mistake). It would be an extravagance to believe that the success of the derivation in this case (and, as will be shown in the next section, for general orthogonal curvilinear coordinate systems) is simply a matter of coincidence. Our confidence in the derivation method and the axioms should increase with the success of this application. The reader must also recognize that the use of the so-called infinitesimal coordinates was essential to the derivation, a fact that becomes obvious from expression (2.22).
2.3. QUANTIZATION IN GENERALIZED COORDINATES
To study the general case we write the transformation rules as
$$x_k=x_k(u_1,u_2,u_3),\quad k=1,2,3, \tag{2.27}$$
in which $x_k$ are the coordinates written in some coordinate system and $u_i$ are the coordinates in the new system. The line element is then given by [32]
$$d\vec{r}=h_1\,du_1\,\hat{e}_1+h_2\,du_2\,\hat{e}_2+h_3\,du_3\,\hat{e}_3, \tag{2.28}$$
where the $\hat{e}_i$ are the unit normals in the new $u$-coordinate system and
$$h_i\hat{e}_i=\frac{\partial\vec{r}}{\partial u_i}.$$
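To make the definition of the scale factors concrete, the following sketch estimates $h_i=|\partial\vec{r}/\partial u_i|$ by central differences for spherical coordinates, where the expected values are 1, 𝑟 and 𝑟 sin𝜃. The function names, the step size and the sample point are hypothetical choices made for the illustration.

```python
import math


def spherical_to_cartesian(u):
    """Transformation x_k(u_1, u_2, u_3) with u = (r, theta, phi)."""
    r, theta, phi = u
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def scale_factors(transform, u, step=1e-6):
    """Estimate h_i = |d r / d u_i| for i = 1, 2, 3 by central differences."""
    factors = []
    for i in range(3):
        forward = list(u)
        backward = list(u)
        forward[i] += step
        backward[i] -= step
        x_fwd = transform(forward)
        x_bwd = transform(backward)
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_fwd, x_bwd)))
        factors.append(distance / (2.0 * step))
    return factors


u = (2.0, 0.7, 1.1)  # sample point (r, theta, phi), arbitrary
h = scale_factors(spherical_to_cartesian, u)
print("h_r, h_theta, h_phi =", h)
print("expected            =", [1.0, u[0], u[0] * math.sin(u[1])])
print("h_1 h_2 h_3         =", h[0] * h[1] * h[2])
```

The product of the three factors approximates 𝑟² sin𝜃, the familiar Jacobian of the spherical volume element.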
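The momenta are written as
$$\vec{p}=m\sum_{i=1}^{3}h_i\frac{du_i}{dt}\hat{e}_i=\sum_{i=1}^{3}\frac{p_i}{h_i}\hat{e}_i \tag{2.29}$$
such that, using (2.28),
$$\vec{p}\cdot\delta\vec{r}=\sum_{i=1}^{3}p_i\,\delta u_i.$$
The Hamiltonian is written as [33]
$$H=\frac{1}{2m}\sum_{i=1}^{3}\frac{p_i^2}{h_i^2}+V(\vec{u})$$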
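and the Liouville equation becomes
$$\frac{\partial F}{\partial t}+\sum_{i=1}^{3}\frac{p_i}{mh_i^2}\frac{\partial F}{\partial u_i}+\sum_{i=1}^{3}\left[\sum_{j=1}^{3}\left(\frac{p_j^2}{mh_j^3}\frac{\partial h_j}{\partial u_i}\right)-\frac{\partial V}{\partial u_i}\right]\frac{\partial F}{\partial p_i}=0. \tag{2.30}$$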
Since the coordinate transformation (2.27) is a special case of a canonical transformation, we must have the Jacobian of the momentum transformation given by
$$\|J\|_p=\frac{1}{h_1h_2h_3},$$
since the Jacobian of the coordinate transformation is given by
$$\|J\|_u=h_1h_2h_3,$$
and the infinitesimal volume element is a canonical invariant equal to one [34]. With all these previous results, the characteristic function becomes
$$Z(\vec{u},\delta\vec{u};t)=\int F(\vec{u},\vec{p};t)\exp\left(\frac{i}{\hbar}\vec{p}\cdot\delta\vec{u}\right)\frac{dp_1\,dp_2\,dp_3}{h_1h_2h_3} \tag{2.31}$$
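and it is straightforward to find the differential equation it satisfies as
$$-\frac{\hbar^2}{m}\left\{\frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial u_1}\left(\frac{h_2h_3}{h_1}\frac{\partial Z}{\partial(\delta u_1)}\right)+\frac{\partial}{\partial u_2}\left(\frac{h_1h_3}{h_2}\frac{\partial Z}{\partial(\delta u_2)}\right)+\frac{\partial}{\partial u_3}\left(\frac{h_1h_2}{h_3}\frac{\partial Z}{\partial(\delta u_3)}\right)\right]-\sum_{i,j=1}^{3}\frac{\delta u_i}{h_j^3}\frac{\partial h_j}{\partial u_i}\frac{\partial^2 Z}{\partial(\delta u_j)^2}\right\}+\sum_{i=1}^{3}\delta u_i\frac{\partial V}{\partial u_i}Z=i\hbar\frac{\partial Z}{\partial t}. \tag{2.32}$$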
We now write the characteristic function as
$$Z(\vec{u},\delta\vec{u};t)=\psi^*\left(\vec{u}-\frac{\delta\vec{u}}{2};t\right)\psi\left(\vec{u}+\frac{\delta\vec{u}}{2};t\right) \tag{2.33}$$
with the generally complex amplitudes written as
$$\psi(\vec{u};t)=R(\vec{u};t)\exp\left[\frac{i}{\hbar}S(\vec{u};t)\right] \tag{2.34}$$
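and expand expression (2.33) up to second order in $\delta\vec{u}$ to find
$$Z(\vec{u},\delta\vec{u};t)=\left\{R^2+\sum_{i,j=1}^{3}\delta u_i\,\delta u_j\,\frac{R}{4}\left(\frac{\partial^2R}{\partial u_i\partial u_j}-\sum_{k=1}^{3}\Gamma^k_{ij}\frac{\partial R}{\partial u_k}\right)-\frac{1}{4}\sum_{i,j=1}^{3}\delta u_i\,\delta u_j\frac{\partial R}{\partial u_i}\frac{\partial R}{\partial u_j}\right\}\exp\left(\frac{i}{\hbar}\sum_{i=1}^{3}\delta u_i\frac{\partial S}{\partial u_i}\right), \tag{2.35}$$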
where $\Gamma^k_{ij}$ is the Christoffel symbol [35]. The reader may verify that this last expression gives the correct result for spherical coordinates, for instance. If we insert expression (2.35) into equation (2.32) and collect the zeroth- and first-order terms in $\delta\vec{u}$ we find
$$\delta\vec{u}\cdot\frac{\partial}{\partial\vec{u}}\left[\frac{\partial S}{\partial t}+\frac{1}{2m}(\nabla S)^2+V(\vec{u})-\frac{\hbar^2}{2mR}\nabla^2R\right]=0$$
and
$$\frac{\partial R^2}{\partial t}+\nabla\cdot\left(\frac{R^2\nabla S}{m}\right)=0.$$
These last two equations are equivalent to the Schrödinger equation in the variables $\vec{u}$, when we substitute (2.34) into it and separate the real and imaginary parts.
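The second-order expansion (2.35) can be checked numerically in the simplest setting, one Cartesian dimension, where all Christoffel symbols vanish. The sketch below compares the exact product $\psi^*(q-\delta q/2)\,\psi(q+\delta q/2)$ with the truncated expression. The amplitude $R=e^{-q^2/2}$, the phase $S=\sin q$, the value ℏ = 1 and all function names are assumptions chosen for the test; they do not come from the text.

```python
import cmath
import math

HBAR = 1.0  # illustrative units (assumption)


def R(q):
    return math.exp(-0.5 * q * q)


def dR(q):
    return -q * R(q)


def d2R(q):
    return (q * q - 1.0) * R(q)


def S(q):
    return math.sin(q)


def dS(q):
    return math.cos(q)


def psi(q):
    """Amplitude written as in (2.34)."""
    return R(q) * cmath.exp(1j * S(q) / HBAR)


def Z_exact(q, dq):
    """Product form (2.33) evaluated without any expansion."""
    return psi(q - 0.5 * dq).conjugate() * psi(q + 0.5 * dq)


def Z_second_order(q, dq):
    """One-dimensional Cartesian version of (2.35)."""
    amplitude = R(q) ** 2 + dq**2 * (0.25 * R(q) * d2R(q) - 0.25 * dR(q) ** 2)
    return amplitude * cmath.exp(1j * dq * dS(q) / HBAR)


def truncation_error(q, dq):
    return abs(Z_exact(q, dq) - Z_second_order(q, dq))


for dq in (0.2, 0.1, 0.05):
    print(f"dq = {dq:5.2f}   |Z_exact - Z_2nd| = {truncation_error(0.3, dq):.3e}")
```

Halving 𝛿𝑞 should reduce the discrepancy by roughly a factor of eight, as expected when the first neglected term is of third order in 𝛿𝑞.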
2.4. GENERALIZATION TO MATHIEU’S CANONICAL TRANSFORMATIONS
The problem of the invariance of Quantum Mechanics with respect to canonical transformations is a field of research by itself [36]. In this section we would like to present only some very simple developments that are an immediate consequence of the previous mathematical results. The most general canonical transformation is defined by
$$\sum_i p_i\,\delta q_i=\sum_i P_i\,\delta Q_i+\delta S, \tag{2.36}$$
where $(p_i,q_i)$ and $(P_i,Q_i)$ are the old and new phase space variables, respectively, and $\delta S$ is a total differential of a certain function $S$ which is given as a function of $q_i$ and $Q_i$; the function $S$ is called the generating function of the canonical transformation. Since we have $S=S(q_1,\ldots,q_n;Q_1,\ldots,Q_n)$, we may write
$$\delta S=\sum_{i=1}^{n}\left(\frac{\partial S}{\partial q_i}\delta q_i+\frac{\partial S}{\partial Q_i}\delta Q_i\right)$$
and substitution in (2.36), assuming that $\delta q_i$ and $\delta Q_i$ are free variations, implies that
$$p_i=\frac{\partial S}{\partial q_i};\qquad P_i=-\frac{\partial S}{\partial Q_i}.$$
We may find a subgroup of the group of general canonical transformations by imposing that we must have
$$\sum_i p_i\,\delta q_i=\sum_i P_i\,\delta Q_i. \tag{2.37}$$
The previous equation is a restriction known by the name of Mathieu’s or Lie’s canonical transformations (see [31], p. 201). Note that in this case 𝑞𝑖 and 𝑄𝑖 cannot be completely independent of each other, since this would imply that all 𝑝𝑖 and 𝑃𝑖 vanish. Thus, there must be at least one functional relation 𝑓(𝑞1 , … , 𝑞𝑛 ; 𝑄1 , … , 𝑄𝑛 ) = 0 between 𝑞𝑖 and 𝑄𝑖 alone, without involving 𝑝𝑖 and 𝑃𝑖 . In fact, Mathieu’s canonical transformations are usually classified with respect to the number 𝑚 of independent relations which exist between the spatial coordinates alone (the smallest number being 1 and the largest being 𝑛). Note that point transformations (change of coordinates) are a special case of Mathieu’s canonical transformations, when 𝑚 = 𝑛. Thus, our results on quantization in generalized orthogonal coordinates are simply a particular case of these transformations.
Being canonical, Mathieu’s transformations must leave Liouville’s equation invariant, and we must have
$$\frac{dF(Q,P;t)}{dt}=0,$$
and the probability distribution density is now written with respect to the new phase space variables $(P,Q)$. The infinitesimal transformation, defined in (2.2) as
$$Z(q,\delta q;t)=\int_{-\infty}^{+\infty}F(q,p;t)\,e^{ip\,\delta q/\hbar}\,dp,$$
can be rewritten as
$$Z(Q,\delta Q;t)=\int_{-\infty}^{+\infty}F(Q,P;t)\,e^{iP\,\delta Q/\hbar}\,J\!\left(\frac{p}{P}\right)dP,$$
where 𝐽(𝑝/𝑃) is the Jacobian of the transformation 𝑝 → 𝑃 [in the particular case of the point transformations, we have 𝐽(𝑝/𝑃) = 1/𝐽(𝑞/𝑄)]. Given all the previous results, these last equalities show that Quantum Mechanics must be invariant with respect to Mathieu’s canonical transformations.
2.5. CONNECTION TO BOHR-SOMMERFELD RULES
As we have said before, the explanation of the “wavelike” experiments in terms of corpuscles is imperative to the present approach. Since the present formalism refers indisputably only to particles, every undulatory phenomenon must be explainable by the assumption of a corpuscular ontology. Note that there is nothing strange in such a strategy: this is actually what happens with the whole theory of sound, in which a set of particles behaves, as a set, in an undulatory way. This is just an example of a corpuscular ontology giving rise to an explanation of undulatory behavior. Unfortunately, quantum phenomena cannot be explained in the same terms as sound theory. Indeed, in the case of a sound wave, it is necessary that the particles be organized as a set of 𝑁 particles, and the undulatory description applies to this set. However, in many quantum phenomena, as with the interference of particles in a double slit interferometer, we may send the particles (supposing that this is what they are) one at a time and still have, at the end, after tallying the positions at which they reach some photographic plate, an interference pattern. Thus, although it is still possible to talk about an ensemble of single-particle systems, this is quite different from having one many-particle system (even if the particles in the many-particle system are not interacting with each other; in the case of sound, it is the collective behavior of particles that is being described in wavelike terms). However, to find the explanation of these so-called undulatory experiments in terms of corpuscles, we will have to develop our formalism somewhat further, so that we can prove, rather than speculate, that corpuscles behaving in such and such ways may produce
undulatory patterns, even when only one single system is considered. This will be done in full in chapter six. For the time being, the main formal tools here are the Bohr-Sommerfeld rules, which we now derive mathematically from the previous formalism. We have the definition of the characteristic function as given by (2.2) and also the imposition that it must be written as a product of the type shown in (2.3). Since the characteristic function is a Fourier transform of the probability density defined upon phase space, if it is a product, then the probability density must be a convolution. It seems curious, at first sight, that in (2.3) the characteristic function may be interpreted as the probability density in configuration space displaced by an amount ±𝛿𝑞/2, and not by 𝛿𝑞 as one might think. Indeed, if we write the characteristic function as in (2.2) and interpret 𝑝 formally as the operator 𝑝̂ = −𝑖ℏ ∂/∂𝑞, then we would end with the formal identification
$$Z(q,\delta q;t)=\int e^{\delta q\,\partial/\partial q}F(q,p;t)\,dp,$$
which would mean a displacement of 𝛿𝑞. Such an approach would be, however, misleading. As was already shown, the function 𝑝 is taken into the above-cited operator only when it is acting upon the probability amplitudes and not when it is acting upon the density function, as in the last expression. To see this we may represent 𝐹(𝑞, 𝑝; 𝑡) in terms of phase-space probability amplitudes ϕ(𝑞, 𝑝; 𝑡), imposing that it can be written as
$$F(q,p;t)=\int_{-\infty}^{+\infty}\phi^*(q,2p-p';t)\,\phi(q,p';t)\,dp',$$
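which is the convolution cited above. In this case it is easy to show that the integration in (2.2) leads to
$$Z(q,\delta q;t)=\mathfrak{F}\{\phi^*(q,p;t)\}\,\mathfrak{F}\{\phi(q,p;t)\},$$
so that $\mathfrak{F}\{\phi\}$ represents the Fourier transformation of $\phi$ with respect to $p$. Now writing
$$\psi\left(q+\frac{\delta q}{2};t\right)=\mathfrak{F}\{\phi(q,p;t)\}=\int\exp\left(\frac{i}{2\hbar}p\,\delta q\right)\phi(q,p;t)\,dp, \tag{2.38}$$
such that
$$\psi^*\left(q-\frac{\delta q}{2};t\right)=\mathfrak{F}\{\phi^*(q,p;t)\}=\int\exp\left(\frac{i}{2\hbar}p\,\delta q\right)\phi^*(q,p;t)\,dp, \tag{2.39}$$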
we reach the expression in (2.3). Thus, the constraint (2.3) is mathematically equivalent to assuming the previous form for 𝐹(𝑞, 𝑝; 𝑡), and thus, the mathematical form for 𝜓(𝑞; 𝑡). Note that now the identification of 𝑝 with the above-mentioned differential operator is appropriate and the factor 2 in the denominator is the one expected, since now, formally,
$$\psi(q;t)=\int\phi(q,p;t)\,dp,\qquad\psi\left(q+\frac{\delta q}{2};t\right)=\int\exp\left(\frac{\delta q}{2}\frac{\partial}{\partial q}\right)\phi(q,p;t)\,dp=\exp\left(\frac{\delta q}{2}\frac{\partial}{\partial q}\right)\psi(q;t). \tag{2.40}$$
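The operator identity in (2.40) says that exp((𝛿𝑞/2) ∂/∂𝑞) shifts the argument of 𝜓 by 𝛿𝑞/2. A minimal sketch of this, assuming the real test function 𝜓(𝑞) = sin 𝑞 (an arbitrary choice whose derivatives are known exactly) and hypothetical function names, sums the power series of the exponential operator term by term:

```python
import math


def kth_derivative_of_sin(q, k):
    """k-th derivative of sin(q); the derivatives repeat with period 4."""
    return (math.sin(q), math.cos(q), -math.sin(q), -math.cos(q))[k % 4]


def shifted_by_series(q, dq, n_terms=20):
    """Apply exp((dq/2) d/dq) to sin(q) through its truncated power series."""
    a = 0.5 * dq
    return sum(a**k / math.factorial(k) * kth_derivative_of_sin(q, k)
               for k in range(n_terms))


q, dq = 0.4, 0.6
print("series    :", shifted_by_series(q, dq))
print("sin(q+dq/2):", math.sin(q + 0.5 * dq))
```

The two printed numbers should agree to machine precision, since the series is just the Taylor expansion of the test function around 𝑞.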
From results (2.38) and (2.39) it is very easy to derive the Bohr-Sommerfeld rules. Consider that we are interested in translating the amplitude 𝜓(𝑞; 𝑡) in configuration space from the point 𝑞 to the point 𝑞 + Δ𝑞 by infinitesimal transformations. In expression (2.38) we can see that the kernel of the infinitesimal transformation is given by
$$K_{p(q)}(q+\delta q,q)=\exp\left(\frac{i}{\hbar}p(q)\,\delta q\right), \tag{2.41}$$
and we write explicitly the dependence of $p(q)$ on the variable $q$ to make it clear that we are on a trajectory of the system. The finite transformation $\psi(q;t)\to\psi(q+\Delta q;t)$ would involve the kernel
$$\psi(q+\Delta q;t)=\int K_p(q+\Delta q,q)\,\phi(q,p;t)\,dp,$$
such that (the arguments here are quite similar to those of Feynman in his path integral approach [35])
$$K_p(q+\Delta q,q)=\lim_{N\to\infty}\prod_{n=1}^{N}K_{p(q+(n-1)\delta q)}\big(q+n\,\delta q,\;q+(n-1)\,\delta q\big),$$
where we put $N\,\delta q=\Delta q$ and take the limit $N\to\infty$, since $\Delta q$ is a finite interval and $\delta q$ is infinitesimal. Using (2.41) we find that this last expression can be written as
$$K_p(q+\Delta q,q)=\exp\left(\frac{i}{\hbar}\lim_{N\to\infty}\sum_{n=0}^{N-1}p(q+n\,\delta q)\,\delta q\right).$$
The sum in the exponent is clearly an integral taken along the trajectory of the particle and we end up with
$$K_p(q+\Delta q,q)=\exp\left(\frac{i}{\hbar}\int_q^{q+\Delta q}p(q)\,dq\right).$$
If $\Delta q$ corresponds to a symmetry of the problem ($q+\Delta q$ can be equivalent to $q$ for rotations by $2\pi$, for instance) we must impose that
$$\psi(q+\Delta q;t)=\pm\psi(q;t), \tag{2.42}$$
where the ± sign comes from the fact that 𝜓(𝑞; 𝑡) is an amplitude, and the physically important quantity is the density, which allows both signs. Since we can now write
$$\psi(q+\Delta q;t)=\int\exp\left(\frac{i}{\hbar}\int_q^{q+\Delta q}p(q)\,dq\right)\phi(q,p;t)\,dp,$$
we obey (2.42) if we put
$$\exp\left(\frac{i}{\hbar}\int_q^{q+\Delta q}p(q)\,dq\right)=\pm1,$$
because of (2.40). This last expression immediately implies that
$$\int_q^{q+\Delta q}p(q)\,dq=\begin{cases}2n\pi\hbar=nh&\text{if }K_p=+1\\[4pt]2n\pi\hbar+\pi\hbar=\left(n+\frac{1}{2}\right)h&\text{if }K_p=-1\end{cases}, \tag{2.43}$$
which is the expression for the Bohr-Sommerfeld rules, with the difference that by derivation we also find the possibility of half-integral numbers. These were never predicted in the historical development of the approach, an omission usually considered a flaw of the theory, since there are a number of situations in which half-integral quantum numbers are necessary (see [38], p. 48). It is also obvious that this theory can address neither results related to the probability amplitude, such as those related to intensities, nor problems without symmetries. However, given the derivation process, relations (2.43) cannot be considered a “mere approximation” for systems showing some kind of symmetry, although they may be taken as a first approximation (semiclassical is the usual word) for systems in which there is no available symmetry. As with the Feynman approach (that we will soon study), the integrals (2.43) give the most or least probable trajectories of the system’s particles, and thus they also furnish the points at which one should expect maxima or minima for the probability density. This last formal result is the key to understanding, using only a corpuscular model and the phenomenon of quantization, the so-called “wavelike” experiments. We address these experiments in the next subsections.
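The two steps of the derivation, the composition of infinitesimal kernels (2.41) into a finite one and the sign condition leading to (2.43), can be illustrated numerically. The sketch below is only a toy check under stated assumptions: ℏ = 1, a made-up momentum profile 𝑝(𝑞) = 1 + 𝑞 for the first part, and a plane rotor with constant momentum 𝐿 and Δ𝑞 = 2𝜋 for the second. All names are hypothetical.

```python
import cmath
import math

HBAR = 1.0                   # illustrative units (assumption)
H = 2.0 * math.pi * HBAR     # Planck constant h in the same units


def finite_kernel(p, q0, delta_q, n_steps):
    """Product of n_steps infinitesimal kernels exp(i p(q) dq / hbar)."""
    dq = delta_q / n_steps
    kernel = 1.0 + 0.0j
    for n in range(n_steps):
        kernel *= cmath.exp(1j * p(q0 + n * dq) * dq / HBAR)
    return kernel


def action(p, q0, delta_q, n_steps=100_000):
    """Midpoint-rule estimate of the integral of p(q) dq on [q0, q0 + delta_q]."""
    dq = delta_q / n_steps
    return sum(p(q0 + (n + 0.5) * dq) for n in range(n_steps)) * dq


def p_linear(q):
    """Hypothetical momentum profile along the trajectory."""
    return 1.0 + q


# 1) The product of infinitesimal kernels approaches exp(i * action / hbar).
K_product = finite_kernel(p_linear, 0.0, 1.0, 10_000)
K_integral = cmath.exp(1j * action(p_linear, 0.0, 1.0) / HBAR)
print("kernel product :", K_product)
print("exp(i action/h):", K_integral)


# 2) Rotor with constant momentum L: the kernel over a full turn is +1 or -1
#    exactly when the action is n*h or (n + 1/2)*h, as in (2.43).
def rotor_kernel(L):
    return cmath.exp(1j * action(lambda q: L, 0.0, 2.0 * math.pi, 1000) / HBAR)


for L in (2.0 * HBAR, 2.5 * HBAR):
    turns = action(lambda q: L, 0.0, 2.0 * math.pi, 1000) / H
    print(f"L = {L:.1f}  action/h = {turns:.3f}  kernel = {rotor_kernel(L):.6f}")
```

For the rotor, 𝐿 = 2ℏ gives a kernel of +1 (integer case) and 𝐿 = 2.5ℏ gives −1 (half-integer case); intermediate values of 𝐿 give a complex phase and therefore do not satisfy (2.42).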
2.5.1. Corpuscular Interpretation of Bragg-Laue Diffraction
A Bragg-Laue diffraction pattern is the outcome of an experiment in which an incident monoenergetic beam of particles collides with a periodic structure like a crystal. In the Davisson-Germer experiment (1927) a parallel monoenergetic beam of low-energy electrons was produced by a source and accelerated through a potential 𝑉. This beam was directed normally to the surface of a Ni crystal. Most of the electrons were then scattered backwards by the surface of the crystal, while some electrons were scattered by an angle 𝜃 and captured with a detector. A small retarding potential 𝑉𝑟, slightly weaker than 𝑉, allowed only the electrons scattered by the crystal with low energy loss to arrive at the detector 𝑃 and produce a current indicated in the apparatus 𝑀. Low-energy electrons lose their energy quickly when entering a solid material; thus, the detected electrons must be essentially those scattered by the surface of the crystal. For a Ni crystal, the periodicity of the lattice is given by the parameter 𝑑 = 2.15 × 10⁻⁸ cm. The crystal is known to act like a diffraction grating for incident X-rays, for instance, and one observes for these rays a diffraction pattern. When, instead of X-rays, which are assumed to be waves as a result of the overwhelming developments of the XIXth century, we use incident electrons, the majority of the electrons are scattered backwards, which would be considered a result compatible with their assumed corpuscular nature. However, there are also electrons scattered by other angles (for a potential 𝑉 = 54.0 V and a Ni crystal one may find a second peak at 𝜃 = 50°). This result shows that the electrons behave exactly like X-rays or other electromagnetic radiation interacting with diffraction gratings. If we assume that there is no corpuscular explanation for the diffraction pattern generated by the incident electromagnetic radiation (X-rays), then a corpuscular explanation would also be lacking for the incident electrons and we must conclude that either these particles are dual entities (ontological view) or, for some unknown (ontological) reason, each one of them behaves in a dual manner. It is important to stress that the incident beam of electrons may be produced with extremely low intensity, so that the electrons are practically sent one at a time. It is also important to stress that the diffraction pattern is the outcome of the experiment made for a great number of electrons, since the outcome of the experiment for an individual electron is always a point in the detector.
On the other hand, if one could find a corpuscular explanation for this phenomenon, then this corpuscular explanation could be extended to electromagnetic radiation as well. We are again at the heart of the XIXth-century dispute about the nature of radiation and matter, as described in chapter one. In the XIXth century the notion of discrete momentum transfer was not available, since this is a result of the quantum theory itself. The developments of the previous section, however, point in this direction and we will show that it is this notion that allows one to give a corpuscular and extremely simple (visualizable) explanation of Bragg-Laue diffraction for particles4.
4 Note that we are using the term “particles” at the same epistemological level as the word “waves”. These terms do not refer to the nature of the physical entity (they carry no ontological commitments), but only to our preconceptions inherited from the developments of the XIXth century about radiation and matter. This contrasts with our use in this work of the words “corpuscular” and “undulatory” (natures). Thus, both particles and waves (whatever we had previously thought they were) could have both “corpuscular” and “undulatory” natures, this being the central thesis of Ontological Dualism. Or the words may refer to different levels of description: “corpuscular” referring to the nature of the physical entities (electrons, photons, etc.), while “undulatory” refers to behavior as an ensemble of particles or of each particle considered during a time interval (as we will see later on), which is the central thesis of this book.
In the previous section we saw that for a physical system showing some sort of symmetry expression (2.43) must follow. This expression may be interpreted as furnishing the amount of momentum exchange between the incident particles (for instance) and the bulk matter (the crystal, in this case). Thus, the important result here is that Quantum Mechanics predicts that the crystal, given its symmetry as a crystal structure, is capable of absorbing or emitting only discrete amounts of energy (or exchanging discrete amounts of momentum). Since the momentum transfer between the two systems is quantized, so must be the angle by which
the electron is emitted. This problem, in fact, is simply a version of the usual scattering problem in which the outgoing particle must follow one of the allowed channels (which are discrete). The discrete character of the angles, a consequence of the discrete character of the momentum exchange, is sufficient to predict that the individual particles (the electrons) hit the detector only at discrete positions, giving the diffraction pattern usually observed. This can be easily shown mathematically [23]. For the arrangement shown in Figure 2-1, expression (2.43) becomes
$$\int p_z\,dz=\int_0^d p_z\,dz=n_zh,$$
where $d$ is the lattice constant, implying that $p_z=n_zh/d$ are the allowed values of the momenta within the lattice. Thus, any exchange of momentum between the crystal and the incoming particle would be of the form
$$\Delta p_z=\frac{(\Delta n_z)h}{d}=\frac{nh}{d},$$
and the balance of momentum would imply that the exchange of momentum between the incident particle and the crystal is given by $\Delta p_z=2p\sin\theta$. Now, for the sake of comparison with the traditional “wavelike” expression, if we write
$$p=\frac{h}{\lambda},$$
we find that
$$n\lambda=2d\sin\theta, \tag{2.44}$$
which is the equation for the maximum intensity positions of the Bragg-Laue diffraction of particles by periodic structures. Result (2.44), obtained by assuming only a corpuscular nature of the incident particles and a quantized transfer of momentum, allows us to say that the diffraction pattern one finds when a flux of particles strikes the surface of a crystal is exactly the same as the one obtained from the undulatory approach assuming waves. Of course, this approach does not include any information about intensities, which must be obtained within the realm of the Schrödinger equation.
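The momentum balance behind (2.44) can be evaluated directly, without introducing 𝜆 at all. The sketch below uses the numbers quoted in this section (electrons accelerated through 54 V, 𝑑 = 2.15 × 10⁻⁸ cm) together with standard values of the physical constants; the non-relativistic momentum formula, the function names and the reading of 𝜃 as the angle entering 2𝑝 sin𝜃 are assumptions of this illustration.

```python
import math

PLANCK_H = 6.62607015e-34            # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def momentum_from_voltage(volts):
    """Non-relativistic momentum of an electron accelerated through `volts`."""
    return math.sqrt(2.0 * ELECTRON_MASS * ELEMENTARY_CHARGE * volts)


def allowed_angles(p, d):
    """Angles in degrees for which 2 p sin(theta) = n h / d, with n = 1, 2, ..."""
    angles = []
    n = 1
    while n * PLANCK_H / (2.0 * p * d) <= 1.0:
        angles.append(math.degrees(math.asin(n * PLANCK_H / (2.0 * p * d))))
        n += 1
    return angles


p = momentum_from_voltage(54.0)     # kg m / s
lam = PLANCK_H / p                  # the comparison parameter lambda = h / p
angles = allowed_angles(p, 2.15e-10)

print(f"p      = {p:.4e} kg m/s")
print(f"lambda = {lam:.4e} m")
for n, theta in enumerate(angles, start=1):
    print(f"n = {n}: theta = {theta:.1f} degrees")
```

With these inputs only two orders are allowed, and the second one falls near the 50° peak mentioned above. Whether that is the intended identification of angles depends on how 𝜃 is defined in Figure 2-1, which this sketch does not settle.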
Figure 2.1. An experimental setup similar to the Davisson-Germer arrangement to measure the phenomenon of particle diffraction. Most of the particles are reflected with the angle of incidence, but there are some that are reflected with different angles.
This gives us a first assessment of the possibility of avoiding the duality notion (we will return to more complicated situations in future chapters): while each particle is a corpuscle, the ensemble of particles behaves like a wave. Again, the use of the parameter 𝜆 is unessential and was introduced for the sake of comparison with undulatory approaches; it becomes necessarily associated with a wave only if we are assuming the undulatory approach from the beginning. Indeed, no one is trying to leave aside the undulatory formalism; we are only trying to fix its level of reference: individual object’s nature, or ensemble’s behavior5.
5 This argument, however, calls for qualification: the referents of the symbols of some physical theory are internal to this theory. It would be misleading to say that 𝜆 stands, in the present corpuscular approach, for a wavelength, for the very reason that there is no such construct available. In the present corpuscular framework, 𝜆 stands for the values of momenta that still give the necessary time for the corpuscles to interact with the slit or crystal in such a way that the structured diffraction pattern can arise (a smaller 𝜆 would mean greater momentum and less time for interaction with the slit, and thus the absence of the diffraction pattern). Within the undulatory approach 𝜆 stands for the minimal wavelength of waves that allows diffraction patterns. They fulfill analogous roles but have different meanings within their own theoretical frameworks. Thus, one cannot say that the expression 𝑝 = ℎ/𝜆 assumes duality, such that the left-hand side would be meaningful only within some corpuscular theory, while the right-hand side would be meaningful only within some undulatory theory. The fact is that both sides can be made meaningful within their own semantic frameworks, undulatory or corpuscular. The obvious analogy, however, calls for clarification, and we will produce one such in chapter six.
It is clear from this approach that the diffraction pattern is giving us information about the interaction of the electron with the crystal, and not information about the nature of the electron. This is why, after all, these experiments of Bragg-Laue diffraction are used to determine the crystalline structure of the crystal, given the momentum of the incoming particles. Note that in the present interpretation it is unnecessary to talk about observers, waves, ontological duality, complementarity, reduction of the wavepacket, etc. Other peculiarities of the diffraction phenomenon can also be explained within the corpuscular approach. For instance, if the incident particles have too high velocities they interact with the crystal surface for only a smaller fraction of time, if we keep the lattice
parameter unaltered. If we want the crystal surface to interact appropriately with an incoming beam having a higher velocity, we have to make the lattice parameter smaller to compensate for this increase in the velocity. This is the relation between the incoming velocities of the particles (their “wavelength”) and the lattice parameter, which is the condition to have diffraction patterns. Note that for situations in which we do not have strict crystalline structure (defects, for instance), the Bohr-Sommerfeld rule still applies, but only as a first approximation, as in the WKB method. Of course, another possibility for an ontological interpretation would be to assume that the incoming electron is both particle and wave and that, when it hits the surface of the crystal, it interferes with itself (as an extended entity), part of it going backwards while other parts go into quantized angles in the direction of the detector. These reflected parts of the single electron then hit the detector as an extended entity and, when one observes, this wavepacket is reduced to the image of a particle (a space-time point in the detector). In this last case one would have to assume an unobservable entity (this wave cannot be directly observed by definition), the existence of an observer (conscious, in some approaches), the underlying principle of wavepacket reduction, which is another principle of some interpretations of Quantum Mechanics, etc. The conclusion about which interpretation is the clearest, most economical and most visualizable seems to us unavoidable. This, however, does not end the dispute, for the corpuscular approach must also explain other “wavelike” phenomena, such as double slit interference and diffraction by an aperture (hopefully in similarly easy ways). This is done in the next two subsections.
In any case, we stress again that if we accomplish this for these “unequivocally wavelike” phenomena involving particles, the same conclusions are inherited by the similar phenomena involving electromagnetic radiation. Given that all other quantum phenomena (blackbody radiation, photoelectric effect, Compton effect, etc.) are explainable by corpuscular models, part of the epistemic blocking to a corpuscular ontology mentioned before is removed. Given also that some of these other quantum phenomena are not explainable by wavelike behavior, this tips the balance in favor of the corpuscular monist interpretation. A different problem is to explain, with a corpuscular interpretation, phenomena in which one always deals with the same physical system (e.g. only one hydrogen atom). In this case, the notion of ensemble does not seem to apply, but the phenomenon is still governed by a wavelike equation; thus, the corpuscular interpretation may fail (this is the other part of the epistemic blocking). We will handle these cases in future chapters, for they demand further mathematical developments.
Another Approach to the Same Problem
We can analyze the same problem from a different perspective that gives even more credibility to the corpuscular interpretation we are advancing. In a crystal described as a lattice of points of equal symmetry, the requirement for the appearance of intensity maxima in a diffraction experiment means that in momentum or reciprocal space the values of momentum transfer where these maxima occur also form a lattice (the reciprocal lattice). For example, the reciprocal lattice of a simple cubic real-space lattice is also a simple cubic structure. There is a construct that is very useful to determine which lattice planes (represented by the grid points on the reciprocal lattice) will result in a diffracted signal for a given momentum of incident radiation. It is the construct of the Ewald sphere, which is defined in the following way: the incident corpuscles falling on the crystal have wave vector $\vec{k}_i$ whose length is $p_i/h$. The diffracted plane wave has a wave vector $\vec{k}_f$. If no energy is gained or lost in the diffraction process (it is elastic) then $\vec{k}_f$ has the same length as $\vec{k}_i$. The difference between the wave vectors of diffracted and incident waves is defined as the scattering vector $\Delta\vec{K}=\vec{k}_f-\vec{k}_i$. Since $\vec{k}_i$ and $\vec{k}_f$ have the same length, the scattering vector must lie on the surface of a sphere of radius $p_i/h$. This sphere is called the Ewald sphere. The reciprocal lattice points are the values of momentum transfer where the Bragg-Laue diffraction condition is satisfied, and for diffraction to occur the scattering vector must be equal to a reciprocal lattice vector. Geometrically this means that if the origin of reciprocal space is placed at the tip of $\vec{k}_i$ then diffraction will occur only for reciprocal lattice points that lie on the surface of the Ewald sphere (see Figure 2-2). From this construction, it is most clear that the process of Bragg-Laue diffraction can be interpreted in corpuscular terms. Note that quantization, in this case, comes from the fact that the Ewald sphere will always select planes of the crystal that do present spatial symmetry.
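The Ewald construction amounts to a simple selection rule: a reciprocal lattice vector contributes only if adding it to the incident wave vector leaves the length unchanged. The sketch below applies this rule to a two-dimensional square lattice. The lattice constant, the incident vector, the search range and the function name are hypothetical; reciprocal vectors are written as (𝑚, 𝑛)/𝑎 without a factor 2𝜋, matching the convention |𝑘| = 𝑝/ℎ used in the text.

```python
import math


def ewald_selected_points(k_i, a=1.0, max_index=3, tol=1e-9):
    """Indices (m, n) of a 2D square reciprocal lattice lying on the Ewald circle.

    A point is selected when |k_i + G| = |k_i| with G = (m, n) / a, that is,
    when the momentum transfer h * G leaves the energy unchanged (elastic case).
    """
    k_len = math.hypot(*k_i)
    selected = []
    for m in range(-max_index, max_index + 1):
        for n in range(-max_index, max_index + 1):
            if (m, n) == (0, 0):
                continue  # zero momentum transfer: the undeflected beam
            k_f = (k_i[0] + m / a, k_i[1] + n / a)
            if abs(math.hypot(*k_f) - k_len) < tol:
                selected.append((m, n))
    return selected


k_incident = (-1.0, -1.0)   # chosen so that a few lattice points are selected
points = ewald_selected_points(k_incident)
for m, n in points:
    k_f = (k_incident[0] + m, k_incident[1] + n)
    print(f"G = ({m}, {n})  ->  k_f = {k_f}")
```

Only a handful of discrete momentum transfers survive the selection, which is the corpuscular reading of the diffraction condition described above.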
2.5.2. Corpuscular Interpretation of Diffraction by an Aperture
The phenomenon of diffraction by an aperture is revealed when we direct a beam of particles (or a wave; see the previous footnote) onto a screen in which an aperture of size 𝑎 is made. The particles pass through the aperture and hit a detector that indicates the angle at which each particle came out of the aperture. Repeating the same experiment using one particle at a time, one ends up with a diffraction pattern exactly equal to those found for electromagnetic radiation (see Figure 2-3).
Figure 2.2. The Ewald sphere (circle for two-dimensions) allowing us to determine what planes in the crystal structure will be responsible for the resulting Bragg-Laue diffraction pattern.
Figure 2.3. One-slit diffraction experiment. A source sends particles one at a time perpendicularly to a screen with an aperture of size a at its center. The particles pass through the slit and collide with a detector. The discrete momentum transfer between the slit and the particles is responsible for the quantized result.
The difference between this problem and the one of the previous section is that now one does not have in general any periodic structure from which to introduce symmetry relations. The material of the screen on which the aperture is made can be of any type of bulk matter, crystalline or not. Thus, if we want to explain this effect with the same constructs of the previous subsection, we must find another type of symmetry for this problem (Landé [24] failed in explaining this phenomenon precisely because he couldn’t find this general symmetry, and he had to assume that the screen containing the slit was of a crystalline nature). It is quite easy to see which is the symmetry of the problem: the system has spatial symmetry in the sense that we must have, in one-dimensional arrangements (see Figure 2-3),
$$\psi\left(+\frac{a}{2}\right)=\pm\psi\left(-\frac{a}{2}\right)$$
(or, for circular slits, symmetry around the line of incidence), meaning that the probability density must be the same at both points. Thus, the results of the previous subsection let us write
$$\int_{-a/2}^{+a/2}p_z\,dz=nh,$$
where all the symbols have the same meaning as in the last subsection. This integral means that after making a slit on some screen (crystalline or not), we induce upon it momentum transfers given by the above expression. The calculations proceed exactly as in the previous subsection and we find, for the position of the minima, the result
$$n\lambda=a\sin\theta, \tag{2.45}$$
which is the usual result showing spatial quantization. Therefore, spatial quantization is the outcome of the new quantized momentum transfer relations between the screen and the incoming particles. In fact, the deposition of corpuscles on the photographic plate allows us to infer the sort of quantization we induced on the screen by making the slit. This makes it clear, in the slit experiment, what is the physical system being scrutinized and what is the probe used to unravel the system’s properties: the screen with the slit(s) is the physical system under investigation, and we want to know the result of making hole(s) in it, while the incoming particles form the probe used to make such an investigation. This is completely parallel to the experiments with Bragg-Laue diffraction, in which the system is the crystal whose lattice structure (in reciprocal space) we want to investigate by probing it with a beam of X-rays. Note that this formalism is unable to tell us if we are on a maximum or a minimum, since it does not refer to intensities. Within the realm of this theory we may only say that the previous expression refers to minima or maxima of the underlying diffraction pattern. Experiments can be used to remove the ambiguity. In fact, this is why we need the Schrödinger equation: it provides us with the mathematical symbols that allow us to calculate the intensities and thus find out if the previous result refers to maxima or minima. We must stress here the distinctions between the explanations of the two diffraction behaviors we have so far analyzed. Bragg-Laue diffraction is due to the spatial periodicity possessed by the crystal, while diffraction by an aperture is connected to a particular symmetry of the system induced by the existence of the aperture in the screen (a rotation of 𝜋 around the axis perpendicular to the slit passing through its center).
This is the explanation for the appearance of the lattice parameter in equation (2.44), while in equation (2.45) there appears the slit dimension. Again, if the incident particles have too high a velocity (too small a wavelength), they interact with the aperture for very short time intervals, and to keep the same diffraction pattern we must make the slit smaller; this is the condition for having diffraction patterns. Also, if we increase the size of the aperture, the particles interact less with the slit, giving a single maximum in the direction of the center of the aperture. Note that for situations in which we do not have a strict symmetry condition (some imprecision in the slit, etc.), the Bohr-Sommerfeld rule still applies, but only as a first approximation, as in the WKB method. The interpretation of the experimental situation in terms of corpuscles alone is quite easy to visualize: each particle passes through the aperture, which must be small enough to allow an exchange of momentum between itself and the corpuscle. When passing through the slit, each corpuscle exchanges momentum with the screen containing the aperture in quantized amounts, given by the Bohr-Sommerfeld rule. The amount of momentum exchanged makes the corpuscle hit the detection screen at some spatially quantized position. We are considering only the maxima and minima, which are the positions pointed out by the Bohr-Sommerfeld rule, but of course corpuscles will also arrive at other points of the detector, giving the continuous pattern usually observed in these experiments; these other positions will, however, be hit by fewer corpuscles, which gives the intensity pattern of a diffraction experiment. The relative intensities of the maxima and minima are thus the representation of the distinct probabilities related to quantized momentum exchange.
The appearance of corpuscles around the points of minima and maxima is due to thermal movements of the screen atoms or to any other effect that can impede the strict application of the Bohr-Sommerfeld rule for each particle.
The diffraction pattern obtained as the outcome of the experiment is a radiography of these momentum transfer rules induced by the existence of the aperture, considering the velocity of the incoming corpuscles, the experimental probe.
Alternative Explanation in Terms of Ewald Spheres

We can use the same formal tool of the Ewald sphere to understand the phenomenon of diffraction by an aperture. Indeed, as we can see from Figure 2-4(a),(b), the very presence of the aperture induces a symmetry by rotations of 𝑘𝜋 around the 𝑧-axis that takes each point at 𝑎/2 + 𝑥 into −𝑎/2 − 𝑥, for any 𝑥. One aspect of this symmetry is that we can always find points at any position on the surface of an Ewald sphere, since it no longer rests on the positions of the atoms in the plate. Thus, we expect diffraction to occur at all angles (see Figure 2-4). Since the intensity minima are given by the angles satisfying the above mentioned relation (2.45), as calculated by the Fourier transform of the whole structure and represented by the expression

$$ I(\theta) = I_0 \left[ \frac{\sin\left(\frac{\pi a}{\lambda}\sin\theta\right)}{\frac{\pi a}{\lambda}\sin\theta} \right]^2, $$

we get the usual diffraction pattern. The soundness of the present approach comes from the fact that, within Quantum Mechanics, symmetry induces quantization, be it the space translational symmetry of a crystal or the translational symmetry induced by an aperture. While in a crystal there can be situations with no diffracted beam (no plane of the crystal falls onto the surface of the related Ewald sphere), in the diffraction by an aperture there will always be a diffracted beam, as any usual experiment can confirm.
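As a small numerical illustration of the statement above, the following sketch evaluates the single-slit intensity formula and checks that it vanishes at the angles given by 𝑛𝜆 = 𝑎sin𝜃. The function names and the numerical values of the slit width and wavelength are illustrative assumptions, not taken from the text.

```python
import math

def single_slit_intensity(theta, a, wavelength, i0=1.0):
    """I(theta) = I0 * [sin(x)/x]^2 with x = pi*a*sin(theta)/wavelength."""
    x = math.pi * a * math.sin(theta) / wavelength
    if abs(x) < 1e-12:  # central maximum: sin(x)/x -> 1
        return i0
    return i0 * (math.sin(x) / x) ** 2

def minimum_angle(n, a, wavelength):
    """Angle of the n-th minimum from n*wavelength = a*sin(theta)."""
    return math.asin(n * wavelength / a)

# Illustrative (assumed) values: slit width and wavelength in metres.
a = 5.0e-6
wavelength = 5.0e-7

minima = [minimum_angle(n, a, wavelength) for n in range(1, 4)]
intensities_at_minima = [single_slit_intensity(t, a, wavelength) for t in minima]
central = single_slit_intensity(0.0, a, wavelength)
```

Up to rounding, `intensities_at_minima` contains only zeros, while `central` equals `i0`; between two consecutive zeros the intensity is positive, which is the continuous pattern mentioned in the text.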
Figure 2.4. The Ewald sphere construction for an aperture.
2.6.3. Corpuscular Interpretation of Double Slit Interference

The interpretation of double slit interference using a corpuscular approach follows the same lines as the previous explanations. In Figure 2-5 we show a sketch of a double slit interference experiment. The first screen has two slits of size 𝑎 separated by a distance 𝑐. The apertures are symmetrically placed with respect to a line (the 𝑧 axis) passing through their midpoint. The detectors are in the second screen. Particles are provided by a source on the left, one at a time, and each one passes through only one aperture. When passing through one of the slits, the particle exchanges momentum with the aperture in quantized amounts and is deflected, colliding afterwards with the detector at some specific spatial position. As other particles are sent by the source, they also interact with one or the other slit (but not both!) and go on impressing the detector. The intensities of the interference pattern at each maximum, which are the outcome of the experiment, depend upon the probabilities of having a quantized momentum transfer of some specific value.
Figure 2.5. A sketch of an experiment on double slit interference of particles. Each incoming particle interacts with one single slit and exchanges linear momentum with it by a quantized amount, hitting the detector at some spatially quantized position.
This very simple corpuscular explanation relies upon the fact that the system possesses the following symmetries

$$ \psi\left(-\frac{c+a}{2}\right) = \pm\,\psi\left(\frac{c+a}{2}\right), $$

if we consider the points in the middle of the apertures, and also

$$ \psi\left(-\frac{c}{2}\right) = \pm\,\psi\left(\frac{c}{2}\right), $$

if we consider the points at the edge of each aperture closest to the center of the screen. The second symmetry gives

$$ p_z\, c = n_1 h $$
while the first furnishes

$$ p_z (c + a) = \left(n_2 + \frac{1}{2}\right) h, $$

where 𝑛1 and 𝑛2 are two integers and we used the second relation in (2.43), guided by the known results of the experiments (note again that we cannot say which points are maxima or minima). We may solve these two equations in the same way as before to find

$$ n\lambda = c\sin\theta, \qquad \left(m + \frac{1}{2}\right)\lambda = a\sin\theta, $$

where we put 𝑛 = 𝑛1 and 𝑚 = 𝑛2 − 𝑛1. The first expression gives the condition for intensity maxima of the interference pattern of a double slit with distance 𝑐 between the slits, while the second expression gives the condition, also for intensity maxima, related to the diffraction by an aperture of width 𝑎. Except for the information about the intensities, which lies outside the scope of this formalism, these are the conditions for intensity extrema obtained by wavelike analysis. In this case, as in the two cases of the previous sections, one does not need to go beyond a purely corpuscular ontology to explain the phenomenon of double slit interference (the name “interference” may still be used, if not for historical reasons, then because the behavior of the intensities, which are governed by the probabilities of having specific amounts of quantized momentum, is described by a wave equation). Another possible interpretation of this same phenomenon assumes that each electron, as an extended entity, interacts with both slits and interferes with itself to impress the whole surface of the detector as an extended entity. Then, when observed, this extended entity is reduced to a single space-time point, which is the usual observed outcome of the experiment with individual electrons. This work, of course, favours the corpuscular interpretation and bases it on the formalism just presented. Note that the problem with the second interpretation (besides its quite absurd aspect, usually happily welcomed by some physicists) is not that it lacks formal grounds.
It does have obvious formal grounds and it does explain the phenomenon. The problem with the second interpretation is that it cannot explain other typically corpuscular phenomena, such as the photoelectric effect, the Compton effect, etc. Thus, if one has an extremely visualizable interpretation, based upon a single ontological entity, that can explain all the known phenomena, it is recommended by a principle of economy (to say the least, given the quite counterintuitive character of the dual interpretation itself). The same kind of epistemic blocking that the “undulatory” phenomena exerted upon the corpuscular approach, the “corpuscular” phenomena exert upon the undulatory approach. In fact, as the first chapter explicitly tried to show, it was because an ontological reduction of the type just shown was not available at the beginning of the twentieth century that these two epistemic blockings imposed a dualist interpretation as a compromise between corpuscular and undulatory ontologies. The lifting of one of these epistemic blockings removes the need for such a compromise.
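The two double-slit conditions obtained earlier in this subsection, 𝑛𝜆 = 𝑐sin𝜃 and (𝑚 + 1/2)𝜆 = 𝑎sin𝜃, can be tabulated with a few lines of code. The sketch below is only illustrative: the helper name `extrema_angles` and the values chosen for the wavelength, the slit separation and the slit width (in arbitrary length units) are assumptions.

```python
import math

def extrema_angles(spacing, wavelength, offset=0.0):
    """Angles solving (k + offset)*wavelength = spacing*sin(theta), k = 0, 1, ..."""
    angles = []
    k = 0
    while (k + offset) * wavelength <= spacing:
        angles.append(math.asin((k + offset) * wavelength / spacing))
        k += 1
    return angles

# Illustrative (assumed) values in arbitrary length units.
wavelength = 1.0
c = 4.5  # distance between the slits
a = 2.2  # width of each slit

interference = extrema_angles(c, wavelength)             # n*lambda = c*sin(theta)
diffraction = extrema_angles(a, wavelength, offset=0.5)  # (m + 1/2)*lambda = a*sin(theta)
```

As the text stresses, the formalism only locates these angles; deciding which of them are maxima or minima requires the intensities.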
It is now easy to see why an undulatory model for interference and diffraction was available long before any corpuscular model. To address the problem in undulatory terms one has just to write down the descriptors of the theory (the wave-fronts) and make a single assumption about the phase relations of the waves on the photographic plate. There is no need to consider explicitly the actual interaction with the slits which, of course, must take place if the shape of the wave-fronts is to change. However, to use the corpuscular model, one must specify the type of momentum exchange occurring at the slits (quantized, as we saw), which makes sense only if one has a deeper understanding of what is going on.
A Deeper Understanding of Interference and Diffraction Experiments

Another way to see these results is the following (regarding one slit, for instance): when we make the slit in the screen, we introduce a symmetry in 𝑎 in the z-direction 𝑘̂. This, however, implies introducing, in reciprocal space, the quantity 𝑘 = 2𝜋/𝑎 and the possibility of momentum exchanges of Δ𝑝 = 𝑛2𝜋ℏ/𝑎 = 𝑛ℎ/𝑎. If the initial momentum in the z-direction was zero, then the quantized exchange of momentum must imply that 𝑛ℎ/𝑎 = 𝑝sin𝜃 (that is, the amount of momentum transferred to the particle is a multiple of the characteristic momentum exchange induced by making the slit). Writing 𝑝 = ℎ/𝜆 gives 𝑛𝜆 = 𝑎sin𝜃,
(2.46)
as expected. What the preceding sections established is a model for interference and diffraction that encompasses the notion of particle. This was accomplished by the supposition, based on the Schrödinger equation and the Bohr-Sommerfeld rules, of quantized momentum transfers. This interpretation, in passing, makes explicit an asymmetry of analysis usually made when considering Bragg-Laue diffraction and one- or double-slit interferometry. Indeed, when one looks at the outcome of Bragg-Laue diffraction experiments, the diffraction patterns, these patterns are always considered as reflecting the structure of the crystal. The X-ray is the probe, not the physical system under scrutiny. The only constraint on the X-ray, as a probe, is that it has the necessary energy to produce the studied phenomenon. In fact, the Bragg-Laue diffraction patterns are considered as revealing precisely the momentum structure of the crystal (its reciprocal lattice structure). However, when one considers one- or double-slit interferometry, the analysis changes without justification, and it almost seems as if the plate with the slit were to be considered the probe, while the electron were to be considered the system under scrutiny. The preceding discussion should be enough to show that this double standard in the interpretation of the phenomena is responsible for many misunderstandings in the field.
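The relation Δ𝑝 = 𝑛ℎ/𝑎 = 𝑝sin𝜃 that leads to (2.46) can be made concrete with a short sketch that lists the allowed momentum transfers and the corresponding deflection angles for a probe of given wavelength. The function name and the numerical values (a wavelength of 1 Å and a slit of 4.5 Å) are illustrative assumptions only.

```python
import math

H = 6.62607015e-34  # Planck constant (J s)

def momentum_transfers(p, a):
    """Return (n, delta_p, theta) for every n >= 1 with n*h/a <= p."""
    results = []
    n = 1
    while n * H / a <= p:
        delta_p = n * H / a          # quantized transfer induced by the slit
        theta = math.asin(delta_p / p)  # from delta_p = p*sin(theta)
        results.append((n, delta_p, theta))
        n += 1
    return results

# Illustrative (assumed) values.
wavelength = 1.0e-10   # de Broglie wavelength of the probe (m)
a = 4.5e-10            # slit dimension (m)
p = H / wavelength     # p = h / lambda

transfers = momentum_transfers(p, a)
```

Each entry of `transfers` satisfies 𝑛𝜆 = 𝑎sin𝜃 up to rounding, and no transfer larger than the incoming momentum is listed.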
2.6. CONNECTIONS WITH FEYNMAN’S PATH INTEGRAL APPROACH

The search for connections between the present derivation and other approaches already known in the literature may seem, at first sight, a sort of mannerism. This impression would, however, be misleading; in fact, each one of these derivations uses its own symbols and
constructs that point to directions of interpretation that may be overlooked by other approaches. Whenever one succeeds in proving the mathematical connections between two seemingly distinct derivations of the same formalism, it is possible not only to improve the interpretation of the underlying phenomena, but also to construct a semantic bridge over the syntactic pillars just set. In this section we are interested in showing that our approach towards the establishment of the Bohr-Sommerfeld conditions is fully equivalent to Feynman’s path integral method (only slight modifications would be necessary)[37]. On the other hand, Feynman’s approach already has a very nice interpretation in terms of displacements and it points in the direction of randomness (without making it explicit; we will return to this point in what follows). Moreover, in Feynman’s interpretation there also appears an infinitesimal quantity (usually written as 𝜀) representing infinitesimal amounts of time. As we will show, the syntactic bridge between our derivation and Feynman’s path integral approach will give us some clues about the role played by the infinitesimal quantities 𝛿𝑞 and also about the role played by fluctuations in quantum phenomena. We begin by writing

$$ p\,\delta q = p\,\frac{\delta q}{\delta t}\,\delta t, $$

and we put 𝑞̇ = 𝛿𝑞/𝛿𝑡, since these quantities are infinitesimal. This means that 𝑞̇ represents the velocity taken over the same trajectory, since in general we would have 𝛿𝑞 = Δ𝑞 + 𝑞̇ 𝛿𝑡, where Δ𝑞 is the separation between two distinct trajectories (which we made equal to zero). We must have

$$ \Delta \int_{t_1}^{t_2} p\,\dot{q}\,dt = \Delta \int_{q(t_1)}^{q(t_2)} p\,dq = 0, $$

which is an expression for the Principle of Least Action. We may now use 𝑞̇ 𝑝 = 𝐿(𝑞, 𝑞̇ ; 𝑡) − 𝐸, where 𝐿(𝑞, 𝑞̇ ; 𝑡) is the classical Lagrangian function and 𝐸 is the energy (supposed constant) of the system under consideration. Our expression (2.38) becomes

$$ \psi\left(q\left(t + \frac{\delta t}{2}\right)\right) = \int \exp\left[\frac{i}{2\hbar}\big(L(q,\dot{q};t) - E\big)\delta t\right] \phi(q,p;t)\, J\left(\frac{p}{\dot{q}}\right) d\dot{q}, \tag{2.47} $$

where 𝐽(𝑝/𝑞̇ ) is the Jacobian of the transformation (𝑞, 𝑝) → (𝑞, 𝑞̇ ). The kernel of the infinitesimal (in time) transformation in (2.47) is given by

$$ K_{\dot{q}(t)}(t + \delta t, t) = J\left(\frac{p(t)}{\dot{q}(t)}\right) \exp\left[\frac{i}{2\hbar}\big(L(q,\dot{q};t) - E\big)\delta t\right], $$
such that the transformation between two different times 𝑡𝑎 = 0 and 𝑡𝑏 = 𝑡 may be written as

$$ K_{\dot{q}(t)}(t_b, t_a) = \lim_{N\to\infty} \prod_{n=1}^{N} K_{\dot{q}(t+(n-1)\delta t)}\big(t + n\,\delta t,\; t + (n-1)\,\delta t\big), $$

where 𝑁𝛿𝑡 = 𝑡𝑏 − 𝑡𝑎 , making it necessary to take the limit 𝑁 → ∞, since 𝛿𝑡 is infinitesimal. We may thus write [𝑡𝑛 = 𝑡 + (𝑛 − 1)𝛿𝑡]

$$ K_{\dot{q}(t)}(t_b, t_a) = \left[\lim_{N\to\infty} \prod_{n=1}^{N} J\left(\frac{p(t_n)}{\dot{q}(t_n)}\right)\right] \times \exp\left\{\frac{i}{\hbar} \lim_{N\to\infty} \sum_{n=1}^{N} \big[L(q(t_n),\dot{q}(t_n);t) - E\big]\delta t\right\}. $$

In the appropriate limit we get

$$ K_{\dot{q}(t)}(t_b, t_a) = A \exp\left[\frac{i}{\hbar}\int_{t_a}^{t_b} L(q(t),\dot{q}(t);t)\,dt - \frac{i}{\hbar}E(t_b - t_a)\right], $$

where we put

$$ A = \lim_{N\to\infty} \prod_{n=1}^{N} J\left(\frac{p(t_n)}{\dot{q}(t_n)}\right). $$

Since the classical action is given by

$$ S_{cl}[t_b, t_a] = \int_{t_a}^{t_b} L(q,\dot{q};t)\,dt $$

we finally get the desired result

$$ K_{\dot{q}(t)}(t_b, t_a) = A \exp\left[\frac{i}{\hbar} S_{cl}[t_b, t_a] - \frac{i}{\hbar}E(t_b - t_a)\right], $$

which is the expression for the kernel of Feynman’s path integral approach. Note that the Feynman approach furnishes the amplitudes and is an alternative approach to the Schrödinger equation. This mathematical connection between the two derivations gives us a clue about the type of process that may be at the core of quantum phenomena. Indeed, Feynman’s approach relies upon the interpretation of path integrals as being taken at small time intervals 𝛿𝑡 during which the movement of the particles takes place in approximately straight lines (and they may be understood here simply as particles) (see Figure 2-6). The complete movement of each particle can thus be viewed as zigzagging along a trajectory resembling a random process. Each trajectory has its own probability of being followed, and this is the semantic content of the function 𝜓(𝑞; 𝑡) within this formalism.
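The passage from the sum over small time steps to the classical action can be illustrated numerically. The sketch below, which is only an illustration under stated assumptions, discretizes a path into straight segments, accumulates 𝐿𝛿𝑡 over them, and compares the straight (classical) path of a free particle with a zigzagging one. The free-particle Lagrangian, the units ℏ = 𝑚 = 1, the particular wiggle added to the path and all names are assumptions introduced here.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption of this sketch)

def discretized_action(path, dt, mass, potential):
    """Sum of L(q, qdot)*dt over the straight segments of a sampled path."""
    action = 0.0
    for q0, q1 in zip(path[:-1], path[1:]):
        qdot = (q1 - q0) / dt           # velocity on the straight segment
        q_mid = 0.5 * (q0 + q1)         # midpoint used for the potential
        action += (0.5 * mass * qdot ** 2 - potential(q_mid)) * dt
    return action

def phase_factor(action):
    """exp(i S / hbar), the phase carried by one path."""
    return cmath.exp(1j * action / HBAR)

def free(q):
    return 0.0  # free particle: V(q) = 0

mass, t_total, n_steps = 1.0, 2.0, 400
dt = t_total / n_steps
q_a, q_b = 0.0, 3.0

straight = [q_a + (q_b - q_a) * k / n_steps for k in range(n_steps + 1)]
wiggly = [q + 0.2 * math.sin(5.0 * math.pi * k / n_steps)
          for k, q in enumerate(straight)]

s_classical = discretized_action(straight, dt, mass, free)
s_wiggly = discretized_action(wiggly, dt, mass, free)
```

For the free particle the straight path reproduces the closed form 𝑚(𝑞𝑏 − 𝑞𝑎)²/2(𝑡𝑏 − 𝑡𝑎), and any path with the same end points that zigzags around it has a larger action.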
Figure 2.6. Schematic representation of a Feynman path compared to the most probable trajectory of a corpuscle. On the right there is a representation of the fluctuations related to this Feynman path.
This interpretation gives us a clue about the role played by randomness in quantum processes, although it does not furnish the adequate formal grounds to proceed with an interpretation based upon stochastic processes. In the next four chapters we will establish beyond any doubt these formal grounds to uphold the present interpretation.
Chapter 3
THE ENTROPY DERIVATION

In the previous chapter, the derivation method using a characteristic function was connected to Feynman’s path integral approach. If the two approaches are syntactically equivalent, this means that the clarifications about the interpretation raised by one formalism may be transferred to the other. This is the most important reason to approach the same problem from many different perspectives. Each formalism, being based upon different formal constructs, allows us to understand the underlying physics more deeply. It is like trying to paint a naturalist picture using innumerable impressionist paintings. The previous chapter showed that Feynman’s approach suggests that quantum phenomena must be connected to some sort of random process. However, the symbols of Feynman’s formalism are not adequate to make this connection explicit. That is, its syntax, although suggestive, cannot furnish a sound foundation for the semantic conclusions about the interpretation of quantum phenomena in terms of randomness. In the present chapter we approach the same problem of deriving the Schrödinger equation using a line of reasoning different from that of the second chapter, based upon different formal constructs. The main difference from the derivation presented in the second chapter will be the extensive use of the notion of entropy. The concept of entropy is closely related to that of fluctuations (by the fluctuation-dissipation theorem, for instance). Thus, if we succeed in making this derivation and if we manage to show the mathematical equivalence between it and the previous one (and Feynman’s path integral approach), then the use of the notion of fluctuation at the semantic level, that is, in the interpretation, would be justified by the appearance of the corresponding constructs in the syntax.
In the chapters to follow, we will criticize the use of the notion of “observable” (among others) precisely because it does not have any support in the syntax: there is no symbol in the formalism that could be interpreted as the variable representing the observer. Thus, as a means of avoiding what we are trying to criticize, we must be very careful with the use of some semantic concepts. We must always be quite certain that semantic concepts are clearly represented by symbols in the underlying syntax of the approach. The derivation of this chapter will also allow us to find the phase space distribution function that represents any phenomenon of Quantum Mechanics. As the reader will see, this distribution is positive definite and gives the correct quantum mechanical results whenever we calculate average values in the usual classical statistical way. We show this by applying the formalism to the harmonic oscillator and the hydrogen atom.
3.1. THE DERIVATION

We begin, as in the previous chapter, by presenting the postulates of the derivation. Again, we need only two axioms:

Axiom 1 For an isolated system, the joint phase-space probability density related to any Quantum Mechanical phenomenon obeys the Liouville equation

$$ \frac{dF(q,p;t)}{dt} = 0; \tag{3.1} $$

Axiom 2 The product of the mean-square deviations of the momenta and the positions of a physical process, calculated at each point 𝑞 of the configuration space, must satisfy

$$ \langle \delta p(q;t)^2 \rangle \langle \delta q(q;t)^2 \rangle = \frac{\hbar^2}{4}. \tag{3.2} $$
Note that since the first axiom of the present derivation is equal to the first axiom of the characteristic function derivation, assuming expression (3.2), which defines the second axiom of the present derivation, must be equivalent to assuming the second axiom of the previous chapter; this, of course, must be proved. Before proving this, let us prove that these two axioms lead us to the derivation of the Schrödinger equation. Much of the development here follows quite closely the usual approach of classical kinetic theory[39]. The Liouville equation can be written as

$$ \frac{\partial F(q,p;t)}{\partial t} + \frac{p}{m}\frac{\partial F(q,p;t)}{\partial q} - \frac{\partial V(q)}{\partial q}\frac{\partial F(q,p;t)}{\partial p} = 0 $$

and we can immediately integrate it with respect to 𝑝 and use the definitions

$$ \int F(q,p;t)\,dp = \rho(q;t); \qquad \int p\,F(q,p;t)\,dp = p(q;t)\,\rho(q;t), \tag{3.3} $$
where 𝜌(𝑞; 𝑡) is the probability density upon configuration space and the product 𝑝(𝑞; 𝑡)𝜌(𝑞; 𝑡) is the momentum average, also defined upon configuration space [𝑝(𝑞; 𝑡) is usually called the macroscopic momentum (see [39], p. 153)]. We arrive at the equation

$$ \frac{\partial \rho(q;t)}{\partial t} + \frac{\partial}{\partial q}\left[\frac{p(q;t)}{m}\,\rho(q;t)\right] = 0, \tag{3.4} $$

which is clearly a continuity equation for the probability density defined upon the configuration space. We may also multiply the Liouville equation by 𝑝 and integrate in 𝑝 to get, defining

$$ \int p^2 F(q,p;t)\,dp = M_2(q;t), \tag{3.5} $$
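the equation

$$ \frac{\partial}{\partial t}\big[\rho(q;t)\,p(q;t)\big] + \frac{1}{m}\frac{\partial M_2(q;t)}{\partial q} + \frac{\partial V(q)}{\partial q}\,\rho(q;t) = 0. \tag{3.6} $$

Using (3.4) in this last equation we find, after some straightforward calculations, the expression

$$ \frac{1}{m}\frac{\partial}{\partial q}\big[M_2(q;t) - p^2(q;t)\,\rho(q;t)\big] + \rho(q;t)\left[\frac{\partial p(q;t)}{\partial t} + \frac{\partial}{\partial q}\left(\frac{p^2(q;t)}{2m}\right) + \frac{\partial V(q)}{\partial q}\right] = 0. \tag{3.7} $$

The first term of the previous expression may be written as

$$ M_2(q;t) - p^2(q;t)\,\rho(q;t) = \int \big[p^2 - p^2(q;t)\big] F(q,p;t)\,dp = \int \big[p - p(q;t)\big]^2 F(q,p;t)\,dp. $$

We now put

$$ \langle \delta p(q;t)^2 \rangle\, \rho(q;t) = \int \big[p - p(q;t)\big]^2 F(q,p;t)\,dp, \tag{3.8} $$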
a result that clearly represents the momentum fluctuations of the physical process related to the phase-space probability density function 𝐹(𝑞, 𝑝; 𝑡). We must find an expression for ⟨𝛿𝑝(𝑞; 𝑡)2⟩. Note that at this point we have not used the second axiom yet; it will be necessary precisely to allow us to find the expression for ⟨𝛿𝑝(𝑞; 𝑡)2 ⟩ in terms of the probability density function 𝜌(𝑞; 𝑡). To show this, let us consider the entropy 𝑆(𝑞; 𝑡) defined upon the configuration space in such a way that the equal a priori probability postulate grants us that (see [25], pp. 290, 509)

$$ \rho(q;t) = \exp\left[\frac{S(q;t)}{k_B}\right], $$

where 𝑘𝐵 is Boltzmann’s constant. We now let the system fluctuate around 𝑞 by an amount 𝛿𝑞 in such a way that we have

$$ \rho(q,\delta q;t) = \rho(q+\delta q;t) = \rho(q;t)\exp\left[\frac{1}{k_B}\left(\frac{\partial S(q+\delta q;t)}{\partial q}\right)_{\delta q=0}\delta q + \frac{1}{2k_B}\left(\frac{\partial^2 S(q+\delta q;t)}{\partial q^2}\right)_{\delta q=0}\delta q^2\right], \tag{3.9} $$

meaning that the perturbed 𝜌(𝑞, 𝛿𝑞; 𝑡) is a Gaussian and is related to the probability of having a fluctuation Δ𝜌 in the probability density around some point 𝑞 in the configuration space (this method of analysis is not new; it was invented by Einstein and Smoluchowski; see [40], p. 172; see also [25], pp. 288-291).
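Thus, it is obvious that

$$ \langle \delta q(q;t)^2 \rangle = \frac{\int_{-\infty}^{+\infty} (\delta q)^2 \exp(\beta\,\delta q - \gamma\,\delta q^2)\,d(\delta q)}{\int_{-\infty}^{+\infty} \exp(\beta\,\delta q - \gamma\,\delta q^2)\,d(\delta q)} - \left(\frac{\int_{-\infty}^{+\infty} \delta q\,\exp(\beta\,\delta q - \gamma\,\delta q^2)\,d(\delta q)}{\int_{-\infty}^{+\infty} \exp(\beta\,\delta q - \gamma\,\delta q^2)\,d(\delta q)}\right)^2, $$

where, comparing with (3.9), 𝛽 is the coefficient of the linear term, $\beta = \frac{1}{k_B}\frac{\partial S(q;t)}{\partial q}$, and we put

$$ \gamma = -\frac{1}{2k_B}\frac{\partial^2 S(q;t)}{\partial q^2}. $$

A simple calculation of the integrals shows that

$$ \langle \delta q(q;t)^2 \rangle = \frac{1}{2\gamma} = -\left(\frac{\partial^2 \ln\rho(q;t)}{\partial q^2}\right)^{-1}. $$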
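The result ⟨𝛿𝑞²⟩ = 1/(2𝛾), independent of the linear coefficient 𝛽, can be checked by direct numerical integration of the Gaussian weight exp(𝛽𝛿𝑞 − 𝛾𝛿𝑞²). The sketch below uses a plain trapezoid rule; the function name, the grid parameters and the test values of 𝛽 and 𝛾 are assumptions made only for this illustration.

```python
import math

def fluctuation_variance(beta, gamma, half_width=12.0, n=20001):
    """Variance of dq under the weight exp(beta*dq - gamma*dq**2) (trapezoid rule)."""
    centre = beta / (2.0 * gamma)         # position of the peak of the weight
    sigma = 1.0 / math.sqrt(2.0 * gamma)  # used only to size the integration grid
    lo = centre - half_width * sigma
    hi = centre + half_width * sigma
    h = (hi - lo) / (n - 1)
    m0 = m1 = m2 = 0.0
    for k in range(n):
        x = lo + k * h
        edge = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        w = edge * math.exp(beta * x - gamma * x * x)
        m0 += w
        m1 += w * x
        m2 += w * x * x
    # The common factor h cancels in the ratios of moments.
    return m2 / m0 - (m1 / m0) ** 2

variance = fluctuation_variance(beta=0.7, gamma=1.3)
expected = 1.0 / (2.0 * 1.3)
```

Changing `beta` shifts the mean of the fluctuation but leaves `variance` at 1/(2𝛾), which is why only the second derivative of the entropy enters the dispersion.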
Note that we were looking for an expression for ⟨𝛿𝑝(𝑞; 𝑡)2 ⟩ but ended up with an expression for ⟨𝛿𝑞(𝑞; 𝑡)2 ⟩. In principle, there is no relation between these dispersions. However, such a relation is precisely the content of the second axiom (and of Quantum Mechanics). The second axiom allows us to write

$$ \langle \delta p(q;t)^2 \rangle = -\frac{\hbar^2}{4}\frac{\partial^2 \ln\rho(q;t)}{\partial q^2} $$

and thus (see [41] p. 120 or [42])

$$ \langle \delta p(q;t)^2 \rangle\,\rho(q;t) = -\frac{\hbar^2}{4}\,\rho(q;t)\,\frac{\partial^2 \ln\rho(q;t)}{\partial q^2}. \tag{3.10} $$
Substituting this last expression into (3.7) and writing (do not confuse 𝑆, the entropy function, with 𝑠)

$$ \rho(q;t) = R(q;t)^2; \qquad p(q;t) = \frac{\partial s(q;t)}{\partial q}, \tag{3.11} $$

we find

$$ R(q;t)^2\,\frac{\partial}{\partial q}\left[\frac{\partial s}{\partial t} + \frac{1}{2m}\left(\frac{\partial s}{\partial q}\right)^2 + V(q) - \frac{\hbar^2}{2mR(q;t)}\frac{\partial^2 R}{\partial q^2}\right] = 0. \tag{3.12} $$

Equations (3.4) and (3.12) are equivalent to the Schrödinger equation

$$ -\frac{\hbar^2}{2m}\frac{\partial^2 \psi}{\partial q^2} + V(q)\,\psi(q;t) = i\hbar\,\frac{\partial \psi(q;t)}{\partial t}, $$

if we put

$$ \psi(q;t) = R(q;t)\exp\left[\frac{i}{\hbar}\,s(q;t)\right], \tag{3.13} $$

as we have already shown in the previous chapter. Thus, the two axioms of the present chapter also allow us to mathematically derive the Schrödinger equation. In the next section we prove that this approach is mathematically equivalent to the characteristic function derivation of the previous chapter.
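The step from (3.7) to (3.12) rests on the identity that the fluctuation term (1/𝑚)∂/∂𝑞[⟨𝛿𝑝²⟩𝜌], with ⟨𝛿𝑝²⟩𝜌 taken from (3.10), equals 𝜌 times the 𝑞-derivative of −(ℏ²/2𝑚𝑅)∂²𝑅/∂𝑞². A minimal numerical sketch of this identity follows; the amplitude 𝑅(𝑞) = sech 𝑞, the units ℏ = 𝑚 = 1, the finite-difference steps and all function names are assumptions chosen only for illustration.

```python
import math

HBAR, MASS = 1.0, 1.0  # units with hbar = m = 1 (assumption of this sketch)

def d1(f, q, h=1e-2):
    """Central first derivative."""
    return (f(q + h) - f(q - h)) / (2.0 * h)

def d2(f, q, h=1e-2):
    """Central second derivative."""
    return (f(q + h) - 2.0 * f(q) + f(q - h)) / (h * h)

def amplitude(q):
    """Hypothetical amplitude R(q); any smooth positive function would do."""
    return 1.0 / math.cosh(q)

def density(q):
    return amplitude(q) ** 2

def log_density(q):
    return math.log(density(q))

def fluctuation_term(q):
    """(1/m) d/dq [ <dp^2> rho ], with <dp^2> rho taken from (3.10)."""
    def dispersion_times_rho(x):
        return -(HBAR ** 2 / 4.0) * density(x) * d2(log_density, x)
    return d1(dispersion_times_rho, q) / MASS

def quantum_potential_term(q):
    """rho d/dq [ -(hbar^2 / 2 m R) d^2R/dq^2 ], the last term in (3.12)."""
    def qpot(x):
        return -(HBAR ** 2 / (2.0 * MASS)) * d2(amplitude, x) / amplitude(x)
    return density(q) * d1(qpot, q)

lhs = fluctuation_term(0.5)
rhs = quantum_potential_term(0.5)
```

For this particular 𝑅 both sides can also be obtained in closed form, −2 sech⁴𝑞 tanh 𝑞 in these units, so `lhs` and `rhs` agree with each other and with that value up to the finite-difference error.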
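3.2. CONNECTIONS WITH THE CHARACTERISTIC FUNCTION DERIVATION

The characteristic function derivation used the expression

$$ Z(q,\delta q;t) = \int F(q,p;t)\exp\left(\frac{i}{\hbar}\,p\,\delta q\right)dp, \tag{3.14} $$

which implies that

$$ \rho(q;t) = \lim_{\delta q\to 0} Z(q,\delta q;t), \qquad p(q;t)\,\rho(q;t) = \lim_{\delta q\to 0} -i\hbar\,\frac{\partial Z(q,\delta q;t)}{\partial(\delta q)} $$

and

$$ \int p^2 F(q,p;t)\,dp = \lim_{\delta q\to 0} -\hbar^2\,\frac{\partial^2 Z(q,\delta q;t)}{\partial(\delta q)^2}. $$

Then, (3.8) becomes

$$ \langle \delta p(q;t)^2 \rangle\,\rho(q;t) = \lim_{\delta q\to 0}\left[-\hbar^2\,\frac{\partial^2 Z(q,\delta q;t)}{\partial(\delta q)^2} + \frac{\hbar^2}{Z(q,\delta q;t)}\left(\frac{\partial Z(q,\delta q;t)}{\partial(\delta q)}\right)^2\right], $$

which may be rearranged as

$$ \langle \delta p(q;t)^2 \rangle\,\rho(q;t) = -\hbar^2 \lim_{\delta q\to 0} Z(q,\delta q;t)\,\frac{\partial^2 \ln Z(q,\delta q;t)}{\partial(\delta q)^2}. $$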
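It thus remains for us to give the explicit form of this expression and show that it is equivalent to (3.10). We may expand the exponential in (3.14) to write the characteristic function, up to second order in 𝛿𝑞, as

$$ Z(q,\delta q;t) = \int F(q,p;t)\,dp + \frac{i\,\delta q}{\hbar}\int p\,F(q,p;t)\,dp - \frac{(\delta q)^2}{2\hbar^2}\int p^2 F(q,p;t)\,dp + o(\delta q^3) $$

and using (3.3) and (3.5) we rewrite it as

$$ Z(q,\delta q;t) = \rho(q;t) + \frac{i\,\delta q}{\hbar}\,p(q;t)\,\rho(q;t) - \frac{(\delta q)^2}{2\hbar^2}\,M_2(q;t) + o(\delta q^3). \tag{3.15} $$

The left-hand side, on the other hand, has to be written as

$$ Z(q,\delta q;t) = \psi^*\left(q - \frac{\delta q}{2};t\right)\psi\left(q + \frac{\delta q}{2};t\right) $$

and using (3.13) we find, up to second order in the infinitesimal parameter 𝛿𝑞, the expression [see (2.6)]

$$ Z(q,\delta q;t) = \left\{R(q;t)^2 + \left(\frac{\delta q}{2}\right)^2\left[R(q;t)\frac{\partial^2 R}{\partial q^2} - \left(\frac{\partial R}{\partial q}\right)^2\right]\right\}\exp\left(\frac{i\,\delta q}{\hbar}\frac{\partial s}{\partial q}\right), $$

or, explicitly

$$ Z(q,\delta q;t) = R(q;t)^2 + \frac{i\,\delta q}{\hbar}\,R(q;t)^2\,\frac{\partial s(q;t)}{\partial q} + \frac{(\delta q)^2}{2}\left[\frac{1}{4}R(q;t)^2\,\frac{\partial^2 \ln R(q;t)^2}{\partial q^2} - \frac{R(q;t)^2}{\hbar^2}\left(\frac{\partial s(q;t)}{\partial q}\right)^2\right]. \tag{3.16} $$

Comparison of (3.15) with (3.16) shows that

$$ \rho(q;t) = R(q;t)^2; \qquad p(q;t) = \frac{\partial s(q;t)}{\partial q}, $$

as in (3.11), and

$$ M_2(q;t) = -\frac{\hbar^2}{4}\,\rho(q;t)\,\frac{\partial^2 \ln\rho(q;t)}{\partial q^2} + p(q;t)^2\,\rho(q;t). $$

Now, using (3.15), we can write (up to second order)

$$ Z(q,\delta q;t) = \rho(q;t)\left[1 + \frac{i\,\delta q}{\hbar}\,p(q;t) + \frac{(\delta q)^2}{2}\left(\frac{1}{4}\frac{\partial^2 \ln\rho}{\partial q^2} - \frac{p(q;t)^2}{\hbar^2}\right)\right], $$

and thus

$$ \lim_{\delta q\to 0}\frac{\partial^2}{\partial(\delta q)^2}\ln Z(q,\delta q;t) = \frac{1}{4}\frac{\partial^2 \ln\rho(q;t)}{\partial q^2}, \tag{3.17} $$

which implies equation (3.10), as we wished to show.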
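Relation (3.17) lends itself to a direct numerical check: build 𝑍 from a wave function of the form (3.13), differentiate ln 𝑍 twice with respect to 𝛿𝑞 at 𝛿𝑞 = 0, and compare with one quarter of the second 𝑞-derivative of ln 𝜌. The sketch below does this for a test state with Gaussian 𝑅 and 𝑠 = ℏ𝑘𝑞; the state, its parameters, the units ℏ = 1 and the function names are all assumptions made for the illustration.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption of this sketch)

def psi(q, sigma=0.8, k=1.7):
    """Hypothetical test state R(q) exp(i s(q)/hbar): Gaussian R, s = hbar*k*q."""
    r = math.exp(-q * q / (4.0 * sigma ** 2))
    return r * cmath.exp(1j * k * q)

def log_z(q, dq):
    """ln Z(q, dq) with Z = psi*(q - dq/2) psi(q + dq/2)."""
    return cmath.log(psi(q - dq / 2.0).conjugate() * psi(q + dq / 2.0))

def second_derivative_log_z(q, h=1e-3):
    """Central second difference of ln Z in dq, evaluated around dq = 0."""
    return (log_z(q, h) - 2.0 * log_z(q, 0.0) + log_z(q, -h)) / (h * h)

def quarter_second_derivative_log_rho(q, h=1e-3):
    """(1/4) d^2 ln(rho)/dq^2 with rho = |psi|^2."""
    def log_rho(x):
        return math.log(abs(psi(x)) ** 2)
    return 0.25 * (log_rho(q + h) - 2.0 * log_rho(q) + log_rho(q - h)) / (h * h)

lhs = second_derivative_log_z(0.3)
rhs = quarter_second_derivative_log_rho(0.3)
```

For the Gaussian test state both quantities equal −1/(4𝜎²); the phase 𝑠 contributes only a term linear in 𝛿𝑞 to ln 𝑍 and therefore drops out of the second derivative.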
Another way of comparing the two derivations is to substitute (3.15) into the equation satisfied by the characteristic function [see (2.4)] and take the real and imaginary parts to find equations (3.5) and (3.6). These results show that the two derivations are mathematically equivalent, and their comparison gives us another perspective on the physical content of the second axiom of the characteristic function derivation. That axiom is rather mathematical and hides the physical content related to the condition it imposes upon the dispersions in position and momentum of some physical process at each point of the configuration space. One may feel uneasy about relation (3.2), since it resembles Heisenberg’s dispersion relations, but with an equal sign. However, that relation is valid for each point of the configuration space; if one integrates it with respect to 𝑞, the result is the usual Heisenberg dispersion relation (see, for example, [42]). Indeed, if we begin with

$$ \langle \delta p(q;t)^2 \rangle \langle \delta q(q;t)^2 \rangle = \frac{\hbar^2}{4}, $$

and use

$$ \langle \delta p(x;t)^2 \rangle = -\frac{\hbar^2}{4}\frac{\partial^2 \ln\rho(x,t)}{\partial x^2}, $$

then

$$ \Delta p^2 \cdot \Delta x^2 = -\frac{\hbar^2}{4}\int \rho(x,t)\,\frac{\partial^2 \ln\rho(x,t)}{\partial x^2}\,dx \int (x - \bar{x})^2 \rho(x,t)\,dx, $$

with ∆𝑝2 and ∆𝑥 2 as implied. An integration by parts in the first integral (with 𝜌 vanishing at infinity) then gives

$$ \Delta p^2 \cdot \Delta x^2 = \frac{\hbar^2}{4}\int \rho(x,t)\left(\frac{\partial \ln\rho(x,t)}{\partial x}\right)^2 dx \int (x - \bar{x})^2 \rho(x,t)\,dx. $$

If we rewrite this last expression as

$$ \Delta p^2 \cdot \Delta x^2 = \frac{\hbar^2}{4}\int \left(\sqrt{\rho(x,t)}\,\frac{\partial \ln\rho(x,t)}{\partial x}\right)^2 dx \int \left[\sqrt{\rho(x,t)}\,(x - \bar{x})\right]^2 dx, $$

and apply the Schwarz inequality, we get

$$ \Delta p^2 \cdot \Delta x^2 \geq \frac{\hbar^2}{4}\left(\int \sqrt{\rho(x,t)}\,(x - \bar{x})\,\sqrt{\rho(x,t)}\,\frac{\partial \ln\rho(x,t)}{\partial x}\,dx\right)^2, $$

and thus (after a simplification and an integration by parts)

$$ \Delta p^2 \cdot \Delta x^2 \geq \frac{\hbar^2}{4}. $$
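The inequality just obtained can be checked numerically for a density that is not Gaussian, so that the equality of the pointwise relation is not carried over to the integrated one. The sketch below writes the integrated momentum dispersion, after the integration by parts, as (ℏ²/4)∫𝜌(∂ln𝜌/∂𝑥)²d𝑥 and evaluates both dispersions with a trapezoid rule. The test density 𝜌(𝑥) = ½ sech²𝑥, the units ℏ = 1, the integration grid and the function names are assumptions of this illustration.

```python
import math

HBAR = 1.0  # units with hbar = 1 (assumption of this sketch)

def rho(x):
    """Hypothetical normalized test density (1/2) sech^2(x)."""
    return 0.5 / math.cosh(x) ** 2

def dlog_rho(x):
    """Analytic derivative of ln(rho) for the test density."""
    return -2.0 * math.tanh(x)

def integrate(f, lo=-30.0, hi=30.0, n=60001):
    """Trapezoid rule on a uniform grid."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n - 1):
        total += f(lo + k * h)
    return total * h

norm = integrate(rho)
mean_x = integrate(lambda x: x * rho(x))
var_x = integrate(lambda x: (x - mean_x) ** 2 * rho(x))
var_p = (HBAR ** 2 / 4.0) * integrate(lambda x: rho(x) * dlog_rho(x) ** 2)
product = var_x * var_p
```

For this density the two integrals are known in closed form (𝜋²/12 and ℏ²/3), so `product` is 𝜋²ℏ²/36, slightly above the bound ℏ²/4; a Gaussian density would saturate the bound.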
We note that one may consider the characteristic function 𝑍(𝑞, 𝛿𝑞; 𝑡) as a momentum partition function, in the same mathematical sense that the function $Z_e = \sum_r e^{-\beta E_r}$ of usual statistical mechanics is an energy partition function. Indeed, we may establish the close formal analogy between these two functions for the derivation of useful statistical quantities. This comparison is presented in Table 1, where we present the cases for calculations with 𝑍𝑒 and with 𝑍(𝑞, 𝛿𝑞; 𝑡). The results shown in Table 1 address an old intuition first presented by Callen (see [43], pp. 458ff.), who argued in the realm of thermodynamics that we should write the characteristic function 𝑍𝑒 in its most general form as

$$ f_i = \frac{1}{Z}\exp\left(-\beta E_i - \vec{\lambda}_p\cdot\vec{P}_i - \vec{\lambda}_J\cdot\vec{J}_i\right), $$

where the Lagrange parameters (𝛽, 𝜆⃗𝑝 , 𝜆⃗𝐽 ) play the same role in the generalized theory as the parameter 𝛽 plays in the usual formalism, and 𝑃⃗⃗ and 𝐽⃗ are the linear and angular momenta, respectively.

Table 1. Comparison between the energy partition function and the momentum partition function as defined in the present work

Quantity | Energy Partition Function | Momentum Partition Function
Definition | $Z_e = \sum_r e^{-\beta E_r}$ | $Z = \int F(q,p;t)\exp\left(\frac{i}{\hbar}p\,\delta q\right)dp$
Mean | $\langle E\rangle = -\frac{\partial \ln Z_e}{\partial\beta}$ | $\langle p\rangle = \lim_{\delta q\to 0} -i\hbar\,\frac{\partial \ln Z}{\partial(\delta q)}$
Second moment | $\langle E^2\rangle = \frac{1}{Z_e}\frac{\partial^2 Z_e}{\partial\beta^2}$ | $\langle p^2\rangle = \lim_{\delta q\to 0} -\frac{\hbar^2}{Z}\frac{\partial^2 Z}{\partial(\delta q)^2}$
Dispersion | $\langle \Delta E^2\rangle = \frac{\partial^2 \ln Z_e}{\partial\beta^2}$ | $\langle \delta p^2\rangle = \lim_{\delta q\to 0} -\hbar^2\,\frac{\partial^2 \ln Z}{\partial(\delta q)^2}$
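The energy column of Table 1 can be verified numerically for any finite spectrum: the mean and the dispersion obtained from derivatives of ln 𝑍𝑒 coincide with the averages computed directly from the Boltzmann weights. The following sketch uses central differences in 𝛽; the three-level spectrum, the value of 𝛽 and the function names are hypothetical choices made only for this illustration.

```python
import math

def log_z(beta, levels):
    """ln Z_e for a finite list of energy levels."""
    return math.log(sum(math.exp(-beta * e) for e in levels))

def mean_and_variance_from_z(beta, levels, h=1e-4):
    """<E> = -d ln Z/d beta and <dE^2> = d^2 ln Z/d beta^2 by central differences."""
    f_plus = log_z(beta + h, levels)
    f_zero = log_z(beta, levels)
    f_minus = log_z(beta - h, levels)
    mean = -(f_plus - f_minus) / (2.0 * h)
    variance = (f_plus - 2.0 * f_zero + f_minus) / (h * h)
    return mean, variance

def mean_and_variance_direct(beta, levels):
    """The same quantities computed as weighted averages over the levels."""
    weights = [math.exp(-beta * e) for e in levels]
    z = sum(weights)
    mean = sum(w * e for w, e in zip(weights, levels)) / z
    mean_sq = sum(w * e * e for w, e in zip(weights, levels)) / z
    return mean, mean_sq - mean ** 2

levels = [0.0, 1.0, 2.5]  # hypothetical spectrum in arbitrary energy units
beta = 0.7

from_z = mean_and_variance_from_z(beta, levels)
direct = mean_and_variance_direct(beta, levels)
```

The momentum column has the same structure, with 𝛿𝑞 playing the role of the parameter and the factor 𝑖/ℏ replacing the minus sign in the exponent.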
Callen thus concludes: “In accepting the existence of a conserved macroscopic energy function as the first postulate of thermodynamics, we anchor that postulate directly in Noether’s theorem and in the time-translation symmetry of the physical laws. An astute reader will perhaps turn the symmetry argument around. There are seven ‘first integrals of the motion’ (as the conserved quantities are known in mechanics). These seven conserved quantities are the energy, the three components of the linear momentum and the three components of the angular momentum; and they follow in parallel fashion from the translation in ‘space-time’ and from rotation. Why, then, does energy appear to play a unique role in thermostatistics? Should not momentum and angular momentum play parallel roles with the energy? In fact, the energy is not unique in thermodynamics. The linear momentum and angular momentum play precisely parallel roles. The asymmetry in our account of thermostatistics is a purely conventional one that obscures the true nature of the subject. We have followed the standard convention of restricting attention to systems that are macroscopically stationary, in which case the momentum and angular momentum arbitrarily are required to be zero and do not appear in the analysis. But astrophysicists, who apply thermostatistics to rotating galaxies, are quite familiar with a more complete
form of thermostatistics. In that formulation the energy, the linear momentum, and the angular momentum play fully analogous roles.(...) The proper ‘first law of thermodynamics’, (...) is the symmetry laws of physics under space-time translation and rotation, and the consequent existence of conserved energy, momentum, and angular momentum functions.” (see [43], pp. 461-462)
We note that we could have defined the characteristic function with the energy term included (multiplied by 𝛿𝑡, and this is exactly what appeared when we made the connections with Feynman’s formalism); the inclusion of this term would not change our derivations. Moreover, the complete ‘momentum term’, which could be written as 𝜆⃗𝑝 ⋅ 𝑃⃗⃗𝑖 + 𝜆⃗𝐽 ⋅ 𝐽⃗𝑖 , is just our term 𝑝⃗ ⋅ 𝛿𝑞⃗, where 𝛿𝑞⃗ plays the role of the parameters 𝜆⃗𝑝 and 𝜆⃗𝐽 , if we remember that 𝑝⃗ already includes angular momentum (as should have become clear when we approached the problem of quantization in generalized coordinates).
3.3. THREE-DIMENSIONAL DERIVATION

The generalization of the entropy derivation to three dimensions is straightforward, but presents some new elements that were kept hidden by the one-dimensional derivation. Thus, we now briefly repeat the previous derivation using three dimensions. We may put

$$ \rho(\vec{q};t) = \int F(\vec{q},\vec{p};t)\,d^3p; \qquad \vec{p}(\vec{q};t)\,\rho(\vec{q};t) = \int \vec{p}\,F(\vec{q},\vec{p};t)\,d^3p $$

as a scalar and a vector quantity, respectively, and

$$ \overleftrightarrow{M}_2(\vec{q};t) = \int \vec{p}\,\vec{p}\,F(\vec{q},\vec{p};t)\,d^3p $$

as a tensor. The Liouville equation becomes

$$ \frac{\partial F(\vec{q},\vec{p};t)}{\partial t} + \frac{\vec{p}}{m}\cdot\nabla_q F(\vec{q},\vec{p};t) - \nabla_q V(\vec{q})\cdot\nabla_p F(\vec{q},\vec{p};t) = 0, $$

where ∇𝑞 and ∇𝑝 are the gradient operators taken with respect to the coordinates and the momenta, respectively. Integration of this last equation with respect to 𝑝⃗ gives immediately

$$ \frac{\partial \rho(\vec{q};t)}{\partial t} + \nabla_q\cdot\left[\frac{\vec{p}(\vec{q};t)}{m}\,\rho(\vec{q};t)\right] = 0, \tag{3.18} $$

which is the continuity equation. Multiplying the Liouville equation by 𝑝⃗ and integrating it with respect to 𝑝⃗ we find

$$ \frac{\partial\big[\vec{p}(\vec{q};t)\,\rho(\vec{q};t)\big]}{\partial t} + \frac{1}{m}\nabla_q\cdot\overleftrightarrow{M}_2(\vec{q};t) + \rho(\vec{q};t)\,\nabla_q V(\vec{q}) = 0. $$
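Expanding the time derivative and using the continuity equation in the previous expression gives

$$\rho(\vec q;t)\left[\frac{\partial\vec p(\vec q;t)}{\partial t}+\nabla_q V(\vec q)\right]+\frac{1}{m}\nabla_q\cdot\overleftrightarrow{M}_2(\vec q;t)-\left[\frac{\vec p(\vec q;t)\,\vec p(\vec q;t)}{m}\cdot\nabla_q\rho(\vec q;t)+\rho(\vec q;t)\frac{\vec p(\vec q;t)\,\nabla_q\cdot\vec p(\vec q;t)}{m}\right]=0. \tag{3.19}$$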
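We now drop the label 'q' in the gradient operators, since only derivatives with respect to $\vec q$ appear. We also define $\overleftrightarrow{M}_2^{q}(\vec q;t)=\vec p(\vec q;t)\,\vec p(\vec q;t)$ and use the relation

$$\nabla\cdot\left[\overleftrightarrow{M}_2^{q}(\vec q;t)\rho(\vec q;t)\right]=\overleftrightarrow{M}_2^{q}(\vec q;t)\cdot\nabla\rho(\vec q;t)+\rho(\vec q;t)\,\vec p(\vec q;t)\,\nabla\cdot\vec p(\vec q;t)+\rho(\vec q;t)\,[\vec p(\vec q;t)\cdot\nabla]\,\vec p(\vec q;t)$$

in expression (3.19) to find

$$\rho(\vec q;t)\left[\frac{\partial\vec p(\vec q;t)}{\partial t}+\nabla\left(\frac{\vec p^{\,2}(\vec q;t)}{2m}\right)+\nabla V(\vec q)\right]+\frac{1}{m}\nabla\cdot\left[\overleftrightarrow{M}_2(\vec q;t)-\overleftrightarrow{M}_2^{q}(\vec q;t)\rho(\vec q;t)\right]=0.$$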
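Writing $\vec p(\vec q;t)=\nabla s(\vec q;t)$ we rewrite the last equation as

$$\rho(\vec q;t)\,\nabla\left\{\frac{\partial s(\vec q;t)}{\partial t}+\frac{[\nabla s(\vec q;t)]^2}{2m}+V(\vec q)\right\}+\frac{1}{m}\nabla\cdot\left[\overleftrightarrow{M}_2(\vec q;t)-\overleftrightarrow{M}_2^{q}(\vec q;t)\rho(\vec q;t)\right]=0. \tag{3.20}$$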
The last term in the brackets can be easily seen to be the generalization for three dimensions of the momentum fluctuation term expressed in (3.8). We thus put

$$\langle\delta\vec p^{\,2}(\vec q;t)\rangle\,\rho(\vec q;t)=\int[\vec p-\vec p(\vec q;t)]\,[\vec p-\vec p(\vec q;t)]\,F(\vec q,\vec p;t)\,d^3p.$$

In fact, this is exactly what one does in classical kinetic theory when using the Liouville equation to calculate the temperature. In such cases one defines the temperature as

$$\frac{3}{2}\rho(\vec q)\,k_BT=\int\frac{m\,\delta\vec v^{\,2}}{2}\,F(\delta\vec v)\,d^3\delta v,$$

and thus (see [39], pp. 155, 156)

$$3mk_BT=\langle\delta\vec p^{\,2}(\vec q;t)\rangle. \tag{3.21}$$
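Now we write

$$Z(\vec q,\delta\vec q;t)=\rho(\vec q;t)\exp\left[\delta\vec q\cdot\nabla S(\vec q;t)-\frac{1}{2k_B}\sum_{ij}\delta q_i\,\delta q_j\left|\frac{\partial^2S}{\partial q_i\partial q_j}\right|\right],$$

where $S$ is the entropy function and we already identify $\rho(\vec q+\delta\vec q;t)$ with $Z(\vec q,\delta\vec q;t)$. Since we are interested in random fluctuations, we must have $\langle\delta q_i\rangle=\langle\delta q_i\delta q_j\rangle=0$, $i\neq j$. We end with

$$\langle\delta q_i^2(\vec q;t)\rangle=k_B\left|\frac{\partial^2S(\vec q;t)}{\partial q_i^2}\right|^{-1}.$$

If we also have for the momenta $\langle\delta p_i\rangle=\langle\delta p_i\delta p_j\rangle=0$, $i\neq j$, and if we use the second axiom to impose that

$$\langle\delta q_i^2(\vec q;t)\rangle\,\langle\delta p_i^2(\vec q;t)\rangle=\frac{\hbar^2}{4},$$

then we find that

$$\langle\delta\vec p^{\,2}(\vec q;t)\rangle=\sum_{i=1}^{3}\langle\delta p_i^2(\vec q;t)\rangle=-\frac{\hbar^2}{4}\nabla^2\ln\rho(\vec q;t) \tag{3.22}$$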
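and, if we put $\rho(\vec q;t)=R(\vec q;t)^2$, as usual, this result allows us to write our equation (3.20) as

$$\rho(\vec q;t)\,\nabla\left\{\frac{\partial s(\vec q;t)}{\partial t}+\frac{[\nabla s(\vec q;t)]^2}{2m}+V(\vec q)-\frac{\hbar^2\,\nabla^2R(\vec q;t)}{2mR(\vec q;t)}\right\}=0. \tag{3.23}$$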
Equations (3.18) and (3.23) are identical to the Schrödinger equation in three dimensions, if we proceed in the same fashion as in the one-dimensional derivation. The notion of temperature is very interesting and we will address this issue more deeply in a future chapter.
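3.4. THE PHASE-SPACE PROBABILITY DENSITY FUNCTION

The previous developments have very important consequences for our representation of the problem upon phase space. The equivalence between the two approaches shows us that we must identify $\rho(q+\delta q;t)$ with $Z(q,\delta q;t)$, since they are exactly equal up to second order in $\delta q$. Then we may use (3.9) to write this identification as

$$\rho(q;t)\exp\left[\frac{1}{k_B}\left(\frac{\partial S(q+\delta q;t)}{\partial q}\right)_{\delta q=0}\delta q+\frac{1}{2k_B}\left(\frac{\partial^2S(q+\delta q;t)}{\partial q^2}\right)_{\delta q=0}\delta q^2\right]=\int F(q,p;t)\exp\left(\frac{i}{\hbar}p\,\delta q\right)dp.$$

This allows us to invert the Fourier transform to find

$$F(q,p;t)=\frac{\rho(q;t)}{\sqrt{2\pi v(q;t)}}\exp\left\{-\frac{[p-p(q;t)]^2}{2v(q;t)}\right\}, \tag{3.24}$$

where we already used

$$p(q;t)=\frac{1}{k_B}\left(\frac{\partial S(q+\delta q;t)}{\partial q}\right)_{\delta q=0}=-i\hbar\lim_{\delta q\to0}\frac{\partial\ln Z(q,\delta q;t)}{\partial(\delta q)}$$

and we put

$$v(q;t)=-\frac{1}{k_B}\left(\frac{\partial^2S(q+\delta q;t)}{\partial q^2}\right)_{\delta q=0}=-\frac{\hbar^2}{4}\frac{\partial^2\ln\rho(q;t)}{\partial q^2}.$$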
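The construction of the Gaussian phase-space density from a given configuration-space density can be sketched numerically. The example below is illustrative only: it assumes units with ℏ = m = ω = 1, takes the n = 1 harmonic-oscillator density as a test case, and uses the hypothetical helper names `variance`, `phase_space_density` and `momentum_moment`, none of which come from the original text. It evaluates v(q) by a finite difference and checks that integrating F over p returns ρ(q), while the second central moment returns v(q)ρ(q).

```python
import math

HBAR = 1.0  # assumption: units with hbar = m = omega = 1


def rho1(q):
    """Normalized density of the n = 1 harmonic-oscillator state (illustrative choice)."""
    return 2.0 / math.sqrt(math.pi) * q * q * math.exp(-q * q)


def variance(rho, q, h=1e-4):
    """v(q) = -(hbar^2 / 4) d^2 ln(rho) / dq^2, by a central finite difference."""
    d2 = (math.log(rho(q + h)) - 2.0 * math.log(rho(q)) + math.log(rho(q - h))) / h**2
    return -0.25 * HBAR**2 * d2


def phase_space_density(rho, p_mean, q, p):
    """Gaussian form of Eq. (3.24) with mean p_mean(q) and variance v(q)."""
    v = variance(rho, q)
    return rho(q) / math.sqrt(2.0 * math.pi * v) * math.exp(-(p - p_mean(q)) ** 2 / (2.0 * v))


def momentum_moment(rho, p_mean, q, k, half_width=12.0, steps=4000):
    """Midpoint-rule estimate of the integral over p of (p - p_mean)^k F(q, p)."""
    dp = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        p = p_mean(q) - half_width + (i + 0.5) * dp
        total += (p - p_mean(q)) ** k * phase_space_density(rho, p_mean, q, p) * dp
    return total


def zero_momentum(q):
    """Mean momentum of a stationary oscillator state."""
    return 0.0


q0 = 0.7
print(momentum_moment(rho1, zero_momentum, q0, 0), rho1(q0))
print(momentum_moment(rho1, zero_momentum, q0, 2), variance(rho1, q0) * rho1(q0))
```

Each printed pair should agree to within the accuracy of the finite difference and the quadrature, which is the statement that F reproduces both the density and the local momentum fluctuation.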
Function $v(q;t)$ (the variance in $p$) will generally be positive definite. If we note that $v(q;t)=\langle\delta p(q;t)^2\rangle$, expression (3.24) tells us that the probability distribution function of any phenomenon of Quantum Mechanics is given, at each point of the configuration space, by a Gaussian function with average momentum $p(q;t)$ and statistical variance $v(q;t)$. This result is of utmost importance and will allow us to clarify the use of the parameter $\delta q$ as an infinitesimal quantity in the previous derivations. Note that this clarification is required for a number of reasons. First, we must know, beyond simple intuitions related to fluctuations, what it means to keep this parameter infinitesimal, since this is the condition to apply both derivations (and this is why the infinitesimal character of $\delta q$ was included in the second axiom of the previous chapter). Note that the condition of infinitesimality is imperative, since the derivations apply only if we consider this parameter infinitesimal. This becomes even more obvious if we consider derivations in generalized coordinates, since we will need the equality $\vec p\cdot\delta\vec q=\vec P\cdot\delta\vec Q$, related to Mathieu canonical transformations, which requires $\delta\vec q$ and $\delta\vec Q$ to be small variations.
Second, and most important, although we are considering the parameter $\delta q$ infinitesimal, we have performed an integration over $\delta q$ in the previous mathematical steps (this is what the inverse Fourier transform does). It is not at all clear how we can consider $\delta q$ infinitesimal and still integrate over the interval $(-\infty,+\infty)$ in this variable; in fact, more than unclear, this seems formally inconsistent. This is not the moment to clarify this issue, but we may still give a clue to the solution. The complete solution of this problem will be given in a future chapter, when we address the connection between Quantum Mechanics and the Central Limit Theorem (and we ask the reader to apply, until then, the principle of charity with respect to the use of $F(q,p;t)$ in our calculations).

The fact that the phase space distribution function is a Gaussian function at each point of the configuration space is very suggestive, for we know that independent random phenomena are represented by such functions in the limit $n\to\infty$, where $n$ is some variable that accounts for the sum of random variables. This is exactly what is done to derive the distribution function for the random walk problem, for instance (see [25], pp. 37-39, where the author uses a characteristic function approach); the Central Limit Theorem, in passing, is called an infinitesimal theorem of statistics. This means (as we will show in full mathematical detail in a later chapter) that the "infinitesimal" character of our parameter $\delta q$ is related to the fact that it may be written as $\delta q=\Delta q/n$, with $n\to\infty$, and it will be shown that we can make our integrations in the variable $\Delta q$, which is a usual finite variable [and thus is not inconsistent with integrations in the interval $(-\infty,+\infty)$]. Before doing that, however, we must show beyond doubt that quantum processes are related to stochastic processes (from which the random character that allows the use of the Central Limit Theorem follows naturally).
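The role of characteristic functions in the Central Limit Theorem can be illustrated with a short numerical sketch. This is the standard textbook statement, not the author's construction with δq = Δq/n, which is deferred to a later chapter. It assumes independent uniform variables with zero mean and unit variance, scaled by 1/√n, and the function names are hypothetical. The characteristic function of the scaled sum is the n-th power of the single-variable one, and it approaches the Gaussian exp(−k²/2) as n grows.

```python
import math


def phi_uniform(k):
    """Characteristic function of a uniform variable on [-sqrt(3), sqrt(3)] (unit variance)."""
    a = math.sqrt(3.0) * k
    return 1.0 if a == 0.0 else math.sin(a) / a


def phi_scaled_sum(k, n):
    """Characteristic function of the sum of n such variables, each divided by sqrt(n)."""
    return phi_uniform(k / math.sqrt(n)) ** n


def gaussian_cf(k):
    """Characteristic function of a standard normal variable."""
    return math.exp(-0.5 * k * k)


# Largest deviation from the Gaussian limit on the grid k = 0, 0.1, ..., 3.
errors = {}
for n in (1, 10, 1000):
    errors[n] = max(abs(phi_scaled_sum(k / 10.0, n) - gaussian_cf(k / 10.0)) for k in range(31))
    print(n, errors[n])
```

The printed deviations shrink as n increases, which is the sense in which a Gaussian emerges from many small independent contributions.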
3.4.1. The Stochastic Liouville Equation

Another point that needs clarification is the fact that the probability distribution function $F(q,p;t)$, when substituted back into the Liouville equation, does not satisfy it at each phase space point $(q,p)$. In fact, this function satisfies only the momentum-integrated version of the Liouville equation, for which we have

$$\int p^k\,L_{op}\big(F(q,p;t)\big)\,dp=0,\qquad k=0,1,2,$$

where $L_{op}$ is the Liouville operator. In fact, if we substitute $F(q,p;t)$ into the Liouville equation, use the Schrödinger equations (3.4) and (3.12) to eliminate the time derivatives of $\rho(q,t)$ and $p(q,t)$, and collect similar terms, we end with the expression

$$2m\rho v^2\,\frac{L_{op}\{F(q,p;t)\}}{F(q,p;t)}=\rho\frac{\partial v}{\partial q}\Big\{p[p-p(q,t)]^2-2v[p-p(q,t)]-vp\Big\}+m\rho\frac{\partial v}{\partial t}\Big\{[p-p(q,t)]^2-v\Big\}+2\frac{\partial p(q,t)}{\partial q}\rho v\Big\{[p-p(q,t)]^2-v\Big\}.$$
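Now, it can be shown that the term $\partial v/\partial t$ can be put in the form

$$m\frac{\partial v}{\partial t}=-\frac{\partial v}{\partial q}\,p(q,t)-2v\frac{\partial p(q,t)}{\partial q}+\frac{\hbar^2}{4}\frac{\partial^3p(q,t)}{\partial q^3}+\frac{\hbar^2}{4\rho}\frac{\partial^2p(q,t)}{\partial q^2}\frac{\partial\rho}{\partial q},$$

and substitution of this term in the previous equation gives, for the most usual case in which $\partial^2p(q,t)/\partial q^2$ and $\partial^3p(q,t)/\partial q^3$ are both equal to zero,

$$L_{op}\{F(q,p;t)\}=\frac{1}{2m}\,\frac{p-p(q,t)}{v(q,t)}\,\frac{\partial v}{\partial q}\left\{\frac{[p-p(q,t)]^2}{v(q,t)}-3\right\}F(q,p;t).$$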
Thus, if we substitute the probability density function (3.24) into the Liouville equation, we do not have an exact analytic solution. But how close to a solution do we get? To answer that question for a specific problem, one can consider the harmonic oscillator and calculate the quantity

$$s_n(q,p;t)=\frac{L_{op}\{F_n(q,p;t)\}}{F_n(q,p;t)}.$$

For the states $n=1,2$ of the harmonic oscillator we have the results shown in Figure 3-1, where it is clear that the function $s_n(q,p;t)$ becomes different from zero ($F_n$ is not a solution of the Liouville equation) only at the points where we find high values of the fluctuations (we divided by $F_n(q,p;t)$ so that the exponential decay of this function does not influence the result). This means that we are very close to a solution, but we failed to take into account the detailed aspect of the fluctuations when we used the Liouville equation, as is obvious from the fact that this equation does not take such behavior into account. The question, then, is this: can we find a stochastic equation, taking into account the presence of fluctuations, that does not alter our derivation? To answer this question, we begin by writing our stochastic Liouville equation as

$$L^{stoc}_{op}\{F_n(q,p;t)\}=\frac{\partial F_n}{\partial t}+\frac{p}{m}\frac{\partial F_n}{\partial q}-\left(\frac{\partial V}{\partial q}+\alpha_n(q,p;t)\right)\frac{\partial F_n}{\partial p}=0,$$
where, now, the term $\alpha_n(q,p;t)$ represents a stochastic force that must be added to the usual Newtonian force. We then substitute our function

$$F_n(q,p;t)=\frac{\rho_n(q,t)}{\sqrt{2\pi v_n(q,t)}}\exp\left\{-\frac{[p-p_n(q;t)]^2}{2v_n(q;t)}\right\}$$

into this equation to find (compare this with Wigner's seminal work [44])

$$\alpha_n(q,p;t)\,\frac{\partial F_n(q,p;t)}{\partial p}=\frac{1}{2m}\,\frac{p-p_n(q,t)}{v_n(q,t)}\,\frac{\partial v_n(q,t)}{\partial q}\left\{\frac{[p-p_n(q,t)]^2}{v_n(q,t)}-3\right\}F_n(q,p;t),$$

and since we have

$$\frac{\partial F_n(q,p;t)}{\partial p}=-\frac{p-p_n(q;t)}{v_n(q;t)}\,F_n(q,p;t),$$

we get

$$\alpha_n(q,p;t)=-\frac{1}{2m}\frac{\partial v_n(q,t)}{\partial q}\left\{\frac{[p-p_n(q;t)]^2}{v_n(q,t)}-3\right\}$$

as the explicit analytic term that must be added to the Liouville equation to make $F_n(q,p;t)$ an analytic solution (whenever $\partial^2p(q,t)/\partial q^2=\partial^3p(q,t)/\partial q^3=0$). Note that $\alpha_n(q,p;t)$ depends only upon the fluctuations, so that our equation becomes

$$L^{stoc}_{op}\{F_n(q,p;t)\}=\frac{\partial F_n}{\partial t}+\frac{p}{m}\frac{\partial F_n}{\partial q}-\left\{\frac{\partial V}{\partial q}-\frac{1}{2m}\frac{\partial v_n(q,t)}{\partial q}\left\{\frac{[p-p_n(q;t)]^2}{v_n(q,t)}-3\right\}\right\}\frac{\partial F_n}{\partial p}=0.$$

Figure 3.1. The solutions of the Liouville equation for states n=1,2 of the harmonic oscillator. The solutions are everywhere zero, except for points close to high values of the fluctuations.
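The effect of the stochastic term can be checked numerically for one state. The sketch below is an illustration under stated assumptions, not the computation behind Figure 3-1: it takes the stationary n = 1 harmonic-oscillator state with ℏ = m = ω = 1, so that V = q²/2, p₁(q) = 0 and v₁(q) = (q² + 1)/(2q²), and approximates the derivatives of F₁ by central finite differences. The function names are hypothetical. The ordinary Liouville operator applied to F₁ leaves a visible residual, while the version with the force α₁ added leaves a residual at the level of the finite-difference error.

```python
import math


def v1(q):
    """Momentum variance of the n = 1 state (hbar = m = omega = 1)."""
    return (q * q + 1.0) / (2.0 * q * q)


def F1(q, p):
    """Gaussian phase-space density for n = 1 with zero mean momentum."""
    rho = 2.0 / math.sqrt(math.pi) * q * q * math.exp(-q * q)
    return rho / math.sqrt(2.0 * math.pi * v1(q)) * math.exp(-p * p / (2.0 * v1(q)))


def alpha1(q, p):
    """Stochastic force alpha = -(1/2m) (dv/dq) [(p - p_n)^2 / v - 3] with p_n = 0."""
    dv_dq = -1.0 / q**3
    return -0.5 * dv_dq * (p * p / v1(q) - 3.0)


def gradients(q, p, h=1e-5):
    """Central finite-difference estimates of dF/dq and dF/dp."""
    dF_dq = (F1(q + h, p) - F1(q - h, p)) / (2.0 * h)
    dF_dp = (F1(q, p + h) - F1(q, p - h)) / (2.0 * h)
    return dF_dq, dF_dp


def liouville_residual(q, p):
    """(p/m) dF/dq - (dV/dq) dF/dp for the stationary state (time derivative is zero)."""
    dF_dq, dF_dp = gradients(q, p)
    return p * dF_dq - q * dF_dp


def stochastic_residual(q, p):
    """Same operator with the force dV/dq replaced by dV/dq + alpha."""
    dF_dq, dF_dp = gradients(q, p)
    return p * dF_dq - (q + alpha1(q, p)) * dF_dp


print(liouville_residual(0.8, 0.5), stochastic_residual(0.8, 0.5))
```

The sample point (0.8, 0.5) is arbitrary; any point with q ≠ 0 can be used.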
Example: just as an example, let us consider the harmonic oscillator problem. Then, for n = 1, 2 we have

$$v_1(q)=\frac{q^2+1}{2q^2},\qquad v_2(q)=\frac{4q^4+4q^2+5}{2(2q^2-1)^2},\qquad p_n(q)=0,$$

and $\rho_1(q)$, $\rho_2(q)$ as usual (with $\hbar=1$, $m=1$). If we substitute these results into $F_n(q,p;t)$ we find $F_1(q,p)$ and $F_2(q,p)$ as

$$F_1(q,p)=\frac{2|q|}{\pi\sqrt{1+q^2}}\,q^2e^{-q^2}\exp\left(-\frac{q^2p^2}{1+q^2}\right),$$

$$F_2(q,p)=\frac{|2q^2-1|}{2\pi\sqrt{4q^4+4q^2+5}}\,(2q^2-1)^2e^{-q^2}\exp\left[-\frac{(2q^2-1)^2}{4q^4+4q^2+5}\,p^2\right],$$

and the stochastic Liouville corrections $\alpha_n(q,p)$ are

$$\alpha_1(q,p)=\frac{p^2}{q(1+q^2)}-\frac{3}{2q^3},\qquad \alpha_2(q,p)=\frac{8p^2q(2q^2+3)}{(2q^2-1)(4q^4+4q^2+5)}-\frac{12q(2q^2+3)}{(2q^2-1)^3}$$
and substitution of these expressions into the stochastic Liouville equation gives zero, as is obvious (by construction). However, this does not suffice for the present analysis. As the reader recalls, our derivation of the Schrödinger equation was done by integrating the usual Liouville equation (to arrive at the continuity equation) and also by integrating the usual Liouville equation multiplied by $p$ (to arrive at the Schrödinger equation proper). If this method of derivation is to remain valid, the inclusion of the stochastic term cannot introduce any spurious term into the resulting equations. This is a constraint upon the fluctuating term. It can be easily shown that we have

$$\int\frac{\partial v_n}{\partial q}\left[\frac{\big(p-p_n(q;t)\big)^2}{v_n(q;t)}-3\right]p^k\,\frac{\partial F_n(q,p;t)}{\partial p}\,dp=0,\qquad k=0,1,2,$$
so that the inclusion of the stochastic term does not alter in any sense the previous derivations. This ought to be the case, since the stochastic Liouville equation must give (i) the conservation of the probability density over the configuration space (integration over 𝑝), (ii) momentum conservation (multiplication by 𝑝 and integration over 𝑝) and (iii) conservation of energy (multiplication by 𝑝2 and integration over 𝑝). It may be thought that this way of approach is circular, since it may appear that we would need the final solution (with the quantum number 𝑛 already specified, etc.) to derive this result. However, this is not so! This fact, indeed, shows the content of the “infinitesimal” constraint referred to in the characteristic function derivation. To see that, consider the new corrected axioms:
Axiom 1 For an isolated system, the joint phase-space probability density related to any Quantum Mechanical phenomenon obeys the stochastic Liouville equation

$$L^{stoc}_{op}\{F_n(q,p;t)\}=\frac{\partial F_n}{\partial t}+\frac{p}{m}\frac{\partial F_n}{\partial q}-\left\{\frac{\partial V}{\partial q}-\frac{1}{2m}\frac{\partial v_n(q,t)}{\partial q}\left\{\frac{[p-p_n(q;t)]^2}{v_n(q,t)}-3\right\}\right\}\frac{\partial F_n}{\partial p}=0. \tag{3.25}$$

Axiom 2 The characteristic function defined upon momentum space as

$$Z(q,\delta q;t)=\int\exp\left(\frac{ip\,\delta q}{\hbar}\right)F(q,p;t)\,dp \tag{3.26}$$

can be expanded only up to second order in $\delta q$.

Axiom 3 The characteristic function can be written as

$$Z(q,\delta q;t)=\psi^*\left(q-\frac{\delta q}{2};t\right)\psi\left(q+\frac{\delta q}{2};t\right). \tag{3.27}$$

Thus, with the second axiom we readily find our phase space distribution function as

$$F(q,p;t)=\frac{\rho(q;t)}{\sqrt{2\pi v(q;t)}}\exp\left\{-\frac{[p-p(q;t)]^2}{2v(q;t)}\right\},$$
which is an exact explicit solution of the stochastic Liouville equation (3.25) (assuming $\partial^2p(q,t)/\partial q^2=\partial^3p(q,t)/\partial q^3=0$). Writing (3.26) up to second order (using (3.27)) and substituting it into (3.25) we find exactly the same results as in the previous axiomatic derivations; that is, we find the Schrödinger equation that fixes the expressions for $\rho_n(q;t)$ and thus $v_n(q;t)$. Moreover, the first three statistical moments of (3.25) with respect to $p$ are all zero (which is a constraint on the result $F(q,p;t)$), showing that the probability density, the momentum and the energy are all conserved quantities. The previous way of presenting the axioms has the advantage of showing explicitly all the mathematical presuppositions of the derivations. The physical presuppositions contained in the previous second axiom will become clear when we connect it with the Central Limit Theorem in a future chapter. In a future section we will present the example of the hydrogen atom, which is a three-dimensional problem. In this case, steps identical to the previous ones (see, for instance, [38], pp. 162, 163) take us to the three-dimensional phase space distribution function

$$F(\vec q,\vec p;t)=\frac{\rho(\vec q;t)}{[2\pi mk_BT(\vec q;t)]^{3/2}}\exp\left\{-\frac{[\vec p-\vec p(\vec q;t)]^2}{2mk_BT(\vec q;t)}\right\},$$

and since we have (3.21), we may rewrite this function as

$$F(\vec q,\vec p;t)=\frac{\rho(\vec q;t)}{[2\pi\langle\delta\vec p(\vec q;t)^2\rangle/3]^{3/2}}\exp\left\{-\frac{3}{2}\frac{[\vec p-\vec p(\vec q;t)]^2}{\langle\delta\vec p(\vec q;t)^2\rangle}\right\}. \tag{3.28}$$
This last function is called a local Maxwellian distribution and has been known in classical kinetic theory since the nineteenth century. It is a very general expression that still guarantees that the equation for the statistical balance is obeyed (see [38], p. 163).
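The moment constraint on the stochastic term stated earlier in this section rests on the Gaussian form of F alone, and it can be checked numerically. The sketch below is an illustration with hypothetical parameter values (mean momentum 0.7, variance 1.3, ∂v/∂q = 1, m = 1) and a unit-density Gaussian; the function name is also hypothetical. It integrates α pᵏ ∂F/∂p over p for k = 0, 1, 2, 3. The first three moments vanish, while k = 3 does not: for a Gaussian it equals 3v(∂v/∂q)/m, so the constraint is specific to the moments used in the derivation.

```python
import math


def stochastic_moment(k, p_mean, v, dv_dq=1.0, m=1.0, steps=6000):
    """Midpoint estimate of the integral over p of alpha * p^k * dF/dp.

    F is a Gaussian in p with unit density, mean p_mean and variance v;
    alpha = -(dv_dq / 2m) * ((p - p_mean)^2 / v - 3).
    """
    sigma = math.sqrt(v)
    lower = p_mean - 12.0 * sigma
    dp = 24.0 * sigma / steps
    total = 0.0
    for i in range(steps):
        p = lower + (i + 0.5) * dp
        u = p - p_mean
        gauss = math.exp(-u * u / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        alpha = -(dv_dq / (2.0 * m)) * (u * u / v - 3.0)
        dF_dp = -(u / v) * gauss
        total += alpha * p**k * dF_dp * dp
    return total


# Hypothetical values chosen only to make the check non-trivial (nonzero mean).
moments = [stochastic_moment(k, p_mean=0.7, v=1.3) for k in range(4)]
print(moments)
```

Because only the Gaussian shape enters, the same check applies at every point q and for every state n.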
3.5. AVERAGE VALUES CALCULATED UPON PHASE SPACE

The path taken by our derivations leaves no room for any sort of ambiguity in the calculation of average values using the probability density function $F_n(q,p;t)$ (we are already assuming that the problem at hand is represented by some quantum number $n$, since we know that the Schrödinger equation will imply the quantization of the spectrum). Given some function $g(q,p)$, its average value with respect to the quantum state labeled by $n$ is calculated simply as

$$\langle g(q,p)\rangle_{n,qp}=\int\!\!\int g(q,p)\,F_n(q,p;t)\,dq\,dp \tag{3.29}$$

while its average value with respect only to the momenta (and thus defined upon configuration space) is given by

$$\langle g(q,p)\rangle_n=\int g(q,p)\,F_n(q,p;t)\,dp.$$

Thus, since we know that the expression for $F_n(q,p;t)$ is given by (3.24), we can readily calculate the average of any function $g(q,p)$ defined upon phase space. In what follows we show how this works for the energy. The average energy (upon configuration space) is given by

$$\langle H(q,p)\rangle_n=\int H(q,p)\,F_n(q,p;t)\,dp=h_n(q;t)\,\rho_n(q;t),$$

where $H(q,p)$ is the classical Hamiltonian and $\langle H(q,p)\rangle_n$ is a Hamiltonian density. Now, assuming, as we do in the derivations, that we may write the Hamiltonian of our problems as

$$H(q,p)=\frac{p^2}{2m}+V(q)$$

and using (3.24) it is very easy to show that

$$h_n(q;t)\,\rho_n(q;t)=\rho_n(q;t)\left[\frac{p_n(q;t)^2}{2m}+V(q)-\frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_n(q;t)}{\partial q^2}\right];$$

the average energy $e_n=\langle H(q,p)\rangle_{n,qp}$ is given by

$$e_n=\int\rho_n(q;t)\left[\frac{p_n(q;t)^2}{2m}+V(q)-\frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_n(q;t)}{\partial q^2}\right]dq, \tag{3.30}$$
where we are assuming that $\rho_n(q;t)$ is normalized. However, we have already shown that we must have (compare with [45])

$$\frac{\partial s_n}{\partial t}+\frac{1}{2m}\left(\frac{\partial s_n}{\partial q}\right)^2+V(q)-\frac{\hbar^2}{2mR_n(q;t)}\frac{\partial^2R_n}{\partial q^2}=0$$

and if we have

$$\psi_n(q;t)=R_n(q;t)\exp\left[\frac{i}{\hbar}s_n(q;t)\right]$$

such that $s_n(q;t)=-E_nt+f_n(q)$ (where $E_n$ is the energy calculated using the Schrödinger equation), as is usual, we find that (we also write $p_n(q;t)$ for $\partial s_n(q;t)/\partial q$)

$$E_n=\frac{p_n(q;t)^2}{2m}+V(q)-\frac{\hbar^2}{2mR_n(q;t)}\frac{\partial^2R_n}{\partial q^2}$$

and thus, given the normalization of $\rho_n(q;t)$, we must have

$$E_n=\int\rho_n(q;t)\left[\frac{p_n(q;t)^2}{2m}+V(q)-\frac{\hbar^2}{2mR_n(q;t)}\frac{\partial^2R_n}{\partial q^2}\right]dq, \tag{3.31}$$

since $E_n$ is a constant. Now, we must compare $E_n$ with $e_n$. We may differentiate the last term in (3.30) and use the fact that $\rho_n(q;t)$ is a probability density to find

$$e_n=\int\left\{\rho_n(q;t)\left[\frac{p_n(q;t)^2}{2m}+V(q)\right]-\frac{\hbar^2}{8m}\rho_n(q;t)\frac{\partial}{\partial q}\left[\frac{1}{\rho_n(q;t)}\frac{\partial\rho_n(q;t)}{\partial q}\right]\right\}dq=\int\left\{\rho_n(q;t)\left[\frac{p_n(q;t)^2}{2m}+V(q)\right]+\frac{\hbar^2}{8m}\frac{1}{\rho_n(q;t)}\left(\frac{\partial\rho_n(q;t)}{\partial q}\right)^2\right\}dq.$$

This expression may be further simplified using $\rho_n(q;t)=R_n(q;t)^2$, which gives

$$e_n=\int\left\{\rho_n(q;t)\left[\frac{p_n(q;t)^2}{2m}+V(q)\right]+\frac{\hbar^2}{2m}\left(\frac{\partial R_n(q;t)}{\partial q}\right)^2\right\}dq$$
and integration by parts in the last term gives directly (3.31). Thus, $E_n=e_n$ and our criterion to calculate the energy from (3.29) is mathematically shown to be appropriate. Other average values may be calculated. It is noteworthy that we immediately have, as we have shown in a previous section,

$$\Delta q\,\Delta p\geq\frac{\hbar}{2},$$

by the very process of derivation. Thus, the present approach implies a true probability density function defined upon phase space representing any quantum state of any quantum mechanical problem. It is also immediate to see that $F_n(q,p;t)$ gives the true probability density function defined upon configuration space for any quantum mechanical problem. The fact that this function is a true probability density function contrasts with distributions derived from other (non-infinitesimal) approaches, like the one based upon Wigner distributions. We will return to this issue in a future chapter, when we address the problem of operator formation in Quantum Mechanics. It is now better to look at some examples to see in less abstract ways how the formalism works.
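The equality between the phase-space average energy (3.30) and the usual eigenvalue can be checked numerically for a simple case. The sketch below is illustrative only: it assumes the harmonic oscillator with ℏ = m = ω = 1, for which p_n = 0, V = q²/2 and the local kinetic term is v_n(q)/2 with v_n = −(1/4)∂² ln ρ_n/∂q², and it hard-codes the n = 0 and n = 1 densities. The function names are hypothetical. A midpoint rule then evaluates the integral in (3.30).

```python
import math


def rho(n, q):
    """Normalized oscillator densities for n = 0 and n = 1 (hbar = m = omega = 1)."""
    if n == 0:
        return math.exp(-q * q) / math.sqrt(math.pi)
    if n == 1:
        return 2.0 / math.sqrt(math.pi) * q * q * math.exp(-q * q)
    raise ValueError("only n = 0 and n = 1 are tabulated in this sketch")


def v(n, q):
    """Momentum variance v_n(q) = -(1/4) d^2 ln(rho_n) / dq^2 for the two states."""
    return 0.5 if n == 0 else (q * q + 1.0) / (2.0 * q * q)


def average_energy(n, q_max=10.0, steps=20000):
    """Midpoint-rule value of Eq. (3.30) with p_n = 0: integral of rho * (V + v/2)."""
    dq = 2.0 * q_max / steps
    total = 0.0
    for i in range(steps):
        q = -q_max + (i + 0.5) * dq  # midpoints never land on q = 0
        total += rho(n, q) * (0.5 * q * q + 0.5 * v(n, q)) * dq
    return total


print(average_energy(0), average_energy(1))
```

The two values can be compared with the oscillator eigenvalues n + 1/2 in these units.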
3.5.1. The One-Dimensional Harmonic Oscillator Example

The harmonic oscillator problem has solutions given by

$$\psi_n(q)=\left(\frac{\beta}{\pi\,2^{2n}\,n!^2}\right)^{1/4}\exp\left(-\frac{\beta q^2}{2}\right)H_n(\sqrt{\beta}\,q),$$

where $\beta=m\omega/\hbar$, and $H_n(\sqrt{\beta}\,q)$ are the Hermite polynomials. In Table 2 we present the probability density function $F_n(q,p;t)$ for various values of $n$. The distribution functions defined upon phase space quickly get very complicated, but the process to find them is mechanical and any algebraic computer program can do the job (as we have done to make this table). The dispersions $\Delta p_n$ and $\Delta q_n$ and the energy $E_n$ can also be calculated using the functions presented in Table 2 and the results are the usual ones (again, algebraic computation was used throughout).

Table 2. Phase space probability density functions for various values of the quantum number n for the harmonic oscillator problem.

n | $F_n(q,p)$
0 | $\dfrac{1}{\pi\hbar}\exp\left[-\beta q^2-\dfrac{p^2}{\hbar^2\beta}\right]$
1 | $\dfrac{2|\sqrt{\beta}\,q|^3}{\pi\hbar\sqrt{\beta q^2+1}}\exp\left[-\beta q^2-\dfrac{p^2q^2}{\hbar^2(\beta q^2+1)}\right]$
2 | $\dfrac{|2\beta q^2-1|^3}{2\pi\hbar\sqrt{4\beta^2q^4+4\beta q^2+5}}\exp\left[-\beta q^2-\dfrac{p^2(2\beta q^2-1)^2}{\hbar^2\beta(4\beta^2q^4+4\beta q^2+5)}\right]$
3 | $\dfrac{|\sqrt{\beta}\,q(2\beta q^2-3)|^3}{3\pi\hbar\sqrt{4\beta^3q^6+9\beta q^2+9}}\exp\left[-\beta q^2-\dfrac{p^2q^2(2\beta q^2-3)^2}{\hbar^2(4\beta^3q^6+9\beta q^2+9)}\right]$
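The mechanical procedure behind such closed forms can be sketched without a computer-algebra system. The example below is an illustration, not the program used to produce the table: it builds ρ_n from the Hermite recurrence, estimates v_n = −(ℏ²/4)∂² ln ρ_n/∂q² by a finite difference, forms the Gaussian F_n with p_n = 0, and compares the result with the n = 1 and n = 3 closed forms at one sample point. The values β = 1.7 and ℏ = 0.9 and all function names are hypothetical choices made for the check.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) from H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def rho(n, q, beta):
    """Normalized density |psi_n|^2 of the harmonic oscillator."""
    norm = math.sqrt(beta / math.pi) / (2.0**n * math.factorial(n))
    return norm * math.exp(-beta * q * q) * hermite(n, math.sqrt(beta) * q) ** 2


def F_general(n, q, p, beta, hbar, h=1e-4):
    """Gaussian phase-space density with p_n = 0 and a finite-difference variance."""
    log_rho = lambda x: math.log(rho(n, x, beta))
    v = -0.25 * hbar**2 * (log_rho(q + h) - 2.0 * log_rho(q) + log_rho(q - h)) / h**2
    return rho(n, q, beta) / math.sqrt(2.0 * math.pi * v) * math.exp(-p * p / (2.0 * v))


def F1_closed(q, p, beta, hbar):
    """Closed form for n = 1."""
    b = beta * q * q
    pref = 2.0 * abs(math.sqrt(beta) * q) ** 3 / (math.pi * hbar * math.sqrt(b + 1.0))
    return pref * math.exp(-b - p * p * q * q / (hbar**2 * (b + 1.0)))


def F3_closed(q, p, beta, hbar):
    """Closed form for n = 3."""
    b = beta * q * q
    poly = 4.0 * b**3 + 9.0 * b + 9.0
    pref = abs(math.sqrt(beta) * q * (2.0 * b - 3.0)) ** 3 / (3.0 * math.pi * hbar * math.sqrt(poly))
    return pref * math.exp(-b - p * p * q * q * (2.0 * b - 3.0) ** 2 / (hbar**2 * poly))


beta, hbar, q0, p0 = 1.7, 0.9, 0.6, 0.4  # hypothetical sample values
print(F_general(1, q0, p0, beta, hbar), F1_closed(q0, p0, beta, hbar))
print(F_general(3, q0, p0, beta, hbar), F3_closed(q0, p0, beta, hbar))
```

The sample point should be kept away from the nodes of ψ_n, where ln ρ_n diverges and the finite difference loses accuracy.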
We can plot these phase space probability distributions for each value of the quantum number $n$. The results are shown in Figure 3-2, together with the profile of the average fluctuations $\langle\delta p(q)^2\rangle$. The fluctuation profiles may be compared to the contours of the probability density functions to make a first tentative interpretation of the results, an interpretation that we will keep improving for the rest of the book. Let us consider the state $n=1$ of the quantum harmonic oscillator. The function $F_1(q,p;t)$ is given in Table 2 and the momentum fluctuation profile is given in Figure 3-2 and Figure 3-3 (compare with the results in [46]). The Schrödinger equation can be written in the format

$$\frac{\partial p_1(q;t)}{\partial t}=-\frac{\partial}{\partial q}\left[\frac{p_1(q;t)^2}{2m}+V(q)-\frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_1(q;t)}{\partial q^2}\right],$$

where we used (3.17). This equation can be rewritten as

$$\frac{\partial p_1(q;t)}{\partial t}+\frac{p_1(q;t)}{m}\frac{\partial p_1(q;t)}{\partial q}=-\frac{\partial V(q)}{\partial q}-\frac{\partial Q_1(q)}{\partial q},$$

where

$$Q_1(q)\equiv Q(q)=-\frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_1(q;t)}{\partial q^2},$$

and it can be intuitively identified with the Hamilton equation

$$\frac{dp_1}{dt}=-\frac{\partial V(q)}{\partial q}-\frac{\partial Q_1(q)}{\partial q}. \tag{3.32}$$
Note that this identification is only an intuitive one. The function $p_1(q;t)$ for the harmonic oscillator is zero. Moreover, when looking at the Hamiltonian formalism, we always consider $p$ and $q$ as independent variables (which become connected by the Hamilton equations), and now we have the connection from the beginning. Function $p_1(q;t)$ also has the obvious interpretation of being an average momentum function defined upon configuration space and so the identification may be heuristically valid, but it lacks formal meaning. If we are interested in finding a result like this from mathematically sound developments, we must follow other ways. We will do that in a future chapter and find the true stochastic Langevin equations for quantum mechanics; at that point we will show that the present discussion has much of the flavour of the real thing, but for now the reader should keep in mind that we are only producing a sketch of an interpretation. Having said this, we can turn our attention to equation (3.32) and write it out by substituting the expression for the derivatives shown there. We then find that our "Hamiltonian" for equation (3.32) is given by (we also write the potential of the harmonic oscillator)

$$H(q,p_1)=\frac{p_1^2}{2m}+\frac{1}{2}m\omega^2q^2+\frac{\hbar^2}{2m}\left(\frac{1+q^2}{q^2}\right),\qquad Q(q)=\frac{\hbar^2}{2m}\left(\frac{1+q^2}{q^2}\right),$$

which has the same appearance as the harmonic oscillator problem, with the extra last term on the right-hand side. This term was first introduced by Bohm [45] and was called the quantum potential. If this term were a truly fluctuating force, then equation (3.32) would be a true Langevin equation. At this point, it is a deterministic equation (but the reader should remember our previous warnings). We can draw the constant energy trajectories for the previous Hamiltonian and they are shown in Figure 3-4. We also show in Figure 3-4 the contours of the probability density function $F_1(q,p;t)$. These two sets of contours bear a striking resemblance to each other, showing that the constant probability curves of the function $F_1(q,p;t)$ are close to the constant energy ones, although not exactly the same.

Figure 3.2. Left column: The phase space probability density functions for n=0 and n=1. Right column: the average fluctuation profiles defined upon configuration space.

Figure 3.3. Left column: The phase space probability density functions for n=1 and n=2. Right column: the average fluctuation profiles defined upon configuration space.
Figure 3.4. Contours of the Hamiltonian and of the probability density function for the level n=1 of the harmonic oscillator problem. For the Hamiltonian, each contour represents a constant energy shell, while for the probability density function each contour represents a constant probability density.
Most important, these two sets of contours give us a clue of what is going on in the detailed microscopic level. Indeed, we may see that the term given by 𝑄(𝑞) has an infinity at the origin; thus if the particle (of whose position 𝑞 is a label) is on a constant energy contour, whenever it comes close to the origin, there appears a very strong force which will be responsible for drastically altering the value of the particle’s momentum. This explains the strong modifications in value that the momenta suffer at points close to the origin, even if the coordinate 𝑞 is not substantially altered. The corpuscle then is thrown away from the origin. This can be seen by the fluctuation profile for the fluctuation in the momentum, shown in Figure 3-2. At great distances from the origin, the term given by 𝑄(𝑞) becomes almost constant and does not produce any appreciable force; thus, the system behaves exactly as a usual harmonic oscillator and the contours approach the form of a circle (or an ellipse, in general). The contours of the probability density function 𝐹1 (𝑞, 𝑝; 𝑡) just reproduce this pattern: since the corpuscle has a tendency to be pushed from the origin, the probability of finding it there is very low (zero, indeed, since there 𝑄(𝑞) goes to infinity). At great distances from the origin, the probability density function behaves as the ground state probability density (in phase space). Note again that this interpretation of the phenomenon must be seen as approximate! As stressed before, the Hamiltonian of the problem (an isolated problem) is given by a
deterministic function and the particle, once on a constant energy contour, will never leave it; thus, the very notion of a probability density function is out of place. One may try to understand this phenomenon by appealing to an ensemble picture, but this will not do, for we are now surely working with one isolated harmonic oscillator. There is a way out of this conundrum and we will only mention it here, for it is necessary first to develop the formal apparatus that makes the solution sound. If the quantum system is truly a random one, and equation (3.32) gives only an average measure of its behavior, then one can assume the validity of the ergodic principle, which would obviously be valid for this case, since we have a quantum mechanical stationary problem. Thus, the argument goes, averages taken over ensembles are equivalent to averages taken over a single system during some minimum time interval that allows the system to occupy all its accessible states by means of the fluctuations. This is the major step that will drive all our future developments. In fact, the rest of this first part of the book can be seen as an attempt to give this claim firm grounds. If the system is random and the ergodic principle can be applied, then, during some time interval, the particle will fill the phase space in a way compatible with the probability density function $F_1(q,p;t)$, that is, behaving grosso modo as we have outlined before. We will then have a system of corpuscular nature behaving in time in such a way that it fills the configuration space with a probability that is mathematically a solution to a wave equation (Schrödinger's, naturally). In a future chapter we will mathematically derive the Langevin equations presenting true random behavior for quantum mechanics and make numerical simulations to show that the previous sketch of an interpretation, made in terms of a corpuscular nature showing undulatory behavior, really obtains.
It is the inclusion of time (in a way different from Bohr’s) that allows us to avoid the ontological dualism of the 1927 interpretation of Quantum Mechanics. At this point, it is nothing but an aperitif, but the next three chapters will settle this question beyond any doubt.
3.5.2. The Hydrogen Atom Example

It is important to present another application of the previous formalism. Attempts to study Quantum Mechanics using phase space approaches are known to have the harmonic oscillator as a particular case because of the specific appearance of the potential function [46]. Thus, showing that the present formalism applies equally well to any quantum mechanical problem in any dimension avoids the criticism about the limited scope of the theory. In any case, we have already shown its adequacy for the general situation in previous sections. We thus take the hydrogen atom problem, for which the normalized probability density function is given by (we put $m=\hbar=a=e=1$)

$$\rho_n(\vec x)=R_{n\ell}(r)^2\,|Y_{\ell m}(\theta,\phi)|^2,$$

where

$$R_{n\ell}(r)=-\left(\frac{2}{n}\right)^{3/2}e^{-r/n}\left(\frac{2r}{n}\right)^{\ell}L^{2\ell+1}_{n+\ell}\left(\frac{2r}{n}\right)$$

with $L^s_k(x)$ the associated Laguerre functions and

$$Y_{\ell m}(\theta,\phi)=\left[\frac{2\ell+1}{4\pi}\frac{(\ell+|m|)!}{(\ell-|m|)!}\right]^{1/2}P_\ell^m(\theta)\,e^{im\phi}$$

the usual spherical harmonics with $P_\ell^m(\theta)$ the associated Legendre functions. We thus have

$$\vec p(\vec x;t)=\nabla s(\vec x;t)=\frac{m}{r\sin\theta}\,\hat\phi,$$

that we got from $s(\vec x;t)=-Et+m\phi$. In Table 3 we have listed the quantum numbers of some states of the hydrogen atom together with the momentum fluctuations, in the first and second columns, respectively. The energy values obtained by explicit calculation of the integral (3.29) with $g(q,p)$ equal to the Hamiltonian function are given in the third column. The results are exactly those found for the hydrogen atom (algebraic computation used throughout). In spherical coordinates the phase space probability density functions are written from the general format [see (3.28)]

$$F(r,\theta,\phi,p_r,p_\theta,p_\phi)=\frac{\rho(\vec r)}{[2\pi\langle\delta\vec p(\vec r)^2\rangle/3]^{3/2}}\exp\left\{-\frac{3[\vec p-\nabla s(\vec r)]^2}{2\langle\delta\vec p(\vec r)^2\rangle}\right\}$$

Table 3. Half-densities, momentum fluctuations and energies for some states of the hydrogen atom problem given by the quantum numbers in the first column ($\xi=\cos\theta$)

$(n,\ell,m)$ | Momentum Fluctuations | Energy
$(1,0,0)$ | $\dfrac{1}{3r}$ | $-\dfrac{1}{2}$
$(2,0,0)$ | $\dfrac{1}{6}\dfrac{8-5r+r^2}{r(r-2)^2}$ | $-\dfrac{1}{8}$
$(2,1,0)$ | $\dfrac{1}{6}\dfrac{r\xi^2+1}{r^2\xi^2}$ | $-\dfrac{1}{8}$
$(2,1,\pm1)$ | $\dfrac{1}{6r}$ | $-\dfrac{1}{8}$
$(3,0,0)$ | $\dfrac{1}{9}\dfrac{2187-1944r+648r^2-84r^3+4r^4}{r(27-18r+2r^2)^2}$ | $-\dfrac{1}{18}$
$(3,1,0)$ | $\dfrac{1}{18}\dfrac{108r\xi^2-27r^2\xi^2+2r^3\xi^2+108-36r+3r^2}{r^2(r-6)^2\xi^2}$ | $-\dfrac{1}{18}$
$(3,1,\pm1)$ | $\dfrac{1}{18}\dfrac{2r^2-27r+108}{r(r-6)^2}$ | $-\dfrac{1}{18}$
where

$$[\vec p-\nabla s(\vec r)]^2=p_r^2+\frac{p_\theta^2}{r^2}+\frac{(p_\phi-m)^2}{r^2\sin^2\theta}$$

and

$$\langle\delta\vec p(\vec r)^2\rangle=-\frac{\hbar^2}{4}\nabla^2\ln\rho(\vec r).$$
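The energy integral (3.29) can be evaluated numerically for spherically symmetric states as a cross-check. The sketch below is illustrative and rests on stated assumptions: units m = ℏ = a = e = 1, the potential V = −1/r, zero mean momentum for m = 0 states, and the fluctuations ⟨δp²⟩ = −(1/4)∇² ln ρ evaluated by hand for the 1s and 2s densities, which gives 1/r and (r² − 5r + 8)/(2r(r − 2)²) respectively. The function names are hypothetical. After the Gaussian momentum integration the local kinetic term is ⟨δp²⟩/2, and what remains is a radial integral.

```python
import math


def radial_integral(f, r_max=60.0, steps=60000):
    """Midpoint-rule integral of f(r) over 0 < r < r_max."""
    dr = r_max / steps
    return sum(f((i + 0.5) * dr) for i in range(steps)) * dr


def rho_1s(r):
    """Normalized 1s density."""
    return math.exp(-2.0 * r) / math.pi


def dp2_1s(r):
    """Total momentum fluctuation -(1/4) laplacian(ln rho) for 1s."""
    return 1.0 / r


def rho_2s(r):
    """Normalized 2s density."""
    return (2.0 - r) ** 2 * math.exp(-r) / (32.0 * math.pi)


def dp2_2s(r):
    """Total momentum fluctuation -(1/4) laplacian(ln rho) for 2s."""
    return (r * r - 5.0 * r + 8.0) / (2.0 * r * (r - 2.0) ** 2)


def energy(rho, dp2):
    """Average of H = p^2/2 - 1/r over a Gaussian F with zero mean momentum."""
    return radial_integral(lambda r: 4.0 * math.pi * r * r * rho(r) * (0.5 * dp2(r) - 1.0 / r))


print(energy(rho_1s, dp2_1s), energy(rho_2s, dp2_2s))
```

The two values can be compared with the energies listed for the (1,0,0) and (2,0,0) states.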
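Table 4. Phase space probability density functions for some levels of the hydrogen atom ($\xi=\cos\theta$)

$(n,\ell,m)$ | $F(r,\theta,\phi,p_r,p_\theta,p_\phi;t)$
$(1,0,0)$ | $\dfrac{3\sqrt{6}\,r^{3/2}}{4\pi^{5/2}}\,e^{-2r}\exp\left\{-\dfrac{3}{2}\left[p_r^2+\dfrac{p_\theta^2}{r^2}+\dfrac{p_\phi^2}{r^2\sin^2\theta}\right]r\right\}$
$(2,0,0)$ | $\dfrac{3\sqrt{3}\,r^{3/2}|r-2|^5}{32\pi^{5/2}(8-5r+r^2)^{3/2}}\,e^{-r}\exp\left\{-3\left[p_r^2+\dfrac{p_\theta^2}{r^2}+\dfrac{p_\phi^2}{r^2\sin^2\theta}\right]\dfrac{r(r-2)^2}{8-5r+r^2}\right\}$
$(2,1,0)$ | $\dfrac{3\sqrt{12}\,|\xi|^5r^5}{64\pi^{5/2}(\xi^2r+1)^{3/2}}\,e^{-r}\exp\left\{-3\left[p_r^2+\dfrac{p_\theta^2}{r^2}+\dfrac{p_\phi^2}{r^2\sin^2\theta}\right]\dfrac{\xi^2r^2}{\xi^2r+1}\right\}$
$(2,1,1)$ | $\dfrac{3\sqrt{12}\,r^{7/2}\sin^2\theta}{128\pi^{5/2}}\,e^{-r}\exp\left\{-3\left[p_r^2+\dfrac{p_\theta^2}{r^2}+\dfrac{(p_\phi-1)^2}{r^2\sin^2\theta}\right]r\right\}$
$(3,1,1)$ | $\dfrac{6\,r^{7/2}|r-6|^5\sin^2\theta}{1458\pi^{5/2}(2r^2-27r+108)^{3/2}}\,e^{-2r/3}\exp\left\{-9\left[p_r^2+\dfrac{p_\theta^2}{r^2}+\dfrac{(p_\phi-1)^2}{r^2\sin^2\theta}\right]\dfrac{r(r-6)^2}{2r^2-27r+108}\right\}$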
In Table 4 we write the probability density functions in phase space for some levels of the hydrogen atom. Unfortunately, it is not possible to draw graphics of these results given the number of dimensions.
3.6. ENTROPY MAXIMIZATION BY F(Q,P;T)

Function $F(q,p;t)$, given the role it plays both in classical kinetic theory and, as shown before, in Quantum Mechanics, introduces a number of quite provocative results. In fact, this function has been known for some time to those working in the field of Density-Functional Theory, although without knowing why its use should be allowed within the realm of Quantum Mechanics [47], [48], [49], [50], [51], [52], [53], [54].

"(...) the analogy between the density-functional theory of the ground states of inhomogeneous electronic systems and the classical thermodynamics of the equilibrium states of nonhomogeneous macroscopic systems: an analogy only, and only approximate, but in any case provocative." (see [47], p. 239, my bold).
Within Density-Functional Theory, this function is introduced by defining the global entropy associated with the density distribution $\rho(\vec q)$ as

$$S = -k_B\int\!\!\int F(\vec q,\vec p;t)\ln F(\vec q,\vec p;t)\,d\vec q\,d\vec p,$$

maximized with respect to the form of $F(\vec q,\vec p;t)$ subject to the constraints

$$\rho(\vec q;t) = \int F(\vec q,\vec p;t)\,d^3\vec p;\qquad t_s(\vec q;t) = \int\frac{p^2}{2m}F(\vec q,\vec p;t)\,d^3\vec p, \tag{3.33}$$

where

$$t_s(\vec q;t) = \frac{\hbar^2}{8m}\left\{\frac{[\nabla\rho(\vec q;t)]^2}{\rho(\vec q;t)} - \nabla^2\rho(\vec q;t)\right\} = -\frac{\hbar^2}{8m}\rho(\vec q;t)\nabla^2\ln\rho(\vec q;t),$$

since it is known that the second equality in (3.33) is the functional expression of the kinetic energy term of quantum mechanics. Using then $\alpha(\vec q)$ and $\beta(\vec q)$ as Lagrange multipliers one arrives at the result

$$F(\vec q,\vec p;t) = e^{-\alpha(\vec q,t)}\,e^{-\beta(\vec q,t)p^2/2}$$

and the first condition in (3.33) implies that

$$\rho(\vec q;t) = \exp[-\alpha(\vec q;t)]\,4\pi\int_0^\infty\exp[-\beta(\vec q;t)p^2/2]\,p^2\,dp = \exp[-\alpha(\vec q;t)]\left[\frac{2\pi}{\beta(\vec q;t)}\right]^{3/2}$$

and one ends with

$$F(\vec q,\vec p;t) = \frac{\rho(\vec q;t)}{[2\pi mk_BT(\vec q;t)]^{3/2}}\exp\left\{-\frac{p^2}{2mk_BT(\vec q;t)}\right\} \tag{3.34}$$

where

$$\beta(\vec q;t) = \frac{1}{k_BT(\vec q;t)} = \frac{3\rho(\vec q;t)}{2t_s(\vec q;t)}.$$

This last equality implies that

$$\frac{1}{\beta(\vec q;t)} = k_BT(\vec q;t) = -\frac{1}{3}\frac{\hbar^2}{4m}\nabla^2\ln\rho(\vec q;t). \tag{3.35}$$
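The two constraints in (3.33) can be checked numerically against the local Maxwellian (3.34). The sketch below is illustrative only: the local values of ρ, k_B T and m are arbitrary assumptions, and the helper names are hypothetical. It integrates over momentum magnitude with a midpoint rule and compares the results with ρ and with (3/2)ρ k_B T, the value of t_s implied by β = 3ρ/(2t_s).

```python
import math

def maxwellian(p, rho, kT, m=1.0):
    # Local Maxwellian of (3.34), written as a function of |p|
    norm = (2.0 * math.pi * m * kT) ** 1.5
    return rho / norm * math.exp(-p * p / (2.0 * m * kT))

def radial_integral(g, p_max, n=20000):
    # Midpoint rule for the integral of 4 pi p^2 g(p) over 0 <= p <= p_max
    dp = p_max / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * dp
        total += 4.0 * math.pi * p * p * g(p) * dp
    return total

rho, kT, m = 0.7, 0.3, 1.0            # arbitrary illustrative local values
p_max = 12.0 * math.sqrt(m * kT)      # far enough into the Gaussian tail

density = radial_integral(lambda p: maxwellian(p, rho, kT, m), p_max)
kinetic = radial_integral(
    lambda p: p * p / (2.0 * m) * maxwellian(p, rho, kT, m), p_max)

print(density, rho)             # first constraint: integral of F equals rho
print(kinetic, 1.5 * rho * kT)  # second constraint: t_s = (3/2) rho k_B T
```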
The result (3.34) together with (3.35) is fully equivalent to (3.28) with (3.22). Thus, the probability density function has the correct functional form needed to maximize the global entropy while giving the correct quantum mechanical probability density function upon configuration space and the correct quantum mechanical kinetic energy. Note that if it gives the correct values for these quantities, it necessarily gives the correct value for the energy: the energy is the sum of the average of the kinetic energy term and the average of a potential term, and the latter will always agree with the one calculated by usual methods, since the probability density function defined upon configuration space is, by construction, the correct quantum mechanical one.

The results of this section may enlighten us about some issues that appeared in section 3.4.1 related to the Stochastic Liouville Equation. Indeed, the local Maxwellian plays a very important role within fluid mechanics because it is the phase-space density function that satisfies the H-theorem (about the irreversibility of processes). This means that if we had begun with a Stochastic Boltzmann Equation (a Boltzmann equation with the extra term $\alpha_n(x,p)$ defined in 3.4.1) instead of a Stochastic Liouville Equation, then we would still arrive at the same Schrödinger equation by the very same derivation process. This is so because the Stochastic Boltzmann Equation would be written as

$$\frac{\partial F_n}{\partial t} + \frac{p}{m}\frac{\partial F_n}{\partial q} - \left\{\frac{\partial V}{\partial q} - \frac{1}{2m}\frac{\partial v_n(q,t)}{\partial q}\left\{\frac{[p-p_n(q;t)]^2}{v_n(q,t)} - 3\right\}\right\}\frac{\partial F_n}{\partial p} = \left(\frac{\partial F_n}{\partial t}\right)_{\mathrm{coll}},$$
where the term on the right is a collisional term. However, this term also gives zero when integrated in 𝑝, when multiplied by 𝑝 and integrated in 𝑝, or when multiplied by 𝑝2 and integrated in 𝑝; this is so because the collisional term cannot change the three conservation equations (conservation of probability, conservation of momentum and conservation of energy). Moreover, the Maxwellian is an exact solution for the collisional term, precisely because of the H-theorem. This clearly means that the integration over 𝑝 washes out many details of the underlying phenomenon, even making different equations indistinguishable from one another. In a future section we will connect this feature of the formalism with the fact that, indeed, the Schrödinger equation washes out all the elements related to fluctuations and must be considered, therefore, as a mean field equation (with an implicit stochastic support).

It has also already been shown in the literature that the function 𝐹(𝑞⃗, 𝑝⃗; 𝑡) is equivalent to taking the Wigner distribution function 𝐹𝑊 (𝑞⃗, 𝑝⃗; 𝑡) up to second order [50]. We leave the discussion of the connection between 𝐹(𝑞⃗, 𝑝⃗; 𝑡) and 𝐹𝑊 (𝑞⃗, 𝑝⃗; 𝑡) for a future chapter. For our present purposes, it is now more important to turn our attention to making explicit the role played by the fluctuations in Quantum Mechanics. This is the content of the next three chapters.
Chapter 4
THE STOCHASTIC DERIVATION

In the previous chapter we made some developments in the direction of showing that there must be a stochastic support for quantum mechanics. These developments were possible because one of our derivations of the Schrödinger equation is based on the concept of entropy, which is akin to the concept of fluctuations (by means of the fluctuation-dissipation theorem, for instance). The present chapter is intended to establish the connection between Quantum Mechanics and stochastic behavior on sound, although not definitive, grounds¹. Some of the results to follow are not new; the search for a stochastic support for Quantum Mechanics has taken place since the early 1950s [55,56] and became a fertile research field in the following two decades [57, 58, 59, 60, 61, 62]. It is still an important field for investigation of the mathematical and interpretive foundations of Quantum Mechanics.

¹ Most of the results of the present chapter were published in Phys. Rev. A 61, 052109 (2000).

The stochastic approach to Quantum Mechanics can be illustrated by the mathematical derivation of the quantum mechanical formalism using only the formal apparatus of classical statistical mechanics [63]. The model underlying this approach is one in which the particles of a system, interacting via mutual forces, remain in dynamic equilibrium because of the balance of these forces with a stochastic force responsible for the particle’s random movement [64]. One of the problems of this derivation, however, is to explain the source or origin of that stochastic force. In fact, when carrying out the derivation of the Schrödinger equation from stochastic principles, one needs, in general, to introduce by postulate the stochastic velocity and acceleration and relate them, also by postulate, with the generalized stochastic force.

Thus, although the stochastic approach shows mathematically that it is possible to connect the Schrödinger equation with a stochastic behavior, the introduction of this behavior by means of postulates may seem a way to force the result (the Schrödinger equation). Moreover, accepting the stochastic behavior without knowing its origin implies basing the derivation of the Schrödinger equation on an obscure element that, because of this approach’s postulates, becomes irremovable within its realm. This is why some authors defending this approach sustain, for example, that the source of the stochastic force would be some all-pervasive universal field, which seems obscure, if not preposterous, from any reasonable perspective. In fact, an all-pervasive universal stochastic field is as obscure, undetectable, etc., as is the conscious observer of some interpretations of the quantum formalism (despite the attempts to relate this pervasive field with the cosmic electromagnetic background).

However, if we mathematically connect this derivation with our previous ones (and Feynman’s), we may profit from the best of two worlds: first, we may find that the source of the stochastic behavior must be internal to the isolated systems, since the former three derivations did not rely upon any variable foreign to the quantum systems themselves. Second, we may also profit from the sound establishment of the stochastic support of Quantum Mechanics, which is crucial for us to point to a solution regarding the dual behavior of single isolated corpuscular quantum systems.

In the following we will present the stochastic derivation as it was presented in [65], since it seems to be the most general and mathematically clearest approach to the problem. We then show that this derivation is mathematically equivalent to the one presented in the previous chapters and address some other topics that may be developed within the stochastic framework.
4.1. THE STOCHASTIC APPROACH

The two guiding principles of this derivation are: (a) the theory must be an extension of Newtonian mechanics and (b) the velocities and forces must transform according to certain rules with respect to time-inversion. As will be shown, the derivation depends upon a parameter $\lambda$ which

“(...)appears in an unusual energy term in our generalized Schrödinger equation. (...) If it is unit, the classical quantum-mechanical case is obtained, while a value different from unit gives rise to the equations of Brownian motion and adds a new term to Schrödinger’s equation. This means that, although in our theory there is an interaction of the particle with its surroundings, postulated from the beginning, this interaction remains hidden in the quantum-mechanical case but gives rise to a dissipative term in the Brownian case.” (see [65], p. 1621. My emphasis)

The fact that the “interaction remains hidden” in the stochastic approach is the origin of its difficulty in correctly attributing the source of randomness. Thus, let us begin by assuming that the velocity $\vec c$ of the particle is the sum of a systematic or current velocity $\vec v$ and a stochastic component $\vec u$,

$$\vec c = \vec v + \vec u,$$

and let us introduce the time inversion operator $\hat T$. We now impose (a first axiom) that the velocities transform under $\hat T$ as

$$\hat T\vec v = -\vec v;\qquad \hat T\vec u = \vec u. \tag{4.1}$$
The first equality imposes itself since we want our formalism to imply Newtonian mechanics in the limit in which stochastic velocities are absent², and, in this case, the velocity must be a vector under time inversion. We thus have

$$\hat T\vec c = \vec c\,' = -\vec v + \vec u$$

and thus

$$\vec v = \frac{1}{2}(\vec c - \vec c\,');\qquad \vec u = \frac{1}{2}(\vec c + \vec c\,').$$

We now need an operator that correlates the position and velocities (like $d/dt$ in usual Newtonian mechanics). We thus assume that there exists a distribution of the changes in the space coordinate $\delta\vec q = \vec q(t+\delta t) - \vec q(t)$, occurring in a small time interval $\delta t$. Suppose now that $f(\vec q;t)$ is any smooth function of $\vec q$ and $t$; we can write

$$\frac{f(\vec q(t+\delta t),t+\delta t) - f(\vec q(t),t)}{\delta t} \approx \left[\frac{\partial}{\partial t} + \frac{1}{\delta t}\sum_i[q_i(t+\delta t)-q_i(t)]\frac{\partial}{\partial q_i} + \frac{1}{2\delta t}\sum_{i,j}[q_i(t+\delta t)-q_i(t)][q_j(t+\delta t)-q_j(t)]\frac{\partial^2}{\partial q_i\partial q_j} + \cdots\right]f(\vec q(t),t).$$

Taking the average of the above expression (average values with respect to the $\delta\vec q$ distributions) we find

$$\hat D f(\vec q;t) = \lim_{\delta t\to 0}\frac{\langle f(\vec q(t+\delta t),t+\delta t) - f(\vec q(t),t)\rangle}{\delta t} = \left[\frac{\partial}{\partial t} + \sum_i c_i\frac{\partial}{\partial q_i} + \sum_{ij}D_{ij}\frac{\partial^2}{\partial q_i\partial q_j} + \cdots\right]f(\vec q;t), \tag{4.2}$$
where $c_i$, $2D_{ij}$, ... stand for the limits of the first-, second-, ... order moments of the distribution divided by $\delta t$, and we are identifying $c_i$ with the components of the previously introduced velocity $\vec c$; note that all the dependence of the equation on $\delta\vec q$ is now encoded in the coefficients $c_i$, $2D_{ij}$, … We now assume that the matrix $D_{ij}$ is diagonal³ and write

$$\hat D f(\vec q;t) = \frac{\partial f(\vec q;t)}{\partial t} + \vec c\cdot\nabla f(\vec q;t) + D\nabla^2 f(\vec q;t) + \cdots$$

Note that $\hat D \to \partial/\partial t + \vec c\cdot\nabla = d/dt$ in the limit when $D\to 0$. This operator $\hat D$ gives under time inversion the result

$$\hat D' f(\vec q;t) = \hat T\hat D f(\vec q;t) = -\frac{\partial f(\vec q;t)}{\partial t} + \vec c\,'\cdot\nabla f(\vec q;t) + D'\nabla^2 f(\vec q;t),$$

and we call $\hat D$ the mean forward derivative operator, while we call $\hat D'$ the mean backward derivative operator, since in the limit $\vec u\to 0$ this operator goes into $-d/dt$. We end up with

$$\hat D = \frac{\partial}{\partial t} + \vec c\cdot\nabla + D\nabla^2 + \cdots$$
$$\hat D' = -\frac{\partial}{\partial t} + \vec c\,'\cdot\nabla + D'\nabla^2 + \cdots \tag{4.3}$$

² At this point the author affirms that: "(...) $\vec v$ being the velocity in Newtonian mechanics but $\vec u$ having no classical analog." [see Pena, 1969, p. 1621, before equation (3)] This is a misconception that is worth commenting on here. Saying that $\vec u$ has no Newtonian counterpart does not imply that it has no classical analog. Newtonian mechanics is one classical theory which is not based upon the constructs of random movements, etc. Stochastic behavior is a statistical construct that does not decide if we are within a classical or a non-classical theory. It is quite obvious that there are classical theories in which stochastic behavior does take part (canonical ensembles in classical thermostatistics, for instance).
and we note that we readily have

$$\hat D q_i = c_i;\qquad \hat D' q_i = c_i',$$

which implies that

$$\vec v = \frac{1}{2}(\hat D - \hat D')\vec q = \hat D_c\vec q,\qquad \vec u = \frac{1}{2}(\hat D + \hat D')\vec q = \hat D_s\vec q,$$

where $\hat D_c$ is the current derivative operator and $\hat D_s$ is the stochastic derivative operator. Using (4.3) these new operators may be written as

$$\hat D_c = \frac{\partial}{\partial t} + \vec v\cdot\nabla + D_-\nabla^2 + \cdots,\qquad \hat D_s = \vec u\cdot\nabla + D_+\nabla^2 + \cdots \tag{4.4}$$
where $D_+ = (D + D')/2$ and $D_- = (D - D')/2$. We note that $\hat D_s\vec q = 0$ in the Newtonian limit.

We now need to introduce a force to build a dynamic theory. We follow Newton’s prescription and assume (another axiom) that the acceleration is given by

$$\vec a = \hat D\vec c,$$

and is calculated as the forward derivative of the general velocity. This means that we have

$$\vec a = \hat D_c\vec v + \hat D_s\vec u + \hat D_c\vec u + \hat D_s\vec v$$

that reduces in the Newtonian limit to the known result $\vec a = \hat D_c\vec v = d\vec v/dt$, which is the usual result for the total acceleration acting on the particle. Let us consider now only $\hat T$-invariant forces (not depending upon velocities). Since we want to identify the acceleration $\vec a$ with the total force acting upon our particle, we must have $\vec a$ as a $\hat T$-invariant quantity. However, we have that

$$\hat D_c' = \hat T\hat D_c = -\hat D_c;\qquad \hat D_s' = \hat T\hat D_s = \hat D_s$$

and thus

$$\hat T\vec a = \hat D_c\vec v + \hat D_s\vec u - \hat D_c\vec u - \hat D_s\vec v,$$

which means that

$$\hat D_c\vec v + \hat D_s\vec u = \vec a;\qquad \hat D_c\vec u + \hat D_s\vec v = 0. \tag{4.5}$$

³ The reader may have noticed that this is the same as assuming in the three-dimensional entropy derivation that $\langle\delta q_i\,\delta q_j\rangle = 0$ (for $i\neq j$), that is, the fluctuations related to different directions are statistically independent, which in turn implies that $[\hat q_i,\hat q_j] = 0$ in the quantum mechanical formalism.
Now we may identify the total force $\vec f$ with our acceleration $\vec a$ as $\vec f = m\vec a$. We put $\vec a = \vec a_c + \vec a_s$, where

$$\vec a_c = \hat D_c\vec v = \hat D_c^2\vec q = \hat T\vec a_c = \vec a_c',\qquad \vec a_s = \hat D_s\vec u = \hat D_s^2\vec q = \hat T\vec a_s = \vec a_s'.$$

The second equation in (4.5) then shows that the current changes of $\vec u$ are always compensated by the changes impressed by the stochastic motion on $\vec v$. Our last postulate for this approach is given by the association of the external force with a combination of current and stochastic accelerations in the form

$$\vec f_0 = m(\lambda_1\vec a_c - \lambda\vec a_s)$$

and since the total force is given by $\vec f = m(\vec a_c + \vec a_s)$ we may write $\vec f$ as a linear combination of the external force and $\vec a_s$. From space-time translational symmetry we conclude that $\lambda_1$ and $\lambda$ must be constants, and since we search for an equation that reproduces Newtonian mechanics in the limit $\vec a_s\to 0$, we must have $\lambda_1 = 1$ and thus

$$\vec f_0 = m(\vec a_c - \lambda\vec a_s);\qquad \vec f = \vec f_0 + m(\lambda+1)\vec a_s. \tag{4.6}$$
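Equations (4.5) and (4.6) imply that our system is described by

$$\vec f_0 = m(\hat D_c\vec v - \lambda\hat D_s\vec u),\qquad \hat D_c\vec u + \hat D_s\vec v = 0$$

and using the operators (4.4) we may write these equations more explicitly as

$$\frac{\partial\vec v}{\partial t} + (\vec v\cdot\nabla)\vec v + D_-\nabla^2\vec v - \lambda(\vec u\cdot\nabla)\vec u - \lambda D_+\nabla^2\vec u + \cdots = \vec f_0/m$$
$$\frac{\partial\vec u}{\partial t} + (\vec v\cdot\nabla)\vec u + D_-\nabla^2\vec u + (\vec u\cdot\nabla)\vec v + D_+\nabla^2\vec v + \cdots = 0, \tag{4.7}$$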
which are our primary equations. These “equations (...) describe the motion of a particle subject simultaneously to the action of a 𝑇̂-invariant external force 𝑓⃗0 and a stochastic force generated by the interaction of the particle with its surroundings, under the assumption that the velocity may be written as the sum 𝑣⃗ + 𝑢 ⃗⃗ of a systematic and a stochastic component, each transforming under 𝑇̂ according to (4.1), and that the external force 𝑓⃗0 is related to the total force as in (4.6) with constant 𝜆. Since the two velocities 𝑣⃗ and 𝑢 ⃗⃗ satisfy a system of coupled equations, they are not independent, the stochastic and systematic motions influencing one another in a complex way. Due to this fact, we may expect that the motion of a particle satisfying Eqs. (4.7) differs fundamentally from the corresponding Newtonian case. Clearly, in the Newtonian limit, when 𝐷+ = 𝐷− = 0, etc., the second equation in (4.7) has the trivial solution 𝑢 ⃗⃗ = 0 and then the first one of these equations reduces to Newton’s second law 𝑚𝑑𝑣⃗/𝑑𝑡 = 𝑓⃗0.” (See ref. [65], p. 1624. My emphasis).
Note that the stressed part of the previous quotation is the source of the interpretation according to which there may be an all-pervasive background stochastic field. There is nothing in the derivation that allows one to assume that the physical system under scrutiny is or is not connected with a reservoir that produces this background stochastic field, since the appearance of the fluctuation was postulated, that is, since the fluctuating force was put into the formal developments from the outset (by axiom). If the present formalism allows us to derive the Schrödinger equation and if we manage to show that this derivation is equivalent to those presented in this book, then the interpretation of an external reservoir producing background fluctuations becomes untenable, since this book’s derivations clearly deny the existence of any external source of energy or force whatsoever.

To derive the Schrödinger equation we need equations less general than (4.7). We thus restrict our investigations to cases in which $D_+$ and $D_-$ depend only upon time, $\vec c$ and $\vec c\,'$ (and thus $\vec v$ and $\vec u$) are irrotational, and the external force may be derived from a potential. Noting that

$$\nabla(\vec A\cdot\vec B) = (\vec A\cdot\nabla)\vec B + \vec A\times(\nabla\times\vec B) + (\vec B\cdot\nabla)\vec A + \vec B\times(\nabla\times\vec A),$$

we write

$$(\vec v\cdot\nabla)\vec v = \nabla\left(\frac{\vec v\cdot\vec v}{2}\right);\qquad (\vec u\cdot\nabla)\vec u = \nabla\left(\frac{\vec u\cdot\vec u}{2}\right);\qquad (\vec v\cdot\nabla)\vec u + (\vec u\cdot\nabla)\vec v = \nabla(\vec u\cdot\vec v). \tag{4.8}$$
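We also use the fact that $\nabla\times(\nabla\times\vec A) = \nabla(\nabla\cdot\vec A) - \nabla^2\vec A$ to write

$$\nabla^2\vec v = \nabla(\nabla\cdot\vec v);\qquad \nabla^2\vec u = \nabla(\nabla\cdot\vec u). \tag{4.9}$$

Equalities (4.8) and (4.9) allow us to rewrite (4.7) as

$$\frac{\partial\vec v}{\partial t} + \nabla\left[\frac{v^2}{2} + D_-\nabla\cdot\vec v - \lambda\frac{u^2}{2} - \lambda D_+\nabla\cdot\vec u\right] + \cdots = -\nabla V/m$$
$$\frac{\partial\vec u}{\partial t} + \nabla\left[\vec u\cdot\vec v + D_-\nabla\cdot\vec u + D_+\nabla\cdot\vec v\right] + \cdots = 0, \tag{4.10}$$

which is the system of equations that we will show to be equivalent, under some assumptions, to the Schrödinger equation.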
4.1.1. The Derivation of the Schrödinger Equation

Equations (4.10) are a system of coupled nonlinear differential equations. In some special cases this system may become uncoupled and linear. To show this, let us use the assumption that both $\vec v$ and $\vec u$ are irrotational to write

$$\vec v = \frac{\nabla s(\vec q;t)}{m};\qquad \vec u = 2D_0\nabla\ln R(\vec q;t), \tag{4.11}$$
where $R(\vec q;t)$ and $s(\vec q;t)$ are real dimensionless functions of $\vec q$ and $t$ and $D_0$ is a constant such that $D_+ = D_0\eta_+$ and $D_- = D_0\eta_-$, where $\eta_+$ and $\eta_-$ are dimensionless functions of time. When $D_+$ does not depend upon time, it is clear that we may put $\eta_+ = 1$. It is very important to notice that the results of the previous sections were developed to arbitrary order in the Taylor expansions in $\delta t$, although only terms up to second order were shown. We now follow [65] and

“(...) restrict ourselves to a special case of Eqs. (4.10), namely, that which is obtained by putting all the coefficients of the terms of order > 2 in the series (4.4) equal to zero, i.e., by assuming that, for $\delta t\to 0$, only the first and second moments become proportional to $\delta t$. (...) Truncating the Taylor series, as was done above, is equivalent to stating a priori that all moments of order greater than two are of order $(\delta t)^k$, with $k>1$, so that in the limit $\delta t\to 0$ the ratio of these moments to $\delta t$ vanish. This would appear to mean that the process is Markovian, since for this type [of processes] the higher coefficients do, indeed, vanish.”
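The truncation condition quoted above can be illustrated with a simple case. The following is a minimal sketch, assuming (purely for illustration, this is not stated in the source) that the one-dimensional increment δq in a time δt is Gaussian with mean cδt and variance 2Dδt; the values of c and D are arbitrary. It uses the closed-form Gaussian moments to show that the first moment over δt tends to c, the second over 2δt tends to D, and the third and fourth over δt tend to zero.

```python
def increment_moments(c, D, dt):
    # Raw moments of a Gaussian increment with mean c*dt and variance 2*D*dt
    mu, var = c * dt, 2.0 * D * dt
    m1 = mu
    m2 = mu ** 2 + var
    m3 = mu ** 3 + 3.0 * mu * var
    m4 = mu ** 4 + 6.0 * mu ** 2 * var + 3.0 * var ** 2
    return m1, m2, m3, m4

c, D = 1.5, 0.5  # illustrative drift velocity and diffusion coefficient
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    m1, m2, m3, m4 = increment_moments(c, D, dt)
    # Columns: dt, <dq>/dt, <dq^2>/(2 dt), <dq^3>/dt, <dq^4>/dt
    print(dt, m1 / dt, m2 / (2.0 * dt), m3 / dt, m4 / dt)
```

In this example the higher moments are of order (δt)^2, so their ratio to δt vanishes linearly as δt goes to zero, in line with the condition k > 1 in the quotation.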
Note that the present derivation, if successful, gives us a very good clue about the significance of making $\delta t$ or $\delta q$ ‘infinitesimal’ (going up only to second order in Taylor expansions in these variables). In the next chapter we will mathematically show that this is precisely the case, and this result is a beautiful example of how one derivation may help clarify another, as long as they are mathematically connected and show themselves equivalent. Taking (4.11) into (4.10) we find

$$\nabla\left\{\frac{\partial s(\vec q;t)}{\partial t} + V(\vec q) - \left[\eta_-\nabla^2 s - \frac{(\nabla s)^2}{2m} + 2mD_0^2\lambda\eta_+\left[\frac{\nabla^2R}{R} - \left(\frac{\nabla R}{R}\right)^2\right] + 2mD_0^2\lambda\left(\frac{\nabla R}{R}\right)^2\right]\right\} = 0,$$

$$\nabla\left\{\frac{2}{R}\frac{\partial R(\vec q;t)}{\partial t} - \left[2D_0\eta_-\left[\frac{\nabla^2R}{R} - \left(\frac{\nabla R}{R}\right)^2\right] - 2\frac{\nabla s\cdot\nabla R}{mR} - \frac{\eta_+}{m}\nabla^2 s\right]\right\} = 0,$$

which can be readily integrated. As before, the integration constants may be absorbed within $R(\vec q;t)$ and $s(\vec q;t)$. We now fix the constants as

$$\eta_+ = 1,\quad \eta_- = 0,\quad D_+ = D_0,\quad \lambda = 1 \tag{4.12}$$
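to rewrite the previous two differential equations as

$$\frac{\partial s(\vec q;t)}{\partial t} + V(\vec q) + \frac{(\nabla s)^2}{2m} - 2mD_0^2\frac{\nabla^2R}{R} = 0$$
$$\frac{2}{R}\frac{\partial R(\vec q;t)}{\partial t} + \left[\frac{2}{R}\frac{\nabla s}{m}\cdot\nabla R + \frac{\nabla^2 s}{m}\right] = 0, \tag{4.13}$$

where we have already integrated them to remove their outermost gradients. The last of these two equations may be written as

$$2R(\vec q;t)\frac{\partial R(\vec q;t)}{\partial t} + \left[2\frac{\nabla s}{m}\cdot R\nabla R + R^2\frac{\nabla^2 s}{m}\right] = \frac{\partial R(\vec q;t)^2}{\partial t} + \nabla\cdot\left[R(\vec q;t)^2\frac{\nabla s(\vec q;t)}{m}\right] = 0$$

and is the continuity equation. The first equation in (4.13) is our known Madelung equation if we put $D_0 = \hbar/2m$. As we have already noted in previous chapters, the two equations in (4.13) are equivalent to the Schrödinger equation if we put

$$\psi(\vec q;t) = R(\vec q;t)\exp\left[\frac{is(\vec q;t)}{2mD_0}\right].$$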
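As a concrete check of the first equation in (4.13), one can take a stationary state for which everything is known in closed form. The sketch below assumes, as an illustration not taken from the text, the one-dimensional harmonic oscillator ground state in units ℏ = m = ω = 1, so that R(x) = exp(-x²/2) (unnormalized), s = -Et with E = 1/2, and D₀ = ℏ/2m; all names are hypothetical. The residual of the Madelung equation is evaluated with a finite-difference second derivative.

```python
import math

HBAR = M = 1.0
D0 = HBAR / (2.0 * M)   # identification D_0 = hbar / 2m
DS_DT = -0.5            # s = -E t with E = 1/2 for the ground state

def R(x):
    # Unnormalized ground-state amplitude (normalization cancels in R''/R)
    return math.exp(-0.5 * x * x)

def V(x):
    return 0.5 * x * x

def second_derivative(f, x, h=1e-3):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def madelung_residual(x):
    # ds/dt + V + (grad s)^2 / 2m - 2 m D0^2 R''/R; grad s = 0 for this state
    quantum_term = -2.0 * M * D0 ** 2 * second_derivative(R, x) / R(x)
    return DS_DT + V(x) + quantum_term

for x in (-2.0, -0.5, 0.0, 1.0, 2.0):
    print(x, madelung_residual(x))  # each residual should be close to zero
```

The second equation in (4.13) is satisfied trivially here, since R does not depend on time and s does not depend on position.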
This derivation, in turn, may help us appreciate the arguments of Wallstrom [26], presented at the end of the first section, chapter two, after equation (2.10). Indeed, we must agree with Wallstrom that, from the point of view of the present derivation, there is no need for $\psi$ to be single valued and thus, strictly speaking, the Schrödinger equation is just a particular case of the previous derivation, when we assume as another axiom that $\psi$ must indeed have this property. Now, the equivalence between the derivation of this chapter and those of the previous chapters, once mathematically proved, will show that we must assume that $\psi$ is single valued, since it must be considered a probability amplitude.

One of the most important expressions of this section is (4.11), which we rewrite here as

$$\vec u(\vec q;t) = \frac{\hbar}{m}\nabla\ln R = \frac{\hbar}{2m}\nabla\ln\rho(\vec q;t) = \frac{\hbar}{2mk_B}\nabla S(\vec q;t), \tag{4.14}$$
where we put, as usual, $\rho(\vec q;t) = R^2(\vec q;t)$ and wrote $S(\vec q;t) = k_B\ln\rho(\vec q;t)$ for the entropy. This last equation associates the stochastic velocity with the gradient of the entropy, which is precisely the content of the fluctuation-dissipation theorem (see [25], pp. 594-597), generally written as

$$u_i = \frac{d\langle\delta q_i\rangle}{dt} = \sum_j\alpha_{ij}\frac{1}{k_B}\frac{\partial S}{\partial q_j}, \tag{4.15}$$

where

$$\alpha_{ij} = \int_{-\infty}^{0}\langle c_i(0)c_j(w)\rangle\,dw = \frac{\hbar}{2m} \tag{4.16}$$

is a coupling constant related to the total velocity correlation function. We will return to these results when we develop the Langevin equations for truly random Quantum Mechanical systems; for now it suffices to see that we must have

$$\vec c = \frac{\hbar}{m}\nabla\left[\frac{1}{2}S(\vec q;t) + s'(\vec q;t)\right], \tag{4.17}$$

where $s'(\vec q;t) = s(\vec q;t)/\hbar$, which is dimensionless.
4.2. CONNECTIONS WITH PREVIOUS DERIVATIONS

The derivation of the previous section and the one we made in chapter three are seemingly equivalent. To see this equivalence in more explicit terms, we may work backwards. We thus begin with the stochastic expressions (4.10) and the assumptions (4.12) to write

$$\frac{\partial\vec p}{\partial t} + \nabla\frac{\vec p^{\,2}}{2m} = -\nabla[V + Q]$$
$$\frac{\partial\vec u}{\partial t} + \nabla\left[\vec u\cdot\vec v + \frac{\hbar}{2m}\nabla\cdot\vec v\right] = 0, \tag{4.18}$$
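where

$$Q(\vec q;t) = -\frac{m}{2}\left[\vec u(\vec q;t)^2 + \frac{\hbar}{m}\nabla\cdot\vec u\right]$$

is the so-called “quantum potential”. Given all the derivations already made, it is obvious that $Q(\vec q;t)$ cannot be interpreted as a true potential, being only the kinetic energy fluctuations of the quantum problem (which is temperature, as already noticed in the previous chapter). In fact, using (4.14) we may write

$$Q(\vec q;t) = -\frac{\hbar^2}{4m}\left\{\frac{1}{2k_B^2}[\nabla S(\vec q;t)]^2 + \frac{1}{k_B}\nabla^2S(\vec q;t)\right\}$$
$$= -\frac{\hbar^2}{4m}\left\{\frac{1}{2}\left(\frac{\nabla\rho}{\rho}\right)^2 - \left(\frac{\nabla\rho}{\rho}\right)^2 + \frac{1}{\rho}\nabla^2\rho\right\} = -\frac{\hbar^2}{4m}\left\{-\frac{1}{2}\left(\frac{\nabla\rho}{\rho}\right)^2 + \frac{1}{\rho}\nabla^2\rho\right\}$$
$$= -\frac{\hbar^2}{4m}\left\{\frac{1}{R^2}\nabla\cdot(2R\nabla R) - \frac{1}{2}\frac{4R^2(\nabla R)^2}{R^4}\right\} = -\frac{\hbar^2}{2mR(\vec q;t)}\nabla^2R(\vec q;t),$$

which is exactly the result found in equation (3.23) of the previous chapter. In fact, we could rewrite that equation in the format

$$\frac{\partial\vec p(\vec q;t)}{\partial t} + \nabla\frac{\vec p^{\,2}}{2m} = -\nabla\left[V(\vec q) - \frac{\hbar^2\nabla^2R(\vec q;t)}{2mR(\vec q;t)}\right],$$

which may be compared with the first equation in (4.18). We also have made the same identifications in both derivations, that is

$$\vec p(\vec q;t) = \nabla s(\vec q;t),\qquad \rho(\vec q;t) = R(\vec q;t)^2$$

and, most important,

$$\vec u(\vec q;t) = \frac{\hbar}{2m}\nabla S,$$

such that

$$\vec c - \vec v = \vec u. \tag{4.19}$$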
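The equality between the two forms of the quantum potential can also be checked numerically in one dimension. The sketch below is illustrative: it assumes units ℏ = m = 1 and a hypothetical positive amplitude R(x) = (1 + x²) exp(-x²/2), neither of which comes from the text. It computes Q once from the stochastic velocity u = (ℏ/m) d(ln R)/dx and once from -ℏ² R''/(2mR), using finite differences.

```python
import math

HBAR, M = 1.0, 1.0

def log_R(x):
    # Hypothetical smooth, positive amplitude R(x) = (1 + x^2) exp(-x^2 / 2)
    return math.log(1.0 + x * x) - 0.5 * x * x

def d1(f, x, h=1e-3):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-3):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def q_from_u(x):
    # Q = -(m / 2) [u^2 + (hbar / m) du/dx], with u = (hbar / m) d(ln R)/dx
    u = (HBAR / M) * d1(log_R, x)
    du = (HBAR / M) * d2(log_R, x)
    return -0.5 * M * (u * u + (HBAR / M) * du)

def q_from_R(x):
    # Q = -hbar^2 R'' / (2 m R)
    amplitude = lambda y: math.exp(log_R(y))
    return -HBAR ** 2 * d2(amplitude, x) / (2.0 * M * amplitude(x))

for x in (-1.5, 0.0, 0.3, 2.0):
    print(x, q_from_u(x), q_from_R(x))  # the two columns should agree
```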
This means that our “distribution in $\delta q$” that was assumed (but whose explicit functional form was not needed) is simply our

$$Z(\vec q,\delta\vec q;t) = \rho(\vec q;t)\exp\left[\delta\vec q\cdot\frac{1}{k_B}\nabla S(\vec q;t) - \frac{1}{2k_B}\delta\vec q^{\,2}\nabla^2S(\vec q;t)\right]$$

and its expansion up to second order is equivalent to the expansion of the preceding differences

$$\frac{f(\vec q(t+\delta t),t+\delta t) - f(\vec q(t),t)}{\delta t}$$

only up to second order in $\delta t$. This means that our entropy derivation inherits the explanation, quite obvious in the stochastic approach, that the statistical moments greater than the second must go with $\delta t^k$, $k>1$ (or some other equivalent behavior), so that the truncation is exact, and not merely an approximation. In fact, in the stochastic approach, we may easily derive the equation (see [65], p. 1626, eq. 40)

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\vec c\rho) - \frac{\hbar}{2m}\nabla^2\rho = 0,$$

which is a Fokker-Planck equation in configuration space, or better,

“it is a particular case of a generalized Fokker-Planck-Kolmogorov equation of the type discussed by Pawula [66], with $\vec c$ and $D\,[=\hbar/2m]$ proportional to the first and second moments of the distribution in $\delta\vec q$, respectively, and the ratio of all other moments to $\delta t$ vanishing in the limit $\Delta t\to 0$.” (See [65], p. 1627).
This assumption, thus, allows us to assume that the expression for $Z(\vec q,\delta\vec q;t)$ is a Gaussian function in $\delta\vec q$, and this leads to the Gaussian expression for the function $F(\vec q,\vec p;t)$, for each point $\vec q$ of the space. In fact, given (4.19) and $\vec p = m\vec c$, our $F(\vec q,\vec p;t)$ function, written as

$$F(\vec q,\vec p;t) = \frac{\rho(\vec q;t)}{[2\pi\langle\delta\vec p(\vec q;t)^2\rangle/3]^{3/2}}\exp\left\{-\frac{3}{2}\frac{[\vec p - \vec p(\vec q;t)]^2}{\langle\delta\vec p(\vec q;t)^2\rangle}\right\}$$

is, in fact, a Gaussian distribution function for the velocities (or momenta) $\vec p$. This kind of Gaussian functional behavior is akin to the Central Limit Theorem, and we will make all these mathematical connections explicit in the next chapter. In any case, the present approach showed us that our second axiom in both the entropy and the characteristic function derivations implies that we are assuming our quantum phenomenon to be of a Markovian type. On the other hand, the appearance of a Fokker-Planck equation points in the direction of Langevin random systems, which are known to be equivalent to this equation. We will assess this topic in chapter six.
4.3. ENSEMBLE AND TIME AVERAGES: THE ERGODIC ASSUMPTION

Now, this is the point at which we can remove the last epistemic blocking to a corpuscular monist ontology, to which we alluded in the previous chapter. In fact, we mentioned two main obstacles to the acceptance of a corpuscular ontology. The first obstacle was the phenomena of diffraction (Bragg-Laue and by an aperture) and interference (double slit), which were assumed in the beginning of the twentieth century and most of the nineteenth century to be irreducible to a corpuscular explanation. This blocking was removed in the previous chapter when we used an ensemble picture to show that the symmetries of the problem allow us to use the Bohr-Sommerfeld rules that, together with the notion of discrete momentum-energy exchange, imply a corpuscular explanation. Note that these examples naturally favoured an ensemble picture, for the interference-diffraction pattern is a consequence of the repetition of innumerable experimental setups in which only one particle is sent at a time (for one-particle systems).

In the last chapter we also said that this explanation, although quite convincing, given its grounds on the mathematical formalism, etc., could not be applicable to situations in which no ensemble was at our disposal. In fact, consider the behavior of one hydrogen atom when observed in the course of time. This physical system consisting of only one atom cannot, without further considerations, be connected to an ensemble picture, but its properties still come from a solution to the wavelike Schrödinger equation.

The removal of this last impediment to a full corpuscular approach rests on the ergodic assumption. In fact, in statistical physics there are two types of averages that are of interest. We have first the ordinary statistical average of some function $h(\vec q;t)$ at a given time over all systems of an ensemble. This ensemble average is defined by

$$\langle h(\vec q;t)\rangle = \frac{1}{N}\sum_{k=1}^{N}h^{(k)}(\vec q;t),$$
where $h^{(k)}(\vec q;t)$ is the value assumed by $h(\vec q;t)$ in the $k$th system of the ensemble and where $N$ is a very large total number of systems in the ensemble (see [25], p. 583). This is the type of average that we were considering in chapter three, which sufficed to explain in corpuscular terms the outcomes of interference-diffraction patterns. However, there is also another average of interest in which the function $h(\vec q;t)$ for a single system is picked during some large enough time interval $2\Delta t$. This single-system time average is defined by

$$\bar h^{(k)}(\vec q;t) = \frac{1}{2\Delta t}\int_{-\Delta t}^{+\Delta t}h^{(k)}(t+t')\,dt',$$

where now we took one $k$-component of the ensemble and calculated the average using only the system under scrutiny. If we consider the drawings in Figure 4.1, we may say that an ensemble average at some time $t$ is represented by the vertical line there shown, while a time average is represented by the horizontal line also shown. Sometimes the two operations of taking the average commute. Indeed,

$$\langle\bar h(\vec q;t)\rangle = \frac{1}{N}\sum_{k=1}^{N}\left[\frac{1}{2\Delta t}\int_{-\Delta t}^{+\Delta t}h^{(k)}(t+t')\,dt'\right] = \frac{1}{2\Delta t}\int_{-\Delta t}^{+\Delta t}\frac{1}{N}\sum_{k=1}^{N}h^{(k)}(t+t')\,dt' = \frac{1}{2\Delta t}\int_{-\Delta t}^{+\Delta t}\langle h(\vec q;t+t')\rangle\,dt' = \overline{\langle h(\vec q;t)\rangle}. \tag{4.20}$$
If the system is in a stationary situation with respect to the function $h(\vec q;t)$, then, most generally, there can be no preferred origin in time for the statistical description of the physical system, which means that the same ensemble follows when we take all the $h^{(k)}(\vec q;t)$ into consideration, no matter at which instant of time we begin. If we assume that in the course of some time interval $\Delta t$ the function $h^{(k)}(\vec q;t)$ will pass through all the values accessible to it, then the time average, made using one single system during some time interval $\Delta t$, would be identical to the ensemble average, made using a large number of systems at some instant of time. This is precisely the ergodic assumption.

Pictorially, we can see this by looking at Figure 4.1 again. In the time average over system $k$ we may divide the time line into intervals of size $2\Delta t$ and align them vertically. The stationary character of the physical system with respect to the function $h(\vec q;t)$ (whenever the system is in equilibrium, this will be valid for all functions) guarantees that the ensemble formed in this way will behave statistically in exactly the same way as the original ensemble. In fact, in such cases, the ensemble average cannot depend upon the instant of time at which the average was made and the time average cannot depend upon the specific system used to perform it. We thus have

$$\langle h(\vec q;t)\rangle = \langle h(\vec q)\rangle;\qquad \bar h^{(k)}(\vec q;t) = \bar h(\vec q)$$

and property (4.20) guarantees that

$$\overline{\langle h(\vec q;t)\rangle} = \langle h(\vec q)\rangle = \langle\bar h(\vec q;t)\rangle = \bar h(\vec q).$$
Figure 4.1. Ensemble and time averages. The vertical line represents an average at some time t over all the systems composing the ensemble, while the horizontal line represents an average over one single system in the time interval Δt.
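The equality of ensemble and time averages can be illustrated with a deterministic toy model. The sketch below is only an illustration under stated assumptions and is not taken from the text: each member of the ensemble is the same oscillation cos²(ωt + φ_k) with phases φ_k spread uniformly over a full cycle, which makes the ensemble stationary. The ensemble average at a fixed time and the time average of a single member over whole periods should then coincide (both equal 1/2).

```python
import math

def h(k, t, n_systems, omega=1.0):
    # k-th member of a stationary ensemble: same oscillation, shifted phase
    phase = 2.0 * math.pi * k / n_systems
    return math.cos(omega * t + phase) ** 2

def ensemble_average(t, n_systems=1000):
    # Average over all systems at one instant (vertical line in Figure 4.1)
    return sum(h(k, t, n_systems) for k in range(n_systems)) / n_systems

def time_average(k, t, half_window, n_systems=1000, n_steps=20000):
    # Midpoint rule for (1 / 2 dt) * integral of h^(k)(t + t') over the window
    # (horizontal line in Figure 4.1)
    step = 2.0 * half_window / n_steps
    total = 0.0
    for i in range(n_steps):
        total += h(k, t - half_window + (i + 0.5) * step, n_systems)
    return total / n_steps

t0 = 0.3
print(ensemble_average(t0))               # ensemble average at time t0
print(time_average(7, t0, 5.0 * math.pi)) # time average of system k = 7
```

The window of half-width 5π covers whole periods of cos², so the time average does not depend on the chosen system or on t0, mirroring the stationarity argument above.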
In this book we are considering only stationary quantum systems. If these systems behave in the way needed for the use of the ergodic assumption, then the last epistemic blocking can be easily removed. In fact, the present approach means that the old dispute about Quantum Mechanics being related to ensembles (exclusive-)or single systems is dissolved by simply erasing the “exclusive”. This discussion is at the heart of many fierce debates on the foundations of Quantum Mechanics. Indeed:

“the probability wave in a configuration space of 3𝑛 dimensions contains statistical statements about one system of n electrons, which can for this purpose be imagined, as in Gibbs’ thermodynamics, as a sample selected arbitrarily from an infinite statistical assembly of identically constructed systems.” (see [67], p. 13).
while “of primary importance is the assertion that a quantum state (pure or otherwise) represents an ensemble of similarly prepared systems. For example, the system may be a single electron. Then the ensemble will be the conceptual (infinite) set of all single electrons which have been subjected to some state preparation technique (...) generally by interaction with a suitable apparatus. (...) In general, quantum theory predicts nothing which is relevant to a single measurement (excluding strict conservation laws like those of charge, energy, or momentum), and the result of a calculation pertains directly to an ensemble of similar measurements” (see [22], pp. 360-361).
Of course, each specific experimental setup may privilege an ensemble (exclusive-)or a single-particle picture. For the interference and diffraction situations, it was clearly an ensemble picture that was in play, while an experimental setup that takes averages over one single hydrogen atom selects the single-system picture. This solution gives us a very clear picture of what is going on with the dualistic interpretation. It is mathematically obvious that Quantum Mechanics refers to some sort of wave behavior, since the Schrödinger equation is a wave equation. If we apply this equation to a single system but assume that its results refer to a single instant of time, then there is no option other than to assume that the system must have a wave-like nature; this is the leap from behavior to nature. Note that the only way to make statistical measurements on a single system is to make them over many time intervals. The solution, or, better, the dissolution, of the ensemble versus single-system dispute also implies that the dualistic ontology may be based upon a misconception, already mentioned several times in this book, in which the behavior of the ensemble is misinterpreted as the nature of the system, and the two levels of description are collapsed into one ontological level. The ergodic assumption assures us that even one single system may behave within a time interval as an ensemble, and it is to this behavior that the Schrödinger equation is related. However, it remains to be shown that the ergodic assumption can be applied to quantum systems. To show this mathematically we must find the equations that relate Quantum Mechanics to fluctuations. The stochastic equations we found before are not of this sort, since no random force appears in the calculations. Indeed, the results of this chapter are based on the averaging operation represented in (4.2).
This averaging operation washes out the random nature of the quantum mechanical systems. In chapter six we will mathematically derive the Langevin equations that reproduce all the results of Quantum Mechanics as obtained from the Schrödinger equation; we will then
make simulations to prove that the ensemble and time averages give the same results, showing that the ergodic assumption is appropriate.
4.4. CONSEQUENCES OF THE STOCHASTIC DERIVATION
It is not enough, however, to simply remove the epistemic obstacles to a corpuscular ontology. Since quantum mechanical phenomena were previously interpreted using a dualist approach, it remains for the proponents of a different perspective to show, if not for all phenomena, at least for the most representative ones, that the monist corpuscular interpretation can also produce acceptable explanations. In the next two subsections we will show how the developments we have made so far are capable of explaining two important characteristics of quantum phenomena in general. The reader should be aware that when giving such interpretations we will not even allude to semantic constructs such as observers, duality (of nature), reduction of the wave packet, complementarity, etc. This is an important feature of our interpretation, since it drastically downsizes the underlying ontology.
4.4.1. On the Dispersion Relations $\Delta E\,\Delta t \geq \hbar/2$
When analyzing the connection between ensembles and single systems by means of the ergodic assumption, it became clear that this assumption can be applied as long as the time interval $\Delta t$ is large enough to let the individual system behave as an ensemble, being then describable by the Schrödinger equation. This bears great resemblance to the energy-time Heisenberg relation. This relation plays a somewhat distinct role within the quantum formalism, since it cannot be derived by the usual calculations (beginning with the commutator and then taking averages), for the very reason that $\Delta t$ cannot be interpreted as a statistical dispersion in time (nor as an imprecision or uncertainty), since time is a parameter: its value is prescribed, not measured. Just as the Heisenberg relations come from the averages
$$\langle \delta p(q;t)^2\rangle \langle \delta q(q;t)^2\rangle = \frac{\hbar^2}{4},$$
we may use that
$$\langle \Delta E(q;t)\rangle\,\Delta t = \frac{\hbar^2}{4}, \tag{4.21}$$
where now we use the time interval $\Delta t$, and not some average of it. In the present formalism this relation can be derived quite easily, using the common tools of statistical theory. We begin by defining the energy correlation function
$$K(w) = \langle \delta E(t)\,\delta E(t+w)\rangle,$$
which is an ensemble average over the energy fluctuations taken at different instants of time. This definition implies that $\langle \delta E(t)^2\rangle = K(0)$, since we are supposing, without loss of generality, that $\langle \delta E(t)\rangle = 0$. Let us now introduce the Wiener-Khintchine relations (see [25], p. 585), given by
$$K(w) = \int_{-\infty}^{+\infty} J(E)\,e^{iEw/\hbar}\,dE,$$
where the coefficient $J(E)$ is called the spectral density. We must have, by inversion of the previous expression,
$$J(E) = \frac{\hbar}{2\pi}\int_{-\infty}^{+\infty} K(w)\,e^{-iEw/\hbar}\,dw,$$
giving
$$\langle \delta E(t)^2\rangle = \int_{-\infty}^{+\infty} J(E)\,dE.$$
Since $\delta E(t)$ is stationary and ergodic (by assumption), $K(w)$ is time independent and we can use a time average in place of the ensemble average; hence
$$K(w) = \langle \delta E(t)\,\delta E(t+w)\rangle = \overline{\delta E(t)\,\delta E(t+w)} = \frac{1}{\Delta t}\int_{-\Delta t/2}^{+\Delta t/2} \delta E(t)\,\delta E(t+w)\,dt, \tag{4.22}$$
and $\Delta t$ is the minimum time that allows us to exchange ensemble and time averages. By defining the new function (see [25], p. 582)
$$\delta E_t(t) = \begin{cases} \delta E(t), & -\Delta t/2 < t < \Delta t/2, \\ 0, & \text{otherwise}, \end{cases}$$
and taking the spectral decomposition of this time-dependent function as
$$\delta E(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \chi(E)\,e^{iEt}\,dE,$$
we finally find, substituting these results back in (4.22), that
$$K(w) = \frac{\hbar}{\Delta t}\int_{-\infty}^{+\infty} \chi(-E)\,\chi(E)\,e^{iEw/\hbar}\,dE.$$
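With the result (see [25], p. 583) $\chi(-E) = \chi^*(E)$, we get
$$\langle \delta E(t)^2\rangle\,\Delta t = \hbar\int_{-\infty}^{+\infty} |\chi(E)|^2\,dE,$$
which can be made equal to (4.21) if we put
$$\int_{-\infty}^{+\infty} |\chi(E)|^2\,dE = \frac{\hbar}{4}, \tag{4.23}$$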
which is very similar to result (4.16), related to velocity or momentum fluctuations. Thus, the energy-time Heisenberg relation must (within the present approach) be interpreted as giving, for each level of a quantum mechanical problem, the minimum time window necessary for time averages to become identical to ensemble averages; it is just a statement of the ergodic assumption. In fact, an explicit interpretation of this relation would be: if the system fluctuates strongly in energy, one should expect that less time spent looking at the single system’s behavior would suffice to let it fill its allowed states. This seems a very natural and obvious statement, quite different from the usual interpretation that appeals to things like observers, complementarity and uncertainty. Thus, to say that all quantum mechanical phenomena must obey the energy-time Heisenberg relations is the same as saying that the ergodic assumption must be valid for all quantum mechanical phenomena. The impressive result, which pertains to (if it does not define) the domain of application of Quantum Mechanics, is expression (4.23), in which we find a universal constant. This result comes neatly from the framework of a stochastic approach.
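The step from the time-averaged correlation (4.22) to an integral over $|\chi(E)|^2$ has a discrete counterpart that can be checked numerically. The sketch below is only an illustration under stated assumptions: the fluctuation signal is a hypothetical first-order autoregressive noise, units and the factors of $\hbar$ are dropped, and the function names are invented for this example. Zero padding plays the role of the truncated function that vanishes outside the window.

```python
import numpy as np

def time_averaged_correlation(delta_e, max_lag):
    """Direct time average (1/N) * sum_t dE(t) dE(t+w) over the finite window."""
    n = len(delta_e)
    return np.array([np.dot(delta_e[: n - w], delta_e[w:]) / n
                     for w in range(max_lag + 1)])

def spectral_correlation(delta_e, max_lag):
    """Same quantity obtained from |chi|^2, the squared modulus of the spectrum."""
    n = len(delta_e)
    chi = np.fft.fft(delta_e, n=2 * n)   # zero padding: signal vanishes outside the window
    power = np.abs(chi) ** 2             # chi(-E) chi(E) = |chi(E)|^2 for a real signal
    return np.fft.ifft(power).real[: max_lag + 1] / n

rng = np.random.default_rng(0)
# Hypothetical stationary, zero-mean fluctuation: first-order autoregressive noise.
n_samples, memory = 4096, 0.9
delta_e = np.zeros(n_samples)
for i in range(1, n_samples):
    delta_e[i] = memory * delta_e[i - 1] + rng.normal()
delta_e -= delta_e.mean()

k_time = time_averaged_correlation(delta_e, max_lag=20)
k_spec = spectral_correlation(delta_e, max_lag=20)
print(np.max(np.abs(k_time - k_spec)))   # difference at the level of rounding error
print(k_time[0], np.mean(delta_e ** 2))  # K(0) is the mean square fluctuation
```

The two routes agree because the inverse transform of the power spectrum of a zero-padded signal is its non-circular autocorrelation; the lag-zero value reproduces the mean square fluctuation, in line with $\langle \delta E(t)^2\rangle = K(0)$.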
4.4.2. On the Superposition Principle
It is important that the present approach shows itself capable of explaining some of the most central expressions of the formalism. We would also welcome any development that may help us avoid misconceptions regarding central concepts of the theory and their interpretation. Another central concept of Quantum Mechanics comes from the linear, wave-like character of the Schrödinger equation: the possibility of constructing solutions of quantum mechanical problems that are linear combinations (superpositions) of other solutions. Usually, this superposition property is mentioned as being one feature of Quantum Mechanics that is absent from Classical Physics, where we recall the distinction we have already made between Classical Physics and Newtonian Mechanics. It is beyond doubt that the present formalism is not applicable to Newtonian Mechanics and, indeed, the developments of the previous sections have shown that Newtonian Mechanics must be a particular case of Quantum Mechanics, whenever the role played by fluctuations is not
relevant (the so-called classical limit will be explicitly worked out in chapter six). The question about the incommensurability between Quantum and Classical Physics is more subtle, for it would mean that the former cannot be understood using only the same semantic constructs as the latter. Thus, the argument goes, the existence of superposition, a syntactic element, in the quantum mechanical formalism and its absence from the classical formalism would give grounds to the argument about an incommensurability of their interpretations (a semantic notion)⁴. It remains to understand what is meant by “absent”, since it would appear from our previous developments that this sort of syntactic absence would be quite unexpected. Indeed, if we mathematically derive the Schrödinger equation from the stochastic Liouville equation (which is classical, despite being stochastic), it would be rather surprising if there were any feature of the quantum formalism that is absent from the classical one. In fact, the argument is usually put in these precise terms: the Schrödinger equation is related to the notion of probability amplitudes, and when we use linear combinations of such amplitudes some “extra” terms appear, reflecting the “interference” of the states represented by each probability amplitude in the linear combination. Mathematically, this is simply expressed by the fact that, if we have $\{\psi_n\}$ representing the states of some quantum mechanical system, then
$$\psi = \sum_k a_k \psi_k$$
would also be a solution of the Schrödinger equation and the total probability density would be written as
$$\rho = \sum_{jk} a_k^* a_j\,\psi_k^* \psi_j, \tag{4.24}$$
and will have terms differing from the sum of probability densities, for which only $k = j$ would be expected. This is usually compared to the mathematical behavior of the function $F(q,p;t)$, for which a superposition principle is also valid, but at the level of the densities and not the amplitudes, which would preclude the Liouville equation from showing the same type of formal behavior as the Schrödinger equation. There are two misconceptions here. The first one ensues when one compares two quite different objects: a probability amplitude defined upon configuration space and a probability density defined upon phase space. What is being assumed in this first misconception is that there is no possibility of defining, within the realm of the stochastic Liouville equation, a notion of amplitude to which arguments similar to the one shown above [Eq. (4.24)] could be applied. This is wrong, as we
⁴ Between Quantum and Newtonian Mechanics we may establish a limiting relation, showing that one is reduced to the other in some limiting situation (absence or irrelevance of randomness); this takes place at the syntactic level. Between Quantum and Classical interpretations (use of constructs) the relation is one of commensurability, since it takes place at the semantic level.
expect the present derivation has already shown exhaustively. In fact, we have assumed in the second axiom of the characteristic function derivation that
$$\psi^*\left(q - \frac{\delta q}{2};t\right)\psi\left(q + \frac{\delta q}{2};t\right) = \int F(q,p;t)\,\exp\left(\frac{i p\,\delta q}{\hbar}\right)dp, \tag{4.25}$$
and we have already shown that this induces the notion of phase space amplitudes, given by [see equation (2.38)]
$$\psi\left(q + \frac{\delta q}{2};t\right) = \mathbf{F}\{\phi(q,p;t)\} = \int \exp\left(\frac{i}{2\hbar}\,p\,\delta q\right)\phi(q,p;t)\,dp,$$
or simply $\psi(q;t) = \int \phi(q,p;t)\,dp$ and
$$F(q,p;t) = \int_{-\infty}^{+\infty} \phi^*(q, 2p - p';t)\,\phi(q,p';t)\,dp',$$
which means that any superposition of $\psi(q;t)$ would be related to superpositions of $\phi(q,p;t)$, which will make terms like those of (4.24) appear. The second misconception comes with the assumption that, because the Liouville equation (stochastic or not) is written for the probability density, it is incompatible with superposition effects. The previous arguments could also be invoked to show how this assumption is mistaken, but it is more convincing to show that the function $F(q,p;t)$ may represent any superposition behavior. This is done in Table 5 for the harmonic oscillator and the hydrogen atom (two of our prototype examples). From Table 5 it is easy to see that all the superposition content of the amplitudes is reflected in the phase space probability density function. This equivalence is by no means trivial, if one notes that these superpositions give extremely complicated $F(q,p;t)$. For instance, the hydrogen atom case with $2\psi_{200} + \psi_{210}$ in Table 5 gives (with $\xi = \cos\theta$)
$$F(r,\theta,\phi,p_r,p_\theta,p_\phi) = \frac{3\sqrt{12}\,|-2r + 4 + r\cos\theta|^3}{32\pi^{5/2}\,(32 + r^2\xi^2 - 4r^2\xi + 8r\xi - 19 + 4r^2)^{3/2}}\,\exp(-r) \times \exp\left\{-3\,\frac{\left(p_r^2 + \dfrac{p_\theta^2}{r^2} + \dfrac{p_\phi^2}{r^2\sin^2\theta}\right) r\,[4r^2 - 16r - 4r^2\xi + 16 + 8r\xi + r^2\xi^2]}{32 + r^2\xi^2 - 4r^2\xi + 8r\xi - 19 + 4r^2}\right\}.$$
In any case, this shows that it is wrong to assume that the Liouville equation [or its momentum integrated solution $F(q,p;t)$] is not compatible with superposed states. What is correct is to note that the Liouville equation, being an equation for densities, cannot reveal a possible underlying superposition state; the Schrödinger equation is here precisely to do that. Everything goes as if assumption (4.25) (essentially our second axiom) allows us to access these possibilities that are compatible with the stochastic Liouville
equation exactly by arriving at an equation (Schrödinger’s) which “cracks” the $F(q,p;t)$ nutshell to reveal its quantum mechanical “meat”. Thus, the derivation process (with the second axiom) opens a new formal dimension full of new results. This is indisputable! That we must look at the nutshell through different glasses from those we use to look at the meat is a much more disputable supposition, indeed one with which we contend⁵.

Table 5. Various averages for the harmonic oscillator and hydrogen atom problems using the superposition of probability amplitudes and F(q,p;t) calculated from them. Only the results for F(q,p;t) are shown because they are all identical to those obtained using the probability amplitudes of the usual calculation.

| $\psi_n$ | $\langle q\rangle$ | $\langle p\rangle$ | $\langle q^2\rangle$ | $\langle p^2\rangle$ | $\langle E\rangle$ |
|---|---|---|---|---|---|
| $3\psi_0 - 2\psi_1$ | $-6\sqrt{2}$ | 0 | 10.5 | 10.5 | 10.5 |
| $2\psi_2 + 5\psi_3$ | 24.4949 | 0 | 97.5 | 97.5 | 97.5 |
| $\psi_0 + 2\psi_1 - 3\psi_4$ | 2.82843 | 0 | 47.0 | 47.0 | 47.0 |
| $\psi_0 + 2\psi_1 - \psi_3 + 3\psi_5$ | 2.82843 | 0 | 41.185 | 77.815 | 59.5 |

| $\psi_{n\ell m}$ | $\langle r\rangle$ | $\langle \cos\theta\rangle$ | $\langle r^2\rangle$ | $\langle \cos^2\theta\rangle$ | $\langle E\rangle$ |
|---|---|---|---|---|---|
| $2\psi_{200} + \psi_{210}$ | 29 | $-2$ | 128 | $29/15$ | $-5/8$ |
| $3\psi_{300} - \psi_{320}$ | 132 | 0 | 1989 | 2.3924 | $-10/18$ |
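The harmonic oscillator rows of Table 5 can be cross-checked from the amplitude side. The following is a minimal sketch, assuming units $\hbar = m = \omega = 1$ and unnormalized superpositions (both assumptions are inferred from the tabulated values, not stated in the text). It uses ladder-operator matrices in the number basis rather than the function $F(q,p;t)$, and the function names are invented for this illustration.

```python
import numpy as np

def oscillator_matrices(dim):
    """q and p in the number basis, in units hbar = m = omega = 1 (an assumption)."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # annihilation operator
    q = (a + a.T) / np.sqrt(2.0)
    p = 1j * (a.T - a) / np.sqrt(2.0)
    return q, p

def unnormalized_averages(coeffs, dim=12):
    """Averages <psi|A|psi> for psi = sum_n coeffs[n] psi_n, without normalizing psi."""
    q, p = oscillator_matrices(dim)
    c = np.zeros(dim, dtype=complex)
    c[: len(coeffs)] = coeffs

    def expect(op):
        return float(np.real(np.conj(c) @ op @ c))

    q2, p2 = q @ q, p @ p
    return {"q": expect(q), "p": expect(p), "q2": expect(q2),
            "p2": expect(p2), "E": expect(0.5 * (p2 + q2))}

rows = {
    "3 psi0 - 2 psi1": [3, -2],
    "2 psi2 + 5 psi3": [0, 0, 2, 5],
    "psi0 + 2 psi1 - 3 psi4": [1, 2, 0, 0, -3],
    "psi0 + 2 psi1 - psi3 + 3 psi5": [1, 2, 0, -1, 0, 3],
}
for label, coeffs in rows.items():
    print(label, unnormalized_averages(coeffs))
```

The basis is truncated at `dim = 12`, which is large enough for the matrix elements of $q^2$ and $p^2$ among the levels used here to be exact. In the last row the interference terms between $\psi_1$, $\psi_3$ and $\psi_5$ are what make $\langle q^2\rangle$ and $\langle p^2\rangle$ differ from each other while their half-sum stays equal to $\langle E\rangle$.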
Having clarified this point, we can now show that the present approach may interpret the issue of state superposition in very usual terms. If Quantum Mechanics were to apply only to ensembles, then it would be quite obvious how one should interpret the expression
$$\psi = \sum_n c_n \psi_n, \tag{4.26}$$
where $\psi_n$ are eigenstates of the problem under consideration. In this case, the $|c_n|^2$ would represent the statistical frequencies with which one would expect to find systems of the ensemble in the state $\psi_n$ at some (any) instant of time. This is the usual statistical interpretation of Quantum Mechanics initiated by Born, as we saw in chapter one, and more extensively developed in [22]. Since Quantum Mechanics is also applicable to single systems, we have to extend this interpretation to cover these situations as well. The strategy here is quite obvious: we can use the ergodic assumption to borrow the ensemble interpretation and apply it to single-system situations and, again, the main step is provided by the assumption of a finite $\Delta t$ to justify the ergodic assumption.
⁵ The same occurs in usual relativistic quantum mechanics. When we treat probability amplitudes as scalars, we are within the domain of the Klein-Gordon equation, which admits only integer spin. However, we can pass from this formalism to Dirac’s by taking the formal square root of the Klein-Gordon equation. This leaves us with probability amplitudes that are spinors, not scalars, which reveal the whole world of half-integral spins (and, as everyone knows, these also satisfy the Klein-Gordon equation, which simply formalizes the relativistic constraint $p^2 = m^2$).
Following this strategy, we may say that expression (4.26), when applied to a single system, implies that $|c_n|^2$ is a measure of the relative amount of time that the system spends, within $\Delta t$, in state $\psi_n$. This is a quite simple and visualizable interpretation that fits in very naturally with the ergodic assumption. Note, however, as we have already stressed, that there are experimental situations that pick one or the other picture exclusively, in which the use of the ergodic assumption would be misleading. In fact, the examples of Bragg-Laue diffraction, diffraction by an aperture, and interference and diffraction by double slits are precisely the type of experimental situation in which it would be meaningless to talk about time averages over a single system, since the experiment must be performed with the repetition of identical copies. The “minimum wave packet” problem gives us another example of this. Since it is an instructive example, let us develop here its main results: the probability density is written as
$$\rho(x;t) = \frac{1}{[\pi\langle \delta x(x;t)^2\rangle]^{1/2}}\exp\left\{-\frac{[x - \hbar k_0 t/m]^2}{\langle \delta x(x;t)^2\rangle}\right\}, \tag{4.27}$$
where
$$\langle \delta x(x;t)^2\rangle = \Delta x^2 + \left(\frac{\hbar t}{m\,\Delta x}\right)^2,$$
and it is frequently associated with one particle. It is generally assumed that the “nonclassical” feature of this distribution comes from the way the above variance depends on time (in a future chapter, we will have more to say about the eagerness to find “nonclassicalities” in each microscopic phenomenon). But this is exactly what we find when we are interested in sums of independent random variables. It is a trivial result of basic statistical theory (called the reproductive property) that if we have two independent random variables $X$ and $Y$ with Gaussian distributions $N_X(\mu_1,\sigma_1^2)$ and $N_Y(\mu_2,\sigma_2^2)$, with averages $\mu_1,\mu_2$ and variances $\sigma_1^2,\sigma_2^2$, respectively, then the random variable $Z = X + Y$ has the Gaussian distribution $N_{X+Y}(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)$ (see [68], p. 233). If we put
$$\sigma_1^2 = \Delta x^2; \quad \sigma_2^2 = \left(\frac{\hbar t}{m\,\Delta x}\right)^2; \quad \mu_1 = 0; \quad \mu_2 = \frac{\hbar k_0 t}{m},$$
we get just (4.27). This means that $x$ and $\hbar k_0 t/m$ are statistically independent random variables. As we noticed before, this distribution is sometimes associated with one particle, but it seems obvious that it is a distribution for an ensemble of particles (described by two uncorrelated random variables) that was initially prepared as
$$\rho(x;0) = \frac{1}{[2\pi\Delta x^2]^{1/2}}\exp\left[-\frac{x^2}{2\Delta x^2}\right].$$
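The reproductive property invoked above is easy to check by sampling. The sketch below is only an illustration: the function name and the numerical values (chosen as if $\hbar = m = \Delta x = 1$, $k_0 = 2$ and $t = 1.5$) are assumptions made for this example.

```python
import numpy as np

def sum_of_gaussians(mu1, var1, mu2, var2, n_samples=200_000, seed=0):
    """Sample Z = X + Y for independent Gaussians and return its mean and variance."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu1, np.sqrt(var1), n_samples)
    y = rng.normal(mu2, np.sqrt(var2), n_samples)
    z = x + y
    return z.mean(), z.var()

# Hypothetical values: sigma_1^2 = dx^2 = 1, sigma_2^2 = (t/dx)^2 = 2.25,
# mu_1 = 0, mu_2 = k0 * t = 3 (units with hbar = m = 1).
mean_z, var_z = sum_of_gaussians(mu1=0.0, var1=1.0, mu2=3.0, var2=2.25)
print(mean_z, var_z)   # sample estimates of mu_1 + mu_2 and sigma_1^2 + sigma_2^2
```

The sample mean and variance of $Z$ should lie close to $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$, up to sampling error. This is the sense in which a variance that grows with time can be read as the sum of two independent contributions.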
This sort of distinction between what can be understood as an ensemble and what can be seen as a single system is problematic not only within Quantum Mechanics; it is a difficult matter even in classical statistical theory. Thus, we may cite [69]: “the best (but not the only) example is the clash between Boltzmannian and Gibbsian statistical mechanics: the two approaches differ on the definition of a physical state (and in particular on the definition of equilibrium state); on the definition of entropy; and, most importantly, on what is captured by the formalism (individual systems or ensembles of systems).” (my emphasis)
As an example of the sort of simplification that the present approach brings about we can cite the problem of electrons traveling in a cloud chamber. The monist interpretation, which understands these packets as ensembles of particles, avoids all the sometimes painstaking maneuvers needed to explain phenomena such as the production of a well-defined track in a cloud chamber, as the following passage shows: “It seems surprising at first that a fast electron, which we can assume possesses a definite momentum (magnitude and direction) and hence cannot be localized in space, can produce a sharp track in a cloud chamber. This phenomenon may be considered from various points of view. (...) we can represent the electron by a wave packet whose center of gravity moves like a classical particle. If the wavelength is short enough, the packet can be fairly small without spreading rapidly and will then interact only with atoms that lie close to the path of its center. (...) Another approach consists in describing the electron by a single plane wave and regarding its interaction with the first atom that it excites or ionizes as a position measurement that carries with it an uncertainty of the order of the atomic size. We consider here in detail a third description in which the electron and the atoms of the cloud chamber gas are treated as parts of a single system, so that we do not have to regard an atomic interaction as a position determination that changes the structure of the electron’s wave function.”⁶ (see [70], p. 335).
4.4.3. Schrödinger’s Cats
Note that we have made no supposition about the size of the physical system under scrutiny, although the fluctuations are indexed by the universal constant $\hbar$, which may be very small compared to other relevant physical parameters of the system. Thus, in general, quantum mechanical fluctuations apply to microscopic systems (small masses, etc.). However, there may be situations in which a macroscopic system responds to quantum mechanical fluctuations (Schrödinger’s cats, as they are usually called); such a system can occupy, in the manner mentioned above, superposed states along some time interval. There are two distinct types of “cats”, and we must consider them in detail to understand their differences.
⁶ The first point of view reflects the ontological interpretation, while the second reflects the epistemological interpretation of Quantum Mechanics presented in chapter one.
Let us first consider a case in which a cat is inside a box with a dispenser that can make a pointlike loud noise whenever a quantum mechanical system, which can assume only two possible states $n = 0$ or $n = 1$, is at $\psi_0$. Suppose also that our silly cat always gets frightened whenever there is noise in the box. The state of the quantum mechanical system is given by
$$\psi = c_0\psi_0 + c_1\psi_1, \tag{4.28}$$
and is a superposition of noisy and silent situations. Considering one single system during the interval $\Delta t$, this superposition represents by $|c_0|^2$ the total amount of time, relative to $\Delta t$, that our box will have noise, and by $|c_1|^2$ the amount of time, relative to $\Delta t$, that our box is in silence. Given the inability of our cat to get accustomed to the noise, this function will also represent the state of (mind of) the cat during the interval $\Delta t$. During the whole interval $\Delta t$, the cat will be sometimes ($|c_1|^2$) in a state of quietude and sometimes ($|c_0|^2$) frightened as hell, but we do not know at which instants of time, within $\Delta t$, the cat will be in one or the other state. This is a completely reversible problem (this is why we need a silly, amnesiac cat) and our ergodic assumption applies (as an assumption) without restrictions. Note that the single cat is not frightened-not-frightened at some instant of time; at each instant of time it is frightened (exclusive-)or not frightened, and the superposition, as a superposition, applies to time intervals. If one chooses an interval shorter than $\Delta t$, then the single system does not have sufficient time to reproduce, by means of its fluctuations, the superposed state mentioned above. The second “cat” is of a more problematic nature. In this case let us consider that our dispenser, instead of producing sound, delivers poison whenever some quantum mechanical system assumes the state $\psi_0$, while it delivers nothing whenever the quantum mechanical system is in state $\psi_1$. The structure of the problem may appear similar to the previous one, but it is not. Contrary to the previous case, this one is irreversible. This problem thus does not have the structure of a stationary problem and the application of the ergodic assumption fails. This failure indicates some peculiarities of the present “experiment”.
We may easily see that at time $t = 0$, when the quantum mechanical system is first prepared, we do have the superposed representation for its state as shown in (4.28), that is ($\psi_0$ triggers the dispenser, $\psi_1$ does not)
$$\psi_{QM} = c_0\psi_0 + c_1\psi_1,$$
where $c_0$ and $c_1$ are constants (the quantum mechanical system is supposed here to be reversible). However, the representation for the cat’s terrible situation must be given by ($\psi_0'$ is cat dead, $\psi_1'$ is cat alive)
$$\psi_{cat}(t) = d_0(t)\psi_0' + d_1(t)\psi_1',$$
and must be time dependent. The quantum mechanical system must be prepared in the state $\psi_1$ for us to have $d_1(0) = 1$, $d_0(0) = 0$, since if we began with $d_1(0) = 0$, $d_0(0) = 1$, the state of the cat would remain represented by $\psi_{cat}(t) = \psi_0'$ for all eternity (cat resuscitations discarded). It is the irreversible character of part of the problem that imposes the distinction between state preparation and measurement within this framework.
Thus, we begin with $d_1(0) = 1$, $d_0(0) = 0$ and let the system evolve in time. Note, however, that the probability of having the transition $d_0(t) \to 1$, $d_1(t) \to 0$ is controlled by the relative values of $|c_0|^2$ and $|c_1|^2$ (there are only two possibilities). Depending upon the relative values of the $|c_i|^2$, the above transition will (on average) take more or less time to happen; let us assume that it takes place at $t = t_1$. Whenever there is a fluctuation of the quantum mechanical system such that it makes a transition to $\psi_0$, the cat system makes the transition $d_0(t > t_1) \to 1$, $d_1(t > t_1) \to 0$ and does not change anymore. What would be our conclusions? Quite simply: there is no superposition in the cat states, when considered as a single system! Indeed, for times $t < t_1$ we always had $\psi_{cat} = \psi_1'$ (alive cat) and for times $t > t_1$ we will always have $\psi_{cat} = \psi_0'$ (dead cat), that is
$$\psi_{cat}(t) = \begin{cases} \psi_1', & t < t_1, \\ \psi_0', & t > t_1. \end{cases} \tag{4.29}$$
What regulates our probability of finding the cat alive is the time $t_1$ at which we open the box, not because we observed the system with our consciousness (which would be responsible for some reduction of the wave packet), but because as we make $t_1 \to \infty$ it is obvious that $d_0(t) \to 1$, for it becomes more and more probable that $\psi_{QM}$ will fluctuate to the state $\psi_0$. In other words, the single cat system does not change its state back and forth, and its probability density function must be written as $\rho_{cat}(t) = |d_0(t)|^2\rho_0' + |d_1(t)|^2\rho_1'$, for each time $t$, giving $\rho_{cat}(t < t_1) = \rho_1'$ (alive) and $\rho_{cat}(t > t_1) = \rho_0'$ (dead). One of the lessons to learn here is that the equivalence between the ensemble picture and the single-system picture breaks down precisely because of the irreversible character of part of the problem. In fact, we should have looked at this problem using an ensemble approach. Then, at any time $t_1$ we would in fact have the equality $\psi_{cat} = c_0\psi_0 + c_1\psi_1$, since we expect to find (at $t_1$) a fraction $|c_0|^2$ of the cats dead and a fraction $|c_1|^2$ of them alive. In the single-system approach it would be a mistake to make the identification $\psi_{QM} = \psi_{cat}$, since the state on the left is time independent (by assumption) while the state on the right depends upon time. In the Schrödinger’s cat situation, the paradox comes from our accepting that we should have $\psi_{QM} = \psi_{cat}$ (both time independent) even for the single-system picture. Thus, it is not really a paradox, but the result of a misconstrued problem, precisely because one fails to notice that the ensemble approach and the single-system one are not equivalent. From what we have just said, one should not be overwhelmed by the existence of macroscopic or microscopic superposition states, for their interpretation is quite mundane.
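The contrast between the reversible and the irreversible cat, seen as single systems followed in time, can be sketched with a toy simulation. Everything in it is an assumption made for illustration: time is cut into discrete steps, the two-level system is taken to be found in $\psi_0$ independently at each step with probability $|c_0|^2/(|c_0|^2+|c_1|^2)$, and the function names are hypothetical. It is not the dynamics derived in this book.

```python
import numpy as np

def trigger_sequence(c0, c1, n_steps, rng):
    """Boolean array: True at the steps where the two-level system is found in psi_0."""
    p0 = abs(c0) ** 2 / (abs(c0) ** 2 + abs(c1) ** 2)
    return rng.random(n_steps) < p0

def frightened_fraction(triggers):
    """First cat (reversible): frightened exactly at the steps when the box is noisy."""
    return triggers.mean()

def dead_fraction(triggers):
    """Second cat (irreversible): dead from the first trigger onwards."""
    dead = np.logical_or.accumulate(triggers)   # once True, stays True
    return dead.mean()

rng = np.random.default_rng(1)
c0, c1 = 0.6, 0.8                               # |c0|^2 = 0.36, |c1|^2 = 0.64
for n_steps in (10, 1_000, 100_000):
    trig = trigger_sequence(c0, c1, n_steps, rng)
    print(n_steps, frightened_fraction(trig), dead_fraction(trig))
```

In this toy model the fraction of time the first cat spends frightened settles near $|c_0|^2$ once the window is long enough, while for the second cat the fraction of time spent dead drifts towards one whatever the value of $|c_0|^2$, since the single trajectory is the step function of (4.29). Short windows give erratic fractions in both cases.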
The “paradox” coming from Schrödinger’s cat is just the result of our inability to conceive the two highly distinct situations mentioned above with regard to the role played by time⁷.
⁷ Indeed, this book maintains that most quantum mechanical paradoxes or weirdness come just from our inability to conceive time in a correct manner.
This discussion may seem artificial, but it could be paraphrased using a physical problem such as the double-slit interferometer. In that system each realization of the problem with one particle is a time-dependent problem, since the absorption of the particle in the photographic plate is a clear cut for it (the measurement). The usual approach to this is to postulate that some reduction of the wave packet occurred, but this comes from the insistence on thinking that each particle should be described by the full wave. One must then add afterwards a construct (the reduction) that lends the behavior of the system its irreversible nature. A much more economical approach is to assume, from the beginning, that such types of physical setups refer necessarily to ensemble descriptions and that the overall probability density function refers to the statistical frequencies related to them.
4.5. THE ORIGIN OF STOCHASTIC BEHAVIOR
We now turn to an idea for the origin of the stochastic behavior. We begin by noticing that, although the stochastic derivation is by no means clear about this, the characteristic function and entropy derivations make it clear that there can be no external influence on the physical system under consideration. This means that our quantum mechanical systems must be considered as being isolated (see, for example, [25], pp. 288-291, which presents an analysis based upon the notion of entropy somewhat similar to the one we made in chapter three). When we look at the stationary Schrödinger equation, written as
$$-\frac{\hbar^2}{2m}\nabla^2\psi + V\psi = E\psi,$$
we argue that, within the present interpretation, this must be an equation for an ensemble of particles or for a single system (of particles) considered within some time interval $\Delta t$. Let us assume that each quantum mechanical system is composed of only one particle, since this is the most basic situation for an interpretation; the interpretation of many-particle systems is a mere extension of this analysis. In the ensemble picture, the particle of each system of the ensemble behaves in ways dictated by the applied field, mathematically represented by the potential $V$ in the equation. In the single-system picture the particle of the sole quantum mechanical system behaves in ways dictated by the applied $V$. In both pictures the energy $E$ represents the average energy of the particle. In the following we will use the single-system approach, since it is the one that poses more difficulties; the ensemble interpretation follows quite naturally from the single-system one. The previous description means that we are separating our isolated system into two interacting subsystems: one subsystem composed of the corpuscles referred to in the Schrödinger equation through the coordinates $\vec{q}$, and another subsystem composed of the field responsible for the potential $V$. That is, the isolated physical system is separated into two statistically interacting subsystems (this, in turn, is the very source of a Langevin-type description of physical systems; see chapter six).
Thus, for instance, for a hydrogen atom problem, we have a corpuscular subsystem composed of the electron and the proton: since we keep the proton frozen at the origin, the energy $E$ is the average energy of the electron. The field subsystem is considered in a coarse-grained fashion, but we can imagine it as being composed of the innumerable photons (virtual, real, etc.) that are the bearers of the electromagnetic interaction between the proton and the electron. This type of separation is not new and, in fact, is usual in the literature on the electromagnetic field. “The [previous] statements of Poynting’s theorem have emphasized the energy of the electromagnetic fields. The work done per unit time per unit volume by the fields ($\vec{J}\cdot\vec{E}$) is a conversion of electromagnetic into mechanical or heat energy. Since matter is ultimately composed of charged particles (...) we can think of this rate of conversion as a rate of increase of energy of the charged particles per unit volume. Then we can interpret Poynting’s theorem for the microscopic fields as a statement of conservation of energy of the combined system of particles and fields.” (see [71], p. 190)
Note, however, that if we look this way at the quantum mechanical description of the mechanical system, it is not difficult to see where the fluctuations in the energy of the corpuscular system come from. In fact, we may picture the electromagnetic interaction between the electron and the proton as taking place because, at any instant of time 𝑡, the electron will be sending photons to the nucleus, and receiving some from it. Photons travel with finite velocity, as we know. Then, at any instant of time 𝑡, we have a corpuscular system with some energy 𝐸̅ + 𝛿𝐸(𝑡), where 𝛿𝐸(𝑡) is the deviation from the mean value due to the amount of absorption and emission of photons at 𝑡, while we will have a displacement of the field average energy given by −𝛿𝐸(𝑡) (since the complete physical system is isolated and the total energy cannot fluctuate). When we look at this behavior during a time interval Δ𝑡, if the emission and absorption of photons are random, we must get a fluctuation pattern for the energy of both subsystems (giving a final average energy for the electron as 𝐸̅ ). This same explanation would be valid (although in a more or less evident fashion) no matter which system we are considering. The content of the “quantum potential” term (which represents the energy fluctuations) is precisely a time average at each point of the configuration space of such fluctuations. Thus, it is not strange at all that it can be written as a local temperature: a local temperature of the type 𝑇(𝑞⃗) is a measure of the time averaged momentum fluctuations at each point 𝑞⃗ (remember that time averages of stationary systems are time-independent). The quantum potential represents the time-averaged amount of energy given to or removed from the field by the corpuscular system at each point 𝑞⃗. 
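The exchange picture described above can be illustrated with a deliberately crude toy model. The sketch below is only an assumption-laden illustration: the uniform zero-mean exchange `d_e`, the numerical energies and the function name `simulate_exchange` are all hypothetical and are not taken from the text. It merely shows that when the corpuscular energy is 𝐸̅ + 𝛿𝐸(𝑡) and the field energy shifts by −𝛿𝐸(𝑡), the total stays fixed while the time average of the corpuscular energy returns to 𝐸̅.

```python
import random

def simulate_exchange(e_mean=-13.6, e_field=13.6, kick=0.5, n_steps=20000, seed=1):
    """Toy model: at each instant the corpuscle gains d_e and the field loses d_e.

    All numbers are hypothetical placeholders chosen for illustration only.
    """
    rng = random.Random(seed)
    particle, field = [], []
    for _ in range(n_steps):
        d_e = rng.uniform(-kick, kick)  # zero-mean random exchange at this instant
        particle.append(e_mean + d_e)   # corpuscular energy: E_bar + dE(t)
        field.append(e_field - d_e)     # field energy is displaced by -dE(t)
    return particle, field

particle, field = simulate_exchange()
total = [p + f for p, f in zip(particle, field)]
time_avg = sum(particle) / len(particle)
print(max(total) - min(total))  # essentially zero: the isolated total does not fluctuate
print(time_avg)                 # close to e_mean after averaging over the interval
```

The toy model says nothing about the actual statistics of photon emission and absorption; it only makes the bookkeeping of the two subsystems explicit.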
Since our developments up to this point were not capable of finding an expression for the dynamic equations that incorporates explicitly fluctuations, we can speak only of time-averaged quantities at each point 𝑞⃗, as we did in the stochastic mathematical derivation of the Schrödinger equation in this chapter; in chapter six we will derive the Langevin equations that correctly represent the random behavior of any quantum system and the time profile of the fluctuations will also become explicit. Note that while the total system is isolated, the corpuscular and field subsystems are not and, for the corpuscular subsystem, we passed from an isolated microcanonical description to
a canonical ensemble picture (see footnote 8). This maneuver, however, does not violate the isolated character of the total system, and it is a completely objective approach to the quantum mechanical problem. So far, however, we have developed our intuitions without giving the appropriate formal justifications. It was already stressed that this book’s approach is committed to staying as close to the formalism as possible. Thus, it is desirable to present formal developments that can justify the above interpretation. We do this in the next subsection for the electromagnetic field (which is the prevailing system with respect to Quantum Mechanics), but these developments may be generalized to any field. An explicit connection in more general terms will have to wait until chapter six. As a final note of caution, the previous discussions show clearly that Quantum Mechanics (as represented by the Schrödinger equation) should be considered as a mean field theory. This must be so because, in this description, the potential 𝑉(𝑞⃗) never changes although, strictly speaking, it must fluctuate in the same manner as the corpuscular subsystem’s energy. Indeed, this fact may be the source of many misconceptions (the most obvious one being the interpretation of the tunnel effect, to be considered in what follows).
4.5.1. The Formal Grounds for Electromagnetic Stochastic Behavior

In the characteristic function derivation of the Schrödinger equation we assumed that the problem has no applied electromagnetic field and used the function $\exp(i\vec{p}\cdot\delta\vec{r}/\hbar)$ as the kernel of the Fourier transform. If electromagnetic fields are present, then the canonical momentum is not $\vec{p}$, but $\vec{p} - e\vec{A}(\vec{r};t)/c$, where $\vec{A}(\vec{r};t)$ is the vector potential of the electromagnetic field; thus we may generalize our characteristic function derivation by writing

$$
Z(\vec{r},\delta\vec{r};t) = \int \exp\left\{\frac{i}{\hbar}\left[\vec{p} - \frac{e}{c}\vec{A}(\vec{r};t)\right]\cdot\delta\vec{r}\right\} F(\vec{r},\vec{p};t)\,d^{3}p \tag{4.30}
$$
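and it is very easy to show that the same derivation steps used in chapter two give the Schrödinger equation

$$
\left\{\frac{1}{2m}\left[-i\hbar\nabla + \frac{e}{c}\vec{A}(\vec{r};t)\right]^{2} + \varphi(\vec{r};t)\right\}\psi(\vec{r};t) = i\hbar\frac{\partial\psi(\vec{r};t)}{\partial t}, \tag{4.31}
$$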
which is the one we should find; $\varphi(\vec{r};t)$ is the scalar potential of electromagnetic theory. This, however, is not the equation we generally try to solve when considering, for example, the hydrogen atom. Instead of this last equation, one solves

$$
\left\{-\frac{\hbar^{2}}{2m}\nabla^{2} + \varphi(\vec{r})\right\}\psi(\vec{r};t) = E\,\psi(\vec{r};t), \tag{4.32}
$$
8 “Ensemble” here has the general meaning, encompassing also single systems within Δ𝑡 to which the ergodic assumption was already applied.
in which the vector potential, time dependent in principle, has disappeared and the problem becomes time independent, for it is the static Coulomb potential 𝜑(𝑟⃗) that is written now. The question then is: on what grounds can we assume that the problem mathematically represented by (4.32) is equivalent to the problem represented by (4.31)? Indeed, it is widely known that one of the major historical reasons for developing the formalism of Quantum Mechanics was the fact that, when considering the time dependent electromagnetic potentials, instead of only the electrostatic one, Newton’s equations imply rapidly decaying solutions because of the emission of electromagnetic radiation. This radiation, as we all know, is related to the time derivative of the vector potential. It is fairly obvious that if we consider only the electrostatic potential in Newton’s equations the solutions are time-independent and stable, exactly as with the Schrödinger equation for that matter. Thus, if we are to compare the solutions regarding the notion of stability furnished by the two theories (Quantum Mechanics and Newtonian Mechanics), we should keep the vector potential term in (4.31) and solve the time-dependent Schrödinger equation to see whether a time-independent solution obtains. In this case, the formal expression for 𝐴⃗(𝑟⃗; 𝑡) would be given by the usual Liénard-Wiechert formulas. Therefore, at this point of the formal developments, there is no justification for the removal of the vector potential from (4.31), and the problem of stability remains untouched. In fact, it seems to be an old misconception that the usual Schrödinger theory explains the stability of atoms. This is surely not the case, for stability is being assumed when we pass from (4.31) to (4.32). What Quantum Mechanics does is to explain quantization assuming stability, something Newtonian mechanics cannot do.
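As a small numerical aside, the time-independent equation (4.32) with the static Coulomb potential can be checked against the textbook hydrogen ground state. The sketch below rests on assumptions that are not in the original text: atomic units (ħ = m = e = 1), a proton fixed at the origin, the radial function u(r) = r·exp(−r), and an illustrative finite-difference step. The function names are hypothetical.

```python
import math

def u(r):
    """Radial ground-state function u(r) = r * exp(-r), atomic units assumed."""
    return r * math.exp(-r)

def local_energy(r, h=1e-3):
    """Evaluate (H u)/u with H = -1/2 d^2/dr^2 - 1/r, using a central difference."""
    second = (u(r + h) - 2.0 * u(r) + u(r - h)) / h**2
    return (-0.5 * second - u(r) / r) / u(r)

for r in (0.5, 1.0, 2.0, 5.0):
    # Each value should be close to -0.5 hartree, independent of r
    print(r, local_energy(r))
```

That the ratio (Hu)/u is the same constant at every radius is what it means for u to be a stationary solution of (4.32); the check assumes the static potential from the start and therefore says nothing about the stability question raised above.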
However, as we have seen, passing from a Newtonian picture to a stochastic one (from which the Schrödinger equation can be derived) means assuming the existence of fluctuations. The appearance of fluctuations is our single hypothesis here and it must suffice to explain the disappearance of $\vec{A}(\vec{r};t)$ from (4.31). We now show that the formal developments used to explain this disappearance give us the key to understanding our previous interpretation of the origin of stochastic behavior. Indeed, from expression (4.30) we can derive, using the same developments presented in chapter two, the Bohr-Sommerfeld relation

$$
\oint_{C}\left[\vec{p} + \frac{e}{c}\vec{A}(\vec{r};t)\right]\cdot d\vec{\ell} =
\begin{cases}
nh \\
\left(n + \tfrac{1}{2}\right)h
\end{cases}, \tag{4.33}
$$
which now involves the vector potential because of the new kernel used in (4.30). However, this last relation implies that the magnetic flux is an adiabatic invariant given the trajectory of the particle (see [71], p. 419). This adiabatic invariance of the magnetic flux has far-reaching consequences that, to our knowledge, have not been fully explored in the literature. It is important to remember that the crucial step in moving from electrostatics and magnetostatics to electromagnetism is based on Faraday’s equation. This equation may be written in integral form as

$$
\oint_{C}\vec{E}(\vec{r};t)\cdot d\vec{\ell} = -\frac{d}{dt}\int_{S}\vec{B}(\vec{r};t)\cdot\hat{n}\,dS = -\frac{d}{dt}\oint_{C}\vec{A}(\vec{r};t)\cdot d\vec{\ell}, \tag{4.34}
$$
where $C$ is the closed curve bounding the open surface $S$ and $\hat{n}$ is the unit normal to $S$. If the magnetic flux is an adiabatic invariant, then the right hand side of (4.34) gives zero and the electric and magnetic fields decouple. In this case, we have

$$
\oint_{C}\vec{E}(\vec{r};t)\cdot d\vec{\ell} = 0,
$$

which is equivalent to the condition defining the electrostatic domain, since this implies that $\vec{E} = -\nabla\varphi(\vec{r})$. Thus, in this case it is adequate to remove the vector potential from (4.31) and to take $\varphi(\vec{r})$ as time independent. This is the reason why quantization (of the magnetic flux) implies stability, and thus, only now, explaining quantization amounts to explaining stability.

There is an assumption in the previous developments that must be made explicit: we are assuming the priority of Maxwell’s integral equations over their differential formulation. This is important to note, since the integral equations are global while the differential ones are local, and this means that there may be situations in which local time-dependent behaviors compensate each other to give a global time-independent behavior. From the historical point of view this assumption about the priority of the integral equations seems justified, for these equations were derived phenomenologically, taking into consideration actual extended circuits over which the integrals were performed. Indeed, there is an assumption that one must make when going from the integral to the differential equations by means of the integral theorems (Stokes, divergence, etc.): the integral equations can be written as differential ones only when the fields have no singularities within the region of interest. If there are regions of singularity in the fields within the spatial region of interest, only the integral equations are valid. Thus, the common approach of deriving the Liénard-Wiechert potentials from the differential equations and then concluding that there is a net emission of electromagnetic radiation energy may be considered somewhat premature. There may be situations in which the charges (electrons, protons, etc.) are locally accelerated but from which no net electromagnetic radiation emerges.
This conclusion is valid for both Quantum and Newtonian Mechanics. However, Newtonian Mechanics cannot impose the necessary dynamical conditions the fields must satisfy to block the passage to the differential equations (and thus to radiation). Quantum Mechanics, because of the notion of local fluctuations, provides precisely these conditions. In fact, the “quantum potential”, which represents the kinetic energy fluctuations of the particle, generally presents these infinities (as one can easily see in Table 3 for the hydrogen atom). Thus, the fluctuation in the kinetic energy of the corpuscles (the electron in the hydrogen atom) must be accompanied by the fluctuation in the electromagnetic energy of the fields (strict conservation of energy), and since the former has infinities in its formal expression, so must the latter. Physicists are usually not very careful with the mathematical constraints of their theories (which is in general good, but sometimes this may
corrupt their interpretation of the derived formal results). The priority of the integral Maxwell’s equations should be obvious since, for the reasons just mentioned, any solution of these equations in the differential format would be a solution to the integral equations, but the converse is not true. This is the meaning of the expression (4.33). As is well known, adiabatic invariance implies that, whenever the system undergoes variations in the relevant parameters that contribute with small fractions to the overall values of these parameters, the related action integrals remain invariant. Thus, expression (4.33) says that although there may be local variations of the vector potential in a cycle, these variations are expected to compensate and to maintain the total magnetic flux constant. In fact, this expression says that there is an exchange of energy between the momentum of the particles and the momentum of the field [represented by 𝐴⃗(𝑟⃗; 𝑡)]. Thus, for very brief time intervals energy passes from the corpuscular subsystem to the field subsystem and backwards, showing that there is fluctuation in both subsystems (and also quantization). In fact, the quantization of the magnetic flux is already experimentally well proved for a number of cases, as in the Meissner effect (see, for example, [72]). Boyer, as other authors seeking for connections between stochastic electrodynamics and quantum electrodynamics[73], also pointed out the deep connection between the adiabatic invariants and stochastic electrodynamics[74]: “here we wish to point out a curious sidelight found in the further development of the theory. It turns out that the action-angle variables provide a convenient description of mechanical systems in random classical radiation (...) 
[since] it is easy to show that the zero-point radiation is the unique spectrum of random classical radiation which leaves the adiabatic invariants of a nonrelativistic periodic mechanical system with no harmonics as still adiabatic invariant in the presence of radiation.”
From what we have said before, this means that the exchange of momentum between the particle subsystem and the electromagnetic field must be quantized (we call these quanta photons). The separation of the complete problem into two subsystems implied the fluctuations that, in turn, implied the quantization and thus, stability. This is precisely the point, since we know that “the equilibrium state, static from the viewpoint of classical thermodynamics, is necessarily dynamic. Local inhomogeneities continually and spontaneously generate, only to be attenuated and dissipated (...)” (See [43], p. 210.)
Moreover, which is most important for us here: “the ‘subsystem’ may, in fact, be a small portion of a larger system, the remainder of the system constituting the ‘reservoir’. In that case the fluctuations are local fluctuations within a nominally homogeneous system.” (See [43], p. 423.)
Thus, instead of having a conceptual abyss between Classical and Quantum Mechanics, we have just a formal distinction between Newtonian and Quantum Mechanics. The
distinction comes because the former lacks the fluctuations of the latter. There is no need to raise this syntactic difference between Newtonian and Quantum Mechanics to the level of a semantic difference between Classical and Quantum Mechanics. The previous interpretation may help us understand the behavior of any quantum mechanical system only in terms of internal and objective fluctuations (leading to quantization). In fact, another important advantage of showing the equivalence between the stochastic derivation and ours is that, by means of this equivalence, we automatically add to our own framework numerous results achieved by the stochastic approach in the last decades. Examples are the relativistic extensions giving both the Klein-Gordon and the Dirac equations [75, 76], their radiative corrections [77], a path integral formulation [78] and its application to the problem of barrier penetration [79]. To illustrate this, we will use the stochastic approach to interpret the tunnel effect in the next subsection.
4.5.2. Corpuscular Interpretation of the Tunnel Effect

The tunnel effect is one of those important exemplary problems to which T. Kuhn refers [80]. Exemplary problems serve to introduce and train new generations of scientists in the new paradigm. Needless to say, the tunnel effect is customarily explained on the basis of some sort of dualism (ontological or epistemological). However, the development of stochastic approaches allows one to interpret this phenomenon also in terms of a corpuscular ontology [79]. The explanation may be given in terms of stationary solutions or of a time dependent wave packet being scattered by a potential barrier. We use here the second situation, but the explanations are identical. Before giving the explanation, it is interesting to note that the fluctuations induced in the incident wave packet did not pass unnoticed by the founders of Quantum Mechanics. Thus, in Figure 4.2 we have the graphic representation of a simulation in which a wave packet arrives at a potential barrier (also shown) and has its functional shape changed by the barrier in an almost self-evident form of fluctuation-induced phenomenon (see http://www.physics.orst.edu/~rubin/nacphy/CPapplets/QMWAVE/Potbar/Potbar.html for the simulation with an applet). The important fact is that the modification of the shape of the incident packet (be it a wave or an ensemble of particles) must imply that the potential barrier is interacting with it and thus furnishing energy to it or removing energy from it (in fact, if the barrier were not interacting with the incoming particles, there would not be a physical problem to begin with). However, as is obvious from the figures, the potential barrier (which was drawn, and is not part of the simulation) is kept unaltered (see [70], pp. 106, 107 and [81]). The fact that numerous peaks appear in the incoming packet implies that some of the corpuscles of which it is made took energy from the barrier, while others gave energy to it.
Thus, the barrier must fluctuate accordingly. The fact that it does not in our simulation (or when we consider Schrödinger equation’s solutions) means only that this approach is a sort of mean field theory, so that the potentials are being considered fixed at their average values.
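For comparison, the mean-field calculation just mentioned (a barrier held fixed at its average value) has a standard closed form for a rectangular barrier. The sketch below uses that textbook transmission coefficient for energies below the barrier height; it does not implement the fluctuating-barrier picture proposed here. The function names, the unit choice ħ = m = 1 and the sample numbers are assumptions made for illustration.

```python
import math

def transmission(energy, v0, width, mass=1.0, hbar=1.0):
    """Textbook transmission through a fixed rectangular barrier, for 0 < energy < v0."""
    if not 0.0 < energy < v0:
        raise ValueError("this formula assumes 0 < energy < v0")
    kappa = math.sqrt(2.0 * mass * (v0 - energy)) / hbar  # decay constant inside the barrier
    return 1.0 / (1.0 + (v0**2 * math.sinh(kappa * width)**2) / (4.0 * energy * (v0 - energy)))

def thick_barrier_estimate(energy, v0, width, mass=1.0, hbar=1.0):
    """Leading exponential approximation, valid when kappa * width is large."""
    kappa = math.sqrt(2.0 * mass * (v0 - energy)) / hbar
    return 16.0 * energy * (v0 - energy) / v0**2 * math.exp(-2.0 * kappa * width)

t_exact = transmission(0.5, 1.0, 5.0)
t_approx = thick_barrier_estimate(0.5, 1.0, 5.0)
print(t_exact, t_approx)  # small but nonzero, and nearly equal for this thick barrier
```

In this fixed-barrier description the transmitted fraction is nonzero although every incoming energy is below the barrier height; the text attributes that fraction to particles that acquired energy from the (here averaged-out) fluctuations.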
Figure 4.2. The behavior of a wave packet scattered by a potential barrier (taken from [70]).
Of course, if the incoming packet is prepared in such a way as to have particles with energy comparable to, although lower than, the energy of the barrier (as in Figure 4.2), some of these particles may acquire enough energy from the fluctuations to pass, while most of them are reflected back. This means that there is no weird negative linear momentum, nor any phenomenon of interference of the particle with itself: just particles and fluctuations. In fact [82]: “since the wave function that describes quantum barrier penetration by particles decays exponentially, there is reason to believe that this phenomenon may be expressed classically in terms of guided (acoustic, electric, mechanical, etc.) wave propagation. The subject matter reported, therefore, is the investigation of a classical analog to particle tunneling through quantum potential barriers, utilizing electromagnetic propagation through a section below-cutoff waveguide.”
Thus, as in acoustic, mechanical, etc. phenomena, the problem can be assessed using corpuscular ontologies, in which the existing wave is a manifestation of the behavior of an ensemble of underlying particles. The tunnel effect is an expression of undulatory behavior, not necessarily quantum mechanical or classical, but any undulatory behavior. We have now reached a point at which a mathematical justification of the use of infinitesimal quantities (𝛿𝑞 or 𝛿𝑡 considered up to second order) is called for. In the next chapter we will show that this justification reveals the connection between Quantum Mechanics and the Central Limit Theorem.
Chapter 5
QUANTUM MECHANICS AND THE CENTRAL LIMIT THEOREM

In previous chapters we presented different derivations of the Schrödinger equation and we also showed their mathematical equivalence. However, these derivations were all based upon some “second order” expansion to which we associated the notion of infinitesimal displacements (in the characteristic function and entropy derivations) or infinitesimal times (in the stochastic and Feynman’s derivations). This assumption of infinitesimality must be justified. In the previous chapter we touched, in passing, on one such justification, but the approach we are trying to develop in this book demands that we investigate the topic further. Thus, we show in this chapter that the solid ground on which these derivations rest is furnished by the Central Limit Theorem (CLT) and its conditions of validity. Indeed, the connections between our derivations and the CLT could be guessed, at first, from the expression found for the phase space probability density function. As we recall, this expression is

$$
F(\vec{q},\vec{p};t) = \frac{\rho(\vec{q};t)}{\left[2\pi\langle\delta\vec{p}(\vec{q};t)^{2}\rangle/3\right]^{3/2}}\exp\left\{-\frac{3}{2}\,\frac{\left[\vec{p}-\vec{p}(\vec{q};t)\right]^{2}}{\langle\delta\vec{p}(\vec{q};t)^{2}\rangle}\right\}.
$$
If we look at each point $\vec{q}$ in configuration space, this function is a Gaussian function in $\vec{p}$ with average $\vec{p}(\vec{q};t)$ and variance $\langle\delta\vec{p}(\vec{q};t)^{2}\rangle$; the marginal probability densities $\rho(\vec{q})$ and $\pi(\vec{p})$ are not, in general, Gaussian functions. Gaussian probability functions are the natural outcome of physical processes satisfying the conditions for the applicability of the CLT, and this theorem fixes an enormous universality class for these underlying physical processes. Note that we could have written the previous expression (in one dimension), the result of our previous derivations, as

$$
F(q,p;t) = \frac{\rho(q;t)}{\left\{2\pi\left(-\hbar^{2}\dfrac{\partial^{2}\ln Z}{\partial(\delta q)^{2}}\right)_{0}\right\}^{1/2}}
\exp\left\{-\frac{1}{2}\,\frac{\left[p-\left(-i\hbar\dfrac{\partial\ln Z}{\partial(\delta q)}\right)_{0}\right]^{2}}{\left(-\hbar^{2}\dfrac{\partial^{2}\ln Z}{\partial(\delta q)^{2}}\right)_{0}}\right\}, \tag{5.1}
$$
which is exactly the result one finds as the final expression for the CLT (assuming $p$ as the random variable and fixing $q$) (see [83], p. 87), given that $Z(q,\delta q;t)$ is the characteristic function of the problem and the subscript 0 means that the derivatives are evaluated at $\delta q \to 0$. However, the way by which we arrived at the expression for the phase space probability density should be criticized, as we have done, since in the derivation we assumed $\delta q$ to be infinitesimal, while to obtain $F(q,p;t)$ we needed to calculate the inverse Fourier transform of the characteristic function as

$$
F(q,p;t) = \int_{-\infty}^{+\infty}\exp\left(-\frac{ip\,\delta q}{\hbar}\right)Z(q,\delta q;t)\,d(\delta q), \tag{5.2}
$$
which seems meaningless, since in the integration we are letting the infinitesimal parameter 𝛿𝑞 vary within the interval (−∞, +∞). All these issues about an apparently infinitesimal value of 𝛿𝑞 were responsible for a criticism of the derivation of the Schrödinger equation done by us [85], even though similar assumptions were also made by Feynman [86] and in the stochastic derivations, as we have shown. The developments of this chapter are intended to remove this inconsistency by showing what is involved in keeping only terms up to second order in 𝛿𝑞, that is, by clarifying the meaning of this expansion, so that the previously mentioned integration remains mathematically sound (see footnote 1). In the first section we present the CLT in its usual form, since some readers may be unfamiliar with it. It will be presented as derived in usual statistics textbooks for only one random variable. In the following section we present the CLT for two random variables (𝑞⃗, 𝑝⃗), since we are interested in phase-space probability distributions. We then show the relations between our previous derivations and the application of the CLT to quantum mechanical problems.
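The roles of (5.1) and (5.2) can be made concrete with a numerical sketch at one fixed point q. Everything specific in it is an assumption for illustration: the Gaussian form chosen for the characteristic function, the values `p_mean` and `p_var`, the units ħ = 1, the function names, and the normalization factor 1/(2πħ), which (5.2) leaves implicit. The first function recovers the mean and variance from derivatives of ln Z at 𝛿𝑞 = 0, as in (5.1); the second carries out the inverse transform (5.2) over a wide, finite range of 𝛿𝑞.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption)

def z_char(dq, p_mean=0.7, p_var=0.3):
    """Hypothetical Gaussian characteristic function at a fixed point q."""
    return cmath.exp(1j * p_mean * dq / HBAR - p_var * dq**2 / (2.0 * HBAR**2))

def mean_and_variance(z, h=1e-3):
    """Mean and variance from the first two derivatives of ln Z at dq = 0, as in (5.1)."""
    ln_z = lambda x: cmath.log(z(x))
    d1 = (ln_z(h) - ln_z(-h)) / (2.0 * h)
    d2 = (ln_z(h) - 2.0 * ln_z(0.0) + ln_z(-h)) / h**2
    return (-1j * HBAR * d1).real, (-HBAR**2 * d2).real

def density_from_z(p, z, half_width=20.0, n=4000):
    """Trapezoidal version of (5.2), with an assumed 1/(2*pi*hbar) normalization."""
    dx = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (cmath.exp(-1j * p * x / HBAR) * z(x)).real
    return total * dx / (2.0 * math.pi * HBAR)

mean, var = mean_and_variance(z_char)
gauss = lambda p: math.exp(-(p - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
print(mean, var)                                # recovers p_mean and p_var
print(density_from_z(1.2, z_char), gauss(1.2))  # the two densities agree
```

For this exactly Gaussian Z the two routes agree trivially; the question taken up in this chapter is why a second-order expansion in 𝛿𝑞 may be treated as if it were such a Z over the whole integration range.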
5.1. THE CENTRAL LIMIT THEOREM FOR ONE RANDOM VARIABLE

We present in this section the usual derivation of the Central Limit Theorem (CLT) for one random variable as an introduction to the results of the next sections. The reader acquainted with such derivations may skip this section without any loss. It is well known that if 𝑦1 , 𝑦2 , … , 𝑦𝑛 are 𝑛 independent random variables with Gaussian distribution, then the probability distribution of the random variable 𝑦 = 𝑦1 + 𝑦2 + ⋯ + 𝑦𝑛 is also given by a Gaussian function. The Central Limit Theorem (CLT) states that the same property is valid under much more general conditions on the distributions of the 𝑦𝑖 . For instance, in the one dimensional random walk problem the distribution of each step 𝑙 (forward with probability 𝑝, backward with probability 𝑞 = 1 − 𝑝) is not Gaussian, but in the
1 The results of the present chapter are an extension of the work published in Foundations of Physics, vol. 34, issue 6, June 2004, pp. 891-935.
limit of a large number $N$ of steps the distribution of the sum $x = Nl$ is of the Gaussian type [25]. The CLT was formulated by Laplace in 1812, but was rigorously proved only in 1901 by the Russian mathematician Liapounoff. We present here a sketch of its derivation.

Theorem: consider $y_1, y_2, \ldots, y_n$ a sequence of independent random variables, with $\langle y_i\rangle = \mu_i$ and $\langle (y_i-\mu_i)^2\rangle = \sigma_i^2$, $i = 1, 2, \ldots$ If we put $y = y_1 + y_2 + \cdots + y_n$ then, under very general conditions, the reduced variable (see footnote 2)

$$
u^{(n)} = \frac{y - \bar{\mu}^{(n)}}{\sqrt{\overline{\sigma^{2}}^{(n)}}}, \tag{5.3}
$$

where

$$
\bar{\mu}^{(n)} = \sum_{k=1}^{n}\mu_k, \qquad \overline{\sigma^{2}}^{(n)} = \sum_{k=1}^{n}\sigma_k^{2},
$$

has approximately a Gaussian distribution with mean 0 and variance 1. Thus, if $F_n$ is the probability distribution function of the random variable $y$, then we have

$$
\lim_{n\to\infty}F_n(y) = \frac{1}{\sqrt{2\pi\bar{\sigma}^{2}}}\exp\left\{-\frac{(y-\bar{\mu})^{2}}{2\bar{\sigma}^{2}}\right\}, \tag{5.4}
$$

where $\bar{\mu} = \bar{\mu}^{(\infty)} = \sum_{k=1}^{\infty}\mu_k$ and $\bar{\sigma}^{2} = \overline{\sigma^{2}}^{(\infty)} = \sum_{k=1}^{\infty}\sigma_k^{2}$.
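The statement of the theorem can be explored numerically with the random walk mentioned above. The following is only a sketch under stated assumptions: the step length, the forward probability, the number of steps and trials, the seed, and the name `reduced_sums` are all illustrative choices, and the mean and variance of a single step are kept fixed (the standard textbook setting). Each trial sums non-Gaussian ±step increments and forms the reduced variable (5.3).

```python
import math
import random
import statistics

def reduced_sums(n_steps, n_trials, p_forward=0.7, step=1.0, seed=0):
    """Reduced variable u = (y - n*mu) / sqrt(n*var) for sums of +/- step increments."""
    rng = random.Random(seed)
    mu = step * (2.0 * p_forward - 1.0)                 # mean of one step
    var = step**2 * (1.0 - (2.0 * p_forward - 1.0)**2)  # variance of one step
    out = []
    for _ in range(n_trials):
        y = sum(step if rng.random() < p_forward else -step for _ in range(n_steps))
        out.append((y - n_steps * mu) / math.sqrt(n_steps * var))
    return out

u_values = reduced_sums(n_steps=200, n_trials=4000)
mean_u = statistics.fmean(u_values)
var_u = statistics.pvariance(u_values)
print(mean_u, var_u)  # sample mean near 0 and sample variance near 1
```

A histogram of `u_values` would show the familiar bell shape even though each individual step takes only two values.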
Proof. (We will demonstrate the theorem only for situations in which the $\mu_k = m$ are all equal and so are all the $\sigma_k^{2} = s^{2}$. However, the theorem has a much wider applicability; see [68], pp. 270-272.) Under these assumptions, we have $\bar{\mu}^{(n)} = nm$ and $\overline{\sigma^{2}}^{(n)} = ns^{2}$ (note that this means that $s^{2}$ must go to zero with $n^{-1}$ as we make $n \to \infty$, the same being valid for $m$). Consider now the probability $P(y)\,dy$ of being in some interval $(y, y+dy)$ after $n$ steps, each one within $(y_k, y_k + dy_k)$ with probability $w(y_k)\,dy_k$ (we are assuming that all the $w(\ast)$ are identical). This situation is represented in Figure 5.1 for $n = 3$ trials. Since each $y_k$ is independent of all the others (this is an assumption of the theorem), the probability of having $y$ within $dy$ after $n$ trials will be given by the sum of the products indicated in (5.5). In this way, every sequence of steps $y_1^{(j)}, y_2^{(j)}, \ldots, y_n^{(j)}$ (all possible $j$) is considered in finding the probability of arriving at the interval $(y, y+dy)$.
2 The index n simply indicates that we have summed only up to a finite number n of random variables.
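$$
P(y)\,dy = \text{sum}\downarrow\left\{
\begin{array}{l}
w\big(y_1^{(1)}\big)\,dy_1^{(1)}\; w\big(y_2^{(1)}\big)\,dy_2^{(1)} \cdots w\big(y_n^{(1)}\big)\,dy_n^{(1)} \\
w\big(y_1^{(2)}\big)\,dy_1^{(2)}\; w\big(y_2^{(2)}\big)\,dy_2^{(2)} \cdots w\big(y_n^{(2)}\big)\,dy_n^{(2)} \\
\qquad\vdots \\
w\big(y_1^{(n)}\big)\,dy_1^{(n)}\; w\big(y_2^{(n)}\big)\,dy_2^{(n)} \cdots w\big(y_n^{(n)}\big)\,dy_n^{(n)}
\end{array}
\right. \tag{5.5}
$$

as long as we have $y \le \sum_{k=1}^{n} y_k^{(j)} \le y + dy$ for each $j$. The sum in (5.5) (which is an integral) can be written as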
Figure 5.1. Three trials of random movements in one dimension. The steps are marked on the horizontal axis as $y_j^{(k)}$, while their (equal) sum $y$ is represented on the vertical axis.
$$
P(y)\,dy = \underbrace{\int_{y_1}\cdots\int_{y_n}}_{dy}\;\prod_{k=1}^{n} w(y_k)\,dy_k, \tag{5.6}
$$

as long as $y \le \sum_{k=1}^{n} y_k \le y + dy$. This is a very difficult integration to perform, since the integration limits are quite awkward. However, we can use the overall condition upon the sum $\sum_{k=1}^{n} y_k$ to move all the difficulties out of the integration limits and into the integrand. This is done by inserting into the integrand the factor $\delta\left(y - \sum_{k=1}^{n} y_k\right)dy$, where $\delta(\ast)$ is Dirac’s delta distribution, which goes to 1 when $dy$ goes to zero. We can thus rewrite the previous probability as

$$
P(y)\,dy = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\delta\left(y - \sum_{k=1}^{n} y_k\right)dy\;\prod_{k=1}^{n} w(y_k)\,dy_k, \tag{5.7}
$$

where all the restrictions upon the integration limits were lifted. We then use the integral representation of Dirac’s delta distribution, given by

$$
\delta\left(y - \sum_{k=1}^{n} y_k\right) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-it\left[y - \sum_{k=1}^{n} y_k\right]}\,dt,
$$

to write (using the multiplicative property of the exponential and disregarding the irrelevant multiplicative factor $1/2\pi$)

$$
P(y) = \int_{-\infty}^{+\infty} e^{-ity}\,dt\,\prod_{k=1}^{n}\int_{-\infty}^{+\infty} e^{ity_k}\,w(y_k)\,dy_k.
$$

Now, if we write the characteristic function of each $y_k$ as

$$
z(t) = \int e^{ity_k}\,w(y_k)\,dy_k,
$$

then

$$
P(y) = \int_{-\infty}^{+\infty} e^{-ity}\,[z(t)]^{n}\,dt.
$$

Since the characteristic function for the variable $y$ is given by

$$
Z(t) = \int_{-\infty}^{+\infty} e^{ity}\,P(y)\,dy,
$$

we get

$$
Z(t) = [z(t)]^{n}, \tag{5.8}
$$
which is a well known result of statistical theory. Now, if instead of working with $y_k$ we use the rescaled random variable $(y_k - m)/s$, such that

$$
\tilde{z}(t) = \int e^{it(y_k - m)/s}\,w(y_k)\,dy_k,
$$

then the properties of the Fourier transform give us

$$
\tilde{z}(t) = e^{-imt/s}\int e^{ity_k/s}\,w(y_k)\,dy_k = e^{-imt/s}\,z\left(\frac{t}{s}\right). \tag{5.9}
$$

This last result means that the rescaled characteristic function $\tilde{Z}(t)$ of the variable $y$ must be given by

$$
\tilde{Z}(t) = e^{-inmt/(\sqrt{n}\,s)}\left[z\left(\frac{t}{\sqrt{n}\,s}\right)\right]^{n}.
$$

Thus, since the $y_k$’s are all independent, the characteristic function of their sum is just the product of the characteristic functions of each $y_k$ (actually, this is one of the most important features of characteristic functions). Thus, since $u^{(n)}$ is a (rescaled) linear function of $y$ (the sum of the random variables), the characteristic function of $u^{(n)}$ (related to $y$) is given by

$$
\tilde{Z}(t) = e^{-i(\sqrt{n}\,m/s)\,t}\left[z\left(\frac{t}{\sqrt{n}\,s}\right)\right]^{n}.
$$
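The product rule (5.8) and its rescaled form can be verified exactly for the two-valued random walk step, because the distribution of the sum is then binomial. The sketch below assumes a step of ±`step` taken forward with probability `p`; these parameter values and all function names are hypothetical choices for illustration.

```python
import cmath
import math

def z_step(t, p=0.3, step=1.0):
    """Characteristic function of one step: +step with probability p, -step otherwise."""
    return p * cmath.exp(1j * t * step) + (1.0 - p) * cmath.exp(-1j * t * step)

def binomial_weights(n, p):
    """Probability of k forward steps out of n, for k = 0..n."""
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def z_sum_direct(t, n, p=0.3, step=1.0):
    """Characteristic function of y = y_1 + ... + y_n computed from its distribution."""
    return sum(w * cmath.exp(1j * t * (2 * k - n) * step)
               for k, w in enumerate(binomial_weights(n, p)))

def z_reduced_direct(t, n, p=0.3, step=1.0):
    """Characteristic function of u = (y - n*m) / (sqrt(n)*s), computed directly."""
    m = step * (2.0 * p - 1.0)
    s = 2.0 * step * math.sqrt(p * (1.0 - p))
    return sum(w * cmath.exp(1j * t * ((2 * k - n) * step - n * m) / (math.sqrt(n) * s))
               for k, w in enumerate(binomial_weights(n, p)))

def z_reduced_formula(t, n, p=0.3, step=1.0):
    """The rescaled expression exp(-i*sqrt(n)*m*t/s) * [z(t/(sqrt(n)*s))]**n."""
    m = step * (2.0 * p - 1.0)
    s = 2.0 * step * math.sqrt(p * (1.0 - p))
    return cmath.exp(-1j * math.sqrt(n) * m * t / s) * z_step(t / (math.sqrt(n) * s), p, step)**n

for t in (0.1, 0.7, 2.0):
    print(abs(z_sum_direct(t, 12) - z_step(t)**12),
          abs(z_reduced_direct(t, 12) - z_reduced_formula(t, 12)))  # both differences are tiny
```

Both comparisons hold for any number of steps, not only asymptotically, since they rely solely on the independence of the steps.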
Thus

$$
\ln\tilde{Z}(t) = -i\frac{\sqrt{n}\,m}{s}\,t + n\ln\left[z\left(\frac{t}{\sqrt{n}\,s}\right)\right].
$$

We can now develop $z(t)$ in Maclaurin series as

$$
z(t) = 1 + z'(0)\,t + \frac{z''(0)}{2}\,t^{2} + R,
$$

where $R$ is the remainder of the series. Since the definition of $z(t)$ implies that $z'(0) = im$, $z''(0) = -(m^{2} + s^{2})$, we end with (see footnote 3)

$$
z(t) = 1 + imt - \frac{(m^{2}+s^{2})}{2}\,t^{2} + R,
$$

and thus

$$
\ln\tilde{Z}(t) = -i\frac{\sqrt{n}\,m}{s}\,t + n\ln\left[1 + i\frac{m}{\sqrt{n}\,s}\,t - \frac{(m^{2}+s^{2})}{2ns^{2}}\,t^{2} + R\right].
$$

Note that $\sqrt{n}\,m/s$ is a finite quantity, while $m/(\sqrt{n}\,s)$ and $(m^{2}+s^{2})/(2ns^{2})$ are infinitesimal, because of the infinitesimality of $m$ and $s^{2}$ (with $n^{-1}$). Note, however, that there is a factor $n$ multiplying the logarithm. This means that we have to work with the expanded expression of the logarithm. Since we want results for $n \to \infty$, we develop the logarithm in power series to find

$$
\ln\tilde{Z}(t) = -i\frac{\sqrt{n}\,m}{s}\,t + n\left[\left(i\frac{m}{\sqrt{n}\,s}\,t - \frac{(m^{2}+s^{2})}{2ns^{2}}\,t^{2} + R\right) - \frac{1}{2}\left(i\frac{m}{\sqrt{n}\,s}\,t - \frac{(m^{2}+s^{2})}{2ns^{2}}\,t^{2} + R\right)^{2} + \cdots\right]. \tag{5.10}
$$

Note that the first two terms cancel out and we end up with

$$
\ln\tilde{Z}(t) = -\frac{1}{2}t^{2} + \Omega_n(t),
$$

where $\Omega_n(t)$ depends upon $n$ as some inverse power law of the type $n^{-\alpha}$. Thus, if we take the limit $n \to \infty$ we get

$$
\lim_{n\to\infty}\ln\tilde{Z}(t) = -\frac{1}{2}t^{2},
$$

3 Note that 𝜇 and 𝜎 are infinitesimal quantities. This is what allows us to develop the expansion only up to second order. The remainder R will be a complicated expression of these quantities and powers of n.
which, upon inversion, gives the Gaussian probability function ($u = u^{(\infty)}$)

$$
F(u) = \frac{1}{\sqrt{2\pi}}\,e^{-u^{2}/2}.
$$
The reader has probably recognized in the previous proof of the CLT steps quite similar to those made in our characteristic function derivation of the Schrödinger equation. In the next section we make the same derivation for two random variables (but keeping 𝑞, the configuration space point, constant, which is just what we have done in this section, except that averages and variances will be given in terms of the parameter 𝑞).
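The limit obtained in the proof can also be watched numerically. The sketch below evaluates the logarithm of the rescaled characteristic function for the two-valued random walk step and compares it with −t²/2 as n grows. It is only an illustration under assumptions: the step distribution, the parameter values and the function name are hypothetical, and the step mean m and variance s² are held fixed, as in the standard textbook form of the theorem.

```python
import cmath
import math

def ln_z_reduced(t, n, p=0.3, step=1.0):
    """Logarithm of the rescaled characteristic function of the sum of n steps."""
    m = step * (2.0 * p - 1.0)                 # mean of one step
    s = 2.0 * step * math.sqrt(p * (1.0 - p))  # standard deviation of one step
    tau = t / (math.sqrt(n) * s)
    z = p * cmath.exp(1j * tau * step) + (1.0 - p) * cmath.exp(-1j * tau * step)
    # n * ln[exp(-i*m*tau) * z(tau)] exponentiates to the same rescaled function
    # as -i*sqrt(n)*m*t/s + n*ln z(t/(sqrt(n)*s))
    return n * cmath.log(cmath.exp(-1j * m * tau) * z)

def distance_to_limit(t, n):
    """Size of the correction term, |ln Z_tilde(t) - (-t^2/2)|."""
    return abs(ln_z_reduced(t, n) + 0.5 * t**2)

for n in (10**2, 10**4, 10**6):
    print(n, distance_to_limit(1.0, n))  # the distance shrinks as n grows
```

For this skewed step distribution the correction decays roughly as an inverse power of n, in line with the behavior attributed to Ω𝑛(𝑡) in the proof.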
5.2. PHASE SPACE DERIVATION

A first derivation of the CLT for two variables can be presented schematically in a quite simple way. This is done in what follows in more qualitative terms. However, since this derivation is of utmost importance for our arguments in this book, we also present it afterwards in more mathematical detail.
Figure 5.2. Schematic representation of random walks on phase space. On the left panel we present the drawing of five trials of random walks. On the right panel we present one possible probability distribution on phase space constructed by taking only the end points of innumerable ($\approx 10^{4}$) random walks. These end points are just the results of the sums $\pi_k^{l} = \sum_j p_{k,j}^{l}$ for the momentum variable (all k).
Suppose now that we have a statistical process taking place in a two dimensional space labelled by the variables (𝑞, 𝑝). If this process is such that it produces a list of values of the type (𝑞1 , 𝑝1 ), (𝑞2 , 𝑝2 ), … , (𝑞𝑛 , 𝑝𝑛 ), which are independent, then we may separate the outcomes of this list by joining together all the elements that have the same 𝑞𝑘 (in fact, which has an outcome q for the first coordinate of (𝑞, 𝑝) such that 𝑞𝑘 < 𝑞 < 𝑞𝑘 + 𝑑𝑞 . We will end 𝑙 up with the lists (𝑝𝑘,𝑗 represents the j outcome related with position 𝑞𝑘 in the lth trial)
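$$
\underbrace{\begin{array}{l}
q_1:\; p^{1}_{1,1},\, p^{1}_{1,2},\, p^{1}_{1,3},\, \cdots,\, p^{1}_{1,n}\\
q_2:\; p^{1}_{2,1},\, p^{1}_{2,2},\, p^{1}_{2,3},\, \cdots,\, p^{1}_{2,n}\\
\;\vdots\\
q_a:\; p^{1}_{a,1},\, p^{1}_{a,2},\, p^{1}_{a,3},\, \cdots,\, p^{1}_{a,n}
\end{array}}_{\text{trial 1}}
\qquad
\underbrace{\begin{array}{l}
q_1:\; p^{2}_{1,1},\, p^{2}_{1,2},\, p^{2}_{1,3},\, \cdots,\, p^{2}_{1,n}\\
q_2:\; p^{2}_{2,1},\, p^{2}_{2,2},\, p^{2}_{2,3},\, \cdots,\, p^{2}_{2,n}\\
\;\vdots\\
q_a:\; p^{2}_{a,1},\, p^{2}_{a,2},\, p^{2}_{a,3},\, \cdots,\, p^{2}_{a,n}
\end{array}}_{\text{trial 2}}
\qquad \cdots
\tag{5.11}
$$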
in which each $p_{k,j}^{l}$ is independent of the remaining $p_{k,s}^{l}$'s, for the same position $q_k$ and the same trial $l$ (see Figure 5-2, left panel). Now, for each set of lists in (5.11), as the one shown inside the rectangle, we apply the same considerations of the previous section, assuming that each sum $\pi_k^{l} = \sum_j p_{k,j}^{l}$ lies in the interval $[p, p + dp]$ for each list indexed by $l$. Thus, for each $q_k$ we will have
$$f(q_k, p; t) = \frac{1}{\sqrt{2\pi\sigma^{2}(q_k, t)}} \exp\left\{-\frac{[p - \mu(q_k, t)]^{2}}{2\sigma^{2}(q_k, t)}\right\},$$
where the average momentum value and its variance must also be labelled by the same $q_k$. The probability distribution of the sums $\sum_j p_{k,j}^{l}$, for each $k$, will present patterns such as the one shown as an example in Figure 5-2, right panel. Now we note that the variable $q_k$ is continuous, so it would be quite meaningless to take $q_k$ itself as the label for each set of lists (each $q_k$ would appear only once, if appearing at all). We must therefore change this labelling process and assume that each label is given by the interval $[q_k, q_k + \delta q_k]$, such that we now have an interval in $q_k$ upon which we can define a probability density $\rho(q_k, t)$. The derivation of the function $f(q_k, p; t)$ goes the same way, but now we find its expression as
$$F(q, p; t) = \frac{\rho(q, t)}{\sqrt{2\pi\sigma^{2}(q, t)}} \exp\left\{-\frac{[p - p(q, t)]^{2}}{2\sigma^{2}(q, t)}\right\},$$
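which is the result we wanted to show. The arguments presented above can be formally represented as a direct generalization of the arguments of the previous section. As shown in (5.11), we now have for each point $q$ on configuration space the probability of being in the interval $[q, q + dq] \times [p, p + dp]$ given by the sum of
$$
F(q, p; t)\,dq\,dp = \rho(q; t) \cdot \underset{\downarrow}{\mathrm{sum}}
\left\{\begin{array}{c}
w(q, p_1^{(1)}; t)\,dp_1^{(1)} \cdots w(q, p_n^{(1)}; t)\,dp_n^{(1)}\\
w(q, p_1^{(2)}; t)\,dp_1^{(2)} \cdots w(q, p_n^{(2)}; t)\,dp_n^{(2)}\\
\vdots\\
w(q, p_1^{(n)}; t)\,dp_1^{(n)} \cdots w(q, p_n^{(n)}; t)\,dp_n^{(n)}
\end{array}\right\} dq
$$
such that
$$F(q, p; t)\,dq\,dp = \rho(q; t) \underbrace{\int \cdots \int}_{dp}\, \prod_{k=1}^{n} w(q, p_k; t)\,dp_k\,dq,$$
where the label $dp$ under the integrals indicates that the integration is restricted to the region in which the sum $\sum_k p_k$ lies in the interval $[p, p + dp]$.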
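Similar steps from (5.7) to (5.8) give again
$$\delta\left(p - \sum_{k=1}^{n} p_k\right) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\theta\left[p - \sum_{k=1}^{n} p_k\right]}\,d\theta,$$
and
$$F(q, p; t) = \frac{\rho(q; t)}{2\pi} \int_{-\infty}^{+\infty} e^{-i\theta p}\,d\theta\, \prod_{k=1}^{n} \int_{-\infty}^{+\infty} e^{i\theta p_k}\, w(q, p_k; t)\,dp_k,$$
such that, defining
$$z(q, \theta; t) = \int e^{i p_k \theta}\, w(q, p_k; t)\,dp_k,$$
we get
$$Z^{(n)}(q, \theta; t) = \rho(q; t)\,[z(q, \theta; t)]^{n},$$
where the difference now is the appearance of the extra term $\rho(q; t)$. The reduced characteristic function becomes
$$\tilde{Z}^{(n)}(q, \theta; t) = \rho(q; t) \exp\left(-\frac{i\sqrt{n}\,m(q; t)}{s(q; t)}\,\theta\right) \left[z\left(q, \frac{\theta}{\sqrt{n}\,s(q; t)}; t\right)\right]^{n},$$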
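where $m(q; t)$ and $s^{2}(q; t)$ are the average and variance for each $q$. Thus
$$\ln \tilde{Z}^{(n)}(q, \theta; t) = \ln \rho(q; t) - \frac{i\sqrt{n}\,m(q; t)}{s(q; t)}\,\theta + n \ln\left[z\left(q, \frac{\theta}{\sqrt{n}\,s(q; t)}; t\right)\right].$$
However
$$z(q, \theta; t) = 1 + i\,m(q; t)\,\theta - \frac{m^{2}(q; t) + s^{2}(q; t)}{2}\,\theta^{2} + R(\theta^{3})$$
and we end with
$$\ln \tilde{Z}^{(n)}(q, \theta; t) = \ln \rho(q) - \frac{i\sqrt{n}\,m(q; t)}{s(q; t)}\,\theta + n \ln\left[1 + \frac{i\,m(q; t)\,\theta}{\sqrt{n}\,s(q; t)} - \frac{m^{2}(q; t) + s^{2}(q; t)}{2 n s^{2}(q; t)}\,\theta^{2} + R\left(\frac{\theta^{3}}{n^{3/2}}\right)\right], \tag{5.12}$$
where $R_n = R\left(\theta^{3}/n^{3/2}\right)$ depends on $n$ as $n^{-\alpha}$, with $\alpha \ge 3/2$. The logarithm in the previous expression can be written as a series of the type
$$\ln(1 + \xi) = \xi - \frac{\xi^{2}}{2} + \frac{\xi^{3}}{3} - \cdots$$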
where, in agreement with (5.12),
$$\xi = \frac{i\,m(q; t)}{\sqrt{n}\,s(q; t)}\,\theta - \frac{m^{2}(q; t) + s^{2}(q; t)}{2 n s^{2}(q; t)}\,\theta^{2} + R_n.$$
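We find
$$\ln \tilde{Z}^{(n)}(q, \theta; t) = \ln \rho(q) - \frac{1}{2}\theta^{2} + \Omega_n(\theta),$$
where, if we put $R_n = R(\theta^{3})/n^{3/2}$ ($R$ not depending on $n$, since the remaining terms will be divided by greater orders of $n$, which is as big as we want), we get, using the logarithm expansion up to second order in $\xi$,
$$\Omega_n(\theta) = \frac{1}{\sqrt{n}}\left[R + \frac{i m}{2 s}\left(\frac{m^{2}}{s^{2}} + 1\right)\theta^{3}\right] - \frac{1}{n}\left\{\frac{i m R}{s}\,\theta + \frac{1}{8}\left(\frac{m^{2}}{s^{2}} + 1\right)^{2}\theta^{4}\right\} + \frac{1}{n^{3/2}}\left[\left(\frac{m^{2}}{s^{2}} + 1\right)\theta^{2}\right]\frac{R}{2} - \frac{1}{2 n^{2}} R^{2} + \cdots,$$
and since $n m$ and $\sqrt{n}\,s$ are finite ($m/s^{2}$ is finite, $m^{2}/s^{2} = O(n^{-1})$, …), we have $\lim_{n \to \infty} \Omega_n(\theta) = 0$. Thus, for $n$ sufficiently large we have
$$\lim_{n \to \infty} \ln \tilde{Z}^{(n)}(q, \theta; t) = \ln \tilde{Z}(q, \theta; t) = \ln \rho(q) - \frac{1}{2}\theta^{2}.$$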
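The vanishing of $\Omega_n(\theta)$ can be checked numerically for a concrete choice of the single-step distribution. The sketch below is only an illustration under an assumption that is not part of the text: the increments are taken to be unit-rate exponential variables, for which $z(\theta) = 1/(1 - i\theta)$ and $m = s = 1$. The function name `log_reduced_cf` is ours, and the $\ln\rho$ term is left out because it does not depend on $n$.

```python
import numpy as np

def log_reduced_cf(theta, n):
    """ln of the reduced characteristic function (without the ln(rho) term)
    for a sum of n unit-rate exponential increments (assumed: m = 1, s = 1)."""
    m, s = 1.0, 1.0                      # mean and standard deviation of one increment
    arg = theta / (np.sqrt(n) * s)       # rescaled argument theta / (sqrt(n) s)
    ln_z = -np.log(1.0 - 1j * arg)       # ln z for z(theta) = 1 / (1 - i theta)
    return -1j * np.sqrt(n) * m * theta / s + n * ln_z

theta = 1.5                              # a finite, not infinitesimal, value of theta
target = -0.5 * theta**2                 # Gaussian limit -theta^2 / 2
sizes = (10, 1_000, 100_000, 10_000_000)
errors = [abs(log_reduced_cf(theta, n) - target) for n in sizes]
for n, err in zip(sizes, errors):
    print(f"n = {n:>9d}   |Omega_n(theta)| = {err:.2e}")
```

For this skewed example the leading correction is of order $\theta^{3}/\sqrt{n}$, so the printed values should fall by roughly a factor of ten for every factor of one hundred in $n$, while $\theta$ itself stays finite.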
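We then find, by inversion of the characteristic function $\tilde{Z} = \tilde{Z}^{(\infty)}$ related to $\theta$, which is not assumed infinitesimal (now writing $\theta = \delta q/\hbar$),
$$F(q, p; t) = \frac{\rho(q; t)}{\left[2\pi\left(-\hbar^{2}\,\dfrac{\partial^{2} \ln Z}{\partial(\delta q)^{2}}\right)_{0}\right]^{1/2}} \exp\left\{-\frac{\left[p - \left(-i\hbar\,\dfrac{\partial \ln Z}{\partial(\delta q)}\right)_{0}\right]^{2}}{2\left(-\hbar^{2}\,\dfrac{\partial^{2} \ln Z}{\partial(\delta q)^{2}}\right)_{0}}\right\}, \tag{5.13}$$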
after proper normalization, which is the desired result [84]. The above results are quite clear with respect to the interpretation of the symbol 𝛿𝑞 and the notion of “infinitesimal” to which we refer in chapters two to four. In fact, in the above developments, there would appear an infinite number of independent random variables 𝑝𝑘 if we increase the number 𝑛 without limits; the terms neglected in Ω𝑛 (𝜃) tend to zero as 𝑛 → ∞ and 𝜃 (or δ𝑞) does not need to be an infinitesimal parameter, thus giving a precise meaning to the integration in (5.2)—or expression (5.13). Indeed, what becomes infinitesimal as 𝑛 → ∞ are 𝑚(𝑞; 𝑡) and 𝑠 2 (𝑞; 𝑡), such that 𝑛𝑚(𝑞; 𝑡) and 𝑛𝑠 2 (𝑞; 𝑡) are finite.
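The statement that $m$ and $s^{2}$ become infinitesimal while $n m$ and $n s^{2}$ stay finite can be illustrated with a small Monte Carlo experiment at a fixed $q$. This is a sketch under assumptions of our own: the single steps are shifted exponential variables (deliberately non-Gaussian), and the values of `mu`, `sigma`, the number of trials, and the helper names are all hypothetical choices made for the illustration.

```python
import numpy as np

def sum_of_increments(n, mu, sigma, trials, rng):
    """End points p = sum_k p_k of `trials` random walks with n independent steps.
    Each step has mean mu/n and variance sigma**2/n, with a skewed (exponential) shape."""
    unit = rng.exponential(1.0, size=(trials, n)) - 1.0   # zero mean, unit variance
    steps = mu / n + (sigma / np.sqrt(n)) * unit
    return steps.sum(axis=1)

def skewness(x):
    """Sample skewness; it vanishes for a Gaussian distribution."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(0)
mu, sigma = 0.7, 1.0          # assumed finite values of n*m and sqrt(n)*s
stats = {}
for n in (2, 200):
    p = sum_of_increments(n, mu, sigma, trials=10_000, rng=rng)
    stats[n] = (p.mean(), p.var(), skewness(p))
    print(f"n = {n:3d}  mean = {stats[n][0]:.3f}  var = {stats[n][1]:.3f}  skew = {stats[n][2]:.3f}")
```

The mean and variance of the sum stay close to `mu` and `sigma**2` for both values of $n$, while the skewness (expected to scale as $2/\sqrt{n}$ for this choice of steps) shrinks as $n$ grows, which is the approach to the Gaussian form.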
5.3. CONNECTION WITH THE CHARACTERISTIC FUNCTION DERIVATION
In the second chapter we presented a derivation of the Schrödinger equation based on the concept of characteristic functions. This derivation was shown to be equivalent to those of chapters three and four; in particular, the derivation made in chapter three profits from the introduction of the important concept of entropy, which we used there to show the stochastic character of the theory. In this section we will unravel the connections between the characteristic function derivation and the Central Limit Theorem. The clarifications to emerge will equally apply to all other derivations already seen that also use the notion of an expansion up to second order in some parameter. This ultimate connection sheds the final light upon the correct interpretation of the quantum formalism, to which we will return in the last chapter of this book. The characteristic function derivation also begins with the Liouville equation (stochastic or not) and the definition of a characteristic function (at this point we write it as $\zeta(q, \delta q; t)$, since we do not know whether it should correspond to $z(q, \delta q; t)$ or $Z(q, \delta q; t)$)
$$\zeta(q, \delta q; t) = \int_{-\infty}^{+\infty} \exp\left(\frac{i}{\hbar}\,p\,\delta q\right) F(q, p; t)\,dp, \tag{5.14}$$
with
$$\zeta(q, \delta q; t) = \psi^{*}\left(q - \frac{\delta q}{2}; t\right) \psi\left(q + \frac{\delta q}{2}; t\right) \tag{5.15}$$
and
$$\psi(q; t) = \omega(q; t) \exp\left[\frac{i}{\hbar}\,\lambda(q; t)\right]. \tag{5.16}$$
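If we expand expression (5.15) up to second order we get
$$
\begin{aligned}
\zeta(q, \delta q; t) &= \omega(q; t)^{2}\left\{1 + \left(\frac{\delta q}{2}\right)^{2}\left[\frac{1}{\omega}\frac{\partial^{2}\omega}{\partial q^{2}} - \frac{1}{\omega^{2}}\left(\frac{\partial \omega}{\partial q}\right)^{2}\right]\right\} \exp\left(\frac{i}{\hbar}\frac{\partial \lambda}{\partial q}\,\delta q\right)\\
&= \omega(q; t)^{2}\left[1 + \frac{(\delta q)^{2}}{8}\frac{\partial^{2} \ln \rho(q; t)}{\partial q^{2}}\right] \exp\left(\frac{i}{\hbar}\frac{\partial \lambda}{\partial q}\,\delta q\right)\\
&= \rho(q; t)\left[1 + \frac{i}{\hbar}\,\mu\,\delta q - \frac{\mu^{2} + \sigma^{2}}{2\hbar^{2}}\,(\delta q)^{2} + O(\delta q^{3})\right],
\end{aligned}
\tag{5.17}
$$
which is nothing but the expression for $Z^{(n)}(q, \delta q; t)$ as long as (using the same nomenclature of the previous section)
$$\rho(q; t) = \omega(q; t)^{2}; \qquad \mu = n\,m(q; t) = \frac{\partial \lambda(q; t)}{\partial q}; \qquad \sigma^{2} = n\,s^{2}(q; t) = -\frac{\hbar^{2}}{4}\frac{\partial^{2} \ln \rho(q; t)}{\partial q^{2}},$$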
where we stress the appearance of the number $n$, since $\zeta = Z^{(n)}(q, \delta q; t)$ is the characteristic function for the sum of the random variables $p_k$. Note that the term within brackets in (5.17) is just $z(q, \delta q; t)^{n}$. This means that the term $O(\delta q^{3})$ is, in fact, a term depending on $n^{-\alpha}$, for some positive real number $\alpha$. This term disappears as we make $n \to \infty$. Therefore, its disappearance is not due to some infinitesimal character of the parameter $\delta q$. This parameter is a finite quantity, but the last term is such that $\lim_{n \to \infty} O(\delta q^{3}) = 0$.
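A simple case in which the product (5.15) can be compared with a Gaussian characteristic function at finite $\delta q$ is a Gaussian wave packet. The sketch below assumes $\hbar = 1$, a ground-state amplitude $\omega(q) = \pi^{-1/4} e^{-q^{2}/2}$ and a linear phase $\lambda(q) = kq$; the packet, the value of `k`, and the function names are our own illustrative choices, not taken from the text.

```python
import numpy as np

hbar = 1.0
k = 0.8  # assumed mean momentum of the packet (hypothetical value)

def psi(q):
    """Gaussian amplitude with a plane-wave phase: omega(q) exp(i k q / hbar)."""
    return np.pi ** -0.25 * np.exp(-0.5 * q**2) * np.exp(1j * k * q / hbar)

def zeta(q, dq):
    """Characteristic function of (5.15): psi*(q - dq/2) psi(q + dq/2)."""
    return np.conj(psi(q - 0.5 * dq)) * psi(q + 0.5 * dq)

q, dq = 0.3, 1.7                  # dq is deliberately not small
rho = abs(psi(q)) ** 2
mu = k                            # d(lambda)/dq for lambda = k q
sigma2 = 0.5 * hbar**2            # -(hbar^2 / 4) d^2 ln(rho) / dq^2, with ln(rho) = -q^2 + const
gaussian_cf = rho * np.exp(1j * mu * dq / hbar - sigma2 * dq**2 / (2 * hbar**2))
print(zeta(q, dq), gaussian_cf)
```

For this particular packet the two numbers agree to rounding error even though $\delta q$ is of order one, and the bracket in (5.17) is just the second-order truncation of the exponential used in `gaussian_cf`.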
Thus, in chapter two we used expansion (5.17) up to second order based on an alleged infinitesimal character of the parameter $\delta q$ (the best we could do at that point). Now, after the developments of this chapter, we clearly see that it would be a mistake to take the characteristic function $z(q, \delta q; t)$, related to the probability density function $f_k(q, p_k; t)$ of the single element $(q, p_k)$, for the one $Z^{(n)}(q, \delta q; t)$ related to the probability density function $F(q, p; t)$, regarding the sum $p = \sum_{k=1}^{n} p_k$ and the position $q$, and given by
$$Z^{(n)}(q, \delta q; t) = \rho(q; t)\,[z(q, \delta q; t)]^{n}. \tag{5.18}$$
It is easy to show that expression (5.17), with the previous definitions, when substituted in the equation for the characteristic function gives the equations from which the Schrödinger equation is derived. Thus, whenever we write the correct characteristic function for the correct stochastic process, there is no need to consider $\delta q$ as an infinitesimal parameter when retaining only terms up to second order in the expansions; these results are exact in all possible senses. The axioms of the theory coming from this derivation are:
Axiom 1: The Classical Stochastic Liouville equation is valid for the description of any quantum system described by the Schrödinger equation;
Axiom 2: The characteristic function $Z^{(n)}(q, \delta q; t)$ of the random variable $p = \sum_{k=1}^{n} p_k$ can be written (in the limit $n \to \infty$) as the product
$$\Psi^{*}\left(q - \frac{\delta q}{2}; t\right) \Psi\left(q + \frac{\delta q}{2}; t\right) \tag{5.19}$$
for any quantum system, and Quantum Mechanics refers to the universality class defined by the CLT.
Note that now we do not need to include the definition of the characteristic function $z$ or $Z^{(n)}$ as an axiom, since they are only Fourier transforms of the probability density function, without any assumption about the infinitesimal nature of the parameter $\delta q$. The extra assumption which was lacking in the developments of chapter two is the one regarding the reference of Quantum Mechanics to the universality class defined by the CLT, which is now fully clarified. Having fixed the formal aspects of the Schrödinger equation derivation, it remains to physically interpret the results and their consequences, in particular with respect to (a) the fact that the quantum mechanical formalism has been derived from the classical (but not Newtonian) framework and (b) the consequences of having the random
variable $p = p_1 + \cdots + p_n$ with $n \to \infty$. We leave this to the last chapter of this book. However, we do take some steps into interpretation issues in later sections of this chapter. Before considering these interpretation issues, let us show how the Schrödinger equation could be derived from the CLT and the stochastic Liouville equation. The derivation to be presented in the next section is, thus, another derivation of the Schrödinger equation from a different set of postulates. As we have already said, each one of these derivations gives us some information about the processes and characteristics underlying quantum phenomena in general. The present one, based on the CLT, enlightens us about all those issues regarding second order expansions and connects quantum phenomena to the requisites of the CLT (for instance, independence of all the $p_j$).
5.4. THE CLT DERIVATION OF THE SCHRÖDINGER EQUATION
Our two new axioms are:
Axiom 1: The Classical Stochastic Liouville equation is valid for the description of any quantum system also described by the Schrödinger equation;
Axiom 2: Assuming the validity of the CLT, our probability density function on phase space must be written as
$$F(q, p; t) = \frac{\rho(q, t)}{\sqrt{2\pi\sigma^{2}(q, t)}} \exp\left\{-\frac{[p - \mu(q, t)]^{2}}{2\sigma^{2}(q, t)}\right\}, \tag{5.20}$$
with
$$\sigma^{2}(q, t) = -\frac{\hbar^{2}}{4}\frac{\partial^{2} \ln \rho(q, t)}{\partial q^{2}} \tag{5.21}$$
and
$$\mu(q, t) = \frac{\partial S(q, t)}{\partial q}, \tag{5.22}$$
where $\psi(q, t) = R(q, t) \exp\left(\dfrac{i S(q, t)}{\hbar}\right)$, $\rho(q, t) = R(q, t)^{2}$, and $p$ refers to the sum of an infinite number of independent random variables on momentum space.
Now, if we take $F(q, p; t)$ into the stochastic Liouville equation and integrate in $p$ for the first two statistical moments, we get the Schrödinger equation exactly as in the entropy derivation (this is straightforward and will not be presented here).
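To make Axiom 2 concrete, the density (5.20) can be assembled numerically from a given $\rho$ and $S$. The following is a minimal sketch, assuming $\hbar = 1$, a Gaussian $\rho(q) = e^{-q^{2}}/\sqrt{\pi}$ and a linear phase $S(q) = kq$; the grids, the value of `k`, and the function name `phase_space_density` are hypothetical, and derivatives are approximated by finite differences.

```python
import numpy as np

def phase_space_density(q, p, rho, S, hbar=1.0):
    """F(q, p) of (5.20) on a grid, with sigma^2 from (5.21) and mu from (5.22).
    Derivatives are taken by finite differences (np.gradient)."""
    dq = q[1] - q[0]
    mu = np.gradient(S, dq)                                   # (5.22)
    d2_ln_rho = np.gradient(np.gradient(np.log(rho), dq), dq)
    sigma2 = -(hbar**2 / 4.0) * d2_ln_rho                     # (5.21)
    P, MU = p[None, :], mu[:, None]
    SIG2, RHO = sigma2[:, None], rho[:, None]
    return RHO / np.sqrt(2.0 * np.pi * SIG2) * np.exp(-(P - MU) ** 2 / (2.0 * SIG2))

k = 0.8                                        # assumed mean momentum (illustrative)
q = np.linspace(-6.0, 6.0, 601)
p = np.linspace(-8.0, 8.0, 801)
dq, dp = q[1] - q[0], p[1] - p[0]
rho = np.exp(-q**2) / np.sqrt(np.pi)           # assumed configuration-space density
S = k * q                                      # assumed phase function

F = phase_space_density(q, p, rho, S)
marginal = F.sum(axis=1) * dp                  # integral of F over p
total = marginal.sum() * dq                    # integral over the whole phase space
mean_p = (F * p[None, :]).sum() * dp * dq      # average momentum
print(total, mean_p)
```

Integrating over $p$ should give back $\rho(q)$, the total probability should be one, and the average momentum should equal `k`. The construction only makes sense where $\sigma^{2}(q, t)$ in (5.21) is positive, which holds everywhere for this choice of $\rho$.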
5.5. ENSEMBLE AND SINGLE SYSTEMS REVISITED
The previous results may be used to clarify the long-lived debate about single-system and ensemble interpretations of Quantum Mechanics (see [22, 87, 88]). As was mentioned above, if we are to interpret the CLT in the way it is usually interpreted (not only in physical applications, but in any statistical application as well), then the phase-space variables $(q_i, p_i)$ must be random variables; this is precisely the point where the role of the fluctuations becomes crucial and may be used to unravel epistemological issues. Note, furthermore, that in the case of physical systems $q_i$ is defined by $p_i$ as $q_i = p_i t/m$. The role of the fluctuations in quantum mechanical processes is nowadays viewed as a platitude. It has already been elucidated by many pioneering studies on the foundations of Quantum Mechanics, mainly those on its stochastic support. Indeed, the derivation process developed in chapter three makes their role quite evident, while connecting it to the derivation process that uses the notion of a characteristic function. We also showed in chapter four the connections of our two derivation methods with the more direct stochastic derivations found in the literature. In chapter four, we showed that the use of the Wiener-Khinchin relations gives directly the time-energy Heisenberg relations $\Delta E\,\Delta t \ge \hbar/2$. This approach also fixes the interpretation of the time interval $\Delta t$ as precisely the minimum time interval needed for the fluctuations to fill the sample space and for the probability density function to arrive at its stationary functional appearance. This means that, for single systems, there must be a minimum time window $\Delta t$ (depending upon the underlying amount of fluctuation in the energy) that lets these systems reproduce the effect of an ensemble. This minimum window, needless to say, is precisely the one appearing in Heisenberg conditions, and is an objective feature of each physical system.
Indeed, Heisenberg conditions are the very prescription for the possibility of making a connection between ensemble averages and single-system averages, and they come as the mathematical counterpart of the ergodic assumption. This answers our first question about the physical interpretation of the limit $n \to \infty$ in the CLT; indeed, this limit represents mathematically the physical requisite of allowing the system enough time to fluctuate in such a manner that its distribution approaches a Gaussian one (the mean square deviations will play an important role in other considerations that we leave to a later section). The fluctuations, however, imply much more. With respect to conservative quantum systems, with a time-independent potential function, we have the stationary character of the solutions. For such random systems the ergodic assumption usually applies without restriction, meaning that we can pass from (a) averaging over the same individual system during an interval of time sufficiently long to let the fluctuations fill the allowed sample space on the energy space to (b) instantaneously averaging over an ensemble of equally prepared systems. This means that the two approaches are fully equivalent and, thus, it makes no sense to prefer one over the other in making our interpretation. Nonetheless, it is important to be aware of the fact that specific experimental settings might select one situation over the other and, since their time-structure may be quite different, the correct interpretation of the phenomenon may crucially depend upon the right choice of
the specific time-structure of the particular experimental settings. As we said in the previous chapter, for instance, double-slit and other similar experiments seem to be of an “instantaneous time ensemble type”, since it is necessary to perform the experiment with many incoming particles (ensemble), which interact with the photographic plate in quite short times (instantaneous time), to get the interference pattern in the photographic plate. Each measurement is destructive and, thus, the ergodic assumption does not apply. On the other hand, measurements of energy levels in hydrogen atoms, for instance, seem to be of a “long time single system type” since it is now possible to experimentally isolate one single hydrogen atom and perform such measurements, leaving fixed some minimum time window. The need for a “fluctuation window” in single system experiments could have been advanced simply considering the expression for the stochastic variables 𝑝 and 𝑞 which, being sums, cannot be defined for a single system without it. In ensemble-like experiments this is not necessary, since each 𝑝𝑘 , 𝑥𝑘 may refer to a different copy of the system within the ensemble. In any case, since our derivations begin with the (stochastic) Liouville equation, it is important to note that the ensemble perspective is the primary one; in fact, the single system perspective can be applied only in the special circumstances in which the ergodic assumption applies and, when it applies, it is immediate that Quantum Mechanics will also apply to an ensemble made out of these single systems. Thus, whenever Quantum Mechanics is applicable to single systems, it is also applicable to ensembles, but when it is applicable to ensembles it is not necessarily applicable to single systems. The previous considerations put the so called Duality Principle in a totally new perspective, as we show in the next section.
5.6. ONTOLOGICAL DUALITY AS A FALSE PROBLEM
The Duality Principle states roughly that a quantum system (or some of its parts) may behave as a wave or a particle, depending on the type of the experiment. Stated this way, which we could call its “weak” version, the duality principle is nothing but the utterance of a fact. This principle becomes an epistemological problem when one extrapolates from the realm of purely observational conclusions to the ontological status of these systems or their constituents, which we would call the “strong” version of the principle. Indeed, it became common usage to say, for instance, that the nature of the electron is dual (particle/wavelike) and that it is the experimental settings that will reveal one of its “facets” or the other (the strong version of the duality principle comes from what we have already called in chapter one the ontological interpretation of Heisenberg’s relations). The range of problems that such an extrapolation raises is far reaching, as we will try to clarify in what follows. From the point of view of the present results, this ontological interpretation comes from a confusion about the specific type of time structure involved in actual experiments, a difference that can be quite subtle sometimes.
As an example, in the case of double slit interferometry (which is of an ensemble type, since the actual interference pattern appears in the plate only after a great number of incident particles has passed through the slits) it is not the electron that behaves like a wave; it is the ensemble that behaves like that, since it is only after many equally prepared incident electrons (passing through the slits) have reached the photographic plate that the interference pattern is revealed. That such an interpretation is possible was already advanced by us in chapter three. In other experiments, as for example the electron in a single hydrogen atom, it is not the electron that is a wave at some instant of time; rather, the same electron behaves like a wave when considered for a sufficiently large time interval $\Delta t$ (in the sense already stressed). This behavior, furthermore, comes from the specific type of randomness of the system. In both experiments the electron is considered in an ontological sense as being a corpuscle, exactly as it must be considered when it reaches the photographic plate, in the double-slit experiment, or when it absorbs a photon (a particle) in hydrogen atom level transitions (that is, as a localized entity). This interpretation allows us to get rid of the “strong” notion of duality as a false problem; a problem based upon the absence, at the time of its postulation, of the unification power of the previously developed notions. Thus, from the ontological perspective, the present interpretation may stick with the corpuscular character of those quantum entities (everyone always calls them particles anyway) without losing the richness of the quantum world. In a sense, this interpretation allows us to “live in the best of two worlds”, for it gives us a dual descriptive structure (instant-time ensemble or minimum time-interval single-particle) to analyze the phenomena without becoming committed to an ontological duality.
This keeps untouched the “weak” version of the principle, related to the behavior of systems, while discarding its “strong” version, related to the nature of the systems’ constituents. It is needless to stress that this interpretation of the “duality” also discards the Complementarity Principle; if we understand the Complementarity Principle as, roughly, the need of interpreting quantum systems using classical concepts and the incomplete nature of such an interpretation, then there is no room for it in the present approach. Since the “incomplete” nature of our classical interpretations is due, as is postulated by the traditional approach, to the “non-classical” dual character (in the “strong” ontological sense) of the quantum entities, and since we have already discarded this ontological version of the duality principle, the Complementarity Principle must be rejected altogether as irrelevant. What we do need in our interpretations is the assumption of fluctuations for quantum systems and, in fact, just that. These quantum fluctuations are also present in classical “macroscopic” systems but are of no importance at this scale. Sometimes they may become important even in such systems, as is the case of Bose-Einstein condensates; in such cases, the quantum nature of the phenomenon reveals itself, irrespective of this “macroscopic” character (in fact, this should be the defining concept of what “macroscopic” means, instead of making the association between “macroscopic” and “big”: macroscopic is any system in which quantum mechanical fluctuations play no role, whatever size this system has). The issue then is not one of an ontological distinction between classical and quantum worlds (pictures); the rationale for such a separation (which is of great practical importance, since it will guide us in the use of quantum or classical equations of motion) is the objective relevance of the fluctuations to the overall phenomenon.
This explains in a straightforward way why “macroscopic” (in size) systems may present quantum (“microscopic”) behavior. While such phenomena introduce great difficulties to interpretations that pose ontological distinctions between the classical and the quantum, they are harmless for the present approach, since we do not make such ontological distinctions, but only an “empirical” one: a classical system is one in which fluctuations related to $\hbar$ are of no importance, while a quantum system is one in which they are crucial, with all intermediary possibilities allowed. We call this distinction “empirical” for we cannot know in advance when one or the other possibility will take place.⁴ The size of the system with respect to the properties being investigated is usually a good first indication, but this may occasionally fail: the words “quantum” and “classical” should then be understood as referring only to the difference in the relative importance of the fluctuations, while the ontological status of the interpretation remains purely classical (in the sense that we do not need the “non-classical” principles of duality, complementarity, and indeterminacy mentioned above). Therefore, the interpretation freedom that our dual characterization of the system’s time structure implies should not be underestimated. Indeed, it has great consequences for the so-called “measurement theory”. It should have already become evident to the reader that the present approach does not need the notion of an observer, a notion that is related to both the ontological and epistemological versions of the Duality Principle and, thus, also to the Complementarity Principle. The dispersions appearing in Heisenberg’s inequalities are considered here an intrinsic property of physical systems, not an extrinsic one, that is, one related to the influence of an external observer.
In fact, the epistemological interpretation of Heisenberg’s relations, when carefully analyzed, leads to completely unacceptable results, as the next subsection reveals. This is why we keep the ontological interpretation as our interlocutor throughout this book.
4
Surely, “quantum” in this sense is a subclass of “classical” from the semantic point of view and it would be better to talk in terms of: (a) classical systems subjected to quantum fluctuations (obeying Heisenberg relations), (b) classical systems subjected to arbitrary fluctuations (thermodynamics or usual statistical mechanics) and (c) classical systems without fluctuations (Newtonian mechanics). All these are classical because none of them needs the usual constructs of observer, ontological duality, complementarity, etc. that mould traditional quantum mechanical semantics. Since it is easier to change mentalities than fixed nomenclatures, let us leave the situation as it stands.
5.6.1. The Epistemological Interpretation of Heisenberg Relations
To see that the epistemological interpretation of Heisenberg’s relations does lead to unacceptable results, let us consider the following analysis. It is quite common to use Heisenberg’s relations (mainly the one related to position and momentum) with the inequality sign, as in $\Delta q\,\Delta p \ge \hbar/2$, since it is this result that comes from the general analysis of the operator structure of Quantum Mechanics. This usage induces the reader to conclude that at each quantum state of the system these dispersions (errors or indeterminacies) could have any value, while respecting the above limiting lower value, and it is this freedom on the value of the product of dispersions that points to the interpretation that such dispersions depend upon something extrinsic to the quantum mechanical system (which would clearly not be closed): this extrinsic element is the observer, and we could say that all measurement theory is developed to take its influence into account. However, the use of the inequality sign together with this interpretation is misleading. Each quantum state can obviously have only one value for the product $\Delta q\,\Delta p$ (or any other similar product), for the very reason that each such state has definite values for the quantities $\Delta q$ and $\Delta p$, calculated from its probability amplitude (e.g. in the case of the harmonic oscillator, we have the equality $\Delta q\,\Delta p = (2n + 1)\hbar/2$ for each state labelled by $n$). Since this is an equality, if we keep to the extrinsic approach in which the observer plays a prominent role we will reach the conclusion that, being at level $n$, the harmonic oscillator will have the same indeterminacy in the product $\Delta q\,\Delta p$, irrespective of the measuring apparatus used by the observer (a very precise laser system or a stick), implying that the very notion of an observer is useless. Indeed, given that this product is a constant number, it doesn’t matter at all what kind of instrument we use to observe the properties of some physical system, since the dispersions (errors, indeterminacies) will always be the ones already calculated. This preposterous result comes obviously from the fact that we are using a semantic element (the observer) that has no syntactic counterpart (no associated formal variable within the Schrödinger equation), meaning that we can change the characteristics of the observer (at the semantic level) without changing a bit the formal results (at the syntactic level).
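That each oscillator level carries one definite value of the product $\Delta q\,\Delta p$ can be checked directly from its eigenfunction. The sketch below assumes $\hbar = m = \omega = 1$ and uses the standard Hermite-function form of $\psi_n$, for which the textbook value is $\Delta q\,\Delta p = n + 1/2$; the grid and the function names are illustrative choices of ours.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite import hermval

def oscillator_state(n, q):
    """Normalized harmonic-oscillator eigenfunction psi_n(q) for hbar = m = omega = 1."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                        # selects the physicists' Hermite polynomial H_n
    norm = 1.0 / sqrt(2.0**n * factorial(n) * sqrt(pi))
    return norm * hermval(q, coeffs) * np.exp(-0.5 * q**2)

def dispersion_product(n, q):
    """Delta q * Delta p computed from psi_n by numerical integration."""
    h = q[1] - q[0]
    psi = oscillator_state(n, q)
    dpsi = np.gradient(psi, h)
    var_q = np.sum(q**2 * psi**2) * h      # <q> = 0 by symmetry
    var_p = np.sum(dpsi**2) * h            # <p> = 0 and <p^2> = integral of |psi'|^2
    return sqrt(var_q * var_p)

q = np.linspace(-12.0, 12.0, 24001)
products = [dispersion_product(n, q) for n in range(4)]
for n, value in enumerate(products):
    print(n, value)
```

The values depend only on the level $n$. Nothing in the calculation refers to a measuring device, which is the point made in the paragraph above.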
The present approach, sustaining that the “strong” version of the Duality Principle is a false problem (for those to whom it is a problem anyway), also defends that the use of the notion of an “observer” in the interpretation apparatus of Quantum Mechanics serves only to turn it into a quite metaphysical theory,⁵ no matter how much mathematics one uses to develop it; after all, it is not impossible, and it is really quite easy, to formalize wrong ideas. The “reduction of the wave packet” principle, used in measurement theory to explain, for instance, why the wavelike electron of the double slit experiment strikes the photographic plate as a particle (a point in the emulsion), can also be easily avoided, since there is no wavelike nature to be reduced. In fact, the reduction of the wave packet principle is an interpretation tool that is a consequence of the ontological approach we are criticizing and is mainly responsible for the introduction of an observer (of a very special kind: conscious, etc.), as we tried to make clear in the historical introduction in chapter one. This principle is also responsible for introducing other very difficult interpretation problems (for instance, consciousness and Schrödinger’s cat), and to get rid of it is to remove a great burden from the interpretation structure of Quantum Mechanics. From this point of view, new proposals of quantum gravity that are extensions of the Schrödinger equation, making it a non-linear equation and possibly giving the above “reduction” principle a more acceptable solution, become senseless (at least when considered from the point of view of such intentions), since they only represent, from the standpoint of the present approach, desperate measures to save an illegitimate interpretation.
5
Metaphysics is being used here in the worst sense of the word. Indeed, having elements at the semantic level without any reference at the syntactic level allows one to use these elements as the carpet under which every single problem of the theory could be hidden—usually under the name of “weirdness”.
5.7. THE DISPERSION RELATIONS
If we are to discard the usual extrinsic interpretation of the Heisenberg “uncertainty” relations, we ought to provide an alternative interpretation, more akin to the constructs of our intrinsic approach. Again, the process of derivation presented in chapter three may be of inestimable help. Indeed, we showed that to arrive at the Schrödinger equation using the stochastic Liouville equation it is necessary to impose the constraint
$$(\delta q)^{2} \cdot (\delta p)^{2} = \frac{\hbar^{2}}{4}, \tag{5.23}$$
where
$$
\begin{aligned}
(\delta q)^{2}\,\rho(q, t) &= \int q^{2} F(q, p; t)\,dp - \left[\int q\,F(q, p; t)\,dp\right]^{2},\\
(\delta p)^{2}\,\rho(q, t) &= \int [p - p(q, t)]^{2} F(q, p; t)\,dp
\end{aligned}
\tag{5.24}
$$
are both functions of $q$ and $t$ (in general). To arrive at the usual Heisenberg relations we may simply integrate these functions over configuration space [42], multiply them and take the square root of the result, as we have shown in chapter three, section 3.2. We thus want to interpret equation (5.23) in an intrinsic manner. For the sake of simplicity, let us work only with the position-momentum inequality (the time-energy one has already been analyzed). In this case, it is not difficult at all to provide a straightforward intrinsic interpretation for the relation (5.23), and the equality sign is again of invaluable help in doing so. Indeed, relation (5.23) simply states that quantum systems cannot fluctuate in any fancy way: they must fluctuate (for the Schrödinger equation to be valid, as the derivation process shows) in such a manner that large fluctuations in the position are accompanied by small fluctuations in the momentum, while large fluctuations in the momentum are accompanied by small fluctuations in the position, at each point of the configuration space. Now, we can return to the explanation of these issues, presented only as an intuition in chapter three, to exemplify this behavior. Thus, consider the state $n = 1$ of the harmonic oscillator; the solution of this problem implies that we must have the (normalized) phase-space probability density given by ($\hbar = m = \omega = 1$, see chapter three)
$$F(q, p; t) = \frac{2 q^{2} e^{-q^{2}}}{\pi \sqrt{(q^{2} + 1)/q^{2}}} \exp\left[-\frac{q^{2} p^{2}}{q^{2} + 1}\right], \tag{5.25}$$
which is a result of the expression (5.1) for this particular energy level (see Figure 3-2 in chapter three for the dynamical behavior of the system). Using Figure 3-2, we can note that near the origin of the 𝑞-axis we have strong fluctuations in the momentum, given by
\[ (\delta p)^2 = \frac{1}{2}\left(1 + \frac{1}{q^2}\right), \tag{5.26} \]
but very small fluctuations in the position, given by (cf. expression (5.15))
\[ (\delta q)^2 = \frac{1}{2}\,\frac{q^2}{q^2+1}; \tag{5.27} \]
graphically this may be seen from the fact that the level curves are nearly parallel to the momentum axis in this region. Far from the origin (along the 𝑞-axis) both fluctuations tend to the same value of 1/2. Thus, it is this structure of the fluctuations, related to the momentum and position mean square deviations, that reflects the quantum mechanical nature of the 𝑛 = 1 harmonic oscillator level. Note again that this happens for each point 𝑞 of configuration space. These last comments take us to our next point.
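The statements above about the 𝑛 = 1 level can be checked numerically. The following is a minimal sketch, assuming ℏ = 𝑚 = 𝜔 = 1 and using only expressions (5.25) to (5.27); the function names and the integration grid are illustrative choices, not part of the text. It verifies that the product of the local fluctuations equals 1/4 at every sampled 𝑞, and that integrating (5.25) over 𝑝 reproduces the configuration-space density and the momentum fluctuation (5.26).

```python
import numpy as np

def rho1(q):
    """Configuration-space density of the n = 1 state (hbar = m = omega = 1)."""
    return 2.0 / np.sqrt(np.pi) * q**2 * np.exp(-q**2)

def dp2(q):
    """Local momentum fluctuation, expression (5.26)."""
    return 0.5 * (1.0 + 1.0 / q**2)

def dq2(q):
    """Local position fluctuation, expression (5.27)."""
    return 0.5 * q**2 / (q**2 + 1.0)

def F1(q, p):
    """Phase-space density of the n = 1 state, expression (5.25)."""
    prefactor = 2.0 * q**2 * np.exp(-q**2) / (np.pi * np.sqrt((q**2 + 1.0) / q**2))
    return prefactor * np.exp(-q**2 * p**2 / (q**2 + 1.0))

def momentum_moments(q, half_width=60.0, n_points=240001):
    """Riemann-sum estimates of the p-marginal and of the momentum variance at fixed q."""
    p = np.linspace(-half_width, half_width, n_points)
    h = p[1] - p[0]
    f = F1(q, p)
    mass = np.sum(f) * h                      # compare with rho1(q)
    variance = np.sum(p**2 * f) * h / mass    # compare with dp2(q); mean momentum is zero
    return mass, variance

for q in (0.3, 1.0, 2.5):
    mass, variance = momentum_moments(q)
    # Columns: q, product of fluctuations, marginal / rho1, variance / dp2
    print(q, dq2(q) * dp2(q), mass / rho1(q), variance / dp2(q))
```

The point 𝑞 = 0 is excluded from the sampled values because (5.26) diverges there, which is the "strong momentum fluctuation near the origin" described in the text.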
5.7.1. What Do We Need Quantum Mechanics for?
The reader may be wondering whether, if we can arrive at expression (5.1) by means of the CLT alone, Quantum Mechanics, understood as a tool devised for calculation purposes, is no longer needed. This would be a highly misleading conclusion. The CLT gives us only a general result, reflecting the type of statistical distribution that all physical systems will present whenever they satisfy the presuppositions of the theorem (independent random fluctuations). It says nothing about the characteristics of particular physical systems. The information concerning a particular system is contained within its specific characteristic function (or probability density upon configuration space), and this information is related to the (momentum) mean square deviations and the factor 𝜌(𝑞; 𝑡) appearing in (5.1). Thus, Quantum Mechanics (based on the Schrödinger equation) is the calculation device for considering particular fluctuating physical systems, and it furnishes results that are coherent with the general CLT. The previous results show that the quantum formalism is capable of revealing the specific type of fluctuating structure a physical system must have to behave the way it does. In this sense we may say that the quantum mechanical formalism refers to a statistical approach to nature (because of the random character of the movement of the systems’ constituents) and reveals the law governing the fluctuation structure of such random movements. This result, as far as we can see, has nothing to do with considerations about the ontological status of phenomena as statistical or deterministic. From the axioms of the theory we can say that Quantum Mechanics departs from a statistical picture (random fluctuations) to arrive at its results and consequences; the statistical character of nature is thus its assumption, not its conclusion.
As a matter of fact, it seems that the reality of a deterministic or indeterministic background of physical theories is a question that cannot be answered within the realm of physics! Physics always assumes one or the other, without being capable of saying whether (a) for a deterministic model, there is a stochastic background of which this seemingly deterministic structure is just a sort of mean field theory or (b) for an indeterministic model, there is some underlying deterministic structure that the stochastic assumption mimics. The discussion about determinism or indeterminism is, in this sense, highly philosophical and metaphysical (no demerit in that) and plays no role within ordinary physics.
5.8. LIMITATIONS OF THE PROOF
It is very important to stress what is proved by the CLT. With the CLT and the derivations made in chapters two and three we can show the following.
Theorem: assuming that we can write the characteristic function as 𝑍(𝑞, 𝛿𝑞; 𝑡) = 𝜓*(𝑞 − 𝛿𝑞/2; 𝑡)𝜓(𝑞 + 𝛿𝑞/2; 𝑡), Quantum Mechanics is related to independent random variables if and only if the choice of developing the characteristic function up to second order is adequate.
Proof. If we assume the second order representation of the characteristic function, we are simply making a restatement of the CLT, which is connected with independent random variables (and Gaussian functions in 𝑝, for each 𝑞, ensue naturally). Since Quantum Mechanics follows from this assumption and the previous ansatz for the characteristic function, this implies that Quantum Mechanics is related to independent random variables. On the other hand, if we assume that Quantum Mechanics is related to independent random variables (for 𝑝 and 𝑞), then the CLT applies and we can develop the characteristic function up to second order.
This shows that the assumption of using only the second order expansion is equivalent to the assumption of the independent random character of the (𝑞, 𝑝) phase space variables. This assumption is quite appealing, since it is the most general situation for the variables (𝑞, 𝑝), and this is why the outcome of the calculations is a Gaussian (or the CLT), which implies a universality class for statistical processes. Despite these appealing features of universality and simplicity, the proof of the CLT does not imply that we must connect Quantum Mechanics with second order expansions of the characteristic function since, in the end, Quantum Mechanics may not have anything to do with independent random fluctuations.
This is the problem with “if and only if” proofs: to go from one side of the equivalence to the other we must always assume the respective antecedent, which may not be empirically adequate. There is an important argument to sustain that the expansion of the characteristic function must go only up to second order: in chapter two we show that to mathematically derive the Schrödinger equation from the stochastic Liouville equation we must use the infinitesimal transformation. In fact, if we use a finite transformation, the mathematical result (the Schrödinger equation) does not follow. However, we are again assuming that the Liouville stochastic equation should be adequate to represent quantum mechanical phenomena. In fact, we have shown in the second chapter that: Theorem: assuming that the stochastic Liouville equation is adequate to represent quantum mechanical phenomena, then Quantum Mechanics is invariant by the choice of the
coordinate system only if the characteristic function is developed only up to second order (or, equivalently, if we can treat 𝛿𝑞, for each step, as infinitesimal)⁶. Again, the stochastic Liouville equation may not be adequate to represent Quantum Mechanics. It should be obvious that the adequacy of expanding the characteristic function in such and such a manner, assuming independent random variables or infinitesimal displacements, etc., pertains to the axioms of the derivation and, as such, cannot be proved within the theory. This has important consequences. As the reader may be aware, there are other phase space representations of Quantum Mechanics that depart from the present one in some relevant aspects. For instance, there is the phase space representation of Quantum Mechanics provided by Wigner’s distributions, which would imply a Kramers-Moyal Liouville-like equation with an infinite number of terms [41],[44]. As we will show in the next chapter, the present approach implies a Fokker-Planck type equation, which means that the Kramers-Moyal equation, in the present approach, ends at the second order term. There is a theorem, to which we will return in a later chapter, showing that the Kramers-Moyal equation either stops at the second order term or has an infinity of terms (there is no intermediate possibility). Thus, it remains for us to analyze these two possibilities by comparing the present results with those obtained using Wigner’s distributions (or any other). This is a theme for a future chapter, but it is interesting to stress that Wigner’s distributions are not positive definite, which is a very strange property for a probability function to show. On the other hand, the phase space probability density function developed in this book is positive definite by construction. Needless to say, both approaches give exactly the same results when it comes to configuration space probability densities, averages, etc.
However, they must disagree with respect to some momentum space averages, etc. In fact, they must agree only with respect to the statistical momentum moments up to second order (which seems natural), giving different average values, in general, for all other orders. We will return to these issues later on. As we have said, up to this point we have talked about stochastic processes underlying Quantum Mechanics, but the equations at which we arrive do not present an explicit fluctuation behavior; that is, they are not explicit Langevin equations, but only dynamical equations that seem to be the average of some underlying Langevin equation with an explicit random behavior. This is the mathematical content of the so-called Bohm’s equation [see, for example, equations (3.32) or (4.18)]. In the next chapter we will find the Langevin equations for Quantum Mechanics, which present the random aspect of the theory in explicit mathematical form.
⁶ This condition is necessary, but it is not sufficient, since we need the second axiom shown in chapter two to arrive at the Schrödinger equation.
Chapter 6
LANGEVIN EQUATIONS FOR QUANTUM MECHANICS
As we have already seen, in the late 1960s a great number of attempts appeared in the literature to derive the quantum mechanical formalism using notions coming from the field of stochastic processes. One of the clearest and most elaborate of such derivations was the one made by de La Peña [66] some years later; we presented this derivation in chapter four. All these derivations aimed at showing that the overall conceptual and formal frameworks of quantum mechanics could be more easily understood if assessed using stochastic constructs. In fact, all these stochastic approaches have two main common features:
(a) They never deviate from usual classical concepts: although always considering notions such as “fluctuations” and “noises”, among others, they are based upon dynamic differential equations (stochastic ones, surely, e.g. Langevin equations) and their constructs are all objective ones;
(b) These approaches are always based upon corpuscular models: within them, undulatory effects must be considered as the outcome of the stochastic behavior (of the particles) when one considers an ensemble of physical systems or when one considers a single system within a time-window Δ𝑡, whose length is fixed by the nature of the fluctuations.
Because of this last feature, notions such as “observer”, “wave-particle duality”, “reduction of the wave-packet”, etc. are not usually found within such approaches. Of course, since stochastic approaches mathematically derive the Schrödinger equation from considerations about randomness, they show us that there should be some way by which the ubiquitous appearance of undulatory phenomena in the quantum framework could be reduced to corpuscular behavior. However, because of the nature of such mathematical derivations, one is never enlightened about how this is actually the case for particular physical problems, that is, which specific dynamic-stochastic equation (for corpuscular natures) furnishes, for some particular physical system, its overall undulatory behavior. Moreover, since the usual stochastic approach uses averages to arrive at the Schrödinger equation, as we have seen in chapter four, the fluctuation profile is generally lost and the equation one arrives at is generally only Bohm’s equation, which hides the true stochastic behavior of quantum systems.
In this chapter we present these dynamic stochastic equations in terms of Langevin equations of a particular nature¹. Moreover, the knowledge of such equations allows us to run simulations of actual physical systems and show how the undulatory behavior (e.g. interference) can emerge from the random movement of particles². The way by which the adequate Langevin equations can be constructed depends upon some results presented in previous chapters, mainly with respect to the characteristic function approach, as will become clear soon. However, at least preliminarily, we may present the arguments for the applicability of the Langevin analysis. Generally, an approach based on the Langevin equation is best suited for situations such as the one depicted in Figure 6-1. In this figure we present the usual situation that leads to a random walk phenomenon: a small body A with “large” mass immersed in a colloid made of a large number of tiny bodies a with very “small” mass (compared to the “large” one). In this case, the approach based on a Langevin equation tries to understand the coarse grained movement and average properties of A by taking a mean field view of the subsystem made of the a’s.
Figure 6.1. The parallel between the usual random walk situation and the present analysis.
In the present approach we have an analogous situation. We take the isolated system composed of the particle(s) whose movement and other average properties we want to analyse and treat the field from an average perspective. The field subsystem is thus responsible for introducing some randomness into the subsystem of the particle(s), which is captured by the Langevin equation.
This is a good moment to stress the relevance of the present chapter in the overall structure of this first part of the book, which is related to principles (mainly derivations of the Schrödinger equation from first principles). In the characteristic function derivation we were somewhat moving blindly; we simply had no clue as to what would justify the expansion up to second order used there. As we moved through the chapters, until chapter four, we became aware of the importance of the notion of randomness and fluctuations in the overall assessment of quantum phenomena. It was only in chapter five that we finally understood why the second order expansion (present in all derivations of the Schrödinger equation) should be considered adequate. However, the Central Limit Theorem (CLT) gives us no details about the actual movement of the particles on phase space. It assumes general properties about the probability of each step in this movement (mainly independence) to conclude for the Gaussian form of the final distribution of the endpoints of these random movements. This chapter, as the reader will shortly see, presents the details of the movement of the particles on phase space. Moreover, the techniques to be developed in what follows make extensive use of the characteristic function and, at some point, provide the explanation (in their own terms) of why we should go only up to second order in our expansions, that is, a different assessment of the CLT. Furthermore, their connections to the stochastic approach are more than obvious. It is, thus, a very unifying approach that brings unity to the overall set of chapters of this part.
¹ The results of the present chapter are an extension of the work published in L. S. F. Olavo et al., Annals of Physics 327, 1391 (2012).
² In the CD-ROM that accompanies this book the reader will find a computer program that helps in making simulations of quantum systems by varying a number of input values. The program has a number of visual outputs that give the reader a visual assessment of the present developments.
The path to the Langevin equation is presented in section 6.2, after a brief review of the characteristic function derivation. We also present the results of simulations made for four quantum systems: the one dimensional harmonic oscillator is studied in great detail, the Morse potential is also considered in some detail, and the three dimensional harmonic oscillator and the hydrogen atom are briefly addressed. These simulations show that the Langevin equations presented in the text do furnish the correct quantum mechanical probability densities. This is true even for superpositions of quantum mechanical states, as must be the case if we want the undulatory behavior to emerge from the corpuscular one in a universal manner.
We also stress again previous preliminary conclusions drawn about some topics in the interpretation of quantum mechanics that may be further clarified by the present approach as a preparation for the closing chapter of this book, which deals exclusively with interpretation issues.
6.1. PREVIOUS RESULTS
In the characteristic function derivation (chapter two) we began with the Liouville equation and used the characteristic function
\[ Z(q,\delta q;t) = \int F(q,p;t)\exp\left(\frac{i p\,\delta q}{\hbar}\right)dp \tag{6.1} \]
developed up to second order in 𝛿𝑞 as a means to arrive at the Schrödinger equation. We also used the ansatz
\[ Z(q,\delta q;t) = \psi^*\left(q - \frac{\delta q}{2};t\right)\psi\left(q + \frac{\delta q}{2};t\right). \]
The strategy of developing the characteristic function only up to second order in 𝛿𝑞 was later justified by means of the Central Limit Theorem (chapter five). In chapter three we connected the characteristic function mathematical construct to the one related to Boltzmann’s entropy as
\[ Z(q,\delta q,t) = \rho(q,t)\exp\left\{ i\,\frac{\partial S(q,t)}{\partial q}\frac{\delta q}{\hbar} + \frac{\hbar^2}{8}\frac{\partial^2 \ln\rho(q,t)}{\partial q^2}\left(\frac{\delta q}{\hbar}\right)^2 \right\} = \rho(q,t)\exp\left\{ \frac{i\,\bar{p}(q,t)\,\delta q}{\hbar} - \frac{1}{2}\frac{[\delta p(q,t)]^2(\delta q)^2}{\hbar^2} \right\}, \]
where the entropy is given by 𝑆(𝑞; 𝑡) = 𝑘𝐵 ln𝜌(𝑞; 𝑡) and
\[ \bar{p}(q;t)\rho(q;t) = \int p\,F(q,p;t)\,dp = \left(-i\hbar\frac{\partial \ln Z}{\partial(\delta q)}\right)_0, \qquad [\delta p(q;t)]^2\rho(q;t) = \int [p - \bar{p}(q;t)]^2 F(q,p;t)\,dp = \left(-\hbar^2\frac{\partial^2 \ln Z}{\partial(\delta q)^2}\right)_0, \]
which allowed the inversion of the Fourier integral in the definition of the characteristic function to furnish the phase-space probability density function as
\[ F(q,p,t) = \frac{\rho(q,t)}{\left[2\pi[\delta p(q,t)]^2\right]^{1/2}} \exp\left\{-\frac{[p - \bar{p}(q,t)]^2}{2[\delta p(q,t)]^2}\right\} \tag{6.2} \]
showing that, for each configuration space point 𝑞, 𝐹(𝑞, 𝑝; 𝑡) is a Gaussian in momentum space with average momentum density $\bar{p}(q,t)$ and momentum variance $[\delta p(q,t)]^2$. One important feature of this derivation is that it furnishes a “Newtonian-like” equation that can be written as
\[ \frac{\partial \bar{p}(q,t)}{\partial t} = -\frac{\partial}{\partial q}\left[\frac{\bar{p}(q,t)^2}{2m} + V(q) - \frac{\hbar^2}{2mR(q,t)}\frac{\partial^2 R(q,t)}{\partial q^2}\right] = -\frac{\partial H(q,t)}{\partial q}, \tag{6.3} \]
also known as the Bohmian equation, in which appears the “quantum potential”
\[ [\delta p(q,t)]^2 = -\frac{\hbar^2}{2mR(q,t)}\frac{\partial^2 R(q,t)}{\partial q^2}. \]
The quantum potential is obviously not a true potential. Instead, it just represents fluctuations of the momentum for the problem (but note that, within an approach based upon Langevin equations, the fluctuating part of the problem is furnished as if it were a driving force). However, it should be mentioned that equation (6.3) is not a Langevin-type equation, since the “quantum potential” does not enter into the equations as a true random force and the description related to equation (6.3) is, in fact, a deterministic one (in Bohm’s sense). Nevertheless, when we put together all these results, they allow us to find the true Langevin equations for Quantum Mechanics in the way shown in the next section.
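The relation between the Gaussian density (6.2) and the second order form of the characteristic function can be illustrated numerically. The sketch below assumes ℏ = 1 and arbitrary sample values for the density, the average momentum and the momentum variance at one fixed 𝑞; the function names are hypothetical. It evaluates the integral (6.1) by a Riemann sum and compares it with the closed form exp{𝑖 p̄ 𝛿𝑞/ℏ − [𝛿𝑝]²(𝛿𝑞)²/2ℏ²} multiplied by 𝜌.

```python
import numpy as np

def gaussian_F(p, rho, pbar, dp2):
    """Gaussian phase-space density (6.2) at one fixed q."""
    return rho / np.sqrt(2.0 * np.pi * dp2) * np.exp(-(p - pbar) ** 2 / (2.0 * dp2))

def Z_numeric(dq, rho, pbar, dp2, hbar=1.0):
    """Characteristic function (6.1) evaluated by a Riemann sum over momentum."""
    sigma = np.sqrt(dp2)
    p = np.linspace(pbar - 12.0 * sigma, pbar + 12.0 * sigma, 40001)
    h = p[1] - p[0]
    return np.sum(gaussian_F(p, rho, pbar, dp2) * np.exp(1j * p * dq / hbar)) * h

def Z_second_order(dq, rho, pbar, dp2, hbar=1.0):
    """Second order (Gaussian) form of the characteristic function."""
    return rho * np.exp(1j * pbar * dq / hbar - 0.5 * dp2 * dq**2 / hbar**2)

# Illustrative values only (not taken from the text).
rho, pbar, dp2 = 0.4, 0.7, 1.3
for dq in (0.0, 0.3, 1.0):
    print(dq, abs(Z_numeric(dq, rho, pbar, dp2) - Z_second_order(dq, rho, pbar, dp2)))
```

For a Gaussian in 𝑝 the two expressions agree to numerical precision for any 𝛿𝑞, which is why a density of the form (6.2) is fully characterized by its first two moments.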
6.2. LANGEVIN STOCHASTIC ANALYSIS
Take first the characteristic function approach and interpret it as a kind of sampling over the momentum subspace of the phase space. Thus, for each point 𝑞 of the configuration space, construct a fiber over which a sampling is made. The Langevin equation we are seeking is the equation over this fiber. Since the fiber is given for each value of the coordinate 𝑞, we should, in principle, present only the Langevin equation over each fiber, that is, a Langevin equation in the momentum variable with the coordinate as a parameter. However, in chapter five we adopted the strategy of letting 𝑞 vary and taking its values as indexing lists of 𝑝. This means that the order in which the 𝑝 lists are obtained for the various 𝑞 (in fact, small intervals in 𝑞) from the phase-space lists {(𝑞𝑖, 𝑝𝑖)} is not relevant. Thus, in accord with this strategy, we simply let the Langevin equations simulate the usual dynamic random behavior of our physical system [including the equation for the variation in 𝑞(𝑡)], finding different 𝑝’s for different 𝑞’s, since, in the end, collecting our listings for fixed 𝑞’s will reproduce the desired result. Thus, let us begin with a two-dimensional system (phase-space of a system with one degree of freedom) for which our proposed Langevin equations are given by
\[ \begin{cases} \dfrac{dp(t)}{dt} = -\gamma p(t) + \phi_1(q) + \sqrt{\Gamma_{22}(q)}\,\zeta(t), \\[2ex] \dfrac{dq(t)}{dt} = \dfrac{1}{m}\,p(t), \end{cases} \tag{6.4} \]
where the second term on the right hand side of the first equation in (6.4) gives the average behavior of the field subsystem, while the third and first terms give, respectively, the fluctuation profile of the phenomenon and the dissipation of such fluctuations, such that
\[ \langle\zeta(t)\rangle = 0; \qquad \langle\zeta(t)\zeta(t')\rangle = \delta(t - t'). \tag{6.5} \]
The first equation in (6.4) may be solved by the usual method: over the fiber defined by 𝑞 we first discretize the time to find
\[ p_{n+1} = a\,p_n + \tau\phi_1(q) + \sqrt{\tau\Gamma_{22}(q)}\,\xi_n, \tag{6.6} \]
where 𝑎 = (1 − 𝛾𝜏) and we put³
\[ \zeta(t) \to \frac{1}{\sqrt{\tau}}\,\xi_n, \]
such that
\[ \langle\xi_n\rangle = 0; \qquad \langle\xi_n\xi_m\rangle = \delta_{n,m}. \tag{6.7} \]
³ This is necessary because we know that 𝛿(𝑡 − 𝑡′) goes to infinity as 1⁄𝑑𝑡, such that 𝛿(𝑡 − 𝑡′)𝑑𝑡 → 1. Since 𝑡 = 𝑛𝜏, 𝜏 is our equivalent of 𝑑𝑡 when 𝑛 → ∞. Thus, the average ⟨𝜉𝑛𝜉𝑚⟩ = 𝛿𝑛,𝑚 goes into 𝛿(𝑡 − 𝑡′)𝑑𝑡.
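The discretized update (6.6) can be written as a single function acting on a whole ensemble of momenta at one fixed 𝑞. This is a minimal sketch, not the program distributed with the book: the name `langevin_step`, the callables `phi1` and `gamma22`, and the choice of Gaussian 𝜉𝑛 are assumptions made for illustration (the text only fixes the first two moments of 𝜉𝑛 in (6.7)).

```python
import numpy as np

def langevin_step(p, q, tau, gamma, phi1, gamma22, rng):
    """One step of the discretized momentum equation (6.6) on the fiber at q.

    p        : array of momenta (one entry per ensemble member)
    phi1     : callable returning the drift term phi_1(q)       (hypothetical name)
    gamma22  : callable returning the noise strength Gamma_22(q) (hypothetical name)
    """
    a = 1.0 - gamma * tau
    # xi_n with <xi> = 0 and <xi_n xi_m> = delta_nm; Gaussian is an assumption.
    xi = rng.standard_normal(np.shape(p))
    return a * p + tau * phi1(q) + np.sqrt(tau * gamma22(q)) * xi

# Usage with illustrative constants (not taken from the text).
rng = np.random.default_rng(0)
p0 = np.full(200_000, 0.7)
p1 = langevin_step(p0, q=0.0, tau=0.01, gamma=1.5,
                   phi1=lambda q: 0.3, gamma22=lambda q: 2.0, rng=rng)
# After one step the conditional mean is a*p0 + tau*phi1 and the variance is tau*Gamma22.
print(p1.mean(), p1.var())
```

The factor √𝜏 multiplying 𝜉𝑛, rather than 𝜏, is the practical consequence of the replacement 𝜁(𝑡) → 𝜉𝑛/√𝜏: the variance injected per step is proportional to 𝜏.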
Note that we have not iterated in the variable 𝑞; this is because we are searching for the momentum probability distribution for each point in the configuration space. Thus, for each such point 𝑞, 𝑝𝑛 is a random variable that we can treat using the traditional statistical methods. This is equivalent to treating the phase-space as a set of fiber bundles with width 𝛿𝑞, centered at each point 𝑞 of the configuration space, an approach with which we are now acquainted⁴. The stochastic variable 𝑝𝑛 defined above is being considered within each fiber bundle (see Figure 6-2).
Figure 6.2. The sampling of phase-space by momentum fibers indexed by configuration space coordinates. We also show how the sampling is done (ensemble approach): we let our particles begin at any point of the phase space and, after some definite time, given by 𝑛𝜏 (𝑛 = 6 in this figure), we make our statistics over the fiber using the characteristic function given in (6.8).
Now, iterating (6.6) we find⁵
\[ \begin{aligned} p_1 &= a\,p_0 + \tau\phi_1(q) + \sqrt{\tau\Gamma_{22}(q)}\,\xi_0, \\ p_2 &= a^2 p_0 + \tau[\phi_1(q) + a\phi_1(q)] + \left[\sqrt{\tau\Gamma_{22}(q)}\,\xi_1 + a\sqrt{\tau\Gamma_{22}(q)}\,\xi_0\right], \\ &\;\;\vdots \\ p_{n+1} &= a^{n+1}p_0 + \tau\sum_{\ell=0}^{n} a^{\ell}\phi_1(q) + \sqrt{\tau}\sum_{\ell=0}^{n} a^{\ell}\sqrt{\Gamma_{22}(q)}\,\xi_{n-\ell}. \end{aligned} \]
⁴ Of course this is the same approach as the one used in the realm of the Central Limit Theorem to derive the phase space probability density function from sums of random variables 𝑝𝑘, keeping the 𝑞 as the indices of the lists for the 𝑝𝑘’s. This is also the content of taking the characteristic function only in 𝑝, leaving 𝑞 fixed, as we will show again in this chapter.
⁵ Note that we are here assuming the strategy of fixing 𝑞, and the listing we are talking about in what follows refers to the fiber indexed by 𝑞. This does not represent the true movement of the particle on phase space, but the assumption of fixed 𝑞 affects the final result only with respect to the way, in time, the transient distribution goes to the equilibrium one.
If we put 𝑝0 = 0 for simplicity, we get
\[ p_n = \sum_{\ell=0}^{n-1} w_\ell, \]
where 𝑤ℓ is the random variable
\[ w_\ell = \tau a^{\ell}\phi_1(q) + \sqrt{\tau}\,a^{\ell}\sqrt{\Gamma_{22}(q)}\,\xi_{n-\ell}. \]
Thus, 𝑝𝑛 is the sum of independent random variables⁶. Let us now define its characteristic function as (here, the present method and the characteristic function derivation become one and the same)
\[ Z_n(q,\delta q;t) = \left\langle \exp\left(\frac{i p_n\,\delta q}{\hbar}\right)\right\rangle, \tag{6.8} \]
where we note that, because the functions Γ22 depend on 𝑞, we must have the same dependence for the characteristic function 𝑍𝑛. In fact, the averaging process represented in (6.8) is explicitly given by [compare it with (6.1)]
\[ Z_n(q,\delta q;t) = \int \exp\left(\frac{i p_n\,\delta q}{\hbar}\right) f(q,p_n;t)\,dp_n; \tag{6.9} \]
thus, since we write ⟨1⟩ as (6.9) with 𝛿𝑞 = 0, ⟨1⟩ = 𝜌(𝑞), and we are supposing that all the variables 𝑝𝑘, 𝑞 have the same underlying joint probability distribution function (see the derivation of the Central Limit Theorem in the previous chapter). Now, (6.8) may be written as
\[ Z_n(q,\delta q;t) = \int \exp\left(\frac{i\,\delta q}{\hbar}\sum_{\ell=0}^{n-1} w_\ell\right)\prod_{\ell=0}^{n-1} f(q,w_\ell;t)\,dw_\ell, \]
which results in
\[ Z_n(q,\delta q;t) = \rho(q)\prod_{\ell=0}^{n-1}\left\langle \exp\left(\frac{i w_\ell\,\delta q}{\hbar}\right)\right\rangle_{w_\ell}, \]
since the 𝑤ℓ’s are all independent random variables. Note that the averages are now taken with respect to 𝑤ℓ and the characteristic function $\left\langle \exp\left(i w_\ell\,\delta q/\hbar\right)\right\rangle_{w_\ell}$ is such that $\langle 1\rangle_{w_\ell} = 1$.
⁶ This is the point at which the CLT and the Langevin derivations make contact. The Langevin derivation makes it explicit which specific type of sum of independent random variables Quantum Mechanics is all about.
We also note that (remembering that our averages are now related to 𝑤ℓ alone)
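\[ \langle w_\ell\rangle = \tau a^{\ell}\phi_1(q) + a^{\ell}\sqrt{\tau\Gamma_{22}(q)}\,\langle\xi_\ell\rangle. \]
Using (6.7) we find
\[ \langle w_\ell\rangle = \tau a^{\ell}\phi_1(q). \]
We also have
\[ w_\ell^2 = \tau^2 a^{2\ell}\phi_1^2(q) + 2a^{2\ell}\tau^{3/2}\phi_1(q)\sqrt{\Gamma_{22}(q)}\,\xi_\ell + \tau a^{2\ell}\Gamma_{22}(q)\,\xi_\ell^2 \]
and thus
\[ \langle w_\ell^2\rangle = a^{2\ell}\tau^2\phi_1^2(q) + \tau a^{2\ell}\Gamma_{22}(q), \]
where we used again the results in (6.7). We may also calculate
\[ w_\ell^3 = \tau^3 a^{3\ell}\phi_1^3(q) + 3a^{3\ell}\tau^{5/2}\phi_1^2(q)\sqrt{\Gamma_{22}(q)}\,\xi_\ell + 3\tau^2 a^{3\ell}\phi_1(q)\Gamma_{22}(q)\,\xi_\ell^2 + \tau^{3/2}a^{3\ell}\Gamma_{22}(q)^{3/2}\,\xi_\ell^3, \]
and higher order moments. The variance of the random variable 𝑤ℓ becomes
\[ \langle w_\ell^2\rangle - \langle w_\ell\rangle^2 = a^{2\ell}\tau^2\phi_1^2(q) + \tau a^{2\ell}\Gamma_{22}(q) \]
and thus⁷
\[ \left\langle \exp\left(\frac{i w_\ell\,\delta q}{\hbar}\right)\right\rangle_{w_\ell} = \exp\left\{-\left[\tau^2 a^{2\ell}\Phi_1(q) + \tau a^{2\ell}\Gamma_{22}(q)\right]\frac{\delta q^2}{2\hbar^2}\right\}\exp\left(\frac{i\langle w_\ell\rangle\,\delta q}{\hbar}\right), \]
with $\Phi_1(q) = \phi_1^2(q)$, and we get
\[ Z_n(q,\delta q;\tau) = \rho(q)\exp\left(-\frac{b_n(q;\tau)\,\delta q^2}{2\hbar^2}\right)\exp\left(\frac{i\,\bar{p}_n(q;\tau)\,\delta q}{\hbar}\right), \tag{6.10} \]
where
\[ b_n(q;\tau) = \left[\tau^2\Phi_1(q) + \tau\Gamma_{22}(q)\right]\sum_{\ell=0}^{n-1} a^{2\ell} \]
and
\[ \bar{p}_n(q;\tau) = \tau\phi_1(q)\sum_{\ell=0}^{n-1} a^{\ell}, \]
⁷ This is the point at which the connections between the characteristic function derivation and the entropy derivation make contact with the Langevin derivation. We are here using the second order approach explicitly for reasons that will become clear soon (we show in what follows that higher order moments vanish with 𝜏 → 0).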
where $\bar{p}_n$ is called the average momentum; note that this nomenclature is appropriate, since $\phi_1(q)$ is a force depending only upon the random variable 𝑞 and 𝜏 is a time, and thus $\tau\phi_1(q)$ has the dimension of an average momentum (an impulse), for it could be explicitly written as
\[ \bar{p}_n(q;\tau)\rho(q) = \sum_{\ell=0}^{n-1}\int \tau a^{\ell}\phi_1(q)\,f(q,p_n;t)\,dp_n = \tau\phi_1(q)\rho(q)\sum_{\ell=0}^{n-1} a^{\ell}, \]
and is usually called (in non-equilibrium kinetic theory) the macroscopic average momentum [39]. Expression (6.10) automatically implies, by inversion of the Fourier transform, that
\[ f(q,p_n;\tau) = \frac{\rho(q)}{\sqrt{2\pi b_n(q;\tau)}}\exp\left\{-\frac{[p_n - \bar{p}_n(q;\tau)]^2}{2b_n(q;\tau)}\right\}, \]
where we have already normalized the density 𝑓(𝑞, 𝑝; 𝑡). Now we may take the limit 𝜏 → 0, 𝑛 → ∞, such that 𝑛𝜏 → 𝑡, to find
\[ f(q,p;t) = \frac{\rho(q)}{\sqrt{2\pi b(q;t)}}\exp\left\{-\frac{[p - \bar{p}(q;t)]^2}{2b(q;t)}\right\}, \]
where
\[ b(q;t) = \lim_{\tau\to 0}\left[\tau^2\Phi_1(q) + \tau\Gamma_{22}(q)\right]\sum_{\ell=0}^{n-1} a^{2\ell}; \qquad \bar{p}(q;t) = \phi_1(q)\lim_{\tau\to 0}\tau\sum_{\ell=0}^{n-1} a^{\ell}. \]
Note, however, that since (using 𝜏 = 𝑡/𝑛)
\[ \sum_{\ell=0}^{n-1} a^{2\ell} = \frac{1 - [(1-\gamma t/n)^n]^2}{1 - (1-\gamma\tau)^2} = \frac{1 - [(1-\gamma t/n)^n]^2}{2\gamma\tau - \gamma^2\tau^2}, \]
we get
\[ \lim_{\tau\to 0,\,n\to\infty}\tau\Gamma_{22}(q)\sum_{\ell=0}^{n-1} a^{2\ell} = \Gamma_{22}(q)\left(\frac{1 - e^{-2\gamma t}}{2\gamma}\right), \]
while
\[ \Phi_1(q)\lim_{\tau\to 0}\tau^2\sum_{\ell=0}^{n-1} a^{2\ell} \to 0. \]
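The limits of the geometric sums above can be checked directly for a large but finite 𝑛. The following sketch evaluates 𝑏𝑛 and p̄𝑛 with 𝜏 = 𝑡/𝑛 and compares them with the exponential expressions obtained when 𝜏 → 0; the closed form used for the mean, 𝜙1(1 − e^{−𝛾𝑡})/𝛾, follows from summing the same geometric series in 𝑎. All numerical values and function names are illustrative assumptions.

```python
import numpy as np

def b_n(n, t, gamma, phi1, gamma22):
    """Finite-n variance b_n = [tau^2 Phi_1 + tau Gamma_22] * sum_{l<n} a^(2l), with tau = t/n."""
    tau = t / n
    a = 1.0 - gamma * tau
    ell = np.arange(n)
    return (tau**2 * phi1**2 + tau * gamma22) * np.sum(a ** (2 * ell))

def pbar_n(n, t, gamma, phi1):
    """Finite-n mean pbar_n = tau * phi_1 * sum_{l<n} a^l, with tau = t/n."""
    tau = t / n
    a = 1.0 - gamma * tau
    return tau * phi1 * np.sum(a ** np.arange(n))

def b_limit(t, gamma, gamma22):
    """Continuum limit of the variance: Gamma_22 (1 - exp(-2 gamma t)) / (2 gamma)."""
    return gamma22 * (1.0 - np.exp(-2.0 * gamma * t)) / (2.0 * gamma)

def pbar_limit(t, gamma, phi1):
    """Continuum limit of the mean: phi_1 (1 - exp(-gamma t)) / gamma."""
    return phi1 * (1.0 - np.exp(-gamma * t)) / gamma

# Illustrative constants (not taken from the text).
t, gamma, phi1, gamma22 = 1.3, 0.8, 0.5, 1.7
for n in (10, 1_000, 200_000):
    print(n,
          b_n(n, t, gamma, phi1, gamma22) - b_limit(t, gamma, gamma22),
          pbar_n(n, t, gamma, phi1) - pbar_limit(t, gamma, phi1))
```

Both differences shrink roughly in proportion to 𝜏 = 𝑡/𝑛, and the contribution of the 𝜏²Φ1 term to 𝑏𝑛 is itself of order 𝜏, which is the sense in which it drops out of the limit.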
Thus, we may write
\[ b(q;t) = \Gamma_{22}(q)\left[\frac{1 - e^{-2\gamma t}}{2\gamma}\right]; \qquad \bar{p}(q;t) = \phi_1(q)\left(\frac{1 - e^{-\gamma t}}{\gamma}\right) \]
and, for large enough times (𝑡 → ∞),
\[ b(q) = \frac{\Gamma_{22}(q)}{2\gamma}; \qquad \bar{p}(q) = \frac{\phi_1(q)}{\gamma}. \]
With the previous results, we find, for the asymptotic distributions,
\[ f(q,p;t) = \frac{\rho(q)}{\sqrt{2\pi b(q)}}\exp\left\{-\frac{[p - \bar{p}(q;t)]^2}{2b(q)}\right\} \tag{6.11} \]
as the joint probability distribution related with the Langevin system of equations (6.4). Note that, for higher order moments, we get results such as
\[ \sum_{\ell=0}^{n}\langle w_\ell^3\rangle = \phi_1^3(q)\,\tau^3\sum_{\ell=0}^{n} a^{3\ell} + 3\tau^2\phi_1(q)\Gamma_{22}(q)\sum_{\ell=0}^{n} a^{3\ell} = \left[\phi_1^3(q)\,\tau^3 + 3\tau^2\phi_1(q)\Gamma_{22}(q)\right]\frac{1 - \exp(-3\gamma t)}{3\gamma\tau - 3\gamma^2\tau^2 + \gamma^3\tau^3}, \]
which goes to zero as 𝜏 → 0, the same happening to orders higher than three. This means that the expression for the characteristic function 𝑍(𝑞, 𝛿𝑞; 𝑡) using only up to second order in 𝛿𝑞 is not an approximation. This is the very expression of the CLT, rephrased using the Langevin equation, and the justification for the characteristic function derivation.
We still have to solve the second equation in (6.4) to find the probability densities 𝜌(𝑞), but since we have the unknown (up to this point) function Γ22(𝑞) that depends upon 𝑞 and that participates in this equation because of the term 𝑝(𝑡), this cannot be done in the same straightforward manner as we have done with the variable 𝑝. In fact, this is exactly the point at which quantum mechanics enters. The result (6.11) shows that, for each point 𝑞 of the configuration space, the momentum probability distribution is of the Gaussian type, as we have already obtained from other considerations regarding the Central Limit Theorem or those of the previous section. The similarities of the present derivation of (6.11) and those of the previous chapters are obvious, but we can bring them together as a means of making explicit the function Γ22(𝑞) and, thus, being capable of making real simulations of actual physical systems. Let us compare the result (6.11) with the one we obtained in section two for the joint probability density of any quantum system. From such a comparison it becomes quite obvious that, to simulate the quantum mechanical results with the system of equations (6.4), we must make the identification
\[ \Gamma_{22}(q) = \gamma(\delta p^2) = -\frac{\gamma\hbar^2}{4m}\frac{\partial^2\ln\rho(q)}{\partial q^2}; \qquad \bar{p}(q;t) = \frac{\partial s(q)}{\partial q}, \]
where both 𝜌(𝑞) = 𝑅²(𝑞) and 𝑠(𝑞) come from the solution of the Schrödinger equation with
\[ \psi(q) = R(q)\exp\left(\frac{i s(q)}{\hbar}\right). \]
Because of these identifications, the first Langevin equation becomes an expression in which the mean-square deviation of 𝑝, given by
\[ \sqrt{[\delta p(q,t)]^2} = \sqrt{-\frac{\hbar^2}{4m}\frac{\partial^2\ln\rho(q)}{\partial q^2}}, \]
enters as a true random force, because of the 𝜉ℓ. Thus, the relation with the Bohmian “quantum potential” becomes clear: (6.3) is an equation for average values (such as $\bar{p}(q,t)$) and the “quantum potential” does not appear as a true random force. In the next sections we study in great detail four concrete physical systems: the one dimensional harmonic oscillator and the Morse potential, and, more briefly, the three dimensional harmonic oscillator and the hydrogen atom. We will try to show how the explicit stochastic processes associated with them may explain, on a purely corpuscular basis, all their known quantum behavior. Some brief considerations about the interpretation are left to the discussions. We also show, using a concrete example, the relations between the Langevin equations (6.4) and the Bohmian equation (6.3).
6.3. THE HARMONIC OSCILLATOR
The harmonic oscillator potential is given by
\[ V(q) = \frac{1}{2}m\omega^2 q^2 \]
and the quantum mechanical probability amplitude is
\[ \psi_n(q) = \left(\frac{m\omega}{\hbar\pi}\right)^{1/4}\frac{1}{\sqrt{n!\,2^n}}\,e^{-\frac{m\omega}{2\hbar}q^2}\,H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\,q\right). \tag{6.12} \]
With this probability amplitude we can calculate the momentum variances 𝛿𝑝²(𝑞), and the results, for the first four pure states, are given in Table 6, together with two superposition states (𝑚 = ℏ = 𝜔 = 1). In Table 7 we show the phase-space positive definite quantum mechanical distributions for the first five rows of Table 6.
Table 6. Momentum fluctuations for the first four quantum levels of the harmonic oscillator. Two superpositions are also shown: the first is a superposition of states n=0 and n=1 and the second is a superposition of states n=2 and n=4

State | $\Gamma_p / 2\gamma$
0 | $\frac{1}{2}$
1 | $\frac{1}{2}\,\frac{q^2+1}{q^2}$
2 | $\frac{1}{2}\,\frac{4q^4+4q^2+5}{(2q^2-1)^2}$
3 | $\frac{1}{2}\,\frac{4q^6+9q^2+9}{q^2(2q^2-3)^2}$
$\frac{[1]+[0]}{\sqrt{2}}$ | $\frac{1}{2}\,\frac{2\sqrt{2}q+2q^2+3}{(1+\sqrt{2}q)^2}$
$2[4]+\frac{1}{2}[2]$ | $\frac{3}{2}\,\frac{-24\sqrt{3}q^2+32\sqrt{3}q^6+300q^4-128q^6+64q^8+339+300q^2-84\sqrt{3}-80\sqrt{3}q^4}{(8\sqrt{3}q^4-24\sqrt{3}q^2+6\sqrt{3}+6q^2-3)^2}$
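Table 7. The phase-space probability distribution for the first four quantum levels of the harmonic oscillator. One example for a superposition of the levels n=0 and n=1 is also shown

State | $F_n(q,p)$ | $(\delta p^2)$ | $E_n$
0 | $\frac{1}{\pi}\,e^{-q^2-p^2}$ | $\frac{1}{2}$ | $\frac{1}{2}$
1 | $\frac{2|q|q^2}{\pi\sqrt{q^2+1}}\,\exp\left[-q^2-\frac{q^2}{q^2+1}p^2\right]$ | $\frac{3}{2}$ | $\frac{3}{2}$
2 | $\frac{|2q^2-1|(4q^2-2)^2}{8\pi\sqrt{4q^4+4q^2+5}}\,\exp\left[-q^2-\frac{(2q^2-1)^2}{4q^4+4q^2+5}p^2\right]$ | $\frac{5}{2}$ | $\frac{5}{2}$
3 | $\frac{|q||2q^2-3|(8q^3-12q)^2}{48\pi\sqrt{4q^6+9q^2+9}}\,\exp\left[-q^2-\frac{q^2(2q^2-3)^2}{4q^6+9q^2+9}p^2\right]$ | $\frac{7}{2}$ | $\frac{7}{2}$
$\frac{\psi_1(q)+\psi_0(q)}{\sqrt{2}}$ | $\frac{|1+\sqrt{2}q|^3}{2\pi\sqrt{2q^2+2\sqrt{2}q+3}}\,\exp\left[-q^2-\frac{(1+\sqrt{2}q)^2}{2q^2+2\sqrt{2}q+3}p^2\right]$ | $\frac{1}{2}\left(\frac{1}{2}+\frac{3}{2}\right)$ | ---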
We also show in Table 7 the results obtained when we calculate the average energy as

$$E_n = \iint \left[\frac{p^2}{2m} + \frac{1}{2} m\omega^2 q^2\right] F_n(q,p)\, dq\, dp, \tag{6.13}$$

with 𝐹𝑛 (𝑞, 𝑝) given as in (6.11) with the results for the momentum deviations listed in Table 6. Note that they agree, as expected, with the usual quantum mechanical ones. The momentum dispersions calculated from the expression

$$(\delta p^2) = \iint p^2\, F_n(q,p)\, dp\, dq$$

are also equal to those obtained by the usual quantum mechanical calculations (we set 𝑚 = ℏ = 𝜔 = 1).
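As a worked check of these two integrals, the 𝑛 = 1 case can be done by hand. The sketch assumes that 𝐹𝑛(𝑞, 𝑝) is Gaussian in 𝑝 with variance Γ𝑛(𝑞), which is the form followed by the pure-state entries of Table 7; the explicit expression (6.11) is not reproduced in this section, so this should be read as an assumption.

```latex
% Assumed form of the joint density (Gaussian in p with variance \Gamma_n(q)):
\begin{align*}
F_n(q,p) &= \frac{\rho_n(q)}{\sqrt{2\pi\,\Gamma_n(q)}}
            \exp\!\left[-\frac{p^2}{2\,\Gamma_n(q)}\right]
 \quad\Longrightarrow\quad
 \int p^2 F_n(q,p)\,dp = \rho_n(q)\,\Gamma_n(q). \\[4pt]
% n = 1 with m = \hbar = \omega = 1
\rho_1(q) &= \frac{2}{\sqrt{\pi}}\,q^2 e^{-q^2}, \qquad
\Gamma_1(q) = \frac{q^2+1}{2q^2}, \\[4pt]
(\delta p^2) &= \int_{-\infty}^{+\infty} \rho_1\Gamma_1\,dq
 = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} (q^2+1)\,e^{-q^2}\,dq
 = \frac{1}{\sqrt{\pi}}\left(\frac{\sqrt{\pi}}{2}+\sqrt{\pi}\right)
 = \frac{3}{2}, \\[4pt]
\langle q^2\rangle &= \frac{2}{\sqrt{\pi}}\int_{-\infty}^{+\infty} q^4 e^{-q^2}\,dq
 = \frac{2}{\sqrt{\pi}}\cdot\frac{3\sqrt{\pi}}{4} = \frac{3}{2}, \\[4pt]
E_1 &= \tfrac{1}{2}(\delta p^2) + \tfrac{1}{2}\langle q^2\rangle = \frac{3}{2}.
\end{align*}
```

Both values coincide with the 𝑛 = 1 row of Table 7.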
6.3.1. Ensemble and Single System Simulations

Now, with the expressions of Table 6, we can run stochastic simulations using equations (6.4) and plot the configuration space probability distribution functions. This is done as usual using the Langevin scheme, which can be divided into two categories:

1. we run 𝑁𝑡𝑟𝑎𝑗 trajectories with a time interval 𝜏 representing the fluctuation time, a value of 𝛾 and the number 𝑛𝑝 of points in each trajectory (𝑡 = 𝑛𝑝 𝜏). Only the points at the end of the trajectories are used to make the statistics. Note that this scheme corresponds to ensemble simulations (see Figure 6-2). Typical values 𝑁𝑡𝑟𝑎𝑗 = 10^5, 𝜏 = 0.005 and 𝑛𝑝 = 600 are sufficient for the probability density function to stabilize at its stationary profile;
2. we run only one trajectory with an initial value (𝑞, 𝑝) randomly chosen within the classical turning points, using a time interval 𝜏 representing the fluctuation time, a value of 𝛾 and the number 𝑛𝑝 of points in the trajectory (𝑡 = 𝑛𝑝 𝜏). The statistics are made along the trajectory by keeping the values of (𝑞, 𝑝) at each iteration and constructing the probability density function 𝐹(𝑞, 𝑝; 𝑡) for each instant of time 𝑡′ = 𝑗𝜏, 1 ≤ 𝑗 ≤ 𝑛𝑝 . Typical values 𝑛𝑝 = 3 × 10^6 and 𝜏 = 0.005 are sufficient to arrive at a stable probability density function.
In all simulations (ensemble or single system) we have adopted the strategy of choosing random initial conditions for 𝑞 within the interval defined by the classical turning points. The momentum is then obtained by inverting the energy equation, using the actual average value of the energy for the state being considered. The initial point(s) 𝑞0 were chosen inside the classically allowed interval to forestall objections to the obviously classical character of the simulations. The turning points are then 𝑞𝑡𝑟𝑛 = ±√2𝐸𝑛 , where 𝐸𝑛 is the energy of the state labelled by the quantum number 𝑛, and the momenta are chosen to be compatible with 𝐸𝑛 and 𝑞0 for the harmonic oscillator Hamiltonian. In a later section we will analyze the dependence of the simulations on the choice of the energy 𝐸𝑛 from which the initial conditions are randomly chosen.
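The ensemble scheme and the choice of initial conditions can be sketched for the ground state, where the fluctuation term is the constant 1/2. The code assumes a generic Langevin form, d𝑞 = 𝑝 d𝑡 and d𝑝 = (−𝑞 − 𝛾𝑝) d𝑡 + √(2𝛾 𝛿𝑝²) d𝑊, integrated with a plain Euler-Maruyama step; equations (6.4) are not reproduced in this section, so this form, the function names and the parameter values (chosen small so that the run is fast) are assumptions for illustration only.

```python
import math
import random


def simulate_ensemble(n_traj=2000, n_steps=1000, tau=0.005, gamma=2.0,
                      energy=0.5, variance=0.5, seed=1):
    """Euler-Maruyama ensemble run for n = 0 (m = hbar = omega = 1).

    Assumed model (a generic Langevin form, not copied from equations (6.4)):
        dq = p dt
        dp = (-q - gamma * p) dt + sqrt(2 * gamma * variance) dW
    Returns the (q, p) end point of every trajectory.
    """
    rng = random.Random(seed)
    q_turn = math.sqrt(2.0 * energy)            # classical turning point
    noise = math.sqrt(2.0 * gamma * variance * tau)
    end_points = []
    for _ in range(n_traj):
        # q0 uniform inside the turning points, p0 from E = p^2/2 + q^2/2
        q = rng.uniform(-q_turn, q_turn)
        p = rng.choice((-1.0, 1.0)) * math.sqrt(max(0.0, 2.0 * energy - q * q))
        for _ in range(n_steps):
            q, p = (q + p * tau,
                    p + (-q - gamma * p) * tau + noise * rng.gauss(0.0, 1.0))
        end_points.append((q, p))
    return end_points


def variance_of(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

With these settings the end-point variances of 𝑞 and 𝑝 should both come out close to 1/2, the ground-state values; only the end points enter the statistics, as in the ensemble scheme described above.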
6.3.2. Ensemble Simulations

The results for the configuration and momentum space probability density functions are shown in Figure 6-3 for all levels in Table 6.
Figure 6.3. Density functions in configuration and momentum spaces for various levels of the harmonic oscillator (ensemble simulations).
In the graphics regarding the probability density function upon configuration space, the continuous line represents the theoretical result obtained from the solution (6.12) of the Schrödinger equation and the symbols represent the results obtained with simulations. The results of the numerical simulations are clearly in very good agreement with the theoretical ones. In Figure 6-3 we also present the probability densities of the two superpositions shown in Table 6. The results for the combination 𝑛 = 0 + 1 are less accurate because the distribution is asymmetrical and the tail of the greater peak introduces errors in the localization of the zero and the maximum of the second peak. This effect is clearly absent from the superposition 𝑛 = 2 + 4, which is symmetric around the origin (see also the discussion about the less accurate results for the tails in what follows). Note, thus, that it is possible for a random Langevin corpuscular system to simulate (as an ensemble) the wave structure of quantum mechanical systems. Moreover, this Langevin corpuscular system can even simulate interference behaviors; it is only necessary to furnish the correct fluctuation structure, as given by the expressions of Γ(𝑞). Together with the plots of the marginal probability density functions we may also calculate some important characteristic statistical values, such as the first two statistical moments: the mean values of 𝑞 and 𝑝 and the mean-square deviations of 𝑞 and 𝑝, after considering all the 𝑁𝑡𝑟𝑎𝑗 simulations for some specific values of 𝑁𝑡𝑟𝑎𝑗 and 𝑛𝑝 to study how these values change with these numbers (that control the simulation). However, when we try to find the values of
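$$\bar{q}_{n_p} = \frac{1}{N_{traj}}\sum_{k}^{N_{traj}} q^{k}_{n_p}, \qquad \sigma_{x,n_p} = \left[\frac{1}{N_{traj}}\sum_{k}^{N_{traj}} \left(q^{k}_{n_p}\right)^2 - \left(\bar{q}_{n_p}\right)^2\right]^{1/2}$$

and

$$\bar{p}_{n_p} = \frac{1}{N_{traj}}\sum_{k}^{N_{traj}} p^{k}_{n_p}, \qquad \sigma_{p,n_p} = \left[\frac{1}{N_{traj}}\sum_{k}^{N_{traj}} \left(p^{k}_{n_p}\right)^2 - \left(\bar{p}_{n_p}\right)^2\right]^{1/2},$$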
that is, the average values obtained directly from the simulations and not from the marginal probability densities, we find that they do not converge to any specific value, no matter the magnitude of 𝑛𝑝 and 𝑁𝑡𝑟𝑎𝑗 , except for 𝑛 = 0 (ground state). The reasons for this numerical behavior are important if we want to understand the implementation and the physics of the quantum mechanical systems (but they have nothing to do with some weirdness of the quantum domain). We address them now.
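The direct estimators above amount to a sample mean and a sample root-mean-square deviation over the trajectory end points. A minimal sketch follows; the function name is illustrative.

```python
import math


def ensemble_moments(values):
    """Mean and root-mean-square deviation computed directly from the end points.

    Implements  mean = (1/N) sum_k v_k  and
                sigma = [ (1/N) sum_k v_k^2 - mean^2 ]^(1/2).
    """
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    # Guard against tiny negative values caused by rounding.
    sigma = math.sqrt(max(0.0, mean_sq - mean * mean))
    return mean, sigma


if __name__ == "__main__":
    print(ensemble_moments([1.0, 2.0, 3.0, 4.0]))
```

These estimators are very sensitive to isolated outliers: one end point at 𝑞 = 500 among 10^5 points adds 500²/10^5 = 2.5 to the mean square, which would move an expected 𝜎 ≈ 1.22 to roughly 2.0.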
6.3.3. Local Divergences in the Fluctuations

We could ask why the ground state of the harmonic oscillator has statistical moments that converge to stable results as we increase the values of 𝑁𝑡𝑟𝑎𝑗 and 𝑛𝑝 (lowering the value of 𝜏 to keep 𝑡 = 𝑛𝑝 𝜏 constant at a value at which we know that the stationary regime was reached), while the excited states present the previously mentioned anomaly of not converging to any value. The reason for such behavior becomes obvious when we look at Table 6 and see that, for all excited states, the stochastic structure factor Γ𝑛 (𝑞) has points of divergence. Thus, for instance, for 𝑛 = 1, the structure factor diverges at 𝑞 = 0. This divergence is connected to the fact that the probability density functions used to construct Γ𝑛 (𝑞) have zeroes, and Γ𝑛 (𝑞) was obtained from them by the derivative of a logarithm. From the physical point of view, the zeroes of the probability density function mean that the stochastic force has to become an infinite repulsion center at such locations. However, it is quite obvious that the particle must pass through these points, since we expect it to fill the allowed interval and we are studying a one-dimensional problem (we are supposing, coherently with our classical picture, that the particle cannot disappear at some point and appear at another point in some fancy way). Thus, the zeroes of the probability density functions have to be considered as an idealization, meaning that the probability of finding a particle in small intervals containing these points is extremely low compared with the probability for other intervals. When the simulations are done with the diverging structure factors, we can see, by plotting the values of $q^{k}_{n_p}$ for each 𝑘, that there appear numbers as large as $q^{k}_{n_p} = 500$ (in atomic units this means that the particle is five hundred times the Bohr radius from the harmonic force center!), which is absurd (see Figure 6-4).
Figure 6.4. Fluctuation profiles of the variable q for the harmonic oscillator states n=1 and n=2 (ensemble simulation).
In Figure 6-5 we plot the momentum fluctuation profiles of the harmonic oscillator for the states 𝑛 = 1 and 𝑛 = 2 and it is easy to see the isolated peaks. We also plot in Figure 6-4 the fluctuation in position for these two states. We note that for some instants of time the value of the position is as big as 500 “atomic units”. The appearance of such terms, obviously, completely spoils the calculation of the statistical moments: the result depends on whether trajectories with a huge value of $q^{k}_{n_p}$ are present, and slight variations of 𝑁𝑡𝑟𝑎𝑗 can include or exclude such trajectories. Thus, since the values of the coordinates (momentum or position) can be as large as cited above (or even larger), and since their appearance is always recurrent (from the periodicity of the physical problem), the consideration of larger values of 𝑁𝑡𝑟𝑎𝑗 and/or 𝑛𝑝 will not make the values of the statistical moments converge.
Figure 6.5. Fluctuation profiles for the levels n=1 and n=2 of the harmonic oscillator. The isolated peaks show the vicinities of the regions in which the fluctuation diverges (ensemble simulation).
The fact that the marginal probability distributions do not show much disturbance from the appearance of such large values for the coordinates can be easily explained. The counting method makes use of small boxes and, although $q^{k}_{n_p}$ may recurrently be very large, these numbers will be extremely dispersed throughout the real line, so that each box receives only one of them (or some very small integer number). Thus, although these huge numbers may completely spoil the calculations of the statistical moments, they do not significantly alter the marginal probability densities. In fact, as we will see shortly, the only appreciable effect of such divergences upon the overall appearance of the marginal probability densities is to give them tails slightly greater than the expected theoretical ones, as can be seen in Figure 6-3. One way to overcome this problem, while keeping track of the physical behavior of the system, is to insert by hand a small value 𝑐 in the expressions of Γ𝑛 (𝑞) to avoid the divergences and make them vary only up to a value proportional to 𝑐^−1 at the divergence points. Thus, for example, for 𝑛 = 1 and 𝑛 = 2 we have the expressions (positive and small 𝑐)

$$\Gamma_1(q) = \frac{1}{2}\,\frac{q^2+1}{q^2+c}, \qquad \Gamma_2(q) = \frac{1}{2}\,\frac{4q^4+4q^2+5}{(2q^2-1)^2+c}.$$

These expressions show the same behavior as the original ones away from the vicinities of the divergences, since the denominators become approximately the original ones, but at the divergence points Γ1 (0) = 1/(2𝑐) and Γ2 (±√2/2) = 4/𝑐. When the above criterion is applied, we may ask three important questions: (a) do the statistical moments become well-defined? (b) what is the behavior of the statistical moments as related to the values of 𝑐? (c) what is the behavior of the marginal probability densities? To answer these questions, we ran our program for 𝑛 = 1 and 𝑛 = 2 for various values of 𝑐 (no such alteration is necessary for 𝑛 = 0). The results are shown in Figure 6-6 for the 𝑛 = 1 marginal densities, and presented in Table 8 for the 𝑛 = 1 statistical moments.
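The regularised structure factors are simple enough to evaluate directly. The sketch below codes the two expressions with the small constant 𝑐 added to the denominators and evaluates them at the former divergence points; for 𝑛 = 2 the numerator 4𝑞⁴ + 4𝑞² + 5 equals 8 at 𝑞 = ±√2/2, so the capped value is 4/𝑐. Function names are illustrative.

```python
import math


def gamma1(q, c=0.0):
    """Regularised n = 1 structure factor: (1/2)(q^2 + 1)/(q^2 + c)."""
    return 0.5 * (q * q + 1.0) / (q * q + c)


def gamma2(q, c=0.0):
    """Regularised n = 2 structure factor: (1/2)(4q^4 + 4q^2 + 5)/((2q^2 - 1)^2 + c)."""
    return 0.5 * (4.0 * q ** 4 + 4.0 * q * q + 5.0) / ((2.0 * q * q - 1.0) ** 2 + c)


if __name__ == "__main__":
    c = 0.01
    print(gamma1(0.0, c))                 # capped value 1/(2c) at q = 0
    print(gamma2(math.sqrt(0.5), c))      # capped value 4/c at q = sqrt(2)/2
    print(gamma1(2.0, c), gamma1(2.0))    # nearly identical away from q = 0
```

With 𝑐 = 0 the functions reduce to the Table 6 expressions and raise a division error exactly at the nodes, which is the numerical face of the divergence discussed above.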
It is easy to see from Table 8 that as 𝑐 → 0 the statistical moments tend to become ill-behaved, since the effects of the stochastic structure factors become more severe. The probability distributions, however, are much more stable with respect to these singularities, as can be seen from Figure 6-6. The question, thus, is which value of 𝑐 one must use, since there is no prescription (besides the outcomes of the statistical moments) for choosing any specific value of 𝑐. If one is interested in the probability densities, this is not a serious problem, since they are, as we have seen, rather insensitive to 𝑐. Thus, one may proceed with some small value of 𝑐 (typically of the order of 𝑐 = 0.01) to find the densities and then, using these densities, calculate the values of the statistical moments. Note also that the 𝑝 distributions have even smaller sensitivity to the values of 𝑐, as expected.
Figure 6.6. Configuration and momentum space probability densities for various values of the parameter c. States n=1 and n=2 are shown. Note that the momentum probability density function is very insensitive to the values of c.
The same numerical behavior is obtained for any other level. We have made calculations for 𝑛 = 2, 3 and the superposition states and no departure from the 𝑛 = 1 case was found. From an analysis of Figure 6-6, we can also see that the choice of 𝑐 affects the internal peaks of the marginal distribution in 𝑞, increasing them. This was expected, since it is now more probable that the particle will pass from the most probable regions to the internal regions in configuration space. We note, however, that the use of 𝑐 serves only to avoid running the simulations with too small values of 𝜏 (and, thus, too high values of 𝑛𝑝 ), since this would be too time consuming. As we make 𝜏 → 0, the ideal value of the constant 𝑐 approaches zero.
Table 8. For n=1 we used np=400, Ntraj=10^5, γ=10.0 and τ=0.005. Expected values of the statistical moments are shown in parentheses

| 𝑐 | 𝑥̅ (0.0) | 𝑝̅ (0.0) | 𝜎𝑥 (1.22) | 𝜎𝑝 (1.22) | 𝐸̅ (1.5) |
|---|---|---|---|---|---|
| 0.100 | −0.021 | 0.0017 | 1.1211 | 1.1443 | 1.2834 |
| 0.020 | −0.022 | 0.0029 | 1.1779 | 1.2675 | 1.4972 |
| 0.010 | −0.022 | 0.0024 | 1.1963 | 1.3021 | 1.5635 |
| 0.001 | −0.023 | 0.0045 | 1.2787 | 1.4583 | 1.8811 |
In Figure 6-7 we plot the dependence of the fluctuation structure term on 𝑞 for the states 𝑛 = 1 and 𝑛 = 2 for some of the values of 𝑐 used in Figure 6-6. We can see that the fluctuation structure terms tend to behave quite similarly to the 𝑐 = 0 situation.
Figure 6.7. Profiles of the fluctuation structure terms for the states n=1 and n=2 of the harmonic oscillator. (a) for n=1, (---)c=0, (++) c=0.01 and (--) c=0.001; (b) for n=2, (---)c=0, (++) c=0.15, (--) c=0.05.
6.3.4. Single System Simulations

So far we have made numerical simulations in the ensemble picture, as previously explained. As we have said, it is also possible to make these simulations in the single system picture. If the ergodic assumption is valid, both pictures must give the same outcomes. As an example, we have made numerical simulations using the single system picture for 𝑛 = 1, 𝑛 = 2 and 𝑛 = 3 and the results are shown in Figure 6-8. The single system result is in excellent agreement with the result obtained for ensemble simulations for the same quantum levels, as we were expecting based upon the ergodic assumption applied to stationary physical systems.
Figure 6.8. Single system simulations for the states n=1, 2, 3 of the harmonic oscillator. The parameter values for each state are shown in the figure.
Greater values of 𝛾 are expected when using single system simulations, since in these simulations we begin counting (𝑞, 𝑝) states as soon as we start our simulation and, thus, include times in which the distribution is not stationary yet. This makes the control of the tail by choosing values of 𝛾 and 𝜏 a little more complicated; one possible solution would be to begin counting states after some definite time 𝑡0 = 𝑛0 𝜏 when we expect the distribution to have achieved its stationary configuration. The number of steps 𝑛𝑝 is usually much larger since now we want to simulate the results of an ensemble with some 5 × 10^4 particles with the sole movement of a single particle. The distributions may show an uncompensated profile (asymmetry in the heights of peaks), since, depending on the value of 𝑛𝑝 , we may sum the contributions of the particle just after it has spent too much time on one “side” of the configuration space; these asymmetries move from one side to another and, as we increase the value of 𝑛𝑝 , they disappear. The equivalence of the ensemble and single system approaches has important epistemological consequences, as we have already stressed in other chapters, and we address them again later in this chapter.
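The single system picture, including the suggestion of discarding the states visited before some time 𝑡0 = 𝑛0 𝜏, can be sketched for the ground state as a time average along one trajectory. As before, the Langevin form (d𝑞 = 𝑝 d𝑡, d𝑝 = (−𝑞 − 𝛾𝑝) d𝑡 + √(2𝛾 𝛿𝑝²) d𝑊), the Euler-Maruyama step and all parameter values are assumptions chosen for a quick illustration, not the settings used for Figure 6-8.

```python
import math
import random


def simulate_single(n_steps=400_000, n_burn=2_000, tau=0.005, gamma=2.0,
                    energy=0.5, variance=0.5, seed=7):
    """Time averages along one trajectory for n = 0 (m = hbar = omega = 1).

    The first n_burn steps (t0 = n_burn * tau) are discarded so that the
    non-stationary start does not enter the statistics.
    Returns (mean of q, variance of q, mean of p^2).
    """
    rng = random.Random(seed)
    q_turn = math.sqrt(2.0 * energy)
    q = rng.uniform(-q_turn, q_turn)
    p = math.sqrt(max(0.0, 2.0 * energy - q * q))
    noise = math.sqrt(2.0 * gamma * variance * tau)
    count = 0
    sum_q = sum_q2 = sum_p2 = 0.0
    for step in range(n_steps):
        q, p = (q + p * tau,
                p + (-q - gamma * p) * tau + noise * rng.gauss(0.0, 1.0))
        if step >= n_burn:
            count += 1
            sum_q += q
            sum_q2 += q * q
            sum_p2 += p * p
    mean_q = sum_q / count
    return mean_q, sum_q2 / count - mean_q * mean_q, sum_p2 / count
```

If the ergodic assumption holds for this model, the time averages should agree with the ensemble values, here a variance close to 1/2 for both 𝑞 and 𝑝.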
6.3.5. Dependence of the Simulations on the Energy

As we have already mentioned, all the simulations were made using the true quantum mechanical energy 𝐸𝑛 to find the classical turning points 𝑞𝑡𝑟𝑛 and construct the interval [−𝑞𝑡𝑟𝑛 , +𝑞𝑡𝑟𝑛 ] inside which the initial condition(s)⁸ in 𝑞, $q_0^{(k)}$, are randomly chosen. From this choice of energy and $q_0^{(k)}$, the initial conditions for the momentum $p_0^{(k)}$ were chosen from

$$p_0^{(k)} = \sqrt{2E_n - \left[q_0^{(k)}\right]^2}.$$

We are now interested in analyzing to what extent our simulations depend on the choice of the energy 𝐸𝑛 as being the true energy of the state. Such a dependence would be highly undesirable, since it may appear that we are forcing the results by imposing the energy that, after all, we can only know after the simulations are performed. It is easy to show that the simulations are completely independent of the choice of 𝐸𝑛 and thus also of the choice of the initial values inside the classical turning points. In fact, we have performed these simulations for various values of 𝐸 for some states of the harmonic oscillator. In Figure 6-9 we show the behavior of the configuration space probability density function for the state 𝑛 = 1 for 𝐸 = 5.0 (note that 𝐸1 = 1.5) for increasing values of 𝑛𝑝 (and thus also of the time, since 𝑡 = 𝑛𝑝 𝜏 and 𝜏 is kept fixed). It is easy to see that for this greater value of the energy 𝐸 the system takes much more time to converge to the exact theoretical distribution, but converges to it anyway.

⁸ There are 𝑁𝑡𝑟𝑎𝑗 such initial conditions in ensemble calculations and only one in single system simulations (since 𝑁𝑡𝑟𝑎𝑗 = 1 for single systems).
Figure 6.9. Evolution of the configuration space probability density profiles for the state n=1 of the harmonic oscillator using E=5.0 as the energy for the initial conditions. The other parameters are kept fixed as γ=17, τ=0.005, c=0.0005 and Ntraj=50000.
In Figure 6-10 we show the converged configuration space probability density function for the state 𝑛 = 1 of the harmonic oscillator as the outcome of simulations for a number of values of the energy 𝐸 from which we choose $(q_0^{(k)}, p_0^{(k)})$.
Figure 6.10. Configuration space probability densities for the state n=1 of the harmonic oscillator for various values of the energy E used to calculate the random initial conditions $(q_0^{(k)}, p_0^{(k)})$ in ensemble simulations.
As can be seen from Figure 6-10, the values of 𝛾, 𝜏, 𝑁𝑡𝑟𝑎𝑗 and 𝑐 were kept fixed, while the value of 𝐸 from which we choose the initial conditions was varied. The only difference from the simulation in which we use 𝐸 = 𝐸𝑛 = 𝐸1 comes from the time the distribution takes to reach a stable profile, encoded in the values of 𝑛𝑝 . Thus, for large values of 𝐸 the system needs more time (greater values of 𝑛𝑝 ) to converge to the true quantum mechanical distribution with exactly the same energy prescribed by quantum mechanical calculations. The term responsible for pulling the energy back to its quantum mechanical value is 𝛾𝑝, the dissipative term of the Langevin equations. The increase in 𝑛𝑝 with the increase of 𝐸 simply means that it takes more time for this term to dissipate the extra amount of energy. For a physical picture of the process that gives rise to this dissipation of energy, see the section on interpretation issues in this very chapter. For an interpretation of this factor connecting it with the fluctuation-dissipation theorem, see the section after the next.
6.3.6. Connections with Bohmian Equations

We can now turn to the possible relations between Bohmian trajectories and the results of our simulations. As an example, for the state 𝑛 = 1, Bohm’s equation becomes (we are now treating 𝑝̅ (𝑞, 𝑡) as simply 𝑝, as is usual in such approaches, and our “quantum potential” is slightly different from Bohm’s)

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial q}\left[\frac{p^2}{2m} + V(q) - \frac{\hbar^2}{8m}\frac{\partial^2 \ln\rho(q,t)}{\partial q^2}\right] = -\frac{\partial}{\partial q}\left[\frac{p^2}{2m} + \frac{1}{2}m\omega^2 q^2 + \frac{1}{4}\,\frac{q^2+1}{q^2}\right] = -\frac{\partial H(q,p)}{\partial q}.$$

In phase space our trajectories will be given by 𝐻(𝑞, 𝑝) = 𝐸, where 𝐸 is a constant. These phase-space trajectories for the states 𝑛 = 1 and 2 are shown in Figure 6-11(a). In Figure 6-11(b) we show the contours of the phase-space distribution (6.2) for this state and in Figure 6-11(c) we show the results of our simulations for a small number of points (3000 points). It seems clear that the Bohmian trajectories represent an averaging process over the fluctuation process in phase space: the round appearance of the Bohmian trajectories is disturbed by the momentum fluctuations (note that this happens in the direction of the momentum axis). The simulations were performed using our Langevin equations and make explicit the truly random nature of the “quantum potential”, something absent from Bohm’s approach.
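The contours 𝐻(𝑞, 𝑝) = 𝐸 for 𝑛 = 1 can be traced from the expression in brackets above, taking 𝑚 = ℏ = 𝜔 = 1. The sketch below is only an illustration of that formula; the function names are hypothetical and 𝑞 = 0 must be avoided because the effective potential diverges there.

```python
import math


def effective_potential(q):
    """V(q) plus the n = 1 'quantum potential' term (1/4)(q^2 + 1)/q^2 (q != 0)."""
    return 0.5 * q * q + 0.25 * (q * q + 1.0) / (q * q)


def hamiltonian(q, p):
    """H(q, p) = p^2/2 + effective potential, with m = hbar = omega = 1."""
    return 0.5 * p * p + effective_potential(q)


def momentum_on_contour(q, energy):
    """Positive-p branch of H(q, p) = energy, or None where the contour does not reach q."""
    kinetic = energy - effective_potential(q)
    if kinetic < 0.0:
        return None
    return math.sqrt(2.0 * kinetic)


if __name__ == "__main__":
    for q in (0.1, 0.5, 1.0, 1.4):
        print(q, momentum_on_contour(q, 1.5))
```

Because the effective potential grows without bound as 𝑞 → 0, each contour stays on one side of the origin, and every closed contour crosses 𝑝 = 0 at its two turning points.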
Figure 6.11. (a) Bohmian trajectories, (b) contours of the phase space probability densities and (c) simulation points for the states n=1 and 2.
So far the developments made here were all in agreement with the usual formalism of Quantum Mechanics, since we recovered the configuration space quantum distributions for all states, their superpositions, etc. However, when we turn to the momentum space marginal distribution there appears an important difference that we now investigate. We plotted in Figure 6-3 the momentum space probability distribution for the various levels of the harmonic oscillator coming from our simulations (level 𝑛 = 0 gives the usual Gaussian distribution as expected from the Langevin equations and does not differ from the usual quantum mechanical results). The continuous line was obtained by integration of the expression (6.2), while the symbols came from the simulation. There are two important features of these distributions: they are not Gaussian (even in their tail) and they are, except for 𝑛 = 0, completely different from the one calculated by the usual quantum mechanical procedure:
Criterion: take the Fourier transform of the probability amplitude 𝜓(𝑞) to find 𝜙(𝑝) and form the product Π(𝑝) = 𝜙∗(𝑝)𝜙(𝑝). It is important to stress that, although the joint probability distribution is Gaussian in 𝑝 at each value of 𝑞, its marginal distribution Π(𝑝) need be neither Gaussian (and will not be, except for 𝑛 = 0) nor have Gaussian tails. Note from Figure 6-11 that, for any state of the harmonic oscillator, the Bohmian trajectories do predict that one can find particles with 𝑝 = 0 (or, strictly, trajectories with nonzero probability of finding 𝑝 in the vicinities of zero); in fact, from the Bohmian equation or our Langevin equation for this problem, it is easy to see that, since the system is periodic and one-dimensional, there must be dynamic periodic situations in which 𝑝 = 0 (or within its vicinities). At this point, we remind the reader that Bohm’s approach, in the end, must reproduce the usual results of Quantum Mechanics, since the formalisms are mathematically equivalent. However, when we use the common procedure to find the momentum distribution of the 𝑛 = 1 state, as shown in Figure 6-11, we find 𝜙(𝑝) [or 𝜋𝑄 (𝑝)] with the same aspect as 𝜓(𝑥) [or 𝜌𝑄 (𝑥)], that is, with zero probability of finding the system with 𝑝 = 0 (with similar results for any state represented by an odd quantum number). This seems to be in contradiction not only with our present results, but also (and most importantly) with the Bohmian reconstruction⁹. Strangely enough, the states with even quantum number do present non-zero probabilities for finding the system with 𝑝 = 0, although the profiles of the momentum distributions are also completely different from the ones obtained by our simulations and the ones that seem to be implied by the Bohmian trajectories.
Note that the behavior of the Bohmian trajectories in the vicinities of 𝑝 = 0 is quite similar for 𝑛 = 1, 2, … and it seems hard to justify why the above mentioned difference between even and odd states should appear. Since the Bohmian reconstruction is mathematically equivalent to the usual Quantum Mechanical approach based upon the Schrödinger equation, this sort of difference can be considered a criticism of the internal consistency of Quantum Mechanics, as viewed from the perspective of its reconstruction by Bohm. The use of the Schrödinger equation, considered from the Bohmian point of view, and the usual procedure for finding the momentum marginal probability distribution are not consistent with each other. This argument should also reverberate on our appraisal of the Wigner-Moyal distribution, not only because of its negative values at some regions of the phase space, but also because it gives precisely the same 𝜋𝑄 (𝑝) given by the usual quantum mechanical calculation procedure mentioned above. We will return to the comparison of the present approach with the Wigner-Moyal one in a future chapter about operator formation in Quantum Mechanics.
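The difference between the two momentum distributions at 𝑝 = 0 can be made concrete for 𝑛 = 1. The sketch below integrates the 𝑛 = 1 joint density listed in Table 7 over 𝑞 with a simple trapezoidal rule and compares the result with the Fourier-transform density, which for the harmonic oscillator with 𝑚 = ℏ = 𝜔 = 1 has the same functional form as 𝜌1, namely (2/√π) 𝑝² exp(−𝑝²). Function names, the integration window and the grid size are illustrative assumptions.

```python
import math


def joint_density_n1(q, p):
    """F_1(q, p) as listed in Table 7 (m = hbar = omega = 1)."""
    q2 = q * q
    prefactor = 2.0 * abs(q) * q2 / (math.pi * math.sqrt(q2 + 1.0))
    return prefactor * math.exp(-q2 - q2 * p * p / (q2 + 1.0))


def marginal_momentum_n1(p, q_max=8.0, n_grid=4000):
    """Marginal in p obtained by trapezoidal integration of F_1 over q."""
    h = 2.0 * q_max / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * joint_density_n1(-q_max + i * h, p)
    return total * h


def fourier_momentum_density_n1(p):
    """|phi_1(p)|^2 from the Fourier transform of psi_1: (2/sqrt(pi)) p^2 exp(-p^2)."""
    return 2.0 / math.sqrt(math.pi) * p * p * math.exp(-p * p)


if __name__ == "__main__":
    print(marginal_momentum_n1(0.0), fourier_momentum_density_n1(0.0))
```

The marginal of the joint density is clearly positive at 𝑝 = 0, while the Fourier-transform density vanishes there, which is the discrepancy discussed in the text.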
⁹ Obviously, we cannot use the argument (acceptable for the configuration space) that the particles pass with so high a velocity in the vicinities of 𝑝 = 0 that the probability of finding them at this point should be close to zero. Indeed, for the vicinities of configuration space points in which the probability density is close to zero, this is exactly what happens, as can be easily seen from the figures.
6.3.7. The Fluctuation-Dissipation Theorem in Quantum Mechanics

In this section we will state a version of the Fluctuation-Dissipation Theorem (FDT) to make explicit its functional dependence with the quantum level 𝑛. This should be considered a quantized version of the FDT and shows how the present approach, because of its dynamic formulation, may help assessing many features of physical phenomena not covered by other formulations. We will assume that the stochastic force 𝜇𝑛 (𝑡) has the following properties:

1. The stochastic process 𝜇𝑛 (𝑡) is stationary;
2. It has an infinitesimal correlation time such that the self-correlation function $C_{\mu_n}(t)$, for each quantum level 𝑛, can be written as

$$C_{\mu_n}(t) = \langle \mu_n(0)\,\mu_n(t)\rangle = \eta_n\,\delta(t), \tag{6.14}$$

where 𝜂𝑛 is a proportionality constant depending upon 𝑛. In the usual processes we have 𝜂𝑛 = 2𝛾𝑚𝐾𝐵 𝑇. We are assuming here that ⟨𝜇𝑛 (0)𝜇𝑛 (𝑡)⟩ = ⟨𝜇𝑛 (𝑡)𝜇𝑛 (0)⟩;
3. The stochastic movement of the particles is due to some fluctuation coming from the thermal equilibrium with the “bath” in which they are moving and ⟨𝑝𝑛 (0)𝜇𝑛 (𝑡)⟩ = 0, where 𝑝𝑛 (𝑡) is given by equation (6.6).

From the ergodic assumption we can verify how the FDT works for the quantum harmonic oscillator from the Langevin equations submitted to some white noise [with some structure defined by the term Γ𝑛 (𝑞)] corresponding to the Markovian stationary process. However, to verify the applicability of the ergodic assumption, we will first analyze the variance $\sigma_n^2(\Delta t)$ for time intervals Δ𝑡 = 𝑡2 − 𝑡1 between the stochastic forces 𝜇𝑛 (𝑡2) and 𝜇𝑛 (𝑡1), for any 𝑡2 and 𝑡1 such that 𝑡2 > 𝑡1. Note that $\sigma_n^2(\Delta t) = \langle \mu_n^2(\Delta t)\rangle$, since ⟨𝜇𝑛 (Δ𝑡)⟩ = 0. A constant variance implies that the temporal average made over a single system is identical to an average made over ensembles. Moreover, a constant value for the variance implies also the thermodynamic equilibrium of the system, which is a necessary condition to derive the FDT from a stochastic force in only one experiment. Thus, let us use the following definition

$$\sigma_n^2(\Delta t) = \frac{1}{N-\Delta t}\sum_{i=\Delta t}^{N} \mu_n(i)\,\mu_n(i).$$
With these expressions, we found the behaviors for various quantum states of the harmonic oscillator, as shown in Figure 6-12, each one obtained by numerical simulation. In this figure we used the parameter values 𝛾 = 30, 𝜏 = 0.003, 𝑛𝑝 = 1000 and 𝑁𝑡𝑟𝑎𝑗 = 10^5.
For 𝑛 = 2 the variance fluctuates for greater values of Δ𝑡, after being constant for smaller values—which may be connected with a poor stochastic integration of the equation. In any case, it would be awkward to have all the results giving constant values except for 𝑛 = 2.
Figure 6.12. The variance for various quantum mechanical states of the harmonic oscillator.
In general, we can find the correlation between the random forces from the expression

$$C_{\mu_n}(\Delta t) = \frac{1}{N_i N_j}\sum_{i}^{N_i}\sum_{j}^{N_j} \mu_n(i)\,\mu_n(i+j),$$

which corresponds to the time average over a single system. The evaluation of the previous function from the simulations leads to the plots shown in Figure 6-13. The parameter values are the same as in Figure 6-12. Note that the behavior of $C_{\mu_n}(\Delta t)$ is characteristic of a Dirac delta function 𝛿(𝑡). In fact, this result was already expected, from the fact that we used only a (structured) white noise. What is noteworthy is that this behavior is quantized: for each value of the quantum number 𝑛 we have a different value for the FDT. From the inset of Figure 6-13 we can describe the relation between $C_{\mu_n}(0)$ and 𝑛 as linear. We also test the relation between $C_{\mu_1}(0)$ and 𝛾 for 𝑛 = 1 and between $C_{\mu_4}(0)$ and 𝜏 for 𝑛 = 4. The results are shown in Figure 6-14 (the parameters have the same values used in Figure 6-12). Note that $C_{\mu_1}(0)$ and 𝛾 have a linear behavior, as predicted by the usual classical FDT. We also find the same result for $C_{\mu_4}(0)$ and 𝜏, which is a reasonable behavior, given its normal diffusion behavior (within a finite region of space), implying that the system responds linearly.
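A time-average correlation of this kind can be estimated from a single recorded force series. The sketch below uses the common per-lag estimator, the average of 𝜇(𝑖)𝜇(𝑖 + 𝑗) over all available 𝑖 for each lag 𝑗, which is a slight variation of the double-sum normalisation written above; the function name is illustrative.

```python
import random


def autocorrelation(series, max_lag):
    """Time-average estimate C(j) = mean over i of series[i] * series[i + j]."""
    n = len(series)
    result = []
    for lag in range(max_lag + 1):
        terms = n - lag
        acc = sum(series[i] * series[i + lag] for i in range(terms))
        result.append(acc / terms)
    return result


if __name__ == "__main__":
    rng = random.Random(3)
    noise = [rng.gauss(0.0, 2.0) for _ in range(20000)]
    # Synthetic white noise: the lag-0 value estimates the variance,
    # all other lags fluctuate around zero.
    print(autocorrelation(noise, 3))
```

For white noise the estimate is concentrated at zero lag, the discrete counterpart of the delta-function behavior described in the text.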
Figure 6.13. Correlation profile for the simulations of the previous figure.
Figure 6.14. Calculation of γ from the stochastic simulations.
From all these comparisons, we can say that the FDT is given by

$$C_{\mu_n}(\Delta t) = 2\gamma c_\tau \left(n + \frac{1}{2}\right)\delta(\Delta t), \tag{6.15}$$

where 𝑐𝜏 is a constant with mass units multiplied by the unit of energy, such that there is a direct relation with 𝜏. In the present simulation we found by fitting that 𝑐𝜏 = 𝜏. The results coming from equation (6.15) for the FDT of the harmonic oscillator, with 𝑚 = ℏ = 𝜔 = 1, when we substitute the values of 𝑛, 𝛾 and 𝜏 used in the simulations, are in complete agreement with the values coming from the simulations. Moreover, since the present approach allows us to define a local temperature as

$$T_n(x) = -\frac{\hbar^2}{4mK_B}\,\frac{\partial^2 \ln\rho_n(x)}{\partial x^2},$$

we can calculate the value of the global or average temperature (for each 𝑛) as

$$K_B T_n = -\frac{\hbar^2}{4m}\int_{-\infty}^{+\infty} \rho_n(x)\,\frac{\partial^2 \ln\rho_n(x)}{\partial x^2}\,dx,$$

which can be seen, by direct calculation, to give

$$K_B T_n = \left(n + \frac{1}{2}\right)\hbar\omega$$

or, in atomic units, just 𝑚𝐾𝐵 𝑇𝑛 = (𝑛 + 1/2). This allows us to write

$$C_{\mu_n}(\Delta t) = 2\left(\gamma m K_B T_n\right)\tau\,\delta(\Delta t),$$

precisely as in the usual (that is, “classical”) case. The same analysis can be extended to any other quantum system.
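The direct calculation of the average temperature can be sketched as follows, first for the ground state and then for a general level. The second part assumes a real, normalised 𝜓𝑛 that vanishes at infinity, and uses the standard harmonic-oscillator result that the mean kinetic energy is half of the level energy.

```latex
\begin{align*}
% Ground state: \ln\rho_0 is quadratic in x
\rho_0(x) &= \sqrt{\frac{m\omega}{\pi\hbar}}\; e^{-m\omega x^2/\hbar}
 \;\Longrightarrow\;
 \frac{\partial^2\ln\rho_0}{\partial x^2} = -\frac{2m\omega}{\hbar},
 \qquad
 K_B T_0 = -\frac{\hbar^2}{4m}\left(-\frac{2m\omega}{\hbar}\right) = \frac{\hbar\omega}{2}. \\[6pt]
% General level: integrate by parts, with \rho_n = \psi_n^2 and \psi_n real
-\int \rho_n\,\frac{\partial^2\ln\rho_n}{\partial x^2}\,dx
 &= \int \frac{(\rho_n')^2}{\rho_n}\,dx
 = 4\int (\psi_n')^2\,dx
 = \frac{4}{\hbar^2}\,\langle p^2\rangle_n, \\[6pt]
K_B T_n &= \frac{\langle p^2\rangle_n}{m}
 = 2\,\Big\langle \frac{p^2}{2m}\Big\rangle_n
 = \left(n+\frac{1}{2}\right)\hbar\omega .
\end{align*}
```

With 𝑚 = ℏ = 𝜔 = 1 this reduces to 𝑚𝐾𝐵 𝑇𝑛 = 𝑛 + 1/2, the factor that appears in equation (6.15).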
6.4. THE MORSE OSCILLATOR

It has been said in the literature [85] that the formalism of this book would be appropriate only to problems involving second order potentials, such as the harmonic oscillator. In the previous chapter we have proved that this is not the case, since we have applied the formalism to problems such as the hydrogen atom and obtained the same accurate results. The three-dimensional hydrogen atom problem will be addressed again in a later section to show, using simulations, that the present formalism applies equally well to any quantum mechanical problem. Nevertheless, it would be instructive to address another one-dimensional problem, characterized by a potential quite different from the harmonic oscillator one, to see if the quantum mechanical results are again reproduced. The problem we develop in this section is the one related to the Morse potential [89], given by

$$V(q) = \frac{D}{2}\left[e^{-\beta(q-q_0)} - 1\right]^2,$$

which is widely used to approximately represent intermolecular potentials in, for example, diatomic molecules. The Schrödinger equation is given by

$$-\frac{1}{2}\frac{d^2\psi(q)}{dq^2} + \frac{D}{2}\left[e^{-q} - 1\right]^2\psi(q) = E\,\psi(q),$$

where we have already put 𝑚 = ℏ = 𝛽 = 1, 𝑞0 = 0. If we use

$$y = e^{-q}, \qquad \psi = \left(2\sqrt{D}\,y\right)^{1/2} g(y) \tag{6.16}$$

we find

$$y^2 g''(y) + 2y\,g'(y) + \left[\left(2E + \frac{1}{4} - D\right) + 2Dy - Dy^2\right] g(y) = 0$$

and with 𝑥 = 2√𝐷𝑦 the last equation becomes

$$x^2 g''(x) + 2x\,g'(x) + \left[\left(2E + \frac{1}{4} - D\right) + \sqrt{D}\,x - \frac{x^2}{4}\right] g(x) = 0,$$

which is equivalent to the Laguerre polynomial equation if we put

$$g(x) = e^{-x/2}\, x^{(m-1)/2}\, L_n^m(x),$$

as long as

$$E = \frac{D}{2} - \frac{m^2}{8}, \qquad n = \sqrt{D} + \frac{m-1}{2},$$

with 𝑛 ≥ 𝑚 and both natural numbers. Thus, our solution becomes

$$\psi(y) = e^{-\sqrt{D}\,y}\left(2\sqrt{D}\,y\right)^{n-\sqrt{D}+1/2} L_n^{2n+1-2\sqrt{D}}\!\left(2\sqrt{D}\,y\right),$$
with 𝑦 given as in (6.16). If we put 𝐷 = 16, we may find four states for the Morse oscillator, represented by the quantum numbers 𝑛 = 4, 5, 6, 7. These states are represented in Table 9 with the energy and momentum fluctuations shown. The energy of the states are represented in Figure 6-15, compared to the potential energy of the oscillator. The results of the simulations for the probability densities are shown in Figure 6-16 for all four levels. As the reader can see, the simulations’ results agree fairly well with the theoretical ones. As we approach the top of the Morse potential, it becomes somewhat harder
176
Olavo Leopoldino da Silva Filho
to describe with the same degree of accuracy the density function tail, but we still get quite good results.
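The counting of states for $D = 16$ follows directly from the two conditions $E = D/2 - m^2/8$ and $n = \sqrt{D} + (m-1)/2$ with $n \ge m$. The sketch below simply enumerates the admissible pairs; the function name `morse_levels` is a label chosen here, and the units are those of the text ($m = \hbar = \beta = 1$, $q_0 = 0$).

```python
import math


def morse_levels(D):
    """List (n, m, E) with n = sqrt(D) + (m - 1)/2, E = D/2 - m^2/8 and n >= m >= 1."""
    root = math.sqrt(D)
    levels = []
    m = 1
    while True:
        n = root + (m - 1) / 2.0
        if n < m:
            # n grows by 1/2 per unit of m, so the condition n >= m fails eventually.
            break
        if abs(n - round(n)) < 1e-12:
            # Keep only the values of m for which n is a natural number.
            levels.append((int(round(n)), m, D / 2.0 - m * m / 8.0))
        m += 1
    return levels


if __name__ == "__main__":
    for n, m, energy in morse_levels(16):
        print(n, m, energy)
```

For $D = 16$ the loop keeps only odd $m$, since $n$ must be an integer, and stops once $n < m$; this is why exactly four states appear, all below the dissociation value $D/2 = 8$.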
Figure 6.15. Quantum states for the Morse oscillator with D=16. The structure of the Morse potential is also shown.
Figure 6.16. Probability density functions in configuration space for four levels of the Morse oscillator, (n, m)=(7,7), (6,5), (5,3) and (4,1), with D=16. The parameter values are also shown.
These results show that the criticism[85] that the present formalism would be able to capture only “second order” potentials is completely inappropriate. In a later section we also present the results for the hydrogen atom, which is not a “second order” potential either.

Table 9. Values of the energies and expressions of the momentum fluctuations for the Morse oscillator with D=16 and y=exp(-q)

| State $n$ | Energy | $\delta p^2(q)$ |
|---|---|---|
| 4 | 7.875 | $\dfrac{8y\,(9 - 126y + 792y^2 - 2400y^3 + 3840y^4 - 3072y^5 + 1024y^6)}{(64y^3 - 96y^2 + 36y - 3)^2}$ |
| 5 | 6.875 | $\dfrac{4y\,(25 - 140y + 320y^2 - 320y^3 + 128y^4)}{(16y^2 - 20y + 5)^2}$ |
| 6 | 4.875 | $\dfrac{8y\,(3 - 6y + 4y^2)}{(4y - 3)^2}$ |
| 7 | 1.875 | $2y$ |
We have already proved in previous chapters that this criticism is inadequate: we have analytically calculated the energies and mean square deviations using the phase-space distribution for problems such as the hydrogen atom; we have also shown the connections between this formalism and the Central Limit Theorem, which explain why we can discard terms in $\delta x$ of order greater than 2 without assuming any approximation whatsoever. In this chapter we show, using simulations, that the present formalism applies to any quantum mechanical system.
6.5. THE 3D-HARMONIC OSCILLATOR

One may be tempted to explain the need to include the constant $c$ in the expression of the fluctuating force as the expression of some “non-classical” nature. This would be misleading and the example of the 3D harmonic oscillator may show this very clearly. Indeed, if we solve the isotropic 3D harmonic oscillator problem, the non-normalized ground state solution would be given by

$$\rho(r) = e^{-r^2} = e^{-(x^2+y^2+z^2)}$$
in spherical and Cartesian coordinates and the fluctuation force would be given by √3𝛾/2 (we are making ℏ = 𝜔 = 𝑚 = 1). Thus, if we look at the problem in Cartesian coordinates, we get simply three separate problems for the ground state of 1D harmonic oscillators that do not have singular points, neither in the potentials, nor in the fluctuations (see the 1D harmonic oscillator example before). The simulations do not need any constant 𝑐 to produce the exact distributions. However, if we look at the problem using spherical coordinates, the stochastic equations become (𝑓 = −𝑟𝑛 is the force)
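$$\begin{aligned}
p_{r,n+1} &= a\,p_{r,n} + \tau\left[\frac{p_{\theta,n}^2}{(r_n+c)^3} + \frac{p_{\phi,n}^2}{(r_n+c)^3\,(|\sin\theta_n|+c)^2}\right] - \tau r_n + \sqrt{3\gamma\tau}\,\xi_n\\
p_{\theta,n+1} &= a\,p_{\theta,n} + \tau\,\frac{p_{\phi,n}^2\cos\theta_n\sin\theta_n}{(r_n+c)^2\,(|\sin\theta_n|+c)^4} + \sqrt{3\gamma\tau}\,\xi_n\\
p_{\phi,n+1} &= a\,p_{\phi,n} + \sqrt{3\gamma\tau}\,\xi_n
\end{aligned} \qquad (6.17)$$

for the momenta and

$$\begin{aligned}
r_{n+1} &= r_n + \tau\,p_{r,n}\\
\theta_{n+1} &= \theta_n + \tau\,\frac{p_{\theta,n}}{r_n^2 + c}\\
\phi_{n+1} &= \phi_n + \tau\,\frac{p_{\phi,n}}{r_n^2\sin^2(\theta_n) + c}
\end{aligned}$$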
for the positions (we have already introduced the constant $c$). Note that, unlike their Cartesian counterparts, these equations do have singular points, introduced solely by the change of coordinate system (thus with no relation to the “quantum” or “classical” nature of the problem).
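One update of the discretized equations (6.17) and of the position equations can be sketched as follows. Several points are assumptions made here and not statements of the text: the damping factor is taken as $a = 1 - \gamma\tau$, the three noises $\xi_n$ are drawn as independent unit Gaussians, and the function name `spherical_step` and the optional `noise` argument (useful for deterministic checks) are hypothetical.

```python
import math
import random


def spherical_step(state, tau, gamma, c, noise=None, rng=random):
    """One step of (6.17) for the ground state; state = (r, theta, phi, p_r, p_theta, p_phi)."""
    r, theta, phi, p_r, p_theta, p_phi = state
    a = 1.0 - gamma * tau  # assumed form of the damping factor a
    if noise is None:
        # Assumption: independent unit Gaussian noises for the three momenta.
        noise = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    kick = math.sqrt(3.0 * gamma * tau)
    rc = r + c                      # smeared radius
    sc = abs(math.sin(theta)) + c   # smeared |sin(theta)|

    new_p_r = (a * p_r
               + tau * (p_theta ** 2 / rc ** 3 + p_phi ** 2 / (rc ** 3 * sc ** 2))
               - tau * r
               + kick * noise[0])
    new_p_theta = (a * p_theta
                   + tau * p_phi ** 2 * math.cos(theta) * math.sin(theta)
                   / (rc ** 2 * sc ** 4)
                   + kick * noise[1])
    new_p_phi = a * p_phi + kick * noise[2]

    # Positions are advanced with the momenta of step n, as in the text.
    new_r = r + tau * p_r
    new_theta = theta + tau * p_theta / (r * r + c)
    new_phi = phi + tau * p_phi / (r * r * math.sin(theta) ** 2 + c)
    return (new_r, new_theta, new_phi, new_p_r, new_p_theta, new_p_phi)


if __name__ == "__main__":
    start = (0.0, 0.0, 0.0, 0.0, 1.0, 0.0)  # sits exactly on the coordinate singularity
    print(spherical_step(start, tau=0.1, gamma=0.0, c=0.1, noise=(0.0, 0.0, 0.0)))
```

With $c > 0$ every denominator stays strictly positive, so the step remains finite even at $r = 0$ or $\sin\theta = 0$; with $c = 0$ the same starting point would raise a division error, which is the numerical face of the singular points discussed above.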
Figure 6.17. Results of the simulations of the three-dimensional harmonic oscillator in the ground state. On the left is the configuration space probability distribution for some non-zero value of the parameter c. On the right is the same distribution for c=0. The effect of the centrifugal terms is evident.
Although the simulation of the problem in Cartesian coordinates does not need any constant $c$, the simulation in spherical coordinates does need it, because of the singular points in the centrifugal forces. We have made simulations of this system in both coordinate systems for many states (ground and excited) and the results for the ground state (in spherical coordinates) with and without the constant $c$ are shown in Figure 6-17. Note that for $c = 0.0$ we find a probability distribution that is pushed away from $r = 0.0$ precisely because of the high centrifugal forces; nonzero values of $c$ avoid the singular points of these forces and the probability distribution coming from the simulation is in excellent agreement with the theoretical one. We also show in Figure 6-18 the result for the $n = 2$, $\ell = 0$, $m = 0$ excited state, already with the use of the constant $c$. For excited states, singular points appear not only in the centrifugal forces but also in the fluctuating forces: while the infinities in the centrifugal forces control the behavior of the probability distribution at $r = 0.0$, the infinities in the fluctuating forces control the tail of the distribution (precisely as with the one-dimensional harmonic oscillator).
Figure 6.18. The probability distribution profile for the (2, 0, 0) state of the three dimensional harmonic oscillator.
Thus, it is necessary to introduce another smearing out constant $c$ to avoid this other source of infinities. When this is done, the results are in very good agreement with the theoretical ones. The other results are very similar to those obtained for the one-dimensional harmonic oscillator and will not be presented here. The importance of this discussion rests only in avoiding the usual maneuver of interpreting every unusual behavior of quantum systems as the outcome of some “non-classical” nature.
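6.6. THE HYDROGEN ATOM

For the hydrogen atom we have the densities given by

$$\rho_{n\ell m}(x) = R_{n\ell}(r)^2\,|Y_{\ell m}(\theta,\phi)|^2,$$

where

$$R_{n\ell}(r) = -\left(\frac{2}{n}\right)^{3/2} e^{-r/n}\left(\frac{2r}{n}\right)^{\ell} L^{2\ell+1}_{n+\ell}\!\left(\frac{2r}{n}\right)$$

with $L^s_k(x)$ the associated Laguerre functions and where

$$Y_{\ell m}(\theta,\phi) = \left[\frac{2\ell+1}{4\pi}\,\frac{(\ell+|m|)!}{(\ell-|m|)!}\right]^{1/2} P_\ell^m(\theta)\,e^{im\phi}$$

are the usual spherical harmonics with $P_\ell^m(\theta)$ the Legendre polynomials.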
The stochastic equations are the same as those of the 3D harmonic oscillator, with the force term changed to $f = -1/r^2$ and the fluctuation term replaced by the one constructed from $\rho_{n\ell m}(r)$ of the present section. We have simulated only the radial levels ($\ell = m = 0$), since their graphics are much easier to handle and they are enough to make our point. The results are shown in Figure 6-19.
Figure 6.19. Radial probability densities in configuration space for the states (1, 0, 0) and (2, 0, 0) of the hydrogen atom. The parameters’ values are also shown.
The simulations for the hydrogen atom turned out to be much more difficult, for we needed to use very small values of $\tau$ and the system needs a greater time window to approach its equilibrium distribution; we needed up to $n_p = 10^4$ points in each trajectory to converge the distribution and $N_{traj} = 3 \times 10^4$ trajectories to compose the ensemble. It is obvious from Figure 6-19 that the same phenomenon related to centrifugal forces is present, and this was the reason to use the smearing out constant $c$. For the hydrogen atom the same constant $c$ used in the centrifugal forces (6.17) is also used in the force field $f(r) = -1/(r^2 + c)$ for the sake of consistency, since the force also presents a divergence point. In fact, the use of the constant $c$ in the hydrogen atom problem is even more justified, since the form of the potential, forces and random structures is such that the inclusion of $c$ usually does not introduce any perceptible difference, as the reader may see from Figure 6-20, in which we show the fluctuation profiles of the states (1, 0, 0) and (2, 0, 0) as compared with their profiles with $c = 0.005$.

It is interesting to note that for the hydrogen atom it is also possible to remove (to a large extent) the problem of infinities, both in the centrifugal and in the Newtonian force, by just changing the coordinate system. In fact, if one uses semi-parabolic coordinates, for instance, the hydrogen equations turn into coupled harmonic oscillator ones and we remove many singularities or at least attenuate them; we also attenuate the singularities in the random structure factor.
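The effect of the smearing constant on the force field $f(r) = -1/(r^2 + c)$ can be quantified with a few lines. This is only an illustrative sketch in the units of the text; the function names are chosen here, and the radii in the demo loop are arbitrary sample points. It addresses the force alone, not the fluctuation profiles of Figure 6-20.

```python
def coulomb_force(r, c=0.0):
    """Radial force f(r) = -1/(r^2 + c); c = 0 gives the bare Coulomb force."""
    return -1.0 / (r * r + c)


def relative_change(r, c):
    """Relative reduction of |f| caused by the smearing constant: c / (r^2 + c)."""
    return c / (r * r + c)


if __name__ == "__main__":
    c = 0.005
    print("force at the origin with c =", c, ":", coulomb_force(0.0, c))
    for r in (0.1, 0.5, 1.0, 2.0, 4.0):  # arbitrary sample radii
        print(r, coulomb_force(r), coulomb_force(r, c), relative_change(r, c))
```

The relative change $c/(r^2+c)$ is below one percent once $r^2$ exceeds roughly $100\,c$, while the force at the origin stays finite at $-1/c$.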
6.7. THE “CLASSICAL” LIMIT

To correctly understand the relations between Quantum Mechanics and Newtonian Mechanics, it is instructive to take a concrete example and keep all the relevant constants ($m$, $\hbar$, $\omega$, etc.) in the calculations.
Figure 6.20. Fluctuation profiles for two states: (1, 0, 0) and (2, 0, 0). The line refers to the exact values (c=0), while the crosses refer to values for (c=0.005).
In this section we use the harmonic oscillator system as our concrete example, since it is easier and faster to make its numerical simulations. Using the probability amplitude of the problem, given in (6.12), we would have the fluctuations given by the expressions shown in Table 10.

Table 10. Kinetic energy fluctuations for various states of the one-dimensional harmonic oscillator

| $n$ | $\delta p(q)^2$ |
|---|---|
| 0 | $\dfrac{m\omega\hbar}{2}$ |
| 1 | $\dfrac{m\omega\hbar}{2} + \dfrac{\hbar^2}{2q^2}$ |
| 2 | $\dfrac{m\omega\hbar}{2} + \dfrac{2m\omega\hbar^2\,(2m\omega q^2 + \hbar)}{(2m\omega q^2 - \hbar)^2}$ |
For example, the Langevin system of equations for the state $n = 1$ is given by

$$\begin{cases}
\dfrac{dp(t)}{dt} = -\gamma p(t) - m\omega^2 q + \sqrt{\dfrac{m\omega\hbar}{2} + \dfrac{\hbar^2}{2q^2}}\;\sqrt{2\gamma}\,\zeta(t)\\[3mm]
\dfrac{dq(t)}{dt} = \dfrac{1}{m}\,p(t)
\end{cases} \qquad (6.18)$$

with an energy given by $E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega$. There are three different ways of looking for a limit:

1. We may keep the energy fixed and vary the mass of the particle. This would imply that we are looking initially at, say, an electron in the presence of a harmonic field with energy $E_n$ of the order of $\left(n + \tfrac{1}{2}\right)\hbar\omega$ and we pass continuously to the physical situation in which one has a bowling ball with a huge mass acted upon by a harmonic field with this same energy. Of course, we then expect our configuration space probability density to tend to a Dirac delta function.

2. We may keep the energy fixed together with the mass of the particle and put $\gamma \to 0$. This approach means that we are removing the effect of the fluctuations (and the underlying dissipation mechanism involved in the fluctuation-dissipation theorem). In this case, the system becomes a Newtonian one and this should be considered the Newtonian Limit.

3. We may now assume that our system has a very high energy and compare the probability density function of this system with the one calculated with $\gamma = 0$ (the Newtonian one).
Case 1

In the first case, it is preferable to write the Langevin equations (6.18) in terms of the velocity, rather than the momentum. Since the energy is being kept fixed, we may still use $\hbar = \omega = 1$, and we end up with the equations ($n = 1$)

$$\begin{cases}
\dfrac{dv(t)}{dt} = -\gamma v(t) - q + \sqrt{\dfrac{1}{2m} + \dfrac{1}{2m^2q^2}}\;\sqrt{2\gamma}\,\zeta(t)\\[3mm]
\dfrac{dq(t)}{dt} = v(t)
\end{cases}$$

and it is quite clear that the limit $m \to \infty$ will turn our equation into a dynamical one without the fluctuation term. From the very expression of the probability density function, one must expect that the limit $m \to \infty$ will concentrate the distribution around the origin, which implies simply that the corpuscle is at rest. Note, in passing, that this process is continuous, as we increase continuously the mass $m$. This is of some relevance to the interpretation of the quantum mechanical formalism, since we know that for high values of $m$ there is no question about the corpuscular character of the system. In Figure 6-21 we show the behavior of the configuration space probability densities for the $n = 0, 1$ states of the harmonic oscillator. As the masses increase, the particle still fluctuates and, for the excited states, it still fluctuates with the same structure profile, obviously. However, in these cases, the interval within which the fluctuations take place is so small that it becomes simply imperceptible (unobservable).

This strategy is, of course, an inversion of the usual statement saying that, as we increase the value of the mass, for instance, the effects of the observer's action on the phenomenon become less visible. The usual perspective assumes the dual character of the particles and uses notions about the influence of the observer to solve the conundrum put forward by the continuous variation of the mass. It goes from the assumed quantum level to the classical one.
In the present analysis we assume the obvious corpuscular character of the particles for greater masses and go to the quantum level by noting that the intrinsic fluctuations play a most prominent role as we decrease the value of the mass. In fact, since no mathematical variable related to the observer appears in the mathematical formulae, it seems rather arbitrary and inappropriate to postulate the effect of an observer to solve the problem introduced by the
continuous variation of the mass. This introduction of the observer is clearly of the type of a Deus ex machina, so problematic for scientific theories.
Figure 6.21. The behavior of the probability density functions in configuration space for the states n=0, 1 for three different values of the mass. The particle tends to be increasingly confined to the position q=0, since the energy is kept constant and the mass is increased.
In the present analysis the fluctuations are intrinsic to the problem, but play different relative roles in connection to the values of the other physical parameters (mass, in the present case). Notice that it is meaningless to put ℏ → 0 in this approach, since this would mean that the energy 𝐸𝑛 would also approach zero, thus vanishing the physical problem altogether.
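The narrowing of the distribution with increasing mass can be reproduced with a very small simulation of the ground state, for which the fluctuation is the constant $\delta p^2 = m\omega\hbar/2$ (so that, in velocity form with $\hbar = \omega = 1$, the noise amplitude is $\sqrt{1/(2m)}\sqrt{2\gamma}$). The sketch below is not the author's code: it assumes an explicit Euler step with $a = 1 - \gamma\tau$, a single long trajectory with time averaging, and a fixed random seed; the function name and all parameter values are choices made here.

```python
import math
import random


def simulate_ground_state(mass, gamma=1.0, tau=0.01, steps=200_000,
                          burn_in=20_000, seed=1):
    """Time-averaged <q^2> for the n = 0 oscillator state, units hbar = omega = 1."""
    rng = random.Random(seed)
    a = 1.0 - gamma * tau                   # assumed damping factor
    kick = math.sqrt(gamma * tau / mass)    # sqrt(1/(2m)) * sqrt(2 * gamma * tau)
    q, v = 0.0, 0.0
    total, count = 0.0, 0
    for i in range(steps):
        # Explicit Euler step: both updates use the values of step n.
        q, v = q + tau * v, a * v - tau * q + kick * rng.gauss(0.0, 1.0)
        if i >= burn_in:
            total += q * q
            count += 1
    return total / count


if __name__ == "__main__":
    for mass in (1.0, 4.0, 16.0):
        print(mass, simulate_ground_state(mass), 1.0 / (2.0 * mass))
```

The last column printed is the quantum mechanical value $\langle q^2\rangle = \hbar/(2m\omega)$, to which the time average should be compared; agreement is only statistical, at the level of a few percent for the trajectory length used here.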
Case 2

For this case, the Langevin equations become, in the limit in which $\gamma = 0$,

$$\begin{cases}
\dfrac{dv(t)}{dt} = -\omega^2 q\\[3mm]
\dfrac{dq(t)}{dt} = v(t)
\end{cases}$$

which are the usual Newton equations. This case thus corresponds to the Newtonian Limit of the formalism, and the Quantum and Newtonian frameworks are distinguished by the fact that $\gamma \neq 0$ in the former, while $\gamma = 0$ in the latter. As is amply known and was proved again in this chapter, the constant $\gamma$ is related to the fluctuation-dissipation theorem, and to put $\gamma = 0$ is to imply that the system has no internal temperature (or intrinsic fluctuations). It is important to notice that making $\gamma = 0$ is not the same as making $\hbar = 0$, since the latter strategy would leave the term $-\gamma p$ in the equations and this term would dissipate all the energy of the Newtonian dynamical system, making the probability distribution approach a Dirac delta function around the origin. What seems to be interesting in this case is the obvious possibility of making $\gamma$ as small as we want, to see the continuous change of a system from a Newtonian one into a Quantum one as $\gamma$ grows. We have made such an exercise and found the results for $\gamma = 0$ shown in Figure 6-22 for the harmonic and Morse oscillators.
Figure 6.22. The phase-space contours for different values of the energy for the harmonic oscillator problem (upper left) and the Morse oscillator problem (lower left). The “probability densities” related to these movements are shown in the upper right and lower right positions, respectively.
Now we can increase the value of $\gamma$ to find the behavior shown in Figure 6-23 (for the harmonic oscillator only; the Morse oscillator produces similar results). In the last picture in Figure 6-23 we have used less time than would be necessary to converge the distribution. If we had used the necessary time, we would find the profile shown in Figure 6-24, which reproduces the usual quantum mechanical result.
Figure 6.23. The filling of the phase space for the n=0 state of the harmonic oscillator for various values of γ. As γ increases, the fluctuations tend to fill the phase space in a non-homogeneous way, thus giving the final probability density function.
Thus, we can easily see the continuous process of a Newtonian system changing into a quantum system. This should suffice to show as preposterous the usual tendency to postulate an abyss between Newtonian Mechanics and Quantum Mechanics, from the point of view of their interpretation.
Case 3

Case three seems to be the one we usually think of when talking about the “classical limit” in connection with the Correspondence Principle. It says that for large quantum numbers the quantum system would behave, on average, as the Newtonian one. The usual picture of this process is represented for the harmonic oscillator case, with $n = 20$, in Figure 6-25.
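The "on average" statement can be made concrete by comparing coarse-grained probabilities. The sketch below assumes $\hbar = m = \omega = 1$, so that the Newtonian oscillator with energy $n + \tfrac12$ has amplitude $A = \sqrt{2n+1}$ and time-averaged density $1/(\pi\sqrt{A^2 - q^2})$; it then compares the probability of finding the particle in an interval for the quantum state $n = 20$ and for the Newtonian motion. Function names, the interval and the step count are choices made here.

```python
import math


def psi(n, x):
    """Normalized oscillator eigenfunction psi_n(x), units hbar = m = omega = 1."""
    previous = 0.0
    current = math.pi ** -0.25 * math.exp(-0.5 * x * x)
    for k in range(n):
        # Three-term recurrence for the normalized Hermite functions.
        previous, current = current, (
            math.sqrt(2.0 / (k + 1)) * x * current
            - math.sqrt(k / (k + 1)) * previous
        )
    return current


def quantum_probability(n, a, b, steps=4000):
    """Midpoint-rule integral of psi_n(x)^2 over [a, b]."""
    h = (b - a) / steps
    return sum(psi(n, a + (i + 0.5) * h) ** 2 for i in range(steps)) * h


def classical_probability(n, a, b):
    """Fraction of a period the Newtonian oscillator with E = n + 1/2 spends in [a, b]."""
    amplitude = math.sqrt(2 * n + 1)
    lo = max(-1.0, min(1.0, a / amplitude))
    hi = max(-1.0, min(1.0, b / amplitude))
    return (math.asin(hi) - math.asin(lo)) / math.pi


if __name__ == "__main__":
    n = 20
    half = 0.5 * math.sqrt(2 * n + 1)  # inner half of the classically allowed region
    print(quantum_probability(n, -half, half), classical_probability(n, -half, half))
```

The quantum density oscillates rapidly around the Newtonian one, so only interval probabilities (not pointwise values) are expected to agree; the residual difference shrinks as $n$ grows.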
Figure 6.24. Phase space filling for a single system simulation of the harmonic oscillator problem (n=0) for γ=1 and 40,000 points in the trajectory. Red and blue regions mean greater or smaller probability density, respectively.
Figure 6.25. Newtonian limit based upon the ideas of the Correspondence Principle.
In any one of the previous cases, according to the present approach, the name Classical Limit is misleading. The word “classical” refers to the semantic level: a theory can be termed non-classical if it uses constructs that are incompatible with the constructs used in classical theories. Newtonian mechanics is but one example of a classical theory. Electromagnetic theory, relativistic mechanics, fluid theory, thermodynamics and statistical mechanics for systems presenting fluctuations are other obvious examples.
In fact, we are interpreting our results using only the usual classical constructs and the statistical notion of fluctuations—surely, fluctuation is merely a statistical notion, compatible in principle with classical, quantum or whatever physical theory one wants to develop (and also with Economics, Psychology, etc). Therefore, since the constructs we are using are the same used in any classical theory, there is no Classical limit for the obvious reason that all the theory is classical—in the sense that it may be interpreted using only classical constructs, a classical ontology, etc. On the other hand, Newtonian mechanics is not compatible with the notion of fluctuations (in the syntactic apparatus, to begin with) and this is the difference between Quantum and Newtonian mechanics. Since Quantum Mechanics may be taken into Newtonian mechanics by one of the approaches previously mentioned, it is quite obvious that it has a greater physical content and we may look at the latter being an approximate theory with respect to the former—in one of the senses thus mentioned. Thus, one could still keep talking about a Newtonian Limit of Quantum Mechanics. Classical Limit, from the present perspective, is nothing but an oxymoron.
6.8. EXTRINSIC AND INTRINSIC INTERPRETATIONS

Generally, quantum mechanical systems are understood as physical systems in contact with an observer that is responsible for the appearance of the effects coded in the Heisenberg dispersion relations. This kind of perspective is what we call here “extrinsic”, for it needs an external entity (the observer) to explain some features of the formalism. One could argue that the ontological interpretation of Heisenberg’s relations states that the system is objectively dual and that, at least at this point, there is no need to postulate an observer. This is correct, but this ontological interpretation does need the observer to select one of the complementary natures appearing in some quantum mechanical experiment. Thus, we might say, the epistemological interpretation needs the observer at the beginning of its analysis, while the ontological interpretation needs the observer at its end.

In contrast with these subjective approaches, the present work has to be seen as a fully objective assessment of quantum phenomena. There is no role for notions such as “observer” in this approach, nor for companion notions such as “reduction of the wave packet”, complementarity, etc. All the present approach needs is the usual notions of dynamic systems and fluctuations, with the fluctuations coming from the internal behavior of the physical system, and this is why we call it an intrinsic interpretation. The observer neither influences the outcomes of the quantum systems governed by the Schrödinger equation nor selects one of the aspects of some dual nature: it is simply not part of quantum theory. It remains, surely, to explain the origin of the random portion of the stochastic Langevin equations.
However, it is not difficult to understand what could be the source of fluctuations, as we have already noted in a previous chapter. If we think, for instance, of the harmonic oscillator as having the usual quadratic potential coming from electromagnetic forces, we can certainly assume that it is the photons of the electromagnetic field that mediate the movement of the particle that defines what is, after all, oscillating. Therefore, when a photon is transmitted it leaves the corpuscular system with less energy (and can be thought of as a kick the particle receives, changing its momentum and position), while increasing the energy of the field. Whenever the emitted photon is reabsorbed by a corpuscle, energy is again transferred from the field subsystem of the physical system to the corpuscular subsystem and this is the source of its random behavior. In fact, it is amply known that this sort of division of the physical system into a part that will be followed in detail (the moving particle) and another that will be considered “the reservoir” (the fluctuating field) is the very origin of Langevin equations (classical, quantum, whatever).

This physical picture of the process can help us understand the behavior of quantum systems when the energy $E$ used to make the simulations is greater than the real quantum mechanical one $E_n$. In fact, if this is the case, in the exchange of energy between the corpuscular subsystem and the electromagnetic one there would be photons that would not be captured by the corpuscular subsystem, being definitely lost to the electromagnetic subsystem in the form of a radiation field. This is how the energy of the corpuscular subsystem is dissipated by the term $\gamma p$ of the Langevin equations. Once the system attains a stable configuration, this term ceases to dissipate excess energy and energy flows between the corpuscular and electromagnetic subsystems in a conservative way.

It is important to stress that, according to our model, each state representing the physical system has a different random structure, given by the appearance of the functions $\Gamma_p(q)$ by which the fluctuations take place. Thus, it is the particular “fluctuation regime” that gives the particles the kind of structured randomness responsible for the overall wavelike behavior (in time or as an ensemble).
Superposed quantum systems are described, as we have shown with our examples, by functions $\Gamma_p(q)$ that take these superposition situations into account: for each kind of superposition one will find a specific $\Gamma_p(q)$ that can reproduce in corpuscular terms the overall “interfered” quantum mechanical density. Therefore, the Langevin equations we have proposed here assess the behavior of the amplitudes and, indirectly, the densities; this is in agreement with the fact that the phase space probability distributions $F(q, p)$ also assess the amplitudes, as we have shown using superposition states in analytic calculations.

As we have already noticed, it is possible to define a local temperature for these quantum mechanical systems, since the fluctuations introduce momentum disorder and thus heat. If we define the local temperature (over the configuration space) as ($k_B$ is Boltzmann’s constant)

$$k_B T(q) = \delta p(q,t)^2\,\rho(q) = -\frac{\hbar^2}{4m}\,\rho(q)\,\nabla^2\ln\rho(q),$$

then it is an easy task to show, for each particular physical system, that the regions in which the densities vanish are precisely those where the local temperature is higher (where we would not expect to find the particle).

There is no need for this approach to consider within its interpretive system the notion of an observer, and the possibility of getting rid of the observer as a construct of the semantic level greatly reduces the underlying ontology, making the interpretation of quantum mechanics much simpler. To get a quick impression of this issue one has only to take a look at the literature about the role of the observer in quantum mechanics [90].
6.9. FURTHER DEVELOPMENTS

The previous developments imply two types of conclusions: one regarding the formal structure just presented and its possible improvements, other applications, etc., and another regarding the interpretation of quantum phenomena. Let us begin by assessing the formal elements.

The present approach was not meant to show that the formal structure of Quantum Mechanics is obsolete, since we have used it to derive our fluctuating force. It was meant to show that the usual results of Quantum Mechanics can be derived by other means. This, however, may help us solve and understand problems in which the dynamic features are important. One possible application of the present approach would be in the study of perturbed systems: problems in which we know the exact quantum solution for the unperturbed problem (and thus its fluctuating force) and we introduce, at some instant of time $t$, a perturbation. If the perturbation is introduced slowly enough, the present formalism may be useful to find the solution of the problem. Indeed, even if the forces are non-conservative (dissipative systems), it may be the case that the present analysis would also be helpful.

Another striking feature is that, as long as we can reproduce the probability distribution of quantum systems by only imposing some structured randomness, there is no impediment to constructing macroscopic setups in which, by controlling the fluctuations to make them reproduce the quantum ones, we may simulate quantum behavior. There would be a scaling factor involved (related to $\hbar$), but a careful reading of this chapter should convince anyone of this possibility. In fact, this should be enough to sustain the points made in the last section about the interpretation, since a dual interpretation in such cases would be ludicrous.
This chapter, however, presented only a very crude development of the simulation techniques related to Langevin equations in different coordinate systems and with structured randomness. We hope that others, more acquainted with this field, will be capable of pushing all these developments much further.
PART II. NEW PERSPECTIVES
Chapter 7
CLASSICAL REPRESENTATION OF THE SPIN

The existence of spin has been considered a major example of the conceptual abyss between Quantum Mechanics and Classical Physics. The usual argument is that one cannot find a phase-space representation of the quantum mechanical operators that would model half-integral spin. This is why spin is always presented in textbooks using the matrix representation and never by using the Schrödinger equation. Of course, if a phase-space representation were available, then the application of a quantization process would imply a Schrödinger equation with the underlying probability amplitudes¹.

Since we are sustaining throughout this book that there is no such abyss, it is our burden to show not only that half-integral spin does have a classical analog, but also that, once a phase-space representation (a model) of it has been found, one can find a Schrödinger equation describing the phenomenon whose solution gives us the functional representation of any half-integral spin, depending on the values of the quantum numbers $S$ and $m$ (to be defined).

We begin this chapter by presenting a very easy classical problem that gives us the key to understanding half-integral spins. From this classical problem we propose a model for the structure of the half-integral spin particle that allows us to derive all the known mathematical relations regarding the spin (using Dirac commutators on the Quantum Mechanical “side”, and Poisson brackets on the Classical Mechanical “side”). The Schrödinger equation is then obtained and solved to find the probability amplitudes (eigenfunctions) of half-integral spin particles.

The operators constructed to address the problem, together with the probability amplitudes just obtained, will help us dissolve the conundrum related to rotations by $4\pi$ that appears in such situations. These rotations are also used as an argument to propose a conceptual abyss between Quantum and Classical Physics.

Other characteristics of half-integral spin particles are also addressed, such as the mathematical derivation of the Pauli equation, for the sake of completeness. We then use the method of the characteristic function, now extended to include the notion of spinors.
¹ Part of the results of the present chapter was published in L. S. F. Olavo, A. D. Figueiredo, "The Schrödinger eigenfunctions for the half-integral spins", Physica A 262, 181 (1999).
Finally, the Stern-Gerlach experiment is discussed in great detail, something that is never done in textbooks about Quantum Mechanics, and we show the relations between the Classical (Newtonian) solution and the one obtained from the solution of Pauli’s equation.
7.1. THE CLASSICAL BIDIMENSIONAL OSCILLATOR

The classical bidimensional isotropic harmonic oscillator is a well-known and easy problem of Classical Mechanics. We begin with the classical Hamiltonian

$$H = \frac{1}{2m}\left(p_x^2 + m^2\omega^2x^2\right) + \frac{1}{2m}\left(p_y^2 + m^2\omega^2y^2\right)$$

that allows us to find the three constants of motion of the problem

$$\begin{aligned}
S_1 &= \frac{1}{2m\omega}\left(p_xp_y + m^2\omega^2xy\right)\\
S_2 &= \frac{1}{4m\omega}\left[p_y^2 - p_x^2 + m^2\omega^2\left(y^2 - x^2\right)\right]\\
S_3 &= \frac{1}{2}\left(xp_y - yp_x\right)
\end{aligned} \qquad (7.1)$$

which, together with the total energy $H$, form four algebraic constants of motion (see [34], pp. 416ff). It is also easy to show, by direct evaluation, that

$$S_1^2 + S_2^2 + S_3^2 = \frac{H^2}{4\omega^2}$$

or $H = 2\omega S$, if $S^2 = S_1^2 + S_2^2 + S_3^2$, and also

$$\{S_i, S_j\} = \varepsilon_{ijk}S_k, \qquad (7.2)$$

where $\varepsilon_{ijk}$ is the totally antisymmetric tensor and $\{*,*\}$ is the Poisson bracket. The important point here is that

“the group of transformations generated by $S_i$ may therefore be identified with $R(3)$ or $SO(3)$. Actually, there is some ambiguity in the identification. There is a homomorphism (in this case, a 2 to 1 mapping) between the orthogonal unimodular group $SO(3)$ also called the rotation group $R(3)$ in three dimensions and the unitary unimodular group $SU(2)$ in two dimensions. It turns out that $SU(2)$ is here more appropriate.” ([34], pp. 417-418).
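The "direct evaluation" of the quadratic identity and of the brackets (7.2) can also be done numerically at an arbitrary phase-space point. The sketch below is an illustration only: the values of `M` and `W`, the sample point and the helper `poisson_bracket` are chosen here, and the derivatives are taken by central differences (exact, up to rounding, for the quadratic functions involved).

```python
M, W = 1.3, 0.7  # hypothetical mass and frequency, chosen only for the check


def S1(x, y, px, py):
    return (px * py + M * M * W * W * x * y) / (2.0 * M * W)


def S2(x, y, px, py):
    return (py * py - px * px + M * M * W * W * (y * y - x * x)) / (4.0 * M * W)


def S3(x, y, px, py):
    return 0.5 * (x * py - y * px)


def H(x, y, px, py):
    return (px * px + py * py + M * M * W * W * (x * x + y * y)) / (2.0 * M)


def poisson_bracket(f, g, point, h=1e-3):
    """{f, g} = sum over (q, p) pairs of df/dq dg/dp - df/dp dg/dq, at point = (x, y, px, py)."""
    def partial(func, index):
        up, down = list(point), list(point)
        up[index] += h
        down[index] -= h
        return (func(*up) - func(*down)) / (2.0 * h)

    total = 0.0
    for qi, pi in ((0, 2), (1, 3)):  # (x, px) and (y, py)
        total += partial(f, qi) * partial(g, pi) - partial(f, pi) * partial(g, qi)
    return total


if __name__ == "__main__":
    point = (0.3, -1.1, 0.8, 0.5)
    print(poisson_bracket(S1, S2, point), S3(*point))
    print(poisson_bracket(S2, S3, point), S1(*point))
    print(poisson_bracket(S3, S1, point), S2(*point))
    casimir = S1(*point) ** 2 + S2(*point) ** 2 + S3(*point) ** 2
    print(casimir, H(*point) ** 2 / (4.0 * W * W))
```

Each printed pair should agree to rounding error; changing `M`, `W` or the sample point does not affect the agreement, which is the content of (7.2).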
In fact, we all know that relations (7.2) are those responsible (in Quantum Mechanics, and using Dirac commutators) for the overall mathematical properties of the half-integral spin particles. In fact, the complete phase-space representation of the group $SU(3)$ is given by the functions (see [70], pp. 211-212)

$$
\begin{gathered}
L_x = y p_z - z p_y, \qquad Q_{xy} = \alpha xy + \beta p_x p_y,\\
L_y = z p_x - x p_z, \qquad Q_{yz} = \alpha yz + \beta p_y p_z,\\
L_z = x p_y - y p_x, \qquad Q_{zx} = \alpha zx + \beta p_z p_x,\\
Q_0 = \frac{\alpha}{2\sqrt{3}}\left(x^2 + y^2 - 2z^2\right) + \frac{\beta}{2\sqrt{3}}\left(p_x^2 + p_y^2 - 2p_z^2\right),\\
Q_1 = \frac{\alpha}{2}\left(x^2 - y^2\right) + \frac{\beta}{2}\left(p_x^2 - p_y^2\right).
\end{gathered}
\tag{7.3}
$$

Thus, it should by now be apparent to the less prejudiced reader that the whole issue is simply a matter of symmetry properties, having nothing to do with the distinction between Classical and Quantum Mechanics. Taking this as our point of departure, our model for the half-integral spin particle is that of a particle (which we may call the bare-particle) rotating in two dimensions under a central isotropic two-dimensional harmonic force; the half-integral spin particle is thus the bare-particle plus this rotational behavior (see footnote 2). We now turn our attention to the three $S_i$ defined above. Of course, as there are many different half-integral spin particles, the constants $m$, $\omega$ or $\alpha$, $\beta$, known as structure constants, enter just to select which specific particle we are talking about. In what follows we use the definitions

$$
L_z = x p_y - y p_x, \qquad Q_{xy} = \sqrt{\frac{\alpha}{\beta}}\, xy + \sqrt{\frac{\beta}{\alpha}}\, p_x p_y, \qquad Q_1 = \frac{1}{2}\left[\sqrt{\frac{\alpha}{\beta}}\left(x^2 - y^2\right) + \sqrt{\frac{\beta}{\alpha}}\left(p_x^2 - p_y^2\right)\right]
$$

and put

$$
S_1 = \frac{1}{2} Q_1, \qquad S_2 = \frac{1}{2} Q_{xy}, \qquad S_3 = \frac{1}{2} L_z, \tag{7.4}
$$
such that we still have $\{S_i, S_j\} = \varepsilon_{ijk} S_k$. It is obvious from the previous developments that the functions

$$
S_0 = \frac{1}{2}\left[\sqrt{\frac{\alpha}{\beta}}\left(x^2 + y^2\right) + \sqrt{\frac{\beta}{\alpha}}\left(p_x^2 + p_y^2\right)\right]
$$

and

$$
S^2 = S_1^2 + S_2^2 + S_3^2 = \frac{1}{4} S_0^2 \tag{7.5}
$$

commute with all the $S_i$, $i = 1, 2, 3$.

Footnote 2: Other material models would be compatible with the formal elements of the model. Indeed, it would suffice to think of the particle as being flat like a coin and subject to deformations in the plane, governed by the second-order tensors $S_1$ and $S_2$, or even as having the form of an ellipsoid of revolution.

Our half-integral spin particle is modelled by a charged bare-particle rotating on a plane, forming a current loop. This particle is thus able to interact with a magnetic field through the Hamiltonian (SI units)

$$
H_{int} = -\frac{e}{2m}\, \vec{B} \cdot \vec{L},
$$

where $\vec{B}$ is the magnetic field and $\vec{L}$ is the angular momentum related to the particle's rotation. In terms of the functions $S_i$ this interaction Hamiltonian becomes

$$
H = -(2\omega_0) S_3 = -(g\omega_0) S_3 = -\omega_1 S_3,
$$

where we have put the magnetic field in the $z$-direction fixed by $L_z$ (this will be important later on) and also put

$$
\omega_0 = \frac{eB}{2m},
$$
with $g = 2$ for the Landé factor. The equations of motion for the particle interacting with an external magnetic field $\vec{B}$ are easily obtained from the Poisson bracket and are given by

$$
\frac{dS_1}{dt} = \{S_1, H\} = \omega_1 S_2, \qquad \frac{dS_2}{dt} = -\omega_1 S_1, \qquad \frac{dS_3}{dt} = 0, \tag{7.6}
$$
with the solutions

$$
S_1(t) = S_1(0)\cos(\omega_1 t) + S_2(0)\sin(\omega_1 t), \qquad S_2(t) = -S_1(0)\sin(\omega_1 t) + S_2(0)\cos(\omega_1 t),
$$

representing a precession taking place in the space defined by the functions $S_i$. This precession represents the motion of the bare-particle around the force center, changing the eccentricity of its orbit (through the $S_2$ term), that is, the deformation of the half-integral spin particle. Note that $S_3$ is a pseudo-vector (a second-order antisymmetric tensor), while $S_1$ and $S_2$ are components of a symmetric tensor. This means that a rotation in the $xy$-plane by $2\pi$ would rotate these functions by only $\pi$ (we return to this point in greater detail in a later section). Equations (7.6) can be written as

$$
\frac{dS_i}{dt} = \{S_i, H\} = -\frac{geB}{2m}\, \varepsilon_{i3k} S_k
$$

or

$$
\frac{d\vec{S}}{dt} = \frac{ge}{2m}\, \vec{S} \times \vec{B}, \tag{7.7}
$$

from which we can write the dipole moment of the particle as

$$
\vec{m}_S = \frac{ge}{2m}\, \vec{S},
$$
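with the correct Landé factor (note that the Landé factor is not a characteristic of relativistic approaches; as with all other features of the half-integral spin problem, it is just the outcome of symmetry properties). To show that this classical picture parallels the quantum mechanical one, we also introduce the functions

$$
S_+ = S_1 + iS_2, \qquad S_- = S_1 - iS_2, \tag{7.8}
$$

that can be explicitly written as

$$
\begin{aligned}
S_+ &= \frac{1}{4}\left[\sqrt{\frac{\alpha}{\beta}}\,(x + iy)^2 + \sqrt{\frac{\beta}{\alpha}}\,(p_x + ip_y)^2\right],\\
S_- &= \frac{1}{4}\left[\sqrt{\frac{\alpha}{\beta}}\,(x - iy)^2 + \sqrt{\frac{\beta}{\alpha}}\,(p_x - ip_y)^2\right],
\end{aligned}
\tag{7.9}
$$

and perform the transformation given by

$$
\begin{aligned}
x_1 &= \frac{(\alpha/\beta)^{1/4} x + i(\beta/\alpha)^{1/4} p_x}{\sqrt{2}}, & x_2 &= \frac{(\alpha/\beta)^{1/4} y + i(\beta/\alpha)^{1/4} p_y}{\sqrt{2}},\\
p_1 &= \frac{(\alpha/\beta)^{1/4} x - i(\beta/\alpha)^{1/4} p_x}{\sqrt{2}}, & p_2 &= \frac{(\alpha/\beta)^{1/4} y - i(\beta/\alpha)^{1/4} p_y}{\sqrt{2}},
\end{aligned}
\tag{7.10}
$$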
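such that the new $S_i$'s become

$$
\begin{gathered}
S_1' = \frac{1}{2}(x_1 p_1 - x_2 p_2) = \frac{1}{2}\sigma_1, \qquad S_2' = \frac{1}{2}(x_1 p_2 + x_2 p_1) = \frac{1}{2}\sigma_2,\\
S_3' = \frac{1}{2i}(x_2 p_1 - x_1 p_2) = \frac{1}{2}\sigma_3, \qquad S_0' = x_1 p_1 + x_2 p_2, \qquad S'^2 = \frac{1}{4} S_0'^2.
\end{gathered}
\tag{7.11}
$$

For the new functions $\sigma_i$ we have

$$
\{\sigma_i, \sigma_j\} = 2i\,\varepsilon_{ijk}\,\sigma_k. \tag{7.12}
$$

We may also define the Poisson anti-bracket by the expression

$$
\{f, g\}_A = \sum_{k=1}^{3}\left(\frac{\partial f}{\partial q_k}\frac{\partial g}{\partial p_k} + \frac{\partial f}{\partial p_k}\frac{\partial g}{\partial q_k}\right) \tag{7.13}
$$

and

$$
S_\pm' = \frac{1}{2}(\sigma_1 \pm i\sigma_2) = S_1' \pm iS_2'. \tag{7.14}
$$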
With all these definitions we find the following relations among the various functions with respect to the Poisson bracket and anti-bracket:

$$
\begin{gathered}
\{S_+', S_-'\} = 2S_3', \qquad \{S_+', S_-'\}_A = S_0',\\
\{\sigma_i, \sigma_j\}_A = 2S_0'\,\delta_{ij}, \qquad \{S_+', S_+'\} = \{S_-', S_-'\} = 0,
\end{gathered}
\tag{7.15}
$$

and we also note that $\{S_0', \sigma_i\} = 0$, $\{S_0', \sigma_i\}_A = 2\sigma_i$, $\{S_0', S_0'\} = 2S_0'^2$. We can also define the "number-function" as

$$
N = S_+' S_-' = S_1'^2 + S_2'^2,
$$

which can be seen to be, in terms of the old variables $x, y, p_x, p_y$,

$$
N = S^2 - S_3^2 = -\frac{1}{4} L_z^2 + \frac{1}{4}\left[\frac{1}{2}\sqrt{\frac{\alpha}{\beta}}\left(x^2 + y^2\right) + \frac{1}{2}\sqrt{\frac{\beta}{\alpha}}\left(p_x^2 + p_y^2\right)\right]^2,
$$

the Casimir function in this coordinate-momentum representation of the Lie group $SU(3)$ with $z = p_z = 0$ (since the particle is flat).
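The bracket relations used in this section can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the Poisson bracket by central finite differences (exact up to rounding for these quadratic functions) and tests $\{S_i, S_j\} = \varepsilon_{ijk} S_k$, $\{S_0, S_i\} = 0$ and $S^2 = S_0^2/4$ at a sample phase-space point. The values of `ALPHA` and `BETA`, the sample point, and all function names are illustrative assumptions.

```python
import math

# Illustrative structure constants (assumed values; any positive numbers should do).
ALPHA, BETA = 2.0, 0.5
A = math.sqrt(ALPHA / BETA)  # multiplies the coordinate terms
B = math.sqrt(BETA / ALPHA)  # multiplies the momentum terms


def S1(x, y, px, py):
    return 0.25 * (A * (x * x - y * y) + B * (px * px - py * py))


def S2(x, y, px, py):
    return 0.5 * (A * x * y + B * px * py)


def S3(x, y, px, py):
    return 0.5 * (x * py - y * px)


def S0(x, y, px, py):
    return 0.5 * (A * (x * x + y * y) + B * (px * px + py * py))


def gradient(f, point, h=1e-4):
    """Central-difference gradient with respect to (x, y, px, py)."""
    grad = []
    for k in range(4):
        up, down = list(point), list(point)
        up[k] += h
        down[k] -= h
        grad.append((f(*up) - f(*down)) / (2.0 * h))
    return grad


def poisson(f, g, point):
    """{f, g} = sum_k (df/dq_k * dg/dp_k - df/dp_k * dg/dq_k)."""
    fx, fy, fpx, fpy = gradient(f, point)
    gx, gy, gpx, gpy = gradient(g, point)
    return fx * gpx - fpx * gx + fy * gpy - fpy * gy


def algebra_residuals(point):
    """Largest deviation from the relations quoted in the lead-in."""
    s1, s2, s3, s0 = (f(*point) for f in (S1, S2, S3, S0))
    residuals = [
        poisson(S1, S2, point) - s3,
        poisson(S2, S3, point) - s1,
        poisson(S3, S1, point) - s2,
        poisson(S0, S1, point),
        poisson(S0, S2, point),
        poisson(S0, S3, point),
        s1 * s1 + s2 * s2 + s3 * s3 - 0.25 * s0 * s0,
    ]
    return max(abs(r) for r in residuals)


print(algebra_residuals((0.7, -1.3, 0.4, 2.1)))
```

The printed residual should be at the level of floating-point rounding; a symbolic check would be equally valid, and finite differences are used here only to keep the sketch free of external dependencies.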
7.1.1. The Active View: Operators

Until now we have used the passive approach, in which we view the particle as a body with some intrinsic features moving on a fixed background or coordinate system. Within this view, the movements of the particle (rotations, distortions, etc.) are represented by functions. The active view, however, is the one in which the particle is considered fixed and generating symmetry operations upon the space itself (e.g. $L_z$ is the generator of rotations around the $z$-axis). Since we already know that our three functions $S_i$ are the phase-space representation of the generators of the $SU(2)$ symmetry group, the operators related to them in the matrix representation are immediately obtained (for $\tfrac{1}{2}$-spin; see footnote 3):

$$
\hat{\sigma}_1 = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad \hat{\sigma}_2 = \begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix}, \qquad \hat{\sigma}_3 = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}, \tag{7.16}
$$

and our $S$-functions must be proportional to them and thus, in this matrix representation, they must be written as

$$
\hat{S}_1 = \frac{h}{2}\hat{\sigma}_1, \qquad \hat{S}_2 = \frac{h}{2}\hat{\sigma}_2, \qquad \hat{S}_3 = \frac{h}{2}\hat{\sigma}_3, \tag{7.17}
$$

Footnote 3: When we choose a matrix representation we must fix the dimension of the matrix, and thus we must fix whether the spin is $\tfrac{1}{2}$ or any other half-integral value. For our present concerns, the use of a $\tfrac{1}{2}$-spin matrix representation is sufficient.
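where $h$ is an as yet undetermined constant that would be obtained from experiments (e.g. the Stern-Gerlach experiment) and is indeed known to be Planck's constant $\hbar$, which we use hereafter. With these definitions, it is quite simple to show that we must have

$$
[\hat{S}_i, \hat{S}_j] = i\hbar\,\varepsilon_{ijk}\,\hat{S}_k, \tag{7.18}
$$

and the interaction Hamiltonian becomes (SI units)

$$
H = -\frac{ge\hbar}{2m}\, B_3\, \frac{\hat{\sigma}_3}{2},
$$

and

$$
\hat{S}_+ = \hbar\begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}, \qquad \hat{S}_- = \hbar\begin{pmatrix} 0 & 0\\ 1 & 0 \end{pmatrix}, \tag{7.19}
$$

from which we may find the following relations

$$
\begin{gathered}
[\hat{S}_+, \hat{S}_-] = 2\hbar\hat{S}_3, \qquad [\hat{S}_+, \hat{S}_-]_A = \hbar^2\,\mathbf{1},\\
[\hat{\sigma}_i, \hat{\sigma}_j]_A = 2\delta_{ij}, \qquad [\hat{S}_+, \hat{S}_+] = [\hat{S}_-, \hat{S}_-] = 0,
\end{gathered}
\tag{7.20}
$$

which can be compared with (7.15). Whenever $\hat{S}_0'$ (the operator related to the function $S_0'$) can be equated to $\mathbf{1}$, the equivalence is exact. The fact that $\hat{S}_0'$ can be equated to $\mathbf{1}$ comes from the choice made above to study the $\tfrac{1}{2}$-spin case.

In fact, we can define our basis vectors as

$$
|0\rangle = \begin{pmatrix} 0\\ 1 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 1\\ 0 \end{pmatrix},
$$

chosen to make the representation of $\hat{S}_3$ and $\hat{S}^2$ diagonal. We thus have the usual relations

$$
\hat{S}_+'|0\rangle = \hbar|1\rangle, \qquad \hat{S}_+'|1\rangle = 0, \qquad \hat{S}_-'|0\rangle = 0, \qquad \hat{S}_-'|1\rangle = \hbar|0\rangle,
$$

and the number operator is defined as

$$
\hat{N} = \hat{S}_+'\hat{S}_-' = \hbar^2\begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix},
$$

for which $\hat{N}|0\rangle = 0$ and $\hat{N}|1\rangle = \hbar^2|1\rangle$. Thus, it is undeniable that the notion of a spin can be phrased within a classical framework.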
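The matrix relations (7.18) to (7.20) can be verified directly with $2 \times 2$ complex matrices. The sketch below is only an illustration under the assumption $\hbar = 1$ (the constant `HBAR`); the helper names (`matmul`, `commutator`, `anticommutator`, `check_su2`) are hypothetical and introduced solely for this check.

```python
HBAR = 1.0  # assumed units with hbar = 1


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(a, b, sign=1):
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def commutator(a, b):
    return add(matmul(a, b), matmul(b, a), sign=-1)


def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


IDENTITY = [[1, 0], [0, 1]]
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
S = [scale(HBAR / 2, s) for s in SIGMA]
S_PLUS = add(S[0], scale(1j, S[1]))            # S_1 + i S_2
S_MINUS = add(S[0], scale(1j, S[1]), sign=-1)  # S_1 - i S_2
NUMBER = matmul(S_PLUS, S_MINUS)


def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) / 2


def check_su2():
    """True if the commutators and anticommutators agree with (7.18)-(7.20)."""
    ok = True
    for i in range(3):
        for j in range(3):
            expected = [[0, 0], [0, 0]]
            for k in range(3):
                expected = add(expected, scale(1j * HBAR * levi_civita(i, j, k), S[k]))
            ok = ok and close(commutator(S[i], S[j]), expected)
            delta = 2 if i == j else 0
            ok = ok and close(anticommutator(SIGMA[i], SIGMA[j]), scale(delta, IDENTITY))
    ok = ok and close(commutator(S_PLUS, S_MINUS), scale(2 * HBAR, S[2]))
    ok = ok and close(anticommutator(S_PLUS, S_MINUS), scale(HBAR ** 2, IDENTITY))
    return ok


print(check_su2(), NUMBER)
```

Plain nested lists are used instead of a linear-algebra library so that the sketch stays self-contained; for larger spin representations a matrix library would be the more natural choice.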
Usually, one mentions that the "odd" behavior of the spin operators/functions, which transform under rotations by an angle $\Omega$ as if rotating by $\Omega/2$, would be the clue to their "nonclassical" origin. By no means! This comes from the fact that rotations around the $z$-axis imply rotations of the symmetric tensor functions $S_1$, $S_2$ and, as we all know, second-order symmetric tensors transform under rotations of the axes with the angle doubled with respect to vectors, which is the key to understanding such behavior (we show that in a later section). The advantage of giving no importance to this "quantum-classical abyss" argument comes not only from the kind of understanding that one may develop about the phenomenon. This sort of attitude also allows us to find the functional representation (in terms of probability amplitudes) of the half-integral spin eigenfunctions.
7.2. THE SPIN EIGENFUNCTIONS

Now that we have found a representation of the half-integral spins in terms of phase-space functions, it becomes possible to proceed with the quantization of these functions to derive a Schrödinger equation that gives us the half-integral spin eigenfunctions. The three functions $S_0$, $S^2$ and $S_3$ are the natural candidates to be quantized (to form the Complete Set of Commuting Operators). However, the expression of $S^2$ in terms of the variables $x, y, p_x, p_y$ is given by

$$
S^2 = \frac{1}{16}\left[\frac{\alpha}{\beta}\left(x^2 + y^2\right)^2 + \frac{\beta}{\alpha}\left(p_x^2 + p_y^2\right)^2 + 2\left(x^2 + y^2\right)\left(p_x^2 + p_y^2\right)\right],
$$

which, if we apply the usual quantization procedure of substituting $p_x$ and $p_y$ by $-i\hbar\,\partial_x$ and $-i\hbar\,\partial_y$, would lead us to a partial differential equation of fourth order. Fortunately, we have the relation (7.5), in which

$$
S_0 = \frac{1}{2}\left[\sqrt{\frac{\alpha}{\beta}}\left(x^2 + y^2\right) + \sqrt{\frac{\beta}{\alpha}}\left(p_x^2 + p_y^2\right)\right],
$$

which, upon quantization, gives a second-order partial differential equation. Quantization of this last function is quite simple (note that this is the Hamiltonian of the two-dimensional isotropic oscillator) and we have the underlying Schrödinger equation

$$
\frac{1}{2}\left[-\hbar^2\sqrt{\frac{\beta}{\alpha}}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right) + \sqrt{\frac{\alpha}{\beta}}\left(x^2 + y^2\right)\right]\psi(x, y) = \hbar\lambda\,\psi(x, y), \tag{7.21}
$$

corresponding to

$$
\hat{S}_0\psi = \hbar\lambda\psi. \tag{7.22}
$$

If we introduce polar coordinates

$$
x = r\cos\theta, \qquad y = r\sin\theta
$$

and use

$$
u = \left(\frac{\alpha}{\beta\hbar^2}\right)^{1/4} r,
$$
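we find the equation

$$
-\frac{1}{u}\frac{\partial}{\partial u}\left(u\frac{\partial\psi}{\partial u}\right) + \left(u^2 - 2\lambda - \frac{1}{u^2}\frac{\partial^2}{\partial\theta^2}\right)\psi = 0.
$$

We also have $\hat{S}_3\psi = \hbar m\psi$ and, since

$$
\hat{S}_3\psi = -\frac{i\hbar}{2}\frac{\partial\psi}{\partial\theta} = \hbar m\psi,
$$

we can see that we must have $\psi(u, \theta) = e^{2im\theta} R(u)$, where $m$ is a half-integral number. This result turns the equation in $u$ and $\theta$ above into

$$
\frac{1}{u}\frac{\partial}{\partial u}\left(u\frac{\partial R}{\partial u}\right) + \left(2\lambda - u^2 - \frac{4m^2}{u^2}\right) R = 0.
$$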
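Substitution of $R(u) = u^{2|m|} e^{-u^2/2} g(u)$ into the differential equation gives

$$
\frac{d^2 g}{du^2} + \left(\frac{1 + 4|m|}{u} - 2u\right)\frac{dg}{du} + \left[2\lambda - 2(1 + 2|m|)\right] g = 0,
$$

and if we put $v = u^2$ we find

$$
v\frac{d^2 g}{dv^2} + \left(2|m| + 1 - v\right)\frac{dg}{dv} + \left[\frac{\lambda - 1}{2} - |m|\right] g = 0.
$$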
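The last result implies that we must have

$$
\frac{\lambda_N - 1}{2} - |m| = N \quad\text{or}\quad 2N = \lambda_N - 1 - 2|m|. \tag{7.23}
$$

Note that, since $N \geq 0$, we must have

$$
|m| \leq \frac{\lambda_N - 1}{2}. \tag{7.24}
$$

Note also that, since $m$ is a half-integral number, $\lambda_N$ must be an even number such that $N$ is an integral number. The solution of the differential equation in $v$ is well known and is given by the associated Laguerre polynomials,

$$
g(v) = L_N^{2|m|}(v),
$$

such that our complete unnormalized solution becomes

$$
\psi_{N,m}(r, \theta) = \left(\frac{\alpha}{\beta\hbar^2}\right)^{|m|/2} e^{2im\theta}\, r^{2|m|} \exp\left(-\frac{1}{2}\sqrt{\frac{\alpha}{\beta\hbar^2}}\, r^2\right) L_N^{2|m|}\left(\sqrt{\frac{\alpha}{\beta\hbar^2}}\, r^2\right).
$$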
It remains for us to find the relation between the quantum operators $\hat{S}^2$ and $\hat{S}_0$, which cannot be simply $\hat{S}^2 = \hat{S}_0^2/4$ because of the non-commuting terms (in chapter eight we address the construction of operators in Quantum Mechanics). It is easy to find that

$$
\hat{S}^2 = \frac{1}{4}\left(\hat{S}_0^2 - \hbar^2\right),
$$

such that, if (7.22) is satisfied, then

$$
\hat{S}^2\psi = \hbar^2\left(\frac{\lambda_N^2 - 1}{4}\right)\psi
$$

obtains. If we write the last equation using the traditional quantum number $S$ for total angular momentum, $\hat{S}^2\psi = \hbar^2 S(S + 1)\psi$, we have

$$
S = \frac{\lambda_N - 1}{2},
$$

meaning, together with (7.24), that $|m| \leq S$, as usual. In terms of the quantum numbers $S$ and $m$ our solution becomes, since $N = S - |m|$, the expression

$$
\psi_{S,m}(r, \theta) = \left(\frac{\alpha}{\beta\hbar^2}\right)^{|m|/2} e^{2im\theta}\, r^{2|m|} \exp\left(-\frac{1}{2}\sqrt{\frac{\alpha}{\beta\hbar^2}}\, r^2\right) L_{S-|m|}^{2|m|}\left(\sqrt{\frac{\alpha}{\beta\hbar^2}}\, r^2\right),
$$

which is the final solution for any half-integral spin particle. Since $N$ must be integral and $m$ is half-integral, $S$ must be half-integral. We thus end with the results shown in Table 11.
Table 11. Values of λ, S, |m| and N for various half-integral spins

λ     S      |m|    N
2     1/2    1/2    0
4     3/2    1/2    1
4     3/2    3/2    0
6     5/2    1/2    2
6     5/2    3/2    1
6     5/2    5/2    0
8     7/2    1/2    3
8     7/2    3/2    2
8     7/2    5/2    1
8     7/2    7/2    0
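The entries of Table 11 follow mechanically from $S = (\lambda - 1)/2$ and $N = S - |m|$. As a minimal sketch (the function names and the cut-off `max_lambda` are assumptions made for illustration), the rows can be enumerated with exact fractions, together with the coefficients of the associated Laguerre polynomial $L_N^{2|m|}(v)$ computed from the standard closed form $c_k = (-1)^k \binom{N + 2|m|}{N - k}/k!$.

```python
from fractions import Fraction
from math import comb, factorial


def spin_table(max_lambda=8):
    """Rows (lambda, S, |m|, N) with S = (lambda - 1)/2 and N = S - |m|."""
    rows = []
    for lam in range(2, max_lambda + 1, 2):
        S = Fraction(lam - 1, 2)
        abs_m = Fraction(1, 2)
        while abs_m <= S:
            rows.append((lam, S, abs_m, int(S - abs_m)))
            abs_m += 1
    return rows


def laguerre_coefficients(n, a):
    """Coefficients c_k of L_n^a(v) = sum_k c_k v^k, for integer a >= 0."""
    return [Fraction((-1) ** k * comb(n + a, n - k), factorial(k)) for k in range(n + 1)]


for lam, S, abs_m, N in spin_table():
    print(lam, S, abs_m, N, laguerre_coefficients(N, int(2 * abs_m)))
```

For instance, `laguerre_coefficients(2, 3)` gives the coefficients of $10 - 5v + v^2/2$, the polynomial that multiplies $r^3 e^{-r^2/2}$ in $\psi_{7/2,3/2}$ when $v = r^2$.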
Some examples of functions would be ($\hbar = \alpha = \beta = 1$)

$$
\begin{aligned}
\psi_{1/2,1/2}(r, \theta) &= e^{i\theta}\, r\, e^{-r^2/2},\\
\psi_{5/2,1/2}(r, \theta) &= e^{i\theta}\, r\, e^{-r^2/2}\left(3 - 3r^2 + r^4/2\right),\\
\psi_{7/2,3/2}(r, \theta) &= e^{3i\theta}\, r^3\, e^{-r^2/2}\left(10 - 5r^2 + r^4/2\right),
\end{aligned}
$$

and the graphical representation of the underlying probability density functions is shown in Figure 7.1. The interpretation of these density functions is simple: the probability density functions are the configuration-space representation of the probability of finding the bare-particle at some distance from the origin. Since the half-integral spin particle is here modelled as the bare-particle plus its movement, the probability density reflects the shape of the half-integral spin particle. Thus, for $\tfrac{1}{2}$-spin particles, the overall structure of the particle is given by

$$
\rho(r) = r^2 e^{-r^2},
$$

with the structure constants $\alpha = \beta = 1$ (they are responsible for fixing the specific particle being studied). This profile means that the particle resembles a current loop, as expected. Note also that the number of concentric current loops representing each particle is given by $S - |m| + 1$.
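The ring count $S - |m| + 1$ can be illustrated numerically by counting the local maxima of the radial density $|\psi_{S,m}|^2$. The following is a rough sketch under the assumption $\hbar = \alpha = \beta = 1$; the grid parameters `r_max` and `steps` and the function names are illustrative choices, and the Laguerre polynomial is evaluated with the standard three-term recurrence.

```python
import math


def laguerre(n, a, v):
    """Generalized Laguerre polynomial L_n^a(v) from the three-term recurrence."""
    if n == 0:
        return 1.0
    previous, current = 1.0, 1.0 + a - v
    for k in range(1, n):
        previous, current = current, ((2 * k + 1 + a - v) * current - (k + a) * previous) / (k + 1)
    return current


def density(r, two_S, two_m):
    """Unnormalized |psi_{S,m}|^2 for hbar = alpha = beta = 1 (two_S = 2S, two_m = 2|m|)."""
    n = (two_S - two_m) // 2
    return r ** (2 * two_m) * math.exp(-r * r) * laguerre(n, two_m, r * r) ** 2


def count_rings(two_S, two_m, r_max=8.0, steps=8000):
    """Count the local maxima of the radial density on a uniform grid."""
    values = [density(i * r_max / steps, two_S, two_m) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if values[i - 1] < values[i] > values[i + 1])


for two_S, two_m in [(1, 1), (3, 1), (7, 3)]:
    print(two_S, two_m, count_rings(two_S, two_m))
```

Passing $2S$ and $2|m|$ as odd integers avoids fractional arithmetic; the count should agree with $S - |m| + 1$ as long as the grid is fine enough to resolve every ring.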
7.2.1. Representations

With the results of the previous section, we can make the usual connection between the functional representation [in terms of the probability amplitudes $\psi_{S,m}(r, \theta)$] and the matrix representation. This can assure us that we are working with equivalent mathematical and physical structures.

Figure 7.1. Spin densities for various values of the quantum numbers $S$ and $m$. For $S = \tfrac{3}{2}$, $m = \tfrac{1}{2}$, a possible visualizable material representation of the model is also shown.
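We know that the functional representation of the spin operators is given as ($\hbar = \beta = \alpha = 1$)

$$
\begin{aligned}
\sigma_1 &= \frac{1}{2}\left(x^2 - y^2 + p_x^2 - p_y^2\right) \rightarrow \frac{1}{2}\left(x^2 - y^2 + \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial x^2}\right),\\
\sigma_2 &= xy + p_x p_y \rightarrow xy - \frac{\partial^2}{\partial x\,\partial y},\\
\sigma_3 &= x p_y - y p_x \rightarrow i\left(y\frac{\partial}{\partial x} - x\frac{\partial}{\partial y}\right),
\end{aligned}
\tag{7.25}
$$

with the operator representation given in terms of the variables $x, y$. If we use them and calculate the matrices

$$
[\hat{\sigma}_i] = \begin{pmatrix}
\langle \tfrac{1}{2}, \tfrac{1}{2}|\hat{\sigma}_i|\tfrac{1}{2}, \tfrac{1}{2}\rangle & \langle \tfrac{1}{2}, \tfrac{1}{2}|\hat{\sigma}_i|\tfrac{1}{2}, -\tfrac{1}{2}\rangle\\
\langle \tfrac{1}{2}, -\tfrac{1}{2}|\hat{\sigma}_i|\tfrac{1}{2}, \tfrac{1}{2}\rangle & \langle \tfrac{1}{2}, -\tfrac{1}{2}|\hat{\sigma}_i|\tfrac{1}{2}, -\tfrac{1}{2}\rangle
\end{pmatrix},
$$

we find precisely the results (7.16). The two formalisms are completely equivalent, given some choice of $S$.

Thus, summarizing our results, we may say that, from the calculations of the two-dimensional isotropic oscillator, we have proposed a model for the half-integral spin particles as being composed of a bare-particle moving under the potential $\tfrac{1}{2} m\omega^2 (x^2 + y^2)$. The particle is thus the result of the irreducible bare-particle and its movement. The Hamiltonian of the movement of the bare-particle is given by $S_0$ and, if we look at the generators of $SU(3)$, the only functions that survive in (7.3) after we put $z = p_z = 0$ (because the problem is two-dimensional) are

$$
\begin{gathered}
L_x = 0, \qquad L_y = 0, \qquad L_z = x p_y - y p_x,\\
Q_{xy} = \alpha xy + \beta p_x p_y, \qquad Q_{yz} = 0, \qquad Q_{zx} = 0,\\
Q_0 = \frac{\alpha}{2\sqrt{3}}\left(x^2 + y^2\right) + \frac{\beta}{2\sqrt{3}}\left(p_x^2 + p_y^2\right) = \frac{1}{2\sqrt{3}} S_0,\\
Q_1 = \frac{\alpha}{2}\left(x^2 - y^2\right) + \frac{\beta}{2}\left(p_x^2 - p_y^2\right),
\end{gathered}
$$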
where $L_z$, $Q_{xy}$ and $Q_1$ are the constants of movement and $Q_0$ is a multiple of the Hamiltonian. The functions $L_z$, $Q_{xy}$ and $Q_1$ are the generators of $SU(2)$ when we put (7.4). The quantization may be done in the usual terms, and we see that our complete set of commuting observables is $S_0$, $S_3$, $S^2$; but since $S^2$ is a function of $S_0$, we can work only with $S_0$ and $S_3$. We then solved the problem of diagonalizing $\hat{S}_0$ and $\hat{S}_3$ (and thus $\hat{S}^2$) to find the representation of the movement of the bare-particle as $\psi_{S,m}$. Since the half-integral spin particle consists of the bare-particle plus its movement, the functions $\psi_{S,m}(r, \theta)$ are the representation of the half-integral spin particle; that is, if we multiply the mass of the bare-particle by $\rho_{S,m}(r, \theta)$ we find the way the mass of the half-integral spin particle is distributed in space. Its average radius would be given by

$$
\bar{r}_{S,m} = \int\!\!\int \psi_{S,m}^*(r, \theta)\, r\, \psi_{S,m}(r, \theta)\, r\, dr\, d\theta
$$

and for the $\tfrac{1}{2}$-spin particle it would be given by

$$
\bar{r}_{1/2,1/2} = \frac{3\sqrt{\pi\hbar}}{4}\left(\frac{\alpha}{\beta}\right)^{-1/4};
$$

of course, each particular $\tfrac{1}{2}$-spin particle has different values of $\alpha$, $\beta$ that give its specific characteristics as a particle.
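The closed form for the average radius of the $\tfrac{1}{2}$-spin particle can be cross-checked by direct numerical integration of the normalized density $|\psi_{1/2,1/2}|^2 \propto u^2 e^{-u^2}$, with $u = (\alpha/\beta\hbar^2)^{1/4} r$. This is a minimal sketch using the trapezoidal rule; the function names, the cut-off `u_max` and the number of steps are assumptions chosen for illustration.

```python
import math


def mean_radius(alpha=1.0, beta=1.0, hbar=1.0, u_max=10.0, steps=20000):
    """Numerical <r> for psi_{1/2,1/2}, integrating in the scaled variable u."""
    h = u_max / steps
    numerator = denominator = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        rho = u * u * math.exp(-u * u)            # |psi|^2 up to normalization
        numerator += weight * u * rho * u          # u * rho * (u du)
        denominator += weight * rho * u            # rho * (u du)
    scale = (alpha / (beta * hbar ** 2)) ** 0.25   # u = scale * r
    return (numerator / denominator) / scale


def closed_form(alpha=1.0, beta=1.0, hbar=1.0):
    """The expression quoted in the text for the 1/2-spin average radius."""
    return 3.0 * math.sqrt(math.pi * hbar) / 4.0 * (alpha / beta) ** -0.25


print(mean_radius(), closed_form())
```

The angular integral contributes the same factor $2\pi$ to the numerator and to the normalization, so it cancels and only the radial integrals are needed.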
7.3. ON ROTATIONS BY 4π

One striking feature of the half-integral spin is that it is generally assumed that one needs to rotate the system through an angle of 4π to take the system onto itself, which we will henceforth call the 4π problem. This seems, of course, a very unusual behavior that demands full interpretation. This is one of the features that lend half-integral spins their fame of being intrinsically non-classical. However, those not complacent with the commonplace tenet that "quantum mechanics is that weird. There is nothing we can do about it" will always keep the hope that one could finally explain, in terms that reason can follow, what is going on and what this rotation really means. Note, though, that answers such as "it is because SU(2) is the covering group of SO(3)" simply beg the question, being at best a mathematical answer, not a physical one.

In general, this question about the covering angle for half-integral spins would be very difficult to answer in the realm of Quantum Mechanics, given its highly abstract nature. In the literature on Quantum Mechanics, half-integral spin is not treated in exactly the same manner as the angular momentum operators, for instance. For the angular momentum operators one has the classical functions that give rise to them in the usual process of quantization. However, when it comes to half-integral spin, the lack of underlying classical functions from which one can obtain the quantum mechanical operators demands that one model the spin from a much more abstract perspective, by giving, for example, the properties some half-integral spin particle should present upon rotations. As an example, one needs to postulate that it must behave as a vector upon rotations [70].

However, in our case, we do have a classical model for half-integral spin particles based on the isotropic harmonic oscillator in two dimensions. The fact that there is a classical counterpart for the spin reinforces the problem of knowing what the so-called double coverage of the SU(2) group physically means. Probably, even the most caustic physicist on these matters of interpretation in the realm of Quantum Mechanics will agree that, in the realm of Classical Physics, these interpretation issues are central and cannot be ignored by simply waving the hand and saying that "Classical Physics is that weird, there is nothing we can do about it". On the other hand, all the information gathered in the classical model of the spin may be of great help in the endeavor to understand the apparent conundrum of the 4π problem. In fact, it could be expected that it would be much easier to understand this phenomenon in the realm of the classical model than in the realm of Quantum Mechanics. For our present interests, it is noteworthy that we have a dependence on $\theta$ in $\psi_{S,m}(r, \theta)$ given by $\exp(2im\theta)$, not the usual dependence on $\exp(im'\theta)$.
This can be easily explained by the fact that the functions $S_1$ and $S_2$ are components of a second-order symmetric tensor, not a vector, and a second-order symmetric tensor transforms under rotations of the axes with the angle doubled with respect to vectors (first-order tensors). Remember, however, that $m$ is half-integral, while $m'$ is usually an integer. Thus, in the end, the double angle coming from the tensorial nature of the functions/operators is compensated by the half-integral nature of the quantum number. In this perspective, the wavefunction still comes into itself when we make $\theta \rightarrow \theta + 2\pi$. But shouldn't it be 4π?

A way to understand that is to work out the relations between the classical two-dimensional isotropic harmonic oscillator problem (bhp) and the classical Kepler problem (kp) (already considered in the plane). These relations give us the exact meaning of these issues about rotations in spin space, even if we stay within the classical space, and it will be possible to fully generalize them to the quantum mechanical case. Indeed, since both the kp and the bhp are planar, we may use complex numbers to represent them. We use $Z = X + iY$ to represent kp and $z = x + iy$ to represent bhp (see footnote 4). Thus, Newton's equation of motion for kp can be written as

$$
M\frac{d^2 Z}{dt^2} = -k\frac{Z}{|Z|^3},
$$

where $M$ is the mass and $k$ is the coupling constant related to the inverse-square force. On the other hand, Newton's equation for the bhp can be written as

$$
M\frac{d^2 z}{d\tau^2} = -M\omega^2 z,
$$

where we note that we are using $\tau$ instead of $t$, since to make the two problems identical we must write $t$ as a nonlinear function of $\tau$. Now, the angular momenta for kp and bhp are, respectively,

$$
iL = \frac{M}{2}\left(\bar{Z}\frac{dZ}{dt} - Z\frac{d\bar{Z}}{dt}\right) \quad\text{and}\quad i\ell = \frac{M}{2}\left(\bar{z}\frac{dz}{d\tau} - z\frac{d\bar{z}}{d\tau}\right),
$$

where we note that we used $t$ and $\tau$ for these expressions. We want the two angular momenta to be equal, so that we put

$$
\bar{Z}\frac{dZ}{dt} = \bar{z}\frac{dz}{d\tau}.
$$

Footnote 4: One can find this treatment of the two problems in Fung, M.K., Chin. J. Phys. 50, 713-719 (2012).
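If we assume that the transformation that will take kp into bhp is given by $Z = z^2$, the previous equation means that

$$
2z\bar{z}\,\frac{d}{dt} = \frac{d}{d\tau}.
$$

Thus,

$$
\frac{d^2 Z}{dt^2} = \frac{1}{2z\bar{z}}\frac{d}{d\tau}\left(\frac{1}{2z\bar{z}}\frac{dz^2}{d\tau}\right) = \frac{1}{2z\bar{z}}\frac{d}{d\tau}\left(\frac{1}{\bar{z}}\frac{dz}{d\tau}\right) = -\frac{z^2}{2(z\bar{z})^3}\left[\frac{d\bar{z}}{d\tau}\frac{dz}{d\tau} + \omega^2\bar{z}z\right] = -\frac{E_{bhp}}{M}\frac{Z}{|Z|^3},
$$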
which means that we have the equivalence of the bhp energy $E_{bhp} = \frac{M}{2}\left[\frac{d\bar{z}}{d\tau}\frac{dz}{d\tau} + \omega^2\bar{z}z\right]$ and the parameter $k$ of kp: $E_{bhp} = k$. On the other hand, if we transform the energy of kp,

$$
E_{kp} = \frac{M}{2}\frac{d\bar{Z}}{dt}\frac{dZ}{dt} - \frac{k}{|Z|},
$$

into bhp, we get

$$
E_{kp} = \frac{M}{2}\frac{1}{\bar{z}z}\frac{d\bar{z}}{d\tau}\frac{dz}{d\tau} - \frac{k}{z\bar{z}},
$$

and, using $E_{bhp} = k$,

$$
\frac{M}{2}\frac{d\bar{z}}{d\tau}\frac{dz}{d\tau} - E_{kp}\, z\bar{z} = E_{bhp},
$$

which, compared with the expression for $E_{bhp}$, means that $E_{kp} = -M\omega^2/2$ (the Kepler orbit is bound). These are the full identifications between the two problems: the energy in one of them is taken into the coupling constant of the other. However, the equations for the trajectories are what really interest us. They are trivially obtained for bhp as $z = a\cos(\omega\tau) + ib\sin(\omega\tau)$, and this equation represents an ellipse with semi-major axis of length $a$ and semi-minor axis of length $b$, with the force center at the center of the ellipse. In fact, we can write (we assumed all possible phases equal to zero)

$$
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.
$$
Now it is trivial to get the solution of kp. We just have to put

$$
Z = z^2 = a^2\cos^2(\omega\tau) - b^2\sin^2(\omega\tau) + iab\sin(2\omega\tau) = \frac{a^2 - b^2}{2} + \frac{a^2 + b^2}{2}\cos 2\omega\tau + iab\sin 2\omega\tau.
$$

Although we got the result based on the bhp time $\tau$, and $Z$ must be expressed in terms of $t$, this is not relevant if we are trying to get the equations for the orbits, since they are obtained by the elimination of $t$ or $\tau$. The radius in kp is given by $R = r^2 = [(a^2 + b^2) + (a^2 - b^2)\cos(2\omega\tau)]/2$ and it is easy to show that the orbit is an ellipse with the center of force on one of the foci and with semi-major axis, semi-minor axis and eccentricity given, respectively, by $A = (a^2 + b^2)/2$, $B = ab$ and $\varepsilon = (a^2 - b^2)/(a^2 + b^2)$. It should equally be noted that the points of greatest and lowest distances in kp, $R_{max}$ and $R_{min}$, are separated by an angle of π, while the points of greatest and lowest distances in bhp, $r_{max}$ and $r_{min}$, are separated by an angle of π/2. This should be expected, since the transformation $Z = z^2$ doubles the angles (see what follows).

Now, let us see with greater care what these last results mean in the present case of half-integral spin particles. To do that, let us think about the effect of making transformations of 2π on $Z$. This would mean that we are making transformations of π on $z$, given the relation $Z = z^2$. The problem in the $z$-space is expressed by the tensor operators defining $S_1$ and $S_2$. However, one may wonder why it is necessary to introduce these tensor operators after all. This is demanded by the fact that the kp refers to an ellipse with the force center at one of the foci, $(X, Y) = (f, 0)$, while bhp refers to an ellipse with the force center at the origin, $(x, y) = (0, 0)$. This means that the principal axes in the latter cannot be expressed by vectors, but only by tensors. Thus, to look at a problem related to an ellipse centered at the origin as if it were a problem related to an ellipse centered at one of the foci, one must pass from a description in terms of vectors to a description in terms of tensors when it comes to representing some of the most important elements of the physical system.
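The statement that $Z = z^2$ takes the oscillator ellipse (force center at the center) into a Kepler ellipse (force center at a focus) can be illustrated numerically. The sketch below assumes $a \geq b$ and samples the oscillator orbit at equally spaced phases; the function name and sampling density are illustrative assumptions, and the quantities $A$, $B$ and $\varepsilon$ are those quoted in the text.

```python
import math


def kepler_ellipse_check(a, b, samples=360):
    """Map z = a cos(s) + i b sin(s) through Z = z**2 and test the Kepler ellipse.

    Returns (largest residual of the ellipse equation, eccentricity, R_max, R_min).
    Assumes a >= b, so that the ellipse centre sits at X = +c.
    """
    A = (a * a + b * b) / 2.0      # semi-major axis in Z-space
    B = a * b                      # semi-minor axis in Z-space
    c = math.sqrt(A * A - B * B)   # centre-to-focus distance
    worst = 0.0
    radii = []
    for i in range(samples):
        s = 2.0 * math.pi * i / samples
        Z = complex(a * math.cos(s), b * math.sin(s)) ** 2
        residual = (Z.real - c) ** 2 / A ** 2 + Z.imag ** 2 / B ** 2 - 1.0
        worst = max(worst, abs(residual))
        radii.append(abs(Z))       # distance from the origin, i.e. from the focus
    return worst, c / A, max(radii), min(radii)


print(kepler_ellipse_check(2.0, 1.0))
```

With $a = 2$, $b = 1$ the extreme distances from the origin come out as $a^2$ and $b^2$, reached half a turn apart in $Z$-space although the corresponding oscillator points are a quarter of a turn apart, which is the angle doubling discussed above.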
In the parlance of complex numbers, this way of looking demands passing from the $z$-space to its conformally mapped $Z$-space using the conformal map $z = \sqrt{Z}$. To see the details of all this in the language of complex numbers and conformal mapping, we begin with the ellipse on $Z$-space, given by

$$
\frac{\left(X - \sqrt{A^2 - B^2}\right)^2}{A^2} + \frac{Y^2}{B^2} = 1,
$$

and put, because of the equivalence already shown between bhp and kp, $A = \frac{a^2 + b^2}{2}$, $B = ab$, to get

$$
\frac{4\left(X - \tfrac{1}{2}a^2 + \tfrac{1}{2}b^2\right)^2}{\left(a^2 + b^2\right)^2} + \frac{Y^2}{a^2 b^2} = 1,
$$

representing the ellipse shown in Figure 7.2(a) on $Z$-space. Moreover, if we also put $X = x^2 - y^2$; $Y = 2xy$, we get the ellipse shown in Figure 7.2(b) on $z$-space.

Figure 7.2. Conformal transform map given by $z = \sqrt{Z}$.

Figure 7.3. Riemann surfaces for the conformal map of Figure 7.2 and the ellipse.

However, things get even more instructive if we assume the inverse situation, that is, $z = \sqrt{Z}$, meaning that it will be necessary to appeal to Riemann surfaces to make the description when passing from $Z$ to $z$. In this case we may simply find $x$ and $y$ in terms of $X$ and $Y$ and substitute them in the harmonic oscillator ellipse (on $z$-space), given by

$$
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.
$$
Since there would be two different possibilities for our choice regarding $x$ and $y$, we find that the previous ellipse is taken into the following ellipse

$$
\frac{2X + 2\sqrt{X^2 - Y^2}}{4a^2} + \frac{2X - 2\sqrt{X^2 - Y^2}}{4b^2} = 1,
$$

representing the inferior Riemann leaf in Figure 7.3, and

$$
\frac{2X - 2\sqrt{X^2 - Y^2}}{4a^2} + \frac{2X + 2\sqrt{X^2 - Y^2}}{4b^2} = 1,
$$
representing the superior Riemann leaf in Figure 7.3. It is easy to see from that figure that when we rotate by 2π on $Z$-space we rotate only by π on $z$-space. It is also obvious that if we take bhp and rotate by 2π on $z$-space we get again the same solution, exactly as we have for kp on $Z$-space. This is important: if we keep ourselves on $z$-space, we have to take only turns of 2π to get to the original situation, which is exactly the result already mentioned if we use the Schrödinger equation and the corresponding probability amplitude. It now seems obvious what is going on: the relation $R$: 4π ⇔ one turn appears only if we try to express bhp, which is represented by an ellipse with center at the origin and demands the use of symmetric tensors, by a planar Kepler problem, which is represented by an ellipse with the center at one of its foci and demands only the use of vectors.

All these results could be put into a more algebraic approach, to get a more detailed understanding of the tensor/vector relation among them. For bhp one can define the tensor ($M = 1$ throughout)

$$
A^{bhp}_{ij} = p_i p_j + \omega^2 x_i x_j, \qquad\text{that is,}\qquad A^{bhp} = \vec{p}\,\vec{p} + \omega^2\,\vec{r}\,\vec{r},
$$

and thus $A^{bhp} \cdot \vec{L} = \vec{L} \cdot A^{bhp} = 0$, and the equation for the orbit can be written as $\vec{r} \cdot A^{bhp} \cdot \vec{r} = L^2/2$. Indeed, since $A$, $H$ and $L^2$ are conserved quantities, in the coordinate system $(x', y')$ in which $A^{bhp}$ is diagonal, that is, $A^{bhp} = \mathrm{diag}\{A_{11}, A_{22}, 0\}$, we get $2A_{11}x^2/L^2 + 2A_{22}y^2/L^2 = 1$, such that, for appropriately defined $a$ and $b$, $x^2/a^2 + y^2/b^2 = 1$. Thus, in the coordinate system in which $A^{bhp}$ is diagonal, its eigenvalues are associated with the lengths of the axes of an ellipse, while its eigenvectors essentially represent these axes. For kp one has the Laplace-Runge-Lenz vector

$$
\vec{A}_{kp} = \vec{p} \times \vec{L} - \frac{k}{r}\,\vec{r},
$$

such that $\vec{A}_{kp} \cdot \vec{L} = 0$ and $\vec{A}_{kp} \cdot \vec{r} = A_{kp}\, r\cos\theta = L^2 - kr$, giving, since $\vec{A}$ is a conserved quantity,

$$
\frac{1}{r} = \frac{k}{L^2}\left(1 + \frac{A}{k}\cos\theta\right),
$$

which is the equation for the orbit. It is clear, thus, that the tensor $A^{bhp}$ plays a similar role in bhp as the Laplace-Runge-Lenz vector $\vec{A}_{kp}$ in kp. Passing from a tensor to a vector in this algebraic approach represents the conformal mapping $Z = z^2$ in the analytic approach, which, in turn, represents the necessity for doubling our coverage in the group theoretical approach.

As for Quantum Mechanics, all the expositions of the mathematical details regarding half-integral spin particles agree in assuming that $\vec{S} = (S_1, S_2, S_3)$ is a vector quantity. For instance, we can find passages like (see footnote 5): "Spin angular momentum is described by a vector operator $\vec{S} = (S_1, S_2, S_3)$, whose components obey the angular momentum commutation relation...". Another example would be (see footnote 6): "(...) We require that the spin vector $\vec{S}$ or – being identical up to a constant factor – the Pauli operator $\vec{\sigma} = [\hat{\sigma}_1, \hat{\sigma}_2, \hat{\sigma}_3]$ transforms under space rotations as a vector. This leads us to the relation $\hat{\sigma}_i' = \hat{U}_S^\dagger(\phi)\,\hat{\sigma}_i\,\hat{U}_S(\phi) = R_{ji}\,\hat{\sigma}_j$, which simply expresses that the matrices $\hat{\sigma}_i$ transform in the same manner as the components of a vector under space rotations."

Footnote 5: Baym, G., Lectures on Quantum Mechanics (Addison-Wesley, 1973).

Indeed, we could present an infinity of such examples. At the end of such expositions one gets $\hat{U}_S(\phi) = \exp\left(-\tfrac{i}{2}\,\vec{\phi} \cdot \hat{\vec{\sigma}}\right)$ as the rotation operator for the half-integral spin particles, which will guarantee that the angles are halved, if our objects $\hat{\sigma}_i$ are, indeed, components of a vector. They are not, as we have already shown. A trivial calculation shows that (in classical or quantum space alike), if we make the rotations in the $xy$-plane

$$
\begin{aligned}
x' &= x\cos\phi + y\sin\phi, & y' &= -x\sin\phi + y\cos\phi,\\
p_x' &= p_x\cos\phi + p_y\sin\phi, & p_y' &= -p_x\sin\phi + p_y\cos\phi,
\end{aligned}
$$

we get the relation

$$
\begin{pmatrix} S_1'\\ S_2'\\ S_3' \end{pmatrix} = \begin{pmatrix} \cos(2\phi) & \sin(2\phi) & 0\\ -\sin(2\phi) & \cos(2\phi) & 0\\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} S_1\\ S_2\\ S_3 \end{pmatrix},
$$

and, combining this last result with $\hat{U}_S(\phi)$, we do not get states rotated by $\phi/2$ but, instead, states rotated by $\phi$, which is exactly the same compensation that we talked about when we solved the Schrödinger equation for the half-integral spin particle. All these relations between the classical picture and the quantum mechanical one become even more direct if we think about the representation of the quantum mechanical version of bhp and kp in terms of the Bohr-Sommerfeld rules of the so-called old quantization, for which the use of the above orbits would be fully justified.
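The doubled-angle transformation of $(S_1, S_2)$ under a rotation of the $xy$-plane can be confirmed numerically. The following minimal sketch (function names, sample point and angle are illustrative assumptions) rotates a phase-space point by $\phi$, recomputes the functions $S_i$ from their definitions (7.4), and compares them with the $2\phi$ rotation matrix quoted above.

```python
import math


def spin_functions(x, y, px, py, alpha=1.0, beta=1.0):
    """S_1, S_2, S_3 as defined in (7.4)."""
    a = math.sqrt(alpha / beta)
    b = math.sqrt(beta / alpha)
    s1 = 0.25 * (a * (x * x - y * y) + b * (px * px - py * py))
    s2 = 0.5 * (a * x * y + b * px * py)
    s3 = 0.5 * (x * py - y * px)
    return s1, s2, s3


def rotate_phase_space(x, y, px, py, phi):
    """Rotation of coordinates and momenta by phi in the xy-plane."""
    c, s = math.cos(phi), math.sin(phi)
    return (x * c + y * s, -x * s + y * c, px * c + py * s, -px * s + py * c)


def rotation_mismatch(point, phi):
    """Largest difference between S(rotated point) and the 2*phi rotation of S(point)."""
    s1, s2, s3 = spin_functions(*point)
    rotated = spin_functions(*rotate_phase_space(*point, phi))
    c2, s2phi = math.cos(2.0 * phi), math.sin(2.0 * phi)
    expected = (c2 * s1 + s2phi * s2, -s2phi * s1 + c2 * s2, s3)
    return max(abs(u - v) for u, v in zip(rotated, expected))


print(rotation_mismatch((0.7, -1.3, 0.4, 2.1), 0.9))
```

In particular, a rotation by $\phi = \pi$ reverses $x$, $y$, $p_x$ and $p_y$ but returns $S_1$ and $S_2$ to their initial values, since the quadratic functions only see the angle $2\phi = 2\pi$.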
7.4. DERIVATION OF THE PAULI EQUATION

There are two ways to arrive mathematically at the Pauli equation. The first method, which we can call scalar, is to consider the half-integral spin particle as a bare-particle rotating around the origin $o$, with energy given by $H_o = S_0$, together with a translational movement given by $H_O = P_O^2/2m$ (see Figure 7.4). The Hamiltonian is thus

$$
H = \frac{\vec{P}_O^2}{2m} + S_0,
$$

and gives the energy of the bare-particle in its translational and rotational movement. If we also assume that there is a magnetic field $\vec{B}$ in the space where the particle is moving, then this magnetic field interacts with the current loop with a potential given by

$$
V = -\vec{\mu} \cdot \vec{B},
$$

where $\vec{\mu}$ is the magnetic moment of the half-integral spin particle. Note, however, that $\vec{\mu}$ and $\vec{B}$ are expressed in different coordinate systems (this will become important later on).

Footnote 6: Greiner, Müller, Quantum Mechanics - Symmetries (Springer, 1992, 2nd ed.).

Figure 7.4. Schematic representation of the spin particle in its reference system $o$, and the laboratory system $O$. The direction of the magnetic field $B$ is also shown.
Here the reader ought to pay close attention to the distinction between the bare-particle and the half-integral spin particle: the magnetic field interacts with the latter, which already encompasses the notion of a current. On the other hand, the bare-particle interacts with some isotropic harmonic oscillator potential (the origin of which we do not know). The variables related to $\vec{P}_O$ are $\vec{R} = (X, Y, Z)$, while those related to $S_0$ are $\vec{r} = (x, y, z)$. The complete Hamiltonian can be written as

$$
H = \frac{\vec{P}_O^2}{2m} + S_0 + \vec{\mu} \cdot \vec{B},
$$

and it is being implicitly assumed that the field $\vec{B}$ does not change the structure of the half-integral spin particle [or we would have $\vec{\mu} = \vec{\mu}(\vec{B})$]. In terms of the coordinates we have

$$
H = \frac{1}{2m}\left(P_X^2 + P_Y^2 + P_Z^2\right) + \vec{\mu}(x, y, z) \cdot \vec{B}(X, Y, Z) + \left\{\frac{1}{2}\left[\sqrt{\frac{\beta}{\alpha}}\left(p_x^2 + p_y^2\right) + \sqrt{\frac{\alpha}{\beta}}\left(x^2 + y^2\right)\right]\right\}. \tag{7.26}
$$

This means that

$$
\frac{d\vec{P}}{dt} = \{\vec{P}, H\} = \nabla(\vec{\mu} \cdot \vec{B}), \qquad \frac{d\vec{R}}{dt} = \frac{\vec{P}}{m},
$$
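for the equations representing the translational motion. There are also equations for the internal degrees of freedom. Note that

$$
\begin{aligned}
\frac{d\vec{S}}{dt} = \{\vec{S}, H\}_{\vec{r},\vec{p}} &= \left\{\vec{S}, \frac{\vec{P}_O^2}{2m} + \vec{\mu} \cdot \vec{B} + S_0\right\}_{\vec{r},\vec{p}} = \{\vec{S}, \vec{\mu} \cdot \hat{n}\}_{\vec{r},\vec{p}}\, B\\
&= -\frac{eg}{2m}\{\vec{S}, \vec{S} \cdot \hat{n}\}_{\vec{r},\vec{p}}\, B = -\frac{eg}{2m}(B\hat{n}) \times \vec{S} = -\frac{eg}{2m}\vec{B} \times \vec{S},
\end{aligned}
\tag{7.27}
$$

where we put $\vec{B} = B\hat{n}$ and used that $\vec{B}$ is a function only of $\vec{R}$ and that $\{S_k, S_0\} = 0$ (we also used that $\vec{\mu} = -eg\vec{S}/2m$ in SI units). Thus,

$$
\frac{d\vec{S}}{dt} = -\frac{eg}{2m}\vec{B} \times \vec{S}, \tag{7.28}
$$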
as expected. Note, however, that at this point we are not interested in the equations for $\vec{p}$ and $\vec{r}$. The previous results mean that the Hamiltonian (7.26) does represent the physical system. Quantization of this Hamiltonian is straightforward and can be done by the methods of chapter two or by the usual substitution $p_x \to -i\hbar\,\partial_x$, etc. The resulting Schrödinger equation is (for simplicity, we represent the internal part with the notation of the previous sections)

$$\left[-\frac{\hbar^2}{2m}\nabla_R^2 - \frac{eg}{2m}\hat{S}\cdot\vec{B} + \hat{S}_0\right]|\psi\rangle = i\hbar\frac{\partial}{\partial t}|\psi\rangle. \tag{7.29}$$
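The precession described by (7.28) can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: the constant `gamma` stands for the combination $eg/2m$ and is given an arbitrary illustrative value, and the function names are hypothetical. It integrates $d\vec{S}/dt = -\gamma\,\vec{B}\times\vec{S}$ with a fourth-order Runge-Kutta scheme.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def precess(spin, field, gamma, t_total, steps=2000):
    """Integrate dS/dt = -gamma * (B x S) with a fourth-order Runge-Kutta scheme.

    gamma plays the role of e*g/(2m); its value here is purely illustrative.
    """
    def rhs(s):
        c = cross(field, s)
        return (-gamma * c[0], -gamma * c[1], -gamma * c[2])

    dt = t_total / steps
    s = tuple(spin)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(tuple(s[i] + 0.5 * dt * k1[i] for i in range(3)))
        k3 = rhs(tuple(s[i] + 0.5 * dt * k2[i] for i in range(3)))
        k4 = rhs(tuple(s[i] + dt * k3[i] for i in range(3)))
        s = tuple(s[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6
                  for i in range(3))
    return s

if __name__ == "__main__":
    s0 = (math.sin(0.7), 0.0, math.cos(0.7))
    # One full period T = 2*pi/(gamma*B) for gamma = 1, B = 2 along z.
    print(precess(s0, (0.0, 0.0, 2.0), 1.0, math.pi))
```

For a field along $z$ the component $S_z$ stays constant while the transverse components rotate with angular frequency $\gamma B$, so the vector returns to its starting value after one period.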
The second method to arrive at Pauli’s equation uses the characteristic function method and is actually a mathematical derivation of this equation [91], which uses spinorial notation from the beginning; this method can also elucidate some features of the previous approach. To begin with, let us introduce the spinor $\chi(\theta_0, \phi_0)$ as

$$\chi(\theta_0, \phi_0) = \chi_u + \chi_d \equiv \cos\frac{\theta_0}{2}\,e^{-i\phi_0/2}\begin{pmatrix}1\\0\end{pmatrix} + \sin\frac{\theta_0}{2}\,e^{i\phi_0/2}\begin{pmatrix}0\\1\end{pmatrix},$$

and let us consider the Pauli matrices $\sigma_i$ already defined. This means that we can write any vector $\vec{\mu}$ with components given by

$$\mu_j = \mu\,\chi^\dagger(\theta_0, \phi_0)\,\sigma_j\,\chi(\theta_0, \phi_0), \qquad j = 1, 2, 3, \tag{7.30}$$

thus $\mu_x = \mu\sin\theta_0\cos\phi_0$, $\mu_y = \mu\sin\theta_0\sin\phi_0$, $\mu_z = \mu\cos\theta_0$, and the force could be written as ($B = |\vec{B}|$)

$$\vec{F} = \nabla\{\mu[\chi^\dagger(\theta_0, \phi_0)\,\vec{\sigma}\,\chi(\theta_0, \phi_0)]\cdot\vec{B}\} = \mu\cos\theta_0\,\nabla_R B; \tag{7.31}$$

thus, according to the spinorial notation,
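$$\vec{F} = \vec{F}_u + \vec{F}_d,$$

where

$$\vec{F}_u = \mu(\chi_u^\dagger\sigma_3\chi_u)\nabla B = \mu\cos^2\frac{\theta_0}{2}\,\nabla_R B, \qquad \vec{F}_d = \mu(\chi_d^\dagger\sigma_3\chi_d)\nabla B = -\mu\sin^2\frac{\theta_0}{2}\,\nabla_R B,$$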
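which are always anti-parallel. To apply the method of chapter two, we must write the Liouville equation, given by

$$\left[\frac{\partial}{\partial t} + \frac{d\vec{R}}{dt}\cdot\frac{\partial}{\partial\vec{R}} + \frac{d\vec{P}}{dt}\cdot\frac{\partial}{\partial\vec{P}}\right]F(\vec{R}, \vec{P}; t) = 0,$$

and put

$$Z(\vec{R}, \delta\vec{R}; t) = \int F(\vec{R}, \vec{P}; t)\,e^{i\vec{P}\cdot\delta\vec{R}/\hbar}\,d^3P.$$

Following the same steps presented in chapter two, we find

$$\left\{-i\hbar\frac{\partial}{\partial t} - \frac{\hbar^2}{m}\frac{\partial^2}{\partial\vec{R}\,\partial(\delta\vec{R})} + \delta\vec{R}\cdot\frac{\partial}{\partial\vec{R}}(\vec{\mu}\cdot\vec{B})\right\}Z(\vec{R}, \delta\vec{R}; t) = 0.$$

Now, if we put

$$\delta\vec{R}\cdot\frac{\partial}{\partial\vec{R}}B_j = B_j\!\left(\vec{R} + \frac{\delta\vec{R}}{2}\right) - B_j\!\left(\vec{R} - \frac{\delta\vec{R}}{2}\right),$$

we may write, with the help of (7.30),

$$\delta\vec{R}\cdot\frac{\partial}{\partial\vec{R}}[\vec{\mu}\cdot\vec{B}] = \mu\,\chi^\dagger(\theta_0, \phi_0)\,\vec{\sigma}\,\chi(\theta_0, \phi_0)\cdot\left[\vec{B}\!\left(\vec{R} + \frac{\delta\vec{R}}{2}\right) - \vec{B}\!\left(\vec{R} - \frac{\delta\vec{R}}{2}\right)\right].$$

We now write

$$Z(\vec{R}, \delta\vec{R}; t) = \Psi^\dagger\!\left(\vec{R} - \frac{\delta\vec{R}}{2}; t\,\Big|\,\theta_0, \phi_0\right)\Psi\!\left(\vec{R} + \frac{\delta\vec{R}}{2}; t\,\Big|\,\theta_0, \phi_0\right),$$

where

$$\Psi(\vec{R}; t|\theta_0, \phi_0) = \chi(\theta_0, \phi_0)\,\Phi(\vec{R}; t) \equiv \Psi_u + \Psi_d,$$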
and $\Phi(\vec{R}; t)$ is a scalar function. The rest of the derivation is exactly as presented in chapter two and we finally get (using that $\chi^\dagger\chi = 1$)

$$\left\{-\frac{\hbar^2}{2m}\nabla^2 + \mu\,\vec{\sigma}\cdot\vec{B}(\vec{R})\right\}\Psi(\vec{R}; t|\theta_0, \phi_0) = i\hbar\frac{\partial}{\partial t}\Psi(\vec{R}; t|\theta_0, \phi_0), \tag{7.32}$$

which is exactly equal to (7.29). This derivation shows that the ansatz of the characteristic function derivation can be generalized to probability amplitudes that are not scalar functions, but spinors.
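The spinor parametrization used in (7.30) can be checked numerically. The sketch below is only an illustration with hypothetical helper names; it assumes the standard Pauli matrices and evaluates $\chi^\dagger\sigma_j\chi$ for the spinor $\chi(\theta_0, \phi_0)$ defined in the text, which should reproduce the unit vector $(\sin\theta_0\cos\phi_0, \sin\theta_0\sin\phi_0, \cos\theta_0)$.

```python
import cmath
import math

# Standard Pauli matrices written as nested tuples.
SIGMA = {
    "x": ((0, 1), (1, 0)),
    "y": ((0, -1j), (1j, 0)),
    "z": ((1, 0), (0, -1)),
}

def spinor(theta, phi):
    """Return the (up, down) components of chi(theta, phi)."""
    up = math.cos(theta / 2) * cmath.exp(-1j * phi / 2)
    down = math.sin(theta / 2) * cmath.exp(1j * phi / 2)
    return (up, down)

def expectation(chi, sigma):
    """Compute chi^dagger sigma chi for a 2x2 Hermitian matrix sigma."""
    total = 0j
    for i in range(2):
        for j in range(2):
            total += chi[i].conjugate() * sigma[i][j] * chi[j]
    return total.real  # a Hermitian matrix gives a real expectation value

def moment_direction(theta, phi):
    """Unit vector (mu_x, mu_y, mu_z)/mu obtained from (7.30)."""
    chi = spinor(theta, phi)
    return tuple(expectation(chi, SIGMA[k]) for k in ("x", "y", "z"))

if __name__ == "__main__":
    print(moment_direction(1.1, 0.4))
```

The same `expectation` helper applied to the up and down parts separately gives $\cos^2(\theta_0/2)$ and $-\sin^2(\theta_0/2)$, the weights that appear in $\vec{F}_u$ and $\vec{F}_d$.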
7.5. THE STERN-GERLACH EXPERIMENT

The Stern-Gerlach experiment is generally used as a prototype experiment showing the “deep differences” between the Classical and Quantum frameworks, since it reveals the existence of half-integral spins that, so the myth goes, could not be accommodated within the classical worldview. However, anyone acquainted with the original works of O. Stern and W. Gerlach [92], [93] will see that the discussions presented there can be framed in completely classical terms (see, for instance, [92]). In what follows we present both a Classical and a Quantum Mechanical analysis of the experiment to show that feature explicitly.
Figure 7.5. The Stern-Gerlach experiment with an oven O, a pair of magnets M and a screen S. Characteristic distances are shown in the picture. Magnetic field lines B are also shown. In the Stern-Gerlach experiment the silver was heated to about 1300 K. The atoms emerge through slits, which collimate them. Stern and Gerlach verified that their velocity distribution was Maxwellian. Typical speeds were about 1 km/s and L ≅ 10 cm. Thus an atom spent about $10^{-4}$ s in the magnet.
The experimental setup is the one shown in Figure 7-5. Particles come from an oven $O$ and travel the distance $D_0$ without the presence of any force field. At $y = D_0$ the particles begin to be acted upon by the magnetic field given by

$$\vec{B} = (-\beta x, 0, B_0 + \beta z) \tag{7.33}$$

(we note that $\nabla\cdot\vec{B} = 0$, as it must be). The magnet has a length $L = 3D_0$ and, after the particles arrive at $y = 4D_0 = L + D_0$, they are no longer acted upon by any force field. The particles then travel in a straight line until they hit a screen containing detectors; the screen is placed at $y = L + D_0 + D_1$, with $D_1$ large enough to have the two emerging beams separated.
Figure 7.6. The correct realization of the Stern-Gerlach experimental setup to produce the magnetic field given in (7.33). The magnetic field is, thus, of a quadrupole nature.

Figure 7.7. The relations between the coordinate system related to the internal degrees of freedom ($oxyz$) and the coordinate system related to the translational movement of the particle as a whole ($OXYZ$).
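The remark that the field (7.33) is divergence-free can be verified with a few lines of code. The following sketch is only illustrative: the values of `beta` and `b0` are arbitrary, the helper names are hypothetical, and central finite differences replace analytic derivatives. It also evaluates the curl, which vanishes for this field as expected for a static field in a current-free region.

```python
def field(x, y, z, beta=1.0, b0=1.0):
    """Field of (7.33): B = (-beta*x, 0, B0 + beta*z); beta and b0 are illustrative."""
    return (-beta * x, 0.0, b0 + beta * z)

def partial(f, comp, axis, point, h=1e-3):
    """Central-difference estimate of d B_comp / d x_axis at the given point."""
    plus = list(point)
    minus = list(point)
    plus[axis] += h
    minus[axis] -= h
    return (f(*plus)[comp] - f(*minus)[comp]) / (2 * h)

def divergence(f, point):
    """Sum of the diagonal partial derivatives."""
    return sum(partial(f, i, i, point) for i in range(3))

def curl(f, point):
    """Components of the curl, built from the off-diagonal partial derivatives."""
    return (partial(f, 2, 1, point) - partial(f, 1, 2, point),
            partial(f, 0, 2, point) - partial(f, 2, 0, point),
            partial(f, 1, 0, point) - partial(f, 0, 1, point))

if __name__ == "__main__":
    p = (0.3, -0.2, 0.5)
    print(divergence(field, p), curl(field, p))
```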
Contrary to what is commonly assumed in the literature, the magnets represented in Figure 7-5 do not give rise to the magnetic field (7.33). In fact, one must have an arrangement like the one shown in Figure 7-6 to get the field lines properly represented by (7.33). We must now explicitly express the second term on the left of (7.29), our primary equation; this is a point that is never done correctly in the rare literature that takes the details of the Stern-Gerlach problem into account. To do that, let us consider Figure 7-7, which makes explicit the relations between the coordinate system of the internal movement of the particle ($oxyz$) and the coordinate system of the laboratory ($OXYZ$). We must remember that our solution for the internal degrees of freedom ($oxyz$) implies that we necessarily assume the axis $z$ as pointing in the direction of $S_z$ or, which is the same, the direction of $\vec{\mu}$, while for the problem in $OXYZ$ we are assuming that $OXZ$ defines the plane of the magnetic field (this is why we must talk about two, in principle different, coordinate systems); it is interesting to note that when one uses the matrix notation this becomes hidden in the formalism. To simplify the problem, let us now rotate the coordinate system $OXYZ$ by the angle $\alpha$, which represents the angle with which the particle entered the magnetic field. With this operation we get the new coordinate system $O'X'Y'Z'$ such that $Y' = Y$. In this case, the transformed magnetic field becomes

$$\begin{pmatrix}B'_x\\ B'_z\end{pmatrix} = \begin{pmatrix}\cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha\end{pmatrix}\begin{pmatrix}B_x\\ B_z\end{pmatrix}.$$

Now, using the fact that $\vec{B} = (-\beta X, 0, B_0 + \beta Z)$, we get

$$B'_x = -\beta X\cos\alpha - (B_0 + \beta Z)\sin\alpha; \qquad B'_z = -\beta X\sin\alpha + (B_0 + \beta Z)\cos\alpha.$$

However, in this new coordinate system $O'X'Y'Z'$ we have

$$\vec{\mu}\cdot\vec{B}' = B'_z S_z = [\beta(-X\sin\alpha + Z\cos\alpha) + B_0\cos\alpha]S_z = [\beta Z' + B_0\cos\alpha]S_z,$$

since $Z' = -X\sin\alpha + Z\cos\alpha$, as can be seen from Figure 7-7. Pauli's equation can be written in these two coordinate systems ($O'X'Y'Z'$ and $oxyz$) as

$$\left\{-\mathbf{1}\frac{\hbar^2}{2m}\nabla'^2 - \frac{eg}{2m}[\beta Z' + B_0\cos\alpha]\hat{S}_z + \hat{S}_0\right\}\Psi = \mathbf{1}\,i\hbar\frac{\partial\Psi}{\partial t},$$
where $\mathbf{1}$ is the $2\times 2$ identity matrix and the matrix notation is assumed for the part of the system related to the internal movement, that is, $oxyz$. This last equation can be written as two scalar equations given by (remember that $\hat{S}_0|\psi_r\rangle = \lambda\hbar|\psi_r\rangle$)

$$\left\{-\frac{\hbar^2}{2m}\nabla'^2 \pm \frac{\hbar}{2}\frac{eg}{2m}[\beta Z' + B_0\cos\alpha] + \lambda\hbar\right\}\psi_\pm = i\hbar\frac{\partial}{\partial t}\psi_\pm.$$
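The algebra behind the rotated field component can be checked directly. The sketch below, with hypothetical function names and arbitrary test values, applies the rotation matrix of the text to $\vec{B} = (-\beta X, 0, B_0 + \beta Z)$ and compares the resulting $B'_z$ with the closed form $\beta Z' + B_0\cos\alpha$, where $Z' = -X\sin\alpha + Z\cos\alpha$.

```python
import math

def rotated_field(X, Z, alpha, beta, b0):
    """Apply the rotation matrix of the text to B = (-beta*X, 0, B0 + beta*Z)."""
    bx, bz = -beta * X, b0 + beta * Z
    bx_rot = math.cos(alpha) * bx - math.sin(alpha) * bz
    bz_rot = math.sin(alpha) * bx + math.cos(alpha) * bz
    return bx_rot, bz_rot

def bz_closed_form(X, Z, alpha, beta, b0):
    """beta*Z' + B0*cos(alpha), with Z' = -X*sin(alpha) + Z*cos(alpha)."""
    z_rot = -X * math.sin(alpha) + Z * math.cos(alpha)
    return beta * z_rot + b0 * math.cos(alpha)

if __name__ == "__main__":
    args = (0.4, -1.3, 0.6, 2.0, 0.7)  # X, Z, alpha, beta, B0 (arbitrary values)
    print(rotated_field(*args)[1], bz_closed_form(*args))
```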
We do not expect any motion in the $X'$ direction ($\mu_x = 0$), since all the force acts along the $Z'$ direction. This means that we may simply drop the second-order derivative in $X'$ in the last equation to rewrite it as (we are now assuming the energy related to the internal degrees of freedom, $\hat{S}_0$, as included in the time derivative on the right-hand side, giving rise to a term like $\exp[-it(\pm\mu_B B_0\cos\alpha/\hbar + \lambda)]$)

$$\left\{-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial Y'^2} + \frac{\partial^2}{\partial Z'^2}\right) \pm \beta\mu_B Z'\right\}\psi_\pm(Y', Z', t) = i\hbar\frac{\partial}{\partial t}\psi_\pm(Y', Z', t), \tag{7.34}$$

where $\mu_B = e\hbar/2m$. The solutions of these two equations are Airy functions, given by
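$$\psi_\pm(Y', Z'; t) = \mathrm{Ai}\left[\left(\frac{m\beta\mu_B}{\hbar^2}\right)^{1/3}\left(\pm Z' + \frac{\mu_B\beta}{4m}t^2\right)\right]\exp\left(\mp i\frac{\mu_B\beta}{2\hbar}Z' t\right)\exp\left[i\left(\frac{p_Y Y'}{\hbar} - \frac{p_Y^2}{2m\hbar}t\right)\right]. \tag{7.35}$$

We note that Airy functions are non-decaying functions of $Z'$. Of course, this comes from our choice to represent the magnetic field as in (7.33), which is an approximation valid only within the region represented in Figure 7-6. To assess the particle’s average movement we calculate the average of the coordinate $Z'(t)$. To do that we use the Hamiltonian in (7.34) with

$$\langle Z'(t)\rangle_\pm = \int\psi_\pm^*(t)\,\hat{Z}'\,\psi_\pm(t)\,d^3R, \tag{7.36}$$

and use the Baker-Hausdorff identity

$$e^{\hat{O}}\hat{A}e^{-\hat{O}} = \hat{A} + [\hat{O}, \hat{A}] + \frac{1}{2!}[\hat{O}, [\hat{O}, \hat{A}]] + \cdots, \qquad \psi_\pm(t) = e^{-i\hat{H}_\pm t/\hbar}\psi_\pm(0),$$

where $\hat{O} = i\hat{H}_\pm t/\hbar$, with

$$\hat{H}_\pm = -\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial Y'^2} + \frac{\partial^2}{\partial Z'^2}\right) \pm \beta\mu_B Z',$$

and $\hat{A} = \hat{Z}'$. We thus get

$$\langle Z'(t)\rangle_\pm = \mp\frac{\mu_B\beta}{2m}t^2, \tag{7.37}$$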
where we assumed that the particle (in the 𝑍′-direction) comes from the origin 𝑂′ without an initial velocity. Note that α is, in principle, random, since we are assuming that the particles can enter the Stern-Gerlach apparatus with their magnetic moment pointing at any direction in the plane 𝑂′𝑋′𝑍′. Thus, we expect the particles to make a circular pattern (like a circular ring) on the detectors, which are fixed in the frame 𝑂𝑋𝑌𝑍, parametrized by the random variation of 𝛼.
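To get a feeling for the size of the displacement in (7.37), one can plug in representative numbers. The sketch below is an order-of-magnitude illustration only. The field gradient `beta` is an assumed value that is not given in the text, the mass is taken as that of a silver atom (an assumption about how the symbol $m$ of the translational motion should be read), and the transit time of $10^{-4}$ s comes from the caption of Figure 7.5. The function name is hypothetical.

```python
MU_B = 9.2740100783e-24               # Bohr magneton in J/T
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg

def mean_deflection(beta, t, mass, mu=MU_B):
    """Magnitude of <Z'(t)> = mu * beta * t**2 / (2 * mass), as in (7.37)."""
    return mu * beta * t ** 2 / (2.0 * mass)

if __name__ == "__main__":
    silver_mass = 107.87 * ATOMIC_MASS_UNIT  # assumed: translational mass of a silver atom
    beta = 1.0e3                             # assumed field gradient in T/m
    transit_time = 1.0e-4                    # s, from the caption of Figure 7.5
    print(mean_deflection(beta, transit_time, silver_mass))
```

With these assumed inputs the displacement at the exit of the magnet is a few tenths of a millimetre, and it scales with the square of the transit time.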
7.5.1. The “Classical” Stern-Gerlach Experiment

The classical version of the problem is much more direct. Since we need not think about diagonalizations, etc., we may refer all the elements of the problem to the same coordinate axes (which we will assume to be $OXYZ$) by simply introducing the related angles. We thus have

$$\vec{B} = B_x\hat{\imath} + B_z\hat{k}, \qquad \vec{\mu} = \mu\sin(\alpha)\,\hat{\imath} + \mu\cos(\alpha)\,\hat{k},$$

such that

$$\vec{\mu}\cdot\vec{B} = \mu[B_x\sin(\alpha) + B_z\cos(\alpha)],$$

where $\vec{B} = (-\beta x, 0, B_0 + \beta z)$. If we assume that the magnetic field does not have enough time to change the orientation of $\vec{\mu}$, then $\alpha$ is fixed (and represents the angle with which the particle comes from the oven in the $OXZ$ plane). Thus

$$\vec{\mu}\cdot\vec{B} = \mu[-\beta x\sin(\alpha) + (B_0 + \beta z)\cos(\alpha)],$$

where $\alpha$ is a constant. Then, Newton's equation is given by

$$m\frac{d^2\vec{r}}{dt^2} = -\nabla(\vec{\mu}\cdot\vec{B}),$$

and thus

$$m\frac{d^2z}{dt^2} = -\beta\mu\cos(\alpha), \qquad m\frac{d^2x}{dt^2} = \beta\mu\sin(\alpha),$$

meaning that (assuming that the particles come out of the oven collimated in the $y$ direction, that is, without velocity in the $x$- or $z$-direction)

$$z(t) = z_0 - \frac{\beta\mu\cos(\alpha)}{2m}t^2; \qquad x(t) = x_0 + \frac{\beta\mu\sin(\alpha)}{2m}t^2, \tag{7.38}$$
where $z_0$ and $x_0$ are the initial positions of each particle sent (and are given by the original distribution related to the initial state preparation). Now, we assume that all particles have the same value of the modulus of $\vec{\mu}$ ($|\vec{\mu}| = \mu_B$, as they all have the same charge and mass, for instance). If we also assume that these particles come with arbitrary equiprobable values of $\alpha$, then, if all these particles enter the magnet at $(x_0, z_0) = (0, 0)$, the pattern we should expect on the detectors as they pass through the Stern-Gerlach magnet is a circumference given by

$$z(t)^2 + x(t)^2 = \left(\frac{\beta\mu_B}{2m}t^2\right)^2,$$

with radius given by

$$R(t) = \frac{\beta\mu_B}{2m}t^2.$$

This result should be compared to the result (7.36) or (7.37) to see that they are exactly equal. The classical, and indeed Newtonian, character of all these results could hardly be more evident. There is, however, one feature of the phenomenon that was implicitly assumed here and should be made explicit: we assumed that the magnetic moment of the particles is always on the $OXZ$-plane. In general, and without any further hypothesis, we should expect that the particles’ magnetic moment could span every single direction in three-dimensional space. This feature, thus, remains in need of further attention. If we assume the experimental setup presented before, then we must also consider the path between the Stern-Gerlach magnet and the photographic plate, as shown in Figure 7-5. If we do that, we get the final result
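The circular pattern implied by (7.38) can be reproduced with a small simulation. The sketch below is only an illustration with hypothetical function names and arbitrary parameter values: it draws the entrance angle $\alpha$ uniformly at random, evaluates (7.38) for particles starting at $(x_0, z_0) = (0, 0)$, and collects the resulting points, which all lie at the same distance $\beta\mu t^2/2m$ from the origin.

```python
import math
import random

def final_position(alpha, beta, mu, mass, t, x0=0.0, z0=0.0):
    """Positions from (7.38) for a moment making the angle alpha with the OZ axis."""
    x = x0 + beta * mu * math.sin(alpha) * t ** 2 / (2 * mass)
    z = z0 - beta * mu * math.cos(alpha) * t ** 2 / (2 * mass)
    return x, z

def sample_ring(n, beta, mu, mass, t, seed=0):
    """Draw n equiprobable entrance angles and return the corresponding (x, z) points."""
    rng = random.Random(seed)
    return [final_position(rng.uniform(0.0, 2 * math.pi), beta, mu, mass, t)
            for _ in range(n)]

if __name__ == "__main__":
    points = sample_ring(5, beta=2.0, mu=3.0, mass=1.5, t=0.5)
    for x, z in points:
        print(x, z, math.hypot(x, z))
```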
$$Z_c = Z_0 + \frac{\beta\mu_B}{2m}\frac{L^2}{v_Y^2} + \frac{\beta\mu_B}{m}\frac{L}{v_Y}\frac{D_1}{v_Y},$$

where the first term is the (average) initial position of the beam, the second term comes from the deflection inside the magnet (during a time $t = L/v_Y$, with $v_Y = P_Y/m$), and the last term comes from the uniform movement of the particle with the velocity $v_z = \beta\mu_B L/(m v_Y)$ acquired inside the magnet ($L/v_Y$ is the time it was exposed to the accelerating field) during a time $D_1/v_Y$ between the magnet and the screen. Thus

$$Z_c^{+} = Z_0 + \frac{\beta\mu_B L^2}{2m v_Y^2}\left(1 + \frac{2D_1}{L}\right),$$

which can be compared with similar results given in equation (10.5) of ref. [94]. In the case of the experiment made by Zacharias [94] with an unpolarized beam of cesium atoms we have $D_1/2L = 0.125$. Of course

$$Z_c^{-} = Z_0 - \frac{\beta\mu_B L^2}{2m v_Y^2}\left(1 + \frac{2D_1}{L}\right)$$

and thus

$$\Delta Z = \frac{\beta\mu_B L^2}{m v_Y^2}\left(1 + \frac{2D_1}{L}\right).$$

Note that the final angle of deflection is given by

$$\varphi_f = \pm\arctan\left[\frac{\beta\mu_B L^2}{2m v_Y^2 D_1}\left(1 + \frac{2D_1}{L}\right)\right].$$
At the beginning of the magnet the initial distribution of particles is given by (we are now assuming that $\bar{X}_0 = \bar{Z}_0 = 0$, meaning that we are assuming collimation by pinholes)

$$F(X, Z, t_0) = \frac{1}{2\pi\sigma_0^2}\exp\left(-\frac{X^2 + Z^2}{2\sigma_0^2}\right).$$

Given the previous calculation, the final distribution of points on the screen is expected to be

$$F(X, Z, t) = \frac{1}{2\pi\sigma_0^2}\exp\left(-\frac{\left(\sqrt{X^2 + Z^2} - \Delta Z/2\right)^2}{2\sigma_0^2}\right), \tag{7.39}$$

and its functional behavior can be seen graphically in Figure 7-8.
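The ring-shaped profile of (7.39) is easy to evaluate numerically. The sketch below uses a hypothetical function name and arbitrary values for $\Delta Z$ and $\sigma_0$. It shows that the density depends only on the distance from the origin and reaches its maximum on the circle of radius $\Delta Z/2$.

```python
import math

def ring_density(X, Z, delta_z, sigma0):
    """Evaluate (7.39): a Gaussian in the radial distance from the circle of radius delta_z/2."""
    r = math.hypot(X, Z)
    return math.exp(-(r - delta_z / 2) ** 2 / (2 * sigma0 ** 2)) / (2 * math.pi * sigma0 ** 2)

if __name__ == "__main__":
    delta_z, sigma0 = 1.0, 0.1  # arbitrary illustrative values
    for r in (0.0, 0.25, 0.5, 0.75):
        print(r, ring_density(r, 0.0, delta_z, sigma0))
```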
The experimental results of Gerlach and Stern [93] can be seen in Figure 7-9. The lip shape of the separated beam can be explained in terms of the slit properties.

Figure 7-8. The expected experimental outcome in a Stern-Gerlach experiment using pinholes as collimators.

Our previous results assumed that the slits would be circular apertures (pinholes), which entails a circular ring on the photographic plate (as in Figure 7-8). Of course, if we change the geometry of the slits, the final geometry of the deposition on the photographic plate will also change.
Figure 7-9. The experimental result of a Stern-Gerlach experiment. See also [93].
For instance, we expect that a rectangular geometry will just transform the circular ring into an ellipse-like shape on the detectors, closely resembling the results of the original Stern-Gerlach experiment.
7.6. EXPERIMENTAL ASSESSMENT TO NON-COMMUTING OBSERVABLES

Given the expected and the already known experimental results presented in the previous section, we may now ask ourselves about the physical meaning of such results, that is, how these results can be accommodated within the interpretation.
Figure 7.10. The supposed theoretical prediction of the outcome of the Stern-Gerlach experiment: two dots on the detectors’ screen.
When analyzing the Stern-Gerlach experiment, one usually thinks of its outcome as the ideal situation shown in Figure 7-10: two dots on the detectors’ screen. The remaining parts of the lip-shaped experimental outcome (or the circular one, for that matter) are simply discarded (probably being attributed to some experimental disturbance coming from a lack of experimental control regarding collimation, for instance). This is reckless in any concrete situation, but disastrous in the present one, as we will argue in what follows. Indeed, given the circular outcome in particular, it is a matter of pure belief, driven by conviction about some interpretation (let alone addiction), to assume that the experimental outcome should be as shown in Figure 7-10. Why not assume the same for any other direction, since the points form, as predicted by the very theory, a circular path with no preferred direction? However, if we assume that what Otto Stern and Walther Gerlach obtained is accurate and should be considered in all its details, then the tails of the lip-shaped distribution on the detectors contain as much physics as its center, as our previous calculations have just shown. In fact, if we take the experiment done with one single particle, we may assume that it impacts the detectors at some position $(X_D, Z_D)$. Thus, for this single-particle experiment one may use the mathematical arrangement to calculate: (a) the angle $\alpha$ with which the particle entered the magnet; (b) its projection on the $OX$ axis; and (c) its projection on the $OZ$ axis. Since we already know the value of the modulus of the particle’s magnetic moment, we thereby assess the experimental value of the projection of the magnetic moment on the $OX$ and $OZ$ axes. This, of course, is also correct if we talk about averages (and use a small angle on the $OXZ$ plane on the detectors).
But these experimental results should be the values of $\langle\hat{\sigma}_X\rangle$ and $\langle\hat{\sigma}_Z\rangle$, even though we already know that the operators $\hat{\sigma}_X$ and $\hat{\sigma}_Z$ do not commute! One may argue (correctly) that we have obtained, in fact, the (average) value $\langle\hat{\sigma}_X\sin\alpha + \hat{\sigma}_Z\cos\alpha\rangle$, since we solved the problem in the representation in which this operator is diagonal (remember that we diagonalized the problem by rotating our coordinate system). But this last qualification does not change anything in the previous argument, since the experimental values of $\hat{\sigma}_X$ and $\hat{\sigma}_Z$ are still being obtained, and these operators do not commute with $\hat{\sigma}_X\sin\alpha + \hat{\sigma}_Z\cos\alpha$ either. However, the important fact here is that neither $\langle\hat{\sigma}_X\rangle$, nor $\langle\hat{\sigma}_Z\rangle$, nor even $\langle\hat{\sigma}_X\sin(\alpha) + \hat{\sigma}_Z\cos(\alpha)\rangle$ represents an experimental result without dispersion, while according to the usual interpretation the latter should be dispersionless and the other two should present dispersions, even though $\langle\hat{\sigma}_X\sin(\alpha) + \hat{\sigma}_Z\cos(\alpha)\rangle = \langle\hat{\sigma}_X\sin(\alpha)\rangle + \langle\hat{\sigma}_Z\cos(\alpha)\rangle$, and dispersions do not cancel out. Thus, the traditional approach seems to be telling us that

$$\underbrace{\langle\hat{\sigma}_X\sin(\alpha) + \hat{\sigma}_Z\cos(\alpha)\rangle}_{\text{dispersionless}} = \underbrace{\langle\hat{\sigma}_X\rangle}_{\text{dispersion } \Delta\sigma_x}\sin(\alpha) + \underbrace{\langle\hat{\sigma}_Z\rangle}_{\text{dispersion } \Delta\sigma_z}\cos(\alpha),$$

which is preposterous. This result will become very important in chapter nine, when we approach the issue of Bell’s inequalities.
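For reference, the numbers that the usual formalism assigns to the three quantities discussed above can be computed explicitly. The sketch below is an illustration only and takes no side in the interpretive argument. It assumes the state is the $+1$ eigenstate of $\hat{\sigma}_X\sin\alpha + \hat{\sigma}_Z\cos\alpha$, written as the real spinor $(\cos(\alpha/2), \sin(\alpha/2))$, and uses the fact that every Pauli matrix squares to the identity. The function name is hypothetical.

```python
import math

def spin_statistics(alpha):
    """Means and variances of sigma_x, sigma_z and sigma_n in the +1 eigenstate of sigma_n.

    sigma_n = sigma_x*sin(alpha) + sigma_z*cos(alpha); its +1 eigenstate is the
    real spinor (cos(alpha/2), sin(alpha/2)).
    """
    up, down = math.cos(alpha / 2), math.sin(alpha / 2)
    mean_x = 2 * up * down          # <sigma_x> for a real spinor
    mean_z = up ** 2 - down ** 2    # <sigma_z>
    mean_n = mean_x * math.sin(alpha) + mean_z * math.cos(alpha)  # linearity of the mean
    # Each of the three operators squares to the identity, so <sigma^2> = 1.
    return {
        "mean_x": mean_x, "var_x": 1 - mean_x ** 2,
        "mean_z": mean_z, "var_z": 1 - mean_z ** 2,
        "mean_n": mean_n, "var_n": 1 - mean_n ** 2,
    }

if __name__ == "__main__":
    for key, value in spin_statistics(0.9).items():
        print(key, value)
```

In this state the combination along the direction $\alpha$ has unit mean and vanishing variance, while $\hat{\sigma}_X$ and $\hat{\sigma}_Z$ have means $\sin\alpha$ and $\cos\alpha$ with variances $\cos^2\alpha$ and $\sin^2\alpha$; these are the dispersions the text refers to.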
Chapter 8
OPERATOR FORMATION AND PHASE SPACE DISTRIBUTIONS

Up to chapter six, we showed that the developments in configuration space of the present approach are in complete agreement with those obtained by the usual means in the framework of the Schrödinger equation. In chapter six we made explicit the true random behavior of quantum systems by means of Langevin equations. We then turned our attention to phase-space descriptions of Quantum Mechanics. There appeared a difference between the marginal momentum space distribution obtained from

$$\rho_F(p) = \int F(q, p)\,dq \tag{8.1}$$

and the one coming from the traditional procedure of taking the Fourier transform of the configuration space probability amplitude and squaring it,

$$\rho_Q(p) = \left|\int e^{ipq}\psi(q)\,dq\right|^2. \tag{8.2}$$
As mentioned, the present approach completely reproduces the usual results of the theory based on the Schrödinger equation when only the configuration space is considered. Thus, it becomes obvious that the prescription (8.2) for forming the momentum space probability amplitudes and densities must be a different and independent axiom of the traditional approach, an axiom that is inconsistent with the present approach. In fact, the formation of the momentum space distribution plays a quite distinct role in the present approach, in which expression (8.1) cannot be considered an independent axiom. The calculation of the momentum space probability density merely follows from the notion of $F(q, p)$ as a probability density function and is done by the usual statistical procedure mentioned above, which reflects the action of taking each point in phase-space and counting the number of times the system occupies it. Indeed, the process of calculating the average values of dynamical functions of $q$ and $p$ is simply

$$\langle f(q, p)\rangle^{F}_{av} = \int f(q, p)F(q, p)\,dq\,dp,$$
exactly as in any classical statistical approach, whatever the form of the function $f(q, p)$. In such an approach, there must be only one distribution $F(q, p)$ for each phenomenon; in fact, $F(q, p)$ characterizes the phenomenon, and different $F(q, p)$ refer to different phenomena. In Quantum Mechanics, as it is generally accepted, things are quite different, and phase-space probability density functions, let alone distributions, abound. This does not come without consequences. In fact, it is assumed that there could be innumerable phase space distributions (truly an infinite number of them). This assumption depends necessarily on a philosophical perspective according to which there is no underlying reality (regarding the action of counting on phase-space) to which one could refer to sustain some unique $F(q, p)$. Or, at least (in the weaker version), if there is such an underlying reality, its state (the way the phase space is filled) can change depending on how one observes the phenomenon. Thus, the phenomenon is observed in such ways that sometimes the filling of its phase-space is given by Husimi’s distribution and sometimes by Wigner’s distribution. This is consistent, of course, with an interpretation sustaining that the observers define the quantum mechanical operators. Indeed, in the realm of traditional ways of interpreting Quantum Mechanics, the operators on phase-space have a mathematical definition that depends on the statistical distributions. Thus, a change in the way of observation also implies a change in the way the system’s phase-space is filled, without changing the values of the averages measured.
In fact, it is an interesting feature of this kind of interpretation that the fundamental constraint it uses is the complete agreement with respect to the final observation, regardless of the observers involved; it is as if the observers could change the underlying reality of the counting in phase-space without changing the final outcome of such a counting when average values are concerned, an interesting merging of subjectivism (given by the use of active observers) with objectivism (their final agreement). In such an interpretation, the phase-space has no objective reality, although the averages do. This may explain the whimsical way physicists take the result that distributions like Wigner’s do not always assume positive values, something unacceptable if one considers $F(q, p)$ as the result of some statistical counting process. Preposterous as such a way of thinking may be, we can turn our attention to the central idea according to which in Quantum Mechanics the operators are mathematical entities that depend on the distributions. This is far from usual. In Classical Physics, the functions of $p$ and $q$ being averaged do not depend on the distribution; they remain fixed, no matter which distribution is used. If we understand the reasons for the quantum perspective, it may help us with the difficult task of understanding how a phase-space theory for Quantum Mechanics should be developed. Indeed, in the way Quantum Mechanics was historically developed, it was impossible to impose some functional form for the operators in a way independent of the distributions (densities or amplitudes). For instance, in classical situations, we are quite sure of the way one should calculate the average value of the quantity $q^2p^2$, and we know that in advance, without any reference to the probability distribution. We just calculate

$$\langle q^2p^2\rangle^{CL}_{av} = \iint q^2p^2\,F_{CL}(q, p; t)\,dq\,dp. \tag{8.3}$$
There is no doubt about that, since this is, indeed, the way that we define the probability distribution functions¹. In the classical framework, if one has a distribution function $F(q, p; t)$ that furnishes occupation values for cells in phase-space different from the experimental results, then $F(q, p; t)$ is, quite simply, the wrong distribution. In Quantum Mechanics, for reasons that will soon become explicit, it is impossible to advance the mathematical expression of an operator without choosing the phase-space representation by means of some phase-space distribution function (Wigner’s, Husimi’s, Normal Ordered, etc.). For instance, we have no clue about the mathematical expression for the operator $O[q^2p^2]$ until an underlying $F(q, p; t)$ is assumed for the phenomenon we are investigating. Some would say that this is an essential feature of Quantum Mechanics (and we have no doubt that this is the place where one would likely stress the weird, strange or mysterious character of the theory). We would like to disagree on that matter. We believe that this state of affairs should not be considered essential to Quantum Mechanics. Instead, it should be considered as reflecting one of its deficiencies, which follows quite naturally from the fact that the Schrödinger equation was one of the axioms of the theory. This becomes obvious from the present approach and its results. In the present approach, the Schrödinger equation is nothing but a theorem, coming from the axioms of the theory. These axioms allow us to always know in advance how to calculate the average value of things like $q^2p^2$. Indeed, it is calculated in exactly the same manner as usual in classical frameworks; we just write

$$\langle q^2p^2\rangle^{QM}_{av} = \iint q^2p^2\,F_{QM}(q, p; t)\,dq\,dp.$$
Thus, one may ask about the difference between this result and the one coming from (8.3). The difference comes not from the way we represent $q^2p^2$ to calculate $\langle q^2p^2\rangle_{av}$, but from the distribution function $F_{QM}(q, p; t)$ that we use to make such a calculation. When we use $F_{QM}(q, p; t)$, we get averages of systems that behave quantum mechanically. This is so because this distribution embraces the results of the Schrödinger equation. If we use some $F_{CL}(q, p; t)$, reflecting the results of some classical equation (say, Boltzmann’s), we get different results. The moral of the story here is that one should expect differences in calculating average values because there are differences in the probability distributions (each one being the solution of a different equation that represents a different phenomenon), and not because of differences both in the probability distributions and in the way one constructs the mathematical objects that furnish the average results (the phase-space operators). Thus, in the present approach there is no room for some “redefinition” of the phase-space operators, let alone for an interpretation that sustains such things. This poses the problem of knowing which distribution $F(q, p; t)$ is the correct one, given some physical system. Of course, we sustain that the true distribution is the one we found with our derivation process, but this is far from being unanimous. The reasons for that are the subject of the next section, in which we will show some inconsistencies that appear in the way the subject is usually treated.

¹ The knowledge of all statistical moments of the form $p^n q^k$ is the definition of the characteristic function that defines $F(q, p)$ by inversion of a Fourier transform.
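The statement that the recipe for $\langle q^2p^2\rangle$ is fixed once the distribution is given can be made concrete with a short numerical sketch. The example below is only an illustration: the uncorrelated Gaussian `gaussian_density` is a hypothetical stand-in for $F(q, p)$ and is not one of the distributions discussed in the text, and a simple midpoint rule is used for the double integral. The same routine would be applied unchanged to any other $F(q, p)$.

```python
import math

def gaussian_density(q, p, sigma_q, sigma_p):
    """Hypothetical uncorrelated Gaussian F(q, p), normalized to one."""
    norm = 2 * math.pi * sigma_q * sigma_p
    return math.exp(-q * q / (2 * sigma_q ** 2) - p * p / (2 * sigma_p ** 2)) / norm

def phase_space_average(f, density, q_max, p_max, n=200):
    """Midpoint-rule estimate of the double integral of f*density over a box."""
    dq, dp = 2 * q_max / n, 2 * p_max / n
    total = 0.0
    for i in range(n):
        q = -q_max + (i + 0.5) * dq
        for j in range(n):
            p = -p_max + (j + 0.5) * dp
            total += f(q, p) * density(q, p)
    return total * dq * dp

if __name__ == "__main__":
    density = lambda q, p: gaussian_density(q, p, 1.0, 0.5)
    print(phase_space_average(lambda q, p: q * q * p * p, density, 8.0, 4.0))
```

For this particular choice the average factorizes into $\sigma_q^2\sigma_p^2$, which gives a simple check of the numerical integration.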
8.1. THE NOTION OF A PROBABILITY DISTRIBUTION IN QUANTUM MECHANICS

The way we develop the formal structures and interpretation of Quantum Mechanics can be used to clarify a long-lasting problem in this theory. Quantum mechanics was originally formulated in the configuration space representation and only some time later was its momentum space representation proposed [8]. Moreover, in 1932, Wigner proposed [43] a phase space representation of quantum mechanics that has received a lot of attention ever since. For some time, Wigner’s distribution was considered the only phase space distribution function related to quantum phenomena. However, in recent decades a large number of alternative phase space distributions have appeared, of which Husimi’s distribution is one example. There are a number of properties that each one of these distributions is made to obey and, generally, we evaluate them precisely with respect to the number of properties obeyed. The main properties that one would wish general quantum phase space distributions $F^Q(q, p)$ to obey are:

1. its marginal probability densities must be those expected for the quantum phenomenon, that is:

$$\int_{-\infty}^{+\infty}F^Q(q, p)\,dp = \rho(q), \qquad \int_{-\infty}^{+\infty}F^Q(q, p)\,dq = \pi(p),$$

where $\rho(q)$ is the usual configuration space probability density function and $\pi(p)$ is the momentum space probability density function defined as $\pi(p) = \varphi^*(p)\varphi(p)$, where it is assumed that

$$\varphi(p) = \int_{-\infty}^{+\infty}\psi(q)\,e^{ipq}\,dq,$$

$\psi(q)$ being the configuration space probability amplitude (or wave function)²;

2. being a probability density function, it should be positive definite;

3. it must be invariant with respect to space and time reflections, i.e., if $\psi(q) \to \psi(-q)$ then $F(q, p) \to F(-q, -p)$, and if $\psi(q) \to \psi^*(q)$ then $F(q, p) \to F(q, -p)$;

4. it must be Galilean invariant, i.e., if $\psi(q) \to \psi(q + a)$ then $F(q, p) \to F(q + a, p)$, and if $\psi(q) \to \exp(ip'q/\hbar)\psi(q)$ then $F(q, p) \to F(q, p - p')$;

5. it must be bilinear in the wave function $\psi$;

6. it must be real valued for all $q$, $p$ and $t$;

7. the set of functions $F_{nm}(q, p)$ constructed from the bilinear pair $\psi_n$, $\psi_m$ (which are eigenstates of some physical system) forms a complete orthonormal set, in the sense that (see [95], p. 166)

$$\int dq\int dp\,F^*_{n'm'}(q, p)\,F_{nm}(q, p) = \frac{1}{2\pi\hbar}\delta_{nn'}\delta_{mm'}$$

and

$$\sum_{n,m}F_{nm}(q, p)\,F^*_{nm}(q', p') = \frac{1}{2\pi\hbar}\delta(q - q')\,\delta(p - p');$$

8. and many other less fundamental properties (see [46]).

² The fact that this property implies a different axiom of the theory is never stressed, giving the impression that this follows naturally from more fundamental quantum mechanical principles. As we pointed out before, the construction of the momentum marginal distribution by the way just prescribed is a different and independent axiom of the theory.

These properties may be viewed as constraints, with different degrees of importance, defining the physical acceptability of some $F^Q(q, p)$. In fact, Wigner’s distribution satisfies most of these constraints (constraints that, after all, were conceived with it in mind), except for being positive definite (for some states of some systems), and this is the reason why we call it a probability distribution function instead of a probability density function. On the other hand, Husimi’s distribution does satisfy the requirement of positiveness, but fails in producing a probability density with property one, for instance. In fact, it has already been shown that any distribution having property two will lack property one. Thus, the scenario of quantum mechanical probability densities/distributions is populated by functions that satisfy some of the requirements 1-5, but not all of them. In Table 12 we show some of these distributions and the properties they obey. For each one of these distributions there exists a dynamical equation of which they are the solutions; these distributions also induce different operator formation rules (association rules). The most widely known of these dynamical equations, of course, is the Wigner-Moyal phase space dynamical equation given by

$$\frac{\partial F^W}{\partial t} = -\frac{p}{m}\frac{\partial F^W}{\partial q} + \frac{\partial V}{\partial q}\frac{\partial F^W}{\partial p} + \sum_{n=1}^{\infty}\frac{(\hbar/2i)^{2n}}{(2n+1)!}\frac{\partial^{2n+1}V}{\partial q^{2n+1}}\frac{\partial^{2n+1}F^W}{\partial p^{2n+1}},$$
where $V(q)$ is the potential function, as usual. We stress at this point that this is, in general, an equation with infinitely many terms, since this will be important for us later on. In 1966, Cohen showed [96] that one may produce an infinite number of such phase space probability density functions, some of them with the positive definiteness property [97]. Each one of these functions obeys some of the requirements given by the properties previously mentioned. Cohen’s result raised the excruciating question of which of these functions is really the quantum mechanical distribution, a topic that received some attention in the literature. This problem, however, was “solved” in recent years by brute force. Indeed, it is now generally accepted that it is irrelevant which one of these functions one uses. In fact, from the physical point of view, they are generally considered identical to one another, differing only in how easy it is to actually work with them (an operational difference, not a physical one) [95]. This, of course, would dissolve the problem instead of solving it, showing that the old question about the real phase space distribution was, in fact, meaningless. However, it is not difficult to show that this is wrong, and we do so in what follows.

Table 12. Properties of the distribution functions (taken from Lee, op. cit., p. 166).

                                  Properties
Distribution function             1      2      5      6      7
Wigner (F^W)                      yes    no     yes    yes    yes
Standard ordered (F^S)            yes    no     yes    no     yes
Antistandard ordered (F^AS)       yes    no     yes    no     yes
Normal ordered (F^N)              no     no     yes    yes    no
Antinormal ordered (F^AN)         no     yes    yes    yes    no
Husimi (F^H)                      no     yes    yes    yes    no
Present work (F^O)                no     yes    yes    yes    yes

The argument for their identity goes as follows.
Claim: a given function $F^Q(q,p)$ induces some operator formation rule; when each $F^Q(q,p)$ is used with its own induced operators and integrated over the whole space, they always produce the same numerical results. Thus, statements such as the one presented below abound in the literature.

“To conclude this section, we emphasize that the distribution functions $F^W$, $F^S$, $F^{AS}$, $F^N$ and $F^{AN}$ [Wigner’s, standard and anti-standard, normal and anti-normal distribution functions, respectively] are all capable of yielding the [same] expectation value of any arbitrary operator as long as it can be expanded in powers of $\hat{q}$ and/or $\hat{p}$. All these distribution functions are thus equal in the sense that they contain all the physically meaningful information about the system being considered. Neither of these distribution functions contains either more or less information than any other.” (Cf. [95], p. 159)
There are, however, qualifications that should be made with respect to the previous argument. We address them in what follows.

8.1.1. The $\hat{1}$ operator

The “operator” $\hat{1}$ has a strong statistical interpretation (after all, the concept of a distribution is statistical, not physical; physics says what this distribution should mean and to what it should refer). Indeed, the only way we can say that a function $F(\xi)$ is a statistical density/distribution for the variable $\xi$ is if the integral

$$\int_{\xi_a}^{\xi_b} F(\xi)\,d\xi \tag{8.4}$$

gives the probability of $\xi$ being within the interval $[\xi_a, \xi_b]$. This is the very definition of a probability density/distribution.

If we assume that the operator $\hat{1}$ is always taken into itself when passing from one quantum mechanical phase space distribution function $F^A(q,p)$ to another $F^B(q,p)$, then the argument about the indifference of choosing one of these distributions would be (at first sight) acceptable. After all, the integral of any distribution over the entire space is simply one, assuming that they are normalized. However, this is not actually the case: the above argument always has integration over the whole phase space in mind, but this is a very particular situation as compared with (8.4). Indeed, what is always missing in the literature is the specification that the results of phase space integrations used to calculate average values of quantum mechanical operators must be compared as integrated over the whole phase space [95].

Of course, each quantum mechanical $F^Q(q,p)$ must integrate to 1. Now (and this should be obvious from the start), take Wigner’s distribution $F^W(q,p)$ and Husimi’s $F^H(q,p)$ and integrate the first one with respect to the “operator” $\hat{1}$ (that is, as a true probability distribution function) exactly within the phase space region $\Omega_W(q,p)$ within which $F^W$ is negative. The result, of course, must be negative. However, since Husimi’s function is positive definite, there is no way it can be integrated (over whatever phase space region one prescribes) to give a negative result (assuming that it must be integrated with the operator $\hat{1}$). Thus, we have the following situation: either these two distributions cannot induce the same operator $\hat{1}$, or they cannot both be considered the correct probability density function, since they do not furnish the same quantum mechanical result for the problem under scrutiny.
Of course, one may still argue that to the phase-space representation of the operator $\hat{1}$, related to $F^W(q,p)$, there corresponds another phase-space representation given by the operator $f(q,p)$, related to $F^H(q,p)$, such that the integration

$$\int_{\Omega_H(q,p)} f(q,p)\,F^H(q,p)\,dq\,dp,$$

for some region $\Omega_H(q,p)$ of the phase space, gives exactly the same result as the integration

$$\int_{\Omega_W(q,p)} F^W(q,p)\,dq\,dp.$$

However, in this case the Wigner quantum mechanical operator $1^W_Q$ is taken into the Husimi quantum mechanical operator $f(q,p)$; should we not then consider $f(q,p)F^H(q,p)$ as the correct probability density? And since this would vary for each $\Omega_W(q,p)$, that is, since for each $\Omega_W(q,p)$ we will find infinitely many pairs $f(q,p)$, $\Omega_H(q,p)$ that arrive at the same result, does this not show that the so-called indifference in the choice of the phase-space probability distribution is a myth? Let us see some examples.

Example: the Wigner distribution function can be related to the anti-normal distribution function $F^{AN}(q,p,t)$ by means of the expression (see [95], p. 170)

$$F^{AN}(q,p,t) = \frac{1}{\pi\hbar}\int dq' \int dp'\, \exp\left[-\frac{m\omega(q'-q)^2}{\hbar} - \frac{(p'-p)^2}{\hbar m\omega}\right] F^W(q',p',t).$$
For the harmonic oscillator, we have (see [95], p. 167)

$$F_n^W(q,p) = \frac{(-1)^n}{\pi\hbar}\, e^{-2H/\hbar\omega}\, L_n\!\left(\frac{4H}{\hbar\omega}\right),$$

where

$$H = \frac{1}{2}m\omega^2 q^2 + \frac{p^2}{2m},$$

$n$ is the quantum number fixing the state, and $L_n$ is the Laguerre polynomial of order $n$. The anti-normal distribution becomes

$$F_n^{AN}(q,p) = \frac{H^n e^{-H}}{2\pi\hbar\, n!},$$

which is always positive, since $H$ is always positive. Assume, as an example, that $n = 1$; thus

$$F_1^W(q,p) = -\frac{1}{\pi\hbar}\, e^{-2H/\hbar\omega}\left(1 - \frac{4H}{\hbar\omega}\right);$$
for this state, we can now calculate the average value of any function $f(q,p)$ that can be expanded in powers of $q$ and $p$ by simply integrating

$$\langle f(q,p)\rangle_1^W = \int f(q,p)\,F_1^W(q,p)\,dq\,dp.$$

Thus, we can take $f(q,p) = 1$ and integrate over the region for which $H \le \hbar\omega/4$ to find the probability of the physical system being in a phase space region in which the energy is less than $\hbar\omega/4$. To simplify, put $m = \hbar = \omega = 1$ and $R^2 = 2H$, such that $R\,dR\,d\theta = dq\,dp$, and thus

$$P^W(H \le 1/4) = -\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^{\sqrt{2}/2} e^{-R^2}\,(1 - 2R^2)\,R\,dR\,d\theta = -2e^{-1/2} + 1 = -0.21306.$$
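The probability is negative, whatever this means³. If we simply integrate $F_1^{AN}(q,p)$ over whatever region of the phase space, we will never get a negative value. The anti-normal average value of any operator $\hat{A}(\hat{q},\hat{p})$ represented as powers of $\hat{q}$ and $\hat{p}$ can be found by writing the operator in terms of (see [95], p. 156)

$$\hat{a} = \frac{1}{\sqrt{2\hbar m\omega}}\,(m\omega\hat{q} + i\hat{p}), \qquad \hat{a}^\dagger = \frac{1}{\sqrt{2\hbar m\omega}}\,(m\omega\hat{q} - i\hat{p}),$$

then expressing $\hat{A}(\hat{q},\hat{p})$ in terms of $\hat{a}$ and $\hat{a}^\dagger$, with $\hat{a}$ always preceding $\hat{a}^\dagger$, and finally replacing $\hat{a}$ and $\hat{a}^\dagger$ by the functions (see [95], p. 158)

$$\hat{a} \to \alpha = \frac{1}{\sqrt{2\hbar m\omega}}\,(m\omega q + i p), \qquad \hat{a}^\dagger \to \alpha^* = \frac{1}{\sqrt{2\hbar m\omega}}\,(m\omega q - i p)$$

to form the function $A(q,p)$, which is then integrated as

$$\langle \hat{A}(\hat{q},\hat{p})\rangle^{AN} = \int A(q,p)\,F^{AN}(q,p)\,dq\,dp.$$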
Thus, for the operator $\hat{A}(\hat{q},\hat{p}) = 1$, we have $A^W(q,p) = 1$, and also $A^{AN}(q,p) = 1$. It is quite obvious that

$$\int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} F^W(q,p)\,dq\,dp = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} F^{AN}(q,p)\,dq\,dp = 1,$$

since they must all be normalized (not obeying this property would be too much, even for the most unorthodox physicist). However, if we calculate $P^{AN}(H \le 1/4)$ we find (constants are kept equal to 1)

$$P^{AN}(H \le 1/4) = \int_0^{2\pi}\!\!\int_0^{\sqrt{2}/2} \frac{R^2}{4\pi}\, e^{-R^2/2}\, R\,dR\,d\theta = -\frac{5}{4}\,e^{-1/4} + 1 = +0.0265,$$
which is obviously positive. In fact, it does not matter if we change the integration region, since the integrand is positive everywhere. Thus, although one can always calculate averages of any operator $\hat{A}(\hat{q},\hat{p})$ using any distribution $F^*(q,p)$ by just adjusting the underlying association rule, the results agree only if one is calculating the average over the whole space.

³ Optimists may try to interpret this as saying that it is impossible for the physical system to occupy such regions in phase space, since the energy of the $n = 1$ state is given by 3/2. However, let us remember that this energy is an average value. Moreover, impossibility in statistics refers to probability zero, not negative probability.

Example: as a second example, let us calculate the average energy of the harmonic oscillator in the first excited state for the two distributions $F_1^W(q,p)$ and $F_1^{AN}(q,p)$, (a) first integrating over the complete phase space and then (b) integrating over the region in which $F_1^W(q,p)$ is negative. The first integration, using Wigner’s distribution, is given by

$$\langle H\rangle_1^W = \int\!\!\int \left(\frac{p^2}{2} + \frac{q^2}{2}\right) F_1^W(q,p)\,dq\,dp = 3/2.$$

The integration using the anti-normal distribution is obtained by first noting that

$$\hat{H}(\hat{a},\hat{a}^\dagger) = \hat{a}^\dagger\hat{a} + 1/2 = \hat{a}\hat{a}^\dagger - 1/2,$$

giving $H^{AN}(q,p) = H - 1/2$ and thus

$$\langle H\rangle_1^{AN} = \int\!\!\int \left(\frac{p^2}{2} + \frac{q^2}{2} - \frac{1}{2}\right) F_1^{AN}(q,p)\,dq\,dp = 3/2.$$

Thus, the results are the same, obviously. Now, (b) must be given by
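$$\langle H\rangle^{W}_{1,\mathrm{region}} = -\frac{1}{\pi}\int_0^{\sqrt{2}/2}\!\!\int_0^{2\pi} \frac{R^2}{2}\, e^{-R^2}\,(1 - 2R^2)\,R\,dR\,d\theta = -\frac{5}{2}\,e^{-1/2} + \frac{3}{2} = -0.0163,$$

while

$$\langle H\rangle^{AN}_{1,\mathrm{region}} = \int_0^{\sqrt{2}/2}\!\!\int_0^{2\pi} \left(\frac{R^2}{2} - \frac{1}{2}\right) \frac{R^2}{4\pi}\, e^{-R^2/2}\, R\,dR\,d\theta = -\frac{31}{16}\,e^{-1/4} + \frac{3}{2} = -0.0089,$$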
which are obviously different. In fact, if we keep the upper limit of the $R$ integration as a variable $r$, we find

$$\langle H\rangle^{W}_{1,\mathrm{region}} = -\frac{1}{\pi}\int_0^{r}\!\!\int_0^{2\pi} \frac{R^2}{2}\, e^{-R^2}\,(1 - 2R^2)\,R\,dR\,d\theta = -e^{-r^2}\left(r^4 + \frac{3}{2}r^2 + \frac{3}{2}\right) + \frac{3}{2},$$

and

$$\langle H\rangle^{AN}_{1,\mathrm{region}} = \int_0^{r}\!\!\int_0^{2\pi} \left(\frac{R^2}{2} - \frac{1}{2}\right) \frac{R^2}{4\pi}\, e^{-R^2/2}\, R\,dR\,d\theta = -e^{-r^2/2}\left(\frac{r^4}{4} + \frac{3}{4}r^2 + \frac{3}{2}\right) + \frac{3}{2},$$

meaning that equality obtains only when $r \to \infty$. In fact, it is possible to get equality if we put, only in the second result, $r \to \sqrt{2}\,r$, but this cannot be justified by any argument known to this author (and is, generally speaking, a property that is valid only for the harmonic oscillator problem in this very particular state, anyway). In any case, this strategy would not work in the case $\hat{A}(\hat{q},\hat{p}) = 1$, since no change in the integration interval can make the calculation of $\langle 1\rangle^{AN}_{1,\mathrm{region}}$ negative.
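The region integrals of the two examples above can be checked numerically. The following is only a minimal sketch, assuming the units $m = \hbar = \omega = 1$ used in the text and the polar substitution $R^2 = 2H$; the helper names (`simpson`, `region_average`, `wigner_n1`, `antinormal_n1`) are hypothetical and introduced here for illustration only.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

# Units m = hbar = omega = 1; q = R cos(theta), p = R sin(theta),
# so that H = R**2 / 2 and dq dp = R dR dtheta.
def wigner_n1(R):
    """Wigner function of the first excited state, as a function of R."""
    return -(1.0 / math.pi) * math.exp(-R**2) * (1.0 - 2.0 * R**2)

def antinormal_n1(R):
    """Anti-normal function of the first excited state, H exp(-H) / (2 pi)."""
    H = R**2 / 2
    return H * math.exp(-H) / (2.0 * math.pi)

def region_average(symbol, dist, r):
    """Integrate symbol(R) * dist(R) over the disc R <= r (the angle gives 2*pi)."""
    return 2.0 * math.pi * simpson(lambda R: symbol(R) * dist(R) * R, 0.0, r)

r0 = math.sqrt(2) / 2  # boundary of the region H <= 1/4

P_W = region_average(lambda R: 1.0, wigner_n1, r0)                   # about -0.21306
P_AN = region_average(lambda R: 1.0, antinormal_n1, r0)              # about +0.0265
H_W = region_average(lambda R: R**2 / 2, wigner_n1, r0)              # about -0.0163
H_AN = region_average(lambda R: R**2 / 2 - 0.5, antinormal_n1, r0)   # about -0.0089

print(P_W, P_AN, H_W, H_AN)
```

Calling `region_average` with a large radius (for instance `r = 12.0`) recovers the whole-space values 1 and 3/2 for both distributions, while evaluating the anti-normal energy integral at `r = 1.0` (that is, $\sqrt{2}\,r_0$) reproduces the Wigner value at `r0`, which is the rescaling remarked upon in the text.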
From these two examples comes the quite obvious conclusion: the distributions $F^W(q,p)$ and $F^{AN}(q,p)$ are not equivalent.

There is, of course, one last resort for the desperate: one may maintain that both the operator (even if it is $\hat{1}_Q$) and the integration domain should be modified, the domains becoming equal in all cases only when they become the whole space. However, this would throw the field into the wildest confusion, since one would first have to find, by the usual means, the operator induced by $F^Q(q,p)$, and then calculate, for each quantum mechanical state, the modification of the integration domain that would render the results equal. Moreover, even this last procedure, preposterous as it may appear, would not work when the regions giving negative probability values are considered, if one keeps the prescription that the operator $\hat{1}$ should be taken into itself. Furthermore, the modification of the integration domain seems too arbitrary; for example, if one asks for the probability of finding the harmonic oscillator system with energy less than $H_0$, it would be ridiculous to integrate some distribution $F^A$ using $\Omega_A = [0, H_0]$ while integrating another distribution $F^B$ using $\Omega_B = [0, H_0^*]$, with $H_0 \neq H_0^*$. What, then, does $H_0^*$ stand for? Does it stand for the fact that some “observer”-like action changed the filling of the phase space? But then it is a mystery that, despite this disastrous change in the phase space properties, the averages, when calculated with respect to the whole space, still furnish the same results. In any case, one could not control the relation between the choice of the integration domain (when it is not infinite) and some process of observation, since there is no prescription to link such things.

Thus, whenever the observation of finite regions of space is considered, one would have to wait for the experimental results to search for the appropriate interval of integration and the adequate probability density function. Thus, the reassuring argument that all these functions are physically identical cannot be accepted, and the question about which one is the real quantum mechanical distribution must be reconsidered. This is one of the aims of the present chapter.
8.2. OPERATOR FORMATION IN QUANTUM MECHANICS

Closely related to the question about the true phase-space probability distribution in quantum mechanics is the existence of many different ways in Quantum Mechanics to define how a quantum mechanical operator should be formed. There are, generally speaking, two different classes of approach to the problem.

The first strategy does not establish a connection between operators and phase space probability distributions. The methods of this kind of approach rely on the assumption of certain rules. Two methods of this kind are [98] von Neumann’s and Dirac’s rules. Given that $O[A]$ is the operator of some function $A$, von Neumann’s rules are ([99], p. 9)

$$A \to O[A] \;\Rightarrow\; f(A) \to f(O[A]),$$
$$A \to O[A],\; B \to O[B] \;\Rightarrow\; A + B \to O[A] + O[B],$$

where the commutativity of $O[A]$ and $O[B]$ is not assumed. On the other hand, Dirac’s rule reads ([98],[99])

$$O[A]\,O[B] - O[B]\,O[A] = i\hbar\{A, B\},$$
where $\{A, B\}$ is the classical Poisson bracket. As an example, the operators related to the classical function $f(q,p) = q^2p^2$ are, according to von Neumann’s rules ([98], p. 9,10),

$$O[q^2p^2] = \begin{cases} O[q]^2O[p]^2 - 2i\hbar\,O[q]O[p] - \hbar^2/4 \\ O[q]^2O[p]^2 - 2i\hbar\,O[q]O[p] - \hbar^2 \end{cases},$$

while, according to Dirac’s rules ([99], p. 19),

$$O[q^2p^2] = \begin{cases} O[q]^2O[p]^2 - 2i\hbar\,O[q]O[p] - \hbar^2/3 \\ O[q]^2O[p]^2 - 2i\hbar\,O[q]O[p] - 2\hbar^2/3 \end{cases},$$

which are ambiguous and thus contrary to the postulates of quantum mechanics. Indeed, it is imperative that each “observable” be assigned to one and only one operator. In fact, since these rules give different operators for the same function $q^2p^2$, they also give different average values for the same “observable”, which is unacceptable, since these average values are precisely the values of the “observables” in each process of measurement (in terms of the usual interpretation).

The second strategy does rely on the connection between probability distributions and the operator formation rule they define. For instance, Weyl’s rule (Wigner’s distribution) implies that

$$O[q^2p^2] = O[q]^2O[p]^2 - 2i\hbar\,O[q]O[p] - \hbar^2/2,$$

which is unambiguous, but has the drawback that, for the harmonic oscillator problem, for instance, with the Hamiltonian given by

$$H = \frac{1}{2}(p^2 + q^2),$$

this rule gives

$$O[H^2] = O[H]^2 + \hbar^2/4,$$

which is equivalent to saying that it predicts energy dispersions for the energy eigenstates (this is also contrary to the orthodox epistemological interpretation of Quantum Mechanics, which assumes that no dispersion is to be found when considering eigenstates of the operator). In this example, the last equation says that the dispersion in energy will be given by

$$\Delta E = \sqrt{\langle O[H^2]\rangle - \langle O[H]\rangle^2} = \hbar/2$$

for all energy levels. Weyl’s rule also has the disadvantage that some functions of classical constants of motion are transformed into operators which do not commute with the Hamiltonian and are not constants of motion on the quantum mechanical side. This may be shown using the Hamiltonian [98]

$$H = p^2/2 + q^4/4,$$

for which Weyl’s rule gives

$$O[H^2] = \{O[H]\}^2 + 3\hbar^2 q^2/4,$$

implying that $O[H^2]$ does not represent a constant of motion, since it does not commute with $H$. In short, the acceptance of Weyl’s rule implies that we do not accept the first of von Neumann’s rules,

$$O[A] = A \;\Rightarrow\; O[f(A)] = f(A).$$

Leaving aside the first strategy, since it is ambiguous in the production of operators, we are still left with the second strategy. This type of approach, although unambiguous, has the characteristic of furnishing different operators when different phase-space probability distributions are used. Since we have shown that there is no possibility of considering all phase-space probability distributions as equivalent, the quest for the true phase-space distribution is also the quest for the true operator formation rule.

On the other hand, it is possible to get rid of the need to even present such a rule. Indeed, this is the obvious result of the present approach since in it, because of the axioms, we must always form average values of phase-space functions $f(q,p)$ with the expression

$$\langle f(q,p)\rangle_{av} = \int\!\!\int f(q,p)\,F_Q(q,p;t)\,dq\,dp, \tag{8.5}$$

where

$$F_Q(q,p;t) = \frac{\rho(q;t)}{\sqrt{2\pi\sigma^2(q;t)}}\,\exp\left[-\frac{(p - p(q;t))^2}{2\sigma^2(q;t)}\right],$$

and $\rho(q;t)$, $p(q;t)$ and $\sigma^2(q;t)$ come from the solutions of the Schrödinger equation. Thus, to find any average value in the present approach, one must simply solve the Schrödinger equation to find the correct probability amplitudes, take $\rho(q;t)$, $p(q;t)$ and $\sigma^2(q;t)$ formed from those probability amplitudes and write $F_Q$ as prescribed by the last result. The average value of any phase space function is thus calculated as in (8.5). This result, together with the fact that we already have a method to arrive at the Schrödinger equation for any physical system (a quantization procedure), implies that, within the present approach, there is no need to even speak about operator formation.
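The Weyl-rule expressions quoted in this section can be verified mechanically. The sketch below is only an illustration under stated assumptions: operators act on polynomials in $x$ (stored as coefficient lists), with $O[q]$ taken as multiplication by $x$ and $O[p]$ as $-i\hbar\,d/dx$, Weyl ordering is taken as the average over all orderings of the factors, and $\hbar = 1$. All function names are hypothetical.

```python
from itertools import permutations

HBAR = 1.0

def q_op(poly):
    """Position operator: multiply the polynomial by x."""
    return [0j] + list(poly)

def p_op(poly):
    """Momentum operator: -i*hbar d/dx acting on the polynomial."""
    return [-1j * HBAR * k * poly[k] for k in range(1, len(poly))] or [0j]

def apply_word(word, poly):
    """Apply an operator product such as 'qqpp' = q q p p (rightmost factor acts first)."""
    for letter in reversed(word):
        poly = q_op(poly) if letter == "q" else p_op(poly)
    return poly

def add(a, b, scale=1.0):
    """Return a + scale * b, padding the shorter coefficient list with zeros."""
    n = max(len(a), len(b))
    a = list(a) + [0j] * (n - len(a))
    b = list(b) + [0j] * (n - len(b))
    return [x + scale * y for x, y in zip(a, b)]

def scaled(poly, c):
    return [c * x for x in poly]

def max_abs(poly):
    return max(abs(c) for c in poly)

def weyl(word, poly):
    """Weyl (totally symmetric) ordering: average over all orderings of the factors."""
    orderings = list(permutations(word))
    result = [0j]
    for w in orderings:
        result = add(result, apply_word(w, poly), 1.0 / len(orderings))
    return result

def weyl_q2p2_closed_form(poly):
    """q^2 p^2 - 2 i hbar q p - hbar^2 / 2, the expression quoted in the text."""
    out = apply_word("qqpp", poly)
    out = add(out, apply_word("qp", poly), -2j * HBAR)
    return add(out, poly, -HBAR**2 / 2)

def h_op(poly):
    """Harmonic oscillator Hamiltonian (p^2 + q^2) / 2."""
    return scaled(add(apply_word("pp", poly), apply_word("qq", poly)), 0.5)

def weyl_h2(poly):
    """Weyl operator of H^2 = (p^4 + q^4 + 2 q^2 p^2) / 4."""
    total = add(apply_word("pppp", poly), apply_word("qqqq", poly))
    return scaled(add(total, weyl("qqpp", poly), 2.0), 0.25)

psi = [1 + 0j, 2 + 0j, -1 + 0j, 0.5 + 0j, 3 + 0j]  # arbitrary test polynomial

# Residuals of the two identities; both should vanish up to rounding.
residual_q2p2 = max_abs(add(weyl("qqpp", psi), weyl_q2p2_closed_form(psi), -1.0))
residual_h2 = max_abs(add(add(weyl_h2(psi), h_op(h_op(psi)), -1.0), psi, -HBAR**2 / 4))
print(residual_q2p2, residual_h2)
```

Polynomials are used only because both operators map them into polynomials, so the identities can be tested with exact coefficient arithmetic; the check says nothing about which ordering rule is physically preferable.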
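8.3. THE DYNAMICAL EQUATIONS

As we have mentioned in the previous section, there are different dynamical equations connected to the numerous probability distribution functions. Wigner’s dynamical equation was already presented and can be written as

$$\frac{\partial F^W}{\partial t} = -\frac{p}{m}\frac{\partial F^W}{\partial q} + \frac{\partial V}{\partial q}\frac{\partial F^W}{\partial p} + \sum_{n=1}^{\infty}\frac{(\hbar/2i)^{2n}}{(2n+1)!}\,\frac{\partial^{2n+1}V}{\partial q^{2n+1}}\,\frac{\partial^{2n+1}F^W}{\partial p^{2n+1}},$$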
which is an equation with, formally, an infinite number of terms (for some specific problems, such as the harmonic oscillator, it can have a finite number of terms because of the mathematical form of the potential). For the antinormal-ordered distribution function $F^{AN}(q,p,t)$ we have (see [95], p. 184)

$$\frac{\partial F^{AN}}{\partial t} = \frac{2}{\hbar}\exp\left[\frac{\hbar}{2m\omega}\frac{\partial}{\partial q_1}\frac{\partial}{\partial q_2} + \frac{\hbar m\omega}{2}\frac{\partial}{\partial p_1}\frac{\partial}{\partial p_2}\right]\sin\left[\frac{\hbar}{2}\left(\frac{\partial}{\partial q_1}\frac{\partial}{\partial p_2} - \frac{\partial}{\partial q_2}\frac{\partial}{\partial p_1}\right)\right] \times \tilde{H}^{AN}(q_1,p_1)\,F^{AN}(q_2,p_2,t)\Big|_{q_1=q_2=q,\;p_1=p_2=p},$$

where we recall that $\tilde{H}^{AN}(q,p)$ has to be obtained from $H(q,p)$ by means of the process mentioned in the two previous examples and is, in general, different from $H(q,p)$ (see the second example in section 8.1). Other distributions have different dynamical equations with different terms. In fact, the standard-ordered distribution function $F^S(q,p,t)$ satisfies the dynamical equation

$$\frac{\partial F^{S}}{\partial t} = \frac{2}{\hbar}\exp\left[i\frac{\hbar}{2}\left(\frac{\partial}{\partial q_1}\frac{\partial}{\partial p_2} - \frac{\partial}{\partial q_2}\frac{\partial}{\partial p_1}\right)\right]\sin\left[\frac{\hbar}{2}\left(\frac{\partial}{\partial q_1}\frac{\partial}{\partial p_2} - \frac{\partial}{\partial q_2}\frac{\partial}{\partial p_1}\right)\right] \times H(q_1,p_1)\,F^{S}(q_2,p_2,t)\Big|_{q_1=q_2=q,\;p_1=p_2=p},$$

and similarly for all distributions mentioned in Table 12.

The important point to stress here is that the usual claim is that all these equations have infinitely many terms and are of a type generally known as Kramers-Moyal dynamical equations. This claim is also problematic, since the terms in the infinite expansions (for all distributions) depend on derivatives of the potential function with respect to the position. Indeed, it is quite easy to prove that, for potentials of the form $q^2$, Wigner’s dynamical equation has the same appearance as the Liouville equation. In fact, choosing the potential function as $V(q) = q^n$ will make Wigner’s dynamical equation have a finite number of terms, related to $n$. But this result implies that Wigner’s dynamical equation cannot be a Kramers-Moyal type equation. In fact, it has already been proved by many authors [66] that Kramers-Moyal dynamical equations must have either two or fewer terms or else an infinite number of terms. It is impossible to truncate the Kramers-Moyal equation at term six, for instance, without introducing inconsistencies in the relations between the statistical moments of the underlying problem. Since Wigner’s distribution satisfies an equation that can have, for specific systems, a finite number of terms, either this distribution does not refer to any statistical process related to Kramers-Moyal equations (which are extremely general), or this distribution induces inconsistencies in the statistical formalism one assumes it describes. This argument applies to all probability distributions of Table 12.

On the other hand, in the present work we have found that it is possible to derive the Schrödinger equation by different mathematical paths, and these paths led us to the phase space distribution function

$$F^O(q,p,t) = \frac{\rho(q,t)}{\sqrt{2\pi\sigma^2(q,t)}}\,\exp\left\{-\frac{[p - \bar{p}(q,t)]^2}{2\sigma^2(q,t)}\right\},$$
where $\bar{p}(q,t)$ and $\sigma^2(q,t)$ are, respectively, the average momentum and the variance for each point $q$ of the configuration space. This function has a Gaussian structure in the momentum variable on each fiber labeled by $q$ in phase space. This Gaussian form is the natural expression coming from the truncation of the product

$$Z(q,\delta q,t) = \psi^*(q - \delta q/2, t)\,\psi(q + \delta q/2, t)$$

at second order in $\delta q$, which was justified by the Central Limit Theorem. In fact, the difference between this characteristic function and the one related to Wigner’s distribution is precisely the mathematical step of stopping at the second-order term, which is at the root of the obvious positiveness of $F^O(q,p,t)$. This type of approach is quite usual in statistical physics and can be found, for instance, in simple treatments of the random walk problem even in traditional textbooks (see [25], pp. 38, 39). The dynamical equation that $F^O(q,p;t)$ should solve is given by

$$\frac{\partial F_n^O}{\partial t} + \frac{p}{m}\frac{\partial F_n^O}{\partial q} - \left\{\frac{\partial V}{\partial q} - \frac{1}{2m}\frac{\partial v_n(q,t)}{\partial q}\left\{\frac{[p - p_n(q;t)]^2}{v_n(q,t)} - 3\right\}\right\}\frac{\partial F_n^O}{\partial p} = 0,$$

as shown in Chapter 3, which is an equation with a finite number of terms. We called it the stochastic Liouville equation.

The mathematical derivation method presented in this work leaves no freedom for the appearance of another phase space density function different from $F^O(q,p,t)$, given the scope of applicability of the formalism (see a discussion on this in what follows). Thus, the results of the previous section raise the question about the correctness of this function (and the inadequacy of all the others, since we have already shown that they cannot be equivalent).

The function $F^O(q,p,t)$, although satisfying properties 2-7, does not satisfy property 1, which, we recall, gives one way by which the momentum space density function can be obtained from the configuration space probability amplitude. True enough, this way of obtaining the marginal probability density on momentum space is the one Quantum Mechanics has always assumed to be the correct one, and it is fully compatible with Wigner’s function $F^W(q,p,t)$. In fact, it has already been shown ([44], [100], [101]) that there are no nonnegative distributions that are bilinear in the probability amplitude and that yield the quantum-mechanical marginal momentum density as obtained by the usual method shown in property 1.
Thus, the question about the correct phase space distribution function is also a question about the correctness of the procedure, mentioned in property 1 above, to find the momentum space marginal probability density function. In the next section we will show that these distribution functions are connected to different degrees of statistical and physical generality. We will also show that all the equations underlying the different distribution functions are related to different types of stochastic processes, which is then another obvious proof of their physical incompatibility.
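The remark in this section that the correction series in Wigner’s dynamical equation terminates for polynomial potentials can be illustrated with a few lines of code. This is a minimal sketch under the assumption that $V(q)$ is a polynomial stored as a coefficient list (`coeffs[k]` multiplies $q^k$); the function names are hypothetical. The $n$-th correction term is proportional to $\partial^{2n+1}V/\partial q^{2n+1}$, so it survives only while that derivative does not vanish identically.

```python
def derivative(coeffs, order):
    """Coefficients of the order-th derivative of sum_k coeffs[k] * q**k."""
    for _ in range(order):
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return coeffs

def quantum_correction_orders(coeffs):
    """Indices n >= 1 whose term in the series of Wigner's equation does not vanish."""
    orders = []
    n = 1
    # The (2n+1)-th derivative can be nonzero only if 2n+1 <= degree of V.
    while 2 * n + 1 < len(coeffs):
        if any(c != 0 for c in derivative(coeffs, 2 * n + 1)):
            orders.append(n)
        n += 1
    return orders

print(quantum_correction_orders([0, 0, 1]))            # V = q^2   -> []
print(quantum_correction_orders([0, 0, 0, 0, 0.25]))   # V = q^4/4 -> [1]
print(quantum_correction_orders([0] * 6 + [1]))        # V = q^6   -> [1, 2]
```

For $V = q^2$ no correction term survives, which is the Liouville-like form mentioned in the text; for higher even powers a finite, nonzero number of terms remains.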
8.4. ON DISTRIBUTIONS ASSUMING NEGATIVE VALUES

There is a further concern regarding specifically those distributions that assume negative values (Wigner’s, for instance, but not Husimi’s). Consider, for the sake of an example, the momentum space probability density function for the first excited state of the harmonic oscillator. This density function has the appearance shown in Figure 8-1(a).

Figure 8.1. Solutions for the first excited state of the Harmonic Oscillator: (a) momentum space density function, (b) Wigner’s distribution and (c) phase space contours of Wigner’s distribution (the gray area corresponds to the region where Wigner’s distribution becomes negative).
How can one interpret this result? If Figure 8-1(a) represents a probability density function, then it says that the probability of finding the system (particle, wave, whatever) with momentum $p$ in the vicinity of zero is zero. Of course, from the point of view of a corpuscular interpretation, the last sentence is ludicrous: it is obviously impossible for a periodic system to have probability zero for $p$ around zero (just think about the turning points). However, if we look at Figure 8-1(b) and (c), we become aware that our previous interpretation cannot be correct. Indeed, looking at Figure 8-1(c), we readily see that the system can occupy the vicinity of the origin in momentum space, for values of $q$ greater than 0.6, approximately. In Figure 8-1(c), as the arrow nearer to the origin shows, there are an infinite number of contours crossing the $q$ axis (and, thus, referring to $p = 0$). However, and this is the crucial point, when we calculate marginal distributions in one variable, we must integrate over the other variable. It is when we sum all the possibilities for each point $p$ that we end up with the result shown in Figure 8-1(a).

Thus, because of the negativity of Wigner’s distribution, the information conveyed by Figure 8-1(a) conflicts with the information conveyed by Figure 8-1(b). This kind of conflict never happens for positive definite density functions. In fact, the region of momentum space outside the gray area seems to behave, with respect to its probability description, as one should expect for any probability density function. Even this is not true: because of the negativity of the distribution, the probability of finding the system outside the gray area must be greater than one, which contradicts the very notion of probability. Therefore, even if one can make any sense of negative probabilities, this kind of conflict should make us suspicious of these distributions if we are to interpret them as probability distributions. Note that, for Wigner’s distribution, all these comments apply equally well to the configuration space probability density function (since Wigner’s distribution is symmetrical with respect to $q$ and $p$).

We may now contrast property one with our own method for defining $\pi(p)$. Of course, we have no alternative but to define

$$\pi_n(p) = \int F_n^O(q,p)\,dq = \int \frac{\rho_n(q)}{\sqrt{2\pi\sigma_n^2(q)}}\,\exp\left(-\frac{[p - p_n(q)]^2}{2\sigma_n^2(q)}\right)dq.$$

For $n = 1$ we get

$$F_1^O(q,p) = \frac{2\,|q|^3\exp(-q^2)}{\sqrt{\pi}\,\sqrt{\pi(1+q^2)}}\,\exp\left[-\frac{p^2q^2}{1+q^2}\right],$$

and its contours and phase space density function, respectively, are as shown in Figure 8-2.
Figure 8.2 (a) Probability contours and (b) probability density function for the harmonic oscillator in the first excited state.
Note that the above mentioned problem does not occur in this approach. In momentum space, all points can be accessed and, indeed, the cell in momentum space related to 𝑝 = 0 corresponds to the greater probability density–as one would expect from the physical intuitions coming from the fact that the physical system is periodic. On the other hand, the phase space probability density also confirms the information already conveyed by the configuration space probability density that the system is not expected to be found in the cell, on configuration space, around 𝑞 = 0 (for reasons that were already discussed in chapter six).
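The statements made about $F_1^O$ can be checked numerically. The sketch below is only illustrative and rests on assumptions not spelled out at this point of the text: units $m = \hbar = \omega = 1$, $\rho_1(q) = (2/\sqrt{\pi})\,q^2 e^{-q^2}$, zero average momentum on each fiber, and variance $\sigma^2(q) = -\tfrac{1}{4}\,\partial^2\ln\rho_1/\partial q^2 = (1+q^2)/(2q^2)$. Function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def rho1(q):
    """Configuration space density of the first excited state."""
    return 2.0 / math.sqrt(math.pi) * q * q * math.exp(-q * q)

def sigma2(q):
    """Assumed momentum variance on the fiber q: (1 + q^2) / (2 q^2)."""
    return (1.0 + q * q) / (2.0 * q * q)

def F1(q, p):
    """Gaussian-in-momentum phase space density on each fiber q."""
    if q == 0.0:
        return 0.0  # rho1 vanishes at the origin
    s2 = sigma2(q)
    return rho1(q) / math.sqrt(2.0 * math.pi * s2) * math.exp(-p * p / (2.0 * s2))

def marginal_p(p, qmax=8.0):
    """Momentum marginal pi_1(p), obtained by integrating F1 over q."""
    return simpson(lambda q: F1(q, p), -qmax, qmax)

def energy_density(q):
    """rho1 * (sigma2 / 2 + q^2 / 2); rho1 * sigma2 = (1 + q^2) exp(-q^2) / sqrt(pi)."""
    kinetic = (1.0 + q * q) * math.exp(-q * q) / (2.0 * math.sqrt(math.pi))
    potential = 0.5 * q * q * rho1(q)
    return kinetic + potential

energy = simpson(energy_density, -8.0, 8.0)  # expected to be close to 3/2
print(energy, marginal_p(0.0), marginal_p(1.0))
```

Under these assumptions, integrating `F1` over $p$ at fixed $q$ returns `rho1(q)`, the average energy comes out as 3/2, and `marginal_p` is positive everywhere and largest at $p = 0$. The momentum integration of the energy is done analytically here because the tail of `marginal_p` falls off only as a power law, which makes a direct two-dimensional quadrature converge slowly.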
Example: We can calculate (using algebraic computation) the momentum density function of the first levels of the hydrogen atom. In this case, the phase space probability density is given by

$$F_1(q,p) = \frac{3\sqrt{6}\,r^{3/2}\exp(-2r)}{4\pi^{5/2}}\,\exp\left(-\frac{3r\,p^2}{2}\right),$$

where we wrote

$$p_r^2 + \frac{p_\theta^2}{r^2} + \frac{p_\phi^2}{r^2\sin^2(\theta)} = p^2$$

for the total momentum of the problem. We thus have

$$\pi(p) = \int F_1(q,p)\,dq = \frac{2520}{729\left(\frac{4}{3} + p^2\right)^{9/2}},$$

while the usual solution using property one is given by

$$\pi^*(p) = \frac{16}{5\pi(1+p^2)^4}.$$

We can plot these functions to get the following result.
In the above picture, the momentum space probability density 𝜋(𝑝) obtained using our approach is represented in gray, while the one coming from property one is represented in black. There are small differences between them. The 𝜋(𝑝) obtained by our method is greater at the origin, but compensates with a tail that decays faster.
8.4.1. Some Considerations on the construction of F(x,p)

We have always assumed that the stochastic processes we are considering (through Langevin equations, for instance) are such that the true Quantum Mechanical distribution is attained. We have also shown, in chapter six, that from the Langevin equations it is easy to see how the Newtonian limit is qualified. Indeed, we have shown that

$$\begin{cases} \dfrac{dp(t)}{dt} = -\gamma\,p(t) + \phi_1(q) + \sqrt{-\gamma\,\dfrac{\hbar^2}{4}\,\dfrac{\partial^2\ln\rho(q)}{\partial q^2}}\;\zeta(t) \\[2mm] \dfrac{dq(t)}{dt} = \dfrac{1}{m}\,p(t) \end{cases}, \tag{8.5}$$
where ϕ1 (𝑞) represents the force, 𝑚 the mass, 𝑞, 𝑝 the phase space coordinates, 𝜁(𝑡) the term responsible for fluctuations and 𝜌(𝑞) the probability density function related to the phenomenon. In this case, 𝛾 controls the fluctuation-dissipation theorem by saying to what amount fluctuations are dissipated. Of course, Newtonian behavior is recovered if 𝛾 is zero. As 𝛾 increases, we come closer to the quantum mechanical behavior, until we arrive at the value representing the correct 𝛾. On the other hand, for very high values of 𝛾 the system tends to stay on the very same fiber on phase space. For the harmonic oscillator, this would mean the situations schematically shown in Figure 8-3. Thus, it is obvious that one must be very cautious about the way we must get the phase space probability density function, since it will highly depend on the value of 𝛾, mainly with respect to the position.
Figure 8.3. Change of the behavior of the phase space probability density function with respect to the parameter gamma. Each frame was produced using 100,000 points in one trajectory and tau=0.0005.
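To make the role of the Langevin equations quoted above more concrete, the following sketch integrates them with a simple Euler-Maruyama scheme. It is not the procedure used to produce Figure 8-3, only a minimal illustration under several assumptions that are not fixed by the text: units $m = \hbar = \omega = 1$; the harmonic oscillator ground state, $\rho(q) \propto e^{-q^2}$, so that $\partial^2\ln\rho/\partial q^2 = -2$ and the noise amplitude is the constant $\sqrt{\gamma/2}$; and the normalization $\langle\zeta(t)\zeta(t')\rangle = 2\delta(t-t')$ for the fluctuating term. The function name and parameter values are hypothetical.

```python
import math
import random

def simulate_ground_state(gamma=1.0, dt=0.01, steps=400_000, seed=1):
    """Euler-Maruyama integration of the Langevin pair; returns (<q^2>, <p^2>)."""
    rng = random.Random(seed)
    q, p = 0.0, 0.0
    # rho(q) ~ exp(-q^2) gives d^2 ln(rho)/dq^2 = -2, hence amplitude sqrt(gamma/2).
    amplitude = math.sqrt(gamma * 0.5)
    # Assumption: <zeta(t) zeta(t')> = 2 delta(t - t'), so each step has variance 2*dt.
    noise_scale = amplitude * math.sqrt(2.0 * dt)
    burn_in = steps // 10  # discard the initial transient
    sum_q2 = sum_p2 = 0.0
    for step in range(steps):
        force = -q  # phi_1(q) for V = q^2 / 2
        p += (-gamma * p + force) * dt + noise_scale * rng.gauss(0.0, 1.0)
        q += p * dt  # dq/dt = p / m with m = 1
        if step >= burn_in:
            sum_q2 += q * q
            sum_p2 += p * p
    count = steps - burn_in
    return sum_q2 / count, sum_p2 / count

mean_q2, mean_p2 = simulate_ground_state()
print(mean_q2, mean_p2)  # both should be roughly 1/2 under the stated assumptions
```

With these choices the long-time averages of $q^2$ and $p^2$ both settle near 1/2, the ground-state values; other states require the $q$-dependent noise amplitude, and the stationary behavior then depends on $\gamma$ as discussed in the text.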
It is easy to see by means of a simple example why the construction of the phase space probability density function must proceed with some caution.

Example: Consider the example of a particle in a one-dimensional box of size $L$ with infinite reflecting walls. The Newtonian solution is well known. Particularly in momentum space, we know that we must have a probability density function that can be written in terms of two Dirac delta distributions in the momenta, that is,

$$\pi_N(p) = \frac{1}{2}\left[\delta(p - p_0) + \delta(p + p_0)\right],$$

representing the probability of the particle having momentum $p_0$ or $-p_0$ (because of perfect reflection at the walls). Thus, if we are too close to the Newtonian situation (for instance, because we have $L \to \infty$) and the fluctuations are not too strong, then we should expect a momentum probability density that has two peaks, one around $p_0$ and the other around $-p_0$. This situation clearly violates the criterion for the application of the Central Limit Theorem in the whole momentum space. Indeed, remembering that our Central Limit Theorem is taken along fibers in the momentum coordinate, it is easy to see that, except for those fibers close to $p_0$ or $-p_0$, the Central Limit Theorem cannot be assumed. This means that we must proceed with caution with our sampling throughout the entire phase space. This behavior is very different from the one expected for the harmonic oscillator problem.

This topic of the momentum space probability densities, although important, would take us too far afield and we will not pursue it further in this book. Throughout the book we will assume that the Central Limit Theorem is applicable for all fibers in the momentum coordinate.
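The point of the box example can be made quantitative with a toy comparison. The sketch below is an illustration only: it smears the two delta peaks into narrow Gaussians (the values `p0 = 3.0` and `width = 0.3` are arbitrary assumptions) and compares the result with the single Gaussian having the same mean and variance, which is what a naive Gaussian description of the whole momentum space would give.

```python
import math

def gaussian(p, mean, var):
    """Normalized Gaussian density with the given mean and variance."""
    return math.exp(-(p - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def two_peak_density(p, p0=3.0, width=0.3):
    """Near-Newtonian momentum density: two narrow peaks at +p0 and -p0."""
    return 0.5 * (gaussian(p, p0, width**2) + gaussian(p, -p0, width**2))

def matched_gaussian(p, p0=3.0, width=0.3):
    """Single Gaussian with the same mean (0) and variance (p0^2 + width^2)."""
    return gaussian(p, 0.0, p0**2 + width**2)

for p in (0.0, 1.5, 3.0):
    print(p, two_peak_density(p), matched_gaussian(p))
```

The single Gaussian places its maximum at $p = 0$, where the two-peak density is practically zero, and badly underestimates the density at $\pm p_0$; this is the sense in which a Gaussian description cannot be assumed for such situations.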
8.5. WHY $F^O(q,p,t)$?

Since we are now sure that we must look for the true phase space distribution function for quantum mechanical systems, and since there are a number of them from which to choose, we must turn to two different, but complementary, paths: we may ask for the theoretical arguments for using one distribution or another, or we can perform experiments that can discriminate among them. In the following subsections we present a number of theoretical arguments in favor of $F^O(q,p,t)$. We compare the density $F^O(q,p,t)$ with the distribution $F^W(q,p,t)$. These two functions were chosen because they are the ones having most of the required properties.
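8.5.1. Argument One: Positiveness
In chapter two, when making the “entropy derivation” of the Schrödinger equation, we found equation (3-12), which we rewrite here as
$$\frac{\partial p_n}{\partial t} = -\frac{\partial}{\partial q}\left[\frac{p_n^2}{2m} + V(q) - \frac{\hbar^2}{2mR_n(q;t)}\frac{\partial^2 R_n}{\partial q^2}\right], \tag{8.6}$$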
which clearly has the structure of Bohm’s equation, if we write the “quantum potential” as
$$Q_n(q,t) = -\frac{\hbar^2}{2mR_n(q,t)}\frac{\partial^2 R_n(q,t)}{\partial q^2}.$$
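From this point on, Bohm makes the identification
$$p_n' \leftarrow p_n(q,t), \qquad h_n(q,p_n',t) \leftarrow \frac{p_n'^2}{2m} + V(q) \tag{8.7}$$
to write (8.6) as
$$\frac{\partial p_n'}{\partial t} = -\frac{\partial}{\partial q}\left[h_n(q,p_n') + Q_n(q,t)\right]. \tag{8.8}$$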
By making the previously mentioned identification, it is obvious that we are now considering equation (8.8) as defined in (𝑞, 𝑝′ ) which is a different phase space from the one
given by (𝑞, 𝑝), the one from which we began our calculations. This identification is, in most cases, problematic, as a simple example shows.
Example: for the harmonic oscillator problem, with the potential given by
$$V(q) = \frac{1}{2}m\omega^2 q^2,$$
one has
$$h_n(q,p_n',t) + Q_n(q,t) = \frac{p_n'^2}{2m} + \left(n + \frac{1}{2}\right)\hbar\omega,$$
giving 𝑝̇ 𝑛′ = 0, 𝑞̇ 𝑛 = 𝑝𝑛′ /𝑚, which is a force-free motion for any state 𝑛. This result shows that the identification made by Bohm yields a result that must be interpreted differently from the way he interprets it[46].
On the other hand, if we define
$$E_n = \int h_n(q,t)\,\rho_n(q,t)\,dq = \iint H(q,p)\,F_n^O(q,p,t)\,dp\,dq = \iint\left[\frac{p^2}{2m} + V(q)\right]F_n^O(q,p,t)\,dp\,dq,$$
consistent with our definitions of average values, always exactly equal to the classical case, where ℎ𝑛 (𝑞, 𝑡) is the configuration space momentum average of the Hamiltonian 𝐻(𝑞, 𝑝), and 𝑛 is the quantum number related to the state, we find
$$E_n = \int\left[\frac{p_n(q,t)^2}{2m} + V(q) - \frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_n(q,t)}{\partial q^2}\right]\rho_n(q,t)\,dq \tag{8.9}$$
and thus
$$h_n(q,t) = \frac{p_n(q,t)^2}{2m} + V(q) - \frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_n(q,t)}{\partial q^2}.$$
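The harmonic-oscillator statements above can be checked symbolically. The following is a minimal sketch, assuming units 𝑚 = ℏ = 𝜔 = 1, stationary states (𝑝𝑛 = 0) and the unnormalized amplitudes 𝑅𝑛 = 𝐻𝑛(𝑞)exp(−𝑞²/2); the function names are illustrative and not part of the text. It suggests that the sum 𝑉 + 𝑄𝑛 used in Bohm’s identification is a constant, whereas the local energy ℎ𝑛 still depends on 𝑞.

```python
import sympy as sp

q = sp.symbols("q", real=True)
V = q**2 / 2  # harmonic potential with m = hbar = omega = 1 (assumed units)


def amplitude(n):
    """Unnormalized stationary amplitude R_n(q) = H_n(q) exp(-q^2/2)."""
    return sp.hermite(n, q) * sp.exp(-q**2 / 2)


def bohm_sum(n):
    """V + Q_n with the quantum potential Q_n = -(1/(2 R_n)) d^2 R_n / dq^2."""
    R = amplitude(n)
    Q = -sp.diff(R, q, 2) / (2 * R)
    return sp.simplify(V + Q)


def local_energy(n):
    """h_n for a stationary state (p_n = 0): V - (1/8) d^2 ln(rho_n) / dq^2."""
    rho = amplitude(n) ** 2
    return sp.simplify(V - sp.diff(sp.log(rho), q, 2) / 8)


for n in range(3):
    # First entry is constant in q, second one is not.
    print(n, bohm_sum(n), local_energy(n))
```

Normalization constants drop out of both expressions, which is why unnormalized amplitudes are enough for this check.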
Equation (8.9) is exactly the result we obtain by directly performing the integration
$$\iint\left[\frac{p^2}{2m} + V(q)\right]F_n^O(q,p,t)\,dp\,dq,$$
using 𝐹𝑛𝑂 (𝑞, 𝑝, 𝑡), as it should be, since 𝐹 𝑂 (𝑞, 𝑝, 𝑡) must give the correct energies for the problem. Now, if we make the identification
$$E_n' \leftarrow h_n(q,t), \qquad p_n' \leftarrow p_n(q,t) \tag{8.10}$$
we end with the result
$$E_n' = \frac{p_n'^2}{2m} + V(q) - \frac{\hbar^2}{8m}\frac{\partial^2\ln\rho_n(q,t)}{\partial q^2},$$
which is a much more appropriate identification.
Example: for the same harmonic oscillator, consider its first three excited states; their equations are now given by (𝑚 = ℏ = 𝜔 = 1)
$$\begin{aligned}
\frac{p_n'^2}{2} + \frac{1}{2}q^2 + \frac{1}{4}\,\frac{q^2+1}{q^2} &= E_1,\\
\frac{p_n'^2}{2} + \frac{1}{2}q^2 + \frac{1}{4}\,\frac{4q^4+4q^2+5}{(2q^2-1)^2} &= E_2,\\
\frac{p_n'^2}{2} + \frac{1}{2}q^2 + \frac{1}{4}\,\frac{4q^6+9q^2+9}{q^2(2q^2-3)^2} &= E_3,
\end{aligned}$$
which have the contours given in the first line of Figure 8-4.
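As a numerical cross-check of the three closed forms above against (8.9), one can average their position-dependent part over the corresponding densities. This is only a sketch under stated assumptions: units 𝑚 = ℏ = 𝜔 = 1, stationary states with 𝑝𝑛 = 0, the standard normalized oscillator densities, and a simple midpoint rule. The helper names are hypothetical. Each average should come out close to 𝑛 + 1/2.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermval


def rho(n, q):
    """Normalized density |psi_n|^2 of the oscillator (m = hbar = omega = 1)."""
    coeffs = [0.0] * n + [1.0]  # selects the physicists' Hermite polynomial H_n
    norm = 2.0**n * math.factorial(n) * math.sqrt(math.pi)
    return hermval(q, coeffs) ** 2 * np.exp(-q**2) / norm


# Position-dependent part of the three equations above (everything except p'^2/2).
local_terms = {
    1: lambda q: 0.5 * q**2 + 0.25 * (q**2 + 1) / q**2,
    2: lambda q: 0.5 * q**2 + 0.25 * (4 * q**4 + 4 * q**2 + 5) / (2 * q**2 - 1) ** 2,
    3: lambda q: 0.5 * q**2
    + 0.25 * (4 * q**6 + 9 * q**2 + 9) / (q**2 * (2 * q**2 - 3) ** 2),
}


def average_energy(n, half_width=10.0, cells=400_000):
    """Midpoint-rule estimate of the integral (8.9) with p_n = 0."""
    dq = 2 * half_width / cells
    q = -half_width + (np.arange(cells) + 0.5) * dq  # midpoints never hit q = 0
    return float(np.sum(local_terms[n](q) * rho(n, q)) * dq)


energies = {n: average_energy(n) for n in local_terms}
print(energies)
```

The singular factors of the local terms are cancelled by the zeros of the densities, so the integrands stay finite at the nodes.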
Figure 8.4 Bohm’s trajectories, probability density functions and equiprobable contours for the harmonic oscillator (levels n=0,1 and 2).
We also show the probability densities for these three excited states in the second line of Figure 8-4. It is obvious that our identification (8.10) represents the phenomenon much better than Bohm’s (8.7). The various energies 𝐸𝑛′ (for each 𝑛) represent the energies the system may have at some instant 𝑡 and the average energy 𝐸𝑛 is finally given by (8.9). Thus, if we consider the system as random, as is obvious from all the previous results and, in particular, the “entropy derivation” of the Schrödinger equation, the system will pass over regions of the phase space indexed by 𝐸𝑛′ with different probabilities; thus, for each contour curve given by some value of 𝐸𝑛′ (for fixed 𝑛) we have an associated probability.
It is important to stress that where the probabilities in configuration space are zero there is no contour curve, showing the adequacy of the identification. Moreover, the contours of the probability densities 𝐹𝑛𝑂 (𝑞, 𝑝′ ) for these same states are shown in the third line of Figure 8-4, and it is quite clear that they represent the physical situation appropriately. It is important to stress that Bohm’s approach and the present modified Bohm’s approach are mathematically equivalent, although the latter is more appropriate physically. Therefore, both are mathematically equivalent to the Schrödinger equation. Now, if we plot the distribution 𝐹 𝑊 (𝑞, 𝑝) for the same three excited states of the harmonic oscillator we get the results shown in the first line of Figure 8-5, since we have (𝑚 = ℏ = 𝜔 = 1)
$$F_n^W(q,p) = \frac{(-1)^n}{\pi}\,e^{-2H(q,p)}\,L_n[4H(q,p)],$$
where 𝐿𝑛 denotes the Laguerre polynomial of order 𝑛.
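The sign structure of this expression can be inspected numerically. The sketch below assumes 𝐻(𝑞, 𝑝) = (𝑝² + 𝑞²)/2 in the units 𝑚 = ℏ = 𝜔 = 1 and uses a plain trapezoidal sum; the function names are illustrative. It evaluates the slice 𝑞 = 0 of the 𝑛 = 1 distribution and separates its positive and negative parts.

```python
import math

import numpy as np
from numpy.polynomial.laguerre import lagval


def wigner(n, q, p):
    """F_n^W(q, p) for the oscillator with m = hbar = omega = 1."""
    H = 0.5 * (p**2 + q**2)
    coeffs = [0.0] * n + [1.0]  # selects the Laguerre polynomial L_n
    return (-1) ** n / math.pi * np.exp(-2 * H) * lagval(4 * H, coeffs)


def trapezoid(y, dx):
    """Trapezoidal rule on a uniform grid."""
    return float(dx * (np.sum(y) - 0.5 * (y[0] + y[-1])))


# Slice through q = 0 for the first excited state.
p = np.linspace(-8.0, 8.0, 160_001)
dp = p[1] - p[0]
slice_q0 = wigner(1, 0.0, p)

marginal_q0 = trapezoid(slice_q0, dp)  # plays the role of rho_1(q = 0)
positive_part = trapezoid(np.where(slice_q0 > 0, slice_q0, 0.0), dp)
negative_part = trapezoid(np.where(slice_q0 < 0, slice_q0, 0.0), dp)

print(wigner(1, 0.0, 0.0))  # value at the phase space origin
print(marginal_q0, positive_part, negative_part)
```

For 𝑛 = 1 the sign change occurs where 4𝐻 = 1, that is, on the circle 𝑝² + 𝑞² = 0.5; the positive and negative parts of the slice cancel in the marginal.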
Figure 8.5. Wigner’s distributions for the first three excited states of the harmonic oscillator.
This figure shows some important features: for state 𝑛 = 1, for instance, the contours of 𝐹1𝑊 (𝑞, 𝑝) show that the system can occupy the configuration space point 𝑞 = 0, as long as it does not have momentum values 𝑝 for which 𝐹1𝑊 (𝑞, 𝑝) becomes zero or negative. Indeed, the contour at which 𝐹1𝑊 (𝑞, 𝑝) begins to show negative behavior is given by 𝑝² + 𝑞² = 0.5 for 𝑛 = 1, and one finds 𝑃𝑟𝑜𝑏(𝑞 = 0, |𝑝| > √0.5) = 0.273. It is only when we sum over all the momenta (allowed and prohibited, that is, positive and negative probability) that we find zero as the probability of finding the system at 𝑞 = 0. Thus there is a finite probability of finding the system (at some instant of time 𝑡) in the vicinity of the phase space point (𝑞 = 0, 𝑝) (for values of 𝑝 for which 𝐹1𝑊 (𝑞, 𝑝) is neither zero nor negative), but if we ask for the probability of finding the same system at 𝑞 = 0 for any 𝑝 we find that it is zero, that is, the system cannot be found at this configuration point on average, since 𝜌1 (𝑞 = 0) = 0, although it can be found at this configuration point sometimes (!?).
It gets more awkward: the average we referred to can be an ensemble average or a single system average (time average). If we are performing an ensemble average, then we must say that we do find the physical system at 𝑞 = 0 for some values of 𝑝 in some actual systems of the ensemble but, since we do (don’t?!) find other systems with negative probability at the same point 𝑞 = 0 for some other values of 𝑝, the overall probability of finding it at 𝑞 = 0 for the whole ensemble is zero. If we are performing a single system average (time average), then we must say that the system occupies the configuration point 𝑞 = 0 (with allowed values of 𝑝) at some instants of time 𝑡 but occupies (doesn’t occupy?!) this same point 𝑞 = 0 with negative probability at other instants of time 𝑡 ′ (for prohibited values of 𝑝), giving an overall probability zero of finding the system at 𝑞 = 0 in a sufficiently large interval Δ𝑡. It is obvious that we cannot even describe the phenomenon without pronouncing an oxymoron. The origin of the paradox resides, of course, in the fact that 𝐹1𝑊 (𝑞, 𝑝) assumes negative values⁴. If we compare the contours in the first line of Figure 8-5 with those of Figure 8-4 we can easily see that 𝐹𝑛𝑊 (𝑞, 𝑝) can by no means be the appropriate probability distribution for the harmonic oscillator. The modified Bohmian equations are equivalent to the Schrödinger equation and they show, in phase space, which regions cannot be visited (with zero, not negative, probability). In fact, the very structure of the Bohmian equations (dynamical equations with an underlying corpuscular interpretation) precludes us from talking about negative probabilities. However, as we have already mentioned, 𝐹𝑛𝑊 (𝑞, 𝑝, 𝑡) satisfies property one previously mentioned. Indeed, any positive definite probability distribution 𝐹𝑛 (𝑞, 𝑝, 𝑡) giving the same marginal density 𝜌𝑛 (𝑞, 𝑡) cannot give the same 𝜋𝑛 (𝑝, 𝑡) as obtained by property one.
Since we are pretty much sure about 𝜓𝑛 (𝑞, 𝑡) and 𝜌𝑛 (𝑞, 𝑡), if we want to avoid the paradoxes mentioned above (by assuming only positive definite distributions), we must abandon the requirement related to property one. This is the same as saying that the usual method of finding the momentum space probability density function is not appropriate, the most appropriate being simply 𝜋𝑛 (𝑝, 𝑡) = ∫ 𝐹𝑛 (𝑞, 𝑝, 𝑡)𝑑𝑞, for any positive definite probability distribution 𝐹𝑛 (𝑞, 𝑝, 𝑡). In many practical situations it can be shown that 𝜋 𝑊 (𝑝, 𝑡) and 𝜋 𝑂 (𝑝, 𝑡) give very similar distributions[102], but they are, of course, theoretically different (for the harmonic oscillator, however, they are completely different for all states other than the ground state). This can be traced back to the fact that there is a “second order” relation between 𝐹𝑛𝑊 (𝑥, 𝑝, 𝑡) and 𝐹𝑛𝑂 (𝑥, 𝑝, 𝑡)[50].
Example: we can plot the momentum densities for the first excited state of the harmonic oscillator as shown in Figure 8-6.
⁴ Surely, it can also be said that the preposterous statements come from the fact that we are forcing a “classical picture” onto a “quantum object”; in this case, the notion of a trajectory. However, the point is: if I can use such a classical picture to understand all quantum mechanical phenomena using 𝐹 𝑂 (𝑞, 𝑝, 𝑡), why would I resort to 𝐹 𝑊 (𝑞, 𝑝, 𝑡) to do so (having then to deal with negative probabilities, etc.)?
Note that 𝜋 𝑊 (𝑝, 𝑡) says that it is not possible to find the system in the vicinity of 𝑝 = 0, which is completely different from the assertion made by 𝜋 𝑂 (𝑝, 𝑡), which says that the system would be in the vicinity of 𝑝 = 0 with the highest probability. The result 𝜋 𝑊 (𝑝, 𝑡) seems simply ludicrous (for our deviant classical minds), since the harmonic oscillator problem refers to a sort of periodic motion in which the particle must invert the direction of its movement (invert the sign of 𝑝), thus passing through 𝑝 = 0 innumerable times, even in this case of a random harmonic oscillator. Of course, one can always say that “the quantum phenomenon is that weird”, etc., and refrain from adopting 𝐹 𝑂 (𝑥, 𝑝) or any other positive definite phase space distribution. There is no possible scientific answer to such a claim.
Figure 8.6 Momentum probability density functions as marginal distributions to FW and FO, respectively.
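The qualitative difference between the two momentum marginals for 𝑛 = 1 can be sketched numerically. The block below rests on an assumption that is not spelled out in this section: that 𝐹1𝑂 is Gaussian in 𝑝 on each fiber, centered at 𝑝𝑛 = 0, with variance 𝜎²(𝑞) = −(1/4)∂² ln 𝜌1 /∂𝑞² (units 𝑚 = ℏ = 𝜔 = 1), a form chosen to be consistent with the kinetic term of (8.9). The marginal of 𝐹1𝑊 is taken as the usual momentum density of the first excited state. All names are illustrative.

```python
import math

import numpy as np


def pi_W(p):
    """Momentum marginal of F_1^W: the usual |phi_1(p)|^2 (m = hbar = omega = 1)."""
    return 2.0 / math.sqrt(math.pi) * p**2 * np.exp(-(p**2))


def pi_O(p, half_width=10.0, cells=200_000):
    """Momentum marginal of the ASSUMED Gaussian form of F_1^O.

    Assumption: F^O(q, p) = rho_1(q) * Normal(p; 0, sigma^2(q)) with
    sigma^2(q) = -(1/4) d^2 ln(rho_1)/dq^2 = (q^2 + 1) / (2 q^2).
    """
    dq = 2 * half_width / cells
    q = -half_width + (np.arange(cells) + 0.5) * dq  # midpoint grid in q
    rho = 2.0 / math.sqrt(math.pi) * q**2 * np.exp(-(q**2))
    inv_var = 2 * q**2 / (q**2 + 1)  # 1 / sigma^2(q), finite everywhere
    gauss = np.sqrt(inv_var / (2 * math.pi)) * np.exp(-0.5 * inv_var * p**2)
    return float(np.sum(rho * gauss) * dq)


for p in (0.0, 0.5, 1.0, 2.0):
    print(p, float(pi_W(p)), pi_O(p))
```

Under this assumption the marginal of 𝐹1𝑂 is a mixture of zero-mean Gaussians and therefore peaks at 𝑝 = 0, while the marginal of 𝐹1𝑊 vanishes there.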
8.5.2. Argument Two: Maximizes Entropy and Minimizes Energy
Another interesting argument in favor of 𝐹 𝑂 (𝑥, 𝑝) can be found in the literature connected with Density Functional Theory (DFT). It can be easily shown that 𝐹 𝑂 (𝑞, 𝑝) is the distribution function that maximizes the local entropy
$$S = -k_B \iint F(\vec q,\vec p)\,\ln F(\vec q,\vec p)\,d\vec q\,d\vec p \tag{8.11}$$
constrained to satisfy
$$\rho(\vec q) = \int F(\vec q,\vec p)\,d\vec p, \qquad t_s(\vec q) = \int \frac{\vec p^{\,2}}{2}\,F(\vec q,\vec p)\,d\vec p,$$
where 𝑡𝑠 (𝑞⃗) is the so-called Weizsäcker term in the context of DFT (Cf. [102], pp. 239-240). It is then possible to define notions such as local temperature, pressure, etc. Given this last result, we can use DFT to show by other means that 𝐹 𝑂 (𝑞⃗, 𝑝⃗) is the probability density function related to the Schrödinger equation. Indeed, the energy functional becomes
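$$\begin{aligned}
E[\rho,\nabla\rho] &= \iint H(\vec q,\vec p)\,F^O(\vec q,\vec p)\,d\vec q\,d\vec p = \iint\left[\frac{p^2}{2m} + V(\vec q)\right]F^O(\vec q,\vec p)\,d\vec q\,d\vec p\\
&= \int\left[\frac{p(q,t)^2}{2m} + V(\vec q) - \frac{\hbar^2}{8m}\nabla^2\ln\rho(\vec q)\right]\rho(\vec q)\,d\vec q\\
&= \int\left\{\left[\frac{p(q,t)^2}{2m} + V(\vec q)\right]\rho(\vec q) - \frac{\hbar^2}{8m}\left[\nabla^2\rho - \frac{|\nabla\rho|^2}{\rho}\right]\right\}d\vec q\\
&= \int\left\{\left[\frac{p(q,t)^2}{2m} + V(\vec q)\right]\rho(\vec q) + \frac{\hbar^2}{8m}\frac{|\nabla\rho|^2}{\rho}\right\}d\vec q,
\end{aligned}$$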
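where we used in the last equality the fact that 𝜌(𝑞⃗) must be zero when 𝑞⃗ → ∞. With this functional we can make our variations [with the Lagrange multiplier 𝜆 introduced by the normalization of the probability density 𝜌(𝑞⃗)] to find
$$\frac{\delta}{\delta\rho}\left(E[\rho,\nabla\rho] - \lambda\int\rho\,d\vec q\right) = -\frac{\hbar^2}{4m}\nabla\cdot\left(\frac{\nabla\rho}{\rho}\right) - \frac{\hbar^2}{8m}\frac{|\nabla\rho|^2}{\rho^2} + \frac{p(\vec q,t)^2}{2m} + V(\vec q) - \lambda = 0,$$
that is,
$$-\frac{\hbar^2}{4m}\left[\frac{\nabla^2\rho}{\rho} - \frac{1}{2}\frac{|\nabla\rho|^2}{\rho^2}\right] + \frac{p(\vec q,t)^2}{2m} + V(\vec q) = \lambda,$$
and when we put, as usual, 𝜌(𝑞⃗, 𝑡) = 𝑅(𝑞⃗, 𝑡)² and 𝑝⃗(𝑞⃗, 𝑡) = ∇𝑠(𝑞⃗, 𝑡), we find
$$\frac{1}{2m}|\nabla s(\vec q,t)|^2 + V(\vec q) - \frac{\hbar^2}{2mR(\vec q,t)}\nabla^2 R(\vec q,t) = \lambda.$$
This last equation is completely equivalent to (8.6) or (3.12) for stationary states for which
$$\frac{\partial s(\vec q,t)}{\partial t} = \lambda,$$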
𝜆 being, obviously, the energy of the state. Thus, if the density 𝐹 𝑂 (𝑞, 𝑝, 𝑡), with its functional form, maximizes entropy and minimizes energy, and is functionally different from 𝐹 𝑊 (𝑞, 𝑝, 𝑡) (or any other distribution discussed here), then (a) 𝐹 𝑊 (𝑞, 𝑝, 𝑡) does not maximize the entropy as written in (8.11), and (b) since physical systems must maximize the underlying entropy, Quantum Mechanics would need another entropy functional associated with it. However, it was possible to derive the Schrödinger equation by assuming this very functional form of the entropy (as done in chapter two). Thus, it is possible to make Quantum Mechanics refer to the functional (8.11) for the entropy and, as such, to make Quantum Mechanics the outcome of insisting on the maximization of some underlying entropy (mainly for stationary states, which would be related to quasi-equilibrium situations). Thus, since 𝐹 𝑊 (𝑞, 𝑝, 𝑡) does not maximize the underlying entropy of the physical system, it should be discarded.
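The variational step of this subsection can be checked symbolically in one dimension. The sketch below assumes units 𝑚 = ℏ = 𝜔 = 1, the harmonic potential, stationary states with 𝑝 = 0, and the unnormalized densities 𝜌𝑛 = 𝐻𝑛(𝑞)² exp(−𝑞²); the placeholder symbols `r` and `rp` are hypothetical names for 𝜌 and its derivative. It evaluates the Euler-Lagrange expression of the last form of the energy functional and should return the multiplier 𝜆 = 𝑛 + 1/2.

```python
import sympy as sp

q = sp.symbols("q", real=True)
r, rp = sp.symbols("r rp")  # placeholders for rho and d(rho)/dq
V = q**2 / 2                # harmonic potential, m = hbar = omega = 1 (assumed)

# Integrand of E[rho] for a stationary state (p = 0): V*rho + |rho'|^2 / (8*rho)
integrand = V * r + rp**2 / (8 * r)


def functional_derivative(rho):
    """Euler-Lagrange expression dL/d(rho) - d/dq [dL/d(rho')] evaluated at rho(q)."""
    at_rho = {r: rho, rp: sp.diff(rho, q)}
    first = sp.diff(integrand, r).subs(at_rho)
    second = sp.diff(sp.diff(integrand, rp).subs(at_rho), q)
    return sp.simplify(first - second)


for n in range(4):
    rho_n = sp.hermite(n, q) ** 2 * sp.exp(-q**2)  # unnormalized |psi_n|^2
    print(n, functional_derivative(rho_n))         # candidate value of lambda
```

The expression is invariant under rescaling of 𝜌, so the normalization constant can be omitted.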
8.5.3. Argument Three: Unambiguity
The problems of ambiguity that the non-commutativity of the operators brings about are amply known. As we said before, to each function 𝐹(𝑥, 𝑝, 𝑡) there is an associated operator formation rule which consists, basically, of giving the way by which the momentum and position operators must be composed. Thus, the “infinity of possibilities for 𝐹(𝑥, 𝑝)” is associated with the “infinity of ways by which we can associate the position operator 𝑥̂ and the momentum operator 𝑝̂ ” [98]. The way by which we have developed our results, however, does not show this sort of ambiguity. Indeed, the source of ambiguity is the ubiquitous choice of always defining the problem from the “quantum mechanical side”, instead of defining it from the “classical side”⁵. Let us clarify this point. Phase space distributions are always defined in terms of an arbitrary operator 𝐴̂(𝑥̂, 𝑝̂ ) representing a certain quantum-mechanical observable as
$$Tr\{\hat\rho(\hat q,\hat p,t)\,\hat A(\hat q,\hat p)\} = \iint A(q,p)\,F(q,p;t)\,dp\,dq, \tag{8.12}$$
where 𝜌̂(𝑞̂, 𝑝̂ , 𝑡) is the density matrix and 𝑇𝑟 means the trace operation. However, “there is no unique way of defining the quantum phase-space distribution function 𝐹(𝑞, 𝑝, 𝑡) due to the noncommutability of quantum-mechanical operators” (Cf. [95], p. 152). This can be easily seen when we try to calculate the expectation value of the operator exp(𝑖𝜉𝑞̂ + 𝑖𝜂𝑝̂ ) and thus
$$Tr\{\hat\rho\,e^{i\xi\hat q + i\eta\hat p}\} = \iint e^{i\xi q + i\eta p}\,F(q,p,t)\,dp\,dq;$$
“the difficulty arises because there is no unique way of assigning a quantum-mechanical operator to a given classical function of conjugate variables. In the above example, although the scalar function exp(𝑖𝜉𝑞 + 𝑖𝜂𝑝) can equally well be written as exp(𝑖𝜉𝑞)exp(𝑖𝜂𝑝), the corresponding expression in the operator form, exp(𝑖𝜉𝑞̂ + 𝑖𝜂𝑝̂ ) and exp(𝑖𝜉𝑞̂)exp(𝑖𝜂𝑝̂ ), are clearly not equal because the operators 𝑞̂ and 𝑝̂ do not commute.” (Cf. [95], p. 152)
However, our approach was to mathematically derive the Schrödinger equation from the Stochastic Liouville equation, from which we obtained 𝐹 𝑂 (𝑞, 𝑝, 𝑡) unambiguously. Another way of making this derivation is to derive the Schrödinger equation directly from a variational principle applied to 𝐹 𝑂 (𝑞, 𝑝, 𝑡). Thus, having the phase-space distribution function, we can define unambiguously the operator ordering that is coherent with the process of derivation of the Schrödinger equation. In this case, expression (8.12) should be written
$$\iint A(q,p)\,F^O(q,p)\,dp\,dq = Tr\{\hat\rho(\hat q,\hat p,t)\,\hat A^O(\hat q,\hat p)\},$$
⁵ By this “quantum mechanical side” we refer only to the formalism, based upon the Schrödinger equation and the non-commuting operators. The “classical side” is the one of usual statistical physics, with its usual definitions, etc.
which unambiguously defines 𝐴̂𝑂 (𝑞̂, 𝑝̂ ). In this case, the left-hand side is already known, since 𝐴(𝑞, 𝑝) is a usual classical function and 𝐹 𝑂 (𝑞, 𝑝, 𝑡) is known from the solution of the Schrödinger equation; thus, 𝐴̂𝑂 (𝑞̂, 𝑝̂ ) can be unambiguously determined.
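The ordering ambiguity discussed in this subsection can be made concrete with the simplest classical function, 𝐴(𝑞, 𝑝) = 𝑞𝑝. The sketch below is only an illustration under the usual assumption of the 𝑞-representation, with 𝑝̂ = −𝑖ℏ ∂/∂𝑞 acting on an arbitrary test function; the helper names `Q` and `P` are hypothetical. It shows that three candidate orderings differ by multiples of 𝑖ℏ.

```python
import sympy as sp

q = sp.symbols("q", real=True)
hbar = sp.symbols("hbar", positive=True)
f = sp.Function("f")(q)  # arbitrary test wave function


def Q(g):
    """Position operator in the q-representation."""
    return q * g


def P(g):
    """Momentum operator in the q-representation."""
    return -sp.I * hbar * sp.diff(g, q)


# Three candidate operators for the single classical function A(q, p) = q p
orderings = {
    "q p": Q(P(f)),
    "p q": P(Q(f)),
    "symmetric": (Q(P(f)) + P(Q(f))) / 2,
}

for name, result in orderings.items():
    # Difference from the "q p" ordering, expressed as a multiple of f
    print(name, sp.simplify((result - orderings["q p"]) / f))
```

Which of these operators corresponds to 𝑞𝑝 is exactly what an operator formation rule, or equivalently a choice of 𝐹(𝑞, 𝑝, 𝑡), has to fix.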
8.5.4. Argument Four: Generality
As we have seen in a previous section, the density 𝐹 𝑂 (𝑞, 𝑝, 𝑡) is related to the most general statistical class of physical systems. This should be expected, since we have already proven that this density is related to the Central Limit Theorem, which defines a class of universality for probability distribution functions by saying that the specifics of the way the system fluctuates make no difference to the Gaussian character of the final asymptotic probability density, although they would make much difference to some of the properties of this Gaussian function (configuration space probability density 𝜌(𝑞, 𝑡), configuration space momentum dispersion 𝜎²(𝑞, 𝑡), etc.).
Of course, the final word about which function should be considered the correct probability density (or distribution) function should be given by experiment. One must not be naive on that matter, however. There is no experiment devoid of theory, and we must be very cautious when it comes to designing experiments to differentiate approaches, since at times one might be assuming many of the underlying presuppositions of one of the approaches without knowing it.
Chapter 9
ON REALITY, LOCALITY AND BELL’S INEQUALITIES
We have seen in chapter one that, since Quantum Mechanics was proposed as a statistical theory, some physicists have been asking whether the theory might have a deterministic support that has been obscured by its probabilistic results. Such a deterministic structure would emerge, so the argument goes, whenever some parameters, of which we are presently unaware, become known. These parameters, hidden as they must be in the present stage of our knowledge of the theory, give their name to such approaches to Quantum Mechanics, now known as Hidden Variables Theories. In the pioneering work of von Neumann (1955) on the mathematical foundations of Quantum Mechanics, a formal proof is given of the impossibility of constructing any hidden variable theory consistent with Quantum Mechanics. This proof, which we will address in what follows, was considered adequate until the presentation¹ of what was considered a true hidden variable theory in 1952 by David Bohm[45]. Obviously, this fact puts von Neumann’s proof under suspicion. The whole conundrum was supposedly solved in 1964 by Bell[103], when it was allegedly proved that no local hidden variable theory could be consistent with Quantum Mechanics, although non-local hidden variable theories could be valid candidates for a deterministic support for Quantum Mechanics. The notion of “local” here is the one classical physicists were familiar with, and is the same proposed in the famous Einstein-Podolsky-Rosen (EPR) paper of 1935[105] as the requirement for a realist theory.
As we have said in the preface to this book, it is not our intention to enter into the moody discussions on reality; after all, the very requirement of locality for a realist physical theory (which we endorse, by the way) is already a choice full of metaphysical presuppositions that could be, themselves, object of discussion—leading us to an unending chain of philosophical arguments that will be nothing but a diversion of the physical discussion. Indeed, it is the conviction of this author that it is this superposition of philosophical and physical arguments,
¹ One should be very cautious with these “no-go proofs”. Generally, they say that it is impossible for someone to find a model (for a physical system), or a reconstruction (for a physical theory), that will address some issue: the impossible one. The claims about the quantum nature of the spin, which we have already addressed, are an example of a “no-go claim” about models. In this chapter we will find two “no-go proofs” for theory reconstruction pointing in opposite directions. Von Neumann’s and Bell’s theorems are “no-go proofs” (about theories). On the other hand, Bell himself has said that he hoped that his work would “continue to inspire those who suspect that what is proved by the impossibility proofs is lack of imagination.”[104]
generally with quite unclear presuppositions, that lends this subject of realism and locality its unbearable obscurity and hermeticism. In propositions in this field of physics we generally find three very distinct and difficult concepts: locality, reality, and determinism. Let us try to take them separately to get a deeper understanding of what Quantum Mechanics may be all about. Indeed, from the interpretation we have been advancing until this point it is quite obvious that we do have a local theory which captures reality in the usual sense physicists mean when talking about such matters. This book’s interpretation of the theory, however, appeals to stochastic behaviors and, thus, is essentially probabilistic at the present stage of development. Therefore, it is pretty much obvious that the question about realism and locality is independent of the question about determinism. This book’s interpretation shows that Quantum Mechanics can be viewed as a realist statistical approach to the world, based upon corpuscles with well defined trajectories and momenta at each instant of time. It is a realist assessment in the same sense Newtonian and Classical Statistical Physics are, and the fact that it has a stochastic support means nothing to claims about realism. The present theory, furthermore, is completely local. Indeed, as with Newtonian Physics, it can be made local by looking for its relativistic extensions (see chapter 11). So much for realism and locality. What the present approach is not able to devise, let alone determine, is whether there can be any deterministic background for Quantum Mechanics. So, for the sake of clarity, let us put forward the main problem that we will address in this chapter: we just want to understand what it means to say that Quantum Mechanics cannot have as its support a deterministic local theory, since we already know that it can be a statistical local theory.
Although the arguments to follow do not affect our approach, given this approach’s statistical character, there are elements in the “no-go” theorems to be presented that are interesting to consider. Thus, we begin the next section with von Neumann’s argument and proceed in the rest of the chapter with Bell’s reconstruction of Bohmian Quantum Mechanics and his arguments on locality.
9.1. VON NEUMANN’S THEOREM AND BELL’S ARGUMENT
In his book[106] von Neumann assumes the following postulates for Quantum Mechanics:
1. if an observable 𝑅 is represented by the operator 𝑅̂, then any function 𝑓(𝑅) of the observable is represented by 𝑓(𝑅̂ );
2. the sum of any number of observables 𝑅, 𝑆, … is represented by the operator 𝑅̂ + 𝑆̂ + ⋯, even if these operators do not commute;
3. the correspondence between operators and their observables is one-to-one;
4. if the observable 𝑅 is non-negative, then its average value ⟨𝑅⟩ is non-negative;
5. for any observables 𝑅, 𝑆, …, and real numbers 𝑎, 𝑏, …, we must have ⟨𝑎𝑅 + 𝑏𝑆 + ⋯ ⟩ = 𝑎⟨𝑅⟩ + 𝑏⟨𝑆⟩ + ⋯ for all ensembles in which these averages can be calculated.
Using these postulates, von Neumann constructs the density operator 𝜌̂ (density matrix) and its properties. One such property refers to how this operator must be used to furnish average values, that is, ⟨𝑅⟩ = 𝑇𝑟(𝜌̂𝑅), as is now amply known. Von Neumann then argues that in any hidden variable theory one must have dispersionless states², that is, ⟨(𝑅 − ⟨𝑅⟩)²⟩ = 0 ⇒ ⟨𝑅²⟩ = ⟨𝑅⟩². However, von Neumann continues, if we put 𝑅 = |ϕ⟩⟨ϕ|, with ⟨ϕ|ϕ⟩ = 1, we get ⟨ϕ|𝜌̂|ϕ⟩ = ⟨ϕ|𝜌̂|ϕ⟩². Thus 𝜌̂ = 1 or 𝜌̂ = 0 for any amplitude |ϕ⟩, if such amplitudes are to represent dispersionless states. The case 𝜌̂ = 0 is trivial and presents no interest. However, 𝜌̂ = 1 does not imply a dispersionless state if the vector space has dimension greater than one, since ⟨1⟩ = 𝑇𝑟(1) = 𝑑, where 𝑑 is the dimension of the space. Thus ⟨(𝑅 − ⟨𝑅⟩)²⟩ = ⟨𝑅²⟩ − 2⟨𝑅⟩² + ⟨𝑅⟩²⟨1⟩ ≠ 0. Thus, von Neumann has proven that if one accepts his postulates, there can be no dispersionless states and, therefore, no hidden variable theory compatible with Quantum Mechanics. Since this is a mathematical derivation, if one is to be suspicious of the previous conclusion, then one should look at the postulates (the theorem’s premises) to see what could be considered spurious assumptions underlying them. Before passing to John Bell’s rebuttal of von Neumann’s argument, let us stress that we have already shown von Neumann’s postulates to be internally inconsistent in the previous chapter. Indeed, some of these postulates represent a choice for operator formation that does not lead to a one-to-one correspondence between operators and observables.
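The core of von Neumann’s argument can be illustrated numerically in the smallest nontrivial case. The sketch below assumes a two-dimensional real vector space and a normalized pure-state density matrix (𝑇𝑟 𝜌̂ = 1), with averages computed as 𝑇𝑟(𝜌̂𝑅); these choices and the function names are made only for illustration. It evaluates the dispersion of the projector 𝑅 = |ϕ⟩⟨ϕ| for several directions ϕ.

```python
import numpy as np


def projector(theta):
    """R = |phi><phi| for the real unit vector phi = (cos theta, sin theta)."""
    phi = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(phi, phi)


def dispersion(rho, R):
    """<R^2> - <R>^2 with averages computed as Tr(rho R)."""
    mean = np.trace(rho @ R)
    return float(np.trace(rho @ R @ R) - mean**2)


# A normalized pure-state density matrix (Tr rho = 1) in two dimensions
psi = np.array([1.0, 0.0])
rho = np.outer(psi, psi)

thetas = np.linspace(0.0, np.pi / 2, 7)
dispersions = [dispersion(rho, projector(t)) for t in thetas]
for t, d in zip(thetas, dispersions):
    print(f"theta = {t:.3f}  dispersion = {d:.4f}")
```

With 𝑥 = ⟨ϕ|𝜌̂|ϕ⟩ the dispersion is 𝑥 − 𝑥², which vanishes only when ϕ is parallel or orthogonal to the state, so no single density matrix of this kind is dispersionless for every projector.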
On the other hand, as we also learned from the previous chapter, if we lift those propositions that form the core of von Neumann’s choice for operator formation (favoring Weyl’s rule, Wigner’s distribution, etc.), such that the remaining rules now imply that the one-to-one correspondence obtains, it is clear then by construction that all operators are generally related to some dispersion in the underlying observable, even if we are in a representation in which the operator is diagonal—that is even if we are dealing with eigenvalues and eigenvectors.
² Thus he is talking about deterministic theories.
In any case, von Neumann’s presuppositions are internally inconsistent and his proof is meaningless, whenever his conclusion depends essentially on his proposal of operator formation. This, however, was not the path taken by Bell to put forward his famous argument; Bell does not accept the fifth postulate for dispersionless states, arguing that “the non-trivial nature of the additive property represented by postulate five can be seen when we consider individual measurements of non-commuting observables. In the specific example of the spin components of a particle, we can measure 𝜎̂𝑥 by some choice of orientation of the Stern-Gerlach magnet. The measurement of 𝜎̂𝑦 requires a new orientation. However, there is no way to relate the measurement of (𝜎̂𝑥 + 𝜎̂𝑦 ) with the previous measurements. In fact, the result of the measurement, an eigenvalue of (𝜎̂𝑥 + 𝜎̂𝑦 ), won’t be the sum of the eigenvalue of 𝜎̂𝑥 with an eigenvalue of 𝜎̂𝑦 . Thus, that the averages obey postulate five is a non-trivial aspect of quantum states. In dispersionless states, however, each observable has one single value as eigenvalue and, thus, there can be no linear relation among eigenvalues of non commuting observables. In general, it is not obvious that postulate five can still be true for these states. When we remove this postulate, it is possible to show that a hidden variable model reproduces correctly the statistical distribution for quantum states.” [103] (my emphasis)
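The two statements in Bell’s quotation, that the eigenvalues of (𝜎̂𝑥 + 𝜎̂𝑦 ) are not sums of eigenvalues of 𝜎̂𝑥 and 𝜎̂𝑦 while the averages remain additive, can be checked with the standard Pauli matrices. The sketch below is a plain numerical illustration of the conventional formalism that Bell refers to; the random spinor and the helper `average` are assumptions made for the example.

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)

eig_x = np.linalg.eigvalsh(sigma_x)
eig_y = np.linalg.eigvalsh(sigma_y)
eig_sum = np.linalg.eigvalsh(sigma_x + sigma_y)
print(eig_x, eig_y, eig_sum)  # eigenvalues of the sum are -sqrt(2) and +sqrt(2)

# Averages in an arbitrary normalized spinor are nevertheless additive
rng = np.random.default_rng(0)
omega = rng.normal(size=2) + 1j * rng.normal(size=2)
omega /= np.linalg.norm(omega)


def average(op):
    """Expectation value omega^dagger op omega (real for Hermitian op)."""
    return float(np.real(np.vdot(omega, op @ omega)))


lhs = average(sigma_x + sigma_y)
rhs = average(sigma_x) + average(sigma_y)
print(lhs, rhs)
```

The possible sums of individual eigenvalues are −2, 0 and 2, none of which equals ±√2, whereas the two averages printed at the end coincide for any spinor.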
It is obvious that Bell’s argument uses some version of Measurement Theory, which is part of some interpretations of Quantum Mechanics (it is not part of this book’s interpretation, for instance). He appeals to the “non-trivial” character of quantum averages and the relation between observables and Stern-Gerlach magnets. His use of the Stern-Gerlach setup as a model experiment is particularly interesting for us, for reasons that will soon become apparent. Bell’s argument implicitly assumes that the outcome of a measurement on some quantum system is represented in the formalism specifically by the eigenvalue of a quantum mechanical operator, related to the experimental setup in which this operator becomes diagonal. Thus, in such an interpretation, if we put the Stern-Gerlach magnet aligned with the 𝑥-direction, then it selects the eigenvalue of the operator 𝜎̂𝑥 , which becomes “observable”, that is, capable of being measured, while leaving the values of 𝜎̂𝑦 and 𝜎̂𝑧 “unobservable”, since they cannot be made diagonal in some matrix representation together with 𝜎̂𝑥 ³. The operators 𝜎̂𝑦 and 𝜎̂𝑧 can be made diagonal (thus observable, the argument goes), but at the expense of changing the direction of the Stern-Gerlach magnets. Since there is no way to make any sort of connection between the eigenvalues of 𝜎̂𝑥 , 𝜎̂𝑦 and 𝜎̂𝑥 + 𝜎̂𝑦 , one should conclude in favor of the “non-trivial” aspect of quantum measurements. Bell’s argument is complex enough to deserve careful attention. Note that Bell’s appeal to a version of Measurement Theory implies a strong assumption. From the present work’s point of view, it is not at all true that “the result of some measurement with which an operator is associated is one of this operator’s eigenvalues”.
³ This interpretation of the role played by eigenvalues is alien even to the Copenhagen Interpretation, which sustains only that a measurement involving eigenvalues of some operator 𝑂̂ is dispersionless, while the underlying non-commuting canonical conjugates of 𝑂̂ can be measured, although with necessary dispersions.
In the last section of chapter seven, and for the very Stern-Gerlach experiment as an example, we have already shown that this claim is completely equivocal. It is clearly possible to measure the projections of the magnetic moment of the electron for each particle coming out of the Stern-Gerlach magnets, that is, the values of 𝜇𝑥 and 𝜇𝑧 , even though 𝜎̂𝑥 and 𝜎̂𝑧 do not commute. The description given by Bell is related to an assumption, already exposed in the last section of chapter seven, that the outcome of the Stern-Gerlach experiment, contrary to what all experimental data seem to show, should be just two small dots (not even stripes) on the detectors’ screen, representing the measurement (without dispersion) of the value of the magnetic moment 𝜇𝑧 . However, instead of the double-dot outcome imagined by Bell⁴, one should expect something like the results shown in Figure 9-1 for various realizations of the Stern-Gerlach experiment, each one with one single particle sent at a time. At each realization of the Stern-Gerlach experiment for one single particle, one would find something quite similar to what is shown schematically in Figure 9-2, from which the values of 𝜇𝑥 and 𝜇𝑧 can be measured.
Figure 9.1. Schematic presentation of the results expected for experiments related to the Stern-Gerlach experimental setup. They differ in outcome because of the kind of collimation done by the slit’s geometry.
In this problem, as we have already noted in chapter seven, from the quantum mechanical point of view, we are making the problem diagonal in the direction of the magnetic moment of the incoming particle, for each particle sent. The calculations made show that we are working with the diagonal representation of the operator 𝜎̂𝑧 cos 𝛼 + 𝜎̂𝑥 sin 𝛼 for each angle 𝛼 between the magnetic moment of the particle and the Z-axis. It is obviously impossible to make 𝜎̂𝑧 and 𝜎̂𝑥 diagonal simultaneously, since they do not commute. However, this has absolutely nothing to do with our being capable of measuring, for each experiment with a single particle, the values of 𝜇𝑥 and 𝜇𝑧 and, thus, their averages after an ensemble of such particles is considered.
⁴ It is difficult to understand why, since no experiment performed to date gives such a result.
Figure 9.2. Calculation of the projections of the magnetic moment of the Stern-Gerlach particle for the emission of each single particle.
In fact, to turn the Stern-Gerlach magnets into the x-direction as a means to calculate the value of 𝜇𝑥 would be preposterous. When one writes
$$\langle\hat\sigma_z\cos(\alpha) + \hat\sigma_x\sin(\alpha)\rangle_R = \langle\hat\sigma_z\rangle_R\cos(\alpha) + \langle\hat\sigma_x\rangle_R\sin(\alpha), \tag{9.1}$$
one implicitly assumes the same representation on both sides of the previous equation, indicated there by the subscript R. This can be seen in an obvious way if, instead of using Dirac’s notation in (9.1), we use a notation based on probability amplitudes. Then (9.1) may be written as
$$\int\vec\omega^{\dagger}\left[\hat\sigma_z\cos(\alpha) + \hat\sigma_x\sin(\alpha)\right]\vec\omega\,dv = \int\vec\omega^{\dagger}\left[\hat\sigma_z\cos(\alpha)\right]\vec\omega\,dv + \int\vec\omega^{\dagger}\left[\hat\sigma_x\sin(\alpha)\right]\vec\omega\,dv,$$
where we must have the same spinor 𝜔⃗ (thus, the same representation) on both sides of the equation. Note that it simply does not matter which representation R we choose to use: the result keeps its validity. We also stress again what we said in the last section of chapter seven: it is inconsistent to assume the left-hand term in (9.1) as being dispersionless (because it is diagonal), while considering the two terms on the right-hand side of the equation as having dispersions, since dispersions do not cancel out. This connection between being diagonal and being dispersionless is a result that comes from the first of von Neumann’s postulates. If, to measure 𝜇𝑥 , we have to change the Stern-Gerlach magnets’ direction, then we will automatically change the problem (mainly in the case of non-circular slits, which means that the outcome on the detectors’ screen would not be isotropic in the 𝑂𝑋𝑍 plane). Furthermore, note that in assuming (9.1) we are saying that, for each particle (that is, for each incoming angle 𝛼), we are assuming a different representation 𝑅𝛼 , since this would be related to a different rotation of the 𝑂𝑋𝑍 plane (in the notation of chapter seven), for which the operator 𝑆̂𝑧 in the internal variables is diagonal. The fact that averages obey postulate five, for each representation, is a fully obvious and trivial aspect of Quantum Mechanics. In fact, it is postulate one of von Neumann’s proof that is “non-trivial” and, indeed, inconsistent with postulate three.
However, as we have previously noted, it is precisely this postulate one that Bell assumes uncritically, when he identifies the mathematical property of an operator being diagonal in some representation R with the statistical property of this same
operator being related to dispersionless measurements. In fact, this would be the case if we assume that 𝑂̂𝜑 = 𝑜𝜑 → 𝑂̂²𝜑 = 𝑜²𝜑 → ∆𝑜 = 0, where 𝑂̂ is some operator with the eigenvalue 𝑜. For Rivier’s rule, for instance, this is not always true (nor is it for the symmetrization rule, usually adopted in Quantum Mechanics). In fact, when treating the problem of the eigenfunctions for the spin, we used Rivier’s rule to calculate 𝑆² = ¼𝑆₀² → 𝑆̂² = ¼(𝑆̂₀² − ℏ²), and we wouldn’t have found the correct answer to the problem of the spin if we had simply adopted von Neumann’s postulate one.
When Bell says that “in dispersionless states, however, each observable has one single value as eigenvalue and, thus, there can be no linear relation among eigenvalues of non commuting observables” he simply extends to dispersionless situations his (mistaken) assumptions about the relation between presenting eigenvalues and having the property of being measurable. Here these assumptions are most trivially exposed as wrong. Indeed, for a dispersionless situation the representation is given by a simple choice of coordinate axes and the observables become vectors, tensors, etc. (not necessarily operators that must be made diagonal, see next paragraph). The vectors and tensors representing the dispersionless quantities are written with respect to these axes. The choice of this or that set of coordinate axes to represent relations among such vectors must be irrelevant, or physics would have to be completely reconstructed.
As a last comment, one should not assume that the deterministic underlying theory for Quantum Mechanics ought to be based on the same formalism (operators, diagonalizations, etc.). In fact, it is quite probable that this does not obtain. From the developments already presented in this book, we could argue that our approach based on Langevin equations qualifies as an underlying theory for Quantum Mechanics, although not a deterministic one.
The fact is that this underlying theory has nothing to do with the mathematical formalism of Quantum Mechanics based on the Schrödinger equation, which is, precisely, a mean field theory when compared to the level of analysis based on Langevin equations. Thus, arguments about eigenvalues appearing in the underlying deterministic support for Quantum Mechanics assume too much (and this plagues both Bell’s and von Neumann’s arguments).
On the Status of Bell’s Argument against Von Neumann’s Proof
All the previous analysis implies that Bell’s argument cannot be accepted as an argument against von Neumann’s proof. However, as we have already argued, von Neumann’s proof is tainted by other problems, namely internal consistency problems (much more debasing, by the way). Nonetheless, it is possible to modify postulate one, which is inconsistent with postulate three, by using other methods of operator formation without modifying von Neumann’s proof. In fact, if one keeps postulate three, only operator formation rules that give dispersions in general for all observables are possible, making von Neumann’s proof unnecessary and making Bell’s arguments out of the question, since they appeal to dispersionless states.
On the other hand, our developments in previous chapters, mainly those related to the Langevin reconstruction of Quantum Mechanics, show that the possibility of finding a hidden variable theory for Quantum Mechanics must be related to the fluctuation-dissipation part of the Langevin equation (in a manner that we will discuss in what follows). In fact, the analysis of the possibility of such a hidden variable theory should never be done at the Schrödinger equation level, for the very reason that this equation is already connected to non-zero dispersions by postulate (see, for instance, the entropy derivation and its postulates). In this sense, von Neumann’s conclusion is perfectly adequate, for it is meant for this level of analysis: one that already assumes the applicability of the Schrödinger equation.
Furthermore, Bohmian equations are also developed at the level of the Schrödinger equation and we then must face the original problem that set the “debate” between Bell and von Neumann: if Bohm’s reconstruction of Quantum Mechanics is a deterministic theory, then von Neumann’s conclusion must be wrong. By contraposition, if von Neumann’s conclusion is adequate, then Bohm’s reconstruction of Quantum Mechanics cannot be a deterministic theory. Bell sustains that Bohmian mechanics is such a deterministic theory, but at the cost of being a non-local theory (after he eliminates von Neumann’s postulate five). We have already encountered Bohm’s results a number of times in this book and we are in a position to give our own reflections about Bell’s assumption on that matter. Thus, let us scrutinize Bohm’s reconstruction of Quantum Mechanics to see if Bell’s assumptions are appropriate. First, however, let us get a deeper insight into the notion of “non-locality” by understanding Bell’s theorem.
9.2. NON-LOCALITY: WHAT IT MEANS?
From his criticism of von Neumann’s proof of the impossibility of hidden variable theories, J. Bell was able to show that if any such theory does exist it should be non-local. By non-local he means that experiments made at regions separated from each other by space-like distances should still be capable of exchanging signals. This is, basically, his argument [103]:
Claim: Take two different experimental apparatuses A and B to measure spin, for instance. The outcomes of the first apparatus are represented by 𝐴(𝑎⃗, 𝜆) (Alice’s measurements) while the outcomes of the second apparatus are represented by 𝐵(𝑏⃗, 𝜆) (Bob’s measurements), where 𝑎⃗ and 𝑏⃗ are arbitrary directions of the Stern-Gerlach magnets and 𝜆 represents a set of hidden variables with probability density 𝜌(𝜆) for some known quantum state. Locality is implicit in our assumption that the outcomes of experiments on 𝐴 depend only on 𝐴’s own characteristics, and are independent of what would be the outcomes in 𝐵, and vice-versa. We then ask if the average value of 𝐴𝐵, that is, their correlation, given by
$$P(\vec{a}, \vec{b}) = \int \rho(\lambda)\, A(\vec{a}, \lambda)\, B(\vec{b}, \lambda)\, d\lambda, \tag{9.2}$$
where ∫ 𝜌(𝜆)𝑑𝜆 = 1, can be equivalent to any measurement of Quantum Mechanics. In other words, we want to know if any hidden variable theory in which we assume a priori its local character can give results similar to those coming from the usual methods of Quantum Mechanics. Assuming, without any loss of generality, that
$$|A(\vec{a}, \lambda)| \le 1, \quad |B(\vec{b}, \lambda)| \le 1 \;\Rightarrow\; |P(\vec{a}, \vec{b})| \le 1, \tag{9.3}$$
we have
$$P(\vec{a}, \vec{b}) - P(\vec{a}, \vec{b}') = \int [A(\vec{a}, \lambda) B(\vec{b}, \lambda) - A(\vec{a}, \lambda) B(\vec{b}', \lambda)]\, \rho(\lambda)\, d\lambda$$
$$= \int A(\vec{a}, \lambda) B(\vec{b}, \lambda) [1 \pm A(\vec{a}', \lambda) B(\vec{b}', \lambda)]\, \rho(\lambda)\, d\lambda - \int A(\vec{a}, \lambda) B(\vec{b}', \lambda) [1 \pm A(\vec{a}', \lambda) B(\vec{b}, \lambda)]\, \rho(\lambda)\, d\lambda.$$
Now, applying the triangle inequality to both sides, using (9.3) and the fact that [1 ± 𝐴(𝑎⃗′, 𝜆)𝐵(𝑏⃗′, 𝜆)] and [1 ± 𝐴(𝑎⃗′, 𝜆)𝐵(𝑏⃗, 𝜆)] are non-negative, we get
$$|P(\vec{a}, \vec{b}) - P(\vec{a}, \vec{b}')| \le \int [1 \pm A(\vec{a}', \lambda) B(\vec{b}', \lambda)]\, \rho(\lambda)\, d\lambda + \int [1 \pm A(\vec{a}', \lambda) B(\vec{b}, \lambda)]\, \rho(\lambda)\, d\lambda,$$
and, since the integral of 𝜌(𝜆) is 1, we finally get
$$|P(\vec{a}, \vec{b}) - P(\vec{a}, \vec{b}')| + |P(\vec{a}', \vec{b}') + P(\vec{a}', \vec{b})| \le 2, \tag{9.4}$$
a result known, generally, as Bell’s inequality. If we put 𝑎⃗′ = 𝑏⃗′ and assume that 𝑃(𝑏⃗′, 𝑏⃗′) = −1, we get
$$|P(\vec{a}, \vec{b}) - P(\vec{a}, \vec{b}')| \le 1 + P(\vec{b}, \vec{b}').$$
Since locality seems to be our only point of departure in the theorem, if the last inequality or the one in (9.4) is satisfied for some physical system, then the system presents local behavior. If the inequality is violated for some physical system, then this system must present non-local behavior.
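The bound in (9.4) can be checked by brute force in the special case in which every outcome is exactly +1 or −1 (an assumption stronger than (9.3), but the usual one for spin measurements). The sketch below enumerates all deterministic assignments of (𝐴(𝑎⃗), 𝐴(𝑎⃗′), 𝐵(𝑏⃗), 𝐵(𝑏⃗′)) and then forms mixtures of them with arbitrary weights standing in for 𝜌(𝜆). The function names and the random weights are hypothetical, introduced only for this illustration.

```python
import random
from itertools import product

# Every deterministic assignment of outcomes (A(a), A(a'), B(b), B(b')),
# each restricted to +1 or -1 (assumption: sharper than |A|, |B| <= 1).
ASSIGNMENTS = list(product((-1, 1), repeat=4))

def left_side_single(A_a, A_ap, B_b, B_bp):
    """Left-hand side of (9.4) when rho is concentrated on a single lambda."""
    return abs(A_a * B_b - A_a * B_bp) + abs(A_ap * B_bp + A_ap * B_b)

def left_side_mixture(weights):
    """Left-hand side of (9.4) for a discrete rho(lambda) given by weights."""
    total = sum(weights)

    def corr(pick):
        return sum(w * pick(s) for w, s in zip(weights, ASSIGNMENTS)) / total

    P_ab = corr(lambda s: s[0] * s[2])
    P_abp = corr(lambda s: s[0] * s[3])
    P_apbp = corr(lambda s: s[1] * s[3])
    P_apb = corr(lambda s: s[1] * s[2])
    return abs(P_ab - P_abp) + abs(P_apbp + P_apb)

single_values = [left_side_single(*s) for s in ASSIGNMENTS]

rng = random.Random(0)  # fixed seed; the weights are arbitrary stand-ins for rho
mixture_values = [left_side_mixture([rng.random() for _ in ASSIGNMENTS])
                  for _ in range(200)]

print(max(single_values), max(mixture_values))
```

Under these assumptions each single assignment saturates the bound, and no mixture with a common 𝜆 for both sides exceeds it, which is the content of (9.4).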
9.2.1. Quantum Mechanical Spin as an Example
Now, let us apply this argument to the situation in which one has a singlet initial state |𝜓𝑆⟩ consisting of two particles of opposite spins. Thus
$$|\psi_S\rangle = \frac{|+\rangle|-\rangle - |-\rangle|+\rangle}{\sqrt{2}}, \tag{9.5}$$
where the spatial part of the function is not represented. This system dissociates at some instant into particle 1 and particle 2 that move in opposite directions and pass through Stern-Gerlach magnets representing A and B, respectively. For an experimental setup in which the Stern-Gerlach apparatus on the left is aligned with direction 𝑎⃗ to measure the spin of particle 1 and the Stern-Gerlach apparatus on the right is aligned with direction 𝑏⃗ to measure the spin of particle 2, we can calculate the correlation in their outcomes as
$$\langle\psi_S|\sigma_a \sigma_b|\psi_S\rangle = \frac{1}{2}\left[\langle +|\sigma_a|+\rangle\langle -|\sigma_b|-\rangle - \langle +|\sigma_a|-\rangle\langle -|\sigma_b|+\rangle - \langle -|\sigma_a|+\rangle\langle +|\sigma_b|-\rangle + \langle -|\sigma_a|-\rangle\langle +|\sigma_b|+\rangle\right]. \tag{9.6}$$
If we put 𝑎⃗ = 𝑘̂ (𝑧-direction), then 𝜎𝑎 = 𝜎𝑧, and thus 𝜎𝑧|+⟩ = |+⟩ and 𝜎𝑧|−⟩ = −|−⟩. The correlation (9.6) then becomes
$$\langle\psi_S|\sigma_a \sigma_b|\psi_S\rangle = \frac{1}{2}\left[\langle -|\sigma_b|-\rangle - \langle +|\sigma_b|+\rangle\right] = -\cos\theta_{ab}, \tag{9.7}$$
where 𝜃𝑎𝑏 is the angle between 𝑎⃗ and 𝑏⃗. However, this result violates (9.4) if we use the experimental setup ∠𝑎⃗𝑏⃗′ = 2𝜃, ∠𝑏⃗𝑎⃗′ = 0 and ∠𝑏⃗𝑏⃗′ = ∠𝑎⃗𝑎⃗′ = 𝜃 (see Figure 9-1). Indeed, in this case, we can use (9.7) to write (9.4) as
$$|\cos(2\theta) - \cos(\theta)| + (1 + \cos\theta) \le 2,$$
which is violated for a range of values of 𝜃, as can be seen in Figure 9-2.
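The range of 𝜃 over which the inequality above fails can be located with a short scan. This is a minimal sketch using only the correlation (9.7) and the angles of the setup just described; the grid of one-degree steps and the function names are illustrative choices.

```python
import math

def quantum_correlation(theta_ab):
    """Singlet correlation (9.7) for magnets separated by the angle theta_ab."""
    return -math.cos(theta_ab)

def bell_left_side(theta):
    """Left-hand side of (9.4) for the setup of the text."""
    P_ab = quantum_correlation(theta)          # angle(a, b)   = theta
    P_abp = quantum_correlation(2.0 * theta)   # angle(a, b')  = 2*theta
    P_apbp = quantum_correlation(theta)        # angle(a', b') = theta
    P_apb = quantum_correlation(0.0)           # angle(a', b)  = 0
    return abs(P_ab - P_abp) + abs(P_apbp + P_apb)

# Scan theta from 0 to 180 degrees in one-degree steps (illustrative grid).
thetas = [k * math.pi / 180.0 for k in range(181)]
violating = [t for t in thetas if bell_left_side(t) > 2.0 + 1e-12]
best = max(thetas, key=bell_left_side)

print(math.degrees(min(violating)), math.degrees(max(violating)))
print(math.degrees(best), bell_left_side(best))
```

On this grid the left-hand side exceeds 2 for angles strictly between 0 and 𝜋/2 and is largest at 𝜃 = 𝜋/3, where it equals 5/2.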
9.2.2. Classical Mechanics’ Counterpart of Spins as an Example
Now that we have already presented the quantum mechanical example, and knowing from chapter 7 that we can work with spins in a fully compatible classical presentation, let us turn to the framework of Classical Physics to see what happens.
Figure 9.3. The choice of the Stern-Gerlach magnets to measure the spin of particle 1 (the smaller sphere on the left of the singlet initial state represented by the bigger sphere) and the spin of particle 2 (on the right). Axis 𝑎⃗ is in the z-direction, while axes 𝑎⃗′ and 𝑏⃗ are inclined by 𝜃 with respect to the z-direction, and axis 𝑏⃗′ is inclined by 2𝜃 with respect to the z-direction. Frames 1 to 4 show different combinations of these axes (or magnet orientations) to correlate the results of the measured spin of both particles.
Figure 9.4. Plot of the left-hand side of Bell’s inequality (9.4) as compared with its limiting value 2, showing that this inequality is violated for the experimental setup of the text.
In the framework of Classical Physics it is very simple to identify the quantities referred to in Bell’s theorem for the associated physical situation of Stern-Gerlach magnets pointing in various directions. First of all, the so called hidden variable is clearly the angle 𝛼. Moreover, if we assume that the particles come from the oven with magnetic moments pointing in arbitrary random and equiprobable directions 𝛼, then 𝜌(𝛼) = 1⁄2𝜋. Then we must find the correlation of what is being measured. Note that the outcomes of the experimental situation are referred (see chapter 7) to vectors in the 𝑂𝑋𝑍 plane given by 𝐴⃗(𝑎⃗, 𝛼) = 𝑖̂𝑋𝑎(𝛼) + 𝑘̂𝑍𝑎(𝛼) and 𝐵⃗(𝑏⃗, 𝛼) = 𝑖̂𝑋𝑏(𝛼) + 𝑘̂𝑍𝑏(𝛼), such that the correlation one is interested in is given by
$$P(\vec{a}, \vec{b}) = \frac{\int \rho(\lambda)\, \vec{A}(\vec{a}, \lambda) \cdot \vec{B}(\vec{b}, \lambda)\, d\lambda}{\sqrt{\int \rho(\lambda)\, |\vec{A}(\vec{a}, \lambda)|^2\, d\lambda}\, \sqrt{\int \rho(\lambda)\, |\vec{B}(\vec{b}, \lambda)|^2\, d\lambda}} = \frac{\frac{1}{2\pi} \int_0^{2\pi} [X_a(\alpha) X_b(\alpha) + Z_a(\alpha) Z_b(\alpha)]\, d\alpha}{\sqrt{\frac{1}{2\pi} \int_0^{2\pi} R_a^2(\alpha)\, d\alpha}\, \sqrt{\frac{1}{2\pi} \int_0^{2\pi} R_b^2(\alpha)\, d\alpha}}, \tag{9.8}$$
where 𝑅𝑠²(𝛼) = 𝑋𝑠²(𝛼) + 𝑍𝑠²(𝛼), for 𝑠 = 𝑎, 𝑏. The above formulas are simply Pearson’s correlation factor, where we are assuming that
$$\int \rho(\lambda)\, \vec{A}(\vec{a}, \lambda)\, d\lambda = \int \rho(\lambda)\, \vec{B}(\vec{b}, \lambda)\, d\lambda = 0. \tag{9.9}$$
These last two constraints are shown to be satisfied in what follows. Note that when particle 1 comes out of the singlet state at an angle 𝛼, then particle 2 comes out of the singlet state at the angle 𝜋 + 𝛼. If we put magnet one pointing to the Z-direction, and magnet two pointing to a direction inclined by the angle 𝜑 with respect to the Z-direction, then we have, according to the results of chapter 7, the following expressions [see (7.38)]:
$$Z_a(t) = -\frac{\beta\mu \cos(\alpha)}{2m}\, t^2; \qquad X_a(t) = \frac{\beta\mu \sin(\alpha)}{2m}\, t^2,$$
and
$$Z_b(t) = -\frac{\beta\mu \cos(\pi + \alpha - \varphi)}{2m}\, t^2; \qquad X_b(t) = \frac{\beta\mu \sin(\pi + \alpha - \varphi)}{2m}\, t^2,$$
such that the result for 𝑃(𝑎⃗, 𝑏⃗) becomes
$$P(\vec{a}, \vec{b}) = \frac{\frac{1}{2\pi} \left(\frac{\beta\mu}{2m}\right)^2 t^4 \int_0^{2\pi} [\cos(\alpha)\cos(\pi + \alpha - \varphi) + \sin(\alpha)\sin(\pi + \alpha - \varphi)]\, d\alpha}{\left(\frac{\beta\mu}{2m}\right)^2 t^4 \sqrt{\frac{1}{2\pi} \int_0^{2\pi} d\alpha\; \frac{1}{2\pi} \int_0^{2\pi} d\alpha}},$$
which is simply
$$P(\vec{a}, \vec{b}) = -\frac{1}{2\pi} \int_0^{2\pi} \cos(\varphi)\, d\alpha = -\cos(\varphi). \tag{9.10}$$
Note that our assumptions (9.9) are, indeed, valid. Result (9.10) is identical to the quantum mechanical one (9.7), in exactly the same experimental situation. Thus, for the same choices of the directions of the Stern-Gerlach apparatuses shown in Figure 9-1 we get exactly the same graph shown in Figure 9-2. This shows that Bell’s inequality is violated also in the classical (indeed, Newtonian) framework, in exactly the same manner as it is violated in the quantum mechanical framework. However, now we are presenting the associated hidden variable theory (Newtonian Mechanics with 𝛼 as our parameter), which is certainly deterministic and is trivially local (just think about the relativistic extension of all these calculations). This seems in flagrant contrast with the conclusions of Bell’s theorem: a deterministic and local “hidden” variable theory violating Bell’s inequality.
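Result (9.10) and the constraints (9.9) can be reproduced by direct numerical integration over 𝛼. The sketch below assumes only the functional form of the displacements quoted from (7.38), with the common factor 𝛽𝜇𝑡²/2𝑚 replaced by a hypothetical constant `k` (it cancels in the ratio); the grid size and function names are illustrative.

```python
import math

def displacement(angle, k=1.0):
    """(X, Z) displacement for a magnetic moment at the given angle.
    k stands for beta*mu*t**2/(2m); its value is an arbitrary assumption."""
    return k * math.sin(angle), -k * math.cos(angle)

def classical_correlation(phi, n=3600):
    """Pearson-type correlation (9.8) with alpha uniform on [0, 2*pi)."""
    dot = norm_a = norm_b = 0.0
    for i in range(n):
        alpha = 2.0 * math.pi * i / n
        xa, za = displacement(alpha)                  # particle 1, magnet along Z
        xb, zb = displacement(math.pi + alpha - phi)  # particle 2, magnet at phi
        dot += xa * xb + za * zb
        norm_a += xa * xa + za * za
        norm_b += xb * xb + zb * zb
    return dot / math.sqrt(norm_a * norm_b)

def mean_displacement(offset, n=3600):
    """Average (X, Z) over alpha, used to check the constraints (9.9)."""
    sx = sz = 0.0
    for i in range(n):
        x, z = displacement(2.0 * math.pi * i / n + offset)
        sx += x
        sz += z
    return sx / n, sz / n

for phi in (0.0, 0.3, 1.0, 2.0):
    print(phi, classical_correlation(phi), -math.cos(phi))
```

The printed pairs coincide, and `mean_displacement` returns values at the level of rounding error for any offset, in line with (9.9).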
Figure 9.5. Artistic representation of a purely classical experimental setup to test Bell’s theorem for Classical Physics.
One may still be uneasy about the result, since it is being presented as a result closely connected to Quantum Mechanics, etc. Let us reformulate it to fit only within Classical Physics, without any reference to spins and quantum domains.
Example: consider the hypothetical situation shown in Figure 9-3. There we show an electromechanical device built in such a manner that it will dispense small magnetic coins with their magnetic moments aligned with random directions on the XZ plane (perpendicular to the plane of the movement). These coins are made to pass through Stern-Gerlach magnets
as those already considered. What would be Newton’s equation for this experimental setup? Precisely the one giving the solutions we have just shown. These solutions would then entail the same correlation already calculated in (9.10) and will, thus, violate locality!
What has gone wrong? The classical reconstruction of the phenomenon allows us to speculate about the very moment at which the example violates the suppositions of the theorem and makes it break down. The crucial passage in the classical presentation is the one in which we assume that if coin 1 comes out of the singlet state with its magnetic moment inclined by an angle 𝛼 with respect to the Z-direction, then coin 2 surely comes out of the singlet state with its magnetic moment inclined by an angle 𝜋 + 𝛼. It is this crucial supposition that gives us the final result for 𝑃(𝑎⃗, 𝑏⃗) in this classical version of the phenomenon. It is this very assumption that gives 𝑃(𝑎⃗, 𝑏⃗) in the quantum mechanical version of the phenomenon, since it is this assumption that was used in (9.7), in the same manner as in (9.10).
However, the previously mentioned assumption comes naturally from conservation arguments. Conservation laws guarantee that there is information being passed from one system to the other in a “non-local” way, but this “non-locality” is not physical: it does not come from signals travelling at velocities greater than that of light. This sort of information is engraved in the way the physical situation is constructed (from singlet states, for instance). Thus, a conservation law seems to be the source of the breakdown of Bell’s assumptions when saying that, for instance, 𝐴⃗(𝑎⃗, 𝜆) is related only to 𝑎⃗. Indeed, when considering Bell’s theorem, we simplified the formalism (as is usually done in the literature) by writing 𝐴⃗(𝑎⃗, 𝜆) and 𝐵⃗(𝑏⃗, 𝜆). However, by doing this we are implicitly assuming that the same set of hidden variables 𝜆 describes both A and B, without qualification.
It is important to qualify this point, now that we have found a problem with the application of Bell’s theorem. Indeed, in general, each observer, A or B, will describe the movement of her side of the experiment with the subsystem’s own hidden variables (in our present example, the different angles the magnetic moments of coins 1 and 2 make with the Z-axis). This means that, in general, 𝜆 may be a vector having some of its components related to A, while others are related to B. When we integrate over the variable 𝜆 we are in fact integrating over a vector, that is, over 𝑑𝜆⃗. In the present case, we have the angles 𝛼 and 𝛼′ related to A and B, respectively, that is, 𝑑𝜆⃗ = 𝑑𝛼𝑑𝛼′. But then, (9.2) should have been written as
$$P(\vec{a}, \vec{b}) = \int \rho(\alpha, \alpha')\, \vec{A}(\vec{a}, \alpha) \cdot \vec{B}(\vec{b}, \alpha')\, d\alpha'\, d\alpha, \tag{9.11}$$
where 𝜌(𝛼, 𝛼′) becomes the possible origin of some violation of the presuppositions about locality in Bell’s theorem. In fact, for the problem we were analyzing, we may put ρ(α, α′) = ρ(α)Ω(α′|α), where ρ(α) = 1/2π and Ω(α′|α) is the conditional probability of an outcome α′ given that α obtained. Function Ω(α′|α) encompasses the correlation function in the form Ω(α′|α) = c(α, α′)ρ(α′). For the present physical situation, we have Ω(α′|α) = δ(α − α′ − π), where δ(x) is Dirac’s delta distribution. Dirac’s delta distribution represents precisely the correlation between α and α′ and it is most probably the origin of the problem. In fact, if we replace 𝑐(𝛼, 𝛼′) by 1 (no correlation
between α and α′), we get simply 𝑃(𝑎⃗, 𝑏⃗) = 0, trivially satisfying Bell’s theorem. Other correlations, different from 1 and 𝛿(𝛼 − 𝛼′ − 𝜋)/𝜌(𝛼′), will give intermediate results for 𝑃(𝑎⃗, 𝑏⃗), some obeying Bell’s theorem, others not. All the previous arguments suggest that we should be much more careful when analyzing the notion of non-locality in physical systems.
Be that as it may, the fact is that Bell assumes that Bohmian Mechanics can be considered a deterministic support for Quantum Mechanics, that is, a true hidden variable theory. From this belief, Bell then also assumes Bohmian Mechanics to be a true non-local reconstruction of the quantum mechanical formalism (since, according to him, Quantum Mechanics cannot have a deterministic local support). Given our previous results regarding the applicability of Bell’s theorem, we are faced with two different questions: (a) the extent to which Bohmian Mechanics can, indeed, be considered a hidden variable theory for Quantum Mechanics, mainly with respect to being a deterministic approach; and (b) the extent to which the notion of “non-locality” can be applied to Bohmian Mechanics.
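The role played by the correlation factor 𝑐(𝛼, 𝛼′) in (9.11) can be made concrete with a small numerical comparison. The sketch below evaluates (9.11) on a uniform grid for the two limiting cases discussed above: the delta-type coupling 𝛼′ = 𝛼 + 𝜋 and the uncorrelated case 𝑐 = 1. Unit-length displacement vectors are assumed (so the normalization in (9.8) is 1), and the grid size, the angle 𝜑 and the function names are hypothetical.

```python
import math

def unit_vector(angle):
    """Unit (X, Z) displacement; the common amplitude is set to 1 by assumption."""
    return math.sin(angle), -math.cos(angle)

def correlation(phi, coupled, n=360):
    """P(a, b) from (9.11) on a uniform grid of angles.
    coupled=True  : alpha' is fixed to alpha + pi (delta-type correlation).
    coupled=False : alpha and alpha' are independent and uniform (c = 1)."""
    alphas = [2.0 * math.pi * i / n for i in range(n)]
    total = 0.0
    count = 0
    for alpha in alphas:
        partners = [alpha + math.pi] if coupled else alphas
        for alpha_p in partners:
            xa, za = unit_vector(alpha)
            xb, zb = unit_vector(alpha_p - phi)  # magnet two rotated by phi
            total += xa * xb + za * zb
            count += 1
    return total / count

phi = 0.5  # illustrative magnet angle
print(correlation(phi, coupled=True), -math.cos(phi))
print(correlation(phi, coupled=False))
```

With the coupling in place the result reproduces (9.10); with independent angles the correlation vanishes up to rounding error, as stated in the text.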
9.3. THE RELATION BETWEEN BOHMIAN MECHANICS AND HIDDEN VARIABLE THEORIES
There are a number of arguments one can present to deny to Bohmian Mechanics the status of a deterministic theory and, thus, the status of a hidden variable support to Quantum Mechanics. In what follows, we present some of these arguments.
9.3.1. The Quantum Potential and the Stochastic Velocity
When we were developing the stochastic derivation of Quantum Mechanics it was possible to show the deep relation between Bohm’s quantum potential 𝑄(𝑞⃗; 𝑡) and the stochastic velocity 𝑢⃗(𝑞⃗; 𝑡). This relation is given by
$$Q(\vec{q}; t) = -\frac{m}{2}\left[\vec{u}(\vec{q}; t)^2 + \frac{\hbar}{m} \nabla \cdot \vec{u}(\vec{q}; t)\right].$$
This result unequivocally shows that the source of the quantum potential is the random fluctuations of the physical system’s stochastic behavior. Indeed, in the stochastic derivation of the Schrödinger equation we defined two velocities: velocity 𝑣⃗(𝑞⃗; 𝑡) represented the average movement of the physical system, while the other velocity 𝑢⃗(𝑞⃗; 𝑡) represented only the random deviations from this average movement, such that the true velocity was written as 𝑐⃗ = 𝑣⃗ + 𝑢⃗. If 𝑄(𝑞⃗; 𝑡) is related solely to the stochastic velocity 𝑢⃗(𝑞⃗; 𝑡), its statistical origin is definitely established.
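The relation between 𝑄 and 𝑢⃗ can be verified numerically in one dimension. The sketch below rests on two assumptions that are not stated in this section: that the stochastic velocity takes the usual osmotic form 𝑢 = (ℏ/𝑚) 𝑅′/𝑅, with 𝑅 the modulus of the probability amplitude, and that the quantum potential has Bohm’s standard form 𝑄 = −(ℏ²/2𝑚) 𝑅″/𝑅. The Gaussian amplitude, the units ℏ = 𝑚 = 1 and all function names are illustrative.

```python
import math

HBAR = 1.0   # illustrative units
MASS = 1.0
SIGMA = 0.8  # width of the hypothetical Gaussian amplitude

def amplitude(x):
    """Hypothetical real amplitude R(x) of a Gaussian state."""
    return math.exp(-x * x / (4.0 * SIGMA ** 2))

def d1(f, x, h=1e-3):
    """Central finite-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-3):
    """Central finite-difference second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def stochastic_velocity(x):
    # ASSUMPTION: osmotic form u = (hbar/m) R'/R; not taken from this section.
    return (HBAR / MASS) * d1(amplitude, x) / amplitude(x)

def q_from_velocity(x):
    """Q = -(m/2) [u^2 + (hbar/m) du/dx], the relation quoted in the text."""
    u = stochastic_velocity(x)
    div_u = d1(stochastic_velocity, x)
    return -(MASS / 2.0) * (u * u + (HBAR / MASS) * div_u)

def q_bohm(x):
    """ASSUMPTION: Bohm's standard form Q = -(hbar^2/2m) R''/R."""
    return -(HBAR ** 2 / (2.0 * MASS)) * d2(amplitude, x) / amplitude(x)

for x in (-1.5, 0.0, 0.7, 1.5):
    print(x, q_from_velocity(x), q_bohm(x))
```

Under these assumptions the two columns agree to within the finite-difference error, so the expression in terms of 𝑢⃗ carries exactly the same content as the quantum potential.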
Furthermore, we arrived at the Schrödinger equation by means of an average (an average that, after the advent of the Langevin derivation, can be easily seen to be connected to the fluctuation-dissipation theorem, see next section.) Thus, Bohm’s formalism ought to be considered as a mere statistical average of the true random behavior furnished by the Langevin approach. Indeed, it does not matter at all if Bohmian equations can be put in the appearance of a Hamilton-Jacobi equation (see next section); the referents of these equations’ main symbols remain statistical. If one may argue in favor of a support to Quantum Mechanics, then this support should be represented by the Langevin equations. However, these equations are naturally stochastic and thus intrinsically statistical.
9.3.2. On the Hamilton-Jacobi Form of Bohmian Mechanics
It is always stated that Bohmian Mechanics puts Quantum Mechanics into a Hamilton-Jacobi form. However, the best one can say on that matter is that Bohmian Mechanics mimics a Hamilton-Jacobi theory (with the inclusion of an extra term named quantum potential). Indeed, the Hamilton-Jacobi theory is a theory on phase-space, while Bohmian Mechanics refers to the momentum as 𝑝⃗(𝑞⃗, 𝑡), which is not independent of the variable 𝑞⃗ and, thus, the theory never qualifies as a true Hamilton-Jacobi theory. In fact, one never tries to solve Newton’s (or Hamilton’s) equation coming from Bohm’s reconstruction. To obtain Bohmian trajectories, one needs to consider only the equation for the velocity or momentum, given by 𝑝⃗(𝑞⃗; 𝑡) = ∇𝑠(𝑞⃗; 𝑡), where 𝑠(𝑞⃗; 𝑡) is the phase of the probability amplitude function. Furthermore, this 𝑝⃗(𝑞⃗; 𝑡) comes from the average momentum (as represented in the entropy derivation), a statistical descriptor. Moreover, when we are considering stationary states, the value of 𝑝⃗(𝑞⃗; 𝑡) can be zero (s states of the hydrogen atom and, indeed, any quantum mechanical situation in which the probability amplitude is real). If we were to consider 𝑝⃗(𝑞⃗; 𝑡) literally (not as an average), this result becomes simply ludicrous. It is not correct to conclude, from the fact that a theory implies sharp trajectories, that it is a deterministic theory. These sharp trajectories can be the outcome of averages previously done, of which one may be unaware.
9.3.3. Relations between Statistical Correlation and Non-Local Behavior
Everyone knows that the quantum potential encompasses what is generally called non-local interactions between parts of the physical system. However, we can scrutinize the origin of these non-local interactions by proposing the following probability density function (for a two particle physical system to simplify the analysis):
$$\rho(\vec{r}_1, \vec{r}_2) = \rho_1(\vec{r}_1)\, \rho_2(\vec{r}_2)\, c(\vec{r}_1, \vec{r}_2),$$
where 𝑐(𝑟⃗1, 𝑟⃗2) represents correlation (if 𝑐(𝑟⃗1, 𝑟⃗2) = 1 we have no correlation). If we substitute this probability density function into the term
$$-\frac{1}{4} \partial_k^2 \ln[\rho(\vec{r}_1, \vec{r}_2)] = -\frac{1}{4}\left[\frac{\partial_k^2 \rho_k}{\rho_k} - \left(\frac{\partial_k \rho_k}{\rho_k}\right)^2\right] - \frac{1}{4}\left[\frac{\partial_k^2 c}{c} - \left(\frac{\partial_k c}{c}\right)^2\right]$$
that represents the random force in the Langevin equation, we get a result that becomes separable in the variables 𝑟⃗1, 𝑟⃗2 whenever we assume 𝑐(𝑟⃗1, 𝑟⃗2) = 1. In that case, the problem splits into two separable and independent Langevin equations (or Bohmian equations) in which one finds no “non-local” interaction. Therefore, it is obvious that we are getting non-locality because we decided to call “potential” (a source of possible deterministic behavior) a term reflecting correlations (a source of necessary statistical behavior).
Let us clarify this point. Indeed, the presence of correlations in some quantum mechanical problem (mainly spin correlations) comes imprinted in the way we write the probability amplitude of the problem. For instance, when it comes to particles with half integral spins we must write the total probability amplitude as the antisymmetric product represented by the Slater determinant (assume there are only two particles for simplicity)
$$\Psi(\vec{x}_1, \vec{x}_2) = \frac{1}{\sqrt{2}} \begin{vmatrix} \psi_1(\vec{x}_1)\,\omega(\uparrow) & \psi_2(\vec{x}_1)\,\omega(\uparrow) \\ \psi_1(\vec{x}_2)\,\omega(\downarrow) & \psi_2(\vec{x}_2)\,\omega(\downarrow) \end{vmatrix}.$$
It is easy to show that all spin correlation comes from the antisymmetry embedded in Ψ(𝑥⃗1, 𝑥⃗2). In fact, the Hartree-Fock approach to multielectronic quantum systems is based precisely on the assumption of an antisymmetric probability amplitude to arrive at the nonlinear Hartree-Fock system of equations in which there appears, besides all the usual terms, one directly related to the antisymmetric property called the exchange-correlation potential. Thus, the strategy of the Hartree-Fock approach is to transform the requisites of spin correlation, imprinted in the antisymmetric probability amplitude, into a potential: the exchange-correlation potential (which is non-local, by the way). Besides that, the kinetic energy is also affected since it becomes written as (2 particles, Hartree units and closed shell systems)
$$T = 2 \int \psi_a(\vec{r}) \left(-\frac{1}{2} \nabla^2\right) \psi_a(\vec{r})\, d\vec{r},$$
that is, the shell scheme is automatically imposed on the solution. Thus, it must be stressed: it is customary for physicists to present statistical constraints, most generally correlations, as potentials. This comes from the fact that Quantum Mechanics gives us a way to consider such statistical constraints in the formalism, by simply writing down the probability amplitude in some specific form. Thus, the strategy of taking correlations into potentials is natural, since we do know (this is the content of Pauli’s
exclusion principle) that spin correlation means an antisymmetric state vector. Thus, we use what we know and transform the problem such that the whole specific correlation profile of the problem is imprinted into a potential, which can then be analyzed by usual mathematical methods. This is mathematically acceptable, but becomes dangerous when interpretations are our concern. Indeed, if one has physical laws (e.g. of conservation) governing any event which must be understood using statistical tools, then these laws, of course, induce statistical correlations on the event. Thus, the law of angular momentum conservation will imply that, for a two particle system, whenever the angular momentum of one of the particles is known, the other is also known. This has nothing to do with signals going back and forth from each particle of the physical system. The law is a constraint over the behavior of the system. Of course, if this law is embedded into the problem as a potential, by any mathematical process, the physical system becomes naturally “non-local”. However, this “non-locality” is not physical. It reflects only our initial choice of calling the realization of the correlations a potential, and our later confusion, set by bad terminology taken too seriously, about what is a physical potential and what is merely a mathematical artifice.
It should be obvious at this point that any statistically correlated problem that is mathematically managed to show this correlation as a potential (because of the way one writes the probability density function) will necessarily present “non-local potentials”. Indeed, since correlation is, by definition, an instantaneous relation between parts of a system, its codification into a potential can give only “non-local potentials”, even if we were dealing with (dynamically) relativistic (thus, local) systems⁵.
⁵ All these assumptions of some difference between “signals” and “information” (with the first being capable of traveling at velocities greater than light) are nothing but a runaway solution for a self-induced trap. The difference (that does exist) is marked by the notion of “meaning”, a difference that should be irrelevant for material systems with no cognitive trait.
9.4. BOHMIAN EQUATIONS: WHOLENESS
We can develop our notions about Bohmian Mechanics a little bit more to unravel the notion of time that is being assumed when one uses this approach as a true deterministic theory. As we will see, the notion of wholeness is a mere unfolding of the abstruse notion of time when Bohmian Mechanics is assumed as a deterministic dynamical theory. We have already said that, from the present approach’s perspective, 𝑄(𝑟⃗ᴺ, 𝑡) is by no means a true potential. It reflects the effect of fluctuations, since we have seen that it is nothing but the time average of the Langevin fluctuation term (see chapter six). However, the fluctuations characterizing a physical system come from many different sources, mainly when quantum systems are our concern. Indeed, if we are considering stationary systems for instance, since 𝑅(𝑟⃗ᴺ, 𝑡) comes from 𝜓(𝑟⃗ᴺ, 𝑡) and 𝜓(𝑟⃗ᴺ, 𝑡) is obtained from the Schrödinger equation on which we apply boundary conditions, then 𝑅(𝑟⃗ᴺ, 𝑡) must also incorporate these features. Needless to say, there is nothing more “non-local” than boundary conditions.
The fact that the quantum potential (let us adhere to the usual nomenclature) depends on the boundary conditions reflects the different aspects of the parameter time in the context of a statistical equation when compared to its role in the context of a dynamical equation. When one solves the Schrödinger equation for some time-independent problem, for instance, one is looking for the stationary solutions of the problem, that is, those probability distributions that do not depend on time. Then, strictly speaking, there is no time parameter for stationary quantum mechanical problems. This means that any effect coming from the boundaries is taken into consideration only after any transient effect has disappeared, that is, after the components of the system become “aware” of these boundary conditions (by hitting them, for instance). When these boundary conditions are incorporated into the dynamical equations by means of a “potential”, there is no way to escape from some sort of “non-locality”. But this has nothing to do with some particular aspect of Quantum Mechanics (its “weirdness”, one would say), and the next example will show this quite clearly.
Example: Consider the context of electrostatics and the boundary condition on metallic surfaces. We all know that we must have the tangential component of the electric field, 𝐸𝑡, equal to zero over such surfaces. However, let us think of a problem in which a point charge is placed in front of a grounded metallic plane. We know, then, because of the boundary condition, that the lines of electric field must arrive at the metallic plate perpendicularly. However, at the time the charge was placed in front of the plane its lines of field were just radial.
As time evolves, however, the tangential components of the field are dissipated, since they give rise to dissipative charge movements (Joule effect), or the charge distribution over the plane changes, leaving only the orthogonal field components. If we manage to insert the boundary conditions into some dynamical equation related to the field behavior, then it is pretty much obvious that the problem becomes “non-local”. However, this is the mere consequence of our confounding two different time structures in the interpretation of a mathematical maneuver.
Example: If the reader feels the last example is too abstract and still thinks of the quantum potential and its non-local nature as some aspect specific to Quantum Mechanics, consider then the wave equation for the classical electromagnetic fields (component 𝐸𝑖(𝑟⃗, 𝑡) to simplify the math), given by
$$\nabla^2 E_i(\vec{r}, t) - \frac{1}{c^2} \frac{\partial^2 E_i(\vec{r}, t)}{\partial t^2} = 0, \tag{9.12}$$
where 𝑐 is the velocity of light. If we use the same ansatz as before, 𝐸𝑖(𝑟⃗, 𝑡) = 𝑅𝑖(𝑟⃗, 𝑡)exp(𝑖𝑆(𝑟⃗, 𝑡)), we get
$$\frac{1}{c^2}\left[\frac{\partial S(\vec{r}, t)}{\partial t}\right]^2 = [\nabla S(\vec{r}, t)]^2 - \frac{\partial^\mu \partial_\mu R(x, t)}{R(x, t)}, \tag{9.13}$$
and
$$\partial_\mu \left(R(x, t)^2\, \partial^\mu S(x, t)\right) = 0, \tag{9.14}$$
where ∂^μ = (∇, 𝑐⁻¹∂ₜ) and ∂_μ = (∇, −𝑐⁻¹∂ₜ) are contravariant and covariant four-vectors and ∂_μ∂^μ = ∇² − 𝑐⁻²∂ₜ² is the d’Alembertian. If we associate 𝜕𝑆/𝜕𝑡 with the energy, and ∇𝑆 with the momentum, as we do in the derivation of Bohm’s equation, we get the structure
$$E^2/c^2 = \vec{p}^{\,2} + Q(x, t), \tag{9.15}$$
where 𝑄(𝑥, 𝑡) represents some non-local “potential” in exactly the same terms as in the quantum mechanical formalism (and we are talking about classical electromagnetic fields!). Equation (9.14) says essentially that the flux of energy is zero (conservation equation). We would then say that the photons (which would have energy E and momentum p) are now driven by some wave-like “potential” 𝑄(𝑥, 𝑡). Photons, thus, would have to be related to one another by some sort of non-local “potential”. If our previous more abstract arguments were not enough to convince the reader that Bohm’s equation has nothing to do with non-locality, this last example should do the deed. By exactly the same arguments put forward for Quantum Mechanics (superposition included), one should conclude that a deterministic theory based on trajectories would have to be nonlocal to furnish the same results of the electromagnetic theory?! Preposterous. Some authors spent a lot of time trying to make this so called non-locality intelligible. The usual maneuver is to propose some difference between signal and information, as we have already said. The argument is that these “weird” quantum systems are capable of exchanging signals instantaneously, but not information, and relativity theory should apply only to information. This is a perfect example of the consequences of bad interpretations of a physical theory. This difference between signal and information, when referring to physical systems, is simply ludicrous, grotesque. The only difference between these two concepts must be referred to meaning: signal conveys no meaning, while information does. However, it remains to be said what meaning (as referred to human traits) has to do with physical systems—it does not help to say that information refers to unstructured signals, since this just begs the question with the “unstructured” concept. 
Should an electron be capable, in relativistic interactions, of understanding some information, as opposed to not understanding signals traveling at speeds greater than light? Even if we go along with this joke, why not say the same for sound? Now sound behavior becomes weird?! It seems that it would be better to inaugurate some Church of the Weird Behavior than to do physics on these terms. This may seem a joke; it is not! Bohmian mechanics is usually related to the so-called “wholeness”, the property of physical objects to be “connected” with all other objects in the world. Wholeness is, of course, just a name for non-locality taken to its ultimate consequences. In the end, it is nothing but guru-like bad metaphysics. Sadly, Quantum Mechanics was prolific in wild interpretations such as that regarding “wholeness”. The so-called Everett’s Many Worlds Interpretation is another interpretation that becomes absurd when it is taken out of its context as a mere artifice to make possible a general relativistic extension of Quantum Mechanics with respect to the principle of reduction of the wave packet. Still, these two awkward interpretations, to say the least, have delighted mystics and esotericists throughout the twentieth century (and still do). It is very difficult to differentiate in Bell’s inequality what is a test for correlated statistical behavior from what is a test for non-local interactions. Aspect, in an impressive series of experiments [107], showed that Quantum Mechanics violates Bell’s inequality. These experiments have thus shown that Quantum Mechanics refers to correlated statistical behavior. Anything else is nothing but wild extrapolation.
9.5. ON DETERMINISM
Von Neumann’s conclusion that Quantum Mechanics based on the Schrödinger equation cannot be put into a dispersionless, and thus deterministic, format is correct, although one must get rid of some problems with his operator formation rules so that his proof would also become appropriate (but, eventually, irrelevant). Note, however, that this is not a proof that Quantum Mechanics does not have a deterministic support. It is a proof that Quantum Mechanics based on the Schrödinger equation does not have such a deterministic support. In chapter six, about the Langevin derivation of the Schrödinger equation, we unraveled a level of analysis of quantum mechanical systems that is deeper than the one assumed by the use of the Schrödinger equation. Indeed, at the level of the Langevin equations we get not only the domain in which fluctuations have already been averaged out, but also the transient level, in which fluctuations take control (time intervals insufficient to let the averages stabilize). In this sense, the Schrödinger equation emerges from the underlying Langevin equations. We have also shown that the Newtonian limit should be taken starting from the Langevin equations ($\gamma \to 0$), not from the Schrödinger equation, where the mathematical assumption of $\hbar \to 0$ is blatantly inconsistent. This deeper level of analysis, given by the Langevin equations underpinning the Schrödinger equation, is also a statistical one, although in a sense quite different from the one exposed by the level of analysis of the Schrödinger equation. However, one must remember that the fluctuations that introduce randomness into the corpuscular subsystem are the outcome of the theoretical choice of treating the field subsystem only as a mean field theory.
We cannot assume, in advance, that it is impossible to devise some deterministic structure of interaction between the two subsystems such that the correct fluctuation profile would be the mere outcome of the separation of the two systems. Of course, if such a theory is devised, then it would inaugurate a new level of analysis, deeper than the one related to the Langevin equations. For this level of analysis, it remains obscure if Bell’s theorem would be applicable. In some sense, therefore, some may always keep the optimism and assume that “(…) what is proved by the impossibility proofs is [indeed] lack of imagination”.
Chapter 10
INDISTINGUISHABILITY
Since the beginning of Quantum Mechanics, the differences between the ways in which one must count states within the classical framework and the quantum one have been a source of astonishment. Although the theme is an old one, there is still some controversy on the subject. Three different probability distribution functions, known as Boltzmann's, Bose-Einstein's and Fermi-Dirac's, come as a result of three different counting strategies, each one with its own properties and characteristics. From Boltzmann's and Gibbs' seminal works[108], through the investigations of Ehrenfest[109], to the present approaches on the subject[110], it has almost always been assumed that the major difference between classical and quantum countings is due to our ability to distinguish among classical particles, even if they are identical. There are two main sources of arguments in favor of this distinguishability:

1. the first argument comes from the assumption that one always has trajectories in the classical domain and they suffice to “follow” particles and, thus, keep track of their identities. On the other hand, there is no quantum counterpart for these trajectories (sometimes it is assumed that there are trajectories within quantum mechanics, but when the probability amplitudes of these particles overlap they lose their distinguishability).
2. the second argument comes from the assumption that one can always place different numbers, colors or whatever on classical particles so that they can be tracked (or, depending on their size and velocity, simply because we can follow them with our eyes).

In the end, all classes of arguments relate distinguishability to tracking capability. The fact is that it is generally assumed that classical entities, otherwise identical, can always be distinguished from each other because they can be tracked, while quantum ones should be assumed indistinguishable; this is the orthodoxy on that matter.
It is obvious that there should be no doubt about the validity of the mathematical expressions describing Bose-Einstein or Fermi-Dirac statistics. These expressions have already passed experimental tests countless times.
However, we will try to show that the connection between these formulas (Boltzmann's, Bose-Einstein's or Fermi-Dirac's) and the concept of (in)distinguishability, conceived as a property coming solely from the particles, is misleading. Furthermore, we will try to show that the use of Boltzmann's expression for the weight function is inadequate, no matter which arguments on distinguishability one uses.
10.1. SOME TRIVIAL RESULTS ON COUNTING
What first strikes the uninformed reader is the very difference implicit in the use of the two distinct concepts: identity and distinguishability. There is no such thing when one uses the traditional counting methods of statistics (neither classical statistics nor quantum statistics, simply statistics).
Figure 10.1. How to arrange nine geometrical objects with respect to their color (even if we know that they have different shapes).
Take, as an example, the following trivial situation where everyone (presumably) will agree on the way one should be counting. One has nine geometric objects as in Figure 10-1. Three of them are red (R), two are yellow (Y) and four are blue (B), while the shapes are as indicated in the figure. We then ask [Q1]: in how many different ways can we arrange them with respect to color? This is a trivial problem that can be found in any textbook on probability and statistics. The answer is, obviously,

\[ W_{Q1} = \frac{9!}{3!\,2!\,4!}, \]

since, with respect to color, the first three geometrical objects are considered indistinguishable, as are the fourth and fifth (yellow) objects and the last four (blue) objects. Of course, in the case of $n$ objects that can be considered indistinguishable with respect to some property, we would get simply

\[ W_n = \frac{n!}{n_1!\,n_2!\cdots n_K!}, \tag{10.1} \]
where $K$ is the number of different values for the property, $n$ is the number of objects and $n_i$ is the number of indistinguishable objects within the class $i$ defined by the property used to count (e.g. color, color plus constitution, etc.). Indeed, we now change our question to: [Q2]: in how many different ways can we arrange them with respect to color and shape? The answer is again trivial and given by

\[ W_{Q2} = \frac{9!}{1!\,2!\,1!\,1!\,1!\,2!\,1!}, \]

since, now, the categories of indistinguishable objects changed when we changed the property used to count. If this were not the case, almost all trivial problems of probability and counting would be wrong. Indeed, consider, for instance, counting “different arrangements of male and female siblings (with respect to gender) when three are male and two are female”. One should count all male (female) siblings as identical, no matter how different they are. If one counts different male siblings as if they were distinguishable, the result of the counting is simply wrong. The fact is that when it comes to some statistical counting of arrangements and combinations there are no two different concepts to consider independently: identity and distinguishability. There is only one: distinguishability with respect to the property being used to count. Moreover, each $n_i!$ in the denominator of (10.1) reflects exactly one class of indistinguishable objects, given the property used to count (the property, of course, can be complex and refer to more than one characteristic of the objects, like color and shape). The obvious conclusion is that the concept of distinguishability is always used in an operational perspective, not an ontological one. It means nothing if, in Q1, we can “see” that some colored geometrical objects have different shapes; the property used to count being color, their distinguishability can be phrased only with respect to their color, irrespective of what “they really are”. The previous argument remains unchanged if we apply our counting skills to classical or quantum worlds. Given the properties of the objects, counting proceeds without reference to such worlds. In the usual argument, classical particles, when having exactly the same physical properties (such as mass, charge, etc.), must be considered identical. However, they also should be considered distinguishable because we can put some mark on them or simply track them by their trajectories.
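The counts for Q1 and Q2 can be checked with a few lines of code. What follows is only a minimal sketch and not part of the original argument: the function name `arrangements` is an illustrative choice, the class sizes for Q2 are read off the denominator of $W_{Q2}$, and the five-object brute-force case is a hypothetical smaller set used to confirm that the formula counts distinct sequences.

```python
from itertools import permutations
from math import factorial


def arrangements(class_sizes):
    """Distinct orderings of objects grouped into classes of mutually
    indistinguishable members, as in equation (10.1)."""
    total = factorial(sum(class_sizes))
    for size in class_sizes:
        total //= factorial(size)
    return total


# Q1: the property used to count is color only (3 red, 2 yellow, 4 blue).
w_q1 = arrangements([3, 2, 4])

# Q2: the property is color and shape; class sizes are taken from the
# denominator 1! 2! 1! 1! 1! 2! 1! quoted in the text.
w_q2 = arrangements([1, 2, 1, 1, 1, 2, 1])

# Brute-force check on a smaller, hypothetical set (2 red, 1 yellow, 2 blue):
# equal color sequences collapse into one element of the set.
brute = len(set(permutations("RRYBB")))

print(w_q1, w_q2, brute, arrangements([2, 1, 2]))
```

Changing the property used to count only changes the list of class sizes passed to the function; nothing about the objects themselves enters the computation.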
From the previous considerations, it should be clear how grotesque such an argument is. In the usual physical situation we are counting how energy cells can be filled by these particles. The property used to count is energy. It is, thus, irrelevant if we can or cannot track the particles. If these particles are considered identical, the only thing that can differentiate them is the energy they assume, and this is precisely what is inscribed into the 𝑛𝑖 ! in the
denominator of (10.1): there are $n_i$ particles assuming energy in the cell $[\varepsilon_i, \varepsilon_i + \Delta\varepsilon_i]$, $n_j$ particles assuming energy in the cell $[\varepsilon_j, \varepsilon_j + \Delta\varepsilon_j]$, etc. On one hand, if tracking capabilities were to be assumed, then there should be no $n_i!$ in the denominator, since we would have to consider all particles “distinguishable”: all particles have different trajectories (or are associated with different numbers, or colors, or can be seen by us as being different), and the [sole complex] property used to count should be “particles with different trajectories and energies”. This is completely analogous to the “shape and color geometrical objects” already mentioned. Thus, there would be no $n_j!$ in the denominator. The number of ways to arrange them would then be equal to the number of ways to permute them. On the other hand, if we simply abandon the criterion of “distinguishability” (letting our tracking capabilities pass unnoticed) and keep saying that the particles are all identical (since they have the same properties, such as charge and mass), then we still have $n!$ in the numerator, but we will have to put the same $n!$ in the denominator, and the number of ways to arrange them would be one! Indeed, if we merge the two concepts of distinguishability and identity into one (or simply abandon one of them as spurious), as we should, then we must change our question, or we will get one of the trivial answers just presented. Thus, given the sole criterion of identity, we must now ask in how many different ways we can fill our energy boxes (or colored boxes). This means that, in all situations in which we have only one set of particles (e.g. only photons or only electrons) presenting energies filling different energy cells, we would have to combine these particles into the energy cells with possible repetitions (not arrange them).
Consider again Q1; we now consider three colored boxes with subboxes: one Red (subboxes light red and dark red, that is, degenerate by $g_R = 2$ with respect to the red color), one Yellow (subboxes light yellow and dark yellow, $g_Y = 2$), and one Blue (subboxes light blue, medium blue and dark blue, $g_B = 3$), and ask: [Q3]: in how many ways can we fill these colored boxes with $N$ geometrical figures (of any shape) such that we put $n_R$ into the Red box, $n_Y$ into the Yellow box and $n_B$ into the Blue box? It is easy to see that we get simply

\[ W_{Q3} = \frac{(n_R + g_R - 1)!}{n_R!\,(g_R - 1)!} \cdot \frac{(n_Y + g_Y - 1)!}{n_Y!\,(g_Y - 1)!} \cdot \frac{(n_B + g_B - 1)!}{n_B!\,(g_B - 1)!}, \]
such that 𝑁 = 𝑛𝑅 + 𝑛𝑌 + 𝑛𝐵 , since, fixing this last expression as a constraint, the three factors above represent independent probabilities. The result we get of the previous counting furnishes the ways we can get 𝑛𝑅 Red, 𝑛𝑌 Yellow and 𝑛𝐵 Blue geometric figures (where we are counting each shade of color as “degenerate” with respect to the underlying color). The differences are obvious if we take a look at Figure 10-2 and compare it with Figure 10-1. In Figure 10-2 the geometrical figures lost their identification labels (colors), which were now passed to the boxes, meaning that the geometrical figures are all identical (as geometrical objects [as particles]) no matter how different they may be in shape [in
trajectories, labels, etc.] That is, all figures [particles] are entitled to occupy any one of the degenerate shade-states [degenerate energy-states] of the color-boxes [energy-boxes]. This colorful example is completely analogous to the usual Quantum Mechanical counting of bosons. Again, we must stress that the fact that the geometrical figures may be “visually distinguished” or even “labelled” meant nothing to the outcome of our counting— they are being counted as geometrical figures [identical particles], not as geometrical figures of some shape or another [identical particles with some trajectory or another].
Figure 10-2. Ways to combine nine geometrical objects into three degenerate stages with degeneration numbers equal to 2, 2 and 3.
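The expression for $W_{Q3}$ can be cross-checked by direct enumeration. The sketch below is illustrative only: the degeneracies $g_R = 2$, $g_Y = 2$, $g_B = 3$ come from the text, while the occupations $n_R = 3$, $n_Y = 2$, $n_B = 4$ are an assumption borrowed from the color counts of Q1, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def fillings(n, g):
    """Ways to place n identical objects into g sub-boxes, repetition allowed:
    (n + g - 1)! / (n! (g - 1)!)."""
    return comb(n + g - 1, n)


def fillings_brute(n, g):
    """Same count, obtained by listing every tuple of sub-box occupations."""
    return sum(1 for occ in product(range(n + 1), repeat=g) if sum(occ) == n)


# (n, g) per colored box; the occupations are assumed, the degeneracies are
# those quoted for the Red, Yellow and Blue boxes.
boxes = {"R": (3, 2), "Y": (2, 2), "B": (4, 3)}

w_q3 = 1
for n, g in boxes.values():
    assert fillings(n, g) == fillings_brute(n, g)
    w_q3 *= fillings(n, g)

print(w_q3)
```

The enumeration never refers to the shape of a figure or to any label attached to it; only the occupation numbers of the sub-boxes are listed, which is the point made in the text.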
10.2. GIBBS' PARADOX AND THE DISTINCTION BETWEEN CLASSICAL AND QUANTUM WORLDS
As everyone knows, the $N!$ factor in (10.1) precludes the entropy from being extensive, an important feature any entropy should present. Its removal from that expression continues to be considered ad hoc and this is the basis for treating it as a paradox, Gibbs' paradox, and for assuming that classical physics inherently furnishes the wrong answer. Although there is dissent with respect to this notion of “paradox” in Gibbs' work [111] for historical reasons, this is irrelevant for the present approach. In each and every attempt to remove the $N!$ term in the numerator one generally accepts the main argument of the general exposition: that the possibility of tracking particles in the classical domain, as opposed to the impossibility of doing so in the quantum domain, suffices to justify the appearance of the $N!$ term in the former, while it must be absent in the latter. The first problem with these assumptions is that the $N!$ factor in the weight function has nothing to do with distinguishability, as we have shown in section 10.1. It reflects the simple fact that we are permuting $N$ particles when we use Boltzmann's weight factor and arrangements. Distinguishability is decided in arrangements by the denominators $n_i!$, $i = 1, 2, \ldots$, which mark classes of indistinguishable objects. Thus, it is grotesque to say that Boltzmann's weight refers to classical distinguishability of particles in the sense that we can track them by any means whatsoever. If this were the case, then there should be no $n_i!$ term in the denominator. The appearance of the $n_i!$ in the denominator suggests that we are counting particles with respect to some property that allows these particles to be grouped into classes of indistinguishable objects.
Moreover, when permuting our $N$ objects, we are using the energy as the counting property that marks the classes of distinguishable objects; this has nothing to do with tracking capabilities. The second problem with assumptions related to Boltzmann's weight factor is that we should be counting different states, namely energy states. Comparing this to our three situations in the previous section, we conclude that we should not be counting here as we did for situations one and two. Instead, we should be counting as we did for situation three, irrespective of whether we are in the quantum or the classical domain. This should be clear from two facts: a) our counting property is the energy (with possibly degenerate levels), something obvious from the very result one gets for Boltzmann's distribution, $\exp(-U/kT)$, where $U$ is the total energy; b) any one of our (identical) particles may have this or that energy. We are thus fixing energy “boxes” (with sub-boxes for degenerate levels) and asking for the ways these boxes can be filled with our $N$ particles, that is, the ways we can combine these particles within fixed energy intervals. This is a situation completely analogous to our “situation Q3”. The fact that the particles are identical increases the strength of these conclusions. It is not classical physics that makes us count wrongly, thus making Gibbs' paradox emerge. It is our way of counting that is wrong when we use Boltzmann's factor, that is, when we treat our counting as an arrangement, not as a combination.
10.3. A DERIVATION OF THE CORRECT BOLTZMANN WEIGHT
Boltzmann's weight function is generally presented in comparison with the quantum weight functions. Table 13 shows the usual textbook approach, giving an example of the three ways of counting for two particles occupying two degenerate states. The first counting assumes that the particles are distinguishable, so that there are two possibilities for state (1, 1); the other two countings assume indistinguishability, the last one assuming also Pauli’s Exclusion Principle.

Table 13. Usual counting process for Boltzmann's, Bose-Einstein's and Fermi-Dirac's distributions

State    Boltzmann            Bose-Einstein    Fermi-Dirac
(2,0)    ab | -               aa | -
(0,2)    - | ab               - | aa
(1,1)    a | b  and  b | a    a | a            a | a

Each entry lists the content of the first and of the second state, separated by a bar.
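The three columns of Table 13 can be reproduced by brute-force enumeration. This is a minimal sketch under the stated setting of two particles and two degenerate states; the variable names are illustrative, and the mapping of each counting rule to an `itertools` function is our reading of the table, not something stated in the text.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement, product

N_PARTICLES = 2
STATES = (0, 1)  # two degenerate states


def occupation(assignment):
    """Occupation numbers (n_first, n_second) of a tuple of occupied states."""
    return tuple(assignment.count(s) for s in STATES)


# Labeled particles a, b: each particle independently picks a state.
boltzmann = Counter(occupation(c) for c in product(STATES, repeat=N_PARTICLES))

# Unlabeled particles: only the multiset of occupied states matters.
bose_einstein = Counter(
    occupation(c) for c in combinations_with_replacement(STATES, N_PARTICLES)
)

# Unlabeled particles with at most one particle per state.
fermi_dirac = Counter(occupation(c) for c in combinations(STATES, N_PARTICLES))

for name, table in [
    ("Boltzmann", boltzmann),
    ("Bose-Einstein", bose_einstein),
    ("Fermi-Dirac", fermi_dirac),
]:
    print(name, dict(table), "total:", sum(table.values()))
```

The only row on which the first two countings disagree is (1,1), where the labeled counting records the same occupation twice.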
In any case, generalizing the results of Table 13 for $g_i$ states with $n_i$ particles, one gets the following weight functions

\[ W_B^*(\{n_i\}) = N! \prod_{j=1}^{K} \frac{g_j^{n_j}}{n_j!}, \quad W_{BE}(\{n_i\}) = \prod_{j=1}^{K} \frac{(g_j - 1 + n_j)!}{(g_j - 1)!\,n_j!}, \quad W_{FD}(\{n_i\}) = \prod_{j=1}^{K} \frac{g_j!}{(g_j - n_j)!\,n_j!}, \]
for Boltzmann's, Bose-Einstein's and Fermi-Dirac's weight functions. In this case, $W_B^*$ gives an incorrect answer because of the $N!$ factor. However, we may take another path to Boltzmann’s distribution, which clarifies what we have previously said. It is widely known that one can derive the correct $W_B$ (without the asterisk) from the other two: one just takes $W_{BE}$ and assumes $g_j/n_j \to \infty$. In this case, Stirling's approximation allows us to write $n! \approx n^n e^{-n}$ and thus (taking also $g_j - 1 \approx g_j$)

\[ W_{BE} = \prod_{j=1}^{K} \frac{(g_j + n_j)^{g_j + n_j}}{n_j^{n_j}\,g_j^{g_j}} = \prod_{j=1}^{K} \frac{g_j^{n_j}\,(1 + n_j/g_j)^{g_j}}{n_j^{n_j}}; \]

using the fact that $(1 + x/s)^s \to e^x$ if $s \to \infty$, we end with

\[ W_{BE}(\{n_i\}) = \prod_{j=1}^{K} \frac{g_j^{n_j}}{n_j^{n_j} e^{-n_j}} = \prod_{j=1}^{K} \frac{g_j^{n_j}}{n_j!} = W_B(\{n_i\}), \]
where $W_B$ already lacks the $N!$ term that $W_B^*$ shows, and is devoid of problems regarding entropy extensivity. The same approach can be used for $W_{FD}$. We thus have

\[ W_{FD}(\{n_i\}) = \prod_{j=1}^{K} \frac{g_j^{n_j}}{(1 - n_j/g_j)^{g_j}\,e^{n_j}\,n_j!} = \prod_{j=1}^{K} \frac{g_j^{n_j}}{n_j!} = W_B(\{n_i\}). \]
Now we must ask for the interpretation of the limit 𝑔𝑗 /𝑛𝑗 → ∞. Figure 10-3 shows the asymptotic behavior of 𝑊𝐵 /𝑊𝐹𝐷 and 𝑊𝐵 /𝑊𝐵𝐸 as 𝑔 grows with 𝑛 fixed as 200.
Figure 10-3. Asymptotic behavior of Bose-Einstein's and Fermi-Dirac's weights as the density of states goes to infinity.
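The trend described for Figure 10-3 can be reproduced numerically for a single level. The sketch below is not the author's code: it assumes one level with $n = 200$ particles and uses the exact product forms $W_B/W_{BE} = \prod_{k=0}^{n-1} g/(g+k)$ and $W_B/W_{FD} = \prod_{k=0}^{n-1} g/(g-k)$, which follow from the weight functions above; the function names and the chosen values of $g$ are illustrative.

```python
from math import exp, log1p


def ratio_b_over_be(n, g):
    """W_B / W_BE for one level: product over k of g / (g + k), k = 0..n-1."""
    return exp(-sum(log1p(k / g) for k in range(n)))


def ratio_b_over_fd(n, g):
    """W_B / W_FD for one level: product over k of g / (g - k); needs g >= n."""
    return exp(-sum(log1p(-k / g) for k in range(n)))


n = 200
for exponent in range(3, 9):
    g = 10 ** exponent
    print(g, ratio_b_over_be(n, g), ratio_b_over_fd(n, g))
```

Working with logarithms avoids forming the very large factorials explicitly. The first ratio approaches one from below and the second from above as $g$ grows with $n$ fixed.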
Figure 10.4. The same example of combination of colors, but with a continuous range of shades for each color.
Clearly, this means that for any number of particles $n_j$, the number of levels to be filled by them is much higher. Indeed, this limit will always obtain if we approach continuity for the allowed energies. This is in every sense equivalent to our combination of colors when our colored boxes are chosen to be ranges of the electromagnetic spectrum, as in Figure 10-4. The conclusion is obvious enough: the only difference between Boltzmann's and Bose-Einstein's distributions comes from the assumption of the former that energy forms a continuum. This has nothing to do with distinguishability versus indistinguishability. Now, if we count as in the first column of Table 13 for Boltzmann's weight (that is, assuming distinguishability of particles with respect to the energy), we get the physically wrong expression (because of the term $N!$), while if we count as in the second or third columns of Table 13 for Bose-Einstein's or Fermi-Dirac's weight (that is, assuming indistinguishability with respect to the energy) and take the limit of continuous energy values we get the physically correct expression for Boltzmann's weight function. It seems obvious that one should take the second approach, not the first. In such a case, all three weight functions come not from the assumption of some ontological indistinguishability of particles, but from the understanding that our ability to track particles (or bowling balls) is irrelevant to the statistical result, given the counting property, which is energy states, in a way analogous to situation Q3 of section 10.1. The obvious interpretation of the previous arguments should be that there is no inherent problem with classical physics (as far as counting is concerned). The problem resides (as it always did) in the (wrong) way we decided to count.
In fact, the inadequacy of $W_B^*$ should have been considered from the start as an indication of problems with the chosen process of counting, not an indication of problems with some particular domain of physics (classical or quantum). It seems that we became so eager to blame Classical Physics for each failure of some approach to a particular problem in Physics that we lost our ability to scrutinize (and, thus, understand) our own mistakes. These arguments, together with those of the previous sections, should suffice to show how classical and quantum countings are equal with respect to the notion of distinguishability, although different with respect to the number of possible degenerate states (where quantization reveals its importance).
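The statement that the $N!$ term spoils extensivity, while $W_B$ does not, can be illustrated numerically. The following is a rough sketch under explicit assumptions that are ours, not the author's: the entropy is taken as $\ln W$ (Boltzmann constant set to one), "doubling the system" is modeled as $n_j \to 2n_j$ and $g_j \to 2g_j$ (the number of states in each cell growing with the size of the system), and the occupation numbers and degeneracies are hypothetical.

```python
from math import lgamma, log


def ln_w_b(n, g):
    """ln W_B = sum_j [n_j ln g_j - ln n_j!]  (no N! factor)."""
    return sum(nj * log(gj) - lgamma(nj + 1) for nj, gj in zip(n, g))


def ln_w_b_star(n, g):
    """ln W_B* = ln N! + ln W_B  (with the N! factor)."""
    return lgamma(sum(n) + 1) + ln_w_b(n, g)


def doubling_defect(ln_w, n, g):
    """[ln W(doubled system) - 2 ln W(original)] per particle of the doubled
    system; close to zero when ln W behaves extensively."""
    n2 = [2 * nj for nj in n]
    g2 = [2 * gj for gj in g]
    return (ln_w(n2, g2) - 2 * ln_w(n, g)) / sum(n2)


# Hypothetical occupations and degeneracies, chosen with g_j >> n_j.
n = [4000, 2500, 1500]
g = [10**6, 10**6, 10**6]

defect_b = doubling_defect(ln_w_b, n, g)
defect_b_star = doubling_defect(ln_w_b_star, n, g)
print(defect_b, defect_b_star, log(2))
```

Under these assumptions the defect of $\ln W_B$ is a small correction that shrinks as the occupations grow, whereas the defect of $\ln W_B^*$ stays close to $\ln 2$ per particle, the familiar mixing-type term.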
10.4. HOW COME?
It is astonishing that the type of mistake we pointed out in this chapter has prevailed for so long, more than a century. The results and arguments presented here are not in any sense complex and could have been presented by any average physicist from the beginning of the twentieth century to the present day.
Historically, all the confusion about indistinguishability made by great scientists like Boltzmann and Gibbs can be understood, since they were the first to shed light on the subject. However, the endurance of these mistakes should be considered as the outcome of a trend that became usual among almost all physicists working with quantum mechanics: the assumption of an abyss, generally an ontological one, between classical and quantum physics and the overall inadequacy of the former. We have shown in this chapter that, from the point of view of counting, classical and quantum particles should all be counted as if they were indistinguishable, not because they are indistinguishable ontologically, but because the counting property in all cases (the energy) cannot discern them on the grounds of their possible extrinsic properties (such as trajectories, labels, colors, etc.). It is the very distinction, within statistical counting in physics, between identity and distinguishability that is fundamentally wrong.
PART III. RELATIVISTIC EXTENSION
Chapter 11
SPECIAL AND GENERAL RELATIVISTIC QUANTUM MECHANICS
Quantum Mechanics is now a venerable physical theory, more than a century old. Although contemporary with other equally groundbreaking physical theories, the Special and General Theories of Relativity, Quantum Mechanics never attained the same maturity as its companions. The two theories of Relativity were given stable physical interpretations as early as the first half of the twentieth century, while Quantum Mechanics is still wandering in search of a definitive interpretation, despite presenting mature mathematical and experimental developments. Moreover, when it comes to their interplay, Quantum Mechanics has always shown some reluctance to merge with the General Theory of Relativity into one comprehensive physical theory, mainly because of interpretation issues. Although the Special Theory of Relativity survived, to some extent, the assumption about the nonexistence of trajectories, the wavelike background, notions of uncertainty and so on, all these proved to be too much for a General Theory of Relativity to encompass. Indeed, one of the main concepts of the General Theory of Relativity, the geodesic, is usually considered problematic (to say the least) within most interpretations of Quantum Mechanics. Even the Special Theory of Relativity presented interpretation issues that stressed its different conceptual origins compared to Quantum Mechanics. This became apparent when the first relativistic extensions of Quantum Mechanics were proposed in terms of the Dirac and Klein-Gordon relativistic equations. Quantum Field Theory was then necessary to smooth out the wrinkles raised by the need to bring these two fundamental representations of the World, one about the very small and one about the very fast, into agreement. Nowadays, it is not only a Quantum Gravity theory that is lacking, but also a General Relativistic reformulation of Quantum Mechanics¹, that is, a reformulation made along the same lines as Special Relativistic Quantum Mechanics, instead of Quantum Field Theory. In part, this void is the outcome of the generally accepted idea that Relativistic Quantum Mechanics (Special or General) can never become a complete and consistent theory, with interpretation issues playing a major role in sustaining this perception.

¹ In this chapter we will be using the term "Special Relativistic Quantum Mechanics" to describe those approaches based on the relativistic equations of Dirac and Klein-Gordon, as opposed to the relativistic Field Theory reformulation of Quantum Mechanics.
The goals of this chapter are twofold. First, we will show how to extend the characteristic function derivation of Quantum Mechanics as a means to arrive at its relativistic counterparts (special and general). In the case of the special relativistic quantum mechanical equations this would be merely a reconstruction, were it not for the fact that the method of derivation makes explicit some misconceptions hidden by the usual way of presenting these equations; in fact, misconceptions that are at the root of the general discomfort with the Special Relativistic Quantum Mechanics approach. The second goal is to present the consequences of the derived General Relativistic Quantum Mechanics Theory (not Quantum Gravity). We will unfold these consequences by presenting the solution of a very basic but instructive example.
11.1. SPECIAL RELATIVISTIC EQUATIONS
Perhaps it is already clear that the process of relativistic generalization of the Schrödinger equation derivation method (to be presented shortly) will follow exactly the same general lines of previous chapters (specifically those followed in chapter two). In order to make this generalization we expect that it would be necessary only to write the axioms in a way compatible with the special and general relativistic theories and proceed with exactly the same mathematical steps followed to derive the non-relativistic Schrödinger equation. Thus, in what follows we rewrite the three axioms of our theory to adapt them to the requirements of the Special Theory of Relativity. We will show that this is the only step needed to derive the special relativistic quantum equations. The particle state is described by the function $F(x, p; \tau)$ giving the probability density of finding this particle with a four-position $x$ and a four-momentum $p$ at some value of the scalar parameter $\tau$. Let us list the modified axioms of our theory:

Axiom 1: For closed systems, the probability density function $F(x, p; \tau)$ is a constant of motion

\[ \frac{dF(x, p; \tau)}{d\tau} = 0. \tag{11.1} \]
Again, the strictly correct equation should be the relativistic counterpart of the stochastic Liouville equation, but, as happens in the non-relativistic case already analyzed, the effect of passing to the Schrödinger equation is just to wash out the detailed stochastic behavior in favor of a description (usually a stationary one) in which this behavior is represented only by average values, as in $\Delta x$ or $\Delta p$.

Axiom 2: The transformation, defined as

\[ Z(x, \delta x; \tau) = \int F(x, p; \tau) \exp\left[ i\,\frac{(p^\alpha - eA^\alpha/c)\,\delta x_\alpha}{\hbar} \right] d^4p \tag{11.2} \]
is adequate for the description of a general quantum system interacting with an electromagnetic field, where $Z$ is called the relativistic characteristic function. The electromagnetic field is introduced through the four-vector

\[ A^\alpha = (\phi, \mathbf{A}), \tag{11.3} \]
where $\phi$ is the scalar potential and $\mathbf{A}$ is the vector potential. The inclusion of the vector potential in the momentum appearing in the exponent of the characteristic function is just another generalization of the method to include the canonical momentum, consistent with the presence of electromagnetic fields.
Axiom 3: The characteristic function can be written as the product
\[ Z(x,\delta x;\tau) = \Psi^{\dagger}\left(x-\frac{\delta x}{2};\tau\right)\Psi\left(x+\frac{\delta x}{2};\tau\right). \tag{11.4} \]
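To make Axiom 2 concrete, the following minimal Python sketch evaluates a one-dimensional, field-free analogue of (11.2). The Gaussian momentum density, the units with ħ = 1, the sample values of the mean and width, and all function names are assumptions chosen only for illustration; they are not part of the derivation in the text. The sketch compares a numerically integrated characteristic function with the known closed form for a Gaussian, and recovers the mean momentum from −iħ ∂Z/∂(δx) at δx = 0, which is the one-dimensional counterpart of relation (11.8).

```python
import cmath
import math

HBAR = 1.0  # natural units; an assumption of this toy example


def gaussian_density(p, p0, sigma):
    """Normalized Gaussian momentum density F(p) (illustrative choice)."""
    return math.exp(-0.5 * ((p - p0) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def characteristic_function(dx, p0=0.7, sigma=0.5, n=4001, width=10.0):
    """Z(dx) = integral of F(p) exp(i p dx / hbar) dp, by the trapezoidal rule."""
    lo, hi = p0 - width * sigma, p0 + width * sigma
    step = (hi - lo) / (n - 1)
    total = 0j
    for k in range(n):
        p = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * gaussian_density(p, p0, sigma) * cmath.exp(1j * p * dx / HBAR)
    return total * step


def exact_characteristic_function(dx, p0=0.7, sigma=0.5):
    """Closed form for a Gaussian density."""
    return cmath.exp(1j * p0 * dx / HBAR - 0.5 * (sigma * dx / HBAR) ** 2)


def mean_momentum(p0=0.7, sigma=0.5, eps=1e-4):
    """<p> = -i hbar dZ/d(dx) at dx = 0, via a central difference."""
    dz = (characteristic_function(eps, p0, sigma)
          - characteristic_function(-eps, p0, sigma)) / (2.0 * eps)
    return (-1j * HBAR * dz).real


if __name__ == "__main__":
    for dx in (0.0, 0.5, 2.0):
        print(dx, characteristic_function(dx), exact_characteristic_function(dx))
    print("mean momentum:", mean_momentum())
```

In the full theory the integral runs over $d^{4}p$ and the exponent carries the canonical combination $p^{\alpha}-eA^{\alpha}/c$; the sketch only illustrates how moments of $F$ are encoded in derivatives of $Z$ with respect to $\delta x$.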
The derivation of the Klein-Gordon equation follows steps similar to the non-relativistic derivation. However, to give the reader a feeling for the delicacy and beauty of the derivation method, and since we are including the electromagnetic field, we present the most important steps of the (lengthy) calculation. Thus, using equation (11.1) we can write
\[ \frac{\partial F}{\partial\tau} + \frac{dx^{\alpha}}{d\tau}\frac{\partial F}{\partial x^{\alpha}} + \frac{dp^{\alpha}}{d\tau}\frac{\partial F}{\partial p^{\alpha}} = 0, \tag{11.5} \]
where (assuming the presence of the electromagnetic field only)
\[ \frac{dx^{\alpha}}{d\tau} = \frac{p^{\alpha}}{m}; \qquad \frac{dp^{\alpha}}{d\tau} = \frac{e}{mc}\,G^{\alpha\beta}p_{\beta}, \tag{11.6} \]
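with $G^{\alpha\beta} = \partial^{\alpha}A^{\beta} - \partial^{\beta}A^{\alpha}$ the electromagnetic tensor. We now use (11.2) and (11.6) in (11.5) and integrate over $p$ to find
\[ \frac{\partial Z}{\partial\tau} + \underbrace{\frac{1}{m}\int p^{\beta}\frac{\partial F}{\partial x^{\beta}}\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p}_{(A)} + \underbrace{\frac{e}{mc}\int p_{\beta}G^{\alpha\beta}\frac{\partial F}{\partial p^{\alpha}}\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p}_{(B)} = 0. \]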
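Let us simplify (A) and (B) separately. For (A) we use
\[ \frac{\partial}{\partial x^{\beta}}\left\{p^{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]\right\} = \left\{p^{\beta}\frac{\partial F}{\partial x^{\beta}} - \frac{ie}{\hbar c}\,p^{\beta}\frac{\partial A^{\lambda}}{\partial x^{\beta}}\delta x_{\lambda}F\right\}\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right] \tag{11.7} \]
to get
\[ (A) = \frac{1}{m}\frac{\partial}{\partial x^{\beta}}\int p^{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p + \frac{ie}{\hbar mc}\frac{\partial A^{\lambda}}{\partial x^{\beta}}\delta x_{\lambda}\int p^{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p. \]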
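Now
\[ \int p^{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p = -i\hbar\frac{\partial Z}{\partial(\delta x_{\beta})} + \frac{e}{c}A^{\beta}Z, \tag{11.8} \]
and thus (assuming that $\partial_{\alpha}A^{\alpha} = 0$; this is not necessary, but simplifies the math)
\[ (A) = -\frac{i\hbar}{m}\frac{\partial^{2}Z}{\partial x^{\beta}\partial(\delta x_{\beta})} + \frac{e}{mc}A^{\beta}\frac{\partial Z}{\partial x^{\beta}} + \frac{e}{mc}\,\delta x_{\lambda}\frac{\partial A^{\lambda}}{\partial x^{\beta}}\frac{\partial Z}{\partial(\delta x_{\beta})} + \frac{ie^{2}}{\hbar mc^{2}}A^{\beta}\delta x_{\lambda}\frac{\partial A^{\lambda}}{\partial x^{\beta}}Z. \]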
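For (B), we have
\[ \frac{e}{mc}G^{\alpha\beta}\int p_{\beta}\frac{\partial F}{\partial p^{\alpha}}\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p = \frac{e}{mc}\int\frac{\partial}{\partial p^{\alpha}}\left\{G^{\alpha\beta}p_{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]\right\}d^{4}p - \frac{e}{mc}\int G^{\alpha\beta}\frac{\partial p_{\beta}}{\partial p^{\alpha}}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p + \frac{ie}{mc\hbar}\,\delta x_{\alpha}G^{\alpha\beta}\int p_{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p; \]
the first term represents a divergence, which must vanish because $F$ is a probability density function; the second term is also zero, since $G^{\alpha\beta}$ is an antisymmetric tensor and $\partial p_{\beta}/\partial p^{\alpha} = \delta_{\alpha\beta}$ ($\delta_{\alpha\beta}$ being the Kronecker symbol). We can use (11.8) to further simplify the last term as
\[ (B) = \frac{ie}{mc\hbar}\,\delta x_{\alpha}G^{\alpha\beta}\int p_{\beta}F\exp\left[\frac{i}{\hbar}\left(p^{\lambda}-\frac{e}{c}A^{\lambda}\right)\delta x_{\lambda}\right]d^{4}p = \frac{e}{mc}\,\delta x_{\alpha}G^{\alpha\beta}\frac{\partial Z}{\partial(\delta x^{\beta})} + \frac{ie^{2}}{\hbar mc^{2}}\,\delta x_{\alpha}G^{\alpha\beta}A_{\beta}Z. \]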
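With these results, equation (11.5), integrated over $p$ as above, becomes
\[ \frac{\partial Z}{\partial\tau} - \frac{i\hbar}{m}\frac{\partial^{2}Z}{\partial x^{\alpha}\partial(\delta x_{\alpha})} + \frac{e}{mc}A^{\alpha}\frac{\partial Z}{\partial x^{\alpha}} + \frac{ie^{2}}{\hbar mc^{2}}\,\delta x_{\lambda}\frac{\partial A^{\lambda}}{\partial x^{\alpha}}A^{\alpha}Z + \frac{e}{mc}\,\delta x_{\lambda}G^{\lambda\beta}\frac{\partial Z}{\partial(\delta x^{\beta})} + \frac{ie^{2}}{\hbar mc^{2}}\,\delta x_{\lambda}G^{\lambda\beta}A_{\beta}Z = 0, \tag{11.9} \]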
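and the definition of $G^{\alpha\beta}$ implies the simplification
\[ i\hbar\frac{\partial Z}{\partial\tau} = -\frac{\hbar^{2}}{m}\frac{\partial^{2}Z}{\partial x^{\alpha}\partial(\delta x_{\alpha})} - \frac{ie\hbar}{mc}\left[A^{\alpha}\frac{\partial Z}{\partial x^{\alpha}} + \delta x_{\lambda}\frac{\partial A^{\lambda}}{\partial x^{\alpha}}\frac{\partial Z}{\partial(\delta x_{\alpha})}\right] + \frac{e^{2}}{2mc^{2}}\,\delta x_{\lambda}\frac{\partial A^{2}}{\partial x_{\lambda}}Z. \tag{11.10} \]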
In order to obtain an equation for the probability amplitude we can use (11.4) and write
\[ \Psi(x;\tau) = R(x,\tau)\exp\left[\frac{i}{\hbar}S(x,\tau)\right], \tag{11.11} \]
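with $R(x;\tau)$ and $S(x;\tau)$ real functions. The method is thus to insert expression (11.11) into (11.4) and take the result to equation (11.9); we then collect only the zeroth- and first-order coefficients in $\delta x$, assuming that our problem obeys the Central Limit Theorem. Thus, using expressions (11.11) and (11.4), expanded up to second order in $\delta x$, we obtain
\[ Z(x,\delta x;\tau) = \left\{R^{2} + \frac{\delta x^{\alpha}\delta x^{\beta}}{2}\left[R\frac{\partial^{2}R}{\partial x^{\alpha}\partial x^{\beta}} - \frac{\partial R}{\partial x^{\alpha}}\frac{\partial R}{\partial x^{\beta}}\right]\right\}\exp\left(\frac{i}{\hbar}\frac{\partial S}{\partial x^{\beta}}\delta x^{\beta}\right). \tag{11.12} \]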
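Now, substituting this expression in (11.9), keeping the zeroth- and first-order terms in $\delta x$, and using
\[ [\ast] = \exp\left(\frac{i}{\hbar}\,\delta x^{\beta}\partial_{\beta}S\right) \]
and
\[ \begin{aligned}
\frac{\partial Z}{\partial x^{\lambda}} &= \left[\frac{\partial R^{2}}{\partial x^{\lambda}} + \frac{i}{\hbar}R^{2}\delta x^{\beta}\frac{\partial^{2}S}{\partial x^{\beta}\partial x^{\lambda}}\right][\ast], \\
\frac{\partial Z}{\partial(\delta x_{\lambda})} &= \left[\frac{i}{\hbar}\frac{\partial S}{\partial x_{\lambda}}R^{2} + \delta x_{\alpha}\left(R\frac{\partial^{2}R}{\partial x_{\alpha}\partial x_{\lambda}} - \frac{\partial R}{\partial x_{\alpha}}\frac{\partial R}{\partial x_{\lambda}}\right)\right][\ast], \\
i\hbar\frac{\partial Z}{\partial\tau} &= \left[i\hbar\frac{\partial R^{2}}{\partial\tau} - R^{2}\delta x^{\beta}\partial_{\beta}\frac{\partial S}{\partial\tau}\right][\ast], \\
-\frac{\hbar^{2}}{m}\frac{\partial^{2}Z}{\partial x^{\lambda}\partial(\delta x_{\lambda})} &= \left\{-i\hbar\frac{\partial}{\partial x^{\lambda}}\left(\frac{R^{2}}{m}\frac{\partial S}{\partial x_{\lambda}}\right) + R^{2}\delta x^{\beta}\frac{\partial}{\partial x^{\beta}}\left[\frac{1}{2m}\left(\frac{\partial S}{\partial x_{\lambda}}\right)^{2}\right] - \frac{\hbar^{2}}{m}\,\delta x_{\alpha}\frac{\partial}{\partial x^{\lambda}}\left(R\frac{\partial^{2}R}{\partial x_{\alpha}\partial x_{\lambda}} - \frac{\partial R}{\partial x_{\alpha}}\frac{\partial R}{\partial x_{\lambda}}\right)\right\}[\ast],
\end{aligned} \]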
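we get
\[ -i\hbar\left[\frac{\partial R^{2}}{\partial\tau} + \frac{\partial}{\partial x^{\lambda}}\left(\frac{R^{2}}{m}\frac{\partial S}{\partial x_{\lambda}}\right) + \frac{e}{mc}A^{\lambda}\frac{\partial R^{2}}{\partial x^{\lambda}}\right] + R^{2}\delta x^{\beta}\partial_{\beta}\left[\frac{\partial S}{\partial\tau} + \frac{1}{2m}\left(\frac{\partial S}{\partial x_{\lambda}}\right)^{2} - \frac{\hbar^{2}}{2mR}\Box R + \frac{e}{mc}A^{\lambda}\frac{\partial S}{\partial x^{\lambda}} + \frac{e^{2}}{2mc^{2}}A^{2}\right] = 0, \]
which simplifies to
\[ -i\hbar\left\{\frac{\partial R^{2}}{\partial\tau} + \partial_{\alpha}\left[R^{2}\frac{(\partial^{\alpha}S + eA^{\alpha}/c)}{m}\right]\right\} + R^{2}\delta x^{\beta}\partial_{\beta}\left[\frac{\partial S}{\partial\tau} + \frac{(\partial_{\alpha}S + eA_{\alpha}/c)^{2}}{2m} - \frac{\hbar^{2}}{2mR}\Box R\right] = 0, \tag{11.13} \]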
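where we put $\partial_{\alpha} = \partial/\partial x^{\alpha}$ and $\Box = \partial^{\alpha}\partial_{\alpha}$, as usual. Collecting the real and imaginary parts and equating them separately to zero, we get the pair of equations
\[ \frac{\partial R^{2}}{\partial\tau} + \partial_{\alpha}\left[R^{2}\frac{(\partial^{\alpha}S + eA^{\alpha}/c)}{m}\right] = 0 \tag{11.14} \]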
and
\[ \frac{\partial S}{\partial\tau} - \frac{\hbar^{2}}{2mR}\Box R + \frac{(\partial_{\alpha}S + eA_{\alpha}/c)^{2}}{2m} = \mathrm{const}. \tag{11.15} \]
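The term $(\partial^{\alpha}S + eA^{\alpha}/c)$ was expected, since $\partial^{\alpha}S$ stands for the quantum linear four-momentum and the latter expression is its canonical extension. To see that, just calculate
\[ \begin{aligned}
\lim_{\delta x_{\alpha}\to 0} -i\hbar\frac{\partial}{\partial(\delta x_{\alpha})}\int Z(x,\delta x,\tau)\,d^{4}x &= \int\frac{\partial S}{\partial x_{\alpha}}\,\rho(x,\tau)\,d^{4}x \\
= \lim_{\delta x_{\alpha}\to 0} -i\hbar\frac{\partial}{\partial(\delta x_{\alpha})}\int\!\!\int F(x,p,\tau)\exp\left[\frac{i}{\hbar}\left(p^{\alpha}-\frac{e}{c}A^{\alpha}\right)\delta x_{\alpha}\right]d^{4}x\,d^{4}p &= \int\!\!\int\left(p^{\alpha}-\frac{e}{c}A^{\alpha}\right)F(x,p,\tau)\,d^{4}x\,d^{4}p,
\end{aligned} \tag{11.16} \]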
so that
\[ \langle\partial^{\alpha}S + eA^{\alpha}/c\rangle = \langle p^{\alpha}\rangle. \tag{11.17} \]
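We now choose $\mathrm{const} = mc^{2}/2$ (for reasons that we will present shortly) and find that (11.14) and (11.15) are formally identical to
\[ \left[\frac{1}{2m}\left(i\hbar\,\partial_{\alpha} + \frac{e}{c}A_{\alpha}(x)\right)^{2} + \frac{mc^{2}}{2}\right]\Psi(x;\tau) = i\hbar\frac{\partial\Psi(x;\tau)}{\partial\tau} \tag{11.18} \]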
since the substitution of expression (11.11) in the previous equation gives us equations (11.15) and (11.14) for the real and imaginary parts, respectively. This result is the relativistic counterpart of the Schrödinger equation. Kyprianidis and others [112,113] have already found a similar Schrödinger equation using different reasoning (and with the constant term missing). It is noteworthy that one finds in the literature the complaint that time enters the Schrödinger equation as a first-order derivative, while space enters as a second-order derivative, and that this would be a problem for any relativistic generalization, since time and space must enter the equation on the same footing. The argument is correct, but fails to recognize that time, as a parameter, should be replaced by some other parameter allowed by the Special Theory of Relativity, and not simply eliminated. This new parameter should be $\tau$.
If we assume that the probability density function defined on configuration space is stationary with respect to the laboratory frame of reference, this fixes the meaning of $\tau$ and we can put
\[ \Psi(x;\tau) = \psi(x)\,e^{-imc^{2}\tau/\hbar}, \tag{11.19} \]
giving
\[ \left[\frac{1}{2m}\left(i\hbar\,\partial_{\alpha} + \frac{e}{c}A_{\alpha}(x)\right)^{2} - \frac{mc^{2}}{2}\right]\psi(x) = 0, \tag{11.20} \]
(this result being the reason why we chose the constant in (11.15) as $mc^{2}/2$). This last result is the usual relativistic Klein-Gordon equation. Note that (11.18) is the counterpart of the non-relativistic Schrödinger equation, having $\tau$ as its parameter, while the Klein-Gordon equation is one of its particular cases. If we introduce a force
\[ F^{\alpha}_{int} = \partial^{\alpha}(\boldsymbol{\pi}\cdot\mathbf{E} + \boldsymbol{\mu}\cdot\mathbf{B}), \]
giving the interaction between the internal degrees of freedom of the particle and the electromagnetic fields, where $\boldsymbol{\pi}$ and $\boldsymbol{\mu}$ are the electric and magnetic moments, and $\mathbf{E}$ and $\mathbf{B}$ are the electric and magnetic fields, we find the usual quantum relativistic second-order Dirac equation [114].
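The step from (11.18) to (11.20) can be spelled out in three lines. The following is only a worked check, written under the assumption that $A_{\alpha}$ does not depend on $\tau$, so that the operator in (11.18) acts on $\psi(x)$ alone; the last line is (11.20) multiplied by $2m$.

```latex
% Worked check: inserting the stationary form (11.19) into (11.18).
\begin{align*}
i\hbar\,\frac{\partial}{\partial\tau}\Bigl[\psi(x)\,e^{-imc^{2}\tau/\hbar}\Bigr]
  &= mc^{2}\,\psi(x)\,e^{-imc^{2}\tau/\hbar}, \\
\Bigl[\tfrac{1}{2m}\bigl(i\hbar\,\partial_{\alpha}+\tfrac{e}{c}A_{\alpha}\bigr)^{2}
  +\tfrac{mc^{2}}{2}\Bigr]\psi(x)
  &= mc^{2}\,\psi(x), \\
\bigl(i\hbar\,\partial_{\alpha}+\tfrac{e}{c}A_{\alpha}\bigr)^{2}\psi(x)
  &= m^{2}c^{2}\,\psi(x).
\end{align*}
```

For $A_{\alpha}=0$ the last line reads $-\hbar^{2}\Box\psi = m^{2}c^{2}\psi$, the free Klein-Gordon equation in its familiar form.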
11.1.1. Probabilities and Averages
The introduction of relativistic concepts changes the way we understand the phase-space probability density function in some important aspects. Indeed, we now have a function $F(x,p;\tau)$ depending on the four-position and four-momentum, and also depending, parametrically, on $\tau$. The normalization condition consistent with (11.2) is
\[ \int\!\!\int F(x,p;\tau)\,d^{4}x\,d^{4}p = 1. \tag{11.21} \]
Note that we are now integrating over the invariant eight-dimensional phase space, which includes time. This seems absurd, since it appears to state that we are integrating the probability density function over all times and all possible energies, when what we need is the probability density function at some specific "time". The fact is that, until now, we have not used the important constraints
\[ p^{2} = m^{2}; \qquad \tau^{2} = c^{2}t^{2} - \vec{x}^{\,2}, \]
which are specific to the Theory of Relativity. When we do so, it first becomes clear that in (11.21) we are integrating only over an energy shell (which takes us from $d^{4}p$ to $d^{3}p$). The constraint on the proper time is even more instructive: when we take $\tau$ as the parameter describing the dynamic evolution of the probability density function, we are assuming that, if we are at the origin of the coordinate system, to take into account some point $\vec{x}$ within $d^{3}\vec{x}$ to build up $F(x,p;\tau)$ we must consider that point at the times (see Figure 11.1)
\[ \pm\sqrt{\tau^{2} + \vec{x}^{\,2}}\,/c \tag{11.22} \]
(for stationary distributions, we must take both signs into consideration, since a positive time evolution of the system is indistinguishable from a negative one). This should be so, since we, as relativistic beings, could never build up $F(x,p;\tau)$ by considering some instantaneous universal time $t$, for the very reason that there is no such thing in the relativistic framework. This gives us a notion of the importance of having $\tau$ as the parameter of our description. Of course, for quantum mechanical systems of the size of an atom, this feature has no numerical relevance, since the characteristic distances and times are too small. However, when it comes to understanding the role of gravitation, one is usually dealing with astronomical distances and (11.22) becomes crucial.
Figure 11.1. Evolution of the system with respect to the parameter τ and the way the probability density function must be built. The clocks represent the usual synchronization process done within the Special Theory of Relativity.
The definition of the configuration-space probability density function is inherited from (11.2) and (11.4) and must be given by $\rho(x,\tau) = \Psi^{*}(x,\tau)\Psi(x,\tau)$, which is, obviously, always positive. In fact, Axiom 1 gives, by inheritance,
\[ \frac{\partial\rho}{\partial\tau} + \partial_{\alpha}j^{\alpha} = 0 \;\Rightarrow\; \frac{d}{d\tau}\int\rho(x;\tau)\,d^{4}x = 0, \tag{11.23} \]
as the probability conservation equation with respect to $\tau$, where the four-current is given by
\[ j^{\alpha}(x,\tau) = \frac{i\hbar}{2m}\left[\Psi^{*}(x,\tau)\,\partial^{\alpha}\Psi(x,\tau) - \Psi(x,\tau)\,\partial^{\alpha}\Psi^{*}(x,\tau)\right]. \]
This is a major difference from the usual way the configuration-space probability density function is defined in textbook approaches to the Klein-Gordon equation. In fact, in the usual (textbook) approach, since the integral
\[ \int\psi^{*}(x;t)\,\psi(x;t)\,d^{3}x \tag{11.24} \]
does not furnish a time-independent value, one searches for a time-invariant probability density on three-dimensional space. With the argument that the resulting integral is a relativistic invariant and that, in the non-relativistic limit, this component tends to the non-relativistic density [114], one finally relates this three-dimensional density to the zeroth component of the probability current. This implies that, within the usual approach, one simply gets
\[ \partial_{\alpha}j^{\alpha} = 0. \tag{11.25} \]
In such approaches the problem arises that $j^{0}$ can assume negative values; this was, historically, one of the reasons to search for the linear Dirac equation and the positive definite probability density defined from it. In fact, the negativity of $j^{0}$ usually serves as an argument that the Klein-Gordon equation is seriously inadequate for the description of relativistic particles. There are two reasons why the requirement $\rho' = j^{0} \Rightarrow \int\rho'\,d^{3}x = 1$ is simply bizarre. Firstly, it assumes that we must use some universal time $t$ to build up our probability amplitudes $\psi(x;t)$ (and thus the density), as if we were gods violating the fact that we can access positions $\vec{x}$ only with the delay given by (11.22). It is important to remember that the probability density function must be considered not only as a mathematical construct, but as a mathematical construct linked to an appropriate method of experimental access. This $\rho'(x;t)$, without further considerations, is inaccessible to relativistic beings such as ourselves. Secondly, $j^{0}$ is not, in general, a conserved quantity in four-dimensional space. If we integrate $j^{0}$ with respect to $d^{4}x$ the result will not be a constant of motion and the stationary character of the Klein-Gordon equation will be lost, although it is always maintained in the mathematical formalism. Of course, when one has a stationary behavior in $\tau$, as we anticipated previously, the result $\partial_{\alpha}j^{\alpha} = 0$ will be correct, but even in this case the probability density should never be identified with $j^{0}$. The inadequacy of the choice of $j^{0}$ can also be seen in the way other results are established. Indeed, in the context of this approach to the Klein-Gordon equation, one uses the fact that the probability density function can assume both signs to define the charge density as $e\rho(x,t)$, in which the sign of $e$ is always kept positive. Then, charge conjugation can be easily defined by the very operation of probability amplitude conjugation.
Although this can be made consistent with other results in the formalism of the Klein-Gordon equation, it means a double standard approach, when we look at the big picture. In fact, in any other approach in
which the probability density function is positive definite, as with the linear Dirac equation formalism, one will have to assume that the sign of the charge density is given by the sign of $e$ and not the sign of $\rho(x,t)$. Having said this, it remains to give $j^{0}$ an appropriate interpretation. This is not a problem if we note that, within a relativistic approach (special or general), we must speak of four-dimensional volumes, not three-dimensional ones. Thus, we must refer not only to fluxes through three-volumes, but also to fluxes through the dimension of time. In such cases, the physical system may be at rest in the specified frame of reference while fields still create or destroy particles/antiparticles in such a way that, although there may be no flux in three-dimensional space, there will still be a flux through the "time surface" (see Figure 11.2).
Figure 11.2. An example of the interpretation of a flux through the time "surface". The particle is at rest in the four-volume (time passes for it as for the time surface). At some instant (b) the particle interacts with the field and particles and antiparticles are created. Particles flow to the left, staying within the four-volume, while antiparticles flow to the right, traversing the time surface and, thus, producing a time-flux.
11.1.2. The Random Effective Potential
From equations (11.14) and (11.15) we can calculate a random averaged effective potential formally identical to Bohm's quantum potential but representing the fluctuations present in the physical system. In this case, (11.15) has a Hamilton-Jacobi format and it is immediate to associate this effective potential with [115]
\[ V_{eff}(x) = V(x) - \frac{\hbar^{2}}{2mR}\left(i\hbar\,\partial_{\alpha} + \frac{e}{c}A_{\alpha}(x)\right)^{2}R, \tag{11.26} \]
which reflects the Newton-like equation
\[ \frac{dp^{\alpha}}{d\tau} = -\partial^{\alpha}V_{eff}(x), \tag{11.27} \]
together with the initial condition (see comments after equation (11.15))
\[ p^{\alpha} = \partial^{\alpha}S. \tag{11.28} \]
These last three expressions will be the cornerstone of the general relativistic quantum derivation to be developed in the next section. Equation (11.26) may be interpreted as follows: we always have systems composed of particles and a field. By choosing to describe the particle subsystem, while not describing the field subsystem in detail, we choose to see the latter as a sort of thermal reservoir in contact with the particle subsystem. The contact (interaction) between the two subsystems may be understood if we note that, whenever there is an interaction between them, there is an exchange of the particle-like entity that carries the interaction (photons, gravitons, etc.). These particle-like entities thus transfer part of the energy of the field subsystem to the particle subsystem, or vice versa, and are responsible for the fluctuations of the field energy and the fluctuations of the position and momentum of the particles. The fluctuations in position and momentum must, furthermore, obey the Heisenberg dispersion relations. This process is shown schematically in Figure 11.3.
Figure 11.3. Fluctuations of energy from the particle subsystem to the field subsystem and vice-versa. The two particles shown are exchanging photons in their electromagnetic interaction. When (a) one photon leaves particle 2 towards particle 1 but has not arrived at particle 1, energy is transferred from the particle subsystem to the field subsystem. When (b) the photon is absorbed by particle 1, then energy is transferred from the field subsystem back to the particle subsystem, making energy fluctuate. The realistic situation is when (c) a very large number of photons is leaving both particles while another large number of them is being absorbed by the particles, making the fluctuation profile a very complex one.
11.2. GENERAL RELATIVISTIC QUANTUM MECHANICS
Now the axioms will once more be altered to make them adequate to the General Theory of Relativity.
Axiom 1: For a system in the presence of a gravitational field, the probability density function $F(x,p;s)$ related to the four-position and four-momentum of the particle is a conserved quantity when its variation is taken along the system geodesics, that is,
\[ \frac{DF(x,p;s)}{Ds} = 0, \tag{11.29} \]
where $s$ is the time associated with the geodesic and $D/Ds$ is the derivative taken along the geodesic defined by $s$.
Axiom 2: The general relativistic characteristic function
\[ Z(x,\delta x;s) = \int F(x,p;s)\exp\left(\frac{i}{\hbar}\,p^{\alpha}\delta x_{\alpha}\right)d^{4}p \tag{11.30} \]
is adequate for the description of any quantum system in the presence of gravitational fields.
Axiom 3: The general relativistic characteristic function may be decomposed as
\[ Z(x,\delta x;s) = \Psi^{*}\left(x-\frac{\delta x}{2};s\right)\Psi\left(x+\frac{\delta x}{2};s\right), \tag{11.31} \]
for any quantum mechanical system in the presence of gravitational fields. With equation (11.29) we can write
\[ \frac{\partial F}{\partial s} + \frac{p^{\alpha}}{m}\nabla_{x^{\alpha}}F + f^{\alpha}\nabla_{p^{\alpha}}F = 0, \tag{11.32} \]
where we used the relations
\[ \frac{Dx^{\alpha}}{Ds} = \frac{p^{\alpha}}{m}; \qquad \frac{Dp^{\alpha}}{Ds} = f^{\alpha}. \tag{11.33} \]
Assuming the validity of the decomposition in (11.31) and writing Ψ(𝑥; 𝑠) = 𝑅(𝑥; 𝑠)exp[𝑖𝑆(𝑥; 𝑠)/ℏ],
(11.34)
we obtain, collecting the zeroth- and first-order coefficients of $\delta x^{\alpha}$, the pair of expressions
\[ \frac{\partial R^{2}}{\partial s} + \nabla_{\mu}\left(R(x)^{2}\,\frac{\nabla^{\mu}S}{m}\right) = 0, \tag{11.35} \]
and
\[ \frac{\partial S}{\partial s} - \frac{\hbar^{2}}{2mR}\Box R + V + \frac{\nabla_{\alpha}S\,\nabla^{\alpha}S}{2m} = 0, \tag{11.36} \]
where now $\Box = \nabla^{\mu}\nabla_{\mu}$. These two equations are equivalent to the general relativistic Schrödinger equation
\[ -\frac{\hbar^{2}}{2m}\Box\Psi + V\Psi + \frac{mc^{2}}{2}\Psi = i\hbar\frac{\partial\Psi}{\partial s} \tag{11.37} \]
since, when substituting expression (11.34) into equation (11.37) and separating the real and imaginary parts, we get (11.35) and (11.36). We now obtain, as in the previous section, the expression for the potential and the stochastic force associated with the "random effective field" giving the fluctuations in the kinetic energy as
\[ V_{(Q)} = -\frac{\hbar^{2}}{2mR}\Box R; \qquad f^{\mu}_{(Q)} = \nabla^{\mu}V_{(Q)}(x), \tag{11.38} \]
so that we can write
\[ m\frac{D^{2}x^{\mu}}{Ds^{2}} = f^{\mu}(x) + f^{\mu}_{(Q)}(x), \tag{11.39} \]
which is an equation for the system's stochastic movement. In this case, (11.39) can be considered the equation for the stochastic geodesics associated with the random behavior that the system may actually show. Looking at equation (11.39), we follow Einstein's intuition and put
\[ G_{\mu\nu} = -\frac{8\pi G}{c^{2}}\left(T_{(M)\mu\nu} + T_{(Q)\mu\nu}\right), \tag{11.40} \]
where $G_{\mu\nu}$ is Einstein's tensor, $T_{(M)\mu\nu}$ is the energy-momentum tensor associated with the forces represented by $f^{\mu}(x)$ in equation (11.39) and $T_{(Q)\mu\nu}$ is the tensor associated with the statistical potential of equation (11.38). Indeed, for any force appearing on the right side of the geodesic equation, Einstein's prescription is to include its energy-momentum tensor on the right side of Einstein's equation, which is exactly what we have done. We can obtain $T_{(Q)\mu\nu}$ by looking at (11.35) and (11.36). Equation (11.36) represents the possible geodesics related to the fluctuations, as was pointed out above, while (11.35) defines an equation for the "statistical" field variables $R(x)$ and $S(x)$ related to the probability field, which behaves like an incompressible fluid. The tensor associated with this equation is given by
\[ T_{(Q)\mu\nu} = mR(x;s)^{2}\left(\frac{\nabla_{\mu}S}{m}\,\frac{\nabla_{\nu}S}{m}\right) \tag{11.41} \]
and is called a matter tensor if we make the following substitution
\[ u_{\mu} = \frac{\nabla_{\mu}S}{m}; \qquad \rho_{Q}(x;s) = mR(x;s)^{2}, \tag{11.42} \]
as a random four-velocity [see (11.28)] and a statistical matter distribution, respectively, to get [35]
\[ T_{(Q)\mu\nu} = \rho_{Q}(x;s)\,u_{\mu}u_{\nu}. \tag{11.43} \]
The interpretation of this tensor is quite simple and natural. It represents the stochastic matter distribution in space-time. The picture is the following: given a gravitational system (possibly with other forces acting upon its constituents) there appear fluctuations in the
positions and momenta of its components (for reasons already discussed above). These fluctuations, obviously, will make the geodesic in (11.39) behave randomly in a way defined by the mass density in expression (11.43), which defines the large-scale geometric structure of space-time through the metric $g_{\mu\nu}$; this metric (which is a potential) appears in Einstein's equation and also in (11.37) within the d'Alembertian operator. The system of equations to be solved is
\[ -\frac{\hbar^{2}}{2m}\Box\Psi + V\Psi + \frac{mc^{2}}{2}\Psi = i\hbar\frac{\partial\Psi}{\partial s} \tag{11.44} \]
\[ G_{\mu\nu} = -\frac{8\pi G}{c^{2}}\left(T_{(M)\mu\nu} + T_{(Q)\mu\nu}\right) \tag{11.45} \]
This system may be solved in the following way: first we solve Einstein's equation for the as yet unknown probability density $\rho_{Q}(x;s)$ and obtain the metric in terms of this function. With the metric at hand, expressed in terms of the functions $R(x;s)$ and $S(x;s)$, we return to (11.44) and solve it for these functions. This procedure is repeated until self-consistency is attained. One problem that can be easily solved is the free particle general relativistic quantum mechanical problem (one particle in the presence of a massive body), and its statement and solution will be presented in the next section. We note here that the way the system (11.44), (11.45) is solved parallels the Hartree method for the electronic charge distribution problem in atomic systems. Indeed, in the latter, one solves the Schrödinger equation for some initially unknown charge distribution (equal to $\rho = e\psi^{*}(x;t)\psi(x;t)$) to obtain the electronic distribution. With this distribution at hand one then solves Maxwell's equations (indeed, only Gauss's law, $\nabla^{2}\phi = -4\pi\rho$, since the problem is static) to find the potential $\phi$. This process is iterated until self-consistency is attained. For a one-particle system, such as the hydrogen atom, this self-consistency is obtained analytically. This method is equivalent to the one presented here, with the metric as the potential function (appearing inside the d'Alembertian operator) and with Einstein's equation instead of Gauss's law. Another important issue to note is that the system of equations (11.44), (11.45) is highly non-linear, since the metric appears in the d'Alembertian operator while the functions $R$ and $S$ are used to construct the tensor $T_{(Q)\mu\nu}$, from which the metric is calculated.
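The iteration just described has the structure of a damped fixed-point loop. The Python sketch below shows only that structure. The function `update` is a hypothetical stand-in for one full cycle (obtain the metric from the current density through Einstein's equation, then solve (11.44) for $R$ and $S$); the cosine map used in the example is a toy chosen purely so that the code runs. The linear mixing parameter `mix` is a common stabilizing device in self-consistent field calculations and is an assumption here, not something prescribed by the text.

```python
import math


def self_consistent(update, x0, mix=0.5, tol=1e-12, max_iter=1000):
    """Iterate x <- (1 - mix) * x + mix * update(x) until successive
    iterates differ by less than tol. Returns (solution, iterations)."""
    x = x0
    for iteration in range(1, max_iter + 1):
        x_new = (1.0 - mix) * x + mix * update(x)
        if abs(x_new - x) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("self-consistency not reached within max_iter")


# Toy stand-in for the physical update cycle (assumption: NOT the real map).
solution, n_iter = self_consistent(math.cos, x0=1.0)

if __name__ == "__main__":
    print("fixed point:", solution, "after", n_iter, "iterations")
```

In an actual calculation the scalar `x` would be replaced by the discretized fields, and the convergence test by a norm of the difference between successive densities.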
11.3. APPLICATION: THE GENERAL RELATIVISTIC "FREE" PARTICLE PROBLEM In the previous section we developed a general relativistic quantum theory for stochastic systems. This theory can be considered as an immediate generalization of the second order special relativistic equations (Klein-Gordon’s and Dirac’s), depending on the specific problem under consideration. It includes Einstein’s equation as part of the system of equations to be solved and, thus, takes into account gravitation. As happens to its special relativistic counterparts, this general relativistic theory is not a quantum field theory, which remains to be developed.
In the present section we will apply the above-mentioned theory to the "free" particle problem. Note that "free" in the context of a general relativistic theory means in some gravitational field. The quantum counterpart of this problem is as follows: we suppose that the initial conditions related to the particle are not known, or that its path is subject to random fluctuations. It is thus necessary to approach the problem statistically. The resulting statistical description shall account for the particle's probability distribution over space-time, parameterized by its world line. The function that emerges from the calculations shall represent the probability amplitude related to the particle being somewhere in three-dimensional space at some instant of time, that is, an event probability amplitude. As with electron clouds for the hydrogen atom problem, the particle becomes represented by a continuous (probability) density distribution. In our case we have something like a dust cloud. This implies that all the particle properties, such as the mass or the charge, shall also be considered as continuously (and statistically) distributed in space-time. In the next subsection we state the free particle problem mathematically and solve it analytically.
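11.3.1. Statement and Solution of the Problem
We showed that the system of equations to be solved when considering a general relativistic quantum problem is given by [cf. equations (11.44) and (11.45)]
\[ \frac{\partial S}{\partial s} - \frac{\hbar^{2}}{2mR}\Box R + V + \frac{\nabla_{\beta}S\,\nabla^{\beta}S}{2m} + \frac{mc^{2}}{2} = 0 \tag{11.46} \]
\[ G_{\mu\nu} = -\frac{8\pi G}{c^{2}}\left(T_{(M)\mu\nu} + T_{(Q)\mu\nu}\right), \tag{11.47} \]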
where functions 𝑅 and 𝑆 are related to the probability amplitude by Ψ(𝑥, 𝑠) = 𝑅(𝑥, 𝑠)exp(𝑖𝑆(𝑥; 𝑠)/ℏ),
(11.48)
$G$ is Newton's gravitational constant, $T_{(M)\mu\nu}$ is the matter energy-momentum tensor, $T_{(Q)\mu\nu}$ is the energy-momentum tensor of the random effective field, given in terms of $R$ and $S$ as [see equations (11.41) and (11.42)]
\[ T_{(Q)\mu\nu} = mR(x,s)^{2}\,\frac{\nabla_{\mu}S}{m}\,\frac{\nabla_{\nu}S}{m}, \tag{11.49} \]
and $G_{\mu\nu}$ is Einstein's tensor. The only force acting in our specific problem is due to the gravitational field. Then, we expect only the random field energy-momentum tensor $T_{(Q)\mu\nu}$ to appear on the right side of (11.47), since, in this case, $V = 0$ and $T_{(M)\mu\nu}$ is identically zero. Because of the symmetry of the problem, we may write the tentative metric in the comoving coordinate system as
\[ c^{2}ds^{2} = c^{2}d\tau^{2} - e^{w(r,\tau)}dr^{2} - e^{v(r,\tau)}\left(d\theta^{2} + \sin^{2}\theta\,d\phi^{2}\right), \tag{11.50} \]
where $\tau$ is the particle proper time coordinate, $(r,\theta,\phi)$ its spherical-polar coordinates and $w(r,\tau)$, $v(r,\tau)$ the functions we shall obtain to fix the metric and solve the problem. Thus, our specific problem demands that we rewrite (11.46) and (11.47) as
\[ \frac{\partial S}{\partial s} - \frac{\hbar^{2}}{2mR}\Box R + \frac{\nabla_{\beta}S\,\nabla^{\beta}S}{2m} + \frac{mc^{2}}{2} = 0 \tag{11.51} \]
and
\[ G_{\mu\nu} = -\frac{8\pi G}{c^{2}}\,\rho(r,\tau)\,u_{\mu}u_{\nu}, \tag{11.52} \]
where we used the following conventions
\[ \rho(r,\tau) = mR(x,\tau)^{2}; \qquad u_{\mu} = \frac{\nabla_{\mu}S}{m}. \tag{11.53} \]
Looking at (11.51) we can see that, in the comoving coordinate system, we shall have
\[ \nabla_{\mu}S\,\nabla^{\mu}S = m^{2}c^{2} \;\Rightarrow\; u_{\mu}u^{\mu} = c^{2}, \tag{11.54} \]
and also, as before,
\[ \frac{\partial S(x;s)}{\partial s} = -mc^{2}, \tag{11.55} \]
which means that the function $S(x;s)$ has to be written as
\[ S(x;s) = -mc^{2}s - mc^{2}\tau, \tag{11.56} \]
the last term on the right coming from the constraint in (11.54); we note that the parameter $s$ and the variable $\tau$ are being treated here as independent, as described in [117]. These results turn (11.51) into
\[ \hbar^{2}\,\Box R = 0, \tag{11.57} \]
where one now has to remember that the time derivatives in the d'Alembertian operator are taken with respect to the proper time $\tau$. It is also important to note that, because of the symmetry of the problem and the coordinate system used, the function $R$ shall not depend upon the position $x^{i}$, but only upon the proper time: $R = R(\tau)$. As the coordinate system is comoving, we must put $u_{\mu} = (u_{0},0,0,0)$ and the fluctuating effective field energy-momentum tensor becomes
\[ T_{(Q)00} = \rho(r,\tau)\,c^{2}; \qquad T_{(Q)\mu\nu} = 0 \ \text{if}\ \mu\neq 0\ \text{or}\ \nu\neq 0. \tag{11.58} \]
The reader may easily verify that, with this last expression, Einstein's equation reduces in the non-relativistic domain to Poisson's equation (Ref. [35], p.152)
\[ \nabla^{2}\phi = 4\pi G\rho, \tag{11.59} \]
which is the expected result. This makes the comparison with the Hartree procedure for the hydrogen atom, already alluded to, even more striking, since this last equation is the gravitational analog of Gauss's law for electrostatics; here, the gravitational potential $\phi$ is written for $-g_{00}/2$, as usual (Ref. [35], p.78). Einstein's equation can now be written explicitly as [117]
\[ \begin{aligned}
-e^{-w}\left(v'' + \frac{3v'^{2}}{4} - \frac{w'v'}{2}\right) + e^{-v} + \frac{\dot{v}^{2}}{4} + \frac{\dot{v}\dot{w}}{2} &= 8\pi G\rho, \\
\dot{v}' + \frac{v'\dot{v}}{2} - \frac{\dot{w}v'}{2} &= 0, \\
e^{w}\left(\ddot{v} + \frac{3\dot{v}^{2}}{4} + e^{-v}\right) - \frac{v'^{2}}{4} &= 0, \\
\left(\frac{\ddot{v}}{2} + \frac{\dot{v}^{2}}{4} + \frac{\dot{v}\dot{w}}{4} + \frac{\ddot{w}}{2} + \frac{\dot{w}^{2}}{4}\right) + e^{-w}\left(\frac{w'v'}{4} - \frac{v''}{2} - \frac{v'^{2}}{4}\right) &= 0,
\end{aligned} \tag{11.60} \]
where the prime and the dot indicate derivatives with respect to the variables $r$ and $\tau$, respectively. We can solve the last three equations if we put [117]
\[ e^{w} = \frac{e^{v}v'^{2}}{4}; \qquad e^{v} = \left[F(r)\tau + G(r)\right]^{4/3}, \tag{11.61} \]
where $F(r)$ and $G(r)$ are arbitrary functions of $r$. From the first equation in (11.60) we get the density function $\rho(r,\tau)$ with its explicit dependence on the metric given by the functions $F(r)$ and $G(r)$:
\[ \rho(r,\tau) = \frac{1}{6\pi G}\,\frac{F(r)F'(r)}{\left[F(r)\tau + G(r)\right]\left[F'(r)\tau + G'(r)\right]}. \tag{11.62} \]
To solve our primary system of equations (11.46), (11.47) we still have to solve (11.57). We stress at this point that (11.57) is highly non-linear: the functions that define the density also define the metric, and they appear as well in the d'Alembertian operator. Moreover, the function $R(r,\tau)$ is the "square root" of the density function given by expression (11.62) while being also the function to be calculated with (11.57). We can solve (11.57) by remembering that the function $R$ shall not depend upon the space coordinates, since we are comoving with the coordinate system, and so $\rho = \rho(\tau)$. This means that we have to choose the function $G(r)$ to be identically zero in expression (11.62), $G(r) = 0$, which gives, for the density (apart from constant factors),
\[ \rho(\tau) = \frac{1}{\tau^{2}} \;\Rightarrow\; R(\tau) = \frac{1}{\tau}. \tag{11.63} \]
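As a quick numerical check of the reduction from (11.62) to (11.63), the Python sketch below evaluates the density for a few sample values of $F(r)$ and $F'(r)$ with $G(r) = G'(r) = 0$. The function name, the argument names and the sample values are illustrative assumptions. The constant factor $1/(6\pi G)$, which is dropped in (11.63), is kept here, so the expected result is $1/(6\pi G\tau^{2})$ independently of $F$.

```python
import math

NEWTON_G = 6.674e-11  # SI value; only sets the overall scale of the density


def density(tau, F_r, dF_r, G_r, dG_r, newton_g=NEWTON_G):
    """Density of expression (11.62) at fixed r.

    F_r, dF_r: values of F(r) and F'(r); G_r, dG_r: values of G(r) and G'(r).
    """
    numerator = F_r * dF_r
    denominator = 6.0 * math.pi * newton_g * (F_r * tau + G_r) * (dF_r * tau + dG_r)
    return numerator / denominator


if __name__ == "__main__":
    tau = 2.5
    for F_val, dF_val in [(1.0, 0.3), (4.2, 7.0), (0.01, 12.0)]:
        # With G = G' = 0 every choice of F gives the same, r-independent value.
        print(density(tau, F_val, dF_val, 0.0, 0.0),
              1.0 / (6.0 * math.pi * NEWTON_G * tau ** 2))
```

The $r$-dependence cancels only when $G(r)$ vanishes identically, which is the choice made in the text.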
Using relation (11.48) for the probability amplitude in terms of $R$ and $S$, expression (11.56) for the function $S(x;s)$ and expression (11.63) for the function $R(\tau)$, we find the probability amplitude for the quantum free particle problem as
\[ \Psi(\tau;s) = N\sqrt{\frac{1}{6\pi mG}}\;\frac{e^{-imc^{2}\tau/\hbar}}{\tau}\;e^{-imc^{2}s/\hbar}, \tag{11.64} \]
where $N$ is a normalization constant and we are in the comoving coordinate system. Replacing these results in expression (11.50) for the metric, we get
\[ ds^{2} = c^{2}d\tau^{2} - \left(\frac{4}{9}\,\frac{F'(r)^{2}\tau^{2}}{F(r)^{2/3}\tau^{2/3}}\right)dr^{2} - \left(F(r)\tau\right)^{4/3}\left(d\theta^{2} + \sin^{2}\theta\,d\phi^{2}\right), \]
which can be further reduced to the form
\[ ds^{2} = c^{2}d\tau^{2} - \tau^{4/3}\left[d\chi^{2} + \chi^{2}\left(d\theta^{2} + \sin^{2}\theta\,d\phi^{2}\right)\right], \]
where $\chi(r) = F(r)^{2/3}$. The interpretation of (11.64) is unambiguous. It resembles the interpretation given to the solutions of the spherical spatial scattering problem, where we have one inward and one outward scattered solution given by
\[ \psi_{in}(r,t) = \frac{e^{-ikr}}{r}; \qquad \psi_{out}(r,t) = \frac{e^{+ikr}}{r}, \]
where 𝑟 here plays the role of the geodesic (as it should be for a spherical three-dimensional problem). Thus, in the present case, the matter solution Ψ𝑀 (to which we associate a positive mass 𝑚 = |𝑚|) represents the probability amplitude related to a particle that, in its rest frame, is free falling in the direction of a massive body along the geodesic of the problem, which is precisely the expected result (strictly speaking, the particle is at rest in the commoving coordinate system that is free falling in the direction of a massive body, and this is why only the proper time appears in the solution, since, for the particle in this coordinate system, 𝑥 𝑖 = 𝑐𝑜𝑛𝑠𝑡., 𝑖 = 1,2,3). The solution represented by 𝑚 = −|𝑚|, which is also possible, gives an antimatter particle that is travelling along this same geodesic, but in the opposite direction. It then turns out that the present quantum gravitational theory predicts that matter is attracted by the gravitational field of a positive mass body, while antimatter is repelled with the same modulus of the acceleration (here “positive mass” represents matter, while “negative mass” represents antimatter. Obviously, the picture is independent of the choice of these signs, since we can choose which one will be called “matter”). This behavior of being attracted or repelled by the massive body 𝑀 (which is being considered with positive mass) fixes the signs of the rest masses as opposite. Clearly, any theory that does not take into account gravitational effects will not be capable of discerning
these two entities (but see the following sections for an example of how the usual theory may be rewritten to show this property). The resulting metric is a Robertson-Walker one,

$$ds^2 = c^2 d\tau^2 - R_W(\tau)\left[\frac{dr^2}{1 - kr^2} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right],$$

with $k = 0$, meaning that the three-space is flat. The Hubble constant is easily computed from the metric and gives the usual value (see Ref. [35], p. 141)

$$H = \frac{\dot{R}_W(\tau)}{R_W(\tau)} = \frac{1}{\tau},$$

apart from a multiplicative scale constant, as expected for this problem and also as expected from the appearance of the probability density function in expression (11.63).

As a final remark, we may take a look at the geodesic equation for this problem. Since the random effective potential is given by expression (11.38) and since we must have the result given by expression (11.57), the resulting equation for the geodesic is simply

$$\frac{D^2 x^\mu}{D\tau^2} = 0,$$

as it should be for a free fall (this comes from the fact that the free particle has no fluctuation related to its movement: the random effective potential is zero).
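The $1/\tau$ behavior of the Hubble constant can be checked numerically. The sketch below assumes that the factor multiplying the spatial part of the reduced metric, $\tau^{4/3}$, is the function that enters $H$ as $R_W(\tau)$; the helper names `hubble_rate` and `power_law` are illustrative and do not come from the text. Under a different convention (for instance, taking the square root of that factor) the multiplicative constant changes, but the $1/\tau$ law does not.

```python
def hubble_rate(scale_factor, tau, rel_step=1e-6):
    """Central-difference estimate of (dR/dtau) / R at proper time tau."""
    h = rel_step * tau
    derivative = (scale_factor(tau + h) - scale_factor(tau - h)) / (2.0 * h)
    return derivative / scale_factor(tau)


def power_law(exponent):
    """Return the scale factor R(tau) = tau**exponent (assumed form)."""
    def scale_factor(tau):
        return tau ** exponent
    return scale_factor


# Assumption: R_W(tau) = tau**(4/3), read off from the reduced metric.
R_W = power_law(4.0 / 3.0)

for tau in (0.5, 1.0, 2.0, 10.0):
    # H * tau should be the same constant for every tau if H is proportional to 1/tau.
    print(f"tau = {tau:5.1f}   H * tau = {hubble_rate(R_W, tau) * tau:.6f}")
```

For any power law $R_W \propto \tau^a$ the product $H\tau$ equals the exponent $a$, which is the "multiplicative scale constant" mentioned above.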
11.4. THE NEGATIVE MASS CONJECTURE

As we have seen, the notion of a negative (inertial) mass appeared in a quite natural way in the framework of a quantum gravitational theory. Indeed, in the same sense that only a theory that accounts for electric charges is capable of distinguishing between their signs, we expect that only a gravitational theory would be able to distinguish between the signs of the gravitational “charge” (mass). However, it remains for us to show, at least briefly, that we can accommodate this notion of negative mass within the already known theories of relativistic classical mechanics, electromagnetism and special-relativistic quantum mechanics. This will show that the negative mass conjecture calls for an extension of the known theories rather than their replacement. In the next three subsections, we will show that this is indeed the case by addressing the problem from the point of view of (a) relativistic classical mechanics, (b) Klein-Gordon’s relativistic theory and (c) Dirac’s second-order relativistic theory.
11.4.1. Classical Mechanics

In the classical special relativistic theory the momentum is defined as

$$m\frac{dx^\mu}{d\tau} = p^\mu, \tag{11.65}$$

where $x^\mu$ and $p^\mu$ are the position and momentum four-vectors, $\tau$ is the proper time and $m$ is the rest mass. The zeroth component of this relation gives the energy as

$$\frac{E}{mc^2} = \frac{dt}{d\tau}. \tag{11.66}$$

Now, for the fraction on the right side of the last expression we may have

$$\frac{dt}{d\tau} = \pm\frac{1}{\sqrt{1 - v^2/c^2}} = \pm\gamma, \tag{11.67}$$

since the metric is given by the quadratic equation

$$c^2 d\tau^2 = c^2 dt^2 - d\mathbf{x}^2. \tag{11.68}$$
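Relations (11.66) to (11.68) can be evaluated for both sign branches. The following is a minimal sketch, not part of the original derivation: the function names and the sample mass are hypothetical, and the sign of the mass and the branch of $dt/d\tau$ are inputs chosen by the caller. It only illustrates the arithmetic: both branches satisfy the quadratic relation (11.68), and the energy $E = mc^2\,dt/d\tau$ takes the same positive value when a positive mass is paired with the positive branch or a negative mass with the negative branch.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dt_dtau(v, branch=+1):
    """Return dt/dtau = branch * gamma, as in Eq. (11.67); branch is +1 or -1."""
    if branch not in (1, -1):
        raise ValueError("branch must be +1 or -1")
    if abs(v) >= C:
        raise ValueError("speed must be below c")
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return branch * gamma


def energy(m, v, branch=+1):
    """Return E = m c^2 dt/dtau, as in Eq. (11.66)."""
    return m * C ** 2 * dt_dtau(v, branch)


def metric_residual(v, branch=+1):
    """Relative residual of Eq. (11.68) for dtau = 1 and dx = v dt."""
    dt = dt_dtau(v, branch)
    dx_over_c = (v / C) * dt
    return 1.0 - (dt ** 2 - dx_over_c ** 2)


m = 9.109e-31      # illustrative mass in kg (hypothetical sample value)
v = 0.6 * C        # illustrative speed

print("positive mass, positive branch:", energy(+m, v, +1))
print("negative mass, negative branch:", energy(-m, v, -1))
print("negative mass, positive branch:", energy(-m, v, +1))
print("metric residuals:", metric_residual(v, +1), metric_residual(v, -1))
```

The design choice here is to keep the branch as an explicit argument rather than inferring it from the sign of the mass, so that all four sign combinations can be inspected.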
Traditionally, only the positive solution is taken from equation (11.67). We now assume that for matter (M) and antimatter (A) we must have

$$(M): \begin{cases} dt \ge 0 \\ m > 0 \end{cases} \qquad \text{and} \qquad (A): \begin{cases} dt \le 0 \\ m < 0 \end{cases}$$

[…]

$> 0$, such that the mean square deviations never converge for $n \to \infty$. When one measures two components of the spin, for instance $\sigma_x$ and $\sigma_y$, the two variances converge as $(\Delta\sigma_x)^2 \cdot n$ and $(\Delta\sigma_y)^2 \cdot n$, such that $\Delta\sigma_x \Delta\sigma_y \ge \hbar/2$. When one measures $\sigma_z$ as an eigenvalue and CI assumes that $\Delta\sigma_z = 0$, it is still possible to measure $\sigma_x$ and $\sigma_y$ with finite values, but their dispersions would behave like $(\Delta\sigma_x)^2 \cdot n^{1+\delta}$ and $(\Delta\sigma_y)^2 \cdot n^{1+\varepsilon}$, with $\varepsilon, \delta > 0$. Thus, one should not confuse the experimental error made in some measurement process with the mean square deviation of successive measurements done over a physical system. This analysis, although obvious, is very important, since there seems to be a lot of confusion about this in the literature. It also makes explicit the kind of assertion being considered by CI when it says that the probability amplitudes refer to individual systems.
12.2. DAVID BOHM’S INTERPRETATION: WHOLENESS

In David Bohm’s interpretation of Quantum Mechanics, the wave function provides only an incomplete description of the system. It governs, like a pilot wave, the behavior of particles in their movement. The ontology is corpuscular and the wave function refers to the behavior of the particles. The positions of the particles are the hidden variables to which the interpretation refers (the name is misleading, however). There are two equations that complete Bohm’s schema: Schrödinger’s equation and

$$\frac{dx}{dt} = \frac{\hbar}{m}\,\mathrm{Im}\left\{\frac{\varphi^*\partial_x\varphi}{\varphi^*\varphi}\right\},$$

for the one-dimensional single-particle situation. If, in the case of a scalar wave function, we write $\varphi = R\exp(iS/\hbar)$, with $R$ and $S$ real functions, we then get this equation in its simplest version as

$$\frac{dx}{dt} = \frac{\partial_x S}{m} = \frac{J}{\rho},$$

in which $J$ is the probability current and $\rho$ the probability density.

Bohm’s second equation can be easily obtained in the following manner: write $\varphi = R\exp(iS/\hbar)$ as before and take it to the Schrödinger equation. There will appear two equations for the functions $R$ and $S$. One equation will be

$$\partial_t R^2 + \partial_x\left[R^2\,\frac{\partial_x S}{m}\right] = 0,$$

while the other will be

$$\partial_t S + \frac{1}{2m}(\partial_x S)^2 + V - \frac{\hbar^2}{2m}\frac{\partial_x^2 R}{R} = 0,$$

which has the Hamilton-Jacobi form if we put, formally, $p = \partial_x S$ and interpret the last term on the left as a potential, the so-called quantum potential, which is responsible for the above-mentioned guidance of the particles. Indeed, in this case one gets $\partial_t S + H_B(x,p) = 0$, in which

$$H_B(x,p) = \frac{p^2}{2m} + V + Q,$$
$Q = -\hbar^2\partial_x^2 R/(2mR)$ being the so-called quantum potential. However, one must keep in mind that, in the Hamiltonian formalism, $p$ is a completely independent variable, while here it is just shorthand for a function of $x$ and $t$.² Although the appearance of the Bohmian “Hamilton-Jacobi” equation suggests that we get the classical Hamilton-Jacobi equation if we put $\hbar \to 0$, this is completely wrong, since the function $R(x,t)$ will, generally, contain $\hbar$ too and cancellation may occur (in fact, cancellation almost always happens for stationary states).

One can easily check that the identification $p = \partial_x S$ becomes natural (in mathematical terms) since

$$\bar{p} = \int_{-\infty}^{+\infty}\varphi^*(-i\hbar\,\partial_x)\varphi\,dx = \int_{-\infty}^{+\infty}\rho(x,t)\,\partial_x S(x,t)\,dx$$

(but note that we got rid of a term given by $R\,\partial_x R$, since its integration over the whole space must give zero). Thus, it is always possible to define $p(x,t) = \partial_x S(x,t)$ such that

$$\bar{p} = \int_{-\infty}^{+\infty}\rho(x,t)\,p(x,t)\,dx.$$
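The term that is dropped in the average momentum can be made explicit. The worked step below uses only $\varphi = R\exp(iS/\hbar)$ and $\rho = R^2$ from the text, and assumes in addition that $R \to 0$ as $x \to \pm\infty$ (a normalizability condition that is not spelled out above):

```latex
\begin{align*}
\varphi^{*}(-i\hbar\,\partial_x)\varphi
  &= R\,e^{-iS/\hbar}\,(-i\hbar)\Big(\partial_x R + \tfrac{i}{\hbar}\,R\,\partial_x S\Big)e^{iS/\hbar}
   = -i\hbar\,R\,\partial_x R + R^{2}\,\partial_x S, \\
\int_{-\infty}^{+\infty} R\,\partial_x R\,dx
  &= \tfrac{1}{2}\Big[R^{2}\Big]_{-\infty}^{+\infty} = 0, \\
\bar{p}
  &= \int_{-\infty}^{+\infty} R^{2}\,\partial_x S\,dx
   = \int_{-\infty}^{+\infty} \rho(x,t)\,\partial_x S(x,t)\,dx .
\end{align*}
```

The discarded piece is purely imaginary and is a total derivative, which is why it cannot contribute to the (real) average momentum.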
Because of Bohm’s second equation, if one has, at some initial time, the distribution of particles given by $\rho = \varphi^*\varphi$, then this will be true for all subsequent times. We then obtain the “hidden variables” model by regarding the initial configuration as distributed according to $\varphi^*\varphi$. The guiding equation for the big system then transforms the initial configuration into the final configuration. Bohmian mechanics is thus considered a completely deterministic reformulation of Quantum Mechanics. As such, it does not need any postulate on the “collapse of the wave function”. One remarkable feature of Bohmian mechanics is the way it treats particles and waves, so that duality is expunged. In this approach, nature is corpuscular while the behavior of the particles is undulatory, since the guiding field has an obvious undulatory structure.
² Of course, in any classical problem, after we have solved it, $p$ is also a function of $x$ and $t$. The difference in Bohm’s approach is that, in the Hamiltonian, $p$ is a mere function of $x$ and $t$, and there is no resulting “phase space”.

12.2.1. Connections with the Present Interpretation

On the Statistical Character of Descriptors

In previous chapters Bohm’s “Hamilton-Jacobi-like” equation appeared a great number of times. Furthermore, the continuity equation is nothing but a consequence of the Schrödinger equation. This creates a bridge between the approach of this book and Bohm’s interpretation. However, there are some important differences. Bohm’s equation for the momentum $\partial_x S$ does not appear in the same way as in Bohmian mechanics. In fact, the present approach is capable of showing where Bohm’s interpretation fails.³ We can do that because of the symbolic inheritance property to which we have already alluded many times. Indeed, our derivation method leaves no doubt about the statistical (and, indeed, stochastic) character of Quantum Mechanics. From any of the derivations shown in this book, the Bohmian momentum $p(x,t)$ is nothing but a statistical descriptor. The fact that the resulting Bohmian equations have a deterministic formal appearance does not imply that they are, indeed, deterministic, since their symbols may have, as they do, statistical origins.

This solves a problem of Bohm’s approach that appears when one uses it to scrutinize stationary solutions of quantum mechanical systems. For instance, for the harmonic oscillator in state $n$ (frequency $\omega$), since the wave function is real, we always have

$$H_B = \left(n + \tfrac{1}{2}\right)\hbar\omega, \qquad p_n(x,t) = 0,$$

that is, the corpuscle is always at rest! The same result comes from S states of the hydrogen atom and many other systems. This is an obvious absurdity if we do not assume that $p(x,t)$ is a statistical descriptor that has average values equal to zero (for obvious symmetry reasons in the two examples cited). Furthermore, it is somewhat questionable whether Bohmian mechanics is, indeed, a deterministic theory, since its kinematics is governed by a statistical descriptor (the wave function). Remember that we have also shown that the “quantum potential” is related to the random part of the behavior of a physical system.
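The harmonic oscillator statement can be checked numerically for the ground state. The sketch below is an illustration under stated assumptions, not the author's procedure: it works in natural units $\hbar = m = \omega = 1$, uses the unnormalized ground-state amplitude $R(x) = \exp(-m\omega x^2/2\hbar)$, and estimates $\partial_x^2 R$ by a central finite difference. The function names are hypothetical. Since the wave function is real, $S$ is constant in $x$ and $p = \partial_x S = 0$, so $H_B$ reduces to $V + Q$.

```python
import math

# Natural units; an assumption made only to keep the sketch short.
HBAR = 1.0
M = 1.0
OMEGA = 1.0


def amplitude(x):
    """Unnormalized ground-state amplitude R(x) of the harmonic oscillator."""
    return math.exp(-M * OMEGA * x * x / (2.0 * HBAR))


def quantum_potential(R, x, h=1e-4):
    """Q = -(hbar^2 / 2m) R''/R, with R'' from a central finite difference."""
    second_derivative = (R(x + h) - 2.0 * R(x) + R(x - h)) / (h * h)
    return -HBAR ** 2 / (2.0 * M) * second_derivative / R(x)


def bohm_hamiltonian(x):
    """H_B = p^2/2m + V + Q, with p = dS/dx = 0 for a real wave function."""
    p = 0.0
    potential = 0.5 * M * OMEGA ** 2 * x * x
    return p * p / (2.0 * M) + potential + quantum_potential(amplitude, x)


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # Each value should be close to hbar*omega/2 = 0.5, independent of x.
    print(f"x = {x:+.1f}   H_B = {bohm_hamiltonian(x):.6f}")
```

The position dependence of $V$ is cancelled by $Q$, so the corpuscle carries the full energy $\hbar\omega/2$ while its Bohmian momentum vanishes everywhere, which is the situation discussed above.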
³ Of course, by construction, Bohm’s mathematical approach reproduces all the formal results of ordinary Quantum Mechanics, as obtained directly from the Schrödinger equation. Our concern here is with the interpretation.

On the Notion of a Quantum “Potential” and “Non-Locality”

As we have discussed in the chapter on reality and non-locality, the so-called quantum potential seems to be the source of more confusion. Names are not neutral, at least when it comes to interpreting the formalism. When Bohm assumes that there is a quantum potential, the door is completely open to the appearance of non-local behaviors. “Non-locality” refers to a situation in which there is a causal relation that does not observe the restriction set forth by the finite value of the velocity of light. However, Bohm’s quantum potential already uses the final solution of the Schrödinger equation, and this equation is solved by considering not only the equation, but also the underlying boundary conditions. Thus, the effect of the boundaries of the physical system is already inscribed in Bohm’s pilot wave, even though the corpuscles may not have interacted with those boundaries at some instant of time. Of course, this problem is even more drastic in stationary situations, since for them the very notion of time is an elusive concept: stationary behavior exists exactly where time plays no role in the overall description of the problem.

Our approach in this book showed many times and from different perspectives that the so-called quantum potential is, indeed, a statistical element to which we associate stochastic forces. The so-called “non-locality” is nothing but correlation (another statistical descriptor). This can be easily seen. If one uses a density function for two particles given by $\rho(x,y;t) = \rho_1(x;t)\rho_2(y;t)$ ($x$, $y$ describing the particles), the “quantum potential” separates into two “local” potentials. However, if we write $\rho(x,y;t) = \rho_1(x;t)\rho_2(y;t)[1 + c_{12}(x,y)]$ (assuming real wave functions), then we get terms involving only $R_1 = \sqrt{\rho_1(x)}$ and $c_{12}(x,y)$ and their derivatives, terms involving only $R_2 = \sqrt{\rho_2(y)}$ and $c_{12}(x,y)$ and their derivatives, and terms involving only $c_{12}(x,y)$ and its derivatives. It is then mathematically obvious that the so-called “non-local” behavior is nothing but the effect of statistical correlations. Needless to say, economic systems, biological environments, and many others all share this property of being correlated with quantum mechanics, although not through Bohm’s potential. Indeed, this may be seen as one of the most impressive characteristics of Quantum Mechanics: to furnish always, no matter which physical system is being considered, the correct correlations by means of the addition of probability amplitudes (with multiplicative coefficients).

Being related to statistical correlations, Bohm’s “potential” is hardly a true potential, being simply a correlation expression that plays, in all formulae, a role similar to a potential. “Wholeness” and “implicate order”, two central concepts in Bohm’s interpretation, play no role in the present approach, but his distinction between being and behaving, which is never considered in its full power, does play a central role in the developments of this book.
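The separation of the two-particle quantum potential for an uncorrelated density, and the origin of the cross terms when a correlation factor is present, can be written out in a few lines. The step below is a sketch under assumptions that are not in the text: the two-particle quantum potential is taken in the standard form with one second-derivative term per particle, the masses $m_1$ and $m_2$ are introduced for illustration, and $g = \sqrt{1 + c_{12}}$ is a shorthand defined here.

```latex
\begin{align*}
\text{Uncorrelated case, } R(x,y) = R_1(x)R_2(y):\qquad
Q &= -\frac{\hbar^2}{2m_1}\frac{\partial_x^2 R}{R} - \frac{\hbar^2}{2m_2}\frac{\partial_y^2 R}{R}
   = -\frac{\hbar^2}{2m_1}\frac{\partial_x^2 R_1}{R_1} - \frac{\hbar^2}{2m_2}\frac{\partial_y^2 R_2}{R_2}
   = Q_1(x) + Q_2(y). \\[4pt]
\text{Correlated case, } R = R_1 R_2\, g,\ g = \sqrt{1 + c_{12}(x,y)}:\qquad
\frac{\partial_x^2 R}{R} &= \frac{\partial_x^2 R_1}{R_1}
   + 2\,\frac{\partial_x R_1}{R_1}\,\frac{\partial_x g}{g}
   + \frac{\partial_x^2 g}{g},
\end{align*}
```

with the analogous expression for $\partial_y^2 R / R$. The second term mixes $R_1$ with the correlation function and the third involves the correlation function alone, which matches the three kinds of terms listed above; both vanish when $c_{12} = 0$.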
Heisenberg’s Relations

For Bohm, Heisenberg’s relations become inscribed into the trajectory formalism because of the initial conditions, given by the probability density $\rho(x,0)$. Since the formalism preserves the solution of the Schrödinger equation, it also preserves Heisenberg’s relations. This is a clever solution. Indeed, since Bohm’s formalism deals with individual corpuscular trajectories, it is not obvious how one can accommodate Heisenberg’s relations. Because of this solution, Bohmian mechanics refers to individual trajectories of particles, while Heisenberg’s relations refer to an ensemble of such trajectories (when the possibilities given by $\rho(x,0)$ are made concrete in actual experiments, with the weights given by $\rho(x,0)$).

Since the present approach always assumes that the results connected to the probability amplitudes refer to ensembles of particles or to individual particles in some time interval, it gives a different interpretation to Heisenberg’s relations. In the present approach, Heisenberg’s relations define the mean square values of the “observables” (position and momentum, for instance) for the ensemble or for the individual physical system considered within some time interval. They are not, thus, the result of an ignorance of the precise initial conditions, as in Bohm’s interpretation, but the result of the everlasting stochastic behavior of the corpuscles. Furthermore, note that, even if we had the exact initial condition of some particle, the solution of Bohm’s equations would not represent its actual movement, since Bohm’s approach is nothing but an average over the true random movement of the particle, according to the present approach.

Formal Relations

In chapter six we presented the Langevin equations for Quantum Mechanics. In that chapter, an expression close to Bohm’s potential appears as the fluctuating part of the equation, the one closely connected to the fluctuation-dissipation theorem.
The expression we found was

$$Q' = -\frac{\hbar^2}{4m}\,\partial_x^2 \ln[\rho(x,t)] = \frac{\hbar^2}{2m}\left[\partial_x \ln|R|\right]^2 - \frac{\hbar^2}{2mR}\,\partial_x^2 R,$$

which already includes Bohm’s term. As shown in the third chapter, this expression gives the average momentum fluctuation term $\overline{(\delta p(x,t))^2}$, also related to the notion of temperature. The random part of the Langevin equation comes from our decision to consider every other part of the physical system, such as the underlying fields, as playing the role of a thermal reservoir. This removes the uneasiness many physicists show with the notion of a quantum potential. In this case, previous chapters showed us that Bohmian mechanics is just a mean field theory for the correct Langevin equations. Moreover, one need not appeal to notions such as pilot waves, etc. The Langevin equations are just the traditional ones with a specific random term (for each quantum mechanical state).
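The way a logarithmic-density expression of this kind contains Bohm’s term can be verified directly. The step below assumes only $\rho = R^2$, with $R$ real and nonvanishing where the expression is evaluated, and writes $Q = -\hbar^2\partial_x^2 R/(2mR)$ for Bohm’s quantum potential:

```latex
\begin{align*}
\partial_x^2 \ln\rho
  &= 2\,\partial_x^2 \ln|R|
   = 2\left[\frac{\partial_x^2 R}{R} - \left(\frac{\partial_x R}{R}\right)^{2}\right], \\
-\frac{\hbar^2}{4m}\,\partial_x^2 \ln\rho
  &= \frac{\hbar^2}{2m}\left[\partial_x \ln|R|\right]^{2} - \frac{\hbar^2}{2mR}\,\partial_x^2 R
   = \frac{\hbar^2}{2m}\left[\partial_x \ln|R|\right]^{2} + Q .
\end{align*}
```

Read this way, the logarithmic expression differs from Bohm’s potential by a non-negative term built from the first derivative of $\ln|R|$.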
Relations between Bohm’s Interpretation and the Present One

What the present interpretation has in common with Bohmian mechanics is the fact that it also considers only a corpuscular ontology. Particles have trajectories (stochastic but, anyway, real). There is no need for wave packet reduction, no role for observers and no place for duality. This last feature is of utmost importance, since the way the present approach removes duality is structurally identical to Bohm’s: by making a difference (in its own terms) between being and behaving. In Bohm’s interpretation, the corpuscles behave in undulatory ways because of the pilot wave. In the present approach, the same corpuscles behave in undulatory ways because of the structured fluctuations, from which this undulatory behavior emerges (even for one single particle, if we correctly consider the variable time).
12.3. THE STATISTICAL INTERPRETATION: ENSEMBLES

The Statistical Interpretation of Quantum Mechanics is most attractive because of the impressive downsizing of the underlying ontology needed to understand the theory, while still being capable of rendering as trivial many situations usually considered weird. This interpretation accepts the Schrödinger equation as a principle, as do those previously mentioned. However, the referent of the probability amplitude is no longer the individual system, as in the Copenhagen Interpretation, but the ensemble of identically prepared systems. This simple modification of one postulate changes everything and makes many other postulates of the Copenhagen Interpretation unnecessary. Indeed, from the assumption that the referent of the probability amplitude is the ensemble of identically prepared systems, one gets:
• No Duality. Since duality needs the concepts of particles and waves to exist at the same ontological level, and within this interpretation the waves are undeniably linked to the ensemble, there cannot be any duality. The ontology, thus, is of a corpuscular nature, and waves are at the ensemble level. This means that, implicitly, this approach does something analogous to Bohmian mechanics (although by other means): it distinguishes the levels of being and behaving by assuming that quantum mechanical entities are corpuscles and that they behave as waves when considered as an ensemble.

• No Complementarity. Of course, if one gets rid of the concept of duality, there is hardly room for the companion concept of Complementarity. Our concepts, in ways very close to those of Classical Statistical Mechanics, are perfectly adjusted to understand the microscopic world.

• No Uncertainty. The symbols $\Delta x$, $\Delta p$, and, in general, $\Delta\alpha$, are interpreted as mere dispersions calculated for the ensemble (as in Classical Statistical Mechanics), and Heisenberg’s relations should not be considered an imposition on our possibilities to understand the world, but a constraint on the scope of Quantum Mechanics. That is, only physical systems with dispersions obeying relations such as $\Delta x \Delta p \ge h/4\pi$ can be described by Quantum Mechanics. Systems within the scope of Quantum Mechanics have the property that small dispersions for one variable mean big dispersions for the conjugate variable. This does not prevent quantum mechanical systems from having objective properties,⁴ as would be the case with the concept of uncertainty (the interpretation, however, may or may not assume these objective properties).

• No Observer. The observer, with the qualifications given by the Copenhagen Interpretation, is no longer needed. Copenhagen’s notion of a qualified observer comes from the need to find a source for the uncertainty related to the nature or behavior of each individual physical system. It is, thus, a companion to the concept of uncertainty that was washed out.

• No Reduction of the Wave Packet. The absence of duality in the interpretation makes the notion of a reduction of wave packets irrelevant. Indeed, any linear combination of pure states means simply that these pure states are all probable within the ensemble of identically prepared systems, with the probability given by the modulus squared of the coefficients that appear in the linear combination.

This interpretation assumes that the eigenvalues of operators represent the average values of their underlying properties and is fully compatible with the fact that every property (not measurement), being referred to an ensemble and thus statistical, must have some dispersion. This means that the Statistical Interpretation makes no commitment to operator constructions leading to dispersionless properties. Unlike Bohm’s interpretation, the Statistical Interpretation has no intention of putting Quantum Mechanics into a deterministic layout. In fact, it assumes from the beginning that Quantum Mechanics is nothing but a statistical theory.
The previous features of the Statistical Interpretation of Quantum Mechanics lead to the resolution of numerous difficulties with regard to the interpretation of specific phenomena, such as the one known as Schrödinger’s Cat Paradox.
⁴ Properties that do not depend on any observation process, properties that are intrinsic to the physical system.
One characteristic of the interpretation is that, from the experimental point of view, it assumes that the measurement of any quantum mechanical property is the result of measuring the physical system under scrutiny for each one of the virtually infinite number of its identically prepared copies. The downsizing produced by the Statistical Interpretation is the reason it is sometimes called the Minimal Interpretation of Quantum Mechanics.
12.3.1. Too Minimal an Interpretation

One may say that the main advantage of the Statistical Interpretation, its minimalistic character, is also the source of its weakness. This has to do with the very notion of an ensemble that the Statistical Interpretation adheres to. For this interpretation, an ensemble is an infinite collection of copies of identical physical systems. Such physical systems may have one, two or any number of individuals, that is, may be composed of one electron, an electron and a proton, etc. The word “ensemble” is not used to name one system with a large number of individuals. Let us call this a space-type ensemble.

However, it is an experimental fact that Quantum Mechanics applies to individual systems. If we keep our notion of “ensemble” as the one mentioned, then we must conclude that the Statistical Interpretation has already been experimentally proved false. The Statistical Interpretation, however, uses too restricted a notion of ensemble. Indeed, in many cases (mainly in stationary situations), one single system considered within some time window behaves exactly like an ensemble of identically prepared systems. Let us call this a time-type ensemble.

The important notion, which links the two notions of ensemble, is the variable time. In the space-type ensemble, one takes a virtually infinite number of copies of identically prepared systems and makes, at some instant of time, an infinite number of measurements, from which the averages and dispersions will appear. For a time-type ensemble, one takes one single system and, during a time window, repeatedly submits this system to the measurement of the same property. Thus, the concept that is missing in the Statistical Interpretation is that of “ergodicity”. However, this concept should not be inserted into the interpretation from nowhere, since this would make the insertion ad hoc. The concept of ergodicity must come with its companion concepts of “randomness”, “fluctuations”, “dissipations”, “entropy”, etc. However, the Statistical Interpretation, with its sole modification of the interpretation of the referent of the probability amplitude, is unable to introduce such concepts into itself without being ad hoc.
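The distinction between a space-type and a time-type ensemble can be illustrated with a toy simulation. The sketch below is a classical stand-in and not one of the quantum Langevin equations of this book: it uses an Ornstein-Uhlenbeck process $dx = -\theta x\,dt + \sigma\,dW$, whose stationary variance is $\sigma^2/2\theta$, integrated with the Euler-Maruyama scheme. All parameter values and function names are hypothetical choices made for illustration. The time-type average follows one system during a long time window; the space-type average takes many identically prepared copies and measures each one once, at the same instant.

```python
import math
import random

THETA = 1.0              # relaxation rate (hypothetical toy value)
SIGMA = math.sqrt(2.0)   # noise strength; stationary variance SIGMA**2 / (2*THETA) = 1
DT = 0.01                # integration time step


def step(x, rng):
    """One Euler-Maruyama step of dx = -THETA * x * dt + SIGMA * dW."""
    return x - THETA * x * DT + SIGMA * math.sqrt(DT) * rng.gauss(0.0, 1.0)


def time_type_average(n_steps, rng, burn_in=1000):
    """Mean of x^2 along ONE trajectory observed during a long time window."""
    x = 0.0
    for _ in range(burn_in):          # let the single system forget its preparation
        x = step(x, rng)
    total = 0.0
    for _ in range(n_steps):
        x = step(x, rng)
        total += x * x
    return total / n_steps


def space_type_average(n_copies, n_steps, rng):
    """Mean of x^2 over MANY identically prepared copies at one instant."""
    total = 0.0
    for _ in range(n_copies):
        x = 0.0                       # identical preparation of every copy
        for _ in range(n_steps):
            x = step(x, rng)
        total += x * x                # a single measurement per copy
    return total / n_copies


rng = random.Random(12345)
t_avg = time_type_average(200_000, rng)
s_avg = space_type_average(1_000, 500, rng)
print(f"time-type  <x^2> = {t_avg:.3f}")
print(f"space-type <x^2> = {s_avg:.3f}")
```

For this stationary toy process both averages should lie close to the stationary variance, which is what ergodicity means in this restricted setting; the agreement is statistical and depends on the length of the time window and the number of copies.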
12.3.2. Connections with the Present Interpretation

We may say that the present interpretation extends the Statistical Interpretation by inserting the important notions already mentioned. At this point, it should be needless to say that the present approach makes this insertion without any ad hoc maneuver. From chapters two through six we have shown how concepts such as “entropy” (third chapter), “randomness” (fourth, fifth and sixth chapters), and “ergodicity” (sixth chapter) can be mathematically introduced into Quantum Mechanics. In fact, in chapter six we have shown with simulations that, for the cases there presented, ergodicity obtains. We have also discussed the fact that the notion of ergodicity may not always be used. There are some experiments that select, unambiguously, the space-type ensemble (two-slit interferometry is one such).
12.4. THE STOCHASTIC INTERPRETATION: RANDOMNESS

In chapter four we presented the Stochastic Interpretation and its concept of randomness. The Stochastic Interpretation (SI) differs from the ones to which we have already alluded by having as one of its goals to mathematically derive the Schrödinger equation. Indeed, in the SI the Schrödinger equation ceases to be a postulate and becomes a theorem. We have already said how important such a maneuver is: for a theory as abstract as Quantum Mechanics, this type of approach allows one to immediately interpret the symbols of the target theory from their origins in the source theory, which is already fully interpreted. The more direct the process of derivation, the more obvious the semantic inheritance process.

This is exactly what SI does. From a corpuscular framework with a Newtonian equation, one introduces randomness from the notions of forward and backward derivations in such a way that one ends up with the Schrödinger equation (see chapter four). Since the Schrödinger equation makes no reference to stochastic processes among its symbols, it is evident that SI must wash out any explicit mention of randomness in the process of derivation. This is done by the time averaging process used, for instance, in Peña’s derivations [65].

The Stochastic Interpretation then arrives at a Bohmian equation in which it becomes clear, by inheritance of the derivation process, that Bohm’s “potential” is nothing but the expression of an average of the fluctuations, and has nothing to do with true physical potentials with a definite physical source, such as gravitation or electromagnetism. The connection between Bohm’s term and correlations is made explicit by this interpretation. It should already be clear that the SI can serve as a mathematical and an interpretive support to the Statistical Interpretation. It furnishes the Statistical Interpretation with some of the constructs that are missing in the latter but are nevertheless crucial to its intents, such as randomness.
12.4.1. Complements to the Stochastic Interpretation

Although to a lesser extent than the Statistical Interpretation, the Stochastic Interpretation still lacks some important concepts, important both to its own interpretation and to the Statistical Interpretation. Indeed, the very notion of a fluctuation-dissipation process is lacking in SI. This happens exactly because of the already mentioned averaging process needed to arrive at the Schrödinger equation. Another important concept that is lacking in SI is ergodicity. Again, the averaging process used to derive the Schrödinger equation erases from the start the explicit and specific fluctuation pattern that would give rise to some quantum mechanical state of a physical system.

The Stochastic Interpretation always assumed that the randomness of which it talks should come from an external source, external to the physical system under consideration, such as the electromagnetic background radiation. The results of this book showed that no such assumption is necessary: when we understand that Quantum Mechanics (as based upon the Schrödinger equation) is about the corpuscular subsystem, connected to a field subsystem that is considered only in average terms, that is, a closed system, it becomes obvious that fluctuations and randomness are intrinsic to all quantum mechanical physical systems.
12.4.2. Connections with the Present Interpretation

The present interpretation furnishes precisely those other concepts that are missing in the Stochastic Interpretation. It does that by taking recourse to a reformulation of the problem in terms of Langevin equations. Within the framework of Langevin equations, the fluctuations are made explicit. Thus, one can understand the role played by them in the overall behavior of the system by simply making simulations, as we did in chapter six.

The Langevin formulation makes it clear that quantum mechanical systems are always closed physical systems (at least the usual ones conserving energy) for which we assume the canonical ensemble statistical approach in the following way: we keep track of the matter particles of our system and assume the field subsystem to be the reservoir. Exchange between these two subsystems is the intrinsic origin of fluctuations in the position, momentum, energy, etc., of the particles. One need not (and ought not to) postulate some external all-pervading background (in some analogy with Brownian movement). It is possible to see, with the simulations, the fluctuation-dissipation process and also to see, at least for stationary systems, the ergodic hypothesis at work. The interplay of both interpretations, the Stochastic and the Statistical ones, is overwhelming.

The relation with Bohmian mechanics is also immediate. Bohmian mechanics emerges as a mean field theory of the underlying fine-grained stochastic processes, a conclusion that one can easily extend to the Schrödinger equation. The Bohmian, Stochastic and Statistical Interpretations are thus put into the same parlance, and their shortcomings are all removed at once.

The Newtonian limit was also clarified with Langevin equations. It is completely misleading to take Bohmian equations and simply let Planck’s constant go to zero (to remain with a Newtonian-like equation). Indeed, the quantum potential (as Bohm calls it) is given by an expression that contains Planck’s constant explicitly, but it is built from the probability amplitude, which also depends on Planck’s constant. Moreover, we have shown that the descriptors of Bohmian Mechanics are all statistical in their origin: $p$ in that approach means not an independent variable, but a function defined on the configuration space. However, when we are in the framework of the Langevin equation, one has only true momenta and positions, with no internal dependence on Planck’s constant, only external dependence, as when Planck’s constant multiplies some term. The limit $\hbar \to 0$ is completely meaningful in this context. In this context, we have shown that the correct Newtonian limit is related to the assumption that no fluctuation exists (or that these fluctuations are irrelevant to the overall behavior of the system and disappear together with the process that dissipates them). All the features from which we have profited in the minimal interpretation are kept, and the ontological downsizing remains impressive.
12.5. THE RELATIVE STATE INTERPRETATION: MANY WORLDS For those acquainted with Hugh Everett’s Relative State Interpretation of Quantum Mechanics [135], it may be a surprise to find a reference to it in this work. Indeed, for a work based on the criticism of the current fetish for the weirdness, an interpretation postulating an infinite non-enumerable number of parallel worlds is certainly to be at the center of the criticism. Although it is certainly true that the present interpretation has nothing to do with parallel worlds, it nonetheless shares some intuitions with Everett’s interpretation – of course, with some reinterpretation of Everett’s constructs. We will show this structural convergence in what follows. We begin by making a quick presentation of the most important aspects of Everett’s interpretation. Everett wants to apply Quantum Mechanics to the General Theory of Relativity, and, thus, to a closed Universe. His intention, moreover, is to produce a metatheory for the Copenhagen Interpretation. In this case, the postulates of the Copenhagen Interpretation, mainly that one related to the reduction of the wave packet, could be fit into a wider framework in which some eventual weirdness would disappear. The fact that he searches for a quantum theory of gravitation for closed systems means that the process of reduction of the wave packet, which depends on an external observer, is not available to him. He thus assumes that the deterministic unfolding of the probability amplitude ruled by the Schrödinger equation is enough to a complete description of Quantum Mechanics. The main idea of his approach is to consider only isolated systems and divide them into two subsystems: one that would be the usual quantum mechanical system being observed and one related to the observer. In any case, observed and observer here means material systems, such as electrons, detectors, etc. There is no recourse to constructs such as minds, consciousness, etc5. 
The observer subsystem has memory in the usual physical sense, that is, its present state is the result of a process influenced by all previous states with some weight, something like
$$\varphi_{observer}(x,t) = \int_{-\infty}^{t} K(x;t,t')\,\varphi_{observer}(x,t')\,dt'.$$
The memory of the observing device may be a sort of Turing machine, for instance (a tape with holes on it).
⁵ Minds are the object of the “Many Minds” interpretation of Quantum Mechanics.
The Hilbert space of the complete system can be represented by the tensor product of the Hilbert spaces of the two subsystems, that is, $H^{S+O} = H^{S} \otimes H^{O}$, where $S+O$, $S$ and $O$ are the complete system, the observed subsystem and the observing subsystem, respectively. The effect of the function $K(x;t,t')$ on $\varphi_{observer}$ is expressed by him in the format $\varphi^{O}_{[A,B,\ldots,C]}$, showing that $A, B, \ldots, C$ are time-ordered past events that formed the present $\varphi^{O}$. In this case, it is obvious that there must be a strong correlation between the states of the observed subsystem and the states of the observing subsystem – or “observation” would be meaningless. Thus, because of the correlations, the states of one subsystem exist relative to the states of the other – and this is the source of the construct “relative states”.
Everett thus describes what should be understood by a “good observation”. A good observation, when it comes to observing eigenstates, is one that takes the complete system from the combined state (prior to measurement) $\varphi^{S+O} = \theta_i\,\varphi^{O}_{[A,B,\ldots,C]}$ to the state $\varphi^{S+O} = \theta_i\,\varphi^{O}_{[A,B,\ldots,C,\alpha_i]}$, where $\alpha_i$ is the eigenvalue related to the eigenfunction $\theta_i$ of system $S$. Now, $\varphi^{O}_{[A,B,\ldots,C,\alpha_i]}$ means that the observer is “aware” of the eigenvalue $\alpha_i$ of eigenfunction $\theta_i$ (note, here, that Everett has already conformed himself to the Copenhagen Interpretation, by assuming that eigenvalues are measured without dispersion). Using the linearity of the Schrödinger equation he extends this analysis to states that are not eigenstates of the problem (which can be written as superpositions of eigenstates) and arrives at the two rules:
The observation of a quantity $A$, with eigenfunctions $\theta_i^{S_1}$, in a subsystem $S_1$ by the observer $O$, transforms the total state according to:
$$\varphi'^{S+O} = \varphi^{S_1}\varphi^{S_2}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots]} \;\to\; \sum_i a_i\,\theta_i^{S_1}\varphi^{S_2}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i]},$$
with $a_i = (\theta_i^{S_1}, \varphi^{S_1})$.
A further observation of a quantity $B$, with eigenfunctions $\mu_j^{S_2}$, on $S_2$ by the observer $O$ transforms the total state according to:
$$\sum_i a_i\,\theta_i^{S_1}\varphi^{S_2}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i]} \;\to\; \sum_{i,j} a_i b_j\,\theta_i^{S_1}\mu_j^{S_2}\varphi^{S_3}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i\beta_j]},$$
with $b_j = (\mu_j^{S_2}, \varphi^{S_2})$.
Everett thus interprets the situation of the first rule as meaning that “there is no longer any independent system state or observer state, since the two have become correlated in a one-one manner. However, in each element of the superposition, $\theta_i^{S_1}\varphi^{O}_{[\cdots\alpha_i]}$, the object-system state is a particular eigenstate of the observation, and furthermore the observer system-state describes the observer as definitely perceiving that particular system state. This correlation is what allows one to maintain the interpretation that a measurement has been performed” ([135], p. 459). To characterize the notion of wave packet reduction he thus considers a situation in which the observer is connected to a number of separate identical subsystems, $S_i = S_j$ for all $i$ and $j$. Then, the sequence of observations of the whole system gives
$$\varphi^{S+O} \;\to\; \sum_i a_i\,\theta_i^{S_1}\varphi^{S_2}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i^1]} \;\to\; \sum_{i,j} a_i a_j\,\theta_i^{S_1}\theta_j^{S_2}\varphi^{S_3}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i^1\alpha_j^2]}$$
and so on, such that, after $r$ measurements one gets
$$\varphi_r = \sum_{i,j,\ldots,k} a_i a_j\cdots a_k\,\theta_i^{S_1}\theta_j^{S_2}\cdots\theta_k^{S_r}\varphi^{S_{r+1}}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i^1\alpha_j^2\cdots\alpha_k^r]},$$
which is a superposition of states
$$\varphi_{ij\cdots k} = \theta_i^{S_1}\theta_j^{S_2}\cdots\theta_k^{S_r}\varphi^{S_{r+1}}\cdots\varphi^{S_n}\varphi^{O}_{[\cdots\alpha_i^1\alpha_j^2\cdots\alpha_k^r]},$$
each representing a definite memory sequence of the observer $[\cdots\alpha_i^1\alpha_j^2\cdots\alpha_k^r]$. Relative to him the observed system states are given by $\theta_i^{S_1}\theta_j^{S_2}\cdots\theta_k^{S_r}$, with the other $S_{r+\sigma}$ systems unaltered. This superposition means that the observer perceived an apparently random sequence of definite results for his observations, but these observations left the observed systems in their eigenstates. If, at a later moment, the observer returns to the observation of system $S_l$, with $l < r$, then he will get a memory configuration of the type $[\cdots\alpha_i^1\cdots\alpha_j^l\cdots\alpha_k^r\,\alpha_j^l]$ for each element of the superposition, with the same value for the last eigenvalue as obtained before in the memory of the observer. This means that the memory states are correlated. It is from this sort of correlation that there appears “to the observer (…) that each initial observation on a system caused this system to “jump” to an eigenstate in a random fashion and thereafter remain there for subsequent measurements on the same system” ([135], p 459).
At this point one finds the crucial step into the “Many Worlds” interpretation: we have only one observer system, but we end up with no definite state for this system, but rather a superposition of such states which are relative to the sequence of observations made. For each set of observations, there is a specific observer state for a specific observed set of states. Thus, after each observation, the observer state “branches” into a number of different states. Each branch represents a different outcome of a set of measurements taken along time and the corresponding set of states for the object-system. All branches exist simultaneously in the superposition after any given sequence of observations. For these branches to exist simultaneously in the superposition requires their objective realization in parallel universes⁶. This whole process is represented in Figure 12-1. It is at this point that the most important intuition hidden in Everett’s interpretation emerges.
Looking at Figure 12-1, we can see that at any vertical line (where an infinite superposition of the same system appears at the same instant of time) we must have the outcome of the whole experiment – that is, whenever we take our statistics over parallel universes we should recover the complete quantum mechanical result. These statistics are, obviously, not within our reach. What is within our reach as observers is one branch, representing statistics taken along time. In Everett’s pedagogical example he takes recourse to an ensemble of identically prepared physical systems. Of course, if Everett’s interpretation is not to suffer from the same criticism imposed on Ballentine’s Interpretation, he must assume that the same example should be uttered using only one physical system on which observations are made over time (at least when it is possible to do so – we have already shown that there are some situations in which the ensemble picture is imposed by the experimental setup). In this case, each branch would represent observation throughout time and would be equivalent to the initial distribution of results throughout the universes. This is simply the statement of the Ergodic Principle – awkwardly uttered, most certainly, but simply that in any case.
⁶ Although the notion of “parallel universes” is not mentioned in Everett’s paper, his note added in proof ([135], p. 459,460) strongly suggests it.
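The branching rule and its ergodic reading can be illustrated numerically. The following is only a minimal sketch, assuming a three-outcome observable with arbitrarily chosen coefficients; the function names (`branch_amplitudes`, `ensemble_weights`, `single_branch_frequencies`) are hypothetical and do not come from Everett's paper or from this book. It builds the amplitude of every memory sequence after r observations as the product of the coefficients, and then compares the relative frequencies along one long, randomly drawn branch with the weights obtained over the whole set of branches.

```python
import itertools
import numpy as np

def branch_amplitudes(a, r):
    """Amplitude a_i a_j ... a_k attached to each memory sequence [i, j, ..., k] of length r."""
    a = np.asarray(a, dtype=complex)
    return {seq: np.prod(a[list(seq)])
            for seq in itertools.product(range(len(a)), repeat=r)}

def ensemble_weights(a):
    """Weights |a_i|^2 of the outcomes of one observation (statistics over all branches)."""
    return np.abs(np.asarray(a, dtype=complex)) ** 2

def single_branch_frequencies(a, r, seed=0):
    """Relative outcome frequencies along one randomly drawn branch with r observations."""
    rng = np.random.default_rng(seed)
    p = ensemble_weights(a)
    outcomes = rng.choice(len(p), size=r, p=p)
    return np.bincount(outcomes, minlength=len(p)) / r

# Illustrative coefficients (an assumption): |a_i|^2 = 0.2, 0.5, 0.3
a = np.array([np.sqrt(0.2), 1j * np.sqrt(0.5), np.sqrt(0.3)])

amps = branch_amplitudes(a, r=3)                      # 27 branches after 3 observations
total_weight = sum(abs(v) ** 2 for v in amps.values())
freqs = single_branch_frequencies(a, r=200_000)

print(len(amps), total_weight)
print(freqs, ensemble_weights(a))
```

The squared amplitudes of all branches add up to one, and the frequencies observed along a single long branch approach the weights taken over the branches, which is the content of the ergodic statement made above.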
12.5.1. Connections with the Present Interpretation
It is fairly obvious that Everett’s approach transfers all the problems related to the reduction of one wave packet (a superposition of states) into a single point of a detector to the “observer side”, by including the observer in his closed system. Now, one has a superposition of observers and, since they cannot be thought of as being reduced, they are supposed to exist in parallel, ever-branching universes. A shortcut to Everett’s approach would be to say that, in the process that seems a “reduction”, it is the very wave packet (as a superposition) that makes the universe(s) branch, with one realization of its possibilities in each one of these ever-branching universes.
Although all this gibberish of Many Worlds seems grotesque from the standpoint of the present interpretation, there are, within Everett’s approach, very subtle and important intuitions that are structurally present in our approach. The main intuition is the one assuming that in Quantum Mechanics we always have one isolated system with two subsystems. The Schrödinger equation, as we usually treat it, is related to one of these two subsystems, while the other is treated in general terms as a sort of reservoir. Thus, the Schrödinger equation refers to a closed system which is one of the subsystems alluded to before. Fluctuations and randomness come directly from this assumption – and the Langevin approach is merely a statement of that. Of course, since the present interpretation has nothing to do with observers, the two subsystems have to be reinterpreted. It is not difficult to see how this can be done.
Consider, for instance, a Gaussian wave packet passing through a potential barrier of strength 𝑉 (extending from −𝑎 to 𝑎 in one dimension). When one simulates the propagation of this wave packet, one finds that when the wave packet is in regions where 𝑉 is not zero, its Gaussian form changes into very intricate functional forms.
However, the potential never changes. This means that the behavior of the field is being considered only as a mean field, that is, as a reservoir, such that the modifications in the particle subsystem (to which the Schrödinger equation applies) do not alter, on average, its state. The field subsystem, as a reservoir, keeps the energy of the particle subsystem at some fixed average value, being subjected, however, to fluctuations. Every single passage of Everett’s work can be rephrased to fit into this approach (although this would be irrelevant). The only difference is that $\varphi^{O}_{[\cdots\alpha_i^1\alpha_j^2\cdots\alpha_k^r]}$ now would refer to the state of the field subsystem that passed through all the “memory” states $[\cdots\alpha_i^1\alpha_j^2\cdots\alpha_k^r]$ related to the occurrence (not measurement) of these values in the corpuscular system in some sequence of time⁷.
Thus, the present interpretation also assumes that the wave function represents a relative state, but in the sense that it refers to a closed system (the corpuscular part of a larger isolated system composed also of a field subsystem, considered as a reservoir). In other words, we regard the Schrödinger equation as applying to mean field situations. As a reservoir, it is the field state that selects the specific corpuscular system state (in the same manner that a thermal reservoir selects the average temperature of its corresponding closed system). Moreover, the present interpretation also regards quantum mechanical systems (at least stationary ones) as governed by the Ergodic Principle, which puts ensembles and single systems on an equal footing.
Although all these notions were worked out in the present interpretation without any consideration of Everett’s approach, it is very interesting to compare the two, even if only in hindsight. Such a comparison can elucidate a number of issues of our own interpretation, while dispensing with unnecessary elements of the rival approach.
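The barrier example mentioned in this section can be reproduced with a few lines of code. What follows is only a minimal sketch, assuming units in which ħ = m = 1, a split-step Fourier scheme (our choice, not necessarily the method used for the simulations of this book), and illustrative values for the barrier height, the packet width and its mean momentum. The point to notice is that the array `V` is fixed once and never updated during the propagation: the potential plays the role of a static mean field while the packet changes its shape.

```python
import numpy as np

def propagate(psi, V, x, dt, steps, hbar=1.0, m=1.0):
    """Split-step Fourier propagation of psi in a static potential V(x)."""
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(x.size, d=dx)          # angular wavenumbers
    half_potential = np.exp(-0.5j * V * dt / hbar)         # half step in V
    kinetic = np.exp(-0.5j * hbar * k**2 * dt / m)         # full kinetic step
    for _ in range(steps):
        psi = half_potential * psi
        psi = np.fft.ifft(kinetic * np.fft.fft(psi))
        psi = half_potential * psi
    return psi

# Grid and barrier of strength V0 extending from -a to a (illustrative values)
x = np.linspace(-100.0, 100.0, 2048, endpoint=False)
dx = x[1] - x[0]
a, V0 = 1.0, 1.0
V = np.where(np.abs(x) <= a, V0, 0.0)                      # never changes in time

# Initial Gaussian packet centered at x0 and moving to the right
x0, k0, sigma = -20.0, 1.5, 3.0
psi0 = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-(x - x0)**2 / (4 * sigma**2) + 1j * k0 * x)

psi = propagate(psi0, V, x, dt=0.01, steps=2500)

norm0 = np.sum(np.abs(psi0)**2) * dx
norm = np.sum(np.abs(psi)**2) * dx
transmitted = np.sum(np.abs(psi[x > a])**2) * dx
reflected = np.sum(np.abs(psi[x < -a])**2) * dx
print(norm0, norm, transmitted, reflected)
```

Plotting `np.abs(psi)**2` at intermediate times (for instance with a smaller `steps`) shows the interference pattern that forms while the packet overlaps the barrier region, after which it separates into a transmitted and a reflected part.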
12.6. CONSISTENT HISTORIES INTERPRETATION: EVENTS
The last interpretation we would like to present is the Consistent Histories Interpretation, which also has interesting relations to our own. To present it, let us begin with a set of important definitions.
Definition 1: (Event) an event is the specification of properties of a system through a projection operator onto the Hilbert subspace representing the property. That is, we make an event $P_{i,j}$ (a proposition $P_i$ made at time $t_j$) to be represented in the formalism by the projection operator $\hat{P}_{i,j}$. In some double slit experiment, for instance, an event may be represented by propositions such as: “the electron passed through aperture one at time $t$”, “the electron passed through aperture two at time $t$”, “the electron was measured by the photographic plate at position $y$ (at time $t$)”, etc. However, it cannot be: “the electron passed through both apertures (at the instant of time $t$)”.
⁷ Note, however, that these values need not be exactly the eigenvalue of the problem, but only this eigenvalue plus some random fluctuation, that is, $\alpha_i^k = \alpha_i \pm \Delta_k\alpha_i$. The eigenvalue would be, thus, the average $\alpha_i = \sum_{k=1}^{n}\alpha_i^k/n$, with $n \to \infty$, this process being realized by a succession of values in time (for a single system) or a list of values obtained from an ensemble of identically prepared systems. Note also that the state, here, is already fixed – as explained in the text in the next paragraph.
Figure 12.1. Everett's branching for the double slit experiment (in which the measured state is simply a position over the photographic plate). Only three systems are shown being measured. Quantum Mechanics is correct if averages over parallel universes correspond to averages taken in the same universe along time, for each branch, after a sufficiently large number of measurements.
Definition 2: (History) a homogeneous history of events $H_i$ is a sequence of any number of events ordered with respect to time. Symbolically, $H_i = (P_{i,1}, P_{i,2}, \cdots, P_{i,n_i})$, which should be read as: the history $H_i$ occurs when the proposition $P_{i,1}$ is true at time $t_1$, proposition $P_{i,2}$ is true at time $t_2$, and so on, until proposition $P_{i,n_i}$ is true at time $t_{n_i}$. Another way to represent histories in the formalism is by using projection operators in the place of propositions. A history is then represented by a projection operator that is the tensor product of the projection operators that constitute its events. In symbols:
$$\hat{H}_i = \hat{P}_{i,1} \otimes \hat{P}_{i,2} \otimes \cdots \otimes \hat{P}_{i,n_i}.$$
From these definitions, we can proceed to the postulates of the Consistent Histories approach (CH).
Postulate 1: Quantum Mechanics refers to isolated systems. An event can be any stage of the evolution of a quantum mechanical system, a measurement, the interaction with the environment, etc. Thus, the process of measurement is included (as any other event) within the scope of the usual formalism based on the Schrödinger equation – which means that there wouldn’t be any reduction of the wave packet (indeed, one may say that one of the major aims of CH is to get rid of these reductions). The main argument of the founders of this interpretation to get rid of any sort of measurement theory is that a fundamental theory, such as Quantum Mechanics, should supply a basis for interpreting experiments, not the other way around.
Postulate 2: the time development of any individual quantum mechanical system is stochastic and involves many histories (this is the superposition principle in history-disguised form).
Of course, the time development of a quantum mechanical system is still given by the unitary evolution operator, but now it is viewed as going from event to event, in a time-ordered sequence of events building up different possible histories – that is, in the account of the concrete evolution of a quantum mechanical system we must take into account not only the actual history, but all possible histories.
Postulate 3: the action of an operator on a wavefunction represents the very act of observation. Moreover, if the operator is diagonal in the chosen representation, then it means that the experimental setup was chosen in such a way that the underlying eigenvalue (the observable entity) can be measured with no dispersion.
Although this last postulate is rarely explicitly assumed in the usual expositions of CH, it is clearly demanded by the way an event may be formed. Indeed, a history may be composed of incompatible properties at different instants of time (such as 𝑥 and 𝑝), but the same time slice can contain only compatible properties. This notion of “compatibility” is a consequence of postulate 3 – and is mathematically expressed by the use of projection operators for events.
With these notions we can now introduce the main construct of CH, which is the notion of consistency. To achieve this, we write the class operator as
$$\hat{C}_{H_i} = T\prod_{j=1}^{n_i}\hat{P}_{i,j},$$
in which $T$ represents an operator that puts all the projectors $\hat{P}_{i,j}$ in a time-ordered sequence. In this case, $\hat{C}_{H_i}|\alpha\rangle$ represents the (event-like) evolution of a state initially represented by $|\alpha\rangle$, assuming properties $\hat{P}_{i,j}$ during the time period $[t_{i,1}, \cdots, t_{i,n_i}]$.
Definition 3: (consistency) a set of histories $\{\hat{H}_i\}$ is strongly consistent if we have $\mathrm{Tr}\{\hat{C}_{H_i}\hat{\rho}\,\hat{C}^{\dagger}_{H_j}\} = 0$, for $i \neq j$, where $\hat{\rho}$ is the density matrix. A set of histories $\{\hat{H}_i\}$ is weakly consistent if we have $\mathrm{Tr}\{\hat{C}_{H_i}\hat{\rho}\,\hat{C}^{\dagger}_{H_j}\} \approx 0$, for $i \neq j$. This definition implies that different consistent histories are orthogonal (strong consistency) or almost orthogonal (weak consistency), in the sense that (for strong consistency)
$$\sum_{\alpha}\langle\alpha_i(t_{i,k})|\alpha_j(t_{j,k})\rangle = 0,$$
for each instant of time $t_k$, where $|\alpha_i(t_{i,k})\rangle = \hat{P}_{i,k}|\alpha\rangle$. We may now define a relation between consistent histories and probabilities.
Postulate 4: The probability associated with a history is given by $Prob(H_i) = \mathrm{Tr}\{\hat{C}_{H_i}\hat{\rho}\,\hat{C}^{\dagger}_{H_i}\}$.
With this association between probability and histories we get the main result of CH: the probability densities come to obey the usual, traditional properties, such as $Prob(H_i \cup H_j) = Prob(H_i) + Prob(H_j) - Prob(H_i \cap H_j)$. Those adhering to this interpretation assume that this is the way one can solve the apparent difficulties about probabilities and superposition of states in Quantum Mechanics.
The Consistent Histories Interpretation profits from insights coming from the notions of quantum decoherence. It is assumed that all irreversible macroscopic phenomena (presumably all classical measurements) render histories automatically consistent. This would allow one to recover classical reasoning and “common sense” when applied to the outcomes of these measurements. According to Roland Omnès [136], “[the] history approach, although it was initially independent of the Copenhagen approach, is in some sense a more elaborate version of it. It has, of course, the advantage
of being more precise, of including classical physics, and of providing an explicit logical framework for indisputable proofs. But, when the Copenhagen interpretation is completed by the modern results about correspondence and decoherence, it essentially amounts to the same physics. [... There are] three main differences:
1. The logical equivalence between an empirical datum, which is a macroscopic phenomenon, and the result of a measurement, which is a quantum property, becomes clearer in the new approach, whereas it remained mostly tacit and questionable in the Copenhagen formulation.
2. There are two apparently distinct notions of probability in the new approach. One is abstract and directed toward logic, whereas the other is empirical and expresses the randomness of measurements. We need to understand their relation and why they coincide with the empirical notion entering into the Copenhagen rules.
3. The main difference lies in the meaning of the reduction rule for ‘wave packet collapse’. In the new approach, the rule is valid but no specific effect on the measured object can be held responsible for it. Decoherence in the measuring device is enough.”
The proponents of CH argue that it overcomes the fundamental disadvantages of the old Copenhagen Interpretation, and can be used as a complete interpretational framework for Quantum Mechanics. Roland Omnès1 [138] argues that CH can be viewed as selecting the sets of classical questions that can be consistently asked of a single quantum system; any other question should be considered fundamentally inconsistent, and thus meaningless. In this context, it is possible to demonstrate that classical, logical reasoning often does apply, even to quantum experiments – but we can now be mathematically exact about the limits of classical logic.
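Definition 3 and Postulate 4 can be checked on a very small example. The sketch below is only illustrative: it assumes a single two-level system, trivial dynamics between the two instants of time (so that the projectors can be multiplied directly), and an initial state |0⟩; the helper names (`projectors`, `class_operator`, `decoherence_matrix`, `is_consistent`) are hypothetical. It builds the matrix Tr{C_i ρ C_j†} for two families of two-time histories and tests whether the off-diagonal terms vanish.

```python
import itertools
import numpy as np

def projectors(basis):
    """Rank-one projectors |b><b| for each column b of an orthonormal basis matrix."""
    return [np.outer(basis[:, n], basis[:, n].conj()) for n in range(basis.shape[1])]

def class_operator(history):
    """Time-ordered product of projectors: the earliest event acts first (rightmost)."""
    C = np.eye(history[0].shape[0], dtype=complex)
    for P in history:                      # history is ordered as t_1 < t_2 < ...
        C = P @ C
    return C

def decoherence_matrix(families, rho):
    """D[i, j] = Tr{C_i rho C_j^dagger} over all histories built from the given families."""
    ops = [class_operator(h) for h in itertools.product(*families)]
    return np.array([[np.trace(Ci @ rho @ Cj.conj().T) for Cj in ops] for Ci in ops])

def is_consistent(D, tol=1e-12):
    """Strong consistency: every off-diagonal element of D vanishes."""
    off_diagonal = D - np.diag(np.diag(D))
    return bool(np.all(np.abs(off_diagonal) < tol))

z_basis = np.eye(2, dtype=complex)
x_basis = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
rho = np.diag([1.0, 0.0]).astype(complex)             # initial state |0><0| (assumption)

# Family 1: z property at t_1 and z property at t_2
D_zz = decoherence_matrix([projectors(z_basis), projectors(z_basis)], rho)
# Family 2: x property at t_1 and z property at t_2
D_xz = decoherence_matrix([projectors(x_basis), projectors(z_basis)], rho)

print(is_consistent(D_zz), np.real(np.diag(D_zz)))
print(is_consistent(D_xz), np.real(np.diag(D_xz)))
```

For the first family the off-diagonal terms vanish and the diagonal gives the probabilities of Postulate 4. For the second family the histories overlap (the off-diagonal terms are ±1/4), so the additivity rule quoted above cannot be applied to them, even though the diagonal terms still add up to one.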
12.6.1. Connections with the Present Interpretation
A very good way to assess CH is to begin by considering its ultimate goal and the reasons used to justify it. The main goal of CH is certainly to find a criterion that allows one to speak of quantum mechanical events in a way that gives the underlying probability densities the usual properties of Classical Mechanics. This means that CH imposes semantic restrictions on utterances that are related to the superposition of events – which introduces a rule for probability composition based on the probability amplitudes, not on probability densities. Thus, CH adheres completely to Bohr’s Complementarity Principle, assuming implicitly that our apparent inability to grasp the quantum mechanical world is mainly due to our insistence on trying to speak about histories that are not consistent (that present some overlap). This adherence is consistent with the fact that CH accepts almost all of the Copenhagen Interpretation, except for its principle of reduction of the wave packet – that is precisely what
¹ Omnès assumes an approach more founded in logical reasoning, while Griffiths [137] uses an approach based on the notion of frameworks.
it tries to reformulate. In fact, we might say that CH gives Bohr’s Complementarity Principle a precise logical framework. This is why, for instance, it is not allowed to include, in the same “time-slice” of an event (assumed) incompatible situations such as “𝑥 is measured without dispersion” and “𝑝 is measured without dispersion”, since CH assumes, such as CI, that one cannot, at some instant of time, measure 𝑥 and 𝑝 (without dispersion). This is related, of course, to the way CI treats Heisenberg’s relations, and is formally introduced in CH by the use of projection operators over the Hilbert space to represent events. However, this looks strange, since CH assumes explicitly that Quantum Mechanics should have a stochastic support – and, thus, no variable should be measured without dispersion. In such a stochastic approach, dispersions are congenial of all properties in any physical system. Moreover, we have already shown that this requirement of dispersionless properties (based on some Complete Set of Commuting Observables) introduces unavoidably the problem of ambiguous operator formation. If one assumes a stochastic support for Quantum Mechanics, the superposition of states does not introduce problems for the interpretation, since it can be treated by an ensemble approach or, for the analysis of single systems, it can be treated by an approach based on the Ergodic Principle, which takes single-systems (within some minimal time window) into ensembles (at some instant of time). In each one of these situations there is no problem at all in interpreting the superposition of states. For the ensemble, it is obvious what should mean a superposition (and Ballentine has already elucidated it). 
For a single stochastic system, the superposition $|a\rangle = \sum_n c_n|a_n\rangle$ means that the underlying physical system has the set $\{|a_n\rangle\}$ of accessible states for some, say, energy; within some time window, each orthonormalized state $|a_n\rangle$ will be accessed by the physical system for approximately a fraction $|c_n|^2$ of the time (which becomes exact in the limit of an infinite observation window). Furthermore, we have shown that our approach treats the way one calculates statistical moments exactly in the same way as in Classical Statistical Physics. Our phase space probability density function incorporates any superposition that may exist in the quantum mechanical problem, as it should do, since superposition is the way by which Quantum Mechanics introduces correlation. As long as we assume this relation between superposition and correlation (a relation that must consider the variable time in an appropriate manner), there is no mystery related to the former, since it is quite clear what is meant by the latter.
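The equivalence between the single-system and the ensemble readings of a superposition can be illustrated with a toy model. The sketch below is not the Langevin dynamics developed in this book: it assumes, purely for illustration, that a single system hops among the accessible states through a simple Markov process (it stays where it is with probability `stay`, otherwise it jumps to a state drawn with weights |c_n|²), and the function names and coefficients are hypothetical. The fraction of time spent in each state along one long trajectory is then compared with the fraction of members found in each state when a large ensemble is inspected at a single instant.

```python
import numpy as np

def time_fractions(c, steps, stay=0.9, seed=0):
    """Fraction of time a single hopping system spends in each state |a_n>."""
    rng = np.random.default_rng(seed)
    p = np.abs(np.asarray(c, dtype=complex)) ** 2
    p = p / p.sum()
    redraw = rng.random(steps) >= stay            # instants at which the system jumps
    proposals = rng.choice(p.size, size=steps, p=p)
    counts = np.zeros(p.size)
    state = 0                                     # arbitrary initial state (transient)
    for t in range(steps):
        if redraw[t]:
            state = proposals[t]
        counts[state] += 1
    return counts / steps

def ensemble_fractions(c, members, seed=1):
    """Fraction of ensemble members found in each state at one instant of time."""
    rng = np.random.default_rng(seed)
    p = np.abs(np.asarray(c, dtype=complex)) ** 2
    p = p / p.sum()
    draws = rng.choice(p.size, size=members, p=p)
    return np.bincount(draws, minlength=p.size) / members

# Illustrative coefficients (an assumption): |c_n|^2 = 0.5, 0.3, 0.2
c = np.array([np.sqrt(0.5), 1j * np.sqrt(0.3), np.sqrt(0.2)])
target = np.abs(c) ** 2

single = time_fractions(c, steps=400_000)
ensemble = ensemble_fractions(c, members=400_000)
print(single, ensemble, target)
```

Both sets of fractions approach |c_n|² as the observation window, or the ensemble, grows; the single system needs a longer window than the ensemble needs members because its successive states are correlated in time.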
12.7. THE PRESENT INTERPRETATION
One may have the impression that it suffices to state that Ballentine’s interpretation of Quantum Mechanics gets its completion with the use of the Ergodic Principle (in statistics). This would be misleading. True enough, the appropriation of the Ergodic Principle by our interpretation, and the consequent extension of Ballentine’s interpretation into what should be considered a complete statistical assessment of Quantum Mechanics, gives the present interpretation soundness. However, one should be wary of the mere utterance of concepts within some interpretation, just to make it palatable.
An interpretation is part of the physics of the theory. Indeed, that part that makes physics what it is: an interpreted calculus. Without an interpretation, one is left with a mere bunch of mathematical formulas. Thus, being part of the physics, an interpretation must be also based on mathematical development, not only on philosophical leanings. One cannot simply state that the Ergodic Principle is valid for Quantum Mechanics (whenever it is meaningful to talk about repeated measurements on single systems). This statement must be accompanied by sound mathematical development. The same must be said about each and every concept pertaining to the quantum framework. It was the aim of this book to follow in every respect this tenet. Each interpretation of some concept was presented with respect to one or more mathematical developments of the theory. True, some concepts did not allow their immediate interpretation at the time they were introduced, since the mathematical elements at hand were considered insufficient to make that much—the approach merely suggested some underlying interpretation. Then, another approach, taking recourse to other concepts, was developed to enlighten the role of that concept within the interpretation of Quantum Mechanics. This is how the present interpretation should be viewed: an interconnected structure of derivations (and concepts) of the quantum mechanical formalism, each derivation sustaining some set of concepts (akin to its formal structure), while suggesting other concepts which are enlightened by another set of derivations. This state of affairs may be synthesized as in Figure 12-2, where we present schematically the relations among the derivations, made throughout this book. This figure also presents the main concepts of each derivation (those for which it can provide an undisputed interpretation) and some concepts to which the derivation only alludes (those for which it can provide only a suggested interpretation). 
Figure 12-2 also presents how each derivation was connected to a set of other derivations to show that they are mathematically identical, while assessing with different strengths semantic domains. Thus, the interconnections of Figure 12-2 can be described in some of its richness in the following way:
The characteristic function derivation was mainly concerned with the derivation of the Schrödinger equation from some axioms that allow the interpretation of the main symbols of Quantum Mechanics by inheritance. Thus, assuming that the derivation was sound, it was possible to undisputably interpret the wavefunction 𝜓 as a probability amplitude. The use of the Liouville equation also allowed us to interpret the Schrödinger equation as referring to ensembles (and only them, at this point). The way by which one should quantize a system in orthogonal coordinates was also clarified. This derivation suggested (very lightly) that the parameter 𝛿𝑥 should be connected to fluctuations;
This characteristic function derivation was then mathematically connected to Feynman’s path integral approach simply by using a Legendre transformation between the momentum and the velocity, usual in the framework of Lagrangian and Hamiltonian Mechanics. Fluctuations and randomness fit much better within this approach, although they are not its conceptual focus, than they fit within the approach related to the characteristic function, because of the formal notion of paths between two points in space. As with the characteristic function derivation, Feynman’s approach was based on some infinitesimal notion of time demanding clarification;
The characteristic function derivation was also connected to the entropy derivation. The central concept of this derivation, the entropy, strengthened the idea that randomness and fluctuations should play some role within Quantum Mechanics, giving the suggested interpretation of 𝛿𝑥 as connected to fluctuations better grounds. Moreover, it introduced the notion of a phase-space probability density, although in a disputable way because of the problem of interpreting the role played by second order expansions. In any case, the entropy derivation and the characteristic function derivation were formally connected by the very expansion of the characteristic function in its statistical moments (defined over configuration space);
The entropy derivation was then mathematically connected to the stochastic derivation, which undisputably interprets Quantum Mechanics as being related to randomness and fluctuations. This interpretation also undisputably fixed the interpretation of Bohm’s “potential” as exclusively related to the stochastic part of the behavior of quantum mechanical systems (thus making Bohm’s dreams of a deterministic theory a non sequitur). The stochastic interpretation was formally connected to the entropy derivation by means of the fluctuation-dissipation theorem. This stochastic approach, together with the entropy derivation, strongly suggested that Quantum Mechanics (as represented by the Schrödinger equation) is related to the corpuscular subsystem, while the field subsystem is treated as a heat bath – this allowed us to interpret stochastic behavior as coming from this separation of subsystems (canonical ensemble approach of Statistical Physics), and not from some external field (macrocanonical approach). The stochastic interpretation also suggested the notion of ergodicity. However, as with all the other interpretations, the stochastic interpretation was also plagued by the issue of taking only expansions up to second order in some parameter;
Then came the Central Limit Theorem (CLT) derivation. It undisputably fixed the notion of second order expansions by referring them to the so-called “infinitesimal processes” that give rise to a Gaussian universality class with respect to statistical behavior. This result shed light on all four previous derivations. The CLT derivation was easily connected to the characteristic function derivation and the entropy derivation. The phase space probability density function to which the entropy derivation alluded was then made sound, and properties of the statistical processes underlying Quantum Mechanics were made explicit (such as the need to consider sums of independent random variables, a main construct of the CLT).
Last, but not least, came the Langevin derivation, which undisputably presented (by simulations) the role played by fluctuation-dissipation processes. This derivation also enlightened us about the necessary interpretations of Bohm’s approach. It also allowed us to show the equivalence, for some quantum mechanical systems, between ensemble and single system approaches (the ergodic principle). This extended the reference of the quantum mechanical formalism to include single systems whenever the ergodic principle is applicable. The Langevin derivation was connected to the characteristic function derivation by considering the very process of taking the characteristic function as a sort of sampling on fibers over phase space. It was also connected to the entropy derivation. Notions such as “transient behavior” and others, akin to the Langevin approach, were also put forward. Moreover, the separation of the physical system into two subsystems (the corpuscular subsystem and the field subsystem), suggested by the stochastic approach, was given undisputable soundness within the Langevin derivation. The Langevin derivation also enlightened us about the stochastic derivation, which does not include those terms related to fluctuation and dissipation, and ends up with just Bohm’s potential. Moreover, the issue about waves and corpuscles was solved as a consequence of the very formalism; it is undisputable that Langevin equations refer to corpuscles, while it was shown that the behavior of corpuscles along some time window can assume the form of a wave (an amplitude of probability that satisfies a wave equation). The difference between Newtonian physics and Classical Physics became apparent in this approach, since Langevin equations are classical in any possible perspective, while not Newtonian. The Newtonian limit of the Langevin equation is undisputably obtained when one assumes that the pair fluctuation-dissipation is irrelevant to the description of the physical system.
That being said, we can now present the main results of the present interpretation: we will be using, for that, the characteristic function/CLT derivation, just as a means to fix the axioms (but already considering all the other results of the book).
Axioms
Axiom 1: The Classical Stochastic Liouville equation is valid for the description of any quantum system described by the Schrödinger equation;
Axiom 2: The characteristic function $Z_n(q, \delta q; t)$ of the random variable $p = \sum_{k=1}^{n} p_k$ can be written as the product $\Psi^{*}\left(q - \frac{\delta q}{2}; t\right)\Psi\left(q + \frac{\delta q}{2}; t\right)$ for any quantum system, with $n \to \infty$, and Quantum Mechanics refers to the universality class defined by the CLT.
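The content of Axiom 2 can be made concrete with a small numerical sketch. It assumes a convention that is not stated in this section, namely that the product in Axiom 2 is the Fourier transform over momentum of the phase-space density, Z(q, δq; t) = ∫ F(q, p; t) exp(i p δq/ħ) dp, so that derivatives of Z at δq = 0 give the local momentum moments. The Gaussian wave packet, the units (ħ = 1) and all function names are illustrative choices, not taken from the text. Under these assumptions, an expansion of Z up to second order in δq is enough to recover the momentum dispersion, and for this packet the product of the position and momentum dispersions comes out as ħ/2.

```python
import cmath
import math

HBAR = 1.0  # assumed units (hbar = 1); an illustrative choice


def psi(q, sigma=1.0, p0=0.5):
    """Gaussian wave packet of width sigma and mean momentum p0 (hypothetical example)."""
    norm = (2.0 * math.pi * sigma**2) ** -0.25
    return norm * cmath.exp(-q**2 / (4.0 * sigma**2) + 1j * p0 * q / HBAR)


def Z(q, dq, sigma=1.0, p0=0.5):
    """Product of Axiom 2: Psi*(q - dq/2) Psi(q + dq/2)."""
    return psi(q - dq / 2.0, sigma, p0).conjugate() * psi(q + dq / 2.0, sigma, p0)


def local_momentum_moments(q, sigma=1.0, p0=0.5, h=1e-3):
    """Density, mean momentum and momentum variance at q.

    Uses central finite differences of Z around dq = 0, under the ASSUMED
    convention Z = integral of F(q, p) exp(i p dq / hbar) dp.
    """
    z0 = Z(q, 0.0, sigma, p0)
    zp = Z(q, h, sigma, p0)
    zm = Z(q, -h, sigma, p0)
    d1 = (zp - zm) / (2.0 * h)        # first derivative in dq
    d2 = (zp - 2.0 * z0 + zm) / h**2  # second derivative in dq
    rho = z0.real                     # Z(q, 0) = |Psi(q)|^2
    mean_p = (-1j * HBAR * d1 / rho).real
    mean_p2 = (-(HBAR**2) * d2 / rho).real
    return rho, mean_p, mean_p2 - mean_p**2


if __name__ == "__main__":
    sigma = 1.0
    rho, mean_p, var_p = local_momentum_moments(0.3, sigma=sigma)
    # For a Gaussian packet the position dispersion equals sigma.
    print("mean p:", mean_p)
    print("dispersion product:", sigma * math.sqrt(var_p), "vs hbar/2 =", HBAR / 2.0)
```

Only the zeroth, first and second derivatives of Z enter the sketch, which mirrors the role the text assigns to second order expansions; a different sign or normalization convention for the transform would change the prefactors but not the structure.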
Most Important Consequences and Their Interpretations
• The primary ontological entities represented in the formalism are corpuscles;
  o This comes necessarily from the axioms of the theory;
• The function Ψ must be interpreted as a probability amplitude over configuration space. It refers, primarily, to ensembles. However, in all cases in which an approach based on single systems is possible, if the Ergodic Principle applies to the physical situation, then the function Ψ also refers to a single system description;
  o This comes immediately from the axioms of the theory, by inheritance;
  o This means that there is no need to have recourse to things like reduction of the wave packet, observers, complementarity;
Figure 12.2. A synthesis of the conceptual net developed in the present book.
• The function Ψ, the solution of a wave equation, is related to the behavior of the primary entities of the formalism, the corpuscles;
  o This means that there is no need to sustain any kind of duality in the present approach, since corpuscles and waves refer to distinct ontological levels (being and behaving);
• The behavior of the primary entities of the formalism, the corpuscles, has a stochastic support;
  o This is guaranteed by the stochastic derivation, equivalent to the other derivations developed by ourselves;
• Whenever one assumes superpositions for any quantum mechanical system, these superpositions mean, in the ensemble perspective, that when we look at some system of the ensemble, there is some probability of finding it in one of the superposed states, with the probability given by the square modulus of the corresponding coefficient. In the single system perspective, this probability is the outcome of the fact that the system occupies each one of the superposed states for a fraction of the time given by the ratio among the square moduli of the coefficients of the expansion representing the superposition;
  o There is no need for a principle such as the reduction of the wave packet. Superpositions are the way Quantum Mechanics represents correlations among statistical states of some physical system;
  o This makes Quantum Mechanics a unique statistical theory, since any correlation among states within it can be found by a general mathematical procedure: the mere sum of the probability amplitudes of the underlying states;
  o Correlations are, by their very nature, “non-local”, but they depend on the system having time to establish them. In the single system approach they are the result of letting the system fluctuate for some time until it overcomes its transient behavior. This is the origin of the “non-local” behavior, which does not imply that Quantum Mechanics is related to any “non-local” potential or force. There is no need, thus, for notions such as “wholeness”, and causality is preserved (although not determinism);
  o It is very important, thus, to keep in mind that the very boundary conditions of the problem can set forth unsuspected correlations. This became clear when we studied the double slit problem by means of the exchange of linear momentum between the incoming corpuscles and the slits. There, the mere existence of the slits (and the underlying boundary conditions) implies the necessary correlation between instances of experiments using only a single incoming particle, a correlation that is expressed in wave-like terms as the superposition of waves coming from the center of the slits. Thus, although instances of the double slit experiment using a single incoming particle are separated (in time) from each other, they remain correlated because of the very plate and its slits;
  o Moreover, the interpretation fully and happily embraces Realism, despite being stochastic, as does any other classical stochastic theory;
• Quantum Mechanics, as related to the Schrödinger equation, refers to an isolated
system which is separated into two subsystems, one related to the corpuscles proper (which is its primary concern) and the other related to the fields;
  o This is guaranteed by the Langevin derivation, since the Langevin equations are, in this case, exactly the outcome of such a consideration;
  o This means that Quantum Mechanics is considered statistically in the canonical ensemble perspective;
  o We must also conclude that Quantum Mechanics is a sort of mean field theory, since the fields, which are connected to the random corpuscular subsystem, are not being considered in their detailed random behavior;
  o This separation is the source of the fluctuations to which the corpuscular subsystem is subjected; the fields, thus, are a sort of thermal bath;
• The measurable quantities of the quantum mechanical formalism are the results of the application of operators to the function Ψ. These quantities are the so-called observables of the theory.
  o Thus, the Complete Set of Commuting Observables is merely a mathematical tool that allows us to get our equations solved for (one of) the best choice(s) of the symmetry axes on Hilbert space for the mathematical problem under consideration;
• There is no dispersionless observable, whether or not it is an eigenvalue of some operator.
  o This comes immediately from the structure of the phase-space probability density function, but it could equally well be considered the obvious result of the stochastic nature of the theory, which would make the existence of any dispersionless quantity strange;
• Heisenberg’s relations define a class of systems that can be properly called quantum mechanical.
  o This implies an intrinsic interpretation of these relations, that is, the dispersions are the outcome of the random behavior of the quantum mechanical systems because of the specific stochastic support to which each state is connected;
  o This is clearly shown by the entropy derivation;
  o There is no need to postulate effects coming from the (external) actions of observers. The theory is completely objective;
  o Heisenberg’s relation for the energy and the time must be interpreted as saying that for physical systems showing greater dispersions (stronger fluctuations), one needs less time for the system to fill all its accessible states, and thus overcome its transient phase;
  o These relations refer to the statistical dispersions of non-commuting observables measured 𝑛 times, not to experimental errors related to one single measurement of a single system (contrary to the suggestion made by Heisenberg’s microscope example).
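The single-system reading of superpositions given in the list above (the system spends in each superposed state a fraction of time set by the square modulus of its coefficient, once the transient has been overcome) can be illustrated with a toy model. The sketch below is not the Langevin simulation of the book: it assumes a hypothetical hopping rule between states (a Metropolis-type chain), chosen only because its stationary occupation probabilities equal the prescribed weights. It compares the time fractions of one long trajectory with the frequencies found in an ensemble of independent systems, which is the kind of agreement the Ergodic Principle refers to.

```python
import random


def time_fractions(weights, n_steps, rng, burn_in=1000):
    """Single system: fraction of time spent in each state after a transient.

    The hopping rule (uniform proposal, Metropolis acceptance) is an
    ASSUMPTION made for illustration; it has `weights` as stationary law.
    """
    n_states = len(weights)
    state = 0
    counts = [0] * n_states
    for step in range(n_steps + burn_in):
        proposal = rng.randrange(n_states)
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        if step >= burn_in:  # discard the transient part of the trajectory
            counts[state] += 1
    return [c / n_steps for c in counts]


def ensemble_fractions(weights, n_systems, rng):
    """Ensemble: fraction of independent systems found in each state."""
    draws = rng.choices(range(len(weights)), weights=weights, k=n_systems)
    return [draws.count(i) / n_systems for i in range(len(weights))]


if __name__ == "__main__":
    coefficients = [0.6, 0.8]                 # hypothetical c_1, c_2
    weights = [c * c for c in coefficients]   # square moduli: 0.36, 0.64
    rng = random.Random(0)
    print("time fractions    :", time_fractions(weights, 200_000, rng))
    print("ensemble fractions:", ensemble_fractions(weights, 200_000, rng))
```

The `burn_in` parameter plays the role of the transient: occupation times are only counted after the trajectory has had time to forget its initial state.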
The list of elements of the interpretation could be extended, but the above items should suffice for a deep understanding of the theory.
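The Newtonian limit and the thermal-bath picture mentioned in this chapter can be sketched numerically. The code below assumes a generic Langevin equation for a harmonic oscillator, dq = (p/m) dt, dp = (−k q − γ p) dt + sqrt(2 γ m k_B T) dW, which is a standard textbook form and not necessarily the specific set of equations used in the Langevin derivation of this book; the parameter names and the integration scheme are likewise illustrative. Setting γ = 0 removes both dissipation and fluctuation and leaves Newton's equation, while γ > 0 drives the time-averaged p²/m of a single trajectory towards k_B T.

```python
import math
import random


def simulate(q0, p0, gamma, temperature, n_steps,
             dt=1e-3, m=1.0, k=1.0, k_b=1.0, seed=0):
    """Integrate an ASSUMED Langevin equation for a harmonic oscillator.

    dq = (p / m) dt
    dp = (-k q - gamma p) dt + sqrt(2 gamma m k_b T) dW

    Returns the final q, the final p and the time average of p^2 / m.
    """
    rng = random.Random(seed)
    q, p = q0, p0
    # Noise amplitude per step; it vanishes together with gamma, so the
    # fluctuation and the dissipation are switched off as a pair.
    noise = math.sqrt(2.0 * gamma * m * k_b * temperature * dt)
    sum_p2 = 0.0
    for _ in range(n_steps):
        # Update p first, then q with the new p (semi-implicit Euler step).
        p += (-k * q - gamma * p) * dt + noise * rng.gauss(0.0, 1.0)
        q += (p / m) * dt
        sum_p2 += p * p
    return q, p, sum_p2 / (n_steps * m)


if __name__ == "__main__":
    # Newtonian limit: gamma = 0 gives q(t) = q0 cos(t) for m = k = 1.
    q, p, _ = simulate(1.0, 0.0, gamma=0.0, temperature=0.0, n_steps=6283)
    print("Newtonian limit, q(t = 6.283):", q, "exact:", math.cos(6.283))

    # Coupled to the bath: the time average of p^2/m approaches k_b T.
    _, _, kinetic = simulate(1.0, 0.0, gamma=1.0, temperature=1.0,
                             n_steps=200_000, dt=1e-2)
    print("time-averaged p^2/m:", kinetic, "bath value k_b T = 1.0")
```

The first call shows a deterministic trajectory, the second a single fluctuating system whose long-time average reflects the bath, in line with the canonical ensemble reading given in the list above.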
12.8. CLOSING REMARKS ON PHYSICS, WEIRDNESS AND GARBAGE COLLECTORS
Despite the title, this book is not about the principles of Quantum Mechanics, nor is it about new perspectives for old problems. It is not even about extensions of Quantum Mechanics… This book’s aim is just the interpretation of the formal apparatus of Quantum Mechanics, so that it can turn itself into a true physical theory. Principles, new perspectives and extensions are just the way we found to back up the interpretation, to give it the soundness a physical theory must have.
It is expected that a new theory presents many unresolved issues, mainly with respect to the interpretation of its main constructs. From this point of view, Quantum Mechanics is not different from Newtonian Mechanics, which was the final result of a struggle that lasted for thousands of years, nor is it different from Electromagnetic Theory, which needed a century to stabilize its own worldview. The Special Theory of Relativity is another example of a bunch of hypotheses and wild guesses that remained in search of an interpretation for some fifty years until the work of Einstein in 1905.
What makes Quantum Mechanics unique with respect to all those theories is the fact that, for Quantum Mechanics, the formalism came much earlier than its axiomatization and also its interpretation. Just to cite an example, the Schrödinger equation came before the interpretation of its main symbol, the probability amplitude, and during almost the whole twentieth century there remained doubts about its referents (single systems or ensembles). On the other hand, when Newton set forth his axioms for Mechanics, the referents of its symbols were known well enough, at least in the way physicists “know” constructs such as forces, energies, masses, etc. The same could be said about Electromagnetic Theory for constructs such as fields, for instance, that took almost half a century to become mature before they were used in Maxwell’s axiomatization.
In fact, the history of these great areas of physics is, in some sense, the history of such constructs, and the axiomatization of the underlying theory is its final completion (from which many different consequences could be extracted). Quantum Mechanics began with an equation (the Schrödinger equation) whose symbols had unknown referents (the probability amplitude, dispersions, entanglement, etc.). The twentieth century then came to provide these symbols with some history and to give the theory its completion and identity. However, even in its earlier stages Quantum Mechanics began to live with a new Zeitgeist for which the weird, the strange and, ultimately, the incomprehensible became an acceptable element of a theory. Who among us, as students of Quantum Mechanics, has not heard from our teachers the advice not to try to fully understand Quantum Mechanics, for it will be opaque to such an endeavor? Bohr’s Complementarity Principle, for instance, is not a principle of Physics; it is a principle of Linguistics, one that tries to regulate our way of thinking and questioning Nature given some interpretation of the Uncertainty Relations, which were thought of as setting limits to our knowledge of the elements of reality. The Coherent Histories Interpretation goes all along these lines. Reality and Objectivity themselves were expunged from Physics (at least in many physicists’ parlance, since in the laboratories these concepts kept their pace unaltered, not
seldom in the attitudes of the same physicists). These two concepts, indeed, live uneasily in the company of the weird. In any case, that thing that moved our predecessors to keep fighting for clarity and understanding, the horror insoliti, has faded away and in its place, at least with respect to interpretation issues, came laziness. Why not? If one got too abstruse an interpretation of some phenomenon, too abstruse even for the initiated, one always had recourse to observers, duality, Complementarity, reductions, etc. to play with. If none of these sufficed, there was always the recourse to the “natural” weirdness of Quantum Mechanics.
Historically, this leniency with the weird comes with Heisenberg’s argument on an assumed weirdness in the field of the Special Theory of Relativity. The assumption is misplaced for the reason already mentioned: the latter has always had a control of its constructs and referents incomparably greater than Quantum Mechanics. No one has ever heard the utterance from a good physicist that “nobody understands Relativity”, while it is generally accepted that “nobody understands Quantum Mechanics”. Indeed:
“The difficulty really is psychological and exists in the perpetual torment that results from your saying to yourself, ‘But how can it be like that?’ which is a reflection of uncontrolled but utterly vain desire to see it in terms of something familiar. I will not describe it in terms of an analogy with something familiar; I will simply describe it. There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time. There might have been a time when only one man did, because he was the only guy who caught on, before he wrote his paper. But after people read the paper a lot of people understood the theory of relativity in some way or other, certainly more than twelve. On the other hand, I think I can safely say that nobody understands quantum mechanics.
So do not take the lecture too seriously, feeling that you really have to understand in terms of some model what I am going to describe, but just relax and enjoy it. I am going to tell you what nature behaves like. If you will simply admit that maybe she does behave like this, you will find her a delightful, entrancing thing. Do not keep saying to yourself, if you can possibly avoid it, ‘But how can it be like that?’ because you will get ‘down the drain’, into a blind alley from which nobody has escaped. Nobody knows how it can be like that.” (Richard P. Feynman, The Messenger Lectures, 1964, MIT)
After the Solvay Conference, many physicists rose against this posture. They are cited throughout this book, and to them I offer it as an encomium. Quantum Mechanics has received many ingenious interpretations since its earliest days. Now, with respect to its interpretation, it is not time for ingeniousness anymore. The interpretation of Quantum Mechanics is in need of garbage collectors. To me, Physics is inseparable from the “perpetual torment” mentioned by Feynman (I would call it a structural torment and, thus, perpetual). This book is an answer to one such torment. If you have reached this point, there is a good probability that you are a tormented physicist too, possibly by the same ghosts. I sincerely hope this book can serve the tormented as it has served me.
AUTHOR’S CONTACT INFORMATION
Olavo Leopoldino da Silva Filho
Institute of Physics, University of Brasília, Brasília, Brazil
Email: [email protected]
INDEX A atomism, x, 3, 4, 5, 7, 9, 10 chemical, 4, 9, 10, 11, 13 physical, ix, x, xi, 4, 5, 7, 9, 11, 13, 15, 18, 19, 21, 22, 24, 27, 29, 30, 31, 32, 33, 36, 38, 51, 53, 54, 57, 61, 66, 67, 71, 72, 81, 98, 104, 105, 114, 117, 118, 123, 125, 138, 141, 142, 144, 147, 148, 151, 156, 157, 161, 162, 163, 165, 168, 171, 181, 183, 187, 188, 203, 205, 208, 213, 221, 227, 229, 230, 231, 232, 233, 237, 240, 241, 247, 248, 250, 252, 253, 261, 263, 265, 266, 267, 269, 271, 275, 283, 285, 294, 306, 307, 311, 313, 317, 318, 319, 320, 321, 322, 323, 326, 327, 328, 329, 330, 331, 332, 333, 335, 342, 345, 347, 348, 349 average momentum, 76, 85, 132, 150, 155, 239, 267, 328 average momentum fluctuation term, 328 average values, 65, 82, 83, 95, 123, 146, 157, 161, 225, 226, 227, 231, 236, 237, 245, 255, 286, 323, 326, 329 calculation, xiii, 12, 15, 22, 42, 68, 82, 89, 106, 112, 144, 170, 174, 211, 220, 225, 227, 234, 258, 305 axiomatization, 33, 34, 349 of a physical theory, 7, 33, 271
particle, xv, xvii, xviii, xix, 3, 13, 21, 24, 25, 26, 28, 29, 31, 32, 37, 39, 47, 49, 51, 52, 53, 54, 55, 57, 59, 61, 63, 87, 88, 93, 94, 97, 98, 104, 106, 113, 114, 117, 120, 121, 122, 124, 139, 140, 142, 147, 148, 152, 162, 164, 166, 181, 182, 183, 187, 188, 193, 195, 196, 198, 202, 203, 204, 205, 206, 211, 212, 216, 217, 218, 219, 220, 222, 240, 243, 249, 256, 257, 258, 261, 262, 263, 267, 269, 278, 286, 291, 294, 295, 298, 299, 300, 302, 303, 304, 305, 306, 307, 308, 313, 314, 324, 327, 328, 336, 347 being and behaving, 327, 328, 329, 347 Bell's inequalities, 223, 253, 322 Bohm's equation, 146, 147, 168, 244, 271, 325, 327 Bohmian equation, 150, 157, 170, 248, 260, 267, 268, 326, 331, 332 Bohmian Mechanics, vi, 266, 267, 269, 332 Bohmian trajectories, xvii, 168, 169, 170, 267 Bohr-Sommerfeld rules, 33, 48, 49, 50, 61, 104, 211 Boltzmann's weight factor, 277, 278 Bragg-Laue diffraction, xv, 31, 50, 51, 52, 53, 55, 57, 61, 113 of particles, xi, xv, 7, 25, 26, 47, 50, 51, 52, 53, 55, 59, 113, 114, 117, 118, 123, 148, 220, 276, 277, 280, 295, 308, 324, 325, 327, 328, 332 Brownian movement, 13, 332
C B behavior, xi, xvi, xvii, xviii, xix, 7, 8, 14, 21, 25, 26, 30, 31, 32, 47, 51, 53, 54, 60, 78, 88, 93, 94, 95, 103, 104, 106, 109, 110, 111, 117, 118, 120, 121, 123, 124, 140, 141, 143, 146, 147, 151, 157, 161, 163, 164, 167, 170, 172, 179, 182, 183, 184, 187, 188, 189, 195, 200, 205, 220, 225, 243, 244, 247, 261, 266, 267, 268, 269, 270, 271, 272, 279, 286, 293, 297, 302, 305, 318, 319, 324, 325, 326, 327, 329, 332, 333, 336, 344, 345, 347, 348
Central Limit Theorem, vi, xi, 34, 77, 81, 103, 124, 125, 126, 135, 149, 152, 153, 156, 177, 239, 243, 244, 252, 289, 344 characteristic function, xi, xvi, 34, 35, 37, 38, 39, 41, 42, 45, 48, 65, 66, 69, 71, 72, 73, 77, 80, 81, 103, 111, 117, 119, 126, 129, 131, 133, 134, 135, 136, 138, 144, 145, 146, 148, 149, 150, 151, 152, 153, 154, 156, 193, 213, 215, 227, 239, 286, 287, 296, 343, 344, 345
360
Index
derivation, x, xi, xii, 34, 36, 37, 39, 43, 50, 61, 62, 65, 66, 69, 71, 72, 73, 75, 78, 80, 84, 92, 93, 94, 96, 98, 100, 101, 103, 111, 112, 117, 119, 123, 126, 127, 131, 132, 135, 136, 137, 138, 143, 146, 147, 148, 149, 150, 153, 154, 156, 213, 215, 227, 239, 244, 246, 251, 255, 260, 266, 267, 271, 272, 286, 287, 295, 307, 312, 326, 331, 343, 344, 345, 347, 348 general relativistic, xii, 272, 286, 295, 296, 298, 299, 312, 313 in generalized coordinates, 73, 76 in spherical coordinates, 41, 42, 43, 178 inversion, 94, 95, 96, 108, 131, 134, 150, 155, 159, 182, 227, 304 relativistic, 33, 34, 41, 112, 123, 186, 197, 254, 264, 269, 271, 283, 285, 286, 287, 290, 291, 292, 293, 294, 298, 300, 303, 304, 305, 311, 312, 313, 318 classical, xi, xix, 15, 16, 17, 18, 19, 21, 22, 23, 26, 27, 28, 29, 32, 34, 39, 40, 41, 62, 63, 65, 66, 74, 82, 90, 93, 94, 95, 110, 113, 114, 122, 124, 136, 140, 141, 147, 159, 162, 166, 167, 172, 174, 177, 178, 179, 182, 185, 186, 187, 188, 193, 194, 197, 199, 200, 205, 206, 211, 215, 218, 219, 226, 227, 236, 245, 248, 249, 251, 252, 253, 262, 264, 265, 270, 271, 273, 274, 275, 277, 278, 280, 281, 283, 303, 313, 315, 321, 322, 325, 340, 341, 345, 347 concepts, ix, 10, 17, 22, 23, 24, 26, 27, 29, 30, 32, 39, 65, 109, 140, 147, 254, 271, 274, 275, 276, 285, 291, 317, 323, 327, 328, 329, 330, 331, 332, 342, 343, 349 classical limit, xi, 110, 185 complementarity, 17, 30, 314, 320, 329, 341, 349, 350 complete set of commuting operators, 200, 319, 321, 322 conceptual abyss between Quantum and Classical Physics, 193 corpuscular, xi, 3, 5, 6, 7, 14, 16, 25, 27, 29, 31, 39, 47, 50, 51, 52, 53, 54, 55, 59, 60, 61, 88, 94, 103, 104, 107, 118, 119, 122, 123, 124, 140, 147, 149, 157, 161, 182, 187, 188, 240, 248, 272, 313, 317, 318, 320, 324, 325, 327, 328, 331, 332, 337, 344, 345, 348 interpretation of light, 3 phenomena, xi, 4, 5, 6, 11, 12, 14, 15, 16, 18, 21, 26, 29, 31, 32, 39, 47, 54, 60, 61, 62, 63, 65, 77, 103, 107, 109, 114, 124, 137, 140, 141, 144, 
145, 149, 171, 187, 189, 226, 227, 228, 315, 318, 329, 340 corpuscular models, 54, 147 corpuscular nature, 14, 25, 31, 51, 52, 88, 147, 317, 329
behaving in time, 88 corpuscular natures, 147 corpuscular quantum systems, 94
D Density-Functional Theory, 90, 91, 249, 353 derivation, x, xi, xii, 34, 36, 37, 39, 43, 50, 61, 62, 65, 66, 69, 71, 72, 73, 75, 78, 80, 84, 92, 93, 94, 96, 98, 100, 101, 103, 111, 112, 117, 119, 123, 126, 127, 131, 132, 135, 136, 137, 138, 143, 146, 147, 148, 149, 150, 153, 154, 156, 213, 215, 227, 239, 244, 246, 251, 255, 260, 266, 267, 271, 272, 286, 287, 295, 307, 312, 326, 331, 343, 344, 345, 347, 348 method, x, 4, 21, 22, 43, 54, 57, 62, 65, 67, 80, 151, 153, 163, 193, 211, 213, 214, 237, 239, 241, 242, 248, 286, 287, 289, 293, 298, 326 deterministic, 15, 87, 88, 144, 150, 253, 254, 255, 259, 260, 264, 266, 267, 268, 269, 271, 272, 325, 326, 329, 333, 344 Bohmian equations, 248, 260, 267, 268, 326, 332 differences between behavior and nature, 25 diffraction, xv, 6, 14, 26, 31, 39, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 103, 104, 106, 113 Bragg-Laue, xv, 31, 50, 51, 52, 53, 55, 57, 61, 103, 113 by an aperture, 54, 55, 57, 58, 60, 103, 113 pattern, xv, 25, 31, 47, 50, 51, 52, 53, 55, 57, 58, 59, 60, 87, 104, 118, 139, 140, 218, 219, 332 discrete momentum transfer, xv, 51, 56 dispersionless measurements, 259, 321, 323 distinguishable, 275, 276, 278 distribution function, xii, xiii, 76, 77, 81, 84, 103, 127, 153, 159, 227, 229, 230, 231, 232, 238, 240, 249, 251, 252, 273 Boltzmann's, 273, 274, 278, 279, 280 Bose-Einstein's, 273, 274, 278, 279, 280 Fermi-Dirac's, 273, 274, 278, 279, 280 double slit interference, xv, 54, 59, 60 Drei-Männer-Arbeit, 20 duality, x, xi, 3, 27, 28, 29, 30, 32, 33, 39, 53, 107, 139, 140, 141, 147, 318, 325, 328, 329, 347, 350 ontological, 14, 25, 29, 30, 31, 32, 51, 53, 54, 60, 88, 106, 114, 123, 139, 140, 141, 142, 144, 187, 275, 280, 281, 313, 318, 319, 328, 333 dynamic equation, 118, 238 Kramers-Moyal, 238 dynamical equation, vi, 146, 229, 238, 239, 248, 270 Wigner's, 238
E eigenvalues of operators, 329 electromagnetic theory, 3, 4, 7, 11, 14, 119, 271 electromagnetism, 15, 36 energy correlation function, 107 energy functional, 249 ensemble, xv, xvi, xvii, 13, 25, 47, 51, 53, 54, 88, 104, 105, 106, 107, 108, 109, 112, 113, 114, 116, 117, 119, 123, 124, 138, 139, 140, 147, 152, 159, 160, 161, 162, 163, 165, 166, 168, 180, 188, 248, 257, 318, 319, 327, 328, 329, 330, 331, 332, 335, 337, 342, 344, 347, 348 average, xv, 12, 13, 65, 66, 76, 82, 83, 85, 86, 88, 92, 95, 104, 105, 106, 107, 108, 116, 117, 118, 123, 125, 132, 133, 146, 148, 150, 151, 155, 157, 158, 159, 161, 171, 172, 174, 185, 205, 218, 220, 222, 225, 226, 227, 231, 232, 233, 235, 236, 237, 239, 245, 246, 247, 248, 254, 255, 257, 260, 266, 267, 269, 280, 286, 323, 326, 327, 328, 329, 331, 332, 336, 337 interpretation, ix, x, xi, xii, xix, 3, 4, 5, 10, 13, 14, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 37, 38, 39, 53, 54, 57, 59, 60, 61, 62, 63, 64, 65, 85, 87, 88, 98, 106, 107, 109, 112, 113, 114, 116, 117, 119, 120, 122, 123, 134, 135, 137, 138, 139, 140, 141, 142, 143, 149, 157, 168, 182, 185, 187, 188, 189, 203, 205, 206, 221, 222, 223, 226, 227, 228, 231, 236, 240, 248, 254, 256, 270, 272, 279, 280, 283, 285, 294, 297, 302, 308, 312, 313, 315, 317, 318, 319, 320, 321, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 339, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350, 352 simulations, xi, xvi, xvii, 88, 107, 148, 149, 156, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 172, 173, 174, 175, 177, 178, 180, 181, 188, 331, 332, 344 entropy, xi, 13, 65, 67, 68, 73, 75, 91, 92, 93, 96, 101, 103, 114, 117, 125, 135, 137, 150, 154, 244, 246, 249, 250, 260, 267, 277, 279, 330, 344, 345, 348 derivation, x, xi, xii, 34, 36, 37, 39, 43, 50, 61, 62, 65, 66, 69, 71, 72, 73, 75, 78, 80, 84, 92, 93, 94, 96, 98, 100, 101, 103, 111, 112, 117, 119, 123, 126, 127, 131, 132, 135, 136, 137, 138, 143, 146, 147, 148, 149, 150, 153, 154, 
156, 213, 215, 227, 239, 244, 246, 251, 255, 260, 266, 267, 271, 272, 286, 287, 295, 307, 312, 326, 331, 343, 344, 345, 347, 348 epistemic blocking, 31, 32, 39, 54, 60, 103, 105
equation, x, xi, 13, 16, 21, 26, 33, 34, 35, 36, 37, 39, 40, 43, 45, 46, 47, 52, 54, 57, 60, 66, 67, 70, 71, 73, 74, 75, 77, 78, 79, 80, 82, 85, 87, 88, 92, 93, 94, 95, 96, 97, 98, 100, 101, 102, 103, 106, 111, 112, 117, 119, 120, 123, 136, 142, 143, 145, 146, 147, 148, 149, 150, 151, 156, 157, 159, 168, 170, 171, 172, 174, 175, 182, 193, 194, 200, 201, 202, 206, 207, 208, 210, 211, 213, 216, 217, 219, 220, 227, 229, 230, 236, 237, 238, 239, 244, 250, 251, 258, 260, 265, 267, 270, 271, 272, 286, 287, 289, 290, 291, 293, 294, 296, 297, 298, 300, 301, 303, 304, 305, 306, 307, 308, 310, 312, 318, 319, 320, 324, 325, 326, 327, 331, 332, 336, 343, 344, 345, 347, 349 continuity, 35, 36, 66, 73, 74, 80, 100, 280, 325 corpuscular, xi, 3, 5, 6, 7, 14, 16, 25, 27, 29, 31, 39, 47, 50, 51, 52, 53, 54, 55, 59, 60, 61, 88, 94, 103, 104, 107, 118, 119, 122, 123, 124, 140, 147, 149, 157, 161, 182, 187, 188, 240, 248, 272, 313, 317, 318, 320, 324, 325, 327, 328, 331, 332, 337, 344, 345, 348 Einstein's, 297, 298, 300, 301 Hamilton-Jacobi, 267, 294, 324, 325 Klein-Gordon, 112, 123, 285, 287, 291, 293, 298, 303, 305, 312 Liouville, xv, 34, 38, 39, 40, 41, 42, 44, 47, 66, 73, 74, 77, 78, 79, 80, 92, 110, 111, 135, 139, 145, 146, 149, 214, 238, 343 Madelung, 36, 37, 100 Pauli's, 194, 217 Schrödinger, vi, ix, x, xi, 4, 16, 20, 21, 22, 23, 24, 25, 26, 28, 30, 33, 34, 36, 37, 39, 40, 43, 45, 52, 57, 61, 63, 65, 66, 68, 69, 75, 77, 80, 81, 82, 83, 85, 88, 92, 93, 94, 98, 99, 100, 101, 104, 106, 107, 110, 111, 114, 116, 117, 118, 119, 120, 123, 125, 126, 131, 135, 136, 137, 142, 143, 144, 145, 146, 147, 148, 149, 156, 160, 170, 175, 187, 193, 200, 210, 211, 213, 225, 227, 237, 239, 244, 246, 247, 248, 249, 250, 251, 252, 259, 260, 266, 267, 269, 270, 272, 286, 290, 291, 296, 298, 312, 317, 318, 319, 320, 324, 325, 326, 327, 328, 329, 331, 332, 333, 334, 336, 337, 339, 343, 345, 347, 349 second order Dirac, 291, 303, 308, 312 ergodic assumption, 104, 105, 106, 107, 109, 112, 113, 115, 119, 
138, 139, 165, 171 ergodic principle, 88, 344 ergodicity, xi, 330, 331, 344 Ewald sphere, xv, 55, 58 experiment black body, 14, 15, 318
Compton, 14, 15, 16, 19, 25, 26, 31, 54, 60, 318, 352 diffraction, xv, 6, 14, 26, 31, 39, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 103, 104, 106, 113 interference, 5, 6, 14, 26, 31, 39, 47, 59, 60, 61, 104, 106, 110, 113, 124, 139, 140, 148, 161, 320 photoelectric, 14, 15, 54, 60, 318 Stern-Gerlach, vi, xviii, xix, 194, 199, 215, 216, 218, 219, 221, 222, 256, 257, 258, 260, 262, 263, 264 wave-like, 31, 39, 271 experiment(s), x, xi, xv, xviii, xx, 3, 4, 6, 7, 9, 12, 14, 17, 19, 20, 23, 27, 28, 29, 31, 39, 47, 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 113, 115, 139, 140, 142, 171, 187, 194, 199, 215, 220, 221, 222, 244, 252, 256, 257, 260, 265, 272, 314, 320, 323, 327, 331, 335, 337, 338, 339, 341, 347 wavelike, 14, 31, 47, 50, 52, 54, 60, 104, 139, 142, 188, 285, 320 explanation, 5, 6, 8, 12, 14, 16, 24, 28, 29, 31, 39, 47, 51, 57, 59, 103, 104, 118, 121, 123, 143, 149 inheritance, 35, 39, 283, 292, 326, 331, 345 extensions of the theory, 283 extrinsic, 141, 142, 143, 187, 281 interpretation, ix, x, xi, xii, xix, 3, 4, 5, 10, 13, 14, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 37, 38, 39, 53, 54, 57, 59, 60, 61, 62, 63, 64, 65, 85, 87, 88, 98, 106, 107, 109, 112, 113, 114, 116, 117, 119, 120, 122, 123, 134, 135, 137, 138, 139, 140, 141, 142, 143, 149, 157, 168, 182, 185, 187, 188, 189, 203, 205, 206, 221, 222, 223, 226, 227, 228, 231, 236, 240, 248, 254, 256, 270, 272, 279, 280, 283, 285, 294, 297, 302, 308, 312, 313, 315, 317, 318, 319, 320, 321, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 339, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350, 352 property, 101, 105, 109, 126, 128, 141, 146, 228, 229, 230, 233, 234, 239, 240, 241, 242, 248, 250, 256, 258, 259, 268, 271, 274, 275, 276, 277, 278, 280, 281, 303, 314, 319, 320, 326, 327, 329, 330, 337, 341
F Feshbach-Villars decomposition, 305, 306, 308 field, xviii, xix, 4, 7, 8, 11, 12, 13, 14, 20, 22, 30, 46, 61, 90, 93, 98, 117, 118, 119, 121, 122, 147, 148, 151, 180, 181, 187, 188, 189, 196, 212, 215, 216, 217, 218, 219, 220, 235, 254, 270, 272, 283, 286, 287, 294, 295, 297, 298, 299, 300, 302, 305, 308, 313, 317, 325, 332, 336, 337, 344, 345, 350 profiles, xv, xvi, xvii, 85, 86, 162, 163, 167, 170, 180, 181 fluctuation-dissipation process, 331, 332, 344 Fluctuation-Dissipation Theorem, 65, 93, 101, 168, 171, 182, 183, 243, 267, 327 fluctuation(s), xi, xiii, xv, xvi, xvii, xix, 22, 32, 36, 62, 64, 65, 67, 74, 76, 78, 79, 85, 86, 87, 88, 92, 93, 96, 98, 101, 102, 106, 108, 109, 114, 115, 116, 118, 120, 121, 122, 123, 124, 138, 139, 140, 141, 143, 144, 146, 147, 149, 150, 151, 158, 159, 161, 162, 163, 165, 168, 169, 171, 177, 180, 181, 182, 183, 185, 186, 187, 188, 189, 243, 260, 267, 269, 272, 294, 295, 297, 303, 313, 327, 328, 330, 331, 332, 333, 336, 337, 343, 344, 348
G garbage collectors, 350 Gibbs' paradox, 277, 278
H half-integral spin, xiii, xiv, 112, 193, 194, 195, 196, 197, 200, 202, 203, 204, 205, 206, 208, 210, 211, 212, 215, 308, 309, 311 eigenfunctions, 193, 200, 259, 334 particle, xv, xvii, xviii, xix, 3, 13, 21, 24, 25, 26, 28, 29, 31, 32, 37, 39, 47, 49, 51, 52, 53, 54, 55, 57, 59, 61, 63, 87, 88, 93, 94, 97, 98, 104, 106, 113, 114, 117, 120, 121, 122, 124, 139, 140, 142, 147, 148, 152, 162, 164, 166, 181, 182, 183, 187, 188, 193, 195, 196, 198, 202, 203, 204, 205, 206, 211, 212, 216, 217, 218, 219, 220, 222, 240, 243, 249, 256, 257, 258, 261, 262, 263, 267, 269, 278, 286, 291, 294, 295, 298, 299, 300, 302, 303, 304, 305, 306, 307, 308, 313, 314, 324, 327, 328, 336, 347 Heisenberg's relations, 30, 139, 141, 143, 187, 319, 327, 329, 342, 348 hidden variables, ix, 260, 265, 324, 325 hidden variables theories, 253 H-theorem, 92
I indistinguishable objects, 275, 277 individual systems, 114, 319, 320, 323, 330
infinitesimal, 34, 35, 40, 41, 43, 45, 47, 49, 62, 63, 70, 76, 77, 80, 100, 124, 125, 126, 130, 134, 136, 145, 146, 171, 344 character, 6, 15, 16, 17, 32, 35, 37, 52, 60, 76, 77, 105, 109, 115, 116, 119, 135, 136, 138, 140, 144, 145, 159, 182, 227, 252, 254, 256, 261, 293, 317, 326, 330 coordinates, x, xvi, 39, 40, 41, 42, 43, 44, 45, 46, 73, 89, 117, 152, 162, 163, 177, 178, 180, 200, 212, 243, 300, 301, 343 displacement, 48, 118 elements, 10, 11, 22, 24, 33, 41, 43, 73, 92, 131, 142, 189, 195, 208, 218, 254, 313, 317, 320, 337, 343, 348, 349 Fourier transformation, 34, 40, 48, 225 in time, 3, 62, 105, 107, 116, 152, 188, 337, 347 notion of, vi, xi, 3, 4, 7, 8, 10, 12, 13, 15, 19, 21, 23, 31, 32, 51, 54, 61, 65, 75, 88, 104, 110, 111, 117, 120, 121, 125, 134, 135, 138, 140, 141, 142, 149, 187, 188, 193, 199, 212, 225, 228, 241, 248, 253, 260, 266, 269, 277, 280, 292, 303, 313, 314, 318, 319, 322, 326, 328, 329, 330, 331, 334, 335, 339, 340, 341, 344 time, xv, xvi, xix, xx, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 26, 28, 29, 30, 31, 32, 47, 48, 51, 53, 54, 55, 56, 57, 59, 60, 62, 63, 72, 73, 74, 77, 88, 90, 94, 95, 96, 97, 98, 99, 104, 105, 106, 107, 108, 109, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122, 123, 138, 139, 140, 141, 143, 147, 151, 152, 155, 159, 162, 164, 166, 167, 168, 171, 172, 180, 184, 189, 208, 217, 219, 220, 228, 229, 247, 248, 254, 257, 269, 270, 271, 272, 280, 290, 291, 292, 293, 294, 296, 297, 299, 300, 302, 304, 305, 313, 317, 318, 319, 322, 323, 325, 326, 327, 328, 330, 331, 334, 335, 336, 337, 338, 339, 340, 342, 343, 344, 345, 347, 348, 350 interference, 5, 6, 14, 26, 31, 39, 47, 59, 60, 61, 104, 106, 110, 113, 124, 139, 140, 148, 161, 320 interpretation, ix, x, xi, xii, xix, 3, 4, 5, 10, 13, 14, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 37, 38, 39, 53, 54, 57, 59, 60, 61, 62, 63, 64, 65, 85, 87, 88, 98, 106, 107, 109, 112, 113, 114, 116, 117, 119, 120, 122, 123, 134, 
135, 137, 138, 139, 140, 141, 142, 143, 149, 157, 168, 182, 185, 187, 188, 189, 203, 205, 206, 221, 222, 223, 226, 227, 228, 231, 236, 240, 248, 254, 256, 270, 272, 279, 280, 283, 285, 294, 297, 302, 308, 312, 313, 315, 317, 318, 319, 320, 321, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 339, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350, 352
coherent histories, 337, 349 Copenhagen, vii, ix, 18, 23, 24, 26, 27, 28, 256, 317, 318, 321, 322, 328, 329, 333, 334, 340, 341 David Bohm’s, vii, 324 kinematic, 19, 20, 27, 39 monist, 14, 54, 103, 107, 114 of light, 3, 4, 5, 6, 7, 8, 9, 13, 14, 16, 17, 25, 270, 317 probabilistic, 25, 253, 254 relative state, vii, 333 statistical, vii, 13, 24, 254, 267, 325, 328, 329, 330, 331, 332, 342, 344, 352, 353, 354, 355 stochastic, xi, xvii, 64, 77, 78, 80, 81, 85, 92, 93, 94, 95, 96, 97, 98, 101, 103, 106, 109, 110, 111, 117, 118, 120, 122, 123, 125, 126, 135, 136, 137, 138, 139, 143, 144, 145, 146, 147, 148, 149, 152, 157, 159, 161, 164, 171, 172, 173, 177, 180, 187, 239, 240, 242, 254, 266, 267, 283, 286, 297, 298, 326, 327, 328, 331, 332, 339, 342, 344, 345, 347, 348 visualizable, xviii, 24, 28, 32, 51, 54, 57, 60, 113, 148, 204 intrinsic, 5, 141, 143, 182, 183, 187, 198, 320, 329, 332, 348 interpretation, ix, x, xi, xii, xix, 3, 4, 5, 10, 13, 14, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 37, 38, 39, 53, 54, 57, 59, 60, 61, 62, 63, 64, 65, 85, 87, 88, 98, 106, 107, 109, 112, 113, 114, 116, 117, 119, 120, 122, 123, 134, 135, 137, 138, 139, 140, 141, 142, 143, 149, 157, 168, 182, 185, 187, 188, 189, 203, 205, 206, 221, 222, 223, 226, 227, 228, 231, 236, 240, 248, 254, 256, 270, 272, 279, 280, 283, 285, 294, 297, 302, 308, 312, 313, 315, 317, 318, 319, 320, 321, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 339, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350, 352 property, 101, 105, 109, 126, 128, 141, 146, 228, 229, 230, 233, 234, 239, 240, 241, 242, 248, 250, 256, 258, 259, 268, 271, 274, 275, 276, 277, 278, 280, 281, 303, 314, 319, 320, 326, 327, 329, 330, 337, 341 irreversibility, 13, 92, 115 irreversible, 115, 116, 117, 340 isolated system(s), 34, 40, 66, 81, 94, 117, 148, 333, 336, 337, 339, 348
K kinematic, 19, 20, 27, 39
variables, 19, 20, 37, 38, 39, 45, 46, 47, 85, 100, 113, 122, 131, 138, 139, 145, 153, 198, 200, 204, 212, 251, 258, 268, 297, 301, 312, 319 Kinetic Theory of Gases, 3, 9, 12, 14, 25
L Langevin equation(s), xi, 34, 85, 87, 88, 101, 106, 118, 146, 147, 148, 149, 150, 151, 156, 157, 168, 169, 170, 171, 182, 183, 187, 188, 189, 225, 242, 259, 260, 267, 268, 272, 327, 328, 332, 345, 348 Langevin random systems light corpuscular theory of, 7 nature and properties, 4 nature of, 3, 4, 5, 6, 9, 10, 13, 14, 15, 25, 26, 34, 36, 51, 53, 72, 106, 136, 139, 140, 144, 147, 169, 178, 206, 253, 256, 318, 320, 348 propagation, 4, 7, 8, 14, 124, 336 quanta, 15, 16, 17, 122, 313 lines of force, 7, 8 locality, 253, 254, 260, 261, 265, 326
M Markovian, 100, 103, 171 matrix representation, 193, 198, 203, 256, 308 matter, x, xi, xiv, 3, 4, 7, 8, 9, 10, 11, 13, 14, 22, 25, 26, 29, 31, 43, 51, 56, 105, 114, 118, 120, 124, 142, 144, 161, 193, 195, 222, 226, 227, 233, 252, 258, 260, 267, 273, 274, 275, 276, 297, 299, 302, 304, 305, 306, 310, 311, 317, 321, 322, 327, 332 nature of, 3, 4, 5, 6, 9, 10, 13, 14, 15, 25, 26, 34, 36, 51, 53, 72, 106, 136, 139, 140, 144, 147, 169, 178, 206, 253, 256, 318, 320, 348 Maxwellian, xviii, 82, 92, 215 mean field, 92, 119, 123, 144, 148, 259, 272, 328, 332, 336, 337, 348 mean field theory, 119, 123, 144, 259, 272, 328, 332, 348 mean-square deviations, 66, 161 Measurement Theory, 256 model, xviii, 5, 6, 7, 8, 9, 14, 25, 50, 61, 93, 144, 188, 193, 195, 204, 206, 253, 256, 325, 350 corpuscular, xi, 3, 5, 6, 7, 14, 16, 25, 27, 29, 31, 39, 47, 50, 51, 52, 53, 54, 55, 59, 60, 61, 88, 94, 103, 104, 107, 118, 119, 122, 123, 124, 140, 147, 149, 157, 161, 182, 187, 188, 240, 248, 272, 313, 317, 318, 320, 324, 325, 327, 328, 331, 332, 337, 344, 345, 348 models, 6, 8, 24, 29, 195, 206, 253, 313 corpuscular and undulatory, 3, 5, 7, 16, 29, 60 momentum exchange, 31, 51, 57, 61
quantized, xv, 51, 52, 54, 56, 57, 59, 60, 61, 122, 171, 172, 200 momentum fluctuations, xiii, 67, 89, 109, 118, 169, 175, 177 momentum partition function, xiii, 71, 72 momentum transfer, 51, 54, 55, 56, 58, 61
N nature, xviii, 3, 4, 5, 6, 9, 10, 13, 14, 15, 18, 21, 24, 25, 26, 28, 30, 34, 36, 51, 53, 56, 72, 106, 107, 115, 117, 136, 139, 140, 142, 144, 147, 148, 169, 177, 178, 179, 187, 205, 206, 216, 253, 256, 270, 313, 318, 319, 320, 325, 329, 347, 348, 350 continuum, 9, 14, 15, 16, 37, 280 discrete, xv, 9, 14, 16, 23, 26, 51, 56, 104 wave, x, xvi, 3, 5, 7, 8, 14, 17, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 39, 47, 53, 54, 55, 60, 61, 88, 106, 109, 113, 114, 117, 123, 124, 139, 140, 142, 147, 161, 228, 229, 240, 270, 271, 318, 319, 320, 324, 325, 326, 327, 328, 329, 334, 336, 337, 341, 345, 347 negative linear momentum, 124 Newtonian Limit, 182, 183, 187 "no-go" theorems, 254 non-local, xii, 253, 260, 261, 265, 266, 267, 268, 269, 270, 271, 272, 283, 326, 347 interactions, 8, 31, 267, 271, 272 potential, xvi, xvii, 35, 37, 40, 50, 51, 85, 88, 92, 99, 102, 117, 119, 120, 121, 122, 123, 124, 138, 149, 150, 157, 174, 175, 176, 177, 180, 187, 204, 212, 230, 238, 245, 268, 269, 270, 271, 287, 294, 297, 298, 301, 303, 324, 326, 327, 331, 336, 344, 345, 347
O objective approach, 119 objectivism, 226 observable(s), 19, 65, 205, 236, 251, 254, 255, 256, 259, 319, 321, 322, 323, 327, 339, 348 observer, 54, 65, 93, 141, 142, 147, 182, 187, 188, 235, 265, 304, 320, 323, 329, 333, 334, 335, 336 ontological, 14, 25, 29, 30, 31, 32, 51, 53, 54, 60, 88, 106, 114, 123, 139, 140, 141, 142, 144, 187, 275, 280, 281, 313, 318, 319, 328, 333 reduction, 26, 29, 30, 31, 32, 53, 60, 107, 116, 117, 142, 147, 187, 272, 320, 328, 329, 333, 334, 336, 339, 341, 345, 347 ontological downsizing, 333 ontology, xi, 12, 14, 25, 26, 29, 31, 47, 54, 60, 103, 106, 107, 123, 187, 188, 324, 328
corpuscular, xi, 3, 5, 6, 7, 14, 16, 25, 27, 29, 31, 39, 47, 50, 51, 52, 53, 54, 55, 59, 60, 61, 88, 94, 103, 104, 107, 118, 119, 122, 123, 124, 140, 147, 149, 157, 161, 182, 187, 188, 240, 248, 272, 313, 317, 318, 320, 324, 325, 327, 328, 331, 332, 337, 344, 345, 348 corpuscular and undulatory, 3, 5, 7, 16, 29, 60 monist, 14, 54, 103, 107, 114 operator formation, xii, 84, 170, 229, 230, 236, 237, 251, 255, 256, 259, 272, 321, 329, 342 Dirac's rule, 235, 236 von Neumann’s rules, 235, 236
Weyl's rule, 236, 237, 255, 321 operators, 37, 38, 43, 73, 74, 96, 98, 193, 198, 200, 202, 204, 206, 208, 222, 226, 227, 230, 231, 235, 236, 237, 251, 254, 255, 256, 259, 312, 319, 320, 321, 322, 339, 342, 348, 354 optics, 4, 5, 7, 8, 14
P path integral approach, 49, 62, 63, 65, 343 phase space distribution function, 65, 77, 81, 228, 231, 239, 240, 244 phase space distributions, 226, 228 pilot wave, 324, 326, 328 principle, 3, 5, 6, 17, 19, 27, 28, 29, 30, 32, 54, 60, 68, 77, 88, 110, 120, 139, 140, 142, 151, 187, 217, 218, 251, 269, 272, 311, 312, 313, 318, 319, 320, 328, 339, 341, 344, 347, 349 of Complementarity, 15, 329 of wavepacket reduction, 54 principle, xvii, 16, 17, 18, 19, 25, 28, 29, 30, 62, 109, 139, 140, 141, 142, 185, 186, 278, 314, 318, 319, 320, 336, 337, 341, 342, 343, 345, 349, 352 probability, xi, xiii, xv, xvi, xvii, xviii, xix, 12, 25, 34, 35, 37, 38, 40, 47, 48, 50, 56, 63, 66, 67, 76, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 101, 106, 110, 111, 112, 113, 116, 117, 125, 126, 127, 128, 131, 132, 136, 137, 138, 142, 143, 144, 146, 149, 150, 152, 153, 156, 157, 158, 159, 160, 161, 163, 164, 167, 168, 169, 170, 175, 178, 179, 180, 181, 182, 183, 184, 185, 186, 188, 189, 193, 200, 203, 210, 215, 225, 226, 227, 228, 229, 230, 231, 232, 233, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 246, 247, 248, 249, 250, 252, 258, 260, 265, 267, 268, 269, 270, 273, 274, 275, 283, 286, 288, 289, 291, 292, 293, 295, 297, 298, 299, 302, 303, 308, 312, 318, 319, 322, 323, 324, 327, 328, 329, 330, 332, 333, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350 amplitude, 19, 36, 37, 49, 50, 101, 110, 142, 157, 170, 181, 210, 225, 228, 239, 255, 267, 268, 289, 293, 299, 302, 305, 318, 319, 322, 323, 328, 330, 332, 333, 343, 345, 349 probability density, xi, xiii, xv, xvi, xvii, xviii, xix, 25, 34, 35, 40, 48, 50, 56, 66, 67, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 110, 111, 113, 116, 117, 125, 126, 132, 136, 137, 138, 143, 144, 146, 150, 152, 156, 159, 160, 161, 164, 167, 170, 182, 183, 184, 185, 186, 203, 225, 226, 228, 229, 230, 231, 232, 235, 239, 240, 241, 242, 243, 246, 248, 249, 250, 252, 260, 267, 268, 269, 283, 286, 288, 291, 292, 293, 295, 298, 303, 308, 312, 324, 327, 342, 344, 348
Q quantization, x, 15, 18, 31, 39, 40, 46, 50, 55, 57, 58, 73, 82, 120, 121, 122, 123, 193, 200, 205, 206, 211, 237, 280, 312, 322 in generalized coordinates, 73, 76 spatial, 46, 55, 56, 57, 59, 121, 261 quantized momentum transfer, 57, 59, 61 quantizing, 39 in generalized curvilinear orthogonal coordinates, 39 quantum jumps, 23, 26 Quantum Mechanics, vi, vii, ix, xi, xx, 13, 14, 15, 16, 19, 20, 21, 22, 23, 24, 26, 27, 29, 32, 33, 34, 39, 40, 46, 47, 51, 54, 58, 65, 68, 76, 77, 84, 88, 90, 92, 93, 94, 105, 106, 109, 112, 114, 119, 120, 121, 122, 123, 124, 125, 136, 138, 139, 141, 142, 144, 145, 146, 147, 150, 153, 169, 170, 171, 180, 185, 187, 189, 193, 194, 195, 202, 205, 206, 210, 211, 225, 226, 227, 228, 235, 236, 239, 250, 253, 254, 255, 256, 258, 259, 260, 261, 264, 266, 267, 268, 270, 271, 272, 273, 283, 285, 286, 295, 313, 315, 317, 318, 321, 322, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 336, 338, 339, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350, 351, 352, 353, 354, 355 axiomatization, 33, 34, 349 operator formation in, 84, 170 stochastic approach, 93, 94, 103, 109, 123, 147, 149, 342, 344, 345 visualization, 21, 22, 24, 27 visualization of, 21 quantum potential, 87, 102, 118, 121, 124, 150, 157, 168, 169, 244, 266, 267, 270, 294, 324, 325, 326, 328, 332
R random fluctuations, 75, 144, 145, 266, 299
random process, 63, 65 random variables, 77, 113, 126, 127, 129, 131, 134, 136, 137, 138, 145, 146, 152, 153, 344 sum of, 77, 94, 110, 127, 129, 132, 136, 137, 153, 254, 256, 347 randomness, 32, 62, 64, 65, 110, 140, 147, 148, 149, 188, 189, 272, 330, 331, 332, 336, 341, 343, 344 intrinsic, 5, 141, 143, 182, 183, 187, 198, 320, 329, 332, 348 reality, 12, 22, 25, 144, 226, 253, 254, 326, 349 reality and objectivity, 349 reduction of the wave packet, vii, 31, 32, 107, 116, 117, 142, 147, 187, 272, 317, 320, 329, 333, 339, 341, 345 references, vii, 351 relativistic generalization, 283, 286, 290 reversible problem, 115
S sampling over the momentum subspace, 151 Schrödinger equation, x, xi, 33, 34, 36, 37, 39, 40, 43, 45, 52, 57, 61, 63, 65, 66, 68, 69, 75, 77, 80, 81, 82, 83, 85, 92, 93, 94, 98, 99, 100, 101, 104, 106, 107, 110, 111, 117, 118, 119, 120, 123, 125, 126, 131, 135, 136, 137, 142, 143, 144, 145, 146, 147, 148, 149, 156, 170, 175, 187, 193, 200, 210, 211, 213, 225, 227, 237, 239, 244, 246, 247, 248, 249, 250, 251, 252, 259, 260, 266, 267, 269, 270, 272, 286, 290, 291, 296, 298, 312, 318, 319, 320, 324, 325, 326, 327, 328, 331, 332, 333, 334, 336, 337, 339, 343, 345, 347, 349 in generalized coordinates, 73, 76 mathematical derivation of the, 33, 93, 118, 193 Schrödinger's Cat Paradox, 329 second order, 35, 36, 45, 69, 70, 76, 81, 92, 99, 100, 102, 103, 124, 125, 126, 130, 134, 135, 136, 137, 145, 146, 148, 149, 154, 156, 174, 177, 195, 196, 200, 206, 217, 239, 248, 289, 290, 291, 298, 303, 305, 308, 312, 344 displacement, 48, 118 expansion, 35, 36, 102, 125, 126, 130, 134, 135, 136, 145, 148, 344, 347 of displacement, 62 single system, xv, xvii, 30, 31, 48, 88, 104, 105, 106, 107, 109, 112, 113, 114, 115, 116, 117, 119, 138, 139, 147, 159, 165, 166, 171, 172, 186, 248, 330, 337, 342, 343, 344, 345, 347, 348, 349 average, xv, 12, 13, 65, 66, 76, 82, 83, 85, 86, 88, 92, 95, 104, 105, 106, 107, 108, 116, 117, 118, 123, 125, 132, 133, 146, 148, 150, 151, 155, 157, 158, 159, 161, 171, 172, 174, 185, 205, 218, 220, 222, 225, 226, 227, 231, 232, 233,
235, 236, 237, 239, 245, 246, 247, 248, 254, 255, 257, 260, 266, 267, 269, 280, 286, 323, 326, 327, 328, 329, 331, 332, 336, 337 interpretation, ix, x, xi, xii, xix, 3, 4, 5, 10, 13, 14, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 37, 38, 39, 53, 54, 57, 59, 60, 61, 62, 63, 64, 65, 85, 87, 88, 98, 106, 107, 109, 112, 113, 114, 116, 117, 119, 120, 122, 123, 134, 135, 137, 138, 139, 140, 141, 142, 143, 149, 157, 168, 182, 185, 187, 188, 189, 203, 205, 206, 221, 222, 223, 226, 227, 228, 231, 236, 240, 248, 254, 256, 270, 272, 279, 280, 283, 285, 294, 297, 302, 308, 312, 313, 315, 317, 318, 319, 320, 321, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 339, 340, 341, 342, 343, 344, 345, 347, 348, 349, 350, 352 simulation(s), xi, xvi, xvii, 32, 88, 107, 123, 148, 149, 156, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 171, 170, 172, 173, 174, 175, 177, 178, 180, 181, 186, 188, 189, 331, 332, 344 source of randomness, 94 Special and General Theories of Relativity, 285, 312 spectral analysis, 14 spectral density, 108 spinors, 112, 193, 215, 307, 308, 309 statistical correlations, 269, 327 statistical tools, 37, 269 statistically independent random variables, 113 Stern-Gerlach apparatus, 218, 262, 264 stochastic, xi, xvii, 64, 77, 78, 80, 81, 85, 92, 93, 94, 95, 96, 97, 98, 101, 103, 106, 109, 110, 111, 117, 118, 120, 122, 123, 125, 126, 135, 136, 137, 138, 139, 143, 144, 145, 146, 147, 148, 149, 152, 157, 159, 161, 164, 171, 172, 173, 177, 180, 187, 239, 240, 242, 254, 266, 267, 283, 286, 297, 298, 326, 327, 328, 331, 332, 339, 342, 344, 345, 347, 348 accelerations, 97 geodesics, 295, 297 Liouville equation, xv, 34, 38, 39, 40, 41, 42, 44, 66, 73, 74, 77, 78, 79, 80, 110, 111, 135, 139, 149, 214, 238, 343 processes, 24, 26, 28, 29, 64, 77, 92, 100, 125, 137, 138, 145, 146, 147, 157, 171, 240, 242, 331, 332, 344 stochastic approach, ix, x, xi, xii, xvi, 4, 5, 10, 11, 12, 13, 15, 17, 18, 20, 21, 22, 23, 24, 25, 28, 31, 
32, 34, 36, 37, 40, 47, 48, 50, 52, 53, 54, 58, 59, 60, 62, 63, 65, 66, 69, 77, 80, 84, 87, 93, 94, 97, 103, 104, 105, 107, 109, 112, 114, 116, 117, 119, 121, 123, 125, 138, 140, 141, 142,
143, 144, 146, 148, 149, 150, 151, 152, 154, 169, 170, 171, 174, 175, 180, 182, 183, 184, 186, 187, 188, 189, 198, 210, 213, 223, 225, 226, 227, 235, 237, 239, 241, 242, 247, 251, 254, 259, 266, 267, 268, 269, 277, 278, 279, 280, 283, 286, 293, 294, 299, 304, 308, 312, 313, 315, 317, 318, 321, 325, 326, 327, 328, 329, 330, 331, 332, 333, 336, 337, 339, 340, 341, 342, 343, 344, 345, 347, 352 velocities, 4, 7, 12, 53, 57, 94, 95, 97, 98, 103, 265, 266, 269 velocity, xviii, 4, 6, 7, 12, 26, 30, 54, 58, 62, 93, 94, 95, 96, 97, 98, 101, 109, 118, 170, 182, 215, 218, 219, 220, 266, 267, 270, 273, 297, 326, 343 stochastic force, 78, 93, 98, 161, 171, 297, 326 Stochastic Liouville equation, 78, 80, 81, 110, 112, 136, 137, 143, 145, 146, 239, 251, 286, 345 stochastic support, 92, 93, 94, 138, 254, 342, 348 structure of the fluctuations, 144 subjectivism, 226
T temperature, 10, 12, 74, 75, 102, 118, 174, 183, 188, 249, 328, 337 local, 82, 92, 118, 121, 122, 174, 188, 249, 253, 254, 260, 261, 264, 266, 267, 269, 271, 283, 326, 327, 347 transformation(s), 27, 28, 34, 35, 39, 41, 42, 44, 46, 47, 49, 62, 63, 76, 145, 194, 197, 207, 208, 286, 305, 343 canonical, 13, 44, 45, 46, 47, 76, 95, 119, 256, 287, 290, 313, 323, 332, 348 Mathieu's canonical, 46, 47 translational symmetry, 58, 97 tunnel effect, 119, 123, 124
U uncertainties, 319, 320 uncertainty, 15, 17, 26, 28, 29, 30, 329, 349 uncertainty relations, 24, 28, 29 undulatory, xi, 3, 5, 6, 7, 8, 16, 21, 25, 26, 29, 30, 31, 39, 47, 51, 52, 53, 60, 61, 88, 124, 147, 148, 149, 318, 320, 325, 328 interpretation of light, 3 undulatory behavior, xi, 29, 47, 88, 124, 147, 148, 149, 328 undulatory mechanics, 21 undulatory phenomena, 147 undulatory theory, 3, 5, 8, 53
V variables, 19, 20, 37, 38, 39, 45, 46, 47, 85, 100, 113, 122, 131, 138, 139, 145, 153, 198, 200, 204, 212, 251, 258, 268, 297, 301, 312, 319 non-commuting, 19, 202, 251, 256, 348
W wave, x, xvi, 3, 5, 7, 8, 14, 17, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 39, 47, 53, 54, 55, 60, 61, 88, 106, 109, 113, 114, 117, 123, 124, 139, 140, 142, 147, 161, 228, 229, 240, 270, 271, 318, 319, 320, 324, 325, 326, 327, 328, 329, 334, 336, 337, 341, 345, 347 behavior, xi, xvi, xvii, xviii, xix, 7, 8, 14, 21, 25, 26, 30, 31, 32, 47, 51, 53, 54, 60, 78, 88, 93, 94, 95, 103, 104, 106, 109, 110, 111, 117, 118, 120, 121, 123, 124, 140, 141, 143, 146, 147, 151, 157, 161, 163, 164, 167, 170, 172, 179, 182, 183, 184, 187, 188, 189, 195, 200, 205, 220, 225, 243, 244, 247, 261, 266, 267, 268, 269, 270, 271, 272, 279, 286, 293, 297, 302, 305, 318, 319, 324, 325, 326, 327, 329, 332, 333, 336, 344, 345, 347, 348 equation, x, xi, 13, 16, 21, 26, 33, 34, 35, 36, 37, 39, 40, 43, 45, 46, 47, 52, 54, 57, 60, 66, 67, 70, 71, 73, 74, 75, 77, 78, 79, 80, 82, 85, 87, 88, 92, 93, 94, 95, 96, 97, 98, 100, 101, 102, 103, 106, 111, 112, 117, 119, 120, 123, 136, 142, 143, 145, 146, 147, 148, 149, 150, 151, 156, 157, 159, 168, 170, 171, 172, 174, 175, 182, 193, 194, 200, 201, 202, 206, 207, 208, 210, 211, 213, 216, 217, 219, 220, 227, 229, 230, 236, 237, 238, 239, 244, 250, 251, 258, 260, 265, 267, 270, 271, 272, 286, 287, 289, 290, 291, 293, 294, 296, 297, 298, 300, 301, 303, 304, 305, 306, 307, 308, 310, 312, 318, 319, 320, 324, 325, 326, 327, 331, 332, 336, 343, 344, 345, 347, 349 wave like, 106, 109, 347 character of the Schrödinger equation, 109 nature, xviii, 3, 4, 5, 6, 9, 10, 13, 14, 15, 18, 21, 24, 25, 26, 28, 30, 34, 36, 51, 53, 56, 72, 106, 107, 115, 117, 136, 139, 140, 142, 144, 147, 148, 169, 177, 178, 179, 187, 205, 206, 216, 253, 256, 270, 313, 318, 319, 320, 325, 329, 347, 348, 350 weight functions, xii, 278, 279, 280 weirdness, 32, 116, 142, 161, 270, 315, 333, 349, 350 Weizsäcker term, 249
Weyl's quantization rule, 322 Wiener-Khinchin relation(s), 108, 138 Wigner distribution, 84, 92, 232, 354