English Pages 416 [437] Year 2015
QUANTUM MECHANICS
SIXTH EDITION
Rae • Napolitano
PHYSICS
“This is a very versatile textbook, which could be used in a variety of courses ranging from an ‘honors’ introductory course to a challenging undergraduate upper-class course. … The book is divided into parts, making it easy for an instructor to choose the relevant material based on the level of the class.” —Robert Pelcovits, Professor of Physics, Brown University
“This book by Rae and Napolitano distinguishes itself with a unique approach by including more material on practical applications of the theoretical concepts … a great choice of textbook for upper-class undergraduate students in physics or students entering graduate studies in engineering schools.” —Professor Chunhui Chen, Iowa State University
“This text provides an updated treatment of quantum mechanics, suitable for the standard senior-level undergraduate course at U.S. colleges and universities.” —Dr. Pete Markowitz, Professor, Department of Physics, Florida International University
K24456
an informa business
6000 Broken Sound Parkway, NW Suite 300, Boca Raton, FL 33487 711 Third Avenue New York, NY 10017 2 Park Square, Milton Park Abingdon, Oxon OX14 4RN, UK
ISBN: 978-1-4822-9918-2
www.crcpress.com
Quantum Mechanics, Sixth Edition builds on its highly praised predecessors to make the text even more accessible to a wider audience. This sixth edition contains three new chapters that review prerequisite physics and mathematics, laying out the notation, formalism, and physical basis necessary for the rest of the book. Along with more problems, this edition also presents short descriptions of numerous applications relevant to the physics discussed, offering a brief look at what quantum mechanics has made possible industrially and scientifically.
“The new sixth edition of this well-known textbook should be thought of as one of the best options available for undergraduate quantum mechanics courses, among a very large class of introductory books. … Detailed worked examples and asides on associated applications of the principles discussed enhance the educational aspects of this book.” —Aaron Lindenberg, Associate Professor, Department of Materials Science and Engineering/Photon Science, Stanford University/SLAC National Accelerator Laboratory
Alastair I. M. Rae Jim Napolitano
QUANTUM MECHANICS SIXTH EDITION
Alastair I. M. Rae
University of Birmingham, U.K.
Jim Napolitano
Temple University, U.S.A.
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2016 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Version Date: 20151105 International Standard Book Number-13: 978-1-4822-9921-2 (eBook - PDF) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. 
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
To Angus and Gavin
To Julia
Contents

PART I Waves, Electromagnetism, and the Limits of Classical Physics

CHAPTER 1 The Physics and Mathematics of Waves
1.1 A REVIEW OF SIMPLE HARMONIC MOTION
1.2 THE STRETCHED STRING EQUATION OF MOTION
1.3 STANDING WAVES, AND FOURIER SERIES
1.4 THE FOURIER TRANSFORM
1.5 PROBLEMS

CHAPTER 2 Maxwell’s Equations and Electromagnetic Waves
2.1 MAXWELL’S EQUATIONS AS INTEGRALS
2.2 SURFACE THEOREMS IN VECTOR CALCULUS
2.3 MAXWELL’S EQUATIONS AS DERIVATIVES
2.4 ELECTROMAGNETIC WAVES
2.5 ELECTROMAGNETIC RADIATION
2.6 PROBLEMS

CHAPTER 3 Particle Mechanics, Relativity, and Photons
3.1 NEWTON, MAXWELL, AND EINSTEIN
3.2 SPACETIME IN SPECIAL RELATIVITY
3.3 VELOCITY, MOMENTUM, AND ENERGY
3.4 PROBLEMS

CHAPTER 4 The Early Development of Quantum Mechanics
4.1 THE PHOTOELECTRIC EFFECT
4.2 THE COMPTON EFFECT
4.3 LINE SPECTRA AND ATOMIC STRUCTURE
4.4 DE BROGLIE WAVES
4.5 WAVE-PARTICLE DUALITY
4.6 THE REST OF THIS BOOK
4.7 PROBLEMS

PART II Elementary Wave Mechanics

CHAPTER 5 The One-dimensional Schrödinger Equations
5.1 THE TIME-DEPENDENT SCHRÖDINGER EQUATION
5.2 THE TIME-INDEPENDENT SCHRÖDINGER EQUATION
5.3 BOUNDARY CONDITIONS
5.4 THE INFINITE SQUARE WELL
5.5 THE FINITE SQUARE WELL
5.6 QUANTUM MECHANICAL TUNNELLING
5.7 THE HARMONIC OSCILLATOR
5.8 PROBLEMS

CHAPTER 6 The Three-dimensional Schrödinger Equations
6.1 THE WAVE EQUATIONS
6.2 SEPARATION IN CARTESIAN COORDINATES
6.3 SEPARATION IN SPHERICAL POLAR COORDINATES
6.4 THE HYDROGENIC ATOM
6.5 PROBLEMS

PART III Formal Foundations

CHAPTER 7 The Basic Postulates of Quantum Mechanics
7.1 THE WAVE FUNCTION
7.2 THE DYNAMICAL VARIABLES
7.3 PROBABILITY DISTRIBUTIONS
7.4 COMMUTATION RELATIONS
7.5 THE UNCERTAINTY PRINCIPLE
7.6 THE TIME DEPENDENCE OF THE WAVE FUNCTION
7.7 DEGENERACY
7.8 THE HARMONIC OSCILLATOR AGAIN
7.9 THE MEASUREMENT OF MOMENTUM BY COMPTON SCATTERING
7.10 PROBLEMS

CHAPTER 8 Angular Momentum I
8.1 THE ANGULAR-MOMENTUM OPERATORS
8.2 THE ANGULAR MOMENTUM EIGENVALUES AND EIGENFUNCTIONS
8.3 THE EXPERIMENTAL MEASUREMENT OF ANGULAR MOMENTUM
8.4 A GENERAL SOLUTION TO THE ANGULAR-MOMENTUM EIGENVALUE PROBLEM
8.5 PROBLEMS

CHAPTER 9 Angular Momentum II
9.1 MATRIX REPRESENTATIONS
9.2 PAULI SPIN MATRICES
9.3 SPIN AND THE QUANTUM THEORY OF MEASUREMENT
9.4 DIRAC NOTATION
9.5 SPIN-ORBIT COUPLING AND THE ZEEMAN EFFECT
9.6 A MORE GENERAL TREATMENT OF THE COUPLING OF ANGULAR MOMENTA
9.7 PROBLEMS

PART IV Extensions and Approximation Schemes

CHAPTER 10 Time-independent Perturbation Theory and the Variational Principle
10.1 PERTURBATION THEORY FOR NONDEGENERATE ENERGY LEVELS
10.2 PERTURBATION THEORY FOR DEGENERATE ENERGY LEVELS
10.3 THE VARIATIONAL PRINCIPLE
10.4 PROBLEMS

CHAPTER 11 Time Dependence
11.1 TIME-INDEPENDENT HAMILTONIANS
11.2 THE SUDDEN APPROXIMATION
11.3 TIME-DEPENDENT PERTURBATION THEORY
11.4 TRANSITIONS BETWEEN ATOMIC ENERGY LEVELS
11.5 THE EHRENFEST THEOREM
11.6 THE AMMONIA MASER
11.7 PROBLEMS

CHAPTER 12 Scattering
12.1 SCATTERING IN ONE DIMENSION
12.2 SCATTERING IN THREE DIMENSIONS
12.3 THE BORN APPROXIMATION
12.4 PARTIAL WAVE ANALYSIS
12.5 PROBLEMS

CHAPTER 13 Many-particle Systems
13.1 GENERAL CONSIDERATIONS
13.2 ISOLATED SYSTEMS
13.3 NONINTERACTING PARTICLES
13.4 INDISTINGUISHABLE PARTICLES
13.5 MANY-PARTICLE SYSTEMS
13.6 THE HELIUM ATOM
13.7 SCATTERING OF IDENTICAL PARTICLES
13.8 PROBLEMS

PART V Advanced Topics

CHAPTER 14 Relativity and Quantum Mechanics
14.1 BASIC RESULTS IN SPECIAL RELATIVITY
14.2 THE DIRAC EQUATION
14.3 ANTIPARTICLES
14.4 OTHER WAVE EQUATIONS
14.5 QUANTUM FIELD THEORY AND THE SPIN-STATISTICS THEOREM
14.6 PROBLEMS

CHAPTER 15 Quantum Information
15.1 QUANTUM CRYPTOGRAPHY
15.2 ENTANGLEMENT
15.3 CLONING AND TELEPORTATION
15.4 QUANTUM COMPUTING
15.5 PROBLEMS

CHAPTER 16 The Conceptual Problems of Quantum Mechanics
16.1 THE CONCEPTUAL PROBLEMS
16.2 HIDDEN-VARIABLE THEORIES
16.3 NONLOCALITY
16.4 THE QUANTUM MEASUREMENT PROBLEM
16.5 THE ONTOLOGICAL PROBLEM
16.6 PROBLEMS

BIBLIOGRAPHY

INDEX
Preface to Sixth Edition

Quantum mechanics is an indispensable part of any undergraduate physics curriculum. Nevertheless, the level at which quantum mechanics is taught varies widely from one college or university to another, as do the mathematics and physics prerequisites for the course. Our goal in producing this new edition is to make the textbook more accessible to a wider readership. At the same time, we do not want the book to appear unfamiliar to those who have come to know it over the years.

As part of an attempt to reach these goals, the book is now divided into five “parts” that separately cover broad topics, some or all of which might make up portions of any general course on quantum mechanics.

Part I comprises three new chapters that review prerequisite physics and mathematics, along with a slightly revised version of the first chapter from the fifth edition. Our purpose here is not to cover classical waves, electromagnetism and special relativity, along with the attendant mathematics, at a level of detail that would be suitable for a full course in these. Rather, we lay out the notation, formalism, and physical basis that we feel are necessary to set the stage for the rest of the book. That is, the chapters should be considered to be review material and a reference source for the later topics. Also, many of the end-of-chapter problems in these chapters are designed to extend the discussion in the text, so one approach to covering this material could be to assign those problems after a review of these chapters.

Part II consists of elementary wave mechanics based on the Schrödinger equations and their solutions in one and three dimensions, culminating in the energy levels and wavefunctions for the hydrogen atom.

Part III introduces formal quantum mechanics including the theory of angular momentum and spin-orbit coupling.

Part IV extends the basic theory to include approximation schemes, scattering and many-body systems.
Part V contains a selection of advanced topics, including the Dirac equation, quantum information and a discussion of conceptual problems.

We hope that this division will assist teachers to select those topics that best fit the background of their students. Thus, an instructor might choose not to cover all or some of Part I, especially if that material is part of one or more prerequisite courses. (This approach would be closest to using the fifth edition of the book.) For example, many curricula in the US include a prerequisite course in “Modern Physics,” covering all of Part I in great detail, and possibly elements of Part II. In this case, the instructor could start at Part II, while having students revise these topics by reading through, and working some problems from, the first four chapters.

On the other hand, many curricula will see the material in those first four chapters covered in different courses in different years. For example, the vector calculus formulation of electromagnetism may not be seen until the same time students are taking a quantum mechanics course. In this case, the instructor might spend some class time at the start of the term working through the material in Sections 2.2 and 2.3 before moving on to Part II. Instructors and students need to appreciate the breadth versus depth balance in the new chapters.

A further innovation in this edition is the inclusion of short descriptions of a number of applications relevant to the physics being discussed at various points through the book. These are not meant to be complete descriptions by any means, but rather a brief look at what quantum mechanics has made possible, both industrially and scientifically. It is our hope that these short sections will inspire the interested reader to learn more about the subjects they describe.

We have also extended the number and range of the end-of-chapter problems and have made a number of other small changes aimed at modernizing and improving the existing material. Otherwise, the chapters from the fifth edition are essentially retained, albeit with new chapter numbers. It is our hope that all of these changes not only make the book more accessible to a wider range of curricula and students, but also retain nearly all the aspects of the familiar textbook that has been used for so many years by so many students.
We are very grateful to our editor Francesca McGowan for all her help and guidance as well as pressure to meet agreed deadlines. We would welcome comments from students and teachers. Please contact us via our Featured Author pages on the CRC Press website. Some supporting material, including colour versions of some of the illustrations, can be found at https://www.crcpress.com/Quantum-Mechanics-Sixth-Edition/Rae-Napolitano/9781482299182.

Alastair I. M. Rae
Jim Napolitano
2015
Preface to Fifth Edition

One of the main changes I have made is to intersperse a number of worked examples in the text. These have been devised to illustrate the principles being discussed and should help students develop and test their understanding. Their level is similar to that of the problems listed at the end of the chapter and students would be well advised to try them for themselves before studying the worked solutions. More substantial examples of the application of quantum mechanics to physical systems (such as the treatment of the hydrogen atom in Chapter 3 [now 7]) which could not reasonably be attempted as unseen exercises are retained, but not listed as ‘examples’. A detailed solutions manual that contains worked solutions to all the end-of-chapter examples is available for qualifying instructors.

I have once more checked and amended the text with the aim of further improving its clarity and relevance. In particular, this has led to quite substantial revision and extension of the treatment of quantum teleportation and quantum computing in Chapter 12 [now 15].

I should particularly like to thank my editor, John Navas, and other staff at Taylor & Francis who have given me invaluable help during the preparation of the camera-ready copy for this edition. With their help, I have established a supporting web site, which is now at www.crcpress.com/product/isbn/9781584889700 (click on “Downloads and Updates”). This will include details of corrections to the text that have been identified since this edition was published. Please send details of further corrections and other comments to [email protected].

Once again, I must put on record my warm appreciation of Ann’s continued help and support.

Alastair I. M. Rae
2007
Preface to Fourth Edition

When I told a friend that I was working on a new edition, he asked me what had changed in quantum physics during the last ten years. In one sense very little: quantum mechanics is a very well established theory and the basic ideas and concepts are little changed from what they were ten, twenty or more years ago. However, new applications have been developed and some of these have revealed aspects of the subject that were previously unknown or largely ignored. Much of this development has been in the field of information processing, where quantum effects have come to the fore. In particular, quantum techniques appear to have great potential in the field of cryptography, both in the coding and possible de-coding of messages, and I have included a chapter aimed at introducing this topic.

I have also added a short chapter on relativistic quantum mechanics and introductory quantum field theory. This is a little more advanced than many of the other topics treated, but I hope it will be accessible to the interested reader. It aims to open the door to the understanding of a number of points that were previously stated without justification.

Once again, I have largely re-written the last chapter on the conceptual foundations of the subject. The twenty years since the publication of the first edition do not seem to have brought scientists and philosophers significantly closer to a consensus on these problems. However, many issues have been considerably clarified and the strengths and weaknesses of some of the explanations are more apparent. My own understanding continues to grow, not least because of what I have learned from formal and informal discussions at the annual UK Conferences on the Foundations of Physics.
Other changes include a more detailed treatment of tunnelling in Chapter 2 [now 5], a more gentle transition from the Born postulate to quantum measurement theory in Chapter 4 [now 7], the introduction of Dirac notation in Chapter 6 [now 9] and a discussion of the Bose-Einstein condensate in Chapter 10 [now 13].

I am grateful to a number of people who have helped me with this edition. Glenn Cox shared his expertise on relativistic quantum mechanics when he read a draft of Chapter 11; Harvey Brown corrected my understanding of the de Broglie-Bohm hidden variable theory discussed in the first part of Chapter 13; Demetris Charalambous read a late draft of the whole book and suggested several improvements and corrections. Of course, I bear full responsibility for the final version and any remaining errors.

Modern technology means that the publishers are able to support the book at the website.1 This is where you will find references to the wider literature, colour illustrations, links to other relevant web sites, etc. If any mistakes are identified, corrections will also be listed there. Readers are also invited to contribute suggestions on what would be useful content. The most convenient form of communication is by e-mail to∗

Finally I should like to pay tribute to Ann for encouraging me to return to writing after some time. Her support has been invaluable.

Alastair I. M. Rae
2002
1 Now obsolete. See preface to fifth edition for details of the current web site.
Preface to Third Edition

In preparing this edition, I have again gone right through the text identifying points where I thought the clarity could be improved. As a result, numerous minor changes have been made. More major alterations include a discussion of the impressive modern experiments that demonstrate neutron diffraction by macroscopic sized slits in Chapter 1 [now 4], a revised treatment of Clebsch-Gordan coefficients in Chapter 6 [now 9] and a fuller discussion of spontaneous emission in Chapter 8 [now 11]. I have also largely rewritten the last chapter on the conceptual problems of quantum mechanics in the light of recent developments in the field as well as of improvements in my understanding of the issues involved and changes in my own viewpoint. This chapter also includes an introduction to the de Broglie-Bohm hidden variable theory and I am grateful to Chris Dewdney for a critical reading of this section.

Alastair I. M. Rae
1992
Preface to Second Edition

I have not introduced any major changes to the structure or content of the book, but I have concentrated on clarifying and extending the discussion at a number of points. Thus the discussion of the application of the uncertainty principle to the Heisenberg microscope has been revised in Chapter 1 [now 4] and is referred to again in Chapter 4 [now 7] as one of the examples of the application of the generalized uncertainty principle; I have rewritten much of the section on spin-orbit coupling and the Zeeman effect and I have tried to improve the introduction to degenerate perturbation theory, which many students seem to find difficult. The last chapter has been brought up to date in the light of recent experimental and theoretical work on the conceptual basis of the subject and, in response to a number of requests from students, I have provided hints to the solution of the problems at the ends of the chapters.

I should like to thank everyone who drew my attention to errors or suggested improvements; I believe nearly every one of these suggestions has been incorporated in one way or another into this new edition.

Alastair I. M. Rae
1985
Preface to First Edition

Over the years the emphasis of undergraduate physics courses has moved away from the study of classical macroscopic phenomena towards the discussion of the microscopic properties of atomic and subatomic systems. As a result, students now have to study quantum mechanics at an earlier stage in their course without the benefit of a detailed knowledge of much of classical physics and, in particular, with little or no acquaintance with the formal aspects of classical mechanics. This book has been written with the needs of such students in mind. It is based on a course of about thirty lectures given to physics students at the University of Birmingham towards the beginning of their second year (although, perhaps inevitably, the coverage of the book is a little greater than I was able to achieve in the lecture course). I have tried to develop the subject in a reasonably rigorous way, covering the topics needed for further study in atomic, nuclear, and solid state physics, but relying only on the physical and mathematical concepts usually taught in the first year of an undergraduate course. On the other hand, by the end of their first undergraduate year most students have heard about the basic ideas of atomic physics, including the experimental evidence pointing to the need for a quantum theory, so I have confined my treatment of these topics to a brief introductory chapter.

While discussing these aspects of quantum mechanics required for further study, I have laid considerable emphasis on the understanding of the basic ideas and concepts behind the subject, culminating in the last chapter which contains an introduction to quantum measurement theory. Recent research, particularly the theoretical and experimental work inspired by Bell’s theorem, has greatly clarified many of the conceptual problems in this area.
However, most of the existing literature is at a research level and concentrates more on a rigorous presentation of results to other workers in the field than on making them accessible to a wider audience. I have found that many physics undergraduates are particularly interested in this aspect of the subject and there is therefore a need for a treatment suitable for this level. The last chapter of this book is an attempt to meet this need.

I should like to acknowledge the help I have received from my friends and colleagues while writing this book. I am particularly grateful to Robert Whitworth, who read an early draft of the complete book, and to Goronwy Jones and George Morrison, who read parts of it. They all offered many valuable and penetrating criticisms, most of which have been incorporated in this final version. I should also like to thank Ann Aylott who typed the manuscript and was always patient and helpful throughout many changes and revisions, as well as Martin Dove who assisted with the proofreading. Naturally, none of this help in any way lessens my responsibility for whatever errors and omissions remain.

Alastair I. M. Rae
1980
Author Biographies

Alastair I. M. Rae retired as Reader in Quantum Physics from The University of Birmingham, UK, which he joined in 1967. He has conducted research in a number of areas of condensed matter physics, including superconductivity and, in particular, its “high temperature” manifestations. He first taught quantum mechanics in the 1970s and this led to the publication of the first edition of this book in 1981. This also sparked an interest in the conceptual problems of the subject which form the focus of the last chapter of this book and of two others.

Jim Napolitano is Professor of Physics at Temple University, joining the faculty there in January 2014. His research field is Experimental Nuclear and Particle Physics, aiming primarily at studies of the fundamental interactions. Professor Napolitano has developed and taught courses for physics students of varying levels, at the College of William and Mary, Rensselaer Polytechnic Institute, and Temple University. He maintains an interest in modern instructional techniques, and has previously published two textbooks on advanced topics in physics.
I Waves, Electromagnetism, and the Limits of Classical Physics
CHAPTER 1

The Physics and Mathematics of Waves

CONTENTS
1.1 A Review of Simple Harmonic Motion
    Euler’s formula
1.2 The Stretched String Equation of Motion
    Principle of linear superposition
    Sinusoidal waves
1.3 Standing Waves, and Fourier Series
    String fixed at both ends
    Fourier series
1.4 The Fourier Transform
    The Dirac δ-function
1.5 Problems
This book aims to cover Quantum Mechanics thoroughly, at a level accessible to the serious undergraduate physics student. Consequently, it is important to establish immediately the prerequisite material with which the student should be familiar, in order to follow the arguments as we lay them out. This first chapter begins the coverage of that prerequisite material.

The first three chapters of this book establish both the mathematical prerequisites for the rest of the text and important physical and historical contexts in the development of Quantum Mechanics. We begin in this chapter by discussing strictly classical phenomena, namely vibrations and waves, using the mathematics critical for understanding so much of the physical world. In particular, we will derive, and explore the solutions of, the classical wave equation in one spatial and one time dimension. This will later have direct applicability to the Schrödinger wave equation of quantum mechanics.

Generally speaking, a wave is a “disturbance” that moves, over time, through space. A wave, such as a classical mechanical or electromagnetic wave, carries momentum and energy, while the quantum mechanical wavefunction can often be used to calculate probabilities. The equation for the wavefunction, namely Schrödinger’s wave equation, will be presented and solved in upcoming chapters. We shall then see how the basis of Schrödinger’s equation lies in the postulates of quantum mechanics.

For the present, however, we shall focus on classical waves, which are governed by an equation that is called the wave equation. This will be a good introduction to the mathematics and physical principles of waves. Our first example will be mechanical waves in one (spatial) dimension along a stretched string, where we can derive the wave equation from Newton’s Laws. The next chapter will show how Maxwell’s equations for electromagnetism in fact imply the existence of electromagnetic waves. Along the way, we will introduce mathematics that will be needed to understand these examples and which will also be used in our later study of quantum properties. Before discussing waves as such, we will review a basic concept of mechanics involving simple oscillation.

FIGURE 1.1 A simple mechanical system used to illustrate simple harmonic motion. A mass m is attached to a spring with stiffness constant k and the other end of the spring is attached to a fixed wall. The variable x measures the extension of the spring. That is, when x = 0, the spring is relaxed and the mass is at its equilibrium position.
1.1 A REVIEW OF SIMPLE HARMONIC MOTION
We begin with something that should already be familiar to most users of this textbook, namely the simple harmonic motion of a mass attached to an ideal spring. Figure 1.1 shows the system involved. The mass m moves in the horizontal, i.e. x, direction, acted on only by the force of the spring. (We can assume that the mass moves on the surface of a smooth horizontal table, which cancels any vertical gravitational force.) We define x = 0 as the equilibrium position so that the force exerted by the spring on the mass is F = −kx, where k is a constant. By Newton’s second law of motion, this force equals the mass times its acceleration, that is F = ma. With time denoted by t, this becomes

$$-kx = m\frac{d^2x}{dt^2} \tag{1.1}$$

since the acceleration is the second time derivative of position. This is called the equation of motion of the mass. In a standard notation, we define a quantity ω through $\omega^2 = k/m$, and the equation of motion becomes

$$\frac{d^2x}{dt^2} + \omega^2 x = 0 \tag{1.2}$$

Solving this differential equation means finding a function x(t) which satisfies the equation. The solution here is rather obvious. The function x(t) must have the property that when it is differentiated twice, you get the same function back, but now multiplied by $-\omega^2$. Indeed, two such functions come to mind, namely cos ωt and sin ωt. Furthermore, you could multiply each of these by an arbitrary constant, and, since the differential equation (1.2) is linear, any linear combination of cos ωt and sin ωt is also a solution. That is, the general solution can be written

$$x(t) = B\cos\omega t + C\sin\omega t \tag{1.3}$$

where B and C are determined from the initial conditions, namely the values that x and dx/dt are given at t = 0. The reader should check that (1.3) is a solution to (1.2) by substitution and carrying out the differentiations.

This kind of oscillatory motion is called “sinusoidal.” It is made of a linear combination of cosines and sines with one specific (angular) frequency ω, and this in turn can be expressed as a single cosine or sine function with the same frequency. (See the problems at the end of this chapter.) That the result comes out this way is an artifact of there being a single mass, with a single coordinate x which describes the motion. We say that there is only one degree of freedom. As we shall see, waves, on the other hand, have an infinite number of degrees of freedom, and their motion can be considerably more intricate.

It is worth pointing out that the equation of motion itself can be used to provide insights into the motion. For example, if we go back to (1.1), rearrange it a little, and multiply through by the velocity v = dx/dt, we have

$$m\frac{d^2x}{dt^2}\frac{dx}{dt} + kx\frac{dx}{dt} = m\frac{dv}{dt}v + kx\frac{dx}{dt} = \frac{d}{dt}\left[\frac{1}{2}mv^2 + \frac{1}{2}kx^2\right] = 0 \tag{1.4}$$

In other words, the quantity $mv^2/2 + kx^2/2 \equiv E$ does not change with time, that is, E is a conserved quantity. We call this quantity the total mechanical energy.
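The two claims of this section, that (1.3) satisfies (1.2) and that E is conserved, can be checked numerically. The following is only a minimal sketch in Python: the function names, the initial values x0 and v0, and the finite-difference step h are illustrative assumptions rather than anything defined in the text. It uses the fact that setting t = 0 in (1.3) and in its derivative gives B = x(0) and C = v(0)/ω.

```python
import math


def shm_position(t, x0, v0, omega):
    """General solution (1.3) with B = x0 and C = v0/omega (illustrative names)."""
    return x0 * math.cos(omega * t) + (v0 / omega) * math.sin(omega * t)


def shm_velocity(t, x0, v0, omega):
    """Time derivative of shm_position, obtained by differentiating (1.3)."""
    return -x0 * omega * math.sin(omega * t) + v0 * math.cos(omega * t)


def total_energy(t, x0, v0, m, k):
    """E = m v^2 / 2 + k x^2 / 2, the quantity shown to be conserved in (1.4)."""
    omega = math.sqrt(k / m)
    x = shm_position(t, x0, v0, omega)
    v = shm_velocity(t, x0, v0, omega)
    return 0.5 * m * v**2 + 0.5 * k * x**2


def equation_residual(t, x0, v0, omega, h=1e-4):
    """Central-difference estimate of d2x/dt2 + omega^2 x from (1.2).

    The result should be close to zero, limited only by the finite step h.
    """
    x_minus = shm_position(t - h, x0, v0, omega)
    x_mid = shm_position(t, x0, v0, omega)
    x_plus = shm_position(t + h, x0, v0, omega)
    acceleration = (x_plus - 2.0 * x_mid + x_minus) / h**2
    return acceleration + omega**2 * x_mid


if __name__ == "__main__":
    m, k = 0.5, 2.0      # assumed sample values, arbitrary units
    x0, v0 = 0.1, 0.3    # assumed initial position and velocity
    omega = math.sqrt(k / m)
    for t in (0.0, 0.4, 1.7, 5.0):
        print(t, total_energy(t, x0, v0, m, k), equation_residual(t, x0, v0, omega))
```

Running the script prints the energy and the residual at a few sample times; the energy should be the same at every time and the residual should be small compared with the individual terms in (1.2).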
Euler’s formula

We now take a diversion into mathematics with the aim of coming up with a formula that will make it much easier to work with both oscillatory and wave motion. Taylor’s theorem says that any sufficiently smooth function $f(x)$ can be expressed as

$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2!}f''(a)(x - a)^2 + \frac{1}{3!}f'''(a)(x - a)^3 + \cdots \tag{1.5}$$
where $f'(x)$ is the first derivative of $f(x)$ with respect to $x$, $f''(x)$ is the second derivative, and so forth. We say that $f(x)$ is “expanded” about the point $x = a$.

The Taylor expansion for the function $f(x) = e^x$ about $x = 0$ is particularly simple. Since $f'(x) = f(x)$ and $f(0) = 1$, we have

$$\exp(x) \equiv e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \frac{1}{5!}x^5 + \cdots \tag{1.6}$$

The expansions for $\cos x$ and $\sin x$ about $x = 0$ are also simple. Taking the derivative changes one into the other, but with an occasional sign change, and $\cos(0) = 1$ and $\sin(0) = 0$, so

$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots \tag{1.7a}$$

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \tag{1.7b}$$

Equation 1.7b, for example, is the basis for the commonly used approximation that $\sin\theta \approx \theta$ for $\theta \ll 1$.

So far we have been using only real numbers. Now, however, consider the function $\exp(z) \equiv e^z$ where $z$ might be complex, i.e. $z = x + iy$, where $i = \sqrt{-1}$. If you learned about the exponential function by studying natural logarithms, this might seem to be very peculiar. Nevertheless go ahead and interpret this function in terms of the expansion (1.6), and evaluate $\exp(ix)$ where $x$ is real. Then

$$\exp(ix) = 1 + ix - \frac{1}{2!}x^2 - i\frac{1}{3!}x^3 + \frac{1}{4!}x^4 + i\frac{1}{5!}x^5 + \cdots \tag{1.8}$$

Comparing expressions (1.7) and (1.8) we arrive at Euler’s Formula:

$$\exp(ix) = \cos x + i\sin x \tag{1.9}$$
This expression is very useful for formulating oscillatory and wave phenomena in terms of complex exponential functions. It may seem odd at first that we can use a complex expression to solve a purely real differential equation, but, in the case of classical waves, any physically allowed set of initial conditions always gives an appropriate, real solution. When we come to the quantum mechanical wavefunction later, we shall see that in many cases, the only allowed solutions are complex.

Worked Example 1.1 Solve the differential equation (1.2) using the ansatz (i.e. initial guessed solution) $x(t) = \exp(i\alpha t)$ to solve for $\alpha$. Use the initial conditions $x(0) = x_0$ and $v(0) = v_0$, and show that the result is a purely real function for $x(t)$.

Solution Since $\exp(i\alpha t)$ is never zero, substituting this into (1.2) gives $-\alpha^2 + \omega^2 = 0$ which has two solutions, namely $\alpha = \pm\omega$. The general solution is a linear combination of these two, i.e.

$$x(t) = A\exp(i\omega t) + B\exp(-i\omega t)$$

so

$$v(t) = \frac{dx}{dt} = i\omega A\exp(i\omega t) - i\omega B\exp(-i\omega t)$$

and the initial conditions give $A + B = x_0$ and $A - B = v_0/i\omega$. These are easily solved for $A$ and $B$, and, making use of Euler’s formula, we arrive at

$$x(t) = x_0\cos\omega t + \frac{v_0}{\omega}\sin\omega t$$

which indeed is a purely real expression.
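The two results above can be checked numerically. The sketch below sums the series (1.6) for an imaginary argument and compares it with $\cos x + i\sin x$, then builds the complex solution of Worked Example 1.1 from $A + B = x_0$ and $A - B = v_0/i\omega$ and confirms that its imaginary part vanishes. The function names and the sample numbers are assumptions made for illustration only.

```python
import cmath
import math

def exp_series(z, terms=30):
    """Partial sum of the Taylor series 1 + z + z^2/2! + ... for exp(z)."""
    total = 0j
    term = 1 + 0j          # current term z^n / n!
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total

# Euler's formula: series for exp(ix) versus cos x + i sin x
x = 0.7
lhs = exp_series(1j * x)
rhs = complex(math.cos(x), math.sin(x))

def x_complex(t, x0, v0, omega):
    """A exp(iwt) + B exp(-iwt) with A, B fixed by the initial conditions."""
    A = 0.5 * (x0 + v0 / (1j * omega))
    B = 0.5 * (x0 - v0 / (1j * omega))
    return A * cmath.exp(1j * omega * t) + B * cmath.exp(-1j * omega * t)

# Assumed sample values: t = 1.3, x0 = 0.3, v0 = -0.5, omega = 2.0
value = x_complex(1.3, 0.3, -0.5, 2.0)
expected = 0.3 * math.cos(2.6) + (-0.5 / 2.0) * math.sin(2.6)
print(abs(lhs - rhs), value, expected)
```

The printed complex value should have a negligible imaginary part and a real part equal to $x_0\cos\omega t + (v_0/\omega)\sin\omega t$.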
FIGURE 1.2 Section of a string under tension, slightly deformed, and variables used to derive its equation of motion. The vertical displacement of the string is the function y(x, t).
1.2 THE STRETCHED STRING EQUATION OF MOTION
It is straightforward to generalize the oscillation problem with a single degree of freedom to one with two (or more) degrees of freedom, such as a series of masses connected to each other by springs. One result is that the motion is no longer simply sinusoidal, but rather a linear combination of sinusoidal “normal modes.” (See the problems at the end of this chapter.) In fact, with some care, this model can be generalized to an infinite number of degrees of freedom by taking the limit for a very large number of very small masses.

A stretched string is a prime example of a classical system with an infinite number of degrees of freedom. Rather than treating it as an infinite series of small masses, however, we will apply Newton’s laws directly to the string. Figure 1.2 shows a small section of the string, which lies in the $xy$ plane. The string experiences a uniform tension $T$ and has a linear mass density $\mu$. The shape of the string at any given time $t$ is given by $y(x, t)$. Our first job is to use Newton’s laws to obtain the differential equation which governs $y(x, t)$.

We make the assumption that the string is never strongly deformed. That is, any tangent line to the string makes a small angle $\theta$ with respect to the $x$-axis. We now consider the vertical motion of a short piece of the string at an arbitrary fixed point $x$. The length of this piece is $\delta x$, so it has mass $\mu\,\delta x$. The net upward force on this piece is $T\sin\theta_2 - T\sin\theta_1 \approx T(\theta_2 - \theta_1) \approx T(\tan\theta_2 - \tan\theta_1)$ since the angles are small. Therefore, Newton’s second law takes the form

$$T(\tan\theta_2 - \tan\theta_1) = (\mu\,\delta x)\frac{\partial^2 y(x, t)}{\partial t^2} \tag{1.10}$$
This is just “F = ma” for the small piece of string, which is moving only vertically, by assumption. The partial derivative $\partial/\partial t$, applied twice, just means to take the derivative with respect to time, at one fixed position $x$. That is, $\partial^2 y/\partial t^2$ is the vertical acceleration of the piece of string.
Of course, we have made the tacit assumption that the piece of string is located at a single position $x$, although it has a horizontal length $\delta x$. Naturally, though, we are planning to consider the result in the limit $\delta x \to 0$. We’ll take that limit now.

As you know from introductory calculus, the derivative of a curve $y(x)$ with respect to $x$ gives you the slope of the tangent to the curve at position $x$. If the tangent line makes an angle $\theta$ with the $x$-axis, this slope is simply $\tan\theta$. Therefore, with a little rearranging, (1.10) becomes

$$\frac{1}{\delta x}\left[\left(\frac{\partial y}{\partial x}\right)_2 - \left(\frac{\partial y}{\partial x}\right)_1\right] = \frac{\mu}{T}\frac{\partial^2 y}{\partial t^2} \tag{1.11}$$

where we have again used the partial derivative notation, but this time to express the slope of the line at a fixed value of time. Now, if we take $\delta x \to 0$ for the left side of (1.11), we are just taking the derivative of the slope, that is, the second derivative of $y$ with respect to $x$. Making the definition $v^2 \equiv T/\mu$, and doing a little rearranging, we finally end up with

$$\frac{1}{v^2}\frac{\partial^2 y}{\partial t^2} - \frac{\partial^2 y}{\partial x^2} = 0 \tag{1.12}$$

Differential equation (1.12) is called the wave equation in one spatial dimension. Its solutions are (classical) waves, and this of course is not surprising. We are all familiar with what happens if you take a guitar string, stretch it, and then “pluck” it. We will now study the mathematical formulation of waves by looking at the solutions to (1.12).

We are tempted at first to consider sine and cosine solutions to this equation, but remember that we now have an infinite number of degrees of freedom. That is, each infinitesimal point along the string is considered an oscillator, coupled to the points just next door. We should therefore expect a solution that is much more general than simply sines or cosines. In fact, the general solution to (1.12) is amazingly simple. Any function of the form

$$y(x, t) = f(x - vt) + g(x + vt) \tag{1.13}$$

solves the equation, where $f(\xi)$ and $g(\xi)$ are arbitrary functions, so long as they can be differentiated twice.
The reader should check that (1.13) is a solution to (1.12) by substitution and carrying out the differentiations. Also the end-of-chapter problems include a more general derivation of this solution, albeit still not a particularly rigorous one.

Why is this kind of general solution considered a “wave”? See Figure 1.3, where the function $y(x, t = 0)$ shows an arbitrary shape, which might be described by a function $f(\xi)$ or $g(\xi)$. At some time $t$ later, the functional form is of course unchanged. However, it is now a function of $x \pm vt$ instead of just $x$. The figure illustrates what you see in the case where $g = 0$ and $v > 0$. The shape is just moved by a distance $vt$. That is, the curve appears to move with speed $v$ towards the right; $v$ is therefore known as the “wave speed.” Similarly, if $f = 0$, the shape described by $g(x + vt)$ would move to the left. This maintenance of the curve shape, while moving to the right or left, is the property that defines a “wave.”
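The substitution check suggested above can also be done numerically. The following sketch picks two arbitrary smooth shapes (Gaussian bumps, chosen purely as an assumption for illustration), forms $y = f(x - vt) + g(x + vt)$, and evaluates the left-hand side of (1.12) using central finite differences. The wave speed `V` and the step size `h` are likewise assumed values.

```python
import math

V = 2.0  # assumed wave speed, for illustration only

def f(xi):
    """Arbitrary twice-differentiable shape (rightward-moving part)."""
    return math.exp(-xi**2)

def g(xi):
    """A second arbitrary shape (leftward-moving part)."""
    return 0.5 * math.exp(-(xi - 1.0)**2)

def y(x, t):
    """General solution (1.13)."""
    return f(x - V * t) + g(x + V * t)

def wave_residual(x, t, h=1e-3):
    """Finite-difference estimate of (1/v^2) y_tt - y_xx, which should vanish."""
    d2t = (y(x, t + h) - 2 * y(x, t) + y(x, t - h)) / h**2
    d2x = (y(x + h, t) - 2 * y(x, t) + y(x - h, t)) / h**2
    return d2t / V**2 - d2x

residuals = [abs(wave_residual(x, t))
             for x in (-1.0, 0.0, 0.5, 2.0)
             for t in (0.0, 0.3, 1.0)]
print(max(residuals))
```

The residual is not exactly zero because finite differences are approximate, but it should be small compared with the individual second derivatives, which are of order one for these shapes.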
FIGURE 1.3 A representation of the solution expressed by Equation 1.13. The solid line plots an arbitrary function y(x). The dotted line, and axis, plot the same curve, but displaced to the right by an amount vt. That is, the curve moves to the right with a speed v.
Principle of linear superposition

The astute reader will notice that Equation (1.13) is actually the sum of two solutions, each of which satisfies the wave equation (1.12), and is an example of the principle of linear superposition. That is, if $y_1(x, t)$ and $y_2(x, t)$ are solutions of the wave equation, then $y(x, t) = a y_1(x, t) + b y_2(x, t)$ is also a solution, where $a$ and $b$ are arbitrary constants. (Proving this is left as a problem at the end of this chapter.) This is a general property of linear differential equations, that is, differential equations in which the dependent variable (i.e. $y(x, t)$) appears only linearly.

Worked Example 1.2 Suppose that at time $t = 0$, the string is stationary and has shape

$$y(x, 0) = h(x)$$

where $h(x)$ is some localized “bump.” Find a solution to the wave equation that satisfies the initial conditions, and describe the subsequent motion of the string.

Solution There are properties of differential equations that are important, and provable, but which we will not discuss in great detail in this book. One of these properties is “uniqueness,” namely that if a solution is found which satisfies the initial conditions, then it is the only solution. Therefore, if we can guess the right solution, then we are done. Consider the expression

$$y(x, t) = \frac{1}{2}h(x - vt) + \frac{1}{2}h(x + vt)$$

Obviously, $y(x, 0) = h(x)$. Also, the initial (vertical) velocity of the string at any point $x$ is

$$\left.\frac{\partial y}{\partial t}\right|_{t=0} = \left.-\frac{v}{2}h'(x - vt)\right|_{t=0} + \left.\frac{v}{2}h'(x + vt)\right|_{t=0} = -\frac{v}{2}h'(x) + \frac{v}{2}h'(x) = 0$$

where the prime denotes the derivative of $h$ with respect to its argument, and, indeed, the string is motionless at $t = 0$. The initial bump splits in half, with one piece moving to the left and the other to the right. See Figure 1.4 for an illustration of this solution. This probably agrees with what you would have guessed.
FIGURE 1.4 Graphic illustration of a solution to the wave equation, for a “bump” that is initially at rest. The shape splits in two with one half moving to the right, and the other half moving to the left.
Sinusoidal waves

We once again point out that there is nothing necessarily oscillatory about the solution (1.13) to the wave equation (1.12). Sinusoidal waves, however, are allowed particular solutions to the wave equation and we discuss these in this section. Later we will see how we can use the principle of superposition to express a general solution as a sum of sinusoidal waves.

Consider a special form of the wave (1.13), namely

$$y(x, t) = A\cos(kx - \omega t) \tag{1.14}$$

That is, $y(x, t) = f(x - vt)$ where $f(\xi) = A\cos(k\xi)$. Writing $kx - \omega t = k[x - (\omega/k)t]$, it is clear that the speed of this wave is $v = \omega/k$. Because it is a cosine function, this function is periodic in both position and time. At a fixed time, the distance between successive maxima is $2\pi/k$; this distance is called the wavelength and is usually denoted by $\lambda$. Moreover, at any fixed position $x$, the wave oscillates up and down with angular frequency $\omega$. That is, its frequency is $f = \omega/2\pi$. Therefore, for a sinusoidal wave, we have

$$\lambda = \frac{2\pi}{k}\ \text{(wavelength)} \qquad\text{and}\qquad f = \frac{\omega}{2\pi}\ \text{(frequency)} \tag{1.15}$$

and the wave speed $v = \omega/k = \lambda f$.

We can always add an arbitrary constant phase $\phi$ to (1.14) so that $y(x, t) = A\cos(kx - \omega t - \phi)$, and these conclusions are unchanged. For $\phi = \pi/2$, the wave is a sine function instead of a cosine function. Similarly, we can also always include an additive constant, that is $y(x, t) = A\cos(kx - \omega t - \phi) + B$. In principle, the constants $A$, $\phi$, and $B$ are determined from initial conditions.

Another approach to solving (1.12) is to use an ansatz based on Euler’s formula.
In other words, we write $y(x, t) = A\exp[i(kx - \omega t - \phi)]$ and substitute into (1.12) to get

$$-\frac{\omega^2}{v^2} + k^2 = 0 \qquad\text{or}\qquad \omega = \pm kv \tag{1.16}$$

That is, we can choose any value of $k$, but are then forced to choose either $\omega = +kv$ or $\omega = -kv$, though we can also use a linear combination of the two corresponding solutions.

The principle of linear superposition makes it clearer why sinusoidal waves are so important, even though they are a very special case of a waveform. We can show that it is possible to build arbitrary waves out of a superposition of sinusoidal waves. This has enormous practical importance. For example, the radio waves from any number of transmitting stations add together, but a tunable receiver is able to extract only waves with the frequency relevant to a particular station. We’ll come back to superposition many times when we are deeply into quantum mechanical waves. The remainder of this chapter will make full use of this concept and we will learn some more generally useful mathematics along the way.
1.3 STANDING WAVES, AND FOURIER SERIES
We have seen that the solution to the partial differential equation (1.12) can depend on the initial conditions, that is, the shape $y(x, t)$, or vertical velocity profile $\partial y/\partial t$, when $t = 0$. The solution may also depend on boundary conditions, that is, the value of $y(x, t)$, or of $\partial y/\partial x$, at some value of $x$ for all times.

Consider a very simple boundary condition, namely $y(x, t) = 0$ at $x = 0$ for all times. The simple way to realize this boundary condition is to fix the string so that it does not move at the point $x = 0$. For example, the string could be attached to a wall, so that it only exists when $x \geq 0$; this implies that solutions in the region $x < 0$ are nonphysical. We shall call any such solution a “ghost.”

In fact, it is quite easy to construct a solution that has these properties. Let’s pick a case where the wave is localized in some region where $x \gg 0$ with a shape $f(x)$ at $t = 0$, and is moving leftward, that is, towards $x = 0$. Then consider the quantity

$$y(x, t) = f(x + vt) - f(-(x - vt)) \tag{1.17}$$

This is the sum of two solutions to the wave equation, one leftward moving and one rightward moving. If we now relax the condition $x \gg 0$, we find that $y(0, t) = f(vt) - f(vt) = 0$ for all times $t$, so the boundary condition is satisfied. Furthermore, at $t = 0$, $y(x, 0) = f(x) - f(-x) = f(x)$ if we are only concerned with $x \gg 0$, which is consistent with our starting condition.

Figure 1.5 is a graphical representation of this solution. At $t = 0$, the solution is a localized shape in a region of positive $x$, moving leftward. There is also a ghost localized in the nonphysical region $x < 0$ and moving to the right. At some time $T$ later, these two solutions cross, cancelling each other at $x = 0$. After another time increment $T$, the ghost is moving to the left, and our physical solution is an inverted form of what we started out with, moving to the right.
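The reflection described above is easy to reproduce numerically. In the sketch below, the pulse shape (a Gaussian centered at $x = 5$) and the unit wave speed are assumptions chosen for illustration; the function `y` implements (1.17) directly.

```python
import math

V = 1.0  # assumed wave speed

def f(x):
    """Assumed initial pulse shape, localized well inside the region x > 0."""
    return math.exp(-(x - 5.0)**2)

def y(x, t):
    """Equation (1.17): leftward-moving pulse minus its rightward-moving ghost."""
    return f(x + V * t) - f(-(x - V * t))

# The fixed end stays put at all times
at_wall = [abs(y(0.0, 0.5 * i)) for i in range(40)]

# Height at x = 5 initially, and again after the pulse has travelled
# to the wall and back (a total distance of 10 at speed 1)
before = y(5.0, 0.0)
after = y(5.0, 10.0)
print(max(at_wall), before, after)
```

The pulse returns to its starting position with its sign flipped, which is the inversion shown in the last panel of Figure 1.5.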
FIGURE 1.5 Graphic illustration of a solution to the wave equation which meets a boundary condition at x = 0, namely that the string remains motionless. (Panels show y versus x at t = 0, t = T, and t = 2T.)
In other words, the wave pulse encounters the fixed point at $x = 0$, inverts and moves back. Anyone who has played with a string tied to a wall has observed this phenomenon.

Suppose instead that the end of the string is free, and the boundary condition there is $\partial y/\partial x = 0$ at $x = 0$ for all times. In this case

$$y(x, t) = f(x + vt) + f(-(x - vt)) \tag{1.18}$$
You should prove to yourself that the derivative with respect to x remains zero at all times for x = 0. The behaviour of the string is very similar to that with the end fixed, but this time the pulse shape does not invert.
String fixed at both ends

The general behaviour of a string fixed at both ends follows from the above: a pulse would move towards one end, reflect and invert to move in the other direction, and then reflect and invert from the other end. In this case, however, we can get additional important insights by considering sinusoidal solutions to the wave equation.

Let the string be fixed at $x = \pm a$, and find a solution $y(x, t)$ where $y(x = \pm a, t) = 0$ at all times $t$. If we think about making a solution out of linear combinations of $\exp[\pm i(kx - \omega t)]$, where $\omega^2 = k^2 v^2$, then we are led to a neat way to meet the “all times” constraint. Add a rightward-moving wave $B\exp[i(kx - \omega t)]$ and a leftward-moving wave $C\exp[-i(kx + \omega t)]$. This allows us to factor out all the time dependence as $\exp[-i\omega t]$, leaving $B\exp[ikx] + C\exp[-ikx]$. Forcing the spatial part to be real (up to an overall constant factor), we are naturally led to $C = B$, that is a $\cos kx$ dependence, or $C = -B$ with a $\sin kx$ dependence. Furthermore, adding this linear combination to its complex conjugate gives a real time dependence. In other words, we are led to the solution

$$y(x, t) = A\cos(kx)\cos(\omega t) \qquad\text{or}\qquad y(x, t) = A\sin(kx)\cos(\omega t) \tag{1.19}$$
(To be sure, these are the choices that force the string to be standing still at $t = 0$. More generally, we have another two solutions with $\cos(\omega t)$ replaced by $\sin(\omega t)$, but in the interest of brevity, we will not discuss these here.) These waves move neither left nor right. They are called standing waves, but of course we built them out of a linear combination of waves moving both left and right. It is easy to recover that linear combination with the waves in this form, using trigonometric identities to express $\cos(kx \pm \omega t)$ and $\sin(kx \pm \omega t)$ in terms of the quantities on the right-hand side of (1.19).

However, these solutions do not yet necessarily fulfill the boundary condition $y(\pm a, t) = 0$. A glance at (1.19) shows that this is simple to arrange. All we need to do is set $\cos(ka) = 0$ or $\sin(ka) = 0$. In other words, the boundary conditions put restrictions on the allowed values of the wave number $k$, namely $ka = \pi/2, 3\pi/2, 5\pi/2, \ldots$ or $ka = \pi, 2\pi, 3\pi, \ldots$. These conditions are nicely combined by writing

$$k = \frac{n\pi}{2a} \qquad n = 1, 2, 3, \ldots \tag{1.20}$$

The integer $n$ will go by various names, depending on the physical situation involved. For now, we can just call it a “mode number.” Remember that the shape of the string is a cosine function for odd $n$ and a sine function for even $n$.

What is the motion of the string as described by the standing wave solutions (1.19), including the constraints (1.20) enforced by the boundary conditions? For any mode number, the shape is just a cosine or sine function, whose “amplitude” is $A\cos(\omega t)$. That is, the height of the string oscillates with period $T = 2\pi/\omega$: it is maximized at $t = 0$, the string is flat at $t = \pi/2\omega$, and at $t = \pi/\omega$ the shape is flipped from its initial form. The wavelength is $\lambda = 2\pi/k$, so each mode number corresponds to a different wavelength $\lambda = 4a/n = 2L/n$ where $L = 2a$ is the length of the string.
This motion is described in Figure 1.6.

It is very important to remember that the standing wave solutions represented by (1.19) with (1.20) are very special. As sinusoidal functions of both space and time, they each have a single defined frequency and wavelength. The prefix eigen- is often applied to such solutions, with words like eigenmode and eigenfrequency.

These solutions certainly do not cover all cases. Imagine, for example, that we “pluck” the string by pulling it vertically at $x = 0$, and then let it go. Its shape at $t = 0$ is a triangle, and not one of our cosine or sine curves. In order to understand the motion of the string in this case, we need to take a detour and introduce some more mathematics.
FIGURE 1.6 Modes of a vibrating string, each with a fixed wavelength and frequency. The left shows the shape at t = 0 for the first three mode numbers (n = 1, 2, 3), while the right shows the shape of the n = 2 mode at t = 0, at one eighth of a period (t = T/8), and at half a period (t = T/2). The horizontal axis runs from −a to a.
Fourier series

The functions $\sin(kx)$ and $\cos(kx)$ form a complete set of functions, which means that any periodic function of $x$ can be described by a linear combination of them, with values of $k$ chosen to satisfy the appropriate boundary conditions. This statement can be proved rigorously, but we will not attempt to do so here. Nevertheless, armed with this concept, we can follow our nose and discover a powerful technique known as the Fourier series.

Consider a periodic function $f(x)$ with period $R$, that is $f(x + R) = f(x)$ for all values of $x$. The functions $\sin(kx)$ and $\cos(kx)$ have periods $R/n$, where $n$ is an integer, so long as $kR = 2n\pi$. The Fourier conjecture is that the appropriate sum of these functions gives $f(x)$, that is

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos\left(\frac{2n\pi x}{R}\right) + \sum_{n=1}^{\infty} b_n\sin\left(\frac{2n\pi x}{R}\right) \tag{1.21}$$

The constant $a_0/2$ clearly represents the amount by which the average value of $f(x)$ differs from zero. In fact, let us take the average value of both sides of this equation. This is just the integral over one period, divided by the period. Each cosine and sine term will average to zero, leaving only the constant $a_0/2$, so

$$\frac{1}{R}\int_{-R/2}^{R/2} f(x)\,dx = \frac{a_0}{2} \qquad\text{so}\qquad a_0 = \frac{2}{R}\int_{-R/2}^{R/2} f(x)\,dx \tag{1.22}$$

and we have determined the first coefficient of the expansion (1.21).

Although this is a very modest start, the mechanism for determining the other terms follows quite logically. Multiply through (1.21) by either $\cos(2m\pi x/R)$ or $\sin(2m\pi x/R)$, for some integer $m$, and then average over a period by integrating. The trigonometric identities

$$\cos(A \pm B) = \cos(A)\cos(B) \mp \sin(A)\sin(B) \tag{1.23a}$$

$$\sin(A \pm B) = \sin(A)\cos(B) \pm \cos(A)\sin(B) \tag{1.23b}$$
make it clear that a product of cosines or sines is equivalent to a sum of two cosine or sine functions, with a period that is also some integer fraction of $R$. These will therefore integrate to zero, unless $m = n$, leaving only one term on the right-hand side of (1.21). Making use of

$$\frac{1}{R}\int_{-R/2}^{R/2}\cos^2\left(\frac{2n\pi x}{R}\right)dx = \frac{1}{R}\int_{-R/2}^{R/2}\sin^2\left(\frac{2n\pi x}{R}\right)dx = \frac{1}{2} \tag{1.24}$$

we derive the succinct formulas

$$a_n = \frac{2}{R}\int_{-R/2}^{R/2} f(x)\cos\left(\frac{2n\pi x}{R}\right)dx \qquad n = 0, 1, 2, \ldots \tag{1.25a}$$

and

$$b_n = \frac{2}{R}\int_{-R/2}^{R/2} f(x)\sin\left(\frac{2n\pi x}{R}\right)dx \qquad n = 0, 1, 2, \ldots \tag{1.25b}$$
Of course, you should work through these steps yourself. Notice that we have extended the value of $n$ to include $n = 0$, thanks to our original form of (1.21), and that (1.25b) gives $b_0 = 0$. That is, with the understanding that the $n = 0$ term enters as $a_0/2$, we can write

$$f(x) = \sum_{n=0}^{\infty}\left[a_n\cos(k_n x) + b_n\sin(k_n x)\right] \tag{1.26}$$
where we introduce the notation $k_n = 2n\pi/R$.

Now we can apply this formalism to standing waves on a stretched string that is fixed at both ends. First, the string has length $L = 2a$ and (1.20) tells us that $R = 2L$ if we are to make use of the right set of cosine and sine functions to describe this problem. In other words, we picture the string as one-half a period of an infinite system, to ensure the boundary conditions on either end for all values of $n$. This means that the allowed values of $k$ must be those given by (1.20), so $k = k_n = 2n\pi/R = n\pi/2a$ and $\omega = \omega_n = k_n v$. If we form the linear combination of all of these solutions, we get

$$y(x, t) = \sum_{n=0}^{\infty}\left[a_n\cos(k_n x) + b_n\sin(k_n x)\right]\cos(\omega_n t) \tag{1.27}$$

and the shape of the string $f(x) = y(x, 0)$ at $t = 0$ determines the $a_n$ and $b_n$. Then (1.27) gives the shape of the string at all times. Note that the initial shape is not maintained, because the effective Fourier coefficients at later times are given by $a_n\cos(\omega_n t)$ and $b_n\cos(\omega_n t)$ which vary differently with time for different mode numbers $n$.

Worked Example 1.3 A string is stretched under tension, so that its wave speed is $v$, and fixed at the points $x = \pm a$. It is “plucked” by pulling up at the center, and then released from rest. Find the motion of the string as a function of time.

Solution Figure 1.7 shows the shape of the string at $t = 0$. We need to find the Fourier components $a_n$ and $b_n$ for this function. Adapting Equations (1.25) to this physical situation, we have $R = 4a$ and integrate over one period from $-2a$ to $+2a$. Recognizing that the initial shape is an even function of $x$, this gives $b_n = 0$ and, for a pluck of unit height continued as $1 - |x|/a$ over the full period,

$$a_n = 2\times\frac{2}{R}\int_0^{2a}\left(1 - \frac{x}{a}\right)\cos\left(\frac{2n\pi x}{R}\right)dx = \frac{4}{n^2\pi^2}\left[1 - (-1)^n\right]$$
Inserting this into (1.27) for different values of $t$ produces the curves on the right in the figure. We see that the string oscillates as a trapezoid. (The rounded corners are an artifact that results from the fact that we truncated the (infinite) Fourier series at $n = 21$.)

FIGURE 1.7 On the left is the initial shape of a “plucked” stretched string. The plot on the right shows the shape of the string at different times, each a fraction of a full period.
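The coefficients in Worked Example 1.3 can be cross-checked by evaluating (1.25a) numerically and then summing (1.27) up to $n = 21$. The sketch below assumes a unit pluck height, $a = 1$, $v = 1$, and the continuation $1 - |x|/a$ of the triangle over the full period $-2a \le x \le 2a$; these choices, the midpoint-rule integrator, and all function names are illustrative assumptions rather than anything prescribed by the text.

```python
import math

A_HALF = 1.0        # assumed half-length a of the string
R = 4 * A_HALF      # period of the equivalent periodic system, R = 4a

def shape(x):
    """Assumed initial shape over one full period: 1 - |x|/a (unit height)."""
    return 1.0 - abs(x) / A_HALF

def a_n_numeric(n, steps=20000):
    """Midpoint-rule estimate of (2/R) * integral of f(x) cos(2 n pi x / R)."""
    dx = R / steps
    total = 0.0
    for i in range(steps):
        x = -R / 2 + (i + 0.5) * dx
        total += shape(x) * math.cos(2 * n * math.pi * x / R)
    return 2.0 / R * total * dx

def a_n_formula(n):
    """Closed form 4 [1 - (-1)^n] / (n pi)^2 from the worked example."""
    return 4.0 / (n * math.pi)**2 * (1 - (-1)**n)

def string_shape(x, t, v=1.0, n_max=21):
    """Truncated sum (1.27) with b_n = 0 and omega_n = k_n v."""
    total = 0.0
    for n in range(1, n_max + 1):
        k = 2 * n * math.pi / R
        total += a_n_formula(n) * math.cos(k * x) * math.cos(k * v * t)
    return total

for n in range(1, 6):
    print(n, a_n_numeric(n), a_n_formula(n))
print(string_shape(0.0, 0.0), string_shape(0.0, 2.0), string_shape(A_HALF, 0.7))
```

Only odd $n$ contribute, the fixed points at $x = \pm a$ stay at zero, and half a period later ($t = 2a/v$) the shape at the center is inverted. The truncated sum slightly undershoots the peak height, which is the rounding of corners mentioned above.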
It is instructive and useful, as we will see shortly, to work out the Fourier decomposition using Euler’s formula. Writing (1.26) instead as

$$f(x) = \sum_{n=-\infty}^{\infty} c_n\exp(ik_n x) \tag{1.28}$$

where the $c_n$ are complex, embodies the same mathematics. It is easy to see from Euler’s formula that $a_n = c_n + c_{-n}$ and $b_n = i(c_n - c_{-n})$. With (1.28), however, we don’t need to resort to trigonometric identities to find the $c_n$. Recall that $k_n = 2n\pi/R$. Since $\exp(2in\pi) = 1$ for all integers $n$, multiplying each side by $\exp(-ik_m x)$ and integrating over $-R/2 \leq x \leq R/2$ gives zero unless $m = n$. Therefore

$$c_n = \frac{1}{R}\int_{-R/2}^{R/2} f(x)\exp(-ik_n x)\,dx \tag{1.29}$$

and Equations (1.25) are recovered by re-applying Euler’s formula. Notice that it is easy to show, from (1.29), that if $f(x)$ is a real function, then $c_{-n} = c_n^*$.
1.4 THE FOURIER TRANSFORM
We are able to use the Fourier expansion to solve the problem of a string fixed at both ends, because it is equivalent to an infinitely long string that behaves periodically, with a period equal to twice the length of the fixed string. Let’s try to adapt this thinking to an infinitely long string without periodicity. A natural application might be to the propagation along the string of a pulse of some arbitrary shape.
We seek to modify Equations (1.28) and (1.29) in the limit $R \to \infty$. Firstly, note that $k_n = 2n\pi/R$ becomes a continuous variable, so let’s just write it as $k$. Next, the integral in (1.29) will be finite for any localized function $f(x)$; this means that $c_n \to 0$ as $R \to \infty$; however, the product $Rc_n$ remains finite. Now consider (1.28). The quantity $2\pi/R$ is really $\Delta k$, that is, the difference between successive $k$ values. Therefore we can write

$$f(x) = \frac{R}{2\pi}\sum_{n=-\infty}^{\infty} c_n\exp(ik_n x)\,\Delta k \;\xrightarrow{R\to\infty}\; \frac{1}{2\pi}\int_{-\infty}^{\infty}(Rc_n)\exp(ikx)\,dk \tag{1.30}$$

We define $a(k) = (Rc_n)/\sqrt{2\pi}$ and write

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} a(k)\exp(ikx)\,dk \tag{1.31a}$$

as the limiting form of Equation (1.28), and

$$a(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\exp(-ikx)\,dx \tag{1.31b}$$
as the limiting form of Equation (1.29). Equations (1.31) together are known as the Fourier transform. Given a function $f(x)$, we typically refer to $a(k)$ as the Fourier transform of $f(x)$. Note that we have followed a standard convention by “splitting” the factor of $2\pi$ between the two formulas.

Just as in the case of the Fourier series for a periodic function, where the values of $c_n$ define the relative importance of the terms in the Fourier series, the Fourier transform gives the relative importance of different components in the “reciprocal space” defined as that spanned by the variable $k$. The function $a(k)$ is often termed a “spectral decomposition” of $f(x)$. If $x$ measures position, then it is a spectrum in inverse wavelengths. If $x$ measures time, then it is a frequency spectrum. A specific example should help make things clearer.

Worked Example 1.4 A function $f(x)$ is zero everywhere except for $-\alpha < x < +\alpha$ where it has the value $1/2\alpha$. (The integral of $f(x)$ over $-\infty < x < +\infty$ is unity.) Find the Fourier transform of $f(x)$ and discuss the spectral behaviour in terms of $\alpha$.

Solution Following (1.31b) we have

$$a(k) = \frac{1}{\sqrt{2\pi}}\int_{-\alpha}^{\alpha}\frac{1}{2\alpha}\exp(-ikx)\,dx = \frac{1}{\sqrt{2\pi}}\frac{1}{2ik\alpha}\left[\exp(ik\alpha) - \exp(-ik\alpha)\right] = \frac{1}{\sqrt{2\pi}}\frac{\sin k\alpha}{k\alpha}$$

Figure 1.8 shows the functions $f(x)$ and $a(k)$. The spectral strength is primarily in the region where $-\pi \lesssim k\alpha \lesssim \pi$. In other words, the wave components that mainly contribute to $f(x)$ are centered at $k = 0$ with a width $\Delta k \sim 2\pi/\alpha$. The function $f(x)$, of course, has width $\Delta x = 2\alpha$, so the product $\Delta x\,\Delta k \sim 4\pi$, that is, independent of $\alpha$.
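The closed form found in Worked Example 1.4 can be compared against a direct numerical evaluation of (1.31b). In this sketch the midpoint-rule integrator, the value $\alpha = 0.7$, and the function names are assumptions made for illustration.

```python
import cmath
import math

def a_numeric(k, alpha, steps=4000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral of f(x) exp(-ikx) dx,
    where f(x) = 1/(2 alpha) on (-alpha, alpha) and zero elsewhere."""
    dx = 2 * alpha / steps
    total = 0j
    for i in range(steps):
        x = -alpha + (i + 0.5) * dx
        total += (1.0 / (2 * alpha)) * cmath.exp(-1j * k * x)
    return total * dx / math.sqrt(2 * math.pi)

def a_exact(k, alpha):
    """Closed form sin(k alpha) / (k alpha) / sqrt(2 pi), with the k -> 0 limit."""
    if k == 0:
        return 1.0 / math.sqrt(2 * math.pi)
    return math.sin(k * alpha) / (k * alpha) / math.sqrt(2 * math.pi)

alpha = 0.7  # assumed half-width of the pulse
for k in (0.0, 0.5, 3.0, 10.0):
    print(k, a_numeric(k, alpha), a_exact(k, alpha))

# The first zero of a(k) sits at k = pi / alpha, so halving alpha doubles the width in k
first_zero = math.pi / alpha
print(first_zero, a_exact(first_zero, alpha))
```

The numerical transform should be real to within rounding, because the pulse is an even function of $x$, and it should agree with the closed form at each sampled $k$.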
This relationship for Fourier transforms is rather general. That is, for a localized function $f(x)$ with width $\Delta x$, the Fourier transform $a(k)$ will have a width $\Delta k$ such that $\Delta x\,\Delta k \sim 1$. The exact value for the product will depend on the specific functions involved, as well as whatever definition we have for “width,” but the point is that the result is of order unity and does not depend on the value of specific parameters used to define $f(x)$. We will revisit this concept later in this book when we discuss the Heisenberg uncertainty principle.

FIGURE 1.8 A square pulse f(x), left, and its Fourier transform a(k), right.

As another example, consider a completely “delocalized” function $f(x)$, namely a constant, so that $f(x) = A$ for all $x$. Its Fourier transform is

$$a(k) = \frac{A}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-ikx)\,dx \tag{1.32}$$

which is a most peculiar function. For any nonzero value of $k$, it is just an integral over an infinite number of cosine and sine functions, cancelling itself out over every period. That is, $a(k) = 0$ for $k \neq 0$. On the other hand, when $k = 0$, the integrand is just unity, so $a(k)$ is infinitely large. A completely delocalized function $f(x)$ has the most localized possible Fourier transform $a(k)$! This is in keeping with the reciprocity of the widths of the functions $f(x)$ and $a(k)$, as noted above.

We can, however, be more quantitative about the function $a(k)$ in (1.32), rather than just say it is infinite at $k = 0$. Consider its integral over a range covering $k = 0$, that is, $-\kappa < k < +\kappa$ for some positive value $\kappa$:

$$\int_{-\kappa}^{\kappa} a(k)\,dk = \frac{A}{\sqrt{2\pi}}\int_{-\kappa}^{\kappa}\int_{-\infty}^{\infty}\exp(-ikx)\,dx\,dk = \frac{2A}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{\sin\kappa x}{\kappa x}\,d(\kappa x) = A\sqrt{2\pi} \tag{1.33}$$

where we make use of the known integral $\int_{-\infty}^{\infty}[\sin(u)/u]\,du = \pi$. In other words, even though this $a(k)$ is infinitely high and has zero width, it has a finite area.
The Dirac δ-function

This last example introduces the concept of the function $\delta(x)$, named after the British physicist Paul Dirac. This function is defined by the property that it is zero everywhere that $x \neq 0$, yet “large enough” at $x = 0$ so that

$$\int_{-\infty}^{\infty}\delta(x)\,dx = 1 \tag{1.34}$$

Think of the Dirac δ-function as an infinitely high, infinitely narrow “spike” that has unit area. We will find a lot of use for this function in physics.

We have just seen that one way to write the δ-function is

$$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(ikx)\,dk \tag{1.35}$$

There are many other forms. Perhaps the easiest forms for visualizing $\delta(x)$ are limits of “spike-like” functions with finite width and height, but with unit area, as the width goes to zero and the height goes to infinity. For example, in terms of the Gaussian function,

$$\delta(x) = \lim_{\sigma\to 0}\left[\frac{1}{\sigma\sqrt{2\pi}}\exp(-x^2/2\sigma^2)\right] \tag{1.36}$$

Another useful example, although harder to write analytically, is the limit as $a \to 0$ of a square pulse with width $2a$ and height $1/2a$ (such as the left side of Figure 1.8).

The δ-function has many interesting and useful properties. For example

$$\int_{-\infty}^{\infty} f(x)\delta(x)\,dx = f(0) \tag{1.37}$$

for any function $f(x)$. This is easy to see: since $\delta(x)$ is nonzero only at $x = 0$, you can just “factor out” $f(0)$ before doing the integral. We will make use of the Dirac δ-function at various points in this book, and at times point to its properties as well as to its oddities.
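Property (1.37) can be explored numerically by using the Gaussian of (1.36) with a small but finite width. In the sketch below, the integration window of $\pm 8\sigma$, the midpoint rule, and the test function $\cos x$ are assumptions chosen for illustration. For this particular test function the integral is known exactly, $\exp(-\sigma^2/2)$, which tends to $f(0) = 1$ as $\sigma \to 0$.

```python
import math

def gaussian(x, sigma):
    """Unit-area Gaussian of width sigma, the bracketed function in (1.36)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def sift(f, sigma, steps=4000):
    """Midpoint-rule estimate of the integral of f(x) * gaussian(x, sigma),
    taken over -8 sigma < x < 8 sigma (the tails beyond are negligible)."""
    half_width = 8.0 * sigma
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        total += f(x) * gaussian(x, sigma)
    return total * dx

# Unit area, independent of the width
area = sift(lambda x: 1.0, 0.1)

# Sifting: the result approaches f(0) = cos(0) = 1 as sigma shrinks
wide = sift(math.cos, 1.0)
narrow = sift(math.cos, 0.01)
print(area, wide, narrow)
```

As $\sigma$ decreases, the weighted integral picks out the value of the function at the origin, which is the “factor out $f(0)$” argument in numerical form.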
1.5 PROBLEMS
1.1 Use Euler’s formula to find a purely real expression for ii . 1.2 Show that (1.3) can be written as x(t) = A cos(ωt + φ) and derive expressions for A and φ in terms of B and C. Assuming the oscillator starts out at position x(0) = x0 with velocity v(0) = v0 , determine A and φ in terms of x0 and v0 . Note: We call A the amplitude and φ the phase of the oscillation. 1.3 A spring with stiffness k hangs vertically from point on the ceiling. A mass m is attached to the lower end of the spring without stretching it, and then is released from rest. Show that when the gravitational force mg is taken into account, the motion is still sinusoidal with ω = (k/m)1/2 but with an equilibrium position shifted to a lower point. Find the new equilibrium position in terms of m, k, and g. 1.4 Consider a system of a mass and spring, such as in Figure 1.1, but with an additional force Fdamp = −bv proportional to velocity but acting in the direction opposite to the motion. Reformulate the equation of motion, and find the solution for x(0) = x0 and v(0) = v0 . Use the ansatz x(t) = exp(iαt) to solve for α. You may assume that b2 /m2 is less than 4k/m. 1.5 Two equal masses m move in one dimension and are each connected to fixed walls by springs with stiffness k. The masses are also connected to each other by a third, identical spring, as shown:
[Figure: spring k, mass m, spring k, mass m, spring k, arranged in a line between the two fixed walls]
Write the (differential) equations of motion for the positions x₁(t) and x₂(t) of the two masses. Solve those equations with the ansatz x₁(t) = A₁ exp(iαt) and x₂(t) = A₂ exp(iαt); you will discover nontrivial solutions only for two values of ω² = α². (Those two values are called eigenfrequencies.) What kind of motion corresponds to each of these two eigenfrequencies?
1.6 Find the eigenfrequencies for the two-mass, two-spring system shown here:
[Figure: two-mass, two-spring system; labels in order along the line: k, 3m, k, 2m]
1.7 For the two-mass, three-spring system discussed in Problem 1.5, find expressions for x₁(t) and x₂(t) subject to the initial conditions x₁(0) = A and x₂(0) = v₁(0) = v₂(0) = 0. Make a plot of x₁(t) and x₂(t), and also plot the quantities x₁(t) + x₂(t) and x₁(t) − x₂(t). Comment on your observations.
1.8 Repeat Problem 1.7, but this time let the “coupling” spring between the two masses have a spring constant k_c = k/100. Show that the overall motion “oscillates” between a case where the first mass is in simple harmonic motion by itself and one where the second mass is in simple harmonic motion, and then back again. What is the frequency of these low-frequency oscillations between the two masses?
1.9 Derive the solution (1.13) to the wave equation (1.12) by going through the following steps. Consider a change of variables from x and t to ξ = x − vt and η = x + vt. Then use the chain rule to rewrite the wave equation in terms of ξ and η. You should find that
$$\frac{\partial^2 y}{\partial\xi\,\partial\eta} = 0$$
Then argue that this means that y is a function of either ξ or η, but not both at the same time. In other words, the solution is (1.13). If you are not familiar with the chain rule for partial differentiation, it means that if w and z are functions of x and y, then
$$\frac{\partial}{\partial x} f(x, y) = \frac{\partial f}{\partial w}\frac{\partial w}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}$$
and similarly for ∂/∂y. You can assume that you get the same result regardless of the order in which the partial derivatives are taken.
1.10 Prove the principle of linear superposition for the wave equation (1.12). That is, show that if y₁(x, t) and y₂(x, t) are solutions of the wave equation, then y(x, t) = a y₁(x, t) + b y₂(x, t) is also a solution, where a and b are arbitrary constants.
1.11 A string with linear mass density µ hangs motionless between two fixed points (x, y) = (±a, 0), where y measures the vertical direction. The length of the string is greater than 2a, so the lowest point is at (x, y) = (0, b). Derive the differential equation that describes the shape of the string, and solve it for y(x) in terms of a, b, and the acceleration g due to gravity. Unlike our derivation of the wave equation, do not make the “small displacement” assumption. Note that the shape, called a catenary, is not a parabola.
1.12 Show that the standing wave solutions (1.19) are linear combinations of the traveling wave solutions cos[kx ± ωt] and sin[kx ± ωt].
1.13 A function f(x) is periodic, such that f(x + 2) = f(x). For −1 < x < 0, f(x) = −1, and for 0 < x < 1, f(x) = +1. Find the first five terms of the Fourier expansion for f(x), and make a plot of the approximations based on the first term, and the sums up to the third and fifth terms, along with a plot of f(x) itself.
1.14 The G-string on a standard guitar vibrates at 196 Hz. On one particular guitar, this string is 60 cm long and has a mass of 3.1 grams per meter. Calculate the tension on the string, both in newtons and pounds.
1.15 Consider a string with an initial shape similar to that shown in Figure 1.7, but plucked at a position x = a/2 instead of x = 0.
Find the motion of the string in this case. You might find a program like Mathematica particularly useful to carry out the calculation of the Fourier components, as well as to produce an animation of the string’s motion.
1.16 Find the Fourier transform of the (normalized) Gaussian function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp(-x^2/2\sigma^2)$$
(You will need to “complete the square” of the exponent in the integrand to carry out the integration.) Using σ as a measure of the “width” ∆x of f(x), propose an analogous quantity to express the width ∆k of the Fourier transform, and evaluate ∆x∆k.
1.17 The “width” of a localized function can have a precise definition, based in fact on the concept of standard deviation. For some normalized distribution f(z), we can define the width ∆z by
$$(\Delta z)^2 = \langle z^2\rangle - \langle z\rangle^2, \qquad \langle z\rangle = \int_{-\infty}^{\infty} z f(z)\,dz \quad\text{and}\quad \langle z^2\rangle = \int_{-\infty}^{\infty} z^2 f(z)\,dz \tag{1.38}$$
Use this definition to find the width ∆k of the distribution function a(k) in Figure 1.8. Are you surprised?
1.18 Consider a triangular function f(x) that forms a straight line from the point at x = 0 to the x-axis at both x = ±α and is zero otherwise. Find the Fourier transform. Then find the width of both f(x) and of its Fourier transform, using the definition (1.38) based on standard deviation.
1.19 Evaluate the integral
$$\int_{-\infty}^{\infty} \delta(ax)\,dx$$
where a is a positive constant.
1.20 The “step function” H(x) is defined so that H(x) = 0 for x < 0, and H(x) = 1 for x > 0. Show that dH/dx has the correct properties to claim that dH/dx = δ(x).
CHAPTER 2
Maxwell’s Equations and Electromagnetic Waves

CONTENTS
    Notation for working in three dimensions
    On the question of units in electromagnetism
2.1 Maxwell’s Equations as Integrals
    Fields, forces, and flux
    Application: the resistor
2.2 Surface Theorems in Vector Calculus
2.3 Maxwell’s Equations as Derivatives
    The electrostatic and magnetic vector potentials
    Charge conservation and the continuity equation
    Application: the betatron
2.4 Electromagnetic Waves
2.5 Electromagnetic Radiation
    Thomson scattering
2.6 Problems

In this chapter we move on to discuss classical waves in three spatial dimensions. Our vehicle for this is classical electromagnetism and the notion of a field in a mathematical and physical sense. Working with fields leads us into the mathematics of vector calculus as we move from one to three spatial dimensions. Several theorems of vector calculus will prove useful here, in quantum mechanics, and in other areas of theoretical physics. Electricity and magnetism were seen as separate physical phenomena until the late nineteenth century when James Clerk Maxwell made a crucial addition to the work done by Michael Faraday among others. This provided a unified account of the two phenomena in a single theory that soon became known as electromagnetism, and
its appreciation led almost directly to special relativity. Later, its incorporation into quantum mechanics led to new phenomena and modern applications.
Notation for working in three dimensions
The notation used for describing three-dimensional space is not universally standard, and the following notation is used throughout this book. A point in space can be located by three orthogonal coordinates x, y, and z, or by the vector r drawn from the origin to the point. The magnitude |r| = r is always taken as positive, with r² = x² + y² + z². We write¹ x₀, y₀, and z₀ for dimensionless unit vectors in the directions of the x, y, and z axes. That is, x₀ · x₀ = y₀ · y₀ = z₀ · z₀ = 1, x₀ · y₀ = y₀ · z₀ = z₀ · x₀ = 0, and r = x x₀ + y y₀ + z z₀. This convention is called the Cartesian coordinate system. One can also locate a point in space using r and two angles θ and φ, and this is called the spherical polar coordinate system. The details of this coordinate system are left for a later discussion; see Section 6.3.
On the question of units in electromagnetism
Students of physics are almost always faced with having to deal with electromagnetism in two different systems of units. These are the SI (for Système International) system, and the Gaussian system. This textbook is written strictly in the SI system, but many graduate level texts on quantum mechanics use the Gaussian system.
The SI system, also known as MKSA for meter, kilogram, second, ampère, uses those four “base units” as its starting point. The key to appreciating its use in electromagnetism is the fourth base unit, the ampère. This is defined as the amount of electric current that results in a force of 2 × 10⁻⁷ newton per meter of length between two long parallel wires separated by one meter. The details do not matter to us here, but it can be shown that it leads to Coulomb’s law for the magnitude of the force between two point charges q₁ and q₂ separated by a distance r in vacuum: namely
$$F_{\rm Coulomb} = \frac{1}{4\pi\epsilon_0}\frac{q_1 q_2}{r^2} \qquad \text{(SI Units)} \tag{2.1}$$
where ε₀ is a constant known as the permittivity of free space and has the value 8.854187817... × 10⁻¹² farads per meter. The derived unit of charge is called the coulomb, which is defined so that one coulomb equals one ampère-second.
The Gaussian system of units, also known as CGS for centimeter, gram, second, does not invent a fourth base unit, but rather defines the unit of charge as derived from those three. In this system, Coulomb’s law is written simply as
$$F_{\rm Coulomb} = \frac{q_1 q_2}{r^2} \qquad \text{(Gaussian Units)} \tag{2.2}$$
This unit of charge is called the electrostatic unit or esu. It is just the combination of centimeters, grams, and seconds that makes (2.2) dimensionally correct.

¹ It is common to write x̂, ŷ, and ẑ for unit vectors, but this book will use the “hat” notation for quantum mechanical operators, so the above convention is used in our case to avoid confusion.
It is important to realize that, unlike the units for length, mass, and time, the factors needed to convert between different units for electric charge are not just dimensionless numbers (much less powers of ten) between the SI and Gaussian systems. The factors instead turn out to contain a dimensioned number that can be traced back to the speed of light, c. The result is that various quantities look rather different in the two systems, including factors of 4π, c, and ε₀, as well as other quantities derived from these. In the SI system, the quantity µ₀, the so-called “permeability of free space,” is defined to be 4π × 10⁻⁷ T·m/A (where T stands for tesla, the unit of magnetic field strength) and appears in the equations for magnetism. One calculates ε₀ from the relation
$$\epsilon_0\,\mu_0\,c^2 = 1 \tag{2.3}$$
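As a quick numerical illustration of (2.1) through (2.3), the sketch below computes ε₀ from µ₀ and c and then evaluates Coulomb’s law in both unit systems for the same physical situation. The values of c and of the elementary charge, the separation r, the conversion factor between coulombs and esu (numerically 10c with c in m/s), and all function names are assumptions introduced for this example; they do not come from the text.

```python
import math

C = 299_792_458.0             # speed of light in m/s (assumed value, not quoted in the text)
MU_0 = 4.0 * math.pi * 1e-7   # permeability of free space in T*m/A, as defined in the text

# Eq. (2.3): epsilon_0 * mu_0 * c^2 = 1
EPSILON_0 = 1.0 / (MU_0 * C ** 2)

E_CHARGE_SI = 1.602176634e-19   # elementary charge in coulombs (assumed value)
ESU_PER_COULOMB = 10.0 * C      # assumed conversion: one coulomb corresponds to 10*c esu


def coulomb_force_si(q1, q2, r):
    """Eq. (2.1): charges in coulombs, r in meters, force in newtons."""
    return q1 * q2 / (4.0 * math.pi * EPSILON_0 * r ** 2)


def coulomb_force_gaussian(q1, q2, r):
    """Eq. (2.2): charges in esu, r in centimeters, force in dynes."""
    return q1 * q2 / r ** 2


r_meters = 5.29e-11  # hypothetical separation, roughly atomic in size
f_si = coulomb_force_si(E_CHARGE_SI, E_CHARGE_SI, r_meters)

q_esu = E_CHARGE_SI * ESU_PER_COULOMB
f_dynes = coulomb_force_gaussian(q_esu, q_esu, r_meters * 100.0)

if __name__ == "__main__":
    print("epsilon_0 in F/m:", EPSILON_0)
    print("force in newtons, SI formula:", f_si)
    print("force in newtons, Gaussian formula:", f_dynes * 1e-5)  # 1 dyne = 1e-5 N
```

The two force values agree because the charge conversion factor carries the speed of light, which is the point made above about the conversion not being a pure number.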
We emphasize once again that this book is written strictly in SI units. You should just keep in mind that, when moving on to other texts, they may not be keeping the same set of units to which you are accustomed.
2.1 MAXWELL’S EQUATIONS AS INTEGRALS
The unification of electricity and magnetism by James Clerk Maxwell, early in the latter half of the nineteenth century, is one of the most fascinating stories in the history of physics. Maxwell used his insight to insist that something was missing from the existing formulation, even though there was not yet any experimental evidence for this. That evidence eventually came in the observation of electromagnetic waves, which, as we shall see, are a necessary consequence of Maxwell’s theory.
Maxwell’s equations relate two quantities, known as the electric field E and the magnetic field B, to each other and to the electric charges and currents that give rise to them. The nature and physical properties of these fields will be discussed shortly. As most students first see them, the equations take the following form:
$$\oint_S \mathbf{E}\cdot d\mathbf{A} = \frac{1}{\epsilon_0}\,Q_{\rm enclosed} \tag{2.4a}$$
$$\oint_S \mathbf{B}\cdot d\mathbf{A} = 0 \tag{2.4b}$$
$$\oint_C \mathbf{E}\cdot d\mathbf{l} = -\frac{d}{dt}\left[\int \mathbf{B}\cdot d\mathbf{A}\right] \tag{2.4c}$$
$$\oint_C \mathbf{B}\cdot d\mathbf{l} = \mu_0 I_{\rm enclosed} + \mu_0\epsilon_0\,\frac{d}{dt}\left[\int \mathbf{E}\cdot d\mathbf{A}\right] \tag{2.4d}$$
The integrals on the left are over “closed” regions, hence the little circle on the integral sign. By “S” we mean some arbitrary closed surface, whereas “C” is a curve forming a closed loop. The vector differential dA has a magnitude that is the area of a tiny piece of the surface, and its direction is perpendicular to that tiny piece. Therefore, the left-hand sides of the first two equations are integrals of the normal components of the electric and magnetic fields over the surface S. On the other hand, the differential dl is the length of a tiny piece of the curve, with direction tangent to the curve. Therefore, the second two equations integrate the tangential components of the fields around the closed loop C.
It is worthwhile pausing for a moment to review the concept of an integral from a physical perspective. The integral sign ∫, an elongated S, means to sum over an infinite number of infinitesimally small pieces. For example, the integral in (2.4a) says to break the surface up into a large number of tiny pieces. Then, at each of those tiny pieces, take the inner product between the field E(r, t), where r locates the position of the tiny piece, and the perpendicular area element dA. Finally, add up all these inner products for every element of the surface. There are many prescriptions for evaluating integrals mathematically, but sometimes a simple picture will do.
Note that some people use the notation ∬ or ∭ for integrals over two-dimensional or three-dimensional spaces. We are not going to follow that convention in this book. It is nearly always obvious from the context how many variables an integral is to be performed over.
Now let us take a moment to discuss what each equation says physically. Equation (2.4a), known as Gauss’s law, relates the surface integral of E over the closed surface to the electric charge enclosed by that surface. It is not hard to see that Gauss’s law implies Coulomb’s law; see the problems at the end of the chapter.
The statement of (2.4b), called Gauss’s Law for Magnetism, is similar, but simpler. The integral is always zero, because it is an experimental fact that nobody has ever conclusively observed the existence of a so-called “magnetic monopole,” the magnetic equivalent of electric charge.
Now consider (2.4c), known as Faraday’s law. The right side includes an integral over an open surface whose edge is determined by the closed curve C. We call that integral the magnetic flux through the loop.
Faraday’s law says that a changing magnetic flux produces an electric field. This changing flux can result either from a changing magnetic field or from a change in the shape or orientation of the curve C. This is the principle by which an electric generator operates, when a magnet is spun within a metal coil, as in every coal, gas, or nuclear power plant.
Equation (2.4d) says that there are two ways to generate a magnetic field. One way, based on the first term on the right, relates the current through a closed loop C to the tangential integral of B around the loop. This is known as Ampère’s Law and predates Maxwell. The second term represents Maxwell’s key contribution and says that a magnetic field can also result from a changing electric flux. This term is analogous to Faraday’s law discussed above. (We note that there is no term in Faraday’s law analogous to that of Ampère’s Law because that would require a current of magnetic monopoles, which do not exist.)
Maxwell added the second term on the right in (2.4d) because his intuition demanded it; see the end of chapter problems for an example of how your own intuition can lead you to the same conclusion. This additional term “unified” electricity and magnetism into one theory we now call “electromagnetism.” As we will see shortly, this term implies the existence of electromagnetic waves.
Fields, forces, and flux
Maxwell’s equations relate “fields” to “sources,” namely charges and currents, but the concept of a field is abstract. Electric and magnetic fields are essentially defined by equations (2.4). They make their physical presence known by the forces they exert on charges and currents other than those that are the sources of the fields themselves and, as we shall see shortly, by their manifestation in electromagnetic waves.
The idea of a field has, however, a more general meaning. In physics, a “field” can be any function of space and time that has a physical interpretation. For example, the volume density ρ(r, t) of some quantity, like charge or mass, is a field which is represented by just one number at each point in space at any particular time. The electric and magnetic fields, E(r, t) and B(r, t), on the other hand, are examples of vector fields. They have a magnitude and direction (that is, three components) at any point in space and time.
Electromagnetic fields exert forces on charges. That is, they change the momentum p of a charged particle. The electric field will act on any charge, but the magnetic field only acts on moving charges. This is all described by the Lorentz force law
$$\mathbf{F} = \frac{d\mathbf{p}}{dt} = q\mathbf{E} + q\mathbf{v}\times\mathbf{B} \tag{2.5}$$
This result does not follow directly from Maxwell’s equations, but is consistent with them, with special relativity, and of course with experimental observation. Note that the force due to the magnetic field is always perpendicular to the direction of motion of a particle. Therefore, it does no work and cannot change the magnitude of the particle’s momentum.
Each of Maxwell’s equations involves the concept of flux of a vector field through a surface. For a vector field V(r) and a surface S,
$$\Phi_V = \int_S \mathbf{V}\cdot d\mathbf{A} \tag{2.6}$$
defines the flux ΦV of V(r) through S . Equations (2.4a) and (2.4b) involve the flux through a closed surface, whereas Equations (2.4c) and (2.4d) include a term on the right that is the flux through an open surface. Flux has a transparent physical interpretation if you think of V(r) as the “flow field” of some quantity. For example, given a density field ρ(r) for some quantity and a velocity field v(r), V = ρv is a current density for that quantity. That is, for an area element dA, the amount of the quantity flowing through dA per unit time is just V · dA. Integrating over the surface gives the total amount flowing per unit time.
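To make the flux integral (2.6) concrete, the sketch below follows the “sum over tiny pieces” picture literally: it tiles the six faces of a cube with small squares and adds up E · dA for the field of a point charge at the center, using the field strength implied by Coulomb’s law (2.1). According to Gauss’s law (2.4a) the total should be close to q/ε₀. The cube size, the grid resolution, the charge value, and the function names are hypothetical choices made for this example.

```python
import math

EPSILON_0 = 8.854187817e-12  # permittivity of free space in F/m, value quoted in the text


def e_field_point_charge(q, x, y, z):
    """Electric field (Ex, Ey, Ez) of a point charge q at the origin, from Eq. (2.1)."""
    r2 = x * x + y * y + z * z
    k = q / (4.0 * math.pi * EPSILON_0 * r2 * math.sqrt(r2))
    return (k * x, k * y, k * z)


def flux_through_cube(q, half_side=1.0, n=100):
    """Sum E . dA over the six faces of a cube centered on the charge.

    Each face is tiled with n*n small squares, and the field is sampled at
    the center of each square (midpoint rule).
    """
    h = 2.0 * half_side / n
    d_area = h * h
    total = 0.0
    for i in range(n):
        u = -half_side + (i + 0.5) * h
        for j in range(n):
            v = -half_side + (j + 0.5) * h
            for sign in (1.0, -1.0):
                s = sign * half_side
                # Faces at x = s, y = s, z = s; the outward normal is sign * axis.
                ex = e_field_point_charge(q, s, u, v)[0]
                ey = e_field_point_charge(q, u, s, v)[1]
                ez = e_field_point_charge(q, u, v, s)[2]
                total += sign * (ex + ey + ez) * d_area
    return total


if __name__ == "__main__":
    q = 1.0e-9  # hypothetical charge of one nanocoulomb
    print("numerical flux:", flux_through_cube(q))
    print("q / epsilon_0 :", q / EPSILON_0)
```

Changing `half_side` should leave the result essentially unchanged, which reflects the fact that the flux depends only on the enclosed charge and not on the size of the surface.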
Application: the resistor
This is a good time to talk about a simple application of electricity and magnetism, namely the resistor. This is a device that impedes the flow of current in an electric circuit. The resistance (R) presented by a resistor is defined as the magnitude of the electric potential difference (i.e. voltage, V) that results per unit of electric current (I) flowing through it. That is
$$V = IR \tag{2.7}$$
FIGURE 2.1 The resistor as used in electrical circuits (panels: Symbols, Model, Photograph; the model cylinder is labeled with its length L, radius r, and the applied voltage V). The upper symbol was completely standard, but has more recently been replaced by the lower one. They are used in circuit diagrams regardless of the specific type of resistor used. We can model a resistor as a long cylinder, made of some uniform, isotropic material, with wires attached to the flat faces. The model quite resembles a standard carbon resistor. The colored bands on the resistor are a code for its resistance value and precision standards.
This equation is known as Ohm’s Law. Figure 2.1 shows how a resistor appears in a circuit diagram, a useful model for understanding its behavior, and a photograph of a carbon resistor. A potential difference V is applied and a current I flows through the resistor. For a resistor of the type illustrated, the resistance is proportional to its length L and inversely proportional to its cross-sectional area A (= πr²). That is
$$R = \rho\,\frac{L}{A} \tag{2.8}$$
where the “resistivity” ρ is a property of the resistor material. Indeed, calculation of the resistivity of metals is a standard topic in the application of quantum mechanics to crystalline materials. Many practical applications of resistors make use of the fact that they get hot. Each electron passing through the resistor loses an amount of energy eV, which appears as either heat or radiation or both. We can calculate the amount of energy given off by a resistor using simple principles outlined in this section to determine the electric and magnetic fields within the device.
A thorough treatment of electromagnetism shows that the Poynting vector
$$\mathbf{S} = (1/\mu_0)\,\mathbf{E}\times\mathbf{B} \tag{2.9}$$
measures the energy flux, that is, the energy flowing per unit time through a unit area. The electric field E driving current through the resistor has magnitude V/L, is directed along the axis of the cylinder, and is constant in time. From Ampère’s law (2.4d), the magnetic field B at the surface of the cylinder has magnitude (µ₀/2π)(I/r) and circles the axis. The two fields E and B are perpendicular to each other, so S = IV/(2πrL) is the magnitude of the energy flux at the surface. However, 2πrL is just the area of the surface, so the power dissipated in the resistor is
$$P = IV \tag{2.10}$$
In principle, this power can be used for any number of purposes. As an “immersion heater” it is a popular way to heat a cup of water. An incandescent light bulb is a resistor embedded in an evacuated glass bulb, which gets so hot that it glows. Of course, resistors also find invaluable use as circuit elements.
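The chain of reasoning from (2.7) and (2.8) to (2.10) can be followed step by step in a short calculation. The sketch below is illustrative only: the resistivity, dimensions, and current are hypothetical numbers chosen to resemble a small carbon resistor, and the function names are not from the text.

```python
import math

MU_0 = 4.0 * math.pi * 1e-7  # permeability of free space in T*m/A, as defined in the text


def resistance(resistivity, length, radius):
    """Eq. (2.8): R = rho * L / A for a cylinder of cross-sectional area A = pi r^2."""
    return resistivity * length / (math.pi * radius ** 2)


def poynting_power(current, resistivity, length, radius):
    """Power entering the cylinder through its curved surface, from Eq. (2.9).

    Returns (power, voltage) so the result can be compared with P = I V.
    """
    voltage = current * resistance(resistivity, length, radius)  # Eq. (2.7)
    e_field = voltage / length                                   # along the axis
    b_field = MU_0 * current / (2.0 * math.pi * radius)          # circles the axis
    s_magnitude = e_field * b_field / MU_0                       # E and B are perpendicular
    side_area = 2.0 * math.pi * radius * length
    return s_magnitude * side_area, voltage


if __name__ == "__main__":
    # Hypothetical values: 1 cm long, 1 mm radius, resistivity 3.5e-5 ohm*m, 10 mA
    current = 0.010
    power, voltage = poynting_power(current, 3.5e-5, 0.010, 0.001)
    print("power from Poynting vector (W):", power)
    print("I * V (W):                     ", current * voltage)
```

The factors of µ₀, 2π, r, and L cancel exactly, so the agreement with P = IV does not depend on the particular numbers chosen.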
2.2 SURFACE THEOREMS IN VECTOR CALCULUS
Consider the concept of a region surrounded by a boundary that forms the edge of the region. There is a set of mathematical theorems that relates an integral of some quantity over the region to a different integral over the boundary. A glance at Equations (2.4) makes it clear that these theorems are relevant to Maxwell’s equations. We will discuss two of these so-called surface theorems, namely the Divergence Theorem, also called Gauss’s Theorem, and Stokes’ Theorem.
We start with the divergence theorem, which concerns integrals like those on the left-hand side of (2.4a) and (2.4b). These are integrals of some vector field’s perpendicular components over the surface surrounding some arbitrary region of space. For the purposes of discussion, let the vector field be called V(r). (Time dependence is irrelevant to the mathematics we are working on now.) We want to study the integral ∮ V · dA over the closed surface. The divergence theorem relates this surface integral to the integral of a different quantity over the volume of the region.
First, divide the region up into small cells, as shown in Figure 2.2(a). (The figure and our discussion relate to two-dimensional regions, but the generalization to three dimensions should be clear.) Clearly, the volume integral is obtained by making the size of the cells smaller and smaller, taking the limit as their number grows to infinity.
Interestingly, the same is true for the surface integral. That is, you can express the surface integral as a sum over all the surfaces of all the volume elements. To understand this, see Figure 2.2(b). Along the boundary between adjacent cells, the quantity V · dA cancels from one cell to the next, because dA changes sign. Therefore, the only cell surfaces that make a finite contribution to the integral are those on the outer boundary of the original, large region.
FIGURE 2.2 Geometric explanation of the derivation of the divergence theorem. (a) The volume integral over a region is the sum of small rectangles (or rectangular blocks in three dimensions). (b) Any quantity proportional to the area of a volume element cancels on adjacent cells (labeled i and j, with area elements dA_i and dA_j), leaving only the edges on the boundary. (c) Coordinate system for analyzing the integral over one small cell, with corners at (x, y), (x+dx, y), (x+dx, y+dy), and (x, y+dy) and sides dx and dy.
We can therefore evaluate the surface integral ∮ V · dA by calculating it for each small cell, and then summing the result over all the volume elements. For each cell, we add the contributions from each of the four faces (or six faces in three dimensions). This gives
$$\sum_{\rm cell}\mathbf{V}\cdot d\mathbf{A} = (\mathbf{V}\cdot d\mathbf{A})_{\rm bottom} + (\mathbf{V}\cdot d\mathbf{A})_{\rm right} + (\mathbf{V}\cdot d\mathbf{A})_{\rm top} + (\mathbf{V}\cdot d\mathbf{A})_{\rm left} \tag{2.11}$$
Next, express this sum using the coordinate system in Figure 2.2(c). Referring to the diagram, start at the lower left corner and move counterclockwise to the other corners. Evaluate the appropriate component of V by using the coordinate at the relevant corner. (We are being cavalier about how V varies between (x, y) and (x + dx, y + dy), but in the end we will take dx → 0 and dy → 0, so all we really need to do is keep track of the important nonzero components.) Note that the magnitude of dA is dx on the top and bottom, and dy on the left and right, and remember that the direction of dA is always away from the inside of the cell. (The magnitudes are dxdz and dydz if we are in three dimensions.) Putting this all together gives us
$$\begin{aligned}
\sum_{\rm cell}\mathbf{V}\cdot d\mathbf{A} &= -V_y(x,y)\,dx + V_x(x+dx,y)\,dy + V_y(x+dx,y+dy)\,dx - V_x(x,y+dy)\,dy \\
&= \left[\frac{V_x(x+dx,y) - V_x(x,y+dy)}{dx} + \frac{V_y(x+dx,y+dy) - V_y(x,y)}{dy}\right]dx\,dy \\
&= \left[\frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y}\right]dx\,dy
\end{aligned} \tag{2.12}$$
where we recognize the partial derivatives as the result of taking the limits dx → 0 and dy → 0. The generalization to three dimensions is obvious, namely
$$\sum_{\rm cell}\mathbf{V}\cdot d\mathbf{A} = \left[\frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}\right]d\tau \tag{2.13}$$
where dτ = dxdydz is the infinitesimal volume element. Finally, as already discussed, the surface integral is obtained by adding up the contributions from all of these cells. Therefore,
$$\oint \mathbf{V}\cdot d\mathbf{A} = \int (\nabla\cdot\mathbf{V})\,d\tau \tag{2.14}$$
where
$$\nabla\cdot\mathbf{V} = \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z} \tag{2.15}$$
is called the divergence of V. Equation (2.14) is the Divergence Theorem. A nonzero divergence of the vector field V, at a point r, means that in some small region around r, the field “diverges,” that is, it points outward if ∇ · V > 0 or inward if ∇ · V < 0.

FIGURE 2.3 Geometric explanation of the derivation of Stokes’ theorem. (a) The surface integral over a region is the sum of small rectangles. (b) Any quantity proportional to a loop over the area element cancels on adjacent cells (labeled i and j, with line elements dl_i and dl_j), leaving only the edges on the boundary. (c) Coordinate system for analyzing the integral over one small cell, with corners at (x, y), (x+dx, y), (x+dx, y+dy), and (x, y+dy) and sides dx and dy.

The second surface theorem we need to discuss is Stokes’ theorem, which involves the line integral around a closed loop. This is the sort of integral on the left-hand side of (2.4c) and (2.4d). We will show that these integrals are equal to a surface integral of another vector field called the curl of the field V.
We proceed as we did for the divergence theorem. See Figure 2.3(a). Our goal now is to evaluate ∮ V · dl by evaluating V · dl, where dl is tangent to the boundary, as we move around the loop. We choose to move in a counterclockwise direction, as shown in the figure. Again, quantities on adjacent cell boundaries cancel each other
because the dls on each side are in opposite directions. Starting from the lower left,
$$\begin{aligned}
\mathbf{V}\cdot d\mathbf{l} &= V_x(x,y)\,dx + V_y(x+dx,y)\,dy - V_x(x+dx,y+dy)\,dx - V_y(x,y+dy)\,dy \\
&= \left[\frac{V_y(x+dx,y) - V_y(x,y+dy)}{dx} - \frac{V_x(x+dx,y+dy) - V_x(x,y)}{dy}\right]dx\,dy \\
&= \left[\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right]dA
\end{aligned} \tag{2.16}$$
where dA = dxdy is the area element of the surface. Defining the curl of V as
$$\nabla\times\mathbf{V} = \left[\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right]\mathbf{x}_0 + \left[\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right]\mathbf{y}_0 + \left[\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right]\mathbf{z}_0 \tag{2.17}$$
and realizing that dA is in the z direction, i.e. dA = dA z₀, we finally have Stokes’ theorem:
$$\oint \mathbf{V}\cdot d\mathbf{l} = \int [\nabla\times\mathbf{V}]\cdot d\mathbf{A} \tag{2.18}$$
This way of rewriting (2.16) as (2.18) is compact, but hides a bit of interesting mathematics. We came up with (2.16) by assuming a curve that lies in a plane and bounds a plane surface; we then tiled the surface with flat rectangles. However, we could have chosen any surface that is bounded by the curve. We could of course tile any such surface with small flat rectangles, but how do we know this would yield the same result as (2.18)?
The answer is given by the divergence theorem. Two different arbitrary surfaces with the same boundary together form a closed surface that encloses a volume. Since ∇ · [∇ × V] = 0 for any vector field V, the integral of the curl over this closed surface must be zero. The “outward” areas are in different directions for the two surfaces, so the Stokes integrals of the curl over the area must be the same for both of them.
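The cancellation argument behind both surface theorems can be tried out numerically. The sketch below tiles the unit square with small cells, sums the outward flux and the counterclockwise circulation of each cell, and compares the totals with the same quantities computed on the outer boundary only. The sample field, the use of edge midpoints rather than corners to sample V, and all function names are assumptions made for this illustration.

```python
def vx(x, y):
    """x-component of a hypothetical sample field."""
    return x * x * y


def vy(x, y):
    """y-component of a hypothetical sample field."""
    return x * y * y + x


def cell_flux(x, y, d):
    """Outward flux through the four edges of a square cell with lower-left corner (x, y)."""
    xm, ym = x + 0.5 * d, y + 0.5 * d
    return (-vy(xm, y) + vx(x + d, ym) + vy(xm, y + d) - vx(x, ym)) * d


def cell_circulation(x, y, d):
    """Counterclockwise line integral of V around the same cell."""
    xm, ym = x + 0.5 * d, y + 0.5 * d
    return (vx(xm, y) + vy(x + d, ym) - vx(xm, y + d) - vy(x, ym)) * d


def sum_over_cells(cell_term, n):
    """Add a per-cell quantity over an n-by-n tiling of the unit square."""
    d = 1.0 / n
    return sum(cell_term(i * d, j * d, d) for i in range(n) for j in range(n))


def boundary_flux(n):
    """Outward flux through the outer boundary of the unit square only."""
    d = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * d
        total += (-vy(s, 0.0) + vx(1.0, s) + vy(s, 1.0) - vx(0.0, s)) * d
    return total


def boundary_circulation(n):
    """Counterclockwise line integral around the outer boundary only."""
    d = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * d
        total += (vx(s, 0.0) + vy(1.0, s) - vx(s, 1.0) - vy(0.0, s)) * d
    return total


if __name__ == "__main__":
    n = 50
    print("flux: cells", sum_over_cells(cell_flux, n), "boundary", boundary_flux(n))
    print("circulation: cells", sum_over_cells(cell_circulation, n),
          "boundary", boundary_circulation(n))
```

For this sample field the divergence is 4xy and the z-component of the curl is y² + 1 − x², both of which integrate to 1 over the unit square, so all four printed numbers should be close to 1.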
2.3 MAXWELL’S EQUATIONS AS DERIVATIVES
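We are now ready to discuss Maxwell’s equations in their differential form. Before reviewing how we use the surface theorems to make this transformation, we first present the result:
$$\nabla\cdot\mathbf{E} = \frac{1}{\epsilon_0}\,\rho \tag{2.19a}$$
$$\nabla\cdot\mathbf{B} = 0 \tag{2.19b}$$
$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \tag{2.19c}$$
$$\nabla\times\mathbf{B} = \mu_0\,\mathbf{j} + \mu_0\epsilon_0\,\frac{\partial\mathbf{E}}{\partial t} \tag{2.19d}$$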
Here ρ = ρ(r, t) is the charge density (i.e. charge per unit volume), and j = j(r, t) is the current density (current per unit area). Apart from these quantities, we assume empty space.
It is easy to see how Equations (2.19) follow from (2.4) and the surface theorems. First consider (2.4b), Gauss’s law for Magnetism. The surface integral on the left is over the boundary of an arbitrary region, so the divergence theorem turns that into the integral of ∇ · B over an arbitrary volume, and it follows that ∇ · B = 0 everywhere. Treating Gauss’s law (2.4a) in a similar way, we first note that the right side is not zero. However, the enclosed charge is just the integral of the charge density over the same arbitrary volume, so (2.19a) follows.
For Faraday’s law, a similar argument transforms (2.4c) into (2.19c), now using Stokes’ theorem, where here it is the loop and bounded surfaces that are arbitrary. Also, when the time derivative is “brought inside” the integral, it becomes a partial derivative because the integrand is a function of both position and time, while the integral depends only on time.
In the case of Ampère’s law, (2.4d) leads to (2.19d) in a similar way, except that the current term requires some explanation. The current flows through the same arbitrary bounded surface as the integrals are performed over in (2.4d). This “enclosed” current can be obtained by integrating the component of the current density perpendicular to the surface. In other words, I_enclosed = ∫ j · dA. Then (2.19d) follows.
Equations (2.19) are not only concise and free of the “arbitrary” regions used in (2.4); they also simplify the solutions of various problems in electromagnetic theory. In addition, they are very useful for gaining more insight into properties of the electromagnetic field as well as for understanding special relativity, as we shall see in the next chapter.
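As a worked instance of the argument above, the steps for Gauss’s law can be written out explicitly. This is a sketch that uses only (2.4a), the divergence theorem (2.14), and the definition of the charge density ρ; the symbol 𝒱 for the arbitrary volume enclosed by S is notation introduced here.

```latex
\oint_S \mathbf{E}\cdot d\mathbf{A}
  = \int_{\mathcal{V}} (\nabla\cdot\mathbf{E})\,d\tau
  \qquad\text{and}\qquad
\frac{1}{\epsilon_0}\,Q_{\rm enclosed}
  = \frac{1}{\epsilon_0}\int_{\mathcal{V}} \rho\,d\tau
\quad\Longrightarrow\quad
\int_{\mathcal{V}} \left(\nabla\cdot\mathbf{E} - \frac{\rho}{\epsilon_0}\right) d\tau = 0 .
```

Because the volume 𝒱 is arbitrary, the integrand itself must vanish at every point, which is (2.19a). The other three equations follow the same pattern, with Stokes’ theorem (2.18) taking the place of the divergence theorem for (2.19c) and (2.19d).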
The electrostatic and magnetic vector potentials
For both physical and practical reasons, we frequently refer to electromagnetic potentials instead of fields. This is analogous to discussing potential energy functions in place of forces in mechanical systems.
First consider static systems: that is, situations in which the electromagnetic fields have no explicit time dependence. Then (2.19c) implies that ∇ × E = 0 everywhere in space. Since the curl of any gradient is identically zero, we can write that
$$\mathbf{E} = -\nabla V \tag{2.20}$$
where V = V(r) is called the electrostatic potential, and the minus sign is chosen to preserve the convention with mechanical potential energy. (We refrain from using the term scalar potential because in terms of special relativity, V is not a scalar but rather the time-like component of a Lorentz four-vector.) Clearly we can add an arbitrary constant to V(r) without affecting the electric field. When the field is due to a localized charge distribution, however, it is a universal convention that we choose this constant so that the electrostatic potential is equal to zero at any point infinitely far away from the charges.

Consider moving a charge q from point a to point b along some path C. The electric field E(r) exerts a force F(r) = qE(r) on the charge, so the field does an amount of work $W = \int_C \mathbf{F} \cdot d\mathbf{l} = \int_C q\mathbf{E} \cdot d\mathbf{l}$ on the charge. It is easy to show that W is independent of the path C. If we choose two such paths $C_1$ and $C_2$, then

$$ \int_{a,C_1}^{b} q\mathbf{E} \cdot d\mathbf{l} - \int_{a,C_2}^{b} q\mathbf{E} \cdot d\mathbf{l} = \int_{a,C_1}^{b} q\mathbf{E} \cdot d\mathbf{l} + \int_{b,C_2}^{a} q\mathbf{E} \cdot d\mathbf{l} = \oint_{C_1+C_2} q\mathbf{E} \cdot d\mathbf{l} = \int q\,(\nabla \times \mathbf{E}) \cdot d\mathbf{A} = -\int q\,(\nabla \times \nabla V) \cdot d\mathbf{A} = 0 \tag{2.21} $$

using Stokes’ theorem and (2.20), plus the fact that the curl of any gradient is zero. Therefore, W is the same for both paths, and we can define the electrostatic potential energy as

$$ U = -W = -\int_C q\mathbf{E} \cdot d\mathbf{l} = \int_C q\nabla V \cdot d\mathbf{l} = qV(\mathbf{r}_b) - qV(\mathbf{r}_a) = q\,\Delta V_{ab} \tag{2.22} $$
where we refer to $\Delta V_{ab}$ as the potential difference between points a and b. In the case of localized charge distributions, the energy change when a charge is moved to a position r from infinitely far away is U(r) = qV(r). Combining (2.20) with (2.19a) leads us to

$$ \nabla^2 V = -\frac{\rho}{\epsilon_0} \tag{2.23} $$
which is called Poisson’s equation. In practice, one solves (2.23) for V(r), given a charge distribution ρ(r), and then uses (2.20) to find the electric field. There is a wide variety of techniques for solving Poisson’s equation, but we shall not consider these here.

Now consider the potential associated with the magnetic field. In this case, we start with (2.19b), which holds even for time-dependent fields. Since the divergence of any curl is also zero, we can write

$$ \mathbf{B} = \nabla \times \mathbf{A} \tag{2.24} $$
where A = A(r, t) is called the magnetic vector potential. In practice, it turns out that this vector potential is not especially useful in solving time-independent problems given a distribution of currents. It is, however, very useful for gaining insight into the nature of electromagnetic fields. For example, it can be used to generalize (2.20) to the case of time-dependent electric fields. If we write

$$ \mathbf{E} = -\nabla V - \frac{\partial \mathbf{A}}{\partial t} \tag{2.25} $$
we find by substitution that E and B satisfy Maxwell’s equations (2.19b) and (2.19c). Equations (2.19a) and (2.19d) then become differential equations for V(r, t) and A(r, t) in terms of the sources ρ(r, t) and j(r, t).

We noted above that the electric potential V is defined only up to an arbitrary added constant. Something similar applies in the case of the magnetic vector potential, but in this case, any vector that can be expressed in the form ∇χ, where χ is any scalar function of r, can be added to A without affecting either of the fields B and E. This property is sometimes referred to as gauge invariance.
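The gauge freedom described above can be checked numerically. The following is a minimal sketch, assuming a hypothetical vector potential A = (−y/2, x/2, 0) (which corresponds to a uniform field B = (0, 0, 1)) and an arbitrarily chosen gauge function χ; the names `vector_potential`, `chi`, and `curl`, the step size `H`, and the test point are illustrative choices, not taken from the text. It evaluates ∇ × A and ∇ × (A + ∇χ) by central differences and compares them.

```python
import math

H = 1e-3  # finite-difference step (an illustrative choice)

def vector_potential(x, y, z):
    # Hypothetical example: A = (-y/2, x/2, 0), which gives a uniform B = (0, 0, 1).
    return (-0.5 * y, 0.5 * x, 0.0)

def chi(x, y, z):
    # An arbitrary smooth gauge function, chosen only for illustration.
    return math.sin(x) * y + x * z * z

def gradient(f, x, y, z):
    # Central-difference gradient of a scalar function f(x, y, z).
    return (
        (f(x + H, y, z) - f(x - H, y, z)) / (2 * H),
        (f(x, y + H, z) - f(x, y - H, z)) / (2 * H),
        (f(x, y, z + H) - f(x, y, z - H)) / (2 * H),
    )

def gauged_potential(x, y, z):
    # The transformed potential A' = A + grad(chi).
    a = vector_potential(x, y, z)
    g = gradient(chi, x, y, z)
    return tuple(ai + gi for ai, gi in zip(a, g))

def curl(F, x, y, z):
    # Central-difference curl of a vector function F(x, y, z) -> (Fx, Fy, Fz).
    def partial(component, axis):
        plus = [x, y, z]
        minus = [x, y, z]
        plus[axis] += H
        minus[axis] -= H
        return (F(*plus)[component] - F(*minus)[component]) / (2 * H)
    return (
        partial(2, 1) - partial(1, 2),
        partial(0, 2) - partial(2, 0),
        partial(1, 0) - partial(0, 1),
    )

point = (0.3, -0.7, 1.2)  # arbitrary test point
B_original = curl(vector_potential, *point)
B_gauged = curl(gauged_potential, *point)
print("B from A:           ", B_original)
print("B from A + grad chi:", B_gauged)
```

The two printed fields should agree to within finite-difference rounding, reflecting the identity ∇ × ∇χ = 0. This is only a spot check at one point, not a proof.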
Charge conservation and the continuity equation

Maxwell’s equations imply that charge is conserved. Let us prove this statement by considering how the amount of charge can change in an arbitrary region of space. The rate of change of the charge enclosed in an arbitrary region is

$$ \frac{d}{dt} Q_{\text{enclosed}} = \frac{d}{dt} \int \rho \, d\tau = \epsilon_0 \frac{d}{dt} \int (\nabla \cdot \mathbf{E}) \, d\tau \tag{2.26} $$

where Gauss’s law (2.19a) is used in the last step. We can bring the time derivative inside the integral as a partial derivative because, once again, only the explicit time dependence survives the integration. The partial derivatives commute (i.e., can be performed in either order without affecting the result), and Ampère’s law can then be used to replace the time derivative ∂E/∂t. Then, noting that the divergence of the curl of any vector field is zero, we have

$$ \frac{d}{dt} Q_{\text{enclosed}} = \frac{1}{\mu_0} \int \left[ \nabla \cdot (\nabla \times \mathbf{B}) - \mu_0 \nabla \cdot \mathbf{j} \right] d\tau = -\int \nabla \cdot \mathbf{j} \, d\tau \tag{2.27} $$

Finally, using the divergence theorem, we get

$$ \frac{d}{dt} Q_{\text{enclosed}} = -\oint \mathbf{j} \cdot d\mathbf{A} \tag{2.28} $$
This is a statement of charge conservation. Remembering that j is the amount of charge flowing in a particular direction per unit time per unit area, (2.28) says that the amount of charge in some enclosed region changes only as a result of charge flow into, or out of, that region. In other words, charge cannot be created or destroyed, only moved from place to place. The overall minus sign is a result of our convention that a surface integral is performed with the area elements pointing outward from an enclosed region.

Another way of stating charge conservation is the continuity equation. Combining (2.26) and (2.27) and arguing that if two integrals over the same arbitrary region are equal this implies that their integrands must be equal, we get

$$ \frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{j} = 0 \tag{2.29} $$
This is a general statement that any quantity represented by a density ρ and current density j is conserved. We will see later, for example, that the Schrödinger equation implies that probability density is similarly conserved in quantum mechanics.
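As an illustration of the continuity equation (2.29), the sketch below considers a one-dimensional toy model that is not taken from the text: a Gaussian blob of charge drifting rigidly with speed `V_DRIFT`, so that ρ(x, t) = exp[−(x − vt)²] and j = vρ. The function names and the step size are hypothetical choices. The code evaluates ∂ρ/∂t + ∂j/∂x by central differences, which should vanish up to numerical error.

```python
import math

V_DRIFT = 2.0  # drift speed of the charge blob (arbitrary units, illustrative)

def rho(x, t):
    # Toy charge density: a Gaussian profile moving rigidly at speed V_DRIFT.
    return math.exp(-(x - V_DRIFT * t) ** 2)

def j(x, t):
    # Current density for rigidly drifting charge: j = v * rho.
    return V_DRIFT * rho(x, t)

def continuity_residual(x, t, h=1e-5):
    # Central-difference estimate of d(rho)/dt + d(j)/dx at the point (x, t).
    drho_dt = (rho(x, t + h) - rho(x, t - h)) / (2 * h)
    dj_dx = (j(x + h, t) - j(x - h, t)) / (2 * h)
    return drho_dt + dj_dx

for x, t in [(0.4, 0.1), (-1.0, 0.5), (2.5, 1.0)]:
    print(f"x={x:5.2f}, t={t:4.2f}: residual = {continuity_residual(x, t):.2e}")
```

Each term is individually nonzero (the density at a fixed point does change in time), but the two cancel, which is the content of (2.29) in one dimension.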
Application: the betatron

Particle accelerators have played a large role in our experimental verification of quantum mechanics, particularly in the context of subatomic physics. There are many types of accelerators, for example Van de Graaff, cyclotrons, synchrotrons, and linear accelerators, but they all make use of electromagnetic fields to add energy to electrons, protons, or atomic ions. A particularly innovative design is the Betatron, also known as an induction accelerator. An electron moves in a circle of constant radius in an axially symmetric magnetic field whose magnitude steadily increases with time. By Faraday’s law, the changing magnetic field induces an electric field which accelerates the electron. As is discussed later, an accelerating electric charge radiates energy in the form of electromagnetic waves. The system is designed so that the energy gain of the electron just matches what is radiated so the electron remains in the same circular path.
FIGURE 2.4 A Betatron accelerates particles using the electric field from magnetic induction. The diagram (labels: B or dB/dt, v, E, R) shows the principle, and defines quantities that allow us to derive the so-called Betatron Condition. The photograph (Courtesy Department of Physics, University of Illinois at Urbana-Champaign) shows early Betatrons with inventor, Donald Kerst.
Figure 2.4 shows the geometry. Assume we want to accelerate an electron with charge q = −e in a circular orbit of radius R. For the area enclosed by the orbit, Faraday’s law (2.4c) gives us

$$ E(2\pi R) = \frac{d}{dt} \int_{\text{Orbit}} B \, dA = \pi R^2 \frac{d}{dt} \langle B \rangle \tag{2.30} $$

where $\langle B \rangle$ is just the average magnetic field over the area enclosed by the orbit. It is easy to show, from the radial force and acceleration, that the electron momentum (tangential to the orbit) is $p = eB_R R$, where $B_R$ is the magnetic field at the radius R. The rate of change of momentum $\dot{p}$ must equal the force from the induced electric field E, and therefore

$$ e \frac{d}{dt} \left( B_R R \right) = eE = e \frac{R}{2} \frac{d}{dt} \langle B \rangle $$

so

$$ B_R = \frac{1}{2} \langle B \rangle \tag{2.31} $$
That is, the magnetic field at the orbital radius must be one-half of the average magnetic field over the area enclosed by the orbit. This is known as the Betatron Condition, and is the reason for the beveled edges on the pole faces in Fig. 2.4.

The Betatron was invented in 1940 at the University of Illinois at Urbana-Champaign by Professor Donald Kerst. It was immediately useful as a powerful X-ray source, including a 25 MeV model built commercially, and in 1950 a 300 MeV Betatron was built for nuclear physics research at the University. In practice, the magnetic field is driven sinusoidally around zero, rather than just increasing. Therefore, for only one quarter of the time (the part of the cycle where the field is positive and increasing) does the Betatron actually accelerate. That is, it produces a pulsed, rather than continuous, beam of radiation.
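To make the Betatron Condition (2.31) concrete, here is a small sketch that checks it for one assumed field shape. The radial profile B(r) = B₀[1 − (2/3)(r/R)²] is a hypothetical example chosen because it satisfies the condition exactly; it is not a description of any real pole-face design. The code computes the area-weighted average ⟨B⟩ over the disc enclosed by the orbit with a midpoint rule and compares B_R with ⟨B⟩/2.

```python
import math

R = 1.0    # orbit radius (arbitrary units)
B0 = 1.0   # field on the axis (arbitrary units)

def field_profile(r):
    # Hypothetical axially symmetric profile, falling off quadratically with r.
    return B0 * (1.0 - (2.0 / 3.0) * (r / R) ** 2)

def average_field(B, radius, n=10_000):
    # Area-weighted average of B(r) over the disc r <= radius (midpoint rule).
    dr = radius / n
    flux = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        flux += B(r) * 2.0 * math.pi * r * dr  # flux through a thin ring
    return flux / (math.pi * radius ** 2)

B_at_orbit = field_profile(R)
B_average = average_field(field_profile, R)
print("B at orbit radius:", B_at_orbit)
print("half the average: ", 0.5 * B_average)
```

For this profile the field at the orbit is B₀/3 while the average over the enclosed area is 2B₀/3, so the field must weaken toward the orbit radius, consistent with the remark about beveled pole faces.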
2.4 ELECTROMAGNETIC WAVES
Probably the most dramatic consequence of Maxwell’s equations is that they predict the existence of electromagnetic waves. This essentially comes from Faraday’s law, which predicts that a changing magnetic field induces an electric field, and Ampère’s law (with the addition of Maxwell’s displacement current), which predicts that a changing electric field induces a magnetic field. The interplay between these two leads to the phenomenon of a propagating electromagnetic wave. Moreover, Gauss’s laws for electric and magnetic fields require that electromagnetic waves are transverse. That is, similarly to waves on a stretched string, the oscillations occur in a direction perpendicular to the direction of propagation.

It is not hard to show that (the three-dimensional version of) the wave equation (1.12) follows directly from Maxwell’s equations in “free space,” that is, with ρ = 0 and j = 0 everywhere. We start by taking the curl of both sides of Faraday’s law (2.19c) and make use of the theorem

$$ \nabla \times (\nabla \times \mathbf{V}) = \nabla (\nabla \cdot \mathbf{V}) - \nabla^2 \mathbf{V} \tag{2.32} $$
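for any vector field V(r), where $\nabla^2 \mathbf{V} = (\nabla \cdot \nabla) \mathbf{V}$. This leads to

$$ \nabla (\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E} = -\frac{\partial}{\partial t} \nabla \times \mathbf{B} \tag{2.33} $$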
where on the right side we again use the fact that partial derivatives commute. In free space, Gauss’s law tells us that the first term on the left is zero, and Ampère’s law says that ∇ × B is just proportional to ∂E/∂t. Using (2.33) we finally have

$$ \frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2} - \nabla^2 \mathbf{E} = 0 \tag{2.34} $$
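where $\nabla^2 = \nabla \cdot \nabla$ is the Laplacian operator. In Cartesian coordinates,

$$ \nabla^2 \mathbf{V} = \frac{\partial^2 \mathbf{V}}{\partial x^2} + \frac{\partial^2 \mathbf{V}}{\partial y^2} + \frac{\partial^2 \mathbf{V}}{\partial z^2} \tag{2.35} $$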
for any vector field V, so equation (2.34) is simply the analogue of the wave equation (1.12) in three spatial dimensions. The electric field in free space therefore has the behaviour of a wave whose speed is c. The same equation can be derived for the magnetic field. See the problems at the end of this chapter.

In addition to describing a wave in three dimensions instead of one, (2.34) differs from (1.12) in that it is a vector equation. In other words, the electric field wave has, in principle, three separate components. We will see, however, that one of those components is strictly zero. First, though, let us try to construct general solutions to (2.34) as we did for (1.12). In one spatial dimension we wrote solutions as f(x ∓ vt) or f(kx ∓ ωt), but in three dimensions there is more flexibility. One simple way to see this is to consider

$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 f(z \mp ct) \tag{2.36} $$

where $\mathbf{E}_0$ is a constant vector with magnitude $E_0$. Equation (2.36) is a wave solution that moves in the z-direction. Obviously, we could write the same thing for either the x- or y-directions. In fact, we could have a wave moving in any direction we want. How do we write this generally?

Equation (2.36) provides a clue. It has no dependence on x or y, so we can picture the entire xy plane moving along the z-axis. We speak of a “planar wave front” and solutions like (2.36) are called plane waves. A plane in three-dimensional space is the set of points r for which k · (r − a) = 0, where a is an arbitrary point on the plane. That is, the points r for which k · r is constant describe a plane with normal vector k. Therefore

$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 f(\mathbf{k} \cdot \mathbf{r} - \omega t) \tag{2.37} $$
describes a plane wave moving in the direction k. The wave equation, and Maxwell’s equations separately, put constraints on the parameters k and $\mathbf{E}_0$. Writing $\mathbf{k} = k_x \mathbf{x}_0 + k_y \mathbf{y}_0 + k_z \mathbf{z}_0$, so that $\mathbf{k} \cdot \mathbf{r} = k_x x + k_y y + k_z z$, it is easy to see that (2.34) implies that ω = kc where k = |k|. Furthermore, if we write $\mathbf{E}_0 = E_{0x} \mathbf{x}_0 + E_{0y} \mathbf{y}_0 + E_{0z} \mathbf{z}_0$, it is straightforward to use (2.19a) to show that $\mathbf{k} \cdot \mathbf{E}_0 = 0$. In other words, the electric field vector must be perpendicular to the direction of wave propagation, so electromagnetic waves are transverse.

Now consider the magnetic field, and write $\mathbf{B}(\mathbf{r}, t) = \mathbf{B}_0 g(\mathbf{k} \cdot \mathbf{r} - \omega t)$. Faraday’s law (2.19c), combined with (2.37), gives

$$ \mathbf{k} \times \mathbf{E}_0 \, f'(\mathbf{k} \cdot \mathbf{r} - \omega t) = \omega \mathbf{B}_0 \, g'(\mathbf{k} \cdot \mathbf{r} - \omega t) \tag{2.38} $$
which must hold at all places and at all times. Clearly the functions f(ξ) and g(ξ) must be the same (to within an additive constant) and

$$ \mathbf{B}_0 = \frac{1}{c} \mathbf{k}_0 \times \mathbf{E}_0 \tag{2.39} $$
where $\mathbf{k}_0$ is the unit vector in the direction of propagation. This means that in addition to E and B being perpendicular to the direction of the wave, they are also perpendicular to each other. Furthermore, the magnitudes of the electric and magnetic field waves are related by

$$ E_0 = cB_0 \tag{2.40} $$

Plane waves are not the only solution to the wave equation in three dimensions. It is also possible for a wave front to be the surface of an expanding sphere. Spherical waves of the form

$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \frac{f(kr - \omega t)}{r} g(\theta, \phi) \tag{2.41} $$

propagate away from the origin in all directions. (There are some restrictions on the functions f and g, but they are largely arbitrary.) That is, $\mathbf{k} = k\mathbf{r}_0$. On the other hand, over a small region far away from the source, in any particular direction we see that plane waves are a good approximation to (2.41).
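The plane-wave constraints derived above (transversality, the relation (2.39) between the amplitudes, and E₀ = cB₀) can be verified for a specific example. The sketch below assumes a hypothetical wave vector k = (1, 2, 2) and amplitude E₀ = (2, −1, 0), chosen by hand so that k · E₀ = 0; the helper functions `dot`, `cross`, and `norm` are plain-Python stand-ins written for this illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def norm(a):
    return math.sqrt(dot(a, a))

k = (1.0, 2.0, 2.0)     # hypothetical wave vector in rad/m, |k| = 3
E0 = (2.0, -1.0, 0.0)   # hypothetical amplitude in V/m, chosen so that k . E0 = 0

k_hat = tuple(ki / norm(k) for ki in k)               # unit vector k0
B0 = tuple(comp / C for comp in cross(k_hat, E0))     # Equation (2.39)
omega = norm(k) * C                                   # dispersion relation omega = kc

print("k . E0  =", dot(k, E0))     # transversality of E
print("k . B0  =", dot(k, B0))     # transversality of B
print("E0 . B0 =", dot(E0, B0))    # E and B mutually perpendicular
print("|E0| / (c |B0|) =", norm(E0) / (C * norm(B0)))
```

With these amplitudes the amplitude part of (2.38), k × E₀ = ωB₀, also holds component by component, which is a useful cross-check that (2.39) and ω = kc are mutually consistent.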
2.5 ELECTROMAGNETIC RADIATION
A stationary electric charge gives rise to an electric field, but no magnetic field, while a steady electric current gives rise to a constant magnetic field. How then does one physically produce an electromagnetic wave? The answer, although by no means obvious, is that accelerating electric charges give rise to electromagnetic waves. To show this mathematically, one needs to develop the concept of “retarded time,” but we will not go into that here. Instead, we will motivate the idea by outlining a special case, namely that of electric dipole radiation.

An electric dipole is simply two charges with the same magnitude but opposite signs, separated by some distance. (See the problems at the end of the chapter for examples using electric and magnetic dipoles.) For a dipole to radiate, this separation distance has to change with time and we assume that it does so sinusoidally. This results in the emission of an electromagnetic wave which spreads out equally in all directions perpendicular to the direction of the dipole, but has zero amplitude parallel to this direction. Figure 2.5 illustrates this. Twice during each complete period of oscillation, the charges are at the same position and cancel each other, so the field lines cut off, but then the configuration reverses itself, leading to a reversal in the electric (and magnetic) fields. The situation then repeats, leading to a series of electric and magnetic fields in alternating directions, radiating outward in the form of an electromagnetic wave.

FIGURE 2.5 A schematic representation of electric dipole radiation. The frames from left to right represent increasing time, while each frame represents the same portion of space. The lines are contours of constant electric field strength. The source is an infinitesimally small oscillating electric dipole at the origin, oriented in the vertical direction. The field strength is inversely proportional to distance from the origin, with the field lines forming closed loops that move away from the dipole.

Radiation from an oscillating system of charges involves three different distance scales. One is the size d of the radiating system. For example, an oscillating electric
dipole would consist of two charges ±q separated by a distance d cos ωt. The second scale is the wavelength λ = 2πc/ω of the emitted radiation. The distance R at which the radiation is detected makes up the third distance scale. Our subsequent discussion will concentrate on the “far field” where d ≪ λ ≪ R.

We are often interested in the “intensity” distribution I(θ) for a radiating system. This is the average energy emitted per unit area per unit time, into some accepted solid angle in some direction. For an oscillating electric dipole $p = p_0 \cos \omega t$ with $p_0 = qd$, it can be shown that

$$ I(\theta) = \frac{\mu_0 p_0^2 \omega^4}{32 \pi^2 c} \frac{\sin^2 \theta}{R^2} \tag{2.42} $$
for the intensity distribution, where θ is the polar angle defining the direction of propagation with respect to the dipole direction. We can integrate over a sphere of radius R to find the total radiated power

$$ P = \frac{\mu_0 p_0^2 \omega^4}{12 \pi c} \tag{2.43} $$
There are some important things to recognize about (2.42) and (2.43). First, (2.42) gives a well-defined angular distribution of the emitted radiation. It is maximized at 90° relative to the direction of the dipole, and is zero in the direction parallel to this (cf. Figure 2.5). Secondly, (2.43) is independent of R, because (2.42) is proportional to 1/R². That is, the total energy flux through a sphere is independent of the radius of the sphere. This is a defining characteristic of “radiation,” which transports energy into or out of the sphere, while there is no net change in the total energy.

Finally, these equations make it clear that the frequency of the radiation is just the frequency at which the dipole oscillates. In other words, for a given dipole amplitude p₀, the power of the emitted radiation is determined by the oscillation frequency and nothing else. As we shall see later, this is a critical factor for appreciating a crisis that physicists confronted in the early part of the 20th century.
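The step from the intensity distribution (2.42) to the total power (2.43) can be reproduced numerically. The sketch below integrates I(θ) over a sphere of radius R using a midpoint rule in θ (the φ integral contributes a factor 2π) and compares the result with (2.43) for two different radii. The dipole amplitude, angular frequency, and radii are arbitrary illustrative values, and `MU0` uses the approximate value 4π × 10⁻⁷ in SI units.

```python
import math

MU0 = 4.0e-7 * math.pi   # vacuum permeability in SI units (approximate value)
C = 299_792_458.0        # speed of light in m/s

def intensity(theta, R, p0, omega):
    # Equation (2.42): time-averaged intensity at polar angle theta and distance R.
    return (MU0 * p0**2 * omega**4 * math.sin(theta) ** 2
            / (32.0 * math.pi**2 * C * R**2))

def total_power_numeric(R, p0, omega, n=20_000):
    # Integrate I(theta) over the sphere: dA = 2 pi R^2 sin(theta) d(theta).
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        total += intensity(theta, R, p0, omega) * 2.0 * math.pi * R**2 * math.sin(theta) * dtheta
    return total

def total_power_formula(p0, omega):
    # Equation (2.43).
    return MU0 * p0**2 * omega**4 / (12.0 * math.pi * C)

p0 = 1.0e-29     # dipole amplitude in C m (illustrative)
omega = 1.0e15   # angular frequency in rad/s (illustrative)

for R in (1.0, 1000.0):
    print(f"R = {R:7.1f} m: numeric P = {total_power_numeric(R, p0, omega):.6e} W")
print(f"formula (2.43):     P = {total_power_formula(p0, omega):.6e} W")
```

The numerical result is the same for both radii, since the R² in the area element cancels the 1/R² in (2.42), and it matches (2.43) because the angular integral of sin³θ over 0 to π equals 4/3.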
Thomson scattering

We are now in a position to ask what happens when electromagnetic radiation encounters an atom. Our first thought should be that it will somehow affect the atomic electrons, as they are by far the lowest-mass charged components. However, the electrons are bound in the atoms, and this must have consequences. Let’s restrict our discussion to radiation that “carries energy” which exceeds by a large amount the binding energy of the electrons. Physicists from 100 years ago had difficulty quantifying that statement, although we can now understand it perfectly well on quantum mechanical grounds. Nevertheless, there was good evidence that for light in the far ultraviolet or near X-ray spectral regions, the electrons behaved as if they were essentially free. This is where we focus our attention now.

The scattering of light from a free electron is called Thomson scattering. The idea is actually very simple. An electromagnetic plane wave passes over an electron, and the oscillating electric field causes the electron to oscillate as well. That is, it sets up an electric dipole that oscillates with the same frequency as the incoming radiation. This leads to dipole radiation, with an intensity pattern given by Equation (2.42). If the incoming radiation is polarized, that is, if the electric fields all line up in the same plane, then that electric field direction defines the axis from which θ is measured.

Thomson scattering therefore presents a clear prediction for both the intensity and scattering angle distribution of light from free electrons. Furthermore, the scattered radiation needs to have exactly the same frequency and wavelength as the incoming light. The breakdown of this prediction, when compared to experiments done by A. H. Compton, represented a turning point in physics.
In order to appreciate the motivation for performing these experiments in the first place, however, we need to take some time to review the postulates and implications of special relativity.
2.6 PROBLEMS
2.1 Consider a vector field $\mathbf{V}(\mathbf{r}) = 4x\mathbf{x}_0 + 5y\mathbf{y}_0 + 6z\mathbf{z}_0$ and a closed, cubical surface S with side length L and one corner at the origin, lying in the first octant. Evaluate the integral $\oint_S \mathbf{V} \cdot d\mathbf{A}$ by first carrying out the dot product and integral on each of the six faces of the cube, and adding them up. Check your answer by using the divergence theorem, which you are likely able to do in your head.

2.2 Show that the integral form of Coulomb’s law can be derived from Gauss’s law. First, argue why rotational symmetry implies that the electric field from a point charge q has to be isotropic in all directions, and can only depend on the distance r from the charge. Next, use this to choose an appropriate “Gaussian surface” S so that the integral in Equation (2.4a) is simple to evaluate. Finally, use Equation (2.5) to show that the force F on another charge q′ is

$$ F = \frac{1}{4\pi\epsilon_0} \frac{qq'}{r^2} $$
2.3 A “parallel plate capacitor” is made from two plane conducting sheets, each with area A, separated by a distance d. The plates carry equal but opposite charges ±Q, uniformly distributed over their surface, and this creates a potential difference V between them. Infer the (constant) electric field between the plates, and use Gauss’s law to show that Q = CV, where C depends only on A and d (and $\epsilon_0$).

2.4 Use the concept of a parallel plate capacitor to find the energy density in an electric field. Charge is added in small increments dQ′ to an initially uncharged capacitor giving a potential difference V′. Each increment changes the stored energy by V′ dQ′ = (Q′/C) dQ′ where C is the capacitance. (See Problem 2.3.) Integrate to find the total energy when charge Q is stored in the capacitor. Divide by the volume of the capacitor to find the electric field energy density

$$ u_E = \frac{1}{2} \epsilon_0 E^2 $$
where E is the electric field inside the capacitor.

2.5 Calculate the magnetic field at a distance r from an infinitely long straight wire which carries a current I. First, using Gauss’s law for magnetism, explain why the field must be tangential to a circle of radius r, centered on the wire and lying in a plane perpendicular to the wire. Then use Ampère’s law to show that the magnitude of the magnetic field is

$$ B = \frac{\mu_0 I}{2\pi r} $$
2.6 A long cylindrical coil of wire is called a solenoid and can be used to store a magnetic field. If the coil is infinitely long, there is a uniform magnetic field in the axial direction inside the coil, and no field outside the coil. Use an “Amperian Loop” that is a rectangle enclosing some length of the coil, with one leg inside and one leg outside, to show that the magnetic field is $B = \mu_0 I n$ where I is the current in the wire and there are n turns per unit length in the coil.

2.7 Find the vector potential A(r) which gives the magnetic field for the long straight wire in Problem 2.5. It is easiest to let the wire lie along the z-axis and express your result in terms of $r = (x^2 + y^2)^{1/2}$, and to carry out the calculation in cylindrical polar coordinates (r, θ, z).

2.8 Follow this guide to convince yourself that the second term on the right in (2.4d) is needed for the whole equation to make sense. First, imagine a long straight current-carrying wire, with associated magnetic field given in Problem 2.5. Now “cut” the wire, and insert a very thin capacitor, with plates perpendicular to the direction of the wire. Current continues to flow through the wire while the capacitor charges up, but no current flows between the capacitor plates, so it would seem there should be no magnetic field there. But that doesn’t make sense: how could the magnetic field just stop at the capacitor? Intuitively, you expect it to be continuous right through it. Finally, show that the second term, called a displacement current, in fact gives you the same B inside the capacitor.

2.9 A “1/r²” vector field, such as the electric field from a point charge or the gravitational field from a point mass, takes the form

$$ \mathbf{V}(\mathbf{r}) = \frac{k}{r^2} \mathbf{r}_0 = \frac{k}{r^3} \mathbf{r} $$

Show by an explicit calculation in Cartesian coordinates that ∇ · V = 0 everywhere, except at the origin. Then, using a spherical surface about the origin, show that $\oint \mathbf{V} \cdot d\mathbf{A} = 4\pi k$. Hence argue that the charge density for a point charge q located at the origin is $q\delta^3(\mathbf{r}) = q\delta(x)\delta(y)\delta(z)$, where δ(x) is a Dirac δ-function as defined in Chapter 1.
2.10 A “1/r” vector field, such as the magnetic field from an infinitely long current carrying wire, takes the form

$$ \mathbf{V}(\mathbf{r}) = \frac{k}{r} \boldsymbol{\phi}_0 = \frac{k}{r^2} \left( -y\mathbf{x}_0 + x\mathbf{y}_0 \right) $$

Show by an explicit calculation in Cartesian coordinates that ∇ × V = 0 everywhere, except at the origin. Then, using a circular curve about the z-axis, show that $\oint \mathbf{V} \cdot d\mathbf{l} = 2\pi k$. Hence argue that the current density for an infinitely long current carrying wire of zero thickness located along the z-axis is Iδ(x)δ(y).

2.11 Derive the wave equation for the magnetic field B from Maxwell’s equations.

2.12 Show that (2.41) is a solution to the wave Equation (2.34). It is easier to do if you express the Laplacian in spherical coordinates, but you may find it more satisfying to work it through in Cartesian coordinates, remembering that $r = (x^2 + y^2 + z^2)^{1/2}$.

2.13 Find the electrostatic potential V(r) at a distance r from a point charge q, with the integration constant defined so that V(r) → 0 as r → ∞. You can do this either by directly integrating $\int_C \mathbf{E} \cdot d\mathbf{l}$ along some path C, or by simply guessing the function V(r) which satisfies (2.20).

2.14 An “electric dipole” is formed by two point charges ±q separated by a distance d. Form such a system by putting the two charges on the z-axis at z = ±d/2 and calculate the electric field E(r). Show that for distances r ≫ d, the electric field can be written in terms of the electric dipole moment vector $\mathbf{p} = qd\mathbf{z}_0$ rather than depending explicitly on either q or d.

2.15 Find the electrostatic potential V(r) at a distance r from the center of the electric dipole in Problem 2.14. Express your answer in terms of the electric dipole moment p.

2.16 Draw the electric field lines for an electric dipole with dipole moment $\mathbf{p} = p\mathbf{z}_0$. Superimpose contour lines of electric potential on top of the field lines. This is best done by choosing some specific values and using a program such as Mathematica.
2.17 An electric dipole $\mathbf{p} = qd\mathbf{z}_0$ sits in a uniform, external electric field E. By considering the torque on the dipole, from the interaction of the charges ±q with E, calculate the work needed to rotate the dipole through an angle θ, about an axis perpendicular to the plane formed by p and E. Thereby show that the electrostatic potential energy of the dipole in the external field is $U_E = -\mathbf{p} \cdot \mathbf{E}$.

2.18 A “magnetic dipole” is formed by a planar loop of wire with area A and carrying current I. Using an approach similar to that in Problem 2.17, show that the magnetic potential energy of a magnetic dipole in an external magnetic field B is $U_B = -\boldsymbol{\mu} \cdot \mathbf{B}$. Here, the magnetic moment µ has magnitude IA and direction perpendicular to the plane of the loop. You can do this rather easily if you model the loop as a square of side length L, in which case A = L², then use an argument similar to that used to prove Stokes’ theorem to explain why deriving this for a square is equivalent to solving it for any planar loop.

2.19 Given a scalar function χ(r, t), show explicitly that the transformation A → A + ∇χ has no effect on the magnetic field. Then find the condition that needs to be satisfied by χ so that the additional transformation V → V + χ has no effect on the electric field.

2.20 The Poynting vector for an electromagnetic wave is $\mathbf{S} = (1/\mu_0)\mathbf{E} \times \mathbf{B}$. What is the direction of this vector? Show that the Poynting vector has the dimensions of “energy flux,” that is, energy per unit area per unit time.
CHAPTER 3
Particle Mechanics, Relativity, and Photons

CONTENTS
3.1 Newton, Maxwell, and Einstein
    The principle of relativity
    Galilean relativity
    The constancy of the speed of light
3.2 Spacetime in Special Relativity
    Events and the invariant interval
    Four-vectors
    The Lorentz transformation
3.3 Velocity, Momentum, and Energy
    Massless particles: photons
    Application: invariant mass and the Higgs boson
3.4 Problems
The first attempts to formulate quantum mechanics tried to incorporate special relativity. This was only natural, as physical phenomena that required one or both of them came to be known around the beginning of the 20th century. However, quantum mechanics and special relativity soon diverged, particularly as people saw the great success of Schrödinger’s nonrelativistic wave equation. It was a few decades later, with the advent of quantum field theory, before a successful relativistic quantum theory could be developed.

So, why do we take the time to review special relativity in an introductory book on quantum mechanics? Probably the most important reason is the historical context, particularly with respect to the concept of a “massless” particle. Indeed, the connection of such a notion to a “photon” is almost irresistible, as we shall see shortly. Another reason is to remind the student how relativity treats space and time on an equal footing, a concept that is respected by electricity and magnetism but which is violated in both classical mechanics and Schrödinger wave mechanics. Although the use of the mathematics of four-dimensional vector calculus is beyond the scope of this book, we believe it is important for the modern student to realize where the subject goes next.
3.1 NEWTON, MAXWELL, AND EINSTEIN
The essence of classical Newtonian mechanics is embodied by Newton’s second law of motion, which states that a “force” acting on a body will change the “momentum” of that body. Written in modern notation this becomes

$$ \mathbf{F} = \frac{d\mathbf{p}}{dt} \tag{3.1} $$

For a particle with mass m and velocity v, the momentum is p = mv. Therefore

$$ \mathbf{F} = m \frac{d\mathbf{v}}{dt} = m\mathbf{a} \tag{3.2} $$
where a = dv/dt is the acceleration. Equation (3.2) is, of course, the way we usually think of Newton’s second law.

We should make several points about Newton’s second law. First, it treats time as an absolute quantity, the same for anyone who makes a measurement, regardless of their position or velocity. This seems like a natural assumption, but there is no evidence to support it, and it is, in fact, false. The second law is a second-order differential equation that one solves for the position r = r(t), where v = dr/dt. Given initial conditions r(0) and v(0), the solution to the differential equation gives the position r(t) at all times with no uncertainty. That is, the motion is completely deterministic, something that, as we will see later, is at odds with quantum mechanics. Lastly, the second law is almost useless in practice, unless one has some model for the force F, which is generally a function of position and perhaps of time as well. One of Newton’s greatest triumphs, in fact, was his gravitational force law, which led to a mathematical prediction for the motion of planets, when coupled with the second law. There are more fundamental ways to formulate classical mechanics, but we will not pursue them in this book.
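The determinism described above can be illustrated with a short numerical sketch. It assumes a hypothetical force model, an ideal spring F = −kx with k = 1 and mass m = 1 in one dimension, and integrates the second law with the velocity-Verlet scheme; neither the force model nor the integration scheme comes from the text. Given x(0) = 1 and v(0) = 0, the exact solution is x(t) = cos t, which the numerical trajectory should reproduce closely.

```python
import math

def force(x):
    # Hypothetical force model: an ideal spring, F = -k x, with k = 1.
    return -1.0 * x

def integrate(x0, v0, t_end, dt=1e-3, mass=1.0):
    # Velocity-Verlet integration of m dv/dt = F(x), starting from (x0, v0).
    x, v = x0, v0
    a = force(x) / mass
    for _ in range(round(t_end / dt)):
        x += v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x, v

x_final, v_final = integrate(x0=1.0, v0=0.0, t_end=1.0)
print("numerical x(1), v(1):", x_final, v_final)
print("exact     x(1), v(1):", math.cos(1.0), -math.sin(1.0))
```

Running the integration twice from the same initial conditions gives identical results: once the force law and the initial position and velocity are fixed, the trajectory is fixed as well.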
The principle of relativity

The concept of “relativity” in physics is rather simple and certainly profound, yet it continues to confuse many people, including practicing physicists. We try here to state the principle of relativity in a straightforward fashion.

Physics is based on the philosophy of physical law, namely that all “observers” will encounter a world that is governed by the same physical principles. When we formulate physics in terms of mathematics, this philosophy implies that all observers use the same equations to describe how things behave. It does not mean that the quantities in the equation have the same values, but it does mean that when everything in the equation is translated to a different observer, the “equal sign” still holds.

FIGURE 3.1 Two different inertial reference frames S and S′. The coordinates x and y locate a point in frame S, while coordinates x′ and y′ locate a point in frame S′. Frame S′ moves to the right, that is, in the positive x-direction, with speed V relative to S. (Diagram labels: y, y′ = y, V, x or x′.)

The principle of relativity is a way of distinguishing observers. The observers
can not only be in different spatial locations, they can also be moving relative to each other so long as their relative velocity is constant. Two observers moving with constant relative velocity are said to occupy “inertial reference frames.” This is because either of these observers would find that Newton’s first law, the so-called “Law of Inertia,” holds true. (The first law would not appear to hold to an observer in an accelerating reference frame.) In other words, the principle of relativity states that any two observers moving with constant velocity relative to each other, must observe phenomena that are governed by the same equations. Let us see what this implies about Newton’s second law and then Maxwell’s equations.
Galilean relativity
So now we ask the question, Does Newton’s second law, written for example as (3.2), look the same to two observers in different inertial reference frames? The key to answering that question is to understand how to transform from one reference frame to the other. Let’s look at this simply, for motion in two dimensions x and y, in two frames that are moving relative to each other with constant velocity $\mathbf{V} = V\mathbf{x}_0$ in the x-direction. See Figure 3.1. If we assume, as Newton did and as we might consider “obvious” from everyday experience, that t′ = t, then the transformation is very easy to derive. The y′-axis is simply displaced relative to the y-axis by a distance Vt. That is

$$ x' = x - Vt \tag{3.3a} $$
$$ t' = t \tag{3.3b} $$
This is called the Galilean transformation. We emphasize that (3.3b) is by no means derived, but rather it is an assumption that time is an absolute quantity, the same for all observers. Let us use the Galilean transformation to compare how speeds appear to observers in S and S′. In other words, how do velocities transform? The result is simple. If something moves with a speed (in the x-direction) u = dx/dt to an observer in S, and a speed u′ = dx′/dt′ in S′, then (3.3) immediately tell us that

$$ u' = u - V \tag{3.4} $$
which is exactly what our everyday experience says. That is, the difference in velocities is just the relative velocity between the two frames. For example, if someone fires a projectile horizontally while standing on a moving flatcar on a train, a person on the ground will see the projectile move at a greater or lesser speed, the difference given by the speed of the train. Now we can return to our original question, namely, does Newton’s second law apply to all observers in inertial reference frames? Stated in slightly more sophisticated language, Is Newton’s second law (3.2) invariant under the Galilean transformation (3.3)? From (3.4) it is clear that

$$ a' = \frac{du'}{dt'} = \frac{du}{dt} = a \tag{3.5} $$
so the acceleration of the mass m is the same to each observer. Although, depending on its specific form, the force F may look different in one frame from another, we can agree that it is still a “force.” Therefore, the answer is Yes, that is, Newton’s second law is invariant under a Galilean transformation. This concept of invariance under a transformation law turns out to be fundamental to all formulations of physics. It took a couple of centuries, though, for physicists to finally realize that the transformation (3.3) is fatally flawed. Suppose we ask the question, are Maxwell’s equations (2.19) invariant under a Galilean transformation? We are immediately stuck because we’d have to understand how electric and magnetic fields transform. You might imagine that we could just assume that every observer sees the same fields E and B, but you know that can’t be right. A stationary charge has only an electric field, but if it is moving, even with constant velocity, then it also has a magnetic field.
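The invariance argument of (3.3) to (3.5) can be checked numerically. The following is a minimal Python sketch and not part of the original text: the function names and the sample values for the frame speed, acceleration, and initial speed are illustrative assumptions. It transforms a uniformly accelerated trajectory with (3.3) and compares finite-difference velocities and accelerations in the two frames.

```python
def galilean_transform(x, t, V):
    """Return (x', t') in frame S' for an event (x, t) in S, using (3.3)."""
    return x - V * t, t


def central_velocity(x_before, x_after, dt):
    """Central-difference estimate of dx/dt from positions at t - dt and t + dt."""
    return (x_after - x_before) / (2.0 * dt)


def second_difference(x_before, x_mid, x_after, dt):
    """Finite-difference estimate of d2x/dt2 from three equally spaced positions."""
    return (x_after - 2.0 * x_mid + x_before) / dt**2


# Illustrative (assumed) values, not taken from the text.
V = 30.0          # m/s, speed of S' relative to S
ACCEL = 2.0       # m/s^2, acceleration of the object as seen in S
U_INITIAL = 5.0   # m/s, initial speed of the object as seen in S


def position_in_S(t):
    """Uniformly accelerated motion along x as seen in S."""
    return U_INITIAL * t + 0.5 * ACCEL * t**2


dt = 0.5
times = [1.0 - dt, 1.0, 1.0 + dt]
x_S = [position_in_S(t) for t in times]
x_S_primed = [galilean_transform(position_in_S(t), t, V)[0] for t in times]

u_S = central_velocity(x_S[0], x_S[2], dt)
u_S_primed = central_velocity(x_S_primed[0], x_S_primed[2], dt)
a_S = second_difference(*x_S, dt)
a_S_primed = second_difference(*x_S_primed, dt)

print(u_S_primed - u_S)   # velocities differ by -V, as in (3.4)
print(a_S_primed - a_S)   # accelerations agree, as in (3.5)
```

Because the trajectory is quadratic in t, the finite differences are exact apart from rounding, so the comparison does not depend on the choice of dt.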
The constancy of the speed of light
There is something else, though, that immediately demonstrates a fundamental problem with Maxwell’s equations and Galilean relativity. Imagine someone sending out a light pulse horizontally while standing on the platform of a moving train. According to (3.4), and as one might naturally expect, the speed of the light pulse to someone on the ground would differ by the speed of the train. However, as we showed in Section 2.4, electromagnetic waves always move at the same speed c, according to Maxwell’s equations. That is, there is nothing about the equations (2.19) that indicates any preferred reference frame.
FIGURE 3.2 Schematic of a “clock” based on a light pulse that bounces between two mirrors. The mirrors are arranged vertically and are separated by a distance L. In (a) the clock is seen by a stationary observer, while in (b), the clock is moving horizontally with speed V.
This is a fundamental inconsistency between the physical formulations of Newton and Maxwell. One or both of these formulations must be wrong, or at least modified in some way that makes them consistent. It was Einstein, of course, who recognized that the fatal flaw can be traced back to (3.3b). Einstein’s special theory of relativity dispenses with the notion of absolute time. Instead, it hinges on the notion that the “speed of light” is a constant of nature, and from this derives transformation laws that successfully marry Maxwell’s equations with a so-called relativistic formulation of the motion of masses.
3.2 SPACETIME IN SPECIAL RELATIVITY
It is instructive to investigate what the constancy of the “speed of light” implies about measurements of space and time, before we investigate the specific transformation equations that must replace (3.3). We do this by imagining a type of clock that uses light to keep time. Figure 3.2 shows what such a clock might look like, relative to a stationary or moving observer. Let one tick of the clock be defined as the time it takes the light pulse to make a round trip between the mirrors. If the clock is stationary, the path is shown in Fig. 3.2(a) so this time interval is simply

$$ \Delta t = 2 \times \frac{L}{c} \tag{3.6} $$
If the clock is moving, as in Fig. 3.2(b), then during one tick of the clock, the bottom mirror moves a distance V∆t′, where ∆t′ is the time interval for the moving clock. The light pulse traces a path along the hypotenuse of two right triangles, so

$$ \Delta t' = 2 \times \frac{\sqrt{L^2 + (V\Delta t'/2)^2}}{c} \tag{3.7} $$
because we are assuming that c is the same in both frames. Solving this for ∆t′ and using (3.6), we easily derive the following relationship between the fundamental time intervals for the stationary and moving clock:

$$ \Delta t' = \frac{1}{\sqrt{1 - V^2/c^2}}\,\Delta t \tag{3.8} $$
It is customary to define the quantities

$$ \beta = \frac{V}{c} \tag{3.9a} $$

and

$$ \gamma = \frac{1}{\sqrt{1 - \beta^2}} \tag{3.9b} $$

in which case we rewrite (3.8) as

$$ \Delta t' = \gamma\,\Delta t \tag{3.10} $$
Therefore, yes, time will be different for different observers. Indeed, space and time are intimately connected. In fact, we generally speak of spacetime when discussing space and time in special relativity. Two important facts are immediately apparent. First, γ > 1 for any nonzero velocity, so (3.10) says that a tick on a moving clock lasts longer than a tick on a stationary clock. That is, moving clocks record time more slowly than stationary clocks. This phenomenon is called “time dilation.” Secondly, β ≪ 1 for any “normal” speed, so time dilation is in fact a tiny effect in everyday life. In order to verify (3.10), one either needs to move very rapidly, or to make extremely precise measurements, neither of which was possible until well into the 20th Century. Consequently, Newton’s assumption of “absolute time” appears to be quite reasonable based on everyday experience. We also see our first hint that there is something special about the “speed of light” c. Equations (3.9) make no sense for V ≥ c. We’ll discover other peculiar things about moving “faster than light” but it should be clear by now that c plays a role much larger than simply the speed of electromagnetic waves.
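To get a feeling for how small time dilation is at everyday speeds, the definitions (3.9) and the relation (3.10) can be evaluated directly. This is a minimal Python sketch: the function names are hypothetical, and the two sample speeds (250 m/s, roughly an airliner, and 0.6c) are assumed purely for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact by definition of the metre)


def lorentz_gamma(speed):
    """Return gamma = 1/sqrt(1 - beta^2) from (3.9), valid for |speed| < c."""
    beta = speed / C
    if abs(beta) >= 1.0:
        raise ValueError("gamma is undefined for speeds at or above c")
    return 1.0 / math.sqrt(1.0 - beta**2)


def dilated_tick(rest_tick, speed):
    """Tick duration of a moving clock from (3.10), given its tick duration at rest."""
    return lorentz_gamma(speed) * rest_tick


# Illustrative (assumed) speeds: an everyday speed and a relativistic one.
for speed in (250.0, 0.6 * C):
    print(speed / C, lorentz_gamma(speed) - 1.0)
```

For the slower speed, γ − 1 comes out at a few parts in 10¹³, which is why the effect went unnoticed for so long. The function raises an error for V ≥ c, where (3.9) makes no sense.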
Events and the invariant interval
It is convenient to discuss special relativistic (or, simply, “relativistic”) phenomena in terms of “events.” An event is something which occurs at a particular point in space at a particular time. The so-called spacetime coordinates of an event will be different for different observers. The transformation between spacetime coordinates for different observers is called the Lorentz transformation.
One tick of our light pulse clock is the time interval between two events. The first event is the light pulse leaving the bottom mirror, and the second event is the pulse returning. We now consider the separations in spacetime between these two events. For an observer who sees the clock as stationary, as in Fig. 3.2(a), the time separation is ∆t = 2L/c and the spatial separation is ∆x = 0. On the other hand, for the observer who sees Fig. 3.2(b), ∆t′ = 2γL/c and ∆x′ = V∆t′. Next consider

$$ (\Delta s)^2 = c^2(\Delta t)^2 - (\Delta x)^2 = 4L^2 \tag{3.11} $$
using quantities for the stationary clock, and

$$ (\Delta s')^2 = c^2(\Delta t')^2 - (\Delta x')^2 = 4L^2\gamma^2 - 4L^2\gamma^2\frac{V^2}{c^2} = 4L^2 \tag{3.12} $$
for the moving clock. That is, ∆s′ = ∆s. More generally, this quantity is the same to all inertial observers for any two events in spacetime. We call the quantity ∆s the Invariant Interval. Events for which (∆s)² > 0 are called timelike. There is some reference frame in which ∆x = 0 so the separation between the two events is only in time, not in space. On the other hand, events for which (∆s)² < 0 are called spacelike. In this case, there is some frame in which the two events are separated in space but happen simultaneously. If (∆s)² = 0, we call the separation lightlike.
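The equality of (3.11) and (3.12) is easy to confirm with numbers. The sketch below is a minimal Python illustration in which the mirror separation and clock speed are assumed values and the helper names are hypothetical. Time separations are expressed as c∆t so that both arguments are lengths, and the classification is applied to the sign of (∆s)².

```python
import math


def interval_squared(c_dt, dx):
    """Return (Δs)^2 = (cΔt)^2 - (Δx)^2 for time separation c*Δt and spatial separation Δx."""
    return c_dt**2 - dx**2


def classify(s_squared, tolerance=1e-12):
    """Label a separation by the sign of (Δs)^2."""
    if s_squared > tolerance:
        return "timelike"
    if s_squared < -tolerance:
        return "spacelike"
    return "lightlike"


L = 1.5      # mirror separation in metres (assumed value)
beta = 0.8   # clock speed as a fraction of c (assumed value)
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Frame in which the clock is at rest: cΔt = 2L and Δx = 0.
s2_rest = interval_squared(2.0 * L, 0.0)

# Frame in which the clock moves: cΔt' = 2γL and Δx' = VΔt' = β (cΔt').
c_dt_moving = 2.0 * gamma * L
s2_moving = interval_squared(c_dt_moving, beta * c_dt_moving)

print(s2_rest, s2_moving, classify(s2_rest))  # both intervals equal 4 L^2 up to rounding
```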
Four-vectors
The concept of invariance in special relativity is usually formulated using four-vectors, a generalization of vectors in three-dimensional space. Consider the quantity

$$ \Delta x = (c\,\Delta t, \Delta x, \Delta y, \Delta z) = (c\,\Delta t, \Delta\mathbf{r}) \tag{3.13} $$

which combines displacement in space and time into one entity, as our first four-vector. (On the left-hand side ∆x denotes the four-vector; inside the parentheses it denotes the x-component of the spatial displacement.) Although we are now explicitly including the spatial coordinates y and z, we will still only work with frame transformations, also known as “boosts,” in the x-direction. Given two four-vectors $A = (A_0, \mathbf{A})$ and $B = (B_0, \mathbf{B})$, we define the inner product

$$ A \cdot B = A_0 B_0 - \mathbf{A}\cdot\mathbf{B} \tag{3.14} $$

where $\mathbf{A}\cdot\mathbf{B}$ is the usual inner product between spatial three-vectors. Under this definition, the invariant interval can be written as

$$ (\Delta s)^2 = \Delta x \cdot \Delta x \tag{3.15} $$

That is, when the components of ∆x transform appropriately from one reference frame to another, the inner product ∆x · ∆x is invariant. This carries over to four-vectors in general, so that A · B is also relativistically invariant, that is, invariant under boosts. Mathematically, this is completely analogous to $\mathbf{A}\cdot\mathbf{B}$ being invariant under rotations for three-vectors $\mathbf{A}$ and $\mathbf{B}$.
The Lorentz transformation
Here we present the formal equations for the Lorentz transformation, also known as a boost, between two reference frames S and S′, where S′ moves in the positive x-direction with speed V relative to S. We write down this transformation for a general four-vector $A = (A_0, \mathbf{A}) = (A_0, A_x, A_y, A_z)$, and leave the derivation, in terms of the spacetime four-vector $x = (ct, \mathbf{r})$, for the problems at the end of this chapter. Maintaining the definitions (3.9), the Lorentz transformation is

$$ A'_0 = \gamma A_0 - \gamma\beta A_x \tag{3.16a} $$
$$ A'_x = \gamma A_x - \gamma\beta A_0 \tag{3.16b} $$
$$ A'_y = A_y \tag{3.16c} $$
$$ A'_z = A_z \tag{3.16d} $$
It is easy to see that this transformation is just the identity when β → 0.
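The transformation (3.16) and the inner product (3.14) translate directly into a few lines of code. The following Python sketch is only an illustration: the function names and the two sample four-vectors are assumptions, not something defined in the text. It checks that the inner product is unchanged by a boost, that β → 0 gives the identity, and that boosting back with −β recovers the original components.

```python
import math


def boost_x(four_vector, beta):
    """Apply the boost (3.16) along x to a four-vector (A0, Ax, Ay, Az)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    a0, ax, ay, az = four_vector
    return (gamma * a0 - gamma * beta * ax,
            gamma * ax - gamma * beta * a0,
            ay,
            az)


def inner(a, b):
    """Four-vector inner product (3.14): A0*B0 minus the three-vector dot product."""
    return a[0] * b[0] - (a[1] * b[1] + a[2] * b[2] + a[3] * b[3])


# Two arbitrary (assumed) four-vectors and an arbitrary boost speed.
A = (5.0, 1.0, 2.0, 3.0)
B = (4.0, -1.0, 0.5, 2.0)
beta = 0.6

A_primed = boost_x(A, beta)
B_primed = boost_x(B, beta)

print(inner(A, B), inner(A_primed, B_primed))  # equal up to rounding
print(boost_x(A_primed, -beta))                # recovers A up to rounding
```

Representing a four-vector as a plain tuple keeps the sketch dependency-free; a matrix form of the boost would be the natural choice for anything larger.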
3.3 VELOCITY, MOMENTUM, AND ENERGY
Let us build another four-vector that we will call four-velocity. For a particle observed in a general reference frame, we write its ordinary three-velocity as

$$ \mathbf{u}' = \frac{\Delta\mathbf{r}'}{\Delta t'} \tag{3.17} $$

(Of course, we should write $\mathbf{u}' = d\mathbf{r}'/dt'$, but we will just assume that changes can be infinitesimally small if we need them to be.) This is inconvenient to use as a model to define four-velocity, because both the numerator and denominator will transform differently from frame to frame. Instead, we can divide by the proper time ∆τ, namely the time interval between events as measured by a clock that is stationary relative to the particle. So, using (3.10), we have the four-velocity

$$ u' = \frac{\Delta x'}{\Delta\tau} = \left(\frac{c\,\Delta t'}{\Delta\tau}, \frac{\Delta\mathbf{r}'}{\Delta\tau}\right) = \left(\frac{c\,\Delta t'}{\Delta\tau}, \frac{\Delta t'}{\Delta\tau}\frac{\Delta\mathbf{r}'}{\Delta t'}\right) = \gamma(c, \mathbf{u}') \tag{3.18} $$

where, in this case, γ refers to the velocity needed to transform to a frame where the particle is at rest, namely $\mathbf{u}'$. The inner product of the four-velocity with itself is indeed relativistically invariant:

$$ u' \cdot u' = \gamma^2\left(c^2 - |\mathbf{u}'|^2\right) = \gamma^2 c^2 (1 - \beta^2) = c^2 \tag{3.19} $$

There is enormous power in a formulation of special relativity using four-vectors, and we have only scratched the surface here, but it is enough for now to help motivate our study of quantum mechanics. It is natural for us now to define the four-momentum for a particle with mass m. Dispensing with the “primed” notation, we have

$$ p = mu = (\gamma m c, \gamma m \mathbf{u}) = (\gamma m c, \mathbf{p}) \tag{3.20} $$
where $\mathbf{p} = \gamma m\mathbf{u}$ is the relativistic three-momentum. Using an obvious notation where $p^2 = p \cdot p$, the square of the four-momentum is

$$ p^2 = m^2 c^2 \tag{3.21} $$
We can write the spatial components of (3.20) as a relativistic version of ordinary three-momentum, that is $\mathbf{p} = \gamma m\mathbf{u}$. Sometimes the term “relativistic mass” is used for the quantity γm, but we will avoid that potentially confusing concept here. Now consider the timelike component of (3.20). Its interpretation becomes apparent if we multiply by c and expand for low speeds. That is

$$ \gamma m c^2 = mc^2\left[1 - \left(\frac{u}{c}\right)^2\right]^{-1/2} \approx mc^2\left[1 + \frac{1}{2}\left(\frac{u}{c}\right)^2\right] = mc^2 + \frac{1}{2}mu^2 \tag{3.22} $$
The second term is the ordinary (non-relativistic) kinetic energy, while the first is a new term that we can call rest energy. We generally write

$$ E = \gamma m c^2 = mc^2 + K \tag{3.23} $$
for the total energy of a particle of mass m, with kinetic energy K = (γ − 1)mc². It is common to write the four-momentum in terms of the (relativistic) total energy and three-momentum, that is

$$ p = \left(\frac{E}{c}, \mathbf{p}\right) \tag{3.24} $$

Combining (3.21) with (3.24) gives us a convenient relationship between a particle’s energy and three-momentum. Using m²c² = E²/γ²c², we find

$$ \frac{E^2}{c^2} - |\mathbf{p}|^2 = \frac{E^2}{\gamma^2 c^2} \tag{3.25} $$

Then, using γ² = 1/(1 − β²), we arrive at the useful expression

$$ \frac{|\mathbf{p}|c}{E} = \beta \tag{3.26} $$
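Relations (3.21) to (3.26) can be spot-checked numerically. The sketch below is a minimal Python illustration under stated assumptions: units with c = 1 are used, the function name is hypothetical, and the mass 0.511 MeV/c² is chosen simply as a convenient electron-like value. It confirms that |p|c/E = β, that E² − |p|²c² returns the rest energy squared, and that K approaches ½mu² at low speed.

```python
import math


def energy_momentum(mass, beta):
    """Return (E, |p|) for a particle of the given mass moving at speed beta*c.

    Units with c = 1 are assumed: mass and E in MeV, |p| in MeV/c.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * mass, gamma * mass * beta


ELECTRON_MASS = 0.511  # MeV/c^2, used here as an illustrative mass

for beta in (0.01, 0.5, 0.99):
    E, p = energy_momentum(ELECTRON_MASS, beta)
    kinetic = E - ELECTRON_MASS                  # K = (gamma - 1) m c^2, from (3.23)
    newtonian = 0.5 * ELECTRON_MASS * beta**2    # (1/2) m u^2, second term of (3.22)
    invariant_mass = math.sqrt(E**2 - p**2)      # from (3.21) combined with (3.24)
    print(beta, p / E, invariant_mass, kinetic / newtonian)
```

The last printed column is close to 1 at β = 0.01 and grows rapidly as β approaches 1, which shows where the low-speed expansion (3.22) stops being useful.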
As one might expect, the total four-momentum of a collection of particles is a conserved quantity. That is, three-momentum and total energy are conserved separately. (The idea of potential energy, dependent on three-dimensional space, is not well defined in special relativity, so we do not include it in this discussion.) In fact, this can be proved within the context of Einstein’s general theory of relativity, but that is beyond the scope of the material we are covering here. Conservation of four-momentum, however, is a powerful tool for calculations using relativistic kinematics. One reason for this is that the inner product between any two four-vectors gives the same answer no matter in which frame it is evaluated. As an example, let us work out the solution to a common problem in two-body inelastic collisions.
Worked Example 3.1 An object of mass m₁ moves with speed v. It collides with, and sticks to, a stationary object of mass m₂. Find the mass M of the final, combined object, and check that you get the right answer if v ≪ c.

Solution We can express conservation of four-momentum in this collision as

$$ p_1 + p_2 = P $$

where p₁ is the four-momentum of m₁, p₂ is the four-momentum of m₂, and P is the four-momentum of the combined object. Squaring both sides and using (3.21) gives

$$ m_1^2 c^2 + 2\,p_1 \cdot p_2 + m_2^2 c^2 = M^2 c^2 $$

Given the information at hand, it is simplest to evaluate p₁ · p₂ in the frame where m₂ is at rest, typically called the “laboratory frame.” In this case

$$ p_1 = (\gamma m_1 c, \gamma m_1 \mathbf{v}) \quad \text{and} \quad p_2 = (m_2 c, \mathbf{0}) $$

so that p₁ · p₂ = γm₁m₂c² using the definition (3.14) of the inner product between four-vectors. We therefore find

$$ M = \left[ m_1^2 + 2\gamma m_1 m_2 + m_2^2 \right]^{1/2} $$

which correctly reduces to M = m₁ + m₂ in the limit v/c → 0 where γ → 1.
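The result of Worked Example 3.1 can be cross-checked against a direct calculation from the total energy and momentum in the laboratory frame. This is a minimal Python sketch with hypothetical function names, in units where c = 1; the sample masses and speeds are assumed for illustration.

```python
import math


def combined_mass(m1, m2, beta):
    """Mass M of the merged object from Worked Example 3.1 (m2 initially at rest)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return math.sqrt(m1**2 + 2.0 * gamma * m1 * m2 + m2**2)


def combined_mass_from_totals(m1, m2, beta):
    """Cross-check: M from the total energy and total momentum in the lab frame (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    total_energy = gamma * m1 + m2
    total_momentum = gamma * m1 * beta
    return math.sqrt(total_energy**2 - total_momentum**2)


# Assumed sample values: equal unit masses at several speeds.
for beta in (0.0, 0.001, 0.6, 0.99):
    print(beta, combined_mass(1.0, 1.0, beta), combined_mass_from_totals(1.0, 1.0, beta))
```

The two columns agree at every speed, and at β = 0 both reduce to m₁ + m₂. At high speed M exceeds m₁ + m₂ because the incident kinetic energy contributes to the mass of the combined object.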
This example points us to the concept of centre-of-mass energy in special relativity. If we discuss the collision in terms of a frame $S^\star$ which has zero total three-momentum, that is the centre-of-mass frame, then by definition $\mathbf{p}_1^\star + \mathbf{p}_2^\star = \mathbf{0}$. Therefore $p_1^\star + p_2^\star = (E_1^\star/c + E_2^\star/c, \mathbf{0})$ and $P^\star = (Mc, \mathbf{0})$, and these two four-vectors are of course equal because of four-momentum conservation. Writing the total energy in the centre-of-mass as $E_{\rm cm} = E_1^\star + E_2^\star$, we see that $E_{\rm cm} = Mc^2$. In other words, the total amount of “available” energy to create mass in a collision is just the total energy in the centre-of-mass frame.
Massless particles: photons
A natural question to ask is, “Can there be a particle with zero mass?” It is not hard to imagine that a zero-mass particle can have nonzero energy, but E = γmc² seems to imply that it must have speed c for E to be finite. Let us use our relativistic four-vector formalism to try and put this idea on more solid ground. From (3.21) we see that a convenient definition of a massless particle is one whose four-momentum squared is zero. Combining this with (3.24) gives us

$$ E = |\mathbf{p}|c \tag{3.27} $$
which is obviously consistent with (3.26) for β = 1. This is the relationship between energy and momentum for a massless particle. In this case, of course, the total and kinetic energy are the same. In Chapter 4 we will go through the evidence, which was mounting in the early part of the 20th Century, that light behaved as if it came in “packets” of energy. These packets came to be called photons, although it was difficult to consider photons “particles” until the discovery of Compton scattering. This phenomenon demonstrated that photons behaved according to the laws of relativistic kinematics. All of the evidence was that photons had energy, but zero mass.

FIGURE 3.3 Scattering of a massless particle with energy E from a stationary particle with mass m. The massless particle moves off at an angle θ and with energy E′. The massive particle moves off with momentum p.

We know electromagnetic radiation can scatter from atomic electrons, so it is not unreasonable to suspect that photons might scatter from electrons. How would such a scattering manifest itself? We can answer this question by simply considering the kinematics of scattering of a zero mass particle. Referring to Figure 3.3, the four-momenta of the incident and scattered massless particle are

$$ k = \left(\frac{E}{c}, \frac{E}{c}\mathbf{k}_0\right) \tag{3.28a} $$

and

$$ k' = \left(\frac{E'}{c}, \frac{E'}{c}\mathbf{k}'_0\right) \tag{3.28b} $$
respectively, where $\mathbf{k}_0$ and $\mathbf{k}'_0$ are unit vectors in the directions of the incident and scattered photons. If we write $p_0 = (mc, \mathbf{0})$ for the four-momentum of the initially stationary massive particle, and p for the massive particle after the collision, then conservation of four-momentum tells us that

$$ k + p_0 = k' + p \tag{3.29} $$
After a slight rearrangement and squaring both sides, we find

$$ (k + p_0 - k')^2 = p^2 \tag{3.30} $$

Recall that the square of a particle’s four-momentum is just determined by its mass; see (3.21). Equation (3.30) therefore simplifies to

$$ k \cdot p_0 - k \cdot k' - k' \cdot p_0 = 0 \tag{3.31} $$

Using $p_0 = (mc, \mathbf{0})$ and Equations (3.28), this becomes

$$ \frac{E}{c}\,mc - \left(\frac{E}{c}\frac{E'}{c} - \frac{E}{c}\frac{E'}{c}\,\mathbf{k}_0 \cdot \mathbf{k}'_0\right) - \frac{E'}{c}\,mc = 0 \tag{3.32} $$
However $\mathbf{k}_0 \cdot \mathbf{k}'_0 = \cos\theta$ so we finally arrive at

$$ E' = \frac{E}{1 + \dfrac{E}{mc^2}(1 - \cos\theta)} \tag{3.33} $$

In other words, the effect of scattering a massless particle from a massive one is that the energy of the massless particle is decreased, depending on the scattering angle. This is, of course, completely expected. We also see that the amount of this energy decrease depends on the size of the energy relative to the mass (times c²) of the target particle. For low energies, the scattered energy is nearly the same as the incident energy. We will see in the next chapter how this is connected to the concept of “photons” in the early development of quantum mechanics.
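Equation (3.33) is simple enough to tabulate. The sketch below is a minimal Python illustration: the function name is hypothetical, the target is assumed to be an electron with rest energy 0.511 MeV, and the two incident energies are sample values chosen to contrast the low-energy and high-energy behaviour.

```python
import math

ELECTRON_REST_ENERGY = 0.511  # m c^2 in MeV (assumed target: an electron)


def scattered_energy(E, theta, rest_energy=ELECTRON_REST_ENERGY):
    """Energy E' of the scattered massless particle from (3.33); theta in radians."""
    return E / (1.0 + (E / rest_energy) * (1.0 - math.cos(theta)))


# Assumed sample incident energies in MeV: one far below m c^2, one above it.
for E in (0.001, 1.0):
    for theta_deg in (0, 90, 180):
        E_primed = scattered_energy(E, math.radians(theta_deg))
        print(E, theta_deg, E_primed / E)
```

For the 0.001 MeV case the ratio E′/E stays within half a percent of 1 at every angle, while for 1 MeV the backscattered particle keeps only about a fifth of its energy.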
Application: invariant mass and the Higgs boson
Most subatomic particles are unstable. They decay rapidly to lighter particles that live long enough to be observed in detectors. These detectors can precisely measure the direction, momentum, and energy of the long-lived particles. Special relativity provides a framework for measuring the mass of the unstable particles, by observing the properties of the long-lived daughters. Suppose an unstable particle with mass M and four-momentum P decays to two so-called “daughter” particles with masses m₁ and m₂ and four-momenta p₁ and p₂. Conservation of the four-momentum tells us that

$$ p_1 + p_2 = P \tag{3.34} $$

and squaring both sides gives us

$$ m_1^2 c^2 + 2\,p_1 \cdot p_2 + m_2^2 c^2 = M^2 c^2 \tag{3.35} $$

The key is that particle detectors measure the momenta and energy of the daughters (and identify them so we know their mass). Since p₁ · p₂ is a relativistic invariant, we can evaluate it in any reference frame and get the same answer.
FIGURE 3.4 Two very high energy protons collide in the CMS detector at CERN, producing hundreds of elementary particles. Two of these are identified as photons in one particular collision. Plotting the invariant mass of pairs of photons in many thousands of events shows a “bump” in the spectrum at 125 GeV/c². This is one observation of the Higgs boson. The diagram appears courtesy CERN. The plot is from the publication by V. Khachatryan, et al, EPJC, October 2014, 74:3076, courtesy Springer.
Particle physicists have used this so-called invariant mass technique for many decades, and most recently it was applied to the discovery of the Higgs boson. Theory does not predict the mass of the Higgs, but given the mass, it precisely predicts the rate for two-photon decay H → γγ. Therefore, one approach to searching for the Higgs boson is to produce it in high energy collisions and then search for photon pairs that point to a unique invariant mass. Large particle detectors, such as the Compact Muon Solenoid (CMS) at CERN, observe photons as large, localized energy deposits on a cylindrical surface surrounding the collision point. See Figure 3.4. The energy of the photon is measured directly, and the position on the cylinder tells the direction. Energetic photons, however, most often come from processes other than Higgs boson production and decay. In addition, reaction products other than photons can also leave large amounts of energy in the photon detectors. Consequently, “photon pairs” are routinely observed that have nothing to do with the Higgs, and instead provide a “background” which can obscure the Higgs signal. Only by examining a very large number of collisions, can we extract the signal from background.
Following Equation (3.34) we set m₁ = m₂ = 0 and define

$$ M_{\gamma\gamma}^2 = 2\,p_1 \cdot p_2 / c^2 = 2E_1 E_2 (1 - \cos\theta_{12})/c^4 \tag{3.36} $$

where E₁ and E₂ are the individual photon energies, and θ₁₂ is the angle between the two photons. Figure 3.4 includes a histogram of Mγγ for a large number of collisions. The histogram is almost featureless, except for a “bump” near Mγγ = 125 GeV/c², accentuated on the bottom where the smooth background is subtracted. This is how the Higgs boson was discovered.
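The invariant-mass formula (3.36) can be compared with a direct evaluation from the summed four-momenta of the two photons. The following Python sketch is illustrative only: the function names are hypothetical, units with c = 1 are assumed (energies in GeV, masses in GeV/c²), and the photon pair is a made-up example, not real detector data.

```python
import math


def diphoton_mass(E1, E2, theta12):
    """Invariant mass from (3.36) for photon energies in GeV and opening angle in radians."""
    return math.sqrt(2.0 * E1 * E2 * (1.0 - math.cos(theta12)))


def photon_four_momentum(E, direction):
    """Four-momentum (E, px, py, pz) of a photon with c = 1; direction must be a unit vector."""
    return (E,) + tuple(E * n for n in direction)


def invariant_mass(p1, p2):
    """Invariant mass of a pair from the square of the summed four-momenta (c = 1)."""
    total = [a + b for a, b in zip(p1, p2)]
    m_squared = total[0]**2 - sum(component**2 for component in total[1:])
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding errors


# Hypothetical photon pair (not real data): energies in GeV, directions 90 degrees apart.
E1, E2 = 100.0, 78.125
p1 = photon_four_momentum(E1, (1.0, 0.0, 0.0))
p2 = photon_four_momentum(E2, (0.0, 1.0, 0.0))
print(diphoton_mass(E1, E2, math.pi / 2.0), invariant_mass(p1, p2))
```

Both routes give the same mass for this pair. In an analysis of the kind described above, such a function would be applied to every candidate photon pair and the results collected into a histogram.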
3.4 PROBLEMS
3.1 Derive the Lorentz Transformation (3.16) for space-time coordinates (ct, x, y, z) for a boost along the x-direction using the following construct. Let the set of axes (x′, y′, z′) move along the x-axis of an (x, y, z) coordinate system at speed V. Let their origins coincide at t = t′ = 0, at which time a spherical light pulse is emitted. Then, the constancy of the speed of light implies that

$$ x^2 + y^2 + z^2 = c^2 t^2 \quad \text{and} \quad x'^2 + y'^2 + z'^2 = c^2 t'^2 $$

Then write t′ = α(t − βx), x′ = γ(x − δt), y′ = y, and z′ = z and find solutions for α, β, γ, and δ.

3.2 Invert the Lorentz Transformation (3.16) by directly solving for the $A_\mu$ in terms of the $A'_\mu$ and show that the result is the same as taking β → −β.

3.3 Write the Lorentz Transformation (3.16) in terms of the rapidity η, defined as tanh η = β. Your result will contain cosh η and sinh η but not γ or β explicitly. Can you see a resemblance to ordinary rotation in three space dimensions?

3.4 Use the Lorentz Transformation (3.16) to demonstrate time dilation (3.10). That is, apply the transformation to two events A and B which occur at the same place in the frame S′, and compare the time difference between them as seen by an observer in each of the two frames.

3.5 Use the Lorentz Transformation (3.16) to demonstrate length contraction. That is, apply the transformation to two events A and B which occur at the same time in the frame S′, and compare the spatial separation between them as seen by an observer in each of the two frames.

3.6 Use the results of Problems 3.4 and 3.5 to show that (∆s)² is the same to observers in both frames.

3.7 Use the Lorentz Transformation (3.16) to derive the relativistic velocity addition formula. That is, write u′ = ∆x′/∆t′ in terms of u = ∆x/∆t and the relative frame velocity V. Show that if the speed is c in the S frame, then it is c in the S′ frame.

3.8 You are probably familiar with the Doppler effect as the frequency shift of a sound wave when the source is receding from, or approaching, the listener. There is a similar effect with electromagnetic radiation, but it is generally important to take into account relativistic effects because the sources are often moving at high speeds, and time dilation effects must be taken into account. Show that if an electromagnetic wave is emitted with frequency f₀ in the source rest frame, then an observer with relative speed V sees a frequency

$$ f = f_0\sqrt{\frac{1 + V/c}{1 - V/c}} \ \ \text{(approaching)} \quad \text{or} \quad f = f_0\sqrt{\frac{1 - V/c}{1 + V/c}} \ \ \text{(receding)} $$

You should also note that relativity implies a “transverse” Doppler Effect, for objects moving at right angles to the line between the source and observer.

3.9 An astronomical object known as 3C273 was a long-standing puzzle until astronomers realized that it was emitting radiation from hydrogen atoms that was shifted to much longer wavelengths by a factor of 1.158. At what speed is 3C273 receding from the Earth? Note: 3C273 was the first object to be called a “Quasistellar Object” or quasar.

3.10 An object of mass m is moving with a speed βc and collides with a stationary object also of mass m. In their centre-of-momentum frame, both masses scatter at 90° relative to the incident direction of motion. Find the angle between the two objects in the lab frame.

3.11 An electron with energy E much larger than its rest energy scatters from a target through an angle θ. Its final energy is E′. The four-momentum transfer is q = k − k′ where k (k′) is the electron incident (scattered) four-momentum. Show that

$$ q^2 = -4EE'\sin^2\frac{\theta}{2} $$

3.12 An object of mass m is moving with a speed βc and collides with a stationary object also of mass m. If new matter is created out of the collision energy, what is the largest mass that can appear? (You need to find the amount of energy in the centre-of-momentum system.)

3.13 The antiproton $\bar{p}$ was discovered in the reaction $pp \to ppp\bar{p}$. What is the minimum energy needed for the accelerated incident proton if the target proton is at rest? It would interest you to look up the history of an accelerator called the Bevatron.

3.14 An object with mass m and kinetic energy K collides with, and sticks to, a stationary object of mass M. Find the mass of the final, compound object. Show that you get the correct answer for K ≪ mc². Also consider the answer for K ≫ mc² for the cases m ≪ M and m ≫ M.

3.15 A neutrino is a very light (but not massless) elementary particle. Its mass m is so small that, when it is produced in nature, it almost always comes with an energy E ≫ mc². Find an expression for E in terms of m and p, its momentum, to lowest nonzero order in mc/p.

3.16 The Sun derives its energy from the (effective) nuclear fusion reaction

$$ p + p + p + p \to {}^4\mathrm{He} + 2e^+ + 2\nu_e $$

where the neutrino $\nu_e$ mass can be neglected. The proton p has mass 938.272 MeV/c², the alpha particle ⁴He has mass 3727.379 MeV/c², and the positron e⁺ has mass 0.511 MeV/c². Find the amount of energy released in one fusion reaction. If the Sun has a mass of 2 × 10³⁰ kg, how long can the Sun live if it is made purely of hydrogen?
CHAPTER 4
The Early Development of Quantum Mechanics

CONTENTS
4.1 The Photoelectric Effect
    Application: photomultiplier tubes
4.2 The Compton Effect
4.3 Line Spectra and Atomic Structure
4.4 De Broglie Waves
    Application: electron microscopy
4.5 Wave-Particle Duality
    The uncertainty principle
4.6 The Rest of This Book
4.7 Problems
Quantum mechanics was developed as a response to the inability of the classical theories of mechanics and electromagnetism to provide a satisfactory explanation of some of the properties of electromagnetic radiation and of atomic structure. As a result, a theory has emerged whose basic principles can be used to explain not only the structure and properties of atoms, molecules and solids, but also those of nuclei and of “elementary” particles such as the proton and neutron. Although there are still many features of the physics of such systems that are not fully understood, there are presently no indications that the fundamental ideas of quantum mechanics are incorrect. In order to achieve this success, quantum mechanics has been built on a foundation that contains a number of concepts that are fundamentally different from those of classical physics and which have radically altered our view of the way the physical universe operates. This book aims to elucidate and discuss the conceptual basis of the subject as well as explaining its success in describing the behaviour of atomic and subatomic systems.

Quantum mechanics is often thought to be a difficult subject, not only in its conceptual foundation, but also in the complexity of its mathematics. However, although a rather abstract formulation is required for a proper treatment of the subject, much of the apparent complication arises in the course of the solution of essentially simple mathematical equations applied to particular physical situations. We shall discuss a number of such applications in this book, because it is important to appreciate the success of quantum mechanics in explaining the results of real physical measurements. However, the reader should try not to allow the ensuing algebraic complication to hide the essential simplicity of the basic ideas.

In this chapter we shall discuss some of the key experiments that illustrate the failure of classical physics. However, although the experiments described were performed in the first quarter of the twentieth century and played an important role in the development of the subject, we shall not be giving a historically based account. Neither will our account be a complete description of the early experimental work. For example, we shall not describe the experiments on the properties of thermal radiation and the heat capacity of solids that provided early indications of the need for the quantization of the energy of electromagnetic radiation and of mechanical systems. The topics to be discussed have been chosen as those that point most clearly towards the basic ideas needed in the further development of the subject. As so often happens in physics, the way in which the theory actually developed was by a process of trial and error, often relying on flashes of inspiration, rather than the possibly more logical approach suggested by hindsight.
4.1 THE PHOTOELECTRIC EFFECT
When light strikes a clean metal surface in a vacuum, it causes electrons to be emitted with a range of energies. The maximum energy, Eₓ, of the emitted electrons can be measured by applying a negative voltage between the metal and the electron detector. This opposes the electron motion, and when its magnitude is equal to or greater than a value known as the “stopping potential,” which (multiplied by the electron charge) equals the maximum electron energy, no electrons will be detected. For light of a given frequency f, Eₓ is found to be equal to the difference between two terms. One of these is proportional to the frequency of the incident light with a constant of proportionality h that is the same whatever the metal used, while the other is independent of frequency but varies from metal to metal. Neither term depends on the intensity of the incident light, which affects only the rate of electron emission. Thus

$$ E_x = hf - \phi \tag{4.1} $$
It is impossible to explain this result on the basis of the classical theory of light as an electromagnetic wave. This is because the energy contained in such a wave would arrive at the metal at a uniform rate and there is no apparent reason why this energy should be divided up in such a way that the maximum electron energy is proportional to the frequency and independent of the intensity of the light. This point is emphasized by the dependence of the rate of electron emission on the light intensity. Although the average emission rate is proportional to the intensity, individual electrons are emitted at random. It follows that electrons are sometimes emitted well before sufficient electromagnetic energy should have arrived at the metal, and this point has been confirmed by experiments performed using very weak light.
Such considerations led Einstein to postulate that the classical electromagnetic theory does not provide a complete explanation of the properties of light, and that we must also assume that the energy in an electromagnetic wave is “quantized” in the form of small packets, known as photons, each of which carries an amount of energy equal to $hf$. Given this postulate, we can see that when light is incident on a metal, the maximum energy an electron can gain is that carried by one of the photons. Part of this energy is used to overcome the binding energy of the electron to the metal, so accounting for the quantity $\phi$ in (4.1), which is known as the work function. The rest is converted into the kinetic energy of the freed electron, in agreement with Equation (4.1) and the experimental results. The photon postulate also explains the emission of photoelectrons at random times. Thus, although the average rate of photon arrival is proportional to the light intensity, individual photons arrive at random and, as each carries with it a quantum of energy, there will be occasions when an electron is emitted well before this would be expected classically.
The constant $h$ connecting the energy of a photon with the frequency of the electromagnetic wave is known as Planck’s constant, because it was originally postulated by Max Planck in order to explain some of the properties of thermal radiation. It is a fundamental constant of nature that frequently occurs in the equations of quantum mechanics. We shall find it convenient to change this notation slightly and define another constant $\hbar$ (pronounced “h-cross” or “h-slash”) as being equal to $h$ divided by $2\pi$. Moreover, when referring to waves, we shall normally use the angular frequency $\omega$ ($= 2\pi f$), in preference to the frequency $f$. Using this notation, the photon energy $E$ can be expressed as
$$E = \hbar\omega \tag{4.2}$$
Throughout this book we shall write our equations in terms of $\hbar$ and avoid ever again referring to $h$. We note that $\hbar$ has the dimensions of energy × time and its currently best accepted value is $1.054\,571\,596 \times 10^{-34}$ J s.
Worked Example 4.1 The maximum energy of photoelectrons emitted from potassium is 2.1 eV when illuminated by light of wavelength $3 \times 10^{-7}$ m and 0.5 eV when the light wavelength is $5 \times 10^{-7}$ m. Use these results to obtain values for Planck’s constant and the minimum energy needed to free an electron from potassium.
Solution Using (4.1), the fact that 1 eV is equivalent to $1.6 \times 10^{-19}$ J, and remembering that $f = c/\lambda$, where $c = 3 \times 10^{8}$ m s$^{-1}$ is the speed of light and $\lambda$ is the wavelength, we get
$$2\pi \times 10^{15}\,\hbar - \phi = 3.36 \times 10^{-19}$$
and
$$1.2\pi \times 10^{15}\,\hbar - \phi = 0.8 \times 10^{-19}$$
Solving these simultaneous equations gives $\hbar = 1.02 \times 10^{-34}$ J s and $\phi = 1.9$ eV.
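The arithmetic of Worked Example 4.1 can be checked numerically. The following is a minimal sketch, assuming the same rounded constants as the example (the speed of light as 3 × 10^8 m/s and 1 eV as 1.6 × 10^-19 J); the function name `fit_photoelectric` is our own illustrative choice and does not come from the text.

```python
import math

EV = 1.6e-19  # joules per electronvolt, rounded as in the worked example
C = 3.0e8     # speed of light in m/s, rounded as in the worked example


def fit_photoelectric(lam1, e1_ev, lam2, e2_ev):
    """Solve E_x = hbar*omega - phi for hbar (J s) and phi (eV).

    lam1, lam2 are wavelengths in metres; e1_ev, e2_ev are the
    corresponding maximum photoelectron energies in eV.
    """
    omega1 = 2 * math.pi * C / lam1
    omega2 = 2 * math.pi * C / lam2
    # Subtracting the two equations eliminates the work function phi.
    hbar = (e1_ev - e2_ev) * EV / (omega1 - omega2)
    phi_ev = (hbar * omega1 - e1_ev * EV) / EV
    return hbar, phi_ev


hbar, phi = fit_photoelectric(3e-7, 2.1, 5e-7, 0.5)
print(f"hbar = {hbar:.3g} J s, phi = {phi:.2g} eV")
```

Subtracting the two equations first is the same elimination one would do by hand; with only two data points the "fit" is exact, so the result inherits the rounding of the input data.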
Application: photomultiplier tubes
One important application of the photoelectric effect is the development of photomultiplier tubes, also known as PMTs. These are extremely sensitive, low-noise detectors of visible light. Some models are able to detect individual photons and discriminate them from spurious background sources.
FIGURE 4.1 A schematic of a photomultiplier tube (PMT), with a “box and grid” dynode structure, along with a photograph of a Hamamatsu R1306 PMT which uses an analogous construction. Parts labelled in the schematic: direction of light, faceplate, photocathode, focusing electrode, secondary electron, electron multiplier (dynodes), last dynode, anode, vacuum, stem, and stem pin. Materials courtesy of Hamamatsu Photonics K.K., reprinted with permission.
The operation of a PMT begins with the photoelectric effect–see Figure 4.1. A photon of visible light passes through the faceplate of an evacuated glass tube. The inside of the faceplate is coated with a thin layer of a special material, called a photocathode, that has an exceptionally small work function. The photocathode is thick enough so there is a high probability that the photoelectric reaction (4.1) occurs, but thin enough so that the ejected electron escapes into the evacuated region. Metal pins pass through the back of the glass tube, providing electrical connections to a metal structure inside. Most of the pins are attached to high voltage sources that set up complicated electric fields inside the tube, shaped by the metal structure. Two of the pins are used to evaporate
the photocathode on the inside of the faceplate, after the tube is evacuated and sealed. The electrostatic field between the metal components serves several purposes. In the front of the tube, the structure creates a focussing electrode that guides the photo-electron into a secondary structure called an electron multiplier. The electron multiplier consists of a series of “dynodes” which both receive and emit electrons, with each dynode at a successively higher electric potential. The last stage is an anode which receives the signal and makes it available at one of the output pins.
When an electron, accelerated through some potential difference, strikes the material of a dynode, several electrons are emitted. Thus, one electron ejected from the photocathode gives rise to $\delta^N$ electrons at the anode, where $\delta$ is the average number of electrons emitted at a dynode and $N$ is the number of dynode stages. With $\delta \approx 3$ and $N = 10$ as in Figure 4.1, the “gain” of a PMT is $3^{10} \approx 6 \times 10^{4}$. This is a rather large value for any stable amplifier.
Many different variants on the basic design are possible, leading to different performance characteristics, applications, and cost. For example, the type and thickness of the glass, as well as the choice of photocathode material, will affect the “quantum efficiency” of the PMT. This is the probability, as a function of wavelength of light, that an electron will be ejected per incident photon. The detailed design of the dynode structure also affects performance. Figure 4.1 shows a so-called “box and grid” arrangement, but there are many others. A handbook, Photomultiplier Tubes: Basics and Applications, is available online from Hamamatsu Photonics K.K. It is a complete technical description of the principles of operation, including different design options, as well as a variety of applications.
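The gain estimate quoted above is a one-line calculation, and a short sketch makes clear how sensitive it is to the secondary-emission yield. The function name `pmt_gain` and the alternative yield of 4 are illustrative assumptions of ours; only the values 3 and 10 come from the text.

```python
def pmt_gain(delta, n_stages):
    """Idealized PMT gain: each of n_stages dynodes multiplies the
    electron count by the average secondary-emission yield delta."""
    return delta ** n_stages


gain_text = pmt_gain(3, 10)   # values quoted in the text
gain_alt = pmt_gain(4, 10)    # hypothetical higher-yield dynode, for comparison
print(f"delta = 3: gain = {gain_text}")
print(f"delta = 4: gain = {gain_alt}")
```

Because the gain is a power law in the yield, a modest change in the per-dynode yield changes the overall gain by more than an order of magnitude in this idealized model.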
4.2 THE COMPTON EFFECT
The existence of photons is also demonstrated by experiments involving the scattering of x-rays by electrons, which were first carried out by A. H. Compton. To understand these we must make the further postulate that a photon, as well as carrying a quantum of energy, also has a definite momentum and can therefore be treated in many ways just like a classical particle. An expression for the photon momentum is suggested by the classical theory of radiation pressure: it is known that if energy is transported by an electromagnetic wave at a rate $W$ per unit area per second, then the wave exerts a pressure of magnitude $W/c$ (where $c$ is the velocity of light), whose direction is parallel to that of the wave vector $\mathbf{k}$ of the wave; if we now treat the wave as composed of photons of energy $\hbar\omega$ it follows that the photon momentum $\mathbf{p}$ should have a magnitude $\hbar\omega/c = \hbar k$ and that its direction should be parallel to $\mathbf{k}$. Thus
$$\mathbf{p} = \hbar\mathbf{k} \tag{4.3}$$
We now consider a collision between such a photon and an electron of mass $m$ that is initially at rest, as shown in Figure 4.2. Following the discussion in Sec. 3.3, a particle with energy $E = \hbar\omega$ and momentum $|\mathbf{p}| = \hbar k$, where $\omega = kc$, would have zero mass. We have already worked out those kinematics, finding
$$E' = \frac{E}{1 + \dfrac{E}{mc^2}(1 - \cos\theta)} \tag{4.4}$$
where $\theta$ is the angle through which the photon is scattered and $E' = \hbar\omega'$ is the energy of the scattered photon.
FIGURE 4.2 In Compton scattering an x-ray photon of angular frequency $\omega$ and wave vector $\mathbf{k}$ collides with an electron initially at rest. After the collision the photon frequency and wave vector are changed to $\omega'$ and $\mathbf{k}'$, respectively, and the electron recoils with momentum $\mathbf{p}_e$.
Historically, Compton discussed his theory and results in terms of a shift in wavelength, so we invert both sides of this equation and, after a little rearrangement, find
$$\frac{1}{\omega'} - \frac{1}{\omega} = \frac{\hbar}{mc^2}(1 - \cos\theta)$$
that is
$$\lambda' - \lambda = \frac{2\pi\hbar}{mc}(1 - \cos\theta) \tag{4.5}$$
where $\lambda$ and $\lambda'$ are the x-ray wavelengths before and after the collision, respectively.
Compton laid out his thinking in two clearly written papers, one on the theory and the other on the first experiments. It is worth a moment to consider the juxtaposition between classical physics, special relativity, and quantum mechanics that may have led to his thinking. We know from Chapter 2 that electromagnetic radiation will scatter from “free” electrons, the phenomenon known as Thomson scattering. In this mechanism, the passing electromagnetic wave makes the electron undergo simple harmonic motion, which in turn emits dipole radiation with a specific angular distribution, but with exactly the same wavelength as the incoming wave. Where might we expect this mechanism to break down? The maximum speed of the oscillating electron would be around $\omega a$ where $a \sim 10^{-10}$ m is the size of the atom. Clearly we expect breakdown, then, when $\omega a = c$, or $E = \hbar\omega \sim 2$ keV. Compton used molybdenum K$\alpha$ x-rays, which have an energy of 17 keV.
Experimental studies of the scattering of x-rays by electrons in solids produce results in good general agreement with these predictions. In particular, if the intensity of the radiation scattered through a given angle is measured as a function of the wavelength of the scattered x-rays, a peak is observed whose maximum lies just at the point predicted by (4.5). In fact such a peak has a finite, though small, width implying that some of the photons have been scattered in a manner slightly different from that described above. This can be explained by taking into account the fact that the electrons in a solid are not necessarily at rest, but generally have a finite momentum before the collision. Compton scattering can therefore be used as a tool to measure electron momentum, and we shall discuss this in more detail in Chapter 7.
Both the photoelectric effect and the Compton effect are connected with the interactions between electromagnetic radiation and electrons, and both provide conclusive evidence for the photon nature of electromagnetic waves. However, we might ask why there are two effects and why the x-ray photon is scattered by the electron with a change of wavelength, while the optical photon transfers all its energy to the photoelectron.
The principal reason is that in the x-ray case the photon energy is much larger than the binding energy between the electron and the solid; the electron is therefore knocked cleanly out of the solid in the collision and we can treat the problem by considering energy and momentum conservation. In the photoelectric effect, on the other hand, the photon energy is only a little larger than the binding energy and, although the details of this process are rather complex, it turns out that the momentum is shared between the electron and the atoms in the metal and that the whole of the photon energy can be used to free the electron and give it kinetic energy. However, none of these detailed considerations affects the conclusion that in both cases the incident electromagnetic radiation exhibits properties consistent with it being composed of photons whose energy and momentum are given by the expressions (4.2) and (4.3).
Worked Example 4.2 An x-ray photon of wavelength $1.0 \times 10^{-12}$ m is incident on a stationary electron. Calculate the wavelength of the scattered photon if it is detected at an angle of $60^\circ$ to the incident radiation. (The mass of the electron equals $9.1 \times 10^{-31}$ kg.)
Solution Using (4.5) we calculate $\lambda' - \lambda = 1.21 \times 10^{-12}$ m and therefore $\lambda' = 2.21 \times 10^{-12}$ m.
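Worked Example 4.2 can be reproduced in a few lines, and the same sketch confirms that the wavelength form (4.5) and the energy form (4.4) of the Compton formula agree. The constants are the rounded values used in this chapter; the function names are illustrative choices of ours.

```python
import math

HBAR = 1.054571596e-34  # J s, value quoted in the text
C = 3.0e8               # m/s, rounded as in Worked Example 4.1
M_E = 9.1e-31           # kg, electron mass as given in Worked Example 4.2


def compton_shift(theta):
    """Wavelength shift lambda' - lambda from (4.5); theta in radians."""
    return 2 * math.pi * HBAR / (M_E * C) * (1 - math.cos(theta))


def scattered_energy(energy, theta):
    """Scattered photon energy E' from (4.4); energy in joules."""
    return energy / (1 + energy / (M_E * C**2) * (1 - math.cos(theta)))


lam = 1.0e-12
theta = math.radians(60)
lam_scattered = lam + compton_shift(theta)

# Cross-check: convert to photon energy, apply (4.4), convert back.
energy_in = 2 * math.pi * HBAR * C / lam
lam_from_energy = 2 * math.pi * HBAR * C / scattered_energy(energy_in, theta)

print(f"shift = {compton_shift(theta):.3g} m")
print(f"scattered wavelength = {lam_scattered:.3g} m")
print(f"via (4.4)            = {lam_from_energy:.3g} m")
```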
4.3 LINE SPECTRA AND ATOMIC STRUCTURE
When an electric discharge is passed through a gas, light is emitted which, when examined spectroscopically, is typically found to consist of a series of lines, each of which has a sharply defined frequency. A particularly simple example of such a line
spectrum is that of hydrogen, in which case the observed frequencies are given by the formula
$$\omega_{mn} = 2\pi R_0 c\left(\frac{1}{n^2} - \frac{1}{m^2}\right) \tag{4.6}$$
where $n$ and $m$ are integers, $c$ is the speed of light and $R_0$ is a constant known as the Rydberg constant (after J. R. Rydberg who first showed that the experimental results fitted this formula) whose currently accepted value is $1.097\,373\,157 \times 10^{7}$ m$^{-1}$. Following our earlier discussion, we can assume that the light emitted from the atom consists of photons whose energies are $\hbar\omega_{mn}$. It follows from this and conservation of energy that the energy of the atom emitting the photon must have been changed by the same amount. The obvious conclusion to draw is that the energy of the hydrogen atom is itself quantized, meaning that it can adopt only one of the values $E_n$ where
$$E_n = -\frac{2\pi R_0 \hbar c}{n^2} \tag{4.7}$$
the negative sign corresponding to the negative binding energy of the electron in the atom. Similar constraints govern the values of the energies of atoms other than hydrogen, although these cannot generally be expressed in such a simple form. We refer to allowed energies such as $E_n$ as energy levels. Further confirmation of the existence of energy levels is obtained from the ionization energies and absorption spectra of atoms, which both display features consistent with the energy of an atom being quantized in this way. It will be one of the main aims of this book to develop a theory of quantum mechanics that will successfully explain the existence of energy levels and provide a theoretical procedure for calculating their values.
One feature of the structure of atoms that can be at least partly explained on the basis of energy quantization is the simple fact that atoms exist at all! According to classical electromagnetic theory, an accelerated charge always loses energy in the form of radiation, so a negative electron in motion about a positive nucleus should radiate, lose energy, and quickly coalesce with the nucleus.
However, if the energy of the atom is quantized, there will be a minimum energy level (that with $n = 1$ in the case of hydrogen) below which the atom cannot go, and in which it will remain indefinitely. Quantization also explains why all atoms of the same species behave in the same way. As we shall see later, all hydrogen atoms in the lowest energy state have the same properties, which is in contrast to a classical system, such as a planet orbiting a star, where an infinite number of possible orbits with very different properties can exist for a given value of the energy of the system.
Worked Example 4.3 Calculate the wavelength of light emitted when a hydrogen atom makes a transition from a state with $n = 4$ to one with $n = 3$.
Solution Using (4.6), and $\lambda_{mn} = 2\pi c/\omega_{mn}$, we get
$$\lambda_{3,4} = 2\pi c/\omega_{34} = \left[1.097 \times 10^{7} \times \left(\frac{1}{9} - \frac{1}{16}\right)\right]^{-1} = 1.9 \times 10^{-6}\ \text{m}$$
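Equations (4.6) and (4.7) are easy to evaluate for any pair of levels. The sketch below repeats Worked Example 4.3 and also evaluates the lowest hydrogen energy level in electronvolts. The Rydberg constant and the value of Planck's constant divided by 2π are those quoted in the text; the values of the speed of light and of the electronvolt are standard SI values that we supply, and the function names are illustrative.

```python
import math

R0 = 1.097373157e7      # Rydberg constant in 1/m, value quoted in the text
HBAR = 1.054571596e-34  # J s, value quoted in the text
C = 2.99792458e8        # m/s, standard SI value (not quoted in the text)
EV = 1.602176634e-19    # J per eV, standard SI value (not quoted in the text)


def line_wavelength(n, m):
    """Wavelength in metres for a transition between levels m and n (m > n).

    From (4.6), lambda = 2*pi*c/omega, so the factor of c cancels.
    """
    return 1.0 / (R0 * (1.0 / n**2 - 1.0 / m**2))


def energy_level_ev(n):
    """Hydrogen energy level E_n from (4.7), converted to eV."""
    return -2 * math.pi * R0 * HBAR * C / n**2 / EV


lam_43 = line_wavelength(3, 4)
e_1 = energy_level_ev(1)
print(f"n = 4 -> 3 wavelength: {lam_43:.3g} m")
print(f"E_1 = {e_1:.2f} eV")
```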
4.4 DE BROGLIE WAVES
Following on from the fact that the photons associated with electromagnetic waves behave like particles, L. de Broglie suggested that particles such as electrons might also have wave properties. He further proposed that the frequencies and wave vectors of these “matter waves” would be related to the energy and momentum of the associated particle in the same way as in the photon case. That is
$$E = \hbar\omega, \qquad \mathbf{p} = \hbar\mathbf{k} \tag{4.8}$$
In the case of matter waves, Equations (4.8) are referred to as the de Broglie relations. We shall develop this idea in subsequent chapters, where we shall find that it leads to a complete description of the structure and properties of atoms, including the quantized atomic energy levels. In the meantime we shall describe experiments that provide direct confirmation of the existence of matter waves.
The property possessed by a wave that distinguishes it from any other physical phenomenon is its ability to form interference and diffraction patterns: when different parts of a wave are recombined after travelling different distances, they reinforce each other or cancel out depending on whether the two path lengths differ by an even or an odd number of half-wavelengths. Such phenomena are readily demonstrated in the laboratory by passing light through a diffraction grating. However, if the wavelength of the waves associated with even very low energy electrons (say, around 1 eV) is calculated using the de Broglie relations (4.8) a value of around $10^{-9}$ m is obtained, which is much smaller than that of visible light and much too small to form a detectable diffraction pattern when passed through a conventional grating. However, the atoms in a crystal are arranged in periodic arrays, so a crystal can act as a three-dimensional diffraction grating with a very small spacing. This is demonstrated in x-ray diffraction, and the first direct confirmation of de Broglie’s hypothesis was an experiment performed by C. Davisson and L. H. Germer that showed electrons being diffracted by crystals in a similar manner.
Nowadays the wave properties of electron beams are commonly observed experimentally and electron microscopes, for example, are often used to display the diffraction patterns created by the objects under observation. Moreover, not only electrons behave in this way; neutrons of the appropriate energy can also be diffracted by crystals, this technique being commonly used to investigate structural and other properties of solids. In recent years, neutron beams have been produced with such low energy that their de Broglie wavelength is as large as 2.0 nm. When these are passed through a double slit whose separation is of the order of 0.1 mm, the resulting diffraction maxima are separated by about $10^{-3}$ degrees, which corresponds to about 0.1 mm at a distance of 5 m beyond the slits, where the detailed diffraction pattern can be resolved. Figure 4.3 gives the details of such an experiment and the results obtained; we see that the number of neutrons recorded at different angles is in excellent agreement with the intensity of the diffraction pattern, calculated on the assumption that the neutron beam can be represented by a de Broglie wave.
FIGURE 4.3 In recent years, it has been possible to produce neutron beams with de Broglie wavelengths around 2 nm which can be detectably diffracted by double slits of separation about 0.1 mm. A typical experimental arrangement is shown in (a) and the slit arrangement is illustrated in (b). The number of neutrons recorded along a line perpendicular to the diffracted beam 5 m beyond the slits is shown in (c), along with the intensity calculated from diffraction theory, assuming a wave model for the neutron beam. The agreement is clearly excellent. [Reproduced by permission from A. Zeilinger, R. Gähler, C. G. Shull, W. Treimer and W. Mampe, Reviews of Modern Physics, 60, 1067–73 (1988).]
Worked Example 4.4 Using the data shown in Figure 4.3 and its legend, calculate the wavelength and speed of the neutrons used in this experiment, assuming the slit separation to be 0.12 mm. (The neutron mass equals $1.675 \times 10^{-27}$ kg.)
Solution By direct measurement on Figure 4.3c, we find that the separation of the diffraction peaks is about 75 µm. The angular separation is therefore $75 \times 10^{-6}/5 = 1.5 \times 10^{-5}$ radians. From the standard theory of two-slit diffraction, this (small) angle must equal $\lambda/d$, where $d = 0.12$ mm is the slit separation. Hence $\lambda = 1.8$ nm and the neutron speed $= 2\pi\hbar/\lambda m_n = 6.626 \times 10^{-34}/(1.8 \times 10^{-9} \times 1.675 \times 10^{-27}) \approx 220$ m s$^{-1}$.
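The two steps of Worked Example 4.4 (small-angle two-slit geometry, then the de Broglie relation) can be sketched as follows. We assume a neutron mass of 1.675 × 10^-27 kg and a value of 6.626 × 10^-34 J s for 2π times the reduced Planck constant; the function names are illustrative.

```python
H = 6.626e-34         # J s, equal to 2*pi*hbar
M_NEUTRON = 1.675e-27  # kg, assumed neutron mass


def two_slit_wavelength(fringe_spacing, screen_distance, slit_separation):
    """Small-angle two-slit result: angle = spacing/distance = lambda/d."""
    angle = fringe_spacing / screen_distance
    return angle * slit_separation


def de_broglie_speed(wavelength, mass):
    """Non-relativistic speed from p = 2*pi*hbar/lambda = m*v."""
    return H / (mass * wavelength)


lam_n = two_slit_wavelength(75e-6, 5.0, 0.12e-3)
v_n = de_broglie_speed(lam_n, M_NEUTRON)
print(f"wavelength = {lam_n:.2g} m, speed = {v_n:.0f} m/s")
```

A speed of a few hundred metres per second is far below the speed of light, which justifies the non-relativistic relation between momentum and speed used here.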
Application: electron microscopy
A typical optical microscope sends light through a specimen, and then manipulates the light with lenses to create a magnified, focussed image that you see with your eye. In principle, arbitrarily large magnification is possible, but if the specimen size is near the wavelength of light (∼ 500 nm or smaller), then optical diffraction renders it essentially invisible. Using shorter wavelengths is impractical, because the fundamental optical techniques are not available in the ultraviolet. However, electrons can have an arbitrarily small de Broglie wavelength, simply by increasing their kinetic energy. This is most easily achieved by acceleration through a static high voltage. Furthermore, it is straightforward to design a system of electrostatic and magnetic elements to transport and focus the electrons. Thus one can achieve magnification in the same way as lenses for an optical microscope. Rather than being observed by eye, the electrons form an image on a screen which can be digitized and recorded. This is the principle of the transmission electron microscope, or TEM. (A similar instrument, in which the electron beam is reflected while rapidly scanning over the specimen, is called a scanning electron microscope, or SEM.)
Figure 4.4 shows a schematic of a TEM device, including the electron “gun,” specimen port, optical elements, and observing screen. The beam, typically accelerated through tens of kV in the gun, passes vertically down through the specimen, and is focussed on the screen, where it can be viewed in real time, or recorded electronically. Figure 4.4 also shows a TEM image of an HIV virus attaching to a human T-cell, about to infect it. Note the scale; HIV viruses are typically 90–120 nm in diameter. This image was taken with an accelerating potential $V = 100$ kV, corresponding to $\lambda = h/\sqrt{2 m_e eV} = 4 \times 10^{-3}$ nm.
Such short wavelengths not only make images possible of small objects like the HIV virus, but also allow exquisitely detailed high resolution micrographs of larger organisms and other objects.
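The dependence of the electron wavelength on accelerating voltage can be tabulated with a short sketch. It uses the non-relativistic expression for the wavelength quoted above (Planck's constant divided by the square root of twice the electron mass times the energy gained); at 100 kV relativistic corrections are no longer negligible and would shorten the wavelength somewhat, so the numbers should be read as estimates. The constants are standard rounded values and the function name is illustrative.

```python
import math

H = 6.626e-34         # J s, Planck's constant (2*pi*hbar)
M_E = 9.109e-31       # kg, electron mass
E_CHARGE = 1.602e-19  # C, magnitude of the electron charge


def electron_wavelength(voltage):
    """Non-relativistic de Broglie wavelength (m) after acceleration
    from rest through a potential difference `voltage` (volts)."""
    return H / math.sqrt(2 * M_E * E_CHARGE * voltage)


wavelengths = {v: electron_wavelength(v) for v in (1.0, 100.0, 1.0e4, 1.0e5)}
for v, lam in wavelengths.items():
    print(f"V = {v:8.0f} V  ->  lambda = {lam * 1e9:.3g} nm")
```

The wavelength falls as the inverse square root of the voltage, so each factor of 100 in voltage buys a factor of 10 in wavelength.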
FIGURE 4.4 The left shows a general schematic of a Transmission Electron Microscope (TEM). Electrons are accelerated and emitted by the gun at the top, producing a downward going beam. (This schematic is courtesy of Wikipedia.) The right image is of a human immunodeficiency virus (HIV) organism, next to a human T-cell. The effective magnification shown here is ×145000. Note the scale in the image, a size not resolvable with optical microscopy. Image courtesy of the Dartmouth College Electron Microscope Facility.
4.5 WAVE-PARTICLE DUALITY
Although we have just described the experimental evidence for the wave nature of electrons and similar bodies, it must not be thought that this description is complete or that these are any the less particles. Although in a diffraction experiment wave properties are manifested during the diffraction process and the intensity of the wave determines the average number of particles scattered through various angles, when the diffracted electrons or neutrons are detected they are always found to behave like point particles with the expected mass and charge, and having a particular energy and momentum. Conversely, although we need to postulate photons in order to explain the photoelectric and Compton effects, phenomena such as the diffraction of light by a grating or of x-rays by a crystal can be explained only if electromagnetic radiation has wave properties. Quantum mechanics predicts that both the wave and the particle models apply to all objects whatever their size. However, in many circumstances it is perfectly clear
which model should be used in a particular physical situation. Thus, electrons with a kinetic energy of about 100 eV ($1.6 \times 10^{-17}$ J) have de Broglie wavelengths of about $10^{-10}$ m and are therefore diffracted by crystals according to the wave model. However, if their energy is very much higher (say, 100 MeV) the wavelength is then so short (about $10^{-14}$ m) that diffraction effects are not normally observed and such electrons nearly always behave like classical particles. Now consider a small grain of sand of mass about $10^{-6}$ g moving at a speed of $10^{-3}$ m s$^{-1}$; this has a de Broglie wavelength of the order of $10^{-21}$ m and its wave properties are quite undetectable; clearly this is even more true for heavier or faster moving objects. In recent years, there has been considerable interest in attempting to detect wave properties of more and more massive objects. One of the heaviest bodies for which diffraction of de Broglie waves has been directly observed is the Buckminster fullerene molecule C$_{60}$ whose mass is nearly 1000 times that of a neutron. These particles were passed through a grating and the resulting diffraction pattern was observed in an experiment performed in 2000 by the same group as is featured in Figure 4.3.
Some experiments cannot be understood unless the wave and particle models are both used. This is illustrated by the neutron diffraction experiment shown in Figure 4.3. The neutron beam behaves like a wave when it is passing through the slits and forming an interference pattern, but when the neutrons are detected, they behave like a set of individual particles with their usual mass, zero electric charge, etc. We never detect half a neutron!
Moreover, the typical neutron beams used in such experiments are so weak that there is practically never more than one neutron in the apparatus at any one time, in which case we cannot explain the interference pattern on the basis of any model involving interactions between different neutrons. Suppose we now change this experiment by placing detectors behind each slit instead of a large distance away; these will detect individual neutrons passing through one or other of the slits (but never both at once) and the obvious conclusion is that the same thing happened in the interference experiment. But we have just seen that the interference pattern is formed by a wave passing through both slits, and this can be confirmed by arranging a system of shutters so that only one or other of the two slits (but never both) is open at any one time, in which case it is impossible to form an interference pattern. Both slits are necessary to form the interference pattern, so if the neutrons always pass through one slit or the other then the behaviour of a given neutron must somehow be affected by the slit it did not pass through!
An alternative view, which is now the orthodox interpretation of quantum mechanics, is to say that the model we use to describe quantum phenomena is not just a property of the quantum objects (the neutrons in this case) but also depends on the arrangement of the whole apparatus. Thus, if we perform a diffraction experiment, the neutrons are waves when they pass through the slits, but are particles when they are detected. But if the experimental apparatus includes detectors right behind the slits, the neutrons behave like particles at this point. This dual description is possible because no interference pattern is created in the latter case.
Moreover, it turns out that this happens no matter how subtle an experiment we design to detect which slit the neutron passes through: if it is successful, the phase relation between the waves passing through the slits is destroyed and the interference pattern disappears.
FIGURE 4.5 A measurement of the position of a particle by a microscope is accompanied by a corresponding uncertainty in our knowledge of the particle momentum.
We can therefore look on the particle and wave models as complementary rather than contradictory properties. Which one is manifest in a particular experimental situation depends on the arrangement of the whole apparatus, including the slits and the detectors; we should not assume that, just because we detect particles when detectors are placed behind the slits, the neutrons still have these properties when these are absent. It should be noted that, although we have just discussed neutron diffraction, the argument would have been largely unchanged if we had considered light waves and photons or any other particles with their associated waves. In fact the idea of complementarity is even more general than this and we shall find many cases in our discussion of quantum mechanics where the measurement of one property of a physical system renders another unobservable; an example of this will be described in the next paragraph when we discuss the limitations on the simultaneous measurement of the position and momentum of a particle. Many of the apparent paradoxes and contradictions that arise can be resolved by concentrating on those aspects of a physical system that can be directly observed and refraining from drawing conclusions about properties that cannot. However, there are still significant conceptual problems in this area that remain the subject of active research, and we shall discuss these in some detail in Chapter 16.
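The de Broglie wavelengths quoted earlier in this section for electrons of 100 eV and 100 MeV and for a slowly moving grain of sand can be estimated with the sketch below. For the electrons it uses the relativistic relation between momentum and kinetic energy, which matters at 100 MeV; for the sand grain it uses momentum equal to mass times speed, which gives a value a little below 10^-21 m. The constants are standard rounded values that we supply, and the function names are illustrative.

```python
import math

H = 6.626e-34    # J s, Planck's constant (2*pi*hbar)
C = 2.998e8      # m/s, speed of light
EV = 1.602e-19   # J per eV
M_E = 9.109e-31  # kg, electron mass


def wavelength_from_kinetic_energy(kinetic_j, mass):
    """de Broglie wavelength h/p, with (pc)^2 = K^2 + 2*K*m*c^2."""
    pc = math.sqrt(kinetic_j**2 + 2 * kinetic_j * mass * C**2)
    return H * C / pc


def wavelength_from_speed(mass, speed):
    """Non-relativistic de Broglie wavelength h/(m*v)."""
    return H / (mass * speed)


lam_100ev = wavelength_from_kinetic_energy(100 * EV, M_E)
lam_100mev = wavelength_from_kinetic_energy(100e6 * EV, M_E)
lam_sand = wavelength_from_speed(1e-9, 1e-3)  # 10^-6 g = 10^-9 kg

print(f"100 eV electron : {lam_100ev:.2g} m")
print(f"100 MeV electron: {lam_100mev:.2g} m")
print(f"sand grain      : {lam_sand:.2g} m")
```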
The uncertainty principle
In this section we consider the limits that wave-particle duality places on the simultaneous measurement of the position and momentum of a particle. Suppose we try to measure the position of a particle by illuminating it with radiation of wavelength $\lambda$ and using a microscope of angular aperture $\alpha$, as shown in Figure 4.5. The fact that the radiation has wave properties means that the size of the image observed in the microscope will be governed by the resolving power of the microscope. The position of the particle is therefore uncertain by an amount $\Delta x$, which is given by standard optical theory as
$$\Delta x \simeq \frac{\lambda}{\sin\alpha} \tag{4.9}$$
However, the fact that the radiation is composed of photons means that each time the particle is struck by a photon it recoils, as in Compton scattering. The momentum of the recoil could of course be calculated if we knew the initial and final momenta of the photon, but as we do not know through which points on the lens the photons entered the microscope, the $x$ component of the particle momentum is subject to an error $\Delta p_x$ where
$$\Delta p_x \simeq p\sin\alpha = \frac{2\pi\hbar\sin\alpha}{\lambda} \tag{4.10}$$
Combining (4.9) and (4.10) we get
$$\Delta x\,\Delta p_x \simeq 2\pi\hbar \tag{4.11}$$
It follows that if we try to improve the accuracy of the position measurement by using radiation with a smaller wavelength, we shall increase the error on the momentum measurement and vice versa. This is just one example of an experiment designed to measure the position and momentum of a particle, but it turns out that any other experiment with this aim is subject to constraints similar to (4.11). We shall see in Chapter 7 that the fundamental principles of quantum mechanics ensure that in every case the uncertainties in the position and momentum components are related by
$$\Delta x\,\Delta p_x \geq \tfrac{1}{2}\hbar \tag{4.12}$$
This relation is known as the Heisenberg uncertainty principle. According to quantum mechanics it is a fundamental property of nature that any attempts to make simultaneous measurements of position and momentum are subject to this limitation. It should be noted that the uncertainties in position and momentum are an intrinsic feature of the measurement process; in particular, ∆x is a result of the finite resolving power of the lens and is not affected by the fact that the particle’s position may change as it recoils during the collision with the photon. This point is emphasized by the fact that (4.12) does not contain the particle mass and therefore applies equally well to a particle of large mass, which would not recoil significantly as a result of photon scattering. The uncertainty in its velocity would therefore be very small, but the momentum uncertainty would still be as calculated above and (4.12) would still hold. The Heisenberg uncertainty principle is more subtle and more fundamental than the popular idea of the value of one property being disturbed when the other is measured. We return to this point in our more general discussion of the uncertainty principle in Chapter 7.
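The trade-off expressed by (4.9) to (4.11) can be illustrated numerically. The following is a minimal sketch, not part of the original text: the function name `microscope_uncertainties`, the aperture value and the three wavelengths are illustrative assumptions. It shows that shortening the wavelength reduces the position uncertainty and increases the momentum uncertainty, while their product stays at $2\pi\hbar$.

```python
import math

HBAR = 1.055e-34  # reduced Planck constant in J s


def microscope_uncertainties(wavelength, alpha):
    """Order-of-magnitude estimates (4.9) and (4.10) for the microscope."""
    dx = wavelength / math.sin(alpha)                        # equation (4.9)
    dp = 2 * math.pi * HBAR * math.sin(alpha) / wavelength   # equation (4.10)
    return dx, dp


# Illustrative (assumed) wavelengths in metres and a fixed aperture of 0.5 rad.
for wavelength in (5e-7, 5e-9, 5e-11):
    dx, dp = microscope_uncertainties(wavelength, alpha=0.5)
    print(f"lambda={wavelength:.0e} m  dx={dx:.2e} m  "
          f"dp={dp:.2e} kg m/s  dx*dp={dx * dp:.2e} J s")
```

The product printed in the last column is the same for every wavelength, which is the content of (4.11).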
4.6
THE REST OF THIS BOOK
In the next two chapters we discuss the nature and properties of matter waves in more detail and show how to obtain a wave equation whose solutions determine the energy levels of bound systems. We shall do this by considering one-dimensional waves in Chapter 5, where we shall obtain qualitative agreement with experiment; in the following chapter we shall extend our treatment to three-dimensional systems and obtain excellent quantitative agreement between the theoretical results and experimental values of the energy levels of the hydrogen atom. At the same time we shall find that this treatment is incomplete and leaves many important questions unanswered. Accordingly, in Chapter 7 we shall set up a more formal version of quantum mechanics within which the earlier results are included, but which can, in principle, be applied to any physical system. This will prove to be a rather abstract process and prior familiarity with the results discussed in the earlier chapters will be a great advantage in understanding it. Having set up the general theory, we then develop it in subsequent chapters and discuss it along with its applications to a number of problems such as the quantum theory of angular momentum and the special properties of systems containing a number of identical particles. Chapter 14 consists of an elementary introduction to relativistic quantum mechanics and quantum field theory, while Chapter 15 discusses some examples of the applications of quantum mechanics to the processing of information that were developed toward the end of the twentieth century. The last chapter contains a reasonably detailed discussion of some of the conceptual problems of quantum mechanics. Chapters 10 to 16 are largely self-contained and can be read in a different order if desired. Finally we should point out that photons, which have been referred to quite extensively in this chapter, will hardly be mentioned again except in passing.
This is primarily because a detailed treatment requires a discussion of the quantization of the electromagnetic field. We give a very brief introduction to quantum field theory in Chapter 14, but anything more would require a degree of mathematical sophistication, which is unsuitable for a book at this level. We shall instead concentrate on the many physical phenomena that can be understood by considering the mechanical systems to be quantized and treating the electromagnetic fields semiclassically. However, it should be remembered that there are a number of important phenomena, particularly in high-energy physics, which clearly establish the quantum properties of electromagnetic waves, and field quantization is an essential tool in considering such topics.
The Early Development of Quantum Mechanics 77
4.7
PROBLEMS
4.1 When light of variable wavelength shines on a particular metal, no photoelectrons are emitted if the wavelength is greater than 550 nm. For what wavelength of light would the maximum energy of photoelectrons be 3.5 eV?

4.2 If the energy flux associated with a light beam is 10 W m$^{-2}$, estimate how long it would take, classically, for sufficient energy to arrive at a potassium atom of radius $2 \times 10^{-10}$ m in order that an electron be ejected, if the potassium work function is 2.3 eV. What would be the average emission rate of photoelectrons if such light fell on a piece of potassium $10^{-3}$ m$^2$ in area and all the energy were used in photoemission? How would your answers to the latter questions be affected by quantum-mechanical considerations?

4.3 When light of wavelength $\lambda$ was directed onto the surface of a sample of sodium, photoelectrons were emitted with maximum energy V. When $\lambda = 2.5 \times 10^{-7}$ m, V was measured to be 2.4 eV and when $\lambda = 1.2 \times 10^{-7}$ m, V = 7.5 eV. Obtain values for Planck’s constant and the work function of sodium.

4.4 Hydrogen forms a solid at very low temperatures. If electromagnetic radiation of wavelength 50 nm were directed at solid hydrogen, what would be the maximum energy of the resulting photoelectrons?

4.5 The Compton effect describes the scattering of an x-ray photon when it strikes an electron. To conserve momentum, the electron must recoil in the process. Calculate the magnitude of this recoil and its direction relative to that of the incident photon.

4.6 One of the early experiments that revealed the quantization of atomic energy levels was performed by J. Franck and G. Hertz in 1914. They passed a beam of electrons through a cell containing mercury vapour. The electrons were accelerated by an applied voltage and the resulting current increased steadily until this voltage reached 4.9 V, when it dropped suddenly. Further increase in voltage renewed the current increase until a voltage of 9.8 V was reached, when the current dropped again, and this behaviour repeated at a value of 14.7 V. Explain these results in as much detail as you can.

4.7 What is the typical de Broglie wavelength associated with an atom of helium in a gas at room temperature, given that the average thermal energy equals $(3/2)k_BT$, where $k_B$ is Boltzmann’s constant?

4.8 A beam of neutrons with known momentum is diffracted by a single slit in a geometrical arrangement similar to that shown for the double slit in Figure 4.3. Show that a value for the component of momentum of the neutrons in a direction perpendicular to both the slit and the incident beam can be derived from the single-slit diffraction pattern. Show that the uncertainty in this momentum is related to the uncertainty in the position of the neutron passing through the slit in a manner consistent with the Heisenberg uncertainty principle. (This example is discussed in more detail in Chapter 7.)

4.9 In recent years, the wave nature of matter has been tested for systems of appreciable mass and some complexity. One of these tests consisted of passing a beam of molecules, which are known as buckminsterfullerene and consist of 60 carbon atoms in a near-spherical configuration, through a diffraction grating. The mass of a carbon atom is twelve times that of a proton and the spacing of the grating used was $10^{-7}$ m. What was the approximate value of the molecular speed for diffraction effects to have been visible in this experiment?

4.10 It can be shown that the magnitude of the average kinetic energy of an electron in a hydrogen atom is equal to the magnitude of the binding energy. Use this and the Heisenberg uncertainty principle to obtain an approximate lower bound to the radius of a hydrogen atom in its lowest energy state.

4.11 When an atom emits a photon, its energy level changes, but the atom also recoils, so some energy is converted into kinetic energy of the atom as a whole. Show that the change in photon angular frequency due to this effect is $\Delta\omega = \hbar\omega_0^2/(2Mc^2)$, where $\omega_0$ is the frequency in the absence of recoil and M is the mass of the atom - assume that $\hbar\omega_0$

a
(5.21) (5.22)
This is known as an infinite square well.

1 A detailed discussion of this point has been given by E. Merzbacher, Am. J. Phys., vol. 30, p. 237, 1962.
FIGURE 5.1 (a) shows the potential $V(x)$ as a function of x for an infinite square well, along with the energy levels, $E_n$, of the four lowest energy states. The wave functions, $u_n$, and position probability distributions, $|u_n|^2$, corresponding to energy states with n = 1, 2, 3, and 8 are shown in (b) and (c) respectively.
The One-dimensional Schrödinger Equations

In the first region, the time-independent Schrödinger equation (5.17) becomes

$$\frac{\hbar^2}{2m}\frac{d^2u}{dx^2} + Eu = 0 \tag{5.23}$$
A general solution to this equation can be written in the form

$$u = A\cos kx + B\sin kx \tag{5.24}$$
where A and B are constants and $k = (2mE/\hbar^2)^{1/2}$. This can be verified by substitution. In the region outside the well where the potential is infinite, the Schrödinger equation can be satisfied only if the wave function is zero. We now apply the first boundary condition, which requires the wave function to be continuous at $x = \pm a$ and therefore equal to zero at these points. Thus

$$\begin{aligned} A\cos ka + B\sin ka &= 0 \\ A\cos ka - B\sin ka &= 0 \end{aligned} \tag{5.25}$$

Hence

$$\begin{aligned} \text{either}\quad & B = 0 \ \text{ and } \cos ka = 0, \ \text{ that is, } k = n\pi/2a, && n = 1, 3, 5, \ldots \\ \text{or}\quad & A = 0 \ \text{ and } \sin ka = 0, \ \text{ that is, } k = n\pi/2a, && n = 2, 4, 6, \ldots \end{aligned} \tag{5.26}$$
These conditions, combined with the definition of k given above (following (5.24)), mean that solutions consistent with the boundary conditions exist only if

$$E \equiv E_n = \hbar^2\pi^2n^2/8ma^2 \tag{5.27}$$
In other words, the energy is quantized. Application of the normalization condition (5.20) leads to the following expressions for the time-independent part of the wave function, which we now write as $u_n$, where

$$u_n = \begin{cases} a^{-1/2}\cos(n\pi x/2a) & \text{for } n \text{ odd} \\ a^{-1/2}\sin(n\pi x/2a) & \text{for } n \text{ even} \end{cases} \qquad \text{if } -a \le x \le a \tag{5.28}$$

and $u_n = 0$ if $|x| > a$.
These expressions are illustrated graphically in Figure 5.1(b) for a number of values of n. We see that the wave function is either symmetric ($u_n(x) = u_n(-x)$) or antisymmetric ($u_n(x) = -u_n(-x)$) about the origin, depending on whether n is odd or even. This property is known as the parity of the wave function: symmetric wave functions are said to have even parity while antisymmetric wave functions are said to have odd parity. The possession of a particular parity is a general feature of the wave function associated with an energy state of a potential which is itself symmetric, i.e., when $V(x) = V(-x)$.
Remembering that the probability distribution for the particle position is given by $|u(x)|^2$, we see from Figure 5.1 that in the lowest energy state the particle is most likely to be found near the centre of the box, while in the first excited state its most likely positions are near $x = \pm a/2$. For states of comparatively high energy, the probability distribution has the form of a large number of closely spaced oscillations of equal amplitude. We can use these results to get some idea of how the Schrödinger equation can be used to explain atomic properties. The typical size of an atom is around $10^{-10}$ m and the mass of an electron is $9.1 \times 10^{-31}$ kg. Taking the first of these to be a and substituting into (5.27) leads to the expression

$$E_n \simeq 1.5 \times 10^{-18}\,n^2\ \text{J}$$

The energy difference between the first and second levels is then $4.5 \times 10^{-18}$ J (28 eV) so that a photon emitted in a transition between these levels would have a wavelength of about $4.4 \times 10^{-8}$ m, which is of the same order as that observed in atomic transitions. Of course the atom is not a one-dimensional box, so we can only expect approximate agreement at this stage; quantitative calculations of atomic energy levels must wait until we develop a full three-dimensional model in the next chapter.

Worked Example 5.1 Repeat the above calculation with m the mass of a proton ($1.7 \times 10^{-27}$ kg) and a the order of the diameter of a typical nucleus ($2 \times 10^{-15}$ m) and comment on the physical significance of your result.

Solution

$$E_n = \frac{\pi^2 \times (1.05 \times 10^{-34})^2\,n^2}{8 \times 1.7 \times 10^{-27} \times 4 \times 10^{-30}} = 2.0 \times 10^{-12}\,n^2\ \text{J}$$

The energy difference between the first and second levels is now about $10^{-11}$ J (60 MeV) which is in order-of-magnitude agreement with experimental measurements of nuclear binding energies.
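The two estimates above can be reproduced in a few lines. This is only a sketch under stated assumptions: the helper name `box_energy` and the rounded values of the physical constants are choices made here, not part of the text.

```python
import math

HBAR = 1.055e-34       # reduced Planck constant, J s
H_PLANCK = 6.626e-34   # Planck constant, J s
C_LIGHT = 2.998e8      # speed of light, m/s
EV = 1.602e-19         # joules per electron volt


def box_energy(n, mass, a):
    """Energy (5.27) of level n in an infinite well occupying -a <= x <= a."""
    return (HBAR * math.pi * n) ** 2 / (8 * mass * a ** 2)


# Electron in an atom-sized box (a = 1e-10 m).
e1 = box_energy(1, 9.1e-31, 1e-10)
gap = box_energy(2, 9.1e-31, 1e-10) - e1
wavelength = H_PLANCK * C_LIGHT / gap
print(f"electron: E1 = {e1:.2e} J, E2 - E1 = {gap / EV:.0f} eV, "
      f"photon wavelength = {wavelength:.1e} m")

# Proton in a nucleus-sized box (Worked Example 5.1, a = 2e-15 m).
p1 = box_energy(1, 1.7e-27, 2e-15)
print(f"proton:   E1 = {p1:.1e} J")
```

Because $E_n$ scales as $1/(ma^2)$, the much smaller box more than compensates for the larger proton mass, which is why the nuclear energy scale is about a million times larger than the atomic one.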
One of the important requirements of a theory of microscopic systems is that it must produce the same results for macroscopic systems as are successfully predicted by classical mechanics. This is known as the correspondence principle. Applied to the present example, in the classical limit we expect to have no measurable quantization of the energy and a uniform probability distribution, because the particle is equally likely to be anywhere in the box. Referring to (5.27), we see that the energy level separation is inversely proportional to $ma^2$, so we can expect it to be undetectably small if m and a are macroscopically large. Similarly, the separation between adjacent peaks in the probability distribution equals $2a/n$ from (5.28), which also becomes very small if n is large. Moreover, if the energy of the system is not precisely defined then the exact value of n will be unknown, and, as will be shown in Chapter 7, this implies that the wave function is then a linear combination of the wave functions of the states within the allowed energy span. The corresponding probability distribution is then very nearly uniform across the well, in even better agreement with the classical expectation.
Worked Example 5.2 Find the energy level separation and the distance between adjacent probability maxima in the case of a particle of mass $10^{-10}$ kg (e.g., a small grain of salt) confined to a box of half-width $10^{-4}$ m at a temperature of 1 K and comment on their physical significance. NB: these quantities are small on a macroscopic scale although large in atomic terms. Assume that the kinetic energy of a particle at temperature T is approximately $k_BT$, where $k_B$ is Boltzmann’s constant ($1.4 \times 10^{-23}$ J K$^{-1}$).

Solution From (5.27), the energy levels of this system are

$$E_n = \frac{\pi^2 \times (1.05 \times 10^{-34})^2\,n^2}{8 \times 10^{-10} \times 10^{-8}} = 1.4 \times 10^{-50}\,n^2\ \text{J}$$

so the separation between adjacent levels is $(2.8 \times 10^{-50}\,n)$ J. The separation between adjacent probability maxima is $(2 \times 10^{-4}/n)$ m. At a temperature of 1 K the energy is about $1.4 \times 10^{-23}$ J, leading to a value for n of around $3 \times 10^{13}$. The separation between adjacent energy levels would then be about $10^{-36}$ J and the separation between adjacent peaks in the probability distribution would be about $10^{-17}$ m. These quantities are so small that an experiment of the accuracy required to detect any effects due to energy quantization would be completely impossible in practice. An experiment to define the position of the particle to this accuracy or better would be similarly impossible. Thus to all intents and purposes, quantum and classical mechanics predict the same results (all positions within the well are equally likely and any value of the energy is allowed) and the correspondence principle is verified.
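The arithmetic of Worked Example 5.2 can be checked with a short script. This is a sketch only; the function name `classical_limit` and the rounded constants are assumptions made for illustration.

```python
import math

HBAR = 1.055e-34  # reduced Planck constant, J s
K_B = 1.38e-23    # Boltzmann constant, J/K


def classical_limit(mass, a, temperature):
    """Quantum number, level spacing and peak spacing when E_n ~ k_B T."""
    e1 = (HBAR * math.pi) ** 2 / (8 * mass * a ** 2)  # ground state from (5.27)
    n = math.sqrt(K_B * temperature / e1)             # solve E_n = n^2 E_1 = k_B T
    level_gap = (2 * n + 1) * e1                      # E_(n+1) - E_n
    peak_gap = 2 * a / n                              # spacing of maxima of |u_n|^2
    return n, level_gap, peak_gap


n, level_gap, peak_gap = classical_limit(mass=1e-10, a=1e-4, temperature=1.0)
print(f"n ~ {n:.1e}, level spacing ~ {level_gap:.1e} J, "
      f"peak spacing ~ {peak_gap:.1e} m")
```

The results should agree with the orders of magnitude quoted in the solution: a quantum number of a few times $10^{13}$, a level spacing near $10^{-36}$ J and a peak spacing near $10^{-17}$ m.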
5.5
THE FINITE SQUARE WELL
We now consider the problem where the sides of the well are not infinite, but consist of finite steps. The potential, illustrated in Figure 5.3, is then given by

$$\begin{aligned} V &= 0 && -a \le x \le a \\ V &= V_0 && |x| > a \end{aligned} \tag{5.29}$$

We shall consider only bound states where the total energy E is less than $V_0$. The general solution to the Schrödinger equation in the first region is identical to the corresponding result in the infinite case (5.24). In the region $|x| > a$, however, the Schrödinger equation becomes

$$\frac{\hbar^2}{2m}\frac{d^2u}{dx^2} - (V_0 - E)u = 0 \tag{5.30}$$

whose general solution is

$$u = C\exp(\kappa x) + D\exp(-\kappa x) \tag{5.31}$$

where C and D are constants and $\kappa = [2m(V_0 - E)/\hbar^2]^{1/2}$. We see at once that if x > 0, C must equal zero, otherwise the wave function would tend to infinity as x tends to infinity in breach of the boundary conditions. Thus we have

$$u = D\exp(-\kappa x) \qquad x > a \tag{5.32}$$

A similar argument leads to

$$u = C\exp(\kappa x) \qquad x < -a \tag{5.33}$$
As the discontinuities in the potential at $x = \pm a$ are now finite rather than infinite, the boundary conditions require that both u and du/dx be continuous at these points. Thus we have, from (5.24), (5.32) and (5.33)

$$A\cos ka + B\sin ka = D\exp(-\kappa a) \tag{5.34}$$

$$-kA\sin ka + kB\cos ka = -\kappa D\exp(-\kappa a) \tag{5.35}$$

$$A\cos ka - B\sin ka = C\exp(-\kappa a) \tag{5.36}$$

$$kA\sin ka + kB\cos ka = \kappa C\exp(-\kappa a) \tag{5.37}$$
These equations lead directly to

$$2A\cos ka = (C + D)\exp(-\kappa a) \tag{5.38}$$

$$2kA\sin ka = \kappa(C + D)\exp(-\kappa a) \tag{5.39}$$

$$2B\sin ka = (D - C)\exp(-\kappa a) \tag{5.40}$$

$$2kB\cos ka = -\kappa(D - C)\exp(-\kappa a) \tag{5.41}$$
where (5.38) is obtained by adding (5.34) and (5.36), (5.39) is obtained by subtracting (5.35) from (5.37), and (5.40) and (5.41) are derived similarly. If we now divide (5.39) by (5.38) and (5.41) by (5.40) we get

$$\begin{aligned} k\tan ka &= \kappa && \text{unless } C = -D \text{ and } A = 0 \\ k\cot ka &= -\kappa && \text{unless } C = D \text{ and } B = 0 \end{aligned} \tag{5.42}$$

Both conditions (5.42) must be satisfied simultaneously, so we have two sets of solutions subject to the following conditions

$$\begin{aligned} k\tan ka &= \kappa && C = D \text{ and } B = 0 \\ -k\cot ka &= \kappa && C = -D \text{ and } A = 0 \end{aligned} \tag{5.43}$$

Remembering that $k = (2mE)^{1/2}/\hbar$ and $\kappa = [2m(V_0 - E)]^{1/2}/\hbar$, we see that equations (5.43) determine the allowed values of the energy, just as the energy levels of the infinite well were determined by equations (5.26). However, in the present case the solutions to the equations cannot be expressed as algebraic functions and we have to solve them numerically. This can be done graphically or by iteration. To obtain a solution graphically, we first define $k_0$ so that $k_0^2 = 2mV_0/\hbar^2$ and use this with the definitions of k and $\kappa$ to rewrite equations (5.43) as

$$\begin{aligned} ka\tan(ka) &= (k_0^2a^2 - k^2a^2)^{1/2} \\ -ka\cot(ka) &= (k_0^2a^2 - k^2a^2)^{1/2} \end{aligned} \tag{5.44}$$
Figure 5.2 shows the left- and right-hand sides of (5.44) as functions of $ka$ in the case where $k_0a = 5$. The values of $ka$ corresponding to the intersections of these graphs are therefore solutions of (5.43) from which the energy levels can be deduced.

FIGURE 5.2 The circular quadrant represents $\kappa a$ and the other lines are graphs of $ka\tan ka$ and $-ka\cot ka$. We note that k and $\kappa$ are both positive by definition. Intersections between the quadrant and the other curves correspond to the solutions of equation (5.44).

One method of applying iteration to derive accurate values of $ka$ is known as “bracketing and bisection.” Referring to Figure 5.2, we see that the nth solution corresponds to a value of $ka$ between $(n - 1)\pi/2$ and $n\pi/2$ (or $k_0a$ in the case where n equals its maximum value). If we calculate the left side of (5.44) minus the right side at the midpoint of this range, i.e., where $ka = (n - \tfrac{1}{2})\pi/2$, this will be positive only if the desired value of $ka$ is less than $(n - \tfrac{1}{2})\pi/2$ and will be negative otherwise. This allows us to put either the upper or lower bound equal to $(n - \tfrac{1}{2})\pi/2$, which halves the possible range of $ka$. This process can then be repeated until the difference between the upper and lower bounds is small enough to ensure that the value of $ka$ is known to the desired accuracy. Table 5.1 sets out the results of this procedure for all the possible solutions in this case, showing the energy levels both as fractions of $V_0$ and as fractions of the energies of the corresponding infinite-well states (5.27). The associated wave functions are shown in Figure 5.3. Comparing these with the wave functions for the infinite square well (Figure 5.1), we see that they are generally similar and, in particular, that they have a definite parity, being either symmetric or antisymmetric about the point x = 0. However, one important difference between Figures 5.1 and 5.3 is that in the latter case the wave functions decay exponentially in the region $|x| > a$ instead of going to zero at $x = \pm a$. That is, the wave function penetrates a region where the total energy is less than $V_0$, implying that there is a probability of finding the particle in a place where it could not be classically as it would then have to have negative kinetic energy. This is another example of a quantum-mechanical result that is quite different from the classical expectation and we shall discuss it in more detail in the next section. The penetration of the wave function into the classically forbidden region also results in the energy levels being lower than in the infinite square-well case (Table 5.1) because the boundary conditions are now satisfied for smaller values of k. This effect is more noticeable for the higher energy levels and, conversely, we can conclude that in the case of a very deep well, the energy levels and wave functions of the low-lying states would be indistinguishable from those where $V_0$ was infinite. This point also follows directly from the boundary conditions: when $(V_0 - E)$, and therefore $\kappa$, are very large, the conditions (5.43) become identical to (5.26).

FIGURE 5.3 (a) shows the potential V as a function of x for a finite square well in the case where $V_0 = 25\hbar^2/2ma^2$, along with the energies of the four bound states. The wave functions and position probability distributions for these states are shown in (b) and (c) respectively.
TABLE 5.1 Values of the quantities $ka$, $\kappa a$ and E that are consistent with the boundary conditions for a potential well whose sides are of height $V_0$ in the case where $V_0 = 25\hbar^2/2ma^2$. The energies of the corresponding states in the case where $V_0$ is infinite are represented by $E_\infty$.

ka       κa       E/V0     E/E∞
1.306    4.826    0.069    0.691
2.596    4.273    0.270    0.682
3.838    3.205    0.590    0.663
4.907    0.960    0.964    0.610
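The bracketing-and-bisection procedure for (5.44) can be sketched as follows. This is an illustrative implementation, not the authors' own: the names `mismatch` and `bound_states` and the tolerance are assumptions. The difference is taken as the left side of (5.44) minus the right side, which is negative below the root and positive above it within each bracket.

```python
import math


def mismatch(z, z0, n):
    """Left side minus right side of (5.44), with z = ka and z0 = k0*a."""
    kappa_a = math.sqrt(z0 ** 2 - z ** 2)
    if n % 2 == 1:
        return z * math.tan(z) - kappa_a    # symmetric states: ka tan(ka) = kappa*a
    return -z / math.tan(z) - kappa_a       # antisymmetric states: -ka cot(ka) = kappa*a


def bound_states(z0, tol=1e-10):
    """Return the values of ka for all bound states of a well with k0*a = z0."""
    roots = []
    n = 1
    while (n - 1) * math.pi / 2 < z0:
        lo = (n - 1) * math.pi / 2          # lower end of the n-th bracket
        hi = min(n * math.pi / 2, z0)       # upper end, capped at k0*a
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if mismatch(mid, z0, n) > 0:    # root lies below the midpoint
                hi = mid
            else:                           # root lies above the midpoint
                lo = mid
        roots.append(0.5 * (lo + hi))
        n += 1
    return roots


for n, z in enumerate(bound_states(5.0), start=1):
    print(f"n={n}  ka={z:.3f}  kappa*a={math.sqrt(25 - z * z):.3f}  "
          f"E/V0={(z / 5.0) ** 2:.3f}  E/Einf={(2 * z / (n * math.pi)) ** 2:.3f}")
```

For $k_0a = 5$ this yields four bound states whose $ka$ values should agree with Table 5.1 to within about 0.001. Calling `bound_states` with an argument below $\pi/2$ returns a single state, in line with the condition derived in Worked Example 5.3.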
Worked Example 5.3 Find the conditions on $V_0$ and a for there to be only one bound state associated with a finite potential well.

Solution Referring to Figure 5.2, we see that the arc representing $\kappa a$ has one, and only one, intersection with the other curves if $k_0a \le \pi/2$, i.e., $V_0 \le \pi^2\hbar^2/(8ma^2)$.

We now turn to a more detailed discussion of effects associated with the penetration of the wave function into the classically forbidden region. Consider first a potential well bounded by barriers of finite height and width as in Figure 5.7(a). As we have seen in the finite square well case, the wave function decays exponentially in the classically forbidden region and is still nonzero at the points $|x| = b$. In the regions where $|x| > b$, however, the total energy is again greater than the potential energy so the wave function is again oscillatory. It follows that there is a probability of finding the particle both inside and outside the potential well and also at all points within the barrier. Quantum mechanics therefore implies that a particle is able to pass through a potential energy barrier which, according to classical mechanics, should be impenetrable. This phenomenon is known as quantum-mechanical tunnelling or the tunnel effect.
5.6
QUANTUM MECHANICAL TUNNELLING
To study the tunnel effect in more detail we consider the case of a beam of particles of momentum $\hbar k$ and energy $E = \hbar^2k^2/2m$ approaching a barrier of height $V_0$ (where $V_0 > E$) and width b (see Figure 5.4). A fraction of the particles will be reflected at the barrier with momentum $-\hbar k$, but some will tunnel through to emerge with momentum $\hbar k$ at the far side of the barrier. The incident, transmitted and reflected beams are all represented by plane waves, so the wave function on the incident side (which we take to be x < 0) is

$$u = A\exp(ikx) + B\exp(-ikx) \tag{5.45}$$
Inside the barrier the wave function has the same form as (5.31)

$$u = C\exp(\kappa x) + D\exp(-\kappa x) \tag{5.46}$$

and beyond the barrier, which is the region x > b, particles may emerge moving in the positive x direction, so the wave function will have the form

$$u = F\exp(ikx) \tag{5.47}$$

FIGURE 5.4 A beam of particles represented by a plane wave is incident on a potential barrier. Most particles are reflected, but some are transmitted by quantum-mechanical tunnelling.
We note that because the barrier does not reach all the way to infinity, we cannot drop the first term in (5.46) as we did in the square-well case. The boundary conditions requiring both u and du/dx be continuous at x = 0 and x = b can be applied in much the same way as before to give

$$\begin{aligned} A + B &= C + D \\ A - B &= \frac{\kappa}{ik}(C - D) \\ C\exp(\kappa b) + D\exp(-\kappa b) &= F\exp(ikb) \\ C\exp(\kappa b) - D\exp(-\kappa b) &= \frac{ik}{\kappa}F\exp(ikb) \end{aligned} \tag{5.48}$$

Adding the first two equations and adding and subtracting the second two gives

$$\begin{aligned} 2A &= \left(1 + \frac{\kappa}{ik}\right)C + \left(1 - \frac{\kappa}{ik}\right)D \\ 2C\exp(\kappa b) &= \left(1 + \frac{ik}{\kappa}\right)F\exp(ikb) \\ 2D\exp(-\kappa b) &= \left(1 - \frac{ik}{\kappa}\right)F\exp(ikb) \end{aligned} \tag{5.49}$$

We can combine these to express F in terms of A

$$\frac{F}{A} = \frac{4i\kappa k\exp(-ikb)}{(2i\kappa k + \kappa^2 - k^2)\exp(-\kappa b) + (2i\kappa k - \kappa^2 + k^2)\exp(\kappa b)} \tag{5.50}$$
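The algebra between (5.49) and (5.50) is not written out in the text. One way to fill it in, offered as a sketch rather than as the authors' own working, is to substitute the last two of equations (5.49) into the first:

$$\begin{aligned} 4A &= \left[\left(1 + \frac{\kappa}{ik}\right)\left(1 + \frac{ik}{\kappa}\right)\exp(-\kappa b) + \left(1 - \frac{\kappa}{ik}\right)\left(1 - \frac{ik}{\kappa}\right)\exp(\kappa b)\right]F\exp(ikb) \\ &= \frac{(2i\kappa k + \kappa^2 - k^2)\exp(-\kappa b) + (2i\kappa k - \kappa^2 + k^2)\exp(\kappa b)}{i\kappa k}\,F\exp(ikb) \end{aligned}$$

where the second line uses $\left(1 \pm \frac{\kappa}{ik}\right)\left(1 \pm \frac{ik}{\kappa}\right) = 2 \pm \frac{\kappa^2 - k^2}{i\kappa k}$. Rearranging for $F/A$ then gives (5.50).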
The fraction of particles transmitted is just the ratio of the probabilities of the particles being in the transmitted and incident beams, which is just $|F|^2/|A|^2$ and can be evaluated directly from (5.50). In many practical cases, the tunnelling probability is quite small, so we can ignore the term in $\exp(-\kappa b)$ in the denominator of (5.50). In this case the tunnelling probability becomes

$$\frac{|F|^2}{|A|^2} = \frac{16\kappa^2k^2}{(\kappa^2 + k^2)^2}\exp(-2\kappa b) = \frac{16E(V_0 - E)}{V_0^2}\exp(-2\kappa b) \tag{5.51}$$
We can use this to calculate the rate at which particles pass through the barrier in the case where f particles approach the barrier per second as

$$\Gamma = f\,\frac{16E(V_0 - E)}{V_0^2}\exp(-2\kappa b) = \frac{n\hbar k}{m}\,\frac{16E(V_0 - E)}{V_0^2}\exp(-2\kappa b) \tag{5.52}$$
where n is the number of particles per unit length in the incident beam and the particle speed is }k/m. We see that this tunnelling probability is largely determined by the exponential decay of the wave function within the barrier: the lower and narrower the barrier is, the greater the likelihood of tunnelling. To apply this to the situation of tunnelling out of a well as in Figure 5.7, we would first have to make a Fourier expansion of the wave function inside the well in terms of plane waves and then form an appropriately weighted sum of the transmission probabilities associated with them. In many cases, however, good semi-quantitative estimates can be made simply by considering the exponential decay of the wave function within the barrier. A number of physical examples of tunnelling have been observed and two of these—alpha particle decay and cold electron emission—will now be described.
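To see how good the approximation (5.51) is, the exact transmission probability from (5.50) can be compared with it numerically. The sketch below uses assumed, purely illustrative values ($k = \kappa = 10^{10}$ m$^{-1}$ and $b = 3 \times 10^{-10}$ m), and the function names are not from the text.

```python
import cmath
import math


def transmission_exact(k, kappa, b):
    """|F/A|^2 evaluated directly from (5.50)."""
    denom = ((2j * kappa * k + kappa ** 2 - k ** 2) * math.exp(-kappa * b)
             + (2j * kappa * k - kappa ** 2 + k ** 2) * math.exp(kappa * b))
    ratio = 4j * kappa * k * cmath.exp(-1j * k * b) / denom
    return abs(ratio) ** 2


def transmission_approx(k, kappa, b):
    """Approximation (5.51), valid when exp(-kappa*b) is negligible."""
    return 16 * kappa ** 2 * k ** 2 / (kappa ** 2 + k ** 2) ** 2 * math.exp(-2 * kappa * b)


# Illustrative (assumed) values: E = V0/2 so that k = kappa, and kappa*b = 3.
k = kappa = 1e10   # m^-1
b = 3e-10          # m
exact = transmission_exact(k, kappa, b)
approx = transmission_approx(k, kappa, b)
print(f"exact = {exact:.4e}, approximate = {approx:.4e}, ratio = {approx / exact:.4f}")
```

With these values the two results agree to better than one per cent. Reducing $\kappa b$ makes the neglected $\exp(-\kappa b)$ term matter and the agreement worsens.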
Alpha decay

It is well known that some nuclei decay radioactively emitting alpha particles. The alpha particle consists of two protons and two neutrons bound together so tightly that it can be considered as retaining this identity even when within the nucleus. The interaction between the alpha particle and the rest of the nucleus is made up of two components: the first results from the so-called strong nuclear force, which is attractive, but of very short range, whereas the second is the Coulomb interaction, which is repulsive (because both the alpha particle and the residual nucleus are positively charged) and acts at comparatively large distances. The total interaction potential energy is sketched in Figure 5.5(a) as a function of the separation between the alpha particle and the nucleus, and we see that it is qualitatively similar to that shown in Figure 5.7 and discussed previously. It follows that if the alpha particle occupies a quantum state whose energy is less than zero, it will remain there indefinitely and the nucleus will be stable. If, however, the form of the potential is such that the lowest energy state of the alpha particle is greater than zero, but less than $V_0$, it will be able to escape from the nucleus by quantum-mechanical tunnelling. The probability of such emission will depend on the actual shape of the barrier, particularly its height and width, which accounts for the large variation in the observed decay constants of different nuclei.
FIGURE 5.5 (a) shows the potential energy of interaction between an alpha particle and a nucleus as a function of its distance from the centre of the nucleus while (b) shows the potential energy of an electron near the surface of a metal with and without (broken line) an applied electric field. In each case the particles can pass through the potential barrier by quantum-mechanical tunnelling.
Cold electron emission

This phenomenon is observed when a strong electric field is directed toward the surface of a metal, resulting in the emission of electrons. This occurs even if the electrons are not thermally excited (which would be thermionic emission) and so do not have enough energy to escape classically. We first consider the situation in the absence of a field (Figure 5.5(b)), when the electrons are confined within the metal by an energy barrier formed by the work function (see the discussion of the photoelectric effect in Chapter 4). When the electric field is applied, the potential is changed, so that at a short distance from the surface of the metal the potential energy is now less than the energy of the electrons inside the metal. Although the electrons cannot classically penetrate the barrier at the metal surface, they can now pass through by quantum-mechanical tunnelling and the observation of cold electron emission is therefore a confirmation of this effect.
Application: scanning tunnelling microscopy

In recent years, cold electron emission has been exploited in the scanning tunnelling microscope. In this instrument, a very sharp tungsten point is held close to, but not touching, a metal surface and an electric potential difference is maintained between them, so that a tunnelling current flows between the surface and the point. This current is measured as the point is slowly scanned across the metal surface. Variations in the tunnelling current then represent changes in the separation between the point and the source of electron emission. The method is very sensitive because of the exponential factor in (5.51); a typical value of $\kappa$ is $10^{10}$ m$^{-1}$, so there are significant changes in the tunnelling current when the tip-to-sample distance changes by as little as $10^{-11}$ m. Using this technique, changes in the tunnelling current can be observed as the point moves over individual atoms, providing a map of the actual atomic structure on the metal surface. An example of this is shown in Figure 5.6, which shows a silicon surface at atomic resolution. (In this case, in common with modern practice, a servo mechanism keeps the tunnelling current constant by moving the tip perpendicular to the surface, and the image is reconstructed from the record of the resulting tip movements.)
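The sensitivity quoted above follows from the factor $\exp(-2\kappa b)$ in (5.51). A minimal sketch, assuming the tunnelling current is simply proportional to that factor (the function name `current_ratio` is an illustrative choice):

```python
import math


def current_ratio(kappa, delta_b):
    """Factor by which exp(-2*kappa*b) changes when the gap b grows by delta_b."""
    return math.exp(-2 * kappa * delta_b)


# Values quoted in the text: kappa ~ 1e10 m^-1, gap change of 1e-11 m.
ratio = current_ratio(1e10, 1e-11)
print(f"current changes by a factor of {ratio:.2f}")
```

The factor is $\exp(-0.2) \approx 0.82$, so a change in gap of a hundredth of a nanometre alters the current by nearly 20 per cent, which is readily measurable.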
FIGURE 5.6 An image of the (111) surface of silicon obtained by scanning tunnelling microscopy. The bright peaks correspond to silicon atoms. The hexagonal symmetry is a characteristic feature of this surface. (Supplied by P. A. Sloan and R. E. Palmer of the Nanoscale Physics Research Laboratory at the University of Birmingham.)
It should be noted that, although both these experiments imply that the alpha particle or electron has passed through a classically forbidden region, in neither case has the particle been directly observed while undergoing this process. In fact such an observation appears always to be impossible; all known particle detectors are sensitive only to particles with positive kinetic energy so that, if we insert such a detector within the classically forbidden region, its presence implies that a “hole” has been made in the potential so that the particle is no longer in such a region when it is detected.
Worked Example 5.4 Use (5.52) and some semi-classical reasoning to derive an approximate expression for the halflife of a particle initially known to be in the ground state of a potential well of the form shown in Figure 5.7.

FIGURE 5.7 Particles in states with energy between $V_1$ and $V_2$ can escape from the potential well illustrated in (a) by quantum-mechanical tunnelling. (b) shows the real part of the wave function of such a state.

Solution Inside the well, the wave function is similar to that of a particle in the ground state of an infinite square well of width 2a, so that $k = \pi/2a$, $E = \pi^2\hbar^2/8ma^2$. Semi-classically, we can think of the particle oscillating between the two sides of the well, with a probability of escape at either side. The number of particles per unit length is 1/2a, so using (5.52), the total decay rate is

$$\Gamma = 2 \times \frac{\hbar\pi}{4ma^2}\,\frac{16E(V_2 - E)}{(V_2 - V_1)^2}\exp(-2\kappa b)$$

Imagine we have a large number, $N_0$, of such systems and that after a time t, $N_0 - N(t)$ particles have escaped from their wells. The change in N(t) during the time interval t to t + dt is then $dN = -\Gamma N\,dt$, so that

$$N(t) = N_0\exp(-\Gamma t)$$

After a time equal to $t_{1/2} = \ln 2/\Gamma$, half the particles have decayed and half remain; this time is known as the “halflife.”
5.7 THE HARMONIC OSCILLATOR
We finish this chapter with a discussion of the energy levels and wave functions of a particle moving in a harmonic oscillator potential. This is an important example because many physical phenomena, including the internal vibrations of molecules and the motion of atoms in solids, can be described using it. It also provides an application of the method of series solution of differential equations, which is a technique that will be used again in the next chapter when we consider three-dimensional systems.

The harmonic oscillator potential has the form $V(x) = \frac{1}{2}Kx^2$ where $K$ is a constant; classically a particle of mass $m$ oscillates in this potential with an angular frequency $\omega_c = (K/m)^{1/2}$. The time-independent Schrödinger equation for this system can therefore be written in the form
$$-\frac{\hbar^2}{2m}\frac{d^2u}{dx^2} + \frac{1}{2}m\omega_c^2x^2u = Eu \tag{5.53}$$
The subsequent mathematics is a good deal easier to follow if we first change variables from $x$ to $y$, where $y = (m\omega_c/\hbar)^{1/2}x$, and define a constant $\alpha = 2E/\hbar\omega_c$. Equation (5.53) now becomes
$$\frac{d^2u}{dy^2} + (\alpha - y^2)u = 0 \tag{5.54}$$
We first discuss the asymptotic form of the solution in the region where $y$ is very large so that the equation is approximately
$$\frac{d^2u}{dy^2} - y^2u = 0 \tag{5.55}$$
We shall try as a solution to this equation
$$u = y^n\exp(-y^2/2) \tag{5.56}$$
where $n$ is a positive integer. Differentiating twice with respect to $y$ we get
$$\begin{aligned}
\frac{d^2u}{dy^2} &= [n(n-1)y^{n-2} - (2n+1)y^n + y^{n+2}]\exp(-y^2/2)\\
&\simeq y^{n+2}\exp(-y^2/2) \quad \text{when } y \gg 1\\
&= y^2u
\end{aligned}$$
Thus we see that (5.56) is the asymptotic form of the solution we are looking for, which suggests that a general solution to (5.54), valid for all values of $y$, might be
$$u(y) = H(y)\exp(-y^2/2) \tag{5.57}$$
where $H(y)$ is a function to be determined. Substituting from (5.57) into (5.54) we get
$$H'' - 2yH' + (\alpha - 1)H = 0 \tag{5.58}$$
where a prime indicates differentiation with respect to $y$. We now write $H$ in the form of a power series
$$H = \sum_{p=0}^{\infty} a_py^p \tag{5.59}$$
(Note that negative powers of $y$ are not permitted as they produce physically unacceptable infinities at $y = 0$.) Hence
$$H' = \sum_{p=0}^{\infty} a_p\,p\,y^{p-1} \tag{5.60}$$
and
$$H'' = \sum_{p=0}^{\infty} a_p\,p(p-1)y^{p-2} = \sum_{p=2}^{\infty} a_p\,p(p-1)y^{p-2} \tag{5.61}$$
because the first two terms on the right-hand side of (5.61) vanish; thus
$$H'' = \sum_{p'=0}^{\infty} a_{p'+2}(p'+2)(p'+1)y^{p'} \tag{5.62}$$
where $p' = p - 2$. However, $p'$ is just an index of summation so we can write it simply as $p$. Doing this and substituting from (5.59), (5.60) and (5.62) into (5.58) we get
$$\sum_{p=0}^{\infty}\left[(p+1)(p+2)a_{p+2} - (2p+1-\alpha)a_p\right]y^p = 0 \tag{5.63}$$
This can be true for all values of $y$ only if the coefficient of each power of $y$ vanishes, so we obtain the following recurrence relation
$$\frac{a_{p+2}}{a_p} = \frac{2p+1-\alpha}{(p+1)(p+2)} \to \frac{2}{p} \quad \text{as } p \to \infty \tag{5.64}$$
This last expression is identical to the recurrence relation between successive terms of the power series for the function $\exp(y^2)$ $(= \sum_n y^{2n}/n!)$, so in general $H(y)$ will tend to infinity with $y$ like $\exp(y^2)$, implying that $u(y)$ diverges like $\exp(\frac{1}{2}y^2)$, leading to a physically unrealistic solution. This can be avoided only if the power series for $H$ terminates after a finite number of terms. To obtain the conditions for such a termination, we first note that the series can be expressed as a sum of two series, one containing only even and the other only odd powers of $y$. By repeated application of (5.64) we can express the coefficients of $y^p$ as functions of $\alpha$ multiplied by the constants $a_0$ or $a_1$, depending on whether $p$ is even or odd, respectively. Either series, but not both, can be made to terminate by choosing $\alpha$ so that the numerator of (5.64) vanishes for some finite value of $p$ (say $p = n$), i.e., by putting $\alpha = 2n + 1$. The other series cannot terminate simultaneously, but can be made to vanish completely if its leading coefficient ($a_0$ or $a_1$) is taken to be zero. Thus we have the following conditions for a physically acceptable solution to the Schrödinger equation in the case of a particle moving in a harmonic oscillator potential
$$\alpha = 2n + 1, \quad n = 0, 1, 2, \ldots; \qquad a_1 = 0 \text{ if } n \text{ is even and } a_0 = 0 \text{ if } n \text{ is odd} \tag{5.65}$$
If the first condition is combined with the definition of $\alpha$, we find that the total energy of the system is quantized according to
$$E \equiv E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega_c \tag{5.66}$$
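The terminating series can be generated directly from the recurrence relation (5.64) once $\alpha = 2n + 1$ is imposed. The following is a minimal sketch rather than a definitive implementation: the function name is our own, and we assume the common convention of rescaling so that the coefficient of $y^n$ equals $2^n$.

```python
from fractions import Fraction


def hermite_coefficients(n):
    """Coefficients [a_0, ..., a_n] of the terminating series for alpha = 2n + 1.

    Uses a_{p+2} / a_p = (2p + 1 - alpha) / ((p + 1)(p + 2)) from (5.64),
    starting from a_0 (n even) or a_1 (n odd); the other series is set to zero.
    The result is rescaled so that a_n = 2**n (assumed convention).
    """
    alpha = 2 * n + 1
    coeffs = [Fraction(0)] * (n + 1)
    start = n % 2
    coeffs[start] = Fraction(1)
    p = start
    while p + 2 <= n:
        coeffs[p + 2] = coeffs[p] * Fraction(2 * p + 1 - alpha, (p + 1) * (p + 2))
        p += 2
    scale = Fraction(2 ** n) / coeffs[n]
    return [int(c * scale) for c in coeffs]


if __name__ == "__main__":
    for n in range(5):
        print(n, hermite_coefficients(n))
```

Exact rational arithmetic is used so that the termination of the series at $p = n$ is not obscured by rounding.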
Quantum mechanics therefore predicts that the energy levels of a harmonic oscillator are equally spaced with an interval of $\hbar$ times the classical angular frequency and have a minimum value of $\frac{1}{2}\hbar\omega_c$ (known as the zero-point energy). Experimental confirmation of these results is obtained, for example, from observations of the properties of molecules. Thus a diatomic molecule can be considered as two point masses connected by a spring and this system can therefore undergo quantized oscillations. Photons absorbed and emitted by such a molecule then have frequencies that are multiples of the classical frequency of the oscillator. Thermal properties, like the heat capacity, of a gas composed of such molecules are also largely determined by energy quantization.

The polynomials $H_n(y)$ are known as Hermite polynomials. Expressions for them can be obtained by repeated application of (5.64) and the results of this procedure for the four lowest energy states are
$$\begin{aligned}
H_0 &= 1\\
H_1 &= 2y\\
H_2 &= 4y^2 - 2\\
H_3 &= 8y^3 - 12y
\end{aligned} \tag{5.67}$$
where the values of $a_0$ and $a_1$ have been chosen according to established convention. The wave functions $u_n(x)$ can now be obtained by multiplying $H_n(y)$ by the factor $\exp(-\frac{1}{2}y^2)$, making the substitution $y = (m\omega_c/\hbar)^{1/2}x$ and applying the normalization condition (5.20). The results of this procedure are shown in Figure 5.8 where the wave functions are seen to have even or odd parity, depending on whether $n$ is even or odd. We note that, like the wave functions associated with the energy states of a finite square well (Figure 5.3), they are oscillatory within the classically permitted region and penetrate the classically forbidden region to some extent.

FIGURE 5.8 The wave functions corresponding to the four lowest energy states of the harmonic oscillator are shown in (a), while (b) depicts the position probability distribution in the case where $n = 10$, compared with that calculated classically (broken curve). The limits of the classical oscillation are indicated by vertical lines in each case. Note that (a) and (b) are on different scales.

Algebraic expressions for these wave functions are as follows
$$\begin{aligned}
u_0 &= (m\omega_c/\pi\hbar)^{1/4}\exp(-m\omega_cx^2/2\hbar)\\
u_1 &= (4/\pi)^{1/4}(m\omega_c/\hbar)^{3/4}\,x\exp(-m\omega_cx^2/2\hbar)\\
u_2 &= (m\omega_c/4\pi\hbar)^{1/4}\,[2(m\omega_c/\hbar)x^2 - 1]\exp(-m\omega_cx^2/2\hbar)\\
u_3 &= (1/9\pi)^{1/4}(m\omega_c/\hbar)^{3/4}\,[(2m\omega_c/\hbar)x^2 - 3]\,x\exp(-m\omega_cx^2/2\hbar)
\end{aligned} \tag{5.68}$$
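As a check on the expressions (5.68), the normalization and mutual orthogonality of the four functions can be confirmed by numerical integration. This is a sketch under stated assumptions: `beta` is our own shorthand for $m\omega_c/\hbar$, the integration range and step count are arbitrary choices, and a simple trapezoidal rule is used.

```python
import math


def u(n, x, beta=1.0):
    """Wave functions u_0 .. u_3 of (5.68); beta stands for m * omega_c / hbar."""
    g = math.exp(-beta * x * x / 2.0)
    if n == 0:
        return (beta / math.pi) ** 0.25 * g
    if n == 1:
        return (4.0 / math.pi) ** 0.25 * beta ** 0.75 * x * g
    if n == 2:
        return (beta / (4.0 * math.pi)) ** 0.25 * (2.0 * beta * x * x - 1.0) * g
    if n == 3:
        return ((1.0 / (9.0 * math.pi)) ** 0.25 * beta ** 0.75
                * (2.0 * beta * x * x - 3.0) * x * g)
    raise ValueError("only n = 0..3 are listed in (5.68)")


def overlap(n, m, beta=1.0, half_width=10.0, steps=4000):
    """Trapezoidal estimate of the integral of u_n u_m over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * u(n, x, beta) * u(m, x, beta)
    return total * h


if __name__ == "__main__":
    for n in range(4):
        print([round(overlap(n, m), 6) for m in range(4)])
```

The printed table should be close to the identity matrix: each function is normalized and functions with different $n$ have zero overlap.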
The harmonic oscillator provides yet another example of the correspondence principle whereby the results of quantum mechanics tend to those of classical mechanics in the classical limit. To see this, consider a classical oscillator (such as might be used in a quartz watch) with a classical angular frequency $\omega_c = 10^8$ rad s$^{-1}$. The separation of energy levels is then about $10^{-26}$ J, which corresponds to a temperature of about $10^{-3}$ K. At any normal temperature, therefore, the oscillator will be highly excited and the effects of energy quantization will be undetectable.

We can also show that the probability distribution of the particle's position is similar to that expected classically if the oscillator is highly excited. Classically the position and velocity of an oscillating particle vary as
$$x = a\cos\omega_ct, \qquad v = -a\omega_c\sin\omega_ct \tag{5.69}$$
where $a$ is the amplitude of oscillation. Now an oscillating particle is more likely to be found near the limits of its oscillation where it is moving slowly than near the centre where it moves more quickly. Putting this point more quantitatively, the probability $P(x)\,dx$ that the particle be found in the region between $x$ and $x + dx$ is inversely proportional to the speed $|v(x)|$ at the point $x$. Hence using (5.69)
$$P(x) = \begin{cases}\dfrac{1}{\pi(a^2 - x^2)^{1/2}} & |x| < a\\[2ex] 0 & |x| > a\end{cases} \tag{5.70}$$
where the factor π ensures normalization. It is clear from Figure 5.8(a) that the quantum-mechanical probability distribution is quite different from this for low energy states, but the agreement rapidly improves as n increases. A detailed comparison is shown for the case n = 10 in Figure 5.8(b). This shows that the quantum-mechanical probability distribution is a rapidly oscillating function of x, but that if we average over these oscillations, the result is quite close to that given in (5.70). For larger values of n this agreement is even better and the separation of the adjacent maxima and minima in the quantum-mechanical probability distribution also becomes undetectably small, so that the predictions of quantum and classical mechanics are then indistinguishable.
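The numbers quoted for the quartz oscillator, and the statement that a classical particle is most likely to be found near the limits of its motion, can both be checked with a few lines of arithmetic. The sketch below assumes standard values for $\hbar$ and Boltzmann's constant (they are not taken from this chapter) and integrates (5.70) with a simple midpoint rule; the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34   # J s (assumed standard value)
K_B = 1.380649e-23       # J / K (assumed standard value)


def level_spacing(omega):
    """Separation hbar * omega between adjacent oscillator levels, in joules."""
    return HBAR * omega


def classical_probability(x1, x2, a, steps=100_000):
    """Midpoint-rule integral of P(x) = 1 / (pi sqrt(a^2 - x^2)) from x1 to x2.

    Requires -a < x1 < x2 < a so that the integrand stays finite.
    """
    h = (x2 - x1) / steps
    total = 0.0
    for i in range(steps):
        x = x1 + (i + 0.5) * h
        total += h / (math.pi * math.sqrt(a * a - x * x))
    return total


spacing = level_spacing(1.0e8)                 # about 1e-26 J
temperature = spacing / K_B                    # about 1e-3 K
inner = classical_probability(-0.5, 0.5, 1.0)  # probability of |x| < a/2

if __name__ == "__main__":
    print(spacing, temperature, inner)
```

The last quantity comes out at one third, so a classical oscillator spends two thirds of its time in the outer half of its range, consistent with the shape of the broken curve in Figure 5.8(b).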
5.8 PROBLEMS
5.1 An electron is confined to a one-dimensional potential well of width $6 \times 10^{-10}$ m which has infinitely high sides. Calculate: (i) the three lowest allowed values of the electron energy; (ii) the wavelength of the electromagnetic wave that would cause the electron to be excited from the lowest to the highest of these three levels; (iii) all possible wavelengths of the radiation emitted following the excitation in (ii).

5.2 If $u_m$ and $u_n$ are the wave functions corresponding to two energy states of a particle confined to a one-dimensional box with infinite sides, show that
$$\int_{-\infty}^{\infty} u_nu_m\,dx = 0 \quad \text{if } n \neq m$$
This is an example of "orthogonality" which will be discussed in Chapter 7.

5.3 Consider a particle of mass $m$ subject to a one-dimensional potential $V(x)$ that is given by
$$V = \infty, \; x < 0; \qquad V = 0, \; 0 \leq x \leq a; \qquad V = V_0, \; x > a$$
Show that bound states (i.e., those with $E < V_0$) of this system exist only if $k\cot ka = -\kappa$ where $k^2 = 2mE/\hbar^2$ and $\kappa^2 = 2m(V_0 - E)/\hbar^2$. Obtain expressions for the wave functions of the bound states.

5.4 Use a spreadsheet or write a computer program to implement the method of bracketing and bisection and calculate the values of the quantities listed in Table 5.1 when $V_0 = n\hbar^2/2ma^2$, for $n$ equal to 10, 30, 50 and 100.

5.5 Show that there is only one bound state in the case of a particle moving under the influence of a potential $V = -V_0\delta(x)$, where $\delta(x)$ is the Dirac delta function defined in Chapter 1. Obtain a value for the energy level and corresponding wave function. Show that your results correspond with those for the finite square well where $V_0 \to \infty$ and $a \to 0$, but $aV_0 = 1$.

5.6 Show that if $V(x) = V(-x)$, solutions to the time-independent Schrödinger equation have definite parity, that is, $u(x) = \pm u(-x)$. Hint: Make the substitution $y = -x$ and show first that $u(x) = Au(-x)$ where $A$ is a constant.

5.7 Do the wave functions representing the energy states of the system described in Problem 5.3 have definite parity? If not, why not?

5.8 Consider a particle of mass $m$ subject to the one-dimensional potential $V(x)$ that is given by
$$V = 0 \;\text{ if } -a \leq x \leq a \text{ or if } |x| > b; \qquad V = V_0 \;\text{ if } a < |x| \leq b$$
where $b > a$. Write down the form of an even-parity solution to the Schrödinger equation in each region in the case where $E < V_0$. Note that the particle is not bound in this potential as there is always a probability of quantum-mechanical tunnelling, so solutions exist for all values of $E$. Show, however, that if $\kappa(b - a) \gg 1$ (where $\kappa$ is defined as in Problem 5.3) the probability of finding the particle inside the region $|x| < a$ is very small unless its energy is close to that of one of the bound states of a well of side $2a$ bounded by potential steps of height $V_0$. In the case where this condition is fulfilled exactly, obtain an expression for the ratio of the amplitudes of the wave function in the regions $|x| \leq a$ and $|x| > b$.

5.9 An electron with kinetic energy 2 eV approaches a barrier of height 3 eV and width $10^{-10}$ m. Calculate the probabilities for transmission and reflection and repeat your calculation for a proton approaching the barrier with the same energy. In each case, consider whether you are justified in using equation (5.51).

5.10 Obtain expressions for the transmission and reflection probabilities for a particle approaching the barrier discussed in the text in the case where $E > V_0$.

5.11 A particle of mass $m$ and energy greater than $V_0$ approaches the finite square well described in the text (5.29). Obtain expressions for the reflection and transmission probabilities in this case.

5.12 An isotope of plutonium (atomic number 94) emits alpha particles (mass $6.6 \times 10^{-27}$ kg) of energy 4.7 MeV. Referring to Figure 5.6 (a), calculate the distance from the centre of the point where the particles escape from the well. Assuming that the radius of the plutonium nucleus is $10^{-14}$ m, what is the maximum energy of the potential barrier? Making approximations similar to those used in Worked Example 5.4, obtain a value for $\kappa$ for an alpha particle tunnelling through a square barrier of height equal to the maximum energy calculated above. Hence estimate the width of this square barrier if the half-life of this process is $8 \times 10^7$ years. Compare this width with that derived earlier and comment.

5.13 The tip of a scanning tunnelling microscope is made of metal whose work function as defined in Chapter 4 equals 4 eV. If this is held at a distance $d$ from another metal surface, estimate the voltage that must be applied if a current is to flow. Assume that the two metal surfaces are planar and parallel. The shape of the barrier is triangular; supposing that it were replaced by a rectangular barrier of the same height, what would be the value of $\kappa$? If a tunnelling current flows through the junction, by what factor would this change if the tip-to-sample distance changed by an atomic diameter ($\simeq 3 \times 10^{-10}$ m)?

5.14 What is the parity of the wave function associated with the $n$th energy state of a harmonic oscillator?

5.15 Calculate the normalization constants for the two lowest energy states of a harmonic oscillator and verify that they are orthogonal in the sense defined in Problem 5.2.

5.16 Consider a particle constrained to move in a circle (e.g. a bead on a circular wire) of radius $R$. What is the form of the wave function and what boundary conditions determine the energy levels? Hence obtain expressions for the energy levels and associated wave functions. [Assume that the radius of the circle is large so that motion around the circumference can be treated as one-dimensional.]

5.17 Show that the expression $y\exp(y^2/2)$ is a solution to the Schrödinger equation for a harmonic oscillator and obtain the corresponding value for the energy. Explain carefully why this solution is not physically acceptable.

5.18 The wave function of a particle subject to a potential $V(x)$ has the form $1/\cosh(kx)$. Obtain an expression for $V(x)$ and the value of the corresponding energy level.

5.19 The hydrogen atom in a water molecule can vibrate in a direction along the O-H bond, and this motion can be excited by electromagnetic radiation of a wavelength about $4 \times 10^{-6}$ m, but not by radiation of a longer wavelength. Calculate the effective spring constant for this vibration and the zero-point energy of the oscillator. Given that every molecular oscillation has a thermal energy of about $k_BT$, where $k_B$ (Boltzmann's constant) $\simeq 1.4 \times 10^{-23}$ J K$^{-1}$ and $T$ is the temperature, how many quanta of vibrational energy are possessed by a typical water molecule in steam superheated to 1000 K?

5.20 The carbon dioxide molecule consists of a carbon atom bound to two oxygen atoms in a straight line (O-C-O). The carbon atom carries a small positive charge which is compensated for by small negative charges on the oxygens. Use your understanding of electromagnetism (Chapter 2) to explain how the CO$_2$ molecule oscillates in response to an electromagnetic wave whose electric vector is perpendicular to the line of the molecule. If this response is a maximum when the wavelength of the radiation equals $1.9 \times 10^{-5}$ m, what are the energy levels associated with this oscillation? Explain how this leads to the atmospheric greenhouse effect.
CHAPTER 6
The Three-dimensional Schrödinger Equations

CONTENTS
6.1 The Wave Equations
6.2 Separation in Cartesian Coordinates
    The three-dimensional "box"
6.3 Separation in Spherical Polar Coordinates
    The radial equation
6.4 The Hydrogenic Atom
    The hydrogenic atom wave functions
6.5 Problems
In the previous chapter we saw that for a particle in a one-dimensional potential well, physically acceptable solutions to the one-dimensional Schrödinger equation are possible only for particular discrete values of the total energy. Moreover, in the case of an electron in a well of atomic dimensions, the spacings between these energy levels are in qualitative agreement with the separations experimentally observed in atoms; we also interpreted the square of the wave function as a probability distribution for the position of the particle. These ideas led us to phenomena, such as quantum-mechanical tunnelling, that have been observed experimentally. The real world, however, is three-dimensional and, although one-dimensional examples often provide useful insights and analogies, we shall have to extend our theory into three dimensions before we can make quantitative predictions of most experimental results. In the present chapter, therefore, we shall set up the three-dimensional Schrödinger equation and obtain its solutions in a number of cases, culminating in a discussion of the hydrogen atom where we shall find that theory and experiment agree to a remarkable degree of accuracy.
6.1 THE WAVE EQUATIONS
In classical mechanics the total energy of a free particle of mass $m$ and momentum $\mathbf{p}$ is given by
$$E = p^2/2m = (p_x^2 + p_y^2 + p_z^2)/2m \tag{6.1}$$
where $p_x$, $p_y$ and $p_z$ are the momentum components along the Cartesian axes $x$, $y$ and $z$. The de Broglie relations (4.8) in three dimensions are
$$E = \hbar\omega, \qquad \mathbf{p} = \hbar\mathbf{k} \tag{6.2}$$
that is, $p_x = \hbar k_x$; $p_y = \hbar k_y$; $p_z = \hbar k_z$. A wave equation whose solutions are consistent with these relations can be set up in exactly the same way as that described for the one-dimensional case, Equations (5.1) to (5.8). We obtain
$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\left(\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2}\right) \tag{6.3}$$
where the wave function $\Psi(\mathbf{r}, t)$ is now a function of all three positional coordinates and the time. When the particle is subject to a potential $V(\mathbf{r}, t)$, Equation (6.3) is generalized in the same way as in the one-dimensional case, Equations (5.9) to (5.11), to produce the time-dependent Schrödinger equation
$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\Psi + V\Psi \tag{6.4}$$
where the Laplacian operator $\nabla^2$ (del-squared) is defined so that
$$\nabla^2\Psi = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2}$$
The generalization of the probabilistic interpretation of the wave function is also straightforward: if $P(\mathbf{r}, t)\,d\tau$ is the probability that the particle be found in the volume element $d\tau$ $(\equiv dx\,dy\,dz)$ in the vicinity of the point $\mathbf{r}$ at time $t$, then
$$P(\mathbf{r}, t) = |\Psi(\mathbf{r}, t)|^2 \tag{6.5}$$
It follows directly that the normalization condition (5.12) becomes
$$\int|\Psi(\mathbf{r}, t)|^2\,d\tau = 1 \tag{6.6}$$
where the volume integral is performed over all space (as will always be assumed to be the case for volume integrals unless we specifically state otherwise). When the potential $V$ is independent of time, we can write $\Psi(\mathbf{r}, t) = u(\mathbf{r})T(t)$ and separate the variables to obtain the time-independent Schrödinger equation, cf. Equations (5.14) to (5.18),
$$-\frac{\hbar^2}{2m}\nabla^2u + Vu = Eu \tag{6.7}$$
along with
$$T(t) = \exp(-iEt/\hbar)$$
while the normalization condition (6.6) now becomes
$$\int|u(\mathbf{r})|^2\,d\tau = 1 \tag{6.8}$$
The boundary conditions on a wave function that is a solution to the time-independent Schrödinger equation also follow as a natural extension of the arguments set out in Chapter 5. It must be a continuous single-valued function of position and time; its squared modulus must be integrable over all space, which usually means that it must be everywhere finite; and its spatial derivatives ($\partial\Psi/\partial x$, $\partial\Psi/\partial y$ and $\partial\Psi/\partial z$) must all be continuous everywhere, except where there is an infinite discontinuity in $V$.

We shall now proceed to obtain solutions to the three-dimensional time-independent Schrödinger equation (6.7) in a number of particular cases. Unlike the one-dimensional case, the equation is now a partial differential equation which gives rise to considerable mathematical complications in general. We shall consider only those cases where the separation-of-variables technique can be used, but we shall still have to solve three ordinary differential equations in order to obtain a complete solution. We shall shortly consider spherically symmetric systems when we shall carry out this process in a spherical polar coordinate system, but we first discuss some simpler examples where the Schrödinger equation can be separated in Cartesian coordinates.
6.2 SEPARATION IN CARTESIAN COORDINATES
Consider the case where the potential $V(\mathbf{r})$ can be written as a sum of three quantities, each of which is a function of only one of the three Cartesian coordinates. That is
$$V(\mathbf{r}) = V_1(x) + V_2(y) + V_3(z) \tag{6.9}$$
We now express the wave function as a product of three one-dimensional functions
$$u(\mathbf{r}) = X(x)Y(y)Z(z) \tag{6.10}$$
On substituting (6.9) and (6.10) into the time-independent Schrödinger equation (6.7), dividing through by $u$ and rearranging slightly, we get
$$\left[-\frac{\hbar^2}{2m}\frac{1}{X}\frac{d^2X}{dx^2} + V_1(x)\right] + \left[-\frac{\hbar^2}{2m}\frac{1}{Y}\frac{d^2Y}{dy^2} + V_2(y)\right] + \left[-\frac{\hbar^2}{2m}\frac{1}{Z}\frac{d^2Z}{dz^2} + V_3(z)\right] = E \tag{6.11}$$
Each expression in square brackets is a function of only one of the variables $(x, y, z)$, but the equation holds at all points in space. It follows that each of the expressions in square brackets must be equal to a constant and the sum of the constants must equal $E$. Thus
$$\begin{aligned}
-\frac{\hbar^2}{2m}\frac{d^2X}{dx^2} + V_1X &= E_1X\\
-\frac{\hbar^2}{2m}\frac{d^2Y}{dy^2} + V_2Y &= E_2Y\\
-\frac{\hbar^2}{2m}\frac{d^2Z}{dz^2} + V_3Z &= E_3Z
\end{aligned} \tag{6.12}$$
where $E_1 + E_2 + E_3 = E$. Each of these equations has the form of the one-dimensional Schrödinger equation (5.17) so, in suitable cases, we can carry over results directly from the previous chapter.
The three-dimensional "box"

This example relates to a potential that is zero inside a rectangular region of sides $2a \times 2b \times 2c$ and infinite outside. That is, we are considering a potential of the form
$$\begin{aligned}
V(\mathbf{r}) &= 0 \quad \text{if } -a \leq x \leq a,\; -b \leq y \leq b \text{ and } -c \leq z \leq c\\
V(\mathbf{r}) &= \infty \quad \text{if } |x| > a,\; |y| > b, \text{ or } |z| > c
\end{aligned} \tag{6.13}$$
Each of the three separated equations (6.12) is now equivalent to the Schrödinger equation for the one-dimensional infinite square well and the boundary conditions ($X = 0$ if $x = \pm a$, etc.) are also the same. It therefore follows directly from (5.27) that the energy levels are given by
$$E_{n_1n_2n_3} = \frac{\hbar^2\pi^2}{8m}\left(\frac{n_1^2}{a^2} + \frac{n_2^2}{b^2} + \frac{n_3^2}{c^2}\right) \tag{6.14}$$
where $n_1$, $n_2$ and $n_3$ are positive integers. Similarly, using (5.28), the corresponding wave functions are given by
$$u_{n_1n_2n_3} = (abc)^{-1/2}\begin{Bmatrix}\cos\\ \sin\end{Bmatrix}\!\left(\frac{n_1\pi x}{2a}\right)\begin{Bmatrix}\cos\\ \sin\end{Bmatrix}\!\left(\frac{n_2\pi y}{2b}\right)\begin{Bmatrix}\cos\\ \sin\end{Bmatrix}\!\left(\frac{n_3\pi z}{2c}\right) \tag{6.15}$$
where the cosine applies if the integer in the following argument is odd, and the sine applies if it is even. We see that three quantum numbers ($n_1$, $n_2$ and $n_3$) are needed to specify the energy levels and wave functions in this example, compared with only one in the one-dimensional case. This is a general feature of three-dimensional systems.

It is interesting to consider the special case where two sides of the box are equal, as this illustrates some important features that arise when a three-dimensional potential has symmetry. Putting $a = b$, (6.14) becomes
$$E_{n_1n_2n_3} = \frac{\hbar^2\pi^2}{8m}\left(\frac{n_1^2 + n_2^2}{a^2} + \frac{n_3^2}{c^2}\right) \tag{6.16}$$
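The level formula (6.14) is easy to tabulate, and doing so shows what changes when two sides of the box are made equal. The following sketch is illustrative only (the function name and the choice of units are ours): it groups the states $(n_1, n_2, n_3)$ by energy, measured in units of $\hbar^2\pi^2/8m$, using exact rational arithmetic so that equal energies compare as equal. The side lengths should be supplied as integers or `Fraction` objects.

```python
from collections import defaultdict
from fractions import Fraction


def box_levels(a, b, c, n_max=3):
    """Group states (n1, n2, n3) by energy from (6.14), in units of hbar^2 pi^2 / 8m.

    a, b, c are the half-sides of the box (integers or Fractions);
    each quantum number runs from 1 to n_max.
    """
    levels = defaultdict(list)
    for n1 in range(1, n_max + 1):
        for n2 in range(1, n_max + 1):
            for n3 in range(1, n_max + 1):
                energy = (Fraction(n1 * n1) / (a * a)
                          + Fraction(n2 * n2) / (b * b)
                          + Fraction(n3 * n3) / (c * c))
                levels[energy].append((n1, n2, n3))
    return dict(sorted(levels.items()))


if __name__ == "__main__":
    for energy, states in box_levels(1, 1, 2, n_max=2).items():
        print(float(energy), states)
```

With $a = b$ some energies are shared by more than one set of quantum numbers, whereas for a box such as $1 \times 2 \times 3$ the lowest few levels each correspond to a single state.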
FIGURE 6.1 The top two diagrams show the shape of the wave functions corresponding to the states $n_1 = 2$, $n_2 = 1$, $n_3 = 1$, and $n_1 = 1$, $n_2 = 2$, $n_3 = 1$, as calculated in the plane $z = 0$ for the case of a particle in a rectangular box with $a = b$. The lower diagram shows the position probability distribution calculated as the average of the squares of the two wave functions.
There are now several different combinations of $n_1$, $n_2$ and $n_3$ that have the same energy: e.g., the states ($n_1 = 2$, $n_2 = 1$, $n_3 = 1$) and ($n_1 = 1$, $n_2 = 2$, $n_3 = 1$) whose wave functions are
$$\begin{aligned}
u_{211} &= (a^2c)^{-1/2}\sin(\pi x/a)\cos(\pi y/2a)\cos(\pi z/2c)\\
u_{121} &= (a^2c)^{-1/2}\cos(\pi x/2a)\sin(\pi y/a)\cos(\pi z/2c)
\end{aligned} \tag{6.17}$$
When two or more quantum states have the same value of the energy we say that they are degenerate. Very often the degeneracy is closely associated with the symmetry of the potential and we can illustrate this in the present case by considering the geometrical relationship between the two wave functions given in (6.17). Figure 6.1 shows the shape of these functions in the plane $z = 0$. They are equivalent to each other apart from their orientation in space: thus $u_{121}$ can be transformed into $u_{211}$ by rotating it through $90^\circ$ about the $z$ axis. As the potential has square symmetry in the $xy$ plane, we should not expect such a rotation to result in any physical change of the system. However, the position probability distribution ($|u|^2$) corresponding to either of these states does not have the expected symmetry. If the system is in the state with wave function $u_{211}$, for example, the probability of finding it near $(a/2, 0, 0)$ is quite large, while that of finding it near $(0, a/2, 0)$ is zero, even though, as far as the potential is concerned, these points are equivalent.

To resolve this apparent paradox we must carefully consider what information a measurement of the energy gives us about a degenerate system. If only the energy has been measured, we can conclude that the wave function is one of the two forms given in (6.17), but we cannot tell which. In the absence of further information, we have to assume that either state is equally probable. It follows that the appropriate expression for the position probability distribution is the average of $|u_{211}|^2$ and $|u_{121}|^2$. This quantity does have the same symmetry as the potential, as is also shown in Figure 6.1. We note that this argument relies on the states being degenerate, as otherwise we could tell from an energy measurement which one of the two was occupied, which illustrates the close connection between degeneracy and symmetry. Moreover, the fact that the squared moduli of the wave functions associated with the individual states do not have a direct physical significance illustrates the general principle that we should concentrate only on those results that can be measured and avoid drawing conclusions about apparent consequences that cannot be directly tested, if we want to make sense of quantum mechanics. We shall return to the topic of degeneracy when we discuss spherically symmetric systems later in this chapter and we shall also consider the topic more formally in Chapter 7.

Worked Example 6.1 The three-dimensional harmonic oscillator. Another example of a three-dimensional system, where the Schrödinger equation can be separated in Cartesian coordinates, is the case of a particle moving in the potential
$$V(\mathbf{r}) = \frac{1}{2}K_1x^2 + \frac{1}{2}K_2y^2 + \frac{1}{2}K_3z^2$$
Find expressions for the energy levels and corresponding wave functions.

Solution The separated Schrödinger equations (6.12) are
$$-\frac{\hbar^2}{2m}\frac{d^2X}{dx^2} + \frac{1}{2}m\omega_1^2x^2X = E_1X$$
where $\omega_1 = (K_1/m)^{1/2}$, with similar equations for $Y$ and $Z$. Each of these has the form of the one-dimensional harmonic oscillator equation (5.53) so we can use the results of this case (5.66) directly to get an expression for the energy levels of the three-dimensional oscillator
$$E_{n_1n_2n_3} = \left(n_1 + \tfrac{1}{2}\right)\hbar\omega_1 + \left(n_2 + \tfrac{1}{2}\right)\hbar\omega_2 + \left(n_3 + \tfrac{1}{2}\right)\hbar\omega_3$$
where $n_1$, $n_2$ and $n_3$ are nonnegative integers. In the special, isotropic, case where $\omega_1 = \omega_2 = \omega_3 \equiv \omega$, we can write
$$E_n = \left(n + \tfrac{3}{2}\right)\hbar\omega$$
where $n = n_1 + n_2 + n_3$. The wave functions also follow directly from the one-dimensional results. In the isotropic case, they are
$$u_{n_1n_2n_3} = H_{n_1}(x')H_{n_2}(y')H_{n_3}(z')\exp\left[-\tfrac{1}{2}(x'^2 + y'^2 + z'^2)\right]$$
where $x' = (m\omega_1/\hbar)^{1/2}x$, etc., and the $H_{n_i}$ are Hermite polynomials.

This example provides another illustration of how symmetry can result in degeneracy. If, for example, $K_1 = K_2 = K_3$ it follows that the potential is spherically symmetric and all states with the same value of $(n_1 + n_2 + n_3)$ are degenerate.
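The degeneracy of the isotropic oscillator can be made concrete by listing, for a given $n$, every set of nonnegative integers with $n_1 + n_2 + n_3 = n$. The sketch below uses function names of our own choosing and measures energies in units of $\hbar$ times unit angular frequency; the closed-form count $(n + 1)(n + 2)/2$ mentioned in the comments is the standard combinatorial result and is not derived in the text.

```python
def oscillator_states(n):
    """All (n1, n2, n3) of nonnegative integers with n1 + n2 + n3 = n."""
    return [(n1, n2, n - n1 - n2)
            for n1 in range(n + 1)
            for n2 in range(n - n1 + 1)]


def oscillator_energy(n1, n2, n3, omegas=(1.0, 1.0, 1.0)):
    """Energy (n1 + 1/2) w1 + (n2 + 1/2) w2 + (n3 + 1/2) w3, in units of hbar."""
    return sum((n + 0.5) * w for n, w in zip((n1, n2, n3), omegas))


if __name__ == "__main__":
    for n in range(4):
        states = oscillator_states(n)
        # The number of states equals (n + 1)(n + 2) / 2 (standard counting result).
        print(n, len(states), states)
```

Passing unequal values in `omegas` lifts the degeneracy, in line with the remark that it is the spherical symmetry of the potential that makes these states share one energy.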
FIGURE 6.2 The geometrical relationship between the spherical polar coordinates r, θ, φ and the Cartesian axes x, y, z.
6.3 SEPARATION IN SPHERICAL POLAR COORDINATES
Although there are applications of quantum mechanics in which Cartesian coordinates can be usefully employed, many interesting physical systems, particularly atoms and nuclei, are much more nearly spherical than they are rectangular. Spherically symmetric systems, where the potential $V(r)$ is independent of the direction of $\mathbf{r}$, are usually best treated using spherical polar coordinates $(r, \theta, \phi)$, where $r$ is known as the radial distance, $\theta$ as the polar angle and $\phi$ as the azimuthal angle. These are related to the Cartesian coordinates $(x, y, z)$ by the expressions
$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta \tag{6.18}$$
and the geometrical relationship between the two systems is shown in Figure 6.2. The Schrödinger equation (6.7) can be written in spherical polar coordinates as
$$-\frac{\hbar^2}{2m_e}\left[\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2u}{\partial\phi^2}\right] + V(r)u = Eu \tag{6.19}$$
where the standard expression (derived in many mathematics textbooks) is used to express $\nabla^2u$ in spherical polar coordinates, and we have represented the mass by $m_e$ (a common symbol for electron mass) because $m$ will be used later to represent a quantum number.

We now proceed to separate the variables and do so in two stages. We first put $u(r, \theta, \phi) = R(r)Y(\theta, \phi)$, substitute into (6.19), divide through by $u$ and multiply through by $r^2$ to get
$$\left[-\frac{\hbar^2}{2m_e}\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + r^2V - r^2E\right] + \left[-\frac{\hbar^2}{2m_e}\frac{1}{Y}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial Y}{\partial\theta}\right) - \frac{\hbar^2}{2m_e}\frac{1}{Y}\frac{1}{\sin^2\theta}\frac{\partial^2Y}{\partial\phi^2}\right] = 0 \tag{6.20}$$
The contents of the first square bracket are independent of $\theta$ and $\phi$ and those of the second are independent of $r$, so they must be separately equal to constants, and the sum of the two constants must be equal to zero. We call these constants $-\lambda$ and $\lambda$, and obtain
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left(V + \frac{\lambda}{r^2}\right)R = ER \tag{6.21}$$
and
$$-\frac{\hbar^2}{2m_e}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial Y}{\partial\theta}\right) - \frac{\hbar^2}{2m_e}\frac{1}{\sin^2\theta}\frac{\partial^2Y}{\partial\phi^2} = \lambda Y \tag{6.22}$$
Equation (6.22) does not contain the potential $V$. This means that if we can solve (6.22) for $Y(\theta, \phi)$, the solutions will represent the angular parts of the wave functions for any spherically symmetric potential $V(r)$ and we then "only" have to solve the radial equation (6.21) to get the complete wave function in a particular case. We shall now show how the general solutions to (6.22) are obtained and return to the solution of the radial equation for particular potentials later.

Continuing the separation process, we put $Y(\theta, \phi) = \Theta(\theta)\Phi(\phi)$, substitute into (6.22) and multiply through by $\sin^2\theta/Y$ to get
$$\left[-\frac{\hbar^2}{2m_e}\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) - \lambda\sin^2\theta\right] + \left[-\frac{\hbar^2}{2m_e}\frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2}\right] = 0 \tag{6.23}$$
The contents of the first square bracket are independent of $\phi$ while those of the second are independent of $\theta$, so they must each be equal to a constant and the sum of the two constants must be equal to zero. We call these constants $-w$ and $w$, and get
$$-\frac{\hbar^2}{2m_e}\sin\theta\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) - \lambda\sin^2\theta\,\Theta + w\Theta = 0 \tag{6.24}$$
and
$$\frac{\hbar^2}{2m_e}\frac{d^2\Phi}{d\phi^2} + w\Phi = 0 \tag{6.25}$$
The solution to (6.25) is straightforward, giving Φ = A exp[±i(2me w/}2 )1/2 φ]
(6.26)
where $A$ is a constant. We can now apply the condition that the wave function, and hence $\Phi$, must be single valued,$^1$ which means that
\[
\Phi(\phi + 2\pi) = \Phi(\phi)
\]
Thus
\[
\exp[\pm i(2m_e w/\hbar^2)^{1/2}\,2\pi] = 1
\]
so that
\[
(2m_e w/\hbar^2)^{1/2} = m \tag{6.27}
\]
where $m$ is an integer that can be positive or negative or zero. Substituting back into (6.26), we get
\[
\Phi = (2\pi)^{-1/2}\exp(im\phi) \tag{6.28}
\]
where the factor $(2\pi)^{-1/2}$ is included as a first step to normalizing the wave function; it ensures that
\[
\int_0^{2\pi}|\Phi|^2\,d\phi = 1 \tag{6.29}
\]
We have now completed the solution of one of the three differential equations and obtained one quantum condition (6.27) along with one quantum number, $m$.

Returning to the equation for $\Theta$ (6.24), we can substitute from (6.27) and rearrange to get
\[
\sin\theta\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + (\lambda'\sin^2\theta - m^2)\Theta = 0 \tag{6.30}
\]
where $\lambda' = 2m_e\lambda/\hbar^2$. The solution of this equation is made simpler if we make the substitution $v = \cos\theta$ and write $P(v) \equiv \Theta(\theta)$, leading to
\[
\frac{d}{d\theta} = -\sin\theta\frac{d}{dv} = -(1 - v^2)^{1/2}\frac{d}{dv}
\]
Equation (6.30) then becomes
\[
\frac{d}{dv}\left[(1 - v^2)\frac{dP}{dv}\right] + \left[\lambda' - \frac{m^2}{1 - v^2}\right]P = 0 \tag{6.31}
\]
We first consider the simpler special case where $m$ is equal to zero; Equation (6.31) is then
\[
\frac{d}{dv}\left[(1 - v^2)\frac{dP}{dv}\right] + \lambda' P = 0 \tag{6.32}
\]
The method of series solution that was previously employed in the case of the one-dimensional harmonic oscillator (Section 5.7) can now be applied and we put
\[
P = \sum_{p=0}^{\infty} a_p v^p \tag{6.33}
\]

$^1$ See footnote 2 in Chapter 5.
Hence
\[
\begin{aligned}
\frac{d}{dv}\left[(1 - v^2)\frac{dP}{dv}\right] &= \frac{d}{dv}\sum_{p=0}^{\infty}\left[a_p p v^{p-1} - a_p p v^{p+1}\right] \\
&= \sum_{p=0}^{\infty} a_p p(p-1)v^{p-2} - \sum_{p=0}^{\infty} a_p p(p+1)v^{p} \\
&= \sum_{p'=0}^{\infty} a_{p'+2}(p'+2)(p'+1)v^{p'} - \sum_{p=0}^{\infty} a_p p(p+1)v^{p}
\end{aligned} \tag{6.34}
\]
where $p' = p - 2$; because the terms with $p = 0$ and $p = 1$ in the first summation on the second line are zero, the summation over $p'$ in the last line begins at $p' = 0$. As $p'$ is just an index of summation, we can rewrite it as $p$ and substitute from (6.34) into (6.32)
\[
\sum_{p=0}^{\infty}\left\{a_{p+2}(p+2)(p+1) - a_p[p(p+1) - \lambda']\right\}v^p = 0
\]
This can be true only if the coefficient of each power of $v$ is zero, so we obtain the recurrence relation
\[
\frac{a_{p+2}}{a_p} = \frac{p(p+1) - \lambda'}{(p+1)(p+2)} \to 1 \quad \text{as } p \to \infty \tag{6.35}
\]
Thus, for large $p$ the series (6.33) is identical to the Taylor expansion of the function $(1 - v)^{-1}$, which diverges to infinity at the point $v = 1$, which is not consistent with physical boundary conditions. This can be avoided if the series terminates at some finite value of $p$, say, $p = l$, and if $a_0 = 0$ when $l$ is odd and $a_1 = 0$ when $l$ is even. We therefore obtain the second quantum condition
\[
\lambda' = l(l + 1) \tag{6.36}
\]
where $l$ is an integer which is greater than or equal to zero. Thus $P$, now written as $P_l$, is a polynomial of degree $l$ which contains either only odd powers or only even powers of $v$. These polynomials are known as the Legendre polynomials and their properties are described in many mathematics textbooks. Explicit forms corresponding to particular values of $l$ can be obtained from (6.36) and (6.35); for example
\[
\begin{aligned}
P_0(v) &= 1 \\
P_1(v) &= v \\
P_2(v) &= \tfrac{1}{2}(3v^2 - 1) \\
P_3(v) &= \tfrac{1}{2}(5v^3 - 3v)
\end{aligned} \tag{6.37}
\]
where the values of the constants $a_0$ and $a_1$ have been chosen in accordance with established convention to ensure that $P_l(1) = 1$. It follows that, because $P_l(v)$ contains only even powers of $v$ when $l$ is even and only odd powers when $l$ is odd, $P_l(-1) = (-1)^l$.

The solution of (6.31) in the general case of nonzero values of $m$ is more complicated and the reader is referred to a mathematics textbook for the details. We note that (6.31) is independent of the sign of $m$, so we expect the solutions to be characterized by $l$ and $|m|$ and we write them as $P_l^{|m|}(v)$. (Note: $|m|$ is not a power in this case.) It can be shown that$^1$
\[
P_l^{|m|}(v) = (1 - v^2)^{|m|/2}\frac{d^{|m|}P_l}{dv^{|m|}} \tag{6.38}
\]
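As an illustration of how the recurrence relation (6.35), the quantum condition (6.36) and the definition (6.38) fit together, the following Python sketch generates the coefficients of $P_l$ and evaluates $P_l^{|m|}$. The function names `legendre_coeffs` and `associated_legendre` are our own and purely illustrative; exact rational arithmetic is used so that the coefficients can be compared directly with (6.37).

```python
from fractions import Fraction


def legendre_coeffs(l):
    """Coefficients a_p of P_l(v) = sum_p a_p v**p, built from (6.35)."""
    lam = l * (l + 1)                       # quantum condition (6.36)
    a = [Fraction(0)] * (l + 1)
    a[l % 2] = Fraction(1)                  # seed a_0 (l even) or a_1 (l odd)
    for p in range(l % 2, l - 1, 2):
        # recurrence (6.35): a_{p+2} / a_p = [p(p+1) - lambda'] / [(p+1)(p+2)]
        a[p + 2] = a[p] * Fraction(p * (p + 1) - lam, (p + 1) * (p + 2))
    norm = sum(a)                           # value of the polynomial at v = 1
    return [c / norm for c in a]            # conventional choice P_l(1) = 1


def associated_legendre(l, m, v):
    """Evaluate P_l^{|m|}(v) from (6.38) for a real v with |v| <= 1."""
    coeffs = legendre_coeffs(l)
    for _ in range(abs(m)):                 # differentiate |m| times
        coeffs = [p * c for p, c in enumerate(coeffs)][1:]
    poly = sum(float(c) * v ** p for p, c in enumerate(coeffs))
    return (1 - v * v) ** (abs(m) / 2) * poly
```

For example, `legendre_coeffs(2)` returns the coefficients of $\tfrac{1}{2}(3v^2 - 1)$, and `associated_legendre(l, m, v)` returns zero whenever $|m| > l$, because the polynomial has by then been differentiated away.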
We can use (6.38) to obtain a condition restricting the allowed values of $m$. $P_l$ is a polynomial of degree $l$ so its $|m|$th derivative, and hence $P_l^{|m|}$, will be zero if $|m|$ is greater than $l$. But if $P_l^{|m|}$ is zero, the whole wave function must be zero over all space, and this is physically unrealistic. We therefore have the condition
\[
-l \le m \le l \tag{6.39}
\]
We have now solved the differential equations in $\theta$ and $\phi$ so we can combine the solutions to obtain expressions for the angular part of the wave function, which we now write as $Y_{lm}(\theta, \phi)$, the suffixes $l$ and $m$ emphasizing the importance of these quantum numbers in characterizing the functions. We have
\[
Y_{lm}(\theta, \phi) = (-1)^m\left[\frac{(2l + 1)(l - |m|)!}{4\pi(l + |m|)!}\right]^{1/2} P_l^{|m|}(\cos\theta)\exp(im\phi) \qquad m > 0 \tag{6.40}
\]
where it can be shown that the factor in square brackets ensures normalization of the function when it is integrated over all solid angles; that is
\[
\int_0^{2\pi}\!\int_0^{\pi}|Y_{lm}(\theta, \phi)|^2\sin\theta\,d\theta\,d\phi = 1 \tag{6.41}
\]
The phase factors $(-1)^m$ in (6.40) are arbitrary, but chosen in accordance with established convention. The functions $Y_{lm}$ are known as spherical harmonics and the reader is once again referred to an appropriate mathematics textbook for a discussion of their properties and a derivation of the form of the normalizing constant. Explicit expressions for the spherical harmonics with $l$ less than or equal to two are given below and their shapes are illustrated in Figure 6.3.
\[
\begin{aligned}
Y_{00} &= \left(\frac{1}{4\pi}\right)^{1/2} \\
Y_{10} &= \left(\frac{3}{4\pi}\right)^{1/2}\cos\theta \\
Y_{1\pm1} &= \mp\left(\frac{3}{8\pi}\right)^{1/2}\sin\theta\exp(\pm i\phi) \\
Y_{20} &= \left(\frac{5}{16\pi}\right)^{1/2}(3\cos^2\theta - 1) \\
Y_{2\pm1} &= \mp\left(\frac{15}{8\pi}\right)^{1/2}\cos\theta\sin\theta\exp(\pm i\phi) \\
Y_{2\pm2} &= \left(\frac{15}{32\pi}\right)^{1/2}\sin^2\theta\exp(\pm 2i\phi)
\end{aligned} \tag{6.42}
\]

$^1$ The mathematically inclined reader can verify this result by differentiating (6.32) $|m|$ times, using Leibniz's expression for the $n$th derivative of a product. The resulting expression is then equivalent to (6.31), provided (6.38) holds.
A notable feature of Figure 6.3 is that the wave functions have a particular orientation in space even though the potential is spherically symmetric and the direction of the $z$ axis (sometimes known as the axis of quantization) is therefore arbitrary. This apparent contradiction results from degeneracy and can be resolved in the same way as in the similar case of a particle in a square box discussed in Chapter 5. We first note that $m$ does not enter Equation (6.21), which determines the energy levels of the system, so there are always $2l + 1$ degenerate states that differ only in their values of $m$. If we measure the energy of such a system, we shall not be able to tell which of these wave functions is appropriate and we must therefore average their squared moduli in order to calculate the position probability distribution. The apparently angularly dependent part of this quantity will therefore be given by
\[
(2l + 1)^{-1}\sum_{m=-l}^{l}|Y_{lm}(\theta, \phi)|^2
\]
It is one of the standard properties of the spherical harmonics that this quantity is spherically symmetric (as can be readily verified in the cases where $l = 0$, 1 and 2 by substituting the expressions given in Equation (6.42)), so we once again see that the predictions of quantum mechanics concerning physically measurable quantities are consistent with what would be expected from the symmetry of the problem.

The physical significance of the quantum numbers $l$ and $m$ will be discussed in detail in Chapter 8. It will turn out that $l$ and $m$ are associated with the quantization of the angular momentum of a particle in a central field: the square of the angular momentum has the value $l(l + 1)\hbar^2$ and the $z$ component of angular momentum has the value $m\hbar$.
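The spherical symmetry of the averaged probability distribution can be checked numerically. The sketch below transcribes the expressions (6.42) into Python and evaluates $(2l + 1)^{-1}\sum_m |Y_{lm}|^2$ at arbitrary angles; the function names `spherical_harmonics` and `averaged_density` are illustrative choices of ours, not standard library routines.

```python
import cmath
import math


def spherical_harmonics(l, theta, phi):
    """Return {m: Y_lm(theta, phi)} for l = 0, 1, 2, transcribed from (6.42)."""
    s, c = math.sin(theta), math.cos(theta)

    def e(k):
        return cmath.exp(1j * k * phi)      # azimuthal factor exp(i k phi)

    if l == 0:
        return {0: complex(math.sqrt(1 / (4 * math.pi)))}
    if l == 1:
        return {0: math.sqrt(3 / (4 * math.pi)) * c,
                1: -math.sqrt(3 / (8 * math.pi)) * s * e(1),
                -1: math.sqrt(3 / (8 * math.pi)) * s * e(-1)}
    if l == 2:
        return {0: math.sqrt(5 / (16 * math.pi)) * (3 * c * c - 1),
                1: -math.sqrt(15 / (8 * math.pi)) * c * s * e(1),
                -1: math.sqrt(15 / (8 * math.pi)) * c * s * e(-1),
                2: math.sqrt(15 / (32 * math.pi)) * s * s * e(2),
                -2: math.sqrt(15 / (32 * math.pi)) * s * s * e(-2)}
    raise ValueError("only l <= 2 is tabulated in (6.42)")


def averaged_density(l, theta, phi):
    """(2l + 1)^-1 times the sum over m of |Y_lm|^2."""
    ylm = spherical_harmonics(l, theta, phi)
    return sum(abs(y) ** 2 for y in ylm.values()) / (2 * l + 1)
```

For each of $l = 0$, 1 and 2 the function `averaged_density` returns $1/4\pi$ whatever the values of $\theta$ and $\phi$, which is the statement that the averaged distribution carries no preferred direction.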
FIGURE 6.3 Representations of the shapes of the spherical harmonics with quantum numbers $l$, $m$, where $l \le 2$ and the $z$ axis is vertical. In the case where $m = 0$, the dark and light regions are of opposite sign; when $m \ne 0$, the function is complex and its phase changes by $2m\pi$ during a complete circuit of the $z$ axis.
The radial equation

We now turn our attention to the radial equation (6.21), which determines the energy levels of the system. Substituting the expression for $\lambda'$ obtained from the angular solution (6.36) and remembering that $\lambda' = 2m_e\lambda/\hbar^2$, we get
\[
-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left[V(r) + \frac{l(l + 1)\hbar^2}{2m_e r^2}\right]R = ER \tag{6.43}
\]
This can be slightly simplified by making the substitution $\chi(r) = rR(r)$, which gives
\[
-\frac{\hbar^2}{2m_e}\frac{d^2\chi}{dr^2} + \left[V(r) + \frac{l(l + 1)\hbar^2}{2m_e r^2}\right]\chi = E\chi \tag{6.44}
\]
Apart from the second term within the square brackets, Equation (6.44) is identical in form to the one-dimensional Schrödinger equation. However, an additional boundary condition applies in this case: $\chi$ must equal zero at $r = 0$, otherwise $R = r^{-1}\chi$ would be infinite at that point.$^1$ As well as being mathematically convenient, the function $\chi(r)$ has a physical interpretation in that $|\chi|^2\,dr$ is the probability of finding the electron at a distance between $r$ and $r + dr$ from the origin averaged over all directions. This follows from the fact that this probability is obtained by integrating $|\psi(r, \theta, \phi)|^2$ over a spherical shell of radius $r$ and thickness $dr$. That is, it is given by
\[
|R^2(r)|\,r^2\,dr\int_0^{2\pi}\!\int_0^{\pi}|Y(\theta, \phi)|^2\sin\theta\,d\theta\,d\phi = |\chi^2(r)|\,dr
\]
using (6.41). To progress further with the solution of the radial equation, the form of the potential $V(r)$ must be known, and in the next section we shall consider the particular example of the hydrogenic atom.

Worked Example 6.2 The spherical infinite square well. Find expressions for the energy levels and wave functions if the potential is spherically symmetric and
\[
V(r) = 0, \quad r \le a; \qquad V(r) = \infty, \quad r > a
\]
Assume $l = 0$ and discuss qualitatively the case of other values of $l$.

Solution The potential is spherically symmetric, so $u(r, \theta, \phi) = (\chi(r)/r)\,Y_{l,m}(\theta, \phi)$, where $\chi(r) = 0$ if $r > a$, and (6.44) becomes, with $l = 0$
\[
-\frac{\hbar^2}{2m_e}\frac{d^2\chi}{dr^2} = E\chi, \qquad r \le a
\]
The general solution of this equation has the form
\[
\chi = A\cos(kr) + B\sin(kr)
\]
$R$ must be finite at $r = 0$, so $\chi(0) = 0$ and $A = 0$; also, $R = 0$ at $r = a$, so $\sin(ka) = 0$ and $k = n\pi/a$, where $n$ is a positive integer. Hence
\[
\chi = B\sin(n\pi r/a)
\]
Substituting this solution into the differential equation leads to an expression for the energy levels
\[
E_n = \frac{\hbar^2 k^2}{2m_e} = \frac{\hbar^2\pi^2}{2m_e a^2}\,n^2
\]
If $l > 0$, an additional positive term appears on the left-hand side of the differential equation, which implies that energy levels with $l > 0$ must be higher than the corresponding levels where $l = 0$.

$^1$ The observant reader will have noticed that if $R \sim r^{-1}$ then $\int_0^r |R|^2 r^2\,dr$ will be finite and therefore the boundary condition set out in Chapter 5 is not breached. It can be shown that the condition $\chi = 0$ at $r = 0$ follows from the requirement that the solutions of the Schrödinger equation expressed in spherical polar coordinates must also be solutions when the equation is written in Cartesian coordinates. Further details on this point can be found in P. A. M. Dirac, The Principles of Quantum Mechanics, Oxford, 1974, Chapter 9.
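To put numbers to this example, the sketch below evaluates the $l = 0$ levels and illustrates the qualitative statement about $l > 0$ for the particular case $l = 1$. It rests on two assumptions that go beyond the worked example: the boundary condition $\chi(a) = 0$ is taken to mean $ka = n\pi$, and for $l = 1$ we use the standard result that the solution of (6.44) with $V = 0$ that vanishes at the origin is proportional to $\sin(kr)/(kr) - \cos(kr)$. All function names are hypothetical, and the constants are those quoted later in Section 6.4.

```python
import math

HBAR = 1.054571596e-34    # J s
M_E = 9.10938188e-31      # kg
EV = 1.602176462e-19      # joules per electron volt


def s_wave_energy(n, a, mass=M_E):
    """E_n for l = 0: chi = B sin(kr) with chi(a) = 0, so that ka = n pi."""
    k = n * math.pi / a
    return (HBAR * k) ** 2 / (2 * mass)


def chi_p_wave(x):
    """Regular l = 1 solution of (6.44) with V = 0, as a function of x = kr."""
    return math.sin(x) / x - math.cos(x)


def first_p_wave_root(lo=math.pi, hi=1.5 * math.pi, tol=1e-12):
    """Bisection for the smallest x > 0 at which chi_p_wave vanishes."""
    f_lo = chi_p_wave(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = chi_p_wave(mid)
        if f_lo * f_mid <= 0:
            hi = mid                        # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid           # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


def lowest_p_wave_energy(a, mass=M_E):
    """Lowest l = 1 level of the spherical infinite well of radius a."""
    k = first_p_wave_root() / a
    return (HBAR * k) ** 2 / (2 * mass)
```

For an electron in a well of radius 1 nm, `s_wave_energy(1, 1e-9) / EV` is a little under 0.4 eV, and `lowest_p_wave_energy(1e-9)` is larger, as the qualitative argument requires, because the first root of `chi_p_wave` lies above $\pi$.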
Worked Example 6.3 The isotropic harmonic oscillator. The three-dimensional oscillator was discussed in Worked Example 6.1: in the isotropic case, $K_1 = K_2 = K_3 \equiv K$ and the potential is $V(r) = \tfrac{1}{2}Kr^2$. Using results from this example show that the energy levels in the isotropic case can be written as $E_n = (n + \tfrac{1}{2})\hbar\omega$, where $n \ge 1$. Using the expressions for the Hermite polynomials given in Chapter 5, express the wave functions of the lowest energy state and one of the degenerate first excited states in spherical polar coordinates and verify that these are solutions of (6.44).

Solution From Worked Example 6.1
\[
\begin{aligned}
E_{n_1 n_2 n_3} &= (n_1 + \tfrac{1}{2})\hbar\omega_1 + (n_2 + \tfrac{1}{2})\hbar\omega_2 + (n_3 + \tfrac{1}{2})\hbar\omega_3 \\
&= (n + \tfrac{1}{2})\hbar\omega
\end{aligned}
\]
if $\omega_1 = \omega_2 = \omega_3 \equiv \omega$ and $n = n_1 + n_2 + n_3 + 1$ is an integer $\ge 1$, because the $n_i$ are integers $\ge 0$. Also
\[
\begin{aligned}
u_{n_1 n_2 n_3} &= H_{n_1}(x')H_{n_2}(y')H_{n_3}(z')\exp\left[-\tfrac{1}{2}(x'^2 + y'^2 + z'^2)\right] \\
&= H_{n_1}(r'\sin\theta\cos\phi)H_{n_2}(r'\sin\theta\sin\phi)H_{n_3}(r'\cos\theta)\exp\left(-\tfrac{1}{2}r'^2\right)
\end{aligned}
\]
where $r' = (m\omega/\hbar)^{1/2}r$. The lowest energy state has $n_1 = n_2 = n_3 = 0$. One of the first excited states has $n_1 = n_2 = 0$, $n_3 = 1$. Thus, referring to (5.68), but ignoring normalizing factors
\[
u_{0,0,0} = \exp(-m\omega r^2/2\hbar) \qquad u_{0,0,1} = r\cos\theta\exp(-m\omega r^2/2\hbar)
\]
Referring to (6.42), we see that $u_{0,0,0}$ is a state with $l = m = 0$, while $u_{0,0,1}$ has $l = 1$ and $m = 0$. We therefore have
\[
\chi_{0,0,0} = r\exp(-m\omega r^2/2\hbar) \qquad \chi_{0,0,1} = r^2\exp(-m\omega r^2/2\hbar)
\]
In the case of the ground state, (6.44) becomes (putting $\gamma = m\omega/\hbar$ so that $\hbar^2/2m = m\omega^2/2\gamma^2$)
\[
-(m\omega^2/2\gamma^2)[-\gamma r - 2\gamma r + \gamma^2 r^3]\exp(-\gamma r^2/2) + \tfrac{1}{2}m\omega^2 r^3\exp(-\gamma r^2/2) = Er\exp(-\gamma r^2/2)
\]
The third and fourth terms on the left-hand side cancel, so the equation is satisfied if $E = 3\hbar\omega/2$ as expected. In the case of the first excited state, (6.44) becomes
\[
-(m\omega^2/2\gamma^2)[2 - 3\gamma r^2 - 2\gamma r^2 + \gamma^2 r^4]\exp(-\gamma r^2/2) + \tfrac{1}{2}m\omega^2 r^4\exp(-\gamma r^2/2) + (m\omega^2/2\gamma^2) \times 1 \times 2\exp(-\gamma r^2/2) = Er^2\exp(-\gamma r^2/2)
\]
The first and last terms on the left-hand side cancel, as do the fourth and fifth, so the equation is satisfied if $E = 5\hbar\omega/2$, as expected.
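The verification in this example can also be carried out numerically. The sketch below works in units where $\hbar = m = \omega = 1$ (so that $\gamma = 1$), approximates $d^2\chi/dr^2$ by a central difference, and evaluates the ratio of the left-hand side of (6.44) to $\chi$, which should equal $E$ at every $r$ if $\chi$ is a solution. The name `local_energy` and the step size `h` are illustrative choices, not part of the worked example.

```python
import math


def chi_ground(r):
    """chi_{0,0,0} = r exp(-r^2 / 2), in units hbar = m = omega = 1."""
    return r * math.exp(-r ** 2 / 2)


def chi_excited(r):
    """chi_{0,0,1} = r^2 exp(-r^2 / 2), in the same units."""
    return r ** 2 * math.exp(-r ** 2 / 2)


def local_energy(chi, l, r, h=1e-4):
    """Left-hand side of (6.44) divided by chi, for V = r^2 / 2."""
    d2 = (chi(r + h) - 2 * chi(r) + chi(r - h)) / h ** 2   # central difference
    v_eff = 0.5 * r ** 2 + l * (l + 1) / (2 * r ** 2)      # V plus centrifugal term
    return (-0.5 * d2 + v_eff * chi(r)) / chi(r)
```

Evaluating `local_energy(chi_ground, 0, r)` and `local_energy(chi_excited, 1, r)` at several values of `r` gives results close to 1.5 and 2.5 respectively (in units of $\hbar\omega$), independent of `r`, in line with the analytic cancellation described above.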
6.4 THE HYDROGENIC ATOM

We are now ready to apply quantum theory to the real physical situation of an electron moving under the influence of a positively charged nucleus. If this nucleus consists of a single proton, the system is a hydrogen atom, but the theory is also applicable to the more general case of an atom with atomic number $Z$ (and hence nuclear charge $Ze$) with all but one of its electrons removed (for example, He$^+$, Li$^{2+}$, etc.). In general, such a system is described as a hydrogenic atom. The potential energy of interaction between the electron and the nucleus is $-Ze^2/4\pi\epsilon_0 r$. In principle, both the electron and the nucleus will move under the influence of their mutual attraction; we shall show in Chapter 13 that this can be allowed for in exactly the same way as in classical mechanics by taking $r$ to be the distance between the nucleus and the electron, and $\mu$ to be the reduced mass of the nucleus (mass $m_N$) and the electron (mass $m_e$). That is
\[
\mu = m_N m_e/(m_N + m_e) \tag{6.45}
\]
We now consider the motion of a single particle with this reduced mass moving in the given potential. Because the mass of the electron is much smaller than that of the nucleus, $\mu$ is very nearly equal to $m_e$ and the effect of nuclear motion is small. However, conventionally, when comparison is made with experimental quantities (such as the Rydberg constant introduced in Chapter 4) the experimental values are first adjusted to compensate for the effect of nuclear motion. The radial equation (6.44) for the hydrogenic atom is therefore
\[
-\frac{\hbar^2}{2\mu}\frac{d^2\chi}{dr^2} + \left[-\frac{Ze^2}{4\pi\epsilon_0 r} + \frac{l(l + 1)\hbar^2}{2\mu r^2}\right]\chi = E\chi \tag{6.46}
\]
The solution of Equation (6.46) will again involve considerable manipulation, which is simplified by making a suitable substitution. We define a new variable $\rho$ so that
\[
\rho = (-8\mu E/\hbar^2)^{1/2}\,r \tag{6.47}
\]
(note that $E$ is negative for bound states as the potential is zero when $r$ is infinite) and hence
\[
\frac{d^2\chi}{dr^2} = -\frac{8\mu E}{\hbar^2}\frac{d^2\chi}{d\rho^2} \tag{6.48}
\]
Equation (6.46) now becomes
\[
\frac{d^2\chi}{d\rho^2} - \frac{l(l + 1)}{\rho^2}\chi + \left(\frac{\beta}{\rho} - \frac{1}{4}\right)\chi = 0 \tag{6.49}
\]
where the constant $\beta$ is defined as
\[
\beta = \left(-\frac{\mu}{2E}\right)^{1/2}\frac{Ze^2}{4\pi\epsilon_0\hbar} \tag{6.50}
\]
We first consider the solution to (6.49) in the case of very large $\rho$ when the equation becomes
\[
\frac{d^2\chi}{d\rho^2} - \frac{1}{4}\chi = 0 \tag{6.51}
\]
leading to
\[
\chi \sim \exp(-\rho/2) \tag{6.52}
\]
(where we have rejected a possible solution with positive exponent because it diverges to infinity at large $\rho$). This suggests that we try
\[
\chi = F(\rho)\exp(-\rho/2) \tag{6.53}
\]
as a solution to (6.49). On substitution we get
\[
\frac{d^2F}{d\rho^2} - \frac{dF}{d\rho} - \frac{l(l + 1)}{\rho^2}F + \frac{\beta}{\rho}F = 0 \tag{6.54}
\]
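We now look for a series solution to (6.54) and put
\[
F = \sum_{p=1}^{\infty} a_p\rho^p \tag{6.55}
\]
The lower limit of this summation is $p = 1$ rather than $p = 0$, otherwise $F$ and, therefore, $\chi$ would not be zero at $\rho = 0$. Thus
\[
\frac{dF}{d\rho} = \sum_{p=1}^{\infty} p\,a_p\rho^{p-1} \tag{6.56}
\]
and
\[
\begin{aligned}
\frac{d^2F}{d\rho^2} &= \sum_{p=1}^{\infty} p(p - 1)a_p\rho^{p-2} \\
&= \sum_{p=1}^{\infty}(p + 1)p\,a_{p+1}\rho^{p-1}
\end{aligned} \tag{6.57}
\]
Also, after redefining $p$ as $p + 1$
\[
\begin{aligned}
F/\rho^2 &= \sum_{p=1}^{\infty} a_p\rho^{p-2} \\
&= a_1\rho^{-1} + \sum_{p=1}^{\infty} a_{p+1}\rho^{p-1}
\end{aligned} \tag{6.58}
\]
Substituting from Equations (6.55) to (6.58) into (6.54) we get
\[
-l(l + 1)a_1\rho^{-1} + \sum_{p=1}^{\infty}\left[(p + 1)p\,a_{p+1} - p\,a_p - l(l + 1)a_{p+1} + \beta a_p\right]\rho^{p-1} = 0
\]
The coefficient of each power of $\rho$ must vanish so we have
\[
a_1 = 0 \quad \text{unless } l = 0 \tag{6.59}
\]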
and
\[
\frac{a_{p+1}}{a_p} = \frac{p - \beta}{p(p + 1) - l(l + 1)} \tag{6.60}
\]
\[
\to p^{-1} \quad \text{as } p \to \infty \tag{6.61}
\]
We first note that the denominator on the right-hand side of (6.60) is zero if $p = l$. This implies that $a_{l+1}$ (and, by implication, all other $a_p$ where $p$ is greater than $l$) must be infinite unless $a_l$ is zero. But if $a_l$ equals zero, it follows from (6.60) that $a_{l-1}$, $a_{l-2}$, etc., must also equal zero. We conclude, therefore, that all $a_p$ with $p$ less than or equal to $l$ must be zero if the solution is to represent a physically realistic wave function. We also see that (6.61) is identical to the recurrence relation for the terms in the series expansion of $\exp(\rho)$ and so $\chi$, which equals $F\exp(-\rho/2)$, will diverge like $\exp(\rho/2)$ as $\rho$ tends to infinity. However, just as in the case of solutions to the harmonic oscillator and Legendre-polynomial equations, this divergence can be prevented by ensuring that the series terminates after a finite number of terms. For this to occur at the term $p = n$ we must have
\[
\beta = n > l \tag{6.62}
\]
and hence, using (6.50)
\[
E \equiv E_n = -\frac{\mu Z^2 e^4}{2(4\pi\epsilon_0)^2\hbar^2 n^2} \tag{6.63}
\]
We have thus derived expressions for the discrete energy levels of the hydrogenic atom in terms of the mass of the electron, the nuclear charge and the fundamental constants $e$, $\hbar$ and $\epsilon_0$. It should be noted that the energy levels (6.63) are not only independent of $m$, as would be expected from the earlier discussion, but are also independent of $l$. This additional degeneracy is a particular feature of the Coulomb potential and is not a general property of spherically symmetric systems.

It is now an acid test of the theory developed so far that we compare these energy levels with those experimentally measured from observations of atomic spectra. We saw in Chapter 4 that the line spectra of hydrogen could be accounted for if the hydrogen atom were assumed to have a set of energy levels given by
\[
E_n = -2\pi\hbar c R_\infty/n^2 \tag{6.64}
\]
where $n$ is a positive integer and $R_\infty$ is the Rydberg constant. Comparison of (6.63) and (6.64) shows at once that these have the same form so that there is at least qualitative agreement between theory and experiment. Quantitative comparison is made using the measured values of the fundamental constants
\[
\begin{aligned}
m_e &= 9.109\,381\,88 \times 10^{-31}\ \text{kg} \\
\epsilon_0 &= 8.854\,187\,817 \times 10^{-12}\ \text{F m}^{-1} \\
\hbar &= 1.054\,571\,596 \times 10^{-34}\ \text{J s} \\
e &= 1.602\,176\,462 \times 10^{-19}\ \text{C} \\
c &= 2.997\,924\,58 \times 10^{8}\ \text{m s}^{-1}
\end{aligned}
\]
These lead to a value for $R_\infty$ of 10 973 731.6 m$^{-1}$, which is within one part in $10^{10}$ of the accepted best value.$^1$ Moreover, the tiny discrepancy is less than the estimated errors on the measurements of the relevant quantities. Similar agreement is obtained for other hydrogenic atoms when the appropriate values of the nuclear charge are substituted into Equation (6.63). These results therefore represent an important test of quantum-mechanical theory, which it has passed with flying colours.$^2$ Our belief in quantum mechanics does not of course rest on this result alone: indeed an expression identical to (6.63) was derived by Niels Bohr using an earlier theory, which was subsequently shown to be incorrect when applied to more complex systems. However, although we shall compare the results of calculation and experiment on a number of other occasions when we shall always find agreement within the limits of experimental error, there are very few examples of physical quantities whose values can be both measured experimentally and calculated theoretically to such high precision.
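The quantitative comparison described above amounts to a short calculation, sketched here in Python. Equating (6.63) and (6.64) with $Z = 1$, $n = 1$ and $\mu = m_e$ gives the Rydberg constant as $-E_1/2\pi\hbar c$. The constants are those listed above; the function names `energy_level` and `rydberg_constant` are our own labels, and ordinary double-precision arithmetic is assumed to be adequate for a check of this kind.

```python
import math

M_E = 9.10938188e-31        # kg
EPS0 = 8.854187817e-12      # F m^-1
HBAR = 1.054571596e-34      # J s
E_CHARGE = 1.602176462e-19  # C
C = 2.99792458e8            # m s^-1


def energy_level(n, Z=1, mu=M_E):
    """E_n from (6.63), in joules."""
    return -mu * Z ** 2 * E_CHARGE ** 4 / (
        2 * (4 * math.pi * EPS0) ** 2 * HBAR ** 2 * n ** 2)


def rydberg_constant(mu=M_E):
    """Rydberg constant in m^-1, from comparing (6.63) with (6.64)."""
    return -energy_level(1, mu=mu) / (2 * math.pi * HBAR * C)
```

Calling `rydberg_constant()` should give a value close to the one quoted in the text, and `energy_level(1) / E_CHARGE` gives the familiar ground-state energy of about $-13.6$ eV.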
$^1$ This value has been adjusted to remove the effects of nuclear motion as discussed earlier as well as relativistic corrections (see Chapter 14).

$^2$ Of course the quantum theory of atomic spectra is now so well established that formulae such as (6.63) are themselves used in determining the best values of the fundamental constants, but the fact that a wide variety of experimental data can be successfully and consistently used in this way is itself a confirmation of the theory.

The hydrogenic atom wave functions

We now complete our consideration of the hydrogenic atom by discussing the form of the wave functions associated with the different energy levels. We previously saw that the radial part of the wave function is consistent with the boundary conditions only if the series (6.55) for $F$ starts at the term $p = l + 1$ and terminates at $p = n$. We thus have
\[
F_n(\rho) = \sum_{p=l+1}^{n} a_p\rho^p \tag{6.65}
\]
where the coefficients $a_p$ can be expressed in terms of $a_{l+1}$ using the recurrence relation (6.60) with $\beta = n$. The results are known as the Laguerre polynomials. We can then use (6.53) and the definition of $\rho$ in terms of $r$ to obtain $\chi(r)$ and hence $R(r)$. This can be combined with the appropriate spherical harmonic to produce an expression for the complete time-independent part of the wave function, $u(r, \theta, \phi)$, which will be normalized if the spherical harmonic has been normalized in accordance with (6.41) and the constant $a_{l+1}$ has been chosen so that
\[
\int_0^{\infty}|R|^2 r^2\,dr = 1 \tag{6.66}
\]
Formally, then, we have
\[
u_{nlm} = R_{nl}(r)Y_{lm}(\theta, \phi) \tag{6.67}
\]
where the suffixes indicate the dependence of the various functions on the quantum numbers $n$, $l$, and $m$. The wave functions corresponding to the five states of lowest energy as determined in this way are
\[
\begin{aligned}
u_{100} &= (Z^3/\pi a_0^3)^{1/2}\exp(-Zr/a_0) \\
u_{200} &= (Z^3/8\pi a_0^3)^{1/2}(1 - Zr/2a_0)\exp(-Zr/2a_0) \\
u_{210} &= (Z^3/32\pi a_0^3)^{1/2}(Zr/a_0)\cos\theta\exp(-Zr/2a_0) \\
u_{21\pm1} &= \mp(Z^3/\pi a_0^3)^{1/2}(Zr/8a_0)\sin\theta\exp(\pm i\phi)\exp(-Zr/2a_0)
\end{aligned} \tag{6.68}
\]
where the constant $a_0$ is defined as
\[
a_0 = 4\pi\epsilon_0\hbar^2/\mu e^2 = 0.529\,176\,6 \times 10^{-10}\ \text{m} \tag{6.69}
\]
and is known as the Bohr radius.

FIGURE 6.4 The radial parts, $R_{n,l}$, of the wave functions corresponding to some of the energy states of the hydrogen atom (a). The corresponding radial probability distributions, $|\chi_{n,l}|^2 = r^2|R_{n,l}|^2$, are displayed in (b).
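The construction of the radial functions from the recurrence relation (6.60) can be sketched in a few lines of Python. The code below sets $a_{l+1} = 1$ (so the result is not normalized) and uses the relation $\rho = 2Zr/na_0$, which follows from combining (6.47), (6.63) and (6.69); that relation, like the function names `radial_series` and `radial_R`, is introduced here for illustration rather than taken from the text.

```python
import math
from fractions import Fraction


def radial_series(n, l):
    """Coefficients {p: a_p} of F_n(rho) in (6.65), with a_{l+1} set to 1."""
    if not 0 <= l < n:
        raise ValueError("need 0 <= l < n, see (6.62)")
    a = {l + 1: Fraction(1)}
    for p in range(l + 1, n):
        # recurrence (6.60) with beta = n; the series stops at p = n
        a[p + 1] = a[p] * Fraction(p - n, p * (p + 1) - l * (l + 1))
    return a


def radial_R(n, l, r, Z=1, a0=1.0):
    """Unnormalized R(r) = F(rho) exp(-rho / 2) / r, with rho = 2 Z r / (n a0)."""
    rho = 2 * Z * r / (n * a0)
    F = sum(float(c) * rho ** p for p, c in radial_series(n, l).items())
    return F * math.exp(-rho / 2) / r
```

For $n = 2$, $l = 0$ the series is $\rho - \tfrac{1}{2}\rho^2$, so that with $Z = 1$ the radial function is proportional to $(1 - r/2a_0)\exp(-r/2a_0)$, in agreement with the radial dependence of $u_{200}$ in (6.68), including its node at $r = 2a_0$.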
The value of the azimuthal quantum number $l$ is often denoted by a particular letter code: states with $l = 0$, 1, 2, and 3 are labelled s, p, d, and f respectively. This letter is sometimes prefixed by a number equal to the quantum number $n$, so that the first state in (6.68) is known as the 1s state, the second is 2s and the others are 2p states.

The radial parts of the wave functions (6.68) are plotted as functions of $r$ in Figure 6.4 for the case of the hydrogen atom where $Z = 1$. We see that the constant $a_0$ characterizes the width of the wave function of the lowest energy state and that this width increases for states of higher energy. We can combine (6.69) and (6.63) to express the energy levels in terms of $a_0$
\[
E_n = -\frac{Z^2 e^2}{2(4\pi\epsilon_0)a_0 n^2} \tag{6.70}
\]
As the potential energy is given by $V = -Ze^2/4\pi\epsilon_0 r$, an electron with the above total energy would have positive kinetic energy only for values of $r$ less than $2n^2 a_0/Z$. As we see in Figure 6.4, the exponential tails of the wave functions penetrate the classically forbidden region where $r$ is greater than this in a manner very similar to that discussed in the one-dimensional cases in Chapter 5.

Figure 6.4 also shows $|\chi^2(r)| = r^2|R^2(r)|$ for each state as a function of $r$. As we pointed out earlier, $|\chi^2(r)|\,dr$ equals the probability that the electron be found at a distance between $r$ and $r + dr$ from the origin (in any direction). We see that this probability reaches a maximum at $r = a_0$ in the case of the ground-state wave function. We particularly note that in all cases $|\chi|^2$ equals zero at the origin, even though the square of the wave function can have its largest value at that point. The reader should think carefully about this apparent contradiction and how it can be resolved by understanding the different nature of the two probability distributions represented by $|u|^2$ and $|\chi|^2$.

Worked Example 6.4 The positron is a particle with charge $+e$ and mass $m_e$. An electron and a positron can come together to form an atom of "positronium." Find an expression for the energy levels of positronium and compare the size of a positronium atom with that of an atom of hydrogen.

Solution The electrostatic potential is the same in positronium as in hydrogen, but the reduced mass is now $m_e/2$. The energy levels are therefore given by
\[
E_n = -\frac{m_e Z^2 e^4}{4(4\pi\epsilon_0)^2\hbar^2 n^2}
\]
The equivalent of the Bohr radius is
\[
8\pi\epsilon_0\hbar^2/m_e e^2
\]
So, to a good approximation, the size of a positronium "atom" is twice that of the hydrogen atom.
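The comparison between positronium and hydrogen can be made concrete with the constants quoted earlier in this section. The sketch below evaluates the reduced mass (6.45), the ground-state energy from (6.63) and the length scale (6.69) for both systems. The proton mass `M_P` is not given in the text and is an assumed value supplied only for this illustration; the function names are likewise our own.

```python
import math

M_E = 9.10938188e-31        # kg, electron (and positron) mass
M_P = 1.67262158e-27        # kg, proton mass: assumed value, not from the text
EPS0 = 8.854187817e-12      # F m^-1
HBAR = 1.054571596e-34      # J s
E_CHARGE = 1.602176462e-19  # C


def reduced_mass(m1, m2):
    """Reduced mass of two bodies, as in (6.45)."""
    return m1 * m2 / (m1 + m2)


def ground_state_energy(mu, Z=1):
    """E_1 from (6.63), in joules."""
    return -mu * Z ** 2 * E_CHARGE ** 4 / (
        2 * (4 * math.pi * EPS0) ** 2 * HBAR ** 2)


def bohr_like_radius(mu):
    """Length scale 4 pi eps0 hbar^2 / (mu e^2), compare (6.69)."""
    return 4 * math.pi * EPS0 * HBAR ** 2 / (mu * E_CHARGE ** 2)


mu_ps = reduced_mass(M_E, M_E)       # positronium
mu_h = reduced_mass(M_E, M_P)        # hydrogen
size_ratio = bohr_like_radius(mu_ps) / bohr_like_radius(mu_h)
energy_ratio = ground_state_energy(mu_ps) / ground_state_energy(mu_h)
```

With these inputs `size_ratio` comes out very slightly below 2 and `energy_ratio` very slightly above one half, the small departures being due to the finite proton mass; the positronium ground state lies at roughly $-6.8$ eV.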
6.5 PROBLEMS

6.1 Generalize the argument in Section 6.2 to show that, whatever the detailed form of the potential, if it has square symmetry, the wave functions either have this symmetry or form degenerate pairs. Show also that the position probability distribution always has square symmetry.

6.2 What are the possible degeneracies and what is the symmetry of the position probability distribution in the case of a particle subject to a three-dimensional potential with cubic symmetry?

6.3 A particle moves in two dimensions in a circularly symmetric potential. Show that the time-independent Schrödinger equation can be separated in plane polar coordinates and that the angular part of the wave function has the form $(2\pi)^{-1/2}\exp(im\phi)$ where $m$ is an integer. What is the symmetry of the position probability distribution in this case? [In plane-polar coordinates, $\nabla^2 = \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2}{\partial\phi^2}$]

6.4 Consider a circularly symmetric two-dimensional system similar to that described in Problem 6.3, where the potential is zero for all values of $r$ less than $a$ and infinite otherwise. Show that the radial part $R(r)$ of the wave function must satisfy the equation
\[
\frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(1 - \frac{m^2}{\rho^2}\right)R = 0
\]
where $\rho = (2\mu E/\hbar^2)^{1/2}r$. In the case where $m = 0$ show that $R = \sum_{k=0}^{\infty} A_k\rho^k$ where $A_k = 0$ if $k$ is odd and $A_{k+2} = -A_k/(k + 2)^2$. Given that the first zero of this function is at $\rho = 2.405$, obtain an expression for the energy of the ground state of the system.

6.5 A particle moves inside a cylinder of radius $a$ and length $l$. The potential is zero everywhere inside the cylinder and infinite everywhere outside. Write down the time-independent Schrödinger equation for this problem using cylindrical polar coordinates $(r, \phi, z)$ and show that it separates into three ordinary differential equations, two of which are the same as in the two previous problems. Explain how the energy levels are the same as those obtained in the previous problem, but with an additional term and obtain an expression for the contribution of this term to the energies of the states.

6.6 A particle of mass $m_e$ moves in a three-dimensional spherically symmetric well where $V = 0$, $r \le a$ and $V = V_0$, $r > a$. Show that the energies of those states with quantum number $l = 0$ are determined by the condition $k\cot ka = -\kappa$ in the notation used in Chapter 5. Show that there are no bound states of such a system unless $V_0 > \hbar^2\pi^2/8m_e a^2$. Would you expect this condition to be modified if states with $l \ne 0$ were also considered?

6.7 What is the probability of finding a particle with energy $E$ and subject to the potential described in the previous problem, in the classically forbidden region?

6.8 Referring to Worked Example 6.3, show that the first three energy levels of the isotropic harmonic oscillator have degeneracies 1, 3 and 6 respectively. Show that in each case the form of their wave function can be expressed as linear combinations of the spherical harmonics defined in (6.42). (Normalization is not required.)

6.9 A particle of mass $m$ is rigidly connected by a rigid light rod of length $a$ to an axis about which it is able to rotate freely. Obtain expressions for the energy levels and the corresponding wave functions in terms of the azimuthal angle $\phi$.

6.10 The arrangement described in the previous problem is modified to allow rotation about any direction in space. Obtain expressions for the energy levels and the corresponding wave functions in terms of the azimuthal angle $\phi$ and the polar angle $\theta$.

6.11 The (negative) binding energy of the ground state of the deuteron (neutron+proton) is 2.23 MeV. Assuming that the interaction potential is of the form described in Problem 6.6 with $a = 2.0 \times 10^{-15}$ m, find the corresponding value of $V_0$. Do bound states of the deuteron other than the ground state exist? Hints: Use the reduced mass; $x = 1.82$ is a solution to the equation $x\cot x = -0.46$.

6.12 Verify that
\[
\int_0^{2\pi}\!\int_0^{\pi} Y_{lm}^{*}\,Y_{l'm'}\sin\theta\,d\theta\,d\phi = 0 \quad \text{unless } l = l' \text{ and } m = m'
\]
for all values of $l$, $l'$, $m$ and $m'$ up to and including those with $l$ and/or $l'$ equal to 2, where $Y_{lm}$ is a spherical harmonic and $Y_{lm}^{*}$ its complex conjugate. This is an example of a property known as "orthogonality" which will be further discussed in the next chapter.

6.13 Use the hydrogen atom wave functions and the probabilistic interpretation of the wave function to calculate (i) the most probable and (ii) the average value, of the distance between the electron and proton in a hydrogen atom in its 1s state in terms of the Bohr radius, $a_0$.

6.14 Calculate the probability of finding an electron in the ground state of a hydrogen atom in the classically forbidden region.

6.15 The Uranium atom normally consists of a nucleus whose mass is 238 times that of a proton surrounded by 92 electrons. Supposing it were possible to remove all but one of these electrons, calculate the energy levels of the resulting ion and the average distance of the electron from the nucleus. Comment on your result.

6.16 Show that the hydrogen atom wave functions $u_{100}$, $u_{200}$ and $u_{300}$ are mutually orthogonal.

6.17 A "muonic atom" is formed when a muon of charge $-e$ and mass equal to $207m_e$ is bound to a nucleus. Assuming the nucleus to be a proton, calculate the energy levels and compare the size of this muonic atom with that of the hydrogen atom.
III Formal Foundations

CHAPTER 7
The Basic Postulates of Quantum Mechanics

CONTENTS
7.1 The Wave Function
    Notation
7.2 The Dynamical Variables
    Orthonormality
7.3 Probability Distributions
    Continuous eigenvalues
    Summary
    Expectation values
7.4 Commutation Relations
    Compatibility
7.5 The Uncertainty Principle
    Single-slit diffraction
7.6 The Time Dependence of the Wave Function
7.7 Degeneracy
7.8 The Harmonic Oscillator Again
7.9 The Measurement of Momentum by Compton Scattering
7.10 Problems
In the previous chapters we have seen how solutions to the time-independent Schrödinger equation correspond to the allowed energy levels of a quantum system and how, in the hydrogen atom case in particular, the results of this procedure are in extremely good agreement with the results of experiment. It would be possible to extend the process to predict the energy levels of other atoms. We would find that the corresponding Schrödinger equations could no longer be solved exactly, but that approximations could be developed which, when combined with computational techniques, would lead to predicted energy levels that were once again in very good agreement with experiment. However, such a programme, most of which is beyond the scope of this book in any case, would be premature at this stage as we have not yet established a general procedure that will do more than predict the allowed energy levels of a particle moving in a potential and the probability that it is in the vicinity of a particular point in space. We do not yet know, for example, how to predict the momentum of an electron in a hydrogen atom; we do not even know if it has a definite value or if it can only be specified by a probability distribution, as is the case for electron position. We have as yet no way of predicting under what conditions an atom undergoes a transition from one state to another, emitting energy in the form of a quantum of electromagnetic radiation.

Answers to these and similar questions require a more general and more sophisticated approach to quantum mechanics. We shall develop the foundations of this in the present chapter and we shall see that it is based on five postulates. In developing and discussing these postulates, we shall build on what we have already done in previous chapters to make each postulate seem at least reasonable. It is important to remember that this is an inductive process; i.e., although the postulates can be made to appear reasonable by considering particular examples, they can never be rigorously derived in this way. We encountered a previous example of an inductive argument when we set up the Schrödinger equation on the basis of a consideration of classical waves and the experimental de Broglie relations; we emphasized then that this argument in no way represented a proof of the Schrödinger equation and that our belief in its correctness lay in the fact that it was successful in predicting the results of experiments. In a similar way the correctness or otherwise of the basic postulates of quantum mechanics rests on the agreement between deductions from them and the results of experiments.
So far at least, quantum mechanics has withstood every experimental test; there are many fields of physics that are not yet completely understood, but there are no experimental results that contradict or falsify the fundamental principles of quantum mechanics. Despite this, the fundamental basis of the subject contains ideas that run very much against the intuition we have all developed in our interaction with the classical world. We have come across some of these already in our discussion of wave-particle duality in earlier chapters and we shall come across more in this chapter and later. However, we shall postpone any detailed discussion of the foundations of the subject to the last chapter, where we shall attempt to introduce the reader to the main ideas underlying the vigorous philosophical debate about the conceptual basis of quantum mechanics that has gone on ever since the early days of the subject. It might therefore have been expected that we should simply have stated the basic postulates at the beginning of this book, thereafter concentrating on deriving predictions of experimental results from them, and indeed such an approach is sometimes adopted. However, the ideas of quantum mechanics are rather abstract and most students understand them more easily when introduced to them gradually using inductive arguments based on a knowledge of the wave mechanics of atoms as described in the previous chapters.
7.1 THE WAVE FUNCTION
The experimental evidence for the wave properties of the electron and the success of the Schrödinger equation in predicting the energy levels of the hydrogen atom indicate that information about the physical properties of a particle moving under the influence of a potential can be obtained from the wave function. As we saw, this is a continuous, single-valued function of the coordinates of the particle and the time, and its squared modulus must be integrable over all possible particle positions. It is important to remember that, although the measurable properties of a system are derived from the wave function, it is not itself a physical quantity. However, it is a fundamental principle of quantum mechanics that the wave function contains all the information it is possible to obtain about a physical system in a particular state. The first postulate then concerns the existence of the wave function.

Postulate 1 For every dynamical system there exists a wave function that is a continuous, square-integrable, single-valued function of the parameters of the system and of time, and from which all possible predictions about the physical properties of the system can be obtained.

This statement covers the case of a particle moving in a potential where the “parameters of the system” are the particle coordinates, but it also refers to more general situations: for example, the parameters may be the coordinates of all the particles of a many-body system and may include internal variables such as “spin.”
Notation
In the previous chapters we used the symbol Ψ to represent a general solution to the time-dependent Schrödinger equation and the symbol u (sometimes with a subscript) as the time-independent part of the wave function of a system in a state of given energy. We shall continue this notation in the present chapter, and in addition we shall use the symbol ψ to represent a general wave function, whose time dependence we are not explicitly considering, and the symbol φ (often with a subscript) when the system is in what we shall call an “eigenstate”, that is, when some dynamical quantity (not necessarily the energy) has a known value.
7.2 THE DYNAMICAL VARIABLES
In this section we consider how dynamical variables (position, momentum, angular momentum, energy, etc.) are represented mathematically in quantum mechanics. In classical mechanics, such quantities are represented by algebraic variables: position coordinates $x$, $y$, $z$; momentum components $p_x$, $p_y$, $p_z$; energy $E$; etc. However, algebraic variables can take on any value whereas we have seen that quantum-mechanical properties are often confined to a discrete set of values (e.g., the energy levels of a hydrogen atom). Moreover, algebraic variables imply precise values, while the uncertainty principle indicates that simultaneous specification of the magnitudes of two dynamical quantities (e.g., position and momentum) is not always possible in quantum mechanics. We therefore look for a new way of representing dynamical quantities mathematically.

Let us consider again the time-independent Schrödinger equation in the case of a particle moving in a potential $V(\mathbf{r})$, which was given in Equation (6.7) and which we now write in the form

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]u_n = E_n u_n \tag{7.1}$$

or

$$\hat{H}u_n = E_n u_n \quad\text{where}\quad \hat{H} = -\frac{\hbar^2}{2m}\nabla^2 + V \tag{7.2}$$

The quantity in square brackets in (7.1), which is defined as $\hat{H}$ in (7.2), is an example of a mathematical operator. Such operators operate on functions (in this case the $u_n$) to produce new functions, and Equation (7.2) therefore states that the energy levels $E_n$ and corresponding wave functions $u_n$ are such that, when the operator $\hat{H}$ operates on $u_n$, it produces a result equivalent to multiplying $u_n$ by the constant $E_n$. The quantities $E_n$ and the functions $u_n$ are known as the eigenvalues and eigenfunctions, respectively, of the operator $\hat{H}$, and we say that the energy of the quantum-mechanical system is represented by an operator $\hat{H}$ whose eigenvalues are equal to the allowed values of the energy of the system. For historical reasons $\hat{H}$ is generally known as the Hamiltonian operator.

We now consider the significance of the eigenfunctions $u_n$. In the earlier chapters we interpreted $u_n$ as representing the wave function of the system when it was in a state whose energy is $E_n$. This implies that if the energy of the system were to be measured when the wave function is $u_n$, we should certainly obtain the result $E_n$. As a second measurement of the energy performed immediately after the first would be reasonably expected to yield the same result, we conclude that the wave function of a quantum-mechanical system will be identical to the corresponding eigenfunction of the Hamiltonian operator immediately after a measurement of the energy of the system. It is a reasonable extension of these arguments to say that other dynamical variables (position, momentum, etc.)
should also be capable of representation by operators, that the eigenvalues of these operators should correspond to the possible results of experiments carried out to measure these quantities, and that the wave function of a quantum-mechanical system should be identical to the corresponding eigenfunction immediately following a measurement represented by such an operator.

Consider first the momentum operator: the first part of the Hamiltonian (7.2) corresponds to the kinetic energy of the particle, and classically this is related to the particle momentum by the expression

$$T = p^2/2m \tag{7.3}$$

If we assume that a similar relation holds in quantum mechanics we get

$$\frac{1}{2m}\hat{P}^2 = -\frac{\hbar^2}{2m}\nabla^2 \tag{7.4}$$
where $\hat{P}$ is the momentum operator. An expression for $\hat{P}$ that is consistent with this is

$$\hat{P} = -i\hbar\nabla \tag{7.5}$$

That is

$$\hat{P}_x = -i\hbar\frac{\partial}{\partial x} \quad\text{etc.} \tag{7.6}$$

where $\hat{P}_x$ is the operator representing the $x$ component of the momentum. The negative sign in the definition of $\hat{P}_x$ is chosen to ensure consistency with our earlier results as we shall soon see.

We now consider the eigenvalue equation of the operator representing the $x$ component of momentum: if the eigenvalues and eigenfunctions are $p$ and $\phi$ respectively, then

$$-i\hbar\frac{\partial\phi}{\partial x} = p\phi \tag{7.7}$$

that is

$$\phi = A\exp(ikx) \tag{7.8}$$

where $A$ is a constant and $k = p/\hbar$. But this is just the de Broglie relation (5.1) connecting wave number and momentum, and Equation (7.8) is therefore consistent with the discussion in earlier chapters. This explains the choice of sign for the momentum operator (7.5).

We now note two points about the momentum eigenvalue equation. First, there is a solution to (7.7) for any value of $p$ so the possible values of the momentum of a particle are not confined to a discrete set as the allowed energy levels of a bound particle are. Second, the only case where the solutions to the time-independent Schrödinger equation (i.e., the energy eigenfunctions) have the same form as that of the momentum eigenfunctions is when the particle is free and the potential is uniform. Thus if the energy of a bound particle is measured, the wave function immediately after the measurement will not be an eigenfunction of the momentum operator. It follows that the outcome of a measurement of the momentum of a bound particle when the energy is already known cannot be accurately predicted. We shall return to this point in more detail later when we discuss the fourth postulate.

We now consider the operator representing the position of a particle, confining our discussion to one dimension for the moment. The eigenfunction equation can be written formally as

$$\hat{X}\phi = x_0\phi \tag{7.9}$$

where $x_0$ and $\phi$ now represent the position eigenvalues and eigenfunctions, respectively, and we have to obtain a form of $\hat{X}$ which is consistent with what we already know.
We remember that in the previous chapters the squared modulus of the wave function was interpreted as the probability density for the position of the particle. If, therefore, the eigenfunction $\phi$ is to represent the wave function immediately after a position measurement yielding the result $x_0$, then $\phi$ must be very large inside a region of very narrow width $\Delta$ in the vicinity of $x_0$ and zero elsewhere. In the limit as $\Delta$ goes to zero, the wave function therefore has the form of a Dirac delta function (cf. Chapter 1, Equation (1.34)) and is written as $\delta(x - x_0)$. In this case

$$\hat{X}\delta(x - x_0) = x_0\delta(x - x_0) \tag{7.10}$$

which is satisfied if

$$\hat{X} \equiv x \tag{7.11}$$

as can be seen by substituting (7.11) into (7.10) and considering separately the point $x = x_0$ (where the expressions on the two sides of (7.10) are identical) and the region $x \neq x_0$ (where both sides are zero). That is, the quantum-mechanical operator representing the $x$ coordinate of a particle is just the algebraic variable $x$. In three dimensions, it follows that the particle position operator is the vector $\mathbf{r}$ and we note that this is consistent with the fact that we wrote the energy operator in the form

$$\hat{H} = -\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})$$

where $V(\mathbf{r})$ is just the potential expressed as a function of the algebraic variable $\mathbf{r}$.

We therefore conclude that the position and momentum of a particle can be represented by the operators $\mathbf{r}$ and $-i\hbar\nabla$, respectively. We saw that the kinetic and potential energy operators have the same functional dependence on the operators representing position and momentum as do the corresponding quantities in classical mechanics and we assume that this relationship also holds for other dynamical quantities that can be expressed classically as functions of $\mathbf{r}$ and $\mathbf{p}$. We write the eigenfunction equation in the case of a general operator $\hat{Q}$ as

$$\hat{Q}\phi_n = q_n\phi_n \tag{7.12}$$
where $q_n$ is an eigenvalue and $\phi_n$ is the corresponding eigenfunction of $\hat{Q}$.

Before expressing all this in the form of formal postulates, we shall establish one general property of operators used to represent dynamical variables. We have interpreted the eigenvalues of the operators discussed so far as representing the possible results of experiments carried out to measure the values of the corresponding physical quantities. If this interpretation is to be correct, it is clear that these eigenvalues must be real numbers even though the eigenfunctions, or the operators themselves, may be imaginary (like the momentum operator) or complex. One class of mathematical operators that always have real eigenvalues consists of the Hermitian operators. These are defined such that if $f(\mathbf{r})$ and $g(\mathbf{r})$ are any well-behaved functions of $\mathbf{r}$ that vanish at infinity, then the operator $\hat{Q}$ is Hermitian if and only if¹

$$\int f\hat{Q}g\,d\tau = \int g\hat{Q}^*f\,d\tau \tag{7.13}$$

where $\hat{Q}^*$ is the complex conjugate of $\hat{Q}$ and the integrals are over all values of $\mathbf{r}$. The complex conjugate of an operator is defined so that, for any function $f$, $\hat{Q}^*f^*$ equals $(\hat{Q}f)^*$; in nearly all cases, this definition is equivalent to replacing $i$ with $-i$ wherever it appears in $\hat{Q}$.

¹ Readers familiar with Hermitian matrices may like to note that (7.13) is equivalent to $Q_{12} = Q_{21}^*$, where $Q_{12} = \int f_1^*\hat{Q}f_2\,d\tau$. These results follow directly from (7.13) if we replace $f$ and $g$ by $f_1^*$ and $f_2$, respectively.

We now show that the eigenvalues of a Hermitian operator are real. Let

$$\hat{Q}\phi_n = q_n\phi_n$$

then

$$\hat{Q}^*\phi_n^* = q_n^*\phi_n^*$$

Hence

$$\int\phi_n^*\hat{Q}\phi_n\,d\tau = q_n\int\phi_n^*\phi_n\,d\tau \tag{7.14}$$

and

$$\int\phi_n\hat{Q}^*\phi_n^*\,d\tau = q_n^*\int\phi_n\phi_n^*\,d\tau \tag{7.15}$$

But if $\hat{Q}$ is Hermitian it follows from (7.13) (identifying $f$ with $\phi_n^*$ and $g$ with $\phi_n$) that the left-hand sides of (7.14) and (7.15) must be equal. Equating the right-hand sides of these equations we get

$$q_n\int|\phi_n|^2\,d\tau = q_n^*\int|\phi_n|^2\,d\tau \tag{7.16}$$

and hence

$$q_n = q_n^* \tag{7.17}$$
so that $q_n$ is real as required. The converse of this theorem (that all operators with real eigenvalues are Hermitian) is not true: the Hermitian property is a stronger condition on the operator than the reality of the eigenvalues. However, as it has been found possible to represent all physical quantities by Hermitian operators, and as these have a number of useful properties, some of which will be discussed later, it is convenient to impose this Hermitian condition.

We can readily show that the operators representing position and momentum in one dimension are Hermitian: in the case of $x$ this follows trivially from substitution into the one-dimensional equivalent of (7.13), while in the case of $\hat{P}_x$ we have

$$\int_{-\infty}^{\infty} f\hat{P}_x g\,dx = -i\hbar\int_{-\infty}^{\infty} f\frac{\partial g}{\partial x}\,dx$$

Integrating by parts, the right-hand side becomes

$$-i\hbar\left\{\Big[fg\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} g\frac{\partial f}{\partial x}\,dx\right\}$$

The first term is zero because $f$ and $g$ vanish at infinity so we have

$$\int_{-\infty}^{\infty} f\hat{P}_x g\,dx = i\hbar\int_{-\infty}^{\infty} g\frac{\partial f}{\partial x}\,dx = \int_{-\infty}^{\infty} g\hat{P}_x^* f\,dx$$

as required. We leave it as an exercise for the reader to extend this argument to three dimensions and to confirm that other operators considered so far are Hermitian.

We can now summarize the contents of these paragraphs in two postulates.

Postulate 2 Every dynamical variable may be represented by a Hermitian operator whose eigenvalues represent the possible results of carrying out a measurement of the value of the dynamical variable. Immediately after such a measurement, the wave function of the system is identical to the eigenfunction corresponding to the eigenvalue obtained as a result of the measurement. This transformation of the wave function from its original form to one equal to that of the eigenfunction is often described as “the collapse of the wave function.”

Postulate 3 The operators representing the position and momentum of a particle are $\mathbf{r}$ and $-i\hbar\nabla$, respectively. Operators representing other dynamical quantities bear the same functional relation to these, as do the corresponding classical quantities to the classical position and momentum variables.
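The Hermitian property (7.13) of the momentum operator can also be checked numerically. The following is a minimal Python sketch, not part of the original text: the test functions `f` and `g`, the choice of units with ħ = 1, the integration range, and the finite-difference step are all illustrative assumptions.

```python
import cmath

HBAR = 1.0  # assumption: natural units with hbar = 1


def derivative(func, x, h=1e-5):
    """Central-difference estimate of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


def integrate(func, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule integral of func over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(func(lo + (j + 0.5) * dx) for j in range(n)) * dx


# Hypothetical well-behaved test functions that vanish at infinity.
def f(x):
    return cmath.exp(-x * x + 2j * x)


def g(x):
    return (1 + x) * cmath.exp(-0.5 * x * x)


def p_x(func):
    """Return the function obtained by applying P_x = -i hbar d/dx."""
    return lambda x: -1j * HBAR * derivative(func, x)


def p_x_conj(func):
    """Return the function obtained by applying P_x* = +i hbar d/dx."""
    return lambda x: 1j * HBAR * derivative(func, x)


lhs = integrate(lambda x: f(x) * p_x(g)(x))       # left-hand side of (7.13)
rhs = integrate(lambda x: g(x) * p_x_conj(f)(x))  # right-hand side of (7.13)
print(abs(lhs - rhs))  # expected to be close to zero
```

Any other pair of smooth, rapidly decaying functions could be substituted for `f` and `g`; the agreement of the two integrals relies only on the boundary term vanishing, exactly as in the integration by parts above.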
Orthonormality
One important property of the eigenfunctions of a Hermitian operator is known as orthonormality and is expressed by the following relation

$$\int\phi_n^*\phi_m\,d\tau = \delta_{nm} \tag{7.18}$$

where $\phi_n$ and $\phi_m$ are eigenfunctions of some Hermitian operator and $\delta_{nm}$ is the Kronecker delta defined by $\delta_{nm} = 0$, $n \neq m$; $\delta_{nm} = 1$, $n = m$.

To prove this, consider a Hermitian operator $\hat{Q}$ whose eigenvalues are $q_n$ and whose eigenfunctions are $\phi_n$. That is

$$\hat{Q}\phi_n = q_n\phi_n \tag{7.19}$$

Thus

$$\int\phi_n^*\hat{Q}\phi_m\,d\tau = q_m\int\phi_n^*\phi_m\,d\tau \tag{7.20}$$

(because $q_m$ is a constant, which can be taken outside the integral) and similarly (remembering that $q_n$ is real)

$$\int\phi_m\hat{Q}^*\phi_n^*\,d\tau = q_n\int\phi_m\phi_n^*\,d\tau \tag{7.21}$$
But it follows from the definition of the Hermitian operator (7.13) that the left-hand sides of (7.20) and (7.21) are equal. We thus have, equating the right-hand sides

$$\int\phi_n^*\phi_m\,d\tau = 0 \quad\text{or}\quad q_m = q_n \tag{7.22}$$

The second alternative implies that $m = n$ or that the eigenfunctions $\phi_m$ and $\phi_n$ correspond to the same eigenvalue, i.e., that they are degenerate. We postpone a discussion of degeneracy until later in this chapter and conclude for the moment that

$$\int\phi_n^*\phi_m\,d\tau = 0, \quad m \neq n \tag{7.23}$$

Turning now to the case $m = n$, it is clear from the eigenvalue equation (7.19) that if $\phi_n$ is an eigenfunction of $\hat{Q}$ then so is $K\phi_n$ where $K$ is any constant. As $|\phi_n|^2\,d\tau$ represents the probability of finding the particle in the volume element $d\tau$ and as the total probability of finding the particle somewhere in space is unity, we choose the scaling constant so that

$$\int|\phi_n|^2\,d\tau = 1 \tag{7.24}$$

Equations (7.23) and (7.24) are equivalent to (7.18) so we have proved orthonormality in the case of nondegenerate eigenfunctions. We shall see later that, although degenerate eigenfunctions are not necessarily orthogonal, an orthonormal set of wave functions can always be chosen in the degenerate case also.

Worked Example 7.1 Write down the Hamiltonian operator for a one-dimensional oscillator and show that it is Hermitian.

Solution Referring to (5.53), we get

$$\hat{H} = \frac{\hat{P}^2}{2m} + \frac{1}{2}m\omega_c^2\hat{X}^2$$

We know that $\hat{P}$ is Hermitian, so

$$\int_{-\infty}^{\infty} f\hat{P}^2 g\,dx = \int_{-\infty}^{\infty} f\hat{P}[\hat{P}g]\,dx = \int_{-\infty}^{\infty} [\hat{P}g]\hat{P}^*f\,dx = \int_{-\infty}^{\infty} [\hat{P}^*f]\hat{P}g\,dx = \int_{-\infty}^{\infty} g\hat{P}^{*2}f\,dx$$

which proves that $\hat{P}^2$ is Hermitian. A similar argument can be applied to $\hat{X}^2$, so $\hat{H}$ must also be Hermitian.
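The orthonormality relation (7.18) can be illustrated numerically. The sketch below is an illustration rather than part of the original text. It assumes the standard energy eigenfunctions of an infinite square well occupying −a ≤ x ≤ a (cosines for odd n, sines for even n, each normalized by a^(-1/2)), with a = 1 chosen arbitrarily, and it evaluates the overlap integrals by the midpoint rule.

```python
import math

A = 1.0  # assumption: half-width of the well, arbitrary units


def phi(n, x):
    """n-th eigenfunction (n = 1, 2, ...) of an infinite well on [-A, A]."""
    arg = n * math.pi * x / (2 * A)
    value = math.cos(arg) if n % 2 == 1 else math.sin(arg)
    return value / math.sqrt(A)


def overlap(n, m, steps=2000):
    """Midpoint-rule estimate of the integral of phi_n* phi_m over the well.

    The eigenfunctions are real, so no complex conjugation is needed.
    """
    dx = 2 * A / steps
    return sum(phi(n, -A + (j + 0.5) * dx) * phi(m, -A + (j + 0.5) * dx)
               for j in range(steps)) * dx


# Overlap matrix for the first four states; it should approximate
# the Kronecker delta of (7.18).
overlaps = [[overlap(n, m) for m in range(1, 5)] for n in range(1, 5)]
for row in overlaps:
    print(["%.6f" % abs(v) for v in row])
```

The diagonal entries test the normalization (7.24) and the off-diagonal entries test the orthogonality (7.23); these states are nondegenerate, so the proof above applies directly.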
7.3 PROBABILITY DISTRIBUTIONS
So far we have postulated that physical quantities can be represented by Hermitian operators whose eigenvalues represent the possible results of experimental measurements and whose corresponding eigenfunctions represent the wave function of the system immediately after the measurement. Clearly if, for example, the wave function of a system is also identical to one of the energy eigenfunctions $u_n$ immediately before a measurement of the energy of the system, then the result $E_n$ will definitely be obtained. However, we have not yet postulated what the outcome of the experiment will be if the wave function before the measurement is not one of the $u_n$.

The first point to be made is that in this case, quantum mechanics does not make a precise prediction about the result of the energy measurement. This is not to say that a precise measurement of the energy is impossible (in fact Postulate 2 states that such a measurement will always produce a result precisely equal to one of the energy eigenvalues) but that it is unpredictable. Clearly, the measurement of energy is just one example and similar reasoning should be applicable to the measurement of other physical properties. The inability of quantum mechanics to predict the actual outcome of many individual physical events is a fundamental feature of the theory. However, the relative probabilities of the different possible outcomes can always be predicted, and we shall now proceed to develop a postulate concerning these probabilities and how they can be derived from a knowledge of the wave function and the operators representing the dynamical variables.

We have already seen in Chapters 5 and 6 how a knowledge of the wave function can be used to predict the relative probabilities of the possible outcomes of a measurement of the position of a particle. We found that if the wave function of the system before the position measurement is carried out is $\psi(\mathbf{r})$, then the probability that the position measurement will yield a result within the element of volume $d\tau$ around $\mathbf{r}$ is $|\psi(\mathbf{r})|^2\,d\tau$. The outcome of a position measurement is exactly predictable only if the wave function is zero everywhere except at a particular point where it is very large, and we saw in the previous section that this “Dirac-delta” form is an eigenfunction of the position operator $\mathbf{r}$.
Dirac deltas give rise to technical problems that are best avoided at this stage and in any case no practical measurement can determine the particle position with infinite accuracy. Instead, we consider a one-dimensional experiment where the particle is found in one of a series of detectors arranged along the $x$ axis. If we assume that each detector has a small, but finite, width $\Delta$ and that all positions within it are equally probable, then the wave function after a measurement that detected the particle in the $n$th detector will be $\phi_n(x)$ where

$$\phi_n(x) = \begin{cases}\Delta^{-1/2} & \text{if } x_n - \tfrac{1}{2}\Delta \le x \le x_n + \tfrac{1}{2}\Delta\\[4pt] 0 & \text{if } |x - x_n| > \tfrac{1}{2}\Delta\end{cases} \tag{7.25}$$

Here $x_n$ is the position of the centre of the $n$th detector and the magnitude of $\phi_n(x)$ is chosen to ensure normalization (cf. (7.24)).

Suppose now that the wave function before the measurement is $\psi(x)$. Because $\Delta$ is assumed to be small, $\psi(x)$ is effectively constant and equal to $\psi(x_n)$ within the detector. It therefore follows from our earlier discussion of the probability interpretation of the wave function that the probability of the particle being found in the $n$th detector is $|\psi(x_n)|^2\Delta$ and that after the measurement the wave function will be one of the $\phi_n$. This is illustrated in Figure 7.1.

[Figure 7.1: left-hand panel “Before Measurement”, right-hand panel “After Measurement”]
FIGURE 7.1 An example of quantum measurement theory. The left-hand part of the diagram shows how the wave function before a position measurement can be expressed as a sum of parts, each of which corresponds to one of the seven possible outcomes. After the measurement the wave function “collapses” at random into one of the functions shown on the right. (NB: The vertical scale of the right-hand peaks is reduced: they should all be the same size with their squares multiplied by $\Delta$ normalized to unity.)

To generalise this result, we first note that we can use the above definition of $\phi_n(x)$ to write $\psi(x)$ as

$$\psi(x) = \sum_n \Delta^{1/2}\psi(x_n)\phi_n \equiv \sum_n a_n\phi_n \tag{7.26}$$

where

$$a_n = \Delta^{1/2}\psi(x_n) \tag{7.27}$$
and we have again used the fact that $\Delta$ is small.

We now postulate that we can generalize this result to obtain an expression for the probability of obtaining the result $q_n$ from a measurement represented by the operator $\hat{Q}$ (where $\hat{Q}\phi_n = q_n\phi_n$) on a system whose wave function is known to be $\psi$ immediately before the measurement. To do this, we assume that we can always find a set of constants $a_n$ such that

$$\psi = \sum_n a_n\phi_n \tag{7.28}$$
(where the summation is over all the eigenfunctions of the operator) and that the probability of obtaining the result $q_n$, when the quantity represented by $\hat{Q}$ is measured, is $|a_n|^2$. The expression (7.28) relies on a mathematical property of the eigenfunctions of a Hermitian operator known as completeness. This states that any well-behaved function, such as the wave function $\psi$, can be expressed as a linear combination of the eigenfunctions $\phi_n$, which are then said to form a complete set. As noted earlier (Postulate 2), immediately after a measurement yielding the result $q_n$ the wave function becomes equal to $\phi_n$. We now express all this in the form of a postulate.

Postulate 4 When a measurement of a dynamical variable represented by a Hermitian operator, $\hat{Q}$, is carried out on a system whose wave function is $\psi$, then the probability of the result being equal to a particular eigenvalue $q_m$ will be $|a_m|^2$, where $\psi = \sum_n a_n\phi_n$ and the $\phi_n$ are the eigenfunctions of $\hat{Q}$ corresponding to the eigenvalues $q_n$.

We now show that in the case where both the wave function $\psi$ and the eigenfunctions $\phi_n$ are known, we can use orthonormality to obtain an expression for $a_n$:

$$\int\phi_n^*\psi\,d\tau = \int\phi_n^*\sum_m a_m\phi_m\,d\tau = \sum_m a_m\int\phi_n^*\phi_m\,d\tau = \sum_m a_m\delta_{nm} = a_n \tag{7.29}$$

so that

$$a_n = \int\phi_n^*\psi\,d\tau \tag{7.30}$$
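The link between the detector picture of (7.25) to (7.27) and the overlap formula (7.30) can be made concrete with a short numerical sketch. It is an illustration only: the Gaussian wave function, the detector width, and the range covered by the detectors are assumptions made here, not values from the text.

```python
import math

DELTA = 0.05               # assumption: detector width
X_MIN, X_MAX = -8.0, 8.0   # assumption: region covered by the detectors
N_DET = round((X_MAX - X_MIN) / DELTA)


def psi(x):
    """Hypothetical normalized Gaussian wave function before measurement."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)


def centre(n):
    """Centre x_n of the n-th detector."""
    return X_MIN + (n + 0.5) * DELTA


def overlap_coefficient(n, sub=50):
    """a_n from (7.30): integral of phi_n * psi, with phi_n from (7.25).

    phi_n is real and equals DELTA**-0.5 inside detector n, zero outside,
    so the integral only runs over that detector (midpoint rule).
    """
    left = centre(n) - 0.5 * DELTA
    dx = DELTA / sub
    integral = sum(psi(left + (j + 0.5) * dx) for j in range(sub)) * dx
    return DELTA ** -0.5 * integral


a_overlap = [overlap_coefficient(n) for n in range(N_DET)]
a_small_delta = [DELTA ** 0.5 * psi(centre(n)) for n in range(N_DET)]  # (7.27)

max_difference = max(abs(p - q) for p, q in zip(a_overlap, a_small_delta))
total_probability = sum(a * a for a in a_overlap)
print(max_difference, total_probability)
```

Under these assumptions the two sets of coefficients should agree closely, and the probabilities |a_n|² should sum to very nearly one; the small shortfall reflects the approximation that ψ is constant across each detector.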
The right-hand side of (7.30) is sometimes called the overlap integral connecting $\psi$ and $\phi_n$. Essentially it is a measure of the extent to which the two functions are similar, which makes some sense of the fact that its squared modulus is the probability of obtaining the corresponding result of the measurement.

We developed our argument by considering the measurement of position, so let us see what it now gives us for momentum measurements. Suppose we design an experiment, similar to that considered in the earlier position measurement, to measure the momentum of a particle. We assume that the possible outcomes constitute a set of closely spaced bands centred on the values $p_n$. Applying Postulate 4, we say that the probability of obtaining the result $p_n$ ($= \hbar k_n$) is $|a_n|^2$ where, using the momentum eigenfunctions (7.8),

$$\psi(x) = \sum_n a_n A\exp(ik_n x) \tag{7.31}$$

We see that in this case, completeness leads to an expression that is just the Fourier expansion of the function $\psi(x)$. Applying (7.30) to the momentum case, we get

$$a_n = \int\psi(x)A\exp(-ik_n x)\,dx \tag{7.32}$$

which is the standard expression for the Fourier amplitude.

Worked Example 7.2
(i) What possible values would be obtained if the position, $x$, of a particle were measured on a system whose wave function is

$$\frac{3}{5}\phi(x - a) + \frac{4}{5}\phi(x - 2a)$$

where $\phi$ has the form of a narrow peak of height $h$ and width $\Delta$ and is subject to (7.25)?
(ii) What possible values would be obtained if the momentum $p$ of a particle were measured on a system whose wave function is

$$\frac{3}{5}\exp[ik_0x] + \frac{4}{5}\exp[2ik_0x]$$

Solution
(i) Assuming that the width of $\phi$ is vanishingly small, the possible values of $x$ are $a$ and $2a$. Using (7.28) and Postulate 4, the probabilities of these results are 9/25 and 16/25, respectively.
(ii) From (7.8), the possible values of $p$ are $\hbar k_0$ and $2\hbar k_0$. Using (7.28) and Postulate 4, the probabilities of these results are 9/25 and 16/25, respectively.
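Part (ii) of the worked example can be reproduced numerically by evaluating the Fourier amplitudes (7.32). The sketch below is illustrative: it assumes box normalization over one period L = 2π/k_0 (so that A = L^(-1/2) and the wave function carries the same factor), and the value k_0 = 1 is arbitrary.

```python
import cmath
import math

K0 = 1.0                  # assumption: arbitrary base wave number
L = 2 * math.pi / K0      # assumption: periodic box of one period
NORM = L ** -0.5          # the constant A for box normalization


def psi(x):
    """Wave function of Worked Example 7.2(ii), normalized over the box."""
    return NORM * (0.6 * cmath.exp(1j * K0 * x) + 0.8 * cmath.exp(2j * K0 * x))


def amplitude(n, steps=2000):
    """a_n from (7.32) with k_n = n * K0, by the midpoint rule."""
    dx = L / steps
    total = 0j
    for j in range(steps):
        x = (j + 0.5) * dx
        total += psi(x) * NORM * cmath.exp(-1j * n * K0 * x)
    return total * dx


# Probabilities |a_n|^2 for the momenta n * hbar * K0, n = 0, 1, 2, 3.
probabilities = {n: abs(amplitude(n)) ** 2 for n in range(0, 4)}
for n, prob in probabilities.items():
    print(n, round(prob, 6))
```

The entries for n = 1 and n = 2 should match the probabilities 9/25 and 16/25 quoted in the solution, with the other momenta having essentially zero probability.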
Continuous eigenvalues
The previous arguments have been developed on the assumption that the eigenvalues of the operator in question have discrete values that can be indexed by an integer such as $n$. This is true for the energy levels of a particle bound to a potential and (as we shall see later) for the angular momentum eigenvalues, but many other quantities, notably position and momentum, have a continuous range of eigenvalues. There are several ways by which these cases can be brought within our formalism.

One way is to “force” the problem to be discrete. We did this earlier when we assumed that measurements of position or momentum would produce one of a discrete set of results rather than a precise value. Another way of doing this is to imagine the system to be contained inside a large box, so that the eigenvalues are closely spaced, but not continuous. Outside the box, the wave function may be assumed to be zero or we may imagine that its form inside the box is repeated periodically through all space. The latter device ensures that the wave function can be expanded as a Fourier series rather than a transform and this is effectively what we did in (7.31).

An alternative is to adapt our formalism to the continuous case by making use of the properties of Dirac delta functions. Writing the general eigenvalue equation (7.12) as

$$\hat{Q}\phi(k, x) = q(k)\phi(k, x) \tag{7.33}$$

where the index $n$ is replaced by the continuous variable $k$, the equivalent of (7.28) is

$$\psi(x) = \int a(k)\phi(k, x)\,dk \tag{7.34}$$

Instead of $|a_n|^2$ as the probability of getting the result $q_n$ we have $|a(k)|^2\,dk$ as the probability of getting a result inside the range $dk$ in the vicinity of $k$. The orthonormality condition (7.18) then becomes

$$\int\phi^*(k, x)\phi(k', x)\,dx = \delta(k - k') \tag{7.35}$$

As an example, we show how momentum eigenfunctions can be treated in this way, restricting ourselves to one dimension. We put $\phi(k, x) = (2\pi)^{-1/2}\exp(ikx)$ so that

$$\psi(x) = (2\pi)^{-1/2}\int a(k)\exp(ikx)\,dk \tag{7.36}$$

and

$$a(k) = (2\pi)^{-1/2}\int\psi(x)\exp(-ikx)\,dx \tag{7.37}$$

$a(k)$ is therefore the Fourier transform of $\psi(x)$ and can be thought of as a “wave function in $k$ space.”

It should be noted that $k$ may represent a set of variables, rather than a single variable. For example, representation of the momentum of a three-dimensional system requires a vector $\mathbf{k}$ with three components. Similarly, in many discrete systems the index $n$ used before represents a set of several integer indices.
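A short numerical sketch may help to make (7.37) concrete. It is not from the original text: the normalized Gaussian chosen for ψ(x), the integration ranges, and the grid sizes are assumptions. For this particular ψ the transform is known in closed form (another Gaussian of the same shape), which gives something to compare against.

```python
import cmath
import math


def psi(x):
    """Hypothetical normalized Gaussian wave function (an assumption)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)


def a_of_k(k, lo=-10.0, hi=10.0, steps=800):
    """a(k) from (7.37), evaluated by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0j
    for j in range(steps):
        x = lo + (j + 0.5) * dx
        total += psi(x) * cmath.exp(-1j * k * x)
    return total * dx / math.sqrt(2 * math.pi)


def total_probability(lo=-8.0, hi=8.0, steps=160):
    """Integral of |a(k)|^2 dk; it should equal 1 for a normalized psi."""
    dk = (hi - lo) / steps
    return sum(abs(a_of_k(lo + (j + 0.5) * dk)) ** 2
               for j in range(steps)) * dk


k_test = 0.7
numeric = a_of_k(k_test)
# Closed-form transform of the Gaussian above.
analytic = math.pi ** -0.25 * math.exp(-0.5 * k_test ** 2)
print(abs(numeric - analytic), total_probability())
```

The second printed number checks that |a(k)|² dk behaves as a probability distribution over k, in line with the interpretation of a(k) as a wave function in k space.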
Summary
We have now set out the core ideas of quantum measurement theory and the reader would be wise to pause at this point and ensure that they have been grasped. Putting it all a little less formally:

1. A quantum system has a wave function associated with it.
2. When a measurement is made, the result is one of the eigenvalues of the operator associated with the measurement.
3. As a result of the measurement the wave function collapses to become the corresponding eigenfunction.
4. The probability of a particular outcome equals the squared modulus of the overlap between the wave functions before and after the measurement.

Look again at Figure 7.1 and make sure you understand how this applies to the case of a position measurement.
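The four steps above can be mimicked in a toy simulation. The sketch below is only an illustration under stated assumptions: the state is represented by its expansion coefficients a_n in some eigenfunction basis, the two-component state and its eigenvalues are hypothetical, and the function name `measure` is invented here. The random choice stands in for the unpredictability of an individual measurement.

```python
import random


def measure(amplitudes, eigenvalues, rng):
    """Simulate one measurement on psi = sum_n a_n phi_n.

    Returns the eigenvalue obtained and the post-measurement amplitudes
    (the wave function collapsed onto the corresponding eigenfunction).
    """
    weights = [abs(a) ** 2 for a in amplitudes]  # step 4: |a_n|^2
    index = rng.choices(range(len(amplitudes)), weights=weights)[0]
    collapsed = [1.0 if n == index else 0.0 for n in range(len(amplitudes))]
    return eigenvalues[index], collapsed


rng = random.Random(0)     # fixed seed so the run is repeatable
amplitudes = [0.6, 0.8]    # hypothetical state: (3/5) phi_1 + (4/5) phi_2
eigenvalues = [1.0, 2.0]   # hypothetical eigenvalues q_1, q_2

trials = 20000
counts = {q: 0 for q in eigenvalues}
for _ in range(trials):
    result, _collapsed = measure(amplitudes, eigenvalues, rng)
    counts[result] += 1

frequencies = {q: counts[q] / trials for q in eigenvalues}
print(frequencies)
```

Over many identically prepared systems the observed frequencies should approach |a_n|², while feeding the collapsed amplitudes back into `measure` always returns the same eigenvalue, as expected for an immediately repeated measurement.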
Expectation values
We can use this formalism to predict the average value that will be obtained from a large number of measurements of the same quantity. We assume that the wave functions before each experiment are identical; an example would be a stream of particles whose positions were then measured (e.g., as in Figure 7.1). This average is known as the expectation value and is obtained in the case of a wave function $\psi$ and an operator $\hat{Q}$ as follows. Consider the expression

$$\int\psi^*\hat{Q}\psi\,d\tau$$

Using the expansion of $\psi$ in terms of the eigenfunctions $\phi_n$ of $\hat{Q}$ (7.28) we can write this as

$$\begin{aligned}\int\psi^*\hat{Q}\psi\,d\tau &= \int\sum_m a_m^*\phi_m^*\,\hat{Q}\sum_n a_n\phi_n\,d\tau\\ &= \sum_{m,n} a_m^*a_nq_n\int\phi_m^*\phi_n\,d\tau\\ &= \sum_{m,n} a_m^*a_nq_n\delta_{mn}\\ &= \sum_n |a_n|^2q_n\end{aligned} \tag{7.38}$$
But Postulate 4 states that |an |2 is just the probability that the value qn be obtained in the measurement, so the right-hand side of (7.38) is just the standard statistical expression for the average value of q and the left-hand side is the expectation value ˆ so we have we are looking for. This is often written as hQi Z ˆ = ψ∗ Qψ ˆ dτ hQi (7.39) Thus, if we know the wave function of the system and the operator representing the dynamical variable being measured, we can calculate the expectation value. Worked Example 7.3 Find the expectation values of (i) x, p x , x2 and p2x in the case of a particle in the ground state of a one-dimensional infinite square well. Solution Referring to (5.28), the wave function is a−1/2 cos(πx/2a) when −a ≤ x ≤ y and zero otherwise, so using the one-dimensional form of (7.39) we have (i) Z a ˆ = a−1 hXi x cos2 (πx/2a)d x = 0, by symmetry −a
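(ii)
$$\begin{aligned}
\langle\hat{P}_x\rangle &= a^{-1}\int_{-a}^{a}\cos(\pi x/2a)\left(-i\hbar\frac{\partial}{\partial x}\right)\cos(\pi x/2a)\,dx \\
&= (i\pi\hbar/2a)\,a^{-1}\int_{-a}^{a}\cos(\pi x/2a)\sin(\pi x/2a)\,dx \\
&= 0, \quad \text{by symmetry}
\end{aligned}$$
(iii)
$$\begin{aligned}
\langle\hat{X}^2\rangle &= a^{-1}\int_{-a}^{a} x^2\cos^2(\pi x/2a)\,dx \\
&= a^{-1}\int_{-a}^{a}\tfrac{1}{2}x^2\,[1+\cos(\pi x/a)]\,dx \\
&= a^2\left(\frac{1}{3}-\frac{2}{\pi^2}\right) = 0.13a^2
\end{aligned}$$
(iv)
$$\begin{aligned}
\langle\hat{P}_x^2\rangle &= a^{-1}\int_{-a}^{a}\cos(\pi x/2a)\left(-\hbar^2\frac{\partial^2}{\partial x^2}\right)\cos(\pi x/2a)\,dx \\
&= \frac{\pi^2\hbar^2}{4a^3}\int_{-a}^{a}\cos^2(\pi x/2a)\,dx \\
&= \frac{\pi^2\hbar^2}{4a^2}
\end{aligned}$$
where we have used standard integrals.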
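The four integrals of Worked Example 7.3 can also be checked numerically. The following is a minimal sketch rather than part of the original solution: it assumes units in which $\hbar = 1$ and $a = 1$, uses NumPy finite differences for the derivatives, and the names `integrate`, `exp_x`, `exp_p`, `exp_x2` and `exp_p2` are chosen here for illustration only.

```python
import numpy as np

hbar = 1.0  # units with hbar = 1 (assumption of this sketch)
a = 1.0     # half-width of the well (assumption of this sketch)

x = np.linspace(-a, a, 20001)
dx = x[1] - x[0]
psi = a ** -0.5 * np.cos(np.pi * x / (2 * a))  # real ground-state wave function

def integrate(f):
    """Trapezoidal rule on the uniform grid x."""
    return np.sum(0.5 * (f[1:] + f[:-1])) * dx

dpsi = np.gradient(psi, x)    # numerical d(psi)/dx
d2psi = np.gradient(dpsi, x)  # numerical d^2(psi)/dx^2

# One-dimensional form of (7.39); psi is real, so psi* = psi.
exp_x = integrate(psi * x * psi)
exp_p = -1j * hbar * integrate(psi * dpsi)
exp_x2 = integrate(psi * x**2 * psi)
exp_p2 = -hbar**2 * integrate(psi * d2psi)

print(exp_x, exp_p, exp_x2, exp_p2)
```

Within the accuracy of the grid, the printed values should reproduce the analytic results above: zero for (i) and (ii), $a^2(1/3 - 2/\pi^2)$ for (iii) and $\pi^2\hbar^2/4a^2$ for (iv).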
7.4 COMMUTATION RELATIONS
Consider the following expression where $\psi$ is any wave function, and $\hat{X}$ and $\hat{P}_x$ are the operators representing the $x$ coordinate and $x$ component of momentum of a particle
$$\begin{aligned}
(\hat{P}_x\hat{X} - \hat{X}\hat{P}_x)\psi &= -i\hbar\frac{\partial}{\partial x}(x\psi) - x\left(-i\hbar\frac{\partial\psi}{\partial x}\right) \\
&= -i\hbar x\frac{\partial\psi}{\partial x} - i\hbar\psi + i\hbar x\frac{\partial\psi}{\partial x} \\
&= -i\hbar\psi
\end{aligned} \tag{7.40}$$
We note two important points. First, the effect on $\psi$ of the product of two quantum-mechanical operators is, in general, dependent on the order in which the operators are applied: that is, unlike algebraic variables, quantum-mechanical operators need not commute. Second, the result (7.40) is completely independent of the particular form of $\psi$ so we can write
$$[\hat{P}_x,\hat{X}] \equiv \hat{P}_x\hat{X} - \hat{X}\hat{P}_x = -i\hbar \tag{7.41}$$
where $[\hat{P}_x,\hat{X}]$ is defined by (7.41) and is known as the commutator or commutator bracket of $\hat{P}_x$ and $\hat{X}$. Similar arguments to these can be used to show that
$$[\hat{P}_y,\hat{Y}] = [\hat{P}_z,\hat{Z}] = -i\hbar$$
and that commutators of the form $[\hat{X},\hat{Y}]$, $[\hat{P}_x,\hat{Y}]$, $[\hat{P}_x,\hat{P}_y]$ etc., are all equal to zero. All the commutators relating to the different components of position and momentum can therefore be collected together in the following statements
$$\begin{aligned}
[\hat{X}_i,\hat{X}_j] &= [\hat{P}_i,\hat{P}_j] = 0 \\
[\hat{P}_i,\hat{X}_j] &= -[\hat{X}_j,\hat{P}_i] = -i\hbar\,\delta_{ij}
\end{aligned} \tag{7.42}$$
where we are now using the notation $\hat{X}_1 \equiv \hat{X}$; $\hat{X}_2 \equiv \hat{Y}$; etc.
Expressions for the commutation relations between other pairs of operators representing dynamical variables can be obtained in a similar manner, although not every commutator is equal to a constant, as was the case above.
Worked Example 7.4 Find an expression for the commutator bracket of the $x$ coordinate of a particle and its kinetic energy.
Solution
$$\begin{aligned}
[\hat{X},\hat{T}] &= \frac{1}{2m}[\hat{X},\hat{P}^2] \\
&= \frac{1}{2m}\left(\hat{X}\hat{P}_x^2 - \hat{P}_x^2\hat{X} + \hat{X}\hat{P}_y^2 - \hat{P}_y^2\hat{X} + \hat{X}\hat{P}_z^2 - \hat{P}_z^2\hat{X}\right) \\
&= \frac{1}{2m}\left[(\hat{P}_x\hat{X} + i\hbar)\hat{P}_x - \hat{P}_x(\hat{X}\hat{P}_x - i\hbar)\right] \\
&= i\hbar\hat{P}_x/m
\end{aligned}$$
where we have used (7.42).
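The commutators (7.41) and the result of Worked Example 7.4 can be verified symbolically by letting the operators act on an arbitrary function. The sketch below assumes the position representation in one dimension only (the $y$ and $z$ terms of the kinetic energy commute with $\hat{X}$ and are omitted) and uses SymPy; the helper names `X`, `P`, `T` and `commutator` are illustrative and not taken from the text.

```python
import sympy as sp

x = sp.symbols('x', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
psi = sp.Function('psi')(x)  # arbitrary wave function

def X(f):
    """Position operator: multiplication by x."""
    return x * f

def P(f):
    """Momentum operator -i*hbar*d/dx in the position representation."""
    return -sp.I * hbar * sp.diff(f, x)

def T(f):
    """One-dimensional kinetic energy operator P^2 / 2m."""
    return P(P(f)) / (2 * m)

def commutator(A, B, f):
    """Apply [A, B] = AB - BA to the function f."""
    return sp.simplify(A(B(f)) - B(A(f)))

px_x = commutator(P, X, psi)  # should equal -i*hbar*psi, as in (7.40)
x_t = commutator(X, T, psi)   # should equal (i*hbar/m) * P(psi)

print(px_x)
print(x_t)
```

Because `psi` is left unspecified, the check mirrors the argument in the text that the result is independent of the particular form of $\psi$.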
Commutation relations are of great importance in quantum mechanics. In fact it is possible to use the expressions (7.42) to define the operators representing position and momentum rather than making the explicit identifications, $\hat{X} \equiv x$ and $\hat{P}_x = -i\hbar\,\partial/\partial x$, etc. Clearly, other sets of operators could be chosen that would satisfy these commutation relations; for example, the momentum operator could be put equal to the algebraic variable $p$ with the one-dimensional position operator being $i\hbar\,\partial/\partial p$, but it can be shown that the physical predictions of quantum mechanics are independent of the choice of operators provided the commutation relations (7.42) are satisfied. It is possible to develop much of the formal theory of quantum mechanics using the commutation relations without postulating particular forms for the operators and this “representation-free” approach is employed in many advanced texts. It is also sometimes useful to represent dynamical variables by matrices instead of algebraic operators and this will be developed further in Chapter 9. However, although we shall prove several important relations without using explicit expressions for the operators, we shall use the “position representation” set up in Postulate 3 whenever this makes the argument clearer or more readily applicable to particular examples. We shall now use the concept of the commutator bracket to discuss the ideas of compatible measurements and the uncertainty principle.
Compatibility
Two physical observables are said to be compatible if the operators representing them have a common set of eigenfunctions. This means that if one quantity is measured, the resulting wave function of the system will be one of the common eigenfunctions; a subsequent measurement of the other quantity will then have a completely predictable result and will leave the wave function unchanged. (As will be seen later, this statement has to be modified in the degenerate case.) We shall now show that the operators representing compatible measurements commute. Let the operators be $\hat{Q}$ and $\hat{R}$ with respective eigenvalues $q_n$ and $r_n$ and common eigenfunctions $\phi_n$. Then, if $\psi$ is any physical wave function, which can therefore be written in the form (7.28)
$$\psi = \sum_n a_n\phi_n$$
we have
$$\begin{aligned}
[\hat{Q},\hat{R}]\psi &= \sum_n a_n(\hat{Q}\hat{R}\phi_n - \hat{R}\hat{Q}\phi_n) \\
&= \sum_n a_n(\hat{Q}r_n\phi_n - \hat{R}q_n\phi_n) \\
&= \sum_n a_n(r_nq_n\phi_n - q_nr_n\phi_n) \\
&= 0
\end{aligned}$$
Thus $[\hat{Q},\hat{R}] = 0$ if $\hat{Q}$ and $\hat{R}$ represent compatible observables. The converse can also be proved in the nondegenerate case (the degenerate case is discussed later in this chapter) as we now show. Given that the operators $\hat{Q}$ and $\hat{R}$ commute, let $\phi_n$ be an eigenfunction of $\hat{Q}$ so that
$$\hat{Q}\hat{R}\phi_n = \hat{R}\hat{Q}\phi_n = \hat{R}q_n\phi_n = q_n\hat{R}\phi_n \tag{7.43}$$
It follows directly from (7.43) that $(\hat{R}\phi_n)$ is also an eigenfunction of $\hat{Q}$ with eigenvalue $q_n$. In the absence of degeneracy, therefore, $(\hat{R}\phi_n)$ can differ from $\phi_n$ only by a multiplicative constant. If we call this constant $r_n$ we have
$$\hat{R}\phi_n = r_n\phi_n$$
and $\phi_n$ must therefore be an eigenfunction of both $\hat{Q}$ and $\hat{R}$. Thus, in order to test whether two quantities are compatible, it is sufficient to calculate the commutator of the operators representing them and check whether it is equal to zero. For example, it follows from this and the commutation relations (7.42) that measurements of the three positional coordinates can be made compatibly, as can measurements of the three components of momentum. Moreover, a momentum component can be measured compatibly with the measurement of either of the other two position coordinates, but not with the position coordinate in the same direction. In the case of a free particle, the energy operator $(\hat{P}^2/2m)$ commutes with the momentum operator $(\hat{\mathbf{P}})$ and these two quantities can therefore be measured compatibly: the common eigenfunctions in this case are plane waves of the form (7.8).
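Both directions of the compatibility argument can be illustrated with finite matrices standing in for the operators. This is a hedged sketch, not a construction used in the text: the matrices `Q` and `R` are built from an arbitrary shared eigenbasis `U`, and the eigenvalues in `q` and `r` are made-up illustrative numbers, with `q` chosen nondegenerate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random unitary matrix whose columns play the role of the common eigenfunctions.
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(M)

q = np.array([1.0, 2.0, 3.0, 4.0])   # nondegenerate eigenvalues of Q (illustrative)
r = np.array([0.5, -1.0, 0.5, 2.0])  # eigenvalues of R (illustrative)
Q = U @ np.diag(q) @ U.conj().T
R = U @ np.diag(r) @ U.conj().T

# Operators with a common set of eigenvectors commute.
comm = Q @ R - R @ Q

# Converse (nondegenerate case): every eigenvector of Q is an eigenvector of R.
_, vecs = np.linalg.eigh(Q)
residuals = []
for k in range(n):
    v = vecs[:, k]
    Rv = R @ v
    r_k = np.vdot(v, Rv)  # candidate eigenvalue of R
    residuals.append(np.linalg.norm(Rv - r_k * v))

print(np.max(np.abs(comm)), max(residuals))
```

Both printed numbers should be zero to within rounding error. Note that the eigenvalues of `R` are allowed to repeat; only the nondegeneracy of `Q` is used in the converse check.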
7.5 THE UNCERTAINTY PRINCIPLE
One of the most well-known ideas in quantum mechanics is Heisenberg’s uncertainty principle, which we discussed briefly in Chapter 4. Arguably, it is also the most misunderstood topic in the field and we shall try to clear up some of this confusion later in this section. However, before doing so we shall use the formalism developed so far to obtain a general form of the uncertainty principle applicable to any pair of measurements. We first need to establish some properties of pairs of Hermitian operators. We will show that if $\hat{Q}$ and $\hat{R}$ are Hermitian operators then, although the product $\hat{Q}\hat{R}$ need not be Hermitian, the expressions $(\hat{Q}\hat{R} + \hat{R}\hat{Q})$ and $i(\hat{Q}\hat{R} - \hat{R}\hat{Q})$ are always Hermitian. We have, using (7.13) and the fact that $\hat{Q}$ and $\hat{R}$ are Hermitian
$$\begin{aligned}
\int f\hat{Q}\hat{R}g\,d\tau &= \int (\hat{R}g)\hat{Q}^{*}f\,d\tau \\
&= \int (\hat{Q}^{*}f)\hat{R}g\,d\tau \\
&= \int g\hat{R}^{*}\hat{Q}^{*}f\,d\tau
\end{aligned} \tag{7.44}$$
Similarly
$$\int f\hat{R}\hat{Q}g\,d\tau = \int g\hat{Q}^{*}\hat{R}^{*}f\,d\tau \tag{7.45}$$
Thus, adding (7.44) and (7.45)
$$\int f(\hat{Q}\hat{R} + \hat{R}\hat{Q})g\,d\tau = \int g(\hat{Q}\hat{R} + \hat{R}\hat{Q})^{*}f\,d\tau \tag{7.46}$$
while, subtracting (7.44) and (7.45) and multiplying by $i$
$$\int f[i(\hat{Q}\hat{R} - \hat{R}\hat{Q})]g\,d\tau = \int g[i(\hat{Q}\hat{R} - \hat{R}\hat{Q})]^{*}f\,d\tau \tag{7.47}$$
which proves the above statements. We note that it is an obvious corollary of (7.46) that if $\hat{Q}$ is Hermitian so is $\hat{Q}^2$. Thus, for example, the Hermitian property of the kinetic energy operator, $\hat{T} = \hat{P}^2/2m$, follows directly from the fact that $\hat{P}$ is Hermitian.
Consider now a series of measurements of the quantity represented by the operator $\hat{Q}$ on a system whose wave function is $\psi$ before each measurement. The average result is equal to the expectation value $\langle\hat{Q}\rangle$ which was shown earlier (7.39) to be given by
$$\langle\hat{Q}\rangle = \int \psi^{*}\hat{Q}\psi\,d\tau$$
We can also estimate the average amount by which the result of such a measurement would be expected to deviate from this expectation value: the operator representing this “uncertainty” is $(\hat{Q} - \langle\hat{Q}\rangle)$, so if the root-mean-square deviation from the mean is $\Delta q$, we have
$$\begin{aligned}
\Delta q^2 &= \int \psi^{*}(\hat{Q} - \langle\hat{Q}\rangle)^2\psi\,d\tau \\
&= \int \psi^{*}\hat{Q}'\hat{Q}'\psi\,d\tau
\end{aligned}$$
where $\hat{Q}'$ is defined as $(\hat{Q} - \langle\hat{Q}\rangle)$. Clearly if $\hat{Q}$ is Hermitian, so is $\hat{Q}'$ and therefore
$$\begin{aligned}
\Delta q^2 &= \int (\hat{Q}'\psi)(\hat{Q}'^{*}\psi^{*})\,d\tau \\
&= \int |\hat{Q}'\psi|^2\,d\tau
\end{aligned} \tag{7.48}$$
In the same way, if we had carried out the measurement represented by the operator $\hat{R}$ on the same system, the expectation value would have been $\langle\hat{R}\rangle$ and the root-mean-square deviation $\Delta r$ where
$$\Delta r^2 = \int |\hat{R}'\psi|^2\,d\tau$$
and $\hat{R}' = \hat{R} - \langle\hat{R}\rangle$. Now consider the product
$$\begin{aligned}
\Delta q^2\,\Delta r^2 &= \int |\hat{Q}'\psi|^2\,d\tau \int |\hat{R}'\psi|^2\,d\tau \\
&\geq \left|\int (\hat{Q}'^{*}\psi^{*})(\hat{R}'\psi)\,d\tau\right|^2
\end{aligned} \tag{7.49}$$
where the last step was obtained using a mathematical relationship known as Schwarz’s inequality, which states that
$$\int |f|^2\,d\tau \int |g|^2\,d\tau \geq \left|\int f^{*}g\,d\tau\right|^2$$
where $f$ and $g$ are any integrable functions of $\mathbf{r}$.¹ Now, again using the Hermitian property of $\hat{Q}'$
$$\begin{aligned}
\int (\hat{Q}'^{*}\psi^{*})(\hat{R}'\psi)\,d\tau &= \int \psi^{*}\hat{Q}'\hat{R}'\psi\,d\tau \\
&= \frac{1}{2}\int \psi^{*}(\hat{Q}'\hat{R}' - \hat{R}'\hat{Q}')\psi\,d\tau + \frac{1}{2}\int \psi^{*}(\hat{Q}'\hat{R}' + \hat{R}'\hat{Q}')\psi\,d\tau
\end{aligned} \tag{7.50}$$
¹ To prove Schwarz’s inequality, consider the expression
$$\int \left| f\int |g|^2\,d\tau - g\int fg^{*}\,d\tau \right|^2 d\tau$$
This expression must be greater than or equal to zero because the integrand is nowhere negative by definition. We can rewrite the integrand as a product of a function and its complex conjugate and get
$$\int \left[f^{*}\int |g|^2\,d\tau - g^{*}\int f^{*}g\,d\tau\right]\left[f\int |g|^2\,d\tau - g\int fg^{*}\,d\tau\right] d\tau \geq 0$$
When we multiply out the product of the two square brackets, two terms are equal and opposite and the others have a common factor, $\int |g|^2\,d\tau$, that can be cancelled out. We then get
$$\int |f|^2\,d\tau \int |g|^2\,d\tau - \left|\int f^{*}g\,d\tau\right|^2 \geq 0$$
which is Schwarz’s inequality.
It was previously shown that if $\hat{Q}'$ and $\hat{R}'$ are Hermitian operators so are $(\hat{Q}'\hat{R}' + \hat{R}'\hat{Q}')$ and $i(\hat{Q}'\hat{R}' - \hat{R}'\hat{Q}')$, which means that the expectation values of the latter two operators must be real, because all their eigenvalues are real. It follows that the first term on the right-hand side of (7.50) is purely imaginary and the second term is purely real, so the right-hand side of (7.49) can be expressed as the sum of squares of the two terms on the right-hand side of (7.50) and we get
$$\begin{aligned}
\Delta q^2\,\Delta r^2 &\geq \frac{1}{4}\left[\left|\int \psi^{*}[\hat{Q}',\hat{R}']\psi\,d\tau\right|^2 + \left|\int \psi^{*}(\hat{Q}'\hat{R}' + \hat{R}'\hat{Q}')\psi\,d\tau\right|^2\right] \\
&\geq \frac{1}{4}\left|\int \psi^{*}[\hat{Q}',\hat{R}']\psi\,d\tau\right|^2
\end{aligned}$$
Now it follows immediately from the definitions of $\hat{Q}'$ and $\hat{R}'$ that $[\hat{Q}',\hat{R}'] = [\hat{Q},\hat{R}]$ so we have
$$\Delta q\,\Delta r \geq \frac{1}{2}\left|\int \psi^{*}[\hat{Q},\hat{R}]\psi\,d\tau\right|$$
That is
$$\Delta q\,\Delta r \geq \frac{1}{2}\left|\langle[\hat{Q},\hat{R}]\rangle\right| \tag{7.51}$$
Thus, if we know the commutator of two operators, we can calculate the minimum value of the product of the uncertainties associated with a series of measurements represented by each of them, on a system whose wave function is $\psi$ before each measurement. This is known as the generalized uncertainty principle. In the case where $\hat{Q}$ and $\hat{R}$ are the operators representing a position coordinate and a corresponding component of momentum, we have $[\hat{X},\hat{P}_x] = i\hbar$ and therefore
$$\Delta x\,\Delta p_x \geq \frac{1}{2}\hbar \tag{7.52}$$
which is the Heisenberg uncertainty principle referred to in Chapter 4. We note that in this particular case the uncertainty product is independent of the wave function of the system. It is important to remember that (7.51) is an inequality and we cannot use it to deduce, for example, that, if two operators commute, the product of their uncertainties must be zero. To see this, consider predictions of the values of the coordinates of a particle initially in the ground state of the hydrogen atom; both $\Delta x$ and $\Delta y$ are nonzero, even though their operators commute.
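The generalized uncertainty principle (7.51) can be tested numerically by replacing the operators with finite Hermitian matrices and the wave function with a normalized vector. This is a sketch under those assumptions only: the matrices and the state are random and have no physical meaning, and the function names `random_hermitian`, `expectation` and `rms_deviation` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(n):
    """Random n x n Hermitian matrix, used as a stand-in for an observable."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def expectation(op, psi):
    """Matrix analogue of (7.39): <psi| op |psi> for a normalized psi."""
    return np.vdot(psi, op @ psi)

def rms_deviation(op, psi):
    """Matrix analogue of (7.48): norm of (op - <op>) psi."""
    shifted = op - expectation(op, psi).real * np.eye(len(psi))
    return np.linalg.norm(shifted @ psi)

n = 6
Q, R = random_hermitian(n), random_hermitian(n)

checks = []
for _ in range(100):
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    psi = psi / np.linalg.norm(psi)
    lhs = rms_deviation(Q, psi) * rms_deviation(R, psi)
    rhs = 0.5 * abs(expectation(Q @ R - R @ Q, psi))
    checks.append((lhs, rhs))

print(all(lhs >= rhs - 1e-12 for lhs, rhs in checks))
```

In most of the random trials the left-hand side exceeds the right-hand side by a clear margin, which illustrates the point made above that (7.51) is a lower bound and not an equality.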
FIGURE 7.2 Single-slit diffraction. Light reaching the right-hand slit has also passed through the left-hand slit and is therefore travelling parallel to the horizontal axis. When it emerges from the second slit, it spreads out to form a diffraction pattern. The graph at the right shows the intensity of the diffraction pattern as a function of θ.
Thus we have shown that the uncertainty principle follows directly from the fundamental postulates of quantum mechanics. However, it is important to understand the nature of the “uncertainties” $\Delta q$ and $\Delta r$. First, these are not experimental errors associated with any particular measurement, but refer to the root-mean-square deviations or “average spread” of a set of repeated measurements. This is the most we can predict about the outcome of the experiments, so we can look on the uncertainties as “errors in prediction.” Second, the measurements represented by $\hat{Q}$ and $\hat{R}$ are obtained in separate sets of experiments, starting with the same wave function in each case. There is nothing in this to justify the popular notion that uncertainty arises because measuring one quantity “disturbs” the system so as to introduce errors into the measurement of the other. These points should become clearer in the following example.
Single-slit diffraction
Consider a beam of particles of definite momentum $p$ travelling parallel to the $z$ axis (i.e., the wave function is a plane wave whose wave vector has components $0, 0, p/\hbar$) towards a slit of width $a$ in the $x$ direction where the origin of coordinates is the centre of the slit. We consider how we can measure the $x$ coordinate of the particle position and the $x$ component of momentum after the particles pass through the slit, given that the wave function $\psi$ corresponds to the wave function of a particle emerging from the slit. We first consider a set of experiments carried out to measure $x$, by placing a photographic film immediately behind the slit and allowing a large number of particles to pass through. If the beam is uniform across the slit, any particle has equal probability of passing through any part of the slit and it follows that the expectation value of its $x$ coordinate will be zero with a standard deviation $\Delta x$ of about $a/4$. (A direct application of (7.48) shows that its value is actually $a/2\sqrt{3}$.) We measure the momentum uncertainty by a separate set of experiments in which we allow the particles to pass well beyond the slit before detecting them on a screen as in Figure 7.2. Here, the wave function has the form of the well-known single-slit diffraction pattern, the first minimum of which occurs at an angle $\theta_1$ where $\sin\theta_1 = \lambda/a = 2\pi\hbar/ap$. Thus most of the particles passing through the slit travel in a direction within the central maximum of the diffraction pattern, so we can predict that measurements of the $x$ component of the momentum of the particles will have an expectation value of zero with a spread $\Delta p_x$ approximately equal to $p\sin\theta_1$. Hence
$$\Delta x\,\Delta p_x \simeq \frac{a}{4}\,p\,\frac{2\pi\hbar}{ap} = \frac{1}{2}\pi\hbar > \frac{1}{2}\hbar \tag{7.53}$$
in agreement with the uncertainty principle. We should note a number of points about this example. First, when we record the diffraction pattern we can measure the $x$ component of the particle momentum as precisely as we like by noting the point at which it blackens the screen (any error in $\theta$ due to the size of the slit or to that of the blackened area can be made indefinitely small by placing the screen far enough behind the slit). This is perfectly consistent with the uncertainty principle because the uncertainty $\Delta p_x$ represents our inability to predict the outcome of an experimental measurement (we cannot tell in advance where on the screen the particle will arrive) and not the accuracy of the measurement actually made. The second point to note is that the uncertainty principle is about the predictions of the results of two separate sets of experiments. Sometimes one of these is implied rather than performed, because the expected results are so obvious: in the present example, the measurement of $x$ comes into this category. A mistake that is sometimes made is to say that passing the particle through the slit constitutes a “measurement of position with error $\Delta x$,” following which its momentum “is measured with error $\Delta p_x$.” However, unless the position of the particle has been recorded, this quantity has not been measured.
Finally, we return to the example of the Heisenberg microscope referred to in Chapter 4. As a result of many photons having been scattered by the electron into the microscope we obtain an image of the electron that is “blurred” by an amount $\Delta x$. From the earlier discussion in this section, we see that what this means is that, if we were to follow up this measurement by another that determined $x$ precisely and repeat the whole process a number of times, the spread in the values of $x$ would be $\Delta x$. We also derived the uncertainty in the $x$ component of the electron momentum, $\Delta p_x$, and we found that these two uncertainties are related by the uncertainty principle as expected. This is correct, but to actually measure $\Delta p_x$ we would have to conduct a separate series of experiments in which we recorded the particles just after they emerged from the lens. Once again two sets of experiments are required to measure the spread in the results for the two variables.
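The numbers quoted in the single-slit estimate (7.53) are easy to reproduce. The sketch below is only a rough check under illustrative assumptions: units with $\hbar = 1$, and arbitrary values of the slit width `a` and beam momentum `p` chosen so that $\sin\theta_1 < 1$. It evaluates $\Delta x$ for a uniform distribution across the slit and the estimate $\Delta p_x \approx p\sin\theta_1$.

```python
import math

hbar = 1.0  # units with hbar = 1 (assumption of this sketch)
a = 1.0     # slit width (arbitrary illustrative value)
p = 50.0    # beam momentum (arbitrary illustrative value)

# Uniform probability density 1/a over -a/2 <= x <= a/2, so <x> = 0.
# Midpoint rule for <x^2> = (1/a) * integral of x^2 dx across the slit.
n_steps = 100000
dx = a / n_steps
mean_x2 = sum(((-a / 2 + (i + 0.5) * dx) ** 2) * dx for i in range(n_steps)) / a
delta_x_exact = math.sqrt(mean_x2)

# First diffraction minimum and the resulting momentum spread.
sin_theta1 = 2 * math.pi * hbar / (a * p)
delta_p = p * sin_theta1

rough_product = (a / 4) * delta_p          # estimate used in (7.53)
refined_product = delta_x_exact * delta_p  # using the exact Delta x

print(delta_x_exact, rough_product, refined_product)
```

With the exact $\Delta x = a/2\sqrt{3}$ the product is somewhat larger than the rough value $\pi\hbar/2$, and in both cases it exceeds $\hbar/2$, independently of the values chosen for `a` and `p`.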
Worked Example 7.5 Use the results obtained in Worked Example 7.3 to test the Heisenberg uncertainty principle in the case of a particle in the ground state of a one-dimensional potential well.
Solution As $\langle\hat{X}\rangle = 0$ and $\langle\hat{P}_x\rangle = 0$, $(\Delta x)^2 = \langle\hat{X}^2\rangle$ and $(\Delta p_x)^2 = \langle\hat{P}_x^2\rangle$. Thus, using expressions derived in Worked Example 7.3
$$\Delta x\,\Delta p_x = a\left(\frac{1}{3} - \frac{2}{\pi^2}\right)^{1/2} \times \frac{\pi\hbar}{2a} = 0.36\,\frac{\pi\hbar}{2} = 0.57\hbar > \frac{1}{2}\hbar$$
7.6 THE TIME DEPENDENCE OF THE WAVE FUNCTION
Nearly all our discussion so far has related to the properties of wave functions immediately after measurement (i.e., to the eigenfunctions of operators representing physical quantities) and we have given very little consideration to the evolution of the wave function in time. However, at the beginning of Chapter 5 we set up the time-dependent Schrödinger equation, which we generalized to three dimensions in Chapter 6 as
$$-\frac{\hbar^2}{2m}\nabla^2\Psi + V(\mathbf{r},t)\Psi = i\hbar\frac{\partial\Psi}{\partial t} \tag{7.54}$$
In the notation of the present chapter this becomes
$$\hat{H}\Psi = i\hbar\frac{\partial\Psi}{\partial t} \tag{7.55}$$
where $\hat{H}$ is the Hamiltonian operator for the system. In the earlier chapters we separated off the time dependence and obtained the time-independent Schrödinger equation, which we have since recognized as the energy eigenvalue equation. We shall now postulate that the time-dependent Schrödinger equation (7.55) must always be satisfied even when the energy of the system is not known.
Postulate 5 Between measurements, the development of the wave function with time is governed by the time-dependent Schrödinger equation.
Note the phrase “between measurements.” We saw earlier (Postulate 2) that measurement generally leads to collapse of the wave function into one of the eigenfunctions of the measurement operator. This collapse is not a consequence of the Schrödinger equation, but is a separate type of time dependence associated with the act of measurement. The fact that quantum mechanics predicts two different time dependencies for the wave function underlies what is known as the “quantum measurement problem,” which will be discussed in the last chapter.
We shall return to a more detailed discussion of the Schrödinger time dependence in Chapter 11, but in the meantime we discuss only the particular case where $\hat{H}$ is not itself a function of time; classically this corresponds to a closed system in which energy is conserved. If the energy eigenfunctions of the system obtained by solving
the energy eigenvalue equation (7.2) are $u_n$, then, by completeness, the wave function at any time $t$ can be expressed as a linear combination of the $u_n$
$$\Psi(\mathbf{r},t) = \sum_n a_n(t)u_n(\mathbf{r}) \tag{7.56}$$
where the coefficients $a_n$ are, in general, functions of time. Substituting (7.56) into (7.55) we get
$$\begin{aligned}
i\hbar\sum_n \frac{da_n}{dt}u_n &= \sum_n a_n\hat{H}u_n \\
&= \sum_n a_nE_nu_n
\end{aligned}$$
Thus
$$\sum_n \left(i\hbar\frac{da_n}{dt} - a_nE_n\right)u_n = 0 \tag{7.57}$$
Equation (7.57) must be true at all points in space so, because the $u_n$ are linearly independent, the terms in brackets must vanish leading to
$$i\hbar\frac{da_n}{dt} = a_nE_n$$
that is
$$a_n(t) = a_n(0)\exp(-iE_nt/\hbar) \tag{7.58}$$
where $a_n(0)$ is the value of $a_n$ at some initial time $t = 0$. Hence from (7.56) and (7.58) a general solution to the time-dependent Schrödinger equation is
$$\Psi(\mathbf{r},t) = \sum_n a_n(0)u_n(\mathbf{r})\exp(-iE_nt/\hbar) \tag{7.59}$$
Now suppose we carry out a measurement of the energy of the system at time $t = 0$. From the earlier postulates we should obtain a result equal to one of the energy eigenvalues, say, $E_m$. Moreover, the wave function immediately after the measurement will be the corresponding eigenfunction $u_m$. This is equivalent to saying that at $t = 0$, $a_m(0) = 1$ and $a_n(0) = 0$, $n \neq m$. In this case (7.59) becomes
$$\Psi(\mathbf{r},t) = u_m\exp(-iE_mt/\hbar) \tag{7.60}$$
which was the particular form of solution used in Chapters 5 and 6. We notice that the right-hand side of (7.60) differs from $u_m$ only by a phase factor, so $\Psi$ is an eigenfunction of $\hat{H}$ at all times (remember $\hat{H}$ is assumed to be time independent). Thus any later measurement of the energy will again yield the value $E_m$ and we conclude that, in this sense, energy is conserved in such a quantum system, as it would be classically. Moreover, the value of any quantity that can be measured compatibly with the energy and whose operator therefore commutes with $\hat{H}$ (e.g., the linear momentum of a free particle) will also be conserved.
This result emphasizes the great importance of the time-independent Schrödinger equation (the energy eigenvalue equation) whose solutions we discussed in some detail in earlier chapters. Much of quantum mechanics concerns the properties of the energy eigenstates of systems whose Hamiltonians are independent of time, often referred to as “stationary states” because of the properties just described. Moreover, many experimental measurements (e.g. atomic spectra) refer to systems that change from one nearly stationary state to another under the influence of a time-dependent potential. We shall discuss the quantum mechanics of such changes in Chapter 11 where we shall see that, in many cases, the time-dependent part of the potential can be considered as causing small “perturbations” on the stationary states of the system.
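The time dependence (7.58) of the expansion coefficients can be illustrated with a short numerical sketch. The energies in `E` and the initial coefficients in `a0` below are made-up illustrative values in units with $\hbar = 1$, and `coefficients` is a hypothetical helper name. The code checks that the coefficients satisfy $i\hbar\,da_n/dt = a_nE_n$ and that the probabilities $|a_n|^2$ do not change with time.

```python
import numpy as np

hbar = 1.0                                        # units with hbar = 1 (assumption)
E = np.array([0.5, 1.5, 2.5])                     # illustrative energy eigenvalues
a0 = np.array([0.6, 0.0, 0.8], dtype=complex)     # illustrative normalized a_n(0)

def coefficients(t):
    """Coefficients a_n(t) according to (7.58)."""
    return a0 * np.exp(-1j * E * t / hbar)

t = 2.7
a_t = coefficients(t)

# Central finite difference for da_n/dt, to test i*hbar*da_n/dt = a_n*E_n.
dt = 1e-6
dadt = (coefficients(t + dt) - coefficients(t - dt)) / (2 * dt)
residual = np.max(np.abs(1j * hbar * dadt - E * a_t))

probs_0 = np.abs(a0) ** 2
probs_t = np.abs(a_t) ** 2

print(residual, probs_0, probs_t)
```

Each coefficient acquires only a phase factor, so the probability of obtaining a given energy in a later measurement is unchanged; this is the sense in which energy is conserved in the discussion above.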
7.7 DEGENERACY
The previous sections of this chapter apply to cases where the eigenfunctions of the operators representing physical measurements are nondegenerate. That is, we have assumed that if
$$\hat{Q}\phi_n = q_n\phi_n$$
then
$$q_n \neq q_m \quad \text{for all } n \neq m \tag{7.61}$$
However, degeneracy is in fact quite a common feature of physical systems, and indeed we saw in Chapter 6 that it was a necessary consequence of symmetry in a number of three-dimensional examples. In the present section, therefore, we shall extend our discussion of the formal properties of quantum mechanics to the degenerate case. There have been two occasions where we have explicitly assumed that systems were nondegenerate: first when discussing orthonormality and second when discussing compatible measurements. We shall now extend our discussion of both these points to include degeneracy.
One property that is unique to the degenerate case is that any linear combination of degenerate eigenfunctions ($\phi_n$) with the same eigenvalue ($q$) is also an eigenfunction with that eigenvalue. This follows directly from substitution into the eigenvalue equation
$$\begin{aligned}
\hat{Q}\sum_n c_n\phi_n &= \sum_n c_n\hat{Q}\phi_n \\
&= q\sum_n c_n\phi_n
\end{aligned} \tag{7.62}$$
Turning now to orthogonality, the proof that eigenfunctions must be orthogonal no longer holds, but we can show that it is always possible to construct a set of orthogonal eigenfunctions from a set of nonorthogonal eigenfunctions, $\phi_n$. Consider the function $\phi_2'$ where
$$\phi_2' = S_{12}\phi_1 - \phi_2 \quad \text{and} \quad S_{12} = \int \phi_1^{*}\phi_2\,d\tau$$
and $\phi_1$ is taken to be normalized. $\phi_2'$ is a linear combination of $\phi_1$ and $\phi_2$ and is therefore also an eigenfunction of $\hat{Q}$ with eigenvalue $q$.
$$\begin{aligned}
\int \phi_1^{*}\phi_2'\,d\tau &= S_{12}\int \phi_1^{*}\phi_1\,d\tau - \int \phi_1^{*}\phi_2\,d\tau \\
&= S_{12} - S_{12} = 0
\end{aligned}$$
Thus $\phi_2'$ is orthogonal to $\phi_1$ and a third function orthogonal to these two can be similarly shown to be $\phi_3'$ where
$$\phi_3' = S_{13}\phi_1 + S_{23}\phi_2' - \phi_3$$
$S_{13}$ is similar to $S_{12}$, and
$$S_{23} = \int \phi_2'^{*}\phi_3\,d\tau \bigg/ \int |\phi_2'|^2\,d\tau$$
This procedure (known as Schmidt orthogonalization) can be continued until a complete orthogonal set of degenerate eigenfunctions has been set up. Thus all the results in this chapter that depend on orthogonality can be extended to the degenerate case, provided it is assumed that such an orthogonal set has first been constructed. Throughout the rest of this book, we shall assume that this has indeed been done and that all sets of eigenfunctions we encounter are orthogonal.
Turning now to a discussion of compatible measurements, the proof of the theorem that the operators representing compatible measurements commute is unaffected by the presence of degeneracy, but the converse (that measurements represented by commuting operators are compatible) requires modification. Consider the case where $\phi_1$ and $\phi_2$ are two degenerate eigenfunctions (eigenvalue $q$) of the operator $\hat{Q}$, which commutes with the operator $\hat{R}$. Assume for the moment that there are only two linearly independent eigenfunctions with this eigenvalue. Then
$$\hat{Q}\hat{R}\phi_1 = \hat{R}\hat{Q}\phi_1 = q\hat{R}\phi_1 \tag{7.63}$$
so $(\hat{R}\phi_1)$ is also an eigenfunction of $\hat{Q}$ with eigenvalue $q$, and it must therefore be a linear combination of $\phi_1$ and $\phi_2$. That is
$$\hat{R}\phi_1 = a\phi_1 + b\phi_2 \tag{7.64}$$
where $a$ and $b$ are constants. An identical argument, replacing $\phi_1$ with $\phi_2$ in (7.63), leads to
$$\hat{R}\phi_2 = c\phi_1 + d\phi_2 \tag{7.65}$$
where $c$ and $d$ are constants. We shall now show that particular linear combinations of $\phi_1$ and $\phi_2$ that are eigenfunctions of $\hat{R}$ exist. Let one of these be $\phi' = A\phi_1 + B\phi_2$ and let its eigenvalue be $r$, then
$$\begin{aligned}
\hat{R}\phi' &\equiv A\hat{R}\phi_1 + B\hat{R}\phi_2 \equiv Aa\phi_1 + Ab\phi_2 + Bc\phi_1 + Bd\phi_2 \\
&= r\phi' \equiv rA\phi_1 + rB\phi_2
\end{aligned} \tag{7.66}$$
Equating coefficients of $\phi_1$ and $\phi_2$ in (7.66) we get
$$\begin{aligned}
(a - r)A + cB &= 0 \\
bA + (d - r)B &= 0
\end{aligned} \tag{7.67}$$
Equations (7.67) have nontrivial solutions only if the determinant of the coefficients of $A$ and $B$ vanishes, leading to two possible values of $r$. In general these will not be equal, so the eigenfunctions of $\hat{R}$ need not be degenerate. This result can readily be extended to the case of an arbitrary number of degenerate functions. We therefore conclude that, although all possible forms of the degenerate eigenfunctions of one operator are not necessarily eigenfunctions of the other, a set of eigenfunctions that are common to both operators can always be chosen. Once this has been done, compatibility has the same meaning as it did in the nondegenerate case.
Finally, we consider the implications of the previous discussion for the quantum theory of measurement. Following a measurement represented by the operator $\hat{Q}$, which produces the result $q$, the system will have a wave function equivalent to one of the set of degenerate eigenfunctions that correspond to this eigenvalue. However, this need not be an eigenfunction of the operator $\hat{R}$ and the exact result of a measurement represented by this operator is not predictable even though $\hat{Q}$ and $\hat{R}$ commute. Nevertheless, the only possible results are those calculable in the manner described earlier, and a subsequent measurement of the quantity represented by $\hat{R}$ will leave $q$ unchanged and result in the wave function being equivalent to one of the set of common eigenfunctions. The results of further measurements represented by $\hat{Q}$ or $\hat{R}$ will then be completely predictable.
As an example of this, consider the energy eigenfunctions of the hydrogen atom discussed in Chapter 6. A measurement of the energy that showed the atom to have principal quantum number $n = 2$ would result in the atom being in one of four degenerate states ($l = 0$, $m = 0$; $l = 1$, $m = 0$ or $\pm 1$) or in some linear combination of them. As will be shown in Chapter 8, a subsequent measurement of the total angular momentum must yield a result $\sqrt{l(l+1)}\,\hbar$ where $l$ is either zero or one, but which of these is unpredictable. Following this, if $l$ has been found to be equal to one, say, the value of $m$ will be similarly unpredictable unless the $z$ component of angular momentum is also measured. Each of these measurements leaves the values of the previously measured quantities unchanged and, once all three measurements have been made, the wave function of the system is completely specified and the results of further measurements of any of these quantities are completely predictable.
Worked Example 7.6 Show that in the case of a free particle in one dimension, the eigenstates of $\hat{P}$ are also eigenstates of $\hat{H}$, but that the eigenstates of $\hat{H}$ are not necessarily eigenstates of $\hat{P}$.
Solution The eigenvalues of $\hat{P}$ are $\hbar k$ and the corresponding eigenfunctions are $\exp(ikx)$. These are also eigenfunctions of $\hat{H}$ with eigenvalues $\hbar^2k^2/2m$. However, states where $k$ has the same magnitude, but opposite sign, are degenerate eigenstates of $\hat{H}$, so linear combinations of these should also be so. Explicitly
$$\hat{H}[A\exp(ikx) + B\exp(-ikx)] = \frac{\hbar^2}{2m}[k^2A\exp(ikx) + k^2B\exp(-ikx)]$$
confirming that the linear combination is an eigenfunction of $\hat{H}$, but
$$\hat{P}[A\exp(ikx) + B\exp(-ikx)] = -i\hbar[ikA\exp(ikx) - ikB\exp(-ikx)]$$
so the linear combination is not an eigenstate of $\hat{P}$ unless either $A$ or $B$ equals zero. NB in three dimensions, all $\mathbf{k}$ with the same magnitude and whatever direction correspond to states that are degenerate with respect to $\hat{H}$, but not $\hat{\mathbf{P}}$.
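The Schmidt orthogonalization procedure described earlier in this section can be sketched numerically by representing the degenerate eigenfunctions as complex vectors, with the integral $\int f^{*}g\,d\tau$ replaced by a vector inner product. The three starting vectors are random and purely illustrative, and the helper name `overlap` is an assumption of this sketch; the formulas for `phi2p` and `phi3p` follow the expressions for $\phi_2'$ and $\phi_3'$ in the text, with `phi1` normalized first.

```python
import numpy as np

def overlap(f, g):
    """Discrete analogue of the integral of f* g (np.vdot conjugates f)."""
    return np.vdot(f, g)

rng = np.random.default_rng(2)
# Three linearly independent, nonorthogonal vectors standing in for
# degenerate eigenfunctions phi_1, phi_2, phi_3.
phi1, phi2, phi3 = (rng.normal(size=5) + 1j * rng.normal(size=5) for _ in range(3))

phi1 = phi1 / np.sqrt(overlap(phi1, phi1).real)  # normalize phi_1

S12 = overlap(phi1, phi2)
phi2p = S12 * phi1 - phi2                        # phi_2'

S13 = overlap(phi1, phi3)
S23 = overlap(phi2p, phi3) / overlap(phi2p, phi2p).real
phi3p = S13 * phi1 + S23 * phi2p - phi3          # phi_3'

print(abs(overlap(phi1, phi2p)),
      abs(overlap(phi1, phi3p)),
      abs(overlap(phi2p, phi3p)))
```

All three printed overlaps should vanish to within rounding error. Note that `phi2p` and `phi3p` are orthogonal but not normalized, which is why `S23` carries the normalizing denominator.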
7.8 THE HARMONIC OSCILLATOR AGAIN
We discussed the harmonic oscillator energy states in Chapter 5 where we showed that these were quantized with values $(n + \tfrac{1}{2})\hbar\omega_c$, $\omega_c$ being the classical frequency of the oscillator and $n$ an integer $\geq 0$. We return to this problem as an example of the application of operator methods, which will lead to the same results. The techniques used here are very similar to those that will be used to analyze the properties of the angular momentum operators (Chapter 8) and similar techniques underlie much of quantum field theory, which is touched on in Chapter 14.
The energy of a harmonic oscillator is represented by the Hamiltonian operator
$$\hat{H} = \frac{\hat{P}^2}{2m} + \frac{1}{2}m\omega_c^2\hat{X}^2 \tag{7.68}$$
We can simplify the algebra in a manner similar to that used in Chapter 5 by using dimensionless variables such that $x$ is measured in units of $(\hbar/m\omega_c)^{1/2}$, $p$ in units of $(m\hbar\omega_c)^{1/2}$ and energy in units of $\hbar\omega_c$ so that (7.68) becomes
$$\hat{H} = \frac{1}{2}(\hat{P}^2 + \hat{X}^2) \tag{7.69}$$
We define two new operators in terms of $\hat{X}$ and $\hat{P}$
$$\begin{aligned}
\hat{R}_+ &= (\hat{P} + i\hat{X}) \\
\hat{R}_- &= (\hat{P} - i\hat{X})
\end{aligned} \tag{7.70}$$
In our units the commutation relation (7.41) becomes $[\hat{P},\hat{X}] = -i$. Using this with (7.70), we get
$$\hat{R}_+\hat{R}_- = (\hat{P}^2 + \hat{X}^2 - i[\hat{P},\hat{X}]) = 2\hat{H} - 1 \tag{7.71}$$
and
$$\hat{R}_-\hat{R}_+ = 2\hat{H} + 1 \tag{7.72}$$
so that
$$\hat{H} = \frac{1}{2}(\hat{R}_+\hat{R}_- + 1) = \frac{1}{2}(\hat{R}_-\hat{R}_+ - 1) \tag{7.73}$$
Using (7.73), we can write
$$\begin{aligned}
[\hat{R}_+,\hat{H}] &= \hat{R}_+\tfrac{1}{2}(\hat{R}_-\hat{R}_+ - 1) - \tfrac{1}{2}(\hat{R}_+\hat{R}_- + 1)\hat{R}_+ \\
&= -\hat{R}_+
\end{aligned} \tag{7.74}$$
Similarly
$$[\hat{R}_-,\hat{H}] = \hat{R}_- \tag{7.75}$$
Now, from (7.74)
$$\begin{aligned}
\hat{H}\hat{R}_+u_n &= \hat{R}_+\hat{H}u_n + \hat{R}_+u_n \\
&= (E_n + 1)(\hat{R}_+u_n)
\end{aligned} \tag{7.76}$$
where $u_n$ is an energy eigenfunction with eigenvalue $E_n$. Similarly, using (7.75) we get
$$\hat{H}\hat{R}_-u_n = (E_n - 1)(\hat{R}_-u_n) \tag{7.77}$$
Thus $(\hat{R}_+u_n)$ and $(\hat{R}_-u_n)$ are eigenfunctions of $\hat{H}$ with eigenvalues $(E_n + 1)$ and $(E_n - 1)$ respectively. It follows that the operators $\hat{R}_+$ and $\hat{R}_-$ move us up and down a “ladder” of eigenvalues. As a result $\hat{R}_+$ and $\hat{R}_-$ are known as “raising” and “lowering” operators, as “ladder operators”, or sometimes as “creation” and “annihilation” operators because they create or annihilate quanta of energy.
We also know from the form of the Hamiltonian that all energy levels must be positive. So, if $u_0$ is the eigenfunction corresponding to the minimum value of the energy, (7.77) can be satisfied only if $\hat{R}_-u_0 = 0$; hence, using (7.71)
$$0 = \hat{R}_+\hat{R}_-u_0 = (2\hat{H} - 1)u_0 = (2E_0 - 1)u_0 \tag{7.78}$$
Hence $E_0 = \tfrac{1}{2}$ and it follows that the energy levels are $E_n = (n + \tfrac{1}{2})$ in dimensionless units or
$$E_n = (n + \tfrac{1}{2})\hbar\omega_c \tag{7.79}$$
agreeing with the result in Chapter 5. This is an example of a “representation-free” derivation, in which the energy eigenvalues have been obtained without using an explicit form of the position and momentum operators.
The form of the eigenfunctions can also be shown to be the same as that deduced in Chapter 5. After making the substitutions $\hat{X} = x$, and $\hat{P} = -i\,\partial/\partial x$, the expression $\hat{R}_-u_0 = 0$ becomes
$$\frac{\partial u_0}{\partial x} + xu_0 = 0 \tag{7.80}$$
whose solution is
$$u_0 = A\exp(-x^2/2) \tag{7.81}$$
where $A$ is a normalizing constant, which again agrees with Chapter 5. The wave functions of the other states can be obtained by successively operating on $u_0$ by $\hat{R}_+$, again expressed in terms of $x$ and $-i\,\partial/\partial x$.
Worked Example 7.7 Use the raising operator and (7.81) to obtain unnormalized expressions for the n = 1 and n = 2 eigenfunctions of the harmonic oscillator.

Solution
$$u_1 = \hat{R}_+ u_0 = i\left(-\frac{\partial}{\partial x} + x\right)\exp(-x^2/2) = 2ix\exp(-x^2/2)$$
and, dropping the constant factor $2i$ from $u_1$ (which is allowed because the functions are unnormalized),
$$u_2 \propto \hat{R}_+\left[x\exp(-x^2/2)\right] = i\left(-\frac{\partial}{\partial x} + x\right)x\exp(-x^2/2) = i(2x^2 - 1)\exp(-x^2/2)$$
which agrees with the expressions derived in Chapter 5 (5.68) when the change of units discussed above is allowed for.
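The ladder construction can be checked mechanically. The following is a minimal sketch, not part of the original text: it assumes each state is written as a polynomial times $\exp(-x^2/2)$, so that $\hat{R}_+ = i(-\partial/\partial x + x)$ acts on the polynomial $p$ as $i(2xp - p')$ and $\hat{R}_- = -i(\partial/\partial x + x)$ acts as $-ip'$. The function names are illustrative only.

```python
# Polynomials are coefficient lists [c0, c1, c2, ...] multiplying exp(-x^2/2).

def raise_poly(p):
    """R+ = i(-d/dx + x) applied to p(x) exp(-x^2/2): returns i(2x p - p')."""
    out = [0j] * (len(p) + 1)
    for k, c in enumerate(p):
        out[k + 1] += 2 * c        # contribution of 2x p(x)
        if k >= 1:
            out[k - 1] -= k * c    # contribution of -p'(x)
    return [1j * c for c in out]

def lower_poly(p):
    """R- = -i(d/dx + x) applied to p(x) exp(-x^2/2): returns -i p'."""
    out = [-1j * k * c for k, c in enumerate(p)][1:]
    return out or [0j]

def energy(p):
    """Eigenvalue of H = (R+ R- + 1)/2, read off from the leading coefficient."""
    hp = raise_poly(lower_poly(p))
    hp = hp + [0j] * (len(p) - len(hp))
    h = [(a + b) / 2 for a, b in zip(hp, p)]
    return (h[-1] / p[-1]).real

u0 = [1 + 0j]          # ground state (7.81) with A = 1
u1 = raise_poly(u0)    # coefficients of 2ix
u2 = raise_poly(u1)    # proportional to 2x^2 - 1
print(u1, u2, [energy(u) for u in (u0, u1, u2)])
```

Here `u2` is obtained without dropping the factor $2i$, so it differs from the expression in the worked example by that constant only.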
7.9
THE MEASUREMENT OF MOMENTUM BY COMPTON SCATTERING
We close this chapter with an example of a real physical measurement that illustrates many of the principles discussed. This is the measurement of the momentum of electrons by Compton scattering of x-rays. We discussed Compton scattering in Chapter 4 where we obtained evidence for the existence of photons of energy $\hbar\omega$ and momentum $\hbar\mathbf{k}$, where $\omega$ and $\mathbf{k}$ are the angular frequency and wave vector of the x-rays. At that stage we assumed that the electron was at rest before the scattering event; we will now lift this restriction and show how Compton scattering can be used to obtain information about the momentum of the scattering electron.

We consider first the case of a free electron whose momentum before and after the photon is scattered has the values $\mathbf{p}$ and $\mathbf{p}'$, respectively. Measurement of the wavelength of the incident and scattered x-rays and knowledge of the direction of the incoming and outgoing photon lead to values for the change in momentum ($\delta\mathbf{p}$) and the change in energy ($\delta E$) of the photon in the collision. We can now use the conservation of four-momentum (3.29) in Chapter 3 and, assuming that the electron speed before and after the collision is much less than that of light, we can treat energy and momentum separately to get
$$\delta\mathbf{p} = \mathbf{p} - \mathbf{p}' \tag{7.82}$$
and
$$\delta E = p^2/2m - p'^2/2m \tag{7.83}$$
Squaring (7.82) we get
$$p'^2 = p^2 + \delta p^2 - 2\mathbf{p}\cdot\delta\mathbf{p} \tag{7.84}$$
while from (7.83) we have
$$p'^2 = p^2 - 2m\,\delta E \tag{7.85}$$
Equating the right-hand sides of (7.84) and (7.85) leads to
$$2\mathbf{p}\cdot\delta\mathbf{p} = \delta p^2 + 2m\,\delta E \tag{7.86}$$
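As a numerical illustration of (7.86), the sketch below computes the component of the initial electron momentum along $\delta\mathbf{p}$ from two wavelengths and a scattering angle. It is an assumption-laden example rather than part of the original text: the constants are rounded values, the function name is hypothetical, and the photon momenta are treated as two-dimensional vectors with the incident beam along x. As a sanity check it uses the standard Compton shift for an electron initially at rest, for which the component should be close to zero (the small residual being a relativistic correction).

```python
import math

H = 6.626e-34      # Planck constant in J s (rounded)
C = 2.998e8        # speed of light in m/s (rounded)
M_E = 9.109e-31    # electron mass in kg (rounded)

def momentum_component(lam_in, lam_out, angle):
    """Return (component of p along delta_p, |delta_p|) using (7.86).

    lam_in, lam_out: incident and scattered wavelengths in metres.
    angle: photon scattering angle in radians.
    """
    k_in = (H / lam_in, 0.0)
    k_out = (H / lam_out * math.cos(angle), H / lam_out * math.sin(angle))
    dp = (k_out[0] - k_in[0], k_out[1] - k_in[1])   # change in photon momentum
    dp_mag = math.hypot(dp[0], dp[1])
    d_e = H * C * (1.0 / lam_out - 1.0 / lam_in)    # change in photon energy
    return (dp_mag**2 + 2.0 * M_E * d_e) / (2.0 * dp_mag), dp_mag

theta = math.pi / 2
lam_in = 1.00e-10
# Scattered wavelength for an electron at rest (standard Compton shift)
lam_rest = lam_in + H / (M_E * C) * (1.0 - math.cos(theta))
p_par, dp_mag = momentum_component(lam_in, lam_rest, theta)
print(p_par / dp_mag)   # small compared with 1
```

A scattered wavelength shorter than `lam_rest` gives a positive component, meaning that the electron was initially moving with a momentum component along $\delta\mathbf{p}$.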
Thus our knowledge of $\delta\mathbf{p}$ and $\delta E$ has led directly to a measurement of the component of $\mathbf{p}$ in the direction of $\delta\mathbf{p}$, and a similar component of $\mathbf{p}'$ is readily derived from (7.86) and (7.82).

We now analyze this measurement from the point of view of quantum mechanics. We first note again that for a free particle the momentum operator ($-i\hbar\nabla$) commutes with the Hamiltonian ($-\hbar^2\nabla^2/2m$) so these are compatible measurements and the momentum eigenfunctions are therefore stationary states of the system. The experiment then implies that the electron was in a state of momentum $\mathbf{p}$ (wave function $\exp(i\mathbf{p}\cdot\mathbf{r}/\hbar)$) before the measurement and in another stationary state of momentum $\mathbf{p}'$ (wave function $\exp(i\mathbf{p}'\cdot\mathbf{r}/\hbar)$) afterwards. The outcome of the experiment is therefore completely predictable and if it is repeated with the same starting conditions, the same result will certainly be obtained for the component of $\mathbf{p}'$ in the direction $\delta\mathbf{p}$.

The situation becomes more complicated, but at the same time more illustrative of the quantum theory of measurement, if the electron is bound to an atom before the scattering process takes place. We shall assume that a previous measurement of the energy of the atom (achieved perhaps by observing photons emitted from it) has shown it to be in its ground state, which is, of course, a stationary state of the system.

We now consider the dynamics of the scattering process. The conservation of energy and momentum is complicated by the presence of the atomic nucleus, which means that at least some of the momentum transfer will be to the atom as a whole and the experiment may then give little or no information about the momentum of the individual electron. To minimize this effect, we make what is called the “impulse approximation,” where we assume that the change in energy is much greater than the binding energy of the electron to the rest of the atom. This means that the electron is knocked cleanly out of the atom and is effectively free at the end of the process.
Equation (7.86) will then provide an accurate value for the component of the electron momentum. Looking at this experiment from the point of view of quantum mechanics we first note that, if the electron is bound to the atom, the momentum operator does not commute with the Hamiltonian operator so the initial state cannot be a momentum eigenstate. As we know the change in photon energy, it follows from conservation of energy that we also know the energy of the system after the collision, which means that the electron must again be in an energy eigenstate at that time. However, if its momentum has also been measured it follows from Postulate 2 that the electron is also in a momentum eigenstate after the measurement. The only way both these conditions can be simultaneously fulfilled is if the electron is effectively free after the collision, which is the condition we just showed was necessary for an accurate momentum measurement. We have seen, therefore, that an accurate measurement of the momentum of an electron in an atom is possible provided the electron is placed in a momentum eigenstate as a result of the experiment. What is not possible on the basis of the postulates of quantum mechanics is a prediction of the precise result of such an
experiment. Postulate 4, however, tells us how to obtain the relative probabilities of the different possible outcomes and we complete the discussion by applying this to the present example. We assume the atom is hydrogen and that it is initially in its ground state. Then its wave function is, from Chapter 6
$$u(r) = (\pi a_0^3)^{-1/2}\exp(-r/a_0) \tag{7.87}$$
The probability of obtaining a result for $\mathbf{p}$ in the region $dp_x$, $dp_y$, $dp_z$ in the vicinity of $\mathbf{p} = \hbar\mathbf{k}$ is $|a(\mathbf{k})|^2$, where $a(\mathbf{k})$ is given by the three-dimensional equivalent of (7.37)
$$a(\mathbf{k}) = \left(\frac{1}{8\pi^3}\right)^{1/2}\int u(r)\exp(-i\mathbf{k}\cdot\mathbf{r})\,d\tau = (8\pi^4 a_0^3)^{-1/2}\int_0^{2\pi}\!\int_0^{\pi}\!\int_0^{\infty}\exp(-r/a_0 - ikr\cos\theta)\,r^2\,dr\,\sin\theta\,d\theta\,d\phi \tag{7.88}$$
where we have expressed the volume integral in spherical polar coordinates with the direction of $\mathbf{k}$ as the polar axis. The integrals over $\theta$ and $\phi$ are readily evaluated after making the substitution $x = \cos\theta$ leading to
$$a(k) = 2(2\pi^2 a_0^3)^{-1/2}\int_0^{\infty} k^{-1}\sin(kr)\exp(-r/a_0)\,r\,dr$$
The integral over $r$ can be evaluated by successive integrations by parts and we obtain
$$a(k) = 2(2a_0^3)^{1/2}/[\pi(1 + a_0^2k^2)^2] \tag{7.89}$$
As expected, therefore, the probability distribution for the momentum of an electron in the ground state of the hydrogen atom is spherically symmetric in momentum space, corresponding with the spherically symmetric position probability distribution obtained by squaring (7.87).

Finally we note that the Compton scattering experiment measures one component of momentum only. Calling this the z component, the probability that it will be found to have a value between $\hbar k_z$ and $\hbar(k_z + dk_z)$ while the other components may adopt any values will be $P(k_z)\,dk_z$ where
$$P(k_z) = \int\!\!\int |a(\mathbf{k})|^2\,dk_x\,dk_y = (8a_0^3/\pi^2)\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}(1 + a_0^2k_x^2 + a_0^2k_y^2 + a_0^2k_z^2)^{-4}\,dk_x\,dk_y$$
The relevant integrals can be evaluated to give
$$P(k_z) = \frac{8a_0}{3\pi}\,\frac{1}{(1 + a_0^2k_z^2)^3} \tag{7.90}$$
We note that the product of the widths of the momentum distribution ($\delta p_z \simeq \hbar/a_0$) and the position distribution ($\delta z \simeq a_0$) is approximately equal to $\hbar$ in agreement with the uncertainty principle.
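The closed form (7.90) can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' method: it works in units where $a_0 = 1$, uses a simple composite Simpson rule, truncates the infinite integration ranges at a large cutoff, and integrates $|a(\mathbf{k})|^2$ from (7.89) over $k_x$ and $k_y$ in plane polar coordinates. All function names are hypothetical.

```python
import math

A0 = 1.0  # Bohr radius; wave numbers are then in units of 1/a0

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def a_squared(k):
    """|a(k)|^2 obtained by squaring (7.89)."""
    return 8.0 * A0**3 / (math.pi**2 * (1.0 + (A0 * k)**2)**4)

def p_kz_formula(kz):
    """Closed form (7.90)."""
    return 8.0 * A0 / (3.0 * math.pi * (1.0 + (A0 * kz)**2)**3)

def p_kz_numeric(kz, cutoff=60.0, n=6000):
    """Integrate |a|^2 over kx, ky, with q the radial wave number in that plane."""
    integrand = lambda q: 2.0 * math.pi * q * a_squared(math.hypot(q, kz))
    return simpson(integrand, 0.0, cutoff, n)

norm = simpson(p_kz_formula, -60.0, 60.0, 12000)
print(norm)                                     # total probability
print(p_kz_numeric(0.5), p_kz_formula(0.5))     # the two should agree closely
```

The first printed value tests that $P(k_z)$ is normalized; the second line compares the direct two-dimensional integral with (7.90) at one sample value of $k_z$.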
FIGURE 7.3 A comparison of the theoretical and experimental intensities of Compton scattering from helium. The continuous line represents the momentum distribution calculated in the manner described in the text and the points represent measured intensities, after J. W. M. Dumond and H. A. Kirkpatrick, Physical Review, vol. 52, pp. 419-436, 1937.
We have discussed this example in some detail because it illustrates several of the main features of the basic postulates of quantum mechanics. In particular we have seen how an exact measurement of a physical quantity (in this case the momentum of an electron) results in the system being in an eigenstate of the operator representing the measurement immediately after the measurement has been completed. We have also seen that knowledge of the wave function of the system before the measurement does not generally provide a precise prediction of the result, but only of the relative probabilities of the various outcomes. Another reason for discussing this particular example is that Compton scattering represents a measurement that can actually be carried out. Unfortunately, such experiments on monatomic hydrogen are impractical, but Figure 7.3 shows the probability distribution for a momentum component in helium, calculated in a similar manner, along with the results of an early Compton scattering experiment; each experimental point represents the number of times the component of momentum of an electron was measured to be in the region of that value. We see that there is excellent agreement between the experimental results and the calculated probability distribution, but we note once more that quantum mechanics can only make this statistical prediction and that the values obtained in individual measurements of electron momentum are unpredictable.
7.10
PROBLEMS
7.1 Show that the operators $\hat{Q}$, $\hat{Q}^2$, $\hat{Q}^3$, etc., are all compatible. Hence show that any component of the momentum of a particle can always be measured compatibly with the kinetic energy, but that the momentum and total energy can be measured compatibly only if the potential energy is constant everywhere.

7.2 If $\hat{Q}$ and $\hat{R}$ are Hermitian operators, which of the following operators are also Hermitian:
(i) $\hat{Q}^n$ if $n$ is any positive integer
(ii) $\hat{Q} + \hat{R}$
(iii) $\hat{Q} + i\hat{R}$
(iv) $\partial/\partial x$
(v) $\partial^2/\partial x^2$
7.3 Show that, where $r$ is the radial coordinate in three dimensions, the operator $-i\hbar\,\partial/\partial r$ is not Hermitian, but $-i\hbar[\partial/\partial r + 1/r]$ is.

7.4 A particle moving in one dimension has a wavefunction $\psi(x)$ where
$$\psi(x) = (1/2)\phi_1(x) + (\sqrt{3}/2)\phi_5(x)$$
where $\phi_1$ and $\phi_5$ are members of the set of $\phi_n$ defined in (7.25). What are the possible results of a measurement of the position of this particle and what are their relative probabilities? What would be the form of the wavefunction immediately after such a measurement?

A particle moving in one dimension has a wavefunction $\psi(x)$ where
$$\psi(x) = (1/2)\exp(ix) + (\sqrt{3}/2)\exp(5ix)$$
What are the possible results of a measurement of the momentum of this particle and what are their relative probabilities? What would be the form of the wavefunction immediately after such a measurement?

7.5 A particle moves in one dimension subject to a potential that is zero in the region $-a \le x \le a$ and infinite elsewhere. At a certain time its wave function is
$$\psi = (5a)^{-1/2}\cos(\pi x/2a) + 2(5a)^{-1/2}\sin(\pi x/a)$$
What are the possible results of the measurement of the energy of this system and what are their relative probabilities? What are the possible forms of the wave function immediately after such a measurement? If the energy is immediately remeasured, what will now be the relative probabilities of the possible outcomes? (The energy eigenvalues and eigenfunctions of this system are given in Section 5.4.)

7.6 The energy of the particle in Problem 7.5 is measured and a result equal to the lowest energy eigenvalue is obtained. Show that the probability of a subsequent measurement of the electron momentum yielding a result between $\hbar k$ and $\hbar(k + dk)$ is equal to $P(k)\,dk$ where
$$P(k) = \frac{\pi\cos^2 ka}{2a^3(\pi^2/4a^2 - k^2)^2}$$
7.7 Show that if the particle in Problem 7.6 had been in a highly excited energy eigenstate of eigenvalue $E$, a measurement of its momentum would almost certainly have produced a value equal to $\pm(2mE)^{1/2}$. Compare this result with the predictions of classical mechanics for this problem.

7.8 A particle is observed to be in the lowest energy state of an infinite-sided well. The width of the well is (somehow) expanded to double its size so quickly that the wave function does not change during this process; the expansion takes place symmetrically so that the centre of the well does not move. The energy of the particle is measured again. Calculate the probabilities that it will be found in (i) the ground state, (ii) the first excited state and (iii) the second excited state of the expanded well.
7.9 Show that the commutator $[x^n, p_x] = i\hbar n x^{n-1}$ and that $[x, p_x^n] = i\hbar n p_x^{n-1}$.

7.10 Show that for a system in an energy eigenstate,
$$\langle[\hat{H}, \hat{Q}]\rangle = 0$$
where $\hat{Q}$ is any time-independent Hermitian operator. By putting $\hat{Q} = \hat{X}\hat{P}$, show that, for a one-dimensional system in an energy eigenstate,
$$2\langle\hat{T}\rangle = \left\langle x\frac{\partial V}{\partial x}\right\rangle$$
where $\hat{T}$ and $\hat{V}$ are operators representing the kinetic and potential energy respectively. Generalise this result to three dimensions and use it to show that
$$2\langle\hat{T}\rangle + \langle\hat{V}\rangle = 0$$
in the case of the hydrogen atom.

7.11 Calculate the expectation values of (i) $x$, (ii) $x^2$, (iii) $p$, and (iv) $p^2$ for a particle known to be in the lowest energy state of a one-dimensional harmonic oscillator potential before these measurements are performed. Hence show that the expectation value of the total energy of the particle equals the ground-state eigenvalue and that the product of the root-mean-square deviations of $x$ and $p$ from their respective means has the minimum value allowed by the uncertainty principle.

7.12 Show that for a particle in the $n$th energy eigenstate of a harmonic oscillator $\langle\hat{X}^2\rangle = \langle\hat{P}^2\rangle = n + 1/2$, where $\hat{P}$ and $\hat{X}$ are as in (7.69). Hence show that the uncertainty product $\Delta x\,\Delta p = (n + 1/2)\hbar$. In the case where $n = 0$, compare this with the answer to the previous problem.

7.13 Calculate the expectation values of the following quantities for an electron known to be in the ground state of the hydrogen atom before the measurements are performed: (i) the distance $r$ of the electron from the nucleus, (ii) $r^2$, (iii) the potential energy, (iv) the kinetic energy. Show that the sum of (iii) and (iv) equals the total energy.

7.14 An operator $\hat{P}$, known as the parity operator, is defined so that, in one dimension, $\hat{P}f(x) = f(-x)$ where $f$ is any well-behaved function of $x$. Assuming that $\hat{P}$ is real, prove that $\hat{P}$ is Hermitian. Show that the eigenvalues of $\hat{P}$ are $\pm1$ and that any function $\phi(x)$ that has definite parity (i.e., where $\phi(x) = \pm\phi(-x)$) is an eigenfunction of $\hat{P}$.

7.15 Show that, if $\hat{H}$ is the Hamiltonian operator representing the total energy of a one-dimensional system where the potential energy has even parity, and if $\hat{P}$ is the parity operator defined in Problem 7.14, then $[\hat{P}, \hat{H}] = 0$. Hence show that in the nondegenerate case, the energy eigenfunctions of such a system always have definite parity. What restrictions, if any, are placed on the eigenfunctions in the similar, but degenerate, case?

7.16 Generalize the results given in Problems 7.14 and 7.15 to the three-dimensional case and confirm that they apply to the energy eigenfunctions of a particle in a spherically symmetric potential obtained in Chapter 6.

7.17 Express the operators representing the position and momentum of a particle in a harmonic oscillator potential in terms of the raising and lowering operators. Hence show that
$$\int_{-\infty}^{\infty} u_m^*\hat{X}u_n\,dx = \int_{-\infty}^{\infty} u_m^*\hat{P}u_n\,dx = 0$$
unless $m = n \pm 1$. Obtain expressions for the integrals in these two special cases.

7.18 Show that if $\hat{H} = \hat{p}^2/2m + V(r)$
$$\frac{\partial}{\partial t}\langle x^2\rangle = [\langle xp_x\rangle + \langle p_x x\rangle]/m$$
7.19 Explain why a precise measurement of momentum would be impossible for a particle in an infinite square well, but could be performed in the case of a finite square well, even when the potential steps are very much larger than the ground-state energy.

7.20 X-rays of wavelength $1.00 \times 10^{-10}$ m are incident on a target containing free electrons, and a Compton scattered x-ray photon of wavelength $1.02 \times 10^{-10}$ m is detected at an angle of 90° to the incident direction. Obtain as much information as possible about the momentum of the scattering electron before and after the scattering process.
CHAPTER 8
Angular Momentum I

CONTENTS
8.1 The Angular-Momentum Operators
8.2 The Angular Momentum Eigenvalues and Eigenfunctions
8.3 The Experimental Measurement of Angular Momentum
    The Stern-Gerlach experiment
8.4 A General Solution to the Angular-Momentum Eigenvalue Problem
    Notation
    Normalization
8.5 Problems
Angular momentum is a very important and revealing property of many physical systems. In classical mechanics the principle of conservation of angular momentum is a powerful aid to the solution of such problems as the orbits of planets and satellites and the behaviour of gyroscopes and tops. The role of angular momentum in quantum mechanics is probably even more important and this will be the subject of the present chapter and the next. We shall find that the operators representing the components of angular momentum do not commute with each other, although they all commute with the operator representing the total angular momentum. It follows that no pair of these components can be measured compatibly and we shall therefore look for a set of eigenfunctions that are common to the operators representing the total angular momentum and one of its components. We shall find that the angular momentum eigenvalues always form a discrete set and that, in the case of a central field, the operators commute with the Hamiltonian, implying that the total angular momentum and one component can be measured compatibly with the energy in this case, so that their values, once measured, remain constant in time. Consideration of angular momentum will also enable us to make predictions about the behaviour of atoms in magnetic fields, and we shall find that these are in agreement with experiment only if we assume that the electron (in common with many other fundamental particles) has an intrinsic “spin” angular momentum in addition to the “orbital” angular momentum associated with its motion. The interaction between spin and orbital angular momenta leads to detailed features of the atomic spectra which are known as “fine structure.”
We shall also use the measurement of angular momentum to illustrate the quantum theory of measurement discussed in the previous chapter. Angular momentum is therefore an important subject, which deserves and requires detailed consideration. In the rest of this chapter we shall consider the properties of the angular-momentum operators and obtain expressions for their eigenvalues in the case of spin as well as orbital angular momentum, while in the next chapter we shall show that the angular-momentum components can be usefully represented by matrices rather than differential operators, and discuss how this matrix representation can be used to describe the fine structure of atomic spectra and to illustrate the quantum theory of measurement.
8.1
THE ANGULAR-MOMENTUM OPERATORS
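The classical expression for the angular momentum $\mathbf{l}$ of a particle whose position and momentum are $\mathbf{r}$ and $\mathbf{p}$ respectively, is
$$\mathbf{l} = \mathbf{r}\times\mathbf{p} \tag{8.1}$$
Using Postulate 3 in Chapter 7, we replace $\mathbf{r}$ and $\mathbf{p}$ by the operators $\hat{\mathbf{R}}$ and $\hat{\mathbf{P}}$. The quantum-mechanical operator, $\hat{\mathbf{L}}$, representing angular momentum is then
$$\hat{\mathbf{L}} = \hat{\mathbf{R}}\times\hat{\mathbf{P}} \tag{8.2}$$
where $\hat{\mathbf{R}}$ and $\hat{\mathbf{P}}$ are the operators representing position and momentum. The operators representing the Cartesian components of angular momentum are therefore
$$\hat{L}_x = \hat{Y}\hat{P}_z - \hat{Z}\hat{P}_y \qquad \hat{L}_y = \hat{Z}\hat{P}_x - \hat{X}\hat{P}_z \qquad \hat{L}_z = \hat{X}\hat{P}_y - \hat{Y}\hat{P}_x \tag{8.3}$$
and that representing the square of the magnitude of the total angular momentum is
$$\hat{L}^2 = \hat{L}_x^2 + \hat{L}_y^2 + \hat{L}_z^2 \tag{8.4}$$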
We now investigate whether these operators represent quantities that are compatible in the sense discussed in the previous chapter. To do this, we check whether the operators $\hat{L}_x$, $\hat{L}_y$, $\hat{L}_z$, and $\hat{L}^2$ commute when taken in pairs. Using (8.3), we get
$$\begin{aligned}
[\hat{L}_x, \hat{L}_y] &= \hat{L}_x\hat{L}_y - \hat{L}_y\hat{L}_x \\
&= (\hat{Y}\hat{P}_z - \hat{Z}\hat{P}_y)(\hat{Z}\hat{P}_x - \hat{X}\hat{P}_z) - (\hat{Z}\hat{P}_x - \hat{X}\hat{P}_z)(\hat{Y}\hat{P}_z - \hat{Z}\hat{P}_y) \\
&= \hat{Y}\hat{P}_z\hat{Z}\hat{P}_x - \hat{Y}\hat{P}_z\hat{X}\hat{P}_z - \hat{Z}\hat{P}_y\hat{Z}\hat{P}_x + \hat{Z}\hat{P}_y\hat{X}\hat{P}_z \\
&\quad - \hat{Z}\hat{P}_x\hat{Y}\hat{P}_z + \hat{Z}\hat{P}_x\hat{Z}\hat{P}_y + \hat{X}\hat{P}_z\hat{Y}\hat{P}_z - \hat{X}\hat{P}_z\hat{Z}\hat{P}_y
\end{aligned} \tag{8.5}$$
The corresponding terms in the last two lines of (8.5) contain the same operators, but in a different order, which is important only for those operators that do not commute. Remembering the commutation relations for the components of position and momentum (7.42), the only noncommuting operators in (8.5) are $\hat{P}_z$ and $\hat{Z}$. We can therefore cancel the second and third terms in the second to last line with the corresponding terms in the last line, and rearrange the others to give
$$\begin{aligned}
[\hat{L}_x, \hat{L}_y] &= \hat{Y}\hat{P}_x(\hat{P}_z\hat{Z} - \hat{Z}\hat{P}_z) + \hat{X}\hat{P}_y(\hat{Z}\hat{P}_z - \hat{P}_z\hat{Z}) \\
&= (\hat{X}\hat{P}_y - \hat{Y}\hat{P}_x)(\hat{Z}\hat{P}_z - \hat{P}_z\hat{Z}) \\
&= i\hbar(\hat{X}\hat{P}_y - \hat{Y}\hat{P}_x) = i\hbar\hat{L}_z
\end{aligned} \tag{8.6}$$
using (8.3). $[\hat{L}_y, \hat{L}_z]$ and $[\hat{L}_z, \hat{L}_x]$ can be obtained similarly and we have
$$[\hat{L}_x, \hat{L}_y] = i\hbar\hat{L}_z \qquad [\hat{L}_y, \hat{L}_z] = i\hbar\hat{L}_x \qquad [\hat{L}_z, \hat{L}_x] = i\hbar\hat{L}_y \tag{8.7}$$
We can now complete our calculation of the commutator brackets by considering those involving $\hat{L}^2$. From (8.4)
$$[\hat{L}^2, \hat{L}_z] = [\hat{L}_x^2, \hat{L}_z] + [\hat{L}_y^2, \hat{L}_z] + [\hat{L}_z^2, \hat{L}_z] \tag{8.8}$$
Considering the first term on the right-hand side and using Equations (8.7)
$$\begin{aligned}
[\hat{L}_x^2, \hat{L}_z] &= \hat{L}_x\hat{L}_x\hat{L}_z - \hat{L}_z\hat{L}_x\hat{L}_x \\
&= \hat{L}_x(-i\hbar\hat{L}_y + \hat{L}_z\hat{L}_x) - (i\hbar\hat{L}_y + \hat{L}_x\hat{L}_z)\hat{L}_x \\
&= -i\hbar(\hat{L}_x\hat{L}_y + \hat{L}_y\hat{L}_x)
\end{aligned} \tag{8.9}$$
Similarly
$$[\hat{L}_y^2, \hat{L}_z] = i\hbar(\hat{L}_x\hat{L}_y + \hat{L}_y\hat{L}_x) \tag{8.10}$$
Also
$$[\hat{L}_z^2, \hat{L}_z] = \hat{L}_z^3 - \hat{L}_z^3 = 0 \tag{8.11}$$
Substituting from (8.9), (8.10), and (8.11) into (8.8) and generalizing the result to the commutator brackets containing the other components we get
$$[\hat{L}^2, \hat{L}_x] = [\hat{L}^2, \hat{L}_y] = [\hat{L}^2, \hat{L}_z] = 0 \tag{8.12}$$
It follows from (8.7) that the operators representing any two components of angular momentum do not commute and are therefore not compatible. If, therefore, the system is in an eigenstate of one angular momentum component, it will not be simultaneously in an eigenstate of either of the others. However, (8.12) shows that the magnitude of the total angular momentum can be measured compatibly with any one of its components. We shall therefore approach the eigenvalue problem by looking for a set of functions that are simultaneously eigenfunctions of the total angular momentum and one component, which is conventionally chosen to be the z component Lˆ z .
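The commutation relations (8.7) and (8.12) can be verified by letting the operators act on a test function. The following is a minimal sketch under stated assumptions: it takes $\hbar = 1$, uses the differential form $\hat{\mathbf{P}} = -i\nabla$, and restricts attention to polynomials in $x$, $y$, $z$, stored as dictionaries mapping exponent triples to coefficients. The helper names are illustrative, and a check on one polynomial is of course not a proof.

```python
from collections import defaultdict

# A polynomial is {(i, j, k): coeff}, meaning coeff * x^i * y^j * z^k.

def deriv(p, axis):
    """Partial derivative along axis 0 (x), 1 (y) or 2 (z)."""
    out = defaultdict(complex)
    for powers, c in p.items():
        n = powers[axis]
        if n:
            lowered = list(powers)
            lowered[axis] -= 1
            out[tuple(lowered)] += n * c
    return dict(out)

def times(p, axis):
    """Multiply the polynomial by x, y or z."""
    out = {}
    for powers, c in p.items():
        raised = list(powers)
        raised[axis] += 1
        out[tuple(raised)] = c
    return out

def add(p, q, scale=1):
    """Return p + scale * q with vanishing terms removed."""
    out = defaultdict(complex)
    for powers, c in p.items():
        out[powers] += c
    for powers, c in q.items():
        out[powers] += scale * c
    return {k: v for k, v in out.items() if v != 0}

def L(axis, p):
    """Component of -i r x grad; axis 0 gives L_x = -i(y d/dz - z d/dy)."""
    a, b = (axis + 1) % 3, (axis + 2) % 3
    cross = add(times(deriv(p, b), a), times(deriv(p, a), b), -1)
    return {k: -1j * v for k, v in cross.items()}

def commutator(i, j, p):
    """[L_i, L_j] acting on p."""
    return add(L(i, L(j, p)), L(j, L(i, p)), -1)

def L_squared(p):
    total = {}
    for axis in range(3):
        total = add(total, L(axis, L(axis, p)))
    return total

f = {(2, 1, 0): 1, (0, 0, 3): 3, (1, 1, 1): 2}   # x^2 y + 3 z^3 + 2 x y z
lhs = commutator(0, 1, f)                         # [L_x, L_y] f
rhs = {k: 1j * v for k, v in L(2, f).items()}     # i L_z f
print(lhs == rhs)
print(add(L_squared(L(2, f)), L(2, L_squared(f)), -1) == {})  # [L^2, L_z] f = 0
```

Integer coefficients are used so that the arithmetic is exact and the dictionaries can be compared directly.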
8.2
THE ANGULAR MOMENTUM EIGENVALUES AND EIGENFUNCTIONS
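We can obtain an explicit expression for the operators representing the angular momentum components using the forms of the position and momentum operators given in Postulate 3
$$\hat{\mathbf{R}} = \mathbf{r} \qquad \hat{\mathbf{P}} = -i\hbar\nabla$$
Hence
$$\hat{\mathbf{L}} = -i\hbar\,\mathbf{r}\times\nabla \tag{8.13}$$
When considering the angular momentum of a particle moving about a point, it is a great advantage to refer to a spherical polar coordinate system similar to that used in Chapter 6 when discussing the energy eigenvalues of a particle in a central field. Expressions for $\hat{L}_z$ and $\hat{L}^2$ in this coordinate system will be derived using vector calculus, but readers who are not familiar with this technique may prefer to accept Equations (8.16) and (8.17) and proceed to the discussion following these equations.

We first refer the vector $\mathbf{r}$ to the origin of the spherical polar coordinate system (cf. Figure 6.2), giving us the standard expression
$$\nabla = \mathbf{r}_0\frac{\partial}{\partial r} + \boldsymbol{\theta}_0\frac{1}{r}\frac{\partial}{\partial\theta} + \boldsymbol{\phi}_0\frac{1}{r\sin\theta}\frac{\partial}{\partial\phi} \tag{8.14}$$
where $\mathbf{r}_0$, $\boldsymbol{\theta}_0$ and $\boldsymbol{\phi}_0$ are the three basic unit vectors of the spherical polar system. It follows directly that
$$\hat{\mathbf{L}} = -i\hbar\,\mathbf{r}\times\nabla = -i\hbar\left(\boldsymbol{\phi}_0\frac{\partial}{\partial\theta} - \boldsymbol{\theta}_0\frac{1}{\sin\theta}\frac{\partial}{\partial\phi}\right) \tag{8.15}$$
Taking the polar axis to be z, the unit vector, $\mathbf{z}_0$, in this direction is
$$\mathbf{z}_0 = \cos\theta\,\mathbf{r}_0 - \sin\theta\,\boldsymbol{\theta}_0$$
Hence
$$\hat{L}_z = \mathbf{z}_0\cdot\hat{\mathbf{L}} = -i\hbar\frac{\partial}{\partial\phi} \tag{8.16}$$
An expression for $\hat{L}^2$ is obtained from the vector identity
$$\hat{L}^2 = -\hbar^2(\mathbf{r}\times\nabla)\cdot(\mathbf{r}\times\nabla) = -\hbar^2\,\mathbf{r}\cdot[\nabla\times(\mathbf{r}\times\nabla)]$$
Substituting for $\mathbf{r}\times\nabla$ from (8.15) this expression becomes
$$\hat{L}^2 = -\hbar^2\,\mathbf{r}\cdot\nabla\times\left(\boldsymbol{\phi}_0\frac{\partial}{\partial\theta} - \boldsymbol{\theta}_0\frac{1}{\sin\theta}\frac{\partial}{\partial\phi}\right)$$
so that, using the standard expression for curl in spherical polar coordinates we get
$$\hat{L}^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right] \tag{8.17}$$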
We can now use (8.16) and (8.17) to obtain the eigenvalues and the common set of eigenfunctions for the operators $\hat{L}_z$ and $\hat{L}^2$. Considering $\hat{L}^2$ first, we notice that (apart from a multiplicative constant) (8.17) is identical to the differential equation (6.22) determining the angularly dependent part of the energy eigenfunctions in the central field problem, which has already been solved in Chapter 6. It follows directly from (6.40) that the eigenvalues of $\hat{L}^2$ are
$$l(l+1)\hbar^2 \tag{8.18}$$
where $l$ is a positive integer or zero, and that the corresponding eigenfunctions (cf. (6.40)) are
$$Y_{lm}(\theta, \phi) = (-1)^m\left[\frac{(2l+1)}{4\pi}\frac{(l-|m|)!}{(l+|m|)!}\right]^{1/2}P_l^{|m|}(\cos\theta)\exp(im\phi) \tag{8.19}$$
where $m$ is an integer whose modulus is less than or equal to $l$. Some of the properties of the spherical harmonics $Y_{lm}$ and the associated Legendre functions, $P_l^{|m|}$, have been discussed in Chapter 6. It follows immediately from (8.16) and (8.19) that
$$\hat{L}_z Y_{lm} = -i\hbar\frac{\partial Y_{lm}}{\partial\phi} = m\hbar Y_{lm} \tag{8.20}$$
so the functions $Y_{lm}$ are the simultaneous eigenfunctions of $\hat{L}_z$ and $\hat{L}^2$ that we have been looking for, and the eigenvalues of $\hat{L}_z$ are given by
$$m\hbar \quad\text{where}\quad -l \le m \le l \tag{8.21}$$
We have therefore confirmed the physical interpretation of the quantum numbers $l$ and $m$ that we anticipated in Chapter 6: the square of the total angular momentum is quantized in units of $l(l+1)\hbar^2$ and the z component in units of $m\hbar$. The condition $-l \le m \le l$ corresponds to the obvious requirement that a measurement of the total angular momentum must yield a result at least as large as that of a simultaneous measurement of any component. Also, if the magnitude of the total angular momentum and that of the z component are known, it follows that the angular momentum vector must be at a fixed angle to the z axis. However, its orientation with respect to the x and y axes is unknown and neither of these components can therefore be measured compatibly with $\hat{L}_z$, which is consistent with the results derived earlier from the commutation relations. The possible orientations with respect to the z axis are illustrated in Figure 8.1 for the case where $l = 2$: we note that there is no eigenstate where the angular momentum is exactly parallel to the z axis, which is to be expected as this configuration would imply that the x and y components were simultaneously known to be exactly equal to zero.

Our knowledge of the angular-momentum eigenvalues and eigenfunctions also helps us to understand more clearly the physical basis of the degeneracy found in the energy states of the central-field problem in Chapter 6. The central-field Hamiltonian, which is contained in the time-independent Schrödinger equation (6.19), can be written using (8.17) in the form
$$\hat{H} = -\frac{\hbar^2}{2m}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + V(r) + \frac{\hat{L}^2}{2mr^2} \tag{8.22}$$
FIGURE 8.1 The possible orientations of the angular momentum vector relative to the z axis in the case where l = 2. Note that the orientation within the xy plane is unknown if the z component has a definite value, so the angular momentum vector joins the origin to the edge of a cone whose axis is z.
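The cone angles can be tabulated from the eigenvalues alone. The sketch below is an illustration that goes slightly beyond the text: it assumes the angle $\theta$ between the angular momentum vector and the z axis is defined by $\cos\theta = m/\sqrt{l(l+1)}$, that is, the ratio of the z component $m\hbar$ from (8.21) to the magnitude $\sqrt{l(l+1)}\,\hbar$ from (8.18). The function name is hypothetical.

```python
import math

def orientations(l):
    """Map each m in -l..l to the angle (degrees) between L and the z axis."""
    magnitude = math.sqrt(l * (l + 1))   # |L| in units of hbar
    return {m: math.degrees(math.acos(m / magnitude)) for m in range(-l, l + 1)}

for m, angle in sorted(orientations(2).items(), reverse=True):
    print(m, round(angle, 1))
```

For every $l > 0$ the smallest angle is greater than zero, since $l < \sqrt{l(l+1)}$, which is the statement that no eigenstate has its angular momentum exactly parallel to z.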
The final term on the right-hand side is the quantum-mechanical equivalent of the rotational kinetic energy term in the classical expression for the total energy. The only part of $\hat{H}$ that depends on $\theta$ and $\phi$ is this last term so it follows from this, and the fact that $\hat{L}^2$ and $\hat{L}_z$ are not functions of $r$, that $\hat{H}$ commutes with both $\hat{L}^2$ and $\hat{L}_z$. Thus, in this case the energy eigenfunctions are also eigenfunctions of the total angular momentum and of one component. Energy levels corresponding to the same value of $l$, but different values of $m$ are degenerate, so the angular part of the wave function may be any linear combination of the eigenfunctions corresponding to these various values of $m$. This is equivalent to saying that the spatial orientation of the angular momentum vector is completely unknown so that no component of it can be measured unless the spherical symmetry of the problem is broken, for example by applying a magnetic field, as will be discussed in the next section.

Worked Example 8.1 The hydrogen molecule can be modelled as two point masses (each $1.7 \times 10^{-27}$ kg) at a fixed separation of $a = 0.96 \times 10^{-10}$ m. What are the allowed values of the total angular momentum of a hydrogen molecule? Obtain an expression for the corresponding rotational energy levels.

Solution The molecule can rotate about an axis through its centre of mass and the direction of this axis is not fixed in space. The allowed values of angular momentum are therefore the eigenvalues of $\hat{L}^2$, given in (8.18). The corresponding energy levels follow from the expression for the rotational kinetic energy of a rigid body of moment of inertia $I$ about an axis through its centre of mass, which is $E = L^2/2I$, where $I = 2m(a/2)^2$. Thus
$$E_l = \frac{l(l+1)\hbar^2}{2I} = \frac{(1.05 \times 10^{-34})^2\,l(l+1)}{2 \times 2 \times 1.7 \times 10^{-27} \times (0.96 \times 10^{-10}/2)^2} = 7.04 \times 10^{-22}\,l(l+1)\ \text{J} = 0.0044\,l(l+1)\ \text{eV}$$
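The arithmetic in Worked Example 8.1 can be reproduced in a few lines. This is a minimal sketch using the values quoted in the example; the joule-to-electronvolt factor and the function name are assumptions introduced here.

```python
HBAR = 1.05e-34   # J s, the value used in the worked example
EV = 1.602e-19    # joules per electronvolt (assumed conversion factor)

def rotational_energy(l, mass=1.7e-27, separation=0.96e-10):
    """E_l = l(l+1) hbar^2 / 2I for two point masses, with I = 2 m (a/2)^2."""
    inertia = 2.0 * mass * (separation / 2.0) ** 2
    return l * (l + 1) * HBAR**2 / (2.0 * inertia)

for l in range(4):
    e = rotational_energy(l)
    print(l, e, e / EV)   # energy in J and in eV
```

The level spacing grows linearly with $l$, since $E_{l+1} - E_l = 2(l+1)\hbar^2/2I$.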
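Worked Example 8.2 Show that the function $\sin\theta\exp i\phi$ is an eigenfunction of $\hat{L}^2$ and $\hat{L}_z$. What are the corresponding eigenvalues?

Solution Using (8.17), we have
$$\begin{aligned}
\hat{L}^2\sin\theta\exp i\phi &= -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right]\sin\theta\exp i\phi \\
&= -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}(\sin\theta\cos\theta)\exp i\phi - \frac{1}{\sin^2\theta}\sin\theta\exp i\phi\right] \\
&= -\hbar^2\left[\frac{1}{\sin\theta}(\cos^2\theta - \sin^2\theta)\exp i\phi - \frac{1}{\sin\theta}\exp i\phi\right] \\
&= 2\hbar^2\sin\theta\exp i\phi
\end{aligned}$$
showing that the function is an eigenfunction of $\hat{L}^2$ with eigenvalue $2\hbar^2$, corresponding to $l = 1$. Using (8.20) we see that it is also an eigenfunction of $\hat{L}_z$ with eigenvalue $\hbar$ (i.e., $m = 1$).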
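The result of Worked Example 8.2 can also be checked numerically. The following is a rough sketch, not the method used in the text: it sets $\hbar = 1$ and replaces the derivatives in (8.16) and (8.17) by central finite differences with an assumed step size, so the eigenvalues are reproduced only approximately. The sample point and the function names are arbitrary choices.

```python
import cmath
import math

def f(theta, phi):
    """The trial function sin(theta) exp(i phi)."""
    return math.sin(theta) * cmath.exp(1j * phi)

def l_squared(func, theta, phi, h=1e-4):
    """Apply (8.17) with hbar = 1 using central differences."""
    # d/dtheta (sin(theta) d func/d theta), evaluated at half steps
    up = math.sin(theta + h / 2) * (func(theta + h, phi) - func(theta, phi)) / h
    down = math.sin(theta - h / 2) * (func(theta, phi) - func(theta - h, phi)) / h
    theta_part = (up - down) / (h * math.sin(theta))
    second = func(theta, phi + h) - 2 * func(theta, phi) + func(theta, phi - h)
    phi_part = second / (h**2 * math.sin(theta) ** 2)
    return -(theta_part + phi_part)

def l_z(func, theta, phi, h=1e-4):
    """Apply (8.16) with hbar = 1 using a central difference."""
    return -1j * (func(theta, phi + h) - func(theta, phi - h)) / (2 * h)

theta, phi = 0.7, 1.1   # arbitrary sample point away from the poles
print(l_squared(f, theta, phi) / f(theta, phi))   # close to 2, i.e. l(l+1) with l = 1
print(l_z(f, theta, phi) / f(theta, phi))         # close to 1, i.e. m = 1
```

Repeating the check at other sample points should give the same two ratios, which is what distinguishes an eigenfunction from an arbitrary function.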
8.3
THE EXPERIMENTAL MEASUREMENT OF ANGULAR MOMENTUM
Now that we have derived expressions for the angular momentum eigenvalues, we shall see how these agree with the results of experimental measurement. Such measurements are often made by studying the effects of magnetic fields on the motion of particles, and we shall consider these in this section.

We first treat the problem classically. We consider the classical motion of a particle, such as an electron, with charge $-e$ and mass $m_e$ moving in a circle with angular velocity $\omega$ in an orbit of radius $r$. Its angular momentum $\mathbf{l}$ therefore has magnitude $m_e\omega r^2$ and points in a direction perpendicular to the plane of the orbit. The circulating charge is equivalent to an electric current $-e\omega/2\pi$ moving around a loop of area $\pi r^2$. The magnitude of its magnetic moment $\boldsymbol{\mu}$ is equal to the product of the current and the loop area, which is $e\omega r^2/2$. Its direction is parallel to, but opposite to, $\mathbf{l}$ (because of the negative sign). Hence we have
$$\boldsymbol{\mu} = -\frac{e}{2m_e}\mathbf{l} \tag{8.23}$$
We can now apply Postulate 3 to (8.23) to obtain the quantum-mechanical operator representing a measurement of magnetic moment
$$\hat{\boldsymbol{\mu}} = -\frac{e}{2m_e}\hat{\mathbf{L}}$$
that is
$$\hat{\mu}_z = -\frac{e}{2m_e}\hat{L}_z \quad\text{etc.} \tag{8.24}$$
It follows that, if the z component of the magnetic moment of an atom is measured, a value for this component of the angular momentum is also obtained.

If a magnetic field $\mathbf{B}$ is applied to such a system, the energy of interaction between the magnetic moment and the field will be $-\boldsymbol{\mu}\cdot\mathbf{B}$. The corresponding quantum-mechanical operator in the case where $\mathbf{B}$ is in the z direction is therefore $\hat{H}$ where
$$\hat{H} = \frac{eB}{2m_e}\hat{L}_z \tag{8.25}$$
and, if the system is in an eigenstate of $\hat{L}_z$ with eigenvalue $m\hbar$, a measurement of this interaction energy will yield the value
$$\Delta E = \mu_B Bm \tag{8.26}$$
where
$$\mu_B = \frac{e\hbar}{2m_e} \tag{8.27}$$
and is known as the Bohr magneton. Thus, for example, if we apply a magnetic field to an atom in a p state (i.e., one with $l = 1$, $m = -1$, 0 or 1) the interaction energy (8.26) will lift its threefold degeneracy. We would therefore expect spectral lines resulting from a transition between this state and a nondegenerate s ($l = 0$, $m = 0$) state to be split into a triplet, the angular frequency difference between neighbouring lines being equal to $eB/2m_e$.

However, when this experiment is performed on a one-electron atom such as hydrogen, a rather different result is observed. For strong fields the spectra are consistent with the 2p level having been split into four substates instead of three and the 1s level, which was expected to remain single, into two. As the latter state possesses no angular momentum associated with the motion of the electron in the field of the nucleus, the observed magnetic moment and associated angular momentum must arise from some other cause. As we shall show, these results can be explained if we postulate that an electron possesses an intrinsic angular momentum in addition to the orbital angular momentum discussed so far. Following normal practice, we shall often refer to the intrinsic angular momentum as “spin”; but, as will be discussed later, this should not be taken to mean that the electron is actually spinning. In the $l = 0$ state this intrinsic angular momentum can apparently adopt two orientations, while in the $l = 1$ state the orbital and intrinsic angular momenta couple to produce the four substates observed. The details of this “spin-orbit coupling” will be discussed in the next chapter.
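To give a sense of scale for (8.26) and (8.27), the sketch below evaluates the Bohr magneton and the splitting expected for a p state when spin is ignored. It is an illustration only: the constants are rounded values, the field of 1 T is an arbitrary choice, and the function name is hypothetical.

```python
E_CHARGE = 1.602e-19   # elementary charge in C (rounded)
HBAR = 1.055e-34       # reduced Planck constant in J s (rounded)
M_E = 9.109e-31        # electron mass in kg (rounded)

MU_B = E_CHARGE * HBAR / (2.0 * M_E)   # Bohr magneton (8.27), in J/T

def zeeman_shifts(l, b_field):
    """Interaction energies (8.26) in joules for m = -l..l, neglecting spin."""
    return {m: MU_B * b_field * m for m in range(-l, l + 1)}

shifts = zeeman_shifts(1, 1.0)                  # p state in a 1 T field
spacing = (shifts[1] - shifts[0]) / HBAR        # angular frequency difference
print(MU_B)
print(shifts)
print(spacing, E_CHARGE * 1.0 / (2.0 * M_E))    # both equal eB/2m_e
```

The three entries of `shifts` correspond to the triplet expected from orbital angular momentum alone, which is the prediction that the hydrogen observations contradict.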
The Stern-Gerlach experiment

We have seen that the energy of an atom that possesses a magnetic moment is changed by the interaction between this magnetic moment and an applied magnetic field. If this applied field is inhomogeneous (i.e., varies from place to place) there will be a force on the atom directing it towards regions where the interaction energy is smaller. Thus, if an atom with a z component of magnetic moment $\mu_z$ is in a magnetic field directed along z whose magnitude B is a function of z, it will be subject to a force F given by

$$F = -\mu_z \frac{\partial B}{\partial z} \tag{8.28}$$

An experiment that makes use of this effect to measure atomic magnetic moments is the Stern-Gerlach experiment. A beam of atoms is directed between two poles of a magnet that are shaped as shown in Figure 8.2 in order to produce an inhomogeneous field, which exerts a force on the atoms in a direction transverse to the beam. On emerging from the field, the atoms will have been deflected by an amount proportional to this component of their magnetic moment. In the absence of spin, therefore, we should expect the beam to be split into 2l + 1 parts where l is the total orbital angular momentum quantum number and, in the particular case where l is zero, we should then expect no splitting at all. However, when this experiment is performed on hydrogen (or a similar one-electron system) in its ground state, the atomic beam is found to be split into two, implying that the atom possesses a magnetic moment whose z component can adopt two opposite orientations with respect to the field. As the ground state of hydrogen has zero angular momentum, this magnetic moment cannot be associated with the orbital motion of the electron and must be due to its spin. Thus the Stern-Gerlach experiment provides direct evidence for the electron possessing intrinsic angular momentum which is consistent with the spectroscopic results described earlier.

FIGURE 8.2 (a) A beam of spin-half atoms is split by an inhomogeneous magnetic field. The form of the field is shown in projection down the beam direction in (b).

It is clear, however, that the rules governing the quantization of spin must be different from those governing orbital angular momentum, as the latter require there to be an odd number (2l + 1) of eigenstates of $\hat{L}_z$ whereas an even number (2) is observed in the Stern-Gerlach experiment. This would be consistent only if the total angular momentum quantum number l equalled one half in this case, with the quantum number representing the z component equal to $\pm\tfrac{1}{2}$. But our earlier discussion, based on the analysis in Chapter 6, showed that these should be integers.
8.4 A GENERAL SOLUTION TO THE ANGULAR-MOMENTUM EIGENVALUE PROBLEM
If we re-examine the analysis in Chapter 6 of the angular part of the Schrödinger equation for a particle in a central field, we see that quantization of angular momentum is a consequence of the boundary condition on the wave function that requires it to be a continuous, single-valued, integrable function of the particle position. However, spin is not associated with the motion of the particle in space and its eigenfunctions are therefore not functions of particle position, so the same conditions need not apply in this case. On the other hand, spin is a form of angular momentum so it might be reasonable to expect the quantum-mechanical operators representing its components to obey the same algebra as do the corresponding operators in the orbital case. In particular, we might expect the commutation relations (8.7) and (8.12) to be universally valid for all forms of angular momentum. We shall assume this to be the case and, using only this assumption and the basic physical condition that the total angular momentum eigenvalues must be at least as large as those of any simultaneously measured component, we shall obtain expressions for the angular-momentum eigenvalues that are valid for spin, as well as in the orbital case.

Let the simultaneous eigenvalues of $\hat{L}^2$ and $\hat{L}_z$ be α and β, respectively, and let the corresponding eigenfunction be φ, where φ is not necessarily a function of particle position. (A suitable representation in the case of spin will be described in the next chapter.) We then have

$$\hat{L}^2\phi = \alpha\phi \quad\text{and}\quad \hat{L}_z\phi = \beta\phi \tag{8.29}$$

where $\alpha \ge \beta^2$ because the total magnitude must be at least as large as that of one of its components. We now define two operators $\hat{L}_+$ and $\hat{L}_-$ as

$$\hat{L}_+ = \hat{L}_x + i\hat{L}_y \qquad \hat{L}_- = \hat{L}_x - i\hat{L}_y \tag{8.30}$$
These operators do not represent physical quantities (note that they are not Hermitian), but are very useful in developing the mathematical argument that follows. For reasons that will soon become clear they are known as raising and lowering operators, or sometimes as ladder operators. Similar operators were used in the treatment of the harmonic oscillator in Chapter 7.

We first establish some commutation relations involving $\hat{L}_+$ and $\hat{L}_-$. From (8.30), using (8.4) and (8.7)

$$\hat{L}_+\hat{L}_- = \hat{L}_x^2 + \hat{L}_y^2 - i[\hat{L}_x, \hat{L}_y] = \hat{L}^2 - \hat{L}_z^2 + \hbar\hat{L}_z \tag{8.31}$$

Similarly

$$\hat{L}_-\hat{L}_+ = \hat{L}^2 - \hat{L}_z^2 - \hbar\hat{L}_z \tag{8.32}$$

Hence

$$[\hat{L}_+, \hat{L}_-] = 2\hbar\hat{L}_z \tag{8.33}$$

We also have

$$[\hat{L}_z, \hat{L}_+] = [\hat{L}_z, \hat{L}_x] + i[\hat{L}_z, \hat{L}_y] = i\hbar(\hat{L}_y - i\hat{L}_x)$$

using (8.7). That is

$$[\hat{L}_z, \hat{L}_+] = \hbar\hat{L}_+ \tag{8.34}$$

and similarly

$$[\hat{L}_z, \hat{L}_-] = -\hbar\hat{L}_- \tag{8.35}$$

Having established these relations, which we shall use shortly, we now operate on both sides of the second of equations (8.29) with $\hat{L}_+$ giving

$$\hat{L}_+\hat{L}_z\phi = \hat{L}_+\beta\phi$$

Using (8.34) and remembering that β is a constant, this leads to

$$\hat{L}_z(\hat{L}_+\phi) = (\beta + \hbar)(\hat{L}_+\phi) \tag{8.36}$$
Similarly, using (8.29) and (8.35), we get

$$\hat{L}_z(\hat{L}_-\phi) = (\beta - \hbar)(\hat{L}_-\phi) \tag{8.37}$$

Thus, if φ is an eigenfunction of $\hat{L}_z$ with eigenvalue β, $(\hat{L}_+\phi)$ and $(\hat{L}_-\phi)$ are also eigenfunctions of $\hat{L}_z$ with eigenvalues $(\beta + \hbar)$ and $(\beta - \hbar)$, respectively. We also have, from the first of the equations (8.29)

$$\hat{L}_+\hat{L}^2\phi = \alpha\hat{L}_+\phi \quad\text{and}\quad \hat{L}_-\hat{L}^2\phi = \alpha\hat{L}_-\phi \tag{8.38}$$

Now we know that $\hat{L}^2$ commutes with both $\hat{L}_x$ and $\hat{L}_y$ so it must also commute with $\hat{L}_+$ and $\hat{L}_-$; (8.38) therefore becomes

$$\hat{L}^2(\hat{L}_+\phi) = \alpha(\hat{L}_+\phi) \quad\text{and}\quad \hat{L}^2(\hat{L}_-\phi) = \alpha(\hat{L}_-\phi) \tag{8.39}$$
so the functions $(\hat{L}_+\phi)$ and $(\hat{L}_-\phi)$, as well as being eigenfunctions of $\hat{L}_z$, are also eigenfunctions of $\hat{L}^2$ with eigenvalue α. It follows that the operators $\hat{L}_+$ and $\hat{L}_-$ respectively “raise” and “lower” the eigenfunctions up or down the “ladder” of eigenvalues of $\hat{L}_z$ corresponding to the same eigenvalue of $\hat{L}^2$, so accounting for the names of the operators mentioned earlier.¹

¹ The operators $\hat{L}_+$ and $\hat{L}_-$ are also known as “creation” and “annihilation” operators respectively because they “create” or “annihilate” quanta of $\hat{L}_z$.

We can now apply the condition that $\beta^2$ is less than or equal to α, which implies that there must be both a maximum (say $\beta_1$) and minimum (say $\beta_2$) value of β corresponding to a particular value of α. If $\phi_1$ and $\phi_2$ are the corresponding eigenfunctions, we can satisfy (8.36) and (8.37) only if

$$\hat{L}_+\phi_1 = 0 \tag{8.40}$$

and

$$\hat{L}_-\phi_2 = 0 \tag{8.41}$$

We now operate on (8.40) with $\hat{L}_-$ and use (8.32) to get

$$\hat{L}_-\hat{L}_+\phi_1 = (\hat{L}^2 - \hat{L}_z^2 - \hbar\hat{L}_z)\phi_1 = 0 \tag{8.42}$$

and hence, using (8.29)

$$(\alpha - \beta_1^2 - \hbar\beta_1)\phi_1 = 0$$

That is

$$\alpha = \beta_1(\beta_1 + \hbar) \tag{8.43}$$

Similarly, from (8.41), (8.31) and (8.29)

$$\alpha = \beta_2(\beta_2 - \hbar) \tag{8.44}$$

It follows from (8.43) and (8.44) (remembering that $\beta_2$ is less than $\beta_1$ by definition) that

$$\beta_2 = -\beta_1 \tag{8.45}$$

Now we saw from (8.36) and (8.37) that neighbouring values of β are separated by $\hbar$ so it follows that $\beta_1$ and $\beta_2$ are separated by an integral number (say n) of steps in $\hbar$. That is

$$\beta_1 - \beta_2 = n\hbar \tag{8.46}$$

It follows directly from (8.45) and (8.46) that $\beta_1 = -\beta_2 = n\hbar/2$ and hence, putting n equal to 2l and using (8.43)

$$\alpha = l(l + 1)\hbar^2 \quad\text{where } l = n/2 \tag{8.47}$$

and

$$\beta = m\hbar \tag{8.48}$$
where m varies in integer steps between −l and l, and l is either an integer or a half-integer. We see that this result is exactly the one we have been looking for. In the case of orbital angular momentum, the extra condition that the wave function must be a well-behaved function of particle position requires n to be even and l to be an integer so that (8.47) and (8.48) are equivalent to (8.18) and (8.21). However, when we are dealing with intrinsic angular momentum this condition no longer holds, so n may be an odd number, in which case l and m are half-integers. In the particular case of electron spin, the Stern-Gerlach experiment showed that the z-component operator has two eigenstates, implying that the total-spin quantum number equals one-half in this case, with $m = \pm\tfrac{1}{2}$. This is always true for the electron and also for other fundamental particles such as the proton, the neutron and the neutrino which are accordingly known as “spin-half” particles. Other particles exist with total-spin quantum numbers that are integers or half-integers greater than one-half, and their angular-momentum properties are also exactly as predicted by this theory.

The possession by the electron of two kinds of angular momentum (orbital and intrinsic) suggests an analogy with the classical case of a planet, such as the earth, orbiting the sun: it has orbital angular momentum associated with this motion and intrinsic angular momentum associated with the fact that it is spinning about an axis. Because of this, the intrinsic angular momentum of an electron (or other particle) is usually referred to as spin. However, there is no evidence to suggest that the electron is literally spinning and, indeed, it turns out that such a mechanical model could account for the observed angular momentum only if the electron had a structure, parts of which would have to be moving at a speed faster than that of light, so contradicting the theory of relativity!

In fact the only satisfactory theory of the origin of spin results from a consideration of relativistic effects: Dirac showed that quantum mechanics and relativity could be made consistent only if the electron is assumed to be a point particle which possesses intrinsic angular momentum. By “intrinsic” we mean that this is a fundamental property of the electron, just like its charge or mass. Although conventionally referred to as “spin,” this implies nothing about its underlying structure. We shall give a brief introduction to Dirac’s theory in Chapter 14, where we shall see that all fundamental particles, such as the electron, should have intrinsic angular momentum with quantum number equal to one-half. Experimentally, it is found that particles that are not spin-half¹ are either “exchange” particles such as the photon, which are not subject to Dirac’s theory, or they have a structure, being composed of several more fundamental spin-half particles. For example, the alpha particle has total spin zero, but this results from a cancellation of the spins of the constituent protons and neutrons, which are spin-half particles. The truly fundamental constituents of matter (apart from the photon and other “exchange” particles that are not subject to Dirac’s theory) are believed to be leptons (which include the electron) and quarks, and these are all spin-half particles.

¹ It is very important when referring to particles as “spin-half,” “spin-one” etc. to remember that these labels refer to the total-spin quantum number and not to the magnitude of the angular momentum, which actually has the value $\sqrt{3}\hbar/2$ in the first case and $\sqrt{2}\hbar$ in the second. Similarly an electron with z component of spin equal to $\tfrac{1}{2}\hbar$ or $-\tfrac{1}{2}\hbar$ is often referred to as having “spin up” or “spin down” respectively even though its angular momentum vector is inclined at an angle of $\cos^{-1}(3^{-1/2}) \simeq 55^\circ$ to the z axis; cf. Figure 8.1.

In order to proceed further with our study of angular momentum, we shall have to consider the eigenfunctions as well as the eigenvalues of the angular-momentum operators. However, we have already seen that if this discussion is to include spin, these eigenfunctions cannot be expressed as functions of particle position. In the next chapter, therefore, we shall develop an alternative representation of quantum mechanics based on matrices, which will be used to describe the properties of intrinsic angular momentum.
Notation

Up to now we have used l to represent the “total” angular momentum quantum number, while m corresponds to that associated with the z component. From now on we shall follow the standard convention by which l and m are reserved for the quantum numbers representing the orbital angular momentum only; sometimes $m_l$ is used instead of m to emphasize its association with l. The corresponding quantities in the case of spin are then represented by s and $m_s$, while in the case where the orbital and spin angular momenta combine together the quantum numbers associated with the resulting total angular momentum are denoted by j and $m_j$. Other symbols are often used to describe the various contributions to the angular momentum of a nucleus composed of a number of nucleons, for example, but we shall not need to use these.

Worked Example 8.3 What would be the result of passing atoms through a Stern-Gerlach experiment if their total angular momentum quantum number had the value: (i) j = 3/2, (ii) j = 2, (iii) j = 5/2?

Solution In the Stern-Gerlach experiment, the atoms are deflected by an amount proportional to $m_j$. There are 2j + 1 possible values of $m_j$ corresponding to a given j, so the effect will be to split the beam of atoms into (i) 4, (ii) 5 and (iii) 6 beams, respectively.
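The counting used in Worked Example 8.3 is easy to mechanize. The following is a minimal sketch, with a hypothetical helper name, that uses exact fractions so that half-integer values of j are handled without rounding.

```python
from fractions import Fraction


def allowed_m_values(j):
    """Return the 2j + 1 values m = -j, -j + 1, ..., j for integer or half-integer j."""
    j = Fraction(j)
    if j < 0 or (2 * j).denominator != 1:
        raise ValueError("j must be a non-negative integer or half-integer")
    count = int(2 * j) + 1
    return [-j + k for k in range(count)]


for j in (Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    m_values = allowed_m_values(j)
    labels = ", ".join(str(m) for m in m_values)
    print(f"j = {j}: {len(m_values)} beams, m_j = {labels}")
```

Each value of $m_j$ corresponds to one emerging beam, so the length of the returned list is the number of beams.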
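Worked Example 8.4 Obtain expressions for $\hat{L}_x$ and $\hat{L}_y$ and hence for $\hat{L}_+$ and $\hat{L}_-$ in spherical polar coordinates in the case of orbital angular momentum. Use this result to obtain unnormalized expressions for the spherical harmonics with l = 1, given that $Y_{1,0} = \cos\theta$.

Solution Using a similar procedure to that followed in (8.15) to (8.17), we have

$$\mathbf{x}_0 = \sin\theta\cos\phi\,\mathbf{r}_0 + \cos\theta\cos\phi\,\boldsymbol{\theta}_0 - \sin\phi\,\boldsymbol{\phi}_0$$

$$\mathbf{y}_0 = \sin\theta\sin\phi\,\mathbf{r}_0 + \cos\theta\sin\phi\,\boldsymbol{\theta}_0 + \cos\phi\,\boldsymbol{\phi}_0$$

Hence

$$\hat{L}_x = \mathbf{x}_0\cdot\hat{\mathbf{L}} = i\hbar\left(\sin\phi\frac{\partial}{\partial\theta} + \cot\theta\cos\phi\frac{\partial}{\partial\phi}\right)$$

$$\hat{L}_y = \mathbf{y}_0\cdot\hat{\mathbf{L}} = i\hbar\left(-\cos\phi\frac{\partial}{\partial\theta} + \cot\theta\sin\phi\frac{\partial}{\partial\phi}\right)$$

so that

$$\hat{L}_+ = \hbar\exp(i\phi)\left(\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right)$$

$$\hat{L}_- = \hbar\exp(-i\phi)\left(-\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right)$$

From the definition of the raising and lowering operators, we get

$$Y_{1,\pm 1} = \hat{L}_\pm Y_{1,0} = \pm\hbar\exp(\pm i\phi)\frac{\partial}{\partial\theta}\cos\theta = \mp\hbar\sin\theta\exp(\pm i\phi)$$

which agrees with the expressions in (6.42) apart from normalizing factors.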
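The result of Worked Example 8.4 can be checked numerically by applying the differential forms of the ladder operators with finite differences. This is a rough sketch only: it works in units where ħ = 1, the step size and the test point are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math


def y10(theta, phi):
    """Unnormalized Y_{1,0} = cos(theta); it does not depend on phi."""
    return complex(math.cos(theta), 0.0)


def ladder(func, theta, phi, sign, step=1e-5):
    """Apply L_+ (sign = +1) or L_- (sign = -1), in units of hbar.

    Uses L_(+/-) = exp(+/- i phi) * (+/- d/dtheta + i cot(theta) d/dphi),
    with both derivatives approximated by central differences.
    """
    d_theta = (func(theta + step, phi) - func(theta - step, phi)) / (2 * step)
    d_phi = (func(theta, phi + step) - func(theta, phi - step)) / (2 * step)
    cot = math.cos(theta) / math.sin(theta)
    return cmath.exp(sign * 1j * phi) * (sign * d_theta + 1j * cot * d_phi)


theta, phi = 1.0, 0.5   # arbitrary test point, in radians
for sign in (+1, -1):
    numeric = ladder(y10, theta, phi, sign)
    # Analytic result of the worked example: -/+ sin(theta) exp(+/- i phi)
    analytic = -sign * math.sin(theta) * cmath.exp(sign * 1j * phi)
    print(f"sign {sign:+d}: |numeric - analytic| = {abs(numeric - analytic):.2e}")
```

The printed differences should be limited only by the finite-difference error.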
Normalization

We have seen that the operators $\hat{L}_+$ and $\hat{L}_-$ operate on the eigenfunction $\phi_m$ of $\hat{L}_z$ to produce the eigenfunctions $\phi_{m+1}$ and $\phi_{m-1}$. However, there is nothing in the earlier treatment to ensure that the eigenfunctions so created are normalized and, in general, they are not. To ensure normalization, we have to multiply the eigenfunctions by appropriate constants, after creating them using the ladder operators, as we now show.

We first show that, although $\hat{L}_+$ and $\hat{L}_-$ are not Hermitian operators, they are Hermitian conjugates in the sense that

$$\int f\hat{L}_+ g\,d\tau = \int g\hat{L}_-^* f\,d\tau \tag{8.49}$$

This follows straightforwardly from the definitions of $\hat{L}_+$ and $\hat{L}_-$ (8.30) and the fact that $\hat{L}_x$ and $\hat{L}_y$ are Hermitian

$$\int f\hat{L}_+ g\,d\tau = \int f(\hat{L}_x + i\hat{L}_y)g\,d\tau = \int g(\hat{L}_x^* + i\hat{L}_y^*)f\,d\tau = \int g\hat{L}_-^* f\,d\tau$$

We now show how a normalized eigenfunction, $\phi_{m+1}$, can be generated from $\phi_m$, given that the latter is normalized. Writing $\phi_{m+1}$ for the unnormalized function $\hat{L}_+\phi_m$ and using the complex conjugate of (8.49), we have

$$\int \phi_{m+1}^*\phi_{m+1}\,d\tau = \int (\hat{L}_+\phi_m)\hat{L}_+^*\phi_m^*\,d\tau = \int \phi_m^*\hat{L}_-\hat{L}_+\phi_m\,d\tau = [l(l + 1) - m(m + 1)]\hbar^2 \tag{8.50}$$

using (8.42) and the fact that $\phi_m$ is normalized. To produce a ladder of normalized eigenfunctions, we must therefore use

$$\phi_{m+1} = [l(l + 1) - m(m + 1)]^{-1/2}\hbar^{-1}\hat{L}_+\phi_m \tag{8.51}$$

Similarly in the case of $\hat{L}_-$

$$\phi_{m-1} = [l(l + 1) - m(m - 1)]^{-1/2}\hbar^{-1}\hat{L}_-\phi_m \tag{8.52}$$

These results will be used during the general treatment of the coupling of angular momenta towards the end of the next chapter.
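The factors in (8.51) and (8.52) fix how the ladder operators act on a normalized set of eigenfunctions, and this can be tested against the commutation relations derived earlier. The sketch below is an illustration under stated assumptions: it works in units where ħ = 1, orders the states as m = l, l − 1, ..., −l, and represents each operator by a table of its coefficients in that ordering; all names are hypothetical.

```python
import math
from fractions import Fraction


def ladder_matrices(l):
    """Coefficient tables of L_z, L_+ and L_- (units of hbar) for m = l, l-1, ..., -l."""
    l = Fraction(l)
    dim = int(2 * l) + 1
    m_values = [l - k for k in range(dim)]
    lz = [[float(m_values[r]) if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    l_plus = [[0.0] * dim for _ in range(dim)]
    for col in range(1, dim):
        m = m_values[col]
        # From (8.51): L_+ phi_m = sqrt(l(l+1) - m(m+1)) phi_{m+1}
        l_plus[col - 1][col] = math.sqrt(float(l * (l + 1) - m * (m + 1)))
    # L_- is the Hermitian conjugate of L_+; the entries are real, so transpose.
    l_minus = [[l_plus[c][r] for c in range(dim)] for r in range(dim)]
    return lz, l_plus, l_minus


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(a, b, factor):
    """Return a + factor * b, element by element."""
    return [[x + factor * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def max_abs(a):
    return max(abs(x) for row in a for x in row)


def check_algebra(l):
    """Largest deviations from (8.33), (8.34) and L^2 = l(l+1), in units of hbar."""
    lz, lp, lm = ladder_matrices(l)
    dim = len(lz)
    identity = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    comm_pm = combine(matmul(lp, lm), matmul(lm, lp), -1.0)
    comm_zp = combine(matmul(lz, lp), matmul(lp, lz), -1.0)
    # From (8.32): L^2 = L_- L_+ + L_z^2 + hbar L_z
    l_squared = combine(combine(matmul(lm, lp), matmul(lz, lz), 1.0), lz, 1.0)
    ll1 = float(Fraction(l) * (Fraction(l) + 1))
    return (
        max_abs(combine(comm_pm, lz, -2.0)),
        max_abs(combine(comm_zp, lp, -1.0)),
        max_abs(combine(l_squared, identity, -ll1)),
    )


for l in (Fraction(1, 2), 1, Fraction(3, 2), 2):
    print(f"l = {l}: deviations = {check_algebra(l)}")
```

The same coefficients serve for integer and half-integer l, which reflects the fact that the derivation used only the commutation relations.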
Worked Example 8.5 Obtain normalized expressions for $Y_{1,1}$ and $Y_{1,-1}$, given that $Y_{1,0} = (3/4\pi)^{1/2}\cos\theta$. (Use the results of Worked Example 8.4.)

Solution

$$Y_{1,\pm 1} = [1 \times 2 - 0 \times 1]^{-1/2}\hbar^{-1}\hat{L}_\pm Y_{1,0} = \pm 2^{-1/2}\exp(\pm i\phi)\frac{\partial}{\partial\theta}(3/4\pi)^{1/2}\cos\theta = \mp(3/8\pi)^{1/2}\sin\theta\exp(\pm i\phi)$$

which agrees with the expressions given in (6.42).
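The normalization obtained in Worked Example 8.5 can be confirmed by integrating the squared modulus over the unit sphere. The following is a minimal sketch using a midpoint rule; the number of integration points is an arbitrary choice and the function names are hypothetical.

```python
import math


def y11_squared(theta):
    """|Y_{1,+/-1}|^2 = (3/8 pi) sin^2(theta); the factor exp(+/- i phi) has unit modulus."""
    return (3.0 / (8.0 * math.pi)) * math.sin(theta) ** 2


def y10_squared(theta):
    """|Y_{1,0}|^2 = (3/4 pi) cos^2(theta)."""
    return (3.0 / (4.0 * math.pi)) * math.cos(theta) ** 2


def integrate_over_sphere(density, n_theta=2000):
    """Midpoint rule in theta with weight sin(theta).

    The density is assumed independent of phi, so the phi integral
    contributes a factor of 2 pi.
    """
    d_theta = math.pi / n_theta
    total = 0.0
    for k in range(n_theta):
        theta = (k + 0.5) * d_theta
        total += density(theta) * math.sin(theta) * d_theta
    return 2.0 * math.pi * total


norm_y11 = integrate_over_sphere(y11_squared)
norm_y10 = integrate_over_sphere(y10_squared)
print(f"integral of |Y_1,1|^2 = {norm_y11:.6f}")
print(f"integral of |Y_1,0|^2 = {norm_y10:.6f}")
```

Both integrals should be close to unity, within the accuracy of the midpoint rule.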
8.5 PROBLEMS

8.1 Show that $\hat{\mathbf{L}}\cdot\hat{\mathbf{P}} = \hat{\mathbf{L}}\cdot\hat{\mathbf{R}} = 0$.

8.2 A point mass µ is attached by a massless rigid rod of length a to a fixed point in space and is free to rotate in any direction. Find a classical expression for the energy of the system if its total angular momentum has magnitude L, and hence show that the quantum-mechanical energy levels are given by $l(l + 1)\hbar^2/2\mu a^2$ where l is a positive integer.

8.3 The mass and rod in Problem 8.2 are now mounted so that they can rotate only about a particular axis that is perpendicular to the rod and fixed in space. What are the energy levels now? What can be said about the components of angular momentum perpendicular to the axis of such a system when it is in an energy eigenstate?

8.4 A particle has a wave function of the form $\psi = z\exp[-\alpha(x^2 + y^2 + z^2)]$, where α is a constant. Show that this function is an eigenfunction of $\hat{L}^2$ and $\hat{L}_z$ and find the corresponding eigenvalues. Use the raising and lowering operators $\hat{L}_+$ and $\hat{L}_-$ to obtain (unnormalized) expressions for all the other eigenfunctions of $\hat{L}_z$ that correspond to the same eigenvalue of $\hat{L}^2$. Hint: Use Cartesian coordinates.

8.5 Verify directly that the spherical harmonics $Y_{lm}$ with $l \le 2$ (see Chapter 6) are eigenfunctions of $\hat{L}^2$ and $\hat{L}_z$ with the appropriate eigenvalues.

8.6 In the carbon dioxide molecule, the three atoms form a straight line with the carbon in the centre. Obtain an expression for the rotational energy levels of this molecule and calculate the energies of the three lowest states.

8.7 A magnetic rigid rod is subject to a magnetic field. Its Hamiltonian is then $\hat{H} = \alpha\hat{L}^2 + \beta\hat{L}_z$ where α and β are constants. Obtain expressions for the energy eigenvalues of this system. At time t = 0 the angular part of the wavefunction of this system is $\psi(0) = (3/16\pi)^{1/2}\sin\theta\sin\phi$. Obtain an expression for ψ(t). Show that at any time t, the wavefunction is an eigenfunction of a component of angular momentum in the xy plane. Give a physical interpretation of this result.

8.8 The total orbital angular momentum of the electrons in a silver atom turns out to be zero while its total-spin quantum number is $\tfrac{1}{2}$, resulting in a magnetic moment whose z component has a magnitude of 1 Bohr magneton ($9.3 \times 10^{-24}$ J T$^{-1}$). In the original Stern-Gerlach experiment, silver atoms, each with a kinetic energy of about $3 \times 10^{-20}$ J, travelled a distance of 0.03 m through a nonuniform magnetic field of gradient $2.3 \times 10^{3}$ T m$^{-1}$. Calculate the separation of the two beams 0.25 m beyond the magnet. [The mass of a silver atom equals $1.79 \times 10^{-25}$ kg.]

8.9 Describe the patterns you would expect to see if the Stern-Gerlach experiment were performed on particles with zero spin and orbital angular momentum quantum numbers l = 1, l = 2 and l = 3 if the values of the angular momentum components are initially unknown.

8.10 A zero-spin particle with angular momentum quantum numbers l = 1 and m = 1 passes through a Stern-Gerlach apparatus oriented so as to measure $\hat{L}_x$. What possible results could be obtained and what are their relative probabilities? In each case, describe the form of the angular part of the wavefunction following the measurement.

8.11 The wave function of a particle is known to have the form $f(r, \theta)\cos\phi$. What can be predicted about the likely outcome of a measurement of the z component of angular momentum of this system?

8.12 Derive (8.52) in a similar manner to that set out for (8.51).

8.13 Express $\hat{L}_x$ and $\hat{L}_y$ in terms of $\hat{L}_+$ and $\hat{L}_-$ and hence show that for a system in an eigenstate of $\hat{L}_z$, $\langle\hat{L}_x\rangle = \langle\hat{L}_y\rangle = 0$. Also obtain expressions for $\langle\hat{L}_x^2\rangle$ and $\langle\hat{L}_y^2\rangle$ (using (8.51) and (8.52) where necessary) and compare the product of these two quantities with the predictions of the uncertainty principle.

8.14 Use the raising and lowering operators to derive normalized expressions for all the spherical harmonics with l = 2, given that

$$Y_{20} = \left(\frac{5}{16\pi}\right)^{1/2}(3\cos^2\theta - 1)$$

Compare your results with those given in Chapter 6.
CHAPTER 9

Angular Momentum II

CONTENTS
9.1 Matrix Representations
9.2 Pauli Spin Matrices
9.3 Spin and the Quantum Theory of Measurement
9.4 Dirac Notation
9.5 Spin-Orbit Coupling and the Zeeman Effect
    Spin-orbit coupling
    The strong-field Zeeman effect
    The weak-field Zeeman effect
9.6 A More General Treatment of the Coupling of Angular Momenta
9.7 Problems
In the first part of this chapter we shall show how it is possible to represent dynamical variables by matrices instead of differential operators without affecting the predicted results of physically significant quantities. We shall see that, although this representation is often more complicated and cumbersome than the methods used earlier, it has a particularly simple form when applied to problems involving angular momentum, and has the great advantage that it can be used in the case of spin where no differential operator representation exists. We shall use such spin matrices to analyze experiments designed to measure the components of spin in various directions and find that these provide an important and illustrative example of the quantum theory of measurement. In the second part of this chapter we shall discuss the problem of the addition of different angular momenta (such as the orbital and spin angular momenta of an electron in an atom) and illustrate the results by discussing the effects of spin-orbit coupling and applied magnetic fields on the spectra of one-electron atoms.
9.1 MATRIX REPRESENTATIONS

We first consider the case of a dynamical variable represented by the Hermitian operator $\hat{Q}$. If one of its eigenvalues is q and the corresponding eigenfunction is ψ then

$$\hat{Q}\psi = q\psi \tag{9.1}$$

We now expand ψ in terms of some complete orthonormal set of functions $\phi_n$, which are not necessarily eigenfunctions of $\hat{Q}$ (cf. (7.28))

$$\psi = \sum_n a_n\phi_n \tag{9.2}$$

As in Chapter 7, we assume that the $\phi_n$ form a discrete set, but the extension to the continuous case is reasonably straightforward. Substituting from (9.2) into (9.1) we get

$$\sum_n a_n\hat{Q}\phi_n = q\sum_n a_n\phi_n \tag{9.3}$$

We now multiply both sides of (9.3) by the complex conjugate of one of the functions, say $\phi_m^*$, and integrate over all space. This leads to

$$\sum_n Q_{mn}a_n = qa_m \tag{9.4}$$

where the $Q_{mn}$ are defined by

$$Q_{mn} = \int \phi_m^*\hat{Q}\phi_n\,d\tau \tag{9.5}$$

and we have used the orthogonality property in deriving the right-hand side of (9.4). Equation (9.4) is true for all values of m so we can write it in matrix form

$$\begin{pmatrix} Q_{11} & Q_{12} & \cdots \\ Q_{21} & Q_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \end{pmatrix} = q\begin{pmatrix} a_1 \\ a_2 \\ \vdots \end{pmatrix} \tag{9.6}$$

where it is clear from (9.5) and the fact that $\hat{Q}$ is a Hermitian operator, that $Q_{nm} = Q_{mn}^*$. A matrix whose elements obey this condition is known as a Hermitian matrix, and Equation (9.6) is a matrix eigenvalue equation. Such an equation has solutions only for the particular values of q that satisfy the determinantal condition

$$\begin{vmatrix} Q_{11} - q & Q_{12} & \cdots \\ Q_{12}^* & Q_{22} - q & \cdots \\ \vdots & \vdots & \ddots \end{vmatrix} = 0 \tag{9.7}$$

There are as many solutions to this equation as there are rows (or columns) in the determinant. As we have already defined q as one of the eigenvalues of the original operator $\hat{Q}$, it follows that the eigenvalues of the matrix equation (9.6) are the same as those of the original differential equation (9.1) and these two equations are therefore formally equivalent. Equation (9.6) can therefore be written in the form

$$[Q][a] = q[a] \tag{9.8}$$
where [Q] is the matrix whose elements are $Q_{mn}$ and [a] is the column vector whose elements are $a_n$. In the same way that ψ is an eigenfunction of the Hermitian operator $\hat{Q}$, the column vector [a] is an “eigenvector” of the Hermitian matrix [Q].

The properties of Hermitian matrices are closely analogous to those of Hermitian operators discussed in Chapter 7. However, one difference is that where we took the complex conjugate of a function, we now take the complex conjugate of, say, a column vector (e.g. [a] in (9.8)) and “transpose” it into the corresponding row vector. This process is known as “taking the Hermitian conjugate” of the vector. We can apply a similar process to a matrix by exchanging its rows and columns and at the same time taking its complex conjugate. In the case of a vector, this process is equivalent to converting a row vector to a column vector (or vice versa) and then taking its complex conjugate. In both cases this is symbolized by a † superscript. Thus, for any matrix [A]

$$[A^\dagger] \equiv [\tilde{A}^*]$$

where ˜ implies transpose and * implies complex conjugate, so that if [Q] is a Hermitian matrix

$$[Q^\dagger] = [Q] \tag{9.9}$$

As an example of the equivalence of the matrix and differential-operator representations, we can show that the eigenvalues of a Hermitian matrix are real (cf. (7.17)). We take the Hermitian conjugate of (9.8)

$$[a^\dagger][Q] = q^*[a^\dagger] \tag{9.10}$$

using the standard rule for transposing a product of matrices and (9.9). We now premultiply (9.8) by $[a^\dagger]$ to get

$$[a^\dagger][Q][a] = q[a^\dagger][a]$$

Then postmultiply (9.10) by [a] to get

$$[a^\dagger][Q][a] = q^*[a^\dagger][a]$$

from which it follows that $q = q^*$ and the eigenvalue is real.

We now consider orthonormality, which was expressed in Chapter 7 as

$$\int \phi_n^*\phi_m\,d\tau = \delta_{nm} \tag{9.11}$$
If $[a']$ is another eigenvector of [Q] with eigenvalue $q'$, we postmultiply (9.10) by $[a']$ to get

$$[a^\dagger][Q][a'] = q^*[a^\dagger][a'] \tag{9.12}$$

We now write (9.10) for the eigenvector $[a']$, remembering that $q'$ is real, and postmultiply it by [a] to get

$$[a'^\dagger][Q][a] = q'[a'^\dagger][a]$$

and take the Hermitian conjugate of this equation

$$[a^\dagger][Q][a'] = q'^*[a^\dagger][a'] \tag{9.13}$$

The left-hand sides of (9.12) and (9.13) are equal, so it follows that

$$[a^\dagger][a'] = 0 \tag{9.14}$$

unless $q = q'$. By orthogonality of eigenvectors we therefore mean that the scalar product of one with the Hermitian conjugate of the other is zero. We can also ensure normalization, in the sense that $[a^\dagger][a] = 1$, by multiplying all the elements of the eigenvector by an appropriate constant. The other properties of Hermitian operators set out in Chapter 7 can be converted to matrix form in a similar manner, the main difference being that where integrals of products of operators and functions appear in the former case, they are replaced by products of matrices and vectors in the latter.

We see from this that the physical properties of a quantum-mechanical system can be derived using appropriate matrix equations instead of the differential operator formalism. The postulates and algebra contained in Chapter 7 can all be expressed directly in matrix terms: the dynamical variables are represented by Hermitian matrices and the state of the system is described by a “state vector” that is identical to the appropriate eigenvector immediately after a measurement whose result was equal to the corresponding eigenvalue. Moreover, provided we could postulate appropriate matrices to represent the dynamical variables, the whole procedure could be carried out without ever referring to the original operators or their eigenfunctions. However, the matrix method suffers from one major disadvantage: because most quantum systems have an infinite number of eigenstates, the dynamical variables, including position and momentum, usually have to be represented by matrices of infinite order.
Techniques for obtaining the eigenvalues in some such cases have been developed and, indeed, Heisenberg developed a form of quantum mechanics based on matrices (sometimes known as “matrix mechanics”) and used it to solve the energy eigenvalue equations for the simple harmonic oscillator and the hydrogen atom before Schrödinger had invented “wave mechanics.” Nevertheless, matrices of infinite order are generally difficult to handle and solving the corresponding differential equations is usually more straightforward. However, this problem does not arise when the number of eigenvalues is finite, which accounts for the usefulness of matrix methods when studying angular momentum. Provided the total angular momentum has a fixed value (determined by the quantum number l), only the 2l + 1 states with different values of the quantum number m have to be considered and the angular momentum operators can be represented by matrices of order 2l + 1. This simplification is particularly useful in the case of spin, as for spin-half particles, $s = \tfrac{1}{2}$ (s is the equivalent of l in the case of spin) and therefore 2 × 2 matrices can be used. Moreover, we have seen that it is not possible to represent the spin components by differential operators so matrix methods are particularly useful in this case.
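The two general properties derived in this section (real eigenvalues and orthogonal eigenvectors) can be illustrated for the smallest nontrivial case. The sketch below solves the 2 × 2 version of the determinantal condition (9.7) in closed form; the example matrix is an arbitrary assumption chosen only because it is Hermitian, and the function names are hypothetical.

```python
import math


def eigenpairs_2x2_hermitian(q_matrix):
    """Eigenvalues and normalized eigenvectors of a 2x2 Hermitian matrix.

    Assumes a nonzero off-diagonal element, so that (b, q - a) is a valid
    eigenvector for each root q of the 2x2 form of (9.7).
    """
    (a, b), (c, d) = q_matrix
    if abs(c - b.conjugate()) > 1e-12 or complex(a).imag != 0 or complex(d).imag != 0:
        raise ValueError("matrix is not Hermitian")
    if b == 0:
        raise ValueError("off-diagonal element must be nonzero in this sketch")
    a, d = complex(a).real, complex(d).real
    mean = 0.5 * (a + d)
    root = math.sqrt((0.5 * (a - d)) ** 2 + abs(b) ** 2)
    pairs = []
    for q in (mean + root, mean - root):
        vec = (b, q - a)                      # solves (a - q) x + b y = 0
        norm = math.sqrt(abs(vec[0]) ** 2 + abs(vec[1]) ** 2)
        pairs.append((q, (vec[0] / norm, vec[1] / norm)))
    return pairs


def inner(u, v):
    """Scalar product [u-dagger][v] of two column vectors."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


def apply(matrix, vec):
    return tuple(sum(row[k] * vec[k] for k in range(2)) for row in matrix)


Q = [[2, 1 - 1j], [1 + 1j, 3]]                # arbitrary Hermitian example
(q1, a1), (q2, a2) = eigenpairs_2x2_hermitian(Q)

print("eigenvalues:", q1, q2)
print("|[a1-dagger][a2]| =", abs(inner(a1, a2)))
print("[a1-dagger][a1] =", inner(a1, a1).real)
```

The eigenvalues are real by construction of the closed-form roots, and the scalar product of the two eigenvectors vanishes to rounding accuracy, as (9.14) requires.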
9.2 PAULI SPIN MATRICES

A suitable set of 2 × 2 matrices that can be used to represent the angular-momentum components of a spin-half particle was first discovered by W. Pauli; these are known as Pauli spin matrices. They are

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{9.15}$$

and the operators $\hat{S}_x$, $\hat{S}_y$ and $\hat{S}_z$ representing the three spin components are expressed in matrix form as

$$\hat{S}_x = \tfrac{1}{2}\hbar\sigma_x, \text{ etc.} \tag{9.16}$$

To see that these matrices do indeed form a suitable representation for spin, we first check whether they obey the correct commutation relations

$$\begin{aligned}
[\hat{S}_x, \hat{S}_y] &= \tfrac{1}{4}\hbar^2\left[\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} - \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right] \\
&= \tfrac{1}{4}\hbar^2\left[\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} - \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}\right] \\
&= i\,\tfrac{1}{2}\hbar^2\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \\
&= i\hbar\hat{S}_z
\end{aligned} \tag{9.17}$$

This result agrees with that derived earlier (8.7) and the other commutation relations can be similarly verified.

Turning now to the eigenvalues, we see that those of $\hat{S}_z$ are simply $\tfrac{1}{2}\hbar$ times the eigenvalues of $\sigma_z$, which are in turn obtained from the 2 × 2 equivalent of (9.7)

$$\begin{vmatrix} 1 - q & 0 \\ 0 & -1 - q \end{vmatrix} = 0 \tag{9.18}$$

Hence $q = \pm 1$ and the eigenvalues of $\hat{S}_z$ are $\pm\tfrac{1}{2}\hbar$ as expected. The corresponding eigenvectors are obtained from the eigenvalue equation, which in the case q = +1 is

$$\tfrac{1}{2}\hbar\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \tfrac{1}{2}\hbar\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \tag{9.19}$$

which is satisfied if $a_2 = 0$, and, if we require the eigenvector to be normalized, we also have $a_1 = 1$. In the case where q = −1, the corresponding results are $a_1 = 0$ and $a_2 = 1$. We therefore conclude that the eigenvectors of $\hat{S}_z$ corresponding to the eigenvalues $\tfrac{1}{2}\hbar$ and $-\tfrac{1}{2}\hbar$ are $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, respectively. Moreover, these are orthogonal as expected from (9.14).

Eigenvalues and eigenvectors can similarly be found for $\hat{S}_x$ and $\hat{S}_y$. In each case the eigenvalues are equal to $\pm\tfrac{1}{2}\hbar$ and the corresponding eigenvectors are shown in Table 9.1. The reader should verify this and show that each eigenvector is normalized and that each pair is orthogonal.
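The remaining commutation relations can be verified in the same way as (9.17). The following is a minimal sketch that multiplies out the 2 × 2 matrices directly; it assumes units in which ħ = 1, and the helper names are hypothetical.

```python
HBAR = 1.0  # assumption: units in which hbar = 1

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]


def scale(matrix, factor):
    return [[factor * x for x in row] for row in matrix]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


# Spin matrices from (9.16): S = (hbar / 2) * sigma
S_X, S_Y, S_Z = (scale(s, 0.5 * HBAR) for s in (SIGMA_X, SIGMA_Y, SIGMA_Z))

checks = {
    "[Sx, Sy] = i hbar Sz": close(commutator(S_X, S_Y), scale(S_Z, 1j * HBAR)),
    "[Sy, Sz] = i hbar Sx": close(commutator(S_Y, S_Z), scale(S_X, 1j * HBAR)),
    "[Sz, Sx] = i hbar Sy": close(commutator(S_Z, S_X), scale(S_Y, 1j * HBAR)),
}
for relation, holds in checks.items():
    print(relation, "holds" if holds else "fails")
```

The three relations are cyclic permutations of one another, so a sign error in any one of the matrices would show up in at least one of the checks.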
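TABLE 9.1 The eigenvalues and eigenvectors of the matrices representing the angular momentum components of a spin-half particle. NB: The overall phase of an eigenvector, like that of a wave function, is arbitrary.

Spin component | Eigenvalue | Eigenvector
$\hat{S}_x$ | $\tfrac{1}{2}\hbar$ | $\alpha_x \equiv \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$
$\hat{S}_x$ | $-\tfrac{1}{2}\hbar$ | $\beta_x \equiv \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$
$\hat{S}_y$ | $\tfrac{1}{2}\hbar$ | $\alpha_y \equiv \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}$
$\hat{S}_y$ | $-\tfrac{1}{2}\hbar$ | $\beta_y \equiv \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}$
$\hat{S}_z$ | $\tfrac{1}{2}\hbar$ | $\alpha_z \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}$
$\hat{S}_z$ | $-\tfrac{1}{2}\hbar$ | $\beta_z \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}$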
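The entries of Table 9.1 can be checked mechanically by substituting each eigenvector into its eigenvalue equation. The sketch below does this in units where ħ = 1 (an assumption made for convenience); the dictionary and function names are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)

SPIN = {  # spin matrices (hbar / 2) * sigma, in units of hbar
    "x": [[0, 0.5], [0.5, 0]],
    "y": [[0, -0.5j], [0.5j, 0]],
    "z": [[0.5, 0], [0, -0.5]],
}
TABLE = {  # (alpha, beta) eigenvectors as listed in Table 9.1
    "x": ([R, R], [R, -R]),
    "y": ([R, 1j * R], [R, -1j * R]),
    "z": ([1, 0], [0, 1]),
}


def apply(matrix, vec):
    return [sum(matrix[i][k] * vec[k] for k in range(2)) for i in range(2)]


def inner(u, v):
    """Scalar product [u-dagger][v]."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


def verify(component, tol=1e-12):
    """Check eigenvalues +1/2 and -1/2, normalization and orthogonality."""
    alpha, beta = TABLE[component]
    s = SPIN[component]
    ok_alpha = all(abs(a - 0.5 * b) < tol for a, b in zip(apply(s, alpha), alpha))
    ok_beta = all(abs(a + 0.5 * b) < tol for a, b in zip(apply(s, beta), beta))
    normalized = abs(inner(alpha, alpha) - 1) < tol and abs(inner(beta, beta) - 1) < tol
    orthogonal = abs(inner(alpha, beta)) < tol
    return ok_alpha and ok_beta and normalized and orthogonal


for component in "xyz":
    print(f"S_{component}:", "consistent" if verify(component) else "inconsistent")
```

Multiplying any of the listed eigenvectors by a phase factor would leave all of these checks satisfied, in line with the note in the table caption.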
As a further test of the correctness of this representation, we consider the square of the total angular momentum whose operator is given by Sˆ 2 = Sˆ 2x + Sˆ y2 + Sˆ z2 " # 3 1 0 = }2 0 1 4
(9.20)
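using (9.15) and (9.16). Clearly $\hat{S}^2$ commutes with $\hat{S}_x$, $\hat{S}_y$ and $\hat{S}_z$ as expected, and all the eigenvectors given in Table 9.1 are also eigenvectors of $\hat{S}^2$ with eigenvalue $\tfrac{1}{2}(\tfrac{1}{2} + 1)\hbar^2$. We conclude therefore that the Pauli spin matrix representation for the angular-momentum components of a spin-half particle produces the same results as were obtained previously, and we can proceed with confidence to use this representation to obtain further information about the properties of spin.

Worked Example 9.1 Find matrix representations for the raising and lowering operators defined in Chapter 8 in the case of a spin-half particle and verify that they have the expected properties.

Solution

$$\hat{S}_+ = \hat{S}_x + i\hat{S}_y = \tfrac{1}{2}\hbar \left( \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + i \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} \right) = \hbar \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$$\hat{S}_- = \hat{S}_x - i\hat{S}_y = \tfrac{1}{2}\hbar \left( \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} - i \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} \right) = \hbar \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$$

Acting on the eigenvectors of $\hat{S}_z$, these give

$$\hat{S}_+ \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \hbar \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 0 \qquad \hat{S}_+ \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \hbar \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \hbar \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

$$\hat{S}_- \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \hbar \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \hbar \begin{bmatrix} 0 \\ 1 \end{bmatrix} \qquad \hat{S}_- \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \hbar \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0$$

confirming that the raising and lowering operators move the state up and down the (two-rung) ladder of eigenstates of $\hat{S}_z$.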
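The matrix algebra of (9.20) and Worked Example 9.1 is easy to reproduce by machine. The sketch below assumes ħ = 1 and uses small hand-written 2 × 2 helpers (`combine`, `matmul`, `matvec`), which are illustrative names rather than anything defined in the text.

```python
# Sketch: build S+ and S- from the Pauli representation and form S^2,
# assuming hbar = 1. Standard library only; all helper names are illustrative.

def combine(a, b, c):
    """Return the 2x2 matrix a + c*b."""
    return [[a[i][k] + c * b[i][k] for k in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(m, v):
    """Product of a 2x2 matrix and a 2-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]

S_PLUS = combine(SX, SY, 1j)     # Sx + i Sy
S_MINUS = combine(SX, SY, -1j)   # Sx - i Sy

# S^2 = Sx^2 + Sy^2 + Sz^2, which should equal (3/4) times the unit matrix.
S_SQUARED = combine(combine(matmul(SX, SX), matmul(SY, SY), 1),
                    matmul(SZ, SZ), 1)

ALPHA_Z, BETA_Z = [1, 0], [0, 1]

raised_from_bottom = matvec(S_PLUS, BETA_Z)    # expected to be alpha_z
raised_from_top = matvec(S_PLUS, ALPHA_Z)      # expected to vanish
lowered_from_top = matvec(S_MINUS, ALPHA_Z)    # expected to be beta_z
lowered_from_bottom = matvec(S_MINUS, BETA_Z)  # expected to vanish
```

Because every matrix element here is a multiple of one-half, the floating-point arithmetic is exact and the results can be compared with `==`.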
9.3 SPIN AND THE QUANTUM THEORY OF MEASUREMENT
The measurement of spin provides a very clear illustration of the quantum theory of measurement. Consider a beam of spin-half particles travelling along the y axis, as in Figure 9.1, towards a Stern-Gerlach apparatus oriented to measure the z component of spin (such an apparatus will be denoted by SGZ). We assume that the number of particles in each beam is small enough for there to be only one in the Stern-Gerlach apparatus at any one time; we can therefore treat each particle independently and ignore interactions between them. Two beams of particles will emerge, one with z component of spin equal to $-\tfrac{1}{2}\hbar$, which we block off with a suitable stop, and the other with z component $\tfrac{1}{2}\hbar$, which we allow to continue. This measurement has therefore defined the state of the system, and the state vector of the particles that carry on is known to be the eigenvector $\alpha_z$ of $\hat{S}_z$, corresponding to an eigenvalue of $\tfrac{1}{2}\hbar$ (cf. Table 9.1). If we were to pass this beam directly through another SGZ apparatus, we should expect all the particles to emerge in the channel corresponding to $S_z = \tfrac{1}{2}\hbar$ and none to emerge in the other channel, and indeed this prediction is confirmed experimentally.

We now consider the case where the particles that emerge from the first SGZ with a positive z component pass into a similar apparatus oriented to measure the x component of spin (i.e., an SGX) as in Figure 9.1. Each particle must emerge from the SGX with $S_x$ equal to either $\tfrac{1}{2}\hbar$ or $-\tfrac{1}{2}\hbar$, but, as the initial state vector is not an eigenvector of $\hat{S}_x$, we cannot predict which result will actually be obtained. However, we can use the quantum theory of measurement (Chapter 7, Postulate 4) to predict the relative probabilities of the possible outcomes. We expand the spin part¹ of the initial state vector as a linear combination of the eigenvectors of $\hat{S}_x$, and the probabilities are then given by the squared moduli of the appropriate expansion coefficients.
Thus, referring to Table 9.1

$$\alpha_z = c_+ \alpha_x + c_- \beta_x \tag{9.21}$$

¹ To define the complete state of the particle we should multiply the vector representing the spin state of the particle by a wave function representing its motion in space. For example, the full state of the particle emerging from the first SGZ apparatus would have the form $\begin{bmatrix} 1 \\ 0 \end{bmatrix} \exp(iky)$. It simplifies the treatment if this is implied rather than explicitly included.
FIGURE 9.1 Successive measurements of the angular momentum components of spin-half particles. The boxes represent sets of Stern-Gerlach apparatus which direct a particle into the upper or lower output channel depending on whether the appropriate spin component is found to be positive or negative. The y direction is from left to right.
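where $c_+$ and $c_-$ are constants. Using the matrix equivalent of (7.30), we get

$$\begin{aligned}
c_+ &= \alpha_x^\dagger \alpha_z = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}} \\
c_- &= \beta_x^\dagger \alpha_z = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}}
\end{aligned} \tag{9.22}$$

Thus

$$\alpha_z = \frac{1}{\sqrt{2}} (\alpha_x + \beta_x) \tag{9.23}$$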
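The expansion (9.21) to (9.23) amounts to taking inner products with the measurement eigenvectors and squaring their moduli. As a rough sketch (the function names `amplitude` and `probabilities` are hypothetical, introduced only for this illustration), the same calculation can be written for any initial state and any orthonormal pair of eigenvectors:

```python
# Sketch of the expansion-coefficient calculation of (9.22), using plain
# Python complex arithmetic. Function names are illustrative only.
from math import sqrt


def amplitude(basis_vec, state):
    """Expansion coefficient c = (basis_vec)-dagger . state."""
    return sum(complex(b).conjugate() * s for b, s in zip(basis_vec, state))


def probabilities(state, basis):
    """Probabilities |c|^2 of finding `state` in each eigenvector of `basis`."""
    return [abs(amplitude(b, state)) ** 2 for b in basis]


r = 1 / sqrt(2)
ALPHA_Z, BETA_Z = [1, 0], [0, 1]
ALPHA_X, BETA_X = [r, r], [r, -r]

# Particles prepared in alpha_z and then sent through an SGX apparatus.
p_plus, p_minus = probabilities(ALPHA_Z, [ALPHA_X, BETA_X])
```

Swapping the roles of the two bases, for instance `probabilities(ALPHA_X, [ALPHA_Z, BETA_Z])`, applies the same recipe to a beam prepared in an eigenstate of the x component and analysed along z.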
The two possible results of a measurement of the x component of spin therefore have equal probability so, although we cannot predict the result of the measurement on any particular particle, we know that, if the experiment is repeated a large number of times, half the particles will emerge in each channel.

We saw in Chapter 7 that an important aspect of quantum measurement theory is that the act of measurement leaves the system in an eigenstate of the measurement operator, but destroys knowledge of any other property that is not compatible with it. To emphasize this point, suppose we pass one of the beams of particles emerging from the SGX through a second SGZ (see Figure 9.1 again); a similar argument to the previous one shows that we shall again be unable to predict in which channel a particular particle will emerge, although the relative probabilities of the two possible results are again equal. We see therefore that the act of measuring one component of spin has destroyed any knowledge we previously had about the value of another. In fact the quantum theory of measurement (at least as it is conventionally interpreted) states that while a particle is in an eigenstate of $\hat{S}_z$, it is meaningless to think of it as having a value for the x component of its spin, as the latter can be measured only by changing the state of the system to be an eigenstate of $\hat{S}_x$. Moreover, as we have seen, the result of a measurement of the x component is completely unpredictable if the particle is in an eigenstate of $\hat{S}_z$, and this indeterminacy is to be thought of as an intrinsic property of such a quantum system.

We now consider a modification of the setup illustrated in Figure 9.1 in which the second Stern-Gerlach apparatus (now called SGθ) is oriented to measure the spin component in the xz plane at an angle θ to z. The operator $\hat{S}_\theta$, representing the component of angular momentum in this direction, is given by analogy with the classical expression as

$$\hat{S}_\theta = \hat{S}_z \cos\theta + \hat{S}_x \sin\theta = \tfrac{1}{2}\hbar \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} \tag{9.24}$$
using (9.15). The eigenvalues of this matrix are easily shown to be $\pm\tfrac{1}{2}\hbar$ as expected, and if the eigenvector in the positive case is $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$, then

$$\tfrac{1}{2}\hbar (a_1 \cos\theta + a_2 \sin\theta) = \tfrac{1}{2}\hbar a_1$$

and therefore

$$\frac{a_1}{a_2} = \frac{\sin\theta}{1 - \cos\theta} = \frac{\cos(\theta/2)}{\sin(\theta/2)} \tag{9.25}$$

The normalized eigenvector is therefore $\begin{bmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{bmatrix}$, and a similar argument leads to the expression $\begin{bmatrix} -\sin(\theta/2) \\ \cos(\theta/2) \end{bmatrix}$ in the case where the eigenvalue equals $-\tfrac{1}{2}\hbar$. We note that when θ = 0, the eigenvectors are the same as those of $\hat{S}_z$, and when θ = π/2 they are identical to those of $\hat{S}_x$ (cf. Table 9.1); this is to be expected as the θ direction is parallel to z and x, respectively, in these cases.

The probabilities of obtaining a positive or negative result from such a measurement are $|c_+|^2$ and $|c_-|^2$ respectively, where $c_+$ and $c_-$ are now the coefficients of the SGθ eigenvectors in the expansion of the initial state vector. Thus

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = c_+ \begin{bmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{bmatrix} + c_- \begin{bmatrix} -\sin(\theta/2) \\ \cos(\theta/2) \end{bmatrix} \tag{9.26}$$

from which it follows that

$$c_+ = \cos(\theta/2) \quad \text{and} \quad c_- = -\sin(\theta/2) \tag{9.27}$$
and the probabilities of a positive or negative result are therefore $\cos^2(\theta/2)$ and $\sin^2(\theta/2)$, respectively. We note that, when θ = 0, the result is certain to be positive because in this case the second apparatus is identical to the first, and that when θ = π/2 (corresponding to a measurement of the x component) the probability of each is equal to one-half in agreement with the earlier result. Experiments have been performed to carry out such measurements and in each case they produce results in agreement with those predicted by the quantum theory of measurement.

Some scientists have found these features of quantum mechanics very unsatisfactory and attempts have been made to develop other theories, known as “hidden variable theories,” that avoid the problems of indeterminacy and the like. We shall discuss these in some detail in Chapter 16 where we shall show that, where such theories predict results different from those of quantum mechanics, experiment has always favoured the latter. In that chapter we shall also discuss the measurement problem in more depth, again making use of the measurement of spin components as an illustrative example.

Worked Example 9.2 An experiment like that just described, but with the second and third apparatuses oriented in the θ and x directions, is performed. What are the probabilities of a positive or negative result in this case?

Solution The state of the particles emerging from SGθ and entering SGX is $\begin{bmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{bmatrix}$, and the eigenvectors of SGX are $\alpha_x = 2^{-1/2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\beta_x = 2^{-1/2} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. The equivalent of (9.26) in this case is therefore

$$\begin{bmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{bmatrix} = 2^{-1/2} c_+ \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2^{-1/2} c_- \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

Hence

$$c_+ = 2^{-1/2} \left( \cos(\theta/2) + \sin(\theta/2) \right) \quad \text{and} \quad c_- = 2^{-1/2} \left( \cos(\theta/2) - \sin(\theta/2) \right)$$

The probabilities of a positive or negative result are therefore $\tfrac{1}{2}[1 + \sin\theta]$ and $\tfrac{1}{2}[1 - \sin\theta]$, respectively.

9.4 DIRAC NOTATION
P. A. M. Dirac was born and educated in Bristol, UK, and conducted research in mathematical physics, for most of his career at the University of Cambridge, from the 1920s until his death in 1984. He is best known for his work in developing the “Dirac equation,” which extends quantum mechanics into the high-energy regime where relativistic effects are important (see Chapter 14). Two main consequences of this are the fact that electrons and other fundamental particles possess an intrinsic angular momentum or “spin” and that every such particle also has an “antiparticle” (e.g., the positron is the antiparticle of the electron).

Another major contribution by Dirac was to put the whole of quantum mechanics on a more general footing. He was one of the first to realize that the measurable results of the theory are independent of the “basis” used to describe them. By this we mean that alternative formulations, such as the wave function and any matrix representation, are completely equivalent as far as their predictions of experimental results are concerned. Dirac developed a “representation-free” notation to express this. Consider a typical quantum expression such as

$$Q_{12} = \int \psi_1^* \hat{Q} \psi_2 \, d\tau \tag{9.28}$$

in wave notation, and its equivalent in matrix notation which is

$$Q_{12} = [a_1^\dagger][Q][a_2] \tag{9.29}$$

In Dirac notation we write

$$Q_{12} = \langle 1 | \hat{Q} | 2 \rangle \tag{9.30}$$

as a general expression that can stand for either of the other two. We can now split the “bracket” in (9.30) into a “bra,” denoted by $\langle 1 |$, and a “ket,” $| 2 \rangle$, which are the equivalents of $\psi_1^*$ (or $[a_1^\dagger]$) and $\psi_2$ (or $[a_2]$), respectively. To express an eigenfunction equation such as

$$\hat{Q} \phi_n = q_n \phi_n \quad \text{or} \quad [Q][a_n] = q_n [a_n] \tag{9.31}$$

in Dirac notation, we therefore write

$$\hat{Q} | n \rangle = q_n | n \rangle \tag{9.32}$$

We note that what is written inside the bra or the ket is quite arbitrary and can be chosen to suit the context of the problem. For example, the ground state of the hydrogen atom might be written as

$$| n = 1, l = 0, m = 0, s = \tfrac{1}{2} \rangle$$

or

$$| 1, 0, 0, \tfrac{1}{2} \rangle$$

or even as $| \text{the ground state of the hydrogen atom} \rangle$.

Dirac developed this notation when he formulated quantum mechanics in the language of “vector spaces.” The bras and kets form coordinate systems for two abstract “Hilbert spaces” of, in general, infinite dimensions. These ideas underlie most advanced thinking about quantum mechanics, but we shall use the Dirac notation as one that can be translated into whatever wave or matrix representation is most useful for the problem at hand. To recover the wave representation we should replace the ket by a wave function, and the bra by its complex conjugate, and if they appear together as in (9.30) the integration (9.28) is implied. The matrix representation can be similarly restored, remembering that (9.30) now implies the product (9.29). In the rest of this book, we shall use Dirac notation sparingly, but an example of its application will be found in the section on combining angular momenta towards the end of this chapter.
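To make the translation from (9.30) back to the matrix form (9.29) concrete, here is a small sketch for the spin-half case, assuming ħ = 1. The functions `bra`, `apply_operator` and `bracket` are names invented for this illustration; they simply spell out "conjugate-transpose the bra, then multiply."

```python
# Sketch: a Dirac bracket <1|Q|2> evaluated as the matrix product
# [a1-dagger][Q][a2] of (9.29), with hbar = 1. Names are illustrative only.
from math import sqrt


def bra(ket):
    """Components of the bra: the complex conjugates of the ket components."""
    return [complex(c).conjugate() for c in ket]


def apply_operator(q, ket):
    """Column vector [Q][a] for a square matrix q given as nested lists."""
    n = len(ket)
    return [sum(q[i][j] * ket[j] for j in range(n)) for i in range(n)]


def bracket(ket1, q, ket2):
    """<1|Q|2> as the row vector a1-dagger times the column vector Q a2."""
    return sum(b * c for b, c in zip(bra(ket1), apply_operator(q, ket2)))


r = 1 / sqrt(2)
SZ = [[0.5, 0], [0, -0.5]]          # S_z in units of hbar
ALPHA_Z = [1, 0]
ALPHA_X, BETA_X = [r, r], [r, -r]

diagonal_element = bracket(ALPHA_Z, SZ, ALPHA_Z)   # <alpha_z|S_z|alpha_z>
cross_element = bracket(ALPHA_X, SZ, BETA_X)       # <alpha_x|S_z|beta_x>
```

In the wave representation the same function would be replaced by a numerical integral of the form (9.28); the bracket notation itself does not care which is used.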
9.5 SPIN-ORBIT COUPLING AND THE ZEEMAN EFFECT
In the last chapter we showed that, classically, an electron with orbital angular momentum $\mathbf{l}$ has a magnetic moment $\boldsymbol{\mu}_l$ given by (8.23)

$$\boldsymbol{\mu}_l = -\frac{e}{2m_e} \mathbf{l} \tag{9.33}$$

and that the corresponding quantum-mechanical operators $\hat{\boldsymbol{\mu}}_l$ and $\hat{\mathbf{L}}$ are similarly related. There is a similar magnetic moment $\boldsymbol{\mu}_s$ associated with electron spin, which is proportional to the spin angular momentum, $\mathbf{s}$, but the constant of proportionality in this case cannot be easily calculated and must be obtained from experiment or from relativistic quantum theory; its value turns out to be almost exactly twice that in the orbital case and we therefore have

$$\boldsymbol{\mu}_s = -\frac{e}{m_e} \mathbf{s} \tag{9.34}$$
The orbital and spin magnetic moments separately interact with any applied magnetic field (the Zeeman effect) and also with each other (spin-orbit coupling). To obtain an expression for the quantum-mechanical operator representing the spin-orbit energy, we follow the same procedure as before and calculate an expression for the corresponding classical quantity, which we then quantize following the procedure set out in Chapter 7. We first consider the particular case of a one-electron atom, such as hydrogen, and extend our treatment to the more general case later.

Classically, an electron orbiting a nucleus of charge Ze “sees” the nucleus in orbit around it. Assuming the orbit to be circular and the angular velocity to be ω, the electron is at the centre of a loop of radius r carrying a current, I, equal to Zeω/2π and is therefore subject to a magnetic field of magnitude

$$B = \frac{\mu_0 I}{2r} = \frac{\mu_0 Ze\omega}{4\pi r} = \frac{\mu_0 Ze}{4\pi m_e r^3} \, l \tag{9.35}$$
where l is, as before, the magnitude of the classical angular momentum of the electron, i.e., $l = m_e r^2 \omega$. This is an example of a more general expression for the magnetic field $\mathbf{B}$ seen by a particle moving with velocity $\mathbf{v}$ in an electric field $\mathbf{E}(\mathbf{r})$, which is

$$\mathbf{B} = \mathbf{E} \times \mathbf{v} / c^2 \tag{9.36}$$

where c is the speed of light. If the electrostatic potential U associated with the field is spherically symmetric (as in a one-electron atom) we have

$$\mathbf{E} = -\frac{\partial U}{\partial r} \frac{\mathbf{r}}{r} = \frac{1}{e} \frac{\partial V}{\partial r} \frac{\mathbf{r}}{r} \tag{9.37}$$
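where $V = -eU$ is the potential energy of the electron in the electrostatic potential U. Hence, using (9.36) and (9.37) along with $\mathbf{l} = \mathbf{r} \times \mathbf{p} = m_e \mathbf{r} \times \mathbf{v}$ we get

$$\mathbf{B} = \frac{1}{e c^2 r} \frac{\partial V}{\partial r} \, \mathbf{r} \times \mathbf{v} = \frac{1}{m_e e c^2 r} \frac{\partial V}{\partial r} \, \mathbf{l} \tag{9.38}$$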
In the case of a one-electron atom $V(r) = -Ze^2/(4\pi\epsilon_0 r)$ and, since $\mu_0 \epsilon_0 c^2 = 1$, (9.38) is equivalent to (9.35). If the magnetic moment associated with the electron spin is $\boldsymbol{\mu}_s$, the energy of interaction (W) between it and this magnetic field is given by

$$W = -\tfrac{1}{2} \boldsymbol{\mu}_s \cdot \mathbf{B} \tag{9.39}$$
where the factor of one-half arises from a relativistic effect known as Thomas precession, which we shall not discuss further here, except to note that it has nothing to do with the other relativistic factor (of two) mentioned in connection with Equation (9.34). Combining (9.34), (9.38) and (9.39), we get

$$W = \frac{1}{2 m_e^2 c^2 r} \frac{\partial V}{\partial r} \, \mathbf{l} \cdot \mathbf{s} \tag{9.40}$$
We can make the transformation from classical mechanics to quantum mechanics in the usual way by replacing the dynamical variables with appropriate operators so that the interaction energy is represented by

$$\hat{H}' = \frac{1}{2 m_e^2 c^2 r} \frac{\partial V}{\partial r} \, \hat{\mathbf{L}} \cdot \hat{\mathbf{S}} \tag{9.41}$$
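It is useful to recast this expression in terms of the magnitudes of the orbital, spin and total angular momenta. If the total angular momentum is represented by $\hat{\mathbf{J}} = \hat{\mathbf{L}} + \hat{\mathbf{S}}$ then

$$\hat{J}^2 = \hat{L}^2 + \hat{S}^2 + 2 \hat{\mathbf{L}} \cdot \hat{\mathbf{S}} \tag{9.42}$$

and hence

$$\hat{H}' = \frac{1}{4 m_e^2 c^2 r} \frac{\partial V}{\partial r} \, (\hat{J}^2 - \hat{L}^2 - \hat{S}^2) \tag{9.43}$$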
Also, in the presence of an external magnetic field of magnitude $B_0$, which we assume to be in the z direction, we have, using (8.25), (8.27) and (9.34)

$$\hat{H}' = f(r) (\hat{J}^2 - \hat{L}^2 - \hat{S}^2) + (\mu_B / \hbar) B_0 (\hat{L}_z + 2 \hat{S}_z) \tag{9.44}$$

where

$$f(r) = \frac{1}{4 m_e^2 c^2 r} \frac{\partial V}{\partial r} = \frac{\mu_0 Z e^2}{16 \pi m_e^2 r^3} \tag{9.45}$$
and the final expression applies to the one-electron atom. We shall shortly discuss the effect of this on the energy levels of a one-electron atom, for which we shall need to remember some of the properties of the hydrogen-atom energy eigenfunctions derived in Chapter 6. In particular: the energy eigenfunctions are also eigenfunctions of the squared total orbital angular momentum $\hat{L}^2$ (eigenvalue $l(l+1)\hbar^2$, where l is a nonnegative integer) and its z component $\hat{L}_z$ (eigenvalue $m_l \hbar$, where $-l \le m_l \le l$); and eigenfunctions with the same l and different $m_l$ are degenerate. A general solution to (9.44) is a little complicated to derive, so we shall first treat some special cases. Some readers may prefer to proceed directly to the general treatment in Section 9.6.
Spin-orbit coupling

We first consider the case where the applied field is zero and we can ignore the second term of (9.44). The first point to note is that if $B_0 = 0$, there is nothing to define the direction of the z axis and therefore no necessity for the energy eigenfunctions to be eigenfunctions of $\hat{L}_z$ or $\hat{S}_z$. However, the first term contains $\hat{J}^2$, the squared magnitude of the total angular momentum, so the energy eigenfunctions should be simultaneous eigenfunctions of $\hat{J}^2$, $\hat{L}^2$ and $\hat{S}^2$, which can, in general, be expressed as linear combinations of the eigenfunctions of $\hat{L}_z$ and $\hat{S}_z$. At the end of the last chapter, we showed from very general considerations that the eigenvalues of the squared magnitude of any angular momentum operator have the form $j(j+1)\hbar^2$, where j is an integer or a half integer. It follows from this and (9.44) that the changes in the energy levels are proportional to

$$[j(j+1) - l(l+1) - s(s+1)] \hbar^2$$

The constant of proportionality can be estimated by assuming that the radial part of the wave function is unaffected by $\hat{H}'$ so that the energy change, δE, can be approximated by its expectation value. (This will be more rigorously justified in the next chapter.) We therefore have

$$\delta E = E_0 [j(j+1) - l(l+1) - s(s+1)] \tag{9.46}$$
where $E_0$ is defined as $\langle f(r) \rangle \hbar^2$. The allowed values of j for given l and s can be deduced from the following physical argument, which will be made more rigorous later. We consider the two extreme cases where the orbital and spin angular momenta are (i) as closely parallel and (ii) as closely antiparallel as possible, given the restrictions on j, l, and s. In the first case, $j = l + s$ while in the second $j = |l - s|$. (This is illustrated in the case of $l = 1$ and $s = \tfrac{1}{2}$ in Figure 9.2.) The other allowed values of j form a ladder with integer steps between these extremes. In the case where s equals $\tfrac{1}{2}$ the possible values of j are $(l + \tfrac{1}{2})$ and $(l - \tfrac{1}{2})$, unless l equals zero, when j and s are identical. Thus it follows from (9.46) that spin-orbit coupling results in all states with $l \neq 0$ being split into two, one member of the resulting doublet having its energy raised¹ by an amount proportional to l while the other is lowered by an amount proportional to $(l + 1)$ (see Figure 9.3). States with $l = 0$, however, have $j = s = \tfrac{1}{2}$ and are not split by spin-orbit coupling. We note that the quantum number $m_j$ does not enter (9.46), as indeed would be expected because the spherical symmetry of the problem has not been broken, so each state with a given value of j consists of $(2j + 1)$ degenerate components, which can be separated by an applied magnetic field as discussed in the next two subsections.

¹ There are further relativistic corrections that have the same effect on both the split levels. Thus, although the splitting is correctly predicted by the above argument, the positions of the split levels relative to that calculated without spin-orbit coupling may differ (see problem 10.12).

FIGURE 9.2 The maximum (minimum) possible values of j are obtained when $\mathbf{l}$ and $\mathbf{s}$ are as near parallel (antiparallel) as possible, given that the squared magnitudes of $\mathbf{j}$, $\mathbf{l}$ and $\mathbf{s}$ must be proportional to $j(j+1)$, $l(l+1)$ and $s(s+1)$ respectively, where j, l and s are integers or half-integers. The diagram shows the case where $l = 1$, $s = \tfrac{1}{2}$ and $j = \tfrac{3}{2}$ or $\tfrac{1}{2}$.

A familiar example of such a doublet is observed in the spectrum of sodium and other alkali metals. The outer electron in sodium can be considered as moving in a spherically symmetric potential resulting from the nucleus and ten inner electrons. As this potential is not exactly proportional to 1/r, the energy level spectrum differs from that of hydrogen in that states with the same radial quantum number, but different l, are not degenerate. The well-known D lines result from transitions between states with $l = 1$ and $l = 0$, respectively. The first of these is split into two by the spin-orbit coupling, one component having $j = \tfrac{3}{2}$ and being fourfold degenerate and the other having $j = \tfrac{1}{2}$ and being doubly degenerate, while the $l = 0$ state is not affected by spin-orbit coupling and is therefore a singlet. The total number of states (6) equals $2(2l + 1)$ as expected for a spin-half particle with $l = 1$. Transitions between the two levels with $l = 1$ and the singlet with $l = 0$ produce a pair of closely spaced spectral lines in agreement with experimental observation. The difference between the frequencies of these lines can be calculated in the way described earlier and such calculations produce results that are in excellent agreement with experiment in every case. Similar effects of spin-orbit coupling are observed in the spectrum of hydrogen, but these are complicated by the fact that states with different l are accidentally degenerate.
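The bookkeeping behind (9.46) can be sketched in a few lines. The code below is only an illustration, with exact fractions standing in for the half-integer quantum numbers; the function names (`allowed_j`, `spin_orbit_shift`, `doublet`) are hypothetical, and the shift is returned in units of $E_0$.

```python
# Sketch: allowed j values, spin-orbit shifts from (9.46) in units of E0,
# and degeneracies 2j + 1. Function names are illustrative only.
from fractions import Fraction


def allowed_j(l, s):
    """Values of j from |l - s| up to l + s in integer steps."""
    values = []
    j = abs(l - s)
    while j <= l + s:
        values.append(j)
        j += 1
    return values


def spin_orbit_shift(j, l, s):
    """delta E / E0 = j(j+1) - l(l+1) - s(s+1), as in (9.46)."""
    return j * (j + 1) - l * (l + 1) - s * (s + 1)


def doublet(l, s=Fraction(1, 2)):
    """List of (j, shift in units of E0, degeneracy) for given l and s."""
    l, s = Fraction(l), Fraction(s)
    return [(j, spin_orbit_shift(j, l, s), int(2 * j + 1))
            for j in allowed_j(l, s)]


p_levels = doublet(1)   # the l = 1 doublet discussed for sodium
s_levels = doublet(0)   # l = 0: a single unshifted level
```

For $l = 1$ this gives a shift of $+E_0$ (that is, $+l$) for $j = \tfrac{3}{2}$ and $-2E_0$ (that is, $-(l+1)$) for $j = \tfrac{1}{2}$, with degeneracies 4 and 2; calling `doublet(2)` applies the same counting to d states.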
Worked Example 9.3 Find the quantum numbers and degeneracy of the l = 2 states of an alkali atom.
Solution $j = l + s$ or $|l - s|$, i.e., $\tfrac{5}{2}$ or $\tfrac{3}{2}$. The $l = 2$ states are therefore split into a doublet. The degeneracy of either state is equal to $2j + 1$, i.e., 6 for $j = \tfrac{5}{2}$ and 4 for $j = \tfrac{3}{2}$. The total number of states (10) equals $2(2 \times 2 + 1)$ as expected for a spin-half particle with $l = 2$.

FIGURE 9.3 The effect of spin-orbit coupling on the energy levels of a one-electron atom.
The strong-field Zeeman effect

We now consider the effect of an applied field and begin with the most straightforward case; this is the strong-field Zeeman effect, where the second term in (9.44) is much larger than the first. This means that the effect of the applied field is much greater than the spin-orbit coupling, which we ignore to a first approximation. In this case, simple products of the eigenfunctions of $\hat{L}_z$ and $\hat{S}_z$ are also eigenfunctions of $\hat{H}'$. The energy eigenvalues are therefore changed by an amount equal to ΔE where, writing B for the applied field $B_0$,

$$\Delta E = \mu_B B (m_l + 2 m_s) \tag{9.47}$$
and $m_l$ and $m_s$ are the quantum numbers associated with the z components of the orbital and spin angular momenta. Each state corresponding to given values of l and s is therefore split into $(2l + 1)(2s + 1)$ equally-spaced components, although some of these may be accidentally degenerate. For example, if $l = 1$, the possible values of $m_l$ are ±1 or 0, while those of $m_s$ are $\pm\tfrac{1}{2}$. Hence, $(m_l + 2m_s)$ can equal ±2, ±1 or 0, where the last state is doubly degenerate.

Although for high field strengths the spin-orbit interaction is smaller than that representing the interaction with the field, it may not be so small that it can be ignored completely. A further correction, δE, to each state specified by $m_l$ and $m_s$ can be estimated by assuming that the eigenfunctions are the same as in the absence of any spin-orbit interaction and that the changes to the eigenvalues are just the expectation values of the spin-orbit contribution to the Hamiltonian (this is an example of perturbation theory, to be discussed in the next chapter). We get from (9.44)

$$\delta E = 2 \langle f(r) \rangle \langle \hat{\mathbf{L}} \cdot \hat{\mathbf{S}} \rangle = 2 E_0 m_l m_s \tag{9.48}$$

where we have used the fact that $\langle \hat{L}_x \rangle = \langle \hat{L}_y \rangle = \langle \hat{S}_x \rangle = \langle \hat{S}_y \rangle = 0$ when the system is in an eigenstate of $\hat{L}_z$ and $\hat{S}_z$ and, as earlier, $E_0 = \hbar^2 \langle f(r) \rangle$. In the case where $l = 1$, $m_l = 0, \pm 1$, $m_s = \pm\tfrac{1}{2}$, the energy levels are

$$\begin{aligned}
E &= \pm 2 \mu_B B + E_0 && \text{where } m_l = \pm 1 \text{ and } m_s = \pm\tfrac{1}{2} \\
E &= \pm \mu_B B && \text{where } m_l = 0 \text{ and } m_s = \pm\tfrac{1}{2} \\
E &= -E_0 && \text{where } m_l = \pm 1 \text{ and } m_s = \mp\tfrac{1}{2}
\end{aligned} \tag{9.49}$$

and the level with $E = -E_0$ is twofold degenerate. The above results are illustrated in Figure 9.5 along with the effects of weak and intermediate magnetic fields, which will be discussed shortly.
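The level scheme (9.49) can be reproduced by simply enumerating the product states. The following sketch is illustrative only: it tabulates the coefficient of $\mu_B B$ from (9.47) and the coefficient of $E_0$ from (9.48) for each $(m_l, m_s)$, and the names `m_values` and `strong_field_levels` are assumptions of this example.

```python
# Sketch: strong-field Zeeman levels. For each (ml, ms) the energy change is
# zeeman * muB * B + so * E0, with zeeman = ml + 2 ms from (9.47) and
# so = 2 ml ms from (9.48). Function names are illustrative only.
from fractions import Fraction


def m_values(j):
    """Allowed z-component quantum numbers -j, -j + 1, ..., +j."""
    j = Fraction(j)
    return [-j + k for k in range(int(2 * j) + 1)]


def strong_field_levels(l, s=Fraction(1, 2)):
    """List of (ml, ms, zeeman coefficient, spin-orbit coefficient)."""
    levels = []
    for ml in m_values(l):
        for ms in m_values(s):
            levels.append((ml, ms, ml + 2 * ms, 2 * ml * ms))
    return levels


p_state_levels = strong_field_levels(1)
```

For $l = 1$ the six product states give the coefficient pairs (±2, +1), (±1, 0) and (0, −1) twice, matching (9.49); `strong_field_levels(2)` lists the ten states of a d level in the same way.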
The weak-field Zeeman effect

The opposite limit is the weak-field Zeeman effect, where the applied magnetic field $B_0$ is small enough for its contribution to the energy to be much less than that due to spin-orbit coupling. In this case, we can assume the eigenfunctions to be effectively unchanged from the zero-field case (cf. the arguments leading to (9.46) and (9.48)). A level with a particular value of j, l and s is split into $(2j + 1)$ states, each denoted by a particular value of the quantum number, $m_j$, associated with the z component of the total angular momentum. The evaluation of the magnitude of the splitting in the weak-field case is complicated by the fact that the operator representing the total magnetic moment, $\hat{\boldsymbol{\mu}} = -(e/2m_e)(\hat{\mathbf{L}} + 2\hat{\mathbf{S}})$, represents a vector that is not in the same direction as that represented by the total angular momentum operator, $(\hat{\mathbf{L}} + \hat{\mathbf{S}})$ (see Figure 9.4). We shall treat this problem more rigorously later, but for the moment use the following semi-classical argument.

FIGURE 9.4 The orbital and spin magnetic moments are parallel to the respective angular momenta, but the constants of proportionality are different, so that the total magnetic moment, $\boldsymbol{\mu}$, is not parallel to $\mathbf{j}$. The vectors representing $\mathbf{l}$, $\mathbf{s}$, $\mathbf{j}$, $\boldsymbol{\mu}_l$, $\boldsymbol{\mu}_s$ and $\boldsymbol{\mu}$ are shown, assuming that they all lie in the plane of the page. However, for a given $\mathbf{j}$, $\mathbf{l}$ and $\mathbf{s}$ lie on the surfaces of the cones and, as a result, $\boldsymbol{\mu}$ also lies on a cone whose axis is $\mathbf{j}$. In weak magnetic fields, the average value of $\boldsymbol{\mu}$ is therefore its component parallel to $\mathbf{j}$.

Referring to Figure 9.4, if $\mathbf{j} = \mathbf{l} + \mathbf{s}$, then for given j, l and s, $\boldsymbol{\mu}$ must lie on a cone with axis $\mathbf{j}$. The average value of $\boldsymbol{\mu}$ is the average over the surface of the cone, which is just the component ($\mu_j$) of the total magnetic moment in the direction of the total angular momentum. We therefore have

$$\begin{aligned}
\mu_j &= \mathbf{j} \cdot \boldsymbol{\mu} / j \\
&= -\frac{e}{2 m_e j} \, \mathbf{j} \cdot (\mathbf{l} + 2\mathbf{s}) = -\frac{e}{2 m_e j} (j^2 + \mathbf{j} \cdot \mathbf{s}) \\
&= -\frac{e j}{2 m_e} \left( 1 + \frac{\mathbf{l} \cdot \mathbf{s} + s^2}{j^2} \right) = -\frac{e j}{2 m_e} \left( 1 + \frac{j^2 - l^2 + s^2}{2 j^2} \right)
\end{aligned}$$

using (9.42). The interaction energy δE is then given by $-B \mu_{jz}$, where $\mu_{jz}$ is the z component of $\mu_j$. That is

$$\delta E = \frac{e B j_z}{2 m_e} \left( 1 + \frac{j^2 - l^2 + s^2}{2 j^2} \right) \tag{9.50}$$

We now replace the classical expressions for the magnitudes of the angular momenta by the eigenvalues of the corresponding quantum-mechanical operators, so that the change in energy relative to the zero-field case is given by

$$\delta E = g \mu_B m_j B \tag{9.51}$$
where $\mu_B = e\hbar/2m_e$ is the Bohr magneton (8.27) and

$$g = 1 + \frac{j(j+1) - l(l+1) + s(s+1)}{2 j (j+1)} \tag{9.52}$$
is known as the Landé g-factor. We see, for example, that in the case where $s = 0$, j must equal l and therefore g equals 1. This corresponds to all the angular momentum being orbital so that the total magnetic moment is directly proportional to the total angular momentum. This also happens if there is no orbital angular momentum (that is, $l = 0$ and therefore $j = s$) and in this case $g = 2$.

We now use these results to discuss the effect of applying a weak magnetic field to a one-electron atom such as sodium, concentrating on the state $l = 1$. From (9.51), the number of energy levels is equal to the number of possible values of $m_j$, which is $2j + 1$, i.e., 2 and 4 in the case of $j = 1/2$ and $3/2$ respectively. Thus the changes to the $l = 1$ energy level due to the combined effect of spin-orbit coupling and the weak-field Zeeman effect are

$$\begin{aligned}
E &= E_0 \pm 2 \mu_B B && \text{where } j = \tfrac{3}{2} \text{ and } m_j = \pm\tfrac{3}{2} \\
E &= E_0 \pm \tfrac{2}{3} \mu_B B && \text{where } j = \tfrac{3}{2} \text{ and } m_j = \pm\tfrac{1}{2} \\
E &= -2 E_0 \pm \tfrac{1}{3} \mu_B B && \text{where } j = \tfrac{1}{2} \text{ and } m_j = \pm\tfrac{1}{2}
\end{aligned} \tag{9.53}$$

This is shown in Figure 9.5, which also illustrates the case of strong and intermediate fields. We note how some of the levels cross each other as the field is increased from the weak- to the strong-field limit. To fully understand the intermediate region, a more sophisticated treatment is required and this is set out in the next section.

Worked Example 9.4 What splitting pattern is generated when (i) a strong and (ii) a weak magnetic field is applied to the $l = 2$ states of a one-electron atom?

Solution (i) If $l = 2$, $m_l = -2, -1, 0, 1$ or 2, while $m_s = \pm 1/2$. From (9.47), the energy of interaction with the field is proportional to $m_l + 2m_s$ in the strong-field limit and the possible values of this quantity are: −3, −2, −1*, 0*, 1*, 2, 3, where the states marked by an asterisk are doubly degenerate. The further correction given by (9.48) lifts these degeneracies except for the case where $m_l + 2m_s = 0$.

(ii) From Worked Example 9.3, the j values of the states are 5/2 and 3/2. A weak field splits each level according to the $2j + 1$ allowed values of $m_j$, i.e., into 6 and 4 sublevels, respectively.
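The Landé factor (9.52) and the weak-field pattern (9.53) lend themselves to a short exact calculation. This is a sketch under the assumption that $j \neq 0$ (the formula is undefined for $j = 0$); `lande_g` and `weak_field_levels` are names chosen for this illustration.

```python
# Sketch: Lande g-factor (9.52) and weak-field levels. Each level is returned
# as (j, mj, spin-orbit shift in units of E0, Zeeman shift in units of muB*B).
# Assumes j is nonzero; function names are illustrative only.
from fractions import Fraction


def lande_g(j, l, s):
    """g = 1 + [j(j+1) - l(l+1) + s(s+1)] / [2 j (j+1)], for j != 0."""
    j, l, s = Fraction(j), Fraction(l), Fraction(s)
    return 1 + (j * (j + 1) - l * (l + 1) + s * (s + 1)) / (2 * j * (j + 1))


def weak_field_levels(l, s=Fraction(1, 2)):
    """Combine the spin-orbit shift (9.46) with the Zeeman shift (9.51)."""
    l, s = Fraction(l), Fraction(s)
    levels = []
    j = abs(l - s)
    while j <= l + s:
        spin_orbit = j * (j + 1) - l * (l + 1) - s * (s + 1)
        g = lande_g(j, l, s)
        for k in range(int(2 * j) + 1):
            mj = -j + k
            levels.append((j, mj, spin_orbit, g * mj))
        j += 1
    return levels


p_state_weak = weak_field_levels(1)
```

For $l = 1$ and $s = \tfrac{1}{2}$ the factors come out as $g = \tfrac{4}{3}$ for $j = \tfrac{3}{2}$ and $g = \tfrac{2}{3}$ for $j = \tfrac{1}{2}$, which reproduces the coefficients 2, $\tfrac{2}{3}$ and $\tfrac{1}{3}$ in (9.53); `weak_field_levels(2)` gives the 6 + 4 sublevels of Worked Example 9.4.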
9.6 A MORE GENERAL TREATMENT OF THE COUPLING OF ANGULAR MOMENTA
We consider two angular momenta whose squared magnitudes are represented by the operators $\hat{L}^2$ and $\hat{S}^2$ and whose z components are represented by $\hat{L}_z$ and $\hat{S}_z$. Despite this notation, our treatment is not confined to the spin-orbit case, but can be applied to any two coupled angular momenta. We assume that the two angular momenta can be measured compatibly so all four operators commute. That is

$$[\hat{L}^2, \hat{L}_z] = [\hat{S}^2, \hat{S}_z] = [\hat{L}^2, \hat{S}^2] = [\hat{L}^2, \hat{S}_z] = [\hat{L}_z, \hat{S}^2] = [\hat{L}_z, \hat{S}_z] = 0 \tag{9.54}$$
FIGURE 9.5 The splitting ΔE of the states with $l = 1$ in a one-electron atom as a function of applied magnetic field. Energy is in units of $E_0 = \langle f(r) \rangle \hbar^2$ and the field is in units of $E_0 / \mu_B$.
The four operators therefore have a common set of eigenstates, which will be specified by the quantum numbers l, $m_l$, s and $m_s$, where these are all either integers or half-integers and where $-l \le m_l \le l$ and $-s \le m_s \le s$, as discussed in the previous chapter. We represent the state with these quantum numbers in Dirac notation as $|l, m_l, s, m_s\rangle$ or just $|m_l, m_s\rangle$. This state is therefore a direct product of the individual eigenstates. That is

$$|m_l, m_s\rangle = |m_l\rangle |m_s\rangle \tag{9.55}$$

The fact that the expressions (9.55) are eigenfunctions of all four operators follows immediately on substitution into the appropriate eigenvalue equations.

This discussion has assumed that the values of the magnitude and z component of both angular momenta are known. Although this condition is sometimes fulfilled at least approximately (as in the strong-field Zeeman effect), other situations (such as spin-orbit coupling) often arise in which the values of the individual z components are unknown. The quantities measured in the latter case are, typically, the individual magnitudes whose squares are represented by $\hat{L}^2$ and $\hat{S}^2$ as before, along with the squared magnitude and z component of the total, which are represented by the operators $\hat{J}^2$ and $\hat{J}_z$ respectively, where

$$\hat{J}^2 = |\hat{\mathbf{L}} + \hat{\mathbf{S}}|^2 = \hat{L}^2 + \hat{S}^2 + 2 \hat{\mathbf{L}} \cdot \hat{\mathbf{S}} \tag{9.56}$$

and

$$\hat{J}_z = \hat{L}_z + \hat{S}_z \tag{9.57}$$

Clearly $\hat{J}^2$ commutes with $\hat{L}^2$ and $\hat{S}^2$ and also with $\hat{J}_z$, but not with $\hat{L}_z$ or $\hat{S}_z$; these relations can be checked by substituting from (9.56) and (9.57) into the commutation relation (8.7). It follows from the general discussion of degeneracy at the end of Chapter 7 that the operators $\hat{J}^2$, $\hat{J}_z$, $\hat{L}^2$ and $\hat{S}^2$ possess a common set of eigenstates. We now explain how to obtain these along with the corresponding eigenvalues.

The eigenvalues of the square of the magnitude of any angular momentum vector and its corresponding z component were determined in the previous chapter, using a very general argument based only on the commutation relations between the operators representing the components. The results of this can therefore be applied to the present case, which means that the eigenvalues of $\hat{J}^2$, $\hat{J}_z$, $\hat{L}^2$, and $\hat{S}^2$ are $j(j+1)\hbar^2$, $m_j \hbar$, $l(l+1)\hbar^2$ and $s(s+1)\hbar^2$, respectively, where j, l, s and $m_j$ are integers or half-integers and $-j \le m_j \le j$. The common eigenstates can therefore be denoted by these four quantum numbers and represented by $|j, m_j, l, s\rangle$ or $|j, m_j\rangle$. To find expressions for these, we first consider again the products $|m_l\rangle |m_s\rangle$, which were shown earlier to be eigenfunctions of the operators $\hat{L}^2$, $\hat{S}^2$, $\hat{L}_z$ and $\hat{S}_z$. If we operate on these with $\hat{J}_z$ we obtain, using (9.57)

$$\hat{J}_z |m_l\rangle |m_s\rangle = (\hat{L}_z + \hat{S}_z) |m_l\rangle |m_s\rangle = (m_l + m_s) \hbar |m_l\rangle |m_s\rangle \tag{9.58}$$

These products are therefore also eigenfunctions of $\hat{J}_z$ with eigenvalues $m_j \hbar$ where

$$m_j = m_l + m_s \tag{9.59}$$
and all such products that have the same values of l, s and m j form a degenerate set with respect to the operators Lˆ 2 , Sˆ 2 and Jˆz , but not necessarily of Jˆ2 . Physically this means that the total z component of angular momentum must be the sum of the two individual z components, and any allowed orientation of L and S that satisfies this condition and has the correct value for the total angular momentum is a possible state of the system. Remembering the general discussion of degenerate systems at the end of Chapter 7, we see that any linear combination of these degenerate eigenfunctions is also an eigenfunction with the same eigenvalue, so we can express the eigenfunctions of Jˆ2 as a linear combination of the members of this degenerate set X X | j, m j i = C j,m j ,ml ,ms |ml i|m s i (9.60) ml =−l,l m s =−s,s
where the constants, C j,m j ,ml ,m s are known as Clebsch-Gordan coefficients. We first consider the state where ml and m s have their maximum possible values which are l and s, respectively. There is only one product, |ml = li|m s = si,
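
TABLE 9.2 Clebsch-Gordan coefficients for the case $l = 1$ and $s = \tfrac{1}{2}$. Eigenfunctions of the total angular momentum with given values of $(j, m_j)$ (rows) are constructed as linear combinations of products of the angular momentum eigenfunctions denoted by the values of $(m_l, m_s)$ (columns).

$$
\begin{array}{c|cccccc}
(j, m_j) & (1,\tfrac12) & (1,-\tfrac12) & (0,\tfrac12) & (0,-\tfrac12) & (-1,\tfrac12) & (-1,-\tfrac12) \\ \hline
(\tfrac32,\tfrac32) & 1 & 0 & 0 & 0 & 0 & 0 \\
(\tfrac32,\tfrac12) & 0 & \sqrt{\tfrac13} & \sqrt{\tfrac23} & 0 & 0 & 0 \\
(\tfrac32,-\tfrac12) & 0 & 0 & 0 & \sqrt{\tfrac23} & \sqrt{\tfrac13} & 0 \\
(\tfrac32,-\tfrac32) & 0 & 0 & 0 & 0 & 0 & 1 \\
(\tfrac12,\tfrac12) & 0 & -\sqrt{\tfrac23} & \sqrt{\tfrac13} & 0 & 0 & 0 \\
(\tfrac12,-\tfrac12) & 0 & 0 & 0 & -\sqrt{\tfrac13} & \sqrt{\tfrac23} & 0
\end{array}
$$
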
corresponding to these quantum numbers, which must therefore be the required eigenstate in this case. Moreover, it follows from (9.59) that the corresponding value of $m_j$ is equal to $(l + s)$ and, as this is the maximum possible value of $m_j$, it must also be equal to $j$. We therefore have

$$|j = l+s,\; m_j = l+s\rangle = |m_l = l\rangle|m_s = s\rangle$$

or

$$|l+s,\; l+s\rangle = |l\rangle|s\rangle \tag{9.61}$$

We can now use the ladder operators introduced at the end of Chapter 8 to obtain the state where $j = l + s$ and $m_j = l + s - 1$ from (9.61). We first note that it follows directly from their definition (8.30) that the ladder operators corresponding to states of the combined angular momentum are just sums of those corresponding to the individual angular momenta. In particular

$$\hat{J}_- = \hat{L}_- + \hat{S}_- \tag{9.62}$$

Operating on (9.61) with $\hat{J}_-$ and using (9.62) we get

$$\hat{J}_-|l+s,\; l+s\rangle = |s\rangle\hat{L}_-|l\rangle + |l\rangle\hat{S}_-|s\rangle \tag{9.63}$$

Hence, remembering the normalization results (8.51) and (8.52)

$$\hbar\left[(l+s)(l+s+1) - (l+s)(l+s-1)\right]^{1/2}|l+s,\; l+s-1\rangle = |s\rangle\,\hbar\left[l(l+1) - l(l-1)\right]^{1/2}|l-1\rangle + |l\rangle\,\hbar\left[s(s+1) - s(s-1)\right]^{1/2}|s-1\rangle \tag{9.64}$$

i.e.

$$|l+s,\; l+s-1\rangle = \left(\frac{l}{l+s}\right)^{1/2}|l-1\rangle|s\rangle + \left(\frac{s}{l+s}\right)^{1/2}|l\rangle|s-1\rangle \tag{9.65}$$
There is a second state with $m_j = l + s - 1$. It must have a different value of $j$ and the only possibility is $l + s - 1$. To be orthogonal to $|l+s,\; l+s-1\rangle$, it must have the form

$$|l+s-1,\; l+s-1\rangle = \left(\frac{s}{l+s}\right)^{1/2}|l-1\rangle|s\rangle - \left(\frac{l}{l+s}\right)^{1/2}|l\rangle|s-1\rangle \tag{9.66}$$

A further application of $\hat{J}_-$ to the states given in (9.65) and (9.66) produces expressions for the states $|l+s,\; l+s-2\rangle$ and $|l+s-1,\; l+s-2\rangle$, while an expression for the state $|l+s-2,\; l+s-2\rangle$ is found using orthogonality. We note, however, that some of these states will be absent if $l$ or $s$ is less than 2. In general, the process can be continued by further applications of $\hat{J}_-$, which create states of smaller $m_j$, while adding in an additional value of $j$ each time using orthogonality until the minimum value of $j$ (i.e., $|l - s|$) is reached. Further applications of $\hat{J}_-$ reduce $m_j$ further with values of $j$ now being successively eliminated until $m_j$ has its minimum value of $-l - s$. At this point there is again only one state: $|l+s,\; -l-s\rangle$. An entirely equivalent procedure is to start with this state and apply the raising operator $\hat{J}_+$ until $m_j = l + s$.

Table 9.2 shows the results of applying this procedure to the particular case where $l = 1$ and $s = \tfrac12$. The second row is obtained by substituting into (9.65) and the third and fourth by further applications of $\hat{J}_-$. The fifth row is obtained by substituting into (9.66), while the sixth is obtained from the fifth by applying $\hat{J}_-$. The process gets very complex for large quantum numbers, and it is easier to look up the results in a reference text or via the internet.

Once the Clebsch-Gordan coefficients have been determined, they can be used to construct a matrix representation of the energy operator from which the energy eigenvalues can be obtained. We will describe this for the case of the Hamiltonian operator given in (9.44) which we re-write as

$$\hat{H}' \equiv \hat{H}'^{(1)} + \hat{H}'^{(2)} = \langle f(r)\rangle(\hat{J}^2 - \hat{L}^2 - \hat{S}^2) + \frac{eB}{2m_e}(\hat{L}_z + 2\hat{S}_z) \tag{9.67}$$

the expectation value being taken over the radial coordinate only. Using the states $|j, m_j\rangle$ as a basis, we calculate the matrix elements

$$H'_{(j,m_j),(j',m_j')} = H'^{(1)}_{(j,m_j),(j',m_j')} + H'^{(2)}_{(j,m_j),(j',m_j')} = \langle f(r)\rangle\langle j, m_j|(\hat{J}^2 - \hat{L}^2 - \hat{S}^2)|j', m_j'\rangle + \frac{eB}{2m_e}\langle j, m_j|(\hat{L}_z + 2\hat{S}_z)|j', m_j'\rangle \tag{9.68}$$

$|j, m_j\rangle$ is an eigenstate of $(\hat{J}^2 - \hat{L}^2 - \hat{S}^2)$ with eigenvalue $[j(j+1) - l(l+1) - s(s+1)]\hbar^2$, so it follows (remembering orthogonality) that the first term in (9.68) contributes only to the diagonal elements of the matrix and we have

$$H'^{(1)}_{(j,m_j),(j',m_j')} = E_0\,[j(j+1) - l(l+1) - s(s+1)]\,\delta_{j,j'}\,\delta_{m_j,m_j'} \tag{9.69}$$
We evaluate the second term by expressing the states $|j, m_j\rangle$ in terms of the products $|m_l\rangle|m_s\rangle$ using Clebsch-Gordan coefficients as described earlier, and get

$$
\begin{aligned}
H'^{(2)}_{(j,m_j),(j',m_j')} &= \frac{eB}{2m_e}\left[\sum_{m_l,m_s} C^*_{(j,m_j)(m_l,m_s)}\langle m_s|\langle m_l|\right](\hat{L}_z + 2\hat{S}_z)\left[\sum_{m_l',m_s'} C_{(j',m_j')(m_l',m_s')}|m_l'\rangle|m_s'\rangle\right] \\
&= \frac{eB}{2m_e}\sum_{m_l,m_s}\;\sum_{m_l',m_s'} C^*_{(j,m_j)(m_l,m_s)}\,C_{(j',m_j')(m_l',m_s')}\,\langle m_l|\langle m_s|(\hat{L}_z + 2\hat{S}_z)|m_l'\rangle|m_s'\rangle
\end{aligned}
\tag{9.70}
$$

We can now use the fact that $|m_l\rangle$ and $|m_s\rangle$ are eigenstates of $\hat{L}_z$ and $\hat{S}_z$, respectively, along with orthogonality to get

$$H'^{(2)}_{(j,m_j),(j',m_j')} = \frac{e\hbar B}{2m_e}\sum_{m_l,m_s} C^*_{(j,m_j)(m_l,m_s)}\,C_{(j',m_j')(m_l,m_s)}\,(m_l + 2m_s) \tag{9.71}$$
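Equations (9.69) and (9.71) can be combined to form the complete Hamiltonian matrix, which can then be diagonalized to obtain the energy eigenvalues. Following this procedure and using the Clebsch-Gordan coefficients in Table 9.2, we get a matrix whose elements are listed in Table 9.3 for the case where $l = 1$ and $s = \tfrac12$. To diagonalize this matrix, we have to form and evaluate a $6 \times 6$ determinant of the form shown in (9.7). If we examine the matrix set out in Table 9.3, we see that its determinant factorizes into a product of two $1 \times 1$ and two $2 \times 2$ determinants. The energy levels, $E$, are then given by

$$
(E - E_0 + 2\mu_B B)\,(E - E_0 - 2\mu_B B)
\begin{vmatrix}
E_0 + \tfrac23\mu_B B - E & \tfrac{\sqrt2}{3}\mu_B B \\
\tfrac{\sqrt2}{3}\mu_B B & -2E_0 + \tfrac13\mu_B B - E
\end{vmatrix}
\times
\begin{vmatrix}
E_0 - \tfrac23\mu_B B - E & \tfrac{\sqrt2}{3}\mu_B B \\
\tfrac{\sqrt2}{3}\mu_B B & -2E_0 - \tfrac13\mu_B B - E
\end{vmatrix}
= 0
\tag{9.72}
$$

which leads to the following expressions for the energy levels

$$
\begin{aligned}
E &= E_0 \pm 2\mu_B B \\
E &= \tfrac12\left[-E_0 \pm \mu_B B + \left(9E_0^2 \pm 2E_0\mu_B B + \mu_B^2 B^2\right)^{1/2}\right] \\
E &= \tfrac12\left[-E_0 \pm \mu_B B - \left(9E_0^2 \pm 2E_0\mu_B B + \mu_B^2 B^2\right)^{1/2}\right]
\end{aligned}
\tag{9.73}
$$

where, within each of the last two expressions, the upper signs are taken together, as are the lower signs.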
The expressions (9.73) are shown as functions of B in Figure 9.5. Worked Example 9.5 Check that (9.73) gives the same results as those obtained earlier in the limits of weak and strong applied field.
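TABLE 9.3 The matrix representing $\hat{H}'$ in the case where $l = 1$ and $s = \tfrac12$. $E_0 = \langle f(r)\rangle\hbar^2$ and $\mu_B = e\hbar/2m_e$. Rows and columns are labelled by $(j, m_j)$; the off-diagonal signs follow the phase convention of Table 9.2.

$$
\begin{array}{c|cccccc}
(j, m_j) & (\tfrac32,\tfrac32) & (\tfrac32,\tfrac12) & (\tfrac32,-\tfrac12) & (\tfrac32,-\tfrac32) & (\tfrac12,\tfrac12) & (\tfrac12,-\tfrac12) \\ \hline
(\tfrac32,\tfrac32) & E_0 + 2\mu_B B & 0 & 0 & 0 & 0 & 0 \\
(\tfrac32,\tfrac12) & 0 & E_0 + \tfrac23\mu_B B & 0 & 0 & \tfrac{\sqrt2}{3}\mu_B B & 0 \\
(\tfrac32,-\tfrac12) & 0 & 0 & E_0 - \tfrac23\mu_B B & 0 & 0 & \tfrac{\sqrt2}{3}\mu_B B \\
(\tfrac32,-\tfrac32) & 0 & 0 & 0 & E_0 - 2\mu_B B & 0 & 0 \\
(\tfrac12,\tfrac12) & 0 & \tfrac{\sqrt2}{3}\mu_B B & 0 & 0 & -2E_0 + \tfrac13\mu_B B & 0 \\
(\tfrac12,-\tfrac12) & 0 & 0 & \tfrac{\sqrt2}{3}\mu_B B & 0 & 0 & -2E_0 - \tfrac13\mu_B B
\end{array}
$$
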
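Solution Considering the zero-field case first, we see that four of the states listed in (9.73) have spin-orbit energies equal to $E_0$, and two have spin-orbit energies of $-2E_0$, agreeing with (9.53) and Figure 9.3. If $B$ is not zero, but $\mu_B B \gg E_0$, the square roots in (9.73) can be expanded to first order in $E_0/\mu_B B$ to get

$$
\begin{aligned}
E &= \pm 2\mu_B B + E_0 \\
E &= \tfrac12\left[-E_0 \pm \mu_B B + \mu_B B\left(1 \pm \frac{E_0}{\mu_B B}\right)\right] = \mu_B B \ \text{or}\ -E_0 \\
E &= \tfrac12\left[-E_0 \pm \mu_B B - \mu_B B\left(1 \pm \frac{E_0}{\mu_B B}\right)\right] = -\mu_B B \ \text{or}\ -E_0
\end{aligned}
$$

These results are identical with our previous calculation (9.49).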
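
The diagonalization described above can also be checked numerically. The following is a minimal sketch, not the procedure used in the text: instead of transforming to the $|j, m_j\rangle$ basis with Clebsch-Gordan coefficients, it builds the same operator directly in the product basis $|m_l\rangle|m_s\rangle$, using the identity $\hat{J}^2 - \hat{L}^2 - \hat{S}^2 = 2\hat{\mathbf{L}}\cdot\hat{\mathbf{S}}$, and compares the eigenvalues with (9.73). The function names, the choice of units ($\hbar = 1$) and the sample values of $E_0$ and $\mu_B B$ are assumptions made for illustration.

```python
import numpy as np

def ladder_matrices(j):
    """Return (Jz, J+, J-) for quantum number j, in units of hbar.

    The basis is ordered m = j, j-1, ..., -j.
    """
    m = np.arange(j, -j - 1, -1)
    jz = np.diag(m)
    # <m+1| J+ |m> = sqrt(j(j+1) - m(m+1)) sits on the first superdiagonal
    jp = np.diag(np.sqrt(j * (j + 1) - m[1:] * (m[1:] + 1)), k=1)
    return jz, jp, jp.T

def numerical_levels(E0, muB_B, l=1, s=0.5):
    """Eigenvalues of E0*(J^2 - L^2 - S^2) + muB_B*(Lz + 2Sz), with hbar = 1."""
    lz, lp, lm = ladder_matrices(l)
    sz, s_plus, s_minus = ladder_matrices(s)
    id_l, id_s = np.eye(len(lz)), np.eye(len(sz))
    # J^2 - L^2 - S^2 = 2 L.S = 2 Lz Sz + L+ S- + L- S+
    two_l_dot_s = (2 * np.kron(lz, sz)
                   + np.kron(lp, s_minus) + np.kron(lm, s_plus))
    zeeman = np.kron(lz, id_s) + 2 * np.kron(id_l, sz)
    return np.linalg.eigvalsh(E0 * two_l_dot_s + muB_B * zeeman)

def closed_form_levels(E0, x):
    """The six levels of (9.73) for l = 1, s = 1/2, with x = mu_B * B."""
    root_plus = np.sqrt(9 * E0**2 + 2 * E0 * x + x**2)
    root_minus = np.sqrt(9 * E0**2 - 2 * E0 * x + x**2)
    return np.sort([E0 + 2 * x, E0 - 2 * x,
                    0.5 * (-E0 + x + root_plus), 0.5 * (-E0 + x - root_plus),
                    0.5 * (-E0 - x + root_minus), 0.5 * (-E0 - x - root_minus)])

if __name__ == "__main__":
    for x in (0.0, 0.3, 50.0):          # zero, intermediate and strong field
        print(x, numerical_levels(1.0, x), closed_form_levels(1.0, x))
```

Because the eigenvalues of a matrix do not depend on the basis in which it is written, agreement between the two functions is a check on (9.72) and (9.73) that is independent of the phase convention chosen for the Clebsch-Gordan coefficients.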
Although we have considered only a few comparatively simple examples, the principles underlying the addition of angular momenta have wide application. For example, similar techniques can be applied to the problem of the coupling of the orbital and spin angular momenta of many-electron atoms, where detailed calculations show that this can often be considered as a two-stage process: first the orbital and spin angular momenta of the different electrons separately combine to form total orbital and spin vectors; then these two quantities interact in a similar way to that described earlier in the case of one-electron atoms. Such a system is described as “Russell-Saunders” or “L-S” coupling, and provides a satisfactory account of the atomic spectra of atoms with low atomic number. Heavy elements, however, are better described by “j-j coupling” in which the orbital and spin angular momenta of each electron interact strongly to form a total described by the quantum number $j$. The totals associated with different electrons then combine together to form a grand total for the atom.

The applications of many of the ideas discussed in this chapter and the last to the fields of nuclear and particle physics are also of great importance. This is not only due to the importance of angular momentum in these systems, but also because other physical quantities turn out to be capable of representation by a set of operators obeying the same algebra as do those representing angular momentum. An example of this is the quantity known as isotopic spin. Using this concept the proton and neutron can be thought of as separate states of the same particle (known as the nucleon), by assuming that the total isotopic-spin quantum number is $\tfrac12$ so that its “$z$ component” can be represented by a quantum number having the values $\tfrac12$ (the proton) and $-\tfrac12$ (the neutron). Properties such as the charge and mass differences between the two states can then be treated as resulting from isotopic-spin-dependent interactions between the three quarks that constitute the nucleon. It is also found that some of the excited states of the nucleon have properties that can be described by assigning appropriate values to the two isotopic-spin quantum numbers.
9.7 PROBLEMS

9.1 Show that the Hermitian matrices

$$
[x] = \left(\frac{1}{2}\frac{\hbar}{m\omega}\right)^{1/2}
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots \\
1 & 0 & \sqrt2 & 0 & \cdots \\
0 & \sqrt2 & 0 & \sqrt3 & \cdots \\
0 & 0 & \sqrt3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$

and

$$
[P] = \left(\frac{1}{2}m\omega\hbar\right)^{1/2}
\begin{pmatrix}
0 & -i & 0 & 0 & \cdots \\
i & 0 & -i\sqrt2 & 0 & \cdots \\
0 & i\sqrt2 & 0 & -i\sqrt3 & \cdots \\
0 & 0 & i\sqrt3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$

obey the correct commutation rules for position and momentum in one dimension. Show that these can be used to obtain the energy levels of a harmonic oscillator of classical frequency $\omega$ and compare your answers with those obtained in Chapter 5.

9.2 Calculate the expectation values $\langle\hat{S}_x\rangle$, $\langle\hat{S}_y\rangle$, $\langle\hat{S}_x^2\rangle$, and $\langle\hat{S}_y^2\rangle$ for a spin-half particle known to be in an eigenstate of $\hat{S}_z$ with eigenvalue $\tfrac12\hbar$. Show that the product $\langle\hat{S}_x^2\rangle\langle\hat{S}_y^2\rangle$ is consistent with the uncertainty principle.

9.3 Show that for a spin-half particle the “anticommutator”

$$\hat{S}_x\hat{S}_y + \hat{S}_y\hat{S}_x = 0$$

Show that this is not true in the case of orbital angular momentum. Is it true for a particle with any other total spin quantum number?

9.4 A spin-half particle, initially in an eigenstate of $\hat{S}_x$ with eigenvalue $\tfrac12\hbar$, enters a Stern-Gerlach apparatus oriented to measure its angular momentum component in a direction whose orientation with respect to the $x$ and $z$ axes is defined by the spherical polar angles $\theta$ and $\phi$. Obtain expressions for the probabilities of obtaining the results $+\tfrac12\hbar$ and $-\tfrac12\hbar$ from the second measurement.

9.5 Consider a Stern-Gerlach experiment carried out as in Figure 9.1, but with the second and third magnets arranged so that their measurement axes are in the $xy$ plane at angles $\theta_1$ and $\theta_2$ respectively to the $x$ axis. What are the probabilities of the two possible outcomes of the final measurement?

9.6 Spin-half particles, initially in an eigenstate of $\hat{S}_z$ with eigenvalue $\tfrac12\hbar$, are directed into a Stern-Gerlach apparatus oriented to measure the $x$ component of spin. The lengths of the two possible paths through the second apparatus are precisely equal and the particles are then directed into a common path so that it is impossible to tell which route any particular particle followed. What result would you expect to obtain if the $z$ component of the particle spin is now measured? Discuss this from the point of view of the quantum theory of measurement and compare your discussion with that of the two-slit experiment discussed in Chapter 4.

9.7 Show that the following matrices obey the appropriate commutation rules and have the correct eigenvalues to represent the three components of angular momentum of a spin-one particle

$$
[L_x] = \frac{\hbar}{\sqrt2}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\qquad
[L_y] = \frac{\hbar}{\sqrt2}\begin{pmatrix} 0 & -i & 0 \\ i & 0 & -i \\ 0 & i & 0 \end{pmatrix}
\qquad
[L_z] = \hbar\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}
$$

Verify that the corresponding matrix representing the square of the total angular momentum also has the correct eigenvalues.

9.8 Obtain the eigenvectors of the matrices given in the previous problem and use these to find the relative probabilities of the possible results of the measurement of the $x$ component of spin on a spin-one particle initially in an eigenstate of $\hat{S}_z$ with eigenvalue $\hbar$.

9.9 Obtain expressions for a set of $4 \times 4$ matrices representing $\hat{J}_z$, $\hat{J}_+$, $\hat{J}_-$, $\hat{J}_x$, $\hat{J}_y$, and $\hat{J}^2$ when $j = 3/2$ and find their eigenvalues and eigenvectors. Hint: Begin by assuming that $\hat{J}_z$ is represented by a diagonal matrix that has the correct eigenvalues.

9.10 A particle with angular momentum quantum numbers $j = 3/2$ and $m_j = 1/2$ passes through a Stern-Gerlach apparatus oriented so as to measure $\hat{J}_x$. What possible results could be obtained and what are their relative probabilities? In each case, what is the state vector immediately after the measurement?

9.11 Find expressions for the changes in the energy levels due to spin-orbit interaction and the Zeeman effect in the case of the $l = 3$ states of a one-electron atom (i) in zero field, (ii) in a weak magnetic field and (iii) in a strong magnetic field. Express your answer in terms of the quantity $E_0$ defined in (9.46).

9.12 The spins of two spin-half particles are coupled. Obtain expressions for the resulting states in terms of their total spin quantum number $J$ and $m_J$.

9.13 Two particles with $j_1 = 2$ and $j_2 = 1$ are coupled. Obtain expressions for the allowed states of the joint system (i) in a basis $|j_1, m_1, j_2, m_2\rangle$ and (ii) in the basis $|J, m_J\rangle$.

9.14 A particle with spin quantum numbers $(1/2, 1/2)$ at $t = 0$ is subject to a magnetic field, $\mathbf{B}$, in the $x$ direction such that its Hamiltonian is $\hat{H} = -\gamma\mathbf{B}\cdot\hat{\mathbf{S}}$ where $\gamma$ is a constant. Obtain an expression for the state vector after time $t$.
IV Extensions and Approximation Schemes
CHAPTER 10

Time-independent Perturbation Theory and the Variational Principle

CONTENTS
10.1 Perturbation Theory for Nondegenerate Energy Levels
     The atomic polarizability of hydrogen
10.2 Perturbation Theory for Degenerate Energy Levels
     The Stark effect in hydrogen
     Nearly degenerate systems
     Electrons in a one-dimensional solid
10.3 The Variational Principle
     The atomic polarizability of hydrogen
10.4 Problems
Throughout our discussion of the general principles of quantum mechanics we have emphasized the importance of the eigenvalues and eigenfunctions of the operators representing physical quantities. However, we have seen that solving the basic eigenvalue equations determining these is often not straightforward. In the early chapters discussing the energy eigenvalue equation (or time-independent Schrödinger equation), for example, we found that this could often not be solved exactly and that, even when a solution was possible, it frequently required considerable mathematical analysis. Because of this, methods have been developed to obtain approximate solutions to eigenvalue equations. One of the most important of such techniques is known as time-independent perturbation theory and will be the first method to be discussed in the present chapter. Another, known as the variational principle, will be discussed later. We shall confine our discussion to the time-independent Schrödinger equation, but generalization to other eigenvalue equations is quite straightforward. We shall mainly use the wave function representation, but we shall restate some of the important results in Dirac notation so that they can be readily transformed to a matrix representation if desired.

Perturbation theory can be applied when the Hamiltonian operator $\hat{H}$, representing the total energy of the system, can be written in the form

$$\hat{H} = \hat{H}_0 + \hat{H}' \tag{10.1}$$

where the eigenvalues ($E_{0n}$) and eigenfunctions ($u_{0n}$) of $\hat{H}_0$ are assumed to be known and the operator $\hat{H}'$ represents an additional energy known as a perturbation, which is in some sense small compared with $\hat{H}_0$. Thus, if we know the solution to a problem described by $\hat{H}_0$ (for example, the energy eigenvalues and eigenfunctions of the hydrogen atom) we can use perturbation theory to obtain approximate solutions to a related problem (such as the energy eigenvalues and eigenfunctions of a hydrogen atom subject to a weak electric field). We shall use (10.1) along with the eigenvalue equation for the unperturbed system

$$\hat{H}_0 u_{0n} = E_{0n} u_{0n} \tag{10.2}$$

to express the eigenvalues of $\hat{H}$ in the form of a series, the leading (zero-order) term of which is independent of $\hat{H}'$ while the next (first-order) term contains expressions linear in $\hat{H}'$ and so on; a similar series is also obtained for the energy eigenfunctions. The mathematical complexity of the series increases rapidly with ascending order, and perturbation theory is generally useful only if the series converges rapidly, which usually means that the additional energy due to the perturbation is much smaller than the energy difference between typical neighbouring levels. Accordingly, we shall confine our treatment to obtaining expressions for the eigenvalues of $\hat{H}$ that are correct to second order in $\hat{H}'$ as well as first-order corrections to the eigenfunctions. We shall illustrate the perturbation method by considering its application to several physical problems.
10.1 PERTURBATION THEORY FOR NONDEGENERATE ENERGY LEVELS

We first consider the effect of perturbations on energy levels that are not degenerate and return to the degenerate case later. We assume that the correct eigenvalues and eigenfunctions of $\hat{H}$ can be expressed as a series whose terms are of zeroth, first, second, etc., order in the perturbation $\hat{H}'$, and derive expressions for each of these in turn. This can be done more conveniently if we rewrite (10.1) in the form

$$\hat{H} = \hat{H}_0 + \beta\hat{H}' \tag{10.3}$$

where $\beta$ is a constant, and obtain results that are valid for all values of $\beta$, including the original case where $\beta = 1$. It is important to note that $\beta$ itself is not assumed to be small, but first, second, etc., order corrections are identified as terms in $\beta$, $\beta^2$, etc. The series for $E_n$ and $u_n$ are written as

$$
\begin{aligned}
E_n &= E_{0n} + \beta E_{1n} + \beta^2 E_{2n} + \cdots \\
u_n &= u_{0n} + \beta u_{1n} + \beta^2 u_{2n} + \cdots
\end{aligned}
\tag{10.4}
$$

where the terms independent of $\beta$ are known as zeroth-order terms, those in $\beta$ are first order, those in $\beta^2$ second order, and so on. We substitute these expressions into the energy eigenvalue equation

$$\hat{H}u_n = E_n u_n \tag{10.5}$$

and obtain

$$(\hat{H}_0 + \beta\hat{H}')(u_{0n} + \beta u_{1n} + \beta^2 u_{2n} + \cdots) = (E_{0n} + \beta E_{1n} + \beta^2 E_{2n} + \cdots)(u_{0n} + \beta u_{1n} + \beta^2 u_{2n} + \cdots) \tag{10.6}$$

Expanding this equation and equating the coefficients of the different powers of $\beta$ we get

$$\hat{H}_0 u_{0n} = E_{0n} u_{0n} \tag{10.7}$$

$$\hat{H}' u_{0n} + \hat{H}_0 u_{1n} = E_{0n} u_{1n} + E_{1n} u_{0n} \tag{10.8}$$

$$\hat{H}' u_{1n} + \hat{H}_0 u_{2n} = E_{0n} u_{2n} + E_{1n} u_{1n} + E_{2n} u_{0n} \tag{10.9}$$
We note first that (10.7) is identical to (10.2), which is to be expected as the former refers to a perturbation of zero order and the latter describes the unperturbed system. We can obtain expressions for the first-order corrections from (10.8) by expressing $u_{1n}$ as a linear combination of the complete set of unperturbed eigenfunctions $u_{0m}$

$$u_{1n} = \sum_m a_{nm} u_{0m} \tag{10.10}$$

Substituting (10.10) into (10.8) gives, after rearranging and using (10.7)

$$(\hat{H}' - E_{1n})u_{0n} = \sum_m a_{nm}(E_{0n} - E_{0m})u_{0m} \tag{10.11}$$

We now multiply both sides of (10.11) by $u_{0n}^*$ and integrate over all space, using the fact that the $u_{0m}$ are orthonormal, to get

$$E_{1n} = H'_{nn} \tag{10.12}$$

where

$$H'_{nn} = \int u_{0n}^*\hat{H}' u_{0n}\,d\tau \equiv \langle 0n|\hat{H}'|0n\rangle \tag{10.13}$$

We have therefore obtained an expression for the first-order correction to the energy eigenvalues of the perturbed system in terms of the perturbation operator $\hat{H}'$ and the eigenfunctions of the unperturbed system, which are assumed to be known. We note that this first-order correction to the energy is just the expectation value of the perturbation operator, calculated using the unperturbed eigenfunctions. We can now obtain an expression for the first-order correction to the eigenfunction by multiplying both sides of (10.11) by $u_{0k}^*$ (where $k \neq n$) and again integrating over all space, when we get (again using orthonormality)

$$a_{nk} = \frac{H'_{kn}}{E_{0n} - E_{0k}} \qquad k \neq n \tag{10.14}$$
where $H'_{kn}$ is defined by an equation similar to (10.13) with $u_{0n}^*$ replaced by $u_{0k}^*$ and is therefore a matrix element (cf. Chapter 9). Hence, using (10.4) and (10.10)

$$u_n = (1 + a_{nn})u_{0n} + \sum_{k \neq n}\frac{H'_{kn}}{E_{0n} - E_{0k}}u_{0k} + \text{higher order terms} \tag{10.15}$$

We shall now show that $a_{nn}$ can be put equal to zero. To do this we note that (10.15) must be normalized. That is

$$
\begin{aligned}
1 &= \int u_n^* u_n\,d\tau \\
&= \int\left[(1 + a_{nn}^*)u_{0n}^* + \sum_{k \neq n} a_{nk}^* u_{0k}^*\right]\left[(1 + a_{nn})u_{0n} + \sum_{k \neq n} a_{nk} u_{0k}\right]d\tau
\end{aligned}
$$

Remembering that the $u_{0k}$ are orthonormal and retaining only first-order terms we get

$$1 + a_{nn}^* + a_{nn} = 1$$

Hence

$$a_{nn} = -a_{nn}^* \tag{10.16}$$

and $a_{nn}$ is therefore an imaginary number, which we can write as $i\gamma_n$ where $\gamma_n$ is real. Hence

$$1 + a_{nn} = 1 + i\gamma_n \simeq \exp(i\gamma_n)$$

where the final equality is correct to first order, as is (10.15). It follows that the factor $(1 + a_{nn})$ in the first term of (10.15) is equal to $\exp(i\gamma_n)$ to first order and the effect of this is simply to multiply $u_{0n}$ by a phase factor. Referring to (10.13) we see that the first-order change in the energy eigenvalue is independent of the phase of $u_{0n}$ and from (10.15) we see that the value of $\gamma_n$ affects only the overall phase of the eigenfunction, at least as far as zero- and first-order terms are concerned. But we know that the absolute phase of an eigenfunction is arbitrary, so there is no loss of generality involved if we put $\gamma_n = 0$, leading to the following expression for the eigenfunction which is correct to first order

$$u_n = u_{0n} + \sum_{k \neq n}\frac{H'_{kn}}{E_{0n} - E_{0k}}u_{0k} \tag{10.17}$$

or, in Dirac notation

$$|n\rangle = |0n\rangle + \sum_{k \neq n}\frac{\langle 0k|\hat{H}'|0n\rangle}{E_{0n} - E_{0k}}|0k\rangle + \text{higher order terms} \tag{10.18}$$
Proceeding now to consider second-order terms, we first expand $u_{2n}$ in terms of the unperturbed eigenfunctions

$$u_{2n} = \sum_k b_{nk} u_{0k} \tag{10.19}$$

and then substitute from (10.7), (10.10) and (10.19) into (10.9) which gives, after some rearrangement

$$\sum_k b_{nk}(E_{0k} - E_{0n})u_{0k} + \sum_k a_{nk}(\hat{H}' - E_{1n})u_{0k} = E_{2n}u_{0n} \tag{10.20}$$

Multiplying (10.20) by $u_{0n}^*$ and integrating over all space leads to

$$E_{2n} = \sum_{k \neq n} a_{nk}H'_{nk} = \sum_{k \neq n}\frac{H'_{kn}H'_{nk}}{E_{0n} - E_{0k}} = \sum_{k \neq n}\frac{|H'_{kn}|^2}{E_{0n} - E_{0k}} \tag{10.21}$$

where we have used (10.14) and the relation $H'_{kn} = H'^{*}_{nk}$, which follows from the definition of the matrix element and the fact that $\hat{H}'$ is a Hermitian operator. This procedure can be continued to obtain expressions for the second-order change in the eigenfunction as well as higher-order corrections to both the eigenfunctions and the eigenvalues. However, the expressions rapidly become more complicated and if the perturbation series does not converge rapidly enough for (10.12), (10.17) and (10.21) to be sufficient, it is usually better to look for some other method of solving the problem.
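
The results (10.12), (10.14), (10.17) and (10.21) translate directly into matrix arithmetic once $\hat{H}'$ is written in the basis of unperturbed eigenfunctions. The sketch below is an illustration under stated assumptions rather than anything taken from the text: the function name `perturbation_estimates`, the three-level system and the perturbation matrix `V` are all hypothetical, chosen only so that the estimates can be compared with an exact numerical diagonalization.

```python
import numpy as np

def perturbation_estimates(E0, Hp, n):
    """Perturbation estimates for nondegenerate level n.

    E0 : 1D array of unperturbed energies E_0k (all distinct).
    Hp : Hermitian matrix of elements H'_kn in the unperturbed basis.
    Returns (E1n, E2n, u) where u holds the first-order eigenvector (10.17).
    """
    E0 = np.asarray(E0, dtype=float)
    others = np.arange(len(E0)) != n              # mask selecting k != n
    E1 = Hp[n, n].real                            # (10.12): E_1n = H'_nn
    a = Hp[others, n] / (E0[n] - E0[others])      # (10.14): a_nk
    E2 = np.sum(np.abs(Hp[others, n]) ** 2 / (E0[n] - E0[others]))  # (10.21)
    u = np.zeros(len(E0), dtype=complex)
    u[n] = 1.0                                    # zero-order part, a_nn = 0
    u[others] = a                                 # first-order admixtures
    return E1, E2, u

if __name__ == "__main__":
    # Hypothetical three-level system and perturbation (illustrative values).
    E0 = np.array([0.0, 1.0, 2.5])
    V = np.array([[0.5, 1.0, 0.3],
                  [1.0, -0.2, 0.7],
                  [0.3, 0.7, 0.1]])
    beta = 0.01
    E1, E2, u = perturbation_estimates(E0, beta * V, n=0)
    exact = np.linalg.eigvalsh(np.diag(E0) + beta * V)[0]
    print("second-order estimate:", E0[0] + E1 + E2)
    print("exact lowest eigenvalue:", exact)
```

For a perturbation this weak the residual difference between the two printed numbers should be of third order in `beta`; increasing `beta` until the off-diagonal elements are comparable with the level spacings shows how quickly the truncated series degrades.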
Worked Example 10.1 The anharmonic oscillator. Consider the case of a particle of mass $m$ subject to a one-dimensional potential $V(x)$ where

$$V = \tfrac12 m\omega^2 x^2 + \gamma x^4 \tag{10.22}$$

Calculate the energy of the ground state to first order in $\gamma$.

Solution If $\gamma$ were zero, the potential would correspond to a harmonic oscillator of classical frequency $\omega$ whose energy levels were shown in Chapter 5 to be

$$E_{0n} = (n + \tfrac12)\hbar\omega \tag{10.23}$$

The corresponding ground-state energy eigenfunction was also evaluated in Chapter 5 as

$$u_{00} = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4}\exp\left(-\frac{m\omega}{2\hbar}x^2\right) \tag{10.24}$$

The inclusion of the term $\gamma x^4$ in the potential (10.22) changes the problem from a harmonic oscillator to an anharmonic oscillator. If $\gamma$ is small, we can apply perturbation theory and the first-order correction to the ground-state energy is obtained by substituting from (10.24) into (10.12) giving

$$E_{10} = \left(\frac{m\omega}{\pi\hbar}\right)^{1/2}\int_{-\infty}^{\infty}\gamma x^4\exp\left(-\frac{m\omega}{\hbar}x^2\right)dx = \frac{3\hbar^2\gamma}{4m^2\omega^2} \tag{10.25}$$

which, along with the zero-order term $\tfrac12\hbar\omega$, constitutes the required expression for the ground-state energy. If we characterize the “strength” of the perturbation by the dimensionless quantity $g$ defined as $g = (2\hbar/m^2\omega^3)\gamma$, it follows from (10.23) and (10.25) that the ratio of the first-order correction to the unperturbed energy is just $3g/4$. The perturbed energies calculated in this way are shown in Table 10.1 for several values of $g$, along with values of the total energy (as a fraction of $\tfrac12\hbar\omega$) calculated by solving the Schrödinger equation numerically with the potential (10.22). We see that when the total correction to the zero-order energy is around 1%, the error involved in using first-order perturbation theory is less than 0.02% of the total, and that the approximation is still correct to about 3% when the total correction is about 20%.

TABLE 10.1 The ground-state energy of an anharmonic oscillator calculated using first-order perturbation theory ($E_0^{1}$) and by numerical methods ($E_0$).

$$
\begin{array}{ccc}
g = 2\hbar\gamma/m^2\omega^3 & E_0^{1}/\tfrac12\hbar\omega & E_0/\tfrac12\hbar\omega \\ \hline
0.01 & 1.007\,50 & 1.007\,35 \\
0.1 & 1.075 & 1.065 \\
0.2 & 1.15 & 1.12
\end{array}
$$
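
The numerical column of Table 10.1 can be reproduced approximately with a short calculation. The text does not say which numerical method was used; the sketch below assumes one common choice, which is to write the Hamiltonian as a matrix in a truncated harmonic-oscillator basis (using the position matrix of Problem 9.1) and diagonalize it. Units with $\hbar = m = \omega = 1$ are assumed, so that $\gamma = g/2$, and the function names and the basis size of 80 are illustrative choices.

```python
import numpy as np

def anharmonic_ground_state(g, nbasis=80):
    """Ground-state energy of V = x^2/2 + gamma*x^4 in units of (1/2) hbar omega.

    Uses hbar = m = omega = 1, so gamma = g / 2, and a basis of the lowest
    `nbasis` harmonic-oscillator eigenstates (an assumed truncation).
    """
    n = np.arange(nbasis)
    x = np.zeros((nbasis, nbasis))
    off_diag = np.sqrt((n[:-1] + 1) / 2.0)   # <n|x|n+1> = sqrt((n+1)/2)
    x[n[:-1], n[:-1] + 1] = off_diag
    x[n[:-1] + 1, n[:-1]] = off_diag
    # Unperturbed energies (n + 1/2) on the diagonal, plus gamma * x^4
    H = np.diag(n + 0.5) + (g / 2.0) * np.linalg.matrix_power(x, 4)
    return 2.0 * np.linalg.eigvalsh(H)[0]    # divide by (1/2) hbar omega

def first_order_estimate(g):
    """Zero-order energy plus the first-order correction (10.25), same units."""
    return 1.0 + 0.75 * g

if __name__ == "__main__":
    for g in (0.01, 0.1, 0.2):
        print(g, first_order_estimate(g), anharmonic_ground_state(g))
```

Truncating the basis distorts only the highest states in the matrix, so the ground-state value should be insensitive to `nbasis` for the small values of `g` considered here; rerunning with a larger basis is a simple way to confirm this.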
The atomic polarizability of hydrogen

The d.c. polarizability of an atom, $\alpha$, is defined by the equation $\boldsymbol{\mu} = \alpha\epsilon_0\boldsymbol{\mathcal{E}}$ where $\boldsymbol{\mu}$ is the electric dipole moment induced by a steady uniform electric field $\boldsymbol{\mathcal{E}}$. We shall consider a hydrogen atom in its ground state for which the wave function is spherically symmetric so that the direction of the $z$ axis can be chosen as parallel to the field. A field of magnitude $\mathcal{E}$ in this direction will contribute an extra term, $\hat{H}'$, to the Hamiltonian which can be treated as a perturbation. Thus from elementary electrostatics

$$\hat{H}' = e\mathcal{E}z \tag{10.26}$$

where $z$ is the coordinate of the electron with respect to the proton at the origin. Substituting from (10.26) into (10.17), the ground-state eigenfunction $u_0$ is given by

$$u_0 = u_{00} + e\mathcal{E}\sum_{k \neq 0}\frac{z_{k0}}{E_{00} - E_{0k}}u_{0k} \tag{10.27}$$

Classically, if the electron were at a position $\mathbf{r}$ with respect to the nucleus, the atom would have a dipole moment $\boldsymbol{\mu}$ where $\boldsymbol{\mu} = -e\mathbf{r}$. It follows that the quantum-mechanical operator representing the dipole moment has the same form and the expectation value of its $z$ component is therefore $\langle\mu\rangle$ where

$$\langle\mu\rangle = -e\int u_0^* z u_0\,d\tau$$

and the components in the $x$ and $y$ directions vanish from symmetry considerations. Using (10.27) we have

$$\langle\mu\rangle = -e\int u_{00}^* z u_{00}\,d\tau + 2e^2\mathcal{E}\sum_{k \neq 0}\frac{|z_{k0}|^2}{E_{0k} - E_{00}} \tag{10.28}$$

to first order in $\mathcal{E}$, remembering that $z_{k0}^* = z_{0k}$. The first term on the right-hand side of (10.28) vanishes because $u_{00}$ is spherically symmetric and $z$ is an odd function, so it follows that the atomic polarizability, $\alpha$, is given by

$$\alpha = \frac{2e^2}{\epsilon_0}\sum_{k \neq 0}\frac{|z_{k0}|^2}{E_{0k} - E_{00}} \tag{10.29}$$
Worked Example 10.2 Develop an alternative derivation of (10.29) using the fact that the energy of interaction between an induced dipole and the applied field equals $-\tfrac12\alpha\epsilon_0\mathcal{E}^2$.

Solution As the unperturbed atom is spherically symmetric, the first-order correction to the energy (10.12) is zero. To calculate the second-order term, we substitute (10.26) into (10.21) to get

$$E_{20} = e^2\mathcal{E}^2\sum_{k \neq 0}\frac{|z_{k0}|^2}{E_{00} - E_{0k}}$$

Comparing this with the expression given we get

$$\alpha = \frac{2e^2}{\epsilon_0}\sum_{k \neq 0}\frac{|z_{k0}|^2}{E_{0k} - E_{00}}$$

as required.
The summation in (10.29) is over all the excited states of the hydrogen atom (including the continuum of unbound states) and considerable computational effort would be required to evaluate it exactly. However, we can calculate a maximum possible value or “upper bound” for $\alpha$ using what is known as the Unsöld closure principle. In this case we replace each energy difference $E_{0k} - E_{00}$ in (10.29) by its smallest possible value, which is the difference between the ground and first excited energy levels. Writing this quantity as $\Delta E$ we get

$$\alpha \le \frac{2e^2}{\epsilon_0\Delta E}\sum_{k \neq 0}|z_{k0}|^2 = \frac{2e^2}{\epsilon_0\Delta E}\left[\sum_k z_{0k}z_{k0} - |z_{00}|^2\right] \tag{10.30}$$

As previously mentioned, the second term in (10.30) vanishes for symmetry reasons. To evaluate the first term we refer back to the discussion of matrix representations in Chapter 9, from which it follows that the quantities $z_{0k}$ constitute the elements of a matrix representing the operator $z$. If we now apply the standard rules for multiplying matrices we see that the first term in (10.30) is equal to the leading diagonal element of a similar matrix representing the operator $z^2$. Hence

$$\sum_k z_{0k}z_{k0} = (z^2)_{00} = \int u_{00}^* z^2 u_{00}\,d\tau \tag{10.31}$$

$$= \frac{1}{\pi a_0^3}\int_0^{2\pi}\int_0^{\pi}\int_0^{\infty}\exp\left(-\frac{2r}{a_0}\right)r^4\cos^2\theta\,\sin\theta\,dr\,d\theta\,d\phi = a_0^2 \tag{10.32}$$

where we have used spherical polar coordinates and the standard expression for the ground-state eigenfunction (cf. (6.70)). The energy difference $\Delta E$ is obtained from the expressions (6.70) for the energy levels of the hydrogen atom

$$\Delta E = \frac{3e^2}{8(4\pi\epsilon_0)a_0} \tag{10.33}$$

Substituting from (10.32) and (10.33) into (10.30) we get

$$\alpha \le 64\pi a_0^3/3 = 67.0a_0^3 \tag{10.34}$$

This maximum value is within 15% of the experimental value of $57.8a_0^3$. Because the upper bound can be evaluated using only the ground-state eigenfunction and the difference between the energies of the two lowest states, (10.31) can be usefully applied to the calculation of polarizabilities of other systems, even where the full set of matrix elements and energy differences are not known and expressions such as (10.30) cannot be evaluated. Moreover, as we shall see later in this chapter, the variational principle can often be used to generate a lower bound to $\alpha$ so that by combining both methods, a theoretical estimate can be made with a precision that is rigorously known.
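
The arithmetic leading from (10.30) to (10.34) is easy to check numerically. The sketch below evaluates the integral (10.32) by Gaussian quadrature and then forms the upper bound. It assumes lengths measured in units of $a_0$ and energies in units of $e^2/(4\pi\epsilon_0 a_0)$, so that $e^2/\epsilon_0 = 4\pi a_0$ and $\Delta E = 3/8$ in these units; the function names and the number of quadrature points are illustrative choices.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss
from numpy.polynomial.legendre import leggauss

def z2_ground_state(npts=8):
    """<z^2> for the hydrogen ground state, in units of a0^2, from (10.32)."""
    t, wt = laggauss(npts)   # nodes/weights for integrals of exp(-t) f(t) on [0, inf)
    u, wu = leggauss(npts)   # nodes/weights for integrals of f(u) on [-1, 1]
    # Radial part: int r^4 exp(-2r) dr, substituting t = 2r gives a factor 1/2^5
    radial = np.sum(wt * t**4) / 2**5
    # Polar part: int cos^2(theta) sin(theta) dtheta, with u = cos(theta)
    polar = np.sum(wu * u**2)
    azimuthal = 2.0 * np.pi
    return radial * polar * azimuthal / np.pi    # prefactor 1/(pi a0^3)

def alpha_upper_bound():
    """Upper bound (10.34) on the polarizability, in units of a0^3."""
    delta_E = 3.0 / 8.0                  # (10.33) in units of e^2/(4 pi eps0 a0)
    e2_over_eps0 = 4.0 * np.pi           # e^2/eps0 in the same units, times a0
    return 2.0 * e2_over_eps0 * z2_ground_state() / delta_E

if __name__ == "__main__":
    print("<z^2>/a0^2 =", z2_ground_state())
    print("upper bound on alpha / a0^3 =", alpha_upper_bound())
```

Both quadrature rules are exact for the low-degree polynomial integrands that arise here, so the result should agree with $64\pi/3$ to within rounding error.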
10.2
PERTURBATION THEORY FOR DEGENERATE ENERGY LEVELS
The application of perturbation theory to degenerate systems can often lead to powerful insights into the physics of the systems considered. This is because the perturbation often “lifts” the degeneracy of the energy levels, leading to additional structure in the line spectrum. We saw examples of this in Chapter 9 where the inclusion of a spin-orbit coupling term split the previously degenerate energy levels (Figure 9.3) and further splitting resulted from the application of magnetic fields (Figure 9.5). We shall now develop a general method for extending perturbation theory to such degenerate systems. The mathematical reason why we cannot apply the theory in the form developed so far to the degenerate case follows from the fact that, if one or more of the energy levels E0k in (10.15) is equal to E0n , then at least one of the denominators (E0n −
Time-independent Perturbation Theory and the Variational Principle 229 E0k ) in (10.15) would be zero, leading to an infinite value for the corresponding term in the series. To understand the reason why this infinity arises, we consider the case of twofold degeneracy and take u01 and u02 to be two eigenfunctions of the unperturbed Hamiltonian Hˆ 0 with eigenvalue E01 . It is important to remember (see the discussion of degeneracy in Chapter 7) that any linear combination of u01 and u02 is also an eigenfunction of Hˆ 0 with the same eigenvalue, so that if we had solved the unperturbed problem in a different way we could well have come up with a different pair of eigenfunctions. Suppose that a small perturbation Hˆ 0 is applied and that as a result we have two states of slightly different energy whose eigenfunctions are v1 and v2 ; because the system is no longer degenerate, linear combinations of these are not energy eigenfunctions. Now imagine we remove the perturbation gradually. As it tends to zero, v1 and v2 will tend to eigenfunctions of the unperturbed system; let these be u01 and u02 . In general u01 and u02 will be different from u1 and u2 , although the former quantities will of course be linear combinations of the latter. Hence, if we start from u1 and u2 the application of a very small perturbation has to produce large changes in the eigenfunctions to get us to u01 and u02 . This is why the infinite terms appear in the perturbation series. The way out of this apparent impasse is to use other methods to obtain the correct starting functions u01 and u02 . Referring back to (10.11) we see that, if u01 is one of a degenerate pair whose other member is u02 , then the coefficient a12 on the right-hand side is indeterminate (because E01 = E02 ) and the infinity arises later in the derivation when we divide through by E01 − E02 , which equals zero. 
However, the appropriate zero-order eigenfunction must be some linear combination of $u_{01}$ and $u_{02}$ which we can write as

$$v_0 = C_1 u_{01} + C_2 u_{02} \tag{10.35}$$

where the constants $C_1$ and $C_2$ are to be determined. Using this in place of $u_{01}$, the equivalent equation to (10.11) becomes

$$(\hat H' - E_1)(C_1 u_{01} + C_2 u_{02}) = \sum_k a_{1k}(E_{01} - E_{0k})\,u_{0k} \tag{10.36}$$
where the first-order correction to the energy of the degenerate states is now written as $E_1$. We now premultiply both sides of (10.36) by $u_{01}^*$ and integrate over all space, then repeat the process using $u_{02}^*$ to obtain the following pair of equations

$$\begin{aligned} (H'_{11} - E_1)C_1 + H'_{12}C_2 &= 0 \\ H'_{21}C_1 + (H'_{22} - E_1)C_2 &= 0 \end{aligned} \tag{10.37}$$

where we have assumed that $u_{01}$ and $u_{02}$ are orthogonal because it was shown in Chapter 7 that, although degenerate eigenfunctions need not be orthogonal, an orthogonal set can always be generated using Schmidt orthogonalization. Equations (10.37) have a nontrivial solution if, and only if, the determinant of the coefficients is zero

$$\begin{vmatrix} H'_{11} - E_1 & H'_{12} \\ H'_{21} & H'_{22} - E_1 \end{vmatrix} = 0 \tag{10.38}$$

The solution of (10.38) leads to two possible values for $E_1$. Substituting these into (10.37), and using the condition that the zero-order eigenfunction must be normalized, leads to two sets of values for the coefficients $C_1$ and $C_2$. Thus we have found two zero-order functions, $v_{01}$ and $v_{02}$, that are linear combinations of $u_{01}$ and $u_{02}$ and which, in general, correspond to different first-order corrections to the energy. Hence, as we expected, one effect of the perturbation may be to remove the degeneracy. These two linear combinations represent the appropriate choice of zero-order eigenfunctions for the problem and they can be used, along with the unperturbed eigenfunctions for the other states, to evaluate higher-order corrections in a similar way to the nondegenerate case. Note that diagonalizing the matrix in (10.38) means that the matrix element $\int v_{01}^* \hat H' v_{02}\,d\tau = 0$, and the infinite terms in the expansion have been removed.

This discussion has been confined to the case of twofold degeneracy, but the extension to the general (say $M$-fold) case is quite straightforward. The correct zero-order eigenfunctions, $v_{0m}$, can be written as linear combinations of the original eigenfunctions $u_{0k}$ according to

$$v_{0m} = \sum_{k=1}^{M} C_{mk}\,u_{0k} \tag{10.39}$$
and the first-order corrections to the energy are obtained from the determinantal equation

$$\begin{vmatrix} H'_{11} - E_1 & H'_{12} & \cdots & H'_{1M} \\ H'_{21} & H'_{22} - E_1 & \cdots & H'_{2M} \\ \vdots & \vdots & & \vdots \\ H'_{M1} & H'_{M2} & \cdots & H'_{MM} - E_1 \end{vmatrix} = 0 \tag{10.40}$$

The coefficients $C_{mk}$ are obtained by substituting the resulting values of $E_1$ into the generalized form of (10.37) and applying the normalization condition. Once the appropriate set of unperturbed eigenfunctions has been found, higher-order corrections can be applied, just as in the nondegenerate case. However, in many cases, the results of physical interest emerge from the first-order treatment just described.

Worked Example 10.3 A particle moves in a two-dimensional potential of the form

$$V(\mathbf r) = \begin{cases} V_0\,xy/a^2 & \text{if } -a \leq x \leq a \text{ and } -a \leq y \leq a \\ \infty & \text{if } |x| > a \text{ or } |y| > a \end{cases}$$

where $V_0$ is a constant, small enough for the term in $xy$ to be treated as a perturbation. Obtain expressions for the energies of the ground and first excited states of this system to first order in $V_0$.

Solution In the limit where $V_0$ is zero, the energy eigenvalues and eigenfunctions can be deduced from the discussion of the three-dimensional box in Chapter 6.

Ground state

$$u_{11} = a^{-1}\cos\frac{\pi x}{2a}\cos\frac{\pi y}{2a} \qquad E_{0,11} = \frac{\hbar^2\pi^2}{4ma^2}$$

where the notation $E_{0,11}$ means the zeroth order approximation to the energy of the state with quantum numbers $n_1 = 1$ and $n_2 = 1$. The ground state is nondegenerate, so the first-order correction to the energy is

$$E_{1,11} = a^{-4}V_0\int_{-a}^{a}\int_{-a}^{a}\cos^2\frac{\pi x}{2a}\cos^2\frac{\pi y}{2a}\,xy\,dx\,dy = 0$$

First excited state

$$E_{0,21} = E_{0,12} = \frac{5\hbar^2\pi^2}{8ma^2}$$

$$u_{12} = a^{-1}\cos\frac{\pi x}{2a}\sin\frac{\pi y}{a} \qquad u_{21} = a^{-1}\sin\frac{\pi x}{a}\cos\frac{\pi y}{2a}$$

As this state is degenerate, we must calculate the matrix elements of the perturbation

$$\begin{aligned} H'_{12,12} &= a^{-4}V_0\int_{-a}^{a}\int_{-a}^{a}\cos^2\frac{\pi x}{2a}\sin^2\frac{\pi y}{a}\,xy\,dx\,dy = 0 \\ H'_{12,21} = H'_{21,12} &= a^{-4}V_0\int_{-a}^{a}\int_{-a}^{a}\cos\frac{\pi x}{2a}\sin\frac{\pi y}{a}\sin\frac{\pi x}{a}\cos\frac{\pi y}{2a}\,xy\,dx\,dy = \frac{1024}{81\pi^4}V_0 \\ H'_{21,21} &= a^{-4}V_0\int_{-a}^{a}\int_{-a}^{a}\sin^2\frac{\pi x}{a}\cos^2\frac{\pi y}{2a}\,xy\,dx\,dy = 0 \end{aligned}$$

where we have used standard expressions for the integrals. We obtain the energy levels using (10.38)

$$\begin{vmatrix} -E_1 & H'_{12,21} \\ H'_{12,21} & -E_1 \end{vmatrix} = 0$$

so that $E_1 = \pm|H'_{12,21}| = \pm\frac{1024}{81\pi^4}V_0$. Substituting into (10.37), we get $C_1 = \pm C_2$, so that the appropriate zero-order energy eigenfunctions are

$$v = (\sqrt{2}\,a)^{-1}\left[\cos\frac{\pi x}{2a}\sin\frac{\pi y}{a} \pm \sin\frac{\pi x}{a}\cos\frac{\pi y}{2a}\right]$$
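The matrix elements quoted in Worked Example 10.3 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes units in which $a = 1$ and $V_0 = 1$, evaluates the integrals by Gauss-Legendre quadrature, and diagonalizes the resulting $2\times 2$ matrix. The function name `perturbation_matrix` and the number of quadrature nodes are illustrative choices.

```python
import numpy as np

def perturbation_matrix(a=1.0, V0=1.0, n=64):
    """Matrix of H' = V0*x*y/a**2 in the degenerate basis (u12, u21)."""
    # Gauss-Legendre nodes and weights, mapped from [-1, 1] to [-a, a].
    t, w = np.polynomial.legendre.leggauss(n)
    x, wx = a * t, a * w
    X, Y = np.meshgrid(x, x, indexing="ij")
    W = np.outer(wx, wx)                      # two-dimensional quadrature weights

    # Unperturbed first-excited-state eigenfunctions of the square box.
    u12 = np.cos(np.pi * X / (2 * a)) * np.sin(np.pi * Y / a) / a
    u21 = np.sin(np.pi * X / a) * np.cos(np.pi * Y / (2 * a)) / a
    Hp = V0 * X * Y / a**2

    basis = [u12, u21]
    return np.array([[np.sum(W * ui * Hp * uj) for uj in basis] for ui in basis])

H = perturbation_matrix()
E1, C = np.linalg.eigh(H)                     # first-order shifts and mixing coefficients

print("H' matrix:\n", H)
print("closed form 1024/(81 pi^4):", 1024 / (81 * np.pi**4))
print("first-order energy shifts:", E1)
print("columns of C give (C1, C2) for each shift:\n", C)
```

The off-diagonal element should reproduce $1024V_0/81\pi^4$, and each eigenvector should have $|C_1| = |C_2|$, as found analytically above.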
The Stark effect in hydrogen

The effect of applying an electric field to the hydrogen atom in its ground state was discussed earlier in this chapter where we showed that there was no first-order change in the energy of this state. We now consider the first excited state which is fourfold degenerate¹ in the absence of any perturbation and we shall find that this level is split into three when an electric field is applied. This is known as the Stark effect.

¹ For the purposes of this discussion we shall assume that the perturbation associated with the applied electric field is much larger than the splitting resulting from spin-orbit coupling discussed in the previous chapter, so that the latter effect can be ignored.

The four degenerate states corresponding to the unperturbed first excited state of hydrogen all have quantum number $n$ equal to 2 and the eigenfunctions, referred to spherical polar coordinates $(r, \theta, \phi)$, are (cf. Chapter 6, Equation (6.68) with $Z = 1$)

$$\begin{aligned} u_1 \equiv u_{200} &= (8\pi a_0^3)^{-1/2}(1 - r/2a_0)\exp(-r/2a_0) \\ u_2 \equiv u_{210} &= (8\pi a_0^3)^{-1/2}(r/2a_0)\cos\theta\,\exp(-r/2a_0) \\ u_3 \equiv u_{211} &= -(\pi a_0^3)^{-1/2}(r/8a_0)\sin\theta\,\exp(i\phi)\exp(-r/2a_0) \\ u_4 \equiv u_{21-1} &= (\pi a_0^3)^{-1/2}(r/8a_0)\sin\theta\,\exp(-i\phi)\exp(-r/2a_0) \end{aligned} \tag{10.41}$$
The perturbation due to the applied field (assumed as usual to be in the $z$ direction) is represented by $\hat H'$ where (cf. (10.26))

$$\hat H' = e\mathcal{E}z = e\mathcal{E}r\cos\theta \tag{10.42}$$
If we now proceed to evaluate the matrix elements we find that most of them vanish because of symmetry. Thus $H'_{11} = H'_{22} = H'_{33} = H'_{34} = H'_{43} = H'_{44} = 0$ because in each case the integrand is antisymmetric in $z$, and $H'_{13} = H'_{31} = H'_{14} = H'_{41} = H'_{23} = H'_{32} = H'_{24} = H'_{42} = 0$ because these integrands all contain the factor $\exp(\pm i\phi)$ and the integration with respect to $\phi$ is from $\phi = 0$ to $\phi = 2\pi$. This leaves only the matrix elements $H'_{12}$ and $H'_{21}$ which are given by

$$\begin{aligned} H'_{12} = H'_{21} &= \int_0^{2\pi}\!\int_0^{\pi}\!\int_0^{\infty}(8\pi a_0^3)^{-1}(r/2a_0)(1 - r/2a_0)\, e\mathcal{E}r\cos^2\theta\,\exp(-r/a_0)\,r^2\,dr\,\sin\theta\,d\theta\,d\phi \\ &= \frac{e\mathcal{E}}{8a_0^4}\int_0^{\pi}\cos^2\theta\sin\theta\,d\theta\int_0^{\infty}(r^4 - r^5/2a_0)\exp(-r/a_0)\,dr \\ &= -3e\mathcal{E}a_0 \end{aligned} \tag{10.43}$$
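The determinantal equation (10.40) therefore becomes

$$\begin{vmatrix} -E_1 & -3e\mathcal{E}a_0 & 0 & 0 \\ -3e\mathcal{E}a_0 & -E_1 & 0 & 0 \\ 0 & 0 & -E_1 & 0 \\ 0 & 0 & 0 & -E_1 \end{vmatrix} = 0 \tag{10.44}$$

leading to the following expressions for the first-order corrections to the energy and for the zero-order eigenfunctions, making use of the generalized form of (10.37)

$$\begin{aligned} E_1 &= 3e\mathcal{E}a_0 & v_1 &= \tfrac{1}{\sqrt 2}(u_1 - u_2) \\ E_1 &= -3e\mathcal{E}a_0 & v_2 &= \tfrac{1}{\sqrt 2}(u_1 + u_2) \\ E_1 &= 0 & v_3 &= u_3 \ \text{and}\ v_4 = u_4 \end{aligned} \tag{10.45}$$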
We should note several points about these results. First, the degeneracy of the last two states has not been lifted by the perturbation so any linear combination of $u_3$ and $u_4$ is a valid eigenfunction with $E_1 = 0$. Secondly, we see from Figure 10.1, which shows a plot of the probability densities $|v_1|^2$ and $|v_2|^2$, that these are not symmetric across the plane $z = 0$, implying that in these states the atom possesses a dipole moment that is aligned antiparallel to the field in the first case and parallel to it in the second. The energy changes can therefore be thought of as arising from the interaction between these dipoles and the applied field. However, in contrast to the polarization of the ground state discussed earlier (Section 10.1) these dipole moments are not generated by the physical operation of the field and their creation does not require the expenditure of any energy: the polarized states are simply two of the possible eigenfunctions of the degenerate unperturbed system. The energy change is therefore proportional to the field magnitude $\mathcal{E}$, rather than to $\mathcal{E}^2$. Thirdly, we see from (10.45) that neither $v_3$ nor $v_4$ (and also by implication no linear combination of these functions) possesses a dipole moment and therefore neither has a first-order interaction with the field; the energy of these states is not affected by the perturbation to first order and they remain degenerate even in the presence of the field. Finally, we note that the occurrence of the Stark effect is a result of a mixing of the wave functions associated with the spherically symmetric 2s state and the 2p state with $m = 0$. In the hydrogenic atom these states are “accidentally” degenerate but this is not the case in other atomic systems where a linear Stark effect is not normally found. Apart from hydrogen, a linear splitting is observed only in the case of a few atoms such as helium where the energy difference between the s and p states is small compared with the perturbation resulting from a strong electric field (cf. Chapter 13).

FIGURE 10.1 A section at $y = 0$ through the position probability distribution corresponding to the $n = 2$ energy state of a hydrogen atom subject to an electric field in the $z$ direction. (A number of contours have been omitted in the high peaks represented by the cross-hatched areas.)
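The matrix element (10.43) and the splitting in (10.45) can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than part of the original treatment: lengths are measured in units of $a_0$, energies in units of $e\mathcal{E}a_0$, the radial integral is done with Gauss-Laguerre quadrature and the angular integral with Gauss-Legendre quadrature. The function name `stark_matrix` is hypothetical.

```python
import numpy as np

def stark_matrix(n=20):
    """4x4 first-order Stark matrix for n = 2 hydrogen, in units of e*field*a0.

    Basis order: u1 = u200, u2 = u210, u3 = u211, u4 = u21-1 (a0 = 1).
    """
    # Radial part: integral of (r^4 - r^5/2) exp(-r) over 0..infinity.
    r, wr = np.polynomial.laguerre.laggauss(n)
    radial = np.sum(wr * (r**4 - r**5 / 2))

    # Angular part: integral of cos^2(theta) sin(theta) over 0..pi,
    # written in terms of c = cos(theta) on [-1, 1].
    c, wc = np.polynomial.legendre.leggauss(n)
    angular = np.sum(wc * c**2)

    H12 = radial * angular / 8.0              # prefactor 1/(8 a0^4) with a0 = 1

    H = np.zeros((4, 4))
    H[0, 1] = H[1, 0] = H12                   # all other elements vanish by symmetry
    return H

H = stark_matrix()
E1, C = np.linalg.eigh(H)

print("H'_12 in units of e*field*a0:", H[0, 1])
print("first-order shifts:", E1)
print("eigenvector for the largest shift:", C[:, -1])
```

The level splits into three: two shifted states and an unshifted, still doubly degenerate, pair. The eigenvector belonging to the positive shift has $C_1 = -C_2$, matching $v_1$ in (10.45).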
Nearly degenerate systems

Cases sometimes arise where the unperturbed energies of two or more states are nearly, but not exactly, equal. This means that, although the relevant terms in the perturbation expansion are not infinite, they can be very large and the perturbation series does not converge rapidly, if at all. However, the procedures just described for overcoming this are valid only if the states are actually degenerate. We shall show that a procedure similar to that described in the fully degenerate case can be applied in the nearly degenerate situation.

If we reexamine the procedure described above, we see that it amounts to using a matrix representation for the part of the system that is described by the degenerate states $u_1$ and $u_2$. Whereas, in the degenerate case, we were able to use perturbation theory, this worked only because the unperturbed energies of the two states are equal. In the more general case, we diagonalize the matrix of the whole Hamiltonian, $\hat H = \hat H_0 + \hat H'$, over the subset of states spanned by $u_1$ and $u_2$. Instead of (10.37) we then get

$$\begin{aligned} (H_{11} - E)C_1 + H_{12}C_2 &= 0 \\ H_{21}C_1 + (H_{22} - E)C_2 &= 0 \end{aligned} \tag{10.46}$$

where

$$H_{11} = \int u_1^*(\hat H_0 + \hat H')u_1\,d\tau = E_{01} + H'_{11}$$

and

$$H_{12} = \int u_1^*(\hat H_0 + \hat H')u_2\,d\tau = H'_{12}$$

with similar expressions for $H_{21}$ and $H_{22}$. The equivalent of (10.37) is then

$$\begin{aligned} (H'_{11} + E_{01} - E)C_1 + H'_{12}C_2 &= 0 \\ H'_{21}C_1 + (H'_{22} + E_{02} - E)C_2 &= 0 \end{aligned} \tag{10.47}$$

Nontrivial solutions to these equations are possible only if the determinant of the coefficients of $C_1$ and $C_2$ vanishes, leading to two allowed values of $E$. To assist comparison with the corresponding result in the fully degenerate case (10.37), we express $E_{01}$ and $E_{02}$ in terms of the average energy $\bar E_0 = \frac{1}{2}(E_{01} + E_{02})$ and the difference $\Delta E_0 = (E_{01} - E_{02})$; we also define $E_1$ so that $E = \bar E_0 + E_1$. (10.47) then becomes

$$\begin{aligned} (H'_{11} + \tfrac{1}{2}\Delta E_0 - E_1)C_1 + H'_{12}C_2 &= 0 \\ H'_{21}C_1 + (H'_{22} - \tfrac{1}{2}\Delta E_0 - E_1)C_2 &= 0 \end{aligned} \tag{10.48}$$

which is identical to (10.37) in the fully degenerate case, when $\Delta E_0 = 0$. Once the solutions to (10.48) have been obtained, higher order corrections can be found using the procedure for nondegenerate systems.

This procedure can be extended to systems where more than two states are nearly degenerate. The criterion for near-degeneracy is that the difference between the unperturbed energies is less than the magnitude of the matrix element connecting them. That is

$$\Delta E_0^2 < |H'_{12}|^2 = H'_{12}H'_{21} \tag{10.49}$$
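As an illustration of the nearly degenerate procedure, the sketch below diagonalizes the $2\times 2$ matrix of $\hat H_0 + \hat H'$ directly and compares the result with the roots of the determinant of (10.48). It is a minimal example, assuming a real symmetric perturbation matrix; the function names and the numerical values are hypothetical and chosen only for demonstration.

```python
import numpy as np

def two_level_energies(E01, E02, Hp):
    """Eigenvalues of H0 + H' restricted to the states u1 and u2.

    Hp is the 2x2 real symmetric matrix of the perturbation (an assumption
    of this sketch; the text allows a general Hermitian matrix).
    """
    H = np.diag([E01, E02]) + np.asarray(Hp, dtype=float)
    return np.linalg.eigvalsh(H)              # returned in ascending order

def closed_form(E01, E02, Hp):
    """Roots of the determinant of (10.48), written as E = Ebar + E1."""
    Hp = np.asarray(Hp, dtype=float)
    Ebar = 0.5 * (E01 + E02)                  # average unperturbed energy
    dE = E01 - E02                            # unperturbed energy difference
    mean = 0.5 * (Hp[0, 0] + Hp[1, 1])
    half = 0.5 * (Hp[0, 0] - Hp[1, 1] + dE)
    root = np.sqrt(half**2 + Hp[0, 1] * Hp[1, 0])
    return np.array([Ebar + mean - root, Ebar + mean + root])

Hp = [[0.0, 0.05], [0.05, 0.0]]               # illustrative perturbation matrix
print(two_level_energies(1.00, 1.02, Hp))     # nearly degenerate pair
print(closed_form(1.00, 1.02, Hp))
print(two_level_energies(1.00, 1.00, Hp))     # exactly degenerate limit
```

With these numbers $\Delta E_0^2 < |H'_{12}|^2$, so criterion (10.49) is satisfied, and setting $E_{01} = E_{02}$ recovers the degenerate result $E_1 = \pm|H'_{12}|$.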
Electrons in a one-dimensional solid

The fact that metals can carry electric currents implies that they contain electrons that are mobile and not attached to particular atoms; such electrons are known as “free electrons.” In many common metals (e.g. sodium) all but one of the atomic electrons are tightly bound in “closed shells,” while the remaining electron is effectively free (see Chapter 13). In this example, we shall develop a one-dimensional model of a solid in which we study the behaviour of otherwise free electrons in a potential similar to that arising from the nuclei and other electrons in a real solid. We shall assume that this potential is weak enough to be treated as a perturbation and we shall see that this simple model provides quite a good explanation of the electrical conductivity of solids.

The atoms in a crystalline solid are arranged on a regular lattice, so, in one dimension, we expect the potential experienced by a free electron to vary periodically with distance. If we call this repeat distance $a$, the simplest form of such a periodic potential is

$$V(x) = V_0\cos(2\pi x/a) \tag{10.50}$$

$V_0$ is assumed to be small so that $V$ can be treated as a perturbation, where the unperturbed eigenfunctions correspond to completely free electrons and therefore have the form

$$u_{0k} = L^{-1/2}\exp(ikx) \tag{10.51}$$

where the factor $L^{-1/2}$ ensures that the eigenfunctions are normalized when integrated over the distance $L$, which is taken as the length of the macroscopic piece of solid being considered. If this contains $N$ atoms, it follows that $L = Na$. The unperturbed energies $E_{0k}$ are given by (cf. Section 5.4)

$$E_{0k} = \frac{\hbar^2k^2}{2m} \tag{10.52}$$
The matrix elements representing the linking of the states $u_{0k}$ and $u_{0k'}$ are given by

$$\begin{aligned} H'_{kk'} &= L^{-1}\int_0^{Na}\exp(-ikx)\,V_0\cos(2\pi x/a)\exp(ik'x)\,dx \\ &= \frac{V_0}{2L}\int_0^{Na}\left\{\exp[i(k' - k + 2\pi/a)x] + \exp[i(k' - k - 2\pi/a)x]\right\}dx \\ &= \begin{cases} \dfrac{V_0}{2} & \text{if } k' - k = \pm\dfrac{2\pi}{a} \\ 0 & \text{otherwise} \end{cases} \end{aligned} \tag{10.53}$$

We first note that the diagonal elements $H'_{kk}$ are all zero. If $k' = -k$ the states $u_{0k}$ and $u_{0k'}$ are degenerate and the matrix element connecting these states is nonzero only if $k = \pm\pi/a$. The first-order changes to the energies of these states are then given by Equation (10.40) which now has the form

$$\begin{vmatrix} -E_1 & \frac{1}{2}V_0 \\ \frac{1}{2}V_0 & -E_1 \end{vmatrix} = 0 \tag{10.54}$$

Thus $E_1 = \pm\frac{1}{2}V_0$ and the degeneracy of those states where $k = \pm\pi/a$ is lifted in first order.

We now turn to states with other values of $k$, but where $k - k' = \pm 2\pi/a$ as above, and first consider the case where $k$ is very different from $\pm\pi/a$. The unperturbed energy difference $E_{0k} - E_{0k'}$, which appears in the denominator of (10.21), is now large compared with $\frac{1}{2}V_0$, so the unperturbed states are neither fully, nor nearly, degenerate. The first-order perturbation correction is therefore zero and the second-order term is small. However, if the values of $k$ are in the near vicinity of, say, $\pi/a$, the matrix element of the perturbation linking this state with that with $k' = k - 2\pi/a \sim -\pi/a$ will be much greater than the difference between the unperturbed energies of the states $k$ and $k'$, and we can treat the problem as an example of a “nearly degenerate” system discussed earlier.¹ Referring to (10.48) and using (10.52), the determinantal equation becomes

$$\begin{vmatrix} \frac{1}{2}\Delta E_0 - E_1 & \frac{1}{2}V_0 \\ \frac{1}{2}V_0 & -\frac{1}{2}\Delta E_0 - E_1 \end{vmatrix} = 0 \tag{10.55}$$

which leads directly to

$$E_1 = \pm\frac{1}{2}(\Delta E_0^2 + V_0^2)^{1/2} \tag{10.56}$$

Remembering the definitions of $\Delta E_0$ and $E_1$ and using (10.52) we have

$$E = \frac{\hbar^2}{2m_e}\left(k^2 - \frac{2\pi k}{a} + \frac{2\pi^2}{a^2}\right) \pm \frac{1}{2}\left[\left(\frac{\hbar^2}{2m_e}\right)^2\left(\frac{4\pi k}{a} - \frac{4\pi^2}{a^2}\right)^2 + V_0^2\right]^{1/2} \tag{10.57}$$
When $k = \pi/a$, $\Delta E_0 = 0$, so $E = \hbar^2k^2/2m_e \pm \frac{1}{2}V_0$, which agrees with the value obtained above for the degenerate case. Now suppose that $k$ is a little larger than $\pi/a$. If we put $V_0 = 0$ in (10.57), we see that the positive and negative signs yield the unperturbed energy for the states $k$ and $k'$, respectively. Identical results are obtained when the signs of $k$ and $k'$ are reversed. It follows that when $V_0 \neq 0$, we should choose the positive sign in (10.57) when $|k| > \pi/a$ and the negative sign when $-\pi/a \leq k \leq \pi/a$. In this way, we obtain the curve of $E$ as a function of $k$ shown in Figure 10.2. A notable feature of this diagram is the appearance of gaps of width $V_0$ in the energy spectrum corresponding to the points $k = \pm\pi/a$. The influence of these gaps on the physical properties of the system is considerable and we shall discuss this shortly. First, however, we must consider more carefully the boundary conditions to be imposed on the problem.

¹ The only other state linked by the perturbation to that with $k \sim \pi/a$ is one with $k \sim 3\pi/a$ and the energy difference between these two is large.
FIGURE 10.2 The energy of an electron in a one-dimensional metal as a function of wave number $k$, compared with that of a free electron (broken line), showing the energy gaps at $k = \pm\pi/a$.
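A curve of this kind can be generated directly from (10.57) with the sign rule stated in the text. The sketch below is illustrative only: it assumes units in which $\hbar^2/2m_e = 1$ and $a = 1$, uses $|k|$ to cover both signs of $k$ by symmetry, and applies (10.57) at every $k$ even though the nearly degenerate treatment is justified only close to $k = \pm\pi/a$. The function name `band_energy` and the value of $V_0$ are hypothetical.

```python
import numpy as np

def band_energy(k, V0, a=1.0, hbar2_over_2m=1.0):
    """Energy from (10.57): upper sign for |k| > pi/a, lower sign otherwise."""
    q = np.abs(np.asarray(k, dtype=float))    # E(k) = E(-k), so work with |k|
    # Average and difference of the unperturbed energies of k and k - 2*pi/a.
    Ebar = hbar2_over_2m * (q**2 - 2 * np.pi * q / a + 2 * np.pi**2 / a**2)
    dE = hbar2_over_2m * (4 * np.pi * q / a - 4 * np.pi**2 / a**2)
    root = 0.5 * np.sqrt(dE**2 + V0**2)
    sign = np.where(q > np.pi / a, 1.0, -1.0)
    return Ebar + sign * root

V0 = 0.5                                      # illustrative perturbation strength
k = np.linspace(-1.5 * np.pi, 1.5 * np.pi, 601)
E = band_energy(k, V0)

# Size of the discontinuity at the zone boundary k = pi/a.
gap = band_energy(np.pi + 1e-9, V0) - band_energy(np.pi, V0)
print("energy gap at k = pi/a:", gap)
```

Setting `V0 = 0` in `band_energy` returns the free-electron parabola, and the printed discontinuity at $k = \pi/a$ should equal $V_0$.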
The length $L$ was previously described as the length of the piece of solid under consideration so the most obvious boundary condition would be to require the wave function to be zero outside the range between $x = 0$ and $x = L$. Such “fixed boundary conditions” can be used, but they give rise to considerable mathematical complications associated with the fact that plane waves of the form (10.51) are no longer eigenfunctions of the unperturbed system. It is therefore more convenient to use “periodic boundary conditions” where we impose the condition that $u_k(x) = u_k(x + L)$ and the allowed values of $k$ are therefore those where $k = 2n\pi/L$, $n$ being an integer. This boundary condition corresponds physically to the one-dimensional line being bent to form a closed loop, when the allowed values of $k$ follow from the condition that the wave function be single-valued. As bulk properties, such as the electrical conductivity, of real solids are not dependent on parameters such as the size and shape of the sample being considered, it is a reasonable assumption that the precise form of boundary conditions will not be important when discussing similar properties of our one-dimensional model. The most convenient form can therefore be chosen and this turns out to correspond to periodic boundary conditions.

We previously saw that the number of atoms associated with a length $L$ of this one-dimensional solid is $N$ where $L = Na$ and $a$ is the repeat distance of the periodic potential. The number of energy states associated with values of $k$ between plus and minus $\pi/a$ is also $N$, because the periodic boundary conditions require that successive allowed values of $k$ are separated by $2\pi/L$. It follows from the Pauli exclusion principle to be discussed in Chapter 13 that each state can be occupied by no more than two electrons, which must have opposite spin. In the case where each atom contributes one free electron to the system, the ground state of the system will therefore correspond to the $(N/2)$ states of lowest energy being filled. If now an electric field is applied, some of the electrons will be excited so that there are more with (say) positive $k$ than with negative $k$ and, as $k$ is proportional to the electron momentum, an electric current results. In this case the one-dimensional solid behaves like a metal.

However, if there are two free electrons per atom, all the states in the band of energies whose $k$ values lie between plus and minus $\pi/a$ will be occupied. If $V_0$ is large enough, an applied field will be unable to excite any electrons, and there will therefore be as many electrons with positive $k$ as there are with negative $k$; no current then flows, and the one-dimensional solid in this case is an insulator. If, however, $V_0$ (and consequently the size of the energy gap) is small enough, some electrons will be thermally excited into states with $|k|$ greater than $\pi/a$ where they will be mobile and can respond to a field. This excitation leaves behind “holes” in the otherwise full band and it can be shown that these have the same electrical properties as positively charged mobile particles. Such a system is a one-dimensional intrinsic semiconductor.

This discussion can be generalized to the case where there are more than two electrons per atom if we remember that a general periodic potential has Fourier components with repeat distances $a$, $a/2$ etc., giving rise to energy gaps at $k = \pm\pi/a$, $\pm 2\pi/a$, etc. It is then clear that any one-dimensional solid that possesses an odd number of free electrons per atom will be a metal, while an even number will imply insulating or semiconducting properties.
Similar arguments can be deployed to explain the difference between metals and insulators and a wide range of other physical phenomena displayed by real three-dimensional solids. The interested reader should consult a textbook on solid state physics (for example, J. R. Hook and H. E. Hall, Solid State Physics, Wiley, New York, 1991).
10.3 THE VARIATIONAL PRINCIPLE
This approximate method can be applied to eigenvalue problems where we know the operator $\hat H$ and can make some guess as to the form of the eigenfunction corresponding to the lowest energy state of the system. The variational principle can then be used to improve our guessed eigenfunction and produce a maximum value, or upper bound, to the ground-state energy. The usefulness of the variational principle depends strongly on our ability to make a good initial guess, and the symmetry and other physical properties of the system can often be useful guides to this. The variational principle can be extended to consider states other than the ground state of the system, but this has limited applications and will not be discussed here. We shall now proceed to describe the method in more detail.

Let $v$ be a function that is to approximate to the ground-state eigenfunction of the Hamiltonian operator $\hat H$. We shall now show that the expectation value of $\hat H$ calculated using $v$ cannot be less than the true ground-state eigenvalue $E_0$. Allowing for the fact that $v$ may not be normalized, this expectation value is given by

$$\langle\hat H\rangle = \frac{\int v^*\hat H v\,d\tau}{\int v^*v\,d\tau} \tag{10.58}$$

Using completeness, $v$ can be expressed as a linear combination of the true, but unknown, eigenfunctions of $\hat H$

$$v = \sum_n a_n u_n \tag{10.59}$$
Using (10.59) the numerator of (10.58) can be expressed as

$$\begin{aligned} \int v^*\hat H v\,d\tau &= \int\sum_n a_n^*u_n^*\,\hat H\sum_m a_m u_m\,d\tau \\ &= \sum_{nm}a_n^*a_m\int u_n^*\hat H u_m\,d\tau \\ &= \sum_{nm}a_n^*a_m E_m\delta_{nm} \\ &= \sum_n |a_n|^2E_n \end{aligned} \tag{10.60}$$

using orthonormality. If $E_0$ is the ground-state eigenvalue it must be smaller than all the other $E_n$'s, so we can write

$$\int v^*\hat H v\,d\tau \geq E_0\sum_n |a_n|^2 \tag{10.61}$$

The denominator of (10.58) can be similarly shown to be given by

$$\int v^*v\,d\tau = \sum_n |a_n|^2 \tag{10.62}$$

and combining (10.58), (10.61), and (10.62) we get

$$\langle\hat H\rangle \geq E_0 \tag{10.63}$$
as required. Thus by guessing an approximate eigenfunction, $v$, we can never get an expectation value of the energy that is less than the true value, so this expectation value constitutes an upper bound to the ground-state energy. Moreover, if $v$ contains adjustable parameters, these can be varied until $\langle\hat H\rangle$ has its minimum value, when $v$ will represent the best possible approximation of this form.

Worked Example 10.4 The harmonic oscillator. Suppose that we did not know the ground-state eigenfunction for a one-dimensional harmonic oscillator, but have guessed that it is similar to that for a particle in an infinite square well. That is

$$v = \begin{cases} a^{-1/2}\cos(\pi x/2a) & -a \leq x \leq a \\ 0 & |x| > a \end{cases}$$

Find the value of $a$ for which the energy is closest to the true value.

Solution We wish to find the value of $a$ for which the expectation value of the energy is a minimum. Using (10.58)

$$\langle\hat H\rangle = a^{-1}\int_{-a}^{a}\cos(\pi x/2a)\left(-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + \frac{1}{2}m\omega^2x^2\right)\cos(\pi x/2a)\,dx$$

where $m$ is the particle mass and $\omega$ the classical frequency of the oscillator. The differentiation and integration are straightforward, if a little tedious, and we get

$$\langle\hat H\rangle = \frac{\hbar^2\pi^2}{8ma^2} + m\omega^2a^2\left(\frac{1}{6} - \frac{1}{\pi^2}\right) \tag{10.64}$$

To find an expression for $a$ corresponding to the minimum value of $\langle\hat H\rangle$ we put $\partial\langle\hat H\rangle/\partial a = 0$ and get

$$-\frac{\hbar^2\pi^2}{4ma^3} + 2m\omega^2a\left(\frac{1}{6} - \frac{1}{\pi^2}\right) = 0$$

so that

$$a = \pi\left(\frac{\hbar}{m\omega}\right)^{1/2}\left[\frac{3}{4(\pi^2 - 6)}\right]^{1/4} = 2.08\left(\frac{\hbar}{m\omega}\right)^{1/2} \tag{10.65}$$

We can substitute from (10.65) into (10.64) to obtain the minimum value of $\langle\hat H\rangle$, which we write as $\langle\hat H\rangle_{\min}$. Thus

$$\langle\hat H\rangle_{\min} = \frac{1}{2}\hbar\omega\left(\frac{\pi^2 - 6}{3}\right)^{1/2} = 0.568\,\hbar\omega \tag{10.66}$$

Thus we have been able to set an upper bound to the energy of the oscillator, which is within 14% of the true value of $\frac{1}{2}\hbar\omega$. Figure 10.3 compares the approximate wave function evaluated using the value of $a$ given in (10.65) with the exact ground-state eigenfunction as obtained in Chapter 5.
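The minimization in Worked Example 10.4 can also be carried out numerically. The following sketch is an illustration, not part of the original solution: it assumes units in which $\hbar = m = \omega = 1$, checks (10.64) against a direct quadrature of the integral, and then locates the minimum on a fine grid of $a$ values. The function names and the grid limits are arbitrary choices.

```python
import numpy as np

def energy(a):
    """Expectation value (10.64) in units hbar = m = omega = 1."""
    return np.pi**2 / (8 * a**2) + a**2 * (1 / 6 - 1 / np.pi**2)

def energy_quadrature(a, n=64):
    """The same expectation value from Gauss-Legendre quadrature of the integral."""
    t, w = np.polynomial.legendre.leggauss(n)
    x, wx = a * t, a * w                      # map nodes from [-1, 1] to [-a, a]
    v = np.cos(np.pi * x / (2 * a)) / np.sqrt(a)
    d2v = -(np.pi / (2 * a))**2 * v           # analytic second derivative inside the well
    return np.sum(wx * v * (-0.5 * d2v + 0.5 * x**2 * v))

# Grid search for the variational minimum.
a_grid = np.linspace(1.0, 4.0, 300001)
E_grid = energy(a_grid)
i = np.argmin(E_grid)
a_best, E_min = a_grid[i], E_grid[i]

print("quadrature check at a = 2:", energy_quadrature(2.0), energy(2.0))
print("optimal a  (units of sqrt(hbar/m omega)):", a_best)
print("minimum <H> (units of hbar omega):", E_min)
print("exact ground-state energy:", 0.5)
```

The minimum found this way should agree with (10.65) and (10.66), and it lies above the exact value $\frac{1}{2}\hbar\omega$, as the variational principle requires.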
The atomic polarizability of hydrogen

This problem was previously treated by perturbation theory where we showed (10.34) that an upper bound to the polarizability $\alpha$ is $64\pi a_0^3/3$. As the energy of an atom in an electric field of magnitude $\mathcal{E}$ is equal to $-\frac{1}{2}\epsilon_0\alpha\mathcal{E}^2$, this corresponds to a lower bound for the energy; we shall now use the variational principle to set an upper bound for the energy and hence a lower bound to $\alpha$. We showed earlier (cf. (10.26)) that the Hamiltonian for a hydrogen atom subject to a uniform electric field $\mathcal{E}$ in the positive $z$ direction is

$$\hat H = \hat H_0 + e\mathcal{E}z \tag{10.67}$$
FIGURE 10.3 The wave function corresponding to the ground state of a one-dimensional harmonic oscillator (broken line) compared with that for an infinite-sided well whose parameters are chosen using the variational principle (continuous line).

We expect the application of an electric field to polarize the atom, which means that the negatively charged electron will be pulled in a direction opposite to the field, while the positively charged nucleus is pushed parallel to it. The wave function should therefore be such as to enhance the probability of finding the electron at negative rather than positive values of $z$. A simple (unnormalized) function with these properties is

$$v = u_0(1 - \beta z) \tag{10.68}$$

where $u_0$ is the ground-state eigenfunction of $\hat H_0$ given in Chapter 6 (6.68) and $\beta$ is a constant. We note that $u_0$ is a real function, and we also assume $\beta$ to be real. We can therefore write the expectation value of $\hat H$ as

$$\begin{aligned} \langle\hat H\rangle &= \frac{\int u_0(1 - \beta z)(\hat H_0 + e\mathcal{E}z)u_0(1 - \beta z)\,d\tau}{\int u_0^2(1 - \beta z)^2\,d\tau} \\ &= \left[\int u_0\hat H_0u_0\,d\tau - \beta\int u_0(z\hat H_0 + \hat H_0z)u_0\,d\tau + e\mathcal{E}\int u_0zu_0\,d\tau\right. \\ &\qquad\left. + \beta^2\int u_0z\hat H_0zu_0\,d\tau - 2\beta e\mathcal{E}\int u_0z^2u_0\,d\tau + \beta^2e\mathcal{E}\int u_0z^3u_0\,d\tau\right] \\ &\quad\div\left[\int u_0^2\,d\tau - 2\beta\int u_0zu_0\,d\tau + \beta^2\int u_0z^2u_0\,d\tau\right] \end{aligned} \tag{10.69}$$

Remembering that $u_0$ is the ground-state energy eigenfunction in the zero-field case, which is spherically symmetric, all integrals that involve odd powers of $z$ are equal to zero on symmetry grounds. Moreover, $\int u_0z\hat H_0zu_0\,d\tau$ also vanishes, as can be shown by substituting the spherical polar expressions for the operators and eigenfunctions (cf. Chapter 6), while the integral $\int u_0z^2u_0\,d\tau$ was earlier shown (10.32) to be equal to $a_0^2$. Equation (10.69) therefore becomes

$$\langle\hat H\rangle = \frac{E_0 - 2e\mathcal{E}\beta a_0^2}{1 + \beta^2a_0^2} \simeq E_0(1 - \beta^2a_0^2) - 2e\mathcal{E}\beta a_0^2 \tag{10.70}$$
where we have ignored powers of $\beta$ higher than the second as we are interested in the case of low fields where $\beta$ is expected to be small. We can now differentiate to find a value of $\beta$ corresponding to the minimum value of $\langle\hat H\rangle$

$$-2a_0^2\beta E_0 - 2e\mathcal{E}a_0^2 = 0$$

that is

$$\beta = -\frac{e\mathcal{E}}{E_0} \tag{10.71}$$

Substituting into (10.70), the minimum value $\langle\hat H\rangle_{\min}$ of $\langle\hat H\rangle$ is

$$\langle\hat H\rangle_{\min} = E_0 + \frac{e^2\mathcal{E}^2}{E_0}a_0^2 = E_0 - 8\pi\epsilon_0a_0^3\mathcal{E}^2 \tag{10.72}$$
using (6.70). This represents an upper bound to the ground-state energy, so the corresponding lower bound for the polarizability is $16\pi a_0^3$. We can combine this with the upper bound obtained by perturbation methods and the Unsöld approximation (10.34) to give

$$16\pi a_0^3 \leq \alpha \leq 64\pi a_0^3/3 \tag{10.73}$$

Taking as a best estimate the average of these two bounds, we get

$$\alpha = (18.7 \pm 2.7)\pi a_0^3 \tag{10.74}$$

and we have therefore obtained a theoretical value for the polarizability that is rigorously correct to within about 14%. Moreover, it is in good agreement with the experimental value of $18.4\pi a_0^3$.

A combination of perturbation and variational calculations can quite often be used in a similar way to that just described to obtain upper and lower bounds to the theoretical estimates of physical quantities that cannot be calculated directly, perhaps because the unperturbed eigenvalues and eigenfunctions are unknown. Such methods are frequently used, for example, in the calculation of the polarizabilities of nonhydrogen atoms and of molecules, and in estimating the “Van der Waals” interactions between atoms in gases. In favourable cases, by the use of sophisticated trial functions in the variational method and special techniques to estimate the sums over states in the perturbation expressions, the upper and lower bounds can be made to approach each other very closely so that highly accurate theoretical results can be obtained.
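The variational bound can be checked by minimizing the unexpanded ratio in (10.70) numerically. The sketch below is an illustration under stated assumptions: it works in atomic units, where $a_0 = 1$, $e^2/4\pi\epsilon_0 = 1$ (so that $\epsilon_0 = 1/4\pi$ and $E_0 = -\frac{1}{2}$), and the symbol `F` stands for $e\mathcal{E}$ in those units. The field value and the grid are hypothetical choices.

```python
import numpy as np

E0 = -0.5        # hydrogen ground-state energy in atomic units
F = 1.0e-3       # e * field strength in atomic units (illustrative small value)

def trial_energy(beta, F):
    """Ratio in (10.70) before expanding in powers of beta (a0 = 1)."""
    return (E0 - 2.0 * F * beta) / (1.0 + beta**2)

# Grid search around the expected minimum beta = -F/E0 = 2F, see (10.71).
beta = np.linspace(0.0, 4.0 * F, 40001)
E = trial_energy(beta, F)
i = np.argmin(E)
beta_best, E_min = beta[i], E[i]

# Energy shift = -(1/2) eps0 alpha field^2, with eps0 = 1/(4 pi) in these units.
alpha_lower = -8.0 * np.pi * (E_min - E0) / F**2
alpha_upper = 64.0 * np.pi / 3.0              # perturbation-theory bound from (10.34)

mid = 0.5 * (alpha_lower + alpha_upper) / np.pi
half_width = 0.5 * (alpha_upper - alpha_lower) / np.pi
print("optimal beta:", beta_best, " expected:", -F / E0)
print("lower bound on alpha / (pi a0^3):", alpha_lower / np.pi)
print("estimate of alpha / (pi a0^3):", mid, "+/-", half_width)
```

For a weak field the lower bound approaches $16\pi a_0^3$, and the midpoint and half-width of the two bounds reproduce the estimate in (10.74).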
10.4 PROBLEMS
10.1 A particle moves in a potential given by

$$V = V_0\cos(\pi x/2a) \quad (-a \leq x \leq a) \qquad\qquad V = \infty \quad (|x| > a)$$

where $V_0$ is small. Treat this problem as a perturbation on the case of a particle in an infinite-sided square well potential of width $2a$ and calculate the changes in the energies of the three lowest energy states to first order in $V_0$.

10.2 A particle moves in a potential given by

$$V = V_0 \quad (-b \leq x \leq b) \qquad V = 0 \quad (b < |x| \leq a) \qquad V = \infty \quad (|x| > a)$$
Calculate the energies of the three lowest states to first order in $V_0$ using a similar procedure to that in Problem 10.1.

10.3 The eigenvalues and eigenvectors of the hydrogenic atom were derived in Chapter 6 on the assumption that the nucleus is a point charge. A better description is a small sphere of radius about 1000 times smaller than the Bohr radius. If the nuclear charge had the form of a thin spherical shell of radius $a$, show that the change to the Hamiltonian from the point-charge approximation is

$$\hat H' = -\frac{Ze^2}{4\pi\epsilon_0}\left(\frac{1}{a} - \frac{1}{r}\right), \quad r \leq a \qquad\qquad \hat H' = 0, \quad r > a$$

Use perturbation theory to calculate the resulting first-order changes to the 1s, 2s, and 2p energy levels of a hydrogenic atom. (Hint: neglect the variation of the size of the wave function over the volume of the nucleus.) Evaluate these quantities assuming $a = 10^{-15}$ m and express your results as a fraction of the magnitude of the total ground-state energy.

10.4 More realistically, the nuclear charge can be considered to be spread uniformly over the volume of the nucleus. Show that in this case the change to the Hamiltonian from the point-charge approximation is

$$\hat H' = 0, \quad r > a \qquad\qquad \hat H' = -\frac{Ze^2}{4\pi\epsilon_0}\left(\frac{3}{2a} - \frac{r^2}{2a^3} - \frac{1}{r}\right), \quad r \leq a$$

Use perturbation theory to calculate the resulting first-order changes to the 1s, 2s, and 2p energy levels of a hydrogenic atom. Evaluate these quantities assuming $a = 10^{-15}$ m and express your results as a fraction of the magnitude of the total ground-state energy.

10.5 Consider a system whose Hamiltonian has the form

$$\begin{pmatrix} E_1 & 0 & 0 \\ 0 & E_2 & 0 \\ 0 & 0 & E_3 \end{pmatrix} + \begin{pmatrix} 0 & V & 0 \\ V & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
where $V$ …

$$\begin{aligned} &\ldots & r &> a \\ U &= U_0 & r &\leq a \end{aligned} \tag{12.74}$$

As before, $U_0$ may be positive or negative, but now need not be small. We shall, however, consider explicitly only the case where the incident energy $E$ is greater than $U_0$ and also confine our attention to s-wave scattering, where $l = 0$ and $ka \ll 1$; it follows that our results will be applicable to the repulsive case only if $U_0 \ll \hbar^2/2ma^2$. In the region $r \leq a$ the general solution to the Schrödinger equation when $l = 0$ is

$$\frac{1}{r}(A'\sin k'r + B'\cos k'r) \tag{12.75}$$

where

$$k' = [2m(E - U_0)/\hbar^2]^{1/2} \tag{12.76}$$

and $A'$ and $B'$ are constants. However, the wave function must be finite everywhere, including the point $r = 0$, so the cosine term cannot exist and $B'$ must be equal to zero. The eigenfunction in the region $r > a$ has the form given by (12.52) with $l = 0$. If we apply the condition that both the wave function and its first spatial derivative must be continuous at $r = a$, we get

$$\begin{aligned} A[\exp(-ika) - \exp(i(ka + 2\delta_0))] &= A'\sin k'a \\ ikA[-\exp(-ika) - \exp(i(ka + 2\delta_0))] &= k'A'\cos k'a \end{aligned}$$
If we divide the second of these equations by the first we get

$$\frac{ik[-\exp(-ika) - \exp(i(ka + 2\delta_0))]}{\exp(-ika) - \exp(i(ka + 2\delta_0))} = \frac{k' \cos k'a}{\sin k'a}$$

We can now multiply the numerator and denominator on the left-hand side by $\exp(-i\delta_0)$ and express the resulting expressions in trigonometric form as

$$k \cot(ka + \delta_0) = k' \cot k'a$$

which leads, after a little manipulation, to

$$\cot\delta_0 = \frac{k \tan ka \tan k'a + k'}{k \tan k'a - k' \tan ka} \tag{12.77}$$

$$\simeq \frac{k' \cot k'a}{k(1 - k'a \cot k'a)} \tag{12.78}$$

where (12.78) is the limiting form of (12.77) when $ka \ll 1$. The scattering cross section for s-wave scattering is given by the $l = 0$ term in (12.68) as

$$\sigma = \frac{4\pi}{k^2} \sin^2\delta_0 = \frac{4\pi}{k^2(1 + \cot^2\delta_0)} \tag{12.79}$$
using standard trigonometric identities. Thus, given $k$, $a$ and $U_0$, we can calculate $\cot\delta_0$ using (12.78), and then $\sigma$ using (12.79).

We note from (12.78) that if the magnitude of $k'a$ happens to be such that $k'a \cot k'a = 1$ (i.e., if $k'a = 4.49, 7.73, 10.9, \ldots$) then $\cot\delta_0$ will be infinite, the cross section will be zero, and the incident particles will "diffract past" the potential without being significantly scattered. We note that this condition can be satisfied simultaneously with $ka$ being small only if $U_0$ is negative, that is, if we are considering a potential well rather than a step. In such a case, the absence of scattering means that the wave function inside the well fits smoothly onto the plane wave outside. The potential well need not be square for this to occur, and the phenomenon has been observed experimentally in the case of the scattering of electrons by rare-gas atoms, when it is known as the Ramsauer-Townsend effect: electrons of energy about 0.7 eV pass through helium gas without being significantly scattered.

The opposite extreme to this case occurs when the parameters are such that $k'a$ is approximately equal to an odd multiple of $\pi/2$ so that $\cot k'a$ is small. We can then ignore the second term in the denominator of (12.78) and write the cross section as

$$\sigma = \frac{4\pi}{k^2 + k'^2 \cot^2 k'a} \tag{12.80}$$

and we note that, as $ka$ and $\cot k'a$ are both much less than one, the cross section (12.80) is much greater than the geometrical cross section $\pi a^2$. This case is known as
resonant scattering. We can understand the condition for resonant scattering a little better by making some approximations. Because $ka$ is small and $k'a \simeq \pi/2$, k 0. Thus the energies of the states where the spins are parallel are different from those where the spins are antiparallel and the system behaves as though the spins were strongly coupled. It is important to remember that this coupling results from the requirement that the wave function be antisymmetric and is quite independent of the interaction between the magnetic dipoles associated with the spins, which is very much weaker.

It turns out that the exchange integral $X$ is indeed positive so that the triplet states have lower energy than the corresponding singlets. However, the ground state of helium is a singlet, because in this case both electrons occupy the same (1s) orbital and no corresponding triplet state exists. Another type of physical system that exhibits a similar coupling of spins is a ferromagnet such as iron: in this case, the exchange interaction between electrons associated with neighbouring atoms also produces a positive $X$ and hence a triplet ground state. The magnetic moments associated with the atomic spins therefore all point in the same direction, leading to a large magnetic moment overall. C=
ZZ
u∗1 (r1 )u∗2 (r2 )
Returning to the helium atom, the coupling of the spins via the exchange energy means that the spin-orbit interaction is between the total spin and the total orbital angular momentum, whose magnitude is determined by the quantum numbers of the single-particle functions $u_1$ and $u_2$. Given these quantum numbers, the splitting due to spin-orbit coupling can be calculated by the methods discussed in Chapter 9, as can the response of the system to weak and strong magnetic fields. Helium is therefore an example of Russell-Saunders coupling, which was discussed briefly at the end of Chapter 9 where we mentioned that it applied most usefully to atoms of low atomic number.

The fact that the spatial part of a singlet wave function is symmetric, while that of a triplet is antisymmetric, means that electric dipole transitions between any members of these two sets of states are forbidden. This is because the electric dipole operator (cf. Chapter 11) is a function of the electron positions only and, like all physical operators, is symmetric with respect to particle interchange so that the matrix elements connecting the singlet and triplet states are of the form

$$\iint \psi_s^*(1, 2)\hat{Q}_s(1, 2)\psi_a(1, 2)\, d\tau_1 d\tau_2 = -\iint \psi_s^*(2, 1)\hat{Q}_s(2, 1)\psi_a(2, 1)\, d\tau_1 d\tau_2 \tag{13.37}$$

where the subscripts $s$ and $a$ signify symmetry and antisymmetry respectively and $\hat{Q}_s$ is any symmetric operator. But the labels on the variables of integration have no significance, so the integrals on each side of (13.37) are identical, which can be true only if they both vanish. The same is true for any perturbation that is a function of either the electron positions only or the spins only. It follows that singlet to triplet transitions, and vice versa, occur only as a result of collisions between atoms, where the property conserved is the total antisymmetry of the wave function representing all the electrons in both atoms, rather than that of each atom separately.
Because transitions between singlet and triplet states occur so rarely, they were not observed at all in the early days of spectroscopy and, at one time, helium was thought to be a mixture of two gases: “parahelium” with the singlet spectrum and “orthohelium” with that of the triplet. These properties also underlie the principles of operation of the helium-neon laser. When an electric discharge is passed through helium gas, it causes many of the atoms to be ionized. When they subsequently recombine, there is an appreciable probability of some of the atoms being in one of the excited triplet states. These then decay, emitting photons, until they reach the lowest energy triplet state where they remain, because a further transition to the singlet ground state is forbidden. It turns out that the excitation energy of the lowest triplet state of helium is close to that of one of the excited states of neon so that in a mixture of the two gases there is an appreciable probability of neon atoms being excited by collisions with appropriate helium atoms. If the partial pressures of the gases in such a mixture are right, an inverted population can be generated in which more neon atoms are in this excited state than are in another state, which has lower energy. Radiation whose frequency matches the energy difference between the two neon states can therefore be amplified
FIGURE 13.3 The energies of the singlet and triplet states of the helium atom compared with the energy levels of the hydrogen atom. The number beside each energy level is the principal quantum number n. Each triplet state is actually three closely spaced levels split by the spin-orbit interaction. Zero energy corresponds to a singly ionized He atom.
by stimulated emission and laser action ensues. This can be maintained continuously by passing a suitable discharge through the mixture, thereby replenishing the number of triplet helium atoms.

We shall complete our discussion of the helium atom by considering some of the more quantitative features of the energy levels. Figure 13.3 shows an energy-level diagram in which the experimentally measured energies of the states are referred to an origin that corresponds to the energy required to just remove one electron from the atom, leaving the other in the ground state of the remaining He+ ion. The energy levels of the hydrogen atom derived in Chapter 6 are also included for comparison. We see that, on this scale, the ground-state energy of helium is very much lower than that of hydrogen and that this is also true, though to a lesser extent, of the first and second excited states. However, states with high values of the principal quantum number n (i.e., greater than about five) have energies that are very little different from the corresponding hydrogen-atom values, being nearly independent of the orbital-angular-momentum quantum number l. This last point is consistent with the fact that helium, like hydrogen, exhibits a first-order Stark effect (cf. Section 10.2).

We can account for all these features using the theory developed earlier. The first point to note is that all states where both electrons are assigned to excited orbitals turn out to have such a high energy that they cannot exist as bound states. All the bound states of the helium atom therefore involve linear combinations of products in which
one of the one-electron functions is the lowest energy (1s) orbital. Accordingly, the quantity previously referred to as the principal quantum number n denotes the orbital other than the 1s that is involved in the description of the state.

We first consider highly excited states and note that the one-electron orbitals with high values of n have appreciable magnitude only at large distances from the nucleus, while the 1s eigenfunction is significant only at small values of r (cf. Figure 6.4). If we now examine the expressions for the Coulomb and exchange energies (13.36) we see that, for a highly excited state, the exchange energy is very small because $u_1$ and $u_2$ occupy almost entirely different regions of space. The singlet and triplet states can therefore be treated as degenerate, which means that the energy can be estimated by assigning one electron to each of the orbitals $u_1$ and $u_2$ in the same way as in the noninteracting case discussed earlier. The total energy of such a state would then be equal to the sum of the energies of the ground and appropriately excited states of the He+ ion, along with the Coulomb energy, which represents the mean electrostatic interaction between electrons assigned to these two orbitals. This result still has the limitations of first-order perturbation theory and a better approximation is obtained by assuming that the particles can be independently assigned to each single-particle orbital but that the form of each orbital is affected by the presence of the other electron. Thus the outer electron moves in a potential due to the nucleus and the charge distribution of the spherically symmetric inner orbital, which is equivalent to that of a single positive charge (+e) at the nucleus. The outer electron therefore has an energy close to that of the corresponding state in hydrogen, while the inner electron has an energy equivalent to that of the 1s orbital of a He+ ion, because the field inside the spherically symmetric charge distribution of the outer orbital is zero. The total energy of such an excited state is therefore close to the sum of the ground-state energy of a He+ ion and that of the appropriately excited state of the hydrogen atom, which is just the result obtained experimentally and described in the previous paragraph (cf. Figure 13.3).

Turning our attention to the ground state of the helium atom, in this case both electrons are associated with the 1s orbital, so the exchange term again vanishes (this time because $u_1 \equiv u_2$) and the effect of the Coulomb term is that each electron, to some extent, screens the other from the full potential of the doubly charged nucleus. Evaluation of the Coulomb integral for the ground state yields a value of 34.0 eV which, when combined with the unperturbed energy of −54.4 eV, produces a value of −20.4 eV for the total, in quite good agreement with the experimental result of −24.8 eV, given the limitations of first-order perturbation theory in this context.

Finally, we consider states, other than the ground state, that are not highly excited. We can now no longer assume that the exchange energy is negligible, and the singlet and triplet states are therefore expected to have different energies, as is observed experimentally. Moreover, the effective potential experienced by the electrons is now significantly different from the Coulomb form, so states whose orbitals have the same values of n, but different l, are no longer degenerate. We see in Figure 13.3 that both these effects are most pronounced for states constructed from an orbital with n = 2 along with the 1s orbital, and become progressively smaller as n increases until, for n greater than about five, the states can be considered as "highly excited" in the sense described above.
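The ground-state figures quoted in this section can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the standard closed form $(5/4)Z$ times the hydrogen binding energy for the 1s-1s Coulomb integral, which is not derived in the text, and it uses the rounded value 13.6 eV. All variable names are illustrative.

```python
RYDBERG_EV = 13.6   # hydrogen ground-state binding energy in eV (rounded)
Z = 2               # nuclear charge of helium

# Two noninteracting electrons in the hydrogen-like 1s orbital: 2 * (-Z^2 * 13.6 eV)
e_two_electrons = -2 * Z**2 * RYDBERG_EV

# Ground state of the He+ ion, which is the zero of energy in Figure 13.3
e_ion = -Z**2 * RYDBERG_EV

# Unperturbed energy on the scale of Figure 13.3
e_unperturbed = e_two_electrons - e_ion

# Assumed closed form for the 1s-1s Coulomb integral: (5/4) * Z * 13.6 eV
coulomb = 1.25 * Z * RYDBERG_EV

e_first_order = e_unperturbed + coulomb

print(f"unperturbed energy : {e_unperturbed:.1f} eV")
print(f"Coulomb integral   : {coulomb:.1f} eV")
print(f"first-order total  : {e_first_order:.1f} eV")
```

Under these assumptions the three printed numbers correspond to the unperturbed energy, the Coulomb integral and the first-order total discussed for the ground state.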
13.7
SCATTERING OF IDENTICAL PARTICLES
We close this chapter by considering the problem of scattering where the particles in the incident beam are identical to those constituting the scatterer; an example to which we shall return shortly is the scattering of alpha particles by the nuclei of $^4$He atoms. We note that in a scattering problem there are no external forces, so we can separate the relative motion from that of the centre of mass using the procedure discussed previously for isolated systems. Neglecting spin for the present, the energy eigenfunctions can be written as

$$\psi(\mathbf{r}_1, \mathbf{r}_2) = U(\mathbf{R})u(\mathbf{r}) \tag{13.38}$$

where $\mathbf{R}$ and $\mathbf{r}$ are as defined in (13.5) and we remember that $m_1 = m_2$, $\mathbf{R} = \frac{1}{2}(\mathbf{r}_1 + \mathbf{r}_2)$ and $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$. Clearly $U(\mathbf{R})$ is completely symmetric with respect to particle exchange, so the symmetry of the wave function is determined by that of $u(\mathbf{r})$. Moreover, particle interchange is equivalent to a reversal of the sign of $\mathbf{r}$ so exchange symmetry is equivalent to parity in this case.

In Chapter 12 (cf. Equations (12.62) to (12.65)) we showed that the energy eigenfunction in a scattering problem could be written in the form

$$V^{-1/2}\left[\exp(ikz) + \frac{1}{kr}\exp(ikr) f(\theta)\right] \tag{13.39}$$

where

$$f(\theta) = \sum_{l=0}^{\infty} (2l + 1)\exp(i\delta_l)\sin\delta_l\, P_l(\cos\theta) \tag{13.40}$$

The expression (13.39) does not have a definite parity so the correct eigenfunction in the case of identical particles must be equal to a linear combination of this expression with a similar one in which the sign of $\mathbf{r}$ is reversed. That is

$$u(\mathbf{r}) = (2V)^{-1/2}\left\{[\exp(ikz) \pm \exp(-ikz)] + \frac{1}{kr}\exp(ikr)[f(\theta) \pm f(\pi - \theta)]\right\} \tag{13.41}$$

where the positive and negative signs apply when $u(\mathbf{r})$ is symmetric and antisymmetric, respectively. In the absence of scattering, the wave function is proportional to the first term in square brackets in (13.41), so it follows by an argument similar to that leading to Equations (12.66) and (12.67), that the differential cross section $\sigma(\theta)$ is given by

$$\sigma(\theta) = |f(\theta) \pm f(\pi - \theta)|^2 \tag{13.42}$$

We know from Chapter 6 that

$$P_l(\cos\theta) = (-1)^l P_l(\cos(\pi - \theta))$$
so it follows from (13.41) and (13.42) that the cross section is made up from partial waves with only even values of l in the symmetric case and with only odd values in the antisymmetric case. It also follows from (13.42) that the contributions to the total cross section from those partial waves that have nonzero amplitude are four times what they would be if the particles were distinguishable. One factor of two arises because a recoiling target particle cannot be distinguished from one scattered out of the incident beam, but the remaining factor of two has no such simple explanation. All these results have been confirmed experimentally by scattering experiments involving spin-zero particles: the scattering of α particles by $^4$He atoms, for example, departs from the predictions of the Rutherford scattering formula (12.42) in a manner that can be quantitatively accounted for on this basis.

When the particles have nonzero spin, the overall symmetry of the wave function is determined by that of both the spatial and the spin-dependent parts. We shall not consider the general case here, but confine our discussion to spin-half fermions where the total wave function must be antisymmetric. It follows that all partial waves contribute to the scattering, but for those with even l the spin part must be antisymmetric, leading to a singlet state with zero total spin, while for those with odd l the spin function must be a spin-one triplet state. Thus if, for example, the energy of the incident particle were such that the scattering is s-wave in character, we can conclude that scattering will occur only when the particle spins correspond to a singlet state. As, on average, only one-quarter of the particle pairs is in this state, the total cross section is reduced by a factor of four to be the same as if the particles were distinguishable.
After such a scattering event, therefore, the incident and target particles move off in a state where both the total orbital angular momentum and the total spin are zero. An example of the application of this principle is the scattering of protons by hydrogen nuclei at an energy large enough to ensure that the scattering is predominantly due to the nuclear force rather than the Coulomb interaction, but small enough to ensure that the contribution from partial waves with nonzero values of l is negligible. As will be discussed in Chapter 16, measurements of the properties of such pairs of protons can be used to compare the predictions of quantum mechanics with those of “local hidden variable” theories.
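The statement that the symmetric and antisymmetric combinations in (13.42) contain only even-l and only odd-l partial waves can be checked numerically. The sketch below assumes a hypothetical set of four phase shifts chosen purely for illustration and evaluates the Legendre series (13.40) with NumPy.

```python
import numpy as np
from numpy.polynomial.legendre import legval

# Hypothetical phase shifts delta_l for l = 0, 1, 2, 3 (illustrative numbers only)
deltas = np.array([0.9, 0.5, 0.2, 0.05])
l = np.arange(len(deltas))

# Coefficients of P_l(cos theta) in the series (13.40)
coeffs = (2 * l + 1) * np.exp(1j * deltas) * np.sin(deltas)

def f(theta, c=coeffs):
    """Scattering amplitude as the Legendre series (13.40), truncated at l = 3."""
    return legval(np.cos(theta), c)

theta = np.linspace(0.1, np.pi - 0.1, 50)
f_sym = f(theta) + f(np.pi - theta)    # symmetric combination in (13.42)
f_anti = f(theta) - f(np.pi - theta)   # antisymmetric combination in (13.42)

# Series containing only the even-l or only the odd-l terms
even_only = np.where(l % 2 == 0, coeffs, 0)
odd_only = np.where(l % 2 == 1, coeffs, 0)

print(np.allclose(f_sym, 2 * f(theta, even_only)))
print(np.allclose(f_anti, 2 * f(theta, odd_only)))
```

Each surviving partial wave appears with twice its original amplitude, which is the origin of the factor of four in the corresponding contributions to the cross section.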
13.8
PROBLEMS
13.1 Show that for a many-particle system subject to no external forces, the total energy and total momentum can always be measured compatibly, but that, if the particles interact, the individual momenta cannot be measured compatibly with the total energy.

13.2 Show that if $\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$ is the wave function representing a system of N indistinguishable particles, then the probability of finding any one of these in the element $d\tau$ around the point $\mathbf{r}$ is given by $P_1(\mathbf{r})\,d\tau$ where

$$P_1(\mathbf{r}) = \int \cdots \int |\Psi(\mathbf{r}, \mathbf{r}_2, \ldots, \mathbf{r}_N)|^2\, d\tau_2 \ldots d\tau_N$$

Show that in the case of two identical noninteracting particles (either bosons or fermions)

$$P_1(\mathbf{r}) = \frac{1}{2}\left(|u_1(\mathbf{r})|^2 + |u_2(\mathbf{r})|^2\right)$$

where $u_1$ and $u_2$ are appropriate single-particle eigenfunctions.
13.3 Two indistinguishable noninteracting spin-half particles move in an infinite-sided one-dimensional potential well. Obtain expressions for the energy eigenvalues and eigenfunctions of the ground and first excited states and use these to calculate $P_1(x)$ (as defined in Problem 13.2) in each case. Show that in those of these states where the total spin is zero, there is generally a finite probability of finding the two particles at the same point in space, but that this probability is zero if the total spin is one.

13.4 Show that if the potential described in Problem 13.3 has width $Na$ and contains a large number, $N$, of noninteracting fermions, the total ground-state energy is approximately equal to $(\pi^2\hbar^2/24ma^2)N$.

13.5 Using the one-dimensional model described in Problem 13.4, estimate the average energy per particle for (i) a free-electron gas and (ii) a gas of $^3$He atoms, assuming a mean linear density of $4 \times 10^9$ particles per metre in each case. Compare your answer to (ii) with a similar estimate of the ground-state energy per atom of $^4$He gas. Above about what temperature might you expect these gases to have similar properties?

13.6 A bound system consisting of two neutrons is almost, but not quite, stable. Estimate the energy of the virtual energy level of this system given that the low-energy s-wave scattering cross section of neutrons by neutrons is about $60 \times 10^{-28}$ m$^2$. Hint: cf. Chapter 12 Problem 14.
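A quick numerical sanity check of the large-N result quoted in Problem 13.4 is sketched below. It is not a substitute for the analytic argument the problem asks for. It assumes spin-half fermions filling the levels of a well of width $Na$ two at a time, takes N to be even, and measures energies in units of $\pi^2\hbar^2/2ma^2$; the function name is illustrative.

```python
def ground_state_energy(n_particles):
    """Total ground-state energy of N spin-half fermions in a well of width N*a,
    in units of pi^2 hbar^2 / (2 m a^2). N is assumed to be even."""
    n_levels = n_particles // 2   # two particles (spin up and spin down) per level
    # Level n of a well of width N*a has energy n^2 / N^2 in these units
    return 2 * sum(n**2 for n in range(1, n_levels + 1)) / n_particles**2

for N in (10, 100, 1000):
    exact = ground_state_energy(N)
    approx = N / 12               # (pi^2 hbar^2 / 24 m a^2) N in the same units
    print(N, exact, approx, exact / approx)
```

The ratio in the last column approaches one as N grows, consistent with the approximate expression being a large-N limit.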
V Advanced Topics
CHAPTER
14
Relativity and Quantum Mechanics

CONTENTS
14.1 Basic Results in Special Relativity
14.2 The Dirac Equation
     A free particle
     A particle in an electromagnetic field
     A particle in a magnetic field
     A particle subject to a scalar potential
14.3 Antiparticles
14.4 Other Wave Equations
14.5 Quantum Field Theory and the Spin-Statistics Theorem
14.6 Problems
The early twentieth century saw two major revolutions in the way physicists understand the world. One was quantum mechanics itself and the other was the theory of relativity. Important results also emerged when these two ideas were brought together. Some of these have been referred to in earlier chapters: in particular, the fact that fundamental particles such as electrons have intrinsic angular momentum (spin) was stated to be a relativistic effect. This chapter explores the relationship between relativity and quantum mechanics more deeply. A full understanding of relativistic quantum mechanics is well outside the scope of this book, but many of the important results can be understood at this level and these will be discussed in this chapter.

After a short summary of the main results of special relativity, we show how combining this with the time-dependent Schrödinger equation leads to a new wave equation known as the Dirac equation. We show how the Dirac equation requires particles such as electrons to have intrinsic angular momentum (spin) and we explore some of its other consequences. The chapter concludes with an outline of some more advanced ideas known as quantum field theory.

Our treatment is confined to the quantum effects associated with special relativity. The reconciliation of quantum mechanics with general relativity is still a topic of active research and few, if any, generally accepted results have emerged from it so far.
14.1
BASIC RESULTS IN SPECIAL RELATIVITY
Special relativity was introduced in Chapter 3 where it was shown that this modifies classical (i.e., non-quantum) kinematics and dynamics to encompass phenomena that become increasingly significant when particles move at speeds comparable to the speed of light. All the results we need for this chapter have been derived in Chapter 3 and we summarize these here.

The kinematics of relativity are governed by the Lorentz transformation (see (3.16) and Problem 1 in Chapter 3), which relates the position and time coordinates $(x, y, z, t)$ of an event observed in one inertial frame of reference to those $(x', y', z', t')$ observed from another moving at constant velocity $v$ in the $x$ direction relative to the first. We have

$$x' = \frac{x - vt}{\sqrt{1 - v^2/c^2}} \qquad y' = y \qquad z' = z \qquad t' = \frac{t - (v/c^2)x}{\sqrt{1 - v^2/c^2}} \tag{14.1}$$
where $c$ is the speed of light. The momentum $\mathbf{p}$ and energy $E$ of a particle also transform by a Lorentz transformation, where $x$, $y$, $z$ and $ct$ in (14.1) are replaced by $p_x$, $p_y$, $p_z$ and $E/c$, respectively, the energy including the "rest-mass" energy $mc^2$. In the case of a free particle, the energy and momentum are related by

$$E^2 = p^2c^2 + m^2c^4 \tag{14.2}$$

Equation (14.2) is an example of a Lorentz invariant, i.e., it has the same form in all inertial frames of reference, as can be verified by applying the Lorentz transformation to the components of $\mathbf{p}$ and $E$ and substituting into (14.2). Also, if we define $\epsilon$ by $E = mc^2 + \epsilon$, the nonrelativistic limit is when $\epsilon \ll mc^2$, in which case (14.2) reduces to $\epsilon = p^2/2m$, the nonrelativistic Newtonian expression for kinetic energy.

We now consider the case where the particle is not free, but subject to an electromagnetic field. This field can, in turn, be represented by a scalar potential, $\phi(\mathbf{r})$, and a vector potential, $\mathbf{A}(\mathbf{r})$, where the electric field is $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$ and the magnetic field is $\mathbf{B} = \nabla \times \mathbf{A}$. The components of $\mathbf{A}$ along with $\phi$ then form a four-vector similar to that formed by $\mathbf{p}$ and $E$ (see Chapters 2 and 3). It can then be shown that the behaviour of a particle with charge $q$ is determined by

$$(E - q\phi)^2 = (\mathbf{p} - q\mathbf{A})^2c^2 + m^2c^4 \tag{14.3}$$
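The Lorentz invariance of (14.2) and its nonrelativistic limit can both be checked numerically. The sketch below assumes units with $c = 1$ and arbitrary illustrative values for the mass, momentum and boost velocity; the function `boost` is a hypothetical helper that applies (14.1) with $x$ and $ct$ replaced by $p_x$ and $E/c$.

```python
import numpy as np

c = 1.0  # units with c = 1 (an illustrative choice)

def boost(E, px, v):
    """Transform (E, p_x) to a frame moving with velocity v along x,
    using (14.1) with x -> p_x and ct -> E/c."""
    gamma = 1.0 / np.sqrt(1.0 - v**2 / c**2)
    px_new = gamma * (px - v * E / c**2)
    E_new = gamma * (E - v * px)
    return E_new, px_new

# Illustrative free particle
m = 1.0
p = np.array([0.3, 0.4, 1.2])
E = np.sqrt(np.dot(p, p) * c**2 + m**2 * c**4)        # equation (14.2)

E_b, px_b = boost(E, p[0], v=0.6)
p_b = np.array([px_b, p[1], p[2]])                     # p_y and p_z are unchanged

invariant_before = E**2 - np.dot(p, p) * c**2
invariant_after = E_b**2 - np.dot(p_b, p_b) * c**2
print(invariant_before, invariant_after, m**2 * c**4)

# Nonrelativistic limit: epsilon = E - mc^2 approaches p^2 / 2m for small p
p_small = 1e-3
eps = np.sqrt(p_small**2 * c**2 + m**2 * c**4) - m * c**2
print(eps, p_small**2 / (2 * m))
```

Both values of $E^2 - p^2c^2$ should equal $m^2c^4$, and the last two printed numbers should agree closely.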
14.2
THE DIRAC EQUATION
To develop relativistic quantum theory, we look for a wave equation that relates to (14.3) in a manner analogous to the relation between the Newtonian expression for kinetic energy and the Schrödinger equation (cf. Chapters 5 and 6). This was first done successfully for a particle such as an electron by P. A. M. Dirac in 1928 and our treatment basically follows his approach. In the same way as the Schrödinger equation cannot be derived from classical mechanics because it is essentially new physics, any relativistic equation can only be guessed at by a process of induction, and its truth or otherwise must be established by testing its consequences against experiment.

Following Dirac, we start this process by considering the time-dependent Schrödinger equation

$$i\hbar\frac{\partial}{\partial t}\psi = \hat{H}\psi \tag{14.4}$$

and begin, as we did in the nonrelativistic case, by considering the case of a free particle.
A free particle
We consider a particle, such as an electron, with mass $m$ (dropping the subscript e to keep the notation as simple as possible) in a field-free region of space, i.e., where we can assume that $\mathbf{A} = \phi = 0$. Following the principles of Postulate 3 in Chapter 7, we can assume that the energy operator $\hat{H}$ can be expressed in terms of the momentum operator $\hat{\mathbf{P}}$ in the same way as $E$ is related to $\mathbf{p}$ in the classical limit. Hence, using (14.2) and (14.4)

$$i\hbar\frac{\partial\psi}{\partial t} = \sqrt{\hat{P}^2c^2 + m^2c^4}\;\psi \tag{14.5}$$

Before we can proceed, we have to know how to deal with an expression such as the right-hand side of (14.5), which has the form of an operator that is a square root of another operator. There is no definite prescription for handling this, but we do know that, in order to preserve Lorentz invariance, positional coordinates and time must appear in a similar way in any relativistic theory. So, because the left-hand side of (14.5) is linear in $\partial/\partial t$, if we are to make the standard replacement $\hat{P}_x = -i\hbar\,\partial/\partial x$, etc., the right-hand side should be linear in these quantities. Applying this principle and following Dirac we write

$$i\hbar\frac{\partial}{\partial t}\psi = [c\alpha_1\hat{P}_x + c\alpha_2\hat{P}_y + c\alpha_3\hat{P}_z + \beta mc^2]\psi \tag{14.6}$$

where the $\alpha_i$ and $\beta$ are dimensionless quantities that are independent of position and time. For (14.5) and (14.6) to be consistent, the squares of the operators on the right-hand sides of these equations should be equivalent. That is

$$[c\alpha_1\hat{P}_x + c\alpha_2\hat{P}_y + c\alpha_3\hat{P}_z + \beta mc^2]^2 = c^2\hat{P}^2 + m^2c^4 \tag{14.7}$$

Multiplying out the left-hand side of (14.7) and equating the corresponding terms leads to

$$\alpha_1^2 = \alpha_2^2 = \alpha_3^2 = \beta^2 = 1$$
$$\alpha_1\alpha_2 + \alpha_2\alpha_1 = \alpha_2\alpha_3 + \alpha_3\alpha_2 = \alpha_3\alpha_1 + \alpha_1\alpha_3 = 0 \tag{14.8}$$
$$\alpha_1\beta + \beta\alpha_1 = \alpha_2\beta + \beta\alpha_2 = \alpha_3\beta + \beta\alpha_3 = 0$$
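The step from (14.7) to (14.8) can be made explicit by expanding the square while keeping the order of the $\alpha_i$ and $\beta$, which are not assumed to commute with one another. The expansion below is a sketch that assumes only that the momentum components commute with each other and with the $\alpha_i$ and $\beta$:

```latex
\begin{aligned}
\bigl[c\alpha_1\hat P_x + c\alpha_2\hat P_y + c\alpha_3\hat P_z + \beta mc^2\bigr]^2
 ={}& c^2\bigl(\alpha_1^2\hat P_x^2 + \alpha_2^2\hat P_y^2 + \alpha_3^2\hat P_z^2\bigr)
      + \beta^2 m^2c^4 \\
 &+ c^2\bigl[(\alpha_1\alpha_2+\alpha_2\alpha_1)\hat P_x\hat P_y
      + (\alpha_2\alpha_3+\alpha_3\alpha_2)\hat P_y\hat P_z
      + (\alpha_3\alpha_1+\alpha_1\alpha_3)\hat P_z\hat P_x\bigr] \\
 &+ mc^3\bigl[(\alpha_1\beta+\beta\alpha_1)\hat P_x
      + (\alpha_2\beta+\beta\alpha_2)\hat P_y
      + (\alpha_3\beta+\beta\alpha_3)\hat P_z\bigr]
\end{aligned}
```

Comparison with $c^2\hat{P}^2 + m^2c^4$ requires the coefficients in the first line to be unity (including $\beta^2 = 1$) and every anticommutator in the second and third lines to vanish.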
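This is obviously not possible if $\alpha_i$ and $\beta$ are scalar numbers, and Dirac showed that the simplest expressions for $\alpha_i$ and $\beta$ consistent with (14.8) are a set of 4 × 4 matrices

$$\alpha_1 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \qquad
\alpha_2 = \begin{pmatrix} 0 & 0 & 0 & -i \\ 0 & 0 & i & 0 \\ 0 & -i & 0 & 0 \\ i & 0 & 0 & 0 \end{pmatrix}$$

$$\alpha_3 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix} \qquad
\beta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \tag{14.9}$$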
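The check suggested in the text can also be carried out numerically. The following sketch enters the four matrices of (14.9) as NumPy arrays and tests the conditions (14.8), together with the requirement that each matrix squares to the unit matrix; the helper name `anticommutator` is illustrative.

```python
import numpy as np

alpha1 = np.array([[0, 0, 0, 1],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 0]], dtype=complex)
alpha2 = np.array([[0, 0, 0, -1j],
                   [0, 0, 1j, 0],
                   [0, -1j, 0, 0],
                   [1j, 0, 0, 0]])
alpha3 = np.array([[0, 0, 1, 0],
                   [0, 0, 0, -1],
                   [1, 0, 0, 0],
                   [0, -1, 0, 0]], dtype=complex)
beta = np.diag([1, 1, -1, -1]).astype(complex)

def anticommutator(a, b):
    """Return ab + ba."""
    return a @ b + b @ a

mats = [alpha1, alpha2, alpha3, beta]

# Each matrix squares to the 4 x 4 unit matrix
squares_ok = all(np.allclose(m @ m, np.eye(4)) for m in mats)

# Every distinct pair anticommutes
anticommute_ok = all(np.allclose(anticommutator(mats[i], mats[j]), 0)
                     for i in range(4) for j in range(i + 1, 4))

print(squares_ok, anticommute_ok)
```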
The reader should check that these expressions have the properties set out in (14.8). Equation (14.6), complemented by the definitions (14.9), constitutes the Dirac equation for a free particle. We analyze it further to draw out its physical significance and to explore its solutions.

One remarkable feature of the expressions (14.9) is that the three 4 × 4 matrices representing $\alpha_i$ are all made up from the 2 × 2 Pauli spin matrices introduced in Chapter 9 and defined in (9.15) as

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{14.10}$$

while $\beta$ can be expressed in terms of the 2 × 2 unit matrix (represented by $I$). Thus

$$\alpha_j = \begin{pmatrix} 0 & \sigma_j \\ \sigma_j & 0 \end{pmatrix} \quad j = 1, 2, 3 \qquad \text{and} \qquad \beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \tag{14.11}$$

where $\sigma_1 \equiv \sigma_x$ etc. The following properties of the Pauli spin matrices, which can be easily proved by direct substitution, will be used shortly

$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = I$$
$$\sigma_x\sigma_y = -\sigma_y\sigma_x = i\sigma_z \tag{14.12}$$
$$\sigma_x\sigma_y = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \qquad \sigma_y\sigma_x = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}$$

Similar results hold for cyclic permutations of the Cartesian coordinates. The fact that the Dirac equation is a matrix equation implies that $\psi$ is a vector formed out of four functions of position and time

$$\psi = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix} \equiv \begin{pmatrix} \psi_+ \\ \psi_- \end{pmatrix} \tag{14.13}$$

where $\psi_+$ and $\psi_-$ are two-component vectors defined by (14.13).
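The Pauli-matrix properties (14.12) and the block construction (14.11) lend themselves to a short numerical check. The sketch below also verifies a direct consequence of (14.12), namely that $(\boldsymbol{\sigma}\cdot\mathbf{p})^2 = p^2 I$ for an ordinary numerical vector $\mathbf{p}$; the components chosen for $\mathbf{p}$ are arbitrary illustrative values.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
zero = np.zeros((2, 2), dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# sigma_a sigma_b = -sigma_b sigma_a = i sigma_c for cyclic (a, b, c), as in (14.12)
cyclic_ok = all(np.allclose(a @ b, 1j * c) and np.allclose(b @ a, -1j * c)
                for a, b, c in [(sx, sy, sz), (sy, sz, sx), (sz, sx, sy)])

# Block construction of alpha_j and beta, as in (14.11)
alphas = [np.block([[zero, s], [s, zero]]) for s in (sx, sy, sz)]
beta = np.block([[I2, zero], [zero, -I2]])

# (sigma . p)^2 = |p|^2 I for an arbitrary numerical vector p
p = np.array([0.3, -1.2, 0.7])
sigma_dot_p = p[0] * sx + p[1] * sy + p[2] * sz
square_ok = np.allclose(sigma_dot_p @ sigma_dot_p, np.dot(p, p) * I2)

print(cyclic_ok, square_ok)
print(alphas[0].real.astype(int))
```

The last line prints the first block matrix in integer form so that it can be compared by eye with $\alpha_1$ in (14.9).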
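We now use the above to rewrite the Dirac equation (14.6) as

$$(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})c\,\psi_- + mc^2\psi_+ = i\hbar\frac{\partial\psi_+}{\partial t}$$
$$(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})c\,\psi_+ - mc^2\psi_- = i\hbar\frac{\partial\psi_-}{\partial t} \tag{14.14}$$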
When we introduced the nonrelativistic Schrödinger equation in Chapters 5 and 6, we separated out the time dependence and derived the time-independent equation by putting

$$\psi = u \exp(-iEt/\hbar)$$

Following the same procedure here, we get

$$(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})c\,u_- + mc^2u_+ = Eu_+$$
$$(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})c\,u_+ - mc^2u_- = Eu_- \tag{14.15}$$

We note again that all the terms in (14.15) are 2 × 2 matrices or 2 × 1 vectors, so that each of these equations actually represents two equations.

We now look for solutions of (14.15) that will correspond to the time-independent part of the wave function of a relativistic free particle. We first use the second of (14.15) to express $u_-$ in terms of $u_+$

$$u_- = \frac{c}{mc^2 + E}(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})u_+ \tag{14.16}$$
We substitute this into the first of (14.15) to get

$$(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})^2c^2u_+ = (E - mc^2)(E + mc^2)u_+ \tag{14.17}$$

Expanding the first term on the left-hand side in Cartesian coordinates and rearranging, we get

$$[\sigma_x^2\hat{P}_x^2 + (\sigma_x\sigma_y + \sigma_y\sigma_x)\hat{P}_x\hat{P}_y + \circ\,]c^2u_+ = (E^2 - m^2c^4)u_+ \tag{14.18}$$

where the symbol $+\,\circ$ implies repeating the previous expressions twice, with cyclic permutation of the Cartesian coordinates. Using (14.12), we can rewrite (14.18) as

$$\hat{P}^2c^2u_+ = (E^2 - m^2c^4)u_+ \tag{14.19}$$

or

$$-\hbar^2c^2\nabla^2u_+ = (E^2 - m^2c^4)u_+ \tag{14.20}$$

where we have used the differential operator representation of momentum. This equation has plane wave solutions for

$$u_+ = v_+ \exp(i\mathbf{k} \cdot \mathbf{r}) \qquad \text{where} \qquad v_+ = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \tag{14.21}$$
and $v_1$ and $v_2$ are constants, so that $v_+$ is an eigenvector of spin in a direction determined by the relative values of $v_1$ and $v_2$. We also have

$$E^2 = \hbar^2c^2k^2 + m^2c^4 \tag{14.22}$$

This is identical to (14.2) provided $\mathbf{p} = \hbar\mathbf{k}$, which is just the de Broglie relation relating momentum and wave vector introduced in Chapter 4.

The components of $u_-$ can now be obtained by substituting (14.21) into (14.16). For convenience, we choose the direction of the $z$ axis to be parallel to $\mathbf{k}$ so that

$$(\boldsymbol{\sigma} \cdot \hat{\mathbf{P}})u_+ = \sigma_z\left(-i\hbar\frac{\partial}{\partial z}\right)v_+ \exp(ikz) = \hbar k\,v_- \exp(ikz) \tag{14.23}$$

where, using (14.10)

$$v_- = \sigma_z\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 \\ -v_2 \end{pmatrix} \tag{14.24}$$

and therefore

$$u_- = \frac{\hbar kc}{mc^2 + E}\begin{pmatrix} v_1 \\ -v_2 \end{pmatrix}\exp(ikz) \tag{14.25}$$

Collecting results from (14.13), (14.21) and (14.25) we have

$$\psi = \begin{pmatrix} v_1 \\ v_2 \\ \dfrac{\hbar kc}{mc^2 + E}v_1 \\ -\dfrac{\hbar kc}{mc^2 + E}v_2 \end{pmatrix}\exp[i(kz - Et/\hbar)] \tag{14.26}$$
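As a consistency check, the four-component vector in (14.26) can be tested numerically against the free-particle Dirac Hamiltonian of (14.6). The sketch below assumes natural units ($\hbar = c = 1$), arbitrary illustrative values for $m$, $k$, $v_1$ and $v_2$, and takes the positive root of (14.22) for $E$. Acting on $\exp(ikz)$, the momentum operator $\hat{P}_z$ is replaced by $\hbar k$ and $\hat{P}_x$, $\hat{P}_y$ by zero.

```python
import numpy as np

hbar = c = 1.0        # natural units (an illustrative choice)
m, k = 1.0, 0.75      # hypothetical mass and wave number, with k along z
E = np.sqrt((hbar * c * k)**2 + (m * c**2)**2)   # positive root of (14.22)

sz = np.diag([1.0, -1.0])
zero = np.zeros((2, 2))
alpha3 = np.block([[zero, sz], [sz, zero]])       # (14.11) with j = 3
beta = np.diag([1.0, 1.0, -1.0, -1.0])

# Free-particle Hamiltonian of (14.6) acting on a plane wave exp(ikz)
H = c * hbar * k * alpha3 + beta * m * c**2

v1, v2 = 0.6, 0.8j    # arbitrary spin amplitudes
ratio = hbar * k * c / (m * c**2 + E)
spinor = np.array([v1, v2, ratio * v1, -ratio * v2])   # column vector of (14.26)

residual = H @ spinor - E * spinor
print(np.max(np.abs(residual)))
```

A residual at the level of rounding error indicates that the vector is an eigenvector of the Hamiltonian with eigenvalue $E$ for any choice of $v_1$ and $v_2$.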
We first note that the wave function has the form of a plane wave, which is an eigenfunction of the momentum operator, $-i\hbar\nabla$, as well as the energy operator; this result is the same as found in the nonrelativistic case discussed in Chapter 7. We now consider how (14.26) relates to the idea of spin developed in Chapter 9. The wave function, $\psi$, is not in general an eigenvector of spin, because $v_+$ and $v_-$ may be eigenvectors of different spin components. However, there are two particular cases where $\psi$ does represent a spin eigenstate. The first of these is the nonrelativistic limit, where $\hbar kc$