Theoretical Materials Science
Prof. Matthias Scheffler and Dr. Christian Carbogno Theory Department Fritz Haber Institute of the Max Planck Society Faradayweg 4-6 14195 Berlin, Germany
Version: August 31, 2017
Table of Contents

0 Foreword
  0.1 Introductory Remarks
  0.2 Literature for This Lecture
  0.3 Symbols and Terms Used in This Lecture
  0.4 Atomic Units

1 Introduction
  1.1 Many-Body Hamilton Operator
  1.2 Separation of Dynamics of Electrons and Nuclei
    1.2.1 Adiabatic Approximation or Born-Oppenheimer Approximation
    1.2.2 Static Approximation
    1.2.3 Examples
      1.2.3.1 Structure, Lattice Constant, and Elastic Properties of Perfect Crystals
      1.2.3.2 Lattice Waves (Phonons)
  1.3 The Ewald Method

2 Some Definitions and Reminders, incl. Fermi Statistics of Electrons
  2.1 Statistical Mechanics
  2.2 Fermi Statistics of the Electrons
  2.3 Some Definitions

3 Electron-Electron Interaction
  3.1 Electron-Electron Interaction
  3.2 Hartree Approximation
  3.3 Hartree-Fock Approximation
  3.4 Exchange Interaction
  3.5 Koopmans' Theorem
  3.6 The Xα Method
  3.7 Thomas-Fermi Theory and the Concept of Screening
  3.8 Density-Functional Theory
    3.8.1 Meaning of the Kohn-Sham Single-Particle Energies ǫi
    3.8.2 Spin Polarization
    3.8.3 Two Examples
  3.9 Summary (Electron-Electron Interaction)

4 Lattice Periodicity
  4.1 Lattice Periodicity
  4.2 The Bloch Theorem
  4.3 The Reciprocal Lattice

5 The Band Structure of the Electrons
  5.1 Introduction
    5.1.1 What Can We Learn from a Band Structure?
  5.2 General Properties of ǫn(k)
    5.2.1 Continuity of ǫn(k) and Meaning of the First and Second Derivatives of ǫn(k)
    5.2.2 Time Reversal Symmetry
    5.2.3 The Fermi Surface
  5.3 The LCAO (linear combination of atomic orbitals) Method
    5.3.1 Band Structure and Analysis of the Contributions to Chemical Bonding
  5.4 The Density of States, N(ǫ)
  5.5 Other Methods for Solving the Kohn-Sham Equations of Periodic Crystals
    5.5.1 The Pseudopotential Method
    5.5.2 APW and LAPW
    5.5.3 KKR, LMTO, and ASW
  5.6 Many-Body Perturbation Theory (beyond DFT)

6 Cohesion (Bonding) in Solids
  6.1 Introduction
  6.2 Van der Waals Bonding
  6.3 Ionic bonding
  6.4 Covalent bonding
    6.4.1 Hybridization
  6.5 Metallic bonding
  6.6 Hydrogen bonding
    6.6.1 Some Properties of Hydrogen bonds
    6.6.2 Some Physics of Hydrogen bonds
  6.7 Summary

7 Lattice Vibrations
  7.1 Introduction
  7.2 Vibrations of a Classical Lattice
    7.2.1 Adiabatic (Born-Oppenheimer) Approximation
    7.2.2 Harmonic Approximation
    7.2.3 Classical Equations of Motion
    7.2.4 Comparison between ǫn(k) and ωi(q)
    7.2.5 Simple One-dimensional Examples
      7.2.5.1 Linear Chain with One Atomic Species
      7.2.5.2 Linear Chain with Two Atomic Species
    7.2.6 Phonon Band Structures
  7.3 Quantum Theory of the Harmonic Crystal
    7.3.1 One-dimensional Quantum Harmonic Oscillator
    7.3.2 Three-dimensional Quantum Harmonic Crystal
    7.3.3 Lattice Energy at Finite Temperatures
    7.3.4 Phonon Specific Heat
      7.3.4.1 High-Temperature Limit (Dulong-Petit Law)
      7.3.4.2 Intermediate Temperature Range (Einstein Approximation, 1907)
      7.3.4.3 Low Temperature Limit (Debye Approximation, 1912)
  7.4 Anharmonic Effects in Crystals
    7.4.1 Thermal Expansion
    7.4.2 Heat Transport

8 Magnetism
  8.1 Introduction
  8.2 Macroscopic Electrodynamics
  8.3 Magnetism of Atoms and Free Electrons
    8.3.1 Total Angular Momentum of Atoms
    8.3.2 General derivation of atomic susceptibilities
      8.3.2.1 ∆E_LL: Larmor/Langevin Diamagnetism
      8.3.2.2 ∆E_VV: Van Vleck Paramagnetism
      8.3.2.3 ∆E_Para: Paramagnetism
      8.3.2.4 Paramagnetic Susceptibility: Curie's Law
    8.3.3 Susceptibility of the Free Electron Gas
    8.3.4 Atomic magnetism in solids
  8.4 Magnetic Order: Permanent Magnets
    8.4.1 Ferro-, Antiferro- and Ferrimagnetism
    8.4.2 Interaction versus Thermal Disorder: Curie-Weiss Law
    8.4.3 Phenomenological Theories of Ferromagnetism
      8.4.3.1 Molecular (mean) field theory
      8.4.3.2 Heisenberg and Ising Hamiltonian
    8.4.4 Microscopic origin of magnetic interaction
      8.4.4.1 Exchange interaction between two localized moments
      8.4.4.2 From the Schrödinger equation to the Heisenberg Hamiltonian
      8.4.4.3 Exchange interaction in the homogeneous electron gas
    8.4.5 Band consideration of ferromagnetism
      8.4.5.1 Stoner model of itinerant ferromagnetism
  8.5 Magnetic domain structure

9 Transport Properties of Solids
  9.1 Introduction
  9.2 The Definition of Transport Coefficients
  9.3 Semiclassical Theory of Transport
  9.4 Boltzmann Transport Theory
  9.5 Superconductivity
  9.6 Meissner effect
  9.7 London theory
  9.8 Flux quantization
  9.9 Ogg's pairs
  9.10 Microscopic theory - BCS theory
    9.10.1 Cooper pairs
  9.11 Bardeen-Cooper-Schrieffer (BCS) Theory
  9.12 Outlook
0 Foreword

0.1 Introductory Remarks
A solid or, more generally, condensed matter is a complex quantum-mechanical many-body system consisting of ∼10²³ electrons and nuclei per cm³. The most important foundations of its theoretical description are electronic-structure theory and statistical mechanics. Due to the complexity of the quantum-mechanical many-body problem there is a large variety of phenomena and properties. Their description and understanding is the purpose of this lecture course on "condensed-matter theory". Keywords are, e.g., crystal structure, hardness, magnetism, electrical conductivity, thermal conductivity, superconductivity, etc.

The name of this lecture (Theoretical Materials Science) indicates that we intend to go a step further, i.e., "condensed matter" is replaced by "materials". This is a small but nevertheless important generalization. When we talk about materials, then, in addition to phenomena and properties, we also think of potential applications, i.e., the possible function of materials, as in electronic, magnetic, and optical devices (solar cells or light emitters), sensor technology, catalysts, lubrication, and surface coatings (e.g., for protection against corrosion, heat, and mechanical scratching). It is obvious that these functions, which are determined to a large extent by properties on a nanometer length scale, play an important role in many technologies on which our lifestyle and the wealth of our society are based. In fact, every new commercial product is based on improved or even novel materials. Thus, the identification of new materials and their property profiles can open new opportunities in fields such as energy, transport, safety, information, and health.

About 200,000 different inorganic materials are known to exist, but the properties, e.g., hardness, elastic constants, thermal or electrical conductivity, are known for very few of them. Considering that there are about 100 different atoms in the periodic table and that unit cells of crystals can contain anywhere between one and a hundred atoms (or even more), and considering surfaces, nanostructures, heterostructures, organic materials, and organic/inorganic hybrids, the number of different possible materials is practically infinite. Thus, it is likely that there are hitherto unknown or unexplored materials with unknown but immensely useful property profiles. Finding structure in this huge data space of different materials and reaching a better understanding of materials properties is a fascinating scientific goal of fundamental research. And there is no doubt that identifying new materials can have a truly significant socio-economic impact. This role of materials science has been recognized by at least two influential "science and engineering initiatives" of US presidents: the "national nanotechnology initiative" by B. Clinton (2000) and the "materials genome initiative" by B. Obama (2011).
Such initiatives come with no money; they just identify and describe a promising new route. In the two examples mentioned, these initiatives were worked out convincingly, and various funding agencies (from basic research to engineering to military) got together and created significant programs. This has changed the research landscape not just in the US but in the world. In both cases our research group had been on these routes already and fully profited from the developments without having to change direction.

The field of electronic-structure theory, applied to materials-science problems, is in an important, active phase with rapid developments in

• the underlying theory,
• new methods,
• new algorithms, and
• new computer codes.

For several years now a theory has been evolving that, taking advantage of high- and highest-performance computers, allows an atomistic modeling of complex systems with predictive power, starting from the fundamental equations of many-body quantum mechanics. The two central ingredients of such ab initio theories are the reliable description of the underlying elementary processes (e.g., breaking and forming of chemical bonds) and a correct treatment of their dynamics and statistical mechanics. Because of the importance of quantum-mechanical many-body effects in the description of the interactions in polyatomic systems, a systematic treatment was rarely possible up to now. The complexity of the quantum-mechanical many-body problem required the introduction of approximations, which often were not obvious. The development of the underlying theory (density-functional theory, DFT) started in 1964.¹ But "only" since 1978 (or 1982)² have reliable calculations for solids been carried out. Only these allow one to check the validity of possibly reasonable (and often necessary) approximations and to give the reasons for their success, or to demonstrate which approximations have to be abandoned. Today this development has reached a feasible level for many types of problems, but it is not completed yet. With these DFT-based (and related) computational methods, the approximations, which in many existing textbooks are introduced ad hoc, can be inspected. Further, it is possible to make quantitative predictions, e.g., for the properties of new materials. But theoretical condensed-matter physics, or theoretical materials science, is still in an active state of development. Phenomena that are only badly understood (or not at all) include phase transitions, disorder, catalysis, defects (meta- and bi-stabilities), properties of quantum dots, crystal growth, crystal structure prediction, systems containing f-electrons, high-temperature superconductivity, electronic excitations, and electrical and thermal transport.
¹ To some extent, earlier concepts by Thomas, Fermi, Hartree, and Slater may be considered as preparation for this route.
² V.L. Moruzzi, J.F. Janak, and A.R. Williams, Calculated Electronic Properties of Metals, Pergamon Press (1978), ISBN 0-08-022705-8; and M.T. Yin and M.L. Cohen, Theory of static structural properties, crystal stability, and phase transformations: Application to Si and Ge, Phys. Rev. B 26, 5668 (1982).
Modern theoretical materials science is facing two main challenges:

1. Explain experimentally found properties and phenomena and place them in a bigger context. This is done by developing models, i.e., by a reduction to the key physical processes, which enables a qualitative or semi-quantitative understanding. As mentioned before, there are examples for which these tasks are not yet accomplished.

2. Predict properties of systems that have not been investigated experimentally so far, or situations that cannot be investigated by experiments directly. The latter include conditions of very high pressures or conditions that are chemically or radioactively harsh. A recent development is concerned with building a "library of hitherto unknown materials". We will get back to this project at the end of this lecture.

The latter point shall be illustrated by the following examples: 1) Until recently it was not possible to study the viscosity and the melting temperature of iron at pressures that exist at the earth's core, but this has been calculated since about 1999.³ 2) The element carbon exists in three solid phases: i) as an amorphous solid and, in crystalline form, ii) as graphite and iii) as diamond.⁴,⁵ Graphite is the most stable phase, i.e., the one with the lowest internal energy. At normal conditions, diamond is only a metastable state, but with a rather long lifetime. Usually, when carbon atoms are brought together, graphite or an amorphous phase is formed. Only under certain conditions (pressure, temperature) present in the earth's mantle can diamonds be formed. Therefore, in a certain sense, diamonds are known only accidentally. It cannot be excluded that other elements (Si, Ge, Ag, Au, etc.) can also exist in other, yet unknown modifications (polymorphs). Examples of new, artificially created materials are semiconductor quantum dots, quantum-wire systems, or epitaxial magnetic layers.⁶
³ D. Alfe, M.J. Gillan, and G.D. Price, Nature 401, 462 (1999).
⁴ Diamond is the hardest known material, i.e., it has the highest bulk modulus. Diamonds without defects are most transparent to light. At room temperature their thermal conductivity is better than that of any other material.
⁵ This statement is slightly oversimplified: since 1985 fullerenes (e.g., C60) and since 1991 carbon nanotubes are known. From the latter, "soft matter" can be formed, and they can also be used directly as nanomaterials (for example as nanotube transistors). These systems will be discussed in more detail later.
⁶ In 1988, Albert Fert and Peter Grünberg independently discovered that an increased magnetoresistive effect (hence dubbed "giant magnetoresistance" or GMR) can be obtained in magnetic multilayers. These systems essentially consist of an alternating stack of ferromagnetic (e.g., Fe, Co, Ni, and their alloys) and non-ferromagnetic (e.g., Cr, Cu, Ru, etc.) metallic layers. It is unusual that a basic effect like GMR leads to commercial applications in less than a decade after its discovery: magnetic field sensors based on GMR were already introduced into the market as early as 1996 and by now, e.g., all read heads for hard discs are built that way. In 2007 Albert Fert and Peter Grünberg were awarded the Nobel Prize in physics.
Once the interactions between atoms are understood, materials with completely new physical properties could be predicted theoretically. As an example, the theoretical investigation of semiconductor and/or metal heterostructures is currently used to attempt the prediction of new materials for light-emitting diodes (LEDs) and of new magnetic memory devices. Likewise, researchers hope to find new catalysts by the theoretical investigation of alloys and surface alloys with new compositions (for which no bulk analogue exists). Although this may sound very close to practical application, it should be noted that applications of such theoretical predictions cannot be expected in the too near future, because in industry many practical details (concerning the technical processes, cost optimization, etc.) are crucial in deciding whether a physical effect will be used in real devices or not.

It is also interesting to note that 38 Nobel prizes have been awarded for work in the field of, or related to, materials science since 1980. They are listed in the following. More details, including review papers describing the work behind each prize, can be found at http://www.nobelprize.org/:

1981 Physics: Nicolaas Bloembergen and Arthur L. Schawlow "for their contribution to the development of laser spectroscopy"; Kai M. Siegbahn "for his contribution to the development of high-resolution electron spectroscopy"

1981 Chemistry: Kenichi Fukui and Roald Hoffmann "for their theories, developed independently, concerning the course of chemical reactions"

1982 Physics: Kenneth G. Wilson "for his theory for critical phenomena in connection with phase transitions"

1982 Chemistry: Aaron Klug "for his development of crystallographic electron microscopy and his structural elucidation of biologically important nucleic acid-protein complexes"

1983 Chemistry: Henry Taube "for his work on the mechanisms of electron transfer reactions, especially in metal complexes"

1984 Chemistry: Robert Bruce Merrifield "for his development of methodology for chemical synthesis on a solid matrix"

1985 Physics: Klaus von Klitzing "for the discovery of the quantized Hall effect"

1985 Chemistry: Herbert A. Hauptman and Jerome Karle "for their outstanding achievements in the development of direct methods for the determination of crystal structures"

1986 Physics: Ernst Ruska "for his fundamental work in electron optics and for the design of the first electron microscope"; Gerd Binnig and Heinrich Rohrer "for their design of the scanning tunneling microscope"

1986 Chemistry: Dudley R. Herschbach, Yuan T. Lee, and John C. Polanyi "for their contributions concerning the dynamics of chemical elementary processes"
1987 Physics: J. Georg Bednorz and K. Alexander Müller "for their important breakthrough in the discovery of superconductivity in ceramic materials"

1987 Chemistry: Donald J. Cram, Jean-Marie Lehn, and Charles J. Pedersen "for their development and use of molecules with structure-specific interactions of high selectivity"

1988 Chemistry: Johann Deisenhofer, Robert Huber, and Hartmut Michel "for the determination of the three-dimensional structure of a photosynthetic reaction centre"

1991 Physics: Pierre-Gilles de Gennes "for discovering that methods developed for studying order phenomena in simple systems can be generalized to more complex forms of matter, in particular to liquid crystals and polymers"

1991 Chemistry: Richard R. Ernst "for his contributions to the development of the methodology of high resolution nuclear magnetic resonance (NMR) spectroscopy"

1992 Chemistry: Rudolph A. Marcus "for his contributions to the theory of electron transfer reactions in chemical systems"

1994 Physics: Bertram N. Brockhouse "for the development of neutron spectroscopy" and Clifford G. Shull "for the development of the neutron diffraction technique"

1996 Physics: David M. Lee, Douglas D. Osheroff, and Robert C. Richardson "for their discovery of superfluidity in Helium-3"

1996 Chemistry: Robert F. Curl Jr., Sir Harold W. Kroto, and Richard E. Smalley "for their discovery of fullerenes"

1997 Physics: Steven Chu, Claude Cohen-Tannoudji, and William D. Phillips "for development of methods to cool and trap atoms with laser light"

1998 Physics: Robert B. Laughlin, Horst L. Störmer, and Daniel C. Tsui "for their discovery of a new form of quantum fluid with fractionally charged excitations"

1998 Chemistry: Walter Kohn "for his development of the density-functional theory" and John A. Pople "for his development of computational methods in quantum chemistry"

1999 Chemistry: Ahmed H. Zewail "for his studies of the transition states of chemical reactions using femtosecond spectroscopy"

2000 Physics: Zhores I. Alferov and Herbert Kroemer "for developing semiconductor heterostructures used in high-speed- and opto-electronics" and Jack S. Kilby "for his part in the invention of the integrated circuit"

2000 Chemistry: Alan J. Heeger, Alan G. MacDiarmid, and Hideki Shirakawa "for the discovery and development of conductive polymers"

2001 Physics: Eric A. Cornell, Wolfgang Ketterle, and Carl E. Wieman "for the achievement of Bose-Einstein condensation in dilute gases of alkali atoms, and for early fundamental studies of the properties of the condensates"
2003 Physics: Alexei A. Abrikosov, Vitaly L. Ginzburg, and Anthony J. Leggett "for pioneering contributions to the theory of superconductors and superfluids"

2005 Physics: Roy J. Glauber "for his contribution to the quantum theory of optical coherence" and John L. Hall and Theodor W. Hänsch "for their contributions to the development of laser-based precision spectroscopy, including the optical frequency comb technique"

2007 Physics: Albert Fert and Peter Grünberg "for their discovery of Giant Magnetoresistance"

2007 Chemistry: Gerhard Ertl "for his studies of chemical processes on solid surfaces"

2009 Physics: Charles K. Kao "for groundbreaking achievements concerning the transmission of light in fibers for optical communication" and Willard S. Boyle and George E. Smith "for the invention of an imaging semiconductor circuit - the CCD sensor"

2010 Physics: Andre Geim and Konstantin Novoselov "for groundbreaking experiments regarding the two-dimensional material graphene"

2011 Chemistry: Dan Shechtman "for the discovery of quasicrystals"

2013 Chemistry: Martin Karplus, Michael Levitt, and Arieh Warshel "for the development of multiscale models for complex chemical systems"

2014 Physics: Isamu Akasaki, Hiroshi Amano, and Shuji Nakamura "for the invention of efficient blue light-emitting diodes which has enabled bright and energy-saving white light sources"

2014 Chemistry: Eric Betzig, Stefan W. Hell, and William E. Moerner "for the development of super-resolved fluorescence microscopy"

2016 Physics: David J. Thouless, F. Duncan M. Haldane, and J. Michael Kosterlitz "for theoretical discoveries of topological phase transitions and topological phases of matter"

2016 Chemistry: Jean-Pierre Sauvage, Sir J. Fraser Stoddart, and Bernard L. Feringa "for the design and synthesis of molecular machines"

In the above list I ignored Nobel prizes in the field of biophysics, though some developments in this area are now becoming part of condensed-matter physics. The length of this list reflects the fact that materials science is an enormously active field of research, and an important one for our society as well.

Some remarks on the above list: The quantum Hall effect (Nobel prize 1985) is understood these days, but for its "variant", the fractional quantum Hall effect (Nobel prize 1998), this is true only in a limited way. The latter is based on the strong correlation of the electrons, and even these days unexpected results are found.
The theory of high-Tc superconductivity (Nobel prize 1987) is still unclear. High-Tc superconductors seem to feature Cooper pairs with a momentum signature of different symmetry than conventional BCS superconductors. Typically, high-Tc superconductors have a complex atomic structure and consist of at least four elements (e.g., La, Ba, Cu, O). However, in recent years simpler systems have also been found with rather high Tc, e.g., MgB2 (Tc = 39 K), and in 2015 hydrogen sulfide (H2S) was found to undergo a superconducting transition at high pressure (around 150 GPa) near 203 K, the highest-temperature superconductor known to date. The latter two examples are materials that had been known for many decades, but their properties had only been investigated recently, and the noted results were not expected.

In this lecture:

1. Equations will not fall down from heaven; we will derive them from first principles.

2. We will not only give the mathematical derivation but also, and in particular, develop a physical feeling, i.e., we will spend a noticeable amount of time interpreting equations.

3. We will give the reasons for approximations and clarify their physical meaning and their range of validity (as much as this is possible).

In contrast to most textbooks, we will start with the "adiabatic principle" and subsequently discuss the quantum-mechanical nature of the electron-electron interaction. In most textbooks both are introduced only in the middle or at the end. In the first part of the lecture we will restrict ourselves, unless stated otherwise, to T ≈ 0 K. Sometimes an extrapolation to T ≠ 0 K is unproblematic. Still, one should keep in mind that for T ≠ 0 K important changes and new effects can occur (e.g., due to entropy).
0.2 Literature for This Lecture

Author: Ashcroft, Neil W. and Mermin, N. David
Title: Solid state physics
Place: Philadelphia, PA
Year: 1981
Publisher: Saunders College Publishing
ISBN: 0-03-083993-9 = 0-03-049346-3

Author: Kittel, Charles
Title: Quantum theory of solids
Place: Hoboken, NJ
Year: 1963
Publisher: John Wiley & Sons, Inc.
Author: Ziman, John M.
Title: Principles of the theory of solids
Place: Cambridge
Year: 1964
Publisher: Cambridge University Press

Author: Madelung, Otfried
Title: Festkörpertheorie, 3 Bände
Place: Berlin
Year: 1972
Publisher: Springer

Author: Dreizler, Reiner M. and Gross, Eberhard K. U.
Title: Density functional theory: an approach to the quantum many-body problem
Place: Berlin
Year: 1990
Publisher: Springer
ISBN: 3-540-51993-9 = 0-387-51993-9

Author: Parr, Robert G. and Yang, Weitao
Title: Density-functional theory of atoms and molecules
Place: Oxford
Year: 1994
Publisher: Oxford University Press
ISBN: 0-19-509276-7

Author: Marder, Michael P.
Title: Condensed matter physics
Place: New York
Year: 2000
Publisher: John Wiley & Sons, Inc.
ISBN: 0-471-17779-2

Author: Martin, Richard M.
Title: Electronic Structure
Place: Cambridge
Year: 2004
Publisher: Cambridge University Press

Author: Cohen, Marvin L. and Louie, Steven G.
Title: Fundamentals of Condensed Matter Physics
Place: Cambridge
Year: 2016
Publisher: Cambridge University Press
0.3 Symbols and Terms Used in This Lecture

−e  charge of the electron
+e  charge of the proton
m  mass of the electron
rk  position of electron k
σk  spin of electron k
ZK  nuclear number of atom K
ZvK  valence of atom K
MK  mass of nucleus K
RK  position of nucleus K
φ  electric field
Ψ  many-body wave function of the electrons and nuclei
Λ  nuclear wave function
Φ  many-body wave function of the electrons
ϕ  single-particle wave function of the electrons
χ  spin wave function
{RI} ≡ {R1, ..., RM}  atomic positions
{ri σi} ≡ {r1 σ1, ..., rN σN}  electron coordinates (position and spin)
ε0  dielectric constant of the vacuum
ǫi  single-particle energy of electron i
Vg  volume of the base region
Ω  volume of a primitive cell
vI(rk)  potential created by nucleus I at the position of the k-th electron, rk
V^BO  potential-energy surface (PES) or Born-Oppenheimer energy surface
0.4 Atomic Units

All through this lecture I will use SI units (Système International d'Unités). Often, however, so-called atomic units (a.u.) are introduced in order to simplify the notation in quantum mechanics. For historic reasons two slightly different conventions exist. For both we have:

length:  4πε₀ℏ²/(m e²) = 1 bohr = 0.529177 Å = 0.0529177 nm ,  (0.1)

but then two different options have been established: Rydberg atomic units and Hartree atomic units.

                e²/(4πε₀)   ℏ   m     Hamilton operator for the hydrogen atom   energy = ℏ²/(2m a_B²)
Rydberg a.u.    2           1   0.5   −∇² − 2/r                                 1 (1 Ry = 13.606 eV)
Hartree a.u.    1           1   1     −(1/2)∇² − 1/r                            0.5 (1 Ha = 27.212 eV)
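To make the two conventions concrete, the following short Python sketch converts a few quantities between Rydberg, Hartree, and SI-based units. The numerical constants are the ones quoted above; the function names are illustrative only.

    # Minimal unit-conversion helper for the two atomic-unit conventions.
    HARTREE_EV = 27.212       # 1 Ha in eV
    RYDBERG_EV = 13.606       # 1 Ry in eV (= 0.5 Ha)
    BOHR_ANGSTROM = 0.529177  # 1 bohr in Angstrom

    def hartree_to_ev(e_ha):
        """Convert an energy from hartree to eV."""
        return e_ha * HARTREE_EV

    def rydberg_to_hartree(e_ry):
        """Rydberg and Hartree units differ by a factor of 2 in the energy scale."""
        return 0.5 * e_ry

    def bohr_to_angstrom(r_bohr):
        return r_bohr * BOHR_ANGSTROM

    # Ground-state energy of hydrogen: -1 Ry = -0.5 Ha = -13.606 eV
    print(hartree_to_ev(rydberg_to_hartree(-1.0)))  # -13.606
    # A typical metallic density parameter r_s = 2 bohr, in Angstrom:
    print(bohr_to_angstrom(2.0))                    # 1.058354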
1 Introduction

1.1 Many-Body Hamilton Operator
The starting point of a quantitative theoretical investigation of the properties of solids is the many-body Schrödinger equation

H Ψ = E Ψ ,  with  Ψ = Ψ({RI}, {rk, σk}) .  (1.1)
Here, the many-body wave function depends on the coordinates of all the atoms, {RI}, and on the space and spin coordinates of all electrons. In general, this wave function will not separate into RI- and (rk, σk)-dependent components. This should be kept in mind when we introduce such a separation below and use it in most parts of this lecture. Of course, this separation is an approximation, but an extremely useful one, and we will discuss its range of validity.

The properties of condensed matter are determined by the electrons and nuclei and, in particular, by their interaction (∼10²³ particles per cm³). For many quantum-mechanical investigations it is useful to start with an approximation called the "frozen-core approximation". This approximation is often helpful or convenient, but it is not necessary, i.e., the many-body problem can also be solved without introducing it. While most theoretical work still introduces this approximation, at the FHI we developed a methodology that treats the full problem with the same efficiency (in terms of CPU time and memory) as the frozen-core approximation. For qualitatively discussing and understanding numerical results, such as the inter-atomic interactions, it is nevertheless useful to focus on the valence electrons.

The frozen-core approximation assumes that, when condensed matter is formed from free atoms, only the valence electrons contribute to the interaction between atoms. Indeed, the electrons close to the nuclei (core electrons), which are in closed shells, will typically have only a small influence on the properties of solids. Exceptions are situations at very high pressure (small inter-atomic distances) and experiments that more or less directly probe the core electrons or the region close to the nuclei [e.g., X-ray photoemission (XPS), electron spin resonance (ESR)]. Therefore it is reasonable to introduce the following separation already in the atom, before turning to solids: nucleus and core electrons shall be regarded as a unit, i.e., the neutral atom consists of a positive, spherically symmetric ion of charge Zv e and of Zv valence electrons. This ion acts on each valence electron with a potential that looks like that shown in Fig. 1.1. The symbols have the following meaning:
Ze: nuclear charge of the atom
Zv e: charge of the ion
Rc: radial extension of the core electrons
Figure 1.1: Potential v_Ion(r) of a positive ion (full line), where all valence electrons have been removed, i.e., only the electrons in closed shells are kept. The dashed curves show the asymptotic behavior, −Zv e/(4πε₀ r) at large and −Z e/(4πε₀ r) at small distances.

Thus, the number of core electrons is Z − Zv, and the solid is considered to be composed of these ions and the valence electrons. The frozen-core approximation is conceptually appropriate, because it corresponds to the nature of the interaction. In Table 1.1, I give the electronic configuration and the ionic potentials for four examples. Here (and in Fig. 1.1), η is a small number, roughly of the order of Rc/(100 Z). The question marks in the range η ≤ r ≤ Rc indicate that in this range no analytic form of the potential can be given. Though the field frequently uses a language (and often also a theory) in terms of ions and valence electrons, we will continue in this lecture by talking about nuclei and electrons.

For the construction of the Hamilton operator of the many-body Schrödinger equation of a solid, we start with the classical Hamilton function and subsequently utilize the correspondence principle by replacing the classical momentum p with the operator (ℏ/i)∇. The many-body Hamilton operator of the solid has the following contributions:

1) Kinetic energy of the electrons

T^e = Σ_{k=1}^{N} p_k²/(2m) .  (1.2)
Table 1.1: Electronic configuration and ionic (frozen-core) potentials for different atoms.

atom   electronic configuration   Z    Zv   Rc (bohr)   v_Ion(r) (Ry)
H      1s¹                        1    1    0           −2/r
He     1s²                        2    2    0           −4/r
C      [1s²] 2s² 2p²              6    4    0.7         r ≥ Rc: −8/r ;  η ≤ r ≤ Rc: ? ;  r < η: −12/r
Si     [1s² 2s² 2p⁶] 3s² 3p²      14   4    1.7         r ≥ Rc: −8/r ;  η ≤ r ≤ Rc: ? ;  r < η: −28/r
ρ = Σ_ν P(E_ν^e, T) |Φ_ν⟩⟨Φ_ν| .  (2.1)
Of course, the ensemble of all states has to be normalized to 1. Therefore we have

Σ_ν P(E_ν^e, T) = 1 ≡ (1/Z^e) Σ_ν exp(−E_ν^e / k_B T) .  (2.2)

This yields the partition function of the electrons, Z^e:

Z^e = Σ_ν exp(−E_ν^e / k_B T) = Tr[exp(−H^e / k_B T)] .  (2.3)

Z^e is directly related to the Helmholtz free energy F^e:

−k_B T ln Z^e = F^e = U^e − T S^e ,  (2.4)
where U^e and S^e are the internal energy and the entropy of the electronic system, respectively. Consequently, the probability of a thermal excitation of a certain state (E_ν^e, Φ_ν) is

P(E_ν^e, T) = (1/Z^e) exp(−E_ν^e / k_B T) = exp[(F^e − E_ν^e) / k_B T] .  (2.5)

At finite temperatures, we thus need to know the full energy spectrum of the many-body Hamilton operator, i.e., E_ν^e for all values of ν. Then we can calculate the partition function and the free energy defined in Eqs. (2.3) and (2.4), respectively. Let us now briefly discuss how the internal energy and the entropy can be determined.¹ The internal energy is what is often also called the "total energy at finite temperature":
U^e({RI}, T) = Σ_ν E_ν^e({RI}, T) P(E_ν^e, T) + (1/(4πε₀)) (1/2) Σ_{I≠J}^{M,M} ZI ZJ e² / |RI − RJ| .  (2.6)
It is important to keep in mind that in the general case, i.e., when also atomic vibrations are excited, we have U = U^e + U^vib, not just U^e. From Eq. (2.4) we obtain the specific heat

c_v = (1/V) (∂U/∂T)_V = (T/V) (∂S/∂T)_V .  (2.7)
We removed the superscript e and in fact mean U = U^e + U^vib and S = S^e + S^vib. The calculation of c_v for metals (cf. Ashcroft-Mermin, pp. 43, 47, 54) is an instructive example of the importance of the Fermi statistics for electrons that will be introduced below.
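As an illustration of Eqs. (2.3)-(2.5) and the electronic part of Eq. (2.6), the following Python sketch evaluates Z^e, F^e, U^e, and S^e for a small made-up spectrum of many-body energies E_ν^e (three levels, purely illustrative) and verifies F = U − TS.

    import numpy as np

    kB = 8.617e-5                       # Boltzmann constant in eV/K
    E = np.array([0.0, 0.5, 1.2])       # made-up many-body energies E_nu^e (eV)
    T = 1000.0                          # temperature in K

    w = np.exp(-E / (kB * T))
    Z = w.sum()                         # partition function, Eq. (2.3)
    P = w / Z                           # thermal occupation, Eq. (2.5)
    F = -kB * T * np.log(Z)             # free energy, Eq. (2.4)
    U = (P * E).sum()                   # internal energy (electronic part of Eq. (2.6))
    S = -kB * (P * np.log(P)).sum()     # Gibbs entropy of the ensemble

    print(np.isclose(F, U - T * S))     # True: F = U - T*S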
2.2 Fermi Statistics of the Electrons
To inspect the statistical mechanics of electrons, we have to introduce some concepts that will be justified and discussed in a more formal way in Chap. 3. For a start, we note that the full many-body problem for N electrons is given by Eqs. (1.11) and (1.12):

H^e({RI}) Φ_ν({RI}, {rk σk}) = E_ν^e Φ_ν ,  with  H^e = T^e + V^{e−Nuc} + V^{e−e} .  (2.8)
If the electron-electron interaction V^{e−e} is neglected by assuming H^{ind,e} = T^e + V^{e−Nuc}, the problem separates and one gets N independent Hamiltonians h_j:

H^{ind,e} = Σ_{j=1}^{N} h_j ,  with  h_j = −(ℏ²/2m) ∇_j² + V^{e−Nuc}(r_j) .  (2.9)
Since the Hamiltonians h = h_j = h_i are identical for all electrons, it is sufficient to solve

h ϕ_i(r) = ǫ_i ϕ_i(r)  (2.10)
31
occupied, so that the ground-state total energy (internal energy at 0K) of this electronic system is given by: U e,ind (T = 0K) = E0e,ind =
N X
ǫi
.
(2.11)
i=1
This independent electron approximation is very strong and thus not particularly u seful by itself. However, it is very instructive: As will be discussed in Chap. 3, also the full many-body problem problem can be rewritten in terms of an effective single-particle Hamiltonian h=
−~2 2 ▽ +V eff (r) 2m
(2.12)
with effective “single-particle” levels2 given by (2.13)
h ϕi (r) = ǫi ϕi (r) .
Please note that in this formalism the electrons are only described by an effective singleparticle Hamiltonian, but they are in fact not independent (see Chapter 3). As a consequence, the ground-state total energy (internal energy at 0K) is given by e
U (T = 0K) =
E0e
=
N X
(2.14)
ǫi + ∆ ,
i=1
where ∆ is a correction accounting for the electron-electron interaction. For independent particles ∆ is zero, but for the true, fully interacting many-body problem it is very important (see Chapter 3). We now apply the density matrix formalism introduced in the previous section to the electronic system described by Eq. (2.12), see for instance Marder, Chapter 6.4 or LandauLifshitz, Vol. IV. In this case, the lowest internal energy that is compatible with the Pauli principle at finite temperatures is given by e
U (T ) =
∞ X
ǫi f (ǫi , T ) + ∆(T ) + U vib (T ) .
(2.15)
i=1
The index i is running over all single particle states and the occupation probability of the ith single particle level ǫi is given by the Fermi function (cf. e.g. Ashcroft-Mermin, Eq. (2.41) - (2.49) or Marder, Chapter 6.4, and the plot in Fig. 2.1): f (ǫi , T ) =
1 exp[(ǫi − µ)/kB T ] + 1
.
(2.16)
Here, kB is again the Boltzmann constant and µ is the chemical potential of the electrons, i.e., the lowest energy required to remove an electron from the system. At T = 0K this is − µ = E0e (N − 1) − E0e (N ) .
(2.17)
At finite temperature this should be replaced by the difference of the corresponding free energies. 2
The justification and meaning of the term “single-particle level” will be discussed in Chapter 3.
32
f (ǫ) 1.0
T =0
0.5 T 6= 0 0.0
µ
ǫ
Figure 2.1: The Fermi-distribution function [Eq. (2.16)]. Obviously, the number of electrons N is independent of temperature. Therefore, we have ∞ X f (ǫi , T ; µ) . (2.18) N= i=1
For a given temperature this equation contains only one unknown quantity, the chemical potential µ. Thus, if all ǫi are known, µ(T ) can be calculated. By introducing the energy and entropy per unit volume (u = U/V, s = S/V ) and using the laws of thermodynamics [(∂u/∂T )V = T (∂s/∂T )V , and s → 0 if T → 0)], we obtain i Xh Se se = = −kB f (ǫi , T ) ln f (ǫi , T ) + (1 − f (ǫi , T )) ln (1 − f (ǫi , T )) . (2.19) V i
The derivation of Eq. (2.19) is particularly simple if one is dealing with independent particles as described by Eq. (2.15) with ∆(T ) = 0.
2.3
Some Definitions
We will now introduce some definitions and constrains that are particularly useful in the description of solids. First, we like to consider a system without spin-orbit interaction. Thus spin and position coordinates separate: Φ({rk σk }) = Φ({rk }) .χ({σk }) .
(2.20)
H e = T e + V e−Nuc + V e−e ≈ T e + C jellium + V e−e .
(2.21)
Second, we simplify the Hamiltonian H : One possible consists in approximating the electron-nuclear interaction with a constant C jellium e
33
In this so-called "jellium" model, the potential of the nuclei is smeared out to a constant value. We note in passing that this constant is different for different electron numbers N and that, at not too low electron densities, this is an exact description (see Sec. 3.8 for details).³ Since the "jellium" model defined in Eq. (2.21) is not tractable analytically, we switch to an even cruder approximation, in which the electron-electron interaction is also neglected:

H^e ≈ T^e + C^jellium .  (2.22)
For C^jellium = 0, this is the free-electron problem that will be analysed in detail below. At this stage, you might ask yourself why it is helpful to discuss such a harsh approximation at all. As already mentioned in the discussion of Eq. (2.12), and as discussed in detail in Chap. 3, the full many-body Schrödinger equation can be addressed using density-functional theory (see Sec. 3.8) via an effective single-particle equation

[ −(ℏ²/2m) ∇² + V^eff(r) ] ϕ_j(r) = ǫ_j ϕ_j(r) .  (2.23)

In such a formalism, the "jellium" and the "free-electron" problem just differ in the effective potential V^eff(r): for the free-electron problem, V^eff(r) is zero; for the jellium model in the limit of not too low densities, V^eff(r) becomes a constant that depends on C^jellium in a non-trivial way. We will discuss the details in Chap. 3.8. In the following, we will thus discuss the free-electron problem and then use the derived formalism to interpret the more complex theories discussed in the later chapters.

The eigenfunctions of Eq. (2.23) are plane waves,

ϕ_k(r) = e^{ikr} ,  (2.24)

and, choosing the energy zero such that C^jellium = 0, the energy eigenvalues are

ǫ(k) = ℏ²k²/(2m) .  (2.25)
The vectors k, or their components k_x, k_y, k_z, are interpreted as quantum numbers; they replace what was so far noted as index j. Including the spin, the quantum numbers are k and s, where the latter can assume one of the two values +1/2 and −1/2 (up and down).

The wave functions in Eq. (2.24) are not normalized (or rather, they are normalized with respect to δ functions). In order to obtain a simpler mathematical description, it is often useful and helpful to constrain the electrons to a finite volume. This volume is called the base region V_g, and it shall be large enough that the obtained results are independent of its size and shape.⁴ The base region V_g shall contain N valence electrons and M ions, and for simplicity we choose a box of dimensions L_x, L_y, L_z (cf. Ashcroft-Mermin: exercise for more complex shapes).

³ For systems with very low densities, electrons will localize themselves at T = 0 K due to the Coulomb repulsion. This is called Wigner crystallization; it was predicted in 1930.
⁴ For external magnetic fields the introduction of a base region can give rise to difficulties, because then physical effects often depend significantly on the border.
For the wave function we could choose an almost arbitrary constraint (because V_g shall be large enough). It is advantageous to use periodic boundary conditions,

ϕ(r) = ϕ(r + L_x e_x) = ϕ(r + L_y e_y) = ϕ(r + L_z e_z) .  (2.26)

Here e_x, e_y, e_z are the unit vectors in the three Cartesian directions. This is also called the Born-von Kármán boundary condition. As long as L_x, L_y, L_z, and thus also V_g, are large enough, the physical results do not depend on this treatment. Sometimes anti-cyclic boundary conditions are also chosen in order to check that the results are independent of the chosen base region. By introducing such a base region, the wave functions can be normalized to the Kronecker delta. Using Eq. (2.26) and the normalization condition

∫_{V_g} ϕ_k*(r) ϕ_{k′}(r) d³r = δ_{k,k′}  (2.27)

we obtain

ϕ_k(r) = (1/√V_g) e^{ikr} .  (2.28)
Because of Eq. (2.26), i.e., because of the periodicity, only discrete values are allowed for the quantum numbers k, i.e., k · L_i e_i = 2π n_i and therefore

k = (2π n_x / L_x , 2π n_y / L_y , 2π n_z / L_z) ,  (2.29)

with the n_i being arbitrary integer numbers. Thus, the number of vectors k is countable, and each k-point is associated with the volume (2π)³/V_g. Each state ϕ_k(r) can be occupied by two electrons. In the ground state at T = 0 K the N/2 k-points of lowest energy are occupied by two electrons each. Because ǫ depends only on the absolute value of k, these points fill (for non-interacting electrons) a sphere in k-space (the "Fermi sphere") of radius k_F (the "Fermi radius"). We have

N = 2 · (4π/3) k_F³ · V_g/(2π)³ = k_F³ V_g / (3π²) .  (2.30)
Here the spin of the electron (factor 2) has been taken into account, and V_g/(2π)³ is the density of the k-points [cf. Eq. (2.29)]. The particle density of the electrons in jellium is constant:

n(r) = n = N/V_g = k_F³/(3π²) ,  (2.31)
the charge density of the electrons is −en, and the highest occupied state has the "quantum number", or k-vector, with radius k_F = (3π²n)^{1/3}.
The energy of this single-particle state |k_F⟩ is

ǫ_F = ℏ²k_F²/(2m) = (ℏ²/2m) (3π²n)^{2/3} .  (2.32)

For jellium-like systems (e.g., metals), the electron density is often noted in terms of the density parameter r_s. This is defined via a sphere of volume (4π/3) r_s³ that contains exactly one electron. One obtains

(4π/3) r_s³ = V_g/N = 1/n .  (2.33)

The density parameter r_s is typically given in bohr. For the valence electrons of metals, r_s is typically around 2 bohr, and therefore k_F is approximately 1 bohr⁻¹, or 2 Å⁻¹, respectively.
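The chain of relations (2.30)-(2.33) is easy to evaluate numerically. A minimal sketch in Hartree atomic units (ℏ = m = 1, lengths in bohr, energies in hartree); the chosen r_s value is just the typical metallic example quoted above:

    import math

    def kf_from_rs(rs):
        """Fermi wave vector from the density parameter r_s (atomic units):
        n = 3/(4*pi*rs**3) by Eq. (2.33); k_F = (3*pi^2*n)**(1/3) by Eq. (2.31)."""
        n = 3.0 / (4.0 * math.pi * rs**3)
        return (3.0 * math.pi**2 * n) ** (1.0 / 3.0)

    def fermi_energy(rs):
        """Fermi energy eps_F = k_F^2/2 in hartree, Eq. (2.32) with hbar = m = 1."""
        return 0.5 * kf_from_rs(rs) ** 2

    rs = 2.0                              # typical metallic density, in bohr
    print(kf_from_rs(rs))                 # ~0.96 bohr^-1, i.e., roughly 1 bohr^-1
    print(fermi_energy(rs) * 27.212)      # eps_F in eV, ~12.5 eV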
Now we introduce the (electronic) density of states:

N(ǫ) dǫ = number of states in the energy interval [ǫ, ǫ + dǫ] .

For the total number of electrons in the base region we have:

N = ∫_{−∞}^{+∞} N(ǫ) f(ǫ, T) dǫ .  (2.34)
For free electrons (jellium), we get the density of states:

N(ǫ) = (2V_g/(2π)³) ∫ d³k δ(ǫ − ǫ_k)
     = (2V_g/(2π)³) 4π ∫ k² dk δ(ǫ − ǫ_k)
     = (V_g/π²) ∫ [dǫ_k/(dǫ_k/dk)] (2mǫ_k/ℏ²) δ(ǫ − ǫ_k)
     = (V_g/π²) ∫ dǫ_k (m/(ℏ²k)) (2mǫ_k/ℏ²) δ(ǫ − ǫ_k)
     = (mV_g/(π²ℏ³)) √(2mǫ) .  (2.35)

For ǫ < 0 we have N(ǫ) = 0. The density of states for two- and one-dimensional systems is discussed in the exercises (cf. also Marder). Figure 2.2 shows the density of states and the occupation at T = 0 K and at a finite temperature. The density of states at the Fermi level is

N(ǫ_F)/V_g = (3/2) N/(V_g ǫ_F) = (m/(ℏ²π²)) k_F .  (2.36)
The figure shows that at finite temperatures holes below µ and electrons above µ are generated. Later, we will often apply Eqs. (2.32)-(2.35) and the underlying concepts, because some formulas can be presented and interpreted more easily, if ǫF , kF , and n(r) are expressed in this way.
Figure 2.2: Density of states of free electrons, √ǫ · f(ǫ, T), and the separation into occupied and unoccupied states for two temperatures (T = 0 K and T = 10,000 K).
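A quick numerical cross-check of Eq. (2.35): integrating N(ǫ) f(ǫ, T = 0) up to ǫ_F must return the electron number N of Eq. (2.30). A sketch in atomic units (ℏ = m = 1), for an arbitrary example volume:

    import math
    from scipy.integrate import quad

    def dos_free(eps, volume):
        """Free-electron density of states, Eq. (2.35), atomic units (hbar = m = 1)."""
        return volume / math.pi**2 * math.sqrt(2.0 * eps) if eps > 0 else 0.0

    volume = 100.0          # example base-region volume, bohr^3
    eps_f = 0.5             # example Fermi energy, hartree
    n_electrons, _ = quad(dos_free, 0.0, eps_f, args=(volume,))

    # Analytic check via Eqs. (2.30)-(2.32): N = V * k_F^3/(3 pi^2), k_F = sqrt(2 eps_F)
    kf = math.sqrt(2.0 * eps_f)
    print(n_electrons, volume * kf**3 / (3.0 * math.pi**2))   # both ~3.377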
3 Electron-Electron Interaction

3.1 Electron-Electron Interaction
In the adiabatic approximation, the dynamics of the electrons and nuclei is decoupled. Still, equations (1.11) and (1.17) describe systems containing ∼10²³ particles. In the following, we will discuss methods that enable us to obtain solutions to the many-body Schrödinger equation defined in Eq. (1.11). The wave function Φ_ν({r_i σ_i}) and its energy E_ν^e are determined by equations (1.11) and (1.12). We write down Eq. (1.12) once more:

H^e = Σ_{k=1}^{N} [ −(ℏ²/2m) ∇²_{r_k} + v(r_k) ] + (1/2) (e²/(4πε₀)) Σ_{k≠k′}^{N,N} 1/|r_k − r_k′| .  (3.1)
Here, v(r_k) is the potential of the lattice components (ions or nuclei) of the solid [cf. Eq. (1.7)]. Often it is called the "external potential".

At this point we briefly recall the meaning of the many-body wave function. It depends on 3N spatial coordinates and N spin coordinates. Because the spatial coordinates are all coupled by the operator V^{e−e}, they cannot in general be dealt with separately. In a certain sense, this is already the case in the single-particle problem: neglecting spin for the moment, the single-particle wave function ϕ(x, y, z) depends on three spatial coordinates, and the motion in the x-direction is usually not independent of that in the y-direction; the same is true for x and z, and for y and z. In other words, ϕ(x, y, z) does not describe 3 independent one-dimensional particles, but 1 particle with 3 spatial coordinates. In the same way, the N-particle Schrödinger equation has to be treated as a many-body equation with 3N spatial coordinates. One can say that the total of all the electrons is like a glue, or a mush, and not like 3N independent particles.

If the electron-electron interaction were negligible, or if it could be described as

V^{e−e} = Σ_{k=1}^{N} v^{e−e}(r_k) ,  (3.2)
i.e., if the potential at position r_k did not explicitly depend on the positions of the other electrons, then it would not be a major problem to solve the many-body Schrödinger equation, since it would decouple into independent-particle equations. Unfortunately, such a neglect cannot be justified: the electron-electron interaction is Coulombic; it has an infinite range, and for small separations it becomes very strong. Indeed, learning about
this electron-electron interaction is the most interesting part of solid-state theory, since it is responsible for a wide range of fascinating phenomena displayed in solids. Thus, we will now describe methods that enable us to take the electron-electron interaction into account in an appropriate manner. There are four methods (or concepts) that are being used:

1. Many-body perturbation theory (also called Green-function theory): This method is very general, but not practical in most cases. One constructs a series expansion in interactions, and necessarily many terms have to be neglected, which are believed to be unimportant. Keywords are Green functions, self-energies, vertex corrections, and Feynman diagrams. We will come back to this method in a later part of this lecture, because it allows for the calculation of excited states.

2. Methods that are based on Hartree-Fock theory and then treat correlation analogously to state-of-the-art quantum-chemistry approaches. Keywords are Møller-Plesset perturbation theory, coupled-cluster theory, and configuration interaction. These are very systematic concepts, but they do not play a role in materials science so far, as they are numerically too inefficient. This may well change in the next years if new algorithms are developed. It is also discussed that these equations are suitable for a quantum computer, and in 5-10 years such machines may well exist.

3. Effective single-particle theories: Here, we will emphasize in particular density-functional theory (DFT).¹ Primarily, DFT refers to the ground state, E_0^e. In this chapter we will discuss DFT (and its precursors) with respect to the ground state, and we will also briefly describe the calculation of excited states and time-dependent DFT (TD-DFT).

4. The quantum Monte Carlo method: Here the expectation value of the many-body Hamilton operator H^e is calculated using a very general ansatz of many-body wave functions in the high-dimensional configuration space. Then the wave functions are varied and the minimum of the expectation value ⟨Φ|H^e|Φ⟩/⟨Φ|Φ⟩ is determined. Due to the availability of fast computers and several methodological developments in recent years, this method has gained in importance. It will be discussed towards the end of this lecture.

Now we will discuss density-functional theory in detail. This will be done step by step in order to clarify the physical content of the theory. Thus, we begin with the Hartree and Hartree-Fock theory and then proceed, via Thomas-Fermi theory, to density-functional theory.
3.2 Hartree Approximation
Besides its scientific value, the ansatz introduced by Hartree also serves here as a more general example of how a theory can evolve. Often, an intuitive feeling is present initially; only in a second step does one attempt to derive things in a mathematical way. Hartree [Proc. Camb. Phil. Soc. 24, 89, 111, 426 (1928)] started from the following idea:

¹ Walter Kohn was awarded the Nobel prize in chemistry for the development of density-functional theory in 1998.
The effect of the electron-electron interaction on a certain electron at position r should approximately be given by the electrostatic potential which is generated by all other electrons, on average, at position r, i.e., it should approximately be possible to replace the potential V^{e−e}({r_i}) of the many-body Schrödinger equation by

V^{e−e}({r_i}) ≈ Σ_{k=1}^{N} v^Hartree(r_k)  (3.3)

with

v^Hartree(r) = (e²/(4πε₀)) ∫ n(r′)/|r − r′| d³r′ .  (3.4)
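For a spherically symmetric density, the integral in Eq. (3.4) reduces to a one-dimensional radial formula, v^Hartree(r) = (e²/4πε₀) [ (1/r) ∫₀^r n(r′) 4πr′² dr′ + ∫_r^∞ n(r′) 4πr′ dr′ ]. The following Python sketch evaluates this in Hartree atomic units for a hydrogen-like 1s density, n(r) = e^{−2r}/π, where the closed-form answer v(r) = 1/r − e^{−2r}(1/r + 1) is a standard result; grid and variable names are illustrative only.

    import numpy as np

    dr = 1e-3
    r = np.arange(dr, 30.0, dr)                  # radial grid (bohr), avoid r = 0
    n = np.exp(-2.0 * r) / np.pi                 # hydrogen 1s density, Int n d^3r = 1

    # v(r) = (1/r) * Int_0^r n 4pi r'^2 dr'  +  Int_r^inf n 4pi r' dr'
    q_inside = np.cumsum(n * 4.0 * np.pi * r**2 * dr)
    v_outer = np.cumsum((n * 4.0 * np.pi * r * dr)[::-1])[::-1]
    v_hartree = q_inside / r + v_outer

    v_exact = 1.0 / r - np.exp(-2.0 * r) * (1.0 / r + 1.0)
    print(np.max(np.abs(v_hartree - v_exact)))   # ~1e-3: grid-limited agreement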
The ansatz of Eq. (3.3) is called the mean-field approximation. As a consequence, the many-body Hamilton operator decomposes into N single-particle operators,

H^e = Σ_{k=1}^{N} h(r_k) .  (3.5)
Each electron can then be described by a single-particle Schrödinger equation with the Hamilton operator

h = −(ℏ²/2m) ∇² + v(r) + v^Hartree(r) .  (3.6)

The validity of Eqs. (3.3)-(3.6) may seem reasonable. However, for nearly all situations this approach represents a too drastic approximation. Only a more systematic treatment can show how problematic this approximation is, and this shall be done now.

Mathematical Derivation of the Hartree Equations

Starting from the general many-body equation, Eqs. (3.3)-(3.6) shall be derived. At first we note that H^e does not contain the spin of the electrons explicitly, and therefore spin and position are not coupled. Thus, for the eigenfunctions of H^e we have the factorization

Φ_ν({r_i σ_i}) = Φ_ν({r_i}) χ_ν({σ_i}) .  (3.7)
To take into account orbital as well as spin quantum numbers, we label the set of quantum numbers as

ν ≡ o_ν s_ν ,

with o_ν representing the orbital and s_ν the spin quantum numbers of the set ν. In the free-electron system, o_ν represents all possible values of the three numbers k_x, k_y, k_z, and for each state s_ν is either ↑ or ↓ (cf. Chapter 2). Because H^e does not contain spin-orbit and spin-spin coupling, the spin component factorizes and we find

χ_ν({σ_i}) = χ_{s1}(σ_1) χ_{s2}(σ_2) ... χ_{sN}(σ_N) .  (3.8)

Here, the "functions" are χ_↑(σ) ≡ (1, 0)ᵀ and χ_↓(σ) ≡ (0, 1)ᵀ, i.e., σ refers to the elements of a two-component vector. For the spatial part such a product ansatz is not possible,
because H^e couples the positions of "different" electrons due to the electron-electron interaction. We used quotes because speaking of different electrons is no longer appropriate; one should better talk about a many-electron cloud or an electron gas.

Figure 3.1: Schematic representation of the expectation value of the Hamilton operator as a "function" of the vectors of the Hilbert space of H^e. The tick marks at the Φ-axis label the wave functions of type Φ^Hartree. They definitely cannot reach the function Φ₀.

For the lowest eigenvalue of the Schrödinger equation the variational principle holds, i.e., for the ground state of H^e we have

E_0^e ≤ ⟨Φ|H^e|Φ⟩ / ⟨Φ|Φ⟩ ,  (3.9)
where the Φ are arbitrary functions of the N -particle Hilbert space, which can be differentiated twice and can be normalized. Let us now constrain the set of functions Φ further to achieve mathematical convenience. For example, we could allow only many-body functions that can be written as a product of single particle functions Φ({ri }) ≈ ΦHartree ({ri }) = ϕo1 (r1 )ϕo2 (r2 ) . . . ϕoN (rN ) ,
(3.10)
where we clearly introduce an approximation. Functions that can be written as in Eq. (3.10) do not span the full Hilbert space of the functions Φ({ri }), and most probably we will not obtain E0e exactly. However, a certain restriction on the set of the allowed functions is justifiable when we are not interested in the exact value of E0e , but accept an error of, say, 0.1 eV. As a word of caution, we note that the quality of such a variational estimation is often overrated and in general difficult to assess. Schematically, the variational principle can be described by Fig.(3.1). Because the Hartree ansatz (Eq.(3.10)) for sure is an approximation, we have E0e
0.73 (CsCl structure) R− R+ 0.73 > > 0.41 (N aCl structure) R− R+ 0.41 > > 0.23 (ZnS structure) R− 1>
(6.17) (6.18) (6.19)
The relationships as listed here are merely based on the most efficient ways of packing different sized spheres in the various crystal structures. They can be easily verified with a few lines of algebra. The essential messages to take from these relationships are basically: (i) If the anion and cation are not very different in size then the CsCl structure will probably be favored: (ii) If there is an extreme disparity in their size then the ZnS structure is likely; and (iii) if in between the NaCl structure is likely. Of course atoms are much more than just hard spheres but nonetheless correlations such as those predicted with these simple rules are observed in the structures of many materials. A partial understanding of why this is so can be obtained by simply plotting E Mad as a function of the anion-cation ratio. This is done in Fig. 6.11. If, for example, one looks at the CsCl to NaCl transition in Fig. 6.11 it is clear that there is a discontinuity at 0.73 after which the Madelung energy remains constant. This transition is a consequence of the fact that the volume of the CsCl structure is determined solely by the second nearest 158
Madelung Energy
neighbour anion-anion interactions. Once adjacent anions come into contact no further energy can be gained by shrinking the cation further. It would simply "rattle" around in its cavity with the volume of the cell and thus the Madelung energy remaining constant. Further discussion on this issue and these simple rules of thumb can be found in Pettifor, Bonding and Structure of Molecules and Solids.
CsCl NaCl ZnS
0.0 0.2 0.4 0.6 0.8 1.0 R+/RFigure 6.11: The Madelung energy in ionic compounds as a function of the radius ratio for CsCl, NaCl and cubic ZnS lattices (assuming the anion radius is held constant) (Based on Pettifor, Bonding and Structure of Molecules and Solids).
6.4
Covalent bonding
The ionic bonding described in the last section is based on a complete electron transfer between the atoms involved in the bond. The somewhat opposite case (still in our idealized pictures), i.e. when chemical binding arises out of electrons being more or less equally shared between the bonding partners, is called covalence. Contrary to the ionic case, where the electron density in the solid does not differ appreciably from the one of the isolated ions, covalent bonding results from a strong overlap of the atomic-like wavefunctions of the different atoms. The valence electron density is therefore increased between the atoms, in contrast to the hitherto discussed van der Waals and ionic bonding types. It is intuitive that such an overlap will also depend on the orbital character of the wavefunctions involved, i.e. in which directions the bonding partners lie. Intrinsic to covalent bonding is therefore a strong directionality as opposed to the non-directional ionic or van der Waals bonds. From this understanding, we can immediately draw some conclusions: • When directionality matters, the preferred crystal structures will not simply result from an optimum packing fraction (leading to fcc, hcp or CsCl, NaCl lattices). The classic examples of covalent bonding, the group IV elements (C, Si, Ge) or III-V compounds (GaAs, GaP), solidify indeed in more open structures like diamond or zincblende. • Due to the strong directional bonds, the displacement of atoms against each other (shear etc.) will on average be more difficult (at least more difficult than in the case of metals discussed below). Covalent crystals are therefore quite brittle. 159
Element C Si Ge
theory exp. %diff. theory exp. %diff. theory exp. %diff.
ao (Å) 3.602 3.567 2 , for the alanine chain. This is a dramatic cooperative increase in H bond strength. This cooperative behaviour of H bonds is the opposite of the more intuitive behaviour exhibited by, for example, covalent bonds which generally decrease in strength as more bonds are formed (cf. Fig. 6.21). Cooperativity of H bonds is a crucial physical phenomena in biology, providing additional energy to hold certain proteins together under ambient conditions.
6.6.2
Some Physics of Hydrogen bonds
Although many of the structural and physical properties of H bonds are clear, the electronic make-up of H bonds is less clear and remains a matter of debate. The unique role H plays is generally attributed to the fact that the H ion core size is negligible and that H has one valence electron but unlike the alkali metals, which also share this configuration, has a high ionization energy. Generally it is believed that H bonds are mostly mediated by electrostatic forces; stabilized by the Coulomb interaction between the (partial) positive charge on H and the (partial) negative charge on the electronegative element B. However, H bonds are not purely electrostatic. First, it has been shown that for the water dimer (the most studied prototype H bond) it is not possible to fit the energy versus separation curve to any of the traditional multipole expansions characteristic of a pure electrostatic interaction (see S. Scheiner, Hydrogen bonding a theoretical perspective, for more details.). Second, first- principles calculations reveal overlap between the wavefunctions of the donor and acceptor species in the H bond. This is shown in Fig. 6.25(B) which shows one of the occupied eigenstates of the gas phase water dimer. Overlap between orbitals such as 176
Figure 6.26: Correlation between the A-B distance (in this case O-O distance) and OH vibrational frequency for a host of H bonded complexes. From G.A. Jeffrey, An Introduction to H bonding.
177
PBE error per hb (kcal/mol)
(H2O)(NH 3 )
(FH)(NH 3 )
Error PBE Error LDA
(ClH)(NH 3 )
(B)
(OC)(HF)
(H2O)2
(CO)(HF)
(HCl)2
9 8 7 6 5 4 3 2 1 0 -1
(HF)2
E (kcal/mol)
(A) 10
1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 110
120
130
140
150
160
170
180
q(°)
Figure 6.27: (A) Comparison between LDA and GGA error for H bond strengths in several gas phase complexes (from C. Tuma, D. Boese and N.C. Handy, Phys. Chem. Chem. Phys. 1, 3939 (1999)) (B) Correlation between PBE error and H bond angle for several gas phase H bonded complexes. See J. Ireta, J. Neugebauer and M. Scheffler J. Phys. Chem. A 108, 5692 (2004) for more details.
178
that shown in Fig. 6.25(B)is characteristic of covalent bonding. Current estimates of the electrostatic contribution to a typical H bond range anywhere from 90 to 50%. Finally, we shall end with a brief discussion on how well DFT describes H bonds. This question has been tackled in countless papers recently, particularly by focussing on the H2 O dimer and other small gas phase clusters. The two most general (and basic) conclusions of these studies are: • Predicted H bond strengths strongly depend on the exchange correlation functional used. The LDA is not adequate for describing H bonds. The LDA routinely predicts H bonds that are too strong by 100%. GGAs on the other hand generally predict H bond strengths that are close (within 0.05 eV) of the corresponding experimental value. This general conclusion is summarised nicely in Fig. 6.27(A) which plots the difference between DFT (LDA and GGA (PBE)) H-bond strengths from those computed from high level quantum chemical calculations (post Hartree-Fock methods such as Configuration Interaction or Coupled-Cluster, which can yield almost exact results) for several gas phase H-bonded clusters. It can be seen from this figure that LDA always overestimates the H bond strengths, by at least 4 kcal/mol ( 0.16 eV). PBE on the other hand is always within 1 kcal/mol ( 0.04 eV) of the ’exact’ value. • The quality of the GGA (PBE) description of H bonds depends on the structure of the H bond under consideration. Specifically, it has been shown that the more linear the H bond is, the more accurate the PBE result is. This is shown in Fig. 6.27(B) for several different H bonded gas phase clusters.
6.7
Summary
Some of the key features of the five main types of bonding are shown in Fig. 6.28.
179
Type
Cohesive Energy
Schematic d-
d+
d
Van der Waals
d+
d+ d-
H bonding
d-
d+
very weak: 1-10 meV
d-
d- d+
weak: 0.1-0.5 eV
d-
noble gases (Ar, Ne, etc), organic molecules
water , ice, proteins
C (graphite, strong: £11 eV (molecules) diamond), Si, transition £8 eV (solids) metals
Covalent
+ Metallic
+ d-
+
Ionic
Examples
+
+ -
+ d-
+ d-
+
Na, Al, transition metals
strong: £8 eV
NaCl; NaBr, KI
+
+ -
+ -
strong: £9 eV
+
Figure 6.28: Very simple summary of the main types of bonding in solids
180
7 Lattice Vibrations 7.1
Introduction
Up to this point in the lecture, the crystal lattice (lattice vectors & position of basis atoms) was always assumed to be immobile, i.e., atomic displacements away from the equilibrium positions of a perfect lattice were not considered. This allowed us to develop formalisms to compute the electronic ground state for an arbitrary periodic but static atomic configuration and therewith the total energy of our system. It is obvious that the assumption of a completely rigid lattice does not make a lot of sense – neither at finite temperatures nor at 0K, where it even violates the uncertainty principle. Nonetheless, we obtained with this simplification a very good description for a wide range of solid state properties, among which were the electronic structure (band structure), the nature and strength of the chemical bonding, and low temperature cohesive properties, e.g., the stable crystal lattice and its lattice constants. The reason for this success is that the internal energy U of a solid (or similarly of a molecule) can be written as U = E static + E vib
,
(7.1)
where E static is the electronic ground state energy with a fixed lattice (on which we have focused so far) and E vib is the additional energy due to lattice motion. In general (and we will substantiate this below), E static ≫ E vib , so that many properties are described in qualitatively correct fashion –and often even semi-quantitatively– even when E vib is neglected. There is, however, also a multitude of solid state properties, for which the consideration of the lattice dynamics is crucial. In this respect, “dynamics” is already quite a big word, because for many of these properties it is sufficient to take small vibrations of the atoms around their (ideal) equilibrium position into account. As we will see below, these vibrations of the crystal can be described as quasi-particles – so called phonons. The most salient examples are: • High-Accuracy Predictions at 0 K: Even at 0 K, the nuclei are always in motion due to quantum-nuclear effects. In turn, this alters the equilibrium properties at 0 K (lattice constants, bulk moduli, band gaps, etc.). For high-accuracy predictions, the nuclear motion can thus not be neglected. • Thermodynamic Equilibrium Properties for T > 0 K: In a rigid and immobile lattice, the electrons are the only degrees of freedom. In metals, only the free-electron-like conduction electrons can take up small amounts of energy (of the order of thermal energies). The specific heat capacity due to these conduction electrons was computed to vary linearly with temperature for T → 0 in exercise 4. In 181
insulators with the same rigid perfect lattice assumption, not even these degrees of freedom are available, since thermal excitations across the band gap Eg ≫ kB T exhibit a vanishingly small probability ∼ exp(−Eg /kB T ). In reality, one observes, however, that for both insulators and metals the specific heat capacity varies predominantly with T 3 at low temperatures, in clear contradiction to the above cited rigid lattice dependency of ∼ T . The same argument holds for the pronounced thermal expansion typically observed in experiments: Due to the negligibly small probability of electronic excitations over the gap, there is nothing left in a rigid lattice insulator that could account for it. These expansions are, on the other hand, key to many engineering applications in material science (different pieces simply have to fit together over varying temperatures). Likewise, a rigid lattice offers no possibility to describe or explain structural phase transitions upon heating and the final melting. • Thermodynamic Non-Equilibrium Properties for T > 0 K: In a perfectly periodic potential, Bloch-electrons suffer no collisions, since they are eigenstates of the many-body Hamiltonian. Formally, this leads to infinite transport coefficients. The thermodynamic nuclear motion breaks this perfect periodicity and thus introduces scattering events and finite mean free paths, which in turn explain the really measured, and obviously finite electric and thermal conductivity of metals and semiconductors.1 Computing and understanding these transport properties and other related non-equilibrium properties (propagation of sound, heat transport, optical absorption, etc.) is of fundamental importance for many real devices (like transistors, LED’s, lasers, etc.).
7.2 7.2.1
Vibrations of a Classical Lattice Adiabatic (Born-Oppenheimer) Approximation
Throughout this chapter we assume that the electrons follow the atoms instantaneously, i.e., the electrons are always in their ground-state at any possible nuclear configuration {RI }. In this case, the potential-energy surface (PES) on which the nuclei move is given by V BO ({RI }) = min E({RI }; Ψ) (7.2) Ψ
for any given atomic structure described by atomic positions {RI } (I ∈ nuclei). This is nothing else but the Born-Oppenheimer approximation, which we discussed at length in the first chapter, and that is why the so defined PES is often also called Born-Oppenheimer surface. Qualitatively, this implies that there is no “memory effect” stored in the electronic motion (e.g. due to the electron cloud lacking behind the atomic motion), since V BO ({RI }) is conservative, i.e., its value at a specific configuration {RI } does not depend on how this configuration was reached by the nuclei.2 1
Please note that defects and boundaries break the perfect symmetry of a crystal as well. However, they are not sufficient to quantitatively describe these phenomena. 2 A breakdown of the Born-Oppenheimer approximations often manifests itself as friction: In that case, the actual value of V ({RI }) depends not only on {RI }, but on the whole trajectory. See for instance Wodtke, Tully, and Auerbach, Int. Rev. Phys. Chem. 23, 513 (2004).
182
The motion of the atoms is then governed by the Hamiltonian (7.3)
H = T nu + V BO ({RI }) ,
which does not contain the electronic degrees of freedom explicitly anymore. The complete interaction with the electrons (i.e. the chemical bonding) is instead contained implicitly in the functional form of the PES V BO ({RI }). For a perfect crystal, the minimum of this PES corresponds to the equilibrium configuration {R◦I } of a Bravais lattice (lattice vectors & position of the basis atoms). In the preceding chapters, we have exclusively discussed these equilibrium configurations {R◦I } in the rigid lattice model. Now, we will also discuss deviations from these equilibrium configurations {R◦I }: In principle, this boils down to evaluating the PES at arbitrary positions {RI }. Practically, this can become rapidly intractable both analytically and computationally. In many cases, however, we do not need to consider all arbitrary configurations {RI }, but only those that are very close to the (pronounced) minimum corresponding to the ideal lattice {R◦I }. This allows us to write RI = R◦I + sI with |sI | ≪ a (a: lattice constant) . (7.4) For such small displacements around a minimum, the PES can be approximated by a parabola using the harmonic approximation.
7.2.2
Harmonic Approximation
The fundamental assumption of the harmonic approximation consists in limiting the description to small displacements sI from the equilibrium configuration {R◦I }, i.e., to vibrations with small amplitudes. Accordingly, this is a low-temperature approximation. In this case, the potential in which the particles are moving can be expanded in a Taylor series around the equilibrium geometry, whereby only the first leading terms are considered. We illustrate this first for the one-dimensional case: The general Taylor expansion around a minimum at x0 yields ∂ 1 ∂2 1 ∂3 2 V (x) = V (x0 ) + V (x) s + V (x) s + V (x) s3 + . . . , (7.5) 2 3 ∂x 2 ∂x 3! ∂x x0 x0 x0 with the displacement s = x − x0 . The linear term vanishes, because x0 is a stationary point of the PES, i.e., the equilibrium geometry in which no forces are active. In the harmonic approximation, cubic (and higher) terms are neglected, which yields 1 2 1 ∂2 2 V (x) ≈ V (x0 ) + V (x) s = V (x ) + cs . (7.6) 0 2 ∂x2 2 x0 Here, we have introduced the spring constant c =
h
∂2 V ∂x2
(x)
i
x0
in the last step, which
shows that we are indeed dealing with a harmonic potential that yields restoring forces F = −
∂ V (x) = −cs ∂x
that are proportional to the actual displacement s. 183
(7.7)
A corresponding Taylor expansion can also be carried out in three dimensions for the Born-Oppenheimer surface of many nuclei, which leads to M,M 3,3 1XX µ ν ∂2 BO BO ◦ BO V ({RI }) = V ({RI }) + s s V ({RI }) . (7.8) 2 I=1, µ=1, I J ∂RIµ ∂RJν {R◦ } J=1
I
ν=1
Here, µ, ν = x, y, z are the three Cartesian coordinates, I J run over the individual nuclei, and the linear term has vanished again for the exact same reasons as in the one-dimensional case. By defining the (3M × 3M ) matrix, i.e., the Hessian ∂2 µν BO V ({RI }) , (7.9) ΦIJ = ∂RIµ ∂RJν {R◦ } I
the Hamilton operator for the M nuclei of the solid can be written as H =
M X I=1
M,M 3,3 PI2 1 X X µ µν ν BO ◦ + V ({RI }) + s Φ s 2Mnu,I 2 I=1 µ=1 I IJ J J=1
,
(7.10)
ν=1
where Mnu,I is the mass of nucleus I. Please note that the contributions of the N elecBO trons is still implicitly contained in V BO ({R◦I }) and Φµν ({R◦I }) IJ . As a matter of fact, V corresponds to the energy of the rigid lattice discussed in previous chapters. Given that it is a constant, it does however not affect the motion of the nuclei, which only depends on Φµν IJ in the harmonic approximation. This also sheds some light on our earlier claim that generally V BO ({R◦I }) = E static ≫ E vib : The thermodynamically driven E vib is typically ≪ 0.5 eV/atom, e.g., approx. 50 meV/atom at room temperature. In comparison, the static term is often much larger, especially in the case of a strong (ionic or covalent) bonding, which often feature cohesive energies > 1 eV/atom (see previous chapter). However, E vib can become a decisive contribution whenever there is only a weak bonding or when there are (even strongly bound) competing phases, the energy difference of which is in the order of E vib . To evaluate the Hamiltonian H defined in Eq. (7.10) we need the (3M )2 second derivatives of the PES at the equilibrium geometry with respect to the 3M coordinates sµI . Analytically, this is very involved, since the Hellmann-Feynman theorem cannot be applied here: In exercise 12, we had seen that the forces acting on a nucleus J can be generally expressed as FJ = −∇RJ V BO ({RI }) = −∇RJ hΨ| H |Ψi ✿0 = − hΨ| ∇RJ H |Ψi − 2hΨ| H |∇RJ Ψi .
(7.11) (7.12)
Here, H is the full many-body Hamiltonian and Ψ = Ψ{RI } ({ri }) is the exact electronic many-body wavefunction in the Born-Oppenheimer approximation (see Chap. 1). Accordingly, evaluating the forces at a specific configuration {RI } only requires to know the wavefunction (density) at this specific configuration, but not its derivative. This is no longer true for higher-order derivatives of V BO ({RI }), e.g., Φµν IJ
dFµI = − ν dRJ
(7.13)
= − hΨ|
∂ 2H ∂H ∂H |Ψi − h∂Ψ/∂RνJ | |∂Ψ/∂RνJ i . (7.14) µ µ |Ψi − hΨ| ν ∂RI ∂RJ ∂RI ∂RµI 184
In this case, the derivative or response of the wavefunction ∂Ψ/∂RνJ to a displacement enters explicitly. Formally, this can be generalized in the 2n + 1 theorem [Gonze and Vigneron, Phys. Rev. B 39, 13120 (1989)]: Evaluating the (2n+1)th derivative of the BornOppenheimer surface requires to know the nth derivative of the wavefunction (density). Essentially, this problem can be tackled by means of perturbation theory [Baroni, de Gironcoli, and Dal Corso, Rev. Mod. Phys. 73, 515 (2001).] In this lecture, we will discuss an alternative route that exploits the three-dimensional analog to Eq. (7.7), i.e., =0
Φµν IJ ≈
FIµ (R◦1 , . . . , R◦J
+
δsµJ , . . . , R◦M )
z }| { − FIµ (R◦1 , . . . , R◦J , . . . , R◦M )
δsJ,µ
.
(7.15)
In this finite difference approach, the second derivative can be obtained numerically by simply displacing an atom slightly from its equilibrium position and monitoring the arising forces on this and all other atoms. This approach will be discussed in more detail in the exercises. At this stage, this force point of view highlights that many elements of the second derivative matrix will vanish: A small displacement of atom I will only lead to non-negligible forces on the atoms in its immediate vicinity. Atoms further away will not notice the displacement, and hence the second derivative between these atoms J and the initially displaced atom I will be zero:3 ◦ ◦ Φµν IJ = 0 for |RI − RJ | ≫ a0
,
(7.16)
where a0 is the lattice parameter. Very often, one even finds that the values of Φµν IJ become negligible for |R◦I − R◦J | > 2a0 . Accordingly, only a finite number of differences (7.15), i.e., only a finite number of neighbours J need to be inspected for each atom I. Despite these simplifications, this numerical procedure is at this stage still unfortunately intractable, because we would have to do it for each I of the M atoms in the solid (i.e. a total of 3M times). With M ∼ 1023 , this is unfeasible, and we need to exploit further properties of solids to reduce the problem. Obviously, just like in the case of the electronic structure problem a much more efficient use of the periodicity of the lattice must be made. Although the lattice vibrations break the perfect periodicity of the static crystal, we can still exploit the translational invariance of the lattice by assuming that periodic images of the unit cell behave in a similar fashion, i.e., break the local symmetry in a similar fashion. Most clearly, this can be understood and derived by inspecting the equations of motion following from the general form of Eq. (7.10).
7.2.3
Classical Equations of Motion
Before further analyzing the quantum-mechanical problem, it is instructive and more intuitive to first discuss the classical Hamilton function corresponding to Eq. (7.10). We will see that the classical treatment emphasizes more the wave-like character of the lattice vibrations, whereas a particle (corpuscular) character follows from the quantum-mechanical 3
In the case of polar crystals with ionic bonding and thus long-range Coulomb interactions this assumption does not hold. In that case, long-range interactions need to be explicitly accounted for, see for instance Baroni, de Gironcoli, and Dal Corso, Rev. Mod. Phys. 73, 515 (2001).
185
treatment. This is not new, but just another example of the wave-particle dualism known, e.g., from light (electromagnetic waves viz. photons). In the classical case and knowing the structure of the Hamilton function from Eq. (7.10), the total energy (as sum of kinetic and potential energy) can be written as E =
M X Mnu,I I=1
2
s˙ 2I
+V
BO
({R◦I })
M,M 3,3 1 X X µ µν ν + s Φ s 2 I=1, µ=1, I IJ J J=1
(7.17)
ν=1
in the harmonic approximation. Accordingly, the classical equations of motion for the nuclei can be written as X µν Mnu,I s¨µI = FµI = − ΦIJ sνJ , (7.18) J,ν
where the right hand side represents the µ-component of the force vector acting on atom I of mass Mnu,I and the double dot above s represents the second time derivative. As a first step, we will get rid of the time dependence by using the well-known ansatz sµI (t) = uµI eiωt
.
(7.19)
for this harmonic second-order differential equation. Inserting Eq. (7.19) into Eq. (7.18) leads to X µν Mnu,I ω 2 uµI = ΦIJ uνJ . (7.20) J,ν
This is a system of coupled algebraic equations with 3M unknowns (still with M ∼ 1023 !). As already remarked above, it is hence not directly tractable in this form, and we exploit that the solid has a periodic lattice structure in its equilibrium geometry (just like in the electronic case). Correspondingly, we decompose the coordinate RI into a Bravais lattice vector Ln (pointing to the unit cell) and a vector Rα pointing to atom α of the basis within the unit cell, (7.21) RµI = Lµn + Rµα . Formally, we can therefore replace the index I over all atoms in the solid by the index pair (n, α) running over the unit cells and different atoms in the basis. With this, the matrix of second derivatives takes the form µν ′ Φµν IJ = Φαβ (n, n ) .
(7.22)
And in this form, we can now start to exploit the periodicity of the Bravais lattice. From the translational invariance of the lattice follows immediately that µν ′ ′ Φµν αβ (n, n ) = Φαβ (n − n ) .
(7.23)
This suggests, that uI may be describable by means of a vibration 1 cµα eiqLn uµα (Ln ) = p Mnu,α 186
(7.24)
where c is the amplitude of displacement and q is a vector in reciprocal space.4 Using this ansatz for Eq. (7.20), we obtain ( ) X X 1 µν 2 µ ′ iq(Ln −Ln′ ) p ω cα = cνβ . (7.25) Φαβ (n − n )e Mα Mβ βν n′
Note that the part in curly brackets is only formally still dependent on the position RI (through the index n). Since all unit cells are equivalent, RI is just an arbitrary zero for the summation over the whole Bravais lattice. The part in curly brackets, X µν 1 µν Φαβ (n − n′ )eiq(Ln −Ln′ ) Dαβ (q) = p Mα Mβ n ′
,
(7.26)
is called dynamical matrix, and its definition allows us to rewrite the homogeneous linear system of equations in a very compact way X µν ω 2 cµα = Dαβ (q)cνβ ⇒ ω 2 c = D(q)c . (7.27) βν
In the last step, we have introduced the vectorial notation c = (cx1 , cy1 , cz1 , cx2 , · · · ) to highlight that we are dealing with an eigenvalue problem here. Compared to the original equation of motion, cf. Eq. (7.20), a dramatic simplification has been reached. Due to the lattice symmetry we achieved to reduce the system of 3Mp × M equations to 3Mp equations, where Mp is the number of atoms in the basis of the unit cell. The price we paid for it is that we have to solve the equations of motion for each q anew. This is comparably easy, though, since this scales linearly with the number of q-points and not cubic like the matrix inversion, and has to be done only for few q (since we will see that the q-dependent functions are quite smooth). From Eq. (7.27), one gets a condition for the existence of solutions to the eigenvalue problem (i.e. solutions must satisfy the following:) ˆ (7.28) det D(q) − ω 2 ˆ1 = 0 . This problem has 3Mp eigenvalues, which are functions of the wave vector q, ω = ωi (q) ,
(7.29)
i = 1, 2, . . . , 3Mp
and to each of these eigenvalues ωi (q), Eq. (7.27) provides the eigenvector ci (q) = (cxi,1 (q), cyi,1 (q), czi,1 (q), cxi,2 (q), · · · ) .
(7.30)
These eigenvectors are determined up to a constant factor, and this factor is finally obtained by requiring all eigenvectors to be normalized (and orthogonal to each other). In conclusion, we find that the displacement sI (t) of atom I = (n, α) away from the equilibrium position at time t is given by
4
X 1 Ai (q)cµi,α (q)ei(qLn −ωi (q)t) sµI (t) = p Mnu,α i,q
,
(7.31)
To distinguish electronic and nuclear degrees of freedom, q and not k is used in the description of lattice vibrations.
187
whereby the amplitudes Ai (q) are determined by the initial conditions. With this, the system of 3M three-dimensional coupled oscillators is transformed into 3M decoupled (but “collective”) oscillators, and the solutions (or so-called normal modes) correspond to waves which propagate throughout the whole crystal. The resemblance of these waves to what we think of as oscillators traditionally is thus only formal. The normal modes form a basis for the description of lattice vibrations, i.e., any arbitrary displacement situation of the atoms can now be written as an expansion in these special solutions.
7.2.4
Comparison between ǫn (k) and ωi (q)
We have derived the normal modes of lattice vibrations exploiting the periodicity of the ideal crystal lattice in a similar way as we did for the electron waves in chapters 4 and 5. Since it is only the symmetry that matters, the dispersion relations ωi (q) will therefore exhibit a number of properties equivalent to those of the electronic eigenvalues ǫn (k). We will not derive these general properties again, but simply list them here (check chapter 4 and 5 to see how they emerge from the periodic properties of the lattice): 1. ωi (q) is periodic in q-space. The discussion can therefore always be restricted to the first Brillouin zone. 2. ǫn (k) and ωi (q) have the same symmetry within the Brillouin zone. Additionally to the space group symmetry of the lattice, there is also time inversion symmetry, yielding ωi (q) = ωi (−q). 3. Due to the use of periodic boundary conditions, the number of q-values is finite. When the solid is composed of M/Mp unit cells, then there are M/Mp different qvalues in the Brillouin-zone. Since i = 1 . . . 3Mp , there are in total 3Mp × (M/Mp ) = 3M values for ωi (q), i.e., exactly the number of degrees of freedom of the crystal lattice. When speaking of solids, M ∼ 1023 so that q and the dispersions relations ωi (q) become essentially continuous. For practical calculation using supercells of finite size, this aspect has however to be kept in mind, as discussed in more detail in the exercises. 4. ωi (q) is an analytic function of q within the Brillouin zone. Whereas the band index n in ǫn (k) can take arbitrarily many values, i in ωi (q) is restricted to 3Mp values. We will see below that this is connected to the fact that electrons are fermions and lattice vibrations (phonons) bosons.
7.2.5
Simple One-dimensional Examples
The difficulty of treating and understanding lattice vibrations is primarily one of bookkeeping. The many indices present in the general formulae of section 7.2.2 and 7.2.3 let one easily forget the basic (and in principle simple) physics contained in them. It is therefore very useful (albeit academic) to consider toy model systems to understand the idea behind normal modes. The simplest possible system representing an infinite periodic lattice (i.e. with periodic boundary conditions) is a linear chain of atoms, and we will see that with the conceptual understanding obtained for one dimension it will be straightforward to generalize to real three-dimensional solids. 188
unit cell a M
s(n-1)
s(n)
s(n+1)
s(n+2)
Figure 7.1: Linear chain with one atomic species (Mass M , and lattice constant a).
7.2.5.1
Linear Chain with One Atomic Species
Consider an infinite one-dimensional linear chain of atoms of identical mass M , connected by springs with (constant) spring constant f as shown in Fig. 7.1. The distance between two atoms in their equilibrium position is a, and s(n) the displacement of the nth atom from this equilibrium position. In analogy to eq. (7.18), the Newton equation of motion for this system is written as ..
M sn = −f (s(n) − s(n + 1)) + f (s(n − 1) − s(n)) ,
(7.32)
i.e., the sum over all atoms breaks down to left and right nearest neighbor, and the second derivative of the PES corresponds simply to the spring constant. We solve this problem as before by treating the displacement as a sinusoidally varying wave: 1 s(n) = √ cei(qan−ωt) M yielding ω 2 M = f 2 − e−iqa − eiqa Solving for ω, one obtains ω(q) = 2
r
(7.33)
,
f qa sin M 2
(7.34)
.
(7.35)
.
Alternatively, one can solve this problem using the dynamical matrix formalism given by Eq. (7.27) and discussed in the previous chapter. Formally, the potential energy terms that depend on sn are V =
f f (s(n − 1) − s(n))2 + (s(n) − s(n + 1))2 , 2 2
(7.36)
which yields the second derivatives ∂ 2V ∂ 2V = = −f ∂s(n)∂s(n − 1) ∂s(n)∂s(n + 1) 189
and
∂ 2V = 2f . ∂s(n)∂s(n)
(7.37)
ω(q) / (f/M)1/2
2.0 1.5 1.0 0.5 0.0 −π/a
0
+π/a
q
Figure 7.2: Dispersion relation ω(q) for the linear chain with one atomic species.
Accordingly, the full Hessian matrix has the structure ............................................ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 . . . 0 −f 2f −f 0 0 0 . . . 0 0 . . . 0 0 −f 2f −f 0 0 . . . 0 Φ= 0 . . . 0 0 0 −f 2f −f 0 . . . 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ............................................
.
(7.38)
Please note that we omitted the cartesian degrees of freedom µ, ν in this one-dimensional case. Since this system has only one atom in the unit cell, it is sufficient to focus on one line of the Hessian (α = 1). This yields the dynamical “matrix” with one entry:
f (2 − eiqa − e−iqa ) . M Accordingly, we get the dispersion relation r f qa ω(q) = 2 sin , M 2 the “eigenvector” c = 1, and thus the displacement field 1 X s(n, t) = √ A(q)ei(qan−ω(q)t) M q
(7.39)
D(q) =
(7.40)
.
(7.41)
As expected, ω is a periodic function of q and symmetric with respect to q and −q. The first period lies between q = −π/a and q = +π/a (first Brillouin zone), and the form of the dispersion is plotted in Fig. 7.2. Equation (7.41) gives the explicit solutions for the displacement pattern in the linear chain for any wave vector q. They describe waves propagating along the chain with phase velocity ω/q and group velocity vg = ∂ω/∂q. Let us look in more detail at the two limiting cases q → 0 and q = π/a (i.e. the center and the border of the Brillouin zone). For q → 0 we can approximate ω(q) given by Eq. (7.35) with its leading term in the small q Taylor expansion (sin(x) ≈ x for x → 0) r ! f ω ≈ a q . (7.42) M 190
s(n-1) = -s(n)
s(n)
s(n+1) = -s(n) s(n+2) = s(n) s(n+3) = -s(n)
s(n+4) = s(n)
Figure 7.3: Snap shot of the displacement pattern in the linear chain at q = +π/a.
ωp is thus linear in q, and the proportionality constant is simply the group velocity vg = a f /M , which is in turn identical to the phase velocity and independent of frequency. For the displacement pattern, we obtain A(q) s(n, t) = √ eiq(an−vg t) M
(7.43)
in this limit, i.e., at these small q-values (long wavelength) the phonons propagate simply as vibrations with constant speed vg (the ordinary sound speed in this 1D crystal!). This is the same result we would have equally obtained within linear elasticity theory, which is understandable because for such very long wavelength displacements the atomic structure and spacing is less important (the atoms move almost identically on a short length scale). Fundamentally, the existence of a branch of solutions whose frequency vanishes as q vanishes results from the symmetry requiring the energy of the crystal to remain unchanged, when all atoms are displaced by an identical amount. We will therefore always obtain such types of modes, i.e., ones with ω(0) = 0, regardless of whether we consider more complex interactions (e.g. spring constants to second nearest neighbors) or higher dimensions. Since their dispersion is close to the center of the Brillouin zone linear in q, which is characteristic of sound waves, these modes are commonly called acoustic branches.5 One of the characteristic features of waves in discrete media, however, is that such a linear behaviour ceases to hold at wavelengths short enough to be comparable to the interparticle spacing. In the present case ω falls below vg q as q increases, the dispersion curve bends over and becomes flat (i.e. the group velocity drops to zero). At the border of the Brillouin zone (for q = π/a) we then obtain 1 s(n, t) = √ eiπn e−iωt M
.
(7.44)
Note that eiπn = 1 for even n, and eiπn = −1 for odd n. This means that neighboring atoms vibrate against each other as shown in Fig. 7.3. This also explains, why in this case the highest frequency occurs. Again, the flattening of the dispersion relation is determined entirely by symmetry: From the repetition in the next Brillouin zone and with kinks forbidden, vg = ∂ω/∂q = 0 must result at the Brillouin zone boundary, regardless of how much more complex and real we make our model. 191
a -a/4
+a/4 M1
s(1,n)
s(2,n)
M2
s(1,n+1)
s(2,n+1)
Figure 7.4: Linear chain with two atomic species (Masses M 1 and M 2). The lattice constant is a and the interatomic distance is a/2.
7.2.5.2
Linear Chain with Two Atomic Species
Next, we consider the case shown in Fig. 7.4, where there are two different atomic species in the linear chain, i.e., the basis in the unit cell is two. The equations of motion for the two atoms follow as a direct generalization of Eq. (7.32) ..
M1 s1 (n) = −f (s1 (n) − s2 (n) − s2 (n − 1)) .. M2 s2 (n) = −f (s2 (n) − s1 (n + 1) − s1 (n))
(7.45) (7.46)
.
Hence, also the same wave-like ansatz should work as before 1 1 s1 (n) = √ c1 ei(q(n− 4 )a−ωt) M1 1 1 s2 (n) = √ c2 ei(q(n+ 4 )a−ωt) M2
(7.47) (7.48)
.
This leads to qa 2f 2f c1 + √ c2 cos 2 M1 M2 M1 M2 2f 2f qa = −√ c2 + √ c1 cos 2 M1 M2 M1 M2
−ω 2 c1 = − √ −ω 2 c2
(7.49) ,
(7.50)
i.e., in this case the frequencies of the two atoms are still coupled to each other. The eigenvalues follow from the determinant qa 2 √ 2f √ 2f − ω − cos 2 M1 M2 M1 M2 (7.51) = 0 , 2f qa 2f 2 −√ √ cos 2 −ω M1 M2 M1 M2 which yields the two solutions 2 ω±
5
s 2 qa 1 1 1 1 4 = f + ±f + − sin2 M1 M2 M1 M2 M1 M2 2 s qa f 1 4 = ±f − sin2 , f f2 M1 M2 2 M M
(7.52)
Notable exceptions do exist: For instance, graphene exhibits a parabolic dispersion in its acoustic modes close to Γ.
192
−1 f= 1 + 1 with the reduced mass M . M1 M2 Again, one can also solve this problem using the dynamical matrix formalism given by Eq. (7.27) and discussed in the previous chapter. Formally, the potential energy terms that depend on s1 (n) and s2 (n) are given by V =
f f f [s2 (n − 1) − s1 (n)]2 + [s1 (n) − s2 (n)]2 + [s2 (n) − s1 (n + 1)]2 , 2 2 2
(7.53)
which yields the second derivatives ∂ 2V ∂ 2V ∂ 2V ∂ 2V = = = = −f ∂s1 (n)∂s2 (n − 1) ∂s1 (n)∂s2 (n) ∂s2 (n)∂s1 (n) ∂s2 (n)∂s1 (n + 1) ∂ 2V ∂ 2V = = 2f . (7.54) ∂s1 (n)∂s1 (n) ∂s2 (n)∂s2 (n) Accordingly, the full Hessian matrix exhibits the structure ............................................ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 . . . 0 −f 2f −f 0 0 0 . . . 0 0 . . . 0 0 −f 2f −f 0 0 . . . 0 Φ= 0 . . . 0 0 0 −f 2f −f 0 . . . 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ............................................
.
(7.55)
Again, cartesian degrees of freedom µ, ν have been omitted for this one-dimensional case. In contrast to our earlier example, this system features two atoms in the unit cell, so that we now get a dynamical matrix with dimension two: ! 2f √ f + exp(−iqa)) − (1 M1 M1 M2 D(q) = . (7.56) 2f − √Mf1 M2 (1 + exp(+iqa)) M2
The respective eigenvalues are the solutions of qa 1 4f 2 1 4 ω − 2f + ω2 + sin2 =0 M1 M2 M1 M2 2 | {z }
(7.57)
f−1 M
that yields the dispersion relation v s u qa uf 1 4 ω± (q) = t ± f − sin2 . f f2 M1 M2 2 M M
(7.58)
We therefore obtain two branches for ω(q), the dispersion of which is shown in Fig. 7.5. As apparent, one branch goes to zero for q → 0, ω− (0) = 0, as in the case of the monoatomic chain, i.e., we have again an acoustic branch. The q other branch, on the other hand, exhibits f interestingly a (high) finite value ω+ (0) = 2 M f for q → 0 and shows a much weaker dispersion over the Brillouin zone. Whereas in the acoustic branch both atoms within the 193
ω(q) / (f/M)1/2
2.0 1.5 1.0 0.5 0.0 −π/a
0
+π/a
q
Figure 7.5: Dispersion relations ω(q) for the linear chain with two atomic species (the specific values are for M1 = 3M2 = M ). Note the appearance of the optical branch not present in Fig. 7.2.
unit cell move in concert, they vibrate against one another in opposite directions in the other solution. In ionic crystals (i.e. where there are opposite charges connected to the two species), these modes can therefore couple to electromagnetic radiation and are responsible for much of the characteristic optical behavior of such crystals. Correspondingly, they are generally (i.e. not only in ionic crystals) called optical modes, and their dispersion is called the optical branch. q 2f At the border of the Brillouin zone at q = π/a we finally obtain the values ω+ = M 1 q 2f and ω− = M2 , i.e. as apparent from Fig. 7.5 a gap has opened up between the acoustic and the optical branch. The width of this gap depends only on the mass difference of the two atoms (M1 vs. M2 ) in this case.6 The gap closes for identical masses of the two species. This is comprehensible since then our model with equal spring constants and equal masses gets identical to the previously discussed model with one atomic species: The optical branch is then nothing but the upper half of the acoustic branch, folded back into the half-sized Brillouin zone.
7.2.6
Phonon Band Structures
With the conceptual understanding gained within the one-dimensional models, let us now proceed to look at the lattice vibrations of a real three-dimensional crystal. In principle, the formulae to describe the dispersion relations for the different solutions are already given in section 7.2.3. Most importantly, it is Eq. (7.28) that gives us these dispersions. In close analogy to the electronic case, the dispersions are called phonon band structure in their totality. The crucial quantity is the dynamical matrix. The different branches of the band structure follow from the eigenvalues after diagonalizing this matrix. This route is also followed in quantitative approaches to the phonon band structure problem, e.g., within density-functional theory. In the so-called direct, supercell, or frozenphonon approach, one exploits the expression of the forces as given in Eq. (7.15) for the computation of the dynamical matrix, since in principle it is nothing more than a massweighted Fourier transform of the force constants. In practice, the different atoms in the 6
In more realistic cases, varying spring constants, e.g., f1 and f2 , would also affect the gap.
194
Figure 7.6: DFT-LDA and experimental phonon band structure (left) and corresponding density of states (right) for fcc bulk Rh [from R. Heid, K.-P. Bohnen and W. Reichardt, Physica B 263, 432 (1999)].
unit cell are displaced by small amounts and the forces on all other atoms are recorded. In order to be able to compute the dynamical matrix with this procedure at different points in the Brillouin zone, larger supercells are employed and the (in principle equivalent) atoms within this larger periodicity displaced differently, so as to account for the different phase behavior. Obviously, in this way only q-points that are commensurate with the supercell periodicity under observation can be accessed, and one runs into the problem that quite large supercells need to be computed in order to get the full dispersion curves. Approaches based on perturbation theory (linear response regime) are therefore also frequently employed: The details of the different methods go beyond the scope of the present lecture, but a good overview can for example be found in R.M. Martin, Electronic Structure: Basic Theory and Practical Methods, Cambridge University Press, Cambridge (2004). Experimentally, phonon dispersion curves are primarily measured using inelastic neutron scattering and Ashcroft/Mermin dedicates an entire chapter to the particularities of such measurements. Overall, phonon band structures obtained by experimental measurements and from DFT-LDA/GGA simulations typically agree extremely well these days, regardless of whether computed by a direct or a linear response method. Instead of discussing the details of either measurement or calculation, we rather proceed to some practical examples. If one has a mono-atomic basis in the three-dimensional unit cell, we expect from the onedimensional model studies that only acoustic branches should be present in the phonon band structure. There are, however, three of these branches because the orientation of the polarization vector (i.e. the eigenvector belonging to the eigenvalue ω(q)), cf. Eqs. (7.30) and (7.31)) matters in three-dimensions. In the linear chain, the displacement was only considered along the chain axis and such modes are called longitudinal. If the atoms vibrate in the three-dimensional case perpendicular to the propagation direction of the wave, two further transverse modes are obtained, and longitudinal and transverse modes must not necessarily exhibit the same dispersion. Fig. 7.6 shows the phonon band structure for bulk Rh in the fcc structure and the phonon density of states (DOS) resulting from it (which is obtained analogously to the electronic structure case by integrating over the 195
Figure 7.7: DFT-LDA and experimental phonon band structure (left) and corresponding density of states (right) for Si (upper panel) and AlSb (lower panel) in the diamond lattice. Note the opening up of the gap between acoustic and optical modes in the case of AlSb as a consequence of the large mass difference. 8 cm−1 ≈ 1 meV [from P. Giannozzi et al., Phys. Rev. B 43, 7231 (1991)].
196
Brillouin zone). Particularly along the lines Γ − X and Γ − L, which run both from the Brillouin zone center to the border, the dispersion of the bands qualitatively follows the form discussed for the linear chain model above and is thus conceptually comprehensible. Turning to a poly-atomic basis, the anticipated new feature is the emergence of optical branches due to vibrations of the different atoms against each other. This is indeed apparent in the phonon band structure of Si in the diamond lattice shown in Fig. 7.7, and we again roughly find the shape that also emerged in the diatomic linear chain model for these branches (check again the Γ − X and Γ − L curves). Since there are in general 3Mp different branches for a basis of Mp atoms, cf. section 7.2.4, 3Mp −3 optical branches result (i.e. in the present case of a diatomic basis, there are 3 optical branches, one longitudinal one and two transverse ones). Although the diamond lattice has two inequivalent lattice sites (and thus represents a case with a diatomic basis), these sites are both occupied by the same atom type in bulk Si. From the discussion of the diatomic linear chain, we would therefore expect a vanishing gap between the upper end of the acoustic branches and the lower end of the optical ones. Comparing with Fig. 7.7 this is indeed the case. Looking at the phonon band structure of AlSb also shown in this figure, a huge gap can now, however, be discerned. Obviously, this is the result of the large mass difference between Al and Sb in this compound, showing that we can indeed understand most of the qualitative features of real phonon band structures from the analysis of the two simple linear chain models. In order to develop a rough feeling for the order of magnitude of lattice vibrational frequencies (or in turn the energies contained in these vibrations), it is helpful to keep the inverse variation of the phonon frequencies with mass in mind, cf. Eq. (7.26). For transition metals, we are talking about a range of 20-30 meV for the optical modes, cf. Fig. 7.6, while for lighter elements this becomes increasingly higher, cf. Fig. 7.7. First row elements like oxygen in solids (oxides) exhibit optical vibrational modes around 7080 meV, whereas the acoustic modes go down to zero at the center of the Brillouin zone in all cases. Thus, all frequencies lie in the range of thermal energies, which already tells us how important these modes will be for the thermal behavior of the solid.
7.3
Quantum Theory of the Harmonic Crystal
In the preceding section we have learnt to compute the lattice vibrational frequencies when the ions are treated as classical particles. In practice, this approximation is not too bad for quite a range of materials properties. The two main reasons are the quite heavy masses of the ions (with the exception of H and He), and the overall rather similar properties of the quantum mechanical and the classical harmonic oscillator on which all of the harmonic theory of lattice vibrations is based. In a very rough view, one can say that if only the vibrational frequency ω(q) of a mode itself is relevant, the classical treatment gives already a good answer (which is ultimately why we could understand the experimentally measured phonon band structure with our classical analysis). As a matter of fact, the formalism introduced in the previous chapter has highlighted that the eigenvalues and -vectors are a property of the harmonic potential itself. If, on the other hand, the energy connected with this mode, ω(q), is important or, more precisely, the way one excites (or populates) this mode is concerned, then the classical picture usually fails. The prototypical example for this is the specific heat due to the lattice vibrations, which is nothing else but the measure of how effectively the vibrational modes of the lattice can 197
take up thermal energy. In order to understand material properties of this kind, we need a quantum theory of the harmonic crystal. Just as in the classical case, let us first recall this theory for a one-dimensional harmonic oscillator before we really address the problem of the full solid-state Hamiltonian.
7.3.1
One-dimensional Quantum Harmonic Oscillator
Instead of just solving the quantum mechanical problem of the one-dimensional harmonic oscillator, the focus of this section will rather be to recap a specific formalism that can be easily generalized to the three-dimensional case of the coupled lattice vibrations afterwards. At first sight, this formalism might appear mathematically quite involved and seems to “overdo” for this simple problem, but in the end, the easy generalization will make it all worthwhile. In accordance with Eq. (7.6), the classical Hamilton function for the one-dimensional harmonic oscillator is given by classic H1D = T +V =
p2 1 p2 mω 2 2 + f s2 = + s 2m 2 2m 2
,
(7.59)
where f = mω 2 is the spring constant and s the displacement. From here, the translation to quantum mechanics proceeds via the general rule to replace the momentum p by the operator ∂ p = −i~ , (7.60) ∂s where p and s satisfy the commutator relation [p, s] = ps − sp = −i~ ,
(7.61)
which is in this case nothing but the Heisenberg uncertainty principle. The resulting Hamiltonian preserves the form as in Eq. (7.59), but p and s have now to be treated as (non-commuting) operators. This structure can be considerably simplified by introducing the lowering operator r r mω 1 a = s + i p , (7.62) 2~ 2~mω and the raising operator a† =
r
mω s − i 2~
r
1 p . 2~mω
(7.63)
Note that the idea behind this pair of adjoint operators leads to the concept called second quantization, where a and a† are denoted as annihilation and creation operator, respectively. The canonical commutation relation of Eq. (7.61) then implies (7.64)
[a, a† ] = 1 , and the Hamiltonian takes the simple form QM H1D
1 † = ~ω a a + 2 198
.
(7.65)
From the commutation relation and the form of the Hamiltonian, it follows that the eigenstates |n> (n = 0, 1, 2, . . .) of this problem fulfill the following relations: (7.66) (7.67) (7.68)
a† |n> = (n + 1)1/2 |n + 1> , a|n> = n1/2 |n − 1> (n 6= 0) , a|0> = 0 .
A derivation of these is given in any decent textbook on quantum mechanics. We see that the operators a and a† lower or raise the excitation state of the system. This explains the names given to these operators, in particular when considering that each excitation can be seen as one quantum: The creation operator creates one quantum, while the annihilation operator wipes one out. With these recursion relations between the different states, the eigenvalues (i.e. the energy levels) of the one-dimensional harmonic oscillator are finally obtained as QM En = < n|H1D |n> = (n + 1/2) ~ω
.
(7.69)
Note, that E0 6= 0, i.e., even in the ground state the oscillator has some energy (so called energy due to zero-point vibrations or zero-point energy), whereas with each higher excited state one quantum ~ω is added to the energy. With the above mentioned picture of creation and annihilation operators, one therefore commonly talks about the state En of the system as corresponding to having excited n phonons of energy ~ω. This nomenclature accounts thus for the discrete nature of the excitation and emphasizes more the corpuscular/particle view. In the classical case, the energy of the system was determined by the amplitude of the vibration and could take arbitrary continuous values. This is no longer possible for the quantum mechanical oscillator.
7.3.2
Three-dimensional Quantum Harmonic Crystal
For the real three-dimensional case, we will now proceed in an analogous fashion as in the simple one-dimensional model. To make the formulae not too complex, we will restrict ourselves to the case of a mono-atomic basis in this section, i.e., all ions have a mass Mnu and Mp = 1. The generalization to polyatomic bases is straightforward, but messes up the mathematics without yielding too much further insight. In section 7.2.2, we had derived the total Hamilton operator of the M nuclei in the harmonic approximation. It is given by M,M 3,3 M X p2I 1 X X µ µν ν BO ◦ H = + V ({RI }) + sI ΦIJ sJ , 2M 2 nu µ=1 I=1 I=1 ν=1
(7.70)
J=1
and the displacement field defined in Eq. (7.31) solves the classical equations of motion. For a mono-atomic basis, we can subsume all time-dependent terms into s˜i (q, t) and define the transformation to normal mode coordinates X sµI (t) = s˜i (q, t)cµi,α (q)ei(qLn ) . (7.71) i,q
199
Inserting this form of the displacement field into Eq. (7.70) yields H vib
= =
3 X 2 X p˜ (q)
3
1 XX Mnu ωi2 (q)˜ s2i (q, t) 2M 2 nu i=1 q i=1 q 3 X X p˜2 (q) Mnu ωi2 (q) 2 i , + s˜i (q, t) 2Mnu 2 i=1 q | {z } i
+
(7.72) (7.73)
Hivib (q)
where we have introduced the generalized momentum p˜i = d/dt s˜2i (q, t) and focused only on the vibrational part of the Hamiltonian, H vib , i.e., we discarded the contribution from the static minimum of the potential energy surface by setting V BO ({R◦I }) = 0. With this transformation, the Hamiltonian has decoupled to a sum over independent harmonic oscillators defined by one-dimensional Hamiltonians Hivib (q) (compare with Eq. (7.59)!). It is thus straightforward to generalize the solution discussed in detail in the last section to this more complicated case. For this purpose, all previously defined operators need to become i- and q-dependent: We define annihilation and creation operators for a phonon (i.e. normal mode) with frequency ωi (q) and wave vector q as s r Mnu ωi (q) 1 ai (q) = s˜i (q) + i p˜i (q) , (7.74) 2~ 2~Mnu ωi (q) s r M ω (q) 1 nu i a†i (q) = si (q) − i p˜i (q) . (7.75) 2~ 2~Mnu ωi (q) Similarly, the Hamiltonian is brought to a form equivalent to Eq. (7.65), i.e., H
vib
=
3 X X i=1
~ωi (q)
q
a†i (q)ai (q)
1 + 2
,
(7.76)
and the eigenvalues generalize to E
vib
=
3 X X i=1
q
1 ~ωi (q) ni (q) + 2
.
(7.77)
The energy contribution of the lattice vibrations therefore stems from $3M$ harmonic oscillators with frequencies $\omega_i(\mathbf q)$. The frequencies are the same as in the classical case, which is (as already remarked in the beginning of this section) why the classical analysis already permitted us to determine the phonon band structure. In the harmonic approximation, the $3M$ modes are completely independent of each other: Any mode can be excited independently and contributes the energy $\hbar\omega_i(\mathbf q)$. While this sounds very similar to the concept of the independent electron gas (with its energy contribution $\epsilon_n(\mathbf k)$ from each state $(n,\mathbf k)$), the fundamental difference is that in the electronic system each state (or mode in the present language) can only be occupied by one electron (or two in the case of non-spin-polarized systems). This results from the Pauli principle, and we talk about fermionic particles. In the phonon gas, on the other hand, each mode can be excited with an arbitrary number of (indistinguishable) phonons, i.e., the occupation numbers $n_i(\mathbf q)$ are not restricted to the values 0 and 1. Phonons are therefore bosonic particles. In a hand-waving manner, the following analogy holds: The individual modes $\hbar\omega_i(\mathbf q)$ are the phonon analogue of the electronic levels $\epsilon_n(\mathbf k)$. In the phononic case, however, the respective occupation numbers $n_i(\mathbf q)$ are not restricted to the values 0 and 1, but can in principle take any integer value in $[0,\infty)$ – depending on the thermodynamic conditions (see next section). For this exact reason, the number of phonons is neither constant nor conserved, but dictated by the energy of the lattice and therewith by the temperature. At $T = 0$ K, the occupation numbers are zero ($n_i(\mathbf q)\equiv 0$). Still, the lattice is not static, since each mode contributes $\hbar\omega_i(\mathbf q)/2$ to the lattice energy from the quantum mechanical zero-point vibrations, as discussed for Eq. (7.77). At increasing temperatures, more and more phonons are created, so that the lattice vibrations are further enhanced and the lattice energy rises.
7.3.3 Lattice Energy at Finite Temperatures
Having obtained the general form of the quantum mechanical lattice energy with Eq. (7.77), the logical next step is to ask what this energy is at a given temperature $T$, i.e., we want to evaluate $E^{\rm vib}(T)$. Looking at the form of Eq. (7.77), the quantity we need to evaluate in order to obtain $E^{\rm vib}(T)$ is obviously $n_i(\mathbf q)$, i.e., the number of excited phonons in the normal mode $(\mathbf q,\omega(\mathbf q))$ as a function of temperature. According to the last two sections, the energy of this mode with $n$ phonons excited is $E_n = (n + 1/2)\,\hbar\omega_i(\mathbf q)$, and consequently the probability for such an excitation at temperature $T$ is proportional to $\exp(-E_n/(k_BT))$. Normalizing (the sum of all probabilities in the denominator must be one), we arrive at
$$P_n(T) = \frac{e^{-\frac{E_n}{k_BT}}}{\sum_l e^{-\frac{E_l}{k_BT}}} = \frac{e^{-\frac{n\hbar\omega_i(\mathbf q)}{k_BT}}}{\sum_{l=0}^{\infty} e^{-\frac{l\hbar\omega_i(\mathbf q)}{k_BT}}} = \frac{x^n}{\sum_{l=0}^{\infty} x^l}\,, \qquad (7.78)$$
where we have defined the shorthand variable $x = \exp(-\hbar\omega_i(\mathbf q)/(k_BT))$ in the last step. From mathematics, the value of the geometric series is known as
$$\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}\,, \qquad (7.79)$$
which simplifies the probability to
$$P_n(T) = x^n\,(1-x)\,. \qquad (7.80)$$
With this probability, the mean energy of the oscillator at temperature $T$ becomes
$$\bar E_i(T,\omega(\mathbf q)) = \sum_{n=0}^{\infty}E_n\,P_n(T) = E_0 + \hbar\omega_i(\mathbf q)\sum_{n=1}^{\infty}n\,P_n(T) = \frac{\hbar\omega_i(\mathbf q)}{2} + \hbar\omega_i(\mathbf q)\,(1-x)\sum_{n=0}^{\infty}n\,x^n\,. \qquad (7.81)$$
Using the relation
$$\sum_{n=0}^{\infty}n\,x^n = \frac{x}{(1-x)^2}\,, \qquad (7.82)$$
Figure 7.8: Bose-Einstein distribution $n_{\rm BE}$ as defined in Eq. (7.84): The left plot shows the frequency dependence for three different temperatures ($T$ = 100, 300, 500 K), whereas the right plot shows the temperature dependence for three different phonon frequencies ($\hbar\omega$ = 10, 25, 40 meV). Note the dashed line that highlights the maximum possible occupation number in a Fermi statistics.
that can be obtained by differentiating Eq. (7.79), we find that the mean energy of this oscillating mode is
$$\bar E_i(T,\omega(\mathbf q)) = \hbar\omega_i(\mathbf q)\left(\frac{1}{e^{\hbar\omega_i(\mathbf q)/(k_BT)} - 1} + \frac{1}{2}\right)\,. \qquad (7.83)$$
Comparing with the energy levels of the mode, $E_n = \hbar\omega_i(\mathbf q)(n + 1/2)$, we can identify
$$\bar n_i(\mathbf q) = \frac{1}{e^{\hbar\omega_i(\mathbf q)/(k_BT)} - 1} \qquad (7.84)$$
as the mean excitation (or occupation with phonons) of this particular mode at $T$. This is simply the famous Bose-Einstein statistics, consistent with the remark above that phonons behave as bosonic particles. Knowing the energy of one mode at $T$, cf. Eq. (7.83), it is easy to generalize over all modes and arrive at the total energy due to all lattice vibrations at temperature $T$:
$$E^{\rm vib}(T) = \sum_{i=1}^{3M_p}\sum_{\mathbf q}\hbar\omega_i(\mathbf q)\left(\frac{1}{e^{\hbar\omega_i(\mathbf q)/(k_BT)} - 1} + \frac{1}{2}\right)\,. \qquad (7.85)$$
This is still a formidable expression, and we notice that the mean occupation of mode $i$ with frequency $\omega_i(\mathbf q)$ and wave vector $\mathbf q$ does not explicitly depend on $\mathbf q$ itself (only indirectly via the dependence of the frequency on $\mathbf q$). It is therefore convenient to reduce the summations over modes and wave vectors to one simple frequency integral, by introducing a density of normal modes (or phonon density of states) $g(\omega)$, defined so that $g(\omega)$ is the total number of modes with frequency $\omega$ per volume,
$$g(\omega) = \frac{1}{(2\pi)^3}\sum_i\int{\rm d}\mathbf q\;\delta\left(\omega - \omega_i(\mathbf q)\right)\,. \qquad (7.86)$$
This obviously implies that
$$\int_0^\infty {\rm d}\omega\; g(\omega) = 3M_p/V\,, \qquad (7.87)$$
i.e., the integral over all modes yields the total number of modes per volume, which is three times the number of atoms per volume, cf. Sec. 7.2.4. With this definition of $g(\omega)$, the total energy due to all lattice vibrations at temperature $T$ of a solid of $M_p$ atoms in a volume $V$ takes the simple form
$$E^{\rm vib}(T) = V\int_0^\infty {\rm d}\omega\; g(\omega)\left[\frac{\hbar\omega}{e^{\hbar\omega/(k_BT)} - 1} + \frac{\hbar\omega}{2}\right]\,, \qquad (7.88)$$
i.e., the complete information about the specific vibrational properties of the solid is contained in the phonon density of states $g(\omega)$. Once this quantity is known, either from experiment or from DFT (see the examples in Figs. 7.6 and 7.7), the contribution from the lattice vibrations can be evaluated immediately. Via thermodynamic relations, the entropy due to the vibrations is then accessible as well, allowing one to compute the complete contribution from the phonons to thermodynamic quantities like the free energy or the Gibbs free energy.
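To make this last step concrete, the following minimal Python sketch evaluates Eq. (7.88) by numerical integration over a tabulated phonon density of states. The quadratic toy DOS and all numerical values are placeholder assumptions standing in for an experimental or DFT-computed $g(\omega)$; only the integrand itself follows directly from Eq. (7.88).

    import numpy as np

    HBAR = 6.582119569e-16   # hbar in eV*s
    KB = 8.617333262e-5      # Boltzmann constant in eV/K

    def vibrational_energy_density(omega, g, T):
        """Eq. (7.88) divided by V: vibrational energy per volume in eV.
        omega: uniform grid of angular frequencies in 1/s; g: DOS on that grid."""
        hw = HBAR * omega                     # phonon energies in eV
        n_be = 1.0 / np.expm1(hw / (KB * T))  # Bose-Einstein occupation, Eq. (7.84)
        integrand = g * hw * (n_be + 0.5)     # thermal + zero-point contribution
        return float(np.sum(integrand) * (omega[1] - omega[0]))

    # quadratic toy DOS standing in for an experimental or DFT g(omega)
    omega_D = 5.0e13                          # assumed cutoff frequency in 1/s
    omega = np.linspace(omega_D / 2000, omega_D, 2000)
    g = omega**2 / np.sum(omega**2 * (omega[1] - omega[0]))  # normalized to one mode per volume

    for T in (100.0, 300.0, 500.0):
        print(f"T = {T:5.0f} K   E_vib/V = {vibrational_energy_density(omega, g, T):.4e} eV")

With a realistic $g(\omega)$ on the grid, the same few lines directly yield the vibrational energy entering free energies and specific heats.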
7.3.4 Phonon Specific Heat
We will discuss the phonon specific heat as a representative example of how the consideration of vibrational states improves the description of those quantities that were listed as failures of the static lattice model in the beginning of this chapter. The phonon specific heat $c_V$ measures how well the crystal can take up energy with increasing temperature, and we had stated in the beginning of this chapter that the absence of low-energy excitations in a rigid-lattice insulator would lead to a vanishingly small specific heat at low temperatures, in clear contradiction to the experimental data. Taking lattice vibrations into account, it is obvious that they, too, may be excited with temperature. In addition, we had seen that lattice vibrational energies are of the same order of magnitude as thermal energies, so that the possible excitation of vibrations in the lattice should yield a strong contribution to the total specific heat of any solid. The crucial point to notice is, however, that vibrations cannot take up energy continuously (in the form of gradually increasing amplitudes) in a quantum-mechanical formalism. Instead, only discrete excitations $\hbar\omega_i(\mathbf q)$ of the phonon modes are possible, and we will see that this has profound consequences for the low-temperature behavior of the specific heat. That the resulting functional form of the specific heat at low temperatures agrees with the experimental data was one of the earliest triumphs of the quantum theory of solids. In general, the specific heat at constant volume is defined as
$$c_V(T) = \frac{1}{V}\left(\frac{\partial U}{\partial T}\right)_V\,, \qquad (7.89)$$
i.e., it measures (normalized per volume) the amount of energy taken up by a solid when the temperature is increased.
Figure 7.9: Qualitative form of the specific heat of solids as a function of temperature (here normalized to a temperature $\Theta_D$ below which quantum effects become important). At high temperatures, $c_V$ approaches the constant value predicted by Dulong-Petit, while it drops to zero at low temperatures.
If we want to specifically evaluate the contribution from the lattice vibrations using Eq. (7.88), we get the general expression
$$c_V^{\rm vib}(T) = \frac{1}{V}\left(\frac{\partial E^{\rm vib}}{\partial T}\right)_V = \frac{\partial}{\partial T}\int_0^\infty{\rm d}\omega\;g(\omega)\left[\frac{\hbar\omega}{e^{\hbar\omega/(k_BT)} - 1} + \frac{\hbar\omega}{2}\right] = \frac{\partial}{\partial T}\int_0^\infty{\rm d}\omega\;g(\omega)\,\frac{\hbar\omega}{e^{\hbar\omega/(k_BT)} - 1}\,. \qquad (7.90)$$
In the last step, we have dropped the temperature-independent term associated with the zero-point vibrations. If one plugs in the phonon densities of states from experiment or DFT, the calculated $c_V^{\rm vib}$ agrees excellently with the measured specific heats of solids, proving that indeed the overwhelming contribution to this quantity comes from the lattice vibrational modes. For most elements, the curves have roughly the form shown in Fig. 7.9. In particular, $c_V$ approaches a constant value at high temperatures, while in the lowest temperature range it scales with $T^3$ (in metals, this scaling is in fact $\alpha T + \beta T^3$, because there is in addition a small contribution from the conduction electrons, which you calculated in exercise 4). But let us again try to see if we can understand these generic features on the basis of the theory developed in this chapter (and not simply accept the data coming out of DFT or experiment).
Figure 7.10: Left: Illustration of the dispersion curves behind the Einstein and Debye models (dashed curves). Einstein modes approximate optical branches, while the Debye model focuses on the acoustic branches that become particularly relevant at low temperatures. Right: Specific heat curves obtained with both models.
7.3.4.1 High-Temperature Limit (Dulong-Petit Law)
In the high-temperature limit, we have $k_BT \gg \hbar\omega$ and can therefore expand the exponential in the denominator in Eq. (7.90):
$$c_V^{\rm vib}(T\to\infty) = \frac{\partial}{\partial T}\int_0^\infty{\rm d}\omega\;g(\omega)\left[\frac{\hbar\omega}{e^{\hbar\omega/(k_BT)} - 1}\right] = \frac{\partial}{\partial T}\int_0^\infty{\rm d}\omega\;g(\omega)\left[\frac{\hbar\omega}{\left(1 + \frac{\hbar\omega}{k_BT} + \dots\right) - 1}\right] = \frac{\partial}{\partial T}\int_0^\infty{\rm d}\omega\;g(\omega)\,(k_BT) = 3k_B\,\frac{M_p}{V} = {\rm constant}\,. \qquad (7.91)$$
In the last step, we exploited that $g(\omega)$ is normalized to the total number of phonon states as given in Eq. (7.87). We therefore find that the specific heat indeed becomes constant at high temperatures, and the absolute value of $c_V^{\rm vib}$ is entirely given by the available degrees of freedom in the solid ($M_p/V$). This was long known as the Dulong-Petit law from classical statistical mechanics, i.e., from the equipartition theorem. In this high-temperature limit, we recover this behaviour because at such high temperatures the quantized nature of the vibrations no longer shows up, i.e., $\hbar\omega \ll k_BT$. Therefore, energy can be taken up in a quasi-continuous manner, and we get the classical result that each degree of freedom that enters the Hamiltonian quadratically contributes $\frac{k_BT}{2}$ to the energy of the solid.

7.3.4.2 Intermediate Temperature Range (Einstein Approximation, 1907)
The constant $c_V^{\rm vib}$ obtained in the high-temperature limit is not surprising, since it coincides with what we expect from classical statistical mechanics. This classical understanding is, however, not able to grasp why the specific heat deviates from the constant value predicted by the Dulong-Petit law at intermediate and low temperatures. This lowering of
$c_V^{\rm vib}$ can only be understood starting from quantum theory, by taking the discrete, quantized nature of the vibrational excitations into account. The simplest historic model that goes beyond a classical theory and considers this quantized nature of the oscillations stems from Einstein. He ignored all interactions between the vibrating atoms in the solid and assumed that all three degrees of freedom for atoms of one species vibrate with exactly one characteristic frequency $\omega_E$ (independent harmonic oscillators). Without interactions, the dispersion curves of the various phonon branches become simply constant in such an Einstein approximation. Accordingly, they can be regarded as a model for optical branches (see Fig. 7.10). For a mono-atomic solid, the phonon density of states in the Einstein model is given by
$$g(\omega)^{\rm Einstein} = 3\,\frac{M_p}{V}\,\delta(\omega - \omega_E)\,, \qquad (7.92)$$
where $\omega_E$ is this characteristic frequency (often called the Einstein frequency) with which all atoms vibrate. Inserting this form of $g(\omega)$ into Eq. (7.90) leads to
$$c_V^{\rm Einstein}(T) = 3k_B\,\frac{M_p}{V}\left[\frac{x_E^2\,e^{x_E}}{(e^{x_E}-1)^2}\right]\,. \qquad (7.93)$$
Here, the shorthand variable $x_E = \hbar\omega_E/(k_BT)$ expresses the ratio between vibrational and thermal energy. Eq. (7.93) already yields a fundamentally important result: In the high-temperature limit ($x_E \ll 1$), the term in brackets approaches unity and we recover the Dulong-Petit law; at intermediate temperatures, however, this term declines and finally drops to zero for $x_E \gg 1$ (i.e., $T \to 0$). Accordingly, the resulting specific heat curve shown in Fig. 7.10 exhibits almost all qualitative features of the experimental data. This is a big success for such a simple model and was a sensation a hundred years ago! To be more specific, the $c_V^{\rm Einstein}(T)$ curve starts to deviate significantly from the Dulong-Petit value for $x_E \sim 1$, i.e., for temperatures $\Theta_E \sim \hbar\omega_E/k_B$. $\Theta_E$ is known as the Einstein temperature, and for temperatures above it, all modes can easily be excited and a classical behavior is obtained. For temperatures of the order of or below $\Theta_E$, however, the thermal energy is of the same order of magnitude as the quantized excitations, so that these excitations can no longer be regarded as quasi-continuous. As a matter of fact, the discrete nature of the vibrational energy starts to show up, and one says that the mode starts to “freeze out”. In other words, any phonon mode whose energy $\hbar\omega$ is much greater than $k_BT$ cannot be excited and thus no longer contributes anything of note to the specific heat. At $T \to 0$ K, the specific heat thus approaches zero. Yet, precisely this limit is not obtained properly in the Einstein model. From Eq. (7.93) we see that the specific heat approaches the zero-temperature limit exponentially, which is not the proper $T^3$ scaling known from experiment. The reason for this failure lies, of course, in the gross simplification of the phonon density of states. Approximating all modes to have the same (dispersionless) frequency is not too bad for the treatment of the optical phonon branches, as illustrated in Fig. 7.10. If the characteristic frequency is chosen properly in the range of the optical branches, the specific heat from this model in fact reproduces the experimental or properly computed $c_V^{\rm vib}$ quite well in the intermediate temperature range. At these temperatures, the specific heat is thus indeed dominated by the contributions from the optical modes. At lower temperatures, these modes freeze out rapidly, so that the Einstein model predicts a rapid exponential decline of $c_V^{\rm Einstein}$ to zero. While the optical branches are indeed unimportant at low temperatures, the low-energy acoustic branches
still contribute, though. And it is the contribution from the latter that gives rise to the $T^3$ scaling for $T \to 0$ K. Since the Einstein model does not describe acoustic dispersion and low-frequency modes at all, it fails to reproduce this limit.
7.3.4.3 Low-Temperature Limit (Debye Approximation, 1912)
To properly address the low-temperature limit, we need a model for the acoustic branches, since these are the lowest-energy modes that can still be excited thermally for $T \to 0$ K. More specifically, the part of the acoustic branches close to the Brillouin zone center, cf. Fig. 7.10, needs to be considered. In section 7.2.5 we had discussed that in this region the dispersion is linear in $|\mathbf q|$, and that for the corresponding very long wavelength modes the solid behaves like a continuous elastic medium. This is the conceptual idea behind the Debye model, which approximates the dispersion of these modes by $\omega_i(\mathbf q) = v_g|\mathbf q|$, where $v_g$ (the group velocity) is the speed of sound that can be obtained from linear elasticity theory. Note that we are only going to discuss a minimalistic Debye model here, because even in a perfectly isotropic medium there would be at least different group velocities for longitudinal and transverse waves, while in real solids this gets even more differentiated. To illustrate the idea behind the Debye model, all of this is neglected and we only use one generic speed of sound for all three acoustic phonon branches. The important aspect of the Debye model is in any case not this material constant $v_g$, but the linear scaling of the dispersion curves. With this linear scaling, one obtains the phonon density of states
$$g(\omega)^{\rm Debye} = \frac{3}{(2\pi)^3}\int{\rm d}\mathbf q\;\delta(\omega - v_g|\mathbf q|) \qquad (7.94)$$
$$= \frac{3}{2\pi^2}\int{\rm d}q\;q^2\,\delta(\omega - v_g q) \qquad (7.95)$$
$$= \frac{3}{2\pi^2}\,\frac{\omega^2}{|v_g|^3} \qquad (7.96)$$
$$\Rightarrow\; \frac{3}{2\pi^2}\,\frac{\omega^2}{|v_g|^3}\,\theta(\omega_D - \omega)\,. \qquad (7.97)$$
Since this quadratic form would be unbounded for $\omega\to\infty$, a cutoff frequency $\omega_D$ (the Debye frequency) is introduced in the last step. The value of $\omega_D$ is determined by Eq. (7.87), i.e., by requiring that the integral over $g(\omega)^{\rm Debye}$ yields the correct number of available vibrational states per volume ($3M_p/V$ in this case). Accordingly, one gets
$$\omega_D = \sqrt[3]{6\pi^2\,|v_g|^3\,\frac{M_p}{V}}\,, \qquad (7.98)$$
which allows one to define a Debye temperature $\Theta_D = \hbar\omega_D/k_B$ in close analogy to the Einstein temperature introduced above.
In such a Debye model, the specific heat defined in Eq. (7.90) is given by
$$c_V^{\rm Debye} = k_B\int{\rm d}\omega\;g(\omega)^{\rm Debye}\left(\frac{\hbar\omega}{k_BT}\right)^2\frac{\exp\left(\frac{\hbar\omega}{k_BT}\right)}{\left[\exp\left(\frac{\hbar\omega}{k_BT}\right)-1\right]^2} \qquad (7.99)$$
$$= \frac{3k_B}{2\pi^2|v_g|^3}\int_0^{\omega_D}{\rm d}\omega\;\omega^2\,\frac{x^2\exp(x)}{(\exp(x)-1)^2} \qquad (7.100)$$
$$= 9k_B\,\frac{M_p}{V}\left(\frac{T}{\Theta_D}\right)^3\int_0^{\Theta_D/T}{\rm d}x\;\frac{x^4\,e^x}{(e^x-1)^2}\,. \qquad (7.101)$$
As was the case in the Einstein model, the Dulong-Petit value is again obtained properly in the Debye model in the high-temperature limit (at least for the three acoustic degrees of freedom). Also, $c_V^{\rm vib}$ declines to zero at low temperatures $T\to0$. As immediately apparent from the equation above, the scaling in this limit is now correctly given by $T^3$. As expected, the Debye model thus recovers the correct low-temperature limit. However, it fails for the intermediate temperature range: As illustrated in Fig. 7.10, the Debye model substantially deviates from the Einstein model, which reproduces the experimental data in this range very well. Obviously, the reason is that at these temperatures the optical branches get noticeably excited, but these optical branches are essentially not accounted for in the Debye model, cf. Fig. 7.10. The specific heat data of a real solid can therefore be well understood from the three ranges just discussed. At low temperatures, only the acoustic modes contribute noticeably to $c_V$ and the Debye model provides a good description. With increasing temperature, optical modes start to become excited, too, and the $c_V$ curve changes gradually to the form predicted by the Einstein model. Finally, at the highest temperatures all modes contribute quasi-continuously and the classical Dulong-Petit value is approached. Similar to the role of $\Theta_E$ in the Einstein model, the Debye temperature $\Theta_D$ also indicates the temperature range above which classical behavior sets in and below which modes begin to freeze out due to the quantized nature of the lattice vibrations. Debye temperatures are normally obtained by fitting the predicted $T^3$ form of the specific heat to low-temperature experimental data. They are tabulated for all elements and are typically of the order of a few hundred Kelvin, which is thus the natural temperature scale for (acoustic) phonons. Therefore, the Debye temperature plays the same role in the theory of lattice vibrations as the Fermi temperature $T_F$ plays in the theory of metals, separating the low-temperature region where quantum statistics must be used from the high-temperature region where classical statistical mechanics is valid. For conduction electrons, $T_F$ is of the order of 20000 K, though, i.e., at actual temperatures we are always well below $T_F$ in the electronic case, while for phonons both classical and quantum regimes can be encountered. This large difference between $T_F$ and $\Theta_D$ is also the reason why the contribution from the conduction electrons to the specific heat of metals is only noticeable at the lowest temperatures. In this regime, $c_V^{\rm el}$ scales linearly with $(T/T_F)$ (cf. exercise 4). The phonon contribution scales with $(T/\Theta_D)^3$, though. If we equate the two relationships, we see that the conduction
Figure 7.11: Measured specific heats at constant volume cV and at constant pressure cP for aluminum (from Brooks and Bingham, J. Phys. Chem. Sol. 29, 1553 (1968)).
electron contribution will show up only for temperatures
$$T \sim \sqrt{\frac{\Theta_D^3}{T_F}}\,, \qquad (7.102)$$
i.e., for temperatures of the order of a few Kelvin. Finally, we note that comparing the tabulated values of $\Theta_D$ with measured melting temperatures, a rough correlation between the two quantities is found ($\Theta_D$ is about 30-50% of the melting temperature). The Debye temperature can therefore also be seen as a measure of the stiffness of the material, and thus reflects the critical role that lattice vibrations also play in the process of melting (which is one of the other properties listed in the beginning of this chapter for which a static lattice model obviously has to fail). For the exact description of melting, however, we come to a temperature range where the initially made assumption of small vibrations around the PES minimum, and with that the harmonic approximation, is certainly no longer valid. Melting is therefore one of the materials properties for which it is necessary to go beyond the theory of lattice vibrations developed so far in this chapter.
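Eqs. (7.93) and (7.101) are simple enough to compare numerically. The following Python sketch evaluates the specific heat of both models in units of the Dulong-Petit value $3k_B M_p/V$; the common characteristic temperature of 300 K is an arbitrary, illustrative choice.

    import numpy as np
    from scipy.integrate import quad

    def cv_einstein(T, theta_E):
        """Eq. (7.93) in units of the Dulong-Petit value 3*kB*Mp/V."""
        x = theta_E / T
        return x**2 * np.exp(x) / np.expm1(x)**2

    def cv_debye(T, theta_D):
        """Eq. (7.101) in the same units; the integrand vanishes like x^2 at x -> 0."""
        f = lambda x: x**4 * np.exp(x) / np.expm1(x)**2
        integral, _ = quad(f, 1e-10, theta_D / T)
        return 3.0 * (T / theta_D)**3 * integral

    theta = 300.0   # characteristic temperature in K (illustrative)
    for T in (10.0, 50.0, 100.0, 300.0, 1000.0):
        print(f"T = {T:6.1f} K   Einstein: {cv_einstein(T, theta):.3f}   "
              f"Debye: {cv_debye(T, theta):.3f}")

Both curves approach 1 (the Dulong-Petit value) at high temperatures, while at low temperatures the Debye result falls off as $T^3$ and the Einstein result drops exponentially, exactly as discussed above.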
7.4 Anharmonic Effects in Crystals
So far, our general theory of lattice vibrations has been based on the initial assumption that the amplitudes of the oscillations are quite small. This has led us to the so-called harmonic approximation, where higher-order terms beyond the quadratic one are neglected in the Taylor expansion around the PES minimum. This approximation is plausible at low temperatures, keeps the mathematics tractable, and can indeed explain a wide range of material properties. We had exemplified this in the last section for a fundamental equilibrium property like the specific heat. Similarly, we would find that the developed theory is able to correctly reproduce and predict other thermodynamic equilibrium properties, e.g., free energies, at low and intermediate temperatures. At elevated temperatures, however, this theoretical framework becomes inaccurate: As shown in Fig. 7.11 for aluminum, the specific heat $c_V$ is not a simple constant in the high-temperature regime. Under these thermodynamic conditions, our initial assumption of small oscillations obviously breaks down, so that higher-order terms in the Taylor expansion –so-called anharmonic terms– need to be accounted for. At elevated temperatures, these anharmonic effects often quantitatively change the predictions made by a harmonic theory. Furthermore, there are also a number of materials properties that cannot be understood within a harmonic theory of lattice vibrations at all. The most important of these are the thermal expansion of crystals and heat transport.
7.4.1 Thermal Expansion
In a rigorously harmonic crystal, the equilibrium size would not depend on temperature. One can prove this formally (rather lengthy, see Ashcroft and Mermin), but it already becomes intuitively clear by looking at the equilibrium position in an arbitrary one-dimensional PES minimum. If we expand the Taylor series only to second order around the minimum, the harmonic potential is symmetric to both sides. Already from symmetry it is therefore clear that the average equilibrium position of a particle vibrating around this minimum must, at any temperature, coincide with the minimum position itself in the harmonic approximation. If the PES minimum represents the bond length (regardless of whether in a molecule or a solid), no change will therefore be observed with temperature. Only anharmonic terms, at least the cubic one, can yield an asymmetric potential and therewith a temperature-dependent equilibrium position. With this understanding, it is also obvious that the temperature dependence of the elastic constants of solids, e.g., the bulk modulus, must result from anharmonic effects. Let us consider the thermal expansion coefficient $\alpha$ in more detail. For an isotropic expansion of a cube of volume $V$, the expansion coefficient can be expressed as
$$\alpha = \frac{1}{3V}\left(\frac{\partial V}{\partial T}\right)_P = \frac{1}{3B}\left(\frac{\partial P}{\partial T}\right)_V \qquad (7.103)$$
by using the thermodynamic definition of the bulk modulus,
$$\frac{1}{B} = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T\,. \qquad (7.104)$$
Since we know that the pressure $P$ can be written as
$$P = -\left(\frac{\partial U}{\partial V}\right)_T\,, \qquad (7.105)$$
i.e., as the volume derivative of the internal energy $U$ given in Eq. (7.85), the lattice expansion $\alpha$ can be calculated via
$$\alpha = \frac{1}{3B}\sum_{i,\mathbf q}\left[-\frac{\partial\hbar\omega_i(\mathbf q)}{\partial V}\right]_T\left[\frac{\partial n_i(\mathbf q)}{\partial T}\right]_V\,. \qquad (7.106)$$
Here, $n_i(\mathbf q)$ is the Bose-Einstein distribution for mode $i$ at wave vector $\mathbf q$. Since its derivative also appears in the definition of the specific heat $c_V(T)$, it is common to express the thermal expansion coefficient in terms of the specific heat. For this purpose, mode-specific specific heats
$$c_V(i,\mathbf q,T) = \frac{\hbar\omega_i(\mathbf q)}{V}\,\frac{\partial n_i(\mathbf q)}{\partial T} \qquad (7.107)$$
and mode-specific Grüneisen parameters
$$\gamma_i(\mathbf q) = -\frac{V}{\omega_i(\mathbf q)}\,\frac{\partial\omega_i(\mathbf q)}{\partial V} = -\frac{\partial\ln(\omega_i(\mathbf q))}{\partial\ln(V)} \qquad (7.108)$$
are introduced. This yields
$$\alpha = \frac{1}{3B}\sum_{i,\mathbf q}\gamma_i(\mathbf q)\,c_V(i,\mathbf q,T) = \frac{\gamma(T)\,c_V(T)}{3B}\,. \qquad (7.109)$$
In the last step, we have used the definition of the overall Grüneisen parameter
$$\gamma(T) = \frac{\sum_{i,\mathbf q}\gamma_i(\mathbf q)\,c_V(i,\mathbf q,T)}{\underbrace{\sum_{i,\mathbf q}c_V(i,\mathbf q,T)}_{c_V(T)}} \qquad (7.110)$$
to further simplify the equation. The above discussion, and especially Eq. (7.110), shows that in a first-order approximation ($\gamma(T)\approx$ constant, e.g., $\gamma\approx1.6$ for NaCl and $\gamma\approx1.5$ for KBr), $\alpha(T)$ will exhibit a similar temperature dependence as $c_V$. Specifically, $\alpha(T)$ approaches zero as $T^3$ and becomes constant at high $T$. Notable exceptions exist, though, e.g., the negative lattice expansion of silicon that is discussed in the exercises. In this case, the approximation $\gamma(T)\approx$ constant also has its limits: the actual dependence of the Grüneisen parameter on temperature (and volume) needs to be additionally accounted for. This is also true at high temperatures and/or in the case of strong anharmonicities (see exercises).
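As a rough numerical illustration of Eq. (7.109), the following sketch estimates $\alpha(T)$ for an NaCl-like solid from a constant Grüneisen parameter and the Debye specific heat of Sec. 7.3.4.3. The bulk modulus, atom density, and Debye temperature below are assumed order-of-magnitude input values, not fitted material data.

    import numpy as np
    from scipy.integrate import quad

    KB = 1.380649e-23   # Boltzmann constant in J/K

    def cv_debye_volumetric(T, theta_D, n):
        """Debye specific heat per volume, Eq. (7.101), in J/(K m^3);
        n is the atom density Mp/V in 1/m^3."""
        f = lambda x: x**4 * np.exp(x) / np.expm1(x)**2
        integral, _ = quad(f, 1e-10, theta_D / T)
        return 9.0 * n * KB * (T / theta_D)**3 * integral

    gamma = 1.6          # overall Grueneisen parameter (NaCl-like)
    B = 2.4e10           # bulk modulus in Pa (assumed)
    n = 4.5e28           # atom density in 1/m^3 (assumed)
    theta_D = 320.0      # Debye temperature in K (assumed)

    for T in (50.0, 150.0, 300.0):
        alpha = gamma * cv_debye_volumetric(T, theta_D, n) / (3.0 * B)  # Eq. (7.109)
        print(f"T = {T:5.0f} K   alpha = {alpha:.2e} 1/K")

At 300 K this yields $\alpha$ of a few $10^{-5}$ K$^{-1}$, the right order of magnitude for alkali halides, and $\alpha$ inherits the $T^3$ freeze-out of $c_V$ at low temperatures.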
7.4.2 Heat Transport
Heat transport in solid, isotropic materials is described by Fourier's law,
$$\mathbf J = -\kappa\,\nabla T\,. \qquad (7.111)$$
Whenever a temperature gradient $\nabla T$ is present, a heat flux $\mathbf J$ develops that drives the system back to equilibrium. The material-specific, temperature-dependent constant that relates $\mathbf J$ and $\nabla T$ is the thermal conductivity $\kappa$. Heat can be transported through a crystal either by electrons (metals) or by phonons (all solids).⁷ We will discuss electronic transport in more detail in a later chapter. Here, we focus on the lattice vibrations, which are the dominant contribution in semiconductors and insulators. Heat in the form of a once-created wave packet of lattice vibrations would travel indefinitely in a rigorously harmonic crystal: This becomes evident from the fact that the
⁷ For transparent materials at elevated temperatures, i.e., when a material is already incandescent, photons can play a role, too.
Figure 7.12: Left: Schematic one-dimensional representation of the Taylor expansion for an anharmonic potential-energy surface $V^{\rm BO}$: Both the harmonic approximation (green) as well as a third-order expansion (7.112) that additionally includes a cubic term (orange) are shown. Right: Computed thermal conductivity $\kappa$ of silicon (red) and germanium (blue) compared to experiment (full symbols), from Broido et al., Appl. Phys. Lett. 91, 231922 (2007).
energy stored in each individual mode is time-independent, i.e., the amplitudes $A_i(\mathbf q)$ in the classical case (see Eq. (7.31)) and the occupation numbers $n_i(\mathbf q)$ in a quantum-mechanical formalism (see Eq. (7.77)) are constant. This is not surprising: These oscillations are eigenstates of the harmonic Hamiltonian and thus do not decay over time, but are completely determined by the initial conditions. Consequently, a perfectly harmonic solid would exhibit an infinite thermal conductivity. To explain the finite thermal conductivity of insulators and semiconductors, anharmonic terms beyond the quadratic one must be considered in the PES expansion, e.g., by extending the Taylor expansion in Eq. (7.8) to third order:
BO
({RI })
≈ +
1 X X µ ν ∂ 2 V BO ({RI }) V + s s 2 I,J µ,ν I J ∂RIµ ∂RJν {R◦I } 1 X X µ ν η ∂ 3 V BO ({RI }) s s s . η 6 I,J,K µ,ν,η I J K ∂RIµ ∂RJν ∂RK R◦I | {z } BO
({R◦I })
(7.112) (7.113)
Ψµνη IJK
As shown in Fig. 7.12, such an approximation obviously improves the description of the PES in the region close to the actual minimum with respect to a pure harmonic potential. Still, inaccuracies arise whenever the system is not close to equilibrium anymore: In particular, this third-order potential yields qualitatively wrong results in this limit, i.e., it is unbound V BO ({RI }) → −∞ due to the asymmetry in the cubic term. This makes an exact solution of the equations of motion in such a potential essentially impossible – both analytically and numerically. Instead, the cubic Ψµνη IJK term is typically treated via perturbation theory by expanding the Hamiltonian for the third-order potential in terms of the harmonic solutions. The actual derivation is lengthy and complex (see for instance 212
Broido, Ward, and Mingo, Phys. Rev. B 72, 014308 (2005)), so we only summarize the final result here: In an isotropic crystal, the thermal conductivity can be evaluated via Z 1 X κ= dqcV (i, q, T )vg2 (i, q)τi (q) . (7.114) 3 (2π) i Loosely speaking, this expression just formalizes that each phonon mode i, q transports its energy cV (i, q, T ) (the mode-specific specific heat) with the respective group velocity vg (i, q) for a certain time (the lifetime τi (q)). After this lifetime, which is the only term that depends on the Ψµνη IJK , the complete contribution of the mode is lost. This finite lifetime arises since the harmonic phonons are now no longer perfect solutions to the equations of motions, but only a first approximation. Phonons can therefore “decay” by scattering at each other and this immediately yields a finite thermal conductivity. As shown in Fig. 7.12, this kind of approach yields qualitatively and quantitatively excellent results at low temperatures and for harmonic materials with large thermal conductivities, i.e., whenever anharmonicity can indeed be regarded as a small perturbation. In all other cases, this approach is prone to fail, since a third-order expansion only describes a small portion of the potential-energy surface that is explored in strongly anharmonic systems at elevated temperatures (see Fig. 7.12). Accordingly, an assessment of thermal insulators requires to take the full potential-energy surface into account. This can be achieved with molecular dynamics simulations that will be discussed in the exercises. The thermal conductivity can then be evaluated from such simulations (see Carbogno, Ramprasad, and Scheffler, Phys. Rev. Lett. 118, 175901 (2017)).
8 Magnetism

8.1 Introduction
The topic of this part of the lecture is the magnetic properties of materials. Most materials are generally considered to be “non-magnetic”, which is a loose way of saying that they become magnetized only in the presence of an applied magnetic field (dia- and paramagnetism). We will see that in most cases these effects are very weak and the magnetization is lost as soon as the external field is removed. Much more interesting (also from a technological point of view) are those materials which not only have a large magnetization, but also retain it even after the removal of the external field. Such materials are called permanent magnets (ferromagnetism). The property that like (unlike) poles of permanent magnets repel (attract) each other was already known to the Ancient Greeks and Chinese over 2000 years ago. They used this knowledge, for instance, in compasses. Since then, the importance of magnets has risen steadily. They play an important role in many modern technologies:
• Recording media (e.g. hard disks): Data is recorded on a thin magnetic coating. The revolution in information technology owes as much to magnetic storage as to information processing with computer chips (i. e., the ubiquitous silicon chip).
• Credit, debit, and ATM cards: All of these cards have a magnetic strip on one side.
• TVs and computer monitors: Monitors contain a cathode ray tube that employs an electromagnet to guide electrons to the screen. Plasma screens and LCDs use different technologies.
• Speakers and microphones: Most speakers employ a permanent magnet and a current-carrying coil to convert electric energy (the signal) into mechanical energy (movement that creates the sound).
• Electric motors and generators: Most electric motors rely upon a combination of an electromagnet and a permanent magnet, and, much like loudspeakers, they convert electric energy into mechanical energy. A generator is the reverse: it converts mechanical energy into electric energy by moving a conductor through a magnetic field.
• Medicine: Hospitals use (nuclear) magnetic resonance tomography to spot problems in a patient’s organs without invasive surgery.
Figure 8.1 illustrates the type of magnetism in the elementary materials of the periodic table. Only Fe, Ni, Co and Gd exhibit ferromagnetism. Most magnetic materials are therefore alloys or oxides of these elements or contain them in another form.
Figure 8.1: Magnetic type of the elements in the periodic table. For elements without a color designation, magnetism is smaller or not explored.

There is, however, recent and increased research into new magnetic materials such as plastic magnets (organic polymers), molecular magnets (often based on transition metals), or molecule-based magnets (in which magnetism arises from strongly localized s and p electrons). Electrodynamics gives us a first impression of magnetism. Magnetic fields act on moving electric charges or, in other words, currents. Very roughly, one may understand the behavior of materials in magnetic fields as arising from the presence of “moving charges”. Bound core and quasi-free valence electrons possess a spin, which in a very simplified classical picture can be viewed as a rotating charge, i. e., a current. Bound electrons have an additional orbital momentum which adds another source of current (again in a simplistic classical picture). These “microscopic currents” react in two different ways to an applied magnetic field: First, according to Lenz’ law, a current is induced that creates a magnetic field opposing the external magnetic field (diamagnetism). Second, the “individual magnets” represented by the electron currents align with the external field and enhance it (paramagnetism). If these effects happen for each “individual magnet” independently, they remain small. The corresponding magnetic properties of the material may then be understood from the individual behavior of the constituents, i. e., from the individual atoms (insulators, semiconductors) or from the ions and free electrons (conductors), cf. Sec. 8.3. The much stronger ferromagnetism, however, arises from a collective behavior of the “individual magnets” in the material. In Sec. 8.4 we will first discuss the source for such an interaction before we move on to simple models treating either a possible coupling of localized moments or itinerant ferromagnetism.
8.2 Macroscopic Electrodynamics
Before we plunge into the microscopic sources of magnetism in solids, let us first recap a few definitions from macroscopic electrodynamics. In vacuum we have $\mathbf E$ and $\mathbf B$ as the electric field (unit: V/m) and the magnetic flux density (unit: Tesla = Vs/m²), respectively. Note that both are vector quantities, and in principle they are also functions of space and time. Since we will only deal with constant, uniform fields in this lecture, this dependence will be dropped throughout. Inside macroscopic media, both fields can be affected by the charges and currents present in the material, yielding the two new net fields $\mathbf D$ (electric displacement, unit: As/m²) and $\mathbf H$ (magnetic field, unit: A/m). For the formulation of macroscopic electrodynamics (the Maxwell equations in particular) we therefore need so-called constitutive relations between the external (applied) and internal (effective) fields:
$$\mathbf D = \mathbf D(\mathbf E,\mathbf B)\,, \qquad \mathbf H = \mathbf H(\mathbf E,\mathbf B)\,. \qquad (8.1)$$
In general, this functional dependence can be written in the form of a multipole expansion. In most materials, however, already the first (dipole) contribution is sufficient. These dipole terms are called $\mathbf P$ (el. polarization) and $\mathbf M$ (magnetization), and we can write
$$\mathbf H = \frac{\mathbf B}{\mu_0} - \mathbf M + \dots\,, \qquad (8.2)$$
$$\mathbf D = \epsilon_0\,(\mathbf E + \mathbf P + \dots)\,, \qquad (8.3)$$
where $\epsilon_0 = 8.85\cdot10^{-12}$ As/Vm and $\mu_0 = 4\pi\cdot10^{-7}$ Vs/Am are the dielectric constant and permeability of vacuum. $\mathbf P$ and $\mathbf M$ depend on the applied field, which we can formally write as a Taylor expansion in $\mathbf E$ and $\mathbf B$. If the applied fields are not too strong, one can truncate this series after the first term (linear response), i. e., the induced polarization/magnetization is then simply proportional to the applied field. We will see below that the external magnetic fields we can generate at present in the laboratory are indeed weak compared to microscopic magnetic fields, i. e., the assumption of a magnetic linear response is often well justified. As a side note, current high-intensity lasers may, however, easily bring us out of the linear-response regime for the electric polarization. Corresponding phenomena are treated in the field of non-linear optics. For the magnetization, it is actually more convenient to expand in the internal field $\mathbf H$ instead of in $\mathbf B$. In the linear-response regime, we thus obtain for the induced magnetization and the electric polarization
$$\mathbf M = \boldsymbol\chi^{\rm mag}\,\mathbf H \;\Rightarrow\; M = \chi^{\rm mag}H\,, \qquad (8.4)$$
$$\mathbf P = \boldsymbol\chi^{\rm el}\,\mathbf E \;\Rightarrow\; P = \chi^{\rm el}E\,. \qquad (8.5)$$
Here, $\boldsymbol\chi^{\rm el/mag}$ is the dimensionless electric/magnetic susceptibility tensor. In simple materials (on which we will focus in this lecture), the linear response is often isotropic in space and parallel to the applied field. The susceptibility tensors then reduce to a scalar form $\chi^{\rm mag/el}$. These equations are in fact often directly written as defining equations for the dimensionless susceptibility constants in solid state textbooks. It is important to remember, however, that we made the dipole approximation and restricted ourselves to isotropic media. For this case, the constitutive relations between external and internal field reduce to
$$\mathbf H = \frac{1}{\mu_0\mu_r}\,\mathbf B\,, \qquad (8.6)$$
$$\mathbf D = \epsilon_0\epsilon_r\,\mathbf E\,, \qquad (8.7)$$
material | susceptibility $\chi^{\rm mag}$ | type
vacuum | 0 | –
H₂O | $-8\cdot10^{-6}$ | diamagnetic
Cu | $\sim-10^{-5}$ | diamagnetic
Al | $2\cdot10^{-5}$ | paramagnetic
iron (depends on purity) | $\sim$ 100 - 1000 | ferromagnetic
µ-metal (nickel-iron alloy) | $\sim$ 80,000 - 100,000 | basically screens everything (used in magnetic shielding)
Table 8.1: Magnetic susceptibility of some materials.

where $\epsilon_r = 1 + \chi^{\rm el}$ is the relative dielectric constant, and $\mu_r = 1 + \chi^{\rm mag}$ the relative permeability of the medium. For $\chi^{\rm mag} < 0$, the magnetization is anti-parallel to $\mathbf H$. Accordingly, we have $\mu_r < 1$ and $|\mathbf H| < |\frac{1}{\mu_0}\mathbf B|$. The response of the medium reduces (or screens) the external field. Such a system is called diamagnetic. In the opposite case ($\chi^{\rm mag} > 0$), the magnetization is aligned with $\mathbf H$. Hence, we have $\mu_r > 1$ and therefore $|\mathbf H| > |\frac{1}{\mu_0}\mathbf B|$. The external field is enhanced by the material, and we talk about paramagnetism.

To connect to microscopic theories, we consider the energy. The process of screening or enhancing the external field requires work, i. e., the energy of the system is changed (magnetic energy). Assume therefore that we have a material of volume $V$ which we bring into a magnetic field $B$ (which for simplicity we take to be along one dimension only). From electrodynamics, we know that the energy of this magnetic field is
$$E^{\rm mag} = \frac{1}{2}\,BHV \;\Rightarrow\; \frac{1}{V}\,{\rm d}E^{\rm mag} = \frac{1}{2}\left(B\,{\rm d}H + H\,{\rm d}B\right)\,. \qquad (8.8)$$
From Eq. (8.6), we find ${\rm d}B = \mu_0\mu_r\,{\rm d}H$ and therefore $H\,{\rm d}B = B\,{\rm d}H$, which leads to
$$\frac{1}{V}\,{\rm d}E^{\rm mag} = B\,{\rm d}H = (\underbrace{\mu_0 H}_{B_0} + \mu_0 M)\,{\rm d}H\,. \qquad (8.9)$$
The first term $\mu_0 H = B_0$ describes the field in vacuum, i. e., without the material, and thus just gives the field-induced energy change if no material were present. Only the second term is material dependent and describes the energy change of the field in reaction to the material. Due to energy conservation, the energy change in the material itself is therefore
$${\rm d}E^{\rm mag}_{\rm material} = -\mu_0 M V\,{\rm d}H\,. \qquad (8.10)$$
Recalling that the energy of a dipole with magnetic moment $m$ in a magnetic field is $E = -mB$, we see that the approximations leading to Eq. (8.6) are equivalent to assuming that the homogeneous solid is built up of a constant density of “molecular dipoles”, i. e., the magnetization $M$ is the (average) dipole moment density. By rearranging Eq. (8.10), we finally arrive at an expression that is a special case of a frequently employed alternative definition of the magnetization,
$$M(H) = -\frac{1}{\mu_0 V}\left(\frac{\partial E(H)}{\partial H}\right)_{S,V}\,, \qquad (8.11)$$
and in turn of the susceptibility,
$$\chi^{\rm mag}(H) = \left(\frac{\partial M(H)}{\partial H}\right)_{S,V} = -\frac{1}{\mu_0 V}\left(\frac{\partial^2 E(H)}{\partial H^2}\right)_{S,V}\,. \qquad (8.12)$$
At finite temperatures, it is straightforward to generalize these definitions to
$$M(H,T) = -\frac{1}{\mu_0 V}\left(\frac{\partial F(H,T)}{\partial H}\right)_{T,V}\,, \qquad \chi^{\rm mag}(H,T) = -\frac{1}{\mu_0 V}\left(\frac{\partial^2 F(H,T)}{\partial H^2}\right)_{T,V}\,. \qquad (8.13)$$
While the derivation via macroscopic electrodynamics is most illustrative, we will see that these last two equations will be much more useful for the actual quantum-mechanical computation of the magnetization and susceptibility of real systems. All we have to do is to derive an expression for the energy of the system as a function of the applied external field. The first and second derivatives with respect to the field then yield M and χmag , respectively.
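The following minimal Python sketch illustrates how Eqs. (8.11)–(8.13) are used in practice: any routine that returns the (free) energy as a function of the applied field can be differentiated numerically to yield $M$ and $\chi^{\rm mag}$. The quadratic toy energy below stands in for a real quantum-mechanical calculation; all names and numbers are illustrative assumptions.

    import numpy as np

    MU0 = 4.0e-7 * np.pi   # vacuum permeability in Vs/Am

    def magnetization(energy, H, V, dH=1.0):
        """M(H) from Eq. (8.13) via a central finite difference."""
        return -(energy(H + dH) - energy(H - dH)) / (2.0 * dH) / (MU0 * V)

    def susceptibility(energy, H, V, dH=1.0):
        """chi(H) from Eq. (8.13) via a second-order finite difference."""
        d2E = (energy(H + dH) - 2.0 * energy(H) + energy(H - dH)) / dH**2
        return -d2E / (MU0 * V)

    # toy model: E(H) = -1/2 * mu0 * chi * V * H^2, which corresponds to M = chi*H
    chi_true, V = 2.0e-5, 1.0
    energy = lambda H: -0.5 * MU0 * chi_true * V * H**2

    print(magnetization(energy, H=100.0, V=V))   # ~ chi_true * H = 2.0e-3
    print(susceptibility(energy, H=100.0, V=V))  # ~ chi_true = 2.0e-5

The same two finite-difference routines can be pointed at any energy(H) that comes out of a quantum-mechanical calculation, which is exactly the strategy followed in the remainder of this chapter.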
8.3 Magnetism of Atoms and Free Electrons
We had already discussed that magnetism arises from the “microscopic currents” connected to the orbital and spin moments of electrons. Each electron therefore represents a “microscopic magnet”. But how do they couple in materials with a large number of electrons? Since any material is composed of atoms, it is reasonable to first reduce the problem to that of the electronic coupling inside an atom before attempting to describe the coupling of “atomic magnets”. Since both orbital and spin momentum of all bound electrons of an atom will contribute to its magnetic behavior, it will be useful to first recall how the total electronic angular momentum of an atom is determined (Sec. 8.3.1) before we turn to the effect of a magnetic field on the atomic Hamiltonian (Sec. 8.3.2). In metals, we have the additional presence of free conduction electrons. Their magnetic properties will be addressed in Sec. 8.3.3. Finally, we will discuss in Sec. 8.3.4 which aspects of magnetism we can already understand just on the basis of these results on atomic magnetism, i. e., in the limit of vanishing coupling between the “atomic magnets”.
8.3.1 Total Angular Momentum of Atoms
In the atomic shell model, the possible quantum states of electrons are labeled as $nlm_lm_s$, where $n$ is the principal quantum number, $l$ the orbital quantum number, $m_l$ the orbital magnetic quantum number, and $m_s$ the spin magnetic quantum number. For any given $n$ that defines a so-called “shell”, the orbital quantum number $l$ can take only integer values between 0 and $(n-1)$. For $l = 0, 1, 2, 3,\dots$, we generally use the letters s, p, d, f and so on. Within such an $nl$ “subshell”, the orbital magnetic quantum number $m_l$ may have the $(2l+1)$ integer values between $-l$ and $+l$. The last quantum number, the spin magnetic quantum number $m_s$, takes the values $-1/2$ and $+1/2$. For the example of the first two shells, this leads to two 1s, two 2s, and six 2p states. When an atom has more than one electron, the Pauli exclusion principle dictates that each quantum state specified by the set of four quantum numbers $nlm_lm_s$ can only be occupied by one electron. In all but the heaviest ions, spin-orbit coupling¹ is negligible. In this case, the Hamiltonian does not depend on the spin or the orbital moment and therefore commutes with these quantum numbers (Russell-Saunders coupling). The total orbital

¹ Spin-orbit (or LS) coupling is a relativistic effect that requires the solution of the Dirac equation.
and spin angular momentum for a given subshell are then good quantum numbers and are given by
$$L = \sum m_l \qquad\text{and}\qquad S = \sum m_s\,. \qquad (8.14)$$
If a subshell is completely filled, it is easy to verify that $L = S = 0$. The total electronic angular momentum of an atom, $J = L + S$, is therefore determined by the partially filled subshells. The occupation of quantum states in a partially filled shell is given by Hund’s rules:
1. The states are occupied so that as many electrons as possible (within the limitations of the Pauli exclusion principle) have their spins aligned parallel to one another, i. e., so that the value of $S$ is as large as possible.
2. Once it is determined how the spins are assigned, the electrons occupy states such that the value of $L$ is a maximum.
3. The total angular momentum $J$ is obtained by combining $L$ and $S$ as follows:
• if the subshell is less than half filled (i. e., if the number of electrons is $< 2l+1$), then $J = L - S$;
• if the subshell is more than half filled (i. e., if the number of electrons is $> 2l+1$), then $J = L + S$;
• if the subshell is exactly half filled (i. e., if the number of electrons is $= 2l+1$), then $L = 0$, $J = S$.
The first two Hund’s rules are determined purely from electrostatic energy considerations, e. g., electrons with equal spins are farther away from each other on account of the exchange-hole. Only the third rule follows from spin-orbit coupling. Each quantum state in this partially filled subshell is called a multiplet. The term comes from the spectroscopy community and refers to multiple lines (peaks) that have the same origin (in this case the same subshell $nl$). Without any interaction, e. g., electron-electron, spin-orbit, etc., all quantum states in this subshell would be energetically degenerate and the spectrum would show only one peak at this energy. The interaction lifts the degeneracy and, in an ensemble of atoms, different atoms might be in different quantum states. The single peak then splits into multiple peaks in the spectrum. The notation for the ground-state multiplet obtained by Hund’s rules is not simply $LSJ$ as one might have expected. Instead, for historical reasons, the notation $^{(2S+1)}L_J$ is used. To make matters worse, the angular momentum letters are used for the total orbital angular momentum ($L = 0, 1, 2, 3,\dots$ = S, P, D, F, $\dots$). The rules are in fact easier to apply than their description might suggest at first glance; a short algorithmic version is sketched below. Table 8.2 lists the filling and notation for the example of a d-shell.
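Since the rules are purely combinatorial, they are easy to cast into code. The following Python sketch (an illustrative implementation, not part of the original lecture material) fills a single $nl$ subshell according to Hund’s rules and prints the resulting quantum numbers and term symbols; for $l = 2$ it reproduces Table 8.2.

    from fractions import Fraction

    def hund_ground_state(l, n):
        """Fill an nl subshell with n electrons according to Hund's rules and
        return S, L, J together with the term symbol (2S+1)L_J of Table 8.2."""
        n_orb = 2 * l + 1
        if not 0 < n <= 2 * n_orb:
            raise ValueError("invalid electron count for this subshell")
        singles = min(n, n_orb)        # rule 1: maximize S (singly occupy first)
        pairs = n - singles
        S = Fraction(singles - pairs, 2)
        ml = list(range(l, l - n_orb, -1))             # ml = l, l-1, ..., -l
        L = abs(sum(ml[:singles]) + sum(ml[:pairs]))   # rule 2: maximize L
        J = L + S if n > n_orb else abs(L - S)         # rule 3 (half filled: L=0, J=S)
        letters = "SPDFGHIK"
        return S, L, J, f"({2 * S + 1}){letters[L]}_{J}"

    for n in range(1, 11):   # reproduces Table 8.2 for a d-shell (l = 2)
        print(n, *hund_ground_state(2, n))

For $n = 7$ electrons in a d-shell, for instance, this prints $S = 3/2$, $L = 3$, $J = 9/2$ and the term symbol (4)F_9/2, in agreement with the table.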
8.3.2 General derivation of atomic susceptibilities
Having determined the total angular momentum of an atom, we now turn to the effect of a uniform magnetic field
$$\mathbf H = \begin{pmatrix}0\\0\\H\end{pmatrix} \qquad (8.15)$$
el. | $m_l$ = 2 | 1 | 0 | $-1$ | $-2$ | $S$ | $L=|\sum m_l|$ | $J$ | Symbol
1 | ↓ | | | | | 1/2 | 2 | 3/2 | $^2$D$_{3/2}$
2 | ↓ | ↓ | | | | 1 | 3 | 2 | $^3$F$_2$
3 | ↓ | ↓ | ↓ | | | 3/2 | 3 | 3/2 | $^4$F$_{3/2}$
4 | ↓ | ↓ | ↓ | ↓ | | 2 | 2 | 0 | $^5$D$_0$
5 | ↓ | ↓ | ↓ | ↓ | ↓ | 5/2 | 0 | 5/2 | $^6$S$_{5/2}$
6 | ↓↑ | ↓ | ↓ | ↓ | ↓ | 2 | 2 | 4 | $^5$D$_4$
7 | ↓↑ | ↓↑ | ↓ | ↓ | ↓ | 3/2 | 3 | 9/2 | $^4$F$_{9/2}$
8 | ↓↑ | ↓↑ | ↓↑ | ↓ | ↓ | 1 | 3 | 4 | $^3$F$_4$
9 | ↓↑ | ↓↑ | ↓↑ | ↓↑ | ↓ | 1/2 | 2 | 5/2 | $^2$D$_{5/2}$
10 | ↓↑ | ↓↑ | ↓↑ | ↓↑ | ↓↑ | 0 | 0 | 0 | $^1$S$_0$
Table 8.2: Ground states of ions with partially filled d-shells ($l = 2$), as constructed from Hund’s rules.

along the z-axis on the electronic Hamiltonian of an atom, $H^e = T^e + V^{e\text{-}{\rm ion}} + V^{e\text{-}e}$. Note that the focus on $H^e$ means that we neglect the effect of $\mathbf H$ on the nuclear motion and spin. This is in general justified by the much greater mass of the nuclei, rendering the nuclear contribution to the atomic magnetic moment very small, but it would of course be crucial if we were to address, e. g., nuclear magnetic resonance (NMR) experiments. Since $V^{e\text{-}{\rm ion}}$ and $V^{e\text{-}e}$ are not affected by the magnetic field, we are left with the kinetic energy operator. From classical electrodynamics, we know that in the presence of a magnetic field, we have to replace all momenta $\mathbf p$ ($= -i\hbar\nabla$) by the canonic momenta $\mathbf p \to \mathbf p + e\mathbf A$. For a uniform magnetic field (along the z-axis), a suitable vector potential $\mathbf A$ is
$$\mathbf A = -\frac{1}{2}\left(\mathbf r\times\mu_0\mathbf H\right)\,, \qquad (8.16)$$
for which it is straightforward to verify that it fulfills the conditions $\mu_0\mathbf H = (\nabla\times\mathbf A)$ and $\nabla\cdot\mathbf A = 0$. The total kinetic energy operator is then written
$$T^e(\mathbf H) = \frac{1}{2m}\sum_k\left[\mathbf p_k + e\mathbf A\right]^2 = \frac{1}{2m}\sum_k\left[\mathbf p_k - \frac{e}{2}\left(\mathbf r_k\times\mu_0\mathbf H\right)\right]^2\,. \qquad (8.17)$$
Expanding the square leads to
$$T^e(\mathbf H) = \sum_k\frac{p_k^2}{2m} + \frac{e}{2m}\,\mathbf p_k\cdot(\mu_0\mathbf H\times\mathbf r_k) + \frac{e^2}{8m}\,(\mathbf r_k\times\mu_0\mathbf H)^2 = T_o^e + \frac{e\mu_0}{2m}\,\mathbf H\cdot\sum_k(\mathbf r_k\times\mathbf p_k) + \frac{e^2\mu_0^2H^2}{8m}\sum_k\left(x_k^2 + y_k^2\right)\,, \qquad (8.18)$$
where we have used the fact that $\mathbf p\cdot\mathbf A = \mathbf A\cdot\mathbf p$ if $\nabla\cdot\mathbf A = 0$, and that the first term $T_o^e$ describes the kinetic energy of the system without any field. We simplify this equation further by introducing the dimensionless (!) total electronic orbital momentum operator $\mathbf L = \frac{1}{\hbar}\sum_k(\mathbf r_k\times\mathbf p_k)$ and the Bohr magneton $\mu_B = (e\hbar/2m) = 0.579\cdot10^{-4}$ eV/T, so that the variation of the kinetic energy operator due to the magnetic field can be expressed as
$$\Delta T^e(\mathbf H) = T^e(\mathbf H) - T_o^e = \mu_B\mu_0\,\mathbf L\cdot\mathbf H + \frac{e^2\mu_0^2}{8m}\,H^2\sum_k\left(x_k^2 + y_k^2\right)\,. \qquad (8.19)$$
If this were the only effect of the magnetic field on $H^e$, one could, however, show that the magnetization in thermal equilibrium must always vanish (Bohr-van Leeuwen theorem). In simple terms, this is because the vector potential amounts to a simple shift in $\mathbf p$ which integrates out when a summation over all momenta is performed. The free energy is then independent of $H$ and the magnetization is zero. In reality, however, the magnetization does not vanish. The problem is that we have used the wrong kinetic energy operator. We should have solved the relativistic Dirac equation
$$\begin{pmatrix} V - mc^2 & -c\,\boldsymbol\sigma\cdot(\mathbf p - e\mathbf A)\\ c\,\boldsymbol\sigma\cdot(\mathbf p - e\mathbf A) & -V - mc^2 \end{pmatrix}\begin{pmatrix}\phi\\\chi\end{pmatrix} = E\begin{pmatrix}\phi\\\chi\end{pmatrix}\,. \qquad (8.20)$$
Here, $\phi = \binom{\phi_\uparrow}{\phi_\downarrow}$ is an electron wave function and $\chi = \binom{\chi_\uparrow}{\chi_\downarrow}$ a positron one, $\boldsymbol\sigma$ is a vector of $2\times2$ Pauli spin matrices, $c$ is the speed of light, and $V$ is the potential energy. To solve and fully understand this equation, one would have to resort to quantum electrodynamics (QED), which goes far beyond the scope of this lecture. We therefore heuristically introduce another momentum, the electron spin, that would emerge from the fully relativistic treatment mentioned above. The new interaction energy term has the form
$$\Delta H^{\rm spin}(\mathbf H) = g_0\,\mu_B\mu_0\,\mathbf S\cdot\mathbf H\,, \qquad\text{where}\qquad \mathbf S = \sum_k\mathbf S_k\,. \qquad (8.21)$$
Here, $\mathbf S$ is the again dimensionless total spin angular momentum operator, and $g_0$ is the so-called electronic g-factor (i. e., the g-factor for one electron spin). With the Dirac equation one obtains $g_0 = 2$, whereas a full QED approach yields
$$g_0 = 2\left[1 + \frac{\alpha}{2\pi} + \mathcal O(\alpha^2) + \dots\right] = 2.0023\dots\,, \qquad\text{where}\qquad \alpha = \frac{e^2}{\hbar c}\approx\frac{1}{137}\,. \qquad (8.22)$$
e2 µ20 2 X 2 H xk + yk2 . (8.23) 8m k
We will see below that the energy shifts produced by Eq. (8.23) are generally quite small on the scale of atomic excitation energies, even for the highest presently attainable laboratory field strengths. Perturbation theory can thus be used to calculate the changes in the energies connected with this modified Hamiltonian. Remember that magnetization and susceptibility were the first and second derivative of the field-dependent energy with respect to the field, which is why we would like to write the energy as a function of H. Since we will require the second derivative later, terms to second order in H must be retained. Recalling second order perturbation theory, we can write the energy of the nth non-degenerate level of the unperturbed Hamiltonian as En → En + ∆En (H);
∆En = hn|∆H e (H)|ni + 221
X |hn|∆H e (H)|n′ i|2 En − En′ n′ 6=n
.
(8.24)
Substituting Eq. (8.23) into the above and retaining terms to quadratic order in $H$, we arrive at the basic equation for the theory of the magnetic susceptibility of atoms:
$$\Delta E_n = \underbrace{\mu_0\mu_B\,\langle n|(\mathbf L + g_0\mathbf S)\cdot\mathbf H|n\rangle}_{\Delta E^{\rm Para}} + \underbrace{\frac{e^2\mu_0^2}{8m}\,H^2\,\langle n|\sum_k\left(x_k^2 + y_k^2\right)|n\rangle}_{\Delta E^{\rm LL}} + \underbrace{\mu_0^2\mu_B^2\sum_{n'\neq n}\frac{|\langle n|(\mathbf L + g_0\mathbf S)\cdot\mathbf H|n'\rangle|^2}{E_n - E_{n'}}}_{\Delta E^{\rm VV}}\,. \qquad (8.25)$$
Since we are mostly interested in the atomic ground state $|0\rangle$, we will now proceed to evaluate the magnitude of the three terms contained in Eq. (8.25) and their corresponding susceptibilities. This will also give us a feeling for the physics behind the different contributions. Beforehand, a few remarks –that will be justified during the discussion below– on the order of magnitude of these terms:
• If $J = L = S = 0$, we are dealing with a closed-shell atom. Accordingly, excitation energies $E_n - E_{n'}$ will also be large, and the only non-vanishing term is $\Delta E^{\rm LL}$.
• If $J = 0$, but $L$ and $S$ are not, we get an additional contribution from $\Delta E^{\rm VV}$. This contribution will be the stronger the less energy an electronic excitation $E_n - E_{n'}$ costs.
• If $J \neq 0$, $\Delta E^{\rm Para} \neq 0$ will completely dominate Eq. (8.25). In this case, however, the ground state is $(2J+1)$-fold degenerate and we have to diagonalize the matrix in $\Delta E^{\rm Para}$.

8.3.2.1 $\Delta E^{\rm LL}$: Larmor/Langevin Diamagnetism
To obtain an estimate for the magnitude of this term, we assume spherically symmetric wavefunctions ($\langle0|\sum_k(x_k^2 + y_k^2)|0\rangle \approx \frac{2}{3}\,\langle0|\sum_k r_k^2|0\rangle$). This allows us to approximate the energy shift in the atomic ground state due to the second term as
$$\Delta E_0^{\rm dia} \approx \frac{e^2\mu_0^2H^2}{12m}\sum_k\langle0|r_k^2|0\rangle \sim \frac{e^2\mu_0^2H^2}{12m}\,Z\,\bar r_{\rm atom}^2\,, \qquad (8.26)$$
where $Z$ is the total number of electrons in the atom (resulting from the sum over the $k$ electrons in the atom), and $\bar r_{\rm atom}^2$ is the mean square atomic radius. If we take $Z\sim30$ and $\bar r_{\rm atom}^2$ of the order of Å$^2$, we find $\Delta E_0^{\rm dia}\sim10^{-9}$ eV even for fields of the order of Tesla. This contribution is therefore rather small. We make a similar observation from an order-of-magnitude estimate for the susceptibility,
$$\chi^{\rm mag,dia} = -\frac{1}{\mu_0 V}\,\frac{\partial^2 E_0}{\partial H^2} = -\frac{\mu_0 e^2 Z\,\bar r_{\rm atom}^2}{6mV} \sim -10^{-4}\,. \qquad (8.27)$$
With $\chi^{\rm mag,dia} < 0$, this term represents a diamagnetic contribution (the so-called Larmor or Langevin diamagnetism), i. e., the associated magnetic moment screens the external field. We have thus identified the term which we qualitatively expected to arise out of the induction currents initiated by the applied magnetic field.
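The order-of-magnitude estimate of Eq. (8.27) is quickly verified numerically; in the following short sketch, $Z$, the mean square radius, and the atomic volume are the same rough assumptions as in the text above.

    import math

    e = 1.602176634e-19      # elementary charge in C
    m = 9.1093837015e-31     # electron mass in kg
    mu0 = 4.0e-7 * math.pi   # vacuum permeability in Vs/Am

    Z = 30                   # number of electrons (assumed)
    r2 = (1.0e-10)**2        # mean square atomic radius, ~1 Angstrom^2 (assumed)
    V = 1.0e-29              # volume per atom in m^3 (assumed)

    chi_dia = -mu0 * e**2 * Z * r2 / (6.0 * m * V)   # Eq. (8.27)
    print(f"chi_dia ~ {chi_dia:.1e}")                # on the order of -1e-4

This indeed returns $\chi^{\rm mag,dia}\approx-2\cdot10^{-4}$, consistent with the diamagnetic entries of Table 8.1.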
We will see below that this diamagnetic contribution will only determine the overall magnetic susceptibility of the atom when the other two terms in Eq. (8.25) vanish. This is only the case for atoms with $J|0\rangle = L|0\rangle = S|0\rangle = 0$, i. e., for closed-shell atoms or ions (e.g. the noble gas atoms). In this case, the ground state $|0\rangle$ is indeed non-degenerate, which justifies our use of Eq. (8.25) to calculate the energy-level shift. As we recall from chapter 6 on cohesion, the first excited state is much higher in energy for such closed-shell systems, so that at all but the highest temperatures there is a negligible probability of the atom being in any but its ground state in thermal equilibrium. This means that the diamagnetic susceptibility will be largely temperature independent (otherwise it would have been necessary to use Eq. (8.13) for the derivation of $\chi^{\rm mag,dia}$).

8.3.2.2 $\Delta E^{\rm VV}$: Van Vleck Paramagnetism
We first note that the evaluation of this term for a non-degenerate ($J = 0$) ground state $|0\rangle$ and for a field aligned along the z-direction yields
$$\Delta E^{\rm VV} = \mu_0^2\mu_B^2H^2\sum_n\frac{|\langle0|(L_z + g_0S_z)|n\rangle|^2}{E_0 - E_n}\,, \qquad (8.28)$$
in which $L_z$ and $S_z$ are the total angular projection operators along the z-axis for the orbital and spin angular momenta, respectively. By taking the second derivative, we obtain the susceptibility
$$\chi^{\rm mag,vleck} = \frac{2\mu_0\mu_B^2}{V}\sum_n\frac{|\langle0|(L_z + g_0S_z)|n\rangle|^2}{E_n - E_0}\,. \qquad (8.29)$$
Note that we have compensated for the minus sign in the definition of the susceptibility (8.12) by reversing the order in the denominator. Since the energy of any excited state will necessarily be higher than the ground state energy ($E_n > E_0$), we thus get $\chi^{\rm mag,vleck} > 0$. The third term therefore represents a paramagnetic contribution (the so-called Van Vleck paramagnetism), which is connected to field-induced electronic transitions. If the electronic ground state is non-degenerate (and only for this case does the above formula hold), it is precisely the normally quite large separation between electronic states which makes this term similarly small as the diamagnetic one. Van Vleck paramagnetism therefore only plays an important role for atoms or ions that fulfill the following conditions: First, the total angular momentum $J$ has to vanish, otherwise the paramagnetic term $\Delta E^{\rm Para}$ discussed in the next section would completely overshadow $\Delta E^{\rm VV}$. Second, $L$ and $S$ must not vanish individually, otherwise the contribution of $\Delta E^{\rm VV}$ would become vanishingly small due to the large separation between ground and excited states in closed-shell configurations. A closer look at Hund’s rules (see Tab. 8.2 for the example of a d-shell) reveals that this is only the case for atoms which are one electron short of a half-filled subshell. The magnetic behavior of such atoms or ions is determined by a balance between Larmor diamagnetism and Van Vleck paramagnetism. Both terms are small and tend to cancel each other. The atomic lanthanide series is a notable exception: In this case, the energy spacing between the ground and excited states is small enough that Van Vleck paramagnetism indeed gives a notable contribution that is comparable in strength to $\Delta E^{\rm Para}$ both for Eu ($J = 0$) and for Sm ($J = 5/2$)², see Fig. 8.2. Here the effective

² Obviously, Eq. (8.28) is not directly applicable to Sm with $J = 5/2$. Rather, the six degenerate states $J_z\in\{-5/2, -3/2, \dots, +5/2\}$ need to be explicitly considered, as discussed in the next section.
Figure 8.2: Van Vleck paramagnetism for Sm and Eu in the lanthanide atoms. From Van Vleck’s Nobel lecture.

magneton number $\mu_{\rm eff}$ is plotted, which is defined in terms of the susceptibility
$$\chi = \frac{N\,\mu_{\rm eff}^2\,\mu_B^2}{3k_BT}\,.$$

8.3.2.3 $\Delta E^{\rm Para}$: Paramagnetism

We again evaluate this term for a field aligned with the z-direction and get
$$\Delta E^{\rm Para} = \mu_0\mu_B\,H\,\langle n|L_z + g_0S_z|n\rangle\,. \qquad (8.30)$$
As mentioned before, this first term does not vanish for any atom with J ≠ 0 (which is the case for most atoms) and will then completely dominate the magnetic behavior, so that we can neglect the second and third term in Eq. (8.25) in this case. However, atoms with J ≠ 0 have a (2J + 1)-fold degenerate ground state, which implies that the simple form of Eq. (8.25) cannot be applied. Instead, we have to diagonalize the perturbation matrix V_{α,α′} defined in Eq. (8.31) with respect to the α = 1, ..., (2J + 1) degenerate ground states:

∆E_{0,α} = µ0 µB H Σ_{α′=1}^{(2J+1)} ⟨0α|Lz + g0 Sz|0α′⟩ = µ0 µB H Σ_{α′=1}^{(2J+1)} V_{α,α′} .   (8.31)
The respective energy shifts are obtained by diagonalizing the interaction matrix V_{α,α′} within the ground-state subspace. This is a standard problem in atomic physics (see e.g. Ashcroft/Mermin), and one finds that the basis diagonalizing this matrix is formed by the states defined by J and Jz,

⟨JLS, Jz|Lz + g0 Sz|JLS, Jz′⟩ = g(JLS) Jz δ_{Jz,Jz′} ,   (8.32)

where JLS are the quantum numbers defining the atomic ground state |0⟩ in the shell model and g(JLS) is the so-called Landé g-factor. Setting g0 ≈ 2, this factor is obtained as

g(JLS) = 3/2 + (1/2) [S(S + 1) − L(L + 1)] / [J(J + 1)] .   (8.33)

If we insert Eq. (8.32) into Eq. (8.31), we obtain
∆E_{JLS,Jz} = g(JLS) µ0 µB Jz H ,   (8.34)

i.e., the magnetic field splits the degenerate ground state into (2J + 1) equidistant levels separated by g(JLS)µ0µB H, with energies

g(JLS) µ0 µB H Jz   with   Jz ∈ {−J, −J + 1, ..., J − 1, J} .   (8.35)

This is the same effect as the magnetic field would have on a magnetic dipole with magnetic moment

m_atom = −g(JLS) µ0 µB J .   (8.36)
Therefore, the first term in Eq. (8.25) can be interpreted as the expected paramagnetic contribution due to the alignment of a "microscopic magnet". This is of course only true when we have unpaired electrons in partially filled shells (J ≠ 0). Before we proceed to derive the paramagnetic susceptibility that arises from this contribution, we note that this identification of m_atom via Eq. (8.25) serves nicely to illustrate the difference between phenomenological and so-called first-principles theories. We have seen that the simple Hamiltonian H^spin = −m_atom · H would equally well describe the splitting of the (2J + 1) ground-state levels. If we had only known this splitting, e.g. from experiment (Zeeman effect), we could have simply used H^spin as a phenomenological, so-called "spin Hamiltonian". By fitting to the experimental splitting, we could even have obtained the magnitude of the magnetic moment for individual atoms (which, due to their different total angular momenta, is of course different for different species). Examining the fitted magnitudes, we might even have discovered that the splitting is connected to the total angular momentum. The virtue of the first-principles derivation used in this lecture, on the other hand, is that this prefactor is obtained directly, without any fitting to experimental data. This allows us not only to unambiguously check our theory by comparison with the experimental data (which is not possible in the phenomenological theory, where the fitting can often easily cover up deficiencies in the assumed ansatz); we also directly obtain the dependence on the total angular momentum and can even predict the properties of atoms which have not (yet) been measured. Admittedly, in many cases first-principles theories are harder to derive. Later we will see more examples of "spin Hamiltonians" which describe the observed magnetic behavior of a material very well, but for which an all-encompassing first-principles theory is still lacking.
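As a small numerical illustration of Eqs. (8.33)-(8.36) (a sketch we add here; the function name and the example ions are our own choices, with L, S, and J assumed from Hund's rules), the Landé factor and the atomic moment g(JLS)·J (in units of µB) can be evaluated in a few lines of Python:

def lande_g(J, L, S):
    # Landé g-factor, Eq. (8.33), with the electronic g-factor g0 approximated by 2
    return 1.5 + 0.5 * (S * (S + 1) - L * (L + 1)) / (J * (J + 1))

# (L, S, J) from Hund's rules for a few illustrative cases
ions = {
    "Fe atom (d6)": (2.0, 2.0, 4.0),
    "Gd3+ (f7)":    (0.0, 3.5, 3.5),
    "Dy3+ (f9)":    (5.0, 2.5, 7.5),
}
for name, (L, S, J) in ions.items():
    g = lande_g(J, L, S)
    print(f"{name}: g(JLS) = {g:.3f}, m_atom = g*J = {g * J:.1f} mu_B")

For Fe and Dy, the resulting moments of 6 and 10 µB are exactly the m_atom values quoted later in Table 8.4.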
Figure 8.3: Plot of the Brillouin function BJ(x) for various values of the total angular momentum J (curves for J = 1/2, 1, 3/2, 2, and 5/2).
8.3.2.4 Paramagnetic Susceptibility: Curie's Law
The separation of the (2J + 1) ground-state levels of a paramagnetic atom in a magnetic field is g(JLS)µ0µB H. Recalling that µB = 0.579 · 10⁻⁴ eV/T, we see that the splitting is only of the order of 10⁻⁴ eV, even for fields of the order of Tesla. This is small compared to kB T for realistic temperatures. Accordingly, more than one level will be populated at finite temperatures, so that we have to use statistical mechanics to evaluate the susceptibility, i.e., the derivatives of the free energy as defined in Eq. (8.13). The Helmholtz free energy

F = −kB T ln Z   (8.37)

is given in terms of the partition function Z with

Z = Σ_n e^(−En/kB T) .   (8.38)

Here, n runs over Jz = −J, ..., J and En is our energy spacing g(JLS)µ0µB H Jz. By defining

η = g(JLS)µ0µB H / (kB T) ,

so that η is the ratio of the magnetic splitting to the thermal energy, it becomes evident that the partition function is a geometric sum that evaluates to

Z = Σ_{Jz=−J}^{J} e^(−ηJz) = e^(ηJ) + e^(η(J−1)) + ··· + e^(−η(J−1)) + e^(−ηJ)
  = [e^(−η(J+1/2)) − e^(η(J+1/2))] / [e^(−η/2) − e^(η/2)] = sinh[(J + 1/2)η] / sinh[η/2] .   (8.39)
With this, one finds for the magnetization

M(T) = −(1/(µ0 V)) ∂F/∂H = −(1/(µ0 V)) ∂(−kB T ln Z)/∂H
     = (kB T/(µ0 V)) ∂/∂H [ln(sinh[(J + 1/2)η]) − ln(sinh[η/2])] = (g(JLS)µB J / V) BJ(η) .   (8.40)

In the last step, we have defined the so-called Brillouin function

BJ(η) = (1/J) [ (J + 1/2) coth[(J + 1/2)η] − (1/2) coth[η/2] ] .   (8.41)
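Since it is easy to make sign or factor errors in the chain of identities (8.38)-(8.41), here is a short numerical cross-check (our own sketch; all names are hypothetical): it compares (1/J) ∂ln Z/∂η, evaluated by finite differences from the direct sum, with the closed-form Brillouin function.

import math

def lnZ(eta, J):
    # partition function, Eqs. (8.38)/(8.39): sum over Jz = -J, ..., J
    two_J = int(round(2 * J))
    return math.log(sum(math.exp(-eta * (-J + k)) for k in range(two_J + 1)))

def brillouin(eta, J):
    # Brillouin function B_J(eta), Eq. (8.41)
    a = J + 0.5
    return (a / math.tanh(a * eta) - 0.5 / math.tanh(eta / 2)) / J

J, eta, d = 2.0, 0.3, 1e-6
numeric = (lnZ(eta + d, J) - lnZ(eta - d, J)) / (2 * d * J)
print(numeric, brillouin(eta, J))   # both ~ 0.2889

Both numbers agree, confirming that the field derivative of the free energy in Eq. (8.40) indeed produces the Brillouin function.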
As shown in Fig. 8.3, BJ → 1 for η ≫ 1, i.e., in the limit of a strong field and/or low temperatures. In this case, Eq. (8.40) simply tells us that all atomic magnets with moment m_atom defined in Eq. (8.36) have aligned to the external field and that the magnetization reaches its maximum value of M = m_atom/V, i.e., the magnetic moment density. We had, however, discussed above that η will be much smaller than unity for normal field strengths. At all but the lowest temperatures, the other limit of the Brillouin function, i.e., BJ(η → 0), will therefore be much more relevant. This small-η expansion yields

BJ(η ≪ 1) ≈ [(J + 1)/3] η + O(η³) .   (8.42)

In this limit, we finally obtain for the paramagnetic susceptibility

χ^mag,para(T) = ∂M/∂H = µ0 µB² g(JLS)² J(J + 1) / (3V kB T) = C/T .   (8.43)
With χ^mag,para > 0, we have now confirmed our previous hypothesis that the first term in Eq. (8.25) gives a paramagnetic contribution. The inverse dependence of the susceptibility on temperature is known as Curie's law, χ^mag,para = C_Curie/T, with C_Curie the Curie constant. Such a dependence is characteristic for a system of "molecular magnets" that favor alignment with the applied field but are flipped by thermal fluctuations. Again, we see that the first-principles derivation not only reveals the range of validity of this empirical law (remember that it only holds in the small-η limit), but also provides the proportionality constant in terms of fundamental properties of the system. To make a quick size estimate, we take a volume of atomic size (V ∼ Å³) and find χ^mag,para ∼ 10⁻² at room temperature. The paramagnetic contribution is thus orders of magnitude larger than the diamagnetic or the Van Vleck one, even at room temperature. Nevertheless, with χ^mag,para ≪ 1 it is still small compared to electric susceptibilities, which are of the order of unity. We will discuss the consequences of this in section 8.3.4, but first we have to evaluate one last remaining source of magnetic behavior in metals, i.e., the free electrons.
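The quick size estimate can be verified explicitly; the following lines (a sketch, with g(JLS) and J set to assumed values of order unity and V = 1 Å³ as in the text) evaluate Eq. (8.43) in SI units:

# order-of-magnitude check of Eq. (8.43) at room temperature
mu0 = 1.2566e-6    # Vs/(Am)
muB = 9.274e-24    # J/T
kB  = 1.381e-23    # J/K
V   = 1.0e-30      # atomic volume, ~1 Angstrom^3
g, J, T = 2.0, 0.5, 300.0

chi = mu0 * muB**2 * g**2 * J * (J + 1) / (3 * V * kB * T)
print(f"chi_para(300 K) ~ {chi:.0e}")   # ~ 3e-2, i.e. of order 10^-2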
8.3.3 Susceptibility of the Free Electron Gas
Having determined the magnetic properties of electrons bound to ions, it is valuable to also consider the opposite extreme and examine their properties as they move nearly freely in a metal. There are again essentially two major terms in the response of free fermions to an external magnetic field. One results from the "alignment" of spins (which is a rough way of visualizing the quantum mechanical action of the field on the spins) and is called Pauli paramagnetism. The other arises from the orbital moments created by induced circular motions of the electrons; it thus tends to screen the exterior field and is known as Landau diamagnetism. Let us first try to analyze the origin of Pauli paramagnetism and examine its order of magnitude, as we have also done for all atomic magnetic effects. For this we consider again the free electron gas, i.e., N non-interacting electrons in a volume V. In chapter 2, we had already derived the density of states (DOS) per volume,

N(ǫ) = (1/(2π²)) (2me/ℏ²)^(3/2) √ǫ .   (8.44)

In the electronic ground state, all states are filled following the Fermi distribution:

N/V = ∫₀^∞ N(ǫ) f(ǫ, T) dǫ .   (8.45)
The electron density N/V uniquely characterizes our electron gas. For sufficiently low temperatures, the Fermi distribution can be approximated by a step function, so that all states up to the Fermi level are occupied. This Fermi level is conveniently expressed in terms of the Wigner-Seitz radius rs that we had already defined in chapter 6,

ǫF = 50.1 eV / (rs/aB)² .   (8.46)

The DOS at the Fermi level can then also be expressed in terms of rs:

N(ǫF) = (1/(20.7 eV)) (rs/aB)⁻¹ .   (8.47)
Up to now we have not explicitly considered the spin degree of freedom, which leads to the double occupation of each electronic state with a spin up and a spin down electron. Such an explicit consideration becomes necessary, however, when we now include an external field. This is accomplished by defining spin up and spin down DOS's, N↑ and N↓, in analogy to the spin-unresolved case. In the absence of a magnetic field, the ground state of the free electron gas has an equal number of spin-up and spin-down electrons, and the degenerate spin-DOS are therefore simply half-copies of the original total DOS defined in Eq. (8.44),

N↑(ǫ) = N↓(ǫ) = (1/2) N(ǫ) ,   (8.48)

cf. the graphical representation in Fig. 8.4a.

Figure 8.4: Free electron density of states and the effect of an external field: N↑ corresponds to the number of electrons with spins aligned parallel and N↓ antiparallel to the z-axis. (a) No magnetic field; (b) a magnetic field µ0H0 is applied along the z-direction; (c) as a result of (b), some electrons with spins antiparallel to the field (shaded region on the left) change their spin and move into available lower-energy states with spins aligned parallel to the field.

An external field µ0H will now interact differently with spin up and spin down states. Since the electron gas is fully isotropic, we can always choose the spin axis that defines up and down spins to lie along the direction of the applied field, and thus reduce the problem to one dimension. If we focus only on the interaction of the spin with the field and neglect the orbital response for the moment, the effect boils down to an equivalent of Eq. (8.21), i.e., we have

∆H^spin(H) = µ0 µB g0 H · s = +µ0 µB H   (for spin up)     (8.49)
                            = −µ0 µB H   (for spin down) , (8.50)

where we have again approximated the electronic g-factor g0 by 2. The effect is therefore simply to shift spin up electrons relative to spin down electrons. This can be expressed via the spin DOS's,

N↑(ǫ) = (1/2) N(ǫ − µ0 µB H)   (8.51)
N↓(ǫ) = (1/2) N(ǫ + µ0 µB H) ,   (8.52)
or again graphically by shifting the two parabolas with respect to each other as shown in Fig. 8.4b. If we now fill the electronic states according to the Fermi distribution, a different number of up and down electrons results and our electron gas exhibits a net magnetization due to the applied field. Since the states aligned parallel to the field are favored, the magnetization will enhance the exterior field and a paramagnetic contribution results. For T ≈ 0, both parabolas are essentially filled up to a sharp energy, and the net magnetization is simply given by the dark grey shaded region in Fig. 8.4c. Even for fields of the order of Tesla, the energetic shift of the states is µ0µB H ∼ 10⁻⁴ eV, i.e., small on the scale of electronic energies. The shaded region representing our net magnetization can therefore be approximated by a simple rectangle of height µ0µB H and width (1/2)N(ǫF), with N(ǫF) the Fermi level DOS of the unperturbed system without applied field. We then have

N↑ = N/2 + (1/2) N(ǫF) µ0µB H   (8.53)

up electrons per volume and

N↓ = N/2 − (1/2) N(ǫF) µ0µB H   (8.54)

down electrons per volume. Remember that the magnetization is the magnetic dipole moment density; in such a uniform system it can therefore be written as the dipole moment per electron times the difference of the up and down electron densities. Accordingly, each electron contributes a magnetic moment of µB to the magnetization and we have

M = (N↑ − N↓) µB = N(ǫF) µ0 µB² H .   (8.55)
This then yields the susceptibility

χ^mag,Pauli = ∂M/∂H = N(ǫF) µ0 µB² .   (8.56)
Using Eq. (8.47) for the Fermi level DOS of the free electron gas, this can be rewritten as a function of the Wigner-Seitz radius rs:

χ^mag,Pauli = 2.59 · 10⁻⁶ (rs/aB)⁻¹ .   (8.57)

Recalling that the Wigner-Seitz radius of most metals is of the order of 3-5 aB, we find that the Pauli paramagnetic contribution (χ^mag,Pauli > 0) of a free electron gas is small. In fact, it is as small as the diamagnetic susceptibility of atoms and therefore much smaller than their paramagnetic susceptibility. For magnetic atoms in a material, thermal disorder at finite temperatures prevents their complete alignment to an external field and therefore keeps the magnetization small. For an electron gas, on the other hand, it is the Pauli exclusion principle that opposes such an alignment, by forcing the electrons to occupy energetically higher lying states when aligned. For the same reason, Pauli paramagnetism does not show the 1/T temperature dependence we observed in Curie's law in Eq. (8.43). The characteristic temperature for the electron gas is the Fermi temperature: one could cast the Pauli susceptibility into Curie form, but with a fixed temperature of order TF playing the role of T. Since TF ≫ T, the Pauli susceptibility is then hundreds of times smaller than its Curie counterpart and almost always completely negligible. Turning to Landau diamagnetism, we note that its calculation would require solving the full quantum mechanical free electron problem along similar lines as done in the last section for atoms. The field-dependent Hamilton operator would then also produce a term that screens the applied field. Taking the second derivative of the resulting total energy with respect to H, one obtains for the diamagnetic susceptibility

χ^mag,Landau = −(1/3) χ^mag,Pauli ,   (8.58)

i.e., another term that is vanishingly small, as is the paramagnetic response of the free electron gas. Because this term is so small and the derivation does not yield new insight compared to the one already undertaken for the atomic case, we refer to the book by M.P. Marder, Condensed Matter Physics (Wiley, 2000) for a proper derivation of this result.
8.3.4 Atomic magnetism in solids
So far, we have discussed the magnetic behavior of materials from a tight-binding viewpoint, by assuming that a material is "just" an ensemble of independent atoms or ions and electrons. That is why we first addressed the magnetic properties of isolated atoms and ions. As an additional source of magnetism in solids, we looked at free (delocalized or itinerant) electrons. In both cases we found two major sources of magnetism: a paramagnetic one resulting from the alignment of existing "microscopic magnets" (either the total angular momentum in the case of atoms, or the spin in the case of free electrons) and a diamagnetic one arising from induction currents that try to screen the external field.

• Atoms (bound electrons):
  - Paramagnetism:            χ^mag,para ≈ 1/T ∼ 10⁻² (RT)     ∆E0^para ∼ 10⁻⁴ eV
  - Van Vleck paramagnetism:  χ^mag,VV ≈ const. ∼ 10⁻⁴         ∆E0^VV ∼ 10⁻⁹ eV
  - Larmor diamagnetism:      χ^mag,dia ≈ const. ∼ −10⁻⁴       ∆E0^dia ∼ 10⁻⁹ eV

• Free electrons:
  - Pauli paramagnetism:      χ^mag,Pauli ≈ const. ∼ 10⁻⁶      ∆E0^Pauli ∼ 10⁻⁴ eV
  - Landau diamagnetism:      χ^mag,Landau ≈ const. ∼ −10⁻⁶    ∆E0^Landau ∼ 10⁻⁴ eV
If the coupling between the different sources of magnetism is small, the magnetic behavior of the material would simply be a superposition of all relevant terms. For insulators this would imply, for example, that the magnetic moment of each atom/ion does not change appreciably when transferred to the material and that the total susceptibility is just the sum of all atomic susceptibilities. And this is indeed what one primarily finds when going through the periodic table:

Figure 8.5: (a) In rare earth metal atoms the incomplete 4f electronic subshell is located inside the 5s and 5p subshells, so that the 4f electrons are not strongly affected by neighboring atoms. (b) In transition metal atoms, e.g. the iron group, the 3d electrons are the outermost electrons and therefore interact strongly with other nearby atoms.

Insulators: When J = 0, the paramagnetic contribution vanishes. If in addition L = S = 0 (closed shell), the response is purely diamagnetic. This is the case for noble gas solids, simple ionic crystals like the alkali halides (recall from the discussion on cohesion that the latter can be viewed as closed-shell systems!), many molecules (e.g. H2O), and thus also many molecular crystals. Otherwise, i.e., if J = 0 but L and S do not vanish individually, the response is a balance between Van Vleck paramagnetism and Larmor diamagnetism. In all cases, the effects are small, and the results from the theoretical derivation presented in the previous section are in excellent quantitative agreement with experiment. For J ≠ 0, the response is dominated by the paramagnetic term. This is the case when the material contains rare earth (RE) or transition metal (TM) atoms (with partially filled f and d shells, respectively). These systems indeed obey Curie's law, i.e., they exhibit susceptibilities that scale inversely with temperature. For the rare earths, even the magnitude of the measured Curie constant corresponds very well to the theoretical prediction given in Eq. (8.43). For TM atoms in insulating solids, however, the measured Curie constants can only be understood if one assumes L = 0, while S is still given by Hund's rules. This effect is referred to as quenching of the orbital angular momentum and is caused by a general phenomenon known as crystal field splitting. As illustrated by Fig. 8.5, the d shells of a TM atom belong to the outermost valence shells and are particularly sensitive to the environment. The crystal field lifts the five-fold degeneracy of the TM d-states, as shown for an octahedral environment in Fig. 8.6a. Hund's rules are then in competition with the energy scale of the d-state splitting. If the energy separation is large, only the three lowest states can be filled and Hund's rules give a low spin configuration
with weak diamagnetism (Fig. 8.6b). If the energy separation is small, Hund's rules can still be applied to all five states and a high spin situation with strong paramagnetism emerges (Fig. 8.6c).

Figure 8.6: Effect of an octahedral environment on a transition metal ion with 5 electrons (d5): a) the degenerate levels split into two groups; b) for large energy separation a low spin configuration or c) for small energy separation a high spin configuration might be favorable.

d-count   unpaired (high spin)   unpaired (low spin)   examples
d4        4                      2                     Cr2+, Mn3+
d5        5                      1                     Fe3+, Mn2+
d6        4                      0                     Fe2+, Co3+
d7        3                      1                     Co2+

Table 8.3: High and low spin octahedral transition metal complexes.
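The counting behind Table 8.3 can be made explicit in a few lines (our own illustration; we simply fill the three lower and the two upper crystal-field levels in the two possible orders):

def unpaired(n_d, high_spin):
    occ = [0] * 5                           # levels 0-2: lower group, 3-4: upper group
    if high_spin:
        order = list(range(5)) * 2          # singly occupy all five levels, then pair up
    else:
        order = [0, 1, 2] * 2 + [3, 4] * 2  # completely fill the lower group first
    for level in order[:n_d]:
        occ[level] += 1
    return sum(1 for o in occ if o == 1)

for n in (4, 5, 6, 7):
    print(f"d{n}: high spin -> {unpaired(n, True)} unpaired, "
          f"low spin -> {unpaired(n, False)} unpaired")

This reproduces the unpaired-electron counts listed in Table 8.3.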
Please note that the crystal field splitting does not affect RE atoms, since the partially filled f-orbitals of RE atoms are located deep inside the filled s and p shells and are therefore not significantly affected by the crystalline environment. Even in the case of TM atoms, the important thing to notice is that, although the solid perturbs the magnitude of the "microscopic magnet" connected with the partially filled shell, all individual moments inside the material are still virtually decoupled from each other (i.e., the magnetic moment in the material is that of an atom that has simply adapted to a new symmetry situation). This is why Curie's law is still valid, but the magnetic moment becomes an effective magnetic moment that depends on the crystalline environment. Table 8.3 summarizes the configurations for some transition metal ions in an octahedral environment.

Semiconductors: Covalent materials only have partially filled shells, so they could be expected to have a finite magnetic moment. However, covalent bonds form through a pair of
electrons with opposite spin, and hence the net angular momentum is zero. Therefore, covalent materials like silicon exhibit only a vanishingly small diamagnetic response, as long as they are not doped. Dilute magnetic semiconductors, i.e., semiconductors doped with transition metal atoms, hold the promise of combining the powerful world of semiconductors with magnetism. However, questions like how pronounced the magnetism is and whether room temperature magnetism can be achieved are still heavily debated and a subject of active research.

Metals: In metals, the delocalized electrons add a new contribution to the magnetic behavior. In simple metals like the alkalis, this new contribution is in fact the only remaining one. As discussed in the chapter on cohesion, such metals can be viewed as closed-shell ion cores and a free electron glue formed from the valence electrons. The closed-shell ion cores have J = L = S = 0 and therefore exhibit only a negligible diamagnetic response. The magnetic behavior is then dominated by the Pauli paramagnetism of the conduction electrons. Measured susceptibilities are indeed of the order of magnitude expected from Eq. (8.57). Quantitative differences arise from exchange-correlation effects that were not treated in our free electron model. We will come back to metals below.
8.4 Magnetic Order: Permanent Magnets
The theory we have developed up to now relies on the assumption that the magnetic properties of a material can be described entirely by summing up the contributions that stem from the individual atoms and free electrons. In all cases, the paramagnetic or diamagnetic response is very small, at least when compared to electric susceptibilities, which are of the order of unity. For exactly this reason, typically only electric effects are discussed when analyzing the interaction of materials with electromagnetic fields. At this stage, we could therefore close the chapter on the magnetism of solids as something rather unimportant, were it not for a few exceptions: Some elemental and compound materials (formed of or containing transition metal or rare earth atoms) exhibit a completely different magnetic behavior: enormous susceptibilities and a magnetization that does not vanish when the external field is removed. These so-called ferromagnetic effects are not captured by the theory developed so far, since they originate from a strong coupling of the individual magnetic moments. This is what we will consider next.
8.4.1 Ferro-, Antiferro- and Ferrimagnetism
In our previously discussed simple theory of paramagnetism, the individual magnetic moments (ionic shells of non-zero angular momentum in insulators, or the conduction electrons in simple metals) do not interact with one another. In the absence of an external field, the individual magnetic moments are then thermally disordered at any finite temperature, i.e., they point in random directions, yielding a zero net moment for the solid as a whole, cf. Fig. 8.7a. In such cases, an alignment can only be caused by applying an external field, which leads to an ordering of all magnetic moments at sufficiently low temperatures, i.e., as long as the field is strong enough to overcome the thermodynamic trend towards disorder. Schematically, this is shown in Fig. 8.7b. However, a similar alignment could also be obtained by coupling the different magnetic moments, i.e., by an interaction that favors, for example, a parallel alignment. Already quite short-ranged interactions, e.g., nearest-neighbor interactions, can potentially lead to an ordered structure such as the one shown in Fig. 8.7b.

Figure 8.7: Schematic illustration of the distribution of directions of the local magnetic moments connected to "individual magnets" in a material. (a) Random thermal disorder in a paramagnetic solid with insignificant magnetic interactions, (b) complete alignment, either in a paramagnetic solid due to a strong external field or in a ferromagnetic solid below its critical temperature as a result of magnetic interactions, and (c) example of antiferromagnetic ordering below the critical temperature.

In the literature, interactions that introduce a magnetic ordering are often called magnetic interactions. This does not, however, imply that the fundamental source of the interaction is magnetic (e.g., a magnetic dipole-dipole interaction, which in fact is not the reason for the ordering, as we will see below). Materials that exhibit an ordered magnetic structure in the absence of an applied external field are called ferromagnets (or permanent magnets), and their resulting magnetic moment is known as the spontaneous magnetization Ms. The complexity of the possible magnetically ordered states exceeds the simple parallel alignment case shown in Fig. 8.7b by far; some additional examples are shown in Fig. 8.8a. Additionally, Fig. 8.7c and Fig. 8.8b show another common case: Although a microscopic ordering is present, the individual local moments sum to zero, so that no spontaneous magnetization is present. Such magnetically ordered states with Ms = 0 are called antiferromagnetic. If a spontaneous magnetization Ms ≠ 0 arises from the ordering of magnetic moments of different magnitude that are not completely aligned with Ms (i.e., not all local moments have a positive component along the direction of spontaneous magnetization), one talks about ferrimagnets; Fig. 8.8c shows an example of a possible ordered ferrimagnetic structure. Please note that the variety of possible magnetic structures is so large that it cannot be covered by a simple sketch; indeed, some of them do not rigorously fall into any of the three categories discussed so far. Those structures are well beyond the scope of this lecture and will not be covered here.
8.4.2 Interaction versus Thermal Disorder: Curie-Weiss Law
Even in permanent magnets, the magnetic order does not usually prevail at all temperatures. Above a critical temperature TC, the thermal energy can overcome the ordering induced by the interactions; the magnetic order then vanishes and the material (most often) behaves like a simple paramagnet. In ferromagnets, TC is known as the Curie temperature, while in antiferromagnets it is often called the Néel temperature (sometimes denoted TN). Note that TC may depend strongly on the applied field (both in strength and direction!). If the external field is parallel to the direction of spontaneous magnetization, TC typically increases with increasing external field, since both ordering tendencies support each other. For other field directions, a range of complex phenomena can arise as a result of the competition between the two ordering tendencies.

Figure 8.8: Examples of some typical magnetic orderings that can be found in a simple linear array of spins: (a) ferromagnetic, (b) antiferromagnetic, and (c) ferrimagnetic.

Figure 8.9: Typical temperature dependence of the magnetization Ms, of the specific heat cV, and of the zero-field susceptibility χ0 of a ferromagnet. The temperature scale is normalized to the critical (Curie) temperature TC.

At first, we are more interested in the ferromagnetic properties of materials in the absence of external fields, i.e., in the existence of a finite spontaneous magnetization. The gradual loss of order with increasing temperature is reflected in the continuous drop of Ms(T) illustrated in Fig. 8.9. Just below TC, a power law dependence is typically observed,

Ms(T) ∼ (TC − T)^β   (for T ր TC) ,   (8.59)

with a critical exponent β somewhere around 1/3. Coming from the high temperature side, the onset of ordering also appears in the zero-field susceptibility, which is found to diverge as T approaches TC,

χ0(T) = χ(T)|B=0 ∼ (T − TC)^(−γ)   (for T ց TC) ,   (8.60)

with γ around 4/3. This behavior already indicates that the material experiences dramatic changes at the critical temperature, which is also reflected by a divergence in other fundamental quantities like the zero-field specific heat,

cV(T)|B=0 ∼ (T − TC)^(−α)   (for T → TC) ,   (8.61)
with α around 0.1. The divergence is connected to the onset of long-range order (alignment), which sets in more or less suddenly at the critical temperature, resulting in a second order phase transition. The actual transition is difficult to describe theoretically, and theories are therefore often judged by how well (or badly) they reproduce the experimentally measured critical exponents α, β and γ. Above the critical temperature, the properties of magnetic materials often become "more normal". In particular, for T ≫ TC, one usually finds a simple paramagnetic behavior,
        m̄s (in µB)   matom (in µB)   TC (in K)   ΘC (in K)
Fe      2.2           6 (4)           1043        1100
Co      1.7           6 (3)           1394        1415
Ni      0.6           5 (2)           628         650
Eu      7.1           7               289         108
Gd      8.0           8               302         289
Dy      10.6          10              85          157

Table 8.4: Magnetic quantities of some elemental ferromagnets: The saturation magnetization at T = 0 K is given in the form of the average magnetic moment m̄s per atom, and the critical temperature TC and the Curie-Weiss temperature ΘC are given in K. For comparison, the magnetic moment matom of the corresponding isolated atoms is also listed (the values in brackets are for the case of orbital angular momentum quenching).
with a susceptibility that obeys the so-called Curie-Weiss law, cf. Fig. 8.9,

χ0(T) ∼ (T − ΘC)^(−1)   (for T ≫ TC) .   (8.62)
ΘC is called the Curie-Weiss temperature. Since γ in Eq. (8.60) is almost never exactly equal to 1, ΘC does not necessarily coincide with the Curie temperature TC. This can, for example, be seen in Table 8.4, where also the saturated values of the spontaneous magnetization for T → 0 K are listed. In theoretical studies, one typically converts this measured Ms(T → 0 K) directly into an average magnetic moment m̄s per atom (in units of µB) by simply dividing by the atomic density in the material. In this way, we can compare directly with the corresponding values for the isolated atoms, obtained as matom = g(JLS)µB J, cf. Eq. (8.36). As apparent from Table 8.4, the two values agree very well for the rare earth metals, but differ substantially for the transition metals. Recalling the discussion on the quenching of the orbital angular momentum of TM atoms immersed in insulating materials in section 8.3.4, we could once again try to ascribe this difference to the symmetry lowering of interacting d orbitals in the material. Yet, even using L = 0 (and thus J = S, g(JLS) = 2), the agreement does not become much better, cf. Table 8.4. Accordingly, one can summarize that in the RE ferromagnets the saturation magnetization appears to be a result of the perfect alignment of atomic magnetic moments that are not strongly affected by the presence of the surrounding material. Conversely, the TM ferromagnets (Fe, Co, Ni) exhibit a much more complicated magnetic structure that is not well described by the magnetic behavior of the individual atoms. As a first summary of our discussion of ferromagnetism (or ordered magnetic states in general), we are therefore led to conclude that it must arise out of a (yet unspecified) interaction between magnetic moments. This interaction produces an alignment of the magnetic moments and subsequently a large net magnetization that prevails even in the absence of an external field. This means we are certainly outside the linear response regime that has been used as the fundamental basis for developing the dia- and paramagnetic theories in the first sections of this chapter. With respect to the alignment, the interaction seems to have the same qualitative effect as an applied field acting on a paramagnet, which explains the similarity between the Curie-Weiss law and the Curie law of paramagnetic solids: Interaction and external field favor alignment, but are opposed by thermal disorder. Far
below the critical temperature the alignment is perfect, and far above TC thermal disorder is predominant. In this case, the competition between alignment with respect to an external field and disorder is equivalent to the situation in a paramagnet (just the origin of the temperature scale has shifted). The interacting magnetic moments in ferromagnets can either stem from delocalized electrons (Pauli paramagnetism) or from partially filled atomic shells (atomic paramagnetism). This makes it plausible that ferromagnetism is a phenomenon specific to only some metals, in particular metals with a high magnetic moment arising from partially filled d or f shells (i.e., transition metals, rare earths, as well as their compounds). However, we cannot yet explain why only some and not all TMs and REs exhibit ferromagnetic behavior. The comparison between the measured values of the spontaneous magnetization and the atomic magnetic moments suggests that ferromagnetism in RE metals can be understood as the coupling of localized magnetic moments (due to the inert, partially filled f shells). The situation is more complicated for TM ferromagnets, where the picture of a simple coupling between atomic-like moments fails. We will see below that this so-called itinerant ferromagnetism arises out of a subtle mixture of delocalized s electrons and the more localized, but nevertheless not inert, d orbitals.
8.4.3 Phenomenological Theories of Ferromagnetism
After this first overview of phenomena arising out of magnetic order, let us now try to establish a proper theoretical understanding. Unfortunately, the theory of ferromagnetism is one of the less well developed of the fundamental theories in solid state physics (a bit better for the localized ferromagnetism of the REs, worse for the itinerant ferromagnetism of the TMs). The complex interplay of single-particle and many-body effects, as well as collective effects and strong local coupling, makes it difficult to break the topic down into simple models appropriate for an introductory level lecture. We will therefore not be able to present and discuss a full-blown ab initio theory like in the preceding chapters. Instead we will first consider simple phenomenological theories and refine them later, when needed. This will enable us to address some of the fundamental questions that are not amenable to phenomenological theories (like the source of the magnetic interaction).

8.4.3.1 Molecular (mean) field theory
The simplest phenomenological theory of ferromagnetism is the molecular field theory due to Weiss (1906). We had seen in the preceding section that the effect of the interaction between the discrete magnetic moments in a material is very similar to that of an applied external field (leading to alignment). For each magnetic moment, the net effect of the interaction with all other moments can therefore be thought of as a kind of internal field created by all other moments (and the magnetic moment itself). In this way we average over all interactions and condense them into one effective field, i.e., molecular field theory is a typical representative of a mean field theory. Since this effective internal, or so-called molecular, field corresponds to an average over all interactions, it is reasonable to assume that it will scale with M, i.e., the overall magnetization (density of magnetic moments). Without specifying the microscopic origin of the magnetic interaction and the resulting field, the ansatz of Weiss was to simply postulate a linear scaling with M,

H^mol = µ0 λ M ,   (8.63)

where λ is the so-called molecular field constant. This leads to an effective field

H^eff = H + H^mol = H + µ0 λ M .   (8.64)

Figure 8.10: Graphical solution of Eq. (8.66).
For simplicity, we will consider only the case of an isotropic material, so that the internal and the external field point in the same direction, allowing us to write all formulae as scalar relations. Recalling the magnetization of paramagnetic spins (Eq. 8.40),

M0(T) = (g(JLS)µB J / V) BJ(η)   with   η ∼ H/T ,   (8.65)

we obtain

M(T) = M0(H^eff/T) → M0(λM/T)   (8.66)

when the external field is switched off (H = 0). We now ask whether such a magnetization can exist without an external field. Since M appears on both the left and the right hand side of the equation, we resort to a graphical solution: We set x = (λ/T) M(T), which results in

M0(x) = (T/λ) x .   (8.67)

The graphical solution is shown in Fig. 8.10 and tells us that the initial slope of M0(x) has to be larger than T/λ. If this condition is fulfilled, a ferromagnetic solution of our paramagnetic mean field model is obtained, although at this point we have no microscopic understanding of λ.
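The graphical construction can equally well be carried out numerically, by iterating Eq. (8.66) to self-consistency. The following sketch works in arbitrary units in which λ, the saturation magnetization, and kB are absorbed into the parameters (all names and values are our own choices):

import math

def brillouin(eta, J):
    # Brillouin function B_J(eta), Eq. (8.41); small-eta limit from Eq. (8.42)
    if abs(eta) < 1e-6:
        return (J + 1) / 3.0 * eta
    a = J + 0.5
    return (a / math.tanh(a * eta) - 0.5 / math.tanh(eta / 2)) / J

def spontaneous_M(T, lam=1.0, J=0.5, M_sat=1.0, n_iter=20000):
    # iterate M -> M_sat * B_J(lam * M / T), i.e. Eq. (8.66) with H = 0
    M = M_sat
    for _ in range(n_iter):
        M = M_sat * brillouin(lam * M / T, J)
    return M

# the initial slope of M0 is lam*M_sat*(J+1)/(3T), so a ferromagnetic
# solution exists for T < Tc = lam*M_sat*(J+1)/3 = 0.5 in these units
for T in (0.2, 0.4, 0.49, 0.6):
    print(f"T = {T}: Ms = {spontaneous_M(T):.4f}")

Below Tc the iteration converges to a finite spontaneous magnetization; above Tc only the trivial solution M = 0 survives, in line with the graphical argument.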
For the mean-field susceptibility we then obtain, with M(T) = M0(H^eff/T),

χ = ∂M/∂H = (∂M0/∂H^eff)(∂H^eff/∂H) = (C/T)(1 + λχ) ,   (8.68)

where we have used that for our paramagnet ∂M0/∂H^eff = C/T is just Curie's law. Solving Eq. (8.68) for χ gives the Curie-Weiss law

χ = C / (T − λC) ∼ (T − ΘC)^(−1) .   (8.69)
ΘC is the Curie-Weiss temperature. The form of the Curie-Weiss law is compatible with an effective molecular field that scales linearly with the magnetization. Given the experimental observation of Curie-Weiss-like scaling, the mean field ansatz of Weiss therefore seems reasonable. However, we can already see that this simple theory fails, as it predicts an inverse scaling with temperature for all T > TC, i.e., within molecular field theory ΘC = TC and γ = 1, at variance with experiment. The exponent of 1, by the way, is characteristic for any mean field theory. Given the qualitative plausibility of the theory, let us nevertheless use it to derive a first order-of-magnitude estimate for this (yet unspecified) molecular field. This phenomenological consideration follows the reverse logic to the ab initio one we prefer. In the latter, we would have analyzed the microscopic origin of the magnetic interaction and derived the molecular field as a suitable average, which in turn would have brought us into a position to predict materials' properties like the saturation magnetization and the critical temperature. Now, we do the opposite: We will use the experimentally known values of Ms and TC to obtain a first estimate of the size of the molecular field, which might help us later on to identify the microscopic origin of the magnetic interactions (about which we still do not know anything). For the estimate, recall that the Curie constant C was given by Eq. (8.43) as

C = (N/V) µ0 µB² g(JLS)² J(J + 1) / (3kB) ≈ (N/V) µ0 m_atom² / (3kB) ,   (8.70)

where we have exploited that J(J + 1) ∼ J², allowing us to identify the atomic magnetic moment m_atom = µB g(JLS) J. With this, it is straightforward to arrive at the following estimate of the molecular field at T = 0 K:

B^mol(0 K) = µ0 λ Ms(0 K) = µ0 (TC/C) (N/V) m̄s ≈ 3kB TC m̄s / m_atom² .   (8.71)

When discussing the content of Table 8.4 we had already seen that, for most ferromagnets, m_atom is at least of the same order as m̄s, so that plugging in the numerical constants we arrive at the following order-of-magnitude estimate:

B^mol ∼ [5 TC in K] / [m_atom in µB] Tesla .   (8.72)
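Plugging numbers from Table 8.4 into this estimate (a back-of-the-envelope sketch; following the text we approximate m_atom by the measured m̄s):

# molecular field estimate, Eq. (8.72): B_mol ~ 5 * TC[K] / m_atom[mu_B] Tesla
for name, TC, m in (("Fe", 1043, 2.2), ("Co", 1394, 1.7), ("Ni", 628, 0.6)):
    print(f"{name}: B_mol ~ {5 * TC / m:.0f} T")
# Fe ~ 2400 T, Co ~ 4100 T, Ni ~ 5200 T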
Looking at the values listed in Table 8.4, one realizes that molecular fields are of the order of some 10³ Tesla, which is at least one order of magnitude more than the strongest magnetic fields that can currently be produced in the laboratory. As a take-home message
we therefore keep in mind that the magnetic interactions must be quite strong to yield such a high internal molecular field. Molecular field theory can also be employed to understand the temperature behavior of the zero-field spontaneous magnetization. As shown in Fig. 8.9, Ms(T) decays smoothly from its saturation value at T = 0 K to zero at the critical temperature. Since the molecular field produced by the magnetic interaction is indistinguishable from an applied external one, the variation of Ms(T) must be equivalent to that of a paramagnet, only with the external field replaced by the molecular one. Using the results from section 8.3.2 for atomic paramagnetism, we therefore directly obtain a behavior equivalent to Eq. (8.40),

Ms(T) = Ms(0 K) BJ(g(JLS)µB B^mol / (kB T)) ,   (8.73)

i.e., the temperature dependence is entirely given by the Brillouin function defined in Eq. (8.41). BJ(η), in fact, exhibits exactly the behavior we just discussed: it decays smoothly to zero as sketched in Fig. 8.9. Similar to the reproduction of the Curie-Weiss law, mean field theory therefore provides us with a simple rationalization of the observed phenomenon, but fails again on a quantitative level (and also does not tell us anything about the underlying microscopic mechanism). The functional dependence of Ms(T) in both limits T → 0 K and T ր TC is incorrect, with e.g.

Ms(T ր TC) ∼ (TC − T)^(1/2)   (8.74)

and thus β = 1/2 instead of the observed values around 1/3. In the low temperature limit, the Brillouin function decays exponentially fast, which is also in gross disagreement with the experimentally observed T^(3/2) behavior (known as the Bloch T^(3/2) law). The failure to predict the correct critical exponents is a general feature of mean field theories, which cannot account for the fluctuations connected with phase transitions. The wrong low temperature limit, on the other hand, is due to the non-existence of a particular set of low-energy excitations called spin-waves (or magnons), a point that we will come back to later.

8.4.3.2 Heisenberg and Ising Hamiltonian
Another class of phenomenological theories is called spin Hamiltonians. They are based on the hypothesis that magnetic order is due to the coupling between discrete microscopic magnetic moments. If these microscopic moments are, for example, connected to the partially filled f-shells of RE atoms, one can replace the material by a discrete lattice model, in which each site i carries a magnetic moment m_i. A coupling between two different sites i and j in the lattice can then be described by a term

H^coupling_{i,j} = − (J_ij/µB²) m_i · m_j ,   (8.75)
where Jij is known as the exchange coupling constant between the two sites (unit: eV). In principle, the coupling can take any value between −Jij mi mj /µ2B (parallel (ferromagnetic) alignment) and +Jij mi mj /µ2B (antiparallel (antiferromagnetic) alignment) depending on the relative orientation of the two moment vectors. Note that this simplified interaction has no explicit spatial dependence anymore. If this is required (e.g. in a non-isotropic crystal with a preferred magnetization axis), additional terms need to be added to the Hamiltonian. 242
Figure 8.11: Schematic illustrations of possible coupling mechanisms between localized magnetic moments: (a) direct exchange between neighboring atoms, (b) superexchange mediated by non-magnetic ions, and (c) indirect exchange mediated by conduction electrons.
Applying this coupling to the whole material, we first have to decide which lattice sites to couple. Since we are now dealing with a phenomenological theory, we have no rigorous guidelines that would tell us which interactions are important. Instead, we have to choose the interactions (e.g. between nearest neighbors or up to next-nearest neighbors) and their strengths J_ij. Later, after we have solved the model, we can revisit these assumptions and see if they were justified. We can also use experimental (or first principles) data to fit the J_ij's and hope to extract microscopic information. Often our choice is guided by intuition concerning the microscopic origin of the coupling. For the localized ferromagnetism of the rare earths one typically assumes a short-ranged interaction, which could either be between neighboring atoms (direct exchange) or reach across non-magnetic atoms in more complex structures (superexchange). As illustrated in Fig. 8.11, this conveys the idea that the interaction arises out of an overlap of electronic wavefunctions. Alternatively, one could also think of an interaction mediated by the glue of conduction electrons (indirect exchange), cf. Fig. 8.11c. However, without a proper microscopic theory it is difficult to distinguish between these different possibilities. Having decided on the coupling, the Hamiltonian for the whole material becomes, in its simplest form of only pair-wise interactions,

H^Heisenberg = Σ_{i,j=1}^{M} H^coupling_{i,j} = − Σ_{i,j=1}^{M} (J_ij/µB²) m_i · m_j .   (8.76)
Similar to the discussion in chapter 6 on cohesion, such a pairwise summation is only justified when the interaction between two sites is not affected strongly by the presence of magnetic moments at other nearby sites. Since as yet we do not know anything about
the microscopic origin of the interaction, this is quite hard to verify. The particular spin Hamiltonian of Eq. (8.76) is known as the Heisenberg Hamiltonian and is the starting point for many theories of ferromagnetism. It looks deceptively simple, but it is in fact difficult to solve; up to now, no general analytic solution has been found. Since analytic solutions are often favored over numerical ones, further simplifications are made. Examples are one- or two-dimensional Heisenberg Hamiltonians (possibly only with nearest-neighbor interaction) or the Ising model,

H^Ising = − Σ_{i,j=1}^{M} (J_ij/µB²) m_{i,z} m_{j,z} ,   (8.77)

in which the products m_{i,z} m_{j,z}/µB² can take the values ±1 only.
These models opened up a whole field of research in the statistical mechanics community. It quickly developed a life of its own and became more and more detached from the original magnetic problem; instead, it was and is driven by the curiosity to solve the generic models themselves. In the meantime, numerical techniques such as Monte Carlo have been developed that can solve the original Heisenberg Hamiltonian, although they do of course not reach the beauty of a proper analytic solution. Since the Ising model has played a prominent role historically, and because it has analytic solutions under certain conditions, we will illustrate its solution in one dimension. For this we consider the Ising model in a magnetic field with only nearest-neighbor interactions,

H^Ising = − Σ_{⟨ij⟩} J_ij σ_i σ_j − Σ_j h_j σ_j ,   (8.78)

where ⟨ij⟩ denotes that the sum only runs over nearest neighbors, h_j is the magnetic field µ0H at site j, and σ_i = ±1. Despite this simplified form, no exact solution is known for the three-dimensional case. For two dimensions, Onsager found an exact solution in 1944 for zero magnetic field. Let us reduce the problem to one dimension and further simplify to J_{i,i+1} = J and h_j = h:

H^Ising = −J Σ_{i=1}^{N} σ_i σ_{i+1} − h Σ_i σ_i ,   (8.79)
where we have used the periodic boundary condition σ_{N+1} = σ_1. The partition function for this Hamiltonian is

Z(N, h, T) = Σ_{σ1} ... Σ_{σN} e^{β [J Σ_i σ_i σ_{i+1} + (h/2) Σ_i (σ_i + σ_{i+1})]} ,   (8.80)

with β = 1/(kB T). In order to carry out the spin summation we define a 2×2 matrix P with elements:

⟨σ|P|σ′⟩ = e^{β[Jσσ′ + (h/2)(σ + σ′)]}   (8.81)
⟨1|P|1⟩ = e^{β[J+h]}   (8.82)
⟨−1|P|−1⟩ = e^{β[J−h]}   (8.83)
⟨1|P|−1⟩ = ⟨−1|P|1⟩ = e^{−βJ} .   (8.84)
P is called the transfer matrix and can be written in more compact form as

P = ( e^{β(J+h)}   e^{−βJ}
      e^{−βJ}      e^{β(J−h)} ) .   (8.85)

With this definition, the partition function becomes

Z(N, h, T) = Σ_{σ1} ... Σ_{σN} ⟨σ1|P|σ2⟩⟨σ2|P|σ3⟩ ... ⟨σ_{N−1}|P|σN⟩⟨σN|P|σ1⟩   (8.86)
           = Σ_{σ1} ⟨σ1|P^N|σ1⟩ = Tr(P^N) .   (8.87)
To evaluate the trace, we have to diagonalize P, which yields the following eigenvalues:

λ± = e^{βJ} [cosh(βh) ± √(sinh²(βh) + e^{−4βJ})] .   (8.88)

With this, the trace equates to

Tr(P^N) = λ+^N + λ−^N .   (8.89)

Since we are interested in the thermodynamic limit N → ∞, λ+ will dominate over λ−, because λ+ > λ− for any h. This finally yields

Z(N, h, T) = λ+^N   (8.90)

and the free energy per spin,

F(h, T) = −kB T ln λ+ = −J − (1/β) ln[cosh(βh) + √(sinh²(βh) + e^{−4βJ})] .   (8.91)
The magnetization then follows as usual,

M(h, T) = −∂F/∂h = [sinh(βh) + sinh(βh) cosh(βh)/√(sinh²(βh) + e^{−4βJ})] / [cosh(βh) + √(sinh²(βh) + e^{−4βJ})] .   (8.92)
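As a cross-check of the transfer-matrix algebra, the following sketch (our own addition; the parameter values and the finite-difference step are arbitrary) compares Eq. (8.92) with a numerical derivative of ln λ+, where λ+ is obtained by diagonalizing the matrix (8.85):

import numpy as np

def M_closed_form(beta, J, h):
    # Eq. (8.92); algebraically it simplifies to sinh(bh)/sqrt(sinh^2(bh)+exp(-4bJ))
    root = np.sqrt(np.sinh(beta * h)**2 + np.exp(-4 * beta * J))
    return np.sinh(beta * h) / root

def M_transfer_matrix(beta, J, h, dh=1e-7):
    # M = -dF/dh with F = -kBT ln(lambda_+), cf. Eqs. (8.85)-(8.91)
    def ln_lambda_plus(hh):
        P = np.array([[np.exp(beta * (J + hh)), np.exp(-beta * J)],
                      [np.exp(-beta * J), np.exp(beta * (J - hh))]])
        return np.log(np.linalg.eigvalsh(P).max())   # P is real and symmetric
    return (ln_lambda_plus(h + dh) - ln_lambda_plus(h - dh)) / (2 * beta * dh)

beta, J, h = 1.0, 1.0, 0.5
print(M_closed_form(beta, J, h), M_transfer_matrix(beta, J, h))   # ~0.968 each

Setting h = 0 in either routine gives M = 0 at any finite temperature, anticipating the conclusion drawn below.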
The magnetization is shown in Fig. 8.12 for different ratios J/h. For zero magnetic field (h = 0), cosh(βh) = 1 and sinh(βh) = 0, and the magnetization vanishes. The one-dimensional nearest-neighbor Ising model therefore does not yield a ferromagnetic solution in the absence of a magnetic field. It also does not produce a ferromagnetic phase transition at finite field, unless the temperature is 0; the critical temperature of this model is thus zero Kelvin. Returning now to the Heisenberg Hamiltonian, let us consider an elemental solid, in which all atoms are equivalent and exhibit the same magnetic moment m. Inspection of Eq. (8.76) reveals that J > 0 describes ferromagnetic and J < 0 antiferromagnetic coupling. The ferromagnetic ground state is given by a perfect alignment of all moments in one direction, |↑↑↑ ... ↑↑⟩. Naively, one would expect the lowest energy excitations to be simple spin flips |↑↓↑ ... ↑↑⟩. Surprisingly, a detailed study of Heisenberg Hamiltonians reveals that these states are not eigenstates of the Hamiltonian. Instead, the low energy excitations have a much more complicated wave-like form, with only slight directional
Figure 8.12: The magnetization M (T ) in the 1-dimensional Ising model as a function of inverse temperature β = (kB T )−1 for different ratios J/h.
Figure 8.13: Schematic representation of the spatial orientation of the magnetic moments in a one-dimensional spin-wave.
changes between neighboring magnetic moments, as illustrated in Fig. 8.13. On second thought, it is then intuitive that such waves will have only an infinitesimally higher energy than the ground state if their wavelength becomes infinitely long. Such spin-waves are called magnons, and it is easy to see that they are missing from the mean-field description and also from the Ising model. As a result, the Heisenberg Hamiltonian correctly accounts for the Bloch T^(3/2) scaling of Ms(T) for T → 0 K. Magnons can in fact be expected quite generally whenever there is a direction associated with the local order that can vary spatially in a continuous manner at an energy cost that becomes small as the wavelength of the variation becomes large. Magnons are quite common in magnetism (see e.g. the ground state of Cr) and their existence has been confirmed experimentally, e.g. by neutron scattering. The Heisenberg model also does a good job at describing other characteristics of ferromagnets. The high-temperature susceptibility follows a Curie-Weiss type law; close to the critical temperature the solution deviates from the 1/T scaling. Monte Carlo simulations of Heisenberg ferromagnets yield critical exponents that are in very good agreement with the experimental data. Moreover, experimental susceptibility or spontaneous magnetization data can be used to fit the exchange constants J_ij. The values obtained for nearest-neighbor interactions are typically of the order of 10⁻² eV for TM ferromagnets and 10⁻³-10⁻⁴ eV for RE ferromagnets. These values are high compared to the energy splittings in dia- and paramagnetism and give us a first impression that the microscopic origin of the ferromagnetic coupling must be comparatively strong. We can also obtain this result by considering the mean-field limit of the Heisenberg Hamiltonian:

H^Heisenberg = − Σ_{i,j=1}^{M} (J_ij/µB²) m_i · m_j = − Σ_{i=1}^{M} m_i · ( Σ_{j=1}^{M} (J_ij/µB²) m_j ) = − Σ_{i=1}^{M} m_i · µ0 H_i .   (8.93)
In the last step, we have realized that the term in brackets has the units of Tesla and can therefore be perceived as a magnetic field µ0H_i arising from the coupling of moment i with all other moments in the solid. However, up to now we have not gained anything by this rewriting, because the effective magnetic field is different for every state (i.e., directional orientation of all moments). The situation can only be simplified by assuming a mean field point of view and approximating this effective field by its thermal average ⟨H⟩. This yields

µ0 ⟨H⟩ = Σ_{j=1}^{M} (J_ij/µB²) ⟨m_j⟩ = [Σ_{j=1}^{M} J_ij / (µB² (N/V))] Ms(T) ∼ λ Ms(T) ,   (8.94)
since the average magnetic moment at site i in an elemental crystal is simply given by the macroscopic magnetization (dipole moment density) divided by the density of sites. As apparent from Eq. (8.94), we therefore arrive at a mean field that is linear in the magnetization, as assumed by Weiss in his molecular field theory. In addition, we can now even relate the field to the microscopic quantity J_ij. If we assume only nearest-neighbor interactions in an fcc metal, it follows from Eq. (8.94) that

J = J_ij^NN = µB² B^mol / (12 m̄s) .   (8.95)

Using the saturation magnetizations listed in Table 8.4 and the estimate of molecular fields of the order of 10³ Tesla obtained above, we obtain exchange constants of the order of 10⁻² eV for the TM or 10⁻³-10⁻⁴ eV for the RE ferromagnets, in good agreement with the values mentioned before.
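The corresponding arithmetic (Fe values from Table 8.4; B^mol of the order estimated in the previous section) fits in a few lines:

# nearest-neighbor exchange constant from Eq. (8.95) for an fcc metal (12 neighbors)
muB_eV_per_T = 5.788e-5   # Bohr magneton in eV/T
B_mol = 2.4e3             # molecular field in Tesla, order of magnitude for Fe
m_s = 2.2                 # average moment per atom in units of mu_B (Fe, Table 8.4)

# with m_s given in mu_B, Eq. (8.95) reduces to J = muB * B_mol / (12 * m_s)
J_NN = muB_eV_per_T * B_mol / (12 * m_s)
print(f"J_NN ~ {J_NN:.1e} eV")    # ~ 5e-3 eV, i.e. of the order of 10^-2 eV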
8.4.4 Microscopic origin of magnetic interaction
We have seen that the phenomenological models capture the characteristics of ferromagnetism. However, what is still lacking is microscopic insight into the origin of the strong interaction J. This must come from an ab initio theory. The natural choice (somehow also (falsely) conveyed by the name "magnetic interaction") is to consider the direct dipolar interaction energy of two magnetic dipoles m1 and m2, separated by a distance r. For this, macroscopic electrodynamics gives us

J^mag dipole = (1/r³) [m1 · m2 − 3(m1 · r̂)(m2 · r̂)] .   (8.96)
Omitting the angular dependence and assuming the two dipoles to be oriented parallel to each other and perpendicular to the vector between them, we obtain the simplified expression

J^mag dipole ≈ (1/r³) m1 m2 .   (8.97)

Inserting magnetic moments of the order of µB, cf. Table 8.4, and a typical distance of 2 Å between atoms in solids, this yields J^mag dipole ∼ 10⁻⁴ eV. This value is 1-2 orders of magnitude smaller than the exchange constants we derived using the phenomenological model in the previous section. Having ruled out magnetic dipole interactions, the only thing left is electrostatic (Coulomb) interactions. These do in fact give rise to the strong coupling, as we will see in the following. To be more precise, it is the Coulomb interaction in competition with the Pauli exclusion principle. Loosely speaking, the exclusion principle keeps electrons with parallel spins apart and thus reduces their Coulomb repulsion. The resulting difference in energy between the parallel and the antiparallel spin configuration is the exchange energy we were seeking. This alignment is favorable for ferromagnetism only in exceptional circumstances, because the increase in kinetic energy associated with parallel spins usually outweighs the favorable decrease in potential energy. The difficulty in understanding magnetic ordering lies in the fact that we need to go beyond the independent electron (one-particle) picture. In the next few sections, we will look at how the Pauli principle and the Coulomb interaction can lead to magnetic effects. We will do this for two simple yet opposite model cases: two localized magnetic moments and the free electron gas.

8.4.4.1 Exchange interaction between two localized moments
The most simple realization of two localized magnetic moments is the familiar problem of the hydrogen molecule. Here the two magnetic moments are given by the spins of the two electrons (1) and (2), which belong to the two H ions A and B. If there were only one ion and one electron, we would simply have the Hamiltonian of the hydrogen atom (h0),

h0 |φ⟩ = ǫ0 |φ⟩ ,   (8.98)

with its ground state solution |φ⟩ = |r, σ⟩. For the molecule we then also have to consider the interaction between the two electrons and the two ions themselves, as well as between the ions and electrons. This leads to the new Hamiltonian

H |Φ⟩ = (h0(A) + h0(B) + h_int) |Φ⟩ ,   (8.99)
where the two-electron wave function |Φ⟩ is now a function of the positions of the ions A and B, the two electrons (1) and (2), and their two spins σ1, σ2. Since H has no explicit spin dependence, all spin operators commute with H and the total two-electron wave function must factorize,

|Φ⟩ = |Ψ_orb⟩ |χ_spin⟩ ,   (8.100)

i.e., into an orbital part and a pure spin part. As a suitable basis for the latter, we can choose the eigenfunctions of S² and Sz, so that the spin part of the wave function is given by

|χ_spin⟩_S  = 2^(−1/2) (|↑↓⟩ − |↓↑⟩)   S = 0, Sz = 0
|χ_spin⟩_T1 = |↑↑⟩                    S = 1, Sz = 1
|χ_spin⟩_T2 = 2^(−1/2) (|↑↓⟩ + |↓↑⟩)   S = 1, Sz = 0
|χ_spin⟩_T3 = |↓↓⟩                    S = 1, Sz = −1 .
The fermionic character of the two electrons (and the Pauli exclusion principle) manifests itself at this stage in the general postulate that the two-electron wave function must change sign when interchanging the two indistinguishable electrons. |χ_spin⟩_S changes sign upon interchanging the two spins, while the three triplet wave functions |χ_spin⟩_T do not. Therefore only the following combinations are possible:

|Φ⟩_singlet = |Ψ_orb,sym⟩ |χ_S⟩
|Φ⟩_triplet = |Ψ_orb,asym⟩ |χ_T⟩ ,   (8.101)

where |Ψ_orb,sym(1, 2)⟩ = |Ψ_orb,sym(2, 1)⟩ and |Ψ_orb,asym(1, 2)⟩ = −|Ψ_orb,asym(2, 1)⟩. This tells us that, although the Hamiltonian of the system is spin independent, a correlation between the spatial symmetry of the wave function and the total spin of the system emerges from the Pauli principle. For the hydrogen molecule the full solution can be obtained numerically without too much difficulty. However, here we are interested in a solution that gives us insight and that allows us to connect to the Heisenberg picture. We therefore start in the fully dissociated limit of infinitely separated atoms (⟨Φ|h_int|Φ⟩ = 0). The total wave function must separate into a product of wave functions of the isolated H atoms,

φ_A(r) = (1/√N) e^(−α|r−R_A|) .   (8.102)
However, the symmetry of the two electron wave function has to be maintained. Omitting the two ionic choices φ(1A)φ(2A) and φ(1B)φ(2B) that have a higher energy, the two low 249
energy choices are |Ψ∞ orb,sym i
|Ψ∞ orb,asym i
= 2−1/2 [ |φ(1A)i|φ(2B)i + |φ(2A)i|φ(1B)i ]
= 2−1/2 [ |φ(1A)i|φ(2B)i − |φ(2A)i|φ(1B)i ]
(8.103)
.
Here |φ(1A)i is our notation for electron (1) being in orbit around ion A. These two wave functions correspond to the ground state of H2 in either a spin singlet or a spin triplet configuration, and it is straightforward to verify that both states are degenerate in energy with 2ǫo . Let us now consider what happens when we bring the two atoms closer together. At a certain distance, the two electron densities will slowly start to overlap and to change. Heitler and London made the following approximation for the two wave functions
1 −αr φA (r) = √ e N
1+β
r·RA r
(8.104)
.
The two-electron problem is then still of the form given in Eq. (8.103) |ΨHL orb,sym i
|ΨHL orb,asym i
= (2 + 2S)−1/2 [ |φ(1A)i|φ(2B)i + |φ(2A)i|φ(1B)i ]
= (2 − 2S)−1/2 [ |φ(1A)i|φ(2B)i − |φ(2A)i|φ(1B)i ]
(8.105) ,
where S is the overlap integral S = hφ(1A)|φ(1B)ihφ(2A)|φ(2B)i = |hφ(1A)|φ(1B)i|2
.
(8.106)
Due to this change, the singlet ground state energy ES = hΦ|H|Φisinglet and the triplet ground state energy ET = hΦ|H|Φitriplet begin to deviate, and we obtain ET − ES = 2 where
CS − A 1 − S2
,
C = hφ(1A)|hφ(2B)| Hint |φ(2B)i|φ(1A)i (Coulomb integral) A = hφ(1A)|hφ(2B)| Hint |φ(2A)i|φ(1B)i (Exchange integral) .
(8.107)
(8.108) (8.109)
C is simply the Coulomb interaction of the two electrons with the two ions and themselves, while A arises from the exchange of the indistinguishable particles. If the exchange did not incur a sign change (as e.g. for two bosons), then A = CS and the singlet-triplet energy splitting would vanish. This tells us that it is the fermionic character of the two electrons that yields A 6= CS as a consequence of the Pauli exclusion principle and in turn a finite energy difference between the singlet ground state (antiparallel spin alignment) and the triplet ground state (parallel spin alignment). For the H2 molecule, the triplet ground state is in fact less energetically favorable, because it has an asymmetric orbital wave function. The latter must have a node exactly midway between the two atoms, meaning that there is no electron density at precisely that point where they could screen the Coulomb repulsion between the two atoms most efficiently. As we had anticipated, the coupling between the two localized magnetic moments is therefore the result of a subtle interplay of the electrostatic interactions and the Pauli exclusion principle. What 250
determines the alignment is the availability of a spin state that minimizes the Coulomb repulsion between electrons once Fermi statistics has been taken into account. For this reason, magnetic interactions are also often referred to as exchange interactions (and the splitting between the corresponding energy levels as the exchange splitting). By evaluating the two integrals C and A in Eq. (8.109) using the one-electron wave functions of the H atom problem, we can determine the energy splitting (ET − ES ). Realizing that the two level problem of the H2 molecule can be also described by the Heisenberg Hamiltonian of Eq. (8.76) (as we will show in the next section), we have therefore found a way to determine the exchange constant J. The Heitler-London approximation sketched in this section can be generalized to more than two magnetic moments, and is then one possible way for computing the exchange constants of Heisenberg Hamiltonians for (anti)ferromagnetic solids. However, this works only in the limit of highly localized magnetic moments, in which the wave function overlap is such that it provides an exchange interaction, but the electrons are not too delocalized (in which case the Heitler-London approximation of Eq. (8.105) is unreasonable). With this understanding of the interaction between two localized moments, also the frequently used restriction to only nearestneighbor interactions in Heisenberg Hamiltonians becomes plausible. As a note aside, in the extreme limit of a highly screened Coulomb interaction that only operates at each lattice site itself (and does not even reach the nearest neighbor lattice sites anymore), the magnetic behavior can also be described by a so-called Hubbard model, in which electrons are hopping on a discrete lattice and pay the price of a repulsive interaction U when two electrons of opposite spin occupy the same site. 8.4.4.2
From the Schrödinger equation to the Heisenberg Hamiltonian
Having gained a clearer understanding how the exchange interaction between two atoms arises we will now generalize this and derive a magnetic Hamiltonian from the Schrödinger equation. Recapping Chapter 0 the non-relativistic many-body Hamiltonian for the electrons is X ∇2 X Z 1X 1 H=− − + . (8.110) 2 |ri − Ra | 2 i6=j |ri − rj | i ia It is more convenient to switch to 2nd quantization by introducing the field operators X ψ(r) = φnλ (r − rα )anλ (rα ), (8.111) α,n,λ
where anλ (rα ) annihilates an excitation from state n and spin state λ at site rα and φ is the associated orbital. For the interaction part of the Hamiltonian we then obtain in 2nd quantization 1 X hα1 n1 α2 n2 |V |α3 n3 α4 n4 ia†n1 λ1 (rα1 )a†n2 λ2 (rα2 )an3 λ2 (rα3 )an4 λ1 (rα4 ) . 2 α1 α2 α3 α4
(8.112)
n1 n2 n3 n4 λ1 λ2
This is still exact, but a bit unwieldy and not very transparent.3 Since the relevant orbitals are localized we can restrict the terms to those in which α3 = α1 and α4 = α2 or α3 = α2 Try to explain why the operators occur in the order a† a† aa instead of a† aa† a. Writing the creators to the left and annihilators to the right is called normal ordering in the literature. 3
251
and α4 = α1 . The remaining terms involve various orbital excitations induced by the Coulomb interaction. These lead to off-diagonal exchange. We shall therefore restrict each electron to a definite orbital state. That is, we shall keep only those terms in which n3 = n1 and n4 = n2 or n3 = n2 and n4 = n1 . If, for simplicity, we also neglect orbital-transfer terms, in which two electrons interchange orbital states, then the interaction part reduces to 1X [ hαnα′ n′ |V |αnα′ n′ ia†nλ (rα )a†n′ λ′ (rα′ )an′ λ′ (rα′ )anλ (rα ) 2 ′
(8.113)
αα nn′ λλ′
+hαnα′ n′ |V |α′ n′ αni ] a†nλ (rα )a†n′ λ′ (rα′ )anλ′ (rα )an′ λ (rα′ )
(8.114)
The first term is the direct and the second the exchange term that we encountered in the previous section. Using the fermion anticommutation relation n o † anλ (rα ), an′ λ′ (rα′ ) = δαα′ δnn′ δλλ′ , (8.115) we may write the exchange term as
1X Jnn′ (rα , rα′ )a†nλ (rα )anλ′ (rα )a†n′ λ′ (rα′ )an′ λ (rα′ ) . 2 ′
(8.116)
αα nn′ λλ′
Expanding the spin sums we can rewrite his 1 1 1X Jnn′ (rα , rα′ ) + S(rα ) · S(rα′ ) , 2 ′ 4 4
(8.117)
αα nn′ λλ′
where S is the spin (or magnetic moment) of the atom at site rα . Now mapping over to lattice sites (rα → i and rα′ → j) we finally obtain the Heisenberg Hamiltonian from Eq. (8.75) X H spin = J˜ij Si · Sj . (8.118) ij
We have thus shown that magnetism does not arise from a spin-dependent part in the Schrödinger equation, but from the Pauli exclusion principle that the full many-body wave function has to satisfy. We could now use the Coulomb integrals derived in this section or the microscopic interaction we have derived in the previous section for Jij to fill the Heisenberg Hamiltonian with life. 8.4.4.3
Exchange interaction in the homogeneous electron gas
The conclusion from the localized moment model is that feeding Heitler-London derived exchange constants into Hubbard or Heisenberg models provides a good semi-quantitative basis for describing the ferromagnetism of localized moments as is typical for rare earth solids. However, the other extreme of the more itinerant ferromagnetism found in the transition metals cannot be described. For this it is more appropriate to return to the 252
homogeneous electron gas, i. e., treating the ions as a jellium background. We had studied this model already in section 8.3.3 for the independent electron approximation (zero electron-electron interaction) and found that the favorable spin alignment in an external field is counteracted by an increase in kinetic energy resulting from the occupation of energetically higher lying states. The net effect was weak and since there is no driving force for a spin alignment without an applied field we obtained a (Pauli) paramagnetic behavior. If we now consider the electron-electron interaction, we find a new source for spin alignment in the exchange interaction, i. e., an exchange-hole mediated effect to reduce the repulsion of parallel spin states. This new contribution supports the ordering tendency of an applied field, and could (if strong enough) maybe even favor ordering in the absence of an applied field. To estimate whether this is possible, we consider the electron gas in the Hartree-Fock (HF) approximation as the minimal theory accounting for the exchange interaction. In the chapter on cohesion, we already derived the HF energy per electron, but without explicitly considering spins HF
(E/N )jellium = Ts + E XC ≈
30.1 eV 12.5 eV 2 − rs aB
rs aB
(8.119)
.
Compared to the independent electron model of section 8.3.3, which only yields the first kinetic energy term, a second (attractive) term arises from exchange. To discuss magnetic behavior we now have to extend this description to the spin-polarized case. This is done in the most straightforward manner by rewriting the last expression as a function of the electron density (exploiting the definition of the Wigner-Seitz radius as rs /aB = 1/3 3 (V /N ) ) 4π ( 2/3 1/3 ) N N Ejellium,HF (N ) = N 78.2 eV − 20.1 eV . (8.120) V V The obvious generalization to a spin-resolved gas is then spin Ejellium,HF (N ↑ , N ↓ ) = Ejellium,HF (N ↑ ) + Ejellium,HF (N ↓ ) = (8.121) " 1/3 1/3 #) ( " 2/3 2/3 # N↓ N↓ N↑ N↑ = 78.2 eV + − 20.1 eV + , V V V V
where N ↑ is the number of electrons with up spin, and N ↓ with down spin (N ↑ + N ↓ = N ). More elegantly, the effects of spin-polarization can be discussed by introducing the polarization N↑ − N↓ P = , (8.122) N which possesses the obvious limits P = ±1 for complete spin alignment and P = 0 for a non-spinpolarized gas. Making use of the relations N ↑ = N2 (1 + P ) and N ↓ = N2 (1 − P ), Eq. (8.121) can be cast into Ejellium,HF (N, P ) = 1 5 5/3 5/3 4/3 4/3 NT (1 + P ) + (1 − P ) − α (1 + P ) + (1 − P ) 2 4 253
(8.123) ,
Figure 8.14: Plot of ∆E(P ), cf. Eq. (8.124), as a function of the polarization P and for various values of α. A ferromagnetic solution is found for α > 0.905.
with α = 0.10(V /N )1/3 . Ferromagnetism, i. e., a non-vanishing spontaneous magnetization at zero field, can occur when a system configuration with P 6= 0 leads to a lower energy than the non-polarized case, ∆E(N, P ) = E(N, P )−E(N, 0) < 0. Inserting the expression for the HF total energy we thus arrive at ∆E(N, P )
= NT
(8.124) 1 5 5/3 5/3 4/3 4/3 . (1 + P ) + (1 − P ) − 2 − α (1 + P ) + (1 − P ) − 2 2 4
Figure 8.14 shows ∆E(N, P ) as a function of the polarization P for various values of α. We see that the condition ∆E(N, P ) < 0 can only be fulfilled for α > αc =
2 1/3 (2 + 1) ≈ 0.905 , 5
(8.125)
in which case the lowest energy state is always given by a completely polarized gas (P = 1). Using the definition of α, this condition can be converted into a maximum electron density below which the gas possesses a ferromagnetic solution. In units of the Wigner-Seitz radius this is rs > ∼ 5.45 , (8.126) aB
i. e., only electron gases with a density that is low enough to yield Wigner-Seitz radii above 5.6 aB would be ferromagnetic. Recalling that 1.8 < rs < 5.6 for real metals, only Cs would be a ferromagnet in the Hartree-Fock approximation. Already this is obviously not in agreement with reality (Cs is paramagnetic and Fe/Co/Ni are ferromagnetic). However, the situation becomes even worse, when we start to improve the theoretical description by considering further electron exchange and correlation effects 254
beyond Hartree-Fock. The prediction of the paramagnetic-ferromagnetic phase transition in the homogeneous electron gas has a long history, and the results vary dramatically with each level of exchange-correlation. The maximum density below which ferromagnetism can occur is, however, consistently obtained much lower than in Eq. (8.126). The most quantitative work so far predicts ferromagnetism for electron gases with densities below (rs /aB ) ≈ 50±2 [F.H. Zhong, C.Lin, and D.M. Ceperley, Phys. Rev. B 66, 036703 (2002)]. From this we conclude that Hartree-Fock theory allows us to qualitatively understand the reason behind itinerant ferromagnetism of delocalized electrons, but it overestimates the effect of the exchange interaction by one order of magnitude in terms of rs . Yet, even a more refined treatment of electron exchange and correlation does not produce a good description of itinerant magnetism for real metals, none of which (with 1.8 < rs < 5.6) would be ferromagnetic at all in this theory. The reason for this failure cannot be found in our description of exchange and correlation, but must come from the jellium model itself. The fact that completely delocalized electrons (as in e.g. simple metals) cannot lead to ferromagnetism is correctly obtained by the jellium model, but not that Fe/Co/Ni are ferromagnetic. This results primarily from their partially filled d-electron shell, which is not well described by a free electron model as we had already seen in the chapter on cohesion. In order to finally understand the itinerant ferromagnetism of the transition metals, we therefore need to combine the exchange interaction with something that we have not yet considered at all: band structure effects.
8.4.5
Band consideration of ferromagnetism
In chapter 3 we had seen that an efficient treatment of electron correlation together with the explicit consideration of the ionic positions is possible within density-functional theory (DFT). Here, we can write for the total energy of the problem E/N = Ts [n] + E ion−ion [n] + E el−ion [n] + E Hartree [n] + E XC [n] ,
(8.127)
where all terms are uniquely determined functions of the electron density n(r). The most frequently employed approximation to the exchange-correlation term is the local density approximation (LDA), which gives the XC-energy at each point r as the one of the homogeneous electron gas with equivalent density n(r). Recalling our procedure from the last section, it is straightforward to generalize to the so-called spin-polarized DFT, with E/N = Ts [n↑ , n↓ ] + E ion−ion [n] + E el−ion [n] + E Hartree [n] + E XC [n↑ , n↓ ] .
(8.128)
n↑ (r) and n↓ (r) then are the spin up and down electron densities, respectively. We only need to consider them separately in the kinetic energy term (since there might be a different number of up- and down- electrons) and in the exchange-correlation term. For the latter, we proceed as before and obtain the local spin density approximation (LSDA) from the XC-energy of the spin-polarized homogeneous electron gas that we discussed in the previous section. Introducing the magnetization density m(r) = n↑ (r) − n↓ (r) µB , (8.129)
we can write the total energy as a function of n(r) and m(r). Since the integral of the magnetization density over the whole unit cell yields the total magnetic moment, selfconsistent solutions of the coupled Kohn-Sham equations can then either be obtained 255
Figure 8.15: Spin-resolved density of states (DOS) for bulk Fe and Ni in DFT-LSDA [from V.L. Moruzzi, J.F. Janak and A.R. Williams, Calculated electronic properties of metals, Pergamon Press (1978)].
under the constraint of a specified magnetic moment (fixed spin moment (FSM) calculations), or directly produce the magnetic moment that minimizes the total energy of the system. Already in the LSDA this approach is very successful and yields a non-vanishing magnetic moment for Fe, Co, and Ni. Resulting spin-resolved density of states (DOS) for Fe and Ni are exemplified in Fig. 8.15. After self-consistency is reached, these three ferromagnetic metals exhibit a larger number of electrons in one spin direction (called majority spin) than in the other (called minority spin). The effective potential seen by up and down electrons is therefore different, and correspondingly the Kohn-Sham eigenlevels differ (which upon filling the states up to a common Fermi-level yields the different number of up and down electrons that we had started with). Quantitatively one obtains in the LSDA the following average magnetic moments per atom, m ¯ s = 2.1 µB (Fe), 1.6 µB (Co), and 0.6 µB (Ni), which compare excellently with the experimental saturated spontaneous magnetizations at T = 0 K listed in Table 8.4. We can therefore state that DFT-LSDA reproduces 256
the itinerant ferromagnetism of the elemental transition metals qualitatively and quantitatively! Although this is quite an achievement, it is still somewhat unsatisfying, because we do not yet understand why only these three elements in the periodic table exhibit ferromagnetism. 8.4.5.1
Stoner model of itinerant ferromagnetism
To derive at such an understanding, we take a closer look at the DOS of majority and minority spins in Fig. 8.15. Both DOS exhibit a fair amount of fine structure, but the most obvious observation is that the majority and minority DOS seem to be shifted relative to each other, but otherwise very similar. This can be rationalized by realizing that the magnetization density (i. e., the difference between the up and down electron densities) is significantly smaller than the total electron density itself. For the XC-correlation potential one can therefore write a Taylor expansion to first order in the magnetization density as ↑↓ VXC (r) =
↑↓ δEXC [n(r), m(r)] 0 ≈ VXC (r) ± V˜ (r)m(r) , δn(r)
(8.130)
0 where VXC (r) is the XC-potential in the non-magnetic case. If the electron density is only slowly varying, one may assume the difference in up and down XC-potential (i. e., the term V˜ (r)m(r)) to also vary slowly. In the Stoner model, one therefore approximates it with a constant term given by
1 ↑↓ 0 VXC,Stoner (r) = VXC (r) ± IM 2
.
(8.131)
The proportionality constant I is called the Stoner parameter (or exchange integral), and R M = unit−cell m(r)dr (and thus M = (m ¯ s /µB )× number of elements in the unit cell!). Since the only difference between the XC-potential seen by up and down electrons is a constant shift (in total: IM ), the wave functions in the Kohn-Sham equations are not changed compared to the non-magnetic case. The only effect of the spin-polarization in the XC-potential is therefore to shift the eigenvalues ǫ↑↓ i by a constant to lower or higher values ǫ↑↓ = ǫ0i ± 1/2IM . (8.132) i The band structure and in turn the spin DOS are shifted with respect to the non-magnetic case, XZ ↑↓ 0 N (E) = δ(E − ǫ↑↓ (8.133) k,i )dk = N (E ± 1/2IM ) , i
BZ
i. e., the Stoner approximation corresponds exactly to our initial observation of a constant shift between the up and down spin DOS. So what have we gained by? So far nothing, but as we will see, we can employ the Stoner model to arrive at a simple condition that tells us by looking at an non-magnetic calculation, whether the material is likely to exhibit a ferromagnetic ground state or not. What we start out with are the results of a non-magnetic calculation, namely the nonmagnetic DOS N 0 (E) and the total number of electrons N . In the magnetic case, N remains the same, but what enters as a new free variable is the magnetization M . In the Stoner model the whole effect of a possible spin-polarization is contained in the parameter I. We now want to relate M with I to arrive at a condition that tells us which I will yield 257
Figure 8.16: Graphical solution of the relation M = F (M ) of Eq. (8.136) for two characteristic forms of F (M ).
a non-vanishing M (and thereby a ferromagnetic ground state). We therefore proceed by noticing that N and M are given by integrating over all occupied states of the shifted spin DOSs up to the Fermi-level Z 0 N = N (E + 1/2IM ) + N 0 (E − 1/2IM ) dE (8.134) ǫF Z 0 M = N (E + 1/2IM ) − N 0 (E − 1/2IM ) dE . (8.135) ǫF
Since N 0 (E) and N are fixed, these two equations determine the remaining free variables of the magnetic calculation, ǫF and M . However, the two equations are coupled (i. e., they depend on the same variables), so that one has to solve them self-consistently. The resulting M is therefore characterized by self-consistently fulfilling the relation Z 0 M = N (E + 1/2IM ) − N 0 (E − 1/2IM ) dE ≡ F (M ) (8.136) ǫF (M )
As expected M is a only function F (M ) of the Stoner parameter I, the DOS N 0 (E) of the non-magnetic material, and the filling of the latter (determined by ǫF and in turn by N ). To really quantify the resulting M at this point, we would need an explicit knowledge of N 0 (E). Yet, even without such a knowledge we can derive some universal properties of the functional dependence of F (M ) that simply follow from its form and the fact that N 0 (E) > 0, F (M ) = F (−M ) F (±∞) = ±M∞
F (0) = 0 F ′ (M ) > 0
.
Here M∞ corresponds to the saturation magnetization at full spin polarization, when all majority spin states are occupied and all minority spin states are empty. The structure of F (M ) must therefore be quite simple: monotonously rising from a constant value at −∞ to a constant value at +∞, and in addition exhibiting mirror symmetry. Two possible generic forms of F (M ) that comply with these requirements are sketched in Fig. 8.16. In case (A), there is only the trivial solution M = 0 to the relation M = F (M ) of Eq. 258
(8.136), i. e., only a non-magnetic state results. In contrast, case (B) has three possible solutions. Apart from the trivial M = 0, two new solutions with a finite spontaneous magnetization M = Ms 6= 0 emerge. Therefore, if the slope of F (M = 0) is larger than 1, the monotonous function F (M ) must necessarily cross the line M in another place, too. A sufficient criterion for the existence of ferromagnetic solutions with finite magnetization Ms is therefore F ′ (0) > 1. Taking the derivative of Eq. (8.136), we find that F ′ (0) = IN 0 (ǫF ), and finally arrive at the famous Stoner criterion for ferromagnetism
IN 0 (ǫF ) > 1 .
(8.137)
A sufficient condition for a material to exhibit ferromagnetism is therefore to have a high density of states at the Fermi-level and a large Stoner parameter. The latter is a material constant that can be computed within DFT (by taking the functional derivative of Vxc↑↓ (r) with respect to the magnetization; see e.g. J.F. Janak, Phys. Rev. B 16, 255 (1977)). Loosely speaking it measures the localization of the wave functions at the Fermi-level. In turn it represents the strength of the exchange interaction per electron, while N 0 (ǫF ) tells us how many electrons are able to participate in the ordering. Consequently, the Stoner criterion has a very intuitive form: strength per electron times number of electrons participating. If this exceeds a critical threshold, a ferromagnetic state results. Figure 8.17 shows the quantities entering the Stoner criterion calculated within DFTLSDA for all elements. Gratifyingly, only the three elements Fe, Co and Ni that are indeed ferromagnetic in reality fulfill the condition. With the understanding of the Stoner model we can now, however, also rationalize why it is only these three elements and not the others: For ferromagnetism we need a high number of states at the Fermi-level, such that only transition metals come into question. In particular, N 0 (ǫF ) is always highest for the very late TMs with an almost filled d-band. In addition, we also require a strong localization of the wave functions at the Fermi-level to yield a high Stoner parameter. And here, the compact 3d orbitals outwin the more delocalized ones of the 4d and 5d elements. Combining the two demands, only Fe, Co, and Ni remain. Yet, it is interesting to see that also Ca, Sc and Pd come quite close to fulfilling the Stoner criterion. This is particularly relevant, if we consider other situations than the bulk. The DOSs of solids (although in general quite structured) scale to first order inversely with the band width W . In other words, the narrower the band, the higher the number of states per energy within the band. Just as much as by a stronger localization, W is also decreased by a lower coordination. In the atomic limit, the band width is zero and the Stoner criterion is always fulfilled. Correspondingly, all atoms with partially filled shells have a non-vanishing magnetic moment, given by Hund‘s rules. Somewhere intermediate are surfaces and thin films, where the atoms have a reduced coordination compared to the bulk. The bands are then narrower and such systems can exhibit a larger magnetic moment or even a reversal of their magnetic properties (i. e., when they are non-magnetic in the bulk). The magnetism of surfaces and thin films is therefore substantially richer than the bulk magnetism discussed in this introductory lecture. Most importantly, it is also the basis of many technological applications. 259
Figure 8.17: Trend over all elements of the Stoner parameter I, of the non-magnetic DOS ˜ F )) and of their product. The Stoner criterion at the Fermi-level (here denoted as D(E is only fulfilled for the elements Fe, Co, and Ni [from H. Ibach and H. Lüth, Solid State Physics, Springer (1990)].
260
Figure 8.18: Schematic illustration of magnetic domain structure.
8.5
Magnetic domain structure
With the results from the last section we can understand the ferromagnetism of the RE metals as a consequence of the exchange coupling between localized moments due to the partially filled (and mostly atomic-like) f -shells. The itinerant ferromagnetism of the TMs Fe, Co and Ni, on the other hand, comes from the exchange interaction of the still localized, but no longer atomic-like d orbitals together with the delocalized s-electron density. We can treat the prior localized magnetism using discrete Heisenberg or Hubbard models and exchange constants from Heitler-London type theories (or LDA+U type approaches), while the latter form of magnetism is more appropriately described within band theories like spin-DFT-LSDA. Despite this quite comprehensive understanding, an apparently contradictory feature of magnetic materials is that a ferromagnet does not always show a macroscopic net magnetization, even when the temperature is well below the Curie temperature. In addition it is hard to grasp, how the magnetization can be affected by external fields, when we know that the molecular internal fields are vastly larger. The explanation for these puzzles lies in the fact that two rather different forces operate between the magnetic moments of a ferromagnet. At very short distance (of the order of atomic spacings), magnetic order is induced by the powerful, but quite short-ranged exchange forces. Simultaneously, at distances large compared with atomic spacings, the magnetic moments still interact as magnetic dipoles. We stressed in the beginning of the last section that the latter interaction is very weak; in fact between nearest neighbors it is orders of magnitude smaller than the exchange interaction. On the other hand, the dipolar interaction is rather long-ranged, falling off only as the inverse cube of the separation. Consequently, it can still diverge, if one sums over all interactions between a large population that is uniformly magnetized. In order to accommodate these two competing interactions (short range alignment, but divergence if ordered ensembles become too large), ferromagnets organize themselves into domains as illustrated in Fig. 8.18. The dipolar energy is then substantially reduced, since due to the long-range interaction the energy of every spin drops. This bulk-like effect is opposed by the unfavorable exchange interactions with the nearby spins in the neighboring misaligned domains. Because, however, the exchange interaction is short-ranged, it is only the spins near the domain boundaries that will have their exchange energies raised, i. e., this is a surface effect. Provided that the domains are not too small, domain formation will be favored in spite of the vastly greater strength of 261
Figure 8.19: Schematic showing the alignment of magnetic moments at a domain wall with (a) an abrupt boundary and (b) a gradual boundary. The latter type is less costly in exchange energy.
the exchange interaction: Every spin can lower its (small) dipolar energy, but only a few (those near the domain boundaries) have their (large) exchange energy raised. Whether or not a ferromagnet exhibits a macroscopic net magnetization, and how this magnetization is altered by external fields, has therefore less to do with creating alignment (or ripping it apart), but with the domain structure and how its size and orientational distribution is changed by the field. In this respect it is important to notice that the boundary between two domains (known as the domain wall or Bloch wall) has often not the (at first thought intuitive) abrupt structure. Instead a spread out, gradual change as shown in Fig. 8.19 is less costly in exchange energy. All an external field has to do then is to slightly affect the relative alignments (not flip one spin completely), in order to induce a smooth motion of the domain wall, and thereby increase the size of favorably aligned domains. The way how the comparably weak external fields can "magnetize" a piece of unmagnetized ferromagnet is therefore to rearrange and reorient the various domains, cf. Fig. 8.20. The corresponding domain wall motion can hereby be reversible, but it may also well be that it is hindered by crystalline imperfections (e.g. a grain boundary), through which the wall will only pass when the driving force due to the applied field is large. When the aligning field is removed, these defects may prevent the domain walls from returning to their original unmagnetized configuration, so that it becomes necessary to apply a rather strong field in the opposite direction to restore the original situation. The dynamics of the domains is thus quite complicated, and their states depend in detail upon the particular history that has been applied to them (which fields, how strong, which direction). As a consequence, ferromagnets always exhibit so-called hysteresis effects, when plotting the magnetization of the sample against the external magnetic field strength as shown in Fig. 8.21. Starting out with an initially unmagnetized crystal (at the origin), the applied external field induces a magnetization, which reaches a saturation value Ms when all dipoles are aligned. On removing the field, the irreversible part of the domain boundary dynamics leads to a slower decrease of the magnetization, leaving the so-called remnant magnetization MR at zero field. One needs a reverse field (opposite to the magnetization of the sample) to further reduce the magnetization, which finally reaches zero at the so-called coercive field Bc . Depending on the value of this coercive field, one distinguishes between soft and hard magnets in applications: A soft magnet has a small Bc and exhibits therefore only a small 262
Figure 8.20: (a) If all of the magnetic moments in a ferromagnet are aligned they produce a magnetic field which extends outside the sample. (b) The net macroscopic magnetization of a crystal can be zero, when the moments are arranged in (randomly oriented) domains. (c) Application of an external magnetic field can move the domain walls so that the domains which have the moments aligned parallel to the field are increased in size. In total this gives rise to a net magnetization.
Figure 8.21: Typical hysteresis loop for a ferromagnet.
263
Figure 8.22: Hysteresis loops for soft and hard magnets.
hysteresis loop as illustrated in Fig. 8.22. Such materials are employed, whenever one needs to reorient the magnetization frequently and without wanting to resort to strong fields to do so. Typical applications are kernels in engines, transformators, or switches and examples of soft magnetic materials are Fe or Fe-Ni, Fe-Co alloys. The idea of a hard magnet which requires a huge coercive field, cf. Fig. 8.22, is to maximally hinder the demagnetization of a once magnetized state. A major field of application is therefore magnetic storage with Fe- or Cr-Oxides, schematically illustrated in Fig. 8.23. The medium (tape or disc) is covered by many small magnetic particles each of which behaves as a single domain. A magnet – write head – records the information onto this medium by orienting the magnetic particles along particular directions. Reading information is then simply the reverse: The magnetic field produced by the magnetic particles induces a magnetization in the read head. In magnetoresistive materials, the electrical resistivity changes in the presence of a magnetic field so that it is possible to directly convert the magnetic field information into an electrical response. In 1988 a structure consisting of extremely thin, i. e., three atom-thick-alternative layers of Fe and Co was found to have a magnetoresistance about a hundred times larger than that of any natural material. This effect is known as giant magnetoresistance and is in the meanwhile exploited in todays disc drives.
264
Figure 8.23: Illustration of the process of recording information onto a magnetic medium. The coil converts an electrical signal into a magnetic field in the write head, which in turn affects the orientation of the magnetic particles on the magnetic tape or disc. Here there are only two possible directions of magnetization of the particles, so the data is in binary form and the two states can be identified as 0 and 1.
265
9 Transport Properties of Solids 9.1
Introduction
So far, we have almost exclusively discussed in solids in thermodynamic equilibrium. In this chapter, we will now discuss transport effects, i.e., how solids react to an external perturbation, .e.g, to an external electric field E, i.e., to the gradient of an electric potential E = −∇U , or to a temperature gradient ∇T . In all this cases, a flux (Je and Jh , see below) develops as reaction to the perturbation and tries to re-establish thermodynamic equilibrium. The effectiveness of this mechanism is characterized by the respective transport coefficients. These are material, temperature, and pressure dependent constants that describe the linear relationship between the respective fluxes and perturbations, e.g., the electrical conductivity σ in Ohm’s law Je = −σ∇U
(9.1)
and the thermal conductivity κ in Fourier’s law Jh = −κ∇T .
(9.2)
Please note that both σ and κ are tensorial quantities, e.g., a gradient along the x-axis can in principle lead to fluxes along the y- or z-axis. In most cases, however, these nondiagonal terms of σ and κ are orders of magnitude smaller than the diagonal terms, so that we will restrict our discussion to the diagonal terms in this chapter. Furthermore, it is important to realize that Eq. (9.1) and Eq. (9.2) are truncated first-order Taylor expansions of the full relationships for Je (U ) and Jh (T ). For the purposes of this chapter, using such a linear response approximation is more than justified, given that the typical voltages ∆U < 1, 000 V and temperature differences ∆T < 1, 000 K applied in real-world applications to macroscopic solids (volume V > 1µm3 ) correspond to minute electric fields1 and temperature gradients at the microscopic level V < 10 nm3 . In this regime, we can thus consider the solid in the so called “Onsager picture” as consisting of microscopic, individual sections: Each of this section is assumed to so large that thermodynamic equilibrium rules hold and allow to define thermodynamic quantities such as the temperature for each individual section. However, the individual sections are not in thermodynamic equilibrium with respect to each other. At first sight, this seems to be an assumption that apparently simplifies the discussion. As we will see in this chapter, this is not the case for practical considerations: The fundamental reason is that we 1
As already mentioned in the previous chapter, this approximation would not longer hold for the interactions of solids with lasers. Today’s laser can generate huge electrical fields at a microscopic level that in turn lead to non-linear responses.
266
are now discussing macroscopic effects driven by microscopic interactions, which requires spanning several orders of magnitude in time and length scales. Developing theories that accurately describe the microscopic level but still allow insights on the macroscopic level within affordable computational effort is still topic of scientific research.
9.2
The Definition of Transport Coefficients
In the following, we will discuss electronic transport. For phonons, see Chap. 7, very similar relations are applicable with two important differences: Electrons carry charge and are Fermions, while phonons carry no charge and are Bosons. If the fluxes consist of particles that carry both energy and charge, as it is the case for electrons with charge e described by a chemical potential µ (i.e. the local Fermi energy in the respective “Onsager section”), the respective fluxes Je and Jh are coupled. An electron that is contributing to a heat flux driven by a thermal gradient still carries a charge and thus also leads to a electrical flux (and vice versa). Formally this is described by the so called Onsager coefficients Lij : ∇µ 11 Je = L −∇U − (9.3) + L12 (−∇T ) e ∇µ 21 (9.4) + L22 (−∇T ) . Jh = L −∇U − e As we will show with microscopic considerations later, L21 = T L12 . With that, a comparison of the Onsager relations above with Eq. (9.1) and (9.2) shows that the electrical conductivity corresponds to σ = L11 and that the thermal conductivity corresponds to κ = L22 −
L21 L12 (L12 )2 22 = L − T . L11 L11
(9.5)
The formal asymmetry in the definition of σ and κ stems in fact from their experimental definition. While σ is defined for measurements in thermodynamic equilibrium, in which both cathode and anode are kept at the same temperature via interactions with the environment (so heat is added and taken from the system), κ is typically measured in an open circuit, in which an electronic flux and the respective charges cannot leave the system. This leads to a pile-up of charge on the cold side, which in turn leads to an electric flux in the opposite direction that needs to be accounted for by the non-diagonal Lij terms. This leads to the definition given in Eq. (9.5). In turn, the considerations discussed above highlight an interesting and peculiar effect, the thermoelectric effect. For a non-zero temperature gradient in a closed circuit, Eq. (9.3) and (9.4) imply that a residual voltage is induced by a temperature gradient L12 ∇µ = 11 ∇T = S∇T , (9.6) −∇U − e L as described by the Seebeck coefficient S. This thermoelectric effect is currently very much in the focus of many scientific and industrial studies, since it in principle allows to recover a useful voltage from otherwise wasted heat, e.g., from industrial plants operating at high temperatures, from exhaustion fumes from vehicles, and even from electronics such 267
as CPUs. However, the conversion efficiency of the currently exhisting thermoelectric materials is still too low for a large-scale, economically attractive deployment. To enable such applications, thermoelectric materials with higher efficiency are needed, which is typically characterized by the figure-of-merit zT defined as: zT =
σS 2 T , κel + κph
(9.7)
whereby currently known thermoelectric materials exhibit values of zT of ∼ 2 at most. Improving zT is far from trivial: Metals typically exhibit high values of σ but also high values of κel , which essentially leads to zT ≈ 0. Vice versa, κ = κph becomes small in insulators due to the absence of charge carriers, but then σ and thus zT vanish again. Accordingly, (highly doped) semiconductors are the most attractive material class for these kind of applications. Eventually, let us note that the Seebeck effect also works “the other way round”: The so called Peltier effect describes how an electric current through a closed circuit induces a heat current Jh = T
L12 Je = T S · Je L11
(9.8)
that allows cooling. For instance, this technique is used to date in so called solid state fridges, e.g., in airplanes. To understand these effects and to estimate and compute this quantities, it is necessary to introduce concepts stemming from the still very active and quite complex research field of non-equilibrium quantum-mechanical thermodynamics and statistical mechanics. At first, let’s just discuss the qualitative results of such a derivation. The microscopic, local flux densities j(r, t) and jH (r, t) have to fulfill the respective continuity equations Z ∂n(r, t) 1 e + ∇ · j(r, t) = 0 ⇒ J(t) = j(r, t)dr (9.9) ∂t V V Z 1 ∂ε(r, t) + ∇ · jH (r, t) = 0 ⇒ JH (t) = jH (r, t)dr , (9.10) ∂t V V in which e n(r, t) is the charge density, ε(r, t) the energy density, and V the volume of an Onsager system. This allows to define the quantum-mechanical flux operators ˆj = e p ˆ (9.11) m ˆjH = 1 [H, p ˆ ] − he p ˆ. (9.12) 2m In the last equation, he denotes the enthalpy per electron. A full quantum-mechanical treatment of these equations involves quite elaborate derivations. For this exact reason, we will consider these effects in a semi-classical approximation first.
9.3
Semiclassical Theory of Transport
To introduce non-equilibrium in our system, we first define a wave packet X i Ψn (r, k, t) = g(k)Φn (k, r) exp − εn (k) with g(k′ ) ≈ 0 for |k − k′ | > ∆k (9.13) ~ k 268
Figure 9.1: Sketch of an electron wavepacket, as described by Eq. (9.13): The wavepacket shall be spread out over multiple unit cells, i.e., it shall be larger that typical lattice constants. Also, the wavepacket shall be smaller than the wavelength of the applied field. using the one-particles states Φn (k, r) and their eigenvalues εn (k). The condition |k−k′ | > δk implies that the wave packet is localized in k space and can thus be meaningfully approximated by an average k value. In turn, this implies that this wave packet is delocalized in real space, see Fig. 9.1. Accordingly, such a description is consistent with the Onsager picture defined in the introduction. Given that this wave packet is so wide spread, we can describes its dynamics in an electric field by classical equations of motion: r˙ = vn (k) =
1 ∂εn (k) ~ ∂k
and
~k˙ = −eE(r, t) .
(9.14)
While the first equation just makes use of the fact that a wave-packet travels with a welldefined group velocity, a fully correct, formal derivation of the second equations is tricky. Still, one can qualitatively derive them by considering that energy conservation needs to hold during this motion: εn (k) − eU = constant (9.15) Then, its time derivative must vanish:
∂εn (k) ˙ k − e∇U · r˙ ∂k h i h i ˙ ˙ = vn (k) · ~k − e∇U = vn (k) · ~k + eE
0 =
(9.16) (9.17)
Please note that in this case we have assumed that the band index n is a constant of the motion, i.e., that the considered electrical and thermal fields cannot induce electronic transition between different electronic bands. The flux associated to such a wave packets then becomes Jn (k) = −evn (k) and JH (9.18) n (k) = εn (k)vn (k) In this formalism, the individual electrons thus just travel with a velocity vn (k) and thereby carry a charge −e and an energy εn (k) with them. 269
Already this very simple considerations allow some very important insight in the physics of transport: • Solving this equations of motion for a constant DC field E(r, t) = E yields k(t) = k(0) −
eEt . ~
(9.19)
Given that our group velocity is periodic and obeys time reversal symmetry vn (k) = −vn (−k), this implies that the group velocity can change sign over time. In turn, this means that a DC field could induce an AC current if the electrons travel long enough to overcome zone boundaries. We will see later that this is generally not the case. • For the exact same reasons (vn (k) = −vn (−k), εn (k) = εn (−k)), we can see that a fully filled band is inert, i.e., it does not carry any flux Z Jn = dk Jn (k) = 0 , (9.20) since the contributions from k and −k cancel each other out. That’s why insulators are insulators and electronic transport only happens in semiconductors and metals with partially filled bands. This implies that the only relevant portion of the electronic band structure that contributes to transport is situated around the Fermi energy εF ± kB T . • Electrons & Holes: Now, let’s assume that band n contains only one electron at k′ viz. εn (k′ ), i.e., ( 1 for k = k′ f (εn (k)) = . (9.21) 0 else In this case, the electric flux becomes: Z Jn = dk f (εn (k))Jn (k) = −evn (k′ ) .
(9.22)
If the band would be fully filled except one electron at k′ viz. εn (k′ ), the respective occupation numbers become f ′ (εn (k)) = 1 − f (εn (k)) and we get the flux Z Z Z ′ Jn = dk f (εn (k))Jn (k) = dk Jn (k) − dk f (εn (k))Jn (k) = evn (k′ ) . (9.23) The first integral vanishes due to Eq. (9.20); the second one describes the flux of one “electron with opposite charge”, i.e., a hole. Although formally equivalent, it is often more convenient to discuss conduction in almost fully filled bands in terms of moving holes instead of electrons. There is one additional important implication: Since in full thermodynamic equilibrium the Fermi distribution is symmetric f (εn (−k)) = f (εn (k)), no flux can be present even in not fully filled bands. That is why we will have to look at nonequilibrium distributions in the next section to understand the transport. 270
• In semiconductors, in which only electrons/holes close to the valence band maximum/conduction band minimum contribute, one can often approximate the respective band structure with a second order Taylor expansion: εn (k) ≈ εn (k) +
1 X ∂ 2 εn (k) ki kj . 2 i,j ∂ki kj
(9.24)
Formally, the dispersion thus corresponds to the one of a free electron with a slightly different, not necessarily isotropic mass, as described by the tensor: −1 1 ∂ 2 εn (k) [M ]ij = . (9.25) ~2 ∂ki ∂kj Accordingly, one can speak about fast (light) and slow (heavy) electrons/holes in a solid.
9.4
Boltzmann Transport Theory
With the formalisms introduced in the previous sections, we are now able to discuss how an arbitrary electron population g(r, k, t), i.e., one that is not necessarily in thermodynamic equilibrium, evolves over time. Using Liouville’s theorem (conservation of phase space) and the relations derived for r(t) and k(t) in the previous section, we thus get: g(r, k, t)drdk = g(r − v(k)dt, k +
eE , t − dt)drdk ~
(9.26)
By linearizing these equation, we get the so called “drift” terms that in an Onsager picture have to be counterbalance by a scattering or collision mechanism: dg(r, k, t) ∂g ∂g eE ∂g ∂g = +v· − = (9.27) dt ∂t ∂r ~ ∂k ∂t coll drift By defining a collision scattering probability from state k into k′ Wkk′ dtdk′ , (2π)3
(9.28)
we find that these collisions can either lead to less electrons at k (scattering from k into any state k′ ): Z ∂g dk′ = −g(k) (9.29) Wkk′ [1 − g(k′ )] ∂t coll,out (2π)3 or to additional electrons at k (scattering from any state k′ into state k)2 : Z ∂g dk′ = [1 − g(k)] Wk′ k g(k′ ) . ∂t coll,in (2π)3 2
(9.30)
Please note that the respective collision rates in Eq. (9.29) and (9.30) are proportional to the population g of electrons that are actually scattered and proportional to 1 − g of states in which these electrons can be scattered.
271
With that, we get the full collision term ∂g ∂g ∂g = + ∂t coll ∂t coll,in ∂t coll,out Z ′ dk = − {Wkk′ g(k)[1 − g(k′ )] − Wk′ k [1 − g(k)]g(k′ )} (2π)3 [g(k) − g 0 (k)] ≈ − τ (k)
(9.31) (9.32) (9.33)
In the last step, we have applied the so called relaxation time approximation (RTA) that allows to solve the Boltzmann transport equation analytically. This relaxation time approximation assumes (a) that the electron distribution is close to the equilibrium one, i.e., to the Fermi function (b) that scattering generates equilibrium, i.e., drives the population g(k) towards g 0 (k), and (c) that all the involved scattering rates can be subsumed into single state-specific lifetime or relaxation time τ (k) that does not depend on g(k). Formally this implies: Z ∂g dk′ g(k) = −g(k) Wkk′ [1 − g(k′ )] ≈ − (9.34) 3 ∂t coll,out (2π) τ (k) Z ∂g dk′ g 0 (k) ′ ′ k g(k ) ≈ = [1 − g(k)] W (9.35) k ∂t coll,in (2π)3 τ (k) Now, lets solve the Boltzmann transport equation under different boundary conditions: • If no external forces/fields are present (E = 0) and the electrons are uniformly distributed in space (∇g = 0), Eq. (9.27) becomes [g(k) − g 0 (k)] ∂g =− ∂t τ (k)
(9.36)
in the RTA. Its solution is
t g(t) = g(k, t = 0) − g (k) exp − τ (k) 0
+ g 0 (k) ,
(9.37)
which describes the relaxation of a non-equilibrium population g(k, t = 0) to the equilibrium one in an exponential decay characterized by the relaxation time τ . • Under steady state (∂g/∂t = 0), closed circuit conditions (∇g = 0) with an external field E, Eq. (9.27) becomes −
eE ∂g [g(k) − g 0 (k)] =− ~ ∂k τ (k)
(9.38)
in the RTA. Its solution is g(t) = g 0 (t) + τ
eE ∂g eE ∂g 0 ∂g 0 ≈ g 0 (t) + τ = g 0 (t) + e v(k)τ E ~ ∂k ~ ∂k ∂ǫ(k) 272
(9.39)
In the last step, we have simplified the expression by assuming ∂g/∂k ≈ ∂g 0 /∂k and using the definition of the group velocity. With that, we can write the flux as X Z dk X Z dk ∂g 0 2 2 g (k, v (k)τ (k) j = −e t)v (k) = e E. (9.40) n n n 4π 3 4π 3 n ∂ǫn (k) n n | {z } σ
A comparison with Ohm’s law yields the definition of σ. Again, the fact that the derivative ∂g 0 /∂ǫn (k) enters the definition of σ above highlights that filled bands are inert and that only electrons/holes close to the Fermi-energy matter. Please note that the derivation can be generalized to the AC case X Z dk ∂g 0 1 2 2 σ(ω) = e v (k) (9.41) n 3 4π ∂ǫn (k) 1/τn (k) − iω n by using
E → Re[E exp(iωt)] .
(9.42)
• Eventually, we can generalize this derivation to describe all Onsager coefficients by defining the respective heat jq and entropy js fluxes from the respective thermodynamic relation: dQ = T dS ⇒ jq = T js . (9.43) Similarly, the respective particle jn and energy flux je can be defined using the respective thermodynamic relation and the chemical potential µ: T dS = dU − µdN ⇒ T js = je − µjn . Accordingly, we get: je jn ⇒ jq
X Z dk gn (k, t)vn (k)εn (k) = 4π 3 n X Z dk gn (k, t)vn (k) = 4π 3 n X Z dk = gn (k, t)vn (k)(εn (k) − µ) 4π 3 n
Solving the respective BTEs in the RTA yields: 0 ∂g εn (k) − µ 0 g(t) ≈ g (t) − v(k)τ (k) −eE − ∇µ + (−∇T ) . ∂ǫk T
(9.44)
(9.45) (9.46) (9.47)
(9.48)
Again, a comparison with the Onsager relations introduced before j = L11 (E + ∇µ) + L12 (−∇T ) jq = L21 (E + ∇µ) + L22 (−∇T )
yields the following definition for the Onsager coefficients X Z dk ∂g 0 α 2 2 L =e v (k)τ (k) (εn (k) − µ)α , n n 3 4π ∂ǫn (k) n
where L11 = L0 , L21 = T L12 = − 1e L1 , and L22 = 273
1 L2 . e2 T
(9.49) (9.50)
(9.51)
From the derived equations, one can immediately derive some important material properties: For metals, which feature a large amount of free carriers, the electrical and thermal conductivity is typically. Using the Sommerfeld expansion around εF introduced in the exercises, one can immediately show that the thermal conductivity and the electrical conductivity are proportional to each other, as described by the Wiedemann-Franz law: 2 π 2 kB κ= Tσ . (9.52) 3 e As shown in Fig. 9.2, the typical temperature dependence of the resistivity ρ = 1/σ follows the following rule: ρ(T ) ≈ ρ(0) + BT 5 . (9.53)
At low temperatures, we have a finite (residual) resistivity ρ(0) that is caused by scattering with (intrinsic) defects, grain boundaries, and surfaces. At elevated temperatures, the resistivity increases with T 5 due to the scattering of electrons with phonons that determine the lifetimes τ (k). As we will see in the next section, the resistivity can, however, drop to 0 (infinite conductivity) under certain circumstances.
9.5
Superconductivity
So far in this lecture we have considered materials as many-body “mix” of electrons and ions. The key approximations that we have made are • the Born-Oppenheimer approximation that allowed us to separate electronic and ionic degrees of freedom and to concentrate on the solution of the electronic manybody Hamiltonian with the nuclei only entering parameterically. • the independent-electron approximation, which greatly simplifies the theoretical description and facilitates our understanding of matter. For density-functional theory we have shown in Chapter 3 that the independent-electron approximation is always possible for the many-electron ground state. • that the independent quasiparticle can be seen as a simple extension of the independentelectron approximation (e.g. photoemission, Fermi liquid theory for metals, etc.). We have also considered cases in which these approximations break down. The treatment of electron-phonon (or electron-nuclei) coupling, for example, requires a more sophisticated approach, as we saw. Also most magnetic orderings are many-body phenomena. In Chapter 8 we studied, for example, how spin coupling is introduced through the manybody form of the wave function. In 1911 H. Kamerlingh Onnes discovered superconductivity in Leiden. He observed that the electrical resistance of various metals such as mercury, lead and tin disappeared completely at low enough temperatures (see Fig. 9.2). Instead of dropping off as T 5 to a finite resistivity that is given by impurity scattering in normal metals, the resistivity of superconductors drops of sharply to zero at a critical temperature Tc . The record for the longest unstained current in a material is now 2 1/2 years. However, according to the theories for ordinary metals developed in this lecture, perfect conductivity should not exist at T > 0. Moreover, superconductivity is not an isolated phenomenon. Figure 9.3 shows the 274
Figure 9.2: The resistivity of a normal (left) and a superconductor (right) as a function of temperature.
Figure 9.3: Superconducting elements in the periodic table. From Center for Integrated Research & Learning. elements in the periodic table that exhibit superconductivity, which is surprisingly large. Much larger, in fact, than the number of ferromagnetic elements in Fig. 8.1. However, compared to the ferromagnetic transition temperatures discussed in the previous Chapter typical critical temperatures for elemental superconductors are very low (see Tab. 9.1). The phenomenon of superconductivity eluded a proper understanding and theoretical description for many decades, which did not emerge until the 1950s and 1960s. Then in 1986 the picture that had been developed for “classical” superconductor was turned on its head by the discovery of high temperature superconductivity by Bednorz and Müller. This led to a sudden jump in achievable critical temperatures as Fig. 9.4 illustrates. The record element Tc /K
Hg (α) Sn 4.15 3.72
Pb 7.19
Nb 9.26
Table 9.1: Critical temperatures in Kelvin for selected elemental superconductor. 275
Figure 9.4: Timeline of superconducting materials and their critical temperature. Red symbols mark conventional superconductors and blue high temperature superconductors. From Wikipedia. now lies upwards of 150 Kelvin, but although this is significantly higher than what has so far been achieved with “conventional” superconductors (red symbols in Fig. 9.4), the dream of room temperature superconductivity still remains elusive. What also remains elusive is the mechanism that leads to high temperature superconductivity. In this Chapter we will therefore focus on “classical” or “conventional” superconductivity and recap the most important historic steps that led to the formulation of the BCS ( Bardeen, Cooper, and Schrieffer) theory of superconductivity. Suggested reading for this Chapter are the books by A. S. Alexandrov Theory of Superconductivity – From Weak to Strong Coupling, M. Tinkham Introduction to Superconductivity and D. L. Goodstein States of Matter.
9.6 Meissner effect
Perfect conductivity is certainly the first hallmark of superconductivity. The next hallmark was discovered in 1933 by Meissner and Ochsenfeld. They observed that in the superconducting state the material behaves like a perfect diamagnet: not only can an external field not enter the superconductor, which could still be explained by perfect conductivity, but a field that was originally present in the normal state is also expelled in the superconducting state (see Fig. 9.5). Expressed in the language of the previous Chapter this implies
$$B = \mu_0 \mu H = \mu_0 (1 + \chi) H = 0 \qquad (9.54)$$
and therefore $\chi = -1$, which is not only diamagnetic, but maximally so. The expulsion of a magnetic field cannot be explained by perfect conductivity, which would instead tend to trap flux. The existence of such a reversible Meissner effect implies that superconductivity will be destroyed by a critical magnetic field $H_c$. The phase diagram of such a superconductor (denoted type I, as we will see in the following) is shown in Fig. 9.6.
Figure 9.5: Diagram of the Meissner effect. Magnetic field lines, represented as arrows, are excluded from a superconductor when it is below its critical temperature.

To understand this odd behavior we first consider a perfect conductor. However, we assume that only a certain portion $n_s$ of all electrons conducts perfectly; all others conduct dissipatively. In an electric field E the electrons will be accelerated,
$$m\,\frac{d\mathbf{v}_s}{dt} = -e\mathbf{E} \ , \qquad (9.55)$$
and the current density is given by $\mathbf{j} = -e\,\mathbf{v}_s\,n_s$. Inserting this into eq. 9.55 gives
$$\frac{d\mathbf{j}}{dt} = \frac{n_s e^2}{m}\,\mathbf{E} \ . \qquad (9.56)$$
Substituting this into Faraday's law of induction
$$\nabla \times \mathbf{E} = -\frac{1}{c}\,\frac{\partial \mathbf{B}}{\partial t} \qquad (9.57)$$
we obtain
$$\frac{\partial}{\partial t}\left[\nabla \times \mathbf{j} + \frac{n_s e^2}{mc}\,\mathbf{B}\right] = 0 \ . \qquad (9.58)$$
Together with Maxwell's equation
$$\nabla \times \mathbf{B} = \frac{4\pi}{c}\,\mathbf{j} \qquad (9.59)$$
this determines the fields and currents that can exist in the superconductor. Eq. 9.58 tells us that any change in the magnetic field B will be screened immediately. This is consistent with Meissner's and Ochsenfeld's observation. However, there are also static currents and magnetic fields that satisfy the two equations, and this is inconsistent with the observation that fields are expelled from the superconducting region. Therefore perfect conductivity alone does not explain the Meissner effect.

Figure 9.6: The phase diagram of a type I superconductor as a function of temperature and the magnetic field H.

To solve this conundrum F. and H. London proposed to restrict the solutions for superconductors to fields and currents that satisfy
$$\nabla \times \mathbf{j} + \frac{n_s e^2}{mc}\,\mathbf{B} = 0 \ . \qquad (9.60)$$
This equation is known as the London equation. Inserting it into Maxwell's equation (9.59) yields
$$\nabla \times (\nabla \times \mathbf{B}) + \frac{4\pi n_s e^2}{mc^2}\,\mathbf{B} = 0 \ . \qquad (9.61)$$
Using the relation $\nabla \times (\nabla \times \mathbf{B}) = \nabla(\nabla\cdot\mathbf{B}) - \nabla^2\mathbf{B}$ and applying Gauss' law $\nabla\cdot\mathbf{B} = 0$ we obtain
$$\nabla^2 \mathbf{B} = \frac{4\pi n_s e^2}{mc^2}\,\mathbf{B} \ . \qquad (9.62)$$
For the solution of this differential equation we consider a half-infinite superconductor (x > 0). The magnetic field then decays exponentially,
$$B(x) = B(0)\,e^{-x/\lambda_L} \ , \qquad (9.63)$$
where the London penetration depth is given by
$$\lambda_L = \sqrt{\frac{m c^2}{4\pi n_s e^2}} = 41.9\,\left(\frac{r_s}{a_0}\right)^{3/2}\left(\frac{n}{n_s}\right)^{1/2}\,\text{Å} \ . \qquad (9.64)$$
The magnetic field at the border between a normal material and a superconductor is shown schematically in Fig. 9.7. The field is expelled from the superconductor by eddy currents at its surface and decays exponentially fast inside it. The decay length is given by the London penetration depth, which for
Figure 9.7: The magnetic field inside a type I superconductor drops off exponentially fast from its surface.

typical superconductors is of the order of 10² to 10³ Å. Although London's proposal phenomenologically explains the Meissner effect, it raises the question whether we can construct a microscopic theory (i.e. a microscopic version of London's equation) that goes beyond standard electrodynamics. And how can a phase transition in the electronic structure lead to the absence of all scattering?
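To get a feeling for the magnitude of $\lambda_L$, the first equality in eq. 9.64 can be evaluated directly. The following Python snippet is a minimal sketch in Gaussian units; the density $n_s \approx 8.5 \times 10^{22}\,\text{cm}^{-3}$ is an assumed, copper-like value chosen purely for illustration, and all conduction electrons are taken to be superconducting:

```python
import math

# Gaussian (CGS) units
m_e = 9.109e-28   # electron mass [g]
e   = 4.803e-10   # electron charge [esu]
c   = 2.998e10    # speed of light [cm/s]

def london_penetration_depth(n_s):
    """lambda_L = sqrt(m c^2 / (4 pi n_s e^2)), cf. eq. 9.64; n_s in cm^-3, result in cm."""
    return math.sqrt(m_e * c**2 / (4.0 * math.pi * n_s * e**2))

n_s = 8.5e22  # assumed superconducting electron density [cm^-3]
lam = london_penetration_depth(n_s)
print(f"lambda_L = {lam * 1e8:.0f} Angstrom")  # roughly 180 Angstrom
```

The result is of order 10² Å, consistent with the range quoted above for typical superconductors.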
9.7 London theory
F. London first noticed that a degenerate Bose-Einstein gas provides a good basis for a phenomenological model of superconductivity. F. and H. London then proposed a quantum mechanical analog of the electrodynamic derivation shown in the previous section. They showed that the Meissner effect could be interpreted very simply by a peculiar coupling in momentum space, as if there was something like a condensed phase of the Bose gas. The idea to replace Fermi statistics by Bose statistics in the theory of metals led them to an equation for the current, which turned out to be microscopically exact. The quantum mechanical current density is given by
$$\mathbf{j}(\mathbf{r}) = \frac{ie}{2m}\,(\psi^*\nabla\psi - \psi\nabla\psi^*) - \frac{e^2}{m}\,\mathbf{A}\,\psi^*\psi \ , \qquad (9.65)$$
where $\mathbf{A} \equiv \mathbf{A}(\mathbf{r})$ is a vector potential such that $\nabla \times \mathbf{A} = \mathbf{B}$. One should sum this expression over all electron states below the Fermi level of the normal metal. If the particles governed by the wave functions ψ in this expression were electrons, we would only observe weak Landau diamagnetism (see previous Chapter). So let us instead assume bosons with charge $e^*$ and mass $m^*$. They satisfy the Schrödinger equation
$$-\frac{1}{2m^*}\,(\nabla + ie^*\mathbf{A})^2\,\phi(\mathbf{r}) = E\,\phi(\mathbf{r}) \ . \qquad (9.66)$$
Choosing the Maxwell gauge for the vector field ($\nabla\cdot\mathbf{A} = 0$) we obtain
$$\left[-\frac{1}{2m^*}\,\nabla^2 - \frac{ie^*}{m^*}\,\mathbf{A}\cdot\nabla + \frac{e^{*2}}{2m^*}\,\mathbf{A}^2\right]\phi(\mathbf{r}) = E\,\phi(\mathbf{r}) \ . \qquad (9.67)$$
Now we assume A to be small so that we can apply perturbation theory,
$$\phi(\mathbf{r}) = \phi_0(\mathbf{r}) + \phi_l(\mathbf{r}) \ , \qquad (9.68)$$
where $\phi_0(\mathbf{r})$ is the ground state wave function in the absence of the field and $\phi_l(\mathbf{r})$ the perturbation introduced by A. We know from the theory of Bose condensates that at low temperatures the condensate wave function is a constant,
$$\phi_0(\mathbf{r}) = \frac{1}{\sqrt{V}} \ , \qquad k = 0, \ E_0 = 0, \qquad (9.69)$$
where V is the volume of the condensate. Next we insert eq. 9.68 into eq. 9.67 and expand to first order in A and $\phi_l$:
$$-\frac{1}{2m^*}\,\nabla^2\phi_0(\mathbf{r}) - \frac{1}{2m^*}\,\nabla^2\phi_l(\mathbf{r}) - \frac{ie^*}{m^*}\,\mathbf{A}\cdot\nabla\phi_0(\mathbf{r}) = E_0\,\phi_l(\mathbf{r}) + E_1\,\phi_0(\mathbf{r}) \ . \qquad (9.70)$$
The first and third term on the left-hand side of this equation and the first term on the right-hand side are zero. Since we work to first order in A, $E_1$ has to be proportional to A, and because $E_1$ is a scalar our only choice is for it to be proportional to $\nabla\cdot\mathbf{A}$. This means it vanishes in the Maxwell gauge, and $\phi_l(\mathbf{r}) = 0$ to first order in A. Inserting $\phi(\mathbf{r}) = \phi_0(\mathbf{r})$ into the equation for the current density (eq. 9.65), the term in brackets evaluates to zero and we are left with
$$\mathbf{j}(\mathbf{r}) = -\frac{e^{*2}}{2m^*}\,\frac{N_s}{V}\,\mathbf{A} = -\frac{n_s e^{*2}}{2m^*}\,\mathbf{A} \ . \qquad (9.71)$$
With $\nabla \times \mathbf{A} = \mathbf{B}$ and Maxwell's equation $\nabla \times \mathbf{B} = \frac{4\pi}{c}\,\mathbf{j}$ we finally arrive at
$$\mathbf{B} + \lambda_L^2\,\nabla \times (\nabla \times \mathbf{B}) = 0 \ , \qquad (9.72)$$
which is identical to the London equation derived from electrodynamics in the previous section (cf. eq. 9.61). We now have a quantum mechanical derivation of the London equation and the London penetration depth, and we have identified the carriers in the superconducting state as bosons. This raises the question: where do these bosons come from?
9.8 Flux quantization
If the superconducting state is a condensate of bosons, as proposed by London, then its wave function is of the form
$$\phi(\mathbf{r}) = \sqrt{n_s}\,e^{i\varphi(\mathbf{r})} \ . \qquad (9.73)$$
Here $n_s$ is the superconducting density that we introduced in section 9.6, which we assume to be constant throughout the superconducting region, and $\varphi(\mathbf{r})$ is the phase of the wave function, which is yet undetermined. To pin down the phase let us consider a hole in our superconductor (cf. Fig. 9.8) or alternatively a superconducting ring. The magnetic flux through this hole is
$$\Phi_B = \int_{\rm surf} d\mathbf{s}\cdot\mathbf{B} = \int_{\rm surf} d\mathbf{s}\cdot(\nabla \times \mathbf{A}(\mathbf{r})) = \oint_C d\mathbf{l}\cdot\mathbf{A}(\mathbf{r}) \ , \qquad (9.74)$$
Figure 9.8: Flux trapped in a hole in a superconductor is quantized.

where we have used Stokes' theorem to convert the surface integral into a line integral over the contour C. According to London's theory the magnetic field inside the superconductor falls off exponentially, so that we can find a contour on which B and j are zero. Using the equation for the current density (eq. 9.65) this yields
$$0 = \mathbf{j}(\mathbf{r}) = \frac{ie^*}{2m^*}\,(\psi^*\nabla\psi - \psi\nabla\psi^*) - \frac{e^{*2}}{m^*}\,\mathbf{A}\,\psi^*\psi \ . \qquad (9.75)$$
Inserting our wave function for the condensate (eq. 9.73) gives the following equation for the phase,
$$\frac{ie^*}{2m^*}\,(i\nabla\varphi\,\phi^*\phi + i\nabla\varphi\,\phi\phi^*) - \frac{e^{*2}}{m^*}\,\mathbf{A}\,\phi^*\phi = 0 \qquad (9.76)$$
or
$$-\frac{e^*}{m^*}\,\phi^*\phi\,(\nabla\varphi + e^*\mathbf{A}) = 0 \ . \qquad (9.77)$$
This implies that there is a relation between the phase and the vector potential,
$$\mathbf{A}(\mathbf{r}) = -\frac{\nabla\varphi(\mathbf{r})}{e^*} \ . \qquad (9.78)$$
Inserting this result into our expression for the magnetic flux (eq. 9.74) we finally obtain
$$\Phi_B = -\oint_C d\mathbf{l}\cdot\frac{\nabla\varphi}{e^*} = \frac{\delta\varphi}{e^*} \ . \qquad (9.79)$$
Here δϕ is the change of the phase in a round trip along the contour, which is only given up to modulo 2π: δϕ = 2πp with p = 0, 1, 2, . . . This means the magnetic flux is quantized:
$$\Phi_B = \frac{2\pi\hbar c}{e^*}\,p \ . \qquad (9.80)$$
Experimentally this was confirmed by Deaver and Fairbank in 1961, who found e* = 2e!
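As a quick numerical check of eq. 9.80, the elementary flux quantum $\Phi_0 = 2\pi\hbar c / e^*$ with $e^* = 2e$ can be evaluated in Gaussian units. This is a minimal sketch with the physical constants hard-coded for transparency:

```python
import math

# Gaussian (CGS) units
hbar = 1.055e-27   # erg s
c    = 2.998e10    # cm/s
e    = 4.803e-10   # esu

# Flux quantum from eq. 9.80 with p = 1 and e* = 2e
phi_0 = 2.0 * math.pi * hbar * c / (2.0 * e)
print(f"Phi_0 = {phi_0:.2e} G cm^2")  # about 2.07e-7 G cm^2
```

The factor of 2 in the denominator is precisely the experimental signature of electron pairing.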
9.9 Ogg's pairs
Recapping at this point, we have learned that the superconducting state is made up of bosons with twice the electron charge. This led Richard Ogg Jr. to propose in 1946 that electrons might pair in real space. As for the hydrogen molecule discussed in section 8.4.4.1, the two electrons could pair chemically and form a boson with spin S = 0 or 1. Ogg suggested that an ensemble of such two-electron entities could, in principle, form a superconducting Bose-Einstein condensate. The idea was motivated by his demonstration that electron pairs were a stable constituent of fairly dilute solutions of alkali metals in liquid ammonia. The theory was further developed by Schafroth, Butler and Blatt, but did not produce a quantitative description of superconductivity (e.g. the predicted $T_c$ of ∼ 10⁴ K is much too high). The theory could also not provide a microscopic force to explain the pairing of normally repulsive electrons. It was therefore concluded that electron pairing in real space does not work, and the theory was forgotten.
9.10 Microscopic theory - BCS theory

9.10.1 Cooper pairs
The basic idea behind the weak attraction between two electrons was presented by Cooper in 1956. He showed that the Fermi sea of electrons is unstable against the formation of at least one bound pair, regardless of how weak the interaction is, so long as it is attractive. This result is a consequence of Fermi statistics and of the existence of the Fermi-sea background, since it is well known that binding does not ordinarily occur in the two-body problem in three dimensions until the strength of the potential exceeds a finite threshold value. To see how the binding comes about, we consider the model of two quasiparticles with momentum k and −k in a Fermi liquid (e.g. a free-electron metal). The spatial part of the two-particle wave function of this pair is then
$$\psi_0(r_1, r_2) = \sum_k g_k\, e^{ikr_1}\, e^{-ikr_2} \ . \qquad (9.81)$$
For the spin part we assume singlet pairing, $|\chi_{\rm spin}\rangle_S = 2^{-1/2}\,(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle)$ (see also section 8.4.4.1), which gives us S = 0, i.e. a boson. The triplet with S = 1 would also be possible, but is of higher energy for conventional superconductors (see e.g. unconventional superconductivity). $|\chi_{\rm spin}\rangle_S$ is antisymmetric with respect to particle exchange and $\psi_0(r_1, r_2)$ therefore has to be symmetric:
$$\psi_0(r_1, r_2) = \sum_{k>k_F} g_k \cos k(r_1 - r_2) \ . \qquad (9.82)$$
If we insert this wave function into our two-particle Schrödinger equation
$$\left[\sum_{i=1,2}\frac{p_i^2}{2m} + V(r_1, r_2)\right]\psi_0 = E\,\psi_0 \qquad (9.83)$$
we obtain
$$(E - 2\epsilon_k)\,g_k = \sum_{k'>k_F} V_{kk'}\,g_{k'} \ . \qquad (9.84)$$
$V_{kk'}$ is the characteristic strength of the scattering potential and is given by the Fourier transform of the potential V,
$$V_{kk'} = \frac{1}{\Omega}\int d\mathbf{r}\, V(\mathbf{r})\, e^{-i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{r}} \ , \qquad (9.85)$$
with $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$ and Ω the normalized volume. The energies $\epsilon_k$ in eq. 9.84 are the unperturbed plane-wave energies and are larger than $\epsilon_F$, because the sum runs over $k > k_F$. If a set of $g_k$ exists such that $E < 2\epsilon_F$, then we would have a bound state of two electrons. But how can this be? Recall that the Fourier transform of the bare Coulomb potential (i.e. free electrons) is
$$V_{kk'} = V_{q=k-k'} = \frac{4\pi e^2}{q^2} > 0 \ . \qquad (9.86)$$
This is always larger than zero and therefore not attractive, as expected. However, quasiparticles are not free electrons and are screened by the electron sea. Let us consider the Thomas-Fermi screening introduced in Chapter 3. The dielectric function becomes
$$\varepsilon(k) = \frac{k^2 + k_0^2}{k^2} \neq 1 \ . \qquad (9.87)$$
The bare Coulomb potential is screened by ε,
$$V(r_1 - r_2) = \frac{\varepsilon^{-1}(r_1 - r_2)\,e^2}{|r_1 - r_2|} \ . \qquad (9.88)$$
Its Fourier transform
$$V_q = \frac{4\pi e^2}{q^2 + k_0^2} \qquad (9.89)$$
is still positive, but now reduced compared to the bare Coulomb interaction. Building on the idea of screening we should also consider the ion cores. They are positively charged and also move to screen the negative charge of our electron pair. However, their "reaction speed" is much smaller than that of the electrons and of the order of typical phonon frequencies. A rough estimate (see e.g. Ashcroft Chapter 26) to further screen the Thomas-Fermi expression could look like
$$V_q = \frac{4\pi e^2}{q^2 + k_0^2}\left(1 + \frac{\omega^2}{\omega^2 - \omega_q^2}\right) \ , \qquad (9.90)$$
where $\omega_q$ is a phonon frequency of wave vector q and $\omega = (\epsilon_k - \epsilon_{k'})/\hbar$. If now $\omega < \omega_q$, this (oversimplified) potential would be negative and therefore attractive. In other words, phonons could provide our attractive interaction, which is schematically depicted in Fig. 9.9.

At normal temperatures, the positive ions in a superconducting material vibrate on the spot and constantly collide with the electron bath around them. It is these collisions that cause the electrical resistance that wastes energy and produces heat in any normal circuit. But the cooler the material gets, the less energy the ions have, so the less they vibrate. When the material reaches its critical temperature, the ions' vibrations are incredibly weak and no longer the dominant form of motion in the lattice. The tiny attractive force of passing electrons that has always been there is suddenly enough to drag the positive ions out of position towards them. And that dragging affects the behaviour of the solid as a whole. When positive ions are drawn towards a passing electron, they create an area that is more positive than its surroundings, so another nearby electron is drawn towards it. However, those electrons are on the move, so by the time the second one has arrived the first one has moved on and created a path of higher positivity that the second electron keeps on following. The electrons are hitched in a game of catch-up that lasts as long as the temperature stays low.

Figure 9.9: Cooper pair formation schematically: at extremely low temperatures, an electron can draw the positive ions in a superconductor towards it. This movement of the ions creates a more positive region that attracts another electron to the area.

To proceed, Cooper further simplified the potential:
$$V_{kk'} = \begin{cases} -V & \text{for } |\epsilon_k - \epsilon_F|,\ |\epsilon_{k'} - \epsilon_F| < \hbar\omega_c \\ 0 & \text{otherwise} \end{cases} \qquad (9.91)$$
where $\omega_c$ is the Debye frequency. Now V pulls out of the sum on the right-hand side of eq. 9.84 and we can rearrange for $g_k$:
$$g_k = V\,\frac{\sum_{k'} g_{k'}}{2\epsilon_k - E} \ . \qquad (9.92)$$
Applying $\sum_k$ on both sides and canceling $\sum_k g_k$ we obtain
$$\frac{1}{V} = \sum_{k>k_F} (2\epsilon_k - E)^{-1} \ . \qquad (9.93)$$
Replacing the sum by an integral yields
$$\frac{1}{V} = \int_{\epsilon_F}^{\epsilon_F + \hbar\omega_c} d\epsilon\, N(\epsilon)\,\frac{1}{2\epsilon - E} \approx \frac{1}{2}\,N(\epsilon_F)\,\ln\frac{2\epsilon_F - E + 2\hbar\omega_c}{2\epsilon_F - E} \ , \qquad (9.94)$$
where we have made the assumption $N(\epsilon) \approx N(\epsilon_F)$. Solving this expression for E we obtain
$$E\left(e^{-\frac{2}{N V}} - 1\right) = 2\epsilon_F\left(e^{-\frac{2}{N V}} - 1\right) + 2\hbar\omega_c\,e^{-\frac{2}{N V}} \ . \qquad (9.95)$$
Now we make the "weak coupling" approximation $N(\epsilon_F)V \ll 1$, such that $1 - e^{-\frac{2}{N V}} \approx 1$, to arrive at the final expression
$$E \approx 2\epsilon_F - 2\hbar\omega_c\,e^{-\frac{2}{N(\epsilon_F) V}} < 2\epsilon_F \ . \qquad (9.96)$$
We see that we indeed find a bound state, in which the binding outweighs the gain in kinetic energy for states above $\epsilon_F$, regardless of how small V is. It is important to note that the binding energy is not analytic at V = 0, i.e. it cannot be expanded in powers of V. This means that perturbation theory is not applicable, which slowed down the development of the theory significantly. Looking back at the Ogg pairs discussed in section 9.9, we note that the Cooper pair is bound in reciprocal space:
$$\psi_0 = \sum_{k>k_F} g_k \cos k(r_1 - r_2) \qquad (9.97)$$
$$\sim \sum_{k>k_F} \frac{V}{2(\epsilon_k - \epsilon_F) + 2\epsilon_F - E}\,\cos k(r_1 - r_2) \ . \qquad (9.98)$$
The weight $g_k$ assumes its maximum value for electrons at the Fermi level, where the denominator is smallest, and falls off from there. Thus the electron states within a range $2\epsilon_F - E$ above $\epsilon_F$ are those most strongly involved in forming the bound states.
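The weak-coupling result of eq. 9.96 is easy to check against the implicit condition of eq. 9.94. The sketch below does this in Python; the values $\hbar\omega_c = 0.02\,\epsilon_F$ and $N(\epsilon_F)V = 0.3$ are assumed, illustrative parameters only:

```python
import math
from scipy.optimize import brentq

eps_F = 1.0   # Fermi energy (sets the unit of energy); assumed
hw_c  = 0.02  # Debye cutoff hbar*omega_c in units of eps_F; assumed
NV    = 0.3   # dimensionless coupling N(eps_F) * V; assumed

# Eq. 9.94: 1/V = (N/2) ln[(2 eps_F - E + 2 hw_c)/(2 eps_F - E)].
# Find the pair energy E < 2 eps_F that satisfies it.
def pair_condition(E):
    return 0.5 * NV * math.log((2*eps_F - E + 2*hw_c) / (2*eps_F - E)) - 1.0

E_exact = brentq(pair_condition, 2*eps_F - 10*hw_c, 2*eps_F - 1e-12)
E_weak  = 2*eps_F - 2*hw_c * math.exp(-2.0 / NV)   # eq. 9.96

print(f"binding energy (numeric):       {2*eps_F - E_exact:.3e}")
print(f"binding energy (weak coupling): {2*eps_F - E_weak:.3e}")
```

Both numbers agree to within a fraction of a percent, and the non-analytic factor $e^{-2/NV}$ makes it obvious why an expansion in powers of V must fail.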
9.11 Bardeen-Cooper-Schrieffer (BCS) Theory
Having seen that the Fermi sea is unstable against the formation of a bound Cooper pair when the net interaction is attractive, we expect pairs to condense until an equilibrium point is reached. This will occur when the state of the system is so greatly changed from the Fermi sea, due to the large number of pairs, that the binding energy for an additional pair has gone to zero. The wave function for such a state appears to be quite complex, and it was the ingenuity of Bardeen, Cooper and Schrieffer to develop it. The basic idea is to create the total many-electron wave function from the pair wave functions φ we discussed in the previous section,
$$\Psi = \phi(r_1\sigma_1, r_2\sigma_2)\,\phi(r_3\sigma_3, r_4\sigma_4)\cdots\phi(r_{N-1}\sigma_{N-1}, r_N\sigma_N) \ , \qquad (9.99)$$
and then to antisymmetrize it ($\Psi_{\rm BCS} = \hat{A}\Psi$). The coefficients of $\Psi_{\rm BCS}$ could then be found by minimizing the total energy of the many-body Hamiltonian. BCS introduced a simplified pairing Hamiltonian,
$$H = \sum_{k,\sigma} \xi_k\, c^\dagger_{k\sigma} c_{k\sigma} + \sum_{kk'} V_{kk'}\, c^\dagger_{k\uparrow} c^\dagger_{-k\downarrow}\, c_{-k'\downarrow} c_{k'\uparrow} \ , \qquad (9.100)$$
with $\xi_k = \epsilon_k - \epsilon_F$ and the usual electron creation and annihilation operators $c^\dagger_{k\sigma}$ and $c_{k\sigma}$. $V_{kk'}$ again is our attractive interaction as in Eq. (9.91), but it is restricted to (k ↑, −k ↓) pairs. The problem is solvable in principle, but this would not be very illuminating. Instead, we will consider a mean-field approximation.
We start with the observation that the characteristic BCS pairing Hamiltonian will lead to a ground state which is some phase-coherent superposition of many-body states with pairs of Bloch states (k ↑, −k ↓) occupied or unoccupied. Because of the coherence, operators such as $c_{-k\downarrow} c_{k\uparrow}$ can have nonzero expectation values $b_k = \langle c_{-k\downarrow} c_{k\uparrow}\rangle_{\rm avg}$ in such a state, rather than averaging to zero as in a normal metal, where the phases are random. Moreover, because of the large number of particles involved, the fluctuations about these expectation values should be small. This suggests that it will be useful to express such a product of operators formally as
$$c_{-k\downarrow} c_{k\uparrow} = b_k + \left(c_{-k\downarrow} c_{k\uparrow} - b_k\right) \ , \qquad (9.101)$$
and subsequently neglect quantities which are quadratic in the (presumably small) fluctuation term in parentheses. By making the following mean-field approximation for the potential term,
$$\sum_{k'} V_{kk'}\, c_{-k'\downarrow} c_{k'\uparrow} \approx -V\,\Theta(\hbar\omega_c - |\xi_k|) \sum_{k'} \Theta(\hbar\omega_c - |\xi_{k'}|)\,\langle c_{-k'\downarrow} c_{k'\uparrow}\rangle_{\rm avg} \qquad (9.102)$$
$$=: \Delta_k = \begin{cases} \Delta & \text{for } |\epsilon_k - \epsilon_F| < \hbar\omega_c \ , \\ 0 & \text{otherwise} \ , \end{cases} \qquad (9.103)$$
inserting all into the BCS Hamiltonian Eq. (9.100), and expanding to first order, we write
$$H_{\rm MF} = \sum_{k,\sigma} \xi_k\, c^\dagger_{k\sigma} c_{k\sigma} + \sum_{kk'} V_{kk'} \left( c^\dagger_{k\uparrow} c^\dagger_{-k\downarrow}\, b_{k'} + b^\dagger_k\, c_{-k'\downarrow} c_{k'\uparrow} - b^\dagger_k b_{k'} \right) \qquad (9.104)$$
$$= \sum_{k,\sigma} \xi_k\, c^\dagger_{k\sigma} c_{k\sigma} - \sum_k \left( \Delta_k\, c^\dagger_{k\uparrow} c^\dagger_{-k\downarrow} + \Delta^*_k\, c_{-k\downarrow} c_{k\uparrow} - \Delta_k\, b^\dagger_k \right) \ . \qquad (9.105)$$
This is an effective single-particle Hamiltonian (two c-operators instead of four), which we can solve. To diagonalize this mean-field Hamiltonian, we define a suitable linear transformation onto a new set of Fermi operators,
$$c_{k\uparrow} = u_k\,\alpha_k + v_k\,\beta_k^\dagger \qquad (9.106)$$
$$c_{-k\downarrow} = u_k\,\beta_k - v_k\,\alpha_k^\dagger \ , \qquad (9.107)$$
where $\alpha_k$ and $\beta_k$ are two types of new non-interacting fermions (i.e. quasiparticles). The expansion coefficients are
$$u_k^2 = \frac{1}{2}\left(1 + \frac{\xi_k}{\tilde\epsilon_k}\right) \qquad (9.108)$$
$$v_k^2 = \frac{1}{2}\left(1 - \frac{\xi_k}{\tilde\epsilon_k}\right) \qquad (9.109)$$
$$u_k v_k = -\frac{\Delta_k}{2\tilde\epsilon_k} \qquad (9.110)$$
and
$$\tilde\epsilon_k = \sqrt{\xi_k^2 + |\Delta_k|^2} \ . \qquad (9.111)$$
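The content of eqs. 9.108-9.111 is easy to tabulate numerically. The following minimal sketch, with an assumed illustrative gap of Δ = 0.05 (in units of the Fermi energy), shows that the quasiparticle spectrum $\tilde\epsilon_k$ never drops below Δ and how the coherence factors $u_k^2$, $v_k^2$ interpolate between electron- and hole-like character:

```python
import numpy as np

Delta = 0.05                        # gap parameter; assumed, illustrative
xi = np.linspace(-0.5, 0.5, 11)     # xi_k = eps_k - eps_F

# Bogoliubov quasiparticle energies and coherence factors, eqs. (9.108)-(9.111)
eps_tilde = np.sqrt(xi**2 + Delta**2)
u2 = 0.5 * (1.0 + xi / eps_tilde)
v2 = 0.5 * (1.0 - xi / eps_tilde)

for x, e_qp, u, v in zip(xi, eps_tilde, u2, v2):
    print(f"xi = {x:+.2f}  eps~ = {e_qp:.3f}  u^2 = {u:.3f}  v^2 = {v:.3f}")
# Note: eps~ >= Delta for all xi, i.e. the excitation spectrum is gapped (cf. Fig. 9.10).
```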
With this so-called Bogoliubov transformation, the mean-field Hamiltonian becomes diagonal,
$$H_{\rm MF} = E_0 + \sum_k \tilde\epsilon_k\,(\alpha_k^\dagger\alpha_k + \beta_k^\dagger\beta_k) \ , \qquad (9.112)$$
with
$$E_0 = 2\sum_k \left(\xi_k v_k^2 + \Delta_k u_k v_k\right) + \frac{|\Delta|^2}{V} \ . \qquad (9.113)$$
The order parameter ∆ is given as before by
$$\Delta = -V \sum_{k'} \Theta(\hbar\omega_c - |\xi_{k'}|)\,\langle c_{-k'\downarrow} c_{k'\uparrow}\rangle_{\rm avg} \ . \qquad (9.114)$$
If we now replace $\langle c_{-k'\downarrow} c_{k'\uparrow}\rangle_{\rm avg}$ by $\alpha_k$ and $\beta_k$ as given in Eqs. (9.106) and (9.107), we obtain the following relation:
$$\Delta = V \sum_{k'} \frac{\Delta}{\tilde\epsilon_{k'}}\,(1 - 2f_{k'}) \ . \qquad (9.115)$$
$f_k$ is the quasiparticle distribution function $f_k = \langle\alpha_k^\dagger\alpha_k\rangle = \langle\beta_k^\dagger\beta_k\rangle$. Unlike in the case of bare electrons, the total average number of quasiparticles is not fixed. Therefore, their chemical potential is zero in thermal equilibrium. Since they do not interact, their distribution is described by the usual Fermi-Dirac function, so that
$$f_k = \frac{1}{e^{\tilde\epsilon_k / k_B T} + 1} \ . \qquad (9.116)$$
We see that at T = 0, $f_k = 0$, so there are no quasiparticles in the ground state. This implies that $E_0$ is the ground state energy of the BCS superconductor at T = 0. To evaluate $E_0$ we need to know ∆. The trivial solution to Eq. (9.115) is ∆ = 0. It corresponds to the normal state and gives
$$E_0 = 2\sum_k \xi_k v_k^2 = 2\sum_k \frac{1}{2}\left(1 - \frac{\xi_k}{\tilde\epsilon_k}\right)\xi_k \ . \qquad (9.117)$$
For ∆ = 0, we have $\tilde\epsilon_k = |\xi_k|$ and therefore
$$E_0 = 2\sum_{\xi_k < 0} \xi_k \ , \qquad (9.118)$$
(the sum over $k > k_F$ gives zero since $\tilde\epsilon_k = \xi_k$ there). The non-trivial solution is obtained by canceling the ∆ from Eq. (9.115):
$$1 = V \sum_{k'} \frac{1}{\tilde\epsilon_{k'}} \ . \qquad (9.119)$$
Since $\tilde\epsilon_k$ still depends on ∆, we solve for it by replacing the momentum summation $\sum_k$ by an energy integration $\int d\epsilon\, N(\epsilon)$ with the density of states N(ǫ). We again approximate the density of states by its value at the Fermi level, $N(\epsilon) = N(\epsilon_F)$. This yields
$$\frac{1}{N(\epsilon_F)V} = \int_0^{\hbar\omega_c} \frac{d\xi}{\sqrt{\Delta^2 + \xi^2}} = \sinh^{-1}\frac{\hbar\omega_c}{\Delta} \ . \qquad (9.120)$$
Solving for ∆ in the weak coupling limit $N(\epsilon_F)V \ll 1$ finally gives
$$\Delta = \frac{\hbar\omega_c}{\sinh\left(\frac{1}{N(\epsilon_F)V}\right)} \approx 2\hbar\omega_c\,e^{-\frac{1}{N(\epsilon_F)V}} \ . \qquad (9.121)$$
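Eq. 9.121 can again be checked numerically. The following sketch solves eq. 9.120 for ∆ exactly and compares with the weak-coupling exponential; the inputs $\hbar\omega_c$ and $N(\epsilon_F)V$ are assumed, illustrative values only:

```python
import math

hw_c = 1.0   # Debye energy hbar*omega_c (sets the energy scale); assumed
NV   = 0.25  # dimensionless coupling N(eps_F) * V; assumed

# Eq. 9.120: 1/(N V) = asinh(hw_c / Delta)  =>  Delta = hw_c / sinh(1/(N V))
delta_exact = hw_c / math.sinh(1.0 / NV)
delta_weak  = 2.0 * hw_c * math.exp(-1.0 / NV)   # eq. 9.121, weak-coupling limit

print(f"Delta (exact):         {delta_exact:.4e}")
print(f"Delta (weak coupling): {delta_weak:.4e}")
```

For $N(\epsilon_F)V = 0.25$ the two expressions already agree to better than 0.1 percent, confirming that the exponential form is an excellent approximation in the weak-coupling regime.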
Figure 9.10: The dispersion of electron-like excitations in the SC state is gapped and differs from that of free electrons (∆ = 0). Taken from Mean-Field Theory: Hartree-Fock and BCS, Erik Koch, http://www.cond-mat.de/events/correl16/manuscripts/koch.pdf

Inserting everything into the equation for the ground state energy (Eq. (9.113)), we obtain for the condensation energy
$$E_c = E_0(\Delta \neq 0) - E_0(\Delta = 0) \approx 2\sum_{\xi_k < 0} \xi_k - \frac{1}{2}\,N(\epsilon_F)\,\Delta^2 - 2\sum_{\xi_k < 0} \xi_k$$