English Pages 649 [650] Year 2023
Applied and Numerical Harmonic Analysis
Patrick Flandrin Stéphane Jaffard Thierry Paul Bruno Torrésani Editors
Theoretical Physics, Wavelets, Analysis, Genomics An Indisciplinary Tribute to Alex Grossmann
Applied and Numerical Harmonic Analysis Series Editors John J. Benedetto, University of Maryland, College Park, MD, USA Wojciech Czaja, Mathematics, University of Maryland, College Park, College Park, MD, USA Kasso Okoudjou, Dept of Mathematics, Tufts University, Medford, MA, USA Editorial Board Members Akram Aldroubi, Vanderbilt University, Nashville, TN, USA Peter Casazza, Math Department, University of Missouri, Columbia, USA Douglas Cochran, Arizona State University, Phoenix, AZ, USA Hans G. Feichtinger, University of Vienna, Vienna, Austria Anna C. Gilbert, Dept of Statistics and Data Science, Yale University, New Haven, CT, USA Christopher Heil, Georgia Institute of Technology, Atlanta, GA, USA Stéphane Jaffard, University of Paris XII, Paris, France Gitta Kutyniok, Ludwig Maximilian University of Munich, München, Bayern, Germany Mauro Maggioni, Johns Hopkins University, Baltimore, MD, USA Ursula Molter, University of Buenos Aires, Buenos Aires, Argentina Zuowei Shen, National University of Singapore, Singapore, Singapore Thomas Strohmer, University of California, Davis, CA, USA Michael Unser, Laboratoire d’imagerie biomédicale, École Polytechnique Fédérale de Lausa, Lausanne, Switzerland Yang Wang, Hong Kong University of Science & Technology, Kowloon, Hong Kong
Editors Patrick Flandrin Laboratoire de Physique, Université Lyon, ENS de Lyon Université Claude Bernard CNRS, Lyon, France Thierry Paul Sorbonne Université, CNRS, Université Paris Cité Laboratoire Jacques-Louis Lions (LJLL) Paris, France
Stéphane Jaffard Laboratoire d’Analyse et de Mathématiques Appliquées Université Paris Est Créteil Créteil, France Bruno Torrésani Aix Marseille Univ CNRS, Marseille, I2M, France
ISSN 2296-5009 ISSN 2296-5017 (electronic) Applied and Numerical Harmonic Analysis ISBN 978-3-030-45846-1 ISBN 978-3-030-45847-8 (eBook) https://doi.org/10.1007/978-3-030-45847-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ANHA Series Preface
The Applied and Numerical Harmonic Analysis (ANHA) book series aims to provide the engineering, mathematical, and scientific communities with significant developments in harmonic analysis, ranging from abstract harmonic analysis to basic applications. The title of the series reflects the importance of applications and numerical implementation, but richness and relevance of applications and implementation depend fundamentally on the structure and depth of theoretical underpinnings. Thus, from our point of view, the interleaving of theory and applications and their creative symbiotic evolution is axiomatic. Harmonic analysis is a wellspring of ideas and applicability that has flourished, developed, and deepened over time within many disciplines and by means of creative cross-fertilization with diverse areas. The intricate and fundamental relationship between harmonic analysis and fields such as signal processing, partial differential equations (PDEs), and image processing is reflected in our state-of-the-art ANHA series. Our vision of modern harmonic analysis includes a broad array of mathematical areas, e.g., wavelet theory, Banach algebras, classical Fourier analysis, time-frequency analysis, deep learning, and fractal geometry, as well as the diverse topics that impinge on them. For example, wavelet theory can be considered an appropriate tool to deal with some basic problems in digital signal processing, speech and image processing, geophysics, pattern recognition, biomedical engineering, and turbulence. These areas implement the latest technology from sampling methods on surfaces to fast algorithms and computer vision methods. The underlying mathematics of wavelet theory depends not only on classical Fourier analysis, but also on ideas from abstract harmonic analysis, including von Neumann algebras and the affine group.
This leads to a study of the Heisenberg group and its relationship to Gabor systems, and of the metaplectic group for a meaningful interaction of signal decomposition methods. The unifying influence of wavelet theory in the aforementioned topics illustrates the justification for providing a means for centralizing and disseminating information from the broader, but still focused, area of harmonic analysis. This will be a key
role of ANHA. We intend to publish with the scope and interaction that such a host of issues demands. Along with our commitment to publish mathematically significant works at the frontiers of harmonic analysis, we have a comparably strong commitment to publish major advances in the following applicable topics in which harmonic analysis plays a substantial role:

* Analytic Number Theory * Antenna Theory * Artificial Intelligence * Biomedical Signal Processing * Classical Fourier Analysis * Coding Theory * Communications Theory * Compressed Sensing * Crystallography and Quasi-Crystals * Data Mining * Data Science * Deep Learning * Digital Signal Processing * Dimension Reduction and Classification * Fast Algorithms * Frame Theory and Applications * Gabor Theory and Applications * Geophysics * Image Processing * Machine Learning * Manifold Learning * Numerical Partial Differential Equations * Neural Networks * Phaseless Reconstruction * Prediction Theory * Quantum Information Theory * Radar Applications * Sampling Theory (Uniform and Non-uniform) and Applications * Spectral Estimation * Speech Processing * Statistical Signal Processing * Super-resolution * Time Series * Time-Frequency and Time-Scale Analysis * Tomography * Turbulence * Uncertainty Principles * Waveform Design * Wavelet Theory and Applications

The above point of view for the ANHA book series is inspired by the history of Fourier analysis itself, whose tentacles reach into so many fields. In the last two centuries, Fourier analysis has had a major impact on the development of mathematics, on the understanding of many engineering and scientific phenomena, and on the solution of some of the most important problems in mathematics and the sciences. Historically, Fourier series were developed in the analysis of some of the classical PDEs of mathematical physics; these series were used to solve such equations.
In order to understand Fourier series and the kinds of solutions they could represent, some of the most basic notions of analysis were defined, e.g., the concept of “function.” Since the coefficients of Fourier series are integrals, it is no surprise that Riemann integrals were conceived to deal with uniqueness properties of trigonometric series. Cantor’s set theory was also developed because of such uniqueness questions. A basic problem in Fourier analysis is to show how complicated phenomena, such as sound waves, can be described in terms of elementary harmonics. There are two aspects of this problem: first, to find, or even define properly, the harmonics or spectrum of a given phenomenon, e.g., the spectroscopy problem in optics; second, to determine which phenomena can be constructed from given classes of harmonics, as done, for example, by the mechanical synthesizers in tidal analysis. Fourier analysis is also the natural setting for many other problems in engineering, mathematics, and the sciences. For example, Wiener’s Tauberian theorem in Fourier analysis not only characterizes the behavior of the prime numbers but is a fundamental tool for analyzing the ideal structures of Banach algebras. It also provides the proper notion of spectrum for phenomena such as white light. This latter process leads to the Fourier analysis associated with correlation functions in
filtering and prediction problems. These problems, in turn, deal naturally with Hardy spaces in complex analysis, as well as inspiring Wiener to consider communications engineering in terms of feedback and stability, his cybernetics. This latter theory develops concepts to understand complex systems such as learning and cognition and neural networks, and it is arguably a precursor of deep learning and its spectacular interactions with data science and AI. Nowadays, some of the theory of PDEs has given way to the study of Fourier integral operators. Problems in antenna theory are studied in terms of unimodular trigonometric polynomials. Applications of Fourier analysis abound in signal processing, whether with the fast Fourier transform (FFT), or filter design, or the adaptive modeling inherent in time-frequency-scale methods such as wavelet theory. The coherent states of mathematical physics are translated and modulated Fourier transforms, and these are used, in conjunction with the uncertainty principle, for dealing with signal reconstruction in communications theory. We are back to the raison d’être of the ANHA series! College Park, MD, USA Boston, MA, USA
John Benedetto Wojciech Czaja Kasso Okoudjou
En guise de préface
This book contains numerous texts written by researchers who came to know Alex Grossmann deeply, as collaborators and/or as friends. Far from being formal “homages”, these chapters constitute state-of-the-art research or review papers in different fields within the three domains to which Alex Grossmann made definitive contributions: quantum mechanics and theoretical physics, wavelets and mathematical analysis, and genomics and biology. The broad scope of the subjects treated in this book is in itself a tribute to the extraordinarily broad culture and scientific activity of Alex Grossmann throughout his life. We believe that this choice of presenting very diverse contemporary scientific results is the best homage, in fact the only one, that Alex would have accepted and enjoyed.
Lyon, France Créteil, France Paris, France Marseille, France
Patrick Flandrin Stéphane Jaffard Thierry Paul Bruno Torrésani
Contents

The Making of a Physicist
Michael Grossmann

Alex Grossmann, a Rinascimento Multidisciplinary Man
Thierry Paul

Generalized Affine Signal Analysis with Time-Delay Thresholds
Jan W. Dash, Alex Grossmann, and Thierry Paul

Introductory Note on the Draft Paper “Generalized Affine Signal Analysis with Time-Delay Thresholds”, by Jan W. Dash, Alex Grossmann and Thierry Paul
Jan W. Dash and Thierry Paul

Alex Grossmann’s PhD Thesis (Harvard 1959): Covariant Functions of Quantum Fields - Table of Contents and Introduction
Alex Grossmann

Part I Quantum Mechanics and Theoretical Physics

Alex Grossmann, from Nested Hilbert Spaces to Partial Inner Product Spaces and Wavelets
Jean-Pierre Antoine

Combining Quantum Mechanical Languages (A Tribute to Alex Grossmann)
Joshua Zak

Alex Grossmann, Scattering Amplitude, Fermi Pseudopotential, and Particle Physics
Tai Tsun Wu

Sixty Years of Hadronic Vacuum Polarization
Eduardo de Rafael

Standard Model, and Its Standard Problems
Chris P. Korthals Altes

SU(3) Higher Roots and Their Lattices
Robert Coquereaux

Quantum Field Theory with Dynamical Boundary Conditions and the Casimir Effect
Benito A. Juárez-Aubry and Ricardo Weder

Where Is a Photon in an Interferometer?
Yosi Avron

About the Derivation of the Quasilinear Approximation in Plasma Physics
Claude Bardos and Nicolas Besse

Analysing the Scattering of Electromagnetic Ultra-wideband Pulses from Large-Scale Objects by the Use of Wavelets
François Bentosela

Species of Spaces
Thierry Paul

Part II Wavelets and Mathematical Analysis

Curved Model Sets and Crystalline Measures
Yves Meyer

Diffusion Maps: Using the Semigroup Property for Parameter Tuning
Shan Shan and Ingrid Daubechies

Wavelet Phase Harmonics
Stéphane Mallat, Gaspar Rochette, and Sixin Zhang

Multiscale Decompositions of Hardy Spaces
Ronald R. Coifman and Jacques Peyrière

A Generalization of Gleason’s Frame Function for Quantum Measurement
John J. Benedetto, Paul J. Koprowski, and John S. Nolan

Post-Fourier Frequencies: Variations and Paradoxes
Patrick Flandrin

The Unreasonable Effectiveness of Haar Frames
Stéphane Jaffard and Hamid Krim

Part III Genomics and Biology

Quantifying the Rationality of Rhythmic Signals
Alexandre Guillet, Alain Arneodo‡, Pierre Argoul, and Françoise Argoul

Four Billion Years: The Story of an Ancient Protein Family
Gilles Didier, Claudine Landès, Alain Hénaut, and Bruno Torrésani

Pseudo-Rate Matrices, Beyond Dayhoff’s Model
Claudine Landès, Yolande Diaz-Lazcoz, Alain Hénaut, and Bruno Torrésani

Applied and Numerical Harmonic Analysis
Contributors
Jean-Pierre Antoine Institut de Recherche en Mathématique et Physique, Université catholique de Louvain, Louvain-la-Neuve, Belgium
Françoise Argoul CNRS, UMR5787, Laboratoire Ondes et Matière d’Aquitaine, Université de Bordeaux, Bordeaux, France
Pierre Argoul MAST-EMGCU, Univ Gustave Eiffel, IFSTTAR, Marne-la-Vallée, France
Alain Arneodo‡ CNRS, UMR5787, Laboratoire Ondes et Matière d’Aquitaine, Université de Bordeaux, Bordeaux, France
Yosi Avron Department of Physics, Technion, Haifa, Israel
Claude Bardos Laboratoire J.-L. Lions, Sorbonne Université, Paris, France
John J. Benedetto Norbert Wiener Center, Department of Mathematics, University of Maryland, College Park, MD, USA
François Bentosela Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France
Nicolas Besse Laboratoire J.-L. Lagrange, Observatoire de la Côte d’Azur, Université Côte d’Azur, Nice, France
Ronald R. Coifman Department of Mathematics, Program in Applied Mathematics, Yale University, New Haven, CT, USA
Robert Coquereaux Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France
Jan W. Dash J. Dash Consultants, LLC, Silver Spring, MD, USA; Fordham University, New York, NY, USA
Ingrid Daubechies Duke University, Durham, NC, USA
Yolande Diaz-Lazcoz Université d’Evry, LaMME, Evry, France
Gilles Didier IMAG, Univ Montpellier, CNRS, Montpellier, France
Patrick Flandrin Laboratoire de Physique, Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS, Lyon, France
Alex Grossmann Centre de Physique Theorique Section 2, Marseille, France
Michael Grossmann Tumbleweed Consulting, Saint Germain-en-Laye, France
Alexandre Guillet CNRS, UMR5787, Laboratoire Ondes et Matière d’Aquitaine, Université de Bordeaux, Bordeaux, France
Alain Hénaut Université Publique Française, Evry, France
Stéphane Jaffard Laboratoire d’Analyse et de Mathématiques Appliquées, Université Paris Est Créteil, Créteil, France
Benito A. Juárez-Aubry Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Paul J. Koprowski Amtrak, Washington DC, USA
Chris P. Korthals Altes NIKHEF, Theory Group, Amsterdam, The Netherlands; Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France
Hamid Krim Department of Electrical and Computer Engineering, North Carolina State University Raleigh, Raleigh, NC, USA
Claudine Landès Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, Angers, France
Stéphane Mallat DI ENS, PSL University, Paris, France; Collège de France, Paris, France; CCM, Flatiron Institute, New York, USA
Yves Meyer École normale supérieure Paris-Saclay, Gif-sur-Yvette, France
John S. Nolan Department of Mathematics, UC Berkeley, Berkeley, CA, USA
Thierry Paul Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), Paris, France
Jacques Peyrière Institut de Mathématiques d’Orsay, CNRS, Université Paris-Saclay, Orsay, France
Eduardo de Rafael Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France
Gaspar Rochette DI ENS, PSL University, Paris, France
Shan Shan University of Southern Denmark, Odense, Denmark
Bruno Torrésani Aix Marseille Univ, CNRS, I2M, Marseille, France
Ricardo Weder Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Tai Tsun Wu Gordon McKay Laboratory, Harvard University, Cambridge, MA, USA
Joshua Zak Department of Physics, Technion – Israel Institute of Technology, Haifa, Israel
Sixin Zhang DI ENS, PSL University, Paris, France
The Making of a Physicist Michael Grossmann
This non-scientific biography covers Alex’s origins and peripatetic life from prewar Yugoslavia to his settling down in France.
M. Grossmann, Tumbleweed Consulting, Saint Germain-en-Laye, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_1

Ask someone how he or she chose to become a mathematician, and you are likely to hear something about the inner beauty of mathematics and the universal truth it holds. Alex’s alibi for stumbling upon math was more mundane: tuberculosis, which he developed as a teenager upon returning to newly Socialist Yugoslavia in 1945. His parents, both medical doctors in Zagreb, sent him to a sanatorium in Switzerland for treatment. At the time, TB had a 50% mortality rate over five years. In 1944 TB accounted for 15% of the deaths in Paris and 7.4% in London. In Alex’s telling, residents of the sanatorium would often have affairs, sensing their end was near, and to hell with the consequences. Not so for Alex, who, having survived World War II as a refugee, probably felt he had life ahead of him. He found a mathematics book in the sanatorium’s library, and this is what set him on the path to the discipline. Alex was treated for TB by collapsing the infected lung to starve the TB bacteria of oxygen. Shortly afterwards, antibiotic treatment became the standard cure for the disease, and there was no longer any need to collapse lungs.

Nothing was simple with Alex’s family tree. Alex in effect had two mothers, the one who bore him and the one who reared him. He had one sister from each of these mothers. He also had a sister from his father’s third marriage, and a brother from his father’s first marriage, or possibly from an out-of-wedlock birth. Likewise, his biological mother, Marijana Horvat, known as Marica, was the daughter of an opera singer, Anka Pisačič. Her father was either Anka’s husband, a lawyer-politician, or Anka’s lover and future husband, a medical doctor. DNA analysis of her daughter eventually credited the medical doctor with Marica’s
paternity, but she did not know this during her lifetime, and she did not care either. “I like them both,” she said of her possible fathers. Going up Alex’s family tree, one finds several examples of first, second, or third marriages, each yielding one child. By extrapolation, little Alex came up with the theory that adult pairs could only have one child. As a father himself he later disproved this theory.

Alex’s father Makso (“Max”) Grossmann was a cardiologist and a pharmacologist who became the director of Zagreb’s Merkur Clinic before the war. He and Marica married in 1929 when Marica was only seventeen. Max’s Jewish mother Šarlota did not approve of her son’s choice of a gentile spouse, and she let it be known. This might not have mattered much, were it not for the fact that the young couple lived at his mother’s home. In a revealing anecdote, she would give an egg to her cherished son, but not to her pregnant daughter-in-law. This is something Marica remembered and resented all her life. Being a medical doctor, Max surely could have afforded a separate home for himself and his young wife. Years later, Alex himself could not explain why Max chose to stay at the parental home.

On August 5th 1930 at age seventeen, Marica gave birth to Alex after two days of labor; Alex was apparently a big baby. The difficult labor, compounded by the difficulties at home, triggered a postpartum depression. Marica was sent to recover at the Semmering sanatorium, a retreat in the Austrian Alps. There she met a successful writer of historical biographies, Paul Frischauer. One of his novels, Der Gewinn (The Prize), is his semi-autobiographical rendering of how he met and conquered his prize, the young and beautiful Marica. At one point, Marica showed him a photo of the baby boy she had left behind, held in Max’s arms. In Der Gewinn, he renders Max as a “colorless balding man holding a big healthy looking baby”.
After the sanatorium, Marica moved to Vienna with her fiancé and future husband. There she enrolled in medical school, where Max had studied in the years before WW I. Her plan was ultimately to become a medical doctor, but she and her Jewish husband sensed the danger from the rise of fascism in Austria. The Frischauers moved to London before the outbreak of World War II, and from there on to Brazil, where Paul was to write the biography of Getulio Vargas, the country’s president at the time. Unofficially, he was also tasked by MI6 with putting his proximity to the country’s leader to good use during WW II and nudging Brazil to actively enter the war on the Allied side. Another Austrian writer is known to have fled his home country for the UK because of the rise of Nazism, and from there to Brazil. Unsurprisingly, Stefan Zweig and Paul Frischauer not only knew each other but were also friends. Right after WW II, Marica divorced her second and last husband and was awarded custody of their daughter Silba. Marica travelled to Europe, probably at the end of 1946, to get a chance to see her son. They did meet briefly, and of course they met again in the mid-fifties, she a single immigrant mother, and he a graduate student of physics at Harvard University. Years later Alex reminisced that he related to his biological mother as a very good friend. Were it not for the impending war in Europe, Marica would have studied to become a medical doctor, just like her mother’s friend and her first husband. Once in the USA, however, the necessities of
earning a living forced her to postpone this plan and take up a clerical job at a life insurance company. Over the years she climbed the corporate ladder in the insurance industry, becoming a vice president. For all her American Dream success, however, she felt she was a citizen of Europe, and maybe even of bygone Austro-Hungarian Europa. When she was with Alex, they would spontaneously speak Croatian, oblivious to others around who did not understand the language. It was their way to relive happier pre-war times.

Soon after Max’s marriage to Marica was annulled, Max married Doctor Štefanija Winter, a Vienna-educated pediatrician from Osijek. Female doctors were rare in those days, and Štefanija may have been Yugoslavia’s first female pediatrician. For all intents and purposes, Štefanija became Alex’s mother; in wartime correspondence with Max, she referred to him as “Deko” or “our son”, and in effect, he was her son. The Grossmanns lived in a rented house a stone’s throw away from the Ruđer Josip Bošković Institute of physics where, twenty years later, Alex would study physics. In October 1933 he got a little sister, Ena.

One anecdote from Alex’s early childhood dates back to his kindergarten years. Precociously for a kid his age, he believed that being married was a very good thing, but he knew it was not for children, at least not in Yugoslavia. Alex had heard of child marriages in India so, undeterred by geography, he decided to set off for this country. He also convinced some kindergarten friends to join him on the journey. The kindergarten’s gates were unlocked and the little troop did manage to walk past them, but fortunately it did not go far. The kindergarten was run along progressive Montessori lines, and little Alex’s punishment was to have a red dot (for “naughty”) next to his name on the classroom board.

Through the meanderings of his life, Alex effortlessly learned six languages at native fluency.
Answering the question of where he learned to speak German, Alex referred to these pre-war years, when it was the language of the urban bourgeoisie of this former Habsburg province. When his parents wanted to talk without the children understanding, German was the language they used. Alex said that is why he had to learn German. At the time, German in Yugoslavia had the foreign language status that English has now, and it was also the language of higher education. After the war, he kept German as a foreign language, as well as English. Russian was a mandatory language in immediate post-war Yugoslavia. As a teenage refugee in Italy, Alex learned Italian, and more precisely, the classical dialect spoken in Tuscany. At summer camps before the war and then as a refugee in Switzerland in 1944–1945 he picked up French, which would come in handy when, as a graduate student, he attended the physicists’ summer school of Les Houches. Alex would also read Latin classics for pleasure and would often drop obscure yet appropriate quotes. Ancient Greek was more of a struggle for him, and he preferred to read it with the Latin translation at hand for checking. In retirement, Alex assiduously studied Mandarin Chinese, in part motivated by the fact that he started a collaboration with Wuhan University after his formal retirement from the French research agency CNRS. He
was surprised to discover that the Chinese language was a tougher nut to crack than the European languages he had known.

The Kingdom of Yugoslavia, a creature of the Versailles Treaty and post-WW I Europe, had tried to remain neutral during WW II. Hitler felt the need to invade the country to secure Germany’s southern flank before launching the fateful Barbarossa plan against the USSR. After a successful blitzkrieg, Germany partitioned Yugoslavia into vassal states. The puppet state of Croatia established in April 1941 was called the Independent State of Croatia (“NDH”). The NDH immediately set out to carry out acts of mass murder against Jews, communists, and Serbs. A few days after the new regime was established, Alex witnessed victims of the Nazis hanging from lampposts in Zagreb as a reprisal for the killing of a German soldier. One victim he recognized as a science teacher from his school. Max soon lost his directorship of the Merkur hospital in Zagreb. Officially he was put into retirement, but without a pension. Likewise, Štefanija was released from her job at a children’s hospital where she had practiced medicine since 1924. The family decided to flee Croatia. One exit route considered was emigration to Cuba, of all places. While the Grossmann family was lying low in Zagreb, a Swiss friend was sent to do the paperwork on their behalf to obtain a visa to travel to the island state. Legend has it that a bribe was expected for the delivery of a visa. This friend indignantly turned down the request, not realizing the risk the Grossmanns were exposed to as a result of her principles.

In June 1941, the Grossmann family decided to flee to Italy. Italy was then a relatively safe haven for Jews. Mussolini’s Italy had enacted highly discriminatory Racial Laws in 1938 against Italy’s Jews, but at the time it turned down Hitler’s demand that it extradite its Jews. The Grossmanns took the train to Rijeka, a port city in the west of today’s Croatia.
Max had a rich philatelic collection, which he took with him, intending to sell stamps for sustenance. Unlike gold or jewelry, the value of stamps is not apparent to the non-specialist. The Croatian customs officer at the Croatian-Italian border checked their passports, casually looked at the stamp collection and let them pass. After the 1941 Axis invasion of the Kingdom of Yugoslavia, Rijeka was annexed to Italy, and it was a safer haven than Croatia proper. Between the world wars, it was known as the Free State of Fiume (fiume and rijeka both mean “river”, in Italian and in Croatian respectively). Rijeka was, in effect, the way station from which Max and Štefanija planned the family’s escape into Italy. Max went to Italy on a scouting mission to secure work as a medical doctor and a residency permit for the family. He was successful, apparently thanks to connections he had made when, as a military doctor in the Austrian Army during WW I, he had treated Italian prisoners of war. It is unclear under what circumstances the young doctor he was at the time would have known the identity of his charges from the defeated Italian side. At any rate, by July 1941, the Grossmann family had already settled in Tuscany.
The Making of a Physicist
The Grossmann family settled in Montecatini Terme, a spa town in Tuscany. Alex’s bucolic days there, the deprivations and dangers of war notwithstanding, were among the best of his life, as he later recalled. At one point, while adults were having a discussion at home, he crept up from behind and made a big “boom” sound. During wartime such a prank was tactless, and Alex recalled this as the one drubbing in his life that he did get and deserve. Sometime during their stay the whole family converted to Catholicism, just in case. Alex was baptized and he even took catechism classes together with local boys. As he later recalled, the boys taking catechism classes in wartime Italy were an unruly bunch. At one point, as a queue of boys on their knees were ready to take the Host after mass, a prankster at the back end pretended to fall forward, setting off a domino effect. Alex and Ena attended regular school. He made friends with a certain Maurizio Piattelli, teaching the schoolboy how to play chess on a set that to this day is known among Maurizio’s children as “Deko’s chess.” After northern Italy fell under direct German military rule in September 1943, the order was immediately given to round up all Jews for deportation to an unknown destination. The policy became to detain and to extradite Jews, whether Italian citizens or refugees. The new “Social Republic of Italy” (RSI) thus became an active collaborator in the Final Solution. While policies changed, some civil servants preferred to engage in passive resistance. In September 1943, an Italian policeman in Montecatini told Max that he had been ordered to arrest him and his family and advised them to disappear before he officially went looking for them. Štefanija and the kids went elsewhere in Italy with a view to escaping to Switzerland. Max stayed somewhere in Montecatini, believing the promise of a colleague at the hospital where he was working that his status as a medical doctor would protect him from deportation.
Max and Štefanija did not know where the other was hiding, and the letters they exchanged did not carry any indication of their location. The wording of these letters pointedly avoided referring to people’s full names, places, or events. They corresponded using one or several trusted messengers who would carry their letters written in impeccable Italian. A letter sent on March 3rd 1945, i.e., 36 days before the end of the war, comments that “the weather is improving, and that it helps withstand the hardship of winter.” This double-entendre letter was sent to Štefanija in Switzerland, from an Italian refugee friend of hers in Munich. The 1944–1945 winter was exceptionally harsh in Europe, but the spring of 1945, more than any spring before, held the promise of better days ahead. Max was sick at the hospital on March 23rd 1944 when the SS arrested him. He was detained at a prison in Florence for four weeks, and then at Campo di Fossoli. A famous guest of this transit detention center was Primo Levi, who was there in January 1944. While in prison, Max somehow managed to send letters to his family, as well as a self-portrait he drew in profile. In the second to last letter he sent from Fossoli on May 02 1944, he bade farewell to his beloved Štefanija, Alex, and Ena. He told Deko to find his way in life and to obey Štefanija. He anointed him the future
M. Grossmann
head of the family. In his last letter from Fossoli, Max wrote to her that he would be sent to either “Theresienstadt (Bohemia) or Auschwitz (care of Rabinowitz).” To Max, these destinations must have sounded equally ominous. While Auschwitz does not need to be presented, Theresienstadt was a model concentration camp that was meant to lull the international community about the fate of disappeared prominent Jews. This camp is where Paul Frischauer’s parents were murdered, in 1942 and 1943. Thus, during the first month and a half of his captivity in Italy, Max could somehow send messages across to Štefanija, but Štefanija could not write back. Max arrived in Auschwitz on May 30th 1944. The odds of survival were near nil. The trip to the camps lasted several days without water or food, and with no place to even lie down. Some were dead on arrival. When they got off the railcars, two queues were formed: one for those able to work and another for the sick and unable to work. Max had had a heart condition since before the war, and he was already sick at the time of his arrest in Italy, so he chose his queue accordingly. The head of the camp’s infirmary, a certain Obersturmführer Heinz Thilo, saved Max by shoving him into the work-able queue. This act of random kindness, if it was one, is hard to explain. The two men were 17 years apart in age and had studied in different places, so there is little chance they knew each other. That evening, Max was shown the smoke belching out of the chimney stacks in answer to his question about the fate of those who chose the sick queue. After living in hiding for 2 or 3 months in Italy, Štefanija, Alex, and Ena escaped to Switzerland in November 1943 with a group of refugees. They were guided by smugglers over a steep mountain, as Alex recalled. Some Swiss border guards turned back refugees and handed them over to the Germans, while others turned a blind eye and let them in.
Fortunately, they encountered the latter type, and they received a temporary residency permit in Switzerland. Max had squirrelled away some money in a Swiss bank, and the family had instructions on how to access the funds once in Switzerland. The family may also have received financial assistance from the American Jewish Joint Distribution Committee, or JDC for short, and thus it was able to spend the rest of the war in Switzerland. Before that, they benefited from the assistance of DELASEM (“Delegazione per l’Assistenza degli Emigranti Ebrei”), an Italian association that assisted Jews in Italy, both nationals and foreign refugees. DELASEM operated openly and legally but was banned by the Social Republic of Italy. From the safety of Switzerland, Štefanija followed up on every tip or shred of information to find her husband. In a letter from Štefanija dated February 10th 1944, which was probably never sent for lack of an address, Alex wrote a note to his dad that “he’s still his usual naughty boy”, while his sister Ena announced that she could now eat with a fork and without making a mess. A letter card was sent on July 25th 1944 to “Dr Max Grossmann, born 16. II. 1893, Vukovar (Yugoslavia), originally from Zagreb (Yugoslavia), Auschwitz Detention camp, East Upper Silesia.” It was returned as undeliverable, with the stamp of the Eastern Military Command of the
Wehrmacht. Štefanija could not know positively where her husband was, but she did inquire in the likely places, and here her intuition was right. However, on January 23rd 1945, i.e., four days before the liberation of Auschwitz, she was also following up on a tip to the effect that Max was in Munich, where Max was not. After the liberation of Auschwitz on January 27th 1945, Max testified twice, once at the camp two weeks after its liberation, and then a few months later at a Soviet-run commission in Krakow on the war crimes of the Nazis. Max returned from Poland to Zagreb in July 1945. The rest of the family returned from Switzerland the following month. Max became a professor at the Medical Faculty of Zagreb and wrote a scientific article on the health effects of long-term undernourishment. A heart attack felled him on February 20th 1947. Right after the war, Marica Horvat somehow managed to track down her son and place an international call. When she asked him how he was doing, “I have the diarrhea” was Alex’s memorable answer. Štefanija and the children returned from Switzerland to Zagreb in the last week of August 1945. Right upon returning to newly socialist Croatia, the Grossmanns had to set the record straight on who they were, where they had been, and what they had done during the war. They did not return to their pre-war home, which had been rented. They moved in with Štefanija’s aunt Matilda Deutsch in Zagreb. In November 1947, Max’s other son Vladimir was allowed to move into this apartment, after Matilda had passed away. Housing in Zagreb, as elsewhere in Europe, was scarce in the immediate post-war period. Court proceedings show that Štefanija went to court in September 1945 to ask for the restitution of some valuable personal belongings that had been “confiscated” by the NDH police in her absence during the war, such as Persian rugs, paintings, and old furniture. Most importantly, everyone was alive and ready to get on with life.
Alex enrolled back in school in 1945 and finished high school in 1948 with “very good” or “excellent” marks in all subjects of the “Velika Matura” exam, except for physical education from which he was excused, probably owing to the tuberculosis he had suffered. In fact, TB was the reason why Alex had to return to a sanatorium in Switzerland from November 1946 to January 1947. He enrolled in Zagreb University in 1948, graduating in 1952 from the faculty of mathematics. After this he became an assistant professor at the newly founded Ruđer Bošković Institute. The Institute for Theoretical Physics, Materials Physics and Chemistry had been founded in 1950. It is named after an eighteenth-century polymath from what was the Free City of Dubrovnik. At one point he and a researcher friend met a couple of girls at the beach while on vacation. They mentioned being scientists at the Institute, hoping this would make an impression, but the girls were unimpressed.
Acting on advice from the renowned Croatian physicist Ivan Supek, Alex attended the master classes in theoretical physics at the 1954 summer school of Les Houches. The professors at the summer school included well-known theoretical physicists. Among these was Nobel laureate Enrico Fermi, who taught there 5 months before his death from stomach cancer. Belgian physicist Léon Van Hove, then affiliated with the Institute for Advanced Study, Princeton, and future Director General of CERN, taught quantum mechanics, while Harvard professor Dr. Roy J. Glauber taught quantum physics. Dr. Glauber was awarded half a Nobel prize in 2005. The textbook used at Les Houches was authored by Van Hove, and the teaching was in French. Cécile DeWitt-Morette, a French physicist and the founder of the Les Houches school, remembered Alex 30 years later as being passionate about the now-forgotten conflict opposing Italy and Yugoslavia over what had been the Free Territory of Trieste soon after WWII. It is one of the two occasions in Alex’s life when he could
be seen to be passionate about anything political. Since 1947, the Territory of Trieste had been under United Nations administration, pending a settlement between the two countries which came in October 1954. On whose side Alex’s sympathies were is not a foregone conclusion: on the one hand, Alex clearly loved Italy and everything Italian: he described his two years in Italy as the happiest in his life, and he would gladly speak in his Tuscan Italian whenever he had a chance to do so; on the other hand, Alex was a freshly minted Yugoslav physicist at an international gathering of peers. The other occasion when Alex showed some nationalist feelings was in 1991–95, during the war opposing the newly created Croatian state and Serbia as the heir to the Federation of Yugoslavia. On September 21st 1955 Alex travelled to the USA, entering the country via New York City. He had an invitation to work as a Research Assistant at Harvard University’s Physics Department and a $2,000 / year scholarship, $20,000 in today’s money. He received an MSc in Physics from Harvard in 1956. By early 1958, Alex was a Research Assistant at Brandeis University, as his application for permission to accept employment attests. In an April 1959 application for US citizenship, he declared a respectable-sounding salary of $5,500 / year, which was the median household income for that year. He became a US citizen shortly thereafter. While at Brandeis, Alex started what was going to be a long history of cooperation with Harvard physics Professor Tai Tsun Wu (commonly known as “T.T. Wu”) on quantum physics. From March 1961 to February 1962, three articles on Schrödinger Scattering Amplitudes were published in the Journal of Mathematical Physics, two of them jointly with T.T. Wu. For unknown reasons, Alex transferred from Brandeis to the Institute for Advanced Study (IAS), Princeton, where he did research between September 1961 and June 1963.
Drawing on the Schrödinger articles and two other 1960 articles published in the same Journal, Alex intended to write a doctoral thesis and earn a doctorate degree from Harvard. The apparent plan was to expand these articles and package them into one doctoral thesis; he asked for T.T. Wu, already at Harvard, to be one of the assessors. During the first quarter of 1962, Alex was trying to agree on the needed modifications to his articles. His correspondent at Harvard was Dr. Roy J. Glauber, whom he had met at Les Houches. By March 1962, the correspondence with Glauber was formal in tone but one could sense that, at least from Alex’s perspective, it had taken a litigious turn. Alex carbon-copied for his records the letters he sent to Roy Glauber, copied to William Preston the Chair of Harvard’s Physics Department. This is quite unlike Alex, who is known for his informal and affable nature. Alex’s letters from March to June 1962 summarize his understanding of what remains to be done to assemble his published articles and turn them into a coherent thesis; in one letter he asks Tai to be among his thesis assessors. Alex sent them more inquiries on this than he got answers, but he did finally get one from Roy Glauber on June 22nd 1962 explaining the reason why his doctoral thesis could not be accepted and would need to be reworked. He recognizes the material in the document is plentiful, i.e., the three Schrödinger Scattering Amplitude articles plus an additional one, but in his view it lacked explanations on certain topics
and transitions. One criticism from Dr. Glauber is that “the Committee wondered whether [Alex was] not often using more imposing terminology than was needed.” The overall tone of the letter was condescending but despite that, Alex was still expressing his willingness to edit his thesis to conform to the assessors’ comments. By July 1962, he must have given up on the idea of a Harvard doctorate, and possibly on doctorates altogether. An endorsement from Freeman Dyson, already a respected physicist at the IAS, set the record straight: “[. . . ] I am convinced that Mr. Grossmann’s work and professional standing fully entitle him to a doctor’s degree. I therefore write to you to ask whether you can take suitable administrative action to ensure that this deadlock does not continue indefinitely.” Dyson, like Alex, never had a doctorate. Both men probably decided at some point in their lives that there is more to science than holding a piece of paper proving one’s credentials. Alex met Dickie in 1955 or 1956 when he was a Research Assistant at Harvard, and she was a researcher at the Harvard Medical School. According to a family legend, Dickie came to attend some social event at the International House. She had been asked to chaperone younger female students coming along. As it turned out, she was the one who had to be chaperoned. The only other relationship Alex is known to have had before Dickie was with a certain Alice. The anecdote is that when the would-be fiancée presented him to her father in the lobby of the luxurious Pierre Hotel in NY, her father sized him up and immediately let it be known to Alice that Alex would not become her husband. The father was a successful businessman from France and he probably did not think a poor foreign graduate student could be a worthy match for his daughter. Alice walked out of the hotel with Alex, apologizing for her father’s decision. Alex and Dickie dated for ten years.
Their respective scientific careers kept them apart for the first five years, when Alex was in Waltham, MA at Brandeis, and then Princeton, NJ at the IAS. Dickie was in Cambridge, MA working at the Harvard Medical School and at MIT. They were reunited when she joined Princeton University in September 1961. They married in May 1965 in Watertown, NY when a common friend of Alex and Dickie’s gave them the friendly injunction to “stop living in sin” and with that, handed them the cash to pay the administrative fee to officialize their union. Soon after a sober civil marriage, they planned a trip to India. Both Alex and Dickie had come up with scientific reasons to travel to India, but this trip probably had a honeymoon feel to it, after 10 years of dating. The trip to India was Dickie’s second. Nine years earlier she spent 22 days in India as part of her whirlwind trip around the world together with her friend Esther Weiss. At the time, she visited Calcutta, Benares, and Delhi. On their way to India this time, Dickie and Alex stopped in Yugoslavia, where Alex presented Dickie to his mother. In August and September 1965, Dickie collaborated with the botany department of Madras University, while Alex would attend a symposium at the Institute of Mathematical Sciences in Madras (now Chennai). They also visited Bangalore, where Alex had a tailored suit made. It is probably Alex’s only sartorial indulgence of his life. Thus, in the end Alex did realize his early dream of travelling to India and getting married, albeit not in this order.
This trip was the first of a lifetime of international travel together where at least one of the motives for going somewhere had to do with their scientific work. The other motive, in the case of annual trips between France and the USA, was to see their respective families during the summer. Dickie’s family was and still is located in the state of Rhode Island, while Alex’s was in the state of Connecticut, where Marica lived for many years, and in Philadelphia, where his sister Silba lives to this day. Yearly trips to North America would take the whole family to where Alex had scientific collaborations going, such as the University of Toronto in October 1968–June 1969, the University of Colorado in Boulder in the summer of 1981, Harvard University in the summer of 1982 and in August–December 1980, and Yale University in the summer of 1984. While at the IAS, Alex conducted research with Belgian physicist David Ruelle. In 1964, Alex moved to the Courant Institute at New York University, while Ruelle went on to the Institut des Hautes Études Scientifiques. The IHES in France was modeled along the same lines as the IAS, and its creation had the blessing of then IAS director Robert Oppenheimer. Alex probably did not see a good fit with the Courant Institute because in late December 1964 he wrote to Ruelle and Louis Michel, also at IHES, regarding a collaboration assignment at the IHES for the 1965/66 academic year. In early January 1965, Léon Motchane, the moving force behind the creation of the IHES and its first director, accepted Alex’s application. Motchane, who had a US residence in Montclair, NJ, proposed to Alex that they meet at his home to work out the details of Alex’s transfer to France. One result of Alex’s collaboration with the IHES was an article entitled Condensation of Lattice Gases, published in May 1966 by Jean Ginibre, Alex Grossmann, and David Ruelle.
No sooner had Alex and Dickie returned from their passage to India than they had to pack again to move to France. By November 1965 they were already settled in Orsay, south of Paris. Alex’s assignment in France was initially meant to be temporary but, as often happens, the temporary turned out to be permanent. Founded in 1958, the Institut des Hautes Études Scientifiques was modeled as a French equivalent to the IAS, a public-private scientific institution. Orsay also turned out to be where Alex and Dickie would settle into retirement later in life, but most of their work life was to be spent in southern France, at the Centre de Physique Théorique in Alex’s case. Looking back, the year and a half from May 1965 to December 1966 was a time warp. Alex and Dickie got married, travelled to India, returned to the USA, packed their bags, and travelled to France for what was going to be a yearlong collaboration. In between, Alex managed a trip to his alma mater in Zagreb in December 1965 to deliver a lecture on nested Hilbert spaces, a subject on which he had made a name for himself after publishing an article on it in August 1964. In July 1966 their first son was born, and on October 1st 1966 Dickie was admitted to the Centre de Biologie Moléculaire in Marseille. Having a baby in tow and what looked like a stable job at France’s national scientific center (CNRS), the Grossmanns finally looked ready to settle down. Reminiscing in retirement, Alex said one reason for his wanting to move to France was the opportunity the CNRS would give him to concentrate on theoretical research with minimal administrative
oversight and no requirement to teach classes. This was a good thing because Alex was never comfortable teaching in amphitheaters. He much preferred the Socratic style of mentoring thesis students and turning them into future collaborators. As a tenured researcher at the CNRS, all he had to do was write a yearly account of his scientific activities. Dickie’s laboratory and place of work was in Marseille, and Alex’s contract with IHES had expired. It was important for Alex to find work at one of the CNRS research groups in the south of France. An opening came from Daniel Kastler, a physics professor at the Université d’Aix-Marseille who, in January 1967, was creating the Center of Theoretical Physics (in French, CPT) of Marseille. To endorse Alex’s joining the CNRS, D. Kastler made mention of Alex’s work on Nested Hilbert Spaces, a mathematical tool on which Alex had been working since 1964. Marseille was the official location of the CPT but the initial core group of mathematical physicists liked to gather around Kastler’s home in the seaside village of Bandol, about one hour’s driving time from Marseille’s eastern edge. They took the habit of gathering daily at 14:00 to work in a kindergarten after it was freed of its young occupants. Enough space and a large blackboard were all that was needed. In addition to Daniel Kastler and Alex Grossmann, the group consisted of Jean-Michel Combes and Derek Robinson. All elected to build their homes in Bandol’s hinterland. Bandol was also the home of physicists Raymond Stora and Hans Ekstein, and it was the final home of Daniel’s father, the Nobel prize winner Alfred Kastler. Many visiting scientists joined the Bandol group, such as R.V. Kadison, Sergio Doplicher, and Nicolaas Marinus Hugenholtz. After 1984, a crowd of younger scientists joined Alex in southern France after he created a strong activity around his discovery of wavelets, Thierry Paul, Ingrid Daubechies, and Matthias Holschneider among them.
Now living in a rented house in Bandol, the Grossmanns seemed set to settle down in southern France. In the summer of 1967, they managed a trip to the USA and one to Yugoslavia, in part to show the baby to their respective families, plus a trip to Heidelberg in Germany in September. As Alex quipped in a note to the IHES, “once you start to travel, it never stops.” In October 1968, Alex was admitted to the CNRS. However, in December 1969 the Grossmann family moved to Toronto with two sons in tow, one aged 5 months and another aged 5 years and 5 months. The likely objective of this relocation, which lasted until April 1970, was to collaborate with the University of Toronto’s Eduard Prugovečki, a fellow Croatian mathematical physicist. Upon returning, they settled in Cassis, about midway between Marseille and Bandol. They rented half a house in this picturesque and touristic fishing village. The house was in the hinterland, almost at the foot of the Cap Canaille cliff. The family car, an unassuming white Peugeot 403 purchased in December 1966 (the last months of this model’s production run), would last until the eighties, when a car rammed into it and damaged it beyond repair. Alex and Dickie’s daily commute between home in Cassis and work in Marseille would take them over the scenic and treacherous Route de la Gineste. Alex was a leisurely driver. Every time he took this road, a queue of cars would accumulate behind him, unable as they were to overtake him on this narrow and sinuous mountain road. One could see his driving as a metaphor for his doing it his way, Sinatra-like.
The decision to come to France must not have been an easy one to make. Dickie was born, bred, and educated in the US state of Rhode Island, where she has family to this day. English was her preferred language throughout her life and she did not speak any French when she came to France. Alex too had strong ties to the USA. Marica and Silba lived in New England. For Dickie the move to southern France was something of a sacrifice. She saw Marseille as a scientific backwater in her area of work, and the port city’s classical music scene certainly did not have Paris’s standing. In March 1967, that is, soon after their arrival in France, she was still trying to sell her New York City apartment from France. This was the country they had decided to settle in. At one point in the late sixties, Alex and Dickie started a procedure to acquire French citizenship, but the application forms were lost by the French administration. They gave up on the idea of trying to become French, satisfied with having residency and essentially all the rights and obligations of citizenship. In about 1985, the French administration decided on its own initiative to grant them full citizenship. Becoming a tenured teacher-researcher at the CNRS was relatively easy in the sixties, with all the baby boomers coming of university age. The requirement was to have a doctorate, which Alex did not have, but his published works probably granted him a doctoral equivalency. After he was admitted as an untenured Maître de Recherche (Associate Professor) in March 1968, Alex had to ask every year to have his contract extended. The request would start with “J’ai l’honneur de solliciter votre haute bienveillance. . .” (“I have the honor of soliciting your high benevolence”), a form of address that today sounds more suited to monarchs. In February 1971 he got tenure and no longer had to ask for contract extensions every year.
As he put it, the formal requirement of the job as a CNRS scientist was to send his administrative superior a two-pager describing his scientific activities for the year. The annual rapport d’activité was supplemented with referees’ peer reviews, which were invariably glowing and recommended his promotion to the next rank in the CNRS hierarchy. The research agency was to be his employer up until his retirement in 1991; in 1990 he had achieved the rank of first grade Directeur de Recherche. After the CNRS, his scientific life continued in entirely new areas. His post-retirement research was in computational genomics, a domain that hardly existed while he was officially employed. The direction of Alex’s scientific research was guided by the wanderings of his intellectual curiosity. He started off in quantum physics. In 1982 he stumbled into signal analysis after meeting Jean Morlet, then an R&D engineer at the French oil company Elf. Jean had a method for interpreting seismic signals that seemed to work, but he did not have the means to explain or generalize it. The meeting with Alex led them to the now-famous wavelet theory. Alex was an early adopter of personal computing devices in the late seventies, starting with fancy programmable Texas Instruments calculators, then a clunky Olivetti desktop computer in the early eighties. It is natural that he entered computational genomics after his official retirement.
Each of the themes he worked on could have kept a scientist busy for a lifetime. Wavelet theory became in 1985 a hot topic in applied science, and it was of interest to several industries. Still, Alex was happy to have others develop and expand this theory. Some of his close collaborators may have regretted this, but popular interest in a topic never was the guide for his research. The perpetual curiosity of his inner child was his drive.
Alex Grossmann, a Rinascimento Multidisciplinary Man
Thierry Paul
“Puncto è la cui parte non è, secondo i geumetri dicono essere inmaginativo;”1
Piero Della Francesca, De perspectiva pingendi (1576)
“Peindre qu’on ne voit pas”2
M. Proust on the painting by Monet “Bras de Seine près de Giverny”
“là où les âmes vivent avant de naître”3
A. Grossmann to J-M. Combes, about the second Riemann sheet
1 A Jubilee of Multifold Research
Alex Grossmann left his mark on what is generally referred to as mathematical physics.4 Those who knew Alex well remember that he did not like this hybrid term. Rather, he was able to bridge the two disciplines, contribute to both, and later move on to genomics.
1 According to geometers who say it by imagination, the point is that part which is not;
2 Painting that you don’t see
3 Where souls live before being born.
4 This text is an extended version of the translation made by Michael Grossmann of my paper Alex Grossmann, un homme multidisciplinaire, La Gazette des Mathématiciens 161 78–81 (2019).
T. Paul
Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), Paris, France
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_2
This short and incomplete overview of Alex Grossmann’s work should suffice to convey the spirit of his eclectic thinking. To summarize, Alex Grossmann’s scientific career can be divided into three phases. He started off in 1960 with quantum mechanics and the important analytical questions of the day. Starting in 1983, he contributed to wavelet theory, of which he is one of the founding fathers. In the mid-nineties, he turned his attention to genetics and biology. However, the success and uptake of wavelets, both at the theoretical level and in their many applications, should not let us forget the sheer variety of Alex Grossmann’s earlier scientific contributions. It is striking to note the number of ground-breaking works he produced starting in 1960. Except for his work on quantum field theory, on statistical physics (together with J. Ginibre and D. Ruelle), on relativity theory (with R. Coquereaux), and of course on wavelets and biology, the majority of Alex Grossmann’s works were about quantum mechanics. One notes that, at a time when modern mathematical physics was being born, the ideas of Alex Grossmann influenced modern physics with a very personal and geometric vision of mathematical analysis. His founding work on the now-famous wavelets could obscure his earlier works. For the sake of balance, it is necessary to document this earlier work, without trying to be exhaustive.
2 Quantum Physics and Related Fields
The first important works of Alex Grossmann (1961–1962) were about the theory of scattering. He published a series of articles with T.T. Wu on the analyticity properties of scattering amplitudes, in which rigorous analysis led him to innovative results. This initial work has spurred much further research in this area. The second series of works related rather to pure analysis. Trying to find a natural framework for the quantum theory of scattering, Alex Grossmann started to study a specific family of Hilbert spaces that are invariant under Fourier transform, and that may contain rapidly increasing functions, and therefore non-tempered distributions. The prospect of providing a mathematically correct description of very singular operators guided his research for several years. While seeking spectral properties that are invariant between Hilbert spaces, Alex Grossmann came to introduce a new class of functional spaces called Nested Hilbert Spaces, which are a sort of interwoven Hilbert spaces. This construct, which was developed in 1966–1967, was meant to produce a satisfactory theory of resonances. This theory was generalized and somewhat simplified some ten years later in collaboration with J.P. Antoine, with the introduction of partial inner product spaces. This theory in turn led several years later to the method of complex dilations of J.M. Combes (with J. Aguilar): resonances are the eigenvalues of the “complexified” Schrödinger operator. A 1970 article together with G. Loupias and E. Stein presents for the first time a semi-classical theory of the quantization process. This seminal work
opened the way for significant research on pseudo-differential operators depending on the Planck constant, which are now commonly used both in physics and in mathematics. Towards the end of the seventies, Alex Grossmann returned to quantization problems, demonstrating the important role played by the parity operator (symmetry around the origin), moved around in phase space, as an elementary building block of the Weyl quantization process (a Wigner–Weyl group). Numerous papers followed this idea, by, among others, P. Huguenin, H. Bacry, J. Zak, and J. Reigner. It culminated with the doctoral thesis of I. Daubechies (1980). Much later, the Wigner transform became a standard tool in (mathematical) semi-classical analysis and, more recently, in the theory of kinetic equations. Alex Grossmann’s algebraic vision remains a constant source of inspiration in this area, which brings us to our next topic. The link between analysis and algebra is nowadays increasingly present in theoretical and mathematical physics: the works of Jean Bellissard, inspired by those of A. Connes, attest to this. It is important to mention, as J. Bellissard himself noted, that Alex Grossmann was the first to exhibit a physical situation (solid-state physics with a magnetic field) associated with a von Neumann algebra of Type II and not of the usual Type I. This ground-breaking work dates back to 1972, long before noncommutative geometry. In parallel with these developments in mathematics, Alex Grossmann’s interest and work covered many areas of physics. Beyond quantum mechanics on phase space, which we already mentioned, his work on solid-state physics deserves special mention, in particular his mathematical formulation of the concept of point interaction.
His bibliography clearly shows the evolution of his scientific thought, which often brought him back to solid-state questions (Bloch Hamiltonians, Kronig-Penney models, Fermi pseudo-potentials), such as the study of the spectral properties of concrete Hamiltonians: the one-dimensional point interaction model of Kronig-Penney was generalized to higher dimensions, including Fermi pseudo-potentials, to which Alex Grossmann came back in the eighties with T.T. Wu. These, however, had an unsatisfactory mathematical definition. In dimensions greater than one, the Dirac mass is too strong a perturbation of the Laplacian, and it destroys its self-adjointness. Non-standard analysis (a mathematical theory of the infinitely small) does provide a rigorous definition, but it is somewhat exotic and difficult to handle. In an article with R. Høegh-Krohn and M. Mebkhout, the problem is somewhat demystified thanks to a functional-analytic definition: one has to work on the resolvent. One must also mention the quantization of canonical transforms (real or complex), the “leap-frogs,” the interaction between fields and atoms (with A. Tip), the theory of antennae (with T.T. Wu): the list is long.
3 Wavelets
The second part of Alex Grossmann’s scientific career is woven into the history of wavelets. Having worked for a long time with the Weyl-Heisenberg group, Alex
Grossmann was the go-to person for a geophysicist who had been wondering for years about the problem of sampling seismic signals. Such signals contain markers for each one of the geological layers. The problem was that they did not have a characteristic scale, which ruined the efficiency of time-frequency methods based on an observation window whose size is fixed in advance. By proposing a time-scale analysis based on wavelets of constant shape, J. Morlet brought a potentially interesting solution, which was then supported by numerical tests. Alex Grossmann understood that this approach was equivalent to a decomposition into generalized coherent states, and that one had to give up the Weyl-Heisenberg group and use instead the “ax+b” affine group. Frequency translations (modulations) are thus replaced by dilations, which are better suited to render the small-scale details of seismic signals. We see here an epistemologically very important aspect of the conceptual nature of Alex Grossmann’s thinking: it is not that the coherent states of quantum mechanics happened to be useful in signal theory; rather, it is they that allowed Alex Grossmann to perform the decisive step towards a deep theoretical, almost abstract, understanding of Jean Morlet’s empirical methods. A decisive step that gave rise to such a rich and multidisciplinary development, the wavelets. And a tool named abstraction, which was at the same time very profitable and elegant. For example, Alex adored explaining the inversion formula of the wavelet transform by saying that one just had to take the complex conjugate of the integral kernel of the transform itself and permute the variables: “the transform is unitary (on its range), its inverse is equal to its adjoint!”.
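Alex’s remark about the adjoint can be written out explicitly. The block below is only a sketch under one common normalization, which is an assumption and not taken from the text: ψ is an admissible wavelet, c_ψ denotes its admissibility constant (whose exact value depends on the Fourier convention), and the affine group carries the measure da db / a².

```latex
% Assumed convention: \psi admissible, c_\psi its admissibility constant,
% invariant measure da\,db/a^2 on the "ax+b" group.
(T f)(a,b) = \int_{\mathbb{R}} \overline{K(a,b;t)}\, f(t)\, dt ,
\qquad
K(a,b;t) = c_\psi^{-1/2}\, |a|^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right) ,
\\[4pt]
% T is an isometry, so T^{*}T is the identity: the inverse on the range of T
% is the adjoint, whose kernel is the conjugate kernel with variables permuted.
f(t) = (T^{*} T f)(t) = \iint K(a,b;t)\, (T f)(a,b)\, \frac{da\, db}{a^{2}} .
```

With this choice, no separate computation of an inverse kernel is needed, which is exactly the economy of the abstract argument.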
Alex liked to praise the added value when a complicated calculation is replaced by a simple abstract argument.5 By building in the early eighties a rigorous framework for this time-scale decomposition, Alex Grossmann and Jean Morlet laid the foundations for continuous wavelet transforms, at the time known by their French name “ondelettes.” This theory was developed by Alex Grossmann, colleagues, and students. In collaboration with I. Daubechies and Y. Meyer, he established the discrete twin of the wavelet decomposition: a signal can be described in this way using a countable family of functions, all translated and dilated from an elementary function. Later came the fundamental discovery of orthogonal wavelets by Y. Meyer. The notion of multiresolution analysis introduced by S. Mallat allowed formalizing the construction of basis functions. This led to the development of efficient algorithms, which, by using compactly supported wavelet basis functions (cf. I. Daubechies, with A. Cohen and J.-C. Feauveau in their bi-orthogonal version), brought wavelets into the mainstream as a tool with multiple applications, such as the JPEG2000 compression standard. Since the original works of Jean Morlet and Alex Grossmann, wavelet analysis has been used in domains as varied as the analysis and synthesis of signals, the detection of discontinuities, shape recognition problems, the analysis and processing of images, fractal theory, the inverse problem in potential theory (F. Moreau, G. Saracco), the analysis of turbulent phenomena (M. Farge), quantum mechanics
5 Y. Avron, private communication.
(thesis of T. Paul under the direction of Alex Grossmann), complex dilations, Feynman integrals, etc. Far from being merely a talented source of inspiration for research, Alex Grossmann would personally carry his ideas through to their application in various fields. In signal analysis, Alex Grossmann did much to develop wavelets into a tool to analyze and synthesize sounds. His fruitful cooperation with R. Kronland-Martinet at the Mechanics and Acoustics Laboratory of Marseilles brought wavelets into the mainstream. It is noteworthy that his collaboration with M. Holschneider, R. Kronland-Martinet, and J. Morlet on the detection of step changes in sound signals brought about another application of wavelet transforms: the characterization of fractal objects and the ability to detect singularities in a function. This made wavelet analysis into a sort of mathematical microscope. It is also thanks to Alex Grossmann that continuous wavelet transforms were generalized to higher dimensions. He suggested this idea to R. Murenzi, then advised him and oversaw his mathematical work to develop a wavelet transform associated with the Euclidean group in d dimensions with dilations (with J.P. Antoine). This in turn opened the way to a whole new field in image analysis (shape recognition, detection of edges and textures). The combination of the continuous approach with progressive or directional wavelets (i.e., those whose Fourier transform has its support contained in a convex cone with apex at the origin) is a Marseilles specialty, as Alex would muse. Such wavelets are used to define the concept of an “instantaneous frequency,” which is also local in scale. This has many applications: nuclear magnetic resonance, acoustic backscattering, Milankovitch cycles, gravitational wave detection. The important contribution of the Marseilles group (PMA and CPT) in this area stands as proof of Alex’s acumen and of the role he played during his Marseilles years.
4 Genomics
But there is also an Alex Grossmann after wavelets. After certain discussions on the wavelet analysis of DNA sequences, including those with A. Arnéodo, Alex Grossmann resolutely turned his attention to genetics and biology in the third part of his career, a testimony to his exceptional dynamism and intellectual curiosity. In this context, he first used and then developed methods from data analysis, stochastic modeling, and then combinatorics and theoretical computer science. By studying correlations between phenotype data, he modeled together with A.S. Carpentier, A. Hénaut, and B. Torrésani the way in which each gene infers the likelihood that a chromosome will be compacted. In an article with A. Hénaut, C. Devauchelle, and B. Torrésani, the evolution of genetic sequences (e.g., in proteins) was modeled as branching Markov chains. The nodes of these chains reveal divergences with respect to evolution by point
mutations. This model reveals that, in many cases, evolutionary dynamics can be represented starting from a single universal stochastic matrix, and the divergence times can be modeled by a simple linear regression. In other words, one can reconstruct evolutionary history from simple comparisons of genetic sequences. Representing the evolutionary process as a tree was not new, but the confrontation with data was definitely an original approach, very interesting in its simplicity. Alex was fascinated by the possibility of inferring evolution from current genomic sequences. Alex Grossmann was very interested in the phenomena of growth and in the reconstruction of phylogenetic trees from genetic sequences, using original methods based on combinatorics and the theory of computation (with C. Devauchelle, A. Dress, S. Grünewald, A. Hénaut, and J. Weyer-Menkhoff). His sudden passion, when he was almost seventy, for processing real-life data, and the computational work it requires, led him to write complex algorithms and thousands of lines of code.
5 How to Conclude Such a Short But Mind-Boggling Overview?
As we can see, there is no trace of wavelets in Alex Grossmann’s work in genetics. This faithfully reflects his dynamic, cross-disciplinary thinking: instead of trying to apply his toolbox to new sets of problems, he went boldly into new fields to innovate from scratch. During the more than fifty years of his career, he was able to combine abstract and concrete thinking, cultural and creative, rigorous and imaginative. The mark he left goes beyond the amount, however great, of material in the articles that he published. For all his students and those who worked under his guidance, speaking of him as a teacher is an understatement. His education started off in a Montessori school in Zagreb in the thirties, where learning was supposed to be driven by curiosity. He did not construct the knowledge of his students. Rather, he subtly deconstructed formal knowledge, so that his students would find their own ways. By playing between sciences, arts, literature, languages... Alex Grossmann truly was a Rinascimento multidisciplinary man.
6 Postlude: A Geometric Existentialist Way of Thinking
It seems in order now to finish this text by wondering what was the core of such a multifold creation, produced by a multifold way of thinking. One word comes to mind whenever one looks at Alex’s mathematical way of thinking: geometry. Alex was thinking geometrically, even when facing unknown, very dispersed scientific facts. But geometry does not mean abstraction,
as many people think. After all, geometry surrounds our whole life, sometimes hidden but never abstract. Alex’s geometrical perception was never abstract because Alex’s way of thinking was fundamentally existentialist. A bit like the way of painting of Piero della Francesca6 was existentialist, at a period of the Renaissance when the geometry of perspective was just being built, or not even yet. Alex’s work on the analyticity properties of scattering amplitudes illustrates this characteristic very well. Scattering amplitudes already existed in quantum mechanics when Alex, together with Tai Tsun Wu, studied their mathematical properties. But a few years later, the same property of analyticity became part of the axiomatics of quantum field theory, the supposedly essentialist theory, that is, the one that comes epistemologically “before” ordinary quantum mechanics, and a domain in which Alex never had a real interest. The same happened to be true for the discovery of a non-trivial algebra in solid-state physics: it is a situation of noncommutative geometry inside ordinary quantum mechanics, the one of everyday life, in contrast to the infinite number of papers nowadays dealing with quantum mechanics on noncommutative spaces. It seems to me that it is this existentialist property that prevented Alex’s geometrical views from being abstract.
Je préfère penser au demi-plan de Poincaré en le représentant comme le demi-plan inférieur plutôt que le demi-plan supérieur comme il est d’usage. De cette façon, je peux l’imaginer comme la mer, avec l’horizon à l’infini et des bateaux, des bateaux qui deviennent de plus en plus petits au fur et à mesure qu’ils se rapprochent du bord.7
Alex Grossmann to Robert Coquereaux.
6 Who gave this decidedly existentialist definition of painting: “Painting is nothing but a representation of surfaces and solids foreshortened or enlarged, and put on the plane of the picture in accordance with the fashion in which the real objects seen by the eye appear on this plane.”
7 I prefer to think of the Poincaré half-plane by representing it as the lower half-plane rather than the upper half-plane as is customary. That way, I can imagine it as the sea, with the horizon at infinity and boats, boats that get smaller and smaller as they get closer to the edge.
Generalized Affine Signal Analysis with Time-Delay Thresholds Jan W. Dash, Alex Grossmann, and Thierry Paul
a 1986 unpublished draft by J.W. Dash, A. Grossmann and T. Paul
J. W. Dash () J. Dash Consultants, LLC, Silver Spring, MD, USA Fordham University, New York, NY, USA T. Paul () Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), Paris, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_3
Introductory Note on the Draft Paper “Generalized Affine Signal Analysis with Time-Delay Thresholds”, by Jan W. Dash, Alex Grossmann and Thierry Paul Jan W. Dash and Thierry Paul
Personal recollections of Alex
Alex was a delight and a genius. His family and ours were close friends. He spoke many languages; since my office door at the CNRS Marseille-Luminy was across from his, I could hear him responding to telephone calls in callers’ native languages. Alex mentored me on some of the mathematics in this paper. He also took time to learn to play the clarinet and I was privileged to return the favor and be his teacher! After we left France, Alex and DK visited us regularly in the US and we visited them in France. He once visited me at Bell Labs and gave a seminar. Alex was also interested in the quantitative finance and risk management issues in which I was later involved, and insightfully helped me with a difficult point involving my correlation-based expansion of n-dimensional Gaussian integrals. Alex is sorely missed.
JWD
J. W. Dash () J. Dash Consultants, LLC, Silver Spring, MD, USA Fordham University, New York, NY, USA e-mail: [email protected]; [email protected] T. Paul () Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), Paris, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_4
Je vous parle d’un temps que les moins de vingt ans ne peuvent pas connaître... [I speak to you of a time that those under twenty cannot know...]
La Bohème, J. Plante and C. Aznavour
The paper, an unpublished draft, is dated May 1986, a time when LaTeX did not exist. Thirty-six years later, we only have a hard copy with annotations by Dash, probably written in the same period as the draft itself. The “source file” was not found.1 The subject of the paper is to place some previously introduced useful functions, most notably containing a time-delay parameter, within the context of wavelet theory. In this introductory note we discuss historical background for the paper and comment on the content for perspective. The text in italics and numbered references are from the original.
1 Wavelets in the Mid-1980s
The paper was written in May 1986, after discussions between the three authors in 1985–1986 and earlier. These years were a time of full explosion of wavelet theory.
1.1 The Group-Theoretic View of Wavelets
Attention focused at that time on the group-theoretic view of wavelets, i.e., orbits of functions under a representation of a (non-unimodular) group. The interest in this group-theoretic aspect was somewhat unjustly diminished by the suspicion that the approach might not help solve practical problems. The coup de grâce was given by the discovery of orthogonal bases of wavelets, especially those having compact support. Nevertheless, the group-theoretic approach to wavelets provided a precious tool for a multidimensional extension. The group-theoretic approach is also applicable in the case discussed in the paper, since an orthogonal basis for these functions has yet to be found.
1 We do not remember very well why the draft was never completed and sent to a journal. Jan found it when he was moving in 2021. In a letter to Alex and Thierry dated at the time of the 1986 draft, Jan suggested after completion “not sending this to a mathematical journal but rather a journal like the European Signal Processing Journal or the IEEE version of it.”
1.2 Overcompleteness Can Be a Virtue
The discretization of a continuous wavelet transform was supplanted by the use of an orthonormal basis of (compactly supported) wavelets. Orthogonality has the advantage of providing exact inversion formulae. This procedure, however, has the disadvantage of being rigid, compared to the freedom provided by discretizing the grid and specifying other parameters for an overcomplete set of functions.2 This specification can be done using least squares or other techniques. Only a few of these more complicated functions have to be employed to get a description of complex phenomena, as opposed to having to use a large number of standard wavelets. This is a main point of the approach.
1.3 Comments on the First Sentence: “A Class of Functions Recently Introduced by Dash and Paul . . .”
The functions introduced by Dash contain multiple parameters; as mentioned in the text they were useful for some applications in particle physics (Ref. 8), signal analysis (Ref. 7), and, as proposed much later, finance.3 The continuous wavelet introduced in the thesis of Paul (Ref. 9) consists in taking the Fourier transform of an exponential function with support on the positive half line and dilating and translating it as wavelet philosophy suggests.4 The overcomplete set of functions
2 See Chapter 52, Dash, Jan W. Quantitative Finance and Risk Management, a Physicist’s Approach, 2nd Ed. (ISBN 9789814571234); copyright 2016 by World Scientific Publishing Co. See https://www.worldscientific.com/doi/abs/10.1142/9789814571241_0052 This chapter had a “plea” to mathematicians to find a complete subset of these overcomplete functions.
3 The application in particle physics (Ref. 8) was the description of some data in high-energy diffractive scattering. Here the functions are pieces of the imaginary part of the scattering amplitude, time is the logarithm of the beam energy, and the time thresholds correspond to the successive production in energy of particles with different flavors (strangeness, charm, ...), with successively increasing mass scales. The application in signal analysis (Ref. 7) was the description of the measured response of some equipment in spatially separated locations to input transient electromagnetic fields. The time-threshold behavior occurred because different parts of the equipment reacted to the electromagnetic fields at different times. The proposed application in finance (see the reference in the preceding footnote) is that these functions could form part of the macro component of financial markets operating over long time scales, and possibly also on shorter micro time scales for trading.
4 One of the main interests in TP’s thesis was to provide a Hilbert space representation of quantum mechanics where the Schrödinger equation for the hydrogen atom was explicitly solvable, in the sense that the eigenvalue problem becomes a first order differential equation. Equivalent to the Bargmann space for the harmonic oscillator, this setting qualitatively “explained” why the Bohr quantization rules are exact for the hydrogen atom. The “Cauchy” wavelet was also used extensively some years later for detecting singularities of functions.
used in the draft are precisely these exponential functions, multiplied by a Heaviside function, and then dilated and translated in time (first formula of the paper).
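To make the idea of fitting with a small overcomplete family concrete, here is a minimal sketch of a least-squares fit with delayed exponentials. The functional form `Heaviside(t - t0) * exp(-(t - t0) / a)`, the names `delayed_exponential` and `fit_amplitudes`, and the choice of fitting only the amplitudes for fixed `(t0, a)` are illustrative assumptions; the exact functions of the draft (its first formula) may carry additional factors and parameters.

```python
import numpy as np

def delayed_exponential(t, t0, a):
    """Hypothetical atom: Heaviside(t - t0) * exp(-(t - t0) / a).

    t0 is the time-delay threshold and a > 0 the dilation (decay scale).
    """
    s = np.asarray(t, dtype=float) - t0
    # Clip before exponentiating so the unused branch cannot overflow.
    return np.where(s >= 0.0, np.exp(-np.clip(s, 0.0, None) / a), 0.0)

def fit_amplitudes(t, signal, params):
    """Least-squares amplitudes for a fixed list of (t0, a) pairs."""
    dictionary = np.column_stack(
        [delayed_exponential(t, t0, a) for t0, a in params]
    )
    coeffs, *_ = np.linalg.lstsq(dictionary, signal, rcond=None)
    return coeffs, dictionary @ coeffs

# Usage: a signal made of two delayed responses, fitted with three atoms.
t = np.linspace(0.0, 10.0, 201)
params = [(1.0, 0.5), (4.0, 2.0), (6.0, 1.0)]
signal = 2.0 * delayed_exponential(t, 1.0, 0.5) - delayed_exponential(t, 4.0, 2.0)
coeffs, fit = fit_amplitudes(t, signal, params)
```

In this toy case the third atom receives a coefficient close to zero; in practice the thresholds and scales would themselves be adjusted, which makes the fit nonlinear.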
1.4 Phase
As we just mentioned, it is the Fourier transform of our functions that is translated. This generates a phase in our wavelets themselves, a situation totally different from the usual wavelets. The study of the phase was another feature of interest in wavelets at that time. Note that the phase has regained interest in the last few years; see the contribution to this memorial volume by S. Mallat et al. The functions can also have overall phase factors, which is useful in practice (Ref. 7).
2 Specific Comments on the Text
The action of the Weyl-Heisenberg group is direct, not acting on the Fourier transform. All the machinery of orthogonality relations still works in this case, as explained in Section II of the paper: we are dealing with the same group but through another representation. A few proposed uses of these functions discussed at the time (e.g., speech recognition) remain to be explored. We have always felt that these functions could be generally useful with many applications, in addition to the applications cited above. The use of least squares to determine parameters in the 1980s could be upgraded in principle with machine learning. A technical appendix was planned for the paper but was never written.
3 Le Baron de Prony’s Overcomplete Set of “Wavelets à la Neanderthal”
Gaspard de Prony5 invented a method that constitutes to our knowledge the first example (in 1795!) of a signal expansion in terms of an overcomplete set of continuous functions, along with a prescription for the expansion using a discrete number of constraints from a set of measurements in time. Prony’s method has practical applications. However, it assumes one pre-specified initial time, and a large number of terms, numerically unstable in practice, have to be used to simulate a time-delay threshold. (This observation actually was the origin of the work in Ref. 7.) De Prony functions are a special case of our functions.
5 Gaspard de Prony (1755–1839) was an influential and very productive French engineer. He lived during the period in which many scientific institutions were born in France. He was Professor at the Ecole Polytechnique. See https://en.wikipedia.org/wiki/Gaspard_de_Prony. A description of de Prony’s method is here: https://en.wikipedia.org/wiki/Prony’s_method. de Prony was quickly rebaptized “Le Baron” by Alex, and we all were amused by quoting a paper from 1795; see Ref. 1, which can be found through the link http://users.polytech.unice.fr/~leroux/PRONY.pdf.
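For readers unfamiliar with it, the classical form of Prony’s method can be sketched in a few lines. The version below is a textbook least-squares variant for uniformly sampled data, not a transcription of de Prony’s 1795 procedure or of anything in the draft; the function name `prony` and its arguments are hypothetical. Note that all terms start at the same initial sample, which is the “one pre-specified initial time” limitation mentioned above.

```python
import numpy as np

def prony(samples, n_terms):
    """Fit samples[k] ~ sum_j amps[j] * roots[j]**k, k = 0, 1, ..., n-1.

    Textbook least-squares variant of Prony's method (illustrative only).
    """
    x = np.asarray(samples, dtype=complex)
    n, p = len(x), n_terms
    # Step 1: linear prediction, x[k] + a_1 x[k-1] + ... + a_p x[k-p] = 0.
    A = np.column_stack([x[p - m:n - m] for m in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(A, -x[p:n], rcond=None)
    # Step 2: the exponential bases are the roots of z^p + a_1 z^(p-1) + ... + a_p.
    roots = np.roots(np.concatenate(([1.0], a)))
    # Step 3: amplitudes from a Vandermonde least-squares problem.
    V = np.vander(roots, N=n, increasing=True).T  # V[k, j] = roots[j]**k
    amps, *_ = np.linalg.lstsq(V, x, rcond=None)
    return amps, roots

# Usage: recover two decaying exponentials from 20 noiseless samples.
k = np.arange(20)
x = 2.0 * 0.9 ** k + 1.0 * 0.5 ** k
amps, roots = prony(x, 2)
```

With noisy data or many terms, the root-finding step becomes ill-conditioned, which is consistent with the numerical instability noted in the text.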
4 Alex
This paper seems to both of us to be representative of the spirit of Alex “at work,” as it touches a large landscape of the sciences, history, and general knowledge. This paper also shows how Alex was interested in applications of mathematics in many areas. For example, Alex wrote a paper on numerical simulations6 where the role of geometry was pointed out. He also worked in the biological sciences, incorporating combinatorics.7 Alex was interested in and curious about optimization issues and always remained attached to keeping the maximum of freedom in wavelet analysis. Although he was present at the very beginning of the “discrete” story of wavelets,8 TP remembers very well how he was always more interested in discretizing the continuous than in fixing a priori discrete sets of parameters.
5 Addendum/Corrigendum
• p. 6, l. 2–3: Replace Refs. [xxx] by: Refs. [5, 6, 9].
• The pagination “7” is correct.
• p. 7, l. 13: A point is missing after $\psi(t)$.
• p. 7: The last three handwritten lines read as follows: The set of overlap integrals forms another Hilbert space $H$, this time of functions $\hat f$ on the group $G$, with scalar product $(\hat f_1, \hat f_2) = \int d\mu(g)\, \hat f_1^{*}(g)\, \hat f_2(g)$.
• p. 8, l. 7: The scalar product here is the one we just mentioned: $(e_g, \hat h) = \int d\mu(g')\, [e_g(g')]^{*}\, \hat h(g')$.
• p. 9, first formula: $f_0$ is the function $f$ defined p. 3.
6 A. Grossmann, R. Coquereaux and B. Lautrup, “Iterative Method for Calculation of the Weierstrass Elliptic Function,” IMA Journal of Numerical Analysis 10(1), 119–128, 1990.
7 See, e.g., G. Didier, E. Corel, I. Laprevotte, A. Grossmann, and C. Landès-Devauchelle. Variable length local decoding and alignment-free sequence comparison. Theoretical Computer Science, 462:1–11, 2012.
8 A. Grossmann, I. Daubechies and Y. Meyer, “Painless nonorthogonal expansions,” J. Math. Phys. 27, 1271, 1986.
• p. 9, l. 1: Let us recall that the orthogonality relations for square integrable representations of non-unimodular groups, which read
$$\int d\mu(g)\, |U(g)f_1\rangle\langle U(g)f_2| = \mathrm{Cst} \times \mathrm{Identity},$$
are valid also when $f_1 \neq f_2$ (as soon as they obey the “admissibility condition”; see [Ref. 6] in the draft, made precise below in the last item). Therefore, the reproducing kernel $K$ built up with different values of $t_0$ still “reproduces.”
• p. 9, last formula: Let us remark that we “find” this formula as a consequence of the decomposition $F_{INT}^{(N)}(g) = \sum_{i=1}^{N} b_i\, e_{g_i}(g)$, which is of course not valid in general but which is the ansatz we take. Letting this ansatz satisfy the constraints gives immediately that $b_i = (K^{-1}\zeta)_i$ so that, as written, $F_{INT}^{(N)}(g) = \sum_{i=1}^{N} K(g, g_i)\,(K^{-1}\zeta)_i$. Writing now $\zeta = K K^{-1}\zeta$ and using Cramer’s formula, we find exactly the last formula of page 9, after expanding the numerator determinant by the first row and pushing (in the $j$th co-factor) the first column to the $j$th position.
• p. 11, l. 5: $\langle i|j\rangle$ stands for $(g^{(g_i)}, g^{(g_j)})_{L^2(\mathbb{R})}$ and likewise $\langle j|\psi\rangle = (g^{(g_j)}, \psi)_{L^2(\mathbb{R})}$.
• p. 14, l. 11: Replace xx by “4th previous equation.”
• The missing information, denoted by xxx, in the references is the following.
– Ref (6): The second reference in Ref (6) is A. Grossmann, J. Morlet, and T. Paul, “Transforms associated to square integrable group representations” I. General results, J. Math. Phys., vol 26, 2473–2479, 1985, and II. Examples, Ann. Inst. Henri Poincaré, Phys. Théor., vol 45, 293, 1986. Another related reference is A. Grossmann and T. Paul, “Wave functions on subgroups of the group of affine canonical transformations” in “Resonances, models and phenomena.” Lect. Notes in Phys, 211, Springer-Verlag, 1984.
– Ref (8): Another reference is J.W. Dash and S.T. Jones, “Flavoring, RFT and ln² s Physics at the SPS Collider”, Physics Letters B157, p. 229 (1985).
– Ref (9): T. Paul, Thèse d’état, Université d’Aix-Marseille, 1985.
– Ref (13): B. Roy Frieden, “VIII Evaluation, Design and Extrapolation Methods for Optical Signals, Based on Use of the Prolate Functions”, Progress in Optics 9 Ch. VIII, 311–407, 1971.
– Ref (14): E.S. Abers, B.W. Lee, “Gauge theories”, Physics Reports 9, Issue 1, 1–141, 1973.
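The interpolation ansatz discussed in the item on the last formula of p. 9 can be checked numerically. The sketch below is illustrative only: it replaces the group elements by real numbers and the reproducing kernel of the draft by a Gaussian kernel (the stand-in `gaussian_kernel` and the function names are assumptions), and it uses one standard bordered-determinant form of Cramer’s rule, whose layout may differ from the formula in the draft.

```python
import numpy as np

def gaussian_kernel(x, y, width=1.0):
    """Stand-in positive-definite kernel; NOT the kernel of the draft."""
    return np.exp(-((x - y) ** 2) / (2.0 * width ** 2))

def interpolate(g, nodes, zeta, kernel=gaussian_kernel):
    """Evaluate sum_i K(g, g_i) (K^{-1} zeta)_i at a single point g."""
    K = kernel(nodes[:, None], nodes[None, :])
    b = np.linalg.solve(K, zeta)  # b_i = (K^{-1} zeta)_i
    return kernel(g, nodes) @ b

def interpolate_by_determinants(g, nodes, zeta, kernel=gaussian_kernel):
    """Same value written as a ratio of determinants (bordered matrix)."""
    n = len(nodes)
    K = kernel(nodes[:, None], nodes[None, :])
    bordered = np.zeros((n + 1, n + 1))
    bordered[0, 1:] = kernel(g, nodes)  # first row: K(g, g_i)
    bordered[1:, 0] = zeta              # first column: the constraints
    bordered[1:, 1:] = K
    return -np.linalg.det(bordered) / np.linalg.det(K)

# Usage: four constraints zeta_i imposed at four sample points g_i.
nodes = np.array([0.0, 0.7, 1.5, 2.4])
zeta = np.array([1.0, -0.5, 2.0, 0.3])
at_nodes = np.array([interpolate(g, nodes, zeta) for g in nodes])
value_solve = interpolate(1.1, nodes, zeta)
value_det = interpolate_by_determinants(1.1, nodes, zeta)
```

The values at the sample points reproduce the constraints, and the two evaluations agree, since the determinant of the bordered matrix equals minus det(K) times the sum in the ansatz.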
Alex Grossmann’s PhD Thesis (Harvard 1959): Covariant Functions of Quantum Fields - Table of Contents and Introduction Alex Grossmann
Mathematical attempts to understand quantum field theory were a rich activity in theoretical physics in the 50s and 60s. Fundamentally based on relativistic invariance, the paradigm of axiomatic quantum field theory emphasizes the role of functions possessing some symmetry under the action of the Lorentz group. The study of these covariance properties constitutes the subject of Alex’s PhD thesis. The functions studied by Alex are associated with big names of the theoretical physics of that time: Wightman functions, Schwinger functions, Dyson functions... But the key idea of the thesis is, in our view, to undress the functions from their physical clothes and exhibit “properties of a class of functions rather than properties of quantized fields”. In a six-page introduction to the memoir, Alex clearly shows three main methodological streams that would irrigate, almost obsessively, his whole future scientific (and extra-scientific) activity:
– the importance of geometrical and algebraic structures in the analysis of objects
– the efficiency of considering classes, collections of objects rather than single ones
– the deep and economical view provided by stripping objects of anything but the strictly necessary.
We all benefited from them and will continue to do so.
Bruno, Patrick, Stéphane, and Thierry
T. Paul () Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), Paris, France © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_5
Part I
Quantum Mechanics and Theoretical Physics
Alex Grossmann, from Nested Hilbert Spaces to Partial Inner Product Spaces and Wavelets Jean-Pierre Antoine
Abstract This chapter is devoted to the long collaboration between Alex Grossmann and me, the focal point being the notion of PIP-spaces, introduced around 1970. I will first review the basic facts of the theory. Then I will explore the consequences and applications of that notion to operator theory, spectral analysis of self-adjoint operators, metric operators that generate lattices of Hilbert spaces, signal processing, etc. As a matter of fact, new developments are still arising today. In addition, I will describe the interaction with Alex that led to 2D continuous wavelets.
1 Introduction: Some History

In these lines, I will describe my collaboration with Alex and its rich legacy, starting in Spring 1968. Some years before, at the suggestion of V. Bargmann, I had studied the notion of Rigged Hilbert Space (RHS) developed by I.M. Gel’fand et al. Next, in 1966, I had written a PhD thesis on a formulation of quantum mechanics in the new formalism [1, 2]. Then, in Spring 1968, a visitor called Alex Grossmann came from Marseille and gave a talk in our department on a topic unknown to most of us, namely Nested Hilbert Spaces (NHS). Actually both RHS and NHS (a detailed description will be given in the sequel) have the same goal: to describe in a rigorous manner those elements that do not belong to the fundamental Hilbert space, yet play a crucial role, for instance, in the Dirac bra–ket formalism (plane waves, $\delta$-function, distributions in general). In other words, Alex and I were implicitly doing the same thing. After discussion, it became clear to us that there ought to be a more general formalism encompassing both NHS and RHS. Thus Alex invited me to spend some time with him in Marseille.
J.-P. Antoine, Institut de Recherche en Mathématique et Physique, Université catholique de Louvain, Louvain-la-Neuve, Belgium
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_6
The trip with the family was planned for May 1968. But there was a bug: no gas station was operational along the roads of France! Thus we had to wait until June, and finally we settled in Cassis (where Alex lived) while working in Marseille, at the CNRS center (not yet transferred to Luminy). The first task that Alex suggested was to clean up his closet: it contained hundreds of pages on various topics, ranging from analyticity in quantum field theory to the basics of Nested Hilbert Spaces. Alex was a man with such a wide scientific culture that he had original ideas about almost every topic... but he rarely followed them to their conclusion, so that there were many loose ends. We sorted and classified everything, ending up with three fat folders, quickly christened “the skeleton.” We made two copies, one for each of us, and I still have mine (I used it recently). It also generated funny situations, for instance, when I sent a postcard to Alex (no internet at the time!) with the sentence “Do you agree if I use the skeleton?” (in French, “Es-tu d’accord que j’utilise le squelette ?”, to be well understood by the postal service employees...).

Having all this material at our disposal, we concentrated on our goal, to find a comprehensive formalism that would generalize both NHS and RHS. The net result was the concept of Partial inner product space that emerged in 1973 [3]. The basic papers were published in 1976 [4–6]. [Actually we had written three papers and submitted all three together. A first journal was willing to take one. The next one would accept two. So we gave up, and the result is the two articles [4, 5] just cited. I do not remember what happened to the third one...] I continued along the same lines [7, 8], but Alex quickly moved to an emerging new concept, namely wavelets... but this is another story. Now we will switch to mathematics, without too many technical details.
2 Rigged Hilbert Spaces

This is by now a standard tool in analysis, thanks to the work of Sobolev, Gel’fand, Maurin, Schwartz, etc. [9]. A rigged Hilbert space (RHS) consists of a triplet of spaces:

$$\Phi \subset \mathcal{H} \subset \Phi^\times, \tag{1}$$

where $\mathcal{H}$ is a Hilbert space, $\Phi$ is a dense subspace of $\mathcal{H}$, endowed with a locally convex topology $\tau$ finer than the norm topology inherited from $\mathcal{H}$ (i.e., a stronger notion of convergence), and $\Phi^\times$ is the space of continuous conjugate linear functionals $F(\varphi)$ on $\Phi$. By duality, each space in (1) is dense in the next one and all embeddings are linear and continuous. Standard examples of rigged Hilbert spaces are the Schwartz distribution spaces over $\mathbb{R}$ or $\mathbb{R}^3$, namely $\mathcal{S} \subset \mathcal{H} \subset \mathcal{S}^\times$ or $\mathcal{D} \subset \mathcal{H} \subset \mathcal{D}^\times$.

In general, one requires that the space $\Phi$ be complete, reflexive, and nuclear [we skip the technical details]. The third property implies that one can use the Gel’fand-Maurin spectral theorem. According to the latter, a self-adjoint operator $A$ in $\mathcal{H}$ that maps $\Phi$ into itself continuously (for $\tau$) possesses a complete orthonormal set of generalized eigenvectors $\xi_\lambda \in \Phi^\times$. The vector $\xi_\lambda$ is called a generalized eigenvector of $A$ if it satisfies the equation $A^\times \xi_\lambda = \lambda^* \xi_\lambda$, where $A^\times$ is the extension (by duality) to $\Phi^\times$ of the adjoint $A^*$. Then the statement of the theorem is that, for any $\varphi, \psi \in \Phi$, one has

$$\langle \varphi | \psi \rangle = \int_{\mathbb{R}} \xi_\lambda(\varphi)\, \overline{\xi_\lambda(\psi)}\, d\mu(\lambda) := \int_{\mathbb{R}} \langle \varphi | \xi_\lambda \rangle \langle \xi_\lambda | \psi \rangle\, d\mu(\lambda), \tag{2}$$

where $\langle \cdot | \cdot \rangle$ is the sesquilinear form putting $\Phi$ and $\Phi^\times$ in duality and $\mu$ is some measure on $\mathbb{R}$. In quantum mechanics, if one splits the measure $\mu$ into a discrete part (essentially a sum of $\delta$-functions) and a continuous part, one recovers the distinction between normalizable eigenvectors ($\xi_\lambda \in \mathcal{H}$) and non-normalizable ones ($\xi_\lambda \in \Phi^\times \setminus \mathcal{H}$), that is, the discrete and the continuous part of the spectrum of the observable $A$. Thus (2) justifies Dirac’s formalism.

This is very good, but this approach has a serious drawback: it requires mastering nontrivial mathematical notions from functional analysis. Another problem with the RHS (1) is that, besides the Hilbert space vectors, it contains only two types of elements, “very good” functions in $\Phi$ and “very bad” ones in $\Phi^\times$. If one wants a fine control on the behavior of individual elements, one has to interpolate somehow between the two extreme spaces. In the case of the Schwartz triplet, $\mathcal{S} \subset L^2 \subset \mathcal{S}^\times$, a well-known solution is given by a chain of Hilbert spaces, the so-called Hermite representation of tempered distributions [10]. Actually this was anticipated by Alex already in 1965 [11], as we shall see in Sect. 6. These are two reasons that urged us, and Alex in the first place, to invent a powerful new language, namely Nested Hilbert Spaces and Partial inner product spaces, to which we turn now.
3 Nested Hilbert Spaces

The original definition of Grossmann runs as follows [12, 13]. Consider the triple $V_I = [V_r; E_{sr}; I]$, where

(1) $I$ is an index set, directed to the right, with an order-reversing involution $r \leftrightarrow \bar r$ and a self-dual element $o = \bar o$.
(2) For each $r \in I$, $V_r$ is a Hilbert space.
(3) For every $p \in I$ and every $q \geq p$, $E_{qp}$ is an injective, bounded linear mapping from $V_p$ into $V_q$, with dense range, such that $E_{pp}$ is the identity on $V_p$ for every $p \in I$ and $E_{qp} = E_{qr} E_{rp}$ whenever $q \geq r \geq p$ ($E_{qp}$ is called a nesting).
(4) $V_I$ is the algebraic inductive limit of the family $\{V_r, r \in I\}$ with respect to the nestings $E_{qp}$; this means the following: in the disjoint union $\bigcup_{r \in I} V_r$, define an equivalence relation by writing $f_p \sim f_r$ ($f_p \in V_p$, $f_r \in V_r$) if there is an $s \geq p, r$ such that $E_{sp} f_p = E_{sr} f_r$; then the set of equivalence classes is a vector space, denoted $V_I$. One denotes by $E_{Ir}$ the natural embedding of $V_r$ into $V_I$.

Then the triple $V_I = [V_r; E_{sr}; I]$ is called a nested Hilbert space (NHS) if the following two conditions are satisfied:

(nh$_1$) $V_I$ is stable under intersection: for any two $r, q \in I$, there is a $p \leq r, q$ such that $E_{Ip} V_p = E_{Ir} V_r \cap E_{Iq} V_q$.
(nh$_2$) For every $r \in I$, there exists a (unique) unitary map $u_{\bar r r}$ from $V_r$ onto $V_{\bar r}$ such that $u_{\bar o o} = 1$ and $(E_{sr})^*_{rs} = u_{r \bar r}\, E_{\bar r \bar s}\, u_{\bar s s}$.

Condition (nh$_1$) guarantees that the family $\{V_r, r \in I\}$ is a lattice, whereas Condition (nh$_2$) states that $V_{\bar r}$ is the conjugate dual of $V_r$, with the dual norm, $u_{\bar r r}$ being the unitary (Riesz) operator establishing the duality between the two. Notice that the embeddings $E_{qp}$ need not reduce to the identity; they only have to satisfy Condition (3). Examples are given in [14, Sec. 2.4.1].

The next step is to define a partial inner product on the NHS $V_I$. Given a vector $f \in V_I$, we consider the set $J(f) = \{r \in I : f \in V_r\}$ that characterizes the behavior of $f$. Write $\overline{J(h)} = \{r \in I : \bar r \in J(h)\}$. Then, given $f, g \in V_I$ such that $J(f) \cap \overline{J(g)}$ is not empty, the number $\langle f | g \rangle := (u_{\bar r r} f_r, g_{\bar r}) = (f_r, u_{r \bar r}\, g_{\bar r})$, with $r \in J(f) \cap \overline{J(g)}$, is independent of $r$ and called the inner product of $f$ and $g$. This is obviously a sesquilinear form defined on any pair $f, g \in V_I$ such that $J(f) \cap \overline{J(g)} \neq \emptyset$. Such a pair will be called compatible and $\langle \cdot | \cdot \rangle$ a partial inner product on $V_I$.

Next we introduce operators on a NHS. The idea is to replace unbounded, or even very singular, operators by a (coherent) collection of bounded operators $\{A_{sr} : V_r \to V_s\}$, called representatives. Actually the original definition of [12] considers operators as algebraic inductive limits of sets of representatives, but a much simpler, yet equivalent, definition will be given in the context of PIP-spaces. For simplicity, we will give that version only, in Sect. 4. Suffice it to say that every operator $A$ has an adjoint $A^\times$ verifying the natural relation $\langle y | A v \rangle = \langle A^\times y | v \rangle$, when all the expressions are well defined. Also two operators $A, B$ can be multiplied only when matching conditions are satisfied, and then $(AB)^\times = B^\times A^\times$. It follows that operators on a NHS form a partial *-algebra [14, 15].

Actually the final description of NHS given in [12, 13] was preceded by the paper [16], which clearly shows the original motivation, namely to find an elegant and rigorous formulation of quantum mechanics. The same applies to several papers on applications of NHS, for instance [17, 18].

As explained in Sect. 1, the search for a more general formulation gradually turned to our next topic, namely Partial inner product spaces (PIP-spaces). Before that, Alex had already tried to generalize the NHS, under various names, such as “Espaces hilbertiens généralisés” or Dirac spaces [19]. The latter, in particular, is interesting, because it is defined as a collection of Banach spaces or Hilbert spaces: PIP-spaces are not far!
4 Towards Partial Inner Product Spaces

In a RHS, the form $\langle \cdot | \cdot \rangle$ defines a partial inner product on $\Phi^\times$, in the sense that the product $\langle \xi | \eta \rangle$ is defined only when one of the two elements $\xi, \eta$ belongs to $\Phi$. For a NHS $V_I$, we have seen above that the inner product $\langle f | g \rangle$ is defined only if $f, g$ are compatible in the sense that $J(f) \cap \overline{J(g)} \neq \emptyset$. Thus, in both cases, the crucial notion is that of compatibility (although that of a RHS is poor, as pointed out in Sect. 2). This prompted Alex and me to base our formalism on an abstract notion of linear compatibility on a vector space $V$, that is, a symmetric binary relation $f \# g$ which preserves linearity:

$$f \# g \iff g \# f, \quad \forall\, f, g \in V,$$
$$f \# g,\ f \# h \implies f \# (\alpha g + \beta h), \quad \forall\, f, g, h \in V,\ \forall\, \alpha, \beta \in \mathbb{C}.$$

As a consequence, writing $S^\# = \{g \in V : g \# f \text{ for all } f \in S\}$ for the set of vectors compatible with every element of a subset $S \subset V$, one has

$$S^{\#\#} = (S^\#)^\# \supseteq S, \qquad S^{\#\#\#} = S^\#.$$
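The two set identities above depend only on the symmetry of the relation #, so they can be checked by brute force on a small finite model. The following is a minimal sketch under that simplification: the set `ELEMENTS`, the relation `COMPATIBLE`, and the helper names are all hypothetical, and the linear structure of $V$ is ignored entirely.

```python
from itertools import combinations

# Toy model: a finite set with a symmetric "compatibility" relation.
# Linearity is ignored here; only the symmetry of # is used.
ELEMENTS = {"a", "b", "c", "d"}

# Hypothetical symmetric relation, stored as unordered pairs.
COMPATIBLE = {
    frozenset({"a"}),        # a # a
    frozenset({"a", "b"}),   # a # b and b # a
    frozenset({"b", "c"}),
    frozenset({"c", "d"}),
}


def compatible(x, y):
    """True when x # y in the toy relation."""
    return frozenset({x, y}) in COMPATIBLE


def sharp(subset):
    """Return S^# = {y : y # x for every x in S}."""
    return {y for y in ELEMENTS if all(compatible(x, y) for x in subset)}


def all_subsets(elements):
    """Yield every subset of the given finite set."""
    items = sorted(elements)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


# Check S ⊆ S## and S### = S# for every subset S.
for s in all_subsets(ELEMENTS):
    assert s <= sharp(sharp(s))
    assert sharp(sharp(sharp(s))) == sharp(s)

# The analogues of assaying subspaces are the fixed points of S -> S##.
assaying = [s for s in all_subsets(ELEMENTS) if sharp(sharp(s)) == s]
```

In this toy model the fixed points collected in `assaying` play the role of the building blocks of the structure; in a genuine vector space each $S^\#$ is in addition a vector subspace, which the sketch does not model.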
We will call assaying subspace of $V$ a subspace $S$ such that $S^{\#\#} = S$ and denote by $\mathcal{F}(V, \#)$ the family of all assaying subsets of $V$, ordered by inclusion. Let $F$ be the isomorphy class of $\mathcal{F}$, that is, $\mathcal{F}$ considered as an abstract partially ordered set. Elements of $F$ will be denoted by $r, q, \ldots$, and the corresponding assaying subsets by $V_r, V_q, \ldots$. By definition, $q \leq r$ if and only if $V_q \subseteq V_r$. We also write $V_{\bar r} = V_r^\#$, $r \in F$. In other words, vectors should not be considered individually, but only in terms of assaying subspaces, which are the building blocks of the whole structure. For instance, $f \# g$ if and only if there is an $r \in F$ such that $f \in V_r$ and $g \in V_{\bar r}$. Then one has the following standard result from universal algebra:

Theorem 1 The family $\mathcal{F}(V, \#) \equiv \{V_r, r \in F\}$, ordered by inclusion, is a complete involutive lattice, i.e., it is stable under the following operations, arbitrarily iterated:

Involution: $V_r \leftrightarrow V_{\bar r} = (V_r)^\#$,
Infimum: $V_{p \wedge q} := V_p \wedge V_q = V_p \cap V_q$, $\quad (p, q, r \in F)$,
Supremum: $V_{p \vee q} := V_p \vee V_q = (V_p + V_q)^{\#\#}$.

The smallest element of $\mathcal{F}(V, \#)$ is $V^\# = \bigcap_r V_r$ and the greatest element is $V = \bigcup_r V_r$. By definition, the index set $F$ is also a complete involutive lattice; for instance,

$$(V_{p \wedge q})^\# = V_{\overline{p \wedge q}} = V_{\bar p \vee \bar q} = V_{\bar p} \vee V_{\bar q}.$$
A partial inner product on $(V, \#)$ is a Hermitian form $\langle \cdot | \cdot \rangle$ defined exactly on compatible pairs of vectors. A partial inner product space (PIP-space) is a vector space $V$ equipped with a linear compatibility and a partial inner product. The latter is usually assumed to be positive definite, but this is not compulsory.

The partial inner product clearly defines a notion of orthogonality: $f \perp g$ if and only if $f \# g$ and $\langle f | g \rangle = 0$. The PIP-space $(V, \#, \langle \cdot | \cdot \rangle)$ is nondegenerate if $(V^\#)^\perp = \{0\}$, that is, if $\langle f | g \rangle = 0$ for all $f \in V^\#$ implies $g = 0$. From now on, we will assume that our PIP-space $(V, \#, \langle \cdot | \cdot \rangle)$ is nondegenerate. As a consequence, $(V^\#, V)$ and every couple $(V_r, V_{\bar r})$, $r \in F$, are a dual pair in the sense of topological vector spaces [20].

Now, one wants the topological structure to match the algebraic structure; in particular, the topology $\tau_r$ on $V_r$ should be such that its conjugate dual be $V_{\bar r}$: $(V_r[\tau_r])^\times = V_{\bar r}$, $\forall\, r \in F$. This implies that the topology $\tau_r$ must be finer than the weak topology $\sigma(V_r, V_{\bar r})$ and coarser than the Mackey topology $\tau(V_r, V_{\bar r})$:

$$\sigma(V_r, V_{\bar r}) \preceq \tau_r \preceq \tau(V_r, V_{\bar r}).$$

Throughout the theory, we will assume that every $V_r$ carries its Mackey topology $\tau(V_r, V_{\bar r})$. This choice has two interesting consequences. First, if $V_r[\tau_r]$ is a Hilbert space or a reflexive Banach space, then $\tau(V_r, V_{\bar r})$ coincides with the norm topology. Next, $r < s$ implies $V_r \subset V_s$, and the embedding operator $E_{sr} : V_r \to V_s$ is continuous and has dense range. In particular, $V^\#$ is dense in every $V_r$.

Standard examples include sequence spaces, spaces of locally integrable functions, Rigged Hilbert spaces, and Nested Hilbert spaces.

As we have seen, $\mathcal{F}(V, \#)$ is a huge lattice (it is complete!) and assaying subspaces may be complicated, such as Fréchet spaces, nonmetrizable spaces, etc. The theory may be developed in full generality, as shown in [7, 8]. In practice, however, a more restrictive class is largely sufficient, namely lattices of Banach spaces (LBS) or Hilbert spaces (LHS). In order to situate these within the general theory, one chooses an involutive sublattice $\mathcal{I} \subset \mathcal{F}$, indexed by $I$, such that

(i) $\mathcal{I}$ is generating: $f \# g \iff \exists\, r \in I$ such that $f \in V_r$, $g \in V_{\bar r}$.
(ii) Every $V_r$, $r \in I$, is a Hilbert space or a reflexive Banach space.
(iii) There is a unique self-dual, Hilbert, assaying subspace $V_o = V_{\bar o}$.

In the Hilbert case, one has, of course, recovered the notion of NHS (with unit nestings). In both cases, one equips intersections and unions with the so-called projective, resp. inductive, norms, inherited from interpolation theory (see [14]). Familiar examples of LBS/LHS are

1. Chains of Banach or Hilbert spaces, such as the chain of Lebesgue spaces on a finite interval, $\mathcal{I} = \{L^p([0, 1], dx),\ 1 < p < \infty\}$, or the scale of Hilbert spaces $\{\mathcal{H}_n, n \in \mathbb{Z}\}$ built on powers of an unbounded self-adjoint operator $A = A^* \geq 1$.
2. Various lattices of weighted Hilbert spaces of sequences or of locally integrable functions.
3. The lattice generated by the spaces $L^p(\mathbb{R}, dx)$, $1 < p < \infty$, which is not a chain.
4. Many more spaces used in analysis or signal processing, such as mixed norm Lebesgue spaces $L^{p,q}_m(\mathbb{R}^2)$, amalgam spaces, and modulation spaces $M^{p,q}_m(\mathbb{R}^d)$ (see [14]).
5. Several PIP-spaces of entire functions with Bargmann’s space as central Hilbert space [21]. Given the Gaussian measure on $\mathbb{C}$, $d\mu(z) = \pi^{-1} \exp(-|z|^2)\, dz$, and a measurable real-valued function $\rho$ on $\mathbb{C}$, define the Hilbert space $\mathfrak{F}(\rho)$ of all entire functions $f$ such that

$$\|f\|_\rho^2 := \int_{\mathbb{C}} |f(z)|^2\, e^{-\rho(z)}\, d\mu(z) < \infty.$$

Then one can build several PIP-spaces from families of (assaying) spaces $\mathfrak{F}(\rho)$, which were in fact introduced by Alex in a talk at UCL.
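To make the idea of a lattice of weighted sequence spaces (example 2 in the list above) concrete, here is a small numerical sketch. The specific scale is an assumption chosen for illustration, not one taken from the text: $V_r$ is the space of sequences with $\|f\|_r^2 = \sum_{n \geq 1} n^{-2r} |f_n|^2 < \infty$, so that $V_0 = \ell^2$, $V_r$ grows with $r$, and $V_{\bar r} = V_{-r}$. Sequences are truncated to finitely many terms, and the function names are hypothetical.

```python
import math


def norm(f, r):
    """Norm of the finite sequence f = (f_1, ..., f_N) in the assumed space V_r."""
    return math.sqrt(
        sum(n ** (-2 * r) * abs(x) ** 2 for n, x in enumerate(f, start=1))
    )


def pairing(f, g):
    """Partial inner product <f|g> = sum conj(f_n) g_n (finite truncation)."""
    return sum(complex(x).conjugate() * y for x, y in zip(f, g))


N = 2000
f = [n ** 0.4 for n in range(1, N + 1)]    # grows: not in l^2, but in V_1
g = [n ** -1.8 for n in range(1, N + 1)]   # decays fast enough to lie in V_{-1}

value = pairing(f, g)                # well defined: f in V_1, g in V_{-1}
bound = norm(f, 1) * norm(g, -1)     # Cauchy-Schwarz bound for the dual pair
```

Here `f` is not square summable (its $\ell^2$ norm `norm(f, 0)` keeps growing with `N`), yet the pairing with `g` stays bounded by `norm(f, 1) * norm(g, -1)`, which is the sense in which the pair is compatible even though `f` lies outside the central Hilbert space.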
5 Operators on PIP-Spaces

5.1 General Definitions

As already mentioned, the basic idea of PIP-spaces is that vectors should not be considered individually, but only in terms of the subspaces $V_r$ ($r \in F$ or $r \in I$), the building blocks of the structure. Correspondingly, an operator on a PIP-space should be defined in terms of assaying subspaces only, with the proviso that only bounded operators between Hilbert or Banach spaces are allowed. Thus an operator is a coherent collection of bounded operators. For simplicity, we restrict the definition to the case of a LBS/LHS; the general case is essentially identical. More precisely, given a LHS or LBS $V_I = \{V_r, r \in I\}$, an operator on $V_I$ is a map $A : \mathcal{D}(A) \to V$, where

(i) $\mathcal{D}(A) = \bigcup_{q \in d(A)} V_q$, with $d(A)$ a nonempty subset of $I$.
(ii) For every $q \in d(A)$, there exists a $p \in I$ such that the restriction of $A$ to $V_q$ is linear and continuous into $V_p$ (we denote this restriction by $A_{pq}$).
(iii) $A$ has no proper extension satisfying (i) and (ii).

The set of all operators on $V_I$ is denoted $\mathrm{Op}(V_I)$. Operators between two LBS/LHS $V_I$, $Y_K$ are defined in the same way and their set is denoted by $\mathrm{Op}(V_I, Y_K)$.

The bounded operator $A_{pq} : V_q \to V_p$ is called a representative. The operator $A$ may be characterized by the set $j(A) = \{(q, p) \in I \times I : A_{pq} \text{ exists}\}$. Thus the operator $A$ may be identified with the collection of its representatives, and every one of them uniquely defines $A$. The collection of representatives indexed by $j(A)$ is coherent, in the sense that, whenever $p < p'$, $q > q'$, one has $A_{p'q'} = E_{p'p}\, A_{pq}\, E_{qq'}$.

The crucial fact in the definition is the maximality condition (iii), which implies that an operator $A$ cannot have any extension, contrary to unbounded operators in a Hilbert space, for instance. The idea behind the notion of operator is to keep also the algebraic operations on operators, namely:

(i) Adjoint: every $A \in \mathrm{Op}(V_I)$ has a unique adjoint $A^\times \in \mathrm{Op}(V_I)$, defined by the relation

$$\langle A^\times x | y \rangle = \langle x | A y \rangle, \quad \text{for } y \in V_r,\ r \in d(A), \text{ and } x \in V_{\bar s},\ s \in i(A), \tag{3}$$

that is, $(A^\times)_{\bar r \bar s} = (A_{sr})^*$ (usual Hilbert/Banach space adjoint). Here one has defined

$$i(A) = \{p \in I : \text{there is a } q \text{ such that } A_{pq} \text{ exists}\},$$

a set that plays the role of the range of $A$. It follows from the definition (3) that $A^{\times\times} = A$, for every $A \in \mathrm{Op}(V_I)$: no extension is allowed, by the maximality condition (iii).

(ii) Partial multiplication: $AB$ is defined if and only if there is a $q \in i(B) \cap d(A)$, that is, if and only if there is a continuous factorization through some $V_q$:

$$V_r \xrightarrow{\;B\;} V_q \xrightarrow{\;A\;} V_s, \quad \text{i.e., } (AB)_{sr} = A_{sq} B_{qr}.$$

It is worth noting that, for a LHS/LBS, the domain $\mathcal{D}(A)$ is always a vector subspace of $V$ (this is not true for a general PIP-space). Therefore, $\mathrm{Op}(V_I)$ is a vector space and a partial *-algebra [15].
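The notion of an operator as a coherent collection of bounded representatives can be illustrated on a simple model. The sketch below makes two assumptions that are not in the text: the scale is the weighted sequence scale with $\|f\|_r^2 = \sum_{n \geq 1} n^{-2r} |f_n|^2$ (so $V_r$ grows with $r$), and the operator is the diagonal one, $(Af)_n = n^a f_n$. For such an operator the norm of $A_{pq} : V_q \to V_p$ is $\sup_n n^{a+q-p}$, so the representative exists exactly when $p \geq q + a$. The function names are hypothetical.

```python
def representative_norm(a, q, p, n_max):
    """Norm of A on sequences supported on n <= n_max, as a map V_q -> V_p.

    For the diagonal operator (A f)_n = n^a f_n this equals sup_n n^(a + q - p).
    """
    return max(n ** (a + q - p) for n in range(1, n_max + 1))


def has_representative(a, q, p):
    """A_pq exists (is bounded) exactly when the exponent a + q - p is <= 0."""
    return p >= q + a


a = 1.0

# Bounded case: A maps V_0 = l^2 into the larger space V_1.
small = [representative_norm(a, 0, 1, n_max) for n_max in (10, 100, 1000)]

# Unbounded case: A does not map V_0 into V_0; truncated norms grow with n_max.
large = [representative_norm(a, 0, 0, n_max) for n_max in (10, 100, 1000)]

# Partial multiplication: A*A has exponent 2a and factorizes V_0 -> V_1 -> V_2.
product_ok = has_representative(a, 0, 1) and has_representative(a, 1, 2)
```

In this model $j(A) = \{(q, p) : p \geq q + a\}$: the operator that is unbounded on $\ell^2$ is replaced by bounded maps between different spaces of the scale, and the product $A \cdot A$ is defined because a continuous factorization through $V_1$ exists.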
5.2 Homomorphisms and Orthogonal Projections

There are many classes of operators on a PIP-space. The one we need most is that of homomorphisms. Let $V_I$, $Y_K$ be two LHSs or LBSs. An operator $A \in \mathrm{Op}(V_I, Y_K)$ is called a homomorphism if

(i) For every $r \in I$ there exists $u \in K$ such that both $A_{ur}$ and $A_{\bar u \bar r}$ exist.
(ii) For every $u \in K$ there exists $r \in I$ such that both $A_{ur}$ and $A_{\bar u \bar r}$ exist.

Equivalently, for every $r \in I$, there exists $u \in K$ such that $(r, u) \in j(A)$ and $(\bar r, \bar u) \in j(A)$, and for every $u \in K$, there exists $r \in I$ with the same property.¹

¹ Contrary to what is stated in [14, Def. 3.3.4], the conditions (i) and (ii) do not imply $j(A) = I \times K$ and $j(A^\times) = K \times I$.

Such homomorphisms are needed in two instances. The first one is a formulation of the theory in the language of categories [22]. This is interesting, since operators on a PIP-space constitute sheaves and cosheaves, the latter being rather uncommon.

The second case where homomorphisms play a crucial role is the definition of orthogonal projections [6] or [14, Sec. 3.4]. Since PIP-spaces tend to mimic the properties of Hilbert spaces, one may wonder whether this applies also to geometry, in particular, to the bijection between (closed) subspaces and orthogonal projections. To that effect we need a good definition of the latter. We adopted the following one: an orthogonal projection in a nondegenerate PIP-space is a homomorphism $P$ satisfying the conditions $P^2 = P = P^\times$. Then the main result is that a subspace $W$ of a PIP-space $V$ is orthocomplemented if and only if it is the range of an orthogonal projection $P$:

$$W = PV \quad \text{and} \quad V = W \oplus Z.$$

There are equivalent conditions; for instance, the vector subspace $W$ is orthocomplemented if it satisfies the following two conditions:

(i) For every assaying subset $V_r \subseteq V$, the intersections $W_r = W \cap V_r$ and $W_{\bar r} = W \cap V_{\bar r}$ are a dual pair in $V$.
(ii) The intrinsic Mackey topology $\tau(W_r, W_{\bar r})$ coincides with the Mackey topology $\tau(V_r, V_{\bar r})|_{W_r}$ induced by $V_r$.

The upshot is that we have in this way the natural notion of PIP-subspace that generalizes that of Hilbert subspace. And indeed, the set $\mathrm{Proj}(V_I)$ of all orthogonal projections enjoys lattice properties similar to those of its Hilbert space analogue. As a final remark, one may note that, in a nondegenerate PIP-space $V$, a finite-dimensional subspace $W$ of $V$ is orthocomplemented if and only if it is contained in $V^\#$ and nondegenerate, i.e., $W \cap W^\perp = \{0\}$. This may have significance in problems of approximation.
5.3 Symmetric Operators and Self-Adjointness

A standard problem is to determine whether a symmetric operator in a Hilbert space $\mathcal{H}$ is in fact self-adjoint. This is the case, in particular, for quantum Hamiltonians. The usual technique is to start from the restriction to a smaller subspace and exploit von Neumann’s theory of self-adjoint extensions. But there is an alternative, namely to start from a space larger than $\mathcal{H}$ and search for self-adjoint restrictions to $\mathcal{H}$. This approach leads to the KLMN theorem (KLMN stands for Kato, Lax, Lions, Milgram, Nelson), a technique pioneered by Nelson [23] and translated by us into PIP-space language [5]. The theorem runs as follows.

Theorem 2 Let $V_I$ be a nondegenerate, positive definite PIP-space with a central Hilbert space $V_o = V_{\bar o}$ and let $A = A^\times \in \mathrm{Op}(V_I)$ be a symmetric operator. Assume there exists a $\lambda \in \mathbb{R}$ such that $A - \lambda$ has an invertible representative $A_{sr} - \lambda E_{sr}$ from a “small” $V_r$ onto a “big” $V_s$ (i.e., $V_r \subseteq V_o \subseteq V_s$). Then there exists a unique restriction of $A_{sr}$ to a self-adjoint operator $A$ in the Hilbert space $V_o$. The number $\lambda$ does not belong to the spectrum of $A$. The domain of $A$ is obtained by eliminating from $V_r$ exactly the vectors $f$ that are mapped by $A_{sr}$ beyond $V_o$ (i.e., satisfy $A_{sr} f \notin V_o$). The resolvent $(A - \lambda)^{-1}$ is compact (trace class, etc.) if and only if the natural embedding $E_{sr} : V_r \to V_s$ is compact (trace class, etc.).

Several variants of the theorem are available, for instance, in terms of quadratic forms; see for instance [14, Sec. 3.3.5]. Applications include the analysis (in momentum space) of Schrödinger (or Dirac) Hamiltonians in the presence of singular interactions [24] or solid state Hamiltonians [25] (see Sect. 6.3 below).

Another application of KLMN-type approaches is a generalization of the Gel’fand-Maurin spectral theorem in a PIP-space [29]. Actually this discussion needed a full analysis of the spectral properties of a self-adjoint operator, including the resolvent and its analyticity properties, generalized eigenvectors, and so on, going beyond the standard RHS result of Sect. 2. On the way, we had to define the inverse of a PIP-space operator, a missing piece of the puzzle.
6 Applications in Quantum Mechanics

6.1 General Formulation

As is well known, the natural framework of nonrelativistic quantum mechanics is the Schwartz triplet, $\mathcal{S} \subset L^2 \subset \mathcal{S}^\times$, suggested by the operators of position $q$ and momentum $p$. A familiar formulation is the so-called Hermite representation of tempered distributions introduced by B. Simon [10], which consists in a chain of Hilbert spaces. Actually this was anticipated and generalized by Alex already in 1965 [11]. He defines recursively the operators

$$M_0 = 1, \qquad M_1 = x^2 + p^2, \qquad \ldots, \qquad M_n = x M_{n-1} x + p M_{n-1} p.$$

Then he considers the numbers $(u, M_n u)$, $u \in \mathcal{S}$, and the Hilbert space $\mathcal{H}_n$ as the set of functions $u \in \mathcal{S}$ for which
$$\sum_n [(\alpha n)!]^{-2}\, (u, M_n u) < \infty, \qquad \alpha > 1/2.$$

Since the numbers $(u, M_n u)$ can be reexpressed in terms of Hermite functions, we obviously recover the usual formulation, but in fact one gets much more. Given a sequence $\{\beta\} = \{\beta_0, \beta_1, \ldots, \beta_n, \ldots\}$ of positive numbers, denote by $\mathcal{S}(\beta)$ the set of all $u \in \mathcal{S}$ such that

$$\sum_n \beta_n\, (u, M_n u) < \infty.$$
Then $\mathcal{S}(\beta)$ is a Hilbert space of test functions, specified by the sequence $\{\beta\}$, and its dual is a Hilbert space of distributions, possibly much larger than $\mathcal{S}^\times$. This construction manifestly anticipates PIP-spaces! An interesting characteristic of $\mathcal{S}(\beta)$ is that, for $\beta_n$ decreasing sufficiently fast as $n \to \infty$, elements $u \in \mathcal{S}(\beta)$ may be analytically continued to an entire function $u(x + iy) = u(z)$. This leads us to another domain of interest of Alex.
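The recursion $M_n = x M_{n-1} x + p M_{n-1} p$ and the numbers $(u, M_n u)$ can be explored numerically when $u$ is a Hermite function. The following sketch is only an illustration under stated assumptions: units with $\hbar = 1$, the standard ladder-operator matrices for $x$ and $p$ in the Hermite basis $h_0, h_1, \ldots$, and a truncation to `DIM` basis functions; all names are hypothetical. With these conventions $M_1 = x^2 + p^2$ is diagonal with entries $2k + 1$.

```python
import math

DIM = 12  # truncation size of the Hermite basis h_0, ..., h_{DIM-1}


def zeros():
    """DIM x DIM complex zero matrix as nested lists."""
    return [[0j] * DIM for _ in range(DIM)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(DIM)] for i in range(DIM)]


# Position and momentum in the Hermite basis (units with hbar = 1):
#   x = (a + a^dagger)/sqrt(2),  p = (a - a^dagger)/(i sqrt(2)),
# where a h_k = sqrt(k) h_{k-1}.
X, P = zeros(), zeros()
for k in range(1, DIM):
    s = math.sqrt(k / 2)
    X[k - 1][k] = X[k][k - 1] = s
    P[k - 1][k] = -1j * s   # <h_{k-1}| p |h_k>
    P[k][k - 1] = 1j * s    # <h_k| p |h_{k-1}>


def grossmann_operators(n_max):
    """Return [M_0, ..., M_{n_max}] with M_n = x M_{n-1} x + p M_{n-1} p."""
    identity = zeros()
    for k in range(DIM):
        identity[k][k] = 1 + 0j
    ms = [identity]
    for _ in range(n_max):
        prev = ms[-1]
        ms.append(add(matmul(matmul(X, prev), X), matmul(matmul(P, prev), P)))
    return ms


M = grossmann_operators(3)

# (h_k, M_n h_k) is the k-th diagonal entry of M_n; entries with k + n < DIM
# are not affected by the truncation.
diagonal = [[M[n][k][k].real for k in range(4)] for n in range(4)]
```

The rows of `diagonal` show how fast $(h_k, M_n h_k)$ grows with $n$ for fixed $k$, which is what the weights in the sums above have to compensate.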
6.2 Resonances, Analyticity Properties

In the skeleton, we had found many pieces of work concerning analytic functions in quantum mechanics. This is not surprising, since Alex had published, together with T. T. Wu [26], a pioneering paper on analyticity properties of Schrödinger scattering amplitudes. In the same vein, resonances may be defined as poles of the analytically continued amplitude. Actually, there are other possible definitions and it is not clear how they compare. In view of this cacophony, Ludwig Streit convened a meeting in Bielefeld in 1984 in order to clear up the subject. Alex was of course invited. When his time came, he went to the overhead projector, put an empty transparency on it, and started to think... to think... to think (he had visibly not prepared a talk!). This lasted for several minutes, and the audience was getting nervous, until finally Alex started to write something, of course original and interesting! You need some patience to work with him...

As mentioned in Sect. 4, PIP-spaces of entire functions with Bargmann’s space as central Hilbert space have been defined in [21], following suggestions from Alex. These are in fact closely related to some of the spaces $\mathcal{S}(\beta)$ described above. Interestingly, Bargmann’s space, which is at the core of the Bargmann-Segal (or Fock-Bargmann) representation of quantum mechanics, is also the source of many examples and counterexamples about PIP-spaces. In addition, since it consists of functions $u(x + iy) = u(z)$, it is a phase space representation, another subject omnipresent in Alex’s work. It also provides a link to frame theory, a key topic in Alex’s collaboration with Ingrid Daubechies [27]. Finally, it is well known that resonances are beautifully described in the theory of dilation-analytic operators, and here too Alex produced nice results [28]. A comprehensive study of PIP-spaces of entire functions may be found in [14, Sec. 4.6]. More recently, analyticity properties surfaced in PIP-spaces [29], as mentioned already in Sect. 5.3, when studying the spectrum of a self-adjoint operator.
6.3 Condensed Matter Physics

A beautiful application of NHS or PIP-spaces is the analysis of Schrödinger Hamiltonians in the presence of singular interactions, exemplified by the Kronig-Penney model [24], whose “potential” consists of a periodic array of $\delta$-functions in one dimension. Consider, in momentum space and arbitrary dimension $\nu$, the kinetic energy operator $T$, which is a multiplication operator $(T\varphi)(p) = t(p)\varphi(p)$ by a positive unbounded function $t(p)$. Then define the following family of Hilbert spaces

$$\mathcal{H}_r(\mathbb{R}^\nu) = \Big\{ \varphi : \int (t(p) + 1)^r\, |\varphi(p)|^2\, d^\nu p < \infty \Big\}.$$

This is again a scale of Hilbert spaces, thus a PIP-space, and one has

$$\mathcal{H}_2 \subset \mathcal{H}_1 \subset \mathcal{H}_0 = L^2 \subset \mathcal{H}_{-1} \subset \mathcal{H}_{-2}.$$

Next, consider exponentials, which correspond to $\delta$-functions in the $x$-representation. Then the key point is that each exponential $e^\nu_x \sim e^{i x \cdot p}$, $x, p \in \mathbb{R}^\nu$, in dimension $\nu$ belongs to some minimal space $\mathcal{H}_r$, where $r$ varies with $\nu$. For instance,

$$e^1_x \in \mathcal{H}_{-1}(\mathbb{R}),$$
$$e^2_x \notin \mathcal{H}_{-1}(\mathbb{R}^2), \quad e^2_x \in \mathcal{H}_{-2}(\mathbb{R}^2),$$
$$e^3_x \notin \mathcal{H}_{-1}(\mathbb{R}^3), \quad e^3_x \in \mathcal{H}_{-2}(\mathbb{R}^3),$$
$$e^\nu_x \notin \mathcal{H}_{-2}(\mathbb{R}^\nu), \quad \text{if } \nu \geq 4.$$

Next one considers the resolvent $R(E) = (T - E)^{-1}$ of $T$ and one observes that $R(E)$ is bounded with bounded inverse from $\mathcal{H}_r$ to $\mathcal{H}_{r+2}$, and similarly for $R^{1/2}(E) : \mathcal{H}_r \to \mathcal{H}_{r+1}$. Then it only remains to exploit the KLMN theorem of Sect. 5.3 to obtain by restriction self-adjoint operators in $\mathcal{H}_0$. Now, in order to obtain a decent Hamiltonian, the formal analogue of $T + |\Phi\rangle\langle\Phi|$, with $\Phi \in \mathcal{H}_{-1}$ (mildly singular perturbation) or $\Phi \in \mathcal{H}_{-2}$ (strongly singular perturbation), one defines its resolvent and develops the familiar resolvent formalism, exploiting the properties of every operator mapping $\mathcal{H}_r$ to $\mathcal{H}_s$. In particular, one obtains a full description of the spectra. The strength of the PIP-space method is that one can treat both mildly and strongly singular perturbations by the same formalism. A similar analysis can be applied to many different situations, such as rows or lattices of $\delta$-functions in 1 or 3 dimensions, derivatives of $\delta$-functions, or model molecules.

A similar analysis allows one to study (reduced) Bloch Hamiltonians of the type $H = T + V$, where $T$ is a fairly general kinetic energy and $V$ is a periodic crystal field [25]. “Reduced” means that one considers the Hamiltonian $H_k$ for fixed quasi-momentum $k$, which acts in the triplet of Hilbert spaces built on the powers of the kinetic energy $t(p)$, that is,

$$\mathcal{H}_1 \subset \mathcal{H}_0 \subset \mathcal{H}_{-1},$$

where $\mathcal{H}_0$ is the usual $\ell^2$ sequence space. The distinguishing feature of this triplet is that all embeddings (nestings) are compact. This implies that the reduced Hamiltonian has a purely discrete spectrum, which proves the existence of bands in the full spectrum.
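The statement earlier in this section that the exponential $e^{i x \cdot p}$ sits in a space of the scale whose index depends on the dimension $\nu$ can be checked by a short computation. The sketch below assumes the standard kinetic energy $t(p) = |p|^2$, which is an assumption (the text only requires $t$ to be positive and unbounded). Since $|e^{i x \cdot p}| = 1$, membership in $\mathcal{H}_{-r}(\mathbb{R}^\nu)$ reduces to the convergence of $\int (|p|^2 + 1)^{-r}\, d^\nu p$, which holds exactly when $2r > \nu$. The function names are hypothetical.

```python
import math


def exponential_norm_sq(nu, r):
    """Squared H_{-r}(R^nu) norm of p -> exp(i x.p), assuming t(p) = |p|^2.

    The integrand (|p|^2 + 1)^(-r) is radial, so the integral equals
    area(S^{nu-1}) * (1/2) * B(nu/2, r - nu/2) when 2r > nu,
    and it diverges otherwise.
    """
    if 2 * r <= nu:
        return math.inf
    sphere_area = 2 * math.pi ** (nu / 2) / math.gamma(nu / 2)
    beta = math.gamma(nu / 2) * math.gamma(r - nu / 2) / math.gamma(r)
    return sphere_area * 0.5 * beta


def minimal_index(nu, candidates=(1, 2)):
    """Smallest r among the candidates with exp(i x.p) in H_{-r}(R^nu), or None."""
    for r in candidates:
        if math.isfinite(exponential_norm_sq(nu, r)):
            return r
    return None


# Dimension -> smallest r in {1, 2} such that the exponential lies in H_{-r}.
table = {nu: minimal_index(nu) for nu in range(1, 6)}
```

Under this assumption the table gives $r = 1$ for $\nu = 1$, $r = 2$ for $\nu = 2, 3$, and no admissible $r \leq 2$ for $\nu \geq 4$, which is the distinction between mildly and strongly singular perturbations in low dimensions.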
7 The Legacy: Operator Partial Algebras

As I said before, my active collaboration with Alex on PIP-spaces lasted until a new wind started to blow, soon to become a storm that engulfed the Centre de Physique Théorique of Marseille, namely wavelets. The story started around 1980 with a collaboration between Alex and Jean Morlet [30], soon followed by many others. Alex got excited, started to play on a computer and, at least for a while, quit PIP-spaces. However, before jumping on the wavelet train myself (see Sect. 8 below), I continued, having dragged into the subject two PhD students, Françoise Mathot and Juma Shabani, but Alex did not follow. As he said himself, his only contribution to the subject was the title “POp*-algebras” (pop stars)!

Françoise was actually considering the partial algebra of operators on a NHS $V_I$ [31], with the idea of finding genuine algebras inside, including a von Neumann algebra. Her result is that $\mathrm{Op}(V_I)$ contains three *-algebras $\mathfrak{C} \subseteq \mathfrak{B} \subseteq \mathfrak{A}$. Among these, $\mathfrak{A}$ is a *-algebra, $\mathfrak{B}$ is a Banach algebra and a C*-algebra under certain conditions, and $\mathfrak{C}$ is a von Neumann algebra, thus generated by its orthogonal projections, which are therefore sufficiently numerous. Next she studied the spectral properties of operators contained in each of the three algebras [32]. In both papers, she actually interacted significantly with Alex.

As for Shabani, he studied various types of commutants and bicommutants of a subset $\mathfrak{R}$ of operators on a PIP-space $V_I$ [33, 34]. This then paved the way for the analysis of a general partial *-algebra of operators, the subject matter of our later textbook [15]. Actually this approach allowed him to obtain a nice definition of an unsmeared quantum field (field at a point) [35], parallel to that given by Alex [17].
8 Towards 2D Wavelets
Another decisive interaction with Alex took place in Spring 1987. I had invited him to UCL, Louvain-la-Neuve, where he gave a course on phase space quantum mechanics. He also gave a talk on 1D wavelets, in particular to advertise them to our chemist colleagues, who were struggling with signal processing. We had already started to collaborate on applications of wavelets in NMR spectroscopy, and indeed I had become a sort of permanent visitor at the CPT, where the action was. Later this activity was greatly expanded with the help of EU Marie Curie Fellowships (see, for instance, [36, 37]).
The story starts in the coffee room of the Institut de Physique Théorique of UCL, where the two of us were discussing a possible PhD topic for a young African student, Romain Murenzi. He had just completed a master’s thesis on five-dimensional quantum field theory, a subject hardly practical for a developing country! So the idea came up: why not try to do in two dimensions what had been so successful in 1D, namely wavelet analysis? The topic seemed tractable, involving moderate amounts of mathematics and some simple computing technology, and if it worked out, there could be very interesting practical applications. The problem was that nobody knew how to do it! The next summer, Romain went down to Marseille and started to work with Alex and Ingrid Daubechies, who happened to be there too. When Romain came back three months later, the solution was clear. The key is to start from the operations that one wants to apply to an image, namely translations in the image plane, rotations for choosing a direction of sight, and global magnification (zooming in and out). The problem is to combine these three elements in such a way that the wavelet machine can start rolling (there are mathematical conditions to satisfy here). Romain’s result was that the so-called similitude group yields a solution (actually, the only one).
It remained to put it all together, to turn the mathematical crank and to apply the resulting formalism to a real problem, namely 2D fractals, and the PhD thesis was within reach [38]. In addition, Romain visited Alain Arnéodo and his group in Bordeaux, where 2D fractals were the daily staple, and taught them how to use 2D wavelets. Many papers followed, and more MSc and PhD students got involved over the years, thus continuing the interaction between Louvain-la-Neuve and Marseille. The key point is that this solution is universal: in order to design a continuous wavelet transform (CWT) on some manifold, one has to identify the operations one wants to apply to a signal that lives on it. Then, if these operations constitute a group with the required properties, the construction is straightforward. This method has been applied successfully to the CWT in 3D, on the 2-sphere, on a hyperboloid, and on space-time. See [39] for a detailed analysis.
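The three operations named above (translation, rotation, dilation) can be illustrated with a minimal numerical sketch. This is not the construction of [38] or [39] itself: the mother wavelet, the function names and the grid are all assumptions chosen for illustration. The sketch only shows how a wavelet is moved, rotated and scaled before being compared with an image.

```python
import numpy as np

def mother_wavelet(x, y):
    """Hypothetical mother wavelet (an assumption, not taken from the text):
    the x-derivative of a Gaussian stretched along y. It is odd in x, so it has zero mean."""
    return -x * np.exp(-0.5 * (x**2 + (y / 2.0)**2))

def transformed_wavelet(x, y, b, a, theta):
    """Wavelet translated by b, rotated by theta and dilated by a:
    psi_{b,a,theta}(r) = a^{-1} psi(a^{-1} R(-theta)(r - b)).
    The factor 1/a keeps the L2 norm unchanged in two dimensions."""
    xs, ys = x - b[0], y - b[1]
    c, s = np.cos(theta), np.sin(theta)
    xr = (c * xs + s * ys) / a    # rotate by -theta, then rescale
    yr = (-s * xs + c * ys) / a
    return mother_wavelet(xr, yr) / a

def cwt_coefficient(image, x, y, b, a, theta):
    """Inner product of the transformed wavelet with the image (Riemann sum on the grid)."""
    dx = x[0, 1] - x[0, 0]
    dy = y[1, 0] - y[0, 0]
    return np.sum(np.conj(transformed_wavelet(x, y, b, a, theta)) * image) * dx * dy

# Usage: an "image" that is itself a moved, rotated and dilated wavelet.
grid = np.linspace(-20.0, 20.0, 401)
X, Y = np.meshgrid(grid, grid)
image = transformed_wavelet(X, Y, (1.0, -2.0), 1.5, 0.6)
print(cwt_coefficient(image, X, Y, (1.0, -2.0), 1.5, 0.6))          # matched parameters
print(cwt_coefficient(image, X, Y, (1.0, -2.0), 1.5, 0.6 + np.pi / 2))  # wrong orientation
```

By the Cauchy-Schwarz inequality the coefficient is largest in modulus when the translation, scale and angle match those of the feature in the image, which is why the transform can detect position, size and orientation at once.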
9 Epilogue
As we have seen, the influence of Alex’s ideas has spread in many directions. Here I am limiting myself to PIP-spaces and their descendants, on the one hand, and to wavelets, on the other hand. After the original papers [4–6], the structure surfaced again around 2007, in a serendipitous way. I had several times heard Hans Feichtinger (Vienna) claim that a single $L^p$ space (on [0,1] or on $\mathbb{R}$) has no intrinsic meaning: one has to consider the whole family $L^p$, $1 \le p \le \infty$, and operators acting globally on it, like translations or the Fourier transform. Then I realized that PIP-spaces provided precisely such a formalism, so that it might be time to write a comprehensive account of it. Camillo Trapani (Palermo) agreed and was interested in such an enterprise, since he had worked extensively on RHS, distributions and related matters. So we started, and the result was our book [14].
I already mentioned the analysis of spectral properties of self-adjoint operators, with implications for the formulation of quantum mechanics. Another development pertains to signal processing, in connection with frames, semi-frames and reproducing pairs. In all cases, the problem is to represent a signal in terms of some measurable functions on a manifold. There too, PIP-spaces provide valuable insight and far-reaching generalizations [40, 41]. Yet another application has to do with metric operators, a favorite tool in the so-called $\mathcal{PT}$-symmetric quantum mechanics, based on non-self-adjoint Hamiltonians. Indeed, a metric operator generates a whole lattice of Hilbert spaces, so that PIP-spaces are the obvious formalism to use. See the papers [42–44] for a review of this development.
. . . And the adventure continues. . . . Thank you, Alex!
References
1. Antoine, J-P.: Dirac formalism and symmetry problems in Quantum Mechanics. I. General Dirac formalism. J. Math. Phys. 10, 53–69 (1969)
2. Antoine, J-P.: Dirac formalism and symmetry problems in Quantum Mechanics. II. Symmetry problems. J. Math. Phys. 10, 2276–2290 (1969)
3. Antoine, J-P., Grossmann, A.: Partial inner product spaces. I. Definitions and general properties. Rapport CNRS Marseille 73/P.563
4. Antoine, J-P., Grossmann, A.: Partial inner product spaces. I. General properties. J. Functional Analysis 23, 369–378 (1976)
5. Antoine, J-P., Grossmann, A.: Partial inner product spaces. II. Operators. J. Functional Analysis 23, 379–391 (1976)
6. Antoine, J-P., Grossmann, A.: Orthocomplemented subspaces of nondegenerate partial inner product spaces. J. Math. Phys. 19, 329–335 (1978)
7. Antoine, J-P.: Partial inner product spaces III. Compatibility relations revisited. J. Math. Phys. 21, 268–279 (1980); Err. ibid. 22, 1137 (1981)
8. Antoine, J-P.: Partial inner product spaces IV. Topological considerations. J. Math. Phys. 21, 2067–2079 (1980)
9. Gel’fand, I.M., Shilov, G.E.: Generalized Functions, Vols. I–III. Academic Press, New York and London (1964–1968); Gel’fand, I.M., Vilenkin, N.Ya.: Generalized Functions, Vol. IV. Academic Press, New York and London (1964)
10. Simon, B.: Distributions and their Hermite expansions. J. Math. Phys. 12, 140–148 (1971)
11. Grossmann, A.: Hilbert spaces of type S. J. Math. Phys. 6, 54–67 (1965)
12. Grossmann, A.: Elementary properties of nested Hilbert spaces. Commun. Math. Phys. 2, 1–30 (1965)
13. Grossmann, A.: Homomorphisms and direct sums of nested Hilbert spaces. Commun. Math. Phys. 4, 190–202 (1967)
14. Antoine, J-P., Trapani, C.: Partial Inner Product Spaces: Theory and Applications. Lecture Notes in Mathematics, vol. 1986. Springer, Berlin, Heidelberg (2009)
15. Antoine, J-P., Inoue, A., Trapani, C.: Partial *-Algebras and Their Operator Realizations. Kluwer, Dordrecht (2002)
16. Grossmann, A.: Nested Hilbert spaces in quantum mechanics. I. J. Math. Phys. 5, 1025–1037 (1964)
17. Grossmann, A.: Fields at a point. Commun. Math. Phys. 4, 203–216 (1967)
18. Grossmann, A.: Lectures on nested Hilbert spaces. In: Ramakrishnan, A. (ed.) Matscience Symposia on Theoretical Physics, vol. 4. Plenum Press, New York (1968)
19. Grossmann, A.: Dirac spaces. 1ère Rencontre entre Physiciens et Mathématiciens. Lyon (1969)
20. Köthe, G.: Topological Vector Spaces I. Springer, Berlin (1966)
21. Antoine, J-P., Vause, M.: Partial inner product spaces of entire functions. Ann. Inst. Henri Poincaré 35, 195–224 (1981)
22. Antoine, J-P., Lambert, D., Trapani, C.: Partial inner product spaces: Some categorical aspects. Adv. in Math. Phys. Vol. 2011, 957592 (2011)
23. Nelson, E.: Interaction of nonrelativistic particles with a quantized scalar field. J. Funct. Anal. 11, 211–219 (1972)
24. Grossmann, A., Hoegh-Krohn, R., Mebkhout, M.: A class of explicitly soluble, local, many-center Hamiltonians for one-particle quantum mechanics in two and three dimensions. I. J. Math. Phys. 21, 2376–2385 (1980)
25. Avron, J., Grossmann, A., Rodriguez, R., Zak, J.: Spectral properties of reduced Bloch Hamiltonians. Ann. Phys. (N.Y.) 103, 47–63 (1977)
26. Grossmann, A., Wu, T.T.: Schrödinger scattering amplitudes III. J. Math. Phys. 3, 684–689 (1962)
27. Daubechies, I., Grossmann, A.: Frames in the Bargmann space of entire functions. Commun. Pure Appl. Math. 41, 151–164 (1986)
28. Balslev, E., Grossmann, A., Paul, T.: A characterization of dilation-analytic operators. Ann. Inst. Henri Poincaré 45, 277–292 (1977)
29. Antoine, J-P., Trapani, C.: Operators on partial inner product spaces: Towards a spectral analysis. Mediterranean J. Math. 13, 323–351 (2016)
30. Grossmann, A., Morlet, J.: Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 15, 723–736 (1984)
31. Debacker-Mathot, Fr.: Some operator algebras in nested Hilbert spaces. Commun. Math. Phys. 42, 183–193 (1975)
32. Debacker-Mathot, Fr.: Spectral properties in a class of operators and group representations in nested Hilbert spaces. Rep. Math. Phys. 11, 361–375 (1977)
33. Shabani, J.: Commutants of a family of operators on a partial inner product space. J. Math. Phys. 25, 3204–3208 (1984)
34. Shabani, J.: Some unbounded commutants of a set of operators on a partial inner product space. J. Math. Phys. 29, 2405–2410 (1988)
35. Shabani, J.: Quantized fields and operators on a partial inner product space. Ann. Inst. H. Poincaré 48, 97–104 (1988)
36. Suvichakorn, A., Ratiney, H., Cavassila, S., Antoine, J-P.: Wavelet-based techniques in MRS. Chapter 9 in Miron, S. (ed.) Signal Processing, pp. 167–196. IN-TECH, Vienna, Austria, and Rijeka, Croatia (2010)
37. Lemke, C., Schuck Jr., A., Antoine, J-P., Sima, D.: Metabolite-sensitive analysis of magnetic resonance spectroscopic signals using the continuous wavelet transform. Meas. Sci. Technol. 22, 114013 (2011)
38. Murenzi, R.: Ondelettes multidimensionnelles et applications à l’analyse d’images. Thèse de Doctorat, Univ. Cath. Louvain, Louvain-la-Neuve (1990)
39. Antoine, J-P., Murenzi, R., Vandergheynst, P., Ali, S.T.: Two-Dimensional Wavelets and their Relatives. Cambridge University Press, Cambridge (UK) (2004); paperback edition (2008)
40. Antoine, J-P., Trapani, C.: Reproducing pairs of measurable functions and partial inner product spaces. Adv. Operator Theory 2, 126–146 (2017)
41. Antoine, J-P., Trapani, C.: PIP-space valued reproducing pairs of measurable functions. Axioms 8, 52–73 (2019)
42. Antoine, J-P., Trapani, C.: Metric operators, generalized hermiticity, and lattices of Hilbert spaces. Chap. 7 in Bagarello, F., Gazeau, J-P., Szafraniec, F.H., Znojil, M. (eds) Non-Selfadjoint Operators in Quantum Physics: Mathematical Aspects, pp. 345–402. J. Wiley, Hoboken, NJ (2015)
43. Antoine, J-P., Trapani, C.: Metric operators, generalized hermiticity and partial inner product spaces. In: Diagana, T., Toni, B. (eds) Mathematical Structures and Applications, pp. 1–20. Springer, New York (2018)
44. Antoine, J-P., Trapani, C.: Beyond frames: Semi-frames and reproducing pairs. In: Diagana, T., Toni, B. (eds) Mathematical Structures and Applications, pp. 21–59. Springer, New York (2018)
Combining Quantum Mechanical Languages (A Tribute to Alex Grossmann) Joshua Zak
Abstract
Alex had command of about ten languages, including Latin and Greek. He often liked to use his knowledge of languages for telling identity stories. Among them was a story about a friend of his from Yugoslavia who came to visit him when he was a graduate student at Harvard University. One morning Alex left his friend in the apartment and went down to a nearby grocery to buy some things for breakfast. While Alex was away, there was a telephone call asking for him. His friend answered the phone and said that Alex passed away. To which the person on the phone asked, “And who are you?” The answer was “I’m a ghost.”
Here is a short autobiography of Alex. He wrote it in November 2015, when he was nominated for the Wigner medal.
I was born in Zagreb, Yugoslavia, on August 5th, 1930. We survived WW2 by moving and hiding but my father was taken to Auschwitz. He was liberated by the Russians in spring 45 and died in 47. Starting in 1945, we were back in Zagreb. I caught TB and spent the next seven or eight years in sanatoria but managed to spend some time at the university, studying mathematics/physics. I missed the only course on quantum mechanics. Nevertheless, I was hired by an atomic energy institute and sent to a summer school in France in 1954. This brought an invitation to the USA, where I spent ten years. I worked in mathematical physics until the mid-eighties, when I met Jean Morlet and helped start the wavelet story.
J. Zak, Department of Physics, Technion – Israel Institute of Technology, Haifa, Israel
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_7

1 Introduction
This tribute to Alex is about quantum mechanical representations and about combining them. Dirac [1], in his book “The Principles of Quantum Mechanics”, devotes a whole chapter to representations in quantum mechanics. In honor of Alex’s knowledge of languages, we call the representations quantum mechanical languages. Like a representation, a language in quantum mechanics is a set of numbers which can be discrete, or in a continuous domain, or both. These numbers serve to connect abstract vectors and operators with the real world. For defining a language in quantum mechanics one uses a complete set of commuting operators, their eigenfunctions, and eigenvalues. Of particular interest to us will be the variables on which the wave functions depend and the space of their variation. There are many languages in quantum mechanics, just as there are many languages in the world. In quantum mechanics the most often used representations are the coordinate representation (x-representation) and the momentum representation (p-representation). Which language to choose in quantum mechanics depends on the problem one deals with. Often the symmetry of the problem dictates the choice of the language. An example is the use of spherical coordinates when dealing with central forces. Such a choice leads to the separation of coordinates and to a significant simplification of the solution of the problem.
The next step is combining quantum languages. In what follows we will combine two languages, the x-representation and the kq-representation. Alex’s story above is an example of combining English and Russian for an identity effect. In this paper we are going to discuss a problem in condensed matter physics which has to do with the identity of the known Bloch and Wannier functions. Numerous papers have been written on these functions. The Bloch functions were introduced in 1928 by F. Bloch [2]. They are eigenfunctions of the Hamiltonian and of the finite translations in crystalline solids, and as such they are labeled by the Bloch momentum k (also known as quasi-momentum). The Bloch functions actually started the quantum theory of condensed matter physics.
Being eigenfunctions of finite translations, the Bloch functions are extended functions. Nine years later, in 1937, G. Wannier [3] introduced his functions by integrating the Bloch functions over k in the Brillouin zone. As such, the Wannier functions are localized, e.g., atomic-like. Since their introduction, the notion of Wannier functions has played an important role in the conceptual development of the electron theory of solids. In more recent years they have also been playing an important role in computations, and one even questions which of the two, the Bloch or the Wannier functions, is more dominant [4, 5]. Since, as was mentioned before, the Wannier functions are obtained by integrating the Bloch functions over k in the Brillouin zone, the Bloch functions are discrete Fourier transforms of the Wannier functions. This is of crucial importance for the proof of the result that the Bloch and the Wannier functions can be chosen to be identical. This looks impossible, because the Bloch functions are extended while the Wannier functions are localized. Nevertheless, one can show that the Wannier function in the kq-representation and the Bloch function in the x-representation are identical. A hint of this wonderful result was given in some of the author’s previous publications, e.g., [6]. A full and precise formulation of this result is given in what follows. Another feature that interconnects the two functions is the freedom of choice of the k-dependent phase of the Bloch functions. This freedom affects the localization of the Wannier functions [7] since, as was pointed out above, they are defined via integration of the Bloch functions over the quasi-momentum k. In this paper we will show that in the framework of our results, when the Bloch function is multiplied by a k-dependent phase, the Wannier functions are multiplied by the same phase!
Two quantum mechanical languages will be used in this manuscript, the x-language and the kq-language. We start by reminding the reader of the kq-representation. The x-representation (or the x-language) needs no special introduction; it has been known from the very beginning of quantum mechanics. The kq-language is relatively new; it was introduced by the author in 1967 [8]. A good and detailed description of the kq-representation is given in a paper by Janssen [9].
2 The kq-Representation
The canonical approach of Classical Mechanics uses the coordinate x and the momentum p together in a single framework. The uncertainty principle in Quantum Mechanics excludes such an approach, and one commonly uses either the coordinate description (the x-representation) or the momentum description (the p-representation). Which representation to use is dictated by the nature of the problem; as is well known, for an electron in a homogeneous electric field it is more convenient to use the p-representation [10]. However, the uncertainty principle does not forbid using partially the x-representation and partially the p-representation. There is a very wide class of problems in the elementary dynamics of electrons in crystalline solids where the symmetry of the problem calls for such a description. Bearing in mind the important role that the kq-representation has in this paper, we repeat here a short description of it [8]. For a one-dimensional crystal the potential $V(x)$, being a periodic function of x, can be written as a Fourier series
\[ V(x) = \sum_n V_n \exp\left(i\frac{2\pi}{a}xn\right), \tag{1} \]
where a is the period. The potential $V(x)$ depends on x only via $\exp\left(i\frac{2\pi}{a}x\right)$ and its powers. For any given x, $V(x)$ assumes a definite value; however, when $V(x)$ is given, x is at best defined only modulo a. This means that the function
\[ \tau\left(\frac{2\pi}{a}\right) = \exp\left(i\frac{2\pi}{a}x\right), \tag{2} \]
or $V(x)$, does not carry the full information about x, and the uncertainty principle does not exclude the possibility of also using partial information about p. From the point of view of the physics of the problem, it is well known that for an electron in a periodic potential a translation by a is a conserved quantity. Such a translation by a is given by
\[ T(a) = \exp\left(\frac{i}{\hbar}pa\right), \tag{3} \]
where $p = -i\hbar\,\partial/\partial x$ is the momentum operator. As in the case of $\exp\left(i\frac{2\pi}{a}x\right)$, the function of p, $\exp\left(\frac{i}{\hbar}pa\right)$ in Eq. (3), carries partial information about p, and it determines p only modulo $\hbar\frac{2\pi}{a}$. Formally, it is easy to check that the two functions in Eqs. (2) and (3) commute:
\[ \left[\exp\left(i\frac{2\pi}{a}x\right),\ \exp\left(\frac{i}{\hbar}pa\right)\right] = 0. \tag{4} \]
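The commutation in Eq. (4) can be made explicit with a short calculation. The sketch below uses only $p = -i\hbar\,\partial/\partial x$; the symbols $\alpha$, $\beta$ and the test function $f$ are introduced here for illustration and do not appear in the original text.

```latex
% exp(i beta p) acts as a shift: e^{i\beta p} f(x) = f(x + \hbar\beta)
\begin{aligned}
e^{i\beta p}\, e^{i\alpha x} f(x)
  &= e^{i\alpha (x + \hbar\beta)}\, f(x + \hbar\beta)
   = e^{i\hbar\alpha\beta}\; e^{i\alpha x}\, e^{i\beta p} f(x), \\
\alpha = \frac{2\pi}{a},\quad \beta = \frac{a}{\hbar}
  &\;\Longrightarrow\; e^{i\hbar\alpha\beta} = e^{2\pi i} = 1 .
\end{aligned}
```

For generic $\alpha$ and $\beta$ the two exponentials commute only up to the phase $e^{i\hbar\alpha\beta}$; the particular pair in Eqs. (2) and (3) is the case in which this phase equals one.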
This means that the partial information about x and p carried by the operators in Eqs. (2) and (3) does not conflict with the uncertainty principle. The question is then whether this is the maximal information that can be simultaneously assigned to x and p. In order to answer this question in the affirmative, one has to show that any function $f(x, p)$ of x and p that commutes with the two operators in Eq. (4) is necessarily a function of them. This follows from the fact that any function $f(x, p)$ can be expanded in a Fourier integral over the operators $\exp(i\alpha x)$ and $\exp(i\beta p)$, where $\alpha$ and $\beta$ are continuous variables extending from $-\infty$ to $\infty$. It can then be seen that for this integral to commute with both operators in Eq. (4), it has to turn into a sum containing the operators in Eq. (4) and their different powers. This shows that the two operators in Eq. (4) form a complete system of commuting operators and can therefore be used for defining a quantum mechanical representation. Being a complete set of commuting operators, $T(a)$ and $\tau\left(\frac{2\pi}{a}\right)$ (Eqs. (2) and (3)) have common eigenfunctions, which can be chosen to be
\[ \psi_{kq}(x) = \left(\frac{a}{2\pi}\right)^{1/2} \sum_n \exp(ikan)\,\delta(x - q - na) \equiv \langle x \,|\, kq \rangle, \tag{5} \]
where $\langle x \,|\, kq \rangle$ is the Dirac notation of $\psi_{kq}(x)$. Here k and q denote the eigenvalues of the operators $T(a)$ and $\tau\left(\frac{2\pi}{a}\right)$, respectively, because, as can be checked, one has
\[ T(a)\,\psi_{kq}(x) = \exp(ika)\,\psi_{kq}(x), \qquad \tau\left(\frac{2\pi}{a}\right)\psi_{kq}(x) = \exp\left(i\frac{2\pi}{a}q\right)\psi_{kq}(x). \tag{6} \]
From Eq. (6) one sees that the different eigenvalues of $T(a)$ are given by those k’s that vary from 0 to $\frac{2\pi}{a}$. Similarly, the different eigenvalues of $\tau\left(\frac{2\pi}{a}\right)$ are given by the q’s that vary in the interval from 0 to a. Correspondingly, k and q are called quasi-momentum and quasi-coordinate because they determine the momentum modulo $\hbar\frac{2\pi}{a}$ and the coordinate modulo a. They vary in the following rectangle
\[ 0 \le k \le \frac{2\pi}{a}, \qquad 0 \le q \le a, \tag{7} \]
which defines the kq-space [11]. Being eigenfunctions of a complete commuting set of operators $T(a)$ and $\tau\left(\frac{2\pi}{a}\right)$, the functions $\psi_{kq}(x)$ form a complete set of functions, and any function $\psi(x)$ can therefore be expanded in them with the expansion coefficients denoted by $C(k, q)$:
\[ \psi(x) = \int_0^{2\pi/a}\!\!\int_0^{a} C(k, q)\,\psi_{kq}(x)\,dk\,dq = \left(\frac{a}{2\pi}\right)^{1/2} \int_0^{2\pi/a} C(k, x)\,dk. \tag{8} \]
The first part of Eq. (8) defines the wave function $C(k, q)$ in the kq-coordinates, while the second part is the relation between the wave functions $\psi(x)$ and $C(k, q)$. In order to obtain the inverse relation, we expand $C(k, q)$ in the set $\langle kq \,|\, x \rangle$ (the complex conjugate of $\langle x \,|\, kq \rangle$ in Eq. (5)), with the expansion coefficients being $\psi(x)$:
\[ C(k, q) = \int \psi(x)\,\langle kq \,|\, x \rangle\,dx = \left(\frac{a}{2\pi}\right)^{1/2} \sum_n \exp(ikan)\,\psi(q - na). \tag{9} \]
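The pair of formulas (8) and (9) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the truncation `n_max`, the number of k-points `n_k` and the Gaussian test function are all assumptions made for illustration. It evaluates the sum in Eq. (9) for a localized $\psi(x)$ and recovers $\psi(x)$ through the k-integral of Eq. (8).

```python
import numpy as np

def kq_transform(psi, k, q, a=1.0, n_max=50):
    """Eq. (9): C(k,q) = sqrt(a/2pi) * sum_n exp(i k a n) psi(q - n a),
    with the lattice sum truncated to |n| <= n_max (adequate for localized psi)."""
    n = np.arange(-n_max, n_max + 1)
    return np.sqrt(a / (2 * np.pi)) * np.sum(np.exp(1j * k * a * n) * psi(q - n * a))

def inverse_kq_transform(psi, x, a=1.0, n_k=64, n_max=50):
    """Eq. (8): psi(x) = sqrt(a/2pi) * integral over 0 <= k < 2pi/a of C(k,x),
    approximated by a uniform Riemann sum with n_k points."""
    dk = (2 * np.pi / a) / n_k
    ks = np.arange(n_k) * dk
    values = np.array([kq_transform(psi, k, x, a, n_max) for k in ks])
    return np.sqrt(a / (2 * np.pi)) * np.sum(values) * dk

def gaussian(x, lam=0.7):
    """Normalized Gaussian test function; the width lam is an arbitrary choice."""
    return (lam * np.sqrt(np.pi)) ** -0.5 * np.exp(-x**2 / (2 * lam**2))

# Usage: transform at one point of the kq-cell, then recover psi at two positions.
k, q, a = 1.3, 0.37, 1.0
print(kq_transform(gaussian, k, q, a))
print(inverse_kq_transform(gaussian, 0.37, a), gaussian(0.37))
print(inverse_kq_transform(gaussian, 2.4, a), gaussian(2.4))
```

The same functions can be used to confirm that $C(k, q)$ is periodic in k with period $2\pi/a$ and picks up the phase $\exp(ika)$ when q is shifted by a, which is what allows Eq. (8) to be evaluated at positions x outside the interval from 0 to a.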
Equation (9) defines the kq-transform, $C(k, q)$, while Eq. (8) is the inverse kq-transform. To have a full representation in Quantum Mechanics we also need to know how to write the fundamental operators x and p in the kq-coordinates. As an example, let us explicitly find the x-operator in the kq-coordinates (Eq. (5) is used):
\[ \langle kq \,|\, x \,|\, k'q' \rangle = \int\!\!\int \langle kq \,|\, x' \rangle \langle x' \,|\, x \,|\, x'' \rangle \langle x'' \,|\, k'q' \rangle\,dx'\,dx'' = \left(i\frac{\partial}{\partial k} + q\right) \langle kq \,|\, k'q' \rangle, \tag{10} \]
where
\[ \langle kq \,|\, k'q' \rangle = \sum_m \exp(ikam)\,\delta(q - q' - ma) \sum_n \delta\left(k - k' - \frac{2\pi}{a}n\right). \tag{11} \]
From Eq. (9) it is seen that $C(k, q)$ satisfies the following periodicity conditions:
\[ C(k, q) = C\left(k + \frac{2\pi}{a},\, q\right) = \exp(-ika)\,C(k, q + a). \tag{12} \]
With these conditions at hand we can simplify the expression for the x-operator in the kq-coordinates, in view of the fact that
\[ \int \langle kq \,|\, k'q' \rangle\, C(k', q')\,dk'\,dq' = C(k, q). \tag{13} \]
Therefore x acts on a kq-function in the following way:
\[ x = i\frac{\partial}{\partial k} + q. \tag{14} \]
Similarly, one shows that
\[ p = -i\hbar\frac{\partial}{\partial q}. \tag{15} \]
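As a quick consistency check, Eqs. (14) and (15) can also be read off directly from the kq-transform in Eq. (9). In the sketch below, $(x\psi)$ and $(p\psi)$ are shorthand (introduced here, not in the original text) for the functions $x\,\psi(x)$ and $-i\hbar\,\psi'(x)$, and the sums are assumed to converge well enough to be differentiated term by term.

```latex
\begin{aligned}
\left(\frac{a}{2\pi}\right)^{1/2} \sum_n e^{ikan}\,(q - na)\,\psi(q - na)
  &= \left(q + i\frac{\partial}{\partial k}\right) C(k, q),
  &&\text{since } i\frac{\partial}{\partial k}\, e^{ikan} = -na\, e^{ikan}, \\
\left(\frac{a}{2\pi}\right)^{1/2} \sum_n e^{ikan}\,(-i\hbar)\,\psi'(q - na)
  &= -i\hbar\,\frac{\partial}{\partial q}\, C(k, q).
\end{aligned}
```

The left-hand sides are the kq-transforms of $(x\psi)$ and $(p\psi)$; the right-hand sides show the corresponding operators acting on $C(k, q)$.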
We see that the operators x and p satisfy the basic commutation relation $[x, p] = i\hbar$, as they should. One should, however, pay attention to their non-symmetric form, with x containing two terms. This follows from the particular choice of phase in the eigenfunctions (Eq. (5)). As a consequence of this same choice of phase, the boundary conditions in Eq. (12) on the wave function $C(k, q)$ are also non-symmetric in k and q. The operators in Eqs. (2) and (3), their eigenfunctions in Eq. (5), the transformation for the wave function $C(k, q)$ (Formulas (8) and (9)), and the expressions in Eqs. (14), (15) for the basic operators complete the construction of the kq-representation in one dimension. Formulas for the generalization to three dimensions are straightforward. Having the coordinate and momentum operators written by means of k and q (Eqs. (14) and (15)) and having the connections between the wave functions $C(k, q)$ and $\psi(x)$ (Eqs. (8) and (9)), one can construct quantum mechanics in the kq-representation. Although the latter was developed by using the concept of a periodic potential, it is not necessarily restricted to solids. Thus, the lattice constant a does not appear in expressions (14), (15) for the basic operators. It appears, however, in the definition of the variation range for the coordinates k and q (Eq. (7)), and in the boundary conditions on the wave function (Eq. (12)), where it can be considered as an arbitrary constant. With these remarks in mind, the kq-representation is as valid as any representation in quantum mechanics, e.g., the x- or p-representations. When the kq-representation appeared in press, Physics Today took notice of it [12].
The main feature of the kq-representation is the simultaneous use of partial information about both the coordinate and the momentum. The quasi-momentum k and the quasi-coordinate q define the eigenvalues of the operators $T(a)$ and $\tau\left(\frac{2\pi}{a}\right)$, respectively (see Eqs. (2) and (3)), and by knowing k and q one knows the momentum p modulo $\hbar\frac{2\pi}{a}$ and the coordinate x modulo a. The information about $\hbar k$ and q is concentrated in a unit cell in phase space (in solids this is a unit cell in the inverse lattice plus a unit cell in the Bravais lattice). We call this double cell the von Neumann unit cell. Von Neumann was the first to introduce the concept of a lattice in phase space in connection with coherent states [13]. From the definition of the kq-representation it is clear that it is not restricted to x and p only. Thus, for any two operators $A$ and $B$ that satisfy the commutation relation $[A, B] = i$ one can always choose an arbitrary constant a and define the corresponding exponential operators (Eqs. (2) and (3)). The wave function $C(k, q)$ in the kq-representation has the conventional quantum mechanical meaning. Thus, $|C(k, q)|^2\,dk\,dq$ gives the probability of measuring k and q in $dk\,dq$ around k and q in the von Neumann unit cell. The expectation value $\langle A \rangle$ of an operator $A(x, p)$ depending, in general, on the position and momentum operators is defined as follows:
\[ \langle A \rangle = \int_0^{2\pi/a}\!\!\int_0^{a} C^*(k, q)\, A\!\left(i\frac{\partial}{\partial k} + q,\ -i\hbar\frac{\partial}{\partial q}\right) C(k, q)\,dk\,dq, \tag{16} \]
where $A(x, p) = A\!\left(i\frac{\partial}{\partial k} + q,\ -i\hbar\frac{\partial}{\partial q}\right)$ (see Eqs. (14) and (15)) and where the integration is over a unit cell of the von Neumann lattice. Similarly, one defines the matrix elements of an operator $A$:
\[ A_{nm} = \int_0^{2\pi/a}\!\!\int_0^{a} C_n^*(k, q)\, A\, C_m(k, q)\,dk\,dq. \tag{17} \]
And finally, the time-dependent Schrödinger equation assumes the following form in the kq-representation:
\[ i\hbar\,\frac{\partial C(k, q, t)}{\partial t} = H\,C(k, q, t), \tag{18} \]
where H should be written by using Eqs. (14) and (15) for the basic operators x and p. The kq-coordinates are of a special nature in that they do not have a classical analog. This can best be seen from the fact that $\tau\left(\frac{2\pi}{a}\right)$ and $T(a)$ (Eqs. (2) and (3)) commute in quantum mechanics, but give a non-vanishing classical Poisson bracket. This is also the reason why in classical mechanics there is no conservation law for the quasi-momentum k. The kq-coordinates can rightfully be called the natural symmetric coordinates for problems with periodic potentials. For such potentials in one dimension (see Eq. (1)) the Hamiltonian in the kq-representation is
\[ H = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial q^2} + V(q), \tag{19} \]
where we used the expressions (14) and (15) for x and p. This form of H shows that k and q are symmetric coordinates for periodic potentials in the same sense as spherical coordinates are symmetric for central force problems. In the latter case the potential $V(r)$ depends only on r, the absolute value of the radius vector $\mathbf{r}$. It is convenient to choose the spherical coordinates $r, \vartheta, \varphi$ in solving Schrödinger’s equation for the atom. The reason for this is that the potential depends only on r, and it is therefore possible to separate the coordinates and look for the wave function $\psi$ in a product form $\psi(r, \vartheta, \varphi) = R(r)\,Y(\vartheta, \varphi)$ of the radial, $R(r)$, and angular, $Y(\vartheta, \varphi)$, parts. As is well known, it is the separation of variables that makes the problems in atomic physics solvable. The separation was achieved by choosing as one of the coordinates the coordinate r, on which and on which only the potential $V(r)$ depends. The two other coordinates have to be chosen in such a way that together with r they form what is called in quantum mechanics a complete set of commuting coordinates. Thus, one way of choosing such a set is to add to r the angles $\vartheta$ and $\varphi$. The coordinates $r, \vartheta, \varphi$ are usually called the symmetric coordinates for the central field problem. In this case one can also choose the symmetric coordinates in another way, namely $r, l^2, l_z$, where $l^2$ and $l_z$ are the square of the angular momentum and its z-component, respectively; they are, as is well known, constants of motion. The latter choice corresponds quite closely to the way of choosing symmetric coordinates in solids. Indeed, as we have seen above (Eq. (1)), the potential depends only on the structure $\exp\left(i\frac{2\pi}{a}x\right)$ (Eq. (2)) and not explicitly on x. But this means that the potential depends only on the quasi-coordinate q, as in Eq. (19). The additional coordinate completing the kq-representation is the quasi-momentum k, which is a conserved quantity for a periodic potential problem.
There is, therefore, a full analogy between the spherical and kq-coordinates. Since the kq-coordinates are the symmetric coordinates for periodic potentials, one should expect that the use of the kq-language will often simplify the description of the problem. An example of such a simplification is the derivation of the acceleration theorem for the Bloch momentum k in a crystal in the presence of an electric field E. This theorem is of central importance in the dynamics of electrons in solids. It reads
\[ \hbar\dot{k} = -eE, \tag{20} \]
where $-e$ is the charge of the electron. The commonly written form of this theorem in Eq. (20) is very simple; however, its derivation and interpretation are not simple at all, and it has been causing controversy for a very long time [12]. It turns out that in the kq-representation the derivation of this theorem simplifies considerably, in view of the fact that k is one of the coordinates of this representation. The Hamiltonian for an electron in a periodic potential $V(x)$ and an electric field E in the x- and kq-representations is (we use Eqs. (19) and (14))
\[ H = \frac{p^2}{2m} + V(x) + eEx = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial q^2} + V(q) + eE\left(i\frac{\partial}{\partial k} + q\right). \tag{21} \]
In the kq-representation, the quasi-momentum k appears in the exponential function $\exp(ika)$ only (the eigenvalue of the translation operator in Eq. (3)). According to elementary quantum mechanics we have for the time derivative
\[ \frac{d}{dt}\exp(ika) = \frac{i}{\hbar}\left[H, \exp(ika)\right] = -\frac{iaeE}{\hbar}\exp(ika). \tag{22} \]
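The commutator in Eq. (22) can be evaluated in one line. Only the field term of the Hamiltonian (21) contributes, because $-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial q^2} + V(q) + eEq$ contains no derivative with respect to k and therefore commutes with $\exp(ika)$. The intermediate steps below are a sketch added for clarity and are not part of the original text.

```latex
\begin{aligned}
\left[H,\, e^{ika}\right]
  &= eE\left[\,i\frac{\partial}{\partial k},\, e^{ika}\right]
   = eE \cdot i \cdot (ia)\, e^{ika}
   = -\,eEa\, e^{ika}, \\
\frac{i}{\hbar}\left[H,\, e^{ika}\right]
  &= -\frac{iaeE}{\hbar}\, e^{ika},
\end{aligned}
```

which is the right-hand side of Eq. (22).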
The left-hand side of this equation is $ia\dot{k}\exp(ika)$, and from here one immediately obtains Eq. (20). As is well known, the conventional derivation of the acceleration theorem in Eq. (20) is not entirely elementary [8, 10, 12]. Another example where the kq-representation simplifies the proof considerably has to do with the von Neumann sets on discrete lattices in the phase plane. We mention this proof because Alex Grossmann was one of the authors of the paper where it was given [14]. In the early 30s von Neumann introduced a complete set of coherent states on a lattice in the phase plane with a unit cell of area h, the Planck constant [13]. These states, in addition to forming a complete set, also have the attractive property of being well localized both in coordinate x and momentum p around each point $(ma, nb)$ on the lattice, where a and b are constants with $ab = h$, and $m, n = 0, \pm 1, \pm 2, \ldots$. In 1946 Gabor defined a similar set of states in the plane of time t and frequency $\nu$ with a unit cell of area 1 [15]. It has the same localization property for t and $\nu$ as the von Neumann set has with respect to x and p. This is, in principle, the same set of states, which can therefore be called the von Neumann–Gabor set. In the x-representation the von Neumann set has the form
\[ \psi_{mn}(x) = (-1)^{mn} \exp\left(ix\frac{2\pi}{a}n\right)\psi_0(x - ma), \tag{23} \]
where $\psi_0(x)$ is the ground state of the harmonic oscillator

$$\psi_0(x) = \left(\frac{1}{\lambda\pi^{1/2}}\right)^{1/2}\exp\left(-\frac{x^2}{2\lambda^2}\right). \tag{24}$$
By going to the kq-representation according to Eq. (9), the von Neumann set becomes

$$C_{mn}(k, q) = (-1)^{mn}\exp\left(-ikam + iq\frac{2\pi n}{a}\right)C_0(k, q), \tag{25}$$

where $C_0(k, q)$ is the ground state of the harmonic oscillator in the kq-representation

$$C_0(k, q) = \left(\frac{a}{2\pi\sqrt{\pi}\,\lambda}\right)^{1/2}\exp\left(-\frac{q^2}{2\lambda^2}\right)\Theta_3\!\left(\frac{ka}{2} - i\frac{qa}{2\lambda^2}\,\Big|\; i\frac{a^2}{2\pi\lambda^2}\right). \tag{26}$$
In Eq. (26) $\Theta_3$ is the Theta-3 function [16]. von Neumann suggested [13] that his set in Eq. (23) was complete, but the actual proof of the completeness was given only in the 70s [14, 17]. It has also turned out that the von Neumann set is overcomplete by exactly one state: it remains complete if one state is removed but ceases to be complete when more than one state is removed. This is a very unusual peculiarity which is characteristic of the von Neumann sets. Originally the von Neumann sets were defined by starting with the ground state of a harmonic oscillator and by shifting it by the shift operators (2) and (3) (see Eq. (23)). Such sets can be defined by starting with any function $C(k, q)$. It is easy to show how the completeness properties of such sets can be proven in the kq-representation [14].
3 Bloch-Like Functions

By definition Bloch functions are eigenfunctions of the Hamiltonian for an electron in a periodic potential (1) and of the discrete translations (3). We will denote them by $\psi_k(x)$. As such they satisfy the following boundary conditions:

$$\psi_{k + \frac{2\pi}{a}}(x) = \psi_k(x) = \exp(ika)\,\psi_k(x - a). \tag{27}$$
As was shown above, any kq-function $C(k, q)$ satisfies the same boundary conditions, Eq. (12). However, in general such functions are not eigenfunctions of either the Hamiltonian or the discrete translations. We will call the $C(k, q)$ Bloch-like functions. To turn them into Bloch functions we have to require that they are eigenfunctions of the Hamiltonian (19) for a periodic potential and of the discrete translations (3):

$$H\psi_k(x) = \varepsilon(k)\psi_k(x), \qquad T(a)\psi_k(x) = \exp(ika)\,\psi_k(x), \tag{28}$$
where k is the Bloch momentum, which is a conserved quantity, and $\varepsilon(k)$ is the energy. In $C(k, q)$, k is a variable, like q. Now we come to an interesting point concerning the variables on which the functions $\psi_k(q)$ and $C(k, q)$ depend. The Bloch function depends on the x-coordinate and the Bloch momentum k; x extends from minus infinity to plus infinity (Fig. 1). In this space the Bloch function is extended and is not square integrable. The $C(k, q)$ functions depend on k and q, which vary in the kq-space (Fig. 2).

Fig. 1 The x-space, extending from $-\infty$ to $\infty$

Fig. 2 The kq-space. See Eq. (7) in text

It is known that for two square integrable functions $\psi_1(x)$ and $\psi_2(x)$, the corresponding kq-functions $C_1(k, q)$ and $C_2(k, q)$ satisfy the following equality [9]:

$$\int_{-\infty}^{\infty}\psi_1^*(x)\psi_2(x)\,dx = \int_{-\pi/a}^{\pi/a}\int_{-a/2}^{a/2} C_1^*(k, q)\,C_2(k, q)\,dk\,dq, \tag{29}$$
where the integration in the second integral is over the kq-space (Fig. 2). In particular, we have

$$\int_{-\infty}^{\infty}\left|\psi(x)\right|^2 dx = \int_{-\pi/a}^{\pi/a}\int_{-a/2}^{a/2}\left|C(k, q)\right|^2 dk\,dq. \tag{30}$$
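The boundary conditions of Eq. (12) and the norm equality of Eq. (30) can be checked numerically. The following is a minimal sketch, not taken from the original text: it assumes the kq-transform has the lattice-sum form $C(k,q) = (a/2\pi)^{1/2}\sum_n \exp(ikan)\,\psi(q - na)$ that appears in Eq. (35), uses the Gaussian of Eq. (24) as the test function, and picks illustrative values for the constants `A`, `LAM`, and `N_TERMS`. The function names are hypothetical.

```python
import cmath
import math

A = 1.0        # lattice constant a (illustrative value)
LAM = 0.7      # Gaussian width lambda (illustrative value)
N_TERMS = 30   # the lattice sum runs over n = -N_TERMS..N_TERMS


def psi0(x):
    """Normalized Gaussian, the oscillator ground state of Eq. (24)."""
    return (LAM * math.sqrt(math.pi)) ** -0.5 * math.exp(-x * x / (2.0 * LAM * LAM))


def kq_transform(psi, k, q):
    """Truncated lattice sum C(k,q) = sqrt(a/2pi) * sum_n exp(i k a n) psi(q - n a)."""
    total = sum(cmath.exp(1j * k * A * n) * psi(q - n * A)
                for n in range(-N_TERMS, N_TERMS + 1))
    return math.sqrt(A / (2.0 * math.pi)) * total


def kq_norm(psi, nk=48, nq=48):
    """Midpoint-rule estimate of the integral of |C(k,q)|^2 over the kq-cell."""
    dk = (2.0 * math.pi / A) / nk
    dq = A / nq
    acc = 0.0
    for i in range(nk):
        k = -math.pi / A + (i + 0.5) * dk
        for j in range(nq):
            q = -A / 2.0 + (j + 0.5) * dq
            acc += abs(kq_transform(psi, k, q)) ** 2
    return acc * dk * dq


if __name__ == "__main__":
    k, q = 0.9, 0.2
    c = kq_transform(psi0, k, q)
    # Periodicity in k and quasi-periodicity in q (the Bloch-like boundary conditions).
    print("C(k + 2pi/a, q) - C(k, q):", abs(kq_transform(psi0, k + 2.0 * math.pi / A, q) - c))
    print("C(k, q + a) - exp(ika) C(k, q):",
          abs(kq_transform(psi0, k, q + A) - cmath.exp(1j * k * A) * c))
    # Norm over the kq-cell, to be compared with the unit norm of psi0 on the x-axis.
    print("norm over the kq-cell:", kq_norm(psi0))
```

The midpoint rule is used because the integrand is periodic in both k and q, for which equally spaced sampling converges very quickly; any other quadrature would serve equally well.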
Here comes the amusing remark that follows from the special structure of the Bloch-like functions: it was mentioned above that the Bloch function $\psi_k(x)$ has no norm

$$\int_{-\infty}^{\infty}\left|\psi_k(x)\right|^2 dx \longrightarrow \infty. \tag{31}$$
But this same function $\psi_k(q)$ (with x replaced by q), when considered in the kq-space (Fig. 2), gives a finite result

$$\int_{-\pi/a}^{\pi/a}\int_{-a/2}^{a/2}\left|\psi_k(q)\right|^2 dk\,dq \longrightarrow \text{finite}. \tag{32}$$
This dual possibility of integrating in the space of x and in the kq-space (see Eq. (7) and Figs. 1 and 2) is a characteristic feature of the kq-representation. Sometimes in mathematics, the kq-transform is defined as a certain operation which takes as input a function of one variable and produces as output a function of two variables satisfying the boundary conditions in Eq. (12). One should, however, point out that this not only is a doubling of the number of variables on which the function depends but also converts the kq-transform into a Bloch-like function. This explains the surprising result expressed in Eqs. (31) and (32). It also explains why the Wannier function in the kq-representation can be identical to the Bloch function in the x-representation! By definition, the Wannier function $W(x)$ is obtained via the Bloch function in the following way [3, 7]:

$$W(x) = \left(\frac{a}{2\pi}\right)^{1/2}\int_{-\pi/a}^{\pi/a}\psi_k(x)\,dk, \tag{33}$$
where the integration is over the Brillouin zone and a is the lattice constant. It then follows that the Bloch function $\psi_k(x)$ can be expressed by the Wannier function as a discrete Fourier transform

$$\psi_k(x) = \left(\frac{a}{2\pi}\right)^{1/2}\sum_n \exp(ikan)\,W(x - na). \tag{34}$$
If in Eq. (33) we replace the Bloch function $\psi_k(x)$ by a Bloch-like function $C(k, q)$, then we recover Eq. (8). We find ourselves combining quantum mechanical representations. It is clear that if $C(k, q)$ is normalized to 1 in Eq. (8), so is $\psi(x)$ (see Eq. (30)). As was pointed out before, Eq. (9) defines the kq-transform $C(k, q)$. Correspondingly, we call Eq. (8) the inverse kq-transform. It is very satisfying that the kq-transform, $C(k, q)$, is automatically a Bloch-like function, obeying the boundary conditions Eq. (12). This is a clear indication that the kq-representation is a suitable language for dealing with elementary dynamics in crystals [8]. A particular kq-transform is the Bloch function in the kq-representation, $C_{k'}(k, q)$, where $k'$ is the conserved Bloch momentum. The relation of $C_{k'}(k, q)$ to the Bloch function $\psi_{k'}(x)$ in the x-representation can be found from Eq. (9):

$$C_{k'}(k, q) = \left(\frac{a}{2\pi}\right)^{1/2}\sum_n \exp(ikan)\,\psi_{k'}(q - na) = \left(\frac{2\pi}{a}\right)^{1/2}\psi_{k'}(q)\,\Delta(k - k'), \tag{35}$$

where $\Delta(k - k')$ is a sum of Dirac delta functions $\delta(k - k')$:

$$\Delta(k - k') = \sum_n \delta\left(k - k' - n\frac{2\pi}{a}\right). \tag{36}$$
By integrating Eq. (35) over $k'$ in the Brillouin zone and by using the definition of the Wannier function, Eq. (33), we find an expression for the Wannier function $W(k, q)$ in the kq-representation:

$$W(k, q) = \psi_k(q). \tag{37}$$
This is the equation we have looked for [6]. Its formulation in words is: the Wannier function in the kq-representation can be identical to the Bloch function in the x-representation (with x replaced by q). Equation (37) is written for the Wannier function at the origin of the Bravais lattice. For a Wannier function centered at the mth unit cell of the Bravais lattice, Eq. (37) becomes

$$W_m(k, q) = \psi_k(q)\exp(-ikam). \tag{38}$$
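The chain of Eqs. (33), (34), and (38) can be illustrated with a short numerical sketch. It is an assumption-laden illustration rather than the author's computation: a normalized Gaussian (`wannier_model`, a hypothetical name) stands in for a true Wannier function, which is enough to exercise the Fourier identities but does not reproduce the orthogonality between Wannier functions on different cells. The constants `A`, `LAM`, and `N_TERMS` are illustrative choices.

```python
import cmath
import math

A = 1.0       # lattice constant a (illustrative value)
LAM = 0.4     # width of the model localized function (illustrative value)
N_TERMS = 20  # the lattice sum runs over n = -N_TERMS..N_TERMS


def wannier_model(x):
    """Hypothetical stand-in for W(x): a normalized Gaussian, not a true Wannier function."""
    return (LAM * math.sqrt(math.pi)) ** -0.5 * math.exp(-x * x / (2.0 * LAM * LAM))


def bloch_from_wannier(k, x):
    """Eq. (34): psi_k(x) = sqrt(a/2pi) * sum_n exp(i k a n) W(x - n a)."""
    total = sum(cmath.exp(1j * k * A * n) * wannier_model(x - n * A)
                for n in range(-N_TERMS, N_TERMS + 1))
    return math.sqrt(A / (2.0 * math.pi)) * total


def wannier_from_bloch(x, m=0, nk=64):
    """Eq. (33) applied to psi_k(x) * exp(-i k a m), by the midpoint rule over the Brillouin zone.

    For m = 0 this recovers W(x); for other m it gives the function centered on cell m.
    """
    dk = (2.0 * math.pi / A) / nk
    total = 0j
    for i in range(nk):
        k = -math.pi / A + (i + 0.5) * dk
        total += cmath.exp(-1j * k * A * m) * bloch_from_wannier(k, x)
    return math.sqrt(A / (2.0 * math.pi)) * total * dk


if __name__ == "__main__":
    x = 0.3
    print("W(x) recovered from Eq. (33):", wannier_from_bloch(x), "direct:", wannier_model(x))
    print("cell m = 2:", wannier_from_bloch(x, m=2), "direct:", wannier_model(x - 2 * A))
```

In this sketch the phase factor $\exp(-ikam)$ of Eq. (38) is the only ingredient needed to move the localized function from the origin to the mth cell; no separate integral has to be set up for each cell.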
The equalities of the Wannier functions and the Bloch functions in Eqs. (37) and (38) are surprising, in view of the fact that in the x-representation the Wannier functions are localized while the Bloch functions are extended. This surprising result is a consequence of the kq-representation being in between the x- and p-representations and is striking both conceptually and computationally. Conceptually, it belongs to textbooks because it is a fundamental result. Computationally, it is customary to find first the Bloch functions from the Schrödinger equation by using, for example, the density functional theory. Then the Wannier functions in the x-representation are calculated by integration of the Bloch function over the Bloch momentum. With our result, by having the Bloch functions in the x-representation we also have the Wannier functions in the kq-representation [6]. There is no need for any additional calculations. This can, in principle, save much computer time when doing computations in condensed matter physics. Having the basic operators x and p and the Wannier function W in the kq-representation, Eqs. (14), (15), and (38), we can carry out all the calculations in this representation. Thus, following Eq. (16) we have for the expectation values $\langle x\rangle$ and $\langle p\rangle$ in both the x- and kq-representations:

$$\langle x\rangle = \int_{-\infty}^{\infty} W^*(x)\,x\,W(x)\,dx = \int_{-\pi/a}^{\pi/a}\int_{-a/2}^{a/2}\psi_k^*(q)\left(i\frac{\partial}{\partial k} + q\right)\psi_k(q)\,dk\,dq, \tag{39}$$

$$\langle p\rangle = \int_{-\infty}^{\infty} W^*(x)\left(-i\hbar\frac{\partial}{\partial x}\right)W(x)\,dx = \int_{-\pi/a}^{\pi/a}\int_{-a/2}^{a/2}\psi_k^*(q)\left(-i\hbar\frac{\partial}{\partial q}\right)\psi_k(q)\,dk\,dq. \tag{40}$$

The integration over x runs over the range shown in Fig. 1 and that over k and q over the range shown in Fig. 2. Equations (39) and (40) demonstrate the striking nature of the result: the Bloch functions $\psi_k(x)$ in the x-representation play the role of the Wannier functions in the kq-representation.

In summarizing Sect. 3, we say that for an arbitrary function $\psi(x)$, the kq-transform, $C(k, q)$, is a general Bloch-like function. However, when $\psi(x)$ is the Wannier function, then its kq-transform is the Bloch function (Eqs. (37) and (38)). This explains the striking result that we repeat in words: the Wannier function in the kq-representation is identical to the Bloch function in the x-representation. This shows that the kq-representation opens a new dimension in the elementary theory of condensed matter.
4 Phase of the Bloch Function

Much attention is paid to the choice of phase of the Bloch function [5, 7]. Here we will just touch on this subject in view of the new result in Eq. (37). In the x-representation, when the Bloch function is multiplied by a k-dependent phase $\exp[i\theta(k)]$, Eq. (33) turns into

$$\widetilde{W}(x) = \left(\frac{a}{2\pi}\right)^{1/2}\int \exp[i\theta(k)]\,\psi_k(x)\,dk. \tag{41}$$
$\widetilde{W}(x)$ is a completely new function. In the kq-representation, when the Bloch function is multiplied by a k-dependent phase, the new Wannier function turns into

$$\widetilde{W}(k, q) = \exp[i\theta(k)]\,W(k, q). \tag{42}$$
This is a surprisingly simple result when compared with the one in Eq. (41). But it is also reasonable because of our result in Eq. (37): when the Bloch function is multiplied by the phase $\exp[i\theta(k)]$, the Wannier function is multiplied by the same phase $\exp[i\theta(k)]$! There is no contradiction here because the Wannier function in the x-representation will still be given by Eq. (41).
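The contrast between Eqs. (41) and (42) can be made concrete with a small numerical sketch. Everything specific in it is an assumption for illustration: a Gaussian stands in for the Wannier function, the phase is chosen as $\theta(k) = \beta\sin(ka)$ (periodic over the Brillouin zone), and the kq-transform is taken in the lattice-sum form that appears in Eq. (35). The function names are hypothetical.

```python
import cmath
import math

A = 1.0        # lattice constant a (illustrative value)
LAM = 0.4      # width of the model localized function (illustrative value)
BETA = 0.8     # strength of the hypothetical phase theta(k) = BETA * sin(k a)
N_TERMS = 12   # lattice sums run over n = -N_TERMS..N_TERMS
N_K = 48       # number of midpoint nodes across the Brillouin zone


def wannier_model(x):
    """Hypothetical stand-in for W(x): a normalized Gaussian."""
    return (LAM * math.sqrt(math.pi)) ** -0.5 * math.exp(-x * x / (2.0 * LAM * LAM))


def theta(k):
    """Hypothetical k-dependent phase, periodic in k with period 2*pi/a."""
    return BETA * math.sin(k * A)


def kq_transform(f, k, q):
    """Truncated lattice sum sqrt(a/2pi) * sum_n exp(i k a n) f(q - n a)."""
    total = sum(cmath.exp(1j * k * A * n) * f(q - n * A)
                for n in range(-N_TERMS, N_TERMS + 1))
    return math.sqrt(A / (2.0 * math.pi)) * total


def wannier_tilde(x):
    """Eq. (41): Brillouin-zone integral of exp(i theta(k)) psi_k(x), by the midpoint rule."""
    dk = (2.0 * math.pi / A) / N_K
    total = 0j
    for i in range(N_K):
        k = -math.pi / A + (i + 0.5) * dk
        total += cmath.exp(1j * theta(k)) * kq_transform(wannier_model, k, x)
    return math.sqrt(A / (2.0 * math.pi)) * total * dk


if __name__ == "__main__":
    k, q = 0.7, 0.15
    # In the x-representation the new function differs from the old one ...
    print("W(q):", wannier_model(q), " W-tilde(q):", wannier_tilde(q))
    # ... while in the kq-representation the two differ only by the phase of Eq. (42).
    lhs = kq_transform(wannier_tilde, k, q)
    rhs = cmath.exp(1j * theta(k)) * kq_transform(wannier_model, k, q)
    print("difference between the two sides of Eq. (42):", abs(lhs - rhs))
```

The sketch evaluates the left side of Eq. (42) the long way, by transforming the x-space function of Eq. (41), so that agreement with the simple phase factor on the right side is a genuine check rather than a restatement.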
5 kq-Space

The kq-function $C(k, q)$ satisfies what are called the Bloch-like boundary conditions of Eq. (12). In the kq-representation the variables k and q vary in the range shown in Eq. (7), or in Fig. 2. This, as was mentioned before, is called the kq-space [11]. $C(k, x)$ can be considered as a Bloch-like function on the x-axis (see Fig. 1). As such, it is an extended function that has no norm, because the integral of the square of the absolute value of this function, $|C(k, x)|^2$, over the whole x-axis diverges. On the other hand, the same function, $C(k, q)$, as a function in the kq-representation is square integrable on the kq-space (see Fig. 2). This is the essence of our new approach to the Wannier function.
6 Conclusion

As was pointed out above, the main result of this paper is Eq. (37). It can straightforwardly be extended to any dimension and to any number of energy bands that are isolated from below and above by energy gaps. The extension is straightforward, but by no means trivial [4, 5]. With the striking result of Eq. (37) there is no difference between the Bloch and the Wannier functions: they are identical! When you know one of them, you also know the other one. This is the doing of the kq-representation, which is a complete quantum mechanical language: anything one can do in the x-representation (x-language) or p-representation (p-language) one can also do in the kq-representation (kq-language) [1]. In Eq. (9) the kq-transform, $C(k, q)$, is defined via the function $f(x)$ in the x-representation. One can equally well define $C(k, q)$ via the Fourier transform, $F(k)$, of $f(x)$:

$$F(k) = \frac{1}{\sqrt{2\pi}}\int f(x)\exp(-ikx)\,dx. \tag{43}$$

One has [9]:

$$C(k, q) = \frac{1}{a^{1/2}}\,e^{ikq}\sum_n F\left(k + \frac{2\pi}{a}n\right)\exp\left(iq\frac{2\pi}{a}n\right). \tag{44}$$
When $f(x)$ is the Wannier function, $W(x)$, then $C(k, q)$ is the Bloch function $\psi_k(x)$ (see Eq. (37)), $C(k, q) = \psi_k(q) = e^{ikq}u_k(q)$. By multiplying both sides of Eq. (44) by $e^{-ikq}$ and by integrating over the quasi-coordinate q we get

$$F(k) = \frac{1}{a^{1/2}}\int_{-a/2}^{a/2} dq\,u_k(q). \tag{45}$$
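Equations (44) and (45) lend themselves to a quick numerical cross-check. The sketch below is an illustration under stated assumptions: it uses a Gaussian $f(x)$ whose Fourier transform under the convention of Eq. (43) is known in closed form, takes the direct kq-transform in the lattice-sum form that appears in Eq. (35), and uses hypothetical function names and illustrative constants.

```python
import cmath
import math

A = 1.0       # lattice constant a (illustrative value)
LAM = 0.6     # Gaussian width lambda (illustrative value)
N_TERMS = 20  # both sums run over n = -N_TERMS..N_TERMS


def f(x):
    """Normalized Gaussian test function in the x-representation."""
    return (LAM * math.sqrt(math.pi)) ** -0.5 * math.exp(-x * x / (2.0 * LAM * LAM))


def F(k):
    """Closed-form Fourier transform of f under the convention of Eq. (43)."""
    return (LAM / math.sqrt(math.pi)) ** 0.5 * math.exp(-LAM * LAM * k * k / 2.0)


def c_direct(k, q):
    """kq-transform as a lattice sum over translates of f."""
    total = sum(cmath.exp(1j * k * A * n) * f(q - n * A)
                for n in range(-N_TERMS, N_TERMS + 1))
    return math.sqrt(A / (2.0 * math.pi)) * total


def c_fourier(k, q):
    """kq-transform from the Fourier transform, Eq. (44)."""
    total = sum(F(k + 2.0 * math.pi * n / A) * cmath.exp(1j * q * 2.0 * math.pi * n / A)
                for n in range(-N_TERMS, N_TERMS + 1))
    return A ** -0.5 * cmath.exp(1j * k * q) * total


def F_from_u(k, nq=64):
    """Eq. (45): F(k) from the cell integral of u_k(q) = exp(-ikq) C(k,q), midpoint rule."""
    dq = A / nq
    total = 0j
    for j in range(nq):
        q = -A / 2.0 + (j + 0.5) * dq
        total += cmath.exp(-1j * k * q) * c_direct(k, q)
    return A ** -0.5 * total * dq


if __name__ == "__main__":
    k, q = 0.9, 0.2
    print("Eq. (44), direct vs Fourier form:", abs(c_direct(k, q) - c_fourier(k, q)))
    print("Eq. (45), cell integral vs closed form:", abs(F_from_u(k) - F(k)))
```

A Gaussian is used only because its transform is available analytically; the same comparison could be run for any rapidly decaying $f(x)$ with a numerically computed $F(k)$.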
It is of interest to compare the result in Eq. (45) with the original definition of the Wannier function in Eq. (33). Here is a recipe for finding the Wannier function in the kq-representation. It amounts to a game of combining quantum mechanical languages: step 1, one finds the Bloch function in the x-representation by, say, the density functional theory (this is also what is done conventionally today); step 2, one replaces x in the Bloch function $\psi_k(x)$ by q; and step 3, one uses Eq. (37), which gives the Wannier function $W(k, q)$ in the kq-representation. This is a wonderful result which stems from the fact that the function in the kq-representation, $C(k, q)$, is a discrete Fourier transform of the wave function $\psi(x)$ in the x-representation [9]. Having $W(k, q)$ and the operators x and p in the kq-representation (Eqs. (14) and (15)), one can carry out all the calculations in the kq-representation, without the need for the Wannier function in the x-representation (see Eqs. (16), (39) and (40)). We conclude by pointing out that in recent years the Wannier functions have been widely used in describing the topology of energy bands in solids [18–20].
References

1. P.A.M. Dirac, The Principles of Quantum Mechanics, Oxford University Press, London and New York, 1958.
2. F. Bloch, Quantum Mechanics of Electrons in Crystal Lattices, Ann. Phys., 52, 555, 1928.
3. G. H. Wannier, Structure of Electronic Excitation Levels in Insulating Crystals, Phys. Rev., 52, 191, 1937.
4. W. Kohn, Construction of Wannier Functions and Application to Energy Bands, Phys. Rev., 7, 4388, 1973.
5. N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza and D. Vanderbilt, Maximally Localized Wannier Functions: Theory and Applications, Rev. of Mod. Physics, 84, 1419, 2012.
6. J. Zak, A variational principle for band calculations in the kq-representation, Phys. Lett., 55A, 230, 1975.
7. W. Kohn, Analytic Properties of Bloch Waves and Wannier Functions, Phys. Rev., 115, 809, 1959.
8. J. Zak, Finite Translations in Solid State Physics, Phys. Rev. Lett., 19, 1385, 1967; The kq-representation in the Dynamics of Electrons in Solids, Solid State Physics, Edited by F. Seitz, D. Turnbull, and H. Ehrenreich (Academic Press, New York), 27, 1972.
9. A. J. E. M. Janssen, The Zak Transform: A Signal Transform for Sampled Time-Continuous Signals, Philips J. Res., 43, pp. 23–69, 1988.
10. L. D. Landau and E. M. Lifshitz, Quantum Mechanics, Non-Relativistic Theory, 3rd Edition, Elsevier, 1981.
11. M. An, A. K. Brodzik and R. Tolimieri, Ideal Sequence Design in Time-Frequency Space, page 121, Birkhäuser Boston, 2009.
12. J. Zak, Natural Coordinates for Electrons in Solids, Physics Today, 23, 2, 51, 1970; Mind Your k’s and q’s to Simplify Solid State Theory, Physics Today, 22, 2, 64, 1969.
13. J. von Neumann, Mathematical Foundations of Quantum Mechanics, Princeton University, Princeton, 1955.
14. H. Bacry, A. Grossmann and J. Zak, Proof of Completeness of Lattice States in the kq-Representation, Phys. Rev., B12, 1118, 1975.
15. D. Gabor, Theory of Communication, J. Inst. Elect. Eng., 93, 429, 1946.
16. E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, Cambridge University Press, 1996.
17. V. Bargmann, P. Butera, L. Girardello and J. R. Klauder, On the completeness of coherent states, Rep. Math. Phys., 2, 221, 1971.
18. J. Zak, Berry’s phase for energy bands in solids, Phys. Rev. Lett., 62, 2747, 1989.
19. B. Bradlyn, L. Elcoro, J. Cano, M. G. Vergniory, Z. Wang, C. Felser, M. I. Aroyo, B. A. Bernevig, Topological Quantum Chemistry, Nature (London), 547, 298, 2017.
20. D. Vanderbilt, Electronic Structure Theory, Cambridge University Press, 2018.
Alex Grossmann, Scattering Amplitude, Fermi Pseudopotential, and Particle Physics

Tai Tsun Wu
1 Harvard and Scattering Amplitude

After graduating from the University of Minnesota, I entered the Graduate School of Harvard University in the fall of 1953. Not long afterwards, Alex Grossmann came to Harvard also. Because of the coronavirus episode, it has not been possible to get to the paper record of his student days at Harvard. To the best of my recollection, he came in 1955. We got to know each other shortly after his arrival. At that time, there was a dining hall, called the Harkness Commons, for the graduate students. Both Alex and I had our meals (breakfast, lunch, and dinner) there most of the time. Harkness Commons was a great dining hall; one of the remarkable features was that the graduate students could eat as much as they wanted to, and both Alex and I made good use of this feature. I am not aware of such an arrangement anywhere at Harvard any more. On the wall at one end of the main dining room in Harkness Commons, there was a painting by Joan Miró specifically designed by him for Harkness Commons and the graduate students. Years later, after Miró became very, very famous, this painting was removed from the wall and replaced by another painting, which to my taste is much less interesting. I do not know where the Miró painting was moved to. Although I entered the Graduate School before Alex, he had a much stronger background than I did, both in physics and in mathematics; I learned a great deal from Alex through our discussions. It was six years later that Alex and I published our first joint paper [1]. I believe that was one of the earliest papers on the rigorous treatment of the general properties
T. T. Wu () Gordon McKay Laboratory, Harvard University, Cambridge, MA, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_8
of scattering amplitude. We considered the non-relativistic scattering of a particle by a fixed potential $V(\mathbf{x})$. The conditions that we impose on this $V(\mathbf{x})$ are:

(a) The integral $\iint |V(\mathbf{x})V(\mathbf{y})|\,|\mathbf{x} - \mathbf{y}|^{2}\,d\mathbf{x}\,d\mathbf{y}$ exists.

(b) The integral $I(\kappa) = \int |V(\mathbf{x})|\,e^{\kappa|\mathbf{x}|}\,d\mathbf{x}$ exists for some $\kappa > 0$.

Denote by $\alpha$ the least upper bound of real numbers $\kappa$ such that $I(\kappa)$ exists. A function $f(\mathbf{p}, \mathbf{q}; k)$ of seven independent complex variables (the three components of the vector $\mathbf{q}$, the three components of the vector $\mathbf{p}$, and the scalar k) is defined such that f is the scattering amplitude multiplied by $(2\pi^2)^{-1}$ when the seven variables are all real and furthermore satisfy $|\mathbf{q}| = |\mathbf{p}| = k$, where $\mathbf{q}$ and $\mathbf{p}$ are the final and the initial momenta of the particle, and k is the square root of the energy. Our main result is that, except for a discrete set of values of k, $f(\mathbf{q}, \mathbf{p}; k)$ is a holomorphic function of these seven complex variables at $(\mathbf{q}, \mathbf{p}; k)$ when

$$\operatorname{Im} k > -\tfrac{1}{2}\alpha, \qquad |\operatorname{Im}\mathbf{p}| < \tfrac{1}{2}\alpha, \qquad\text{and}\qquad |\operatorname{Im}\mathbf{q}| < \tfrac{1}{2}\alpha.$$
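As an illustration of how condition (b) fixes $\alpha$, consider a Yukawa potential. This example is an assumption made here for concreteness and is not taken from the papers cited above; $g$ and $\mu > 0$ are hypothetical strength and range parameters.

```latex
V(\mathbf{x}) = g\,\frac{e^{-\mu|\mathbf{x}|}}{|\mathbf{x}|},
\qquad
I(\kappa) = \int |V(\mathbf{x})|\,e^{\kappa|\mathbf{x}|}\,d\mathbf{x}
          = 4\pi|g|\int_0^\infty r\,e^{-(\mu-\kappa)r}\,dr
          = \frac{4\pi|g|}{(\mu-\kappa)^2}
\quad (\kappa < \mu).
```

The integral diverges for $\kappa \ge \mu$, so in this example $\alpha = \mu$, and the conditions above read $\operatorname{Im} k > -\mu/2$, $|\operatorname{Im}\mathbf{p}| < \mu/2$, and $|\operatorname{Im}\mathbf{q}| < \mu/2$: the shorter the range of the potential, the larger the domain.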
A year later, Alex and I published our second joint paper [3]. By that time, he had become a member of the Institute for Advanced Study in Princeton. In this second paper of ours, we have found that the domain of analyticity becomes larger when the first Born approximation is subtracted from the total scattering amplitude. This larger domain of analyticity is obtained when the assumptions on the potential .V (x) are made a little more restrictive. In this series of three papers on the Schrödinger scattering amplitude, there is of course also a second paper [2]. This second paper was entirely the work of Alex.
2 Marseille and Fermi Pseudopotential

A few years after the publication of this second joint paper, Alex was offered a nice position at CNRS at Marseille, and he accepted the offer. At about the same time, Alex married Doris Karabian, a well-known young biochemist. Doris, called Dickie by all her friends, has made important contributions to biological pathways. Indeed, one of the steps in biological pathways was discovered by and named after her. Dickie and I had known each other well before this marriage. After settling down in Marseille, Alex and Dickie built a beautiful house in Bandol, not far from Marseille. I remember several discussions with them on the design of this house. I visited Alex there numerous times, and we also met many times elsewhere in Europe. There are
many interesting stories about Alex while he was at CNRS; let me relate two of them. Once, he was visiting Amsterdam, and I went there to see him. When he saw me arriving, he put down the newspaper he was reading and started talking to me. After a while, I asked him how well he knew Dutch, wondering whether Dutch was one of the more than ten languages he was familiar with. He answered that he did not know Dutch. Only after I pointed to the newspaper he was reading, he realized that he was reading a Dutch newspaper! To me it was most impressive to know so many languages, but it was even more so to find him reading a newspaper in a language he did not even realize that he did not know. He told me that he must be interpolating between the languages that he did know, mostly between German and English in this case. But he could not explain his not being aware of reading a newspaper in a different language. On the second story, I have no direct knowledge; this is what Alex told me himself. After he was settled in Marseille, there was an occasion he was returning to Marseille from Paris with his family. At that time, he was not only married but also had his two sons, Michael and Etienne, who were still babies. At the Paris train station, he and his family waited for a specific train. The expected time of arrival of this train passed by and there was no train. Since this was most unusual for French trains, Alex took a more careful look at the time table. He then noticed that there was a small symbol next to the schedule for this train; going down the page, he then learned that this particular train ran only once a year. After this discovery, they had no trouble getting back home. Whenever we got together, Alex and I had many discussions on many topics in physics. One of the recurring topics for our discussion was: what might be the future development of the physics of elementary particles. 
Elementary particle physics forms the foundation of physics and is hence a worthwhile topic to think about. Most people, including even some of the close collaborators of Alex, do not realize his interest and knowledge of elementary particle physics. Let me therefore describe one of the topics that Alex and I frequently discussed. The topic that I am referring to is the masses of the elementary particles. The muon was discovered eighty five years ago, in 1936, by Anderson and Neddermeyer. Almost immediately after its discovery, the question was raised: since the muon has the same properties as the electron including the charge, but has a mass of about 200 times that of the electron, is there any way to understand, even qualitatively, why there is such a large difference in their masses? This is a very old problem, and there has been no progress for over eighty years. More generally, is there a way to understand the mass ratios of any two elementary particles? It was unfortunate that our discussions of such mass ratios were premature—our discussions occurred before the year 1995, when the top quark was discovered at the Fermi National Accelerator Laboratory [4]. During that time, Alex and I published two papers together [5, 6]. The Fermi pseudopotential has been extensively used very successfully in both nuclear physics [7] and many-body problems [8]. We learned Fermi pseudopotential from Professor Yang. These applications are of course for the case of three spatial
dimensions. Alex and I generalized this Fermi pseudopotential to higher spatial dimensions. While this generalization is natural and can be carried through formally when the strength is negative, there are basic changes in the underlying structure. In this paper [5], we give a list of Fermi pseudopotentials in two to six dimensions, where the overall constant is omitted:

two dimensions: $\delta^2(\mathbf{r})\,r(\ln r)^2\,\dfrac{\partial}{\partial r}(\ln r)^{-1}$;

three dimensions: $\delta^3(\mathbf{r})\,\dfrac{\partial}{\partial r}\,r$;

four dimensions: $\delta^4(\mathbf{r})\,r(\ln r)^2\,\dfrac{\partial}{\partial r}(\ln r)^{-1}\,r^{-1}\,\dfrac{\partial}{\partial r}\,r^2$;

five dimensions: $\delta^5(\mathbf{r})\,\dfrac{\partial^3}{\partial r^3}\,r^3$; and

six dimensions: $\delta^6(\mathbf{r})\,r(\ln r)^2\,\dfrac{\partial}{\partial r}(\ln r)^{-1}\,r^{-1}\,\dfrac{\partial}{\partial r}\,r^{-1}\,\dfrac{\partial}{\partial r}\,r^4$.
There may be some problem with the last one. In paper [6], Alex and I returned to the Fermi pseudopotential in three dimensions. We placed identical Fermi pseudopotentials on the N vertices of a regular planar polygon. It is found that, when the length of the sides of the polygon is held fixed but N increases, there are resonances the width of which is exponentially small in N. While this paper [6] was not published until 1987, the work reported in the paper was actually carried out more than ten years earlier. When that work was being carried out, Alex and I applied the same consideration to invent a novel antenna array. This antenna array may be described briefly as follows. Take a convex analytic closed curve in a plane, and place on this curve N identical equi-spaced antenna elements. These antenna elements can be, for example, dipoles or disk antennas. Keeping these antenna elements fixed, rescale this configuration so that the distance between adjacent antenna elements is a suitable fraction of the wavelength. When the number N of individual antenna elements is large, such an array can have various interesting properties depending on the choice of the original analytic curve. For example, such an array can be a highly directional transmitting or receiving antenna in one or more predetermined directions. Because of these interesting and useful properties, we were excited by such a practical application. Therefore, in the summer of 1973, Alex and I, together with Jack Schwartz, who was a graduate student at Harvard at the same time as us, applied for a French patent on this array. This is one of the examples that show that Alex was not only a powerful mathematical physicist but was also adept in solving engineering problems. I do not know what happened to this patent application of ours. Soon after the publication of our two papers [5, 6], the interests of Alex and mine diverged. 
He became very much involved in developing his great work on the wavelet theory, and I became more interested in problems directly related to physics experiments. For some time, we did not get together as often as before.
3 Particle Physics

After the discovery of the top quark in 1995, Alex and I were interested in resuming our discussion of the masses of elementary particles. In November of 2008, we met in Cambridge, Massachusetts, and in May 2012 we met in Geneva, Switzerland. Unfortunately, on both occasions, our time together was too short for us to be able even to make a tentative decision on how to think about this fundamental problem. We then tried to get together for a somewhat longer period, but that never happened. Disaster struck on February 12, 2019: on that day at 6:41 pm, I received an e-mail from Robert Coquereaux telling me that Alex had passed away. This was such a terrible shock to me that I was unable to do anything for some time. After several months, I returned to the problem of the masses of elementary particles. Besides with Alex, I had also been discussing this problem of the masses of elementary particles with Sau Lan Wu, who had made a seminal contribution to the experimental discovery of the Higgs particle at CERN in 2012 [9, 10]. Let me give here a summary of the present thinking of Sau Lan and me on this topic. One of the initial questions that have been raised is the following. After the discovery of the Higgs particle, which completes the list of particles in the standard model of Glashow, Weinberg, and Salam [11], what is the next major open issue in the theory of elementary particles? The standard model is the most important application of the Yang-Mills non-Abelian gauge theory [12]. Some physicists claim that this is the end of the elementary-particle theory unless further new particles are discovered. We disagree with this pessimistic view: how can we claim that we have a theoretical understanding of elementary particles unless we have at least some notion of what determines their masses? A stronger and more optimistic form of stating this point of view is as follows. 
The twentieth century is the century to study and understand the interactions of elementary particles, while the twenty-first century is the century to study and understand the masses of elementary particles. An important step in this direction has been taken, already in 1964, by Englert, Brout, Higgs, etc. [9]. For every massive elementary particle, the mass is due to the interaction of that particle with the Higgs particle. Since this important step replaces each mass parameter by a coupling constant and hence does not reduce the number of independent parameters in the standard model, it needs to be supplemented by something else. What can this “something else” be? Inspired by the old question of the mass ratio for muon and electron, we have decided to study the mass ratio between two elementary particles. The next question is then: which two elementary particles should we choose to be the first pair to study? As stated above, the top quark was discovered in 1995 at the Fermi National Accelerator Laboratory. When it was discovered, at a mass of 173 GeV/$c^2$, it was the heaviest known elementary particle. What has impressed us very much is the fact that, more than a quarter of a century later, this top quark is still the
heaviest known elementary particle. If one takes the view that the masses of lighter elementary particles are somehow “generated” by the masses of the heavier elementary particles, then it is necessary to take the mass of the top quark as an input. We believe in this argument; therefore the top quark has been chosen as one of the pair whose mass ratio is to be studied. Which other elementary particle should be chosen for the first step in our theoretical study of the mass ratios? With the top quark as one of the pair, the choice of the other member of the pair is clear: it must be the bottom quark. In other words, we have decided to study first the mass ratio of the top quark and the bottom quark. This is a much better problem to study than the very old problem of the mass ratio of the muon and the electron. This choice of studying the mass ratio between the top quark and the bottom quark is supported by the following numerological relation:

$$m_b^2/m_t^2 = \alpha/4\pi,$$
where $m_b$ and $m_t$ are the masses of the bottom and the top quarks, respectively, and $\alpha$ is the fine-structure constant. This numerological relation is highly accurate. While we should not take such numerological relations too seriously and it is a mystery how the fine-structure constant can come in, this relation nevertheless tells us that, in some sense, the mass of the bottom quark is a radiative correction to the mass of the top quark. Since we judge that this numerological relation is not likely to be accidental, our present view is roughly as follows. The mass of the top quark is an input to the standard model, and the mass of the bottom quark is a first radiative correction to this mass of the top quark. The masses of the other fermions, both quarks and leptons, may also be thought of as radiative corrections to this mass of the top quark. How can we turn this view into an actual calculation? For this purpose, approximations are unavoidable. The best approximation that we have pursued is the “large $m_t$ approximation.” In this approximation, the mass of the top quark is taken to be much larger than all the other masses. So far as the fermions are concerned, this is a good approximation. However, in order to be able to proceed with this large $m_t$ approximation, we also had to take all the boson masses to be small compared with this $m_t$. Since the mass ratio $m_t^2/m_Z^2$ is only about 3.6, this approximation cannot be expected to be very accurate. Nevertheless, this is so far the best that we have been able to do. Let me add that the main tool for the actual calculation of the $m_b/m_t$ mass ratio is the one that has been developed for the theoretical prediction of the increasing total cross sections [13]. Specifically, the essence of the method is given in Appendix B of the book by Hung Cheng and me [14]. The use of the large $m_t$ approximation then leads to an integral equation, which can be reduced to coupled ordinary differential equations.
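The numerological relation $m_b^2/m_t^2 = \alpha/4\pi$ and the ratio $m_t^2/m_Z^2$ quoted above can be checked with a few lines of arithmetic. The sketch below takes the top-quark mass of 173 GeV/$c^2$ from the text; the low-energy value of the fine-structure constant, the Z-boson mass, and the reference bottom-quark mass are assumptions supplied here for illustration. In particular, the text does not say which definition of the bottom-quark mass is meant, and the comparison depends on that choice.

```python
import math

ALPHA = 1.0 / 137.035999  # fine-structure constant, low-energy value (assumption)
M_TOP = 173.0             # top-quark mass in GeV/c^2, as quoted in the text
M_Z = 91.1876             # Z-boson mass in GeV/c^2 (assumed reference value)
M_BOTTOM_REF = 4.18       # a commonly quoted bottom-quark mass in GeV/c^2 (assumption, scheme-dependent)


def bottom_mass_from_relation(m_top, alpha=ALPHA):
    """Solve m_b^2 / m_t^2 = alpha / (4 pi) for m_b."""
    return m_top * math.sqrt(alpha / (4.0 * math.pi))


def squared_mass_ratio(m_heavy, m_light):
    """Return m_heavy^2 / m_light^2."""
    return (m_heavy / m_light) ** 2


if __name__ == "__main__":
    m_b = bottom_mass_from_relation(M_TOP)
    print(f"m_b implied by the relation: {m_b:.3f} GeV/c^2")
    print(f"relative difference from the reference value: {abs(m_b - M_BOTTOM_REF) / M_BOTTOM_REF:.2%}")
    print(f"m_t^2 / m_Z^2: {squared_mass_ratio(M_TOP, M_Z):.2f}")
```

With these inputs the relation gives a bottom-quark mass of about 4.17 GeV/$c^2$, and the squared mass ratio of the top quark to the Z boson comes out close to the value of 3.6 stated in the text.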
This study of the ratio of the masses of elementary particles is highly speculative but has the potential of opening up a new way for the theoretical study of elementary particles; Alex would have appreciated it if he were still with us now.
References
1. Alex Grossmann and Tai Tsun Wu, Schrödinger Scattering Amplitude I, Journal of Mathematical Physics 2 (1961) 710.
2. Alex Grossmann, Schrödinger Scattering Amplitude II, Journal of Mathematical Physics 2 (1961) 714.
3. Alex Grossmann and Tai Tsun Wu, Schrödinger Scattering Amplitude III, Journal of Mathematical Physics 3 (1962) 684.
4. CDF Collaboration, Physical Review Letters 74 (1995) 2626; D0 Collaboration, Physical Review Letters 74 (1995) 2632.
5. Alex Grossmann and Tai Tsun Wu, Fermi pseudopotentials in higher dimensions, Journal of Mathematical Physics 25 (1984) 1742.
6. Alex Grossmann and Tai Tsun Wu, A Class of Potentials with Extremely Narrow Resonances, Chinese Journal of Physics 25 (1987) 129.
7. E. Fermi, Ric. Sci. 7 (1936) 13; J.M. Blatt and V.F. Weisskopf, Theoretical Nuclear Physics (Wiley, New York, 1952) pp. 74-75.
8. Kerson Huang and C.N. Yang, Physical Review 105 (1957) 767; Kerson Huang, C.N. Yang, and J.M. Luttinger, Physical Review 105 (1957) 776; T.D. Lee, Kerson Huang, and C.N. Yang, Physical Review 106 (1957) 1134; Tai Tsun Wu, Physical Review 115 (1959) 1390.
9. F. Englert and R. Brout, Physical Review Letters 13 (1964) 321; P.W. Higgs, Physics Letters 12 (1964) 132; G.S. Guralnik, C.R. Hagen, and T.W.B. Kibble, Physical Review Letters 13 (1964) 585.
10. ATLAS Collaboration, Physics Letters B 716 (2012) 1; CMS Collaboration, Physics Letters B 716 (2012) 30.
11. S.L. Glashow, Nuclear Physics 22 (1961) 579; S. Weinberg, Physical Review Letters 19 (1967) 1264; A. Salam, in N. Svartholm (Ed.), Proceedings of the 8th Nobel Symposium, Stockholm, 1968 (Almqvist, 1968) p. 367.
12. C.N. Yang and R.L. Mills, Physical Review 96 (1954) 191.
13. Hung Cheng and Tai Tsun Wu, Physical Review Letters 24 (1970) 1456.
14. Hung Cheng and Tai Tsun Wu, Expanding Protons: Scattering at High Energies (MIT Press, Cambridge, 1987) pp. 227-241.
Sixty Years of Hadronic Vacuum Polarization Eduardo de Rafael
Abstract After a short historical introduction I describe the present comparison between theory and experiment concerning the hadronic vacuum polarization (HVP) contribution to the anomalous magnetic moment of the muon, $a_\mu^{\rm HVP}$. This is at present the contribution with the largest uncertainty. The possible determination of $a_\mu^{\rm HVP}$ from first principles requires the knowledge of functions which, using lattice quantum chromodynamics (LQCD) techniques, can only be determined with sufficient accuracy in limited regions. Therefore, dedicated interpolation methods are needed to perform the required integrals of these functions. I am sure that Alex could have been of great help in that respect. In this contribution, I have decided to discuss various methods that some of us have been trying, with precise questions for Alex, as if he were in my office at the CPT.
1 Introduction
The first time I met Alex Grossmann was in 1966 at the IHES in Bures-sur-Yvette, where he was a senior visitor. I had just finished my PhD under the supervision of Louis Michel, and I was preparing to leave for a postdoc position at the BNL in the USA. Alex was a very friendly and curious person, and in a short time we became good friends, discussing not only physics but also music, languages, and politics. After the IHES, Alex was invited by Daniel Kastler to join the Centre de Physique Théorique (CPT) in Marseille as a permanent CNRS member. A few years later I was also offered a CNRS-DR position at the CPT-Marseille by Antoine Visconti, who was the CPT director at that time. The fact that Alex and Raymond Stora were already there, as well as younger particle physicists such as Chris Korthals-Altes, Michel Perrottet, and Jacques Calmet, was a strong motivation for me to accept that offer.
E. de Rafael () Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_9
Alex was not a particle physicist, but he was very curious about the progress in this field, both theoretical and experimental, and often came to my office with pertinent questions. Alex’s knowledge of physics and mathematics was encyclopaedic. “He was a living Wikipedia”. Every time I encountered problems in handling a new type of calculation, I would go to Alex’s office to get some wisdom, which he always generously provided in a down-to-earth manner. I never got from Alex the familiar “oh, you don’t know that?” which one sometimes gets from “learned” colleagues. At the time that I became the director of the CPT, I had the privilege to learn from Alex about his pioneering work on wavelets. I saw how his group of excellent PhD students and young collaborators, Thierry Paul and Bruno Torrésani among others, was growing little by little with more and more postdocs and, later on, with illustrious physicists and mathematicians coming to the CPT to visit Alex and his group. I think that if Alex were with us today he would very much like to know my opinion about the present discrepancy between theory and experiment concerning the anomalous magnetic moment of the muon and, particularly, the way we handle the contribution to this observable induced by the strong interactions. I have therefore decided to write this tribute to him in the same way as I imagine our discussions might have taken place at the CPT. I shall begin with a brief introduction which, although it would certainly not have been necessary for Alex, may be useful to some of the potential readers of this tribute who, perhaps, will also have welcome suggestions.
2 What Is Hadronic Vacuum Polarization?
Vacuum polarization is a phenomenon which arises in quantum field theory and was first discussed within the context of Quantum Electrodynamics (QED) in papers by Dirac [1] and Heisenberg [2]. It occurs when a virtual propagating photon creates electrically charged particle-antiparticle pairs. These particles can be leptons (like electrons and muons) or quark-antiquark pairs; the quarks are the fundamental matter fields of Quantum Chromodynamics (QCD), the gauge theory which describes the strong interactions of particles such as pions and protons, generically referred to as hadrons. Observable effects of the vacuum polarization induced by electron-positron pairs were first evaluated to lowest order in powers of the fine-structure constant $\alpha \simeq 1/137.03\ldots$ by Serber [3] and Uehling [4] in 1935. The subsequent formulation of QED after the war by the three Nobel laureates Feynman, Schwinger, and Tomonaga [5], as well as by Dyson [6], showed how to do systematic calculations in powers of the small coupling constant $\alpha$, in particular, the evaluation of vacuum polarization effects induced by leptons. Hadronic Vacuum Polarization (HVP) is the corresponding contribution induced by the strong interactions. It is the dominant hadronic contribution to low-energy observables like the energy levels of the hydrogen atom and the hyperfine splitting in muonium and positronium. It
contributes as a small but sizable correction, of $\mathcal{O}\left[(\alpha/\pi)^2\right]$, to the Dirac equation prediction $g = 2$ of the gyromagnetic $g$-factor of a charged spin-$1/2$ particle. It also contributes to the way that the coupling $\alpha$ is effectively modified at the energy scale of the electroweak $Z$-particle ($M_Z \sim 91$ GeV). From a technical point of view, the quantity we are concerned with is the Fourier transform of the vacuum expectation value of the time-ordered product of two electromagnetic hadronic four-vector currents at two separate space-time points ($g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$ is the metric tensor):
\[
\Pi^{\rm had}_{\mu\nu}(q) = i\int_{-\infty}^{+\infty} d^4x\; e^{iq\cdot x}\,\langle 0|\,T\left(J^{\rm had}_\mu(x)\,J^{\rm had}_\nu(0)\right)|0\rangle = \left(q_\mu q_\nu - q^2 g_{\mu\nu}\right)\Pi_{\rm had}(q^2)\,,
\tag{2.1}
\]
and $J^{\rm had}_\mu(x)$ denotes the hadronic electromagnetic current:
\[
J^{\rm had}_\mu(x) = (-ie)\left\{\frac{2}{3}\sum_{\psi=u,c,t}\bar\psi(x)\gamma_\mu\psi(x) - \frac{1}{3}\sum_{\psi=d,s,b}\bar\psi(x)\gamma_\mu\psi(x)\right\}\,,
\tag{2.2}
\]
constructed with the spinor functions $\psi(x)$ of the quark-field flavours $u, c, t$ and $d, s, b$, of charges $\tfrac{2}{3}$ and $-\tfrac{1}{3}$ in units of the electric charge $e$; the $\gamma_\mu$ are the Dirac matrices. The hadronic self-energy function $\Pi_{\rm had}(q^2)$ in Eq. (2.1) is a complex function of its complex variable $q^2$. In quantum field theory, two-point functions of this type are analytic functions [7, 8] of their argument (in our case the variable $q^2$) in the full complex plane, except for a cut on the real axis which in our case goes from the physical threshold at $q^2 = 4m_{\pi^\pm}^2$ ($m_{\pi^\pm}$ is the charged pion mass) to infinity.$^1$ As such, the on-shell renormalized $\Pi_{\rm had}(q^2)$ function, i.e., $\Pi_{\rm had}(q^2)$ subtracted at its value at $q^2 = 0$, obeys the dispersion relation (Cauchy’s theorem):
\[
\Pi_{\rm had}(q^2) = \int_{4m_{\pi^\pm}^2}^{\infty}\frac{dt}{t}\,\frac{q^2}{t - q^2 - i\epsilon}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,.
\tag{2.3}
\]
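Furthermore, the optical theorem relates the so-called hadronic spectral function $\frac{1}{\pi}\mathrm{Im}\,\Pi_{\rm had}(t)$ to the observable one-photon annihilation cross-section:
\[
\sigma(t)_{e^+e^-\to{\rm had}} \underset{m_e\to 0}{\sim} \frac{4\pi^2\alpha}{t}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,.
\tag{2.4}
\]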
This is the way that experimental data-driven determinations of $\frac{1}{\pi}\mathrm{Im}\,\Pi_{\rm had}(t)$ have been obtained.
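As a short worked consequence of Eq. (2.4), one may express the spectral function through the ratio $R(t)$ of the hadronic cross-section to the lowest-order muon-pair cross-section. The symbol $R(t)$ and the massless-muon normalization $\sigma_{\mu^+\mu^-} = 4\pi\alpha^2/3t$ are introduced here for illustration only; they are the standard textbook conventions and are not defined in the text above.

```latex
\[
R(t) \equiv \frac{\sigma(t)_{e^+e^-\to{\rm had}}}{\sigma(t)_{e^+e^-\to\mu^+\mu^-}}\,,
\qquad
\sigma(t)_{e^+e^-\to\mu^+\mu^-} \underset{m_\mu\to 0}{\sim} \frac{4\pi\alpha^2}{3t}
\quad\Longrightarrow\quad
\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)
= \frac{t}{4\pi^2\alpha}\,\sigma(t)_{e^+e^-\to{\rm had}}
= \frac{\alpha}{\pi}\,\frac{R(t)}{3}\,.
\]
```

In the free-quark limit $R = N_c\sum_i q_i^2$, so the spectral function tends to $\frac{\alpha}{\pi}\,\frac{1}{3}\,N_c\sum_i q_i^2$, the same combination that fixes the overall constant in Eq. (7.4).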
1 In fact Alex also worked on analyticity properties of scattering amplitudes in Quantum Mechanics, see, e.g., refs. [9–11]. For an overview of the subject in quantum field theory see the book by André Martin in ref. [12].
3 HVP and the Muon µ-Particle
The anomalous magnetic moment of the muon, $a_\mu$, is the deviation from the result $g_\mu = 2$ obtained from the Dirac equation:
\[
a_\mu = \frac{1}{2}\,(g_\mu - 2)\,,
\tag{3.1}
\]
and it has been directly measured to the remarkable accuracy of about half a part per million [13]
\[
a_\mu({\rm E821\text{-}BNL}) = 116\,592\,089\,(63)_{\rm stat}\,(33)_{\rm syst}\times 10^{-11}\quad [0.54\ {\rm ppm}]\,.
\tag{3.2}
\]
New experiments in progress at the Fermi National Accelerator Laboratory in the USA [14] and at the J-PARC facility in Japan [15] plan to reduce the present error by a factor of four. However, there is a persistent discrepancy, at the $\sim 4\sigma$ level, between the present experimental determination and the standard model prediction
\[
a_\mu({\rm SM}) = 116\,591\,810\,(43)\times 10^{-11}\,.
\tag{3.3}
\]
It turns out that the contribution to $a_\mu({\rm SM})$ which at present has the largest error$^2$ is precisely the one coming from the HVP contribution, which during the last few years has been evaluated with increasing accuracy by Michel Davier and collaborators from experimental data-driven determinations of the hadronic spectral function, with the result [18]:
\[
a_\mu^{\rm HVP} = (6.931 \pm 0.034)\times 10^{-8}\,.
\tag{3.4}
\]
An independent analysis of the data has also been made by the authors of ref. [19] with the result:
\[
a_\mu^{\rm HVP} = (692.78 \pm 2.42)\times 10^{-10}\,.
\tag{3.5}
\]
The fact that these results, in spite of their remarkable accuracy, are still at the origin of the largest error in the $a_\mu({\rm SM})$ result in Eq. (3.3) is the main reason why the HVP contribution has been attracting much attention. In fact, a recent preliminary theoretical evaluation from first principles, using lattice QCD techniques (LQCD), finds a result with already a competitive accuracy [20]:
\[
a_\mu^{\rm HVP}({\rm BMWc}) = (708.7 \pm 5.3)\times 10^{-10}
\tag{3.6}
\]
2 See, e.g., ref. [16] and especially ref. [17] for details of the different types of contributions in the Standard Model; in particular, from the hadronic light-by-light scattering, which is the contribution to $a_\mu({\rm SM})$ with the next largest error.
Fig. 1 Feynman diagram of the hadronic vacuum polarization contribution. The X in the muon line represents the external magnetic field
which, if confirmed, would bring theory and experiment into agreement within errors; but then of course one will have to find out the origin of the discrepancy between the data-driven results in Eqs. (3.4) and (3.5) and the LQCD result in Eq. (3.6). In the language of Feynman diagrams, the $a_\mu^{\rm HVP}$ contribution is shown in Fig. 1, where the red disc represents the contribution from all possible hadronic states which couple to a photon (the wavy lines). This contribution can be seen as induced by the following modification of a free photon propagator in a covariant gauge (fixed by the parameter $a$ in the equations below):
\[
\begin{aligned}
\frac{-i\left[g_{\mu\nu} - (1-a)\dfrac{q_\mu q_\nu}{q^2}\right]}{q^2 + i\epsilon}
\;\Rightarrow\;&
\frac{-i\left[g_{\mu\alpha} - (1-a)\dfrac{q_\mu q_\alpha}{q^2}\right]}{q^2 + i\epsilon}\;
i\,\Pi^{\alpha\beta}_{\rm had}(q)\;
\frac{-i\left[g_{\beta\nu} - (1-a)\dfrac{q_\beta q_\nu}{q^2}\right]}{q^2 + i\epsilon} \\
=\;& i\left(g_{\mu\nu} - \frac{q_\mu q_\nu}{q^2}\right)\frac{\Pi_{\rm had}(q^2)}{q^2}\,.
\end{aligned}
\tag{3.7}
\]
Notice that the terms proportional to $q_\mu q_\alpha$ and $q_\beta q_\nu$ in the free propagators on the R.H.S. of the first line give no contribution when acting on the transverse tensor structure of $\Pi^{\alpha\beta}_{\rm had}$. Therefore the gauge dependence remains only in the case of a free field propagator.$^3$ Using the dispersion relation in Eq. (2.3) we can therefore write the HVP-induced modification of the free photon propagator in Eq. (3.7) as follows:
\[
\frac{-i\left[g_{\mu\nu} - (1-a)\dfrac{q_\mu q_\nu}{q^2}\right]}{q^2 + i\epsilon}
\;\Rightarrow\;
\int_{4m_{\pi^\pm}^2}^{\infty}\frac{dt}{t}\;
\frac{-i\left(g_{\mu\nu} - \dfrac{q_\mu q_\nu}{q^2}\right)}{q^2 - t}\;
\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\tag{3.8}
\]
showing that the HVP contribution can be seen as the convolution of the hadronic spectral function with a free but massive-like photon propagator ($t$ here acts as an effective squared mass). The insertion of this massive-like photon propagator in the muon line vertex of the diagram in Fig. 1, represented by the $q$-average $\langle\ldots\rangle_\mu$ in the equation below, then results in a function of $t$ which in Feynman $x$-parametric form is given by the integral representation
3 In fact, the contribution proportional to q in the free photon propagator of the lowest order Feynman diagram vanishes in the evaluation of the anomaly as a result of a gauge invariance Ward identity. At higher orders, however, it is only specific combinations of Feynman diagrams with free photon propagator insertions that result in gauge invariant contributions.
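\[
\left\langle\frac{-i\left(g_{\mu\nu} - \dfrac{q_\mu q_\nu}{q^2}\right)}{q^2 - t}\right\rangle_{\mu}
\;\Rightarrow\;
\int_0^1 dx\,\frac{x^2(1-x)}{x^2 + \dfrac{t}{m_\mu^2}(1-x)} \equiv K(t)\,.
\tag{3.9}
\]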
In terms of this kernel function $K(t)$ one finally gets the result obtained by Bouchiat and Michel [21]:
\[
a_\mu^{\rm HVP} = \frac{\alpha}{\pi}\int_{4m_{\pi^\pm}^2}^{\infty}\frac{dt}{t}\,K(t)\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\tag{3.10}
\]
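which is the so-called dispersive representation used in all the data-driven determinations.$^4$ An alternative representation of $a_\mu^{\rm HVP}$ in terms of the hadronic self-energy function $\Pi_{\rm had}(q^2)$ itself in the Euclidean ($Q^2 \equiv -q^2 \ge 0$) follows from a rearrangement of the integrands in Eqs. (3.9) and (3.10):
\[
a_\mu^{\rm HVP} = \frac{\alpha}{\pi}\int_0^1 dx\,(1-x)\int_{t_0}^{\infty}\frac{dt}{t}\,
\frac{\dfrac{x^2}{1-x}\,m_\mu^2}{t + \dfrac{x^2}{1-x}\,m_\mu^2}\;\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\qquad t_0 = 4m_{\pi^\pm}^2\,,
\tag{3.11}
\]
\[
\phantom{a_\mu^{\rm HVP}} = -\frac{\alpha}{\pi}\int_0^1 dx\,(1-x)\;\Pi_{\rm had}\!\left(\frac{x^2}{1-x}\,m_\mu^2\right)\,,
\qquad Q^2 \equiv \frac{x^2}{1-x}\,m_\mu^2\,,
\tag{3.12}
\]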
where the dispersion relation in Eq. (2.3) has been used to go from the first line to the second one. The representation in Eq. (3.12) was found to be very useful in some of the early QED calculations (see the review article in ref. [22] and also ref. [23]). This representation has now become of special interest for LQCD evaluations of the HVP contribution [24] because the formulation of HVP in the Euclidean is directly accessible to LQCD evaluations. The phenomenological point, first raised in ref. [21], and with the advent of more experimental information a few years later in ref. [25], is that because of the presence of hadronic resonances in the $\sigma(t)_{e^+e^-\to{\rm had}}$ cross-section, in particular the dominant one from the lowest $\rho$ vector meson resonance, and because of the shape of the function $K(t)$ which I discuss below, the HVP contribution is enhanced when compared to a naive estimate. The 1969 result found in ref. [25]:
\[
a_\mu^{\rm HVP} = (6.5 \pm 0.5)\times 10^{-8}\,,
\tag{3.13}
\]
which is consistent with the determinations in Eqs. (3.4), (3.5) and (3.6), shows the progress made since then: the accuracy of the $a_\mu^{\rm HVP}$ determination has been improved from $8\%$ to $\sim 0.5\%$.
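The kernel of Eq. (3.9) is a one-dimensional integral and is easy to evaluate numerically. The following is a minimal sketch using only the Python standard library; the mass values and the helper names `simpson` and `kernel` are assumptions introduced for illustration.

```python
# Assumed inputs in GeV (approximate reference values, not from the text).
M_MU = 0.1056584          # muon mass
M_PI = 0.13957            # charged pion mass
T0 = 4 * M_PI**2          # hadronic threshold t0 = 4 m_pi^2

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def kernel(t, m=M_MU):
    """K(t) of Eq. (3.9) for t > 0, as a Feynman-parameter integral over x."""
    r = t / m**2
    return simpson(lambda x: x * x * (1 - x) / (x * x + r * (1 - x)), 0.0, 1.0)

for t in (0.01, T0, 0.6, 10.0):
    # The last column is the large-t bound m^2 / (3 t) on K(t).
    print(f"t = {t:8.4f} GeV^2   K(t) = {kernel(t):.6f}   m^2/(3t) = {M_MU**2 / (3 * t):.6f}")
```

The printed values illustrate that $K(t)$ is positive, decreases with $t$, and stays below $m_\mu^2/3t$; at $t = 0$ the integrand reduces to $(1-x)$, which gives $K(0) = 1/2$ analytically.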
4 I remember Claude Bouchiat discussing this with Louis Michel in Louis’s office sixty years ago, but I could not understand a word of it at that time!
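Let us examine the kernel function $K(t)$ in more detail [25–27]:
• In the range $0 \le t \le \infty$ it is positive and decreases monotonically, with the behaviours:
\[
K(t) \underset{t\to 0}{\sim} \frac{1}{2}
\qquad\text{and}\qquad
K(t) \underset{t\to\infty}{\sim} \frac{1}{3}\,\frac{m_\mu^2}{t} + \mathcal{O}\left[\left(\frac{m_\mu^2}{t}\right)^2\log\frac{t}{m_\mu^2}\right]\,.
\tag{3.14}
\]
• The value at $t = 0$ recovers the famous $\frac{\alpha}{2\pi}$ result of the lowest order Schwinger calculation [28], valid for the anomalous magnetic moment of any charged fermion with spin $1/2$.
• Rigorous bounds on $a_\mu^{\rm HVP}$ were derived in 1969 with the result [29]:
\[
\frac{\alpha}{\pi}\,\frac{1}{3}\,\frac{m_\mu^2}{t_0}\,M(0)\left[1 - f\!\left(\frac{t_0}{m_\mu^2}\right)\right]
< a_\mu^{\rm HVP} <
\frac{\alpha}{\pi}\,\frac{1}{3}\,\frac{m_\mu^2}{t_0}\,M(0)\,,
\tag{3.15}
\]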
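where $M(0)$ is the leading moment of the hadronic spectral function, related to the slope of the self-energy function $\Pi_{\rm had}(Q^2)$ at the origin:
\[
M(0) \equiv \int_{t_0}^{\infty}\frac{dt}{t}\,\frac{t_0}{t}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)
= -t_0\left.\frac{\partial\,\Pi_{\rm had}(Q^2)}{\partial Q^2}\right|_{Q^2=0}
\equiv \frac{\alpha}{\pi}\,\frac{t_0}{\Lambda_{\rm had}^2}\,,
\tag{3.16}
\]
and
\[
f\!\left(\frac{t_0}{m_\mu^2}\right) \equiv 3\int_0^1 dx\,\frac{x^4}{x^2 + \dfrac{t_0}{m_\mu^2}(1-x)} = 0.36655\,,
\qquad\text{for}\quad t_0 = 4m_{\pi^\pm}^2\,.
\tag{3.17}
\]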
• The present experimental determination of the scale $\Lambda_{\rm had}$ on the R.H.S. of Eq. (3.16)$^5$ gives
\[
\Lambda_{\rm had} = (0.502 \pm 0.001)\ {\rm GeV}
\tag{3.18}
\]
and fixes the coupling of the underlying leading effective operator of dimension $d = 6$,
\[
\frac{1}{\Lambda_{\rm had}^2}\;\partial^\lambda F^{\mu\nu}\,\partial_\lambda F_{\mu\nu}\,,
\tag{3.19}
\]
which governs the size of all low-energy electromagnetic hadronic observables [29].
5 Private communication from the authors of ref. [19].
• In the case of the electron $(g_e - 2)_{\rm HVP}$ the bounds in Eq. (3.15) are practically a calculation. Using the experimental determination of the lowest HVP moment:
\[
\int_{t_0}^{\infty}\frac{dt}{t}\,\frac{t_0}{t}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t) = (0.7176 \pm 0.0026)\times 10^{-3}\,,
\tag{3.20}
\]
one finds:
\[
1.8550\times 10^{-12} < a_e^{\rm HVP} < 1.8687\times 10^{-12}\,,
\tag{3.21}
\]
which corresponds to an accuracy of 0.36%. I recall that the lowest order HVP experimental determination in the case of the electron anomaly gives [19]:
\[
a_e^{\rm HVP} = 1.8608(66)\times 10^{-12}\,.
\tag{3.22}
\]
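The bounds of Eq. (3.15) can be evaluated directly from the moment in Eq. (3.20). The sketch below uses only the standard library; the masses, the value of $\alpha$ and the helper names are assumptions, and the upper (lower) bound is taken at the upper (lower) edge of the quoted error on $M(0)$, which is an interpretation of how Eq. (3.21) was obtained.

```python
import math

# Assumed inputs (approximate reference values, masses in GeV).
ALPHA = 1 / 137.035999
M_E, M_MU, M_PI = 0.51099895e-3, 0.1056584, 0.13957
T0 = 4 * M_PI**2
M0, DM0 = 0.7176e-3, 0.0026e-3     # lowest moment and its error, Eq. (3.20)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f_bound(r):
    """f(r) of Eq. (3.17), r = t0/m^2. The substitution 1 - x = v**8 spreads out
    the region x -> 1, which is very narrow for the electron (r ~ 3e5)."""
    def g(v):
        u = v**8
        return 3 * (1 - u)**4 / ((1 - u)**2 + r * u) * 8 * v**7
    return simpson(g, 0.0, 1.0)

def bounds(m):
    """Lower and upper bounds of Eq. (3.15) for a lepton of mass m."""
    coef = ALPHA / math.pi / 3 * m**2 / T0
    return coef * (M0 - DM0) * (1 - f_bound(T0 / m**2)), coef * (M0 + DM0)

f_mu = f_bound(T0 / M_MU**2)
lower_e, upper_e = bounds(M_E)
lower_mu, upper_mu = bounds(M_MU)
print(f"f(t0/m_mu^2) = {f_mu:.5f}")
print(f"electron: {lower_e:.4e} < a_e^HVP < {upper_e:.4e}")
print(f"muon:     {lower_mu:.4e} < a_mu^HVP < {upper_mu:.4e}")
```

For the electron the two bounds nearly coincide because $f(t_0/m_e^2)$ is of order $10^{-4}$; for the muon they are far apart, and the data-driven value of Eq. (3.4) lies between them.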
4 Mellin-Barnes Representation of $a_\mu^{\rm HVP}$
Equation (3.16) is a particular case of the moment identities:
\[
\underbrace{\frac{(-1)^{n+1}}{(n+1)!}\,(t_0)^{n+1}\left.\frac{\partial^{n+1}\Pi_{\rm had}(Q^2)}{(\partial Q^2)^{n+1}}\right|_{Q^2=0}}_{\rm LQCD}
=
\underbrace{\int_{t_0}^{\infty}\frac{dt}{t}\left(\frac{t_0}{t}\right)^{1+n}\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)}_{\rm Experiment}\,,
\qquad n = 0, 1, 2, \cdots\,,
\tag{4.1}
\]
which follow from the dispersion relation in Eq. (2.3) and, as discussed in refs. [30–32], provide excellent tests to confront LQCD evaluations of the L.H.S. with the experimental data-driven moments on the R.H.S. The moments on the R.H.S. of Eq. (4.1) are specific values of the more general Mellin transform of the hadronic spectral function
\[
M(s) = \int_{t_0}^{\infty}\frac{dt}{t}\left(\frac{t}{t_0}\right)^{s-1}\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\tag{4.2}
\]
at $s = 0, -1, -2, \cdots$,
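and an alternative representation of $a_\mu^{\rm HVP}$ in terms of this Mellin transform has been proposed in ref. [30]:
\[
a_\mu^{\rm HVP} = \frac{\alpha}{\pi}\,\frac{m_\mu^2}{t_0}\,\frac{1}{2\pi i}\int_{c_s - i\infty}^{c_s + i\infty} ds\,\left(\frac{m_\mu^2}{t_0}\right)^{-s} F(s)\,M(s)\,,
\tag{4.3}
\]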
where $c_s \equiv \mathrm{Re}(s) \in\, ]0, 1[$, $t_0 \equiv 4m_{\pi^\pm}^2 > m_\mu^2$, and $F(s)$ is a known product of Gamma functions
\[
F(s) = -\Gamma(3 - 2s)\,\Gamma(-3 + s)\,\Gamma(1 + s)\,.
\tag{4.4}
\]
The singular expansion of $F(s)$ at the L.H.S. of the fundamental strip $c_s \equiv \mathrm{Re}(s) \in\, ]0, 1[$,$^6$ i.e.,
\[
F(s) \asymp \frac{1}{3s} - \frac{1}{(s+1)^2} + \frac{25}{12}\,\frac{1}{s+1} - \frac{6}{(s+2)^2} + \frac{97}{10}\,\frac{1}{s+2} + \cdots\,,
\tag{4.5}
\]
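when inserted in the integrand of Eq. (4.3), gives successive contributions proportional to powers of $\frac{m_\mu^2}{t_0}$. The contributions from the simple poles are modulated by the moments in Eq. (4.2); those from the double poles by the log-weighted moments:
\[
\tilde{M}(-n) = \int_{t_0}^{\infty}\frac{dt}{t}\left(\frac{t_0}{t}\right)^{1+n}\log\frac{t}{t_0}\;\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\qquad n = 1, 2, 3, \cdots\,,
\tag{4.6}
\]
with the result:
\[
\begin{aligned}
a_\mu^{\rm HVP} = \frac{\alpha}{\pi}\,\frac{m_\mu^2}{t_0}\Bigg\{ & \frac{1}{3}\,M(0)
+ \frac{m_\mu^2}{t_0}\left[\left(\frac{25}{12} - \log\frac{t_0}{m_\mu^2}\right)M(-1) - \tilde{M}(-1)\right] \\
& + \left(\frac{m_\mu^2}{t_0}\right)^2\left[\left(\frac{97}{10} - 6\log\frac{t_0}{m_\mu^2}\right)M(-2) - 6\,\tilde{M}(-2)\right]
+ \mathcal{O}\left[\left(\frac{m_\mu^2}{t_0}\right)^3\right]\Bigg\}\,.
\end{aligned}
\tag{4.7}
\]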
The bulk of the overall contribution to .aμHVP comes in fact from just the first few terms of this expansion. The first term is the upper-bound of ref. [29] with successive fast improvements from the following terms.
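The expansion in Eq. (4.7) is, term by term, the large-$t$ expansion of the kernel $K(t)$ integrated against the spectral function, so the coefficients in Eq. (4.5) can be cross-checked against a direct numerical evaluation of Eq. (3.9). The sketch below does this at a single value $t = 0.6\ {\rm GeV}^2$, chosen here as an assumption because it is close to the squared $\rho$ mass; the helper names are also assumptions.

```python
import math

M_MU = 0.1056584   # muon mass in GeV (assumed input)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def kernel(t, m=M_MU):
    """K(t) of Eq. (3.9), evaluated numerically."""
    r = t / m**2
    return simpson(lambda x: x * x * (1 - x) / (x * x + r * (1 - x)), 0.0, 1.0)

def kernel_series(t, m=M_MU):
    """First three terms of the expansion of K(t) in powers of m^2/t,
    read off from the poles of F(s) at s = 0, -1, -2 in Eq. (4.5)."""
    eps = m**2 / t
    log = math.log(t / m**2)
    return eps * (1 / 3 + eps * (25 / 12 - log) + eps**2 * (97 / 10 - 6 * log))

t_rho = 0.6   # GeV^2, roughly the rho-meson mass squared (assumption)
exact, series = kernel(t_rho), kernel_series(t_rho)
print(f"K = {exact:.6e}, three-term series = {series:.6e}, ratio = {series / exact:.4f}")
```

The agreement is at the per mille level near the $\rho$ region, while it degrades towards the threshold $t_0$, where $m_\mu^2/t$ is no longer small.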
5 The Time Momentum Representation
A few years ago the authors of ref. [35] made the crucial observation that the correlation function in Eq. (2.1), when expressed in the Euclidean where
\[
x_0 \to -i x_0\,,\qquad x_i \to x_i\,,\qquad d^4x \to -i\,dx_0\,dx_1\,dx_2\,dx_3
\qquad\text{and}\qquad
q^\mu x_\mu \to -q_0 x_0 - \sum_{i=1,2,3} q_i x_i\,,
\tag{5.1}
\]
and in the special kinematic configuration where $\vec{q} = 0$, becomes the Fourier transform:
6 For a precise definition of the concepts singular expansion and fundamental strip see the excellent review article in ref. [33].
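\[
\Pi_{ij}(q_0, \vec{0}) = \int_{-\infty}^{+\infty} dx_0\; e^{-iq_0 x_0}\;
\underbrace{\int_{-\infty}^{+\infty} d^3\vec{x}\;\delta_{ij}\,\langle 0|\,T\left(J_i(x_0, \vec{x})\,J_j(0)\right)|0\rangle}_{G(x_0)}\,,
\tag{5.2}
\]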
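of a function $G(x_0)$ which is accessible to LQCD evaluations. The corresponding integral representation of $a_\mu^{\rm HVP}$ in terms of $G(x_0)$ is called the Time Momentum Representation (TMR) [35]:
\[
a_\mu^{\rm HVP} = \frac{\alpha}{\pi}\,m_\mu^2\int_0^{\infty} dx_0\;\mathcal{G}(m_\mu x_0)\;
\underbrace{\int_{\sqrt{t_0}}^{\infty} d\omega\;\omega^2\,e^{-\omega|x_0|}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(\omega^2)}_{G(x_0)\,,\ x_0\neq 0}\,,
\tag{5.3}
\]
where $\omega^2 = t$ is the Minkowski $t$-variable of the spectral function, and the kernel $\mathcal{G}(m_\mu x_0)$ is given by a Meijer’s G-function:
\[
\mathcal{G}(m_\mu x_0) \equiv \frac{1}{2}\,x_0^4\;G^{2,3}_{3,5}\left((m_\mu x_0)^2\,\middle|\,
\begin{matrix} -1,\ -\tfrac{1}{2},\ 0\,;\ \text{--} \\[2pt] 0,\ 1,\ -3,\ -\tfrac{3}{2},\ -2 \end{matrix}\right)\,.
\tag{5.4}
\]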
The shape of this Meijer’s $G^{2,3}_{3,5}(\ldots)$ function versus the dimensionless variable $\hat{x}_0 = m_\mu x_0$ is shown in Fig. 2.

Fig. 2 Shape of the Meijer G-kernel function in Eq. (5.4) versus $\hat{x}_0 = m_\mu x_0$

The Time Momentum Representation in Eq. (5.3) and the Spectral Function Representation in Eq. (3.10) are completely equivalent. One can see that from the fact that the $x_0$-integral in Eq. (5.3) can be done analytically and coincides with the analytic expression for the kernel $K(t = \omega^2)$ evaluated in ref. [27]. Furthermore, the power moments in the $x_0$-variable are also related to the usual Mellin moments as follows:
\[
\int_0^{\infty} dx_0\; x_0^{2n}\,G(x_0) = \Gamma(2n+1)\,\frac{1}{2}\int_{t_0}^{\infty}\frac{dt}{t}\;t^{1-n}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\qquad n\ \text{integer} \ge 2\,,
\tag{5.5}
\]
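which are, up to the normalization factor $t_0^{1-n}$, the values of the Mellin transform in Eq. (4.2) at $s = 2 - n$. The $\log\frac{t}{t_0}$-weighted Mellin moments in Eq. (4.6) are also related to the $\log x_0$-weighted $x_0$-moments as follows ($\psi(x) = \frac{d}{dx}\log\Gamma(x)$):
\[
\int_0^{\infty} dx_0\; x_0^{2n}\,\log x_0\;G(x_0)
= \Gamma(2n+1)\,\frac{1}{2}\int_{t_0}^{\infty}\frac{dt}{t}\;t^{1-n}\left[\psi(2n+1) - \log\sqrt{t}\,\right]\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,,
\qquad n \ge 3\,.
\tag{5.6}
\]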
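The step behind Eq. (5.5) is a single Laplace-type integral followed by the change of variable $t = \omega^2$. A short worked version, using only the definition of $G(x_0)$ given under the brace in Eq. (5.3), may be written as follows:

```latex
\[
\int_0^\infty dx_0\, x_0^{2n}\, e^{-\omega x_0} = \frac{\Gamma(2n+1)}{\omega^{2n+1}}
\quad\Longrightarrow\quad
\int_0^\infty dx_0\, x_0^{2n}\, G(x_0)
= \Gamma(2n+1)\int_{\sqrt{t_0}}^\infty d\omega\;\omega^{1-2n}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(\omega^2)\,,
\]
\[
t = \omega^2\,,\quad d\omega = \frac{dt}{2\sqrt{t}}
\quad\Longrightarrow\quad
\int_{\sqrt{t_0}}^\infty d\omega\;\omega^{1-2n}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(\omega^2)
= \frac{1}{2}\int_{t_0}^\infty \frac{dt}{t}\; t^{1-n}\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)\,.
\]
```

Differentiating the first identity with respect to the power of $x_0$ produces the factor $\psi(2n+1) - \log\omega$, which is the origin of the bracket in Eq. (5.6).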
6 Limitations of the LQCD Evaluations
The LQCD evaluations of the HVP contribution to the muon anomaly provide a challenge to test the data-driven results which have been obtained using the dispersive representation in Eq. (3.10). Unfortunately, LQCD simulations at present do not have access, with sufficient numerical precision, to the full integration regions which are required in the different representations discussed above: the range in $Q^2$ is limited when using the Euclidean representation in Eq. (3.11); only a few moments of the spectral function are accessible with sufficient precision; and in the time momentum representation, which is the one favoured by the LQCD practitioners at present, the function $G(x_0)$ can only be determined with adequate precision at a finite number of $x_0$-points. These limitations can be partially compensated by the fact that one knows quite well, from perturbation theory in QCD (pQCD), the behaviour of the required hadronic functions at short distances; and at very long distances they are constrained by chiral perturbation theory ($\chi$PT), which is the effective field theory of QCD at the low-energy scale where only the Goldstone-like degrees of freedom (the low-lying pseudoscalar particles in the hadronic spectrum) are dynamically active. What is required, therefore, is an interpolation which covers the behaviour of the underlying hadronic functions in the full integration domain needed to evaluate $a_\mu^{\rm had}$. The authors of ref. [20] have used specific methods adapted to their particular LQCD formulation, but it is important to have a well-defined and systematic interpolation technique in the continuum as well, in order to reduce uncertainties. At this point, the potential reader of this tribute to Alex Grossmann will realize that: This is precisely the kind of wisdom that Alex could have provided. What systematic interpolation methods applicable to the situations described above can one use?
The rest of my contribution describes various approaches which some of us have been trying, but first, several observations are in order:
• The spectral function $\frac{1}{\pi}\mathrm{Im}\,\Pi_{\rm had}(t)$ is positive in all the physical domain $4m_{\pi^\pm}^2 \le t \le \infty$ (it is proportional to a cross-section). However, it is known from experiment that it has a lot of structure (superposition of many resonant-multiparticle states). It is not possible from the present knowledge of the QCD underlying theory to compute the $t$-dependence of this function in its full physical domain.
• By contrast, the three functions:
1. $\Pi_{\rm had}(Q^2)$ in Eq. (2.3), which is the Hilbert Transform of $\frac{1}{\pi}\mathrm{Im}\,\Pi_{\rm had}(t)$
2. The Mellin Transform of the spectral function
\[
M(s) = \int_{t_0}^{\infty}\frac{dt}{t}\left(\frac{t}{t_0}\right)^{s-1}\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(t)
\tag{6.1}
\]
3. And the Time Momentum Representation Function in Eq. (5.3), which is the second derivative of the Laplace Transform $L(x_0)$ of the spectral function:
\[
G(x_0) = \left(-\frac{\partial}{\partial x_0}\right)^2 L(x_0) = \int_{\sqrt{t_0}}^{\infty} d\omega\; e^{-\omega x_0}\,\omega^2\,\frac{1}{\pi}\,\mathrm{Im}\,\Pi_{\rm had}(\omega^2)\,,
\qquad \omega^2 = t
\tag{6.2}
\]
are all smooth functions of their variables: $\Pi_{\rm had}(Q^2)$ is a Stieltjes type function and in fact $M(s)$ and $G(x_0)$ are completely monotonic functions of their arguments. Question for Alex: Are these properties sufficient to assert that there exist rigorous interpolation methods for these functions?
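Let me state a few more facts:
• At small $Q^2$ values, $\Pi_{\rm had}(Q^2)$ has a power series expansion:
\[
-\frac{t_0}{Q^2}\,\Pi_{\rm had}(Q^2) \underset{Q^2\to 0}{\sim} M(0) - \frac{Q^2}{t_0}\,M(-1) + \left(\frac{Q^2}{t_0}\right)^2 M(-2) - \left(\frac{Q^2}{t_0}\right)^3 M(-3) + \cdots\,,
\tag{6.3}
\]
where the coefficients $M(0)$, $M(-n)$, $n = 1, 2, 3, \ldots$ are precisely the moments of the spectral function. Ramanujan’s master theorem [36, 37] then implies that
\[
\int_0^{\infty} d\left(\frac{Q^2}{t_0}\right)\left(\frac{Q^2}{t_0}\right)^{s-1}\left\{ M(0) - \frac{Q^2}{t_0}\,M(-1) + \left(\frac{Q^2}{t_0}\right)^2 M(-2) + \cdots\right\} = \Gamma(s)\,\Gamma(1-s)\,M(s)\,,
\tag{6.4}
\]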
which allows one to reconstruct the Mellin transform $M(s)$ in the full complex $s$-plane from just the functional $n$-dependence of the discrete moments $M(-n)$, by the simple replacement $n \to -s$. Unfortunately, Ramanujan’s theorem does
not tell us which is the best interpolating function we should use to approximate the exact $M(s)$ function when one only knows, numerically, a few $M(-n)$ moments.$^7$
• Marichev Class of Mellin Transforms. The class in question is the one defined by standard products of gamma functions of the type
\[
M(s)_{\rm Marichev} = C\,\prod_{i,j,k,l}\frac{\Gamma(a_i - s)\,\Gamma(c_j + s)}{\Gamma(b_k - s)\,\Gamma(d_l + s)}\,,
\tag{6.5}
\]
with constants $C$, $a_i$, $b_k$, $c_j$, and $d_l$ and where the Mellin variable $s$ only appears with a $\pm$ coefficient. The interesting thing about this class of functions is that all the Generalized Hypergeometric Functions have Mellin transforms of this type [38]. As a result, many functions one encounters in Mathematical Physics have a representation in terms of Mellin-Barnes integrals involving linear combinations of standard products of the Marichev type in Eq. (6.5). Furthermore, in our case, the monotonicity property of the Mellin transform implies precise restrictions on the subclass of Marichev-like functions that one may consider as potential interpolating functions. Based on these observations I shall next discuss the various methods that we have studied.
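Ramanujan’s master theorem in Eq. (6.4) can be illustrated on the simplest toy case, which is an assumption made here and not a model used in the text: a spectral function concentrated at $t = t_0$, for which every moment equals one, $M(s) = 1$, and the series in Eq. (6.3) sums to $1/(1 + Q^2/t_0)$. The sketch below checks numerically that the Mellin integral of this function equals $\Gamma(s)\Gamma(1-s)$ inside the fundamental strip.

```python
import math

def simpson(f, a, b, n=8000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mellin_of_toy(s, ymax=80.0):
    """Integral over x in (0, inf) of x^(s-1) / (1 + x), with x = exp(y).
    The toy function 1/(1+x) = 1 - x + x^2 - ... has all moments M(-n) = 1."""
    return simpson(lambda y: math.exp(s * y) / (1.0 + math.exp(y)), -ymax, ymax)

def ramanujan_rhs(s, mellin_moment=1.0):
    """Right-hand side of Eq. (6.4) with M(s) = 1 for the toy spectral function."""
    return math.gamma(s) * math.gamma(1.0 - s) * mellin_moment

for s in (0.3, 0.5, 0.7):
    print(f"s = {s}: integral = {mellin_of_toy(s):.8f}, Gamma(s)Gamma(1-s) = {ramanujan_rhs(s):.8f}")
```

In this toy case the replacement $n \to -s$ is trivial; the difficulty described in the text arises when only a few numerical moments are known and their functional dependence on $n$ has to be guessed.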
7 Mellin-Barnes Approximants
Given $N$ numerical values of the moments $M(-n)$, $n = 0, 1, 2, 3, \cdots, N-1$, one method that we have proposed [32] is the one which constructs successive $M_N(s)$ functions of the Marichev class with the coefficients $C$, $a_i$, $b_k$, $c_j$ and $d_l$ fixed so as to reproduce the values of the first $N$ moments. These $M_N(s)$ approximants are then inserted in the integrand of the R.H.S. of Eq. (4.3) to evaluate the corresponding anomaly approximants $a_\mu^{\rm HVP}(N)$. The numerical integration has to be made at a fixed $c_s$ value within the fundamental strip along the corresponding imaginary axis: e.g., with the choice
\[
s = \frac{1}{2} - i\tau\,,
\tag{7.1}
\]
the integral in Eq. (4.3) becomes a Fourier transform:
7 Padé approximants to $\Pi_{\rm had}(Q^2)$ cannot be the answer because they fail to reproduce the singular pQCD behaviour of $M(s)$ at $s = 1$. One needs an infinite number of Padé-like terms for that.
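\[
a_\mu^{\rm HVP}(N) = \frac{\alpha}{\pi}\,\sqrt{\frac{m_\mu^2}{t_0}}\;\frac{1}{2\pi}\int_{-\infty}^{+\infty} d\tau\; e^{\,i\tau\log\frac{m_\mu^2}{t_0}}\;F\!\left(\frac{1}{2} - i\tau\right) M_N\!\left(\frac{1}{2} - i\tau\right)\,.
\tag{7.2}
\]

Fig. 3 Shape of the function $F\left(\frac{1}{2} - i\tau\right)$ in Eq. (4.3) versus $\tau$. The red curve is the real part of the function, the blue dashed curve its imaginary part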
The weight function $F\left(\frac{1}{2} - i\tau\right)$ defined in Eq. (4.4) is shown in Fig. 3 as a function of $\tau$. Notice that its real part (the red curve) is symmetric under $\tau \to -\tau$ while its imaginary part is antisymmetric. Both the real and imaginary parts fall very fast as $\tau$ increases. Because of the shape of the function $F\left(\frac{1}{2} - i\tau\right)$ and the growth restrictions on $M\left(\frac{1}{2} - i\tau\right)$ for large $\tau$, which are fixed by the fact that $\Pi(Q^2)$ obeys a dispersion relation in QCD, the Fourier transform above is fully dominated by the behaviour of the integrand in a very restricted $\tau$-interval, $-T \le \tau \le +T$ with $T$ of order one. Marichev-like approximants constructed this way have been tested [32] with well-known analytic results in QED; in particular, with the rather complicated analytic expression for the contribution to $a_\mu$ from the muon vacuum polarization at two loops evaluated in ref. [34]. The results for the successive $a_\mu^{\rm HVP}(N)$ approximants as a function of the number $N$ of input moments are shown in the second column of the Table below. They show a fast improvement of the accuracy at which they reproduce the exact result.

$a_\mu^{\rm HVP}(N)$ results from Mellin approximants $M_N(s)$, in units of $\left(\frac{\alpha}{\pi}\right)^3$

Input moments                          $a_\mu^{\rm HVP}(N)$    Accuracy
M(0)                                   0.0500007               5%
M(0), M(−1)                            0.0531447               0.5%
M(0), M(−1), M(−2)                     0.0528678               0.004%
M(0), M(−1), M(−2), M(−3)              0.0528711               0.00075%
M(0), M(−1), M(−2), M(−3), M(−4)       0.0528706               0.00018%
This seems very encouraging, but we cannot prove that the accuracy will necessarily improve with more input moments. In fact, beyond $N = 5$ moments we find instabilities, and we do not know whether they are generic or simply due to numerical limitations. We have also tested this technique with the experimental values of the hadronic moments provided to us by the authors of ref. [19], with errors included. Using as input the first four moments, we reproduce the result in Eq. (3.5) to a similar accuracy. Again, this is encouraging, but we cannot prove that the method should systematically work as the input accuracy is improved. Any comments about this, Alex?
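The Fourier-type integral of Eq. (7.2) can be sketched numerically with the standard library alone. The example below is a toy under stated assumptions, not the procedure of ref. [32]: it takes $M_N(s) = 1$ (a spectral function concentrated at $t = t_0$), for which Eq. (3.10) gives $a_\mu^{\rm HVP} = \frac{\alpha}{\pi}K(t_0)$, so the Mellin-Barnes integral can be compared with the direct kernel integral of Eq. (3.9). The complex Gamma function is implemented here with the widely used Lanczos approximation, since Python's `math.gamma` only accepts real arguments.

```python
import cmath
import math

M_MU = 0.1056584            # muon mass in GeV (assumed input)
T0 = 4 * 0.13957**2         # 4 m_pi^2 in GeV^2 (assumed input)

# Standard Lanczos coefficients for g = 7.
_LANCZOS = (0.99999999999980993, 676.5203681218851, -1259.1392167224028,
            771.32342877765313, -176.61502916214059, 12.507343278686905,
            -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7)

def cgamma(z):
    """Gamma function for complex z (Lanczos approximation with reflection)."""
    z = complex(z)
    if z.real < 0.5:
        return math.pi / (cmath.sin(math.pi * z) * cgamma(1 - z))
    z -= 1
    x = _LANCZOS[0]
    for i in range(1, len(_LANCZOS)):
        x += _LANCZOS[i] / (z + i)
    t = z + 7.5
    return math.sqrt(2 * math.pi) * t**(z + 0.5) * cmath.exp(-t) * x

def f_weight(s):
    """F(s) of Eq. (4.4)."""
    return -cgamma(3 - 2 * s) * cgamma(s - 3) * cgamma(1 + s)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even (works for complex f)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mellin_barnes_toy(tau_max=8.0):
    """Eq. (7.2) without the alpha/pi prefactor, for the toy choice M_N(s) = 1."""
    ratio = M_MU**2 / T0
    def integrand(tau):
        s = 0.5 - 1j * tau
        return ratio**(1 - s) * f_weight(s)   # sqrt(ratio) * exp(i tau log ratio) * F(s)
    return simpson(integrand, -tau_max, tau_max).real / (2 * math.pi)

def kernel(t, m=M_MU):
    """K(t) of Eq. (3.9), used here as the reference value."""
    r = t / m**2
    return simpson(lambda x: x * x * (1 - x) / (x * x + r * (1 - x)), 0.0, 1.0)

mb_value, direct_value = mellin_barnes_toy(), kernel(T0)
print(f"Mellin-Barnes: {mb_value:.8f}   direct K(t0): {direct_value:.8f}")
```

The cutoff `tau_max = 8` reflects the fast fall-off of $F\left(\frac{1}{2} - i\tau\right)$ noted in the text: the integrand is already negligible well before that value.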
7.1 Euler Beta-Function Approximants
The most successful class of approximants of the Marichev class that we have found are superpositions of simple Euler Beta functions
\[
M_N(s) = \frac{\alpha}{\pi}\,\frac{5}{3}\sum_{n=1}^{N}\lambda_n\,\Gamma(b_n - n)\,\frac{\Gamma(n - s)}{\Gamma(b_n - s)}
= \frac{\alpha}{\pi}\,\frac{5}{3}\sum_{n=1}^{N}\lambda_n\,B(n - s,\, b_n - n)\,,
\qquad \lambda_1 = 1\,,
\tag{7.3}
\]
where the $\lambda_n$ and $b_n$ are free parameters to be fixed from the matching to LQCD or experimental results. The overall constant in Eq. (7.3) is fixed by QCD: $\lambda_1 = 1$ because of the pQCD requirement:
\[
M_N(s) \underset{s\to 1}{\sim} M^{\rm QCD}(s) \underset{s\to 1}{\sim} \frac{\alpha}{\pi}\left(\sum_i q_i^2\right) N_c\,\frac{1}{3}\,\frac{1}{1-s}\,,
\tag{7.4}
\]
and . i qi2 = 53 for .i = u, d, s, c, b, t . The Mellin approximants .MN (s) in Eq. (7.3) induce corresponding hadronic selfenergy approximants:
.N (Q
2
)=−
Q2 1 t0 2π i
csˆ+i∞
ds cs −i∞
Q2 t0
−s (s)(1 − s) MN (s) ,
cs ≡ Re(s) ∈]0, 1[ ,
(7.5)
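which are superpositions of Gauss Hypergeometric Functions:
\[
\Pi_N(Q^2) = -\frac{\alpha}{\pi}\,\frac{5}{3}\,\frac{Q^2}{t_0}\sum_{n=1}^{N}\lambda_n\,\frac{\Gamma(n)\,\Gamma(b_n - n)}{\Gamma(b_n)}\;
{}_2F_1\!\left(\begin{matrix} 1 \quad n \\ b_n \end{matrix}\,\middle|\,-\frac{Q^2}{t_0}\right)\,.
\tag{7.6}
\]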
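The Gauss Hypergeometric Function has a well-known integral representation
\[
{}_2F_1\!\left(\begin{matrix} a \quad b \\ c \end{matrix}\,\middle|\, z\right) = \frac{\Gamma(c)}{\Gamma(b)\,\Gamma(c-b)}\int_0^1 dx\; x^{b-1}(1-x)^{c-b-1}\,\frac{1}{(1 - xz)^a}\,,
\tag{7.7}
\]
known as Euler’s formula. Setting
\[
x = \frac{t_0}{t} \qquad\text{and}\qquad z \equiv -\frac{Q^2}{t_0}\,,
\tag{7.8}
\]
in Euler’s formula results in the integral representation
\[
{}_2F_1\!\left(\begin{matrix} 1 \quad b \\ c \end{matrix}\,\middle|\,-\frac{Q^2}{t_0}\right) = \frac{\Gamma(c)}{\Gamma(b)\,\Gamma(c-b)}\int_{t_0}^{\infty}\frac{dt}{t}\;\frac{t_0}{t + Q^2}\left(\frac{t_0}{t}\right)^{b-1}\left(1 - \frac{t_0}{t}\right)^{c-b-1}\,,
\tag{7.9}
\]
which is a dispersion relation of the same class as the generic HVP dispersion relation
\[
-\Pi_N(Q^2) = \int_{4m_\pi^2}^{\infty}\frac{dt}{t}\;\frac{Q^2}{t + Q^2}\;\frac{1}{\pi}\,\mathrm{Im}\,\Pi_N(t)\,,
\tag{7.10}
\]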
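and we can, therefore, associate equivalent spectral function approximants

\[
\frac{1}{\pi}\,{\rm Im}\,\Pi_N(t) = \frac{\alpha}{\pi}\,\frac{5}{3}\sum_{n=1}^{N}\lambda_n\left(\frac{4m_\pi^2}{t}\right)^{n-1}\left(1-\frac{4m_\pi^2}{t}\right)^{b_n-n-1}\theta(t-4m_\pi^2)\,,
\tag{7.11}
\]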
to the hadronic self-energy approximants $\Pi_N(Q^2)$ in Eq. (7.6). Unfortunately, we have not succeeded in finding necessary and sufficient analytic conditions on the parameters $\lambda_n$ and $b_n$ to ascertain the positivity of $\frac{1}{\pi}{\rm Im}\,\Pi_N(t)$, and we are obliged to resort to numerical checks at each $N$-approximation. These Euler Beta-function approximants have been tested with well-known analytic results of QED lepton vacuum polarization, as well as with the experimental results of ref. [19], and the tests are very encouraging [39]. Furthermore, the shapes of the resulting spectral function approximants reproduce better and better the basic features of the exact spectral functions; but again, we cannot claim that the method can be systematically improved.
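The numerical positivity check mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the grid size and the sample parameter values are hypothetical, the overall factor $(\alpha/\pi)(5/3)$ is dropped because it does not affect the sign, and a scan on a finite grid can only detect a violation, never prove positivity.

```python
def spectral_approximant(t, lambdas, bs, t0):
    """(1/pi) Im Pi_N(t) of Eq. (7.11), without the overall (alpha/pi)(5/3) factor.

    lambdas[n-1] and bs[n-1] play the role of lambda_n and b_n; t0 stands for 4 m_pi^2.
    """
    if t <= t0:
        return 0.0  # theta(t - t0): no support below threshold
    x = t0 / t  # 0 < x < 1 above threshold
    total = 0.0
    for n, (lam, b) in enumerate(zip(lambdas, bs), start=1):
        total += lam * x ** (n - 1) * (1.0 - x) ** (b - n - 1)
    return total


def is_positive_on_grid(lambdas, bs, t0=1.0, points=2000):
    """Scan x = t0/t over a uniform grid in (0, 1); False as soon as a negative value is found."""
    for k in range(1, points):
        x = k / points
        if spectral_approximant(t0 / x, lambdas, bs, t0) < 0.0:
            return False
    return True


# Hypothetical parameter sets, chosen only to exercise both outcomes.
print(is_positive_on_grid([1.0], [2.5]))             # single term (1 - x)^(1/2)
print(is_positive_on_grid([1.0, -5.0], [2.5, 3.5]))  # (1 - x)^(1/2) (1 - 5x) changes sign
```

Scanning in $x = t_0/t$ rather than in $t$ maps the infinite range above threshold onto the unit interval, so a uniform grid covers both the threshold region and the asymptotic region.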
8 Time Momentum Representation Approximants

The application of the methods discussed so far requires the determination of at least a few $\mathcal{M}(-n)$ moments with sufficient accuracy; unfortunately, in the LQCD simulations this accuracy decreases dramatically as $n$ increases. This is why the LQCD community has nowadays concentrated its efforts on a direct determination of the time momentum function $G(x_0)$ defined in Eq. (5.3), at the $x_0$-points where it is accessible with good accuracy. This suggests a direct question for Alex: knowing numerically this $G(x_0)$ function in a finite interval, as well as its behaviour at small $x_0$-values and at very large $x_0$, what is the best one can do to interpolate the $G(x_0)$ function at all $x_0$-values?
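There is an obvious way to answer this question that we have tried: insert the successful Beta-Function approximants to the hadronic spectral function, i.e., Eq. (7.11), in the integrand of $G(x_0)$ with the result:

\[
\begin{aligned}
G_N(x_0) &= \frac{\alpha}{\pi}\,\frac{5}{3}\sum_{n=1}^{N}\lambda_n\int_{\sqrt{t_0}}^{\infty} d\omega\;\omega^2\,e^{-\omega x_0}\left(\frac{t_0}{\omega^2}\right)^{n-1}\left(1-\frac{t_0}{\omega^2}\right)^{b_n-n-1} \\
&= \frac{\alpha}{\pi}\,\frac{5}{3}\left(\sqrt{t_0}\right)^3\sum_{n=1}^{N}\lambda_n\int_0^1 dz\; e^{-\sqrt{t_0}\,x_0/z}\,(z^2)^{n-3}\,(1-z^2)^{b_n-n-1}\,,
\end{aligned}
\tag{8.1}
\]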
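where in the second line we have used $z = \frac{\sqrt{t_0}}{\omega}$ as a convenient integration variable, and then do the $z$-integral. One finds:

\[
G_N(x_0) = \frac{\alpha}{\pi}\,\frac{5}{3}\left(\sqrt{t_0}\right)^3\sum_{n=1}^{N}\frac{\lambda_n}{2\sqrt{\pi}}\,\Gamma(b_n-n)\;G^{3,0}_{1,3}\!\left(\frac{t_0\,x_0^2}{4}\,\middle|\,\begin{matrix} b_n-\frac{5}{2} \\ -\frac{5}{2}+n,\; 0,\; \frac{1}{2} \end{matrix}\right)
\tag{8.2}
\]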
which, not surprisingly, is a sum of Meijer's G-functions once more. One then fixes the free parameters $\lambda_n$ and $b_n$ (constrained by a numerical check of the positivity requirement of the corresponding spectral function approximant in Eq. (7.11)) by performing a least-squares fit to the function $G(x_0)$ in the region where it is numerically known with good accuracy. Unfortunately, this does not work: the results we find depend too much on the choice of the $x_0$ input region. Therefore, my final question for Alex: Is the information provided by the moments of a positive function, like $G(x_0)$, necessarily better than the information provided by the values of the same function in a finite interval, to construct approximants to the full function?
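The change of variables $z = \sqrt{t_0}/\omega$ used in Eq. (8.1) can be cross-checked numerically for a single term of the sum. The sketch below is only an illustration: the helper names, the midpoint rule, the truncation of the $\omega$-integral and the sample values ($n = 1$, $b_n = 4$, $m_\pi \approx 0.13957$ GeV) are assumptions made here, and the prefactor $(\alpha/\pi)(5/3)\lambda_n$ is omitted.

```python
import math


def integrand_omega(w, x0, n, b, t0):
    """Integrand of the first line of Eq. (8.1) for one term n."""
    r = t0 / (w * w)
    return w * w * math.exp(-w * x0) * r ** (n - 1) * (1.0 - r) ** (b - n - 1)


def integrand_z(z, x0, n, b, t0):
    """Integrand of the second line of Eq. (8.1), including the (sqrt(t0))^3 factor."""
    return (t0 ** 1.5 * math.exp(-math.sqrt(t0) * x0 / z)
            * (z * z) ** (n - 3) * (1.0 - z * z) ** (b - n - 1))


def midpoint(f, lo, hi, steps):
    """Plain midpoint rule; it never evaluates f at the endpoints."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def g_term_omega(x0, n, b, t0, steps=20000, cutoff=60.0):
    """omega-integral, truncated at sqrt(t0) + cutoff / x0 (assumed generous cutoff)."""
    lo = math.sqrt(t0)
    return midpoint(lambda w: integrand_omega(w, x0, n, b, t0), lo, lo + cutoff / x0, steps)


def g_term_z(x0, n, b, t0, steps=20000):
    """z-integral over the unit interval."""
    return midpoint(lambda z: integrand_z(z, x0, n, b, t0), 0.0, 1.0, steps)


t0 = 4 * 0.13957 ** 2  # 4 m_pi^2 in GeV^2, assumed value of the pion mass
print(g_term_omega(2.0, 1, 4.0, t0), g_term_z(2.0, 1, 4.0, t0))
```

The sample exponent $b_n - n - 1 = 2$ keeps the integrand smooth at threshold; for exponents below zero the endpoint singularity would call for a more careful quadrature than the one assumed here.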
In the next subsection I discuss an alternative approach to construct approximants to the $G(x_0)$ function.
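8.1 Heaviside Spectral Function Approximants

This is a new approach that we are studying [40]. At its simplest level, the first approximant of this type consists of a single Heaviside Spectral Function Distribution (${\rm HSA}_1$):

\[
\frac{1}{\pi}\,{\rm Im}\,\Pi(\omega)_{{\rm HSA}_1} = \lambda_{\rm pQCD}\,\theta(\omega - M_1)\,,\qquad M_1 > \sqrt{t_0}\,,\qquad\text{and}\qquad \lambda_{\rm pQCD} = \frac{5\alpha}{3\pi}\left(1+\mathcal{O}(\alpha_s)\right),
\tag{8.3}
\]

where $M_1$ is a free parameter and $\lambda_{\rm pQCD}$ has been fixed so as to reproduce the asymptotic pQCD behaviour of the hadronic spectral function with six flavours at $\omega \to \infty$. The HVP self-energy in this approximation is a simple log-function:

\[
-\Pi(Q^2)_{{\rm HSA}_1} = 2\int_{\sqrt{t_0}}^{\infty}\frac{d\omega}{\omega}\,\frac{Q^2}{\omega^2+Q^2}\,\frac{1}{\pi}\,{\rm Im}\,\Pi(\omega)_{{\rm HSA}_1} = \lambda_{\rm pQCD}\,\log\left(1+\frac{Q^2}{M_1^2}\right)
\tag{8.4}
\]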
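which, contrary to the usual Padé approximants which are rational functions, has the welcome feature that it satisfies the pQCD asymptotic behaviour at large $Q^2$. The Mellin transform of the ${\rm HSA}_1$ spectral function is then

\[
\mathcal{M}(s)_{{\rm HSA}_1} = 2\int_{\sqrt{t_0}}^{\infty}\frac{d\omega}{\omega}\left(\frac{\omega^2}{t_0}\right)^{s-1}\lambda_{\rm pQCD}\,\theta(\omega - M_1) = \left(\frac{M_1^2}{t_0}\right)^{s-1}\frac{\lambda_{\rm pQCD}}{1-s}\,,
\tag{8.5}
\]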
and the values at $s = 0, -1, -2, \cdots$ fix the coefficients of the Taylor expansion of $\Pi(Q^2)_{{\rm HSA}_1}$ at $Q^2 \to 0$. The Heaviside spectral function in Eq. (8.3) generates a time momentum representation function $G(x_0)_{{\rm HSA}_1}$, which is an exponential modulated by a factor reminiscent of the fact that we are dealing with the second derivative of the Laplace transform of the spectral function:

\[
G(x_0)_{{\rm HSA}_1} = \int_{\sqrt{t_0}}^{\infty} d\omega\; e^{-\omega x_0}\,\omega^2\,\frac{1}{\pi}\,{\rm Im}\,\Pi(\omega^2)_{{\rm HSA}_1} = \frac{\lambda_{\rm pQCD}}{x_0^3}\,e^{-M_1 x_0}\left(2 + 2M_1 x_0 + M_1^2 x_0^2\right).
\tag{8.6}
\]

This function has the correct pQCD behaviour at $x_0 = 0$ and it vanishes exponentially at long distances. In order to test this simple first ${\rm HSA}_1$-approximant, one can fix the value of the free parameter $M_1$ from the identification of the slope at the origin of $\Pi(Q^2)_{{\rm HSA}_1}$ with the experimental central value of the first moment. The resulting $M_1$-value (the only free parameter in this case) and the corresponding prediction for $a_\mu^{\rm HVP}$ are:

\[
M_1 = 0.6484\ {\rm GeV} \qquad\text{and}\qquad a_\mu^{\rm HVP}({\rm HSA}_1) = 7.225\times 10^{-8}\,,
\tag{8.7}
\]

which already reproduces the central value of the experimental determination of $a_\mu^{\rm HVP}$ in ref. [19] at the 4% level.
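The closed forms in Eqs. (8.4) to (8.6) lend themselves to a quick numerical cross-check. The sketch below is a minimal illustration, not the author's code: the function names, the midpoint quadrature and its cutoff are assumptions, $\lambda_{\rm pQCD}$ is set to 1, and $t_0 = 4m_\pi^2$ with $m_\pi \approx 0.13957$ GeV is assumed. The alternating-sign series in `minus_pi_from_moments` follows from expanding the logarithm in Eq. (8.4) and comparing with Eq. (8.5); it converges only for $Q^2 < M_1^2$.

```python
import math


def g_hsa1(x0, m1, lam=1.0):
    """Closed form of Eq. (8.6): lam * exp(-M1 x0) (2 + 2 M1 x0 + M1^2 x0^2) / x0^3."""
    y = m1 * x0
    return lam * math.exp(-y) * (2.0 + 2.0 * y + y * y) / x0 ** 3


def g_hsa1_numeric(x0, m1, lam=1.0, steps=20000, cutoff=60.0):
    """Midpoint rule for lam * int_{M1}^{inf} dw w^2 exp(-w x0).

    The upper limit is truncated at M1 + cutoff / x0 (an assumed, generous cutoff).
    """
    h = (cutoff / x0) / steps
    total = 0.0
    for k in range(steps):
        w = m1 + (k + 0.5) * h
        total += w * w * math.exp(-w * x0)
    return lam * total * h


def minus_pi_hsa1(q2, m1, lam=1.0):
    """-Pi(Q^2) of Eq. (8.4): lam * log(1 + Q^2 / M1^2)."""
    return lam * math.log1p(q2 / m1 ** 2)


def mellin_hsa1(s, m1, t0, lam=1.0):
    """Eq. (8.5), valid for Re(s) < 1."""
    return lam * (m1 ** 2 / t0) ** (s - 1) / (1.0 - s)


def minus_pi_from_moments(q2, m1, t0, lam=1.0, order=40):
    """Taylor series of -Pi(Q^2) rebuilt from the Mellin values at s = 0, -1, -2, ..."""
    return sum((-1) ** (k + 1) * mellin_hsa1(1 - k, m1, t0, lam) * (q2 / t0) ** k
               for k in range(1, order + 1))


m1 = 0.6484             # GeV, the value quoted in Eq. (8.7)
t0 = 4 * 0.13957 ** 2   # GeV^2, assumed 4 m_pi^2
print(g_hsa1(2.0, m1), g_hsa1_numeric(2.0, m1))
print(minus_pi_hsa1(0.1, m1), minus_pi_from_moments(0.1, m1, t0))
```

The slope of `minus_pi_hsa1` at the origin is `lam / m1**2`, which is the quantity matched to the first moment when $M_1$ is fixed.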
In general, the Heaviside Spectral Function Approximant that we propose consists of a superposition of $N$ Heaviside Spectral Distributions

\[
\frac{1}{\pi}\,{\rm Im}\,\Pi(\omega)_{{\rm HSA}_N} = \sum_{i=1}^{N}\lambda_i\,\theta(\omega - M_i)\,,
\tag{8.8}
\]

with increasing mass-parameters

\[
M_{i+1} > M_i\,,\qquad M_1 > 2m_\pi\,,
\tag{8.9}
\]
and with the dimensionless $\lambda_i$-couplings restricted to satisfy the positivity of the spectral function $\frac{1}{\pi}{\rm Im}\,\Pi(\omega)_{{\rm HSA}_N}$, as well as the asymptotic freedom pQCD behaviour. In this case the necessary and sufficient positivity conditions are rather simple, which is an advantage of this method:

\[
\lambda_1 > 0\,,\quad \lambda_1 + \lambda_2 > 0\,,\quad \ldots\quad \lambda_1 + \lambda_2 + \cdots + \lambda_{N-1} > 0\,,\quad\text{and}\quad \sum_{i=1}^{N}\lambda_i = \lambda_{\rm pQCD}\,.
\tag{8.10}
\]

The total number of free parameters at a given $N$-approximation is therefore $2N-1$, and the resulting $G(x_0)$ function in Eq. (5.3) is a sum of modulated exponentials:

\[
G(x_0)_{{\rm HSA}_N} = \frac{1}{x_0^3}\sum_{i=1}^{N}\lambda_i\,e^{-M_i x_0}\left(2 + 2M_i x_0 + M_i^2 x_0^2\right).
\tag{8.11}
\]
To see the improvement one obtains when going from the first ${\rm HSA}_1$-approximant to the second ${\rm HSA}_2$-approximant we use as input the central values of the first three moments of the physical spectral function. This fixes the values of the required four parameters: $M_1$, $M_2$, $\lambda_1$, $\lambda_2$, with $\lambda_1 + \lambda_2 = \lambda_{\rm pQCD}$, and the resulting muon anomaly is now

\[
a_\mu^{\rm HVP}({\rm HSA}_2) = 6.966\times 10^{-8}\,,
\tag{8.12}
\]

which reproduces the experimental central value at the 0.5% level, quite a significant improvement. Unfortunately, when trying further approximants, with $N = 3, 4, \ldots$, we find unstable results and, again, we do not know whether they are generic to the method or due to numerical limitations when trying to increase the number of input parameters.
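The constraints of Eqs. (8.9) and (8.10) and the sum of modulated exponentials in Eq. (8.11) are simple enough to encode directly. The following is a hedged sketch: the function names, the tolerance and the sample couplings and masses are hypothetical and are not fitted values from this work; $\lambda_{\rm pQCD}$ is normalized to 1 for the illustration.

```python
import math


def satisfies_positivity(lambdas, lam_pqcd, tol=1e-12):
    """Eq. (8.10): the first N-1 partial sums are positive and the total equals lam_pqcd."""
    partial = 0.0
    for lam in lambdas[:-1]:
        partial += lam
        if partial <= 0.0:
            return False
    total = partial + lambdas[-1]
    return abs(total - lam_pqcd) <= tol


def has_increasing_masses(masses, two_m_pi):
    """Eq. (8.9): M_1 > 2 m_pi and M_{i+1} > M_i."""
    return masses[0] > two_m_pi and all(a < b for a, b in zip(masses, masses[1:]))


def spectral_hsa_n(omega, lambdas, masses):
    """Eq. (8.8): sum of the couplings whose threshold M_i lies below omega."""
    return sum(lam for lam, m in zip(lambdas, masses) if omega > m)


def g_hsa_n(x0, lambdas, masses):
    """Eq. (8.11): sum of modulated exponentials divided by x0^3."""
    total = 0.0
    for lam, m in zip(lambdas, masses):
        y = m * x0
        total += lam * math.exp(-y) * (2.0 + 2.0 * y + y * y)
    return total / x0 ** 3


# Hypothetical N = 2 example.
lambdas, masses = [1.5, -0.5], [0.65, 1.2]
print(satisfies_positivity(lambdas, 1.0), has_increasing_masses(masses, 2 * 0.13957))
print(spectral_hsa_n(1.0, lambdas, masses), g_hsa_n(1.0, lambdas, masses))
```

A negative $\lambda_2$ is allowed as long as the partial sums stay positive, which is why the example with couplings $(1.5, -0.5)$ passes the check.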
9 Conclusion

I miss Alex very much! I would have liked to discuss with him the ideas presented above, but I would not have liked to abuse his precious time. I think it is time to stop. Thanks for listening, Alex!

Acknowledgments I am very much indebted to my CPT colleagues Jérôme Charles, David Greynat, Marc Knecht, and Laurent Lellouch for their collaboration on many of the topics discussed in this tribute to Alex Grossmann.
References

1. P.A.M. Dirac, Cambridge Phil. Soc. 30 150 (1934).
2. W. Heisenberg, Zeitschrift für Physik 90 209 (1934).
3. R. Serber, Phys. Rev. 48 49 (1935).
4. E.A. Uehling, Phys. Rev. 48 55 (1935).
5. The Nobel Prize in Physics 1965 Lectures.
6. F. Dyson, The Radiation Theories of Tomonaga, Schwinger, and Feynman, Phys. Rev. 75 486 (1949).
7. G. Källén, Helv. Phys. Acta 25 417 (1952).
8. H. Lehmann, Nuovo Cim. 11 342 (1954).
9. A. Grossmann and T.T. Wu, Jour. Math. Phys. 2 710 (1961).
10. A. Grossmann, Jour. Math. Phys. 2 714 (1961).
11. A. Grossmann and T.T. Wu, Jour. Math. Phys. 3 684 (1962).
12. A. Martin, Scattering Theory: Unitarity, Analyticity and Crossing, Lecture Notes in Physics, Springer-Verlag (1969).
13. G.W. Bennett et al. (The g-2 Collab.), Phys. Rev. D73 072003 (2006).
14. J. Grange et al. (Muon g-2 Collaboration), Muon (g-2) Technical Design Report, 2015, arXiv:1501.06858 [physics.ins-det].
15. M. Abe et al., A new approach for measuring the muon anomalous magnetic moment and electric dipole moment, PTEP 2019 (5) 053C02 (2019), arXiv:1901.03047 [physics.ins-det].
16. Th. Blum, A. Denig, I. Logashenko, E. de Rafael, B. Lee Roberts, Th. Teubner and G. Venanzoni, The Muon (g − 2) Theory Value: Present and Future, arXiv:1311.2198v1 [hep-ph].
17. T. Aoyama et al., White Paper: The anomalous magnetic moment of the muon in the Standard Model, arXiv:2006.04822v1 [hep-ph] (to appear as a Physics Reports).
18. M. Davier, A. Hoecker, B. Malaescu, and Z. Zhang, Eur. Phys. J. C80 241 (2020).
19. A. Keshavarzi, D. Nomura, and T. Teubner, Phys. Rev. D101 014029 (2020).
20. Sz. Borsanyi, Z. Fodor, J. N. Guenther, C. Hoelbling, S. D. Katz, L. Lellouch, T. Lippert, K. Miura, L. Parato, K. K. Szabo, F. Stokes, B. C. Toth, Cs. Torok, L. Varnhorst (BMWc collaboration), arXiv:2002.12347v1 [hep-lat] 27 Feb 2020.
21. C. Bouchiat and L. Michel, J. Phys. Radium 22 121 (1961).
22. B.E. Lautrup, A. Peterman and E. de Rafael, Phys. Rep. C3 193 (1972).
23. E. de Rafael, Phys. Lett. B322 239 (1994).
24. T. Blum, Phys. Rev. Lett. 91 052001 (2003).
25. M. Gourdin and E. de Rafael, Nucl. Phys. B10 667 (1969).
26. S.J. Brodsky and E. de Rafael, Phys. Rev. 168 1620 (1968).
27. B.E. Lautrup and E. de Rafael, Phys. Rev. 174 1835 (1968).
28. J. Schwinger, Phys. Rev. 76 790 (1949).
29. J.S. Bell and E. de Rafael, Nucl. Phys. B11 611 (1969).
30. E. de Rafael, Phys. Lett. B736 522 (2014).
31. E. de Rafael, Phys. Rev. D96 014510 (2017).
32. J. Charles, D. Greynat and E. de Rafael, Phys. Rev. D97 076014 (2018).
33. Ph. Flajolet, X. Gourdon and Ph. Dumas, Theor. Comput. Sci. 144 3 (1995).
34. J. Mignaco and E. Remiddi, Nuovo Cim. 60A 519 (1969).
35. D. Bernecker and H.B. Meyer, Eur. Phys. J. 47A 148 (2011).
36. B. Berndt, Ramanujan's Notebooks, Part I, Springer-Verlag, New York (1985).
37. G.H. Hardy, Ramanujan. Twelve Lectures on subjects suggested by his life and work, Chelsea Publishing Company, New York, 3rd ed. (1978).
38. O.I. Marichev, Handbook of Integral Transforms of Higher Transcendental Functions: Theory and Algorithmic Tables, Wiley, New York (1983).
39. E. de Rafael, Update on Mellin-Barnes Approximants to HVP, talk at the 2nd Workshop of the Muon g-2 Theory Initiative, Mainz, June 2018, based on work with Jérôme Charles and David Greynat.
40. J. Charles, D. Greynat and E. de Rafael, work in progress.
Standard Model, and Its Standard Problems

Chris P. Korthals Altes
Abstract For fifty years the standard model of particle physics has reigned supreme. Not only does it explain the huge amount of accelerator data amassed with great precision; in cosmology, too, it has a say and may explain dark matter and the absence of anti-matter. The model is obtained by gauging its symmetries to generate the electroweak, strong and gravitational forces. Symmetry breaking effects on the quantum level cause the standard model to obey a set of consistency conditions in order to be a viable quantum theory. These conditions need gravity to fix the hypercharges. The inclusion of gravity poses one of the standard problems, the unbearably heavy weight of the vacuum. Another standard problem discussed is the colossal spread in the masses of the particles, a factor of more than $10^{13}$.
1 Prologue

Es gibt viele Theorien,
die sich jedem Check entziehen.
Dieses aber lässt sich checken:
Elend werden sie verrecken.
Christian Morgenstern(?)
To contribute to a liber amicorum licentiosorum for Alex Grossmann is a great honour. Alex was fluent in Latin so I might please him by using “licentia” for “lack of discipline.” After all, this book claims to be a collection of undisciplined essays.
C. P. Korthals Altes, NIKHEF, Theory Group, Amsterdam, The Netherlands; Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_10
Few people were as versatile and disciplined as Alex. His activity ranged from mathematical aspects of quantum mechanics to the theory of signal analysis. And in 1993 he joined a group in the Paris region working on genomics.

When in 1971 I arrived at the Centre de Physique Théorique in Marseille it was Alex who found us a house in Cassis, not far from where he lived, a quaint house looking out over the Mediterranean and the Calanques. We used to drive to the institute together with Dickey, who worked at the same campus. A visit to the Grossmann home taught you immediately about Alex's formidable erudition, simply by scrutinizing his library. In discussions with him on physics topics his quick and deep insights always proved helpful. He had this wonderful smile, sometimes apologetic, often with a certain mischief.

This contribution is about particle physics, and I will have an undisciplined look (for undisciplined laymen, with the emphasis on laymen) at the standard model as it stands now, in 2021. It does not pretend to be a review for seasoned practitioners of the model. Of course I feel the mischievous smile of Alex, looking over my shoulder when writing these lines. Of one thing though I am pretty sure: Alex would have liked, and thoroughly agreed with, Morgenstern's poetic utterance!¹

The model, now almost fifty years old and certainly not in the category of Morgenstern's theories, has a built-in property called chirality or handedness. All the elementary particles, like quarks, electrons and neutrinos, have a left and a right handedness. The handedness of a particle is the correlation between the orientation of its intrinsic spin and its momentum, which can be a left handed screw sense or a right handed one. It has an invariant meaning only when the particle moves with the velocity of light; then one cannot go faster than the particle and view the screw sense in the opposite direction. Photons and gravitons are examples.
Neutrinos are created as ultra-relativistic particles and were thought to be massless for almost fifty years, but the three neutrino species have been shown to oscillate into one another, an impossibility without mass. Incidentally, handedness is a ubiquitous element in nature: it also plays a role in the fundamentals of life that Alex was studying in genomics.

In particle physics chirality made its appearance through the work of Sudarshan and Marshak [19], after the realization that parity was not a symmetry of weak interactions [18]. This was an extremely important step and led Nambu [21] to surmise an approximate chiral symmetry of the strong interactions.

The title of this contribution refers to the problems that the Standard Model has lived with during its half-century lifetime. The first is called the cosmological problem and has to do with the energy of the ground state. It does not yet have a solution, and practitioners have learnt to live with it, because the model has so many successes. The second is the problem of Dark Matter: with the recognition that neutrinos have mass, and therefore more degrees of freedom, called the heavy neutral leptons, which are very weakly interacting, there are models in which the lightest and most stable of these provides an explanation. The third is the problem of baryon asymmetry, the fact that in the visible universe only baryons are seen. Anti-baryons appear only as secondaries. This problem has an explanation in terms of hypothetical heavy neutral leptons being Majorana particles. This is not to say that these are the only problems. Accelerator data do show friction with the assumed universal properties of leptons. But in this essay the emphasis is on the first three.

¹ There are many theories that cannot be checked. But one thing you can check: they will come to a miserable end.

We first concentrate on the standard model and discuss how a principle, the gauge principle, brings forth the four interactions playing out on the microscopic level. They are the electromagnetic and weak interactions, the strong nuclear force and gravity. Then we will see how chirality in a relativistic quantum field theory is not conserved in the presence of external gauge fields. This is a fundamental fact that plays an important role in what follows.

As Alex's work was mainly in mathematics and mathematical physics I thought it fitting to concentrate on the aspects of particle physics that received boosts from giants of that trade: Weyl, Cartan, Chern, Simons, Atiyah, and Singer.

The genesis of electromagnetic gauge symmetry has its roots in the phase independence of the Schrödinger wave function of a charged particle; it was invented by Fock [1] and elevated to a principle by Weyl [2]. The gauge symmetry of non-abelian phases is due to Yang and Mills [4]. They took global symmetries in particle physics, like isospin symmetry, that map one particle species (say a neutron) onto another one (the proton) and made them local. The action describing the theory is supposed to be local. Hence, to keep the theory invariant under the local symmetry one has to provide the gradient terms with an additive vector field, the gauge field.
This field is supposed to transform under the local symmetry exactly so as to undo the lack of covariance and to render the derivative a covariant derivative under local transformations. The transformation law for the gauge field always has an inhomogeneous term depending only on the transformation and a gradient thereof. The new vector field now makes physical transitions possible between particle species, like a neutron into a proton, by what is now called weak isospin gauge symmetry acting on the level of quarks. This transition is nothing but what is known as beta decay. With every charge (generator of the group) a gauge field is associated. So for weak isospin we have three vector fields, and for the strong interactions the SU(3) colour group has eight vector fields.

On top of introducing new physical transitions, the commutator of the covariant derivatives gives us the field strength. It transforms as the adjoint representation under the local transformation. And that permits us to form an invariant by taking the square of the field strength, in analogy with Maxwell theory. That completes the action for the non-Abelian Maxwell theory.

Every local invariance of the Standard Model invokes such a vector field (or vector fields when the symmetry is a non-Abelian group). And these fields are conveying the three forces acting in the subnuclear world, the electromagnetic force,
the weak force as in beta decay, and the strong force acting between quarks and, as a van der Waals force, between the nucleons constituting the nuclei.

And what about the fourth force, gravity? The previous discussion suggests we should push the gauge paradigm in the Standard Model to its logical end: change the model's Poincaré invariance into a local invariance and see what gauge fields are to be introduced to get covariant derivatives. Since all fields feel the Lorentz transformations and translations, the corresponding gauge fields will couple to all matter. Precisely this was investigated by Utiyama and Kibble [6], a decade before the advent of the standard model: the genesis of gravity as a gauged version of the Poincaré group. They found that a gauge covariant derivative could be constructed with a vector potential gauging the local Lorentz transformations and a vector field called the Vierbein with four degrees of freedom that transform under the local Lorentz transformations. Insisting on local Lorentz covariant derivatives then produces a diffeomorphism connexion which is not necessarily symmetric in its lower indices. The antisymmetric part is called torsion.

Elie Cartan [5] had introduced the same idea forty years earlier, in a different language. Amusingly, Cartan was inspired by work on torsion and defects in crystals, and indeed their mathematical description [9] bears resemblance to that of torsional gravity. This resemblance has been taken up again and again in condensed matter physics. Hermann Weyl [3] understood already in 1950 that the extra component also had to do with the intrinsic spin of the system involved, and this motivated Sciama [6] at about the same time as Kibble. Later the subject was vigorously taken up by Hehl and coworkers [7]. For a review from the quantum field theory point of view see the one by Shapiro [8].

Another answer to the question of the relevance of gravity in particle physics should be, and is, based on experimental facts.
A recent experiment [10] has shown that nuclei behave the same way as the matter in bulk used in Eötvös' experiment [11], with an accuracy for the Eötvös parameter²

\[
\eta = \left[1.6 \pm 1.8({\rm stat}) \pm 3.4({\rm syst})\right]\times 10^{-12}\,.
\]
So it may not be a stretch of imagination to extend the mathematical formulation of Einstein gravity to the level where the other three forces operate: the Standard Model. The experimental result suggests that gravity can only couple through terms that respect the equivalence principle. This includes covariant derivatives and the metric tensor. However, this would exclude terms like the gravity curvature scalar coupling to matter or Einstein tensors coupling to kinetic terms as in Higgs inflation models [28, 29]. Of course both are vanishing in earth bound labs, so effects of those terms are not easy to see. But they are important at the beginning of inflation.

² This result, the equivalence principle, is to be expected for elementary particles if you are ready to accept that the gravitational field couples to the particles through a massless spin two particle and, furthermore, that relativistic field theory governs elementary particles. The argument is sixty years old and due to Weinberg [12].

Now the standard model in its quantized form is only consistent if it obeys certain conditions [36]. These conditions derive from the special nature of one of the symmetries that are gauged. This symmetry, which we already mentioned, is chiral symmetry, and involves the handedness of the particle fields. As explained in Sect. 3, chiral symmetry for a given particle species cannot be maintained in the quantal version of the theory. But one can devise conditions between the different species that lead to the cancellation inside a family of species. Gravity comes in because one of the above consistency conditions involves gravity. If you take all the conditions into account the quantum numbers of the particle species are fixed, up to a common multiplicative factor. We need only specify, say, the charge of the electron to fix them all. Since gravity, along with the electroweak force and the strong force, participates in this fixing it may well be a gauge symmetry like the others. And indeed it is, as we know from Utiyama and Kibble's work [6]. It is explained in Sect. 4.

Apart from the gauge anomalies we mentioned there is the global anomaly of the baryonic and leptonic charges [31]. They are the same for both due to a nice interplay between the weak hypercharges. And also in the gravitational sector if we admit a right handed neutrino in the standard model. Its presence causes a symmetry between leptonic and quark sectors apart from the colour degrees of freedom for the quarks.
If on the other hand we admit only a left handed neutrino with a Majorana mass then they are not the same, and both $B \pm L$ are anomalous. These are anomalies that give rise to in principle observable facts.

This contribution is set up as follows. In Sect. 2 I introduce the Standard Model for those not knowing it. In Sect. 3 we consider how weak hypercharge conservation is fixed by including all particle species in a given family, through the cancellation of the anomalous hypercharge non-conservation. In Sect. 4 the construction of the gauge covariant derivative for local Poincaré invariance is discussed in the spirit we sketched above. Then we turn to the global anomalies and two scenarios of completing the neutrino sector, Sect. 5. We conclude in the final Sect. 6. This contribution is mostly review and I will suppose the reader is not expert in the field but has some knowledge of relativistic field theory and gravitation.
2 A Short Introduction to the Standard Model A.D. 2021

The standard model [35, 36] consists of massless matter multiplets, in the form of quarks and leptons. The force coupling to both quarks and leptons is the electroweak force with couplings $g_2$ and $g_1$; the force coupling only to the quarks and confining them into hadrons is the strong colour force with coupling $g_3$. The weak force produces through beta decay electrons and anti-neutrinos, and through electron capture in heavy elements positrons and neutrinos, the latter showing they were left handed [20].

At the waning of the last century it was experimentally shown [32, 33] that neutrinos oscillate and must therefore have a mass. The question remains whether this mass is of the Dirac type or the Majorana type. The first needs a right handed neutrino, and renders the number of degrees of freedom of the leptonic right handed sector and of the right handed quark sector the same, if we neglect colour. Right handed neutrinos are potential candidates for dark matter. Not adding a right handed neutrino but adding a Majorana mass for the left handed neutrino is still experimentally admitted.

The electroweak force is the gauge theory of an SU(2) symmetry [13] that couples only to left handed fermions. This of course only makes sense if the left handed fermion stays a left handed fermion and does not couple to a right handed fermion. Let $\psi$ be a Dirac spinor, then

\[
\psi_L = \frac{1}{2}(1-\gamma_5)\,\psi\,,\qquad \psi_R = \frac{1}{2}(1+\gamma_5)\,\psi\,.
\]

A Dirac mass term $m\bar\psi\psi = m(\bar\psi_L\psi_R + \bar\psi_R\psi_L)$ necessarily couples $\psi_L$ to $\psi_R$. So all fermions must be massless, as we want the $SU(2)_L$ symmetry to be exact. Of course, nature provides us with massive particles. The Dirac masses will be induced by the interaction with a scalar particle, the Higgs boson.

We work with Dirac matrices obeying $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$, with signature $\eta_{\mu\nu} = {\rm diag}(+,-,-,-)$ and $\gamma^5 = i\gamma^0\gamma^1\gamma^2\gamma^3$. Units are such that $\hbar = c = k_B = 1$ and $8\pi G = M_P^{-2}$, where $M_P$ is called the reduced Planck mass and $G$ is Newton's constant. There is one more mass scale in the Standard Model, the Fermi scale $v = 250$ GeV, and the ratio is $v/M_P = 10^{-16}$.

There is a U(1) gauge theory that couples to the weak hypercharge $Y$ of left and right handed fermions. It is this part of the standard model that gives rise to the anomalies necessitating constraints. The strong force is the gauge theory of the SU(3) colour symmetry that does not prefer right or left handed fermions. It has no anomalies playing a role in the strong gauge symmetry.

In the heading of the table $(d_{SU(3)}, d_{SU(2)})_Y$ stands for the dimension of the colour $SU(3)$ triplet and weak $SU(2)$ isospin doublet. The suffix $Y$ stands for the value of the weak hypercharge. The three families are separated by a horizontal spacing. The first two columns of a family contain the left handed weak isospin doublets of leptons and quarks. The quarks have a colour index as well, $c = 1, 2, 3$. The last three columns are the leptonic right handed isospin singlets, and the right handed isospin singlet, colour triplet quarks. Note the presence of hypothetical right handed neutrinos in the rightmost column. They have no gauge charge and are called sterile neutrinos.

This version of the standard model puts the neutrinos on the same footing as the charged leptons and indeed as the quarks (if you forget about their colour degrees of freedom). This model has the attractive feature that besides the light neutrinos coupling to the gauge fields we have heavy neutrinos (not necessarily heavier than the electroweak scale) that are sterile and may be candidates for dark matter and for an understanding of the absence of anti-matter through leptogenesis. Moreover they may be at experimentally accessible accelerator energies. Their presence is in the first place one of the options to accommodate the mass for the neutrino. This mass may be a Dirac mass, involving the left handed neutrino, or a Majorana mass involving its own charge conjugate, or both. The latter is the most in use, and leaves us with three light Majorana neutrinos and three Majorana neutrinos at a scale which need not be higher than the Fermi scale.

As a maybe academic alternative one might want to describe only the experimentally observed neutrinos and not introduce right handed neutrinos. In that case one has to introduce a Majorana mass from the left handed neutrino and its charge conjugate, the Weinberg mass term [39]. We will come back to this in Sect. 5. Then one is faced with an anomalous $B - L$ current, due to gravitation. It is of interest to understand what this anomaly might mean for the neutrino mass.
There is the Higgs scalar particle [22], found experimentally in 2012 [34]. It is a member of an isospin doublet $H = \begin{pmatrix} H^+ \\ H^0 \end{pmatrix}$ with hypercharge $-\frac{1}{2}$ and its hypercharge conjugate doublet $\tilde H = i\sigma_2 H^*$. $d_l$ is the hypercharge of the lepton doublet, $d_q$ that of the quark doublet, $s_e$ that of the electron singlet, etc. It will turn out in the next section that the consistency conditions imposed by the anomalies fix the five fermionic charges numerically in terms of the hypercharge $d_h$ of the Higgs doublet.

Finally we need to write down the Lagrangian for the Standard Model. The local symmetries are $SU(3)$, acting equally on left and right handed quarks; $SU(2)_L$, acting on the left handed lepton and quark doublets and the Higgs doublet; $U(1)_L$, acting on the left handed doublets; and $U(1)_R$, acting on the right handed singlets and on the Higgs doublet. The action consists of several parts: the matter part $S_M$, containing the quarks and the leptons coupling through the covariant derivatives to the gauge fields; the gauge field part $S_G$, containing the kinetic terms of the non-Abelian gauge fields; the Higgs field action $S_H$, containing its kinetic term and its self-interaction $V(H)$; and finally the Yukawa interactions $S_{Yuk}$ between the matter fields and the Higgs field.
The total action is

\[
S_{sm} = S_M + S_G + S_H + S_{Yuk} + S_\nu\,.
\tag{2.1}
\]

The last term had to be added after the discovery of neutrino oscillations and we will discuss it later in this section. To write the model in an economic form we introduce the indices $c = 1, 2, 3$ for the three colours. The quark $SU(2)_L$ doublets carry a colour index; the lepton doublets and the Higgs doublet carry no colour index:

\[
Q_L^c = \begin{pmatrix} u_L^c \\ d_L^c \end{pmatrix},\qquad l_L = \begin{pmatrix} \nu_{eL} \\ e_L \end{pmatrix},\qquad H = \begin{pmatrix} H^0 \\ H^- \end{pmatrix}.
\]
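The vector potentials for weak isospin, weak hypercharge and gluons are written as

\[
\hat B_\mu = -i g_1 B_\mu\,\mathbf{1}_2\,\mathbf{1}_3\,,\qquad \hat W_\mu = -i g_2\,W_\mu\cdot\frac{\tau}{2}\,\mathbf{1}_3\,,\qquad\text{and}\qquad \hat G_\mu = -i g_3\,G^a_\mu\,\frac{\lambda_a}{2}\,\mathbf{1}_2\,.
\]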
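For $SU(2)$ the generators $\tau$ are the three Pauli matrices. For $SU(3)$ the $\lambda_a$ are the eight Hermitian Gell-Mann generators. The unit matrices $\mathbf{1}_2$ refer to isospin and $\mathbf{1}_3$ to colour. So we have explicitly for the matter Lagrangian $\mathcal{L}_M$

\[
\begin{aligned}
\mathcal{L}_M ={}& \bar Q_L\, i\gamma^\mu D_\mu(\hat W + d_q \hat B + \hat G_\mu)\,Q_L + \bar L_L\, i\gamma^\mu D_\mu(\hat W + d_l \hat B)\,L_L \\
&+ \sum_{q=u,d} \bar q_R\, i\gamma^\mu D_\mu(s_q \hat B + \hat G_\mu)\,q_R + \sum_{l=\nu_e,e} \bar l_R\, i\gamma^\mu D_\mu(s_l \hat B)\,l_R\,.
\end{aligned}
\tag{2.2}
\]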
In the first line we wrote the kinetic terms for the left handed doublets, in the second those of the right handed singlets. For the fermions in the second and third generation nothing changes in these couplings. The derivatives in the matter actions are covariant derivatives with respect to local $SU(3) \times SU(2)_L \times U(1)_Y$ transformations. From these covariant derivatives one gets through their commutators the field strengths $F_{\mu\nu}$. The latter give the gauge invariant kinetic terms for the vector bosons

\[
S_G = -\frac{1}{2}\,{\rm Tr}\,F(W)_{\mu\nu}F(W)^{\mu\nu} - \frac{1}{4}\,F(B)_{\mu\nu}F(B)^{\mu\nu} - \frac{1}{2}\,{\rm Tr}\,F(G)_{\mu\nu}F(G)^{\mu\nu}\,.
\tag{2.3}
\]
The Higgs field does not couple to colour and so we have
\[ S_H = \left(D_\mu(\hat W + d_h \hat B) H\right)^\dagger D^\mu(\hat W + d_h \hat B) H + \lambda \left(H^\dagger H - \frac{v^2}{2}\right)^2 + V . \tag{2.4} \]
We added an arbitrary constant $V$ which adds to the canonical energy momentum tensor a term $\eta_{\mu\nu} V$ when the potential term is minimized. It is arbitrary, due to the (unknown) physical mechanism underlying the appearance of a vacuum expectation value (VEV) in the Higgs potential. If the Higgs doublet develops a vacuum expectation value $\langle H^\dagger H \rangle = v^2/2$, with real fields $h, \hat h$ and a complex field $H^+$ fluctuating around $v$,
\[ H = \frac{1}{\sqrt2} \begin{pmatrix} H^+ \\ v + h + i\hat h \end{pmatrix}, \tag{2.5} \]
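then this expectation value breaks the $SU(2)_L \times U(1)_Y$ symmetry but respects a $U(1)_{e.m.}$ subgroup. This can be seen from the stability group of the Higgs field (in absence of fluctuations),
\[ \begin{pmatrix} e^{i\alpha} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ v \end{pmatrix}, \tag{2.6} \]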
which is associated with the photon field and electric charge $Q$. More precisely the matrix in (2.6) is a combination of weak isospin rotations around the 3-axis and $U(1)_Y$ phase rotations and can be written as
\[ \exp\left\{ i\alpha \left( \frac{\tau_3}{2} + \frac12 \right) \right\} . \tag{2.7} \]
The second term in the exponent is the hypercharge $d_h$ of the Higgs, hence $d_h = 1/2$ for the electromagnetic gauge group to respect the VEV of the Higgs field. The electric charges of the left handed fermions then obey
\[ Q_f = I_3 + d_f, \quad f = q, l, \tag{2.8} \]
and those of the right handed fermions
\[ Q_f = s_f, \quad f = e, u, d. \tag{2.9} \]
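The charge formulas (2.8) and (2.9) can be tabulated with exact fractions. The following is a minimal sketch, assuming the standard hypercharge values (lepton doublet $-1/2$, quark doublet $1/6$, singlets $-1$, $2/3$, $-1/3$) that the text derives only later; the dictionary and function names are illustrative.

```python
from fractions import Fraction as Fr

# Assumed standard hypercharges (derived later in the text, see Table 1).
d = {"l": Fr(-1, 2), "q": Fr(1, 6)}               # left handed doublets
s = {"e": Fr(-1), "u": Fr(2, 3), "d": Fr(-1, 3)}  # right handed singlets


def left_charges(doublet):
    """Q = I3 + d for the upper (I3 = +1/2) and lower (I3 = -1/2) component."""
    return Fr(1, 2) + d[doublet], Fr(-1, 2) + d[doublet]


def right_charge(singlet):
    """Q = s for an isoscalar right handed fermion."""
    return s[singlet]


charges = {
    "nu_L": left_charges("l")[0], "e_L": left_charges("l")[1],
    "u_L": left_charges("q")[0], "d_L": left_charges("q")[1],
    "e_R": right_charge("e"), "u_R": right_charge("u"), "d_R": right_charge("d"),
}
for name, q in charges.items():
    print(f"{name}: {q}")
```

Left and right handed components of the same particle come out with the same electric charge, as they must for a vector-like electromagnetic coupling.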
When expanding the Higgs action (2.4) the invariance guarantees the photon stays massless but the electrically charged $W$ bosons become massive with a mass $m_W = g_2 \frac{v}{2} \sim 80$ GeV and the neutral $Z$ boson has a mass on the same order of magnitude. It depends on the relative strength of the $SU(2)$ and $U(1)_Y$ couplings, the Weinberg angle:
\[ \frac{m_W}{m_Z} = \cos\theta_w, \quad \text{with} \quad \cos\theta_w = \frac{g_2}{\sqrt{g_2^2 + g_1^2}} . \]
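To get a feeling for the numbers, the two mass relations above can be evaluated directly. This is only a sketch: the input values for $g_2$, $g_1$ and $v$ are approximate electroweak-scale numbers chosen for illustration and are not taken from the text.

```python
import math

# Illustrative inputs, not taken from the text: approximate couplings at the
# electroweak scale and the Higgs VEV v in GeV.
g2, g1, v = 0.652, 0.357, 246.0

m_w = g2 * v / 2                              # m_W = g2 v / 2
cos_theta_w = g2 / math.sqrt(g2**2 + g1**2)   # cosine of the Weinberg angle
m_z = m_w / cos_theta_w                       # from m_W / m_Z = cos(theta_w)

print(f"m_W = {m_w:.1f} GeV, cos(theta_w) = {cos_theta_w:.3f}, m_Z = {m_z:.1f} GeV")
```

With these inputs the $W$ mass lands near 80 GeV and the $Z$ mass near 91 GeV, both of the same order of magnitude as stated.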
The longitudinal spin degree of freedom that the massive $W$ and $Z$ bosons need is provided by three of the four degrees of freedom in the Higgs doublet, the fields $\hat h$ and $H^\pm$. The remaining degree of freedom $h$ is the field of the Higgs particle and its existence was confirmed in 2012. The Yukawa couplings couple left to right handed fermions through the Higgs multiplet, respecting the $SU(2) \times U(1)$ symmetry, and generate masses of the Dirac type through the Higgs expectation value:
\[ S_{Yuk} = \bar Q_L \tilde H \lambda_{Qu} u_R + \bar Q_L H \lambda_{Qd} d_R + \bar l_L \tilde H \lambda_{l\nu} \nu_{eR} + \bar l_L H \lambda_{le} e_R + \text{h.c.} \tag{2.10} \]
The couplings $\lambda_{Qu}$, etc. do connect the three families. They are three by three matrices and taken to be complex. To obtain the masses for the fermions to lowest order we substitute for the Higgs (2.5) its VEV and drop the fluctuation terms. We then have to diagonalise the couplings. For this we need in the quark sector the two unitary matrices $U_{u,d}$ diagonalizing, respectively, the first and second term in (2.10). If we now write the weak charged currents $\bar u \gamma^\mu d\, W_\mu^-$ from Eq. (2.2) in terms of the mass eigenstates, the product $U_u^\dagger U_d$ appears in the current. This matrix, the CKM matrix [40], has measurable effects and its three angles and one phase have been determined experimentally. For the lepton sector such a matrix can always be absorbed in the left handed neutrino field as long as the neutrino is massless. In the neutral current the unitarity of $U_{u,d}$ causes absence of any family mixing to lowest order.
2.1 The Neutrino Mass Sector $S_\nu$

The last term in the SM action (2.1) is the part that refers to the massive neutrinos. The simplest way to accommodate them is to introduce a right handed neutrino field $\nu_R$ which is an $SU(2) \times U(1)$ scalar. This is done for all three generations. We will insist on only renormalizable couplings. Then $S_\nu$ becomes
\[ S_\nu = \bar\nu_R\, \partial\!\!\!/\, \nu_R + \bar l_L H^\dagger \lambda_{l\nu} \nu_R + \frac12 \bar\nu_R^{cc} M_M \nu_R + \text{h.c.} \]
The superscript $cc$ stands for charge conjugation, $\nu_R^t C = \bar\nu_R^{cc}$ with $C^2 = -1$, and the superscript $t$ for transpose. The second term is the familiar Yukawa Higgs coupling producing Dirac masses. It is invariant under the phase transformation $\exp(i\lambda)$ multiplying left and right handed leptons. This leads to a conserved charge, lepton number $L$. Omitting the last term (the Majorana mass term, which lacks the symmetry) would leave us with massive Dirac neutrinos and lepton number conserved. Including it will lead to Majorana neutrinos, as shown below, and hence lepton number is no longer conserved. A small value for $M_M$ is said to be natural because setting it to zero enhances the symmetry. As we will see in the last section lepton number is broken anyway by a global anomaly.

We expect in that case that we will have from every generation two Majorana neutrinos: one that is the active neutrino produced in, for example, $\beta$ decay, and one that is a sterile neutrino only interacting through the Yukawa Higgs coupling and a coupling to the active neutrinos to be discussed below. Let us rewrite the Dirac mass terms we obtain by plugging into the Yukawa term the Higgs VEV $\frac{v}{\sqrt2}$, thus obtaining a Dirac mass $M_D$:
\[ \bar\nu_L M_D \nu_R = \frac12 \left( \bar\nu_L M_D \nu_R - \bar\nu_R^t M_D^t \nu_L^t \right) . \]
The latter term is the transpose of the first term. Using the properties of the charge conjugation matrix $C$ it can be written in the form
\[ -\bar\nu_R^t M_D^t \nu_L^t = \bar\nu_R^{cc} M_D^t \nu_L^{cc} . \]
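Substituting into (2.11) the neutrino mass term can be written in the typical Majorana form
\[ \frac12 \begin{pmatrix} \bar\nu_L & \bar\nu_R^{cc} \end{pmatrix} \begin{pmatrix} 0 & M_D \\ M_D^t & M_M \end{pmatrix} \begin{pmatrix} \nu_L^{cc} \\ \nu_R \end{pmatrix} . \]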
The zero in the matrix is due to our requirement of only renormalisable terms. It corresponds to a Majorana mass formed from only left handed neutrinos. We will resuscitate this mass term in Sect. 5. The mass term has become a Majorana type mass term with a symmetric complex Majorana mass matrix within the parentheses. We can diagonalize this matrix with a unitary matrix $V_{PMNS}$ and its transpose.$^3$ A diagonal matrix results:
\[ \begin{pmatrix} m & 0 \\ 0 & M \end{pmatrix} . \]
The matrix $m = \mathrm{diag}(m_1, m_2, m_3)$ has real positive entries and stands for the active neutrino masses. The same holds for the matrix $M$, whose diagonal elements are the masses $M_{1,2,3}$ of the sterile neutrinos. The indices of the active neutrinos are $a$, those of the sterile neutrinos $I$. The corresponding states can mix between active and sterile. The regime with $M_D$ > $v$ the heaviest Majorana does decay mainly in leptons. This happens only when the plasma is not in equilibrium, and when there is sufficient CP violation.$^5$ As the universe cools down to $T \sim v/g_2$ the sphaleron mechanism [31, 60] transmutes this lepton asymmetry into the wanted baryon asymmetry. See (5) for more details.

The model has been tested experimentally for almost 50 years in great detail, not only from the spectroscopic point of view.
5 These are the Sakharov [61] conditions.
Parameters like the Weinberg angle, gauge couplings, decay rates and production rates have been tested and no deviations of appreciable magnitude have been found. Notable exceptions today are the anomalous magnetic moment of the muon [27] and lepton universality [23]. And the search for sterile neutrinos is still on. What remains not understood is notably:
• The variation in the numerical values of the Yukawa couplings $\lambda_f$ in $S_{Yuk}$ leading to the mass spectrum.
• The vacuum energy due to symmetry breaking. We will come to that in Sect. 4.1.
2.2 A Short Thermal History of the Universe, Seen Through the Standard Model

The model has the potential to provide us with a calculable perturbation scheme up to the Planck mass. This is due to the experimental value of the Higgs mass. It leads to the important result that all couplings in the SM become small at high energies [47], just like the non-Abelian couplings $g_{2,3}$ [25]. Only the hypercharge coupling $g_1$ is growing, but at a slow rate starting from a small value. On the other hand the Higgs mass should not be too low, in order to avoid an unstable vacuum in the Higgs potential. This is crucial for understanding quantitatively the history of our Universe.

This history is a thermal history. Starting with the Big Bang the Universe goes through a period of inflation and reheating, leaving a hot plasma phase of particles. This phase, where the $SU(2)_L \times U(1)$ symmetry is unbroken, is a hot plasma of four Higgs particles, W's, Z's, leptons, gluons and quarks without photons. The latter have been replaced with the hypercharge vector particles. The gauge couplings in a plasma do depend on the mean energy $E \sim T$. So at high enough $T$ the couplings become small.$^6$ At temperatures on the order of the Fermi scale ($T_{ew} \sim 160$ GeV) one sees a crossover to a phase where the VEV of the Higgs appears. This is calculable with perturbative means. Only a small region around the transition temperature has long wavelength contributions $g_2^2(T)T$ which are not perturbative and have to be analysed with simulations. But the calculation is fully controlled [48, 49]. A further crossover is seen at a temperature of $\sim 150$ MeV where the hadrons start to form. This is governed by the QCD coupling $g_3$ which at these energy scales is still large. So numerical simulation was needed to determine this crossover.
6 The effective coupling depends also on the occupation number, $g_{eff}^2 = g^2(T)/(\exp(p/T) - 1)$. So for long wavelength modes $p \sim g^2(T)T$ the effective coupling is $O(1)$, no matter how small $g(T)$!
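The statement about the effective coupling is easy to check numerically. The sketch below evaluates $g_{eff}^2 = g^2/(\exp(p/T) - 1)$ for a soft mode with $p = g^2 T$ and for a typical thermal mode with $p = T$; the function name and the sample values of $g$ are illustrative choices.

```python
import math


def g_eff_squared(g, p_over_T):
    """g_eff^2 = g^2 / (exp(p/T) - 1): coupling times Bose occupation number."""
    return g**2 / math.expm1(p_over_T)


for g in (0.5, 0.1, 0.01):
    soft = g_eff_squared(g, g**2)   # long wavelength mode, p = g^2 T
    hard = g_eff_squared(g, 1.0)    # typical thermal mode, p = T
    print(f"g = {g}: g_eff^2(soft) = {soft:.3f}, g_eff^2(hard) = {hard:.2e}")
```

For the soft modes the result approaches 1 as $g$ decreases, while for the hard modes it stays of order $g^2$, which is the reason why only the long wavelength sector needs non-perturbative treatment.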
After this crossover the neutrinos started to decouple from the plasma, at about $T = 1$ MeV [46], giving rise to the Cosmic Neutrino Background (CNB). At about the same time, at $T \sim 1$ to $10$ MeV, nucleosynthesis set in, and much later recombination causes the Cosmic Microwave Background at $T \sim 0.3$ eV. Unfortunately the unbroken phase is hidden behind the CMB. Only neutrinos and gravitational waves [50] can tell us about it. We have yet to introduce the fourth force: gravity. Before doing that we will see that gravity plays a role in fixing the hypercharge assignment in the Standard Model.
3 The Anomaly Cancellation and Fixing of Weak Hypercharge

The structure of the standard model is simple. There are three families. In each family there are two left handed $SU(2)$ doublets, two right handed lepton singlets, and two right handed quark singlets. In what follows we need only to discuss a single family, taking the first one. The attribution of the hypercharges to the fermion matter doublets and singlets and the Higgs doublet as in Table 1 can be done on the basis of the following simple reasoning. In the last section we saw how the symmetry breaking by the Higgs doublet implied a relation between the third component of isospin, $I_3$, weak hypercharge and electric charge $Q$ which reads for left handed particles
\[ Q = I_3 + Y. \]
Right handed particles are isoscalar so electric charge and hypercharge are the same. For the electron, muon, and tau this gives indeed electric charge $-1$ and their respective neutrinos are electrically neutral. Up and down quark have, respectively, charge $2/3$ and $-1/3$. But the question remains: who ordered this hypercharge attribution? Are there other hypercharge assignments?
Table 1 Fermion multiplets with dimensions of $SU(3)$ and $SU(2)_L$

$(1,2)_{-1/2}$        $(3,2)_{1/6}$     $(1,1)_{-1}$   $(3,1)_{2/3}$   $(3,1)_{-1/3}$   No charge
$(\nu_e, e)$          $(u^c, d^c)$      $e_R$          $u_R^c$         $d_R^c$          $\nu_{eR}$
$(\nu_\mu, \mu)$      $(c^c, s^c)$      $\mu_R$        $c_R^c$         $s_R^c$          $\nu_{\mu R}$
$(\nu_\tau, \tau)$    $(t^c, b^c)$      $\tau_R$       $t_R^c$         $b_R^c$          $\nu_{\tau R}$
The answer is no.$^7$ All hypercharges are fixed in terms of the hypercharge of the Higgs doublet. This is due to chirality not being conserved at the quantum level for a single particle species, but becoming conserved by taking all particle species in a given family. In the standard model the current associated with the hypercharge gauge field $B_\mu$ has an axial part, proportional to $\pm\gamma_5\gamma^\mu$ depending on whether it couples to the left handed or the right handed fermion. From the action $S_M$ one sees that this current is of the form, for a generic fermion $\psi$,
\[ j_5^\mu = i\bar\psi \gamma_5 \gamma^\mu \psi. \]
According to the first Noether theorem all these currents are conserved on the classical level:
\[ \partial_\mu j_5^\mu = 0. \tag{3.1} \]
Physically this conservation law says that a massless fermion can never change its chirality, because for a massless fermion chirality and helicity are the same thing, and helicity never changes for massless particles. This seems an unassailable truth. Nevertheless in the quantum theory it is no longer true. In the quantum regime, under certain circumstances that we will explain below, a right handed fermion can flip its chirality and become a left handed fermion and vice versa.

We only need to address the current coupled to the $B_\mu$ field. Figure 2 shows a one loop transition where $B^\mu$ with four momentum $k$ produces two gauge bosons, either two $W$'s, gluons, or gravitons. If we contract the resulting amplitude $j_5^\mu$ with the four momentum $k_\mu$ of the $B_\mu$ field one obtains a non-vanishing result for (3.1) [14]. This non-vanishing is called the anomaly. The external fields in the diagram are classical so we can write them as such. The result is gauge invariant and a pseudoscalar and the only invariant is the field strength times its dual. If a given particle $p$ circulates in the loop then the divergence becomes
\[ \partial_\mu j_5^{\mu\, p} = \frac{a_p}{8\pi^2} F(V)_{\mu\nu} \tilde F(V)^{\mu\nu}, \qquad \tilde F^{\mu\nu} = \frac12 \epsilon^{\mu\nu\rho\sigma} F(V)_{\rho\sigma} . \tag{3.2} \]

Fig. 2 Triangle anomaly. The axial-vector vertex $\gamma^\mu\gamma_5$ is explicit. The wavy lines are both gluons, or both weak vector bosons, or gravitons. The straight lines are fermion propagators. For the gluons the diagram is evaluated for all the fermions in a family and should then cancel. Similarly for the weak vector bosons and gravitons. There is another diagram where the wavy lines are crossed

7 For earlier work see [51].
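$V$ stands for the weak vector boson and the gluon, and $a_p$ stands for the numerical coefficient that the given particle in the loop contributes. For the graviton anomaly [63] we have the Riemann tensor $R_{\kappa\lambda\mu\nu}$ contracted with its dual $\tilde R^{\kappa\lambda\mu\nu} = \frac12 \epsilon^{\kappa\lambda\rho\sigma} \sqrt{-g}\, R_{\rho\sigma}{}^{\mu\nu}$:
\[ \partial_\mu \left( \sqrt{-g}\, j_5^{\mu\, p} \right) = \frac{a_p}{192\pi^2} R_{\kappa\lambda\mu\nu} \tilde R^{\kappa\lambda\mu\nu} . \tag{3.3} \]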
Actually since the particle is massless it will not change its chirality in any of the vertices in the diagram. Remember that this was essentially the argument that the classical current was divergenceless. So how does the current pick up the anomalous divergence in the diagram? One answer is that in the quantum mechanical amplitude for the divergence of the axial current the contribution of a particle is integrated over its momentum and starts to diverge for large momenta. In the triangle diagram it is a linear divergence. In order to render it finite one introduces a fictitious particle with a large "regulator" mass and subtracts its amplitude from the one in Fig. 2. In the regulator diagram the helicity is no longer conserved because of the mass. It turns out that even in the limit of infinite regulator mass this lack of conservation does not vanish and gives the anomaly.

The reader might find this an unsatisfactory answer as it depends on a particular way of obtaining a finite result. However, it turns out that every regularization method that respects gauge invariance always gives the same number $a_p$ multiplying the manifestly gauge invariant pseudoscalar in (3.2). Perhaps a satisfactory answer is to consider the vacuum state as a filled Dirac sea. The non-conservation follows [15] then by first switching on a constant external magnetic field $B$, say along the positive $z$-direction. This will change [16] the Dirac energy levels into Landau levels with energy
\[ E_n^2 = p_z^2 + 2\left(n + \frac12\right) eB - 2eBs, \quad n \ge 0. \tag{3.4} \]
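The level structure encoded in (3.4) can be made explicit with a few lines of exact arithmetic. This is a minimal sketch in units of $eB$; the function name `level` is an illustrative choice. It shows that the level with $n = 0$ and spin $s = +\frac12$ is gapless, and that for $n > 0$ the state $(n, s = +\frac12)$ has the same energy as $(n-1, s = -\frac12)$.

```python
from fractions import Fraction as Fr


def level(n, s):
    """(E_n^2 - p_z^2) / (eB) from (3.4), for spin projection s = +1/2 or -1/2."""
    return 2 * (n + Fr(1, 2)) - 2 * s


up, down = Fr(1, 2), Fr(-1, 2)

# n = 0 with spin along B is the only gapless level: E^2 = p_z^2.
print("lowest level:", level(0, up))

# For n > 0 the state (n, up) is degenerate with (n - 1, down).
for n in range(1, 4):
    print(n, level(n, up), level(n - 1, down))
```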
The part proportional to the spin $s = \pm\frac12$ is the Zeeman splitting. It causes a degeneracy of the levels with $n > 0$. Only the $n = 0$ level is non-degenerate, with the spin $s = \frac12$ parallel to the $B$ field since the charge is positive. So $n = 0$ states with positive momentum ($\parallel B$) are right handed and those with negative momentum left handed. The degeneracy of the levels with $n$ fixed is $eB\frac{L_x L_y}{2\pi}$.

The states have positive charge $e = |e|$, so an applied electric field $E$ in the positive $z$-direction will increase the momentum of the right handed ones, so that high lying right handed states in the Dirac sea, i.e., with $E = -|p_z|$ small, will be evacuated to states deeper in the sea with larger momentum. The left handed states with $E = -|p_z|$ are decelerated to zero momentum, then acquire a positive energy and hence are pulled out of the sea. So the electric field will change the ground state (the filled Dirac sea) to a state with right handed holes and left handed particles. If the Dirac sea is infinitely deep (so that we do not have to worry about levels deep down getting emptied or filled) the net gain in chirality is the same for both. Note that the number of particles created equals the number of anti-particles created. But the gain in left handedness is obvious. Consider the system in a space volume $V = L_x L_y L_z$; then after a time lapse $t$ the number of left handed particle states $N_L$ created is
\[ N_L = 2 \times \left( eEt\, \frac{L_z}{2\pi} \right) \left( \frac{eB}{2\pi} \frac{V}{L_z} \right), \quad \text{or} \quad \partial_\mu j_5^\mu = -\frac{e^2}{8\pi^2} F_{\mu\nu} \tilde F^{\mu\nu} \tag{3.5} \]
with $\tilde F^{\mu\nu} = \frac12 \epsilon^{\mu\nu\rho\sigma} F_{\rho\sigma}$. This is the non-conservation of chirality [14, 15], also called the anomaly. We noted the conservation of fermion number. This is due to the presence of the two opposite chiralities, one producing particles, the other anti-particles. It follows that fermion number is not conserved if only one type of chirality is present.

Note that the effect takes place where the particles are crossing the zero energy levels, that is, it is a low energy phenomenon. On the other hand, the bottomless Dirac sea we needed means the cut-off in the graphical calculation is taken to be infinite. Moreover, the anomaly does not change when computing radiative corrections to it: it stays insensitive to corrections. It can be written as the divergence of the chirality current and is proportional to the inner product of electric and magnetic fields as shown before.

A corroboration of this peculiar fact came in the late seventies from a theorem in algebraic geometry of Atiyah and Singer [17]. The theorem says that the space time integral of the anomaly equals the difference of positive and negative chirality solutions of the Dirac equation (the "Dirac index") in the given external field. This Dirac index turns out to be equal to the space time integral of the divergence of the chiral current. And so the theorem is the integrated version of the anomaly equation. As $F_{\mu\nu} \tilde F^{\mu\nu}$ is a total divergence the value of the integral depends only on the boundary, not on the bulk properties of the configuration. This gives the anomaly a topological standing.
The anomaly is not limited to electromagnetic fields. In the standard model other forces appear and they all cause anomalies. These forces are the strong, electroweak and gravitational forces. For fermions in an $SU(N)$ gauge theory the anomalous divergence of the axial current $j_\mu^5(x)$ reads
\[ \partial^\mu j_\mu^5 = \frac{g^2}{8\pi^2} \mathrm{Tr}\, F_{\mu\nu} \tilde F^{\mu\nu} . \tag{3.6} \]
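For the gravitational field the anomaly [63] becomes
\[ D^\mu j_\mu^5 = \frac{1}{192\pi^2} R_{\mu\nu\rho\sigma} \tilde R^{\mu\nu\rho\sigma}, \qquad \tilde R^{\mu\nu\rho\sigma} = \frac12 \sqrt{-g}\, \epsilon^{\mu\nu\alpha\beta} R_{\alpha\beta}{}^{\rho\sigma} . \tag{3.7} \]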
This current non-conservation jeopardizes the consistency of the quantum theory of the standard model. Gauge invariance and unitarity, shown in ref. [24] to hold in the absence of anomalies, are lost [26, 37] when anomalies are present. One has to impose that the total result, i.e., the sum over all species inside the loop, vanishes. Fifty years ago [36] this was done by showing that the leptons and quarks create anomalies for the hypercharge current, but that they cancel with the hypercharge assignments in Table 1. At the time only $L_{EW}$ was considered; gluons were invented only a year later [38]. But the paper signalled a crucial moment for the SM.
3.1 Cancelling the Anomalies

In what follows we turn the tables and take the hypercharge assignment as a set of five unknown parameters: $d_l, d_q$ for the lepton (quark) doublets and $s_e, s_\nu, s_u, s_d$ for the electron, neutrino, quark up and down singlets. Of course, the neutrino being electrically neutral, $s_\nu = 0$. We keep $s_\nu$ in the arguments below. Then, insisting on the cancellation of the anomaly for all four fundamental interactions, not only the electroweak but also including gluons and gravity, we will find that Nature only retains one possible assignment, the one in Table 1. The right handed neutrino does not play a role in the consistency. Remember it is supposed to be a gauge neutral particle.

The vertices in the diagram are given by the couplings to the fermions in (2.2) and contribute group factors and charge factors to (3.2). The $a_p$ factors contain only the latter and of course the coupling $g$ of the external legs. Every fermion $p$ with a definite chirality circulating in the loop contributes to this anomalous result (3.2) a factor $a_p$: the left handed fermions with a minus sign, the right handed with a plus sign, as the $\gamma_5$ in the right vertex acts on the chiral projectors as $\gamma_5 P_{L,R} = (-, +) P_{L,R}$, respectively.
Let us concentrate on a simple example, that of two $W$ bosons in the weak isospin current
\[ i g W_\mu \left( \bar Q_L^c \gamma^\mu Q_L^c + \bar L_L \gamma^\mu L_L \right) . \tag{3.8} \]
From the current (3.8) it is clear that only the left handed quark doublet and the lepton doublet can circulate in the loop. So the non-vanishing result is proportional to
• $d_l$ for the leptons, and
• $3 \times d_q$ for the quarks, the factor 3 being due to the colour degrees of freedom of the quarks.
In what follows it is useful to write $N_c$ for the number of colours. In nature $N_c = 3$. Both contributions come from left handed fermions and are therefore to be multiplied with a minus sign. So the cancellation of this anomaly due to the weak isospin vector bosons becomes a fact if
\[ -d_l - N_c d_q = 0. \tag{3.9} \]
For the gluon anomaly to cancel one has
\[ N_c(-2d_q + s_u + s_d) = 0. \tag{3.10} \]
There is one more anomalous triangle diagram, with a $B$ vector meson in all three vertices. Now the weak hypercharges come in cubically and the cancellation imposes
\[ -2d_l^3 - 2N_c d_q^3 + s_e^3 + s_\nu^3 + 3s_u^3 + 3s_d^3 = 0. \tag{3.11} \]
For the two graviton amplitude we use that gravitons couple universally to all fermions. So the left handed doublets contribute
\[ -(2d_l + 2N_c d_q) \]
and the right handed singlets
\[ s_e + s_\nu + N_c(s_u + s_d). \]
So the graviton anomaly cancels if
\[ -(2d_l + 2N_c d_q) + s_e + s_\nu + N_c(s_u + s_d) = 0. \tag{3.12} \]
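The four cancellation conditions (3.9) to (3.12) can be checked against the standard hypercharge assignment with exact fractions. The sketch below assumes the values listed in Table 1 for one family and $N_c = 3$; the variable and dictionary names are illustrative.

```python
from fractions import Fraction as Fr

N_c = 3
# Hypercharges of Table 1 (one family); s_nu = 0 for the neutral singlet.
d_l, d_q = Fr(-1, 2), Fr(1, 6)
s_e, s_nu, s_u, s_d = Fr(-1), Fr(0), Fr(2, 3), Fr(-1, 3)

anomalies = {
    "SU(2) (3.9)": -d_l - N_c * d_q,
    "gluon (3.10)": N_c * (-2 * d_q + s_u + s_d),
    "cubic (3.11)": (-2 * d_l**3 - 2 * N_c * d_q**3
                     + s_e**3 + s_nu**3 + N_c * (s_u**3 + s_d**3)),
    "graviton (3.12)": -(2 * d_l + 2 * N_c * d_q) + s_e + s_nu + N_c * (s_u + s_d),
}
for name, value in anomalies.items():
    print(f"{name}: {value}")
```

Each entry evaluates to zero, so the assignment of Table 1 is anomaly free for all four interactions.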
Note that in the graviton anomaly the first two terms appear in the anomaly of the vector bosons (3.9), so they drop out. Note that the right handed contributions to the linear anomaly constraints are really in terms of two variables, the sums of the right handed quark and lepton singlets
\[ \Sigma_q = \frac{s_u + s_d}{2}, \quad \Sigma_l = \frac{s_e + s_\nu}{2}. \tag{3.13} \]
They read for the colour and gravitational anomaly
\[ -d_q + \Sigma_q = 0 \tag{3.14} \]
and
\[ N_c \Sigma_q + \Sigma_l = 0. \tag{3.15} \]
Note that from these two equations and (3.9) follows
\[ -d_l + \Sigma_l = 0, \tag{3.16} \]
so there is perfect symmetry between leptons and quarks, except for the colour factor $N_c$.$^8$ For the cubic term the differences $\Delta_q = \frac{s_u - s_d}{2}$ and $\Delta_l = \frac{s_\nu - s_e}{2}$ also enter. In fact, using (3.9), (3.14), (3.16) and after some simple algebra the cubic constraint is written as
\[ 6 d_l \left( \Delta_l^2 - \Delta_q^2 \right) = 0. \tag{3.17} \]
If we insist on the left handed couplings $d_l$ (and hence $d_q$, by (3.9)) not to vanish we have found that there are two possibilities left:
\[ \Delta_l = \pm\Delta_q . \tag{3.18} \]
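The "simple algebra" behind the reduction of the cubic constraint can be verified with random rational charges. The sketch below imposes the linear constraints (3.9), (3.14) and (3.16), parametrises the singlets by their half-sums and half-differences, $(s_u \pm s_d)/2$ and $(s_\nu \pm s_e)/2$, and compares the cubic sum (with the colour factor written as $N_c$) to $6 d_l$ times the difference of the squared half-differences. Function and variable names are illustrative.

```python
from fractions import Fraction as Fr
import random


def cubic_vs_reduced(N_c, d_q, Delta_q, Delta_l):
    """Return the cubic sum (3.11) and its reduced form (3.17) after the
    linear constraints (3.9), (3.14), (3.16) have been imposed."""
    d_l = -N_c * d_q                      # (3.9)
    Sigma_q, Sigma_l = d_q, d_l           # (3.14), (3.16)
    s_u, s_d = Sigma_q + Delta_q, Sigma_q - Delta_q
    s_nu, s_e = Sigma_l + Delta_l, Sigma_l - Delta_l
    cubic = (-2 * d_l**3 - 2 * N_c * d_q**3
             + s_e**3 + s_nu**3 + N_c * (s_u**3 + s_d**3))
    reduced = 6 * d_l * (Delta_l**2 - Delta_q**2)
    return cubic, reduced


rng = random.Random(0)  # fixed seed, so the run is reproducible


def random_charge():
    """A small random rational number."""
    return Fr(rng.randint(-9, 9), rng.randint(1, 9))


for _ in range(5):
    cubic, reduced = cubic_vs_reduced(3, random_charge(), random_charge(), random_charge())
    print(cubic, reduced, cubic == reduced)
```

The two expressions agree for arbitrary inputs, so the cubic anomaly vanishes exactly when the two half-differences are equal up to a sign (or $d_l = 0$).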
The case with the minus sign is unacceptable. It will give $d_h = 0$ using the conservation laws in the Yukawa couplings (3.20). The problem of solving the four constraints is now fully linear. Note we did not use the charge neutrality of the neutrino, $\Delta_l = -\Sigma_l$. Using it we have five linear equations with six unknowns. Solve this system with five of the unknowns in terms of $\Delta_q$ and find
\[ \frac{1}{1 + 1/N_c}\, s_u = \frac{1}{-1 + 1/N_c}\, s_d = -\frac12 s_e = -d_l = N_c d_q = \Delta_q . \tag{3.19} \]

8 Inversely, (3.16), (3.10), and (3.9) imply the gravitational constraint (3.19). Had there been a $U(1)$ vector field coupling to lepton “colour” then the corresponding anomaly cancellation would have given (3.16).
From the hypercharge conservation in the Yukawa couplings in (2.10) we get four more constraints involving the Higgs doublet hypercharge $d_h$:
\[ \begin{aligned} -d_q - d_h + s_u &= 0, \\ -d_q + d_h + s_d &= 0, \\ -d_l + d_h + s_e &= 0, \\ -d_l - d_h + s_\nu &= 0. \end{aligned} \tag{3.20} \]
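The Yukawa conditions (3.20) can be combined with the solution (3.19) to express every hypercharge through $d_h$. The sketch below does this for general $N_c$, assuming the half-difference $(s_u - s_d)/2$ equals $d_h$ and $s_\nu = 0$; the function names are illustrative.

```python
from fractions import Fraction as Fr


def hypercharges(d_h, N_c=3):
    """Hypercharges in terms of the Higgs hypercharge d_h, following (3.19)
    with the half-difference (s_u - s_d)/2 set equal to d_h and s_nu = 0."""
    return {
        "d_l": -d_h, "d_q": d_h / N_c,
        "s_e": -2 * d_h, "s_nu": Fr(0),
        "s_u": (1 + Fr(1, N_c)) * d_h, "s_d": (-1 + Fr(1, N_c)) * d_h,
    }


def yukawa_constraints(y, d_h):
    """Left hand sides of the four hypercharge conservation conditions (3.20)."""
    return [
        -y["d_q"] - d_h + y["s_u"],
        -y["d_q"] + d_h + y["s_d"],
        -y["d_l"] + d_h + y["s_e"],
        -y["d_l"] - d_h + y["s_nu"],
    ]


y = hypercharges(Fr(1, 2))
print(y)
print(yukawa_constraints(y, Fr(1, 2)))
```

For $d_h = 1/2$ and three colours this reproduces the entries of Table 1, and the four Yukawa conditions vanish for any $N_c$.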
The first two give Eq. (3.14) and $\Delta_q = d_h$. The last two give Eq. (3.16) and $\Delta_l = d_h$, or $\Delta_l = \Delta_q$. The latter is just one of the solutions of the cubic constraint, Eq. (3.18), so gives no new information either. As already mentioned, the solution with a minus sign would have given $d_h = 0$ so is not acceptable. So all hypercharges are now expressed in terms of $d_h$. The latter was fixed to be $1/2$ in order to have the electromagnetic subgroup leave the Higgs VEV invariant, Eq. (2.6). So for three colours the values of the hypercharges in Table 1 are recovered.

We end this section with the following amusing comment, as noted in reference [52]. Suppose we had not known about the gravitational anomaly (3.12). Then the cubic constraint is not reducible to the quadratic constraint (3.17). But using the charge neutrality of the neutrino, the $SU(2)$ and colour anomaly we can obtain the result (3.18) using Fermat's Last Theorem. The argument starts from the cubic equation in the form
\[ 2N_c^3 d_q^3 + s_e^3 + 6N_c d_q \Delta_q^2 = 0. \tag{3.21} \]
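One can scale out the $N_c$ dependence by introducing judicious new variables
\[ \frac{\Delta_q}{N_c d_q} = \frac{a - b}{a + b} \quad \text{and} \quad -\frac{s_e}{2d_q} = \frac{\Delta_l}{d_q} = \frac{N_c}{a + b} \tag{3.22} \]
and by substituting in the cubic equation. The result is simply
\[ 8N_c^3 \left( a^3 + b^3 - 1 \right) = 0. \tag{3.23} \]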
If the fermionic charges are rational numbers so are $a$ and $b$. But then $a = 1$ and $b = 0$ or vice versa according to Fermat's Last Theorem. They give, respectively,
\[ \Delta_l = N_c d_q \quad \text{and} \quad \Delta_q = \pm N_c d_q, \tag{3.24} \]
i.e., Eq. (3.18). Recall that this result was originally obtained by invoking the gravitational anomaly but for an arbitrary charge assignment $s_\nu$ and no condition of rationality of the charges.
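The change of variables can be checked with exact arithmetic. The sketch below assumes the substitution $s_e = -2N_c d_q/(a+b)$ and $(s_u - s_d)/2 = N_c d_q (a-b)/(a+b)$, which is one reading of (3.22), and compares the left hand side of (3.21) with $8N_c^3 d_q^3 (a^3 + b^3 - 1)/(a+b)^3$. The overall factor $d_q^3/(a+b)^3$ is written out here and does not affect the zeros; function names are illustrative.

```python
from fractions import Fraction as Fr


def cubic_321(N_c, d_q, s_e, Delta_q):
    """Left hand side of the cubic equation (3.21)."""
    return 2 * N_c**3 * d_q**3 + s_e**3 + 6 * N_c * d_q * Delta_q**2


def from_ab(a, b, N_c=3, d_q=Fr(1, 6)):
    """Charges parametrised by a and b (assumed reading of (3.22))."""
    Delta_q = N_c * d_q * (a - b) / (a + b)
    s_e = -2 * N_c * d_q / (a + b)
    return s_e, Delta_q


N_c, d_q = 3, Fr(1, 6)
for a, b in [(Fr(1), Fr(0)), (Fr(0), Fr(1)), (Fr(2, 3), Fr(1, 5))]:
    s_e, Delta_q = from_ab(a, b, N_c, d_q)
    lhs = cubic_321(N_c, d_q, s_e, Delta_q)
    rhs = 8 * N_c**3 * d_q**3 * (a**3 + b**3 - 1) / (a + b)**3
    print(a, b, lhs, rhs)
```

The two sides agree for every pair $(a, b)$, and they vanish only on the Fermat curve $a^3 + b^3 = 1$, whose rational points with $a + b \neq 0$ are $(1, 0)$ and $(0, 1)$.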
The latter condition is needed. Remember that the wave function of a physical on-shell hypercharge "photon" with, say, momentum $(p, 0, 0, p)$ has a non-compact little group [58] that in $SL(2, C)$ form reads
\[ S = \begin{pmatrix} e^{i\theta/2} & a \\ 0 & e^{-i\theta/2} \end{pmatrix} . \tag{3.25} \]
The rotation angle $\theta$ transforms the helicity vector $e_\mu^{(+)} \sim (0, 1, i, 0)$ by a phase. The complex number $a$ shifts the helicity vector by $(a + (1 + i)|a|^2) p_\mu$, a non-compact gauge transformation. So we need rational charges to ensure that the hypercharge group is $U(1)_Y$. This is in the spirit of the standard model, which unites the hypercharge group with a $U(1)$ subgroup of $SU(2)_L$ as in (2.6). The conclusion is that the gravitational constraint is needed for fixing the hypercharges. Unfortunately the anomaly cancellation gives no inkling of why there is an asymmetry between quarks and leptons. If we keep $s_\nu$ as a variable one gets a sum of 3 cubic powers and there are many solutions.

Let us recapitulate. The constraints do fix the hypercharges. Amongst them is the constraint due to the graviton. It plays the same role as the gauge bosons. The question is then: does this similarity go further? Is gravity, like the other three forces, due to the gauging of a symmetry, in this case the Poincaré group?
4 The Gauging of the Poincaré Group

As discussed in the introduction the spirit of the Standard Model is that of gauging symmetries suggested by experiment. Below we will do this for the Poincaré group. The argument is a shortened version of the original papers [6]. The transformation of a generic field $F(x)$ under the Poincaré group is
\[ F'(x') = S(\Lambda)(x) F(x), \tag{4.1} \]
with $S(\Lambda)(x) = \exp(\lambda(x)_{ab} M^{ab})$ a representation of the Lorentz group. For example, for a Dirac spinor $\psi(x)$ with Dirac matrices $\{\gamma^a, \gamma^b\} = 2\eta^{ab}$,
\[ M^{ab} = \frac12 \gamma^a \gamma^b . \]
For a vector field $V^c$ we have
\[ M^{ab} V^c = \eta^{ac} V^b - \eta^{bc} V^a . \]
In both cases the generators obey the algebra of the Lorentz group
\[ \left[ M^{ab}, M^{cd} \right] = -\eta^{ac} M^{db} + (c \leftrightarrow d) + \eta^{bc} M^{da} - (c \leftrightarrow d). \]
In what follows we take the $x$ dependent Lorentz transformations $\Lambda(x)^\mu{}_\nu$ and $x'^\mu(x) \equiv \Lambda(x)^\mu{}_\nu x^\nu + t(x)^\mu$ as independent. The latter give rise to diffeomorphisms, also called Einstein transformations, $\frac{\partial x'^\mu(x)}{\partial x^\nu}$. Only the subgroup of local Lorentz transformations is analogous to the internal symmetries in the Standard Model. To make this analogy manifest we rename the labels of the local Lorentz transformations with Latin ones: $\Lambda^a{}_b$. The Minkowski metric is then $\eta_{ab}$. We lower or raise Latin indices with the Lorentz metric $\eta_{ab}$ or $\eta^{-1}_{ab} \equiv \eta^{ab}$, and hence $\eta_{ab}\eta^{bc} = \delta_a^c$. As for the internal symmetries in the Standard Model, the Lagrangian $L(F, \partial_\mu F)$ is supposed invariant under constant transformations, with
\[ \partial'_\mu F'(x') = \Lambda_\mu{}^\nu S(\Lambda) \partial_\nu F(x). \tag{4.2} \]
.
We search for a derivative .Da that depends on the six Lorentz group potentials ωμ to obtain a covariant expression of the form
.
Da F (x ) = ba S()(x)Db F (x).
.
(4.3)
If we for the sake of the argument forget the presence of $x'(x)$ in the left hand side of (4.3), the transformation $S(\Lambda)$ would be the only, internal, symmetry and the problem could be solved as we did before. That is, introduce six potentials $\omega_\mu^{ab}$ labelled by antisymmetric indices $(a, b)$ to get the covariant derivative
\[ D_\mu(\omega) = \partial_\mu + \omega_\mu, \quad \omega_\mu \equiv \frac12 \omega_{ab\mu} M^{ab} \tag{4.4} \]
with the transformation law as in (3.12)
\[ \tilde\omega(x)_\mu = -\partial_\mu S(\Lambda)\, S(\Lambda)^{-1} + S(\Lambda)\, \omega_\mu\, S(\Lambda)^{-1}, \quad \text{so} \quad D_\mu(\tilde\omega) \tilde F(x) = S(\Lambda) D_\mu F(x). \tag{4.5} \]
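Now reintroduce $x \to x'$ in the covariant derivative $D_\mu(\omega)$:
\[ D'_\mu(\omega'(x')) = \frac{\partial}{\partial x'^\mu} + \omega'_\mu(x'). \tag{4.6} \]
The derivative term becomes through the chain rule $\frac{\partial x^\nu}{\partial x'^\mu} \frac{\partial}{\partial x^\nu}$.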
Now the potential is a contravariant vector as well$^9$ and therefore the Lorentz covariant derivative is a contravariant derivative under Einstein transformations
\[ D'_\mu(\omega'(x')) = \frac{\partial x^\nu}{\partial x'^\mu} D_\nu(\omega(x)). \tag{4.7} \]
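The result for the Lorentz covariant derivative $D_\mu(\omega)$ under the combined transformation (Lorentz and Einstein, i.e., Poincaré) is
\[ D'_\mu(\tilde\omega') F'(x') = \frac{\partial x^\nu}{\partial x'^\mu} D_\nu(\tilde\omega) S(\Lambda) F(x) = \frac{\partial x^\nu}{\partial x'^\mu} S(\Lambda) D_\nu(\omega) F(x). \tag{4.8} \]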
This derivative is not yet what we search for. The first factor $\frac{\partial x^\nu}{\partial x'^\mu}$ is an Einstein transformation and should be replaced by a local Lorentz transformation $\Lambda(x)$. To obtain this we introduce the Vierbein potential $e^\mu{}_a$ with the transformation law
\[ e'^\mu{}_a(x') = \frac{\partial x'^\mu}{\partial x^\nu} \Lambda_a{}^b e^\nu{}_b(x). \tag{4.9} \]
The Vierbein is supposed to be an invertible matrix:
\[ e_a{}^\mu e_\mu{}^b = \delta_a^b . \]
After contraction of the Vierbein into (4.8) the covariant Einstein transformation of the Vierbein cancels with the contravariant one in (4.8) and replaces it by a local Lorentz transformation. Sometimes the Vierbein is called the potential for the four $x$ dependent translations. But note that the transformation law of the Vierbein has no inhomogeneous term characteristic for a gauge potential. So the derivative $D_a(e, \omega) = e^\mu{}_a D_\mu(\omega)$ has the required property
\[ D'_a(e', \tilde\omega') F'(x') = \Lambda(x)_a{}^b S(\Lambda(x)) D_b(e, \omega) F(x). \tag{4.10} \]
The Vierbein has come on the stage as a deus ex machina to produce a local Lorentz covariant derivative. Later on its role will become clear. The Lagrangian becomes by definition now a function $L(F, D_a F)$ and is invariant under local Poincaré transformations. The volume element $d^4x\, e$ ($e = \det e_\mu{}^a$) is invariant according to Eq. (4.9). Note the Vierbein $e_\mu{}^a$ is by definition the matrix inverse of $e^\mu{}_a$. So we have obtained what we wanted, namely a matter action invariant under local Lorentz transformations.

9 This is easily justified for infinitesimal transformations $x' = x + \delta x$.
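Now we are in a position to introduce the corresponding field strengths
\[ [D_\mu(\omega), D_\nu(\omega)] = \frac12 \hat R(\omega)_{\mu\nu} = \frac12 \hat R^{ab}{}_{\mu\nu} M_{ab} \]
and from the field strength the invariants
\[ \hat R \equiv e_a{}^\mu e_b{}^\nu \hat R^{ab}{}_{\mu\nu}, \quad \hat R^{ab}{}_{\mu\nu} \hat R_{ab}{}^{\mu\nu}, \ \ldots \tag{4.11} \]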
Note the invariant linear in the field strength and quadratic in the Vierbein, clearly absent in internal gauge symmetries. Up till now the only gauge potential is the local Lorentz gauge potential .ω. The reader may feel now uncomfortable; there is no gravity yet. To see that gravity is also described by the theory there should be a gauge potential reflecting the Einstein transforms .x → x (x). Indeed such a potential can be constructed from .Dμ (ω) and the Vierbeins. Take the quantity . σ τ defined as
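$$\Gamma^\rho{}_{\sigma\tau} \equiv -e^b{}_\sigma D_\tau(\omega)e^\rho{}_b, \qquad D_\tau(\omega)e^\rho{}_b = \partial_\tau e^\rho{}_b + \omega_b{}^c{}_\tau\, e^\rho{}_c. \tag{4.12}$$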
How does this quantity (a local Lorentz invariant) transform under an Einstein transformation? Take the transform of the covariant derivative (4.7) and of the Vierbein (4.9) and substitute into (4.12). After some algebra we obtain
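$$\Gamma'^\rho{}_{\sigma\tau}(x') = \frac{\partial x'^\rho}{\partial x^\lambda}\frac{\partial x^\mu}{\partial x'^\sigma}\frac{\partial x^\nu}{\partial x'^\tau}\,\tilde\Gamma^\lambda{}_{\mu\nu}(x) - \frac{\partial x'^\rho}{\partial x^\mu}\frac{\partial^2 x^\mu}{\partial x'^\sigma\,\partial x'^\tau}. \tag{4.13}$$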
We have therefore recovered the familiar affine connection with one important difference. The symmetry in the lower indices is absent. The antisymmetric part is the torsion tensor $T^\rho{}_{\sigma\tau}$, a tensor because the inhomogeneous term in (4.13) is symmetric. Now the Riemann tensor $R(\Gamma)^\kappa{}_{\lambda\mu\nu}$ turns out to be, after substitution of (4.12),
$$R(\Gamma)^\kappa{}_{\lambda\mu\nu} = e^\kappa{}_a\, e_{b\lambda}\,\hat R^{ab}{}_{\mu\nu}.$$
So after contraction this identity shows the invariant $\hat R$ is just the traditional curvature invariant $R(\Gamma)$. Using the invertibility of the Vierbein the definition (4.12) is simply rewritten as
$$\partial_\mu e^f{}_\nu - \Gamma^\kappa{}_{\mu\nu}\,e^f{}_\kappa + \omega_\mu{}^f{}_b\, e^b{}_\nu = 0, \tag{4.14}$$
the familiar metricity condition, if we set $e^a{}_\mu e^b{}_\nu\eta_{ab} = g_{\mu\nu}$ equal to the metric.
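The relation between Vierbein, metric and volume element can be checked numerically. The following is a minimal sketch, assuming a flat metric $\eta_{ab} = \mathrm{diag}(-1,1,1,1)$ (the signature is an assumption, not fixed by the text above) and an arbitrary invertible sample Vierbein; all function and variable names are hypothetical. It builds $g_{\mu\nu} = e^a{}_\mu e^b{}_\nu\eta_{ab}$ and compares $\det g$ with $-(\det e)^2$, which is what makes $e = \det e^a{}_\mu$ play the role of $\sqrt{-g}$.

```python
# Minimal sketch (assumptions: eta = diag(-1, 1, 1, 1); the sample Vierbein is arbitrary).

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of eta_ab (assumed signature)

def metric_from_vierbein(e):
    """g_{mu nu} = e^a_mu e^b_nu eta_ab, with e[a][mu] the Vierbein components."""
    return [[sum(ETA[a] * e[a][mu] * e[a][nu] for a in range(4))
             for nu in range(4)] for mu in range(4)]

# Arbitrary invertible sample Vierbein e[a][mu] (illustrative values only).
SAMPLE_E = [[1.2, 0.1, 0.0, 0.0],
            [0.0, 0.9, 0.2, 0.0],
            [0.0, 0.0, 1.1, 0.3],
            [0.1, 0.0, 0.0, 1.0]]

g = metric_from_vierbein(SAMPLE_E)
det_e = det(SAMPLE_E)
det_g = det(g)
print(det_e, det_g)  # det_g should equal -(det_e)**2 up to rounding
```

The Laplace expansion is used only to keep the sketch free of external dependencies; any linear algebra library would serve equally well.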
Concluding, by gauging the Poincaré symmetry of the standard model we obtain usual Einstein gravity by setting torsion, the antisymmetric part of (4.12), equal to zero. Doing so determines the affine connection in terms of the Vierbein. If torsion does not vanish we obtain what is called Cartan gravity [5].
4.1 Gravity in the Standard Model and the Cosmological Constant Problem

We have obtained in the previous section the couplings of the gravitational field to the matter fields in the SM. It remains to fix the kinetic term for gravity. One choice is the Einstein–Hilbert action
$$S_{EH} = \int d^4x\,\sqrt{-g}\,R, \qquad R = R^\kappa{}_{\lambda\kappa\nu}\,g^{\nu\lambda} \equiv R_{\lambda\nu}\,g^{\nu\lambda}. \tag{4.15}$$
This action is invariant under all Einstein transformations $x^\sigma \to x'^\sigma(x)$. It is consistent with all experimental evidence. However, using it in the standard model introduces an elephant in the room. It is the Linde argument [53] that basically says the vacuum is too heavy. When we look at the Higgs potential (2.4) its minimum is at the VEV v. But we do not understand the reason for this minimum, so it may be that the constant V is an energy density on the order of $v^4$. This is the energy of the vacuum state, with $\langle 0|T^{\mu\nu}|0\rangle = g^{\mu\nu}V$. This addition to the energy momentum density $T^{\mu\nu}$ gives, through the Einstein field equation
$$R^{\mu\nu} - \frac12 g^{\mu\nu}R + 8\pi G\,g^{\mu\nu}V = -8\pi G\,T^{\mu\nu}, \tag{4.16}$$
the universe a curvature $R = 32\pi G_N V$, neglecting the energy density of particle excitations on the right hand side, as $V \sim v^4$. This conflicts with the measured cosmological constant [59] $8\pi G\Lambda = 7.23\times10^{-121}M_P^2$ by a factor $10^{58}$, a humbling defeat for theory.10 QCD is realized by broken chiral symmetry [21] with a condensate of quark-antiquark pairs and a vacuum energy density $\sim 10^{-3}\,\mathrm{GeV}^4$, leading to a mismatch of $10^{45}$. Of course not only condensates are culprits. The zero point fluctuations of any quantum field theory cause an energy density on the order of $\frac{1}{(4\pi)^2}M_P^4$, due to an
10 The reduced Planck mass $M_P$ compares to the Fermi scale as $v/M_P = 10^{-16}$.
ultra-violet cut-off of order $M_P$. Then the comparison to the measured density gives a mismatch of $O(10^{120})$. This is not disconnected from any experimental reality. An experiment first proposed in 1948 by Casimir [56] measures the difference between these densities inside and outside two conducting metal plates. This results in an attraction and has been found experimentally [57]. In the past fifty years many cures have been suggested [54]. One is based on admitting only Einstein transformations that leave the volume element $\sqrt{-g}$ invariant, the unimodular Einstein transformations with $\det\,\partial x'(x)/\partial x = 1$. This implies the volume element is a physical scalar field. But it implies also that in the linearized theory there is a spin zero excitation that propagates, along with the spin 2 gravitational waves. To avoid this the volume element is fixed to be one uniformly:11 $\sqrt{-g} = 1$. It follows that the variations $\delta g_{\mu\nu}$ are not independent, and as a consequence the Einstein equations now take the restricted, traceless form
$$R^{\mu\nu} - \frac14 g^{\mu\nu}R = -8\pi G_N\left(T^{\mu\nu} - \frac14 g^{\mu\nu}T\right). \tag{4.17}$$
Note that changing the energy momentum tensor $T^{\mu\nu}$ into $T^{\mu\nu} + g^{\mu\nu}V$ leaves the right hand side the same. In other words the gravitational field decouples from any contribution of the form $g^{\mu\nu}V$ to the energy momentum tensor. This might look promising since the contribution from the VEV of the Higgs potential in (2.4) is precisely of this form. However, it is only the traceless part of the gravitational field (the Ricci tensor $R^{\mu\nu}$) that is determined by the traceless part of the energy momentum tensor. What happens to the traceful part, i.e., the curvature R of the gravitational field? Actually it is determined up to an integration constant by the trace of the energy momentum tensor. This is easily seen by taking the covariant divergence of the traceless Einstein equations. The Einstein tensor and the energy momentum tensor are divergenceless so we are left with
$$\frac14\partial_\mu\left(R - 8\pi G_N T\right) = 0, \quad\text{or}\quad R - 8\pi G_N T = 4\hat\Lambda, \ \text{an arbitrary constant.} \tag{4.18}$$
Substituting into the traceless equation we get
$$R^{\mu\nu} - \frac12 g^{\mu\nu}R + \hat\Lambda\,g^{\mu\nu} = -8\pi G_N T^{\mu\nu}. \tag{4.19}$$
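The decoupling claimed for (4.17) can be illustrated with a small numerical check. The sketch below is only illustrative: it assumes a diagonal metric $\mathrm{diag}(-1,1,1,1)$ at a single point (the signature is an assumption) and an arbitrary symmetric sample tensor, and the helper names are hypothetical. It confirms that the source $T^{\mu\nu} - \frac14 g^{\mu\nu}T$ does not change when $T^{\mu\nu}$ is shifted by $g^{\mu\nu}V$.

```python
# Minimal sketch (assumption: diagonal metric diag(-1, 1, 1, 1) at a point,
# so that g_{mu mu} and g^{mu mu} have the same entries).
G_DIAG = [-1.0, 1.0, 1.0, 1.0]

def trace(t):
    """T = g_{mu nu} T^{mu nu} for a diagonal metric."""
    return sum(G_DIAG[m] * t[m][m] for m in range(4))

def add_vacuum_term(t, v):
    """T^{mu nu} -> T^{mu nu} + g^{mu nu} V."""
    return [[t[m][n] + (G_DIAG[m] * v if m == n else 0.0) for n in range(4)]
            for m in range(4)]

def traceless_source(t):
    """T^{mu nu} - (1/4) g^{mu nu} T, the source on the right of (4.17)."""
    tr = trace(t)
    return [[t[m][n] - (0.25 * G_DIAG[m] * tr if m == n else 0.0) for n in range(4)]
            for m in range(4)]

# Arbitrary symmetric sample tensor (illustrative values only).
SAMPLE_T = [[2.0, 0.3, 0.0, 0.1],
            [0.3, 1.0, 0.2, 0.0],
            [0.0, 0.2, 0.5, 0.4],
            [0.1, 0.0, 0.4, 0.7]]

before = traceless_source(SAMPLE_T)
after = traceless_source(add_vacuum_term(SAMPLE_T, 123.0))
max_diff = max(abs(before[m][n] - after[m][n]) for m in range(4) for n in range(4))
print(max_diff)  # expected to vanish up to rounding
```

The shift drops out because $g_{\mu\nu}g^{\mu\nu} = 4$ in four dimensions, so the trace absorbs exactly the added term.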
The “cosmological constant” is now the arbitrary integration constant.
11 In the original Einstein theory any given configuration with $\sqrt{-g} \neq 1$ can be brought by an Einstein transformation to the uniform configuration $\sqrt{-g} = 1$, a fact already known to and used by Schwarzschild in his original black hole solution.
If we switch on the VEV in the Higgs potential (2.4) the energy momentum tensor changes by a shift $g^{\mu\nu}V$, and so $\hat\Lambda$ may be used to undo that shift (assuming that the transition is strongly first order, which of course it is not). But then the VEV due to the breaking of chiral symmetry brings up the same problem and can no longer be dealt with in the same fashion. There are many other attempts to deal with the problem, mostly involving a new field that dynamically tunes the trace of the energy momentum tensor to vanish. For a review see [55].
5 Global Anomalies and Neutrino Sector

In this section two currents are discussed to which no gauge field couples. They are the baryonic and the leptonic currents $B_\mu$ and $L_\mu$, obtained à la Noether by varying all the quarks with a phase and averaging over colour, and similarly for the leptons. They give rise to global charges B and L. Anomalies in the gauge currents must and do cancel in the Standard Model as we saw in Sect. 3. However, there are anomalous currents not coupled to the gauge fields, leading to the so-called global anomalies [31]. The SM has two global anomalies. The first one is in the baryonic current $B_\mu$ and the second in the leptonic current $L_\mu$. The baryonic current is obtained by varying all quark fields in the standard model action with the $U(1)_B$ phase and dividing by the number of colours, and similarly for the $U(1)_l$ lepton current. From the SM, Eq. (2.2), the variation of the Q and q fields gives
$$B^\mu = \frac13\sum_c\Big(\bar Q^c_L\gamma^\mu Q^c_L + \sum_{q=u,d}\bar q^c_R\gamma^\mu q^c_R\Big). \tag{5.1}$$
The lepton current takes a similar form
$$l^\mu = \bar L_L\gamma^\mu L_L + \sum_{l=\nu_e,e}\bar l_R\gamma^\mu l_R \tag{5.2}$$
and of course classically both currents are conserved. However, in the quantum theory the left and right handed parts couple differently to the weak vector bosons, which causes an anomaly in the one loop approximation [31]. As far as weak isospin and weak hypercharge are concerned, the anomalies for the baryon and lepton currents are identical. We can see that easily through the same straightforward counting analysis as in Sect. 3. First we note that coupling two W’s to the left handed currents gives the same result for every quark doublet as for the single lepton doublet.
The coupling of two B gauge bosons is slightly more subtle. For the left handed baryon current it gives a factor
$$-2d_q^2$$
and for the right handed baryon current
$$s_u^2 + s_d^2.$$
For the lepton current the combined factor is
$$-2d_l^2 + s_e^2.$$
It is important that the presence or absence of the right handed neutrino is irrelevant for the lepton anomaly, as it has no hypercharge. With the hypercharge assignments fixed by the gauge anomaly cancellation as in Table 1 the anomalies are the same. This means that the difference of the two currents, $B_\mu - L_\mu$, is still strictly conserved and hence that the charge difference $B - L$ is conserved. The result for $B + L$ is
$$\partial_\mu\left(B^\mu + L^\mu\right) = \frac{1}{8\pi^2}\mathrm{Tr}\,F(W)_{\mu\nu}\tilde F(W)^{\mu\nu} + \frac{1}{8\pi^2}F(B)_{\mu\nu}\tilde F(B)^{\mu\nu} + \frac{4}{192\pi^2}R_{\kappa\lambda\mu\nu}\tilde R^{\kappa\lambda\mu\nu}. \tag{5.3}$$
This result has to be multiplied by the number of generations $n_g = 3$.
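The hypercharge counting above can be verified with exact rational arithmetic. Table 1 is not reproduced in this section, so the sketch below assumes the conventional assignments in the normalization $Q = T_3 + Y$ (an assumption; a different overall normalization rescales both factors equally). The constant and function names are hypothetical.

```python
from fractions import Fraction

# Assumed hypercharges (normalization Q = T3 + Y); Table 1 may use another convention.
D_Q = Fraction(1, 6)    # left handed quark doublet
S_U = Fraction(2, 3)    # right handed up quark
S_D = Fraction(-1, 3)   # right handed down quark
D_L = Fraction(-1, 2)   # left handed lepton doublet
S_E = Fraction(-1)      # right handed electron

def baryon_factor():
    """-2 d_q^2 + s_u^2 + s_d^2, after averaging over colour."""
    return -2 * D_Q**2 + S_U**2 + S_D**2

def lepton_factor():
    """-2 d_l^2 + s_e^2; a right handed neutrino would add 0 (no hypercharge)."""
    return -2 * D_L**2 + S_E**2

print(baryon_factor(), lepton_factor())  # both 1/2 with these assignments
```

With these inputs the two factors coincide, which is the statement that the hypercharge part of the anomaly drops out of $B - L$.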
5.1 What Gauge Field Configurations Contribute to the Global Anomalies?

Superficially one would say there is no contribution. The reason is that the anomaly is a total divergence. Only at infinity can the four dimensional integral contribute. But at infinity the field strengths vanish. For the Abelian contribution to $B + L$ this is true. To be more specific consider the Hamiltonian H of a gauge theory with a compact non-Abelian group,
$$H = \frac12\int d\mathbf{x}\,\left(\mathrm{Tr}\,\mathbf{E}^2 + \mathrm{Tr}\,\mathbf{B}^2\right). \tag{5.4}$$
The conjugate variables are $\mathbf{A}$ and $\mathbf{E}$ and obey the canonical commutation relations $[A_i, E_j] = \frac{1}{i}\delta_{ij}$ in a well known shorthand.
The Hilbert space is spanned by the states $|\{\mathbf{A}(\mathbf{x})\}\rangle$. Only the gauge invariant subspace is physical. The operator
$$G \equiv \exp\left(i\int d\mathbf{x}\,\mathbf{E}\cdot\mathbf{D}(A)\,\omega(\mathbf{x})\right) \tag{5.5}$$
is the representation in Hilbert space of the gauge transformation $U(\mathbf{x}) = \exp\{i\omega(\mathbf{x})\}$. The pure gauge configuration $A_i(\mathbf{x}) = U(\mathbf{x})\,\partial_i U(\mathbf{x})^\dagger$ defines a state where the Hamiltonian vanishes. This ground state has, in case the group is compact non-Abelian, a remarkable property: it is degenerate. It is labelled by the winding number n of the gauge transformation $U(\mathbf{x})$. In the case of the weak isospin group $SU(2)$ a transformation with winding unity is
$$U_1(\mathbf{x}) = \exp\left(i\pi\,\frac{\mathbf{x}\cdot\boldsymbol{\sigma}}{\sqrt{\mathbf{x}^2 + \lambda^2}}\right). \tag{5.6}$$
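The boundary values of the winding-one map (5.6) are easy to confirm numerically. The sketch below is illustrative only and uses the standard identity $\exp(if\,\hat{\mathbf{n}}\cdot\boldsymbol{\sigma}) = \cos f + i\sin f\,\hat{\mathbf{n}}\cdot\boldsymbol{\sigma}$ for a unit vector $\hat{\mathbf{n}}$; the function name and the choice $\lambda = 1$ are assumptions made for the example.

```python
import math

def u1_coefficients(r, lam=1.0):
    """Return (c, s) with U_1 = c * 1 + i * s * (x_hat . sigma), where r = |x|.

    Uses exp(i f n.sigma) = cos(f) + i sin(f) n.sigma for a unit vector n,
    with f = pi * r / sqrt(r^2 + lam^2) read off from (5.6).
    """
    f = math.pi * r / math.sqrt(r * r + lam * lam)
    return math.cos(f), math.sin(f)

print(u1_coefficients(0.0))   # (1.0, 0.0): U_1 is the identity at the origin
print(u1_coefficients(1e9))   # close to (-1, 0): U_1 tends to minus the identity
```

Because the profile $f$ runs monotonically from 0 to $\pi$, the map sweeps out the group manifold once, which is what winding number one means here.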
At $\mathbf{x} = 0$ it equals $1$, at $|\mathbf{x}| = \infty$ it equals $-1$. The mapping of the group can be seen as one to one; it is very close to a stereographic mapping. It can easily be generalized to an n-fold mapping $U_n(\mathbf{x})$. If the Gauss operator in (5.5) corresponds to a gauge transformation $U_n$ we denote it by $G_n$. It is not hard to see the group properties $G_nG_m = G_{n+m}$, $G_n^{-1} = G_{-n}$. A physical state can at most suffer a phase $\exp(i\theta_n)$ when $G_n$ acts on it. Hence with the group property the phase becomes $\theta_n = n\theta$. The angle $\theta$ distinguishes the possible physical states. And as the Hamiltonian commutes with the gauge transformations $G_n$, these states can be chosen to be eigenstates of the Hamiltonian. Mappings with different winding have no transformations in common; they form disjoint sets of gauge transformations, with the consequence that a smooth interpolating gauge potential cannot be a pure gauge. Therefore the energy in between the states is non-vanishing and results in a potential barrier between two pure gauge states $|n\rangle$ and $|n+1\rangle$. This degeneracy is only present for the weak SU(2) part of the $B + L$ anomaly above. All gauge theories with compact non-Abelian gauge group have a nontrivial winding. The hypercharge contribution in the $B + L$ anomaly corresponds to an Abelian group, which has no winding. The same applies to the gravitational anomaly, as it corresponds to the non-compact group $SO(3,1)$. The $SU(2)_L$ anomaly is a total divergence,
$$\frac{1}{8\pi^2}\mathrm{Tr}\,F(W)_{\mu\nu}\tilde F(W)^{\mu\nu} = 2\partial_\mu K^\mu,$$
which permits us to form a conserved current
$$(B+L)^c_\mu = B_\mu + L_\mu - 2K_\mu \tag{5.7}$$
with a conserved charge
$$Q_{B+L} = \int d\mathbf{x}\,(B+L)^c_0 = \int d\mathbf{x}\,(B_0 + L_0) - 2\int d\mathbf{x}\,K_0. \tag{5.8}$$
Although $Q_{B+L}$ is conserved it is not gauge invariant. The last term on the right hand side is invariant under gauge transformations with vanishing winding, but if the winding is n it is shifted:
$$\int d\mathbf{x}\,K_0 \to \int d\mathbf{x}\,K_0 - n. \tag{5.9}$$
The integral on the right hand side is called the Chern-Simons number $N_{CS}$. Hence the transformed state $\exp(i\alpha Q_{B+L})|\theta\rangle$ has the angle $\theta$ shifted by $-2\alpha$ since it behaves under a gauge transformation $G_n$ as
$$G_n\exp(i\alpha Q_{B+L})|\theta\rangle = \exp(in(\theta - 2\alpha))\exp(i\alpha Q)|\theta\rangle. \tag{5.10}$$
As the conserved Q commutes with the Hamiltonian the shifted state $|\theta - 2\alpha\rangle$ has the same energy as the state $|\theta\rangle$. So all angles $\theta$ define the same theory. But the conserved charge does not annihilate the ground state but shifts it because of the anomaly appearing in (5.9). This means that the non-conservation of $B + L$, due to the anomaly, appears as an even number (twice the winding number n). For example, the reaction corresponding to one winding unit and 3 generations, $n = 1$, $n_g = 3$,
$$p + p \to e^+\mu^+\bar\nu_\tau\bar n + \ldots \tag{5.11}$$
respects $B - L$ but violates $B + L$: $\Delta B = \Delta L = -3$. The dots stand for vector bosons, Higgs particles, and others. For low energy transitions the WKB approximation should be valid. They involve Euclidean tunnelling configurations with zero energy and boundary conditions matching the difference in winding between the ground states. These are the instantons [30, 31]. The vanishing energy condition is met when the instanton is self dual or anti-self dual in Euclidean space, $\mathbf{E} = \pm\mathbf{B}$. Continued to Minkowski space this gives $i\mathbf{E} = \pm\mathbf{B}$, or vanishing energy. So from the WKB approximation one expects the $B + L$ violating transition amplitude to be exponentially suppressed. There is an elegant way to see what the action is for this transition, based on the positivity of the Euclidean action $S(W)$,
$$0 \le \left(F(W)_{\mu\nu} \pm \tilde F(W)_{\mu\nu}\right)^2 = 2F(W)_{\mu\nu}F(W)_{\mu\nu} \pm 2F(W)_{\mu\nu}\tilde F(W)_{\mu\nu}, \quad\text{or}\quad S(W) \ge |n|\,\frac{8\pi^2}{g_2^2}. \tag{5.12}$$
If the bound is saturated the corresponding field strength must be self dual or anti-self dual (in Euclidean space). Self dual instantons are solutions of the equation $F^{\mu\nu} = \pm\tilde F^{\mu\nu}$ and hence satisfy the classical equations of motion. So the probability for the $B + L$ violating process becomes
$$\Gamma_{\text{tunneling}} \sim \exp\left(-\frac{16\pi^2}{g_2^2}\right) = \exp\left(-4\pi\sin^2(\theta_W)/\alpha\right), \tag{5.13}$$
$\alpha$ being the fine structure constant. Hence for low energy processes this is a vanishingly small number. In order to know what is meant by low energy we have to have an idea of the height of the potential mountain. This height is given by a saddle point configuration which is called a sphaleron [60] with energy $E_S$. Finding it requires a classical computation in the electroweak theory. In a first approximation the Weinberg angle is set to zero, in which case an $SU(2)$ gauge theory coupled to the isodoublet Higgs results. Naively one would expect $E_S \sim v$. However, it is parametrically larger by a factor $\frac{1}{g_2}$. In the energy functional one can replace $\mathbf{x}$ by $g_2v\mathbf{x}$ and obtain by rescaling the integral
$$E(H, W, g_2, v, \lambda) = \int d\mathbf{x}\left[\frac12\mathrm{Tr}\,\mathbf{B}(W)^2 + |\mathbf{D}(W)H|^2 + \lambda\left((H^\dagger H)^2 - v^2\right)^2\right] = \frac{v}{g_2}\,E(H/v, W/v, 1, 1, \lambda/g_2^2). \tag{5.14}$$
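Returning briefly to the tunnelling estimate (5.13), the size of the suppression can be made concrete. The sketch below assumes representative low energy values $\alpha \approx 1/137$ and $\sin^2\theta_W \approx 0.231$ (both are assumptions for illustration, not values taken from this chapter) and uses $g_2^2 = 4\pi\alpha/\sin^2\theta_W$ to check that the two forms of the exponent agree.

```python
import math

ALPHA = 1 / 137.036     # fine structure constant (assumed low energy value)
SIN2_THETA_W = 0.231    # sin^2 of the weak mixing angle (assumed representative value)

# g_2^2 = e^2 / sin^2(theta_W) with e^2 = 4 pi alpha
g2_squared = 4 * math.pi * ALPHA / SIN2_THETA_W

exponent_from_coupling = 16 * math.pi**2 / g2_squared
exponent_from_alpha = 4 * math.pi * SIN2_THETA_W / ALPHA

# Decimal logarithm of exp(-exponent), to express the suppression as a power of ten.
log10_suppression = -exponent_from_alpha / math.log(10)
print(exponent_from_coupling, exponent_from_alpha, log10_suppression)
```

For these inputs the exponent is roughly 400, so the probability is suppressed by well over a hundred orders of magnitude, which is the sense in which it is "vanishingly small".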
This parametric dependence holds for any static soliton in this gauge theory. Much less trivial is to show [60] that a rotationally invariant Ansatz for the Higgs and W fields in (5.14) gives a saddle point with one unstable mode, as in Fig. 3. For the values of the Higgs mass and the weak coupling its energy
[Figure: potential energy plotted against $N_{CS}$ (axis ticks at -1, 0, 1); the barrier height is labelled $E_{\text{sphaleron}}$.]
Fig. 3 The potential between subsequent vacua. The height of the potential is given by the energy of the sphaleron. The sphaleron has Chern-Simons number (5.9) $N_{CS} = \frac12$
$$E_{\text{sphaleron}} = 3.2\times 4\pi\,\frac{v}{g_2} \sim 10\ \mathrm{TeV}.$$
Of course the Weinberg angle is not vanishing. Restoring it gives a result a few percent off. More important, the sphaleron gets a large magnetic moment $\mu_{sp} \sim 10^2\mu_W$. Its size is $\sim 1/m_W$, hence its energy density is about $10^{10}$ that of a proton. In the LHC centre of mass frame protons of energy $10$ TeV are contracted by a factor $10^4$, so the energy density of the pancake is $10^8$ that of a proton at rest. That is still two orders of magnitude off the density of the sphaleron. The proton is made up of gluons and quarks and these have to convert into the sphaleron, a coherent set of vector bosons and Higgs particles. That is an exponentially suppressed process [60]. The conclusion is that for accelerators at TeV energies the detection of sphaleron induced processes remains elusive. In a process at high temperature the situation is quite different [44, 62], as in leptogenesis mentioned in Sect. 2.1. High means above the electroweak phase transition, $T_{ew} \sim 159$ GeV. Then the height of the sphaleron collapses in the symmetric phase and $B + L$ fluctuates easily both in positive and negative directions, hence on average $B + L = 0$. But $B - L$ is non-vanishing due to lepton number violating decays discussed in Sect. 2.2. When the temperature sinks below $T_{ew}$ the sphaleron height is resurrected and the baryon number excess is frozen.
5.2 B − L Anomaly Through Gravitation

Let us now assume the SM scenario where the right handed sterile neutrino is absent. Most people consider this a very unlikely scenario, as one would give up on the important roles of the sterile neutrinos, as discussed before. Of course we need a Majorana mass constructed out of the left handed fields. If that is done the theory is not in conflict with experiment, as of today. Once more, if a sterile neutrino is found then this theory is not viable. But it is interesting in its own right to explore the consequences of a purely gravitational anomaly. What happens to the anomaly in this case? The answer is: for the weak vector bosons there is no difference, as the right handed neutrino was sterile and did not couple. However, the gravitons do notice the presence or absence of the right handed neutrino! If it is present the contributions of the right and left handed sectors cancel neatly. If it is absent the $B - L$ divergence becomes, due to the single left handed neutrino [64],
$$\partial_\mu\left(B^\mu - L^\mu\right) = \frac{1}{384\pi^2}R_{\kappa\lambda\mu\nu}\tilde R^{\kappa\lambda\mu\nu}, \quad\text{if no sterile right handed neutrino,} \tag{5.15}$$
and therefore $B - L$ becomes anomalous only due to gravitational effects.12 Before exploring this purely gravitational anomaly we have to remember how a Majorana mass occurs for left handed neutrinos. Naively we might introduce a Majorana mass term for the left handed neutrinos (superscript cc stands for charge conjugation) by coupling the left handed neutrino to its charge conjugate,
$$m_M\,\bar\nu^{cc}_{eL}\,\nu_{eL} + \text{h.c.} \tag{5.16}$$
This respects Lorentz invariance and conservation of electric charge, as the neutrino field $\nu_e$ is neutral. However, the neutrino field has weak isospin and hypercharge. The mass term should respect these symmetries. The compound field formed by the Higgs and the weak isospin lepton doublet, $n_L \equiv H^\dagger l_L$, is indeed a weak isospin and hypercharge scalar, and hence
$$\frac{1}{S}\,\bar n^{cc}_L\,n_L + \text{h.c.} \tag{5.17}$$
is the correct Majorana mass term [39]. The scale S is introduced to make the term dimensionless. So the price of having a Majorana term of left handed neutrinos is a new scale parameter S. The mass is obtained by substituting Eq. (2.5) into (5.17) and neglecting the fluctuations around the VEV of the Higgs field, giving $m_M = \frac{v^2}{2S}$. If this new scale is the Planck mass the Majorana mass is on the order of $10^{-16}v$, about $10^{-4}$ to $10^{-5}$ eV, already discussed in Sect. 2.1. Can this happen through a gravitational instanton? In general an (anti-)self dual Riemann tensor leads to a vanishing Ricci tensor because of the algebraic Bianchi identity. This follows immediately by contracting the left hand side of the self duality equation with $e_b{}^\mu$,
$$R_{ab\mu\nu} = \pm\frac12\,\epsilon_{abcd}\,R^{cd}{}_{\mu\nu}, \tag{5.18}$$
and noting that the right hand side is nothing but the algebraic Bianchi identity and hence vanishes.13 So in the absence of a cosmological term an (anti-)self dual gravity field is also a solution to the Euclidean equations of motion, just as for the non-Abelian case in (5.12). But unlike the non-Abelian case the action of a self dual gravitational instanton is zero. At first sight this looks quite attractive: there is no exponential suppression!
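The order of magnitude quoted above for the Majorana mass, $m_M = v^2/(2S)$ with S at the Planck scale, can be reproduced in a few lines. The numerical inputs below are assumptions for illustration: $v = 246$ GeV (normalization conventions for the VEV differ by factors of $\sqrt2$) and the reduced Planck mass $2.435\times10^{18}$ GeV for S. The function name is hypothetical.

```python
V_GEV = 246.0            # Higgs VEV in GeV (assumed; conventions vary by sqrt(2))
M_PLANCK_GEV = 2.435e18  # reduced Planck mass in GeV (assumed value for the scale S)

def majorana_mass_ev(v_gev, scale_gev):
    """m_M = v^2 / (2 S), converted from GeV to eV."""
    return v_gev**2 / (2.0 * scale_gev) * 1e9

m_ev = majorana_mass_ev(V_GEV, M_PLANCK_GEV)
print(m_ev)  # of order 1e-5 eV for these inputs
```

Changing the VEV convention by $\sqrt2$ moves the result by a factor of two, which stays within the range quoted in the text.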
12 Note the factor $1/2$ compared to the anomaly of the axial current (3.7). It is due to the absence of the right handed neutrino.
13 In the absence of torsion.
Fig. 4 Scenario with only left handed neutrinos and hence gravitationally anomalous .B − L, Eq. (5.15). To meet the need for a neutrino mass one takes either the well known Weinberg operator (b) or the K3 instanton induced Majorana mass (a)
But pushing further, Gibbons et al. [71] discovered that gravitational instantons which are asymptotically flat and (anti-)self dual are necessarily flat everywhere. And they surmised that the same is the case when dropping the self duality. This difference between the non-Abelian and gravity cases becomes more understandable by noting the trivial fact that for the non-Abelian case the compact gauge group admits winding for the pure gauge configuration, but not so for the non-compact $SO(3,1)$ flat configuration. It was the winding that created the potential mountain between vacua with different winding and hence non-trivial tunnelling solutions bridging the winding difference. So Gibbons’ result tells us not to search for such “instantons.” So the question is: how does the gravitational anomaly manifest itself in a physical process? Below we give a tentative answer. How this might happen is sketched in Fig. 4 in the left panel. The blob represents a gravi-instanton, an instanton-like solution $G_0$ of the Einstein equations. There are many of those [65, 67]. But only those are admitted that have a spin structure, i.e., for which there is a way to consistently define a spin field on them [70]. Suppose we have found such an instanton. We then search for those neutrino states that are zero modes of the Dirac equation in the background of that gravi-instanton by solving
$$i\slashed{D}(G_0)\,\nu_L(x) = 0. \tag{5.19}$$
Note that the Ricci curvature of a gravitational instanton can vanish. An example is the K3 configuration [71]. It is self dual and hence has a vanishing Ricci tensor (see Eq. (5.18)). Its topological charge is $P(G_0) = \frac{1}{16\pi^2}\int d^4x\,R_{\mu\nu\sigma\rho}\tilde R^{\mu\nu\sigma\rho} = 48$. Hence the anomaly becomes for this configuration
$$\frac{1}{384\pi^2}\int d^4x\,R_{\kappa\lambda\mu\nu}\tilde R^{\kappa\lambda\mu\nu} = 2. \tag{5.20}$$
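The step from the topological charge 48 to the value 2 in (5.20) is pure arithmetic, since the factors of $\pi^2$ cancel. A minimal check, with hypothetical variable names, reads:

```python
from fractions import Fraction

P_K3 = 48  # topological charge quoted above: (1/(16 pi^2)) * integral = 48

# Integral of R R~ expressed in units of pi^2.
integral_over_pi2 = 16 * P_K3

# Anomaly (5.20): integral / (384 pi^2); the pi^2 cancels.
anomaly = Fraction(integral_over_pi2, 384)
print(anomaly)  # 2
```

Exact fractions are used so that the integer result is not obscured by floating point rounding.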
So the helicity flip in the blob is 2, as it should be since it is a mass term. To obtain the physical neutrino one projects the zero mode (5.19) on the Higgs field, which introduces a factor $v^2$, with v the VEV of the Higgs field. There is an overall normalizing factor $M_P$. The K3 instanton can be constructed out of Eguchi-Hanson (E-H) instantons [66], which are self dual, have a built-in length scale a, but are only locally asymptotically flat, due to the identification of all points x and $-x$, to avoid a singularity at $x^2 = a^2$. Its metric is
$$g_{\mu\nu} = \delta_{\mu\nu} + \frac{a^4}{r^2}\left(f_1\,x_\mu x_\nu - f_2\,\eta_{3\mu\rho}x_\rho\,\eta_{3\nu\sigma}x_\sigma\right), \qquad f_1^{-1} + a^4 = f_2^{-1} = r^4. \tag{5.21}$$
We used ’t Hooft’s $\eta$ symbol [31]. A single E-H instanton has vanishing Dirac index.14 The K3 instanton can be constructed [69] on a four dimensional torus where E-H configurations are placed at each point on the torus which is invariant under the identification $x \to -x$. There are 16 of those points. They have to be smoothly connected and this can be done in a unique way, though there is no analytic expression; for references see [69]. It is the unique compact and simply connected space with a metric leading to a self dual Riemann tensor. However, our travails are not over. First of all our result is only valid for a one generation standard model. Second, the K3 instanton is not asymptotically flat. However, the neutrino is on-shell, so it should be at asymptotic distances, where the metric should be flat. Clearly this does not match. Only the helicity flip matches.
6 Conclusion: When All Else Fails, Just Tell the Truth [72]

Gravity, or rather gauge gravity, is just as much part of the Standard Model as the electroweak and strong interactions. In the first part of this contribution we reviewed some formal aspects, such as the fixing of the weak hypercharges, in which gravitation plays a role along with the strong and electroweak forces. But the very fact that gravity naturally comes into play with the other forces is causing trouble: any field theory that is based on a condensate has a ground state that weighs far too much. The reaction of most practitioners has always been a sovereign disdain for the problem, justified by the way the model accommodates most of the experimental facts. In recent years experiments concerning muon universality and the anomalous magnetic moment of the muon have turned out to be in conflict with the Standard Model. But it is not very probable that they will lead to a modification of the idea of spontaneous breakdown.
14 For a physical interpretation see ’t Hooft [68].
We reviewed very briefly the role of sterile neutrinos in a world where $B - L$ is anomaly free. They can account for Dark Matter and Baryogenesis through heavy Majorana particles. This mechanism works for sterile neutrinos having masses in a region which is accessible to today’s accelerators, so it can be tested. In the last section we considered a situation which is probably academic but interesting. We assume sterile neutrinos are absent. Then $B - L$ is gravitationally anomalous. We tried to find physical consequences of the anomaly. There is a gravitational configuration with a topological charge that does produce the helicity flip of the Majorana mass term. But this configuration does not have the asymptotic behaviour required of on-shell neutrinos. Another price to pay is to have only one family.

Acknowledgments Thanks are due to Jan Smit, members of the NIKHEF theory group, Mikko Laine and Gerard ’t Hooft for discussions. Hospitality of the NIKHEF theory group and patience of Thierry Paul are gratefully acknowledged.
References

1. Fock, V., 1926, Über die invariante Form der Wellen- und der Bewegungsgleichungen für einen geladenen Massenpunkt, Zeit. für Physik 39, 226–232.
2. Weyl, H., 1929, Elektron und Gravitation, Zeit. für Physik 56, 330–352.
3. Weyl, H., 1950, A remark on the coupling of gravitation and electron, Phys. Rev. 77, 699.
4. Yang, C. N., Mills, R. (1954). “Conservation of Isotopic Spin and Isotopic Gauge Invariance”, Phys. Rev. 96 (1): 191–195.
5. Cartan, E., 1922, Sur une generalisation de la notion de courbure de Riemann et les espaces de torsion, C. R. Acad. Sci. (Paris) 174, 593–595. Cartan, E., 1923, Sur la connexion affine et la theorie de la relativite generalisee, Part I, Ann. Ec. Norm. 40: 325–412 and 41: 1–25; Part II: 42: 17–88.
6. R. Utiyama, Phys. Rev. 101 (1956) 1597. Kibble, T. W. B., 1961, “Lorentz Invariance and the Gravitational Field”, JMP 2, 212–221. D. W. Sciama, “On the analogy between charge and spin in General Relativity”, Festschrift for Infeld, pg 415, Pergamon Press, Oxford 1962.
7. Hehl, Friedrich W., von der Heyde, Paul, Kerlick, G. David, and Nester, James M., 1976a, “General relativity with spin and torsion: Foundations and prospects”, Rev. Mod. Phys. 48, 393–416.
8. I. L. Shapiro, Physical Aspects of the Space-Time Torsion, Phys. Rept. 357: 113, 2002, arXiv:hep-th/0103093.
9. Cosserat, E. and F. (1909). Théorie des corps déformables. Paris: Hermann.
10. Atom-Interferometric Test of the Equivalence Principle at the $10^{-12}$ Level, Peter Asenbaum, Chris Overstreet, Minjeong Kim, Joseph Curti, and Mark A. Kasevich, Phys. Rev. Lett. 125, 191101.
11. Eötvös R.V., Pekar D., Fekete E., Ann. Physik, 68 (1922) 11–66. Roll P.G., Krotkov R., Dicke R.H., Ann. Phys., 26 (1964) 442–517; Baessler S., Heckel B.R., Adelberger E.G., Gundlach J.H., Schmidt U. and Swanson H.E., Phys. Rev. Lett. 83 (1999) 3585–3588.
12. S. Weinberg, Phys. Rev. B 135 (1964) 1049.
13. S. Bludman, Nuovo Cim. 9 (1958) 433. S. Glashow, Nucl. Phys. 22 (1961) 579.
14. Adler, S. L. Axial-vector vertex in spinor electrodynamics. Phys. Rev. 177, 2426 (1969); Bell, J. S., Jackiw, R.: A PCAC puzzle in the sigma-model. Nuovo Cim. 60A, 47–61 (1969).
15. Nielsen, H. B., Ninomiya, M. The Adler-Bell-Jackiw anomaly and Weyl Fermions in a crystal. Phys. Lett. B 130, 389–396 (1983).
16. Rabi, I.I. Das freie Elektron im homogenen Magnetfeld nach der Diracschen Theorie. Z. Physik 49, 507–511 (1928).
17. M. F. Atiyah and I. M. Singer, The index of elliptic operators on compact manifolds, Bull. Amer. Math. Soc. 69 (1963), 422–433.
18. Lee, T.D.; Yang, C.N. (1956). “Question of Parity Conservation in Weak Interactions”. Physical Review. 104 (1): 254–258.
19. Chirality invariance and the universal Fermi interaction, E.C.G. Sudarshan, R. E. Marshak, Phys. Rev. 109 (1958) 1860. R. P. Feynman and M. Gell-Mann, Phys. Rev. 109 (1958) 193.
20. M. Goldhaber, L. Grodzins and A.W. Sunyar, Phys. Rev. 109 (1958) 1015–1017.
21. Y. Nambu and G. Jona-Lasinio (1961), Dynamical Model of Elementary Particles Based on an Analogy with Superconductivity. I, Phys. Rev. 122, 345–358.
22. P.W. Higgs, Phys. Rev. Lett. 13: 508–9 (1964). Englert, F. and Brout, R. Broken Symmetry and the Mass of Gauge Vector Mesons. Phys. Rev. Lett. 13: 321 (1964).
23. LHCb collaboration, arXiv:2103.11769.
24. ’t Hooft, G (1971). Renormalization of massless Yang-Mills fields. Nucl. Phys. B33: 173. ’t Hooft, G (1971). Renormalizable Lagrangians for massive Yang-Mills fields. Nucl. Phys. B35: 167. ’t Hooft, G and Veltman, M (1972). Regularization and renormalization of gauge fields. Nucl. Phys. B44: 189.
25. D.J. Gross; F. Wilczek, Ultraviolet behavior of non-abelian gauge theories, Physical Review Letters. 30 (26): 1343–1346. H.D. Politzer, Reliable perturbative results for strong interactions. Physical Review Letters. 30 (26): 1346–1349.
26. C.P. Korthals Altes and M. Perrottet, Phys. Lett. B 39 (1972), 546.
27. B. Abi et al. (Muon g-2 Collaboration) Phys. Rev. Lett. 126, 141801.
28. The Standard Model Higgs boson as the inflaton, Fedor L. Bezrukov, Mikhail Shaposhnikov, Phys. Lett. B 659 (2008) 703–706, 0710.3755 [hep-th].
29. C. Germani, A. Kehagias, Phys. Rev. Lett. 105, 011302; arXiv:1003.2635v2 and references therein.
30. A.A. Belavin; A.M. Polyakov; A.S. Schwartz; Yu.S. Tyupkin (1975). “Pseudoparticle solutions of the Yang-Mills equations”. Phys. Lett. B. 59 (1): 85–87.
31. G. ’t Hooft, Phys. Rev. Lett. 37, 8 (1976); Phys. Rev. D14, 3432 (1976).
32. Super Kamiokande collaboration, Phys. Rev. Lett. 86: 5656–5660, 2001; arXiv:hep-ex/0103033.
33. Q.R. Ahmad et al., Phys. Rev. Lett. 89, 011302, 2002, nucl-ex/0204008. For a review see A.B. McDonald, Invited Paper for Nobel Symposium 129, August 19–24, 2004, Enköping, Sweden, Phys. Scripta T121 (2005) 29–32; arXiv:hep-ex/0412060v1.
34. Atlas collaboration, Phys. Lett. B716 (2012) 1–29; arXiv:1207.7214 [hep-ex].
35. A Model of Leptons. Steven Weinberg, Phys. Rev. Lett. 19 (1967) 1264–1266.
36. Bouchiat, C., Iliopoulos, J., Meyer, Ph. (1972), An anomaly-free version of Weinberg’s model, Physics Letters B 38 (7): 519–523.
37. H. Georgi and S. L. Glashow, “Gauge theories without anomalies,” Phys. Rev. D 6, 429 (1972). D. J. Gross and R. Jackiw, “Effect of anomalies on quasi-renormalizable theories,” Phys. Rev. D6, 477 (1972). L. Alvarez-Gaume and E. Witten, “Gravitational Anomalies,” Nucl. Phys. B 234, 269 (1984).
38. Fritzsch, H.; Gell-Mann, M.; Leutwyler, H. (1973). “Advantages of the color octet gluon picture”. Physics Letters. 47B (4): 365–368.
39. S. Weinberg, Baryon and lepton non-conserving processes, Phys. Rev. Lett. 43, 1566, 1979.
40. Kobayashi, M.; Maskawa, T. (1973). “CP-violation in the renormalizable theory of weak interaction”. Progress of Theoretical Physics. 49 (2): 652–657.
41. P. Minkowski, Physics Letters B67 (1977), 421–428; Gell-Mann, M.; Ramond, P.; Slansky, R. (1979). Freedman, D.; Van Nieuwenhuizen, P. (eds.). Supergravity. Amsterdam: North Holland. pp. 315–321.
42. Maki, Z.; Nakagawa, M.; Sakata, S. (1962). “Remarks on the unified model of elementary particles”. Progress of Theoretical Physics. 28 (5): 870.
43. A. Boyarsky, J. Franse, D. Iakubovskyi, and O. Ruchayskiy, Phys. Rev. Lett. 115, 161301 (2015), 1408.2503. Deep XMM Observations of Draco rule out at the 99% Confidence Level a Dark Matter Decay Origin for the 3.5 keV Line; Tesla E. Jeltema, Stefano Profumo, Mon.Not.Roy.Astron.Soc. 458 (2016) 4, 3592–3596, 1512.01239 [astro-ph.HE]. 44. Takehiko Asaka and Mikhail Shaposhnikov, The νMSM, Dark Matter and Baryon Asymmetry of the Universe, Phys.Lett.B620:17–26,2005;arXiv:hep-ph/0505013. Takehiko Asaka, Steve Blanchet, and Mikhail Shaposhnikov, Phys.Lett.B631:151–156,2005, arXiv:hepph/0503065v1. Uniting Low-Scale Leptogenesis Mechanisms, Juraj Klari´c, Mikhail Shaposhnikov(EPFL, Lausanne, LPPC and Ecole Polytechnique, Lausanne), Inar Timiryasov, Phys.Rev.Lett. 127 (2021) 11, 111802 • e-Print: 2008.13771 [hep-ph] 45. P.F. De Salas, S. Gariazzo, O. Mena, C.A. Ternes, M. Tórtola, Front.Astron.Space Sci. 5 (2018) 36; e-Print: 1806.11051 [hep-ph]. 46. see e.g. Relic neutrino decoupling with flavour oscillations revisited, Pablo F. de Salas, Sergio Pastor, JCAP 07 (2016) 051 • e-Print: 1606.06986 [hep-ph]. 47. Matching conditions and Higgs mass upper bounds revisited, Thomas Hambye, Kurt Riesselmann, Phys.Rev.D 55 (1997) 7255–7262; hep-ph/9610272 [hep-ph]. For a review: F. Jegerlehner, Acta Phys.Polon.B 52 (2021) 6–7, 575–605; 2106.00862 [hep-ph]. 48. M. Laine, M.Meyer, JCAP 1507 (2015) 035; arXiv:1503.04935v2 [hep-ph]. 49. Michela D’Onofrio, Kari Rummukainen, Phys. Rev. D 93, 025003 (2016); arXiv:1508.07161v1 [hep-ph] and references therein. 50. J. Ghiglieri and M. Laine, Gravitational wave background from Standard Model physics: Qualitative features, JCAP 07 (2015) 022 [1504.02569]. Gravitational wave background from non-Abelian reheating after axion-like inflation, P. Klose, M. Laine, S. Procacci; 2201.02317 [hep-ph] 51. C.Q. Geng and R.E. Marshak, Phys. Rev. D 39. 693 (1989); J.A. Minahan, Pierre Ramond, R.C. Warner, Phys.Rev.D 41 (1990) 715; A. Font, L. Ibanez and F. 
Quevedo, Phys. Lett. 228B, 79 (1989); R. Foot, G. C. Joshi, H. Lew and R. R. Volkas, Mod. Phys. Lett. A5, 95 (1990); K.S. Babu and R. N. Mohapatra, Phys. Rev. Lett. 63, 938 (1989); J.Minahan, P. Ramond and R. Warner, Phys. Rev. D 41, 715 (1990); S. Rudaz, Phys. Rev. D41, 2619 (1990). E. Golowich and P. B. Pal, Phys. Rev.D 41, 3537 (1990); P.H. Frampton, R.N. Mohapatra, Phys.Rev.D 50 (1994) 3569–3571; hep-ph/9312230 [hep-ph]. 52. Nakarin Lohitsiri, David Tong, SciPost Phys. 8 (2020) 1, 009; 1907.00514 [hep-th]. . 53. Is the Lee constant a cosmological constant? Andrei D. Linde, JETP Lett. 19 (1974) 183, Pisma Zh.Eksp.Teor.Fiz. 19 (1974) 320–322. Cosmology and the Higgs Mechanism, M.Veltman, Phys.Rev. Lett. 34,(1975), 777. 54. The Exchange of Massless Spin Two Particles, J.J. van der Bij, H. van Dam, Yee Jack Ng, Physica A 116 (1982) 307–320. Zee, A., 1985, in High Energy Physics: Proceedings of the 20th Annual Orbis Scientiae, 1983, edited by S. L. Mintz and A. Perlmutter (Plenum, New York). Buchmuller, W., and N. Dragon, Phys.Lett.B 207 (1988) 292–294, Phys.Lett.B 223 (1989) 313–317. Self-tuning vacuum variable and cosmological constant, F. R. Klinkhamer, G. E. Volovik, Phys.Rev.D 77 (2008) 085015. 55. S. Weinberg, The cosmological constant problem, Rev. Mod. Phys. 61,1 (1989). 56. Casimir, H.B.G.,1948, K.Ned.Acad. Wet. 51, 635. 57. Sparnaay, M (1958). “Measurements of attractive forces between flat plates”. Physica. 24 (6–10): 751–764. Mohideen, U.; Roy, Anushree (1998). “Precision Measurement of the Casimir Force from 0.1 to 0.9 μm”. Physical Review Letters. 81 (21): 4549–4552. arXiv:physics/9805038. 58. E.P. Wigner, Ann.Math 40(1939) 149. Group Theoretical Discussion of Relativistic Wave Equations, V. Bargmann, Eugene P. Wigner, Proc.Nat.Acad.Sci. 34 (1948) 211 59. Riess, A. G. et al., The Astronomical Journal. 116 (3): 1009–1038. arXiv:astro-ph/9805201. Perlmutter, S. et al., The Astrophysical Journal. 517 (2): 565–586. arXiv:astro-ph/9812133. Schmidt, B.P. 
et al., The Astrophysical Journal. 507 (1): 46–63. arXiv:astro-ph/9805200. The Planck Collaboration (2020). “Planck 2018 results. VI. Cosmological parameters”. Astronomy and Astrophysics. 641: A6. arXiv:1807.06209.
60. Topology in the Weinberg-Salam Theory, N.S. Manton, Phys.Rev.D 28 (1983) 2019. A Saddle Point Solution in the Weinberg-Salam Theory, Frans R. Klinkhamer, N.S. Manton, Phys.Rev.D 30 (1984) 2212. 61. Sakharov, A. D., “Violation of CP invariance, C asymmetry, and baryon asymmetry of the universe”, JETP Letters. 5 (1): 24–26. 62. V. A. Kuzmin, V. A. Rubakov and M. E. Shaposhnikov, Phys. Lett. B 155, 36 (1985). 63. T. Kimura, Prog. Theor. Phys.42, 1191 (1969). R. Delbourgo and A. Salam, Phys. Lett.40B, 381 (1972). T. Eguchi and P. Freund, Phys. Rev. Lett., 1251 (1976). 64. A. Dobado and A. Maroto, The standard model anomalies in curved space-time with torsion, Phys. Rev.D54(1996) 5185–5194,hep-ph/9509227. 65. S. W. Hawking, Gravitational instantons, Phys. Lett.60A(1977) 81. 66. Tohru Eguchi, Andrew J. Hanson, Self-dual Solutions to Euclidean Gravity, Annals Phys. 120 (1979) 82. 67. G.W. Gibbons, S.W. Hawking, Phys. Lett. 78B (1978),430. 68. A Physical Interpretation of Gravitational Instantons, Gerard ’t Hooft, Nucl.Phys.B 315 (1989) 517–527. 69. Page, D.N., Phys. Lett. 78B 249 (1978). 70. S.W. Hawking and C.N. Pope, Phys. Lett. 73B (1978) 42. 71. The Positive action conjecture and asymptotically Euclidean Metrics in Quantum Gravity, G.W. Gibbons, C.N. Pope, Commun.Math.Phys.66, 267–290(1979); New gravitational index theorems and super theorems, S.M.Christensen, M.J.Duff, Nuclear Physics B154 (1979). 72. D.T. Regan, 66th US Secretary of the Treasury, dixit.
SU(3) Higher Roots and Their Lattices
Robert Coquereaux
Abstract After recalling the notion of higher roots (or hyper-roots) associated with “quantum modules” of type $(G, k)$, for $G$ a semi-simple Lie group and $k$ a positive integer, following the definition given by A. Ocneanu in 2000, we study the theta series of their lattices. Here we only consider the higher roots associated with quantum modules (aka module-categories over the fusion category defined by the pair $(G, k)$) that are also “quantum subgroups.” For $G = SU(2)$ the notion of higher roots coincides with the usual notion of roots for ADE Dynkin diagrams, and the self-fusion restriction (the property of being a quantum subgroup) selects the diagrams of type $A_r$, $D_r$ with $r$ even, $E_6$ and $E_8$; their theta series are well known. In this paper we take $G = SU(3)$, where the same restriction selects the modules $\mathcal{A}_k$, $\mathcal{D}_k$ with $\mathrm{mod}(k, 3) = 0$, and the three exceptional cases $\mathcal{E}_5$, $\mathcal{E}_9$ and $\mathcal{E}_{21}$. The theta series for their associated lattices are expressed in terms of modular forms twisted by appropriate Dirichlet characters.
1 Introduction
Root systems of Lie algebras are higher root systems of type $G = SU(2)$ and generate lattices whose properties and associated theta series are well known. Here we consider higher root systems of type $G = SU(3)$ using a general definition given by A. Ocneanu in 2000 [22]; see also the more recent reference [25]. Such systems are classified by “quantum modules” or “quantum subgroups,” i.e., using a categorical language, by module-categories over the modular fusion category defined by a pair $(\mathrm{Lie}(G), k)$, where $k$, the level, is a non-negative integer. Quantum modules are characterized by graphs that, for $SU(3)$, have been obtained long ago in
R. Coquereaux, Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_11
the framework of conformal field theories (see Sect. 17.10 of the book [12, 13, 22]¹) and generalize the usual ADE Dynkin diagrams. In the present paper we restrict our attention to an even smaller family whose members are called “quantum subgroups,” and sometimes nicknamed “exceptional modules with self-fusion” (they can be defined as connected étale algebras in a modular fusion category). To every quantum module is associated a higher root system. Such systems generate lattices, and our main purpose, after having recalled in the first two sections the necessary definitions and concepts, is to provide closed expressions for the corresponding theta series. By exhibiting Gram matrices encoding the geometry of lattices, we express their theta series in terms of modular forms twisted by appropriate Dirichlet characters. More information about those lattices can be found in [3], where “higher roots” are called “hyper-roots.” The first few terms of several series had already been obtained by A. Ocneanu [22, 24], and our expressions, which can be expanded to arbitrary orders, agree with these older results. Using explicit Gram matrices leads to a technique that is only suitable for low levels, but we hope that this work will trigger new developments and pave the way for more general results.
I learned a lot of group theoretical concepts from Alex, mostly in problems related to quantization. He also introduced me to the fascinating world of elliptic functions and modular forms; it was in a cosmological framework, some time before the wavelet period started. This contribution to Alex's memory has a little to do with quantization, very little to do, as far as I can see, with the theory of wavelets, and probably nothing to do with cosmology (but who knows?).
Nevertheless, I am sure that Alex would have liked the mixture of geometry and analysis involved in the present work, even if its application to physical sciences still belongs to the realm of science fiction, and, in any case, Alex was such a kind person that he would never have told me the opposite!
Notations To dissipate possible misunderstandings, the neophyte should maybe read Sect. 2.1 first. Warning: the subindex $k$ used for script symbols stands for the chosen level; it does not denote the number $r$ of simple objects (the rank). Calling $g^\vee$ the dual Coxeter number of $G$, the number $N = g^\vee + k$ is the “generalized Coxeter number” or “altitude” of the quantum module $\mathcal{E}_k(G)$; remember that $g^\vee = 2$ for $G = SU(2)$ and $g^\vee = 3$ for $G = SU(3)$. In the former case, $N$ coincides with the dual Coxeter number of the Lie algebra defined by the associated Dynkin diagram (quantum McKay correspondence). Quantum subgroups are as follows: For $G = SU(2)$, we have $\mathcal{A}_k(SU(2)) = A_{k+1}$ with $r = k + 1$, $\mathcal{D}_{k=4s}(SU(2)) = D_{2s+2}$ with $r = 2s + 2$, $\mathcal{E}_{10}(SU(2)) = E_6$ with $r = 6$, $\mathcal{E}_{28}(SU(2)) = E_8$ with $r = 8$. The associated graphs (Dynkin diagrams) are well known. Above and below, the symbol $r$ denotes the number of vertices.
1 This reference stresses the importance of a cocycle condition for triangular cells, a condition that is explicitly worked out in [8], see also [15].
For $G = SU(3)$ the quantum subgroups are denoted $\mathcal{A}_k(SU(3))$ with $r = (k + 1)(k + 2)/2!$, $\mathcal{D}_{k=3s}(SU(3))$ with $r = \frac{1}{3}\left(\frac{(k+1)(k+2)}{2} - 1\right) + 3$, $\mathcal{E}_5(SU(3))$ with $r = 12$, $\mathcal{E}_9(SU(3))$ with $r = 12$, and $\mathcal{E}_{21}(SU(3))$ with $r = 24$. We shall often drop the reference to $SU(3)$ in the above notations since we are mostly interested in that case. The graphs associated with $\mathcal{A}_1$, $\mathcal{A}_2$, $\mathcal{A}_3$, $\mathcal{A}_4$, $\mathcal{D}_3$, $\mathcal{D}_6$, $\mathcal{E}_5$, $\mathcal{E}_9$, $\mathcal{E}_{21}$ are sketched below.²
For the Category-Minded Reader The category $\mathcal{A}_k$ of integrable modules of the affine Kac-Moody algebra associated with $\mathrm{Lie}(G)$ at level $k$, see, e.g., [18], is equivalent (an equivalence of modular tensor categories) [16, 17, 19] to a category constructed in terms of representations of the quantum group $G_q$ at the root of unity $q = \exp\left(\frac{i\pi}{g^\vee + k}\right)$: take the quotient of the category of tilting modules by the additive subcategory generated by indecomposable modules of zero quantum dimension. A “quantum module,” denoted $\mathcal{E}_k$ or $\mathcal{E}_k(G)$,³ is, by definition, a module-category over $\mathcal{A}_k = \mathcal{A}_k(G)$. To each module-category over $\mathcal{A}_k$ one associates a system of higher roots. Here we consider only higher roots associated with quantum modules that are also quantum subgroups, i.e., module-categories that define a commutative étale algebra (also called $\mathcal{E}$) in $\mathcal{A}_k$, every such algebra giving rise to an indecomposable module-category over $\mathcal{A}_k$ of the particular kind that we have in mind (consider the category of $\mathcal{E}$-modules in $\mathcal{A}_k$). This discussion can be phrased in terms of weak Hopf algebras, see [27].
2 Of course they are not Dynkin diagrams.
3 It is defined by a monoidal functor from $\mathcal{A}_k$ to the category of endofunctors of an abelian category $\mathcal{E}$ [27]. Terminological warning: a module-category is usually not a modular category!

2 Lattices of Higher Roots
2.1 Analogies and Warnings
We want to develop an analogy aimed at introducing the subject to the novice reader; the present section is therefore rather informal and should be skipped by specialists. Groups may have subgroups, and the space of representations (or of characters) of a subgroup is a module over the ring of representations (or of characters) of the
group. The category of representations of a subgroup is a module-category over the category of representations of the corresponding group: tensor multiplying a representation of a group by a representation of a subgroup gives a representation of the subgroup. Here we have a similar situation. The departure point is a category $\mathcal{A}_k(G)$, where $G$ is some semi-simple Lie group; this category can be defined in many ways, up to equivalence, but we do not even need to know one to proceed. It is enough to know that $\mathcal{A}_k(G)$ is called “quantum $G$ at level $k$” and that this category is monoidal, meaning that its objects can be multiplied, like the representations of a group. What matters is that we have a ring, which we also call $\mathcal{A}_k(G)$, that plays the role of a ring of characters (it is indeed the ring of characters for an appropriate structure), and that this ring may have modules (like the space of representations of a subgroup); the ring acts on the module and this is called “fusion.” Some of those modules have themselves a compatible ring structure, so they play the role of quantum analogs for (space of representations of) subgroups. Of course, the ring $\mathcal{A}_k(G)$ is one of its own modules. When $G = SU(2)$, $\mathcal{A}_k(G)$ is generated by the analog of the spin $1/2$ representation (the fundamental) but the difference with the classical case is that the number of simple objects (irreducible representations, or irreps) is finite: there are only $k + 1$ of them, with spins $0, 1/2, \ldots, k/2$. The law of composition of spins is the same as usual: composing a spin $1/2$ with a spin $j$ gives a representation that decomposes as the sum of two irreps, of spins $j - 1/2$ and $j + 1/2$, unless $j = k/2$, because in the quantum case at level $k$, the spin $(k + 1)/2$ does not exist.
The whole multiplication table of spins is determined by the action of the generator ($j = 1/2$) and it is encoded by the graph $A_{k+1}$, a truncated line with only $k + 1$ vertices: each vertex is an irrep and the result of multiplying the irrep $j$ by $1/2$ is the sum of the (two) neighbors of $j$ on the graph. In conformal field theory (WZW models), these vertices are interpreted as “primary fields,” but we shall not need that. Every module over the latter ring is described by a finite graph that encodes the multiplication (fusion) of basis elements by the generator of $\mathcal{A}_k(SU(2))$, and it happens (as discovered first by physicists) that the graphs characterizing those modules are in one-to-one correspondence with the simply laced Dynkin diagrams. Moreover those with “self-fusion” (those for which one can define a product compatible with the previous action), i.e., those that are indeed “quantum subgroups,” have a graph belonging to a smaller family: the $\mathcal{A}_k$ themselves, of course, but also the $D_{\mathrm{even}}$, and also $E_6$ (for $k = 10$), and $E_8$ (for $k = 28$). When $G$ is $SU(2)$, this is the end of the classification history, which essentially took place, in another language, more than 30 years ago [1], when the purpose was to classify the modular invariants of WZW models of type $SU(2)$ (again, we shall not need that). If one takes $G = SU(3)$ rather than $SU(2)$, the graphs that one obtains, which describe again the action of (the quantum analog of) the defining representation of $SU(3)$ on the different modules of $\mathcal{A}_k(SU(3))$, were obtained about 20 years ago [13, 22] (see also [6]). Those with self-fusion are displayed at the end of the introduction. Again, we do not need here to give the precise construction of those modules, or explain how this classification was obtained, but the interested reader
could construct them from their graphs, since the adjacency matrix of each of these graphs defines, by definition, the action of the generator. In the $SU(3)$ case there are actually two generators, the defining representation and its conjugate, but here graphs are oriented and the action of the conjugated generator is obtained from the given diagrams by reversing the arrows.
Warning Dynkin diagrams are ubiquitous in mathematics and they are very often introduced when discussing the classification of simple Lie algebras. It should be stressed that the diagrams obtained for $G = SU(3)$ are not Dynkin diagrams, since the latter all appear when discussing the classification of modules over $\mathcal{A}_k(SU(2))$! As a matter of fact, when presenting a panoramic view of quantum modules for $SU(2)$ and $SU(3)$ at level $k$, one does not need to make any reference to (and does not need to use any information from) the general theory of Lie groups. For instance, when studying the module described by the graph $E_8$, also denoted $\mathcal{E}_{28}(SU(2))$ because it is a module over $\mathcal{A}_{28}(SU(2)) = A_{29}$, or when studying quantum field theory models associated with this choice, one only needs to use the properties of $SU(2)$, not those of the exceptional group $E_8$. The latter plays no role in this discussion despite the fact that its Dynkin diagram does appear. It remains that Dynkin diagrams do have something to do with semi-simple Lie groups! In particular, concepts like roots and weights can be defined from them, and solely from the knowledge of their diagrams.
Since one can “reverse the machine” in the $SU(2)$ case, i.e., start from the classification of modules over $\mathcal{A}_k(SU(2))$ to discover the Dynkin diagrams, and from them define for instance the notion of roots (one can define the roots of $E_8$ without having to define the group $E_8$ itself), it is tempting to consider the possibility of associating roots (or rather “higher roots,” or “hyper-roots”) to the graphs describing the classification of modules over $\mathcal{A}_k(SU(3))$, even though they are not Dynkin diagrams. The interpretation and use of such higher roots in representation theory of usual Lie groups, or in quantum field theory, is something that is still largely unknown and goes anyway beyond the scope of this article, but it remains that such higher roots have been defined (for any semi-simple $G$, in [22]) and they are still quite mysterious. As for the usual roots of simple Lie groups, their $\mathbb{Z}$-spans define lattices, and the study of these lattices is the subject of the present paper.
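The truncated composition of spins described in this section can be made concrete with a short numerical sketch. It assumes that the simple objects of $\mathcal{A}_k(SU(2))$ are indexed by twice their spin, $m = 2j = 0, \ldots, k$, and that the generator acts through the adjacency matrix of the line graph $A_{k+1}$; the function name `su2_fusion_matrices` is ours and purely illustrative.

```python
import numpy as np

def su2_fusion_matrices(k):
    """Return [N_0, ..., N_{k+1}] for A_k(SU(2)); N_m is labelled by m = 2j.

    N_1 is the adjacency matrix of the Dynkin diagram A_{k+1} (a path with
    k + 1 vertices); the others follow from the spin composition rule
    N_1 N_{m-1} = N_{m-2} + N_m (a Chebyshev-type recursion).
    """
    G = np.diag(np.ones(k, dtype=int), 1) + np.diag(np.ones(k, dtype=int), -1)
    mats = [np.eye(k + 1, dtype=int), G]
    for m in range(2, k + 2):
        mats.append(G @ mats[m - 1] - mats[m - 2])
    return mats

mats = su2_fusion_matrices(4)
# The matrices N_0..N_k should have non-negative entries, and the extra
# matrix N_{k+1} (twice-spin k + 1) should vanish identically.
print(all((N >= 0).all() for N in mats[:-1]), not mats[-1].any())
```

With these conventions the matrix that would correspond to twice-spin $k + 1$ comes out identically zero, which is one way of seeing that the list of irreps stops after $k + 1$ entries.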
2.2 On Extended Fusion Matrices and Their Periods
$\mathcal{A}_k$ being a monoidal category, with a finite number of simple objects denoted $m, n, p, \ldots$, we consider the corresponding Grothendieck ring and its structure coefficients, the so-called fusion coefficients $N_{mnp}$, where $m \times n = \sum_p N_{mnp}\, p$. They are encoded by fusion matrices $N_m$ with matrix elements $(N_m)_{np} = N_{mnp}$. One may consider module-categories $\mathcal{E}$ associated with $\mathcal{A}_k$. Of course, one can take for instance $\mathcal{E} = \mathcal{A}_k$. The fusion coefficients $F_{nab}$ characterize the module structure: $n \times a = \sum_b F_{nab}\, b$, where $a, b, \ldots$ denote the simple objects of $\mathcal{E}_k$. They are encoded either by square matrices $F_n$, with matrix elements $(F_n)_{ab} = F_{nab}$,
still called fusion matrices, or by the rectangular matrices⁴ $\tau_a = (\tau_a)_{nb}$, with $(\tau_a)_{nb} = (F_n)_{ab}$. The simple objects of $\mathcal{A}_k$, or irreps, are labelled by the vertices of the Weyl alcove at level $k$. With $G = SU(3)$, the simple objects $n$ (irreps) are labelled by pairs $(p, q)$ of non-negative integers with $p + q \le k$, so the matrix $F_n = F_{(p,q)}$ has matrix elements $(F_n)_{ab} = (F_{(p,q)})_{a,b}$ and the Chebyshev recursion relations of the $G = SU(2)$ case (composition of spins) are replaced by

$$
\begin{aligned}
F_{(p,q)} &= F_{(1,0)}\, F_{(p-1,q)} - F_{(p-1,q-1)} - F_{(p-2,q+1)} \quad \text{if } q \neq 0, \\
F_{(p,0)} &= F_{(1,0)}\, F_{(p-1,0)} - F_{(p-2,1)}, \\
F_{(0,q)} &= \left(F_{(q,0)}\right)^T.
\end{aligned}
\tag{1}
$$

$F_{(0,0)}$ is the identity matrix and $F_{(1,0)}$ and $F_{(0,1)}$ are the two generators.⁵ Also, $F_{(q,p)} = F_{(p,q)}^T$. In some applications one sets to zero the fusion matrices whose Dynkin labels do not belong to the chosen Weyl alcove. This is not what we do here. On the contrary, the idea is to use the same recursion relations to extend the definition of the matrices $F_n$ at level $k$ from the Weyl alcove to the fundamental Weyl chamber of $G$ (cone of dominant weights) and to use signed reflections with respect to the hyperplanes of the affine Weyl lattice in order to extend their definition to arbitrary arguments $n \in \Lambda$, the weight lattice of $G$. By so doing one obtains an infinite family of matrices $F_n$ that we still (abusively) call “fusion matrices,” and for which we keep the same notations, although their elements can be of both signs. It is also useful to shift (translation by the Weyl vector) the labelling index of these matrices to the origin of the weight lattice; in other words, for $n \in \Lambda$, using multi-indices, we set $\{n\} = (n - 1)$, where the use of parentheses refers to the usual Dynkin labels. We therefore introduce the “$\rho$-shifted notation” ($\rho$ being the Weyl vector), setting $F_{\{p,q\}} = F_{(p-1,q-1)}$, so that $F_{\{0,0\}}$ is the zero matrix and $F_{\{1,1\}} = F_{(0,0)}$ is the identity (the latter corresponding to the weight with components $(0, 0)$ in the Dynkin basis, i.e., to the highest weight of the trivial representation). The following results belong to the folklore: Setting $N = k + 3$ one has $F_{\{p,q\}} = 0$ whenever $p = 0 \bmod N$, $q = 0 \bmod N$ or $p + q = 0 \bmod N$. One also gets immediately the following equalities: $F_{\{p+N,q\}} = (P \cdot F)_{\{p,q\}}$, $F_{\{p,q+N\}} = (P^2 \cdot F)_{\{p,q\}}$, where $P = F_{\{N-2,1\}}$ is a generator of $\mathbb{Z}_3$ (with $P^3 = 1$) acting by rotation on the fusion graph of $\mathcal{A}_k$, and $F_{\{p+3N,q\}} = F_{\{p,q+3N\}} = F_{\{p+N,q+N\}} = F_{\{p,q\}}$.
4 The $F_n$ are sometimes called “annular matrices” when $\mathcal{A}_k$ and $\mathcal{E}_k$ are distinct (if they are the same, then $F_n = N_n$), and the $\tau_a$ are sometimes called (by the author) “essential matrices.”
5 The adjacency matrices of the graphs given at the end of the introduction are precisely the (unextended) matrices $F_{(1,0)}$.

The sequence $F_{\{p,q\}}$ is periodic of period $3N$ in each of the variables $p$ and $q$ but it is completely characterized by the values that it takes in a rhombus with $N \times N$ vertices; for this reason, this rhombus $D$ will be called periodicity cell, or periodicity rhombus. We
have reflection symmetries (with sign) with respect to the lines $\{p\} = 0 \bmod N$, $\{q\} = 0 \bmod N$ and $\{p + q\} = 0 \bmod N$. The $F$ matrices labelled by vertices belonging to the Weyl alcove specified by the choice of a non-negative integer $k$ have non-negative integer matrix elements (the alcove is strictly included in the first half of a periodicity rhombus); those with indices belonging to the other half of the inside of the rhombus have non-positive entries; those with vertices belonging to the walls of the Weyl chamber or to the second diagonal of the rhombus vanish; and the whole structure is periodic. The Weyl group action⁶ on the $SU(3)$ lattice is well known. Since we have extended the definition of the fusion matrices $F_n$ to allow arguments $n$ belonging to the weight lattice, we can do the same for the essential matrices $\tau_a$, keeping the same notation: the indices $a, b$ of $(\tau_a)_{nb}$ still refer to simple objects of $\mathcal{E}$ but the index $n$ labels weights of $G$; the infinite matrices $\tau_a$ can be thought of as rectangular, with columns indexed by the elements of $\mathcal{E}$ (a finite number) and rows indexed by the weights of $\Lambda$, the weight lattice of $G$.
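The extension procedure of this section can be tried out numerically in the simplest case $\mathcal{E} = \mathcal{A}_k$. The sketch below is only an illustration under stated assumptions: the generator $F_{(1,0)}$ is taken to be the adjacency matrix of the alcove with arrows $(p,q) \to (p+1,q), (p-1,q+1), (p,q-1)$, matrices whose unshifted label contains $-1$ (a wall of the Weyl chamber) are set to zero, and the helper names `alcove`, `generator` and `extended_fusion` are ours.

```python
import numpy as np

def alcove(k):
    """Dynkin labels (p, q) of the simple objects of A_k(SU(3)): p + q <= k."""
    return [(p, q) for p in range(k + 1) for q in range(k + 1 - p)]

def generator(k):
    """F_(1,0): one arrow from (p, q) to each successor that stays in the alcove."""
    index = {v: i for i, v in enumerate(alcove(k))}
    F10 = np.zeros((len(index), len(index)), dtype=int)
    for (p, q), a in index.items():
        for dp, dq in ((1, 0), (-1, 1), (0, -1)):
            b = index.get((p + dp, q + dq))
            if b is not None:
                F10[a, b] = 1
    return F10

def extended_fusion(k, smax):
    """F_(p,q) for all p, q >= 0 with p + q <= smax, computed from recursion (1)."""
    F10 = generator(k)
    zero = np.zeros_like(F10)
    F = {(0, 0): np.eye(len(F10), dtype=int)}
    get = lambda p, q: F.get((p, q), zero)  # a label containing -1 lies on a wall
    for s in range(1, smax + 1):
        F[(s, 0)] = F10 @ F[(s - 1, 0)] - get(s - 2, 1)
        F[(0, s)] = F[(s, 0)].T
        for p in range(1, s):
            q = s - p
            F[(p, q)] = F10 @ F[(p - 1, q)] - F[(p - 1, q - 1)] - get(p - 2, q + 1)
    return F

k = 3
N = k + 3
F = extended_fusion(k, 3 * N)
P = F[(k, 0)]  # this is F_{N-2,1} in rho-shifted labels
# Shifted labels are {p+1, q+1}: test the vanishing on the three families of lines.
walls_vanish = all(not M.any() for (p, q), M in F.items()
                   if (p + 1) % N == 0 or (q + 1) % N == 0 or (p + q + 2) % N == 0)
# Translation by N in the first label should amount to multiplication by P.
shift_by_N = all(np.array_equal(F[(p + N, q)], P @ F[(p, q)])
                 for p in range(N) for q in range(N))
print(walls_vanish, shift_by_N)
```

The two printed flags test, for level $k = 3$, the vanishing on the lines $p, q, p + q = 0 \bmod N$ (in shifted labels) and the relation $F_{\{p+N,q\}} = P \cdot F_{\{p,q\}}$ quoted above.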
2.3 The Quiver of Higher Roots (“the Ocneanu Ribbon”)
2.3.1 The Ribbon $\mathcal{R}$
Given a module-category $\mathcal{E}$ over $\mathcal{A}_k$ (the same notation $\mathcal{E}$ will also denote the set of isomorphism classes of its simple objects), we defined, for every choice of $a \in \mathcal{E}$, an infinite matrix $\tau_a$, which is a periodic, integer-valued function on $\Lambda \times \mathcal{E}$. In many cases the definition domain of $\tau_a$ can be further restricted: indeed, there are many modules $\mathcal{E}$ that have a non-trivial grading with respect to the center $Z$ (here $\mathbb{Z}_3$) of the underlying Lie group (here $G = SU(3)$); in those cases, not only the weights of $G$, its irreducible representations, the simple objects of $\mathcal{A}_k$, but also the simple objects of $\mathcal{E}$, have a well defined grading (denoted $\partial$) with respect to $Z$, and the module structure is compatible with this grading: matrix elements of $\tau_a$ in position $(n, b)$ will automatically vanish if $\partial n + \partial a \neq \partial b$. Existence of such a non-trivial grading occurs for all the module-categories considered in this paper (cases with self-fusion). The function $\tau_a$ on $\Lambda \times_Z \mathcal{E}$ is specified by the values that it takes on the finite set $\mathcal{R}^\vee = D \times_Z \mathcal{E}$, where $D$ is the period parallelotope. The set $\mathcal{R}^\vee$, a finite rectangular table made periodic, may be thought of as a closed ribbon.⁷ For the cases that we consider, the group $Z$ acts non-trivially and $\mathcal{R}^\vee$ has $r_{\mathcal{E}}\, |D|/|Z|$ elements, where the rank $r_{\mathcal{E}}$ is the number of simple objects of $\mathcal{E}$, and $|D| = N^{r_G}$. The elements of $\mathcal{R}^\vee$ will be called restricted (for reasons given below) higher roots of type $G$ defined by the module $\mathcal{E}_k(G)$. With $G = SU(3)$ one has $|\mathcal{R}^\vee| = r_{\mathcal{E}} (k + 3)^2/3$.

6 This is the shifted Weyl action: $w \cdot n = w(n + \rho) - \rho$, where $\rho$ is the Weyl vector.
7 The terminology “ribbon” comes from A. Ocneanu.
The choice of a fundamental irrep $\pi$ of $G$, with the constraint that it should exist at level $k$ (so that $\pi$ defines a particular non-trivial simple object of $\mathcal{A}_k(G)$), allows one to associate with $\mathcal{E}$ a graph $\mathcal{E}_\pi$ (but we shall usually drop the reference to $\pi$ in the notation): it is the graph of multiplication by $\pi$, sometimes called fundamental fusion graph, fundamental representation graph, nimrep graph, or McKay graph associated with $\pi$. If $\pi$ is complex, edges of $\mathcal{E}$ are oriented; it is actually a quiver since it is a directed graph where loops and multiple arrows between two vertices are allowed. For $SU(3)$ there are two fundamental irreps; they are complex conjugate to one another and both appear already at level 1; the associated directed graphs just differ by a global change of orientation (reverse arrows). For definiteness we choose the fundamental irrep with highest weight $(1, 0)$ in the Dynkin basis, i.e., $\{2, 1\}$ in back-shifted (i.e., $\rho$-shifted) coordinates. The weight lattice $\Lambda$ can be considered as a directed graph in the usual way, the direct successors of $\{p, q\}$ being the vertices with components $\{p + 1, q\}$, $\{p, q - 1\}$ and $\{p - 1, q + 1\}$; its restriction to the dominant Weyl chamber can be interpreted as the representation graph of the $(1, 0)$-fundamental representation of $SU(3)$. Being essentially a cartesian product of two directed multigraphs, the set $\mathcal{R}^\vee$ becomes a quiver: the quiver of higher roots, or “ribbon.” If $SU(3)$ is replaced by $SU(2)$, the graph $\mathcal{E}$ has an adjacency matrix which is twice the identity minus the Cartan matrix of a simple Lie group and the above construction leads to an associated quiver of roots; several examples of this construction (for instance the quiver of $E_6$ with its 72 vertices) can be found in [4]. For usual roots, i.e., $SU(2)$ higher roots, the opposite of a root is a root.
However, for $SU(3)$ higher roots, one can see, using the definition of the period parallelotope $D$, that if $\alpha \in \mathcal{R}^\vee$, $-\alpha$ does not correspond to any vertex of $\mathcal{R}^\vee$. This feature is not convenient. For all purposes it is useful to generalize the previous definitions, keeping $\mathcal{R} = \mathcal{R}^\vee$ for $SU(2)$ but setting $\mathcal{R} = \mathcal{R}^\vee \cup -\mathcal{R}^\vee$ for $SU(3)$; then $|\mathcal{R}| = 2|\mathcal{R}^\vee|$. The opposite of a higher root (an element of $\mathcal{R}$) is then always a higher root. For higher roots associated with quantum subgroups $\mathcal{E}$ of $SU(3)$ we have therefore

$$|\mathcal{R}| = 2|\mathcal{R}^\vee| = \frac{2}{3}\, r_{\mathcal{E}}\, (k + 3)^2 \tag{2}$$

This can be seen as a generalization of the Kostant relation: for $SU(2)$ higher roots, i.e., usual roots, one has $|\mathcal{R}| = r_{\mathcal{E}}\, g$, where $g = k + 2$ is the Coxeter number, and $r_{\mathcal{E}}$ the number of nodes, of the Dynkin diagram $\mathcal{E}$. For $SU(3)$ higher roots, if one chooses $\mathcal{E} = \mathcal{A}_k$ one has $r_{\mathcal{E}} = (N - 2)(N - 1)/2$, with $N = k + 3$, and then⁸

$$|\mathcal{R}| = 2|\mathcal{R}^\vee| = (N - 2)(N - 1)N^2/3 \tag{3}$$

8 The number of higher roots for $\mathcal{A}_k(SU(3))$ is therefore also given by the number of 3-cycles in the rook graph $M_N = K_N \,\square\, K_N$, the Cartesian square of the complete graph $K_N$ on $N$ vertices (one recognizes the A288961 sequence of the OEIS [26]).
From now on, we shall usually not mention $\mathcal{R}^\vee$, the set of restricted higher roots, since $\mathcal{R}$ will be used most of the time.
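The counting of higher roots for $\mathcal{E} = \mathcal{A}_k$ can be cross-checked by brute force. The following sketch compares three numbers: twice a direct count of restricted higher roots by triality, the closed form of Eq. (3), and the number of triangles of the rook graph. It relies on assumptions that are ours: the grading is taken to be $\partial(p, q) = (p - q) \bmod 3$, the rhombus $D$ is represented by labels $0 \le p, q < N$, and the reference vertex has trivial grading, so that a pair $(n, b)$ is kept when $\partial n = \partial b$.

```python
import numpy as np

def rook_triangles(N):
    """Number of 3-cycles of the rook graph K_N x K_N, computed as trace(A^3) / 6."""
    cells = [(i, j) for i in range(N) for j in range(N)]
    A = np.array([[int(u != v and (u[0] == v[0] or u[1] == v[1])) for v in cells]
                  for u in cells])
    return int(np.trace(A @ A @ A)) // 6

def restricted_roots(k):
    """Count pairs (weight in D, vertex of A_k) with matching triality."""
    N = k + 3
    verts = [(p, q) for p in range(k + 1) for q in range(k + 1 - p)]
    per_class = [sum((p - q) % 3 == t for p, q in verts) for t in range(3)]
    return sum(per_class[(p - q) % 3] for p in range(N) for q in range(N))

def closed_form(k):
    """Right-hand side of Eq. (3) with N = k + 3."""
    N = k + 3
    return (N - 2) * (N - 1) * N**2 // 3

for k in range(1, 5):
    print(k, 2 * restricted_roots(k), closed_form(k), rook_triangles(k + 3))
```

For each level the three numbers in a row are expected to coincide; the counts do not depend on which $N$ consecutive values of $p$ and $q$ are used to represent $D$.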
2.4 A Euclidean Structure on the Space of Higher Roots
The inner product of two higher roots $\alpha = (m, a)$ and $\beta = (n, b)$ is defined as

$$\langle \alpha, \beta \rangle = \sum_{w \in W} \epsilon(w)\, \left(F_{m-n+w\rho-\rho}\right)_{ab} \tag{4}$$

where $W$ is the Weyl group and $\epsilon(w)$ is the signature of $w$. The above expression generalizes the one obtained⁹ for $G = SU(2)$. Another possibility (the approach followed in [22]) is to start from the notion of harmonicity: The $\mathcal{R}^\vee$ quiver being essentially a cartesian product of two multigraphs, we have a natural notion of harmonicity for the functions defined on its underlying set; a point $\alpha$ of $\mathcal{R}^\vee$ specifies a harmonic function, also denoted $\alpha$, defined as the orthogonal projection of the Dirac measure $\delta_\alpha \in \mathbb{C}^{|\mathcal{R}^\vee|}$ on the subspace of harmonic functions; one proves that its value on $\beta \in \mathcal{R}^\vee$ is precisely given by Eq. (4). More generally, higher weights are $\mathbb{Z}$-valued functions that are harmonic on the ribbon. For us it will be enough to take Eq. (4) as a definition of the inner product between higher roots and extend $\langle\ ,\ \rangle$ by linearity to their linear span. One checks that it defines a positive definite¹⁰ inner product and therefore a Euclidean structure on the space of higher roots. This Euclidean space will be denoted $\mathcal{C}$. We did not introduce here any higher analogue of the non-simply laced condition, so that higher roots have only one possible length: $\langle \alpha, \alpha \rangle = |W|$, for all higher roots $\alpha$. With $G = SU(3)$, the Weyl group is $S_3$, all higher roots have norm 6 and the inner product of two higher roots is obtained from Eq. (4) as the sum of six fusion coefficients.¹¹ Writing $\alpha = (m, a) = ((m_1, m_2), a)$, $\beta = (n, b) = ((n_1, n_2), b)$, setting $\lambda_1 = m_1 - n_1$, $\lambda_2 = m_2 - n_2$, and using shifted labels, we obtain
$$
\langle \alpha, \beta \rangle = \left(F_{\{\lambda_1+1,\lambda_2+1\}} + F_{\{\lambda_1-2,\lambda_2+1\}} + F_{\{\lambda_1+1,\lambda_2-2\}}\right)_{(a,b)} - \left(F_{\{\lambda_1-1,\lambda_2-1\}} + F_{\{\lambda_1-1,\lambda_2+2\}} + F_{\{\lambda_1+2,\lambda_2-1\}}\right)_{(a,b)} \tag{5}
$$

9 This was recognized and generalized in [22] but it is already present in [14].
10 Using Eq. (4) one could define a periodic inner product on $\Lambda \times_Z \mathcal{E}$ that would not be positive definite because of the periodicity, but we consider directly its non-degenerate quotient, naturally defined on the ribbon $D \times_Z \mathcal{E}$.
11 With $G = SU(2)$, one recovers the fact that roots of simply laced Dynkin diagrams have norm 2 and that the inner product of two roots can be written as a sum of two fusion coefficients.
Elements of the higher root lattice, the $\mathbb{Z}$-span of higher roots, are called “higher root vectors,” and the elements of its dual lattice are “higher-weight vectors.”
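As a consistency check on the passage from Eq. (4) to Eq. (5), one can list the images $w\rho$ of the Weyl vector $\rho = \{1, 1\}$ under the six elements of $W = S_3$. The short computation below is only a sketch; it assumes the standard action of the two simple reflections on Dynkin labels, $s_1\{p, q\} = \{-p, p + q\}$ and $s_2\{p, q\} = \{p + q, -q\}$, and uses $F_{(\lambda + w\rho - \rho)} = F_{\{\lambda + w\rho\}}$ in $\rho$-shifted labels, with $\lambda = m - n$.

```latex
\begin{array}{c|cccccc}
w & e & s_1 & s_2 & s_1 s_2 & s_2 s_1 & s_1 s_2 s_1 \\ \hline
w\rho & \{1,1\} & \{-1,2\} & \{2,-1\} & \{-2,1\} & \{1,-2\} & \{-1,-1\} \\
\epsilon(w) & + & - & - & + & + & -
\end{array}
\qquad\Longrightarrow\qquad
\langle \alpha, \beta \rangle
  = \sum_{w \in W} \epsilon(w)\, \bigl(F_{\{\lambda + w\rho\}}\bigr)_{ab}
```

The three images with $\epsilon(w) = +1$ reproduce the first bracket of Eq. (5) and the three with $\epsilon(w) = -1$ the second. For $\alpha = \beta$ (so $\lambda = 0$ and $a = b$), the signed reflection symmetry gives $F_{\{w\rho\}} = \epsilon(w)\, F_{\{1,1\}}$, so each term contributes $\epsilon(w)^2 = 1$ and the norm is $|W| = 6$.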
2.5 Rank of the System
The dimension $r = \dim \mathcal{C}$ of the space of higher roots associated with $\mathcal{E}_k(G)$, in those cases where the center $Z$ acts non-trivially on the set of vertices of $\mathcal{E}_k(G)$, is¹² $r = \dim \mathcal{C} = r_{\mathcal{E}}\, \frac{|W|}{|Z|}$, where $W$ is the Weyl group associated with the simple Lie group $G$. The term $|W|/|Z|$ cancels out for $G = SU(2)$ since $W$ and $Z$ are both isomorphic with $\mathbb{Z}_2$ and one then recovers the rank $r = r_{\mathcal{E}}$ given by the number of vertices of the chosen Dynkin diagram. For $G = SU(3)$, $W = S_3$, $|W| = 3!$, and for modules $\mathcal{E}$ with non-trivial triality we have $Z = \mathbb{Z}_3$, therefore $r = 2\, r_{\mathcal{E}}$. Moreover, if one chooses $\mathcal{E} = \mathcal{A}_k$, then $r = (N - 2)(N - 1)$ with $N = k + 3$. Warning: For higher roots of type $SU(3)$, in contrast with the case of usual roots, one should remember that $r$ and $r_{\mathcal{E}}$ are related by a factor 2, and this implies that the combinatorial structure encoding the inner product in the space of higher roots will require two copies of the fusion graph $\mathcal{E}$ generalizing the Dynkin diagram. A naive generalization of the equation $A = 2 - F_{(1)}$ relating the adjacency matrix of Dynkin diagrams to the Cartan matrix could suggest, in the case of $SU(3)$ higher roots, to replace $A$ by $6 - (F_{(1,0)} + F_{(0,1)})$; some properties of this last matrix and of its inverse are actually investigated in one section of [10], see also [5], but the lattices obtained from this naive choice are not the lattices of higher roots considered in the present paper.
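For orientation, the numbers attached to a few quantum subgroups of $SU(3)$ can be tabulated directly from the formulas quoted so far: the vertex counts given in the Notations paragraph, the rank $r = 2\, r_{\mathcal{E}}$, and the number of higher roots of Eq. (2). The sketch below does nothing more than this arithmetic; the function names are ours.

```python
def vertices_A(k):
    """Number of simple objects of A_k(SU(3))."""
    return (k + 1) * (k + 2) // 2

def vertices_D(k):
    """Number of simple objects of D_k(SU(3)), for k a multiple of 3."""
    return (vertices_A(k) - 1) // 3 + 3

def summary(k, r_E):
    """Return (altitude N, rank 2 r_E, number of higher roots from Eq. (2))."""
    N = k + 3
    return N, 2 * r_E, 2 * r_E * N**2 // 3

cases = {
    "A3": (3, vertices_A(3)),
    "D3": (3, vertices_D(3)),
    "D6": (6, vertices_D(6)),
    "E5": (5, 12),
    "E9": (9, 12),
    "E21": (21, 24),
}
for name, (k, r_E) in cases.items():
    print(name, summary(k, r_E))
```

For $\mathcal{A}_3$ this gives $N = 6$, rank 20 and 240 higher roots, i.e., twice the 120 restricted higher roots counted for that example.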
2.6 A Graphical Summary

For illustration we consider the module $E = A_3 = A_3(SU(3))$. The weight lattice of $SU(3)$ and the period parallelotope at level $k = 3$, a rhombus that we called $D$, are displayed in Fig. 1. Weights are blue or red dots, roots are blue dots. The edges of $D$ have length $N = k + 3 = 6$, the generalized Coxeter number (altitude). The next step is to consider the Cartesian product $\Lambda \times E$, which is displayed in Fig. 2: here $E$ denotes the graph encoding the module structure (its adjacency matrix, for the chosen example, is the fundamental fusion matrix of $SU(3)$ at level 3). So, to each weight one associates a copy of the graph $A_3$ (we only draw the graphs associated with weights belonging to a rhombus of side $N$).
Footnote 12: This general result was claimed in the last two slides of [23] and it can be explicitly checked in all the cases that we consider below.
Fig. 1 The $SU(3)$ weight lattice $\Lambda$ and the period parallelotope $D$ at level 3
Fig. 2 Hexagonal window in the Cartesian product $\Lambda \times A_3$
Each of the “small graphs $A_3$” has 10 vertices, but not all vertices of those small graphs represent higher roots, since one has to quotient the previous Cartesian product by the action of the center $Z = \mathbb{Z}_3$: only the large dots of Fig. 3 (Bottom) define higher roots (the small graphs contain either 3 or 4 such large dots, leading to a total of $10 \times 6^2 / 3 = 120$ higher roots). Because of periodicity one should
Fig. 3 Top: the period parallelogram $D$ of $A_3(SU(3))$; its horizontal base has length $N = 3 + 3 = 6$. Bottom: the positions of the 120 higher roots (blue dots) obtained by taking the Cartesian product of $D$ by the “small” fusion graph of $A_3$ over the center $\mathbb{Z}_3$. Note: $(N + 1)^2 = 7^2$ small graphs are displayed on top of $D$ but it is enough to consider $6^2$ small graphs (with blue dots) since those located on the top and right edges of $D$ (with red dots) can be obtained using periodicity
take only once the small graphs associated with weights belonging to the edges of the period parallelogram; in other words, the top and right edges in Fig. 3 do not contribute to the counting, so that the total is again $(4 \times 2 + 3 \times 4) \times 6 = 120$, as expected. The reader should use a screen display with a large enough magnification factor since, obviously, the described features cannot be seen in a printed version. In order not to clutter Figs. 3 and 4, the arrows of the resulting periodic quiver are not drawn; the interested reader may, however, look at Fig. 2 of reference [3] where a simpler example is completely described. Figure 4 displays the family of inner products between some chosen higher root (the one marked “6” in the picture) and all the higher roots of the quiver $R^\vee$ of $A_3(SU(3))$. Those inner products are calculated using Eq. (6). One obtains in this way a periodic function which is completely characterized by the values that it takes in $R^\vee$, but, as a function on a discrete subset of $\Lambda \times E$, it is periodic of period $3N$ for each of the directions parallel to the walls of the Weyl
Fig. 4 The family of inner products between the higher root marked 6 and all the higher roots of the quiver $R^\vee$ of $A_3$ (directed edges are not displayed)
lattice (see our discussion in Sect. 2.2); one can also describe the resulting periodic function by displaying the values that it takes on the hexagon given in Fig. 5 (here we do not even draw the edges of the small graphs). The same hexagon can be useful to explicitly check the harmonic property mentioned in Sect. 2.4: the sum of the values taken by a given higher root on its preimage relative to the “large directed graph” $\Lambda$ is equal to the sum of the values that it takes on its preimage relative to the “small directed graph” $E$ (see Fig. 4 of [3] for illustration).
2.7 Choice of a Basis

There are many ways of choosing a basis for a lattice. To every choice is associated a fundamental parallelotope and a Gram matrix $A$ (the matrix of inner products in this basis). In the case of the $SU(2)$ higher root systems (i.e., root systems in the usual sense), one may choose for Gram matrix $A$ the Cartan matrix corresponding to a given Coxeter-Dynkin graph $E$. For lattices of higher roots the notion of “Cartan matrix” is not available. In what follows we shall present only one Gram matrix, called $A$, since this matrix defines the lattice up to integral equivalence. It will be obtained from Eq. (6) by choosing a basis that we call $B_1$. Its $2 r_E$ elements (assuming $k > 0$) belong to the bottom left corner of the $SU(3)$ period parallelogram; more precisely, we choose those higher roots located in the admissible vertices of the six fusion graphs sitting in positions $\{\{0, 0\}, \{0, 1\}, \{1, 0\}, \{1, 1\}, \{2, 0\}, \{0, 2\}\}$ of the weight lattice; one checks that this indeed determines a basis which is fully specified once an ordering has been chosen. Many other basis choices are of course possible, for instance $B_2$ or $B_3$, respectively associated with the admissible vertices belonging to the fusion graphs located in positions $\{\{1, 1\}, \{2, 1\}, \{1, 2\}, \{3, 1\}, \{2, 2\}, \{1, 3\}\}$ and $\{\{0, 0\}, \{1, 0\}, \{0, 1\}, \{N - 1, N - 2\}, \{N - 2, N - 1\}, \{N - 1, N - 1\}\}$, with $N = k + 3$, but we shall not display the associated Gram matrices. The basis $B_1$ is illustrated in Fig. 6.

Fig. 5 $A_3(SU(3))$: hexagon displaying the periodicity of the scalar product between a chosen higher root (located at the center of the hexagon) and all the higher roots. Note: the chosen higher root is the same as the one chosen in Figs. 4 or 6, but the whole picture has been shifted to the right by one unit, in order for this higher root to be at the center of the hexagon
2.8 Higher Roots and Their Inner Products: Summary of the Procedure

• Choose a module $E$ over $A_k = A_k(SU(3))$, for instance $A_k$ itself.
• From the fundamental fusion matrix $F_{(1,0)}$ of the chosen module, calculate the other fusion matrices $F_n$, for instance using the $SU(3)$ recurrence relation (Eq. (1)).
• Extend the fusion matrices to the weight lattice of $SU(3)$, using symmetries and periodicity.
Fig. 6 $SU(3)$ at level 3: the basis $B_1$ in the periodicity quiver (20 positions marked with integers; they are located in the bottom left corner of Fig. 4). We display the scalar product between the higher root marked 6 and all other basis elements; these values appear on line 5 of the Gram matrix for $A_3$ given in Sect. 3.3
• It is useful to build the periodic essential matrices $\tau_a$, not only the $F_n$, in particular if the chosen module $E$ is not $A_k$.
• Using Eq. (6) one can determine a matrix $A_{big}$ of the $|R| \times |R|$ scalar products between the $(k + 1)(k + 2)(k + 3)^2 / 3$ higher roots (or only between those of $R^\vee$). The matrix $A_{big}$ has rank $r = 2 r_E$.
• Select a family $(\alpha_i)$ of $r$ independent higher roots (i.e., choose a basis) and call $A$ the $r \times r$ restriction of the previous table $A_{big}$ to the chosen basis. $A$ will be a Gram matrix for the lattice of higher roots. $A_{big}$ can, however, be huge and it is shorter to determine $A$ by calculating only the $r \times r$ inner products between the basis elements of some chosen basis, for instance $B_1$, described previously.
• The choice of $A$ determines a basis $(\alpha_i)$ of higher roots such that $\langle \alpha_i, \alpha_j \rangle = A_{ij}$. Call $K = A^{-1}$ the inverse of $A$ and $(\omega_i)$ the dual basis of $(\alpha_i)$; then $\langle \omega_i, \omega_j \rangle = K_{ij}$ and $\langle \alpha_i, \omega_j \rangle = \delta_{ij}$. The family of vectors $(\omega_i)$ is, by definition, the basis of higher weights associated with the basis of higher roots $(\alpha_i)$. Linear combinations of the vectors $(\omega_i)$ with integer coefficients are (integral) higher weights. Warning: Indices $i, j, \ldots$ of $(\alpha_i)$ or of $(\omega_i)$ run from 1 to $r = 2 r_E$ whereas indices $a, b$ of $(\tau_a)$ refer to the irreps of $E$ and therefore run only from 1 to $r_E$.
• The last step is to study the lattice of higher roots and its theta function. How this is done is described in the next section.
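The counting formulas used in the procedure above can be evaluated in a few lines. The following is only a sketch for the modules $A_k(SU(3))$; the function name `higher_root_counts` and the dictionary keys are illustrative choices, not notation from the text. It uses $r_E = (k+1)(k+2)/2$, $r = 2 r_E$ and $|R| = (k+1)(k+2)(k+3)^2/3$.

```python
def higher_root_counts(k: int) -> dict:
    """Counts attached to the module A_k(SU(3)), from the formulas quoted in the text.

    The function name and the returned keys are illustrative (hypothetical) choices.
    """
    n = k + 3                                   # altitude N = k + 3
    r_e = (k + 1) * (k + 2) // 2                # number of vertices of the fusion graph A_k
    rank = 2 * r_e                              # r = r_E |W| / |Z| with |W| = 6 and |Z| = 3
    n_roots = (k + 1) * (k + 2) * n ** 2 // 3   # |R|; the division by 3 is exact
    return {"N": n, "r_E": r_e, "r": rank, "|R|": n_roots}


if __name__ == "__main__":
    for level in range(1, 5):
        print(level, higher_root_counts(level))
```

For $k = 3$ this gives $r = 20$ and $|R| = 240$, i.e. 120 positive higher roots, in agreement with the counting done on the period parallelogram in Sect. 2.6.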
One can a posteriori check that the orthonormal projection of a Dirac measure on the ribbon on the subspace of harmonic functions (higher weights) is indeed a higher root. This could have been used as a method to determine the latter.
3 Theta Functions for Lattices of Higher Roots

3.1 Lattices and Theta Functions (Reminders)

We remind the reader of a few results about lattices and their theta functions. This material can be gathered from [29]. Consider a positive definite quadratic form $Q$ which takes integer values on $\mathbb{Z}^m$. We can write $Q = \frac{1}{2} x^T A x$, with $x \in \mathbb{Z}^m$ and $A$ a symmetric $m \times m$ matrix. Integrality of $Q$ implies that $A$ is an even integral matrix (its matrix elements are integers and its diagonal elements are even). Since $Q$ is positive definite, $A$ is a positive definite nonsingular matrix, and $\det(A) > 0$. So the inverse $A^{-1}$ exists, as a matrix with rational coefficients. The modular level of $Q$, or of $A$, is the smallest integer $\ell$ such that $\ell A^{-1}$ is again an even integral matrix; this notion differs from the notion of conformal level $k$ used in the previous part of this article. $\Delta = (-1)^m \det(A)$ is the discriminant of $A$.

Given $Q$, one defines the theta function (footnote 13)

$$\theta_Q(z) = \sum_{n=0}^{\infty} p(n)\, q^n,$$

where $q = \exp(2 i \pi z)$ and $p(n) \in \mathbb{Z}_{\geq 0}$ is the number of vectors $x \in \mathbb{Z}^m$ such that $Q(x) = n$. The function $\theta_Q$ is always a modular form of weight $m/2$. In our framework $m$ will always be even (in particular $\Delta = \det(A)$) so that we set $m = 2s$ with $s$ an integer. The following theorem (Hecke-Schoenberg) is known [29] and will be used:

Let $Q : \mathbb{Z}^{2s} \to \mathbb{Z}$ be a positive definite quadratic form, integral, with $m = 2s$ variables, of level $\ell$ and discriminant $\Delta$. Then the theta function $\theta_Q$ is a modular form on the group $\Gamma_0(\ell)$, of weight $s$, and character $\chi_\Delta$.

In plain terms:

$$\theta_Q\!\left(\frac{az + b}{cz + d}\right) = \chi_\Delta(a)\, (cz + d)^s\, \theta_Q(z)$$

for all $z \in H$ (upper half-plane) and $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma_0(\ell)$. Here $\Gamma_0(\ell)$ is the subgroup (footnote 14) of $SL(2, \mathbb{Z})$ defined by the condition $c \equiv 0 \bmod \ell$, and $\chi_\Delta$ is the unique Dirichlet character modulo $\ell$ such that $\chi_\Delta(p) = L(\Delta, p)$ for all odd primes $p$ that do not divide $\ell$, where $L$ denotes the Legendre symbol.
Notice that $m$, as defined above, is also, in our framework, the dimension of the space $C$ of higher roots, which, for $G = SU(3)$, is equal to $2 r_E$. In that case, the weight of the (twisted) modular form $\theta_Q$ is therefore equal to $r_E$, the number of vertices of the fusion diagram, or the number of simple objects in $E$.
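To make the definition of $p(n)$ concrete, here is a minimal brute-force sketch in Python. As a toy example (an illustrative choice, not a construction taken from this section) it uses the Gram matrix $\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$, the Cartan matrix of $SU(3)$, so that $Q(x, y) = x^2 - xy + y^2$; the function name is hypothetical. The search box is justified by completing the square, as explained in the comments.

```python
from math import isqrt


def theta_coefficients_hexagonal(n_max: int) -> list[int]:
    """Return [p(0), ..., p(n_max)] with p(n) = #{(x, y) in Z^2 : x^2 - x*y + y^2 = n}.

    Here Q = (1/2) v^T A v for the Gram matrix A = [[2, -1], [-1, 2]].
    """
    # Completing the square: Q(x, y) = (y - x/2)^2 + 3 x^2 / 4 >= 3 x^2 / 4,
    # so Q <= n_max forces |x| <= sqrt(4 n_max / 3); by symmetry the same holds for y.
    bound = isqrt(4 * n_max // 3) + 1
    counts = [0] * (n_max + 1)
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            q = x * x - x * y + y * y
            if q <= n_max:
                counts[q] += 1
    return counts


if __name__ == "__main__":
    # First coefficients of the theta series of the planar hexagonal lattice.
    print(theta_coefficients_hexagonal(13))
```

The same enumeration strategy works for any Gram matrix once a bounding box is known, but its cost grows quickly with the rank, which is why only the first shells are computed this way and the remaining coefficients are deduced from modularity.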
Footnote 13: This parameter $q$ is not related to the root of unity, called $q$, that appears in Sect. 2.2.
Footnote 14: As $\Gamma_1(\ell) \subset \Gamma_0(\ell)$, one can sometimes use modular forms (and bases of spaces of modular forms) twisted by Dirichlet characters on the congruence subgroup $\Gamma_1(\ell)$.
About Dirichlet Characters

Dirichlet characters are particular functions from the integers to the complex numbers that arise as follows: given a character on the group of invertible elements of the set of integers modulo $p$, one can lift it to a completely multiplicative function on integers relatively prime to $p$ and then extend this function to all integers by defining it to be 0 on integers having a non-trivial factor in common with $p$. A Dirichlet character with modulus $p$ takes the same value on two integers that agree modulo $p$. The interested reader may consult the abundant literature on the subject, but it is enough for us to remember that they are a particular kind of completely multiplicative complex valued functions on the set of integers, that there are $\varphi(p)$ characters modulo $p$, where $\varphi$ is the Euler function, and that they are tabulated in many places; there is even a command DirichletCharacter[p,j,n] in Mathematica [21] that gives the Dirichlet character with modulus $p$ and index $j$ as a function of $n$ (the index $j$ running from 1 to $\varphi(p)$).
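The properties recalled above can be illustrated with a small self-contained Python sketch. The helper names (`euler_phi`, `legendre`, `chi3`) are illustrative, and the character shown is the unique non-trivial Dirichlet character modulo 3; nothing here relies on Magma or Mathematica.

```python
from math import gcd


def euler_phi(n: int) -> int:
    """Euler function: number of residues modulo n that are invertible."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, computed with Euler's criterion."""
    t = pow(a, (p - 1) // 2, p)      # t is 0, 1 or p - 1
    return -1 if t == p - 1 else t


def chi3(n: int) -> int:
    """The unique non-trivial Dirichlet character modulo 3."""
    return (0, 1, -1)[n % 3]         # Python's % is non-negative, so negative n is fine


if __name__ == "__main__":
    # Number of Dirichlet characters for the moduli that occur in this chapter.
    print([euler_phi(n) for n in (16, 25, 18)])
    # Complete multiplicativity and periodicity of chi3 on a small range.
    print(all(chi3(a * b) == chi3(a) * chi3(b) for a in range(-9, 10) for b in range(-9, 10)))
    print(all(chi3(n) == chi3(n + 3) for n in range(-9, 10)))
```

On integers, `chi3` agrees with the Legendre symbol modulo 3, which is the kind of coincidence used to single out the character in the Hecke-Schoenberg theorem.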
3.2 Lattice Properties: Tables

From the fusion matrices associated with the different quantum subgroups one determines Gram matrices for the associated lattices of higher roots. For the first few members of the series considered in this paper, and for the exceptional cases, some information, encoded in the Gram matrices, is summarized in the following table (footnote 15). Here $\Delta$ is the discriminant, $\ell$ the modular level, and the last-but-one column gives $\dim M_{r_E}(\Gamma_x(\ell), \chi)$.

E     k    N    r_E   r    |R|    kiss      Δ       ℓ    φ_Euler(ℓ)   dim M   |aut|
A1    1    4    3     6    32     32 q^6    4^6     16   8            7       6!/2 × 64
A2    2    5    6     12   100    100 q^6   5^9     25   20           16      1200
A3    3    6    10    20   240    240 q^6   6^12    18   6            31      864
A4    4    7    15    30   490    490 q^6   7^15    49   42           70      3528
D3    3    6    6     12   144    36 q^4    3^12    9    6            7       6912
D6    6    9    12    24   648    162 q^4   3^18    27   18           36      2^6 3^11
E5    5    8    12    24   512    512 q^6   2^30    16   8            25      2^11 3
E9    9    12   12    24   1152   756 q^4   2^24    16   8            13      2^10 3^4 5
E21   21   24   24    48   9216   144 q^4   3^12    3    2            9       2^12 3^2
We give below the coefficients of the associated theta series in the variable $q^2$ (the first term is $(q^2)^0 = 1$), for instance,

$$\theta(A_2) = 1 + 100 (q^2)^3 + 450 (q^2)^4 + \ldots = 1 + 100 q^6 + 450 q^8 + 960 q^{10} + 2800 q^{12} + \ldots$$

Footnote 15: In the table, the subscript $x$ of $\Gamma_x$ may be 0 or 1 (see footnote 10) and the entry kiss gives the smallest term of the theta series, its coefficient being the kissing number.
A0 (rescaled): 1, 6, 0, 6, 6, 0, 0, 12, 0, 6, 0, 0, 6, 12, 0, 0, 6, 0, 0, 12, 0, 12, 0, 0, 0, 6, 0, 6, 12, 0, 0, 12, 0, 0, 0, 0, 6, 12, 0, 12, 0, 0, 0, 12, 0, 0, 0, 0, 6, 18, 0, 0, 12, 0, 0, 0, 0, 12, 0, 0, 0, 12, 0, 12, 6, 0, 0, 12, 0, 0, 0, 0, 0, 12, 0, 6, 12, 0, 0, 12, 0, ...
A1: 1, 0, 0, 32, 60, 0, 0, 192, 252, 0, 0, 480, 544, 0, 0, 832, 1020, 0, 0, 1440, 1560, 0, 0, 2112, 2080, 0, 0, 2624, 3264, 0, 0, 3840, 4092, 0, 0, 4992, 4380, 0, 0, 5440, 6552, 0, 0, 7392, 8160, 0, 0, 8832, 8224, ...
A2: 1, 0, 0, 100, 450, 960, 2800, 6600, 12300, 22400, 30690, 63000, 93150, 144000, 203100, 236080, 392850, 550800, 708350, 961800, 972780, 1581600, 1937250, 2495400, 2977400, 3063360, 4469400, 5547700, 6477600, 7963200, 7344920, 11094000, 12627000, 15127200, 17091900, 16459440, 22670850, 26899200, ...
A3: 1, 0, 0, 240, 1782, 9072, 59328, 216432, 810000, 2059152, 6080832, 12349584, 31045596, 57036960, 122715648, 204193872, 418822650, 622067040, 1193611392, 1734272208, 3043596384, 4217152080, 7354100160, 9446435136, 15901091892, 20507712192, 32268036096, 40493364288, 64454759856, 76079125584, 118436670720, 142127536464, ...
A4: 1, 0, 0, 490, 4998, 45864, 464422, 3429426, 21668094, 111678742, 492567012, 1876801038, 6352945942, 19484903508, 54935857326, 144330551050, ...
A5: 1, 0, 0, 896, 11856, 154368, 2331648, 27065088, 281311128, ...
A6: 1, 0, 0, 1512, 24300, 425736, 8530758, ...
D3: 1, 0, 36, 144, 486, 2880, 5724, 7776, 31068, 40320, 47628, ...
D6: 1, 0, 162, 2322, 35478, 273942, 1771326, 9680148, 40813632, 150043014, 484705782, ...
E5: 1, 0, 0, 512, 11232, 145920, 1055616, 5618688, 25330128, 89127936, 295067136, 810542592, 2185379968, 5109275136, ...
E9: 1, 0, 756, 5760, 98928, 1092096, 8435760, 45142272, 202712400, 715373568, 2350118808, 6501914496, 17469036096, ...
E21: 1, 0, 144, 64512, 54181224, ...
3.3 Properties of the Lattices

Case A0

This lattice coincides with the (rescaled) usual root lattice of $SU(3)$, also known as the planar hexagonal lattice, and $R^\vee$ with the set of positive roots. Although its theta function can be found in many textbooks (footnote 16), for instance in [2], it is instructive to obtain it by using the theorem recalled in the previous section. The Gram matrix, in the basis of higher roots obtained from Eq. (6), is $\begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix}$, i.e., three times the Cartan matrix of $SU(3)$. Its theta function is therefore a modular form on the group $\Gamma_0(3)$, of weight $s = 1$, twisted by the (unique in this case) non-trivial Dirichlet character modulo 3; this vector space of modular forms is of dimension 1, hence $\theta$ can be identified with its generator.

Case A1

The Gram matrix $A$, with the basis choice $B_1$, and its inverse $K$ are given below.

$$A = \begin{pmatrix}
6 & 2 & 2 & -2 & -2 & -2 \\
2 & 6 & 2 & 2 & -2 & 2 \\
2 & 2 & 6 & 2 & 2 & -2 \\
-2 & 2 & 2 & 6 & 2 & 2 \\
-2 & -2 & 2 & 2 & 6 & -2 \\
-2 & 2 & -2 & 2 & -2 & 6
\end{pmatrix}
\qquad
K = \frac{1}{8} \begin{pmatrix}
3 & -1 & -1 & 1 & 1 & 1 \\
-1 & 3 & -1 & -1 & 1 & -1 \\
-1 & -1 & 3 & -1 & -1 & 1 \\
1 & -1 & -1 & 3 & -1 & -1 \\
1 & 1 & -1 & -1 & 3 & 1 \\
1 & -1 & 1 & -1 & 1 & 3
\end{pmatrix}
\qquad (6)$$
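The pair $(A, K)$ displayed for the case A1 can be cross-checked with exact rational arithmetic. The following Python sketch (helper names are illustrative; the search bound in `modular_level` is an arbitrary assumption) verifies that $A K$ is the identity, and computes the determinant of $A$ and the smallest integer $\ell$ such that $\ell A^{-1}$ is even integral, so that both can be compared with the values quoted in the discussion of this case.

```python
from fractions import Fraction

# Gram matrix A of the case A1 (basis B1) and the integer matrix 8*K, copied from the text.
A = [
    [6, 2, 2, -2, -2, -2],
    [2, 6, 2, 2, -2, 2],
    [2, 2, 6, 2, 2, -2],
    [-2, 2, 2, 6, 2, 2],
    [-2, -2, 2, 2, 6, -2],
    [-2, 2, -2, 2, -2, 6],
]
EIGHT_K = [
    [3, -1, -1, 1, 1, 1],
    [-1, 3, -1, -1, 1, -1],
    [-1, -1, 3, -1, -1, 1],
    [1, -1, -1, 3, -1, -1],
    [1, 1, -1, -1, 3, 1],
    [1, -1, 1, -1, 1, 3],
]
K = [[Fraction(v, 8) for v in row] for row in EIGHT_K]


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def determinant(mat):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in mat]
    n, det = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n):
                m[r][j] -= factor * m[c][j]
    return det


def modular_level(inverse, search_limit=1000):
    """Smallest l <= search_limit such that l * inverse is integral with even diagonal."""
    n = len(inverse)
    for level in range(1, search_limit + 1):
        scaled = [[level * v for v in row] for row in inverse]
        integral = all(v.denominator == 1 for row in scaled for v in row)
        if integral and all(scaled[i][i] % 2 == 0 for i in range(n)):
            return level
    raise ValueError("no level found below the search limit")


if __name__ == "__main__":
    identity = [[int(i == j) for j in range(6)] for i in range(6)]
    print(matmul(A, K) == identity)   # checks that K is the inverse of A
    print(determinant(A))             # order of the dual quotient
    print(modular_level(K))           # modular level of the quadratic form
```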
The 16 elements of $R^\vee$ may be called “positive higher roots” (their opposites, the elements of $-R^\vee$, being “negative”); they can be expanded on the root basis $B_1 = \{\alpha_i\}$, $i = 1 \ldots 6$, as follows:

$\alpha_1 - \alpha_2 + \alpha_6$, $-\alpha_2 + \alpha_3 - \alpha_5$, $-\alpha_1 + \alpha_3 - \alpha_4$, $-\alpha_4 + \alpha_5 + \alpha_6$, $\alpha_6$, $-\alpha_2 + \alpha_4 - \alpha_5$, $-\alpha_1 - \alpha_5 - \alpha_6$, $-\alpha_1 + \alpha_2 - \alpha_4$, $\alpha_2$, $\alpha_4$, $-\alpha_3 + \alpha_4 - \alpha_6$, $\alpha_2 - \alpha_3 - \alpha_6$, $\alpha_1$, $\alpha_3$, $\alpha_5$, $\alpha_1 - \alpha_3 + \alpha_5$

With the same ordering, the family of their mutual inner products builds a $16 \times 16$ matrix $A_{big}$, which is of rank 6, as expected. The lattice is even, with minimal norm 6, and all the coefficients of the above Gram matrix are even; if we rescale it, setting $B = A/2$, the vectors of minimal norm then have norm 3. The determinant of $A$, or “connection index,” is also the order of the dual quotient, an abelian group isomorphic with $\mathbb{Z}_2 \times (\mathbb{Z}_4)^{\times 4} \times \mathbb{Z}_8$. The lattice defined by $A$ is obviously not self-dual. If we use its rescaled version, the connection index becomes 64 and the dual quotient is then isomorphic with $(\mathbb{Z}_2)^{\times 4} \times \mathbb{Z}_4$; elements of the (rescaled) dual belong to one and only one congruence
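Footnote 16: Its expression in terms of the elliptic theta function $\vartheta_3$ reads:

$$\theta(z) = \frac{\vartheta_3(0, q)^3 + \vartheta_3\!\left(\frac{\pi}{3}, q\right)^3 + \vartheta_3\!\left(\frac{2\pi}{3}, q\right)^3}{3\, \vartheta_3(0, q^3)}.$$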
class, an element of the dual quotient, and are therefore classified by 5-tuples $(c_{2,1}, c_{2,2}, c_{2,3}, c_{2,4}, c_4)$, with $c_{2,i} \in \{0, 1\}$ and $c_4 \in \{0, 1, 2, 3\}$.

We find the theta function of this lattice by applying the Hecke-Schoenberg theorem. From the Gram matrix one finds that the discriminant is $4^6$ and that the (modular) level of the quadratic form is 16. The odd primes not dividing 16 are 3, 5, 7, 11, 13 and their Legendre symbols are all equal to 1. From the $8 \times 16$ table of Dirichlet characters of modulus 16 over the cyclotomic field of order $\varphi_{Euler}(16) = 8$ restricted to odd primes not dividing the level, one selects the unique character whose values coincide with the list obtained for the Legendre symbols. The space of modular forms on $\Gamma_1(16)$ of weight 3, twisted by this Dirichlet character, namely the Kronecker character $-4$, has dimension 7. It is spanned by the following forms (we set $q_2 = q^2$):

$$\begin{aligned}
b_1 &= 1 + 12\, q_2^{8} + 64\, q_2^{12} + 60\, q_2^{16} + O(q_2^{24}), \\
b_2 &= q_2 + 21\, q_2^{9} + 40\, q_2^{13} + 30\, q_2^{17} + 72\, q_2^{21} + O(q_2^{24}), \\
b_3 &= q_2^{2} + 26\, q_2^{10} + 73\, q_2^{18} + O(q_2^{24}), \\
b_4 &= q_2^{3} + 6\, q_2^{7} + 15\, q_2^{11} + 26\, q_2^{15} + 45\, q_2^{19} + 66\, q_2^{23} + O(q_2^{24}), \\
b_5 &= q_2^{4} + 4\, q_2^{8} + 8\, q_2^{12} + 16\, q_2^{16} + 26\, q_2^{20} + O(q_2^{24}), \\
b_6 &= q_2^{5} + 2\, q_2^{9} + 5\, q_2^{13} + 10\, q_2^{17} + 12\, q_2^{21} + O(q_2^{24}), \\
b_7 &= q_2^{6} + 6\, q_2^{14} + 15\, q_2^{22} + O(q_2^{24}).
\end{aligned}$$

An explicit determination of the vectors (and their norms) belonging to the first shells shows that the theta function starts as $1 + 32\, q_2^{3} + 60\, q_2^{4} + O(q^{14})$. The components of this modular form on the previous basis are therefore 1, 0, 0, 32, 60, 0, 0. In other words,

$$\theta = b_1 + 32\, b_4 + 60\, b_5.$$
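The linear combination $\theta = b_1 + 32\, b_4 + 60\, b_5$ can be expanded by hand or, as in the following minimal Python sketch, by adding truncated series. Only the coefficients of $b_1$, $b_4$ and $b_5$ quoted above (valid below order $q_2^{24}$) are used; the dictionary representation and the variable names are illustrative choices.

```python
ORDER = 24  # the basis forms are quoted up to O(q2^24)

# Truncated q2-expansions, stored as {exponent: coefficient}.
b1 = {0: 1, 8: 12, 12: 64, 16: 60}
b4 = {3: 1, 7: 6, 11: 15, 15: 26, 19: 45, 23: 66}
b5 = {4: 1, 8: 4, 12: 8, 16: 16, 20: 26}

# Coefficients of theta = b1 + 32*b4 + 60*b5 for the exponents 0, ..., ORDER - 1.
theta = [b1.get(n, 0) + 32 * b4.get(n, 0) + 60 * b5.get(n, 0) for n in range(ORDER)]

if __name__ == "__main__":
    print(theta)
```

The resulting list reproduces the first 24 coefficients given for A1 in the list of theta series of Sect. 3.2, for instance $192 = 32 \times 6$ at order 7 and $252 = 12 + 60 \times 4$ at order 8.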
Using a computer package, one can quickly obtain the q-expansion of the functions $b_n$ to very large orders. Here is a Magma [20] program that returns the series expansion of $\theta$ up to order 48 in $q_2$ and uses the above ideas:

    // Dirichlet characters modulo 16, with values in the cyclotomic field of order phi(16) = 8
    H := DirichletGroup(16,CyclotomicField(EulerPhi(16)));
    chars := Elements(H);
    eps := chars[2];
    // Modular forms of weight 3 twisted by the character eps
    M := ModularForms([eps],3);
    order:=48;
    // Theta function from its components 1, 0, 0, 32, 60, 0, 0 on the basis b1, ..., b7
    PowerSeries(M![1,0,0,32,60,0,0],order);

One finds that the automorphism group aut of this lattice is of order 23,040 and that it is isomorphic with the semidirect product of $A_6$ (the alternating group of order $6!/2 = 360$) times an abelian group of order 64, actually with $((C_2)^{\times 5} \rtimes A_6) \rtimes C_2$. Orbits of the basis vectors under the aut action coincide and contain the 32 higher roots (the 16 positive and the 16 negative ones). From the given Gram matrix and using, for example, Magma, one finds that the Voronoi polytope has 92 5-dimensional facets, 4896 edges and 588 vertices.

Other Avatars of This Lattice

The obtained theta series starts as the theta series of a (scaled version of) the shifted $D_6$ lattice, called $D_6^+ = D_6 \cup ([1] + D_6)$, see [2]. This coincidence (already noticed in [22]) is not, by itself, sufficient to allow an identification with $D_6^+$, but the identification does hold because one can choose the same Gram matrix for both lattices. Using this identification, we can rewrite the theta series in terms of elliptic theta functions as follows:

$$\frac{1}{2} \left( \vartheta_2(0, q^4)^6 + \vartheta_3(0, q^4)^6 + \vartheta_4(0, q^4)^6 \right)$$
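The elliptic theta expression can be compared numerically with the first coefficients of the theta series of the A1 lattice. The sketch below is written under an assumption that should be stated loudly: it uses the nome convention $\vartheta_3(0, q) = \sum_n q^{n^2}$, $\vartheta_4(0, q) = \sum_n (-1)^n q^{n^2}$, $\vartheta_2(0, q) = \sum_n q^{(n + 1/2)^2}$, so that all exponents of $\vartheta_i(0, q^4)$ are integers. Function names are illustrative.

```python
def truncated_product(a, b):
    """Product of two power series given as coefficient lists of equal length, truncated."""
    out = [0] * len(a)
    for i, x in enumerate(a):
        if x == 0:
            continue
        for j, y in enumerate(b):
            if i + j >= len(a):
                break
            out[i + j] += x * y
    return out


def sixth_power(series):
    """Sixth power of a truncated power series."""
    result = [1] + [0] * (len(series) - 1)
    for _ in range(6):
        result = truncated_product(result, series)
    return result


def d6_plus_theta(order):
    """Coefficients of q^(2k), k = 0..order//2, of (theta2^6 + theta3^6 + theta4^6)/2 at nome q^4."""
    t2 = [0] * (order + 1)
    t3 = [0] * (order + 1)
    t4 = [0] * (order + 1)
    for n in range(-order, order + 1):
        if (2 * n + 1) ** 2 <= order:
            t2[(2 * n + 1) ** 2] += 1                     # q^(4 (n + 1/2)^2)
        if 4 * n * n <= order:
            t3[4 * n * n] += 1                            # q^(4 n^2)
            t4[4 * n * n] += 1 if n % 2 == 0 else -1      # (-1)^n q^(4 n^2)
    total = [x + y + z for x, y, z in zip(sixth_power(t2), sixth_power(t3), sixth_power(t4))]
    # Only even powers of q occur; return the series in the variable q2 = q^2.
    return [total[2 * k] // 2 for k in range(order // 2 + 1)]


if __name__ == "__main__":
    print(d6_plus_theta(24))
```

With this convention the output starts with 1, 0, 0, 32, 60, the same values as the expansion obtained from modular forms, which is a useful consistency check on the identification with $D_6^+$.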
The expression in terms of elliptic theta functions is an alternative to the one previously given in terms of appropriate modular forms. It is known [2] that the $D_n^+$ packing is a lattice packing if and only if $n$ is even. In particular it is so for $n = 6$. The fact that $D_n^+$ is not a lattice for $n$ odd excludes a possible systematic identification with lattices of type $A_k(SU(3))$ when $k > 1$. Here are a few other avatars of $D_6^+$:

– The generalized laminated lattice $\Lambda_6[3]$ with minimal norm 3, see [28] (the authors study the family $\Lambda_n[3]$ and provide enough information to allow one to recover the Gram matrix associated with $\Lambda_6[3]$ and show that it coincides with the matrix $A$ given before).
– The lattice $L_4$ generated by cuts of the complete graph on a set of 4 vertices, see [30] (the authors are interested in the Delaunay polytopes for the lattices $L_n$; when $n = 4$, this lattice is isomorphic with $D_6^+$; actually, the precise relation is $L_4 = \sqrt{2}\, D_6^+$).

Identification of lattices defined by $A_k(SU(3))$ with other members of the above families fails.

The Case A2

We obtain the following Gram matrix.

$$A = \begin{pmatrix}
6 & 0 & 2 & 0 & 2 & 0 & -2 & 1 & -2 & 2 & -2 & 2 \\
0 & 6 & 2 & 2 & 2 & 2 & 1 & -1 & 0 & -2 & 0 & -2 \\
2 & 2 & 6 & 0 & 2 & 2 & 2 & 2 & -1 & 1 & 2 & 2 \\
0 & 2 & 0 & 6 & 2 & 0 & 0 & 2 & 1 & -2 & 2 & 0 \\
2 & 2 & 2 & 2 & 6 & 0 & 2 & 2 & 2 & 2 & -1 & 1 \\
0 & 2 & 2 & 0 & 0 & 6 & 0 & 2 & 2 & 0 & 1 & -2 \\
-2 & 1 & 2 & 0 & 2 & 0 & 6 & 0 & 2 & 0 & 2 & 0 \\
1 & -1 & 2 & 2 & 2 & 2 & 0 & 6 & 2 & 2 & 2 & 2 \\
-2 & 0 & -1 & 1 & 2 & 2 & 2 & 2 & 6 & 0 & 0 & -2 \\
2 & -2 & 1 & -2 & 2 & 0 & 0 & 2 & 0 & 6 & -2 & 2 \\
-2 & 0 & 2 & 2 & -1 & 1 & 2 & 2 & 0 & -2 & 6 & 0 \\
2 & -2 & 2 & 0 & 1 & -2 & 0 & 2 & -2 & 2 & 0 & 6
\end{pmatrix}
\qquad (7)$$
The discriminant is readily calculated: $\Delta = 5^9$. The modular level is $\ell = N^2 = 25$. Applying the Hecke-Schoenberg theorem leads to the following result: the theta function of this lattice is of weight 6, modular level $\ell = 5^2 = 25$ (the square of the altitude) and Dirichlet character $\chi(11)$ for the characters modulo 25 on a cyclotomic field of order 20. It is the only character (namely the Kronecker character 5), the eleventh in a collection of $20 = \varphi_{Euler}(25)$, that coincides with the value of the Legendre symbol $L(\Delta, p)$ for all odd primes $p$ that do not divide 25. This space of modular forms has dimension 16. The theta function, in the variable $q_2 = q^2$, is therefore fully determined by its 16 first Fourier coefficients (the first being 1). The coefficients of $q_2^a$ with $a > 15$ are then predicted. Here is the Magma code calculating the first $48 \times 2$ coefficients:

    // Dirichlet characters modulo 25, with values in the cyclotomic field of order phi(25) = 20
    H := DirichletGroup(25,CyclotomicField(EulerPhi(25)));
    chars := Elements(H);
    eps := chars[11];
    // Modular forms of weight 6 twisted by the character eps
    M := ModularForms([eps],6);
    order:=48;
    // Theta function from its 16 first Fourier coefficients
    PowerSeries(M![1,0,0,100,450,960,2800,6600,12300,22400,
    30690, 63000, 93150, 144000, 203100, 236080],order);

The first Fourier coefficients have to be computed by a brute force approach that relies, ultimately, on the obtained Gram matrix. The automorphism group of this lattice is of order 1200 and its structure, in terms of direct and semidirect products, is $C_2 \times ((((C_5 \times C_5) \rtimes C_4) \rtimes C_3) \rtimes C_2)$. Orbits of the basis vectors under the aut action coincide and contain the 100 higher roots (the 50 positive and the 50 negative ones). Their stabilizers are conjugate in aut, and are isomorphic with the group $D_{12}$ (which is itself isomorphic with $S_3 \times C_2$). We only mention that the Voronoi polytope has 5410 11-dimensional facets.

The Case A3

A Gram matrix is given below. We only mention that the discriminant is $\Delta = 6^{12}$, the modular level is $\ell = 18$, the rank of the lattice $L_3$ is $r = 2 \times r_{A_3} = 20$, the period is a rhombus $6 \times 6$ and $|R| = 240$.
The theta function belongs to a space of modular forms on $\Gamma_0(18)$, of weight 10, twisted by an appropriate character of modulus 18 on a cyclotomic field of order $6 = \varphi_{Euler}(18)$. The corresponding space of modular forms has dimension 31 and the theta function of the lattice is fully determined by its first Fourier coefficients. The automorphism group has order 864 and the Voronoi polytope has 539214 19-dimensional facets.
.SU(3)
Higher Roots and Their Lattices
⎛
6 0 0 ⎜ 0 6 0 ⎜ ⎜ 0 0 6 ⎜ ⎜ ⎜ 0 0 0 ⎜ ⎜ 2 2 0 ⎜ ⎜ 0 2 0 ⎜ ⎜ 0 2 2 ⎜ ⎜ 2 2 0 ⎜ ⎜ 0 2 2 ⎜ ⎜ ⎜ 0 2 0 .A = ⎜ ⎜ −2 1 0 ⎜ ⎜ 1 0 1 ⎜ ⎜ 0 1 −2 ⎜ ⎜ 0 1 0 ⎜ ⎜ −2 0 2 ⎜ ⎜ ⎜ 2 0 0 ⎜ ⎜ 0 0 −2 ⎜ ⎜ −2 0 0 ⎜ ⎝ 2 0 −2 0 0 2
0 0 0 6 0 2 0 0 0 2 0 1 0 −2 0 −2 2 2 0 −2
2 0 2 2 0 0 0 2 6 0 0 6 0 0 2 2 2 0 0 2 2 0 2 2 0 0 0 2 −1 1 1 −1 1 1 2 2 2 0 0 2
0 2 2 0 0 0 6 0 2 2 0 2 2 0 1 1 −1 0 2 2
2 0 2 2 0 2 0 0 2 2 2 0 0 2 6 0 0 6 0 0 2 0 2 2 0 2 0 0 2 2 2 0 0 2 −1 1 1 −1 1 1
191
0 −2 1 0 2 1 0 1 0 0 1 −2 2 0 1 0 0 2 2 0 2 0 2 0 2 0 2 2 0 2 2 0 0 0 2 2 6 0 2 0 0 6 0 0 2 0 6 0 0 0 0 6 2 0 0 0 0 2 2 0 2 0 2 0 2 0 2 2 1 2 2 0 1 0 2 2 −1 0 2 0
⎞ 0 −2 2 0 −2 2 0 1 0 0 0 0 0 0 ⎟ ⎟ 0 2 0 −2 0 −2 2 ⎟ ⎟ ⎟ −2 0 −2 2 2 0 −2 ⎟ ⎟ 0 −1 1 1 2 2 0 ⎟ ⎟ 2 1 −1 1 2 0 2 ⎟ ⎟ 0 1 1 −1 0 2 2 ⎟ ⎟ 0 2 2 0 −1 1 1 ⎟ ⎟ 0 2 0 2 1 −1 1 ⎟ ⎟ ⎟ 2 0 2 2 1 1 −1 ⎟ ⎟ 0 2 0 0 2 0 0 ⎟ ⎟ 0 2 2 2 2 2 2 ⎟ ⎟ 0 0 0 2 0 2 0 ⎟ ⎟ 6 0 2 0 0 0 2 ⎟ ⎟ 0 6 0 0 0 −2 2 ⎟ ⎟ ⎟ 2 0 6 0 −2 2 0 ⎟ ⎟ 0 0 0 6 2 0 −2 ⎟ ⎟ 0 0 −2 2 6 0 0 ⎟ ⎟ 0 −2 2 0 0 6 0 ⎠ 2 2 0 −2 0 0 6 (8)
Cases $D_3$, $D_6$, ..., $E_5$, $E_9$, $E_{21}$

The procedure should be clear by now and we just refer to the previously given tables. Explicit Gram matrices, in particular for the three exceptional cases, can be found in one appendix of [3].
3.4 Remarks About the Vectors of Smallest Norm

For the lattices associated with $A_k(SU(3))$ that we considered explicitly, the lattice vectors of shortest length are precisely the higher roots (100 of them for the $A_2$ lattice, for instance); the kissing number of those lattices is then given by the number of higher roots. As is well known, this property holds for all usual root lattices, i.e., higher root lattices of the $SU(2)$ family. However, this property does not always hold for those lattices associated with modules of the $SU(3)$ family that are not of type $A_k$, although it holds for $E_5$. In the case $D_3$ the first shell is made of vectors of norm 4, so they are not higher roots, and the only vectors of the lattice that belong to the second shell, of norm 6, are precisely the higher roots. In the case $D_6$, as for $E_9$ or $E_{21}$, the first shell is made of vectors of norm 4 (which are not higher roots), but the second shell, of norm 6, contains not only the higher roots themselves, but other
vectors as well. In all cases the vectors of smallest norm can of course be expanded on a chosen basis of higher roots.

About the Determination of $\theta(A_k)$, for General $k$

The theta function, as a modular form twisted by a character, can, in principle, be obtained by following the method explained in the previous sections and illustrated in the case of the first few members of the $A_k$ series. In this respect we observed that the (quadratic form) level $\ell$ is often equal to $\ell = (k + 3)^2$ but it is not always so. The discriminant is $(k + 3)^{3(k+1)}$, the weight is $r_{A_k} = (k + 1)(k + 2)/2$, the quadratic form level is readily obtained from the Gram matrix, and the determination of the appropriate character requires a discussion relying on the arithmetic properties of the discriminant and of the level. However, the first coefficients of the Fourier series expansion have to be found, and the number of needed coefficients depends on the properties of an appropriate space of modular forms. The determination of the needed coefficients is done by brute force, namely by computing the norm of the vectors belonging to the first shells, using the Gram matrix as an input. Moreover, the explicit determination of a Gram matrix becomes a non-trivial exercise when $k$ is large. The present method may therefore become rapidly intractable if we increase $k$ too much.

Construction of Lattices of Higher Roots for Quantum Subgroups of Type $(G, k)$, for Other Lie Groups

The calculations described in the present paper, leading to explicit theta series for lattices of higher roots associated with quantum subgroups of type $(SU(3), k)$, could be generalized, without much ado, to the other Lie groups of rank 2, namely $B_2$ and $G_2$, since the fusion matrices of their quantum modules and quantum subgroups at level $k$ are available, see [9], see also [7] (the exceptional cases given there are obtained by conformal embeddings but it is believed that, at least for these Lie groups, there are no others).
The list of quantum modules and subgroups of type $(SU(4), k)$ is also known, see [22] and [11].
References

1. Cappelli A., Itzykson C. and Zuber J.-B., The ADE classification of minimal and $A_1^{(1)}$ conformal invariant theories, Commun. Math. Phys., 13, pp 1–26, (1987).
2. Conway J. and Sloane N.J.A., Sphere Packings, Lattices and Groups, Springer (1999).
3. Coquereaux R., Theta functions for lattices of SU(3) hyper-roots, Experimental Mathematics, 29:2, 137–162, (2020, published online: 02 Apr 2018), DOI: 10.1080/10586458.2018.1446062
4. Coquereaux R., Quantum McKay correspondence and global dimensions for fusion and module-categories associated with Lie groups, Journal of Algebra, 398, pp 258–283 (2014).
5. Coquereaux R. and Schieber G., Orders and dimensions for sl(2) or sl(3) module-categories and boundary conformal field theories on a torus, J. of Math. Phys. 48 (2007) 043511.
6. Coquereaux R., Hammaoui D., Schieber G. and Tahri E.H., Comments about quantum symmetries of SU(3) graphs, Journal of Geometry and Physics 57, pp 269–292 (2006).
7. Coquereaux R., Fusion graphs, http://www.cpt.univ-mrs.fr/~coque/quantumfusion/FusionGraphs.html
8. Coquereaux R., Isasi E., Schieber G., Notes on TQFT wire models and coherence equations for SU(3) triangular cells, Symmetry, Integrability and Geometry: Methods and Applications, SIGMA 6 (2010), 099, 44 pp.
9. Coquereaux R., Tahri E.H., Rais R., Exceptional quantum subgroups for the rank two Lie algebras B2 and G2, Journal of Mathematical Physics, Vol. 51, Issue 9 (2010).
10. Coquereaux R. and Zuber J.-B., On some properties of SU(3) Fusion Coefficients. Contribution to Mathematical Foundations of Quantum Field Theory, special issue in memory of Raymond Stora, 33 pp., Nucl. Phys. B. DOI: 10.1016/j.nuclphysb.2016.05.029 (2016).
11. Coquereaux R. and Schieber G., From conformal embeddings to quantum symmetries: an exceptional SU(4) example, Journal of Physics: Conference Series, Vol 103, DOI https://iopscience.iop.org/article/10.1088/1742-6596/103/1/012006, and Quantum symmetries for exceptional SU(4) modular invariants associated with conformal embeddings, Symmetry, Integrability and Geometry: Methods and Applications, SIGMA 5 (2009), 044, 31 pp, https://doi.org/10.3842/SIGMA.2009.044
12. Di Francesco P., Matthieu P. and Senechal D., Conformal field theory, Springer, (1997).
13. Di Francesco P. and Zuber J.-B., SU(N) lattice integrable models associated with graphs, Nucl. Phys., B 338, pp 602–646, (1990).
14. Dorey P., Partition Functions, Intertwiners and the Coxeter Element. Int. J. Mod. Phys A8, pp 193–208 (1993).
15. Evans D. E. and Pugh M., Ocneanu cells and Boltzmann weights for the SU(3) ADE graphs. Münster J. of Math. 2, pp 95–142 (2009).
16. Finkelberg, M., An equivalence of fusion categories, Geom. Funct. Anal. 6 (1996), 249–267.
17. Y.-Z. Huang, Vertex operator algebras, the Verlinde conjecture, and modular tensor categories, Proc. Natl. Acad. Sci. USA, 102 (2005), 5352–5356.
18. Kac V., Infinite dimensional Lie algebras, Cambridge University Press, Cambridge (1990).
19. Kazhdan D. and Lusztig G., Tensor structures arising from affine Lie algebras, III, J. Amer. Math. Soc., 7, pp 335–381, (1994).
20. Bosma W., Cannon J., and Playoust C., The Magma algebra system. I. The user language, J. Symbolic Comput., 24 (1997), 235–265, http://magma.maths.usyd.edu.au
21. Wolfram Research, Inc., Mathematica, Champaign, IL (2010).
22. Ocneanu A., The classification of subgroups of quantum SU(N), in “Quantum symmetries in theoretical physics and mathematics”, Bariloche 2000. Eds. Coquereaux R., García A. and Trinchero R., AMS Contemporary Mathematics, 294, pp 133–160 (2000).
23. Ocneanu A., Higher Coxeter systems, http://www.msri.org/publications/ln/msri/2000/subfactors/ocneanu (2000).
24. Ocneanu A., Poster communications (2004).
25. Ocneanu A., Harvard Lectures (2017–2018). YouTube: Video files Adrian Ocneanu Harvard Physics L22, 267 2017 10 25, L23, 267 2017 10 27, L24, 267 2017 10 30, https://www.youtube.com/watch?v=8ls_s7cpEjA&feature=youtu.be&t=2700
26. OEIS: The Online Encyclopedia of Integer Sequences, N.J.A. Sloane, https://oeis.org
27. Ostrik V., Module categories, weak Hopf algebras and modular invariants, Transform. groups, 8, no 2, pp 177–206 (2003).
28. Plesken W. and Pohst M., Constructing integral lattices with prescribed minimum, Mathematics of Computation, Vol 45, No 171, pp 209–221, and supplement S5–S16.
29. Zagier D.B., Elliptic Modular Forms and Their Applications, in ‘The 1-2-3 of Modular forms’, Lectures at a Summer School in Nordfjordeid, Norway, Springer (2008).
30. Deza M. and Grishukhin V., Delaunay Polytopes of Cut Lattices, Linear Algebra and Its Applications, 226–228:667–685 (1995).
Quantum Field Theory with Dynamical Boundary Conditions and the Casimir Effect Benito A. Juárez-Aubry and Ricardo Weder
To the memory of Alex Grossmann
Abstract We study a coupled system that describes the interacting dynamics between a bulk field, confined to a finite region with timelike boundary, and a boundary observable. In our system the dynamics of the boundary observable prescribes dynamical boundary conditions for the bulk field. We cast our classical system in the form of an abstract linear Klein-Gordon equation, in an enlarged Hilbert space for the bulk field and the boundary observable. This makes it possible to apply to our coupled system the general methods of quantization. In particular, we implement the Fock quantization in full detail. Using this quantization we study the Casimir effect in our coupled system. Specifically, we compute the renormalized local state polarization and the local Casimir energy, which we can define for both the bulk field and the boundary observable of our system. Numerical examples in which the integrated Casimir energy is negative are presented.
1 Introduction In this work we are concerned with the study of a system describing the coupled dynamics between a bulk field, confined to a region with (timelike) boundary, and a boundary observable, whereby the dynamics of the boundary observable prescribes the boundary conditions for the bulk field. While the coupled system can be seen as an interacting one, as we shall see, it can be cast in the form of an abstract, linear Klein-Gordon equation, for which Fock quantization can be implemented in full detail. Once a quantum description of the system is available, a most natural question is to characterize the Casimir effect in this system. Indeed, the purpose of
B. A. Juárez-Aubry · R. Weder () Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad, de México, México e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_12
this paper is to study the renormalized local state polarization and the local Casimir energy, which we can define for both the bulk field and the boundary observable of the system. This class of mixed bulk-boundary systems has received attention both in the mathematics and physics literature. For time-periodic solutions they correspond to Sturm-Liouville problems with boundary conditions that depend on the spectral parameter. In the mathematics literature this type of Sturm-Liouville problem has been extensively studied. See, for example, [7, 21, 31, 33, 39, 41, 45], and the references quoted there. In the physics literature, such bulk-boundary systems have been studied recently in the context of quantum field theory in [4, 5, 10, 28, 35, 49] owing to different motivations. In [4, 5], a central motivation was to underpin precisely the notion of boundary degrees of freedom, especially in connection with the study of black hole entropy calculations in the context of isolated or dynamical horizons [2, 3], and also to understand whether boundary degrees of freedom could serve as particle detectors (in the spirit of [43]) for bulk quantum fields. On the other hand, the works [10, 49] have been largely inspired by the holographic program of AdS/CFT and its generalizations (see, e.g., [30, 47]), and the similarities between these bulk-boundary systems and so-called holographic renormalization [40]. In [28] the authors are motivated by studying boundary fields in terms of boundary actions in a path integral formalism. The study of bulk-boundary systems in [35] is motivated by the modelling of superconducting circuits, which are relevant to experiments in quantum information. We should mention that it is further argued in [4, 5, 10] that the techniques applied to the study of bulk-boundary systems should also prove relevant for high-energy physics in the context of Yang-Mills in lower dimensions and Chern-Simons theory [48] and Maxwell-Chern-Simons theory [14, 15, 38].
Such techniques are presumably also relevant for condensed matter theory. See, e.g., [32] in the context of the modelling of topological insulators using effective field theory and [18] for the study of effects occurring therein. In the physical literature discussed above, the focus has been mainly on the construction of states and linear observables in quantum theory for such bulk-boundary theories. This work differs in two senses. First, the dynamical boundary conditions that we impose are more general. In particular, the coefficient $\beta_2'$ that appears in (1) is set to zero in the previous literature, whereas here we allow it to be different from zero. When $\beta_2'$ is nonzero there is a new physical effect, namely an interaction between the bulk and the boundary that depends on the second time derivative of the bulk field, that is to say, an interaction that is frequency dependent. Second and most importantly, the main focus in this work is to study the Casimir effect, which is of prime interest from a theoretical and experimental viewpoint. The Casimir effect posits that a system confined to some finite physical region must have a non-trivial vacuum energy density, which will depend on the boundary conditions of the system, even if the spacetime geometry is flat. For this reason, this effect has also been of great interest in the quantum field theory literature. The non-trivial vacuum energy is referred to as the Casimir energy. See, e.g., [8] and [16] for a literature overview on the subject. To the best of our knowledge,
the first clear-cut calculation, which clarifies the origin of the Casimir effect in full quantum field theory, was performed by Kay in [29] in the case of a scalar field with periodic boundary conditions, i.e., in a cylindrical universe. A thorough discussion of the effect in the same spirit is presented in [19, Chap. 5]. In those references, a point-splitting regularization and Minkowski-vacuum subtraction are used to define the local Casimir energy. Our strategy to compute the local Casimir energy is in this spirit too, only differing in the fact that we perform a Hadamard subtraction instead (see Sect. 3.1 below), which is equivalent to removing the Minkowski-vacuum contribution in our case of interest. Note that it is now understood that the Hadamard subtraction is better suited for generalizing renormalization prescriptions to curved spacetimes. See, e.g., [44]. To the best of our knowledge, the only instance in which the Casimir effect has been studied in the context of dynamical boundary conditions, i.e., for bulk-boundary systems, appears in [17]. That work is motivated by the fact that superconducting circuit experiments, which are relevant to the experimental measurement of the Casimir energy, are modeled most appropriately by using dynamical boundary conditions [46]. Our work differs from [17] in that we are interested in the local Casimir energy, both at zero temperature and at positive temperature, while they focus on the integrated Casimir energy at zero temperature, and in that the boundary condition that we impose (i.e., the equation for the boundary observable) is more general as we allow for $\beta_2' \neq 0$. The method used for obtaining the Casimir energy is also different.
In [17] the regularization procedure is implemented by confining the system to a large box, and obtaining the difference between the total energy of the system of interest (with dynamical boundary conditions) and the would-be total energy for the system confined in the large box (with non-dynamical boundary conditions), with the aid of complex-analytic techniques. The integrated Casimir energy is then obtained by taking the infinite-size limit of the large-box reference system. The advantage of this method is that it allows them to study the static and dynamical Casimir effects in an efficient way, but this comes at the price of having no information about the local Casimir energy, which we obtain in this paper. At temperature zero the main results for the renormalized local state polarization are given at (43), (46), (65), and (67), and for the local Casimir energy at (56), (58) (68), and (69). At positive temperature the main results for the renormalized local state polarization are given at (76a) and (76b), and for the local Casimir energy at (78a) and (78b). The paper is organized as follows. In Sect. 2 we consider our classical system, and we formulate it as an abstract Klein-Gordon equation. In Sect. 3 we quantize our classical system, we consider the Hadamard property, and we introduce the renormalized local state polarization and the local Casimir energy. In Sect. 4 we obtain our results on the renormalized local state polarization and on the local Casimir energy at zero temperature. In Sect. 5 we obtain our results on the renormalized local state polarization and on the local Casimir energy at positive temperature. In Sect. 6 we present numerical examples for the integrated Casimir energy which in each case is negative. We do not draw conclusions on the generic
negativity of the integrated Casimir energy for all allowed boundary-condition parameters. Section 7 contains our final remarks. In Sect. 8 (Appendix 1) we state some formulae that we use. In Sect. 9 (Appendix 2) we obtain estimates on the eigenvalues of our classical problem. Finally, in Sect. 10 (Appendix 3) we give details of the calculation of the renormalized local state polarization and of the local Casimir energy. An extension of the present work including coherent states appears in [26], where the Casimir force is also studied by a combination of analytic and numerical techniques. An executive summary of the present work and of [26] along with a short review of the Casimir effect appears in [27].
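2 The Classical Problem
We consider a scalar field $\phi : \mathbb{R} \times [0,\ell] \to \mathbb{R}$, where $\ell > 0$, obeying the following dynamical equation:
\[
\begin{cases}
\left[\partial_t^2 - \partial_z^2 + m^2 + V(z)\right]\phi(t,z) = 0, & t \in \mathbb{R},\ z \in (0,\ell),\\
\cos\alpha\,\phi(t,0) + \sin\alpha\,\partial_z\phi(t,0) = 0, & \alpha \in [0,\pi),\\
\left[\beta_1'\partial_t^2 - \beta_1\right]\phi(t,\ell) = -\beta_2\,\partial_z\phi(t,\ell) + \beta_2'\,\partial_z\partial_t^2\phi(t,\ell).
\end{cases}
\tag{1}
\]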
Here, $m^2 > 0$ is a mass parameter, and $\beta_1, \beta_2, \beta_1', \beta_2'$ are real parameters. The parameter $\beta_1'$ can be seen as the square of an inverse velocity, or minus the square of an inverse velocity, $\beta_1$ as a mass or as a constant potential, and $\beta_2$, $\beta_2'$ can be viewed as coupling parameters to external sources for the boundary dynamical observable $\phi^{\partial}(t) := \phi(t,\ell)$. Furthermore, the potential $V(z)$ is a real valued continuous function defined for $z \in [0,\ell]$. The system (1) subject to initial data on the surface defined by $t = 0$ can be viewed as a dynamical system that describes the interaction between a bulk field $\phi(t,z)$, $0 < z < \ell$, and a boundary observable $\phi^{\partial}(t) := \phi(t,\ell)$. Let us consider a solution to (1) of the form
\[
\phi(t,z) = e^{-i\omega t}\varphi(z).
\tag{2}
\]
Inserting (2) into (1) we obtain
\[
\begin{cases}
\left[-\partial_z^2 + m^2 + V(z)\right]\varphi(z) = \omega^2\varphi(z), & z \in (0,\ell),\\
\cos\alpha\,\varphi(0) + \sin\alpha\,\partial_z\varphi(0) = 0, & \alpha \in [0,\pi),\\
-\left[\beta_1\varphi(\ell) - \beta_2\,\partial_z\varphi(\ell)\right] = \omega^2\left[\beta_1'\varphi(\ell) - \beta_2'\,\partial_z\varphi(\ell)\right].
\end{cases}
\tag{3}
\]
The system (3) is a two-point boundary-value problem where the spectral parameter $\omega^2$ appears in the boundary condition at $z = \ell$. Systems of this type appear, after separation of variables, in the one dimensional wave and heat equations, in a variety of physical problems. In particular, cases with $\beta_2' = 0$ appear in the cooling of a thin
solid bar placed at time zero in contact with a finite amount of liquid. See [21] for a discussion of these applications. It is a general fact that boundary-value problems where the spectral parameter appears in the boundary condition can be understood as boundary-value problems that describe part of a physical system, and that when the space of states is enlarged to include the missing part of the system, the problem can be formulated in an operator theoretical way, with a boundary condition that does not depend on the spectral parameter. This is precisely the situation here, when one enlarges the space of states to include the boundary observable, as we proceed to do now. Following [45] and [21], we write the boundary-value problem (3) in an operator theoretical way with an associated self-adjoint operator in a Hilbert space that describes the bulk field and the boundary observable. For this purpose we consider the following extended Hilbert space, $\mathcal{H}$, that contains the boundary observable. Namely,
\[
\mathcal{H} := L^2((0,\ell)) \oplus \mathbb{C},
\tag{4}
\]
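where $\mathbb{C}$ denotes the complex numbers. $\mathcal{H}$ is equipped with the following scalar product,
\[
(\varphi,\chi)_{\mathcal{H}} := \int_0^{\ell} dz\, \overline{\varphi_1(z)}\,\chi_1(z) + \rho^{-1}\,\overline{\varphi_2}\,\chi_2,
\tag{5}
\]
for $\varphi = (\varphi_1,\varphi_2)$, $\chi = (\chi_1,\chi_2) \in \mathcal{H}$, and where
\[
\rho := \det\begin{pmatrix} \beta_1' & \beta_1 \\ \beta_2' & \beta_2 \end{pmatrix} = \beta_1'\beta_2 - \beta_1\beta_2'.
\tag{6}
\]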
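We always assume that $\rho > 0$. As in [21] let us define the following linear operator $A$ in $\mathcal{H}$ by
\[
\varphi \in D(A) \mapsto A\varphi := \begin{pmatrix} \left[-\partial_z^2 + m^2 + V(z)\right]\varphi_1(z) \\ -\left[\beta_1\varphi_1(\ell) - \beta_2\,\partial_z\varphi_1(\ell)\right] \end{pmatrix},
\tag{7}
\]
where the domain of $A$ is defined as follows:
\[
\begin{aligned}
D(A) = \Big\{ \varphi = \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix} \in \mathcal{H} :\ & \varphi_1, \partial_z\varphi_1 \text{ are absolutely continuous in } [0,\ell],\ \partial_z^2\varphi_1 \in L^2([0,\ell]),\\
& \cos\alpha\,\varphi_1(0) + \sin\alpha\,\partial_z\varphi_1(0) = 0,\ \varphi_2 = \beta_1'\varphi_1(\ell) - \beta_2'\,\partial_z\varphi_1(\ell) \Big\}.
\end{aligned}
\tag{8}
\]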
It is proven in [21] that if $V$ is continuous in $[0,\ell]$ and $\rho > 0$, the operator $A$ is densely defined and self-adjoint. Furthermore, [21] proves that the spectrum of $A$ is discrete and that it consists of multiplicity one eigenvalues that accumulate at $+\infty$. The two-point boundary-value problem (3) is equivalent to the following eigenvalue problem for the self-adjoint operator $A$ in the Hilbert space $\mathcal{H}$,
\[
A\varphi = \omega^2\varphi, \qquad \varphi \in D(A).
\tag{9}
\]
Then, the dynamical system (1) can be formulated as the following abstract Klein-Gordon equation,
\[
\partial_t^2\varphi(t,z) + A\varphi(t,z) = 0, \qquad \varphi(t,z) \in D(A).
\tag{10}
\]
To apply to the abstract Klein-Gordon equation (10) the general methods of quantization, it is crucial that the operator $A$ be positive. In the next proposition we give sufficient conditions for the positivity of $A$.
Proposition 1 Suppose that the determinant $\rho$ defined in (6) is positive, that $m^2 + V(z) \geq 0$, $z \in [0,\ell]$, that $\beta_2' \neq 0$, and that either $\alpha = 0$ or $\frac{\pi}{2} \leq \alpha < \pi$. Further, assume:
1. If $\beta_2' > 0$, then $\beta_1 \geq 0$, $\beta_2 < 0$, and $\beta_1' < 0$.
2. If $\beta_2' < 0$, then $\beta_1 \leq 0$, $\beta_2 > 0$, and $\beta_1' > 0$.
Under these conditions the eigenvalues of $A$ are all positive, and denoting by $\omega_1^2$ the smallest eigenvalue, $A \geq \omega_1^2 > 0$.
Proof Suppose that $A$ has a nonpositive eigenvalue $-\lambda$, with $\lambda \geq 0$, and with eigenvector $\varphi = (\varphi_1,\varphi_2) \in D(A)$,
\[
A\varphi = -\lambda\varphi.
\tag{11}
\]
We denote
\[
\widetilde{\cot}\,\alpha := 0 \ \text{ if } \alpha = 0, \qquad \widetilde{\cot}\,\alpha := \cot\alpha \ \text{ if } \frac{\pi}{2} \leq \alpha < \pi.
\]
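Using the first line in the right-hand side of (7), (11), and integrating by parts we obtain
\[
\begin{aligned}
-\lambda(\varphi_1,\varphi_1)_{L^2((0,\ell))} &= ((A\varphi)_1,\varphi_1)_{L^2((0,\ell))} = (\partial_z\varphi_1,\partial_z\varphi_1)_{L^2((0,\ell))} && (12)\\
&\quad + \left((m^2 + V)\varphi_1,\varphi_1\right)_{L^2((0,\ell))} - \widetilde{\cot}\,\alpha\,|\varphi_1(0)|^2 - \overline{\partial_z\varphi_1(\ell)}\,\varphi_1(\ell), && (13)
\end{aligned}
\]
where we also use that $\varphi_1(0) = 0$ if $\alpha = 0$, and that $\partial_z\varphi_1(0) = -\widetilde{\cot}\,\alpha\,\varphi_1(0)$ if $\frac{\pi}{2} \leq \alpha < \pi$. By the second line in the right-hand side of (7), (11), since by (8) $\varphi_2 = \beta_1'\varphi_1(\ell) - \beta_2'\,\partial_z\varphi_1(\ell)$, and reordering the terms we get
\[
(\beta_2 - \lambda\beta_2')\,\partial_z\varphi_1(\ell) = (\beta_1 - \lambda\beta_1')\,\varphi_1(\ell).
\tag{14}
\]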
If $(\beta_2 - \lambda\beta_2') = (\beta_1 - \lambda\beta_1') = 0$, it follows from (6) that $\rho = 0$. Since we assume that $\rho \neq 0$, we have that either $(\beta_2 - \lambda\beta_2') \neq 0$ or $(\beta_1 - \lambda\beta_1') \neq 0$. If $(\beta_2 - \lambda\beta_2') = 0$ and $(\beta_1 - \lambda\beta_1') \neq 0$, it follows from (14) that $\varphi_1(\ell) = 0$. Further, if $(\beta_1 - \lambda\beta_1') = 0$ and $(\beta_2 - \lambda\beta_2') \neq 0$, it follows from (14) that $\partial_z\varphi_1(\ell) = 0$. Then, in these two cases we have
\[
-\overline{\partial_z\varphi_1(\ell)}\,\varphi_1(\ell) = 0.
\tag{15}
\]
Moreover, if $(\beta_2 - \lambda\beta_2') \neq 0$ and $(\beta_1 - \lambda\beta_1') \neq 0$, it follows from (14) that
\[
\partial_z\varphi_1(\ell) = -\gamma\,\varphi_1(\ell),
\tag{16}
\]
where
\[
\gamma = \frac{\beta_1 - \lambda\beta_1'}{\lambda\beta_2' - \beta_2}.
\tag{17}
\]
Moreover, under assumptions 1 and 2 above we see that $\gamma > 0$. Introducing (15) and (16) into (12) we obtain
\[
-\lambda(\varphi_1,\varphi_1)_{L^2((0,\ell))} = (\partial_z\varphi_1,\partial_z\varphi_1)_{L^2((0,\ell))} + \left((m^2 + V)\varphi_1,\varphi_1\right)_{L^2((0,\ell))} - \widetilde{\cot}\,\alpha\,|\varphi_1(0)|^2 + \tilde{\gamma}\,|\varphi_1(\ell)|^2,
\tag{18}
\]
where either $\tilde{\gamma} = 0$ or $\tilde{\gamma} = \gamma > 0$. As the left-hand side of (18) is nonpositive and the right-hand side is nonnegative it follows that $\varphi_1 = 0$. Furthermore, as by (8) $\varphi_2 = \beta_1'\varphi_1(\ell) - \beta_2'\,\partial_z\varphi_1(\ell)$, we get $\varphi_2 = 0$, and in consequence $\varphi = 0$. This completes the proof that $A$ has no nonpositive eigenvalues. Note that as the eigenvalues of $A$ accumulate at infinity, there cannot be a sequence of eigenvalues that converges to zero.
Assuming that $A > 0$, the solution to the abstract Klein-Gordon equation (10) subject to initial conditions,
\[
\phi(0,z) = f(z), \qquad \partial_t\phi(0,z) = p(z),
\tag{19}
\]
is then given by
\[
\phi(t) = \cos\left(tA^{1/2}\right) f + \frac{\sin\left(tA^{1/2}\right)}{A^{1/2}}\, p.
\tag{20}
\]
For a strong solution we require that $f \in D(A)$ and $p \in D(A^{1/2})$. Note, however, that the right-hand side of (20) is well defined for $f, p \in \mathcal{H}$, and that in this case it defines a weak solution to (10). More explicitly, let $\{\Psi_n\}_{n\in\mathbb{N}}$ be a complete set of orthonormal eigenfunctions of the self-adjoint operator $A$. Then, the solution given by (20) takes the explicit form
\[
\phi(t,z) = \sum_{n=1}^{\infty} \left[ (\Psi_n, f)_{\mathcal{H}} \cos(\omega_n t) + (\Psi_n, p)_{\mathcal{H}} \frac{\sin(\omega_n t)}{\omega_n} \right] \Psi_n(z).
\tag{21}
\]
The series in (21) converges in the norm of $\mathcal{H}$. Note that each of the eigenmodes that appear in (21), namely
\[
\Psi_n(z)\cos(\omega_n t) \quad \text{and} \quad \Psi_n(z)\,\frac{\sin(\omega_n t)}{\omega_n}, \qquad n = 1, \ldots,
\]
gives a solution to the dynamical system (1).
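The hypotheses of Proposition 1 are sign conditions on the boundary parameters, so they can be checked mechanically before any spectral computation. The following Python sketch is only an illustration: the function names `proposition_1_holds` and `gamma`, and the convention of writing the primed parameters $\beta_1'$, $\beta_2'$ as `b1p`, `b2p`, are hypothetical choices made here, and the condition $m^2 + V \geq 0$ is assumed rather than tested.

```python
import math


def proposition_1_holds(alpha, b1, b1p, b2, b2p):
    """Check the sign hypotheses of Proposition 1.

    b1, b2 stand for beta_1, beta_2 and b1p, b2p for beta_1', beta_2'.
    The condition m^2 + V >= 0 is assumed and not checked here.
    """
    rho = b1p * b2 - b1 * b2p  # determinant (6)
    if rho <= 0 or b2p == 0:
        return False
    if not (alpha == 0 or math.pi / 2 <= alpha < math.pi):
        return False
    if b2p > 0:  # case 1 of the proposition
        return b1 >= 0 and b2 < 0 and b1p < 0
    return b1 <= 0 and b2 > 0 and b1p > 0  # case 2


def gamma(lam, b1, b1p, b2, b2p):
    """Coefficient (17); meaningful only when numerator and denominator are nonzero."""
    return (b1 - lam * b1p) / (lam * b2p - b2)
```

For instance, with the hypothetical values $\beta_1 = 1$, $\beta_1' = -2$, $\beta_2 = -1$, $\beta_2' = 1$ one has $\rho = 1$, case 1 applies, and `gamma(lam, 1.0, -2.0, -1.0, 1.0)` equals $(1 + 2\lambda)/(1 + \lambda)$, which is positive for every $\lambda \geq 0$, as used in the proof.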
3 Quantum Field Theory
As we have seen in Sect. 2, problem (1) can be cast in the form of an abstract linear Klein-Gordon equation. In this section we always suppose that the operator $A$ is positive, that is to say, that all its eigenvalues are positive. In particular, this is true if the assumptions of Proposition 1 hold. It follows that the standard methods of quantization are available to yield the quantum counterpart of our classical problem. For details, the interested reader can look at Refs. [1, 13, 19, 44]. In [6], special attention is paid to the rôle of boundary conditions in canonical quantization. For our purposes it is convenient to use the canonical quantization method. The one-particle Hilbert space of the quantum theory is the Hilbert space $\ell^2(\mathbb{N})$ that consists of all complex-valued, square-summable sequences, $\{\alpha_n\}_{n=1}^{\infty}$, with the scalar product
\[
\left(\{\alpha_n\}_{n=1}^{\infty}, \{\beta_n\}_{n=1}^{\infty}\right)_{\ell^2(\mathbb{N})} := \sum_{n=1}^{\infty} \overline{\alpha_n}\,\beta_n.
\]
The many-particle Hilbert space of the quantum theory is the bosonic Fock space [1]
\[
\mathscr{H} = \mathbb{C} \oplus \bigoplus_{n=1}^{\infty} \left(\otimes_s^n\, \ell^2(\mathbb{N})\right),
\]
where $\otimes_s^n \ell^2(\mathbb{N})$ denotes the symmetric tensor product of $n$ copies of $\ell^2(\mathbb{N})$, $n = 1, \ldots$ [1]. In describing the system at zero temperature, there is a distinguished ground state, the vacuum state,
\[
\Omega = (1, 0, 0, \cdots) \in \mathscr{H}.
\]
The quantum fields are operators on $\mathscr{H}$ given by
\[
\hat{\Phi}(t,z) = \sum_{n=1}^{\infty} \frac{1}{(2\omega_n)^{1/2}} \left[ e^{-i\omega_n t}\,\Psi_n(z)\, a_n + e^{i\omega_n t}\,\Psi_n(z)\, a_n^{\dagger} \right],
\tag{22}
\]
where $a_n$ and $a_n^{\dagger}$ are annihilation and creation operators on Fock space, satisfying the canonical commutation relations, $[a_n, a_l^{\dagger}] = \delta_{n,l}$, $[a_n, a_l] = 0$, $[a_n^{\dagger}, a_l^{\dagger}] = 0$, $n, l = 1, \ldots$, $\{\Psi_n\}_{n\in\mathbb{N}}$ is a complete set of orthonormal eigenfunctions of the self-adjoint operator $A$ introduced in Sect. 2, and $\omega_n^2$ is the eigenvalue that corresponds to $\Psi_n$, $n = 1, \ldots$. We write the eigenfunctions as follows [21]:
\[
\Psi_n(z) = \begin{pmatrix} \psi_n(z) \\ \beta_1'\psi_n(\ell) - \beta_2'\,\partial_z\psi_n(\ell) \end{pmatrix}.
\tag{23}
\]
The detailed form of the eigenfunctions $\Psi_n$, $n = 1, \ldots$, depends on the boundary condition imposed at $z = 0$. See below in Sects. 4.1 and 4.2. The creation operator $a_n^{\dagger}$ creates a particle in the state $\Psi_n$, $n = 1, \ldots$, and the annihilation operator $a_n$ annihilates a particle in the state $\Psi_n$, $n = 1, \ldots$. The two-point Wightman function in the vacuum $\Omega$ is given by
\[
\langle\Omega|\hat{\Phi}(t,z)\hat{\Phi}(t',z')\Omega\rangle = \sum_{n=1}^{\infty} \frac{1}{2\omega_n}\, e^{-i\omega_n(t-t')}\, \Psi_n(z) \otimes \Psi_n(z').
\tag{24}
\]
The two-point Wightman function (24) has four tensor-product components. The diagonal ones constitute bulk-bulk and boundary-boundary correlation functions, while the off-diagonal terms define the bulk-boundary correlation functions. In order to define the bulk and boundary renormalized local state polarizations and local Casimir energies that we study in this work in Sect. 4, we will make use of the bulk-bulk and boundary-boundary two-point Wightman functions, which are given explicitly, in terms of the eigenfunctions (23), by
\[
\langle\Omega|\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\Omega\rangle = \sum_{n=1}^{\infty} \frac{1}{2\omega_n}\, e^{-i\omega_n(t-t')}\, \psi_n(z)\psi_n(z')
\tag{25a}
\]
and
\[
\langle\Omega|\hat{\Phi}^{\partial}(t)\hat{\Phi}^{\partial}(t')\Omega\rangle = \sum_{n=1}^{\infty} \frac{1}{2\omega_n}\, e^{-i\omega_n(t-t')} \left[\beta_1'\psi_n(\ell) - \beta_2'\,\partial_z\psi_n(\ell)\right]^2,
\tag{25b}
\]
respectively. At a positive temperature, $T := 1/\beta > 0$, the thermal equilibrium state for our problem can be defined in Fock space by the usual Gibbs formula. See Section 2.3 of [20] for details. Here we only quote the formula for the positive-temperature two-point Wightman function given in Eq. (2.43) of [20].
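\[
\langle\hat{\Phi}(t,z)\hat{\Phi}(t',z')\rangle_{\beta} = \sum_{n=1}^{\infty} \frac{1}{2\omega_n}\, \frac{\Psi_n(z) \otimes \Psi_n(z')}{1 - e^{-\beta\omega_n}} \left[ e^{-i\omega_n(t-t')} + e^{-\beta\omega_n}\, e^{i\omega_n(t-t')} \right].
\tag{26}
\]
The limit $\beta \to \infty$ of (26) yields (24), which justifies that the vacuum state can be seen as a zero-temperature state. In order to define the bulk and boundary renormalized local state polarizations and local Casimir energies at positive temperature, which we will study in Sect. 5, we will make use of the bulk-bulk and boundary-boundary positive-temperature two-point Wightman functions, which are given explicitly, in terms of the eigenfunctions (23), by
\[
\langle\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\rangle_{\beta} = \sum_{n=1}^{\infty} \frac{1}{2\omega_n}\, \frac{\psi_n(z)\psi_n(z')}{1 - e^{-\beta\omega_n}} \left[ e^{-i\omega_n(t-t')} + e^{-\beta\omega_n}\, e^{i\omega_n(t-t')} \right]
\tag{27a}
\]
and
\[
\langle\hat{\Phi}^{\partial}(t)\hat{\Phi}^{\partial}(t')\rangle_{\beta} = \sum_{n=1}^{\infty} \frac{1}{2\omega_n}\, \frac{\left[\beta_1'\psi_n(\ell) - \beta_2'\,\partial_z\psi_n(\ell)\right]^2}{1 - e^{-\beta\omega_n}} \left[ e^{-i\omega_n(t-t')} + e^{-\beta\omega_n}\, e^{i\omega_n(t-t')} \right],
\tag{27b}
\]
respectively.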
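The statement that the zero-temperature limit of (26) reproduces (24) can be checked mode by mode. The sketch below, with the hypothetical function names `vacuum_weight` and `thermal_weight`, evaluates the scalar factor multiplying $\Psi_n(z) \otimes \Psi_n(z')$ in each formula; the frequency and time values in the usage note are arbitrary test values, not taken from the text.

```python
import cmath
import math


def vacuum_weight(omega, dt):
    """Factor multiplying Psi_n(z) (x) Psi_n(z') in (24), with dt = t - t'."""
    return cmath.exp(-1j * omega * dt) / (2.0 * omega)


def thermal_weight(omega, dt, beta):
    """Factor multiplying Psi_n(z) (x) Psi_n(z') in (26) at inverse temperature beta."""
    boltzmann = math.exp(-beta * omega)
    numerator = cmath.exp(-1j * omega * dt) + boltzmann * cmath.exp(1j * omega * dt)
    return numerator / (2.0 * omega * (1.0 - boltzmann))
```

For large `beta` the two weights agree to machine precision, and at coinciding times the thermal weight reduces to $\coth(\beta\omega_n/2)/(2\omega_n)$, the familiar thermal enhancement of each mode.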
3.1 The Hadamard Property, the Renormalized Local State Polarization and the Local Casimir Energy
The linear system that we study in this work consists of a bulk field, defined on the interval (i.e., in $1 + 1$ dimensions), and a boundary observable defined on the right-end boundary of the interval (at $z = \ell$). In this section, we will define the renormalized local state polarization and local Casimir energy for the bulk field and the boundary observable of the system. For the bulk field, we briefly introduce the notion of Hadamard states. The physically relevant states of linear fields satisfy the so-called Hadamard condition, which characterizes the short-distance behavior of field correlations. In $1 + 1$ dimensions, which is the case that we study in this paper, we say that a state vector, $\Psi$, is locally Hadamard if its two-point Wightman function satisfies that for any two points, $x$ and $x'$, in a convex normal neighborhood $N$ of spacetime it holds that [24, 36, 44]
\[
\langle\Psi|\hat{\Phi}(x)\hat{\Phi}(x')\Psi\rangle - H_{\lambda}(x,x') \in C^{\infty}(N \times N),
\tag{28}
\]
where $H_{\lambda}$ is the $(1+1)$-dimensional Hadamard bi-distribution [12]
\[
H_{\lambda}(x,x') = \frac{1}{4\pi}\left[ Q(x,x') \ln\left(\sigma(x,x')/\lambda^2\right) + W(x,x') \right].
\tag{29}
\]
Here we denote $x := (t,z)$, $x' := (t',z')$, $\lambda \in \mathbb{R}$ is an arbitrary scale, $\sigma(x,x')$ is the half-squared geodesic distance between the points $x$ and $x'$, which is unambiguously defined for $x$ and $x'$ inside a convex neighborhood, and $Q$ and $W$ are smooth, symmetric coefficients obtained by solving the so-called Hadamard recursion relations. Furthermore, $\ln z$ is the principal branch of the logarithm with the argument of $z$ in $(-\pi,\pi)$. We will be interested below (in Sect. 4) in studying the problem when the potential is $V(z) = 0$. In this case, in local coordinates, we fix the Hadamard bi-distribution to
\[
\begin{aligned}
H_M((t,z),(t',z')) := -\frac{1}{4\pi}\Big\{ & 2\gamma + \ln\Big(\frac{m^2}{2}\sigma\Big) + \frac{m^2}{2}\,\sigma\Big[\ln\Big(\frac{m^2}{2}\sigma\Big) + 2\gamma - 2\Big]\\
& + \frac{m^4}{16}\,\sigma^2\Big[\ln\Big(\frac{m^2}{2}\sigma\Big) + 2\gamma - 3\Big]\Big\} + O\left(\sigma^3 \ln\sigma\right),
\end{aligned}
\tag{30}
\]
with $\sigma = \sigma((t,z),(t',z'))$ on the right-hand side,
where $2\sigma((t,z),(t',z')) = -(t-t')^2 + (z-z')^2$ and $\gamma$ is Euler's constant. This choice guarantees that the renormalized state polarization and renormalized stress-energy tensor in the Minkowski vacuum can be set to zero, in agreement with Wald's fourth axiom for the renormalized stress-energy tensor, see, e.g., [44]. The subscript $M$ on the left-hand side of (30) places emphasis on this fact. Using this prescription, our definitions (31) below are equivalent to subtracting the Minkowski vacuum, as has been done originally by Kay in [29] for studying the local Casimir effect with periodic boundary conditions. In plain words, as $x' \to x$, a Hadamard state vector, $\Psi$, will display a logarithmic singularity in the two-point Wightman function for linear theories defined in two spacetime dimensions. This singularity must be renormalized in order to define the renormalized local state polarization and local Casimir energy in the Hadamard state $\Psi$. More precisely, for a linear scalar field of mass $m$ the renormalized local state polarization and local Casimir energy are defined, respectively, by a point-splitting regularization as
\[
\langle\Psi|(\hat{\Phi}_{\mathrm{ren}})^2(x)\Psi\rangle := \lim_{x' \to x} \left[ \langle\Psi|\hat{\Phi}(x)\hat{\Phi}(x')\Psi\rangle - H_M(x,x') \right],
\tag{31a}
\]
\[
\langle\Psi|\hat{H}_{\mathrm{ren}}(x)\Psi\rangle := \lim_{x' \to x} T \left[ \langle\Psi|\hat{\Phi}(x)\hat{\Phi}(x')\Psi\rangle - H_M(x,x') \right],
\tag{31b}
\]
where the operator $T$ takes the form $T = \frac{1}{2}\left[\partial_t\partial_{t'} + \partial_z\partial_{z'} + m^2 + V(z)\right]$ in coordinates, $x := (t,z)$, defined on a convex normal neighborhood, $N$. Recall that a globally hyperbolic spacetime is a spacetime without boundary such that the Cauchy problem is globally well posed. It is known that in globally hyperbolic spacetimes, vacuum and thermal states (more generally, passive states) satisfy the Hadamard condition [37]. Our spacetime is not globally hyperbolic because it has a boundary. However, as we saw in Sect. 2, when we impose our boundary condition the Cauchy problem is globally well posed. Furthermore, as we shall see, the Hadamard condition also holds in our problem, and renormalized bulk observables can be defined from the bulk two-point Wightman function (25a) with prescription (31). For boundary observables, as we shall see in Sect. 4, no Hadamard subtraction is needed for the coincidence limit $x' \to x$ of the boundary two-point Wightman function (25b), as the limit will turn out to be regular, as is to be expected for a system of zero spatial dimension.
4 The Casimir Effect
In this section, we compute the renormalized local state polarization and the local Casimir energy. Throughout this section we set the potential $V(z) = 0$, and we always suppose that the operator $A$ is positive, that is to say, that all its eigenvalues are positive. In particular, this is true if the assumptions of Proposition 1 hold. We will be interested in the case in which $\beta_2' \neq 0$. The strategy that we will follow in this section for obtaining the renormalized local state polarization and local Casimir energy is to improve Fulton's estimates in our cases of interest in order to characterize the ultraviolet behavior of the two-point Wightman function. This will allow us in turn to extract and remove the singularity structure of the two-point Wightman function in the coincidence limit and appropriately define the renormalized local state polarization and the local Casimir energy. The required estimates are obtained in Appendix 9. The case in which a Dirichlet boundary condition is imposed at $z = 0$ is treated in Sect. 4.1, while the case in which a Robin (including Neumann) boundary condition is imposed at $z = 0$ appears in Sect. 4.2. From now on, we will use labels $D$ and $R$ to refer to Dirichlet and Robin quantities (states, eigenfunctions and eigenvalues), respectively. In particular, in spite of the fact that the vacuum state $\Omega$ does not depend on the boundary condition, we will use the notation $\Omega^{(D)}$, $\Omega^{(R)}$ to emphasize that we consider, respectively, the Dirichlet or the Robin boundary condition at $z = 0$.
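4.1 Dirichlet Boundary Condition at z = 0
The two-point Wightman function in the Dirichlet case is given by
\[
\langle\Omega^{(D)}|\hat{\Phi}(t,z)\hat{\Phi}(t',z')\Omega^{(D)}\rangle = \sum_{n=1}^{\infty} \frac{1}{2\omega_n^D}\, e^{-i\omega_n^D(t-t')}\, \Psi_n^{(D)}(z) \otimes \Psi_n^{(D)}(z'),
\tag{32}
\]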
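with $(\omega_n^D)^2 = (s_n^D)^2 + m^2$, where the normalized eigenfunctions are given by
\[
\Psi_n^{(D)}(z) = \begin{pmatrix} \psi_n^{(D)}(z) \\ \psi_n^{\partial(D)} \end{pmatrix} = N_n^D \begin{pmatrix} -\sin\left(s_n^D z\right) \\ -\beta_1' \sin\left(s_n^D \ell\right) + \beta_2'\, s_n^D \cos\left(s_n^D \ell\right) \end{pmatrix},
\tag{33}
\]
\[
N_n^D = \left[ \frac{\ell}{2} + \frac{2\rho\,(s_n^D)^2 - \left(\beta_1'(s_n^D)^2 + (\beta_1 + \beta_1' m^2)\right)\left(\beta_2'(s_n^D)^2 + (\beta_2 + \beta_2' m^2)\right)}{2\left[ (s_n^D)^2 \left(\beta_2'(s_n^D)^2 + (\beta_2 + \beta_2' m^2)\right)^2 + \left(\beta_1'(s_n^D)^2 + (\beta_1 + \beta_1' m^2)\right)^2 \right]} \right]^{-1/2}.
\tag{34}
\]
The eigenfunctions (33) are orthonormal with respect to the inner product on the classical Hilbert space $\mathcal{H}$, i.e.,
\[
\left(\Psi_n^{(D)}, \Psi_m^{(D)}\right)_{\mathcal{H}} = \int_0^{\ell} dz\, \psi_n^{(D)}(z)\,\psi_m^{(D)}(z) + \frac{\psi_n^{\partial(D)}\,\psi_m^{\partial(D)}}{\rho} = \delta_{nm},
\tag{35}
\]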
where $\delta_{nm}$ is the Kronecker delta, with $\rho$ defined in (6). Recall that the eigenfunctions $\Psi_n^{(D)}$ are real valued. The eigenvalues $(\omega_n^D)^2 = (s_n^D)^2 + m^2$ satisfy the asymptotic behavior displayed in (94) of Appendix 9. In order to compute the renormalized local state polarization and the local Casimir energy in the bulk, we will make use of the bulk-bulk component of the two-point Wightman function, while for computing the boundary renormalized local state polarization and boundary local Casimir energy we will make use of the boundary-boundary component. They are given, respectively, by
\[
\langle\Omega^{(D)}|\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\Omega^{(D)}\rangle = \sum_{n=1}^{\infty} \frac{1}{2\omega_n^D}\, e^{-i\omega_n^D(t-t')}\, \psi_n^{(D)}(z)\,\psi_n^{(D)}(z'),
\tag{36a}
\]
\[
\langle\Omega^{(D)}|\hat{\Phi}^{\partial}(t)\hat{\Phi}^{\partial}(t')\Omega^{(D)}\rangle = \sum_{n=1}^{\infty} \frac{1}{2\omega_n^D}\, e^{-i\omega_n^D(t-t')}\, \psi_n^{\partial(D)}\,\psi_n^{\partial(D)}.
\tag{36b}
\]
In Sect. 4.1.1 we obtain the bulk and boundary local renormalized state polarization, while in Sect. 4.1.2 we obtain the bulk and boundary local Casimir energy, in the case with a Dirichlet boundary condition at $z = 0$.
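The Dirichlet eigenfunctions and their normalization can be explored numerically. The following Python sketch is an illustration under stated assumptions, not the authors' numerical method: for $V = 0$ it inserts $\varphi(z) = \sin(sz)$ into the boundary condition at $z = \ell$ of (3), which gives the secular equation $B_1(s)\sin(s\ell) = B_2(s)\, s\cos(s\ell)$ with $B_j(s) = \beta_j + (s^2 + m^2)\beta_j'$, locates its real roots by bisection, and checks the orthonormality relation (35) with the normalization (34). All names and parameter values are hypothetical; the values are chosen only so that $\rho > 0$ and case 1 of Proposition 1 holds. Only real $s > 0$ is searched, so any eigenvalue with $\omega^2 < m^2$ (imaginary $s$) is not captured.

```python
import math

# Hypothetical parameter values (rho = 1 > 0, case 1 of Proposition 1).
ELL, M = 1.0, 1.0
B1, B1P, B2, B2P = 1.0, -2.0, -1.0, 1.0  # beta_1, beta_1', beta_2, beta_2'
RHO = B1P * B2 - B1 * B2P                # determinant (6)


def big_b(s):
    """Return B_1(s), B_2(s) with B_j = beta_j + (s^2 + m^2) beta_j'."""
    w2 = s * s + M * M
    return B1 + w2 * B1P, B2 + w2 * B2P


def secular(s):
    """Boundary condition of (3) at z = ell for phi(z) = sin(s z), V = 0."""
    b1, b2 = big_b(s)
    return b1 * math.sin(s * ELL) - b2 * s * math.cos(s * ELL)


def real_roots(s_max, steps=20000):
    """Positive real roots of the secular equation below s_max, by bisection."""
    roots, h = [], s_max / steps
    a, fa = h, secular(h)                # start at h to skip the trivial root s = 0
    for k in range(2, steps + 1):
        b = k * h
        fb = secular(b)
        if fa * fb < 0:
            lo, hi, flo = a, b, fa
            for _ in range(80):
                mid = 0.5 * (lo + hi)
                fm = secular(mid)
                if flo * fm <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            roots.append(0.5 * (lo + hi))
        a, fa = b, fb
    return roots


def norm_const(s):
    """Normalization constant N_n^D of (34)."""
    b1, b2 = big_b(s)
    denom = s * s * b2 * b2 + b1 * b1
    return (ELL / 2 + (2 * RHO * s * s - b1 * b2) / (2 * denom)) ** -0.5


def boundary(s):
    """Boundary component of the eigenfunction (33)."""
    return norm_const(s) * (-B1P * math.sin(s * ELL) + B2P * s * math.cos(s * ELL))


def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def inner(sn, sm):
    """Inner product (35) of the eigenfunctions labelled by sn and sm."""
    nn, nm = norm_const(sn), norm_const(sm)
    bulk = simpson(lambda z: nn * math.sin(sn * z) * nm * math.sin(sm * z), 0.0, ELL)
    return bulk + boundary(sn) * boundary(sm) / RHO
```

With these values, `real_roots(30.0)` returns roots that approach $(n - 1/2)\pi/\ell$ from above as $n$ grows, and `inner` evaluated on the computed roots reproduces the Kronecker delta of (35) up to quadrature error.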
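4.1.1 The Renormalized Local State Polarization
We begin by computing the renormalized local state polarization. Following our discussion in Sect. 3.1, for the bulk field and the boundary observable, we are interested in computing, respectively,
\[
\langle\Omega^{(D)}|(\hat{\Phi}^{B}_{\mathrm{ren}})^2(t,z)\Omega^{(D)}\rangle := \lim_{(t',z') \to (t,z)} \left[ \langle\Omega^{(D)}|\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\Omega^{(D)}\rangle - H_M((t,z),(t',z')) \right],
\tag{37a}
\]
\[
\langle\Omega^{(D)}|(\hat{\Phi}^{\partial}_{\mathrm{ren}})^2(t)\Omega^{(D)}\rangle := \lim_{t' \to t} \langle\Omega^{(D)}|\hat{\Phi}^{\partial}(t)\hat{\Phi}^{\partial}(t')\Omega^{(D)}\rangle,
\tag{37b}
\]
where $H_M$ is defined in (30). We begin by computing the bulk renormalized local state polarization. We seek to control the ultraviolet behavior of
\[
\begin{aligned}
\langle\Omega^{(D)}|\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\Omega^{(D)}\rangle &= \sum_{n=1}^{\infty} \frac{(N_n^D)^2}{2\omega_n^D}\, e^{-i\omega_n^D(t-t')} \sin\left(s_n^D z\right) \sin\left(s_n^D z'\right)\\
&= \sum_{n=1}^{\infty} \frac{(N_n^D)^2}{8\omega_n^D}\, e^{-i\omega_n^D(t-t')} \left[ e^{is_n^D(z-z')} + e^{-is_n^D(z-z')} - e^{is_n^D(z+z')} - e^{-is_n^D(z+z')} \right].
\end{aligned}
\tag{38}
\]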
It is clear from the asymptotic estimates of $s_n^D$ (see Appendix 9) that in the limit $(t,z) \to (t',z')$ the sum on the right-hand side of (38) fails to converge for the first two terms inside the square bracket. We isolate the divergent behavior in this limit by noting that the large $n$ behavior of these terms is (using (94))
\[
\frac{(N_n^D)^2}{8\omega_n^D}\, e^{-i\omega_n^D(t-t')}\, e^{\pm is_n^D(z \pm z')} = e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)\left[-(t-t') \pm (z \pm z')\right]} \left( \frac{1}{4\pi n} + O\left(n^{-2}\right) \right).
\tag{39}
\]
Thus, adding and subtracting terms of the form $e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)\left[-(t-t') \pm (z \pm z')\right]}/(4\pi n)$ to the summand, we can write
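\[
\begin{aligned}
&\langle\Omega^{(D)}|\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\Omega^{(D)}\rangle\\
&= \sum_{n=1}^{\infty} \frac{1}{4\pi n} \Big( e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')+(z-z')]} + e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')-(z-z')]}\\
&\qquad\qquad - e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')+(z+z')]} - e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')-(z+z')]} \Big)\\
&\quad + \sum_{n=1}^{\infty} \Big[ \frac{(N_n^D)^2}{8\omega_n^D}\, e^{-i\omega_n^D(t-t')} \Big( e^{is_n^D(z-z')} + e^{-is_n^D(z-z')} - e^{is_n^D(z+z')} - e^{-is_n^D(z+z')} \Big)\\
&\qquad\qquad - \frac{1}{4\pi n} \Big( e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')+(z-z')]} + e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')-(z-z')]}\\
&\qquad\qquad\qquad - e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')+(z+z')]} - e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')-(z+z')]} \Big) \Big].
\end{aligned}
\tag{40}
\]
The first sum can be performed immediately using formula (81a) of Appendix 8, and we obtain that
\[
\begin{aligned}
&\langle\Omega^{(D)}|\hat{\Phi}^{B}(t,z)\hat{\Phi}^{B}(t',z')\Omega^{(D)}\rangle\\
&= -\frac{1}{4\pi} \Big[ e^{-\frac{i\pi}{2\ell}[-(t-t')+(z-z')]} \ln\Big(1 - e^{\frac{i\pi}{\ell}[-(t-t')+(z-z')]}\Big) + e^{-\frac{i\pi}{2\ell}[-(t-t')-(z-z')]} \ln\Big(1 - e^{\frac{i\pi}{\ell}[-(t-t')-(z-z')]}\Big)\\
&\qquad\quad - e^{-\frac{i\pi}{2\ell}[-(t-t')+(z+z')]} \ln\Big(1 - e^{\frac{i\pi}{\ell}[-(t-t')+(z+z')]}\Big) - e^{-\frac{i\pi}{2\ell}[-(t-t')-(z+z')]} \ln\Big(1 - e^{\frac{i\pi}{\ell}[-(t-t')-(z+z')]}\Big) \Big]\\
&\quad + \sum_{n=1}^{\infty} \Big[ \frac{(N_n^D)^2}{8\omega_n^D}\, e^{-i\omega_n^D(t-t')} \Big( e^{is_n^D(z-z')} + e^{-is_n^D(z-z')} - e^{is_n^D(z+z')} - e^{-is_n^D(z+z')} \Big)\\
&\qquad\qquad - \frac{1}{4\pi n} \Big( e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')+(z-z')]} + e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')-(z-z')]}\\
&\qquad\qquad\qquad - e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')+(z+z')]} - e^{\frac{i\pi}{\ell}\left(n - \frac{1}{2}\right)[-(t-t')-(z+z')]} \Big) \Big],
\end{aligned}
\tag{41}
\]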
where the sum on the right-hand side is absolutely convergent as $(t',z')\to(t,z)$, since by (94) the summand is $O(n^{-2})$ uniformly in $z,z'\in(0,\ell)$ and $t,t'$ in a compact set, say $[-T,T]$. The coincidence limit of the first two terms appearing in closed form on the right-hand side of (41) is singular and of Hadamard form. Indeed,
\[
\begin{aligned}
\lim_{(t',z')\to(t,z)}\bigg\{&-\frac{1}{4\pi}\Big[e^{-\frac{i\pi}{2\ell}[-(t-t')+(z-z')]}\ln\!\Big(1-e^{\frac{i\pi}{\ell}[-(t-t')+(z-z')]}\Big)\\
&\quad+e^{-\frac{i\pi}{2\ell}[-(t-t')-(z-z')]}\ln\!\Big(1-e^{\frac{i\pi}{\ell}[-(t-t')-(z-z')]}\Big)\Big]-H_M((t,z),(t',z'))\bigg\}
=\frac{1}{4\pi}\ln\frac{m^2\ell^2}{4\pi^2}+\frac{\gamma}{2\pi}.
\end{aligned}\tag{42}
\]
For the sums appearing in (41), we can use the dominated convergence theorem to take the limit inside the sums due to the uniform $O(n^{-2})$ behavior of the summand. Thus we finally obtain that
\[
\begin{aligned}
\langle\Omega^{(D)}|(\hat\Phi^{B}_{\rm ren})^2(t,z)\,\Omega^{(D)}\rangle
:={}&\lim_{(t',z')\to(t,z)}\Big[\langle\Omega^{(D)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(D)}\rangle-H_M((t,z),(t',z'))\Big]\\
={}&\frac{1}{4\pi}\ln\frac{m^2\ell^2}{4\pi^2}+\frac{\gamma}{2\pi}+\frac{1}{2\pi}\,e^{-\frac{i\pi}{\ell}z}\ln\!\Big(1-e^{\frac{i2\pi}{\ell}z}\Big)\\
&+\sum_{n=1}^{\infty}\bigg[\frac{(N_n^D)^2}{2\omega_n^D}\sin^2(s_n^D z)-\frac{1}{\pi n}\sin^2\!\Big(\frac{\pi}{\ell}(n-1/2)z\Big)\bigg].
\end{aligned}\tag{43}
\]
The sum appearing in (43) is absolutely convergent, has $O(n^{-2})$ summand uniformly in $z$ and is uniformly bounded in $z$. We observe a logarithmic divergence as $z\to0$ and as $z\to\ell$. These logarithmic divergences are integrable and occur also, e.g., for the computation of the renormalized local state polarization and the local Casimir energy in the case of Dirichlet boundary conditions at the ends of the interval. See, e.g., [19, Chap. 5]. We stress that the bulk renormalized local state polarization is time-independent.

We proceed to compute the boundary renormalized local state polarization. We have that
\[
\langle\Omega^{(D)}|\hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\,\Omega^{(D)}\rangle
=\sum_{n=1}^{\infty}\frac{(N_n^D)^2}{2\omega_n^D}\,e^{-i\omega_n^D(t-t')}\Big(-\beta_1'\sin(s_n^D\ell)+\beta_2' s_n^D\cos(s_n^D\ell)\Big)^2. \tag{44}
\]
We note that by (94)
\[
\begin{aligned}
\bigg|\frac{(N_n^D)^2}{2\omega_n^D}\,e^{-i\omega_n^D(t-t')}\Big(-\beta_1'\sin(s_n^D\ell)+\beta_2' s_n^D\cos(s_n^D\ell)\Big)^2\bigg|
&=\frac{(N_n^D)^2}{2\omega_n^D}\Big(-\beta_1'\sin(s_n^D\ell)+\beta_2' s_n^D\cos(s_n^D\ell)\Big)^2\\
&=\frac{\ell^4(\beta_1'\beta_2-\beta_1\beta_2')^2}{\pi^5(\beta_2')^2 n^5}+O\big(n^{-6}\big).
\end{aligned}\tag{45}
\]
From (45) it follows that the limit $t'\to t$ in the definition of the boundary renormalized local state polarization (37b) can be taken inside the sum by a dominated convergence argument, since the bounding function appearing on the right-hand side of (45) is independent of $t,t'$ and summable due to the polynomial falloff as $O(n^{-5})$ when $n\to\infty$. Hence, we have that
\[
\langle\Omega^{(D)}|(\hat\Phi^{\partial}_{\rm ren})^2\,\Omega^{(D)}\rangle
=\sum_{n=1}^{\infty}\frac{(N_n^D)^2}{2\omega_n^D}\Big(-\beta_1'\sin(s_n^D\ell)+\beta_2' s_n^D\cos(s_n^D\ell)\Big)^2. \tag{46}
\]
The sum on the right-hand side of (46) is absolutely convergent, the summand is $O(n^{-5})$, and we stress that the renormalized local state polarization in the boundary is time-independent.
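The integrability of the logarithmic divergences noted for (43) can be illustrated numerically. The following is a minimal sketch, not part of the original computation: it takes the real part of the closed-form logarithmic term of (43), sets $\ell=1$ as an illustrative assumption, and integrates it with a plain midpoint rule. The function names `log_term` and `midpoint_integral` are hypothetical.

```python
import cmath
import math


def log_term(z, ell=1.0):
    """Real part of (1/2pi) e^{-i pi z/ell} ln(1 - e^{i 2 pi z/ell}).

    cmath.log is the principal branch of the logarithm.
    """
    phase = cmath.exp(-1j * math.pi * z / ell)
    value = phase * cmath.log(1 - cmath.exp(2j * math.pi * z / ell))
    return value.real / (2 * math.pi)


def midpoint_integral(f, a, b, n):
    """Midpoint rule; it never evaluates f at the singular endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# The term diverges to -infinity at z -> 0 and +infinity at z -> ell,
# but only logarithmically, so the integral over (0, ell/2) stabilizes.
coarse = midpoint_integral(log_term, 0.0, 0.5, 2000)
fine = midpoint_integral(log_term, 0.0, 0.5, 20000)
print(coarse, fine)
```

Refining the grid changes the half-interval integral only slightly, which is the behavior expected of an integrable singularity; this particular term is also odd about $z=\ell/2$, so its integral over the whole interval cancels.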
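4.1.2 The Local Casimir Energy

Following our discussion in Sect. 3.1, for the bulk field and the boundary observable, we are interested in computing
\[
\begin{aligned}
\langle\Omega^{(D)}|\hat H^{B}_{\rm ren}(t,z)\,\Omega^{(D)}\rangle
=\lim_{(t',z')\to(t,z)}\frac12\Big[&\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)\langle\Omega^{(D)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(D)}\rangle\\
&-\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)H_M((t,z),(t',z'))\Big],
\end{aligned}\tag{47a}
\]
\[
\langle\Omega^{(D)}|\hat H^{\partial}_{\rm ren}(t)\,\Omega^{(D)}\rangle
=\lim_{t'\to t}\frac12\Big[|\beta_1'|\,\partial_t\partial_{t'}-(\operatorname{sign}\beta_1')\,\beta_1\Big]\langle\Omega^{(D)}|\hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\,\Omega^{(D)}\rangle, \tag{47b}
\]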
where the Hadamard bi-distribution $H_M$ is chosen as in (30). We begin by computing the bulk local Casimir energy. We are interested in controlling the ultraviolet behavior of the sum defining the first term on the right-hand side of (47a),
\[
\begin{aligned}
&\frac12\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)\langle\Omega^{(D)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(D)}\rangle\\
&\quad=\sum_{n=1}^{\infty}\frac{(N_n^D)^2}{4\omega_n^D}\,e^{-i\omega_n^D(t-t')}\Big[\big((\omega_n^D)^2+m^2\big)\sin(s_n^D z)\sin(s_n^D z')+(s_n^D)^2\cos(s_n^D z)\cos(s_n^D z')\Big],
\end{aligned}\tag{48}
\]
which, using the dispersion relation $(\omega_n^D)^2=(s_n^D)^2+m^2$, can be written as
\[
\frac12\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)\langle\Omega^{(D)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(D)}\rangle
=\sum_{n=1}^{\infty}\Big[{}^{D}\!B_n^{H(1)}(t,t',z,z')+{}^{D}\!B_n^{H(2)}(t,t',z,z')\Big], \tag{49a}
\]
\[
{}^{D}\!B_n^{H(1)}(t,t',z,z'):=\frac{(N_n^D)^2}{8}\,\omega_n^D\,e^{-i\omega_n^D(t-t')}\Big[e^{is_n^D(z-z')}+e^{-is_n^D(z-z')}\Big], \tag{49b}
\]
\[
{}^{D}\!B_n^{H(2)}(t,t',z,z'):=-\frac{(N_n^D)^2m^2}{8\omega_n^D}\,e^{-i\omega_n^D(t-t')}\Big[e^{is_n^D(z+z')}+e^{-is_n^D(z+z')}\Big]. \tag{49c}
\]
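The step from (48) to (49) rests on the trigonometric identity $(\omega^2+m^2)\sin a\sin b+s^2\cos a\cos b=\omega^2\cos(a-b)-m^2\cos(a+b)$ with $\omega^2=s^2+m^2$, $a=sz$, $b=sz'$. The following minimal sketch checks it at random points; the function names are hypothetical, and the common prefactor $(N_n^D)^2e^{-i\omega_n^D(t-t')}/(4\omega_n^D)$ is dropped since it appears on both sides.

```python
import math
import random


def bracket_48(s, m, z, zp):
    """Square bracket of (48), with omega^2 = s^2 + m^2."""
    omega2 = s * s + m * m
    return ((omega2 + m * m) * math.sin(s * z) * math.sin(s * zp)
            + s * s * math.cos(s * z) * math.cos(s * zp))


def bracket_49(s, m, z, zp):
    """Same quantity written as in (49b)-(49c): a (z - z') part and a (z + z') part."""
    omega2 = s * s + m * m
    return omega2 * math.cos(s * (z - zp)) - m * m * math.cos(s * (z + zp))


random.seed(0)
samples = [(random.uniform(0.1, 20.0), random.uniform(0.0, 3.0),
            random.uniform(0.0, 1.0), random.uniform(0.0, 1.0))
           for _ in range(100)]
max_gap = max(abs(bracket_48(*p) - bracket_49(*p)) for p in samples)
print(max_gap)
```

The $(z-z')$ part carries the ultraviolet divergence in the coincidence limit, while the $(z+z')$ part behaves like the reflected terms of the state polarization.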
We begin by studying the singular behavior as $(t',z')\to(t,z)$ of the term $\sum_{n=1}^{\infty}{}^{D}\!B_n^{H(1)}(t,t',z,z')$. It follows from the estimates (94) that the right-hand side of (49b) fails to converge in the coincidence limit. Thus, as we have done for the renormalized local state polarization, we seek to isolate the singularity structure in this limit. Let
\[
\begin{aligned}
h_\pm(t-t',z-z'):={}&e^{\frac{i\pi}{\ell}\left(n-\frac12\right)[-(t-t')\pm(z-z')]}\\
&\times\bigg\{1-i\Big(\omega_n^D-\frac{\pi}{\ell}\Big(n-\frac12\Big)\Big)(t-t')\pm i\Big(s_n^D-\frac{\pi}{\ell}\Big(n-\frac12\Big)\Big)(z-z')\\
&\qquad+\frac12\Big[-i\Big(\omega_n^D-\frac{\pi}{\ell}\Big(n-\frac12\Big)\Big)(t-t')\pm i\Big(s_n^D-\frac{\pi}{\ell}\Big(n-\frac12\Big)\Big)(z-z')\Big]^2\bigg\}.
\end{aligned}\tag{50}
\]
By estimate (83),
\[
\omega_n^D\Big|e^{-i\omega_n^D(t-t')}e^{\pm is_n^D(z-z')}-h_\pm(t-t',z-z')\Big|
\le\frac{\omega_n^D}{6}\Big|-i\Big(\omega_n^D-\frac{\pi}{\ell}\Big(n-\frac12\Big)\Big)(t-t')\pm i\Big(s_n^D-\frac{\pi}{\ell}\Big(n-\frac12\Big)\Big)(z-z')\Big|^3. \tag{51}
\]
It follows from the asymptotic expansion (94) that the right-hand side of (51) can be further bounded, uniformly for $t,t'\in[-T,T]$ and $z,z'\in(0,\ell)$, by a summable function that behaves as $O(n^{-2})$ as $n\to\infty$, in an analogous way to the bulk renormalized local state polarization case above, whereby a dominated convergence argument allows us to write
\[
\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}{}^{D}\!B_n^{H(1)}(t,t',z,z')
=\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}\frac{(N_n^D)^2}{8}\,\omega_n^D\Big[h_+(t-t',z-z')+h_-(t-t',z-z')\Big]. \tag{52}
\]
Performing an asymptotic expansion for the summand using (94), we further write
\[
\begin{aligned}
\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}{}^{D}\!B_n^{H(1)}(t,t',z,z')
={}&\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}\Big[P_n^{+}\big((t-t'),(z-z')\big)+P_n^{-}\big((t-t'),(z-z')\big)\Big]\\
&+\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}\bigg\{\frac{(N_n^D)^2}{8}\,\omega_n^D\Big[h_+(t-t',z-z')+h_-(t-t',z-z')\Big]\\
&\qquad-P_n^{+}\big((t-t'),(z-z')\big)-P_n^{-}\big((t-t'),(z-z')\big)\bigg\}
\end{aligned}\tag{53a}
\]
with
\[
\begin{aligned}
P_n^{\pm}\big((t-t'),(z-z')\big):={}&e^{\frac{i\pi}{\ell}\left(n-\frac12\right)[-(t-t')\pm(z-z')]}\bigg\{\frac{\pi n}{4\ell^2}-\frac{2i\beta_1'[-(t-t')\pm(z-z')]+\beta_2'\big(\pi+im^2\ell(t-t')\big)}{8\beta_2'\ell^2}\\
&+\frac{\ell^2(\beta_2')^2m^2\big(4-m^2(t-t')^2\big)-4\beta_1'\beta_2'm^2\ell(t-t')[-(t-t')\pm(z-z')]-4(\beta_1')^2[-(t-t')\pm(z-z')]^2}{32\pi(\beta_2')^2\ell^2n}\bigg\}.
\end{aligned}\tag{53b}
\]
The second sum on the right-hand side of (53a) converges for all values of $t,t'$ and $z,z'$ (one can verify from the asymptotic estimate (94) that the summand behaves as $O(n^{-2})$ as $n\to\infty$, for all values of $t,t'$ in a compact set and $z,z'\in(0,\ell)$), while the first sum can be performed analytically using formulae (81a), (81b) and (81c), and the sum contains a distributional singularity as $(t',z')\to(t,z)$ that compensates the Hadamard singular structure. We have that
\[
\begin{aligned}
&\lim_{(t',z')\to(t,z)}\bigg[\sum_{n=1}^{\infty}{}^{D}\!B_n^{H(1)}(t,t',z,z')-\frac12\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)H_M((t,z),(t',z'))\bigg]\\
&\quad=\frac{\pi^2\beta_2'+6(2\gamma-1)\beta_2'\ell^2m^2+6\beta_2'\ell^2m^2\ln\dfrac{\ell^2m^2}{4\pi^2}+24\beta_1'\ell}{48\pi\beta_2'\ell^2}
+\sum_{n=1}^{\infty}\bigg[\frac{(N_n^D)^2\omega_n^D}{4}-\frac{\pi n}{2\ell^2}+\frac{\pi}{4\ell^2}-\frac{m^2}{4\pi n}\bigg],
\end{aligned}\tag{54}
\]
where the summand in the second term on the right-hand side of (54) is $O(n^{-2})$. The contribution to the local Casimir energy coming from the term (49c) can be handled as in the case of the renormalized local state polarization. See the discussion beginning at (39). We have that
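\[
\begin{aligned}
\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}{}^{D}\!B_n^{H(2)}(t,t',z,z')
={}&-\lim_{(t',z')\to(t,z)}\sum_{n=1}^{\infty}\frac{(N_n^D)^2m^2}{8\omega_n^D}\,e^{-i\omega_n^D(t-t')}\Big[e^{is_n^D(z+z')}+e^{-is_n^D(z+z')}\Big]\\
={}&\frac{m^2}{2\pi}\,e^{-i\frac{\pi}{\ell}z}\ln\!\Big(1-e^{i\frac{2\pi}{\ell}z}\Big)\\
&+m^2\sum_{n=1}^{\infty}\bigg[\frac{1}{4\pi n}-\frac{(N_n^D)^2}{8\omega_n^D}+\frac{(N_n^D)^2}{4\omega_n^D}\sin^2(s_n^D z)-\frac{1}{2\pi n}\sin^2\!\Big(\frac{\pi}{\ell}(n-1/2)z\Big)\bigg],
\end{aligned}\tag{55}
\]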
where the summand of the second term on the right-hand side of (55) is $O(n^{-2})$ for all values of $z\in(0,\ell)$ and the sum converges absolutely and uniformly in $z$. Adding up (54) and (55), we find that the bulk local Casimir energy is
\[
\begin{aligned}
\langle\Omega^{(D)}|\hat H^{B}_{\rm ren}(t,z)\,\Omega^{(D)}\rangle
={}&\frac{\pi^2\beta_2'+6(2\gamma-1)\beta_2'\ell^2m^2+6\beta_2'\ell^2m^2\ln\dfrac{\ell^2m^2}{4\pi^2}+24\beta_1'\ell}{48\pi\beta_2'\ell^2}
+\frac{m^2}{2\pi}\,e^{-i\frac{\pi}{\ell}z}\ln\!\Big(1-e^{i\frac{2\pi}{\ell}z}\Big)\\
&+\sum_{n=1}^{\infty}\bigg[\frac{(N_n^D)^2\omega_n^D}{4}-\frac{\pi n}{2\ell^2}+\frac{\pi}{4\ell^2}-\frac{m^2}{4\pi n}
+\frac{m^2}{4\pi n}-m^2\frac{(N_n^D)^2}{8\omega_n^D}\\
&\qquad+m^2\frac{(N_n^D)^2}{4\omega_n^D}\sin^2(s_n^D z)-\frac{m^2}{2\pi n}\sin^2\!\Big(\frac{\pi}{\ell}(n-1/2)z\Big)\bigg].
\end{aligned}\tag{56}
\]
The sum appearing on the right-hand side of (56) is absolutely and uniformly convergent in $z\in(0,\ell)$, and the summand is $O(n^{-2})$ for all $z\in(0,\ell)$. We stress that the bulk local Casimir energy is time-independent, hence conserved. We observe a logarithmic divergence as $z\to0$ and as $z\to\ell$, as is the case for the bulk renormalized local state polarization, see (43). Since the logarithmic divergence of (56) is integrable, the total Casimir energy is finite.

We now focus our attention on the boundary local Casimir energy. A look at (33), (36b), and (47b) indicates that we need to verify the convergence of the summand factor
\[
\frac{(N_n^D)^2\omega_n^D}{4}\Big(-\beta_1'\sin(s_n^D\ell)+\beta_2's_n^D\cos(s_n^D\ell)\Big)^2
=\frac{\ell^2(\beta_1'\beta_2-\beta_1\beta_2')^2}{2\pi^3(\beta_2')^2n^3}+O(n^{-4}), \tag{57}
\]
where the right-hand side of (57) is obtained using (94). It follows from the $O(n^{-3})$ behavior that the sum defining the boundary local Casimir energy converges absolutely for all values of $t'$ and $t$, and that the limit $t'\to t$ can be taken inside the sum by dominated convergence. We have that
\[
\langle\Omega^{(D)}|\hat H^{\partial}_{\rm ren}(t)\,\Omega^{(D)}\rangle
=\sum_{n=1}^{\infty}\frac{(N_n^D)^2\big[|\beta_1'|(\omega_n^D)^2-(\operatorname{sign}\beta_1')\beta_1\big]}{4\omega_n^D}
\Big(-\beta_1'\sin(s_n^D\ell)+\beta_2's_n^D\cos(s_n^D\ell)\Big)^2. \tag{58}
\]
The summand on the right-hand side of (58) is $O(n^{-3})$ and the sum is absolutely convergent. The boundary Casimir energy is time-independent.
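4.2 Robin Boundary Condition at z = 0

In the Robin case, the two-point Wightman function is given by
\[
\langle\Omega^{(R)}|\hat\Phi(t,z)\hat\Phi(t',z')\,\Omega^{(R)}\rangle
=\sum_{n=1}^{\infty}\frac{1}{2\omega_n^R}\,e^{-i\omega_n^R(t-t')}\,\Psi_n^{(R)}(z)\otimes\Psi_n^{(R)}(z'), \tag{59}
\]
with $(\omega_n^R)^2=(s_n^R)^2+m^2$, and with normalized eigenfunctions
\[
\Psi_n^{(R)}(z)=\begin{pmatrix}\psi_n^{(R)}(z)\\[1mm]\psi_n^{\partial(R)}\end{pmatrix}
=N_n^R\begin{pmatrix}-\dfrac{\cos\alpha\,\sin(s_n^Rz)}{s_n^R}+\sin\alpha\cos(s_n^Rz)\\[3mm]
\beta_1'\Big(\sin\alpha\cos(s_n^R\ell)-\dfrac{\cos\alpha\,\sin(s_n^R\ell)}{s_n^R}\Big)+\beta_2'\big(\cos\alpha\cos(s_n^R\ell)+s_n^R\sin\alpha\sin(s_n^R\ell)\big)\end{pmatrix}, \tag{60}
\]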
sin(2α) R 2 NR − 4(sn ) (β1 + β1 m2 )2 + 9(β1 )2 (snR )4 + 2(β1 + β1 m2 )β1 (snR )2 n := − 4(snR )2 2 −1 +(snR )2 (β2 + β2 m2 ) + β2 (snR )2 (snR )2 − 1 cos(2α) − (snR )2 − 1 2 × (β1 + β1 m2 )2 + β12 (snR )4 + 2(β1 + β1 m2 )β1 (snR )2 + (snR )2 (β2 + β2 m2 ) + β2 (snR )2 −1/2 −(β1 + β1 m2 ) (β2 + β2 m2 ) + 3β2 (snR )2 + β1 (snR )2 (β2 + β2 m2 ) − β2 (snR )2 ,
(61)
which are orthonormal with respect to the inner product on the classical Hilbert space $H$, i.e.,
\[
\big(\Psi_n^{(R)},\Psi_m^{(R)}\big)_H=\int_0^{\ell}dz\,\psi_n^{(R)}(z)\psi_m^{(R)}(z)+\frac{\psi_n^{\partial(R)}\psi_m^{\partial(R)}}{\rho}=\delta_{nm}, \tag{62}
\]
where $\rho$ is defined in (6). Note that the eigenfunctions $\Psi_n^{(R)}$ are real valued. The eigenvalues $(\omega_n^R)^2=(s_n^R)^2+m^2$ obey the asymptotic behavior presented in (100). See Appendix 9 for details. As we have explained in Sect. 3, the bulk renormalized local state polarization and local Casimir energy are obtained by using the bulk-bulk component of the two-point Wightman function, and for the boundary counterparts we use the boundary-boundary component. They are, respectively,
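\[
\langle\Omega^{(R)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(R)}\rangle
=\sum_{n=1}^{\infty}\frac{1}{2\omega_n^R}\,e^{-i\omega_n^R(t-t')}\,\psi_n^{(R)}(z)\psi_n^{(R)}(z'), \tag{63a}
\]
\[
\langle\Omega^{(R)}|\hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\,\Omega^{(R)}\rangle
=\sum_{n=1}^{\infty}\frac{1}{2\omega_n^R}\,e^{-i\omega_n^R(t-t')}\,\psi_n^{\partial(R)}\psi_n^{\partial(R)}. \tag{63b}
\]

4.2.1 The Renormalized Local State Polarization

For the bulk renormalized local state polarization, we are interested in computing
\[
\langle\Omega^{(R)}|(\hat\Phi^{B}_{\rm ren})^2(t,z)\,\Omega^{(R)}\rangle
=\lim_{(t',z')\to(t,z)}\Big[\langle\Omega^{(R)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(R)}\rangle-H_M((t,z),(t',z'))\Big]. \tag{64}
\]
The strategy to obtain the limit on the right-hand side of (64) is similar to the one in the Dirichlet case. We show the details of the calculation in Appendix 10.1, and state directly the result,
\[
\begin{aligned}
\langle\Omega^{(R)}|(\hat\Phi^{B}_{\rm ren})^2(t,z)\,\Omega^{(R)}\rangle
={}&\frac{1}{4\pi}\ln\frac{m^2\ell^2}{4\pi^2}+\frac{\gamma}{2\pi}-\frac{1}{2\pi}\,e^{-\frac{i2\pi}{\ell}z}\ln\!\Big(1-e^{\frac{i2\pi}{\ell}z}\Big)\\
&+\sum_{n=1}^{\infty}\bigg\{\frac{(N_n^R)^2}{2\omega_n^R}\Big[\sin^2\alpha\cos^2(s_n^Rz)+\frac{\cos^2\alpha}{(s_n^R)^2}\sin^2(s_n^Rz)\Big]-\frac{1}{\pi n}\cos^2\!\Big(\frac{\pi}{\ell}(n-1)z\Big)\\
&\qquad-\frac{(N_n^R)^2}{2\omega_n^Rs_n^R}\sin\alpha\cos\alpha\sin(2s_n^Rz)\bigg\}.
\end{aligned}\tag{65}
\]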
As is the case for a Dirichlet boundary condition at $z=0$, the renormalized local state polarization contains a logarithmic divergence at $z=0$ and $z=\ell$. We stress again that this divergence is integrable, and hence contributes as a finite term to the integrated total bulk state polarization. We also stress that the summand appearing on the right-hand side of (65) is $O(n^{-2})$ for all $z\in(0,\ell)$, hence summable. The sum converges absolutely and uniformly in $z$. We also point out that the bulk renormalized local state polarization is time-independent.

For the boundary renormalized local state polarization, we wish to compute
\[
\langle\Omega^{(R)}|(\hat\Phi^{\partial}_{\rm ren})^2(t)\,\Omega^{(R)}\rangle
=\lim_{t'\to t}\langle\Omega^{(R)}|\hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\,\Omega^{(R)}\rangle. \tag{66}
\]
The details of the calculations are presented in Appendix 10.2 and we simply state the result,
\[
\langle\Omega^{(R)}|(\hat\Phi^{\partial}_{\rm ren})^2(t)\,\Omega^{(R)}\rangle
=\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{2\omega_n^R}\bigg[\beta_1'\Big(\sin\alpha\cos(s_n^R\ell)-\frac{\cos\alpha\,\sin(s_n^R\ell)}{s_n^R}\Big)
+\beta_2'\big(\cos\alpha\cos(s_n^R\ell)+s_n^R\sin\alpha\sin(s_n^R\ell)\big)\bigg]^2. \tag{67}
\]
We note that the boundary renormalized local state polarization is time-independent. The summand on the right-hand side of (67) is $O(n^{-5})$ and the sum converges absolutely.
4.2.2 The Local Casimir Energy
The bulk local Casimir energy for a Robin boundary condition at .z = 0 is given by β1 cot α π m2 (R) ˆ B (R) . | H (t, z) = − − + − 2π 8π 2πβ2 242
m2 2 1 + ln 4π 2
+
γ m2 4π
∞ 2 2π m2 −i 2π z (NR n) e ln 1 − ei z − m2 sin α cos α sin 2snR z R R 2π 2ωn sn n=1
∞ 2 (NR cos2 α π(n − 1) n) R 2 R 2 2 (ωn ) + (sn ) + − sin α + R 22 8ωnR (sn )2 n=1 ∞ 2 2(n − 1)π 1 m2 (NR n) 2 R −2 2 R cos z sin α − (sn ) cos α cos 2sn z − + 2 2π n 4ωnR
−
n=1
∞ π cos2 α 2 m2 (NR 1 n) 2 2 R 2 R 2 cos (n − 1)z . sin α cos sn z + R sin sn z − + πn 2 2ωnR (sn )2 n=1
(68)
The computations leading to (68) are presented in Appendix 10.3. As in the case of the renormalized local state polarization, appearing in (65), an integrable, logarithmic divergence appears at $z=0$ and $z=\ell$, and the result is time-independent. The summand defined on the right-hand side of (68) is $O(n^{-2})$ for all $z\in(0,\ell)$ and the sum converges absolutely and uniformly in $z$.

In Appendix 10.4 we show that the boundary local Casimir energy for a Robin boundary condition at $z=0$ is given by
\[
\begin{aligned}
\langle\Omega^{(R)}|\hat H^{\partial}_{\rm ren}(t)\,\Omega^{(R)}\rangle
=\sum_{n=1}^{\infty}&\frac{(N_n^R)^2\big[|\beta_1'|(\omega_n^R)^2-(\operatorname{sign}\beta_1')\beta_1\big]}{4\omega_n^R}\\
&\times\bigg[\beta_1'\Big(\sin\alpha\cos(s_n^R\ell)-\frac{\cos\alpha\,\sin(s_n^R\ell)}{s_n^R}\Big)
+\beta_2'\big(\cos\alpha\cos(s_n^R\ell)+s_n^R\sin\alpha\sin(s_n^R\ell)\big)\bigg]^2.
\end{aligned}\tag{69}
\]
The summand on the right-hand side of (69) is $O(n^{-3})$ and the sum converges absolutely. We stress that the result for the boundary local Casimir energy is time-independent. This result concludes the study of the system at zero temperature.
5 The Casimir Effect at Positive Temperature

In this section we show how to obtain the Casimir effect at positive temperature. We set the potential $V(z)=0$, and we always suppose that the operator $A$ is positive, that is to say, that all its eigenvalues are positive. In particular, this is true if the assumptions of Proposition 1 hold. We are concerned with the case in which $\beta_2'\neq0$. We shall follow a strategy that will allow us to write down both the renormalized local state polarization and the local Casimir energy at positive temperature $T=1/\beta$ in terms of the quantities at zero temperature. We will deal with the Dirichlet and Robin cases simultaneously. We write (26) as
\[
\langle\hat\Phi(t,z)\hat\Phi(t',z')\rangle_\beta
=\sum_{n=1}^{\infty}\frac{1}{2\omega_n}\,\Psi_n(z)\otimes\Psi_n(z')
\bigg[e^{-i\omega_n(t-t')}+\frac{1}{e^{\beta\omega_n}-1}\Big(e^{-i\omega_n(t-t')}+e^{i\omega_n(t-t')}\Big)\bigg]. \tag{70}
\]
By (24) and (70) we have
\[
\langle\hat\Phi(t,z)\hat\Phi(t',z')\rangle_\beta
=\langle\Omega|\hat\Phi(t,z)\hat\Phi(t',z')\,\Omega\rangle
+\sum_{n=1}^{\infty}\frac{1}{2\omega_n}\,\Psi_n(z)\otimes\Psi_n(z')\,\frac{1}{e^{\beta\omega_n}-1}\Big(e^{-i\omega_n(t-t')}+e^{i\omega_n(t-t')}\Big). \tag{71}
\]
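We use the following result to write the renormalized local state polarization and local Casimir energy at positive temperature in terms of the quantities at zero temperature.

Lemma 1 Let $\Psi_n$, $s_n$, and $\omega_n$ be as in the cases of Dirichlet or Robin boundary conditions at $z=0$, that is, $\Psi_n^{(D)}$ ((33), (34)), $s_n^D$ ((94)) and $\omega_n^D=\big((s_n^D)^2+m^2\big)^{1/2}$, respectively $\Psi_n^{(R)}$ ((60), (61)), $s_n^R$ ((100)) and $\omega_n^R=\big((s_n^R)^2+m^2\big)^{1/2}$. The sums defined by the tensor-product components
\[
S^{B}:=\sum_{n=1}^{\infty}\frac{1}{2\omega_n}\,\psi_n(z)\psi_n(z')\,\frac{1}{e^{\beta\omega_n}-1}\Big(e^{-i\omega_n(t-t')}+e^{i\omega_n(t-t')}\Big), \tag{72a}
\]
\[
S^{\partial}:=\sum_{n=1}^{\infty}\frac{1}{2\omega_n}\,\psi_n^{\partial}\psi_n^{\partial}\,\frac{1}{e^{\beta\omega_n}-1}\Big(e^{-i\omega_n(t-t')}+e^{i\omega_n(t-t')}\Big), \tag{72b}
\]
converge absolutely for $\beta>0$, and uniformly for any values of $0\le z,z'\le\ell$ and $t,t'\in\mathbb{R}$.

Proof It follows from the eigenfunctions $\psi_n$ in the Dirichlet and Robin cases and from the large-$n$ estimates for $s_n$ (see (33), (34), (94), (45), and (60), (61), (100), and (116), respectively) that there exist polynomially bounded functions $P^{B},P^{\partial}:\mathbb{N}\to\mathbb{R}^{+}$ such that for $z,z'\in(0,\ell)$,
\[
\bigg|\frac{1}{2\omega_n}\,\psi_n(z)\psi_n(z')\bigg|\le P^{B}(n)=O(n^{-1}), \tag{73a}
\]
\[
\bigg|\frac{1}{2\omega_n}\,\big(\psi_n^{\partial}\big)^2\bigg|\le P^{\partial}(n)=O(n^{-5}), \tag{73b}
\]
where the right-hand side of (73a) is uniform for $z,z'\in(0,\ell)$. It then follows that
\[
\sum_{n=1}^{\infty}\bigg|\frac{1}{2\omega_n}\,\psi_n(z)\psi_n(z')\,\frac{1}{e^{\beta\omega_n}-1}\Big(e^{-i\omega_n(t-t')}+e^{i\omega_n(t-t')}\Big)\bigg|
\le\sum_{n=1}^{\infty}\frac{2P^{B}(n)}{e^{\beta\omega_n}-1}<\infty, \tag{74a}
\]
\[
\sum_{n=1}^{\infty}\bigg|\frac{1}{2\omega_n}\,\big(\psi_n^{\partial}\big)^2\,\frac{1}{e^{\beta\omega_n}-1}\Big(e^{-i\omega_n(t-t')}+e^{i\omega_n(t-t')}\Big)\bigg|
\le\sum_{n=1}^{\infty}\frac{2P^{\partial}(n)}{e^{\beta\omega_n}-1}<\infty. \tag{74b}
\]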
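We define the renormalized local state polarization at positive temperature as follows:
\[
\langle(\hat\Phi^{B})^2(t,z)\rangle_{\beta,{\rm ren}}:=\lim_{(t',z')\to(t,z)}\Big[\langle\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\rangle_\beta-H_M((t,z),(t',z'))\Big], \tag{75a}
\]
\[
\langle(\hat\Phi^{\partial})^2(t)\rangle_{\beta,{\rm ren}}:=\lim_{t'\to t}\langle\hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\rangle_\beta. \tag{75b}
\]
It follows from (71) and Lemma 1 that the bulk and boundary renormalized local state polarizations at positive temperature are given by
\[
\langle(\hat\Phi^{B})^2(t,z)\rangle_{\beta,{\rm ren}}
=\langle\Omega^{(D/R)}|(\hat\Phi^{B}_{\rm ren})^2(t,z)\,\Omega^{(D/R)}\rangle
+\sum_{n=1}^{\infty}\frac{\big(\psi_n^{(D/R)}(z)\big)^2}{\omega_n^{(D/R)}\big(e^{\beta\omega_n^{(D/R)}}-1\big)}, \tag{76a}
\]
\[
\langle(\hat\Phi^{\partial})^2(t)\rangle_{\beta,{\rm ren}}
=\langle\Omega^{(D/R)}|(\hat\Phi^{\partial}_{\rm ren})^2(t)\,\Omega^{(D/R)}\rangle
+\sum_{n=1}^{\infty}\frac{\big(\psi_n^{\partial(D/R)}\big)^2}{\omega_n^{(D/R)}\big(e^{\beta\omega_n^{(D/R)}}-1\big)}. \tag{76b}
\]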
The sum on the right-hand side of (76a) is absolutely convergent and converges exponentially fast for all $z\in(0,\ell)$. The sum on the right-hand side of (76b) is absolutely convergent and converges exponentially fast. We define the local Casimir energy at positive temperature in the bulk and in the boundary as follows:
\[
\begin{aligned}
\langle\hat H^{B}_{\rm ren}(t,z)\rangle_\beta:=\lim_{(t',z')\to(t,z)}\frac12\Big[&\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)\langle\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\rangle_\beta\\
&-\big(\partial_t\partial_{t'}+\partial_z\partial_{z'}+m^2\big)H_M((t,z),(t',z'))\Big],
\end{aligned}\tag{77a}
\]
\[
\langle\hat H^{\partial}_{\rm ren}(t)\rangle_\beta=\lim_{t'\to t}\frac12\Big[|\beta_1'|\,\partial_t\partial_{t'}-(\operatorname{sign}\beta_1')\beta_1\Big]\langle\hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\rangle_\beta. \tag{77b}
\]
Then, by (71), (77a), and (77b) we have
\[
\langle\hat H^{B}_{\rm ren}(t,z)\rangle_\beta
=\langle\Omega^{(D/R)}|\hat H^{B}_{\rm ren}(t,z)\,\Omega^{(D/R)}\rangle
+\sum_{n=1}^{\infty}\Bigg[\frac{\big((\omega_n^{(D/R)})^2+m^2\big)\big(\psi_n^{(D/R)}(z)\big)^2}{2\omega_n^{(D/R)}\big(e^{\beta\omega_n^{(D/R)}}-1\big)}
+\frac{\big(\partial_z\psi_n^{(D/R)}(z)\big)^2}{2\omega_n^{(D/R)}\big(e^{\beta\omega_n^{(D/R)}}-1\big)}\Bigg], \tag{78a}
\]
\[
\langle\hat H^{\partial}_{\rm ren}(t)\rangle_\beta
=\langle\Omega^{(D/R)}|\hat H^{\partial}_{\rm ren}(t)\,\Omega^{(D/R)}\rangle
+\sum_{n=1}^{\infty}\frac{\big[|\beta_1'|(\omega_n^{(D/R)})^2-(\operatorname{sign}\beta_1')\beta_1\big]\big(\psi_n^{\partial(D/R)}\big)^2}{2\omega_n^{(D/R)}\big(e^{\beta\omega_n^{(D/R)}}-1\big)}, \tag{78b}
\]
where the sums appearing on the right-hand side of (78a) and (78b) are absolutely convergent by an adaptation of Lemma 1 and converge exponentially fast. In particular, it can be verified that polynomially bounded functions such as the ones appearing in (73a) and (73b) can be found for the sums appearing on the right-hand side of (78a) and (78b). Furthermore, the sum on the right-hand side of (78a) converges uniformly in $z$. We note that, as expected, when $\beta\to\infty$, the results obtained in this section for the renormalized local state polarization and local Casimir energy reduce to the zero temperature ones.
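The exponential convergence of the thermal sums, and their disappearance as $\beta\to\infty$, can be illustrated with a toy majorant of the type appearing in (74a). The sketch below is only illustrative: it replaces the true eigenfrequencies by the assumed spectrum $\omega_n=\sqrt{(n\pi/\ell)^2+m^2}$ (plain Dirichlet frequencies, not those of the dynamical boundary problem) and takes the polynomial bound $P^B(n)$ to be the constant 1. The names `bose_weight` and `thermal_majorant` are hypothetical.

```python
import math


def bose_weight(beta, omega):
    """1/(e^{beta*omega} - 1), written via e^{-beta*omega} to avoid overflow."""
    x = beta * omega
    return math.exp(-x) / (-math.expm1(-x))


def thermal_majorant(beta, n_max, ell=1.0, m=1.0):
    """Partial sum of 2/(e^{beta*omega_n} - 1) for the assumed toy spectrum."""
    total = 0.0
    for n in range(1, n_max + 1):
        omega = math.hypot(n * math.pi / ell, m)  # sqrt((n*pi/ell)^2 + m^2)
        total += 2.0 * bose_weight(beta, omega)
    return total


# Fifty terms already capture the sum; lower temperature means a smaller correction.
for beta in (1.0, 2.0, 50.0):
    print(beta, thermal_majorant(beta, 50), thermal_majorant(beta, 500))
```

Because the Bose factor decays like $e^{-\beta\omega_n}$, any polynomially bounded prefactor leaves the convergence rate essentially unchanged.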
6 Numerical Examples

In this section we present some numerical examples for the local Casimir energy at zero temperature in the case in which we set a Dirichlet boundary condition at $z=0$. We fix the parameters of the problem to $\beta_1=-1$, $\beta_1'=1$, $\beta_2=1$, $m^2=1$ and $\ell=1$, and compare three representative cases, $\beta_2'=-0.5$, $\beta_2'=-0.05$ and $\beta_2'=0.5$, in Figs. 1, 2, and 3, respectively. In each case the total integrated Casimir energy is negative. While the case $\beta_2'=0.5$ does not satisfy the hypotheses of Proposition 1, our numerics verify that the operator $A$ is positive in this case too. In each case, as $z\to0$ we observe a negative logarithmic divergence and as $z\to\ell=1$ a positive logarithmic divergence for the local Casimir energy. The plots presented in this section are obtained by numerically approximating the local Casimir energy using the first 50 eigenvalues in each case. More precise numerical results can be obtained in cases of interest by computing a larger number of eigenvalues. We will denote by
\[
E_0(t):=\int_0^{\ell}dz\,\langle\Omega^{(D)}|\hat H^{B}_{\rm ren}(t,z)\,\Omega^{(D)}\rangle \tag{79}
\]
the integrated Casimir energy at zero temperature in the bulk, i.e., the bulk total energy of the ground state. Note that in our cases, due to the conservation of the local energy in time, $E_0$ is time-independent.

Fig. 1 Case $\beta_2'=-0.5$. The boundary Casimir energy is $\langle\Omega^{(D)}|\hat H^{\partial}_{\rm ren}\,\Omega^{(D)}\rangle\approx0.26$. The integrated bulk Casimir energy is $E_0\approx-77.80$

Fig. 2 Case $\beta_2'=-0.05$. The boundary Casimir energy is $\langle\Omega^{(D)}|\hat H^{\partial}_{\rm ren}\,\Omega^{(D)}\rangle\approx0.73$. The integrated bulk Casimir energy is $E_0\approx-77.84$

Fig. 3 Case $\beta_2'=0.5$. The boundary Casimir energy is $\langle\Omega^{(D)}|\hat H^{\partial}_{\rm ren}\,\Omega^{(D)}\rangle\approx0.43$. The integrated Casimir energy is $E_0\approx-0.08$
7 Final Remarks

Throughout this work we have obtained the renormalized local state polarization and local Casimir energy for a system consisting of a bulk scalar field defined on the interval $[0,\ell]$ coupled to a boundary observable defined on the right-hand end of the interval. The coupling between the bulk scalar field and the boundary observable is implemented through a dynamical boundary condition for the bulk field. We have given expressions for the renormalized bulk and boundary local state polarization and local Casimir energy at zero temperature in Sect. 4 and at positive temperature in Sect. 5. Our computations reveal that the renormalized local state polarization and the local Casimir energy are conserved in time, both for the bulk scalar field and for the boundary observable, and we also show that in the case of the bulk they display logarithmic (integrable) divergences near the boundaries of the interval. These divergences occur also for non-dynamical boundary conditions in the Dirichlet and Robin class (see, e.g., [19]), and we stress that they do not obstruct the total (integrated) Casimir energy from being finite, since they are integrable. Indeed, in Sect. 6 we have explored numerically a sample of cases and obtained a numerical approximation of the integrated energy.

To the best of our knowledge, the Casimir energy for fields with dynamical boundary conditions had only been studied in [17] at zero temperature, but in that case only the integrated Casimir energy was obtained, and no information on the local Casimir energy, which we study here, can be directly inferred from their computational method.

The work that we have carried out here can be generalized in several directions. First, by using Fourier transform methods, one can generalize the case of the interval to n dimensions. In this case, one has field theory in the bulk and in the boundary. Second, one can study other species of linear fields, such as the Maxwell, Dirac, or Proca fields. Third, we have only studied the static Casimir effect, but the dynamical Casimir effect [16] should also be of great interest. See, e.g., Refs. [11, 22, 25] for the dynamical Casimir effect in the context of quantum field theory and for analogies with black hole radiation and formation. Particularly interesting is the case in which the coefficients $\beta_1$, $\beta_1'$, $\beta_2$, and $\beta_2'$ are time-dependent [17, 46], in which case one should have the so-called creation of particles.
8 Appendix 1: Useful Formulae

Let $\theta,k\in\mathbb{R}$. We have the following formula
\[
\sum_{n=1}^{\infty}\frac{e^{in\theta}}{n^k}=\mathrm{Li}_k\big(e^{i\theta}\big), \tag{80}
\]
where $\mathrm{Li}_k$ is the polylogarithm or Jonquière function of order $k$, see, e.g., [34, Eq. 25.12.10]. In particular, throughout the paper we will make use of the following three formulae for $k=1$, $k=0$, and $k=-1$,
\[
\sum_{n=1}^{\infty}\frac{e^{in\theta}}{n}=-\ln\big(1-e^{i\theta}\big), \tag{81a}
\]
\[
\sum_{n=1}^{\infty}e^{in\theta}=\frac{e^{i\theta}}{1-e^{i\theta}}, \tag{81b}
\]
\[
\sum_{n=1}^{\infty}n\,e^{in\theta}=\frac{e^{i\theta}}{\big(1-e^{i\theta}\big)^2}. \tag{81c}
\]
Here, $\ln z$ is the principal branch of the logarithm with the argument of $z$ in $(-\pi,\pi)$. Equation (81a) can be easily obtained from the integral representation of $\mathrm{Li}_1$ [34, Eq. 25.12.11], whereby
\[
\sum_{n=1}^{\infty}\frac{e^{in\theta}}{n}=\mathrm{Li}_1\big(e^{i\theta}\big)=e^{i\theta}\int_0^{\infty}dx\,\frac{1}{e^{x}-e^{i\theta}}=-\ln\big(1-e^{i\theta}\big), \tag{82}
\]
while (81b) can be obtained from (81a) by taking derivatives in $\theta$ on the left-hand and right-hand sides. Similarly, (81c) is obtained by taking the derivative in $\theta$ on the left-hand and right-hand sides of (81b).

Let $x\in\mathbb{R}$. It holds that
\[
\bigg|e^{ix}-\sum_{k=0}^{n}\frac{(ix)^k}{k!}\bigg|\le\frac{|x|^{n+1}}{(n+1)!}, \tag{83}
\]
for any $n\in\mathbb{N}_0$. This follows from estimating the Lagrange remainder in the Taylor series for $e^{ix}$ to order $n$, cf. formulae 0.317.2 and 0.317.3 in [23].
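Formulae (81a) and (83) are easy to sanity-check numerically. The sketch below is illustrative only: it compares a long partial sum of (81a) at the sample angle 1.0 (an arbitrary choice away from multiples of $2\pi$, where the series converges conditionally) with the closed form, and evaluates the Taylor remainder bound (83) for a few sample values. The function names are hypothetical.

```python
import cmath
import math


def partial_sum_81a(theta, n_terms):
    """Partial sum of sum_{n>=1} e^{i n theta} / n."""
    return sum(cmath.exp(1j * n * theta) / n for n in range(1, n_terms + 1))


def closed_form_81a(theta):
    """Right-hand side of (81a); cmath.log is the principal branch."""
    return -cmath.log(1 - cmath.exp(1j * theta))


def taylor_remainder(x, order):
    """|e^{ix} - sum_{k<=order} (ix)^k / k!|, the left-hand side of (83)."""
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(order + 1))
    return abs(cmath.exp(1j * x) - partial)


gap = abs(partial_sum_81a(1.0, 200000) - closed_form_81a(1.0))
print(gap)
for x in (-3.0, -0.5, 0.1, 2.0):
    for order in range(6):
        bound = abs(x) ** (order + 1) / math.factorial(order + 1)
        print(x, order, taylor_remainder(x, order), bound)
```

The partial sums of (81a) approach the closed form only at rate $1/N$, which is why a large number of terms is used here. Formulae (81b) and (81c) hold in the distributional sense and are not tested by naive partial sums.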
9 Appendix 2: Asymptotic Estimates for the Eigenvalues

In this appendix, we obtain asymptotic formulae for the eigenvalues, $\omega_n^2$, $n=1,\ldots$, of our classical problem in Sect. 2, with $V(z)=0$ in the cases where there is a Dirichlet or Robin boundary condition at $z=0$. The results that we present here are an improvement on the asymptotics presented in [21, Sec. 4]. We assume throughout this appendix that $\rho>0$ and $\beta_2'\neq0$. The detailed assumptions of Proposition 1, which ensure that the eigenvalues for problem (3) are all positive, are not needed here. The case $\beta_2'=0$ can also be treated, but estimates like the ones that we shall present in this appendix should be obtained separately, and we will not be concerned with this case. Furthermore, we use the dispersion relation, $\omega_n^2=s_n^2+m^2$, $n=1,\ldots$, to write (3) as follows:
\[
\begin{cases}
-\partial_z^2\varphi=s_n^2\varphi,\\
\cos\alpha\,\varphi(0)+\sin\alpha\,\partial_z\varphi(0)=0,\qquad\alpha\in[0,\pi),\\
-\big[(\beta_1+\beta_1'm^2)\varphi(\ell)-(\beta_2+\beta_2'm^2)\partial_z\varphi(\ell)\big]=s_n^2\big[\beta_1'\varphi(\ell)-\beta_2'\partial_z\varphi(\ell)\big].
\end{cases}\tag{84}
\]
We are concerned with obtaining large eigenvalue estimates for problem (84). In order to obtain the eigenvalues, we seek to find the roots of the Wronskian, which, following standard techniques [42] in the problem at hand, can be written as (see [21, Eq. (3.4)])
\[
\omega(s^2)=(\beta_1's^2+\beta_1+\beta_1'm^2)\,\phi_{s^2}(\ell)-(\beta_2's^2+\beta_2+\beta_2'm^2)\,\partial_z\phi_{s^2}(\ell), \tag{85}
\]
where the eigenfunctions $\phi_{s^2}$ satisfy
\[
-\partial_z^2\phi_{s^2}=s^2\phi_{s^2}, \tag{86}
\]
\[
\begin{pmatrix}\phi_{s^2}(0)\\ \partial_z\phi_{s^2}(0)\end{pmatrix}=\begin{pmatrix}\sin\alpha\\ -\cos\alpha\end{pmatrix}, \tag{87}
\]
with $\alpha=0$ for a Dirichlet boundary condition, $\alpha\in(0,\pi)$ for a Robin boundary condition, at $z=0$, and, in particular, $\alpha=\pi/2$ in the Neumann case. In the case of a Dirichlet boundary condition, we solve (86) with $\phi_{s^2}(z)=-\sin(sz)/s$. For a Robin boundary condition, we set $\phi_{s^2}(z)=-\cos\alpha\sin(sz)/s+\sin\alpha\cos(sz)$ (in particular, for the Neumann case $\phi_{s^2}(z)=\cos(sz)$). We henceforth use labels $D$ and $R$ for Dirichlet and Robin quantities, respectively.
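9.1 Dirichlet Boundary Condition at z = 0

Following [21, Sec. 4], Rouché's theorem (see, e.g., [9]) implies that, for sufficiently large $n$,
\[
s_n^D=\frac{(n-1/2)\pi}{\ell}+\delta_n^D,\qquad\delta_n^D=O\big(n^{-1}\big). \tag{88}
\]
Using (85), we need to solve
\[
\big(\beta_1'(s_n^D)^2+\beta_1+\beta_1'm^2\big)\sin(s_n^D\ell)-s_n^D\big(\beta_2'(s_n^D)^2+\beta_2+\beta_2'm^2\big)\cos(s_n^D\ell)=0, \tag{89}
\]
which upon using (88) can be written as
\[
\begin{aligned}
&\bigg[\beta_1'\Big(\frac{(n-1/2)\pi}{\ell}+\delta_n^D\Big)^2+\beta_1+\beta_1'm^2\bigg]\cos(\delta_n^D\ell)\\
&\quad+\Big(\frac{(n-1/2)\pi}{\ell}+\delta_n^D\Big)\bigg[\beta_2'\Big(\frac{(n-1/2)\pi}{\ell}+\delta_n^D\Big)^2+\beta_2+\beta_2'm^2\bigg]\sin(\delta_n^D\ell)=0.
\end{aligned}\tag{90}
\]
Equation (90) can be expanded using the fact that $\delta_n^D=O(n^{-1})$, and one obtains that
\[
\beta_1'+\pi\beta_2'(n-1/2)\,\delta_n^D=O\big(n^{-2}\big), \tag{91}
\]
from where it follows that
\[
s_n^D=\frac{(n-1/2)\pi}{\ell}+\delta_n^D,\qquad\delta_n^D=-\frac{\beta_1'}{\beta_2'(n-1/2)\pi}+\Delta_n^D,\qquad\Delta_n^D=O\big(n^{-3}\big). \tag{92}
\]
Inserting (92) into (90), we find the relation
\[
\beta_1-\frac{\beta_1'\beta_2}{\beta_2'}+\frac{\pi^3\beta_2'(n-1/2)^3\Delta_n^D}{\ell^2}+\frac{(\beta_1')^2}{\beta_2'\ell}-\frac{(\beta_1')^3}{3(\beta_2')^2}=O\big(n^{-2}\big), \tag{93}
\]
which can be solved to yield
\[
\begin{aligned}
s_n^D&=\frac{(n-1/2)\pi}{\ell}+\delta_n^D,\\
\delta_n^D&=-\frac{\beta_1'}{\beta_2'\pi(n-1/2)}+\frac{\ell^2\big[-3\beta_1(\beta_2')^2+3\beta_2\beta_1'\beta_2'+(\beta_1')^3\big]-3\ell(\beta_1')^2\beta_2'}{3\pi^3(\beta_2')^3(n-1/2)^3}+O\big(n^{-5}\big).
\end{aligned}\tag{94}
\]
The expansion (94) suffices for our purposes, but one can recursively obtain more precise estimates.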
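The quality of the expansion for $\delta_n^D$ can be checked against a direct numerical solution of the characteristic equation (89). The sketch below is an illustration under stated assumptions: the parameter values are arbitrary sample choices, the primed coefficients $\beta_1',\beta_2'$ are passed as `b1p`, `b2p`, the interval length enters explicitly as `ell`, and the root is located by plain bisection in a window of half-width $\pi/(4\ell)$ around $(n-1/2)\pi/\ell$, which is adequate only for large $n$. All function names are hypothetical.

```python
import math


def wronskian_dirichlet(s, b1, b1p, b2, b2p, m2, ell):
    """Left-hand side of the Dirichlet characteristic equation (89)."""
    big1 = b1 + b1p * m2
    big2 = b2 + b2p * m2
    return ((b1p * s * s + big1) * math.sin(s * ell)
            - s * (b2p * s * s + big2) * math.cos(s * ell))


def root_dirichlet(n, b1, b1p, b2, b2p, m2, ell):
    """Bisection for the root of (89) near (n - 1/2) pi / ell (large n only)."""
    center = (n - 0.5) * math.pi / ell
    lo, hi = center - math.pi / (4 * ell), center + math.pi / (4 * ell)
    f_lo = wronskian_dirichlet(lo, b1, b1p, b2, b2p, m2, ell)
    if f_lo * wronskian_dirichlet(hi, b1, b1p, b2, b2p, m2, ell) > 0:
        raise ValueError("no sign change in the search window")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = wronskian_dirichlet(mid, b1, b1p, b2, b2p, m2, ell)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def s_asymptotic(n, b1, b1p, b2, b2p, ell, order):
    """(n - 1/2) pi / ell plus the 1/n term, and the 1/n^3 term if order >= 3."""
    k = n - 0.5
    delta = -b1p / (b2p * math.pi * k)
    if order >= 3:
        num = (ell ** 2 * (-3 * b1 * b2p ** 2 + 3 * b2 * b1p * b2p + b1p ** 3)
               - 3 * ell * b1p ** 2 * b2p)
        delta += num / (3 * math.pi ** 3 * b2p ** 3 * k ** 3)
    return k * math.pi / ell + delta


# Illustrative parameters (assumed sample values).
params = dict(b1=-1.0, b1p=1.0, b2=1.0, b2p=-0.5)
for ell in (1.0, 2.0):
    exact = root_dirichlet(50, m2=1.0, ell=ell, **params)
    err1 = abs(s_asymptotic(50, ell=ell, order=1, **params) - exact)
    err3 = abs(s_asymptotic(50, ell=ell, order=3, **params) - exact)
    print(ell, exact, err1, err3)
```

Including the $(n-1/2)^{-3}$ correction should reduce the error by several orders of magnitude compared with the leading term alone, in line with the stated $O(n^{-5})$ remainder.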
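9.2 Robin Boundary Condition at z = 0

The treatment is analogous to the Dirichlet case. Following [21, Sec. 4], Rouché's theorem (see, e.g., [9]) implies that, for sufficiently large $n$,
\[
s_n^R=\frac{(n-1)\pi}{\ell}+\delta_n^R,\qquad\delta_n^R=O\big(n^{-1}\big). \tag{95}
\]
Using (85), we need to solve
\[
\begin{aligned}
&\big(\beta_1'(s_n^R)^2+\beta_1+\beta_1'm^2\big)\Big(-\frac{\cos\alpha\,\sin(s_n^R\ell)}{s_n^R}+\sin\alpha\cos(s_n^R\ell)\Big)\\
&\quad+\big(\beta_2'(s_n^R)^2+\beta_2+\beta_2'm^2\big)\big(\cos\alpha\cos(s_n^R\ell)+s_n^R\sin\alpha\sin(s_n^R\ell)\big)=0,
\end{aligned}\tag{96}
\]
which upon using (95) can be written as
\[
\begin{aligned}
&\bigg[\beta_1'\Big(\frac{(n-1)\pi}{\ell}+\delta_n^R\Big)^2+\beta_1+\beta_1'm^2\bigg]\Bigg(-\frac{\cos\alpha\,\sin(\delta_n^R\ell)}{\frac{(n-1)\pi}{\ell}+\delta_n^R}+\sin\alpha\cos(\delta_n^R\ell)\Bigg)\\
&\quad+\bigg[\beta_2'\Big(\frac{(n-1)\pi}{\ell}+\delta_n^R\Big)^2+\beta_2+\beta_2'm^2\bigg]\bigg(\cos\alpha\cos(\delta_n^R\ell)+\Big(\frac{(n-1)\pi}{\ell}+\delta_n^R\Big)\sin\alpha\sin(\delta_n^R\ell)\bigg)=0.
\end{aligned}\tag{97}
\]
Equation (97) can be expanded using the fact that $\delta_n^R=O(n^{-1})$, and one obtains that
\[
\sin\alpha\,\big(\pi\delta_n^R(n-1)\beta_2'+\beta_1'\big)+\beta_2'\cos\alpha=O(n^{-2}), \tag{98}
\]
from where it follows that
\[
s_n^R=\frac{(n-1)\pi}{\ell}+\delta_n^R,\qquad\delta_n^R=-\frac{\beta_1'+\beta_2'\cot\alpha}{\beta_2'(n-1)\pi}+\Delta_n^R,\qquad\Delta_n^R=O\big(n^{-3}\big). \tag{99}
\]
Iterating as in the Dirichlet case by inserting (99) into (97), we find that
\[
\begin{aligned}
s_n^R&=\frac{(n-1)\pi}{\ell}+\delta_n^R,\\
\delta_n^R&=-\frac{\beta_1'+\beta_2'\cot\alpha}{\beta_2'(n-1)\pi}
+\bigg\{\frac{\ell^2\big[-3\beta_1(\beta_2')^2+3\beta_2\beta_1'\beta_2'+(\beta_1')^3\big]-3\ell(\beta_1')^2\beta_2'}{3\pi^3(\beta_2')^3}\\
&\qquad+\frac{\ell\cot\alpha\big[-6\beta_1'+\beta_2'\cot\alpha\,(3-\ell\cot\alpha)\big]}{3\pi^3\beta_2'}\bigg\}\frac{1}{(n-1)^3}+O\big(n^{-4}\big).
\end{aligned}\tag{100}
\]
In the case of Neumann boundary conditions, setting $\alpha=\pi/2$ in the Robin case, one has the expansion
\[
\begin{aligned}
s_n^N&=\frac{(n-1)\pi}{\ell}+\delta_n^N,\\
\delta_n^N&=-\frac{\beta_1'}{\beta_2'(n-1)\pi}+\frac{\ell^2\big[-3\beta_1(\beta_2')^2+3\beta_2\beta_1'\beta_2'+(\beta_1')^3\big]-3\ell(\beta_1')^2\beta_2'}{3\pi^3(\beta_2')^3(n-1)^3}+O\big(n^{-4}\big).
\end{aligned}\tag{101}
\]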
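The leading-order Robin estimate (99) can be checked in the same spirit. The sketch below is a hedged illustration: it assumes the characteristic equation obtained by inserting $\phi_{s^2}(z)=-\cos\alpha\sin(sz)/s+\sin\alpha\cos(sz)$ into the Wronskian (85), uses arbitrary sample parameters with $\alpha=\pi/3$, writes the primed coefficients as `b1p`, `b2p`, and only tests the $1/(n-1)$ term of $\delta_n^R$, whose error should then decay like $(n-1)^{-3}$. Function names are hypothetical.

```python
import math


def wronskian_robin(s, alpha, b1, b1p, b2, b2p, m2, ell):
    """Wronskian (85) evaluated on the Robin solution phi(z) stated in the text."""
    phi = math.sin(alpha) * math.cos(s * ell) - math.cos(alpha) * math.sin(s * ell) / s
    dphi = -s * math.sin(alpha) * math.sin(s * ell) - math.cos(alpha) * math.cos(s * ell)
    return ((b1p * s * s + b1 + b1p * m2) * phi
            - (b2p * s * s + b2 + b2p * m2) * dphi)


def root_robin(n, alpha, b1, b1p, b2, b2p, m2, ell):
    """Bisection for the root near (n - 1) pi / ell (large n only)."""
    center = (n - 1) * math.pi / ell
    lo, hi = center - math.pi / (4 * ell), center + math.pi / (4 * ell)
    args = (alpha, b1, b1p, b2, b2p, m2, ell)
    f_lo = wronskian_robin(lo, *args)
    if f_lo * wronskian_robin(hi, *args) > 0:
        raise ValueError("no sign change in the search window")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = wronskian_robin(mid, *args)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def s_leading_robin(n, alpha, b1p, b2p, ell):
    """(n - 1) pi / ell plus the leading term of delta_n^R in (99)."""
    cot = math.cos(alpha) / math.sin(alpha)
    return (n - 1) * math.pi / ell - (b1p + b2p * cot) / (b2p * (n - 1) * math.pi)


# Illustrative parameters (assumed sample values).
alpha, b1, b1p, b2, b2p, m2, ell = math.pi / 3, -1.0, 1.0, 1.0, -0.5, 1.0, 1.0
errors = {}
for n in (40, 80):
    exact = root_robin(n, alpha, b1, b1p, b2, b2p, m2, ell)
    errors[n] = abs(exact - s_leading_robin(n, alpha, b1p, b2p, ell))
    print(n, exact, errors[n])
```

Doubling $n$ from 40 to 80 should shrink the error by roughly $(79/39)^3\approx8.3$, which is the signature of a remainder of order $(n-1)^{-3}$.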
10 Appendix 3: Calculations with a Robin Boundary Condition at z = 0

We present the computations for the renormalized local state polarization and Casimir energy in the case in which a Robin boundary condition is imposed at $z=0$.

10.1 Renormalized Local State Polarization in the Bulk

We are interested in extracting the coincidence-limit singular behavior of
\[
\langle\Omega^{(R)}|\hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\,\Omega^{(R)}\rangle
=\sum_{n=1}^{\infty}\Big[{}^{R}\!B_n^{V(1)}(t,t',z,z')+{}^{R}\!B_n^{V(2)}(t,t',z,z')+{}^{R}\!B_n^{V(3)}(t,t',z,z')\Big], \tag{102a}
\]
\[
{}^{R}\!B_n^{V(1)}(t,t',z,z'):=\frac{(N_n^R)^2}{8\omega_n^R}\,e^{-i\omega_n^R(t-t')}\Big(\sin^2\alpha+\frac{\cos^2\alpha}{(s_n^R)^2}\Big)\Big[e^{is_n^R(z-z')}+e^{-is_n^R(z-z')}\Big], \tag{102b}
\]
\[
{}^{R}\!B_n^{V(2)}(t,t',z,z'):=\frac{(N_n^R)^2}{8\omega_n^R}\,e^{-i\omega_n^R(t-t')}\Big(\sin^2\alpha-\frac{\cos^2\alpha}{(s_n^R)^2}\Big)\Big[e^{is_n^R(z+z')}+e^{-is_n^R(z+z')}\Big], \tag{102c}
\]
\[
{}^{R}\!B_n^{V(3)}(t,t',z,z'):=-\frac{(N_n^R)^2}{4\omega_n^R}\,e^{-i\omega_n^R(t-t')}\,\frac{\sin\alpha\cos\alpha}{is_n^R}\Big[e^{is_n^R(z+z')}-e^{-is_n^R(z+z')}\Big]. \tag{102d}
\]
It can be seen from the asymptotic estimates (100) that the sum defined by the summand ${}^{R}\!B_n^{V(1)}$ fails to converge when $t=t'$ and $z=z'$. Let us study this sum in detail. First, we rewrite
\[
\begin{aligned}
\sum_{n=1}^{\infty}{}^{R}\!B_n^{V(1)}(t,t',z,z')
={}&\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{8\omega_n^R}\Big(\sin^2\alpha+(s_n^R)^{-2}\cos^2\alpha\Big)\bigg\{e^{-i\omega_n^R(t-t')}\Big[e^{is_n^R(z-z')}+e^{-is_n^R(z-z')}\Big]\\
&\qquad-\Big[e^{\frac{i\pi}{\ell}(n-1)[-(t-t')+(z-z')]}+e^{\frac{i\pi}{\ell}(n-1)[-(t-t')-(z-z')]}\Big]\bigg\}\\
&+\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{8\omega_n^R}\Big(\sin^2\alpha+(s_n^R)^{-2}\cos^2\alpha\Big)\Big[e^{\frac{i\pi}{\ell}(n-1)[-(t-t')+(z-z')]}+e^{\frac{i\pi}{\ell}(n-1)[-(t-t')-(z-z')]}\Big].
\end{aligned}\tag{103}
\]
The first sum on the right-hand side of (103) vanishes in the limit $(t',z')\to(t,z)$, since by estimate (83)
\[
\begin{aligned}
&\bigg|\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{8\omega_n^R}\Big(\sin^2\alpha+(s_n^R)^{-2}\cos^2\alpha\Big)\Big(e^{i[-\omega_n^R(t-t')\pm s_n^R(z-z')]}-e^{\frac{i\pi}{\ell}(n-1)[-(t-t')\pm(z-z')]}\Big)\bigg|\\
&\quad\le\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{8\omega_n^R}\Big(\sin^2\alpha+(s_n^R)^{-2}\cos^2\alpha\Big)\bigg|-\Big(\omega_n^R-\frac{\pi}{\ell}(n-1)\Big)(t-t')\pm\Big(s_n^R-\frac{\pi}{\ell}(n-1)\Big)(z-z')\bigg|.
\end{aligned}\tag{104}
\]
The summand on the right-hand side of (104) vanishes as $(t',z')\to(t,z)$, and by (100) it is $O(n^{-2})$ uniformly for $t,t'$ in bounded sets and $z,z'\in[0,\ell]$. Hence, by dominated convergence, the first sum on the right-hand side of (103) vanishes as $(t',z')\to(t,z)$. The second sum on the right-hand side of (103) is therefore the only contribution in the coincidence limit. We rewrite it as
Quantum Field Theory with Dynamical Boundary Conditions and the Casimir Effect
.
229
∞ iπ 2 iπ (NR n) 2 R −2 2 (n−1)[−(t−t )+(z−z )] + e (n−1)[−(t−t )−(z−z )] α + (s ) cos α e sin n 8ωnR n=1
=
∞ 2 1 (NR n) 2 R −2 2 α + (s ) cos α − sin n 4πn 8ωnR n=1
iπ
iπ
e (n−1)[−(t−t )+(z−z )] + e (n−1)[−(t−t )−(z−z )]
+
∞ iπ 1 iπ (n−1)[−(t−t )+(z−z )] + e (n−1)[−(t−t )−(z−z )] . e 4πn
(105)
n=1
The first sum on the right-hand side of (105) converges in absolute value and uniformly in .t, t and .z, z , since by the asymptotics in (100) .
2 (NR 1 n) 2 R −2 2 −2 sin , α ± (s ) cos α = + O n n 4π n 8ωnR
(106)
so one can take the limit inside the sum, while the second sum in (105) can be obtained explicitly, using formula (81a), ∞ iπ 1 iπ (n−1)[−(t−t )+(z−z )] e . + e (n−1)[−(t−t )−(z−z )] 4π n n=1
iπ 1 iπ [(t−t )−(z−z )] e ln 1 − e [−(t−t )+(z−z )] 4π iπ iπ +e [(t−t )+(z−z )] ln 1 − e− [(t−t )+(z−z )] ,
=−
(107)
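and diverges logarithmically in the coincidence limit, as can be seen from (107). Indeed, subtracting the Hadamard bi-distribution (30) we obtain the finite limit

$$
\begin{aligned}
& \lim_{(t',z')\to(t,z)} \left[\sum_{n=1}^{\infty} {}^{R}B_n^{V(1)}(t,t',z,z') - H_M((t,z),(t',z'))\right] \\
& \quad = \frac{\gamma}{2\pi} + \frac{1}{4\pi}\ln\!\left(\frac{m^2\ell^2}{4\pi^2}\right) + \sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{4\omega_n^R}\left(\sin^2\alpha + (s_n^R)^{-2}\cos^2\alpha\right) - \frac{1}{2\pi n}\right].
\end{aligned}
\tag{108}
$$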
With the aid of estimates (83) and (100), we can apply the dominated convergence theorem, by arguments similar to those used for the term (102b), to obtain that

$$
\begin{aligned}
\lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty} {}^{R}B_n^{V(2)}(t,t',z,z')
= {} & \sum_{n=1}^{\infty} \frac{(N_n^R)^2}{4\omega_n^R}\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right)\left[\cos\!\left(2 s_n^R z\right) - \cos\!\left(\frac{2(n-1)\pi}{\ell} z\right)\right] \\
& + \lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty}\left[{}^{R}B_n^{V(2)+}(t,t',z,z') + {}^{R}B_n^{V(2)-}(t,t',z,z')\right],
\end{aligned}
\tag{109}
$$
with

$$
{}^{R}B_n^{V(2)\pm}(t,t',z,z') := \frac{(N_n^R)^2}{8\omega_n^R}\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right) e^{i\frac{(n-1)\pi}{\ell}[-(t-t')\pm(z+z')]}. \tag{110}
$$
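Writing the second sum on the right-hand side of (109) as

$$
\begin{aligned}
& \lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty}\left[{}^{R}B_n^{V(2)+}(t,t',z,z') + {}^{R}B_n^{V(2)-}(t,t',z,z')\right] \\
& \quad = \lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{8\omega_n^R}\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right) - \frac{1}{4\pi n}\right]\left(e^{i\frac{(n-1)\pi}{\ell}[-(t-t')+(z+z')]} + e^{i\frac{(n-1)\pi}{\ell}[-(t-t')-(z+z')]}\right) \\
& \qquad + \lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty} \frac{1}{4\pi n}\left(e^{i\frac{(n-1)\pi}{\ell}[-(t-t')+(z+z')]} + e^{i\frac{(n-1)\pi}{\ell}[-(t-t')-(z+z')]}\right),
\end{aligned}
\tag{111}
$$

one can use the dominated convergence theorem in view of (106) to finally obtain that

$$
\begin{aligned}
\lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty} {}^{R}B_n^{V(2)}(t,t',z,z')
= {} & \sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{4\omega_n^R}\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right)\cos\!\left(2 s_n^R z\right) - \frac{1}{2\pi n}\cos\!\left(\frac{2(n-1)\pi}{\ell} z\right)\right] \\
& - \frac{1}{2\pi}\operatorname{Re}\!\left[e^{-i\frac{2\pi}{\ell} z}\ln\!\left(1 - e^{i\frac{2\pi}{\ell} z}\right)\right],
\end{aligned}
\tag{112}
$$

where we have used (81a) to obtain the last term on the right-hand side of (112). The third term in (102a), defined by (102d), can be seen to converge absolutely (using (100)), and its limit as $(t',z') \to (t,z)$ can be taken inside the sum by dominated convergence, whereby one obtains that

$$
\lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty} {}^{R}B_n^{V(3)}(t,t',z,z') = -\sum_{n=1}^{\infty} \frac{(N_n^R)^2}{2\omega_n^R s_n^R}\sin\alpha\cos\alpha\,\sin\!\left(2 s_n^R z\right). \tag{113}
$$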
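Adding up (108), (112) and (113), we finally obtain that

$$
\begin{aligned}
\left\langle \Omega^{(R)} \middle| (\hat\Phi^{B}_{\rm ren})^2(t,z)\, \Omega^{(R)} \right\rangle
= {} & \frac{1}{4\pi}\ln\!\left(\frac{m^2\ell^2}{4\pi^2}\right) + \frac{\gamma}{2\pi} - \frac{1}{2\pi}\operatorname{Re}\!\left[e^{-i\frac{2\pi}{\ell} z}\ln\!\left(1 - e^{i\frac{2\pi}{\ell} z}\right)\right] \\
& + \sum_{n=1}^{\infty}\Bigg[\frac{(N_n^R)^2}{2\omega_n^R}\left(\sin^2\alpha\,\cos^2\!\left(s_n^R z\right) + \frac{\cos^2\alpha}{(s_n^R)^2}\sin^2\!\left(s_n^R z\right)\right) - \frac{1}{\pi n}\cos^2\!\left(\frac{\pi}{\ell}(n-1) z\right) \\
& \qquad\quad - \frac{(N_n^R)^2}{2\omega_n^R s_n^R}\sin\alpha\cos\alpha\,\sin\!\left(2 s_n^R z\right)\Bigg],
\end{aligned}
\tag{114}
$$

where the summand on the right-hand side of (114) is $O(n^{-2})$ and the sum converges absolutely and uniformly in $z \in (0,\ell)$.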
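The closed forms in (107) and (112) rest on the logarithmic series $\sum_{n\ge 1} w^n/n = -\ln(1-w)$ for $|w|\le 1$, $w \ne 1$, which is presumably what formula (81a) provides. The following is a minimal numerical sketch of the shifted version $\sum_{n\ge1} e^{i(n-1)x}/n = -e^{-ix}\ln(1-e^{ix})$ and of its logarithmic growth as $x \to 0$. The function names, the truncation order and the sample points are illustrative assumptions, not part of the original derivation.

```python
import numpy as np


def shifted_log_series(x, n_terms=200_000):
    """Partial sum of sum_{n>=1} exp(i (n-1) x) / n (conditionally convergent)."""
    n = np.arange(1, n_terms + 1)
    return np.sum(np.exp(1j * (n - 1) * x) / n)


def closed_form(x):
    """-exp(-i x) * log(1 - exp(i x)), for x not a multiple of 2*pi.

    1 - exp(i x) has non-negative real part, so the principal branch
    of the logarithm is the one reached by the power series.
    """
    return -np.exp(-1j * x) * np.log(1.0 - np.exp(1j * x))


if __name__ == "__main__":
    x = 1.0
    print("series vs closed form:", abs(shifted_log_series(x) - closed_form(x)))
    # The real part grows like -log(x) as x -> 0: the coincidence-limit divergence.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, closed_form(eps).real, -np.log(eps))
```

On the unit circle the series converges only conditionally, which is why the sketch needs many terms, and why the text isolates the $1/(4\pi n)$ part before taking limits term by term.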
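10.2 Renormalized Local State Polarization in the Boundary

We are interested in analyzing the coincidence-limit behavior of

$$
\left\langle \Omega^{(R)} \middle| \hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\, \Omega^{(R)} \right\rangle
= \sum_{n=1}^{\infty} \frac{(N_n^R)^2}{2\omega_n^R} e^{-i\omega_n^R(t-t')}\left[\beta_1'\left(\sin\alpha\cos\!\left(s_n^R\ell\right) - \frac{\cos\alpha\,\sin\!\left(s_n^R\ell\right)}{s_n^R}\right) + \beta_2'\left(\cos\alpha\cos\!\left(s_n^R\ell\right) + s_n^R\sin\alpha\sin\!\left(s_n^R\ell\right)\right)\right]^2.
\tag{115}
$$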
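From our estimates (100), we observe that

$$
\frac{(N_n^R)^2}{2\omega_n^R}\left[\beta_1'\left(\sin\alpha\cos\!\left(s_n^R\ell\right) - \frac{\cos\alpha\,\sin\!\left(s_n^R\ell\right)}{s_n^R}\right) + \beta_2'\left(\cos\alpha\cos\!\left(s_n^R\ell\right) + s_n^R\sin\alpha\sin\!\left(s_n^R\ell\right)\right)\right]^2
= \frac{\ell^4\sin^2(\alpha)\,(\beta_1'\beta_2 - \beta_1\beta_2')^2}{\pi^5(\beta_2')^2 n^5} + O\!\left(n^{-6}\right),
\tag{116}
$$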
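so that the expression on the left-hand side of (116) provides a summable, $t,t'$-independent bound for the summand on the right-hand side of (115), which allows us to use the dominated convergence theorem to write

$$
\begin{aligned}
\left\langle \Omega^{(R)} \middle| (\hat\Phi^{\partial})^2(t)\, \Omega^{(R)} \right\rangle
& = \lim_{t'\to t}\left\langle \Omega^{(R)} \middle| \hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\, \Omega^{(R)} \right\rangle \\
& = \sum_{n=1}^{\infty} \frac{(N_n^R)^2}{2\omega_n^R}\left[\beta_1'\left(\sin\alpha\cos\!\left(s_n^R\ell\right) - \frac{\cos\alpha\,\sin\!\left(s_n^R\ell\right)}{s_n^R}\right) + \beta_2'\left(\cos\alpha\cos\!\left(s_n^R\ell\right) + s_n^R\sin\alpha\sin\!\left(s_n^R\ell\right)\right)\right]^2,
\end{aligned}
\tag{117}
$$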
where the summand on the right-hand side of (117) is $O(n^{-5})$ and the sum converges absolutely.
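10.3 Local Casimir Energy in the Bulk

We write the local Casimir energy as

$$
\begin{aligned}
\left\langle \Omega^{(R)} \middle| \hat H^{B}(t,z)\, \Omega^{(R)} \right\rangle
= {} & \frac{1}{2}\lim_{(t',z')\to(t,z)}\Big[\left(\partial_t\partial_{t'} + \partial_z\partial_{z'}\right)\left\langle \Omega^{(R)} \middle| \hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\, \Omega^{(R)} \right\rangle \\
& \qquad - \left(\partial_t\partial_{t'} + \partial_z\partial_{z'}\right) H_M((t,z),(t',z'))\Big] + \frac{m^2}{2}\left\langle \Omega^{(R)} \middle| (\hat\Phi^{B})^2(t,z)\, \Omega^{(R)} \right\rangle.
\end{aligned}
\tag{118}
$$

We are interested in extracting the coincidence-limit singular behavior of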
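$$
\frac{1}{2}\left(\partial_t\partial_{t'} + \partial_z\partial_{z'}\right)\left\langle \Omega^{(R)} \middle| \hat\Phi^{B}(t,z)\hat\Phi^{B}(t',z')\, \Omega^{(R)} \right\rangle
= \sum_{n=1}^{\infty}\left[{}^{R}B_n^{H(1)}(t,t',z,z') + {}^{R}B_n^{H(2)}(t,t',z,z') + {}^{R}B_n^{H(3)}(t,t',z,z')\right],
\tag{119a}
$$

$$
{}^{R}B_n^{H(1)}(t,t',z,z') := \frac{(N_n^R)^2}{16\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + (s_n^R)^{-2}\cos^2\alpha\right) e^{-i\omega_n^R(t-t')}\left(e^{i s_n^R(z-z')} + e^{-i s_n^R(z-z')}\right),
\tag{119b}
$$

$$
{}^{R}B_n^{H(2)}(t,t',z,z') := \frac{(N_n^R)^2}{16\omega_n^R}\left((\omega_n^R)^2 - (s_n^R)^2\right)\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right) e^{-i\omega_n^R(t-t')}\left(e^{i s_n^R(z+z')} + e^{-i s_n^R(z+z')}\right) = \frac{m^2}{2}\,{}^{R}B_n^{V(2)}(t,t',z,z'),
\tag{119c}
$$

$$
{}^{R}B_n^{H(3)}(t,t',z,z') := -\frac{(N_n^R)^2\sin\alpha\cos\alpha}{8\omega_n^R\, i s_n^R}\left((\omega_n^R)^2 - (s_n^R)^2\right) e^{-i\omega_n^R(t-t')}\left(e^{i s_n^R(z+z')} - e^{-i s_n^R(z+z')}\right) = \frac{m^2}{2}\,{}^{R}B_n^{V(3)}(t,t',z,z').
\tag{119d}
$$

We start by handling the first term. We note that by estimate (83)

$$
\begin{aligned}
& \frac{(\omega_n^R)^2 + (s_n^R)^2}{\omega_n^R}\left| e^{-i\omega_n^R(t-t')}e^{\pm i s_n^R(z-z')} - e^{i\frac{(n-1)\pi}{\ell}[-(t-t')\pm(z-z')]}\sum_{k=0}^{2}\frac{1}{k!}\left[-i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')\right]^k\right| \\
& \quad \le \frac{(\omega_n^R)^2 + (s_n^R)^2}{\omega_n^R}\left| e^{-i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')} - \sum_{k=0}^{2}\frac{1}{k!}\left[-i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')\right]^k\right| \\
& \quad \le \frac{(\omega_n^R)^2 + (s_n^R)^2}{6\omega_n^R}\left| -i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')\right|^3 \\
& \quad \le \frac{(\omega_n^R)^2 + (s_n^R)^2}{6\omega_n^R}\left(\left|\omega_n^R - \frac{(n-1)\pi}{\ell}\right| |t-t'| + \left|s_n^R - \frac{(n-1)\pi}{\ell}\right| |z-z'|\right)^3.
\end{aligned}
\tag{120}
$$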
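The middle step of (120) is the second-order Taylor remainder bound $|e^{ix} - \sum_{k=0}^{2}(ix)^k/k!| \le |x|^3/6$ for real $x$, applied here with $x = -(\omega_n^R - (n-1)\pi/\ell)(t-t') \pm (s_n^R - (n-1)\pi/\ell)(z-z')$. A short numerical sanity check of that inequality follows; the helper names and the sampled values of $x$ are illustrative assumptions.

```python
import cmath
import math


def taylor2_remainder(x):
    """|exp(i x) - sum_{k=0}^{2} (i x)^k / k!| for a real number x."""
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(3))
    return abs(cmath.exp(1j * x) - partial)


def cubic_bound(x):
    """Right-hand side |x|^3 / 6 of the remainder estimate."""
    return abs(x) ** 3 / 6.0


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 1e-2, 0.7, 2.0, 10.0):
        print(x, taylor2_remainder(x), cubic_bound(x))
```

Because the relevant $x$ is $O(n^{-1})$ by (100) while the prefactor in (120) is $O(n)$, the cubic bound is what produces the summable $O(n^{-2})$ majorant used in the next step.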
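From the above bounds, and using the expansion (100), a dominated convergence argument (whereby one can find a summable summand, behaving as $O(n^{-2})$ for large $n$ and uniform for $t, t'$ in bounded sets and in $z, z'$, that bounds the right-hand side of (120)) yields that

$$
\begin{aligned}
\lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty}\Bigg\{ & {}^{R}B_n^{H(1)}(t,t',z,z') - \frac{(N_n^R)^2}{16\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + \frac{\cos^2\alpha}{(s_n^R)^2}\right) \\
& \times \sum_{\pm} e^{i\frac{(n-1)\pi}{\ell}[-(t-t')\pm(z-z')]}\sum_{k=0}^{2}\frac{1}{k!}\left[-i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')\right]^k\Bigg\} = 0.
\end{aligned}
\tag{121}
$$

We are thus interested in the coincidence limits of sums of the form

$$
\sum_{n=1}^{\infty} e^{i\frac{(n-1)\pi}{\ell}[-(t-t')\pm(z-z')]}\frac{(N_n^R)^2}{16\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + \frac{\cos^2\alpha}{(s_n^R)^2}\right)\sum_{k=0}^{2}\frac{1}{k!}\left[-i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')\right]^k.
\tag{122}
$$

Observing from (100) that

$$
\frac{(N_n^R)^2}{16\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + \frac{\cos^2\alpha}{(s_n^R)^2}\right)\sum_{k=0}^{2}\frac{1}{k!}\left[-i\left(\omega_n^R - \frac{(n-1)\pi}{\ell}\right)(t-t') \pm i\left(s_n^R - \frac{(n-1)\pi}{\ell}\right)(z-z')\right]^k - {}^{R}B_n^{H(1)\pm}(t,t',z,z') = O\!\left(n^{-2}\right),
\tag{123a}
$$

$$
\begin{aligned}
{}^{R}B_n^{H(1)\pm}(t,t',z,z') := {} & \frac{\pi n}{4\ell^2} + \frac{i\left[2i\pi\beta_2' - \beta_2' m^2\ell\,(t-t') - 2\left(\beta_1' + \beta_2'\cot\alpha\right)\left(-(t-t')\pm(z-z')\right)\right]}{8\beta_2'\ell^2} \\
& - \frac{\left[\beta_2' m^2\ell\,(t-t') + 2\left(\beta_1' + \beta_2'\cot\alpha\right)\left(-(t-t')\pm(z-z')\right)\right]^2}{32\pi(\beta_2')^2\ell^2 n},
\end{aligned}
\tag{123b}
$$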
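a dominated convergence argument allows us to take the coincidence limit inside the sum and

$$
\begin{aligned}
& \lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty}\left[{}^{R}B_n^{H(1)}(t,t',z,z') - e^{i\frac{(n-1)\pi}{\ell}[-(t-t')+(z-z')]}\,{}^{R}B_n^{H(1)+}(t,t',z,z') - e^{i\frac{(n-1)\pi}{\ell}[-(t-t')-(z-z')]}\,{}^{R}B_n^{H(1)-}(t,t',z,z')\right] \\
& \quad = \sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{8\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + \frac{\cos^2\alpha}{(s_n^R)^2}\right) - \frac{\pi(n-1)}{2\ell^2}\right].
\end{aligned}
\tag{124}
$$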
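The second and third sums on the left-hand side of (124) can be obtained in closed form and together have the singular structure of $\frac{1}{2}(\partial_t\partial_{t'} + \partial_z\partial_{z'})H_M((t,z),(t',z'))$, cf. (30), plus $O(1)$ terms in the coincidence limit, from which it follows that

$$
\begin{aligned}
& \lim_{(t',z')\to(t,z)}\left[\sum_{n=1}^{\infty} {}^{R}B_n^{H(1)}(t,t',z,z') - \frac{1}{2}\left(\partial_t\partial_{t'} + \partial_z\partial_{z'}\right)H_M((t,z),(t',z'))\right] \\
& \quad = \sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{8\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + \frac{\cos^2\alpha}{(s_n^R)^2}\right) - \frac{\pi(n-1)}{2\ell^2}\right] - \frac{\pi}{24\ell^2} - \frac{\cot\alpha}{2\pi\ell} - \frac{\beta_1'}{2\pi\ell\beta_2'} + \frac{m^2}{8\pi}.
\end{aligned}
\tag{125}
$$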
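Using (112) and (113) we obtain, respectively,

$$
\begin{aligned}
\lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty} {}^{R}B_n^{H(2)}(t,t',z,z')
= {} & \frac{m^2}{2}\sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{4\omega_n^R}\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right)\cos\!\left(2 s_n^R z\right) - \frac{1}{2\pi n}\cos\!\left(\frac{2(n-1)\pi}{\ell} z\right)\right] \\
& - \frac{m^2}{4\pi}\operatorname{Re}\!\left[e^{-i\frac{2\pi}{\ell} z}\ln\!\left(1 - e^{i\frac{2\pi}{\ell} z}\right)\right]
\end{aligned}
\tag{126}
$$

and

$$
\lim_{(t',z')\to(t,z)} \sum_{n=1}^{\infty} {}^{R}B_n^{H(3)}(t,t',z,z') = -\frac{m^2}{2}\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{2\omega_n^R s_n^R}\sin\alpha\cos\alpha\,\sin\!\left(2 s_n^R z\right).
\tag{127}
$$

Collecting (125), (126), and (127), together with the expression (114) for the renormalized local state polarization, we finally have that

$$
\begin{aligned}
\left\langle \Omega^{(R)} \middle| \hat H^{B}(t,z)\, \Omega^{(R)} \right\rangle
= {} & -\frac{\pi}{24\ell^2} - \frac{\cot\alpha}{2\pi\ell} - \frac{\beta_1'}{2\pi\ell\beta_2'} + \frac{m^2}{8\pi}\left[1 + \ln\!\left(\frac{m^2\ell^2}{4\pi^2}\right)\right] + \frac{\gamma m^2}{4\pi} \\
& - \frac{m^2}{2\pi}\operatorname{Re}\!\left[e^{-i\frac{2\pi}{\ell} z}\ln\!\left(1 - e^{i\frac{2\pi}{\ell} z}\right)\right] - m^2\sum_{n=1}^{\infty}\frac{(N_n^R)^2}{2\omega_n^R s_n^R}\sin\alpha\cos\alpha\,\sin\!\left(2 s_n^R z\right) \\
& + \sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{8\omega_n^R}\left((\omega_n^R)^2 + (s_n^R)^2\right)\left(\sin^2\alpha + \frac{\cos^2\alpha}{(s_n^R)^2}\right) - \frac{\pi(n-1)}{2\ell^2}\right] \\
& + \frac{m^2}{2}\sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{4\omega_n^R}\left(\sin^2\alpha - (s_n^R)^{-2}\cos^2\alpha\right)\cos\!\left(2 s_n^R z\right) - \frac{1}{2\pi n}\cos\!\left(\frac{2(n-1)\pi}{\ell} z\right)\right] \\
& + \frac{m^2}{2}\sum_{n=1}^{\infty}\left[\frac{(N_n^R)^2}{2\omega_n^R}\left(\sin^2\alpha\,\cos^2\!\left(s_n^R z\right) + \frac{\cos^2\alpha}{(s_n^R)^2}\sin^2\!\left(s_n^R z\right)\right) - \frac{1}{\pi n}\cos^2\!\left(\frac{\pi}{\ell}(n-1) z\right)\right].
\end{aligned}
\tag{128}
$$

The summand on the right-hand side of (128) is $O(n^{-2})$ and the sum converges absolutely and uniformly in $z \in (0,\ell)$.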
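10.4 Local Casimir Energy in the Boundary

We have defined the Casimir energy in the boundary as

$$
\left\langle \Omega^{(R)} \middle| \hat H^{\partial}(t)\, \Omega^{(R)} \right\rangle = \frac{1}{2}\lim_{t'\to t}\left(|\beta_1'|\,\partial_t\partial_{t'} - (\operatorname{sign}\beta_1')\,\beta_1\right)\left\langle \Omega^{(R)} \middle| \hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\, \Omega^{(R)} \right\rangle,
\tag{129}
$$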
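so we wish to analyze the coincidence-limit behavior of

$$
\begin{aligned}
& \frac{1}{2}\left(|\beta_1'|\,\partial_t\partial_{t'} - (\operatorname{sign}\beta_1')\,\beta_1\right)\left\langle \Omega^{(R)} \middle| \hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\, \Omega^{(R)} \right\rangle \\
& \quad = \sum_{n=1}^{\infty}\frac{\left(|\beta_1'|(\omega_n^R)^2 - (\operatorname{sign}\beta_1')\,\beta_1\right)(N_n^R)^2}{4\omega_n^R} e^{-i\omega_n^R(t-t')}\left[\beta_1'\left(\sin\alpha\cos\!\left(s_n^R\ell\right) - \frac{\cos\alpha\,\sin\!\left(s_n^R\ell\right)}{s_n^R}\right) + \beta_2'\left(\cos\alpha\cos\!\left(s_n^R\ell\right) + s_n^R\sin\alpha\sin\!\left(s_n^R\ell\right)\right)\right]^2.
\end{aligned}
\tag{130}
$$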
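We observe from the asymptotics (100) that

$$
\frac{\left(\beta_1'(\omega_n^R)^2 - \beta_1\right)(N_n^R)^2}{4\omega_n^R}\left[\beta_1'\left(\sin\alpha\cos\!\left(s_n^R\ell\right) - \frac{\cos\alpha\,\sin\!\left(s_n^R\ell\right)}{s_n^R}\right) + \beta_2'\left(\cos\alpha\cos\!\left(s_n^R\ell\right) + s_n^R\sin\alpha\sin\!\left(s_n^R\ell\right)\right)\right]^2
= \frac{\beta_1'\ell^2\sin^2(\alpha)\,(\beta_1'\beta_2 - \beta_1\beta_2')^2}{2\pi^3(\beta_2')^2 n^3} + O\!\left(n^{-4}\right).
\tag{131}
$$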
Hence, (131) provides a summable, $t,t'$-independent bound for the summand on the right-hand side of (130), which allows us to use the dominated convergence theorem to take the $t' \to t$ limit inside the sum defining the boundary Casimir energy as follows:

$$
\begin{aligned}
\left\langle \Omega^{(R)} \middle| \hat H^{\partial}(t)\, \Omega^{(R)} \right\rangle
& = \frac{1}{2}\lim_{t'\to t}\left(|\beta_1'|\,\partial_t\partial_{t'} - (\operatorname{sign}\beta_1')\,\beta_1\right)\left\langle \Omega^{(R)} \middle| \hat\Phi^{\partial}(t)\hat\Phi^{\partial}(t')\, \Omega^{(R)} \right\rangle \\
& = \sum_{n=1}^{\infty}\frac{\left(|\beta_1'|(\omega_n^R)^2 - (\operatorname{sign}\beta_1')\,\beta_1\right)(N_n^R)^2}{4\omega_n^R}\left[\beta_1'\left(\sin\alpha\cos\!\left(s_n^R\ell\right) - \frac{\cos\alpha\,\sin\!\left(s_n^R\ell\right)}{s_n^R}\right) + \beta_2'\left(\cos\alpha\cos\!\left(s_n^R\ell\right) + s_n^R\sin\alpha\sin\!\left(s_n^R\ell\right)\right)\right]^2.
\end{aligned}
\tag{132}
$$

The summand on the right-hand side of (132) is $O(n^{-3})$ and the sum converges absolutely.

Acknowledgments While this research was carried out, Benito A. Juárez-Aubry was supported by a DGAPA-UNAM Postdoctoral Fellowship. Benito A. Juárez-Aubry and Ricardo Weder are fellows of the Sistema Nacional de Investigadores. This paper was partially written while Ricardo Weder was visiting the Institut de Mathématique d’Orsay, Université Paris-Saclay. Ricardo Weder thanks Christian Gérard for his kind hospitality. Research partially supported by projects PAPIIT-DGAPA UNAM IN103918 and IN100321, as well as by project SEP-CONACYT CB 2015, 254062.
Where Is a Photon in an Interferometer? Yosi Avron
Alex Grossmann has been a beacon of light and warmth to me, a teacher, mentor, dear friend and father figure.
Abstract In a paper titled “Asking photons where they have been?” [1] Danan, Farfurnik, Bar-Ad, and Vaidman describe an experiment with pre- and post-selected photons going through nested Mach-Zehnder interferometers. They find that some of the mirrors leave no footprints on the signal and interpret this as evidence that the photon skipped these mirrors. They argue that the experiment supports the Aharonov-Vaidman formulation of quantum mechanics [2], in which post-selected particles are assigned disconnected trajectories. I review the experiment and analyze it within the orthodox framework of quantum mechanics. The standard view of interfering trajectories accounts for the experimental findings.
1 The Experiment

Consider the nested Mach-Zehnder interferometers shown in Fig. 1. Each one of the five mirrors $\{A, B, C, E, F\}$ oscillates with its characteristic frequency. The intensity of light falling on the top half of the detector surface is compared with the intensity falling on the bottom half. The power spectrum of the signal bears evidence to the oscillation frequencies of the mirrors. The interferometer is tuned so that frequencies corresponding to mirrors $A, B, C$ show up in the power spectrum but those of $E, F$ do not. The appearance of a characteristic frequency in the signal gives, of course, evidence that photons hit the corresponding mirror. The authors of [1] go one step further and interpret the absence of a characteristic frequency as evidence that no photon visited the corresponding mirror. Since the photon apparently succeeded in going undetected
Y. Avron, Department of Physics, Technion, Haifa, Israel
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_13
Fig. 1 A nested Mach-Zehnder interferometer: the four squares represent beam splitters, where the ratio $p:q$ gives the intensity ratio of the outgoing beams. The external Mach-Zehnder has $1:2$ beam splitters and the internal one $1:1$ beam splitters. The black lines marked $\{A, B, C, E, F\}$ represent mirrors that slightly oscillate, each with its own specific frequency. Light is fed from the top left and measured by a “quad-detector” D which compares the signal on its top half with the bottom half. The colors correspond to the colors of the modes in Fig. 3
by the gatekeepers of the $A, B$ mirrors, it seemed to have a skipping orbit that visited disconnected parts of space.
2 The History of Post-selected States

The two-state formulation of Aharonov and Vaidman [2] attempts to assign a consistent history to post-selected states. The input state in Fig. 2 evolves forward in time, growing to the thick line. The post-selected state starts at the detector and evolves backward in time to become the thin line. The consistent history is, by definition, their intersection, which is the disconnected set formed by the L-shaped line and the square in Fig. 2. The notion of consistent history gives a simple account of the experiment of Danan et al. [1]: the photon never visited the E and F mirrors. However, the explanation comes with the price tag of disconnected trajectories. This is not what textbooks in quantum mechanics teach students. Danan et al. [1] are careful not to claim that a disconnected path is the inevitable conclusion of their experiment. In fact, they show that the experiment can also be explained in terms of interfering classical waves. However, there is little doubt that the attention drawn to the work owes much to the support it gives to the unorthodox approach of Aharonov and Vaidman [2] and the concomitant controversy [3].
Fig. 2 The blue graph describes the state that evolves forward in time and the red graph the state that starts at the detector and evolves backward in time. The blue graph is a connected set and so is the red graph. However, their intersection is a disconnected set: the line through C and the square touching A and B
The experiment has been analyzed in [1] both within the unorthodox formulation of quantum mechanics of Aharonov and Vaidman and within classical Maxwell theory. What still appears to be missing is its analysis within orthodox quantum mechanics [4]. This is what I do here. The analysis is based on a quantum circuit model. I recover the main results in [1] and explain the weak footprints of some of the mirrors in terms of destructively interfering quantum amplitudes.
3 A Quantum Circuit Model

The orientations of the mirrors $A, B, C, E, F$ are functions of time. As the interaction time of the photon with each mirror is very short, one may consider the mirrors to be effectively at rest. The angles of the mirrors may then be viewed as parameters. The measurement of the power spectrum in [1] is, for the purpose of the analysis, just a clever trick to gain information about the instantaneous angles $\alpha, \beta, \gamma, \eta, \phi$ of the mirrors. A quantum mechanical model corresponding to the experiment is the circuit of gates shown in Fig. 3. The photon (annihilation) operator has two coordinates, the channel index $j$ and the wave vector $\mathbf{k}$:

$$
a_j(\mathbf{k}), \qquad j \in \{1, 2, 3\}, \qquad \mathbf{k} \in \mathbb{R}^2. \tag{3.1}
$$
(The top channel is labeled 1, the mid-channel 2 and the bottom channel 3 in Fig. 3; $\mathbf{k}$ describes the photon wave vector in the plane.) The modes satisfy the canonical commutation relations

$$
\left[a_j(\mathbf{k}),\, a_k^{\dagger}(\mathbf{k}')\right] = \delta_{jk}\,\delta(\mathbf{k} - \mathbf{k}'). \tag{3.2}
$$
Fig. 3 A circuit diagram corresponding to the experimental setting in Fig. 1. $A, B, C, E, F$ represent the mirrors with deflection angles $\alpha, \beta, \gamma, \eta, \phi$. The deflection angles $\gamma$ and $\phi$ are shown. The beam splitters are represented by the crossings. D is the detector. The top channel is labeled 1, the mid-channel 2 and the bottom channel 3. $|0\rangle$ denotes the photonic vacuum and $|\varphi\rangle$ a single-photon state
The beam splitter $S_{jk}$ acts trivially on $\mathbf{k}$ and non-trivially on the mode indices $j, k$. Since a beam splitter is time-reversal invariant, the corresponding matrix may be chosen symmetric.¹ The two beam splitters may be chosen as

$$
S_{12}(p:q) = \begin{pmatrix} \sqrt{p} & i\sqrt{q} \\ i\sqrt{q} & \sqrt{p} \end{pmatrix}, \qquad
S_{23} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}, \tag{3.3}
$$
with $p + q = 1$; $p:q$ is the intensity ratio of the outgoing beams. In Fig. 1, $p:q = 1:2$, i.e. $p = 1/3$ and $q = 2/3$. The mirrors act trivially on the channel index $j$ and non-trivially on the wave vector $\mathbf{k}$. The mirror $A$, with angle $\alpha/2$ (deflection angle $\alpha$), acts on the vector $\mathbf{k}$ as a reflection, represented by the orthogonal, symmetric matrix $M_\alpha$ with $\det M_\alpha = -1$:

$$
\mathbf{k} \mapsto M_\alpha \mathbf{k}, \qquad
M_\alpha = \begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}, \qquad
M_\alpha^2 = 1. \tag{3.4}
$$
The mirror acts on $|\mathbf{k}\rangle$ as a unitary map,

$$
U_A = \int d\mathbf{k}\; |M_\alpha \mathbf{k}\rangle\langle \mathbf{k}|. \tag{3.5}
$$
Similarly for the other mirrors. Since a reflection is an orthogonal transformation that is orientation reversing, an even number of reflections is a rotation and an odd number is a reflection. The total deflection associated with the path $E-A-F$ is the reflection

$$
M_\phi M_\alpha M_\eta = M_{\tilde\alpha}, \qquad \tilde\alpha = -\alpha + \eta + \phi. \tag{3.6}
$$

Similarly for the path $E-B-F$,

$$
M_\phi M_\beta M_\eta = M_{\tilde\beta}, \qquad \tilde\beta = -\beta + \eta + \phi. \tag{3.7}
$$

¹ A scattering matrix $S$ is time-reversal invariant if $TS = S^{-1}T$, with $T$ the anti-unitary time reversal. Gauge freedom allows one to redefine $S_{jk} \to z_j S_{jk} \bar{w}_k$, where $|z_j| = |w_k| = 1$ are arbitrary phases assigned to the incoming and outgoing modes.
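The composition rules (3.6) and (3.7) can be checked numerically from the matrix in (3.4) alone. The sketch below is only an illustration: the function names and the sample angles are assumptions, and the ordering `M_last @ M_middle @ M_first` encodes the convention that the rightmost mirror acts first.

```python
import numpy as np


def mirror(angle):
    """Reflection matrix M_angle of Eq. (3.4)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [s, -c]])


def path_reflection(first, middle, last):
    """Net action of three successive mirrors: M_last @ M_middle @ M_first."""
    return mirror(last) @ mirror(middle) @ mirror(first)


if __name__ == "__main__":
    eta, alpha, phi = 0.3, -0.7, 1.1  # arbitrary sample angles (radians)
    lhs = path_reflection(eta, alpha, phi)   # path E - A - F
    rhs = mirror(-alpha + eta + phi)         # M_{alpha tilde}
    print(np.allclose(lhs, rhs))
    # Two reflections give a rotation (det = +1); three give a reflection (det = -1).
    print(np.linalg.det(mirror(phi) @ mirror(alpha)), np.linalg.det(lhs))
```

The product of two such reflections, $M_a M_b$, is the rotation by $a - b$, which is why only the combination $\eta + \phi$ enters $\tilde\alpha$ and $\tilde\beta$.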
One can decorate the circuit with unitaries that represent phase delays between channels. This is important in practice, as it allows one to tune the interferometers. However, for the sake of simplicity, these phases are omitted here.
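4 Three Interfering Paths

A photon entering the circuit at the blue port 1 of Fig. 3 has three routes to the detector: the route through $E-A-F$, the route through $E-B-F$, and the route through $C$. By the linearity of quantum mechanics, the amplitude at the detector is the sum of the amplitudes associated with the three routes:

$$
\langle 1, \mathbf{k}'|U_A|1, \mathbf{k}\rangle + \langle 1, \mathbf{k}'|U_B|1, \mathbf{k}\rangle + \langle 1, \mathbf{k}'|U_C|1, \mathbf{k}\rangle. \tag{4.1}
$$

The three amplitudes can be computed explicitly by tracing the gates in the three channels in Fig. 3. Since the beam splitters and mirrors act on different parts of the tensor product, their actions commute. And, since all the rotation angles are in the plane, they too commute. Using Eqs. 3.3 and 3.5 one finds

$$
\langle 1, \mathbf{k}'|U_A|1, \mathbf{k}\rangle = \langle 1|S_{12}|2\rangle\,\langle 2|S_{23}|2\rangle^2\,\langle 2|S_{12}|1\rangle\;\delta(\mathbf{k}' - M_{\tilde\alpha}\mathbf{k}) = -\frac{q}{2}\,\delta(\mathbf{k}' - M_{\tilde\alpha}\mathbf{k}), \tag{4.2}
$$

$$
\langle 1, \mathbf{k}'|U_B|1, \mathbf{k}\rangle = \langle 1|S_{12}|2\rangle\,\langle 2|S_{23}|3\rangle\,\langle 3|S_{23}|2\rangle\,\langle 2|S_{12}|1\rangle\;\delta(\mathbf{k}' - M_{\tilde\beta}\mathbf{k}) = \frac{q}{2}\,\delta(\mathbf{k}' - M_{\tilde\beta}\mathbf{k}), \tag{4.3}
$$

$$
\langle 1, \mathbf{k}'|U_C|1, \mathbf{k}\rangle = \langle 1|S_{12}|1\rangle^2\;\delta(\mathbf{k}' - M_\gamma\mathbf{k}) = p\,\delta(\mathbf{k}' - M_\gamma\mathbf{k}). \tag{4.4}
$$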
The only subtlety here is the relative sign between Eq. 4.2 and Eq. 4.3, which comes from $i^2 = -1$ and $i^4 = 1$. The first term in Eq. 4.1 depends on $\tilde\alpha$, the angle accumulated along the path through $A$. Similarly, the second term depends on $\tilde\beta$ and the third on $\gamma$. The detection amplitude is therefore a function of the three variables $(\tilde\alpha, \tilde\beta, \gamma)$. Since $\tilde\alpha$ and $\tilde\beta$ depend on $\eta + \phi$, variations of the angles that satisfy $\delta\eta + \delta\phi = 0$ do not affect the detection. More generally, variations of the five mirrors that satisfy

$$
\delta\alpha = \delta(\eta + \phi), \qquad \delta\beta = \delta(\eta + \phi) \tag{4.5}
$$

do not affect the detection amplitudes at all.
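The channel weights in Eqs. 4.2 to 4.4 follow from multiplying the beam-splitter matrix elements of Eq. 3.3 along each route. The sketch below reproduces them numerically, assuming (as the results $-q/2$, $q/2$ and $p$ suggest) that each route crosses the outer splitter $S_{12}$ twice and that the inner routes cross $S_{23}$ twice; the function names and the choice $p:q = 1:2$ are illustrative.

```python
import numpy as np


def s12(p, q):
    """Outer beam splitter of Eq. (3.3); rows and columns index channels (1, 2)."""
    return np.array([[np.sqrt(p), 1j * np.sqrt(q)],
                     [1j * np.sqrt(q), np.sqrt(p)]])


# Inner 1:1 beam splitter of Eq. (3.3); rows and columns index channels (2, 3).
S23 = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)


def path_weights(p, q):
    """Channel-space weights of the routes through A, B and C."""
    a = s12(p, q)
    w_a = a[0, 1] * S23[0, 0] * S23[0, 0] * a[1, 0]  # 1->2, stay in 2 twice, 2->1
    w_b = a[0, 1] * S23[0, 1] * S23[1, 0] * a[1, 0]  # 1->2, 2->3, 3->2, 2->1
    w_c = a[0, 0] * a[0, 0]                          # stay in channel 1 twice
    return w_a, w_b, w_c


if __name__ == "__main__":
    p, q = 1 / 3, 2 / 3
    print(path_weights(p, q))  # compare with -q/2, +q/2 and p
```

The route through $A$ picks up two factors of $i$ and the route through $B$ four, which is the origin of the relative sign noted in the text.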
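4.1 The Amplitudes of the Outgoing State

Let $|\varphi\rangle$ be the incoming one-photon state in channel 1. Denote the outgoing state

$$
|\tilde\varphi\rangle = (U_A + U_B + U_C)\,|\varphi\rangle. \tag{4.6}
$$

Taking into account that a reflection is its own inverse, Eqs. 4.2–4.4 give for the outgoing amplitude

$$
\langle 1, \mathbf{k}|\tilde\varphi\rangle = -\frac{q}{2}\,\varphi(M_{\tilde\alpha}\mathbf{k}) + \frac{q}{2}\,\varphi(M_{\tilde\beta}\mathbf{k}) + p\,\varphi(M_\gamma\mathbf{k}). \tag{4.7}
$$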
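When $\alpha = \beta$, then also $\tilde\alpha = \tilde\beta$, and the first two terms cancel. The detection amplitude then depends only on $\gamma$. A photon that goes through $E$ does not get to the detector and is dumped in the 3 (green) output channel.

4.2 The Small Parameter: $\varepsilon = \alpha - \beta$

In the case that the mirrors $A$ and $B$ are synchronized so that $\alpha \approx \beta$, the first two terms almost cancel, and to leading order in $\varepsilon = \alpha - \beta$ we have

$$
\langle 1, \mathbf{k}|\tilde\varphi\rangle \approx p\,\varphi(M_\gamma\mathbf{k}) - \varepsilon\,\frac{q}{2}\,\varphi'(M_{\tilde\alpha}\mathbf{k}), \tag{4.8}
$$

where $\varphi'$ is the derivative of $\varphi$ with respect to the angle. This amplitude depends on $\eta + \phi$, as they appear in $\tilde\alpha$. However,

$$
\partial_\eta \langle 1, \mathbf{k}|\tilde\varphi\rangle \approx -\varepsilon\,\frac{q}{2}\,\varphi''(M_{\tilde\alpha}\mathbf{k}) \tag{4.9}
$$

is small because of the factor $\varepsilon$. We conclude that when $\alpha \approx \beta$ the detector is insensitive to changes in the angles of the mirrors $E$ and $F$. If we further assume that

$$
\alpha \approx \beta, \qquad \gamma \approx \tilde\alpha, \tag{4.10}
$$

then to leading order in $\varepsilon$

$$
\langle 1, \mathbf{k}|\tilde\varphi\rangle \approx p\,\varphi(M_\gamma\mathbf{k}) - \varepsilon\,\frac{q}{2}\,\varphi'(M_\gamma\mathbf{k}). \tag{4.11}
$$

The right-hand side is now independent of $\eta$ and $\phi$. In particular, Eq. 4.10 holds if all the angles are small. This is the case considered in [1].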
5 The Quad-Detector

A quad-detector is a photo-detector that realizes the quantum observable given by the difference of projections

$$
I = W_+ - W_-, \qquad W_\pm = |w_\pm\rangle\langle w_\pm|, \tag{5.1}
$$

where $w_\pm$ are the characteristic functions of the boxes $W_\pm$, illustrated in Fig. 4,

$$
W_\pm = \{(x, y) \mid 0 > x > -1,\ 0 < \pm y < 1\}. \tag{5.2}
$$

The probability amplitude for being detected by $W_\pm$ is $\langle w_\pm|\tilde\varphi\rangle$.
5.1 Calibrating the Detector

Equation 4.8 gives the detection amplitudes

$$
\langle w_\pm|\tilde\varphi\rangle = A_\pm + \varepsilon B_\pm. \tag{5.3}
$$

The interesting term $\varepsilon B_\pm$ is masked by the dominant $A_\pm$. The quad-detector comes with a neat calibration trick that exposes the subdominant term by magnifying it with the dominant term $A_\pm$. The signal of the quad-detector is

$$
|\langle w_+|\tilde\varphi\rangle|^2 - |\langle w_-|\tilde\varphi\rangle|^2 \approx \underbrace{|A_+|^2 - |A_-|^2}_{\text{calibrate to } 0} + 2\varepsilon\operatorname{Re}\!\left(A_+\bar{B}_+ - A_-\bar{B}_-\right). \tag{5.4}
$$

By shifting the quad-detector up or down one can make the leading term vanish. This calibration makes the quad-detector sensitive to the sub-leading term of $O(\varepsilon)$.
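The expansion in Eq. 5.4 can be illustrated with a few complex numbers. In the sketch below the amplitudes $A_\pm$, $B_\pm$ and the value of $\varepsilon$ are made-up sample values (chosen so that $|A_+| = |A_-|$, mimicking a calibrated detector); the function names are likewise illustrative.

```python
def quad_signal(a_plus, a_minus, b_plus, b_minus, eps):
    """Exact signal |A+ + eps B+|^2 - |A- + eps B-|^2."""
    return abs(a_plus + eps * b_plus) ** 2 - abs(a_minus + eps * b_minus) ** 2


def first_order_signal(a_plus, a_minus, b_plus, b_minus, eps):
    """Right-hand side of Eq. (5.4): leading term plus the O(eps) term."""
    leading = abs(a_plus) ** 2 - abs(a_minus) ** 2
    cross = (a_plus * b_plus.conjugate() - a_minus * b_minus.conjugate()).real
    return leading + 2.0 * eps * cross


if __name__ == "__main__":
    # Hypothetical amplitudes with |A+| = |A-|, i.e. a calibrated detector.
    a_p, a_m = 0.6 + 0.2j, 0.2 - 0.6j
    b_p, b_m = 0.3 - 0.1j, -0.2 + 0.5j
    for eps in (1e-1, 1e-2, 1e-3):
        exact = quad_signal(a_p, a_m, b_p, b_m, eps)
        approx = first_order_signal(a_p, a_m, b_p, b_m, eps)
        print(eps, exact, approx, exact - approx)
```

The difference between the two expressions is exactly $\varepsilon^2(|B_+|^2 - |B_-|^2)$, so after calibration the signal is linear in $\varepsilon$ up to a quadratic correction.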
Fig. 4 A quad-detector is a photo-detector whose output is the difference of the signal of the upper box .W+ from the signal of the lower box .W−
6 Concluding Remarks

• The authors of [1] describe an experiment which agrees with a prediction of the Aharonov-Vaidman theory [2] that post-selected quantum particles have discontinuous quantum trajectories.
• Orthodox quantum mechanicians will find the claim "quantum particles have discontinuous paths" shocking.
• In the Feynman path integral formulation of quantum evolution [5], a quantum particle tries all continuous paths simultaneously. Each path comes with a complex phase factor, $e^{iS/\hbar}$.
• The orthodox quantum mechanical analysis of the experiment [1] attributes the observation to the almost perfect destructive interference of paths from the source to the detector.
• Asher Peres coined the aphorism: Physics is not an exact science, it is the science of approximation. The observation in [1] reflects the approximate nature of the measurement, whereas the statement "the photon path is discontinuous" is dichotomic. As such, it is difficult to reconcile with an approximation to reality.
• As every amateur detective knows, the absence of footprints does not rule out a crime.

Acknowledgments I thank Shimshon Bar-Ad and especially Lev Vaidman for several helpful conversations, Yoav Sagi for his criticism, and Oded Kenneth for insightful comments and for purging an early version of the manuscript of errors and obscurities. I thank Thierry Paul for a careful reading of the manuscript and for many useful suggestions.
References
1. A. Danan, D. Farfurnik, S. Bar-Ad, and L. Vaidman. Asking photons where they have been. Phys. Rev. Lett., 111:240402, Dec 2013.
2. Yakir Aharonov and Lev Vaidman. The two-state vector formalism: An updated review. Lecture Notes in Physics, 734:399–447, 2008.
3. Lev Vaidman. Weak value controversy. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 375.2106, 2017.
4. Christopher Fuchs and Asher Peres. Quantum theory needs no “interpretation”. Physics Today, 53, 09 2000.
5. Richard Phillips Feynman and Albert Roach Hibbs. Quantum mechanics and path integrals. International series in pure and applied physics. McGraw-Hill, New York, NY, 1965.
About the Derivation of the Quasilinear Approximation in Plasma Physics Claude Bardos and Nicolas Besse
Abstract This contribution, built on the companion paper [1], is focused on the different mathematical approaches available for the analysis of the quasilinear approximation in plasma physics.
1 Introduction and Notation

The origin of this contribution is the issue of the approximation of solutions of the Vlasov equation

$$
\partial_t f + v\cdot\nabla_x f + E\cdot\nabla_v f = 0,
$$

where $f(t, x, v)$ is a probability density driven by a self-consistent potential, given in terms of this density by the Poisson equation

$$
-\Delta\Phi(t, x) = \rho(t, x) = \int_{\mathbb{R}^d_v} f(t, x, v)\,dv - 1, \qquad E(t, x) = -\nabla\Phi(t, x),
$$

in the domain $\mathbb{T}^d \times \mathbb{R}^d_v$, with $\mathbb{T}^d = (\mathbb{R}_x/2\pi\mathbb{Z})^d$, by a parabolic (linear or nonlinear) diffusion equation for the space-averaged density of particles, namely

$$
\partial_t f(t, v) - \nabla_v\cdot\left(D(t, v)\nabla_v f(t, v)\right) = 0. \tag{1}
$$
C. Bardos () Laboratoire J.-L. Lions, Sorbonne Université, Paris, France N. Besse Laboratoire J.-L. Lagrange, Observatoire de la Côte d’Azur, Université Côte d’Azur, Nice, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_14
Such an equation is called the quasilinear approximation and is a very active subject in plasma physics. Here, relying on a companion paper [1] (devoted to a more detailed physical analysis and more focused on the interpretation of turbulence), we focus on the different mathematical approaches motivated by recent progress on the analysis of the Vlasov equation (for instance, around the question of Landau damping). Starting from the natural scaling derived for instance in [1], we propose a rescaled version of the Vlasov equation and first show obstructions to the convergence to an equation of the type (1) with a non-zero diffusion. This leads to the introduction of a stochastic vector field, hence to a non-self-consistent Liouville equation. There, a direct approach (as in the contributions of A. Vasseur and coworkers [26, 28]) produces a complete positive answer. A more classical analysis leads to a comparison with the present results on Landau damping and to an alternative approach based on spectral theory which, at variance with the rescaled equation, is valid only for short times. No complete proof is given, but a natural road to convergence based on some plasma physics computations is proposed. The complete proof will be addressed in future work. As a short-time correction, it may play in the subject the same role as the introduction of the diffusion in the macroscopic limit of the Boltzmann equation by Ellis and Pinsky [13]. As a mathematical contribution to physics (however modest it may be), we dedicate this chapter to the memory of Alex Grossmann. Besides being recognized for his superb scientific achievements, in particular the introduction of wavelets, he will be remembered for many years for his generous and charismatic influence on our community.
1.1 Notation and Some Hypotheses
The flow $S(t) : f \mapsto f(t, x - vt, v)$, with $s \mapsto x - vs$ denoting the free advection flow modulo $(2\pi)^d$ on $\mathbb{T}^d$, is the advection flow generated by the operator $-v\cdot\nabla_x$. In the same way, we also introduce the flow $S^\varepsilon$ generated by the operator $-\varepsilon^{-2}v\cdot\nabla_x$ and defined by
$$S^\varepsilon_t f = S_{t/\varepsilon^2} f = f\Big(t,\, x - v\frac{t}{\varepsilon^2},\, v\Big).$$
These are unitary groups in any $L^p(\mathbb{T}^d\times\mathbb{R}^d_v)$, with $1\le p\le\infty$, which preserve the positivity and the total mass. Since regularity of the data is not the relevant issue for our discussion, for $\varepsilon>0$ the initial data $f^\varepsilon(0,x,v)$ is assumed to be independent of $\varepsilon$ and as smooth as required (taking into account existing regularity results [16] for the Vlasov equation) to have global-in-time solutions for which the relevant computations are justified. On the other hand, emphasis has to be put on the regularity estimates that are independent of $\varepsilon$.
As observed in many previous publications starting from Lanford [23], limits can be obtained not at the level of the equation but at the level of the solution itself. As a consequence, we will use the first-order Duhamel formula
$$f(t,x,v) = S_t f(0) - \int_0^t d\sigma_1\, S_{t-\sigma_1}E(\sigma_1)\cdot\nabla_v f(\sigma_1),$$
and in Sect. 2 an avatar of the second-order Duhamel formula, obtained by connecting the value of $f(\sigma_1)$ with the value of $f(\sigma_2)$ according to the Duhamel formula
$$f(\sigma_1) = S_{\sigma_1-\sigma_2}f(\sigma_2) - \int_{\sigma_2}^{\sigma_1} d\sigma\, S_{\sigma_1-\sigma}E(\sigma)\cdot\nabla_v f(\sigma),$$
which finally gives
$$f(t) = S_t f(0) - \int_0^t d\sigma_1\, S_{t-\sigma_1}E(\sigma_1)\cdot\nabla_v S_{\sigma_1-\sigma_2}f(\sigma_2) + \int_0^t d\sigma_1\int_{\sigma_2}^{\sigma_1} d\sigma\, S_{t-\sigma_1}E(\sigma_1)\cdot\nabla_v\big(S_{\sigma_1-\sigma}E(\sigma)\cdot\nabla_v f(\sigma)\big).$$
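The first-order Duhamel formula can be illustrated on a scalar analogue. The sketch below is not taken from the chapter: it assumes a single Fourier mode, so that free transport reduces to multiplication by $e^{-iat}$ (with a hypothetical frequency `a` standing in for $k\cdot v$), and it replaces the term $E\cdot\nabla_v f$ by a prescribed forcing `g`. It compares the Duhamel integral with a direct Runge–Kutta integration of $u' = -iau - g(t)$.

```python
import cmath


def duhamel(u0, a, g, t, n=4000):
    """u(t) = e^{-iat} u0 - int_0^t e^{-ia(t-s)} g(s) ds, by the trapezoid rule."""
    h = t / n
    total = 0j
    for j in range(n + 1):
        s = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(-1j * a * (t - s)) * g(s)
    return cmath.exp(-1j * a * t) * u0 - h * total


def rk4(u0, a, g, t, n=4000):
    """Integrate u' = -i a u - g(t) with the classical fourth-order Runge-Kutta scheme."""
    h = t / n
    u = complex(u0)

    def rhs(s, y):
        return -1j * a * y - g(s)

    for j in range(n):
        s = j * h
        k1 = rhs(s, u)
        k2 = rhs(s + h / 2, u + h / 2 * k1)
        k3 = rhs(s + h / 2, u + h / 2 * k2)
        k4 = rhs(s + h, u + h * k3)
        u += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u


def forcing(s):
    """Hypothetical forcing, chosen only because the integral has a closed form."""
    return cmath.exp(-s)
```

Both routines should agree up to discretization error, which is the scalar counterpart of the statement that the Duhamel formula is an exact reformulation of the evolution equation.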
Denoting by $\fint dx$ the $x$-average on $\mathbb{T}^d$, one obtains
$$\partial_t\fint f(t,x,v)\,dx + \nabla_v\cdot\fint E(t,x)f(t,x,v)\,dx = 0. \tag{2}$$
The second term of (2) is the divergence of the averaged flux
$$J = \fint E(t,x)f(t,x,v)\,dx.$$
Therefore, almost all the rest of this contribution is devoted to the determination of such a flux (sometimes called a "Fick law"). Since weak convergence involves distributions and test functions, such duality is denoted by the bracket notation $\langle\cdot,\cdot\rangle$. Using the fact that $\mathcal{D}(\mathbb{R}_t\times\mathbb{T}^d\times\mathbb{R}^d_v) = \mathcal{D}(\mathbb{R}_t)\otimes\mathcal{D}(\mathbb{T}^d)\otimes\mathcal{D}(\mathbb{R}^d_v)$, test functions depending on one or another of these spaces will be used according to convenience. The symbol $\overline{T^\varepsilon}$ will be used to denote any cluster point (in the sense of distributions or under some other stronger topology) of a family $\{T^\varepsilon\}_\varepsilon$ of bounded distributions. The scalar product in $\mathbb{R}^d$ is either denoted by the dot symbol $\cdot$ or enclosed in parentheses $(\cdot,\cdot)$. Finally, $z^*$ denotes the complex conjugate of the complex number $z$.
1.2 The Rescaled Liouville Equation
Both from plasma physics considerations (see [1]) and because it is compatible with the scaling invariance of the diffusion in the velocity variable (see Eq. (1)), densities $f^\varepsilon(t,x,v)$, solutions of the following rescaled (with $\varepsilon>0$) Liouville equation,
$$\partial_t f^\varepsilon + \frac{v}{\varepsilon^2}\cdot\nabla_x f^\varepsilon + \frac{E^\varepsilon}{\varepsilon}\cdot\nabla_v f^\varepsilon = 0, \qquad\text{with } E^\varepsilon = -\nabla\Phi^\varepsilon, \tag{3}$$
are considered. As will soon appear below, the specific behavior of the solutions as $\varepsilon\to0$ (and/or as $t\to\infty$) depends on the time regularity of the potential rather than on its space regularity, and on properties of the initial data. Then, unless otherwise specified, it is assumed below that the initial data $f_0\in\mathcal{S}(\mathbb{T}^d\times\mathbb{R}^d_v)$ is an $\varepsilon$-independent smooth function and that $E^\varepsilon = -\nabla\Phi^\varepsilon$ is, locally in time, uniformly Lipschitz with respect to the variable $x$. Under such hypotheses, solutions of Eq. (3) are well and uniquely defined. On the other hand, no assumption is made on the uniform regularity of the solution $f^\varepsilon(t,x,v)$ either as $t\to\infty$ or as $\varepsilon\to0$. Hence, the only $\varepsilon$-independent estimates, which are in agreement with the classical results (including in particular those concerning the solution of the Vlasov equation), are based on the fact that the Liouville equation preserves positivity and Lebesgue measure:
$$\forall t\in\mathbb{R}^+,\quad 0\le f^\varepsilon(t,x,v)\le\sup_{(x,v)\in\mathbb{T}^d\times\mathbb{R}^d_v} f_0(x,v), \qquad \int_{\mathbb{T}^d\times\mathbb{R}^d_v} f^\varepsilon(t,x,v)\,dx\,dv = 1,$$
$$\forall\, 1\le p\le\infty,\ \forall t\in\mathbb{R}^+,\quad \|f^\varepsilon(t)\|_{L^p(\mathbb{T}^d\times\mathbb{R}^d_v)} = \|f(0)\|_{L^p(\mathbb{T}^d\times\mathbb{R}^d_v)}.$$
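To use the scaling and the ergodicity of the $d$-dimensional torus, the following proposition is recalled.
Proposition 1 (Ergodicity)
1. Any $g\in L^p(\mathbb{T}^d\times\mathbb{R}^d_v)$, with $1\le p\le\infty$, which satisfies the relation
$$v\cdot\nabla_x g = 0 \quad\text{in } \mathcal{D}'(\mathbb{T}^d\times\mathbb{R}^d_v), \tag{4}$$
is an $x$-independent function.
2. The solutions $f^\varepsilon$ of the equation
$$\varepsilon^2\partial_t f^\varepsilon + v\cdot\nabla_x f^\varepsilon = 0, \qquad f^\varepsilon(0,x,v) = f_0(x,v)\in L^p(\mathbb{T}^d\times\mathbb{R}^d_v), \tag{5}$$
converge in $L^\infty(\mathbb{R}^+_t;\mathcal{D}'(\mathbb{R}^d_v))$ to $\fint f_0(x,v)\,dx$.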
This proposition and its use in Landau damping are classical (see [5, 18]) and, for the sake of completeness, its proof is briefly recalled below.
Proof From (4), one deduces the relation
$$\frac{d}{dt}\,g(x - vt, v) = 0,$$
which by Fourier transform gives
$$\big(1 - \exp(ik\cdot vt)\big)\,\hat g(k,v) = 0, \qquad \forall k\in\mathbb{Z}^d,\ v\in\mathbb{R}^d,\ t\in\mathbb{R}.$$
This relation implies that the support of $\hat g$ is contained in the set
$$\operatorname{supp}(\hat g) := \{(k,v)\in\mathbb{Z}^d\times\mathbb{R}^d \mid k\cdot vt\in2\pi\mathbb{Z},\ \forall t\in\mathbb{R}\}.$$
For any $\delta, T, r, R>0$ such that $\delta<T$ and $r<R$, the Lebesgue measure of the set $\operatorname{supp}(\hat g)$ for $\delta<t<T$ and $r<|v|<R$ is zero. Therefore, since $g$ belongs to $L^p(\mathbb{T}^d\times\mathbb{R}^d_v)$, this forces $\hat g$ to be equal to zero for all $k\neq0$, and finally one obtains
$$g = \hat g(0,v) = \fint g(x,v)\,dx.$$
In the same way, for point 2, use the fact that the solution of (5) is given by $f^\varepsilon(t,x,v) = f_0(x - vt/\varepsilon^2, v)$ to write, for all $k\in\mathbb{Z}^d$ and for any $\phi\in\mathcal{D}(\mathbb{R}^d_v)$,
$$\int_{\mathbb{R}^d_v}\hat f^\varepsilon(t,k,v)\phi(v)\,dv = \int_{\mathbb{R}^d_v}dv\fint dx\, f^\varepsilon(t,x,v)e^{-ik\cdot x}\phi(v) = \int_{\mathbb{R}^d_v}\hat f_0(k,v)\phi(v)\,e^{ik\cdot v\frac{t}{\varepsilon^2}}\,dv. \tag{6}$$
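Since $f_0(x,v)\in L^p(\mathbb{T}^d\times\mathbb{R}^d_v)$, by the Riemann–Lebesgue theorem, the right-hand side of (6) goes to 0 for $k\neq0$ as $\varepsilon\to0$, which completes the proof of point 2.
As a consequence, writing the rescaled Liouville equation in the following form,
$$v\cdot\nabla_x f^\varepsilon = -\varepsilon^2\partial_t f^\varepsilon - \varepsilon E^\varepsilon\cdot\nabla_v f^\varepsilon, \tag{7}$$
one deduces from the uniform estimates that any cluster point $\overline{f^\varepsilon} = \overline{f^\varepsilon}(t,v)$ of the family $\{f^\varepsilon\}_\varepsilon$, in the $L^\infty(\mathbb{R}^+_t\times\mathbb{T}^d\times\mathbb{R}^d_v)$ weak-$*$ topology, is independent of $x$ and is a solution of the equation
$$\partial_t\overline{f^\varepsilon} + \nabla_v\cdot\overline{\fint\frac{E^\varepsilon f^\varepsilon}{\varepsilon}\,dx} = 0. \tag{8}$$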
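Point 2 of Proposition 1 can be checked numerically on a single Fourier mode. The sketch below is an illustration under assumptions that are not in the text: one space and one velocity dimension, the initial datum $f_0(x,v) = (1+\cos x)\,G(v)$ with $G$ the standard Gaussian (so that $\hat f_0(1,v) = G(v)/2$), and a Gaussian $\phi(v) = e^{-v^2/2}$ used in place of a compactly supported test function. With $T = t/\varepsilon^2$, the pairing in (6) then has the closed form $e^{-T^2/4}/(2\sqrt2)$, which decays very quickly as $\varepsilon\to0$.

```python
import cmath
import math


def mode_pairing(T, n=4000, vmax=8.0):
    """Trapezoid value of int f0hat(1, v) phi(v) exp(i v T) dv with T = t / eps^2."""
    h = 2 * vmax / n
    total = 0j
    for j in range(n + 1):
        v = -vmax + j * h
        weight = 0.5 if j in (0, n) else 1.0
        f0hat = 0.5 * math.exp(-v * v / 2) / math.sqrt(2 * math.pi)  # mode k = 1
        phi = math.exp(-v * v / 2)  # Gaussian stand-in for a test function
        total += weight * f0hat * phi * cmath.exp(1j * v * T)
    return h * total


def closed_form(T):
    """Exact value of the same pairing for the Gaussian choices above."""
    return math.exp(-T * T / 4) / (2 * math.sqrt(2))
```

For instance, evaluating `mode_pairing(1 / eps**2)` for `eps` equal to 1, 0.5, and 0.25 gives rapidly decreasing moduli, in line with the Riemann–Lebesgue argument; the Gaussian rate is specific to this analytic example.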
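In (8), the Fick law (relating the variation of the density to the divergence of the current) appears as the ratio of two terms going at least formally to zero, because under the hypothesis $\overline{E^\varepsilon f^\varepsilon} = \overline{E^\varepsilon}\,\overline{f^\varepsilon}$ one has
$$\fint\overline{E^\varepsilon}\,\overline{f^\varepsilon}\,dx = -\fint\nabla\overline{\Phi^\varepsilon}(t,x)\,\overline{f^\varepsilon}(t,v)\,dx = 0.$$
The justification of the quasilinear approximation would be the proof that
$$\overline{\fint\frac{E^\varepsilon f^\varepsilon}{\varepsilon}\,dx} = -D(t,v)\nabla_v\overline{f^\varepsilon}, \tag{9}$$
with $D(t,v)$ being a non-negative diffusion matrix.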
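1.3 Obstruction to the Convergence to a Non-degenerate Diffusion Matrix
The Liouville equation is the paradigm of a Hamiltonian system, while the diffusion equation is the model of an irreversible phenomenon. A paradox then comes from the comparison of the two equations
$$\frac12\frac{d}{dt}\int_{\mathbb{R}^d_v}dv\fint dx\,|f^\varepsilon|^2 = 0 \quad\text{and}\quad \frac12\frac{d}{dt}\int_{\mathbb{R}^d_v}dv\fint dx\,|\overline{f^\varepsilon}|^2 + \int_{\mathbb{R}^d_v}dv\fint dx\,\big(D(t,v)\nabla_v\overline{f^\varepsilon},\nabla_v\overline{f^\varepsilon}\big) = 0. \tag{10}$$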
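This paradox has to be resolved to justify, in some cases, that the diffusion matrix $D$ is not degenerate, taking into account for instance the following:
Proposition 2 Assuming that $E^\varepsilon = -\nabla\Phi^\varepsilon$ is uniformly bounded in $L^\infty(0,T;W^{1,\infty}(\mathbb{T}^d))$ and that $\overline{f^\varepsilon}(t,v)$ is the solution of (7) with $f^\varepsilon(0,x,v) = \fint f_0(x,v)\,dx$, one has the following facts:
1. The density $f^\varepsilon(t,x,v)$ converges strongly in $L^2([0,T]\times\mathbb{T}^d\times\mathbb{R}^d_v)$ if and only if
$$\int_0^T dt\int_{\mathbb{R}^d_v}dv\,\big(D(t,v)\nabla_v\overline{f^\varepsilon}(t,v),\nabla_v\overline{f^\varepsilon}(t,v)\big) = 0.$$
2. If $\partial_t\Phi^\varepsilon(t,x)$ is bounded in some distribution space $L^1(0,T;H^{-\beta}(\mathbb{T}^d))$ with some finite $\beta$ and if $\varepsilon\partial_t\Phi^\varepsilon$ converges to 0 in $L^1([0,T]\times\mathbb{T}^d)$, then one obtains
$$\partial_t\overline{f^\varepsilon} = 0 \quad\text{and}\quad v\cdot\overline{\fint\frac{E^\varepsilon f^\varepsilon}{\varepsilon}\,dx} = 0 \quad\text{on } [0,T]\times\mathbb{R}^d_v.$$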
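Proof Point 1 is a direct consequence of the comparison between the classical Hilbertian estimate
$$\forall t\in(0,T),\quad \int_0^t\|f^\varepsilon(s)\|^2_{L^2(\mathbb{T}^d\times\mathbb{R}^d_v)}\,ds \ge \int_0^t\|\overline{f^\varepsilon}(s)\|^2_{L^2(\mathbb{T}^d\times\mathbb{R}^d_v)}\,ds$$
and the two equations appearing in formula (10). For point 2, multiply the rescaled Liouville equation by $\varepsilon\Phi^\varepsilon(t,x)\theta(t)\phi(v)$ to obtain after integration
$$\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\phi(v)\fint\frac{v\cdot\nabla_x f^\varepsilon(t,x,v)\,\Phi^\varepsilon(t,x)}{\varepsilon}\,dx\,dv\,dt = \int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\nabla_v\phi(v)\cdot\fint E^\varepsilon(t,x)\Phi^\varepsilon(t,x)f^\varepsilon(t,x,v)\,dx\,dv\,dt + \varepsilon\int_{\mathbb{R}^+_t}dt\fint dx\int_{\mathbb{R}^d_v}dv\, f^\varepsilon(t,x,v)\,\partial_t\big(\theta(t)\Phi^\varepsilon(t,x)\big)\phi(v). \tag{11}$$
Now, since $\varepsilon\partial_t\Phi^\varepsilon(t,x)$ converges strongly to zero in $L^1([0,T]\times\mathbb{T}^d)$ as $\varepsilon\to0$, the last term of the right-hand side of (11) goes to 0 as $\varepsilon\to0$, while for the first term of the right-hand side of (11), with the Aubin–Lions theorem (see for instance [30]), one obtains
$$\overline{\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\nabla_v\phi(v)\cdot\fint E^\varepsilon\Phi^\varepsilon f^\varepsilon\,dx\,dv\,dt} = -\overline{\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\nabla_v\phi(v)\cdot\fint\frac12\nabla_x|\Phi^\varepsilon(t,x)|^2 f^\varepsilon(t,x,v)\,dx\,dv\,dt} = -\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\,\overline{f^\varepsilon}(t,v)\,\nabla_v\phi(v)\cdot\fint\frac12\nabla_x|\overline{\Phi^\varepsilon}(t,x)|^2\,dx\,dv\,dt = 0. \tag{12}$$
Finally, using $E^\varepsilon = -\nabla\Phi^\varepsilon$ and an integration by parts in the variable $x$, for the left-hand side of (9), we obtain from (11)–(12)
$$\overline{\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\phi(v)\,v\cdot\fint\frac{E^\varepsilon f^\varepsilon}{\varepsilon}\,dx\,dv\,dt} = -\overline{\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\phi(v)\fint\frac{v\cdot\nabla_x\Phi^\varepsilon\, f^\varepsilon}{\varepsilon}\,dx\,dv\,dt} = \overline{\int_{\mathbb{R}^+_t}\int_{\mathbb{R}^d_v}\theta(t)\phi(v)\fint\frac{v\cdot\nabla_x f^\varepsilon\,\Phi^\varepsilon}{\varepsilon}\,dx\,dv\,dt} = 0. \tag{13}$$
Then, one observes that any vector-valued function $v\mapsto\psi(v)\in\mathcal{D}(\mathbb{R}^d_v;\mathbb{R}^d_v)$, with $\psi(0)=0$, can be written with the introduction of a function $v\mapsto\gamma(v)\in\mathcal{D}(\mathbb{R}^d_v)$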
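equal to 1 on the support of $\psi(v)$ as
$$\psi(v) = \int_0^1\nabla_v\psi(sv)\,ds\;v = \gamma(v)\int_0^1\nabla_v\psi(sv)\,ds\;v = \varphi(v)\,v, \qquad\text{with } \varphi(v) := \gamma(v)\int_0^1\nabla_v\psi(sv)\,ds. \tag{14}$$
From (13)–(14), one concludes that for any such vector-valued function $v\mapsto\psi(v)$ with $\psi(0)=0$, one obtains
$$\int_{\mathbb{R}^+_t}\theta(t)\int_{\mathbb{R}^d_v}\psi(v)\cdot\overline{\fint\frac{E^\varepsilon(t,x)f^\varepsilon(t,x,v)}{\varepsilon}\,dx}\,dv\,dt = 0.$$
Therefore, the support of $\partial_t\overline{f^\varepsilon}(t,v)$ is contained in $[0,T]\times\{v=0\}$. Hence, for any $\theta(t)\in\mathcal{D}(\mathbb{R}^+_t)$ and for any $\phi(v)\in\mathcal{D}(\mathbb{R}^d_v)$ with the point 0 not included in the support of $\phi(v)$, one obtains
$$\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}\overline{\fint f^\varepsilon\,dx}\;\partial_t\theta(t)\phi(v)\,dv\,dt = 0. \tag{15}$$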
However, since $\overline{\fint f^\varepsilon\,dx}\in L^\infty(\mathbb{R}^+_t\times\mathbb{R}^d_v)$, relation (15) remains valid for any test function $\phi\in\mathcal{D}(\mathbb{R}^d_v)$, and this completes the proof of point 2.
Remark 1 For Vlasov–Poisson equations, the "ergodic" convergence of $f^\varepsilon(t,x,v)$ to $\overline{f^\varepsilon}(t,v)$, which has already been proven, implies (using the Poisson equation) the weak convergence to zero of the electric field $E^\varepsilon(t,x)$. In this scaling, this is the "baby Landau damping." Strong convergence will be equivalent to the genuine Landau damping, as proven in Theorem 3 of Sect. 3.3. As a consequence, too much time regularity for the electric field $E^\varepsilon$ may prevent a limit described by a nondegenerate diffusion equation.
Remark 2 From the above observations, one concludes that, as is the case in many related examples, proofs should involve the behavior of the solution itself rather than the asymptotic structure of the equation. Therefore, the use of the Duhamel expansion (up to a convenient order) appears to be a natural tool, and it will appear in two very different approaches. The first one is based on the introduction of stochasticity in the electric field (see Sect. 2). Hence, it corresponds to a situation where the electric field is non-self-consistent: the equation is a genuine Liouville equation. The second approach, based on a short-time asymptotic (see Sect. 3.4), deals with configurations where the electric field is given self-consistently (i.e., determined from the density of particles) through a spectral analysis.
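1.4 The First Iteration of the Duhamel Formula and the Diffusion: Reynolds Electric Stress Tensor
From the previous section, one deduces that the convergence to a genuine diffusion equation requires that the vector field $E^\varepsilon$ or the potential $\Phi^\varepsilon$ becomes "turbulent" as $\varepsilon\to0$. This justifies at present the introduction of the diffusion tensor $D^\varepsilon$. Using the Duhamel formula
$$f^\varepsilon(t) = S^\varepsilon_t f_0 - \frac1\varepsilon\int_0^t S^\varepsilon_{t-\sigma}E^\varepsilon(\sigma)\cdot\nabla_v f^\varepsilon(\sigma)\,d\sigma, \tag{16}$$
and the ergodicity (see Proposition 1), the first term of the right-hand side of (16) is ignored, while multiplying by a test function $\phi(v)$, one obtains for the Fick term the following expression:
$$-\int_{\mathbb{R}^d_v}dv\,\phi(v)\nabla_v\cdot\fint dx\,\frac{E^\varepsilon f^\varepsilon}{\varepsilon} = -\frac{1}{\varepsilon^2}\int_{\mathbb{R}^d_v}dv\,\nabla_v\phi(v)\cdot\int_0^t ds\fint dx\,E^\varepsilon(t)\,S^\varepsilon_{t-s}\big(E^\varepsilon(s)\cdot\nabla_v f^\varepsilon(s)\big). \tag{17}$$
Using the explicit formula,
$$S^\varepsilon_{t-s}\big(E^\varepsilon(s)\cdot\nabla_v f^\varepsilon(s)\big) = E^\varepsilon(s, x - v(t-s)/\varepsilon^2)\cdot(\nabla_v f^\varepsilon)(s, x - v(t-s)/\varepsilon^2, v), \tag{18}$$
the change of variable $\sigma = (t-s)/\varepsilon^2$, and the $2\pi$-periodicity of the functions $E^\varepsilon$ and $f^\varepsilon$, one obtains from (17)
$$-\int_{\mathbb{R}^d_v}dv\,\phi(v)\nabla_v\cdot\fint dx\,\frac{E^\varepsilon f^\varepsilon}{\varepsilon} = -\int_{\mathbb{R}^d_v}dv\,(\nabla_v\phi(v))^T\int_0^{t/\varepsilon^2}d\sigma\fint dx\,E^\varepsilon(t,x+\sigma v)\otimes E^\varepsilon(t-\varepsilon^2\sigma,x)\,\nabla_v f^\varepsilon(t-\sigma\varepsilon^2,x,v)$$
$$= \int_{\mathbb{R}^d_v}dv\fint dx\int_0^{t/\varepsilon^2}d\sigma\,f^\varepsilon(t-\sigma\varepsilon^2,x,v)\,\nabla_v\cdot\big(E^\varepsilon(t-\varepsilon^2\sigma,x)\otimes E^\varepsilon(t,x+\sigma v)\nabla_v\phi(v)\big). \tag{19}$$
Observe that the above integrations by parts are justified on the following grounds: (i) all the arguments, $E^\varepsilon$ and $f^\varepsilon$, are assumed for $\varepsilon>0$ to be smooth enough; (ii) the smooth test function $\phi$ is independent of $x$ and $t$. Further analysis may be decomposed into four steps: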
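1. In (19), replacing $f^\varepsilon$ by another smooth test function, one introduces the diffusion tensor $D^\varepsilon(t,v)$ defined by
$$\int_{\mathbb{R}^d_v}dv\,\big(D^\varepsilon(t,v)\nabla_v\psi(v),\nabla_v\phi(v)\big) = \int_{\mathbb{R}^d_v}dv\,\big(\nabla_v\psi(v),D^\varepsilon(t,v)^T\nabla_v\phi(v)\big) = \int_{\mathbb{R}^d_v}dv\,(\nabla_v\psi(v))^T\fint dx\int_0^{t/\varepsilon^2}d\sigma\,E^\varepsilon(t-\varepsilon^2\sigma,x)\otimes E^\varepsilon(t,x+\sigma v)\,\nabla_v\phi(v).$$
2. Then observe that under a decorrelation hypothesis and with
$$f^\varepsilon(t-\sigma\varepsilon^2,x,v) = \overline{f^\varepsilon}(t,v),$$
one would obtain
$$\int_{\mathbb{R}^d_v}dv\fint dx\int_0^{t/\varepsilon^2}d\sigma\,f^\varepsilon(t-\sigma\varepsilon^2,x,v)\,\nabla_v\cdot\big(E^\varepsilon(t-\varepsilon^2\sigma,x)\otimes E^\varepsilon(t,x+\sigma v)\nabla_v\phi(v)\big) = \int_{\mathbb{R}^d_v}\overline{f^\varepsilon}(t,v)\,\nabla_v\cdot\big(D^\varepsilon(t,v)^T\nabla_v\phi(v)\big)\,dv.$$
3. With the notation
$$E^\varepsilon(t-\varepsilon^2\sigma,x)\otimes E^\varepsilon(t,x+\sigma v) = E(T,X)\otimes E(S,Y)\big|_{T=t-\varepsilon^2\sigma,\ X=x,\ S=t,\ Y=x+\sigma v},$$
there appears some type of average of a tensor that, in the present field ("plasma turbulence"), plays a role very similar to that of the Reynolds hydrodynamic stress tensor $u(t,x)\otimes u(s,y)$ in fluid turbulence.
4. Since the electric field is the gradient of a real $x$-periodic potential, i.e., $E^\varepsilon(t,x) = -\nabla\Phi^\varepsilon(t,x)$, with the notation
$$\hat E^\varepsilon(t,k) = \fint E^\varepsilon(t,x)e^{-ik\cdot x}\,dx \quad\text{and}\quad \hat\Phi^\varepsilon(t,k) = \fint\Phi^\varepsilon(t,x)e^{-ik\cdot x}\,dx,$$
one has
$$D^\varepsilon(t,v) = \sum_{k\in\mathbb{Z}^d\setminus\{0\}}k\otimes k\int_0^{t/\varepsilon^2}\hat\Phi^\varepsilon(t-\varepsilon^2\sigma,k)\,\big(\hat\Phi^\varepsilon(t,k)\big)^*\,e^{-ik\cdot v\sigma}\,d\sigma,$$
with
$$\sum_{k\in\mathbb{Z}^d}|k|^4\,|\hat\Phi^\varepsilon(t,k)|^2 \le C \quad\text{independent of } \varepsilon.$$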
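In order to compare the deterministic results of this section with the stochastic ones of Sect. 2, we state and prove the following:
Proposition 3 We assume that the Fourier coefficients of the potential $\Phi^\varepsilon(t,x)$ are given by the ansatz
$$\hat\Phi^\varepsilon(t,k) = \widetilde\Phi^\varepsilon(t,k)\,e^{-i\omega(k)\varepsilon^{-\beta_k}t},$$
where $\omega(k)\varepsilon^{-\beta_k}$ is a fast time frequency with $\omega(-k) = \omega(k)$ and $\beta_{-k} = \beta_k$, while the amplitude $\widetilde\Phi^\varepsilon(t,k)$ is slowly varying with time and more precisely satisfies the estimate
$$\sum_{k\in\mathbb{Z}^d}|k|^4\,|\widetilde\Phi^\varepsilon(t,k)|^2 \le C \quad\text{independent of } \varepsilon. \tag{20}$$
Then:
1. The diffusion tensor $D^\varepsilon$ is given by
$$D^\varepsilon(t,v) = \fint dx\int_0^{t/\varepsilon^2}d\sigma\,E^\varepsilon(t,x+\sigma v)\otimes E^\varepsilon(t-\varepsilon^2\sigma,x) = \sum_{k\in\mathbb{Z}^d}k\otimes k\int_0^{t/\varepsilon^2}d\sigma\,|\widetilde\Phi^\varepsilon(t,k)|^2\,e^{-i(\varepsilon^{2-\beta_k}\omega(k)-k\cdot v)\sigma} = \sum_{k\in\mathbb{Z}^d}k\otimes k\,|\widetilde\Phi^\varepsilon(t,k)|^2\,\frac{\sin\big((\varepsilon^{2-\beta_k}\omega(k)-k\cdot v)\frac{t}{\varepsilon^2}\big)}{\varepsilon^{2-\beta_k}\omega(k)-k\cdot v}.$$
2. Using the definition of the following hyperplanes,
$$\pi^0_k = \{v\in\mathbb{R}^d_v \text{ such that } k\cdot v = 0\},$$
and
$$\pi^1_k = \big\{v\in\mathbb{R}^d_v \text{ such that } \omega(k) - k\cdot v = 0 \text{ or } k\cdot(\boldsymbol{\omega}(k) - v) = 0\big\} \quad\text{with } \boldsymbol{\omega}(k) := \frac{\omega(k)}{|k|}\frac{k}{|k|},$$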
for any .ψ, φ ∈ D(Rdv ), one obtains the following behavior as .ε → 0: ˆ Rdv
(∇v φ(v))T Dε (t, v)∇v ψ(v) dv = π
.
+π
k=0, βk =2
ˆ |εk |2
πk1
ˆ |εk |2
k=0, βk 2 one obtains .
lim
sin (ε2−βk ω(k) − k · v) εt2 ε2−βk ω(k) − k · v
ε→0
= 0.
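The limits in Proposition 3 rest on the classical distributional limit $\sin(aT)/a\to\pi\,\delta(a)$ as $T\to\infty$, used here with $T = t/\varepsilon^2$ and $a = \varepsilon^{2-\beta_k}\omega(k) - k\cdot v$. The following sketch is only an illustration: it assumes a one-dimensional variable $a$ and the Gaussian $\phi(a) = e^{-a^2}$ in place of a compactly supported test function, for which the pairing has the closed form $\pi\,\mathrm{erf}(T/2)$ and therefore tends to $\pi\,\phi(0) = \pi$.

```python
import math


def sinc_pairing(T, n=8000, amax=8.0):
    """Trapezoid value of int exp(-a^2) sin(a T) / a da over [-amax, amax]."""
    h = 2 * amax / n
    total = 0.0
    for j in range(n + 1):
        a = -amax + j * h
        weight = 0.5 if j in (0, n) else 1.0
        # The kernel sin(aT)/a extends continuously by T at a = 0.
        kernel = T if a == 0.0 else math.sin(a * T) / a
        total += weight * math.exp(-a * a) * kernel
    return h * total


def closed_form(T):
    """Exact value pi * erf(T / 2) of the same pairing."""
    return math.pi * math.erf(T / 2)
```

As `T` grows, `sinc_pairing(T)` approaches `math.pi`, which is the one-dimensional mechanism behind the Dirac masses carried by the hyperplanes $\pi^0_k$ and $\pi^1_k$.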
Remark 3 In point 2 of Proposition 3, one observes that under the decreasing condition (20), the cluster point .Dε is the distribution Dε = π
.
k⊗k |ε (t, k)|2 δ(k·v) + π
k⊗k |ε (t, k)|2 δ(ω(k)−k·v).
k∈Zd , βk =2
k∈Zd , βk τ ⇒ Ak (σ ) = 0 ,
is even,
and
ˆ k∈Zd
R
|k|3 |Ak (σ )|dσ < C1 ,
(26)
with $C_1$ being independent of $\varepsilon$. In the right-hand side of (25), the term $A_k((t-s)/\varepsilon^2)$ stands for the time homogeneity assumption, while the term $\delta(k+k')$ represents the hypothesis of space homogeneity. Observe that the functions $\sigma\mapsto A_k(\sigma)$ can be extended by parity as functions defined on $\mathbb{R}$, with their Fourier transforms given by
$$\hat A_k(s) = \int_{\mathbb{R}}A_k(\sigma)\,e^{-is\sigma}\,d\sigma.$$
As a consequence of the above hypotheses, one obtains, for $\varepsilon$ small enough,
$$D^\varepsilon(t,v) = \int_0^{t/\varepsilon^2}d\sigma\,\mathbb{E}\big[E^\varepsilon(t,x+\sigma v)\otimes E^\varepsilon(t-\varepsilon^2\sigma,x)\big] = \sum_{k\in\mathbb{Z}^d\setminus\{0\}}k\otimes k\int_0^{t/\varepsilon^2}A_k(\sigma)\,e^{-i(\omega_k-k\cdot v)\sigma}\,d\sigma = \frac12\sum_{k\in\mathbb{Z}^d\setminus\{0\}}k\otimes k\int_{\mathbb{R}}A_k(\sigma)\,e^{-i(\omega_k-k\cdot v)\sigma}\,d\sigma. \tag{27}$$
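2.1 Properties of the Reynolds Electric Stress Tensor
Properties of the Reynolds electric stress tensor $D^\varepsilon$ and of its limit as $\varepsilon\to0$ are collected in:
Proposition 4 Under assumptions H2 and H3 (see (23)–(25)):
1. The functions $s\mapsto\hat A_k(s)$ are real non-negative, analytic, and satisfy the estimate
$$\sup_{s\in\mathbb{R}}\sum_{k\in\mathbb{Z}^d}|k|^3\,|\hat A_k(s)| < C_1. \tag{28}$$
2. The limit of the Reynolds electric stress tensor
$$D^\varepsilon(t,v) = \int_0^{t/\varepsilon^2}d\sigma\,\mathbb{E}\big[E^\varepsilon(t,x+\sigma v)\otimes E^\varepsilon(t-\varepsilon^2\sigma,x)\big]$$
is a real non-negative symmetric diffusion matrix, which is analytic in the variable $v$ and given by
$$D(v) = \overline{D^\varepsilon(t,v)} = \frac12\sum_{k\in\mathbb{Z}^d\setminus\{0\}}k\otimes k\,\hat A_k(\omega_k - k\cdot v). \tag{29}$$
3. For any $\psi,\phi\in\mathcal{D}(\mathbb{R}^d_v)$, one has
$$\overline{\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}(\nabla_v\phi(t,v))^T D^\varepsilon(t,v)\nabla_v\psi(t,v)\,dv\,dt} = -\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}\phi(t,v)\,\nabla_v\cdot\big(D(v)\nabla_v\psi(t,v)\big)\,dv\,dt. \tag{30}$$
Proof The reality of $\hat A_k$ follows from the parity of the function $\sigma\mapsto A_k(\sigma)$. Then for any continuous and compactly supported function $\mathbb{R}\ni s\mapsto\phi(s)$, using (25) and obvious changes of variables in time, one obtains
$$\int_{\mathbb{R}}\int_{\mathbb{R}}A_k(t-s)\phi(s)\phi(t)\,ds\,dt = \frac{1}{\varepsilon^4}\int_{\mathbb{R}}\int_{\mathbb{R}}\mathbb{E}\Big[\phi(t/\varepsilon^2)\hat\Phi^\varepsilon(t,k)\,\big(\phi(s/\varepsilon^2)\hat\Phi^\varepsilon(s,k)\big)^*\Big]\,ds\,dt. \tag{31}$$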
Observe that the left-hand side of (31) is independent of $\varepsilon$, while the right-hand side is non-negative. As a consequence, one obtains
$$\int_{\mathbb{R}}\int_{\mathbb{R}}A_k(t-s)\phi(s)\phi(t)\,ds\,dt \ge 0,$$
and the positivity of $\hat A_k$ follows from the Bochner theorem (see [32]). Finally, the fact that the functions $\hat A_k(s)$ are analytic (an elementary version of the Paley–Wiener theorem) and satisfy estimate (28) is a direct consequence of (26). In the same way, the rest of the proof also follows directly from (24)–(25).
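The role of the Bochner theorem can be illustrated on an explicit correlation function. The sketch below is a hypothetical example, not the chapter's data: it assumes $d = 1$, the triangular function $A(\sigma) = \max(0, 1-|\sigma|/\tau)$ (even, supported in $[-\tau,\tau]$, and positive definite), the weights $A_k = |k|^{-6}A$, and the frequencies $\omega_k = k$. Its Fourier transform is $\tau\,(\sin(s\tau/2)/(s\tau/2))^2\ge0$, so the one-dimensional analogue of formula (29) is non-negative.

```python
import math


def triangle(sigma, tau):
    """Hypothetical even correlation function supported in [-tau, tau]."""
    return max(0.0, 1.0 - abs(sigma) / tau)


def fourier_transform(s, tau, n=2000):
    """Trapezoid value of int A(sigma) cos(s sigma) d sigma (A even, so the transform is real)."""
    h = 2 * tau / n
    total = 0.0
    for j in range(n + 1):
        sigma = -tau + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * triangle(sigma, tau) * math.cos(s * sigma)
    return h * total


def closed_form(s, tau):
    """Exact transform tau * (sin(x) / x)^2 with x = s tau / 2."""
    x = s * tau / 2
    return tau if x == 0.0 else tau * (math.sin(x) / x) ** 2


def diffusion_1d(v, tau=1.0, kmax=5):
    """One-dimensional analogue of (29): D(v) = 1/2 sum_k k^2 c_k Ahat(omega_k - k v)."""
    total = 0.0
    for k in range(-kmax, kmax + 1):
        if k == 0:
            continue
        c_k = abs(k) ** -6      # hypothetical decay of the spectrum
        omega_k = float(k)      # hypothetical dispersion relation
        total += 0.5 * k * k * c_k * fourier_transform(omega_k - k * v, tau)
    return total
```

With these choices `diffusion_1d(v)` is non-negative for every `v`, up to quadrature error, and it is largest near the resonant velocity `v = 1`, where $\omega_k - kv = 0$ for all modes.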
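2.2 Decorrelation
Assuming that the electric field is a stochastic function and introducing the expectation in formula (19), one obtains
$$-\mathbb{E}\Big[\int_{\mathbb{R}^d_v}dv\,\phi(v)\nabla_v\cdot\fint dx\,\frac{E^\varepsilon f^\varepsilon}{\varepsilon}\Big] = \mathbb{E}\Big[\int_{\mathbb{R}^d_v}dv\fint dx\int_0^{t/\varepsilon^2}d\sigma\,f^\varepsilon(t-\sigma\varepsilon^2,x,v)\,\nabla_v\cdot\big(E^\varepsilon(t-\varepsilon^2\sigma,x)\otimes E^\varepsilon(t,x+\sigma v)\nabla_v\phi(v)\big)\Big], \tag{32}$$
which leads to a "smooth" well-defined diffusion matrix but which also requires a decorrelation formula more or less of the following type:
$$\mathbb{E}\Big[\int_{\mathbb{R}^d_v}dv\,(\nabla_v\phi(v))^T\fint dx\int_0^{t/\varepsilon^2}d\sigma\,E^\varepsilon(t,x+\sigma v)\otimes E^\varepsilon(t-\varepsilon^2\sigma,x)\,\nabla_v f^\varepsilon(t-\sigma\varepsilon^2,x,v)\Big] \simeq \int_{\mathbb{R}^d_v}dv\,(\nabla_v\phi(v))^T\fint dx\int_0^{\tau}d\sigma\,\mathbb{E}\big[E^\varepsilon(t,x+\sigma v)\otimes E^\varepsilon(t-\varepsilon^2\sigma,x)\big]\,\mathbb{E}\big[\nabla_v f^\varepsilon(t-\sigma\varepsilon^2,x,v)\big],$$
and this is the object of the following lemma and proposition.
Lemma 1 (Time Decorrelation Property Between $f^\varepsilon$ and $E^\varepsilon$) Assume H2 or (23). Suppose that the random initial data $f^\varepsilon_0$ and the electric field $E^\varepsilon$ are independent. Then the operator $E^\varepsilon(s)\cdot\nabla_v$ is independent of $f^\varepsilon(t)$ as soon as $s\ge t+\varepsilon^2\tau$.
Proof From the Duhamel formula
$$f^\varepsilon(t) = S^\varepsilon_t f^\varepsilon_0 - \frac1\varepsilon\int_0^t d\sigma\,S^\varepsilon_{t-\sigma}E^\varepsilon(\sigma)\cdot\nabla_v f^\varepsilon(\sigma), \tag{33}$$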
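we observe that $f^\varepsilon(t)$ depends only on $f^\varepsilon_0$ and $E^\varepsilon(\sigma)$ for $\sigma\le t$. Since $f^\varepsilon_0$ is independent of $E^\varepsilon(t)$, $\forall t\in\mathbb{R}$, and since the electric fields $E^\varepsilon(s)$ and $E^\varepsilon(t)$ are independent as soon as $s>t+\varepsilon^2\tau$ (assumption H2 or (23)), Lemma 1 follows directly from (33).
In the Duhamel formula connecting the solution from the time $t-\varepsilon^2\tau$ to the time $t$,
$$f^\varepsilon(t) = S^\varepsilon_{\varepsilon^2\tau}f^\varepsilon(t-\varepsilon^2\tau) - \frac1\varepsilon\int_0^{\varepsilon^2\tau}d\sigma\,S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v f^\varepsilon(t-\sigma), \tag{34}$$
we insert for $f^\varepsilon(t-\sigma)$ the Duhamel formula connecting the solution from the time $t-2\varepsilon^2\tau$ to the time $t-\sigma$ to obtain
$$f^\varepsilon(t) = S^\varepsilon_{\varepsilon^2\tau}f^\varepsilon(t-\varepsilon^2\tau) - \frac1\varepsilon\int_0^{\varepsilon^2\tau}d\sigma\,S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v\big(S^\varepsilon_{-\sigma}S^\varepsilon_{2\varepsilon^2\tau}f^\varepsilon(t-2\varepsilon^2\tau)\big) + \frac{1}{\varepsilon^2}\int_0^{\varepsilon^2\tau}d\sigma\int_0^{2\varepsilon^2\tau-\sigma}ds\,S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v\big(S^\varepsilon_s(E^\varepsilon(t-\sigma-s)\cdot\nabla_v f^\varepsilon(t-\sigma-s))\big), \tag{35}$$
which provides the essential tool for the needed decorrelation property according to the following:
Proposition 5 Assume that the vector field $E^\varepsilon$ satisfies Hypotheses H1 or (22) and H2 or (23); then for the expectation of the Fick term, one obtains
$$-\nabla_v\cdot\mathbb{E}\Big[\fint dx\,\frac{E^\varepsilon(t)f^\varepsilon(t)}{\varepsilon}\Big] = \frac{1}{\varepsilon^2}\int_0^{\varepsilon^2\tau}d\sigma\fint dx\,\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_{-\sigma}\big]\,\mathbb{E}\big[S^\varepsilon_{2\varepsilon^2\tau}f^\varepsilon(t-2\varepsilon^2\tau)\big] + \mathbb{E}[\mu^\varepsilon_t],$$
with
$$\mu^\varepsilon_t = -\frac{1}{\varepsilon^3}\int_0^{\varepsilon^2\tau}d\sigma\int_0^{2\varepsilon^2\tau-\sigma}ds\fint dx\,E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_s E^\varepsilon(t-\sigma-s)\cdot\nabla_v f^\varepsilon(t-\sigma-s).$$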
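Proof Applying the operator $E^\varepsilon(t)\cdot\nabla_v$ to (35), and then applying successively the average in space and the expectation value, we obtain
$$-\nabla_v\cdot\mathbb{E}\Big[\fint dx\,\frac{E^\varepsilon(t)f^\varepsilon(t)}{\varepsilon}\Big] = -\frac1\varepsilon\fint dx\,\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_{\varepsilon^2\tau}f^\varepsilon(t-\varepsilon^2\tau)\big] + \frac{1}{\varepsilon^2}\int_0^{\varepsilon^2\tau}d\sigma\fint dx\,\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_{-\sigma}S^\varepsilon_{2\varepsilon^2\tau}f^\varepsilon(t-2\varepsilon^2\tau)\big] + \mathbb{E}[\mu^\varepsilon_t], \tag{36}$$
with
$$\mu^\varepsilon_t = -\frac{1}{\varepsilon^3}\int_0^{\varepsilon^2\tau}d\sigma\int_0^{2\varepsilon^2\tau-\sigma}ds\fint dx\,E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_s E^\varepsilon(t-\sigma-s)\cdot\nabla_v f^\varepsilon(t-\sigma-s).$$
Using Lemma 1, we obtain that $f^\varepsilon(t)$ is independent of $E^\varepsilon(s)\cdot\nabla_v$ as soon as $s\ge t+\varepsilon^2\tau$. Then, using hypothesis H1, we obtain
$$\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_{\varepsilon^2\tau}f^\varepsilon(t-\varepsilon^2\tau)\big] = \mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_{\varepsilon^2\tau}\big]\,\mathbb{E}\big[f^\varepsilon(t-\varepsilon^2\tau)\big] = 0,$$
and the first term of the right-hand side of (36) vanishes. Since Lemma 1 implies that $E^\varepsilon(t)\cdot\nabla_v$ and $E^\varepsilon(t-\sigma)\cdot\nabla_v$ are independent of $f^\varepsilon(t-2\varepsilon^2\tau)$, for $0\le\sigma\le\varepsilon^2\tau$, we obtain from (36)
$$\frac{1}{\varepsilon^2}\int_0^{\varepsilon^2\tau}d\sigma\fint dx\,\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_{-\sigma}S^\varepsilon_{2\varepsilon^2\tau}f^\varepsilon(t-2\varepsilon^2\tau)\big] = \frac{1}{\varepsilon^2}\int_0^{\varepsilon^2\tau}d\sigma\fint dx\,\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_{-\sigma}\big]\,\mathbb{E}\big[S^\varepsilon_{2\varepsilon^2\tau}f^\varepsilon(t-2\varepsilon^2\tau)\big].$$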
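2.3 Weak Limits
The asymptotic behavior of the error term $\mu^\varepsilon_t$ as $\varepsilon\to0$ is given by:
Proposition 6 For any $\phi\in\mathcal{D}(\mathbb{R}^+_t\times\mathbb{R}^d_v)$, one obtains
$$|\langle\mu^\varepsilon_t,\phi\rangle| \le \varepsilon\tau^4\,C(\phi)\,\mathbb{E}\Big[\|E^\varepsilon\|^3_{L^\infty(\mathbb{R}^+_t;W^{2,\infty}(\mathbb{T}^d))}\Big].$$
Proof First changing $(\sigma,s)$ into $(\varepsilon^2\sigma,\varepsilon^2 s)$, with any $\phi\in\mathcal{D}(\mathbb{R}^+_t\times\mathbb{R}^d_v)$, one obtains
$$\langle\mu^\varepsilon_t,\phi\rangle = -\frac{1}{\varepsilon^3}\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}dt\,dv\,\phi(t,v)\int_0^{\varepsilon^2\tau}d\sigma\int_0^{2\varepsilon^2\tau-\sigma}ds\fint dx\,E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_s E^\varepsilon(t-\sigma-s)\cdot\nabla_v f^\varepsilon(t-\sigma-s)$$
$$= -\varepsilon\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}dt\,dv\,\phi(t,v)\int_0^{\tau}d\sigma\int_0^{2\tau-\sigma}ds\fint dx\,E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_{\varepsilon^2\sigma}E^\varepsilon(t-\varepsilon^2\sigma)\cdot\nabla_v S^\varepsilon_{\varepsilon^2 s}E^\varepsilon(t-\varepsilon^2(\sigma+s))\cdot\nabla_v f^\varepsilon(t-\varepsilon^2(\sigma+s)).$$
Then, with several integrations by parts and using the fact that $(S^\varepsilon_t)^* = S^\varepsilon_{-t}$, one obtains (see [1])
$$\langle\mu^\varepsilon_t,\phi\rangle = -\varepsilon\int_{\mathbb{R}^+}dt\int_{\mathbb{R}^d}dv\int_0^{\tau}d\sigma\int_0^{2\tau-\sigma}ds\fint dx\,f^\varepsilon(t-\varepsilon^2(\sigma+s))\,E^\varepsilon(t-\varepsilon^2(\sigma+s))\cdot\nabla_v\big(S^\varepsilon_{-\varepsilon^2 s}E^\varepsilon(t-\varepsilon^2\sigma)\cdot\nabla_v(S^\varepsilon_{-\varepsilon^2\sigma}E^\varepsilon(t)\cdot\nabla_v\phi)\big). \tag{37}$$
In the last line of (37) appears the term
$$E^\varepsilon(t-\varepsilon^2(\sigma+s))\cdot\nabla_v\big(S^\varepsilon_{-\varepsilon^2 s}E^\varepsilon(t-\varepsilon^2\sigma)\cdot\nabla_v(S^\varepsilon_{-\varepsilon^2\sigma}E^\varepsilon(t)\cdot\nabla_v\phi)\big),$$
which contains at most second-order derivatives with respect to $v$ of expressions of the form $E^\varepsilon(s,x+\tilde\sigma v)$. With $\tau$ finite, $x\in\mathbb{T}^d$, and with the introduction of a test function $\phi\in\mathcal{D}(\mathbb{R}^+_t\times\mathbb{R}^d_v)$, the support of the integrand is bounded in $\mathbb{R}^+_t\times\mathbb{T}^d\times\mathbb{R}^d_v$. Then, with a crude estimate (that could be improved), one obtains
$$|\langle\mu^\varepsilon_t,\phi\rangle| \le \varepsilon\tau^4\,C(\phi)\,\|E^\varepsilon\|^3_{L^\infty(\mathbb{R}^+_t;W^{2,\infty}(\mathbb{T}^d))}. \tag{38}$$
Finally, taking the expectation of (38), one concludes the proof of Proposition 6.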
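The diffusion limit is given by:
Proposition 7 For any $\phi\in\mathcal{D}(\mathbb{R}^+_t\times\mathbb{R}^d_v)$, one obtains
$$\overline{\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}dt\,dv\,\phi\,\nabla_v\cdot\mathbb{E}\Big[\fint dx\,\frac{E^\varepsilon(t)f^\varepsilon(t)}{\varepsilon}\Big]} = -\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}dt\,dv\,\overline{f^\varepsilon}(t,v)\,\nabla_v\cdot\big(D(v)\nabla\phi(t,v)\big). \tag{39}$$
Proof Knowing already from Proposition 6 that $\mu^\varepsilon_t\rightharpoonup0$ in $\mathcal{D}'(\mathbb{R}^+_t\times\mathbb{R}^d_v)$ as $\varepsilon\to0$, one obtains
$$\overline{\int_{\mathbb{R}^+_t\times\mathbb{R}^d_v}dt\,dv\,\phi\,\nabla_v\cdot\mathbb{E}\Big[\fint dx\,\frac{E^\varepsilon(t)f^\varepsilon(t)}{\varepsilon}\Big]} = \overline{I^\varepsilon}, \tag{40}$$
with
$$I^\varepsilon := -\int_{\mathbb{R}^+_t}dt\int_{\mathbb{R}^d_v}dv\,\frac{1}{\varepsilon^2}\int_0^{\varepsilon^2\tau}d\sigma\fint dx\,\phi(t,v)\,\mathbb{E}\big[E^\varepsilon(t)\cdot\nabla_v S^\varepsilon_\sigma E^\varepsilon(t-\sigma)\cdot\nabla_v S^\varepsilon_{-\sigma}\big]\,\mathbb{E}\big[S^\varepsilon_{2\varepsilon^2\tau}f^\varepsilon(t-2\varepsilon^2\tau)\big].$$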
.
ˆ I ε :=
.
R+ t
ˆ dt
ˆ
Rdv
τ
dv
dx ∇v φ T E[E ε (t, x) ⊗ E ε (t − ε2 σ, x − vσ )]
dσ 0
E[(σ − 2τ)(∇x f )(t − 2ε2 τ, x − 2vτ, v) + (∇v f )(t − 2ε2 τ, x − 2vτ, v)] . Using the change of variables .(t, x, v) → (t = t − 2ε2 τ, x = x − 2vτ, v = v) and integration by parts in .(x, v), one obtains I ε := −
.
1 (2π )d
ˆ
ˆ Rt
dt
Rdv
ˆ dv
ˆ
τ
dσ 0
Rdx
dx E[f ε (t, x, v)] ε (t, σ, x, v) ,
with ε (t, σ, x, v) = 1[−2ε2 τ, +∞[ (t)1{Td −2vτ} (x) (∇v · + (σ − 2τ)∇x · )
.
E[E ε (t − ε2 (σ − 2τ), x − v(σ − 2τ)) ⊗ E ε (t + 2ε2 τ, x + 2vτ)]∇v φ(t + 2ε2 τ, v) . The domain of integration of the integral .I ε is a compact set K of . Rt × Rdv × [0, τ]σ × Td . We already know that .E[f ε (t, x, v)] f ε (t, v) in .L∞ (K) weak.− . It remains to show the strong convergence in .L1 (K) of . ε to a suitable cluster point . ε . For this, using (24), one obtains ε (t, σ, x, v) = 1[−2ε2 τ, +∞[ (t)1{Td −2vτ} (x) (∇v · + (σ − 2τ)∇x · )
−i(ωk +ωk ) t2 −i(ωk −k·v)σ ε e −k ⊗ k ei(k+k )·x ei2(k+k )·vτ e−i2(ωk +ωk )τ e
.
k,k ∈Zd
E[ε (t − ε2 (σ − 2τ), k)ε (t + 2ε2 τ, k )]∇v φ(t + 2ε2 τ, v) . Without space homogeneity, we observe that the term .exp −i(ωk + ωk )t/ε2 does not converge pointwise almost everywhere in time, which prevents strong convergence in .L1 (K) of the function . ε . By constrast, using the spatio-temporal homogeneity property (25), one obtains
ε (t, σ, x, v) = 1[−2ε2 τ, +∞[ (t)1{Td −2vτ} (x) (∇v · + (σ − 2τ)∇x · )
Ak (σ )e−i(ωk −k·v)σ k ⊗ k∇v φ(t + 2ε2 τ, v) .
.
k∈Zd
Using regularity properties (26) and Lebesgue dominated convergence theorem, one obtains that . ε converges in .L1 (K) strongly toward the cluster point . ε , which is defined by
. ε (t, σ, x, v)
= 1R+ (t)1{Td −2vτ} (x) (∇v · + (σ −2τ)∇x · )
Ak (σ )e−i(ωk −k·v)σ k⊗k∇v φ(t, v) .
k∈Zd
Using properties (26) for .Ak , and passing to the limit .ε → 0 in .I ε , one obtains ˆ .I ε
=−
ˆ R+ t
dt
Rdv
dv f ε (t, v)∇v ·
1
2
ˆ k ⊗k
k∈Zd
R
dσ Ak (σ )e−i(ωk −k·v)σ ∇v φ(t, v) .
Using this last equation, definitions (27) and (29), and passing to the limit $\varepsilon\to0$ in (40), one obtains (39), which ends the proof of Proposition 7.
2.4 The Basic Stochastic Theorem From the above derivation, one deduces: Theorem 1 Let .{E ε (t, x; ω)}ω∈ = {−∇ε (t, x; ω)}ω∈ be a family of stochastic (with respect to the random variable .ω ∈ ) gradient vector fields. Assume that such vector fields satisfy the .ε-independent local-in-time regularity hypothesis ∀ε > 0 and ∀T > 0,
.
sup E ε (t) W 2,∞ (Td ) ≤ C(T ) ,
0 1, the spectra of .etTG , contained in the region .{μ ∈ C | |μ| ≥ α > 1}, are a finite sum of eigenvalues with finite multiplicity, which are the images of the poles of the resolvent of the generator .−TG , i.e., complex numbers .λm , λm > 0 , such that the equation λm h + v · ∇x h + E[h] · ∇v G = 0 ,
$h\in L^2(\mathbb{T}^d\times\mathbb{R}^d_v)$, has a non-trivial solution. Using Fourier series on $\mathbb{T}^d$, this means that there exists at least one $\hat h(k_m,v)\in L^2(\mathbb{R}^d_v)$ (maybe more than one if the multiplicity of $\lambda_m$ is greater than one) such that, taking into account the relation between the Fourier components of the electric field and of the density, namely
$$\hat E[h](k_m) = -\frac{ik_m}{|k_m|^2}\,\hat\rho[h](k_m),$$
one has the "dispersion equation"
$$1 - \int_{\mathbb{R}^d_v}\frac{ik_m}{|k_m|^2}\cdot\frac{\nabla_v G(v)}{\lambda_m+ik_m\cdot v}\,dv = 0. \tag{46}$$
Since $G(v)$ is real, one observes that if $(\lambda_m,k_m)$ is a solution, then the same is true for $((\lambda_m)^*,-k_m)$, and that in this case one obtains for the Fourier component of $h$
$$\hat h_{\lambda_m}(k_m,v) = -\frac{1}{\lambda_m+ik_m\cdot v}\,\hat E_{\lambda_m}(k_m)\cdot\nabla_v G(v).$$
As a consequence (assuming for the sake of simplicity that $\lambda_m = \gamma_m+i\omega_m$ is a simple root of the analytic equation (46)), one observes that $h_m(t,x,v)$, the solution of Eq. (45), and the electric field
$$E_m(t,x) = e^{ik_m\cdot x+\lambda_m t}\,\hat E_{\lambda_m}(k_m)$$
are related, for any time $t$, by
$$h_m(t,x,v) = -\frac{e^{ik_m\cdot x+\lambda_m t}}{\lambda_m+ik_m\cdot v}\,\hat E_{\lambda_m}(k_m)\cdot\nabla_v G.$$
On the other hand, for any $\lambda_m$, introduce the Kato projector (see [19] page 178) on the eigenspace corresponding to the eigenvalue $\lambda_m$, defined by
$$P_m h_0(x,v) = \frac{1}{2i\pi}\int_{\Gamma_m}(\lambda I - T_G)^{-1}h_0\,d\lambda,$$
where $\Gamma_m$ is a "small" oriented contour of the complex half-plane $\Re\lambda>0$ containing only $\lambda_m$ in its interior. Since for any $\delta>0$ there is a finite number of eigenvalues in the region
λ ∈ C | 0 < δ ≤ λ ≤ := sup λm ,
one obtains (assuming that these eigenvalues are simple), for any real density h(t, x, v) solution of the linearized equation (45), the following asymptotic expansion:
.h(t, x, v) = e(γm +iωm )t eikm ·x Pm h0 + O(eδt ) . (47)
.
{λm }m | 0 0) the electric field goes exponentially fast to zero as .t → ∞ . Following the recent version of Grenier et al. [17], we assume that the profile .v → G(v) and the initial data .(x, v) → h0 (x, v) are analytic functions. Using notation (44), Laplace transform with respect to time, Fourier transform with respect to x, and Fourier transform with respect to v are used. They are denoted as in Sect. 3.1. We first focus on the behavior of the density .ρ[h], of the solution h of the linearized equation (45), given by .
ˆ ρ[h](t, x) =
.
Rdv
h(t, x, v)dv .
272
C. Bardos and N. Besse
Using Fourier–Laplace transformations, we obtain, for .λ > 0 , the relation ˆ . (1
+ KG (λ, k))Lρ[h](λ, k) =
Rdv
ˆ
h0 (k, v) dv λ + ik · v
with
KG (λ, k) = −
Rdv
ik ∇v G(v) · dv . |k|2 λ + ik · v
Then, following [27], one observes that ˆ =
.KG (λ, k)
∞
e−λs s(Fv G)(ks)ds
ˆ and
ˆ ∞ hˆ 0 (k, v) e−λs (Fv h0 )(k, ks)ds . dv = λ + ik · v 0
Rdv
0
Therefore, as in [27], one proposes a stronger criterion for the absence of solution λm with .λm > 0 for Eq. (46), which is
.
∃κ0 > 0, such that
.
inf
k∈Zd ,λ>0
ˆ 1 +
∞
e
−λs
0
sFv G(ks)ds ≥ κ0 > 0 .
(48)
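Criterion (48) can be explored numerically for a concrete profile. The sketch below is an illustration under stated assumptions, not part of the chapter: it takes $d = 1$, the Maxwellian $G(v) = e^{-v^2/2}/\sqrt{2\pi}$, and the convention $\mathcal{F}_v G(\xi) = \int G(v)e^{-i\xi v}\,dv = e^{-\xi^2/2}$, so that $K_G(\lambda,k) = \int_0^\infty e^{-\lambda s}\,s\,e^{-k^2s^2/2}\,ds$. It then samples $|1+K_G(\lambda,k)|$ on a finite grid of $\lambda$ with $\Re\lambda>0$; the grid and the function names are our own choices, and a finite sample is of course not a proof of the infimum bound.

```python
import cmath
import math


def K_G(lam, k, n=600):
    """Trapezoid value of int_0^inf exp(-lam s) s exp(-(k s)^2 / 2) ds (Maxwellian profile)."""
    smax = 10.0 / abs(k)  # the Gaussian factor is negligible beyond this point
    h = smax / n
    total = 0j
    for j in range(n + 1):
        s = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(-lam * s) * s * math.exp(-(k * s) ** 2 / 2)
    return h * total


def penrose_margin(ks=(1, 2, 3), reals=(0.01, 0.1, 0.5, 1.0, 2.0), bmax=6.0, nb=61):
    """Smallest sampled value of |1 + K_G(a + i b, k)| over a grid with a > 0."""
    margin = float("inf")
    for k in ks:
        for a in reals:
            for j in range(nb):
                b = -bmax + 2 * bmax * j / (nb - 1)
                margin = min(margin, abs(1 + K_G(complex(a, b), k)))
    return margin
```

On this grid the margin stays well away from zero, which is consistent with the known stability of the Maxwellian; other profiles can be tested by replacing the Gaussian factor inside `K_G` with their own Fourier transform.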
Then, we obtain for .λ > 0 the solution 1 .Lρ[h](λ, k) = (1 + KG (λ, k))
ˆ
∞
e−sλ (Fv )h0 (k, ks)ds .
0
With the hypothesis of analyticity, by the Paley–Wiener theorem, there exist $C$ and $\theta_0$ such that one has

$$|\mathcal{F}_v G(ks)| \le C e^{-\theta_0 |k| s}
\quad\text{and}\quad
|\mathcal{F}_v h_0(k,ks)| \le C e^{-\theta_0 |k|} .$$

As a consequence, using (48), for any $k \in \mathbb{Z}^d\setminus\{0\}$, the functions

$$K_G(\lambda,k) = \int_0^\infty e^{-\lambda s}\, s\,\mathcal{F}_v G(ks)\,ds
\quad\text{and}\quad
S(\lambda,k) = \int_0^\infty e^{-s\lambda}\,\mathcal{F}_v h_0(k,ks)\,ds$$

can be extended (for the density $\rho[h]$, the integration with respect to $v$ leads to an extension behind the imaginary axis without extra singularity) as analytic functions in the region $\Re\lambda > -\theta_0|k|$. Eventually, one obtains (see [17]) that there exists $\theta_1 > 0$ such that for $\Re\lambda \ge -\theta_1|k|$,

$$|\mathcal{L}\rho[h](\lambda,k)| \le \frac{C_1}{1 + |k|^2 + |\lambda|^2} . \tag{49}$$
This gives the exponential decay for $\rho[h]$ and $E[h]$ when $G$ is analytic. This is sufficient for the present discussion (the extension to initial data $f_0$ belonging only to a Gevrey space with Gevrey index $\gamma \in (\tfrac{1}{3}, 1]$ uses the presence of the term $|k|^2$ in (49)). Of course, the nonlinearity requires a more sophisticated analysis, in particular of the interaction of modes, which is the classical problem of the echoes. Details can be found in [17], where the following theorem is obtained.

Theorem 2 Assume for the initial data,

$$f_0(x,v) = G(v) + \varepsilon h_0(x,v), \tag{50}$$
that $G(v)$ and $h_0(x,v)$ are analytic, while the basic profile $G(v)$ satisfies the stability estimate given by the formula (48). Then, for $\varepsilon$ small enough, the corresponding solution exhibits the Landau damping effect, i.e., as $t \to \infty$, the electric field $E(t,x)$ goes exponentially fast to zero.

As observed in the introduction, the ergodicity of the torus $\mathbb{T}^d$ implies that for any $0 < T < \infty$ the solution $f^\varepsilon(t,x,v)$ of the rescaled equation converges in $L^\infty([0,T]\times\mathbb{T}^d\times\mathbb{R}^d_v)$ weakly-$*$ to an $x$-independent function. Hence, with the Poisson equation, the electric field also converges in $L^\infty(0,T;L^2(\mathbb{T}^d))$ weakly-$*$ to zero. Therefore, this property would justify the term “baby Landau damping.” In the above situation, genuine Landau damping would correspond to strong convergence (for any $\delta > 0$ and $\delta < T < \infty$) in $L^\infty(\delta,T;L^2(\mathbb{T}^d))$. Strong convergence will be the counterpart, in the present situation, of Theorem 2. In fact, one has:

Theorem 3 For solutions of the rescaled Vlasov–Poisson equations,

$$\varepsilon^2\partial_t f^\varepsilon + v\cdot\nabla_x f^\varepsilon + \varepsilon E^\varepsilon\cdot\nabla_v f^\varepsilon = 0 , \qquad
\nabla_x\cdot E^\varepsilon = \rho^\varepsilon(t,x) = \int_{\mathbb{R}^d_v} f^\varepsilon(t,x,v)\,dv - 1 , \tag{51}$$

near an equilibrium

$$f_0(x,v) = G(v) + h_0(x,v) ,$$

with analytic data and profile satisfying the stability estimate given by the formula (48), and no restriction on the “size” of the analytic perturbation $h_0$, on $0 < \delta < T < \infty$, the electric field $E^\varepsilon(t,x)$ converges exponentially fast to zero in $L^\infty(\delta,T;L^2(\mathbb{T}^d))$ as $\varepsilon \to 0$.

Proof For strong convergence, one follows [17]. First introduce the function $F^\varepsilon(\tau,x,v)$ solution of the equations
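$$\partial_\tau F^\varepsilon + v\cdot\nabla_x F^\varepsilon + E^\varepsilon[\varepsilon\rho^\varepsilon]\cdot\nabla_v F^\varepsilon = 0 ,$$

with

$$E^\varepsilon[\,\cdot\,] = \nabla\Delta^{-1}[\,\cdot\,] , \qquad
f^\varepsilon(t,x,v) = F^\varepsilon(t/\varepsilon^2,x,v) , \qquad\text{and}\qquad
\varepsilon E^\varepsilon[\rho^\varepsilon] = E^\varepsilon[\varepsilon\rho^\varepsilon] .$$

For $\varepsilon \le \varepsilon_0$ small enough, apply Theorem 2 to the solution $(F_\varepsilon(\tau,x,v), E_\varepsilon(\tau,x))$ of

$$\partial_\tau F + v\cdot\nabla_x F + E(\tau)\cdot\nabla_v F = 0 , \qquad
E = \nabla\Delta^{-1}\Bigl(\int_{\mathbb{R}^d_v} F\,dv - 1\Bigr) , \qquad
F_\varepsilon(0,x,v) = G(v) + \varepsilon h_0(x,v) ,$$

which coincides with the solution $(f^\varepsilon(t,x,v), E^\varepsilon(t,x))$ of (51) through the relation $(f^\varepsilon(t,x,v), E^\varepsilon(t,x)) = (F_\varepsilon(t/\varepsilon^2,x,v), E_\varepsilon(t/\varepsilon^2,x))$.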
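The role of the time rescaling in this proof can be made explicit with a short computation. This is only a sketch: it uses the relation $f^\varepsilon(t,x,v) = F^\varepsilon(t/\varepsilon^2,x,v)$ and the chain rule, and the constants $C$ and $c > 0$ are placeholders for whatever decay estimate Theorem 2 provides, assumed here to be uniform in $\varepsilon \le \varepsilon_0$.

```latex
\begin{aligned}
\varepsilon^2\,\partial_t f^\varepsilon(t,x,v)
  &= \varepsilon^2\cdot\varepsilon^{-2}\,(\partial_\tau F^\varepsilon)(t/\varepsilon^2,x,v)
   = (\partial_\tau F^\varepsilon)(\tau,x,v), \qquad \tau = t/\varepsilon^2 , \\
t\in[\delta,T] \ &\Longrightarrow\ \tau \ge \delta/\varepsilon^2
  \ \Longrightarrow\ C e^{-c\tau} \le C e^{-c\delta/\varepsilon^2}
  \xrightarrow[\varepsilon\to 0]{} 0 .
\end{aligned}
```

In words, a fixed window $[\delta,T]$ in the slow time $t$ corresponds to times $\tau \ge \delta/\varepsilon^2$ in the fast variable, which is where the exponential decay in $\tau$ becomes a strong convergence statement as $\varepsilon \to 0$.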
3.4 Roadmap for the Short-Time Quasilinear Approximation

In this section, we present a prospective method to prove the validity of a short-time quasilinear approximation in the presence of unstable eigenvalues. Observe that for any time $t > 0$, with

$$f^\varepsilon(t,x,v) = G(t,v) + \varepsilon h(t,x,v) , \qquad \fint h(t,x,v)\,dx = 0 ,$$

the Vlasov–Poisson equations are equivalent to the system

$$\begin{aligned}
&\partial_t G + \varepsilon^2\,\nabla_v\cdot\fint E[h]h\,dx = 0 , \qquad
E[h] = \nabla\Delta^{-1}\int_{\mathbb{R}^d_v} h(t,x,v)\,dv , \\
&\partial_t h + v\cdot\nabla_x h + E[h]\cdot\nabla_v G = -\varepsilon\,\nabla_v\cdot\Bigl(E[h]h - \fint E[h]h\,dx\Bigr) .
\end{aligned} \tag{52}$$
Next, assume the existence of a simple non-degenerate root $(\lambda(0), k)$, with $\Re\lambda(0) > 0$, of the dispersion equation

$$1 - \frac{1}{|k|^2}\int_{\mathbb{R}^d_v} \frac{ik\cdot\nabla_v G(0,v)}{\lambda + ik\cdot v}\,dv = 0 , \tag{53}$$

with $G(0,v) = G_0(v)$, and consider solutions $f^\varepsilon(t,x,v)$ of the Vlasov equation with complex initial data

$$f^\varepsilon(0,x,v) = G_0(v) + \varepsilon h_0(x,v) = G_0(v) + \varepsilon\,\frac{E(0,k)\cdot\nabla_v G_0(v)}{\lambda + ik\cdot v}\,e^{ik\cdot x} .$$
Assuming that $f^\varepsilon(0,x,v)$ is analytic, one observes (as proven by Benachour in [2]) that the corresponding solution of the Vlasov equation is also analytic. Hence (see [19], Chapter 2, Section 1), the root can be extended on a finite time interval as a simple solution of the equation

$$1 - \frac{1}{|k|^2}\int_{\mathbb{R}^d_v} \frac{ik\cdot\nabla_v G(t,v)}{\lambda(t) + ik\cdot v}\,dv = 0 ,$$

and then one introduces the approximate solution

$$\tilde h(t,x,v) = \frac{E(0,k)\cdot\nabla_v G(t,v)}{\lambda(t) + ik\cdot v}\; e^{\int_0^t ds\,\lambda(s) + ik\cdot x} . \tag{54}$$
The function $\tilde h$ will be used to construct an approximate diffusion $D^\varepsilon(v)$ such that, for short time, one has

$$\partial_t G(t,v) - \nabla_v\cdot\bigl(D^\varepsilon(v)\nabla_v G(t,v)\bigr) = O(\varepsilon^3) ,$$

while what follows from Eq. (52) is the estimate

$$\partial_t G(t,v) = O(\varepsilon^2) . \tag{55}$$
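Since $\lambda(t)$ and of course $G(t)$ itself are analytic functions, from (55) and (54), one deduces

$$\partial_t\tilde h + v\cdot\nabla_x\tilde h + E[\tilde h]\cdot\nabla_v G(t) = O(\varepsilon^2) .$$

Hence, one also obtains

$$\partial_t(h-\tilde h) + v\cdot\nabla_x(h-\tilde h) + E[h-\tilde h]\cdot\nabla_v G = -\varepsilon\,\nabla_v\cdot\Bigl(E[h]h - \fint E[h]h\,dx\Bigr) + O(\varepsilon^2) .$$

Then, with $(h-\tilde h)(0,x,v) = 0$, one obtains $h(t) - \tilde h(t) = O(\varepsilon)$, which eventually implies

$$\partial_t G + \varepsilon^2\,\nabla_v\cdot\fint E[\tilde h]\tilde h\,dx
= \varepsilon^2\,\nabla_v\cdot\Bigl(\fint E[\tilde h]\tilde h\,dx - \fint E[h]h\,dx\Bigr) = O(\varepsilon^3) .$$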
As pointed out above, if $(\lambda, k)$ is a solution of the dispersion equation, then the same is true for $(\overline{\lambda}, -k)$, and one can extend the above comparison between genuine solutions with real initial data

$$f^\varepsilon(0,x,v) = G_0(v) + \varepsilon\,\Re\bigl(h(0,x,v)\bigr) ,$$

and the approximate solutions given by

$$\tilde f^\varepsilon(t,x,v) = G(t,v) + \varepsilon\,\Re\,\tilde h(t,x,v) . \tag{56}$$

Since the function $\tilde h$ satisfies the relation

$$(\lambda + ik\cdot v)\,\tilde h(t,k,v) + E[\tilde h(t,k,v)]\cdot\nabla_v G(t,v) = 0 , \tag{57}$$

one obtains, for solutions with initial data given by (56),

$$\partial_t G(t,v) - \varepsilon^2\,\nabla_v\cdot\left(\frac{\Re\lambda\; e^{2\int_0^t ds\,\Re\lambda(s)}\; E(0,k)\otimes\overline{E(0,k)}}{(k\cdot v - \Im\lambda)^2 + (\Re\lambda)^2}\,\nabla_v G(t,v)\right)
= \partial_t G(t,v) + \varepsilon^2\,\nabla_v\cdot\fint E[\tilde h]\tilde h\,dx = O(\varepsilon^3) .$$
The above construction can be combined with the Dunford–Kato formula (47). Assuming again that the roots of the dispersion relation $(\lambda, k(\lambda))$ are simple, one obtains, for any initial data and any $\delta > 0$, summing with respect to the solutions of the dispersion equation (53) with $\Re\lambda > \delta$, the equation

$$\partial_t G(t,v) - \varepsilon^2\,\nabla_v\cdot\left(\sum_{\Re\lambda>\delta}\frac{\Re\lambda\; e^{2\int_0^t ds\,\Re\lambda(s)}\; E(0,k(\lambda))\otimes\overline{E(0,k(\lambda))}}{(k(\lambda)\cdot v - \Im\lambda)^2 + (\Re\lambda)^2}\,\nabla_v G(t,v)\right)
= O(\varepsilon^3) + O(\varepsilon^2)e^{\delta t} ,$$

which is the standard quasilinear approximation.
4 Remarks and Conclusion

As said in the introduction, the thread of this contribution is the comparison, from a genuine mathematical point of view, of different approaches leading to the quasilinear diffusion approximation.

Remark 4 The most natural one is with the introduction of the rescaled equation. However, in such a configuration, it has been shown that the stochastic scenario, which implies that the electric field is independent of the density, is almost compulsory. As such, one could also start from a stochastic flow solution of the ODEs

$$\frac{d}{dt}X^\varepsilon(t) = \frac{1}{\varepsilon^2}\,V^\varepsilon(t) , \qquad
\frac{d}{dt}V^\varepsilon(t) = -\frac{1}{\varepsilon}\,E^\varepsilon(t, X^\varepsilon(t)) ,$$

with a convenient correlation function $A_\tau$ leading directly at the macroscopic level, without the kinetic step in between, to a diffusion equation

$$\partial_t f^\varepsilon - \nabla_v\cdot\bigl(D(v)\nabla_v f^\varepsilon\bigr) = 0 .$$
Comparison with the diffusion matrix, given also in terms of $A_\tau$ by (41), leads in such cases to a closed formula for the determination of such diffusion. For an interpretation of this relation at the level of plasma physics, see reference [1] and the original ones [11, 31]. Complete proofs may be obtained following the contributions [12, 14, 15, 20]. Weak convergence and the introduction of randomness imply that the analysis should be made on the solution rather than on the equation. This leads to the introduction of the Duhamel series, which may be considered as an avatar of other BBGKY hierarchies. However, in [26, 28], the authors have introduced some decorrelation properties valid at any time, which close the Duhamel series at second order. This is the road that was followed in this contribution.

Remark 5 The short allusion to Landau damping was motivated on the one hand by the comparison with the issue of strong versus weak convergence to zero of the electric field in the rescaled equation, and on the other hand by the wish to underline the role of estimates on the charge density $\rho[h]$ under the stability condition (48), which allows one to extend the resolvent beyond the imaginary axis.

Remark 6 The short-time validity of the quasilinear approximation is based on even much more formal presentations that can be found in basic plasma physics textbooks (see for instance [21], pages 514–517). Here, the symbol $O(\varepsilon)$, used everywhere, should be clarified for a complete proof, which will be the matter of a future work. For short time, systematic use of the Nash–Moser theorem should balance the loss of derivative (of order 1 with respect to $v$) in Sect. 3.4. One should keep in mind the striking difference between the rescaled diffusion scenario with an independent and non-self-consistent stochastic vector field, which is the typical model for long-time dynamics, and the short-time diffusion scenario, where the self-consistent electric field is slaved to the solution by the Poisson equation. In fact, these two diffusion regimes are present in the nonlinear relaxation of the weak warm beam–plasma instability problem. Self-consistent numerical simulations of this problem [3] confirm the existence of these two diffusion regimes, plus a third regime between the two. In this third regime, dubbed the “trapping turbulent regime” in the plasma physics literature, nonlinear wave–wave coupling plays an important role. Until now, and to our knowledge, there is not even a roadmap for a full mathematical description of such a regime.

Acknowledgments The first author wishes to thank the Observatoire de la Côte d’Azur and the Laboratoire J.-L. Lagrange for their hospitality and financial support.
References

1. C. Bardos, N. Besse, Diffusion limit of the Vlasov equation in the weak turbulent regime, submitted.
2. S. Benachour, Analyticité des solutions des équations de Vlasov–Poisson. (French) [Analyticity of solutions of Vlasov–Poisson equations] C. R. Acad. Sci. Paris Sér. I Math. 303 (13) 613–616 (1986).
3. N. Besse, Y. Elskens, D. Escande, P. Bertrand, Validity of quasilinear theory: refutations and new numerical confirmation, Plasma Phys. Control. Fusion 53 025012–48 (2011).
4. F. Bouchut, F. Golse, M. Pulvirenti, Kinetic equations and asymptotic theory, Series in Appl. Math., Gauthier-Villars, 2000.
5. E. Caglioti, C. Maffei, Time asymptotics for solutions of Vlasov–Poisson equation in a circle, J. Stat. Phys. 92 301–323 (1998).
6. K.M. Case, Plasma Oscillations, Ann. Phys. 7 349–364 (1959).
7. K.M. Case, P.F. Zweifel, Linear transport theory. Addison-Wesley Publishing Co., Reading, 1967.
8. P. Degond, Spectral theory of the linearized Vlasov–Poisson equation, Trans. Amer. Math. Soc. 294 (2) 435–453 (1986).
9. R.J. DiPerna, P.-L. Lions, Solutions globales d’équations du type Vlasov–Poisson, C. R. Acad. Sci. Paris, Série I 307 655–658 (1988).
10. R.J. DiPerna, P.-L. Lions, Global weak solutions of kinetic equations, Rend. Sem. Mat. Univers. Politecn. Torino 46 259–288 (1988).
11. T.H. Dupree, A perturbation theory for strong plasma oscillations, Phys. Fluids 9 1773–1782 (1966).
12. D. Dürr, S. Goldstein, J.L. Lebowitz, Asymptotic motion of a classical particle in a random potential in two dimensions: Landau model, Comm. Math. Phys. 113 (2) 209–230 (1987).
13. R.S. Ellis, M.A. Pinsky, The first and second fluid approximations to the linearized Boltzmann equation, J. Math. Pures Appl. 54 (9) 125–156 (1975).
14. Y. Elskens, E. Pardoux, Diffusion limit for many particles in a periodic stochastic acceleration field, Ann. Appl. Prob. 20 2022–2039 (2010).
15. Y. Elskens, D. Escande, Microscopic dynamics of plasmas and chaos, Institute of Physics, 2003.
16. R. Glassey, The Cauchy Problem in Kinetic Theory. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1996.
17. E. Grenier, T. Nguyen, I. Rodnianski, Landau damping for analytic and Gevrey data, arXiv:2004.05979v1 (2020).
18. H.J. Hwang, J.L. Velazquez, On the existence of exponentially decreasing solutions of the nonlinear Landau damping problem, Indiana Univ. Math. J. 58 (6) 2623–2660 (2009).
19. T. Kato, Perturbation theory for linear operators, Springer-Verlag, Berlin and New York, 1966.
20. H. Kesten, G.C. Papanicolaou, A limit theorem for stochastic acceleration, Commun. Math. Phys. 78 19–63 (1980).
21. N.A. Krall, A.W. Trivelpiece, Principles of plasma physics. McGraw-Hill, 1973.
22. L. Landau, On the vibrations of the electronic plasma, Akad. Nauk SSSR. Zhurnal Eksper. Teoret. Fiz. 16 574–586 (1946).
23. O.E. Lanford, The evolution of large classical systems, in “Dynamical Systems, theory and applications”, J. Moser ed., Lecture Notes in Physics 38, 1–111, Springer-Verlag, Heidelberg, 1975.
24. P. Lax, R. Phillips, Scattering theory. Academic Press, New York, 1967.
25. J.L. Lions, Equations différentielles opérationnelles et problèmes aux limites. Springer, 1961.
26. G. Loeper, A. Vasseur, Electric turbulence in a plasma subject to a strong magnetic field, Asymptotic Anal. 40 51–65 (2004).
27. C. Mouhot, C. Villani, On Landau Damping, Acta Math. 207 29–201 (2010).
28. F. Poupaud, A. Vasseur, Classical and quantum transport in random media, J. Math. Pures Appl. 82 711–748 (2003).
29. J. Sebastião e Silva, Les séries de multipôles des physiciens et la théorie des ultradistributions. Math. Ann. 174 109–142 (1967).
30. J. Simon, Compact sets in the space $L^p(0,T;B)$, Ann. Math. Pura Appl. 146 65–96 (1987).
31. J. Weinstock, Formulation of a statistical theory of strong plasma turbulence, Phys. Fluids 12 1045–1058 (1969).
32. K. Yosida, Functional analysis, Springer, 1980.
Analysing the Scattering of Electromagnetic Ultra-wideband Pulses from Large-Scale Objects by the Use of Wavelets

François Bentosela

Abstract The aim of this chapter is the study of the signal received by an antenna (RX) when a transmit antenna (TX) sends a short pulse in a large-scale space, for instance, in an urban environment. Integral equations are established, which link the densities of charges and currents inside the environment objects with the incident field created by the TX antenna. From these equations, we define an integral operator $K$. The densities can be obtained by inverting $1-K$. The introduction of Daubechies wavelets allows us to obtain sparse matrices for $KK^*$, which is computationally convenient to get $(1-K)^{-1}$.
Alex Grossmann started his career with two papers with Tai Tsun Wu [10] on “The Schrödinger scattering amplitude for a fixed potential”. The forthcoming paper is an homage to his deepness and his clarity in a field that is still alive.
1 Introduction

With the development of wireless telephony, there is a need to better understand the propagation of signals in an urban environment with its buildings, its inhabitants, its cars, its furniture... It becomes important to characterize the electrical field received by a user walking in the street when the antenna of the base station, situated on a roof or on a house front, transmits signals. Previously, the efforts were concentrated on the propagation of harmonic signals or narrow-band signals. In this case, the signal at the user antenna, which, in many cases, is considered as pointwise, is determined once we get the complex amplitude of the electrical field at this point. To calculate the electric field scattered by the urban environment, there are mainly two ways. The first one consists in using the geometrical optics approximation. Researchers and engineers introduce the
F. Bentosela, Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, France. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_15
reflections, transmission, and diffraction of rays. Today, in the range of 0.3 GHz to 6 GHz, despite the great number of models and software implementing this technique, the calculations show many discrepancies with measurements. The problem often comes from the difficulty of taking into account the irregularities of the house fronts, windows, balconies, and so on. The second way is to use Lippmann–Schwinger type equations [3, 8]. As the wavelength is much smaller than the geometrical dimensions of the buildings, the matrix that has to be inverted numerically is very large. Even if new techniques are available for the inversion of large matrices [11, 12], they have not been applied to the urban problem. The numerical results concern only smaller bodies and simple geometries.

Even if the harmonic problem was already difficult, the interest of the wireless community has been moving towards the propagation of ultra-wideband (UWB) signals, i.e., short pulses in the nanosecond range, for which the study, of course, is more challenging [16, 17]. Apple launched the first three phones with ultra-wideband capabilities in September 2019. Once again, the technique that is commonly used is the geometrical optics approximation, which gives only an imprecise description of the received signal at the user antenna: some indications on the arrival time and the amplitude of the signal, with some peaks corresponding to the main reflections. Comparison with measured signals is not good: the latter are much more complex, and they present an important tail, very badly described by the models deduced from ray tracing. To our knowledge, in the UWB situation, analytic techniques starting from the Maxwell equations have not yet been considered, probably because they are expected to lead to the inversion of a matrix so large that it is infeasible at present.
The aim of this chapter is to show that there exists a way that improves the situation and could help to solve the problem partly analytically and partly numerically, with a control of the different approximations. Section 2 is devoted to establishing the definition of the operator $K$ (the equivalent of the Lippmann–Schwinger operator in scattering theory) acting on the Hilbert space $L^2(\mathbb{R}^4)\otimes\mathbb{C}^4$ of four functions $d_i(x,t)$, $i = 0,1,2,3$. Here $d_0(x,t)$ represents the density of charges at different times, and $d_i(x,t)$, $i = 1,2,3$, represent the three components of the density of currents inside the materials. These densities satisfy a Lippmann–Schwinger type equation, $(1-K)d = d_{in}$. This equation links the densities $d$ to some functions $d_{in}$, called “incident densities”, which depend on the incident electrical field alone, i.e. the field generated in free space by the TX antenna. The way to establish this equation seems to be new and relatively simple compared with the way the Lippmann–Schwinger type equation was obtained in electromagnetism for harmonic signals [8]. Notice that the usual Lippmann–Schwinger type equations include the electric and magnetic fields in full space; instead, our equations involve functions that are defined only inside the materials. Once the density of charges and the density of currents are determined, the scattered fields can be easily obtained. The numerical solution of $(1-K)d = d_{in}$ is problematic because, even if $1-K$ is sparse in a convenient basis, it is a non-self-adjoint, unbounded operator. One could construct a subspace of $L^2(\mathbb{R}^4)\otimes\mathbb{C}^4$ of finite dimension, generated by a finite number of well-chosen functions, and write the LS type equation in this space only.
Due to the sparsity of the matrix $1-K$ restricted to this subspace, one could solve numerically the corresponding linear system. Afterwards, it would be necessary to control the stability of the solution as we enlarge the subspace. When enlarging the subspace, as $1-K$ is unbounded, we add some matrix elements that can be large. It becomes difficult to control the behaviour of the solution. Do the densities converge when new basis vectors are added? The aim of this chapter is to show that it is possible to tackle the problem differently. Multiplying both sides of the LS type equation by $1-K^*$, we get $(1-K^*)(1-K)d = (1-K^*)d_{in}$. So we have to solve this new system, or to invert $(1-K^*)(1-K)$. Notice that $(1-K^*)(1-K)$ is a self-adjoint operator, which makes the problem of the stability of solutions easier.

In Sect. 3 we introduce an orthonormal wavelet basis for $L^2(\mathbb{R}^4)\otimes\mathbb{C}^4$ [7, 13–15] of Daubechies-$p$ type [4, 5]. The first $p$ moments of the Daubechies-$p$ wavelet are zero, and its Fourier transform is essentially concentrated in two intervals in which its absolute value looks like a smoothed characteristic function (see Daubechies wavelets in Wikipedia [18] and Wolfram MathWorld [20]). From the mother wavelet $w$, whose support is compact and equal to $[-p+1, p]$, is built the spatial wavelet $w_{mn}(x) = 2^{m_1/2}2^{m_2/2}2^{m_3/2}\,w(2^{m_1}x_1-n_1)\,w(2^{m_2}x_2-n_2)\,w(2^{m_3}x_3-n_3)$, where $m$ denotes $(m_1,m_2,m_3)\in\mathbb{Z}^3$ and $n$ denotes $(n_1,n_2,n_3)\in\mathbb{Z}^3$. The temporal wavelet $w_{jk}(t) = 2^{j/2}w(2^jt-k)$ is built in the same way. Their products constitute a basis for $L^2(\mathbb{R}^4)$. We denote $x_{mn} := (2^{-m_1}n_1, 2^{-m_2}n_2, 2^{-m_3}n_3)$ and $x_{m'n'} := (2^{-m'_1}n'_1, 2^{-m'_2}n'_2, 2^{-m'_3}n'_3)$. One defines a subspace as the linear span of a set of wavelets corresponding to a finite set of $m$, $n$, $j$, and $k$.
We will show that the matrix corresponding to $(1-K^*)(1-K)$ in this subspace is sparse, contrary to what could be expected, since multiplying two sparse matrices generally gives a non-sparse matrix. In the figure below, using the results established in Theorems 3.1 and 3.2 for the matrix elements $\langle w_{m'n'}w_{j'k'}, K^*K\,w_{mn}w_{jk}\rangle$, we present, in the case where the material volume is a parallelepiped, three artist views of the amplitude of some matrix elements when $m$, $n$, $j$, and $k$ are fixed (so we are looking only at the elements of the column $Kw_{mn}w_{jk}$) as functions of $x_{m'n'}\in V$ for three different couples $(j',k')$. The dilation factors $m$, $j$, $m'$, $j'$ are chosen relatively large. The coordinates of the centre of the yellow spot in the upper figure are $x_{mn}$. Notice that they are also the centre coordinates of the yellow spherical crowns in the middle and lower figures. The upper figure corresponds to the values of $\langle w_{m'n'}w_{j'k'}, K^*K\,w_{mn}w_{jk}\rangle$ as a function of $x_{m'n'}$ when $2^{-j'}k' - 2^{-j}k$ is fixed and close to zero. The figure in the middle corresponds to the values of $\langle w_{m'n'}w_{j'k'}, K^*K\,w_{mn}w_{jk}\rangle$ when $2^{-j'}k' - 2^{-j}k$ is positive, and the mean radius of the yellow crown is $2^{-j'}k' - 2^{-j}k$ divided by $c$, the light velocity. The lower figure corresponds to the values of $\langle w_{m'n'}w_{j'k'}, K^*K\,w_{mn}w_{jk}\rangle$ when $2^{-j'}k' - 2^{-j}k$ is larger, and again the mean radius of the yellow crown is $(2^{-j'}k' - 2^{-j}k)/c$.
Fig. 1 A representation of the modulus of the matrix element $\langle w_{m'n'}w_{j'k'}, K^*K\,w_{mn}w_{jk}\rangle$ for three different couples $(j',k')$ corresponding to increasing times $2^{-j'}k'$
Black colour represents zero amplitude, yellow colour represents the strongest amplitude, while a gradation of black and red is associated with small values, more or less large depending on the proportion of red. Notice that if the time difference $2^{-j'}k' - 2^{-j}k$ becomes sufficiently large, the black region covers the whole parallelepiped. So the time difference for which all the matrix elements become equal to zero can be easily estimated in terms of the initial position $x_{mn}$ and the geometry. The presence of red scars results from Theorem 3.2. They disappear in the lower figure.

In Sect. 4, we discuss the choice of the order of the Daubechies wavelets, in conjunction with the irregularities of the permittivity of the materials and their geometry. We will also discuss briefly the stability of the obtained densities as the number of wavelets is increased and give some perspectives. Finally, let us notice that once the charge density and the current density inside the materials are obtained, one can easily get the scattered electric field given by the formula (7), and adding to it the incident field, one gets the total electric field anywhere in space and time.

Remark The reason why we will call $K^*K$ and $KK^*$ time-reversal operators is that their sparsity has something to do with the time-reversal technique. In the time-reversal technique [6], a pulse $s_i(t)$ is sent by the $i$th TX antenna. This signal propagates in the environment, and it is registered by some RX antennas disposed around the materials. At the $j$th RX antenna, the received signal $r_{ji}(t) = (G_{ji}s_i)(t)$ is memorised and time-reversed, that is, transformed into $f_{ji}(t) = r_{ji}(2\tau - t)$. Afterwards, each $j$th RX antenna, acting now as an emitter, sends its reversed signal. Then the $i'$th antenna, which was previously a TX antenna, now receives a signal $r_{i'i}(t) = \sum_j (G_{i'j}f_{ji})(t)$. What can be observed is that the $i$th TX antenna now receives a pulse $r_{ii}(t)$ that is concentrated in time and stronger than the signals $r_{i'i}(t)$ received by the other TX antennas. If one chooses $\tau$ as the time origin, the time-reversal transform can be expressed in terms of the operator $G^*$, and the signal received at the TX antennas can be calculated from $GG^*$. The experiment indicates that in $G^*G$ the diagonal elements are greater than the non-diagonal ones. The analogy between $K$ and $G$ gave us the idea of the sparsity of $K^*K$.
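The refocusing effect described in this remark can be illustrated with a toy simulation. The sketch below is not the chapter's model: the channels are hypothetical random impulse responses `h[j, i]` (RX $j$, TX $i$), the emitted pulse is an ideal impulse, reciprocity $G_{i'j} = G_{ji'}$ is assumed, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tx, n_rx, n_taps = 3, 8, 200
# h[j, i]: hypothetical random impulse response between TX i and RX j
h = rng.standard_normal((n_rx, n_tx, n_taps))

def refocused(i_recv, i_send):
    """Signal at TX i_recv after every RX antenna re-emits the time-reversed
    response it recorded from TX i_send (reciprocal channels assumed)."""
    out = np.zeros(2 * n_taps - 1)
    for j in range(n_rx):
        # convolving with a time-reversed signal is a cross-correlation
        out += np.correlate(h[j, i_recv], h[j, i_send], mode="full")
    return out

peak_same = np.max(np.abs(refocused(0, 0)))                 # back at the original TX
peak_other = max(np.max(np.abs(refocused(i, 0))) for i in (1, 2))
print(peak_same, peak_other)
```

In this toy setting the signal returning to the original antenna is a sum of autocorrelations, which add coherently at zero lag, whereas the other antennas receive cross-correlations that do not; this is the discrete analogue of the diagonal dominance mentioned above.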
2 Direct and Scattered Fields

Once the coordinate system has been chosen conveniently with respect to the city (for instance, the $0z$ axis can be chosen vertical, and the $0y$ axis parallel to the street axis), the TX antenna is at point $A$, whose coordinates are given by $\mathbf{a}\in\mathbb{R}^3$. We choose some axes $Ax'$, $Ay'$, $Az'$ linked with the antenna geometry and denote by $r'$, $\theta'$, $\phi'$ the spherical coordinates of some point $\mathbf{u}$ with respect to the $Ax'$, $Ay'$, $Az'$ axes. We suppose (see Appendix 1: Antennas Diagrams) that in free space, the electrical field at point $\mathbf{u}$, generated by a given voltage excitation $v(t)$, is given in the far-field approximation by

$$\mathbf{E}_{in}(\mathbf{u},t) = \frac{1}{|\mathbf{u}-\mathbf{a}|}\,\mathbf{f}(\phi',\theta')\,\frac{dv}{dt}\bigl(t - |\mathbf{u}-\mathbf{a}|/c\bigr) . \tag{1}$$
$\mathbf{f}(\phi',\theta')$ is a vector whose components $f_{r'}(\theta',\phi')$, $f_{\theta'}(\theta',\phi')$, $f_{\phi'}(\theta',\phi')$ have the dimension of a time. In the far-field approximation (see [1]), $\mathbf{f}(\phi',\theta')$ is transversal, i.e. $f_{r'}(\theta',\phi') = 0$. We are going to choose an ultra-wideband input voltage $v(t)$, which means that its Fourier transform has a support whose width is large; to fix ideas, if the central frequency is $f_0 = \omega_0/2\pi$, the bandwidth is $0.1f_0$. Today in wireless telephony, $0.8\ \mathrm{GHz} < f_0 < 6\ \mathrm{GHz}$, but higher frequencies will appear soon.

The environment materials respond to the incident field $\mathbf{E}_{in}(\mathbf{u},t)$, and charges and currents are created inside them. In turn, these charges and currents create everywhere an electrical field (called the scattered field), and summing it with the incident field, the total field is obtained inside or outside the materials. We are going to show that charge and current densities satisfy integral equations of Lippmann–Schwinger type.

Let us come back to the electromagnetic theory. The total field polarizes the materials. Polarization means that under the effect of the total electric field, the bound electrons of atoms move, the charge density inside the materials is modified (some “dipoles” are created), and currents are generated. Introducing the polarisation vector $\mathbf{P} = \epsilon_0\chi(\mathbf{E})$, the charge density inside the material becomes $\rho_p(\mathbf{u},t) = -\nabla\cdot\mathbf{P}(\mathbf{u},t)$, and the current density is given by $\mathbf{J}_p(\mathbf{u},t) = \frac{\partial}{\partial t}\mathbf{P}(\mathbf{u},t)$. In the literature [9, 19], the relation between the total electric field and the polarisation vector has been discussed, and many simplifications have been introduced (see Appendix 2: Polarisability). To simplify our forthcoming analysis, we will admit that in our situation, one can write that, approximately, $\mathbf{P}(\mathbf{u},t) = \epsilon_0\chi(\mathbf{u})\mathbf{E}(\mathbf{u},t)$ and then

$$\rho_p(\mathbf{u},t) = -\epsilon_0\nabla\cdot\bigl(\chi(\mathbf{u})\mathbf{E}(\mathbf{u},t)\bigr) = -\epsilon_0\nabla\chi(\mathbf{u})\cdot\mathbf{E}(\mathbf{u},t) - \epsilon_0\chi(\mathbf{u})\nabla\cdot\mathbf{E}(\mathbf{u},t) . \tag{2}$$
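As $\nabla\cdot\mathbf{E}(\mathbf{u},t) = \rho_p(\mathbf{u},t)/\epsilon_0$,

$$\rho_p(\mathbf{u},t) = -\frac{\epsilon_0\nabla\chi(\mathbf{u})\cdot\mathbf{E}(\mathbf{u},t)}{1+\chi(\mathbf{u})} \tag{3}$$

and

$$\mathbf{J}_p(\mathbf{u},t) = \epsilon_0\chi(\mathbf{u})\frac{\partial}{\partial t}\mathbf{E}(\mathbf{u},t) . \tag{4}$$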
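The passage from (2) to (3) is a one-line rearrangement. Written out, as a short check that uses only (2) and the relation $\nabla\cdot\mathbf{E} = \rho_p/\epsilon_0$ quoted in the text (arguments $(\mathbf{u},t)$ omitted):

```latex
\begin{aligned}
\rho_p &= -\epsilon_0\nabla\chi\cdot\mathbf{E} - \epsilon_0\chi\,\nabla\cdot\mathbf{E}
        = -\epsilon_0\nabla\chi\cdot\mathbf{E} - \chi\,\rho_p \\
\Longrightarrow\quad (1+\chi)\,\rho_p &= -\epsilon_0\nabla\chi\cdot\mathbf{E}
\quad\Longrightarrow\quad
\rho_p = -\frac{\epsilon_0\,\nabla\chi\cdot\mathbf{E}}{1+\chi} .
\end{aligned}
```

The division requires $1+\chi \neq 0$, which holds in particular whenever $\chi \ge 0$ (an assumption about the materials, not stated in the text). Note that $\rho_p$ vanishes wherever $\chi$ is locally constant, so in this model the polarization charges sit where the susceptibility varies.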
If there are free electrons inside the materials (which is the case in metals), they can move under the effect of the electrical field, and some current density is created. We will admit the linearity between the current density and the field, i.e. Ohm's law,

$$\mathbf{J}_f(\mathbf{u},t) = \sigma^e(\mathbf{u})\mathbf{E}(\mathbf{u},t) , \tag{5}$$

where $\sigma^e(\mathbf{u})$ is the conductivity at point $\mathbf{u}$. Suppose that inside the materials, the volume of which is $V$, the charge density is given by $\rho^{(1)}(\mathbf{u}_1,t)$ and the current density by $\mathbf{J}^{(1)}(\mathbf{u}_1,t)$. In the Lorenz gauge, the scalar potential and the vector potential created at some point $\mathbf{u}_2$, at time $t$, respectively, by the charge density and the current density in the materials are given by the so-called retarded potential formulas:

$$\begin{aligned}
\phi^{(1)}(\mathbf{u}_2,t) &= \frac{1}{4\pi\epsilon_0}\int_V \frac{\rho^{(1)}(\mathbf{u}_1, t - |\mathbf{u}_2-\mathbf{u}_1|/c)}{|\mathbf{u}_2-\mathbf{u}_1|}\,d\mathbf{u}_1 , \\
\mathbf{A}^{(1)}(\mathbf{u}_2,t) &= \frac{\mu_0}{4\pi}\int_V \frac{\mathbf{J}^{(1)}(\mathbf{u}_1, t - |\mathbf{u}_2-\mathbf{u}_1|/c)}{|\mathbf{u}_2-\mathbf{u}_1|}\,d\mathbf{u}_1 .
\end{aligned} \tag{6}$$
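A retarded potential of this kind can be approximated by a simple Riemann sum over cells of the volume $V$. The sketch below is illustrative only: the function name, the callable interface `rho(points, times)`, the cubic test volume, and the uniform static density are all assumptions made for the example, not part of the chapter.

```python
import numpy as np

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299792458.0           # speed of light (m/s)

def retarded_scalar_potential(rho, points, cell_volume, u2, t):
    """Riemann-sum approximation of the retarded scalar potential at u2, time t.
    rho(points, times) must return the charge density at each cell centre,
    evaluated at the corresponding (retarded) time."""
    dist = np.linalg.norm(points - u2, axis=1)       # |u2 - u1| for every cell
    values = rho(points, t - dist / C)               # density at retarded times
    return np.sum(values / dist) * cell_volume / (4 * np.pi * EPS0)

# Hypothetical test case: small cube with a uniform, static charge density
side, n = 0.1, 10
g = (np.arange(n) + 0.5) / n * side - side / 2       # cell centres along one axis
points = np.array(np.meshgrid(g, g, g, indexing="ij")).reshape(3, -1).T
cell_volume = (side / n) ** 3
rho_static = lambda pts, times: np.full(len(pts), 1e-6)   # C/m^3, time-independent

u2 = np.array([10.0, 0.0, 0.0])                      # observation point far from the cube
phi = retarded_scalar_potential(rho_static, points, cell_volume, u2, t=0.0)
phi_point = 1e-6 * side**3 / (4 * np.pi * EPS0 * 10.0)    # point-charge value Q/(4 pi eps0 r)
print(phi, phi_point)
```

Far from a small static source, the sum should agree closely with the Coulomb potential of the total charge, which gives a quick sanity check. For a time-dependent density the only change is in `rho`; the vector potential can be treated component by component in the same way.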
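Now let us recall that the scattered electric field is related to the potentials by $\mathbf{E}_s = -\nabla\phi - \frac{\partial}{\partial t}\mathbf{A}$, and then

$$\mathbf{E}^{(1)}_s(\mathbf{u}_2,t) = -\frac{1}{4\pi\epsilon_0}\nabla\int_V d\mathbf{u}_1\,\frac{\rho^{(1)}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c)}{|\mathbf{u}_2-\mathbf{u}_1|}
- \frac{\mu_0}{4\pi}\frac{\partial}{\partial t}\int_V d\mathbf{u}_1\,\frac{\mathbf{J}^{(1)}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c)}{|\mathbf{u}_2-\mathbf{u}_1|} . \tag{7}$$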
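The charge density induced by the total field, $\mathbf{E}_t(\mathbf{u}_2,t) = \mathbf{E}_{in}(\mathbf{u}_2,t) + \mathbf{E}^{(1)}_s(\mathbf{u}_2,t)$, using (3) is

$$\begin{aligned}
\rho^{(2)}(\mathbf{u}_2,t) &= -\frac{\epsilon_0\nabla\chi(\mathbf{u}_2)\cdot\mathbf{E}_t(\mathbf{u}_2,t)}{1+\chi(\mathbf{u}_2)} \\
&= -\frac{\epsilon_0\nabla\chi(\mathbf{u}_2)\cdot\mathbf{E}_{in}(\mathbf{u}_2,t)}{1+\chi(\mathbf{u}_2)} \\
&\quad + \frac{1}{4\pi(1+\chi(\mathbf{u}_2))}\nabla\chi(\mathbf{u}_2)\cdot\nabla\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\rho^{(1)}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) \\
&\quad + \frac{\epsilon_0\mu_0}{4\pi(1+\chi(\mathbf{u}_2))}\nabla\chi(\mathbf{u}_2)\cdot\frac{\partial}{\partial t}\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\mathbf{J}^{(1)}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) ,
\end{aligned} \tag{8}$$

while using (4) and (5)

$$\begin{aligned}
\mathbf{J}^{(2)}(\mathbf{u}_2,t) &= \epsilon_0\chi(\mathbf{u}_2)\frac{\partial\mathbf{E}_t(\mathbf{u}_2,t)}{\partial t} + \sigma^e(\mathbf{u}_2)\mathbf{E}_t(\mathbf{u}_2,t) \\
&= \epsilon_0\chi(\mathbf{u}_2)\frac{\partial\mathbf{E}_{in}(\mathbf{u}_2,t)}{\partial t} + \sigma^e(\mathbf{u}_2)\mathbf{E}_{in}(\mathbf{u}_2,t) \\
&\quad - \chi(\mathbf{u}_2)\frac{1}{4\pi}\nabla\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\frac{\partial\rho^{(1)}}{\partial t}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) \\
&\quad - \chi(\mathbf{u}_2)\frac{\epsilon_0\mu_0}{4\pi}\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\frac{\partial^2\mathbf{J}^{(1)}}{\partial t^2}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) \\
&\quad - \frac{\sigma^e(\mathbf{u}_2)}{4\pi\epsilon_0}\nabla\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\rho^{(1)}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) \\
&\quad - \frac{\mu_0\sigma^e(\mathbf{u}_2)}{4\pi}\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\frac{\partial\mathbf{J}^{(1)}}{\partial t}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) .
\end{aligned} \tag{9}$$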
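Denoting

$$\rho_{in}(\mathbf{u},t) = -\frac{\epsilon_0\nabla\chi(\mathbf{u})\cdot\mathbf{E}_{in}(\mathbf{u},t)}{1+\chi(\mathbf{u})} , \qquad
\mathbf{J}_{in}(\mathbf{u},t) = \epsilon_0\chi(\mathbf{u})\frac{\partial}{\partial t}\mathbf{E}_{in}(\mathbf{u},t) + \sigma^e(\mathbf{u})\mathbf{E}_{in}(\mathbf{u},t) ,$$

we write (8) and (9) as $(\rho^{(2)}, \mathbf{J}^{(2)}) = (\rho_{in}, \mathbf{J}_{in}) + K(\rho^{(1)}, \mathbf{J}^{(1)})$, where the matrix elements of $K$,

$$K = \begin{bmatrix}
K_{00} & K_{01} & K_{02} & K_{03} \\
K_{10} & K_{11} & K_{12} & K_{13} \\
K_{20} & K_{21} & K_{22} & K_{23} \\
K_{30} & K_{31} & K_{32} & K_{33}
\end{bmatrix} ,$$

are integral operators that, in the case where the conductivity is neglected, are given by

$$\begin{aligned}
(K_{00}\rho)(\mathbf{u}_2,t) &= \frac{1}{4\pi(1+\chi(\mathbf{u}_2))}\nabla\chi(\mathbf{u}_2)\cdot\nabla\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\rho(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) , \\
K_{12} &= K_{13} = K_{21} = K_{23} = K_{31} = K_{32} = 0 , \\
(K_{0i}J)(\mathbf{u}_2,t) &= \frac{\epsilon_0\mu_0}{4\pi(1+\chi(\mathbf{u}_2))}\frac{\partial\chi(u_{21},u_{22},u_{23})}{\partial u_{2i}}\,\frac{\partial}{\partial t}\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,J_i(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) , \\
(K_{i0}\rho)(\mathbf{u}_2,t) &= -\chi(\mathbf{u}_2)\frac{1}{4\pi}\frac{\partial}{\partial u_{2i}}\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\frac{\partial\rho}{\partial t}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) , \\
(K_{ii}J)(\mathbf{u}_2,t) &= -\chi(\mathbf{u}_2)\frac{\epsilon_0\mu_0}{4\pi}\int_V d\mathbf{u}_1\,\frac{1}{|\mathbf{u}_2-\mathbf{u}_1|}\,\frac{\partial^2 J_i}{\partial t^2}(\mathbf{u}_1, t-|\mathbf{u}_2-\mathbf{u}_1|/c) ,
\end{aligned} \tag{10}$$

where $i = 1, 2, 3$ in the last three expressions. The true charge and current densities have to be chosen consistently, so they have to satisfy $(\rho, \mathbf{J}) = (\rho_{in}, \mathbf{J}_{in}) + K(\rho, \mathbf{J})$. Then we get $(\rho, \mathbf{J}) = (1-K)^{-1}(\rho_{in}, \mathbf{J}_{in})$. One uses (1) and (3) to get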
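$$\rho_{in}(\mathbf{u},t) = -\frac{\epsilon_0}{1+\chi(\mathbf{u})}\,\nabla\chi(\mathbf{u})\cdot\mathbf{f}_{in}(\hat{\mathbf{u}})\,\frac{1}{|\mathbf{u}-\mathbf{a}|}\,\frac{dv}{dt}\bigl(t-|\mathbf{u}-\mathbf{a}|/c\bigr) \tag{11}$$

and (1) and (4) to get

$$\mathbf{J}_{in}(\mathbf{u},t) = \frac{\mathbf{f}_{in}(\hat{\mathbf{u}})}{|\mathbf{u}-\mathbf{a}|}\left[\sigma^e(\mathbf{u})\,\frac{dv}{dt}\bigl(t-|\mathbf{u}-\mathbf{a}|/c\bigr) + \epsilon_0\chi(\mathbf{u})\,\frac{d^2v}{dt^2}\bigl(t-|\mathbf{u}-\mathbf{a}|/c\bigr)\right] . \tag{12}$$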
$K$ is an unbounded operator due to the presence of the spatial and temporal derivatives, so the Neumann series cannot be used to calculate $(1-K)^{-1}$. It is already clear from the numerical results obtained in small bodies that inside the materials the scattered field presents a shape very different from the shape of the incident field, and the wavelength is changed. Instead, if we use the Born approximation, i.e. $(1-K)^{-1} \approx 1+K$, this would lead to densities whose shape would be similar to that of the incident densities.
For the reasons already mentioned in the introduction, we are going to study the self-adjoint operator $(1-K)^*(1-K) = 1 - K^* - K + K^*K$. In particular, we are going to study the sparsity of $K^*K$ in the Daubechies wavelet basis.
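In finite dimension, replacing $(1-K)d = d_{in}$ by $(1-K)^*(1-K)d = (1-K)^*d_{in}$ can be sketched as follows. The matrix `K` below is a hypothetical stand-in (a small random sparse complex matrix), not the wavelet discretization of the chapter's operator; the example only shows that the two systems have the same solution and that the second matrix is Hermitian and positive.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Hypothetical stand-in for K restricted to a finite subspace:
# a non-symmetric complex matrix with about 10% non-zero entries.
mask = rng.random((n, n)) < 0.1
K = 0.1 * mask * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
d_in = rng.standard_normal(n) + 1j * rng.standard_normal(n)

A = np.eye(n) - K                  # 1 - K, not self-adjoint
N = A.conj().T @ A                 # (1 - K)^* (1 - K), Hermitian
rhs = A.conj().T @ d_in            # (1 - K)^* d_in

d_direct = np.linalg.solve(A, d_in)
d_normal = np.linalg.solve(N, rhs)
print(np.linalg.norm(d_direct - d_normal))   # the two solutions should coincide
print(np.linalg.eigvalsh(N).min())           # smallest eigenvalue of the Hermitian matrix
```

The price of this reformulation is that the condition number is squared, which is harmless in this well-conditioned toy case; the benefit is that Hermitian positive matrices allow solvers such as conjugate gradients and make stability questions easier to state.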
3 Time-Reversal Operator
\[
K^*K = \begin{bmatrix}
K_{00}^*K_{00}+K_{10}^*K_{10}+K_{20}^*K_{20}+K_{30}^*K_{30} & K_{00}^*K_{01}+K_{10}^*K_{11} & K_{00}^*K_{02}+K_{20}^*K_{22} & K_{00}^*K_{03}+K_{30}^*K_{33}\\
K_{01}^*K_{00}+K_{11}^*K_{10} & K_{01}^*K_{01}+K_{11}^*K_{11} & K_{01}^*K_{02} & K_{01}^*K_{03}\\
K_{02}^*K_{00}+K_{22}^*K_{20} & K_{02}^*K_{01} & K_{02}^*K_{02}+K_{22}^*K_{22} & K_{02}^*K_{03}\\
K_{03}^*K_{00}+K_{33}^*K_{30} & K_{03}^*K_{01} & K_{03}^*K_{02} & K_{03}^*K_{03}+K_{33}^*K_{33}
\end{bmatrix}.
\]
$K^*K$ acts on a quadruplet constituted by the charge density function and the three components of the current density field, which will be expressed on the basis formed by the products of the spatial wavelets and the temporal wavelets. We want to study the operator $K^*K$ in the quadruplet basis
\[
\left(w_{m_0n_0}(x)\,w_{j_0k_0}(t),\ w_{m_1n_1}(x)\,w_{j_1k_1}(t),\ w_{m_2n_2}(x)\,w_{j_2k_2}(t),\ w_{m_3n_3}(x)\,w_{j_3k_3}(t)\right).
\]
We are not going to study all the elements of $K^*K$, as this would be rather tedious. To simplify the exposition, we limit ourselves to the study of $K_{01}^*K_{01} + K_{11}^*K_{11}$, which acts only on the first component of the current density. To simplify, we will also suppose that the conductivity is zero. We are going to show that many matrix elements $\left(w_{m_1'n_1'}w_{j_1'k_1'},\ (K_{01}^*K_{01} + K_{11}^*K_{11})\,w_{m_1n_1}w_{j_1k_1}\right)$ are equal to zero or very small. To simplify the writing, we suppress the index 1 in the wavelet notation, so that $m = (m_1, m_2, m_3)$ and $n = (n_1, n_2, n_3)$ denote the spatial scale and position indices. The diameter of the support of $w_{mn}(x)$ is called $d_m$, and the diameter of the support of $w_{m'n'}(x)$ is called $d_{m'}$. These diameters can be calculated once we know that the support of the Daubechies-$p$ wavelet is the interval $(-p+1, p)$, so the support of $w_{mn}(y)$ is
\[
\left(2^{-m_1}n_1 - 2^{-m_1}(p-1),\ 2^{-m_1}n_1 + 2^{-m_1}p\right)\times\left(2^{-m_2}n_2 - 2^{-m_2}(p-1),\ 2^{-m_2}n_2 + 2^{-m_2}p\right)\times\left(2^{-m_3}n_3 - 2^{-m_3}(p-1),\ 2^{-m_3}n_3 + 2^{-m_3}p\right).
\tag{13}
\]
In the following theorem, we want to show that some matrix elements of $K_{01}^*K_{01} + K_{11}^*K_{11}$ corresponding to the column $(j, k, m, n)$ are equal to zero.
Theorem 1 For fixed $(j, k, m, n)$, if
\[
|x_{m'n'} - x_{mn}| \le c\left(2^{-j'}k' - 2^{-j}k - (2^{-j}+2^{-j'})p + 2^{-j'}\right) - d_m - d_{m'},
\]
or if
\[
|x_{m'n'} - x_{mn}| \le c\left(2^{-j}k - 2^{-j'}k' - (2^{-j}+2^{-j'})p + 2^{-j}\right) - d_m - d_{m'},
\]
the matrix element $\left(w_{m'n'}w_{j'k'},\ (K_{01}^*K_{01} + K_{11}^*K_{11})\,w_{mn}w_{jk}\right) = 0$.
Proof From (9), we get
\[
(K_{11}w_{mn}w_{jk})(u,t) = -\chi(u)\,\frac{1}{4\pi c^2}\int_V dy\,\frac{1}{|u-y|}\,w_{mn}(y)\,\frac{d^2w_{jk}}{dt^2}(t-|u-y|/c).
\tag{14}
\]
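Then the matrix element of $K_{11}^*K_{11}$ is
\[
\begin{aligned}
\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)
&= \left(K_{11}w_{m'n'}w_{j'k'},\ K_{11}w_{mn}w_{jk}\right)\\
&= \frac{1}{16\pi^2c^4}\int_V du\,|\chi(u)|^2\int_{-\infty}^{\infty}dt\int_V dx\,\frac{1}{|u-x|}\,w_{m'n'}(x)\,\frac{d^2w_{j'k'}}{dt^2}(t-|u-x|/c)\\
&\qquad\times\int_V dy\,\frac{1}{|u-y|}\,w_{mn}(y)\,\frac{d^2w_{jk}}{dt^2}(t-|u-y|/c)\\
&= \frac{1}{16\pi^2c^4}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\int_V du\,|\chi(u)|^2\,\frac{1}{|u-y||u-x|}\\
&\qquad\times\int_{-\infty}^{\infty}dt\,\frac{d^2w_{j'k'}}{dt^2}(t-|u-x|/c)\,\frac{d^2w_{jk}}{dt^2}(t-|u-y|/c)\\
&= \frac{1}{16\pi^2c^4}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\int_V du\,|\chi(u)|^2\,\frac{1}{|u-y||u-x|}\\
&\qquad\times\int_{-\infty}^{\infty}dt\,\frac{d^2w_{j'0}}{dt^2}(t)\,\frac{d^2w_{j0}}{dt^2}\left(t+|u-x|/c-|u-y|/c-2^{-j}k+2^{-j'}k'\right).
\end{aligned}
\tag{15}
\]
Introducing the function $W_{jj'}$,
\[
W_{jj'}(t_0) := \int_{-\infty}^{\infty}dt\,\frac{d^2w_{j'0}}{dt^2}(t)\,\frac{d^2w_{j0}}{dt^2}(t-t_0),
\tag{16}
\]
then
\[
\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right) = \frac{1}{16\pi^2c^4}\int_V du\,|\chi(u)|^2\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\,\frac{1}{|u-y||u-x|}\,W_{jj'}\!\left(|u-y|/c-|u-x|/c+2^{-j}k-2^{-j'}k'\right).
\tag{17}
\]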
As $\frac{d^2w_{j0}}{dt^2}(t)$ and $\frac{d^2w_{j'0}}{dt^2}(t)$ have, respectively, their support in the intervals $(2^{-j}(-p+1),\ 2^{-j}p)$ and $(2^{-j'}(-p+1),\ 2^{-j'}p)$, the support of $W_{jj'}(t_0)$ is the interval
\[
(t_m, t_M) := \left(-(2^{-j}+2^{-j'})p + 2^{-j'},\ (2^{-j}+2^{-j'})p - 2^{-j}\right).
\]
To calculate the integral over $u$ in the material volume, we introduce the prolate spheroidal coordinates. The Cartesian coordinates of $u = (u_1, u_2, u_3)$ are given with respect to the following orthonormal vectors, whose origin is the midpoint between $x$ and $y$: $e_3 = (x-y)/\|x-y\|$, $e_2$ is parallel to the ground plane and perpendicular to $e_3$, while $e_1$ is perpendicular to the previous ones. If one denotes by $2a$ the distance between $x$ and $y$, the spheroidal coordinates are given by
\[
\begin{aligned}
\tau &= \frac{1}{2a}\left(\sqrt{u_1^2+u_2^2+(u_3+a)^2} + \sqrt{u_1^2+u_2^2+(u_3-a)^2}\right),\\
\sigma &= \frac{1}{2a}\left(\sqrt{u_1^2+u_2^2+(u_3+a)^2} - \sqrt{u_1^2+u_2^2+(u_3-a)^2}\right),\\
\phi &= \arctan\frac{u_2}{u_1}.
\end{aligned}
\tag{18}
\]
Notice that $\tau \in [1, \infty)$, while $\sigma \in [-1, 1]$. The Cartesian coordinates, in terms of the spheroidal coordinates, are given by
\[
\begin{aligned}
u_1 &= a\sqrt{(\tau^2-1)(1-\sigma^2)}\,\cos\phi,\\
u_2 &= a\sqrt{(\tau^2-1)(1-\sigma^2)}\,\sin\phi,\\
u_3 &= a\,\tau\sigma.
\end{aligned}
\tag{19}
\]
The surfaces of constant $\tau$ are prolate spheroids, while the surfaces of constant $\sigma$ are hyperboloids. The volume element is $dV = a^3(\tau^2-\sigma^2)\,d\tau\,d\sigma\,d\phi$. Notice that $\tau = \frac{1}{2a}(|u-y|+|u-x|)$ and $\sigma = \frac{1}{2a}(|u-y|-|u-x|)$; then $|u-y| = a(\tau+\sigma)$ and $|u-x| = a(\tau-\sigma)$, so $|u-y||u-x| = a^2(\tau^2-\sigma^2)$. The integrand simplifies, and the matrix element $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$ becomes
\[
\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right) = \frac{1}{16\pi^2c^4}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\,|x-y|\int_{-1}^{1}d\sigma\,W_{jj'}\!\left(\sigma|x-y|/c+2^{-j}k-2^{-j'}k'\right)\int_1^{\infty}d\tau\int_0^{2\pi}d\phi\,|\chi(\sigma,\tau,\phi,x,y)|^2.
\tag{20}
\]
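The coordinate change (18)-(19) can be checked numerically. The sketch below is only an illustration: the function names and the test point are hypothetical, and `math.atan2` is used in place of $\arctan(u_2/u_1)$ so that the quadrant of $\phi$ is resolved. It assumes, as in the text, that $x$ sits at $u_3 = +a$ and $y$ at $u_3 = -a$.

```python
import math


def to_spheroidal(u, a):
    """Map Cartesian (u1, u2, u3) to prolate spheroidal (tau, sigma, phi)."""
    u1, u2, u3 = u
    dist_y = math.sqrt(u1 * u1 + u2 * u2 + (u3 + a) ** 2)  # |u - y|
    dist_x = math.sqrt(u1 * u1 + u2 * u2 + (u3 - a) ** 2)  # |u - x|
    tau = (dist_y + dist_x) / (2.0 * a)
    sigma = (dist_y - dist_x) / (2.0 * a)
    phi = math.atan2(u2, u1)
    return tau, sigma, phi


def to_cartesian(tau, sigma, phi, a):
    """Inverse map, following (19)."""
    radial = a * math.sqrt(max((tau * tau - 1.0) * (1.0 - sigma * sigma), 0.0))
    return radial * math.cos(phi), radial * math.sin(phi), a * tau * sigma


# Hypothetical test point and half-distance between x and y.
A_HALF = 0.5
POINT = (0.3, -0.4, 0.7)

tau, sigma, phi = to_spheroidal(POINT, A_HALF)
round_trip = to_cartesian(tau, sigma, phi, A_HALF)

# |u - y| |u - x| should equal a^2 (tau^2 - sigma^2), the factor that
# cancels against the volume element a^3 (tau^2 - sigma^2).
dist_product = (A_HALF * (tau + sigma)) * (A_HALF * (tau - sigma))
```

The cancellation of $\tau^2-\sigma^2$ between the volume element and the product $|u-y||u-x|$ is what removes the $1/(|u-y||u-x|)$ singularity from the integrand.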
Denoting
\[
f(\sigma,x,y) := \int_1^{\infty}d\tau\int_0^{2\pi}d\phi\,|\chi(\sigma,\tau,\phi,x,y)|^2,
\tag{21}
\]
we get
\[
\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right) = \frac{1}{16\pi^2c^4}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\,|x-y|\int_{-1}^{1}d\sigma\,W_{jj'}\!\left(\sigma|x-y|/c+2^{-j}k-2^{-j'}k'\right)f(\sigma,x,y).
\tag{22}
\]
Performing the change of variables $\sigma|x-y|/c + 2^{-j}k - 2^{-j'}k' = t_0$, it is equal to
\[
\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right) = \frac{1}{16\pi^2c^3}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\int_{-|x-y|/c+2^{-j}k-2^{-j'}k'}^{|x-y|/c+2^{-j}k-2^{-j'}k'}dt_0\,W_{jj'}(t_0)\,\tilde f(t_0,x,y),
\tag{23}
\]
where
\[
\tilde f(t_0,x,y) := f\!\left(\frac{c\,(2^{-j'}k'-2^{-j}k+t_0)}{|x-y|},\ x,\ y\right).
\tag{24}
\]
We are now going to study the position of the integral bounds $|x-y|/c + 2^{-j}k - 2^{-j'}k'$ and $-|x-y|/c + 2^{-j}k - 2^{-j'}k'$ with respect to $t_m$ and $t_M$. Recall that the interval $(t_m, t_M)$ is the support of $W_{jj'}(t_0)$, with $t_m = -(2^{-j}+2^{-j'})p + 2^{-j'}$ and $t_M = (2^{-j}+2^{-j'})p - 2^{-j}$. All the possible cases are:
(case A) $|x-y|/c + 2^{-j}k - 2^{-j'}k' \le t_m$.
(case B) $-|x-y|/c + 2^{-j}k - 2^{-j'}k' \le t_m$ and $t_m \le |x-y|/c + 2^{-j}k - 2^{-j'}k' \le t_M$.
(case C) $-|x-y|/c + 2^{-j}k - 2^{-j'}k' \le t_m$ and $|x-y|/c + 2^{-j}k - 2^{-j'}k' \ge t_M$.
(case D) $-|x-y|/c + 2^{-j}k - 2^{-j'}k' \ge t_m$ and $t_m \le |x-y|/c + 2^{-j}k - 2^{-j'}k' \le t_M$.
(case E) $t_m \le -|x-y|/c + 2^{-j}k - 2^{-j'}k' \le t_M$ and $|x-y|/c + 2^{-j}k - 2^{-j'}k' \ge t_M$.
(case F) $-|x-y|/c + 2^{-j}k - 2^{-j'}k' \ge t_M$.
In cases A and F, the integral $\int_{-|x-y|/c+2^{-j}k-2^{-j'}k'}^{|x-y|/c+2^{-j}k-2^{-j'}k'}dt_0\,W_{jj'}(t_0)\,\tilde f(t_0,x,y)$ is equal to zero, because the integration interval does not meet the support of $W_{jj'}$. As $x$ and $y$ belong, respectively, to the supports of the wavelets $w_{m'n'}$ and $w_{mn}$, we have $|x-y| \le |x_{m'n'} - x_{mn}| + d_m + d_{m'}$. Condition A is therefore satisfied if $|x_{m'n'} - x_{mn}| \le c(2^{-j'}k' - 2^{-j}k + t_m) - d_m - d_{m'}$, so $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right) = 0$ if $|x_{m'n'} - x_{mn}| \le c\left(2^{-j'}k' - 2^{-j}k - (2^{-j}+2^{-j'})p + 2^{-j'}\right) - d_m - d_{m'}$. Condition F is satisfied if $|x_{m'n'} - x_{mn}| \le c(2^{-j}k - 2^{-j'}k' - t_M) - d_m - d_{m'}$, so $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right) = 0$ if $|x_{m'n'} - x_{mn}| \le c\left(2^{-j}k - 2^{-j'}k' - (2^{-j}+2^{-j'})p + 2^{-j}\right) - d_m - d_{m'}$. QED
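The case analysis of the proof lends itself to a small helper. The sketch below is hypothetical code, not the author's: it assumes the support bounds $t_m = -(2^{-j}+2^{-j'})p + 2^{-j'}$ and $t_M = (2^{-j}+2^{-j'})p - 2^{-j}$, times in nanoseconds, lengths in metres and $c = 0.3$ m/ns. It classifies a pair $(x, y)$ into cases A to F and evaluates the sufficient condition of Theorem 1 for the wavelet centres.

```python
def w_support_bounds(j, jp, p):
    """Support (t_m, t_M) of W_{jj'} under the convention assumed above."""
    spread = (2.0 ** -j + 2.0 ** -jp) * p
    return -spread + 2.0 ** -jp, spread - 2.0 ** -j


def classify_case(dist, j, k, jp, kp, p, c=0.3):
    """Return 'A'..'F' for the integration bounds of (23), with dist = |x - y|."""
    t_m, t_M = w_support_bounds(j, jp, p)
    shift = 2.0 ** -j * k - 2.0 ** -jp * kp
    lower, upper = -dist / c + shift, dist / c + shift
    if upper <= t_m:
        return "A"
    if lower >= t_M:
        return "F"
    if lower <= t_m:
        return "C" if upper >= t_M else "B"
    return "E" if upper >= t_M else "D"


def theorem1_zero(center_dist, d_m, d_mp, j, k, jp, kp, p, c=0.3):
    """True when one of the two sufficient conditions of Theorem 1 holds."""
    t_m, t_M = w_support_bounds(j, jp, p)
    delay = 2.0 ** -jp * kp - 2.0 ** -j * k
    cond_a = center_dist <= c * (delay + t_m) - d_m - d_mp
    cond_f = center_dist <= c * (-delay - t_M) - d_m - d_mp
    return cond_a or cond_f


# Illustrative values: j = j' = 7, p = 4, wavelets centred at 10 ns and 15 ns.
J, K_IDX, JP, KP_IDX, P = 7, 1280, 7, 1920, 4
near = classify_case(1.0, J, K_IDX, JP, KP_IDX, P)
middle = classify_case(1.5, J, K_IDX, JP, KP_IDX, P)
far = classify_case(2.0, J, K_IDX, JP, KP_IDX, P)
```

With a 5 ns delay the light-cone radius is 1.5 m: points well inside it fall in case A (zero matrix element), points near it in the partially overlapping cases, and points well outside it in case C.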
Remark Theorem 1 results from a property of function supports. It could have been obtained more easily by noticing that the operators $K_{ij}$ are the products of three operators: a time derivation operator, a free wave propagation operator, and a space derivation operator. If $m$ and $j$ are sufficiently large, it is easy to follow, as $t$ increases, the spatial support of $K_{ij}w_{mn}w_{jk}$. At time $t = 2^{-j}k$, the support is in a small ball, while at a time $t$ larger than $2^{-j}k$, the support is a spherical crown centred at $x_{mn}$ whose mean radius is $c(t - 2^{-j}k)$ and whose width is $d_m$. Similarly, the support of $K_{ij}w_{m'n'}w_{j'k'}$ at time $t$ is a spherical crown centred at $x_{m'n'}$ whose mean radius is $c(t - 2^{-j'}k')$ and whose width is $d_{m'}$. It is easy to see that if $|x_{m'n'} - x_{mn}| \le c\left(2^{-j'}k' - 2^{-j}k - (2^{-j}+2^{-j'})p + 2^{-j'}\right) - d_m - d_{m'}$ or if $|x_{m'n'} - x_{mn}| \le c\left(2^{-j}k - 2^{-j'}k' - (2^{-j}+2^{-j'})p + 2^{-j}\right) - d_m - d_{m'}$, the supports never intersect as $t$ evolves, so the scalar product of $K_{ij}w_{mn}w_{jk}$ and $K_{ij}w_{m'n'}w_{j'k'}$ is equal to zero. So Theorem 1 is almost trivial, and its proof could do without the introduction of spheroidal coordinates.
Let us notice that the supports intersect if $|x_{m'n'} - x_{mn}| > c\left(2^{-j'}k' - 2^{-j}k + (2^{-j}+2^{-j'})p - 2^{-j}\right) + d_m + d_{m'}$ and $2^{-j'}k' - 2^{-j}k > (2^{-j}+2^{-j'})p - 2^{-j}$. A priori, nothing can be said about the scalar product; nevertheless, we are going to prove that under the previous condition the scalar product can be very small. This result will be nontrivial and more interesting.
We are going to examine $\int_{-1}^{1}d\sigma\,W_{jj'}(\sigma|x-y|/c + 2^{-j}k - 2^{-j'}k')\,f(\sigma,x,y)$ in case C. We need some hypothesis about the behaviour of $f(\sigma,x,y)$ on the domain of integration. Using the fact that $(t_m, t_M)$ is the support of $W_{jj'}$, the previous integral can be rewritten as
\[
\int_{c(t_m+2^{-j'}k'-2^{-j}k)/|x-y|}^{c(t_M+2^{-j'}k'-2^{-j}k)/|x-y|}d\sigma\,W_{jj'}\!\left(\sigma|x-y|/c+2^{-j}k-2^{-j'}k'\right)f(\sigma,x,y).
\]
We are going to suppose that $f(\sigma,x,y)$ defined in (21) has partial derivatives with respect to $\sigma$ up to order $s$, but that the derivative of order $s+1$ does not exist at discrete points $\sigma_i(x,y)$. In this case, we will say that $f(\sigma,x,y)$ has a type-$s$ singularity at $\sigma_i(x,y)$. These singularities depend on the geometry of the materials and the singularities of the permittivity inside the volume. If the volume is a sphere and the permittivity is constant inside the sphere, $s$ is infinite, but if the permittivity is piecewise constant in some parallelepipeds, then, for fixed $x, y$, $\frac{\partial^2 f(\sigma,x,y)}{\partial\sigma^2}$ does not exist for the values of $\sigma$ for which the hyperboloid of constant $\sigma$ hits a parallelepiped corner (see "Appendix 4: Singularities"). The Taylor series at the point $\sigma_i$ can be written as
\[
f(\sigma,x,y) = f(\sigma_i,x,y) + (\sigma-\sigma_i)\,\frac{\partial f(\sigma,x,y)}{\partial\sigma}(\sigma_i) + \dots + \frac{(\sigma-\sigma_i)^{s-1}}{(s-1)!}\,\frac{\partial^{s-1}f(\sigma,x,y)}{\partial\sigma^{s-1}}(\sigma_i) + R_s(\sigma,x,y)\,\frac{(\sigma-\sigma_i)^s}{s!}
\tag{25}
\]
inside an interval that depends on $x$ and $y$.
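Denoting $t_{0i} := \sigma_i|x-y|/c + 2^{-j}k - 2^{-j'}k'$, we get
\[
\tilde f(t_0,x,y) = \tilde f(t_{0i},x,y) + (t_0-t_{0i})\,\frac{\partial\tilde f(t_0,x,y)}{\partial t_0}(t_{0i}) + \dots + \frac{(t_0-t_{0i})^{s-1}}{(s-1)!}\,\frac{\partial^{s-1}\tilde f(t_0,x,y)}{\partial t_0^{s-1}}(t_{0i}) + \tilde R_s(t_0,x,y)\,\frac{(t_0-t_{0i})^s}{s!}.
\tag{26}
\]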
Theorem 2 If $|x_{m'n'} - x_{mn}| > c\left(2^{-j'}k' - 2^{-j}k + (2^{-j}+2^{-j'})p - 2^{-j}\right) + d_m + d_{m'}$, $2^{-j'}k' - 2^{-j}k > (2^{-j}+2^{-j'})p - 2^{-j}$, and $f(\sigma,x,y)$ has singularities of type-$s$ with $s \ge p+1$ inside the interval
\[
\left(\frac{c\,(t_m+2^{-j'}k'-2^{-j}k)}{|x-y|},\ \frac{c\,(t_M+2^{-j'}k'-2^{-j}k)}{|x-y|}\right)
\]
for all couples $(x,y)$ for which $x$ is in the ball of centre $x_{m'n'}$ and radius $d_{m'}$ and $y$ is in the ball of centre $x_{mn}$ and radius $d_m$, the matrix element satisfies $\left|\left(w_{m'n'}w_{j'k'},\ (K_{01}^*K_{01} + K_{11}^*K_{11})\,w_{mn}w_{jk}\right)\right| < C\,(2^{-j}+2^{-j'})^{p}$ for some constant $C$.
If $|x_{m'n'} - x_{mn}| > c\left(2^{-j'}k' - 2^{-j}k + (2^{-j}+2^{-j'})p - 2^{-j}\right) + d_m + d_{m'}$, $2^{-j'}k' - 2^{-j}k > (2^{-j}+2^{-j'})p - 2^{-j}$, and $f(\sigma,x,y)$ has a unique singularity of type-$s$ with $s < p+1$ inside the same interval for all couples $(x,y)$ for which $x$ is in the ball of centre $x_{m'n'}$ and radius $d_{m'}$ and $y$ is in the ball of centre $x_{mn}$ and radius $d_m$, the matrix element satisfies $\left|\left(w_{m'n'}w_{j'k'},\ (K_{01}^*K_{01} + K_{11}^*K_{11})\,w_{mn}w_{jk}\right)\right| < C'\,(2^{-j}+2^{-j'})^{s}$ for some constant $C'$.
Remark For $j, k, j', k'$ fixed, the interval $\left(\frac{c\,(t_m+2^{-j'}k'-2^{-j}k)}{|x-y|},\ \frac{c\,(t_M+2^{-j'}k'-2^{-j}k)}{|x-y|}\right)$ shrinks as $|x-y|$ becomes large.
Proof If $f(\sigma,x,y)$ has singularities of type-$s$ with $s \ge p+1$ inside the interval $\left(\frac{c\,(t_m+2^{-j'}k'-2^{-j}k)}{|x-y|},\ \frac{c\,(t_M+2^{-j'}k'-2^{-j}k)}{|x-y|}\right)$ for all couples $(x,y)$ for which $x$ is in the ball of centre $x_{m'n'}$ and radius $d_{m'}$ and $y$ is in the ball of centre $x_{mn}$ and radius $d_m$, then the derivatives of order $p+1$ of $\tilde f(t_0,x,y)$ exist, and its Taylor series at the point 0 is
\[
\tilde f(t_0,x,y) = \tilde f(0,x,y) + t_0\,\frac{\partial\tilde f(t_0,x,y)}{\partial t_0}(0) + \dots + \frac{t_0^{p}}{p!}\,\frac{\partial^{p}\tilde f(t_0,x,y)}{\partial t_0^{p}}(0) + R_p(t_0,x,y)\,\frac{t_0^{p+1}}{(p+1)!}.
\tag{27}
\]
Replacing $\tilde f(t_0,x,y)$ in the integral (23) by its Taylor expansion, it appears, as the first $p$ moments of the wavelet are equal to zero, that the integral in (23) is smaller than $\sup_{t_0}|R_p(t_0,x,y)|\,\frac{t_M^{p+1}}{(p+1)!} < C\,(2^{-j}+2^{-j'})^{p+1}$ for some constant $C$, due to Stirling's approximation for the factorial. One can use the continuity of $R_p(t_0,x,y)$ in the neighbourhood of $x_{m'n'}$ and $x_{mn}$ to prove that the matrix element $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$ is smaller than a constant times $(2^{-j}+2^{-j'})^{p+1}$.
If $f(\sigma,x,y)$ has a unique singularity of type-$s$ inside the interval $\left(\frac{c\,(t_m+2^{-j'}k'-2^{-j}k)}{|x-y|},\ \frac{c\,(t_M+2^{-j'}k'-2^{-j}k)}{|x-y|}\right)$ for all couples $(x,y)$ for which $x$ is in the ball of centre $x_{m'n'}$ and radius $d_{m'}$ and $y$ is in the ball of centre $x_{mn}$ and radius $d_m$, this is equivalent to the fact that $\tilde f(t_0,x,y)$ has a unique singularity at the point $t_{0i}$ inside the interval $(t_m, t_M)$. Then we can use its Taylor series at the point $t_{0i}$ up to order $s$, given by (26):
\[
\tilde f(t_0,x,y) = \tilde f(t_{0i},x,y) + (t_0-t_{0i})\,\frac{\partial\tilde f(t_0,x,y)}{\partial t_0}(t_{0i}) + \dots + \frac{(t_0-t_{0i})^{s-1}}{(s-1)!}\,\frac{\partial^{s-1}\tilde f(t_0,x,y)}{\partial t_0^{s-1}}(t_{0i}) + \tilde R_s(t_0,x,y)\,\frac{(t_0-t_{0i})^s}{s!}.
\tag{28}
\]
Replacing $\tilde f(t_0,x,y)$ in the integral (23) by its Taylor expansion, it appears, as the first $s$ moments of the wavelet are equal to zero, that the integral in (23), if $s < p+1$, is smaller than $\sup_{t_0}|\tilde R_s(t_0,x,y)|\,\frac{t_M^{s}}{s!} < C'\,(2^{-j}+2^{-j'})^{s}$ for some constant $C'$. One can use the continuity of $\tilde R_s(t_0,x,y)$ in the neighbourhood of $x_{m'n'}$ and $x_{mn}$ to prove that the matrix element $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$ is smaller than a constant times $(2^{-j}+2^{-j'})^{s}$.
The cases $|x_{m'n'} - x_{mn}| > c\left(2^{-j}k - 2^{-j'}k' + (2^{-j}+2^{-j'})p - 2^{-j'}\right) + d_m + d_{m'}$, $2^{-j}k - 2^{-j'}k' > (2^{-j}+2^{-j'})p - 2^{-j'}$, are similar and give us matrix elements smaller than $C\,(2^{-j}+2^{-j'})^{s+1}$ or $C\,(2^{-j}+2^{-j'})^{p+1}$, depending on the existence or not of singularities with $s < p+1$.
In cases B, D, E, the integral $\int_{-|x-y|/c+2^{-j}k-2^{-j'}k'}^{|x-y|/c+2^{-j}k-2^{-j'}k'}dt_0\,W_{jj'}(t_0)\,\tilde f(t_0,x,y)$ is small because of the oscillations of $W_{jj'}(t_0)$, but not very small, due to the position of the integration bounds with respect to the support of $W_{jj'}$. (See Fig. A.2, Appendix 3: Daubechies Wavelets and $W_{jj'}$ Function.)
Let us now look at $K_{01}^*K_{01}$. From (9), we get
\[
(K_{01}w_{mn}w_{jk})(u,t) = \frac{1}{4\pi c^2(1+\chi(u))}\,\frac{\partial\chi(u)}{\partial u_1}\int_V dy\,\frac{1}{|u-y|}\,w_{mn}(y)\,\frac{dw_{jk}}{dt}(t-|u-y|/c).
\]
Then the matrix element is
\[
\begin{aligned}
\left(w_{m'n'}w_{j'k'},\ K_{01}^*K_{01}\,w_{mn}w_{jk}\right)
&= \left(K_{01}w_{m'n'}w_{j'k'},\ K_{01}w_{mn}w_{jk}\right)\\
&= \frac{1}{16\pi^2c^4}\int_V du\,\frac{\left(\frac{\partial\chi(u)}{\partial u_1}\right)^2}{(1+\chi(u))^2}\int_0^{\infty}dt\int_V dx\,\frac{1}{|u-x|}\,w_{m'n'}(x)\,\frac{dw_{j'k'}}{dt}(t-|u-x|/c)\\
&\qquad\times\int_V dy\,\frac{1}{|u-y|}\,w_{mn}(y)\,\frac{dw_{jk}}{dt}(t-|u-y|/c)\\
&= \frac{1}{16\pi^2c^4}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\int_V du\,\frac{\left(\frac{\partial\chi(u)}{\partial u_1}\right)^2}{(1+\chi(u))^2}\,\frac{1}{|u-y||u-x|}\\
&\qquad\times\int_0^{\infty}dt\,\frac{dw_{j'k'}}{dt}(t-|u-x|/c)\,\frac{dw_{jk}}{dt}(t-|u-y|/c)\\
&= \frac{1}{16\pi^2c^4}\int_V dx\,w_{m'n'}(x)\int_V dy\,w_{mn}(y)\int_V du\,\frac{\left(\frac{\partial\chi(u)}{\partial u_1}\right)^2}{(1+\chi(u))^2}\,\frac{1}{|u-y||u-x|}\\
&\qquad\times\int_0^{\infty}dt\,\frac{dw_{j'0}}{dt}(t)\,\frac{dw_{j0}}{dt}\left(t+|u-x|/c-|u-y|/c-2^{-j}k+2^{-j'}k'\right).
\end{aligned}
\]
It is clear from the last formula that the same analysis that was performed before for $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$ can be used now, replacing in (3) the second derivative $\frac{d^2w_{j0}}{dt^2}(t)$ by $\frac{dw_{j0}}{dt}(t)$ and $|\chi(u)|^2$ by $\frac{\left(\frac{\partial\chi(u)}{\partial u_1}\right)^2}{(1+\chi(u))^2}$. We can do the same for all the matrix elements of $K^*K$.
4 Evaluating the Sparsity of the Time-Reversal Operator
When the Hilbert space $L^2(\mathbb{R}^4)\otimes\mathbb{C}^4$ is replaced by a finite-dimensional space generated by a finite set of wavelets, we are going to estimate the proportion of matrix elements that are not negligible. There is no clear limit to the propagation of pulses in cities; nevertheless, considering that the TX antenna, situated at the origin, emits a short pulse at a time that can be taken as the time origin, and considering that this pulse will create charge and current densities in the materials that decrease rapidly in time and become negligible for $t$ larger than some $t_{max}$, we will impose a time limit and a space limit on the study (measurements show that the signal fades at the receiver antenna approximately 300 ns after the emission, and this could be $t_{max}$). Then we will limit the study to a ball of radius $ct_{max}$. To simplify the calculations, we suppose that the selected volume is a parallelepiped $(-L_1, L_1)\times(-L_2, L_2)\times(-L_3, L_3)$. We limit the indices $j, j', m_i, m_i'$ to some set of values, $j \in (\underline{J}, \overline{J})$, $j' \in (\underline{J}, \overline{J})$, $m_i \in (\underline{M}_i, \overline{M}_i)$, $m_i' \in (\underline{M}_i, \overline{M}_i)$. The values of $\underline{M}_i$ depend on the support of the Daubechies-$p$ wavelet and the lengths $L_1, L_2, L_3$. $\underline{J}$ depends on the support of the Daubechies-$p$ wavelet and $t_{max}$. The values of $\overline{M}_i$ and $\overline{J}$ will be chosen depending on the objective in terms of precision for the densities to be calculated.
Once $m_i$ is given, the $n_i$ are chosen such that the support of $w_{m_in_i}(x_i)$, i.e. $(2^{-m_i}n_i - 2^{-m_i}(p-1),\ 2^{-m_i}n_i + 2^{-m_i}p)$, intersects the interval $(-L_i, L_i)$. Once $j$ is given, $k$ is chosen so that the support of $w_{jk}(t)$, i.e. $(2^{-j}k - 2^{-j}(p-1),\ 2^{-j}k + 2^{-j}p)$, intersects the interval $(0, t_{max})$. The total number of considered $w_{m_in_i}$ spatial wavelets is equal to
\[
(1+2+\dots+2^{\overline{M}_1-\underline{M}_1})\times(1+2+\dots+2^{\overline{M}_2-\underline{M}_2})\times(1+2+\dots+2^{\overline{M}_3-\underline{M}_3}) \simeq 2^{\overline{M}_1+\overline{M}_2+\overline{M}_3-\underline{M}_1-\underline{M}_2-\underline{M}_3+3},
\]
while the total number of considered $w_{jk}$ temporal wavelets is equal to $(1+2+\dots+2^{\overline{J}-\underline{J}}) \simeq 2^{\overline{J}-\underline{J}+1}$. So the truncated $K_{11}^*K_{11}$ operator acts on a space of dimension
\[
N_0 = 2^{\overline{M}_1+\overline{M}_2+\overline{M}_3+\overline{J}-\underline{M}_1-\underline{M}_2-\underline{M}_3-\underline{J}+4}.
\]
Now we want to estimate the number of matrix elements of the truncated $K_{01}^*K_{01} + K_{11}^*K_{11}$ matrix that are not negligible. We are going to look at the column of the matrix $K_{01}^*K_{01} + K_{11}^*K_{11}$ corresponding to the basis vector $w_{mn}w_{jk}$.
For fixed $(j', k')$, in the case $2^{-j'}k' - 2^{-j}k > (2^{-j}+2^{-j'})p - 2^{-j}$, and if $x_{m'n'}$ belongs to the spherical crown
\[
c\left(2^{-j'}k' - 2^{-j}k - (2^{-j}+2^{-j'})p + 2^{-j'}\right) - d_m - d_{m'} < |x_{m'n'} - x_{mn}| < c\left(2^{-j'}k' - 2^{-j}k + (2^{-j}+2^{-j'})p - 2^{-j}\right) + d_m + d_{m'},
\]
the matrix elements $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$ are not negligible.
In the case $2^{-j}k - 2^{-j'}k' > (2^{-j}+2^{-j'})p - 2^{-j'}$, the non-negligible matrix elements $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$ belong to the spherical crown
\[
c\left(2^{-j}k - 2^{-j'}k' - (2^{-j}+2^{-j'})p + 2^{-j}\right) - d_m - d_{m'} < |x_{m'n'} - x_{mn}| < c\left(2^{-j}k - 2^{-j'}k' + (2^{-j}+2^{-j'})p - 2^{-j'}\right) + d_m + d_{m'}.
\]
As $d_m = (2p-1)\sqrt{2^{-2m_1}+2^{-2m_2}+2^{-2m_3}}$ and $d_{m'} = (2p-1)\sqrt{2^{-2m_1'}+2^{-2m_2'}+2^{-2m_3'}}$, we notice that the width of the spherical crown becomes smaller as $j$, $j'$, $m_1$, $m_2$, $m_3$, $m_1'$, $m_2'$, $m_3'$ become large.
To count the matrix elements of the column that are not negligible, we can count, fixing $j, k, m, n, j', k'$, the number $N_0(j, k, m, n, j', k', m')$ of points $x_{m'n'}$ on the lattice generated by the basis vectors $2^{-m_1'}e_1$, $2^{-m_2'}e_2$, $2^{-m_3'}e_3$ that are inside the material volume and such that the previous inequalities are satisfied. Now, to get the total number of matrix elements that are not negligible, we perform the sum
\[
\sum_{j'}\sum_{m'}\sum_{k',\ 2^{-j'}k'-2^{-j}k\ge 0} N_0(j, k, m, n, j', k', m').
\]
To illustrate the sparsity, let us estimate $N_0(j, k, m, n, j', k', m')$ in the case where the volume is a parallelepipedic wall ($L_1 = 10$ m, $L_2 = 3$ m, $L_3 = 0.3$ m), signals are in the GHz range, and the wavelength outside is of the order of the centimetre. The unit of time is the nanosecond, and then the light velocity is 0.3 m/ns. We choose the Daubechies 4 wavelet. Then, $p = 4$, $\underline{M}_1 = -\log_2(L_1/p) = -1$, $\underline{M}_2 = -\log_2(L_2/p) = 0$, $\underline{M}_3 = -\log_2(L_3/p) = 4$, $\underline{J} = -\log_2(L_1/cp) = 3$. The values for
M 1 , .M 2 , and .M 3 can be determined once the desired precision on the densities is chosen. We can for instance look for space details of the order .1cm and temporal details of .0.01ns in which case .M 1 = M 2 = M 3 = − log2 10−3 /p = 9 and .J = 9. Then the dimension of the column vector is M +M 2 +M 3 +J −M 1 −M 2 −M 3 −J +4 = 222 = 4.1010 . .N0 = 2 1 Let us choose the column .(j, k, m, n) such that .j = 7, k satisfies, .2−7 k = 10, such that .m1 = m2 = m3 = 7 and .n1 , .n2 , .n3 satisfy, respectively, .2−7 n1 = 5, −7 n = 2, .2−7 n = 0.15. We fix .j = 7 and k’ such that .2−7 k = 15 and choose .2 2 3 .m = m = m = 7. The difference in between the inner and outer radii of the 1 2 3 crown is approximately .Rout − Rin = 0.3. As the number of non-negligible matrix elements is the volume of the crown inside the parallelepiped, divided by the volume of the lattice cell, we find that the ratio between .N0 (j, k, m, n, j , k , m ) and the total number of components of the column vector is approximatively equal to the ratio in between the width of the crown .Rout − Rin and the length .L1 , and then it is close to 0.03. In the presence of singularities for .f (σ, x, y) of order s, it results from theorem 3.2 that some matrix elements out of the spherical crown are not negligible when .j, j , s are small. To find these matrix elements, which correspond to the scars in Fig. 1, is cumbersome but feasible. (See Appendix 4: Singularities for the Parallelepipedic Case.) The set of matrix elements corresponding to the spherical crown and the set of matrix elements outside of the crown that are not negligible will be calculated using formula (17) or (23). Except on the diagonal and its vicinity, we expect small values due to the wavelet oscillations.
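The order of magnitude of the crown width quoted in this section can be reproduced with a few lines. The sketch below is a hedged illustration with hypothetical function names; it assumes the crown bounds stated in this section, so that the outer radius minus the inner radius is $c\,(2^{-j}+2^{-j'})(2p-1) + 2(d_m + d_{m'})$, with lengths in metres, times in nanoseconds and $c = 0.3$ m/ns.

```python
import math


def support_diameter(m, p):
    """d_m = (2p - 1) * sqrt(2^(-2 m1) + 2^(-2 m2) + 2^(-2 m3))."""
    return (2 * p - 1) * math.sqrt(sum(2.0 ** (-2 * mi) for mi in m))


def crown_width(j, jp, m, mp, p, c=0.3):
    """Outer minus inner radius of the spherical crown of non-negligible elements."""
    temporal_part = c * (2.0 ** -j + 2.0 ** -jp) * (2 * p - 1)
    spatial_part = 2.0 * (support_diameter(m, p) + support_diameter(mp, p))
    return temporal_part + spatial_part


# Values of the worked example: j = j' = 7, m = m' = (7, 7, 7), p = 4, L1 = 10 m.
width = crown_width(7, 7, (7, 7, 7), (7, 7, 7), 4)
ratio_to_wall_length = width / 10.0
```

Under these assumptions the width comes out at a few tenths of a metre and the ratio to $L_1$ at a few percent, the same order of magnitude as the figures given in the text.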
5 Discussion and Conclusion
Here we want to address some open problems and perspectives. The choice of $p$, which affects the support of the Daubechies-$p$ wavelet, has to be tuned. From the hypothesis of Theorem 1, the number of matrix elements $\left(w_{m'n'}w_{j'k'},\ (K_{01}^*K_{01} + K_{11}^*K_{11})\,w_{mn}w_{jk}\right)$ such that $|x_{m'n'} - x_{mn}| < c\left(2^{-j'}k' - 2^{-j}k - (2^{-j}+2^{-j'})p + 2^{-j'}\right) - d_m - d_{m'}$, which are equal to zero, decreases with the increase of $p$, as can be checked from the inequality. Similarly, from the hypothesis of Theorem 2, the number of matrix elements of $K_{01}^*K_{01} + K_{11}^*K_{11}$ such that $|x_{m'n'} - x_{mn}| > c\left(2^{-j'}k' - 2^{-j}k + (2^{-j}+2^{-j'})p - 2^{-j}\right) + d_m + d_{m'}$ decreases with the increase of $p$, but some of them can become much smaller since they are of order $(2^{-j}+2^{-j'})^{p+1}$. Notice also that the number of singularities of $f(\sigma,x,y)$ inside intervals whose length depends on $p$ changes. All that affects the number of matrix elements that will be neglected. An important question is the stability of the charge and current densities when we increase the dimension of the space generated by a finite set of basis functions. Using the fact that $(1-K)^*(1-K)$ is Hermitian, one could control the stability
in the following way. Denote by $A$ the matrix $(1-K)^*(1-K)$ restricted to the finite space spanned by a finite number of basis functions. Adding, for instance, the function $w_{mn}w_{\overline{J}+1,0}$ to the previous basis adds a column and a row to the matrix $A$. The new matrix is then formed by four sub-blocks: $A$ for the original matrix, $B$ for the added column without its last element, $C$ for the added row without its last element, and $D$ for the added diagonal element. Using the following analytic inversion formula
\[
\begin{bmatrix} A & B\\ C & D \end{bmatrix}^{-1} =
\begin{bmatrix}
(A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1}BD^{-1}\\
-D^{-1}C(A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - BD^{-1}C)^{-1}BD^{-1}
\end{bmatrix},
\]
one notices that, as $D$ is large and $B$ and $C$ have many zero elements, while the non-zero ones are relatively small with respect to $D$, $BD^{-1}C$ is small, so the block $A - BD^{-1}C$ is close to $A$. As the component of the "incident densities" $d_{in}$ on the added wavelet is small, the modified densities are very close to the densities previously calculated: the first components of the new densities are close to the components of the old densities because $A - BD^{-1}C$ is close to $A$, and the added component of the new densities is small. To solve numerically the system $(1-K)^*(1-K)d = (1-K)^*d_{in}$, one first has to calculate the non-negligible matrix elements using (17) or (23). Solving the system, even if the sparsity is important, will need clever algorithms and powerful computers. Finally, we conjecture that the irregularities of the house fronts, windows, balconies, etc., greatly affect the signals, and that their presence modifies $f(\sigma,x,y)$ and increases the number of non-negligible elements of the matrix $(1-K)^*(1-K)$. To study this point better, we could start with a simple rectangular wall 20 m wide, 10 m high and 0.4 m thick, and then introduce several regularly distributed windows of size 1 m $\times$ 2 m. The aim would be to compare the resulting densities for the two cases. We could also determine the behaviour of the signals at the user antenna and compare them with the signals obtained by ray tracing.
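The block inversion formula can be verified exactly on a small example. The sketch below uses the standard library `fractions` module for exact arithmetic; the function name `bordered_inverse` and the numerical entries are hypothetical, chosen so that $D$ is large and $B$, $C$ are small and sparse, in the spirit of the stability argument.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bordered_inverse(A, B, C, D):
    """Inverse of [[A, B], [C, D]]: A is 2x2, B a 2x1 column, C a 1x2 row, D a scalar."""
    # Schur complement S = A - B D^{-1} C
    S = [[A[i][j] - B[i][0] * C[0][j] / D for j in range(2)] for i in range(2)]
    S_inv = inv2(S)
    top_right = [[-sum(S_inv[i][k] * B[k][0] for k in range(2)) / D] for i in range(2)]
    bottom_left = [[-sum(C[0][k] * S_inv[k][j] for k in range(2)) / D for j in range(2)]]
    corner = 1 / D + sum(C[0][i] * S_inv[i][j] * B[j][0]
                         for i in range(2) for j in range(2)) / D ** 2
    return [S_inv[0] + top_right[0], S_inv[1] + top_right[1], bottom_left[0] + [corner]]


# Hypothetical blocks: large added diagonal element D, small sparse border B, C.
A = [[F(4), F(1)], [F(1), F(3)]]
B = [[F(1, 10)], [F(0)]]
C = [[F(1, 10), F(0)]]
D = F(50)

full = [A[0] + B[0], A[1] + B[1], C[0] + [D]]
full_inverse = bordered_inverse(A, B, C, D)
schur_shift = B[0][0] * C[0][0] / D  # only non-zero entry of B D^{-1} C
```

Here the correction $BD^{-1}C$ changes a single entry of $A$ by $1/5000$, which illustrates why the leading block of the new inverse stays close to $A^{-1}$.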
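Appendix 1: Antenna Diagrams
For a small dipole antenna of length $l$, if the wire is along the $z$ direction,
\[
E_r(u,\omega) \simeq 0,\qquad
E_\theta(u,\omega) \simeq j\,\frac{e^{-j\omega|u-a|/c}}{|u-a|}\,\frac{\eta\,\hat i_0(\omega)\,l\,\omega\,\sin\theta}{4\pi c}\,e^{j\omega t},\qquad
E_\phi(u,\omega) \simeq 0,
\]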
where .η = 120π ohms is the intrinsic wave impedance and where .|u − a|, θ , φ are the spherical coordinates for .u with .a as origin. ´ If the applied current is .i(t) = √ 1 dωiˆ0 (ω)ej ωt (we modify the usual 2π ω0 Fourier transform introducing .ω0 that can be the central angular frequency of the signal, just to give to .iˆ0 (ω) and .i(t) the same units), the field becomes 1 dv √l .Eθ (u, t) |u−a| 2 2πc sin θ dt (t − |u − a|/c) with .v(t) = η i(t). In the case the antenna is a rectangular patch antenna at position .a, if the .x axis is perpendicular to the patch, the .y axis is along the patch length L, and the .z axis is along the patch width W (see figure 14.16 in [1]) and if the applied voltage is .v ˆ0 (ω)ej ωt , the far-field components at point .u are .Er (u, ω)
0
Eθ (u, ω) 0 Eφ (u, ω) j
ωL ωW 2vˆ0 (ω)e−j ω|u−a|/c e tan θ sin cos θ cos sin θ sin φ ej ωt . π|u − a| 2c 2c
.Le is the effective length ( see 14.2 and 14.3 formulas in [1] for the link in between L and .Le ). Notice that in the patch case .fφ the variation with respect to .ω is also approximately linear if W and .Le are small. So in this case we will write that the .φ field component is approximatively in the range of the frequencies considered of the form
Eφ (u, t)
.
dv 1 2W sin θ (t − |u − a|/c). √ dt |u − a| 2π c
In the two cases, the field will be expressed as .
1 dv f(φ , θ ) (t − |u − a|/c). dt |u − a|
Let us notice that we have been speaking about the total electrical field in the absence of the RX antenna. In [2], we discussed the conditions that have to be satisfied in order that, when we study the link between the received voltage at the RX antenna and the voltage sent at the TX antenna, the problem can be separated into three steps: first, considering that the TX antenna is alone and emits a well-known field; then, considering that the scatterers produce fields; and finally, considering that the RX antenna receives a field that is the sum of the field emitted by the TX antenna and the field due to the charges inside the scatterers, this total field being considered as incident, to produce the final RX voltage.
Appendix 2: Conductivity and Polarisability
The total field polarizes the materials. At some given point, the polarisation vector $P(u,t)$ is related to the total field. We can write this relation in the form $P = \epsilon_0\,\chi(E)$, where $\chi$ is an operator. Denoting
\[
\begin{aligned}
(G_0\rho_p)(u_2,t) &:= -\frac{1}{4\pi\epsilon_0}\,\nabla\int_V du_1\,\frac{\rho_p(u_1, t-|u_2-u_1|/c)}{|u_2-u_1|},\\
(GJ_p)(u_2,t) &:= -\frac{\mu_0}{4\pi}\,\frac{\partial}{\partial t}\int_V du_1\,\frac{J_p(u_1, t-|u_2-u_1|/c)}{|u_2-u_1|}
\end{aligned}
\]
and using .ρp = −∇.P = −0 ∇.χ (E) and .Jp = 0 ∂t∂ χ(E), if the materials are perfect insulators, the self-consistent equations for .ρp (u, t) and .Jp (u, t) can be written as μ0 ∂ 1 GJp ∇(G0 ρp ) − 4π 0 4π ∂t ∂ 1 μ0 ∂ GJp . Jp = 0 χ Ein − ∇(G0 ρp ) − ∂t 4π 0 4π ∂t
ρp = −0 ∇.χ Ein −
.
The materials used for constructions, such as concrete, can be neither metals nor perfect insulators. Some of their valence electrons are allowed to move freely in the whole material and generate electrical currents under the influence of the electric field. Jf = σ e (E).
.
In the literature appear some hypothesis on the relations .P = 0 χ (E) and .Jf = σ e (E). If the electric field is not large, it is considered that the operators .χ are linear. In this case, the previous equations become 0 μ0 ∂ 1 ∇.χ (∇(G0 ρp )) − ∇.χ (GJp ) 4π 4π ∂t ∂ ∂ 1 ∂ 0 μ0 ∂ χ (∇G0 ρp ) − χ ( GJp ). Jp = 0 χ (Ein ) − ∂t 4π ∂t 4π ∂t ∂t
ρp = −0 ∇.χ (Ein ) +
.
It is usually supposed that the linear operator .χ is a .3 × 3 matrix whose elements are integral operators with kernels .χij (x, x , t − t ), and each of them, supposing that the polarisation at point .x depends only on the values of the electric field at this point, is rewritten as .χij (x, t − t )δ(x − x ). Another assumption consists in (1) (2) supposing that .χij (x, t − t ) = χij (x)χij (t − t ). The temporal dependence is
generally described by giving the frequency dependence of the polarisability, so $\chi^{(2)}_{ij}(t-t') = \int d\omega\,e^{-i\omega(t-t')}\,\hat\chi_{ij}(\omega)$. This means that the polarisation at a given time depends not only on the value of the field at this time but also on the previous values of the field. There is a delay in the response. If the material is isotropic, the matrix is diagonal, and the integral operators on the diagonal are the same. Notice that the current density in the material is the sum of two terms whose origins are different. It appears that, to calculate the charge and current densities, it is necessary to know for each material the dependence of the polarisability and the conductivity at least on the frequency range corresponding to the voltage excitation of the antenna. Finally, if the polarisability and the conductivity are almost flat in the considered frequency range, we can consider that the operator $\chi$ acts simply as a multiplication by $\chi_0(x)$ and that $\sigma$ acts simply as a multiplication by $\sigma_0(x)$, where $\chi_0(x)$ and $\sigma_0(x)$ are, respectively, the values of the polarisability and the conductivity at $x$, at the central frequency $f_0$.
Appendix 3: Daubechies Wavelets and $W_{jj'}$ Function
To calculate numerically the matrix elements $\left(w_{m'n'}w_{j'k'},\ K_{11}^*K_{11}\,w_{mn}w_{jk}\right)$, we have to calculate the functions $W_{jj'}(t_0) = \int_{-\infty}^{\infty}dt\,\frac{d^2w_{j'0}}{dt^2}(t)\,\frac{d^2w_{j0}}{dt^2}(t-t_0)$ from the Daubechies wavelets. We are using Mathematica. If we directly calculate the second derivatives of the wavelets and use NIntegrate, the numerical integration, we get the indication that "it failed to converge to prescribed accuracy after 9 recursive bisections" or "is converging too slowly; suspect one of the following: highly oscillatory integrand, or WorkingPrecision too small". Then, instead, the Fourier images $\hat w(\omega)$ of the wavelets are introduced, and we get the alternative form $W_{jj'}(t_0) = 2^{-j/2}\,2^{-j'/2}\int_{-\infty}^{\infty}d\omega\,e^{i\omega t_0}\,\omega^4\,\hat w(2^{-j}\omega)\,\hat w(-2^{-j'}\omega)$. The Fourier images are calculated from the wavelet filter coefficients. Fig. A.1 shows the plot of the Daubechies-10 wavelet $w(t)$ calculated from the filter coefficients [18, 20], and Fig. A.2 shows the plot of $W_{00}(t_0)$ calculated from $W_{00}(t_0) = \int_{-\infty}^{\infty}d\omega\,e^{i\omega t_0}\,\omega^4\,\hat w(\omega)\,\hat w(-\omega)$ through the Fourier image $\hat w(\omega)$ of the Daubechies-10 wavelet.
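The equivalence between the time-domain correlation and its Fourier form can be illustrated on a stand-in profile. The sketch below is not the chapter's Mathematica computation and does not use a Daubechies wavelet: it assumes the Gaussian $g(t) = e^{-t^2/2}$, whose transform is known in closed form, and the convention $\hat g(\omega) = \int g(t)e^{-i\omega t}dt$ with a $1/2\pi$ factor in the inverse transform, for which $\int g''(t)\,g''(t-t_0)\,dt = \int \omega^4 e^{-\omega^2}\cos(\omega t_0)\,d\omega$.

```python
import math


def second_derivative_gaussian(t):
    """Second derivative of the stand-in profile exp(-t^2 / 2)."""
    return (t * t - 1.0) * math.exp(-0.5 * t * t)


def riemann_sum(func, half_width=12.0, step=0.01):
    """Plain Riemann sum of func over [-half_width, half_width]."""
    n = int(round(2.0 * half_width / step))
    return step * sum(func(-half_width + i * step) for i in range(n + 1))


def w_time_domain(t0):
    """Correlation of the second derivatives, computed directly in time."""
    return riemann_sum(
        lambda t: second_derivative_gaussian(t) * second_derivative_gaussian(t - t0))


def w_frequency_domain(t0):
    """Same quantity from the Fourier side (Gaussian stand-in, see lead-in)."""
    return riemann_sum(
        lambda w: w ** 4 * math.exp(-w * w) * math.cos(w * t0))


SHIFTS = (0.0, 0.7, 2.0)
pairs = [(w_time_domain(t0), w_frequency_domain(t0)) for t0 in SHIFTS]
```

For a Daubechies wavelet the time-domain route is the delicate one, since the second derivative is only weakly regular; the frequency-domain integrand is smooth, which is consistent with the difficulty reported for NIntegrate.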
Appendix 4: Singularities for the Parallelepipedic Case
In this appendix, we discuss the Hypothesis $R_s$ in the case where the volume is a parallelepiped with constant permittivity $\chi_0$. We look at the singularities of $f(\sigma,x,y) = \int_1^{\infty}d\tau\int_0^{2\pi}d\phi\,|\chi(\sigma,\tau,\phi,x,y)|^2$.
Fig. A.1 The Daubechies-10 wavelet
Fig. A.2 The .W00 function
Let us notice that if $S_\sigma$ denotes the surface of the hyperboloid $\sigma$ inside the parallelepiped, then $f(\sigma,x,y) = |\chi_0|^2 S_\sigma$. It is clear that $S_\sigma$ is very regular as long as the hyperboloid $\sigma$ does not hit the parallelepiped corners. So there are 8 values, $\sigma_i(x,y)$, $i = 1, \dots, 8$, for which the derivative of $f(\sigma,x,y)$ is discontinuous. The $\sigma_i(x,y)$ are easily calculable. An expression for $S_\sigma$ can be obtained analytically. We have to find the intersection of the circle parametrized by $\tau$ and $\sigma$ with the surface planes $P_1, P_2, \dots, P_6$ of the parallelepiped.
If the components of the normal to the plane $P_i$ are $(\alpha_i, \beta_i, \gamma_i)$, the equation for plane $P_i$ is $\alpha_i u_1 + \beta_i u_2 + \gamma_i u_3 = \gamma_i s_i$, where $u_1, u_2, u_3$ are the Cartesian coordinates of a point with respect to three orthogonal axes whose origin is $(x + y)/2$ and whose third axis is directed along $x - y$. In spheroidal coordinates, the values $\varphi_i(\tau, \sigma)$ of the intersection points can be extracted from

$$\alpha_i\, a \sqrt{(\tau^2 - 1)(1 - \sigma^2)}\, \cos\varphi + \beta_i\, a \sqrt{(\tau^2 - 1)(1 - \sigma^2)}\, \sin\varphi + \gamma_i\, a \tau \sigma = \gamma_i s,$$

where $s$ is the Cartesian coordinate of the point at the intersection of the $x - y$ axis with $P_i$.

If we denote by $t_i(x, y)$ the images of the $\sigma_i(x, y)$ under the transform $\sigma \mapsto T(\sigma) = \sigma |x - y|/c + 2^{-j} k - 2^{-j'} k'$, it is possible to see whether or not the $t_i(x, y)$ belong to the interval $(t_m, t_M)$. If they do not belong to $(t_m, t_M)$, one concludes that the matrix elements $\langle w_{m'n'} w_{j'k'}, K^*K\, w_{mn} w_{jk}\rangle$ and $\langle w_{m'n'} w_{j'k'}, KK^*\, w_{mn} w_{jk}\rangle$ are of the order $(2^{-j} + 2^{-j'})^{p+1}$. If one of the $t_i(x, y)$ belongs to the interval $(t_m, t_M)$, the matrix elements are of the order $(2^{-j} + 2^{-j'})$. When $j$ and $j'$ become big enough, in general the $t_i(x, y)$ are outside the interval $(t_m, t_M)$. This is not the case when $x$ and $y$ are at equal (or approximately equal) distance from a corner, i.e. if $x$ is on the sphere of radius $|y - c_i|$ or close to this sphere.

So once $x_{mn}$, $j$, $k$ are fixed, if $|x_{m'n'} - x_{mn}| > d_m + d_{m'} + c\,\bigl(2^{-j} k - 2^{-j'} k' + (2^{-j} + 2^{-j'})\,p - 2^{-j'}\bigr)$ and $2^{-j} k - 2^{-j'} k' > (2^{-j} + 2^{-j'})\,p - 2^{-j'}$, the matrix elements are of order $C (2^{-j} + 2^{-j'})^{p}$, except those for which $x_{m'n'}$ is close to one of the spheres centred at corners $c_i$ of radius $|y - c_i|$, in which case they are of order $(2^{-j} + 2^{-j'})$.

In Fig. 1, the red scars correspond to the singularities of $\hat f$. The scars are in the vicinity of the spheres centred at the corners $c_i$ of the parallelepiped and whose radius is $|c_i - x_{mn}|$. Notice that they disappear in the lower figure, since for large times the intersection of the outer region of the crown with the spheres centred at the corners is empty.
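The spheroidal-coordinate intersection condition above has the form $A\cos\varphi + B\sin\varphi = C$, so the angles can be obtained in closed form. The following is a minimal sketch, not the author's procedure: the function name and argument names are hypothetical, and the square-root form of the radial factor is the usual prolate spheroidal expression.

```python
import math


def intersection_angles(alpha, beta, gamma, a, tau, sigma, s):
    """Solve alpha*r*cos(phi) + beta*r*sin(phi) + gamma*a*tau*sigma = gamma*s
    for phi, with r = a*sqrt((tau**2 - 1)*(1 - sigma**2)).

    Returns a sorted list of 0, 1 or 2 angles in [-pi, pi].
    """
    r = a * math.sqrt((tau ** 2 - 1.0) * (1.0 - sigma ** 2))
    coeff_cos = alpha * r
    coeff_sin = beta * r
    rhs = gamma * (s - a * tau * sigma)

    # A*cos(phi) + B*sin(phi) = R*cos(phi - base), with R = hypot(A, B).
    amplitude = math.hypot(coeff_cos, coeff_sin)
    if amplitude == 0.0 or abs(rhs) > amplitude:
        # Degenerate circle/plane configuration, or the circle misses the plane.
        return []

    base = math.atan2(coeff_sin, coeff_cos)
    delta = math.acos(rhs / amplitude)

    def wrap(angle):
        # Bring an angle back to the interval [-pi, pi].
        return math.atan2(math.sin(angle), math.cos(angle))

    return sorted({wrap(base + delta), wrap(base - delta)})


# Illustrative values only (not taken from the chapter).
angles = intersection_angles(alpha=1.0, beta=0.0, gamma=1.0,
                             a=1.0, tau=2.0, sigma=0.5, s=1.75)
print(angles)
```

When the right-hand side exceeds the amplitude in absolute value, the circle does not meet the plane and the list is empty, which is the situation in which the corresponding face does not contribute.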
References

1. C. A. Balanis, Antenna Theory, 3rd edn. Wiley-Interscience (2005)
2. F. Bentosela, H. Cornean, B. H. Fleury, N. Marchetti, On the transfer matrix of a MIMO system. Mathematical Methods in the Applied Sciences 34(8), pp. 963–976 (2011)
3. D. Colton, R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, 2nd edn. Springer-Verlag (1998)
4. I. Daubechies, Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41, pp. 909–996 (1988)
5. I. Daubechies, Ten Lectures on Wavelets. Regional Conference Series, SIAM, Philadelphia (1992)
6. M. Fink, Time reversed acoustics. Physics Today 50(3) (1997)
7. A. Grossmann, J. Morlet, Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 15, pp. 723–736 (1984)
8. P. Hähner, On acoustic, electromagnetic and elastic scattering problems in inhomogeneous media. Habilitation thesis, Göttingen (1998)
9. J. D. Jackson, Classical Electrodynamics, 3rd edn. John Wiley and Sons, N.Y. (1999)
10. A. Grossmann, T. T. Wu, Schrödinger scattering amplitude. I. J. Math. Phys. 2, p. 710 (1961); Schrödinger scattering amplitude. II. J. Math. Phys. 3, p. 684 (1962)
11. L. A. García-Cortès, C. Cabrillo, A Monte Carlo algorithm for efficient large matrix inversion. http://arxiv.org/abs/cs/0412107v2 (Accessed June 5, 2016)
12. A. George, J. Liu, Computer Solution of Large Sparse Positive Definite Systems. Prentice Hall, Englewood Cliffs, NJ (1981)
13. P. G. Lemarié, Y. Meyer, Ondelettes et bases Hilbertiennes. Revista Matematica Ibero Americana 2 (1986)
14. S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. on Pattern Anal. and Machine Intell. 11(7), pp. 674–693 (1989)
15. S. Mallat, A Wavelet Tour of Signal Processing, 3rd edn. (2009)
16. F. Molisch, Ultra-wide-band propagation channels. Proceedings of the IEEE 97(2), pp. 353–371 (2009)
17. L. Rubio, J. Reig, H. Fernandez, V. Rodrigo-Peñarrocha, Experimental UWB propagation channel path loss and time-dispersion characterization in a laboratory environment. Hindawi Publishing Corporation, International Journal of Antennas and Propagation, Volume 2013, Article ID 350167, https://doi.org/10.1155/2013/350167
18. Wikipedia, "Daubechies wavelets", https://en.wikipedia.org/wiki/Daubechies_wavelet
19. Wikipedia, "Polarisation density", https://en.wikipedia.org/wiki/Polarization_density
20. Wolfram, https://mathworld.wolfram.com/DaubechiesWaveletFilter.html
Species of Spaces Thierry Paul
To André, Berthe, Georgette and Raoul, for what I did not know

"The space of our lives is neither continuous, nor infinite, nor homogeneous, nor isotropic. But do we know precisely where it breaks, where it curves, where it disconnects and where it comes together? We confusedly sense cracks, gaps, points of friction; we sometimes have the vague impression that it is getting stuck somewhere, or bursting, or colliding. We seldom try to learn more about it, and most often we pass from one place to another, from one space to another, without thinking of measuring, of taking charge of, of taking into account these lapses of space. The problem is not to invent space, still less to reinvent it (too many well-meaning people are there today to think our environment...), but to question it or, more simply still, to read it; for what we call everydayness is not evidence but opacity: a form of blindness, a kind of anaesthesia. It is from these elementary observations that this [...] journal of a user of space developed." Georges Perec, Espèces d'espaces, Galilée 1974
Alex's view of quantum mechanics was an entanglement of geometry, analysis and algebra. From nested Hilbert spaces to von Neumann algebras of type II, he was constantly interested in quantizing underlying classical phase spaces. The lines that follow make the inverse journey and study traces of noncommutativity which remain at the classical limit. They all testify to Alex's deep influence on me.
T. Paul, Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), Paris, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_16
Abstract

Classical limits of quantum systems are shown to lead to conceptions of space different from the classical one underlying the quantization of such systems. The emphasis is put on situations where traces of noncommutativity, an emblematic feature of quantum mechanics, remain when the Planck constant vanishes, in the framework of noncommutative geometry. Complex canonical transformations, spin-statistics, topological quantum field theory, long time semiclassical approximation and underlying chaotic dynamics are considered, together with a comparison/fusion of classical unpredictability with quantum indeterminism.

Contents

1 Introduction: A Space Journey Through the Wonderful Landscape of Quantum and Classical Mechanics
  1.1 From Planck to Heisenberg
  1.2 From Symbols to Operators: A Quick Journey in Quantizland
  1.3 From Operators to Symbols: Tell Me Which Operator You Are, I will Tell You What Is Your Symbol
  1.4 From Symbols to Classical: Spaces as Possible Symbolic Calculi
2 Quantizing Complex Canonical Transformations
  2.1 Off-diagonal Toeplitz Representation
  2.2 Link with Weyl
  2.3 Flows on Extended Phase Space
  2.4 Examples
  2.5 Real Phase Space Associated with Complex Linear Symplectomorphisms
  2.6 Noncommutative Geometry Interpretation
    2.6.1 The Canonical Groupoid
    2.6.2 Symbols
    2.6.3 On the (Formal) Composition of Symbols
  2.7 Non-Canonical Transforms
3 Bose-Einstein-Fermi at the Classical Level
  3.1 Husimi
  3.2 Wigner
  3.3 Toeplitz
  3.4 On Wigner Again
  3.5 Off-diagonal Toeplitz Representations
  3.6 Link with the Complex (Anti)metaplectic Representation
  3.7 A Classical Phase Space with Symmetries Inherited From Quantum Statistics
4 Noncommutative Moduli Spaces Underlying Topological Quantum Field Theory in Large Colouring Asymptotics
  4.1 The Standard Geometric Quantization of the Sphere
  4.2 a-Toeplitz Quantization
  4.3 Symbolic Calculus
  4.4 Main Result
  4.5 Classical Limit and Underlying "Phase Space"
5 Long Time Semiclassical Evolution
  5.1 Propagation
  5.2 Noncommutative Geometry Interpretation
  5.3 Another Groupoid Approach
  5.4 Extended Semiclassical Measures
  5.5 Noncommutative Phase Space Associated to Time Arrow
6 The Noncommutative Phase Space Underlying the Quotient by the Quantum Flow
  6.1 The Non-resonant Harmonic Oscillator Spectrum and the Noncommutative Torus
  6.2 The Space of Frequencies and the Noncommutative Torus
  6.3 Extensions to Chaotic Systems
    6.3.1 Homoclinic Foliations Versus Invariant Tori Fibration
    6.3.2 The Construction of the Noncommutative Structure
    6.3.3 Bohr-Sommerfeld Conditions I
    6.3.4 The "Poincaré" Section
    6.3.5 Bohr-Sommerfeld Rules II
    6.3.6 Creation, Annihilation, and All That
    6.3.7 A New and Noncommutative Framework for Classical Dynamics
  6.4 Miscellaneous
  6.5 Conclusion: The Quotient of the Phase Space by the Flow
7 Indeterminism Versus Unpredictability (How Quantum Indeterminism Would Have Shocked Laplace But Not Poincaré)
  7.1 Measurement in Quantum Mechanics
  7.2 Critics of the Deterministic Reason of Classical Mechanics
  7.3 Pushing Sensitivity to Initial Conditions to Its Extreme
  7.4 Some Space for Merging the Two
  7.5 A Classical Phase Space Incorporating the Point $t = \pm\infty$
References
1 Introduction: A Space Journey Through the Wonderful Landscape of Quantum and Classical Mechanics

"Il ne faut pas peindre ce que l'on voit, puisqu'on le voit. Il ne faut pas peindre ce que l'on ne voit pas, puisqu'on ne le voit pas. Il faut peindre qu'on ne voit pas." Claude Monet (attr. [22]).¹

It is striking to anybody who worked with Alex that he had a strong geometrical view of the problems he was looking at. More precisely, he had an automatic and unavoidable a priori way of not only looking at problems but also intentionally viewing them geometrically. This attitude happened to be very fruitful in several cases where the underlying geometrical structure was not easy to detect but was crucial for the early development of future theories. To quote only two among many, let us first mention the excavation of the group theoretical structure (orthogonality relations for non-unimodular groups) [26] out of the, say, hectic computations of Jean Morlet [40]: the geometry of group theory underlying the not yet truly born continuous wavelets was decisive for their early development. The second one, much less known, is the discovery in [24] that a non-trivial operator algebra appears in regular quantum mechanics for solid state physics in the presence of a magnetic field. The reduction of the dynamics by the periodic symmetry is much more sophisticated than the (cotangent bundle of the) torus obtained by Bloch decomposition when no magnetic field is present: the reduced dynamics lives in a von Neumann algebra of type II, a premonition of noncommutative geometry, to be born more than 10 years later in [15].

In some respects, the birth of the (mathematical) theory of wavelets recalls the birth of quantum mechanics in 1925, in the extraordinary article by Heisenberg [29]: working together with Max Born on a possible quantization "à la Bohr-Sommerfeld" of the perturbation series arising in the classical Helium atom, Heisenberg felt it somehow necessary to enlarge the classical paradigm from functions subject to commutative multiplication (actually Fourier series endowed with commutative convolution) to matrices, endowed with their noncommutative multiplication law.² At this precise moment, a new space was born: the one in which a true quantum dynamics could live, that is, a natural kinematics for it. Every mathematician understands the gap between a function and a matrix and the impossibility of reducing the second to the first. Nevertheless, just as the (commutative algebra of) continuous functions determines in a unique way the manifold on which they are defined, the noncommutative algebra of Heisenberg matrices was going to define a new space, a noncommutative one, the space of quantum mechanics, by no means reducible to the space of classical mechanics. In 1925, Heisenberg, "tel Monet, a peint qu'on ne voyait pas".³

¹ A rough translation could be: "One shouldn't paint what one sees, since one sees it. One shouldn't paint what one doesn't see, since one doesn't see it. One should paint that one doesn't see."
1.1 From Planck to Heisenberg

The quite long gestation of quantum mechanics, from the seminal work of Planck to the first paper by Heisenberg involving truly quantum dynamics twenty-five years later, can perhaps be explained by the fact that the totally new quantum paradigm was born deep inside the classical one: a classical paradigm strongly influenced, for centuries, by the notion of a geometrical space (flat or curved, in three or four dimensions, whatever) which was considered as the unavoidable environment inside which any dynamics should take place. One of the difficulties met by quantum mechanics in being born was precisely the incompatibility of its foundations with the classical space, while at the same time the perception of quantum effects lives precisely in this classical space.
² The similitude between iterated convolutions involving sums of terms of the form $\ldots a_{i-j}\, b_{j-k}\, c_{k-l} \ldots$ and iterated multiplication of matrices involving, this time, sums of terms of the form $\ldots a_{i,j}\, b_{j,k}\, c_{k,l} \ldots$ is striking... at least when one knows what a matrix is, which was not the case of Heisenberg.

³ "like Monet, has painted that one didn't see."
Very early, soon after Heisenberg's paper and even before Schrödinger delivered the equation in space of the waves conjectured by de Broglie, Dirac understood a fundamental link between the classical and quantum paradigms: the commutators of matrices are the quantization of the Poisson bracket⁴ of functions on classical spaces:

$$\text{classical}\quad \{\cdot,\cdot\} \;\longleftrightarrow\; \frac{1}{i\hbar}[\cdot,\cdot] \quad\text{quantum}.$$

In fact, the general feeling at the beginning of quantum mechanics was that only the right oriented arrow was significant: one quantizes the Poisson bracket to get the commutator:

$$\text{classical}\quad \{\cdot,\cdot\} \;\xrightarrow{\;0\to\hbar\;}\; \frac{1}{i\hbar}[\cdot,\cdot] \quad\text{quantum}$$

but soon, thanks to the asymptotic studies of the Schrödinger equation for small values of the Planck constant, the left oriented arrow became more important, and somehow satisfying: the world is quantum but when the Planck constant is small, the classical paradigm reappears

$$\text{classical}\quad \{\cdot,\cdot\} \;\xleftarrow{\;\hbar\to 0\;}\; \frac{1}{i\hbar}[\cdot,\cdot] \quad\text{quantum}.$$

But, if this "correspondence" has the advantage of showing the (our) classical world at the border ($\hbar = 0$) of the (true) quantum world, both directions of the arrow just mentioned have their own unsolvable problems.

The direct quantization arrow $\xrightarrow{0\to\hbar}$, introduced by Weyl a very few years after the works of Heisenberg and Schrödinger, links two paradigms having two fundamentally different groups of symmetries. The natural geometrical setting of classical mechanics is that of symplectic spaces, endowed with the group of symplectomorphisms. The natural symmetries of the target quantum space are conjugations by unitary operators. These two groups of symmetries merge only for the restriction to linear symplectomorphisms, mapped into the so-called metaplectic operators. (We will get back to this with precise definitions in Sect. 2 below.) In other words, the only classical changes of variables truly "quantizable" are the ones which already have the very quantum flavour of linearity, a very small selection of the possible classical symmetries. Conversely, quantum statistics, bosons versus fermions, have no direct equivalent in classical "naive" situations. And these are just a few examples.

⁴ Let us recall that the commutator of two matrices $A$ and $B$ is defined through $[A, B] := AB - BA$, and the Poisson bracket of two functions $f(q, p)$, $g(q, p)$ on (say) $\mathbb{R}^2$ is the function defined by $\{f, g\}(q, p) := \partial_q f(q, p)\, \partial_p g(q, p) - \partial_p f(q, p)\, \partial_q g(q, p)$.
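The definition of the Poisson bracket recalled above can be checked numerically on simple examples. The sketch below is only an illustration: it approximates the partial derivatives by central differences, and the helper names and sample functions are hypothetical choices, not taken from the text.

```python
def poisson_bracket(f, g, q, p, eps=1e-5):
    """Central-difference estimate of {f, g}(q, p) = d_q f d_p g - d_p f d_q g."""
    df_dq = (f(q + eps, p) - f(q - eps, p)) / (2 * eps)
    df_dp = (f(q, p + eps) - f(q, p - eps)) / (2 * eps)
    dg_dq = (g(q + eps, p) - g(q - eps, p)) / (2 * eps)
    dg_dp = (g(q, p + eps) - g(q, p - eps)) / (2 * eps)
    return df_dq * dg_dp - df_dp * dg_dq


def position(q, p):
    return q


def momentum(q, p):
    return p


def f_example(q, p):
    return q * q * p


def g_example(q, p):
    return q * p


# Canonical bracket {q, p}: analytically equal to 1 everywhere.
canonical = poisson_bracket(position, momentum, 0.3, -1.2)

# For f = q^2 p and g = q p one finds {f, g} = 2qp*q - q^2*p = q^2 p.
example = poisson_bracket(f_example, g_example, 1.5, -0.7)

print(canonical, example)
```

With this sign convention the canonical bracket $\{q, p\}$ equals $+1$, which is the classical counterpart of the normalization $\frac{1}{i\hbar}[\cdot,\cdot]$ used for the commutator.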
The classical limit direction $\xleftarrow{\hbar\to 0}$ is in many ways analogous to the passage from physical to geometrical optics: it reintroduces trajectories out of waves. Moreover it re-establishes classical functions with pointwise (commutative) multiplication out of the noncommutative quantum multiplication of matrices by the (formal) argument:

$$[\cdot,\cdot] \sim i\hbar\,\{\cdot,\cdot\} \to 0 \quad\text{as } \hbar \to 0.$$

Besides its aesthetic, reassuring beauty, this argument is in many ways too simple and leaves many problems unsolved. First of all it does not explain the very fundamental difference between classical and quantum dynamics for large times: the quantum one, linear, being quasiperiodic and the classical one, in the chaotic situation, showing high dependence on initial conditions. Second, the equivalence, culturally admitted until the last decades, between the dichotomies

$$\text{quantum / classical} \;\sim\; \text{micro / macro} \;\sim\; \text{short times / long times}$$

is no longer experimentally tenable: very excited atoms of Rubidium, which have the size of a small bacterium, are tractable nowadays in experimental quantum information for time scales of the order of several seconds. The reality is that what is contained in the "limit" $\hbar \to 0$ is much more complex than taking a simple limit in the values of one parameter, as one does, for example, by letting the speed of light diverge. In fact it is, from an epistemological point of view, closer to the different limits of a large number $N$ of particles in classical mechanics: mean-field, grand canonical, Boltzmann-Grad, etc. Very different situations are reachable under the simple aphorism $N \to \infty$.

In the different sections of this article, we will present several quantum situations where the classical limit $\hbar \to 0$ involves geometrical situations which cannot be handled by the classical structures we just described, with a particular focus on the situations where the noncommutative aspects, flavour of quantum mechanics, do not disappear in the limit $\hbar \to 0$.
1.2 From Symbols to Operators: A Quick Journey in Quantizland

In 1925, Heisenberg invented quantum mechanics as a change of paradigm from (classical) functions to (quantum) matrices. He founded the new mechanics on the well known identity

$$\frac{1}{i\hbar}[Q, P] = 1$$

that, a few months later, Dirac recognized as the quantization of the Poisson bracket

$$\{q, p\} = 1.$$
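Heisenberg's identity can be explored with finite matrices, keeping in mind that it cannot hold exactly in finite dimension, since the trace of a commutator of finite matrices vanishes. The sketch below is an illustration under assumptions not made in the text: $Q$ and $P$ are built from a truncated ladder matrix in the standard harmonic oscillator basis, and the values of `HBAR` and `N` are arbitrary.

```python
import math

HBAR = 0.05  # illustrative value of the Planck constant (assumption)
N = 8        # truncation size (assumption)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def dagger(a):
    # Conjugate transpose of a square matrix.
    size = len(a)
    return [[a[j][i].conjugate() for j in range(size)] for i in range(size)]


# Truncated lowering matrix: entry (n-1, n) equals sqrt(n).
lowering = [[complex(math.sqrt(col)) if row == col - 1 else 0j
             for col in range(N)] for row in range(N)]
raising = dagger(lowering)

root = math.sqrt(HBAR / 2)
Q = scale(root, add(lowering, raising))
P = scale(-1j * root, sub(lowering, raising))

# (1 / (i hbar)) [Q, P]: identity except for the last diagonal entry.
commutator = sub(matmul(Q, P), matmul(P, Q))
normalized = scale(1 / (1j * HBAR), commutator)

print([round(normalized[m][m].real, 10) for m in range(N)])
```

The diagonal is $1$ everywhere except in the last entry, where the truncation forces the value $1 - N$ so that the total trace is zero; the identity is recovered only in the infinite-dimensional limit.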
Again a few years later, Weyl stated the first general quantization formula by associating to any function $f(q, p)$ the operator

$$F(Q, P) = \int \tilde f(\xi, x)\, e^{i\frac{xP - \xi Q}{\hbar}}\, d\xi\, dx,$$

where $\tilde f$ is the symplectic Fourier transform defined analogously through

$$f(q, p) = \int \tilde f(\xi, x)\, e^{i\frac{xp - \xi q}{\hbar}}\, d\xi\, dx.$$
Many years later the pseudodifferential calculus was born, first established by Calderón and Zygmund, and then formalized by Hörmander through the formula giving the integral kernel $\rho_F$ of the quantization $F$ of a symbol $f$ in $d$ dimensions as

$$\rho_F(x, y) = \int f(x, \xi)\, e^{i\frac{\xi(x - y)}{\hbar}}\, \frac{d\xi}{(2\pi\hbar)^d}.$$

A bit earlier had appeared, both in quantum field theory and in optics (Wick quantization), the (positivity preserving) Toeplitz quantization of a symbol $f$⁵

$$\mathrm{Op}^T[f] = \int f(q, p)\, |q, p\rangle\langle q, p|\, dq\, dp,$$

where $|q, p\rangle$ are the famous (suitably normalized) coherent states defined as $\psi^{\alpha=i}_{z=(q,p)}$ in Sect. 2.1 below.

As we see, quantization is not unique. But all the different symbolic calculi obtained above, after inversion of the quantization formulæ written above, share the same first two asymptotic features:

• The symbol of a product is, modulo $\hbar$, the product of the symbols.
• The symbol of the commutator divided by $i\hbar$ is, modulo $\hbar$ again, the Poisson bracket of the symbols.

In other words, they all define a classical underlying space (an algebra of functions) endowed with a Poisson (or more generally symplectic) structure.
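These two features can be checked by hand in the simplest case of the operators $Q$ and $P$ themselves. The lines below are a minimal worked example, assuming the standard normalization of Weyl quantization (written here $\mathrm{Op}^{W}$, a notation introduced only for this illustration), under which $q$, $p$ and $qp$ are sent to $Q$, $P$ and $\tfrac12(QP + PQ)$.

```latex
\begin{align*}
QP &= \tfrac12\,(QP + PQ) + \tfrac12\,[Q, P]
    = \mathrm{Op}^{W}[qp] + \tfrac{i\hbar}{2}
 &&\Longrightarrow\quad
 \text{symbol of } QP \;=\; qp + \tfrac{i\hbar}{2} \;=\; qp \pmod{\hbar},\\[4pt]
\frac{1}{i\hbar}\,[Q, P] &= 1
 &&\Longrightarrow\quad
 \text{symbol of } \tfrac{1}{i\hbar}[Q, P] \;=\; 1 \;=\; \{q, p\}.
\end{align*}
```

In this example the correction to the product rule is exactly of order $\hbar$ and the commutator rule holds without any correction.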
⁵ The Dirac notation will be used throughout the article: $|\#\rangle$ denotes an element of a Hilbert space $\mathcal{H}$, $\langle \#1|\#2\rangle$ the scalar product $(|\#1\rangle, |\#2\rangle)$, and $|\#1\rangle\langle\#2|$ the operator on $\mathcal{H}$ defined by $|\#1\rangle\langle\#2|\,\varphi := (|\#2\rangle, \varphi)\,|\#1\rangle$, $\varphi \in \mathcal{H}$.
1.3 From Operators to Symbols: Tell Me Which Operator You Are, I will Tell You What Is Your Symbol

But it is very easy to show that this nice quantum/classical picture has its limits, and one can easily construct quantum operators whose classical limit does not follow the two items expressed above. Consider, for example, the well known creation and annihilation operators $a^+ = Q + iP$, $a^- = Q - iP$. They act on the eigenvectors $h_j$ of the harmonic oscillator by

$$a^+ h_j = \sqrt{(j + \tfrac12)\hbar}\; h_{j+1}, \qquad a^- h_j = \sqrt{(j - \tfrac12)\hbar}\; h_{j-1}.$$

Consider now the matrix

$$M_1^+ = \begin{pmatrix}
0 & 0 & 0 & 0 & \dots & \dots\\
1 & 0 & 0 & 0 & \dots & \dots\\
  & \ddots & \ddots & & & \\
0 & \dots & 0 & 1 & 0 & \dots\\
\dots & \dots & \dots & \dots & \dots & \dots
\end{pmatrix}$$

and its adjoint

$$M_1^- = \begin{pmatrix}
0 & 1 & 0 & 0 & \dots & \dots\\
0 & 0 & 1 & 0 & \dots & \dots\\
  & & \ddots & \ddots & & \\
0 & \dots & 0 & 0 & 1 & \dots\\
\dots & \dots & \dots & \dots & \dots & \dots
\end{pmatrix},$$

and the operators $M_1^\pm$ they define on the basis $\{\varphi_n,\ n = 0, \ldots, N-1\}$ of $\mathcal{H}_N$ defined by (29). An elementary computation shows that

$$M_1^+ = a^+ (P^2 + Q^2)^{-1/2}, \qquad M_1^- = (P^2 + Q^2)^{-1/2}\, a^-,$$

therefore their (naively) expected leading symbols are $f^+(q, p) = \sqrt{\frac{q + ip}{q - ip}}$ and $f^-(q, p) = \sqrt{\frac{q - ip}{q + ip}}$ or, in polar coordinates $q + ip = \rho e^{i\theta}$, $f^\pm = e^{\pm i\theta}$.
If symbolic calculus worked, the leading symbol of $M_1^+ M_1^-$ should be equal to $f^+ f^- = 1$, and $M_1^+ M_1^-$ should therefore be close to the identity $I$ as $\hbar \to 0$. But

$$M_1^+ M_1^- = \begin{pmatrix}
0 & 0 & 0 & 0 & \dots & \dots\\
0 & 1 & 0 & 0 & \dots & \dots\\
  & & \ddots & & & \\
0 & \dots & 0 & 0 & 1 & \dots\\
\dots & \dots & \dots & \dots & \dots & \dots
\end{pmatrix} = I - |h_0\rangle\langle h_0| \;\neq\; I + o(1) \quad\text{as } \hbar \to 0.$$
The reason for this defect is that the functions $e^{\pm i\theta} = \sqrt{\frac{q \pm ip}{q \mp ip}}$ are not smooth functions on the plane. In fact they are not even continuous at the origin: $f^\pm$ can tend to any value in $\{e^{i\theta},\ \theta \in \mathbb{R}\}$ when $(q, p) \to 0$. Note finally that the commutator

$$[M_1^+, M_1^-] = \begin{pmatrix}
-1 & 0 & 0 & 0 & \dots & \dots\\
0 & 0 & 0 & 0 & \dots & \dots\\
  & & \ddots & & & \\
0 & \dots & 0 & 0 & 0 & \dots\\
\dots & \dots & \dots & \dots & \dots & \dots
\end{pmatrix} = -|h_0\rangle\langle h_0| \;\neq\; O(\hbar), \tag{1}$$

so that its symbol does not vanish at leading order, contrary to what standard symbolic asymptotics would predict.
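The two matrix identities above are easy to reproduce on finite truncations of the shift matrices. The sketch below is purely illustrative (the truncation size `N` and the helper names are assumptions); it also exhibits an artifact of the truncation that is absent from the infinite matrices of the text.

```python
N = 6  # truncation size (assumption, for illustration only)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Truncated M+: ones on the subdiagonal, so it maps h_j to h_{j+1}.
m_plus = [[1 if row == col + 1 else 0 for col in range(N)] for row in range(N)]
# M- is the adjoint; the matrix is real, so the adjoint is the transpose.
m_minus = transpose(m_plus)

product = matmul(m_plus, m_minus)                      # M+ M-
commutator = sub(product, matmul(m_minus, m_plus))     # [M+, M-]

for row in product:
    print(row)
print([commutator[i][i] for i in range(N)])
```

The product is the identity with its first diagonal entry replaced by $0$, that is $I - |h_0\rangle\langle h_0|$. The commutator has $-1$ in its first diagonal entry, as in (1); the additional $+1$ in the last diagonal entry comes only from cutting the matrices at size `N` and moves off to infinity as `N` grows.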
1.4 From Symbols to Classical: Spaces as Possible Symbolic Calculi

Therefore the following question arises at the "classical" limit $\hbar \to 0$: what is the corresponding classical paradigm underlying this limit? The natural answer to this question seems to us to be: it is the locus where the limit as $\hbar \to 0$ of the symbol of the quantum object lives, after the latter has been expressed through a symbolic calculus uniform in the Planck constant. For all the cases treated in Sect. 1.2 (Wigner, Weyl, Toeplitz, etc.), this locus was the same: the classical phase space out of which the diverse quantization procedures were issued. But we just saw in the preceding Sect. 1.3 an example where this "rule" failed. The subject of the present article is to build other situations, interesting from both a physical and a mathematical point of view, where the classical underlying space is not the standard one, and is mostly endowed with a noncommutative structure inherited from the overlying, true, quantum paradigm. It happens that the different symbolic constructions that we will meet will be generalizations of the Toeplitz one:

$$\mathrm{Op}^T[f] = \int f(q, p)\, |q, p\rangle\langle q, p|\, dq\, dp$$
which will remain somehow the core of our constructions. Let us recall that, contrary to the Weyl or pseudodifferential quantization, the Toeplitz quantization does not offer a recipe for recovering the (full) symbol from the operator per se. We believe that this fact, often considered a pain in the neck, is in fact responsible for the rich possibilities of enlarging the aforementioned formulæ to more general ones. To put it in a nutshell, a space, quantum, classical or anything else, is nothing but a symbolic calculus endowing an operator algebra.

Summary

This article is devoted to the study of the space hosting the dynamics underlying the quantum one in the limit where the Planck constant vanishes. In many situations this underlying space is not the expected one, namely the classical space on which the quantum theory is built through different quantization processes. In other words, the true classical space underlying quantum mechanics is not the space of which the overlying quantum dynamics is the quantization. Jokingly speaking, the diagram does not commute. In particular, we will show that a strong taste of noncommutativity, keystone of the quantum paradigm, remains perceptible at the classical limit in many situations.

This underlying space will take different aspects, acquiring therefore an existentialist flavour totally transverse, if not orthogonal, to the essentialist vision of an absolute space hosting the dynamics. This conception of what a space should be, unusual in our classical culture, has to be compared with the famous sentence by Einstein: "It is the theory which decides what is to be observed."⁶ Different dynamics, although they all live in the same quantum framework of Hilbert space, give rise to different spaces when the Planck constant becomes small.
In fact, in several situations, a naive guess for the underlying space leads to drastic singularities, which forbid the existence or uniqueness of the classical dynamics. But these singularities disappear after a suitable new noncommutative re-dressing. This is perfectly in accordance with the philosophy of the noncommutative geometry of Alain Connes and, more generally, with the birth (and efficiency) of the quantum paradigm. If one believes that nature is quantum after all, these new "classical" structures, arising at a scale large compared to that of the Planck constant but keeping track of the source of stability that quantum noncommutativity constitutes, might explain the amazingly stable feeling provided by the macroscopic world.

⁶ Comments to W. Heisenberg after a talk he gave at the time when he was still looking for quantum trajectories; F. Balibar, private communication.

Overview of the Article

This article contains six sections, each of them devoted to a different quantum situation whose limit as $\hbar \to 0$ does not correspond to the expected classical commutative paradigm. The keystone throughout the paper will be the (positivity preserving) Toeplitz quantization of a symbol $f$ already defined in Sect. 1,

$$\mathrm{Op}^T[f] = \int f(q, p)\, |q, p\rangle\langle q, p|\, dq\, dp,$$
where .|q, p are the (suitably normalized) coherent states, which will go through several modifications all along the paper. One of the most striking modification of the preceding formula will consist in changing the diagonal dyadic element .|q, pq, p| by off-diagonal ones of the form .|q , p q, p| where .(q , p ) is related to .(q, p): .(q , p ) = f (q, p) for a certain function f . In Sect. 2, we investigate a new Toeplitz-like quantization suitable for quantum observables propagated by quantum flows generated by complex quadratic Hamiltonians. More precisely, we look at the quantization of complex linear symplectic flows, a subject studied by Alex in a beautiful paper [25]. In particular we show that a very simple and “natural” off-diagonal extension of the Toeplitz paradigm .OpT [f ] introduced in the preceding section happens to be very useful. By off-diagonal, we mean a construction of the form ˆ T .Opcomplex [f ] = f (q, p)|q (q, p), p (q, p)q, p|dqdq for some mapping .(q, p) → (q , p ). Associated with this formula is a noncommutative symbol belonging to the .C ∗ -algebra of a groupoid linked to the complex symplectic group. Moreover, this formulation extends naturally to the non-canonical case. Section 3 investigates quantum statistics in front of a possible underlying phase space. The general feeling about this question is that the notion itself of quantum statistics disappears at the classical limit, since the wave function disappears by oscillation when .h¯ → 0. In fact we show that exchanging two quantum particles
corresponds at the classical level to letting a complex linear mapping act on phase space, as in Sect. 2. But this time the mapping is anticanonical: it sends the symplectic form to its opposite,

$$dp\wedge dq \to -\,dp\wedge dq.$$
In other words, it is not a symmetry of the classical paradigm. But this construction suggests the possibility of introducing a noncommutative “classical” phase space suitable for carrying quantum spin-statistics at “a” classical level.

In Sect. 4, we turn to a problem in low-dimensional topology which in principle has nothing to do with quantum mechanics: the asymptotics for large values of the colours of curve operators in topological quantum field theory. In [37] J. Marché and I showed that, “generically” with respect to the way this asymptotics is taken, the curve operators are Toeplitz operators associated with phase spaces which are moduli spaces isomorphic to the two-dimensional sphere. Recently, in [42], I investigated the general case of this asymptotics and showed that the corresponding “semiclassical” structure is yet another extension of Toeplitz quantization. This extension corresponds to changing the Hilbert structure of the Hilbert space appearing in the (geometric) quantization of the symplectic 2D sphere, and considers the associated orthogonal projector, denoted $|\cdot\rangle_a\langle\cdot|$:

$$\mathrm{Op}^T_{TQFT} = \int |q,p\rangle_a\langle q,p|\,dq\,dp.$$
We investigate the noncommutative structure one has to put on the sphere in order that it corresponds to the moduli (phase) space underlying this singular asymptotics.

Section 5 is concerned with the long-time quantum evolution of quantum observables (pseudodifferential operators), uniformly as $\hbar\to 0$. It is well known that observables subject to such an evolution remain in the pseudodifferential category for time scales not larger than logarithmic in $1/\hbar$. The reason for this is that the non-locality of the quantum paradigm induces a sensitivity to nearby trajectories, whose spreading is in general (shown to be less than) exponential in time. Remaining “microlocal” requires that this spreading in phase space should not exceed the size of the Planck constant, which fixes the aforementioned logarithmic scale. In [41] it was shown that passing this logarithmic barrier is possible thanks to, again, an extension of the Toeplitz procedure. In the present article we reformulate this extension in the following “groupoid” form:

$$\mathrm{Op}^T_{\mathrm{groupoid}} = \int_{(q',p')\sim_{u/s}(q,p)} |q',p'\rangle\langle q,p|\,dq\,dp\,dq'\,dp',$$

where the equivalence relation $\sim_{u/s}$ means “belongs to the same unstable manifold” for a long-time forward propagation and “belongs to the same stable manifold” for a long-time backward one.
In Sect. 6, we describe how these Toeplitz extensions can provide semiclassical spectral information. Since spectral data are related to invariance under (quantum) evolution, it is natural to try to merge the forward and backward settings of the preceding section, namely to take the intersection of stable and unstable manifolds; in other words, to consider homoclinic orbits. That is to say, one has to handle ansätze of the form

$$\mathrm{Op}^T_{\mathrm{homoclin}} = \int_{(q',p')\sim_{\mathrm{homo}}(q,p)} |q',p'\rangle\langle q,p|\,dq\,dp\,dq'\,dp',$$

where $(q',p')\sim_{\mathrm{homo}}(q,p)$ means “$(q',p')$ belongs to any orbit homoclinic to (the trajectory passing through) $(q,p)$”. We construct a noncommutative algebra associated with this “homoclinic” groupoid which might determine (part of) the set of frequencies (differences of eigenvalues) at the classical limit. This construction is astonishingly close conceptually to the explicit case of the non-resonant harmonic oscillator: the set of homoclinic orbits, intersections of two given stable and unstable manifolds of a dynamical system, plays the role of an invariant torus in the integrable case.

Finally, Sect. 7 concerns the link between quantum indeterminism, a keystone of the quantum paradigm, and unpredictability in classical chaotic systems, forced by sensitivity to initial conditions in the limit where the time of evolution diverges. We show that these two concepts, indeterminism and unpredictability, merge at the classical limit when a change of the concept of classical space is performed. We conclude this article with a short epistemological essay on the geometrical structure of the quantum-classical entanglement: the classical dynamics appears at the border of the quantum one but also contains, at its own border, the quantum one...

Let us finish this overview by remarking that most of the mathematical constructions in this paper have the same core: the Toeplitz quantization.
2 Quantizing Complex Canonical Transformations

This section contains the first of our attempts at considering classical phase spaces as noncommutative spaces underlying quantum ones: the push-forward of the usual classical phase space by complex canonical transforms. We shall consider linear canonical (and later on non-canonical) transforms, so, as mentioned earlier, there is no need to take the limit $\hbar\to 0$ to “recover” a classical space. This is the case for real linear canonical transforms, but we will see that the situation is much richer in the complex case.
Complex quantum Hamiltonians have (re)gained a lot of interest in recent years; see, e.g., the book [33] and the references quoted there. By contrast, the quantization of complex symplectic flows does not seem to receive the interest it deserves, as it did at the time of the birth of Fourier integral operators; see [39], for example. Within this category, the representation of linear complex symplectic flows, namely the complex symplectic group and its corresponding complex metaplectic representation, has also lost most of its appeal since the golden years of “group theoretical methods in physics”; see [25]. This section relies on [43], and we will handle complex canonical transforms not by direct metaplectic operators, but rather by conjugation of observables by them.

Let us remark that, since complex numbers are (according to us) necessary to the formulation of quantum mechanics, considering the Schrödinger equation on a Hilbert space $\mathcal{H}$ with a complex (e.g. bounded) Hamiltonian $H$ does not create any intrinsic a priori difficulty, a property that classical mechanics (on a real symplectic phase space $\mathcal{P}$) does not share with its quantum counterpart. The quantum flow is still given by $e^{-itH/\hbar}$, which exists for all times (although it is of course no longer unitary), for example when $H$ is bounded: $e^{-itH/\hbar}$ defines for any $t\in\mathbb{R}$ a bounded operator on $\mathcal{H}$. By contrast, the Lie exponential at time $-t$ associated with the symbol $h$ of $H$ does not act on $\mathcal{P}$, because the Hamiltonian vector field associated with $h$ is no longer tangent to $\mathcal{P}$.
2.1 Off-diagonal Toeplitz Representation

We will study the conjugation by complex metaplectic operators of quantum observables given by the Toeplitz (anti-Wick) construction. In order to avoid heavy notation, we will be concerned, in most of the paper, with the case of one degree of freedom: a two-dimensional phase space that will be, for simplicity, $\mathbb{R}^2\sim\mathbb{C}$. The general higher-dimensional case is treated extensively in [43], which contains most of the rest of this section. More precisely, we consider operators of the form

$$H = \int_{\mathbb{C}} h_\alpha(z)\,|\psi_z^\alpha\rangle\langle\psi_z^\alpha|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

where the family of coherent states $\psi_z^\alpha$ is defined, for $\alpha\in\mathbb{C}$, $\Im\alpha>0$ and $z=(q,p)\in\mathbb{R}^2$, by

$$\psi_z^\alpha(x) := \left(\frac{\Im\alpha}{\pi\hbar|\alpha|^2}\right)^{\frac14} e^{-\frac{i}{2\hbar\alpha}(x-q)^2}\,e^{i\frac{px}{\hbar}}\,e^{-i\frac{pq}{2\hbar}}.$$
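As a quick sanity check on the normalization of the coherent states defined above, the following Python sketch evaluates $\psi_z^\alpha$ on a grid and integrates $|\psi_z^\alpha|^2$ over the real line. It assumes the reading of the definition with prefactor $(\Im\alpha/(\pi\hbar|\alpha|^2))^{1/4}$ and $\Im\alpha>0$; the function names, the choice $\hbar=1$ and the integration grid are illustrative assumptions, not part of the text.

```python
import cmath
import math


def coherent_state(x, q, p, alpha, hbar=1.0):
    """Value at x of the coherent state psi_z^alpha, z = (q, p), Im(alpha) > 0."""
    norm = (alpha.imag / (math.pi * hbar * abs(alpha) ** 2)) ** 0.25
    gauss = cmath.exp(-1j * (x - q) ** 2 / (2 * hbar * alpha))
    phase = cmath.exp(1j * p * x / hbar) * cmath.exp(-1j * p * q / (2 * hbar))
    return norm * gauss * phase


def l2_norm_squared(q, p, alpha, hbar=1.0, half_width=12.0, n=24001):
    """Riemann-sum approximation of the integral of |psi_z^alpha|^2 over R.

    The grid is centred at q; half_width and n are ad hoc numerical choices.
    """
    dx = 2 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        x = q - half_width + k * dx
        total += abs(coherent_state(x, q, p, alpha, hbar)) ** 2
    return total * dx


if __name__ == "__main__":
    # alpha = i is the standard Gaussian case; 1 + 2i is an arbitrary test value.
    for alpha in (1j, 1 + 2j):
        print(alpha, l2_norm_squared(0.3, -1.2, alpha))
```

Both values printed should be close to 1, which is consistent with the states being normalized in $L^2(\mathbb{R})$ for any admissible $\alpha$.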
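Note that the standard Toeplitz quantization corresponds to $\alpha = i$, but $\alpha$ will be a true dynamical variable, so we need to consider it as a degree of freedom. The link between different $h_\alpha(\cdot)$ leading to the same operator $H$ (in particular the standard case) is the following:

$$\int_{\mathbb{C}} h_\alpha(z)\,|\psi_z^\alpha\rangle\langle\psi_z^\alpha|\,\frac{dz\,d\bar z}{2\pi\hbar} = \int_{\mathbb{C}} h_{\alpha,\alpha'}(z)\,|\psi_z^{\alpha'}\rangle\langle\psi_z^{\alpha'}|\,\frac{dz\,d\bar z}{2\pi\hbar}, \qquad (2)$$

$$h_{\alpha,\alpha'} = e^{-i\frac{\hbar}{4}(\alpha-\alpha')\Delta_\xi + i\frac{\hbar}{4}\left(\frac{1}{\alpha}-\frac{1}{\alpha'}\right)\Delta_x}\,h_\alpha \qquad (3)$$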
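for $\Im(\alpha-\alpha')>0$, $\Im\left(\frac{1}{\alpha}-\frac{1}{\alpha'}\right)<0$. An easy computation shows that, when $h_\alpha(z) := 1$, $H = \mathrm{Id}$ for any value of $\alpha$.

We consider the operator $U(S)^{-1}HU(S)$, the conjugate of $H$ by the operator $U(S)$, where $S = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$ is a real $2\times 2$ matrix of determinant one and $U$ is the metaplectic representation. More precisely, $U(S)$ is the operator with integral kernel given by

$$U(x,y) = \begin{cases} \dfrac{1}{\sqrt{b\,2\pi\hbar}}\,e^{-\frac{i}{2b\hbar}\left(dx^2-2xy+ay^2\right)}, & b\neq 0,\\[2mm] \sqrt{d}\,\delta(dx-y)\,e^{\frac{i}{2}dc\,x^2}, & b=0. \end{cases} \qquad (4)$$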
In order to show how (4) can be derived, let us recall that one way to define the metaplectic representation is through the formula

$$S\begin{pmatrix} x\\ -i\hbar\frac{d}{dx}\end{pmatrix} = \begin{pmatrix} U(S)^{-1}\,x\,U(S)\\ U(S)^{-1}\left(-i\hbar\frac{d}{dx}\right)U(S)\end{pmatrix}. \qquad (5)$$

Writing that $U(S)$ is unitary, $U(S)^{-1} = U(S)^*$, that is $U^{-1}(x,y) = \overline{U(y,x)}$, we get for $S = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$ an equation whose solution is (4), of course modulo a global phase.

It is well known, and easy to derive from (5), that, when $S$ is real, the Weyl symbol of $U(S)^{-1}HU(S)$ is the push-forward of the Weyl symbol of $H$ by $S$, namely

$$\sigma^{\mathrm{Weyl}}_{U(S)^{-1}HU(S)}\begin{pmatrix} q\\ p\end{pmatrix} = \sigma^{\mathrm{Weyl}}_{H}\left(S^{-1}\begin{pmatrix} q\\ p\end{pmatrix}\right).$$

An easy computation shows that this implies the following result:

$$U(S)^{-1}HU(S) = \int h_{\alpha_S(\alpha),\alpha}(Sz)\,|\psi_z^\alpha\rangle\langle\psi_z^\alpha|\,\frac{dz\,d\bar z}{2\pi\hbar} \qquad (6)$$
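with $\alpha_S(\alpha) = iS\cdot(\alpha)$, where $S\cdot z = \frac{az+b}{cz+d}$, and $h_{\alpha_S(\alpha),\alpha}$ is given by (3). Of course this formula does not make any sense for general (non-analytic) symbols $h$ when $S$ is no longer real. But we will see that there is a general “off-diagonal” Toeplitz representation. The main result of the present short note is the following theorem.

Notation. We will denote, for $S = \begin{pmatrix} a & b\\ c & d\end{pmatrix}\in SL(2,\mathbb{C})$, $z=(q,p)$ and $\alpha\in\mathbb{C}$,

$$S^c = \begin{pmatrix} \bar a & \bar b\\ \bar c & \bar d\end{pmatrix},\qquad S(z) = \begin{pmatrix} a & b\\ c & d\end{pmatrix}\begin{pmatrix} q\\ p\end{pmatrix}\qquad\text{and}\qquad S\cdot\alpha = \frac{a\alpha+b}{c\alpha+d}.$$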
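Moreover, $\wedge$ will denote the symplectic form of $T^*\mathbb{R}$, $z\wedge z' = pq'-qp'$.

Theorem 1 (Off-diagonal Toeplitz Representation) Let, for $h$ compactly supported,

$$H = \int h(z)\,|\psi_z^\alpha\rangle\langle\psi_z^\alpha|\,\frac{dz\,d\bar z}{2\pi\hbar}.$$

Define, for $S\in SL(2,\mathbb{C})$ and $\Im\alpha>0$, the real $2\times 2$ matrix ${}^{\alpha}T_S$ by ${}^{\alpha}T_S(z) = (q_S^\alpha,p_S^\alpha)\in\mathbb{R}^2$, defined by

$$q_S^\alpha + \alpha_S(\alpha)\,p_S^\alpha = q^S + \alpha_S(\alpha)\,p^S,\qquad \begin{pmatrix} q^S\\ p^S\end{pmatrix} := S(z),\quad z=(q,p).$$

Then, for any $S,\alpha$ such that $|V\cdot\alpha|<\infty$, $\Im(V\cdot\alpha)>0$, $V = S^{-1},\ {S^c}^{-1},\ S^{-1}S^c$,

$$U(S)^{-1}HU(S) = \int {}^{\alpha}T_{S^c}\#h_{S^c\cdot\alpha,\alpha}(z)\ \frac{\big|\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big\rangle\big\langle\psi_z^\alpha\big|}{\big\langle\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big|\psi_z^\alpha\big\rangle}\ \frac{dz\,d\bar z}{2\pi\hbar},$$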
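where $h_{S^c\cdot\alpha,\alpha}$ is defined by (2)–(3) and $T\#h$ is the push-forward of $h$ by $T$.

The proof is given in [43]. The key ingredient for the effective computation of the result is the following trick: one easily computes that

$$U(S)^{-1}|\psi_z^\alpha\rangle\langle\psi_z^\alpha|U(S) = L\,\big|\psi^{S^{-1}\cdot\alpha}_{{}^{\alpha}T_{S^{-1}}z}\big\rangle\big\langle\psi^{{S^c}^{-1}\cdot\alpha}_{{}^{\alpha}T_{{S^c}^{-1}}z}\big|\qquad\text{for some } L\in\mathbb{C}.$$

But $U(S)^{-1}|\psi_z^\alpha\rangle\langle\psi_z^\alpha|U(S)$ is a projector, therefore

$$L^2\,\big\langle\psi^{{S^c}^{-1}\cdot\alpha}_{{}^{\alpha}T_{{S^c}^{-1}}z}\big|\psi^{S^{-1}\cdot\alpha}_{{}^{\alpha}T_{S^{-1}}z}\big\rangle = L\ \Longrightarrow\ L = \frac{1}{\big\langle\psi^{{S^c}^{-1}\cdot\alpha}_{{}^{\alpha}T_{{S^c}^{-1}}z}\big|\psi^{S^{-1}\cdot\alpha}_{{}^{\alpha}T_{S^{-1}}z}\big\rangle}.$$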
2.2 Link with Weyl

It is well known (see the beautiful paper by Alex et al. [27]) that the Weyl symbol of a Toeplitz operator is the function obtained by letting the heat flow at time $\hbar/4$ act on the Toeplitz symbol. Therefore the Weyl symbol of a Toeplitz operator is an entire function, on which a complex canonical transform acts naturally. As a corollary of Theorem 1, we get the following (expected) result.

Theorem 2 Let $H$ be a Toeplitz operator with Toeplitz symbol $h$. Let us denote by $\sigma_H^\hbar$ the Weyl symbol of $H$ (note that $\sigma_H^\hbar = e^{\frac{\hbar}{4}\Delta}h$ is an entire function). Then

$$\sigma^\hbar_{U(S)^{-1}HU(S)} = \sigma^\hbar_H\circ S.$$
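2.3 Flows on Extended Phase Space

Consider on the extended phase space $\mathbb{C}\times\mathbb{C} = \{(z,\alpha)\}$ the mapping

$$\mathcal{S} : (\alpha,z)\mapsto (S\cdot\alpha,\ {}^{\alpha}T_S(z)).$$

Theorem 3 $\mathcal{S}\,\mathcal{S}' = \mathcal{SS'}$, where $\mathcal{S}'$ and $\mathcal{SS'}$ denote the mappings associated in the same way with $S'$ and with the product $SS'$.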
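The $\alpha$-component of the mapping on the extended phase space is the Möbius action $S\cdot\alpha = \frac{a\alpha+b}{c\alpha+d}$ from the Notation above, and its compatibility with matrix products can be checked numerically. The Python sketch below does this, and also recomputes $S^{-1}S^c$ and $S^{-1}S^c\cdot i$ for the upper-triangular matrix $S=\begin{pmatrix}1&-it\\0&1\end{pmatrix}$, which is one of the matrices appearing in the examples of Sect. 2.4. All function names and the numerical values of $t$ and $\alpha$ are illustrative assumptions; the component ${}^{\alpha}T_S$ is not modelled here.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inverse(S):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def conjugate(S):
    """Entrywise complex conjugate, written S^c in the text."""
    return tuple(tuple(complex(x).conjugate() for x in row) for row in S)


def mobius(S, alpha):
    """Moebius action S . alpha = (a*alpha + b) / (c*alpha + d)."""
    (a, b), (c, d) = S
    return (a * alpha + b) / (c * alpha + d)


if __name__ == "__main__":
    t = 0.3
    S1 = ((1, 2j * t), (0, 1))
    S2 = ((math.cosh(t), 1j * math.sinh(t)), (-1j * math.sinh(t), math.cosh(t)))
    alpha = 0.5 + 1.5j
    # Group-action property: S1 . (S2 . alpha) == (S1 S2) . alpha
    print(mobius(S1, mobius(S2, alpha)), mobius(mat_mul(S1, S2), alpha))
    # Worked entry: S = [[1, -it], [0, 1]] gives S^{-1} S^c . i = i (1 + 2t)
    S = ((1, -1j * t), (0, 1))
    print(mobius(mat_mul(inverse(S), conjugate(S)), 1j), 1j * (1 + 2 * t))
```

The first printed pair illustrates that the Möbius action composes like the matrix product, which is the $\alpha$-part of the composition law for the mappings on the extended phase space.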
2.4 Examples

Several examples are presented in the table below. Let

$$D_\alpha(q,p) = \big\langle\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big|\psi_z^\alpha\big\rangle^{-1},\qquad z=(q,p).$$
.. . .
Complex harmonic oscillator
Complex dilation
Multiplication by .e−tx
Complex time free evolution
U
.
.
0 i i 0
cosh t i sinh t −i sinh t cosh t
eit 0 . 0 e−it
1 0 −it 1
.
−1 S c 1 2it . 0 1
1 0 2it 1
.
.
−1 0 0 −1
cosh 2t i sinh 2t −i sinh 2t cosh 2t
e−2it 0 . 0 e2it
.
.S
S 1 −it . 0 1
i
i
−4it i
i 1−2t
.e
.
·i
+ 2t)
−1 S c
.1i(1
.S
0
0
1−4t 1−2t
0
1−4t 1−2t
0 1
0 1
1+4t 1+2t
.
−1 0 0 −1
e−2t 0 . 0 e2t
.
.
1 . 0
.i TS −1 S c
0
0
1−2t 1−t
0
1−2t 1−t
0 1
0 1
1+2t 1+t
.
−1 0 0 1
e−t 0 . 0 et
.
.
1 . 0
.i TS c
sinh 2t (q 2 +p 2 ) h¯
2i cos t qp h¯
1
.e
.e
(1+3t)2 q 2
t 2 h .e (1+2t) ¯
(1+3t)2 p 2
t 2 h .e (1+2t) ¯
.Di (q, p)
We finish this section by the same computations for a non-canonical S, used in Sects. 2.7 and 3.6 below.
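$$\begin{array}{l|cc}
 & \text{Anticanonical example} & \text{Its opposite}\\ \hline
S & \begin{pmatrix} 0 & -i\\ i & 0\end{pmatrix} & \begin{pmatrix} 0 & i\\ -i & 0\end{pmatrix}\\
S^{-1}S^c & \begin{pmatrix} -1 & 0\\ 0 & -1\end{pmatrix} & \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\\
S^{-1}S^c\cdot i & i & i\\
{}^{i}T_{S^{-1}S^c} & \begin{pmatrix} -1 & 0\\ 0 & -1\end{pmatrix} & \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\\
{}^{i}T_{S^c} & \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix} & \begin{pmatrix} -1 & 0\\ 0 & -1\end{pmatrix}
\end{array}$$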
2.5 Real Phase Space Associated with Complex Linear Symplectomorphisms

The following result is immediate.

Proposition 1 ${}^{\alpha}T_{\bar S^{-1}}$ and ${}^{\alpha}T_{S^{-1}}$ determine $S$ and $\alpha$.

As a corollary of Proposition 1, we see that $\mathbb{R}^2\times\mathbb{C} = \{((p,q),\beta)\} = T^*\mathbb{R}\times\mathbb{C}$, on which ${}^{\beta}T_{S^{-1}}$ acts, appears as the real phase space encoding the action of complex linear canonical transforms. In the next section, we will give another, noncommutative, interpretation of this phase space.
2.6 Noncommutative Geometry Interpretation

In this section we give a noncommutative interpretation of the off-diagonal Toeplitz representation in Theorem 1 (see [43] for further details).

2.6.1 The Canonical Groupoid

We consider on $\mathcal{P} = T^*\mathbb{R}\times\mathbb{C}^+$ the action of the group $Sp(2n,\mathbb{C})$

$$(z,\alpha)\mapsto({}^{\alpha}T_S(z),\ S\cdot\alpha).$$

Let us define the groupoid $G$ as the semidirect product $\mathcal{P}\rtimes SL(2n,\mathbb{C})$ of $\mathcal{P}$ by $SL(2n,\mathbb{C})$ [16, Definition 1 p. 104-105 and Section 7], that is $G = \mathcal{P}\times SL(2n,\mathbb{C})$, $G^{(0)} = \mathcal{P}\times\{1\}$, with the functors range and source given by

$$r((z,\alpha),S) = (z,\alpha),\qquad s((z,\alpha),S) = (S(z),\ S\cdot\alpha)\qquad\forall\,((z,\alpha),S)\in\mathcal{P}\times SL(2n,\mathbb{C}).$$
The $C^*$-algebra associated with the groupoid $G$ is the crossed product $C_0(\mathcal{P})\rtimes SL(2n,\mathbb{C})$ of the algebra of continuous functions on $\mathcal{P}$ by the action of $SL(2n,\mathbb{C})$ defined above.
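2.6.2 Symbols

Let $H$ be an $\alpha$-Toeplitz operator of symbol $\sigma_H^T$ as given by (2). By Theorem 1, we associate to $U(S)^{-1}HU(S)$ the couple

$$\big((\sigma_H^T)_{S\cdot\alpha,\alpha}\circ{}^{\alpha}T_{S^c},\ S^{-1}S^c\big),$$

where $(\sigma_H^T)_{S\cdot\alpha,\alpha}$ is given by (3). This can be seen as an element of the algebra associated with the canonical groupoid defined in Sect. 2.6.1 by the following construction: we associate to $\big((\sigma_H^T)_{S\cdot\alpha,\alpha}\circ{}^{\alpha}T_{S^c},\ S^{-1}S^c\big)$ the function $\sigma^{\mathrm{off}}[U(S)^{-1}HU(S)]$ on the canonical groupoid, identified with $\mathcal{P}\times\mathcal{P}$, defined by

$$\sigma^{\mathrm{off}}[U(S)^{-1}HU(S)]\big((z,\alpha),(z',\alpha')\big) := {}^{\alpha}T_{S^c}\#(\sigma_H^T)_{S\cdot\alpha,\alpha}(z)\ \delta\big((z',\alpha') - S^{-1}S^c(z,\alpha)\big),$$

where ${}^{\alpha}T_{S^c}\#(\sigma_H^T)_{S\cdot\alpha,\alpha}$ designates the push-forward of $(\sigma_H^T)_{S\cdot\alpha,\alpha}$ by ${}^{\alpha}T_{S^c}$.

Conversely, we “quantize” the symbol $\sigma^{\mathrm{off}}_{U(S)^{-1}HU(S)}$ by the following off-diagonal Toeplitz-type quantization formula:

$$T^{\mathrm{off}}[\sigma^{\mathrm{off}}] := \int_{\mathcal{P}\times\mathcal{P}} \sigma^{\mathrm{off}}\big((z,\alpha),(z',\alpha')\big)\ \frac{|\psi_{z'}^{\alpha'}\rangle\langle\psi_z^\alpha|}{\langle\psi_{z'}^{\alpha'}|\psi_z^\alpha\rangle}\ \frac{dz\,d\bar z\,dz'\,d\bar z'}{2\pi\hbar}. \qquad (7)$$

Proposition 2

$$T^{\mathrm{off}}\big[\sigma^{\mathrm{off}}[U(S)^{-1}HU(S)]\big] = U(S)^{-1}HU(S).$$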
On the (Formal) Composition of Symbols
Conjugating an observable by .U (S) corresponds to a (complex or real) change of variable. Therefore, multiplication of functions should be defined on the same system of coordinates, computationally. This leads to associate to .U (S)−1 H U (S) the operator of “multiplication” acting on .H by U (S)−1 H U (S) ·S H := U (S)−1 H H U (S).
.
This gives rise to the following multiplication of symbols: when .H, H are Toeplitz operators, so is (asymptotically) .H H and its symbol is at leading order the products of the symbols of H and .H . Therefore the symbol of .U (S)−1 H U (S)·S H is the groupoid composition of one of the .U (S)−1 H U (S) by th (trivial) one of .H . In the case where .H := U (S )−1 H in U (S ) U (S)−1 H U (S) ·S H := U (S)−1 H H U (S).
.
= U (S)−1 H U (S )−1 H U (S )U (S). Using the result of Theorem 1 .
U (S )−1 H U (S ) −1 c ˆ |ψαS T S ·α (z) ψzα | dzd z¯ S −1 S c , = α TS c #hS c ·α,α (α TS c z) S −1 c ψα T S ·α (z) |ψzα 2π h ¯ −1 c S S
and (formally) −1 S c ·α
H |ψαST
.
S −1 S c (z)
−1 S c ·α
= h(α TS −1 S c (z))|ψαST
S −1 S c (z)
+ O(h¯ )
we get formally the usual groupoid composition of symbols (see [43] for explicit expressions).
2.7 Non-Canonical Transforms

It is striking to notice that the metaplectic representation as defined by (4), namely, for a matrix $S = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$ of determinant one, the operator with integral kernel given by (4), depends only on the numbers $a, b, d$. The absence of $c$ is hidden by the fact that, thanks to $\det S = 1$, $c = \frac{ad-1}{b}$. By contrast, the main formula in Theorem 1 is expressed directly in terms of the matrix $S$ and therefore admits an extension to the case $\det S\neq 1$. Note that this extension is highly non-trivial even in the real case $S\in M(2,\mathbb{R})$. In the present paper, we will limit ourselves to the case $\det S = \pm 1$. We set

$$M^{\pm}(2,\mathbb{C}) := \{S\in M(2,\mathbb{C}),\ \det S = \pm 1\}.$$

Definition 1 Let

$$H = \int h(z)\,|\psi_z^\beta\rangle\langle\psi_z^\beta|\,\frac{dz\,d\bar z}{2\pi\hbar}.$$
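Define, for $S\in M^{\pm}(2,\mathbb{C})$, $\det S\neq 0$, and $\Im\alpha>0$, the real $2\times 2$ matrix ${}^{\alpha}T_S$ by ${}^{\alpha}T_S(z) = (q_S^\alpha,p_S^\alpha)\in\mathbb{R}^2$, defined by

$$q_S^\alpha + \alpha_S(\alpha)\,p_S^\alpha = q^S + \alpha_S(\alpha)\,p^S,\qquad \begin{pmatrix} q^S\\ p^S\end{pmatrix} := S(z),\quad z=(q,p).$$

Then, for any $S,\alpha$ such that $|V\cdot\alpha|<\infty$, $\Im(V\cdot\alpha)>0$, $V = S^{-1},\ {S^c}^{-1},\ S^{-1}S^c$, we define the composition operator $C(S)$ acting on $H$ by

$$C(S)H = \int {}^{\alpha}T_{S^c}\#h_{S^c\cdot\alpha,\alpha}\big((-1)^{\frac{1-\det S}{2}}z\big)\ \frac{\big|\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big\rangle\big\langle\psi_z^\alpha\big|}{\big\langle\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big|I^{\frac{1-\det S}{2}}\psi_z^\alpha\big\rangle}\ \frac{dz\,d\bar z}{2\pi\hbar}, \qquad (8)$$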
where $I$ is the parity operator defined on $L^2(\mathbb{R})$ by $I\psi(x) = \psi(-x)$.

When $\det S = 1$, (8) is the same as the result of Theorem 1 so that, in this case, $C(S)\,\cdot = U(S)^{-1}\cdot U(S)$. When $\det S = -1$, the presence of the operator $I$ in the normalization constant $1/\big\langle\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big|I\psi_z^\alpha\big\rangle$ and of the factor $(-1)^{\frac{1-\det S}{2}}$ in the argument of ${}^{\alpha}T_{S^c}\#h_{S^c\cdot\alpha,\alpha}$ follows from the following intuitive arguments (see [43] for a rigorous one):

– As we have seen right after its statement, the keystone of the proof of Theorem 1 is the fact that the normalization constant $L$ ensures that $U(S^{-1})|\psi_z^\alpha\rangle\langle\psi_z^\alpha|U(S)$ is a projector. This requirement can also be seen as following from the fact that Wigner functions of pure states composed with canonical transforms satisfy the same equality as the original one, namely

$$\int S\#W(z-z')\,S\#W(z')\,e^{i\frac{z\wedge z'}{\hbar}}\,dz' = S\#W(z)\ \Longleftrightarrow\ \int W(z-z')\,W(z')\,e^{i\frac{z\wedge z'}{\hbar}}\,dz' = W(z) \qquad (9)$$

since $\det S = 1\Rightarrow S(z)\wedge S(z') = z\wedge z'$. When $\det(S) = -1$, the left-hand side of (9) becomes

$$\int S\#W(z-z')\,S\#W(z')\,e^{-i\frac{z\wedge z'}{\hbar}}\,dz' = S\#W(z)\ \Longleftrightarrow\ \int W(z-z')\,W(-z')\,e^{i\frac{z\wedge z'}{\hbar}}\,dz' = W(z), \qquad (10)$$

leading to, if $R$ denotes the operator of Wigner function $W$, $C(S)R\,I\,C(S)R = C(S)R$. This shows easily that $I$ has to be introduced in $\big\langle\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big|I\psi_z^\alpha\big\rangle$.

– In the course of the proof of Theorem 1, the following equality is used:
$$U(S^{-1})\,e^{i\frac{z\wedge Z}{\hbar}}\,U(S) = e^{i\frac{z\wedge S(Z)}{\hbar}} = e^{i\frac{S^{-1}(z)\wedge Z}{\hbar}},\qquad Z = \begin{pmatrix} x\\ -i\hbar\frac{d}{dx}\end{pmatrix}, \qquad (11)$$

due to the fact that $S$ is canonical. When $\det S = -1$, (11) becomes

$$U(S^{-1})\,e^{i\frac{z\wedge Z}{\hbar}}\,U(S) = e^{i\frac{z\wedge S(Z)}{\hbar}} = e^{-i\frac{S^{-1}(z)\wedge Z}{\hbar}},$$

which is responsible for the change $z\to -z$ in $\big|\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big\rangle\big\langle\psi_z^\alpha\big|$ and therefore in the argument of ${}^{\alpha}T_{S^c}\#h_{S^c\cdot\alpha,\alpha}$, by a change of variable in the integration in (8).

Note again that, contrary to the symplectic case, $C(S)$ is not in general a conjugation. Nevertheless, since $C(S)H$ has the form $C(S)H = \int f(z)\,|\varphi^{\alpha'(\alpha)}_{z'(z)}\rangle\langle\varphi_z^\alpha|\,dz$, one can extend $C(S)$, as in the conjugation case, to more general operators than the Toeplitz class and define $C(S)C(S')$ by the same formula as in Definition 1, after first replacing $(z,\alpha)$ by $({}^{\alpha}T_{S^{-1}S^c}(z),\ S^{-1}S^c\cdot\alpha)$ and then multiplying by the weight

$$\left.\frac{{}^{\alpha}T_{S^c}\#h_{S^c\cdot\alpha,\alpha}(z)}{\big\langle\psi^{S^{-1}S^c\cdot\alpha}_{{}^{\alpha}T_{S^{-1}S^c}(z)}\big|\psi_z^\alpha\big\rangle}\right|_{S=S'}$$

(see [43] for further details). With this definition, $C$ is a representation of $M^{\pm}(2,\mathbb{C})$:

Theorem 4

$$C(S')C(S) = C(S'S)\quad\text{for all } S, S' \text{ in } M^{\pm}(2,\mathbb{C}).$$
As a significant example useful in the next section, let us consider the case $S = \begin{pmatrix} 0 & i\\ -i & 0\end{pmatrix}$ computed in the second table of Sect. 2.4. We get, in the case $\alpha = i$,

$$C(S)H = \int h(z)\,|\psi^i_{-z}\rangle\langle\psi^i_z|\,\frac{dz\,d\bar z}{2\pi\hbar}, \qquad (12)$$

since $\langle\psi^i_{-z}|I\psi^i_z\rangle = \langle\psi^i_{-z}|\psi^i_{-z}\rangle = 1$. In other words, $C(S)H$ is the quantization of the symbol (with a slight abuse of notation)

$$\sigma^{\mathrm{off}}[C(S)H]\big((z,i),(z',i)\big) = h(z)\,\delta(z'+z). \qquad (13)$$

The case $S = \begin{pmatrix} 0 & -i\\ i & 0\end{pmatrix}$ can be treated the same way and leads, thanks to the same table, to

$$C(S)H = \int h(z)\,|\psi^i_z\rangle\langle\psi^i_{-z}|\,\frac{dz\,d\bar z}{2\pi\hbar}. \qquad (14)$$
3 Bose-Einstein-Fermi at the Classical Level

Quantum statistics is a fundamental hypothesis in quantum mechanics. It ensures in particular the stability of matter. Contrary to many other aspects of nonrelativistic quantum mechanics, which have a natural “classical” counterpart, it seems at first glance difficult to associate a corresponding classical symmetry with the statistical properties of quantum objects. Changing the sign after a permutation of the coordinates of different particles does not call to mind any simple classical action. Moreover, most of the quantities which “pass” to the limit of vanishing Planck constant are quadratic and therefore look (wrongly, as we will see) insensitive to the change of sign. Finally, typical fermionic expressions such as the exchange term in Hartree-Fock theory vanish numerically in the limit $\hbar\to 0$ (see footnote 7).

In this section, which relies on [44], we will implement this “exchange” action on three (in fact four) different symbols associated with quantum density matrices: the Husimi function (the average of the density matrix on coherent states, therefore a probability density), the Wigner function (that is, the Weyl symbol suitably renormalized by a power of the Planck constant in order to be of integral 1, but not positive), and the Toeplitz symbol appearing in the so-called positive quantization procedure. The fourth one will be related to the construction elaborated in Sect. 2.7. The presentation will be rather formal; rigorous statements and derivations can be found in [44].

Definition 2 Let $\rho$ be a density matrix given by an integral kernel $\rho(X;Y)$, $X = (x_1,\dots,x_N)$, $Y = (y_1,\dots,y_N)$. We define, for $i,j = 1,\dots,N$, the mappings $U_{i\leftrightarrow j} : \rho(X;Y)\mapsto U_{i\leftrightarrow j}\rho(X;Y) = \rho(X;Y)|_{y_i\leftrightarrow y_j}$ and $V_{i\leftrightarrow j} : \rho(X;Y)\mapsto V_{i\leftrightarrow j}\rho(X;Y) = \rho(X;Y)|_{x_i\leftrightarrow x_j}$.

In terms of density matrices, quantum statistics will be seen as looking at density matrices which are eigenvectors, with eigenvalue $1$ or $-1$, of the two mappings $U_{i\leftrightarrow j}$, $V_{i\leftrightarrow j}$.
The indistinguishability property of the quantum system reads as

$$U_{i\leftrightarrow j}V_{i\leftrightarrow j} = V_{i\leftrightarrow j}U_{i\leftrightarrow j},\qquad\forall\, i,j = 1,\dots,N. \qquad (15)$$

7 Pierre-Louis Lions, hand-written private communication (1990).
3.1 Husimi

Let us recall that the Husimi function of a density matrix $\rho$ is defined as

$$\widetilde{W}[\rho](Z,\bar Z) = \frac{1}{(2\pi\hbar)^{dN}}\,\langle\varphi_Z|\rho|\varphi_Z\rangle, \qquad (16)$$

where, for $Z = q+ip\in\mathbb{C}^{dN}$ and $x\in\mathbb{R}^{dN}$,

$$\varphi_Z(x) = \frac{1}{(\pi\hbar)^{\frac{dN}{4}}}\,e^{-\frac{(x-q)^2}{2\hbar}}\,e^{i\frac{p\cdot x}{\hbar}}. \qquad (17)$$
ˆ
.
ZdN
[ρ](Z)dZ = trace ρ = 1, W
(18)
¯ [ρ] is a function analytic in Z and in .Z. and we remark that .W Our first link between quantum statistics and the classical underlying space is the contents of the following result. ¯ expressed on the [ρ](Z, Z) Lemma 1 Let us consider the Husimi function of .ρ, .W complex variables .Z = (z1 , . . . , zn ), .zl = ql + ipl , z¯ l = ql − ipl . Then ¯ = e− [Ui↔j ρ](Z, Z) W
.
(¯zi −¯zj )(zi −zj ) 2h¯
¯ = e− [Vi↔j ρ](Z, Z) W
.
|zi −zj |2 2h¯
¯ zi ↔zj [ρ](Z, Z)| W
¯ z¯ i ↔¯zj . [ρ](Z, Z)| W
Note that, as expected, ¯ =W [ρ](z, z¯ )|zi ↔zj , z¯ i ↔¯zj . [Vi↔j Ui↔j ρ](Z, Z) W
.
Note also that, with the definition z± = q± + ip± :=
.
⎛ ⎞ ⎛ ⎞ zi zj ⎜ zj ⎟ ⎜ zi ⎟ ⎟ ⎜ ⎟ .⎜ ⎝ z¯ i ⎠ → ⎝ z¯ i ⎠ ⇐⇒ z¯ j z¯ j
⎛
zi ± zj √ , 2
⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ z+ z+ q+ q+ ⎜z− ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ → ⎜−z− ⎟ ⇐⇒ ⎜ q− ⎟ → ⎜−ip− ⎟ ⎝z¯ + ⎠ ⎝ z¯ + ⎠ ⎝p+ ⎠ ⎝ p+ ⎠ z¯ − z¯ − p− iq−
(19)
332
T. Paul
so the complex metaplectic transform associated with the exchange term is the c with matrix .I+ ⊗ SH 0 −i c c , det SH .SH = = −1. (20) i 0
3.2 Wigner The Wigner function of a density matrix is nothing but its Weyl symbol, divided by (2π h¯ )dN . More precisely the Wigner function of .ρ is defined as
.
ˆ W [ρ](X, ) =
.
R2dN
δ δ X. ρ(X + h¯ , X − h¯ )ei h¯ dδ. 2 2
(21)
At the contrary of the Husimi function, .W [ρ] is not positive, but its main elementary properties are ˆ .
R2dN
1 (2π h¯ )dN
ˆ R2dN
W [ρ](X, )dXdξ = trace ρ = 1
(22)
and W [ρ](X, 0W [ρ ](X, )dXd = trace (ρρ ).
Let us now define the semiclassical symplectic Fourier transform as f (q, p h¯ ) =
.
1 (2π h¯ )d
ˆ Rd ×Rd
f (x, ξ )ei
qξ −px h¯
dxdξ.
Note that, at the difference of the usual Fourier transform: h¯ ¯ f (x , ξ ) = f (x, ξ ). h
.
a ∓a
Let .a∓ = i√ j for .a = q, p, y, ξ . And let omit the dependence in the variable 2 .q1 , . . . , qi−1 , qi+1 , . . . , qj −1 , qj +1 , . . . , qN and the same for p. We denote π
W 2 [ρ](x+ , ξ+ ; x− , ξ− ) = W [ρ](xi , xj ; ξi , ξj ).
.
Lemma 2 ¯ W 2 [Ui↔j ρ](q+ , p+ ; p− , q− ) = W 2 [ρ](q+ , p+ ; q − , p− ) π
π
.
h
¯ W 2 [Vi↔j ρ](q+ , p+ ; p− , q− ) = W 2 [ρ](q+ , p+ ; −q − , −p− ). π
.
π
h
Note that W [Vi↔j Ui↔j ρ](q1 , p1 , . . . , qi , pi , . . . , qj , pj , . . . , qn , pn ) =
.
W [ρ](q1 , p1 , . . . , qi−1 , pi−1 , qj , pj , . . . , qj −1 , pj −1 , qi , pi , . . . , qn , pn ).
.
Let us call now .W − the Wigner function (done with the symplectic Fourier transform) on the two variables .q− , p− , namely π W − W 2 [ρ] (q+ , p+ |p− , q− ; x− , ξ− ) =
.
ˆ .
π W 2 [ρ] (q+ , p+ , p− + 2δ h, ¯ q− + 2δ h¯ )
π W 2 [ρ] (q+ , p+ , p− − 2δ h¯ , q− − 2δ h¯ )ei(x− δ−ξ− δ ) dδdδ .
.
π W − W 2 [ρ] lives on .T ∗ (Rdq+ ) × T ∗ (R2d (p− ,q− ) ) equipped with the symplectic form .
dq+ ∧ qp+ + dq− ∧ dξ− + dp− ∧ dx− .
.
One has π W − W 2 [Ui↔j ρ] (q+ , p+ |p− , q− ; x− , ξ− ) =
.
π W − W 2 [ρ]](q+ , p+ | − ξ− , −x− ; q− , p− ).
.
π That is, the action of .Ui↔j on .ρ is seen on .W − W 2 [ρ] by the pointwise action of the following matrix: ⎛⎛
1 ⎜⎜0 ⎜⎜ ⎜⎝0 ⎜ ⎜ S+ 0 ⎜ 0 .S = =⎜ ⎜ 0 S− ⎜ ⎜ ⎜ ⎝
0 1 0 0
0 0 1 0
0
⎞ 0 0⎟ ⎟ 0⎠ 1
⎞ q+ ⎟ ⎜ ξ+ ⎟ ⎟ ⎜ ⎟ 0 ⎟ ⎜p ⎟ ⎟ ⎜ +⎟ ⎟ ⎜ ⎟ ⎟ ⎜x ⎟ ⎞⎟ on ⎜ + ⎟ ⎛ ⎜ q− ⎟ 0 0 0 1 ⎟ ⎟ ⎜ ⎟ ⎜ 0 0 −1 0⎟⎟ ⎜ ξ− ⎟ ⎟⎟ ⎜ ⎜ ⎟ ⎝ 0 1 0 0⎠⎠ ⎝ p− ⎠ −1 0 0 0 x− ⎞
⎛
and this matrix is symplectic. Defining now .z± = p± + ix± , θ± = q±+ iξ± we find that S becomes on c , S c ) = (I, i 0 1 ). And so the complex metaplectic these new variables, .S c = (S+ − 1 0 transform associated is
c .SW
0i C , det SW = = 1. i o
3.3 Toeplitz Let .ρ be a Toeplitz operator of symbol, . W [ρ]. This means that .ρ can be written as ˆ 1 ¯ Z ϕZ |dZ W [ρ](Z, Z)|ϕ (23) .ρ = (2π h) ¯ dN CdN (here the integral has to be understood in the weak sense on .H). Elementary properties of . W [ρ] are ˆ .
W [ρ] ≥ 0 ⇒ ρ > 0, and
CdN
W [ρ]dZ = trace ρ.
(24)
Moreover, the second property of (22) can be “disintegrated” in the following coupling between Husimi and Toeplitz settings: ˆ .
CdN
¯ ¯ W [ρ ](Z, Z)dZ [ρ](Z, Z) = trace (ρρ ). W
(25)
Lemma 3 (For . W Entire) .
.
.
W [Ui↔j ρ](zi , z¯ i , zj , z¯ j ) = e−
W [Ui↔j ρ](q− , p− ; q+ , p+ ) = e−
|zi −zj |2 2h¯
2 +p2 q− − 2h¯
W [ρ](zj , z¯ i , zi , z¯ j )
W [ρ](−ip− , iq− ; q+ , p+ )
W [Vi↔j ρ](q1 , p1 , . . . , qi , pi , . . . , qj , pj , . . . , qn , pn ) = e− .
(qi −qj )2 +(pi −pj )2 2h¯
× W [ρ](q1 , p1 , . . . , qi−1 , pi−1 , −ipj , iqj , . . . , qj −1 , pj −1 , −ipi , iqi , . . . , qn , pn )
.
= e−
(qi −qj )2 +(pi −pj )2 2h¯
W [ρ]|zi ↔−zj , zi = qi + ipi . z¯i ↔z¯j
In other words, the exchange action on the Toeplitz symbol is the same as the one on the Husimi function.
3.4 On Wigner Again Let us denote W Ui↔j W [ρ] = W [Ui↔j ρ].
.
Let us moreover denote by .W 2 [ρ] the Wigner function of the Wigner function of .ρ (see Footnote 1): W 2 [ρ] = W [W [ρ]].
.
Let us denote by .Qi = (qi , ξi ) and .Pi = (pi , xi ), i = 1, . . . , N, the variables in T ∗ (T ∗ Rd )). We define:
.
Qti = (ξi , qi ), Pit = (xi , pi ).
.
Lemma 4 W 2 [Ui↔j ρ](Q1 , P1 , . . . , Qi , Pi , . . . , Qj , Pj , . . . , Qn , Pn ) =
.
W 2 [ρ](Q1 , P1 , . . . , Qi−1 , Pi−1 , Pjt ,−Qtj , . . . , Qj −1 , Pj −1 , Pit , −Qti , . . . , Qn , Pn )
.
W W 2 [Ui↔j ρ] = W [Ui↔j W [ρ]] = W 2 [ρ]| Qi ↔P t .
.
j
Pi ↔−Qtj
W 2 [Vi↔j ρ](Q1 , P1 , . . . , Qi , Pi , . . . , Qj , Pj , . . . , Qn , Pn ) =
.
W 2 [ρ](Q1 , P1 , . . . , Qi−1 , Pi−1 ,−Pjt , Qtj , . . . , Qj −1 , Pj −1 , −Pit , Qti , . . . , Qn , Pn )
.
W W 2 [Vi↔j ρ] = W [Vi↔j W [ρ]] = W 2 [ρ]|Qi ↔−P t .
.
j
Pi ↔Qtj W , V So .Ui↔j i↔j are metaplectic operators associated with canonical transforms on ∗ ∗ dN )). .T (T (R
Lemma 5 Denoting now .zi = qi + ξi , θi = pi + ixi we have W W [Ui↔j W [ρ]] = W 2 [Ui↔j ρ] = W 2 [ρ]|zi ↔izj
.
θi ↔iθj
W 2 [Vi↔j ρ] = W 2 [ρ]|zi ↔−izj
.
θi ↔−iθj
W , V So .Ui↔j i↔j are metaplectic operators associated with complex canonical transforms on the complexification of .T ∗ (RdN ).
3.5 Off-diagonal Toeplitz Representations

In this section, we take $d = 1$ and $N = 2$. A density matrix $\rho$ has an integral kernel $\rho(x_1,x_2;y_1,y_2)$ and

$$(U\rho)(x_1,x_2;y_1,y_2) = \rho(x_1,x_2;y_2,y_1),$$

$$(V\rho)(x_1,x_2;y_1,y_2) = \rho(x_2,x_1;y_1,y_2).$$

Therefore, performing the change of variables

$$x = (x_1-x_2)/\sqrt 2,\qquad x' = (x_1+x_2)/\sqrt 2,$$

$$y = (y_1-y_2)/\sqrt 2,\qquad y' = (y_1+y_2)/\sqrt 2,$$

one gets, with a slight abuse of notation, that

$$U\rho(x,y;x',y') = \rho(x,-y;x',y'),$$

$$V\rho(x,y;x',y') = \rho(-x,y;x',y').$$
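In the rest of this section we will omit the variables $x', y'$. Let us consider a (generalized) Toeplitz operator

$$H = \int h(z)\,|\psi_z^\beta\rangle\langle\psi_z^\beta|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

where, for $\beta>0$, $z = q+ip$,

$$\psi_z^\beta = \frac{e^{-\frac{\beta(x-q)^2}{2\hbar}}\,e^{i\frac{px}{\hbar}}}{(\pi\hbar/\beta)^{\frac14}}.$$

Let us define $H^l$ by its integral kernel $H^l(x,y) = H(-x,y)$, where $H(x,y)$ is the integral kernel of $H$. Let $H^r$ be defined the same way by $H^r(x,y) = H(x,-y)$. Obviously, taking the upper signs for $H^l$ and the lower signs for $H^r$,

$$H^{l/r} = \int h(z)\,|\psi^\beta_{\mp z}\rangle\langle\psi^\beta_{\pm z}|\,\frac{dz\,d\bar z}{2\pi\hbar}.$$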
Therefore, we get the following off-diagonal expressions.

Lemma 6

$$VH = \int h(q,p)\,|\psi_{-z}\rangle\langle\psi_z|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

$$UH = \int h(q,p)\,|\psi_z\rangle\langle\psi_{-z}|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

$$UVH = \int h(q,p)\,|\psi_{-z}\rangle\langle\psi_{-z}|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

$$U^2 = V^2 = 1.$$

These expressions have to be compared to the following ones, derived from Sect. 3.3.

Lemma 7

$$VH = \int h(ip,-iq)\,e^{-\frac{q^2+p^2}{2\hbar}}\,|\psi_z\rangle\langle\psi_z|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

$$UH = \int h(-ip,iq)\,e^{-\frac{q^2+p^2}{2\hbar}}\,|\psi_z\rangle\langle\psi_z|\,\frac{dz\,d\bar z}{2\pi\hbar},$$

$$UVH = \int h(-q,-p)\,|\psi_z\rangle\langle\psi_z|\,\frac{dz\,d\bar z}{2\pi\hbar}.$$

The Toeplitz symbol of $VH$ (resp. $UH$) is $h^V(q,p) = h(ip,-iq)\,e^{-\frac{q^2+p^2}{2\hbar}}$ (resp. $h^U(q,p) = h(-ip,iq)\,e^{-\frac{q^2+p^2}{2\hbar}}$).

Lemma 8 Let $h\geq 0$, $\int h = 1$. Then $H^B := \frac14(H+VH+UH+UVH)$ is a bosonic state, and $H^F := \frac14(H-VH-UH+UVH)$ is a fermionic one.
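Proof One has $H^B = VH^B = UH^B = UVH^B$, $\operatorname{Tr}H^B = 1$, $H^F = -VH^F = -UH^F = UVH^F$, $\operatorname{Tr}H^F = 1$, and

$$H^B = \frac14\int h(q,p)\,|\psi_z+\psi_{-z}\rangle\langle\psi_z+\psi_{-z}|\,\frac{dz\,d\bar z}{2\pi\hbar}\geq 0,$$

$$H^F = \frac14\int h(q,p)\,|\psi_z-\psi_{-z}\rangle\langle\psi_z-\psi_{-z}|\,\frac{dz\,d\bar z}{2\pi\hbar}\geq 0.$$

Finally, $H^B$ is “semiclassical”.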
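The algebra behind Lemma 8 can be illustrated on a toy model. The Python sketch below replaces the integral kernel $\rho(x_1,x_2;y_1,y_2)$ by an arbitrary array of numbers on a small index grid (an assumption made purely for illustration; it is not a density matrix and involves no coherent states), implements the exchange maps $U$ and $V$ of Sect. 3.5 as index swaps, and checks that the symmetrized and antisymmetrized combinations are eigenvectors of $U$ and $V$ with eigenvalues $+1$ and $-1$. All names are hypothetical.

```python
import itertools
import random


def apply_U(rho):
    """Exchange y1 <-> y2 in the kernel rho(x1, x2; y1, y2)."""
    return {(x1, x2, y1, y2): rho[(x1, x2, y2, y1)] for (x1, x2, y1, y2) in rho}


def apply_V(rho):
    """Exchange x1 <-> x2 in the kernel rho(x1, x2; y1, y2)."""
    return {(x1, x2, y1, y2): rho[(x2, x1, y1, y2)] for (x1, x2, y1, y2) in rho}


def combine(*terms):
    """Linear combination sum_k c_k * rho_k of kernels given as (c_k, rho_k)."""
    keys = terms[0][1].keys()
    return {k: sum(c * r[k] for c, r in terms) for k in keys}


def close(r1, r2, tol=1e-12):
    """Entrywise comparison of two kernels up to a small tolerance."""
    return all(abs(r1[k] - r2[k]) < tol for k in r1)


random.seed(0)
n = 3  # size of the toy index grid (arbitrary)
rho = {k: random.random() for k in itertools.product(range(n), repeat=4)}
U_rho, V_rho, UV_rho = apply_U(rho), apply_V(rho), apply_U(apply_V(rho))

# Analogues of H^B and H^F in Lemma 8.
bosonic = combine((0.25, rho), (0.25, V_rho), (0.25, U_rho), (0.25, UV_rho))
fermionic = combine((0.25, rho), (-0.25, V_rho), (-0.25, U_rho), (0.25, UV_rho))

if __name__ == "__main__":
    print("U^2 = 1:", close(apply_U(U_rho), rho))
    print("UV = VU:", close(UV_rho, apply_V(U_rho)))
    print("U bosonic = bosonic:", close(apply_U(bosonic), bosonic))
    print("U fermionic = -fermionic:",
          close(apply_U(fermionic), combine((-1.0, fermionic))))
```

The same four-term combinations appear in Lemma 8 with $H$, $VH$, $UH$ and $UVH$ in place of the toy kernels; positivity and the trace normalization are properties of the coherent-state representation and are not captured by this sketch.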
3.6 Link with the Complex (Anti)metaplectic Representation

We have seen in the previous (sub)sections that U (resp. V) is associated with the action of the matrix $\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ (resp. $\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$) on the Husimi function and the Toeplitz symbol. Therefore it is natural to think that U (resp. V) should be associated with the “metaplectic” quantization of $\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}^{-1}$ (resp. $\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}^{-1}$), “metaplectic” because these matrices are not canonical. Precisely, a definition of quantization of anticanonical mappings has been provided in the preceding section that we can use in the present situation. With the definition of $C(S)$ in [43] recalled in Sect. 2.7 above, we get our final result, as a direct application of (14).

Lemma 9 Let H be a Toeplitz operator of symbol $h(q,p)$. Then
\[ UH = C\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} H, \qquad VH = C\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} H. \]
But the “true” result is the following, which we express only for U (the case of V being straightforwardly the same).

Proposition 3
\[ UH = T^{\mathrm{off}}\, \sigma^{\mathrm{off}}\Big[ C\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} H \Big], \]
where $\sigma^{\mathrm{off}}\big[ C\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} H \big]$ is defined by (13) and $T^{\mathrm{off}}$ by the off-diagonal Toeplitz quantization formula (7).

Namely, UH is given by the off-diagonal Toeplitz quantization of the off-diagonal Toeplitz symbol of $C\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} H$, without the multiplication by the factor $e^{-\frac{|\cdot|^2}{\hbar}}$ that appears in the Husimi and the (diagonal) Toeplitz cases, as seen in the previous sections. Proposition 3 shows first that the exchange mappings U, V are clearly associated with complex non-canonical linear transformations, and second that the off-diagonal Toeplitz quantization/representation of Mpm(2, C) established in Sect. 2.7, Definition 1, is meaningful. Note again that
\[ \det\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} = \det\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = -1. \]
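The determinant statement closing this subsection can be checked by hand or with a few lines of code. The following is a minimal sketch, not part of the original construction: the names `S_PLUS`, `S_MINUS`, `det2` and `in_M_pm` are illustrative labels introduced here, and the test `det = ±1` is the membership condition for the set of matrices with determinant ±1 considered in Sect. 3.7.

```python
# Two antidiagonal complex matrices of the kind appearing in Lemma 9 and
# Proposition 3. The names below are illustrative, not notation from the text.
S_PLUS = ((0, 1j), (-1j, 0))
S_MINUS = ((0, -1j), (1j, 0))


def det2(m):
    """Determinant of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    return a * d - b * c


def in_M_pm(m, tol=1e-12):
    """True when det m is +1 or -1 (up to a numerical tolerance)."""
    d = det2(m)
    return abs(d - 1) < tol or abs(d + 1) < tol


def in_SL2(m, tol=1e-12):
    """True when det m is +1, i.e. the canonical (symplectic) case in dimension 2."""
    return abs(det2(m) - 1) < tol


if __name__ == "__main__":
    for name, m in (("S_PLUS", S_PLUS), ("S_MINUS", S_MINUS)):
        print(name, "det =", det2(m), "| det = ±1:", in_M_pm(m), "| det = 1:", in_SL2(m))
```

Both matrices have determinant −1, so they fail the determinant-one condition while still satisfying det = ±1, which is the sense in which they are "not canonical".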
Species of Spaces
3.7 A Classical Phase Space with Symmetries Inherited from Quantum Statistics

The construction of the preceding (sub)section suggests that a noncommutative extension of the usual phase space of classical mechanics, namely the cotangent bundle of the configuration space, is possible. Such an extension handles the trace, at the underlying classical level, of more quantum symmetries than the one usually considered, namely the unitarity in a Hilbert space of the quantum propagation, which leads to the symplectic classical evolution associated with SL(2, R). One recovers the presence of the fundamental spin-statistics symmetries (fundamental as they are responsible, e.g., for the stability of matter) at the classical level by extending the symmetry group SL(2, R), acting pointwise on the phase space, to $M^{\pm}(2, \mathbb{C}) = \{S \in M(2, \mathbb{C}),\ \det S = \pm 1\}$ with its action on a noncommutative space established in Sect. 2.7.
4 Noncommutative Moduli Spaces Underlying Topological Quantum Field Theory in Large Colouring Asymptotics

Quantum Field Theory (QFT) is an extension of quantum mechanics to situations where particles interact with fields. It involves a quantization procedure of fields whose “philosophy” is the same as that of regular quantum mechanics, but whose physical and mathematical realization is far from being as complete as for quantum mechanics. In fact, the modern formulation of QFT consists “only”$^8$ in computing amplitudes of transitions through Feynman integrals of the (highly non-rigorous) form
\[ I_S = \int e^{i\frac{S(\Phi)}{\hbar}}\, D[\Phi], \tag{26} \]
where the “integral” is over families of fields $\Phi$ and $S(\Phi)$ is a classical action, namely an integro-differential functional of $\Phi$. The “measure” $D$ is supposed to be an extension of the Lebesgue measure to infinite dimension. The enormous success of the formal and mysterious formula (26) comes from the fact that it can be perturbatively computed when $S(\Phi) = S_0(\Phi) + S_1(\Phi)$ where $S_0$ is a quadratic functional. This leads to an exact computation of $I_{S_0}$ by extension of Gaussian integrals to infinite dimension, and to an asymptotic computation of $I_{S_0+S_1}$ by some perturbative calculus. Meanwhile, when the action has a large group of symmetries, $I_S$ reduces to an integral over the quotient of the set of fields
8 Let us quote Claude Itzykson when he was saying that the most difficult QFT is the one in dimension zero, that is regular quantum mechanics, because it is the one to which one asks the most precise questions.
by the group of symmetries, which sometimes allows non-perturbative (and physically interesting) computations. This is precisely the case for Topological Quantum Field Theory (TQFT), for which, roughly speaking, one requires the action S to be sensitive only to the underlying topological structures (and not, e.g., the geometrical ones). Then the formula (26) produces topological invariants which happen to be directly linked to numerous domains in mathematics, like knot theory, moduli spaces in algebraic geometry and low-dimensional topology, and in physics, such as the fractional quantum Hall effect and condensed matter physics. TQFT à la Witten [61] is a physical model for the Jones polynomial of knots. Without defining here precisely the objects involved (see [37] for some details), the invariant is then the partition function $Z_r(M, K)$ associated with any knot K in a 3-manifold M, expressed as a Feynman integral over all connections A on some G-bundle over M, for a compact group G, of the form
\[ Z_r(M, K) = \int e^{ir\,CS(A)}\, \mathrm{trace}_V(\mathrm{Hol}_K A)\, dA, \tag{27} \]
where $\mathrm{trace}_V$ is the trace associated with a representation V of G, $\mathrm{Hol}_K(A)$ is the holonomy of the connection A along K and $CS(A)$ is the Chern-Simons functional. The analogy between (26) and (27) is clear: the fields are the connections A, the “measure” is $D(A) = \mathrm{trace}_V(\mathrm{Hol}_K A)\, dA$, the action is CS and, very importantly for us,
\[ r = \frac{1}{\hbar}. \]
Therefore the asymptotics $r \sim \infty$ corresponds to the semiclassical regime $\hbar \sim 0$, for which the stationary phase principle implies that the integration in (27) should concentrate on critical points of the Chern-Simons functional, that is, connections which are flat on $M \setminus K$. Moreover, if M has boundary $\Sigma$, then $Z_r(M, K)$ should be interpreted as an element of the geometric quantization of the moduli space $\mathcal{M}(\Sigma, G)$, i.e. the space of gauge equivalence classes of flat G-connections on $\Sigma$. To any closed oriented surface $\Sigma$ with n marked points $p_1, \dots, p_n$, any integer $r > 0$ and any colouring $c = (c_1, \dots, c_n)$, $c_i \in \{1, \dots, r-1\}$, of the marked points, TQFT provides, by the construction of [8], a finite dimensional Hermitian vector space $V_r(\Sigma, c)$ together with a basis $\{\phi_n,\ n = 1, \dots, \dim(V_r(\Sigma, c))\}$ of this space (see Sections 2.1 and 2.5 in [37]). On the other (classical) side, to each $t \in (\pi\mathbb{Q})^n$ we can associate the moduli space
\[ \mathcal{M}(\Sigma, t) = \{\rho : \pi_1(\Sigma \setminus \{p_1, \dots, p_n\}) \to SU_2 \ \text{s.t.}\ \forall i,\ \operatorname{tr}\rho(\gamma_i) = 2\cos(t_i)\}/\sim, \]
where one has $\rho \sim \rho'$ if there is $g \in SU_2$ such that $\rho' = g\rho g^{-1}$, and $\gamma_i$ is any curve going around $p_i$. When $\Sigma$ is either a once punctured torus or a 4-times punctured sphere, $\mathcal{M}(\Sigma, t)$ is symplectomorphic to the standard sphere $S^2 = \mathbb{CP}^1$.
To any curve $\gamma$ on the surface $\Sigma$ (that is, avoiding the marked points) we can associate two objects: a quantum one, the curve operator $T_\gamma$ acting on $V_r(\Sigma, c)$, and a classical one, the function $f_\gamma$ on the symplectic manifold $\mathcal{M}(\Sigma, t)$.
• $T_\gamma$ is obtained by a combinatorial topological construction recalled in Sections 2.3 and 2.4 in [37]. By the identification of the finite dimensional space $V_r(\Sigma, c)$ with the Hilbert space $H_N$ of the quantization of the sphere defined in [42, Section 2.1], with $N := \dim(V_r(\Sigma, c))$, through $\{\phi_n,\ n = 1, \dots, \dim(V_r(\Sigma, c))\} \leftrightarrow \{\psi_n^N,\ n = 1, \dots, N\}$, $T_\gamma$ can be seen as a matrix on $H_N$. One of the main results of [37] was to prove that this matrix is a trigonometric one in the sense of [42, Section 3.2].
• $f_\gamma : \mathcal{M}(\Sigma, t) \to [0, \pi]$ is defined by
\[ \rho \mapsto f_\gamma(\rho) := -\operatorname{tr}\rho(\gamma). \tag{28} \]
The asymptotics considered in [37] consists in letting $r \to \infty$ and considering a sequence of colourings $c_r$ such that $\pi\frac{c_r}{r}$ converges to t and the dimension N of $V_r(\Sigma, c_r)$ grows linearly with r. One sees immediately that, by the identification $V_r(\Sigma, c) \leftrightarrow H_N$, this corresponds to the semiclassical asymptotics $N \to \infty$. The main result of [37] states that, for generic values of t,
\[ T_\gamma \ \text{is a Toeplitz operator of leading symbol}\ f_\gamma. \]
The generic values of t are the ones for which $f_\gamma$, considered as a function on $S^2$ through the symplectic isomorphism mentioned earlier, belongs to $C^\infty(S^2)$. The present section relies on [42] and concerns the asymptotics $\pi\frac{c_r}{r} \to t$ for any value of $t \in [0, \pi]$. In Sect. 4.1 we present the geometric quantization of the sphere, together with the definition of Toeplitz operators we just mentioned. Then we state in Sect. 4.2 the extension of the Toeplitz calculus necessary to get rid of the genericity condition in the asymptotics $r \to \infty$. Our main result is contained in Sect. 4.4, and some geometrical considerations on the singular (in fact noncommutative) underlying classical limit space are given in Sect. 4.5.
4.1 The Standard Geometric Quantization of the Sphere

In this section we will consider the quantization of the sphere in a very down-to-earth way. See [19, 37] for more details. Given an integer N, we define the space $H_N$ of polynomials in the complex variable z of order strictly less than N and set
\[ \langle P, Q \rangle = \frac{i}{2\pi} \int_{\mathbb{C}} \frac{P(z)\overline{Q(z)}}{(1+|z|^2)^{N+1}}\, dz\, d\bar z \quad\text{and}\quad \varphi_n^N(z) = \sqrt{\frac{N!}{n!(N-1-n)!}}\; z^n. \tag{29} \]
The vectors $(\varphi_n^N)_{n=0\dots N-1}$ form an orthonormal basis of $H_N$. By the stereographic projection
\[ S^2 \ni (\tau, \theta) \in [0,1] \times S^1 \mapsto z = \sqrt{\frac{\tau}{1-\tau}}\; e^{i\theta} \in \mathbb{C} \cup \{\infty\}, \]
the space $H_N$ can be seen as a space of functions on the sphere (with a specific behaviour at the north pole). Write
\[ d\mu_N = \frac{i}{2\pi} \frac{dz\, d\bar z}{(1+|z|^2)^{N+1}}. \tag{30} \]
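As a space of analytic functions in $L^2(\mathbb{C}, d\mu_N)$, the space $H_N$ is closed. For $z_0 \in \mathbb{C}$, we define the coherent state
\[ \rho_{z_0}(z) = N(1 + \bar z_0 z)^{N-1}. \tag{31} \]
These vectors satisfy $\langle f, \rho_{z_0} \rangle = f(z_0)$ for any $f \in H_N$, and the orthogonal projector $\pi_N : L^2(\mathbb{C}, d\mu_N) \to H_N$ satisfies $(\pi_N\psi)(z) = \langle \psi, \rho_z \rangle$. For $f \in C^\infty(S^2, \mathbb{R})$ we define the (standard) Toeplitz quantization of f as the operator
\[ T^N[f] : H_N \to H_N, \qquad T^N[f] := \int_{\mathbb{C}} f(z)\, |\rho_z\rangle\langle\rho_z|\, d\mu_N(z), \]
\[ \text{i.e.}\quad T^N[f]\psi := \int_{\mathbb{C}} f(z)\, \langle \rho_z, \psi \rangle_{H_N}\, \rho_z\, d\mu_N(z) = \pi_N(f\psi) \quad\text{for } \psi \in H_N. \tag{32} \]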
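The relation between the basis (29) and the coherent state (31) can be tested numerically. The sketch below is an illustration only: it assumes the normalization $\varphi_n^N(z) = \sqrt{N!/(n!(N-1-n)!)}\,z^n$, checks that $\sum_n \varphi_n^N(z)\overline{\varphi_n^N(z_0)} = N(1+\bar z_0 z)^{N-1}$, and checks that each $\varphi_n^N$ has unit norm for the measure (30). For the norm, the angular variable is integrated out and the substitution $\tau = |z|^2/(1+|z|^2)$ is used, which turns the integral into $\int_0^1 \tau^n(1-\tau)^{N-1-n}\,d\tau$. Function names and the midpoint quadrature are choices made here, not taken from the text.

```python
from math import factorial, sqrt


def phi(n, N, z):
    """Basis polynomial phi_n^N(z) = sqrt(N! / (n! (N-1-n)!)) * z^n."""
    c = sqrt(factorial(N) / (factorial(n) * factorial(N - 1 - n)))
    return c * complex(z) ** n


def rho(z0, z, N):
    """Coherent state rho_{z0}(z) = N (1 + conj(z0) z)^(N-1), formula (31)."""
    return N * (1 + complex(z0).conjugate() * complex(z)) ** (N - 1)


def kernel_from_basis(z0, z, N):
    """Sum over n of phi_n(z) * conj(phi_n(z0)); should reproduce rho_{z0}(z)."""
    return sum(phi(n, N, z) * phi(n, N, z0).conjugate() for n in range(N))


def norm_squared(n, N, steps=20000):
    """Squared norm of phi_n^N for the measure (30), by a midpoint rule in tau."""
    pref = factorial(N) / (factorial(n) * factorial(N - 1 - n))
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += tau ** n * (1 - tau) ** (N - 1 - n)
    return pref * total * h


if __name__ == "__main__":
    N, z0, z = 6, 0.3 + 0.4j, -0.2 + 0.7j
    print(abs(kernel_from_basis(z0, z, N) - rho(z0, z, N)))  # close to 0
    print([round(norm_squared(n, N), 6) for n in range(N)])
```

The first check is the binomial theorem in disguise; the second is the Beta integral $\int_0^1 \tau^n(1-\tau)^{N-1-n}d\tau = n!(N-1-n)!/N!$.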
A Toeplitz operator on $S^2$ is a sequence of operators $(T_N) \in \mathrm{End}(H_N)$ such that there exists a sequence $f_k \in C^\infty(S^2, \mathbb{R})$ such that for any integer M the operator $R_N^M$ defined by the equation
\[ T_N = \sum_{k=0}^{M} N^{-k}\, T_{f_k} + R_N^M \]
is a bounded operator whose norm satisfies $\|R_N^M\| = O(N^{-M-1})$. An easy use of the stationary phase Lemma shows that the (anti-)Wick symbol (also called Husimi function) of $T_f$, namely $\frac{\langle T_f\rho_z, \rho_z\rangle}{\langle \rho_z, \rho_z\rangle}$, satisfies
\[ \frac{\langle T_f\rho_z, \rho_z\rangle}{\langle \rho_z, \rho_z\rangle} = f + \frac{1}{N}\Delta_S f + O(N^{-2}), \tag{33} \]
where $\Delta_S = (1+|z|^2)^2\, \partial_z\partial_{\bar z}$ is the Laplacian on the sphere.
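4.2 a-Toeplitz Quantization

Let $a \in \mathcal{S}(\mathbb{R})$, $\|a\|_{L^2(\mathbb{R})} = 1$ and $z \in \mathbb{C}$. We define
\[ \psi_z^a = \int_{\mathbb{R}} a(t)\, e^{i\frac{\tau(z)t}{\hbar}}\, \rho_{e^{it}z}\, \frac{dt}{\sqrt{2\pi}}, \tag{34} \]
where $\tau(z) = \frac{|z|^2}{1+|z|^2}$ and $\hbar = \frac{\pi}{N}$. Since, by (31), $\rho_z = \sum_{n=0}^{N-1} \sqrt{\frac{N!}{n!(N-1-n)!}}\; \bar z^n \varphi_n^N$, we get that
\[ \psi_z^a = \sum_{n=0}^{N-1} \tilde a\Big(\frac{\tau(z) - n\hbar}{\hbar}\Big) \sqrt{\frac{N!}{n!(N-1-n)!}}\; \bar z^n \varphi_n^N = \sum_{n=0}^{N-1} \tilde a\Big(\frac{\tau(z) - n\hbar}{\hbar}\Big)\, \overline{\varphi_n^N(z)}\, \varphi_n^N, \tag{35} \]
where $\tilde a$ is the Fourier transform of a,
\[ \tilde a(y) := \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{ixy} a(x)\, dx. \]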
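Remark 1 Note that, by (35), $\psi_z^a$ depends only on the values of $\tilde a$ on $[0, N]$. Therefore one can always restrict the choice of a to the functions whose Fourier transform is supported on $[0, N]$. In the sequel of this article we will always do so.

By [37, Lemma 2] we have that
\[ \int_{\mathbb{C}} |\psi_z^a\rangle\langle\psi_z^a|\, d\mu_N(z) = \sum_{n=0}^{N-1} C_n^N\, |\varphi_n^N\rangle\langle\varphi_n^N| \tag{36} \]
with
\[ C_n^N = \frac{(N-1)!}{n!(N-1-n)!} \int_0^1 \Big|\tilde a\Big(\frac{\tau - n\hbar}{\hbar}\Big)\Big|^2 \Big(\frac{\tau}{1-\tau}\Big)^n (1-\tau)^{N-1}\, \frac{d\tau}{\hbar}. \tag{37} \]
Moreover, as $N - 1 = \frac{1}{\hbar} \to \infty$,
\[ C_n^N = 1 + O\Big(\frac{1}{N}\Big), \quad 0 < n\hbar < 1. \tag{38} \]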
N .Cn
1 ∼ n!
ˆ
∞
0
√ | a (λ − n)|2 λn e−λ 2π λdλ, 0 ∼ nh. ¯
N CnN ∼ CN ¯ ∼ 1. −1−n , nh
(39)
(40)
.
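Let us define
\[ \psi_n^N := \sqrt{C_n^N}\; \varphi_n^N. \tag{41} \]
By (36) we have
\[ \int_{\mathbb{C}} |\psi_z^a\rangle\langle\psi_z^a|\, d\mu_N(z) = \sum_{n=0}^{N-1} |\psi_n^N\rangle\langle\psi_n^N|. \tag{42} \]
This leads to the following equality:
\[ \int_{\mathbb{C}} |\psi_z^a\rangle_a\langle\psi_z^a|\, d\mu_N(z) = 1_{H_N^a}, \tag{43} \]
where $H_N^a$ is the same space of polynomials as $H_N$ but now endowed with the renormalized scalar product $\langle\cdot,\cdot\rangle_a$ fixed by
\[ \langle \psi_m^N, \psi_n^N \rangle_a = \delta_{m,n}, \tag{44} \]
and
\[ |\psi_z^a\rangle_a\langle\psi_z^a|\, \psi := \langle \psi_z^a, \psi \rangle_a\, \psi_z^a, \quad \psi \in H_N^a. \tag{45} \]
The Hilbert scalar product $\langle\cdot,\cdot\rangle_a$ on $H_N^a$ is obtained out of (44) by bi-linearity. Since any polynomial f satisfies
\[ f = \sum_{n=0}^{N-1} \langle \varphi_n^N, f \rangle\, \varphi_n^N = \sum_{n=0}^{N-1} \frac{1}{C_n^N} \langle \psi_n^N, f \rangle\, \psi_n^N, \]
we get
\[ \langle f, g \rangle_a := \sum_{n=0}^{N-1} \frac{1}{(C_n^N)^2} \langle f, \psi_n^N \rangle \langle \psi_n^N, g \rangle = \sum_{n=0}^{N-1} \frac{1}{C_n^N} \langle f, \varphi_n^N \rangle \langle \varphi_n^N, g \rangle. \]
Note that $\langle\cdot,\cdot\rangle_a$ is not given by an integral kernel. But if we “change” representation and define $F(z) := \langle \psi_z^a, f \rangle$, $G(z) := \langle \psi_z^a, g \rangle$ then, by (43), we have
\[ \langle f, g \rangle_a = \langle F, G \rangle = \int_{\mathbb{C}} \overline{F(z)}\, G(z)\, d\mu_N(z). \]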
Let us remark that ˆ ˆ τ (z)t dt .F (z) = a(t)ei h¯ f (eit z) √ , f (z ) = F (z)ψza (z )dz. 2π C R Finally, let us define ! " N " C τ (z) " h¯ −(−i∂x )−k ikθ(z) # N e , .C(z) := C τ (z) k=−(N −1) N−1
h¯
(46)
−(−i∂x )
where .CnN is the binomial coefficient, and, for a family of operators .z → σ (z), the convolution ˆ .σ C(z) := σ (ze−iθ )C(|z|eiθ )dθ. (47) S1
Definition 3 (a-Toeplitz Operator) Let .a ∈ S(R). To a (trigonometric) family z → (z) of (bounded) operators on .L2 (R) we associate the operator .OpTa () on a .HN defined by .
ˆ T .Opa ()
where .ψza
:=
dzd z¯ i 2π (1+|z|2 )N+1 .
´
R a(t)e
=
i τ (z)t h¯
S2
|ψzC(z)a a ψza |dμN (z),
ρeit z √dt
2π
with .τ (z) =
|z|2 1+|z|2
and .dμN (z) =
4.3 Symbolic Calculus Definition 4 Let γ (τ, θ ) =
K
γk (τ )eikθ be a trigonometric function on the
k=−K
sphere with each γk ∈ C ∞ (]0, 1[) ∩ L∞ ([0, 1]). Let N−1
Nγ =
Nk;γk where (Nk;γk )ij = δj,i+k γk ((k −
.
−(N−1)
(−1)k − 1 )h¯ ) 2
and Nγ the operator whose matrix on the basis {ψnN } is Nγ .
(48)
We call symbol of Nγ at the point z ∈ S 2 the operator N −1
σ [Nγ ](z) =
σk;γk (z),
.
(49)
k=−(N −1)
where σk;γk is given by eikx τ (z) eikx τ (z) . σk;γk (z) := k (γk (h·)μ − i∂x = k γk (τ (z)−i h∂ − i∂x ¯ k) ¯ x )μk z¯ z¯ h¯ h¯ (50) acting on L2 (R). The a-Toeplitz quantization procedure provides an exact (noncommutative) symbolic calculus as follows. Theorem 5 Let γ , γ and Nγ , Nγ as in Definition 4. Then Nγ = OpTa (σ [Nγ ])
.
Nγ = OpTa (σ [Nγ ]) Nγ Nγ = OpTa (σ [Nγ ]σ [Nγ ]).
4.4 Main Result

It is easy to see that, for the remaining non-generic values of t, $T_\gamma$ is not a standard Toeplitz operator. Nevertheless, it happens that it is an a-Toeplitz one.

Theorem 6 ([42]) Let again $\Sigma$ be either the once punctured torus or the 4-times punctured sphere. For all values of t, the sequence of matrices $(T_r^\gamma)$ is the sequence of matrices, in the basis $\{\psi_n^N\}_{n=0,\dots,N-1}$, of a family of a-Toeplitz operators on $H_N^a$ with symbol $\sigma^T_{T_r^\gamma} : L^2(\mathbb{R}, dx) \to L^2(\mathbb{R}, dx)$ satisfying, away from the two poles,
\[ \sigma^T_{T_r^\gamma}(z) = f_\gamma(e^{-ix}z) + O(\sqrt{\hbar}), \tag{51} \]
where $f_\gamma$ is the trace function defined by (28).

Note that, far from the singularities (poles), the algebra of principal symbols is commutative and one can associate canonically to the symbol $\sigma^T_{T_r^\gamma}(z) = f_\gamma(e^{-ix}z)$ the trace function $f_\gamma(z)$, by evaluation at $x = 0$ of the potential $f_\gamma(e^{-ix}z)$.
4.5 Classical Limit and Underlying “phase space”

We can rewrite the general structure of the symbol of an a-Toeplitz operator T in the form (near the south pole where $\tau \sim |z| \sim 0$)
\[ \sigma(z) = S\Big(1 - i\frac{\hbar}{\tau(z)}\partial_x,\ x + \theta,\ \tau(z) - i\hbar\partial_x,\ \hbar\Big), \]
where the function S is $2\pi$ periodic in the second variable and the quantization present in the two first variables is the one of the differential calculus. The function S satisfies $S(1 + \xi, x + \theta, \tau(z) + \xi, \hbar) \to S(1, x + \theta, \tau(z), \hbar) = \gamma(\tau(z), \theta + x)$ as $\xi \to 0$, where $\gamma(\tau, \theta, \hbar)$ is the so-called naive symbol of T. As $\hbar \to 0$, $z \neq 0$,
\[ \sigma(z) \to \gamma(\tau, \theta + x), \]
but the limit $\hbar, z \to 0$ is multivalued. Indeed as
\[ \begin{cases} \hbar \to 0 \\ z \to 0 \\ \frac{\hbar}{\tau(z)} = \hbar_0 \end{cases} \]
we have
\[ \sigma(z) \to S(1 - i\hbar_0\partial_x,\ e^{i(x+\theta)},\ 0,\ 0). \]
And the “classical” noncommutative multiplication for the function S is given by:
\[ S \# S'(1 - \hbar_0\xi, \theta + x, \tau, 0) = S(1 - \hbar_0\xi, \theta + x + i\partial_{\xi'}, \tau, 0)\, S'(1 - \hbar_0\xi', \theta + x, \tau, 0)\big|_{\xi' = \xi} := S(1 - \hbar_0\xi, \theta + x + i\overrightarrow{\partial_\xi}, \tau, 0)\, S'(1 - \hbar_0\xi, \theta + x, \tau, 0). \]
This defines the classical phase space as a noncommutative algebra of functions, i.e. a noncommutative blow up of the singularity. In other words, it is an algebra of functions with a multiplication law which is commutative except at certain singular points, where it becomes noncommutative.
5 Long Time Semiclassical Evolution

In this section we consider the long time semiclassical evolution through the linear Schrödinger equation, or more precisely through the associated von Neumann equation
\[ i\hbar\, \frac{d}{dt} O^t = [O^t, H], \tag{52} \]
where H is a Schrödinger operator .H = −h¯ 2 + V with smooth confining V (.V (x) → +∞ as .|x| → ∞)- or a more general semiclassical pseudodifferential operator of principal symbol h, elliptic and self-adjoint on the Hilbert space .L2 (M), where .M is a manifold of dimension .n + 1. It is well known [7, 12] that, for times smaller than .C log h1¯ , C small enough, t .O is still a Weyl (semiclassical) pseudodifferential operator and that its principal symbol is the push-forward of the initial one by the Hamiltonian flow associated with the principal symbol h of H . It is easy to get convinced that this is already not true for large values of C (greater than . 32 times the natural Liapunov exponent of the flow). Through this paper we will suppose that the Hamiltonian flow generated by h is Anosov, and moreover that there exists a smooth action of .R2(n+1) on .T ∗ M, 2(n+1) → ν ◦ μ ◦ T s,p , satisfying .(μ, ν; s, p) ∈ R ν ◦ μ ◦ T s,p ◦ t = t ◦ e
.
−λt ν
◦ e
λt μ
◦ T s+tp,p , λ > 0.
(53)
(53) is obviously mimicked on the case of the geodesic flow on surface of constant curvature (. μ (resp. .ν ) is the (resp. anti-)horocyclic flow, .T s,0 is the geodesic one and .T 0,p corresponds to a shift of energy), but we will not suppose that we are in this case and we will use only (53). Moreover we will restrict this note to the bidimensional situation .n = 1 (the extension to any n, keeping (53), is straightforward) and take .M = Rn+1 . The proofs are local and therefore are easily adaptable to the non-flat situation using the results of [52]. Finally we could extend some of our results to the case of variable Liapunov exponents. We will suppose that .O t=0 is a semiclassical pseudodifferential operator with smooth symbol supported in .h−1 (I ) for some interval .I ⊂ R such that .h−1 (I ) is compact. We first define, associated with .a ∈ D(Rn+1 ), aL2 = 1, and the family of so-called (Gaussian) coherent states − ϕ(p,q) (x) := (π h) ¯
.
n+1 4
e−
the family of Lagrangian states:
(x−q)2 2h¯
ei
px h¯
, (p, q) = z ∈ R2(n+1) ,
(54)
ˆ ψza :=
exp
.
⎧ ⎪ ⎨ i
⎫ ⎪ ⎬ dμds η a(μ, s)ϕ μ ◦T s,0 (z) n+1 , ⎪ ⎭ h¯ 2
s,0
μ ◦T ˆ (z)
⎪ ⎩ h¯
z
η := ξ.dx (symplectic potential on T ∗ Rn ).
(55)
It is easy to see that, microlocally in the interior of I and for a support of a small enough, the operator defined by $\int_{h^{-1}(I)} |\psi_z^a\rangle\langle\psi_z^a|\, \frac{d^{n+1}z}{\hbar^{n+1}}$ (here we denote by $|\psi\rangle\langle\psi|$ the operator $\varphi \mapsto \langle \varphi, \psi \rangle \psi$) is equal to the identity modulo $\hbar^\infty$. The key idea of this paper will be to write any pseudodifferential operator in the form
\[ O = \int_{h^{-1}(I)} |\psi_z^{O_z a}\rangle\langle\psi_z^a|\, \frac{d^{n+1}z}{\hbar^{n+1}} \tag{56} \]
for a suitable family of bounded pseudodifferential operators $O_z$. The interest of such a formulation is that it is preserved by the evolution through (52). More precisely, we prove in Theorem 7 that, for $n = 1$ and any $0 \le t \le C\hbar^{-2+\epsilon}$, $\epsilon > 0$, there exists a bounded operator $O_z^t$ such that the solution $O(t)$ of (52), $O(t = 0)$ being microlocalized on $h^{-1}(I)$, satisfies
\[ \Big\| \int_{h^{-1}(I)} |\psi_z^{O_z^t a}\rangle\langle\psi_z^a|\, \frac{d^{n+1}z}{\hbar^{n+1}} - O(t) \Big\|_{B(L^2)} = O(\hbar^\infty) \tag{57} \]
(valid also for $O = \mathrm{Identity}|_{h^{-1}(I)}$). This suggests considering $O_z^t$ as the symbol of $O(t)$ at the point z. In fact we will identify the symbol of $O(t)$ as a noncommutative object related to the space of leaves of the unstable foliation of the dynamics generated by the principal symbol h of H. Let us give the motivation behind this identification.
5.1 Propagation Theorem 7 ([41]) Let us take n = 1. There bounded smooth and explicitly exists t j computable functions on R4 , Otz ∼ O + ∞ O h j =1 j ¯ , such that, uniformly for 0 ≤ −2+ , t ≤ h¯ ˆ tH dz ,t a O −i tH +i h h ¯ Oe ¯ − .e |ψz z ψza | 2 B(H) = O(h¯ ∞ ), h¯ h−1 (I ) ,z t has total semiclassical Weyl symbol O ,t (ξ ; x) := Ot (eλt ν ◦ e−λt μ ◦ T s+tp,p ◦ t (z)), x = (μ, s), ξ = (ν, p). O z (58)
Remark 2 As a corollary of the proof of Theorem 7, it is easy to prove that a similar result is still valid when we replace the function a by an $\hbar$-dependent one of the form $a_\hbar(\cdot) = \hbar^{-n\epsilon/2}\, a(\hbar^{-\epsilon}\,\cdot)$ for $\epsilon \ge 0$ small enough.
5.2 Noncommutative Geometry Interpretation We first prove the following Lemma. Lemma 10 Let us define σOt (z, z ) := F (z1 , x, s; x .s ),
.
(59)
where .z = x T s,0 (z1 ), z = x T s ,0 (z1 ) and .F (z1 , ., .) is the integral symbol of an operator of Weyl symbol given by (58). Then .σOt (z, z ) does not depend on .z1 . The Lemma is easily proven by the translation invariance properties of the Weyl quantization procedure. We want to identify .σOt as an element of the crossed product .A of the algebra −1 (I ) by the group .Rn+1 under the action (53). A .CI of continuous functions on .h function .σ (z, z ), z ∈ z can be seen as a continuous function from G to .A by f (μ, t)(z) = σ (z, z ), z = μ ◦ T t (z).
.
(60)
Moreover we get an action of G on .CI by, .∀g ∈ G, α(μ,t) h(z) = h( μ ◦ T t (z)).
.
(61)
on .CI α G is given by the .-product .(f1 f2 )(g) = ´The algebra structure f1 (g1 )αg1 (f2 (g1−1 g))dg1 . An easy computation, using Theorem 7 and the symbolic property of Weyl quantization, shows easily that, at leading order and for all .0 ≤ t1 , t2 ≤ h¯ −2+ , σOt1 O t2 ∼ σOt1 σOt2 .
.
(62)
Moreover the norm .|||.||| on .A is equal to the supremum over z of the operator norm on .L2 (Rn+1 ) of the operator of integral kernel .σ (z, z ). (more precisely .A α G is the completion of the algebra of compactly supported kernels .σ (z, z ) with respect to the norm .|||.|||.) We can also give a corresponding interpretation of the vectors .ψza . Let us define μ ◦ T s,0 (z), .α(z, z ) := a(μ, s). Then .ψ a = ψ α := .α ∈ A by, for .z =
z ´ ´ i z η z h ¯ α(z , z)ϕz dz .
z e We associate to any element .γ of .A an operator .T (γ ) on .L2 (Rn+1 ) defined by
ˆ T (γ ) :=
.
h−1 (I )
|ψ γ α ψ α |
dz h¯ n+1
.
(63)
In particular a bounded pseudodifferential operator is such an operator (with .γ ∼ ∞ ¯ j ). Moreover, by definition of the norm .|||.|||, .T (γ ) is a bounded operator j =0 γj h for all .γ ∈ A α G and it is easy to see, using arguments of the proof of theorem 7, that .T (γ ) is bounded uniformly with .h¯ ∈ [0, 1] for .γ compactly supported. Noting that (63) is a way of writing (56) we get: t j ∈A Theorem 8 For .0 ≤ t ≤ h¯ −2+ there exist . t of “symbol” .γ t ∼ ∞ ¯ j =0 γj h such that T (γ )t := e−i
tH h¯
T (γ )e+i
tH h¯
γ0t = (t )⊗2 #γ0 +O(h¯ ). (64) Moreover the leading order symbol of .T (γ )t1 T (γ )t2 is .γ t1 γ t2 .
.
= T (γ t )+O(h¯ ∞ )
with
Let us remark also that an extension on the lines of Remark 2 is also valid in this framework.
5.3 Another Groupoid Approach The following easy computation will be the basis of the starting point of Sect. 6. Let us go back to the expression (63) ˆ T (γ ) :=
.
h−1 (I )
ˆ =
h−1 (I )
ˆ
h−1 (I )
h−1 (I )
h−1 (I )
dz
ˆ
z"
dz
ˆ
dz
´ z z"
i
η
´ z
dze h¯
ˆ
α(z", ¯ z)γ α(z , z)|ϕz ϕz" |
z"
η
α(z", ¯ z)γ α(z , z)|ϕz ϕz" |
z" i
dz"e h¯
z
ˆ
i
z
dz" dz
h¯ n+1 dz"e h¯
ˆ
ˆ =
ˆ
z
ˆ =
dz
dz
ˆ =
dz
|ψ γ α ψ α |
´ z z
η
α(z, ¯ z")γ α(z , z")|ϕz ϕz |
z
dz e
i h¯
´ z z
η
β(z , z)|ϕz ϕz |
(65)
z
with
ˆ
β(z , z) :=
.
dz"α(z, ¯ z")γ α(z , z").
z
Note that, when .a(z, z") is the integral kernel of a unitary operator on .L2 ( z ), then .β = γ . This will be the starting point of our discussion in Sect. 6. In particular,
when .γ (z, z") = δ(z, z") then we recover the usual decomposition of the identity with diagonal diade. So we see that the transport property given by Theorem 8 that, denoting .A(β) := ´ ´ ´ i z e h¯ z η β(z , z)|ϕ ϕ |, we have dz dz −1 z z h (I )
z e−i
.
tH h¯
A(β)ei
tH h¯
∼ A((t )⊗2 #β).
(66)
5.4 Extended Semiclassical Measures In the same way that one associates to a vector .ψ (or density matrix) the 2 considered as a measure by the formula .ψ, Oψ quantity .ψ, ϕz | = ´ OT (z, z¯ )|ψ, ϕz |2 dz, where .OT is the Toeplitz symbol of O, one can associate to .ψ (or a density matrix) a sort of “off-diagonal” version by the quantity .Rψ (z, z¯ ) := ϕz , ψψ, ϕz for .z ∈ z . .Rψ can be considered as an element of the dual of a (dense) subalgebra of .A and will have better properties of semiclassical propagation. For sake of shortness we express the result in the case of eigenvectors of the Hamiltonian H , leaving the straightforward derivation for .Rei tH ψ in the same topology. h¯
Theorem 9 Let us define for I compact interval of .R, .DI ∼ D(h−1 (I ) × Rn ) the subalgebra of smooth compactly supported elements of .A. Let .ψ be an eigenfunction of H . Then, restricted to .z ∈ h−1 (I ), .Rψ (z, z¯ ) considered as a function on .h−1 (I ) × n R , belongs to .D I . Moreover, in the weak-* topology and, .∀ > 0, uniformly for .0 ≤ t ≤ h ¯ −2+ t #R(z, z¯ ) = R(z, z¯ ) + O(h¯ ).
.
(67)
5.5 Noncommutative Phase Space Associated to Time Arrow

Let us conclude this section by summarizing what has been done. We have constructed (an algebra of functions on) a phase space able to handle the classical limit of the quantum propagation at diverging times, as the $C^*$-algebra of the invariant foliations associated with the classical dynamics, i.e. the noncommutative space of leaves of such invariant foliations. But this construction differs depending on whether we consider positive or negative diverging times: the unstable manifold foliation is needed in the first case, whereas the stable one is suitable for the second.
The idea then comes naturally, if one wishes to unify the two arrows of time in order to get objects truly invariant under the evolution for times from $-\infty$ to $+\infty$, to consider an equivalent construction associated with the intersection of the stable and unstable foliations. The set of intersections of two given invariant manifolds is the set of homoclinic orbits of a trajectory. Such sets will become the “leaves” of a new foliation of the classical phase space, invariant this time under time evolution from $-\infty$ to $+\infty$. This is the construction we will present in the next section, amazingly inspired by perturbation methods for the resonant harmonic oscillator.

6 The Noncommutative Phase Space Underlying the Quotient by the Quantum Flow

Partially less rigorous mathematically than the preceding ones, the present section proposes a construction of the classical limit of quantum systems whose underlying classical dynamics is chaotic. The key point of this work in progress consists in a deep change of point of view with respect to all other attempts, to our knowledge, at handling spectral asymptotics: instead of looking at the eigenvalues $(e_j)_{j=1,\dots}$ of the quantum Hamiltonian H, the quantities we consider are rather the set of frequencies $\big(\frac{e_i - e_j}{\hbar}\big)_{i,j=1,\dots}$, namely the spectrum of the Heisenberg-von Neumann derivation $\frac{1}{i\hbar}[H, \cdot]$.
6.1 The Non-resonant Harmonic Oscillator Spectrum and the Non commutative Torus p
From the formula for the generating function of the Hermite polynomials .{hn }n=0,... , namely ∞
p
hn (x)
.
n=0
1√ zn 2 p = π 4 2n n! = e2xt−t , hn x2 − n! L2 (R,e 2 dx)
valid for all .z ∈ C, we get easily that the normalized eigenfunctions of the harmonic oscillator .hj , j = 0, . . . satisfy |z|2
( √zh )j e− 2h¯ 2 √ 2 |z|2 ¯ − 14 − z −2 22zx+x h¯ . hj (x) e− 2h¯ := g z (x). = (π h¯ ) e √ j! j =0 ∞
Note that .g z L2 (R,dx) = 1.
(68)
We define ϕ z :=
.
e
qp −i 2h ¯
−1 (π h) ¯ 4
e−
(x−q)2 +i px 2 h¯
, with z = q + ip.
We first remark that √
g
.
= ϕz
2z
so that (68) reads now
ϕz = e
.
∞
2 − |z|h¯
2 h¯ z
√ j =0
j .hj .
j!
We get that, for any choice of .z ∈ C, z = 0, 3
(2π )− 4
.
|z| √ h¯
ˆ
2π
! " 2 2 " |z| j e− |z|h¯ 2π |z|2 # h¯ h¯
g e z e−ij t dt = it
j!
0
hj
so that ! " " .hj = #
|z|2
j
h¯
j! e−
|z|2 h¯
− 43
2π
(2π )
|z|2 h¯
|z| √ h¯
ˆ
2π
g e z e−ij t dt it
0
and in particular, taking now .|z|2 = j h, ¯ ˆ
1
h j = Cj
j4
.
(2π )
3 4
2π
ge
iθ √j h
¯ e−ij θ dθ
0
with Cj =
.
j! < 1 and ∼ 1 as j → ∞. √ 2πj j j e− j
and hj =
.
1
j ! j4 Cj 3 j! (2π ) 4
ˆ 0
2π
ge
iθ √j h
¯ e−ij θ dθ.
Species of Spaces
355
It is of course exceptional, as the harmonic oscillator is exceptional for quantum mechanics, that such an exact formula exists, giving an explicit expression for the spectral decomposition of the Hamiltonian. It is well known that the following decomposition of the identity holds true ˆ
z¯ |ϕ z ϕ z | dzd 2π h¯ .
I=
.
(69)
But there is also the following decomposition ∞
I=
|hj hj |
.
(70)
j =0
which can be rewritten as ∞
ˆ
I=
eij (θ −θ) |ϕ
.
T2
j =0
√ j h¯ eiθ
√
|dθ dθ
(71)
|ϕ z ϕ z |dz dz.
(72)
ϕ
j h¯ eiθ
or, more geometrically, ˆ
∞
I=
.
2 2 j =0 |z | =|z| =j h¯
ei
´ z z
z¯ dz
The advantage of this last formulation is that, denoting .eiθz := .n ∈ Z, ˆ
∞
An :=
j
.
|z |2 =|z|2 =j h¯
j =0
e
i
´ z z
z¯ dz inθz
e
z
z z¯ ,
one has, for any
∞
|ϕ ϕ |dz dz = z
|hj +n hj |. j =0
Note that this is also equal to ˆ
∞
2π
1 T →∞ 2T
dθz lim
j
.
0
j =0
ˆ .
ˆ .
=
∼
ˆ
T
−T
1 T →∞ 2T
dz lim
1 dz lim T →∞ 2T
ˆ
T
−T
dθz eij (θz −θz ) einθz |ϕ z ϕ z |||z |2 =|z|2 =j h¯ ˆ
T
−T
dtei
dtei
|z| h¯ t
´ t (z) z
it
einθeit z |ϕ e z ϕ z |
z¯ dz inθt (z)
e
t
|ϕ (z) ϕ z |,
(73)
where we see that only the symplectic form, the flow, and the groupoid of the flow t (defined in a few lines below) appear. We see two points:
.
itH /h¯ = eint A+ , ∀t ⇔ i [H, A+ ] = n, i.e. – .e−itH /h¯ A+ ne n N h¯ + i .A is an eigenvector of the derivation . [H, ·] of eigenvalue n N h¯ ´ z
– .ρn (z , z) := ei z z¯ dz einθz is a function on the groupoid of the flow .t := {(z , z)/∃t, t (z) = z } which is an eigenvector of the action of the flow, i.e. .ρ t (z , z) := (t )⊗2 #ρ(z , z), .ρnt = int e ρn . In dimension two we have the following formula: consider H = ω1 H0 ⊗ I + ω2 I ⊗ H0 , hj1 ,j2 = hJ1 ⊗ hj2 , ϕ (z1 .z2 ) = ϕ z1 ⊗ ϕ z2 , and we then express the same formula as before in the form, for .N ∈ Z2 , .Z = dz1 dz2 , .Z = (θz1 , θz2 ), ˆ AN :=
1 T →∞ 2T
dZ lim
.
ˆ
T
−T
dtei
´ t (Z) Z
¯ iN·t (Z) Z·dZ
e
t
|ϕ (Z) ϕ Z |.
(74)
Obviously itH /h¯ = eiN·ωt A+ , ∀t, .ω = (ω , ω ), i.e. – .e−itH /h¯ A+ 1 2 n Ne i + .An is an eigenvector of the derivation . [H, ·] of eigenvalue .N · ω, h¯
– .ρN (Z , Z) := ei which is
´ Z Z
¯ Z·dZ eiN·Z
is function on the groupoid of the flow .t
t = eiN ωt ˙ ρ . an eigenvector of the action of the flow, i.e. .ρN N
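The eigenvector property stated for the one-dimensional oscillator can be illustrated on a finite truncation. The sketch below rests on assumptions made here for illustration: H is taken diagonal in the basis $(h_j)$ with the standard eigenvalues $\hbar(j + \tfrac12)$, and $A_n^+$ is read as the adjoint $\sum_j |h_j\rangle\langle h_{j+n}|$ of $A_n = \sum_j |h_{j+n}\rangle\langle h_j|$. Because H is diagonal, conjugation by $e^{\mp itH/\hbar}$ acts entrywise, so the truncation introduces no error. The function names are hypothetical.

```python
import cmath


def shift_adjoint(n, dim):
    """Matrix of A_n^+ = sum_j |h_j><h_{j+n}| truncated to the first `dim` levels.

    Entry (row r, column c) is 1 when c == r + n, and 0 otherwise.
    """
    return [[1.0 if c == r + n else 0.0 for c in range(dim)] for r in range(dim)]


def conjugate_by_flow(A, t, hbar=1.0):
    """Return exp(-itH/hbar) A exp(+itH/hbar) for H = diag(hbar * (j + 1/2))."""
    dim = len(A)
    energy = [hbar * (j + 0.5) for j in range(dim)]
    return [
        [cmath.exp(-1j * t * (energy[r] - energy[c]) / hbar) * A[r][c] for c in range(dim)]
        for r in range(dim)
    ]


def eigen_defect(n, dim, t, hbar=1.0):
    """Largest entry of exp(-itH/hbar) A_n^+ exp(itH/hbar) - exp(i n t) A_n^+."""
    A = shift_adjoint(n, dim)
    B = conjugate_by_flow(A, t, hbar)
    phase = cmath.exp(1j * n * t)
    return max(abs(B[r][c] - phase * A[r][c]) for r in range(dim) for c in range(dim))


if __name__ == "__main__":
    print(eigen_defect(n=2, dim=6, t=0.7))  # numerically zero
```

The phase $e^{int}$ does not depend on $\hbar$ here, which reflects the fact that the frequencies $(e_i - e_j)/\hbar$ of the oscillator are integers in these units.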
Let us remark that, when Z belongs to a Bohr-Sommerfeld quantized torus .T2 × T2 , by the ergodic theorem .
ˆ T ´ t (Z) 1 t ¯ Z·dZ dtei Z eiN·t (Z) |ϕ (Z) ϕ Z | T →∞ 2T −T ˆ ´ Z ¯ ei Z Z·dZ eiN·Z |ϕ Z ϕ Z |dZ = lim
T2
and, in particular, does not depend on Z. We will come back to this remark in Sect. 6.3.6 below. Finally, we remark that 2π/ω2 Z = (θz1 + 2π ωω12 , θz2 ),
.
so that
(75)
Species of Spaces
ˆ
357
ω 2π ω1 K 2
.
ω −2π ω1 K
ˆ
K
f ( (Z))dt = k=−K
2
2π
t
dθ P2π ω1 f ω2
0
(θ1 + θ, θ2 + θ )
where ω1 ω2 , ·).
P2π ω1 f (·, ·) = f (· +
.
ω2
Moreover ω
e
.
−i2π ω1 H 2
ω
An+1,0 e
i2π ω1 H 2
=e
ω
i2π ω1 n1 2
A(n1 ,0 .
One recovers easily the {λi − λj } = {ω1 n1 + ω2 n2 , (n1 , n2 ) ∈ Z2 }.
.
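The identification of the frequency set with the lattice $\{\omega_1 n_1 + \omega_2 n_2\}$ can be made concrete on a finite window of levels. The following sketch assumes, for illustration, the standard spectrum $\lambda_{j_1,j_2} = \omega_1(j_1 + \tfrac12) + \omega_2(j_2 + \tfrac12)$ in units where $\hbar = 1$; the function names and the cutoff `jmax` are introduced here and are not part of the text.

```python
from math import isclose, sqrt


def level(j1, j2, w1, w2):
    """Eigenvalue of H = w1 H0 (x) I + w2 I (x) H0 on h_{j1} (x) h_{j2}, with hbar = 1."""
    return w1 * (j1 + 0.5) + w2 * (j2 + 0.5)


def frequencies_match_lattice(w1, w2, jmax=4):
    """Check that every difference of levels with indices <= jmax equals w1*n1 + w2*n2."""
    indices = [(j1, j2) for j1 in range(jmax + 1) for j2 in range(jmax + 1)]
    for a1, a2 in indices:
        for b1, b2 in indices:
            diff = level(a1, a2, w1, w2) - level(b1, b2, w1, w2)
            lattice_point = w1 * (a1 - b1) + w2 * (a2 - b2)
            if not isclose(diff, lattice_point, abs_tol=1e-12):
                return False
    return True


def reachable_offsets(jmax=4):
    """Set of integer pairs (n1, n2) realised as index differences below the cutoff."""
    indices = [(j1, j2) for j1 in range(jmax + 1) for j2 in range(jmax + 1)]
    return {(a1 - b1, a2 - b2) for a1, a2 in indices for b1, b2 in indices}


if __name__ == "__main__":
    print(frequencies_match_lattice(1.0, sqrt(2.0)))
    print(len(reachable_offsets(4)))  # number of lattice points hit below the cutoff
```

With the cutoff removed, every pair $(n_1, n_2) \in \mathbb{Z}^2$ is reached, so the zero-point energy drops out and only the lattice generated by $\omega_1, \omega_2$ remains.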
Let us remind that the numbers .λi −λj are the frequencies of the quantum evolution. Looking at eigenfunctions of .V2π ω1 on torus, we get that they are of the form ω2 ω in2π ω1
2. n ∈ Z with eigenvalues .e Obviously, .P2π ω1 is the Poincaré mapping of the linear ergodic flow .t on the
1 einθ , .√ 2π
ω2
torus .T2 associated with the Poincaré section .θ2 = 0. Recalling that .AN was constructed out of the eigenfunctions .ein1 θ1 , one just proved the following result. Theorem 10 The set of frequencies of the two dimensional harmonic oscillator are determined by the spectrum of the Poincaré mapping of the classical flow.
6.2 The Space of Frequencies and the Noncommutative Torus

The operator $P_{\frac{\omega_1}{\omega_2}}$ which appeared in the last section is well known in noncommutative geometry: it is one of the two generators of the so-called noncommutative torus, namely the algebra generated by U, V satisfying
\[ VU = e^{i\frac{\omega_1}{\omega_2}}\, UV. \tag{76} \]
Indeed, taking $U = \times e^{i\theta}$ on $L^2(S^1)$, one sees easily that, taking $V := P_{\frac{\omega_1}{\omega_2}}$, the couple $(U, V)$ satisfies (76). One can now reformulate Theorem 10 in a more synthetic way.

Theorem 11 The (algebra of the) quotient by the flow of the harmonic oscillator determines the set of frequencies.
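The commutation relation of the noncommutative torus can be verified pointwise on functions of the circle. In the sketch below, U is multiplication by $e^{i\theta}$ and V is taken to be the translation $f(\theta) \mapsto f(\theta + \alpha)$ by a generic angle $\alpha$; this reading of the operator P as a rotation of the circle, and all function names, are assumptions made for illustration. With this choice the relation reads $VU = e^{i\alpha}UV$.

```python
import cmath
import math


def U(f):
    """Multiplication operator: (U f)(theta) = exp(i theta) f(theta)."""
    return lambda theta: cmath.exp(1j * theta) * f(theta)


def V(f, alpha):
    """Rotation operator: (V f)(theta) = f(theta + alpha)."""
    return lambda theta: f(theta + alpha)


def commutation_defect(f, alpha, samples):
    """Largest value of |(V U f)(theta) - exp(i alpha) (U V f)(theta)| over the samples."""
    vu = V(U(f), alpha)
    uv = U(V(f, alpha))
    phase = cmath.exp(1j * alpha)
    return max(abs(vu(theta) - phase * uv(theta)) for theta in samples)


if __name__ == "__main__":
    # A 2*pi-periodic test function, chosen arbitrarily.
    f = lambda theta: cmath.exp(2j * theta) + 0.5 * math.cos(3 * theta)
    alpha = 2 * math.pi * math.sqrt(2)  # an irrational multiple of 2*pi
    print(commutation_defect(f, alpha, [0.0, 0.3, 1.7, 4.0]))  # numerically zero
```

The check works for any $\alpha$; the arithmetic nature of $\alpha/2\pi$ (rational or not) matters only for the structure of the algebra generated by U and V, not for the relation itself.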
6.3 Extensions to Chaotic Systems From the beginning of this section, eigenvalues and differences of eigenvalues appear the same way, due to the fact that the spectrum of the harmonic oscillator is linear. This is not the case any more for chaotic systems, and the construction which follows will emphasize the role of frequencies with respect to one of the eigenvalues. Let us note that this is far from being non-physical. On the contrary. First because the Hamiltonian is always defined up to a constant unessential to the dynamics. Second, frequencies are the quantities that one observes. Finally constructing quasimodes associated with a given approximate eigenvalue is somehow ambiguous when the underlying classical flow is ergodic as this property forbids a clear localization of eigenfunctions in phase space, except the expected de-localization on the whole energy shell predicted by quantum ergodicity. Quasimodes associated with given frequencies are superposition of eigenfunctions, leading to the freedom of relocalization by superposition of unlocalized functions. This is what we are going to see now. The advantage of the statement of Theorem 11 is that it relies on the quantum frequencies of the system to the very only flow of the classical underlying system. But one has to say and one notices that this link is subtle: it relies the invariants of the quantum flow, namely the frequencies—a very natural quantum object due to the discrete quantum structure of the quantum spectra—which describes perfectly the quotient of the Hilbert space by the quantum dynamics, to what replaces this concept of frequency—which does not exist in classical mechanics outside the very peculiar situation of integrable systems—the quotient of the space by the flow, endowed, one has to say, with its very noncommutative algebra of structure. Classical frequencies do not exist generally in classical setting, but invariants of the classical dynamics do. 
It is therefore natural to look for a concept of classical frequency through the algebra of the foliation by the classical dynamics, and to look for a theorem similar to Theorem 11 that holds for quantum systems whose underlying classical dynamics is chaotic. What are the invariants of classical dynamical systems? We have seen in Sect. 5 how stable and unstable manifolds enter the game of long time semiclassical evolution. More precisely, we have seen that unstable manifolds are suitable for long positive time evolution, and stable ones for negative times. This leads to the idea of considering intersections of stable and unstable manifolds as good candidates for handling both forward and backward evolution times. These intersections are very natural objects in dynamical systems theory: they are precisely the trajectories themselves. But one may consider all the intersections of two given invariant manifolds, that is, the set of homoclinic orbits associated with a given trajectory.
6.3.1 Homoclinic Foliations Versus Invariant Tori Fibration
Homoclinic orbits of a given trajectory $\gamma$ are themselves trajectories $\{\Phi^t(z_{hom}),\ t\in\mathbb{R}\}$ which asymptotically tend to $\gamma$ when $t\to\pm\infty$:

$$d(\Phi^t(z_{hom}),\gamma)\to 0\quad\text{when } t\to\pm\infty.$$
An invariant non-resonant torus of an integrable classical system contains trajectories that are dense in it. In this spirit, all these trajectories get closer and closer to each other at some moments when $t\to\pm\infty$. It is therefore tempting to think of the set of homoclinic trajectories of a given one (i.e. the set of all intersections of two stable and unstable manifolds) on the one side, and of invariant non-resonant tori on the other side, as the same object with two incarnations, in chaotic and integrable situations. There are both similarities and big differences in the geometries of the two situations, but let us claim already that the density of the trajectories on a non-resonant torus finds an echo in the density of homoclinic curves around a given trajectory as $T\to\pm\infty$. This is particularly visible in the case of a closed periodic orbit around which homoclinic orbits accumulate. We will get back to this later. On the other side, the geometries differ essentially in two respects. First, invariant tori provide a foliation of the phase space which is actually a fibration by Lagrangian manifolds. Second, in the homoclinic case, one cannot even talk about a foliation stricto sensu, since the set of homoclinic orbits has an intrinsically discrete character: they are isolated trajectories, all of them have the homoclinic flavour, but a “leaf” is the infinite (countable) union of one dimensional manifolds (trajectories). Nevertheless, there is a big temptation to consider them as leaves, and to consider a noncommutative algebra associated with this very singular object. In fact the basic axiomatic structure of noncommutative geometry overcomes this extra non-connectedness of the “leaves”: indeed the concept of groupoid, in its very algebraic structure, does not care about connectedness.
6.3.2 The Construction of the Noncommutative Structure
Let us consider the stable and unstable foliations $\{\Lambda^s\}$ and $\{\Lambda^u\}$ of the classical flow. To any pair $(\Lambda_1^s,\Lambda_1^u)$ we associate the set $H(\Lambda_1^s,\Lambda_1^u)$ of intersections of $\Lambda_1^s$ and $\Lambda_1^u$: it consists of a (countable) union of trajectories, all homoclinic to each other. Once again this is not strictly speaking a foliation (more precisely, these intersections are not leaves), but one can associate with it the following groupoid:

$$H(\Lambda^s,\Lambda^u) := \{\gamma,\ s(\gamma), r(\gamma)\in H(\Lambda_1^s,\Lambda_1^u)\ \text{for some pair } (\Lambda_1^s,\Lambda_1^u)\}. \tag{77}$$

The set of functions on $H(\Lambda^s,\Lambda^u)$ can be seen as the “crossed product” of the algebra of functions on the phase space by the groupoid $H(\Lambda^s,\Lambda^u)$.
We define the homoclinic algebra associated with the flow as usual, that is, as the completion of the algebra of regular functions on the groupoid $H(\Lambda^s,\Lambda^u)$. We will come back to this below.
6.3.3 Bohr-Sommerfeld Conditions I
According to Sect. 5, the forward (resp. backward) propagation in time is obtained through the action of a symbol on the stable (resp. unstable) manifold. One therefore expects that a propagation valid for both forward and backward times should involve the two propagations on the same footing. This implies that, between two points $z, z'\in H(\Lambda_1^s,\Lambda_1^u)$, and for any two paths $z_s\in\Lambda^s$, $z_u\in\Lambda^u$,

$$e^{\frac{i}{\hbar}\int_z^{z'}\bar z_s\,dz_s} = e^{-\frac{i}{\hbar}\int_z^{z'}\bar z_u\,dz_u},$$

i.e.

$$\oint \bar z\,dz = 2\pi k\hbar,\quad k\in\mathbb{Z}. \tag{78}$$
This formula provides a first set of Bohr-Sommerfeld conditions that the homoclinic torus must satisfy. These conditions are independent of the path along which one computes the integral, precisely because of the Lagrangian property of the invariant manifolds. It is a set of conditions because, if we are in dimension greater than 2, that is with a phase space of dimension $2d$, $d>2$, the invariant manifolds, being Lagrangian, have dimension $d$ and there are $d-1$ independent cycles that connect two homoclinic trajectories. Therefore, there is one “missing” Bohr-Sommerfeld condition, and it will be provided by the quantization of the Poincaré mapping that will be defined in the next section.
6.3.4 The “Poincaré” Section
In this section, we present a construction of what should be the Poincaré section in the case of chaotic systems. The main difference with the integrable case will be that this section will be in a certain sense infinitesimal around the trajectory. In other words, there will be a kind of germ of Poincaré section. But the traces of the homoclinic orbits in this infinitesimal neighbourhood will be dense, as the trajectories on the integrable torus are. Therefore, a concept of quotient by the Poincaré map will survive in the chaotic situation in a noncommutative form very close to the integrable one.
We have seen in Sect. 6.2 that the spectrum of frequencies of the non-resonant harmonic oscillator is determined by the reduced algebra of the quotient by the flow associated with a Poincaré section. This remark, added to the Bohr-Sommerfeld considerations of the preceding section, leads to the following considerations. For the rest of this section, we place ourselves in the two dimensional situation .d = 2. Let .z ∈ γ and let ., be the flows (supposed to exist) defined in Sect. 6.2 such that . μ (z) ∈ s (γ ), ν (z) ∈ u (γ ). If for some .μ± ∈ R± , . μ± (z) belongs to an homoclinic curve of .γ then, for some .ν± , t± ,
μ± (z) = ν± (t± (z)).
.
Let us suppose for the moment that .t± = 0, one can easily show that this is the case when .γ is periodic of period T (in this case .t± can be taken as .t± = kT , k ∈ Z, i.e. t . ± (z) = z). Suppose also that there exists T such that
−T ( μ± (z)) = μ± (z) = ν± (z).
.
Again this hypothesis is satisfied when .γ is periodic of period T with μ ± = e−T μ± , ν± = eT ν± .
.
Moreover, a simple computation shows that this pseudo-period is also a pseudoperiod for any other homoclinic intersection. Therefore, T is associated with the full homoclinic “leave”. Note that we will have also T (ν± (z) = e
.
−T ν ±
(z).
We will define the Poincaré section of .γ as the set of points of the form i −nT i −T i
e μ± (z) = e ν± (z), n ∈ Z for any homoclinic point .zi = μ± (z) = i ν± (z), n ∈ Z of z. By analogy with Sect. 6.1 we will denote .e−T := ωT and call .PωT the operator of translation by .ωT on coordinates .μ± , ν± .
.
Definition 5 Let .z ∈ H( 1s , 1u ), the Poincaré section of .H( 1s , 1u ) at z is - −nT h −T h Pz (H( 1s , 1u )) := e μ± (zh ) = e ν± (zh ), n ∈ Z,
.
. h h zh = μ± (z) = ν± (z) ∈ H( 1s , 1u ) , and the Poincaré mapping at z is Pz := PωT |Pz .
.
Of course Pz (H( 1s , 1u )) ⊂ Px := t (zh ), e−t (zh ), t ∈ R+ ,
.
. h h zh = μ± (z) = ν± (z) ∈ H( 1s , 1u ) . We define also Pγ (H( 1s , 1u )) := ∪ Pz (H( 1s , 1u )).
.
z∈γ
Note that .
∪
γ ∈H(H( 1s , 1u ))
Pγ (H( 1s , 1u )) = H( 1s , 1u ).
Taking now the coordinates .μ, ν ∈ R for the four parts of the stable and unstable manifold between z and . μ± (z) we get obviously that the eigenvectors of .PωT have the form .γλ (μ± ) = μiλ ±: PωT γλ = eiωT λ γλ .
.
On the unstable manifold the “continuation” of $\gamma_\lambda$ will be $\varphi_\lambda(\nu_\pm)=e^{-i\lambda\nu_\pm}$. Note that, defined that way, $P_z$ is not a manifold: in particular it contains a self-crossing, and it is not obvious what a (good) system of coordinates on $P_z$ should be. We see that there are four systems of coordinates on $P_z$ around the point $z$, which hence appears as singular in the sense that it is not clear what “regular” functions on $P_z$ are. This is a (the) main difference with the case of the Poincaré section $S^1$ of the (regular) torus for the harmonic oscillator, as seen in Sects. 6.1 and 6.2. Nothing in the classical paradigm provides a regularization of this singularity. But (once again) quantum mechanics suppresses this singularity, just by putting a new structure on the phase space: a polarization. Without entering the formalism of geometric quantization initiated by Souriau [58], let us say that a choice of polarization means that coordinates are somehow linked. The most standard polarization is the so-called Schrödinger one: it consists in choosing the pair formed by the configuration coordinate $x$ and the coordinate $p$ on the cotangent fibre, and in linking them by the Fourier transform. In our present situation, another (but close) choice will be relevant: the pair of (germs of) the stable and unstable manifolds in phase space, which, being Lagrangian and transverse, are symplectomorphically equivalent to the $(x,p)$ axes. In the case of the Poincaré section for the torus, a natural system of coordinates is generated by the exponential functions $e^{inx}$, $x\in[0,2\pi)$, which are also one of the two generators of the torus algebra. We will see in the next section the equivalent of this in the homoclinic setting.
6.3.5 Bohr-Sommerfeld Rules II
In this section, we will give a groupoid construction of the homoclinic torus and construct on it the action of the dynamical flow. This will show what a “function of the noncommutative torus” is, leading to Bohr-Sommerfeld conditions and end up at the construction of a “frequency mixed quasimode”, mixed in the sense of being a density matrix.
Strategy Let us suppose first that the invariant manifolds are (locally near z) exactly the q and p spaces of the phase space. We will get back to the general situation at the end of this section. More precisely, we fix z = (0, 0)
.
u
± z := { (z), ±u s ± z := { (z), ±s
≥ 0, |u| ≤ } ⊂ {(q1 , 0, 0, 0, .....), ±q1 ≥ 0} ≥ 0, |s| ≤ } ⊂ {(0, p1 , 0, 0, .....), ±p1 ≥ 0}.
− + Remembering that . + z and .z are connected in .Pz , and the same for . − and .− , we want to define a “regular” function on .Pz as a function which is the symbol of ± Lagrangian distributions associated with . ± z , z . What does this mean? π − + − One passes from .+ z ∪ z to . z ∪ z by a rotation in phase space of angle . 4 : .(q, p) → (p, −q). This corresponds to a Fourier transform. Therefore, a good function on .Pz should satisfy the following property: the − restriction of the function on . + z ∪ z must be the Fourier transform of the + − restriction of the function on .z ∪ z . As we have seen, the Poincaré mapping acts by dilation by .e±T of the variables .μ, ν so that eigenvector should be complex powers of .μ, ν, namely of the form .μiλ , ν −iλ (after extraction of the Jacobian term 1 1 .μ 2 , ν 2 as we shall see below). Such Fourier transform can be found in [21, p. 360] (or recomputed after some easy complex dilations) and are given by the following formulas.
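The Continuity Condition
For $\lambda\notin\pm\mathbb{N}$,

$$\int_{-\infty}^{+\infty} x_+^{\lambda}\,e^{i\frac{x\sigma}{\hbar}}\,\frac{dx}{\sqrt{2\pi\hbar}} = \frac{\hbar^{\lambda+1}}{\sqrt{2\pi\hbar}}\,i\,\Gamma(\lambda+1)\left[e^{i\lambda\frac{\pi}{2}}\sigma_+^{-\lambda-1}-e^{-i\lambda\frac{\pi}{2}}\sigma_-^{-\lambda-1}\right] \tag{79}$$

$$\int_{-\infty}^{+\infty} x_-^{\lambda}\,e^{i\frac{x\sigma}{\hbar}}\,\frac{dx}{\sqrt{2\pi\hbar}} = \frac{\hbar^{\lambda+1}}{\sqrt{2\pi\hbar}}\,i\,\Gamma(\lambda+1)\left[e^{i\lambda\frac{\pi}{2}}\sigma_-^{-\lambda-1}-e^{-i\lambda\frac{\pi}{2}}\sigma_+^{-\lambda-1}\right]. \tag{80}$$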
We have four branches $\mu_\pm$, $\nu_\pm$ which have to be connected so that they form a system of coordinates for the “torus” intersection of the stable and unstable manifolds. This gives a set of values $\lambda_n$, $n=0,1,\dots$ by the following reasoning. We will have four functions, with $\lambda=i\omega$,

$$c_+^s\,\mu_+^{i\omega-\frac12},\quad c_-^s\,\mu_-^{i\omega-\frac12},\quad c_+^u\,\nu_+^{-i\omega-\frac12}\quad\text{and}\quad c_-^u\,\nu_-^{-i\omega-\frac12}.$$

But by Definition 5 of the Poincaré section, the coefficient $c_+^u$ is related to $c_+^s$, and the same holds for $c_-^u$ and $c_-^s$: there exist $f_\omega^\pm$ such that

$$c_+^u = f_\omega^+\,c_+^s,\qquad c_-^u = f_\omega^-\,c_-^s.$$
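The “continuity” equation will then read:

$$\int_{-\infty}^{+\infty}\left(c_+^s\,\mu_+^{i\omega-\frac12}+c_-^s\,\mu_-^{i\omega-\frac12}\right)e^{i\frac{\mu\nu}{\hbar}}\,\frac{d\mu}{\sqrt{2\pi\hbar}} = c_+^u\,\nu_+^{-i\omega-\frac12}+c_-^u\,\nu_-^{-i\omega-\frac12} = f_\omega^+c_+^s\,\nu_+^{-i\omega-\frac12}+f_\omega^-c_-^s\,\nu_-^{-i\omega-\frac12}.$$

Using (79) and (80), we get

$$\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^+c_+^s = i\,c_+^s\,\Gamma(i\omega+\tfrac12)\,e^{-\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}} - i\,c_-^s\,\Gamma(i\omega+\tfrac12)\,e^{\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}}$$

$$\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^-c_-^s = -i\,c_+^s\,\Gamma(i\omega+\tfrac12)\,e^{\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}} + i\,c_-^s\,\Gamma(i\omega+\tfrac12)\,e^{-\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}}$$

i.e.

$$U_\omega\begin{pmatrix}c_+^s\\ c_-^s\end{pmatrix}=0 \tag{81}$$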
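with

$$U_\omega = \begin{pmatrix} i\,\Gamma(i\omega+\tfrac12)e^{-\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}}-\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^+ & -i\,\Gamma(i\omega+\tfrac12)e^{\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}}\\[4pt] -i\,\Gamma(i\omega+\tfrac12)e^{\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}} & i\,\Gamma(i\omega+\tfrac12)e^{-\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}}-\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^- \end{pmatrix}$$

$$= \begin{pmatrix} \Gamma(i\omega+\tfrac12)e^{-\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}}-\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^+ & \Gamma(i\omega+\tfrac12)e^{\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}}\\[4pt] \Gamma(i\omega+\tfrac12)e^{\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}} & \Gamma(i\omega+\tfrac12)e^{-\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}}-\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^- \end{pmatrix}. \tag{82}$$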
The Transmission Coefficients
Therefore the condition for (81) to have a non-trivial solution is that

$$\det U_{\omega_n}=0.$$
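We have now reached the point where the two numbers $f_\omega^\pm$ must be determined geometrically. As in Definition 5, let $z^h$ be a homoclinic point of $z$, with coordinates $\mu_\pm^h$ along the stable manifold and $\nu_\pm^h$ along the unstable manifold. We impose that (remember that $\eta$ is a symplectic potential: $d\eta=\frac{dz\wedge d\bar z}{2i}$)

$$e^{\frac{i}{\hbar}\int_z^{z^h}\eta_s}\,c_\pm^s\,(\mu_\pm^h)^{i\omega} = e^{\frac{i}{\hbar}\int_z^{z^h}\eta_u}\,c_\pm^u\,(\nu_\pm^h)^{-i\omega},$$

i.e.

$$c_\pm^u = e^{\frac{i}{\hbar}\left(\int_z^{z^h}\eta_s+\int_{z^h}^{z}\eta_u\right)}(\mu_\pm^h\nu_\pm^h)^{i\omega}\,c_\pm^s := e^{\mp\frac{i}{\hbar}\oint\eta_\pm^h}(\mu_\pm^h\nu_\pm^h)^{i\omega}\,c_\pm^s,$$

where $\eta_s$, $\eta_u$ are the symplectic potentials taken over the stable and unstable manifolds, respectively. Therefore

$$f_\omega^\pm = e^{\mp\frac{i}{\hbar}\oint\eta_\pm^h}(\mu_\pm^h\nu_\pm^h)^{i\omega}. \tag{83}$$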
Note that the computation does not depend on the point .zh Indeed, if h
h
z1 = α± (z) = β± (z),
.
then, for some .n ∈ Z, t
h
z1h = nT zh = e μ± ...
.
¸ and so is . ηh .
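Bohr-Sommerfeld Conditions
Therefore the condition (81) becomes

$$\det\begin{pmatrix} \Gamma(i\omega+\tfrac12)e^{\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}} & \Gamma(i\omega+\tfrac12)e^{-\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}}-\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^+\\[4pt] \Gamma(i\omega+\tfrac12)e^{-\omega\frac{\pi}{2}}e^{i\frac{\pi}{4}}-\sqrt{2\pi}\,\hbar^{-i\omega}f_\omega^- & \Gamma(i\omega+\tfrac12)e^{\omega\frac{\pi}{2}}e^{-i\frac{\pi}{4}} \end{pmatrix}=0,$$

i.e.

$$0 = 2i\,\Gamma(i\omega_n+\tfrac12)^2\cosh(\pi\omega_n) - \Gamma(i\omega_n+\tfrac12)\,e^{-\omega_n\frac{\pi}{2}}e^{i\frac{\pi}{4}}\sqrt{2\pi}\,\hbar^{-i\omega_n}\left(e^{\frac{i}{\hbar}\oint\eta_-^h}(\mu_-^h\nu_-^h)^{i\omega_n}+e^{-\frac{i}{\hbar}\oint\eta_+^h}(\mu_+^h\nu_+^h)^{i\omega_n}\right) + 2\pi\,\hbar^{-2i\omega_n}\,e^{\frac{i}{\hbar}\left(\oint\eta_-^h-\oint\eta_+^h\right)}(\mu_+^h\mu_-^h\nu_+^h\nu_-^h)^{i\omega_n} \tag{84}$$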
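or

$$i\,\Gamma(i\omega_n+\tfrac12)^2\,\frac{\cosh(\pi\omega_n)}{\pi} - \Gamma(i\omega_n+\tfrac12)\,e^{-\omega_n\frac{\pi}{2}}e^{i\frac{\pi}{4}}\,\frac{\hbar^{-i\omega_n}}{\sqrt{2\pi}}\left(e^{\frac{i}{\hbar}\oint\eta_-^h}(\mu_-^h\nu_-^h)^{i\omega_n}+e^{-\frac{i}{\hbar}\oint\eta_+^h}(\mu_+^h\nu_+^h)^{i\omega_n}\right) + \hbar^{-2i\omega_n}\,e^{\frac{i}{\hbar}\left(\oint\eta_-^h-\oint\eta_+^h\right)}(\mu_+^h\mu_-^h\nu_+^h\nu_-^h)^{i\omega_n} = 0$$

or also

$$i\,\Gamma(i\omega_n+\tfrac12)^2\,\frac{\cosh(\pi\omega_n)}{\pi} - \frac{\Gamma(i\omega_n+\tfrac12)\,e^{-\omega_n\frac{\pi}{2}}e^{i\frac{\pi}{4}}}{\sqrt{2\pi}}\left(e^{\frac{i}{\hbar}\oint\eta_-^h}\left(\frac{\mu_-^h\nu_-^h}{\hbar}\right)^{i\omega_n}+e^{-\frac{i}{\hbar}\oint\eta_+^h}\left(\frac{\mu_+^h\nu_+^h}{\hbar}\right)^{i\omega_n}\right) + e^{\frac{i}{\hbar}\left(\oint\eta_-^h-\oint\eta_+^h\right)}\left(\frac{\mu_+^h\nu_+^h}{\hbar}\,\frac{\mu_-^h\nu_-^h}{\hbar}\right)^{i\omega_n} = 0.$$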
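The passage from $\det U_{\omega_n}=0$ to (84) is plain 2x2 algebra and can be checked numerically. The following is a minimal Python sketch, not part of the original construction: the function names `det_U` and `expanded` are hypothetical, the variable `G` stands in for $\Gamma(i\omega+\tfrac12)$ and is treated as an arbitrary complex number (only the algebra is tested), and `f_plus`, `f_minus` stand for the transmission coefficients $f_\omega^\pm$ of (83).

```python
import cmath
import math


def _c(omega, hbar):
    """Common factor sqrt(2*pi) * hbar**(-i*omega) appearing in (82)."""
    return math.sqrt(2 * math.pi) * cmath.exp(-1j * omega * math.log(hbar))


def det_U(G, omega, hbar, f_plus, f_minus):
    """Determinant of the second form of the matrix (82).

    G is a placeholder for Gamma(i*omega + 1/2) (assumption: any complex
    number works, since only the 2x2 algebra is being checked).
    """
    c = _c(omega, hbar)
    a = G * math.exp(-omega * math.pi / 2) * cmath.exp(1j * math.pi / 4)
    b = G * math.exp(omega * math.pi / 2) * cmath.exp(-1j * math.pi / 4)
    return (a - c * f_plus) * (a - c * f_minus) - b * b


def expanded(G, omega, hbar, f_plus, f_minus):
    """Right-hand side of (84), with f_plus and f_minus kept as inputs."""
    c = _c(omega, hbar)
    a = G * math.exp(-omega * math.pi / 2) * cmath.exp(1j * math.pi / 4)
    return (2j * G ** 2 * math.cosh(math.pi * omega)
            - a * c * (f_plus + f_minus)
            + c ** 2 * f_plus * f_minus)


if __name__ == "__main__":
    sample = (0.7 - 0.3j, 0.8, 0.05, cmath.exp(0.3j), cmath.exp(-1.1j))
    print(abs(det_U(*sample) - expanded(*sample)))
```

The key step is that the squares of the diagonal and off-diagonal entries differ by $2i\,\Gamma(i\omega+\tfrac12)^2\cosh(\pi\omega)$, which is the first term of (84).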
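In particular, under (BSCI),

$$i\,\Gamma(i\omega_n+\tfrac12)^2\,\frac{\cosh(\pi\omega_n)}{\pi} - \frac{\Gamma(i\omega_n+\tfrac12)\,e^{-\omega_n\frac{\pi}{2}}e^{i\frac{\pi}{4}}}{\sqrt{2\pi}}\left(\left(\frac{\mu_-^h\nu_-^h}{\hbar}\right)^{i\omega_n}+\left(\frac{\mu_+^h\nu_+^h}{\hbar}\right)^{i\omega_n}\right) + \left(\frac{\mu_+^h\nu_+^h}{\hbar}\,\frac{\mu_-^h\nu_-^h}{\hbar}\right)^{i\omega_n} = 0,$$

and under “symmetry” $\pm$, $\mu_+^h=\mu_-^h:=\mu^h$, $\nu_+^h=\nu_-^h:=\nu^h$,

$$i\,\Gamma(i\omega_n+\tfrac12)^2\,\frac{\cosh(\pi\omega_n)}{\pi} - 2\,\frac{\Gamma(i\omega_n+\tfrac12)}{\sqrt{2\pi}}\,e^{-\omega_n\frac{\pi}{2}}e^{i\frac{\pi}{4}}\left(\frac{\mu^h\nu^h}{\hbar}\right)^{i\omega_n} + \left(\frac{\mu^h\nu^h}{\hbar}\right)^{2i\omega_n} = 0,$$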
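which gives

$$\Gamma(i\omega_n+\tfrac12) = \pi\,\frac{-2\,\dfrac{e^{-\omega_n\frac{\pi}{2}}e^{i\frac{\pi}{4}}}{\sqrt{2\pi}} \pm\sqrt{i\,\dfrac{2}{\pi}\,e^{-2\omega_n\frac{\pi}{2}}-4i\,\dfrac{\cosh(\pi\omega_n)}{\pi}}}{2\cosh(\pi\omega_n)}\left(\frac{\mu^h\nu^h}{\hbar}\right)^{i\omega_n} = \frac{\sqrt{2\pi}}{2\cosh(\pi\omega_n)}\,e^{i\frac{\pi}{4}}\left(-e^{-\omega_n\frac{\pi}{2}}\pm i\,e^{+\omega_n\frac{\pi}{2}}\right)\left(\frac{\mu^h\nu^h}{\hbar}\right)^{i\omega_n}$$

and

$$\left|\frac{\sqrt{2\pi}}{2\cosh(\pi\omega_n)}\left(\frac{\mu^h\nu^h}{\hbar}\right)^{i\omega_n}\left(-e^{-\omega_n\frac{\pi}{2}}\pm i\,e^{+\omega_n\frac{\pi}{2}}\right)\right|^2 = \frac{2\pi\cdot 2\cosh(\pi\omega_n)}{\left(2\cosh(\pi\omega_n)\right)^2} = \frac{\pi}{\cosh(\pi\omega_n)} = \left|\Gamma(i\omega_n+\tfrac12)\right|^2$$

since, by the complement formula, we have

$$\left|\Gamma(i\omega_n+\tfrac12)\right|^2 = \Gamma(i\omega_n+\tfrac12)\,\Gamma(-i\omega_n+\tfrac12) = \frac{\pi}{\sin\left(\pi(i\omega_n+\tfrac12)\right)} = \frac{\pi}{\cosh(\pi\omega_n)}.$$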
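The modulus identity $|\Gamma(i\omega+\tfrac12)|^2=\pi/\cosh(\pi\omega)$ used here can be checked numerically. The sketch below is only an illustration under stated assumptions: the Python standard library has no complex Gamma function, so `gamma_complex` is a hand-written Lanczos approximation (with $g=7$ and the commonly quoted nine coefficients), restricted to $\mathrm{Re}\,z\ge\tfrac12$ so that the reflection formula itself is never used inside the evaluation. The function names are hypothetical.

```python
import cmath
import math

# Lanczos approximation parameters (g = 7, n = 9); widely used coefficients.
_G = 7
_COEF = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]


def gamma_complex(z):
    """Lanczos approximation of Gamma(z) for complex z with Re z >= 1/2."""
    z = complex(z)
    if z.real < 0.5:
        raise ValueError("this sketch only handles Re z >= 1/2")
    z -= 1
    series = _COEF[0]
    for k in range(1, len(_COEF)):
        series += _COEF[k] / (z + k)
    t = z + _G + 0.5
    return math.sqrt(2 * math.pi) * t ** (z + 0.5) * cmath.exp(-t) * series


def modulus_check(omega):
    """Return (|Gamma(1/2 + i*omega)|^2, pi / cosh(pi*omega))."""
    lhs = abs(gamma_complex(0.5 + 1j * omega)) ** 2
    rhs = math.pi / math.cosh(math.pi * omega)
    return lhs, rhs


if __name__ == "__main__":
    for omega in (0.0, 0.5, 1.0, 2.5):
        print(omega, *modulus_check(omega))
```

The two printed columns should agree to many digits for moderate values of $\omega$; for large $\omega$ both sides decay like $2\pi e^{-\pi\omega}$.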
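Therefore we get the (BSII) condition

$$\cos\left(\arg\Gamma(i\omega_n+\tfrac12)+\omega_n\log\frac{\mu^h\nu^h}{\hbar}+\frac{\pi}{4}\right) = -\frac{e^{-\omega_n\frac{\pi}{2}}}{\sqrt{2\cosh(\pi\omega_n)}} = -\frac{1}{\sqrt{1+e^{\pi\omega_n}}}.$$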
In the symmetric case without (BSI) we get .
h h cos arg (iωn + 12 ) + ωn log μ h¯ν + π4 i¸ ¸ / i π = cosh (π ωn ) e−ωn 2 e h¯ η+ + e h¯ η− +
¸ ¸ i eωn π e h¯ ( η− − η+ )
+ e−ωn π
i¸ ¸ i η+ η− h h ¯ ¯ −e e ,
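and in particular if $\oint\eta_+=\oint\eta_-:=\oint\eta$,

$$\cos\left(\arg\Gamma(i\omega_n+\tfrac12)+\frac{\oint\eta}{\hbar}+\omega_n\log\frac{\mu^h\nu^h}{\hbar}+\frac{\pi}{4}\right) = -\frac{1}{1+e^{\pi\omega_n}}.$$

In the fully general case: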
.
cos arg (iωn + 12 ) − ωn log h¯ + =
/
i¸ ¸ i π h iωn h iωn ) cosh (π ωn ) e−ωn 2 e h¯ η+ (μh+ ν+ ) + e h¯ η− (μh− ν−
+
π 4
¸ ¸ i h μh ν h )iωn +e−ωn π eωn π e h¯ ( η− − η+ ) (μh+ ν+ − −
e
i h¯
¸
η+
i
h )iωn −e h¯ (μh+ ν+
¸
η−
h )iωn (μh− ν−
,
or .
h h h μ− ν− − log h¯ + π4 cos arg (iωn + 21 ) + ωn 12 log μh+ ν+ i¸ ¸ / i π h −iωn h −iωn ) + e h¯ η− (μh+ ν+ ) = cosh (π ωn ) e−ωn 2 e h¯ η+ (μh− ν− +
¸ ¸ i eωn π e h¯ ( η− − η+ )
+ e−ωn π
e
i h¯
¸
η+
h )−iωn (μh− ν−
−e
i h¯
¸
η−
h )−iωn (μh+ ν+
(85)
This is quite a complicated expression, but note that, as $\omega_n\to\infty$,

$$\arg\Gamma(i\omega_n+\tfrac12)\sim\omega_n\log\omega_n-\omega_n+\frac{1}{24\omega_n}+O(\omega_n^{-2}). \tag{86}$$
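The asymptotic formula (86) can be compared with an independent evaluation of $\arg\Gamma(\tfrac12+i\omega)$. The sketch below assumes the classical series $\arg\Gamma(x+iy)=y\,\psi(x)+\sum_{n\ge0}\left(\frac{y}{x+n}-\arctan\frac{y}{x+n}\right)$ together with $\psi(\tfrac12)=-\gamma-2\log 2$, truncated after a fixed number of terms; the function names and the truncation length are illustrative choices, not part of the original text.

```python
import math

EULER_GAMMA = 0.5772156649015329
# psi(1/2) = Gamma'(1/2) / Gamma(1/2) = -gamma - 2 log 2
PSI_HALF = -EULER_GAMMA - 2.0 * math.log(2.0)


def arg_gamma_half(omega, terms=200_000):
    """Continuous argument of Gamma(1/2 + i*omega) from the truncated series.

    The neglected tail is roughly omega**3 / (6 * terms**2).
    """
    total = omega * PSI_HALF
    for n in range(terms):
        r = omega / (n + 0.5)
        total += r - math.atan(r)
    return total


def stirling_phase(omega):
    """Leading terms of (86): omega*log(omega) - omega + 1/(24*omega)."""
    return omega * math.log(omega) - omega + 1.0 / (24.0 * omega)


if __name__ == "__main__":
    for omega in (5.0, 10.0, 20.0):
        print(omega, arg_gamma_half(omega), stirling_phase(omega))
```

For small $\omega$ the same function behaves like $\omega\,\psi(\tfrac12)$, which is the slope $-\gamma-2\log 2$ that enters the low part of the spectrum.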
The Spectrum of Frequencies
The preceding discussion leads to the following conclusion:

$$\left\{\lambda_{\omega_n}+\frac{2\pi}{T}m,\ (n,m)\in\mathbb{Z}^2\right\}\subset\lim_{\hbar\to0}\left\{\frac{E_i-E_j}{\hbar},\ i,j\in\mathbb{N}\right\}. \tag{87}$$
One notes the similarity between the equations satisfied by the frequencies $\omega_n$ and the Bohr-Sommerfeld conditions obtained for one-dimensional systems near a separatrix [14, 17, 56]. The fact that equivalent formulas are valid for eigenvalues in the double well case and for differences of eigenvalues in our setting can be explained as follows: in the separatrix case the energy of the separatrix is fixed equal to 0, otherwise the formula would be valid for $E_{separatrix}-E_j$. In our case, the Hamiltonian is by no means fixed anywhere, as it should be, since the addition of a constant to the Hamiltonian does not modify the dynamics. A closer look at the quantization formulas (85), (86) shows that what matters is the germ of “eigenfunctions” of the Poincaré mapping near the trajectory, in contrast to the integrable case, which involves the whole Poincaré section. This is a trace of the fact that the density of homoclinic orbits is visible as their accumulation near the trajectory (and not through an ergodic property of the Poincaré map, as in the integrable case).
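Analysis of the “spectrum” as $\hbar\to0$
Let us look at the symmetric case without (BSI):

$$\cos\left(\arg\Gamma(i\omega_n+\tfrac12)+\frac{\oint\eta}{\hbar}+\omega_n\log\frac{\mu^h\nu^h}{\hbar}+\frac{\pi}{4}\right) = -\frac{1}{1+e^{\pi\omega_n}}. \tag{88}$$

• High part of the spectrum
Thanks to (86), (88) becomes, as $\omega_n\to\infty$,

$$\cos\left(\omega_n\log\omega_n-\omega_n+\frac{1}{24\omega_n}+\frac{\oint\eta}{\hbar}+\omega_n\log\frac{\mu^h\nu^h}{\hbar}+\frac{\pi}{4}+O(\omega_n^{-2})\right) = O(\omega_n^{-\infty})$$

$$\omega_n\log\omega_n-\omega_n+\frac{1}{24\omega_n}+\frac{\oint\eta}{\hbar}+\omega_n\log\frac{\mu^h\nu^h}{\hbar} = \frac{\pi}{4}+n\pi+O(\omega_n^{-2}),\quad n\in\mathbb{Z}$$

$$\omega_n\sim\frac{1}{\log\hbar}\left(\frac{\oint\eta}{\hbar}-\frac{\pi}{4}+n\pi\right),\quad n\in\mathbb{Z}.$$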
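• Low part of the spectrum
(88) becomes, for $\omega_n=o(1)$ as $\hbar\to0$, since $\Gamma(\tfrac12)=\sqrt{\pi}$ and $\Gamma'(\tfrac12)=-\sqrt{\pi}\,(\gamma+2\log2)$, where $\gamma$ is the Euler-Mascheroni constant, so that

$$\frac{d}{d\omega}\arg\Gamma(i\omega+\tfrac12)\Big|_{\omega=0} = \frac{1}{2i}\,\frac{\Gamma(-i\omega+\tfrac12)}{\Gamma(i\omega+\tfrac12)}\,\frac{d}{d\omega}\,\frac{\Gamma(i\omega+\tfrac12)}{\Gamma(-i\omega+\tfrac12)}\Big|_{\omega=0}$$

$$= \frac{1}{2i}\,\frac{i\,\Gamma'(i\omega+\tfrac12)\Gamma(-i\omega+\tfrac12)+i\,\Gamma'(-i\omega+\tfrac12)\Gamma(i\omega+\tfrac12)}{\Gamma(-i\omega+\tfrac12)^2}\,\frac{\Gamma(-i\omega+\tfrac12)}{\Gamma(i\omega+\tfrac12)}\Big|_{\omega=0} = \frac{\Gamma'(\tfrac12)}{\Gamma(\tfrac12)} = -\gamma-2\log2,$$

$$\cos\left(\sqrt{\pi}-(\gamma+2\log2)\,\omega_n+\frac{\oint\eta}{\hbar}+\omega_n\log\frac{\mu^h\nu^h}{\hbar}+\frac{\pi}{4}+O(\omega_n^2)\right) = -\frac12+\frac{\pi\omega_n}{4}+O(\omega_n^2)$$

$$\omega_n = \frac{1}{\log\hbar}\left(-\frac{3\pi}{4}-\sqrt{\pi}+\frac{\oint\eta}{\hbar}+2n\pi\right)+O\left(\frac{1}{\log^2\hbar}\right),\quad n\in\mathbb{Z}.$$

• Number of frequencies
An easy computation shows that

$$\#\{\omega_n,\ |\omega_n|\le1\}\sim\log\hbar.$$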
Let us remark, to finish this paragraph, that the spectrum of frequencies in the chaotic situation is denser than in the integrable case. This is due to the term $\log\hbar$ present in the last two formulas. This is true as soon as we consider the contribution of a single homoclinic orbit, and is a fortiori valid when all of them are considered together. On the other hand, a single homoclinic orbit might give the full spectrum of frequencies since, as we saw before, what matters is the germ in an infinitesimal neighbourhood of the trajectory. In this neighbourhood, all the homoclinic orbits are dense, just as, by ergodicity, the trajectories are dense in the full Poincaré section in the integrable case. Therefore, a homoclinic torus might be associated with a single (in fact, any) homoclinic orbit, like the regular torus in the case of the non-resonant harmonic oscillator.
Homoclinic Torus
We clearly saw that, as $\hbar\to0$, the spectrum of frequencies does not depend, at leading order, on the invariant quantity $\mu^h\nu^h$ associated with each homoclinic orbit. Moreover, $\mu^h\nu^h$ appears as a kind of Maslov index. Therefore, the limit structures, that is, the sets of allowed functions on the Poincaré section, are really associated with the set of intersections of the two invariant manifolds: one has to consider all the homoclinic orbits $\gamma_i^h$ such that

$$\frac{\oint_{\gamma_i^h}\eta}{\hbar}-\frac{\oint_{\gamma_j^h}\eta}{\hbar}+O\left(\frac{1}{\log\hbar}\right)\in2\pi\mathbb{Z}.$$

In other words, calling prequantized the tori associated with the same “space of functions”,
.
˛ {prequantized tori} = {homoclinic orbitsγih }\ ∼, γih ∼ γjh ⇔
γjh
.
˛ :=
γjh γih
γih
˛ ηs +
γih
γjh
η=
ηh + O( logh¯ h¯ ) ∈ 2π h¯ Z.
It is a polarization in the framework of geometric quantization [58]. Note that these tori are the equivalent to the tori surrounding periodic linearly stable trajectories, in the construction of quasimodes “à la Ralston” [54]. Note that in our case also, the tori are infinitesimal around the periodic trajectory.9
9 It would be interesting to look at our construction near an unstable periodic trajectory in the limit where the period diverges, and to see whether a limit of the frequencies given by $\omega_k/\log\hbar$ appears naturally.
General Geometrical Setting
In the last section, we considered the case where the stable and unstable manifolds are (tangent to) the q- and p-spaces. The general case of two invariant manifolds is obtained by combining a rotation and a dilation in phase space: for example, a rotation by $-\pi/4$ sends the generator of dilations $qp$ to $\frac{p^2-q^2}{2}$, and a dilation by $\alpha$ sends the latter to $\alpha^{-2}(p^2-\alpha^2q^2)$, whose invariant manifolds are the two lines $p=\pm\alpha q$. The quantum action of any of these linear symplectic mappings is a metaplectic operator $M$, and $U_\omega$ becomes
−1 M 0 M 0 Uω = Uω . 0 M 0 M −1
So that the matrix, and therefore the discussion of the preceding section, does not depend on the geometrical setting.
On the Spacing of Energy Levels
Let us come back to (87), which expresses (some) differences of eigenvalues divided by $\hbar$ as $\lambda_{\omega_n}+\frac{2\pi}{T}m$, $n,m\in\mathbb{Z}$:

$$\frac{E_i-E_j}{\hbar}\sim\lambda_{\omega_n}+\frac{2\pi}{T}m.$$

If we order the spectrum $\{E_i\}$ by increasing order in $i$, we find that $\frac{E_{i+1}-E_i}{\hbar}$ should be the smallest of all the numbers $\frac{E_j-E_i}{\hbar}$, $j\in\mathbb{Z}$, keeping $i$ fixed, so we are looking for small values of $\lambda_{\omega_n}+\frac{2\pi}{T}m$, which can appear only when $\omega_n$ diverges. Therefore we look at the high part of the spectrum of frequencies, which leads us to consider the quantity

$$\frac{1}{\log\hbar}\left(\frac{\oint\eta}{\hbar}-\frac{\pi}{4}+n\pi\right)+\frac{2\pi}{T}m,\quad n,m\to\pm\infty,$$

and to look at how these numbers accumulate near $-\frac{\pi}{4}$ as $n,m\to\pm\infty$.

Conclusion
We see that, in contrast with the integrable case presented in Sects. 6.1 and 6.2, the mean spacing between two “eigenfrequencies” is no longer of order 1, but of order $\frac{1}{\log\hbar}$. This is somewhat in accordance with the estimates given by the trace formula around a critical point of the Hamiltonian, as shown in [10]: in our case in this paper,
there is no critical point of the Hamiltonian, but the dynamics of the Poincaré mapping resembles the dynamics near a critical point. The trace of the fact that a (quantized) mapping, rather than a flow, is concerned is visible in the fact that our prediction concerns frequencies and not eigenvalues: no value of the Hamiltonian is fixed here in advance. Once again, our goal is not to find eigenvectors, a task which seems to escape semiclassical research, but to look at superpositions of them that are well localized in noncommutative “phase space”. We will say more on this in the next section, but a few remarks are in order to conclude this paragraph. Up to now we have treated only the case where the stable and unstable manifolds are orthogonal, i.e. project on the canonical system of coordinates on $T^*\mathbb{R}^d$. The general case can be pulled back to this case by a symplectic change of variables, leading in fact to an inessential change in the quantization formulas, as anticipated earlier. The construction strongly emphasizes the technical importance of looking at frequencies and not at eigenvalues. The physical interest of this change of paradigm was already mentioned earlier. The logarithmic factor present in all quantization formulas shows clearly that the frequency spectrum, though fully discrete (because the spectrum of eigenvalues is so), tries to mimic, as $\hbar\to0$, a continuous one (and this is even more pronounced if one considers all homoclinic orbits together). This remark is reminiscent of the quantum dynamics of Hamiltonians with continuous spectra, which leads to diffusion, in contrast to the discrete case, which leads to quasiperiodic bounded evolution. This is certainly a source of what one can call, in this framework, quantum chaos.
6.3.6 Creation, Annihilation, and All That
In the preceding section, we have “guessed” by a geometrical analogy with the case of the resonant harmonic oscillator what should be frequencies quantization formula. Now we need to construct a quasimode corresponding to this guess. We will look at such a quasimode by using an ansatz given by formula (65) in Sect. 5.3, namely we look at an operator of the form ˆ A(β) =
.
ˆ
h−1 (I )
i
dz e h¯
dz
´ z z
η
β(z , z)|ϕz ϕz |.
(89)
z
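Requiring that $A(\beta)$ be an “eigenvector” of the conjugation by the flow leads, by (66), to

$$e^{i\frac{t(\lambda_i-\lambda_j)}{\hbar}}A(\beta) = e^{-i\frac{tH}{\hbar}}A(\beta)\,e^{i\frac{tH}{\hbar}} = A\left((\Phi^t)^{\otimes2}\#\beta\right),$$

that is

$$(\Phi^t)^{\otimes2}\#\beta = e^{i\frac{t(\lambda_i-\lambda_j)}{\hbar}}\beta. \tag{90}$$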
The eigenvectors of the mapping $(\Phi^t)^{\otimes2}\#$ will be exponential functions in the system of coordinates given by the horocyclic flow. Of course, the groupoid aspect expressed above leads to some discreteness of these coordinates, but the density of homoclinic trajectories gives an equivalent of continuous coordinates at the level of germs. In fact, what we need is that

$$\int_{P_z}dz'\,e^{\frac{i}{\hbar}\int_z^{z'}\eta}\,\beta(z',z)\,|\varphi_{z'}\rangle$$
is a Lagrangian distribution. This corresponds to have a “continuity condition” of the form ˆ .
i
−
+ z ∪ z
dz e h¯
´ z z
η
βλk (z , z)|ϕz =
ˆ − + z ∪z
i
dz e h¯
´ z z
η
βλk (z , z)|ϕz .
And this forces $\lambda_k$ to satisfy the Bohr-Sommerfeld condition (87) of the preceding section. This construction leads to “eigenvectors” (more precisely “quasidensity matrices”) of the quantum flow at times that are multiples of $T$, and one recovers, as usual, a total eigenvector of the quantum flow, i.e. one valid for any time $t$, by integrating over a period the conjugate by the flow of the preceding construction. We get Aλk =
.
1 Tγ
ˆ 0
Tγ
ˆ −
+ z ∪ z
i
dz e h¯
´ z z
η
βλk (z , t (z))|ϕz ϕt (z) |.
(91)
Let us remark the strong link between formulas (91) and (75), and therefore (74): the integral in (91) can be considered as an integral over two cycles of the homoclinic “torus”. Therefore it is the strict equivalent of (75) and, thanks to the first Bohr-Sommerfeld condition(s), can also be seen as an ergodic “average over the flow”. This remark will become more illuminating through a “groupoid” construction leading to a more geometrical construction of the operator $A_{\lambda_k}$ in (91). In the preceding paragraph, we associated with the homoclinic algebra (the quotient of the phase space by the flow) the classical limit of (part of) the set of frequencies of the quantum evolution. What are quantum frequencies? They are ($-i$ times) the eigenvalues of the Heisenberg-von Neumann derivation

$$\frac{i}{\hbar}[H,\cdot].$$

What are the “eigenvectors”? Obviously off-diagonal dyadic operators $|\psi_i\rangle\langle\psi_j|$, where $\psi_k$ is an eigenvector of eigenvalue $E_k$.
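The statement that the dyads $|\psi_i\rangle\langle\psi_j|$ are eigenvectors of $\frac{i}{\hbar}[H,\cdot]$ with frequency $(E_i-E_j)/\hbar$ can be illustrated on a finite-dimensional toy model. The sketch below is an assumption-laden illustration: it works directly in the eigenbasis of $H$ (so that $H$ is a diagonal matrix of hypothetical energies), and the function names are ours.

```python
def dyad(i, j, dim):
    """Matrix of |psi_i><psi_j| in the eigenbasis of H (a matrix unit)."""
    return [[1.0 if (r == i and c == j) else 0.0 for c in range(dim)]
            for r in range(dim)]


def derivation(energies, op, hbar):
    """Apply (i/hbar)[H, .] to op, with H = diag(energies).

    For diagonal H, (H op - op H)[r][c] = (E_r - E_c) * op[r][c].
    """
    dim = len(energies)
    return [[1j / hbar * (energies[r] - energies[c]) * op[r][c]
             for c in range(dim)] for r in range(dim)]


def frequency(energies, i, j, hbar):
    """Frequency attached to the dyad |psi_i><psi_j|: -i times the eigenvalue."""
    op = dyad(i, j, len(energies))
    out = derivation(energies, op, hbar)
    eigenvalue = out[i][j] / op[i][j]
    return (-1j * eigenvalue).real


if __name__ == "__main__":
    energies = [0.5, 1.5, 4.0]   # hypothetical spectrum
    hbar = 0.5
    print(frequency(energies, 2, 0, hbar))   # (4.0 - 0.5) / 0.5
```

Diagonal dyads ($i=j$) give frequency zero, which is why only the off-diagonal ones carry information about differences of eigenvalues.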
On the other hand, by the same trick as in Sect. 6.1, one can associate operators to the elements $\psi_{\lambda_n}$ of the homoclinic algebra reduced to the “Poincaré” section which are eigenvectors of the shift operator $P_{\omega_T}$ with eigenvalue $\lambda_n$, $n\in\mathbb{Z}$. These operators are somehow functions of the creation and annihilation operators, since they produce a shift in the spectrum $\psi_{E_i}\to\psi_{E_i+\lambda_n}$. Even in the case of the harmonic oscillator, the shift operators are very singular, since they read

$$\frac{a^\pm}{\sqrt{H_0}}\quad\text{with}\quad a^\pm = x\pm\frac{d}{dx},$$

which is singular at the origin, classically. In fact the formulation (73) with $n=\pm1$ removes this singularity, by a non-local formula, already for the rather spectrally stupid harmonic oscillator. Spectral shift operators are not “regular” observables, by far. They are semiclassical operators associated with noncommutative symbols, belonging to the noncommutative algebra of the homoclinic “foliation” of our system in the chaotic case, or to the noncommutative algebra of the invariant tori foliation in the integrable case. The construction of off-diagonal operators that we propose is directly inspired by the construction established in Sect. 5.3.
6.3.7 A New and Noncommutative Framework for Classical Dynamics
The construction above shows clearly that the limit $\hbar\to0$ of the quantum dynamics involves a much richer structure than the one provided by the classical paradigm, namely a nice geometrical space hosting a nice flow. Of course one has known since Poincaré that this nice flow on a nice space produces, when long evolutions are considered, structures that are not nice (at all!), such as complex foliations that Poincaré himself described as impossible to draw. And chaos appears precisely when one wants to merge these singular objects with the natural original space: chaos is the trace on the phase space of the homoclinic machinery. But what we see is that the true classical limit of quantum mechanics, namely the limit of the dynamics when $\hbar\to0$, involves these complex “homoclinic” structures (showing up out of the quantum dynamics by letting $\hbar\to0$) per se; that is, the limit dynamics lives on them, and not only on the traces we just mentioned. In this sense, the classical limit is as stable as quantum mechanics, on the condition of letting it live on the right structures, different from the ones of the “underlying classical dynamics” and keeping track of some residual noncommutativity. This limit “space” is not an absolute one like that of Newton (and Kant!); it depends intrinsically on the dynamics itself. The space does not host the dynamics any more; it is the dynamics which hosts the space.
The dynamics comes before the space as l’existence précède l’essence (once again).
6.4 Miscellaneous

Section 6 somehow synthesizes several aspects already presented in the preceding sections. Several remarks are in order. On the quantum side, an extension of the standard quantization procedure (in fact the Toeplitz one) is necessary in order to reach the limit $\hbar\to0$, or at least some uniformity for $\hbar\sim0$. This is true not only for situations involving singularities or chaotic behaviour; it is also the case for the non-resonant harmonic oscillator of everyday quantum mechanics. At the classical level, the extension of Toeplitz quantization lives on an extension of the regular phase space. The fibred extension from configuration space to phase space, which adds a fibre over each position point to take care of the momentum dimension, is not enough any more: one needs to foliate the fibre bundle by adding over each point an invariant locus of the dynamics (invariant torus, stable and unstable manifolds). This addition is also linked to a (generalization of a) foliated structure on the invariant locus itself (ergodic flow on tori, homoclinic flows on stable/unstable manifolds). These objects remain passive when the standard classical dynamics is involved, but become somehow active when the dynamics is inherited from a quantum one. And they do remain so if one now considers that the dynamics coming from quantum mechanics is the true classical one: a new classical paradigm where the underlying, a priori chaotic, behaviour is in a certain sense taken into account in the kinematics itself. Indeed, this microlocal phase space, microlocal near points (or trajectories) of the standard classical phase space in the same way that the phase space is itself microlocal near points of the configuration space, can be seen as made of objects which can equally be points or invariant manifolds.
And this is the key to the (strange) phenomenon of long time classical/quantum evolution when classical unpredictability and quantum indeterminism merge (see [48] and the next section of the present article): the long time classical evolution blows up the point into the unstable manifold (this is the unpredictability), exactly as the quantum evolution spreads a coherent state into a (WKB) Lagrangian state localized precisely on the unstable manifold, and then the indeterminism of the quantum paradigm, according to the Born interpretation of the wave function [11], reduces the points of the Lagrangian to an insignificant, purely probabilistic role. If a point, according to Piero della Francesca [18], is “that part which is not”, the quantum paradigm provides delocalized loci “which are not”. Assuming this, we believe that quantum mechanics becomes crystal clear. Let us finish this miscellaneous section by mentioning several directions for future extensions of the main ideas present in this homage to Alex Grossmann.
We have seen the importance of Poincaré mappings, both classical and quantum, in our construction. The Poincaré mapping is the reduction to a Poincaré section of a (continuous time) flow. Likewise, dynamical systems such as linear mappings on tori have a suspension which lets them become Poincaré mappings themselves. We believe that our construction furnishes an exact geometrical construction of the spectrum and eigenvectors of the quantization of Arnold’s cat, a quantum system both chaotic and explicitly solvable. We also think that noncommutative constructions close to the one in the present section of our paper could provide existence of a time operator, an existence excluded by the famous Pauli Theorem, which remarks that no self-adjoint operator T can satisfy the commutation relation $[H, T] = i\hbar$ when H is bounded from below$^{10}$ (or from above): the fact of considering frequencies instead of eigenvalues already cancels the difficulty of semi-boundedness. Finally, a major achievement in quantum physics of the last fifty years concerns the understanding of the phenomenon of statistics of level spacing: a “trace” of the chaotic (or not) behaviour of the underlying classical mechanics apparently exists in the probabilistic distribution of the spacing between eigenvalues. Our approach links frequencies, i.e. differences of eigenvalues, directly to the geometry of the underlying dynamics and the properties of its invariants.
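The formal computation behind the Pauli obstruction recalled above can be sketched as follows. This is only a heuristic derivation: all domain questions for the unbounded operators H and T are ignored, which is an assumption that any rigorous statement has to address.

```latex
% Formal consequence of [H,T] = i\hbar, by induction on n and summation of the exponential series:
\[
[H, T^{n}] = i\hbar\, n\, T^{n-1}
\quad\Longrightarrow\quad
[H, e^{-i\lambda T/\hbar}]
  = i\hbar \Bigl(-\frac{i\lambda}{\hbar}\Bigr) e^{-i\lambda T/\hbar}
  = \lambda\, e^{-i\lambda T/\hbar}.
\]
% Applied to an eigenvector H\psi_{\lambda_k} = \lambda_k \psi_{\lambda_k}:
\[
H\, e^{-i\lambda T/\hbar}\psi_{\lambda_k}
  = e^{-i\lambda T/\hbar} H \psi_{\lambda_k} + \lambda\, e^{-i\lambda T/\hbar}\psi_{\lambda_k}
  = (\lambda_k + \lambda)\, e^{-i\lambda T/\hbar}\psi_{\lambda_k},
  \qquad \lambda \in \mathbb{R}.
\]
```

Since $\lambda$ ranges over all of $\mathbb{R}$, the spectrum of H would formally fill the whole real line, which is incompatible with H being bounded from below or from above.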
6.5 Conclusion: The Quotient of the Phase Space by the Flow
In this section we have seen that there is, in the chaotic situation, an equivalent to the tori fibration for integrable systems: this is the (noncommutative) space of the “foliation” of phase space by the “homoclinic leaves”, which consists of the quotient of the set of trajectories by the equivalence relation “being homoclinic”. A leaf is the set of all trajectories homoclinic to a given point $(p, q)$ or trajectory $\gamma$. Each of these “Lagrangian” leaves lives in an extended phase space generated by the stable and unstable manifolds at $(p, q)$ or $\gamma$. Each point outside this homoclinic tangle has a hyperbolic behaviour when evolving in time. From another, but close, point of view, there is a discrete set of frequencies associated with each homoclinic orbit in the homoclinic “Lagrangian torus”. The smallest of these frequencies are not orbit dependent, and so are in that way associated with the whole homoclinic Lagrangian itself. The frequencies become dense as $\hbar \to 0$ (contrary to the integrable case), revealing a kind of continuous spectrum: this is the way chaotic hyperbolic dynamics appears.
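10 Let us recall that the formal argument is very simple: if $H\psi_{\lambda_k} = \lambda_k \psi_{\lambda_k}$ and $[H, T] = i\hbar$, then, for all $\lambda \in \mathbb{R}$, $H e^{-i\lambda T/\hbar}\psi_{\lambda_k} = (\lambda_k + \lambda)\, e^{-i\lambda T/\hbar}\psi_{\lambda_k}$, so that the spectrum of H would be the whole real line.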
7 Indeterminism Versus Unpredictability (How Quantum Indeterminism Would Have Shocked Laplace But Not Poincaré)
This last section is concerned with the indeterministic feature of quantum mechanics, a feature usually considered as characteristic of the quantum paradigm. It could seem a priori out of context, in a paper mainly concerned with space, to look at the indeterministic properties of the process of quantum measurement. Measurement is a part of the quantum dynamics and hence belongs more to time considerations. Besides, probabilistic features of quantum mechanics are often considered as disappearing at the classical limit because macroscopic sizes “force” quantum (intrinsic) probability laws to become, if not binary, at least classical (that is, just a witness of a lack of knowledge). On the contrary, we will show that quantum indeterminism merges, at the classical limit, with the concept of unpredictability through the delocalization of the wave functions precisely on the locus emphasized by the chaotic sensitivity to initial conditions: the unstable manifold. Our conclusion will be, to summarize, that if the paradigm of quantum mechanics differs from the classical one, it is from the Laplace one [31, p. 2]; and let us recall that the “démon de Laplace” was already destroyed at the end of the nineteenth century by the discovery of chaos in the works [53] of the “démon Poincaré”. Dont acte.
7.1 Measurement in Quantum Mechanics
Expressed in the most synthetic way, quantum dynamics “à la Copenhagen” is a succession (a combination) of two different types of evolution for elements of a Hilbert space $\mathcal{H}$:
– Heisenberg-Schrödinger flows associated with Hamiltonian self-adjoint operators H driven by the unitary operators $e^{i\frac{H}{\hbar}}$: namely, it reads infinitesimally
$$\psi(t) \to \psi(t + \delta t) = e^{i\frac{\delta t\, H}{\hbar}}\psi(t).$$
– Measurement processes associated with self-adjoint observable O driven by random mappings sending instantaneously the state $\psi(t)$ to any eigenvector $\psi_k$ of O:
$$\psi(t) \to \psi(t + \delta t) = \psi_k \ \text{ for some random } k,$$
the randomness being associated with the probability law $P_{\psi(t)}(k) = |\langle \psi(t) | \psi_k \rangle|^2$.
It is important to realize that nowadays (as is the case in the Copenhagen axiomatic) measurement is a true part of the quantum evolution: for example, one can now realize experimentally evolutions driven only by successive measurements, each one being “decided” by the result of the preceding one.$^{11}$ On the other side, it is also important to admit that measurement is not part of the first type of evolution: even in the macroscopic limit of a large number of involved particles, decoherence phenomena do not explain the randomness of a single measurement process [28]. It is an intrinsic probabilistic event, not a consequence of a lack of precise information on the system as in other domains of classical physics, such as statistical physics. It could therefore seem totally irreconcilable with the paradigm of classical mechanics, but we will see that in fact this indeterminism, at the classical limit, will in a certain sense fill a hole existing in the (comfortable but questionable) completeness of classical dynamics. Let us finish this section by saying that the loss of determinism while passing from classical to quantum is accomplished together with the creation of tools permitting the (in particular experimental) realization of the “measurement” axiom, namely the discreteness of spectra. Indeed, the probabilistic meaning of the distribution $P_{\psi^{in}}(k)$ is that, if one realizes the same measurement process, that is with the same observable O and the same initial vector $\psi^{in}$, a large (infinite) number of times, although the different results $\psi_k$, $k = 0, \dots, \infty$, will each time be intrinsically impossible to determine, the statistical distribution of values of k will be given by $P_{\psi^{in}}(k)$. Therefore the difficulty is to be sure that each measurement is effected on the same initial vector $\psi^{in}$.
We are touching here a notion of precision which is a subject very sensitive in classical mechanics due to the continuous feature of geometry of spaces, but which is much easier to handle in quantum mechanics just because any observable furnishes a discrete set of vectors: its own eigenvectors (see, e.g. [50] for further details).
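The statistical reading of the measurement axiom described in this section can be illustrated by a minimal simulation. The two-level observable, its eigenbasis, the initial state and the function names below are hypothetical choices made only for illustration; the sketch assumes that every repetition starts from exactly the same initial vector, which is precisely the difficulty pointed out above.

```python
import math
import random
from collections import Counter


def born_probabilities(psi, eigenvectors):
    """Return [|<psi_k|psi>|^2 for each k], for an orthonormal family psi_k."""
    probabilities = []
    for psi_k in eigenvectors:
        overlap = sum(complex(a).conjugate() * b for a, b in zip(psi_k, psi))
        probabilities.append(abs(overlap) ** 2)
    return probabilities


def empirical_frequencies(psi, eigenvectors, n_repeats, rng):
    """Repeat the measurement n_repeats times, always on the same initial psi."""
    weights = born_probabilities(psi, eigenvectors)
    outcomes = rng.choices(range(len(weights)), weights=weights, k=n_repeats)
    counts = Counter(outcomes)
    return [counts[k] / n_repeats for k in range(len(weights))]


# Hypothetical two-level example (not taken from the text).
s = 1.0 / math.sqrt(2.0)
eigenvectors = [(s, s), (s, -s)]  # orthonormal eigenbasis of the observable O
psi_in = (math.cos(math.pi / 8), math.sin(math.pi / 8))  # normalized initial state

exact = born_probabilities(psi_in, eigenvectors)
observed = empirical_frequencies(psi_in, eigenvectors, 100_000, random.Random(0))
print(exact, observed)
```

Each individual outcome k is random, but the observed frequencies approach the Born probabilities as the number of repetitions grows, which is the only sense in which the law is testable.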
7.2 Critique of the Deterministic Reason of Classical Mechanics
The truly fabulous change of paradigm that Poincaré introduced at the end of the nineteenth century, long before the birth of quantum mechanics, forces us to revise, in the light of quantum considerations such as those exposed in the last paragraph of the preceding Sect. 7.1, the notion of measurement in classical mechanics (and also in computer science, see [34–36]): can we really talk about determinism when the computations of Jacques Laskar show that a precision of a few meters in the knowledge of the position of the earth is needed to prove that it will stay in the solar system in a few million years [32]? The phenomenon of “sensitivity to initial conditions” for chaotic dynamical systems does not only destroy the Laplacian paradigm, which would suggest that systems close to integrable are still integrable; its modern formulation poses the question of space after infinite time evolution. Of course, in principle, time remains finite in classical dynamics$^{12}$ but the theory of chaos ontologically needs such an infinite extension of time. For example, ergodicity involves averages over infinite times. But chaoticity features carefully avoid defining the evolution at time $t = \infty$: one cooks up some quantities (averages etc.) defined at finite times, whose limits as $t \to \infty$ are considered. The reason for this careful prevention is very simple: the classical flow $\Phi^t$ at time $t = \infty$ does not exist. More precisely, it escapes the classical paradigm: this is the hole in completeness mentioned earlier. And this escape is a question of space: the flow at infinite time escapes from the classical paradigm because it does not preserve the geometrical structure of points in space: after infinite evolution, single points would “become” extended objects such as manifolds.
11 cf. Serge Haroche, Lectures at Collège de France.
7.3 Pushing Sensitivity to Initial Conditions to Its Extreme
The following diagram expresses the phenomenon we just described. Let us consider a hyperbolic fixed point y of an Anosov (say) dynamical system. Associated with it is an unstable manifold $\Lambda_y$, the set of all the points x “attracted” to y after a long backwards evolution:
$$\Lambda_y = \{x,\ \lim_{t\to+\infty} \Phi^{-t}(x) = y\}.$$
Therefore, one has the following series of facts, (naively) logically linked as follows.
$$\forall x \in \Lambda_y,\quad \Phi^{-t}(x) \xrightarrow[t\to+\infty]{} y$$
$$\forall x \in \Lambda_y,\ \Phi^{-\infty}(x) = y \iff \Phi^{-\infty}(\Lambda_y) = \{y\}$$
$$\Phi^{+\infty}(y) = x,\ \forall x \in \Lambda_y \iff \Phi^{+\infty}(y) = \Lambda_y$$
sensitivity to initial conditions $\sim$ unpredictability.
12 “Eternity is very long, especially at the end!”, P. Desproges.
Usually, the idealized version of the sensitivity to initial conditions expressed by “$\Phi^{+\infty}(y) = \Lambda_y$” is indeed considered as an expression of unpredictability: one cannot predict the position of the point y after an infinite time evolution, just as we cannot answer the question of the presence or not of the earth inside the solar system in a quasi-infinite time (a few million years) without knowing its position with a quasi-infinite precision (a few meters). But the last lines of the diagram could perfectly well be changed into
$$\forall x \in \Lambda_y,\ \Phi^{-\infty}(x) = y \iff \Phi^{-\infty}(\Lambda_y) = \{y\}$$
$$\Phi^{+\infty}(y) = x,\ \forall x \in \Lambda_y \iff \Phi^{+\infty}(y) = \Lambda_y$$
sensitivity to initial conditions $\sim$ indeterminism.
This is exactly what we mean by “we cannot answer the question of the presence or not of the earth in the solar system in a few million years”. We will see that this vision of unpredictability/indeterminism conforms to the heritage of quantum indeterminism at the classical limit $\hbar \to 0$.
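The trade-off between “quasi-infinite time” and “quasi-infinite precision” can be made concrete on the simplest hyperbolic model. The sketch below assumes a linear flow near a fixed point at the origin, expanding one direction and contracting the other at a rate `rate`; the flow, the rate and the numerical values are illustrative assumptions, not data about the solar system.

```python
import math


def hyperbolic_flow(q, p, t, rate=1.0):
    """Model flow near a hyperbolic fixed point: q expands, p contracts."""
    growth = math.exp(rate * t)
    return q * growth, p / growth


def prediction_horizon(initial_error, tolerance, rate=1.0):
    """Time at which an error along the unstable direction reaches `tolerance`."""
    return math.log(tolerance / initial_error) / rate


# Hypothetical numbers: halving the initial error only adds log(2)/rate to the horizon.
t_coarse = prediction_horizon(1e-6, 1.0)
t_fine = prediction_horizon(0.5e-6, 1.0)

# Forward in time a tiny offset on the unstable manifold becomes macroscopic;
# backward in time a macroscopic offset collapses onto the fixed point.
q_late, p_late = hyperbolic_flow(1e-6, 0.0, t_coarse)
q_back, p_back = hyperbolic_flow(1.0, 0.0, -t_coarse)
print(t_coarse, t_fine, q_late, q_back)
```

Because the horizon grows only logarithmically with the precision, predicting over a long time requires an exponentially sharp knowledge of the initial point; in the idealized infinite-time limit every point of the unstable manifold is reachable from the fixed point.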
7.4 Some Space for Merging the Two
The so-called probabilistic interpretation of the wave function $\psi$ by Born [11] tells that the square modulus $|\psi(x)|^2$ gives the probability of “finding” an electron at the place x. The link with $P_\psi$ introduced in Sect. 7.1 is the following. The observable “position” is the operator Q of multiplication by the variable:
$$Q : \psi \in \mathcal{H} = L^2(\mathbb{R}) \mapsto Q\psi \quad \text{with} \quad (Q\psi)(x) = x\psi(x).$$
The spectrum of Q is $\mathbb{R}$ and the (generalized) eigenvectors are $\delta_x$, $x \in \mathbb{R}$, the Dirac mass at x,$^{13}$ so that the result of the measurement of the position can be any real number x, and the probability of “finding” the value x by measuring the state $\psi$ is
$$|\langle \psi | \delta_x \rangle|^2 = \Big| \int_{\mathbb{R}} \delta_x(x')\psi(x')\,dx' \Big|^2 = |\psi(x)|^2 \qquad \text{QED}.$$
Let us consider now the Hamiltonian generator of the dilations (we denote $P = -i\hbar\frac{d}{dx}$)
$$H = \frac{QP + PQ}{2} := -i\hbar x\frac{d}{dx} - \frac{i\hbar}{2}.$$
13 Defined by $\int \delta_x(y)\varphi(y)\,dy = \varphi(x)$ for every (test) function $\varphi$.
Obviously
$$e^{i\frac{tH}{\hbar}}\psi(x) = e^{\frac{t}{2}}\psi(e^{t}x).$$
The underlying classical dynamics on $\mathbb{R}^2 = (T^*\mathbb{R})_{(q,p)}$ is generated by the Hamiltonian $h(q, p) = qp$, which leads to the flow
$$\Phi^{t}(q, p) = (e^{t}q, e^{-t}p),$$
so that we see immediately that
$$\forall t \in \mathbb{R},\ \Phi^{t}(0, 0) = (0, 0) \quad \text{and} \quad \lim_{t\to+\infty} \Phi^{-t}(q, 0) = (0, 0),\ \forall q \in \mathbb{R}.$$
Therefore the origin of the phase space $T^*\mathbb{R}$ is a fixed point, and the null section of $T^*\mathbb{R}$ is its unstable manifold $\Lambda_{(0,0)}$, a situation exactly comparable to the one in Sect. 7.2. We have seen in several places of this article that a natural way of associating to a point in phase space a vector in the quantum Hilbert space was obtained thanks to the Gaussian coherent state, as defined, for example, by (54): $\varphi_{(p,q)}(x) := (\pi\hbar)^{-\frac{1}{4}} e^{-\frac{(x-q)^2}{2\hbar}} e^{i\frac{px}{\hbar}}$. Pinned at the origin, this definition leads to
$$\psi(x) = \varphi_{(0,0)}(x) = (\pi\hbar)^{-\frac{1}{4}} e^{-\frac{x^2}{2\hbar}},$$
well localized at 0 as $\hbar \to 0$. We get
$$e^{-i\frac{tH}{\hbar}}\psi(x) = e^{-\frac{t}{2}}\psi(e^{-t}x) = e^{-\frac{t}{2}}(\pi\hbar)^{-\frac{1}{4}} e^{-\frac{e^{-2t}x^2}{2\hbar}},$$
so that, when $t = t_\hbar = -\frac{\log\hbar}{2}$,
$$e^{-i\frac{t_\hbar H}{\hbar}}\psi(x) = \pi^{-\frac{1}{4}} e^{-\frac{x^2}{2}},$$
which is fully delocalized in x on the real line which is, let us recall, $\Lambda_{(0,0)}$. In other words, $\{0\} \xrightarrow{-\infty} \Lambda_{(0,0)}$ or $\Phi^{-\infty}(0) = \Lambda_{(0,0)}$, exactly as in Sect. 7.3. Quantum indeterminism will now read: any point $x \in \Lambda_{(0,0)}$ can be obtained by measuring the position on the state $e^{-i\frac{t_\hbar H}{\hbar}}\psi$. This is a statement very close to the unpredictability/indeterminism one of the last section. Therefore, when $\hbar \to 0$, quantum indeterminism and classical unpredictability merge. Even more: the probabilities emerging in classical mechanics because of (or thanks to) unpredictability are fully inherited from quantum mechanics.
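The delocalization time can be checked numerically. The sketch below evaluates the dilated coherent state $e^{-t/2}\psi(e^{-t}x)$ at $t_\hbar = -\frac{\log\hbar}{2}$ and compares it with the $\hbar$-independent Gaussian $\pi^{-1/4}e^{-x^2/2}$; the function names, the sampled values of $\hbar$ and the Riemann-sum norm are illustrative assumptions.

```python
import math


def coherent_state_at_origin(x, hbar):
    """psi(x) = (pi*hbar)^(-1/4) * exp(-x^2 / (2*hbar))."""
    return (math.pi * hbar) ** -0.25 * math.exp(-x * x / (2.0 * hbar))


def dilated_state(x, t, hbar):
    """Dilated state exp(-t/2) * psi(exp(-t) * x)."""
    return math.exp(-t / 2.0) * coherent_state_at_origin(math.exp(-t) * x, hbar)


def delocalization_time(hbar):
    """t_hbar = -log(hbar) / 2."""
    return -math.log(hbar) / 2.0


def unit_gaussian(x):
    """pi^(-1/4) * exp(-x^2 / 2), which does not depend on hbar."""
    return math.pi ** -0.25 * math.exp(-x * x / 2.0)


def l2_norm_squared(f, half_width=12.0, n=4801):
    """Riemann-sum approximation of the integral of f^2 over [-half_width, half_width]."""
    dx = 2.0 * half_width / (n - 1)
    return sum(f(-half_width + i * dx) ** 2 for i in range(n)) * dx


for hbar in (1e-2, 1e-4, 1e-6):
    t_hbar = delocalization_time(hbar)
    gap = max(abs(dilated_state(x, t_hbar, hbar) - unit_gaussian(x))
              for x in (0.0, 0.7, 2.0))
    print(hbar, t_hbar, gap)
```

The time $t_\hbar$ grows only logarithmically as $\hbar$ decreases, while the norm of the state is preserved along the evolution, as expected for a unitary dilation.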
7.5 A Classical Phase Space Incorporating the Point $t = \pm\infty$
What we see in this very simple example (which can be easily generalized to more complex dynamics) is that the fact of having abandoned in quantum mechanics the classical concepts of points and trajectories, a fact which seems to create problems for many people attached to the old paradigm(s), is in fact a richness. It allows one to handle situations that are impossible in classical dynamics. Let us finish this section by presenting another situation where this phenomenon appears. The famous Cauchy-Lipschitz Theorem provides a sufficient condition for existence and uniqueness of solutions to differential equations associated with vector fields, $\dot{x} = f(x)$: Lipschitz regularity of f ensures existence and uniqueness of the solution. Otherwise, for less regular f, the flow might develop ubiquity phenomena incompatible with the classical paradigm underlying the concept of space. Quantum mechanics is more global than local; therefore boundedness issues are more stringent than regularity ones. Situations with regularity “below Cauchy-Lipschitz” can be perfectly reasonable and provide perfectly well defined quantum mechanics,$^{14}$ leading, for example, to non-ambiguous quantum dynamics. But these quantum trajectories live in the Hilbert space of quantum states and have little to share with the classical underlying space, except when $\hbar \to 0$. And one can easily cook up situations where a coherent state, classically associated with a single point in phase space, evolves into a superposition of two, localized at two different points. But this ubiquity disappears as soon as a measurement is accomplished, giving rise to a single result, a random one but surely one and only one [4–6, 51]. In fact, quantum mechanics completes the incompleteness of classical mechanics. In a certain sense it was born for this purpose.
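A standard textbook illustration of the “ubiquity” allowed below Lipschitz regularity is the vector field $f(x) = 2\sqrt{|x|}$: starting from $x(0) = 0$, both $x(t) = 0$ and $x(t) = t^2$ (for $t \ge 0$) solve $\dot{x} = f(x)$. This example is not taken from the text; the sketch below only checks the two solutions by finite differences, with hypothetical function names.

```python
import math


def vector_field(x):
    """f(x) = 2*sqrt(|x|): continuous, but not Lipschitz at x = 0."""
    return 2.0 * math.sqrt(abs(x))


def rest_solution(t):
    """The point stays at the origin forever."""
    return 0.0


def escaping_solution(t):
    """The point leaves the origin; valid for t >= 0."""
    return t * t


def residual(solution, t, dt=1e-6):
    """|x'(t) - f(x(t))|, with a centered finite difference for x'(t)."""
    derivative = (solution(t + dt) - solution(t - dt)) / (2.0 * dt)
    return abs(derivative - vector_field(solution(t)))


# Same initial point, two different futures.
starts = (rest_solution(0.0), escaping_solution(0.0))
residuals = [(residual(rest_solution, t), residual(escaping_solution, t))
             for t in (0.5, 1.0, 2.0)]
print(starts, residuals)
```

Both curves pass through the same initial point and satisfy the same equation, so the classical “trajectory of a point” is no longer a well-defined notion for this vector field.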
Postlude: species of spaces, towards an epistemological geometry of quantum/classical mechanics
In this paper, we have presented several situations of quantum mechanics corresponding to the limit where the Planck constant $\hbar$ vanishes. As mentioned at the beginning of this paper, quantum mechanics was born deep inside the classical paradigm, after a strong epistemological jump operated by Heisenberg: a jump consisting of taking $\hbar$ as a serious parameter of physics and passing from the classical world where $\hbar = 0$ to the quantum one where $\hbar > 0$. In other words, letting, somehow, $0 \to \hbar$. The change of paradigm between $\hbar = 0$ and $\hbar > 0$ is as deep as passing from the culturally classical notion of a function, which assigns to a variable a given value taken in the range of the function, to the notion of a matrix, which acts on a “variable”, namely a vector, by spreading it all over the spectrum of the matrix. Seen like this, the two paradigms seem irreconcilable. And indeed they are, stricto sensu, if we miss the fact that there exist special vectors, namely the eigenvectors of the matrix, to which the matrix assigns a single value taken in its spectrum: the corresponding eigenvalue. This remark made the makers of the foundations of quantum mechanics, say Born-Heisenberg-von Neumann,$^{15}$ able to finish the quantum axiomatic by imposing that any eigenvalue could be assigned randomly to a general vector while measuring it. The addition of a property of the probability law ensuring that eigenvectors and eigenvalues are deterministically linked makes the whole construction consistent. This view of quantum measurement, a milestone of the quantum paradigm, inseparable from the rest of the axiomatic, is sometimes seen as a kind of inverse of the Heisenberg jump: it is in a way a jump itself, as brutal as the original one. But this point of view is not correct. For example, coupled with a strong need of determinism in the classical theory, it creates the so-called paradoxes of quantum mechanics, so-called but fake ones, as they are not paradoxes inside the quantum paradigm. They show some true contradictions, but in a thinking dynamics “à cheval” between quantum and classical: an impossible world in which one desperately tries to marry two irreconcilable points of view. Inside, the word is pronounced: measurement lives inside the quantum paradigm, strictly inside, although the classical world can only be recovered by inverting $0 \to \hbar$ into $\hbar \to 0$.$^{16}$ The classical limit plays for quantum mechanics the role that “0” plays for the interval $(0, 1]$: it is its border.
CLASSICAL LIMIT = BORDER OF QUANTUM
The standard way of interpreting this last proposition is to say that the classical world is the border of the quantum one. But in the present article, we have exhibited several situations where the limit $\hbar \to 0$ does NOT correspond to the classical world; this was precisely the goal of this article.
14 A nice example showing this classical/quantum dichotomy is the case of the Coulomb potential: although the classical Hamiltonian $p^2 - \frac{1}{|r|}$ is not bounded from below, the quantum one $-\Delta - 1/|r|$ is.
In fact a more accurate statement is
CLASSICAL $\subset$ BORDER OF QUANTUM
But this does not finish the story. Indeed, contrary to what one usually thinks, the classical paradigm is far from being exempt from singularities and incompleteness. Newton’s law of inverse dependence of the force on the square of the distance between attracted objects does not tell you a single thing when this distance tends to 0. And one has known since Poincaré that, as the time t of evolution diverges, unpredictability is a serious challenge for the so culturally well installed determinism, actually linked to quantum indeterminism, as shown in the preceding section (see also [48, 49]). What we claim is that classical has itself a border, for example, when $t \to \infty$, and we proved in [49] that the sensitivity to initial conditions in chaotic systems lets this long time limit merge with the corresponding one for the corresponding quantum evolution: sensitivity to initial conditions creates, at the limit $t \to \infty$, a nonlocality feature which is intrinsic to the quantum paradigm and lets classical unpredictability and quantum indeterminism merge. In other words, at least terminologically,
QUANTUM $\subset$ BORDER OF CLASSICAL
So that:
QUANTUM $\subset$ BORDER OF CLASSICAL $\subset$ BORDER OF $\dots$ BORDER OF QUANTUM
and in other, final, “à la Oulipo” words:
QUANTUM $\subset$ (THE BORDER OF) ITS OWN BORDER
What a beautiful geometry, with strange species of strange spaces!
Es muß sein
15 We should mention at this point two alternative points of view: the one by de Broglie (right before Heisenberg) and the one by Schrödinger (right after Heisenberg), making quantum mechanics a wave theory. However, one has to say that these two viewpoints, the first one severely incomplete and the second one having been shown quickly to be equivalent to Heisenberg’s one, though taking place in a maybe more classical setting, by no means reconcile classical determinism and quantum indeterminism.
16 Giving a precise discussion about what $\hbar \to 0$ means is far beyond the scope of the present paper. Let us say that under the proposition $\hbar \to 0$ are hidden numerous situations where the different parameters of the system studied tend to extreme values, leading the famous canonical relations $[Q, P] = i\hbar$ to tend to $[Q, P] = 0$.
References
1. J. E. Andersen. Asymptotic faithfulness of the quantum SU(n) representations of the mapping class groups. Ann. of Math., 163 vol.1: 347–368, 2006.
2. J. E. Andersen. Toeplitz operators and Hitchin’s projectively flat connection. The many facets of geometry, 177–209, Oxford Univ. Press, Oxford, 2010.
3. J. E. Andersen. Hitchin’s connection, Toeplitz operators and symmetry invariant deformation quantization. arXiv: math.DG/0611126. pp. 35. To appear in Quantum Topology.
4. A. Athanassoulis and T. Paul, Strong phase space semiclassical asymptotics, “SIAM Journal on Mathematical Analysis”, 43 (2011), 2116.
5. A. Athanassoulis and T. Paul, Strong and weak semiclassical limits for some rough Hamiltonians, “Mathematical Models and Methods in Applied Sciences”, 12 (22) (2012).
6. A. Athanassoulis and T. Paul, On the selection of the classical limit for potentials with BV derivatives, “Journal of Dynamics and Differential Equations”, 25, p. 33–47 (2013).
7. D. Bambusi, S. Graffi, and T. Paul, Long time semiclassical approximation of quantum flows: A proof of the Ehrenfest time, Asymptot. Anal. 21 (1999), 149–160.
8. C. Blanchet, N. Habegger, G. Masbaum, and P. Vogel. Topological quantum field theories derived from the Kauffman bracket. Topology, 34 vol.4: 883–927, 1995.
9. A. Bloch, F. Golse, T. Paul and A. Uribe, Dispersionless Toda and Toeplitz operators, Duke Math. Journal, 117, 157–196, 2003.
10. R. Brummelhuis, T. Paul, and A. Uribe, Duke Math. J. 78, 477 (1995).
11. M. Born, “Das Adiabatenprinzip in der Quantenmechanik”, Zeitschrift für Physik 40, 167–192, 1926.
12. A. Bouzouina and D. Robert, Uniform semiclassical estimates for the propagation of quantum observables. Duke Math. J. 111, (2002), 223–252.
13. D. Bullock and J. H. Przytycki, Multiplicative structure of Kauffman bracket skein module quantizations. Proc. Amer. Math. Soc. 128 no.3: 923–931, 2000.
14. Y. Colin de Verdière and B. Parisse, Commun. Math. Phys. 205, 459 (1999).
15. A. Connes, Noncommutative differential geometry, Inst. Hautes Études Sci. Publ. Math., 62, p. 257–360, 1985.
16. A. Connes, Noncommutative geometry, Academic Press, Inc., 1994.
17. J. N. L. Connor, Chem. Phys. Lett. 4, 419 (1969).
18. Piero della Francesca, De perspectiva pingendi, 1576.
19. G. Folland, Harmonic analysis on phase space. Annals of Mathematical Studies, Princeton University Press, 1989.
20. M. H. Freedman, K. Walker, and Z. Wang. Quantum SU(2) faithfully detects mapping class groups modulo center. Geom. Topol., 6: 523–539 (electronic), 2002.
21. I. M. Gel’fand and G. E. Shilov, Generalized Functions: Volume I, Properties and Operations, Academic Press, 1964.
22. J.-L. Godard, “Adieu au langage”, 2014.
23. W. M. Goldman, Invariant functions on Lie groups and Hamiltonian flows of surface group representations. Invent. Math., 85, 263–302, 1986.
24. A. Grossmann, Momentum like constants of motion, Proceedings of Europhysics Conference, Haifa, 1971.
25. A. Grossmann, Geometry of real and complex canonical transforms, Proceedings of the VI. International colloquium on group theoretical methods in physics, Tübingen 1977, Lecture Notes in Physics, Springer, 1977.
26. A. Grossmann and J. Morlet, Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape, SIAM J. Math. Analysis 15, 723, 1984.
27. A. Grossmann and R. Seiler, Heat Equation on Phase Space and the Classical Limit of Quantum Mechanical Expectation Values, Commun. Math. Phys. 48, 195–197, 1976.
28. S. Haroche, oral communication, Cours au Collège de France, 2005.
29. W. Heisenberg, “Über quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen”, Zeitschrift für Physik, 33, 879–893, 1925.
30. N. J. Hitchin, Flat connections and geometric quantization. Comm. Math. Phys., 131, no. 2, 347–380, 1990.
31. P.S. Laplace, Essai philosophique sur les probabilités, Courcier, 1814.
32. J. Laskar, Le système solaire est-il stable ? Séminaire Poincaré XIV (2010) 221–246.
33. N. Lerner, “Metrics on the phase space and non-selfadjoint pseudo-differential operators”, Birkhäuser Verlag, Basel, 2010.
34. G. Longo, C. Palamidessi, and T. Paul, Randomness: four questions and some challenges, “Randomness: 5 questions”, H. Zenil, Editor, Automatic Press / VIP, 2011.
35. G. Longo and T. Paul, Le monde et le calcul : réflexions sur calculabilité, mathématiques et physique, “Logique & Interaction : Géométrie de la cognition”, Actes du colloque et école thématique du CNRS “Logique, Sciences, Philosophie” à Cerisy, Hermann, 2009.
36. G. Longo and T. Paul, The Mathematics of Computing between Logic and Physics, in “Computability in Context: Computation and Logic in the Real World”, S.B. Cooper and A. Sorbi, eds, Imperial College Press, 2011.
37. J. Marché and T. Paul, Toeplitz operators in TQFT via skein theory. Trans AMS, 367, 3669–3704, 2015.
38. G. Masbaum and P. Vogel. 3-valent graphs and the Kauffman bracket. Pacific Journal of Mathematics, 164, no.2, 361–381, 1994.
39. A. Melin, J. Sjöstrand, Fourier integral operators with complex phase, CPDE 1974.
40. J. Morlet, Sampling theory and wave propagation, Proc. 51st Annual International Meeting of the Society of Exploration Geophysicists, Los Angeles, 1981.
41. T. Paul, Semiclassical approximation and noncommutative geometry, C. R. Acad. Sci. Paris, Ser. I 349, 1177–1182, 2011.
42. T. Paul, Symbolic calculus for singular curve operators, Schrödinger Operators, Spectral Analysis and Number Theory: In Memory of Erik Balslev. Springer, S. Albeverio, A. Balslev, R. Weder (eds.), (2021) 195–221.
43. T. Paul, On quantum complex flows, preprint hal-02004992.
44. T. Paul, Husimi, Wigner, Toeplitz, quantum statistics and anticanonical transformations, preprint hal-02008709.
45. T. Paul, “Fragments d’un questionnement amoureux : la rigueur en musique et philosophie interrogée par un mathématicien”, RIGUEUR, Spartacus, Paris (2021).
46. T. Paul, “Mathematical entities without objects, on the realism in mathematics and a possible mathematization of the (non)Platonism: Does Platonism dissolve in mathematics?”, “Symmetry, Proportion and Seriality: The Semantics of Mirroring and Repetition in Science and the Arts”, Special Issue of European Review, M. Fludernik, M. Middeke and A. Buchleitner (eds.), European Review, 29 2, 253–273, Cambridge University Press, (2020).
47. T. Paul, “Poincaré face au négatif : une méthodologie ?”, Matapli 98, 37–52, 2012.
48. T. Paul, “Indéterminisme quantique et imprédictibilité classique”, “Noesis”, Vol. 17, p. 219–232, 2010.
49. T. Paul, “Semiclassical Analysis and Sensitivity to Initial Conditions”, Information and Computation, 207, p. 660–669 (2009).
50. T. Paul, “Discrete-continuous and classical-quantum”, Mathematical Structures in Computer Science 17, p. 177–183, (2007).
51. T. Paul, Échelles de temps pour l’évolution quantique à petite constante de Planck, “Séminaire X-EDP 2007–2008”, Publications de l’École Polytechnique, 2008.
52. T. Paul and A. Uribe, A construction of quasi-modes using coherent states, Annales de l’IHP, Physique Théorique, 59, 357–381, 1993.
53. H. Poincaré, Les méthodes nouvelles de la mécanique céleste, Volume 2, Blanchard, Paris, 1987.
54. J. Ralston... Approximate solutions of the Laplacian, J. Diff. Geom. 12 (1977) 87–100.
55. N.Y. Reshetikhin and V.G. Turaev. Invariants of 3-manifolds via link polynomials and quantum groups. Invent. Math., 92: 547–597, 1991.
56. P. Ribeiro and T. Paul, Semi-classical Analysis of Spin Systems near Critical Energies, “Phys. Rev. A” 79, 032107 (2009).
57. A. Sobrero, Thèse, Université Denis Diderot, Paris, 2005.
58. J-M. Souriau, Structure des systèmes dynamiques, Dunod 1967.
59. Y. U. Taylor and C. T. Woodward. 6j symbols for $U_q(sl_2)$ and non-Euclidean tetrahedra. Selecta Math. 11, no. 3–4, 539–571, 2005.
60. V. G. Turaev, Skein quantization of Poisson algebras of loops on surfaces. Ann. Sci. École Norm. Sup. 24, no.4, 635–704, 1991.
61. E. Witten, Quantum field theory and the Jones polynomial. Comm. Math. Phys., 121, vol.3: 351–399, 1989.
Part II
Wavelets and Mathematical Analysis
Curved Model Sets and Crystalline Measures
Yves Meyer
En hommage à Alexandre Grossmann, un ami et un maître.
During the late seventies many scientists were working on new tools which would later be called wavelets. They were musicians playing their instruments without listening to each other. It was a cacophony. But a conductor came. Alex appeared among us. He had a sensitive ear. He understood everyone. And suddenly the noise ceased and we heard an exquisite music. Then we could talk to each other. It was a symphony. I could understand Jean Morlet. I could collaborate with Ingrid. Sterile and aggressive competition disappeared. A celestial harmony.
Abstract
Pavel Kurasov and Peter Sarnak constructed a crystalline measure supported by a uniformly discrete set. Motivated by this achievement we propose an alternative to this construction. Our approach is based on the work of Patrick Robert Ahern.
Y. Meyer, École normale supérieure Paris-Saclay, Gif-sur-Yvette, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_17
1 Introduction
A Radon measure $\mu$ on the real line is a crystalline measure if $\mu$ is supported by a locally finite set and if its distributional Fourier transform $\widehat{\mu}$ is a measure which is also supported by a locally finite set. In the late fifties crystalline measures were studied (under other names) by Jean-Pierre Kahane, André-Paul Guinand, and Szolem Mandelbrojt [2, 4]. To every crystalline measure $\mu$ satisfying some natural conditions one can attach a Dirichlet series $\varphi(\mu, s)$ which is a meromorphic function in the complex plane with a unique pole at $s = 1$. Moreover $\varphi(\mu, s)$ and $\varphi(\widehat{\mu}, s)$ are related by a remarkable functional equation [4]. If $\mu$ is a Dirac comb, $\varphi(\mu, s)$ is the Riemann zeta function. This line of research can be traced back to Hans Ludwig Hamburger [3]. The discovery of quasi-crystals by Dan Shechtman (1982) renewed the interest in crystalline measures. Are quasi-crystals related to crystalline measures? Nir Lev and Alexander Olevskii answered this question and clarified the mathematical properties of crystalline measures in a remarkable series of contributions [6–8]. Finally Pavel Kurasov proved that the support of a nontrivial crystalline measure can be a uniformly discrete set. Quantum graphs are seminal in this beautiful construction. Recently Pavel Kurasov and Peter Sarnak replaced quantum graphs by a clever argument using some special polynomials (named stable polynomials) and Cauchy’s residue theorem [5]. Stable polynomials originate from quantum graphs but can be defined independently. Alexander Olevskii and Alexander Ulanovskii discovered a new family of crystalline measures supported by uniformly discrete sets [9]. As will be shown in Sect. 7, their crystalline measures do not belong to the family constructed by Kurasov and Sarnak. A fourth approach to crystalline measures is proposed here. Curved model sets and the remarkable work achieved by P.R. Ahern in [1] are the main ingredients. This new approach bridges the gap between crystalline measures and quasi-crystals. Nir Lev and Alexander Olevskii proved that a crystalline measure cannot be supported by a model set. That explains why standard model sets are replaced by curved model sets in our construction. A translation bounded crystalline measure $\mu$ is an almost periodic measure. This is proved in Sect. 2. Therefore $\mu$ can be lifted to a compact group G. In this essay G is the m-dimensional torus. Then the construction of translation bounded crystalline measures can benefit from our understanding of the properties of Ahern measures on $\mathbb{T}^m$.
The crystalline measures constructed using this scheme are not new since they can also be obtained by the method proposed by Kurasov and Sarnak, as will be proved in Sect. 7. Our approach does not yield new results but provides us with a new perspective on the achievements of Kurasov and Sarnak.
2 Crystalline Measures Are Almost Periodic

The Dirac mass at $a \in \mathbb{R}^n$ is denoted by $\delta_a$ or $\delta_a(x)$. A purely atomic measure is a linear combination $\mu = \sum_{\lambda\in\Lambda} c(\lambda)\delta_\lambda$ of Dirac masses where the coefficients $c(\lambda)$ are real or complex numbers and $\sum_{|\lambda|\le R} |c(\lambda)|$ is finite for every $R > 0$. If $c(\lambda) \ne 0$, $\forall\lambda \in \Lambda$, then $\Lambda$ is the support of $\mu$. A subset $\Lambda \subset \mathbb{R}^n$ is locally finite if $\Lambda \cap B$ is finite for every bounded set $B$. Equivalently $\Lambda$ can be ordered as a sequence $\{\lambda_j,\ j = 1, 2, \ldots\}$ and $|\lambda_j|$ tends to infinity with $j$. A measure $\mu$ is a tempered distribution if it has polynomial growth at infinity in the sense given by Laurent Schwartz in [16]. For instance, the measure $\sum_1^\infty 2^k\delta_k$ is not a tempered distribution while $\sum_1^\infty k^3\delta_k$ and $\sum_1^\infty 2^k[\delta_{(k+2^{-k})} - \delta_k]$ are tempered distributions. The Fourier transform $\mathcal{F}(f) = \hat f$ of a function $f \in L^1(\mathbb{R}^n)$ is defined by $\hat f(y) = \int_{\mathbb{R}^n} \exp(-2\pi i x\cdot y) f(x)\,dx$. The distributional Fourier transform $\hat\mu$
of $\mu$ is defined by the following condition: $\langle\hat\mu, \phi\rangle = \langle\mu, \hat\phi\rangle$ holds for every test function $\phi$ belonging to the Schwartz class $\mathcal{S}(\mathbb{R}^n)$.

Definition 1 A purely atomic measure $\mu$ on $\mathbb{R}^n$ is a crystalline measure if:
(a) The support $\Lambda$ of $\mu$ is a locally finite set.
(b) $\mu$ is a tempered distribution.
(c) The distributional Fourier transform $\hat\mu$ of $\mu$ is also a purely atomic measure supported by a locally finite set $S$.

A crystalline measure $\mu$ is sparse if its support $\Lambda$ is uniformly discrete. A set of real numbers $\Lambda$ is uniformly discrete if there exists a $\beta > 0$ such that $\lambda, \lambda' \in \Lambda$ and $\lambda \ne \lambda'$ imply $|\lambda - \lambda'| \ge \beta$. The simplest example of a crystalline measure is the normalized Dirac comb $\mu = \sum_{-\infty}^{+\infty}\delta_k$ whose support is $\mathbb{Z}$. If $\mu$ is the Dirac comb we have $\hat\mu = \mu$. The measure $\mu_{(a,b)} = \sum_{-\infty}^{+\infty}\delta_{ak+b}$, $a > 0$, $b \in \mathbb{R}$, is a Dirac comb supported by $a\mathbb{Z} + b$. Simple manipulations on Dirac combs yield other examples of crystalline measures, namely the generalized Dirac combs of Definition 2.
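The identity $\hat\mu = \mu$ for the normalized Dirac comb is the Poisson summation formula: tested against a Schwartz function $\phi$ it reads $\sum_k \phi(k) = \sum_k \hat\phi(k)$. As a minimal numerical sketch (not part of the original argument), the following Python snippet checks this identity for a shifted Gaussian. The parameters `a`, `b`, the truncation `K` and the function names are illustrative assumptions.

```python
import cmath
import math

def gaussian(x, a, b):
    """phi(x) = exp(-pi * a * (x - b)**2), a Schwartz function when a > 0."""
    return math.exp(-math.pi * a * (x - b) ** 2)

def gaussian_hat(y, a, b):
    """Fourier transform of phi for the convention
    phi_hat(y) = integral of exp(-2*pi*i*x*y) * phi(x) dx."""
    phase = cmath.exp(-2j * math.pi * b * y)
    return phase * math.exp(-math.pi * y * y / a) / math.sqrt(a)

def poisson_sides(a=0.5, b=0.3, K=60):
    """Return (sum of phi over Z, sum of phi_hat over Z), truncated to |k| <= K."""
    ks = range(-K, K + 1)
    lhs = sum(gaussian(k, a, b) for k in ks)
    rhs = sum(gaussian_hat(k, a, b) for k in ks)
    return lhs, rhs

lhs, rhs = poisson_sides()
print(lhs, rhs)  # the two sums should agree up to rounding errors
```

The truncation is harmless here because both the Gaussian and its transform decay faster than any power of $|k|$.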
Definition 2 Let $\sigma_j$, $1 \le j \le N$, be $N$ Dirac combs supported by $\Lambda_j = a_j\mathbb{Z} + b_j$ and let $g_j(x)$, $1 \le j \le N$, be $N$ finite trigonometric sums. Let $\mu_j$ be the product $g_j\sigma_j$. Then $\mu = \mu_1 + \cdots + \mu_N$ is called a generalized Dirac comb.

The support of a generalized Dirac comb $\mu$ is a locally finite set since it is included in $\bigcup_1^N \Lambda_j$. The Fourier transform of a generalized Dirac comb is a generalized Dirac comb. Therefore a generalized Dirac comb is a crystalline measure. Do other crystalline measures exist? André-Paul Guinand (1912–1987) pioneered this investigation in [2] and proposed several examples of nontrivial crystalline measures. Here is one of his examples. One defines $\chi : \mathbb{Z}^3 \to \{-1/2, 0, 4\}$ by $\chi(k) = 0$ if $k \in 4\mathbb{Z}^3$, $\chi(k) = 4$ if $k \in 2\mathbb{Z}^3 \setminus 4\mathbb{Z}^3$ and $\chi(k) = -1/2$ if $k \in \mathbb{Z}^3 \setminus 2\mathbb{Z}^3$. Then the Fourier transform of the one dimensional odd measure

$$\tau = \sum_{k\in\mathbb{Z}^3} \chi(k)|k|^{-1}\,(\delta_{|k|/2} - \delta_{-|k|/2})$$
is $-i\tau$. The support of Guinand's crystalline measure $\tau$ is contained in the set $\Lambda = \{\pm\sqrt{n};\ n \in \mathbb{N}\}$ and $\tau$ is not sparse. Guinand's arguments were unsatisfactory, as was noticed by Alexander Olevskii. But Guinand's claims are true, as was finally shown in [12, 13]. Using a completely distinct approach Nir Lev and Alexander Olevskii [6–8] proved the existence of a crystalline measure $\mu$ which is not a generalized Dirac comb. The density of the support of Guinand's crystalline measure is infinite. This is also true for the Lev-Olevskii crystalline measure. A remarkable theorem by Lev and Olevskii implies that the distributional Fourier transform $\hat\mu$ of a sparse crystalline measure $\mu$ cannot be supported by a uniformly discrete set $S$ unless $\mu$ is a generalized Dirac comb [7]. This theorem is still a conjecture in several dimensions. A sparse crystalline measure is defined by a geometrical property (a uniformly discrete support) and a spectral property (a
locally finite spectrum). In this essay sparse crystalline measures are constructed. Curved model sets take care of the geometrical property and Ahern measures of the spectral one. The situation is asymmetric. Let $\mu$ be a crystalline measure. We then have

$$\mu = \sum_{\lambda\in\Lambda} a(\lambda)\delta_\lambda, \qquad \hat\mu = \sum_{s\in S} b(s)\delta_s \tag{1}$$
and $\Lambda$, $S$ are two locally finite sets. The left hand side of (1) unveils the geometrical structure of $\mu$ and the right hand side its spectral structure. Let $\mathcal{S}'(\mathbb{R})$ denote the space of tempered distributions equipped with its standard topology [16]. The locally finite sets $\Lambda$ and $S$ can be ordered as increasing sequences $\lambda_j$, $j \in \mathbb{Z}$, and $s_j$, $j \in \mathbb{Z}$, of real numbers and the convergence of the two series in (2) refers to this order.

Lemma 1 If $\mu$ is a crystalline measure the two series of (2) converge in $\mathcal{S}'(\mathbb{R})$.

Indeed if the test function $\phi$ is compactly supported $\langle\mu, \phi\rangle$ is a finite sum. Since $\mu$ is a tempered distribution the convergence in the general case where $\phi$ belongs to the Schwartz class follows by density. For the second series the discussion is similar. The convergence is obvious if $\hat\phi$ is compactly supported and the general case follows by density. In other terms we have

$$\sum_{\lambda\in\Lambda} a(\lambda)\delta_\lambda = \sum_{s\in S} b(s)\exp(2\pi i s x) \tag{2}$$
and both series converge in $\mathcal{S}'(\mathbb{R})$. From now on we focus on translation bounded crystalline measures.

Definition 3 A Radon measure $\mu$ on $\mathbb{R}$ is translation bounded if there exists a constant $C$ such that $|\mu|([x, x+1]) \le C$ uniformly in $x \in \mathbb{R}$.

Guinand's crystalline measure $\tau$ is not translation bounded [12, 14]. In [5] Kurasov and Sarnak constructed a crystalline measure $\mu$ for which $a(\lambda) = 1$, $\forall\lambda \in \Lambda$.

Lemma 2 Let $\mu = \sum_{\lambda\in\Lambda} a(\lambda)\delta_\lambda$ be a crystalline measure. The condition $a(\lambda) \ge 1$, $\forall\lambda \in \Lambda$, implies that $\Lambda$ is a finite union of uniformly discrete sets and that $\mu$ is translation bounded.

Let us prove this property. It suffices to evaluate the integral $I(u) = \int g(x-u)\,d\mu(x)$ when $g$ is a positive even function in the Schwartz class such that $\hat g$ is compactly supported. We have $I(u) = \int \exp(2\pi i u y)\,\hat g(y)\,d\hat\mu(y)$ which implies $|I(u)| \le C$ since $\hat\mu$ is a Radon measure and $\hat g$ is compactly supported. On the other hand $I(u)$ is larger than $N(u)\inf_{[-1,1]} g$ where $N(u)$ is the number of points in $\Lambda \cap [u-1, u+1]$. This ends the proof of Lemma 2. We have more.
Lemma 3 A translation bounded crystalline measure is an almost periodic measure.

Before proving Lemma 3 let us define almost periodic distributions and almost periodic measures. The Banach space of Bohr almost periodic functions on the real line is denoted by $AP$. The norm in $AP$ is the sup-norm on $\mathbb{R}$. The vector space consisting of finite trigonometric sums $g(x) = \sum_{\omega\in F} c(\omega)\exp(i\omega x)$, $F \subset \mathbb{R}$, is dense in $AP$ for the sup-norm. From now on "an almost periodic function" is a short name for a Bohr almost periodic function.

Definition 4 A tempered distribution $S$ is an almost periodic distribution if the convolution product $S * \phi$ is an almost periodic function for any test function $\phi$ in the Schwartz class $\mathcal{S}(\mathbb{R})$.

This definition was proposed by Laurent Schwartz in [16]. We now define the Fourier series expansion of an almost periodic distribution. Let us begin with the definition of the Fourier coefficients of an almost periodic distribution $S$. For an almost periodic function the definition of Fourier coefficients is well known.

Lemma 4 Let $S$ be an almost periodic distribution. Then for any real number $\omega$ there exists a constant $c(\omega)$ such that for any test function $\phi$ the Fourier coefficient $\widehat{S * \phi}(\omega)$ of the almost periodic function $S * \phi$ is given by $\widehat{S * \phi}(\omega) = c(\omega)\hat\phi(\omega)$. Then $c(\omega)$ is the Fourier coefficient of $S$ at $\omega$.

To prove this lemma it suffices to show that $c(\omega)$ does not depend on $\phi$. Indeed if $\phi_0$ and $\phi_1$ are two test functions we have $(S * \phi_0) * \phi_1 = (S * \phi_1) * \phi_0$ which implies $c_0(\omega)\hat\phi_1(\omega) = c_1(\omega)\hat\phi_0(\omega)$, where $c_j(\omega)$ denotes the Fourier coefficient of $S * \phi_j$ at $\omega$. This ends the proof. Let us relate these Fourier coefficients to the distributional Fourier transform of $S$. If $\hat S$ denotes the distributional Fourier transform of an almost periodic distribution $S$, the Fourier coefficients of $S$ can be computed by

$$c(\omega) = \lim_{\epsilon\to 0}\,\langle\hat S, \phi_{\epsilon,\omega}\rangle,$$

where $\phi_{\epsilon,\omega}(x) = \phi\big(\frac{x-\omega}{\epsilon}\big)$, $\phi(0) = 1$, and $\phi$ is an even $C^\infty$ function supported by $[-1, 1]$. This limit exists and does not depend on $\phi$. The set $E$ defined by $c(\omega) \ne 0$ is at most countable and is called the spectrum of $S$. The spectrum of the almost periodic distribution $S$ is not closed in general and is included in the closed support of the tempered distribution $\hat S$. Finally the Fourier series expansion of an almost periodic distribution $S$ is the formal series
$$S \sim \sum_{\omega\in E} c(\omega)\exp(2\pi i\omega x). \tag{3}$$

An almost periodic measure is defined similarly:
Definition 5 A Radon measure $\mu$ on the real line is an almost periodic measure if for any compactly supported continuous function $g$ the convolution product $\mu * g$ is an almost periodic function.

A Radon measure $\mu$ is an almost periodic measure if and only if $\mu$ is a translation bounded measure and is an almost periodic distribution. The definition of the Fourier series of an almost periodic measure is the same as the one of an almost periodic distribution.

Proposition 1 An almost periodic measure $\mu$ is a continuous linear form on $AP$. This linear form is defined by $\langle\mu, P\rangle = \lim_{T\to\infty} T^{-1}\int_0^T P(x)\,d\mu(x)$ for any finite trigonometric sum $P$. If $\chi_\omega(x) = \exp(2\pi i\omega x)$ we have $\langle\mu, \chi_\omega\rangle = \hat\mu(-\omega)$.

There exists a constant $C$ such that $|\langle\mu, P\rangle| \le C\|P\|_\infty$ for any finite trigonometric sum $P$. This follows immediately from $|\mu|([0, T]) \le CT$ for $T \ge 1$. This argument is detailed in [11]. The converse statement is not true. There exist continuous linear forms on $AP$ which are not almost periodic measures. The simplest example is a Dirac measure $\delta_a$ where $a \in \mathbb{R}$.

Theorem 1 Let $S \subset \mathbb{R}$ be a locally finite set. Then the two following properties of a translation bounded measure $\mu$ are equivalent:
(a) $\mu = \sum_{s\in S} b(s)\exp(2\pi i s x)$ where the series converges to $\mu$ in $\mathcal{S}'(\mathbb{R})$.
(b) $\mu$ is an almost periodic measure and its Fourier series is

$$\mu \sim \sum_{s\in S} b(s)\exp(2\pi i s x). \tag{4}$$
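For an almost periodic function $f$ the Fourier coefficient at $\omega$ is classically given by the Bohr mean value $c(\omega) = \lim_{T\to\infty} T^{-1}\int_0^T f(x)\exp(-2\pi i\omega x)\,dx$ (recalled here as standard background, not quoted from the text). The sketch below approximates this mean for a finite trigonometric sum. The frequencies $\sqrt{2}$ and $1$, the coefficients, the window `T` and the step size are all illustrative assumptions.

```python
import cmath
import math

# Hypothetical finite trigonometric sum: frequency s -> coefficient b(s).
COEFFS = {math.sqrt(2.0): 2.0, 1.0: 0.5}

def f(x):
    """f(x) = sum over s of b(s) * exp(2*pi*i*s*x)."""
    return sum(b * cmath.exp(2j * math.pi * s * x) for s, b in COEFFS.items())

def bohr_coefficient(omega, T=500.0, step=0.01):
    """Midpoint-rule approximation of
    (1/T) * integral over [0, T] of f(x) * exp(-2*pi*i*omega*x) dx."""
    n = int(round(T / step))
    total = 0j
    for j in range(n):
        x = (j + 0.5) * step
        total += f(x) * cmath.exp(-2j * math.pi * omega * x)
    return total * step / T

for omega in (math.sqrt(2.0), 1.0, 0.3):
    print(omega, bohr_coefficient(omega))
```

On a finite window the mean recovers $b(s)$ at a frequency $s$ of the sum and is close to $0$ elsewhere, with an error of order $1/T$ governed by the distance from $\omega$ to the other frequencies.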
Let us add some remarks before proving Theorem 1. Theorem 1 does not apply to Guinand's crystalline measure $\tau$. Indeed $\tau$ is not an almost periodic measure since it is not translation bounded, as is proved in [12, 14]. Here is a complement to Theorem 1. Let $\Gamma \subset \mathbb{R}$ be the additive group generated by $S$ and let $G$ be the compact abelian group which is the dual of $\Gamma$. The trivial case where $\Gamma$ is a lattice will be omitted in our discussion since (2) is then the standard Poisson formula. Then $\mathbb{R}$ is a dense subgroup of $G$.

Proposition 2 The equivalent conditions of Theorem 1 imply the existence of a Radon measure $\sigma$ on $G$ whose Fourier series is also given by the right hand side of (4).

Let $AP_S$ denote the closed subspace of $AP$ consisting of all almost periodic functions whose spectrum is contained in $S$. Similarly let $E_S$ be the closed subspace of $C(G)$ consisting of all continuous functions on $G$ whose spectrum is contained in $S$. Any $f \in AP_S$ extends by continuity to a function $F \in E_S$. Then $\mu$ defines a continuous linear form on $E_S$. This is Proposition 1. This linear form can be extended to a Radon measure $\sigma$ on $G$. If $\chi_s$ is defined by $\chi_s(x) = \exp(2\pi i s x)$, $s \in S$, then $\chi_s$ can be extended by continuity to $G$ and (b) implies $\sigma \sim \sum_{s\in S} b(s)\chi_s$ on $G$ which ends the proof of Proposition 2.
Proposition 2 implies that $\mu$ is the trace of $\sigma$ on $\mathbb{R}$. Here we are anticipating a discussion which will be detailed in Sect. 3. To validate this heuristic it suffices to compare the expansion $\mu \sim \sum_{s\in S} b(s)\chi_s$ on $\mathbb{R}$ to the expansion of $\sigma$ given by Proposition 2. The first one is the trace on $\mathbb{R}$ of the second one. To construct the Kurasov-Sarnak measure $\mu$ we proceed the other way around. One starts with a measure $\sigma$ on $G$ enjoying two properties detailed in Theorem 3 and $\mu$ is defined as the trace of $\sigma$ on $\mathbb{R}$.

Let us prove (a) $\Rightarrow$ (b) in Theorem 1. This implication is not used in what follows.

Lemma 5 Let $\mu$ be a translation bounded measure. If $\hat\mu$ is supported by a locally finite set $S$ then $\mu$ is an almost periodic measure.

We first prove that $\mu$ is an almost periodic distribution. Since the measure $\mu$ is translation bounded it suffices to check that $\mu * \phi$ is an almost periodic function for any $\phi \in \mathcal{S}(\mathbb{R})$ whose Fourier transform is compactly supported. But in that case $\mu * \phi$ is a finite trigonometric sum. Then we use the classical fact that a translation bounded measure which is an almost periodic distribution is an almost periodic measure. The Fourier coefficients $c(\omega)$ of the almost periodic measure $\mu$ are computed using (3) and the identity $\hat\mu = \sum_{s\in S} b(s)\delta_s$ in $\mathcal{S}'(\mathbb{R})$. We obtain $c(\omega) = 0$ unless $\omega \in S$. If $\omega = s \in S$ we have $c(s) = b(s)$ as announced.

The implication (b) $\Rightarrow$ (a) is used in the proof of our main result. To prove (b) $\Rightarrow$ (a) it suffices to show that the distributional Fourier transform $\hat\mu$ of $\mu$ is given by $\hat\mu = \sum_{s\in S} b(s)\delta_s$. If $\phi$ belongs to the Schwartz class and if $\hat\phi$ is compactly supported, then $\mu * \phi$ is an almost periodic function. Since $S$ is locally finite the Fourier series of $\mu * \phi$ is the finite trigonometric sum $P(x) = \sum_{s\in S}\hat\phi(s)b(s)\exp(2\pi i s x)$. Then this formal series is a genuine series. Therefore $\mu * \phi(x) = P(x)$. On the Fourier transform side it implies $\hat\mu\,\hat\phi = \sum_{s\in S}\hat\phi(s)b(s)\delta_s$. Finally $\hat\mu = \sum_{s\in S} b(s)\delta_s$ as announced.
This ends the proof of Theorem 1. The implication (b) $\Rightarrow$ (a) in Theorem 1 opens the gate to the construction of a crystalline measure supported by a uniformly discrete set. We first fix $\Gamma$. Let $\alpha_1, \ldots, \alpha_m$ be $m$ real numbers which are linearly independent over the field $\mathbb{Q}$. Then we set $\Gamma = \alpha_1\mathbb{Z} + \cdots + \alpha_m\mathbb{Z}$ and $\Gamma$ is isomorphic to $\mathbb{Z}^m$. The compact group $G$ containing $\mathbb{R}$ as a dense subgroup is the dual of $\Gamma$. It is isomorphic to the $m$-dimensional torus $\mathbb{T}^m$ where $\mathbb{T} = \mathbb{R}/\mathbb{Z}$.

Definition 6 Let $h : \mathbb{R} \to \mathbb{T}^m$ be the embedding of the real line into the $m$-dimensional torus defined by $h(t) = (\alpha_1 t, \ldots, \alpha_m t)$.

Then $h(\mathbb{R})$ is dense in $\mathbb{T}^m$ and for any continuous function $F$ on $\mathbb{T}^m$ the function $F \circ h$ is almost periodic and its spectrum is contained in $\Gamma$. Given a Radon measure $\sigma$ on $\mathbb{T}^m$ its trace $\mu = \sigma \circ h$ on $\mathbb{R}$ is defined in Sect. 3. This trace is not the restriction of the Radon measure $\sigma$ to the Borel set $h(\mathbb{R})$.
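The density of $h(\mathbb{R})$ in $\mathbb{T}^m$ can be observed numerically. The following sketch takes $m = 2$ and the hypothetical choice $\alpha = (1, \sqrt{2})$, and searches for a time $t$ at which $h(t)$ comes close to a prescribed target point of the torus. The target, the search range and the function names are assumptions made for illustration only.

```python
import math

ALPHA = (1.0, math.sqrt(2.0))  # hypothetical frequencies, linearly independent over Q

def h(t):
    """Embedding h(t) = (alpha_1 * t, alpha_2 * t) of R into T^2 = (R/Z)^2."""
    return tuple((a * t) % 1.0 for a in ALPHA)

def torus_distance(p, q):
    """Sup-distance on the torus, taking wrap-around into account."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(p, q))

def closest_visit(target, n_max=1000):
    """Scan the times t = target[0] + n, 0 <= n < n_max, at which the first
    coordinate of h(t) already matches target[0]; return the best (distance, t)."""
    best = (float("inf"), 0.0)
    for n in range(n_max):
        t = target[0] + n
        best = min(best, (torus_distance(h(t), target), t))
    return best

dist, t = closest_visit((0.3, 0.7))
print(dist, t)
```

Enlarging `n_max` lets the orbit approach the target more closely, which is the content of the density statement in this two dimensional case.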
3 Traces of Radon Measures

The following result is proved in this section:

Claim 1 If $\sigma$ is a $\mathbb{Z}^m$-periodic Radon measure on $\mathbb{R}^m$ and if a line $L \subset \mathbb{R}^m$ is transverse to $\sigma$, then the trace $\sigma^L$ of $\sigma$ on $L$ is well defined and is an almost periodic measure. Moreover the Fourier series of $\sigma^L$ is the trace on $L$ of the Fourier series of $\sigma$.

The meaning of the word transverse is given in Definition 8. Here is an equivalent formulation of our claim. Let $\alpha_1, \ldots, \alpha_m$ be $m$ real numbers which are linearly independent over the field $\mathbb{Q}$. Our goal is to define the measure $\sigma \circ h$ on $\mathbb{R}$ for a large class $\mathcal{U}$ of Radon measures $\sigma$ on $\mathbb{T}^m$ and to prove that $\sigma \circ h$ is an almost periodic measure. The measure $\sigma \circ h$ is the trace of $\sigma$ on the dense subgroup $h(\mathbb{R})$ of $\mathbb{T}^m$. That is why we begin with the definition of the trace on a line $L$ of a Radon measure $\sigma$ on $\mathbb{R}^m$. In the following definition the line $L$ is assumed to be parallel to the $x_m$ axis and the measure $\sigma$ is assumed to be compactly supported, which prevents $\sigma$ from being periodic. The generalizations to an arbitrary line and to an arbitrary Radon measure are treated afterwards. We write $x = (x', x_m)$ where $x' = (x_1, \ldots, x_{m-1})$. Let $C_0(\mathbb{R}^m)$ denote the Banach space of all continuous functions on $\mathbb{R}^m$ which tend to 0 at infinity. This Banach space is equipped with the sup-norm $\|f\|_\infty$.

Definition 7 Let $\sigma$ be a compactly supported Radon measure on $\mathbb{R}^m$. We say that $e = (0, \ldots, 0, 1)$ is transverse to $\sigma$ if for any $g \in C_0(\mathbb{R})$ the measure $\sigma_g(x') = \int_{-\infty}^{+\infty} g(x_m)\,d\sigma(x', x_m)$ is absolutely continuous with respect to the Lebesgue measure $dx'$ on $\mathbb{R}^{m-1}$ and if the density $w_g$ of $\sigma_g$ with respect to $dx'$ is a continuous function of $x' = (x_1, \ldots, x_{m-1})$.

Let us examine some examples. If $\sigma$ is a Dirac measure at $a \in \mathbb{R}^m$ then $\sigma_g = g(a_m)\delta_{a'}$. Therefore $e$ is not transverse to $\delta_a$.
If $m = 2$ and if the measure $\sigma$ is the arc length measure on the boundary of the unit square $\{0 \le x_1 \le 1,\ 0 \le x_2 \le 1\}$ then $e = (0, 1)$ is not transverse to $\sigma$ since $\sigma_g = 2\chi_{[0,1]} + \delta_0 + \delta_1$ if $g = 1$. If $\sigma$ is the arc length measure on the unit circle then $e = (0, 1)$ is not transverse to $\sigma$ since if $g = 1$ we have $w_g(x) = \frac{2}{\sqrt{1-x^2}}$ if $|x| < 1$ and 0 outside $[-1, 1]$. If $\sigma$ is the arc length measure on $|x_1| + |x_2| = 1$ then $e = (0, 1)$ is not transverse to $\sigma$. Indeed if $g = 1$ the function $w_g$ is given by $w_g(x) = 2\sqrt{2}$ on $[-1, 1]$ and $w_g(x) = 0$ if $|x| > 1$. If $\sigma$ is the indicator function of the square $|x_1| + |x_2| \le 1$ then $e = (0, 1)$ is transverse to $\sigma$, and similarly if $\sigma$ is the indicator function of the unit disc.

We now return to the general case. The continuous function $w_g$ is compactly supported. The closed graph theorem implies that the linear map $M$ defined by $M(g) = w_g$ is continuous from $C_0(\mathbb{R})$ to $C_0(\mathbb{R}^{m-1})$. If $e$ is transverse to $\sigma$ then $\sigma$ can be restricted to any vertical line $L \subset \mathbb{R}^m$. This trace $\sigma^L$ is a measure which depends continuously on $L$.
Definition 8 More precisely if $L = \{x_1 = a_1, \ldots, x_{m-1} = a_{m-1}\}$ the trace $\sigma^L$ of $\sigma$ on $L$ is defined by $\langle\sigma^L, g\rangle = w_g(a_1, \ldots, a_{m-1})$ for any $g \in C_0(\mathbb{R})$.

The map $\sigma \mapsto \sigma^L$ commutes with translations which are parallel to $L$. If $\sigma = s(x)\,dx$ where the density $s(x)$ of $\sigma$ is a continuous function on $\mathbb{R}^m$ then the trace of $\sigma$ on $L$ coincides with the trace of the function $s$ on $L$. If the vector $e = (0, \ldots, 0, 1)$ is transverse to $\sigma$ then for any continuous function $f$ on $\mathbb{R}^m$ the vector $e$ is also transverse to $f\sigma$. This observation is obvious if $f(x) = h(x')g(x_m)$ where $g$ and $h$ are two continuous functions and the general case follows by linearity and density. Therefore being transverse is a local property. This allows us to extend this definition to any Radon measure $\sigma$, whatever its support. By a suitable choice of the coordinate system Definition 8 can be extended to any nonzero vector $v \in \mathbb{R}^m$.

Lemma 6 With the notations of Definitions 10 and 11 (Sect. 4) the vector $\alpha$ is transverse to the measure $\sigma_U$ if and only if $\alpha$ is transverse to $U$.

This is proved in the Appendix. A new characterization of transversality is given now. It relies on the following definition.

Definition 9 Let $v \in \mathbb{R}^m$, $v \ne 0$, and let $L$ be the line containing $v$. Let $\tau$ be a Radon measure supported by the line $L$. We write $\tau \in \mathcal{T}_v$ if $\tau$ is absolutely continuous with respect to the Lebesgue measure on $L$ and if the density $\omega$ of $\tau$ is a compactly supported continuous function.

Lemma 7 With these notations $v$ is transverse to $\sigma$ if and only if $\sigma * \tau$ is a continuous function on $\mathbb{R}^m$ for any $\tau \in \mathcal{T}_v$.

The proof is immediate. Indeed in a coordinate system in which $v$ is $(0, \ldots, 0, 1)$ we have $\sigma * \tau = \int_{-\infty}^{+\infty}\omega(x_m - t)\,d\sigma(x', t) = J(x', x_m)$. But $J(x', x_m)$ is a continuous function of $x'$ for any $x_m$ by the definition of transversality. Moreover the continuity of the operator $M$ implies that $J(x', x_m)$ depends continuously on $x_m$. Therefore $J(x', x_m)$ is continuous on $\mathbb{R}^m$.
The converse implication is even easier. We now reach our goal.

Theorem 2 Let $\sigma$ be a $\mathbb{Z}^m$-periodic Radon measure on $\mathbb{R}^m$. If $v$ is transverse to $\sigma$ then the trace $\sigma^L$ of $\sigma$ on any line $L$ parallel to $v$ is an almost periodic measure on $L$.

The problem is translation invariant and it can be assumed that $L$ is the line containing $v$. Let $\omega$ be a compactly supported continuous function on $\mathbb{R}$. Let us show that $\sigma^L * \omega$ is an almost periodic function. Let $\tau \in \mathcal{T}_v$ be defined by Definition 9 with density $\omega$. Then $\sigma^L * \omega$ is the trace on $L$ of $\sigma * \tau$. But $\sigma * \tau$ is a continuous function on $\mathbb{R}^m$ and is $\mathbb{Z}^m$-periodic. Its trace on $L$ is almost periodic, which ends the proof. The following lemma complements Theorem 2.

Lemma 8 The notations of Definition 9 are kept. Let $L$ be the line containing $v$. If the Fourier series of $\sigma$ is $\sum_{k\in\mathbb{Z}^m} b(k)\exp(2\pi i k\cdot x)$ the Fourier series of the almost periodic measure $\sigma^L$ is

$$\sigma^L \sim \sum_{k\in\mathbb{Z}^m} b(k)\exp(2\pi i (k\cdot v)t). \tag{5}$$
Let again $\omega$ be a compactly supported continuous function of a real variable and let $\tau \in \mathcal{T}_v$ be defined by Definition 9 with density $\omega$. The series $\sum_{k\in\mathbb{Z}^m}\hat\omega(k\cdot v)b(k)\exp(2\pi i k\cdot x)$ is the Fourier series of the convolution product $\sigma * \tau$. This function $\sigma * \tau$ is continuous on $\mathbb{R}^m$ and its trace on $L$ is the almost periodic function $\sigma^L * \omega$. The trace on $L$ of the Fourier series of $\sigma * \tau$ is obviously the Fourier series of $\sigma^L * \omega$. This ends the proof.

Corollary 1 Let $\sigma$ be a Radon measure on $\mathbb{T}^m$ and let $\alpha_1, \ldots, \alpha_m$ be $m$ positive real numbers linearly independent over $\mathbb{Q}$. If $\alpha = (\alpha_1, \ldots, \alpha_m)$ is transverse to $\sigma$ then $\mu = \sigma \circ h$ is an almost periodic measure on $\mathbb{R}$.

Indeed $\sigma$ is a $\mathbb{Z}^m$-periodic Radon measure on $\mathbb{R}^m$ and we are reduced to Theorem 2. Here is a second definition of $\sigma \circ h$. We denote by $w$ an even non negative $C^\infty$ function supported by the interval $[-1, 1]$ such that $\int w = 1$ and we set $w_\epsilon(t) = (1/\epsilon)w(t/\epsilon)$. The support of $w_\epsilon$ is the interval $[-\epsilon, \epsilon]$. Let $\tau_\epsilon$ be the image of the probability measure $w_\epsilon(t)\,dt$ by $h : \mathbb{R} \to \mathbb{T}^m$. Let $\sigma$ be a Radon measure on $\mathbb{T}^m$ and let us assume that $\alpha$ is transverse to $\sigma$. Then $F_\epsilon = \sigma * \tau_\epsilon$ is a continuous function and $f_\epsilon = F_\epsilon \circ h$ makes sense. Then $\sigma \circ h$ is the weak-star limit of $f_\epsilon$ as $\epsilon$ tends to 0.
4 Sparse Crystalline Measures

We now describe the measure $\sigma$ which is used to construct $\mu$. In order to keep the discussion to its simplest level we start with a $C^\infty$ function $\phi : \mathbb{T}^{m-1} \to \mathbb{T}$. Then $\phi$ can be identified with a continuous function defined on $\mathbb{R}^{m-1}$ and satisfying the functional equation $\phi(x + k) = \phi(x) + q\cdot k$, $k \in \mathbb{Z}^{m-1}$, where $q \in \mathbb{Z}^{m-1}$. Let us denote the graph of $\phi$ by $U \subset \mathbb{T}^{m-1} \times \mathbb{T}$. In other terms $U = \{(x, \phi(x)),\ x \in \mathbb{T}^{m-1}\}$.

Definition 10 With the preceding notations the image of the Haar measure on $\mathbb{T}^{m-1}$ by the map $x \mapsto (x, \phi(x))$ is denoted by $\sigma_U$.

The support of $\sigma_U$ is $U$ and $\sigma_U$ is our measure $\sigma$.

Definition 11 We say that the vector $\alpha$ is transverse to $U$ if there exists a $\beta > 0$ such that $|\alpha_1\partial_1\phi + \cdots + \alpha_{m-1}\partial_{m-1}\phi - \alpha_m| \ge \beta$ on $\mathbb{T}^{m-1}$.

If the vector $\alpha$ is transverse to $U$ then it is transverse to the measure $\sigma_U$. As is proved in Sect. 3 the measure $\mu = \sigma_U \circ h$ is well defined and is an almost periodic measure. Moreover $\mu$ is a sum of weighted Dirac masses on a uniformly discrete set $\Lambda$. These properties are immediate and are checked in Sect. 6. This measure $\mu = \sigma_U \circ h$ is the left hand side of (2). The geometrical properties of $\mu$ are then elucidated and it remains to study its spectral properties.
We now focus on the spectral properties of $\mu = \sigma_U \circ h$. To address this issue Theorem 1 is used. Instead of computing the distributional Fourier transform of $\mu$ one computes its Fourier series, which is much easier since it coincides with the trace on $h(\mathbb{R})$ of the Fourier series of $\sigma_U$. The Fourier series of $\sigma_U$ is computed with a seminal tool provided by P.R. Ahern [1]. Before defining Ahern's measures one needs to define holomorphic distributions on $\mathbb{T}^m$. A distribution $\tau$ on $\mathbb{T}^m$ is holomorphic if its Fourier coefficients $\hat\tau(k)$ satisfy $\hat\tau(k) = 0$ unless $k_j \ge 0$, $1 \le j \le m$. In other terms there exists a holomorphic function $F$ in the polydisc $D^m$ such that $\tau$ is the trace of $F$ on the distinguished boundary $\mathbb{T}^m$ of $D^m$. The following definition was proposed by P.R. Ahern in [1].

Definition 12 A real valued Radon measure $\sigma$ on $\mathbb{T}^m$ is an Ahern measure if it is the real part of a holomorphic distribution. A complex valued measure is an Ahern measure if its real part and its imaginary part are Ahern measures.

This condition is empty if $m = 1$. Equivalently $\sigma$ is an Ahern measure if $\hat\sigma(k) = 0$ unless either all coordinates $k_j$ are nonnegative or all coordinates $k_j$ are nonpositive. The following theorem opens the door to the construction of crystalline measures:
Theorem 3 Let us assume that the measure $\sigma_U$ on $\mathbb{T}^m$ enjoys the two following properties:
(a) The vector $\alpha$ is transverse to $U$ and all $\alpha_j$ are positive.
(b) $\sigma_U$ is an Ahern measure.
Then $\mu = \sigma_U \circ h$ is an almost periodic measure. It is a sum of weighted Dirac masses on a uniformly discrete set $\Lambda$ and we have

$$\mu = \sum_{k\in\mathbb{Z}^m}\hat\sigma_U(k)\exp[2\pi i (k\cdot\alpha)t] \tag{6}$$
in the distributional sense. Therefore $\mu$ is a sparse crystalline measure.

The first assertions are implied by Theorem 2 and by the explicit calculation of $\mu$ given in the Appendix. The Fourier series of the almost periodic measure $\mu$ is the trace of the Fourier series of $\sigma_U$. Since $\sigma_U$ is an Ahern measure its Fourier coefficients vanish unless all $k_j$, $1 \le j \le m$, have the same sign. Since all $\alpha_j$ are positive, this gives $|k\cdot\alpha| = |k_1|\alpha_1 + \cdots + |k_m|\alpha_m$. It implies that the set of frequencies $k\cdot\alpha$ which are present in the right hand side of (6) is locally finite. Therefore the right hand side of (6) converges to $\mu$ in the distributional sense and $\mu$ is a crystalline measure. This ends the proof of Theorem 3.

The definition of $\sigma_U$ and condition (a) are two geometrical properties of $\sigma_U$ while condition (b) is of spectral nature. Theorem 3 says that a geometrical object is at the same time a spectral object. But Theorem 3 would be an empty statement if the hypotheses (a) and (b) were incompatible. Fortunately nontrivial examples of surfaces $U$ such that $\sigma_U$ is an Ahern measure are given in Sect. 6. Theorem 3 is then
completed by Theorems 4 and 5 which imply the existence of sparse crystalline measures.
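The role of the sign condition in the proof of Theorem 3 can be illustrated by a small count. In the sketch below, $m = 2$ and $\alpha = (1, \sqrt{2})$ is a hypothetical choice of positive frequencies; the bound `R`, the box size `K` and the function names are also assumptions. The code counts the indices $k \in [-K, K]^2$ with $|k\cdot\alpha| \le R$, first among the $k$ whose coordinates all have the same sign, then among all $k$.

```python
import math

ALPHA = (1.0, math.sqrt(2.0))  # hypothetical positive frequencies, independent over Q

def same_sign(k):
    """True if all coordinates of k are >= 0 or all are <= 0."""
    return all(c >= 0 for c in k) or all(c <= 0 for c in k)

def count_frequencies(R, K, restrict_to_same_sign):
    """Count the k in [-K, K]^2 with |k . alpha| <= R, optionally keeping only
    the k whose coordinates all have the same sign."""
    count = 0
    for k1 in range(-K, K + 1):
        for k2 in range(-K, K + 1):
            if restrict_to_same_sign and not same_sign((k1, k2)):
                continue
            if abs(k1 * ALPHA[0] + k2 * ALPHA[1]) <= R:
                count += 1
    return count

for K in (10, 50):
    print(K, count_frequencies(3.0, K, True), count_frequencies(3.0, K, False))
```

With the sign restriction the count stops growing once `K` is large enough, in line with the local finiteness used in the proof. Without it the count keeps increasing with `K`, reflecting the fact that $\alpha_1\mathbb{Z} + \alpha_2\mathbb{Z}$ is dense in $\mathbb{R}$.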
5 Curved Model Sets

The goal of this section is to bridge the gap between quasi-crystals, model sets and crystalline measures. Let us begin with the simplest examples. Let $s(x)$ be the sawtooth function and let $\alpha > 0$ be irrational. Then the set $\Lambda = \{k + s(\alpha k);\ k \in \mathbb{Z}\}$ is a model set. An example of a curved model set is given by $\Lambda = \{k + \sin(2\pi\alpha k);\ k \in \mathbb{Z}\}$. The sawtooth function has been replaced by $\sin(2\pi x)$. On this example we observe that curved model sets are close to model sets. If $\Lambda$ is a model set (but not a lattice) then $\sigma = \sum_{\lambda\in\Lambda}\delta_\lambda$ is not an almost periodic measure. The lack of regularity of $s(x)$ is responsible for this drawback. In the example of a curved model set $\sigma$ is an almost periodic measure and its distributional Fourier transform $\hat\sigma$ is an atomic measure, as is proved in [10]. If in $\Lambda = \{k + \sin(2\pi\alpha k);\ k \in \mathbb{Z}\}$ the sine function is replaced by a suitable $2\pi$-periodic analytic function, $\sigma$ is a crystalline measure. This is proved by Kurasov and Sarnak in [5].

Let us recall the definition of model sets [9, 10]. Let $m \in \mathbb{N}$, let $\mathbb{T}^m$ be the $m$-dimensional torus, and let $\mathbb{R}^m \to \mathbb{T}^m$ be the quotient map. Let $\alpha_1, \ldots, \alpha_m$ be $m$ real numbers which are linearly independent over the field $\mathbb{Q}$. To define a model set one begins with an $(m-1)$-dimensional affine space $V \subset \mathbb{R}^m$ and a compact set $K \subset V$. Let $U$ be the image of $K$ under the quotient map. Let us assume that the vector $\alpha = (\alpha_1, \ldots, \alpha_m)$ is not parallel to $V$. Let $h$ be the embedding of Definition 6. Then we have:

Definition 13 The model set $\Lambda \subset \mathbb{R}$ defined by $\alpha$ and $U$ is $\Lambda = \{t \in \mathbb{R};\ h(t) \in U\}$.
If $\Lambda$ is a model set the Fourier transform $\hat\mu$ of the measure $\mu = \sum_{\lambda\in\Lambda}\delta_\lambda$ is not a measure. This is due to the boundary of $U$. If $U$ is replaced by a smooth $(m-1)$-dimensional manifold without boundary the Fourier transform $\hat\mu$ is an atomic measure [10, 11]. This advocates for the choice of a curved model set in our construction. But we want the support of $\hat\mu$ to be locally finite. Ahern's measures and inner functions will take care of this requirement in Sect. 6.

Definition 14 Let $U \subset \mathbb{T}^{m-1} \times \mathbb{T}$ be the graph of a $C^\infty$ function $\phi : \mathbb{T}^{m-1} \to \mathbb{T}$ and let $h : \mathbb{R} \to \mathbb{T}^m$ be the embedding of Definition 6. Let us assume that there exists a $\beta > 0$ such that $|\alpha_1\partial_1\phi + \cdots + \alpha_{m-1}\partial_{m-1}\phi - \alpha_m| \ge \beta$ on $\mathbb{T}^{m-1}$. Then the curved model set $\Lambda_U$ is $\Lambda_U = \{t \in \mathbb{R};\ h(t) \in U\}$.
This is equivalent to the condition that the vector $\alpha$ is transverse to the measure $\sigma_U$. This hypothesis implies that $\Lambda_U$ is a Delone set. In the applications which are given in Sect. 7 we have $m = 2$, $\phi$ is a decreasing function, $\alpha_1 = \alpha > 0$, $\alpha_2 = 1$, and our condition is obviously satisfied.
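The case $m = 2$, $\alpha_1 = \alpha$, $\alpha_2 = 1$ of Definition 14 can be explored numerically. The sketch below uses a hypothetical decreasing function $\phi(x) = -x - \epsilon\sin(2\pi x)$ (so that $\phi(x+1) = \phi(x) - 1$) and hypothetical values of $\alpha$ and $\epsilon$; none of these choices comes from the text. A point $t$ belongs to $\Lambda_U$ exactly when $\psi(t) = t - \phi(\alpha t)$ is an integer. Since $\psi' = 1 - \alpha\phi'$ stays between $1 + \alpha(1 - 2\pi\epsilon)$ and $1 + \alpha(1 + 2\pi\epsilon)$, each integer is attained exactly once and the gaps of $\Lambda_U$ are bounded from below and from above.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0  # hypothetical irrational alpha_1; alpha_2 = 1
EPS = 0.1                     # hypothetical size of the smooth perturbation

def phi(x):
    """Lift to R of a smooth decreasing map T -> T: phi(x + 1) = phi(x) - 1."""
    return -x - EPS * math.sin(2.0 * math.pi * x)

def psi(t):
    """h(t) = (ALPHA * t, t) lies on the graph of phi exactly when psi(t) is an integer."""
    return t - phi(ALPHA * t)

def point(n, iterations=200):
    """Unique solution of psi(t) = n, found by bisection (psi is increasing)."""
    lo = (n - 1.0) / (1.0 + ALPHA)
    hi = (n + 1.0) / (1.0 + ALPHA)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if psi(mid) < n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def curved_model_set(n_min, n_max):
    """Points of the curved model set indexed by the integers n_min..n_max."""
    return [point(n) for n in range(n_min, n_max + 1)]

points = curved_model_set(-50, 50)
gaps = [b - a for a, b in zip(points, points[1:])]
print(min(gaps), max(gaps))
```

The printed minimal and maximal gaps give a concrete view of the Delone property in this toy example: the set is uniformly discrete and has no arbitrarily large holes.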
6 Ahern Measures and Inner Functions

We now construct Ahern measures. The group of complex numbers of modulus 1 is denoted by $\mathbb{T}$. Let $D^m$ be the open polydisc defined by $|z_j| < 1$, $1 \le j \le m$. The distinguished boundary of $D^m$ is the $m$-dimensional torus $\mathbb{T}^m$.

Definition 15 A continuous function $f$ on $\mathbb{T}^m$ belongs to the polydisc algebra $A(D^m)$ if there exists a continuous function $F$ on the closure of $D^m$ with the following properties:
(a) $F$ is analytic in $D^m$.
(b) $F = f$ on $\mathbb{T}^m$.
A function $f \in A(D^m)$ is an inner function if $|f| = 1$ everywhere on $\mathbb{T}^m$. If $f$ and $g$ are two inner functions their product is still an inner function. In one dimension inner functions belonging to the disc algebra are Blaschke products. A Blaschke product is defined by a finite set $z_1, \ldots, z_N$ of complex numbers belonging to the open unit disc. We have $B(z) = \prod_1^N \frac{z - z_j}{1 - \bar z_j z}$. The phase $\phi$ of this Blaschke product is the continuous real valued function of the real variable $t$ which is defined by $\exp(2\pi i\phi(t)) = B(\exp(2\pi i t))$. Then $\phi$ is increasing and we have $\phi(t+1) = \phi(t) + N$ where $N \in \mathbb{N}$.

The disc algebra and the inner functions can obviously be defined on $\mathbb{T}^m$. One uses Definition 15 and the canonical isomorphism between $\mathbb{T}^m = (\mathbb{R}/\mathbb{Z})^m$ and $\mathbb{T}^m$ given by $(x_j)_{1\le j\le m} \mapsto (\exp(2\pi i x_j))_{1\le j\le m}$. Let $S \subset \mathbb{Z}^m$ be defined by $S = \{k;\ k_j \ge 0,\ 1 \le j \le m\}$. As was said above a Radon measure $\mu$ on $\mathbb{T}^m$ is an Ahern measure if its Fourier transform is supported by $S \cup (-S)$.

Theorem 4 Let $\phi : \mathbb{T}^{m-1} \to \mathbb{T}$ be a continuous function, let $U \subset \mathbb{T}^m$ be the graph of $\phi$ and let $\sigma_U$ be the image of the Haar measure on $\mathbb{T}^{m-1}$ by the map $x \mapsto (x, \phi(x))$. Then $\sigma_U$ is an Ahern measure if and only if $\exp(-2\pi i\phi)$ is an inner function on $\mathbb{T}^{m-1}$.

The proof is immediate. Let us assume that $\exp(-2\pi i\phi)$ is an inner function and prove that $\sigma_U$ is an Ahern measure. We have

$$\hat\sigma_U(k) = \int_{\mathbb{T}^{m-1}} \exp(-2\pi i k_m\phi)\exp(-2\pi i(k_1x_1 + \cdots + k_{m-1}x_{m-1}))\,dx.$$
We know that the function $\exp(-2\pi i\phi)$ is an inner function. Therefore $\exp(-2\pi i k_m\phi)$ is also an inner function if $k_m \ge 0$ and $\hat\sigma_U(k) = 0$ unless $k_1 \ge 0, \ldots, k_{m-1} \ge 0$. Since $\hat\sigma_U(-k) = \overline{\hat\sigma_U(k)}$ the proof is complete. The converse implication is as easy. Theorem 5 improves on Theorem 3.
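The vanishing of the mixed-sign Fourier coefficients in the proof of Theorem 4 can be checked numerically when $m = 2$. In the sketch below the role of $\exp(-2\pi i\phi(x))$ is played by $H(x) = B(\exp(2\pi i x))$, where $B$ is a Blaschke product with two hypothetical zeros; the zeros, the number of sample points `N` and the function names are assumptions. The coefficient $\hat\sigma_U(k_1, k_2)$ is then the $k_1$-th Fourier coefficient of $H^{k_2}$, approximated here by a Riemann sum.

```python
import cmath
import math

ZEROS = (0.5 + 0j, -0.3 + 0.4j)  # hypothetical zeros of B, inside the open unit disc
N = 256                          # number of sample points on the circle

def blaschke(z):
    """Finite Blaschke product with the zeros listed in ZEROS."""
    value = 1.0 + 0j
    for a in ZEROS:
        value *= (z - a) / (1.0 - a.conjugate() * z)
    return value

def sigma_hat(k1, k2):
    """Riemann-sum approximation of the integral over [0, 1) of
    H(x)**k2 * exp(-2*pi*i*k1*x), with H(x) = B(exp(2*pi*i*x))."""
    total = 0j
    for n in range(N):
        x = n / N
        H = blaschke(cmath.exp(2j * math.pi * x))
        total += H ** k2 * cmath.exp(-2j * math.pi * k1 * x)
    return total / N

print(abs(sigma_hat(-1, 2)), abs(sigma_hat(1, 2)))
```

In this example the first printed value is zero up to rounding, since $k_1 < 0 < k_2$, while the second one is a genuine coefficient attached to indices of the same sign.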
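Theorem 5 Let $\phi : \mathbb{T}^{m-1} \to \mathbb{T}$ be a $C^\infty$ function, let $U \subset \mathbb{T}^m$ be the graph of $\phi$ and let $\sigma_U$ be the image of the Haar measure on $\mathbb{T}^{m-1}$ by the map $x \mapsto (x, \phi(x))$. Let us assume that $\exp(-2\pi i\phi)$ is an inner function on $\mathbb{T}^{m-1}$ and that $|\alpha_1\partial_1\phi + \cdots + \alpha_{m-1}\partial_{m-1}\phi - \alpha_m| \ge c > 0$ on $\mathbb{T}^{m-1}$. Then $\mu = \sigma_U \circ h$ is a sum of weighted Dirac masses on a uniformly discrete set $\Lambda$ and we have

$$\mu \sim \sum_{k\in\mathbb{Z}^m}\hat\sigma_U(k)\exp[2\pi i(k\cdot\alpha)t].$$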
We do not have $\mu = \sum_{\lambda\in\Lambda}\delta_\lambda$. This improvement on Theorem 5 is obtained in Sect. 7 on a specific example.
7 The Set Constructed by Kurasov and Sarnak Is a Curved Model Set Let us compare our approach to what was achieved by Kurasov and Sarnak in [5]. The starting point of their construction is the definition of a stable polynomial. A stable polynomial .P (z) of m complex variables .zj , 1 ≤ j ≤ m, is defined by the condition that .P (z) does not vanish when .|zj | < 1, 1 ≤ j ≤ m, or when m is defined by .P (z) = 0 where .|zj | > 1, 1 ≤ j ≤ m. In [5] the manifold .U ⊂ T P is stable, . is the corresponding curved model set and .αj > 1. In a forthcoming paper Theorem 5 will be improved and the manifold U is no longer assumed to be a graph. Instead U will be defined by the equation .H (·) = 1 where H is a smooth inner function on .Tm . In Theorem 5 we are given an inner function .H0 on .Tm−1 and the inner function H has the special structure .H (z) = zm H0 (z1 , . . . , zm−1 ). Therefore the equation .H (z) = 0 defines a graph. We now compare these two definitions of U and show that the one given by Kurasov and Sarnak is more general. Walter Rudin and E.L. Stout [15] proved that a smooth inner function in the polydisc is a rational function .H = MA∗ /A where A is a polynomial which does not vanish is obtained by replacing all the coefficients of A by in the closed polydisc, .A(z) nm their complex conjugates, and .A∗ (z) = A(1/z). Finally .M(z) = z1n1 . . . zm is a ∗ monomial such that .B = MA is a polynomial. In our approach the manifold U is defined by .H (·) = 1 which is equivalent to .B(·) = A(·). But the same U can also be defined by .P (·) = 0 where .P = B − A. Lemma 9 This polynomial .P = B − A is a stable polynomial. We first prove that P does not vanish in the polydisc. Otherwise we would have H (z) = B(z)/A(z) = 1 for some .z ∈ Dm . If H is not identically equal to 1 this is impossible by the maximum principle since .|H (·)| = 1 on the distinguished boundary of the polydisc. We now prove that P does not vanish if .|zj | > 1, 1 ≤
j ≤ m. The conjugate polynomial in the sense given by Kurasov and Sarnak is $Q(z) = M(z)\tilde{P}(1/z) = A(z) - B(z)$, where $\tilde{P}$ is obtained from P by replacing its coefficients by their complex conjugates. Indeed $M(z)\tilde{B}(1/z) = A(z)$ and $M(z)\tilde{A}(1/z) = B(z)$. We have $Q(z) \ne 0$ in the polydisc: a zero of Q in the polydisc would imply $A(z) = B(z)$ there, which cannot happen as it was proved earlier. Since $Q(z) = M(z)\overline{P(1/\bar{z})}$, a zero of P with $|z_j| > 1$, $1 \le j \le m$, would produce a zero of Q in the polydisc. Lemma 9 is now proved and implies that the curved model sets built from inner functions belong to the class defined by Kurasov and Sarnak. But the opposite implication could be wrong and there could exist a Kurasov–Sarnak crystalline measure which cannot be obtained by our approach.

In [9] Olevskii and Ulanovskii improved on the results of [5]. They construct a class of crystalline measures which is larger than the one constructed in [5]. The construction of Kurasov and Sarnak is based on the spectral properties of the almost periodic function
$$g(t) = P(\exp(2\pi i\alpha_1 t), \ldots, \exp(2\pi i\alpha_m t)),$$

where P is a stable polynomial. Olevskii and Ulanovskii give up stable polynomials. Instead they directly start with a trigonometric sum

$$g(z) = \sum_{j=1}^{N} c_j \exp(2\pi i\omega_j z), \quad z \in \mathbb{C},$$
where the frequencies $\omega_j$ are real numbers. They assume that the zeros of g are real, simple and form a uniformly discrete set $\Lambda$. This assumption is satisfied when $g(t) = P(\exp(2\pi i\alpha_1 t), \ldots, \exp(2\pi i\alpha_m t))$ where P is a stable polynomial. Then the authors construct a crystalline measure $\mu$ supported by $\Lambda$. Here is an example taken from [9]. Translated in our language the manifold $U \subset \mathbb{T}^2$ used in [9] is the graph of the function $\phi(\theta) = \arcsin(\sin\theta/2)$. Then the function $\exp(-i\phi(\theta))$ cannot be an inner function since every smooth inner function on the circle group is a Blaschke product. Therefore the crystalline measure constructed in [9] cannot be obtained from our recipe.

We now focus on two remarkable examples given in [5]. In the first one we start from the stable polynomial $P(z_1, z_2) = 2z_1z_2 + z_1 + z_2 + 2$. We then have $Q = P$ and $U \subset \mathbb{T}^2$ is defined by $P = 0$. This is equivalent to $z_2 = -\frac{2+z_1}{1+2z_1}$ which is exactly the definition of U given in Theorem 5 with the inner function $-\frac{1+2z_1}{2+z_1}$. It can also be written $H(z) = 1$ where $H(z) = -z_2\frac{1+2z_1}{2+z_1}$. On this example the two approaches are equivalent.

On the second example $P(z) = 1 - \frac{1}{3}z_1 + \frac{1}{3}z_2^2 - z_1z_2^2$ and $U \subset \mathbb{T}^2$ is defined by $P(z) = 0$. Exchanging indices U is also defined by $-2\tan(\pi x_2) = \tan(2\pi x_1)$. The coordinates which were used in [5] are permuted here which implies that U is the graph of a $C^\infty$ function $\phi$. We have

$$\exp(2\pi i\phi(t)) = \frac{2\cos(2\pi t) - i\sin(2\pi t)}{2\cos(2\pi t) + i\sin(2\pi t)}.$$

We set $z = \exp(2\pi i t)$ and obtain $\exp(-2\pi i\phi(t)) = \frac{3z^2+1}{z^2+3}$ which is a Blaschke product whose zeros are $\pm i/\sqrt{3}$. This function $\phi$ can also be defined by $\phi(0) = 0$ and $\phi'(t) = -\frac{4}{1+3\cos^2(2\pi t)}$. It is important to observe that $\phi$ is decreasing. We have
$\phi(t+1) = \phi(t) - 2$. Then $\psi(t) = \phi(t) + 2t$ is a 1-periodic $C^\infty$ function. Let $\alpha > 0$ be irrational and let

$$\Lambda = \{t;\ (\alpha t, t) \in U\}. \tag{7}$$

Equivalently $t \in \Lambda$ if and only if $\alpha t = \alpha\phi(\alpha t) + \alpha k$, $k \in \mathbb{Z}$. Then $s = \alpha t$ satisfies $s + 2\alpha s - \alpha\psi(s) = \alpha k$ or $s - \frac{\alpha}{1+2\alpha}\psi(s) = \frac{\alpha}{1+2\alpha}k$. But $\Theta(s) = s - \frac{\alpha}{1+2\alpha}\psi(s)$ is a diffeomorphism of the real line and we have $\Theta(s+k) = \Theta(s) + k$, $\forall k \in \mathbb{Z}$. The inverse diffeomorphism $\rho$ also satisfies this functional equation. Therefore $\rho(x) - x$ is a smooth 1-periodic function. We proved the following:
Theorem 6 There exists a 1-periodic $C^\infty$ function r on the real line such that the set $\Lambda$ defined by (7) is the sequence $\lambda_k = \frac{\alpha k}{1+2\alpha} + r\left(\frac{k}{1+2\alpha}\right)$, $k \in \mathbb{Z}$.

We now prove that $\sigma = \sum_{\lambda\in\Lambda} \delta_\lambda$ is a crystalline measure. The proof relies on the following theorem:

Theorem 7 Let $\phi : \mathbb{T} \to \mathbb{T}$ be a $C^\infty$ function and let us assume that $\exp(-2\pi i\phi)$ is an inner function. Let $U \subset \mathbb{T}^2$ be the graph of $\phi$. Let $\omega$ be the function defined on U by $\omega(t, \phi(t)) = \phi'(t)$. Then $\nu = \omega\sigma_U$ is an Ahern measure.

We have for any $(m, n) \in \mathbb{Z}^2$

$$\hat{\nu}(m, n) = \int_0^1 \exp(-2\pi i m t)\exp(-2\pi i n\phi(t))\,\phi'(t)\,dt.$$

If $n \ne 0$ we simply integrate by parts and obtain

$$\hat{\nu}(m, n) = -\frac{m}{n}\int_0^1 \exp(-2\pi i m t)\exp(-2\pi i n\phi(t))\,dt$$

which vanishes unless m and n have the same sign as it was already proved. This ends the proof of Theorem 7.

Let $\alpha > 0$ be an irrational number and let us define the map $h : \mathbb{R} \to \mathbb{T}^2$ by $h(t) = (\alpha t, t)$. As above we set $\Lambda = \{t;\ (\alpha t, t) \in U\}$.
Theorem 5 is applied to the measures $\sigma_U$ and $\omega\sigma_U$ and yields two atomic measures $\mu_1 = \sigma_U \circ h$ and $\mu_\omega = \omega\sigma_U \circ h$. We have

$$\mu_1 = \sum_{\lambda\in\Lambda} c(\lambda)\delta_\lambda$$

and the explicit calculation given in the Appendix implies

$$c(\lambda) = (\phi'(\alpha\lambda) - \alpha)^{-1}.$$

Similarly we have

$$\mu_\omega = \sum_{\lambda\in\Lambda} e(\lambda)\delta_\lambda,$$

where

$$e(\lambda) = \phi'(\alpha\lambda)(\phi'(\alpha\lambda) - \alpha)^{-1}.$$

Both $\mu_1$ and $\mu_\omega$ are crystalline measures. We obviously have $\sigma = -\alpha\mu_1 + \mu_\omega$. Therefore $\sigma = \sum_{\lambda\in\Lambda} \delta_\lambda$ is a crystalline measure. This is one of the remarkable examples obtained by the Kurasov–Sarnak scheme.
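The second example can be checked numerically. The sketch below is an illustration, not part of the proof. The closed form used for $\psi$ is not stated in the text; it is derived here from $\exp(2\pi i\phi(t)) = (z^2+3)/(3z^2+1)$ with $z = \exp(2\pi i t)$. The function names, the fixed-point iteration and the choice $\alpha = \sqrt{2}$ are assumptions made for the illustration.

```python
import numpy as np

def psi(t):
    """1-periodic part of phi, psi(t) = phi(t) + 2t (closed form derived, not quoted)."""
    x = 4.0 * np.pi * np.asarray(t, dtype=float)
    return np.arctan(np.sin(x) / (3.0 + np.cos(x))) / np.pi

def phi(t):
    """Smooth decreasing function with phi(0) = 0 and phi(t + 1) = phi(t) - 2."""
    return psi(t) - 2.0 * np.asarray(t, dtype=float)

def model_set(alpha, ks, iters=200):
    """Points t with (alpha*t, t) in U, one for each integer k in ks.

    Solves s - c*psi(s) = c*k with c = alpha/(1 + 2*alpha) by fixed-point
    iteration (a contraction, since |c * psi'| <= 2c < 1), then returns s/alpha.
    """
    c = alpha / (1.0 + 2.0 * alpha)
    target = c * np.asarray(ks, dtype=float)
    s = target.copy()
    for _ in range(iters):
        s = target + c * psi(s)
    return s / alpha

alpha = np.sqrt(2.0)                 # illustrative irrational slope
ks = np.arange(-5, 6)
lam = model_set(alpha, ks)
residual = lam - phi(alpha * lam)    # should reproduce the integers ks
```

The residual recovers the integers k, which is the membership condition $t = \phi(\alpha t) + k$ used above, and the computed points are strictly increasing in k.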
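Appendix

The details of the computation of $\mu = \sigma_U \circ h$ are given. We define a narrow strip $U_\epsilon$ around U by $U_\epsilon = \{x + \alpha t;\ x \in U, |t| \le \epsilon\}$. If $\epsilon$ is small enough the map $G : \mathbb{T}^{m-1} \times [-\epsilon, \epsilon] \to U_\epsilon$ defined by

$$G(x, t) = (x_1 + \alpha_1 t, \ldots, x_{m-1} + \alpha_{m-1} t, \phi(x_1, \ldots, x_{m-1}) + \alpha_m t)$$

is a diffeomorphism. This is due to the transversality and the implicit function theorem. The Jacobian determinant of G is $|\alpha_1\partial_1\phi + \ldots + \alpha_{m-1}\partial_{m-1}\phi - \alpha_m|$. We denote by w an even non negative $C^\infty$ function supported by the interval $[-1, 1]$ such that $\int w = 1$ and we set $w_\epsilon(t) = (1/\epsilon)w(t/\epsilon)$. The support of $w_\epsilon$ is the interval $[-\epsilon, \epsilon]$. Let $\tau_\epsilon$ be the image of the probability measure $w_\epsilon(t)\,dt$ by $h : \mathbb{R} \to \mathbb{T}^m$. Then the convolution product $\sigma_U^\epsilon = \sigma_U * \tau_\epsilon$ is a $C^\infty$ function on $\mathbb{T}^m$. A straightforward calculation yields for any $x = y + \alpha s \in U_\epsilon$ where $y = (u, \phi(u)) \in U$, $u \in \mathbb{T}^{m-1}$, and $|s| \le \epsilon$:

$$\sigma_U^\epsilon(x) = \frac{w_\epsilon(s)}{|\alpha_1\partial_1\phi(u) + \ldots + \alpha_{m-1}\partial_{m-1}\phi(u) - \alpha_m|}.$$

It implies with $\alpha' = (\alpha_1, \ldots, \alpha_{m-1})$

$$\sigma_U^\epsilon \circ h = \sum_{\lambda\in\Lambda} \frac{w_\epsilon(t - \lambda)}{|\alpha_1\partial_1\phi(\alpha' t) + \ldots + \alpha_{m-1}\partial_{m-1}\phi(\alpha' t) - \alpha_m|}. \tag{8}$$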
On the other hand the Fourier series of the $C^\infty$ function $\sigma_U^\epsilon$ is

$$\sum_{k\in\mathbb{Z}^m} \hat{\sigma}_U(k)\,\hat{w}(\epsilon\,k\cdot\alpha)\exp(2\pi i k\cdot x).$$

This series is absolutely convergent. Therefore the Fourier series of $\sigma_U^\epsilon \circ h$ is

$$\sum_{k\in\mathbb{Z}^m} \hat{\sigma}_U(k)\,\hat{w}(\epsilon\,k\cdot\alpha)\exp[2\pi i(k\cdot\alpha)t].$$

It suffices to let $\epsilon$ tend to 0 to obtain

$$\sum_{\lambda\in\Lambda} \omega(\lambda)\delta_\lambda = \sum_{k\in\mathbb{Z}^m} \hat{\sigma}_U(k)\exp[2\pi i(k\cdot\alpha)t],$$

where

$$\omega(\lambda) = |\alpha_1\partial_1\phi(\alpha'\lambda) + \ldots + \alpha_{m-1}\partial_{m-1}\phi(\alpha'\lambda) - \alpha_m|^{-1}.$$

The same tools are used in the proof of Lemma 6. If $\alpha$ is transverse to U then $\alpha$ is transverse to $\sigma_U$. This is implied by Lemma 7 and (8). In the opposite direction if the vector $\alpha$ is not transverse to U there exists a $u_0 \in U$ such that $u_0 + \alpha$ is tangent to U. Definition 7 is used in a coordinate system in which $\alpha$ is $(0, 0, \ldots, 0, 1)$ and $u_0 = 0$. Then it is a simple exercise to check that $wg$ is unbounded in a neighborhood of 0 when $g = 1$ on U.

Acknowledgement The referee was patient, helpful, and inspiring. This work was supported by a grant from the Simons Foundation (601950, YM).
References

1. P. R. Ahern. Inner functions in the polydisc and measures on the torus. Michigan Math. J. 20 (1973).
2. A. P. Guinand. Concordance and the harmonic analysis of sequences. Acta Math. 101 (1959), 235–271.
3. H. L. Hamburger. Über einige Beziehungen, die mit der Funktionalgleichung der Riemannschen ζ-Funktion äquivalent sind. Math. Ann. (1922), 129–140.
4. J.-P. Kahane et S. Mandelbrojt. Sur l'équation fonctionnelle de Riemann et la formule sommatoire de Poisson. Annales scientifiques de l'E.N.S., tome 75 (1958), 57–80.
5. P. Kurasov and P. Sarnak. Stable polynomials and crystalline measures. https://arxiv.org/abs/2004.05678v1
6. N. Lev and A. Olevskii. Measures with uniformly discrete support and spectrum. C. R. Math. Acad. Sci. Paris 351 (2013), 613–617.
7. N. Lev and A. Olevskii. Quasicrystals and Poisson's summation formula. Invent. Math. 200 (2015), no. 2, 585–606.
8. N. Lev and A. Olevskii. Quasicrystals with discrete support and spectrum. Revista Matemática Iberoamericana 32 (2016), no. 4.
9. Y. Meyer. Algebraic numbers and harmonic analysis. North-Holland, Amsterdam, 1972.
10. Y. Meyer. Global and local estimates on trigonometric sums. Transactions of The Royal Norwegian Society of Sciences and Letters, 2018.
11. Y. Meyer. Quasicrystals, almost periodic patterns, mean-periodic functions and irregular sampling. African Diaspora Journal of Mathematics 13 (2012), no. 1, 1–45. Special Issue.
12. Y. Meyer. Measures with locally finite support and spectrum. PNAS (2016), 3152–3158.
13. Y. Meyer. Measures with locally finite support and spectrum. Revista Matemática Iberoamericana 33 (2017), 1025–1036.
14. Y. Meyer. Guinand's measures are almost periodic distributions. Bulletin of the Hellenic Mathematical Society 61 (2017), 11–20.
15. W. Rudin and E. L. Stout. Boundary properties of functions of several complex variables. Journal of Mathematics and Mechanics (1965), 991–1005.
16. L. Schwartz. Théorie des distributions. Hermann (1950).
Diffusion Maps: Using the Semigroup Property for Parameter Tuning

Shan Shan and Ingrid Daubechies
Abstract Diffusion maps (DM) constitute a classic dimension reduction technique for data lying on or close to a (relatively) low-dimensional manifold embedded in a much higher-dimensional space. The technique consists in constructing a spectral parametrization for the manifold from simulated random walks or diffusion paths on the dataset. However, DM is hard to tune in practice. In particular, the choice of the diffusion time t used to construct the diffusion kernel matrix is critical. We address this problem by using the semigroup property of the diffusion operator. We propose a semigroup criterion for picking the “right” value for t. Experiments show that this principled approach is effective and robust.
S. Shan, University of Southern Denmark, Odense, Denmark, e-mail: [email protected]
I. Daubechies, Duke University, Durham, NC, USA, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_18

1 Introduction

Diffusion maps (DM) [1] are used in machine learning to achieve dimension reduction for data that are assumed to be sampled from a lower-dimensional manifold within a higher-dimensional setting; they are related to other kernel eigenmap methods such as Laplacian eigenmaps [12], local linear embedding [3], Hessian eigenmaps [4], and local tangent space alignment [5]. The basic idea is simple: diffusion on a manifold is governed by the semigroup generated by the manifold’s Laplace–Beltrami operator; the spectral analysis of the diffusion operator thus provides information about the manifold that can be used to provide a lower-dimensional parametrization for the data that also removes “noise” from the data inconsistent with the manifold hypothesis. One can (approximately) simulate a random walk or diffusion process on the (unknown) manifold by taking small steps within the dataset according to probabilities estimated from the distances between data points. Indeed, if the data points provide a sufficiently dense sampling of the manifold, then their distances (measured in the high-dimensional ambient space) within a close neighborhood of a fixed data point P are close approximations to the distances between the corresponding points within the pull-back of the neighborhood to the tangent plane at P; the diffusion kernel on the manifold can be likewise approximated (near P) by that on the tangent plane at P. Since the diffusion kernel in a Euclidean space takes the same form (up to normalization) regardless of the dimension, one can thus simply use the distances in the ambient large-dimensional Euclidean space to generate a reasonable approximation to the manifold diffusion kernel for short diffusion times.

More precisely, suppose we are given a set of points $\mathcal{D} = \{y_1, \ldots, y_N\}$ residing in high-dimensional Euclidean space $\mathbb{R}^K$. To compute a low-dimensional representation of the data with DM, we first construct a diffusion kernel matrix $W^{(t)}$,

$$W^{(t)}_{ij} = \exp\left(-\frac{d^2(y_i, y_j)}{t}\right). \tag{1}$$
Here, $d(\cdot, \cdot)$ is a distance metric on $\mathcal{D}$, for example, as defined by the $\ell^2$ norm $\|\cdot\|_2$ on $\mathbb{R}^K$, and t is a diffusion time parameter chosen by the user. (This parameter is the “short time” from the hand-waving argument in the preceding paragraph.) Further operations are typically carried out (see below) to correct for possible local differences in, e.g., sampling density within the dataset. The resulting discrete matrix is interpreted as an approximation to the diffusion kernel (i.e., the kernel of the semigroup generated by the Laplace–Beltrami operator) on the manifold assumed to underlie the data. The entries $(\phi_\ell)_i$; $\ell = 1, \ldots, L$, $i = 1, \ldots, N$ (with $L \ll K$) of this matrix’s first few eigenvectors $\phi_\ell$, sometimes suitably weighted according to the eigenvalues $\lambda_\ell^t$, then provide an L-dimensional parametrization for the data points $y_i$, $i = 1, \ldots, N$.

This article is organized as follows. Section 2 gives a brief capsule description of the algorithm for computing diffusion maps and its mathematical interpretation. Section 3 discusses the sensitivity of the method to the choice for t, illustrates the difficulty in guesstimating the “right” t, and introduces the semigroup test as a criterion for determining a useful value for t; we also give examples of applying the semigroup test to the problems of finding optimal data embeddings on synthetic data and real image data. Experiments show that this principled approach is effective and robust.
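The kernel matrix of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration assuming the Euclidean distance for $d(\cdot,\cdot)$; the function name and the toy data are hypothetical and not taken from the text.

```python
import numpy as np

def diffusion_kernel_matrix(Y, t):
    """W[i, j] = exp(-d(y_i, y_j)**2 / t) for the rows y_i of Y, as in Eq. (1)."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = np.sum(Y**2, axis=1)
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / t)

# Toy usage: three points in the plane (hypothetical data).
Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
W = diffusion_kernel_matrix(Y, t=1.0)
```

Only pairwise distances enter the computation, so the ambient dimension K affects the cost of forming the distances but not the size of W.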
2 Diffusion Operator and Diffusion Maps: A Brief Recap

We present a very condensed summary; readers interested in more extensive discussion of Riemannian manifolds can consult, e.g., [6]; the basics of Diffusion Maps can be found in [7].
2.1 Laplace–Beltrami Operator

We begin by defining the Laplace–Beltrami operator applied to a scalar function $f : \mathcal{M} \to \mathbb{R}$ on a Riemannian manifold $\mathcal{M}$; for simplicity we shall assume $\mathcal{M}$ to be compact, without boundary.

Definition 1 (Laplace–Beltrami) Let $\mathcal{M}$ be a Riemannian manifold with a metric g. The Laplace–Beltrami operator on $\mathcal{M}$ is defined by

$$\Delta_{\mathcal{M}} f(x) = -\operatorname{Trace} \nabla^{\mathcal{M}} \nabla f(x),$$

where $\nabla^{\mathcal{M}}$ denotes the canonical Levi-Civita connection on $\mathcal{M}$ associated with g. (In the case where $\mathcal{M}$ is a compact manifold with a smooth boundary, one has to consider appropriate boundary conditions in order to define $\Delta_{\mathcal{M}}$ as a self-adjoint operator.)

The spectrum of $\Delta_{\mathcal{M}}$ on a compact manifold $\mathcal{M}$ is discrete. Let the eigenvalues be $0 = \gamma_0 \le \gamma_1 \le \gamma_2 \le \ldots$ and let $f_i$ be the eigenfunction corresponding to eigenvalue $\gamma_i$. Then the eigenfunctions of the Laplace–Beltrami operator give rise to an embedding operation with certain optimality properties. An important observation (see, e.g., [2]) is that $\|\nabla f\|$ provides us with an estimate of how far apart f maps nearby points. Let $y_i, y_j \in \mathcal{M}$; then

$$|f(y_i) - f(y_j)| \le \mathrm{dist}_{\mathcal{M}}(y_i, y_j)\,\|\nabla f\| + o(\mathrm{dist}_{\mathcal{M}}(y_i, y_j)), \tag{2}$$

indicating that points close together on the manifold are mapped by f to values close together in $\mathbb{R}$. The extent to which f “preserves” locality can be measured by, e.g.,

$$\int_{\mathcal{M}} \|\nabla f(y)\|^2 \, d\mathrm{vol}_{\mathcal{M}}(y); \tag{3}$$

minimizing this objective function is equivalent to finding eigenfunctions of the Laplace–Beltrami operator $\Delta_{\mathcal{M}}$, in the following sense. The eigenfunction $f_0$ of $\Delta_{\mathcal{M}}$ with the lowest value for (3) is a constant function on $\mathcal{M}$ (for which $\nabla f(y) = 0$ for all y), which is completely uninformative concerning localization of manifold points w.r.t. each other, since it maps all points to the same value in $\mathbb{R}$. The next eigenfunction $f_1$ provides an optimal embedding map to the real line (in the sense that it minimizes the integral over $\mathcal{M}$ of the averaged distortion bound (3) among unit-norm functions orthogonal to the constants); similarly, the optimal embedding of the manifold in $\mathbb{R}^L$ is defined by

$$g := (f_1(y), \ldots, f_L(y)).$$
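The link between the objective (3) and the eigenfunctions can be made explicit with Green's identity. The following is the standard variational argument, written under the assumptions that f is smooth, that $\mathcal{M}$ is compact without boundary, and that the eigenfunctions $f_i$ are normalized to be orthonormal in $L^2(\mathcal{M})$ (a normalization not stated in the text).

```latex
\begin{aligned}
\int_{\mathcal{M}} \|\nabla f\|^2 \, d\mathrm{vol}_{\mathcal{M}}
  &= \int_{\mathcal{M}} f \, \Delta_{\mathcal{M}} f \, d\mathrm{vol}_{\mathcal{M}}
   = \sum_{i \ge 0} \gamma_i \, c_i^2,
  \qquad f = \sum_{i \ge 0} c_i f_i, \\
\min_{\substack{\|f\|_{L^2} = 1 \\ f \perp f_0, \ldots, f_{k-1}}}
  \int_{\mathcal{M}} \|\nabla f\|^2 \, d\mathrm{vol}_{\mathcal{M}}
  &= \gamma_k, \qquad \text{attained at } f = f_k .
\end{aligned}
```

With the sign convention of Definition 1 the operator $\Delta_{\mathcal{M}}$ is non-negative, so the smallest values of (3) are obtained on the span of the first eigenfunctions.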
2.2 Diffusion Operator

Although the first L eigenvectors of the Laplace–Beltrami operator provide an informative embedding of $\mathcal{M}$ in $\mathbb{R}^L$, it can be difficult to identify these eigenvectors with a reasonable degree of accuracy, starting from noisy samples of $\mathcal{M}$. For this reason, it may be useful, in order to determine (approximations to) these eigenvectors, to work instead with the semigroup of operators $\{e^{-t\Delta_{\mathcal{M}}}\}_{t\ge 0}$. The operator $-\Delta_{\mathcal{M}}$ is the infinitesimal generator of $e^{-t\Delta_{\mathcal{M}}}$, i.e.,

$$\lim_{t\to 0} \frac{e^{-t\Delta_{\mathcal{M}}} - I}{t} f = -\Delta_{\mathcal{M}} f,$$

whenever f belongs to a suitable dense subset of $C(\mathcal{M})$. The diffusion operators $\{e^{-t\Delta_{\mathcal{M}}}\}_{t>0}$ share the same eigenfunctions as $\Delta_{\mathcal{M}}$, and the eigenvalues of $e^{-t\Delta_{\mathcal{M}}}$ are exactly the $e^{-\gamma_i t}$; in particular, they are bounded: $1 = e^{-\gamma_0 t} \ge e^{-\gamma_1 t} \ge e^{-\gamma_2 t} \ge \cdots > 0$. The Laplace–Beltrami operator $\Delta_{\mathcal{M}}$ and the diffusion operator $e^{-t\Delta_{\mathcal{M}}}$ are also related through the heat (or diffusion) equation on the manifold.
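The two properties just stated (semigroup law and eigenvalues $e^{-\gamma_i t}$) can be observed on a finite-dimensional stand-in. The sketch below replaces $\Delta_{\mathcal{M}}$ by the graph Laplacian of a cycle, which is an assumption made purely for illustration; the helper names are hypothetical.

```python
import numpy as np

def cycle_laplacian(n):
    """Graph Laplacian of a cycle with n nodes (a discrete stand-in for the Laplacian)."""
    L = 2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] -= 1.0
    L[idx, (idx - 1) % n] -= 1.0
    return L

def heat_operator(L, t):
    """exp(-t L), computed from the eigendecomposition of the symmetric matrix L."""
    gamma, F = np.linalg.eigh(L)
    return (F * np.exp(-t * gamma)) @ F.T

L = cycle_laplacian(16)
T_a, T_b, T_ab = heat_operator(L, 0.3), heat_operator(L, 0.5), heat_operator(L, 0.8)

# Semigroup law: exp(-0.3 L) exp(-0.5 L) = exp(-0.8 L) up to round-off.
semigroup_gap = np.max(np.abs(T_a @ T_b - T_ab))

# Eigenvalues of exp(-t L) are exp(-t * gamma_i), all in (0, 1].
gamma = np.linalg.eigvalsh(L)
heat_eigs = np.linalg.eigvalsh(T_a)
```

The eigenvectors are shared by L and by every $e^{-tL}$, which is why working with the diffusion operator loses no spectral information.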
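Definition 2 (Heat Equation) Let $f : \mathcal{M} \to \mathbb{R}$ be the initial temperature distribution on a manifold $\mathcal{M}$ embedded in $\mathbb{R}^d$. The heat equation is the partial differential equation

$$\frac{\partial u}{\partial t} + \Delta_{\mathcal{M}} u = 0, \qquad u(x, 0) = f(x).$$

The solution of the heat equation is given by the diffusion operator $e^{-t\Delta_{\mathcal{M}}}$,

$$u(x, t) = e^{-t\Delta_{\mathcal{M}}} f(x) \tag{4}$$

$$\phantom{u(x, t)} = \int_{\mathcal{M}} h_t(x, y) f(y)\, d\mathrm{vol}_{\mathcal{M}}(y), \tag{5}$$

where $h_t$ is the heat kernel.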
2.3 Approximating the Diffusion Operator on a Discrete Dataset

When x, y are close to each other and t is small, $h_t$ can be approximated by the Gaussian

$$h_t(x, y) = (4\pi t)^{-\frac{d}{2}} e^{-\frac{\|x - y\|^2}{4t}}, \tag{6}$$

where d is the dimension of $\mathcal{M}$.
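In flat space the Gaussian (6) is the exact heat kernel, and composing two diffusion steps of durations s and t gives one step of duration s + t. The following sketch checks this numerically in dimension d = 1 with a simple Riemann sum; the grid, the sample points and the function name are illustrative assumptions.

```python
import numpy as np

def heat_kernel_1d(x, y, t):
    """Euclidean heat kernel of Eq. (6) in dimension d = 1."""
    return (4.0 * np.pi * t) ** -0.5 * np.exp(-((x - y) ** 2) / (4.0 * t))

# Quadrature grid wide enough that the Gaussians are negligible at its ends.
z = np.linspace(-20.0, 20.0, 40001)
dz = z[1] - z[0]

x, y, s, t = 0.3, -0.7, 0.2, 0.5
# Integral over z of h_s(x, z) h_t(z, y), to be compared with h_{s+t}(x, y).
composed = np.sum(heat_kernel_1d(x, z, s) * heat_kernel_1d(z, y, t)) * dz
direct = heat_kernel_1d(x, y, s + t)
# The kernel integrates to 1, i.e. it maps the constant function 1 to itself.
mass = np.sum(heat_kernel_1d(x, z, s)) * dz
```

Both facts have discrete counterparts in what follows: the row normalization restores the action on the constant function, and the composition rule is what the semigroup test of Sect. 3.2 probes.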
On discrete data samples of N data objects in K dimensions, interpreted as points of an unknown Riemannian manifold $\mathcal{M}$ embedded in $\mathbb{R}^K$, an approximation of the diffusion operator $e^{-t\Delta_{\mathcal{M}}}$ is built as follows. First, define the N by N matrix W,

$$W_{ij} = \begin{cases} e^{-\frac{\|y_i - y_j\|^2}{4t}} & \text{if } \|y_i - y_j\| < \epsilon, \\ 0 & \text{otherwise,} \end{cases} \tag{7}$$

where $\epsilon$ is picked in concordance with t; typically $\epsilon$ is proportional to $(Ct)^{1/2}$ for some C significantly larger than 1; the $W_{ij}$ set to zero thus correspond to entries that would otherwise be so small that they would not contribute much to the overall matrix, while setting them to zero reduces the complexity of the algorithm.

The action of the true diffusion kernel, acting as an integral operator on the constant function 1 on $\mathcal{M}$, would produce the function 1 again; the discrete approximation W typically does not have the same effect on the all-ones vector approximating the function 1. Many of the approximation ingredients contribute to this shortcoming, such as setting some of the W-entries to zero, or (more importantly) local variations in the data manifold sampling, which result in some data points having more and/or closer neighbors than others and which also cause the summing (rather than integration) procedure to deviate from an optimal quadrature. To remedy the total effect of these shortcomings to some extent, one defines a diagonal matrix $D = [D_{ii}]_{i=1}^{N}$, with entries given by $D_{ii} = \sum_j W_{ij}$; the matrix $D^{-1}W$ then does indeed map the all-ones vector to itself. (Note that this also neatly sidesteps the problem that we had no estimate for the dimension d of $\mathcal{M}$, which would, in principle, have been necessary for the normalization of the Gaussian approximation to $h_t$.) We then compute eigenvalues and eigenvectors for $D^{-1}W$, i.e.,

$$D^{-1}W f_\ell = \lambda_\ell f_\ell,$$

where we order the eigenvalues and eigenvectors so that $1 = \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \ldots$. The L-dimensional embedding is then defined by

$$g_i = (f_1^{[i]}, \ldots, f_L^{[i]}),$$
where $f_\ell^{[i]}$ is the i-th entry in the N-dimensional $\ell$-th eigenvector.

Figure 1 illustrates this embedding on a simple example. The dataset consists of points residing on a helix wrapped around a torus and shows the 2D embeddings obtained by Laplacian eigenmaps and diffusion maps, comparing them with those from PCA and MDS; Laplacian-based methods clearly do the better job recovering the intrinsic data structure. In this example, PCA and MDS essentially give results similar to a linear projection onto a 2D plane. In a second example the data are sampled uniformly (without added noise) from a 2D “Swiss roll” surface (a rectangle rolled up so that it forms a spiral, see Fig. 2a) embedded in 3D. Figure 2 compares the embeddings produced by Laplacian eigenmaps and by Diffusion maps, showing that Diffusion maps introduce fewer deformations.

Fig. 1 (a) Original data points, uniformly sampled (with some noise added) from a helicoidal curve wrapped around a torus in 3D. (We note that although we show the data embedded in a low-dimensional space in all our examples, the size of the ambient dimension has no impact on these methods, since they depend only on the distances between data points.) Smaller panels: 2D embeddings of these data obtained via (b) PCA, (c) MDS, (d) Laplacian eigenmaps, and (e) Diffusion eigenmaps

Fig. 2 Left: original Swiss roll data; Middle and Right: 2D Embeddings obtained via Laplacian eigenmaps and Diffusion maps, respectively

Remarks

1. The same “normalization by left-multiplication by a diagonal matrix” approach can be (and has been) used for the Laplace–Beltrami operator $\Delta_{\mathcal{M}}$ rather than $e^{-t\Delta_{\mathcal{M}}}$; approximating the differential operator by a matrix L expressing a second-order difference, and then “normalizing” it by setting $D^{-1}L$, leads to the Laplacian eigenmaps proposed in [2].
2. Although the matrix W is symmetric, the “normalized” version $D^{-1}W$ typically is not. One can also consider instead the symmetrized version $D^{-1/2}WD^{-1/2}$, which has the same eigenvalues as $D^{-1}W$; its eigenvectors are the vectors $D^{1/2}f_\ell$.
3. In case the sampling density is known to be systematically not uniform over the manifold, it can be useful to introduce a correction for this in the matrix construction. The paper [1] describes in detail how one can modify the construction to incorporate (an approximation to) the sampling density. Depending on the value assigned to a tuning parameter $\alpha$, the modification introduced in [1] can be interpreted (when $\alpha = 1$) as introducing a Jacobian-like factor (so that, in the limit for finer and finer sampling, one recovers again the standard diffusion semigroup and its Laplace–Beltrami generator), or (when $\alpha \ne 1$) as an adjustment of the diffusion process itself, with a non-constant diffusivity; in the latter case, the approximation is linked to a different semigroup, the eigenvectors and eigenvalues of which encode again significant geometric information about the manifold, and can therefore again be used for a lower-dimensional parametrization of the manifold.
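The construction of Sect. 2.3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cutoff constant `C = 16` (so that $\epsilon = (Ct)^{1/2}$), the function name and the toy circle data are assumptions, and the density correction of Remark 3 is not included. The eigenvectors are computed from the symmetrized matrix of Remark 2 and mapped back to those of $D^{-1}W$.

```python
import numpy as np

def diffusion_map(Y, t, n_components=2, C=16.0):
    """Return (lam, G): leading eigenvalues (lam[0] is the trivial one) and the embedding G."""
    Y = np.asarray(Y, dtype=float)
    sq = np.sum(Y**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T), 0.0)
    # Eq. (7): Gaussian weights, set to zero beyond the distance eps = sqrt(C * t).
    W = np.where(sq_dists < C * t, np.exp(-sq_dists / (4.0 * t)), 0.0)
    D = W.sum(axis=1)                      # D_ii = sum_j W_ij (positive: W_ii = 1)
    K = W / np.sqrt(np.outer(D, D))        # symmetric D^{-1/2} W D^{-1/2}
    lam, V = np.linalg.eigh(K)             # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]          # reorder so that 1 = lam_0 >= lam_1 >= ...
    lam, V = lam[order], V[:, order]
    F = V / np.sqrt(D)[:, None]            # eigenvectors of D^{-1} W (Remark 2)
    # Row i of G is g_i = (f_1[i], ..., f_L[i]); the constant f_0 is skipped.
    return lam[: n_components + 1], F[:, 1 : n_components + 1]

# Hypothetical toy data: 200 equispaced points on the unit circle.
theta = 2.0 * np.pi * np.arange(200) / 200
Y = np.column_stack([np.cos(theta), np.sin(theta)])
lam, G = diffusion_map(Y, t=0.01)
```

Working with the symmetric matrix allows the use of a symmetric eigensolver, which is generally more stable than a general one applied to $D^{-1}W$. For this toy dataset the two embedding coordinates trace out a circle again, as expected from the symmetry of the data.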
3 Setting the Diffusion Time t

3.1 Sensitivity to the Choice of t

It is intuitively clear that the algorithm described earlier can work only in some window for t: if t is chosen very large, then $W_{ij}$ will be different from zero even for pairs i, j for which the data points $y_i$ and $y_j$ are far from each other in the ambient space, although we expect that their Euclidean distance is not at all informative about their relative roles on the manifold $\mathcal{M}$. We argued in Sect. 2 that the construction of $(W_t)_{ij}$ (where we now explicitly denote the dependence of W on t) was reasonable for i, j where the distance $\|y_i - y_j\|$ was sufficiently small that $y_j$ could be viewed as close to the tangent plane to $\mathcal{M}$ at $y_i$; this means we should expect the method to work well only for t below some threshold. This is also consistent with the proofs in [7] and [1]: since those are proofs holding for $t \to 0$, the similarity of the eigendecompositions of $W_t$ and the true diffusion operator $e^{-t\Delta_{\mathcal{M}}}$ on $\mathcal{M}$ can be expected only in the regime of small t.

When t is too small, the method faces a different problem: for $t < \min_{i \ne j} \|y_i - y_j\|^2$, W reduces to the identity operator, and no useful embedding can be constructed. The problem persists for slightly larger values of t, where only a few (i, j) emerge above the threshold. In a certain sense, the diffusion time is then too short for the diffusion process to consistently bridge the distance between sample points on $\mathcal{M}$. Ideally, one would like that for each i, $(W_t)_{ij} \ne 0$ for several $j \ne i$. (One would also like the number of such “useful” neighbors not to vary by orders of magnitude over the dataset. This is possible only if the sampling is fairly uniform. It is when the spatial distribution of the points in the dataset varies so much that no single parameter setting in the definition of $W_t$ allows for the number $\#\{j \ne i;\ (W_t)_{ij} \ne 0\}$ to be at least (say) 10 for all i without getting into the several 100s for other i, that it is necessary to adapt the simple diffusion operator $W_t$, e.g., using the methods in [1].)

Finding the “right” choice for t, in the happy medium between the two extreme regimes, can be tricky: as illustrated in Figs. 3 and 4 below, different choices for t can lead to very different outcomes for the same data.
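The number of “useful” neighbors discussed above is easy to monitor. The sketch below counts, for each i, the indices $j \ne i$ with $(W_t)_{ij} \ne 0$ under the cutoff of Eq. (7); the constant `C = 16`, the function name and the toy data are assumptions made for the illustration.

```python
import numpy as np

def useful_neighbor_counts(Y, t, C=16.0):
    """For each i, the number of j != i with a nonzero entry (W_t)_ij after the cutoff."""
    Y = np.asarray(Y, dtype=float)
    sq = np.sum(Y**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T), 0.0)
    nonzero = sq_dists < C * t             # entries kept by Eq. (7), eps^2 = C * t
    return nonzero.sum(axis=1) - 1         # remove the diagonal entry j = i

# Hypothetical toy data: 200 equispaced points on the unit circle.
theta = 2.0 * np.pi * np.arange(200) / 200
Y = np.column_stack([np.cos(theta), np.sin(theta)])
for t in [1e-6, 1e-3, 1.0]:
    counts = useful_neighbor_counts(Y, t)
    print(f"t={t:g}: min={counts.min()}, max={counts.max()}")
```

A spread of several orders of magnitude between the minimum and the maximum count is the symptom, described above, that a single t cannot serve the whole dataset.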
Fig. 3 Three embeddings of the helicoidal data from Fig. 1 obtained via Diffusion maps, for different choices for the parameter t: $t_1$ (left), $t_2$ (middle) and $t_3$ (right). The choice $t = t_2$ was adopted for Fig. 1e; the choices $t = t_1 = t_2/16$ and $t = t_3 = 4t_2$ are clearly suboptimal

Fig. 4 Several embeddings of the Swiss roll data from Fig. 2 obtained by the Diffusion map procedure of Sect. 2; the only difference lies in the choice of the parameter t, increasing from left to right in the figure by a factor 2 each time

3.2 Semigroup Test

In practical applications of diffusion maps, it can take quite a bit of trial and error to find a “right” value for t. Our goal here is to describe a simple robust guiding strategy to reduce this guesswork, which finds a near-optimal value for t in many situations in which we have tested it. The diffusion operators $\{e^{-t\Delta_{\mathcal{M}}}\}_{t>0}$ form a strongly continuous semigroup; i.e., the $T_t := e^{-t\Delta_{\mathcal{M}}}$ satisfy

$$T_{t_1 + t_2} = T_{t_1} T_{t_2}, \quad \text{and} \quad \operatorname*{s\text{-}lim}_{t \to 0} T_t = \mathrm{Id}.$$

It follows that the matrices $D_t^{-1}W_t$, used to define diffusion maps, or their symmetrized versions $K_t := D_t^{-1/2} W_t D_t^{-1/2}$, can be approximate, discretized versions of the diffusion operators on $\mathcal{M}$ only when they likewise (approximately) satisfy the semigroup property.

We use this insight to formulate a criterion to pick an “optimal” t. In the regime where the $K_t$-operators are reasonable approximations of the semigroup $\{e^{-t\Delta_{\mathcal{M}}}\}_{t\ge 0}$, $(K_t)^2$ should be close to $K_{2t}$. This motivates the definition of the semigroup error SGE(t),

$$\mathrm{SGE}(t) := \|(K_t)^2 - K_{2t}\|.$$
The norm used here is the operator norm; the operators we consider are (expected to be) positive, with eigenvalues between 0 and 1, and the range for SGE(t) is between 0 and 1.

In practice, we begin by initializing a wide range of discrete values for t, i.e., we pick a set $T := \{t_m;\ m = 1, \ldots, M\}$. For each $t_m$ in T, we construct the diffusion matrices $K_{t_m}^2$ and $K_{2t_m}$, and we compute SGE($t_m$). Figure 5 below plots the semigroup error SGE(t) for different values of t for the Swiss roll example of Figs. 2 and 4; SGE(t) reaches its lowest value for the choice of t where the 2D-embedding is closest to a rectangle, which we know to be the ground truth in this case. In Fig. 6, below, we revisit the three embeddings shown in Fig. 3, next to the semigroup error plot for this dataset, and we observe that the visually optimal embedding (which is also the most accurate version of the ground truth for this manifold $\mathcal{M}$) corresponds again to the value of t with the smallest SGE in the regime of small diffusion times.

As shown in Fig. 6, the values of SGE(t) behave qualitatively as we would have expected, based on the intuition explained at the start of this section: when t is very small, the semigroup behavior of the $K_t$ has not “kicked in” yet, because the numerical diffusion’s range is too short, and this is reflected by larger values for SGE(t). (We recall that the range of SGE values is between 0 and 1; values exceeding 0.3 are indeed “large.”) As t increases, SGE(t) drops to lower values, to start increasing again after a minimum SGE-value not too far above 0. (These small values are maintained in an interval for t, as may not be evident from Fig. 6, in which the successive values of t increase by a factor 4, $t_{m+1} = 4t_m$; detailed behavior in the neighborhood of each $t_m$ is not apparent from this figure.) We interpret this increase as the influence of ambient-space geometry (such as the toroidal winding in this example), once the numerical diffusion is no longer “following” the manifold $\mathcal{M}$; because even noisy sampling from $\mathcal{M}$ translates to very non-uniform sampling in the higher-dimensional ambient space, the $K_t$ are less close to following a semigroup behavior. When t becomes much larger, the value of SGE(t) starts decreasing and ultimately becomes tiny: once the reach of the numerical diffusion is sufficiently large that the “sources” on or near $\mathcal{M}$ all “act” as one diffuse blob, and the geometry of $\mathcal{M}$ has been obscured, the semigroup nature of the ambient-space diffusion takes over. Although SGE(t) is small again, one cannot hope to use the diffusion maps to generate an informative low-dimensional embedding of $\mathcal{M}$ for t in this range.

Fig. 5 Semigroup error SGE for the Swiss roll data, for the values of t illustrated in Fig. 4. The red dot is the optimal value of t and corresponds to the embedding that visually best reflects the ground truth. The scale for t is logarithmic; each of the successive $t_m$ (at the tick marks) is larger by factor 2 than the previous one, $t_{m+1} = 2t_m$

Fig. 6 Semigroup error plot (horizontal axis: $\log_2(t)$, from −10 to 10; vertical axis: semigroup error, from 0.05 to 0.35) for a wide range of candidate values for t (the ratio $t_{\max}/t_{\min}$ equals $2^{20}$!) for the dataset illustrated in Figs. 1 and 3. Note that for very large t, the SGE estimate becomes small again; see discussion in the text

We next turn to a few examples with non-uniform sampling. In this case we compensate for the change in sampling density by using the techniques described in [1], using an integral kernel $A_{t,\alpha}$ obtained by a “renormalization” of $W_t$. Regardless of the parameter setting for $\alpha$ that gives the best results (which depends on the type of non-uniformity), the basic intuition underlying the method remains the same: the spectral analysis, used to construct a low-dimensional embedding of $\mathcal{M}$, is predicated on the $A_{t,\alpha}$ approximating the kernels of a semigroup of operators. One can thus again use the SGE to determine optimal choices of t. Figure 7 shows the results for a non-uniformly sampled circle. To illustrate the robustness of the SGE test, we examine this dataset again after noise has been added. The results are shown in Fig. 8.
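The semigroup test of Sect. 3.2 can be sketched as a scan over a logarithmic grid of diffusion times. The code below is a minimal illustration for the unadapted kernel $K_t$ (the density-adapted kernels $A_{t,\alpha}$ of [1] are not implemented here); the cutoff constant `C = 16`, the function names, the grid and the toy data are assumptions.

```python
import numpy as np

def symmetric_kernel(Y, t, C=16.0):
    """K_t = D_t^{-1/2} W_t D_t^{-1/2}, with W_t as in Eq. (7) and eps^2 = C * t."""
    Y = np.asarray(Y, dtype=float)
    sq = np.sum(Y**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T), 0.0)
    W = np.where(sq_dists < C * t, np.exp(-sq_dists / (4.0 * t)), 0.0)
    D = W.sum(axis=1)
    return W / np.sqrt(np.outer(D, D))

def semigroup_error(Y, t):
    """SGE(t) = ||(K_t)^2 - K_{2t}|| in the operator (spectral) norm."""
    K_t = symmetric_kernel(Y, t)
    K_2t = symmetric_kernel(Y, 2.0 * t)
    return np.linalg.norm(K_t @ K_t - K_2t, ord=2)

def scan_sge(Y, log2_ts):
    """Evaluate SGE on the grid t_m = 2**log2_ts[m]; returns (ts, sge)."""
    ts = 2.0 ** np.asarray(log2_ts, dtype=float)
    return ts, np.array([semigroup_error(Y, t) for t in ts])

# Hypothetical toy data: 200 equispaced points on the unit circle.
theta = 2.0 * np.pi * np.arange(200) / 200
Y = np.column_stack([np.cos(theta), np.sin(theta)])
ts, sge = scan_sge(Y, np.arange(-12, 5, 2))
```

As discussed above, SGE(t) also becomes small when t is extremely small or extremely large, so the global minimum of the scan is not automatically the useful one; the resulting curve is meant to be inspected, and the minimum in the regime of small diffusion times retained.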
Fig. 7 Left: 512 non-uniformly sampled points on a circle; Middle: SGE(t) for a wide range of t values (horizontal axis: $\log_2(t)$, from −10 to 10; vertical axis: semigroup error), with the useful-range optimal t marked in red; Upper right inset: the embeddings obtained for the optimal $t = 1/4$ and for $t = 1/16$ (left) and $t = 1$ (right). In this example, we set $\alpha = 2$ to determine the adapted kernels $A_{t,\alpha}$. (In this case, there is hardly any dimension reduction, since both the original and final circle are depicted on a 2D plane; this is a toy example, after all. It may be worth noting that the parametrization by Diffusion maps would have been identical had the data been embedded in a random not-coordinate-aligned plane in a much higher-dimensional space)
-10 -8
-6
-4
-2
0
2
4
6
8
10
Semigroup error
log2(t)
-10 -8
-6
-4
-2
0
2
4
6
8
10
log2(t)
Fig. 8 Left: 512 non-uniformly sampled points on a circle, now with noise added; Middle: SGE.(t) for a wide range of t values, with the useful-range optimal t marked in red; Upper right inset: the embeddings obtained for the optimal .t = 1/4, as well as for .t = 1/16 (left) and .t = 1 (right). In this case, we have again set .α = 2 to determine the adapted kernels .At,α
S. Shan and I. Daubechies
Fig. 9 DM with the semigroup tuning strategy applied to the Yale Face dataset, in which each data point is an image of $192 \times 168$ pixels. Top left: SGE$(t)$ for a wide range of $t$. The colored boxes above show the embeddings using the first 3 non-trivial eigenvectors for the optimal choice $t_{\mathrm{opt}} = 1/4$ and its two SGE-plot neighbors $t_{-4} = t_{\mathrm{opt}}/4$ and $t_0 = 4\,t_{\mathrm{opt}}$. Although the embedding for $t_{-4}$ looks comparable to that for $t_{\mathrm{opt}}$ at first sight, the embedding shows that one data point is not integrated well with the rest (boxed green datapoint near one of the axes); for $t_0$ the structure of the dataset is much less well-defined than for $t_{\mathrm{opt}}$. Left: 2-dimensional DM embedding for $t_{\mathrm{opt}}$, with the datapoints indicated by thumbprints of the images, indicating that the DM parametrization captured the illumination degrees of freedom in the dataset. (In this case, we set $\alpha = 1.5$ to determine the adapted kernels)
After all the examples with simulated toy data, we conclude with one example of real data. The dataset is part of the extended Yale Face Database B [8]; it consists of 64 images (192 pixels by 168 pixels) of the same human face, in the same pose, under different illumination conditions: the light source is moved around in two different directions. The intrinsic dimensionality of this collection of images is therefore expected to be 2, although the images themselves are objects in a much higher-dimensional space. We applied Diffusion maps, coupled with the semigroup-error tuning strategy described above, to this collection; the results are shown in Fig. 9. By its very nature, the dataset in this example is noisy, since all photographs (as opposed to images generated by computer graphics) are inherently noisy, but we do not have an explicit characterization of this noise. To illustrate robustness of our analysis and semigroup criterion to noise, we resort to a common strategy in image analysis: we revisit the dataset in Fig. 10, after extra noise has been added independently to each of the 64 images. To obtain the noisy datapoints of Fig. 10 from those of Fig. 9, we proceeded as follows. For each of the $192 \times 168$ pixels in each of the 64 images, we generated a random integer $I$ uniformly in $[-100, 100]$; we then replaced the pixel value $P$ by $P + I$ if $0 \le P + I \le 255$, by 0 if $P + I < 0$, or by 255 if $P + I > 255$. An example of one of the face photographs, before and after adding noise, is shown in Fig. 11 below. Despite the severity of the noise, we observe that the Diffusion Map analysis, combined with the semigroup tuning strategy, is remarkably robust: the same $t_{\mathrm{opt}}$ is selected in both cases, and the corresponding 2D embeddings are very similar (up to an inversion of the axis in one of the 2 variables), as illustrated by Fig. 12 below.
Fig. 10 DM with the semigroup tuning strategy applied to the same Yale Face dataset as in Fig. 9, after additional noise was added to each image of $192 \times 168$ pixels, i.e., to each data point. (The nature of the noise is explained in the text and illustrated in Fig. 11.) Top left: SGE$(t)$ for a wide range of $t$. The colored boxes above show the embeddings using the first 3 non-trivial eigenvectors for the optimal choice $t_{\mathrm{opt}} = 1/4$ and its two SGE-plot neighbors $t_{-4} = t_{\mathrm{opt}}/4$ and $t_0 = 4\,t_{\mathrm{opt}}$. Left: 2-dimensional DM embedding for $t_{\mathrm{opt}}$, with the datapoints indicated by thumbprints of the images. (We again set $\alpha = 1.5$ to determine the adapted kernels)
Fig. 11 Left: one of the 64 images from the Yale Face dataset used in the analysis in Fig. 9; Right: the noisy version of this image obtained by adding, pixelwise, random integers picked uniformly and independently in $[-100, 100]$ to the gray value of the pixel, and rounding so the result is in $[0, 255]$
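The noise procedure described above can be sketched in a few lines. This is a minimal illustration, assuming 8-bit gray values stored in a NumPy array; the function name `add_uniform_integer_noise` and the random stand-in image are hypothetical and are not the authors' code or data.

```python
import numpy as np

def add_uniform_integer_noise(image, amplitude=100, rng=None):
    """Add an independent integer I, uniform in [-amplitude, amplitude], to each pixel,
    then clip the result to the valid gray-value range [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    # The upper bound of Generator.integers is exclusive, hence amplitude + 1.
    noise = rng.integers(-amplitude, amplitude + 1, size=image.shape)
    noisy = image.astype(np.int64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Stand-in for one 192 x 168 face image (random gray values, for illustration only).
image = rng.integers(0, 256, size=(192, 168), dtype=np.uint8)
noisy_image = add_uniform_integer_noise(image, rng=rng)
```

Working in a wider integer type before clipping avoids the wrap-around that would occur if the noise were added directly to unsigned 8-bit values.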
4 Conclusion
Although Diffusion maps have been shown to be a powerful tool to explore datasets embedded in high dimensions that are suspected to have interesting geometric structure on a much lower-dimensional scale [9–11], determining the “right” value for the diffusion parameter $t$ has been found to be tricky. Picking $t$ so that it minimizes the semigroup error is computationally easy, makes sense from a theoretical point of view, and gives good results in practice.
Fig. 12 Left and Middle: the 2D embeddings from Figs. 9 and 10, respectively, for the 64-face Yale Face dataset without and with the added noise illustrated in Fig. 11. Up to a change of sign in the horizontal axis, the two embeddings are remarkably similar, as illustrated by the figure on the Right, which superimposes onto the Middle embedding a “skeleton” of the other embedding, indicating with a green rectangle the mirrored position of each thumbprint from the Left embedding. Closer scrutiny shows that thumbprints in close geometric proximity in this comparison picture do indeed correspond to the same face picture
Some Personal Comments
Ingrid Daubechies
I am grateful for the opportunity to contribute a paper to this Festschrift for Alex Grossmann, whom I admired and loved over many years. Although the short paper above is not on a topic on which he and I worked together, I think he would have liked the basic idea: to really put to work the physical intuition of diffusion, and insist that the operators behave like the semigroup they should constitute if they want us to take them seriously. I first met Alex Grossmann in 1976, over 45 years ago, and he played an important role in my life for the next 40 years. Although my official Ph.D. advisor at the Vrije Universiteit Brussel was Jean Reignier, Alex was the researcher with whom I did most of my thesis work, and I proudly list him as my second advisor on the online Mathematics Genealogy project. Years after my Ph.D., when I came back to Europe after a postdoc in the USA, it was Alex who again set me on a different fruitful research course, introducing me to wavelets. Even though Alex’s influence looms large with respect to the directions in which I have worked and developed my own research, I can now see, with hindsight, that Alex had an even larger impact on my own personal development. I was still in my early twenties when I first met Alex, very young (I remember becoming absorbed in the bandes dessinées of Ella, Michel and Etienne when I visited Alex and Dickie at their house, to Alex’s amusement), and had not met many families other than my parents’ friends. It was a breath of fresh air, a revelation (and a relief!) to be at dinners where what mattered were the food, the wine, and the company but not whether the cutlery all matched, to be in a house with so many books, to have wide-ranging and interesting discussions about so many things, to find magazines to read in the bathroom. Alex, I am so glad to have known you, and I miss you.
[Photograph] Alex Grossmann and Jean Reignier at the reception after my Ph.D. defense
References
1. Coifman, Ronald R and Lafon, Stéphane: Diffusion maps. Applied and Computational Harmonic Analysis 21 5–30 (2006)
2. Belkin, Mikhail and Niyogi, Partha: Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences 74 1289–1308 (2008)
3. Roweis, Sam T and Saul, Lawrence K: Nonlinear dimensionality reduction by locally linear embedding. Science 290 2323–2326 (2000)
4. Donoho, David L and Grimes, Carrie: Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences 100 5591–5596 (2003)
5. Zhang, Tianhao and Yang, Jie and Zhao, Deli and Ge, Xinliang: Linear local tangent space alignment and application to face recognition. Neurocomputing 70 1547–1553 (2007)
6. Boothby, William M: An introduction to differentiable manifolds and Riemannian geometry, Revised. Gulf Professional Publishing (2003)
7. Coifman, Ronald R and Lafon, Stéphane and Lee, Ann B and Maggioni, Mauro and Nadler, Boaz and Warner, Frederick and Zucker, Steven W: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods. Proceedings of the National Academy of Sciences 102 7432–7437 (2005)
8. Georghiades, Athinodoros S and Belhumeur, Peter N and Kriegman, David J: From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 643–660 (2001)
9. Liu, Jingen and Yang, Yang and Shah, Mubarak: Learning semantic visual vocabularies using diffusion distance. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 461–468, IEEE (2009)
10. Van Dijk, David and Sharma, Roshan and Nainys, Juozas and Yim, Kristina and Kathail, Pooja and Carr, Ambrose J and Burdziak, Cassandra and Moon, Kevin R and Chaffer, Christine L and Pattabiraman, Diwakar and others: Recovering gene interactions from single-cell data using data diffusion. Cell 174 716–729 (2018)
11. Moon, Kevin R and van Dijk, David and Wang, Zheng and Gigante, Scott and Burkhardt, Daniel B and Chen, William S and Yim, Kristina and Elzen, Antonia van den and Hirn, Matthew J and Coifman, Ronald R and others: Visualizing structure and transitions in high-dimensional biological data. Nature Biotechnology 37 1482–1492 (2019)
12. Belkin, Mikhail and Niyogi, Partha: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 1373–1396 (2003)
Wavelet Phase Harmonics
Stéphane Mallat, Gaspar Rochette, and Sixin Zhang
Abstract Alex Grossmann pointed out the importance of wavelet phase alignments across scales, to characterize local signal properties. This observation has drawn relatively little attention. Most wavelet research has concentrated on properties of wavelet coefficient amplitudes. This chapter shows that the phase is indeed crucial to characterize dependencies across scales, particularly to build non-Gaussian models of random processes. We introduce phase harmonic operators which capture the phase information with a windowed Fourier transform on the phase. We derive maximum entropy models of non-Gaussian stationary processes, conditioned by the covariance of wavelet phase harmonics. Relations with high-order moments and neural network coefficients are explained. It is shown that coherent structures of turbulent flows can be reproduced by wavelet phase harmonic models.
S. Mallat () DI ENS, PSL University, Paris, France Collège de France, Paris, France CCM, Flatiron Institute, New York, NY, USA e-mail: [email protected] G. Rochette · S. Zhang DI ENS, PSL University, Paris, France e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_19
1 Introduction
In their pioneering work on wavelets, Alex Grossmann and Jean Morlet [1, 2] showed with Kronland-Martinet that the phase propagation of a wavelet transform seems to characterize local “coherent structures” of signals. Singularities create lines of constant phase across scales, which converge to the location of these singularities at fine scales. They observed that these lines had different geometry depending upon the type of singularity or local structure. It was then proved that the position of stationary phases can specify the instantaneous frequency of harmonic structures [3]. Although the importance of phase seemed obvious for a physicist like Alex Grossmann, the use of wavelet phase has mostly been forgotten in applied harmonic analysis. This chapter reviews properties of the phase and explains why it is now playing a progressively more important role in analyzing signals, particularly with neural networks. We concentrate on signal models and explain how phase properties are used to model non-Gaussian random processes with coherent structures.
Mathematicians working in harmonic analysis often try to free themselves from the influence of phase fluctuations. Phase can sometimes be considered as an annoying variable, which is unstable when the modulus vanishes and which may be avoided to capture important regularity properties. By proving that wavelet bases are unconditional bases of Sobolev, Hölder, or Besov spaces, Yves Meyer [4] proved that the amplitude of wavelet coefficients is sufficient to characterize the regularity of large classes of functions. As a consequence, in the 1990s, most harmonic analysis research and algorithms concentrated on the properties of wavelet coefficient amplitudes, while neglecting the influence of phase. For example, Jaffard proved that the amplitude of wavelet coefficients is sufficient to characterize pointwise regularity [5], and other works showed that singularities can be detected from local maxima of wavelet coefficient amplitudes [6]. In the same vein, Donoho and Johnstone [7] proved that thresholding the amplitude of wavelet coefficients could lead to nearly optimal estimators of signals contaminated by noise. Wavelet coefficient amplitude may be sufficient to detect coherent structures in multiscale physical phenomena such as turbulence [8, 9], but it did not appear to be sufficient to build generative models of such structures.
This question was tackled in image processing, where researchers tried to build models of complex non-Gaussian stationary processes considered as “visual textures.” Maximum entropy models conditioned by local Markov dependencies were introduced by D. Geman and S. Geman [10]. However, Markov properties do not capture long-range dependencies of multiscale random processes. Better results have been obtained by Portilla and Simoncelli [11], who conditioned maximum entropy models on correlations between wavelet coefficient amplitudes, while capturing dependencies across scales by using the phase of wavelet coefficients. More recently, remarkable image texture syntheses have been obtained from the covariance of one-layer convolutional neural network coefficients, calculated with a pointwise rectifier on filtered image coefficients [12]. This chapter analyzes the role of the phase to build models of non-Gaussian multiscale stationary processes and to recover typical coherent structures that appear in these processes. We review the properties of phase harmonic operators introduced in [13, 14], which connect the complex phase to high-order moments and to rectifier nonlinearities in neural networks. Section 2 begins by introducing maximum entropy models conditioned by the covariance of a signal representation. Section 3 explains that stationary processes have uncorrelated Fourier coefficients at different frequencies because of random phase fluctuations. High-order moments reveal dependencies between different frequencies by canceling the relative phase between different Fourier coefficients. However, such high-order moments have a
large variance because $z^k$ amplifies the variability of $z$ if $|z|$ is large and $k > 1$. To preserve phase cancelation properties while avoiding large variance estimations, we replace $z^k = |z|^k e^{ik\varphi(z)}$ by phase harmonics $|z|\, e^{ik\varphi(z)}$ introduced in [13]. Section 3.2 shows that phase harmonic operators act as a windowed Fourier transform on the phase and can be computed with rectifier nonlinearities in neural networks. Section 4 briefly reviews complex wavelet transforms of one- and two-dimensional signals. Similarly to Fourier coefficients, wavelet coefficients are uncorrelated at far away scales because of their relative phase fluctuations. The covariance of wavelet phase harmonics specifies phase alignments observed by Alex Grossmann and Jean Morlet [2]. Section 5.2 shows that maximum entropy models conditioned by wavelet harmonic covariances can restore complex coherent structures observed in turbulent flows and other multiscale physical structures.
Notations We write $z^*$ the complex conjugate of $z \in \mathbb{C}$ and $Y^*$ is the complex transpose of a matrix $Y$. The covariance of two random variables $A$ and $B$ is written $\mathrm{Cov}(A, B) = \mathbb{E}(A\, B^*) - \mathbb{E}(A)\, \mathbb{E}(B)^*$. An inner product is written $\langle x, y \rangle = \int x(u)\, y(u)\, du$ in $L^2(\mathbb{R}^d)$ and $\langle x, y \rangle = \sum_u x(u)\, y(u)$ in $\mathbb{R}^d$. The cardinal of a set $S$ is $|S|$.
2 Maximum Entropy Models
This first section introduces general maximum entropy models conditioned by vectors of expected values, which we shall define from phase information in the following sections. In his seminal paper, Jaynes [15] interprets statistical physics as an inference of a probability distribution from partial measurements, by maximizing its entropy. In Jaynes’ words [15], maximizing the entropy of a probability distribution “is maximally noncommittal with regard to missing information.” Gibbs distributions are maximum entropy models conditioned by an energy vector providing the expected values of some functions of $X$, corresponding to energy potentials in statistical physics.
2.1 Macrocanonical Models
We review the properties of maximum entropy models of a stationary random vector $X$, conditioned by covariance coefficients of a representation $R(X)$. We shall try to adapt this representation to build models which capture important non-Gaussian properties of $X$, from a limited subset of all covariance coefficients in the matrix:
$$K_R = \mathrm{Cov}(R(X), R(X)) = \mathbb{E}\big( (R(X) - M_R)(R(X) - M_R)^* \big),$$
where $M_R = \mathbb{E}(R(X))$ is the mean vector. We suppose that $X(u)$ is defined for $u$ in a cube $\Lambda_d \subset \mathbb{Z}^r$ with $d$ grid points, for example, in $[1, d^{1/r}]^r$. If $r = 2$, then each realization is an image of $d$ pixels. Let us write $R(X) = \{R_v(X)\}_{v \in V}$, where $V$ is a finite set of vertices in a graph. We introduce a maximum entropy model conditioned by the covariance
$$K_R(v, v') = \mathrm{Cov}(R_v(X), R_{v'}(X)),$$
where $v'$ is in a neighborhood $N_v \subset V$ of $v$. This means that we only keep correlation coefficients along the set of edges which relates all neighbors, $E = \{(v, v') : v \in V,\ v' \in N_v\}$, on a graph $(V, E)$. We want to approximate the probability distribution of $X$, which is supposed to have a density $p(x)$ relative to the Lebesgue measure in $\mathbb{R}^d$. The entropy of a probability density $\tilde p$ on $\mathbb{R}^d$ is
$$H(\tilde p) = - \int \tilde p(x) \log \tilde p(x)\, dx.$$
A maximum entropy macrocanonical model $\tilde X$ conditioned by the covariance $K_R$ over a set of edges $E$ has a probability density $\tilde p$ which maximizes $H(\tilde p)$ and satisfies covariance moment conditions for all $(v, v') \in E$:
$$\int (R_v(x) - M_R(v))\, (R_{v'}(x) - M_R(v'))^*\, \tilde p(x)\, dx = K_R(v, v'). \tag{1}$$
If there exists a solution to this convex optimization with equality constraints, then it is unique and it can be written as [16]
$$\tilde p(x) = Z^{-1} \exp\Big( - \sum_{(v,v') \in E} \beta_{v,v'}\, (R_v(x) - M_R(v))\, (R_{v'}(x) - M_R(v'))^* \Big), \tag{2}$$
where $\beta_{v,v'}$ is the Lagrange multiplier associated with the equality condition (1). The sum in the exponential is the Gibbs energy, and $Z$ is the partition function. If all $R_v(x)$ are linear operators, then the Gibbs energy is bilinear and $\tilde p(x)$ is thus a Gaussian distribution. This is not true if some $R_v(x)$ are nonlinear. If $R_v(x)$ is nonlinear, then computing the Lagrange multipliers can be computationally very expensive when the dimension $d$ of $X$ and the total number $|E|$ of moments are large. One can however generate samples of a maximum entropy distribution without computing these Lagrange multipliers, with a microcanonical approach.
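To make the Gaussian case explicit, here is a short worked derivation under simplifying assumptions that are not in the text: each $R_v$ is a real linear functional $R_v(x) = \langle x, g_v \rangle$ for some real filter $g_v \in \mathbb{R}^d$ (hypothetical notation), and $m = \mathbb{E}(X)$, so that $M_R(v) = \langle m, g_v \rangle$. The Gibbs energy in (2) is then a quadratic form,

$$\sum_{(v,v') \in E} \beta_{v,v'}\, \langle x - m, g_v \rangle\, \langle x - m, g_{v'} \rangle = (x - m)^T B\, (x - m), \qquad B = \sum_{(v,v') \in E} \beta_{v,v'}\, g_v\, g_{v'}^T,$$

so that

$$\tilde p(x) = Z^{-1} \exp\big( - (x - m)^T B\, (x - m) \big).$$

Provided the symmetric matrix $B + B^T$ is positive definite, this is a Gaussian density with mean $m$ and covariance $(B + B^T)^{-1}$. As soon as one $R_v$ is nonlinear, the energy is no longer quadratic in $x$ and this conclusion fails.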
2.2 Microcanonical Models
Microcanonical models are computed from a single realization $\bar x$ of $X$. Since $X$ is stationary, an empirical estimator of covariance coefficients is computed with an average over all spatial positions. A microcanonical ensemble $\Omega_\epsilon$ is a set of signals $x$ which have an empirical covariance that is sufficiently close to the empirical covariance of $\bar x$. A microcanonical model is a maximum entropy distribution supported in $\Omega_\epsilon$. If $\Omega_\epsilon$ is compact, then the maximum entropy distribution is uniform over $\Omega_\epsilon$. A major issue in statistical physics is to verify the Boltzmann equivalence principle which guarantees the convergence of microcanonical and macrocanonical models toward the same Gibbs measures, when the signal dimension $d$ goes to $\infty$ [17]. This involves the proof of a large deviation principle which expresses concentration properties of the empirical covariance of $R(X)$. If $R$ is continuous and bounded so that the interaction potential $(R - M_R)(R - M_R)^*$ is also continuous and bounded and if there is no “phase transition,” which means that the limit is a unique Gibbs measure, then one can prove that microcanonical and macrocanonical measures converge to the same limit for an appropriate topology [18, 19]. Microcanonical models avoid the calculation of Lagrange multipliers and are guaranteed to exist as opposed to macrocanonical models. They rely on an ergodicity property which ensures that covariance estimations concentrate near the true covariance $K_R$ when $d$ is sufficiently large. Sampling a maximum entropy microcanonical set requires using Monte Carlo algorithms. They are computationally very expensive when the number of moments and the dimension $d$ are large because their mixing time becomes prohibitive [20]. Following the approach in [21], we approximate the sampling of microcanonical models with an L-BFGS quasi-Newton gradient descent algorithm initiated on a Gaussian white noise.
The numerical and mathematical properties of these approximate sampling algorithms are studied in [21].
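The sampling strategy can be illustrated on a deliberately simple case. The sketch below is not the algorithm of [21]: it swaps L-BFGS for plain gradient descent with a backtracking step size, and it uses the empirical autocovariance of a zero-mean one-dimensional periodic signal at a few lags as a stand-in for the covariance of $R(X)$ (a linear representation, so the resulting model is Gaussian). The names `autocovariance`, `loss_and_gradient`, and `sample_microcanonical` are hypothetical.

```python
import numpy as np

def autocovariance(x, lags):
    """Empirical autocovariance of a zero-mean periodic signal, averaged over positions."""
    d = x.size
    return np.array([np.dot(x, np.roll(x, -tau)) / d for tau in lags])

def loss_and_gradient(x, target, lags):
    """Squared distance between empirical and target covariances, with its gradient in x."""
    d = x.size
    residual = autocovariance(x, lags) - target
    grad = np.zeros_like(x)
    for r, tau in zip(residual, lags):
        grad += 2.0 * r * (np.roll(x, -tau) + np.roll(x, tau)) / d
    return float(np.sum(residual ** 2)), grad

def sample_microcanonical(x_bar, lags, n_iter=200, seed=1):
    """Descend from Gaussian white noise toward the covariance constraints of x_bar."""
    rng = np.random.default_rng(seed)
    target = autocovariance(x_bar, lags)
    x = rng.standard_normal(x_bar.size)        # white-noise initialization
    loss, grad = loss_and_gradient(x, target, lags)
    history = [loss]
    step = 1.0
    for _ in range(n_iter):
        # Backtracking: halve the step until the loss decreases.
        while step > 1e-12:
            candidate = x - step * grad
            new_loss, new_grad = loss_and_gradient(candidate, target, lags)
            if new_loss < loss:
                break
            step *= 0.5
        else:
            break                               # no decrease found: stop
        x, loss, grad = candidate, new_loss, new_grad
        history.append(loss)
        step *= 2.0                             # try a larger step next time
    return x, history

# Reference realization: a short moving average of white noise (illustrative only).
rng = np.random.default_rng(0)
noise = rng.standard_normal(256)
x_bar = (noise + np.roll(noise, 1) + np.roll(noise, 2)) / np.sqrt(3.0)
lags = [0, 1, 2, 3, 4]
x_sample, history = sample_microcanonical(x_bar, lags)
```

Different seeds give different signals that approximately satisfy the same covariance constraints. Replacing `autocovariance` by the covariance of a nonlinear representation is what makes the resulting models non-Gaussian.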
3 High-Order Moments and Phase Harmonics
Non-Gaussian properties can be captured by high-order moments. Section 3.1 studies the relation between high-order moments and harmonics of the phase. Section 3.2 reviews the properties of phase harmonic operators and shows that one can capture non-Gaussian dependencies with phase harmonics, while avoiding the inconveniences of high-order moments. Relations with rectifiers used in neural networks are explained.
3.1 Fourier Phase and High-Order Moments
The covariance of a stationary random vector is diagonalized by the discrete Fourier transform. We show the covariance of Fourier coefficients is zero at different frequencies because of relative phase fluctuations. Non-Gaussian dependencies across frequencies are revealed by higher order moments because they cancel relative phase fluctuations between frequencies. Since $X(u)$ is stationary, $K_R(u, u') = \mathrm{Cov}(X(u), X(u'))$ only depends on $u - u'$. We write $|u|$ the norm of $u$ and $u.u'$ the inner product. Let $\hat x = F_u x$ be the discrete Fourier transform of $x$ in a cube $\Lambda_d \subset \mathbb{Z}^r$ of cardinal $d$:
$$\hat x(\omega) = \sum_{u \in \Lambda_d} x(u)\, e^{-i\omega.u} \quad \text{for } \omega = 2\pi d^{-1/r} m \text{ with } m \in \Lambda_d. \tag{3}$$
The Fourier representation $R = F_u$ is indexed by $\omega \in V = 2\pi d^{-1/r} \Lambda_d$. The covariance for $(\omega, \omega') \in V^2$ is
$$K_R(\omega, \omega') = \mathrm{Cov}(\hat X(\omega), \hat X(\omega')).$$
If $\omega \neq 0$, then $\mathbb{E}(\hat X(\omega)) = 0$. If $\omega \neq \omega'$, then $K_R(\omega, \omega') = 0$ because of random phase fluctuations. Indeed, translating $X(u)$ by any $\tau \in \Lambda_d$ multiplies $\hat X(\omega)$ by $e^{-i\tau.\omega}$. Since $X$ is stationary, this translation is a symmetry which does not modify $\mathrm{Cov}(\hat X(\omega), \hat X(\omega'))$. It results that
$$\mathrm{Cov}(\hat X(\omega), \hat X(\omega')) = e^{i(\omega - \omega').\tau}\, \mathrm{Cov}(\hat X(\omega), \hat X(\omega')). \tag{4}$$
Since this is true for any $\tau \in \Lambda_d$, it implies that
$$\mathrm{Cov}(\hat X(\omega), \hat X(\omega')) = 0 \quad \text{if } \omega \neq \omega'. \tag{5}$$
If $X$ is Gaussian, then $\hat X$ is also Gaussian, so this non-correlation implies that $\hat X(\omega)$ and $\hat X(\omega')$ are independent. However, if $X$ is non-Gaussian, then $\hat X(\omega)$ and $\hat X(\omega')$ are typically not independent. To capture the dependencies of Fourier coefficients across frequencies, one can use high-order moments [22]. For any exponents $(k, k') \in \mathbb{Z}^2$, similarly to (4), translating $X$ by $\tau \in \Lambda_d$ yields
$$\mathrm{Cov}(\hat X(\omega)^k, \hat X(\omega')^{k'}) = e^{i(k\omega - k'\omega').\tau}\, \mathrm{Cov}(\hat X(\omega)^k, \hat X(\omega')^{k'}). \tag{6}$$
A high-order Fourier representation $R_v(X) = \hat X(\omega)^k$ is indexed by $v = (\omega, k) \in V$ with $0 \le k \le k_{\max}$. It results from (6) that
$$K_R(v, v') = \mathrm{Cov}(\hat X(\omega)^k, \hat X(\omega')^{k'}) = 0 \quad \text{if } k\omega \neq k'\omega'. \tag{7}$$
If $k\omega = k'\omega'$ and $X$ is not Gaussian, then this covariance is typically nonzero because the phase variations of $(\hat X(\omega')^{k'})^*$ cancel the phase variations of $\hat X(\omega)^k$. This is also the key idea behind the use of bispectrum moments [23]. By adjusting $(k, k')$, these moments provide some dependency information between $\hat X(\omega)$ and $\hat X(\omega')$ for $\omega \neq \omega'$. For example, let $X(u) = x(u - S)$ be a random shift vector, where $x(u)$ is a fixed signal supported in $\Lambda_d$ and $S$ is a random periodic shift which is uniformly distributed in $\Lambda_d$. It is a stationary process whose Fourier coefficients have a random phase: $\hat X(\omega) = \hat x(\omega)\, e^{-i\omega.S}$. In this case, if $k\omega = k'\omega'$, then
$$\mathrm{Cov}(\hat X(\omega)^k, \hat X(\omega')^{k'}) = \hat x(\omega)^k\, \hat x(\omega')^{*k'}. \tag{8}$$
It is nonzero at frequencies $(\omega, \omega')$ where $\hat x$ does not vanish. This shows that covariances of the high-order Fourier exponents $R(X)$ can capture the dependence of Fourier coefficients at different frequencies. However $\hat X(\omega)^k$ has a variability which is amplified by the exponent $k > 1$. The estimation of the covariance moments (8) thus has a large variance and typically becomes inaccurate when $k$ increases.
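The random shift example can be checked numerically. The sketch below, written for a one-dimensional signal ($r = 1$), averages over all $d$ equally likely shifts, which gives exact expectations instead of Monte Carlo estimates. It compares the second-order covariance at two different frequencies with the high-order covariance for $k = 2$, $k' = 1$, $\omega' = 2\omega$, for which $k\omega = k'\omega'$. The helper name `covariance` and the random test signal are illustrative assumptions.

```python
import numpy as np

def covariance(a, b):
    """Cov(A, B) = E(A B*) - E(A) E(B)*, with expectations taken as sample means."""
    return np.mean(a * np.conj(b)) - np.mean(a) * np.conj(np.mean(b))

d = 64
rng = np.random.default_rng(0)
x = rng.standard_normal(d)          # fixed signal (arbitrary, for illustration)
x_hat = np.fft.fft(x)

# Fourier transforms of all d periodic shifts x(u - S), S = 0, ..., d - 1.
# Rows index the shift S, columns index the frequency omega = 2 pi m / d.
X_hat = np.array([np.fft.fft(np.roll(x, s)) for s in range(d)])

m = 3
# k = k' = 1 at two different frequencies: vanishes, as in (5).
second_order = covariance(X_hat[:, m], X_hat[:, 2 * m])
# k = 2, k' = 1 with k * omega = k' * omega': nonzero, as in (8).
high_order = covariance(X_hat[:, m] ** 2, X_hat[:, 2 * m])
predicted = x_hat[m] ** 2 * np.conj(x_hat[2 * m])
```

Here `second_order` is zero up to rounding errors, while `high_order` matches `predicted`, the right-hand side of (8).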
3.2 Phase Harmonics
High-order exponents yield nonzero correlation coefficients in a Fourier basis by canceling the phase fluctuations. For $z = |z| e^{i\varphi(z)} \in \mathbb{C}$, $z^k = |z|^k e^{ik\varphi(z)}$ exponentiates the modulus together with the phase. The modulus exponentiation amplifies the variance of a random variable although it has little role in ensuring that correlation of Fourier coefficients is nonzero. We eliminate the modulus exponentiation and we replace it by the phase harmonics introduced in [13]. A phase harmonic computes a power $k \in \mathbb{Z}$ of the phase only:
$$[z]^k = |z|\, e^{ik\varphi(z)}.$$
It preserves the modulus: $|[z]^k| = |z|$. We shall see that it is computed with a windowed Fourier transform on the phase of $z$. This section reviews the properties of this phase windowed Fourier transform introduced in [13].
3.2.1 Phase Windowed Fourier Transform
A windowed Fourier transform of a signal $x(u)$ is a linear operator which multiplies $x(u)$ by a translated window along $u$ and computes a Fourier transform of the windowed signal along $u$. Because the phase $\varphi(z)$ is a nonlinear function of $z \in \mathbb{C}$, a windowed Fourier transform on the phase is a nonlinear operator. The phase of $z$ is translated by a variable $\alpha \in [0, 2\pi]$, and its support is limited by a $2\pi$ periodic window $h(\alpha)$:
$$H(z) = \{ |z|\, h(\varphi(z) + \alpha) \}_{\alpha \in [0, 2\pi]}. \tag{9}$$
This phase windowing is nonlinear. A phase windowed Fourier transform computes the Fourier transform of $H(z)$ relative to $\alpha$. Let us write $\hat h = F_\alpha(h)$ the Fourier transform along phases:
$$\hat h(k) = \frac{1}{2\pi} \int_0^{2\pi} h(\alpha)\, e^{-ik\alpha}\, d\alpha. \tag{10}$$
Applying $F_\alpha$ to (9) gives
$$\hat H(z) = \{ \hat h(k)\, [z]^k \}_{k \in \mathbb{Z}}. \tag{11}$$
It proves that a phase windowed Fourier transform $\hat H = F_\alpha H$ computes weighted phase harmonics. The harmonic weights $\hat h(k)$ amplify or eliminate different phase harmonics. The more regular the phase window $h$, the faster the decay of harmonic weights. A rectifier $\rho(a) = \max(a, 0)$ is an important example of nonlinearity which acts as a phase windowing. Indeed
$$\rho(\mathrm{Real}(z)) = |z|\, \rho(\cos \varphi(z)),$$
so
$$\{ \rho(\mathrm{Real}(e^{i\alpha} z)) \}_{\alpha \in [0, 2\pi]} = H(z) \quad \text{with } h(\alpha) = \rho(\cos \alpha). \tag{12}$$
The rectifier phase window $\rho(\cos \alpha)$ is positive and supported in $[-\pi/2, \pi/2]$. The corresponding harmonic weights are computed in [13] with the Fourier integral (10):
$$\hat h(k) = \begin{cases} \dfrac{-(i)^k}{\pi (k-1)(k+1)} & \text{if } k \text{ is even} \\ \dfrac{1}{4} & \text{if } k = \pm 1 \\ 0 & \text{if } |k| > 1 \text{ is odd.} \end{cases} \tag{13}$$
We mentioned that polynomial exponents amplify the variability of random variables around their mean. Indeed, if $(z, z') \in \mathbb{C}^2$, then $|z^k - z'^k| / |z - z'|$ may be arbitrarily large if $k > 1$. On the contrary, a phase harmonic preserves the modulus which provides a bound on such amplification. It is proved in [13] that it is Lipschitz continuous:
$$\forall (z, z') \in \mathbb{C}^2, \quad |[z]^k - [z']^k| \le \max(|k|, 1)\, |z - z'|. \tag{14}$$
The distance $|z - z'|$ is therefore amplified by at most $\max(|k|, 1)$.
3.2.2 Fourier Phase Harmonics Covariances
Similarly to high-order moments, phase harmonics can reveal the dependencies between Fourier coefficients by canceling their relative phase. Phase harmonic covariance coefficients are
$$K_{\hat H F_u}(\omega, k, \omega', k') = \hat h(k)\, \hat h(k')^*\, \mathrm{Cov}([\hat X(\omega)]^k, [\hat X(\omega')]^{k'}).$$
As in the Fourier case, we verify that
$$\mathrm{Cov}([\hat X(\omega)]^k, [\hat X(\omega')]^{k'}) = 0 \quad \text{if } k\omega \neq k'\omega'. \tag{15}$$
However these covariance coefficients are typically nonzero if $k\omega = k'\omega'$ when $\hat X(\omega)$ and $\hat X(\omega')$ are not independent. These properties are further studied in [14].
4 Wavelet Transforms
To model random vectors whose realizations include singularities and sharp transitions, following the work of Alex Grossmann and Jean Morlet [1], we replace the Fourier transform by a complex wavelet transform and we study the information carried by the phase. Sections 4.1 and 4.2 briefly review the properties of complex wavelet transforms.
4.1 Analytic Wavelets for 1D Signals
A one-dimensional wavelet transform is computed by convolutions with dilated wavelets. Analytic wavelets $\psi$ have a Fourier transform $\hat\psi(\omega)$ which is zero at negative frequencies. We impose that $\hat\psi$ is real valued, which implies that the real part of $\psi$ is even and its imaginary part is odd. Let $\xi = \int \omega\, |\hat\psi(\omega)|^2\, d\omega / \|\hat\psi\|^2$ be the mean frequency of $\psi$. We also suppose that $\hat\psi(\xi) > 0$. A wavelet transform with $Q$ scales per octave is calculated by dilating $\psi$ by $2^{j/Q}$, where $j$ is an integer and $Q$ is the number of intermediate scales per octave:
$$\psi_\lambda(u) = 2^{-j/Q}\, \psi(2^{-j/Q} u) \quad \text{and hence} \quad \hat\psi_\lambda(\omega) = \hat\psi(2^{j/Q} \omega).$$
The mean frequency of $\psi_\lambda$ is
$$\lambda = 2^{-j/Q}\, \xi.$$
A real wavelet of phase $\alpha$ is defined by $\mathrm{Real}(e^{-i\alpha} \psi_\lambda)$. The phase $\alpha$ is a symmetry parameter which makes the transition from even to odd filters and which changes the sign of the filter when adding $\pi$. Figure 1 gives the modulus and the phase of the wavelet transform of a one-dimensional signal, calculated with a Morlet wavelet, with $Q = 16$ scales per octave. A Morlet wavelet of center frequency $\xi$ is defined by $\psi(t) = e^{-t^2/2} (e^{i\xi t} - \alpha)$, where the constant $\alpha$ (unrelated to the phase parameter above) is adjusted so that $\int \psi(t)\, dt = 0$. It is approximately analytic. Fine scales correspond to high frequencies $\lambda$. Large modulus coefficients $|x \ast \psi_\lambda(u)|$ are sparse. They are located in the neighborhood of sharp signal transitions. The phase $\varphi(x \ast \psi_\lambda(u))$ gives a local symmetry information on the transition of $x \ast \psi_\lambda$ at $u$. Since the real and imaginary parts of $\psi_\lambda$ are, respectively, symmetric and antisymmetric, the variations of $x \ast \psi_\lambda$ are locally symmetric in the neighborhood of $u$ if $\varphi(x \ast \psi_\lambda(u)) = 0$ and antisymmetric if this phase is $\pi/2$. Lines of constant phase in Fig. 1b define curves across scales which depend upon local signal variations. Sharp transitions and singularities correspond to locations where the modulus has a large amplitude in Fig. 1a.
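A discrete version of this Morlet wavelet transform can be sketched as follows. This is a minimal illustration and not the implementation used for Fig. 1: it assumes a periodic signal sampled at unit intervals, a center frequency $\xi = 3\pi/4$ chosen below the Nyquist frequency, a constant $\alpha$ computed on the sampling grid so that the discrete wavelet sums to zero, and circular convolutions computed with the FFT. The names `morlet_1d` and `wavelet_transform_1d` are hypothetical.

```python
import numpy as np

def morlet_1d(d, j, Q, xi=3 * np.pi / 4):
    """Morlet wavelet dilated by 2^(j/Q), sampled on a periodic grid of d points
    and centered at index 0."""
    u = np.fft.fftfreq(d, d=1.0 / d)          # grid 0, 1, ..., d/2-1, -d/2, ..., -1
    scale = 2.0 ** (j / Q)
    t = u / scale
    gauss = np.exp(-t ** 2 / 2)
    carrier = np.exp(1j * xi * t)
    # alpha is adjusted so that the sampled wavelet has a zero sum.
    alpha = np.sum(gauss * carrier) / np.sum(gauss)
    return gauss * (carrier - alpha) / scale

def wavelet_transform_1d(x, J, Q):
    """Circular convolutions of x with wavelets over J octaves, Q scales per octave."""
    d = x.size
    x_hat = np.fft.fft(x)
    rows = [np.fft.ifft(x_hat * np.fft.fft(morlet_1d(d, j, Q))) for j in range(J * Q)]
    return np.array(rows)                      # shape (J * Q, d), complex valued

# Toy signal: a single Dirac, whose transform reproduces the wavelets themselves.
d, u0 = 256, 100
x = np.zeros(d)
x[u0] = 1.0
W = wavelet_transform_1d(x, J=3, Q=4)
modulus, phase = np.abs(W), np.angle(W)
```

For this toy signal, the modulus at every scale is maximal at the position of the Dirac, and the phase there is zero, in agreement with the symmetry interpretation given above.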
Fig. 1 Top: original signal $x(u)$ as a function of $u$. (a) Wavelet transform modulus $|x \ast \psi_\lambda(u)|$ with $Q = 16$ scales per octave, as a function of $(u, \log_2 \lambda)$ along the horizontal and vertical axes. White and black points correspond, respectively, to small and large amplitudes. (b) Complex phase $\varphi(x \ast \psi_\lambda(u))$ as a function of $(u, \log_2 \lambda)$
4.2 Rotated Wavelet Frames
Complex steerable wavelet frames were introduced in [24] and are further studied in [25], to easily compute wavelet coefficients of rotated images. A two-dimensional wavelet is a localized function $\psi(u)$ with $u \in \mathbb{R}^2$ such that $\int \psi(u)\,du = 0$. Complex steerable wavelets have a Fourier transform $\hat\psi(\omega)$ concentrated over one-half of the Fourier domain $\omega \in \mathbb{R}^2$. We impose that $\psi(-u) = \psi^*(u)$ so that $\hat\psi(\omega)$ is real. This Fourier transform is centered at a frequency $\xi \in \mathbb{R}^2$ and is non-negligible for $\omega \in \mathbb{R}^2$ such that $|\omega - \xi| \le C|\xi|$ for some $C > 0$. Let $r_\ell$ be a rotation by an angle $2\pi\ell/L$. Multiscale rotated wavelets are derived from $\psi$ with dilations by $2^j$ for $j \in \mathbb{Z}$ and rotations over $L$ angles $\theta = 2\pi\ell/L$ for $0 \le \ell < L$:
$$\psi_\lambda(u) = 2^{-j}\,\psi(2^{-j} r_{-\ell}\, u) \;\Rightarrow\; \hat\psi_\lambda(\omega) = 2^{j}\,\hat\psi(2^{j} r_{\ell}\,\omega) \quad\text{with}\quad \lambda = 2^{-j} r_{-\ell}\,\xi. \tag{16}$$
Since $\hat\psi(\omega)$ is non-negligible for $|\omega - \xi| \le C|\xi|$, it results that $\hat\psi_\lambda(\omega)$ is centered at $\lambda$ and non-negligible for $|\omega - \lambda| \le C|\lambda|$. In space, $\psi_\lambda(u)$ is non-negligible for $|u| \le C'|\lambda|^{-1}$ for some $C' > 0$. We limit the scale $2^j$ to a maximum $2^J$. The lowest frequencies are captured by a scaling function centered at $\lambda = 0$. It is computed by dilating a function $\phi(u)$ such that $\int \phi(u)\,du = 1$:
$$\psi_0(u) = 2^{-J}\,\phi(2^{-J}u) \;\Rightarrow\; \hat\psi_0(\omega) = 2^{J}\,\hat\phi(2^{J}\omega). \tag{17}$$
A wavelet frame is constructed by translating each $\psi_\lambda$ for $\lambda \neq 0$ by $u = 2^{j-1}n$ and $\psi_0$ by $u = 2^{J-1}n$ for all $n \in \mathbb{Z}^2$. This introduces a factor 2 oversampling relative to a wavelet orthonormal basis [26], which creates some redundancy. The wavelet transform of $x \in L^2(\mathbb{R}^2)$ is defined by
$$Wx = \{x \ast \psi_\lambda(u)\}_{(\lambda,u)\in\Lambda},$$
where $\Lambda$ is a frequency–space index set with $(\lambda, u) = (2^{-j} r_{-\ell}\,\xi,\, 2^{j-1}n)$ for $1 \le j \le J$, $0 \le \ell < L$, $n \in \mathbb{Z}^2$, or $(\lambda, u) = (0, 2^{J-1}n)$. Under appropriate conditions on $\psi$, the wavelet family $\{\psi_\lambda(\cdot - u)\}_{(\lambda,u)\in\Lambda}$ is a frame of $L^2(\mathbb{R}^2)$ [25], which implies that $W$ is invertible with a stable inverse. In all computations, we use a Morlet wavelet of center frequency $\xi$ defined by $\psi(u) = e^{-|u|^2/(2\sigma^2)}(e^{i\xi\cdot u} - \alpha)$, where $\alpha$ is adjusted so that $\int \psi(u)\,du = 0$ and $\sigma$ and $\xi$ are adjusted so that the translated wavelets define a frame.
The wavelet transform can be redefined over discrete images $x$ of $d$ pixels supported in a two-dimensional square grid $\Lambda_d$, uniformly sampled at intervals 1 in $[1, d^{1/2}]^2$. This requires discretizing and modifying the "boundary wavelets" whose supports intersect image boundaries, which can be done over steerable wavelets [24, 25]. The resulting wavelets $\psi_\lambda(\cdot - u)$ are supported in $\Lambda_d$. They are still indexed by $(\lambda, u) \in \Lambda$. If $\lambda = 2^{-j} r_{-\ell}\,\xi \neq 0$ for $1 \le j \le J$, $0 \le \ell < L$, then $\sum_{u\in\Lambda_d} \psi_\lambda(u) = 0$. If $\lambda = 0$, then $\sum_{u\in\Lambda_d} \psi_0(u) = 2^J$. Since $J \le (\log_2 d)/2$, there are at most $L(\log_2 d)/2 + 1$ different frequency channels $\lambda$. For $\lambda = 2^{-j} r_{-\ell}\,\xi \neq 0$, $\psi_\lambda$ is translated by $u = 2^{j-1}n \in \Lambda_d$, which yields $2^{-2j+2}d$ wavelet coefficients. The total number of wavelet coefficients is about $4Ld/3$ if $J = (\log_2 d)/2$.
The mother wavelet $\psi$ is chosen in order to obtain a sparse wavelet representation of realizations of $X$, with few large amplitude wavelet coefficients. This sparsity highlights non-Gaussian properties. Figure 2 displays the modulus and phase of wavelet coefficients of the vorticity field of a turbulent flow, computed with Morlet wavelets. This flow is obtained by running the 2D Navier-Stokes equation with periodic boundary conditions, initialized with a random Gaussian field [27]. After a fixed time, it defines a stationary but non-Gaussian random process. For each scale and orientation, large amplitude modulus coefficients are located at positions where the image has sharp transitions, and the phase depends upon the position of these sharp transitions.
Fig. 2 Top: turbulent velocity field. Bottom: each image gives the modulus (above) or the phase (below) of wavelet coefficients $x \ast \psi_\lambda(u)$ for different frequency channels $\lambda = 2^{-j} r_{-\ell}\,\xi$. Large modulus coefficients are shown in black. The first and third columns correspond to finer scale wavelets along vertical and horizontal directions, whereas second and fourth columns correspond to larger scale wavelets along vertical and horizontal directions
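The coefficient count quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that a channel at scale $2^j$ keeps one coefficient per translation $u = 2^{j-1}n$ on a square grid of $d$ pixels, that is $d/4^{j-1}$ coefficients per channel, and that the low-pass channel is sampled at intervals $2^{J-1}$; the function name and the image size are illustrative.

```python
def wavelet_coefficient_count(d, L, J):
    """Number of wavelet coefficients of a d-pixel image with L angles and J scales.

    Assumption: subsampling by 2^(j-1) along each of the two axes, so a
    band-pass channel at scale j holds d / 4^(j-1) coefficients.
    """
    band_pass = sum(L * (d // 4 ** (j - 1)) for j in range(1, J + 1))
    low_pass = d // 4 ** (J - 1)
    return band_pass + low_pass

d = 256 * 256   # illustrative image size, d = 4^8
L = 8           # illustrative number of angles
J = 8           # J = (log2 d) / 2
count = wavelet_coefficient_count(d, L, J)
print(count, 4 * L * d / 3)
```

The geometric series $\sum_{j \ge 1} 4^{-(j-1)} = 4/3$ is what produces the factor $4Ld/3$.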
5 Maximum Entropy Models with Wavelet Phases
5.1 Foveal Wavelet Covariance Models
A wavelet transform defines a linear representation $R = W$ indexed by $v = (\lambda, u)$. Similarly to Fourier coefficients, we show that wavelet coefficients have a covariance which nearly vanishes at different frequencies and thus does not capture dependencies across frequencies. Wavelet covariances define Gaussian maximum entropy models that are briefly reviewed.
Similarly to Fourier coefficients, wavelet coefficients have a zero mean at nonzero frequencies. If $\lambda \neq 0$, then $E(X \ast \psi_\lambda(u)) = 0$ because $\sum_u \psi_\lambda(u) = 0$. If $\lambda = 0$, then $\sum_u \psi_0(u) = 2^J$, so $E(X \ast \psi_0(u)) = 2^J E(X(u))$. The covariance at $v = (\lambda, u)$ and $v' = (\lambda', u')$ is
$$K_W(v, v') = \mathrm{Cov}\big(X \ast \psi_\lambda(u),\, X \ast \psi_{\lambda'}(u')\big).$$
It depends on $u - u'$ because $X$ is stationary. Let $\hat\psi_\lambda(\omega)$ be the discrete Fourier transform of $\psi_\lambda(u)$ defined in (3). Since wavelet coefficients are convolutions, covariance values can be rewritten from the power spectrum $\hat K(\omega) = d^{-1}\,\mathrm{Cov}(\hat X(\omega), \hat X(\omega))$ of $X$:
$$K_W(v, v') = \sum_{\omega\in\Lambda_d} \hat K(\omega)\,\hat\psi_\lambda(\omega)\,\hat\psi^*_{\lambda'}(\omega)\,e^{i(u-u')\cdot\omega}. \tag{18}$$
It results that $K_W(v, v') = 0$ if $\hat\psi_\lambda(\omega)\,\hat\psi_{\lambda'}(\omega) = 0$ for all $\omega$. Since $\hat\psi_\lambda(\omega)$ is non-negligible only if $|\omega - \lambda| \le C|\lambda|$, the covariance $K_W(v, v')$ is non-negligible for $\lambda \neq \lambda'$ only if
$$\frac{|\lambda - \lambda'|}{|\lambda| + |\lambda'|} \le C. \tag{19}$$
It shows that similarly to Fourier coefficients, wavelet covariances are negligible across frequencies which are sufficiently far apart. A maximum entropy model conditioned by wavelet covariances is Gaussian because the wavelet transform is linear. The top images of Fig. 3 give a realization of a stationary turbulent flow $X$ and of a physical process generating bubbles. To reduce the dimensionality of the Gaussian model, we only keep wavelet covariance coefficients which are non-negligible. This corresponds to covariance coefficients $K_W(v, v')$ on a graph which relates $v' = (u', \lambda')$ in a small neighborhood of $v = (u, \lambda)$. Since wavelet coefficients are nearly decorrelated across frequencies, at each scale $2^j$, the neighborhood of $v = (u, \lambda)$ is defined as the set of $v' = (u', \lambda')$ at the same scale $\lambda' = \lambda$ and within a spatial neighborhood proportional to the scale, $|u - u'| \le \Delta\, 2^{j-1}$. Covariances are thus only specified at each scale over a spatial range proportional to the scale. It is called a foveal model. Long-range spatial correlations are partly captured because $\Delta\, 2^{j-1}$ becomes large at large scales.
Fig. 3 The first row shows a realization $\bar{x}$ of $X$ for a two-dimensional turbulent flow and of a physical bubble process. The second row gives realizations of Gaussian models conditioned by wavelet covariance coefficients estimated from $\bar{x}$. The third row is generated by microcanonical models conditioned by wavelet phase harmonic covariances estimated from $\bar{x}$
It provides high-frequency correlations between close points and low-frequency correlations between far away points. It is similar to a visual fovea [28]. This foveal neighborhood is sufficient to approximate the covariances of large classes of random processes such as fractional Brownian motions [29]. Since $(u, u') = 2^{j-1}(n, n')$, the neighborhoods of all $v$ have the same size, which is smaller than $(2\Delta + 1)^2$.
The realizations of the Gaussian models in the second row of Fig. 3 have a geometry which is very different from the original turbulence flow or the bubble process $X$ shown in the first row. It shows that in both cases, $X$ is highly non-Gaussian. This also appears by looking at the wavelet transform of the turbulent flow in Fig. 2. We explained that the wavelet coefficients of a stationary process $X$ are not correlated at different scales and angles. If $X$ were Gaussian, it would imply that these wavelet coefficients are independent. On the contrary, Fig. 2 shows that the modulus and phase of wavelet coefficients of $X$ are strongly dependent across scales and angles. High-amplitude modulus coefficients are located in the same spatial neighborhoods because they are produced by the same sharp transitions of the flow. The next section explains how to capture these dependencies with phase harmonics.
5.2 Wavelet Phase Harmonic Models
Similarly to the Fourier case, phase harmonics create correlations between wavelet coefficients across different frequency bands. We first study the properties of the resulting covariance matrix. We then show that maximum entropy models conditioned by these phase harmonic covariances define non-Gaussian models that restore the geometry of coherent structures.
5.2.1 Wavelet Phase Harmonics
To specify the dependence across frequencies, we apply a phase harmonic operator to wavelet coefficients:
$$\hat H(Wx) = \big\{\hat h(k)\,[x \ast \psi_\lambda(u)]^k\big\}_{(\lambda,u)\in\Lambda,\,k\in\mathbb{Z}}.$$
The coefficients of $R = \hat H W$ are indexed by $v = (\lambda, k, u)$. The covariance coefficients of $\hat H(WX)$ are
$$K_{\hat H W}(v, v') = \hat h(k)\,\hat h(k')^*\,\mathrm{Cov}\big([X \ast \psi_\lambda(u)]^k,\, [X \ast \psi_{\lambda'}(u')]^{k'}\big).$$
Since $X$ is stationary, it only depends on $u - u'$. Such wavelet harmonic covariances were first computed by Portilla and Simoncelli [11] to characterize the statistics of image textures. Their representation corresponds to $(k, k')$ equal to $(0, 0)$, $(1, 1)$, and $(1, 2)$, which amounts to choosing $\hat h(k) = 1_{[0,2]}(k)$.
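A small numerical illustration of why phase harmonics create correlations across frequency bands is sketched below. It assumes the usual phase harmonic definition $[z]^k = |z|\,e^{ik\varphi(z)}$, which is not restated in this passage; the two sequences stand in for wavelet coefficients in two different frequency bands and are purely illustrative.

```python
import cmath
import math

def phase_harmonic(z, k):
    """[z]^k = |z| * exp(i*k*phase(z)): the modulus is kept, the phase is multiplied by k."""
    return abs(z) * cmath.exp(1j * k * cmath.phase(z))

def correlation(a, b):
    """Empirical correlation (1/N) * sum_n a_n * conj(b_n)."""
    return sum(p * q.conjugate() for p, q in zip(a, b)) / len(a)

N = 64
thetas = [2.0 * math.pi * n / N for n in range(N)]
z1 = [2.0 * cmath.exp(1j * th) for th in thetas]   # coefficients oscillating at frequency 1
z2 = [3.0 * cmath.exp(2j * th) for th in thetas]   # coefficients oscillating at frequency 2

# Without harmonics the two bands are uncorrelated.
plain = correlation(z1, z2)
# Doubling the phase of z1 aligns it with z2, which gives a correlation of 2 * 3.
harmonic = correlation([phase_harmonic(z, 2) for z in z1], z2)
print(abs(plain), harmonic)
```

With $k = 0$ the phase is removed altogether, so $[z]^0 = |z|$ and the pair $(k, k') = (0, 0)$ measures correlations between moduli.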
5.2.2 Rectified Neural Network Coefficients
Ustyuzhaninov et al. in [12] have shown that one can get good texture synthesis from the covariance of a one-layer convolutional neural network, computed with a rectifier. In the following, we show that these statistics are equivalent to phase harmonic covariances, computed with a rectifier phase window $h(\alpha)$. Section 3.2 proves that $\hat H = F_\alpha H$, where $H$ computes a phase windowing of wavelet coefficients
$$H(Wx) = \big\{|x \ast \psi_\lambda(u)|\; h\big(\varphi(x \ast \psi_\lambda(u)) + \alpha\big)\big\}_{(\lambda,u)\in\Lambda,\,\alpha\in[0,2\pi]}.$$
The covariances of $\hat H(WX)$ and $H(WX)$ thus satisfy $K_{\hat H W} = F_\alpha\, K_{HW}\, F_\alpha^{-1}$. It results from (12) that $K_{HW}$ gives the covariance of rectified wavelet coefficients if $h(\alpha) = \rho(\cos\alpha)$ with a rectifier $\rho(u) = \max(u, 0)$. In this case, for any $v = (\lambda, \alpha, u)$ and $v' = (\lambda', \alpha', u')$,
$$K_{HW}(v, v') = \mathrm{Cov}\big(\rho(X \ast \psi_{\lambda,\alpha}(u)),\, \rho(X \ast \psi_{\lambda',\alpha'}(u'))\big) \tag{20}$$
with $\psi_{\lambda,\alpha}(u) = \mathrm{Real}(e^{-i\alpha}\psi_\lambda(u))$. It computes the covariance of rectified wavelet coefficients $\rho(X \ast \psi_{\lambda,\alpha}(u))$. Such coefficients can be interpreted as the output of a one-layer convolutional network, computed with wavelet filters $\psi_{\lambda,\alpha}$ of different frequencies $\lambda$ and phases $\alpha$. The difference with the statistics used by Ustyuzhaninov et al. in [12] lies in the choice of network filters. They use local cosine or random filters in their network as opposed to wavelets.
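The link between a rectifier and a phase window rests on an elementary identity that can be checked directly. For a real signal $x$, $x \ast \mathrm{Real}(e^{-i\alpha}\psi_\lambda) = \mathrm{Real}(e^{-i\alpha}\, x \ast \psi_\lambda)$, so writing $z$ for a complex wavelet coefficient, the rectified output is $\rho(\mathrm{Real}(e^{-i\alpha}z)) = |z|\,\rho(\cos(\varphi(z) - \alpha))$. The sketch below verifies this on a grid of values; the function names are illustrative, and with this sign convention the window is evaluated at $\varphi(z) - \alpha$ (writing it with $+\alpha$ only relabels $\alpha$ as $-\alpha$).

```python
import cmath
import math

def relu(t):
    """Rectifier rho(t) = max(t, 0)."""
    return max(t, 0.0)

def rectified_real_part(z, alpha):
    """rho(Real(exp(-i*alpha) * z)): rectified output of a real filter of phase alpha."""
    return relu((cmath.exp(-1j * alpha) * z).real)

def windowed_modulus(z, alpha):
    """|z| * h(phase(z) - alpha) with the rectifier phase window h(a) = rho(cos a)."""
    return abs(z) * relu(math.cos(cmath.phase(z) - alpha))

# Largest discrepancy between the two expressions over a grid of moduli and phases.
max_gap = 0.0
for modulus in (0.1, 1.0, 7.5):
    for p in range(16):
        z = modulus * cmath.exp(2j * math.pi * p / 16)
        for q in range(16):
            alpha = 2.0 * math.pi * q / 16
            gap = abs(rectified_real_part(z, alpha) - windowed_modulus(z, alpha))
            max_gap = max(max_gap, gap)
print(max_gap)
```

The identity only uses the positive homogeneity of the rectifier, $\rho(ct) = c\,\rho(t)$ for $c \ge 0$.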
5.2.3 Maximum Entropy Wavelet Phase Harmonic Foveal Model
The phase harmonic operator is defined by the choice of the harmonic weights $\hat h$. In the following, we shall impose that $\hat h(k) = 1_{[k_{\min}, k_{\max}]}(k)$, which limits harmonic exponents to the range $[k_{\min}, k_{\max}]$. Wavelet harmonic coefficients are indexed by $v = (\lambda, k, u)$, with $\lambda = 2^{-j} r_{-\ell}\,\xi$ and $u = 2^{j-1}n$. The covariance graph model is specified by the correlation neighborhoods $N_v$ of each $v$. A foveal model defines neighborhoods whose size does not depend upon $v$. The range of spatial, scale, and angular parameters is limited by three parameters $\Delta_n$, $\Delta_j$, and $\Delta_\ell$. A vertex $v' = (\lambda', k', u')$ with $\lambda' = 2^{-j'} r_{-\ell'}\,\xi$ and $u' = 2^{j'-1}n'$ is a neighbor of $v = (\lambda, k, u)$ only if
$$|n - n'| \le \Delta_n, \quad |j - j'| \le \Delta_j, \quad |\ell - \ell'| \le \Delta_\ell, \quad (k, k') \in [k_{\min}, k_{\max}]^2.$$
Such foveal models have been used by Portilla and Simoncelli [11] to synthesize image textures.
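The neighborhood rule can be written as a small predicate. This is only a sketch: the `Vertex` record and the function name are hypothetical, $|n - n'|$ is taken to be the sup norm on $\mathbb{Z}^2$, and $|\ell - \ell'|$ is the plain difference of angle indices without wrap-around, none of which is specified in the text.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Vertex:
    """Index v = (lambda, k, u) stored through (j, l, k, n)."""
    j: int               # scale index, lambda = 2^-j r_-l xi
    l: int               # angle index, 0 <= l < L
    k: int               # harmonic exponent
    n: Tuple[int, int]   # translation index, u = 2^(j-1) n

def is_neighbor(v, w, delta_n, delta_j, delta_l, k_min, k_max):
    """Necessary condition for w to be in the foveal neighborhood of v."""
    # Assumption: |n - n'| is measured with the sup norm.
    close_in_space = max(abs(v.n[0] - w.n[0]), abs(v.n[1] - w.n[1])) <= delta_n
    close_in_scale = abs(v.j - w.j) <= delta_j
    # Assumption: angle indices are compared without wrapping modulo L.
    close_in_angle = abs(v.l - w.l) <= delta_l
    harmonics_allowed = k_min <= v.k <= k_max and k_min <= w.k <= k_max
    return close_in_space and close_in_scale and close_in_angle and harmonics_allowed

v = Vertex(j=2, l=0, k=1, n=(5, 5))
near = Vertex(j=3, l=1, k=2, n=(6, 4))
far = Vertex(j=4, l=1, k=2, n=(6, 4))
print(is_neighbor(v, near, 2, 1, 2, 0, 2), is_neighbor(v, far, 2, 1, 2, 0, 2))
```

Because the thresholds do not depend on $v$, every vertex has a neighborhood of the same bounded size, which is the defining property of the foveal model.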
We define a foveal wavelet phase harmonic model with $k_{\min} = 0$ and $k_{\max} = 2$. It incorporates covariances of the modulus and phase of wavelet coefficients across scales and angles. Neighborhoods are limited by $\Delta_j = 1$, $\Delta_\ell = L/4$ and $\Delta_n = 2$. It incorporates $j' = j + 1$ to capture scale interactions. The phases of wavelet coefficients at scales $2^j$ and $2^{j'} = 2^{j+1}$ are correlated with the harmonic exponents $(k, k') = (1, 2)$. The set of $(k, k')$ is restricted to $k' = 0, 1, 2$ when $k = 0$ and $k' = 1, 2$ when $k = 1$.
The top row of Fig. 3 shows a realization $\bar{x}$ of different stationary processes $X$. Realizations of Gaussian models shown below do not capture coherent geometric structures. We sample microcanonical maximum entropy models conditioned by foveal wavelet phase harmonic covariances. Empirical estimations of wavelet phase harmonic covariances are calculated from each $\bar{x}$. In these examples, the number of covariance coefficients is about 10 times smaller than the number of image pixels. Microcanonical models described in Sect. 2 are uniform distributions over sets of signals having close empirical covariance values. We approximate the sampling of this microcanonical model with a gradient descent. The realization is initialized to be a Gaussian white noise, and it is iteratively modified to adjust its wavelet phase harmonic covariances. The third row of Fig. 3 shows realizations of these wavelet phase harmonic models computed with the gradient descent. Adjusting wavelet phase harmonic covariances across frequencies and orientations restores the coherent structures of these turbulence and bubble processes. Besides visual evaluations, a detailed numerical study in [14] shows that such models reproduce multiscale structure functions and high-order moments. Wavelet phase harmonic models have also been applied to characterize the statistics of projected two-dimensional mass density fields in cosmology [30]. It is shown that such maximum entropy models reproduce physical properties and a wide range of high-order moments including bispectrum moments.
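The sampling procedure described above (start from Gaussian white noise, then descend on the mismatch between empirical and target statistics) can be illustrated on a deliberately tiny surrogate. The sketch below is not the wavelet phase harmonic model: it replaces the phase harmonic covariances by two second-order moments of a periodic one-dimensional signal, and the signal length, step size, and target values are arbitrary assumptions chosen only to keep the example self-contained.

```python
import random

def second_moments(x):
    """Empirical lag-0 and lag-1 second moments of a periodic signal."""
    d = len(x)
    m0 = sum(v * v for v in x) / d
    m1 = sum(x[i] * x[(i + 1) % d] for i in range(d)) / d
    return m0, m1

def loss_and_gradient(x, target):
    """Squared distance between the moments of x and the target, with its gradient."""
    d = len(x)
    m0, m1 = second_moments(x)
    e0, e1 = m0 - target[0], m1 - target[1]
    # d(m0)/dx_i = 2 x_i / d and d(m1)/dx_i = (x_{i-1} + x_{i+1}) / d (periodic indices).
    gradient = [
        (4.0 * e0 * x[i] + 2.0 * e1 * (x[i - 1] + x[(i + 1) % d])) / d
        for i in range(d)
    ]
    return e0 * e0 + e1 * e1, gradient

def sample(target, d=60, steps=1000, seed=0):
    """Gradient descent from Gaussian white noise toward signals matching the target moments."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    step = d / 40.0  # conservative step size for this toy loss (an assumption)
    initial_loss, _ = loss_and_gradient(x, target)
    for _ in range(steps):
        _, g = loss_and_gradient(x, target)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    final_loss, _ = loss_and_gradient(x, target)
    return x, initial_loss, final_loss

target = (1.0, 0.5)  # illustrative target moments
x, initial_loss, final_loss = sample(target)
print(initial_loss, final_loss, second_moments(x))
```

Different seeds give different signals with nearly the same moments, which is the sense in which the descent approximates sampling from a set of signals with close empirical statistics.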
6 Conclusion
The analysis of wavelet phases is yet another example of the remarkable scientific vision of Alex Grossmann. He was pointing out the importance of the phase to capture multiscale structures at the end of the 1980s, but besides some early work, it took more than 20 years to understand the importance of wavelet phases in signals and random process models. This chapter reviews techniques to do so with correlation matrices based on wavelet phase harmonics. It shows that maximum entropy models conditioned by wavelet phase harmonic covariances can indeed restore coherent structures of large classes of non-Gaussian random fields. These results apply to a wide range of multiscale processes such as turbulent flows or cosmological mass density fields. The importance of multiscale phases also appears in convolutional neural networks through the use of nonlinear rectifiers to produce phase harmonics. However, this topic initiated by Alex Grossmann remains mostly unexplored, with little mathematical understanding of phase alignment properties across scales.
Acknowledgement This work was supported by the PRAIRIE 3IA Institute of the French ANR-19-P3IA-0001 program.
References
1. Grossmann, A and Morlet, J. Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. of Math. Anal., 15(4):723–736, July 1984.
2. Grossmann, A, Kronland-Martinet, R, and Morlet, J. Reading and understanding continuous wavelet transforms. In Combes, J, editor, Wavelets, time-frequency representations and phase space. Springer, Berlin, 1989.
3. Delprat, N, Escudié, B, Guillemain, P, Kronland-Martinet, R, Tchamitchian, P, and Torrésani, B. Asymptotic wavelet and Gabor analysis: extraction of instantaneous frequencies. IEEE Trans. Info. Theory, 38(2):644–664, March 1992.
4. Meyer, Y. Wavelets and Operators. Advanced mathematics. Cambridge University Press, 1992.
5. Jaffard, S. Pointwise smoothness, two-microlocalisation and wavelet coefficients. Publicacions Matemàtiques, 35:155–168, 1991.
6. Mallat, S and Hwang, W. L. Singularity detection and processing with wavelets. IEEE Trans. on Information Theory, 38(2):617–643, March 1992.
7. Donoho, D and Johnstone, I. Ideal denoising in an orthonormal basis chosen from a library of bases. C.R. Acad. Sci. Paris, Série I, 319:1317–1322, 1994.
8. Wendt, H, Abry, P, and Jaffard, S. Bootstrap for empirical multifractal analysis. IEEE Signal Processing Magazine, 24(4):38–48, July 2007.
9. Farge, M, Schneider, K, and Kevlahan, N. Non-Gaussianity and coherent vortex simulation for two-dimensional turbulence using an adaptive orthogonal wavelet basis. Physics of Fluids, 11(8):2187–2201, 1999.
10. Geman, S and Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):721–741, 1984.
11. Portilla, J and Simoncelli, E. P. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1):49–70, Oct 2000.
12. Ustyuzhaninov, I, Brendel, W, Gatys, L, and Bethge, M. What does it take to generate natural textures? In International Conference on Learning Representations, Apr 2017.
13. Mallat, S, Zhang, S, and Rochette, G. Phase harmonic correlations and convolutional neural networks. Information and Inference: A Journal of the IMA, 11 2019.
14. Zhang, S and Mallat, S. Wavelet phase harmonic covariance models of stationary processes. Submitted to Jour. of Pure and Applied Harmonic Analysis, 2019.
15. Jaynes, E. T. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.
16. Cover, T. M and Thomas, J. A. Elements of Information Theory. Wiley-Interscience, New York, NY, USA, 2006.
17. Dembo, A and Zeitouni, O. Large Deviations Techniques and Applications. Jones and Bartlett Publishers, Boston, 1993.
18. Deuschel, J, Stroock, D, and Zessin, H. Microcanonical distributions for lattice gases. Commun. Math. Phys., 139:83–101, 1991.
19. Georgii, H.-O. Gibbs measures and phase transitions, volume 9. Walter de Gruyter, 2011.
20. Lustig, R. Microcanonical Monte Carlo simulation of thermodynamic properties. The Journal of Chemical Physics, 109(20):8816–8828, 1998.
21. Bruna, J and Mallat, S. Multiscale sparse microcanonical models. Mathematical Statistics and Learning, 1:257–315, 2018.
22. Cramér, H. Mathematical Methods of Statistics (PMS-9). Princeton University Press, 1999.
23. Rao, T. S and Gabr, M. An introduction to bispectral analysis and bilinear time series models, volume 24. Springer Science & Business Media, 2012.
24. Simoncelli, E. P and Freeman, W. T. The steerable pyramid: a flexible architecture for multiscale derivative computation. In Proceedings, International Conference on Image Processing, volume 3, pages 444–447, Oct 1995.
25. Unser, M and Chenouard, N. A unifying parametric framework for 2d steerable wavelet transforms. SIAM Journal on Imaging Sciences, 6(1):102–135, 2013.
26. Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way, 3rd Edition. Academic Press, 2001.
27. Schneider, K, Ziuber, J, Farge, M, and Azzalini, A. Coherent vortex extraction and simulation of 2d isotropic turbulence. Journal of Turbulence, (7):N44, 2006.
28. Mallat, S. Foveal detection and approximation for singularities. Applied and Computational Harmonic Analysis, 14(2):133–180, 2003.
29. Andreux, M. Foveal Autoregressive Neural Time-Series Modeling. PhD thesis, École normale supérieure, 2018.
30. Allys, E, Marchand, T, Cardoso, J.-F, Villaescusa-Navarro, F, Ho, S, and Mallat, S. New interpretable statistics for large-scale structure analysis and generation. Physical Review D, 102(10), Nov 2020.
Multiscale Decompositions of Hardy Spaces Ronald R. Coifman and Jacques Peyrière
1 Introduction
We would like to elaborate on a program of analysis pursued by Alex Grossmann and his collaborators on the analytic utilization of the phase of Hardy functions, as a multiscale signal processing tool. An inspiration at the origin of “wavelet” analysis (when Grossmann, Morlet, Meyer, and collaborators were interacting and exploring versions of multiscale representations) was provided by the analysis of holomorphic signals, for which the images of the phase of Cauchy wavelets were remarkable in their ability to reveal intricate singularities or dynamic structures, such as instantaneous frequency jumps, in musical recordings. This work, which was pursued by Grossmann, Kronland-Martinet et al. [10] exploiting phase and amplitude variability of holomorphic signals, was challenged by computational complexity as well as by the lack of simple, efficient, mathematical processing and generalizations to higher dimensional signals. It was mostly bypassed by the orthogonal wavelet transforms. We aim to show that these ideas are powerful and subtle nonlinear tools.
R. R. Coifman
Department of Mathematics, Program in Applied Mathematics, Yale University, New Haven, CT, USA
J. Peyrière
Institut de Mathématiques d’Orsay, CNRS, Université Paris-Saclay, Orsay, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_20
Our goal here is to follow their seminal work and introduce recent developments in nonlinear analysis. In particular, we will sketch methods extending conventional Fourier analysis, exploiting both phase and amplitudes of holomorphic functions. The miracles of nonlinear complex analysis, such as factorization and composition of functions, lead to new versions of holomorphic wavelets and relate them to multiscale dynamical systems. Our story interlaces the role of the phase of signals with their analytic/geometric properties. The Blaschke factors are a key ingredient in building analytic tools, starting with the Malmquist-Takenaka orthonormal bases of the Hardy space $H^2(\mathbb{T})$, continuing with “best” adapted bases obtained through phase unwinding, and concluding with relations to composition of Blaschke products and their dynamics (on the disk and on invariant subspaces of $H^2(\mathbb{T})$). Specifically, we discuss multiscale orthonormal holomorphic wavelet bases, related to Grossmann’s and Morlet’s program [7], and associated generalized scaled holomorphic orthogonal bases, related to dynamical systems obtained by composing Blaschke factors. We also remark that the phase of a Blaschke product is a one-layer neural net (with $\arctan$ as an activation sigmoid) and that the composition is a “Deep Neural Net” whose depth is the number of compositions, and our results provide a wealth of related libraries of orthogonal bases. We sketch these ideas in various “vignette” subsections and refer to [2] for more details on the analytic methods related to the Blaschke-based nonlinear phase unwinding decompositions [3, 4, 13]; we also consider orthogonal decompositions of invariant subspaces of Hardy spaces. In particular, we constructed a multiscale decomposition, described below, of the Hardy space of the upper half plane. Such a decomposition can be carried over to the unit disk by conformal mapping.
A somewhat different multiscale decomposition of the space $H^2(\mathbb{T})$ has been constructed by using Malmquist-Takenaka bases associated with Blaschke products whose zeros are $(1 - 2^{-n})\,e^{2i\pi j/2^n}$, where $n \ge 1$ and $0 \le j < 2^n$ [6]. Here we provide a variety of multiscale decompositions by considering iterations of Blaschke products.
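The remark that the phase of a Blaschke product is a one-layer network with an $\arctan$ activation can be made concrete on the real line. The sketch below uses upper half plane factors of the form $(x - a)/(x - \bar a)$ with $a = \alpha + i\beta$, $\beta > 0$, for which the phase is $2\arctan((x - \alpha)/\beta) - \pi$; unimodular constant factors are left out, and the zeros chosen here are arbitrary.

```python
import cmath
import math

def blaschke_product(x, zeros):
    """Product over the zeros a of (x - a) / (x - conj(a)), evaluated at a real x."""
    value = 1.0 + 0.0j
    for a in zeros:
        value *= (x - a) / (x - a.conjugate())
    return value

def phase_network(x, zeros):
    """Phase of the product written as a one-layer network:
    a sum over units of 2*arctan((x - Re a) / Im a) - pi."""
    return sum(2.0 * math.atan((x - a.real) / a.imag) - math.pi for a in zeros)

zeros = [0.5 + 1.0j, -2.0 + 0.25j, 3.0 + 2.0j]   # illustrative zeros with positive imaginary part
points = [-5.0, -1.0, 0.0, 0.7, 4.0]
# Compare exp(i * phase) with the product itself, which avoids any 2*pi ambiguity.
max_gap = max(
    abs(cmath.exp(1j * phase_network(x, zeros)) - blaschke_product(x, zeros))
    for x in points
)
print(max_gap)
```

Each zero contributes one unit whose weight is $1/\operatorname{Im} a$ and whose bias is $-\operatorname{Re} a/\operatorname{Im} a$, so composing Blaschke products stacks such layers.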
2 Preliminaries and Notation For .p ≥ 1, .Hp (T) stands for the space of analytic functions f on the unit disk .D such that ˆ .
sup 0 0}. The space of analytic functions f on .H such that .
$$\sup_{y>0} \|f(\cdot + iy)\|_{L^p(\mathbb{R})} < +\infty$$
is denoted by $H^p(\mathbb{R})$. These functions have boundary values in $L^p(\mathbb{R})$ when $p \ge 1$. The space $H^p(\mathbb{R})$ is identified with the space of $L^p$ functions whose Fourier transform vanishes on the negative half line $(-\infty, 0)$. A subspace of $H^2(\mathbb{R})$ is said to be invariant if it is stable under multiplication by the functions $e^{2i\pi\xi x}$ for all $\xi > 0$. As previously, the invariant subspaces are of the form $uH^2$, where $u$ is an inner function, i.e., a bounded analytic function on $\mathbb{H}$ whose boundary values are of modulus 1 almost everywhere. As previously, the operators of orthogonal projection on invariant subspaces extend, for any $p \in (1, +\infty)$, as continuous operators on $H^p(\mathbb{R})$ with a uniform bound for their norms.
3 Malmquist-Takenaka Bases on the Torus
Lemma 1 Let $a$ be a complex number of modulus less than 1 and $u$ be an inner function. Then, $(z - a)uH^2$ has codimension 1 in $uH^2$ and $u\,\dfrac{\sqrt{1 - |a|^2}}{1 - \bar a z}$ is a unit vector in the orthogonal complement of $(z - a)uH^2$ in $uH^2$.
Proof Since $f \mapsto uf$ is an isometry of $H^2$ onto $uH^2$, it is enough to consider the case $u = 1$. One has
$$\left\langle (z - a)f(z),\, \frac{\sqrt{1 - |a|^2}}{1 - \bar a z} \right\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} (e^{i\theta} - a)\,f(e^{i\theta})\,\frac{\sqrt{1 - |a|^2}}{1 - a e^{-i\theta}}\,d\theta = \frac{\sqrt{1 - |a|^2}}{2\pi}\int_{-\pi}^{\pi} e^{i\theta} f(e^{i\theta})\,d\theta = 0.$$
Also, if $f$ is orthogonal to $(1 - \bar a z)^{-1}$, one has
$$0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{f(e^{i\theta})}{1 - a e^{-i\theta}}\,d\theta = \frac{1}{2i\pi}\oint \frac{f(z)}{z - a}\,dz = f(a),$$
so $f \in (z - a)H^2$.
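The two facts used in the proof can be checked numerically in the case $u = 1$. The sketch below approximates the inner product of $H^2(\mathbb{T})$ by a Riemann sum on the unit circle; the point $a$ and the polynomial $f$ are arbitrary illustrative choices.

```python
import cmath
import math

def inner(f, g, n=512):
    """<f, g> = (1/2pi) * integral of f(e^{i theta}) * conj(g(e^{i theta})) d theta,
    approximated by an equispaced Riemann sum on the unit circle."""
    total = 0.0 + 0.0j
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        total += f(z) * g(z).conjugate()
    return total / n

a = 0.4 + 0.3j   # illustrative point of the unit disk, |a| = 0.5

def kernel(z):
    """Normalized kernel sqrt(1 - |a|^2) / (1 - conj(a) z)."""
    return math.sqrt(1.0 - abs(a) ** 2) / (1.0 - a.conjugate() * z)

def f(z):
    """An arbitrary element of H^2 (a polynomial)."""
    return 1.0 + 2.0 * z - 0.5j * z ** 3

def g(z):
    """An element of (z - a) H^2."""
    return (z - a) * f(z)

norm_squared = inner(kernel, kernel).real   # the kernel is a unit vector
orthogonality = abs(inner(g, kernel))       # (z - a) f is orthogonal to the kernel
evaluation_gap = abs(inner(f, kernel) - math.sqrt(1.0 - abs(a) ** 2) * f(a))
print(norm_squared, orthogonality, evaluation_gap)
```

The last quantity expresses that pairing with the kernel evaluates $f$ at $a$ up to the normalizing constant, which is why orthogonality to the kernel is equivalent to $f(a) = 0$.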
Now, let (an )n>0 be a sequence of complex numbers of modulus less than 1. For n ≥ 0, let Bn (z) =
.
0≤j 0
Consider a sequence (Bm )m≥1 : Bm (z) =
.
a m,j z − am,j |am,j | 1 − a m,j z
j >0
of convergent Blaschke products such that Let B0 = 1 and, for n ≥ 0 and m ≥ 1,
Bm,n (z) =
.
0≤j 0
z − am,j 1 − a m,j z
$$\phi_{m,n}(z) = B_{m-1}(z)\,B_{m,n}(z)\,\frac{\sqrt{1 - |a_{m,n}|^2}}{1 - \bar a_{m,n} z}.$$
Then, $(\phi_{m,n})_{n\ge 1}$ is an orthonormal basis of $B_{m-1}H^2 \ominus B_{m-1}B_m H^2$, and $(\phi_{m,n})_{m\ge 1,\,n\ge 1}$ is an orthonormal basis of $H^2$. The bases so obtained are the Malmquist-Takenaka bases [11, 15].
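Orthonormality of a Malmquist-Takenaka system can be observed numerically. The sketch below uses the standard single-sequence form on the disk, $\phi_n(z) = \frac{\sqrt{1 - |a_n|^2}}{1 - \bar a_n z}\prod_{0 \le j < n}\frac{z - a_j}{1 - \bar a_j z}$, which is stated here as an assumption because the corresponding display is not legible in this passage; the zeros are arbitrary, and one of them is repeated on purpose.

```python
import cmath
import math

def inner(f, g, n=512):
    """Inner product of H^2 of the circle, approximated by an equispaced Riemann sum."""
    total = 0.0 + 0.0j
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        total += f(z) * g(z).conjugate()
    return total / n

def mt_function(n, zeros):
    """n-th Malmquist-Takenaka function attached to the sequence of zeros."""
    a_n = zeros[n]
    def phi(z):
        blaschke = 1.0 + 0.0j
        for a in zeros[:n]:
            blaschke *= (z - a) / (1.0 - a.conjugate() * z)
        return blaschke * math.sqrt(1.0 - abs(a_n) ** 2) / (1.0 - a_n.conjugate() * z)
    return phi

zeros = [0.5 + 0.0j, 0.3j, -0.4 + 0.2j, 0.5 + 0.0j]   # illustrative zeros, one repeated
basis = [mt_function(n, zeros) for n in range(len(zeros))]

# Largest deviation of the Gram matrix from the identity.
gram_error = max(
    abs(inner(basis[m], basis[n]) - (1.0 if m == n else 0.0))
    for m in range(len(basis))
    for n in range(len(basis))
)
print(gram_error)
```

When all zeros are 0 the functions reduce to $1, z, z^2, \dots$, so the construction extends the Fourier basis of $H^2(\mathbb{T})$.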
4 The Upper Half Plane
We present some prior results [2], without proof. In this section, one simply writes $H^2$ instead of $H^2(\mathbb{R})$.
4.1 Malmquist-Takenaka Bases
Let $(a_j)_{1\le j}$ be a sequence (finite or not) of complex numbers with positive imaginary parts and such that
$$\sum_{j\ge 0} \frac{\operatorname{Im} a_j}{1 + |a_j|^2} < +\infty.$$
The corresponding Blaschke product is
$$B(x) = \prod_{j\ge 0} \frac{|1 + a_j^2|}{1 + a_j^2}\;\frac{x - a_j}{x - \bar a_j}, \tag{2}$$
where $0/0$, which appears if $a_j = i$, should be understood as 1. The factors $\dfrac{|1 + a_j^2|}{1 + a_j^2}$ ensure the convergence of this product when there are infinitely many zeroes. But, in some situations, it is more convenient to use other convergence factors as we shall see below. Whether the series (2) is convergent or not, one defines (for $n \ge 0$) the functions
⎞ x − aj 1 1 ⎝ ⎠ .φn (x) = √ . x − aj x − an π 0≤j 0, there is one fixed point where .|B | < 1. Let .P1 be the polynomial in z obtained by replacing a by .t + iu in the left-hand side of Eq. (5). The resultant of .P1 and .1 − wB (z) considered as polynomials in z is .(iu − t)3 (t 2 + u2 − 1)4 R(w), where
$$R(w) = 8(t^2 + u^2)(t^2 + u^2 - 1)\,w^3 + (12t^4 + 24t^2u^2 + 12u^4 - 4t^2 - 4u^2 + 8t)\,w^2 + 2(3t^2 + 3u^2 + 1)(t^2 + u^2 - 1)\,w + (t^2 + u^2 - 1)^2.$$
The roots of $R$, considered as a polynomial in $w$, are the inverses of the derivative of $B$ evaluated at the fixed points of $B$. Let $R_1(w) = R(1 + w)$. We have
$$R_1(w) = 8(t^2 + u^2)(t^2 + u^2 - 1)\,w^3 + 4\big(9(t^2 + u^2)^2 - 7(t^2 + u^2) + 2t\big)\,w^2 + 2Qw + Q.$$
We see that the coefficients of $R_1$ present one variation of sign. This means that this polynomial has a positive root. Therefore, due to Descartes’ rule, $R$ has a root larger than one. This means that there is a fixed point where the derivative of $B$ is in the interval $(0, 1)$. Figure 5 shows the regions corresponding to the previous discussion. The equation of the red curve (a cardioid) is $27(t^2 + u^2)^2 - 18(t^2 + u^2) + 8t - 1 = 0$. When the attracting fixed point is on the boundary of the disk, $|B_n(0)|$ converges exponentially fast toward 1. Therefore, if $(z_j)_{j\ge 0}$ is the sequence of the zeroes (counted according to their multiplicities) of all the iterates of $B$, one has
$$\sum_{j\ge 0} (1 - |z_j|) < +\infty.$$
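The change of variable $R_1(w) = R(1 + w)$ and the sign count used in the Descartes argument can be checked numerically. In the sketch below, $Q$ is taken to be $27(t^2 + u^2)^2 - 18(t^2 + u^2) + 8t - 1$, the left-hand side of the cardioid equation, which is what expanding $R(1 + w)$ produces; its definition is not visible in this passage, so this is an assumption that the code itself tests. The point $(t, u)$ is an arbitrary choice with $t^2 + u^2 < 1$ and $Q > 0$.

```python
def R(w, t, u):
    """Cubic R(w) as given in the text."""
    s = t * t + u * u
    return (8 * s * (s - 1) * w ** 3
            + (12 * t ** 4 + 24 * t * t * u * u + 12 * u ** 4 - 4 * t * t - 4 * u * u + 8 * t) * w ** 2
            + 2 * (3 * t * t + 3 * u * u + 1) * (s - 1) * w
            + (s - 1) ** 2)

def Q(t, u):
    """Assumed expression of Q: left-hand side of the cardioid equation."""
    s = t * t + u * u
    return 27 * s * s - 18 * s + 8 * t - 1

def R1_coefficients(t, u):
    """Coefficients of R1(w) = R(1 + w), from degree 3 down to degree 0."""
    s = t * t + u * u
    return [8 * s * (s - 1), 4 * (9 * s * s - 7 * s + 2 * t), 2 * Q(t, u), Q(t, u)]

def R1(w, t, u):
    """Horner evaluation of R1."""
    value = 0.0
    for c in R1_coefficients(t, u):
        value = value * w + c
    return value

def sign_variations(coefficients):
    """Number of sign changes in the sequence of nonzero coefficients."""
    signs = [c > 0 for c in coefficients if c != 0]
    return sum(1 for p, q in zip(signs, signs[1:]) if p != q)

def positive_root(t, u, upper=100.0, iterations=200):
    """Bisection for a positive root of R1, assuming R1(0) > 0 > R1(upper)."""
    low, high = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if R1(mid, t, u) > 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

t, u = 0.8, 0.0   # illustrative point with t^2 + u^2 < 1 and Q > 0
identity_gap = max(abs(R(1 + w, t, u) - R1(w, t, u)) for w in (-2.0, -0.5, 0.0, 1.0, 3.5))
variations = sign_variations(R1_coefficients(t, u))
root = positive_root(t, u)
print(identity_gap, variations, 1 + root)
```

At this point the coefficients of $R_1$ have signs $(-, +, +, +)$, so there is exactly one positive root of $R_1$, and $1 + \text{root}$ is a root of $R$ larger than one.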
Fig. 5 When $(t, u)$ lies between the cardioid and the circle, all the fixed points have modulus 1, and when it lies inside, there is a fixed point inside $\mathbb{D}$
References
1. Carleson, L., and Gamelin, T. W., Complex dynamics, Springer 1993.
2. Coifman, R. R., and Peyrière, J., Phase Unwinding, or invariant subspace decompositions of Hardy Spaces. Journal of Fourier Analysis and Applications 25 (2019), 684–695.
3. Coifman, R. R., and Steinerberger, S., Nonlinear phase unwinding of functions. J. Fourier Anal. Appl. (2016), 1–32.
4. Coifman, R. R., Steinerberger, S., and Wu, H. T., Carrier frequencies, holomorphy and unwinding. arXiv preprint arXiv:1606.06475, 2016.
5. Daubechies, I., DeVore, R., Foucart, S., Hanin, B., and Petrova, G., Nonlinear Approximation and (Deep) ReLU Networks. arXiv:1905.02199v1 [cs.LG], 5 May 2019.
6. Feichtinger, H. G., and Pap, M., Hyperbolic wavelets and multiresolution in the Hardy space of the upper half plane, Blaschke Products and Their Applications, Springer (2013).
7. Grossmann, A., and Morlet, J., Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape. SIAM Journal on Mathematical Analysis (1984).
8. Helson, H., Lectures on Invariant Subspaces. Academic Press, New York and London, 1964.
9. Hoffman, K., Banach Spaces of Analytic Functions. Prentice-Hall, Englewood Cliffs, New Jersey (1962).
10. Kronland-Martinet, R., Morlet, J., and Grossmann, A., Analysis of sound patterns through wavelet transforms, International Journal of Pattern Recognition and Artificial Intelligence (1987).
11. Malmquist, F., Sur la détermination d’une classe de fonctions analytiques par leurs valeurs dans un ensemble donné de points, C.R. 6ième Cong. Math. Scand. (Kopenhagen, 1925), Copenhagen (1926), Gjellerups, 253–259.
12. Mi, W., Qian, T., and Wan, F., A Fast Adaptive Model Reduction Method Based on Takenaka-Malmquist Systems, Systems & Control Letters, Volume 61, Issue 1, January 2012, Pages 223–230.
13. Nahon, M., Dissertation, Yale University (2000).
14. Qian, T., Ho, I. T., Leong, I. T., and Wang, Y. B., Adaptive decomposition of functions into pieces of non-negative instantaneous frequencies, International Journal of Wavelets, Multiresolution and Information Processing, 8 (2010), no. 5, 813–833.
15. Takenaka, S., On the orthogonal functions and a new formula of interpolation, Jpn. J. Math. II (1925), 129–145.
16. Weiss, G., and Weiss, M., A derivation of the main results of the theory of $H^p$-spaces. Rev. Un. Mat. Argentina 20 (1962), 63–71.
A Generalization of Gleason’s Frame Function for Quantum Measurement John J. Benedetto, Paul J. Koprowski, and John S. Nolan
Abstract The goal is to extend Gleason’s notion of a frame function, which is essential in his fundamental theorem in quantum measurement, to a more general function acting on 1-tight, so-called, Parseval frames. We refer to these functions as Gleason functions for Parseval frames. The reason for our generalization is that positive operator-valued measures (POVMs) are essentially equivalent to Parseval frames and that POVMs arise naturally in quantum measurement theory. We prove that under the proper assumptions, Gleason functions for Parseval frames are quadratic forms, as well as other results analogous to Gleason’s original theorem. Furthermore, we solve an intrinsic problem relating Gleason functions for Parseval frames of different lengths. We use this solution to weaken the hypotheses in the finite dimensional version of Busch’s theorem, which itself is an analog of Gleason’s mathematical characterization of quantum states.
1 Introduction 1.1 Background Garrett Birkhoff and John von Neumann [20] introduced quantum logic and the role of lattices to fathom “the novelty of the logical notions which quantum theory pre-supposes.”
J. J. Benedetto
Norbert Wiener Center, Department of Mathematics, University of Maryland, College Park, MD, USA
P. J. Koprowski
Amtrak, Washington DC, USA
J. S. Nolan
Department of Mathematics, UC Berkeley, Berkeley, CA, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_21
The topics they mentioned for this “novelty” include:
1. Heisenberg’s uncertainty principle
2. Principle of non-commutativity of observations
Their fundamental ideas led to the representation theorem in quantum logic that, loosely speaking, allows one to treat quantum measurement outcomes as a lattice $L(H)$ of subspaces of a separable Hilbert space $H$ over the field $K$, where $K = \mathbb{R}$ or $K = \mathbb{C}$, see, e.g., [24, 71]. As such, the work of Birkhoff and von Neumann, as well as von Neumann’s classic [63], led to the study of measures on the closed subspaces of $H$ as formulated by Mackey [57], cf. [58, 59]. A measure on the closed subspaces of $H$ is a function $\mu$, which assigns, to every closed subspace of $H$, a non-negative number such that if $\{X_i\}$ is a sequence of mutually orthogonal subspaces having closed linear span $X$, then
$$\mu(X) = \sum_i \mu(X_i).$$
Let .dim(H) denote the dimension of .H. In [44], in 1957, Gleason proved the celebrated result that if .dim(H) ≥ 3, then every such measure .μ can be defined as μ(X) = tr(APX ),
.
(1)
where $X \subseteq \mathbb{H}$ is a closed subspace, $P_X$ is the orthogonal projection onto $X$, $\operatorname{tr}$ denotes the trace of the operator, and $A$ is a positive semi-definite self-adjoint trace class operator, see Remark 1, Sect. 2.1, and Theorem 4, as well as the beautiful proof of Gleason’s theorem by Parthasarathy [66], Chap. 1, Sect. 8. Going back to von Neumann, $A$ is also referred to as a density operator when $\operatorname{tr}(A) = 1$ and is often denoted by $\rho$. See Theorem 16 for Busch’s analog of (1), which has a different and meaningful definition of measure allowing the set of projections $P_X$ to be extended to a larger set of operators in a physically meaningful way.
If $\dim(\mathbb{H}) < \infty$ and $B$ is a linear operator on $\mathbb{H}$ which has matrix representation $M_B$, then $B$ is trace class, and the trace of $B$ is the sum of the diagonal values of $M_B$. These notions, as well as those introduced in Sect. 1.2, will be expanded upon in the remaining sections. They are given here in Sect. 1 in bare-bones fashion so that we can state the goal of the paper in Sect. 1.2.
Throughout, $\mathbb{H}$ denotes a separable Hilbert space over $\mathbb{K}$. In the $d$-dimensional case, we shall deal exclusively with the Hilbert space $\mathbb{H} = \mathbb{K}^d$ over $\mathbb{K}$, taken with the canonical inner product, since all $d$-dimensional inner-product spaces over $\mathbb{K}$ are isometric to the Hilbert space $\mathbb{H}$, and we shall not need further refinements such as defining different inner products on the same space in terms of different matrices.
Example 1 (Closed Subspaces of $\mathbb{H}$)
a. Let $\mathbb{H}$ be infinite dimensional. The subspaces $X$ of $\mathbb{H}$ are not necessarily closed. For example, let $\mathbb{H} = L^2[a, b]$, and let $X = C[a, b]$.
A Generalization of Gleason’s Frame Function for Quantum Measurement
b. On the other hand, every subspace $X$ of $\mathbb{H} = \mathbb{K}^d$ is closed. To see this, let $\dim(X) = m < d$, and let $\|x_n - y\| \to 0$, $x_n \in X$, $y \in \mathbb{H}$. Assume $y \notin X$. If $\{u_1, \ldots, u_m\}$ is an orthonormal basis for $X$, then $w = \sum_{j=1}^{m} \langle y, u_j \rangle u_j$ is the unique vector in $X$ for which $\|w - y\| = \inf\{\|x - y\| : x \in X\}$. Because $y \notin X$, we have $\|x - y\| \ge \|w - y\| > 0$ for all $x \in X$. This contradicts the hypothesis that $\|x_n - y\| \to 0$.
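The nearest-point property of $w = \sum_j \langle y, u_j \rangle u_j$ used in Example 1b can be checked numerically. The following is a minimal sketch in plain Python, assuming $\mathbb{H} = \mathbb{R}^3$ and a hypothetical two-dimensional subspace $X$; the helper names `dot`, `project`, and `dist`, as well as the sample data, are illustrative assumptions and do not come from the chapter.

```python
import math

def dot(a, b):
    """Standard inner product on R^d."""
    return sum(p * q for p, q in zip(a, b))

def project(y, onb):
    """Return w = sum_j <y, u_j> u_j for an orthonormal family onb."""
    w = [0.0] * len(y)
    for u in onb:
        c = dot(y, u)
        w = [wi + c * ui for wi, ui in zip(w, u)]
    return w

def dist(a, b):
    """Euclidean distance ||a - b||."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical data: X = span{u1, u2} in R^3 and a point y outside X.
s = 1.0 / math.sqrt(2.0)
U = [(s, s, 0.0), (0.0, 0.0, 1.0)]
y = (1.0, -1.0, 2.0)
w = project(y, U)

# Sample a few points x = a*u1 + b*u2 of X and compare their distances to y
# with the distance from w to y; the gap is non-negative up to rounding.
samples = [tuple(a * p + b * q for p, q in zip(*U))
           for a in (-1.0, 0.0, 0.5) for b in (0.0, 2.0, 3.0)]
gap = min(dist(x, y) for x in samples) - dist(w, y)
```

Since $\|w - y\| > 0$ here, no sequence in $X$ can converge to this $y$, which is the contradiction used in the example.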
1.2 The Role of Gleason’s Theorem and Our Goal
The theory of frames was initiated by Duffin and Schaeffer in 1952 [35], but frames were actually defined by Paley and Wiener in 1934 [65] to deal with closed linear span problems. A frame is a natural generalization of an orthonormal basis (ONB). For detailed introductions to frames, see [6, 25, 30]. We now define a frame in order to formulate our goal and shall expand on the theory of frames in Sect. 3. We denote the standard inner product associated with the Hilbert space $\mathbb{H}$ by $\langle \cdot, \cdot \rangle$.
Definition 1 (Frames) Let $\mathbb{H} = \mathbb{K}^d$.
a. A sequence $\{x_j\}_{j \in J} \subseteq \mathbb{H}$ is a frame for the Hilbert space $\mathbb{H}$ if
\[ \exists\, A, B > 0 \ \text{such that} \ \forall y \in \mathbb{H}, \quad A \|y\|^2 \le \sum_{j \in J} |\langle y, x_j \rangle|^2 \le B \|y\|^2. \]
If $A = B$, then $\{x_j\}_{j \in J}$ is an $A$-tight frame for $\mathbb{H}$. If $A = B = 1$, then $\{x_j\}$ is a 1-tight or Parseval frame for $\mathbb{H}$. In this case, each $\|x_j\| \le 1$, see Proposition 6. The cardinality of the index set $J$ is denoted by $\operatorname{card}(J)$, and it satisfies $d \le \operatorname{card}(J) \le \infty$. Usually, our Parseval frames will satisfy $N := \operatorname{card}(J) < \infty$, but we shall need the case $\operatorname{card}(J) = \infty$ in Theorem 17.
b. If a sequence $\{x_j\}_{j=1}^{d}$ is an orthonormal basis (ONB) for the Hilbert space $\mathbb{H}$, then Parseval’s identity ensures that $\{x_j\}_{j=1}^{d}$ is a Parseval frame for $\mathbb{H}$, e.g., [45], page 27. Hence, any ONB is a Parseval frame and we may view Parseval frames as a natural generalization of ONBs.
Gleason’s classification of measures on closed subspaces of Hilbert spaces, stated in Sect. 1.1, depends on his notion of a frame function. Since this is not related to the theory of frames, we shall refer to such functions as Gleason functions.
Definition 2 (Gleason Function for ONBs) A Gleason function of weight $W \in \mathbb{K}$ for the ONBs for $\mathbb{H}$ is a function $g : S \to \mathbb{K}$, where $S \subseteq \mathbb{H}$ is the unit sphere
\[ S := \{x \in \mathbb{H} : \|x\| = 1\}, \]
and such that, for all ONBs $\{x_j\}_{j \in J}$ for $\mathbb{H}$, one has
\[ \sum_{j \in J} g(x_j) = W. \]
In the case that $\mathbb{H} = \mathbb{K}^d$, the unit sphere $S$ is denoted by $S^{d-1}$.
Remark 1 (Quantum Logic and the Born Model) In quantum measurement theory, Gleason’s theorem has ramifications with regard to the transition from the quantum logic lattice interpretation of quantum events to a validation of the Born model (or rule or postulate) for probability in quantum mechanics. Specifically, Mackey had asked whether every measure on the lattice of projections of a Hilbert space can be defined by a positive operator with unit trace. Kadison proved this is false for two-dimensional Hilbert spaces. Gleason’s theorem, and, in particular, (1), answers Mackey’s question in the positive for higher dimensional Hilbert spaces. This means that a Gleason function for the ONBs for $\mathbb{H} = \mathbb{K}^d$ that is defined by a self-adjoint operator as in Theorem 2 is compatible with the Born rule, see, e.g., [43, 71, 84]. In functional analysis, Gleason’s theorem has had significant generalizations with regard to von Neumann algebras and other abstract notions, see [48]. These directions are not part of our goal.
Definition 3 (Gleason Function for Parseval Frames) Let $\mathbb{H} = \mathbb{K}^d$. A Gleason function of weight $W \in \mathbb{K}$ for the Parseval frames for $\mathbb{H}$ is a function $g : B^d \to \mathbb{K}$, where $B^d \subseteq \mathbb{H}$ is the closed unit ball
\[ B^d := \{x \in \mathbb{H} : \|x\| \le 1\}, \]
such that, for all Parseval frames $\{x_j\}_{j \in J} \subseteq B^d$ for $\mathbb{H}$, one has
\[ \sum_{j \in J} g(x_j) = W. \]
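To make Definitions 1 and 3 concrete, the following minimal sketch in plain Python checks the Parseval condition for the three-element "Mercedes-Benz" frame in $\mathbb{R}^2$ and then sums a quadratic form $g(x) = \langle A(x), x \rangle$ over that frame. The frame, the matrix $A$, and all function names are illustrative assumptions, not taken from the chapter; the sum equals $\operatorname{tr}(A)$ because $\sum_j x_j x_j^{\tau}$ is the identity for a Parseval frame.

```python
import math

def inner(a, b):
    """Standard inner product on R^d."""
    return sum(p * q for p, q in zip(a, b))

def mercedes_benz_frame():
    """Three vectors of norm sqrt(2/3) at angles 0, 2*pi/3, 4*pi/3 in R^2."""
    r = math.sqrt(2.0 / 3.0)
    return [(r * math.cos(2.0 * math.pi * k / 3.0),
             r * math.sin(2.0 * math.pi * k / 3.0)) for k in range(3)]

def frame_energy(y, frame):
    """Return sum_j |<y, x_j>|^2, which equals ||y||^2 for a Parseval frame."""
    return sum(inner(y, x) ** 2 for x in frame)

def quadratic_form(A, x):
    """Return <A(x), x> for a real matrix A given as nested lists."""
    Ax = [inner(row, x) for row in A]
    return inner(Ax, x)

frame = mercedes_benz_frame()
A = [[2.0, 1.0], [1.0, -1.0]]   # hypothetical symmetric matrix with tr(A) = 1
weight = sum(quadratic_form(A, x) for x in frame)   # sum of g over the frame
```

Each frame vector has norm $\sqrt{2/3} \le 1$, so the frame lies in the closed unit ball $B^2$, as Definition 3 requires.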
Our goal is the following: define, implement, and generalize the notion of Gleason’s functions for ONBs and the unit sphere to the setting of complex Parseval frames and the closed unit ball. It turns out that there are fundamental mathematical implications and new technology required to implement Gleason’s theorem in this setting. The reason we shall pursue this goal is that a version of Gleason’s theorem has been proved in the setting of positive operator-valued measures (POVMs) [23, 29], and POVMs can be viewed as equivalent to Parseval frames, a fact established and exploited in quantum detection problems [16], see Sect. 3. A consequence of this goal and reason is a quantitative insight into Busch’s formulation of Gleason’s theorem in terms of his notion of a generalized probability measure, see Sect. 6.
Remark 2 (The Welch Bound) There are natural problems and relationships to be resolved and understood. For example, it is not difficult to check that if $g$ is a Gleason function of weight $W_N$ for all unit norm frames with $N > d$ elements for a given $d$-dimensional Hilbert space $\mathbb{H} = \mathbb{K}^d$, then $g$ is constant on $S^{d-1}$. On the other hand, we can formulate the definition of a Gleason function to consider the class of all equiangular Parseval frames, thereby interleaving the power of Gleason’s theorem with fundamental problems of equiangularity as they relate to the Welch bound and optimal ambiguity function behavior, see Definition 4 c and Appendix 6. This is inextricably related to the construction of constant amplitude finite sequences with 0-autocorrelation, whose narrowband ambiguity function is comparable to the Welch bound, e.g., see [8] and [4].
1.3 Outline
In Sect. 2, we summarize Gleason’s work in [44] in order to motivate further the definition and analysis of Gleason functions. We also extend his fundamental theorem (Theorem 3) from the setting of non-negative functions to that of bounded functions, viz., Theorem 5. The proof is elementary but is necessary in the proof of our basic Theorem 14. Section 2.3 may seem superfluous to Gleason’s observation that the difficult direction of his theorem fails for $d = 2$, but we do provide a reason for why this is so by a characterization of quadratic forms on $S^1$.
Section 3 establishes the well-known relationship between POVMs and Parseval frames. The former have long been a staple in quantum measurement, e.g., [23, 29], and the latter is the central mathematical reason we have gone beyond Gleason’s use of ONBs.
Sections 4–6 establish our basic theory. The Parseval frame formulation of POVMs allows us to look more deeply into Gleason functions in Sects. 5 and 6. Section 4 gives the basic properties of Gleason functions for the Parseval frames for $\mathbb{H} = \mathbb{K}^d$. Theorem 7 shows that quadratic forms defined by self-adjoint operators are always Gleason functions for the Parseval frames for $\mathbb{K}^d$, similar to the case of Gleason functions for the ONBs for $\mathbb{K}^d$. We then prove that continuous or non-negative Gleason functions for the Parseval frames for $\mathbb{K}^d$ are reminiscent of homogeneous functions of degree 2 on $B^d$ (Theorems 8 and 9). Using Theorem 9, we characterize bounded, real-valued Gleason functions for Parseval frames in terms of quadratic forms defined by self-adjoint operators in analogy to results in [44]. This is Theorem 11, the converse of Theorem 7, cf., Theorem 12.
Because Parseval frames for $\mathbb{H} = \mathbb{K}^d$ vary in cardinality $N$, a natural question arises about the relationship between the sets $G_N$ of Gleason functions for the $N$-element Parseval frames for $\mathbb{K}^d$ as $N$ varies. $G_N$ is the subject of Sect. 5. In Theorem 14, we prove that if $N \ge d + 2$, then $\operatorname{card} G_N = \operatorname{card} G_{N+1}$. The proof requires several propositions of independent interest.
In Sect. 6, we use Theorem 14 to weaken the hypotheses in Busch’s theorem, which itself is an analog of Gleason’s mathematical characterization of quantum states.
Since Parseval frames are central to our theory and because they play an important role in applications ranging from numerically effective noise reduction to the construction of Grassmannian frames dealing with spherical codes to geometrically uniform codes in information theory to Zauner’s conjecture in quantum measurement, we conclude with Appendix 6 putting some of these topics in context.
2 Gleason’s Theorem
2.1 Preliminaries
In order to state Gleason’s theorem, viz., Theorems 3 and 4, we need the following setup and notions. Let $A : \mathbb{H} \to \mathbb{H}$ be a linear operator, where $\mathbb{H}$ is a separable Hilbert space defined over $\mathbb{K}$. We make the convention that $q(x) := \langle A(x), x \rangle$ is a quadratic form in the sense that $q(\alpha x) = |\alpha|^2 q(x)$ for all $x \in \mathbb{H}$ and $\alpha \in \mathbb{K}$, see Remark 3. $A$ is bounded, i.e., continuous, if $\|A\|_{op} := \sup_{\|x\| \le 1} \|A(x)\|_{\mathbb{H}} < \infty$, and $\mathcal{L}(\mathbb{H})$ denotes the space of bounded linear operators $A : \mathbb{H} \to \mathbb{H}$. The adjoint $A^*$ of $A$ is the mapping $A^* : \mathbb{H} \to \mathbb{H}$ defined by the formula $\langle A(x), y \rangle = \langle x, A^*(y) \rangle$ for all $x, y \in \mathbb{H}$. $A$ is self-adjoint if $A$ is bounded and $A^* = A$, and $A$ is positive, respectively, positive semi-definite, if
\[ \forall x \in \mathbb{H} \setminus \{0\}, \quad \langle A(x), x \rangle > 0, \ \text{respectively,} \ \ge 0. \]
Let $\mathcal{L}_+(\mathbb{H})$ denote the subset of positive semi-definite elements of $\mathcal{L}(\mathbb{H})$, and let $\mathcal{S}_+(\mathbb{H})$ denote the set of positive semi-definite self-adjoint operators on $\mathbb{H}$. Recall that if $A : \mathbb{H} \to \mathbb{H}$ is a linear operator on a Hilbert space $\mathbb{H}$ defined over $\mathbb{K} = \mathbb{C}$ and $\langle A(x), x \rangle \in \mathbb{R}$ for all $x \in \mathbb{H}$, then $A$ is self-adjoint. Conversely, if $A$ is self-adjoint, then
\[ \forall x \in \mathbb{H}, \quad \langle A(x), x \rangle \in \mathbb{R}. \tag{2} \]
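If $A$ is self-adjoint, then the eigenvalues $\lambda$ of $A$ are real. Thus, in the case that $A$ is positive, respectively, positive semi-definite, then $\lambda > 0$, respectively, $\lambda \ge 0$. When $\mathbb{H}$ is defined over $\mathbb{K} = \mathbb{R}$ and $A$ is positive, respectively, positive semi-definite, we also have that $\lambda > 0$, respectively, $\lambda \ge 0$, without having to verify that $A$ is self-adjoint.
If $\mathbb{H} = \mathbb{K}^d$, then we consider linear operators $A : \mathbb{H} \to \mathbb{H}$ with the $d \times d$ matrix $A = (a_{i,j})_{i,j=1}^{d}$. $A$ is easily checked to be bounded by making the matrix calculation, which uses the Cauchy–Schwarz inequality in each row,
\[ \|A(x)\|_{\mathbb{H}}^2 = \sum_{i=1}^{d} \Big| \sum_{j=1}^{d} a_{i,j} x_j \Big|^2 \le \sum_{i=1}^{d} \Big( \sum_{j=1}^{d} |a_{i,j}|^2 \Big) \Big( \sum_{j=1}^{d} |x_j|^2 \Big) = \sum_{i,j=1}^{d} |a_{i,j}|^2 \, \|x\|_{\mathbb{H}}^2. \]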
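The boundedness estimate $\|A(x)\|^2 \le \sum_{i,j} |a_{i,j}|^2 \, \|x\|^2$ for a matrix $A = (a_{i,j})$ can be spot-checked directly. The sketch below is plain Python with hypothetical helper names (`matvec`, `norm_sq`, `frobenius_sq`, `bound_holds`); it works for real or complex entries and is meant only as an illustration of the inequality, not as part of the chapter's argument.

```python
def matvec(A, x):
    """Matrix-vector product A(x) for A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm_sq(x):
    """Squared Euclidean norm, valid for real or complex entries."""
    return sum(abs(t) ** 2 for t in x)

def frobenius_sq(A):
    """Sum of |a_ij|^2 over all entries of A."""
    return sum(abs(a) ** 2 for row in A for a in row)

def bound_holds(A, x, tol=1e-12):
    """Check ||A(x)||^2 <= (sum |a_ij|^2) * ||x||^2 up to a rounding tolerance."""
    return norm_sq(matvec(A, x)) <= frobenius_sq(A) * norm_sq(x) + tol
```

For example, with `A = [[1, 1], [0, 0]]` and `x = [1, 1]` both sides equal 4, so the estimate cannot be improved in general.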
Remark 3 (Quadratic Forms) Classically, differing from our convention in the case $\mathbb{K} = \mathbb{C}$, a quadratic form over $\mathbb{K}^d$ in the $d$ variables $x_1, x_2, \ldots, x_d \in \mathbb{K}$ is a polynomial
\[ Q(x) := q(x_1, \ldots, x_d) := \sum_{i=1}^{d} \sum_{j=1}^{d} c_{i,j} x_i x_j, \quad c_{i,j} \in \mathbb{K}, \tag{3} \]
in which every term has degree 2, i.e., every term is a multiple of $x_i x_j$ for some $i, j$. If we set $a_{i,j} := \frac{1}{2}(c_{i,j} + c_{j,i})$ and consider the matrices $A = (a_{i,j})$ and $C = (c_{i,j})$, then $A$ is symmetric and
\[ \forall x = (x_1, \ldots, x_d), \quad x^{\tau} A(x) = \sum_{i=1}^{d} \sum_{j=1}^{d} a_{i,j} x_i x_j = \frac{1}{2} Q(x) + \frac{1}{2} Q(x) = Q(x), \tag{4} \]
where $\tau$ denotes the transpose. If $\mathbb{K} = \mathbb{R}$, then $x^{\tau} A(x) = \langle A(x), x \rangle$, where $x$ is a $d \times 1$ vector in the matrix multiplication $A(x)$, cf. (2). This is not true for $\mathbb{K} = \mathbb{C}$ because of conjugation.
The trace, $\operatorname{tr}(A)$, of a $d \times d$ matrix $A = (a_{i,j})$ is
\[ \operatorname{tr}(A) := \sum_{j=1}^{d} a_{j,j}. \]
For $d \times d$ matrices $A$ and $B$, we have $a \operatorname{tr}(A) + b \operatorname{tr}(B) = \operatorname{tr}(aA + bB)$, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and $\operatorname{tr}(A^*) = \overline{\operatorname{tr}(A)}$, where $A^*$ is the adjoint of $A$. Furthermore, if $A$ is self-adjoint or, more generally, if $AA^* = A^*A$, i.e., $A$ is complex normal, then
\[ \operatorname{tr}(A) = \sum_{j=1}^{d} \lambda_j \quad \text{and} \quad \operatorname{tr}(A^* A) = \sum_{j=1}^{d} |\lambda_j|^2, \tag{5} \]
where the $\lambda_j$ are the not necessarily distinct eigenvalues of $A$.
Given $\mathbb{H} = \mathbb{K}^d$, let $B : \mathbb{H} \to \mathbb{H}$ be a linear operator, let $\{e_1, \ldots, e_d\}$ be the standard ordered basis for $\mathbb{H}$, and let $M_B = (b_{i,j})$ be the $d \times d$ matrix representation of $B$ in this basis. ($\{e_1, \ldots, e_d\}$ standard means that each $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)$, where 1 is in the $j$th coordinate.) The trace of $M_B$, denoted by $\operatorname{tr}(M_B)$, is
\[ \operatorname{tr}(M_B) := \sum_{j=1}^{d} b_{j,j}. \]
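The symmetrization $a_{i,j} = \frac{1}{2}(c_{i,j} + c_{j,i})$ of Remark 3 and the identity $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ are easy to verify on small matrices. The following plain-Python sketch uses hypothetical helper names (`symmetrize`, `quad`, `trace`, `matmul`) and arbitrary sample matrices; it assumes real entries, so no conjugation is involved.

```python
def symmetrize(C):
    """Return A with a_ij = (c_ij + c_ji) / 2."""
    d = len(C)
    return [[(C[i][j] + C[j][i]) / 2 for j in range(d)] for i in range(d)]

def quad(C, x):
    """Evaluate the classical quadratic form sum_ij c_ij x_i x_j."""
    d = len(C)
    return sum(C[i][j] * x[i] * x[j] for i in range(d) for j in range(d))

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[j][j] for j in range(len(M)))

def matmul(A, B):
    """Product of two d x d matrices given as nested lists."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

C = [[1, 4], [0, 3]]   # hypothetical non-symmetric coefficient matrix
A = symmetrize(C)      # symmetric matrix defining the same form as C
```

Evaluating `quad(C, x)` and `quad(A, x)` at any real vector `x` gives the same number, which is the content of (4).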
Remark 4 (Trace Class) Let $\mathbb{H}$ be a separable Hilbert space defined over $\mathbb{K}$. By definition, $A \in \mathcal{L}(\mathbb{H})$ is a trace class operator if for some and hence all ONBs $\{x_n\}$ for $\mathbb{H}$,
\[ \|A\|_1 := \sum_n \langle (A^* A)^{1/2}(x_n), x_n \rangle < \infty, \]
so that $\sum_n \langle A(x_n), x_n \rangle < \infty$ when $A$ is self-adjoint, noting that $\langle (A^* A)^{1/2}(x_n), x_n \rangle \ge 0$. The trace of $A$ is $\operatorname{tr}(A) := \sum_n \langle A(x_n), x_n \rangle$, and this is compatible with (5). Furthermore, every compact operator $A \in \mathcal{L}(\mathbb{H})$ is characterized by the representation
\[ \forall x \in \mathbb{H}, \quad A(x) = \sum_j \lambda_j \langle x, y_j \rangle x_j, \quad \text{where} \ \lambda_j \ge 0 \ \text{and} \ \lambda_j \to 0, \]
for some orthonormal bases $\{x_j\}$ and $\{y_j\}$ for $\mathbb{H}$. As is well known, finite rank operators $A \in \mathcal{L}(\mathbb{H})$ are trace class, trace class operators are Hilbert–Schmidt, and Hilbert–Schmidt operators are compact. We mention this since the dual of the space of compact operators with the proper topology is the space of trace class operators and because of Theorem 1b, see [73] for all of this material.
We shall use the spectral theorem several times throughout and state the following form, see [42, 45, 56, 73, 76, 79].
Theorem 1
a. Let $\mathbb{H} = \mathbb{K}^d$, let $A : \mathbb{H} \to \mathbb{H}$ be a linear operator, and for convenience denote $M_A$ by $A$. If $A$ is self-adjoint, i.e., $A$ is a real symmetric matrix if $\mathbb{K} = \mathbb{R}$ or a Hermitian matrix if $\mathbb{K} = \mathbb{C}$, then there exists a matrix $U$ with columns consisting of a complete set of orthonormal eigenvectors for $A$, such that $\Lambda = U^{-1} A U$ is diagonal. Such a $U$ is orthogonal if $\mathbb{K} = \mathbb{R}$ and unitary if $\mathbb{K} = \mathbb{C}$.
b. Let $\mathbb{H}$ be a separable Hilbert space defined over $\mathbb{K}$, and let $A \in \mathcal{L}(\mathbb{H})$ be a compact self-adjoint operator. There is an orthonormal sequence $\{x_j\} \subseteq \mathbb{H}$ of eigenvectors of $A$ and a corresponding sequence $\{\lambda_j\} \subseteq \mathbb{K}$ of eigenvalues, such that
\[ \forall x \in \mathbb{H}, \quad A(x) = \sum_j \lambda_j \langle x, x_j \rangle x_j. \]
If $\{\lambda_j\}_{j=1}^{\infty}$ is an infinite sequence, then $\lim_{j \to \infty} \lambda_j = 0$.
Remark 5 (Spectral Decomposition)
a. With regard to part a of Theorem 1, we note the following. In the real symmetric case, we have $A = U \Lambda U^{-1}$, with orthonormal eigenvectors forming $U$ and with the eigenvalues of $A$ forming the diagonal matrix $\Lambda$. Also, in the $\mathbb{K} = \mathbb{C}$ case, the eigenvalues of self-adjoint $A$ are real, and, in both cases, if two eigenvectors come from distinct eigenvalues, then they are orthogonal. Furthermore, to prove the existence of an ONB of eigenvectors in the case that $\mathbb{K} = \mathbb{C}$ and $A$ is Hermitian, we apply the fundamental theorem of algebra to the characteristic polynomial of $A$ to obtain an eigenvalue $\lambda_1$ and an eigenvector $u_1$. Then, we consider the orthogonal complement of $u_1$ to obtain a $u_2$, and, continuing in this way, we see how to construct a complete set of orthonormal eigenvectors. The case $\mathbb{K} = \mathbb{R}$ can be deduced from the complex case by complexification, see, e.g., [47], Sect. 77.
b. With regard to part b of Theorem 1, we note the following. Any self-adjoint linear operator on $\mathbb{H} = \mathbb{K}^d$ has an eigenvalue, but this is not necessarily the case for self-adjoint operators on infinite dimensional $\mathbb{H}$. Furthermore, the self-adjoint identity operator $I$ on infinite dimensional $\mathbb{H}$ is not a compact operator.
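2.2 Gleason’s Theorem
If $\mathbb{K} = \mathbb{C}$, then the following holds for normal operators $A$.
Theorem 2 Let $\mathbb{H} = \mathbb{K}^d$, and let $A$ be a self-adjoint linear operator $A : \mathbb{H} \to \mathbb{H}$. The function $g : \mathbb{H} \to \mathbb{K}$, defined by the formula,
\[ \forall x \in S^{d-1}, \quad g(x) = \langle A(x), x \rangle, \tag{6} \]
is a Gleason function of weight $W = \operatorname{tr}(A)$ for the ONBs for $\mathbb{H}$.
Proof By the spectral theorem, there exists an orthonormal eigenbasis $\{e_j\}_{j=1}^{d}$ associated with the set $\{\lambda_j\}_{j=1}^{d}$ of eigenvalues of $A$. Hence, for all $x \in \mathbb{H}$, we have $x = \sum_{j=1}^{d} \langle x, e_j \rangle e_j$ and $A(x) = \sum_{j=1}^{d} \langle x, e_j \rangle \lambda_j e_j$. If $\{x_n\}_{n=1}^{d}$ is an ONB for $\mathbb{H}$, then we have
\[ \operatorname{tr}(A) = \sum_{j=1}^{d} \lambda_j = \sum_{j=1}^{d} \lambda_j \|e_j\|^2 = \sum_{j=1}^{d} \lambda_j \sum_{n=1}^{d} |\langle e_j, x_n \rangle|^2, \]
where the last equality is due to the Parseval identity. Reordering the finite sums and using the orthogonality of $\{e_j\}$ yield the desired result:
\[ \operatorname{tr}(A) = \sum_{n=1}^{d} \sum_{j=1}^{d} \lambda_j \langle e_j, x_n \rangle \langle x_n, e_j \rangle = \sum_{n=1}^{d} \sum_{j=1}^{d} \big\langle \langle x_n, e_j \rangle \lambda_j e_j, x_n \big\rangle = \sum_{n=1}^{d} \langle A(x_n), x_n \rangle = \sum_{n=1}^{d} g(x_n). \]
Therefore, $g$ is a Gleason function of weight $W = \operatorname{tr}(A)$ for the ONBs for $\mathbb{H}$.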
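The statement of Theorem 2 can be illustrated numerically in the real case. The sketch below, in plain Python, assumes $\mathbb{K} = \mathbb{R}$ and $d = 3$, builds an orthonormal basis by Gram–Schmidt from three hypothetical linearly independent vectors, and compares $\sum_n \langle A(x_n), x_n \rangle$ with $\operatorname{tr}(A)$. The matrix, the starting vectors, and the function names are assumptions made for the illustration only.

```python
import math

def dot(a, b):
    """Standard inner product on R^d."""
    return sum(p * q for p, q in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    onb = []
    for v in vectors:
        w = list(v)
        for u in onb:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        n = math.sqrt(dot(w, w))
        onb.append([wi / n for wi in w])
    return onb

def g(A, x):
    """Quadratic form g(x) = <A(x), x> for a real matrix A."""
    Ax = [dot(row, x) for row in A]
    return dot(Ax, x)

# Hypothetical symmetric matrix with tr(A) = 2 + 3 + 1 = 6.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 1.0]]
onb = gram_schmidt([(1.0, 1.0, 0.0), (0.0, 1.0, 1.0), (1.0, 0.0, 1.0)])
total = sum(g(A, x) for x in onb)   # sum of g over the ONB
```

Changing the starting vectors changes the ONB but, up to rounding, not `total`, which is the Gleason function property with weight $\operatorname{tr}(A)$.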
The converse assertion of Theorem 2 is true directly for $d = 1$. In fact, if $d = 1$ and $g$ is a Gleason function of weight $W$ for the two ONBs for $\mathbb{H} = \mathbb{K} = \mathbb{R}$, then $A$ is defined by the action $A(x) := W x$. The same operator works for $\mathbb{H} = \mathbb{K} = \mathbb{C}$, but in this case the ONBs are the uncountable set, $\{z_u = e^{iu} : u \in [0, 2\pi)\}$, and $A$ is again defined by the action $A(x) := W x$.
The converse assertion of Theorem 2 is not true for the case $d = 2$, see Sect. 2.3. Although the situation is substantially more intricate for $d \ge 3$, Gleason’s Theorem 3 asserts that the converse of Theorem 2 is still true, but with restrictions on the given Gleason function. As Gleason was well aware, some restrictions are necessary, see Proposition 1.
Theorem 3 Let $\mathbb{H} = \mathbb{K}^d$, and let $g : S^{d-1} \to \mathbb{R}$ be a non-negative Gleason function for the ONBs for $\mathbb{H}$, where $d \ge 3$. There exists a positive self-adjoint operator $A : \mathbb{H} \to \mathbb{H}$ such that
\[ \forall x \in S^{d-1}, \quad g(x) = \langle A(x), x \rangle. \]
The result is also true for any separable Hilbert space $\mathbb{H}$.
Remark 6 (Gleason’s Theorem for $\mathbb{R}^3$) The proof of Theorem 3 depends on Gleason’s theorem that non-negative Gleason functions for the ONBs for $\mathbb{R}^3$ satisfy (6) ([44], Theorem 2.8), and this, in turn, depends on his result that continuous Gleason functions for the ONBs for $\mathbb{R}^3$ satisfy (6) ([44], Theorem 2.3). Both proofs are ingenious.
Theorem 3 is essential and significant for the proof of the following result. The positivity hypothesis in Theorem 3 is natural given the measure theoretic nature of Theorem 4. Theorem 4 was our starting point in Sect. 1.1.
Theorem 4 Let $\mu$ be a measure on the closed subspaces of $\mathbb{H}$, where $\dim(\mathbb{H}) \ge 3$. There exists a positive semi-definite self-adjoint trace class operator $A : \mathbb{H} \to \mathbb{H}$ such that, for all closed subspaces $X \subseteq \mathbb{H}$,
\[ \mu(X) = \operatorname{tr}(A P_X), \]
where $P_X$ is the orthogonal projection of $\mathbb{H}$ onto $X$.
Proof Let $B_x = \operatorname{span}(x)$ for any unit norm vector $x \in \mathbb{H}$, i.e., $x \in S^{d-1}$ for $\mathbb{H} = \mathbb{K}^d$. Then, $g(x) = \mu(B_x)$ defines a non-negative Gleason function for the ONBs for $\mathbb{H}$ by the definition of $\mu$. By Theorem 3, there exists a positive self-adjoint operator $A$ such that for all unit norm $x \in \mathbb{H}$, we have $g(x) = \langle A(x), x \rangle$. Next, note that if $\{x_j\}$ is an ONB for $\mathbb{H}$, then
\[ \mu(\mathbb{H}) = \sum_j \mu(B_{x_j}) = \sum_j \langle A(x_j), x_j \rangle = \operatorname{tr}(A), \]
where the sums are finite since, by the definition of a measure $\mu$ on the closed subspaces in Sect. 1.1, we have assumed $\mu(\mathbb{H}) < \infty$. Because the latter sum is finite, $A$ is trace class and, in fact, $\operatorname{tr}(A) = \mu(\mathbb{H})$. These latter assertions are immediate for the cases that $\mathbb{H} = \mathbb{K}^d$, $d \ge 3$.
If $X \subseteq \mathbb{H}$ is an arbitrary closed subspace, choose an ONB $\{y_i\}$ for $X$ and an ONB $\{z_j\}$ for the orthogonal complement $X^{\perp}$ of $X$. Then, the projection mapping $P_X$ satisfies $P_X(y_i) = y_i$ and $P_X(z_j) = 0$ for all $i$ and $j$. Clearly, $\{y_i\} \cup \{z_j\}$ is an ONB for $\mathbb{H}$. Therefore, we have
\[ \mu(X) = \sum_i \mu(B_{y_i}) = \sum_i \langle A(y_i), y_i \rangle = \sum_i \langle A(P_X(y_i)), y_i \rangle + \sum_j \langle A(P_X(z_j)), z_j \rangle = \operatorname{tr}(A P_X), \]
as desired.
Whereas Gleason formulated Theorem 3 only for the case where $g$ takes on non-negative real values, we now show that it is not difficult to extend the result to the more general case that $g : S^{d-1} \to \mathbb{K}$ is bounded. (Here, $\mathbb{K}$ is the base field of $\mathbb{H} = \mathbb{K}^d$.) In fact, this generality is used in the sequel, e.g., in Theorem 15.
Theorem 5 Let $\mathbb{H} = \mathbb{K}^d$, and let $g : S^{d-1} \to \mathbb{K}$ be a bounded Gleason function for the ONBs for $\mathbb{H}$, where $d \ge 3$. There exists a bounded (necessarily since $\mathbb{H} = \mathbb{K}^d$) linear operator $A : \mathbb{H} \to \mathbb{H}$ such that
\[ \forall x \in S^{d-1}, \quad g(x) = \langle A(x), x \rangle. \]
Furthermore, $A$ is self-adjoint if the bounded function $g$ is real-valued and, in particular, if $\mathbb{H} = \mathbb{R}^d$.
Proof
i. First, suppose that the image of $g$ lies in $\mathbb{R}$. Let $W$ denote the weight of $g$ and let $\lambda = \inf_{x \in S^{d-1}} g(x)$. Then, the function $f : S^{d-1} \to \mathbb{K}$ defined by $f(x) := g(x) - \lambda$ is a Gleason function of weight $W - \lambda d$ for the ONBs for $\mathbb{H}$, since if $\{x_i\}_{i=1}^{d}$ is an ONB for $\mathbb{H}$, then
\[ \sum_{i=1}^{d} f(x_i) = \sum_{i=1}^{d} g(x_i) - \lambda d = W - \lambda d. \]
Furthermore, $f$ is non-negative, since $f(x) \ge \lambda - \lambda = 0$ for all $x \in S^{d-1}$. Hence, Gleason’s Theorem 3 implies that there exists a self-adjoint operator $B : \mathbb{H} \to \mathbb{H}$ such that $f(x) = \langle B(x), x \rangle$ for all $x \in S^{d-1}$. Setting $A := B + \lambda I$ and using $\langle x, x \rangle = 1$, we obtain $g(x) = f(x) + \lambda = \langle (B + \lambda I)(x), x \rangle = \langle A(x), x \rangle$ for all $x \in S^{d-1}$. Note that, since $\lambda$ is real, $A$ is the sum of two self-adjoint operators and is therefore self-adjoint.
ii. Now we proceed to the general case, where the image of $g$ lies in $\mathbb{C}$. Both $\operatorname{Re} g$ and $\operatorname{Im} g$ are Gleason functions for the ONBs for $\mathbb{H}$, since for any ONB $\{x_i\}_{i=1}^{d}$ for $\mathbb{H}$, we have
\[ \sum_{i=1}^{d} \operatorname{Re} g(x_i) + i \sum_{i=1}^{d} \operatorname{Im} g(x_i) = \sum_{i=1}^{d} g(x_i) = W = \operatorname{Re} W + i \operatorname{Im} W. \]
Clearly, both $\operatorname{Re} g$ and $\operatorname{Im} g$ are bounded. Hence, by part i, we obtain linear operators $B, C : \mathbb{H} \to \mathbb{H}$ such that $\operatorname{Re} g(x) = \langle B(x), x \rangle$ and $\operatorname{Im} g(x) = \langle C(x), x \rangle$ for all $x \in S^{d-1}$. Setting $A := B + iC$, we obtain $g(x) = \langle A(x), x \rangle$ for all $x \in S^{d-1}$. Although $B + iC$ is not self-adjoint in general, i.e., $\langle (B + iC)(x), y \rangle \neq \langle x, (B^* + iC^*)(y) \rangle$, we do have $\langle (B + iC)(x), y \rangle = \langle x, (B^* - iC^*)(y) \rangle$.
Example 2 (Gleason Functions as Compositions)
a. Let $g$ be the composition $g = f \circ h$, restricted to $S^{d-1}$, of $h : \mathbb{K}^d \to \mathbb{K}$ and $f : \mathbb{K} \to \mathbb{K}$, and where $h$ takes an arbitrary constant value $c \in \mathbb{K}$ on $S^{d-1}$. Then, for any ONB $\{x_j\}_{j=1}^{d} \subset \mathbb{K}^d$, we have
\[ \sum_{j=1}^{d} g(x_j) = \sum_{j=1}^{d} (f \circ h)(x_j) = \sum_{j=1}^{d} f(c) = d \cdot f(c) := W. \]
Thus, $g$ is a Gleason function of weight $W$ for the ONBs for $\mathbb{K}^d$. We can write $g$ as
\[ \forall x \in S^{d-1}, \quad g(x) = \langle A(x), x \rangle, \]
where $A = \frac{W}{d} I$ and $I : \mathbb{K}^d \to \mathbb{K}^d$ is the identity. In fact, for $x \in S^{d-1}$, we have
\[ \langle A(x), x \rangle = \frac{W}{d} \langle x, x \rangle = f(c) = (f \circ h)(x) = g(x). \]
Note that $A$ is self-adjoint if $f : \mathbb{K} \to \mathbb{R}$. Furthermore, in this case, if $\mathbb{K} = \mathbb{R}$, then $y^{\tau} A(y)$ is the quadratic form $(W/d)(y_1^2 + \ldots + y_d^2)$. It is natural to consider the special case $c = \|x\| = 1$ since $x \in S^{d-1}$.
b. Let $g$ be the composition $g = f \circ h$, where $h : S^{d-1} \to \mathbb{K}$ is a Gleason function of weight $W$ for the ONBs for $\mathbb{K}^d$ and $f : \mathbb{K} \to \mathbb{K}$ is a homomorphism on the additive group $\mathbb{K}$. For any ONB $\{x_j\}_{j=1}^{d} \subset \mathbb{K}^d$, we have
\[ \sum_{j=1}^{d} g(x_j) = f(h(x_1)) + \ldots + f(h(x_d)) = f(h(x_1) + \ldots + h(x_d)) = f(W), \]
and so $g$ is a Gleason function of weight $f(W)$ for the ONBs for $\mathbb{K}^d$.
c. Let $f : \mathbb{R} \to \mathbb{R}$ be a homomorphism on the additive group $\mathbb{R}$. Since $f(0) = 0$ and setting $c := f(1) \in \mathbb{R}$, we obtain $f(q) = cq$ for all $q \in \mathbb{Q}$ by direct calculation. If $f$ is continuous on $\mathbb{R}$, we can then assert that $f(x) = cx$ for all $x \in \mathbb{R}$. As is well known, the hypothesis of continuity can be relaxed to assuming only that $f$ is continuous at a point or even only that $f$ is Lebesgue measurable, and one still verifies that $f(x) = cx$ for all $x \in \mathbb{R}$.
$\mathbb{R}$ is an infinite dimensional vector space over the rational field $\mathbb{Q}$. For this setting, we say that $H$ is a Hamel basis for $\mathbb{R}$ if
\[ \forall x \in \mathbb{R}, \ \exists \{r_\alpha\} \subseteq \mathbb{Q} \ \text{and} \ \exists \{h_\alpha\} \subseteq H, \ \text{such that} \ x = \sum_\alpha r_\alpha h_\alpha, \]
where the sum is finite and the representation is unique. Using Zorn’s lemma, which is an equivalent form of the axiom of choice, it is straightforward to see that Hamel bases exist by the following argument: let $\mathcal{I}$ be the family of all subsets of $\mathbb{R}$ that are linearly independent over $\mathbb{Q}$; then there is a maximal element $H \in \mathcal{I}$, and this $H$ can be shown to be a Hamel basis for $\mathbb{R}$. We have $\operatorname{card}(H) = \operatorname{card}(\mathbb{R})$. Furthermore, any vector space over any field has a Hamel basis over that field. See [10], pages 88–89, 150, 153, 162 for this material, its relation to measure theory, and classical references beyond Hausdorff’s fundamental book.
Proposition 1
a. There are discontinuous homomorphisms $f : \mathbb{R} \to \mathbb{R}$.
b. There are discontinuous and, in fact, non-Lebesgue-measurable Gleason functions for the ONBs for $\mathbb{H} = \mathbb{K}^d$.
c. There are Gleason functions $g$ for the ONBs for $\mathbb{H} = \mathbb{K}^d$ that do not satisfy (6) for any self-adjoint operator $A$.
Proof
a. Define $f(t) \in \mathbb{R}$ for each $t \in H$, a Hamel basis for $\mathbb{R}$ over $\mathbb{Q}$. This definition of $f$ on $H$ is arbitrary. Each $u \in \mathbb{R}$ has a unique finite sum representation $u = \sum_{t \in H} r_t(u) t$, where $r_t(u) \in \mathbb{Q}$. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(u) = \sum_{t \in H} r_t(u) f(t)$. $f$ is well defined on $\mathbb{R}$ since $H$ is a Hamel basis. By the unique representations, $u = \sum_{t \in H} r_t(u) t$, $v = \sum_{t \in H} r_t(v) t$, $u + v = \sum_{t \in H} r_t(u + v) t$, we can assert that $f$ is a homomorphism because
\[ \forall u, v \in \mathbb{R}, \quad r_t(u) + r_t(v) = r_t(u + v) \]
and so
\[ f(u) + f(v) = \sum_{t \in H} r_t(u) f(t) + \sum_{t \in H} r_t(v) f(t) = \sum_{t \in H} (r_t(u) + r_t(v)) f(t) = \sum_{t \in H} r_t(u + v) f(t) = f(u + v). \tag{7} \]
The facts that $\operatorname{card}(H) = \operatorname{card}(\mathbb{R}) = \mathfrak{c}$ and $f$ can be defined arbitrarily on $H$ allow us to conclude that the homomorphism equation (7) has $\mathfrak{c}^{\mathfrak{c}}$ solutions. On the other hand, any of these solutions $f$ that is continuous at a point is of the form $f(u) = cu$ for some $c \in \mathbb{R}$ (Example 2 c), i.e., there are $\mathfrak{c}$ such solutions and so all of the other solutions are discontinuous.
b. Combining Example 2 b with part a gives part b.
c. Suppose $g$ is a Gleason function that satisfies (6). The continuity of $g$ on the compact set $S^{d-1}$ follows since $A$ is bounded. More concretely, if $\|x_n - x\|_{\mathbb{H}} \to 0$, where $x_n, x \in S^{d-1}$, then
\[ |g(x_n) - g(x)| = |\langle A(x_n - x), x_n \rangle + \langle A(x), x_n \rangle - \langle A(x), x \rangle| \le \|A(x_n - x)\|_{\mathbb{H}} + |\langle x_n - x, A(x) \rangle| \le 2 \|A\|_{op} \|x_n - x\|_{\mathbb{H}}, \]
which goes to 0 in the limit. Choose a discontinuous Gleason function from part b. It cannot satisfy (6) for then it would be continuous.
2.3 The Case d = 2
Although, as noted in Sect. 2.2, the converse assertion of Theorem 2 is elementary to verify in the one-dimensional case and is true for separable Hilbert spaces of dimension $d \ge 3$, it does not hold for $\mathbb{H} = \mathbb{K}^2$. This means that there are Gleason functions $g$ for the ONBs for $\mathbb{K}^2$ for which there are no self-adjoint operators $A_g : \mathbb{K}^2 \to \mathbb{K}^2$ with the property that $g(x) = \langle A_g(x), x \rangle$ on $S^1$. Our only insight about this assertion that Gleason did not explicitly make is that our proof of Proposition 4 requires the characterization of quadratic forms over $\mathbb{R}^2$ that we give in Proposition 3. Furthermore, the fact remarked by Gleason [44], page 886, that, on $\mathbb{R}^2$ for example, Gleason functions for the ONBs for $\mathbb{R}^2$ can be defined arbitrarily on the quadrant $\theta \in [0, \pi/2)$ of $S^1$ is routinely quantified in Example 3 and Proposition 5.
Example 3 (Counterexample for Converse in $\mathbb{R}^2$) Define the often used $\{0, 1\}$-valued function
\[ g(\theta) = \begin{cases} 1, & \theta \in [0, \pi/2) \cap \mathbb{Q}, \\ 0, & \theta \in [0, \pi/2) \cap \mathbb{Q}^c, \\ 1 - g(\theta - \pi/2), & \theta \in [\pi/2, \pi), \\ g(\theta - \pi), & \theta \in [\pi, 2\pi), \end{cases} \]
where $\mathbb{Q}^c$ is the set of irrational numbers and $\theta$ is the angle that a unit vector $u \in \mathbb{R}^2$ takes with the positive $x$ axis in $\mathbb{R}^2$. Since every ONB for $\mathbb{R}^2$ is of the form
\[ \{(\cos(\theta), \sin(\theta)), (-\sin(\theta), \cos(\theta))\}, \]
for some $\theta \in [0, 2\pi)$, an elementary calculation shows that $g$ is a Gleason function for the ONBs for $\mathbb{R}^2$. On the other hand, $g$ is clearly not a quadratic form since it is not continuous, see Eq. (4). Thus, Gleason’s theorem does not extend to two-dimensional real inner product spaces. For the more difficult case of a continuous counterexample for the converse, see Proposition 4.
The following result, a modified form of which was used by Gleason in his original paper, viz., his Lemma 2.2, illustrates why the converse of Theorem 2 does not generalize to such spaces.
Proposition 2 Let $n \equiv 2 \bmod 4$. The function $g(\theta) = 1 + \cos(n\theta)$, defined on the unit circle $S^1 \subseteq \mathbb{R}^2$, i.e., the polar coordinate $\theta \in [0, 2\pi)$, is a non-negative Gleason function of weight 2 for the ONBs for $\mathbb{R}^2$.
Proof As noted in Example 3, two unit vectors in $\mathbb{R}^2$ form an orthonormal basis for $\mathbb{R}^2$ if and only if they have an angle of $\pi/2$ radians between them. Indeed, this latter condition is equivalent to the orthogonality of the two vectors by the definition of angles in inner product spaces, and any two orthogonal unit vectors in $\mathbb{R}^2$ form an ONB for $\mathbb{R}^2$, since nonzero orthogonal vectors are linearly independent. Clearly, $1 + \cos(n\theta)$ takes on values in the range $[0, 2]$, so it remains to show that
\[ g(\theta_1) + g(\theta_2) = 2 \]
for any $\theta_1, \theta_2$ giving the angles relative to the origin of the vectors in an ONB for $\mathbb{R}^2$. Reordering if necessary, we can assume without loss of generality that $\theta_2 = \theta_1 + \pi/2$. Using the trigonometric identity $\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta$ and the fact that $n\pi/2$ is an odd multiple of $\pi$, we obtain
\[ g(\theta_1) + g(\theta_2) = 2 + \cos(n\theta_1) + \cos(n\theta_1 + n\pi/2) = 2 + \cos(n\theta_1) + \cos(n\theta_1)\cos(n\pi/2) - \sin(n\theta_1)\sin(n\pi/2) = 2 + \cos(n\theta_1) - \cos(n\theta_1) = 2 \]
for all angles $\theta_1$.
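The identity $g(\theta) + g(\theta + \pi/2) = 2$ for $g(\theta) = 1 + \cos(n\theta)$ with $n \equiv 2 \bmod 4$ can be confirmed numerically. The plain-Python sketch below uses hypothetical function names; the comparison with $n = 4$, where the sum is not constant, is an added illustration of why the congruence condition matters and is not taken from the chapter.

```python
import math

def g(theta, n):
    """The function g(theta) = 1 + cos(n * theta) on the unit circle."""
    return 1.0 + math.cos(n * theta)

def onb_sum(theta, n):
    """Sum of g over the ONB at polar angles theta and theta + pi/2."""
    return g(theta, n) + g(theta + math.pi / 2.0, n)

# Sample angles in [0, 2*pi); any other choice of angles would do.
angles = [0.05 * k for k in range(126)]
max_dev = {n: max(abs(onb_sum(t, n) - 2.0) for t in angles)
           for n in (2, 6, 10, -6, 4)}
```

For $n = 2, 6, 10, -6$ the deviation from 2 is at rounding level, whereas for $n = 4$ the sum at $\theta = 0$ equals 4.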
In Proposition 4, we shall show that the Gleason functions defined in Proposition 2 are not quadratic forms on the unit circle $S^1 \subseteq \mathbb{R}^2$ when $|n| \neq 2$. To this end, we shall use the following result.
Proposition 3 Any quadratic form over $\mathbb{R}^2$ with more than four zeros on the unit circle is identically zero.
Proof Let $Q(v) = \langle A(v), v \rangle$ be a quadratic form over $\mathbb{R}^2$ with more than four zeros on the unit circle. Then, there exist unit vectors $v_1$ and $v_2$ such that $Q(v_1) = Q(v_2) = 0$, but $v_2 \neq \pm v_1$. Thus, $v_1$ and $v_2$ are linearly independent, so they form a basis for the two-dimensional space $\mathbb{R}^2$. Consider the linear transformation $T$ defined over the standard basis for $\mathbb{R}^2$ by $T e_i = v_i$ for $i = 1, 2$. $T$ is invertible because it sends a basis for $\mathbb{R}^2$ to a basis for $\mathbb{R}^2$.
Define a new quadratic form $Q_2(v) := Q(T(v)) = \langle AT(v), T(v) \rangle$. We shall relate $Q_2$ to $Q$. Let $u$ be a unit vector in $\mathbb{R}^2$. If $Q(u) = 0$, then $Q_2(T^{-1}(u)/\|T^{-1}(u)\|) = 0$. Likewise, if $Q_2(u) = 0$, then $Q(T(u)/\|T(u)\|) = 0$. The correspondences $u \mapsto T(u)/\|T(u)\|$ and $u \mapsto T^{-1}(u)/\|T^{-1}(u)\|$ give inverse automorphisms of the unit circle. Indeed, for any unit vector $u$,
\[ \frac{T^{-1}(T(u)/\|T(u)\|)}{\|T^{-1}(T(u)/\|T(u)\|)\|} = \frac{\|T(u)\|}{\|T(u)\|} \cdot \frac{T^{-1}T(u)}{\|T^{-1}T(u)\|} = \frac{u}{\|u\|} = u, \]
and, likewise,
\[ \frac{T(T^{-1}(u)/\|T^{-1}(u)\|)}{\|T(T^{-1}(u)/\|T^{-1}(u)\|)\|} = u. \]
In particular, the sets of zeros of $Q$ and $Q_2$ on the unit circle are in bijective correspondence, and if either $Q$ or $Q_2$ is identically zero, then the other is as well.
The quadratic form $Q_2$ can be expressed in rectangular coordinates as $Q_2(x, y) = ax^2 + bxy + cy^2$ for some constants $a, b, c \in \mathbb{R}$. Since $Q_2(e_1) = Q_2(e_2) = 0$, it follows that $a = c = 0$ and so $Q_2(x, y) = bxy$. If $b \neq 0$, then $Q_2$ is only zero on the unit circle at the values $\pm e_1, \pm e_2$, contradicting the hypothesis that $Q$, and hence $Q_2$, has more than four zeros. Thus, $b = 0$ and $Q_2$ is identically zero. By the previous comments, this implies that $Q$ is identically zero.
There exist quadratic forms that have exactly four zeros on the unit circle, e.g., the quadratic form $Q$ defined in rectangular coordinates by $Q(x, y) = xy$. Hence, the hypothesis of more than four zeros in Proposition 3 cannot be further relaxed.
Proposition 4 Let $n \equiv 2 \bmod 4$ with $|n| \neq 2$. The function $g(\theta) = 1 + \cos(n\theta)$, defined on the unit circle $S^1 \subseteq \mathbb{R}^2$, i.e., the polar coordinate $\theta \in [0, 2\pi)$, is a Gleason function of weight 2 for the ONBs for $\mathbb{R}^2$, but it is not the restriction of a quadratic form to $S^1$.
Proof The fact that $g$ is a Gleason function is the content of Proposition 2. It remains to show that $g$ is not the restriction of a quadratic form to the unit circle. Suppose that $g$ is the restriction of a quadratic form to the unit circle. Noting that $|n| = 6, 10, 14, \ldots$, we easily check that $g(\theta) = 1 + \cos(n\theta)$ has at least $|n| \ge 6 > 4$ distinct zeros at $\theta = k\pi/|n|$, for $1 \le k \le 2|n| - 1$ and $k$ odd. Thus, $g$ is identically 0 over $\mathbb{R}^2$ by Proposition 3. We obtain the desired contradiction since $g(0) = 1 + \cos(0) = 2 \neq 0$.
The hypothesis $|n| \neq 2$ is necessary in Proposition 4. In fact, using the double-angle and Pythagorean trigonometric identities, the function $g(\theta) = 1 + \cos(\pm 2\theta)$ can be rewritten as
\[ g(\theta) = 1 + \cos^2(\pm\theta) - \sin^2(\pm\theta) = 2\cos^2(\pm\theta). \tag{8} \]
Viewed in rectangular coordinates $(x, y)$ for inputs lying on the unit circle, the right side of (8) is $2x^2$, which is a quadratic form.

The following is a quantitative version of Gleason’s remark noted at the beginning of this subsection.

Proposition 5 Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded, non-negative, $\pi/2$-periodic function, and let $W \geq \sup f$. Define $g$ in polar coordinates $\theta$ on $S^1$ by the formula
\[
g(\theta) =
\begin{cases}
f(\theta), & \theta \in [0, \pi/2) \cup [\pi, 3\pi/2), \\
W - f(\theta), & \theta \in [\pi/2, \pi) \cup [3\pi/2, 2\pi).
\end{cases}
\]
Then, $g$ is a non-negative Gleason function of weight $W$ for the ONBs for $\mathbb{R}^2$.

Proof Since $f$ is non-negative and $W \geq \sup f$, it follows that $g$ is non-negative on $S^1$. As also noted in the proof of Proposition 2 and by the definition of angles in inner product spaces, two nonzero vectors in $\mathbb{R}^2$ are orthogonal if and only if they are separated by an angle of $\pi/2$. Thus, it suffices to show that $g(\theta) + g(\theta + \pi/2) = W$ for any angle $\theta$. If $\theta \in [0, \pi/2) \cup [\pi, 3\pi/2)$, then, taking angles modulo $2\pi$ as necessary, $\theta + \pi/2 \in [\pi/2, \pi) \cup [3\pi/2, 2\pi)$, and consequently, by the $\pi/2$-periodicity of $f$,
\[
g(\theta) + g(\theta + \pi/2) = f(\theta) + W - f(\theta + \pi/2) = f(\theta) + W - f(\theta) = W.
\]
Otherwise, $\theta \in [\pi/2, \pi) \cup [3\pi/2, 2\pi)$, and, again taking angles modulo $2\pi$ as necessary, $\theta + \pi/2 \in [0, \pi/2) \cup [\pi, 3\pi/2)$. Hence,
\[
g(\theta) + g(\theta + \pi/2) = W - f(\theta) + f(\theta + \pi/2) = W - f(\theta) + f(\theta) = W.
\]
Thus, $g$ is a Gleason function of weight $W$ for the ONBs for $\mathbb{R}^2$.
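The construction in Proposition 5 can be checked numerically. The following Python sketch builds $g$ from one illustrative choice of $f$ and $W$ (both are assumptions made for this example, not taken from the text) and evaluates $g(\theta) + g(\theta + \pi/2)$; the helper names `make_gleason` and `onb_sum` are hypothetical.

```python
import math

def make_gleason(f, W):
    """Build g from a pi/2-periodic f with 0 <= f <= W, as in Proposition 5."""
    def g(theta):
        theta = theta % (2 * math.pi)
        quadrant = int(theta // (math.pi / 2)) % 4
        # Quadrants 0 and 2 are [0, pi/2) and [pi, 3pi/2): use f there.
        # Quadrants 1 and 3 are [pi/2, pi) and [3pi/2, 2pi): use W - f there.
        return f(theta) if quadrant % 2 == 0 else W - f(theta)
    return g

# Illustrative (assumed) data: sin(4t)^2 is pi/2-periodic, so 0.5 <= f <= 1.
f = lambda t: 0.5 + 0.5 * math.sin(4 * t) ** 2
W = 1.5                      # any W >= sup f = 1 is admissible
g = make_gleason(f, W)

def onb_sum(theta):
    """Sum of g over the ONB at polar angles theta and theta + pi/2."""
    return g(theta) + g(theta + math.pi / 2)

for k in range(0, 63, 10):
    print(round(0.1 * k, 1), onb_sum(0.1 * k))
```

Up to rounding, every printed sum should equal $W$, independently of $\theta$, which is exactly the Gleason property for the ONBs for $\mathbb{R}^2$.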
3 Parseval Frames and POVMs

3.1 Properties of Frames

The following definition for Hilbert spaces is equivalent to Definition 1 for frames for $\mathbb{K}^d$ and is formulated in terms of bounds that are often useful in computation and coding.

Definition 4 (Frames)
a. Let $H$ be a separable Hilbert space over the field $\mathbb{K}$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, e.g., $H = L^2(\mathbb{R}^d), \mathbb{R}^d, \mathbb{C}^d$. A finite or countably infinite sequence, $X = \{x_h\}_{h \in J}$, of elements of $H$ is a frame for $H$ if
\[
\exists A, B > 0 \ \text{such that} \ \forall x \in H, \quad A\|x\|^2 \leq \sum_{h \in J} |\langle x, x_h \rangle|^2 \leq B\|x\|^2. \tag{9}
\]
The optimal constants, viz., the supremum over all such $A$ and infimum over all such $B$, are the lower and upper frame bounds, respectively. When we refer to frame bounds or constants $A$ and $B$, we shall mean these optimal constants. Otherwise, we use the terminology, a lower frame bound or an upper frame bound.
b. A frame $X$ for $H$ is an $A$-tight frame if $A = B$. If a tight frame has the further property that $A = B = 1$, then the frame is a Parseval frame for $H$. A tight frame $X$ for $H$ is a unit norm tight frame if each of the elements of $X$ has norm 1. Finite unit norm tight frames for finite dimensional $H$ are designated as FUNTFs. ONBs are both Parseval frames and FUNTFs for finite dimensional $H$.
c. A set $X = \{x_j\}_{j=1}^N \subset H = \mathbb{K}^d$ is equiangular if
\[
\exists\, \alpha \geq 0 \ \text{such that} \ \forall j \neq k, \quad |\langle x_j, x_k \rangle| = \alpha.
\]
An equiangular tight frame is designated as an ETF. It is well known and elementary to verify that, for any $d \geq 1$, the simplex consisting of $N = d + 1$ elements is an equiangular FUNTF and that such ETFs are so-called group frames, see [83].

Amazingly, and elementary to prove, the finite frames for $H = \mathbb{K}^d$ are precisely the finite sequences, $X = \{x_h\}_{h=1}^N \subseteq \mathbb{K}^d$, that span $\mathbb{K}^d$, i.e.,
\[
\forall x \in \mathbb{K}^d, \ \exists\, c_1, \ldots, c_N \in \mathbb{K} \ \text{such that} \quad x = \sum_{h=1}^N c_h x_h. \tag{10}
\]
The innocent and Parseval-like Definition 4 is the basis (sic) for the power of frames, and it belies the power of frames in dealing with numerical stability, robust signal representation, and noise reduction problems, see, e.g., [13, 33] Chaps. 3 and 7, [30, 53], and [54].

Let $X = \{x_h\}_{h \in J}$ be a frame for $H$. We define the following operators associated with every frame; they are crucial to frame theory. The analysis operator $L : H \to \ell^2(J)$ is defined by
\[
\forall x \in H, \quad Lx = \{\langle x, x_h \rangle\}_{h \in J}.
\]
The adjoint of the analysis operator is the synthesis operator $L^* : \ell^2(J) \to H$, and it is defined by
\[
\forall a \in \ell^2(J), \quad L^* a = \sum_{h \in J} a_h x_h.
\]
The frame operator is the mapping $F : H \to H$ defined as $F = L^* L$, i.e.,
\[
\forall x \in H, \quad F(x) = \sum_{h \in J} \langle x, x_h \rangle x_h.
\]
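To make the three operators concrete, here is a minimal Python sketch for a real finite frame. The example frame (three unit vectors in $\mathbb{R}^2$ at mutual angles of $2\pi/3$, often called the Mercedes-Benz frame) and the function names are illustrative assumptions, not notation from the text.

```python
import math

def inner(x, y):
    """Real inner product on R^d."""
    return sum(a * b for a, b in zip(x, y))

def analysis(frame, x):
    """L x = (<x, x_h>)_h."""
    return [inner(x, xh) for xh in frame]

def synthesis(frame, coeffs):
    """L* a = sum_h a_h x_h."""
    d = len(frame[0])
    return [sum(a * xh[i] for a, xh in zip(coeffs, frame)) for i in range(d)]

def frame_operator(frame, x):
    """F x = L* L x = sum_h <x, x_h> x_h."""
    return synthesis(frame, analysis(frame, x))

# Three unit vectors at angles 90, 210, and 330 degrees (assumed example).
mb = [(math.cos(t), math.sin(t))
      for t in (math.pi / 2, 7 * math.pi / 6, 11 * math.pi / 6)]

x = (0.3, -1.2)
Fx = frame_operator(mb, x)                      # should be close to (3/2) * x
energy = sum(c * c for c in analysis(mb, x))    # should be close to (3/2) * ||x||^2
angles = [abs(inner(mb[j], mb[k])) for j in range(3) for k in range(j + 1, 3)]
```

For this frame one expects $F = \tfrac{3}{2} I$, so it is a FUNTF with $A = B = N/d = 3/2$, and the three values in `angles` should all be close to $1/2$, so it is also equiangular.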
The following is a fundamental theorem.

Theorem 6 Let $H$ be a separable Hilbert space, and let $X = \{x_h\}_{h \in J} \subseteq H$.
a. $X$ is a frame for $H$ with frame bounds $A$ and $B$ if and only if $F : H \to H$ is a topological isomorphism with norm bounds $\|F\|_{op} \leq B$ and $\|F^{-1}\|_{op} \leq A^{-1}$.
b. In the case of either condition of part a, we have the following:
\[
B^{-1} I \leq F^{-1} \leq A^{-1} I, \tag{11}
\]
$\{F^{-1} x_h\}$ is a frame for $H$ with frame bounds $B^{-1}$ and $A^{-1}$, and
\[
\forall x \in H, \quad x = \sum_{h \in J} \langle x, x_h \rangle F^{-1} x_h = \sum_{h \in J} \langle x, F^{-1} x_h \rangle x_h = \sum_{h \in J} \langle x, F^{-1/2} x_h \rangle F^{-1/2} x_h. \tag{12}
\]

For a proof of part a, see [19], pages 100–104. For part b, let $X = \{x_h\}_{h \in J}$ be a frame for $H$. Then, the frame operator $F$ is invertible [7, 33], and $F$ is a multiple of the identity precisely when $X$ is a tight frame. Furthermore, $F^{-1}$ is a positive self-adjoint operator and has a square root $F^{-1/2}$ (Theorem 12.33 in [73]). This square root can be written as a power series in $F^{-1}$; consequently, it commutes with every operator that commutes with $F^{-1}$ and, in particular, with $F$. These properties allow us to assert that $\{F^{-1/2} x_h\}$ is a Parseval frame for $H$ and give the third equality of (12), see [30], page 155.

The following is straightforward to prove, e.g., see [25, 83].

Proposition 6 Given $H = \mathbb{K}^d$ and $N \geq d$, let $X = \{x_j\}_{j=1}^N \subset H$.
a. If $X$ is a Parseval frame for $H$ and each $\|x_j\| = 1$, then $N = d$ and $X$ is an ONB for $H$.
b. If $X$ is a FUNTF for $H$ and not an ONB for $H$, then the frame constant $A \neq 1$.
c. A FUNTF, respectively, Parseval frame, for $H$ is not a Parseval frame, respectively, FUNTF for $H$, unless $N = d$ and $X$ is an ONB for $H$.
d. If $X$ is an equi-normed $A$-tight frame for $H$, then each $\|x_j\| = (Ad/N)^{1/2}$.
e. If $X$ is a Parseval frame for $H$, then each $\|x_j\| \leq 1$. The same result is true for any separable Hilbert space over $\mathbb{K}$.
f. If $X$ is an equiangular, $A$-tight frame for $H$, then
\[
\forall j, k = 1, \ldots, N, \quad \|x_j\| = \left(\frac{Ad}{N}\right)^{1/2} \quad \text{and} \quad |\langle x_j, x_k \rangle| = \frac{A}{N} \sqrt{\frac{d(N - d)}{N - 1}}.
\]
g. If $X$ is an equiangular, Parseval frame for $H$, then each $\|x_j\| < 1$.
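The assertion in Theorem 6 that $\{F^{-1/2} x_h\}$ is a Parseval frame can be illustrated in $\mathbb{R}^2$, where the inverse square root of a symmetric positive definite $2 \times 2$ matrix has a closed form. The sketch below is only an illustration under that assumption; the example frame and the helper names are hypothetical, and the closed form $M^{1/2} = (M + \sqrt{\det M}\, I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$ is specific to the $2 \times 2$ case.

```python
import math

def frame_matrix(frame):
    """Matrix of F = sum_h x_h x_h^T for a real frame in R^2."""
    a = sum(x[0] * x[0] for x in frame)
    b = sum(x[0] * x[1] for x in frame)
    c = sum(x[1] * x[1] for x in frame)
    return ((a, b), (b, c))

def inv_sqrt_2x2(M):
    """M^{-1/2} for a symmetric positive definite 2x2 matrix (closed form)."""
    (a, b), (_, c) = M
    s = math.sqrt(a * c - b * b)            # sqrt(det M)
    t = math.sqrt(a + c + 2 * s)            # sqrt(tr M + 2 sqrt(det M))
    ra, rb, rc = (a + s) / t, b / t, (c + s) / t   # entries of M^{1/2}
    det_r = ra * rc - rb * rb
    return ((rc / det_r, -rb / det_r), (-rb / det_r, ra / det_r))

def apply(M, x):
    """Matrix-vector product for a 2x2 matrix."""
    return (M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1])

X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]    # a frame for R^2 that is not tight
F = frame_matrix(X)                          # ((2, 1), (1, 2)), eigenvalues 1 and 3
S = inv_sqrt_2x2(F)
Y = [apply(S, x) for x in X]                 # the sequence {F^{-1/2} x_h}
G = frame_matrix(Y)                          # frame operator of Y
```

Here the frame bounds of `X` are the eigenvalues 1 and 3 of `F`, and the frame operator `G` of the transformed sequence should agree with the identity up to rounding, i.e., `Y` is a Parseval frame.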
Remark 7 (Frames and Bases for $H$) In light of the fact that ONBs are frames, it is natural to ask to what extent frames can be constructed in terms of ONBs.
1. It may be considered surprising that any infinite dimensional $H$ contains a frame for $H$ which does not contain a basis for $H$. The result is due to Casazza and Christensen, see [30], Chap. 7, for details.
2. The first result relating frames and sums of bases is due to Casazza [26]. Let $H$ be a separable Hilbert space, and let $X = \{x_h\}_{h \in J}$ be a frame for $H$ with upper frame bound $B$. Then, for every $\varepsilon > 0$, there are ONBs $\{u_h\}_{h \in J}, \{v_h\}_{h \in J}, \{w_h\}_{h \in J}$ for $H$ and a constant $C = B(1 + \varepsilon)$ such that
\[
\forall h \in J, \quad x_h = C(u_h + v_h + w_h).
\]
The proof depends on an operator-theoretic argument.
3.2 POVMs

Definition 5 is a measure theoretic formulation of POVMs, see [2, 24] for applications to coherent states and quantum physics, and see [10] for the measure theory.

Definition 5 (POVM) Let $S$ be a set, let $B$ be a $\sigma$-algebra of subsets of $S$, and let $H$ be a separable Hilbert space. In this setting, a POVM on $B$ is a representation-like mapping, $\mu : B \to \mathcal{L}(H)$, with the following properties:
1. $\forall\, U \in B$, $\mu(U) \in \mathcal{L}(H)$ is a positive semi-definite self-adjoint operator.
2. $\mu(\emptyset) = 0$, the 0-operator.
3. For every disjoint collection, $\{U_j\}_{j=1}^\infty \subseteq B$, if $x, y \in H$, then
\[
\Bigl\langle \mu\Bigl(\bigcup_{j=1}^\infty U_j\Bigr)(x), y \Bigr\rangle = \sum_{j=1}^\infty \langle \mu(U_j)(x), y \rangle.
\]
4. $\mu(S) = I$, the identity operator.

$\mathcal{L}(H)$ is a non-commutative $*$-Banach algebra with unit, see [4].
Proposition 7 Let $\{x_j\}_{j \in J}$ be a Parseval frame for $H$, where $S = J \subseteq \mathbb{Z}$. Define a family $\{\mu(U)\}_{U \subseteq J}$ of linear operators on $H$ by the formula
\[
\forall x \in H, \quad \mu(U)(x) = \sum_{j \in U} \langle x, x_j \rangle x_j.
\]
Then, $\mu$ is a POVM on $B$. If $H = \mathbb{K}^d$, then we typically take $J = \{1, \ldots, N\}$, $N \geq d$.
Proof By direct manipulation with the definition of $\mu(U)$, we verify the first three criteria of Definition 5. The last criterion follows since $\{x_j\}_{j \in J}$ is a Parseval frame for $H$; in fact,
\[
\forall x \in H, \quad \mu(S)(x) = \sum_{j \in J} \langle x, x_j \rangle x_j = x.
\]
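The mapping $U \mapsto \mu(U)$ of Proposition 7 is easy to tabulate for a small Parseval frame. The Python sketch below uses, as an assumed example, three vectors of norm $\sqrt{2/3}$ at mutual angles of $2\pi/3$ in $\mathbb{R}^2$; the names `frame`, `mu`, and `add` are illustrative only.

```python
import math

# Assumed example: a 3-element Parseval frame for R^2.
c = math.sqrt(2.0 / 3.0)
frame = [(c * math.cos(t), c * math.sin(t))
         for t in (math.pi / 2, 7 * math.pi / 6, 11 * math.pi / 6)]

def mu(U):
    """Matrix of mu(U) = sum_{j in U} <., x_j> x_j for an index set U."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for j in U:
        x = frame[j]
        for r in range(2):
            for s in range(2):
                m[r][s] += x[r] * x[s]
    return m

def add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[r][s] + B[r][s] for s in range(2)] for r in range(2)]

empty = mu(set())                      # criterion 2: the zero operator
whole = mu({0, 1, 2})                  # criterion 4: should be the identity
split = add(mu({0, 1}), mu({2}))       # criterion 3 on a disjoint partition
```

Each `mu(U)` is a sum of rank-one positive semi-definite matrices, which corresponds to criterion 1; `whole` and `split` should both agree with the identity matrix up to rounding.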
Proposition 8 Let $B = \mathcal{P}(S)$ be the power set $\sigma$-algebra of a countable set $S$, and let $\mu : B \to \mathcal{L}(H)$ be a POVM. Then, there is a countable set $J$, a Parseval frame $\{x_j\}_{j \in J}$ for $H$, and a disjoint partition $\{B_i\}_{i \in S}$ of $J$ such that
\[
\forall i \in S \ \text{and} \ \forall x \in H, \quad \mu(i)(x) = \sum_{j \in B_i} \langle x, x_j \rangle x_j.
\]
Furthermore, if $H = \mathbb{K}^d$, then each $B_i$ may be taken to be finite.

Proof For each $i \in S$, $\mu(i) \in \mathcal{L}(H)$ is self-adjoint and positive semi-definite by definition. To fix ideas, let $H = \mathbb{K}^d$. (For more general $H$, there are appropriate versions of the spectral theorem that we now apply to $\mathbb{K}^d$.) By Theorem 1, for each $i \in S$, there is a $d$-element indexing set $B_i$, an orthonormal set $\{v_j\}_{j \in B_i} \subseteq \mathbb{K}^d$, and a set $\{\lambda_j\}_{j \in B_i}$ of non-negative numbers such that
\[
\forall x \in \mathbb{K}^d, \quad \mu(i)(x) = \sum_{j \in B_i} \lambda_j \langle x, v_j \rangle v_j = \sum_{j \in B_i} \langle x, x_j \rangle x_j,
\]
where
\[
\forall j \in B_i, \quad x_j = \sqrt{\lambda_j}\, v_j.
\]
Furthermore, all the $B_i$ are disjoint. Set $J = \bigcup_{i \in S} B_i$. Because $S$ is countable and each $B_i$ is finite, $J$ itself is countable. Since $\mu(S) = I$, we have that
\[
\forall x \in \mathbb{K}^d, \quad x = \mu(S)(x) = \sum_{i \in S} \sum_{j \in B_i} \langle x, x_j \rangle x_j = \sum_{j \in J} \langle x, x_j \rangle x_j.
\]
It follows that $\{x_j\}_{j \in J}$ is a Parseval frame for $\mathbb{K}^d$.
Proposition 8 is a converse of Proposition 7 for the $\sigma$-algebra of all subsets of $S = \mathbb{Z}$. Applicably, we can also say that analyzing quantum measurements with a discrete set $S$ of outcomes is equivalent to analyzing Parseval frames.

Propositions 7 and 8 were established to illustrate the role of POVMs in quantum detection [16], which itself depends on frame potential theory [12]. Definition 5 is stated in some generality so as to be able eventually to extend the analysis of quantum measurements for more robust sets of outcomes. Also, in the setting of more general measurable spaces $S$ than $\mathbb{Z}$, there are corresponding equivalences with
Parseval frames. In fact, each of the propositions in this subsection has a significant, straightforward generalization, which we do not pursue herein. Example 4 (Resolution of the Identity) a. Given .S, .B, and .H as in Definition 5 take .H over .C. A resolution of the identity on .B is a mapping, . : B −→ L(H), with the following properties: .(∅) = 0, .(S) = I ; each .(U ) is a self-adjoint projection and so each 2 .(U ) is positive semi-definite (. (U )(x), x = (U )(x) for all .x ∈ H); .(U ∩ V ) = (U )(V ) (composition) on .B; . is finitely additive on .B, and .x,y : B → C defined by ∀x, y ∈ H,
.
x,y (U ) = (U )(x), y
is a complex measure on $B$. The importance of resolutions of the identity stems from the spectral theorem, which asserts that every bounded self-adjoint (and, more generally, normal) operator $A$ on $H$ induces a resolution of the identity, from which $A$ can be reconstructed in terms of a certain type of integral.
b. With the setup of part a, suppose (the weak hypothesis) that $S$ can be written as a disjoint union $\bigcup U_n$ of a sequence $\{U_n\} \subseteq B$. Define $E_n = \mu(U_n) : H \to H$, where $\mu$ is given in Definition 5. Then, we have the resolution of the identity $I = \sum E_n$ because, for all $x, y \in H$,
\[
\langle I(x), y \rangle = \langle \mu(S)(x), y \rangle = \Bigl\langle \mu\Bigl(\bigcup U_n\Bigr)(x), y \Bigr\rangle = \sum \langle \mu(U_n)(x), y \rangle = \sum \langle E_n(x), y \rangle.
\]
c. We mention parts a and b since we shall be dealing with special POVMs described in Definition 6. These will correspond to discrete observables in quantum measurement, and the domain $B$ does not play an explicit role.

Definition 6 (POVMs, Effects, and Projections) Let $\mathcal{U}(H)$ denote the set of operators $U \in \mathcal{L}_+(H)$ for which $0 \leq U \leq I$, and let $\mathcal{E}(H)$ denote the set of operators $E \in \mathcal{S}_+(H)$ for which $0 \leq E \leq I$, i.e., $\mathcal{E}(H) = \mathcal{S}_+(H) \cap \mathcal{U}(H)$. Note that if $A \in \mathcal{L}_+(H) \setminus \mathcal{U}(H)$, then there is $a > 1$ and $U \in \mathcal{U}(H)$ such that $A = aU$. In fact, set $a = \|A\| > 1$ and $U = (1/\|A\|)A$. The verification is immediate, e.g., $U \leq I$ since $\langle (I - U)(x), x \rangle = \langle x, x \rangle - (1/\|A\|)\langle A(x), x \rangle$ and $(1/\|A\|)\langle A(x), x \rangle \leq \|x\|^2 = \langle x, x \rangle$. Similarly, if $A \in \mathcal{S}_+(H) \setminus \mathcal{E}(H)$, then there is $a > 1$ and $E \in \mathcal{E}(H)$ such that $A = aE$.

$\mathcal{E}(H)$ is the set of all effects. A positive operator-valued measurement on $H$, which we also designate by POVM, is a sequence $\{E_n\} \subseteq \mathcal{E}(H)$ such that $I = \sum E_n$, see [23, 24, 72]. For example, if $E \in \mathcal{E}(H)$, then $I - E \in \mathcal{E}(H)$ since $I - E \in \mathcal{S}_+(H)$ and $I - E \leq I$, in particular, $0 \leq I - E \leq I$, and thus $\{E, I - E\}$ is a POVM on $H$.

Let $\mathcal{P}_+(H) \subseteq \mathcal{L}(H)$ be the space of self-adjoint projections. If $P \in \mathcal{P}_+(H)$, then $P \geq 0$ since $\langle P(x), x \rangle = \langle P^2(x), x \rangle = \|P(x)\|^2$. Furthermore, $P \in \mathcal{P}_+(H)$ implies $I - P \in \mathcal{P}_+(H)$ and so we have that $\mathcal{P}_+(H) \subseteq \mathcal{E}(H)$. POVMs, effects, and projections are the topic of Sect. 6.
Definition 7 (Tensor Product and ket-bra Notation)
a. For given $H$ over $\mathbb{K}$, let $H'$ be the dual space $\mathcal{L}(H, \mathbb{K})$ of bounded linear functionals $L : H \to \mathbb{K}$, taken with the operator norm topology given by $\|L\| = \sup_{\|x\| \leq 1} |L(x)|$. A fundamental result, which is essentially the Riesz representation theorem for Hilbert spaces, is that there is a conjugate-linear surjective isometry, $H \to H'$, $y \mapsto L_y = y^*$, defined by the formula
\[
\forall x \in H, \quad L_y(x) = y^*(x) = \langle x, y \rangle,
\]
where conjugate-linear means that $\langle x, a_1 y_1 + a_2 y_2 \rangle = \overline{a_1} \langle x, y_1 \rangle + \overline{a_2} \langle x, y_2 \rangle$.
b. The tensor product $\otimes : H \times H' \to \mathcal{L}(H)$ is the bilinear mapping sending pairs $(x, y^*)$ to linear operators $x \otimes y^*$ defined by the action
\[
\forall z \in H, \quad (x \otimes y^*)(z) = (y^*(z))\,x = \langle z, y \rangle\, x.
\]
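In this definition, we note that if $\mathbb{K} = \mathbb{C}$, then we do not generally have $a y^* = (ay)^*$, and the bilinearity follows since $(x \otimes (a y^*))(z) = \langle z, \overline{a}\, y \rangle\, x = a (x \otimes y^*)(z)$.
c. Let $x = (x_1, \ldots, x_d)$, $y^* = (y_1, \ldots, y_d) \in H = \mathbb{K}^d$. The outer product $x y^*$ is the $d \times d$ matrix $(z_{i,j})$, where $z_{i,j} = x_i y_j$, and, in fact, this is the tensor product $x \otimes y^*$ defined more generally in part b. $x \otimes y^*$ is clearly a rank-1 operator on $\mathbb{K}^d$ since each of the columns of $(z_{i,j})$ is a multiple of the first column. In Dirac notation, $x \otimes y^*$ is the ket-bra $|x\rangle\langle y|$.

Lemma 1 Given $H$, let $x \in H$. Define $E = x \otimes x^* \in \mathcal{L}(H)$. Then, $E \in \mathcal{S}_+(H)$, i.e., $E$ is self-adjoint and positive semi-definite.

Proof For any $y, z \in H$ we have the equations
\[
\langle E(y), z \rangle = \langle (x \otimes x^*)(y), z \rangle = \langle (x^*(y))\,x, z \rangle = \langle y, x \rangle \langle x, z \rangle
\]
and
\[
\langle y, E(z) \rangle = \langle y, (x \otimes x^*)(z) \rangle = \langle y, (x^*(z))\,x \rangle = \overline{\langle z, x \rangle}\, \langle y, x \rangle = \langle x, z \rangle \langle y, x \rangle.
\]
Therefore,
\[
\langle E(y), z \rangle = \langle y, x \rangle \langle x, z \rangle = \langle x, z \rangle \langle y, x \rangle = \langle y, E(z) \rangle
\]
and
\[
\langle E(y), y \rangle = \langle y, x \rangle \langle x, y \rangle = \langle y, x \rangle\, \overline{\langle y, x \rangle} = |\langle y, x \rangle|^2 \geq 0. \tag{13}
\]
Consequently, $E$ is self-adjoint and positive semi-definite.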
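The two identities in the proof of Lemma 1 can be spot-checked over $\mathbb{C}^3$ with a few lines of Python. This is a minimal sketch: the vectors are arbitrary test data, and `inner` and `ket_bra` are hypothetical helper names, with the inner product taken linear in the first argument as in Definition 7.

```python
def inner(u, v):
    """<u, v> = sum u_i * conj(v_i): linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def ket_bra(x, y):
    """Return the operator z -> <z, y> x, i.e., x (tensor) y^* = |x><y|."""
    return lambda z: [inner(z, y) * xi for xi in x]

# Arbitrary complex test vectors (assumed data).
x = [1 + 2j, -0.5j, 3.0 + 0j]
y = [0.2 - 1j, 1.5 + 0j, -1 + 1j]
z = [2j, 1 - 1j, 0.5 + 0j]

E = ket_bra(x, x)
lhs = inner(E(y), z)      # <E y, z>
rhs = inner(y, E(z))      # <y, E z>, equal to lhs when E is self-adjoint
quad = inner(E(y), y)     # <E y, y>, expected to equal |<y, x>|^2
```

Agreement of `lhs` and `rhs` reflects self-adjointness, and `quad` being a non-negative real number (up to rounding) reflects (13).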
Proposition 9 Let $\{x_j\}_{j=1}^N$ be a Parseval frame for $H = \mathbb{K}^d$. Then, $\{E_j := x_j \otimes x_j^*\}_{j=1}^N \subseteq \mathcal{E}(\mathbb{K}^d)$ is a POVM on $H$.

Proof First, note that each $E_j \in \mathcal{E}(H)$. In fact, $E_j \leq I$ by (13) since
\[
\langle E_j(y), y \rangle = |\langle y, x_j \rangle|^2 \leq \|y\|^2 \|x_j\|^2 \leq \langle y, y \rangle,
\]
where we have used the fact that Parseval frames are contained in the closed unit ball of $H$ (stated for $H = \mathbb{K}^d$ in Proposition 6). This is also a consequence of the resolution of the identity formula that we shall now verify, since the $E_j$ are positive semi-definite. For any $y \in H$, we have
\[
\Bigl\langle \sum_{j=1}^N E_j\, y, y \Bigr\rangle = \sum_{j=1}^N \langle E_j(y), y \rangle = \sum_{j=1}^N |\langle y, x_j \rangle|^2 = \|y\|^2
\]
by the Parseval condition, so that any eigenvalue of $\sum_{j=1}^N E_j$ must have absolute value 1. Each $E_j$ is self-adjoint and positive semi-definite (Lemma 1), and hence $\sum_{j=1}^N E_j$ is self-adjoint and positive semi-definite. Thus, each eigenvalue of $\sum_{j=1}^N E_j$ must be real and non-negative. Combining these facts shows that 1 is the only eigenvalue of the operator $\sum_{j=1}^N E_j$. The spectral theorem then implies that $\sum_{j=1}^N E_j$ is the identity operator.

Conversely, given any POVM $\{E_j\}_{j \in J}$ on $\mathbb{K}^d$, we can construct a Parseval frame $\{x_j\}_{j=1}^N$ for $\mathbb{K}^d$ from the eigenvectors of the $E_j$ in the following way. The hypothesis that we are given a POVM is only used in the penultimate equality of the following proof.

Proposition 10 Let $H = \mathbb{K}^d$, and let $\{E_j\}_{j \in J} \subseteq \mathcal{E}(H)$ be a POVM on $H$. There exists a Parseval frame $\{x_{j,k}\}_{j \in J,\, 1 \leq k \leq d}$ for $H$ such that for each $E_j$ we have $E_j = \sum_{k=1}^d x_{j,k} \otimes x_{j,k}^*$.

Proof For each $E_j$, we invoke the spectral theorem to choose an eigenbasis $\{e_{j,k}\}_{k=1}^d$ for $E_j$ corresponding to the (real, non-negative, not necessarily distinct) eigenvalues $\lambda_{j,k}$, $k = 1, \ldots, d$, of $E_j$. Then, for each $j \in J$ and $k \in \{1, \ldots, d\}$, set $x_{j,k} = \sqrt{\lambda_{j,k}}\, e_{j,k}$. For any $y \in H$ and any $j \in J$, we can write $y = \sum_{k=1}^d y_k e_{j,k}$. Next, we compute
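\[
\Bigl(\sum_{k=1}^d x_{j,k} \otimes x_{j,k}^*\Bigr) y
= \sum_{k=1}^d \langle y, x_{j,k} \rangle x_{j,k}
= \sum_{k=1}^d \langle y, e_{j,k} \rangle \lambda_{j,k} e_{j,k}
= \sum_{k=1}^d y_k \lambda_{j,k} e_{j,k}
= \sum_{k=1}^d y_k E_j(e_{j,k})
= E_j\Bigl(\sum_{k=1}^d y_k e_{j,k}\Bigr)
= E_j(y),
\]
so that $E_j = \sum_{k=1}^d x_{j,k} \otimes x_{j,k}^*$. Furthermore,
\[
\begin{aligned}
\sum_{j \in J} \sum_{k=1}^d |\langle x_{j,k}, y \rangle|^2
&= \sum_{j \in J} \sum_{k=1}^d \langle y, x_{j,k} \rangle\, \overline{\langle y, x_{j,k} \rangle}
= \sum_{j \in J} \sum_{k=1}^d \langle y, x_{j,k} \rangle \langle x_{j,k}, y \rangle \\
&= \sum_{j \in J} \sum_{k=1}^d \langle (x_{j,k} \otimes x_{j,k}^*)(y), y \rangle
= \Bigl\langle \Bigl(\sum_{j \in J} \sum_{k=1}^d x_{j,k} \otimes x_{j,k}^*\Bigr) y, y \Bigr\rangle \\
&= \Bigl\langle \Bigl(\sum_{j \in J} E_j\Bigr)(y), y \Bigr\rangle
= \langle y, y \rangle = \|y\|^2,
\end{aligned}
\]
where the penultimate equality follows since $\{E_j\}_{j \in J} \subseteq \mathcal{E}(H)$ is a POVM on $H$. Therefore, $\{x_{j,k}\}_{j \in J,\, 1 \leq k \leq d}$ is a Parseval frame for $H = \mathbb{K}^d$.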
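The construction in the proof of Proposition 10 can be traced on the two-outcome POVM $\{E, I - E\}$ in $\mathbb{R}^2$. The sketch below is an illustration under stated assumptions: the effect `E1` is a made-up example with eigenvalues $0.2$ and $0.7$, and the eigen-decomposition uses the elementary rotation-angle formula for real symmetric $2 \times 2$ matrices rather than a general spectral routine.

```python
import math

def eig_sym_2x2(M):
    """Eigenpairs of a real symmetric 2x2 matrix via a rotation angle."""
    (a, b), (_, c) = M
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    lam1 = a * v1[0] ** 2 + 2 * b * v1[0] * v1[1] + c * v1[1] ** 2
    lam2 = a * v2[0] ** 2 + 2 * b * v2[0] * v2[1] + c * v2[1] ** 2
    return [(lam1, v1), (lam2, v2)]

def frame_from_povm(effects):
    """x_{j,k} = sqrt(lambda_{j,k}) e_{j,k} for each effect E_j."""
    vectors = []
    for E in effects:
        for lam, e in eig_sym_2x2(E):
            r = math.sqrt(max(lam, 0.0))     # clip tiny negative rounding errors
            vectors.append((r * e[0], r * e[1]))
    return vectors

E1 = ((0.6, 0.2), (0.2, 0.3))                # assumed effect, 0 <= E1 <= I
E2 = ((0.4, -0.2), (-0.2, 0.7))              # I - E1
X = frame_from_povm([E1, E2])

def energy(y):
    """sum over the frame of |<y, x_{j,k}>|^2."""
    return sum((y[0] * x[0] + y[1] * x[1]) ** 2 for x in X)
```

For any `y`, `energy(y)` should equal $\|y\|^2$ up to rounding, which is the Parseval condition for the four vectors in `X`.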
4 Gleason Functions for Parseval Frames

4.1 Quadratic Forms Are Gleason Functions for Parseval Frames

Suppose $f : \mathbb{K}^d \to \mathbb{K}$ is a function for which there exists $W \in \mathbb{K}$ such that for any frame $\{x_j\}_{j \in J}$ for $\mathbb{K}^d$ we have $\sum_{j \in J} f(x_j) = W$. Such a function must be identically zero since one can add an arbitrary vector to any frame and the set remains a frame. Therefore, a more specific class of frames must be examined in order to extend Gleason’s theorem to frames. The clear choices are Parseval frames or FUNTFs, since Parseval frames and FUNTFs reduce to ONBs when the cardinality of the frame is the dimension of the Hilbert space, see Proposition 6. Given that Gleason was originally concerned with measures corresponding to quantum measurement, and since Parseval frames directly correspond to positive operator-valued measures (Sect. 3.2), which in turn are a general form of quantum measurement, we shall extend the notion of Gleason’s functions to Parseval frames as promised in Sect. 1.2.

The spectral theorem and a straightforward calculation give Theorem 7, which is a direct generalization of Theorem 2.
Theorem 7 Let $H = \mathbb{K}^d$, and let $A$ be a self-adjoint linear operator $A : H \to H$. The function $g : B^d \to \mathbb{K}$, defined by
\[
\forall x \in B^d, \quad g(x) = \langle A(x), x \rangle,
\]
is a Gleason function of weight $W = \operatorname{tr}(A)$ for the finite Parseval frames $X$ for $H$. Clearly, $g \in L^\infty(B^d)$ with $\|g\|_{L^\infty(B^d)} \leq \|A\|_{op}$, $g(0) = 0$, and
\[
\forall x \in B^d \ \text{and} \ \forall \alpha \in \mathbb{K}, \ \text{where} \ |\alpha| \leq 1, \quad \alpha x \in B^d \ \text{and} \ g(\alpha x) = |\alpha|^2 g(x).
\]

Proof By the spectral theorem, there exists an orthonormal eigenbasis $\{e_j\}_{j=1}^d$ associated with the set $\{\lambda_j\}_{j=1}^d$ of eigenvalues for $A$. Hence, for all $x \in H$, we have $x = \sum_{j=1}^d \langle x, e_j \rangle e_j$ and $A(x) = \sum_{j=1}^d \langle x, e_j \rangle \lambda_j e_j$. If $X = \{x_n\}_{n=1}^N$ is a Parseval frame for $H$, then we compute
\[
\operatorname{tr}(A) = \sum_{j=1}^d \lambda_j = \sum_{j=1}^d \lambda_j \|e_j\|^2 = \sum_{j=1}^d \lambda_j \sum_{n=1}^N |\langle x_n, e_j \rangle|^2,
\]
where the last equality follows since the frame is Parseval. Reordering the finite sums yields the desired result:
\[
\operatorname{tr}(A) = \sum_{n=1}^N \sum_{j=1}^d \lambda_j \langle e_j, x_n \rangle \langle x_n, e_j \rangle
= \sum_{n=1}^N \sum_{j=1}^d \bigl\langle \langle x_n, e_j \rangle \lambda_j e_j, x_n \bigr\rangle
= \sum_{n=1}^N \langle A(x_n), x_n \rangle = \sum_{n=1}^N g(x_n).
\]
Therefore, $g$ is a Gleason function of weight $W = \operatorname{tr}(A)$ for the finite Parseval frames $X$ for $H$.

Remark 8 (Generalizations of Theorem 7)
a. Theorem 7 is true for normal operators over $\mathbb{C}$, since the spectral theorem is true in that setting, e.g., see [42], page 377. Furthermore, Theorem 7 is true for arbitrary linear operators $A$ over $H = \mathbb{C}^d$, since every such $A$ can be written as $A = B + iC$, where $B$ and $C$ are self-adjoint linear operators, see, e.g., [47], Sect. 70.
b. The proof of Theorem 7 can be extended to infinite Parseval frames $X = \{x_n\}_{n=1}^\infty$ for $\mathbb{K}^d$ by noting that the series $\sum_{j=1}^d \sum_{n=1}^\infty \lambda_j |\langle x_n, e_j \rangle|^2 < \infty$ is absolutely convergent so that the terms can be rearranged, see Remark 4.
c. See the Problem stated in Sect. 5.1 for a role that the condition $g(0) = 0$ in Theorem 7 plays.
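As a quick numerical illustration of Theorem 7 in $\mathbb{R}^2$, the sketch below sums $g(x) = \langle A(x), x \rangle$ over three different finite Parseval frames. The symmetric matrix `A` and the frames are assumed examples chosen for this check, and `quad_form` is a hypothetical helper name.

```python
import math

def quad_form(A, x):
    """g(x) = <A x, x> for a real symmetric matrix A."""
    d = len(x)
    return sum(A[i][j] * x[j] * x[i] for i in range(d) for j in range(d))

A = ((2.0, -1.0), (-1.0, 0.5))               # symmetric, trace 2.5 (assumed example)

onb = [(1.0, 0.0), (0.0, 1.0)]
c = math.sqrt(2.0 / 3.0)
parseval3 = [(c * math.cos(t), c * math.sin(t))
             for t in (math.pi / 2, 7 * math.pi / 6, 11 * math.pi / 6)]
padded = onb + [(0.0, 0.0)]                  # adjoining 0 keeps the Parseval property

sums = [sum(quad_form(A, x) for x in X) for X in (onb, parseval3, padded)]
```

All three entries of `sums` should agree with $\operatorname{tr}(A) = 2.5$ up to rounding, even though the frames have different cardinalities and this `A` is not positive semi-definite.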
In the remainder of this section, we shall establish broad conditions which imply that a Gleason function $g$ for all Parseval frames for $H = \mathbb{K}^d$ is of the form $\langle A(x), x \rangle$ for some linear operator $A$. We shall first focus on the case in which $g$ is continuous or non-negative and then extend these results to bounded Gleason functions.
4.2 Basic Properties

This subsection collects some general facts about Gleason functions for the finite Parseval frames for $H = \mathbb{K}^d$, which we shall use in the sequel. The first two results are elementary and are stated on their own to avoid unnecessary repetition.

Proposition 11 Let $H = \mathbb{K}^d$. Then, the Gleason functions for the finite Parseval frames for $H$ form a $\mathbb{K}$-vector space under pointwise addition of functions and scalar multiplication.

Proof Suppose $f, g : B^d \to \mathbb{K}$ are two Gleason functions of weights $W_1$ and $W_2$, respectively, for the Parseval frames for $H$, and let $\alpha, \beta \in \mathbb{K}$. We show that $\alpha f + \beta g$ is a Gleason function of weight $\alpha W_1 + \beta W_2$ for the Parseval frames for $H$. Let $\{x_i\}_{i=1}^N \subseteq B^d$ be a Parseval frame for $H$. Then,
\[
\sum_{i=1}^N (\alpha f + \beta g)(x_i) = \sum_{i=1}^N \bigl(\alpha f(x_i) + \beta g(x_i)\bigr) = \alpha \sum_{i=1}^N f(x_i) + \beta \sum_{i=1}^N g(x_i) = \alpha W_1 + \beta W_2,
\]
as claimed.

Proposition 12 Let $H = \mathbb{K}^d$, and let $g$ be a Gleason function of weight $W$ for the finite Parseval frames for $H$. Then, $g(0) = 0$.

Proof Let $\{v_1, \ldots, v_d\}$ be any ONB for $H$. Then, as a consequence of the Parseval identity for ONBs, both $\{v_1, \ldots, v_d\}$ and $\{0, v_1, \ldots, v_d\}$ are Parseval frames for $H$. Because $g$ is a Gleason function of weight $W$ for the Parseval frames for $H$, we have
\[
\sum_{i=1}^d g(v_i) = W = g(0) + \sum_{i=1}^d g(v_i).
\]
Thus, $g(0) = 0$.
The following is a key lemma in obtaining information on the values of Gleason functions for Parseval frames.

Lemma 2 Let $H = \mathbb{K}^d$, and let $g$ be a Gleason function of weight $W$ for the finite Parseval frames for $H$. Let $\{\alpha_1, \ldots, \alpha_n\}$ be a finite sequence in $\mathbb{K}$ such that $\sum_{i=1}^n |\alpha_i|^2 = 1$. Then,
\[
\forall x \in B^d, \quad \sum_{i=1}^n g(\alpha_i x) = g(x).
\]
Proof If $x = 0$, then $\alpha_i x = \alpha_i \cdot 0 = 0$ for all $i$, and using Proposition 12, it follows that
\[
\sum_{i=1}^n g(\alpha_i x) = 0 = g(0),
\]
as claimed. Otherwise $x \in B^d \setminus \{0\}$, and set $x_1 = \beta x$ where $\beta = (1 - \|x\|^2)^{1/2} / \|x\|$. Choose an ONB $X_1 := \{u_1, \ldots, u_{d-1}\}$ for the orthogonal complement $Y^\perp$ of the closed linear span $Y$ of $x$. Then, the sequences $X := \{x\} \cup \{x_1\} \cup X_1$ and $X' := \{\alpha_1 x, \ldots, \alpha_n x\} \cup \{x_1\} \cup X_1$ are both Parseval frames for $H$. To verify this claim, begin by taking any $y \in H$. Let $v = x / \|x\|$, so that $\{v, u_1, \ldots, u_{d-1}\}$ is an ONB for $H$. Then, by Parseval’s identity for ONBs,
\[
\begin{aligned}
\sum_{u \in X} |\langle y, u \rangle|^2
&= |\langle y, x \rangle|^2 + \frac{1 - \|x\|^2}{\|x\|^2} |\langle y, x \rangle|^2 + \sum_{u \in X_1} |\langle y, u \rangle|^2 + |\langle y, v \rangle|^2 - |\langle y, v \rangle|^2 \\
&= |\langle y, x \rangle|^2 + \frac{1 - \|x\|^2}{\|x\|^2} |\langle y, x \rangle|^2 + \|y\|^2 - \frac{1}{\|x\|^2} |\langle y, x \rangle|^2 = \|y\|^2.
\end{aligned}
\]
Hence, $X$ is a Parseval frame for $H$. Because
\[
\sum_{i=1}^n |\langle y, \alpha_i x \rangle|^2 = \sum_{i=1}^n |\alpha_i|^2 |\langle y, x \rangle|^2 = |\langle y, x \rangle|^2,
\]
the above calculations imply that $X'$ is also a Parseval frame for $H$. Because $X$ and $X'$ are both Parseval frames for $H$ and $g$ is a Gleason function of weight $W$ for the finite Parseval frames for $H$, we have
\[
g(x) + g(x_1) + \sum_{u \in X_1} g(u) = W = \sum_{i=1}^n g(\alpha_i x) + g(x_1) + \sum_{u \in X_1} g(u).
\]
Canceling like terms gives the desired result.
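The two frames $X$ and $X'$ built in the proof of Lemma 2 can be written down explicitly in $\mathbb{R}^2$. The following Python sketch does so for one assumed choice of $x$ and of the coefficients $\alpha_i$, and spot-checks the Parseval identity on three probe vectors (which determine a quadratic form on $\mathbb{R}^2$); all names and data are illustrative.

```python
import math

def inner(u, v):
    """Real inner product on R^d."""
    return sum(a * b for a, b in zip(u, v))

def is_parseval(frame, probes, tol=1e-12):
    """Check sum_u |<y, u>|^2 = ||y||^2 on the given probe vectors y."""
    return all(abs(sum(inner(y, u) ** 2 for u in frame) - inner(y, y)) < tol
               for y in probes)

x = (0.3, 0.4)                                   # ||x|| = 0.5, inside the unit ball
norm_x = math.sqrt(inner(x, x))
beta = math.sqrt(1 - norm_x ** 2) / norm_x
x1 = tuple(beta * t for t in x)                  # x_1 = beta * x
u1 = (-x[1] / norm_x, x[0] / norm_x)             # unit vector orthogonal to x (d = 2)

alphas = (0.6, 0.8)                              # 0.6^2 + 0.8^2 = 1
X = [x, x1, u1]
X_prime = [tuple(a * t for t in x) for a in alphas] + [x1, u1]

probes = [(1.0, 0.0), (0.0, 1.0), (0.7, -1.3)]
print(is_parseval(X, probes), is_parseval(X_prime, probes))
```

Both checks should succeed, reflecting that replacing $x$ by the scaled copies $\alpha_1 x, \ldots, \alpha_n x$ leaves $\sum |\langle y, \cdot \rangle|^2$ unchanged.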
Lemma 3 Let $H = \mathbb{K}^d$, and let $g$ be a Gleason function of weight $W$ for the finite Parseval frames for $H$. Let $x \in B^d$, and let $q \in \mathbb{Q}$ be a non-negative rational number with the property that $\sqrt{q}\,x \in B^d$. Then, $g(\sqrt{q}\,x) = q \cdot g(x)$.

Proof First, observe that for any $z \in B^d$ and for any positive integer $P$, the sequence $\{\alpha_1, \ldots, \alpha_P\}$ defined by $\alpha_i := 1/\sqrt{P}$ for all $i$ satisfies the hypotheses of Lemma 2. Hence,
\[
\forall z \in B^d \ \text{and} \ \forall P \in \mathbb{N}, \quad g(z) = P\, g\!\left(\frac{z}{\sqrt{P}}\right). \tag{14}
\]
Now, let $x \in B^d$, and suppose first that $q = M/N \in \mathbb{Q} \cap [0, 1]$, where $M$ and $N$ are non-negative integers. Clearly $N \neq 0$, and if $M = 0$, the proof that $g(\sqrt{q}\,x) = q \cdot g(x)$ is immediate since $g(0) = 0$. Thus, we may assume $M \neq 0$. By (14), $g(x) = N g(x/\sqrt{N})$. For $y = \sqrt{M/N}\,x$, we have $y \in B^d$ because $\|y\| \leq \|x\|$, and (14) gives $g(y) = M g(y/\sqrt{M})$. Hence,
\[
\frac{M}{N}\, g(x) = M g\!\left(\frac{x}{\sqrt{N}}\right) = M g\!\left(\frac{y}{\sqrt{M}}\right) = g(y) = g\!\left(\sqrt{\frac{M}{N}}\, x\right).
\]
Otherwise, $q > 1$. Then, $1/q \in \mathbb{Q} \cap [0, 1]$, and the results of the preceding paragraph imply that
\[
g(x) = g\!\left(\frac{1}{\sqrt{q}}\, \sqrt{q}\,x\right) = \frac{1}{q}\, g(\sqrt{q}\,x).
\]
Multiplying both sides by $q$ yields the claim.
4.3 Gleason Functions and Quadratic Forms

We showed in Theorem 7 that a self-adjoint operator generates a Gleason function for Parseval frames that is a quadratic form. We have the following result, cf. Busch [23] and Caves et al. [29].

Theorem 8 Let $H = \mathbb{K}^d$, and let $g$ be a Gleason function of weight $W$ for the finite Parseval frames for $H$. Suppose $g$ is continuous on $B^d$. Then,
\[
\forall x \in B^d \ \text{and} \ \forall \alpha \in \mathbb{K} \ \text{with} \ |\alpha| \leq 1, \quad g(\alpha x) = |\alpha|^2 g(x).
\]

Proof Let $x \in B^d$, and let $\alpha \in \mathbb{K}$ satisfy $|\alpha| \leq 1$. If $\alpha = 0$, then we need to show $g(0) = 0$, but this has already been shown. Thus we may assume $\alpha \neq 0$. Let $\zeta = |\alpha|/\alpha$; then, $|\zeta| = 1$. By Lemma 2, $g(\alpha x) = g(\zeta \alpha x) = g(|\alpha| x)$. Hence, without loss of generality, we may take $\alpha \in (0, 1]$. Let $\{q_n\}_{n=1}^\infty$ be a sequence in $\mathbb{Q} \cap [0, 1]$ with $q_n \to \alpha$. Then, $q_n^2 \to \alpha^2$, so by continuity of $g$ and Lemma 3,
\[
g(\alpha x) = \lim_{n \to \infty} g(q_n x) = \lim_{n \to \infty} q_n^2\, g(x) = \alpha^2 g(x),
\]
as claimed.
Theorem 9 Let $H = \mathbb{K}^d$, and let $g$ be a Gleason function of weight $W$ for the finite Parseval frames for $H$. Suppose $g$ is non-negative. Then,
\[
\forall x \in B^d \ \text{and} \ \forall \alpha \in \mathbb{K} \ \text{with} \ |\alpha| \leq 1, \quad g(\alpha x) = |\alpha|^2 g(x).
\]

Proof Let $x \in B^d$ and $\alpha \in \mathbb{K}$ with $|\alpha| \leq 1$. As in the proof of Theorem 8, we may assume $\alpha \in (0, 1]$. Also, we may take $x \neq 0$, as if $x = 0$, then the claim is $g(0) = 0$, which has already been shown. Let $\theta \in [0, \pi/2]$. Then, by the Pythagorean theorem, the sequence $\{\cos\theta, \sin\theta\}$ satisfies the hypotheses of Lemma 2, so that $g(x) = g(\cos(\theta)x) + g(\sin(\theta)x)$. Consider the line segment $L_x := \{\beta x : \beta \in [0, 1]\}$ extending from the origin to $x$. Let $0 \leq \gamma < \beta \leq 1$, and set $\theta = \cos^{-1}(\gamma/\beta) \in [0, \pi/2]$. Set $y := \beta x$ and $z := \gamma x$, so that $y, z \in L_x$. Then,
\[
g(y) - g(z) = g(y) - g(\gamma x) = g(y) - g(\cos(\theta)y) = g(\sin(\theta)y) \geq 0.
\]
Therefore, $g$ is monotonically increasing from $g(0) = 0$ on $L_x$.

Now, let $\{p_n\}_{n=1}^\infty$ and $\{q_n\}_{n=1}^\infty$ be sequences in $\mathbb{Q} \cap [0, 1]$ with $\{p_n\}$ decreasing, $\{q_n\}$ increasing, and $\lim_{n \to \infty} p_n = \lim_{n \to \infty} q_n = \alpha^2$. Then, $p_n g(x) \to \alpha^2 g(x)$ and $q_n g(x) \to \alpha^2 g(x)$. Also, by monotonicity of $g$ and Lemma 3,
\[
q_n g(x) = g(\sqrt{q_n}\,x) \leq g(\alpha x) \leq g(\sqrt{p_n}\,x) = p_n g(x).
\]
Combining these claims gives the desired result. This is a standard technique, see Busch [23] and Caves et al. [29].

Remark 9 (Continuity on Rays) Let $g$ be a non-negative Gleason function for the Parseval frames for $\mathbb{K}^d$. In Theorem 9, we proved that along any ray beginning at the origin, $g$ is an increasing function beginning at $g(0) = 0$ and going out to the boundary of $B^d$.

It is immediate from the definition that if $g$ is a Gleason function for the Parseval frames for $\mathbb{K}^d$, then $g$ is a Gleason function for the ONBs for $\mathbb{K}^d$. We have also noted that if $g$ is defined by a self-adjoint linear operator, then $g$ is a Gleason function for the Parseval frames for $\mathbb{K}^d$ (Theorem 7). Using Gleason’s original theorem, we shall now prove various partial converses. We shall need the following result asserting that orthogonal projections of ONBs are Parseval frames. It is an elementary converse to Naimark’s theorem.

Proposition 13 Let $H = \mathbb{K}^d$, and let $G$ be a closed subspace of $H$. Write $P$ for the orthogonal projection of $H$ onto $G$. Let $\{x_j\}_{j=1}^N$ be a Parseval frame for $H$. Then, $\{P(x_j)\}_{j=1}^N$ is a Parseval frame for $G$. In particular, if $\{x_j\}_{j=1}^N$ is an ONB for $H$, then $\{P(x_j)\}_{j=1}^N$ is a Parseval frame for $G$.

Proof Let $y \in G$, so that $y = P(y)$. We verify the Parseval condition for $\{P(x_j)\}_{j=1}^N$ by direct calculation:
\[
\|y\|^2 = \sum_{j=1}^N |\langle y, x_j \rangle|^2 = \sum_{j=1}^N |\langle P(y), x_j \rangle|^2 = \sum_{j=1}^N |\langle y, P(x_j) \rangle|^2,
\]
where the last equality holds because orthogonal projections are self-adjoint.
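Proposition 13 can be illustrated by projecting an ONB of $\mathbb{R}^3$ onto the plane spanned by the first two standard basis vectors. The ONB below is an assumed example, and `project` and `frame_matrix` are hypothetical helper names; the sketch only checks that the frame operator of the projected vectors is the identity on $\mathbb{R}^2$.

```python
import math

# An orthonormal basis of R^3 (assumed example).
onb3 = [
    (1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)),
    (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0),
    (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)),
]

def project(v):
    """Orthogonal projection of R^3 onto the first two coordinates."""
    return (v[0], v[1])

frame2 = [project(e) for e in onb3]

def frame_matrix(frame):
    """Entries of sum_j x_j x_j^T; the identity means the frame is Parseval."""
    d = len(frame[0])
    return [[sum(x[r] * x[s] for x in frame) for s in range(d)] for r in range(d)]

S = frame_matrix(frame2)
norms = [math.hypot(*x) for x in frame2]
```

`S` should equal the $2 \times 2$ identity up to rounding, and every entry of `norms` should be at most 1, in line with Proposition 6e.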
Remark 10 (Naimark’s Theorem) For Naimark’s theorem generally, see, e.g., Naimark [61, 62], Paulsen [67] in terms of POVMs, and Czaja [32], cf. Chandler Davis [34]. A beautiful idea dealing with their dilation viewpoint on frames gave rise to Han and Larson’s theorem, see [50], Proposition 1.1. It is at once a special case of Naimark’s theorem, it has an elementary proof different from Naimark’s formulation, it generalizes significantly in terms of group representations, and it has broad applicability, e.g., [27, 28].

Theorem 10 Let $H = \mathbb{K}^d$. A non-negative function $g : B^d \to \mathbb{R}$ is a Gleason function of weight $W$ for the finite Parseval frames for $H$ if and only if there exists a self-adjoint and positive semi-definite operator $A : H \to H$ with trace $\operatorname{tr}(A) = W$ such that
\[
\forall x \in B^d, \quad g(x) = \langle A(x), x \rangle.
\]

Proof If such an $A$ exists, then $g$ is a Gleason function of weight $W$ for the finite Parseval frames for $\mathbb{K}^d$ and, in fact, for all Parseval frames for $H$, by Theorem 7.

For the converse, let $g$ be a Gleason function of weight $W$ for the finite Parseval frames for $H$. Let $N = \max(d, 3)$ and consider the Hilbert space $\mathbb{K}^N$. We can naturally identify $H$ with the closed subspace of $\mathbb{K}^N$ spanned by the first $d$ standard basis vectors. Let $P : \mathbb{K}^N \to H$ be the projection onto the first $d$ coordinates, and define $F : B^N \to \mathbb{R}$ by $F(x) := g(P(x))$ for all $x \in B^N$. For an arbitrary ONB $\{e_i\}_{i=1}^N$ for $\mathbb{K}^N$, $\{P(e_i)\}_{i=1}^N$ is a Parseval frame for $H$ by Proposition 13. Then, because $g$ is a Gleason function of weight $W$ for the finite Parseval frames for $H$, we have
\[
\sum_{i=1}^N F(e_i) = \sum_{i=1}^N g(P(e_i)) = W.
\]
It follows that $F$ is a Gleason function of weight $W$ for the ONBs for $\mathbb{K}^N$. By Gleason’s Theorem 2 (which applies to $F$ because $F$ is non-negative and $N \geq 3$), there exists a necessarily positive self-adjoint operator $B : \mathbb{K}^N \to \mathbb{K}^N$ such that $F(x) = \langle B(x), x \rangle$ for all $x \in S^{N-1}$. Set $A := PBP$. We claim $g(x) = \langle A(x), x \rangle$ for all $x \in B^d$. Let $x \in B^d$. If $x = 0$, then the claim is $g(0) = 0$, which has been proven already. Otherwise, $x \neq 0$, and $y := x/\|x\|$ is a unit norm vector. Since $y \in H$, $y = P(y)$. Using Theorem 9 and the fact that $P$ is self-adjoint, we obtain
\[
g(x) = g(\|x\|\, y) = \|x\|^2 g(y) = \|x\|^2 \langle B(y), y \rangle = \|x\|^2 \langle B(P(y)), P(y) \rangle = \|x\|^2 \langle A(y), y \rangle = \langle A(x), x \rangle,
\]
as claimed. Note that $A$ is a self-adjoint operator $H \to H$, since
\[
A^* = (PBP)^* = P^* B^* P^* = PBP = A.
\]
Thus, the spectral theorem gives an ONB $\{u_i\}_{i=1}^d$ for $H$ consisting of eigenvectors of $A$; say $A(u_i) = \lambda_i u_i$ for each $i$. Then, $\{u_i\}_{i=1}^d$ is a Parseval frame for $H$, so that
\[
W = \sum_{i=1}^d g(u_i) = \sum_{i=1}^d \langle A(u_i), u_i \rangle = \sum_{i=1}^d \lambda_i \|u_i\|^2 = \sum_{i=1}^d \lambda_i = \operatorname{tr}(A),
\]
completing the proof of the claim.
Theorem 10 extends to similar theorems about bounded Gleason functions, as we demonstrate.

Theorem 11 Let $H = \mathbb{K}^d$. A bounded, real-valued function $g : B^d \to \mathbb{R}$ is a Gleason function of weight $W$ for the finite Parseval frames for $H$ if and only if there exists a self-adjoint operator $A : H \to H$ with trace $\operatorname{tr}(A) = W$ such that
\[
\forall x \in B^d, \quad g(x) = \langle A(x), x \rangle.
\]

Proof For the “if” direction, see Theorem 7.

For the “only if” implication, let $\lambda := \sup_{x \in B^d} |g(x)|$. We first claim that $|g(x)| \leq \lambda \|x\|^2$. Suppose by contradiction that this is not the case; then, there exists $y \in B^d$ such that $|g(y)| > \lambda \|y\|^2$. In particular, there exists some $\varepsilon > 0$ such that $|g(y)| > (\lambda + \varepsilon) \|y\|^2$. By Proposition 12, we have $g(0) = 0$ and so $y \neq 0$. Since $\lambda/(\lambda + \varepsilon) < 1$, there exists a positive rational number $a$ satisfying
\[
\frac{\lambda}{(\lambda + \varepsilon)\|y\|^2} \leq a \leq \frac{1}{\|y\|^2}.
\]
Set $z = \sqrt{a}\,y$. Then, $\|z\|^2 = a\|y\|^2 \leq 1$ so that $z \in B^d$. Furthermore, by Lemma 3,
\[
|g(z)| = a|g(y)| > a(\lambda + \varepsilon)\|y\|^2 \geq (\lambda + \varepsilon)\, \frac{\lambda}{\lambda + \varepsilon} = \lambda.
\]
Hence, .|g(z)| > λ, contradicting the choice of .λ. Now, define an auxiliary function .f (x) = g(x) + λ x2 . Then, f is nonnegative, since for any .x ∈ B d , we have f (x) ≥ λ x2 − |g(x)| ≥ 0.
.
A Generalization of Gleason’s Frame Function for Quantum Measurement
495
Furthermore, f is a Gleason function of weight .W +λd for the finite Parseval frames for .H. By Theorem 10, there exists a positive semi-definite self-adjoint operator B such that .f (x) = B(x), x for all .x ∈ B d . Then, .g(x) = x, (B − λI )(x) , where .B − λI is self-adjoint. The theorem follows by setting .A := B − λI , noting that .tr(A) = tr(B − λI ) = W + λd − λd = W . It is now not difficult to extend this result to the complex case. Theorem 12 Let .H = Cd . A bounded function .g : H → C is a Gleason function of weight W for the finite Parseval frames for .H if and only if there exists a linear operator .A : H → H with trace .tr(A) = W such that ∀x ∈ B d ,
.
g(x) = A(x), x .
Proof For the “if” implication, see Remark 8a. For the “only if” implication, observe that the real and imaginary parts of g are themselves Gleason functions for the finite Parseval frames for .H. Thus, .g = u + iv where u and v are both √ real-valued Gleason functions for the finite Parseval frames for .H. Since .|g| = u2 + v 2 is bounded, so too are u and v, and Theorem 11 applies. Hence there exist self-adjoint operators B and C such that .u(x) = B(x), x
and .v(x) = C(x), x for all .x ∈ B d . Let .A = B + iC; then .g(x) = A(x), x for all .x ∈ B d . Since W is equal to the weight of u plus i times the weight of v, we have .tr(A) = tr(B) + itr(C) = W .
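The decomposition $A = B + iC$ used in the proof of Theorem 12 can be checked numerically. The following is a minimal sketch, assuming the standard formulas $B = (A + A^*)/2$ and $C = (A - A^*)/(2i)$ and an inner product that is linear in its first argument; the helper names and the sample matrix are illustrative choices, not taken from the text.

```python
# Illustrative helpers (hypothetical names): Cartesian decomposition A = B + iC
# of a square complex matrix into self-adjoint parts B and C.

def adjoint(M):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def cartesian_parts(A):
    """Return self-adjoint matrices B, C with A = B + iC."""
    n = len(A)
    As = adjoint(A)
    B = [[(A[i][j] + As[i][j]) / 2 for j in range(n)] for i in range(n)]
    C = [[(A[i][j] - As[i][j]) / 2j for j in range(n)] for i in range(n)]
    return B, C

def quad_form(M, x):
    """<M x, x> = sum_i (M x)_i * conj(x_i)."""
    n = len(x)
    Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Mx[i] * x[i].conjugate() for i in range(n))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Sample data (an assumption made only for this illustration).
A = [[1 + 2j, 3 - 1j], [0.5j, -2 + 1j]]
B, C = cartesian_parts(A)
x = [0.6, 0.8j]                      # a unit vector in C^2
u, v = quad_form(B, x), quad_form(C, x)
print(u, v)                          # real and imaginary parts of g(x)
print(quad_form(A, x), u + 1j * v)   # the two values should agree
print(trace(A), trace(B) + 1j * trace(C))
```

Here `u` and `v` play the roles of the real-valued functions $u(x)$ and $v(x)$ in the proof, and the last line mirrors the identity $\operatorname{tr}(A) = \operatorname{tr}(B) + i\operatorname{tr}(C)$.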
5 Gleason Functions of Degree N

5.1 Inclusion Theorem and a Problem

Let $\mathcal{P}_N$ be the set of Parseval frames for $\mathbb{H} = \mathbb{K}^d$, for which each $P \in \mathcal{P}_N$ has $N \ge d$ elements.

Definition 8 (Gleason Functions of Degree N) Let $\mathbb{H} = \mathbb{K}^d$. A function $g : B^d \to \mathbb{K}$, $B^d \subseteq \mathbb{H}$, is a Gleason function of degree $N$ and weight $W = W_{g,N} \in \mathbb{K}$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$ if
\[
\forall X = \{x_j\}_{j=1}^{N} \in \mathcal{P}_N, \quad \sum_{j=1}^{N} g(x_j) = W.
\]
Also, $\mathcal{G}_N$ designates the set of bounded Gleason functions of degree $N$ and any weight.

The proof of the following result is the same as that of Theorem 7.
Theorem 13 Let $\mathbb{H} = \mathbb{K}^d$, and let $A : \mathbb{H} \to \mathbb{H}$ be a self-adjoint operator. The function $g : B^d \to \mathbb{K}$, defined by the formula
\[
\forall x \in B^d, \quad g(x) = \langle A(x), x \rangle, \tag{15}
\]
is a Gleason function of degree $N$, for any $N \ge \dim \mathbb{H}$, and weight $W = \operatorname{tr}(A)$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$.

Example 5 (Gleason Functions for ONBs and Parseval Frames) Let $\mathbb{H} = \mathbb{K}^d$.

a. Clearly, the function $g : B^d \to \mathbb{R}$ defined by $g(x) = e^{\|x\|^2} - 1$ is constant on $S^{d-1}$ and therefore is a bounded Gleason function of weight $W = d(e - 1)$ for the ONBs for $\mathbb{H}$.

b. We shall show that $g$ is not a Gleason function of degree $N > d$ for the Parseval frames for $\mathbb{H}$. To this end, let $\{x_j\}_{j=1}^{N}$ be an equi-normed Parseval frame for $\mathbb{H}$, see Proposition 10 d as well as part c below, and let $\{y_j\}$ be an ONB for $\mathbb{H}$ with $N - d$ copies of the zero vector adjoined, so that it is also a Parseval frame for $\mathbb{H}$. Then, we have
\[
\sum_{j=1}^{N} g(x_j) = N e^{d/N} - N \ne d e^{1} + (N - d) e^{0} - N = \sum_{j=1}^{N} g(y_j),
\]
where the inequality is clear. The fact that the sums are unequal proves that $g$ is not a Gleason function of degree $N > d$ for the Parseval frames for $\mathbb{H}$.

c. The fact that there are equi-normed Parseval frames for $\mathbb{H}$ having $N > d$ elements fits into the theory of harmonic frames [4, 83], which itself is part of the group frame theory mentioned in Definition 4. For an explicit calculation to show the existence of equi-normed Parseval frames for $\mathbb{C}^d$, consider the $N \times N$ DFT matrix, let $c > 0$, and let $s : \{1, \ldots, d\} \to \{1, \ldots, N\}$ be strictly increasing. Consider the $N$ vectors $\{x_m\}_{m=1}^{N}$,
\[
x_m = c\,\big(e^{2\pi i m s(1)/N}, \ldots, e^{2\pi i m s(d)/N}\big) \in \mathbb{C}^d.
\]
For $z = (z_1, \ldots, z_d) \in \mathbb{C}^d$, we compute
\[
\begin{aligned}
\sum_{m=1}^{N} |\langle z, x_m \rangle|^2
&= c^2 \sum_{j,k=1}^{d} z_j \overline{z_k} \sum_{m=1}^{N} e^{2\pi i m (s(k) - s(j))/N} \\
&= c^2 \sum_{j \ne k} z_j \overline{z_k} \sum_{m=1}^{N} e^{2\pi i m (s(k) - s(j))/N} + c^2 \sum_{j=1}^{d} |z_j|^2 \Big( \sum_{m=1}^{N} 1 \Big) = N c^2 \|z\|^2.
\end{aligned}
\]
Thus, $\{x_m\}_{m=1}^{N}$ is a Parseval frame when $c = 1/\sqrt{N}$, and in this case we compute that $\|x_m\| = \sqrt{d/N}$ for each $m$. In particular, each $x_m \in B^d$ as asserted in Proposition 10 e.

Example 5 leads to the following problem.

Problem

a. Let $\mathbb{H} = \mathbb{K}^d$ and $N > d$. Note that $\mathcal{G}_N \subseteq \mathcal{G}_{N-1}$. To see this, let $g \in \mathcal{G}_N$ have weight $W_{g,N}$, and let $\{y_j\}_{j=1}^{N-1} \in \mathcal{P}_{N-1}$. Then, $\{x_j\}_{j=1}^{N} \in \mathcal{P}_N$, where $x_j = y_j$ for $1 \le j \le N - 1$ and $x_N = 0 \in \mathbb{H}$, since
\[
\forall x \in \mathbb{K}^d, \quad \|x\|^2 = \sum_{j=1}^{N-1} |\langle x, y_j \rangle|^2 = \sum_{j=1}^{N} |\langle x, x_j \rangle|^2.
\]
Thus,
\[
\sum_{j=1}^{N-1} g(y_j) = \sum_{j=1}^{N-1} g(y_j) + g(0) - g(0) = \sum_{j=1}^{N} g(x_j) - g(0) = W_{g,N} - g(0),
\]
i.e., $g \in \mathcal{G}_{N-1}$ with weight $W_{g,N-1} = W_{g,N} - g(0)$.

b. We also have for $N > d$ that $\mathcal{G}_N \subsetneq \mathcal{G}_d$ due to Example 5. Therefore,
\[
\forall N > d, \quad \mathcal{G}_{N+1} \subseteq \mathcal{G}_N \subseteq \cdots \subsetneq \mathcal{G}_d.
\]
The problem is to resolve if the inclusions are proper when $N > d$.

We shall prove Theorem 14, which, when combined with part a of the Problem, allows us to assert that
\[
\forall N \ge d + 2, \quad \mathcal{G}_{N+1} = \mathcal{G}_N. \tag{16}
\]
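The harmonic frame of Example 5 c and the unequal sums of Example 5 b can be verified numerically. The sketch below is only an illustration under stated assumptions: the values $N = 5$, $s = (1, 3)$, and the test vector `z` are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math

def harmonic_frame(N, s):
    """Vectors x_1..x_N in C^d (d = len(s)) built from DFT columns s, with c = 1/sqrt(N)."""
    c = 1 / math.sqrt(N)
    return [[c * cmath.exp(2j * math.pi * m * sj / N) for sj in s]
            for m in range(1, N + 1)]

def inner(z, x):
    """<z, x>, linear in the first argument."""
    return sum(zj * xj.conjugate() for zj, xj in zip(z, x))

def norm_sq(x):
    return sum(abs(xj) ** 2 for xj in x)

def frame_energy(frame, z):
    """sum_m |<z, x_m>|^2; equals ||z||^2 for a Parseval frame."""
    return sum(abs(inner(z, x)) ** 2 for x in frame)

def g(x):
    """The function g(x) = exp(||x||^2) - 1 of Example 5."""
    return math.exp(norm_sq(x)) - 1

N, s = 5, (1, 3)                      # assumed sample parameters, d = 2
d = len(s)
X = harmonic_frame(N, s)              # equi-normed Parseval frame
Y = [[1, 0], [0, 1]] + [[0, 0]] * (N - d)   # ONB padded with zero vectors
z = [1 + 2j, -0.5j]

print(frame_energy(X, z), frame_energy(Y, z), norm_sq(z))
print(sum(g(x) for x in X), N * math.exp(d / N) - N)
print(sum(g(y) for y in Y), d * (math.e - 1))
```

Both `X` and `Y` satisfy the Parseval identity for `z`, yet the two sums of `g` differ, which is the obstruction described in Example 5 b.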
Theorem 14 Let $\mathbb{H} = \mathbb{K}^d$, and assume $N \ge d + 2$. Then, every bounded Gleason function $g$ of degree $N$ and weight $W$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$ is also a Gleason function of degree $N + 1$ and weight $W + g(0)$ for the set $\mathcal{P}_{N+1}$ of Parseval frames for $\mathbb{H}$.

The case $N < d$ is not important because no Parseval frames with fewer than $d$ elements exist for $\mathbb{K}^d$. In the case $N = d$, there exist Gleason functions of degree $N$ but not of degree $N + 1$, as shown in Example 5. For $N = d + 1$ and $d = 1$, it is known that $\mathcal{G}_{N+1} \subsetneq \mathcal{G}_N$, see Example 6. It is unknown whether $\mathcal{G}_{d+2} \subsetneq \mathcal{G}_{d+1}$ for $d > 1$. The proof of Theorem 14 will be given in Sect. 5.2.
5.2 Proof of Theorem 14

As in the proof of Theorem 10, we first consider the behavior of the functions of interest along lines through the origin. We use results from this case, combined with Gleason’s theorem, to prove a weak version of Theorem 14. From there, we extend the result to the full Theorem 14. Specifically, besides the theory developed in Sect. 4, the flow chart for proving Theorem 14 is the following. Lemma 4 is proved using Proposition 14, and this lemma is used to prove Lemma 5. Both lemmas have the setting $\mathbb{H} = \mathbb{K}^1$. Lemma 5 is used in the proof of Theorem 15, along with Theorem 5, which is an extension of Gleason’s original theorem. Theorem 15 is the aforementioned weak version, and routine adjustments allow us to obtain Theorem 14.

Proposition 14 Let $\mathbb{H} = \mathbb{K}^1 = \mathbb{K}$, and let $X = \{x_i\}_{i=1}^{N} \subseteq \mathbb{H}$, $N \ge 1$. Then, $X$ is a Parseval frame for $\mathbb{H}$ if and only if
\[
\sum_{i=1}^{N} \|x_i\|^2 = 1. \tag{17}
\]

Proof
i. Let $e_1$ be the standard basis vector for $\mathbb{H}$ and so $\|e_1\| = 1$, and let $y, z \in \mathbb{H}$. Then, $y = y' e_1$ and $z = z' e_1$ for some $y', z' \in \mathbb{K}$. Thus, $\|y\| = |y'|$ and $\|z\| = |z'|$, and hence,
\[
|\langle y, z \rangle|^2 = |\langle y' e_1, z' e_1 \rangle|^2 = |y'|^2 |z'|^2 |\langle e_1, e_1 \rangle|^2 = |y'|^2 |z'|^2 = \|y\|^2 \|z\|^2. \tag{18}
\]

ii. Assume (17). Using (18), we verify the Parseval frame condition as follows. Let $x \in \mathbb{H}$ and calculate
\[
\sum_{i=1}^{N} |\langle x, x_i \rangle|^2 = \sum_{i=1}^{N} \|x\|^2 \|x_i\|^2 = \|x\|^2 \sum_{i=1}^{N} \|x_i\|^2 = \|x\|^2.
\]

iii. Assume $X$ is a Parseval frame for $\mathbb{H}$. Since $|\langle e_1, x_j \rangle|^2 = \|e_1\|^2 \|x_j\|^2 = \|x_j\|^2$ by (18), we have
\[
\sum_{j=1}^{N} |\langle e_1, x_j \rangle|^2 = \sum_{j=1}^{N} \|x_j\|^2,
\]
but by the Parseval assumption on $X$ the left side is $\|e_1\|^2 = 1$, and this gives (17).
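Proposition 14 reduces the Parseval condition in one dimension to a single scalar identity, which is easy to test. The following is a minimal sketch, with hypothetical function names and sample scalars chosen only for illustration; elements of $\mathbb{K}^1$ are represented as Python complex (or real) numbers.

```python
def is_parseval_1d(xs, tol=1e-12):
    """Condition (17): the squared moduli of the scalars sum to 1."""
    return abs(sum(abs(xi) ** 2 for xi in xs) - 1) < tol

def frame_energy_1d(xs, x):
    """sum_i |<x, x_i>|^2, where <x, x_i> = x * conj(x_i) in K^1."""
    return sum(abs(x * xi.conjugate()) ** 2 for xi in xs)

good = [0.6, 0.8j, 0]        # 0.36 + 0.64 + 0 = 1
bad = [0.6, 0.6]             # 0.36 + 0.36 = 0.72
x = 0.3 - 0.4j               # arbitrary test scalar, |x|^2 = 0.25

print(is_parseval_1d(good), frame_energy_1d(good, x), abs(x) ** 2)
print(is_parseval_1d(bad), frame_energy_1d(bad, x), abs(x) ** 2)
```

For `good` the frame energy matches $|x|^2$, while for `bad` it is scaled by the factor $0.72$, in line with identity (18).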
Using Proposition 14, we can establish the one-dimensional special case (where g(0) = 0) of Theorem 14. The proof of the one-dimensional case presented below does not depend crucially on the hypothesis .g(0) = 0, assuming, of course, that the necessary changes to the statement are made. We include the hypothesis .g(0) = 0 merely because it will be important in a future step of the proof of Theorem 14.
Lemma 4 Let $\mathbb{H} = \mathbb{K}^1 = \mathbb{K}$, and assume $g$ is a Gleason function of degree $N \ge 3$ and weight $W$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$. Furthermore, assume that $g(0) = 0$. Then, $g$ is a Gleason function of degree $N + 1$ and weight $W$ for the set $\mathcal{P}_{N+1}$ of Parseval frames for $\mathbb{H}$.

Proof Let $X = \{x_i\}_{i=1}^{N+1}$ be a Parseval frame for $\mathbb{H}$, and let $e_1$ be the standard basis vector for $\mathbb{H}$ and so $\|e_1\| = 1$. Since $\{x_i\}_{i=1}^{N+1}$ is a Parseval frame for $\mathbb{H}$, each $x_i$ has $\|x_i\|^2 \le 1$. Also, $1 - \|x_1\|^2 - \|x_2\|^2 \ge 0$ by Proposition 14. Furthermore, also by Proposition 14, we see that
\[
\Big\{ x_1,\; x_2,\; \sqrt{1 - \|x_1\|^2 - \|x_2\|^2} \cdot e_1 \Big\}
\]
is a 3-element Parseval frame for $\mathbb{H}$. Similarly,
\[
\Big\{ \sqrt{\|x_1\|^2 + \|x_2\|^2} \cdot e_1,\; \sqrt{1 - \|x_1\|^2 - \|x_2\|^2} \cdot e_1,\; 0 \Big\}
\]
is a 3-element Parseval frame for $\mathbb{H}$. By appending $N - 3$ copies of the 0-vector to these two sequences, we obtain $N$-element Parseval frames for $\mathbb{H}$. Because $g$ is a Gleason function of degree $N$ and weight $W$ for the Parseval frames for $\mathbb{H}$, we have
\[
g(x_1) + g(x_2) + g\Big( \sqrt{1 - \|x_1\|^2 - \|x_2\|^2} \cdot e_1 \Big) + (N - 3) g(0) = W
\]
and
\[
g\Big( \sqrt{\|x_1\|^2 + \|x_2\|^2} \cdot e_1 \Big) + g\Big( \sqrt{1 - \|x_1\|^2 - \|x_2\|^2} \cdot e_1 \Big) + (N - 2) g(0) = W,
\]
so that, since $g(0) = 0$,
\[
g(x_1) + g(x_2) = g\Big( \sqrt{\|x_1\|^2 + \|x_2\|^2} \cdot e_1 \Big). \tag{19}
\]
Also note that
\[
\Big\{ \sqrt{\|x_1\|^2 + \|x_2\|^2} \cdot e_1 \Big\} \cup \{x_i\}_{i=3}^{N+1}
\]
is an $N$-element Parseval frame for $\mathbb{H}$ by Proposition 14, since
\[
\Big\| \sqrt{\|x_1\|^2 + \|x_2\|^2} \cdot e_1 \Big\|^2 = \|x_1\|^2 + \|x_2\|^2.
\]
Thus, using (19), we obtain
\[
\sum_{i=1}^{N+1} g(x_i) = g\Big( \sqrt{\|x_1\|^2 + \|x_2\|^2} \cdot e_1 \Big) + \sum_{i=3}^{N+1} g(x_i) = W,
\]
where the second equality is a consequence of our assumption on $g$. Since $\{x_i\}_{i=1}^{N+1}$ was an arbitrary $(N + 1)$-element Parseval frame for $\mathbb{K}^1$, it follows that $g$ is a Gleason function of degree $N + 1$ and weight $W$ for the Parseval frames $\mathcal{P}_{N+1}$ for $\mathbb{H}$.
Example 6 (Lemma 4 and $N = 2$) Lemma 4 is false for $N = 2$ and counterexamples are not difficult to construct, as we now illustrate. Let $\mathbb{H} = \mathbb{K}^1$.

a. Choose an arbitrary $\varepsilon \in (0, 1/3)$. Define the function $g : B^1 \to \mathbb{K}$ by
\[
g(x) =
\begin{cases}
\|x\|^2, & \|x\|^2 \notin \{\varepsilon, 1 - \varepsilon\}, \\
1 - \varepsilon, & \|x\|^2 = \varepsilon, \\
\varepsilon, & \|x\|^2 = 1 - \varepsilon.
\end{cases}
\]
This $g$ is a Gleason function of degree 2 and weight 1 for the set $\mathcal{P}_2$ of Parseval frames for $\mathbb{H}$. To see this, suppose $\{x_1, x_2\}$ is a Parseval frame for $\mathbb{H}$. If either of $\|x_1\|^2$ or $\|x_2\|^2$ is one of the elements of the set $\{\varepsilon, 1 - \varepsilon\}$, then the other squared norm must be the other element of the set $\{\varepsilon, 1 - \varepsilon\}$ by Proposition 14. Thus, if $\|x_1\|^2 = \varepsilon$, then $g(x_1) = 1 - \varepsilon$ and $g(x_2) = \varepsilon$, yielding $g(x_1) + g(x_2) = 1$, which is the desired Gleason function property, with a similar calculation when $\|x_1\|^2 = 1 - \varepsilon$. If $\|x_1\|^2 \notin \{\varepsilon, 1 - \varepsilon\}$, then $\|x_2\|^2 \notin \{\varepsilon, 1 - \varepsilon\}$ by Proposition 14. Consider the parabola $h(x) := \langle I(x), x \rangle$ defined on $B^1$, noting that the trace of the identity mapping $I$ is 1. We have $h(x_1) + h(x_2) = 1$ by Theorem 7. On the other hand, $g$ coincides with $\langle x, x \rangle$ at $x_1$ and $x_2$, and hence $g(x_1) + g(x_2) = h(x_1) + h(x_2) = 1$, which again is the desired Gleason function property.

b. However, $g$ is not a Gleason function of degree $N > 2$ and any weight for the Parseval frames for $\mathbb{H}$. To see this, let $e_1$ be the standard basis vector for $\mathbb{H}$, so that $\|e_1\| = 1$. Given $N > 2$, one can construct an $N$-element Parseval frame $\{x_i\}_{i=1}^{N}$ for $\mathbb{H}$ by setting $x_1 = e_1$ and $x_2 = x_3 = \cdots = x_N = 0$. This is a Parseval frame for $\mathbb{H}$ by Proposition 14. We have
\[
\sum_{i=1}^{N} g(x_i) = g(x_1) + (N - 1) g(0) = 1.
\]
However, we can also construct an $N$-element Parseval frame $\{y_i\}_{i=1}^{N}$ for $\mathbb{H}$ by setting $y_1 = y_2 = \sqrt{\varepsilon} \cdot e_1$, $y_3 = \sqrt{1 - 2\varepsilon} \cdot e_1$, and $y_4 = y_5 = \cdots = y_N = 0$. This is a Parseval frame for $\mathbb{H}$ by Proposition 14. Observe that $1 - 2\varepsilon \notin \{\varepsilon, 1 - \varepsilon\}$ since $\varepsilon \in (0, 1/3)$. Hence,
\[
\sum_{i=1}^{N} g(y_i) = g(y_1) + g(y_2) + g(y_3) + (N - 3) g(0) = 2(1 - \varepsilon) + (1 - 2\varepsilon) + 0 = 3 - 4\varepsilon.
\]
Since $3 - 4\varepsilon > 5/3 > 1$, $g$ cannot be a Gleason function of degree $N > 2$ and any weight for the Parseval frames for $\mathbb{H}$.

The following lemma is elementary to prove given Lemma 4, and it is crucial for the next step in the proof of Theorem 14.

Lemma 5 Let $\mathbb{H} = \mathbb{K}^1$, and let $g : B^1 \to \mathbb{K}$ be a Gleason function of degree $N \ge 3$ and weight $W$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$. Assume that $g$ is bounded and $g(0) = 0$. Then,
\[
\forall \alpha \in \mathbb{K},\ |\alpha| \le 1, \text{ and } \forall x \in B^1, \quad g(\alpha x) = |\alpha|^2 g(x).
\]

Proof An induction based on Lemma 4 implies that $g$ is in fact a Gleason function of weight $W$ for all finite Parseval frames for $\mathbb{H}$. Then, Theorem 11 or Theorem 12, according to whether $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, respectively, implies that there is a linear operator $A : \mathbb{H} \to \mathbb{H}$ such that $g(x) = \langle A(x), x \rangle$ for all $x \in B^1$. Then, for $\alpha \in \mathbb{K}$ with $|\alpha| \le 1$,
\[
g(\alpha x) = \langle A(\alpha x), \alpha x \rangle = |\alpha|^2 \langle A(x), x \rangle = |\alpha|^2 g(x).
\]
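The counterexample of Example 6 can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions: since $g$ depends on $x$ only through $\|x\|^2$, and Proposition 14 characterizes Parseval frames for $\mathbb{K}^1$ by their squared norms, a frame is represented simply by its list of squared norms. The value $\varepsilon = 1/4$ and the function names are hypothetical choices.

```python
from fractions import Fraction

EPS = Fraction(1, 4)   # assumed sample value; any epsilon in (0, 1/3) would do

def g_sq(t, eps=EPS):
    """The function g of Example 6 a, written in terms of t = ||x||^2."""
    if t == eps:
        return 1 - eps
    if t == 1 - eps:
        return eps
    return t

def gleason_sum(sq_norms, eps=EPS):
    """Sum of g over a frame for K^1 given by its squared norms."""
    if sum(sq_norms) != 1:   # Parseval condition (17)
        raise ValueError("squared norms must sum to 1")
    return sum(g_sq(t, eps) for t in sq_norms)

# Degree 2: every pair (t, 1 - t) gives the same sum.
pairs = [Fraction(0), Fraction(1, 10), EPS, Fraction(1, 2), 1 - EPS, Fraction(1)]
print([gleason_sum([t, 1 - t]) for t in pairs])

# Degree 4: the two frames from Example 6 b give different sums.
print(gleason_sum([1, 0, 0, 0]))
print(gleason_sum([EPS, EPS, 1 - 2 * EPS, 0]), 3 - 4 * EPS)
```

Using `Fraction` avoids floating-point equality tests against $\varepsilon$ and $1 - \varepsilon$, which would otherwise make the case distinction in `g_sq` unreliable.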
We now prove the special case of Theorem 14, where $g(0) = 0$ and $g$ is bounded. The proof is similar to that of Theorem 10.

Theorem 15 Let $\mathbb{H} = \mathbb{K}^d$, and assume $N \ge d + 2$. Let $g$ be a Gleason function of degree $N$ and weight $W$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$. Assume that $g$ is bounded and that $g(0) = 0$. Then, $g$ is a Gleason function of weight $W$ for all the Parseval frames for $\mathbb{H}$.

Proof
i. Let $\mathbb{H}_1$ be a one-dimensional subspace of $\mathbb{H}$. Then, the restriction of $g$ to $B^d \cap \mathbb{H}_1$ is a bounded Gleason function of degree $N - d + 1 \ge 3$ and some weight $W_1$ for the set $\mathcal{P}_{N-d+1}$ of Parseval frames for $\mathbb{H}_1$.

To see this, let $\{y_i\}_{i=1}^{d-1}$ be an ONB for $(\mathbb{H}_1)^{\perp}$. If $\{x_j\}_{j=1}^{N-d+1}$ is a Parseval frame for $\mathbb{H}_1$, then $\{x_j\}_{j=1}^{N-d+1} \cup \{y_i\}_{i=1}^{d-1}$ is a Parseval frame for $\mathbb{H}$. In fact, letting $P_1$ denote the orthogonal projection onto $\mathbb{H}_1$ and $P_2$ denote the orthogonal projection onto $(\mathbb{H}_1)^{\perp}$, we have
\[
\begin{aligned}
\|x\|^2 = \langle x, x \rangle &= \langle P_1(x) + P_2(x), P_1(x) + P_2(x) \rangle \\
&= \|P_1(x)\|^2 + \langle P_1(x), P_2(x) \rangle + \langle P_2(x), P_1(x) \rangle + \|P_2(x)\|^2 \\
&= \|P_1(x)\|^2 + \|P_2(x)\|^2.
\end{aligned}
\]
Thus, we compute
\[
\begin{aligned}
\|x\|^2 = \|P_1(x)\|^2 + \|P_2(x)\|^2 &= \sum_{j=1}^{N-d+1} |\langle P_1(x), x_j \rangle|^2 + \sum_{i=1}^{d-1} |\langle P_2(x), y_i \rangle|^2 \\
&= \sum_{j=1}^{N-d+1} |\langle x, x_j \rangle|^2 + \sum_{i=1}^{d-1} |\langle x, y_i \rangle|^2
\end{aligned}
\]
because $\{x_j\}_{j=1}^{N-d+1}$ is a Parseval frame for $\mathbb{H}_1$, $\{y_i\}_{i=1}^{d-1}$ is an ONB for $(\mathbb{H}_1)^{\perp}$, and $P_1$ and $P_2$ are self-adjoint. By our assumption on $g$, we obtain
\[
\sum_{j=1}^{N-d+1} g(x_j) + \sum_{i=1}^{d-1} g(y_i) = W.
\]
Therefore, $\sum_j g(x_j) = W - \sum_i g(y_i)$. Setting $W_1 := W - \sum_i g(y_i)$, the claim follows.

ii. Now, let $x \in S^{d-1}$. Take $\mathbb{H}_1 = \operatorname{span}\{x\}$ in part i. Then, Lemma 5 gives
\[
g(\alpha x) = |\alpha|^2 g(x) \tag{20}
\]
for any $\alpha \in \mathbb{K}$ with $|\alpha| \le 1$.

iii. Next, view $\mathbb{H}$ as the subspace of the larger space $\mathbb{K}^N$ spanned by the first $d$ standard basis vectors. Let $P : \mathbb{K}^N \to \mathbb{H}$ be the projection onto the first $d$ coordinates, and let $F(x) = g(P(x))$ for all $x \in S^{N-1}$. If $\{e_i\}_{i=1}^{N}$ is any ONB for $\mathbb{K}^N$, then Proposition 13 implies that $\{P(e_i)\}_{i=1}^{N}$ is a Parseval frame for $\mathbb{H}$. Thus, we have
\[
\sum_{i=1}^{N} F(e_i) = \sum_{i=1}^{N} g(P(e_i)) = W.
\]
Since $\{e_i\}_{i=1}^{N}$ was an arbitrary ONB for $\mathbb{K}^N$, $F$ is a bounded Gleason function for the ONBs for $\mathbb{K}^N$. If $d = 0$, then the theorem holds trivially, so we may assume $d > 0$. Thus, $N \ge d + 2 \ge 3$, so that Theorem 5 gives a linear operator $A : \mathbb{H} \to \mathbb{H}$ such that $F(x) = \langle A(x), x \rangle$ for all $x \in S^{N-1}$. In particular, for $x \in S^{d-1}$, we have
\[
g(x) = g(P(x)) = F(x) = \langle A(x), x \rangle.
\]
For any $y \in B^d$, we have either $y = 0$ (in which case, $g(y) = 0 = \langle A(y), y \rangle$) or $0 < \|y\| \le 1$. In the latter case, by (20), we have
\[
g(y) = g\Big( \|y\| \cdot \frac{y}{\|y\|} \Big) = \|y\|^2\, g\Big( \frac{y}{\|y\|} \Big) = \|y\|^2 \Big\langle A\Big( \frac{y}{\|y\|} \Big), \frac{y}{\|y\|} \Big\rangle = \langle A(y), y \rangle.
\]
By Theorem 7 and Remark 8 b, we see that $g$ is a Gleason function for all the Parseval frames for $\mathbb{H}$ (not just the finite ones).

Once these facts have been established, the proof of Theorem 14 is straightforward.

Proof of Theorem 14 Observe that $f : B^d \to \mathbb{K}$, defined by $f(x) := g(x) - g(0)$, is a bounded Gleason function of degree $N$ and weight $W - N g(0)$ for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$. Indeed, if $\{x_j\}_{j=1}^{N}$ is a Parseval frame for $\mathbb{H}$, then
\[
\sum_{j=1}^{N} f(x_j) = \sum_{j=1}^{N} g(x_j) - N g(0) = W - N g(0).
\]
Furthermore, $f(0) = 0$. Hence, Theorem 15 implies that $f$ is a Gleason function of weight $W - N g(0)$ for the finite Parseval frames for $\mathbb{H}$. In particular, for any $(N + 1)$-element Parseval frame $\{y_i\}_{i=1}^{N+1}$ for $\mathbb{H}$, we have
\[
\sum_{i=1}^{N+1} g(y_i) = \sum_{i=1}^{N+1} f(y_i) + (N + 1) g(0) = W - N g(0) + (N + 1) g(0) = W + g(0).
\]
Thus, $g$ is a Gleason function of degree $N + 1$ and weight $W + g(0)$ for the set $\mathcal{P}_{N+1}$ of Parseval frames for $\mathbb{H}$.
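The weight shift $W \mapsto W + g(0)$ in Theorem 14 can be observed on a concrete function. The sketch below is only a sanity check under stated assumptions, not a proof: it takes $\mathbb{H} = \mathbb{R}^2$, a function of the form $g(x) = \langle A(x), x \rangle + g(0)$ with a sample symmetric matrix `A`, and two families of Parseval frames (equally spaced vectors of norm $\sqrt{2/n}$, and an ONB padded with zero vectors). All names and numerical values are hypothetical.

```python
import math

# Assumed sample data: a symmetric matrix A on R^2 and the constant g(0).
A = ((2.0, 1.0), (1.0, -1.0))
C0 = 0.5

def g(x):
    """g(x) = <A x, x> + g(0), a bounded function on the unit ball of R^2."""
    ax0 = A[0][0] * x[0] + A[0][1] * x[1]
    ax1 = A[1][0] * x[0] + A[1][1] * x[1]
    return ax0 * x[0] + ax1 * x[1] + C0

def harmonic_frame_r2(n):
    """n >= 3 equally spaced vectors of norm sqrt(2/n): a Parseval frame for R^2."""
    r = math.sqrt(2.0 / n)
    return [(r * math.cos(2 * math.pi * m / n), r * math.sin(2 * math.pi * m / n))
            for m in range(n)]

def padded_onb(n):
    """The standard ONB of R^2 followed by n - 2 zero vectors."""
    return [(1.0, 0.0), (0.0, 1.0)] + [(0.0, 0.0)] * (n - 2)

def total(frame):
    return sum(g(x) for x in frame)

N = 4                                # N = d + 2 with d = 2
W = A[0][0] + A[1][1] + N * C0       # tr(A) + N g(0)

for frame in (harmonic_frame_r2(N), padded_onb(N)):
    print(round(total(frame), 12), W)
for frame in (harmonic_frame_r2(N + 1), padded_onb(N + 1)):
    print(round(total(frame), 12), W + g((0.0, 0.0)))
```

On the $N$-element frames the sums equal $W = \operatorname{tr}(A) + N g(0)$, and on the $(N + 1)$-element frames they equal $W + g(0)$, as the theorem predicts.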
6 An Application of Gleason Functions

Theorem 14 has an application in quantum measurement with regard to the theory developed by Busch in [23]. To see this, let us begin with Definition 9 taken from [23]. The set $\mathcal{E}(\mathbb{H})$ of operators on $\mathbb{H}$ was defined in Definition 6.

Definition 9 A generalized probability measure on $\mathcal{E}(\mathbb{H})$ is a function, $v : \mathcal{E}(\mathbb{H}) \to \mathbb{R}$, with the following properties:
1. $0 \le v(E) \le 1$ for all $E \in \mathcal{E}(\mathbb{H})$.
2. $v(I) = 1$.
3. $v\big( \sum_{j \in J} E_j \big) = \sum_{j \in J} v(E_j)$ for all countable indexed families $\{E_j\}_{j \in J} \subset \mathcal{E}(\mathbb{H})$ for which $\sum_{j \in J} E_j \in \mathcal{E}(\mathbb{H})$.
Busch characterized the generalized probability measures on Hilbert spaces of the type encountered in quantum mechanics as follows.

Theorem 16 (Busch) Let $\mathbb{H}$ be a separable complex Hilbert space, and let $v$ be a generalized probability measure on $\mathcal{E}(\mathbb{H})$. Then, there exists a density operator $\rho$ on $\mathbb{H}$ such that $v(E) = \operatorname{tr}(\rho E)$ for all $E \in \mathcal{E}(\mathbb{H})$. (Recall that a density operator is a positive semi-definite trace class operator with trace 1.)

Remark 11
a. From the perspective of quantum mechanics, elements of $\mathcal{E}(\mathbb{H})$ can be interpreted as physical effects, while generalized probability measures on $\mathcal{E}(\mathbb{H})$ can be interpreted as physical states. Hence, Theorem 16 asserts that states can be represented by density operators in a similar fashion to the result of Theorem 4. Some comparison of these theorems is in order. Gleason’s Theorem 4 is concerned with measures on the closed subspaces of a Hilbert space, whereas Busch’s Theorem 16 is concerned with measures on the effects of a Hilbert space, cf. Caves et al. [29]. Both admit similar physical interpretations. Busch’s theorem is valid when the Hilbert space $\mathbb{H}$ has dimension 2.

b. Busch’s theorem is striking, useful, and weaker than Gleason’s theorem. It is essentially Gleason’s theorem for POVMs, and it is weaker since $v$ is defined on a much larger space of operators than in Gleason’s setting.

We shall use Theorem 14 to prove that if $\mathbb{H} = \mathbb{K}^d$, $d \ge 2$, then condition (3) in Definition 9 can be replaced by the seemingly weaker condition,

(3′) There exists $N \ge \dim \mathbb{H} + 2$ such that $\sum_{i=1}^{N} v(E_i) = 1$ whenever $\{E_i\}_{i=1}^{N}$ is an $N$-element POVM on $\mathbb{H}$.

This is made precise by the following theorem:

Theorem 17 Let $\mathbb{H} = \mathbb{K}^d$, $d \ge 2$. Suppose $v : \mathcal{E}(\mathbb{H}) \to \mathbb{R}$ is a non-negative function for which $v(I) = 1$. Furthermore, assume that there exists $N \ge d + 2$ such that $\sum_{i=1}^{N} v(E_i) = 1$ whenever $\{E_i\}_{i=1}^{N}$ is an $N$-element POVM on $\mathbb{H}$. Then, $v$ is a generalized probability measure on $\mathcal{E}(\mathbb{H})$.

Proof
i. Define a function $g_v$ on the closed unit ball $B^d$ of $\mathbb{H}$ by $g_v(x) = v(x \otimes x^*)$, recalling that the tensor product $x \otimes x^* : \mathbb{H} \times \mathbb{H} \to \mathcal{L}(\mathbb{H})$ is the outer product $x x^*$. We claim that $g_v$ is a Gleason function of degree $N$ and weight 1 for all of the Parseval frames for $\mathbb{H}$. Clearly, $g_v$ is non-negative by its definition and the hypothesis on $v$. If $\{x_i\}_{i=1}^{N}$ is an $N$-element Parseval frame for $\mathbb{H}$, then $\{x_i \otimes x_i^*\}_{i=1}^{N}$ is a POVM on $\mathbb{H}$ by Proposition 9. Therefore, by our POVM assumption, we have
\[
\sum_{i=1}^{N} g_v(x_i) = \sum_{i=1}^{N} v(x_i \otimes x_i^*) = 1
\]
and so $g_v$ is a Gleason function of degree $N$ and weight 1 for the set $\mathcal{P}_N$ of Parseval frames for $\mathbb{H}$. Also, note that $\{I, 0, \ldots, 0\}$, where there are $N - 1$ copies of the 0-operator, is a POVM on $\mathbb{H}$ and so, by our POVM assumption again, we have
\[
v(I) + (N - 1) v(0) = 1 + (N - 1) v(0) = 1.
\]
Thus, since $N \ge d + 2 > 1$, we obtain $v(0) = 0$ for the 0-operator in the domain of $v$. As a result, we see that $g_v(0) = 0$ for $0 \in B^d$ in the domain of $g_v$. From Theorem 14, it follows that $g_v$ is a Gleason function of degree $N + 1$ and weight 1 for the set $\mathcal{P}_{N+1}$ of Parseval frames for $\mathbb{H}$. A straightforward induction argument shows that $g_v$ is therefore a Gleason function of weight 1 for all finite Parseval frames for $\mathbb{H}$. Theorem 10 or Theorems 11 or 12 apply to prove that $g_v$ is a quadratic form on $B^d$. From there, Theorem 7 and Remark 8 b imply that $g_v$ is a Gleason function for all the Parseval frames for $\mathbb{H}$ (not just the finite frames) as claimed. We shall use this result in part iii.

ii. We shall show that if $E = \sum_{i=1}^{d} x_i \otimes x_i^*$, then $v(E) = \sum_{i=1}^{d} g_v(x_i)$. For this, note that both $\{E, I - E\}$ and $\{x_1 \otimes x_1^*, \ldots, x_d \otimes x_d^*, I - E\}$ are POVMs on $\mathbb{H}$. Appending copies of 0 to these POVMs until both have $N$ elements and applying the hypothesized condition on $v$, we obtain the equation
\[
v(E) + v(I - E) + (N - 2) v(0) = 1 = \sum_{i=1}^{d} v(x_i \otimes x_i^*) + v(I - E) + (N - d - 1) v(0).
\]
Using $v(0) = 0$ and canceling the $v(I - E)$ term show that
\[
v(E) = \sum_{i=1}^{d} v(x_i \otimes x_i^*) = \sum_{i=1}^{d} g_v(x_i),
\]
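as asserted.

iii. Now, let $\{E_j\}_{j \in J} \subseteq \mathcal{E}(\mathbb{H})$ be a countable sequence such that $\sum_{j \in J} E_j \in \mathcal{E}(\mathbb{H})$. Invoke the spectral theorem as in Proposition 10 to write $E_j = \sum_{i=1}^{d} x_{ij} \otimes x_{ij}^*$ for each $j \in J$, $\sum_{j \in J} E_j = \sum_{i=1}^{d} y_i \otimes y_i^*$, and $I - \sum_{j \in J} E_j = \sum_{i=1}^{d} z_i \otimes z_i^*$ for some collections of vectors $x_{ij}, y_i, z_i \in \mathbb{H}$. Hence, we have the two equations
\[
\begin{aligned}
\sum_{i=1}^{d} z_i \otimes z_i^* + \sum_{i=1}^{d} y_i \otimes y_i^* &= \Big( I - \sum_{j \in J} E_j \Big) + \sum_{j \in J} E_j = I, \\
\sum_{i=1}^{d} z_i \otimes z_i^* + \sum_{j \in J} \sum_{i=1}^{d} x_{ij} \otimes x_{ij}^* &= \Big( I - \sum_{j \in J} E_j \Big) + \sum_{j \in J} E_j = I.
\end{aligned}
\]
Since $(u \otimes u^*)(x) = \langle x, u \rangle u$ (Definition 7), we can apply these operators on the left side of both equations to any $x \in \mathbb{H}$ and then take the inner product with $x$, to assert that $\{y_i\}_{i=1}^{d} \cup \{z_i\}_{i=1}^{d}$ and $\{x_{ij}\}_{j \in J,\, i = 1, \ldots, d} \cup \{z_i\}_{i=1}^{d}$ are both Parseval frames for $\mathbb{H}$. For example,
\[
\begin{aligned}
\|x\|^2 = |\langle I(x), x \rangle|
&= \Big| \Big\langle \sum_{i=1}^{d} (z_i \otimes z_i^*) x + \sum_{i=1}^{d} (y_i \otimes y_i^*) x,\; x \Big\rangle \Big| \\
&= \Big| \Big\langle \sum_{i=1}^{d} \langle x, z_i \rangle z_i + \sum_{i=1}^{d} \langle x, y_i \rangle y_i,\; x \Big\rangle \Big|
= \Big| \sum_{i=1}^{d} |\langle x, z_i \rangle|^2 + \sum_{i=1}^{d} |\langle x, y_i \rangle|^2 \Big| \\
&= \sum_{i=1}^{d} |\langle x, z_i \rangle|^2 + \sum_{i=1}^{d} |\langle x, y_i \rangle|^2.
\end{aligned}
\]
Thus,
\[
\sum_{i=1}^{d} g_v(z_i) + \sum_{i=1}^{d} g_v(y_i) = 1 = \sum_{i=1}^{d} g_v(z_i) + \sum_{j \in J} \sum_{i=1}^{d} g_v(x_{ij}),
\]
so that
\[
\sum_{i=1}^{d} g_v(y_i) = \sum_{j \in J} \sum_{i=1}^{d} g_v(x_{ij}).
\]
Consequently, we obtain
\[
v\Big( \sum_{j \in J} E_j \Big) = \sum_{i=1}^{d} g_v(y_i) = \sum_{j \in J} \sum_{i=1}^{d} g_v(x_{ij}) = \sum_{j \in J} v(E_j), \tag{21}
\]
where the last equality follows from part ii. Equation (21) is the desired countable additivity condition of Definition 9.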
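The link between Parseval frames, rank-one POVMs, and the function $g_v$ can be illustrated in a small real example. The sketch below assumes $\mathbb{H} = \mathbb{R}^2$, uses the three-vector Mercedes-Benz frame as a sample Parseval frame, and takes $v(E) = \operatorname{tr}(\rho E)$ for a sample density matrix `RHO`, in the spirit of Theorem 16. The matrix entries and all function names are hypothetical.

```python
import math

# Assumed sample density matrix: symmetric, positive semi-definite, trace 1.
RHO = ((0.7, 0.1), (0.1, 0.3))

def mercedes_benz():
    """Three equally spaced vectors of norm sqrt(2/3): a Parseval frame for R^2."""
    r = math.sqrt(2.0 / 3.0)
    return [(r * math.cos(2 * math.pi * k / 3), r * math.sin(2 * math.pi * k / 3))
            for k in range(3)]

def outer(x):
    """The rank-one operator x (x) x^*, i.e. the outer product x x^T in the real case."""
    return [[x[i] * x[j] for j in range(2)] for i in range(2)]

def mat_sum(mats):
    return [[sum(M[i][j] for M in mats) for j in range(2)] for i in range(2)]

def v(E):
    """v(E) = tr(RHO E)."""
    return sum(RHO[i][j] * E[j][i] for i in range(2) for j in range(2))

def g_v(x):
    """g_v(x) = v(x (x) x^*), as in part i of the proof of Theorem 17."""
    return v(outer(x))

frame = mercedes_benz()
effects = [outer(x) for x in frame]

print(mat_sum(effects))                    # should be close to the identity
print(sum(g_v(x) for x in frame))          # should be close to 1
print(v(mat_sum(effects[:2])), v(effects[0]) + v(effects[1]))  # additivity
```

The first line reflects the fact that a Parseval frame yields a POVM of rank-one effects, and the second that $g_v$ then has weight $v(I) = 1$.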
Appendix

a. If $N > d^2$, then there is no ETF for $\mathbb{C}^d$ consisting of $N$ elements; and these values of $N$ can be viewed as a natural regime for the Grassmannian frames defined in part c. Furthermore, if $N < d^2$, then there are known cases for which there are no ETFs, e.g., $d = 3$, $N = 8$ [78]. Determining compatible values of $d, N$ for which there are ETFs is a subtle, unresolved, and highly motivated problem, see, e.g., [38–40, 83].
b. (ETFs and the Welch bound) The coherence or maximum correlation $\mu(X)$ of a set $X = \{x_j\}_{j=1}^{N} \subseteq \mathbb{C}^d$ of unit norm elements is defined as
\[
\mu(X) = \max_{j \ne k} |\langle x_j, x_k \rangle|. \tag{22}
\]
Welch [85] proved the fundamental inequality,
\[
\mu(X) \ge \sqrt{\frac{N - d}{d(N - 1)}}, \tag{23}
\]
which itself is important in understanding the behavior of the narrow band ambiguity function, see [8, 51] and part e. The right side of the inequality (23) is the Welch bound, cf. Proposition 6 f. In the case that $X$ is a FUNTF for $\mathbb{C}^d$, then equality holds in (23) if and only if $X$ is an ETF with constant $\alpha = \sqrt{\frac{N - d}{d(N - 1)}}$, see [77], Theorem 2.3, as well as [17], Theorem IV.2 (Theorem 3) for a modest but useful generalization. Because of the importance of Gabor frames in this topic, we note that if $N = d^2$, then $\alpha = \frac{1}{\sqrt{d + 1}}$.

c. (Grassmannian frames) If an ETF does not exist for a given $N \ge d + 2$, then a reasonable substitute is to consider $(N, d)$-Grassmannian frames. Let $X = \{x_j\}_{j=1}^{N} \subseteq \mathbb{C}^d$ be a set of unit norm elements. $X$ is an $(N, d)$-Grassmannian frame for $\mathbb{C}^d$ if it is a FUNTF and if
\[
\mu(X) = \inf \mu(Y),
\]
where the infimum is taken over all FUNTFs $Y$ for $\mathbb{C}^d$ consisting of $N$ elements. A compactness argument shows that $(N, d)$-Grassmannian frames exist, see [17], Appendix. Also, ETFs are a subclass of Grassmannian frames, see [21, 82]. Furthermore, as noted in [77], Grassmannian frames have significant applicability, including spherical codes and designs, packet-based communication systems such as the Internet, and geometrically uniform codes in information theory, and these last are essentially group frames [41], cf. Definition 4 c and [22]. One of the major mathematical challenges is to construct Grassmannian frames, see [17, 83].

d. (Zauner’s conjecture) Zauner’s conjecture is that for any dimension $d \ge 1$ there is a FUNTF $X = \{x_j : j = 1, \ldots, d^2\}$ for $\mathbb{C}^d$ such that
\[
\forall j \ne k, \quad |\langle x_j, x_k \rangle| = \frac{1}{\sqrt{d + 1}}.
\]
The problem can be restated by asking if for each $d \ge 1$, there are $(d^2, d)$-Grassmannian frames that achieve equality with the Welch bound. This is an open problem in quantum information theory, and the conjecture by Zauner [86] was motivated by issues dealing with quantum measurement, cf. [72].
There are solutions for some values of $d$, and the solutions are referred to as symmetric, informationally complete, positive operator-valued measures (SIC-POVMs). POVMs were introduced in Sect. 3.2. They not only arise in quantum measurement and detection, e.g., see [16], Definition A.1, but also draw on issues dealing with coherent states [3]. A major recent contribution to Zauner’s conjecture is [5].

Zauner’s conjecture is also related to frame potential energy in the following way. In [12], FUNTFs were characterized as the minimizers of the $\ell^2$-frame potential energy functional motivated by Coulomb’s law. The $\ell^p$-version, merely defined in [17], was developed by Ehler and Okoudjou, see [36, 37]. The main theorem in [12] proves the existence of so-called Welch bound equality (WBE) sequences used for code-division multiple-access (CDMA) systems in communications, see [60, 82]. In fact, the essential inequality asserted in the WBE setting of Massey and Mittelholzer [60] is an $\ell^2$-version of the $\ell^\infty$ inequality (23); and the relevant equations in [60] are (3.4)–(3.6). With this backdrop, there is a compelling case relating solutions of Zauner’s conjecture, as well as Grassmannians, in terms of minimizers of all $\ell^p$-frame potentials, see [64].

e. (CAZAC sequences) For any function $u : \mathbb{Z}/d\mathbb{Z} \to \mathbb{C}$, we can define a Gabor FUNTF $U = \{u_j : j = 1, \ldots, d^2\}$, where each $u_j$ consists of translates and modulations of $u$, e.g., see [70]. The discrete periodic ambiguity function $A(u)$ of $u$ is defined by the formula
\[
\forall (m, n) \in \mathbb{Z}/d\mathbb{Z} \times \mathbb{Z}/d\mathbb{Z}, \quad A(u)(m, n) = \frac{1}{d} \sum_{k=0}^{d-1} u(m + k)\, \overline{u(k)}\, e^{-2\pi i k n/d}.
\]
The function $u$ is a constant amplitude 0-autocorrelation (CAZAC) sequence if
\[
\forall m \in \mathbb{Z}/d\mathbb{Z}, \quad |u(m)| = 1, \tag{CA}
\]
and
\[
\forall m \in \mathbb{Z}/d\mathbb{Z} \setminus \{0\}, \quad \frac{1}{d} \sum_{k=0}^{d-1} u(m + k)\, \overline{u(k)} = 0. \tag{ZAC}
\]
A recent survey on the theory and applicability of CAZAC sequences is [9]. The construction of all CAZAC sequences remains a tantalizing and applicable venture. A fundamental fact is the following theorem [8], Theorem 3.8. Let $d = p$ be prime. There are explicit CAZAC sequences $u : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ (due to Björck) with the property that if $(m, n) \in (\mathbb{Z}/p\mathbb{Z} \times \mathbb{Z}/p\mathbb{Z}) \setminus \{(0, 0)\}$, then
\[
|A(u)(m, n)| \le \frac{2}{\sqrt{p}} +
\begin{cases}
\dfrac{4}{p} & \text{if } p \equiv 1 \pmod 4, \\[1ex]
\dfrac{4}{p^{3/2}} & \text{if } p \equiv 3 \pmod 4.
\end{cases}
\]
In particular, $|A(u)(m, n)| \le 3/\sqrt{p}$. This implies that the coherence $\mu(U)$ of $U$ satisfies the inequalities
\[
\frac{1}{\sqrt{p + 1}} \le \mu(U) \le \frac{3}{\sqrt{p}}, \tag{24}
\]
even though $|A(u)(m, n)|$ can have significantly smaller values than $3/\sqrt{p}$ for various $(m, n)$. This latter property hints at the deeper applicability of CAZAC sequences such as the Björck sequence.

Because of the 0-autocorrelation property, CAZAC sequences are the opposite of what candidates for Zauner’s conjecture should be. On the other hand, the inequality (24) gives perspective with regard to Zauner’s conjecture. Furthermore, these CAZAC sequences are an essential component of the background and goals dealing with phase-coded waveforms that were the driving force leading to the role of group frames in the vector-valued theory of [4].

Remark 12 (Uncertainty Principle)
a. It is relevant to understand weighted extensions of Heisenberg’s uncertainty principle in the context of a Gleason theorem for Parseval frames, just as Gleason’s original theorem in the context of ONBs was driven by the Birkhoff and von Neumann remark in Sect. 1.1. These extensions of Heisenberg’s uncertainty principle are both physically motivated and use many techniques from harmonic analysis, see, e.g., [14], [15], [11].

b. Because of the role of the uncertainty principle in quantum mechanics and the technical role of graph theory in Schrödinger eigenmap methods for nonlinear dimension reduction techniques, it is natural to continue the development of graph theoretic uncertainty principles [1, 18, 31, 46, 49, 52, 55, 74, 75, 80, 81].

Acknowledgments We are grateful to Dr. Rad Balu of the Army Research Labs, Adelphi (ARL) for telling us about Busch’s work on Gleason’s theorem and for arranging a post-doctoral position for Dr. Koprowski at ARL. The first named author gratefully acknowledges the support of ARO Grant W911NF-17-1-0014 and NSF-DMS Grant 18-14253. The second named author gratefully acknowledges the support of the Norbert Wiener Center and ARL. The third named author gratefully acknowledges the support of the Norbert Wiener Center as a Daniel Sweet Undergraduate Research Fellow. We would all like to thank Professor Robert Benedetto of Amherst College for several key algebraic insights. Finally, the first named author had the unbelievable privilege of having both Professors Gleason and Mackey as instructors during the period 1960–1962 for real/functional analysis and for Schwartz’ distribution theory, respectively.
References

1. Agaskar, A., Lu, Y.M.: A spectral graph uncertainty principle. IEEE Transactions on Information Theory 59(7), 4338–4356 (2013)
2. Ali, S.T., Antoine, J.P., Gazeau, J.P.: Continuous frames in Hilbert space. Ann. Phys. (NY) 222, 1–37 (1993)
3. Ali, S.T., Antoine, J.P., Gazeau, J.P.: Coherent States, Wavelets and their Generalizations. Springer-Verlag, New York (2000)
4. Andrews, T., Benedetto, J.J., Donatelli, J.J.: Frame multiplication theory and a vector-valued DFT and ambiguity function. J. Fourier Analysis and Appl. 25(4), 1795–1854 (2019)
5. Appleby, M., Chien, T.Y., Flammia, S., Waldron, S.: Constructing exact symmetric informationally complete measurements from numerical solutions. J. of Physics A-Mathematical and Theoretical 51(16), 40 pages (2018)
6. Benedetto, J.J.: Irregular sampling and frames. In: C.K. Chui (ed.) Wavelets: a Tutorial in Theory and Applications, pp. 445–507. Academic Press Inc., San Diego, CA, USA (1992)
7. Benedetto, J.J.: Frame decompositions, sampling, and uncertainty principle inequalities. In: J.J. Benedetto, M.W. Frazier (eds.) Wavelets: Mathematics and Applications, pp. 247–304. CRC Press, Boca Raton, FL (1994)
8. Benedetto, J.J., Benedetto, R.L., Woodworth, J.T.: Optimal ambiguity functions and Weil’s exponential sum bound. Journal of Fourier Analysis and Applications 18(3), 471–487 (2012)
9. Benedetto, J.J., Cordwell, K., Magsino, M.: CAZAC sequences and Haagerup’s characterization of cyclic N-roots. In: C. Cabrelli, U. Molter (eds.) New Trends in Applied Harmonic Analysis, Volume II: Harmonic Analysis, Geometric Measure Theory, and Applications. Springer-Birkhäuser, New York (2019). Invited chapter
10. Benedetto, J.J., Czaja, W.: Integration and Modern Analysis. Birkhäuser Advanced Texts. Springer-Birkhäuser, New York (2009)
11. Benedetto, J.J., Dellatorre, M.: Uncertainty principles and weighted norm inequalities. Amer. Math. Soc. Contemporary Mathematics, M. Cwikel and M. Milman, editors 693, 55–78 (2017)
12. Benedetto, J.J., Fickus, M.: Finite normalized tight frames. Adv. Comp. Math. 18(2–4), 357–385 (2003)
13. Benedetto, J.J., Frazier, M. (eds.): Wavelets: Mathematics and Applications. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL (1994)
14. Benedetto, J.J., Heinig, H.P.: Fourier transform inequalities with measure weights. Advances in Mathematics 96(2), 194–225 (1992)
15. Benedetto, J.J., Heinig, H.P.: Weighted Fourier inequalities: new proofs and generalizations. J. of Fourier Anal. Appl. 9(1), 1–37 (2003)
16. Benedetto, J.J., Kebo, A.: The role of frame force in quantum detection. J. Fourier Analysis and Applications 14, 443–474 (2008)
17. Benedetto, J.J., Kolesar, J.D.: Geometric properties of Grassmannian frames in $\mathbb{R}^2$ and $\mathbb{R}^3$. EURASIP Journal on Applied Signal Processing (2006)
18. Benedetto, J.J., Koprowski, P.J.: Graph theoretic uncertainty principles. SampTA, Washington, D.C., 5 pages (2015)
19. Benedetto, J.J., Walnut, D.: Gabor frames for $L^2$ and related spaces. In: J.J. Benedetto, M. Frazier (eds.) Wavelets: Mathematics and Applications, pp. 97–162. CRC (1994)
20. Birkhoff, G., von Neumann, J.: The logic of quantum mechanics. Annals of Mathematics 37(4), 823–843 (1936)
21. Bodmann, B.G., Paulsen, V.I., Tomforde, M.: Equiangular tight frames from complex Seidel matrices containing cube roots of unity. Linear Algebra Appl. 430(1), 396–417 (2009)
22. Bölcskei, H., Eldar, Y.C.: Geometrically uniform frames. IEEE Transactions on Information Theory 49(4), 993–1006 (2003)
23. Busch, P.: Quantum states and generalized observables: a simple proof of Gleason’s theorem. Physical Review Letters 91(12), 120403 (2003)
24. Busch, P., Grabowski, M., Lahti, P.: Operational Quantum Physics. Springer, New York (1997 (1995))
25. Casazza, P., Kutyniok, G.: Finite Frames: Theory and Applications. Applied and Numerical Harmonic Analysis. Birkhäuser Boston (2012). URL https://books.google.com/books?id=6ASdx0Rblk4C
26. Casazza, P.G.: Every frame is a sum of three (but not two) orthonormal bases - and other frame representations. J. Fourier Analysis and Applications 4(6), 727–732 (1998)
A Generalization of Gleason’s Frame Function for Quantum Measurement
511
27. Casazza, P.G., Kovaˇcevi´c, J.: Equal-norm tight frames with erasures. Adv. Comput. Math. 18(2–4), 387–430 (2003) 28. Casazza, P.G., Redmond, D., Tremain, J.C.: Real equiangular frames. CISS Meeting, Princeton, NJ (2008) 29. Caves, C.M., Fuchs, C.A., Manne, K.K., Renes, J.M.: Gleason-type derivations of the quantum probability rule for generalized measurements. Foundations of Physics 34(2), 193–209 (2004) 30. Christensen, O.: An Introduction to Frames and Riesz Bases, 2nd edition. Springer-Birkhäuser, New York (2016 (2003)) 31. Chung, F.R.: Spectral Graph Theory, vol. 92. American Mathematical Soc. (1997) 32. Czaja, W.: Remarks on Naimark’s duality. Proceedings of the American Mathematical Society 136(3), 867–871 (2008) 33. Daubechies, I.: Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (1992). URL http://books. google.com/books?id=Nxnh48rS9jQC 34. Davis, C.H.: Geometric approach to a dilation theorem. Linear Algebra and its Applications 18(1), 33–43 (1977). DOI http://dx.doi.org/10.1016/0024-3795(77)90077-5. URL http://www. sciencedirect.com/science/article/pii/0024379577900775 35. Duffin, R.J., Schaeffer, A.C.: A class of nonharmonic Fourier series. Trans. Amer. Math. Soc. 72, 341–366 (1952) 36. Ehler, M., Okoudjou, K.: Probabilistic frames: an overview. In: P.G. Casazza, G. Kutyniok (eds.) Finite Frames: Theory and Applications, Applied and Numerical Harmonic Analysis. Springer-Birkhäuser, New York (2013). Chapter 12 37. Ehler, M., Okoudjou, K.A.: Minimization of the probabilistic p−frame potential. J. Statist. Plann. Inference 142(3), 645–659 (2012) 38. Fickus, M., Jasper, J., Mixon, D.G., Peterson, J.: Tremain equiangular tight frames. J. Combin. Theory (2018) 39. Fickus, M., Mixon, D.G.: Tables of the existence of equiangular tight frames. ArXiv preprint: arXiv:1504.00253 (2016) 40. Fickus, M., Mixon, D.G., Tremain, J.C.: Steiner equiangular tight frames. 
Linear Algebra Appl 436, 1014–1027 (2012) 41. Forney, G.D.: Geometrically uniform codes. Information Theory, IEEE Transactions on 37(5), 1241–1260 (1991) 42. Friedberg, S.H., Insel, A.J., Spence, L.E.: Linear Algebra, 3rd edn. Prentice-Hall, New York (1997) 43. Fuchs, C.A.: Coming of Age with Quantum Information: Notes on a Paulian Idea. Cambridge University Press (2011) 44. Gleason, A.M.: Measures on the closed subspaces of a Hilbert space. Journal of Mathematics and Mechanics 6(6), 885–893 (1957) 45. Gohberg, I., Goldberg, S.: Basic operator theory. Birkhäuser Boston, Mass. (1981) 46. Grünbaum, F.A.: The Heisenberg inequality for the discrete Fourier transform. Applied and Computational Harmonic Analysis 15(2), 163–167 (2003) 47. Halmos, P.R.: Finite-dimensional Vector Spaces, second edition. D. Van Nostrand Co., Inc., Princeton, NJ (1958) 48. Hamhalter, J.: Quantum Measure Theory. Fundamental Theories of Physics, 134. Springer, New York (2003) 49. Hammond, D.K., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30(2), 129–150 (2011) 50. Han, D., Larson, D.: Frames, bases and group representations. Mem. Amer. Math. Soc. 147, no. 697 (2000) 51. Herman, M.A., Strohmer, T.: High-resolution radar via compressed sensing. IEEE Transactions on Signal Processing (2009) 52. Koprowski, P.J.: Graph theoretic uncertainty and feasibility. Sampling Theory in Signal and Image Processing 15, 73–93 (2016)
512
J. J. Benedetto et al.
53. Kovaˇcevi´c, J., Chebira, A.: Life beyond bases: The advent of frames (part I). Signal Processing Magazine, IEEE 24(4), 86–104 (2007) 54. Kovaˇcevi´c, J., Chebira, A.: Life beyond bases: The advent of frames (part II). Signal Processing Magazine, IEEE 24, 115–125 (2007) 55. Lammers, M., Maeser, A.: An uncertainty principle for finite frames. Journal of Mathematical Analysis and Applications 373(1), 242–247 (2011) 56. Lay, D.C.: Linear Algebra and its Applications. Addison-Wesley, Reading, MA (1994) 57. Mackey, G.W.: Quantum mechanics and Hilbert space. Amer. Math. Monthly 64(8, Part 2), 45–57 (1957) 58. Mackey, G.W.: The Mathematical Foundations of Quantum Mechanics. The Benjamin Cummings Publishing Co., Reading, MA (1963, 3rd printing 1977) 59. Mackey, G.W.: Unitary Group Representations. The Benjamin Cummings Publishing Co., Reading, MA (1978) 60. Massey, J.L., Mittelholzer, T.: Welch’s bound and sequence sets for code-division multipleaccess systems. Sequences II: Methods in Communication, Security and Computer Sciences (1993) 61. Naimark, M.A.: Spectral functions of a symmetric operator. Izv. Akad. Nauk SSSR Ser. Mat. 4(3), 277–318 (1940) 62. Naimark, M.A.: On a representation of additive operator set functions. In: Dokl. Akad. Nauk SSSR, vol. 41, pp. 359–361 (1943) 63. von Neumann, J.: Mathematical Foundations of Quantum Mechanics. Princeton University Press, Princeton (1955 (1932)) 64. Okoudjou, K.A.: Preconditioning techniques in frame theory and probabilistic frames. In: K.A. Okoudjou (ed.) Finite Frame Theory: A Complete Introduction to Overcompleteness, Proceedings of Symposia in Applied Mathematics, vol. 73. AMS, Providence, RI (2016). Chapter 4 65. Paley, R.E.A.C., Wiener, N.: Fourier Transforms in the Complex Domain, Amer. Math. Society Colloquium Publications, vol. XIX. American Mathematical Society, Providence, RI (1934) 66. Parthasarathy, K.R.: An Introduction to Quantum Stochastic Calculus. Springer, New York (1992) 67. 
Paulsen, V.I.: Completely Bounded Maps and Operator Algebras. Cambridge University Press (2003) 68. Pfander, G.E.: Gabor frames in finite dimensions. In: P.G. Casazza, G. Kutyniok (eds.) Finite Frames: Theory and Applications, pp. 193–239. Springer-Birkhäuser (2013) 69. Pfander, G.E.: Gabor frames in finite dimensions. In: P.G. Casazza, G. Kutyniok (eds.) Finite Frames: Theory and Applications, pp. 193–239. Birkhäuser (2013) 70. Pfander, G.E.: Sampling of operators. Journal of Fourier Analysis and Applications 19(3), 612–650 (2013) 71. Pitowsky, I.: Quantum Mechanics as a Theory of Probability. Springer, New York (2005) 72. Renes, J.M., Blume-Kohout, R., Scott, A.J., Caves, C.: Symmetric informationally complete quantum measurements. J. Math. Phys. 45(6), 2171–2180 (2004) 73. Rudin, W.: Functional Analysis, second edition. McGraw-Hill (1991 (1973)) 74. Shuman, D., Narang, S., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. Signal Processing Magazine, IEEE 30(3), 83–98 (2013). DOI https://doi. org/10.1109/MSP.2012.2235192 75. Shuman, D.I., Ricaud, B., Vandergheynst, P.: Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis (2015) 76. Strang, G.: Linear Algebra and its Applications, 3rd (there is 5th) edn. Harcourt Brace Jovanovitch, New York (1988) 77. Strohmer, T., Heath, R.W.: Grassmannian frames with applications to coding and communications. Appl. Comp. Harm. Anal. 14, 257–275 (2003) 78. Szöllosi, F.: All complex equiangular tight frames in dimension 3. ArXiv preprint: arXiv:1402.6429 (2017)
A Generalization of Gleason’s Frame Function for Quantum Measurement
513
79. Trefethen, L.N., Bau, D.: Numerical Linear Algebra. Soc. Industrial and Applied Math., Philadelphia (1997) 80. Tsitsvero, M., Barbarossa, S., Di Lorenzo, P.: Signals on graphs: uncertainty principle and sampling. arXiv preprint arXiv:1507.08822 (2015) 81. Tsitsvero, M., Barbarossa, S., Di Lorenzo, P.: Uncertainty principle and sampling of signals defined on graphs. arXiv preprint arXiv:1512.00775 (2015) 82. Waldron, S.: Generalized Welch bound equality sequences are tight frames. IEEE Trans. Inf. Theory 49(92307–2309) (2003) 83. Waldron, S.: An Introduction to Finite Tight Frames. Springer-Birkhäuser, New York (2018) 84. Wallace, D.: Inferential versus dynamical conceptions of physics. In: O. Lombardi, S. Fortin, F. Holik, C. Lopez (eds.) What is quantum information?, pp. 179–209. Cambridge University Press (2017) 85. Welch, L.: Lower bounds on the maximum cross correlation of signals. IEEE Transactions on Information Theory 20(3), 397–399 (1974) 86. Zauner, G.: Quantum designs—foundations of non-commutative theory of designs. Ph.D. thesis, University of Vienna (1999). URL Available online at http://www.math.univie.ac.at/~ neum/papers.html
Post-Fourier Frequencies: Variations and Paradoxes Patrick Flandrin
In memory of Alex Grossmann, who had a knack for finding the depth behind the simplicity.
Abstract We address two questions related to the notion of frequency and its possible extensions in the case of evolutive situations, some of them leading to paradoxes. We first make a distinction between the concepts of instantaneity and locality; we then discuss links between local and global spectral properties, in relation, in particular, with the recently introduced phenomena of “supershift” and “superoscillation.”
In one of the seminal papers on wavelet transforms that Alex Grossmann coauthored in the late 1980s [18], it is stated from the beginning that “one of the aims of wavelet transforms is to provide an easily interpretable visual representation of signals.” A few lines later, it is made explicit that “the qualitative (and visual) information gathered from our pictures is certainly not the end of all desire of signal analysis. We believe however that it supplements in a non-trivial way the information obtained by inspection of the signal itself.” In retrospect, it appears that this claim, although very relevant, was unduly modest: much more than a mere visual inspection became achievable once mathematics entered the picture, as has been amply proved in the following years. From a very general standpoint, not restricted to wavelets, one can say that what is at stake in a signal representation is to provide a mathematical framework that reports on the physical reality of the analyzed data in a clearly interpretable way. The present text, which mostly concerns the interpretation of frequency and its possible extensions beyond Fourier, aims at offering variations on this theme and at underlining some paradoxes that may come along, with some new results supplementing existing ones that are reviewed for discussion.

P. Flandrin, Laboratoire de Physique, Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS, Lyon, France e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_22
1 From Sine to Sinc

Keeping in mind the primacy of interpretation, while aiming to marry physics with mathematics, one iconic issue is the concept of instantaneous frequency. Indeed, while this notion (which is instrumental for describing the commonly encountered situation of oscillations whose rhythm varies over time) seems natural from a physical standpoint, its mathematical formalization may prove difficult to establish, eventually ending up with descriptions that can be paradoxical, even for apparently very simple cases.

In order to support this claim with an explicit example, let us consider the waveform displayed in Fig. 1a. Up to some slow decay in amplitude, the simplest reading of this graph suggests an interpretation in terms of a sinusoidal oscillation with frequency $f_0 = 1$ Hz. A natural requirement is therefore that any convenient definition of instantaneous frequency should end up with the actual value 1 Hz (or a very close number) over the observation time span, whatever the evolution of the signal outside this interval.

Fig. 1 (a) Slowly decaying sinusoidal oscillation, with frequency $f_0 = 1$ Hz (or, equivalently, period $T = 1$ s). (b) Sinc function whose oscillation inside the red rectangle corresponds, when enlarged, to the waveform displayed in (a)

However, as has been contrived for this example, the approximate sine function of Fig. 1a is in fact a detailed view “at large time” of the waveform displayed in Fig. 1b, in which a sinc function is easily recognized.¹ When looking at the entire signal, it clearly appears that most of its oscillations follow what is observed “at large time” (i.e., a 1 s pseudo-period), with the notable exception of the vicinity of the time origin, where the waveform experiences a doubling of the zero-crossing interval, whence an expected decrease in the local frequency. The question addressed hereafter is how to account in a mathematical way for this physical behavior. While no single approach is satisfactory in all respects, different options will be considered, each with its own pros and cons.
¹ The term sinc was coined by P.M. Woodward and I.L. Davies in [33], based on their observation that “this function occurs so often in Fourier analysis and its applications that it does seem to merit a notation of its own.” While its place is central in sampling theory, the sinc function also appears in wavelet theory, where it plays the role of a scaling function (or “father” wavelet) in Shannon-type decompositions [25].

1.1 Instantaneous: Gabor–Ville

In order to characterize the “instantaneous” nature of an oscillation in a signal, the most classical and most celebrated approach, which goes back to D. Gabor [16] and J. Ville [32], relies on the concept of analytic signal. More precisely, given a real-valued signal $x(t) \in \mathbb{R}$, one can pair it with a complex-valued version $z_x(t) \in \mathbb{C}$ according to

$$ z_x(t) = x(t) + i\,(Hx)(t), \tag{1} $$

where $(Hx)(t)$ stands for the Hilbert transform of $x(t)$ [9, 14]. The frequency representation of this analytic signal reads $Z_x(f) = 2X(f)\,\mathbf{1}_{[0,\infty)}(f)$, with

$$ X(f) = \int_{-\infty}^{\infty} x(t)\, \exp\{-i 2\pi f t\}\, dt \tag{2} $$

the Fourier transform of $x(t)$ and $\mathbf{1}_I(f)$ the indicator function of the interval $I$ (with value 1 when $f \in I$ and 0 elsewhere). Expressing the complex-valued time representation (1) in polar form, one can follow D. Gabor [16] and J. Ville [32] in defining the instantaneous amplitude $a_x^{\mathrm{GV}}(t)$ and the instantaneous frequency $f_x^{\mathrm{GV}}(t)$ via the respective equations:

$$ a_x^{\mathrm{GV}}(t) = |z_x(t)| \tag{3} $$

and

$$ f_x^{\mathrm{GV}}(t) = \frac{1}{2\pi}\, \frac{d}{dt} \arg\{z_x(t)\}. \tag{4} $$
In the case of the sinc function displayed in Fig. 1b, its mathematical expression reads

$$ s(t) = \frac{\sin(2\pi f_0 t)}{2\pi f_0 t}, \tag{5} $$

with $f_0$ the frequency controlling the oscillating part of the signal (in the present case, $f_0 = 1$ Hz). It is well known that $S(f)$, the Fourier transform of $s(t)$, reads $S(f) = (1/2f_0)\,\mathbf{1}_{[-f_0, f_0]}(f)$. By construction, $s(t)$ is therefore a band-limited function with maximum frequency $f_0$ and, equivalently, with a total bandwidth $B = 2f_0$. The Fourier transform $Z_s(f)$ of its associated analytic signal $z_s(t)$ is simply $Z_s(f) = (1/f_0)\,\mathbf{1}_{[0, f_0]}(f)$, whence, by inverse Fourier transformation,

$$ z_s(t) = \frac{\sin(2\pi f_0 t)}{2\pi f_0 t} + i\, \frac{\sin^2(\pi f_0 t)}{\pi f_0 t}. \tag{6} $$

Re-expressing this complex-valued quantity in its polar representation

$$ z_s(t) = a_s^{\mathrm{GV}}(t)\, \exp\{i \varphi_s^{\mathrm{GV}}(t)\}, \tag{7} $$

it follows from (3) that

$$ a_s^{\mathrm{GV}}(t) = \left| \frac{\sin(\pi f_0 t)}{\pi f_0 t} \right|. \tag{8} $$

As for the phase $\varphi_s^{\mathrm{GV}}(t)$, one can write

$$ \tan \varphi_s^{\mathrm{GV}}(t) = \frac{\sin^2(\pi f_0 t)/(\pi f_0 t)}{\sin(2\pi f_0 t)/(2\pi f_0 t)} = \frac{\sin^2(\pi f_0 t)}{\sin(\pi f_0 t)\cos(\pi f_0 t)} = \tan(\pi f_0 t), \tag{9} $$

whence $\varphi_s^{\mathrm{GV}}(t) = \pi f_0 t + n\pi$, $n \in \mathbb{Z}$. It then follows from (4) that

$$ f_s^{\mathrm{GV}}(t) = \frac{f_0}{2}. \tag{10} $$
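The result (10) can be checked numerically. The following is a minimal sketch, not the author's code: it evaluates the closed form (6) on a time grid and applies (3) and (4). The grid, the step `dt`, and the trick of unwrapping the phase of `z**2` (to absorb the $n\pi$ ambiguity noted after (9)) are illustrative assumptions.

```python
import numpy as np

f0 = 1.0                      # reference frequency of the sinc, in Hz
dt = 0.0025                   # time step in s (illustrative choice)
# Half-sample offset: the grid never hits t = 0 nor the envelope zeros t = k/f0.
t = (np.arange(-4000, 4000) + 0.5) * dt

# Analytic signal of the sinc, closed form of Eq. (6)
real_part = np.sin(2 * np.pi * f0 * t) / (2 * np.pi * f0 * t)
imag_part = np.sin(np.pi * f0 * t) ** 2 / (np.pi * f0 * t)
z = real_part + 1j * imag_part

# Eq. (3): instantaneous amplitude
amplitude = np.abs(z)

# Eq. (4): the phase of z is only defined modulo pi (sign changes of the
# envelope), so the phase of z**2 is unwrapped instead and then halved.
phase = 0.5 * np.unwrap(np.angle(z ** 2))
inst_freq = np.gradient(phase, t) / (2 * np.pi)

# Both extremes should be close to f0 / 2, as stated by Eq. (10).
print(inst_freq.min(), inst_freq.max())
```

Squaring `z` before taking the angle is a design choice made for this sketch only; a direct `np.unwrap(np.angle(z))` would show jumps of $\pi$ at each zero of the envelope (8).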
All three instantaneous quantities (amplitude, phase, and frequency) are displayed in Fig. 2. Focusing on the instantaneous frequency, (10) shows that it takes on a constant value equal to half the reference frequency $f_0$ of the sine entering the definition (5) of the sinc function. Equivalently, it corresponds to the mean frequency of the spectrum of the analytic signal which, in the present case, is flat in the band $[0, f_0]$ and zero elsewhere.²

Fig. 2 Gabor–Ville analysis of the sinc function displayed in Fig. 1b. (a) Analytic signal: real part (thick black line), imaginary part (thin black line), and instantaneous amplitude (red line). (b) Instantaneous phase. (c) Instantaneous frequency of the sinc function (blue line) and reference frequency $f_0$ of the sine entering the definition of the sinc function (thin black line)

Although mathematically correct, the description offered by the Gabor–Ville approach is physically counter-intuitive. Indeed, it ends up with an instantaneous frequency that is constant and everywhere equal to $f_0/2$, whereas something different is expected when eyeballing Fig. 1. First, the main lobe of width $1/f_0$ around the origin suggests a local slowing down of the frequency, and, second, the rest of the signal mostly oscillates in a regular fashion, with a periodic vanishing at all time instants $\{t_k = k/2f_0,\ k \in \mathbb{Z}^*\}$, which should correspond to a frequency $f_0$. Acknowledging the limitation of the Gabor–Ville approach, one can ask whether alternative definitions would provide more meaningful descriptions.

² This is of course in agreement with the formula expressing the mean frequency of the energy spectrum of a signal as a properly weighted time average of its instantaneous frequency, thanks to the identity [9]:

$$ \int_0^{\infty} f\, |Z_s(f)|^2\, df = \int_{-\infty}^{\infty} |a_s^{\mathrm{GV}}(t)|^2\, f_s^{\mathrm{GV}}(t)\, dt. $$
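As a quick consistency check, the identity quoted in footnote 2 can be verified by hand for the sinc function. The sketch below uses only the flat spectrum $Z_s(f) = (1/f_0)\,\mathbf{1}_{[0,f_0]}(f)$, the amplitude (8), the frequency (10), and the standard integral $\int_{-\infty}^{\infty} (\sin(\pi u)/(\pi u))^2\, du = 1$; the change of variable $u = f_0 t$ is introduced here for convenience.

```latex
\begin{align*}
\int_0^{\infty} f\,|Z_s(f)|^2\,df
  &= \frac{1}{f_0^2}\int_0^{f_0} f\,df
   = \frac{1}{f_0^2}\cdot\frac{f_0^2}{2}
   = \frac{1}{2},\\[4pt]
\int_{-\infty}^{\infty} |a_s^{\mathrm{GV}}(t)|^2\, f_s^{\mathrm{GV}}(t)\,dt
  &= \frac{f_0}{2}\int_{-\infty}^{\infty}
     \left(\frac{\sin(\pi f_0 t)}{\pi f_0 t}\right)^{2} dt
   = \frac{f_0}{2}\cdot\frac{1}{f_0}
     \int_{-\infty}^{\infty}\left(\frac{\sin(\pi u)}{\pi u}\right)^{2} du
   = \frac{1}{2}.
\end{align*}
```

Both sides equal $1/2$, so the constant value $f_0/2$ is indeed the amplitude-weighted average that the identity requires.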
1.2 Local: Teager–Kaiser

Let us consider a simple, real-valued, harmonic signal of the form $x(t) = a \sin(2\pi f_0 t + \psi)$. As pioneered by H.M. Teager and S.M. Teager [30], and later elaborated by J.F. Kaiser [22] and others, a possibility is to attach to such a waveform an “energy operator” involving first- and second-order derivatives and defined as [31]

$$ \Psi[x(t)] = [\dot{x}(t)]^2 - x(t)\, \ddot{x}(t). \tag{11} $$

It is easy to check that, in this case, one simply has $\Psi[x(t)] = a^2 (2\pi f_0)^2$ and $\Psi[\dot{x}(t)] = a^2 (2\pi f_0)^4$, whence a possible access to the amplitude $a$ and the frequency $f_0$ via a convenient combination of the quantities obtained when applying the energy operator $\Psi$ to the signal and to its derivative. Since the approach is differential and, hence, essentially local,³ it makes sense to consider the extension of its application to the situation of quasi-sinusoidal oscillations for which the amplitude and the frequency are both slowly varying. This leads to alternative definitions of the instantaneous amplitude and instantaneous frequency, according to the respective expressions (assuming they exist):

$$ a_x^{\mathrm{TK}}(t) = \frac{\Psi[x(t)]}{\sqrt{\Psi[\dot{x}(t)]}}; \qquad f_x^{\mathrm{TK}}(t) = \frac{1}{2\pi} \sqrt{\frac{\Psi[\dot{x}(t)]}{\Psi[x(t)]}}. \tag{12} $$

³ When formulated in a discrete-time setting, the energy operator involves finite differences based on only three consecutive data samples [22].

In the case of the sinc function (5), a straightforward calculation gives the following result for the energy operator applied to $s(t)$:

$$ \Psi[s(t)] = \frac{1}{t^2} \left( 1 - \frac{\sin^2(2\pi f_0 t)}{(2\pi f_0 t)^2} \right). \tag{13} $$

Proceeding in a similar way for the derivative $\dot{s}(t)$, one gets after some manipulations

$$ \Psi[\dot{s}(t)] = \frac{(2\pi f_0)^2}{t^2} - \frac{1}{t^4} \left( 1 + \frac{2}{(2\pi f_0 t)^2} \right) \sin^2(2\pi f_0 t) - 2\, \frac{\cos^2(2\pi f_0 t)}{t^4} + 2\, \frac{\sin(4\pi f_0 t)}{2\pi f_0\, t^5}, \tag{14} $$

and the corresponding expressions for the instantaneous amplitude and frequency follow from (12). Their graphs are displayed in Fig. 3.

Fig. 3 Teager–Kaiser analysis of the sinc function displayed in Fig. 1b. (a) Signal (black line) and instantaneous amplitude (red line). (b) Instantaneous frequency

In comparison with the Gabor–Ville approach, the time-varying frequency obtained “à la Teager–Kaiser” appears more satisfactory from the point of view of interpretation. Indeed, the very local nature of the computation results in a value which is essentially constant and equal almost everywhere to the expected frequency $f_0 = 1$ Hz, except in the region neighboring the origin, where the lobe of width $1/f_0$ creates a dip in frequency (whose minimum can be shown to attain the value $f_0/\sqrt{3}$ at $t = 0$). If we rely on this example alone, the Teager–Kaiser approach is quite appealing but, unfortunately, it cannot be extended to other situations without some caution. In particular, the energy operator (11) is not guaranteed to be positive for every signal: besides the interpretation issue this may raise, it can preclude the existence of an instantaneous frequency in the sense of (12).
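The behavior described above can be reproduced with a short numerical sketch. It applies the operator definition (11) directly to the sinc function, using hand-derived analytic derivatives rather than the closed forms (13) and (14); the function names `sinc_derivatives` and `teager_kaiser` are hypothetical and chosen for this illustration only.

```python
import numpy as np

def sinc_derivatives(t, f0=1.0):
    """Return s, s', s'', s''' for s(t) = sin(w t)/(w t), w = 2*pi*f0 (t != 0)."""
    w = 2 * np.pi * f0
    u = w * t
    S, C = np.sin(u), np.cos(u)
    g0 = S / u
    g1 = C / u - S / u**2
    g2 = -S / u - 2 * C / u**2 + 2 * S / u**3
    g3 = -C / u + 3 * S / u**2 + 6 * C / u**3 - 6 * S / u**4
    # Chain rule: each time derivative brings a factor w.
    return g0, w * g1, w**2 * g2, w**3 * g3

def teager_kaiser(t, f0=1.0):
    """Instantaneous amplitude and frequency in the sense of Eq. (12)."""
    s0, s1, s2, s3 = sinc_derivatives(t, f0)
    psi_s = s1**2 - s0 * s2        # Eq. (11) applied to s
    psi_ds = s2**2 - s1 * s3       # Eq. (11) applied to the derivative of s
    amplitude = psi_s / np.sqrt(psi_ds)
    frequency = np.sqrt(psi_ds / psi_s) / (2 * np.pi)
    return amplitude, frequency

# Near the origin the frequency should approach f0/sqrt(3);
# far from it, the frequency should approach f0.
for t in (0.01, 0.5, 5.3):
    print(t, teager_kaiser(t))
```

Working from the definition (11) keeps the sketch independent of any particular closed form; the square roots assume that both energies are positive at the evaluation points, which is not guaranteed for arbitrary signals.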
This calls for further investigations in the quest for approaches that would accommodate both locality and instantaneity.
1.3 Local and Instantaneous 1: Hilbert–Huang

As has often been noticed, the classical notion of frequency is attached to the “Fourier modes,” i.e., the complex exponentials which appear in (2). It is therefore this very notion which enters the definition of a monochromatic sinusoidal oscillation. When the rhythm of the oscillation happens to vary in time, difficulties show up as soon as the local behavior differs significantly from the nominal model. To alleviate (at least partially) this difficulty, N.E. Huang et al. proposed in [20] to restrict any estimation of an instantaneous frequency to what they referred to as Intrinsic Mode Functions (IMFs), i.e., oscillatory functions which locally behave as a Fourier mode. In the framework of the Empirical Mode Decomposition (EMD) that they developed for this purpose, IMFs are extracted sequentially, from fine to coarse scales, starting with the identification of the fastest local oscillation under the constraint of being locally zero mean (cf., e.g., [21] for further details on the method and the algorithm). Once an IMF has been identified, it is subtracted from the signal and the procedure is iterated on the residual. Proceeding this way down to some given depth $K$, a signal $x(t)$ is eventually decomposed as

$$ x(t) = \sum_{k=1}^{K} d_k(t) + r_K(t), \tag{15} $$

where each $d_k(t)$ stands for an IMF (at level or “scale” $k$) and $r_K(t)$ for the final residual. By construction, each mode $d_k(t)$ can be given a proper Gabor–Ville analysis, resulting in a collection of instantaneous amplitudes and frequencies in the so-called Hilbert–Huang sense:

$$ a_k^{\mathrm{HH}}(t) = a_{d_k}^{\mathrm{GV}}(t); \qquad f_k^{\mathrm{HH}}(t) = f_{d_k}^{\mathrm{GV}}(t). \tag{16} $$
Figure 4 displays the result of such an analysis when limited to $K = 1$. Considering in this case the first IMF only, the associated residual plays the role of an estimate of the local mean. As expected, this local mean is not zero, and its subtraction from the initial signal ends up with a frequency dip in the vicinity of the time origin $t = 0$, while the expected value $f_0$ is observed elsewhere.

Fig. 4 Empirical Mode Decomposition of the sinc function displayed in Fig. 1b. (a) Signal (black line) and local mean (red line). (b) First Intrinsic Mode Function, obtained by subtracting the local mean from the signal. (c) Instantaneous frequency, in the sense of Gabor–Ville, of the mode displayed in (b)

As with the Teager–Kaiser approach, the EMD-based solution cannot be considered satisfactory in all respects. The result itself, as presented in Fig. 4c, is consistent with the idea of a frequency equal to $f_0$ almost everywhere with a dip in the vicinity of the origin $t = 0$, but at the expense of adding spurious oscillations whose interpretation is puzzling. Understanding what happens would require some theoretical analysis of how the decomposition is actually achieved, leading to a second, more important, point: EMD is not defined as a transformation (in the sense of Fourier, Gabor–Ville, or Teager–Kaiser) but only as the output of an iterative algorithm whose intricate structure precludes closed-form solutions in most cases, not to mention the many degrees of freedom that are left to the user [21].
1.4 Local and Instantaneous 2: Time–Frequency

Rather than looking directly for a 1D function accounting for the evolution of frequency as a function of time, a more comprehensive approach is (i) to consider in a first step a joint description of a signal in both time and frequency and (ii) only in a second step to obtain the candidate instantaneous frequency as a by-product of the 2D representation. While there are many roads leading to possibly different time–frequency representations [7, 9, 13, 14], the shared rationale is that large values in the plane are attached to those points where significant frequency contributions exist at a given time. In the case of proper energy distributions $\rho_x(t, f) \geq 0$, one can think of using ridges (i.e., time trajectories of local frequency maxima) for defining an instantaneous frequency [10], but other options are available. The most effective one is to fall back on marginal distributions of the time–frequency surface, thus defining both an instantaneous amplitude and an instantaneous frequency according to, respectively,

$$ a_x^{\mathrm{TF}}(t) = \left( \int_0^{+\infty} \rho_x(t, f)\, df \right)^{1/2} \tag{17} $$

and

$$ f_x^{\mathrm{TF}}(t) = \left( a_x^{\mathrm{TF}}(t) \right)^{-2} \int_0^{+\infty} f\, \rho_x(t, f)\, df. \tag{18} $$
Making use of the analogy between a time–frequency energy distribution and a joint probability density function, the rationale for such a definition is simply to consider instantaneous frequency as a conditional frequency mean at time $t$.

Spectrogram. The simplest and most natural energy distribution we can think of is based on a local Fourier analysis (or STFT, for “Short-Time Fourier Transform”) involving a short-time, sliding window $h(t)$ applied to the signal prior to Fourier transformation. More precisely, an STFT usually reads

$$ F_x^{(h)}(t, f) = \int_{-\infty}^{\infty} x(s)\, h(s - t)\, \exp\{-i 2\pi f s\}\, ds. \tag{19} $$

Its squared magnitude $S_x^{(h)}(t, f) = |F_x^{(h)}(t, f)|^2$ is referred to as a spectrogram. For finite energy signals $x(t) \in L^2(\mathbb{R})$, a spectrogram is a proper energy distribution provided that the window $h(t)$ is itself square-integrable (and of unit energy). The distribution computed this way is not intrinsic to the analyzed signal; rather, it results from the interaction of the signal with an analyzing short-time window, whose length is left to the user and is instrumental in the result. In this approach, the “instantaneity” of frequency is supposed to be captured by a “local” analysis, and Fig. 5 illustrates what happens when varying locality via the length of the window (chosen as a Gaussian function in this example). While some form of instantaneity can be obtained this way, it clearly appears that this offers no unique answer, but as many answers as there are windows. In particular, using larger and larger windows turns the spectrogram into the Fourier spectrum, with no time dependence anymore. This is what is observed in the figure, with an instantaneous frequency reducing over larger and larger time spans to the mean frequency of the spectrum, i.e., $f_0/2$ as in the Gabor–Ville case.

Fig. 5 Local Fourier analysis of the sinc function displayed in Fig. 1b, with three different lengths for the short-time window (materialized by thin black segments in the diagrams of the bottom row): small in column (a), medium in (b), and large in (c). Top row: spectrogram. Middle row: reassigned spectrogram. Bottom row: instantaneous frequencies estimated as conditional first-order moments of the spectrogram (blue line) and of the reassigned spectrogram (red line)

Reassigned Spectrogram. If one limitation of spectrograms is their dependence on a window which introduces a form of arbitrariness in the analysis, another one is that they do not make use of all available information, since the phase is thrown away when squaring the STFT. Phase is however of utmost importance in spectra since it encodes the way frequency components (whose energetic contributions are measured by the squared magnitude) are structured in time. Making phase enter the picture is therefore natural, with the possibility of estimating the instantaneous frequency locally, within time–frequency domains defined by the short-time window. This viewpoint was the starting point for a dramatic improvement to spectrograms, as pioneered by K. Kodera, R. Gendrin, and C. de Villedary in the late 1970s. Initially referred to as the “Modified Moving Window Method” [23] and later coined “reassignment” when reformulated and generalized [2], the approach is a two-step procedure which amounts (i) to estimating, for each time–frequency point $(t, f)$, another point of interest $(\hat{t}(t, f), \hat{f}(t, f))$ that locally encodes the instantaneous frequency in the sense of Gabor–Ville, together with its companion quantity of group delay, and (ii) to moving spectrogram values from $(t, f)$ to $(\hat{t}(t, f), \hat{f}(t, f))$ (see [15] for details).
Reassigned spectrograms present the twofold advantage of providing a sharpened distribution and being almost independent of the short-time window length. Figure 5 displays the corresponding results, in terms of both distributions and instantaneous frequency estimations as conditional first-order moments.

Scalogram. Coming back to what we started with, i.e., wavelets, another way of bypassing the choice of a specific window length is to have recourse to a continuous wavelet transform [17, 25], usually defined as

$$ T_x^{(\psi)}(t, a) = |a|^{-1/2} \int_{-\infty}^{\infty} x(s)\, \psi\!\left( \frac{s - t}{a} \right) ds. \tag{20} $$

While (20) is basically a function of time $t$ and scale $a$, choosing for the wavelet $\psi(\cdot)$ a band-pass function whose Fourier transform is unimodal and peaked around some reference frequency $f_\psi$ turns it into a time–frequency analysis, with the formal identification $f = f_\psi / a$. Its squared modulus, referred to as a scalogram [29], offers an alternative to the spectrogram, but it cannot be used mutatis mutandis for getting an instantaneous frequency estimate from a local first-order moment. However, such information can be gained directly from the phase of the transform, by applying the classical Gabor–Ville recipe to suitable sections of the wavelet transform, in time–frequency regions neighboring ridges of the scalogram. This is illustrated in Fig. 6, with the two typical examples of a Morlet wavelet and of the so-called Mexican hat (the second derivative of a Gaussian) [18]. We observe in both cases a behavior that essentially shows what is expected, though at the expense of some difference in the shape of the frequency dip.
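Returning to the spectrogram case, the moment-based definition (18) can be illustrated with a minimal discrete-time sketch. Everything below is an assumption made for illustration: the function name `spectrogram_inst_freq`, the Gaussian window parameters, the hop size, and the test signal (a pure 5 Hz cosine rather than the sinc of Fig. 1b). The DFT of each windowed frame stands in for the STFT (19), restricted to non-negative frequencies.

```python
import numpy as np

def spectrogram_inst_freq(x, fs, win_len=256, sigma=32.0, hop=64):
    """Conditional mean frequency of a Gaussian-window spectrogram, cf. Eq. (18)."""
    n = np.arange(win_len) - (win_len - 1) / 2
    h = np.exp(-0.5 * (n / sigma) ** 2)
    h /= np.sqrt(np.sum(h ** 2))                  # unit-energy window
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)    # non-negative frequencies only
    starts = np.arange(0, len(x) - win_len + 1, hop)
    times = (starts + (win_len - 1) / 2) / fs     # window centers, in s
    inst_freq = np.empty(len(starts))
    for k, s in enumerate(starts):
        # Spectrogram slice at the current time: squared magnitude of the local DFT
        rho = np.abs(np.fft.rfft(x[s:s + win_len] * h)) ** 2
        # First-order frequency moment normalized by the local energy
        inst_freq[k] = np.sum(freqs * rho) / np.sum(rho)
    return times, inst_freq

fs = 100.0                                        # sampling rate, in Hz
t = np.arange(0, 10, 1 / fs)
x = np.cos(2 * np.pi * 5.0 * t)                   # stationary 5 Hz test tone
times, f_tf = spectrogram_inst_freq(x, fs)
print(f_tf)                                       # values should stay close to 5 Hz
```

For a stationary tone the estimate does not depend on the window; the window dependence discussed in the text only shows up for signals whose content changes over the window span, which can be explored by changing `sigma` and `win_len` on a time-varying input.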
Fig. 6 Wavelet analysis of the sinc function displayed in Fig. 1b, with two different wavelets. (a)–(c)–(e) Analysis based on a Morlet wavelet displaying, respectively, the scalogram, the phase of the transform, and the derivative of this phase along the section indicated in yellow in the diagrams above. (b)–(d)–(f) Idem with a “Mexican hat” wavelet

1.5 Which Lesson? Local vs. Instantaneous

In pursuing the program of capturing evolutions of frequency, numerous approaches have been considered in the previous sections, without exhausting all proposals that can be found in the literature (see, e.g., [11, 19] for additional alternatives). This resulted in almost as many results as methods used, showing that the question is in some sense ill-posed, with a flavor of oxymoron when associating “instantaneous” or “local” with “frequency.”⁴ What the example of the sinc function illustrates more specifically is the distinction that needs to be made between the seemingly similar notions of instantaneous and local. Indeed, in Fig. 1, the interpretation of (a) in relation to (b) is based on an a priori assumption of locality, according to which the properties of some subpart of a signal should not be affected by those of the signal considered as a whole. This is clearly not satisfied by the instantaneous frequency (4) which, when computed over the entire signal (b), exhibits values “at large time” that do not coincide with what would result from the analysis of the subpart (a) alone.

⁴ In this respect, we can quote A. Blanc-Lapierre and B. Picinbono, who wrote as early as 1955 (original formulation in French, translation by the author): “The expression of instantaneous frequency contains, in a way, an internal contradiction: the frequency is perfectly defined only for a signal exactly sinusoidal but then it is not instantaneous but eternal.” [6].

A reason for this apparent paradox can be sketched as follows. As a matter of fact, the frequency in the sense of Gabor–Ville does indeed acquire an instantaneous nature (it becomes possibly time dependent) but at the expense of resulting from an operation (the Hilbert transformation) that is highly non-local, with a slow decay (“in $1/t$”) of the impulse response of the transformation. When the decay of the analyzed signal happens to be of the same nature, which is the case for the sinc function, the provided instantaneity does not make it possible to attain the locality that one could expect. Conversely, approaches that are local by construction (Teager–Kaiser, spectrogram, and scalogram) have to face some loss of instantaneity because they necessarily rely on observation spans that cannot have zero extent: from a local perspective, instantaneity would call for a perfect locality.
P. Flandrin
We will see in the following that instantaneous and/or local approaches may also lead to other forms of unexpected behaviors when compared to global features.
2 Frequencies and Spectral Bandwidth
Leaving aside for a while the sinc function and focusing on the instantaneous frequency interpretation, one important further remark can be made in the case of band-limited signals. Indeed, given the spectral representation (2), one could expect in such a case that the fastest local oscillation would be imposed by the Fourier mode whose frequency is the largest one in the global spectrum. However, the situation turns out to be not that simple, as discussed in the following.
2.1 Two Spectral Lines
One well-known and widely documented example concerns the idealized situation of two spectral lines with unequal amplitudes, namely signals taking on the form $x(t) = a_1 x_1(t) + a_2 x_2(t)$, with $a_1, a_2 \in \mathbb{R}^+$ and $x_j(t) = \cos(2\pi f_j t)$, $j = 1, 2$. Assuming for simplicity that $f_2 > f_1$, a direct computation [9, 24, 26] shows that

$$f_x^{GV}(t) = \frac{f_1 + f_2}{2} + \frac{d}{2}\,\frac{\rho^2 - 1}{1 + \rho^2 + 2\rho\cos(2\pi d t)}, \tag{21}$$
with $d = f_2 - f_1$ and $\rho = a_2/a_1$.
Going Out of Band. In the very specific case of exactly equal amplitudes, i.e., when $\rho = 1$, one has $f_x^{GV}(t) = \frac{1}{2}(f_1 + f_2)$ almost everywhere (“almost” will be made precise below), but this is the only situation in which the physical interpretation meets the mathematical result. Indeed, for all cases where $\rho \neq 1$, one could expect the instantaneous frequency to be some weighted average of $f_1$ and $f_2$ (with suitable weights depending on $a_1$ and $a_2$), but, considering the expression (21), it turns out that the instantaneous frequency necessarily takes values outside the frequency band $[f_1, f_2]$ for some time instants $t$ [24]. More precisely, if we focus on the time instants $t_k = (k + \frac{1}{2})/d$, $k \in \mathbb{Z}$, one gets

$$f_x^{GV}(t_k) = \frac{f_1 + f_2}{2} + \frac{f_2 - f_1}{2}\,\frac{\rho + 1}{\rho - 1}. \tag{22}$$
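The closed forms (21) and (22) can be checked numerically. The following Python sketch is not part of the original text: the function names are illustrative assumptions, and the values $f_1 = 2$ Hz and $f_2 = 8$ Hz are borrowed from Fig. 7. It compares (21) with a finite-difference derivative of the phase of the analytic signal $z_x(t) = a_1 e^{i2\pi f_1 t} + a_2 e^{i2\pi f_2 t}$, normalized so that $a_1 = 1$.

```python
import cmath
import math


def inst_freq_closed_form(t, f1, f2, rho):
    """Closed form (21), with rho = a2/a1 and d = f2 - f1."""
    d = f2 - f1
    denom = 1.0 + rho**2 + 2.0 * rho * math.cos(2.0 * math.pi * d * t)
    return 0.5 * (f1 + f2) + 0.5 * d * (rho**2 - 1.0) / denom


def inst_freq_from_phase(t, f1, f2, rho, dt=1e-7):
    """Phase derivative of the analytic signal divided by 2*pi.

    The derivative is a central finite difference; the phase increment is
    read from the argument of a ratio, so no phase unwrapping is needed.
    """
    def z(u):
        return cmath.exp(2j * math.pi * f1 * u) + rho * cmath.exp(2j * math.pi * f2 * u)

    increment = cmath.phase(z(t + dt) / z(t - dt))
    return increment / (2.0 * math.pi * 2.0 * dt)


if __name__ == "__main__":
    f1, f2 = 2.0, 8.0           # values used in Fig. 7
    t0 = 0.5 / (f2 - f1)        # t_k for k = 0
    for rho in (2.5, 0.4):
        print(rho, inst_freq_closed_form(t0, f1, f2, rho),
              inst_freq_from_phase(t0, f1, f2, rho))
```

With these values, (22) gives 12 Hz for $\rho = 2.5$ (above $f_2$) and $-2$ Hz for $\rho = 0.4$ (below $f_1$, and negative).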
It follows that $f_x^{GV}(t_k) > f_2$ for $\rho > 1$ and $f_x^{GV}(t_k) \le f_1$ for $0 \le \rho < 1$, with frequencies that can even become negative in this case (whereas the signal is analytic, with a spectrum equal to zero on the half-real line of negative frequencies): this is illustrated in Fig. 7. Coming back to (21), one sees that excursions below $f_1$
Fig. 7 Instantaneous frequency of a signal made of the superimposition of two spectral lines at $f_1 = 2$ Hz and $f_2 = 8$ Hz, with unequal amplitudes (panels (a) and (b): frequency (Hz) versus time (s), from 0 to 1 s). Whereas the spectrum of such a signal has no contribution below $f_1$ and above $f_2$, with an instantaneous frequency expected to lie somewhere within the shaded band-limited area, excursions outside this band are observed as soon as the amplitude ratio $\rho = a_2/a_1$ differs from unity, with a periodicity equal to $1/(f_2 - f_1) \approx 167$ ms: (a) above $f_2$ for $\rho = 2.5$ (black), $1.5$ (blue), and $1.01$ (red) and (b) below $f_1$ for $\rho = 0.4$ (black), $0.8$ (blue), and $0.99$ (red)
and above $f_2$ are periodic, with frequency $d = f_2 - f_1$. When the amplitude ratio $\rho$ goes to unity, the instantaneous frequency tends to $\frac{1}{2}(f_1 + f_2)$ almost everywhere, except in the vicinity of the time instants $t_k = (k + \frac{1}{2})/d$, $k \in \mathbb{Z}$, where its magnitude tends to diverge as $1/|\rho - 1|$, over smaller and smaller intervals that shrink as $\sqrt{|\rho - 1|}$.
Phase Dislocations. The singular behavior of the instantaneous frequency occurs at the time instants $t_k = (k + \frac{1}{2})/d$, $k \in \mathbb{Z}$, which are such that the instantaneous amplitude

$$a_x^{GV}(t) = \left(a_1^2 + a_2^2 + 2 a_1 a_2 \cos(2\pi d t)\right)^{1/2} \tag{23}$$
Fig. 8 Signal made of the superimposition of two spectral lines at $f_1 = 4$ Hz and $f_2 = 8$ Hz. (a) Contour plot of the phase diagram for the whole range of possible amplitude ratios (see text), exhibiting singular points when amplitudes are equal ($h = 0$ or, equivalently, $\rho = 1$, red line). (b) Signal in the case where $\rho = 1$, with real part in black, imaginary part in dotted blue, and envelope (i.e., instantaneous amplitude and its opposite) in red
is zero. When $a_2 = a_1$, we have $a_x^{GV}(t) = 2a_1|\cos(\pi d t)|$, evidencing that the instantaneous amplitude vanishes in a non-smooth way, with both the imaginary and real parts of the analytic signal becoming zero simultaneously. The instantaneous phase, defined as $\tan^{-1}(\mathrm{Im}\{z_x(t)\}/\mathrm{Re}\{z_x(t)\})$, is therefore indeterminate, inducing the singular behavior. This corresponds in fact to a phase dislocation mechanism echoing the analysis reported in [27]. More precisely, Fig. 8 displays a phase diagram of the composite signal $\frac{1}{2}((1 - h)x_1(t) + (1 + h)x_2(t))$ as a function of the parameter $h = (\rho - 1)/(\rho + 1)$ measuring the balance between amplitudes (the case $h = 0$ simply corresponds to $\rho = 1$, i.e., $a_1 = a_2$). Isolated singularities appear along the horizontal line $h = 0$, with periodicity $1/(f_2 - f_1)$ and at those points $t_k$ for which the instantaneous amplitude vanishes as $|t - t_k|$. A companion description can be given by complementing the time history of the signal with a Fresnel (or Argand) diagram in which the imaginary part is plotted as
Fig. 9 Signal made of the superimposition of two spectral lines at $f_1 = 4$ Hz and $f_2 = 8$ Hz, for a number of different amplitude ratios $\rho = a_2/a_1$. Rows 1 and 3 display a part of the waveform, with real part in black, imaginary part in dotted blue, and envelope in red. Rows 2 and 4 display the corresponding Fresnel diagrams, with the imaginary part in ordinate, as a function of the real part in abscissa. The special case $\rho = 1$ (highlighted by bold boundaries) evidences that the vanishing of the envelope corresponds to a “touching” of the origin of the plane by the Fresnel curve. When passing from $\rho < 1$ to $\rho > 1$, the “winding number” is increased by 1, which is the signature of a phase dislocation
a function of the real part. Whenever the signal is a pure tone, the trajectory consists of a circle which is drawn with one rotation per period around the origin of the plane. This is what is observed in the top left and bottom right diagrams of Fig. 9, when the signal reduces to a pure tone with frequency $f_1$ and $f_2$, respectively. Given that $f_2 = 2f_1$ in the present example, two rotations occur in the $f_2$ case as compared to only one in the $f_1$ case, when the observation span reduces to the same period $1/f_1$. In between these two extreme situations, a transition must exist so as to permit a jump in the “winding number.” This appears precisely as a sharp transition when $\rho = 1$, with the Fresnel curve happening to “touch” the origin of the plane (bottom left diagrams of Fig. 9).
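The jump in the winding number can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function name `winding_number`, the sampling density, and the normalization $a_1 = 1$ are illustrative choices, with $f_1 = 4$ Hz and $f_2 = 8$ Hz as in Fig. 9. The number of turns of the Fresnel curve around the origin over one period $1/f_1$ is obtained by accumulating small phase increments of the analytic signal.

```python
import cmath
import math


def winding_number(rho, f1=4.0, f2=8.0, n=20000):
    """Turns of z(t) = exp(i 2 pi f1 t) + rho exp(i 2 pi f2 t) around the
    origin when t runs over one period 1/f1 (rho = a2/a1, with a1 = 1).

    The value rho = 1 is excluded: the curve then passes through the
    origin and the winding number is not defined.
    """
    def z(t):
        return cmath.exp(2j * math.pi * f1 * t) + rho * cmath.exp(2j * math.pi * f2 * t)

    period = 1.0 / f1
    total_phase = 0.0
    previous = z(0.0)
    for i in range(1, n + 1):
        current = z(i * period / n)
        # Phase increment between consecutive samples, in (-pi, pi].
        total_phase += cmath.phase(current / previous)
        previous = current
    return round(total_phase / (2.0 * math.pi))


if __name__ == "__main__":
    for rho in (0.5, 2.0):
        print(rho, winding_number(rho))
```

For $\rho = 0.5$ the curve winds once around the origin and for $\rho = 2$ it winds twice, in line with the increase by 1 described in the caption of Fig. 9.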
2.2 “Supershift” and “Superoscillation”
The apparently paradoxical behavior of a frequency that can be locally larger than the maximal frequency contained in the spectrum has long been invoked for justifying the unsatisfactory character of the instantaneous frequency in the sense of Gabor–Ville (see, e.g., [9, Chap. 2]). No less paradoxically, it is the very same definition that has been used to justify the great discrepancy that can exist between local frequency and spectral bandwidth in the case of the recently highlighted phenomenon of “supershift” and “superoscillation” [12], when invoking “the local rate of change of the phase” [4] for quantifying how frequency can locally go out of band.
“Faster than Fourier” [3]. As first proposed by Y. Aharonov and further elaborated by M.V. Berry [3], the heuristics justifying the existence of “supershifts” and/or “superoscillations” proceeds from the following argument. One considers a spectral representation of the form

$$z_\varepsilon(t) = \int_{-\infty}^{\infty} G_\varepsilon(f - i f_*)\,\exp\{i 2\pi K(f) t\}\, df, \tag{24}$$
with $f_1 \le K(f) \le f_2$, thus guaranteeing the waveform to be band-limited. If we then choose $G_\varepsilon$ such that $\lim_{\varepsilon \to 0} G_\varepsilon(\cdot) = \delta(\cdot)$, one obtains formally that

$$z(t) = \lim_{\varepsilon \to 0} z_\varepsilon(t) = \exp\{i 2\pi K(i f_*) t\}. \tag{25}$$

An admissible choice for $K(f)$ is $\cos 2\pi f$, ending up with $f_2 = 1$ as the largest frequency of all Fourier modes entering the representation (24); yet, according to (25), the signal $z(t)$ oscillates at a frequency $K(i f_*) = \cos(2\pi i f_*) = \cosh(2\pi f_*)$, a quantity which is greater than 1 whenever $f_* \neq 0$ and can take on any arbitrarily large value. We therefore face the apparently paradoxical situation in which a band-limited signal can oscillate faster than its Fourier mode with largest frequency. Initially referred to as “superoscillation,” this phenomenon is first of all a “supershift” in the sense that the local frequency is shifted beyond the maximum frequency of the spectrum without necessarily corresponding stricto sensu to an oscillation. One can remark that this was indeed already the case with the previously considered example of the two spectral lines, which can be viewed as the simplest situation leading to a “supershift” [4]. This being said, it is quite remarkable that a “supershift” can give rise in some cases to an actual “superoscillation.” A particularly simple model for this has been suggested according to the definition [4, 5]:

$$B(t) = \left(\cos(2\pi t/N) + i a \sin(2\pi t/N)\right)^N; \quad a > 1,\ N \gg 1. \tag{26}$$
Fig. 10 “Superoscillation” in the signal defined as the real part of (26), with $a = 4$ and $N = 12$ (vertical axis: log(|amplitude|) (dB); horizontal axis: time (s), from $-3$ to 3). Red line: absolute value of the signal, displayed on a (logarithmic) dB scale. Black line: Fourier mode with largest frequency in the spectrum (1 Hz in the present case) (adapted from [5, Fig. 2])
On the one hand, this signal is band-limited by construction, with a maximum frequency of 1 Hz. This can be seen by rewriting (26) as

$$B(t) = 2^{-N}\left[(a + 1)\exp\{i 2\pi t/N\} - (a - 1)\exp\{-i 2\pi t/N\}\right]^N \tag{27}$$

and using a binomial expansion, ending up with a Fourier representation of the form $B(t) = \sum_{n=0}^{N} C_n(a, N)\exp\{i 2\pi (1 - 2n/N) t\}$, with $C_n(a, N)$ suitable coefficients [1]. On the other hand, it follows directly from (26) [1] that $B(t) \approx \exp\{i 2\pi a t\}$ for $|t|$ small and $N \gg 1$, resulting in a local oscillation that can be faster than the fastest Fourier mode in the spectrum as soon as $a > 1$. As illustrated in Fig. 10, a convenient choice of the parameters can therefore end up with an actual “superoscillation.” There is however a price to pay, namely an exponential growth of the signal amplitude outside the interval on which the “superoscillation” takes place [12].
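The two views of $B(t)$ can be compared numerically. The Python sketch below is an illustration rather than the construction used in [1, 5]: the function names are assumptions, and the coefficients are simply those produced by the binomial expansion of (27), i.e., $C_n(a, N) = 2^{-N}\binom{N}{n}(a + 1)^{N-n}(-(a - 1))^n$. It evaluates $B(t)$ both from (26) and from the resulting Fourier sum, whose frequencies all lie in $[-1, 1]$, and estimates the local frequency at $t = 0$ from the phase derivative.

```python
import cmath
import math


def b_direct(t, a, n_power):
    """B(t) evaluated from the definition (26)."""
    theta = 2.0 * math.pi * t / n_power
    return (math.cos(theta) + 1j * a * math.sin(theta)) ** n_power


def b_fourier(t, a, n_power):
    """B(t) as a sum of Fourier modes exp{i 2 pi (1 - 2n/N) t}, with
    coefficients obtained from the binomial expansion of (27)."""
    total = 0j
    for n in range(n_power + 1):
        coeff = (math.comb(n_power, n) * (a + 1.0) ** (n_power - n)
                 * (-(a - 1.0)) ** n / 2.0 ** n_power)
        total += coeff * cmath.exp(2j * math.pi * (1.0 - 2.0 * n / n_power) * t)
    return total


def local_frequency(t, a, n_power, dt=1e-6):
    """Phase derivative of B divided by 2*pi (central finite difference)."""
    ratio = b_direct(t + dt, a, n_power) / b_direct(t - dt, a, n_power)
    return cmath.phase(ratio) / (2.0 * math.pi * 2.0 * dt)


if __name__ == "__main__":
    a, n_power = 4.0, 12        # parameters of Fig. 10
    print(abs(b_direct(0.1, a, n_power) - b_fourier(0.1, a, n_power)))
    print(local_frequency(0.0, a, n_power))   # close to a, above the 1 Hz band edge
```

With $a = 4$ and $N = 12$, the local frequency at the origin is close to 4 Hz although no Fourier mode exceeds 1 Hz. The magnitude of the binomial coefficients (of the order of $10^6$ for these parameters) also hints at the large amplitudes that accompany the phenomenon.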
Back to the Sinc Function One could think that the examples considered so far are very peculiar, but it turns out that many other ones can be found in the literature [12],
one of which brings us back to the sinc function from which the present discussion started. Indeed, if we rewrite (5) as

$$s(t) = \frac{\sin(2\pi f_0 t)}{2\pi f_0 t} = \prod_{n=1}^{\infty}\left(1 - \frac{4 f_0^2 t^2}{n^2}\right), \tag{28}$$

it is argued in [28] that shifting $N$ zeros guarantees that, for any $k > 0$, the signal

$$s_*(t) = \prod_{n=1}^{N}\left(1 - \frac{4 f_0^2 k^2 t^2}{n^2}\right)\prod_{n=N+1}^{\infty}\left(1 - \frac{4 f_0^2 t^2}{n^2}\right) \tag{29}$$

remains band-limited to the interval $[-f_0, f_0]$ while oscillating at the frequency $k f_0$ over the interval $[-N/2kf_0, N/2kf_0]$.
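As a rough numerical companion to (28) and (29), the following Python sketch evaluates truncated versions of the infinite products. The truncation order `m_terms` and the function names are assumptions made for illustration; the sketch only checks the product formula and the location of the shifted zeros, not the band-limitation claim of [28].

```python
import math


def sinc_product(t, f0, k=1.0, n_shifted=0, m_terms=200000):
    """Truncated product (29): the first n_shifted factors have their zeros
    compressed by the factor k, and m_terms factors are kept in total.
    With n_shifted = 0 this is a truncation of the product in (28)."""
    x2 = 4.0 * f0**2 * t**2
    product = 1.0
    for n in range(1, m_terms + 1):
        scale = k**2 if n <= n_shifted else 1.0
        product *= 1.0 - scale * x2 / n**2
    return product


def sinc(t, f0):
    """Left-hand side of (28): sin(2 pi f0 t) / (2 pi f0 t)."""
    x = 2.0 * math.pi * f0 * t
    return 1.0 if x == 0.0 else math.sin(x) / x


if __name__ == "__main__":
    f0 = 1.0
    print(sinc(0.3, f0), sinc_product(0.3, f0))
    # With k = 3 and 4 shifted zeros, the first zeros sit at n / (2 k f0).
    print(sinc_product(1.0 / 6.0, f0, k=3.0, n_shifted=4, m_terms=1000))
```

The first $N$ zeros of $s_*$ are located at $t = \pm n/(2kf_0)$, $n = 1, \dots, N$, i.e., $k$ times closer to the origin than those of the sinc function, which is what produces the faster local oscillation over $[-N/2kf_0, N/2kf_0]$.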
3 To Conclude
What can we conclude from the essentially phenomenological considerations developed in this text?
1. First, that there is no single solution for making a frequency “instantaneous” or “local,” with different mathematical options, each one based on a well-founded rationale, leading to results that are sometimes very different and far apart from what physical intuition might lead us to expect.
2. Second, that the two points of view of instantaneity and locality are to be distinguished, with a possibility of reconciling them partially by resorting to “time–frequency” approaches.
3. Third, that the very notion of frequency is to be considered differently depending on whether we are dealing with local or global properties.
Attempting to make frequency instantaneous or local amounts to adding one extra description parameter, namely time, to a quantity which is intrinsically attached to stationary situations. This does not exclude that the frequency components composing a spectrum (the Fourier modes) can perfectly account for nonstationary situations, from a mathematical point of view. How can a nonstationary situation result from the superposition of stationary ones? The answer is to be found in the inter-dependencies of the different modes, i.e., their phase relationships. Looking at the phase spectrum is just another way of looking at time structuration, unraveling the way frequency contributions get organized so as to eventually create localized oscillations. While the sinc function has a flat amplitude spectrum, a different structure of its phase would have resulted in other objects, as different as bandpass filtered white noise (in case of a complete randomness) or linear chirp (for a proper sliding of constructive interferences). Similarly, in the example of
“superoscillations,” it is a subtle arrangement of Fourier modes that locally “pushes” the frequency out of the spectrum. Those remarks give full meaning to L. de Broglie’s assertion [8], according to which⁵ “if we consider a quantity that can be represented, in the Fourier manner, by a superposition of monochromatic components, it is the superposition that has a physical meaning and not the Fourier components considered in isolation.” They also recommend not forgetting phase each time we want physics to meet mathematics, whether directly or indirectly, which brings us back to the seminal paper we started with [18].
References
1. Y. Aharonov, F. Colombo, I. Sabadini, D. Struppa, and J. Tollaksen, “The mathematics of superoscillations,” Mem. Am. Math. Soc., 247, https://doi.org/10.1090/memo/1174, 2017.
2. F. Auger and P. Flandrin, “Improving the readability of time-frequency and time-scale representations by using the reassignment method,” IEEE Trans. on Signal Proc., Vol. 43, No. 5, pp. 1068–1089, 1995.
3. M.V. Berry, “Faster than Fourier,” in Quantum Coherence and Reality; in Celebration of the 60th Birthday of Yakir Aharonov (J.S. Anandan and J.L. Safko, eds.), pp. 55–65, Singapore: World Scientific, 1994.
4. M.V. Berry and N. Moiseyev, “Superoscillations and supershifts in phase space: Wigner and Husimi function interpretations,” J. Phys. A: Math. Theor. A, 47 315204, 2014.
5. M.V. Berry and S. Morley-Short, “Representing fractals by superoscillations,” J. Phys. A: Math. Theor. A, 50 22LT01, 2017.
6. A. Blanc-Lapierre and B. Picinbono, “Remarques sur la notion de spectre instantané de puissance,” Publ. Sci. Univ. Alger, Vol. B 1, pp. 2–32, 1955.
7. B. Boashash (ed.), Time-Frequency Signal Analysis and Processing – A Comprehensive Reference (2nd edn.), Amsterdam, NL: Elsevier, 2016.
8. L. de Broglie, Certitudes et Incertitudes de la Science, Paris, F: Albin-Michel, 1966.
9. L. Cohen, Time-Frequency Analysis, Englewood Cliffs, NJ: Prentice-Hall, 1995.
10. N. Delprat, B. Escudié, Ph. Guillemain, R. Kronland-Martinet, Ph. Tchamitchian, and B. Torrésani, “Asymptotic wavelet and Gabor analysis: Extraction of instantaneous frequencies,” IEEE Trans. on Info. Theory, Vol. 38, No. 2, pp. 644–664, 1992.
11. S. Equis, P. Flandrin, and P. Jacquot, “Phase extraction in speckle interferometry by a circle fitting procedure in the complex plane,” Opt. Lett., Vol. 36, No. 23, pp. 4617–4619, 2011.
12. P.J.S.G. Ferreira, “Superoscillations,” in New Perspectives on Approximation and Sampling Theory: Festschrift in honor of Paul Butzer’s 85th birthday (G. Schmeisser and A. Zayed, eds.), Springer, pp. 247–268, 2014.
13. P. Flandrin, Time-Frequency/Time-Scale Analysis, San Diego, CA: Academic Press, 1999.
14. P. Flandrin, Explorations in Time-Frequency Analysis, Cambridge, UK: Cambridge University Press, 2018.
15. P. Flandrin, F. Auger, and E. Chassande-Mottin, “Time-frequency reassignment – From principles to algorithms,” in Applications in Time-Frequency Signal Processing (A. Papandreou-Suppappola, ed.), Chapter 5, pp. 179–203, Boca Raton, FL: CRC Press, 2003.
16. D. Gabor, “Theory of communication,” J. IEE, Vol. 93, No. 11, pp. 429–457, 1946.
5 Original formulation in French, translation by the author.
17. A. Grossmann and J. Morlet, “Decomposition of Hardy functions into square integrable wavelets of constant shape,” SIAM J. on Math. Anal., Vol. 15, pp. 723–736, 1984.
18. A. Grossmann, R. Kronland-Martinet, and J. Morlet, “Reading and understanding continuous wavelet transforms,” in Wavelets – Time-Frequency Methods and Phase Space (J.M. Combes, A. Grossmann, and Ph. Tchamitchian, eds.), pp. 2–20, Berlin: Springer-Verlag, 1989.
19. M.-K. Hsu, J.-C. Sheu, and C. Hsue, “Overcoming the negative frequencies – Instantaneous frequency and amplitude estimation using osculating circle method,” J. Marine Sc. and Tech., Vol. 19, No. 5, pp. 514–521, 2011.
20. N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.-C. Yen, C.C. Tung, and H.H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proc. Roy. Soc. London A, Vol. 454, pp. 903–995, 1998.
21. N.E. Huang and S.P. Chen (eds.), Hilbert-Huang Transform and Its Applications (2nd edn.), Singapore: World Scientific, 2014.
22. J.F. Kaiser, “On a simple algorithm to calculate the ‘energy’ of a signal,” Proc. IEEE Int. Conf. on Acoust., Speech and Signal Proc. ICASSP-90, Albuquerque (NM), USA, pp. 381–384, 1990.
23. K. Kodera, C. de Villedary, and R. Gendrin, “A new method for the numerical analysis of non-stationary signals,” Phys. Earth and Planet. Int., Vol. 12, pp. 142–150, 1976.
24. P. Loughlin and B. Tacer, “Comments on the interpretation of instantaneous frequency,” IEEE Signal Proc. Lett., Vol. 4, No. 4, pp. 123–125, 1997.
25. S. Mallat, A Wavelet Tour of Signal Processing – The Sparse Way (3rd edn.), Burlington, MA: Academic Press, 2009.
26. L. Mandel, “Interpretation of instantaneous frequencies,” Amer. J. Phys., Vol. 42, pp. 840–846, 1974.
27. J.F. Nye and M.V. Berry, “Dislocations in wave trains,” Proc. Roy. Soc. London A, Vol. 336, pp. 165–190, 1974.
28. W. Qiao, “A simple model of Aharonov-Berry’s superoscillations,” J. Phys. A: Math. Gen., Vol. 29, pp. 2257–2258, 1996.
29. O. Rioul and P. Flandrin, “Time-scale energy distributions: A general class extending wavelet transforms,” IEEE Trans. on Signal Proc., Vol. 40, No. 7, pp. 1746–1757, 1992.
30. H. M. Teager and S. M. Teager, “A phenomenological model for vowel production in the vocal tract,” in Speech Sciences: Recent Advances, pp. 73–109, San Diego, CA: College-Hill Press, 1983.
31. D. Vakman, “On the analytic signal, the Teager-Kaiser energy algorithm, and other methods for defining amplitude and frequency,” IEEE Trans. on Signal Proc., Vol. 44, No. 4, pp. 791–797, 1996.
32. J. Ville, “Théorie et applications de la notion de signal analytique,” Câbles et Transmissions, 2ème A., No. 1, pp. 61–74, 1948.
33. P.M. Woodward and I.L. Davies, “Information theory and inverse probability in telecommunication,” Proc. IEEE, Vol. 99 (Part III), No. 58, pp. 37–44, 1952.
The Unreasonable Effectiveness of Haar Frames Stéphane Jaffard and Hamid Krim
Abstract Alex Grossmann used to point out the relevance of redundant frame decompositions as opposed to the economical setting supplied by orthonormal bases. We investigate here another unexpected occurrence of the relevance of this general paradigm: we show that pointwise and global Hölder regularity can be characterized using the coefficients on the Haar tight frame obtained by using a finite union of shifted Haar bases, despite the fact that the elements composing the frame are discontinuous, whereas this property does not hold for the usual Haar basis.
1 Introduction
When the first author started his PhD in 1986, wavelets had just been introduced; among the very few papers available on the subject stood “Painless nonorthogonal expansions” by I. Daubechies, A. Grossmann and Y. Meyer [11], in which wavelet frames were introduced. These constructions were considered as a compromise between the continuous wavelet transform, which is a very flexible tool whose shape constraints are easily met, and orthonormal wavelet bases, whose use is computationally less costly. The notion of a frame was not new: it had been introduced by R. J. Duffin and A. C. Schaeffer in 1952 as redundant systems which share most of the “good” stability properties displayed by orthonormal bases [16]. Recall that if $H$ is a Hilbert space, a frame is a sequence $(e_n)_{n \in \mathbb{N}}$ of elements of $H$ satisfying
S. Jaffard
Laboratoire d’Analyse et de Mathématiques Appliquées, Université Paris Est Créteil, Créteil, France
e-mail: [email protected]
H. Krim
Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_23
S. Jaffard and H. Krim
$$\exists C, C' > 0, \quad \forall f \in H, \qquad C\,\|f\|^2 \le \sum_n |\langle f | e_n \rangle|^2 \le C'\,\|f\|^2. \tag{1}$$
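As a finite-dimensional illustration of condition (1), the following Python sketch computes the optimal constants $C$ and $C'$ for a finite family of vectors of $\mathbb{R}^2$ as the extreme eigenvalues of the frame operator $\sum_n e_n e_n^T$. It is not part of the original text: the function names and the example families are illustrative assumptions. For three unit vectors at mutual angles $2\pi/3$, both constants come out equal to $3/2$, i.e., the frame is tight.

```python
import math


def frame_bounds(vectors):
    """Optimal constants (C, C') in (1) for a finite family of vectors of R^2.

    They are the smallest and largest eigenvalues of the 2x2 frame operator
    S = sum_n e_n e_n^T, because sum_n <f|e_n>^2 = f^T S f.
    """
    sxx = sum(x * x for x, _ in vectors)
    syy = sum(y * y for _, y in vectors)
    sxy = sum(x * y for x, y in vectors)
    half_trace = 0.5 * (sxx + syy)
    radius = 0.5 * math.hypot(sxx - syy, 2.0 * sxy)
    return half_trace - radius, half_trace + radius


def frame_sum(f, vectors):
    """Middle term of (1): sum over n of |<f|e_n>|^2."""
    return sum((f[0] * x + f[1] * y) ** 2 for x, y in vectors)


# Illustrative family: three unit vectors at mutual angles 2*pi/3.
three_vectors = [(math.cos(2.0 * math.pi * n / 3.0), math.sin(2.0 * math.pi * n / 3.0))
                 for n in range(3)]

if __name__ == "__main__":
    print(frame_bounds(three_vectors))
    f = (0.3, -1.2)
    print(frame_sum(f, three_vectors), 1.5 * (f[0] ** 2 + f[1] ** 2))
```

By contrast, a redundant family such as $\{(1,0), (0,1), (1,0)\}$ is still a frame but is not tight: the same function returns $C = 1$ and $C' = 2$.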
This condition implies that the vectors $(e_n)$ span $H$, and $f$ can be reconstructed from the $e_n$ in a stable way: there exists a dual frame $(g_n)_{n \in \mathbb{N}}$ such that the partial sums $\sum_{n \le N} \langle f | g_n \rangle e_n$ converge to $f$ in $H$ when $N \to +\infty$. The article of Duffin and Schaeffer, however, had remained largely unnoticed (with the notable exception of the book of R. Young on nonharmonic Fourier series [49], which exposed the basic properties of frames); in particular, this notion, though stated in a general Hilbert space setting, had not been used outside of this context. Strangely, for the first author, the immediate consequence of reading “Painless nonorthogonal expansions” was to guide him to the beautiful paper of Duffin and Schaeffer, which led to an improvement on the conditions under which a family of complex exponentials $e^{i\lambda_n t}$ forms a frame of $L^2([a, b])$, see [25]. Another consequence was an introduction to one paper by Alex Grossmann, which was not written in the tough language (at least for a young PhD student in mathematics) of theoretical physics. This quickly led to visits in Marseille to meet Alex and to understand his views on the newborn field of wavelet analysis. The first meeting with Alex was disconcerting: one quickly realized that he was a very unorthodox scientist; his deep insights, often masked by distressingly simple statements and explanations, could easily be missed; however, once one became aware of this and pondered and analyzed his words, especially the ones which sounded the most innocuous, then the magic worked and opened the door to the extremely original world of a great scientist. “Painless nonorthogonal expansions” quickly had a deep impact.
With a 34-year delay, it played the seminal role that could have been played by the Duffin and Schaeffer paper: properties of expansions on frames were investigated, see, e.g., [4, 7] and the references therein for an account of very significant developments of frame theory and its relevance, both in mathematical analysis and in signal and image processing. To illustrate this importance, we point out the necessity of using frames rather than bases in the phase retrieval problem: assume that a signal $f$ (in practice, the setting usually is image processing) is unknown, but the information that can be acquired is $|\langle g_n | f \rangle|$ for a particular set of functions $g_n$. What are the conditions on the $g_n$ to recover $|f|$? If the $g_n$ form an orthonormal basis, this is obviously impossible: all the $\sum_n \varepsilon_n \langle f | g_n \rangle g_n$ with $\varepsilon_n = \pm 1$ are possible solutions; therefore, some representation redundancy is needed. We encourage the reader to check geometrically that in $\mathbb{R}^2$, the beautiful frame supplied by three unit vectors with angles $2\pi/3$ between each other solves the problem. Another important motivation for frames arises in settings where no orthonormal bases are available, as in time–frequency analysis: the famous Balian–Low theorem states that there exist no orthonormal bases of $L^2(\mathbb{R})$ of the form

$$g_{m,n}(t) = e^{2i\pi\alpha m t}\, g(t - \beta n), \qquad m, n \in \mathbb{Z}, \tag{2}$$
where $g$ is smooth and well localized, i.e., which satisfies

$$\int_{\mathbb{R}} x^2 |g(x)|^2\, dx < \infty \quad \text{and} \quad \int_{\mathbb{R}} \xi^2 |\hat{g}(\xi)|^2\, d\xi < \infty. \tag{3}$$
For several years, time–frequency frames similar to those in (2) were therefore the only option to get around the Balian–Low obstruction, see, e.g., [18]. Note that another option later appeared: Wilson and Malvar bases (roughly speaking, the complex exponentials in (2) are replaced by sines and cosines, and a proper choice of $g$, $\alpha$, and $\beta$ yields orthonormal bases, see [9, 12, 39]). These, however, did not diminish the important role of frames in time–frequency analysis: the most adapted signal processing tool that first detected the gravitational wave was a tight frame, see Sect. 4, composed of the union of 16 Wilson orthonormal bases, where the generating function $g$ is in the Schwartz class, and compactly supported in the Fourier domain, see [5, 41]. The use of a (very) redundant frame is motivated by the fact that a gravitational wave has a much more sparse representation in this system than when using one Wilson basis only. Our objective in this chapter bears some similarity with this last point: we will show that a union of Haar bases has unexpected analysis properties which make it fit for some Artificial Intelligence (AI) and learning questions. Some results proved in this chapter have been announced in [28].
2 The Haar Basis
Let $\varphi$ be the characteristic function of $[0, 1]$, which, more precisely, we define as

$$\begin{cases} \varphi(x) = 1 & \text{if } x \in (0, 1), \\ \varphi(0) = \varphi(1) = 1/2, \\ \varphi(x) = 0 & \text{else}, \end{cases} \tag{4}$$

and let

$$\forall x \in \mathbb{R}, \qquad \psi(x) = \varphi(2x) - \varphi(2x - 1). \tag{5}$$

This (slightly unusual) definition for $\varphi$ (and hence $\psi$) is motivated by the fact that we will consider pointwise values of partial sums of Haar series, and it is important that every point is a Lebesgue point of these partial sums, hence the values chosen for $\varphi$ at the end points of its support. The Haar basis on $\mathbb{R}$ is the orthonormal basis of $L^2(\mathbb{R})$ composed of the functions

$$\begin{cases} \varphi(x - k) & \text{for } k \in \mathbb{Z}, \\ 2^{j/2}\,\psi(2^j x - k) & \text{for } j \ge 0 \text{ and } k \in \mathbb{Z}. \end{cases} \tag{6}$$
This system (or, more precisely, its restriction on $[0, 1]$) was introduced by A. Haar in his PhD thesis in 1909 in order to answer a question raised by D. Hilbert; a distressing drawback of Fourier expansions had recently been uncovered: the Fourier series of a continuous function may diverge at some points. Is it the case for all orthonormal systems or is the trigonometric system pathological? A. Haar proved that if $f$ is a continuous function, then the partial sums of the Haar expansion of $f$ converge uniformly. Ironically, an expansion using discontinuous “building blocks” behaves better in terms of regularity than the decomposition using the smooth trigonometric system; this counterintuitive achievement is the first “unreasonable effectiveness” of the Haar basis. Of course, we cannot expect too much from decompositions on this system: the fact that it is composed of discontinuous functions obviously prevents it from being a basis of function spaces composed of continuous functions: indeed, the partial sums of the reconstruction have to belong to the function space considered. To be more precise, we recall which function spaces the Haar system is a basis for. We first provide the appropriate definition.
Definition 3 Let $E$ be a separable Banach space. A sequence $(e_n)_{n \in \mathbb{N}}$ of elements of $E$ is an unconditional basis if it satisfies the following requirements:
1. $\forall f \in E$, there exists a unique sequence of real numbers $(a_n)_{n \in \mathbb{N}}$ such that the partial sums $\sum_{n \le N} a_n e_n$ converge to $f$, i.e.,

$$\left\| \sum_{n=1}^{N} a_n e_n - f \right\|_E \longrightarrow 0 \quad \text{when } N \to +\infty. \tag{7}$$

2. There exists $C > 0$ such that, for any sequence of real numbers $(a_n)_{n \in \mathbb{N}}$, for any sequence $(\varepsilon_n)$ such that $|\varepsilon_n| \le 1$,

$$\left\| \sum \varepsilon_n a_n e_n \right\|_E \le C \left\| \sum a_n e_n \right\|_E. \tag{8}$$
The second requirement ensures the numerical stability of the reconstruction of $f$ using linear combinations of the $e_n$. A key consequence of (8) is that if $f = \sum a_n e_n$, then the norm of $f$ in $E$ is equivalent to a quantity built on the $(a_n)_{n \in \mathbb{N}}$, which actually only depends on the $|a_n|$. This means that $E$ is isomorphic to a sequence space. In statistics, this key property is often referred to as the multiplier property. Note that it is only one of the two properties for an unconditional basis and, in particular, spaces that are not separable may satisfy the multiplier property (though they do not have unconditional bases). This is, e.g., the case for the Hölder $C^\alpha$ spaces if a smooth orthonormal wavelet basis is used, i.e., a basis of the form (6) where $\varphi$ and $\psi$ are sufficiently smooth and well localized (since the $C^\alpha$ spaces are not separable, they cannot have an unconditional basis). These topics have played a central role at the beginning of wavelet theory: the first wavelet basis (before the
term “wavelet” was coined) was introduced by J. O. Strömberg precisely in order to construct unconditional bases for the real Hardy spaces $H^p$ [43]; several chapters of the seminal book [40] of Y. Meyer are devoted to obtaining equivalent sequence norms for large classes of function spaces, and the multiplier property plays a key role for wavelet methods in statistics, see, e.g., [15] and refs. therein. Note that it is very specific to wavelets, and does not hold for other “classical” bases. In particular, in the periodic case, it has been known for a long time that the $L^p$ or $C^\alpha$ norms cannot be characterized by a quantity bearing on the moduli of the Fourier coefficients (except in the Hilbert case, i.e., for $L^2$). Coming back to the Haar basis, the subtle problem of determining which function spaces the Haar system is an unconditional basis for has been completely settled by G. Bourdaud in [2] for Besov spaces, the definition of which we now recall. To that end, we use a smooth wavelet basis so that no restriction on the indices of function spaces is required (one can take the functions $\varphi$ and $\psi$ below in the Schwartz class, as shown by Y. Meyer, see [35]). A wavelet basis on $\mathbb{R}$ is an orthonormal basis of $L^2(\mathbb{R})$, which has the same algorithmic structure as the Haar system (6), but using functions $\varphi$ and $\psi$ which can be smooth and well localized. The orthonormal basis requirement implies that any function $f \in L^2(\mathbb{R})$ can be written as

$$f(x) = \sum_{k \in \mathbb{Z}} c_k\, \varphi(x - k) + \sum_{j \ge 0} \sum_{k \in \mathbb{Z}} c_{j,k}\, \psi(2^j x - k), \tag{9}$$

where the wavelet coefficients of $f$ are

$$c_{j,k} = 2^j \int_{\mathbb{R}} f(x)\, \psi(2^j x - k)\, dx \quad \text{and} \quad c_k = \int_{\mathbb{R}} f(x)\, \varphi(x - k)\, dx. \tag{10}$$
We will use the fact that convergence also holds pointwise: if $f \in L^1(\mathbb{R})$, then the partial sums of the expansion of $f$ converge almost everywhere and in particular at Lebesgue points of $f$, see [44, 47] (note that in the case of the Haar basis, this is a direct consequence of the fact that the partial reconstruction of $f$ up to the scale $j$ is the piecewise constant function on dyadic intervals of length $2^{-j}$, which takes for value the average of $f$ on that interval). One of the equivalent definitions of Besov spaces is given by the following requirement.
Definition 4 Let $\alpha \in \mathbb{R}$ and $p, q \in (0, \infty]$. A tempered distribution $f$ belongs to the Besov space $B_p^{\alpha,q}(\mathbb{R})$ if and only if its wavelet coefficients on a wavelet basis in the Schwartz class satisfy the two conditions: $(c_k) \in \ell^p$ and

$$\sum_{j \in \mathbb{Z}} \left( \sum_{k \in \mathbb{Z}} \left( 2^{(\alpha - 1/p) j}\, |c_{j,k}| \right)^p \right)^{q/p} \le C, \tag{11}$$

using the usual convention for $\ell^\infty$ when $p$ or $q$ is infinite.
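In particular, the global Hölder spaces $C^\alpha(\mathbb{R}) = B_\infty^{\alpha,\infty}(\mathbb{R})$ (for any $\alpha \in \mathbb{R}$), sometimes referred to as Lipschitz spaces, are characterized by the condition

$$(c_k) \in \ell^\infty \quad \text{and} \quad \exists C,\ \forall j, k \quad |c_{j,k}| \le C\, 2^{-\alpha j}. \tag{12}$$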
The characterization supplied by (11) (respectively, (12)) can roughly be interpreted as follows: the fractional derivative of $f$ of order $\alpha$ belongs to $L^p$ (respectively, $L^\infty$), see [40]. Bourdaud’s theorem states that the Haar system is an unconditional basis of the Besov space $B_p^{\alpha,q}(\mathbb{R})$ (and (11) holds) if and only if

$$\frac{1}{p} - 1 < \alpha < \min\left(\frac{1}{p}, 1\right).$$

Note that Bourdaud’s result is much more general and applies to large classes of wavelet bases (and it is actually expressed in the slightly different setting of homogeneous Besov spaces, a subtlety that we do not need to consider here). It is remarkable that this result is sharp, and the reason for which the Haar system is not a basis for larger values of $\alpha$ is the most obvious one: the Haar function $\psi$ given by (5) no longer belongs to the corresponding space. Note that this trivial obstruction also prevents (11) from yielding a wavelet characterization of Besov spaces in that case; indeed, the Haar function only has one nonvanishing coefficient on the Haar basis, and therefore its coefficients obviously satisfy (11); therefore, if this characterization held for the Haar basis, it would follow that the Haar function belongs to the corresponding Besov space, which is not the case. Despite the limitation due to its irregularity, the Haar system has been of constant use in signal and image processing. A purpose of this work is to show that this limitation can be mitigated by using a frame instead of an orthonormal basis, thus taking advantage of redundancy (as in phase reconstruction or gravitational wave detection). In order to address this question more precisely, we note that the information within the definition of an unconditional basis may be split into two different points:
1. The analysis problem: is it possible to characterize the fact that a function belongs to a function space by a condition on the moduli of its coefficients on the analyzing system, as given, e.g., by (11) in the case of Besov spaces?
2. The synthesis problem: when does the partial sum reconstruction formula (7) (or (9) in the case of wavelet bases) hold in the corresponding function space?
It is clear that the second point cannot be improved by using a redundant system; indeed, as soon as the “building blocks” do not belong to the function space, the norms in (7) are not even defined. However, there may be some room for improvement concerning the first point: more information would be unveiled if (11) holds for the coefficients on a redundant system, and one might expect that it can be converted into some regularity information on the function allowing to go “beyond” the limitation of Bourdaud’s theorem. Several results actually back this intuition.
The Unreasonable Effectiveness of Haar Frames
The first one is the fact that $C^\alpha(\mathbb{R})$ regularity can be characterized with the help of the continuous wavelet transform even if a nonsmooth wavelet is used: let the wavelet $\psi$ satisfy
\[ |\psi(t)| \le \frac{C}{1+|t|^2} \quad\text{and}\quad \int_{\mathbb{R}} \psi(t)\,dt = 0. \tag{13} \]
The continuous wavelet transform of a function $f \in L^\infty \cap L^1_{loc}$ is defined by
\[ \forall a > 0 \text{ and } b \in \mathbb{R},\qquad C_\psi(f, a, b) = \int_{\mathbb{R}} f(t)\,\psi\Bigl(\frac{t-b}{a}\Bigr)\,\frac{dt}{a}, \]
see [20, 21]. A key point is that the function f can be reconstructed from its continuous wavelet transform using a reconstructing wavelet $\varphi$ which may differ from $\psi$: under weak conditions on $\varphi$, one has
\[ f(t) = C \int_0^\infty\!\!\int_{\mathbb{R}} C_\psi(f, a, b)\,\varphi\Bigl(\frac{t-b}{a}\Bigr)\,\frac{db\,da}{a^2}. \]
In particular, one can use a $C^1$ compactly supported function $\varphi$, which makes it possible to prove the following characterization, which holds without regularity assumption on $\psi$, see, e.g., [10, 22].

Proposition 1 Let $f \in L^\infty \cap L^1_{loc}$. Let $\alpha \in (0, 1)$. Then, $f \in C^\alpha(\mathbb{R})$ if and only if its continuous wavelet transform computed with the Haar wavelet satisfies
\[ \exists C > 0,\quad \forall (a, b) \in \mathbb{R}^{*,+} \times \mathbb{R},\qquad |C_\psi(f, a, b)| \le C\,a^\alpha. \]
Another motivation of this chapter is found in the theory of approximation: let $\alpha \in\, ]0, 1[$ and let $f \in L^\infty([0, 1])$. Denote by $f_n$ the best $L^\infty$ approximation of f by piecewise constant functions on the intervals $[k/n, (k+1)/n]$; the error $\| f - f_n \|_\infty$ decays as $n^{-\alpha}$ if and only if $f \in C^\alpha([0, 1])$. This is remarkable, since the approximants $f_n$ are not even continuous, and it is due to the fact that the discontinuities of the $f_n$ do not fall at the same points but are interlaced; of course, no such result holds if one considers nested sets of discontinuities, as in, e.g., dyadic approximation, i.e., if one restricts to the $f_{2^j}$, see, e.g., Chap. 12 Sec. 2 of [14]. Note that for the sake of simplicity, we stated this problem within the framework of $L^\infty$ approximation of $C^\alpha$ functions using piecewise constant functions. However, the general case of $L^p$ approximation of functions in a Besov space using splines of arbitrary order is studied in [14]. Before considering the problem of regularity characterization using Haar frames, we first recall motivations that recently arose in AI and learning using Haar systems.
3 Relevance of Frames and Haar Decompositions in Machine Learning

In addition to their established and useful role in mathematical analysis, frame decompositions and their associated extensions are widely sought in applied sciences. These decompositions are often a preferred and well-adapted tool to zoom in on and unveil useful features and properties, which are invariably critical not only to uncovering algorithmic solutions to many data-driven problems but also to simplifying the ensuing computational challenges. As noted earlier, scale-based information in data, which has long played a key role in signal processing and statistics, has recently emerged in so-called Machine Learning (primarily nonlinear and neural network-based), yielding creative and promising exploitation of wavelet-based information. The Convolutional Neural Network (CNN), see, e.g., [33], broadly referred to as Deep Learning (DL), is a more refined and better performing neural network consisting of layers of banks of linear filters. The output of each of these filters (i.e., the convolution of a signal with the impulse response of the filter, also reflected by the parameters of the latter) is nonlinearly transformed. Their parameters (the linear weights of each of the filters), together with the layer biases, are iteratively adapted by the gradient of a global output objective appropriately adjusted to the respective layers. The nonlinear transformation is aimed at capturing higher order information modes present in the data. While a notion of progressive scale in the analysis may be present, the filter parameters are unlike those of a multi-resolution analysis in wavelet analysis, where the basis functions are canonical and known a priori. While DL is predominantly used in inference applications where class labels are learned/approximated during training, a larger class of applications has been explored.
Its pervasive success has, however, been primarily empirical in the computing and engineering sciences, and the mathematical theory has much left to achieve. While regularity and smoothness issues are not of obvious relevance in the primary inference applications of DL, they are of great importance in generative and reconstruction (e.g., super-resolution) problems of interest. The learned filters in the respective layers of DL provide an approximation of the input data/function at the associated scale. The corresponding nonlinear activation functions may be interpreted as a further polynomial enrichment [42] at the varying scales of the learning process. Other important applications include the so-called super-resolution problem of increasing the resolution quality of images, which has highlighted the importance of judiciously selecting the proper scales (still an open problem) in order to improve performance. The DL layer-wise filter bank structure effectively provides a frame-like representation whose redundancy is predetermined by the chosen number of filters and layers and further refined by the gradient-based adaptation [46].
3.1 Toward a Mathematical Formalism of DL

The pursuit of a formalism of DL has led Cheng et al. [6] to provide an alternative interpretation of CNN as a wavelet function-based multi-scale frame optimization of data (referred to as Scattering Networks). The computational efficiency of this systematic and optimal representation selected from a wavelet frame representation (for a selected wavelet function) was shown to provide a near translation invariance and a viable inference framework. In contrast to CNN, the a priori choice of a canonical analyzing wavelet function, and the resulting non-optimal feature/structure matching, limited its performance. Subsequent and recent data-driven and optimal over-complete representations [37, 38] using Dictionary learning ultimately proved to be very competitive with CNN, with additional robustness to so-called adversarial noise. This re-interpretation of CNN sought, for a large set of data classes, an optimal selection of data-driven atoms in a frame using the LASSO (Least Absolute Shrinkage and Selection Operator) algorithm [45]. This consisted of securing, for a given function, a sparse set of atoms from an over-complete set of functions according to an optimal $L^2$ reconstruction error. This was hierarchically pursued in each layer of a network, and in contrast to the prior choice of filters/atoms made in CNN, the number of atoms is determined by the LASSO optimization criterion. The sparsity constraint on the representation coefficients also impacted the robustness of the network [38].
3.2 Graph-Based Deep Learning

The notion of CNN was extended to data graph structures [3], where the Laplacian played a central role in the deep structure representation of a graph. Starting with the fact that a complex exponential is an eigenfunction of a Laplacian, it is argued that eigenvectors of a graph Laplacian can be used to construct an orthonormal basis for the space of functions defined on the vertices of a graph. Much like an inner product may be defined on such a space, a graph convolution may be defined. Using the basic duality property of a convolution and product in a space and its Fourier dual, the authors propose to exploit the eigenspectrum of the graph Laplacian to efficiently obtain the so-called graph convolution and hence yield a formalism for Graph-based DL. In a similar way, Li et al. [36] constructed a computationally more efficient graph-based DL using their so-called Haar Convolution, for which a Haar decomposition is performed at a much lower computational cost, including the inversion. The implications of the present work for CNN and Deep Learning will be developed in [32]. In addition to the noted computational advantage, an increasing number of other fast frame-based graph approaches are being explored [48, 50]. A particular future research interest would be to carry the efficiency of this orthonormal Haar representation on Graph DL over to a Haar frame representation and thereby inject the analysis with the properties developed in this chapter.
The regularity results of a Haar system in this chapter present a new potential for further refining Graph DL and for additionally providing a new perspective on computational impacts, such as invariance to translation, which is critical to data-inherent transient changes.
4 Haar Frame

We first define the frame that we will use. The Haar basis (9) is an orthogonal basis of $L^2(\mathbb{R})$. We add to it the two orthogonal bases obtained by shifting the elements of the Haar basis by $1/3$ and $2/3$. This means that our analyzing system is composed of the
\[ I_k(x) = \varphi\Bigl(x - \frac{k}{3}\Bigr) \quad\text{and}\quad H_{j,k}(x) = \psi\Bigl(2^j x - \frac{k}{3}\Bigr), \qquad j, k \in \mathbb{Z}. \tag{14} \]
After a correct normalization, the $I_k$ together with the $2^{j/2} H_{j,k}$ form a union of three orthonormal bases; therefore, they constitute a tight frame, i.e., a frame for which the inequalities in (1) are equalities; more precisely, it satisfies
\[ \forall f \in L^2(\mathbb{R}),\qquad \| f \|_{L^2}^2 = \frac{1}{3} \Bigl( \sum_{k} |\langle f | I_k \rangle|^2 + \sum_{j,k} |\langle f | 2^{j/2} H_{j,k} \rangle|^2 \Bigr). \tag{15} \]
A similar wavelet frame-driven approach was first exploited to optimize the translation invariance of an orthonormal wavelet basis representation of a function [23, 31]. Indeed, one well-documented drawback of using an orthonormal wavelet basis in signal and image processing is that it does not supply a translation-invariant representation but depends on the particular discrete dyadic grid which is chosen. This drawback can be mitigated by oversampling this dyadic grid, thus replacing the initial orthonormal basis by a finite union of orthonormal bases. The choice supplied by (14) corresponds to an equally spaced oversampling by a factor 3. The success of the so-called multi-scale representation (a wavelet basis representation over a number of scales) in many practical scenarios, such as signal detection in radar, reconstruction of a function/signal in noise (i.e., denoising), or parameter estimation in communication, is highly dependent on the translation invariance of the transformation. As an example, a continuous wavelet transform or a wavelet frame (i.e., redundant) representation of a signal guarantees that a time delay estimation or a detection of a very short transient will be successfully achieved. While an orthogonal wavelet representation carries several important statistical properties (such as non-correlated coefficients) as well as parsimony, it may turn out to be unable to detect a short transient or estimate translation-sensitive parameters. Toward mitigating such a limitation, an algorithm seeking a translation-invariant wavelet representation of a given function was proposed, by searching for
the best samples to prune along the tree-based structure of the frame, thus selecting at each scale the even/odd elements for orthogonalization. This exploitation of a redundant wavelet representation was similarly and later illustrated in [8] as what may be viewed as an “averaging procedure” for improved denoising and referred to as cycle-spinning. The temporal delay of a signal, ideally irrelevant to a successful reconstruction of a signal, is thus central to a translation-invariant representation, and its reflection in the proper coefficients thus becomes essential. Finally, note that wavelet frames with similar properties have been developed independently of the wavelet theory by M. Frazier and B. Jawerth under the denomination of the $\varphi$-transform, cf. [19] and the references therein.
4.1 Uniform Regularity

The first problem we consider is the characterization of the uniform Hölder spaces $C^\alpha(\mathbb{R})$ on Haar frame coefficients
\[ C_k = \int f(x)\,I_k(x)\,dx \quad\text{and}\quad c_{j,k} = 2^j \int f(x)\,H_{j,k}(x)\,dx. \tag{16} \]
Recall that these spaces coincide with the Besov spaces $B^{\alpha,\infty}_\infty(\mathbb{R})$, see [40]; if $0 < \alpha < 1$, an equivalent definition is given by
\[ f \in L^\infty(\mathbb{R}) \quad\text{and}\quad \exists C,\ \forall x, y \in \mathbb{R},\qquad |f(x) - f(y)| \le C\,|x - y|^\alpha, \tag{17} \]
thus defining a norm equivalent to the $C^\alpha(\mathbb{R})$ norm. We first note that a characterization of $C^\alpha(\mathbb{R})$ on the Haar basis coefficients cannot hold because it would be satisfied by the Haar function itself (the coefficients of which all vanish except for one), and the Haar function is discontinuous and therefore does not belong to $C^\alpha(\mathbb{R})$. Nonetheless, Theorem 1 below shows that such a characterization holds if one uses the coefficients on the Haar frame (14). We will make the following regularity assumptions on the functions we will consider.

Definition 5 Let f be a locally bounded function; f is Lebesgue-regular if every point is a Lebesgue point of f, i.e.,
\[ \forall x \in \mathbb{R},\qquad f(x) = \lim_{r \to 0} \frac{1}{2r} \int_{x-r}^{x+r} f(t)\,dt. \]
Note that this definition implicitly makes the assumption that functions are defined “point to point” and not “except for a set of vanishing measure.” Continuous functions are of course Lebesgue-regular, but this class also allows for discontinuities. For instance, assume that at every point x, f has a right and a left limit at x and that at every discontinuity point $x_0$, f satisfies
\[ f(x_0) = \frac{1}{2} \Bigl( \lim_{x \to x_0^+} f(x) + \lim_{x \to x_0^-} f(x) \Bigr); \]
then, f is clearly Lebesgue-regular. A key property that we will use is that the wavelet series of a Lebesgue-regular function f converges everywhere to f; this follows from the fact that if $f \in L^1(\mathbb{R})$, then the partial sums of f converge at its Lebesgue points, see [44, 47]. We need to be more precise here concerning the meaning of the characterization of $C^\alpha(\mathbb{R})$ by (17). Let $f \in L^1_{loc} \cap L^\infty$; then, its wavelet coefficients are well defined, and, if they satisfy (12), then the wavelet series of f converges uniformly toward a function g which coincides almost everywhere with f and satisfies (17). It follows that $f(x) = g(x)$ at every point x if f is defined “point by point” and is Lebesgue-regular. These precautions, which may seem superfluous, are required if one wants to avoid absurd statements such as “the wavelet coefficients of the characteristic function of the rationals all vanish, and therefore it is a $C^\infty$ function.”

Theorem 1 Let $\alpha \in (0, 1)$. Let f be a Lebesgue-regular function; $f \in C^\alpha(\mathbb{R})$ if and only if its Haar frame coefficients (16) satisfy
\[ \exists C,\quad \forall k,\ |C_k| \le C \quad\text{and}\quad \forall j, k,\ |c_{j,k}| \le C\,2^{-\alpha j}. \tag{18} \]
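The obstruction mentioned before Theorem 1 can be checked by hand on the Haar function itself. The following sketch (exact rational arithmetic; the helper names are hypothetical and not taken from the chapter) computes the coefficients $c_{j,k}$ of (16) for $f = \psi$, assuming $\psi$ equals 1 on [0, 1/2), −1 on [1/2, 1) and 0 elsewhere. Indices k divisible by 3 correspond to the unshifted Haar basis.

```python
from fractions import Fraction


def haar_primitive(x):
    """Antiderivative of the Haar function psi, vanishing outside (0, 1)."""
    if x <= 0 or x >= 1:
        return Fraction(0)
    return x if x <= Fraction(1, 2) else 1 - x


def frame_coefficient(j, k):
    """c_{j,k} = 2^j * integral of psi(x) H_{j,k}(x) dx, H_{j,k}(x) = psi(2^j x - k/3)."""
    a = Fraction(k, 3 * 2**j)             # left end of the support I_{j,k}
    m = a + Fraction(1, 2**(j + 1))       # midpoint of I_{j,k}
    b = a + Fraction(1, 2**j)             # right end of I_{j,k}
    return 2**j * (2 * haar_primitive(m) - haar_primitive(a) - haar_primitive(b))


def sup_coefficient(j, basis_only):
    """Largest |c_{j,k}| over the supports meeting [0, 1]; optionally basis indices only."""
    ks = range(-3 * 2**j, 3 * 2**j + 1)
    return max(abs(frame_coefficient(j, k))
               for k in ks if not basis_only or k % 3 == 0)
```

For every scale j ≥ 1, the supremum restricted to the basis indices vanishes, while the supremum over all frame indices stays equal to 2/3: the shifted systems see the jump of $\psi$ at 1/2, so the frame coefficients do not satisfy (18) for any $\alpha > 0$.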
Remarks
1. This is the same characterization as if we were using a smooth wavelet basis: everything happens as if the Haar basis was composed of smooth wavelets.
2. It follows from the proof that the quantity
\[ \sup_{k} |C_k| + \sup_{j,k} |2^{\alpha j} c_{j,k}| \]
supplies a norm which is equivalent to the $C^\alpha$ norm.
3. If f is a bounded locally integrable function, it follows that if f satisfies (18), then f coincides a.e. with a $C^\alpha$ function (which, thus, is Lebesgue-regular).

Proof of Theorem 1 Assume that $f \in C^\alpha(\mathbb{R})$. Then,
\[ |C_k| = \Bigl| \int f(x)\,I_k(x)\,dx \Bigr| \le \| I_k \|_{L^1}\, \| f \|_{L^\infty} \le \| f \|_{L^\infty}, \]
and hence the first statement in (18) holds. Let
\[ I_{j,k} = \Bigl[ \frac{k}{3 \cdot 2^j},\ \frac{k}{3 \cdot 2^j} + \frac{1}{2^j} \Bigr] \tag{19} \]
be the support of $H_{j,k}$. We denote by $I^-_{j,k}$ the left half of $I_{j,k}$ and by $I^+_{j,k}$ the right half of $I_{j,k}$; the right half is the left half shifted by $2^{-j-1}$. Then,
\[ c_{j,k} = 2^j \int_{I_{j,k}} f(x)\,H_{j,k}(x)\,dx = 2^j \int_{I^-_{j,k}} f(x)\,dx - 2^j \int_{I^+_{j,k}} f(x)\,dx = 2^j \int_{I^-_{j,k}} \bigl( f(x) - f(x + 2^{-j-1}) \bigr)\,dx, \]
so that
\[ |c_{j,k}| \le 2^j \int_{I^-_{j,k}} |f(x) - f(x + 2^{-j-1})|\,dx \le C\,2^{-\alpha j}, \]
and hence the second statement in (18) holds. Conversely, assume that (18) holds. Since f is Lebesgue-regular, we can use the reconstruction formula for the Haar orthonormal wavelet basis only (which converges everywhere toward the pointwise value of f); thus,
\[ \forall x \in \mathbb{R},\qquad |f(x)| \le \sum_{k\in\mathbb{Z}} |c_k|\,|\varphi(x-k)| + \sum_{j\ge 0}\sum_{k\in\mathbb{Z}} |c_{j,k}|\,|\psi(2^j x-k)| \le C + C \sum_{j=0}^{\infty} 2^{-\alpha j}, \]
so that $f \in L^\infty$. Let us now estimate increments of f. We have three possible reconstruction formulas for f using any of the three orthonormal bases composing the tight frame; the idea of the proof is to use this extra flexibility. Let $x \ne y$ be given. Define J by
\[ \frac{1}{4}\,2^{-J} \le |x - y| < \frac{1}{2} \cdot 2^{-J}. \]
Consider now the intervals $I_{J,k}$. Since these intervals are of length $2^{-J}$ and are deduced from one another by shifts of $\frac{1}{3} \cdot 2^{-J}$, at least one of them contains both points x and y; we denote it by $I_{J,k_J}$. We now use either the Haar basis or one of its two “sisters” shifted by 1/3, the choice being driven by the fact that the interval $I_{J,k_J}$ that we picked is the support of an element $H_{J,k}$ of the chosen basis. This implies that for all generations $j < J$, either the support of an $H_{j,k}$ of this basis does not contain x and y or x and y are in the same “half” of the support of $H_{j,k}$. Therefore, let us use the reconstruction formula using this orthonormal basis (and let us denote its elements by $\varphi_k$ and $\psi_{j,k}$); since it converges everywhere toward the pointwise value of f, we get
\[ f(x) - f(y) = \sum_{k} C_k \bigl( \varphi_k(x) - \varphi_k(y) \bigr) + \sum_{j \le J} \sum_{k} c_{j,k} \bigl( \psi_{j,k}(x) - \psi_{j,k}(y) \bigr) + \sum_{j > J} \sum_{k} c_{j,k} \bigl( \psi_{j,k}(x) - \psi_{j,k}(y) \bigr). \]
Because of our choice of the basis, it follows that
\[ \forall k,\ \varphi_k(x) = \varphi_k(y) \quad\text{and}\quad \forall j \le J,\ \forall k,\ \psi_{j,k}(x) = \psi_{j,k}(y); \]
therefore,
\[ |f(x) - f(y)| \le \sum_{j > J} \sum_{k} |c_{j,k}|\,|\psi_{j,k}(x) - \psi_{j,k}(y)|. \]
At each generation j, at most two terms bring a contribution; using (18), we get
\[ |f(x) - f(y)| \le \sum_{j > J} C\,2^{-\alpha j} \le C\,2^{-\alpha J} \le C\,|x - y|^\alpha, \]
so that Theorem 1 holds.

As pointed out to us by Albert Cohen, the argument of selecting the “right” interval, which includes both x and y, is similar to the one developed in the mixing lemma, used in order to derive the optimal order of approximation of functions in a Besov space by sequences $f_n$ of piecewise constant functions on the intervals $[k/n, (k+1)/n]$, see, e.g., Chap. 12 Sec. 2 of [14]. Nonetheless, note that the approximants supplied by the three shifted Haar systems form a much more economical sequence and make it possible to obtain the same order of approximation characterization result, as we shall now show. We now circle back to the problem that we raised at the end of Sect. 2. We will check that the order of approximation supplied by the piecewise constant functions on dyadic intervals and on dyadic intervals shifted by 1/3 is as good as if we used the whole sequence of piecewise constant functions on the intervals $[k/n, (k+1)/n]$. Let us be more precise. Let $f \in L^\infty$ and denote by $P_j(f)$ its partial reconstruction at scale $2^{-j}$ using the Haar basis, and let $Q_j = P_{j+1} - P_j$. If $f \in C^\alpha$, then it follows from Theorem 1 that its Haar basis coefficients satisfy $|c_{j,k}| \le C\,2^{-\alpha j}$. Since $Q_j(f) = \sum_k c_{j,k}\,\psi_{j,k}$, it follows that $|Q_j(f)(x)| \le C\,2^{-\alpha j}$. Therefore, since
\[ f - P_j(f) = \sum_{l \ge j} Q_l(f), \]
it follows that
\[ \| f - P_j(f) \|_\infty \le C\,2^{-\alpha j}. \tag{20} \]
Of course, this condition is not sufficient by itself, as shown by picking for f the Haar wavelet, which satisfies (20) but has no positive Hölder regularity. We now check that it is sufficient to add the same requirement on the projections on the shifted Haar systems in order to obtain a necessary and sufficient condition. Let $P^i_j(f)$ ($i = 1, 2, 3$) denote (for $i = 1$) the projection of f on the Haar system at scale j (i.e., $P^1_j = P_j$) and (for $i = 2, 3$) the projection of f on the translates, respectively, by 1/3 and −1/3 of the Haar system at scale j. We shall prove the following result.

Proposition 2 Let $f \in L^\infty$. The condition
\[ \exists C > 0,\quad \forall i = 1, 2, 3,\quad \forall j \ge 0,\qquad \| f - P^i_j(f) \|_\infty \le C\,2^{-\alpha j} \tag{21} \]
is equivalent to $f \in C^\alpha$.
Proof By (20), we already know that if $f \in C^\alpha$, then $\| f - P^1_j(f) \|_\infty \le C\,2^{-\alpha j}$, and by translating the Haar system, the same result holds for $f - P^2_j(f)$ and $f - P^3_j(f)$. Conversely, assume that (21) holds. Note that
\[ Q_j(f) = \bigl( f - P_j(f) \bigr) - \bigl( f - P_{j+1}(f) \bigr). \]
Therefore,
\[ \| Q^1_j(f) \|_\infty \le C\,2^{-\alpha j}, \]
but, on the support of $\psi^1_{j,k}$, $|Q^1_j(f)(x)| = |c^1_{j,k}|$. Thus, we obtain that
\[ \exists C > 0,\quad \forall j,\ \forall k,\qquad |c^1_{j,k}| \le C\,2^{-\alpha j}. \tag{22} \]
The same argument applies to $Q^2_j(f)$ and $Q^3_j(f)$, so that all the Haar frame coefficients of f satisfy (22); we can now apply Theorem 1, which implies that $f \in C^\alpha$.
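The interval-selection step on which the proof of Theorem 1 rests (and which Proposition 2 inherits) can be made concrete. The sketch below is only an illustration under the conventions of (19); the function name `common_interval` is hypothetical. Given two distinct points, it returns the scale J defined by $\frac14 2^{-J} \le |x-y| < \frac12 2^{-J}$ and an index k such that $I_{J,k}$ contains both points.

```python
import math
from fractions import Fraction


def common_interval(x, y):
    """Return (J, k) with 2^-J / 4 <= |x - y| < 2^-J / 2 and
    x, y both in I_{J,k} = [k / (3 * 2^J), k / (3 * 2^J) + 2^-J]."""
    x, y = Fraction(x), Fraction(y)
    gap = abs(x - y)
    if gap == 0:
        raise ValueError("x and y must be distinct")
    J = 0
    while gap >= Fraction(2) ** (-J - 1):   # interval too short: coarsen the scale
        J -= 1
    while gap < Fraction(2) ** (-J - 2):    # interval needlessly long: refine the scale
        J += 1
    step = Fraction(2) ** (-J) / 3          # spacing between left ends of the I_{J,k}
    k = math.floor(min(x, y) / step)        # last left end not exceeding min(x, y)
    left = k * step
    # min(x, y) - left < 2^-J / 3 and |x - y| < 2^-J / 2, hence both points fit.
    assert max(x, y) <= left + Fraction(2) ** (-J)
    return J, k
```

The residue of k modulo 3 indicates which of the three shifted Haar systems has $I_{J,k}$ among its supports; for instance, `common_interval(Fraction(3, 10), Fraction(2, 5))` gives J = 2 and k = 3, i.e., the unshifted basis.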
4.2 Pointwise Regularity: The Haar Basis

We will now consider the problem of characterizing pointwise regularity. We start by recalling this notion and the existing partial results if the Haar basis is used.

Definition 6 Let $x_0 \in \mathbb{R}$ and $\alpha \ge 0$. A locally bounded function $f : \mathbb{R} \to \mathbb{R}$ belongs to $C^\alpha(x_0)$ if there exist $C, R > 0$ and a polynomial $P_{x_0}$ with $\deg(P_{x_0}) < \alpha$ such that
\[ \text{for a.e. } x \in B(x_0, R),\qquad |f(x) - P_{x_0}(x)| \le C\,|x - x_0|^\alpha. \tag{23} \]
The pointwise Hölder exponent of f at $x_0$ is $h_f(x_0) = \sup\{\alpha : f \in C^\alpha(x_0)\}$.
The polynomial $P_{x_0}$ is unique; it is called the Taylor polynomial of f at $x_0$. When using smooth wavelets, criteria based on the wavelet coefficients on an orthonormal wavelet basis make it possible to recover pointwise regularity; let us recall some notations. A dyadic interval is an interval of the form
\[ \lambda\ (= \lambda_{j,k}) = \Bigl[ \frac{k}{2^j},\ \frac{k+1}{2^j} \Bigr). \]
Let $\lambda$ be a dyadic interval; $3\lambda$ denotes the interval of same center as $\lambda$ and three times wider. Wavelet coefficients can therefore be indexed by dyadic intervals: we will write $c_\lambda := c_{j,k}$.

Definition 7 Let f be a locally bounded function, and let the $(\varphi_k)$ and $(\psi_{j,k})$ generate a smooth wavelet basis. The wavelet leaders of f are the quantities
\[ d_\lambda = \sup_{\lambda' \subset 3\lambda} |c_{\lambda'}|. \]
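A minimal sketch of how the wavelet leaders of Definition 7 can be computed from a finite table of coefficients follows. The data layout (a dictionary keyed by `(j, k)`) and the function name `wavelet_leader` are assumptions made for illustration; with finitely many scales, the supremum is of course only taken over the available coefficients.

```python
def wavelet_leader(coeffs, j, k):
    """Leader d_lambda for lambda = [k / 2^j, (k + 1) / 2^j): the largest |c_{j',k'}|
    over the dyadic intervals lambda' = [k' / 2^j', (k' + 1) / 2^j') inside 3 * lambda."""
    best = 0.0
    for (jp, kp), c in coeffs.items():
        if jp < j:
            continue                      # lambda' would be wider than lambda
        scale = 2 ** (jp - j)
        # 3 * lambda = [(k - 1) / 2^j, (k + 2) / 2^j), rewritten at scale j'
        if (k - 1) * scale <= kp and kp + 1 <= (k + 2) * scale:
            best = max(best, abs(c))
    return best


# Hypothetical coefficient table, used only to exercise the function.
example = {(0, 0): 0.1, (1, 1): 0.5, (2, 7): 0.9, (2, 12): 2.0}
```

With this table, the leader of $\lambda_{0,0}$ picks up the coefficient 0.9 located in a neighbouring interval at a finer scale, whereas the leader of $\lambda_{1,1}$ is 0.5.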
If the generating wavelet is smooth, then wavelet leaders make it possible to estimate pointwise Hölder exponents, see [24] for the initial 2-microlocal wavelet coefficients criterion and [27] for its reformulation in terms of wavelet leaders, which we now recall. We denote by $\lambda_j(x_0)$ the dyadic interval of width $2^{-j}$ which contains $x_0$.

Theorem 2 Let $f \in C^\varepsilon(\mathbb{R})$ for an $\varepsilon > 0$. If the generating wavelets $\varphi$ and $\psi$ belong to $C^N(\mathbb{R})$, then
\[ \forall x_0 \in \mathbb{R}:\quad \text{if } h_f(x_0) < N, \text{ then}\quad h_f(x_0) = \liminf_{j \to +\infty} \frac{\log\bigl( d_{\lambda_j(x_0)} \bigr)}{\log(2^{-j})}. \tag{24} \]
This question is more difficult to answer when using an irregular wavelet basis. To our knowledge, it was only tackled in [29] by B. Mandelbrot and the first author. Their motivation was the determination of the pointwise regularity of the Pólya function, a famous example of a continuous “Peano-type” space-filling function. A remarkable property is that the coefficients of the Pólya function on the Schauder basis (which consists of the primitives of the Haar basis) are given explicitly by Bernoulli binomial coefficients. Thus, the study of its regularity can be reduced to the understanding of which regularity properties of a function can be derived from its decomposition on a wavelet basis of limited smoothness; indeed, though it is not exactly a wavelet basis, the Schauder basis has the same algorithmic form as a wavelet basis, and the same type of regularity results, such as Theorem 2, apply, as long as the regularity exponents involved are below the regularity of the elements of the Schauder basis (which are continuous piecewise linear functions and therefore are Lipschitz functions), see [29]. Of course, no such result can hold for regularity exponents higher than the regularity of the wavelet, for the same reasons as in the uniform regularity case: analyzing one element of the basis using the same basis could not, by construction, detect the irregularity of this element at some points. Let us first recall the results concerning pointwise regularity for the Haar basis. We will need the following notions. Recall that a dyadic rational is a point of the form $k/2^j$, for $j, k \in \mathbb{Z}$.

Definition 8 Let $x \in \mathbb{R}$. The rate of approximation of x by dyadic rationals is
\[ r(x) = \limsup_{j \to +\infty} \frac{\log\bigl( \mathrm{dist}(x, 2^{-j}\mathbb{Z}) \bigr)}{\log(2^{-j})}. \tag{25} \]
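To get a feeling for Definition 8, one can evaluate the ratio appearing in (25) at finite scales. The sketch below does so with exact rational arithmetic for the distance; the function names are hypothetical, and a finite-scale ratio is of course only an indication of the limsup, not its value.

```python
import math
from fractions import Fraction


def dyadic_distance(x, j):
    """Distance from x to the grid 2^-j Z, computed exactly for rational x."""
    scaled = Fraction(x) * 2**j
    frac = scaled - math.floor(scaled)    # fractional part of 2^j x
    return min(frac, 1 - frac) / 2**j


def approximation_ratio(x, j):
    """The quotient log dist(x, 2^-j Z) / log 2^-j at a single scale j >= 1."""
    d = dyadic_distance(x, j)
    if d == 0:
        return math.inf                   # x lies on the grid at this scale
    return math.log(d) / math.log(2.0 ** -j)
```

For x = 1/3, the distance to $2^{-j}\mathbb{Z}$ is exactly $2^{-j}/3$, so the ratio equals $1 + \log_2(3)/j$ and tends to 1; for a dyadic rational, the ratio is infinite from some scale on.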
Every x satisfies $r(x) \ge 1$, and almost every x satisfies $r(x) = 1$. We recall the following result of [29].

Proposition 3 Let f be a locally bounded function, and $x_0 \in \mathbb{R}$. If $f \in C^\alpha(x_0)$ for $\alpha < 1$, then its wavelet leaders on the Haar basis satisfy
\[ |d_{\lambda_j(x_0)}| \le C\,2^{-\alpha j}. \tag{26} \]
Conversely, if (26) holds and if the Haar basis coefficients $c_{j,k}$ satisfy the uniform decay assumption
\[ \exists \varepsilon, C > 0:\qquad |c_{j,k}| \le C\,2^{-\varepsilon j}, \tag{27} \]
then
\[ h_f(x_0) \ge \frac{\alpha}{r(x_0)}. \]
This result yields the best possible pointwise regularity that can be inferred from the knowledge of the size of the Haar basis coefficients. Since $r(x) = 1$ a.e., if (27) holds, then this result yields the exact pointwise regularity at almost every point. Besides dyadic points (where it is clear that decay estimates on the Haar coefficients cannot yield an estimate of pointwise regularity), the estimate of $h_f(x_0)$ yielded by Proposition 3 deteriorates if $r(x_0)$ is large, i.e., if there exists a sequence of scales such that $x_0$ is “very close” to dyadic points at these scales. This also indicates that estimates on the Haar frame coefficients might not suffer from this drawback; indeed, if $x_0$ is “close” to dyadic points at certain scales, it will be “far” from the dyadic points shifted by $1/3$ (at the same scales). Before formalizing this idea (see Theorem 3 below), we first consider a simple example where the limitations of Proposition 3 can be observed.
4.3 The Haar–Weierstrass Function

Let R be the 2-periodic function defined by
\[ R(x) = \begin{cases} 0 & \text{if } x \in \mathbb{Z}, \\ 1 & \text{if } x \in (2k, 2k+1), \\ -1 & \text{if } x \in (2k+1, 2k+2). \end{cases} \tag{28} \]
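Definition 9 Let $\beta > 0$; the Haar–Weierstrass functions $H_\beta$ are defined by
\[ \forall x \in \mathbb{R},\qquad H_\beta(x) = \sum_{j=1}^{\infty} 2^{-\beta j}\,R(2^j x) \tag{29} \]
\[ \phantom{\forall x \in \mathbb{R},\qquad H_\beta(x)} = \sum_{j=0}^{\infty} \sum_{k\in\mathbb{Z}} 2^{-\beta(j+1)}\,\psi_{j,k}(x). \tag{30} \]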
This is a “poor man’s” version of the Weierstrass functions, where the smooth sine (or cosine) function is replaced by the square-wave function R. Note that its Haar basis coefficients have the same amplitude at a given scale, so that it is impossible to infer from their size any variation of the pointwise regularity of $H_\beta$. Nonetheless, we will see that its Hölder exponent is an extremely irregular function which takes all values from 0 to $\beta$ on any interval of arbitrarily small length. First, note that the series (29) is normally convergent on $\mathbb{R}$. It follows that $H_\beta$ is continuous at the points which are not dyadic rationals, and, additionally, if $r_{j,k} = k/2^j$ (with k odd) is a dyadic rational in $(0, 1]$, then $H_\beta$ has a right and a left limit at $r_{j,k}$. The amplitude of the discontinuities at these points will yield an upper bound of the Hölder exponent, using the following lemma of [26].

Lemma 1 Let $f \in L^\infty_{loc}$ be such that f is Lebesgue-regular and has everywhere a right and a left limit. Let $x \in \mathbb{R}$, and let $x_n \to x$ be a sequence of discontinuity points of f; denote by $\Delta_f(x_n)$ the jump of f at $x_n$. Then,
\[ h_f(x) \le \liminf_{n \to \infty} \frac{\log |\Delta_f(x_n)|}{\log(|x - x_n|)}. \tag{31} \]
[Figure: two panels plotted for x between 0.0 and 1.0; left panel, Haar wavelet; right panel, Meyer wavelet.] The wavelet series (30) are displayed for $\beta = 1/2$, using Haar’s wavelet (left) and Meyer’s wavelet (right). Since Meyer’s wavelet is $C^\infty$, it follows that the Meyer–Weierstrass function is very similar to a Weierstrass function. In particular, its Hölder exponent takes everywhere the value $H = 1/2$. This is in sharp contrast with the Haar–Weierstrass function, which has discontinuities at dyadic points and whose Hölder exponent takes all possible values between 0 and 1/2.
Let us compute the amplitude $\Delta(r_{j,k})$ of the jump (i.e., the difference between these limits) at $r_{j,k}$: the first of the $R(2^l x)$ which has a discontinuity at $r_{j,k}$ is $R(2^j x)$ and its jump is negative, of amplitude $-2 \cdot 2^{-\beta j}$, and the next ones, for $l > j$, have a positive jump, of amplitude $2 \cdot 2^{-\beta l}$. It follows that the jump at $r_{j,k}$ is
\[ \Delta(r_{j,k}) = C_\beta \cdot 2^{-\beta j}, \quad\text{where}\quad C_\beta = 2\,\frac{2^{1-\beta} - 1}{1 - 2^{-\beta}}, \tag{32} \]
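For completeness, the jump can be obtained by summing the geometric series of the individual jumps listed above (a worked computation added here; it only uses the amplitudes stated in the text):

\[ \Delta(r_{j,k}) = -2 \cdot 2^{-\beta j} + \sum_{l > j} 2 \cdot 2^{-\beta l} = 2 \cdot 2^{-\beta j} \Bigl( \frac{2^{-\beta}}{1 - 2^{-\beta}} - 1 \Bigr) = 2\,\frac{2^{1-\beta} - 1}{1 - 2^{-\beta}}\; 2^{-\beta j}. \]

The jump is therefore a fixed multiple of $2^{-\beta j}$, and the multiplicative constant vanishes exactly when $2^{1-\beta} = 1$, i.e., when $\beta = 1$.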
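so that $C_\beta \ne 0$ if $\beta \ne 1$, which we assume. It follows from Definition 8, (31), and (32) that the Hölder exponent of $H_\beta$ satisfies
\[ \forall x \in \mathbb{R},\qquad h_{H_\beta}(x) \le \frac{\beta}{r(x)}, \]
where $r(x)$ is the rate of approximation of x by dyadic rationals of Definition 8. On the other hand, the second part of Proposition 3 yields that
\[ \forall x \in \mathbb{R},\qquad h_{H_\beta}(x) \ge \frac{\beta}{r(x)}. \]
We have thus obtained the following result.

Proposition 4 Let $\beta > 0$ be such that $\beta \ne 1$. The Hölder exponent of $H_\beta$ is
\[ \forall x \in \mathbb{R},\qquad h_{H_\beta}(x) = \frac{\beta}{r(x)}. \tag{33} \]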
Remark This implies that $H_\beta$ is a multifractal function, i.e., the equihölder sets
\[ E_H = \{ x : h_{H_\beta}(x) = H \} \]
are everywhere dense fractal sets of Hausdorff dimension $D_{H_\beta}(H) = H/\beta$ for $H \in [0, \beta]$ (and they are empty if $H \notin [0, \beta]$). This is a direct consequence of the fact that the set of points that satisfy $r(x) = R$ has Hausdorff dimension $1/R$, see, e.g., [17] (or [26] where similar functions are studied).
4.4 Pointwise Regularity: The Haar Frame

We will now show that, in contradistinction with Proposition 3, the Haar frame coefficients make it possible to recover the pointwise Hölder exponent everywhere. Let us first introduce some notations. We index the elements of the Haar frame $H_{j,k}$ by their support $\lambda = I_{j,k}$, see (19) (note that though the length of $\lambda$ is $2^{-j}$, it is not necessarily a dyadic interval). The corresponding Haar coefficient is
\[ c_\lambda = 2^j \int f(x)\,H_{j,k}(x)\,dx. \]
Haar leaders also are indexed by dyadic intervals and defined by
\[ d_\lambda = \sup_{\lambda' \subset 3\lambda} |c_{\lambda'}|. \]
We will prove the following result.

Theorem 3 Let $f$ be a locally bounded function, and let $\alpha > 0$. If $f \in C^\alpha(x_0)$ and if the Taylor polynomial of $f$ at $x_0$ is constant, then its wavelet leaders on the Haar frame satisfy

$$d_{\lambda_j(x_0)} \le C 2^{-\alpha j}. \tag{34}$$

Conversely, if (34) holds and if the Haar frame coefficients $c_{j,k}$ satisfy the uniform decay assumption

$$\exists \varepsilon, C > 0: \qquad |c_{j,k}| \le C 2^{-\varepsilon j}, \tag{35}$$

then

$$\exists C: \quad \text{if } |x - x_0| \le 1/2, \qquad |f(x) - f(x_0)| \le C |x - x_0|^\alpha \, \bigl|\log(|x - x_0|)\bigr|. \tag{36}$$
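As an illustration of the decay (34), the following Python sketch computes Haar coefficients and leaders at $x_0 = 0$ for the model function $f(x) = |x|^\alpha$ with $\alpha = 0.6$, which belongs to $C^\alpha(0)$ and has a constant Taylor polynomial there. It is only a sketch under simplifying assumptions that are not taken from the text: only the unshifted dyadic Haar basis is used (not the full frame), the Haar function is taken equal to $+1$ on the left half and $-1$ on the right half of its support, the supremum defining the leader is truncated to a few finer scales, and the helper names are hypothetical.

```python
import math

ALPHA = 0.6  # Hölder exponent of the model function f(x) = |x|**ALPHA at x0 = 0


def antiderivative(x):
    """Antiderivative of f(x) = |x|**ALPHA, used for exact integrals."""
    return math.copysign(abs(x) ** (ALPHA + 1) / (ALPHA + 1), x)


def haar_coefficient(j, k):
    """c_{j,k} = 2^j * integral of f times the Haar function on [k 2^-j, (k+1) 2^-j).

    Assumed convention: the Haar function is +1 on the left half, -1 on the right half.
    """
    h = 2.0 ** (-j)
    left, mid, right = k * h, (k + 0.5) * h, (k + 1) * h
    left_part = antiderivative(mid) - antiderivative(left)
    right_part = antiderivative(right) - antiderivative(mid)
    return 2 ** j * (left_part - right_part)


def leader_at_zero(j, depth=4):
    """Truncated leader: sup of |c_{j',k'}| over intervals inside 3*lambda_j(0).

    lambda_j(0) = [0, 2^-j), so 3*lambda_j(0) = [-2^-j, 2 * 2^-j); the scales
    j' range over j, ..., j + depth (truncation of the supremum).
    """
    best = 0.0
    for jp in range(j, j + depth + 1):
        scale = 2 ** (jp - j)
        for kp in range(-scale, 2 * scale):
            best = max(best, abs(haar_coefficient(jp, kp)))
    return best


def estimated_exponent(j1, j2):
    """Slope of log2(leader) against -j between generations j1 < j2."""
    return (math.log2(leader_at_zero(j1)) - math.log2(leader_at_zero(j2))) / (j2 - j1)


print(estimated_exponent(3, 10))
```

Because this particular $f$ is exactly self-similar around 0, the estimated slope coincides with $\alpha$ up to rounding; for a general function, only the upper bound (34) should be expected.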
Remarks The restriction on the Taylor polynomial is automatically satisfied if .0 < α < 1. It is easy to check that it is also satisfied by a class of functions which plays an important role in multifractal analysis: the distribution functions of singular
measures (with no restriction on $\alpha$). The first statement of the theorem is a classical result, and we recall its proof for completeness; see, e.g., [29], or more recently [30], where it is used in the more general context of pointwise $T^p_\alpha(x_0)$ regularity: its purpose was to obtain a pointwise irregularity criterion based on the continuous (Haar) wavelet transform for the Brjuno function. The Haar wavelet is a natural choice in this case because integrals of the Brjuno function on certain intervals with rational ends have an (almost) explicit form, so that, for an appropriate positioning of the support of the Haar wavelet, the orders of magnitude of the Haar coefficients are known.

Proof Assume that $f \in C^\alpha(x_0)$ for $\alpha < 1$. Let $j \ge 0$ be given, $j' \ge j$ and $\lambda' \subset 3\lambda_j(x_0)$; since $H_{j',k'}$ has a vanishing integral,

$$c_{j',k'} = 2^{j'} \int_{I_{j',k'}} f(x) H_{j',k'}(x)\,dx = 2^{j'} \int_{I_{j',k'}} (f(x) - f(x_0)) H_{j',k'}(x)\,dx,$$

so that

$$|c_{j',k'}| \le 2^{j'} \int_{I_{j',k'}} |f(x) - f(x_0)|\,dx \le C 2^{j'} \int_{I_{j',k'}} |x - x_0|^\alpha\,dx.$$

Since $x$ and $x_0$ belong to the support $I_{j',k'}$, hence to $3\lambda_j(x_0)$, $|x - x_0| \le 2^{-j}$, and it follows that $|c_{j',k'}| \le C 2^{-\alpha j}$, and therefore (34) holds.

Suppose now that (34) and (35) hold. Because of Theorem 1, $f$ belongs to $C^\varepsilon(\mathbb{R})$, and therefore the wavelet series of $f$ converges to the pointwise value of $f$ everywhere. Define $j$ by
1 −j 1 2 ≤ |x − x0 | < · 2−j . 2 4
(37)
At least one of the three Haar bases is such that, at the generation $j$, $x$ and $x_0$ belong to the same interval $I_{j,k_j}$. As in the proof of uniform regularity, in order to estimate increments of $f$, we use this Haar basis for the reconstruction formula:

$$f(x) - f(x_0) = \sum_k C_k (\varphi_k(x) - \varphi_k(x_0)) + \sum_{j' \le j} \sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0)) + \sum_{j' > j} \sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0)).$$
The terms for $j' < j$ vanish because $x$ and $x_0$ belong to the same (right or left) half of the support of $\psi_{j',k}$ (or of $\varphi_k$). As regards the terms for $j' \ge j$, we first assume that $j' \le [Aj]$, where $A$ is a (large) constant, which will be fixed later. Each sum

$$\sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0))$$

contains at most two nonvanishing terms: the ones such that $x$ or $x_0$ belongs to $I_{j',k}$; but, in that case, (34) implies that $|c_{j',k}| \le C 2^{-\alpha j}$. Therefore,

$$\Bigl| \sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0)) \Bigr| \le C 2^{-\alpha j},$$

and

$$\Bigl| \sum_{j'=j}^{[Aj]} \sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0)) \Bigr| \le C j 2^{-\alpha j}. \tag{38}$$

Assume now that $j' > [Aj]$. Because of the localization of the Haar basis, (35) implies that

$$\Bigl| \sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0)) \Bigr| \le C 2^{-\varepsilon j'},$$

and

$$\Bigl| \sum_{j' > [Aj]} \sum_k c_{j',k} (\psi_{j',k}(x) - \psi_{j',k}(x_0)) \Bigr| \le C 2^{-\varepsilon A j}. \tag{39}$$

We pick $A$ such that $\varepsilon A = \alpha$; (37) implies that $j \le C |\log(|x - x_0|)|$, so that (36) follows from (38) and (39).

It follows that Theorem 2 extends outside of the Haar frame setting: if $f$ satisfies (35) and if the generating wavelets $\varphi$ and $\psi$ belong to $C^N(\mathbb{R})$, then

$$h_f(x_0) = \liminf_{j \to +\infty} \frac{\log d_{\lambda_j(x_0)}}{\log 2^{-j}},$$

if the right-hand side is less than 1.
5 Concluding Remarks and Open Problems

If a regular oversampling is not required, the results we obtained also hold for less redundant frames, and the shift by 1/3 can be replaced by other rational shifts that are not dyadic (a dyadic shift would lead to no new Haar system elements for $j$ large enough). Indeed, the key argument in the proofs of Theorems 1 and 2 and Proposition 2 is that if $x$ and $y$ are such that $|x - y| \sim 2^{-j}$, then one can find a translated dyadic interval of length close to $2^{-j}$, which contains both $x$ and $y$.
This is clearly possible using only the Haar basis and one translate by a rational $r = p/(2k+1)$. Indeed, if $I$ is an interval of length $l$ satisfying $2^{-j} \le l < 2 \cdot 2^{-j}$, let $m$ be defined by

$$\frac{p}{8(2k+1)} \le 2^{-m} < \frac{p}{4(2k+1)}.$$

Then, $I$ clearly is included either in a dyadic interval of length $2^{-j+l}$ or in an interval of the same length obtained as a shift by $r$ of a dyadic interval. Therefore, the proofs of Theorems 1 and 2 and Proposition 2 work in the same way for such unions of two orthonormal bases. The same conclusion also holds when adding further translates by $r = q/(2k+1)$ for several values of $q$ (and in particular all of them, if one is concerned with the requirement of a regular oversampling). Note that the question of finding appropriate irrational translations for which the above results would hold is an open problem.

Similarly, the above results extend to the several-variable versions of the Haar system, which are obtained by a tensor product construction: for $x = (x_1, \cdots, x_d)$, we define

$$\Phi(x) = \varphi(x_1) \cdots \varphi(x_d),$$

and

$$\Psi^{(i)}(x) = \psi_1(x_1) \cdots \psi_d(x_d),$$

where the $\psi_l$ are either the one-variable functions $\varphi$ or $\psi$, the choice $\varphi(x_1) \cdots \varphi(x_d)$ being excluded (so that there are $2^d - 1$ wavelets $\Psi^{(i)}$). Then, the $d$-variable Haar basis is composed of the $\Phi(x - k)$ for $k \in \mathbb{Z}^d$ and the $2^{dj/2} \Psi^{(i)}(2^j x - k)$ for $j \ge 0$ and $k \in \mathbb{Z}^d$. A regularly spaced Haar frame is obtained by shifting this basis by the vectors $\sum_i \varepsilon_i e_i / 3$, where $\varepsilon_i \in \{0, 1, 2\}$ and the $e_i$ are the elements of the canonical basis of $\mathbb{R}^d$. This yields $3^d$ orthonormal bases, and the $d$-variable Haar frame is composed of the union of these bases. The proofs that we gave extend without difficulty to this setting; indeed, the key point is to notice that if $|x - y| \sim 2^{-j}$, then one can find a translated dyadic cube of width close to $2^{-j}$ which contains both $x$ and $y$; it suffices to use, in each direction of the canonical basis, the translation supplied by the one-dimensional case for the corresponding coordinate of the segment $[x, y]$; this yields an interval of dyadic length in each variable; these lengths are not necessarily the same in each direction, but picking the largest one will yield a cube of dyadic width which includes the segment $[x, y]$ and thus has the required property.

Theorems 1 and 2 also extend to piecewise smooth wavelets, such as the “spline wavelets” constructed by G. Battle and P.-G. Lemarié, see [1, 34]; in that case, the same proofs allow one to characterize global and pointwise regularity up to an order $\alpha$ given by the number of vanishing moments of the wavelet, which is larger than its uniform regularity (and was the natural bound for previous regularity results). This
is possible because these wavelets are piecewise polynomials between integers. In contrast, an interesting open problem would be to determine whether these results could be extended to, for example, Daubechies wavelets, the singularities of which are not located at integers, but on fractal sets, see [13].

Note that the extension of Proposition 2 to $d$ variables yields shifted cubes of dyadic width such that the piecewise constant functions on these cubes are sufficient to yield an optimal rate of approximation, which is more sparse than if one used approximation by functions $f_n$ which are piecewise constant on the cubes $\prod_{i=1}^{d} \left[ \frac{k_i}{n}, \frac{k_i + 1}{n} \right]$. Using only two shifted Haar bases instead of 3 yields a frame of redundancy $2^d$ in dimension $d$. This leaves open the question of the minimal redundancy in dimension $d$ that would be sufficient to derive the conclusions of Theorems 1 and 2 and Proposition 2.
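The covering argument used throughout this section (two nearby points fall in a common translated dyadic cube) can be checked on examples. The sketch below enumerates the $3^d$ shift vectors $\sum_i \varepsilon_i e_i/3$ and lists those for which $x$ and $y$ lie in the same shifted cube of width $2^{-j}$. It is an illustration under assumptions of our own: the points are taken with all coordinates differing by less than $2^{-j}/3$, exact rational arithmetic is used, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
import math
import random


def cell_index(t, j, shift):
    """Index k of the interval [k 2^-j + shift, (k+1) 2^-j + shift) containing t."""
    return math.floor((t - shift) * 2 ** j)


def shifts_with_common_cube(x, y, j):
    """Shift vectors eps in {0,1,2}^d (shift = eps/3) whose grid of width 2^-j
    puts x and y in the same cube."""
    d = len(x)
    good = []
    for eps in product(range(3), repeat=d):
        same_cell = all(
            cell_index(x[i], j, Fraction(eps[i], 3))
            == cell_index(y[i], j, Fraction(eps[i], 3))
            for i in range(d)
        )
        if same_cell:
            good.append(eps)
    return good


def random_close_pair(d, j, rng):
    """Random x in [0,1)^d and y with |x_i - y_i| < 2^-j / 3 in every coordinate."""
    side = Fraction(1, 2 ** j)
    x = [Fraction(rng.randrange(10 ** 6), 10 ** 6) for _ in range(d)]
    y = [xi + side * Fraction(rng.randrange(-332, 333), 1000) for xi in x]
    return x, y


rng = random.Random(0)
x, y = random_close_pair(d=2, j=4, rng=rng)
print(shifts_with_common_cube(x, y, 4))
```

The reason at least one shift always works here is that, at generation $j$, the breakpoints of the three shifted grids together form a grid of spacing $2^{-j}/3$, each breakpoint belonging to exactly one of the three grids; two points closer than this spacing are separated by at most one of them in each coordinate.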
References

1. G. Battle. A block spin construction of ondelettes, Part II: The QFT connection. Comm. Math. Phys., 114:93–102, 1988.
2. G. Bourdaud. Ondelettes et espaces de Besov. Revista Matemática Iberoamericana, 11(3):477–512, 1995.
3. J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. ICLR 2014, 27, 2016.
4. C. Heil, P. Jorgensen, and D. Larson (eds.). Wavelets, Frames, and Operator Theory. Contemporary Mathematics, A. M. S., 2004.
5. E. Chassande-Mottin, S. Jaffard, and Y. Meyer. Des ondelettes pour détecter les ondes gravitationnelles. Gazette de la S.M.F., 148:61–64, 2016.
6. X. Cheng, X. Chen, and S. Mallat. Deep Haar scattering networks. Information and Inference: A Journal of the IMA, 5:105–133, 2016.
7. O. Christensen. An Introduction to Frames and Riesz Bases. Birkhäuser, 2003.
8. R. Coifman and D. Donoho. Translation-invariant denoising. In Wavelets and Statistics, A. Antoniadis and G. Oppenheim (eds.), Lect. Not. Stat., volume 103, pages 121–150, Lyon, France, 1995.
9. R. Coifman and Y. Meyer. Remarques sur l’analyse de Fourier à fenêtre. C. R. Acad. Sci. Paris, Sér. I, Math., 312:259–261, 1991.
10. I. Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1992.
11. I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonal expansions. J. Math. Phys., 27:1271–1283, 1986.
12. I. Daubechies, S. Jaffard, and J. Journé. A simple Wilson orthonormal basis with exponential decay. SIAM J. Math. Anal., 22:554–572, 1991.
13. I. Daubechies and J. C. Lagarias. On the thermodynamic formalism for multifractal functions. Reviews in Mathematical Physics, 6:1033–1070, 1994.
14. R. DeVore and G. Lorentz. Constructive Approximation. Springer-Verlag, New York, 1993.
15. D. Donoho, I. Johnstone, G. Kerkyacharian, and D. Picard. Wavelet shrinkage: Asymptopia. J. Roy. Statist. Soc., B 57(2):301–369, 1995.
16. R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series. Trans. Amer. Math. Soc., 72:341–366, 1952.
17. K. Falconer. Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons, West Sussex, England, 1993.
18. P. Flandrin. Explorations in Time-Frequency Analysis. Cambridge University Press, 2018.
19. M. Frazier and B. Jawerth. A discrete transform and decompositions of distribution spaces. J. Funct. Anal., 93:34–170, 1990.
20. A. Grossmann and J. Morlet. Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal., 15:723–736, 1984.
21. A. Grossmann, J. Morlet, and T. Paul. Transforms associated to square integrable representations I. J. Math. Phys. (reedited in: Fundamental Papers in Wavelet Theory, C. Heil and D. Walnut, eds.), 26:2473–2479, 1985.
22. M. Holschneider and P. Tchamitchian. Pointwise analysis of Riemann’s “non differentiable” function. Inventiones Mathematicae, 105:157–176, 1991.
23. J.-C. Pesquet, H. Krim, H. Carfantan, and J. G. Proakis. Estimation of noisy signals using time-invariant wavelet packets. In Proc. ASILOMAR Conf., pages 31–34, 1993.
24. S. Jaffard. Exposants de Hölder en des points donnés et coefficients d’ondelettes. Compt. Rend. Acad. Sci., 308:79–81, 1989.
25. S. Jaffard. A density criterion for frames of complex exponentials. Michig. Math. J., 38:339–348, 1991.
26. S. Jaffard. Old friends revisited: the multifractal nature of some classical functions. Journal of Fourier Analysis and Applications, 3(1):1–22, 1997.
27. S. Jaffard. Wavelet techniques in multifractal analysis. In Fractal Geometry and Applications: A Jubilee of Benoît Mandelbrot, M. Lapidus and M. van Frankenhuijsen, Eds., Proc. Symposia in Pure Mathematics, volume 72(2), pages 91–152. AMS, 2004.
28. S. Jaffard and H. Krim. Regularity properties of Haar frames. Compt. Rend. Acad. Sci., 359:1107–1117, 2021.
29. S. Jaffard and B. Mandelbrot. Local regularity of nonsmooth wavelet expansions and application to Polya’s function. Adv. Math., 120:265–282, 1996.
30. S. Jaffard and B. Martin. Multifractal analysis of the Brjuno function. Invent. Math., 212:109–132, 2018.
31. J.-C. Pesquet, H. Krim, and H. Carfantan. Time-invariant orthonormal wavelet representations. IEEE Transactions on Signal Processing, 44:1964–1970, 1996.
32. H. Krim, S. Jaffard, S. Roheda, S. Mahdizadehaghdam, and A. Panahi. On stabilizing generative adversarial networks. In preparation, 2021.
33. Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.
34. P.-G. Lemarié. Ondelettes à localisation exponentielle. Journal de Mathématiques Pures et Appliquées, 67:227–236, 1988.
35. P.-G. Lemarié and Y. Meyer. Ondelettes et bases hilbertiennes. Rev. Mat. Iberoamer., 1:1–18, 1986.
36. M. Li, Z. Ma, Y.-G. Wang, and X. Zhuang. Haar transform for graph neural networks. arxives, 3, 2019.
37. S. Mahdizadehaghdam, L. Dai, H. Krim, E. Skau, and H. Wang. Image classification: A hierarchical dictionary learning approach. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2597–2601, 2017.
38. S. Mahdizadehaghdam, A. Panahi, H. Krim, and L. Dai. Deep dictionary learning: A parametric network approach. In IEEE Transactions on Image Processing, 2019.
39. H. Malvar. Lapped transforms for efficient transform/subband coding. IEEE Trans. Acoust., Speech, Signal Process., 38(6):969–978, 1990.
40. Y. Meyer. Ondelettes et Opérateurs. Hermann, Paris, 1990. English translation, Wavelets and Operators, Cambridge University Press, 1992.
41. V. Necula, S. Klimenko, and G. Mitselmakher. Method for detection and reconstruction of gravitational wave transients with networks of advanced detectors. Journal of Physics: Conference Series, 363:012–032, 2012.
42. S. Roheda and H. Krim. Conquering the CNN over-parameterization dilemma: A Volterra filtering approach for action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34. AAAI, 2019.
43. J. Strömberg. A modified Franklin system and higher order spline systems on R^n as unconditional bases for Hardy spaces. In W. Beckner, editor, Conference in honor of Antoni Zygmund, Vol. 2, pages 475–493. Wadsworth Math series, 1983.
44. T. Tao. On the almost everywhere convergence of wavelet summation methods. Appl. Comput. Harmon. Anal., 3(4), 1996.
45. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58–1:267–288, 1996.
46. K. Tran, A. Panahi, A. Adiga, W. A. Sakla, and H. Krim. Nonlinear multi-scale super-resolution using deep learning. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12–17, 2019, pages 3182–3186. IEEE, 2019.
47. G. Walter. Pointwise convergence of wavelet expansions. J. Approx. Theory, 80:108–118, 1995.
48. Y. G. Wang and X. Zhuang. Tight framelets on graphs for multiscale data analysis. In Wavelets and Sparsity XVIII, volume 11138, pages 100–111. SPIE, 2019.
49. R. Young. An Introduction to Nonharmonic Fourier Series. Academic Press, 1980.
50. X. Zheng, B. Zhou, Y. G. Wang, and X. Zhuang. Decimated framelet system on graphs and fast g-framelet transforms. arXiv:2012.06922, 2020.
Part III
Genomics and Biology
Quantifying the Rationality of Rhythmic Signals

Alexandre Guillet, Alain Arneodo‡, Pierre Argoul, and Françoise Argoul

Abstract Rhythms and vibrations represent the quintessence of life, and they are ubiquitous (systemic) in all living systems. Recognising and unfolding these rhythms is paramount in medicine, for example, in the physiology of the heart, lungs, hearing, speech and brain, and in the cellular and molecular processes involved in biological clocks. The importance of the commensurability of the frequencies in different rhythms has been thoroughly studied in music. We define a log-frequency correlation measure on spectral densities that gives the temporal evolution of the distribution of frequency ratios (rational or irrational) between two signals, using analytic wavelets. We illustrate these concepts on numerical signals (sums of sine functions) and voice recordings from the Voice-Icar-Federico II database. Finally, with a second correlation operation between two of these distributions of ratios (a reference one and the other one from the voices), we introduce another quantity that we call sonance, measuring the “harmony” (rationality) of two voices sung together as a function of a pitch transposition.
A. Guillet · A. Arneodo‡ · F. Argoul, CNRS, UMR5787, Laboratoire Ondes et Matière d’Aquitaine, Université de Bordeaux, Bordeaux, France
P. Argoul, MAST-EMGCU, Univ Gustave Eiffel, IFSTTAR, Marne-la-Vallée, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_24

1 Introduction

Scientific approaches to natural systems were revolutionised in the last part of the twentieth century with the advent of miniaturised electronic and computer systems. Beyond their impressive beauty, it was offered to human beings to demonstrate that nature is constructed from multi-scale intertwined networks (in time and in space) and that these networks are the field of highly complex nonlinear dynamics (nonlinear and/or non-stationary rhythms) [1–3]. Even though apparently
distinct biological rhythms (endogenous and exogenous) have been recognised as universal features of all organisms (neural signals, heart, hormone secretion, metabolism, tidal, circadian, lunar, seasonal, annual clocks, life cycle, etc.) [4], the variability of these rhythms and their spatio-temporal interplay is still considered as incidental or ignored. Despite the fact that we can concretely demonstrate that the frequencies of these rhythms span more than 10 decades, time (and frequency) is still considered as varying linearly in living systems. In particular, the presence of strong nonlinearities can give us greater sensing resolution for less intense stimuli. These mechanisms are ubiquitous across animal species and across all sensory modalities. Interestingly, the scales and laws that map external stimuli to their internal (psychophysical) perception are logarithmic rather than linear. A simple and more commonly encountered example for the non-specialist is the perception and emission of acoustic vibrations (sounds) by living species: these processes occur on logarithmic scales in the time and frequency domains [5]. It has also been demonstrated experimentally that the cochlear filters of the inner ear are not spaced at linear frequency intervals but that their spacing is approximately logarithmic [6]. The emission of sound (speech, songs) by the human vocal tract (larynx, pharynx, mouth) is a complex nonlinear process that combines muscles and tissues with different temporal and spatial scales and the entire autonomic and central nervous systems. In this chapter, we analyse human voice signals (a single note maintained for a few seconds) that characterise the physiology of the vocal organ (larynx–pharynx–mouth) in healthy and pathological situations.

To compare different signals and their spectral composition, we define a log-frequency correlation measure on spectral densities that gives the distribution of frequency ratios (rational or irrational numbers) between two signals. Using the wavelet transform formalism, we extend this measure to a time–frequency correlation measure, which offers the possibility to estimate the temporal variability of this log-frequency correlation. We introduce reference spectral expansions as sums of Dirac terms that summarise the characteristic property of these voice signals (harmonics as integer multiples of a fundamental frequency). Finally, we define a new integral cross-correlation of the previously defined measure, which quantifies the rationality of the rhythms of two compared signals. We call it sonance, by analogy with the term consonance (respectively, dissonance) that denotes the perceived affinity or agreement (respectively, disagreement) between different sounds. We validate this method on numerical model signals and on voice signals collected from different sources. The first section is this introduction. The second section describes the mathematical methodology for log-frequency correlations (or spectrum of frequency ratios) and its generalisation to time–frequency expansions in terms of analytic wavelet transforms. The third section illustrates these concepts on numerical signals (sums of sine functions) and voice recordings from the Voice-Icar-Federico II database, introduces the sonance measure and illustrates it on the previously computed log-frequency correlation measures of voice signals. Finally, we leave the medical application of voice dysphonia diagnosis to compare an untrained voice with a singer's voice that has a similar spectral envelope.
2 Spectrum of Frequency Ratios: Formalism and Time–Frequency Generalisation

2.1 Correlation Functions for Signal Comparison

Let us consider two signals $x$ and $y$ of finite total energy in $L^2(\mathbb{R})$: $\langle x, x \rangle < +\infty$ and $\langle y, y \rangle < +\infty$, where $\langle \cdot, \cdot \rangle$ is the ordinary inner product of $L^2(\mathbb{R})$ and $\overline{x}$ is the complex conjugate of $x$:

$$\langle x, y \rangle = \int_{-\infty}^{+\infty} \overline{x}(u)\, y(u)\, du. \tag{1}$$

The comparison of these two signals $x$ and $y$ is usually performed through a deterministic correlation function $R[x, y](\xi)$ constructed from a time shift (translation) operator $T_\xi$:

$$R[x, y](\xi) = \langle x, T_\xi y \rangle = \int_{-\infty}^{+\infty} \overline{x}(u)\, y(u + \xi)\, du. \tag{2}$$

This definition, given for energy signals or square-integrable functions, can be extended to power signals. Thus, for signals that can be described by sums of periodic functions (stochastic signals with finite power), the cross-correlation function reads

$$C[x, y](\xi) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \overline{x}(u)\, y(u + \xi)\, du. \tag{3}$$
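As a quick numerical illustration of Eq. (3), the sketch below evaluates the cross-correlation of two cosines of the same frequency. It relies on an assumption that is ours, not the chapter's: for signals sharing a common period, the limit in Eq. (3) is replaced by an average over one period, sampled uniformly. The function name and the test signals are hypothetical.

```python
import math


def cross_correlation(x, y, lag, period, n=1024):
    """Average of conj(x(u)) * y(u + lag) over one common period.

    Discretised stand-in for Eq. (3), valid for signals that share this period.
    """
    du = period / n
    total = 0.0
    for k in range(n):
        u = k * du
        total += x(u).conjugate() * y(u + lag)
    return total / n


F0 = 5.0   # common frequency of the two test signals (assumed value)
PHI = 0.7  # phase offset of the second signal (assumed value)


def x_signal(u):
    return math.cos(2 * math.pi * F0 * u)


def y_signal(u):
    return math.cos(2 * math.pi * F0 * u + PHI)


lag = 0.03
print(cross_correlation(x_signal, y_signal, lag, 1 / F0))
print(0.5 * math.cos(2 * math.pi * F0 * lag + PHI))  # closed form for two cosines
```

For this pair the closed form is $C[x, y](\xi) = \tfrac{1}{2}\cos(2\pi f_0 \xi + \varphi)$, and with $x = y$ and $\xi = 0$ one recovers the power $1/2$ of a unit cosine.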
When $x = y$, we get the auto-correlation function $C[x, x](\xi)$, which characterises the similarity between observations of the same signal as a function of the lag $\xi$ between them. The auto-correlation function is Hermitian: $C[x, x](-\xi) = \overline{C[x, x](\xi)}$. The absolute value of $C[x, x](\xi)$ is maximum at the origin, where the auto-correlation function is real, positive and equal to the power of the signal $x$. When the signal $x$ is real, this implies that the auto-correlation function is real and even. Note that when $u = t$, $t$ being the time variable, the function $C[x, y](\xi)$ is the cross-correlation function commonly used for time signals, but $u$ could be replaced by any other type of variable, and in particular the frequency (or log-frequency) when comparing spectral signals, as will be discussed below.

Translation-based correlation functions are very important for physics. They turn functions of a relative quantity (such as time or space position whose value depends on a translation from an arbitrary origin) into a function of an absolute quantity (such as time or space interval). However, the value of absolute quantities
that have a physical dimension still depends on its comparison with an arbitrary standard: the physical unit. Since a scaling is involved, the unit plays the role of an arbitrary origin for the logarithm of these quantities. That is the reason why dilation-based correlation functions can be of interest for physics, as long as they compare functions of an absolute quantity: a new variable made of the ratio of two absolute quantities with the same physical unit depends neither on an origin nor on the unit; it is a pure proportion.

To extend the concept of correlation functions to absolute physical quantities, we need first to revisit the definition of the inner product. We make use of the logarithm to change from the translation-invariant group $(\mathbb{R}, +)$ to the dilation-invariant one $(\mathbb{R}^+, \times)$. The change of variable $u = \log v$ applied in Eq. (1) yields

$$\langle X, Y \rangle = \int_{-\infty}^{\infty} \overline{X}(u)\, Y(u)\, du = \int_{0}^{\infty} \overline{X}(\log v)\, Y(\log v)\, d\log v. \tag{4}$$

The change from the function $X(u)$ and the measure $du$ to the function $X \circ \log(v)$ and the measure $d\log v = dv/v$ means, for numerical computations, that we replace linearly sampled functions by geometrically sampled ones (of positive variable). In the following, we choose to make explicit the composition with the logarithm in each function. The previous translation operator $T_\xi$ is naturally replaced by a dilation operator $D_q$:

$$T_{\log q}[X](\log v) = D_q[X \circ \log](v) = X(\log(qv)). \tag{5}$$

Combining Eqs. (4) and (5), we obtain from Eq. (2) a similar correlation function adapted to geometrically sampled signals:

$$R[X, Y](\log q) = \int_{0}^{\infty} \overline{X}(\log v)\, Y(\log(qv))\, d\log v, \tag{6}$$

where $q$ is positive. For functions $X$ and $Y$, the finite energy condition for the validity of this integral takes the form $\langle X, X \rangle < +\infty$. It can also be reformulated for finite power signals in a similar way as in Eq. (3). The dilation correlation function in Eq. (6) inherits the following symmetry and linearity properties from Eq. (2):

$$R[Y, X](\log q) = \overline{R[X, Y](-\log q)}, \tag{7}$$

$$R[X, Y + Z](\log q) = R[X, Y](\log q) + R[X, Z](\log q). \tag{8}$$
Note that the logarithm does not allow to study functions of a negative absolute quantity (for instance, negative delays or frequencies) nor negative ratios .q < 0.
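On a geometric grid $v_n = v_0 r^n$, a dilation by $q = r^m$ is simply an index shift by $m$, and the measure $d\log v$ becomes the constant step $\log r$. The sketch below uses this to discretise Eq. (6) and to check the symmetry (7) on real-valued test profiles. The grid ratio (one semitone), the Gaussian test bumps and the function names are illustrative assumptions, not taken from the chapter.

```python
import math

RATIO = 2 ** (1 / 12)   # geometric step of the grid (assumed: one semitone)
DLOGV = math.log(RATIO)  # constant value of d(log v) on this grid


def bump(centre_index, width, n):
    """Gaussian bump in the grid index, i.e. in log v."""
    return [math.exp(-((k - centre_index) ** 2) / (2 * width ** 2)) for k in range(n)]


def dilation_correlation(X, Y, m):
    """Discretised Eq. (6) for q = RATIO**m, with zero padding outside the grid."""
    total = 0.0
    for n in range(len(X)):
        if 0 <= n + m < len(Y):
            total += X[n].conjugate() * Y[n + m] * DLOGV
    return total


# With v_0 = 1, index 24 corresponds to v = 4 and index 36 to v = 8.
X = bump(24, 1.5, 96)
Y = bump(36, 1.5, 96)

best_shift = max(range(-40, 41), key=lambda m: dilation_correlation(X, Y, m))
print(best_shift, RATIO ** best_shift)  # index shift and corresponding ratio q
```

The correlation peaks at the index shift that maps the bump of $X$ onto the bump of $Y$, that is, at the ratio $q$ between their centres.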
2.2 Spectrum of Frequency Ratios: A Frequency Ratio Distribution

For the application of interest here, the unfolding of rhythms from real signals (their spectral “timbre”), we concentrate on “geometric” spectral densities that we define as real and positive functions $S(\log f) \ge 0$ of the logarithm of the frequency. The log-frequency correlation function between two such densities

$$R[S_1, S_2](\log q) = \int_{0}^{\infty} S_1(\log f)\, S_2(\log(qf))\, d\log f \tag{9}$$

captures all the spectral relations between frequency modes of $S_1(\log f)$ and $S_2(\log f)$. $R[S_1, S_2](\log q)$ is positive and gives the distribution of frequency ratios $q$ of $S_1(\log f)$ and $S_2(\log f)$, hence the notation $R$ for ratio distribution. Similarly to the standard correlation function of linearly sampled variables, the existence of this integral $R[S_1, S_2](\log q)$ requires that both distributions $S_1(\log f)$ and $S_2(\log f)$ be square integrable with respect to the geometric measure of $f$ (linear measure for $\log f$). Both the log-frequency distribution $S(\log f)$ and the frequency ratio distribution $R[S_1, S_2](\log q)$ can be normalised as probability density functions:

$$\int_{0}^{\infty} R[S_1, S_2](\log q)\, d\log q = \int_{0}^{\infty} S_1(\log f)\, d\log f \int_{0}^{\infty} S_2(\log f)\, d\log f = 1. \tag{10}$$

Frequency ratio distributions can be written in analytic form for spectral densities defined as isolated Dirac $\delta$ functions or sums of them. For example, the two spectral densities $S_j(\log f) = \delta(\log \frac{f}{f_j})$, $j = 1, 2$, have a single frequency ratio $\frac{f_1}{f_2}$ and give a frequency ratio distribution $R[S_1, S_2](\log q) = \delta(\log \frac{q f_1}{f_2})$. If we define $S(\log f)$ as a doublet of Dirac deltas, $S(\log f) = S_1(\log f) + S_2(\log f)$, then from the linearity property, Eq. (8), we can write the ratio distribution $R[S, S](\log q) = \delta(\log \frac{q f_1}{f_2}) + 2\delta(\log q) + \delta(\log \frac{q f_2}{f_1})$. This simple analytic case is illustrated in Fig. 1, where we distinguish in $R[S, S](\log q)$ three peaks, corresponding to the frequency pairs: (4:4) and (8:8) for $\log q = 0$, (4:8) for $\log q = \log 2$, and (8:4) for $\log q = -\log 2$.

The log-frequency spectral distributions $S(\log f)$ cannot be assimilated to lin-frequency spectral densities defined from the Fourier transform of $s$, $\hat{s}(f) = \int_{-\infty}^{+\infty} s(t)\, e^{-2\pi i f t}\, dt$, because their computation from linear measures in time and frequency faces some difficulties. The main one is practical: power spectral densities estimated with Fast Fourier Transform (FFT) algorithms are sampled linearly, whereas the integral of Eq. (9) requires a geometric frequency sampling. Resampling strategies of the Fourier spectra have been proposed in the literature [7] and could be used for stationary signals; however, they require a greater memory size and are computationally time-consuming. Importantly, in the context of physiological signals, which are often non-stationary, the extension of time-averaged spectral
Fig. 1 (a) Ideal distribution $S(\log f)$ in log-frequencies of a doublet of Dirac deltas such that the highest frequency is twice the lowest. (b) Representation of the spectrum of self-relations $R[S, S](\log q)$ in logarithmic scale (base 2). The peak at ratio $\log_2 q = 0$ represents the self-relation of each frequency peak, whereas the ratios $\log_2 q = -1, 1$ represent their cross-relations
quantities to time–frequency distributions is mandatory. The wavelet transform addresses both issues: it not only provides a time–frequency representation of the spectral quantities but also allows a geometric sampling in frequency. Using time–frequency decompositions, we can straightforwardly extend our definition of log-frequency ratio distributions, Eq. (9), to time–log-frequency ratio distributions for the analysis of non-stationary signals.
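The Dirac doublet of Fig. 1 can be reproduced with a few lines of Python. In this sketch a spectral density made of Dirac deltas is represented by a dictionary mapping $\log_2 f$ to the weight of each delta; this data structure and the function name are our own illustrative choices. For two deltas at $f_1$ and $f_2$, Eq. (9) places the resulting delta at $\log q = \log f_2 - \log f_1$ with the product of the weights.

```python
from collections import Counter


def ratio_distribution(S1, S2):
    """Ratio distribution R[S1, S2] of two sums of Dirac deltas.

    S1 and S2 map log2(frequency) to the weight of the delta; the result maps
    log2(q) to the accumulated weight of the deltas of R[S1, S2].
    """
    R = Counter()
    for log_f1, w1 in S1.items():
        for log_f2, w2 in S2.items():
            R[log_f2 - log_f1] += w1 * w2
    return R


# Doublet of Fig. 1: frequencies 4 and 8, i.e. log2 f = 2 and 3, unit weights.
doublet = {2: 1, 3: 1}
R = ratio_distribution(doublet, doublet)
print(sorted(R.items()))
```

The three atoms sit at $\log_2 q = -1, 0, 1$ with weights 1, 2, 1, and the total weight equals the product of the total weights of the two input densities, in agreement with Eq. (10) before normalisation.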
2.3 Wavelet Transform Formalism

Time–frequency analysing tools based on the wavelet transform were introduced in the second half of the twentieth century and applied to many scientific domains for characterising and modelling non-stationary processes [8–14]. The wavelet transform of a finite energy signal $s(t) \in L^2(\mathbb{R})$ is defined as its inner product with the shifted and dilated copies of an absolutely integrable and finite energy analysing wavelet $\psi(t) \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ [9, 14–16]:

$$W^{(p)}_\psi[s](a, b) = \langle \psi_{a,b}, s \rangle = a^{-\frac{1}{p}} \int_{-\infty}^{+\infty} s(t)\, \overline{\psi}\!\left(\frac{t - b}{a}\right) dt, \tag{11}$$

where $b \in \mathbb{R}$ and $a \in \mathbb{R}^+$ are the shift and scaling parameters, $\overline{\psi}$ is the complex conjugate of the analysing wavelet $\psi$, and $p$ is a parameter which defines the normalisation of the wavelet. Two values of $p$ are usually found in the literature: $p = 1$, corresponding to the $L^1(\mathbb{R})$ norm, and $p = 2$, corresponding to the $L^2(\mathbb{R})$ norm. The choice $p = 1$, often used for time-localised signals with different amplitudes, is appropriate when the magnitude of the modulus wavelet transform is wished to reflect the
amplitude of the analysed signal $s(t)$. The choice $p = 2$ is appropriate when the modulus-squared wavelet transform is wished to reflect the energy of the analysed signal $s(t)$. In the frequency domain, the expression of the wavelet transform reads

$$W^{(p)}_\psi[s](a, b) = a^{1 - \frac{1}{p}} \int_{-\infty}^{+\infty} \hat{s}(f)\, \overline{\hat{\psi}}(af)\, e^{2i\pi f b}\, df, \tag{12}$$

where $\hat{s}$ and $\hat{\psi}$ denote the Fourier transforms of the signal and the wavelet. This time-scale representation is well suited to non-stationary signals since it localises the analysis around time $b$ and operates a band-pass filtering scaled by the parameter $a$. Importantly, $a$ can be sampled arbitrarily, and in our case, we will sample it geometrically. It is common practice to consider the scale $a$ as proportional to an inverse frequency $\frac{1}{f_a}$:

$$a = \frac{f_\psi}{f_a}, \tag{13}$$

where $f_\psi$ is a characteristic frequency of the mother wavelet $\psi$. Three meaningful frequencies are classically used for $f_\psi$ [17]: the peak frequency $f^0_\psi$, where the frequency domain mother wavelet magnitude $|\hat{\psi}(f)|$ is maximum; the energy (norm 2) frequency $f^*_\psi$, which is the mean of $|\hat{\psi}(f)|^2$; and the norm 1 frequency $\check{f}_\psi$, which can be interpreted as an instantaneous frequency for progressive wavelets. An asymmetry in the frequency domain of the mother wavelet leads to distinct values for the previous frequencies $f_\psi$.

For the computation of the log-frequency correlation functions, the expression $W_\psi[s](f_\psi/f_a, b)$ for the wavelet transform given in Eq. (12) can be turned into a time–frequency analysis by using Eq. (13) for a given characteristic frequency $f_\psi$:

$$W^{(p)}_\psi[s]\!\left(\frac{f_\psi}{f_a}, b\right) = a^{1 - \frac{1}{p}} \int_{-\infty}^{+\infty} \hat{s}(f)\, \overline{\hat{\psi}}\!\left(\frac{f_\psi}{f_a} f\right) e^{2i\pi f b}\, df. \tag{14}$$

For our applications to physiological signals, the Banach space $L^1(\mathbb{R}, dt)$ norm corresponding to $p = 1$ will be preferred for the wavelet transform definition. This is due to the following fact: when rescaling time in the input signal as $s(t/\rho)$, with $\rho > 0$, both the time and the scale of the wavelet transform are rescaled, but without changing its magnitude. Indeed, as the Fourier transform of $s(t/\rho)$ is $\rho\, \hat{s}(\rho f)$, Eq. (12) with $p = 1$ leads to $W^{(1)}_\psi[s]\!\left(\frac{a}{\rho}, \frac{b}{\rho}\right)$. The superscript (1) is dropped in the following. The peak frequency $f^0_\psi$ will then be adopted for the characteristic frequency $f_\psi$ in Eqs. (13) and (14).
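To make the role of the $p = 1$ normalisation and of the scale–frequency conversion (13) concrete, the following sketch evaluates Eq. (11) by direct quadrature on a geometric frequency grid for a pure cosine. It is a minimal sketch under assumptions of ours: the analysing wavelet is a Gaussian-windowed complex exponential (a Morlet-type wavelet, only approximately admissible), with peak frequency `F_PSI` and envelope width `SIGMA`; the grid, the test signal and the function names are hypothetical.

```python
import cmath
import math

F_PSI = 1.0  # peak frequency of the mother wavelet (assumed value)
SIGMA = 1.0  # width of the Gaussian envelope (assumed value)


def psi(u):
    """Morlet-type wavelet whose Fourier transform peaks at F_PSI with height 1."""
    envelope = math.exp(-u * u / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))
    return envelope * cmath.exp(2j * math.pi * F_PSI * u)


def wavelet_transform(s, f_a, b, du=1 / 64, half_width=6.0):
    """Eq. (11) with p = 1 at scale a = F_PSI / f_a (Eq. (13)), by Riemann sum.

    With t = b + a*u, the factor (1/a) dt becomes du, so the sum runs over u.
    """
    a = F_PSI / f_a
    n = int(half_width * SIGMA / du)
    total = 0j
    for k in range(-n, n + 1):
        u = k * du
        total += s(b + a * u) * psi(u).conjugate() * du
    return total


def signal(t):
    """Test signal: cosine of amplitude 2 and frequency 8."""
    return 2.0 * math.cos(2 * math.pi * 8.0 * t)


# Geometric frequency grid: four values per octave from 1 to 32.
frequencies = [2 ** (k / 4) for k in range(21)]
moduli = [abs(wavelet_transform(signal, f, b=0.3)) for f in frequencies]
peak_index = max(range(len(moduli)), key=moduli.__getitem__)
print(frequencies[peak_index], moduli[peak_index])
```

With this normalisation the modulus at the peak is half the amplitude of the cosine, whatever its frequency, which is the property that motivates the choice $p = 1$ above.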
The admissibility condition for an analysing wavelet $\psi \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ establishes that the number

$$c_\psi = \int_{0}^{+\infty} |\hat{\psi}(u)|^2\,\frac{du}{u} \tag{15}$$

must be finite, nonzero and independent of $f \in \mathbb{R}^+$. If this admissibility condition is fulfilled, then every $s \in L^2(\mathbb{R})$ can be reconstructed from the convergent integral:

$$s(t) = \frac{1}{c_\psi} \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} W_\psi[s](a,b)\,\psi\!\left(\frac{t-b}{a}\right) \frac{da\,db}{|a|} . \tag{16}$$

2.3.1 Time and Frequency Window for the Analysing Wavelet
The time–frequency window can be computed from the expression of the analysing wavelet $\psi$, assuming that $\psi$ and $\hat{\psi}$ verify $t\psi(t) \in L^2(\mathbb{R})$ and $f\hat{\psi}(f) \in L^2(\mathbb{R})$ [18]. If the centre and the radius (with the norm 2) of the window function $\psi$ are, respectively, $t^*_\psi$ and $\Delta_\psi$, then $\psi((t-b)/a)$ is a window function with centre $b + a t^*_\psi$ and radius $a\Delta_\psi$:

$$\left[\,b + a t^*_\psi - a\Delta_\psi,\; b + a t^*_\psi + a\Delta_\psi\,\right] . \tag{17}$$
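This window narrows (respectively, widens) for small (respectively, large) values of $a$. In the frequency domain, the window of $\hat{\psi}$ is defined similarly: assuming that the centre and radius of $\hat{\psi}$ are $f^*_\psi$ and $\Delta_{\hat{\psi}}$, $\hat{\psi}(af)$ is centred around $f^*_\psi/a$ and has a radius $\Delta_{\hat{\psi}}/a$:

$$\left[\,\frac{f^*_\psi}{a} - \frac{1}{a}\Delta_{\hat{\psi}},\; \frac{f^*_\psi}{a} + \frac{1}{a}\Delta_{\hat{\psi}}\,\right] . \tag{18}$$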
In the following discussion, the centre $f^*_\psi$ of $\hat{\psi}$ is assumed to be positive. There are different ways of defining the wavelet resolution, called the quality factor of the wavelet. A first definition, given in [19], uses the bandwidth and the norm 2 frequency as follows:

$$Q^* = \frac{f^*_\psi/a}{2\Delta_{\hat{\psi}}/a} = \frac{f^*_\psi}{2\Delta_{\hat{\psi}}} , \tag{19}$$
which is independent of the scale parameter $a$. Alternatively, we could also use the full width at half maximum of $|\hat{\psi}(f)|^2$ instead of $\Delta_{\hat{\psi}}$. We thus define another quality factor, $\tilde{Q}$, as

$$\tilde{Q} = \frac{f^0_\psi}{|f_2 - f_1|} , \tag{20}$$
where $|\hat{\psi}(f_1)|^2 = |\hat{\psi}(f_2)|^2 = |\hat{\psi}(f^0_\psi)|^2/2$ and $f_1 < f^0_\psi < f_2$. This factor is usually computed to characterise the damping behaviour of simple damped oscillators [20]. The choice of the quality factor is essential to obtain an adapted time–frequency resolution and consequently a “good” analysis of the processed signals. The authors in [21] propose three bounds that define a range of acceptable values. When the signal is composed of several frequency components, the proximity of their characteristic frequencies provides a lower bound. The exponential decay rate of the amplitude imposes an upper bound. Finally, the length of the signal determines another upper bound.
2.3.2 Choice of the Analysing Wavelet: The Grossmann Wavelet
In the absence of a suitable unifying theory for wavelet behaviours, the choice of a particular wavelet for a particular problem may often appear arbitrary. For rhythmic signals, complex analytic analysing wavelets are preferred, leading to $\hat{\psi}(f) = 0$, $\forall f \le 0$. In that case, the measure appears naturally in these integrals (Eq. 16), as in Eqs. (4) and (6), because the analysing wavelet is scale invariant (under dilations). In the following, we choose a single-parameter progressive wavelet, introduced for the decomposition of Hardy functions by Grossmann and Morlet [8]:

$$\hat{\psi}_Q(f) = \begin{cases} \psi_0\, e^{-\frac{1}{2}\left(Q \log \frac{f}{f_0}\right)^2} & \forall f > 0 ; \\ 0 & \forall f \le 0 , \end{cases} \tag{21}$$
of peak frequency $f^0_\psi = f_0$, for which the maximum value is $\psi_0$. This wavelet is symmetric in log-frequencies about $\log f_0$. The other characteristic frequencies are $f^*_\psi = f_0\, e^{3/(4Q^2)}$ and $\check{f}_\psi = f_0\, e^{3/(2Q^2)}$. The Grossmann wavelet is also centred in time ($t^*_\psi = 0$), with a width (radius in norm 2):

$$\Delta_\psi = \frac{\sqrt{1 + 2Q^2}}{4\pi f_0} . \tag{22}$$
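The characteristic frequencies quoted above can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `grossmann_hat`, the grid bounds and the value $Q = 8$ are illustrative choices, and the norm 1 frequency is assumed here to be the mean of $f$ weighted by $|\hat{\psi}_Q(f)|$, an interpretation that reproduces the closed form given in the text.

```python
import numpy as np

def grossmann_hat(f, Q, f0=1.0, psi0=1.0):
    """Frequency-domain Grossmann wavelet of Eq. (21); it vanishes for f <= 0."""
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    pos = f > 0
    out[pos] = psi0 * np.exp(-0.5 * (Q * np.log(f[pos] / f0)) ** 2)
    return out

Q, f0 = 8.0, 1.0
f = np.linspace(1e-6, 4.0, 400_001)       # uniform linear-frequency grid
psi_hat = grossmann_hat(f, Q, f0)

# On a uniform grid, the frequency step cancels in the ratios below.
f_peak = f[np.argmax(psi_hat)]                            # peak frequency
f_energy = np.sum(f * psi_hat**2) / np.sum(psi_hat**2)    # norm 2 (energy) frequency
f_norm1 = np.sum(f * psi_hat) / np.sum(psi_hat)           # norm 1 frequency (assumed definition)

print(f_peak, f_energy, f_norm1)
print(f0 * np.exp(3 / (4 * Q**2)), f0 * np.exp(3 / (2 * Q**2)))  # closed forms
```

The three frequencies differ only slightly for $Q = 8$ and merge as $Q$ grows, which is consistent with the asymmetry of the wavelet in linear frequency becoming weaker for narrow bands.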
Both previously defined quality factors depend on $Q$ only:

$$Q^* = \frac{1}{2}\left(e^{\frac{1}{2Q^2}} - 1\right)^{-\frac{1}{2}} , \tag{23}$$

$$\tilde{Q} = \left(2 \sinh \frac{\sqrt{\log 2}}{Q}\right)^{-1} . \tag{24}$$

Fig. 2 Grossmann analysing wavelet (log-normal in frequency). $\psi(t)$ is computed by inverse Fourier transform of $\hat{\psi}_Q(f)$ (Eq. 21). (a) $\psi_Q(t)$. (b) $\hat{\psi}_Q(f)$ in linear frequency scale. (c) $\hat{\psi}_Q(f)$ in a (base 10) logarithmic frequency scale. (a–c) are computed for $Q = 8$. (d–f) The same as (a–c) for $Q = 64$. In (a) and (d), $|\psi_Q(t)|$ (respectively, $\operatorname{Re}\psi_Q(t)$) are plotted in thick (respectively, light) black lines
When $Q$ is large enough, the leading terms in the expansions give $Q^* \simeq Q/\sqrt{2}$ and $\tilde{Q} \simeq Q/(2\sqrt{\log 2})$, respectively, followed by a term of order $1/Q$. Consequently, we will refer to the parameter $Q$ as the quality factor for this wavelet. When choosing the value $\psi_0^2 = Q/\sqrt{\pi}$, the admissibility constant is one and $|\hat{\psi}(f)|^2$ can be considered as a probability density function in log-frequencies:

$$c_\psi = \int_{0}^{\infty} |\hat{\psi}_Q(f)|^2\, d\log f = 1 . \tag{25}$$
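As a quick numerical illustration of Eqs. (23)–(25), the sketch below evaluates both quality factors, compares them with the leading-order values $Q/\sqrt{2}$ and $Q/(2\sqrt{\log 2})$ obtained by expanding Eqs. (23) and (24), and integrates $|\hat{\psi}_Q|^2$ over $x = \log(f/f_0)$. The function names and the integration grid are assumptions made for this example only.

```python
import numpy as np

def q_star(Q):
    """Quality factor of Eq. (23)."""
    return 0.5 / np.sqrt(np.expm1(1.0 / (2.0 * Q**2)))

def q_tilde(Q):
    """Quality factor of Eq. (24)."""
    return 0.5 / np.sinh(np.sqrt(np.log(2.0)) / Q)

def admissibility_constant(Q, half_width=5.0, n=200_001):
    """Riemann sum of |psi_hat_Q|^2 over x = log(f/f0), with psi0^2 = Q/sqrt(pi)."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    return np.sum(Q / np.sqrt(np.pi) * np.exp(-(Q * x) ** 2)) * dx

for Q in (8.0, 64.0):
    print(Q,
          q_star(Q), Q / np.sqrt(2.0),                     # Eq. (23) vs leading order
          q_tilde(Q), Q / (2.0 * np.sqrt(np.log(2.0))),    # Eq. (24) vs leading order
          admissibility_constant(Q))                       # Eq. (25), expected close to 1
```

Both quality factors are proportional to $Q$ at leading order, which supports calling $Q$ itself the quality factor.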
In Fig. 2, we plot the Grossmann wavelet for two values of $Q$, respectively, $Q = 8$ (top plots) and $Q = 64$ (bottom plots). For larger $Q$ values, the number of oscillations of $\operatorname{Re}\psi_Q(t)$ and its width increase, whereas $\hat{\psi}_Q(f)$ narrows. We can observe in Fig. 2c,f that the wavelet in the log-frequency domain is symmetric around $\log f_0 = 0$, whereas it is asymmetric around $f_0 = 1$ in linear frequencies (Fig. 2b,e). An important aspect of oscillating progressive wavelets is how many oscillations fit inside their time window [17, 19]. This number of oscillations determines the acuteness of the local frequency detection of a given rhythm and is of order $Q$. If this number is too large, the wavelet averages over too many oscillations and cannot provide a correct estimation. Conversely, if the number of oscillations is insufficient (less than $\sim 3$), the detection of a local rhythm is not possible. The choice of this parameter is particularly important if the signal presents sharp transitions or close frequencies, as will be illustrated in the following figures. The authors in [17] showed that the Grossmann wavelet can be seen as a scaling limit of a general family of progressive wavelets with two parameters, the Morse wavelets [22–28]. The Cauchy–Paul wavelet, intensively used in quantum mechanics and in the context of analytic functions [29], the analytic version of the derivative of Gaussian wavelet and the Airy wavelet all belong to the Morse family.
2.4 Extension of Frequency Ratio Distributions to Time–Frequency Ratio Distributions

From the Grossmann progressive wavelet transform defined in the previous section, we define a time–frequency distribution for non-stationary signals:

$$S^{(Q)}(\log(f_a), b) = \left| W_{\psi_Q}[s]\!\left(\frac{f_0}{f_a}, b\right) \right|^2 . \tag{26}$$
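A possible way to evaluate Eq. (26) numerically is to apply the wavelet as a filter in the Fourier domain, one analysing frequency at a time. The sketch below assumes $p = 1$, $\psi_0 = 1$ and a uniformly sampled real signal; the function name `grossmann_cwt`, the test signal and the frequency grid are illustrative and not taken from the chapter.

```python
import numpy as np

def grossmann_cwt(s, dt, freqs, Q):
    """Wavelet transform of Eq. (12) with p = 1 and a = f0 / f_a, computed by FFT.

    For the Grossmann wavelet, psi_hat_Q(a * nu) depends only on nu / f_a,
    so f0 drops out of the computation. psi0 = 1 is assumed.
    """
    s = np.asarray(s, dtype=float)
    n = s.size
    s_hat = np.fft.fft(s)
    nu = np.fft.fftfreq(n, d=dt)            # linearly sampled Fourier frequencies
    pos = nu > 0                            # progressive wavelet: f > 0 only
    W = np.zeros((len(freqs), n), dtype=complex)
    for k, fa in enumerate(freqs):
        psi_hat = np.zeros(n)
        psi_hat[pos] = np.exp(-0.5 * (Q * np.log(nu[pos] / fa)) ** 2)
        W[k] = np.fft.ifft(s_hat * psi_hat)
    return W

# Illustrative test: a pure 2 Hz sine analysed on a geometric frequency grid.
dt = 1.0 / 32.0
t = np.arange(0.0, 64.0, dt)
s = np.sin(2.0 * np.pi * 2.0 * t)
freqs = 0.5 * 2.0 ** (np.arange(97) / 24.0)     # 0.5 Hz to 8 Hz, 24 values per octave

S = np.abs(grossmann_cwt(s, dt, freqs, Q=8.0)) ** 2   # time-frequency distribution
S_mean = S.mean(axis=1)                               # temporal mean
f_detected = freqs[np.argmax(S_mean)]
print(f_detected, S_mean.max())
```

The FFT treats the signal as periodic, so a real recording would normally need padding or tapering at its edges; that step is omitted here because the test sine contains an integer number of periods.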
Note that the integral of the wavelet transform definition in Eq. (12) is sampled linearly in $f$, whereas the values of the frequencies $f_a$ (or scales $a$) can be chosen arbitrarily; for our purpose, we select them geometrically distributed. In the following, $b = t$ and $f_a = f$ are considered as time and frequency parameters, which simplifies the notation to $S^{(Q)}(\log f, t)$. This distribution is computed for strictly positive values of $f$, and we can extend the definition of the cross-correlation function to time–frequency distributions:

$$R[S_1^{(Q)}, S_2^{(Q)}](\log q, t) = \int_{0}^{\infty} S_1^{(Q)}(\log f, t)\, S_2^{(Q)}(\log(qf), t)\, d\log f \tag{27}$$

$$\phantom{R[S_1^{(Q)}, S_2^{(Q)}](\log q, t)} = \int_{0}^{\infty} \left| W_{\psi_Q}[s_1]\!\left(\frac{f_0}{f}, t\right) \right|^2 \left| W_{\psi_Q}[s_2]\!\left(\frac{f_0}{qf}, t\right) \right|^2 d\log f . \tag{28}$$

The log-frequency auto-correlation function is defined as $R[S^{(Q)}, S^{(Q)}](\log q, t)$. The temporal mean of $S^{(Q)}(\log f, t)$, denoted $\langle S^{(Q)} \rangle_t(\log f)$, can be seen as a power spectral density based on the wavelet transform $W_{\psi_Q}[s]$.
2.4.1 Computation of the Log-Frequency Correlation Function

Using the convolution theorem, $R[S_1^{(Q)}, S_2^{(Q)}]$ can be computed quite efficiently by applying the fast Fourier transform (FFT), which discretizes the Fourier transform, several times: in a first step with respect to the time variable of the signal (denoted $\mathcal{F}$), in a second step with respect to the log-frequency variable (denoted $\mathcal{F}_{\log f}$), and in a final step as an inverse FFT in log-frequency space (denoted $\mathcal{F}^{-1}_{\log f}$):

$$R[S_1^{(Q)}, S_2^{(Q)}](\log q, t) = \mathcal{F}^{-1}_{\log f}\!\left[\, \overline{\mathcal{F}_{\log f}\!\left[ |W_{\psi_Q}[s_1](\cdot, t)|^2 \right]}\; \mathcal{F}_{\log f}\!\left[ |W_{\psi_Q}[s_2](\cdot, t)|^2 \right] \right](\log q) , \tag{29}$$

where the overline denotes complex conjugation and

$$W_{\psi_Q}[s](f, t) = \mathcal{F}^{-1}\!\left[ \hat{\psi}_Q\!\left(\frac{f_0\,\cdot}{f}\right) \mathcal{F}[s](\cdot) \right](t) . \tag{30}$$
This supposes that the frequency $f$ (or scale $a$) parameter of the CWT is sampled geometrically. The slowest operations consist in matrix multiplications. Because the second step requires Fourier transforms of the distributions $S$ on a log-frequency scale, the computed range of log-frequency values is enlarged and padded with zeros; this avoids spurious ratios arising from the artificial periodisation of the $S$ distribution implied by the FFT.
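The second and third steps described above can be sketched as follows for distributions already sampled on a geometric frequency grid. This is an illustration under stated assumptions rather than the authors' implementation: the function name `log_freq_correlation`, the padding to twice the grid length and the base 2 logarithm in the example are all choices made here.

```python
import numpy as np

def log_freq_correlation(S1, S2, dlogf):
    """Discrete version of Eq. (27) along axis 0 (the log-frequency axis).

    S1, S2: arrays of shape (N,) or (N, T), sampled with constant step dlogf.
    Returns the lags log q = l * dlogf for l = -(N-1)..(N-1) and the values
    R[l] = sum_k S1[k] * S2[k + l] * dlogf.
    """
    S1 = np.asarray(S1, dtype=float)
    S2 = np.asarray(S2, dtype=float)
    N = S1.shape[0]
    M = 2 * N                                    # zero padding avoids wrap-around
    F1 = np.fft.fft(S1, n=M, axis=0)
    F2 = np.fft.fft(S2, n=M, axis=0)
    R = np.fft.ifft(np.conj(F1) * F2, axis=0).real * dlogf
    R = np.roll(R, N - 1, axis=0)[: 2 * N - 1]   # reorder lags to -(N-1)..(N-1)
    log_q = np.arange(-(N - 1), N) * dlogf
    return log_q, R

# Illustrative example: two Gaussian bands one octave apart (log2 scale).
dlogf = 1.0 / 48.0
logf = np.arange(240) * dlogf - 2.0
S = np.exp(-(logf / 0.05) ** 2) + np.exp(-((logf - 1.0) / 0.05) ** 2)
log_q, R = log_freq_correlation(S, S, dlogf)
print(log_q[np.argmax(R)])        # strongest peak at log q = 0 (q = 1)
```

In this example the auto-correlation shows a central peak at $q = 1$ and two weaker side peaks at $\log_2 q = \pm 1$, the pattern described later for the two-sine model signal. Without the zero padding, the circular correlation would fold large lags back onto small ones and create spurious ratios.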
3 Computation of Log-Frequency Distributions from Numerical and Real Signals

3.1 Model Signals Constructed from Sine Functions

In Fig. 3, we construct an artificial non-stationary signal from the sum of two sine functions: $s(t) = \sin(\phi_1(t)) + \sin(\phi_2(t))$, with $\phi_2(t) = 4\pi t$ linear in time and $\phi_1(t) = 2\pi t H(-t) + 3\pi t H(t)$, where $H$ is the Heaviside step function, and we compare the wavelet transform analysis for the two quality factors $Q = 8$ and $Q = 64$. With this signal, we estimate a lower bound of $Q$ that is suitable for a frequency discrimination according to [21]: for $t < 0$, $Q \gtrsim 10$ and for $t > 0$, $Q \gtrsim 14$. Moreover, the signal length gives the constraint $Q \lesssim 285$. For $t < 0$, the signal possesses two frequencies, highlighted on the colour-coded image of $S^{(Q)}(\log f, t)$ by two horizontal bands ($f_1 = 1$ and $f_2 = 2$), the width of which depends on the quality factor $Q$ ($Q = 8$, near the lower acceptable bound, in Fig. 3b and $Q = 64$ in Fig. 3c). For $t > 0$, we can again recognise the two bands $f_1$ and $f_2$ and, as for $t < 0$, their narrowing for the larger $Q$ value. The transition zone of these two bands, below and above $t = 0$, needs to be discussed. Figure 4 highlights this transition with sections of $S^{(Q)}(\log f, t)$ performed for remarkable values of $f$: 1, 3/2 and 2. From the sections of Fig. 4a, we estimate the width of this transition as $\sim 4.8$ s for $Q = 8$ and $\sim 39$ s for $Q = 64$. Another interesting phenomenon emerges in the $t > 0$ regime, where the two frequency bands become closer. A low-frequency modulation of the squared modulus of the wavelet transform appears in the intermediate frequency range $[f_1, f_2]$, with period 2 s, corresponding to the frequency $f_m = f_2 - f_1$ (0.5 Hz in this example). The squared modulus of the wavelet transform of the sum is not simply the superposition of the squared moduli of the wavelet transforms of each sine alone, that is, $|W_{\psi_Q}[s_1 + s_2](a, b)|^2 = |W_{\psi_Q}[s_1](a, b)|^2 + |W_{\psi_Q}[s_2](a, b)|^2$ does not hold: extra terms such as $2\hat{\psi}(af_1)\hat{\psi}(af_2)\cos(2\pi(f_2 - f_1)b)$ are also involved and are not negligible when $f_1$ and $f_2$ become too close (which is the case in Fig. 3b). This effect disappears almost completely for larger $Q$ values because the product $\hat{\psi}(af_1)\hat{\psi}(af_2)$ vanishes. We conclude that the choice of $Q$ is a compromise between two objectives: (i) discriminating close frequencies (in which case larger $Q$ values will be preferred) and (ii) affording a correct temporal resolution for the detection of steep frequency changes (in which case smaller $Q$ values will be more efficient).

Fig. 3 Analysis of a model signal defined as the sum of two sine functions $s(t) = \sin(2\pi f_1(t)t) + \sin(2\pi f_2 t)$, with $f_2 = 2$ constant, and $f_1(t) = H(-t) + \frac{3}{2}H(t)$ with the Heaviside step function. (a) Plot of the frequencies $f_1(t)$ and $f_2(t)$ in (base 2) logarithmic scale. (b) $S^{(8)}(\log f, t)$, computed for $Q = 8$. (c) $S^{(64)}(\log f, t)$, computed for $Q = 64$. (d) Temporal signal $s(t)$ in the time window [$-40$ s, 40 s]. (e) $R[S^{(8)}, S^{(8)}](\log q, t)$. (f) $R[S^{(64)}, S^{(64)}](\log q, t)$. $R[S^{(Q)}, S^{(Q)}](\log q, t)$ is defined in Eq. (27)

Figure 3e,f shows the corresponding colour-coded maps $R[S^{(Q)}, S^{(Q)}](\log q, t)$ for the same signal and the same values of $Q$ (8 and 64). We recognise for $t < 0$ three horizontal bands of constant $q$, corresponding, respectively, to frequency ratios $q = 1/2, 1, 2$. The intensity of the middle band ($q = 1$) is more contrasted ($\times 2$) because it corresponds to the sum of self-relations ($f_1$:$f_1$) and ($f_2$:$f_2$). The two symmetric weaker bands correspond to cross-frequency ratios ($f_1$:$f_2$) and ($f_2$:$f_1$). For $t > 0$, the three bands become closer, and similarly to the maps of $S^{(Q)}$ in Fig. 3b, a slow temporal modulation of $R[S^{(Q)}, S^{(Q)}](\log q, t)$ is superimposed on
the bands, due to coupling terms in the wavelet transform modulus. As expected and similarly to what was observed on the $S^{(Q)}(\log f, t)$ maps, increasing $Q$ from 8 to 64 produces a strong narrowing of the bands and a strong reduction of the low-frequency modulation. The sections at fixed $q$ of these $R[S^{(Q)}, S^{(Q)}](\log q, t)$ maps are shown in Fig. 4 to highlight similarly the transition zone around $t = 0$, its widening for larger $Q$ values and the slow temporal modulations observed for $Q = 8$. Due to the use of the Grossmann wavelet, sections at fixed $t$ of both $S^{(Q)}(\log f, t)$ and $R[S^{(Q)}, S^{(Q)}](\log q, t)$ are Gaussian, of widths $(\sqrt{2}Q)^{-1}$ and $Q^{-1}$, respectively, when the bands are not interfering (independent of $t$).

Fig. 4 (a, c, e) Sections of the log-frequency spectral distributions $S^{(8,64)}(\log f_i, t)$ selected from Fig. 3b,c for the three frequencies $f_1 = 1$, $f_2 = 3/2$ and $f_3 = 2$ Hz and the same values of the quality factor $Q$ (8 and 64). (b, d, f) Sections of $R[S^{(Q)}, S^{(Q)}](\log q_i, t)$ selected from Fig. 3e,f for three values of $q$: $q_1 = 1$, $q_2 = 4/3$, $q_3 = 2$, corresponding to local maxima of $R[S^{(Q)}, S^{(Q)}]$. For each selected $q$ value, $R[S^{(Q)}, S^{(Q)}]$ was scaled by its maximum in the time interval

Another family of model signals (Fig. 5e), particularly interesting with respect to the applications to voice signals, is defined as the sum of sine functions $\sum_{i=1}^{n} \sin(2\pi f_i t)$ with $f_i = i f_1$, $i$ a positive integer. We use here a simple model with discrete and constant frequency components, which does not claim to account for either the intrinsic randomness or the non-stationarity of physiological signals. To improve the matching of model equations with real signals, we suggest two recent works based on the implementation of stationarity-breaking operators on Gaussian stationary random signals [30, 31]. In Fig. 5, we take $n = 6$ and perform
the same time–frequency analysis with a Grossmann analysing wavelet with two values of $Q$, respectively, $Q = 8$ (b,f) and $Q = 128$ (c,g). We note again that the larger $Q$, the finer and more distinguishable the peaks of both $S^{(Q)}(\log f, t)$ and $R[S^{(Q)}, S^{(Q)}](\log q, t)$. The low-frequency modulations already noticed in the previous example appear again in this example (Fig. 5b,f) for $Q = 8$. Remarkably, the frequency of this slow mode is precisely the fundamental frequency of the signal, and the modulation is the most intense for the highest harmonic ($f_6 = 6f_1$). This effect is due to the ordering of these 6 frequencies as integer multiples of $f_1$, giving a constant frequency step between successive harmonics, $f_{i+1} - f_i = f_1$. This $f_1 = 1$ Hz slow modulation mode appears when two frequencies of the list are too close (in log-scale) to be separated properly by the analysing wavelet. We observe that in the time–frequency distribution of $s$ shown in Fig. 5b for $Q = 8$, the higher harmonics cannot be distinguished, and their separation requires a marked increase of $Q$ (for instance $Q = 128$ in Fig. 5c). Figure 5d illustrates two sections of these distributions for $t = 0$ (black curve, $Q = 8$ and red curve, $Q = 128$). This phenomenon is even more visible on the ratio distribution $R[S^{(Q)}, S^{(Q)}](\log q, t)$ in Fig. 5h.

Fig. 5 Analysis of a model signal defined as the sum of 6 sine functions $s(t) = \sum_{i=1}^{6} \sin(2\pi f_i t)$, with $f_i = i f_1$ constant. (a) Fourier spectrum of $s$: $|\hat{s}|(f)$, on which the two axes have been inverted. (b) $S^{(8)}(\log f, t)$, computed for $Q = 8$. (c) $S^{(128)}(\log f, t)$, computed for $Q = 128$. (d) $S^{(Q)}(\log f, t = 0)$. (e) Temporal signal $s(t)$ in the time window [$-6.5$ s, 6.5 s]. (f) $R[S^{(8)}, S^{(8)}](\log q, t)$. (g) $R[S^{(128)}, S^{(128)}](\log q, t)$. (h) $R[S^{(Q)}, S^{(Q)}](\log q, t = 0)$. In (d) and (h) the plots for $Q = 8$ (respectively, 128) are coloured in black (respectively, red). $R[S^{(Q)}, S^{(Q)}](\log q, t)$ is defined in Eq. (27)

$R[S^{(Q)}, S^{(Q)}](\log q, t)$ presents an odd number of peaks, and it is symmetric around the central peak ($q = 1$). In that example, each
of the six frequency components contributes to this central peak. Eleven lateral peaks emerge for $q > 1$ and accumulate closer to the central peak. The positions of these peaks correspond to all the possible distinguishable frequency ratios of the signal, and the amplitude of these peaks is proportional to the number of combinations of frequencies that produce a given ratio. To distinguish all the peaks in Fig. 5g,h, it was necessary to increase $Q$ to 128. The total number of frequency ratios (for $q > 1$) is $\sum_{i=1}^{n-1} i = n(n-1)/2$ if $n > 2$, and in this example it is equal to 15. When there is no redundancy in the frequency ratios, for instance, if harmonic frequencies are prime multiples of the fundamental frequency, each frequency ratio occurs once in $R[S^{(Q)}, S^{(Q)}](\log q, t)$.
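The counts quoted above (15 pairs and 11 distinct lateral peaks for $n = 6$) can be reproduced by enumerating the ratios of harmonic indices exactly. The helper name `harmonic_ratios` is hypothetical and only serves this illustration.

```python
from collections import Counter
from fractions import Fraction

def harmonic_ratios(n):
    """Multiplicity of each ratio q = j/i > 1 between harmonics 1 <= i < j <= n."""
    return Counter(Fraction(j, i)
                   for i in range(1, n + 1)
                   for j in range(i + 1, n + 1))

ratios = harmonic_ratios(6)
print(sum(ratios.values()))    # 15 pairs, i.e. n(n-1)/2
print(len(ratios))             # 11 distinct ratios, i.e. 11 lateral peaks for q > 1
for q, count in sorted(ratios.items()):
    print(q, count)            # multiplicity ~ relative peak amplitude
```

The multiplicities explain the unequal peak heights: the octave $q = 2$ is produced by three pairs (1:2, 2:4, 3:6), whereas a ratio such as $q = 6/5$ is produced by a single pair.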
3.2 Physiological Signals: Voice Recordings

The voice signals reported in this manuscript were selected from the VOiceICar-fEDerico II (VOICED) database [32], recorded by the “Institute of High Performance Computing and Networking of the National Research Council of Italy (ICAR-CNR)” and the Hospital University of Naples “Federico II” during 2016 and 2017. This database can be downloaded from the PhysioNet website [33]. It has been proposed lately as a new element in research on automatic voice disorder detection and classification. Together with medical phonetic examinations of a set of 208 individuals (73 males and 135 females), voice signals, proportional to a local sound emission intensity, were acquired for about 4–5 s and sampled at 8000 Hz at 32 bit; vocal folds were examined by laryngoscopy; and two medical questionnaires were collected at the ambulatories of Phoniatrics and Videolaryngoscopy of the “Federico II” Hospital of the University of Naples or at the medical room of the ICAR-CNR. The protocol description is reported in [32]. Dysphonia is a quite common voice disorder (1/3 of adults will suffer from it once in their lifetime); it may originate from a functional or organic alteration of the vocal apparatus and its mechanics and may not systematically be considered as pathologic [34, 35]. On the one hand, laryngoscopy is an invasive technique that gives a direct view of the physical alterations of the vocal tract [36]. On the other hand, the analysis of the voice acoustic signal is not intrusive and, thanks to the improvement of signal analysis methods, it can nowadays be used to guide or assist the recognition of the origin of a suspected dysphonia. Voice classification methods from voice recordings by the recognition and quantification of the voice timbre (or tone colour) have rapidly attracted the interest of electronic and computer science engineers.
Globally, one can classify these methods in three groups [37]: (i) the time-domain methods that use auto-correlation functions or their variants [38, 39] to search for repeatability between a temporal waveform and its time-lagged version, (ii) frequency domain methods that locate characteristic frequencies and infer a spectral “coloratura” for the voice (these methods rapidly reach their limitations if the signal is not stationary), and (iii) time–frequency domain techniques [40–42] that we have chosen for this study.
The voice signal $s(t)$, numbered #008, is that of a 51-year-old female without deep vocal impairment at the time of the test, ranked in the group of reflux laryngitis (Fig. 6). This example was chosen because it has marked peaks which can be detected by thresholding the signal (this is quite rare because it requires both a particular shape of the signal and a global stationarity of its amplitude). The Fourier spectra of this signal (reported in log–log and log–lin scales in Fig. 6b and f, respectively) weight the power (in log-scale) of its spectral components: a fundamental mode with frequency $f_1 \sim 188$ Hz and higher modes (harmonics), ranked as integer multiples of $f_1$: $i f_1$ with $i = 2, 3, 4, 5, \ldots$, with different powers. This simple frequency decomposition was observed in most of the signals provided in the VOICED database, and it is a conspicuous characteristic of the human voice. These voice signals appear as the alternation of quite regular large and sharp peaks (which give the fundamental mode) and smaller oscillations which may be very irregular. In some cases, these smaller oscillations may be difficult to discriminate from the noise produced by some friction of the vocal tract. Even though this type of signal can be compared to the sum of sine functions introduced in Fig. 5e, the higher number of harmonics of this signal and their different powers suggest that it could be reproduced by a nonlinear dynamical system (ruled by nonlinear ordinary differential equations) where the different frequency components follow nonlinear rules [45]. Our purpose in this chapter is not to discuss the physical and biological mechanisms or the modelling of voice signals, and we have selected these examples as illustrations for our log-frequency correlation method because their spectral decomposition is very rich in harmonics (overtones) of the fundamental frequency.
The temporal change of the fundamental mode frequency $f_1(t_i)$ and of the largest peak amplitude $A(t_i)$ can be extracted from the #008 voice signal by thresholding its largest amplitude peaks (maxima $s_P(t_i)$ and minima $s_p(t_i)$), as depicted in Fig. 6e. Figure 6c shows that $f_1(t)$ is modulated in time, suggesting an irregularity of the rhythm coming from some difficulty of the patient to maintain a constant value of $f_1$. In this example, a similar temporal modulation is also visible on the largest peak amplitude $A(t_i)$ (Fig. 6g). If these temporal variations were solely produced by instrumental noise, the first return scatter plots of $f_1$ and $A$ at successive peaks would give a symmetric cloud of points around the diagonal. In Fig. 6d for the fundamental frequency modulation and in Fig. 6h for the amplitude modulation, these first return scatter plots are anisotropic, meaning that the dispersion of these values extends beyond instrumental noise. This conclusion is also confirmed by the temporal evolution of $f_1(t_i)$ (Fig. 6c) and $A(t_i)$ (Fig. 6g): in the first second, the modulations of $f_1(t_i)$ have the largest amplitude and are quasi-periodic, and this first regime can also be recognised from the modulations of $A(t_i)$. This patient has a rather mild dysphonia (classified as produced by reflux laryngitis), which can be recognised by an important set of harmonics and a rather low vocal fold noise. The second voice signal illustrated here is that of a 62-year-old female (#169 in the VOICED database), with hyperkinetic dysphonia. In that case, the quasi-periodicity observed in signal #008 is so disrupted that it is impossible to use the previous
threshold method for extracting the largest signal peaks; the time–frequency analysis is required to check to what extent we can find a timbre for this voice and how it changes with time.

Fig. 6 Analysis of the #008 voice signal $s(t)$ from the voice-icar-federico-ii database [32]. (a) Zoom of the signal during 0.2 s. (b) $|\hat{s}|$ plotted versus $f$ in logarithmic scales (base 10 and base 2, respectively). (c) Local frequency $f_1(t)$ computed from the detection of the extrema of larger prevalence from the signal (see (e)). (d) First return scatter plot of the inverse of interpeak intervals $f_1(t_i) = 1/T_i$ (these large amplitude peaks are marked with black dots in (e)). (e) Zoom of the signal on a short interval (20 ms) showing the local maxima $s_P(t_i)$ (black dots) and minima $s_p(t_i)$ (black stars), which are used to compute both the local frequency $f_1(t_i) = 1/T_i$ and the amplitude of each larger amplitude peak, $A(t_i) = s_P(t_i) - s_p(t_i)$. (f) $|\hat{s}|$ (in base 10 log-scale) plotted versus $f$ in Hz (linear scale). (g) Amplitude of the largest peaks $A(t_i)$ versus time (see (e) for their detection). (h) First return peak amplitude ($A(t_{i+1})$ vs $A(t_i)$) scatter plot

Figure 7b,c reports the colour-coded images of the #008 ($s_1$) and the #169 ($s_2$) time–frequency distributions $S^{(64)}(\log f, t)$. With a temporal averaging of these time–frequency distributions, $\langle S^{(Q)}(\log f, t) \rangle_t$, we get a smooth estimate of power spectrum distributions for these two examples (Fig. 7d). Owing to the log-frequency filtering by the Grossmann analysing wavelet, the shapes of the averaged peaks shown in Fig. 7d are in general different from those which could be obtained with a Welch estimator [43]. The fundamental frequency band of #169 is much broader than that of #008 and shifted to greater values ($f_{1,1} \sim 188.8$ Hz for voice #008 and $f_{1,2} \sim 268$ Hz for voice #169), and we also note that it is almost impossible in #169 to discriminate more than one harmonic from the averaged frequency spectrum. The time–frequency distributions in Fig. 7b,c highlight these differences. Whereas the fundamental mode band and its harmonics are weakly modulated in time for #008, those of #169 are very irregular: the third and fourth harmonics can be mixed up and indistinguishable, and the harmonics above the fifth are no longer visible. The vocal folds of #169 can no longer maintain their tight contact
that is essential for a correct sound emission, and the resulting effect, when hearing the voice, is that of a scratching noise which completely covers the expected tone. This person is quite unable to sing a melody.

Fig. 7 Comparison of the time–frequency analysis of voice signals #008 ($s_1$) and #169 ($s_2$). (a) Zooms of $s_1$ and $s_2$ in a 0.1 s window. (b, c) Associated time–frequency distributions (Eq. 26) $S^{(64)}[s_1](\log f, t)$ and $S^{(64)}[s_2](\log f, t)$, computed with a Grossmann analysing wavelet and a quality factor $Q = 64$. The horizontal bands highlight the fundamental and harmonic frequencies. (d) Corresponding temporal averages of the frequency distributions reported in panels (b) (black line) and (c) (red line). The ordinate of (d) (here the horizontal axis) is arbitrary and the frequency distributions are normalised
3.3 Tuning Voice Pitches via the Computation of Correlation Functions

3.3.1 Reference Frequency Distribution $S_{0,j}$
For each signal $s_j$, an “ideal” frequency distribution $S_{0,j}$ is first introduced in order to compare the real frequency distribution $S_j^{(Q)}$ with a reference through the cross-correlation defined in Eq. (27). Let us consider the vibrating string model used to represent the sounds emitted by stringed instruments. When the string is plucked at its ends, its natural frequencies are integer multiples of the fundamental frequency, which depends on the square root of the tension force of the string [44]. By analogy to this model, the reference frequency distribution $S_{0,j}(\log f)$ is a Dirac comb model in log-scale, defined as a sum over integer multiples of the fundamental frequency $f_{1,j}$ of the studied signal:

$$S_{0,j}(\log f) = \sum_{n} c_n\, \delta\!\left(\log \frac{f}{n f_{1,j}}\right) , \tag{31}$$
with weights $c_n \ge 0$ and a possible cut-off ($c_n = 0$, $\forall n > N$). We assume that the series of $c_n$ is summable: $\sum_n c_n < +\infty$. We then compare the spectrum of the comb reference model to itself. We build an ideal ratio distribution by computing the auto-correlation of $S_{0,j}$ derived from Eq. (28):

$$R_{00}(\log q) = R[S_{0,j}, S_{0,j}](\log q) = \sum_{n} \sum_{m} c_n c_m\, \delta\!\left(\log\!\left(q\,\frac{n}{m}\right)\right) , \tag{32}$$

which depends neither on time nor on the signal under study. When $\log q = 0$ ($q = 1$), $R_{00}(0) = \sum_n c_n^2 < +\infty$.
3.3.2 Cross-Correlation $R[S_{0,j}, S_j^{(Q)}]$

The time-dependent cross-correlation function between $S_{0,j}$ and $S_j^{(Q)}$ is deduced from Eq. (28):

$$R[S_{0,j}, S_j^{(Q)}](\log q, t) = \sum_{n} c_n\, S_j^{(Q)}(\log(q\, n f_{1,j}), t) . \tag{33}$$
Different degrees of “frequency matching” can also be captured by high peaks in $R[S_0, S_j^{(Q)}](\log q, t)$, especially when $q$ is a simple frequency ratio of harmonics of $f_{1,j}$ and $f_{1,0}$. For example, 1:2 (octave), 2:3 (fifth) and 3:4 (fourth) would give a perfect consonance; 3:5 (major sixth) and 4:5 (major third) would give a medial consonance; 5:6 (minor third) and 5:8 (minor sixth) would give an imperfect consonance. “Unmatched” frequency configurations would be obtained if a frequency ratio of a couple of harmonics belongs to the dissonance list: 8:9 (major second), 8:15 (major seventh), 9:16 (minor seventh), 15:16 (minor second) and 32:45 ($\sim 1/\sqrt{2}$, tritone) [46]. In the following, we will denote $R_{ij} = R[S_i^{(Q)}, S_j^{(Q)}]$, and the time–frequency window is fixed ($Q = 64$).
3.3.3 Application to Two Voice Signals from the VOICED Database
For each of the two voice signals #008 and #169, we construct a Dirac comb model as reference distribution. These distributions are such that their lowest frequency peak matches the signal fundamental frequency ($f_{1,1} = 188.8$ Hz for signal #008 and $f_{1,2} = 268$ Hz for signal #169). The frequency of the highest harmonic of the comb model is limited by the sampling frequency $F_s$: $n f_{1,i} \lesssim F_s/2$ ($F_s = 8000$ Hz). We take $c_n = 1$, $\forall n \le 15$:

$$S_{0,j}(\log f) = \sum_{n=1}^{15} \delta\!\left(\log \frac{f}{n f_{1,j}}\right) , \qquad j = 1, 2 . \tag{34}$$
For numerical computations, the frequency $f$ is discretized as $f_k = f_{\min}\,\alpha^{k-1}$, with $k = 1, 2, 3, \ldots, N$, where $N$ is the size of the frequency vector, $f_{\min}$ its minimum value and $\alpha$ the geometric factor determined from $N$, $f_{\min} = 100$ Hz (fixed by the voice database) and $f_{\max} = F_s/2$. The correlation function $R_{0j}(\log q, t)$ is computed by combining this comb distribution with that of the voice signal as in Eq. (29), using the analytic expression of the $\log(f)$-Fourier transform ($\mathcal{F}_{\log f}$) of the comb distribution. We do not take its Fourier transform numerically because it is a source of numerical artefacts. For the comb model aligned to the fundamental frequency $f_{1,j}$ of signal $s_j$, it reads

$$\mathcal{F}_{\log f}[S_{0,j}](u) = \sum_{n=1}^{15} \exp\!\left(-i 2\pi u \log \frac{n f_{1,j}}{f_{\min}}\right) , \tag{35}$$
where $u$ is the conjugate variable (through Fourier transformation) of $\log(f)$. In that space, we take the pointwise product of $\mathcal{F}_{\log f}[S_{0,j}]$ with the conjugate of $\mathcal{F}_{\log f}[S_j^{(Q)}]$ (computed numerically from $W_{\psi_Q}[s_j](\log f, t)$) and compute its inverse Fourier transform to recover the correlation function $R[S_{0,j}, S_j^{(Q)}](\log q)$.
The time-averaged frequency ratio distributions $R[S_{0,j}, \langle S_j^{(Q)}\rangle_t](\log q)$ of each of the voice signals #008 and #169 with its “best-fitted” Dirac comb model are presented in Fig. 8. If the signals were regular and quasi-stationary, these ratio distributions should pinpoint ratios corresponding to the multiples of the fundamental frequencies. The plots of these correlation functions in linear $q$ and $1/q$ scales in Fig. 8b and c highlight a strong asymmetry, which is due to the amplitudes of the fundamental mode and its harmonics differing from the constant coefficients of the comb model. Again, as for the frequency distributions, we note a strong difference between the ratio distributions of signals #008 ($s_1$) and #169 ($s_2$). Confronting $R[S_0, S^{(Q)}](\log q, t)$ with $S^{(Q)}(\log f, t)$ for voice signal #169 (Fig. 9) unveils important features which were not visible in the time-averaged ratio distribution $R[S_0, \langle S^{(Q)}\rangle_t](\log q)$ (Fig. 8). Even if the fundamental mode frequency and its harmonics vary a lot during this 3 s record, their ratios do not change dramatically, which is a characteristic property of the mechanics of the vocal folds. In the middle of this signal ($1.4\,\mathrm{s} < t < 1.55\,\mathrm{s}$; see Fig. 9a for a zoom on this interval), four flat ratio bands can be noticed, suggesting that this person made sufficient effort to recover, for a short period of time, a “mild sensation” of timbre. How this intermittent loss and recovery of the voice timbre occurs, and the time range of these alternating sequences, could be used as diagnostic criteria or for aftercare follow-up (invasive intervention is necessary if soft or hard nodules are detected on the vocal cords (stage III) or voice exercises for earlier
Fig. 8 Ratio distributions of voice signals ((1) #008 and (2) #169) and their frequency-matched Dirac comb models ($R_{0j}$; curves $R_{01}$ and $R_{02}$). (a) Correlations $R_{0j} = R[S_{0,j}, \langle S_j^{(Q)}\rangle_t](\log q)$ ($j = 1, 2$, $Q = 64$) of the frequency distributions with their reference comb frequency distributions (defined in the text). These correlations have been normalised to their maximum for ease of comparison. (b, c) Plot of $R[S_{0,j}, \langle S_j^{(Q)}\rangle_t]$ versus $q$ for $q > 1$ (b) and versus $1/q$ for $q < 1$ (c)
stages). It has recently been shown for patients with neurodegenerative diseases that not only the patient’s ability to speak and formulate sentences is altered but also the purely acoustic features of their voice [47].
3.3.4 Cross-Correlation $R[S_i^{(Q)}, S_j^{(Q)}]$ of Two Voice Signals
There are two interpretations for the log-frequency cross-correlation function $R[S_i^{(Q)}, S_j^{(Q)}](\log q, t)$, leading to different possible applications. Either we see it as a distribution of the ratios $q$ between the frequencies of $S_i^{(Q)}(\log f, t)$ and $S_j^{(Q)}(\log f, t)$, or we view it for each $q$ as a measure of how well $S_i^{(Q)}(\log f, t)$ and $S_j^{(Q)}(\log(qf), t)$ match. For instance, in the example reported here, $S_1^{(Q)}$ and $S_2^{(Q)}$ are obtained from two different persons holding a pitch in their vocal range. The peaks in the correlation function $R[S_1^{(Q)}, S_2^{(Q)}](\log q)$ indicate the importance of the corresponding frequency ratios between the voices, in accordance with its first interpretation as a ratio distribution. When these ratios are close to simple rational numbers $m/n$, they indicate the presence of an $m{:}n$ synchronisation, which corresponds to the consonance of the voices simply sung together. This outlines
Fig. 9 Comparing $R[S_0, S^{(Q)}](\log q, t)$ with $S^{(Q)}(\log f, t)$ for voice signal #169. (a) Zoom of $s(t)$ in the [1.4 s, 1.5 s] interval. (b) Plot of a middle selection of 3 s out of the 4.5 s recorded voice signal. (c) Temporal average of the frequency distribution $\langle S^{(Q)}\rangle_t(\log f)$ computed with a Grossmann analysing wavelet with quality factor $Q = 64$. (d) Colour-coded map of the time–frequency distribution $S^{(Q)}(\log f, t)$. (e) Ratio distribution of the averaged frequency distribution: $R[S_0, \langle S^{(Q)}\rangle_t](\log q)$. (f) Colour-coded map of the time-ratio distribution $R[S_0, S^{(Q)}](\log q, t)$
our strategy to assess how rational the spectral relations of the voices are. The other interpretation is as follows: assuming $S_2^{(Q)}(\log qf)$ models the second voice transposed by $q$ to a different pitch, the peaks in $R[S_1^{(Q)}, S_2^{(Q)}](\log q)$ also indicate for which pitch transpositions the second voice would best match the first voice. This allows us to tune one voice with the other. The possibility of matching a real voice signal with reference model signals is very interesting because it can limit the maximum harmonic frequency for this cross-correlation, as an “intelligent” low-pass filtering. This would not be possible by computing $R[S_i^{(Q)}, S_j^{(Q)}]$ directly. We compare in Figs. 10 and 11 two different “normal” voices (“a” vowel) from clinical research at the speech therapy laboratory UNADREO in Toulouse (France). The frequency distributions $S_i^{(Q)}(\log f, t)$ and $S_j^{(Q)}(\log f, t)$ ($i \neq j$) plotted in Fig. 10 have the same characteristic frequency peak structure as the voice #008, but we notice that the frequency distribution
Fig. 10 Comparison of the time–frequency analysis for two “normal” voice signals: $s_1$ is a sung vowel, and $s_2$ is simply a maintained vowel. (a) Zooms of $s_1$ and $s_2$ in a 0.1 s window. (b, c) Associated time–frequency distributions (Eq. 26) $S_1^{(Q)}(\log f, t)$ and $S_2^{(Q)}(\log f, t)$ computed with a Grossmann analysing wavelet with quality factor $Q = 64$. The horizontal bands highlight the fundamental and harmonic frequencies. (d) Corresponding temporal averages of the frequency distributions reported in panels (b) (black line) and (c) (red line). The ordinate of (d) (here the horizontal axis) is arbitrary and the frequency distributions are normalised
of the voice $s_1$ has greater energy in the harmonics around 1000 Hz, which is a characteristic of the emission of trained singer voices. The cross-correlation ratio distribution $R_{ij} = R[S_i^{(Q)}, S_j^{(Q)}]$ is quite different from the auto-correlation ratio distributions $R_{00}$, $R_{ii}^{(Q)}$ and $R_{jj}^{(Q)}$. A common reference Dirac comb is chosen for both $s_1$ and $s_2$ ($i = 1$ and $j = 2$) and is aligned to the fundamental frequency $f_{1,1} = 307.2$ Hz. The highest central peak indicates the ratio of the fundamental frequencies, and it is centred for $R_{00}$, $R_{11}$ and $R_{22}$ and shifted to $q = f_{1,2}/f_{1,1} = 247/307.2 \sim 2^{-0.315}$ for $R_{12}$. $R_{00}$ shows very sharp and narrow peaks which line up symmetrically on either side of $q = 1$. The amplitude of these peaks recapitulates the weighting of the frequency ratios $q$ for simple comb models and gives the distribution that would be obtained if all the frequency components of the signals had exactly the same power. We finish with the question of comparing the ratio distributions of the voices to the ideal reference.
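As a quick check of the exponent quoted for the shifted peak, the following worked computation uses only the two fundamental frequencies given in the text; the conversion to cents is added for orientation and is not part of the original analysis.

```latex
\[
\log_2 \frac{f_{1,2}}{f_{1,1}}
  = \log_2 \frac{247}{307.2}
  = \frac{\ln 0.8040}{\ln 2}
  \approx \frac{-0.2181}{0.6931}
  \approx -0.315 ,
\]
so that the central peak of $R_{12}$ lies about
$0.315 \times 1200 \approx 378$ cents below $q = 1$.
```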
3.4 Frequency Distribution Matching and Sonance

We propose to find the best match between $R_{12}$ and $R_{00}$ by computing yet another cross-correlation:
Fig. 11 Comparing ratio distributions and sonance of the two voice signals of Fig. 10. (a) Plots of the normalised ratio distributions $R_{0j} = R[S_{0,1}, S_j^{(Q)}](\log q)$ with $j = 1, 2$. For both signals, we use the same Dirac comb model with the fundamental mode frequency of voice $s_1$. (b) Plots of the normalised ratio distributions $R_{11}$, $R_{22}$, $R_{12}$ with $R_{ij} = R[S_i^{(Q)}, S_j^{(Q)}](\log q)$ computed from the two voices and $R_{00} = R[S_{0,1}, S_{0,1}](\log q)$ computed from the Dirac comb model. (c) Plot of the sonance (cross- and auto-) of the two voices with respect to the reference comb model ratio distribution for an arbitrary pitch transposition $\log(x)$: $G[R_{ij}](\log x) = R[R_{00}, R_{ij}](\log x)$ with $i, j = 1, 2$ (green line), $i, j = 1, 1$ (blue line) and $i, j = 2, 2$ (red line). (d) Sonance curves of panel (c) corrected by subtracting their lower envelope
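$$R[R_{00}, R_{12}](\log x, t) = \int_0^\infty R_{00}(\log q)\, R_{12}(\log xq, t)\, d\log q \tag{36}$$

$$= \int_0^\infty R_{00}\!\left(\log \frac{q}{x}\right) R_{12}(\log q, t)\, d\log q = \sum_{m,n} R_{00}\!\left(\log \frac{m}{n}\right) R_{12}\!\left(\log x\frac{m}{n}, t\right). \tag{37}$$

This new quantity can be obtained by two equivalent paths:

$$R[R_{00}, R_{12}](\log x, t) = R[R_{01}, R_{02}](\log x, t), \tag{38}$$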
each corresponding to one use of the newly introduced parameter $x$ that we call pitch transposition. Either $R_{12}(\log xq, t)$ is seen as the distribution of ratios $q$ between the first voice $S_1^{(Q)}(\log f, t)$ and the second voice of transposed pitch $S_2^{(Q)}(\log xf, t)$, or the $x$ in $R_{00}(\log(q/x))$ is seen as a varying ratio between the fundamental frequencies of the ideal distribution $S_0$. Indeed, the best match is expected when the pitch of the second voice is transposed up to the pitch of the first one, so that the voices are in unison. For the distributions of ratios, this is equivalent to matching them using the ratio of the fundamental frequencies: $x = f_{1,2}/f_{1,1}$. As a result, for two voices of fundamental frequency $f_{1,1}$ and $f_{1,2}/x$, the quantity $R[R_{00}, R_{12}](\log x, t)$ as a function of the pitch transposition $x$ has the following interpretation: it measures how “ideal” (similar to the model $R_{00}$) the spectral relations are between the voices. Extrema of this curve appear directly related to the musical property of consonance or dissonance of certain fundamental frequency ratios. For this reason, we call this quantity the sonance between the two voices, and we rewrite its definition (36) using the reference ratio distribution as the density of a measure $dG(\log q) = R_{00}(\log q)\, d\log q$:

$$G[R_{12}](\log x, t) = R[R_{00}, R_{12}](\log x, t) = \int_0^\infty R_{12}(\log xq, t)\, dG(\log q), \tag{39}$$

which we could denote equivalently $G[S_1^{(Q)}, S_2^{(Q)}](\log x, t)$. This sonance measure is a geometric function of a pitch transposition quantity $x$, whose maxima indicate the optimum relative pitch transpositions for which the two voices sung together would match best. The term sonance bears some analogy with the concepts of consonance and dissonance, which were first suggested by Pythagoras (sixth century BC), hence our choice of the symbol $G$, but this similarity of terms must be nuanced. Dissonance and consonance are not mathematical quantities since they have been used to describe an empirical sensation of human beings (a combination of cochlea physiology and cognitive training) when hearing a mixture of sounds (two or more) [49, 50]. In Fig. 11, we compute the sonance of the two voice records shown in Fig. 10, from time-averaged frequency distributions (for convenience) referenced to the same comb model. The comparison of the sonance profiles in (c) to those of the generating auto- and cross-correlation functions $R_{01}$ and $R_{02}$ in (a), and $R_{12}$, $R_{11}$, $R_{22}$ and $R_{00}$ in (b) draws our attention to important features. Apart from the asymmetry (symmetry) of the cross-(auto-)sonances, expected for any correlation function, the sonance profiles have a strong positive baseline and are much less peaked than their simpler counterparts $R_{ij}$. This can be linked to the combined effects of a dense forest of peaks in $R_{00}$ and their width in the ratio distributions of the voices. Nevertheless, this baseline can be subtracted, as shown in Fig. 11d, for further comparison. The highest peaks of $G[R_{ij}]$, pointing to the interval changes between the two voices that would lead to the best consonances, are very similar
to the ones already present in $R_{ij}$. For instance, the global maxima of $G[R_{12}]$ to the left of $x = 1$ correspond to the ratio between the fundamental frequencies (voice $s_2$ has a lower pitch than voice $s_1$). However, their prominence is different, and sonance profiles contain a more detailed landscape of smaller peaks and wells, tracking the gains and losses of consonance of the voices sung simultaneously at the corresponding relative pitch transposition. The almost non-existent peak of $G[R_{12}](0)$ indicates that these two voices sung together without any tone adjustment would be quite dissonant because their fundamental and harmonic frequencies have little commensurability: the frequency ratios are not close to simple rational numbers $m/n$. In agreement with common intuition, the central value of the auto-sonance is also higher for the singer's voice than for the untrained one, $G[R_{11}](0) > G[R_{22}](0)$. The implementation of a frequency reassignment would be particularly beneficial to this time-dependent application, successively improving the discrimination of higher harmonics in the CWT, the resulting ratio distributions and the resolution of the sonance landscape. In particular, recent developments in this area based on the synchrosqueezing transform [51, 52] could be important for further developing the sonance concept proposed herein. As a last remark, the sonance profile of the voices is directly influenced by two choices: first, the quality factor $Q$ of the wavelet, which determines the distinguishability of the frequencies and their ratios, and second, the number of harmonics and their amplitudes in the reference comb model $S_0$. We believe that for a realistic sonance profile, $Q$ should be related to the critical band of the ear [49] and is, together with the design of the reference ratio distribution $R_{00}$, representative of the musical training of the ear.
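The equivalence of the two paths in Eq. (38), and the location of the sonance maximum at the ratio of the fundamental frequencies, can be checked numerically on toy data. The sketch below is only an illustration under simplifying assumptions: the distributions are short integer vectors on a uniform log-frequency grid, the correlation is circular (it wraps around the grid), and all names and data are hypothetical.

```python
def xcorr(a, b):
    """Circular correlation R[a, b][l] = sum_k a[k] * b[(k + l) % N],
    a discrete stand-in for Eq. (36) on a uniform log-frequency grid."""
    n = len(a)
    return [sum(a[k] * b[(k + l) % n] for k in range(n)) for l in range(n)]


def shift(a, d):
    """Circularly shift a by d bins towards higher log-frequency."""
    n = len(a)
    return [a[(k - d) % n] for k in range(n)]


# Toy distributions on a 12-bin log-frequency grid (hypothetical data).
S0 = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # reference pattern
S1 = S0[:]                                   # first voice: same pattern
S2 = shift(S0, 3)                            # second voice: 3 bins higher

R00, R12 = xcorr(S0, S0), xcorr(S1, S2)
R01, R02 = xcorr(S0, S1), xcorr(S0, S2)

G_direct = xcorr(R00, R12)     # left-hand side of Eq. (38)
G_via_ref = xcorr(R01, R02)    # right-hand side of Eq. (38)
best_shift = max(range(len(G_direct)), key=G_direct.__getitem__)

print(G_direct == G_via_ref)   # both paths give the same sonance profile
print(best_shift)              # transposition (in bins) of the best match
```

In this toy setting the sonance peaks at the shift that brings the second voice into unison with the first, which mirrors the interpretation $x = f_{1,2}/f_{1,1}$ given in the text.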
4 Conclusion

We have introduced time–log-frequency ratio distributions based on analytic wavelets, which we have applied to model signals and to physiological signals (voice records). We found that the Grossmann wavelet is a natural shape for this task. A second correlation operation, called sonance, was defined to compare the ratio distribution of the voices with an ideally rational one. This function of a pitch transposition estimates, in a sense, the “harmony” produced by the two voices sung together. This work has shown that a geometric correlation function, in log-frequency, is best suited to uncover characteristic frequency ratios between different signals. The application to voice records has been selected not only because it is simple to perform and reproduce but also because it gives credit to the concept of frequency ratios in voiced sounds. This method is presently being generalised
to physiological signals recorded from different organs or tissues, such as the heart and the breath, extending the application of these ratio distributions.

Acknowledgments This work has been supported by ANR under contract: ANR-18-CE45-001201, EURLight S&T funding. We are very indebted to L. Delmarre, E. Harte, and S. Polizzi for fruitful discussions.
References

1. S.E. Jorgensen, B.D. Fath (eds.), Encyclopedia of Ecology, 1st edn. (Elsevier, Amsterdam, The Netherlands, 2008)
2. P.C. Ivanov, K.K.L. Liu, R.P. Bartsch, New Journal of Physics 18(10), 100201 (2016)
3. A.L. Barabasi, N. Gulbahce, J. Loscalzo, Nature Reviews Genetics 12(1), 56 (2011)
4. A. Goldbeter, Au Coeur des Rythmes du Vivant. La Vie Oscillatoire (Odile Jacob, Paris, 2018)
5. J.N. Oppenheim, M.O. Magnasco, Physical Review Letters 110(4) (2013)
6. J. Schnupp, I. Nelken, A. King, Auditory Neuroscience: Making Sense of Sound (MIT Press, Cambridge, Mass, 2011)
7. G.V. Haines, A.G. Jones, Geophysical Journal International 92(1), 171 (1988)
8. A. Grossmann, J. Morlet, SIAM Journal on Mathematical Analysis 15(4), 723 (1984)
9. R. Kronland-Martinet, J. Morlet, A. Grossmann, International Journal of Pattern Recognition and Artificial Intelligence 1(2), 273 (1987)
10. J.M. Combes, A. Grossmann, P. Tchamitchian, Wavelets: Time-Frequency Methods and Phase Space (Springer, Berlin, Heidelberg, 1989)
11. N. Delprat, B. Escudie, P. Guillemain, R. Kronland-Martinet, P. Tchamitchian, B. Torresani, IEEE Transactions on Information Theory 38(2), 644 (1992)
12. R. Carmona, W.L. Hwang, B. Torresani, in Wavelets and Statistics, vol. 103, ed. by A. Antoniadis, G. Oppenheim (Springer New York, New York, NY, 1995), pp. 95–108
13. R. Carmona, W. Hwang, B. Torresani, IEEE Transactions on Signal Processing 45(10), 2586 (1997)
14. R. Carmona, W.L. Hwang, B. Torresani, Practical Time-frequency Analysis: Gabor and Wavelet Transforms with an Implementation in S. Vol. 9 in Wavelet Analysis and its Applications (Academic Press, San Diego, 1998)
15. B. Torresani, Analyse Continue par Ondelettes. Collection Savoirs Actuels (InterEditions, Paris, 1995)
16. P. Flandrin, Time-Frequency Time-Scale Analysis, Vol. 10 in Wavelet Analysis and its Applications (Academic Press, San Diego, 1998)
17. J. Lilly, S. Olhede, IEEE Transactions on Signal Processing 57(1), 146 (2009)
18. C.K. Chui, An Introduction to Wavelets (Academic Press, San Diego, 1992)
19. T.P. Le, P. Argoul, Journal of Sound and Vibration 277(1-2), 73 (2004)
20. Y. Rocard, Dynamique Générale des Vibrations (Masson et cie, Paris, 1943)
21. S. Erlicher, P. Argoul, Mechanical Systems and Signal Processing 21(3), 1386 (2007)
22. I. Daubechies, T. Paul, Inverse Problems 4(3), 661 (1988)
23. P.M. Morse, Physical Review 34(1), 57 (1929)
24. I. Daubechies, J.R. Klauder, T. Paul, Journal of Mathematical Physics 28(1), 85 (1987)
25. S. Olhede, A. Walden, IEEE Transactions on Signal Processing 50(11), 2661 (2002)
26. J.M. Lilly, S.C. Olhede, IEEE Transactions on Information Theory 56(8), 4135 (2010)
27. J.M. Lilly, S.C. Olhede, IEEE Transactions on Signal Processing 60(11), 6036 (2012)
28. J.M. Lilly, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473(2200), 20160776 (2017)
29. T. Paul, K. Seip, in Wavelets and Their Applications, ed. by M.B. Ruskai, G. Beylkin, R. Coifman, I. Daubechies, S. Mallat, Y. Meyer, I. Raphael (Jones and Bartlett, Boston, 1992), pp. 303–322
30. A. Meynard, B. Torresani, Spectral analysis for nonstationary audio, IEEE/ACM Transactions on Audio, Speech and Language Processing 26(12), 2371 (2018)
31. H. Omer, B. Torrésani, Applied and Computational Harmonic Analysis 43, 1 (2017)
32. U. Cesari, G. De Pietro, E. Marciano, C. Niri, G. Sannino, L. Verde, Computers & Electrical Engineering 68, 310 (2018)
33. A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.K. Peng, H.E. Stanley, Circulation 101(23) (2000)
34. F. Le Huche, A. Allali, La Voix, Tome 2: Pathologies Vocales d’Origine Fonctionnelle (Elsevier Masson, Paris, 2010)
35. F. Le Huche, A. Allali, La Voix, Tome 3: Pathologies Vocales d’Origine Organique (Elsevier Masson, Paris, 2010)
36. G.S. Berke, B.R. Gerratt, Journal of Voice 7(2), 123 (1993)
37. D. Jouvet, Y. Laprie, in 2017 25th European Signal Processing Conference (EUSIPCO) (IEEE, Kos, Greece, 2017), pp. 1614–1618
38. M. Ross, H. Shaffer, A. Cohen, R. Freudberg, H. Manley, IEEE Transactions on Acoustics, Speech and Signal Processing 22(5), 353 (1974)
39. S.Y. Lowell, R.H. Colton, R.T. Kelley, Y.C. Hahn, Journal of Voice 25(5), e223 (2011)
40. S. Mallat, A Wavelet Tour of Signal Processing (Academic Press, San Diego, 1999)
41. L. Cohen, Time-frequency Analysis (Prentice Hall, Upper Saddle River, NJ, 1995)
42. N.E. Huang, S.S.P. Shen (eds.), Hilbert-Huang Transform and its Applications, 2nd edn. No. 16 in Interdisciplinary Mathematical Sciences (World Scientific Publ, Singapore, 2014)
43. M.H. Hayes, Statistical Digital Signal Processing and Modeling (John Wiley & Sons, Ltd, Chichester, UK, 1996)
44. M. Géradin, D.J. Rixen, Mechanical Vibrations. Theory and Application to Structural Dynamics, 3rd edn. (John Wiley & Sons, Ltd, Chichester, UK, 2015)
45. C.M. Travieso, J.B. Alonso, J. Orozco-Arroyave, J. Vargas-Bonilla, E. Noth, A.G. Ravelo-Garcia, Expert Systems with Applications 82, 184 (2017)
46. H.V. Helmholtz, Theorie Physiologique de la Musique Fondée sur l’Etude des Sensations Auditives (Victor Masson et Fils, Paris, 1868)
47. S. Al-Hameed, M. Benaissa, H. Christensen, B. Mirheidari, D. Blackburn, M. Reuber, PLoS ONE 14(5), e0217388 (2019)
48. C.-Y. Lin, L. Su, H.-T. Wu, Journal of Fourier Analysis and Applications 24, 451 (2018)
49. R. Plomp, W.J.M. Levelt, The Journal of the Acoustical Society of America 38(4), 548 (1965)
50. A. Kameoka, M. Kuriyagawa, The Journal of the Acoustical Society of America 45(6), 1460 (1969)
51. H.-T. Wu, Applied and Computational Harmonic Analysis 35, 181 (2013)
52. I. Daubechies, Y. Wang, H.-T. Wu, Phil. Trans. R. Soc. A 374, 20150193 (2016)
Four Billion Years: The Story of an Ancient Protein Family

Gilles Didier, Claudine Landès, Alain Hénaut, and Bruno Torrésani
Abstract Comparison of protein sequences has long been a very effective tool for producing biological knowledge. It was initially based on the alignment of sequences, that is to say, organizing the set of sequences in columns (of a spreadsheet) of sites which have evolved from a common site of the ancestral sequence. Alignments are generally obtained by minimizing an evolutionary or an edit cost. Sequence comparisons are now often performed without alignments by comparing the N-mer compositions of the sequences. We present here the most popular methods used by biologists to compare sequences and place emphasis on an approach that augments the alphabet of a set of sequences in order to ease their comparison. The family of DNA topoisomerases, a set of ancient proteins whose history can be traced back 4 billion years, is used to illustrate this approach.
1 Introduction

The DNA double helix has a very stable structure. For example, dissociating the two strands results in the creation of supercoils that stop their separation, which is clearly a bonus for the conservation of the genetic inheritance. It poses a problem, however, when the two strands need to be dissociated in order to be copied [45]. This topological problem is settled in vivo by proteins, the DNA topoisomerases,
G. Didier, IMAG, Univ Montpellier, CNRS, Montpellier, France
C. Landès, Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, Angers, France
A. Hénaut, Université Publique Française, Evry, France
B. Torrésani, Aix Marseille Univ, CNRS, I2M, Marseille, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_25
Fig. 1 Hypothetical origin and distribution of Topoisomerases IA within the tree of life. T1, T3, and R indicate the TopoI, TopoIII, and reverse gyrase, respectively. The dashed lines indicate that only a part of the organisms belonging to that branch possess such enzymes. LUCA: Last Universal Common Ancestor is a theoretical construct—it might or might not have been something we today would call an organism. HGT: horizontal gene transfer or lateral gene transfer; the movement of genes between distantly related organisms. T represents the ancestor of all of the TopoIAs (Figure taken from [20])
that modify the supercoils. They trigger a transient cut of the DNA on one of the strands for type I topoisomerases, on both strands for type II topoisomerases. There are several classes of topoisomerases. Those belonging to class IA are found among all living species and have probably existed for 4 billion years. They form a multigenic family that has undergone a series of duplications in the course of evolution [1, 4, 18, 20]; see Fig. 1. How can biologists trace back the history of such an ancient family of proteins? In other words, how can they draw up the phylogeny¹ of the DNA topoisomerases IA? It has been known since 1965 that it is possible to reconstruct phylogenies through sequence comparisons.² This is due to the fact that proteins which possess the same function in closely related species exhibit very similar sequences. In two closely related sequences, the amino acids will generally be the same at a given position, but they can also differ as the result of a mutation. The
¹ Phylogeny is the study of the degree of relationship between living organisms, which makes it possible to reconstruct their evolution. In a phylogenetic tree, the nodes represent the common ancestors. The greater the number of nodes between two taxa, the more ancient their common ancestor is and the farther apart they are in the tree of life; the length of the branches is approximately proportional to the time of divergence between the taxa [22, 40].
² Proteins are macromolecules composed of a linear string of amino acids. They are generally made of several hundred amino acids drawn from the 20 different types.
total number of differences (mutations) between two sequences being, as a first approximation, proportional to the time of divergence between the two species [49], it provides useful information to reconstruct their phylogeny. Things become more complicated when the sequences diverged a long time ago, since several mutations may have occurred over time at the same site. Deciphering the phylogeny of species being a central concern in biology, it is not surprising that a number of different methods have been developed for this purpose. This article describes the principles of these methods and shows how original Alex’s approach is.
2 Sequence Comparisons with Alignment

2.1 Pairwise Alignment of Two Sequences

Take two sequences. Modify (edit) the first sequence so that you end up with the second sequence and count the minimal number of modifications that are necessary to go from the first to the second: you have performed the pairwise alignment of the two sequences. The modifications that are relevant in biology are (1) the substitution of one letter by another (the substitution of an amino acid by another one at a given site) and (2) the insertion or the deletion of one or several letters at a given site (which results in the creation of a gap at that position in one of the two sequences). A cost is associated with each modification, which yields a score when the alignment is completed. The first algorithm explicitly devoted to the pairwise alignment of biological sequences was devised by Needleman and Wunsch in 1970 [31]. This dynamic programming algorithm aims at aligning two sequences over their whole lengths (a global alignment) and guarantees that the resulting alignment score is maximal. However, a global alignment is not relevant if the similarity spreads only over a limited length. Smith and Waterman suggested in 1981 to look only for the regions of greatest similarity between two sequences and report only those regions (a local alignment) [41]. Their algorithm is a derivative of that by Needleman and Wunsch. The cost optimized by these algorithms mainly relies on two parameters:

• A scoring matrix giving, for each pair of amino acids, a score that was estimated from:
  – The substitutions that have been observed in well-known protein families or
  – The chemical properties of the amino acids [34, 37]
• A penalty for the creation of a gap (and possibly another one for its extension)

A scoring matrix between amino acids is normally built from experimental data [11]. Unfortunately, the gap penalties are totally empirical: one may say that they have just been cobbled together [35, 42, p. 56].
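The global alignment recurrence can be sketched in a few lines of Python. This is a minimal illustration in the spirit of the Needleman and Wunsch algorithm, not a reproduction of any published implementation: the unit match and mismatch scores stand in for a real amino acid scoring matrix, the gap penalty is linear (no separate extension cost), and only the optimal score is returned, not the alignment itself. All parameter values are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Maximal global alignment score of sequences a and b by dynamic
    programming, keeping only two rows of the score table in memory."""
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                  # a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # substitute or match
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))
print(global_alignment_score("ACGT", "AGT"))
```

Replacing the `match`/`mismatch` test with a lookup in a scoring matrix, and resetting negative scores to zero, are the usual routes from this sketch towards realistic global and local alignment, respectively.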
The programs based on dynamic programming are slow and can hardly be used to perform pairwise alignments of hundreds or thousands of sequences. Thus some heuristics were devised at the end of the 1980s to speed up the comparisons without sacrificing too much sensitivity or specificity. The results are local alignment programs that proceed in two steps: (1) look for all the “regions of similarity” between two sequences; (2) if possible, join together all those regions in order to obtain a longer alignment. Many of these programs look for segments that are identical in both sequences and keep from this list only those pairs of segments that are equidistant in both sequences, thus allowing no gaps between them. The idea here is that two closely related sequences deriving from a common ancestor will not suffer from insertions/deletions in conserved regions. These segments are then used as anchoring points for a further local alignment. Some programs search for strictly identical segments of a given length; others allow for some flexibility and accept “almost identical” segments. “Almost identical” segments are precisely defined through the use of a scoring matrix or a reduced amino acid alphabet. In the latter case, the number of amino acids is reduced from twenty to six or even four: acidic, basic, polar (hydrophilic) and apolar (hydrophobic) [34]. The use of a reduced alphabet speeds up the alignment process at the expense of specificity. The most famous programs for massive pairwise comparisons are FASTA [33] and BLAST [2], created in 1988 and 1990, respectively. Even if BLAST was inspired by FASTA, the two programs present an important difference: BLAST searches for similar words of three consecutive amino acids while FASTA searches for strictly identical words of two consecutive amino acids. BLAST is by far the most popular. It is generally used to look for similarities between a given sequence and all those that are gathered in databanks (at the end of 2020, the protein sequence data banks contained 186 billion sequences).
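The first step of such heuristics, finding identical words and grouping them by their offset between the two sequences, can be illustrated as follows. This is a simplified sketch of the general idea only; it is not the actual FASTA or BLAST code, and the function name, the word length default and the toy sequences are assumptions.

```python
from collections import defaultdict


def shared_words_by_diagonal(a, b, k=2):
    """Find words of length k that occur identically in a and b, and group
    their positions (i in a, j in b) by diagonal i - j. Hits on the same
    diagonal are 'equidistant' in both sequences, i.e. compatible with a
    gap-free local alignment."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)          # positions of each word in a
    diagonals = defaultdict(list)
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):  # same word found in a
            diagonals[i - j].append((i, j))
    return dict(diagonals)


hits = shared_words_by_diagonal("MKVLAAGIV", "KVLAAG", k=2)
for diag, pairs in sorted(hits.items()):
    print(diag, pairs)
```

A diagonal carrying many hits marks a candidate region of similarity, which can then serve as an anchor for a more careful local alignment.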
2.2 Simultaneous Alignment of Several Sequences

In theory, the Smith and Waterman algorithm allows for the simultaneous alignment of several sequences (which is called a multiple alignment), but in practice this is feasible for only a very limited number of sequences (the multiple alignment problem is NP-hard [25]). One therefore has to resort to heuristics [6], the vast majority of them following the same path: (1) computation of the similarity between each pair of sequences through pairwise alignments; (2) creation of an ascending hierarchical tree using the similarity matrix based on the scores of the pairwise alignments. This tree then sets the order in which the sequences will be aggregated: (i) choose the pair of closest sequences in the tree; (ii) align these sequences; (iii) replace in the tree the pair of sequences by this alignment. Hence, during the course of the algorithm, one may align a sequence with a group of sequences that have already been aligned, or even align a group of sequences with another group. The root of the tree harbors the alignment of all the sequences.
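The order in which sequences and groups are aggregated can be sketched with a small agglomerative procedure. The example below uses average linkage on a given similarity matrix and only records the merge order; the alignment steps themselves are omitted. The linkage choice, the function name and the toy similarity values are assumptions made for illustration.

```python
def merge_order(names, sim):
    """Return the successive merges (as nested tuples) obtained by repeatedly
    joining the two most similar clusters, using average linkage on the
    pairwise similarity matrix sim."""
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                ma, mb = clusters[a][1], clusters[b][1]
                # average similarity between members of the two clusters
                avg = sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        label = (clusters[a][0], clusters[b][0])
        members = clusters[a][1] + clusters[b][1]
        del clusters[a], clusters[b]
        clusters[next_id] = (label, members)   # the pair is replaced by its union
        merges.append(label)
        next_id += 1
    return merges


names = ["s1", "s2", "s3"]
sim = [[0, 9, 2],
       [9, 0, 3],
       [2, 3, 0]]      # hypothetical pairwise alignment scores
order = merge_order(names, sim)
print(order)
```

Each entry of the returned list corresponds to one alignment step: first two sequences, later a sequence with a group or a group with a group, until the last merge, which plays the role of the root of the tree.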
There are two major steps for the alignment of one sequence with a group of sequences or a group of sequences with another group:

• The group is represented by a position sensitive substitution matrix (PSSM) where each element depends on the nature of the two amino acids which are in the same column as well as on the position of the column in the multiple alignment [21, 39]; the alignment is performed in a classical way.
• One aligns the “regions of similarity” that were found in the first step. If a gap must be introduced, it is placed at the same position in all the sequences of the group.

Finally, after completion of the multiple alignment, the matrix of similarity between all the sequences is edited as a phylogenetic tree. Some relevant algorithms are available for this purpose [32, 43]. To sum up, the biologist is offered a host of options to perform a multiple alignment: global or local alignment, strict identity or mere similarity, scoring matrices or reduced alphabet, single-, average-, or complete-linkage hierarchical classification, to name a few (see [3] for a review of the most popular programs). If the sequences are closely similar, all the options will provide essentially the same results. But if the sequences are rather dissimilar, the alignments will heavily depend on the chosen options. In such cases the biologists will take advantage of additional information provided by a deep knowledge of the protein family (some adjustments may be done by hand), by the 3D structures of some of the proteins in the set (if available), or even by a tentative and approximate prediction of the foldings of the proteins, which is a computationally intensive task [7, 24].
3 Alignment-Free Sequence Comparison and Local Decoding The concerns raised in the section above have motivated the development of approaches to compare sequences without aligning them. A natural way to do this is to compare sequences with regard to their composition. Since considering the frequencies of the 20 amino acids is not discriminating enough, one rather considers their composition in words of a given length N (so-called N-mers), i.e., by counting the number of times each word of length N occurs in each sequence to compare.
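The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `nmer_counts` and `shared_nmers` are hypothetical, and the "shared occurrences" measure is an assumption chosen for the example, not a measure defined in the text.

```python
from collections import Counter


def nmer_counts(sequence, n):
    """Count every overlapping word of length n (N-mer) in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def shared_nmers(seq_a, seq_b, n):
    """Number of N-mer occurrences common to both sequences.

    Illustrative measure only: for each word, take the smaller of its two
    counts (multiset intersection) and sum over all words.
    """
    common = nmer_counts(seq_a, n) & nmer_counts(seq_b, n)
    return sum(common.values())
```

A sequence shorter than N simply yields an empty count table, so it shares nothing with any other sequence.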
3.1 N-mers and N-Local Decoding
Though the N-mers-based comparisons may provide accurate approximations of evolutionary distances, they generally lack definition due to the fact that they do not distinguish between the situation where two N-mers differ in only a few positions and the situation where they are completely unrelated. This point can be improved by allowing mismatches between words. This leads to the question of deciding
G. Didier et al.
Fig. 2 (1): a sequence over {a,b,c,d,e,f,g}; (2): the sequence of overlapping successive 2-mers of the sequence (1); (3): the sequence (2) with all letters replaced by the position of their first occurrence; (4): the sequence of overlapping successive 2-mers of a sequence different from sequence (1), which is the sequence (5) over {a,a’,b,c,d,d’,e,f,g}. It can be proved that the sequence (5) is maximal in the sense that all the sequences whose sequence of overlapping successive 2-mers corresponds to (3) can be obtained from (5) by letter-to-letter applications [12]. For instance, going from (5) to (1) is done by the letter-to-letter application which discards the ‘primes’.
to what extent two different N-mers can be considered “equal” in a sequence comparison context. Note that this question is still being actively investigated (e.g., [16, 27, 47, 48]). The approach presented in [15] relies on the fact that the sequence of the overlapping successive N-mers of a given sequence s may generally be obtained as the sequence of the overlapping successive N-mers of many sequences (up to relabeling the N-mers), among which one is maximal in the sense that all the others (including s) may be obtained from it through letter-to-letter applications (not necessarily one-to-one, Fig. 2). It follows that the alphabet of this maximal sequence is at least as large as that of any other sequence whose sequence of overlapping successive N-mers is the same as that of s, still up to relabeling. In particular, it is at least as large as the alphabet of s. The N-local decoding of a sequence is this maximal sequence.
The N-local decoding can alternatively be defined (and is computed) by considering the equivalence relations between the positions of the sequence(s) presented below. Let S be a set of sequences over a finite alphabet. Its site space $\Omega$ is the set of all pairs $(s, p)$ where s is a sequence of S and p a position in it, namely
$$\Omega = \{(s, p) \mid s \in S,\ 1 \le p \le \ell(s)\},$$
where $\ell(s)$ is the length of sequence s. For all positive integers N, we define the following relations between sites of S.
1. Two sites $(s, p)$ and $(s', p')$ in $\Omega$ are directly related if there exists a word w of length N occurring at positions $p - i$ and $p' - i$ of sequences s and $s'$, respectively, with $i < N$. In other words, w overlaps both sites $(s, p)$ and $(s', p')$ with the same offset i. If two sites $(s, p)$ and $(s', p')$ are directly related, we write $(s, p) \approx_N (s', p')$. Note that determining whether the sites $(s, p)$ and $(s', p')$ are directly related only requires considering their centered neighborhoods of length $2N - 1$.
Fig. 3 Graphical representation of relatedness within a $\sim_N$-class, with $N = 7$. For each one of six sites, the word occupying its centered neighborhood is listed on the right of the figure. Directly related sites are connected by solid lines: each color corresponds to a word of length 7 shared by at least two neighborhoods and displayed at the bottom-right. Dashed gray lines connect sites that are related by $\sim_N$ but not directly related by $\approx_N$
2. The equivalence relation $\sim_N$ is then defined as the transitive closure of $\approx_N$. In other words, we say that $(s, p) \sim_N (s', p')$ if there is a chain of directly related sites connecting $(s, p)$ and $(s', p')$.
In order to illustrate these relations on an example, let us consider a set of protein sequences and examine one of the equivalence classes associated with the relation $\sim_N$ with $N = 7$ (Fig. 3). This class contains 6 sites. The first site is described by the pair $(0, 571)$: this means that it lies at position 571 of the sequence number “0”, and similarly for the other five sites. Since $N = 7$, the neighborhoods around these sites used to determine the relations above are of length $2N - 1 = 13$ and are displayed in the figure with their central letter underlined. Directly related sites are connected by black lines. For instance, the sites $(0, 571)$, $(3, 630)$ and $(8, 614)$ share the word LREIDED starting at the third position of their neighborhood. The sites that are related without being directly related are connected by dashed gray lines. For instance, the sites $(1, 580)$ and $(5, 528)$ are connected by the chain $(1, 580) \to (0, 571) \to (3, 630) \to (5, 528)$.
Theorem 1 ([15]) Let S be a set of sequences and N a positive integer. Any set of sequences having the same successive overlapping N-mers sequences as S (up to a relabeling) can be obtained by a letter-to-letter application from the set of sequences obtained from S by putting at each site the (identifier of the) class to which it belongs in the partition associated with the relation $\sim_N$.
In other words, the N-local decoding of a set of sequences can be obtained by replacing the letter at each site of the set by the class of this site in the $\sim_N$-partition. This approach can be applied to compare a set of sequences by considering their composition in symbols of its N-local decoding.
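The two relations above translate directly into a union-find computation, sketched below in Python. This is an illustrative sketch, not the authors' implementation: the function name `local_decoding` is hypothetical, positions are 0-based rather than 1-based, class identifiers are assigned in order of first appearance, and the cost grows with N (roughly total length times N), unlike the linear-time algorithm cited in the text.

```python
def local_decoding(sequences, n):
    """Replace each site by the identifier of its class for the transitive
    closure of the 'directly related' relation (same N-mer, same offset)."""
    # Give every site (sequence index, position) a single integer id.
    offsets, total = [], 0
    for seq in sequences:
        offsets.append(total)
        total += len(seq)
    parent = list(range(total))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[max(rx, ry)] = min(rx, ry)

    # Two occurrences of the same word relate their sites offset by offset.
    first_seen = {}
    for s, seq in enumerate(sequences):
        for p in range(len(seq) - n + 1):
            word = seq[p:p + n]
            start = offsets[s] + p
            if word in first_seen:
                ref = first_seen[word]
                for i in range(n):
                    union(ref + i, start + i)
            else:
                first_seen[word] = start

    # Relabel the class representatives with consecutive identifiers.
    labels, decoded = {}, []
    for s, seq in enumerate(sequences):
        row = []
        for p in range(len(seq)):
            root = find(offsets[s] + p)
            row.append(labels.setdefault(root, len(labels)))
        decoded.append(row)
    return decoded
```

In the sequence `abac` with N = 2, no 2-mer is repeated, so the two occurrences of `a` fall into different classes: the decoded alphabet is larger than the original one, as the theorem suggests.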
The N-local decoding of a set of sequences can be seen as an intermediate level of information between this set and its sequences of N-mers. From a practical point of view, the N-local decoding of a (set of) sequence(s) can be computed with a complexity linear with the length of
the sequence(s) both in time and memory space whatever N and thus with the same complexity as that required to compute its N-mers [8, 9, 13, 15].
Both the N-mers and the N-local decoding approaches require the selection of a suitable value of the parameter N in order to compare a given set of sequences. For large values of N, each N-mer (and each symbol of the local decoding) tends to occur at most once in the set of sequences, whereas for small values of N the N-mers occur at too many positions in the set to be actually informative for comparison purposes. A basic solution is to try several values of N and to select the value which seems the most relevant with regard to the results observed. The MS4 approach, presented in Sect. 3.2, was designed to tackle this issue (MS4 stands for Multi-Scale Selector of Sequence Signatures). MS4 selects for a given site the smallest N such that the average number of occurrences per sequence of the equivalence class of this site does not exceed a given threshold $\gamma$. The resulting values of N are allowed to differ between sites in order to adapt to the context of each site of the set of sequences that need to be compared. The parameter $\gamma$ has an intuitive interpretation since it reflects the average number of repetitions in the sequences.
The necessity to adapt the size of the context considered around a site of a set of sequences to compare has also motivated the development of the variable length local decoding, which generalizes the N-local decoding by considering not N-mers but words of various lengths to code and decode the set of sequences to be compared. This approach is briefly presented in Sect. 3.3.
3.2 MS4 The N-local decoding is used in order to produce partitions of the set of all sites in the sequences under study [15]. The MS4 approach relies on an object that describes the embedding of the successive results of the N -local decoding as N increases. The tree structure of this object is essential, since it can easily be parsed to select “relevant” (according to a certain criterion) classes of sites, which may occur at several values of N .
3.2.1 The Partition Tree
A recurring problem of N-mers-based methods is the lack of an objective criterion to tune the parameter N to a suitable value in order to compare a set of sequences. There is actually no reason to believe that a single “optimal” value of N will always be meaningful since a given set of sequences can contain parts that are very well conserved between sequences while others may vary a lot among them. The MS4 approach combines different N-local decoding equivalence classes for various values of N by using an original construction, the partition tree, which allows us to choose a set of “relevant” N-local decoding-classes. Let $E_N$ be the
partition of $\Omega$, the site space of a set of sequences S, induced by the equivalence relation $\sim_N$.
Lemma 1 For all $N \ge 0$, the partition $E_N$ is coarser than $E_{N+1}$ (i.e., each class of $E_{N+1}$ is included in a class of $E_N$).
This lemma (see [9] for proof) is crucial and corresponds to the intuitive idea that it is harder to group together large words than small ones. We are now ready to define the partition tree.
Definition 1 Let us set $E_0 = \{\Omega\}$ and $V = \bigcup_{i \ge 0} E_i$ (i.e., V contains all the equivalence classes of all the partitions $E_i$ for $i \ge 0$). The partition tree $\mathcal{P} = (V, E^{\mathcal{P}})$ is the tree where the vertices are the equivalence classes of V and the set of edges $E^{\mathcal{P}}$ is defined by
$$E^{\mathcal{P}} = \{(u, v) \in E_N \times E_{N+1} \mid v \subset u\}.$$
In other words, the vertices of $\mathcal{P}$ are the equivalence classes associated with the relations $\sim_N$ for all values of N. The edges are drawn between pairs of classes that correspond to successive values of N and such that one is a subset of the other. By Lemma 1, any two sites that are $(N + 1)$-equivalent are also N-equivalent. On the other hand, two sites that are N-equivalent are not necessarily $(N + 1)$-equivalent. In other words, the N-classes split as N increases. The edges are drawn precisely between any N-class C and all the $(N + 1)$-classes into which C splits. Since any vertex of $\mathcal{P}$ has at most one ancestor by construction, $\mathcal{P}$ is a tree.
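As a rough illustration of this construction, the Python sketch below derives the edges of the partition tree from a list of nested partitions. The function name `build_partition_tree` and the representation (one list of site sets per value of N, vertices identified by a pair of level and index) are assumptions made for the example; the toy partitions used to exercise it are hypothetical data, not taken from the chapter.

```python
def build_partition_tree(partitions):
    """Return the edges of the partition tree.

    partitions[N] is a list of disjoint sets of sites covering the site
    space, with partitions[N + 1] refining partitions[N]. A vertex is the
    pair (N, index of the class in partitions[N]); an edge (u, v) links a
    class at level N to each class at level N + 1 included in it.
    """
    edges = []
    for level in range(len(partitions) - 1):
        # Map every site to the class that contains it at this level.
        owner = {site: idx
                 for idx, cls in enumerate(partitions[level])
                 for site in cls}
        for v_idx, child in enumerate(partitions[level + 1]):
            parents = {owner[site] for site in child}
            if len(parents) != 1:
                raise ValueError(
                    f"level {level + 1} does not refine level {level}")
            edges.append(((level, parents.pop()), (level + 1, v_idx)))
    return edges
```

Each class at level N + 1 receives exactly one incoming edge, which is the property that makes the resulting graph a tree.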
3.2.2 Classes Selection
When we examine N-equivalence classes for all possible N, we face a deluge of information, much of it redundant. We shall now use the partition tree to alleviate this problem. Given any set C of sites, let us define $|C|$, the size of C, as the number of sites in C, and the spread of C as the number of sequences which contain at least one element of C. We shall consider the quantity $\kappa(C)$ defined as the ratio between the size and the spread of C:
$$\kappa(C) = \frac{|C|}{|\{s \in S \mid \exists p,\ (s, p) \in C\}|} \ge 1.$$
For a given value $\gamma \ge 1$, the condition $\kappa(C) \le \gamma$ means that the average number of occurrences of class C per sequence where it does occur is less than or equal to $\gamma$. In particular, $\kappa(C) = 1$ means that no sequence contains more than one element of C (of course we take here C to be an N-local decoding-class). We call the parameter $\gamma$ the maximum average repetitivity. We use this parameter to select nodes in the partition tree that satisfy $\kappa(C) \le \gamma$.
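The ratio between size and spread is straightforward to compute once a class is stored as a set of sites. The following Python sketch assumes that representation (pairs of sequence identifier and position); the function name `kappa` simply mirrors the symbol used in the text.

```python
def kappa(cls):
    """Size of a class divided by its spread.

    cls is an iterable of sites (sequence_id, position). The size is the
    number of sites; the spread is the number of distinct sequences hit.
    """
    sites = set(cls)
    if not sites:
        raise ValueError("a class contains at least one site")
    spread = len({seq_id for seq_id, _ in sites})
    return len(sites) / spread
```

For instance, a class with two sites in one sequence and one site in another has size 3 and spread 2, hence a ratio of 1.5.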
This condition is not sufficient to make these classes relevant. Indeed, the bottom of the partition tree is occupied by classes corresponding to large N, which occur in only one sequence. Such classes are of no interest. In order to find relevant classes, we have to “climb upward” (towards smaller values of N). Since any vertex of a tree has only one ancestor, the following definition does make sense.
Definition 2 An N-local decoding-class C is $\gamma$-relevant if it satisfies $\kappa(C) \le \gamma$ while its ancestor does not.
The MS4 method selects all and only the relevant classes in a set of sequences (and ignores all the others).
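Definition 2 can be read as a simple filter over the vertices of the partition tree, sketched below in Python. The data layout is an assumption made for the example: `classes` maps a hypothetical class identifier to its set of sites, and `parent` maps a class identifier to the identifier of its direct ancestor. The treatment of the root (which has no ancestor) is also an assumption: here it is kept whenever it satisfies the threshold.

```python
def kappa(sites):
    """Size of a class divided by the number of sequences it touches."""
    return len(sites) / len({seq_id for seq_id, _ in sites})


def relevant_classes(classes, parent, gamma):
    """Select the classes meeting the threshold whose ancestor does not.

    classes: dict class_id -> set of sites (sequence_id, position)
    parent:  dict class_id -> class_id of the direct ancestor (root absent)
    """
    selected = []
    for cid, sites in classes.items():
        if kappa(sites) > gamma:
            continue
        pid = parent.get(cid)
        # Assumption: a class without ancestor counts as relevant.
        if pid is None or kappa(classes[pid]) > gamma:
            selected.append(cid)
    return selected
```

On a toy tree where the root has ratio 2 and its two children have ratio 1, a threshold of 1 selects the two children and skips their descendants, while a threshold of 2 selects only the root.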
3.2.3 The Dissimilarity Matrix
At the end of the MS4 procedure, each sequence can be rewritten by replacing the letter originally found at a given site by the identifier of the relevant MS4 class to which the site belongs. We use the number of MS4 classes shared by two sequences to define a similarity index in a similar way as described in [14]. This measure is closely related to the percentage of identity classically used for sequence comparison. Given any two sequences $s_i$ and $s_j$, we compute their dissimilarity level $d_{ij}$ as follows. For a class C, let $n_i(C)$ be the number of occurrences of C in $s_i$. Denote by $\mathcal{C}_{ij}$ the set of relevant classes that have representatives both in $s_i$ and $s_j$. Since these two sequences may contain a different number of occurrences, we put $n_{ij} = \sum_{C \in \mathcal{C}_{ij}} \min\{n_i(C), n_j(C)\}$. We define the dissimilarity level $d_{ij}$ by
$$d_{ij} = 1 - \frac{n_{ij}}{\min\{\ell(s_i), \ell(s_j)\}},$$
where $\ell(s_i)$ and $\ell(s_j)$ are the lengths of $s_i$ and $s_j$, respectively. When $\gamma = 1$, $n_{ij}$ is simply the number of relevant classes having representatives in both $s_i$ and $s_j$. The dissimilarity matrix $(d_{ij})_{1 \le i,j \le |S|}$ can be given as input to a phylogenetic reconstruction software [23, 32].
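The dissimilarity level can be computed from per-sequence class counts, as in the Python sketch below. The input format is an assumption made for illustration: each sequence is given as the list of relevant class identifiers found along it (sites without a relevant class are simply omitted), together with its length. The function names are hypothetical.

```python
from collections import Counter


def dissimilarity(classes_i, classes_j, len_i, len_j):
    """1 minus the shared class occurrences over the shorter length."""
    n_i, n_j = Counter(classes_i), Counter(classes_j)
    # For every class present in both sequences, keep the smaller count.
    n_ij = sum(min(n_i[c], n_j[c]) for c in n_i.keys() & n_j.keys())
    return 1 - n_ij / min(len_i, len_j)


def dissimilarity_matrix(class_lists, lengths):
    """Matrix of pairwise dissimilarity levels for a set of sequences."""
    size = len(class_lists)
    return [[dissimilarity(class_lists[i], class_lists[j],
                           lengths[i], lengths[j])
             for j in range(size)]
            for i in range(size)]
```

The resulting nested list can be written out in whatever format the chosen tree-reconstruction program expects.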
3.3 Variable Length Local Decoding of Sequences The variable length local decoding of sequences extends the local decoding of a given (and fixed) order presented above in the same way as variable length Markov models extend Markov models of a given order in the sense that the size of the window used to “decode” a position depends on the symbols in its neighborhood in a way similar to its “local order” (i.e., the memory size at this position) under a variable length Markov model [38].
Fig. 4 Top-left: a prefix code $\mathcal{P}$ and the corresponding coding identifiers. Right: the tree representation of $\mathcal{P}$. Bottom-left: a sequence, its coding w.r.t. $\mathcal{P}$ and the corresponding sequence of word lengths
A variable length decoding scheme is defined from a prefix code $\mathcal{P}$. Let us first recall that a prefix code is a set $\mathcal{P}$ of words on a given alphabet such that no word in $\mathcal{P}$ is a prefix of any other word of $\mathcal{P}$. For instance, $\mathcal{P} = \{A, CA, CCA, CCC\}$ is a prefix code over the alphabet $\{A, C\}$. By construction, the words of a prefix code $\mathcal{P}$ are the leaves of the prefix tree storing the words of $\mathcal{P}$ (i.e., the tree where the nodes are the prefixes of words of $\mathcal{P}$ and where the direct ancestor of a node is obtained by discarding its last letter). A tree is a convenient representation of a prefix code (Fig. 4). Note that from the property defining a prefix code, there is at most one word of a prefix code which occurs at a given position of a sequence.
We say that a prefix code $\mathcal{P}$ is compliant w.r.t. a given sequence (or a set of sequences; see footnote 3) if, whatever the position of the sequence one picks, there is a word of $\mathcal{P}$ occurring at this position. It follows that if a prefix code $\mathcal{P}$ is compliant w.r.t. a sequence, there is one and only one word of $\mathcal{P}$ occurring at each position of the sequence, except possibly at its last positions, for which the corresponding words of $\mathcal{P}$ may be truncated. The (variable length) coding of a sequence w.r.t. a given $\mathcal{P}$, where each word is associated with a unique identifier, is the sequence of identifiers of the overlapping words of $\mathcal{P}$ occurring along the sequence (Fig. 4).
Given the coding of a sequence and the sequence of the corresponding lengths of the words of the prefix code (e.g., the two last rows of the table at the bottom-left of Fig. 4), there exists an antecedent which is maximal in the sense that (i) it has the greatest alphabet possible among the antecedents obtained with prefix codes with the same sequence of lengths of words and (ii) all the other antecedents can be obtained from it by letter-to-letter applications. Note that the set of all N-mers over a given alphabet is a compliant prefix code w.r.t.
any sequence over this alphabet. The maximal antecedent obtained from the
3 All the statements of this section still hold when “sequence” is replaced with “set of sequences.”
prefix code made of all N-mers is exactly the N-local decoding presented above, so the variable length decoding does generalize the (standard) local decoding. The variable length local decoding of the sequence may be equivalently defined by considering the equivalence relation between the positions of the sequences defined as the transitive closure of the relation connecting two positions if the same word from the prefix code covers them with the same offset. An important point here is that there is an algorithm performing the variable length local decoding of a sequence which is linear in the size of the sequence both in time and memory space. In other words, determining the variable length local decoding is no more expensive than dealing with the N-mers or the N-local decoding from a computational point of view.
Determining a (somewhat) relevant prefix code in order to perform an alignment-free comparison of a given set of sequences is not obvious. In order to perform this task, we first remark that pruning the suffix tree of a sequence [43] leads to a compliant prefix code (conversely, the useful part of any compliant prefix code of a sequence may be obtained in this way). In [13], the suffix tree is pruned at the shallowest nodes corresponding to words having a probability smaller than a given threshold t to appear more than once in the whole set of sequences, under a Markov model of order 1 estimated on the set of sequences (in the current implementation, this probability is approximated from a binomial distribution). The probability threshold t is determined according to a heuristic criterion involving the number of occurrences of the words of the prefix code in the set of sequences. Finally, a dissimilarity matrix is computed from the variable length local decoding of the set of sequences to be compared in a way similar to that described in Sect. 3.2.3. This dissimilarity matrix can then be given as input to a phylogenetic reconstruction software.
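The coding of a sequence with respect to a prefix code, as defined in this section, can be sketched in Python with the example code {A, CA, CCA, CCC} given above. The function names and the identifiers 0 to 3 attached to the words are assumptions made for the example (they need not match those of Fig. 4), and positions near the end of the sequence where no complete word fits are reported as `None` rather than as truncated words.

```python
def is_prefix_code(words):
    """True if no word is a prefix of another word of the set."""
    ordered = sorted(set(words))
    # In sorted order, a word that is a prefix of another one is
    # immediately followed by a word that starts with it.
    return all(not nxt.startswith(cur)
               for cur, nxt in zip(ordered, ordered[1:]))


def encode(sequence, code_ids):
    """Code a sequence w.r.t. a prefix code.

    code_ids maps each word of the prefix code to its identifier.
    Returns the identifiers and the lengths of the words found at each
    position; None marks a position where no complete word occurs.
    """
    ids, lengths = [], []
    for p in range(len(sequence)):
        match = next((w for w in code_ids if sequence.startswith(w, p)), None)
        ids.append(None if match is None else code_ids[match])
        lengths.append(None if match is None else len(match))
    return ids, lengths
```

Because the set is a prefix code, at most one word can match at a given position, so the order in which the words are tried does not matter.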
4 Results: A Brief Look at the History of DNA Topoisomerases IA The necessity for a mechanism that would change the DNA topological state seemed obvious as soon as the structure of DNA was deciphered in 1953. The first DNA topoisomerase to be identified was isolated from a bacterium in 1971 and its sequence established in 1986 [44]. Any living organism possesses the two types of DNA topoisomerases, at least one of each type, and several subtypes can often be found in the same species. Among the different types, topoisomerases IA are the only ones to be present in all the living organisms. They are obviously proteins that have existed for a very long time (see Fig. 1).
Table 1 Number of species represented in RefSeq release 203 as of 9 November 2020

Prokarya
  Archaea                  1337
  Bacteria               63,237
Eukarya
  Protozoa                  573
  Fungi                  13,970
  Plant                    5684
  Vertebrate mammalian     1294
  Vertebrate other         4544
  Invertebrate             4321
Total                    93,209
4.1 The Evolutionary History of Topoisomerases IA
How can we decipher the evolutionary history of topoisomerases IA and trace the ancient duplications and horizontal transfers that led to the current state of affairs? A first problem comes from their presence in every living organism. In late 2020, nearly 100,000 different organisms were represented in the sequence databanks (Table 1). This is too many: an exhaustive analysis of such an enormous set would be practically impossible. In addition, the information is not always relevant, for example:
• Numerous strains or individuals of some model species were sequenced many times. While important for understanding the intra-species polymorphism, this information is not relevant in terms of phylogeny.
• A comparison of man and chimpanzee is useless if one is interested in events that occurred way before the dinosaur era. One had better select species that diverged a long time ago, thus being far apart in the tree of life.
For the following study we selected 2651 sequences: 2135 sequences of DNA topoisomerases IA from bacteria, 268 from archaea and 68 from Eukarya. The dissimilarity index between two sequences is measured with the variable length local decoding (VLD, see above). There are 3 subtypes of topoisomerases IA: TopoI, TopoIII, and reverse gyrase. In the reverse gyrase subtype, the protein contains a helicase domain in addition to the topoisomerase domain.
• The TopoI subtype is present only among bacteria.
• The TopoIII subtype is present among bacteria, archaea, and eukaryotes.
• The reverse gyrase subtype is present in hyperthermophilic bacteria and archaea (see footnote 4).
4 The reverse gyrase exists mainly in bacteria and archaea whose growth optimum is above 80 °C; it protects DNA from the denaturation that normally occurs at such high temperatures.
Within the subtypes, one can see groups that reflect some physiological differences (for example, thermophile vs halophile). There are also some sequences that cannot be confidently linked with a given subtype. When submitted to automatic classification algorithms, the topoisomerases IA tend to cluster into 9 groups that are biologically coherent:
• With fewer than 9 groups, some markedly different proteins are clustered together and some characteristics are not highlighted.
• With more than 9 groups, the groups are no longer coherent.
The complete tree bears 2651 leaves; its analysis would be beyond the scope of this article. We present here the tree of hyperthermophiles (Fig. 5). The leaves correspond to the genera. The dissimilarity index of a genus is the average of the dissimilarity indices of the species it groups. In order to simplify the reading, the identifier of a leaf is the name of the corresponding class (a class is a taxonomic unit grouping several genera, e.g., Mammalia is a class) with R1/R2 if the reverse gyrase is duplicated. The number following the name of the class makes it possible to trace back the sequences constituting the leaf. The names are in lower case for the reverse gyrases and in upper case for the other types of topoisomerases IA.
The tree of Archaea TopoIII is consistent with the taxonomy except perhaps for Thermoprotei 65 (Pyrobaculum aerophilum str. IM2 + Pyrobaculum ferrireducens). The tree of Bacteria TopoI is consistent with the taxonomy except for Aquificae. Aquificae 75 and Aquificae 89 (Desulfurobacterium thermolithotrophum DSM 11699 and Thermovibrio ammonificans HB-1, respectively) are clearly disjoint from other Aquificae. Both groups of Aquificae have their own Thermodesulfobacteria (Thermodesulfatator indicus DSM 1528, Aquificae 85, in the first group and Thermodesulfobacterium geofontis OPF15, Aquificae 86, in the second). The tree of Fig. 5 contains in addition the TopoIII of vertebrates (labeled with a red star).
It is very clearly related to the archaea TopoIII. This is true for all eukaryotes. The place of archaea in the evolutionary history of eukaryotes remains, however, an open question [29]. The tree of reverse gyrases is more complicated. In the Thermoprotei archaea, two types of reverse gyrases are clearly distinct and are separated from the bottom of the tree (noted R1 and R2, respectively). They correspond to a duplication of the reverse gyrase gene. A similar but less marked dichotomy is observed in bacteria. An analysis of the differences between the two types of reverse gyrases is presented in the following section.
4.2 The Subfunctionalization of Reverse Gyrases The reverse gyrase is the only protein (hence the only gene) to be quasi specific to hyperthermophilic organisms. It is systematically present among them and almost wholly absent in mesophilic cells. The reverse gyrase gene results from the fusion of
Fig. 5 Topoisomerases IA tree in hyperthermophiles. The leaves correspond to genera. The identifier of a leaf is the name of the corresponding class (a class is a taxonomic unit grouping several genera, e.g., Mammalia is a class) with R1/R2 if the reverse gyrase is duplicated. The number following the name of the class makes it possible to trace back the sequences constituting the leaf. The names are in lower case for the reverse gyrases and in upper case for the other types of topoisomerases IA. The tree also contains the TopoIII of vertebrates (labeled with a red star); the branch is clearly related to the archaea TopoIII
a topoisomerase gene with a helicase gene [19]. It is possible that hyperthermophilic organisms may have existed before the advent of reverse gyrases, but the selective advantage provided by this gene is such that it must have been incorporated very quickly in the genomes of all the hyperthermophilic organisms, bacteria as well as archaea [5]. A similar phenomenon can be observed nowadays with antibiotic resistance. The genes providing this resistance were rarely present among the bacterial populations—they were definitely not necessary—but with the current massive use of antibiotics those bacteria that possess the genes have now an obvious,
Table 2 Archaea of our sample possessing a duplication of the reverse gyrase gene with the corresponding identifier in Fig. 5

Species                                       R1 / R2 (identifiers in Fig. 5)
Aeropyrum pernix + A. camini                  Thermoprotei 2 / Thermoprotei 3
Desulfurococcus amylolyticus + D. mucosus     Thermoprotei 6 / Thermoprotei 7
Hyperthermus butylicus                        Thermoprotei 10 / Thermoprotei 11
Pyrolobus fumarii                             Thermoprotei 22 / Thermoprotei 23
Saccharolobus solfataricus                    Thermoprotei 24 / Thermoprotei 25
Sulfolobus islandicus                         Thermoprotei 28 / Thermoprotei 27
Sulfurisphaera tokodaii                       Thermoprotei 29 / Thermoprotei 30
tremendous selective advantage. As a result, the resistance genes are now quite common within pathogenic bacteria. They have been gained through horizontal gene transfer (HGT).
The reverse gyrase gene is duplicated in several organisms, notably Sulfolobus (see Table 2). It has been shown that the two copies present some functional differences in Sulfolobus [19, 20]. The biologist now needs to identify the positions in the proteins which distinguish the two copies and are responsible for those differences. He must first identify the potentially interesting sites, as experiments (in the “wet lab”) are long and costly. The approach presented below provides an answer, since it establishes a list of words (in the sense of Fig. 3) that are characteristic of a given group of sequences.
Figure 6 gives an example of the results obtained by studying 35 reverse gyrases that are representative of the biodiversity of hyperthermophilic organisms, comprising 8 pairs of duplicated genes topR1 and topR2. Among all the classes (in the sense of Fig. 3), we looked for those that were lacking in topR1 but present in topR2, and vice versa. Figure 6, drawn with the WebLogo software [10], provides two examples of classes that distinguish topR1 and topR2. The sequence conservation at a particular position in the alignment is defined as the difference between the maximum possible entropy and the entropy of the observed symbol distribution:
$$R_{seq} = S_{max} - S_{obs} = \log_2 m - \left(-\sum_{n=1}^{m} p_n \log_2 p_n\right)$$
(with m=20 amino acids). The maximum sequence conservation per site is 4.32 bits. Amino acids are given colors according to their chemical properties: polar amino acids (G, S, T, Y, C) show as green, hydrophobic (A, V, L, I, P, W, F, M) yellow, basic (K, R, H) blue, acidic (D, E) red and their amide (N, Q) purple [10]. Class N143_11 is observed in 19 reverse gyrases (among the 35 of the set) including the 8 topR1 and class T23_8 in 16 reverse gyrases of the same set including the 8 topR2. Class A26_6 is present in 17 reverse gyrases including the 8 topR1 and class G156_8 in 15 reverse gyrases including the 8 topR2 (3 reverse
Fig. 6 Examples of classes (in the sense of Fig. 3) that discriminate reverse gyrases topR1 and topR2. Top: alignment of classes N143_11 and T23_8. Bottom: alignment of classes G156_8 and A23_6. For example, N143_11 is one of the classes within the 35 gyrases given by the N-local decoding with N=11. With an asparagine N at its center, it is 21 amino acids long. The sequence conservation at a particular position in the alignment is defined as the difference between the maximum possible entropy and the entropy of the observed symbol distribution, $R_{seq} = S_{max} - S_{obs} = \log_2 m - \left(-\sum_{n=1}^{m} p_n \log_2 p_n\right)$ (with m=20 amino acids). The maximum sequence conservation per site is 4.32 bits. Amino acids have colors according to their chemical properties: polar amino acids (G, S, T, Y, C) show as green, hydrophobic (A, V, L, I, P, W, F, M) yellow, basic (K, R, H) blue, acidic (D, E) red and their amide (N, Q) purple [10]
gyrases do not fall into either category). The discriminating amino acids are at the center of the motifs. Some positions in T23_8 are strictly conserved—VESP on the left of the central T and KA on its right—while the other positions are more versatile. Of course, the fact that classes N143_11 and T23_8 discriminate topR1 and topR2 does not prove that these sequences are responsible for the functional differences between the two genes. It is, however, an observation of interest to the
biologist as it could give him a clue on where to start searching. The results given by the computer do not constitute a proof, but they make it possible to optimize the experimental work, which is important since experiments are long, extensive, and expensive (the actual experiments had not yet been completed when this article was written).
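The sequence conservation used for the logos of Fig. 6 follows directly from its definition as the maximum entropy minus the observed entropy. The Python sketch below computes it for one alignment column given as a string of residues; the function name `sequence_conservation` is hypothetical, and the sketch implements only the formula quoted in the text, without any further correction a logo program might apply.

```python
import math
from collections import Counter


def sequence_conservation(column, m=20):
    """log2(m) minus the Shannon entropy (in bits) of the observed symbols.

    column: the residues observed at one position of the alignment.
    m: size of the alphabet (20 for amino acids).
    """
    counts = Counter(column)
    total = sum(counts.values())
    s_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(m) - s_obs
```

A strictly conserved position has zero observed entropy, so its conservation equals log2(20), about 4.32 bits, which is the maximum quoted in the text; a position split evenly between two residues loses exactly one bit.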
5 From Molecular Phylogenies to the Tree of Life
Several insights can be gained from the topoisomerases phylogeny, without necessarily being able to establish the tree of life: the TopoI subtype is specific to the bacterial world, the TopoIII subtype makes it possible to distinguish the bacteria from the archaea and from the eukaryotes, while the reverse gyrases group together all the hyperthermophilic species. This is a common feature of molecular phylogenies, due on the one hand to the duplication of genes in the course of evolution and on the other hand, within bacteria and archaea, to the transfer of genes between widely divergent species (the so-called horizontal gene transfer or HGT). It is estimated that 97% of the genes in bacteria and archaea have been the subject of horizontal transfers [46]. These horizontal transfers, however, seem to be randomly distributed. No obvious species are either donors or recipients. In other words, the HGTs blur the image we have of the tree of life, but without introducing any systematic bias [36]. As a result, the topologies of the phylogenetic trees are generally convergent. The evolutionary histories of the genes that are present in (almost) all the bacteria and archaea, as deduced from their phylogenetic trees, are coherent. This is also the case for the phylogeny of the DNA topoisomerases IA. The link created by the reverse gyrases between the bacteria and the hyperthermophilic archaea does not call into question the validity of the bacterial and archaeal branches. This link is observed exclusively in the phylogeny of the reverse gyrases, which are “modern” enzymes resulting from the fusion of two pre-existing genes, a DNA topoisomerase and a helicase. By contrast, the separation of bacteria and archaea into two different branches, which is observed in the phylogenies of the TopoI and TopoIII isomerases, is also found in most of the molecular phylogenies [46].
The relative position of the branches that are situated near the root of the tree, however, is controversial [17, 30]. Considering that those events occurred four billion years ago, this is not surprising. It is possible that, at that time, the genetic material might have been RNA and not DNA (this is still the case for many viruses) [28]. Interestingly, most of the DNA topoisomerases IA possess an RNA-topoisomerase activity which appears important for untangling long RNA that forms pseudoknots. It has been hypothesized that this RNA-topoisomerase activity could be crucial in the RNA world, suggesting that the type IA is one of the most ancient enzymes [1, 19, 20].
Why four billion years? Molecular phylogenies show how the various evolutionary events are linked together, but provide no clue as to the date they occurred. The chronology is given by other disciplines. Conventional fossils trace the history
Fig. 7 Time table for Earth’s early history (Figure taken from [26])
of animals over a period of ca. 600 million years. Microfossils, stromatolites, remains of lipids and isotopic ratios (see footnote 5) provide information on microorganisms and biogeochemical cycles in the Proterozoic oceans (2500–540 My). They can be roughly interpreted in terms of extant organisms and metabolic processes. Archean rocks (more than 2500 My old) provide evidence of the presence of life as far back as 3500 My ago, maybe even earlier. The phylogenetic and functional details are, however, quite limited [26, 30] (see Fig. 7).
6 Conclusions
Mathematics is now at the heart of biology. It is absolutely necessary to extract relevant information from the gigantic mass of data coming from the sequencing of numerous genomes and other high-throughput techniques. In 2020, 186 million protein sequences, belonging to 105,000 organisms (including viruses), had been determined. Alex had anticipated this evolution and had become interested in the analysis of biological sequences as early as the 1990s. However, while the general tendency in biology is to develop tools and their ad hoc tweaks, Alex systematically looked for non-trivial but simple solutions. This led him to constantly ask the question “What are the fundamental principles?”, probably a legacy of his career in theoretical physics.
Acknowledgments The authors would like to sincerely thank Marc Nadal and Jean-Loup Risler for their constructive criticism and Alessandra Riva for proofreading the article.
5 As an example, let us take the bias in the isotopic composition of carbon. Atmospheric $\mathrm{CO_2}$ is made up of a mixture of $^{12}\mathrm{C}$ and $^{13}\mathrm{C}$. Since photosynthetic organisms have a preference for the $^{12}\mathrm{C}$-containing $\mathrm{CO_2}$, the biological fossil sediments will be richer in $^{12}\mathrm{C}$ than the abiotic sediments. This corresponds to the $^{13}\mathrm{C}$-depleted reduced carbon in Fig. 7.
References
1. M. Ahmad, Y. Xue, S. Lee, J. Martindale, W. Shen, W. Li, S. Zou, M. Ciaramella, H. Debat, M. Nadal, F. Leng, H. Zhang, Q. Wang, G. Siaw, H. Niu, Y. Pommier, M. Gorospe, T.-S. Hsieh, Y.-C. Tse-Dinh, and W. Wang. RNA topoisomerase is prevalent in all domains of life and associates with polyribosomes in animals. Nucleic Acids Res., 44:gkw508, 06 2016.
2. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215(3):403–410, 1990.
3. P. Bawono, M. Dijkstra, W. Pirovano, K. A. Feenstra, and S. Abeln. Multiple Sequence Alignment, volume 1525, pages 167–189. Humana Press Inc, 11 2017.
4. A. H. Bizard and I. D. Hickson. The many lives of type IA topoisomerases. J. Biol. Chem., 295(20):7138–7153, 2020.
5. R. J. Catchpole and P. Forterre. The evolution of reverse gyrase suggests a nonhyperthermophilic last universal common ancestor. Mol. Biol. Evol., 36(12):2737–2747, Dec. 2019.
6. M. Chatzou, C. Magis, J.-M. Chang, C. Kemena, G. Bussotti, I. Erb, and C. Notredame. Multiple sequence alignment modeling: methods and applications. Briefings in Bioinformatics, 17(6):1009–1023, 11 2015.
7. J. Chen and S. W. I. Siu. Machine learning approaches for quality assessment of protein structures. Biomolecules, 10(4):626, Apr. 2020.
8. E. Corel, R. Fegalhi, F. Gérardin, M. Hoebeke, M. Nadal, A. Grossmann, and C. Landès-Devauchelle. Local similarities and clustering of biological sequences: New insights from N-local decoding. The First International Symposium on Optimization and Systems Biology, 01 2007.
9. E. Corel, F. Pitschi, I. Laprevotte, G. Grasseau, G. Didier, and C. Landès-Devauchelle. MS4 - multi-scale selector of sequence signatures: An alignment-free method for classification of biological sequences. BMC Bioinf., 11:406, 07 2010.
10. G. E. Crooks, G. Hon, J.-M. Chandonia, and S. E. Brenner. Weblogo: A sequence logo generator. Genome Res., 14(6):1188–1190, 2004.
11. C. Devauchelle, Y. Diaz, G. Didier, A. Hénaut, and B. Torrésani. Pseudo-rate matrices, beyond Dayhoff’s model. This volume, 2021.
12. G. Didier. Caractérisation des n-écritures et application à l’étude des suites de complexité ultimement n + cste. Theoretical Computer Science, 215(1-2):31–49, 1999.
13. G. Didier, E. Corel, I. Laprevotte, A. Grossmann, and C. Landès-Devauchelle. Variable length local decoding and alignment-free sequence comparison. Theoretical Computer Science, 462:1–11, 2012.
14. G. Didier, L. Debomy, M. Pupin, M. Zhang, A. Grossmann, C. Devauchelle, and I. Laprevotte. Comparing sequences without using alignments: application to HIV/SIV subtyping. BMC Bioinf., 8(1):1, Jan. 2007.
15. G. Didier, I. Laprevotte, M. Pupin, and A. Hénaut. Local decoding of sequences and alignment-free comparison. J. Comput. Biol., 13:1465–76, 11 2006.
16. T. Farkaš, J. Sitarčík, B. Brejová, and M. Lucká. SWSPM: A novel alignment-free DNA comparison method based on signal processing approaches. Evolutionary Bioinformatics Online, 15:1176934319849071, 2019.
17. P. Forterre. The universal tree of life: an update. Front. Microbiol., 6:717, 2015.
18. P. Forterre and D. Gadelle. Phylogenomics of DNA topoisomerases: Their origin and putative roles in the emergence of modern organisms. Nucleic Acids Res., 37:679–92, 03 2009.
19. F. Garnier, M. Couturier, H. Débat, and M. Nadal. Archaea: a gold mine for topoisomerase diversity. Front. Microbiol., 2021. In press.
20. F. Garnier, H. Debat, and M. Nadal. Type IA DNA topoisomerases: A universal core and multiple activities. In M. Drolet, editor, DNA Topoisomerases, volume 1703 of Methods in Molecular Biology, chapter 1, pages 1–20. Springer, 2018.
21. G. Z. Hertz and G. D. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15(7):563–577, 07 1999.
22. D. M. Hillis, C. Moritz, and B. K. Mable, editors. Molecular Systematics. Sinauer Associates Inc., 1996.
23. D. Huson. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23(2):254–267, 01 2006.
24. L. Jaroszewski. Protein structure prediction based on sequence similarity. Methods Mol. Biol., 569:129–56, 02 2009.
25. W. Just. Computational complexity of multiple sequence alignment with SP-Score. J. Comput. Biol., 8(6):615–623, 2001. PMID: 11747615.
26. A. H. Knoll, K. D. Bergmann, and J. V. Strauss. Life: the first two billion years. Philos. Trans. R. Soc. Lond. B Biol. Sci., 371(1707):20150493, Nov. 2016.
27. C.-A. Leimeister, S. Sohrabi-Jahromi, and B. Morgenstern. Fast and accurate phylogeny reconstruction using filtered spaced-word matches. Bioinformatics, 33(7):971–979, 01 2017.
28. W. Ma. What does “the RNA world” mean to “the origin of life”? Life, 7(4):49, Nov. 2017.
29. F. MacLeod, G. S. Kindler, H. L. Wong, R. Chen, and B. P. Burns. Asgard archaea: Diversity, function, and evolutionary implications in a range of microbiomes. AIMS Microbiol., 5(1):48–61, Jan. 2019.
30. W. F. Martin and F. L. Sousa. Early microbial evolution: The age of anaerobes. Cold Spring Harbor Perspect. Biol., 8(2), 2016.
31. S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48(3):443–453, 1970.
32. L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, and B. Q. Minh. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol., 32(1):268–274, 11 2014.
33. W. Pearson and D. Lipman. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A., 85:2444–8, 05 1988.
34. E. L. Peterson, J. Kondev, J. A. Theriot, and R. Phillips. Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment. Bioinformatics, 25(11):1356–1362, June 2009.
35. V. Polyanovsky, A. Lifanov, N. Esipova, and V. Tumanyan. The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion. BMC Bioinf., 21(11):294, Sept. 2020.
36. P. Puigbó, Y. I. Wolf, and E. V. Koonin. Seeing the tree of life behind the phylogenetic forest. BMC Biol., 11(1):46, Apr. 2013.
37. J. Risler, M. Delorme, H. Delacroix, and A. Hénaut. Amino acid substitutions in structurally related proteins, a pattern recognition approach: Determination of a new and efficient scoring matrix. J. Mol. Biol., 204(4):1019–1029, 1988.
38. J. Rissanen. A universal data compression system. IEEE Transactions on Information Theory, 29(5):656–664, 1983.
39. T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht. Information content of binding sites on nucleotide sequences. J. Mol. Biol., 188(3):415–431, 1986.
40. C. Semple and M. Steel. Phylogenetics, volume 24 of Oxford lecture series in mathematics and its applications. Oxford University Press, 2003.
41. T. Smith and M. Waterman. Identification of common molecular subsequences. J. Mol. Biol., 147(1):195–197, 1981.
42. D. Tagu and J.-L. Risler. Bioinformatique : Principes d’utilisation des outils. Editions Quae, Paris, 2010.
43. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, Sept. 1995.
44. J. C. Wang. DNA topoisomerases: why so many? J. Biol. Chem., 266(11):6659–62, 1991.
45. J. D. Watson and F. Crick. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature, 171(4356):737–738, 1953.
46. M. C. Weiss, M. Preiner, J. C. Xavier, V. Zimorski, and W. F. Martin. The last universal common ancestor between ancient earth chemistry and the onset of genetics. PLoS Genet., 14(8):1–19, 08 2018.
47. A. Zielezinski, H. Z. Girgis, G. Bernard, C.-A. Leimeister, K. Tang, T. Dencker, A. K. Lau, S. Röhling, J. J. Choi, M. S. Waterman, M. Comin, S.-H. Kim, S. Vinga, J. S. Almeida, C. X. Chan, B. T. James, F. Sun, B. Morgenstern, and W. M. Karlowski. Benchmarking of alignment-free sequence comparison methods. Genome Biol., 20(1):144, July 2019.
48. A. Zielezinski, S. Vinga, J. Almeida, and W. M. Karlowski. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol., 18(1):186, Oct. 2017.
49. E. Zuckerkandl and L. Pauling. Evolutionary divergence and convergence in proteins. In V. Bryson and H. J. Vogel, editors, Evolving Genes and Proteins, pages 97–166. Academic Press, 1965.
Pseudo-Rate Matrices, Beyond Dayhoff’s Model Claudine Landès, Yolande Diaz-Lazcoz, Alain Hénaut, and Bruno Torrésani
Abstract One of the fundamental techniques of biology is sequence alignment, namely transforming one sequence into another with minimal change. Sequence alignment is essential for evolutionary studies and is a source of information for the analysis of the physico-chemical mechanisms which are at the heart of protein activity. Biologists almost exclusively use methods based on a very simple model, although they are aware that this can be quite removed from reality. In fact, the more complex models involve so many variables that they cannot be calculated in practice. This paper presents a method to estimate the quality of the approximation made using simple models, giving a measure of the deviation from reality. It is exclusively based on the analysis of pairwise alignments, without resorting to multiple alignments, and therefore without requiring the construction of trees and the problems associated with it. The paper also describes an approach that allows building trees and clusters from sequences without strongly relying on the choice of a dissimilarity measure. It illustrates the interest and effectiveness of the point of view promoted by Alex: assume as little as possible and try to gather information from the data, before turning to explicit modeling if necessary.
C. Landès Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, Angers, France e-mail: [email protected] Y. Diaz-Lazcoz Université d’Evry, LaMME, Evry, France e-mail: [email protected] A. Hénaut Université Publique Française, Evry, France B. Torrésani () Aix Marseille Univ, CNRS, I2M, Marseille, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8_26
1 Introduction
Alignment-free sequence comparisons make it possible to reconstitute the phylogeny of proteins that have diverged greatly over time (see the introduction of the companion paper [2] in this volume for definitions of the main concepts), but they do not, on their own, make it possible to model the mechanisms of protein evolution. This requires alignments, that is, the transformation of one sequence into another whilst minimizing the number of changes. Producing an alignment involves two clearly distinct ingredients: an alignment algorithm and a transition matrix (also called substitution scoring matrix or matrix of accepted mutation rates, see Definition 5 below). This matrix defines the rate at which amino acids are replaced by others over the course of evolution. We presented the alignment algorithms in the article [2]. Here we present some of the issues related to the construction of amino acid substitution rate matrices, the related models of sequence evolution, and how Alex approached them. A main aspect of Alex’s contributions is the will to adapt models to data, not vice versa. This paper first describes the estimation of observed rate matrices from pairwise sequence alignments, examines connections with popular sequence evolution models, and shows how simple multivariate analysis techniques can be applied to these matrices to highlight and quantify departures from such models. It also gives an account of an original approach to biological sequence clustering that avoids, as much as possible, ad hoc dissimilarity measures that tend to bias results and thus interpretations. Theoretical developments are complemented by numerical results on real data that include the topoisomerases already discussed in [2].
2 Classical Approaches, Scoring Matrices
2.1 Dayhoff Evolution Model: The PAM (Point Accepted Mutation) Matrix
In the late 1960s, Margaret Dayhoff had the excellent idea of collating as many closely related homologous protein sequences as possible, aligning them “by hand” and counting the substitutions, which enabled her to estimate the probability that a given amino acid will be replaced by another when there is about 1% change between two sequences. She thus obtained a $20 \times 20$ matrix called PAM1 for “1 Point Accepted Mutation per 100 residues.” By construction, the PAM matrices are symmetrical (see footnote 1).
1 In general, the PAM series rather refers to the scoring matrices that are used to weight the replacements in protein alignment methods. These scoring matrices, calculated from these probabilities, and closely connected to transition matrices, are studied in Sect. 3.2 below.
Fig. 1 Correspondence of the observed percent difference and the estimated evolutionary distance in PAM [5]
Margaret Dayhoff hypothesized that the probability of replacing an amino acid at a given site depends only on the nature of that amino acid. This probability is independent of what may happen at the other sites and of what may have happened previously at this same site [5]. In other words, repeated mutations over a longer period of evolution follow the same substitution pattern as those observed. We can therefore extrapolate from PAM1 the PAM matrices corresponding to any percentage of change. Biologists typically use the PAM120 matrix (about 40% identity). They switch to PAM250 if they find that the sequences are very divergent (about 20% identity) (see Fig. 1).
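As a hedged illustration of this extrapolation: under the site-independence and memorylessness hypotheses stated above, the matrix for $n$ PAM units is commonly obtained as the $n$-th matrix power of PAM1. The sketch below uses an invented 3-letter matrix `PAM1_TOY` (the real PAM1 is $20 \times 20$) and assumes uniform letter frequencies; the helper names `extrapolate` and `expected_identity` are ours. The identity percentages it prints are those of the toy matrix and are not meant to reproduce Fig. 1.

```python
import numpy as np

# Hypothetical 3-letter "PAM1-like" transition matrix (rows sum to 1).
# The real PAM1 is 20 x 20; these numbers are invented for illustration.
PAM1_TOY = np.array([
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
])

def extrapolate(pam1: np.ndarray, n: int) -> np.ndarray:
    """Return the n-step transition matrix, i.e. the matrix power pam1 ** n."""
    return np.linalg.matrix_power(pam1, n)

def expected_identity(p: np.ndarray, freqs: np.ndarray) -> float:
    """Expected fraction of unchanged sites: sum_i freqs[i] * p[i, i]."""
    return float(freqs @ np.diag(p))

freqs = np.full(3, 1 / 3)  # assumed uniform letter frequencies
for n in (1, 120, 250):
    p_n = extrapolate(PAM1_TOY, n)
    print(n, round(expected_identity(p_n, freqs), 3))
```

The expected identity decreases as $n$ grows, which is the qualitative behaviour summarized by the correspondence between percent difference and PAM distance.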
2.2 BLOSUM, Another General Purpose Substitution Matrix
Since then, other matrices based on the same principle have been developed. The most widely used are the BLOcks of Amino Acid SUbstitution Matrix (BLOSUM) series. BLOSUM matrices were constructed from the count of substitutions observed in a series of multiple alignments listed in the Blocks database [22]. This database collated gap-free alignment blocks within related proteins [2]. These well-conserved regions are believed to have greater functional relevance. Created twenty years after the PAM matrices, the BLOSUM matrices obviously have the advantage of being constructed from a much higher number of alignments,
themselves made from a much larger range of proteins. Biologists generally use the BLOSUM62 matrix (i.e., the matrix constructed by retaining in all the aligned pairs of the Blocks database those which present at most 62% identity). As BLOSUM matrices are based on structures which have been well conserved during evolution, they are less “lax,” which means that amino acids are less easily exchangeable than in PAM matrices. The general consensus is that the BLOSUM matrices are superior in terms of sensitivity and specificity without, however, the difference with the PAM matrices being considerable (see Chapter 11 in [27]).
2.3 Available Biological Material for the Estimation of Scoring Matrices
Margaret Dayhoff had very little data when she calculated the first PAM matrices (in 1969, 814 substitutions in all were known in pairs of sequences having more than 85% identity). Progress has been very rapid: by the early 1990s, the matrices were based on 60,000 substitutions. The number of known sequences has exploded since. Biologists now have a choice of dozens of substitution scoring matrices (see [28] and [23]). They differ, among other things, by:
• The sequences used in the training set (it can contain a single family of proteins or several hundred).
• The regions taken into account: some authors use the mutations observed in a global alignment, including both highly conserved regions and highly mutable regions (e.g., PAM), while others only take into account regions whose structure is well conserved (e.g., BLOSUM).
On the other hand, all authors implicitly assume that the matrices of the substitution rates are homogeneous in the training set. The latter assumption is not necessary. As we show below, one can simultaneously estimate the evolution rate matrix and a divergence age for each pair of sequences [8].
3 Rate Matrices, Beyond Dayhoff’s Model
The construction of the Dayhoff matrices is very similar to approaches developed in the context of the inference of evolutionary trees, which often rely on Markov models on trees (see [9] and Chapter 11 of [14]; see also [20] for a different approach, which bears similarities with the techniques presented here). However, Dayhoff’s approach departs from these as it is mainly descriptive and does not involve explicit modeling and corresponding parameter estimation. It only exploits simple counting, converted into scores. The construction in [8], described below, builds on these ideas and attempts to interpret counting matrices in terms of a
minimal number of parameters, namely a rate matrix and divergence times whenever possible, or several matrices in more complex cases.
3.1 Definitions and Notations
We start by introducing background definitions and notations. Throughout this paper, we work with square matrices with real entries. We refer to [4] for an account of the main aspects of matrix calculus. We use the following notations: for any $m \times m$ matrices $M, M'$, their inner product is defined as $\langle M, M' \rangle = \sum_{i,j=1}^{m} M_{ij} M'_{ij}$, the corresponding norm is denoted by $\|M\|_2$, and the trace of $M$ is $\mathrm{Tr}(M) = \sum_{i=1}^{m} M_{ii}$. The spectral norm $\|M\|$ of a matrix $M$ is its largest singular value. We will make use of the matrix logarithm, which should be understood as the inverse function of the matrix exponential. While the latter can be defined by its power series, which is always convergent, the matrix logarithm raises more difficult questions, and may be defined in various ways (see the note by H.E. Haber [13] for a summary). We will limit ourselves to the definition based on the Mercator series:
Definition 1 (Matrix Logarithm) The logarithm of a matrix $M$ is defined by the infinite power series expansion
$$\log M = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} (M - I_m)^k , \tag{1}$$
when the latter is convergent. Here, $I_m$ denotes the $m \times m$ identity matrix. Convergence is ensured whenever $\|M - I_m\| < 1$.
The matrix logarithm does not satisfy all the usual properties of the numerical logarithm: in general, $\log(M_1 M_2) \neq \log(M_1) + \log(M_2)$. However, the equality holds true when $M_1$ and $M_2$ commute, and the relation
$$\log M^{\tau} = \tau \log(M) \tag{2}$$
is preserved for all positive integers $\tau$, and more generally when $M^{\tau}$ is well defined.
Definition 2 (Transition Matrices, Rate Matrices)
1. A transition matrix is an $m \times m$ matrix $P$ with non-negative entries such that for all $i$, $\sum_{j=1}^{m} P_{ij} = 1$.
2. A pseudo-rate matrix is an $m \times m$ matrix $Q$ such that $\sum_{j=1}^{m} Q_{ij} = 0$ for all $i$ and $Q_{ii} \leq 0$ for all $i$.
3. A rate matrix is a pseudo-rate matrix $Q$ such that $Q_{ij} \geq 0$ for all $i \neq j$.
Transition matrices are sometimes called stochastic matrices, or Markov transition matrices. Transition matrices are naturally associated with finite state Markov chains, i.e., random processes such that the probability of moving from state $i$
to state $j$ in one time step is given by the matrix element $P_{ij}$. In general, the eigenvalues of a transition matrix are complex numbers of modulus smaller than or equal to 1.
Definition 3 (Markov Semigroup) A Markov semigroup is a family of transition matrices $t \in \mathbb{R}_+ \mapsto P(t)$ satisfying the Chapman–Kolmogorov equation
$$P(t) P(t') = P(t + t') , \qquad t, t' \in \mathbb{R}_+ , \tag{3}$$
and such that for all $i, j$, $P_{ij}(0) = \delta_{ij}$ and $\lim_{t \to 0} P_{ii}(t) = 1$.
Given a Markov semigroup, there always exists a matrix $Q = P'(0)$ such that $P(t) = e^{tQ}$ and $Q$ is a rate matrix. Conversely, if $Q$ is a rate matrix, then the exponentials $e^{tQ}$, where $t \in \mathbb{R}_+$, form a Markov semigroup.
Definition 4 (Embeddable Transition Matrices)
1. A transition matrix $P$ is embeddable into a Markov semigroup, or simply embeddable, if there exists a corresponding Markov semigroup $t \mapsto P(t)$ such that $P = P(1)$ or, equivalently, if there exists a rate matrix $Q$ such that $P = e^{Q}$. $Q$ is called a generator.
2. A set of transition matrices $\{P^{(1)}, \dots, P^{(K)}\}$ is jointly embeddable if there exist a rate matrix $Q$ and positive numbers $\tau_1, \dots, \tau_K$ such that for all $k$, $P^{(k)} = e^{\tau_k Q}$.
When $P$ is embeddable, the corresponding rate matrix $Q$ coincides with the matrix logarithm of the transition matrix $P$.
These notions are at the heart of the maximum likelihood approach developed by Felsenstein [9] and followers for the estimation of evolutionary trees. Evolutionary trees model the evolution of a set of current time data, called taxa, from their most recent common ancestor. In the situation of interest here, taxa are protein sequences. The latter are represented as symbolic sequences, namely sequences $\{x(k), k = 1, \dots, K\}$ with values in a finite alphabet $\mathcal{A}$ of cardinality $m = \#\mathcal{A}$ (for protein sequences, $m = 20$, letters label amino acids). A pairwise alignment of two sequences $x, y$ is an ordered pair $(x, y) = \{(x(k), y(k)), k \in I(x, y)\}$ where $I(x, y)$ is a set of indices. Pairwise alignments are used to identify regions of similarity between two sequences of interest. Multiple alignments involve more than two sequences. Algorithms for aligning sequences have been described in the companion paper [2]. Many models have been proposed for describing a family of aligned sequences (a multiple alignment) from an evolutionary perspective (see [25] for a review).
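Definitions 1 to 4 can be checked numerically on a toy example. The following is a minimal sketch, not the authors' code: the 3-state rate matrix `Q` is invented, and the helpers `expm_series`, `logm_mercator` and `semigroup` are hypothetical names. The exponential is computed from its power series and the logarithm from the Mercator series (1), which is adequate only for small matrices close to the identity.

```python
import numpy as np

def expm_series(a: np.ndarray, terms: int = 60) -> np.ndarray:
    """Matrix exponential via its power series (small, well-scaled matrices only)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k          # term now holds a^k / k!
        result = result + term
    return result

def logm_mercator(m: np.ndarray, terms: int = 200) -> np.ndarray:
    """Matrix logarithm via the Mercator series of Definition 1."""
    d = m - np.eye(m.shape[0])
    if np.linalg.norm(d, 2) >= 1:    # spectral norm condition of Definition 1
        raise ValueError("Mercator series is not guaranteed to converge")
    result = np.zeros_like(d)
    power = np.eye(m.shape[0])
    for k in range(1, terms + 1):
        power = power @ d            # power now holds (m - I)^k
        result = result + ((-1) ** (k + 1) / k) * power
    return result

# Invented rate matrix: rows sum to 0, off-diagonal entries are non-negative.
Q = np.array([
    [-0.30, 0.20, 0.10],
    [0.10, -0.25, 0.15],
    [0.05, 0.15, -0.20],
])

def semigroup(t: float) -> np.ndarray:
    """Transition matrix P(t) = exp(t Q) of the Markov semigroup generated by Q."""
    return expm_series(t * Q)

print(np.round(semigroup(1.0), 4))                 # a transition matrix
print(np.round(logm_mercator(semigroup(1.0)), 4))  # recovers Q up to rounding
```

In this example `semigroup(1.0)` is embeddable by construction, so its logarithm returns the generator; for an observed matrix there is no such guarantee.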
Among these the Markov Chain on a Tree (MCT) model has received considerable attention. The MCT model assumes that the sites of the sequences are independent, identically distributed, random variables (see, e.g., [26] for a discussion of the consequences of such assumptions), whose time evolution is mainly described by two parameters:
Fig. 2 Two examples of rooted evolutionary trees with 4 taxa, with different topologies. In the most general form of MCT models a transition matrix is associated with each branch
• A binary tree, whose leaves are the sequences considered (present time), nodes are ancestor sequences, and branches represent Markovian evolution.
• A family of transition matrices associated with the branches of the tree that characterize evolution along the branch, modeled by a Markov chain. At each node a sequence gives rise to two different sequences, each one evolving according to its own Markov chain.
A couple of examples are provided in Fig. 2. A rooted binary tree with $K$ leaves has $2K - 2$ edges ($2K - 3$ for unrooted trees), thus $2K - 2$ transition matrices. It is worth mentioning at this point that the number of different rooted tree topologies is equal to $(2K - 3)!!$, which makes the tree identification problem extremely difficult for large numbers of sequences. Among MCT models, the F81 model of [9] also assumes that the evolutionary process is stationary, homogeneous, and reversible. In the context of interest here, stationarity means that the probabilities of amino acids are the same at all nodes of the tree; homogeneity essentially means that all transition matrices are embeddable (local homogeneity) or jointly embeddable (global homogeneity); reversibility means that for every pair $(i, j)$ of amino acids, substitutions $i \to j$ and $j \to i$ have equal probabilities. The parameters of the model (transition matrices and tree) are sufficient to compute probabilities of all possible multiple alignments and evaluate them numerically. The associated estimation problem is to infer parameter values from the multiple alignment, which was done in [9] using a maximum likelihood approach. Parameter estimation turns out to be computationally heavy for large families of sequences, even under the assumptions of homogeneity, stationarity and reversibility, and additional simplifications are often made. In addition, the comparison of likelihoods for different tree topologies raises other difficult questions [1].
Finally, these assumptions are often violated by data, so that the inferred evolutionary trees have to be interpreted cautiously. An alternative approach avoiding these assumptions has been proposed in [3] and more recently in [16]; it attempts to estimate a transition matrix for each branch of the tree, again using maximum likelihood. Still, the
problem is extremely difficult when large sequence families are considered, and the statistical significance of results obtained with such a large number of parameters may be rather questionable.
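To give a feeling for the combinatorial growth of the $(2K - 3)!!$ count of tree topologies quoted in this section, here is a short sketch. It assumes the standard reading of that formula as the number of rooted binary trees with $K$ labelled leaves; the function names are ours.

```python
def double_factorial(n: int) -> int:
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2 (and 1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def rooted_binary_topologies(num_leaves: int) -> int:
    """Number of rooted binary tree topologies with labelled leaves: (2K - 3)!!."""
    if num_leaves < 2:
        raise ValueError("need at least two leaves")
    return double_factorial(2 * num_leaves - 3)

for k in (4, 10, 20):
    print(k, rooted_binary_topologies(k))
```

Already for a few dozen sequences the count exceeds what any exhaustive search over topologies could visit, which is one reason heuristic searches are used in practice.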
3.2 From Pairwise Alignments to Rate Matrix
The approach developed in [8] departs from these general models and attempts to find simpler and more versatile descriptions for multiple alignments. The starting point is a multiple alignment of sufficiently related and sufficiently close protein sequences (to be introduced in Definitions 6 and 7 below). As stressed above, unlike most probabilistic approaches to phylogeny (see [9], and [14] and references therein), the approach of [8] does not use the full multiple alignment, and only focuses on pairwise alignments. From each pairwise alignment $(x, y)$, an observed transition matrix $P^{(x,y)}$ is computed (see Definition 6 below), and the question is: to what extent can the so-obtained family of observed transition matrices be gathered (embedded in the sense of Definition 4) into a common framework? For that, the objects of interest will be the matrix logarithms $L^{(x,y)}$ of the observed transition matrices $P^{(x,y)}$.
3.2.1 Observed Transition and Rate Matrices
From now on, we consider a set of sequences denoted by $X$, and a set $\{(x, y), x, y \in X\}$ of pairwise alignments.
Definition 5 (Counts, Frequencies) Given an ordered pairwise alignment $(x, y)$ of length $c(x, y)$,
1. The matrix of transition frequencies $F^{(x,y)}$ is defined by its elements
$$F^{(x,y)}_{ij} = \frac{1}{c(x, y)} \, \#\{k : x(k) = i \text{ and } y(k) = j\} , \qquad i, j = 1, \dots, m . \tag{4}$$
2. The vectors of frequencies $\pi^{(x)} = (\pi^{(x)}_1, \dots, \pi^{(x)}_m)$ are given by
$$\pi^{(x)}_i = \frac{1}{c(x, y)} \, \#\{k : x(k) = i\} , \qquad i = 1, \dots, m . \tag{5}$$
We also denote by $\Pi^{(x)} = \mathrm{diag}(\pi^{(x)})$ the corresponding diagonal matrices. From these quantities, observed transition and rate matrices can be introduced.
Definition 6 (Observed Transition and Rate Matrices) Given an ordered pairwise alignment $(x, y)$ of length $c(x, y)$,
1. The associated observed transition matrix is defined as
$$P^{(x,y)} = \left(\Pi^{(x)}\right)^{-1} F^{(x,y)} , \qquad P^{(x,y)}_{ij} = \frac{F^{(x,y)}_{ij}}{\pi^{(x)}_i} . \tag{6}$$
2. The sequences $x, y$ are sufficiently related if the corresponding observed transition matrices admit a logarithm in the sense of Definition 1. In this case, the matrices
$$L^{(x,y)} = \log P^{(x,y)} \tag{7}$$
are called observed rate matrices. The diagonals of these matrices are called mutabilities, and denoted by
$$\mu^{(x,y)} = \mathrm{diag}\, L^{(x,y)} . \tag{8}$$
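The chain of definitions from counts to mutabilities can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [8]: the 3-letter alphabet `ALPHABET`, the gap-free toy alignment `x`, `y`, and all function names are invented, and the logarithm is evaluated with the Mercator series of Definition 1.

```python
import numpy as np

ALPHABET = "ABC"   # hypothetical 3-letter alphabet (m = 3); proteins would use m = 20
INDEX = {letter: i for i, letter in enumerate(ALPHABET)}

def transition_frequencies(x: str, y: str) -> np.ndarray:
    """F[i, j] = #{k : x(k) = i and y(k) = j} / c(x, y), as in Eq. (4)."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have the same length")
    counts = np.zeros((len(ALPHABET), len(ALPHABET)))
    for a, b in zip(x, y):
        counts[INDEX[a], INDEX[b]] += 1
    return counts / len(x)

def observed_transition(x: str, y: str) -> np.ndarray:
    """P = Pi^{-1} F, Eq. (6); every letter must occur at least once in x."""
    f = transition_frequencies(x, y)
    pi_x = f.sum(axis=1)                 # frequencies of the letters of x, Eq. (5)
    if np.any(pi_x == 0):
        raise ValueError("a letter of the alphabet is absent from x")
    return f / pi_x[:, None]

def logm_mercator(p: np.ndarray, terms: int = 200) -> np.ndarray:
    """Matrix logarithm through the Mercator series of Definition 1."""
    d = p - np.eye(p.shape[0])
    if np.linalg.norm(d, 2) >= 1:        # spectral norm must be < 1
        raise ValueError("Mercator series is not guaranteed to converge")
    result, power = np.zeros_like(d), np.eye(p.shape[0])
    for k in range(1, terms + 1):
        power = power @ d
        result += (-1) ** (k + 1) / k * power
    return result

# Invented gap-free toy alignment: 30 columns, mostly identical letters.
x = "A" * 10 + "B" * 10 + "C" * 10
y = "A" * 8 + "BC" + "B" * 8 + "AC" + "C" * 9 + "A"

P_xy = observed_transition(x, y)
L_xy = logm_mercator(P_xy)               # observed rate matrix, Eq. (7)
mutabilities = np.diag(L_xy)             # Eq. (8)
print(np.round(P_xy, 3))
print(np.round(L_xy, 3))
```

In this toy alignment the substitution C to B is never observed, and the corresponding entry of `L_xy` comes out slightly negative: the observed rate matrix is then a pseudo-rate matrix rather than a rate matrix in the sense of Definition 2.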
Algorithms for reconstructing evolutionary trees from multiple alignments can be based upon “evolutionary distances.” Such distances can be constructed from the above data. For example, the LogDet distance proposed in [3]
$$\mathrm{ldet}(x, y) = \log \det P^{(x,y)} \tag{9}$$
is a natural choice in situations where the observed transition matrices are jointly embeddable, i.e., of the form $P^{(x,y)} = e^{\tau(x,y) Q}$ for some rate matrix $Q$ (called generator). In such a case $\mathrm{ldet}(x, y) = \mathrm{Tr}(L^{(x,y)}) = \tau(x, y)\, \mathrm{Tr}(Q)$ is proportional to the divergence time $\tau(x, y)$. Although termed “distance,” $\mathrm{ldet}$ is not a metric; in particular, it is not symmetric, and is therefore not a suitable quantity for most tree reconstruction methods, which require tree metrics (see footnote 2). Nevertheless, $\mathrm{ldet}$ may be an interesting quantity to look at, precisely because it does not force symmetry. We will analyze a biologically relevant example in Sect. 4.1 below.
Remark 1 (Symmetric LogDet Distances) Several alternative LogDet distances have also been proposed and studied in the literature. Among these, the quantity $d(x, y) = -\log \det F^{(x,y)}$ was proposed in [18], where it was shown that this distance allows identification of the tree topology, but not of the edge lengths. Another alternative is $\delta(x, y) = \frac{1}{2} \log \det \left( P^{(x,y)} P^{(y,x)} \right)$, which possesses the desired symmetry property and interesting interpretations in the context of reversible MCT models [25].
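The proportionality between the LogDet quantity and the divergence time rests on the identity $\det e^{A} = e^{\mathrm{Tr}(A)}$. It can be verified on a small embeddable example; the sketch below is ours, with an invented rate matrix normalized so that $\mathrm{Tr}(Q) = -1$, in which case $\mathrm{ldet}$ should equal $-\tau$.

```python
import numpy as np

def expm_series(a: np.ndarray, terms: int = 60) -> np.ndarray:
    """Matrix exponential via its power series (small, well-scaled matrices only)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    return result

def ldet(p: np.ndarray) -> float:
    """LogDet quantity of Eq. (9): log det P, computed in a numerically stable way."""
    sign, logabsdet = np.linalg.slogdet(p)
    if sign <= 0:
        raise ValueError("det P must be positive for its logarithm to be real")
    return float(logabsdet)

# Invented rate matrix, rescaled so that Tr(Q) = -1.
Q0 = np.array([
    [-0.30, 0.20, 0.10],
    [0.10, -0.25, 0.15],
    [0.05, 0.15, -0.20],
])
Q = Q0 / abs(np.trace(Q0))

for tau in (0.1, 0.5, 1.0):
    p_tau = expm_series(tau * Q)   # jointly embeddable family P = exp(tau Q)
    print(tau, ldet(p_tau))        # ldet equals tau * Tr(Q) = -tau up to rounding
```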
2 A tree metric is a map $(x, y) \mapsto \delta(x, y)$ that satisfies the requirements of a dissimilarity map (it is non-negative, symmetric, and such that $\delta(x, x) = 0$ for all $x$) and an additional condition called the four points condition, see, e.g., Chapter 11 in [14].
3.2.2 The Symmetrized Case
In the above setting a pairwise alignment $(x, y)$ gives rise to two observed rate matrices $L^{(x,y)}$ and $L^{(y,x)}$, which complicates the analysis (although a strong discrepancy between these two would indicate a strong departure from the above model). Simplification can be achieved by averaging these two matrices; however, it appears more natural to introduce symmetrization directly in the counting procedure. With the above notations, we introduce the symmetrized matrices $\tilde{F}^{(x,y)}$ and vectors $\tilde{\pi}^{(x,y)}$
$$\tilde{F}^{(x,y)} = \frac{1}{2} \left( F^{(x,y)} + F^{(y,x)} \right) , \qquad \tilde{\pi}^{(x,y)} = \frac{1}{2} \left( \pi^{(x,y)} + \pi^{(y,x)} \right) , \tag{10}$$
and define as before the diagonal matrix $\tilde{\Pi}^{(x,y)} = \mathrm{diag}(\tilde{\pi}^{(x,y)})$.
Definition 7 (Sufficiently Close Sequences) Two sequences $(x, y)$ are sufficiently close when the corresponding matrix $\tilde{F}^{(x,y)}$ is positive definite.
As stated in [8], for sufficiently close sequences $(x, y)$, the matrix $\tilde{\Pi}^{(x,y)}$ is nonsingular. This motivates the introduction of corresponding transition and rate matrices:
Proposition 1 Let $(x, y)$ be a pairwise alignment of sufficiently close sequences. Then the following symmetrized observed transition and rate matrices are well defined
$$\tilde{P}^{(x,y)} = \left( \tilde{\Pi}^{(x,y)} \right)^{-1} \tilde{F}^{(x,y)} , \qquad \tilde{L}^{(x,y)} = \log \tilde{P}^{(x,y)} . \tag{11}$$
Assuming, for argument’s sake, that alignments were generated according to the Markov tree model of [9], the following observations can be made:
• Observed transition matrices can be expected to be powers $P^{\tau(x,y)}$ of a unique transition matrix $P$ (up to statistical fluctuations).
• Therefore observed rate matrices can be expected to be (up to fluctuations) proportional to a unique rate matrix $Q = \log(P)$.
In such a situation, simple tools such as linear regression may be expected to yield estimates for the rate matrix $Q$ and divergence times $\tau(x, y)$. To resolve the scaling indeterminacy, a normalization condition has to be imposed on either $Q$ or the divergence times, for example, $\mathrm{Tr}(Q) = -1$ (or $\|Q\|_F = 1$ as in [8]).
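One possible way to carry out such a regression is sketched below; it is our own two-step least-squares construction, not necessarily the estimator used in [8]. With the normalization $\mathrm{Tr}(Q) = -1$, each divergence time is read off as $\tau_k = -\mathrm{Tr}(L_k)$, and $Q$ is then the least-squares slope of the $L_k$ against the $\tau_k$. The observed rate matrices are simulated as $\tau_k Q$ plus small Gaussian noise; `Q_TRUE`, `TAUS_TRUE`, `NOISE` and `fit_rate_matrix` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented ground truth: a rate matrix with Tr(Q) = -1 and four divergence times.
Q0 = np.array([
    [-0.30, 0.20, 0.10],
    [0.10, -0.25, 0.15],
    [0.05, 0.15, -0.20],
])
Q_TRUE = Q0 / abs(np.trace(Q0))
TAUS_TRUE = np.array([0.2, 0.5, 0.9, 1.4])
NOISE = 1e-3   # assumed size of the statistical fluctuations

# Simulated observed rate matrices L_k = tau_k * Q + fluctuations.
observed = [t * Q_TRUE + NOISE * rng.standard_normal(Q_TRUE.shape) for t in TAUS_TRUE]

def fit_rate_matrix(rate_matrices):
    """Two-step least-squares fit of L_k ~ tau_k * Q under Tr(Q) = -1."""
    stack = np.stack(rate_matrices)                        # shape (p, m, m)
    taus = -np.trace(stack, axis1=1, axis2=2)              # Tr(L_k) = -tau_k
    q = np.tensordot(taus, stack, axes=1) / np.sum(taus ** 2)   # regression slope
    residuals = np.linalg.norm(stack - taus[:, None, None] * q, axis=(1, 2))
    return q, taus, residuals

q_est, taus_est, residuals = fit_rate_matrix(observed)
print(np.round(taus_est, 3))
print(np.round(q_est, 3))
print(np.round(residuals, 4))   # Frobenius misfit of each alignment
```

With this construction the estimate satisfies $\mathrm{Tr}(Q) = -1$ automatically, and large residuals for some pairs would point to a departure from the single-rate-matrix model.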
Pseudo-Rate Matrices, Beyond Dayhoff’s Model

3.3 Multivariate Analysis of Observed Rate Matrices

We now address the problem of comparing sufficiently related sequences using observed rate matrices and without additional assumptions. Consider a set of $p$ pairwise alignments $(x, y)$ of sufficiently related sequences as defined in Definition 6. To each pair $(x, y)$ is associated an observed rate matrix $L^{(x,y)}$, which provides an $m^2$-dimensional representation of the alignment. In the biological applications described below we mainly focus on two aspects, namely the symmetry and the adequacy of models associated with a unique rate matrix $Q$. For that we will resort to multivariate analysis techniques, in particular adaptations of principal component analysis (PCA for short; see, e.g., [17] for a recent review), which we briefly outline here.

PCA is a very simple and routinely used tool for exploratory data analysis. Given an $n \times p$ data matrix $X$, PCA provides orthonormal bases of the space of rows and the space of columns of $X$, denoted, respectively, by $\{V_k\}$ and $\{U_\ell\}$. These are eigenvectors of the matrices $X^T X$ and $X X^T$, respectively³. The eigenvalues are real and non-negative, and conventionally sorted in decreasing order (basis vectors are sorted accordingly). They measure the dispersion of the projections of rows and columns of $X$ onto the axes generated by the corresponding basis vectors (their square roots play the role of standard deviations, up to normalization). The coordinates of rows and columns with respect to these bases can often be given a sensible interpretation. It is customary to represent graphically the projections onto subspaces spanned by the first eigenvectors (for convenience two-dimensional subspaces are chosen, the so-called first factorial planes). Also of interest are the weights of the expansion of basis vectors $\{V_k\}$ (resp. $\{U_\ell\}$) as linear combinations of rows (resp. columns) of $X$, which we will call contributions.

Remark 2 (Mutabilities, ldet) Besides observed rate matrices, other simple quantities are also worth investigating. For example, the mutabilities (diagonals of observed rate matrices) defined in (8) can be analyzed in the same way and provide similar or complementary conclusions.
While rate matrix elements are labeled by pairs of amino acids, mutabilities are vectors labeled by amino acids. Biologists prefer to use mutabilities because they are easier to interpret. This is what we will do in the examples below. Also, traces of observed rate matrices (i.e., sum of diagonal elements) coincide with the ldet distance defined in (9), and therefore provide information relative to divergence times of sequences.
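The uncentered PCA outlined in Sect. 3.3 can be sketched in a few lines. The example below finds only the first principal axis, using power iteration on $X^T X$ rather than a full eigendecomposition. The function name and the toy matrix are hypothetical, and a production analysis would rather rely on a numerical linear algebra library.

```python
import math


def leading_axis(X, iterations=200):
    """First principal axis of an uncentered PCA, by power iteration on X^T X.

    Returns the unit axis v, the associated eigenvalue of X^T X, and the
    coordinates (scores) of the rows of X on that axis. The starting vector
    must not be orthogonal to the leading eigenvector.
    """
    n, p = len(X), len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit starting vector
    for _ in range(iterations):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in X]        # X v
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in X]
    eigenvalue = sum(s * s for s in scores)  # v^T X^T X v
    return v, eigenvalue, scores


# Toy data matrix (hypothetical): three observations in two dimensions.
X = [[2.0, 0.0], [4.0, 0.0], [0.0, 1.0]]
axis, eigenvalue, scores = leading_axis(X)

# Share of the total (uncentered) sum of squares carried by the first axis.
total = sum(c * c for row in X for c in row)
share = eigenvalue / total
```

Here `share` plays the role of the percentages of variance quoted for each axis in the applications; no centering is applied, in line with the uncentered variant used in the chapter.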
4 Biological Validation of Observed Rate Matrices

The above approach is very general and provides a representation of alignments that makes it possible, in particular, to check whether $L^{(x,y)}$ is indeed equal to $L^{(y,x)}$ and whether the rate matrix $Q$ is the same for all the sequences in the sample. These checks are necessary because, as we show below, this is not always the case. They provide a rational basis for the choice of a model of molecular phylogenies.

3 Standard PCA often involves prior centering of the columns of $X$, and sometimes an additional normalization.
4.1 Rate Matrices and Subfunctionalization in Reverse Gyrases

For some genes, multiple copies are present in the genome. The different copies can have the same function, the duplication simply increasing the amount of protein produced, or different functions while keeping the same type of enzymatic activity; this is what biologists call subfunctionalization. Subfunctionalization involves a modification of the protein sequence that goes beyond conservative substitutions, i.e., those where one amino acid is replaced by another which plays the same role. The associated mutation matrix must therefore differ from those usually observed, since the latter were calculated on sequences whose function was conserved during evolution. The whole question is whether the differences are sufficiently larger than the noise level to be visible.

We approached the problem by studying the duplication of a gene, the reverse gyrase⁴, because we know from biological data that there are cases of subfunctionalization (see the introduction and Section 4 of [2], where these points are discussed in more detail). In Sulfolobus, for example, the two genes encoding the reverse gyrase are essential, and the two proteins have different enzymatic properties and have a specific regulatory pathway [10, 11]. The set of sequences that we used contained 17 reverse gyrases representative of the biodiversity of hyperthermophiles and the duplicated genes topR1 and topR2 from three Sulfolobus species, namely S. acidocaldarius, S. solfataricus and S. tokodaii. We performed a pairwise alignment of the 23 reverse gyrases. The sequences are close enough (according to Definition 6) to allow the computation of the $\mathrm{ldet}$ divergences, following Eq. (9). The distribution of the asymmetry $|\mathrm{ldet}(x, y) - \mathrm{ldet}(y, x)|$ is displayed in Fig. 3.
A first remark: the main mode of the asymmetry distribution is not located at zero (it lies between 0.4 and 0.5); in addition, the 22 values greater than 2.5, which form the second mode, all correspond to alignments involving the reverse gyrase of the bacterium Thermus thermophilus. Here we will only discuss the comparisons between topR1 and topR2 in Sulfolobus. As shown in Table 2, the asymmetry is much weaker in their case, but it is not zero (the identifiers are defined in Table 1). Table 3 displays the differences $\mathrm{ldet}(x_1, y_1) - \mathrm{ldet}(x_2, y_2)$ between a pair $(x_1, y_1)$ of topR1 sequences and the peer pair $(x_2, y_2)$ of topR2 sequences. The average value of this difference approximately equals $-2.3$. Since $\mathrm{ldet}(x, y) = \tau(x, y)\operatorname{Tr}(Q)$ (Eq. 9) and since $\tau$ is the same for topR1 and topR2 for any given pair of species (it is the time since the two species started to evolve separately), this means that a unique rate matrix $Q$ cannot describe both topR1 and topR2. This proves that the subfunctionalization is associated with a modification of the observed rate matrix $L$.

Fig. 3 Distribution of the asymmetry $|\mathrm{ldet}(x, y) - \mathrm{ldet}(y, x)|$ for the 253 alignments of the 23 reverse gyrases. The 22 values larger than 2.5 correspond to alignments involving the reverse gyrase of the bacterium Thermus thermophilus

Table 1 Definition of identifiers used in Tables 2 and 3

Species            Gene   Identifier   Species            Gene   Identifier
S. acidocaldarius  topR1  A1           S. acidocaldarius  topR2  A2
S. solfataricus    topR1  B1           S. solfataricus    topR2  B2
S. tokodaii        topR1  C1           S. tokodaii        topR2  C2

Table 2 Reverse Gyrases: asymmetry of the $\mathrm{ldet}$ “distance” between topR1 and topR2 in Sulfolobus

Alignment  ldet(x, y)  Alignment  ldet(y, x)  ldet(x, y) − ldet(y, x)
A1.B1      13.03       B1.A1      13.07       −0.04
A1.C1       9.79       C1.A1       9.89       −0.10
B1.C1      11.90       C1.B1      11.91       −0.01
A2.B2      15.11       B2.A2      15.03        0.08
A2.C2      12.22       C2.A2      12.08        0.14
B2.C2      14.47       C2.B2      14.36        0.11

Table 3 Reverse Gyrases: differences between $\mathrm{ldet}$ values for type R1 proteins (denoted by $\mathrm{ldet}(x_1, y_1)$) and homologous type R2 proteins (denoted by $\mathrm{ldet}(x_2, y_2)$). The average value approximately equals $-2.3$

Alignment  ldet(x1, y1)  Alignment  ldet(x2, y2)  ldet(x1, y1) − ldet(x2, y2)
A1.B1      13.03         A2.B2      15.11         −2.08
A1.C1       9.79         A2.C2      12.22         −2.43
B1.A1      13.07         B2.A2      15.03         −1.95
B1.C1      11.90         B2.C2      14.47         −2.57
C1.A1       9.89         C2.A2      12.08         −2.19
C1.B1      11.91         C2.B2      14.36         −2.45

4 The reverse gyrase exists mostly in bacteria and archaea whose growth optimum is above 80 °C; it protects DNA from the denaturation that normally occurs at such high temperatures [11].

We now turn to multivariate analysis of mutabilities $\mu^{(x,y)}$ (i.e., diagonals of observed rate matrices, see (8)). Although the set of such vectors may be seen geometrically as a cloud of points in a 20-dimensional space, they actually lie (to some extent) in a subspace of much smaller dimension. Performing an uncentered PCA (see Sect. 3.3 above) on this dataset gives a satisfactory image of this dimension reduction: the first two axes here account for 59% (35% + 24%) of the variance. There is an almost perfect homothety between the values of $\mathrm{ldet}(x, y)$ and the coordinates on the first axis: a linear regression gives $\mathrm{axis1}(x, y) \approx 0.24\,\mathrm{ldet}(x, y)$ ($R^2 = 0.97$). However, the observation of the projection onto the first factorial plane (i.e., the plane generated by the first two principal components) displayed in Fig. 4 provides more information than the simple calculation of $\mathrm{ldet}(x, y)$. We see, for example, that the second axis separates $\mu^{(x,y)}$ and $\mu^{(y,x)}$ and therefore outlines the asymmetry mentioned above. In addition, Fig. 5 shows that the projection onto the plane 2-3 both outlines the asymmetry (axis 2) and separates $\mu^{(x_1,y_1)}$ and $\mu^{(x_2,y_2)}$ (axis 3, which accounts for 7% of the variance), i.e., matrices corresponding to pairs of topR1 sequences and the peer pairs of topR2 sequences. We could not observe clear structures in the higher dimensions.

Fig. 4 Reverse gyrases: projections of mutabilities $\mu^{(x,y)}$ onto the first factorial plane of the principal component analysis. Each point represents an alignment between topR1 sequences or between topR2 sequences. See Table 1 for the definitions of identifiers
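As a quick arithmetic check of the Table 2 asymmetries and the Table 3 differences discussed above, the following sketch recomputes them from the tabulated $\mathrm{ldet}$ values. The values are copied from the tables, where they are rounded to two decimals, so a recomputed difference may deviate from a printed one by 0.01 (this happens for B1.A1). The helper `peer` is a hypothetical name introduced for the example.

```python
# ldet values copied from Tables 2 and 3 (rounded to two decimals there).
ldet = {
    ("A1", "B1"): 13.03, ("B1", "A1"): 13.07,
    ("A1", "C1"): 9.79,  ("C1", "A1"): 9.89,
    ("B1", "C1"): 11.90, ("C1", "B1"): 11.91,
    ("A2", "B2"): 15.11, ("B2", "A2"): 15.03,
    ("A2", "C2"): 12.22, ("C2", "A2"): 12.08,
    ("B2", "C2"): 14.47, ("C2", "B2"): 14.36,
}


def peer(identifier):
    """Map a topR1 identifier to its topR2 peer, e.g. 'A1' -> 'A2'."""
    return identifier[0] + "2"


# Asymmetry |ldet(x, y) - ldet(y, x)| for every ordered pair.
asymmetry = {(x, y): abs(v - ldet[(y, x)]) for (x, y), v in ldet.items()}

# Differences ldet(x1, y1) - ldet(x2, y2) between topR1 pairs and peer topR2 pairs.
differences = {
    (x, y): v - ldet[(peer(x), peer(y))]
    for (x, y), v in ldet.items()
    if x.endswith("1")
}

mean_difference = sum(differences.values()) / len(differences)
max_asymmetry = max(asymmetry.values())
```

The mean difference obtained this way is close to the value of about $-2.3$ quoted in the text, while the largest asymmetry among these twelve alignments stays well below the differences between topR1 and topR2.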
Fig. 5 Reverse gyrases: projections of mutabilities $\mu^{(x,y)}$ onto the plane generated by principal components 2 and 3. Each point represents an alignment between topR1 sequences or between topR2 sequences. See Table 1 for the definitions of identifiers

The analysis of the contributions makes it possible to give biological significance to these observations by highlighting the amino acids whose mutability varies according to the matrices $Q$.
4.2 Several Rate Matrices Within a Protein Family: The Case of Mitochondrial Proteins

Mitochondria are organelles found in almost all eukaryotic cells (i.e., in all living organisms except bacteria and archaea). They contain the respiratory chain. Mitochondria have their own genome, which encodes a subset of the proteins of the respiratory chain (called mtDNA-encoded proteins in the following).

We compared the mtDNA-encoded proteins of 120 representative species of animals: arthropods, tetrapods, echinoderms, molluscs, and roundworms. These are very different groups: 500 million years ago the ancestors of today’s arthropods were already totally different from the ancestors of vertebrates. Arthropods have diversified throughout geological time: the ancestors of spiders and scorpions already existed 500 million years ago, while the ancestors of insects appeared 400 million years ago, at the same time as the first tetrapods [19]. Mammals are much more recent [29]. Just before the disappearance of the dinosaurs (65 million years ago) and especially during the ten million years that followed, mammals underwent an explosive diversification. The sample also contains pairs of species that have diverged for several million years (e.g., man and chimpanzee, or different species of Drosophila, the small fruit flies). It therefore allows us to analyze evolution over a very large time scale.

We aligned the 12 mtDNA-encoded proteins that are present in all the species considered here. Transition matrices $P^{(x,y)}$ were computed for all the 14,280 pairs, after summation over the 12 proteins of two species $x$ and $y$. Sequences were sufficiently related (according to Definition 6) to allow the computation of the observed rate matrices $L^{(x,y)}$. The distribution of the asymmetry $|\mathrm{ldet}(x, y) - \mathrm{ldet}(y, x)|$ is displayed in Fig. 6. The main mode of the asymmetry distribution is around zero and 95% of the values are less than 2. However, the difference can be as high as 4. Values between 3 and 4 all correspond to alignments involving bees.

Fig. 6 Distribution of the asymmetry $|\mathrm{ldet}(x, y) - \mathrm{ldet}(y, x)|$ for the 14,280 alignments of the mtDNA-encoded proteins of 120 species. Values between 3 and 4 all correspond to alignments involving bees

As before, the mutabilities $\mu^{(x,y)}$ can be viewed geometrically as points in a 20-dimensional space. As the cloud of points is very elongated, the PCA gives a satisfactory image. The first axis represents 66% of the variance, the second 15%, and the third 8%; all others are below 3%. The dominance of axis 1 over axis 2 means, as already mentioned, that a MCT model with a single generator is approximately valid for the proteins in our set, axis 1 roughly corresponding to the divergence time $\tau$.
Fig. 7 mtDNA-encoded proteins: projections of the vectors $\mu^{(x,y)}$ onto the first factorial planes of an uncentered principal component analysis. Each point represents an alignment between two species. Points corresponding to alignments are identified by blue stars (insect-insect) or red triangles (mammal-mammal). The projection strongly suggests the existence of two different generators $Q$ for insects and mammals

Figure 7 displays the projections of the $22 \times 21$ mutabilities $\mu^{(x,y)}$ corresponding to alignments within insects excluding bees (in blue) and the $21 \times 20$ vectors $\mu^{(x,y)}$ corresponding to alignments within mammals (in red). The staggering of the alignments along axis 1 reflects the time that has elapsed since the species diverged: mammals are grouped to the right near the origin of the cloud, while alignments involving insects are far to the left of the cloud. However, a closer examination of the projection on the second and third axes shows biologically significant deviations. The points corresponding to the mammal-mammal vectors $\mu^{(x,y)}$ (red) form a cluster clearly disjoint from that of the insect-insect vectors $\mu^{(x,y)}$ (blue). Clearly, a unique rate matrix $Q$ cannot account for these two clusters.

It is perhaps not surprising that groups that have evolved separately for more than 500 million years correspond to different $Q$ matrices. It is very surprising, however, to observe a difference between bees and other insects (Fig. 8). This observation is consistent with other works on the mitochondrial genome of bees, which have shown that its evolution presents peculiarities [6, 30]. It should be noted that these authors took into account criteria other than the protein sequence. This difference may be due to the extremely high (A+T)/(G+C) ratio and the very small number of reproductive individuals in bees. Studies on other organisms have shown high genome instability under these conditions [21].

Fig. 8 mtDNA-encoded proteins: projections of the mutabilities $\mu^{(x,y)}$ onto the factorial plane 2-3 of an uncentered principal component analysis. Each point represents an alignment between two species. Points corresponding to alignments are identified by blue crosses (insects other than bees), red squares (bees $\to$ other) and green triangles (other $\to$ bees). The projection strongly suggests the existence of two different generators $Q$ for bees and other insects

5 The Influence of Dissimilarity Measures on Sequence Clustering and Phylogeny Reconstruction

The analysis of multiple alignments based upon rate matrices is interesting in several respects. On the one hand, it assumes as little as possible about the data, of which it provides model-free representations (even though largely inspired by Markov evolution ideas). On the other hand, these representations turn out to provide valuable information about evolution. A good example is the evidence, shown above, of two distinct generators for the sequence family under consideration. Such information may in turn be used to fine-tune models.

However, such an approach alone does not directly meet the expectations of biologists, who often seek a tree or a clustering that summarizes the information contained in a dissimilarity matrix. Even though there exist many techniques that can generate such trees or clusterings, the demand is less trivial than it appears. Indeed, the available, biologically relevant dissimilarity measures are numerous and varied (including the $\mathrm{ldet}$ distance introduced above), not to mention that some of them do not satisfy the assumptions required by these techniques. This is actually an inexhaustible source of debate for biologists.
5.1 An Iterative-Rank Based Clustering

To overcome the dependence on the choice of a dissimilarity measure (and a tree building algorithm), Alex proposed an alternative solution [7]: assume that what really matters in dissimilarities between sequences is their relative order, not their numerical value. The choice of a particular measure then no longer matters, since all measures give the same result as long as they are related by a monotonic transformation. As for the dependence on the tree building/clustering technique, the solution of [7] relies on an iterative procedure which we detail below.

The Agreement of Judgments Consider a set $X$ of species (i.e., sequences) of size $n = \#X$, and a matrix $D$ on $X$, seen as a function $D : X \times X \to \mathbb{R}_+$ such that $D(x, x) = 0$ for all $x \in X$. $D$ may be chosen symmetric, but not necessarily. Associate with $D$ a family $\{D_x, x \in X\}$ of evaluation maps defined by

$$D_x(y) = D(x, y). \tag{12}$$

These maps provide a description of how each $x \in X$ (hereafter called judge) evaluates all elements of $X$ (the candidates). To achieve the announced goal, i.e., get rid of numerical values and preserve ordering, each map $D_x$ is replaced with the corresponding rank $r_{D_x} : y \mapsto r_{D_x}(y) \in \mathbb{N}$ defined by

$$r_{D_x}(y) = \#\{z \in X,\ D_x(z) \le D_x(y)\}. \tag{13}$$

It may be shown that for all $x, y, z \in X$, $r_{D_x}(y) \le r_{D_x}(z)$ is equivalent to $D(x, y) \le D(x, z)$, i.e., ordering is preserved. The agreement of judgments of two judges $x, y \in X$ may be measured using any mapping $T : \mathbb{N}^n \times \mathbb{N}^n \to \mathbb{R}_+$, by evaluating $T(r_{D_x}, r_{D_y})$.

Definition 8 With the above notations, the $T$-derivate of the matrix $D : X \times X \to \mathbb{R}_+$ is the map $\partial_T D : x, y \in X \mapsto \partial_T D(x, y) \in \mathbb{R}_+$ defined by

$$\partial_T D(x, y) = T\left(r_{D_x}, r_{D_y}\right). \tag{14}$$

$\partial_T D(x, y)$ thus provides a quantitative measure of the agreement of judges $x$ and $y$ on $X$. In [7], the squared Euclidean distance $T_2 : (u, v) \mapsto \sum_{x \in X} (u(x) - v(x))^2$ is used (also studied in Spearman rank statistics).
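The rank of Eq. (13) and the $T_2$-derivate of Eq. (14) translate directly into code. The sketch below is a minimal illustration on a hypothetical 4-by-4 dissimilarity matrix (two tight pairs of objects); the function names are chosen for this example and do not come from [7].

```python
def rank_vector(D, x):
    """Ranks r_{D_x}(y) = #{z : D(x, z) <= D(x, y)} for all candidates y (Eq. 13)."""
    row = D[x]
    return [sum(1 for dz in row if dz <= dy) for dy in row]


def t2(u, v):
    """Squared Euclidean distance between two rank vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def derivate(D):
    """T2-derivate of D: entry (x, y) compares the rankings of judges x and y (Eq. 14)."""
    n = len(D)
    ranks = [rank_vector(D, x) for x in range(n)]
    return [[t2(ranks[x], ranks[y]) for y in range(n)] for x in range(n)]


# Hypothetical dissimilarity matrix: objects 0 and 1 are close, as are 2 and 3.
D = [
    [0, 1, 4, 5],
    [1, 0, 4, 6],
    [4, 4, 0, 2],
    [5, 6, 2, 0],
]

ranks_of_judge_2 = rank_vector(D, 2)   # ties (here 4 and 4) share the larger rank
dD = derivate(D)
```

Because only comparisons between entries of a row are used, any increasing transformation of `D` (for instance squaring its entries) leaves `derivate(D)` unchanged, which is the invariance property motivating the construction.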
Iteration Clearly enough, if $T$ is a symmetric map, the $T$-derivate $\partial_T D$ of the matrix $D$ is always a dissimilarity matrix. This also suggests iterating the procedure, thus deriving a whole family of dissimilarities $\partial_T^{\ell} D : X \times X \to \mathbb{R}$ ($\ell = 0, 1, \ldots$) from any matrix $D$, defined recursively by

$$\partial_T^{0} D := D \quad \text{and} \quad \partial_T^{\ell+1} D := \partial_T\left(\partial_T^{\ell} D\right). \tag{15}$$

Since $X$ is finite, the sequence $(\partial_T^{\ell} D)$ necessarily runs into a cycle, but there is no reason to expect that this iteration should converge. However, the authors of [7] observe that in their experiments, for most distance data obtained either by comparing biological sequences or by random simulation, there was always some integer $i_0$ of about the same order of magnitude as $\#X$ such that $(\partial_{T_2})^{i_0} D$ is a fixed point of $\partial_{T_2}$.

Clustering and Tree Construction In cluster analysis, a standard task is to associate, to any dissimilarity matrix $D$, a Linnean hierarchy $\mathcal{H} = \mathcal{H}(D)$, i.e., a collection $\mathcal{H}$ of subsets $A, B, \ldots$ of $X$ such that $A \cap B \neq \emptyset$ implies $A \subseteq B$ or $B \subseteq A$. Introduce the collection

$$\mathcal{A}_D = \mathcal{A}_D(X) := \{A \subseteq X \mid a, b \in A,\ x \in X \setminus A \Rightarrow D(a, b) < D(a, x)\} \tag{16}$$

of subsets $A$ of $X$ such that, for any $a \in A$, any other $b \in A$ is “closer” to $a$ than any $x \in X$ outside $A$. These subsets always form a Linnean hierarchy, and are therefore natural candidates for being clusters in applications. Further, denote by $B_D(x : y)$ the smallest ball (with respect to $D$) with center $x$ containing $y$ or, in other words, the set of all $z$ in $X$ that are, relative to $D$, at least as “close” to $x$ as $y$. One has [7], for all $x, y \in X$,

$$r_{D_x}(y) = \# B_D(x : y). \tag{17}$$

So, while the actual values of $D$ might be debatable, one only needs to trust that one can use $D$ to decide, for any three distinct objects $x, u, v$ in $X$, whether $u$ or $v$ is more similar to $x$. And only the resulting rankings of the objects $u, v, \ldots$ in $X$ relative to the objects $x, \ldots$ in $X$ are needed to define $\mathcal{A}_D$.

Remark 3 (Practical Considerations)
1. Given the way the rank is defined, the maximal rank value in a cluster $C$ equals the cluster size $\#C$. This significantly simplifies the determination of clusters. Indeed, in order to find all the clusters of size $N$ (groups of $N$ judges having the same view on the candidates), it suffices to browse the rows (which correspond to judges) of the final rank matrix and find all values with rank less than $N$.
2. The chosen definition of ranks also facilitates the computation of the distance between two objects (two judges) for the construction of the tree. This distance turns out to be equal to the maximum rank between the two objects, minus 1. This distance is ultrametric, i.e., the two largest distances of a triplet are equal to each other, which defines a hierarchy that can be represented by a dendrogram.
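The iteration of Eq. (15) and the cluster collection of Eq. (16) can be sketched as follows. This is a brute-force illustration under stated assumptions: the $T_2$-derivate is used, the subsets of $X$ are enumerated exhaustively (feasible only for very small $X$), and the function names and the toy matrix are hypothetical. It does not implement the rank-matrix shortcuts of Remark 3.

```python
from itertools import combinations


def derivate(D):
    """T2-derivate: squared Euclidean distance between the rank vectors of two judges."""
    n = len(D)
    ranks = [[sum(1 for dz in row if dz <= dy) for dy in row] for row in D]
    return [[sum((a - b) ** 2 for a, b in zip(ranks[x], ranks[y]))
             for y in range(n)] for x in range(n)]


def iterate_to_fixed_point(D, max_steps=100):
    """Apply the derivate repeatedly; return (fixed point, steps) or (None, max_steps)."""
    for step in range(max_steps):
        nxt = derivate(D)
        if nxt == D:
            return D, step
        D = nxt
    return None, max_steps  # no fixed point found (e.g. a longer cycle)


def clusters(D):
    """All non-empty A such that D(a, b) < D(a, x) for a, b in A and x outside A (Eq. 16)."""
    n = len(D)
    found = []
    for size in range(1, n + 1):
        for A in combinations(range(n), size):
            outside = [x for x in range(n) if x not in A]
            if all(D[a][b] < D[a][x] for a in A for b in A for x in outside):
                found.append(set(A))
    return found


# Hypothetical dissimilarity matrix: objects 0 and 1 are close, as are 2 and 3.
D0 = [
    [0, 1, 4, 5],
    [1, 0, 4, 6],
    [4, 4, 0, 2],
    [5, 6, 2, 0],
]

fixed, steps = iterate_to_fixed_point(D0)
hierarchy = clusters(fixed)
```

On this toy matrix the iteration stabilizes after two steps, and the resulting clusters (singletons, the two pairs, and the whole set) are pairwise nested or disjoint, as expected of a Linnean hierarchy.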
5.2 Application to mtDNA-Encoded Proteins of Tetrapods

We present below a study of the phylogeny of tetrapods based on mtDNA-encoded proteins, already discussed in Sect. 4.2. As this phylogeny is very firmly established at the scale considered here, it makes it possible to assess the reliability of phylogeny reconstruction programs. A common pitfall of the latter is that they separate species which actually have a common ancestor. This apparent non-monophyly is due to poor management of the differences between the transition matrices of the different branches. The Iterative-Rank Clustering described above turns out to give results of a quality quite comparable to that of the most frequently used software suites.

The sample is composed of 49 species representative of Amphibians, Reptiles, Birds and Mammals. It covers about 400 million years. The species belong to 12 clearly separated monophyletic categories, but the divergence can be significant in some branches of a given category. However, Prototheria (platypus) is relatively close to Metatheria (marsupial) and Tubulidentata (aardvark) to Cetartiodactyla (ruminant, cetacean).

The 12 mtDNA-encoded proteins that are present in the 49 species under consideration have been aligned, and the matrices $P^{(x,y)}$ have been calculated for all the pairs of sequences. The resulting observed rate matrices could be obtained, since the sequences are sufficiently related (according to Definition 6). The sample is homogeneous (a single matrix $Q$) and the asymmetry $|\mathrm{ldet}(x, y) - \mathrm{ldet}(y, x)|$ is weak: the mode is around 0.15 and 95% of values are less than 0.45. In the study below, we use the average $\frac{1}{2}(\mathrm{ldet}(x, y) + \mathrm{ldet}(y, x))$, but the same results are obtained using $\mathrm{ldet}_{\min}(x, y)$ or $\mathrm{ldet}_{\max}(x, y)$.

Tetrapod phylogenies have been constructed using various standard programs. These are based upon two ingredients:
1. Distance matrices: we used $\mathrm{ldet}$ (defined in Eq. 9) and a distance calculated using the ProtDist⁵ software, see [24].
2. A tree or network reconstruction algorithm: we used BioNJ [12] (a variant of Neighbor Joining⁶), iterative rank (Sect. 5.1) and NeighborNet⁷ (see [15]).

Table 4 summarizes the groupings proposed by the phylogeny construction programs. It should also be noted that all the programs show the proximity of Prototheria to Metatheria and of Tubulidentata to Cetartiodactyla.

All the programs made at least two errors (three in the case of ProtDist + NeighborNet) in finding non-monophyly, but not necessarily for the same species. The nature of the errors depends both on the distances and on the method used to construct the tree. One way to detect them is therefore to compare the results of several independent analyses. In this context, the iterative-rank clustering approach developed by Alex is of great interest in several respects. On the one hand, it is not redundant with the commonly used methods. On the other hand, the user is not likely to give in to the temptation to tweak the options until the desired result is obtained.

5 ProtDist provides a distance measure for protein sequences, using maximum likelihood estimates based on amino acid scoring matrices. It uses the multiple sequence alignment provided by the user.
6 Neighbor joining is an agglomerative (i.e., aggregation from leaves to root) clustering method for the creation of phylogenetic trees that only requires the knowledge of the distance between each pair of taxa (e.g., sequences) to form the tree. It evaluates branch lengths so that the distances deduced from the tree are closest to the values in the distance table.
7 NeighborNet is similar to Neighbor Joining, except that it can lead to overlapping clusters which do not form a hierarchy, and are represented using a type of phylogenetic network called a splits graph.
5.3 The Impact of Symmetry Assumptions

Experience shows that given a pair of sequences $(x, y)$, there is always a more or less significant gap between $\mathrm{ldet}(x, y)$ and $\mathrm{ldet}(y, x)$. This gap is mainly due to the asymmetry of the matrix of transition frequencies $F^{(x,y)}$ as defined in Eq. (4). Since the early work of Margaret Dayhoff, the problem has been evaded by symmetrizing the matrices $F^{(x,y)}$ and the vectors $\pi^{(x,y)}$ (see Eq. 10). This choice has no consequences if the difference is small, as in the case presented in Table 4. It is, however, very questionable when the asymmetry is large, as in the case of Figs. 3 and 8. It amounts to attempting to characterize a bimodal distribution by its mean, which is hardly the most relevant characteristic value in this case!

With the exception of iterative-rank based clustering, the methods generally used, notably those cited in Table 4, assume that the data have been symmetrized previously. This is a drawback because the exploitation of asymmetry opens new possibilities, which can highlight different aspects of the alignments.

We consider here four symmetric measures that can be derived from asymmetric quantities, in this case $\mathrm{ldet}$. Given an alignment $(x, y)$, we consider $\mathrm{ldet}_{\min}(x, y) = \min(\mathrm{ldet}(y, x), \mathrm{ldet}(x, y))$ and $\mathrm{ldet}_{\max}(x, y) = \max(\mathrm{ldet}(y, x), \mathrm{ldet}(x, y))$. In the spirit of Sect. 5.1, we also use the rank-based dissimilarities $D_1(x, y) = \partial_{T_2}\mathrm{ldet}(x, y)$ and the similar quantity $D_2(x, y) = \partial_{T_2}\mathrm{ldet}^t(x, y)$ built from the transposed matrix $\mathrm{ldet}^t$ of $\mathrm{ldet}$. Here, $\partial_{T_2}$ is the $T_2$-derivate (see Sect. 5.1), $T_2$ being the squared Euclidean distance. These four solutions are not mutually exclusive (one may also consider others, for example, higher order $T_2$-derivates of $\mathrm{ldet}$); in fact they turn out to provide complementary points of view on the set of observed rate matrices.

Table 4 Reliability of the reconstitution of the phylogeny of tetrapods from mtDNA-encoded proteins. The program makes a mistake when it splits the species of a category into several groups. This is the case, for example, with $\mathrm{ldet}$ + BioNJ and ProtDist + BioNJ, which distinguish two groups of snakes (one with 2 species and the other with 1) while they are monophyletic

Category         E.g.       Nb species  ldet + rank  ldet + NJ  ProtDist + NeighborNet  ProtDist + NJ
Amphibia         Frog       4           4            4          4                       4
Testudines       Turtle     4           4            4          4                       4
Squamata         Snakes     3           3            2|1        3                       2|1
Paleognathae     Ostrich    7           7            7          7                       7
Neognathae       Chicken    7           7            7          4|3                     5|2
Crocodylidae     Alligator  2           2            2          1|1                     2
Prototheria      Platypus   1           1            1          1                       1
Metatheria       Marsupial  2           2            2          2                       2
Tubulidentata    Aardvark   1           1            1          1                       1
Cetartiodactyla  Ruminant   5           5            5          5                       5
Lagomorpha       Rabbit     2           2            2          2                       2
Primata          Monkey     6           4|2          4|2        4|2                     6
Rodentia         Mouse      5           3|2          5          5                       5

We used this approach to analyze the phylogeny of duplicated genes in reverse gyrases (see Sect. 4.1), using the Neighbor Joining method to build the trees. The $\mathrm{ldet}_{\max}$ matrix shows, as already known, that the topR1 and topR2 genes have a common origin. The matrix $\mathrm{ldet}_{\min}$ and the two dissimilarity matrices $D_1$ and $D_2$ highlight the phylogenetic relationships of Sulfolobus and Aeropyrum pernix, which are both Archaea of the class Thermoprotei. The tree groups the topR1 genes of Sulfolobus and the topR1 gene of Aeropyrum pernix on one branch, and the topR2 genes of Sulfolobus and the topR2 gene of Aeropyrum pernix on another. We display in Fig. 9 the reverse gyrase phylogenetic tree constructed from $\mathrm{ldet}_{\max}(x, y)$, and in Fig. 10 the corresponding tree constructed from $\mathrm{ldet}_{\min}(x, y)$.
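The min and max symmetrizations, and the transposition used for $D_2$, can be illustrated with the topR1 values of Table 2. This is only a small sketch: the zero diagonal is an assumption made here for completeness, and the helper name `symmetrize` is hypothetical.

```python
def symmetrize(M, combine):
    """Symmetric matrix whose (i, j) entry is combine(M[i][j], M[j][i])."""
    n = len(M)
    return [[combine(M[i][j], M[j][i]) for j in range(n)] for i in range(n)]


labels = ["A1", "B1", "C1"]
# Asymmetric ldet values from Table 2; the diagonal is set to 0 by assumption.
ldet = [
    [0.0, 13.03, 9.79],
    [13.07, 0.0, 11.90],
    [9.89, 11.91, 0.0],
]

ldet_min = symmetrize(ldet, min)
ldet_max = symmetrize(ldet, max)
ldet_t = [list(column) for column in zip(*ldet)]  # transposed matrix
```

Applying a rank-based derivate to `ldet` and to `ldet_t` would then give the two further dissimilarities $D_1$ and $D_2$ described in the text.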
6 Discussion – Conclusion

One of the fundamental techniques of biology is sequence alignment, namely transforming one sequence into another with minimal change. Sequence alignment is essential for the study of evolution and is a source of information for the analysis of the physico-chemical mechanisms which are at the heart of protein activity. Almost all multiple alignment programs use a guide tree to reduce complexity. Advanced programs proceed by iteration:
1. A first estimation of the distance between the sequences is made.
2. A guide tree is computed based on these estimates.
3. The sequences are progressively aligned following the order given by the guide tree.
4. A new distance between the sequences is calculated after alignment.
5. The program returns to step 2 as long as the procedure improves the alignment score.

A refinement consists in splitting the guide tree and proceeding in the same way in each sub-tree, the stopping criterion remaining the improvement of the alignment score (see Chapter 25 in [27]). The procedure gives an optimal Alignment-Tree pair for a given measure of distance between sequences. The topology is almost frozen by the alignment program. The biologist may then use Monte Carlo type methods to get an idea of trees with roughly equivalent scores. This consists in randomly modifying the matrices $P$ to identify the truly robust parts of the tree.
Fig. 9 Phylogenetic tree of reverse gyrase constructed with NJ from $\mathrm{ldet}_{\max}(x, y)$. The topR1 and topR2 proteins of Sulfolobus are clustered in this tree. In contrast, the proteins topR1 and topR2 of Aeropyrum pernix are separated. The identifiers consist of genus, species, and group. They are archaea when not specified otherwise
This approach differs considerably from the one we propose: we do not modify the matrices $P$ at all; we simply change the angle from which we look at the matrices $L$ in order to better perceive the proximities between the sequences. Indeed, one can think that in many situations a single tree is not enough to faithfully summarize the information contained in an alignment; it can therefore be interesting to build several trees, exploiting different points of view, rather than trying to build a single one and to show that it is significant by bootstrap or similar methods.
Fig. 10 Phylogenetic tree of reverse gyrase constructed with NJ from $\mathrm{ldet}_{\min}(x, y)$. The topR1 proteins of Sulfolobus are grouped with the topR1 proteins of Aeropyrum pernix in this tree (and similarly for the topR2 proteins of all four species) but topR1 and topR2 are separate. The identifiers consist of genus, species, and group. They are archaea when not specified otherwise
The point of view promoted by Alex was to assume as little as possible and try to collect information from data, before turning to explicit modeling if needed: “We must look for a model that fits the data and not twist the data to fit the model.” The starting point here is to avoid modeling multiple alignments (and therefore introducing trees) and to try to see which information can be obtained from pairwise alignments. Another originality was to compare pairwise alignments, not only sequences.
Acknowledgments The authors would like to sincerely thank Marc Nadal and Jean-Loup Risler for their constructive criticism and Alessandra Riva for proofreading the article.
References
1. J. Adachi and M. Hasegawa. Amino acid substitution of proteins coded for in mitochondrial DNA during mammalian evolution. Idengaku zasshi, 67:187–97, 07 1992.
2. G. Didier, C. Landès, A. Hénaut, and B. Torrésani. Four billion years: the story of an ancient protein family. This volume, 2021.
3. D. Barry and J. A. Hartigan. Statistical analysis of hominoid molecular evolution. Statist. Sci., 2(2):191–207, 05 1987.
4. R. Bhatia. Matrix Analysis, volume 169 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.
5. M. Dayhoff, R. Schwartz, and B. Orcutt. A model of evolutionary change in proteins. In M. Dayhoff, editor, Atlas of Protein Sequence and Structure, volume 5, pages 345–352. National Biomedical Research Foundation, Washington, D. C., 1978.
6. de Paula Freitas, C. Flávia, A. P. Lourenço, F. M. F. Nunes, A. R. Paschoal, F. C. P. Abreu, F. O. Barbin, L. Bataglia, C. A. M. Cardoso-Júnior, M. S. Cervoni, S. R. Silva, F. Dalarmi, M. A. Del Lama, T. S. Depintor, K. M. Ferreira, P. S. Gória, M. C. Jaskot, D. C. Lago, D. Luna-Lucena, L. M. Moda, L. Nascimento, M. Pedrino, F. R. Oliveira, F. C. Sanches, D. E. Santos, C. G. Santos, J. Vieira, A. R. Barchuk, K. Hartfelder, Z. L. P. Simões, M. M. G. Bitondi, and D. G. Pinheiro. The nuclear and mitochondrial genomes of Frieseomelitta varia - a highly eusocial stingless bee (Meliponini) with a permanently sterile worker caste. BMC Genomics, 21(1):386, June 2020.
7. C. Devauchelle, A. W. M. Dress, A. Grossmann, S. Grünewald, and A. Henaut. Constructing hierarchical set systems. Annals of Combinatorics, 8(4):441–456, Jan. 2005.
8. C. Devauchelle, A. Grossmann, A. Hénaut, M. Holschneider, M. Monnerot, J. Risler, and B. Torrésani. Rate matrices for analyzing large families of protein sequences. J. Comput. Biol., 8(4):381–399, 2001. PMID: 11571074.
9. J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol., 17(6):368–376, Nov. 1981.
10. F. Garnier, M. Couturier, H. Débat, and M. Nadal. Archaea: a gold mine for topoisomerase diversity. Front. Microbiol., 2021. In press.
11. F. Garnier, H. Debat, and M. Nadal. Type IA DNA topoisomerases: A universal core and multiple activities. In M. Drolet, editor, DNA Topoisomerases, volume 1703 of Methods in Molecular Biology, chapter 1, page 1:20. Springer, 2018.
12. O. Gascuel and M. Steel. Neighbor-Joining Revealed. Molecular Biology and Evolution, 23(11):1997–2000, 07 2006.
13. H. E. Haber. Notes on the matrix exponential and logarithm. online, May 2019.
14. D. M. Hillis, C. Moritz, and B. K. Mable, editors. Molecular Systematics. Sinauer Associates Inc., 1996.
15. D. H. Huson and D. Bryant. Application of Phylogenetic Networks in Evolutionary Studies. Molecular Biology and Evolution, 23(2):254–267, 10 2005.
16. V. Jayaswal, L. S. Jermiin, and J. Robinson. Estimation of phylogeny using a general Markov model. Evolutionary bioinformatics online, 1:62–80, Feb. 2007.
17. I. T. Jolliffe and J. Cadima. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202, 2016.
18. P. J. Lockhart, M. A. Steel, M. D. Hendy, and D. Penny. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol., 11(4):605–612, 07 1994.
Pseudo-Rate Matrices, Beyond Dayhoff’s Model
643
19. B. Misof, S. Liu, K. Meusemann, R. S. Peters, A. Donath, C. Mayer, P. B. Frandsen, J. Ware, T. Flouri, R. G. Beutel, O. Niehuis, M. Petersen, F. Izquierdo-Carrasco, T. Wappler, J. Rust, A. J. Aberer, U. Aspöck, H. Aspöck, D. Bartel, A. Blanke, S. Berger, A. Böhm, T. R. Buckley, B. Calcott, J. Chen, F. Friedrich, M. Fukui, M. Fujita, C. Greve, P. Grobe, S. Gu, Y. Huang, L. S. Jermiin, A. Y. Kawahara, L. Krogmann, M. Kubiak, R. Lanfear, H. Letsch, Y. Li, Z. Li, J. Li, H. Lu, R. Machida, Y. Mashimo, P. Kapli, D. D. McKenna, G. Meng, Y. Nakagaki, J. L. Navarrete-Heredia, M. Ott, Y. Ou, G. Pass, L. Podsiadlowski, H. Pohl, B. M. von Reumont, K. Schütte, K. Sekiya, S. Shimizu, A. Slipinski, A. Stamatakis, W. Song, X. Su, N. U. Szucsich, M. Tan, X. Tan, M. Tang, J. Tang, G. Timelthaler, S. Tomizuka, M. Trautwein, X. Tong, T. Uchifune, M. G. Walzl, B. M. Wiegmann, J. Wilbrandt, B. Wipfler, T. K. F. Wong, Q. Wu, G. Wu, Y. Xie, S. Yang, Q. Yang, D. K. Yeates, K. Yoshizawa, Q. Zhang, R. Zhang, W. Zhang, Y. Zhang, J. Zhao, C. Zhou, L. Zhou, T. Ziesmann, S. Zou, Y. Li, X. Xu, Y. Zhang, H. Yang, J. Wang, J. Wang, K. M. Kjer, and X. Zhou. Phylogenomics resolves the timing and pattern of insect evolution. Science, 346(6210):763–767, 2014. 20. T. Müller and M. Vingron. Modeling amino acid replacement. J. Comput. Biol., 7(6):761–776, 2000. PMID: 11382360. 21. D. T. Nguyen, B. Wu, S. Xiao, and W. Hao. Evolution of a Record-Setting AT-Rich Genome: Indel Mutation, Recombination, and Substitution Bias. Genome Biology and Evolution, 12(12):2344–2354, 09 2020. 22. S. Pietrokovski, J. G. Henikoff, and S. Henikoff. The Blocks Database—A System for Protein Classification. Nucleic Acids Research, 24(1):197–200, 01 1996. 23. V. Polyanovsky, A. Lifanov, N. Esipova, and V. Tumanyan. The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion. BMC Bioinf., 21(11):294, Sept. 2020. 24. Protdist. 
Program to compute distance matrix from protein sequences. online, 1993. https:// evolution.gs.washington.edu/phylip/doc/protdist.html. 25. C. Semple and M. Steel. Phylogenetics, volume 24 of Oxford lecture series in mathematics and its applications. Oxford University Press, 2003. 26. M. Steel. Reconstructing evolutionary trees under a variety of Markov-style models. Proceedings of Phylogeny Workshop 95-48, DIMACS, Princeton University, 1995. 51–54. 27. D. Tagu and J.-L. Risler. Bioinformatique ; Principes d’utilisation des outils. Editions Quae, Paris, 2010. 28. R. Trivedi and H. A. Nagarajaram. Substitution scoring matrices for proteins - an overview. Protein Sci., n/a(n/a), 2020. 29. N. S. Upham, J. A. Esselstyn, and W. Jetz. Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLoS Biol, 17(12):e3000494–e3000494, Dec. 2019. 30. S.-j. Wei, M. Shi, M. J. Sharkey, C. van Achterberg, and X.-x. Chen. Comparative mitogenomics of Braconidae (Insecta: Hymenoptera) and the phylogenetic utility of mitochondrial genomes with special reference to holometabolous insects. BMC Genomics, 11(1):371, June 2010.
Rome, Università Roma Tre, February 7, 2023 (many thanks to Sascha Lill and Davide Fermi)
Applied and Numerical Harmonic Analysis (105 volumes)
1. A. I. Saichev and W. A. Woyczyński: Distributions in the Physical and Engineering Sciences (ISBN: 978-0-8176-3924-2)
2. C. E. D’Attellis and E. M. Fernandez-Berdaguer: Wavelet Theory and Harmonic Analysis in Applied Sciences (ISBN: 978-0-8176-3953-2)
3. H. G. Feichtinger and T. Strohmer: Gabor Analysis and Algorithms (ISBN: 978-0-8176-3959-4)
4. R. Tolimieri and M. An: Time-Frequency Representations (ISBN: 978-0-8176-3918-1)
5. T. M. Peters and J. C. Williams: The Fourier Transform in Biomedical Engineering (ISBN: 978-0-8176-3941-9)
6. G. T. Herman: Geometry of Digital Spaces (ISBN: 978-0-8176-3897-9)
7. A. Teolis: Computational Signal Processing with Wavelets (ISBN: 978-0-8176-3909-9)
8. J. Ramanathan: Methods of Applied Fourier Analysis (ISBN: 978-0-8176-3963-1)
9. J. M. Cooper: Introduction to Partial Differential Equations with MATLAB (ISBN: 978-0-8176-3967-9)
10. Procházka, N. G. Kingsbury, P. J. Payner, and J. Uhlir: Signal Analysis and Prediction (ISBN: 978-0-8176-4042-2)
11. W. Bray and C. Stanojevic: Analysis of Divergence (ISBN: 978-1-4612-7467-4)
12. G. T. Herman and A. Kuba: Discrete Tomography (ISBN: 978-0-8176-4101-6)
13. K. Gröchenig: Foundations of Time-Frequency Analysis (ISBN: 978-0-8176-4022-4)
14. L. Debnath: Wavelet Transforms and Time-Frequency Signal Analysis (ISBN: 978-0-8176-4104-7)
15. J. J. Benedetto and P. J. S. G. Ferreira: Modern Sampling Theory (ISBN: 978-0-8176-4023-1)
16. D. F. Walnut: An Introduction to Wavelet Analysis (ISBN: 978-0-8176-3962-4)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
P. Flandrin et al. (eds.), Theoretical Physics, Wavelets, Analysis, Genomics, Applied and Numerical Harmonic Analysis, https://doi.org/10.1007/978-3-030-45847-8
17. A. Abbate, C. DeCusatis, and P. K. Das: Wavelets and Subbands (ISBN: 978-0-8176-4136-8)
18. O. Bratteli, P. Jorgensen, and B. Treadway: Wavelets Through a Looking Glass (ISBN: 978-0-8176-4280-8)
19. H. G. Feichtinger and T. Strohmer: Advances in Gabor Analysis (ISBN: 978-0-8176-4239-6)
20. O. Christensen: An Introduction to Frames and Riesz Bases (ISBN: 978-0-8176-4295-2)
21. L. Debnath: Wavelets and Signal Processing (ISBN: 978-0-8176-4235-8)
22. G. Bi and Y. Zeng: Transforms and Fast Algorithms for Signal Analysis and Representations (ISBN: 978-0-8176-4279-2)
23. J. H. Davis: Methods of Applied Mathematics with a MATLAB Overview (ISBN: 978-0-8176-4331-7)
24. J. J. Benedetto and A. I. Zayed: Sampling, Wavelets, and Tomography (ISBN: 978-0-8176-4304-1)
25. E. Prestini: The Evolution of Applied Harmonic Analysis (ISBN: 978-0-8176-4125-2)
26. L. Brandolini, L. Colzani, A. Iosevich, and G. Travaglini: Fourier Analysis and Convexity (ISBN: 978-0-8176-3263-2)
27. W. Freeden and V. Michel: Multiscale Potential Theory (ISBN: 978-0-8176-4105-4)
28. O. Christensen and K. L. Christensen: Approximation Theory (ISBN: 978-0-8176-3600-5)
29. O. Calin and D.-C. Chang: Geometric Mechanics on Riemannian Manifolds (ISBN: 978-0-8176-4354-6)
30. J. A. Hogan: Time-Frequency and Time-Scale Methods (ISBN: 978-0-8176-4276-1)
31. C. Heil: Harmonic Analysis and Applications (ISBN: 978-0-8176-3778-1)
32. K. Borre, D. M. Akos, N. Bertelsen, P. Rinder, and S. H. Jensen: A Software-Defined GPS and Galileo Receiver (ISBN: 978-0-8176-4390-4)
33. T. Qian, M. I. Vai, and Y. Xu: Wavelet Analysis and Applications (ISBN: 978-3-7643-7777-9)
34. G. T. Herman and A. Kuba: Advances in Discrete Tomography and Its Applications (ISBN: 978-0-8176-3614-2)
35. M. C. Fu, R. A. Jarrow, J.-Y. Yen, and R. J. Elliott: Advances in Mathematical Finance (ISBN: 978-0-8176-4544-1)
36. O. Christensen: Frames and Bases (ISBN: 978-0-8176-4677-6)
37. P. E. T. Jorgensen, J. D. Merrill, and J. A. Packer: Representations, Wavelets, and Frames (ISBN: 978-0-8176-4682-0)
38. M. An, A. K. Brodzik, and R. Tolimieri: Ideal Sequence Design in Time-Frequency Space (ISBN: 978-0-8176-4737-7)
39. S. G. Krantz: Explorations in Harmonic Analysis (ISBN: 978-0-8176-4668-4)
40. B. Luong: Fourier Analysis on Finite Abelian Groups (ISBN: 978-0-8176-4915-9)
41. G. S. Chirikjian: Stochastic Models, Information Theory, and Lie Groups, Volume 1 (ISBN: 978-0-8176-4802-2)
42. C. Cabrelli and J. L. Torrea: Recent Developments in Real and Harmonic Analysis (ISBN: 978-0-8176-4531-1)
43. M. V. Wickerhauser: Mathematics for Multimedia (ISBN: 978-0-8176-4879-4)
44. B. Forster, P. Massopust, O. Christensen, K. Gröchenig, D. Labate, P. Vandergheynst, G. Weiss, and Y. Wiaux: Four Short Courses on Harmonic Analysis (ISBN: 978-0-8176-4890-9)
45. O. Christensen: Functions, Spaces, and Expansions (ISBN: 978-0-8176-4979-1)
46. J. Barral and S. Seuret: Recent Developments in Fractals and Related Fields (ISBN: 978-0-8176-4887-9)
47. O. Calin, D.-C. Chang, K. Furutani, and C. Iwasaki: Heat Kernels for Elliptic and Sub-elliptic Operators (ISBN: 978-0-8176-4994-4)
48. C. Heil: A Basis Theory Primer (ISBN: 978-0-8176-4686-8)
49. J. R. Klauder: A Modern Approach to Functional Integration (ISBN: 978-0-8176-4790-2)
50. J. Cohen and A. I. Zayed: Wavelets and Multiscale Analysis (ISBN: 978-0-8176-8094-7)
51. D. Joyner and J.-L. Kim: Selected Unsolved Problems in Coding Theory (ISBN: 978-0-8176-8255-2)
52. G. S. Chirikjian: Stochastic Models, Information Theory, and Lie Groups, Volume 2 (ISBN: 978-0-8176-4943-2)
53. J. A. Hogan and J. D. Lakey: Duration and Bandwidth Limiting (ISBN: 978-0-8176-8306-1)
54. G. Kutyniok and D. Labate: Shearlets (ISBN: 978-0-8176-8315-3)
55. P. G. Casazza and G. Kutyniok: Finite Frames (ISBN: 978-0-8176-8372-6)
56. V. Michel: Lectures on Constructive Approximation (ISBN: 978-0-8176-8402-0)
57. D. Mitrea, I. Mitrea, M. Mitrea, and S. Monniaux: Groupoid Metrization Theory (ISBN: 978-0-8176-8396-2)
58. T. D. Andrews, R. Balan, J. J. Benedetto, W. Czaja, and K. A. Okoudjou: Excursions in Harmonic Analysis, Volume 1 (ISBN: 978-0-8176-8375-7)
59. T. D. Andrews, R. Balan, J. J. Benedetto, W. Czaja, and K. A. Okoudjou: Excursions in Harmonic Analysis, Volume 2 (ISBN: 978-0-8176-8378-8)
60. D. V. Cruz-Uribe and A. Fiorenza: Variable Lebesgue Spaces (ISBN: 978-3-0348-0547-6)
61. W. Freeden and M. Gutting: Special Functions of Mathematical (Geo-)Physics (ISBN: 978-3-0348-0562-9)
62. A. I. Saichev and W. A. Woyczyński: Distributions in the Physical and Engineering Sciences, Volume 2: Linear and Nonlinear Dynamics of Continuous Media (ISBN: 978-0-8176-3942-6)
63. S. Foucart and H. Rauhut: A Mathematical Introduction to Compressive Sensing (ISBN: 978-0-8176-4947-0)
64. G. T. Herman and J. Frank: Computational Methods for Three-Dimensional Microscopy Reconstruction (ISBN: 978-1-4614-9520-8)
65. A. Paprotny and M. Thess: Realtime Data Mining: Self-Learning Techniques for Recommendation Engines (ISBN: 978-3-319-01320-6)
66. A. I. Zayed and G. Schmeisser: New Perspectives on Approximation and Sampling Theory: Festschrift in Honor of Paul Butzer’s 85th Birthday (ISBN: 978-3-319-08800-6)
67. R. Balan, M. Begue, J. Benedetto, W. Czaja, and K. A. Okoudjou: Excursions in Harmonic Analysis, Volume 3 (ISBN: 978-3-319-13229-7)
68. H. Boche, R. Calderbank, G. Kutyniok, and J. Vybiral: Compressed Sensing and its Applications (ISBN: 978-3-319-16041-2)
69. S. Dahlke, F. De Mari, P. Grohs, and D. Labate: Harmonic and Applied Analysis: From Groups to Signals (ISBN: 978-3-319-18862-1)
70. A. Aldroubi: New Trends in Applied Harmonic Analysis (ISBN: 978-3-319-27871-1)
71. M. Ruzhansky: Methods of Fourier Analysis and Approximation Theory (ISBN: 978-3-319-27465-2)
72. G. Pfander: Sampling Theory, a Renaissance (ISBN: 978-3-319-19748-7)
73. R. Balan, M. Begue, J. Benedetto, W. Czaja, and K. A. Okoudjou: Excursions in Harmonic Analysis, Volume 4 (ISBN: 978-3-319-20187-0)
74. O. Christensen: An Introduction to Frames and Riesz Bases, Second Edition (ISBN: 978-3-319-25611-5)
75. E. Prestini: The Evolution of Applied Harmonic Analysis: Models of the Real World, Second Edition (ISBN: 978-1-4899-7987-2)
76. J. H. Davis: Methods of Applied Mathematics with a Software Overview, Second Edition (ISBN: 978-3-319-43369-1)
77. M. Gilman, E. M. Smith, and S. M. Tsynkov: Transionospheric Synthetic Aperture Imaging (ISBN: 978-3-319-52125-1)
78. S. Chanillo, B. Franchi, G. Lu, C. Perez, and E. T. Sawyer: Harmonic Analysis, Partial Differential Equations and Applications (ISBN: 978-3-319-52741-3)
79. R. Balan, J. Benedetto, W. Czaja, M. Dellatorre, and K. A. Okoudjou: Excursions in Harmonic Analysis, Volume 5 (ISBN: 978-3-319-54710-7)
80. I. Pesenson, Q. T. Le Gia, A. Mayeli, H. Mhaskar, and D. X. Zhou: Frames and Other Bases in Abstract and Function Spaces: Novel Methods in Harmonic Analysis, Volume 1 (ISBN: 978-3-319-55549-2)
81. I. Pesenson, Q. T. Le Gia, A. Mayeli, H. Mhaskar, and D. X. Zhou: Recent Applications of Harmonic Analysis to Function Spaces, Differential Equations, and Data Science: Novel Methods in Harmonic Analysis, Volume 2 (ISBN: 978-3-319-55555-3)
82. F. Weisz: Convergence and Summability of Fourier Transforms and Hardy Spaces (ISBN: 978-3-319-56813-3)
83. C. Heil: Metrics, Norms, Inner Products, and Operator Theory (ISBN: 978-3-319-65321-1)
84. S. Waldron: An Introduction to Finite Tight Frames: Theory and Applications (ISBN: 978-0-8176-4814-5)
85. D. Joyner and C. G. Melles: Adventures in Graph Theory: A Bridge to Advanced Mathematics (ISBN: 978-3-319-68381-2)
86. B. Han: Framelets and Wavelets: Algorithms, Analysis, and Applications (ISBN: 978-3-319-68529-8)
87. H. Boche, G. Caire, R. Calderbank, M. März, G. Kutyniok, and R. Mathar: Compressed Sensing and Its Applications (ISBN: 978-3-319-69801-4)
88. A. I. Saichev and W. A. Woyczyński: Distributions in the Physical and Engineering Sciences, Volume 3: Random and Fractal Signals and Fields (ISBN: 978-3-319-92584-4)
89. G. Plonka, D. Potts, G. Steidl, and M. Tasche: Numerical Fourier Analysis (ISBN: 978-3-030-04305-6)
90. K. Bredies and D. Lorenz: Mathematical Image Processing (ISBN: 978-3-030-01457-5)
91. H. G. Feichtinger, P. Boggiatto, E. Cordero, M. de Gosson, F. Nicola, A. Oliaro, and A. Tabacco: Landscapes of Time-Frequency Analysis (ISBN: 978-3-030-05209-6)
92. E. Liflyand: Functions of Bounded Variation and Their Fourier Transforms (ISBN: 978-3-030-04428-2)
93. R. Campos: The XFT Quadrature in Discrete Fourier Analysis (ISBN: 978-3-030-13422-8)
94. M. Abell, E. Iacob, A. Stokolos, S. Taylor, S. Tikhonov, J. Zhu: Topics in Classical and Modern Analysis: In Memory of Yingkang Hu (ISBN: 978-3-030-12276-8)
95. H. Boche, G. Caire, R. Calderbank, G. Kutyniok, R. Mathar, P. Petersen: Compressed Sensing and its Applications: Third International MATHEON Conference 2017 (ISBN: 978-3-319-73073-8)
96. A. Aldroubi, C. Cabrelli, S. Jaffard, U. Molter: New Trends in Applied Harmonic Analysis, Volume II: Harmonic Analysis, Geometric Measure Theory, and Applications (ISBN: 978-3-030-32352-3)
97. S. Dos Santos, M. Maslouhi, K. Okoudjou: Recent Advances in Mathematics and Technology: Proceedings of the First International Conference on Technology, Engineering, and Mathematics, Kenitra, Morocco, March 26-27, 2018 (ISBN: 978-3-030-35201-1)
98. Á. Bényi, K. Okoudjou: Modulation Spaces: With Applications to Pseudodifferential Operators and Nonlinear Schrödinger Equations (ISBN: 978-1-0716-0330-7)
99. P. Boggiatto, M. Cappiello, E. Cordero, S. Coriasco, G. Garello, A. Oliaro, J. Seiler: Advances in Microlocal and Time-Frequency Analysis (ISBN: 978-3-030-36137-2)
100. S. Casey, K. Okoudjou, M. Robinson, B. Sadler: Sampling: Theory and Applications (ISBN: 978-3-030-36290-4)
101. P. Boggiatto, T. Bruno, E. Cordero, H. G. Feichtinger, F. Nicola, A. Oliaro, A. Tabacco, M. Vallarino: Landscapes of Time-Frequency Analysis: ATFA 2019 (ISBN: 978-3-030-56004-1)
102. M. Hirn, S. Li, K. Okoudjou, S. Saliana, Ö. Yilmaz: Excursions in Harmonic Analysis, Volume 6: In Honor of John Benedetto’s 80th Birthday (ISBN: 978-3-030-69636-8)
103. F. De Mari, E. De Vito: Harmonic and Applied Analysis: From Radon Transforms to Machine Learning (ISBN: 978-3-030-86663-1)
104. G. Kutyniok, H. Rauhut, R. J. Kunsch: Compressed Sensing in Information Processing (ISBN: 978-3-031-09744-7)
105. P. Flandrin, S. Jaffard, T. Paul, B. Torrésani: Theoretical Physics, Wavelets, Analysis, Genomics: An Indisciplinary Tribute to Alex Grossmann (ISBN: 978-3-030-45846-1)

For an up-to-date list of ANHA titles, please visit http://www.springer.com/series/4968