English Pages 790 [824] Year 2013
European Congress of Mathematics Kraków, 2 – 7 July, 2012
Rafał Latała, Andrzej Ruciński, Paweł Strzelecki, Jacek Świątkowski, Dariusz Wrzosek and Piotr Zakrzewski, Editors
The European Congress of Mathematics, held every four years, has become a well-established major international mathematical event. Following those in Paris (1992), Budapest (1996), Barcelona (2000), Stockholm (2004) and Amsterdam (2008), the Sixth European Congress of Mathematics (6ECM) took place in Kraków, Poland, July 2–7, 2012, with about 1000 participants from all over the world. Ten plenary lectures, thirty-three invited lectures and three special lectures formed the core of the program. As at all the previous EMS congresses, ten outstanding young mathematicians received the EMS prizes in recognition of their research achievements. In addition, two more prizes were awarded: the Felix Klein Prize for a remarkable solution of an industrial problem, and – for the first time – the Otto Neugebauer Prize for a highly original and influential piece of work in the history of mathematics. The program was complemented by twenty-four minisymposia with nearly 100 talks, spread over all areas of mathematics. Six panel discussions were organized, covering a variety of issues ranging from the financing of mathematical research to gender imbalance in mathematics.
www.ems-ph.org
ISBN 978-3-03719-120-0
These proceedings present extended versions of most of the invited talks delivered during the congress, providing a permanent record of the best that mathematics offers today.
Editors: Rafał Latała Institute of Mathematics University of Warsaw Banacha 2 02-097 Warsaw, Poland
Jacek Świątkowski Mathematical Institute University of Wrocław Pl. Grunwaldzki 2/4 50-384 Wrocław, Poland
E-mail: [email protected]
E-mail: [email protected]
Andrzej Ruciński Faculty of Mathematics and Computer Science A. Mickiewicz University Umultowska 67 61-614 Poznań, Poland
Dariusz Wrzosek Institute of Applied Mathematics and Mechanics University of Warsaw Banacha 2 02-097 Warsaw, Poland
E-mail: [email protected]
E-mail: [email protected]
Paweł Strzelecki Institute of Mathematics University of Warsaw Banacha 2 02-097 Warsaw, Poland
Piotr Zakrzewski Institute of Mathematics University of Warsaw Banacha 2 02-097 Warsaw, Poland
E-mail: [email protected]
E-mail: [email protected]
2010 Mathematics Subject Classification: 00Bxx
ISBN 978-3-03719-120-0 The Swiss National Library lists this publication in The Swiss Book, the Swiss national bibliography, and the detailed bibliographic data are available on the Internet at http://www.helveticat.ch. This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained. © European Mathematical Society 2013
Contact address:
European Mathematical Society Publishing House Seminar for Applied Mathematics ETH-Zentrum SEW A27 CH-8092 Zürich Switzerland
Phone: +41 (0)44 632 34 36 Email: [email protected] Homepage: www.ems-ph.org
Typeset using the authors’ TeX files: Leszek Pieniążek, Kraków
Background picture of St. Mary’s church in Kraków courtesy of Jan Mehlich
Printing and binding: Beltz Bad Langensalza GmbH, Bad Langensalza, Germany
Printed on acid-free paper
Preface The Sixth European Congress of Mathematics (6 ECM) was held from July 2nd till July 7th, 2012 at the Auditorium Maximum of the Jagiellonian University in Kraków. It was organized by the Polish Mathematical Society (Polskie Towarzystwo Matematyczne, PTM) in collaboration with the Jagiellonian University (UJ), under the auspices of the European Mathematical Society (EMS). Previous European Congresses of Mathematics were held in Paris (1992), Budapest (1996), Barcelona (2000), Stockholm (2004) and Amsterdam (2008). As at all the previous EMS congresses, ten young researchers selected by the Prize Committee nominated by the EMS received the EMS prizes in recognition of outstanding research accomplishments. Twenty years after the 1 ECM in Paris (1992), these prizes are considered to be one of the most prestigious awards for young talented mathematicians. A glance at the lists of all EMS prize winners and of the Fields Medal laureates confirms that view. In addition, the Felix Klein Prize was awarded for a third time, jointly by the EMS and the Institute for Industrial Mathematics in Kaiserslautern, for a remarkable solution of an industrial problem. Finally, for the first time the Otto Neugebauer Prize was awarded for a highly original and influential piece of work in the history of mathematics. About 1000 mathematicians attended the congress and took part in the activities that consisted of 10 plenary lectures, 33 invited lectures, 3 special lectures and 11 lectures by the prize winners, complemented by 6 panel discussions and 24 minisymposia with 94 talks. 179 posters were presented, and 11 of them were awarded prizes, funded by the publishers presenting their products during the congress. Moreover, at the nearby AGH University of Science and Technology, 15 satellite thematic sessions were held with 155 talks. In total, almost half of the participants presented their results in some form. 
This variety of opportunities to present helped to increase participation in the 6 ECM. Finally, 29 satellite conferences of the 6 ECM were organized.
These proceedings present extended versions of most of the invited talks which were delivered during the congress, or in one case submitted to the proceedings by a plenary speaker who could not attend. A volume such as this one is always a specific snapshot, possibly somewhat biased but still worthwhile to look at, of the current state of mathematics; it tries to capture and show the most fashionable trends, crucial achievements, emerging new research directions, and – last but not least – the research leaders who shape the field.
The organizers of the 6 ECM and the editors of the proceedings thank all the authors who made an effort to prepare papers for this volume; we all know that the free time necessary to complete such a task is a rare and precious commodity, to be found amid a never-ending heap of research, teaching, administrative and personal duties. We are grateful for all the support we have received. All errors in this book are ours.
On behalf of the editors
Paweł Strzelecki
Contents

Preface
6 ECM Committees
6 ECM Sponsors
Opening Ceremony
The Prize Winners
List of events
Closing Ceremony
Participants, lectures and speakers

Plenary Lectures

Adrian Constantin
  Some mathematical aspects of water waves
Camillo De Lellis, László Székelyhidi
  Continuous dissipative Euler flows and a conjecture of Onsager
Herbert Edelsbrunner, Dmitriy Morozov
  Persistent homology: theory and practice
Mikhail Gromov
  In a search for a structure, Part 1: On entropy
Christopher Hacon
  Classification of algebraic varieties
Alexander Braverman, David Kazhdan
  Representations of affine Kac–Moody groups over local and global fields: a survey of some recent results
Sylvia Serfaty
  Emergence of the Abrikosov lattice in several models with two dimensional Coulomb interaction
Saharon Shelah
  Dependent classes, E72
Michel Talagrand
  Chaining and the geometry of stochastic processes

Invited Lectures

Anton Alekseev
  Duflo isomorphism, the Kashiwara–Vergne conjecture and Drinfeld associators
Jean Bertoin
  Coagulation with limited aggregations
Serge Cantat
  The Cremona group in two variables
Vicent Caselles
  Variational models for image inpainting
Alessandra Celletti
  KAM theory and its applications: from conservative to dissipative systems
Pierre Colmez
  Le programme de Langlands p-adique
Tom Coates, Alessio Corti, Sergey Galkin, Vasily Golyshev, Alexander Kasprzyk
  Mirror symmetry and Fano manifolds
Hélène Esnault
  On flat bundles in characteristic 0 and p > 0
Alexander A. Gaifullin
  Combinatorial realisation of cycles and small covers
Isabelle Gallagher
  Remarks on the global regularity for solutions to the incompressible Navier–Stokes equations
Olle Häggström
  Why the empirical sciences need statistics so desperately
Arieh Iserles
  Computing the Schrödinger equation with no fear of commutators
Alexander S. Kechris
  Dynamics of non-archimedean Polish groups
Bernhard Keller
  Cluster algebras and cluster monomials
Sławomir Kołodziej
  Weak solutions to the complex Monge–Ampère equation
Gady Kozma
  Reinforced random walk
Frank Merle
  On blow-up curves for semilinear wave equations
Andrey E. Mironov
  Commuting higher rank ordinary differential operators
David Nualart
  Stochastic calculus with respect to the fractional Brownian motion
Alexander Olevskii
  Sampling, interpolation, translates
Leonid Parnovski
  Multidimensional periodic and almost-periodic spectral problems
Benjamin Schlein
  Effective equations for quantum dynamics
Piotr Śniady
  Combinatorics of asymptotic representation theory
Hao Jia, Vladimír Sveřák
  On scale-invariant solutions of the Navier–Stokes equations
Stevo Todorčević
  Ramsey-theoretic analysis of the conditional structure of weakly-null sequences

Prize Winners’ Lectures

Simon Brendle
  Uniqueness results for minimal surfaces and constant mean curvature surfaces in Riemannian manifolds
Alessio Figalli
  Stability in geometric and functional inequalities
Adrian Ioana
  Classification and rigidity for von Neumann algebras
Mathieu Lewin
  A nonlinear variational problem in relativistic quantum mechanics
Ciprian Manolescu
  Grid diagrams in Heegaard Floer theory
Gregory Miermont
  Random maps and continuum random 2-dimensional geometries
Tom Sanders
  Approximate (Abelian) groups
Corinna Ulcigrai
  Shearing and mixing in parabolic flows
Emmanuel Trelat
  Optimal control theory and some applications to aerospace problems
Jan P. Hogendijk
  Mathematics and geometric ornamentation in the medieval Islamic world

Special Lectures

José Francisco Rodrigues
  Some mathematical aspects of the Planet Earth
Philip Welch
  Turing’s mathematical work
Artur Siemaszko, Maciej P. Wojtkowski
  Counting Berg partitions via Sturmian words and substitution tilings
Andrzej Pelczar (1937–2010), the initiator and original organizer of the 6 ECM. Photo: © Konrad K. Pollesch
6 ECM Committees
Honorary Patronage
President of the Republic of Poland Bronisław Komorowski
Honorary Committee Minister of Science and Higher Education Barbara Kudrycka Voivode of Małopolska Voivodship Jerzy Miller Marshal of Małopolska Voivodship Marek Sowa Mayor of Kraków Jacek Majchrowski
Scientific Committee Eduard Feireisl (Chair), Joan Bagaria, Brian Davies, Corrado De Concini, Gerhard Frey, Sara van de Geer, Sabir Gusein-Zade, Helge Holden, Philip K. Maini, Marian Mrozek, Felix Otto, Jesus Sanz-Serna, Jan H. van Schuppen, Misha Sodin, Claire Voisin
EMS Prize Committee Frances Kirwan (Chair), Eva Bayer Flückiger, Alfredo Bermúdez de Castro, Jørgen Ellegaard Andersen, Maria Esteban, Olle Häggström, Alex Lubotzky, Wolfgang Lück, Tomasz Łuczak, Ieke Moerdijk, Sergey Novikov, Alfio Quarteroni, Karl Sigmund, Erling Størmer, Alain-Sol Sznitman
Felix Klein Prize Committee Wil H.A. Schilders (Chair), Yvon Maday, Antonio Fasano, Axel Klar, Helmut Neunzert, Hilary Ockendon
Otto Neugebauer Prize Committee Jeremy Gray (Chair), Lennart Berggren, Jesper Lützen, Jeanne Peiffer, Catriona Byrne
Organisers
Executive Organising Committee
Stefan Jackowski (UW∗), Chair, President of PTM†
Zbigniew Błocki (UJ‡), Vice-Chair
Krystyna Jaworska (WAT§), Secretary of PTM
Wacław Marzantowicz (UAM¶), Vice-President of PTM
Piotr Tworzewski (UJ), Vice-Rector of UJ
Robert Wolak (UJ)
Secretaries: Andrzej Grzesik (UJ), Anna Grzesik, Piotr Niemiec (UJ)
Programme Coordinator: Witold Majdak (AGH‖)
Software development & Webmasters: Janusz Meissner, Joanna Meissner (AGH), Paweł Witkowski (UW, Intools)
Exhibition & Promotion Coordinator: Anna Kula (UJ)
Grant Committee: Robert Wolak (UJ) – Chair, Sławomir Rams (UJ), Andrzej Biś (UŁ∗∗), Jerzy Jaworski (UAM)
Poster Committee: Robert Wolak (UJ) – Chair, Sylwia Barnaś (PK††) – Secretary, Kamil Rusek (UJ), Sławomir Rams (UJ)
Tourist programme: Liliana Klimczak (UJ), Katarzyna Kos, Magdalena Nowak (UJ), Wojciech Słomczyński (UJ)
Other Organisers: Krzysztof Ciesielski (UJ), Krzysztof Deszyński (UJ), Armen Edigarian (UJ), Łucja Farnik (UJ), Katarzyna Gizicka (UJ), Tomasz Lenarcik (UJ), Anna Pelczar-Barwacz (UJ), Przemysław Rola (UJ), Jerzy Stochel (AGH), Dagmara Waszkiewicz (UJ), Małgorzata Zajęcka (UJ, PK)
The opening day. Photos: Ada Pałka
∗ UW = University of Warsaw
† PTM = Polish Mathematical Society
‡ UJ = Jagiellonian University in Kraków
§ WAT = Military University of Technology, Warszawa
¶ UAM = Adam Mickiewicz University in Poznań
‖ AGH = AGH University of Science and Technology, Kraków
∗∗ UŁ = University of Łódź
†† PK = Tadeusz Kościuszko Cracow University of Technology
Sponsors Ministerstwo Nauki i Szkolnictwa Wyższego Ministry of Science and Higher Education
Fundacja na rzecz Nauki Polskiej Foundation for Polish Science
European Mathematical Society
Województwo Małopolskie Małopolska Region
Uniwersytet Warszawski University of Warsaw
Akademia Górniczo–Hutnicza im. St. Staszica AGH University of Science and Technology
Politechnika Krakowska im. Tadeusza Kościuszki Tadeusz Kościuszko Cracow University of Technology
Ericpol Sp. z o.o.
Opening Ceremony
The opening ceremony of the 6th European Congress of Mathematics, presided over by Piotr Krasnowolski, was held on Monday, July 2 in the Auditorium Maximum of the Jagiellonian University in the heart of Kraków’s Old Town. After a brief welcome address by the Rector of the Jagiellonian University, Professor Karol Musioł, three speeches were delivered: by Professor Barbara Kudrycka, Minister of Science and Higher Education of the Republic of Poland; by Professor Marta Sanz-Solé, President of the European Mathematical Society; and by Professor Stefan Jackowski, President of the Polish Mathematical Society and Chair of the Executive Organizing Committee of the 6th ECM.
Speech of Minister Barbara Kudrycka
We are meeting today in Kraków, a magical city in Poland. Magical, because it combines what is the most beautiful of Poland’s historical heritage with the most modern science and technology. We are meeting at the Jagiellonian University – one of the oldest universities in Europe with a long and distinguished scientific tradition. Today Kraków is one of the most attractive places in Europe for international investors in terms of BOP‡‡ factors, thanks to the high qualifications of its residents. Kraków has recently become the seat of the National Science Centre that finances fundamental research, and which, by the way, is headed by a mathematician – Professor Michał Karoński, the Chairman of the NCN Council.
Kraków is a very special place for Polish mathematics. In 1916, whilst strolling through the Planty Park, Dr. Hugo Steinhaus met Stefan Banach – at that time a young student. Steinhaus heard the young man discussing mathematics with a friend. He interrupted his conversation and gave Banach his first academic job. Today, nearly half of all competitions announced by the National Science Centre are addressed to very young scientists.
Polish mathematicians of today continue Banach’s great tradition. The prestigious European Mathematical Society Prize was awarded to the Polish mathematicians Tomasz Łuczak (in 1992) and Agata Smoktunowicz (in 2008), both of whom are present at this congress. Young university students, and even high school students, have also achieved success in this field. Last year, the Third Prize at the 23rd EU Contest for Young Scientists organized by the European Commission was awarded to Michał Miśkiewicz, a young mathematician from Warsaw. Research work carried out by individuals who were not yet even university students also qualified for the competition. Students of the Jagiellonian University turned out to be the best at the International Mathematics Competition for University Students.
When we speak about mathematics we must remember that mathematics is the most fundamental of fundamental sciences. In recent years Poland has greatly increased its spending on basic research, largely as a result of the establishment of the National Science Centre, whose budget (in PLN) amounted to 471 million in 2011, while in 2012 it will reach 858 million. The Polish business community is becoming increasingly aware of the importance of research in mathematics. Ericpol – Poland’s largest software exporter – can serve as an example, as it funds the Stefan Banach Prize for doctoral dissertations in mathematics.
Poland is very aware of the importance of science education, which of course includes mathematics. The government has provided considerable support for science studies through the “priority studies” programme, which allocates additional grants for science and technology students. As a result, last year the number of candidates applying for technical studies exceeded the number of candidates for non-technical disciplines for the first time in many decades. The government is aware of the special role of mathematical abilities in the labour market. That is why in 2010 mathematics was reintroduced as a compulsory subject at secondary school final exams.
The recently initiated process of selecting Leading National Research Centres in Poland will play an important role in the reform of science and education as a whole. The Leading Centres will attract people of the highest research and teaching ability, bringing together outstanding scientists, doctoral students and undergraduates. This month the Leading Centre in mathematics will be selected from applying institutions, given the central role of mathematics in all sciences. For the same reason, of the many scientific congresses that take place in Europe every year, I have no doubt that this one – the scientific congress of mathematics – is the most important.
Ladies and Gentlemen, people of science are travellers. Thank you for coming to Kraków these days. I am sure that you will have a very good and fruitful scientific discussion here and that it will be an interesting experience.
‡‡ BOP = balance of payments.
Speech of EMS President Marta Sanz-Solé
Rector Magnificus; Minister of Science and Higher Education of the Republic of Poland; distinguished guests, Ladies and Gentlemen!
It is my privilege to welcome you all to the 6th ECM. This is one of the largest events in mathematics in the world and the most important scientific activity of the European Mathematical Society. We express our sincere thanks to the Jagiellonian University for hosting the Congress and for its generous support. We also thank the distinguished guests. With your presence, you are showing much appreciated interest in and support for mathematics.
The invitation to Kraków was made by an honourable member of this university, the former Professor and Rector, and also former Vice-President of the European Mathematical Society, Andrzej Pelczar. Let me take this opportunity to honour his memory and to pay tribute to his devoted work for the Society.
Mathematics has a strong tradition of volunteer work. Running mathematical societies, organizing scientific events, publishing journals and books, and organizing activities to attract talented young students are among the very many examples. Poland, with its longstanding and solid mathematical tradition, and outstanding mathematicians, has been among the most generous in this respect. Let me mention a very few but illustrative cases:
– In 1929, only ten years after its foundation, the Polish Mathematical Society organized the First Congress of Mathematicians of the Slavic countries.
– Poland was the organizer of the International Congress of Mathematicians (ICM) in August 1983. To put this event into better context, let us recall that between December 1981 and July 1983, this country was under martial law, in an attempt to crush political opposition. These were extremely difficult times for most of the citizens of this country.
– Mathematics institutions in Poland, and in particular the Banach Centre, have been instrumental in providing conditions for interaction and collaboration of mathematicians across Europe. This has been extremely valuable, especially for those coming from East European countries in a period when crossing borders was extremely difficult if not impossible.
– The last example is of special significance for the history of the EMS, since it constitutes its public debut. Our Society was founded on the 28th October 1990, in a residence of the Polish Academy of Sciences in Mądralin (near Warsaw). Bogdan Bojarski, on behalf of the Polish Academy of Sciences, and Andrzej Pelczar, President of the Polish Mathematical Society, were the hosts of this important event.
We are just about to enjoy a great feast of mathematics in Europe. This is made possible thanks to the devoted efforts of very many people and institutions who deserve our gratitude. Let me mention them:
– the members of the Scientific Committee for their excellent work in putting together the program of lectures;
– the members of the three prize committees – the EMS Prize, the Felix Klein Prize and the Otto Neugebauer Prize – for their difficult task in selecting the awardees among a large number of remarkable nominations;
– the Organizing Committee. Thanks to its tremendous and brilliant work, we will all be able to savour an unforgettable event. This is yet another example of generous service to mathematics of the Polish mathematical community;
– the sponsors of the Congress: all the funding agencies, universities from Kraków, Warsaw and other cities, and private and public organisations;
– the sponsors of the Prizes: Foundation Compositio Mathematica, the Institute for Industrial Mathematics in Kaiserslautern and Springer Verlag.
Why ECMs? Like many other disciplines, mathematics has reached a degree of extreme specialization. Nevertheless, there remains a need for keeping its unity as a scientific discipline, for resisting fragmentation, and for maintaining and even increasing fluid communication between its domains. A holistic structure will better contribute to genuine progress of scientific knowledge.
As for other theoretical or experimental areas, scientific, social or humanistic, the most significant mathematical advances and breakthroughs involve a complex and sophisticated combination of ingredients, expertise and techniques from different fields. By keeping our minds wide open, and nurturing the desire of exploring beyond the boundaries of one’s specific research speciality, we will have a better chance to be at the forefront of the scientific advances in our discipline. Events like the European Congresses of Mathematics provide a very suitable stage and good conditions for these practices.
An ECM is a forum for sharing mathematical knowledge and experience with mathematicians interested in different subjects, including those at the crossroads of the discipline. It is also a forum for discussion of many aspects of the profession, a place for networking and for establishing bonds of solidarity, for becoming more aware of the importance of mathematics for the world, for feeling the need of coming closer to society, explaining the usefulness of mathematics to the public.
We are in an ancient and beautiful city of Europe, located in a splendid region and full of historical monuments. Those who enjoy nature and landscape will have the possibility to navigate along the Vistula River, or to hike in the Tatra mountains. If you would prefer peace and time for meditation, you will find shelter in the omnipresent magnificent Krakovian churches. And on the streets, be surprised! You will see that mathematics is the cultural protagonist in the city throughout this week.
On behalf of the European Mathematical Society, I would like to thank all those who helped to bring the 6 ECM to fruition, and I wish you all a rewarding and enjoyable Congress. I declare the 6 ECM open. Thank you very much.
Speech of PTM President Stefan Jackowski
Madame Minister, Rector Magnificus, Madame President of the EMS, Fellow mathematicians,
On behalf of the Polish Mathematical Society and everyone involved in the organization of the Congress, I welcome all of you to the 6th European Congress of Mathematics! I would like to extend a particular welcome to all the members of the European Mathematical Society. Let me remind you that the Society was founded in 1990 in Poland, near Warsaw. We are very pleased and honoured that this year the flagship meeting of the EMS is taking place in our country – welcome back!
Thanks are due to the Scientific Committee of the 6 ECM for selecting speakers and to the prize award committees for nomination of the prize winners. I welcome cordially the speakers and organisers of many congress activities. I offer a special welcome to our colleagues from countries which do not yet enjoy the privileges of free travel around Europe. They often had to overcome many bureaucratic and financial obstacles to come here.
I’d like to express our thanks to the sponsors of the congress – especially the EMS and the Polish Science Foundation, whose generous grants made participation possible for many young mathematicians and for mathematicians from economically challenged countries. Welcome to all our friends from other continents who are with us today. Welcome to everybody!
Mathematicians are very grateful to the President of Poland, who is the patron of our Congress, and to the members of the Honorary Committee. The presence here of the Minister of Science and Higher Education emphasizes the place of mathematics in science, and will help to raise awareness of the role of mathematics and its prestige among the general public. Thank you very much, Professor Kudrycka, for coming!
We are very grateful for the hospitality of the city of Kraków and the Małopolska region, the AGH University of Science and Technology and last but not least the Jagiellonian University – co-organizer of the Congress. I cannot speak about the Jagiellonian University without mentioning Professor Andrzej Pelczar, a former Rector of the University, a past President of the PTM and a past Vice-President of the EMS – the initiator and original organizer of the congress, who died in May 2010 at the beginning of the preparations, leaving us with the obligation to turn his dream into reality.
Professor Pelczar was a great patriot of his city. Kraków is a magical historic place for all Poles, and also for mathematicians. I would like to remind you that the Polish Mathematical Society was founded here 93 years ago, and Stefan Banach, born in Kraków, was one of its founders. In the special volume of Wiadomości Matematyczne – the journal of the Polish Mathematical Society – which participants will find in their conference bags, there is a lot of information about the history of mathematics in Poland, and in particular in Kraków.
Besides the strictly scientific programme which offers a panorama of contemporary mathematics, the congress has very important social and cultural aspects. Panel discussions will touch upon social and political issues related to mathematics. Exhibitions of old mathematical books and of art related to mathematics, as well as ‘maths busking’ in the streets of Kraków, will help to raise public awareness of mathematics. The tourist programme will provide an opportunity to learn more about Poland’s complex history, and about its contemporary affairs, with the emphasis on its broader European context. I wish all of the participants a mathematically illuminating experience at the congress, and interesting social encounters. I wish all participants and their companions an interesting and pleasant stay in Kraków. Thank you for your attention.
The Prize Winners
During the Opening Ceremony, the prize winners, their work and their achievements were presented by the chairs of the respective prize committees.† Ten EMS Prizes, the Felix Klein Prize and the Otto Neugebauer Prize were awarded.
Laureates at the Opening Ceremony
EMS Prize Winners
Simon Brendle, Stanford University, USA Born: 1981, Germany. PhD: Tübingen University. Simon Brendle has received the EMS Prize for his outstanding results on geometric partial differential equations and systems of elliptic, parabolic and hyperbolic types, which have led to breakthroughs in differential geometry including the differentiable sphere theorem, the general convergence of Yamabe flow, the compactness property for solutions of the Yamabe equation, and the Min-Oo conjecture.
† The editors use quotations from the prize committees’ statements.
Emmanuel Breuillard, Université de Paris-Sud, Orsay, France Born: 1977, France. PhD: Université Paris-Sud, joint degree from Yale University. Emmanuel Breuillard received the EMS Prize for his important and deep research in asymptotic group theory, in particular on the Tits alternative for linear groups and on the study of approximate subgroups, using a wealth of methods from very different areas of mathematics, which has already made a lasting impact on combinatorics, group theory, number theory and beyond.
Alessio Figalli, University of Texas, Austin, USA Born: 1984, Italy. PhD: Scuola Normale Superiore, Pisa, and École Normale Supérieure, Lyon. Alessio Figalli received the EMS Prize for his outstanding contributions to the regularity theory of optimal transport maps, to quantitative geometric and functional inequalities and to partial solutions of the Mather and Mañé conjectures in the theory of dynamical systems.
Adrian Ioana, University of California at San Diego, USA Born: 1981, Romania. PhD: University of California, Los Angeles. Adrian Ioana received the EMS Prize for his impressive and deep work in the field of operator algebras and their connections to ergodic theory and group theory, and in particular for solving several important open problems in deformation and rigidity theory, among them a long-standing conjecture of Connes concerning von Neumann algebras with no outer automorphisms.
Mathieu Lewin, CNRS & Université Cergy–Pontoise, France Born: 1977, France. PhD: Université Paris–Dauphine. Mathieu Lewin received the EMS Prize for his groundbreaking work in rigorous aspects of quantum chemistry, mean field approximations to relativistic quantum field theory and statistical mechanics. His research focuses on applications of variational and spectral methods to models from quantum mechanics.
Ciprian Manolescu, University of California, Los Angeles, USA Born: 1978, Romania. PhD: Harvard University. Ciprian Manolescu received the EMS Prize for his deep and highly influential work on Floer theory, successfully combining techniques from gauge theory, symplectic geometry, algebraic topology, dynamical systems and algebraic geometry to study low-dimensional manifolds, and in particular for his key role in the development of combinatorial Floer theory.
Grégory Miermont, Université de Paris-Sud, Orsay, France Born: 1979, France. PhD: École Normale Supérieure, Paris. Grégory Miermont received the EMS Prize for his outstanding work on scaling limits of random structures such as trees and random planar maps, and his highly innovative insight in the treatment of random metrics.
Sophie Morel, Harvard University, USA Born: 1979, France. PhD: Université de Paris-Sud, Orsay.
Sophie Morel’s research focuses on number theory, algebraic geometry, and group representation theory. She has obtained the EMS Prize for her deep and original work in arithmetic geometry and automorphic forms, in particular for her study of Shimura varieties, bringing new and unexpected ideas to this field. (Sophie Morel did not attend the 6 ECM.)
Tom Sanders, University of Oxford, United Kingdom Born: 1981, UK. PhD: Cambridge University. Tom Sanders received the EMS Prize for his fundamental results in additive combinatorics and harmonic analysis, which combine in a masterful way deep known techniques with the invention of new methods to achieve spectacular applications.
Corinna Ulcigrai, University of Bristol, United Kingdom Born: 1980, Italy. PhD: Princeton University.
Corinna Ulcigrai has received the EMS Prize for advancing our understanding of dynamical systems and the mathematical characterisations of chaos, and especially for solving a long-standing fundamental question on the mixing property for locally Hamiltonian surface flows.
Otto Neugebauer Prize
Jan P. Hogendijk, Utrecht University, The Netherlands
Professor Jan Hogendijk has illuminated how Greek mathematics was absorbed in the medieval Arabic world, how mathematics developed in medieval Islam, and how it was eventually transmitted to Europe. His analysis also embraces the scientific traditions of the Babylonian, Greek, Indian, Persian, Eastern and Western Arabic, and Latin civilizations. His work is based on previously unexplored manuscripts and primary sources, and the highly specialized contents of his writings are balanced by his precise yet friendly style. For all of these reasons the jury is unanimous in recommending that Prof. Hogendijk receive the Neugebauer Prize.
Felix Klein Prize
Emmanuel Trélat, Université Pierre et Marie Curie, Paris, France
Emmanuel Trélat combines truly impressive and beautiful contributions in fine fundamental mathematics to understand and solve new problems in control of PDE’s and ODE’s (continuous, discrete and mixed problems), and above all for his studies on singular trajectories, with remarkable numerical methods and algorithms able to provide solutions to many industrial problems in real time, with substantial impact especially in the area of astronautics. He is certainly an example of a successful researcher in the field of mathematics for industry, illustrating that it is possible to be highly recognized in mathematics and working on real problems, with end-product in the form of software that is really useful in industry.
Collector coins commemorating Stefan Banach
On the occasion of the 6 ECM, the National Bank of Poland issued collector coins designed by Robert Kotowicz: bronze 2 PLN, silver 10 PLN, and gold 200 PLN, to commemorate Stefan Banach (1892–1945). On the reverse of all three coins Banach is portrayed; each obverse depicts a notion or result of functional analysis. At the opening ceremony a Member of the Board of the National Bank of Poland, Professor Eugeniusz Gatnar, handed silver coins to the prize winners. (All 6 ECM participants received bronze coins.)
List of events
Plenary Lectures
Adrian Constantin, Some mathematical aspects of water waves
Camillo De Lellis, Dissipative solutions of the Euler equations
Herbert Edelsbrunner, Persistent homology and applications
Mikhail Gromov, In a search for a structure‡
Christopher Hacon, Classification of algebraic varieties
David Kazhdan, Representations of affine Kac–Moody groups over local and global fields
Tomasz Łuczak, Threshold behaviour of random discrete structures
Sylvia Serfaty, Renormalized energy, Abrikosov lattice, and log gases
Saharon Shelah, Classifying classes of structures in model theory
Michel Talagrand, Geometry of stochastic processes
‡ The plenary lecture of Misha Gromov was cancelled due to his illness; however, the reader will find his text on p. 51 in the present volume.
Invited Lectures
Anton Alekseev, Bernoulli numbers, Drinfeld associators and the Kashiwara–Vergne problem
Kari Astala, Holomorphic deformations, quasiconformal mappings and vector valued calculus of variations
Jean Bertoin, Coagulation with limited aggregations
Serge Cantat, The Cremona group
Vicent Caselles, Exemplar-based image inpainting and applications
Alessandra Celletti, KAM theory: a journey from conservative to dissipative systems
Pierre Colmez, The p-adic Langlands program
Alessio Corti, Mirror symmetry and Fano manifolds
Amadeu Delshams, Irregular motion and global instability in Hamiltonian systems
Hélène Esnault, On flat bundles in characteristic 0 and p > 0
Alexander A. Gaifullin, Combinatorial realisation of cycles and small covers
Isabelle Gallagher, Remarks on global regularity for solutions to the incompressible Navier–Stokes equations
Olle Häggström, Why the empirical sciences need statistics so desperately?
Martin Hairer, Solving the KPZ equation
Nicholas J. Higham, The matrix logarithm: from theory to computation
Arieh Iserles, Computing the Schrödinger equation with no fear of commutators
Alexander S. Kechris, Dynamics of non-archimedean Polish groups
Bernhard Keller, Cluster algebras and cluster monomials
Sławomir Kołodziej, Weak solutions to the complex Monge–Ampère equation
Gady Kozma, Phase transitions in self-interacting random walks
Frank Merle, On blow-up curves for semilinear wave equations
Andrey Mironov, Commuting higher rank ordinary differential operators
David Nualart, Stochastic calculus with respect to the fractional Brownian motion
Alexander Olevskii, Sampling, interpolation, translates
Leonid Parnovski, Multidimensional periodic and almost-periodic spectral problems: Bethe–Sommerfeld Conjecture and integrated density of states
Florian Pop, About covering spaces and numbers
Igor Rodnianski, Evolution problem in General Relativity
Zeév Rudnick, Quantum chaos and number theory
Benjamin Schlein, Effective equations for quantum dynamics
Piotr Śniady, Combinatorics of asymptotic representation theory
Andrew Stuart, Probing probability measures in high dimensions
Vladimír Sveřák, On scale-invariant solutions of the Navier–Stokes equations
Stevo Todorčević, Ramsey-theoretic analysis of the conditional structure of weakly-null sequences
Special Lectures and Events
José Francisco Rodrigues, Mathematics for the Planet Earth. A Challenge and an Opportunity to Mathematicians
Maciej P. Wojtkowski, Tilings and Markov Partitions. PTM Andrzej Pelczar Memorial Lecture
Philip Welch, Mechanising the Mind: Turing and the Computable. A Centenary Lecture
EMS Friedrich Hirzebruch Memorial Session
Mini-symposia
25 Years of Quantum Groups: From Definition to Classification (Alexander Stolin)
Absolute Arithmetic and F1-geometry (Koen Thas)
Applied and Computational Algebraic Topology (Martin Raussen)
Arithmetic Geometry (Wojciech Gajda, Samir Siksek)
Bachelier Finance Society: Mathematical Finance (Peter K. Friz)
Braids and Configuration Spaces (Mario Salvetti)
Computational Dynamics and Computer Assisted Proofs (Warwick Tucker, Piotr Zgliczyński)
Continuous Real Rational Functions and Related Topics (Krzysztof Kurdyka)
Differential Algebra and Galois Theory (Zbigniew Hajto)
Discrete Structures in Algebra, Geometry, Topology, and Computer Science (Eva-Maria Feichtner, Dmitry Feichtner-Kozlov)
Fluid Dynamics (Piotr B. Mucha, Agnieszka Świerczewska-Gwiazda)
Geometric and Quantitative Rigidity (Marta Lewicka)
How Mathematics Illuminates Biology (Marta Tyran-Kamińska, Michael C. Mackey)
Hyperbolic Conservation Laws (Piotr Gwiazda, Agnieszka Świerczewska-Gwiazda)
Implicitly Constituted Material Models: Modeling and Analysis (Josef Malek, Endre Süli)
Infinite-dimensional Dynamical Systems with Time Delays (Tibor Krisztin, Hans-Otto Walther)
Knot Theory and its Ramification (Józef H. Przytycki)
Matchbox Dynamics (Krystyna Kuperberg)
On Solutions to the Euler Equations of Incompressible Fluids (Xinyu He)
Optimal Stopping and Applications (F. Thomas Bruss, Krzysztof Szajowski)
Probabilistic Methods for Partial Differential Equations (Dan Crisan)
Progress in ‘Chemical Reaction Network Theory’ (Carsten Wiuf, Elisenda Feliu)
Semigroups of Operators: Theory and Applications (Adam Bobrowski, Yuri Tomilov, Ralph Chill)
Stochastic Models in Biosciences and Climatology (Samy Tindel)
Satellite Thematic Sessions
Algebraic and Geometric Methods in Nonlinear PDEs, Mechanics and Field Theory (Vyacheslav S. Kalnitsky, Alexandre M. Vinogradov)
Anisotropic Parabolic Problems and their Applications (Piotr B. Mucha, Piotr Rybka)
Combinatorics (Jarosław Grytczuk, Michał Karoński, Mariusz Woźniak)
Delay Equations in Biomedical Applications (Urszula Foryś)
Geometric Methods in Calculus of Variations (Marcella Palese)
Geometric Topology (Jerzy Dydak, Danuta Kołodziejczyk, Stanisław Spież)
Geometry in Dynamics (Alex Clark, Krystyna Kuperberg)
Homotopy Theory (David Blanc, Marek Golasiński)
Infinite Dimensional Dynamical Systems with Time Delays (Tibor Krisztin, Hans-Otto Walther)
Integrable Systems (Maciej Błaszak)
Knot Theory and its Ramification (Józef H. Przytycki, Bronisław Wajnryb, Paweł Traczyk)
Mathematical Physics and Developments in Algebra (Alexander Stolin, Konstantin Zarembo)
Optimal Stopping and Applications (Łukasz Balbus, F. Thomas Bruss, Krzysztof Szajowski)
Quasiconformal Mappings and Complex Dynamical Systems (Mark Elin, Anatoly Golberg, Stanisława Kanas, Toshiyuki Sugawa)
Special Classes of Hilbert Space Operators (Marek Ptak)
Panel Discussions
EuDML: Accessing Europe’s Mathematical Treasures (moderator: Jiří Rákosník), speakers: Laurent Guillopé, Marek Niezgódka, Olaf Teschke
Financing of Mathematical Research (moderator: Pavel Exner), speakers: Mats Gyllenberg, Michał Karoński, Sastry G. Pantula, Lex Zandee
Redressing the Gender Imbalance in Mathematics: Strategies and Outcomes (moderator: Caroline Series), speakers: Penelope Bidgood, Kari Hag, Marja Makarow, Christie Marr, Marie-Françoise Roy
The Role of Mathematics in the Emerging Economies (moderators: Andreas Griewank, Tsou Sheung Tsun), speakers: Neela Nataraj, Alexander Shananin, YuanJin Yun, Gareth Witten
‘Solid Findings’ in Mathematics Education; Proposals and Discussion, speakers: Guenter Toerner, Tommy Dreyfus, Despina Potari
What Is Expected From European Learned Societies? (moderator: Marta Sanz-Solé), speakers: Ehrhard Behrends, Wolfgang Eppenschwandter, Maria J. Esteban, Gert-Martin Greuel, Ari Laptev
Closing Ceremony
The closing ceremony of the 6 ECM began with a presentation of the statistics concerning participation and the programme of the 6 ECM, by the Chair of the Executive Organizing Committee. The diagrams he showed are collected in the next section. Next, the prizes for posters were announced by Prof. Robert Wolak, the Chair of the poster prize committee. The diplomas were presented to the winners by the President of the EMS. The following mathematicians received the prizes:
Elena Yu. Bunkova (Steklov Mathematical Institute RAS, RU), Elliptic and Krichever formal group laws
Francesco Cellarosi (Mathematical Sciences Research Institute, Berkeley, USA), Ergodic properties of square-free numbers (with Ilya Vinogradov)
Andrzej Czarnecki (Jagiellonian University, Kraków, PL), Topological characterization of trivial cohomology with some applications
Pablo González Sequeiros (University of Santiago de Compostela, ES), Robinson inflation for repetitive planar tilings (with Fernando Alcalde Cuesta, Álvaro Lozano Rojo)
Maria Infusino (University of Reading, UK), On the discrepancy of some generalized Kakutani’s sequences of partitions (with Michael Drmota)
Andrii Khrabustovskyi (National Academy of Sciences, UA), Periodic Laplace–Beltrami operator with preassigned spectral gaps
Diána H. Knipl (University of Szeged, HU), Modelling the spread of influenza on long distance travel networks with real air traffic data
Marian Ioan Munteanu (University of Iaşi, RO), The classification of Killing magnetic curves in M²(c) × R
Ana Nistor (KU Leuven, BE), Constant angle surfaces in 3-manifolds
Weronika Siwek (University of Silesia, Katowice, PL), Stochastic bursting production in gene expression
Loredana Smaranda (University of Piteşti, RO), Convergence of the Lagrange–Galerkin method for a fluid-rigid system with discontinuous density
Authors of Prized Posters with Professors Sanz-Solé, Jackowski, and Wolak
The winners received books and other publication prizes, provided by the publishers presenting their products during the congress. After the prizes had been handed to the winners, the President of the German Mathematical Society (DMV), Prof. Christian Bär, came to the podium and invited everybody to the 7 ECM, which will be held in Berlin in 2016. Finally, the chair of the Executive Organizing Committee expressed his thanks to everybody involved in the organization of the 6 ECM, and invited them to the podium. Tens of people (faculty, staff and students) appeared and were gratefully applauded by the audience.
Participants, lectures and speakers
There were 980 registered participants of the 6 ECM in total; 76% of them were men and 24% women. They were joined by 126 accompanying persons. In order to ensure broad participation in the 6 ECM and reduce economic barriers, grants were offered by the Foundation for Polish Science, the European Mathematical Society and European Women in Mathematics. Grants were awarded to 164 participants, 34 of them with Polish affiliation. Among the participants, 139 were members of the EMS and 129 of the PTM (9 belonged to both societies). Below, we gather a few diagrams presenting the statistics of affiliations of the participants (Figure 1), their declared primary interests according to AMS MSC (Figure 2), and a comparison of the participation in the 6 ECM with the previous Congresses (Figure 3). The scientific programme of the 6 ECM consisted of:
• 9 plenary lectures (60 minutes)
• 33 invited lectures (45 minutes in parallel sessions)
• 11 lectures of the Prize Winners (45 minutes in parallel sessions)
• 24 Mini-symposia (2 hours, in parallel sessions) at which 94 talks were given by speakers invited by the mini-symposium organizers
• 3 special lectures, 1 special session and 6 panel discussions
Figure 1: Affiliations of 6 ECM participants
Figure 2: Declared primary interest according to AMS MSC
The two diagrams that make up Figure 4 present the distribution of the speakers by country of affiliation, in comparison to the distribution of all participants, and the proportion of lectures in the selected fields compared to the distribution of the interests of all participants. They clearly show some differences, as well as the fact that some participants had a primary interest, defined via AMS MSC, different from the primary interests of all the speakers.
Figure 3: Numbers of ECM participants since 1992
Figure 4: Participants and speakers
In order to ensure broad participation in the 6 ECM, the organizers invited the participants to organize Satellite Thematic Sessions (STS). There were 15 of them, in some cases continuations of the mini-symposia, at which 155 talks were delivered. The thematic distribution of all the lectures and mini-symposia is presented in the following table:

Sections | AMS Subjects | Plenary & Invited Lectures | Mini-symposia Sessions | Mini-symposia Talks | STS Sessions | STS Talks
1. Logic, Foundations | 03 | 3 | - | - | - | -
2. Algebra, Number Theory, Algebraic Geometry | 11, 14, 15, 16, 17, 20 | 10 | 4 | 16 | 1 | 9
3. Geometry, Topology, Global Analysis | 22, 51, 52, 53, 54, 55, 57, 58 | 3 | 3 | 12 | 5 | 60
4. Analysis, Functional Analysis & Applications, Control Theory | 26, 28, 30, 32, 33, 41, 42, 46, 47, 49, 90, 93 | 3 | 1 | 4 | 2 | 21
5. Dynamical Systems, Ordinary Differential Equations | 34, 37, 39 | 2 | 3 | 12 | 2 | 21
6. Partial Differential Equations | 35 | 7 | 3 | 12 | 2 | 13
7. Mathematics in Science & Technology, Mathematical Physics | 70, 74, 76, 82, 85, 92 | 5 | 5 | 20 | 1 | 7
8. Probability, Combinatorics, Statistics | 05, 60, 62, 91 | 7 | 5 | 18 | 2 | 24
9. Numerical Analysis, Scientific Computing | 65, 68 | 4 | - | - | - | -
10. History of Mathematics, Mathematics Education, Popularization of Mathematics | 01 | 2 | - | - | - | -
Total | | 46 | 24 | 94 | 15 | 155

Posters: 179 in total. The per-section poster counts appear in the source as 3, 1, 5, 18, 9, 32, 14, 38, 34 and 25, but their assignment to the individual sections cannot be recovered from the damaged table layout.
Some mathematical aspects of water waves Adrian Constantin
Abstract. We discuss some recent mathematical investigations that offer insight and open up promising perspectives on the fundamental aspect of fluid mechanics concerned with the motion beneath a surface water wave.
2010 Mathematics Subject Classification. 31A05, 34C99, 35J15.
Keywords. Water waves, irrotational flow, particle trajectories.
1. Introduction
Water waves have fascinated scientists and laypersons alike since time immemorial, and their understanding extends beyond intellectual curiosity. This research area is a rich source of problems where progress is contingent upon a fruitful interplay between mathematical analysis, numerical simulation, experimental evidence, and physical intuition. Moreover, in water-wave phenomena nonlinear approaches often describe the real nature of the ongoing processes more accurately than linear paradigms, which usually capture only small perturbations of simple states. We survey some recent advances on a fundamental aspect arising in the study of water waves by presenting the state-of-the-art concerning the particle paths beneath the most regular water wave patterns of noticeable size – travelling two-dimensional periodic gravity waves propagating in irrotational flow at the surface of a layer of water with a flat bed. The theme illustrates that an appreciation of mathematical rigor and elegance, combined with the power of meaningful abstraction, often leads to breakthroughs in physical insight, while mathematics draws considerable inspiration and stimulation from physics.
2. Particle paths in Stokes waves
While watching the sea it is often possible to trace a wave as it propagates on the water’s surface. Contrary to a possible first impression, what one observes travelling across the sea is not the water but a pattern – the waves are not moving humps of water but invisible pulses of energy moving through water, as enunciated intuitively in the fifteenth century by Leonardo da Vinci in the following form: “...it often happens that the wave flees the place of its creation, while the water does not; like the waves made in a field of grain by the wind, where we see the waves running across the field while the grain remains in place.” A basic question in water waves concerns the flow beneath a surface wave. Stokes waves, termed swell
in oceanography, are the most regular wave patterns propagating at constant speed at the surface of water in irrotational flow over a flat bed. It is widely believed (see, for example, the classical textbooks [1, 8, 12, 13, 14, 15, 16, 17]) that particles in the water beneath a Stokes wave execute a closed-path motion as the surface wave passes over: individual particles of water do not travel along with the wave, but instead they move in closed, circular or elliptical, orbits (depending on whether the ratio of wavelength to water depth is small or large, respectively). Support for this conclusion is apparently given by experimental evidence: photographs (see [9, 16, 17]) and even early movie films [2] of small buoyant particles in laboratory wave tanks where almost closed elliptical paths are recognizable. The classical approach towards explaining this aspect of water waves consists in analyzing the particle motion after linearizing the nonlinear governing equations for water waves. But even after linearizing the governing equations and obtaining explicit formulas for the free surface and for the fluid velocity field, the system describing particle motion turns out to be again nonlinear. Thus one linearizes again and the closed paths emerge. We now briefly discuss this linearization approach. Let us first present the governing equations. Water is modelled as a homogeneous incompressible inviscid fluid. Let $\mathbf{u} = (u, v)$ be the fluid velocity of a two-dimensional flow that presents no variation in the horizontal direction orthogonal to the X-direction of propagation of the wave, with the Y-direction chosen vertically upwards. The wave profile travelling at constant speed c > 0 at the water’s free surface is represented by Y = η(X − ct), with Y = −d being the flat bed. Set ρ = 1 for the fluid density and denote by g the gravitational constant of acceleration.
The governing equations for travelling gravity water waves are:
• The equation of mass conservation
$$u_X + v_Y = 0 \quad \text{throughout the fluid domain } -d \le Y \le \eta(X - ct), \tag{2.1}$$
for the velocity field $\mathbf{u}(X - ct, Y)$, which is appropriate if we regard the water as an incompressible fluid of constant density.
• The equation of motion is Euler’s equation
$$\mathbf{u}_t + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla P + (0, -g) \quad \text{in the fluid domain } -d \le Y \le \eta(X - ct), \tag{2.2}$$
where P(X − ct, Y) is the pressure. Supposing the matter to be continuously distributed and the fluid to be inviscid, the pressure is the manifestation of the action of the internal forces to the constraint of incompressibility (2.1), and the evolution of the surface wave, as expressed by (2.2), is governed by the balance between gravity (as a restoring force) and the inertia of the system.
• In the absence of an underlying non-uniform current, we have the irrotational flow condition
$$u_Y = v_X \quad \text{throughout the fluid domain } -d \le Y \le \eta(X - ct). \tag{2.3}$$
• The boundary conditions are the kinematic boundary conditions
$$v = (u - c)\,\eta_X \quad \text{on the free surface } Y = \eta(X - ct), \tag{2.4}$$
$$v = 0 \quad \text{on the flat bed } Y = -d, \tag{2.5}$$
expressing the fact that a particle on the boundary is confined to it, and the dynamic boundary condition
$$P = P_{atm} \quad \text{on the free surface } Y = \eta(X - ct), \tag{2.6}$$
which decouples the water flow from the motion of the air above it. The above formulation ignores viscosity and surface tension. For an in-depth discussion of the physical grounds for neglecting these factors we refer to [5]. In short, the inclusion of viscosity allows the waves, eventually, to decay (which is realistic); however, this is on a far longer timescale and lengthscale compared with the wave period or the wavelength, so that it is reasonable to neglect viscosity. As for surface tension, its presence is only important for wavelengths of the order of a few cm, and we are investigating gravity waves with wavelengths measured in m, the typical wavelength of swell being in excess of 100 m.
Figure 1. Given the wave speed c > 0, a Stokes wave is a two-dimensional periodic travelling wave: the space-time dependence of the free surface, of the pressure, and of the velocity field has the form (X − ct) and is periodic with period L > 0, with the wave Y = η(X − ct) oscillating about the (undisturbed) flat surface Y = 0, that is, $\int_0^L \eta(X)\,dX = 0$.
The classical studies of particle paths beneath a Stokes wave rely on linearization. By dropping all nonlinear terms in (2.1)–(2.6), one obtains the linear wave
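solution
$$\begin{aligned}
\eta(X - ct) &= \varepsilon d \cos(kX - \omega t),\\
u(X - ct, Y) &= \varepsilon \omega d\, \frac{\cosh(k[Y + d])}{\sinh(kd)} \cos(kX - \omega t),\\
v(X - ct, Y) &= \varepsilon \omega d\, \frac{\sinh(k[Y + d])}{\sinh(kd)} \sin(kX - \omega t),\\
P(X - ct, Y) &= P_{atm} - \rho g Y + \varepsilon \rho g d\, \frac{\cosh(k[Y + d])}{\cosh(kd)} \cos(kX - \omega t),
\end{aligned} \tag{2.7}$$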
of wavelength L > 0, where k = 2π/L and $\omega = \sqrt{gk \tanh(kd)}$ are the wavenumber and the frequency, respectively, and the dispersion relation $c = \omega/k = \sqrt{g \tanh(kd)/k}$ expresses the speed c of the linear wave. The small positive parameter ε ≪ 1 in (2.7) is indicative of the regime of waves of small amplitude. Despite the performed linearization, the motion of the particle (X(t), Y(t)) beneath this linear wave is described by the nonlinear system
$$\begin{cases}
\dfrac{dX}{dt} = u = M \cosh(k[Y + d]) \cos k(X - ct),\\[2mm]
\dfrac{dY}{dt} = v = M \sinh(k[Y + d]) \sin k(X - ct),
\end{cases} \tag{2.8}$$
with initial data $(X_0, Y_0)$, where M = εωd/sinh(kd). The classical approach pursued in [8, 15, 17] is to seek approximations of the solution in terms of the small parameter M. We restrict our attention to the fixed time interval [0, T], where T = L/c > 0 is the wave period. Since Y(t) belongs a priori to the bounded set [−d, dε], we readily obtain that
$$X(t) - X_0 = O(M), \qquad Y(t) - Y_0 = O(M), \qquad t \in [0, T],$$
where O(M) denotes an expression of order M. The mean-value theorem yields
$$\begin{cases}
\dfrac{dX}{dt} = M \cosh(k[Y_0 + d]) \cos(kX_0 - \omega t) + O(M^2),\\[2mm]
\dfrac{dY}{dt} = M \sinh(k[Y_0 + d]) \sin(kX_0 - \omega t) + O(M^2),
\end{cases}$$
since ω = kc. Neglecting terms of second order in M, we get
$$\begin{cases}
\dfrac{dX}{dt} \approx M \cosh(k[Y_0 + d]) \cos(kX_0 - \omega t),\\[2mm]
\dfrac{dY}{dt} \approx M \sinh(k[Y_0 + d]) \sin(kX_0 - \omega t).
\end{cases}$$
Integration yields
$$\begin{cases}
X(t) \approx X_0 + \dfrac{M}{\omega} \cosh(k[Y_0 + d]) \bigl[\sin(kX_0) - \sin(kX_0 - \omega t)\bigr],\\[2mm]
Y(t) \approx Y_0 + \dfrac{M}{\omega} \sinh(k[Y_0 + d]) \bigl[\cos(kX_0 - \omega t) - \cos(kX_0)\bigr],
\end{cases}$$
so that
$$\frac{[X(t) - X_0^*]^2}{\cosh^2(k[Y_0 + d])} + \frac{[Y(t) - Y_0^*]^2}{\sinh^2(k[Y_0 + d])} \approx \frac{M^2}{\omega^2}.$$
This is the equation of an ellipse: to a first-order approximation the water particles above the flat bed move clockwise in closed elliptic orbits, the centre of the ellipse being $(X_0^*, Y_0^*)$, with
$$X_0^* = X_0 + \frac{M}{\omega} \cosh(k[Y_0 + d]) \sin(kX_0), \qquad Y_0^* = Y_0 - \frac{M}{\omega} \sinh(k[Y_0 + d]) \cos(kX_0).$$
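The first-order formulas above can be checked numerically. The following is a minimal sketch, not part of the original exposition: the values of g, L, d and ε are assumed sample values for a swell in shallow water, and the helper names (`linearized_position`, `ellipse_residual`, `signed_area`) are hypothetical. It evaluates the approximate path over one period, measures how well it satisfies the ellipse equation, and uses the shoelace formula to confirm the clockwise orientation.

```python
import math

# Assumed sample values (not from the text): a 100 m swell over 10 m of water.
g = 9.81      # gravitational acceleration, m/s^2
L = 100.0     # wavelength, m
d = 10.0      # mean water depth, m
eps = 0.05    # small amplitude parameter

k = 2 * math.pi / L                            # wavenumber
omega = math.sqrt(g * k * math.tanh(k * d))    # frequency from the dispersion relation
M = eps * omega * d / math.sinh(k * d)         # constant M of the particle system
T = 2 * math.pi / omega                        # wave period


def linearized_position(X0, Y0, t):
    """First-order approximation of the particle position at time t."""
    C = math.cosh(k * (Y0 + d))
    S = math.sinh(k * (Y0 + d))
    X = X0 + (M / omega) * C * (math.sin(k * X0) - math.sin(k * X0 - omega * t))
    Y = Y0 + (M / omega) * S * (math.cos(k * X0 - omega * t) - math.cos(k * X0))
    return X, Y


def ellipse_residual(X0, Y0, t):
    """Left-hand side minus right-hand side of the ellipse equation."""
    C = math.cosh(k * (Y0 + d))
    S = math.sinh(k * (Y0 + d))
    Xc = X0 + (M / omega) * C * math.sin(k * X0)   # centre X0*
    Yc = Y0 - (M / omega) * S * math.cos(k * X0)   # centre Y0*
    X, Y = linearized_position(X0, Y0, t)
    return ((X - Xc) / C) ** 2 + ((Y - Yc) / S) ** 2 - (M / omega) ** 2


def signed_area(points):
    """Shoelace formula; negative for a clockwise polygon (Y axis pointing up)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2


# One full orbit of a particle starting 2 m below the undisturbed surface.
orbit = [linearized_position(0.0, -2.0, T * i / 360) for i in range(360)]

if __name__ == "__main__":
    worst = max(abs(ellipse_residual(0.0, -2.0, T * i / 360)) for i in range(360))
    print("largest ellipse residual:", worst)
    print("signed area of the orbit:", signed_area(orbit))
```

A negative signed area corresponds to clockwise motion when the wave travels in the positive X-direction, in line with the statement above.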
While each particle appears to describe its ellipse in a wave period and all are in the same phase, the lengths of the major and minor axes of the ellipses decrease exponentially fast as we descend into the water. At the flat bed the ellipse degenerates into a straight line, and the particles located there appear to perform the back-and-forth motion
$$X(t) \approx X_0 + \frac{M}{\omega} \bigl[\sin(kX_0) - \sin(kX_0 - \omega t)\bigr], \qquad Y(t) = -d.$$
The wave characteristics clearly determine the shape of the ellipses. In particular, since the eccentricity of the ellipse is 1/cosh(k[Y0 + d]) and kd = 2πδ, with δ = d/L large being the hallmark of the deep-water regime, the deviation of the ellipse from a circle is barely noticeable for deep-water waves. The above survey of the classical linear approach highlights its shortcomings: even within linear water wave theory, the system of ordinary differential equations describing the motion of the particles is inherently nonlinear, and a further linearization of this system is performed. One can hardly expect this ‘brute-force’ approach to yield an accurate description of the subtleties of the particle path motion, especially since the resulting outcome of closed orbits throughout the flow is a pattern that is easily destroyed by small perturbations. It turns out that no particle trajectory is closed: over a wave period, each particle that does not lie on the flat bed performs a backward/forward and upward/downward movement, with the path an elliptical-like loop, not closed but with a forward drift (albeit mostly small and thus often hard to detect experimentally). On the flat bed this path degenerates into a back-and-forth horizontal motion. These features are lost in the process of linearization but can be established within the framework of the nonlinear governing equations. The proof relies on an interplay of methods from harmonic function theory, dynamical systems and elliptic partial differential equations (see [4, 6]).
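A numerical experiment gives a feel for the forward drift. The sketch below is an illustration under stated assumptions, not the method of proof referred to above: it integrates the nonlinear particle system (2.8) of the linear wave (not the full free-boundary problem) with a classical fourth-order Runge-Kutta scheme over one wave period. The parameter values and the function names (`velocity`, `rk4_path`, `drift_over_one_period`) are assumed for illustration.

```python
import math

# Assumed sample values (not from the text): a 100 m swell over 10 m of water.
g = 9.81      # gravitational acceleration, m/s^2
L = 100.0     # wavelength, m
d = 10.0      # mean water depth, m
eps = 0.05    # small amplitude parameter

k = 2 * math.pi / L                            # wavenumber
omega = math.sqrt(g * k * math.tanh(k * d))    # frequency
c = omega / k                                  # wave speed
M = eps * omega * d / math.sinh(k * d)         # constant M of system (2.8)
T = L / c                                      # wave period


def velocity(t, X, Y):
    """Right-hand side of system (2.8)."""
    phase = k * (X - c * t)
    return (M * math.cosh(k * (Y + d)) * math.cos(phase),
            M * math.sinh(k * (Y + d)) * math.sin(phase))


def rk4_path(X0, Y0, t_end, steps=2000):
    """Integrate (2.8) from t = 0 to t_end with the classical RK4 scheme."""
    h = t_end / steps
    t, X, Y = 0.0, X0, Y0
    for _ in range(steps):
        k1 = velocity(t, X, Y)
        k2 = velocity(t + h / 2, X + h / 2 * k1[0], Y + h / 2 * k1[1])
        k3 = velocity(t + h / 2, X + h / 2 * k2[0], Y + h / 2 * k2[1])
        k4 = velocity(t + h, X + h * k3[0], Y + h * k3[1])
        X += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        Y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return X, Y


def drift_over_one_period(X0, Y0):
    """Net displacement of the particle after one wave period."""
    X_T, Y_T = rk4_path(X0, Y0, T)
    return X_T - X0, Y_T - Y0


if __name__ == "__main__":
    for Y0 in (-2.0, -5.0, -8.0):
        dx, dy = drift_over_one_period(0.0, Y0)
        print(f"Y0 = {Y0:5.1f} m: horizontal shift {dx:.4f} m, vertical shift {dy:.2e} m")
```

For these sample values the horizontal shift after one period comes out positive and small compared with the size of the loop, whereas the doubly linearized formulas return the particle exactly to its starting point.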
Before presenting a sketch of the proof, let us elucidate the reason why the laboratory experiments that underlie the photographic evidence provided in [9, 16, 17] showed closed loops. It turns out that the specific choice of wave characteristics for these experiments results in a very small forward drift, as revealed in recent high-tech experiments [3, 19]. Without being able to keep track of this small forward drift, the particles appear to describe circular paths. Let us introduce the moving frame
$$x = X - ct, \qquad y = Y,$$
as a setting that ignores the time-dependence in the problem. For clarity, let us specify that we call a Stokes wave a smooth solution to the governing equations for which η, u, v, P are all periodic in the x-variable, with the functions η, u, P even and v odd in the x-variable. Moreover, the wave profile η should be strictly monotone between consecutive crests and troughs, and symmetric. For a survey of the bifurcation theory approach towards the existence of these regular wave patterns we refer to [5] and [18]. An essential feature is that the flow presents no stagnation points, that is,
$$u(x, y) < c \quad \text{for } -d \le y \le \eta(x). \tag{2.9}$$
This is consistent with experimental data, indicating that in general the horizontal motion of individual water particles is slower than the propagation speed of the wave. The mean current κ beneath a Stokes wave is the average horizontal current
$$\kappa = \frac{1}{L} \int_0^L u(x, y_0)\,dx,$$
that is, the average of u on any horizontal line y = y0 in the water below the wave trough level, which is independent of y0, cf. [6]. Physically one can imagine that swell originating from a distant storm enters a region of water in uniform flow, with κ = 0 corresponding to swell entering a region of still water. Throughout these considerations we place ourselves in the physical regime κ = 0, that is,
$$0 = \int_0^L u(x, -d)\,dx. \tag{2.10}$$
We refer to [6] for the general case. To reduce the number of unknowns in the problem, it is convenient to introduce in the moving frame the stream function
$$\psi(x, y) = m + \int_{-d}^{y} [u(x, s) - c]\,ds$$
with the following properties:
• ψ has period L in the x-variable and ∆ψ = 0;
• $\psi_x = -v$, $\psi_y = u - c$;
• ψ is constant on y = η(x) and on y = −d.
If m is the relative mass flux
$$m = \int_{-d}^{\eta(x)} [c - u(x, y)]\,dy > 0,$$
then ψ = 0 on y = η(x). Taking advantage of the fact that in the moving frame the Euler equation (2.2) is equivalent to the statement that the expression
$$E = \frac{(c - u)^2 + v^2}{2} + gy + P$$
is constant throughout the flow (this is Bernoulli’s law), we can reformulate the governing equations as
$$\begin{cases} \Delta\psi = 0 & \text{in } -d < y < \eta(x),\\ |\nabla\psi|^2 + 2g(y + d) = Q & \text{on } y = \eta(x),\\ \psi = 0 & \text{on } y = \eta(x),\\ \psi = m & \text{on } y = -d, \end{cases} \tag{2.11}$$
a free-boundary problem which is to be solved in the class of functions that are of period L in the x-variable. Q and m are physical constants (head, relative mass flux). For simplicity we set L = 2π and choose the wave crest at x = 0. Other than its nonlinear character, the main difficulty in dealing with (2.11) lies in the fact that the surface profile η is unknown. Taking advantage of the structure of the problem, we can overcome this inconvenience. Introduce the velocity potential
$$\phi(x, y) = \int_0^x [u(l, -d) - c]\,dl + \int_{-d}^{y} v(x, s)\,ds,$$
with the following properties:
• $\phi_x = u - c$, $\phi_y = v$;
• ∆φ = 0;
• φ is odd in the x-variable and φ = 0 on the crest line x = 0;
• φ(x, y) + cx has period 2π in x.
The conformal hodograph change of variables
$$q = -\phi(x, y), \qquad p = -\psi(x, y), \tag{2.12}$$
transforms the free boundary problem into a nonlinear boundary problem for the harmonic function h(q, p) = y + d in a fixed rectangular domain. The transformed boundary problem is
$$\begin{cases} \Delta_{q,p} h = 0 & \text{for } -c\pi < q < c\pi,\ -m < p < 0,\\ h = 0 & \text{on } p = -m,\\ (Q - 2gh)(h_q^2 + h_p^2) = 1 & \text{on } p = 0, \end{cases} \tag{2.13}$$
for h even and periodic of period 2cπ in the q-variable. As a first step towards the investigation of the particle paths beneath a Stokes wave, note that P is superharmonic:
$$\Delta P = -\psi_{xx}^2 - 2\psi_{xy}^2 - \psi_{yy}^2 \le 0.$$
Figure 2. The conformal change of variables transforms the free-boundary problem into a boundary-value problem in a fixed domain.
Since $P_y = -g < 0$ on y = −d, by Hopf’s maximum principle the minimum of P is attained only along the free surface y = η(x), where $P = P_{atm}$. In particular, $P_x(x, \eta(x)) < 0$ for x ∈ (0, π). The relation
$$u_q = \frac{P_x}{(c - u)^2 + v^2}$$
now yields
$$u_q < 0 \ \text{ along the top boundary of } \hat{\Omega}_+, \tag{2.14}$$
where $\hat{\Omega}_+ = \{(q, p) : 0 < q < c\pi,\ -m < p < 0\}$. On the other hand, the functions u and v being harmonic in the (x, y)-variables, they remain harmonic in the (q, p)-variables due to the conformal change of variables. Thus $u_q$ is harmonic. Furthermore, using Hopf’s maximum principle and the boundary conditions, we get
• $u_q = 0$ along the lateral sides of $\hat{\Omega}_+$,
• $u_q < 0$ on the lower boundary $\hat{B}_+$ of $\hat{\Omega}_+$,
• $u_q < 0$ on the top boundary $\hat{S}_+$,
since v > 0 in $\hat{\Omega}_+$, while v = 0 on the above three parts of the boundary (see Fig. 2). Recalling (2.14), we infer from the strong maximum principle that $u_q < 0$ in $\hat{\Omega}_+$. Thus, due to symmetry, u is a strictly increasing function of x along any streamline in $\Omega_-$ and a strictly decreasing function of x along any streamline in $\Omega_+$.
The streamlines [ψ = −p] with p ∈ [−m, 0] provide a foliation of the rectangular domain {(q, p) : −cπ ≤ q ≤ cπ, −m ≤ p ≤ 0}. The particle path $\{(X(t), Y(t))\}_{t \ge 0}$ starting at $(X_0, Y_0)$ can be obtained by solving the system
$$\begin{cases} X'(t) = u(X - ct, Y),\\ Y'(t) = v(X - ct, Y), \end{cases}$$
with initial data $(X(0), Y(0)) = (X_0, Y_0)$. This corresponds to the streamline x(t) = X(t) − ct, y(t) = Y(t) in the moving frame, and to the autonomous Hamiltonian system (with Hamiltonian function ψ)
$$\begin{cases} x'(t) = u(x, y) - c,\\ y'(t) = v(x, y). \end{cases}$$
As u − c < 0, for any particle path there is a time, say t = 0, when x = π, and another time, say t = θ > 0, when x = −π. Here θ is the elapsed time (the time it takes to traverse one period in the moving plane), given by
$$\theta(p) = \int_{-\pi}^{\pi} \frac{dx}{c - u(x, y(x))},$$
where y = y(x) is a parametrization of the streamline ψ(x, y) = −p. To the elapsed time one can associate the drift of a particle: the net horizontal distance moved by the particle between its positions below two consecutive troughs (or crests), that is,
$$X(\theta) - X(0) = c\,\theta - 2\pi = X(t + \theta) - X(t), \qquad t \in \mathbb{R},$$
which corresponds to one period in the moving plane. The drift is positive if the particle moves in the direction of wave propagation, and negative if it moves in the reverse direction, while zero drift characterizes a closed particle path that corresponds to a solution of period θ = 2π/c. It turns out (see [4, 6]) that the drift of a particle strictly decreases with depth, with θ(−m) > 2π/c on the flat bed.

The qualitative description of the particle trajectories is now within reach. We showed that u is a strictly decreasing function of x along any streamline in $\hat{\Omega}_+$. In view of (2.10), there is precisely one point x ∈ (0, π) where u(x, −d) = 0. The level set {u = 0} in $\Omega_+$ can be seen to consist of a smooth curve $C_+$ that intersects each streamline ψ = −p exactly once. In $\Omega_-$ the level set {u = 0} is given by the reflection $C_-$ of the curve $C_+$ across the line x = 0.
Throughout the region bounded above by y = η(x), below by y = −d and laterally by $C_-$ and $C_+$ we have u > 0 (including the top and lower boundaries), while in the region bounded above by y = η(x), below by y = −d and laterally by x = −π and $C_-$, as well as in the region bounded above by y = η(x), below by y = −d and laterally by $C_+$ and x = π, we have u < 0. In particular, u < 0 on the trough line x = π and u > 0 on the crest line x = 0. Consider now (X(0), Y(0)) with X(0) = π and Y(0) > −d as the initial position of a particle. In the moving frame the point (x(t), y(t)) reaches x = −π with y(θ) = Y(0) in time $\theta = \theta(y_0)$, moving to the left for t ∈ (0, θ) and intersecting successively the curve $C_+$ at a point B, the vertical segment x = 0 at a point C, the curve $C_-$ at D until it finally reaches x = −π at E = (−π, Y(0)) for t = θ. In the time interval needed for (x(t), y(t)) to get from A to B and from D to E we know that u < 0, so that in the physical variables (X, Y) the particle (X(t), Y(t)) moves to the left. In the time interval needed for (x(t), y(t)) to get from B to D we have u > 0, so that (X(t), Y(t)) moves to the right. Between A and C we have v > 0, so that (X(t), Y(t)) moves up, while between C and E we know that v < 0, so that (X(t), Y(t)) moves down. The pattern depicted in Fig. 3 emerges, while for a particle located on the flat bed at (π, −d) the motion has a backward-forward pattern with a forward drift, mirroring the projection of the above loop to the flat bed.

Figure 3. Typical particle path above the flat bed for a Stokes wave with no underlying current. The top figure is drawn in the physical frame: at A and E the wave trough is located right above the particle, while at C the wave crest is right above it. The middle and bottom figures depict the corresponding motion in the moving frame and in the conformal frame, respectively: in both cases the motion is to the left (the free surface and the flat bed are also drawn).
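The elapsed time θ and the drift cθ − 2π defined above lend themselves to a quick numerical illustration. The sketch below is a toy computation under stated assumptions: the horizontal velocity along the streamline is replaced by the hypothetical profile u = ε cos x (even, positive under the crest, negative under the trough, with zero mean over a period), which is not a solution of the governing equations, and the names `elapsed_time` and `drift` are introduced here only for illustration. For this profile the integral has the closed form 2π/√(1 − ε²) when c = 1, so the quadrature can be checked, and the drift comes out positive even though the profile has zero mean.

```python
import math

def elapsed_time(u_on_streamline, c, n=2000):
    """Midpoint-rule approximation of theta = integral over [-pi, pi] of dx / (c - u)."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        x = -math.pi + (i + 0.5) * h
        speed_gap = c - u_on_streamline(x)
        if speed_gap <= 0.0:
            # The formula requires u < c, i.e. no stagnation points, cf. (2.9).
            raise ValueError("requires u < c along the streamline")
        total += h / speed_gap
    return total

def drift(theta, c):
    """Net horizontal displacement of a particle over one elapsed time: c*theta - 2*pi."""
    return c * theta - 2.0 * math.pi

# Hypothetical toy data (assumptions for illustration only).
c = 1.0
eps = 0.3
toy_u = lambda x: eps * math.cos(x)

theta = elapsed_time(toy_u, c)
forward_drift = drift(theta, c)
exact_theta = 2.0 * math.pi / math.sqrt(1.0 - eps ** 2)
```

The positivity of the drift in this toy case reflects the convexity of s ↦ 1/(c − s): averaging 1/(c − u) over a zero-mean profile gives more than 1/c.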
3. Concluding remarks

Using formally quadratic quantities in the wave amplitude within the framework of linear theory, one can compute the average flow of energy and infer that the water particles in the fluid experience on average a net displacement in the direction in which the waves are propagating (see [14]). The corresponding mean rate of movement is known as the Stokes drift. The previous analysis of the governing equations shows that the Stokes drift is not a phenomenon noticeable just on average: all particles are looping in the direction of wave propagation. It is interesting to point out that if we perform just the linearization of the governing equations, starting with (2.7) without subsequently also linearizing the corresponding explicit system (2.8) for the particle paths, the same pattern as that presented in Fig. 3 emerges for particles above the flat bed. This was first proved by means of a detailed qualitative study of the system (2.8) in [7], and subsequently the paper [11] provided the explicit solution of the system (2.8). Thus refraining from linearizing the nonlinear explicit system (2.8) suffices to capture the essential qualitative features of the particle paths for waves of small amplitude. However, the previous qualitative analysis of the nonlinear governing equations is not contingent upon this type of restriction. The fact that it is difficult to assign a well-defined quantitative meaning to the concept “small amplitude” emphasizes the importance of nonlinear theory.
References

[1] D. J. Acheson, Elementary fluid dynamics, The Clarendon Press, Oxford University Press, New York, 1990.
[2] A. E. Bryson, Waves in fluids, National Committee for Fluid Mechanics Films (Encyclopaedia Britannica Educational Corporation), Chicago, 60611.
[3] Y.-Y. Chen, H.-C. Hsu, and G.-Y. Chen, Lagrangian experiment and solution for irrotational finite-amplitude progressive gravity waves at uniform depth, Fluid Dyn. Res. 42 (2010), 045511.
[4] A. Constantin, The trajectories of particles in Stokes waves, Invent. Math. 166 (2006), 523–535.
[5] A. Constantin, Nonlinear water waves with applications to wave-current interactions and tsunamis, CBMS-NSF Conf. Ser. Appl. Math. Vol. 81, SIAM, Philadelphia, 2011.
[6] A. Constantin and W. Strauss, Pressure beneath a Stokes wave, Comm. Pure Appl. Math. 63 (2010), 533–557.
[7] A. Constantin and G. Villari, Particle trajectories in linear water waves, J. Math. Fluid Mech. 10 (2008), 1–18.
[8] G. D. Crapper, Introduction to water waves, Ellis Horwood Ltd., Chichester, 1984.
[9] L. Debnath, Nonlinear water waves, Academic Press, Boston, 1994.
[10] D. Gilbarg and N. S. Trudinger, Elliptic partial differential equations of second order, Springer-Verlag, Berlin, 2001.
[11] D. Ionescu-Kruse, Particle trajectories in linearized irrotational shallow water flows, J. Nonlinear Math. Phys. 15 (2008), 13–27.
[12] B. Kinsman, Wind waves: their generation and propagation on the ocean surface, Dover, 2002.
[13] H. Lamb, Hydrodynamics, Cambridge Univ. Press, Cambridge, 1895.
[14] J. Lighthill, Waves in fluids, Cambridge Univ. Press, Cambridge-New York, 1978.
[15] L. M. Milne-Thomson, Theoretical hydrodynamics, The Macmillan Co., New York, 1960.
[16] A. Sommerfeld, Mechanics of deformable bodies, Academic Press, New York, 1950.
[17] J. J. Stoker, Water waves. The mathematical theory with applications, Interscience Publ. Inc., New York, 1957.
[18] J. F. Toland, Stokes waves, Topol. Meth. Nonl. Anal. 7 (1996), 1–48.
[19] M. Umeyama, Eulerian/Lagrangian analysis for particle velocities and trajectories in a pure wave motion using particle image velocimetry, Philos. Trans. Roy. Soc. London A 370 (2012), 1687–1702.
Adrian Constantin, Department of Mathematics, King’s College London, Strand, London WC2R 2LS, UK E-mail: [email protected] Faculty of Mathematics, University of Vienna, Nordbergstrasse 15, 1090 Vienna, Austria E-mail: [email protected]
Continuous dissipative Euler flows and a conjecture of Onsager
Camillo De Lellis∗ and László Székelyhidi†

Abstract. It is known since the pioneering works of Scheffer and Shnirelman that there are nontrivial distributional solutions to the Euler equations which are compactly supported in space and time. Obviously these solutions do not respect the classical conservation law for the total kinetic energy and they are therefore very irregular. In recent joint works we have proved the existence of continuous and even Hölder continuous solutions which dissipate the kinetic energy. Our theorem might be regarded as a first step towards a conjecture of Lars Onsager, which in 1949 asserted the existence of dissipative Hölder solutions for any Hölder exponent smaller than 1/3.

2010 Mathematics Subject Classification. 35D30, 76F05, 34A60, 53B20.
Keywords. Euler equations, anomalous dissipation, h-principle, Onsager’s conjecture.
∗ The first author acknowledges the support of the SFB Grant TR71.
† The second author acknowledges the support of the ERC Grant Agreement No. 277993.

1. The Euler equations

The incompressible Euler equations are a system of partial differential equations which were derived more than 250 years ago by Euler to describe the motion of an inviscid fluid. If we assume that the density of the fluid is a constant $\rho_0$, the unknowns of the system are the velocity v, a vector field, and the pressure p, a scalar field. For convenience we will assume that these fields are defined on $\mathbb{T}^n \times I$ or on $\mathbb{R}^n \times I$, where $\mathbb{T}^n = S^1 \times \ldots \times S^1$ is the n-dimensional torus and I is either an open interval ]0, T[, or the open halfline ]0, ∞[, or the entire real line $\mathbb{R}$. In general we assume n ≥ 2, but the cases of interest here are n = 2, 3. The equations then take the following form
$$\begin{cases} \partial_t v + \operatorname{div}_x (v \otimes v) + \nabla p = 0\\ \operatorname{div}_x v = 0, \end{cases} \tag{1.1}$$
where the density $\rho_0$ of the fluid is normalized to 1. The velocity v(x, t) represents the velocity of the fluid particle which at time t occupies the point x. If Ω is a smooth (bounded) open domain, then
$$\int_{\partial\Omega} p(x, t)\,\nu(x)\,dA(x)$$
is the total force exerted at time t by the fluid outside Ω upon the portion of fluid inside Ω. Here ν denotes the exterior unit normal to Ω. Note that p is then well-defined up to an arbitrary function of time, since
$$\int_{\partial\Omega} \nu = 0$$
for every smooth bounded open set Ω. This arbitrariness in the definition of p can be seen directly from (1.1) and it is natural to mod it out by normalizing p so that $\int_{\mathbb{T}^n} p(x, t)\,dx = 0$, which from now on will always be assumed to hold.

The two equations in (1.1) express simply the conservation of mass and momentum. Indeed, if (v, p) is a pair of $C^1$ functions satisfying (1.1) and Ω an arbitrary domain, the divergence theorem implies
$$\int_{\partial\Omega} v \cdot \nu = 0, \tag{1.2}$$
$$\frac{d}{dt}\int_{\Omega} v = -\int_{\partial\Omega} v\,(v \cdot \nu) - \int_{\partial\Omega} p\,\nu. \tag{1.3}$$
The identity (1.2) expresses the conservation of mass: the total amount of fluid particles “getting out” of Ω is balanced by the total amount “getting in”. The identity (1.3) is the counterpart of the conservation of momentum: the rate of change of the momentum of the fluid contained in Ω is given by the sum of the flux of momentum through ∂Ω and the total force exerted on Ω by the portion of fluid lying outside. In continuum mechanics it is often the case that balance laws as in (1.2) and (1.3) (valid for any “fluid element” Ω) are derived, under suitable assumptions, from first principles, whereas the differential equations (as (1.1)) are deduced as consequences when the functions are sufficiently smooth. In the case at hand (1.1) can be easily derived from (1.2)–(1.3) if the pair (v, p) is $C^1$. However we can make sense of (1.2) and (1.3) even if (v, p) are much less smooth: the continuity of the pair is, for instance, enough to make sense of all the integrals in (1.2) and (1.3) whenever Ω has $C^1$ (or even Lipschitz) boundary. Though this looks quite natural, we will see that there are pairs of continuous functions satisfying (1.2) and (1.3) which display a quite counterintuitive behavior.
2. Anomalous dissipation

If (v, p) is a $C^1$ solution of (1.1), we can scalar multiply the first equation by v and use the chain rule to derive the identity
$$\partial_t \frac{|v|^2}{2} + \operatorname{div}_x \left( \left( \frac{|v|^2}{2} + p \right) v \right) = 0.$$
Assume that the domain of definition is $\mathbb{T}^n \times I$ and integrate this last equality in space. We then derive the conservation of the total kinetic energy
$$\frac{d}{dt} \int_{\mathbb{T}^n} |v|^2(x, t)\,dx = 0. \tag{2.1}$$
Thus, classical solutions of the incompressible Euler equations are energy conservative. Nonetheless, in [42] Onsager suggested the existence of solutions to the 3-dimensional incompressible Euler equations which dissipate the energy. Such solutions cannot be interpreted in classical terms and it is remarkable that indeed Onsager himself suggests a concept of solution which coincides with our modern notion of weak (distributional) solutions. Before coming to this, let us briefly describe the considerations of Onsager on the energy spectrum for 3-dimensional isotropic turbulence. We start by introducing the Navier–Stokes equations, namely the system
$$\begin{cases} \partial_t v + \operatorname{div}_x (v \otimes v) + \nabla p = \nu \Delta v\\ \operatorname{div}_x v = 0, \end{cases} \tag{2.2}$$
where the viscosity ν is considered to be fairly small (or, in the language of fluid dynamics, the Reynolds number of the flow is high). For a smooth solution of (2.2) the balance law for the energy (2.1) would then take the form
$$\frac{d}{dt} \int_{\mathbb{T}^n} |v|^2(x, t)\,dx = -2\nu \int_{\mathbb{T}^n} |\nabla \times v|^2(x, t)\,dx. \tag{2.3}$$
It is well known that, in 2 dimensions, the right hand side of (2.3), called the enstrophy, is a conserved quantity and hence there is no mechanism of “inflation” for the dissipation term. However, this conservation does not hold for 3-dimensional solutions, where the energy is the only constant of motion and there are several experimental reasons to believe that typically the enstrophy becomes quite large. If we were considering a family of solutions $u_\nu$ with ν → 0 and if these solutions were to converge to a classical solution of (1.1), then the right hand side of (2.3) would behave as O(ν). However, in the theory of hydrodynamic turbulence it is expected that, in 3 dimensions and for “typical” turbulent solutions of (2.2), the right hand side of (2.3) is independent of the viscosity.
Thus, one may advance the hypothesis that the dissipation of the energy is not primarily driven by the viscous term ν∆u and that the main mechanism responsible for this dissipation is indeed the nonlinear term of the equations, which appears as well in (1.1). This hypothesis and a corresponding “energy spectrum” law were first put forward by Kolmogorov in [34] (nowadays often cited as K41 theory) and, as pointed out by Onsager in [42], rediscovered independently at least twice (in [41] and [56]; see also [30], which refers to [45]). We briefly explain here the motivations given by [42] for Kolmogorov’s law (and refer to [28] for a nice and much more detailed analysis of Onsager’s discoveries). Denote by E(t) the average of the total kinetic energy (divided by the density of the fluid) and by Q = −dE/dt its rate of dissipation. Moreover, we let L be the “macroscale” of the flow (in our case we can suppose this is the side length of the torus, i.e. 2π). If we assume that Q depends only on L and E, a simple dimensional analysis suggests the law
$$Q = -\frac{dE}{dt} = c\,E^{3/2} L^{-1}, \tag{2.4}$$
where c is a dimensionless constant. Indeed, if σ denotes the unit of space and τ the unit of time, then E is measured in $\sigma^2/\tau^2$, Q in $\sigma^2/\tau^3$ and L in σ: it can be readily checked that the law (2.4) is the only possible one of the form $cE^\alpha L^\beta$ for which c is a dimensionless constant. The law (2.4) has been verified extensively in experiments and it turns out to be valid as long as the viscosity is very small compared to E(t). In order to get into Onsager’s explanation of how this might be possible, we expand the velocity v in Fourier series:
$$v(x, t) = \sum_{k \in \mathbb{Z}^3} a_k(t)\,e^{ik \cdot x}.$$
Obviously $a_{-k} = \overline{a_k}$, because v is real-valued. Moreover the divergence-free constraint translates into the identity $k \cdot a_k = 0$. We then rewrite the remaining equations of (2.2) as an infinite-dimensional system of ODEs for the $a_k$:
$$\frac{da_k}{dt} = i \sum_{\ell} (a_{k-\ell} \cdot \ell) \left[ -a_\ell + \frac{(a_\ell \cdot k)\,k}{|k|^2} \right] - \nu |k|^2 a_k. \tag{2.5}$$
Clearly the total kinetic energy is (up to constant factors) $\sum_k |a_k|^2$. Observe, moreover, that L is essentially the smallest λ such that $\sum_{|k| = \lambda} |a_k|^2$ is comparable to E. We next derive the rate of change of the energy carried by a given wave number:
$$\frac{d}{dt} |a_k|^2 = -2|k|^2 \nu |a_k|^2 + \sum_{\ell} Q(k, \ell), \tag{2.6}$$
where the term Q(k, ℓ) is given by
$$Q(k, \ell) = -2\,\mathrm{Im}\bigl( (a_{k+\ell} \cdot \ell)(a_k \cdot a_\ell) + (a_{\ell-k} \cdot k)(a_k \cdot a_\ell) \bigr).$$
Note that Q(k, ℓ) = −Q(ℓ, k): this term accounts for the “energy exchange” between different Fourier modes. As long as $\nu|k|^2$ is small (i.e. for sufficiently small |k|), we can assume that the term Q(k, ℓ) is the dominating one in (2.6). The picture proposed by Onsager for a “typical” chaotic flow is the following: in the infinite sum at the right hand side of (2.6) only the terms where $a_k$, $a_\ell$, $a_{k+\ell}$ have a comparable size are dominating. So, the energy gets redistributed from wave numbers of a certain size to wave numbers of, say, double that size. As λ grows the redistribution process happens faster and faster, so that after a short time (i.e. before E becomes too small for the validity of (2.4)) the energy is redistributed at all scales. If this transfer is a chaotic process, after a few steps the information about the low wave numbers (i.e. the macroscopic features of the flow) is lost. It is therefore plausible that the energy flux of the energy distribution depends only on the total dissipation rate Q = −dE/dt and on the modulus of the wave number |k|. If we set $f(\lambda) := \sum_{|k| \le \lambda} |a_k|^2$, the energy distribution E(λ) is “formally” df/dλ, so that $E = \int E(\lambda)\,d\lambda$. Since the frequency is measured in $\sigma^{-1}$, E(λ) is measured in $\sigma^3 \tau^{-2}$. The same dimensional analysis leading to (2.4) gives then
$$E(\lambda) = \beta\,Q^{2/3} \lambda^{-5/3}, \tag{2.7}$$
where β is a dimensionless constant. The last identity is the famous Kolmogorov law.
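The dimensional analysis behind (2.4) and (2.7) amounts to solving two small linear systems for the exponents, one equation for the powers of the space unit σ and one for the powers of the time unit τ. A minimal sketch follows; the helper `solve_2x2` is introduced here for illustration and uses exact rational arithmetic, so the exponents 3/2, −1, 2/3 and −5/3 are recovered without rounding.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve the integer system a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system: the exponents are not determined")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y

# Law (2.4): Q = c * E**alpha * L**beta with
#   E ~ sigma^2 tau^-2,  L ~ sigma,  Q ~ sigma^2 tau^-3.
# Powers of sigma:  2*alpha + 1*beta =  2
# Powers of tau:   -2*alpha + 0*beta = -3
alpha, beta = solve_2x2(2, 1, -2, 0, 2, -3)

# Law (2.7): E(lambda) = const * Q**a * lambda**b with
#   Q ~ sigma^2 tau^-3,  lambda ~ sigma^-1,  E(lambda) ~ sigma^3 tau^-2.
# Powers of sigma:  2*a - 1*b =  3
# Powers of tau:   -3*a + 0*b = -2
a, b = solve_2x2(2, -1, -3, 0, 3, -2)
```

Both systems have nonzero determinant, which is the precise sense in which each law is "the only possible one" of its form.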
3. Weak solutions

At the end of his note Onsager remarks that

. . . in principle, turbulent dissipation as described could take place just as readily without the final assistance of viscosity. In the absence of viscosity the standard proof of conservation of energy does not apply, because the velocity field does not remain differentiable! In fact it is possible to show that the velocity field in such “ideal turbulence” cannot obey any Lipschitz condition of the form $|v(x) - v(y)| \le C|x - y|^\alpha$ for any α greater than 1/3; otherwise the energy is conserved.

First of all, translated in modern PDE terminology, Onsager is simply proposing to look at weak solutions. Again, quoting him directly

. . . Of course, under the circumstances, the ordinary formulation of the laws of motion in terms of differential equations becomes inadequate and must be replaced by a more general description; for example, the formulation (2.5) in terms of Fourier series will do.

Thus he is simply looking at functions with Fourier coefficients satisfying (2.5), where ν is set equal to 0. However, he is not assuming any differentiability of these solutions: the only assumption is that the right hand side of (2.5) makes sense, or in other words that the series
$$\sum_{\ell} (a_{k-\ell} \cdot \ell) \left[ -a_\ell + \frac{(a_\ell \cdot k)\,k}{|k|^2} \right]$$
converges. Recall that $a_k \cdot k = 0$. Thus the series above can be rewritten as
$$\sum_{\ell} (a_{k-\ell} \cdot k) \left[ -a_\ell + \frac{(a_\ell \cdot k)\,k}{|k|^2} \right].$$
For the convergence it is then sufficient to assume that $\sum |a_k|^2 < \infty$, i.e. that the velocity field is square summable. Summarizing, “Onsager’s solutions” are those divergence free real valued fields
$$v(x, t) = \sum_k a_k(t)\,e^{ik \cdot x}$$
with Fourier coefficients satisfying
$$\frac{da_k}{dt} = i \sum_{\ell} (a_{k-\ell} \cdot k) \left[ -a_\ell + \frac{(a_\ell \cdot k)\,k}{|k|^2} \right] \tag{3.1}$$
and such that
$$\sum_k |a_k|^2(t) < \infty \quad \text{for every } t.$$
Consider now the (time-dependent) vector
$$b(t) := -i \sum_{\ell} (a_{k-\ell} \cdot k)\,a_\ell.$$
Observe that the right hand side of (3.1) is the vector $b - \frac{(k \cdot b)}{|k|^2} k$, i.e. the projection of b(t) on the vector space $V_k$ orthogonal to k. Since $t \mapsto a_k(t)$ is a curve in $V_k$, (3.1) is satisfied if and only if
$$w \cdot \frac{da_k}{dt} - w \cdot b(t) = 0 \quad \text{for all } w \in V_k.$$
In turn the last identity can be rewritten as
$$w \cdot \frac{da_k}{dt} + i\,w \otimes k : \sum_{\ell} a_{k-\ell} \otimes a_\ell = 0. \tag{3.2}$$
Introduce next the vector field $\varphi(x) := w e^{ik \cdot x}$ and observe that, for $w \in V_k$, ϕ is divergence-free. The identity (3.2) is simply
$$\int_{\mathbb{T}^n} \varphi(x) \cdot \frac{\partial v}{\partial t}(x, t)\,dx - \int_{\mathbb{T}^n} \nabla\varphi(x) : v \otimes v(x, t)\,dx = 0. \tag{3.3}$$
A simple density argument shows then that (3.1) holds if and only if v is a weak solution in the sense of distributions. We recall the latter notion for the reader’s convenience.

Definition 3.1. A vector field $v \in L^2(\mathbb{T}^n \times I)$ is a weak solution of the incompressible Euler equations if
$$\int \bigl( \partial_t \varphi \cdot v + \nabla\varphi : (v \otimes v) \bigr)\,dx\,dt = 0 \tag{3.4}$$
for all $\varphi \in C_c^\infty(\mathbb{T}^n \times I; \mathbb{R}^n)$ with div ϕ = 0 and
$$\int v \cdot \nabla\psi\,dx\,dt = 0 \quad \text{for all } \psi \in C_c^\infty(\mathbb{T}^n \times I). \tag{3.5}$$
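The projection $b \mapsto b - \frac{(k \cdot b)}{|k|^2} k$ onto $V_k$, used above to rewrite (3.1), is easy to test on concrete complex vectors. The sketch below is only an illustration: the vectors `k` and `b` are arbitrary hypothetical data, and the dot product is taken bilinear (without complex conjugation), matching the way $k \cdot a_k$ is written in the text.

```python
def dot(a, b):
    """Bilinear dot product (no complex conjugation), as in k . a_k."""
    return sum(x * y for x, y in zip(a, b))

def project_orthogonal(b, k):
    """Return b - ((k . b) / |k|^2) k, the component of b orthogonal to the real vector k."""
    coeff = dot(k, b) / dot(k, k)
    return [bi - coeff * ki for bi, ki in zip(b, k)]

# Hypothetical data: a wave vector in Z^3 and an arbitrary complex vector.
k = [1, 2, -2]
b = [1 + 2j, -0.5j, 3.0]

p = project_orthogonal(b, k)          # lies in V_k
p_again = project_orthogonal(p, k)    # projecting twice changes nothing

residual = abs(dot(k, p))
idempotency_gap = max(abs(x - y) for x, y in zip(p, p_again))
```

Up to rounding, `residual` and `idempotency_gap` vanish, which is the content of the statement that the right hand side of (3.1) lies in $V_k$.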
4. Onsager’s conjecture

The final sentence of Onsager’s note is then

. . . The detailed conservation of energy (2.6) does not imply conservation of the total energy if the number of steps in the cascade is infinite, as expected, and the double sum of Q(ℓ, k) converges only conditionally.

Here, as implicit in the discussion, we are setting ν = 0. Thus Onsager claims that a closer inspection of the identity (2.6) shows that the total conservation of the energy can be inferred from the weak formulation of the equation (3.1) only when the solution is Hölder continuous with exponent larger than 1/3, whereas this might fail for smaller exponents. Following this suggestion, the claim about the energy conservation has been shown by Eyink in [27] under the assumption that $\sum_k |k|^\alpha |a_k| < \infty$ (which does imply the α-Hölder regularity, in space, of the function v, but it is obviously a stronger condition). Onsager’s exact claim has then been shown by Constantin, E and Titi with an elegant and fairly short argument (we refer also to [49] for more precise results). However, much less is known on the other side of the conjecture, namely on the existence of solutions with lower regularity which do not preserve energy. This will be the main focus of the rest of the note, where we will explore what has been proved up to now. The exponent 1/3 has a direct significance in isotropic turbulence, since it is related to another famous law of Kolmogorov’s theory, namely the fact that, in isotropic turbulent flows, the spatial variance of velocities is comparable to the distance to the power 2/3 (see the discussion in the paper [28]). These laws are always derived by scaling arguments and thus a proof of Onsager’s conjecture would give a first justification purely based on rigorous mathematical considerations pertaining to the equations of motion.
5. Weak solutions with compact support in time

The first proof that weak solutions of the Euler equations might not be energy conservative is due to Scheffer in his groundbreaking paper [46]. The main theorem of [46] states the existence of a non-trivial weak solution in $L^2(\mathbb{R}^2 \times \mathbb{R})$ with compact support in space and time. Later on Shnirelman in [47] gave a different proof of the existence of a non-trivial weak solution in $L^2(\mathbb{T}^2 \times \mathbb{R})$ with compact support in time. In these constructions it is not clear if the solution belongs to the energy space, i.e. whether each time-slice belongs to $L^2$. In the note [48] Shnirelman gave the first existence proof of a solution of the 3-dimensional Euler equations which dissipates the energy: obviously this solution does belong to the energy space and hence satisfies the requirement that the kinetic energy be finite at each time. In the paper [20] we provided a relatively simple proof of the following stronger statement.

Theorem 5.1. There exist infinitely many compactly supported bounded weak solutions of the incompressible Euler equations in any space dimension.

The proof in [20] is based on a suitable notion of subsolution and it embeds the examples of Theorem 5.1 in a long tradition of (rather counterintuitive) constructions in the theory of differential inclusions. As pointed out in the important paper [38] by Müller and Šverák, these results (see, for instance, [10, 11, 19, 32, 33]) have a close relation to Gromov’s h-principle. In particular the method of convex integration, introduced by Gromov and extended by Müller and Šverák to Lipschitz mappings, provides a very powerful tool to construct such examples. In the paper [20] these tools were suitably modified and used for the first time to explain Scheffer’s non-uniqueness theorem. It was also noticed immediately that this approach allows one to go well beyond the result of Scheffer.
Indeed it has led to new developments for several equations in fluid dynamics (see [12, 18, 50, 52, 53, 54, 57]), for which we refer to the survey article [22]. We now motivate the definition of subsolution following [22]. Let us first recall the concept of Reynolds stress. It is generally accepted that the appearance of high-frequency oscillations in the velocity field is the main reason for turbulent phenomena in incompressible flows. One related major problem is therefore to understand the dynamics of the coarse-grained, in other words macroscopically averaged, velocity field. If $\bar{v}$ denotes the macroscopically averaged velocity field, then it satisfies
$$\partial_t \bar{v} + \operatorname{div}(\bar{v} \otimes \bar{v} + R) + \nabla \bar{p} = 0, \qquad \operatorname{div} \bar{v} = 0, \tag{5.1}$$
where $R = \overline{v \otimes v} - \bar{v} \otimes \bar{v}$. The latter quantity is called Reynolds stress and arises because the averaging does not commute with the nonlinearity v ⊗ v. On this formal level the precise definition of averaging plays no role, be it long-time averages, ensemble-averages or local space-time averages. The latter can be interpreted as taking weak limits. Indeed, weak limits of Leray solutions of the Navier–Stokes equations with vanishing viscosity have been proposed in the literature as a deterministic approach to turbulence (see [1], [2], [13], [37]). We are now ready to introduce our notion of subsolution. In what follows we will use $S^{n \times n}$ for the space of n × n symmetric matrices.

Definition 5.2 (Subsolutions). Let $\bar{e} \in L^1_{loc}(\mathbb{R}^n \times (0, T))$ with $\bar{e} \ge 0$. A subsolution to the incompressible Euler equations with given kinetic energy density $\bar{e}$ is a triple $(\bar{v}, R, \bar{p}) : \mathbb{R}^n \times (0, T) \to \mathbb{R}^n \times S^{n \times n} \times \mathbb{R}$ with the following properties:
(i) $\bar{v} \in L^2_{loc}$, $R \in L^1_{loc}$, $\bar{p}$ is a distribution;
(ii) (5.1) is satisfied in the sense of distributions;
(iii) $R \ge \frac{1}{n}\,(2\bar{e} - |\bar{v}|^2)\,\mathrm{Id} \ge 0$ a.e.
Remark 5.3. Though in the various references [20, 21, 22] the notion of subsolution is seemingly different from the one given above, the two concepts are easily shown to be equivalent. Consider, for instance, the triple $(\bar{v}, u, q)$ of [22, Definition 2.3] and impose the relations
$$\operatorname{tr} R = 2\bar{e} - |\bar{v}|^2, \qquad q = \bar{p} + \frac{2}{n}\,\bar{e} \qquad \text{and} \qquad u = (R + \bar{v} \otimes \bar{v}) - \frac{2}{n}\,\bar{e}\,\mathrm{Id}.$$
It is then obvious that $(\bar{v}, R, \bar{p})$ is a subsolution in the sense of Definition 5.2 if and only if $(\bar{v}, u, q)$ is a subsolution in the sense of [22, Definition 2.3]. Observe that if R = 0, then the $\bar{v}$ component of the subsolution is in fact a weak solution of the Euler equations. As mentioned above, in passing to weak limits (or when considering any other averaging process), the high-frequency oscillations in the velocity are responsible for the appearance of a non-trivial Reynolds stress. Equivalently stated, this phenomenon is responsible for the inequality sign in (iii).
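The statement that averaging does not commute with the nonlinearity v ⊗ v, and that the resulting Reynolds stress is a nonnegative symmetric tensor, can be seen on a finite sample. In the sketch below the averaging is taken to be the arithmetic mean over a handful of hypothetical two-dimensional velocity samples (an assumption made purely for illustration; the text leaves the averaging procedure unspecified). With this choice R is the covariance matrix of the sample, which vanishes exactly when there are no fluctuations.

```python
def mean_vector(samples):
    """Arithmetic mean of a list of 2-vectors."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(2)]

def mean_outer(samples):
    """Arithmetic mean of the outer products v (x) v."""
    n = len(samples)
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(2)]
            for i in range(2)]

def reynolds_stress(samples):
    """R = mean(v (x) v) - mean(v) (x) mean(v) for the sample average."""
    vbar = mean_vector(samples)
    vv_bar = mean_outer(samples)
    return [[vv_bar[i][j] - vbar[i] * vbar[j] for j in range(2)]
            for i in range(2)]

# Hypothetical velocity samples standing in for an oscillating field.
samples = [(1.0, 0.5), (-0.5, 1.5), (2.0, -1.0), (0.5, 1.0)]
R = reynolds_stress(samples)

# A symmetric 2x2 matrix is positive semidefinite iff trace >= 0 and det >= 0.
trace_R = R[0][0] + R[1][1]
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]

# Without oscillations the stress vanishes: the average of v (x) v is vbar (x) vbar.
R_uniform = reynolds_stress([(1.0, 2.0)] * 3)
```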
The key point in our approach to prove Theorem 5.1 is that, starting from a subsolution, an appropriate iteration process reintroduces the high-frequency oscillations. In the limit of this process one obtains weak solutions. However, since the oscillations are reintroduced in a very non-unique way, in fact this generates many solutions from the same subsolution. In the next theorem we give a precise formulation of the previous discussion.

Theorem 5.4 (Subsolution criterion). Let $\bar{e} \in C(\mathbb{R}^n \times (0, T))$ and $(\bar{v}, R, \bar{p})$ be a smooth subsolution such that $2\bar{e} - |\bar{v}|^2 > 0$. Then there exist infinitely many weak solutions $v \in L^\infty_{loc}(\mathbb{R}^n \times (0, T))$ of the Euler equations such that
$$\tfrac{1}{2}|v|^2 = \bar{e}$$
almost everywhere. Infinitely many among these belong to $C((0, T), L^2)$.

This theorem corresponds to Proposition 2 of [21] (cp. with Theorem 2.4 of [22]). From it we derived quite severe counterexamples to the uniqueness of solutions to the Euler equations, even when imposing quite restrictive additional constraints.
6. The Nash–Kuiper Theorem and Gromov’s h-principle

The origin of convex integration lies in the famous Nash–Kuiper theorem. In this section we briefly recall some landmark results from the theory of isometric embeddings. Let $M^n$ be a smooth compact manifold of dimension $n \ge 2$, equipped with a Riemannian metric $g$. An isometric immersion of $(M^n, g)$ into $\mathbb{R}^m$ is a map $u \in C^1(M^n; \mathbb{R}^m)$ such that the induced metric $u^\sharp e$ agrees with $g$. In local coordinates this amounts to the system
\[ \partial_i u \cdot \partial_j u = g_{ij} \tag{6.1} \]
consisting of $\frac{n}{2}(n + 1)$ equations in $m$ unknowns. If in addition $u$ is injective, it is an isometric embedding. Assume for the moment that $g \in C^\infty$. The two classical theorems concerning the solvability of this system are:

(A) if $m \ge \frac{1}{2}(n + 2)(n + 3)$, then any short embedding can be uniformly approximated by isometric embeddings of class $C^\infty$ (Nash [40], Gromov [29]);

(B) if $m \ge n + 1$, then any short embedding can be uniformly approximated by isometric embeddings of class $C^1$ (Nash [39], Kuiper [36]).

Recall that a short embedding is an injective map $u : M^n \to \mathbb{R}^m$ such that the metric induced on $M$ by $u$ is shorter than $g$. In coordinates this means that
\[ (\partial_i u \cdot \partial_j u) \le (g_{ij}) \tag{6.2} \]
in the sense of quadratic forms. Thus, (A) and (B) are not merely existence theorems: they show that there exists a huge (essentially $C^0$-dense) set of solutions.
This type of abundance of solutions is a central aspect of Gromov’s h-principle, for which the isometric embedding problem is a primary example (see [26, 29]). There is a clear formal analogy between (6.1)–(6.2) and (1.1)–(5.1). First of all, note that the Reynolds stress measures the defect to being a solution of the Euler equations and it is in general a nonnegative symmetric tensor, whereas $g_{ij} - \partial_i u \cdot \partial_j u$ measures the defect to being isometric and, for a short map, is also a nonnegative symmetric tensor. More precisely, (6.1) can be formulated for the deformation gradient $A := Du$ as the coupling of the linear constraint $\operatorname{curl} A = 0$ with the nonlinear relation $A^T A = g$. In this sense short maps are “subsolutions” to the isometric embedding problem in the spirit of Definition 5.2. Along this line of thought, Theorem 5.4 is then the analogue for the Euler equations of the Nash–Kuiper result (B). Note, however, that strictly speaking the formal analogue of statement (B) would require replacing $L^\infty$ by $C^0$ in Theorem 5.4.

Statement (B) is rather surprising for two reasons. First of all, for $n \ge 3$ and $m = n + 1$, the system (6.1) is overdetermined. Moreover, for $n = 2$ we can compare (B) to the classical rigidity result concerning the Weyl problem: if $(S^2, g)$ is a compact Riemannian surface with positive Gauss curvature and $u \in C^2$ is an isometric immersion into $\mathbb{R}^3$, then $u$ is uniquely determined up to a rigid motion ([14, 31], see also [51] Chapter 12 for a thorough discussion). Thus it is clear that isometric immersions have a completely different qualitative behavior at low and high regularity (i.e. below and above $C^2$).

A strikingly similar phenomenon holds for the Euler equations since, when coupled with the energy constraint $|v|^2 = 2\bar e$, they are also formally overdetermined. Moreover, $C^1$ solutions of the Cauchy problem are unique. There are further analogies when we look at embeddings with Hölder regularity, as we will see in Section 8 below.
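The claim that (6.1) is overdetermined for $n \ge 3$ and $m = n + 1$ follows from a direct count of equations against unknowns. The short computation below is added as an illustration and uses only the count $\frac{n}{2}(n+1)$ stated after (6.1).

```latex
% Number of equations in (6.1) minus number of unknowns when m = n + 1:
\frac{n(n+1)}{2} - (n+1) \;=\; \frac{(n+1)(n-2)}{2} .
% This vanishes for n = 2 (three equations, three unknowns) and is
% strictly positive for every n \ge 3; for n = 3 there are
% 6 equations in only 4 unknowns.
```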
7. Continuous and Hölder dissipative solutions

In the paper [23] we succeeded in constructing the first example of a continuous dissipative solution of the Euler equations. More precisely, we can prove the following statement.

Theorem 7.1. Assume $e : [0, 1] \to \mathbb{R}$ is a positive smooth function. Then there is a continuous vector field $v : \mathbb{T}^3 \times [0, 1] \to \mathbb{R}^3$ and a continuous scalar field $p : \mathbb{T}^3 \times [0, 1] \to \mathbb{R}$ which solve (1.1) in the sense of distributions and such that
\[ e(t) = \int |v|^2(x, t)\, dx \qquad \forall t \in [0, 1]. \tag{7.1} \]
Moreover, in the more recent note [24] we have obtained a version of Theorem 7.1 which allows for a small Hölder exponent.

Theorem 7.2. There is $\theta \in\, ]0, \frac{1}{3}[$ with the following property. For every smooth positive function $e : \mathbb{S}^1 \to \mathbb{R}$ there is a vector field $v \in C^\theta(\mathbb{T}^3 \times \mathbb{S}^1, \mathbb{R}^3)$ and a scalar field $p \in C^\theta(\mathbb{T}^3 \times \mathbb{S}^1)$ which solve the incompressible Euler equations in the sense of distributions and such that
\[ e(t) = \int |v|^2(x, t)\, dx \qquad \forall t \in \mathbb{S}^1. \tag{7.2} \]

This is obviously the first indication that Onsager’s suggestion might indeed be correct. The construction in [23] is much more complicated and more surprising than the one in [20]. Note indeed that by a simple approximation argument continuous weak solutions of (1.1) satisfy the much stronger balance laws (1.2) and (1.3) for any $C^1$ open domain $\Omega$.

Clearly, Theorem 7.1 is not the $C^0$ counterpart of Theorem 5.4. The way the theorem is derived shares, however, several similarities with the Nash–Kuiper approach to the approximation of short maps with $C^1$ isometric embeddings. Indeed, Theorem 7.1 is achieved through an iteration procedure: the final product of this scheme can be seen as a superposition of infinitely many (perturbed) and weakly interacting Beltrami flows. Curiously, the idea that turbulent flows can be understood as a superposition of Beltrami flows was already proposed almost 30 years ago in the fluid dynamics literature: see the work of Constantin and Majda [16]. Along the iteration the maps will be subsolutions of the Euler equations in the sense of Definition 5.2. In what follows $S_0^{3\times 3}$ denotes the vector space of symmetric trace-free $3 \times 3$ matrices.

Definition 7.3. Assume $v, p, \mathring{R}$ are smooth functions on $\mathbb{T}^3 \times [0, 1]$ taking values, respectively, in $\mathbb{R}^3$, $\mathbb{R}$, $S_0^{3\times 3}$. We say that they solve the Euler–Reynolds system if
\[ \begin{cases} \partial_t v + \operatorname{div}(v \otimes v) + \nabla p = \operatorname{div} \mathring{R} \\ \operatorname{div} v = 0 . \end{cases} \tag{7.3} \]

Clearly, the tensor $-\mathring{R}$ is just the traceless part of the Reynolds stress $R$ introduced in (5.1). We are now ready to state the main proposition of [23], of which Theorem 7.1 is a simple corollary.

Proposition 7.4. Let $e$ be as in Theorem 7.1. Then there are positive constants $\eta$ and $M$ with the following property. Let $\delta \le 1$ be any positive number and $(v, p, \mathring{R})$ a solution of the Euler–Reynolds system (7.3) such that
\[ \frac{3\delta}{4}\, e(t) \le e(t) - \int |v|^2(x, t)\, dx \le \frac{5\delta}{4}\, e(t) \qquad \forall t \in [0, 1] \tag{7.4} \]
and
\[ \sup_{x,t} |\mathring{R}(x, t)| \le \eta\delta . \tag{7.5} \]
Then there is a second triple $(v_1, p_1, \mathring{R}_1)$ which also solves the Euler–Reynolds system and satisfies the following estimates:
\[ \frac{3\delta}{8}\, e(t) \le e(t) - \int |v_1|^2(x, t)\, dx \le \frac{5\delta}{8}\, e(t) \qquad \forall t \in [0, 1], \tag{7.6} \]
\[ \sup_{x,t} |\mathring{R}_1(x, t)| \le \frac{1}{2}\,\eta\delta , \tag{7.7} \]
\[ \sup_{x,t} |v_1(x, t) - v(x, t)| \le M\sqrt{\delta} \tag{7.8} \]
and
\[ \sup_{x,t} |p_1(x, t) - p(x, t)| \le M\delta . \tag{7.9} \]
Proof of Theorem 7.1. We start by setting $v_0 = 0$, $p_0 = 0$, $\mathring{R}_0 = 0$ and $\delta := 1$. We then apply Proposition 7.4 iteratively to obtain a sequence $(v_n, p_n, \mathring{R}_n)$ which solves (7.3) and such that
\[ \frac{3}{4}\,\frac{e(t)}{2^n} \le e(t) - \int |v_n|^2(x, t)\, dx \le \frac{5}{4}\,\frac{e(t)}{2^n} \qquad \text{for all } t \in [0, 1], \tag{7.10} \]
\[ \sup_{x,t} |\mathring{R}_n(x, t)| \le \frac{\eta}{2^n} , \tag{7.11} \]
\[ \sup_{x,t} |v_{n+1}(x, t) - v_n(x, t)| \le M \sqrt{\frac{1}{2^n}} , \tag{7.12} \]
\[ \sup_{x,t} |p_{n+1}(x, t) - p_n(x, t)| \le \frac{M}{2^n} . \tag{7.13} \]
Then $\{v_n\}$ and $\{p_n\}$ are both Cauchy sequences in $C(\mathbb{T}^3 \times [0, 1])$ and converge uniformly to two continuous functions $v$ and $p$. Similarly $\mathring{R}_n$ converges uniformly to $0$. Moreover, by (7.10)
\[ \int_{\mathbb{T}^3} |v|^2(x, t)\, dx = e(t) \qquad \forall t \in [0, 1] . \]
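The Cauchy property claimed in the proof follows from (7.12) and (7.13) by summing geometric series. The computation below spells this out as a sketch; the second index $m > n$ is introduced here only for illustration.

```latex
% Uniform Cauchy estimate for the velocities, for any m > n, from (7.12):
\sup_{x,t} |v_m - v_n|
  \le \sum_{k=n}^{m-1} \sup_{x,t} |v_{k+1} - v_k|
  \le M \sum_{k=n}^{\infty} 2^{-k/2}
  = \frac{M\, 2^{-n/2}}{1 - 2^{-1/2}}
  \longrightarrow 0 \quad (n \to \infty) .
% The same argument with (7.13) gives, for the pressures,
\sup_{x,t} |p_m - p_n| \le M \sum_{k=n}^{\infty} 2^{-k} = M\, 2^{1-n} .
% Energy identity: by (7.10) the energy gap is squeezed to zero,
0 \le e(t) - \int_{\mathbb{T}^3} |v_n|^2(x,t)\,dx
  \le \frac{5}{4}\,\frac{e(t)}{2^n} \longrightarrow 0 ,
% and the uniform convergence v_n \to v allows passing to the limit
% inside the integral.
```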
Passing to the limit in (7.3) we therefore conclude that $(v, p)$ solves (1.1).

The proof of Proposition 7.4 shares several similarities with Nash’s scheme. The most important one, common to all instances of the h-principle, is that the map $v_1$ is obtained by adding two perturbations to $v$:
\[ v_1 = v + w_o + w_c =: v + w , \]
where the leading term of the perturbation has the form
\[ w_o(x, t) = W(x, t, \lambda x, \lambda t) \tag{7.14} \]
with $W$ smooth and $\lambda$ very large. Thus $v_1$ is derived from $v$ by adding very fast oscillations. On the other hand there are several points where our method departs dramatically from Nash’s, due to some issues which are typical of the Euler equations and are not present for the isometric embeddings. We just highlight the two which are, in our opinion, the most relevant.

First of all, our scheme has to deal with a “transport term” which arises, roughly speaking, as the linearization of the first equation in (1.1). This term is typical of an evolution equation, whereas the equations for isometric embeddings are “static”. At first glance this transport term makes it impossible to use a scheme like the one of Nash to prove Theorem 7.1. To overcome this obstruction we need to introduce a phase function that acts as a kind of discrete Galilean transformation of the (stationary) Beltrami flows, and to introduce an “intermediate” scale along each iteration step on which this transformation acts.

Secondly, convex integration relies heavily on one-dimensional oscillations, the simple reason being that these can be “integrated”, hence the name convex integration. As already mentioned, the main building blocks of our iteration scheme are Beltrami flows, which are truly three-dimensional oscillations. The issue of going beyond one-dimensional oscillations has been raised by Gromov (p. 219 of [29]) as well as Kirchheim–Müller–Šverák (p. 52 of [33]), but as far as we know, there have been no such examples in the literature so far. In fact, it seems that with one-dimensional oscillations alone one cannot reach a proof of Proposition 7.4.
8. $C^{1,\alpha}$ isometric embeddings

The question of a sharp regularity threshold has been the object of investigation for the isometric embedding of surfaces as well (see for instance [29], [58]). Consider a smooth Riemannian 2-dimensional manifold $M = (S^2, g)$ with positive curvature. As already mentioned, the isometric embeddings of $M$ into $\mathbb{R}^3$ are rigid in the class $C^2$, whereas the h-principle holds for $C^1$. Borisov investigated embeddings of class $C^{1,\alpha}$ and proved the rigidity for $\alpha > \frac{2}{3}$ (as a culminating result of the investigations in [3, 4, 5, 6, 7]) and the local h-principle for $\alpha < \frac{1}{13}$ (although the latter was announced in 1965, see [8], a partial proof only appeared in 2004 [9]). In [17] we returned to this problem and gave a more modern PDE proof of the local h-principle for $\alpha < \frac{1}{7}$, together with more general statements in all dimensions (here by locality we mean that the h-principle holds in this form for Riemannian 2-dimensional manifolds diffeomorphic to $\mathbb{R}^2$: for purely technical reasons, when the topology is more complicated the proofs yield a lower threshold, cf. [17, Corollaries 1 and 2]).

The arguments of [3, 4, 5, 6, 7] for the rigidity when $\alpha > \frac{2}{3}$ are geometric but quite involved. A short proof of Borisov’s rigidity result was provided in [17]. Note that if $u \in C^3$ one can compute the area distortion of the Gauss map from the Riemann curvature tensor, which in turn depends only on the metric. When the curvature is positive, the image $u(M)$ is therefore locally convex. Even if the
metric $g$ is smooth, this is nonetheless false in general when the isometry is not regular enough, as shown precisely by the Nash–Kuiper theorem. However, by a result of Pogorelov (see [43] and [44]), the convexity of $u(M)$ holds even for $C^1$ maps $u$, provided one can show that the area distortion is always “positive” (compare with [17] for the exact definition). The theory developed by Borisov in [3, 4, 5, 6, 7] shows that this positivity holds when the isometric immersion $u$ is of class $C^{1,\frac{2}{3}+\varepsilon}$. In [17] we recover Borisov’s statement by expressing the equality between the Riemann curvature tensor and the area distortion of the Gauss map with a suitable integral formula. The latter resembles, in structure, the integral identity leading to the energy conservation for the Euler equations. Indeed our computations in [17] bear striking similarities with those used in [15] for proving the energy conservation of $C^{0,\frac{1}{3}+\varepsilon}$ solutions of Euler. In the case of isometric embeddings there does not seem to be a universally accepted critical exponent (see Problem 27 in [58]), even though $\frac{1}{2}$ and $\frac{1}{3}$ both seem relevant (compare with the discussion in [9] and with that in [22]).
References

[1] C. Bardos, J.-M. Ghidaglia, and S. Kamvissis, Weak convergence and deterministic approach to turbulent diffusion. Nonlinear wave equations (Providence, RI, 1998), Contemp. Math., Amer. Math. Soc. 263, 2000, 1–15.
[2] C. Bardos and E. S. Titi, Euler equations for an ideal incompressible fluid. Uspekhi Mat. Nauk 62(3) (375) (2007), 5–46.
[3] J. F. Borisov, The parallel translation on a smooth surface. I. Vestnik Leningrad. Univ. 13(7) (1958), 160–171.
[4] J. F. Borisov, The parallel translation on a smooth surface. II. Vestnik Leningrad. Univ. 13(19) (1958), 45–54.
[5] J. F. Borisov, On the connection between the spatial form of smooth surfaces and their intrinsic geometry. Vestnik Leningrad. Univ. 14(13) (1959), 20–26.
[6] J. F. Borisov, The parallel translation on a smooth surface. III. Vestnik Leningrad. Univ. 14(1) (1959), 34–50.
[7] J. F. Borisov, On the question of parallel displacement on a smooth surface and the connection of space forms of smooth surfaces with their intrinsic geometries. Vestnik Leningrad. Univ. 15(19) (1960), 127–129.
[8] J. F. Borisov, $C^{1,\alpha}$-isometric immersions of Riemannian spaces. Doklady 163 (1965), 869–871.
[9] J. F. Borisov, Irregular $C^{1,\beta}$-surfaces with analytic metric. Sib. Mat. Zh. 45(1) (2004), 25–61.
[10] A. Bressan and F. Flores, On total differential inclusions. Rend. Sem. Mat. Univ. Padova 92 (1994), 9–16.
[11] A. Cellina, On the differential inclusion $x' \in [-1, +1]$. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8) 69 (1-2) (1980), 1–6 (1981).
[12] E. Chiodaroli, A counterexample to well-posedness of entropy solutions to the compressible Euler system. Preprint (2012).
[13] A. J. Chorin, Vorticity and turbulence. Applied Mathematical Sciences 103, Springer-Verlag, New York, 1994.
[14] S. Cohn-Vossen, Zwei Sätze über die Starrheit der Eiflächen. Nachrichten Göttingen (1927), 125–137.
[15] P. Constantin, W. E, and E. S. Titi, Onsager’s conjecture on the energy conservation for solutions of Euler’s equation. Comm. Math. Phys. 165(1) (1994), 207–209.
[16] P. Constantin and A. Majda, The Beltrami spectrum for incompressible fluid flows. Comm. Math. Phys. 115(3) (1988), 435–456.
[17] S. Conti, C. De Lellis, and L. Székelyhidi, Jr., h-principle and rigidity for $C^{1,\alpha}$ isometric embeddings. In: Nonlinear partial differential equations, The Abel Symposium 2010, H. Holden and K. H. Karlsen, Eds. Springer-Verlag, 2012, 83–116.
[18] D. Cordoba, D. Faraco, and F. Gancedo, Lack of uniqueness for weak solutions of the incompressible porous media equation. Arch. Ration. Mech. Anal. 100(3) (2011), 725–746.
[19] B. Dacorogna and P. Marcellini, General existence theorems for Hamilton–Jacobi equations in the scalar and vectorial cases. Acta Math. 178 (1997), 1–37.
[20] C. De Lellis and L. Székelyhidi, Jr., The Euler equations as a differential inclusion. Ann. of Math. (2) 170(3) (2009), 1417–1436.
[21] C. De Lellis and L. Székelyhidi, Jr., On admissibility criteria for weak solutions of the Euler equations. Arch. Ration. Mech. Anal. 195(1) (2010), 225–260.
[22] C. De Lellis and L. Székelyhidi, Jr., The h-principle and the equations of fluid dynamics. To appear in Bull. of the Amer. Math. Soc.
[23] C. De Lellis and L. Székelyhidi, Jr., Continuous dissipative Euler flows. Preprint (2012).
[24] C. De Lellis and L. Székelyhidi, Jr., In preparation.
[25] R. J. DiPerna, Compensated compactness and general systems of conservation laws. Trans. Amer. Math. Soc. 292(2) (1985), 383–420.
[26] Y. Eliashberg and N. Mishachev, Introduction to the h-principle. Graduate Studies in Mathematics 48. American Mathematical Society, Providence, RI, 2002.
[27] G. L. Eyink, Energy dissipation without viscosity in ideal hydrodynamics. I. Fourier analysis and local energy transfer. Phys. D 78(3-4) (1994), 222–240.
[28] G. L. Eyink and K. R. Sreenivasan, Onsager and the theory of hydrodynamic turbulence. Reviews of Modern Physics 78 (2006).
[29] M. Gromov, Partial differential relations. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) 9, Springer-Verlag, Berlin, 1986.
[30] W. Heisenberg, On the theory of statistical and isotropic turbulence. Proceedings of the Royal Society of London. Series A 195 (1948), 402–406.
[31] G. Herglotz, Über die Starrheit der Eiflächen. Abh. Math. Semin. Hansische Univ. 15 (1943), 127–129.
[32] B. Kirchheim, Rigidity and Geometry of microstructures. Habilitation thesis, University of Leipzig, 2003.
[33] B. Kirchheim, S. Müller, and V. Šverák, Studying nonlinear PDE by geometry in matrix space. In: Geometric analysis and Nonlinear partial differential equations, S. Hildebrandt and H. Karcher, Eds. Springer-Verlag (2003), 347–395.
[34] A. N. Kolmogorov, The local structure of turbulence in incompressible viscous fluid for very large Reynolds’ numbers. C. R. (Doklady) Acad. Sci. URSS (N.S.) 30 (1941), 301–305.
[35] A. N. Kolmogorov, On the degeneracy of isotropic turbulence in an incompressible viscous fluid. C. R. (Doklady) Acad. Sci. URSS (N.S.) 31 (1941), 538–540.
[36] N. Kuiper, On $C^1$ isometric imbeddings I, II. Proc. Kon. Acad. Wet. Amsterdam A 58 (1955), 545–556, 683–689.
[37] P. D. Lax, Deterministic theories of turbulence. In: Frontiers in pure and applied mathematics, North-Holland, Amsterdam (1991), 179–184.
[38] S. Müller and V. Šverák, Convex integration for Lipschitz mappings and counterexamples to regularity. Ann. of Math. (2) 157(3) (2003), 715–742.
[39] J. Nash, $C^1$ isometric imbeddings. Ann. Math. 60 (1954), 383–396.
[40] J. Nash, The imbedding problem for Riemannian manifolds. Ann. Math. 63 (1956), 20–63.
[41] L. Onsager, The distribution of energy in turbulence. Phys. Rev. 68 (1945), 286.
[42] L. Onsager, Statistical hydrodynamics. Nuovo Cimento (9), 6 Supplemento, 2 (Convegno Internazionale di Meccanica Statistica) (1949), 279–287.
[43] A. V. Pogorelov, The rigidity of general convex surfaces. Doklady Acad. Nauk SSSR 79 (1951), 739–742.
[44] A. V. Pogorelov, Extrinsic geometry of convex surfaces. Translations of Mathematical Monographs, Vol. 35. American Mathematical Society, Providence, R.I., 1973.
[45] L. Prandtl, Über ein Formelsystem für die ausgebildete Turbulenz. Nachr. Akad. Wiss. Göttingen (Math. Phys. Kl.), Vol. IIA (1945).
[46] V. Scheffer, An inviscid flow with compact support in space-time. J. Geom. Anal. 3(4) (1993), 343–401.
[47] A. Shnirelman, On the nonuniqueness of weak solution of the Euler equation. Comm. Pure Appl. Math. 50(12) (1997), 1261–1286.
[48] A. Shnirelman, Weak solutions with decreasing energy of incompressible Euler equations. Comm. Math. Phys. 210(3) (2000), 541–603.
[49] R. Shvydkoy, Lectures on the Onsager conjecture. Discrete Contin. Dyn. Syst. Ser. S 3 3 (2010), 473–496.
[50] R. Shvydkoy, Convex integration for a class of active scalar equations. J. Amer. Math. Soc. 24(4) (2011), 1159–1174.
[51] M. Spivak, A comprehensive introduction to differential geometry. Vol. V. 2nd ed. Berkeley: Publish or Perish, Inc. (1979).
[52] L. Székelyhidi, Jr., Weak solutions to the incompressible Euler equations with vortex sheet initial data. C. R. Acad. Sci. Paris Sér. I Math. 349 (19–20) (2011), 1063–1066.
[53] L. Székelyhidi, Jr., Relaxation of the incompressible porous medium equation. To appear in Ann. Sci. Ec. Norm. Sup. (2012).
[54] L. Székelyhidi, Jr. and E. Wiedemann, Young measures generated by ideal incompressible fluid flows. To appear in Arch. Rat. Mech. Anal. (2012).
[55] L. Tartar, Compensated compactness and applications to partial differential equations. In: Nonlinear analysis and mechanics: Heriot–Watt Symposium, Vol. IV, Res. Notes in Math. vol. 39, Pitman, Boston (1979), 136–212.
[56] C. F. von Weizsäcker, Das Spektrum der Turbulenz bei grossen Reynoldsschen Zahlen. Z. Phys. 124 (1958), 614–627.
[57] E. Wiedemann, Existence of weak solutions for the incompressible Euler equations. Ann. Inst. H. Poincaré Anal. Non Linéaire 28(5) (2011), 727–730.
[58] S. T. Yau, Open problems in geometry. Differential geometry: partial differential equations on manifolds (Los Angeles, CA, 1990), Proc. Sympos. Pure Math. vol. 54, Amer. Math. Soc., Providence, RI (1993), 1–28.
Camillo De Lellis, Institut für Mathematik, Universität Zürich, CH-8057 Zürich, Switzerland
László Székelyhidi, Institut für Mathematik, Universität Leipzig, D-04103 Leipzig
Persistent homology: theory and practice
Herbert Edelsbrunner∗ and Dmitriy Morozov†

∗ This research is partially supported by NSF under grant DBI-0820624, by ESF under the Research Networking Programme, and by the Russian Government Project 11.G34.31.0053.
† This work is partially supported by the DOE Office of Science ASCR under award number KJ0402-KRD047 and contract number DE-AC02-05CH11231.

Abstract. Persistent homology is a recent grandchild of homology that has found use in science and engineering as well as in mathematics. This paper surveys the method as well as the applications, neglecting completeness in favor of highlighting ideas and directions.

2010 Mathematics Subject Classification. Primary 55N99; Secondary 68W30.

Keywords. Algebraic topology, homology groups, distance, stability, algorithms; scale, shape analysis, topology repair, high-dimensional data.

1. Introduction

Built on a sequence of spaces and the corresponding homology groups with homomorphisms between them, persistence assesses the interval within which a homology class contributes. Among other situations, this ability is useful when a space is not fixed but depends on the scale of the observation, which is a common scenario in the sciences. After a brief review of the historical development, we sketch characteristics of the method.

History. Like many other concepts in mathematics, persistent homology has a beginning but also a historical root system that comes into sight when we increase the resolution of the inquiry. This is precisely what persistent homology does for a much more general class of spaces: it synthesizes the different views as aspects into a single consistent reality that spans a range of scales. We mention three main historical tracks in the root system of persistent homology. In 1990, Patrizio Frosini and collaborators introduced size functions, a formalism that is equivalent to 0-dimensional persistent homology [24]. The main direction of the ensuing work is on shape analysis and its applications in computer vision and medical imaging; see a recent survey [4]. In 1999, Vanessa Robins studied the homology of sampled spaces and described the images of homomorphisms induced by inclusions as persistent homology groups [39]. In 2000, Edelsbrunner, Letscher and Zomorodian independently introduced persistent homology together with a fast algorithm and the diagram [22], as we will discuss later. Both of these works were inspired by the computational notion of alpha shapes [21, 23] and the related Betti number algorithm [16]. Within mathematics, there is a distinct relationship with spectral sequences, originally introduced in 1946 by Jean Leray [34]. Motivated by its significant applications, persistent homology has found repeated exposure in the popular mathematics literature [5, 27, 45] and features prominently in a recent text on computational topology [20].

Perspectives. In a nutshell, persistent homology expands the relationship between a topological space and its homology groups to that between a function and its persistence diagram. The latter relationship gives rise to a rich theory which invites different perspectives if studied with different mind-sets.

Mathematics: We see an extension of the algebraic theory of homology forming a bridge to measure theory. The extension is inspired by Morse theoretic reasoning taken to the algebraic level of homology groups connected by maps.

Computation: The persistent homology groups are computed by reducing the boundary matrices of complexes. Indeed, all algebraic relationships have parallels in the matrix representation.

Applications: The matrices give fast algorithms and the algebra leads to scale-dependent measurements of spaces. Importantly, these measurements are stable, they can be used to compare and analyze shapes, and they can be exploited to repair faulty topology.

Depending on the interest, we focus on different aspects of the method. We present the material in two main sections, first explaining the theoretical framework of persistent homology, and second sketching four applications selected to highlight different aspects of the theory.
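To make the "Computation" perspective concrete, here is a minimal sketch of a left-to-right column reduction of a boundary matrix with coefficients in Z/2. It is an assumption about the kind of matrix reduction meant, not code from the paper: the function name `reduce_boundary_matrix`, the encoding of columns as sets of row indices, and the choice of Z/2 are all hypothetical choices made for this illustration. Simplices are assumed to be numbered in the order in which they enter the filtration.

```python
def reduce_boundary_matrix(columns):
    """Reduce a Z/2 boundary matrix column by column, left to right.

    columns[j] is the set of indices of the codimension-one faces of the
    j-th simplex (hypothetical encoding; simplices are indexed in the
    order in which they enter the filtration).

    Returns (pairs, essential): each pair (i, j) says that the class
    created when simplex i enters is destroyed when simplex j enters;
    essential lists the creators whose class is never destroyed.
    """
    columns = [set(c) for c in columns]   # work on copies
    owner_of_low = {}                     # lowest nonzero row -> column index
    pairs = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is unique.
        while col and max(col) in owner_of_low:
            col ^= columns[owner_of_low[max(col)]]
        if col:                           # nonzero column: simplex j destroys a class
            low = max(col)
            owner_of_low[low] = j
            pairs.append((low, j))
    paired = {index for pair in pairs for index in pair}
    essential = [j for j in range(len(columns)) if j not in paired]
    return pairs, essential


# Filtered triangle: vertices 0, 1, 2, then edges 3 = {0,1}, 4 = {1,2},
# 5 = {0,2}, then the 2-simplex 6 with boundary edges {3, 4, 5}.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs, essential = reduce_boundary_matrix(triangle)
print(pairs)      # [(1, 3), (2, 4), (5, 6)]
print(essential)  # [0]
```

In this example the two later vertices are merged into the first component by edges 3 and 4, edge 5 creates a loop that the triangle fills, and the component of vertex 0 is never destroyed.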
2. The theory

In this section, we discuss the mathematical and computational representation of topological spaces, the algebra obtained by applying the homology functor, the implications of this construction to measuring topology, and algorithms that compute the groups in this algebra.

2.1. Spaces and functions. Persistent homology is applied to a filtered space or, equivalently, to the sequence of sublevel sets of a real-valued function on this space. We discuss how to extract both ingredients, a space and a function, from different kinds of data.

Data. Sometimes the data already comes as a real-valued function, such as digital images. They are usually laid out on a regular integer grid, in which every cell records the locally averaged intensity value of the measured light field. Most common are 2D images in which the cells are squares, but also time-series of 2D images and 3D images are widely used. Indeed, digital images form one of the
most important classes of data as they are inexpensive to acquire and they probe nature in exquisite detail.

Another prevailing form of input data are point clouds, finite subsets of some ambient space, most often Euclidean. Each point represents a sequence of measurements of an individual in a population. We typically want to understand the overall shape of the cloud, for example, by measuring the topology of the space we get by thickening each point to a ball and taking the union. Equivalently, we may introduce the distance function that maps each point of the ambient space to its distance from the nearest data point. Letting $\alpha$ be the radius of the balls, we get the union as the sublevel set, defined as the set of points with function value at most $\alpha$. A crucial property of this construction is its stability: if the input data follows an underlying law that appears as a shape in the ambient space, then the function we construct is close to the distance function defined by that shape and thus facilitates the study of the latter.

A third class of input data are shapes, subsets of ambient space that satisfy regularity conditions of one kind or another. A common subclass consists of surfaces in $\mathbb{R}^3$, e.g., obtained by collecting points on the boundary of a solid object with a 3D scanner and connecting the points to a surface by interpolation. As in the point cloud case, the function is typically constructed in a second step, perhaps to highlight or define features of the shape, such as protrusions or cavities. In the case of a surface, popular such functions are the mean and the Gaussian curvature, as well as the eccentricity [28]. There are plenty of other possibilities, with special constructions for special purposes, such as the elevation function defined in terms of the persistent homology of the 2-parameter family of height functions in $\mathbb{R}^3$ [1].

Complexes. Following a long-standing tradition in topology, we work with complexes to represent continuous spaces.
Common examples are CW-, cubical and simplicial complexes, to name a few. Cubical complexes have already been mentioned as the basis of digital images. They consist of cubes of various dimensions, with the requirement that with every $p$-cube, the complex also contains the $2p$ $(p-1)$-cubes that are its faces. Significant improvements in the efficiency of computations are gained if we store cubical subdivisions hierarchically, such as in quad- and oct-trees [40]. The CW- and simplicial complexes are extreme examples on opposite ends of the spectrum. The CW-complexes allow for complicated cells glued to each other in complicated ways and thus facilitate representations of spaces with only a few cells. In contrast, all cells in a simplicial complex are simplices, and any two are glued along a single shared face or not at all. In spite of the frequently required large number of simplices, the local simplicity of these complexes lends itself to efficient computations.

Every simplicial complex has an abstract and a geometric side, and it is useful to fully exploit both. Take, for example, the nerve of a finite collection of convex sets, that is, the system of subcollections with non-empty common intersection. This is an abstract simplicial complex since every collection $U$ in the nerve implies the membership of the subsets of $U$. A particularly useful collection of convex sets are
the Voronoi cells of a finite set of points in $\mathbb{R}^n$ [43]. Assuming general position, the maximum number of Voronoi cells with non-empty common intersection is $n + 1$. In this case, the nerve has a natural geometric realization, known as the Delaunay triangulation of the points in $\mathbb{R}^n$ [15]. Specifically, for each $U$ in the nerve of the Voronoi cells, the Delaunay triangulation contains the convex hull of the points whose Voronoi cells are in $U$. This simplicial complex supports computations of the Euclidean distance function defined by the points. Indeed, the sublevel set of a threshold $\alpha > 0$ is a union of balls of radius $\alpha$, one around each point. Intersecting each ball with the corresponding Voronoi cell gives another collection of convex sets, and its nerve is isomorphic to a subsystem of the nerve of the Voronoi cells. Its geometric realization is known as the $\alpha$-complex [21, 23], which is, of course, a subcomplex of the Delaunay triangulation.

There are many situations in which the Delaunay triangulation is not defined, or we cannot afford to compute it. A popular alternative is the Vietoris–Rips complex, which exists whenever we have the distances between pairs of points. Given a threshold $a > 0$, this complex contains a simplex spanned by $p + 1$ points iff every two of these points are at distance at most $a$ from each other. Equivalently, the Vietoris–Rips complex for parameter $a$ is the flag complex built on the set of edges with length at most $a$.

2.2. Algebra. The classic theory of homology maps a topological space to an abelian group which, in the case of coefficients in a field, is a vector space. Having a filtered space, we get a sequence of vector spaces, together with linear maps induced by inclusion. This is the basic set-up for persistent homology, which we now describe.

Homology. The theory of homology is a classic subject within algebraic topology, which is described in most of the standard texts, including Munkres [38] and Hatcher [29].
The construction begins with a chain group, $C_p$, whose elements are the $p$-chains, which for a given complex are formal sums of the $p$-dimensional cells. The boundary homomorphism, $\partial_p : C_p \to C_{p-1}$, maps each $p$-chain to the sum of the $(p-1)$-dimensional faces of its $p$-cells, which is a $(p-1)$-chain. Writing the groups and maps in sequence, we get the chain complex:
\[ \ldots \xrightarrow{\partial_{p+2}} C_{p+1} \xrightarrow{\partial_{p+1}} C_p \xrightarrow{\partial_p} C_{p-1} \xrightarrow{\partial_{p-1}} \ldots \tag{1} \]
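The boundary homomorphism just described can be written down as a matrix once the cells are listed. The sketch below does this for a simplicial complex with coefficients in Z/2; the helper names `boundary_matrix` and `multiply_mod2`, the representation of simplices as sorted tuples of vertex labels, and the choice of Z/2 are assumptions made for this illustration. As a sanity check it verifies on a single triangle that two consecutive boundary maps compose to zero.

```python
from itertools import combinations


def boundary_matrix(p_simplices, faces):
    """Z/2 matrix of the boundary map from p-chains to (p-1)-chains.

    Rows are indexed by `faces` (the (p-1)-simplices), columns by
    `p_simplices`. Simplices are sorted tuples of vertex labels
    (a hypothetical encoding chosen for this sketch).
    """
    row_of = {face: i for i, face in enumerate(faces)}
    matrix = [[0] * len(p_simplices) for _ in faces]
    for j, simplex in enumerate(p_simplices):
        # The faces of a p-simplex are obtained by dropping one vertex.
        for face in combinations(simplex, len(simplex) - 1):
            matrix[row_of[face]][j] = 1
    return matrix


def multiply_mod2(a, b):
    """Product of two 0/1 matrices with entries reduced modulo 2."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % 2
             for j in range(len(b[0]))]
            for i in range(len(a))]


# A single filled triangle with vertices 0, 1, 2.
vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
triangles = [(0, 1, 2)]

d1 = boundary_matrix(edges, vertices)    # 3 x 3 matrix
d2 = boundary_matrix(triangles, edges)   # 3 x 1 matrix
print(multiply_mod2(d1, d2))             # [[0], [0], [0]]
```

Each vertex of the triangle lies on exactly two of its edges, so every entry of the product is even, which is why the composition vanishes modulo 2.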
The kernels and the images of the boundary homomorphisms are the cycle and the boundary groups. A fundamental property of the boundary homomorphism is that its square is zero, $\partial_p \circ \partial_{p+1} = 0$. Therefore, for every $p$, the boundaries form a subgroup of the cycles, and we can take the quotient, which gives a group whose elements are classes of homologous cycles. This is the $p$-th homology group, denoted as $H_p$, where $p$ is again the dimension. We assume coefficients in a field, $\mathbb{F}$, so that $H_p = \mathbb{F} \oplus \mathbb{F} \oplus \ldots \oplus \mathbb{F} = \mathbb{F}^{\beta_p}$ is a vector space over $\mathbb{F}$, with $\beta_p = \operatorname{rank} H_p$ known as the $p$-th Betti number. For a topological space, $X$, we write $H_p(X)$ and $\beta_p(X)$ for its $p$-th homology group and Betti number. They are defined for every integer, $p$, but if the dimension of $X$ is $n$, then the only possibly non-trivial homology groups
are for 0 ≤ p ≤ n. Accordingly, we have $\beta_p = 0$ unless 0 ≤ p ≤ n. To simplify the notation, we will often suppress the dimension and write $H(X) = \bigoplus_p H_p(X)$ for the direct sum. Let $X_0 \subseteq X$ be a topological subspace. Every cycle in $X_0$ is also a cycle in $X$, although it may be trivial in the latter without being trivial in the former. The inclusion of $X_0$ in $X$ induces a linear map on the homology groups, $\varphi : H(X_0) \to H(X)$. We will also consider the pair of spaces, $(X, X_0)$, whose (relative) homology is obtained by identifying cycles that differ only inside $X_0$. We have again a linear map induced by inclusion, $\psi : H(X) \to H(X, X_0)$. Furthermore, there is a third linear map, $D : H(X, X_0) \to H(X_0)$, such that
$$\ldots \xrightarrow{D_{p+1}} H_p(X_0) \xrightarrow{\varphi_p} H_p(X) \xrightarrow{\psi_p} H_p(X, X_0) \xrightarrow{D_p} H_{p-1}(X_0) \xrightarrow{\varphi_{p-1}} \ldots \tag{2}$$
is exact, by which we mean that the image of every map is the kernel of the next map. This particular sequence is the exact sequence of the pair $(X, X_0)$. It is a compact expression of how the relative homology of the pair is related to the (absolute) homology of the two spaces.

Filtrations. The basic set-up for persistent homology consists of a filtered space, a nested sequence of subspaces that begins with the empty and ends with the complete space [22, 48]. Writing $\emptyset = X_0 \subseteq X_1 \subseteq \ldots \subseteq X_m = X$, we apply the homology functor, which for each space gives a vector space and for each inclusion gives a linear map:
$$0 = H(X_0) \to H(X_1) \to \ldots \to H(X_m) = H(X), \tag{3}$$
referring to this sequence as a persistence module. It is instructive to split the module into indecomposable summands of the form 0 → F → . . . → F → 0, where every nonzero map is the identity. There is a unique such decomposition whose direct sum gives the original module. Each summand can be interpreted as the birth of a homology class at its first non-zero term and the death of the same class right after its last non-zero term. More precisely, the summand represents an entire coset of classes that are born and die together, but we prefer to simplify language by talking about generators. It should be clear that the module above is not necessarily exact. In fact, it is exact iff each summand is of the form 0 → F → F → 0, consisting of precisely two non-zero terms. Of particular significance is the length of a summand, which measures the duration of the corresponding class. We refer to it as the persistence of the homology class. When a filtration results from a function, we often define persistence not as the number of non-zero terms but rather as the absolute difference between the function values at the birth and the death. A related concept are the persistent homology groups, which are the images under the composition of the linear maps. For example, the image of H(Xi ) in H(Xj ) is such a group, and its rank is the number of indecomposable summands whose births happen at or before H(Xi ) and whose deaths happen after H(Xj ). Since (3) ends with a possibly non-trivial group, some homology classes may never die. We set the value at the death to ∞, but doing so deprives us of a
meaningful measure of the duration of such a class. Alternatively, we may add relative homology classes constituting a second pass:
$$0 = H(X_0) \to \ldots \to H(X_m) \to H(X, X^m) \to \ldots \to H(X, X^0) = 0, \tag{4}$$
where $X^i$ is the closure of $X - X_i$. This is the extended persistence module as introduced in [12]. Decompositions into summands, births, and deaths are defined as before. Now every class that is born also dies. We distinguish between three kinds: classes that are born and die during the first pass, classes that are born during the first pass and die during the second pass, and classes that are born and die during the second pass. The second kind comprises all classes of the entire space, $X$, which are precisely the ones that were born but did not die in (3).

Beyond homology groups, the above decomposition holds for any linear sequence of vector spaces. In this context, we note the connection between persistence and quiver representations observed in [7]. A fundamental result for quivers states that the orientation of maps between vector spaces does not affect the structure of the indecomposable summands [17]. This implies that a module can be replaced by a sequence in which any two contiguous vector spaces are connected by a map, either from left to right or from right to left. Such generalized sequences, referred to as zigzag modules, elucidate the relationship between the extended persistence and the homology of interlevel sets of scalar functions [8].
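Returning to the decomposition of the module (3) into summands, the bookkeeping of births and deaths can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each summand is stored as a pair (birth, death) of filtration indices, where death is the index right after the last non-zero term and `inf` marks a class that never dies. The function names are hypothetical and not taken from any library.

```python
from math import inf

def betti(intervals, i):
    """Rank of H(X_i): summands that are non-zero at index i."""
    return sum(1 for birth, death in intervals if birth <= i < death)

def persistent_rank(intervals, i, j):
    """Rank of the image of H(X_i) in H(X_j), for i <= j.

    Counts summands born at or before index i that die after index j.
    """
    if i > j:
        raise ValueError("expected i <= j")
    return sum(1 for birth, death in intervals if birth <= i and j < death)

def persistence(interval):
    """Length of a summand, measured in filtration indices."""
    birth, death = interval
    return death - birth

# Hypothetical module with three summands; the first class never dies.
intervals = [(1, inf), (2, 4), (3, 5)]
```

With these three summands, `betti(intervals, 3)` is 3, while `persistent_rank(intervals, 2, 4)` is 1, because only the first class is born by index 2 and still alive at index 4.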
2.3. Measuring. The persistence of a homology class is the length of the interval that supports it. The connection to applications is that the persistence measurement carries useful information about spaces, functions, and data. A particularly useful property of this measurement is its stability under perturbations of the function, as we explain in this section.
Figure 1. Left: the height function with six critical points on a topological sphere. We also show five interleaving level sets and highlight one sublevel set. Right: the persistence diagram with two finite and two infinite points. The wedge anchored at (x, 0) contains two points labeled 0 and one point labeled 1, implying that the highlighted sublevel set has β0 = 2 and β1 = 1. Below: the barcode representation of the same information. Axis labels: height (left); death+birth and death−birth = persistence (right).
Persistence diagrams. The splitting into indecomposable summands suggests a combinatorial representation as a multi-set of points in the extended 2-dimensional plane. Let $f : X \to \mathbb{R}$ be continuous. With a minor modification of the original construction, we build this multi-set by adding a copy of the point $\tfrac{1}{2}(y + x, y - x)$ for each summand with birth at $x$ and death at $y$; see Figure 1. We refer to this multi-set as the persistence diagram of the function $f$, denoting it as $\mathrm{Dgm}(f)$, or as $\mathrm{Dgm}_p(f)$ if it is restricted to classes of dimension $p$. Instead of the points in the plane, we sometimes draw the intervals defined such that a point $u \in \mathrm{Dgm}(f)$ is contained in a wedge anchored at $(x, 0)$ iff $x$ is contained in the interval of $u$; see Figure 1. The multi-set of such intervals is the barcode of the function $f$. Using this wedge, we can determine the Betti numbers of the corresponding sublevel set simply by counting the points of the diagram it contains. More generally, the points of the diagram contained in the wedge anchored at $\tfrac{1}{2}(y + x, y - x)$ determine the rank of the persistent homology group defined by the inclusion of the sublevel set for $x$ in the sublevel set for $y$. Some modifications are in order if we substitute the extended module (4) for the ordinary module (3). Since we get each value twice, we get a multi-set in a double covering of the plane. As a benefit of the complication, we can read the ranks of the persistent homology groups of all sublevel and superlevel sets as well as of all level and interlevel sets of the function; see [8].

Stability. To compare the diagrams of two functions, $f, g : X \to \mathbb{R}$, we may use the Wasserstein distance between them, which is defined as the q-th root of the infimum, over all matchings between the points, of the sum of q-th powers of edge lengths:
$$W_q(\mathrm{Dgm}(f), \mathrm{Dgm}(g)) = \left[ \inf_{\gamma} \sum_{u \in \mathrm{Dgm}(f)} \| u - \gamma(u) \|_\infty^q \right]^{1/q}, \tag{5}$$
where $q$ is a positive real number; see e.g. [42]. In the limit, for $q$ going to infinity, we get the bottleneck distance, which is the length of the longest edge in the best matching. For these definitions to make sense, we add infinitely many copies of every point on the horizontal axis to the diagrams; they guarantee that there are bijections between the multi-sets. An important property of the bottleneck distance is its stability with respect to perturbations. Specifically, we have
$$W_\infty(\mathrm{Dgm}(f), \mathrm{Dgm}(g)) \le \| f - g \|_\infty, \tag{6}$$
whenever f and g are both tame, by which we mean that they have only finitely many critical values, and all sublevel sets have finite rank homology groups. This is the Bottleneck Stability Theorem first proved in [11]. A word of caution is in order: (6) implies that the critical value pairs that define the points in the diagram are stable, but it does not imply that the critical values or the critical points are stable. In fact, they are not. While the bottleneck distance leads to a very general stability result, it has drawbacks in practice because it is sensitive to only the worst edge in the best
matching. The other Wasserstein distances do not imply stability for quite as general a class of functions, but they do so for interesting classes, such as for Lipschitz functions [13]. A different extension of the stability result, from tame functions to parametrized families of vector spaces, appears in [10].

2.4. Computation. An alternative to the algebraic description of homology based on chain complexes is the computational description based on boundary matrices. The algorithms form the bridge that connects the rich field of algebraic topology with applications, as discussed in Section 3.

Matrices and ranks. The p-th boundary matrix, denoted as $D_p$, is a computationally convenient representation of the p-th boundary homomorphism, $\partial_p$. Its columns are indexed by the p-dimensional cells, its rows by the (p − 1)-dimensional cells, and $D_p[i, j]$ stores the coefficient of the i-th (p − 1)-cell in the boundary of the j-th p-cell. Recall that a p-chain is a formal sum of p-cells. Writing it as a column vector, $c$, we can multiply with the matrix to get its boundary, $D_p c$, again written as a column vector. By construction, the column space of $D_p$ is isomorphic to the group of (p − 1)-boundaries. Similarly, the null space of $D_p$ is isomorphic to the group of p-cycles. Since the p-th homology group is the quotient of the p-th cycle group over the p-th boundary group, we get its rank as the dimension of the null space of $D_p$ minus the dimension of the column space of $D_{p+1}$. To compute these dimensions, we put the boundary matrices into normal form in which an initial segment of the diagonal contains 1's while the rest of the matrix is zero. To do this, we use elementary row and column operations:
1. exchange two rows or two columns;
2. add a row to another row or a column to another column;
3. multiply a row or a column with a non-zero coefficient from the field.
Similar to Gauss-Jordan elimination, we apply these operations to move a 1 to the upper-left corner and to zero out its row and its column. The normal form is then completed by recursing on the smaller matrix obtained by removing the lead row and the lead column. Of course, the recursion halts when the remaining matrix is empty or zero. The row operations can be summarized by multiplying the boundary matrix from the left, and the column operations by multiplying from the right. This gives $N_p = U_{p-1} D_p V_p$, where $N_p$ is the matrix in normal form, which provides all the information we need:
$$\begin{aligned} z_p &= \#\text{zero columns in } N_p &&= \text{rank of null space}; \\ b_p &= \#\text{non-zero columns in } N_{p+1} &&= \text{rank of column space}; \\ \beta_p &= z_p - b_p &&= \text{rank of homology group}. \end{aligned}$$
The auxiliary matrices, $U_{p-1}$ and $V_p$, provide additional information, which is sometimes useful. In particular, the last $z_p$ columns of $V_p$ give a basis of the p-th cycle group, and the first $b_{p-1}$ columns of the inverse, $U_{p-1}^{-1}$, give a basis of the (p − 1)-st boundary group.
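The rank computation described above can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that are not part of the text: coefficients are taken in the two-element field, and each boundary matrix is stored as a dense list of 0/1 rows. Instead of producing the full normal form, the sketch only computes ranks by elimination, which suffices for the Betti numbers. The names `rank_gf2` and `betti_numbers` are hypothetical.

```python
def rank_gf2(matrix):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    rows = [row[:] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Look for a row at or below position `rank` with a 1 in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Zero out the column in all other rows (addition mod 2 is XOR).
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def betti_numbers(boundary, num_cells):
    """Betti numbers from boundary matrices.

    boundary[p] is D_p for p = 1..n (rows: (p-1)-cells, columns: p-cells);
    num_cells[p] is the number of p-cells for p = 0..n.
    """
    n = len(num_cells) - 1
    ranks = {p: rank_gf2(boundary[p]) for p in range(1, n + 1)}
    result = []
    for p in range(n + 1):
        z_p = num_cells[p] - ranks.get(p, 0)  # dimension of null space of D_p
        b_p = ranks.get(p + 1, 0)             # dimension of column space of D_{p+1}
        result.append(z_p - b_p)
    return result

# Boundary of a triangle: rows are vertices a, b, c; columns are edges ab, bc, ca.
D1 = [[1, 0, 1],
      [1, 1, 0],
      [0, 1, 1]]
```

For the hollow triangle, `betti_numbers({1: D1}, [3, 3])` gives one component and one loop; adding the 2-cell with boundary matrix `[[1], [1], [1]]` fills the loop.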
Preserving order. We can do more with less: we can compute homology as well as persistence while stopping short of reducing the boundary matrix to normal form. To describe how this works, we put all boundary information into a single matrix, D. We assume that the topological space is constructed one cell at a time, making sure that each cell is preceded by its faces. Denoting the corresponding ordering of the cells by σ1 , σ2 , . . . , σm , D[i, j] is the coefficient of σi in the boundary of σj . We reduce D with a subset of the column operations, refraining from exchanging columns and adding columns from left to right. The algorithm pays special attention to the lowest non-zero entry in each column, which we may assume is 1. If all lowest 1s appear in distinct rows, then the matrix is reduced. To get D into this form, we iterate through the columns from left to right, reducing each column by subtracting multiples of conflicting preceding columns. The greedy nature of the process ensures that the resulting matrix is reduced. As before, we can express the operations as a multiplication with another matrix: R = DV , where R is reduced and V is invertible and upper-triangular. While this decomposition is not unique, the lowest 1s in R are unique [14]. We get all information from their number and their locations within the reduced matrix. Similarly as before, the number of zero columns that belong to p-cells is the rank of the p-th cycle group, and the number of lowest 1s in columns of p-cells is the rank of the (p − 1)-st boundary group. But we can extract more: • adding σj gives birth to a homology class iff column j of R is zero; • in contrast, adding σj kills a homology class iff column j is non-zero; letting R[i, j] be its lowest 1, σj kills the class born with the addition of σi . If non-zero, column j of R contains a cycle representative of the dying class; it is the boundary of the chain in column j of V . 
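The left-to-right reduction just described can be sketched as follows. The sketch assumes coefficients in the two-element field, so that subtracting a column is a symmetric difference, and it stores each column of D as the set of its non-zero row indices. Cells are indexed from 0 rather than from 1, and the function name is hypothetical.

```python
def reduce_boundary_matrix(columns):
    """Reduce D column by column, from left to right, over GF(2).

    columns[j] is the set of row indices i with D[i, j] = 1, where the
    cells are ordered so that every cell is preceded by its faces.
    Returns (pairs, essential): pairs holds (i, j) with cell j killing
    the class born with cell i; essential lists the cells whose classes
    never die.
    """
    reduced = [set(col) for col in columns]
    owner = {}   # row index of a lowest 1 -> column that contains it
    pairs = []
    for j, col in enumerate(reduced):
        # Add preceding columns as long as the lowest 1 conflicts with one.
        while col and max(col) in owner:
            col ^= reduced[owner[max(col)]]
        if col:
            i = max(col)
            owner[i] = j
            pairs.append((i, j))
    killed = {i for i, _ in pairs}
    essential = [j for j, col in enumerate(reduced)
                 if not col and j not in killed]
    return pairs, essential

# Filled triangle with cells ordered a, b, c, ab, bc, ca, abc.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
```

On this example the edges ab and bc kill the components born with b and c, the edge ca gives birth to a loop that the triangle abc kills, and the component born with a survives.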
To prove these relationships, we assume again that $R[i, j]$ is the lowest 1 in column $j$. The representative cycle of the dying class thus contains $\sigma_i$, which implies that it did not exist before the addition of $\sigma_i$. All classes born before the addition of $\sigma_i$ cannot die with the addition of $\sigma_j$, else we could prove inductively that $R$ is not yet reduced. The above greedy algorithm is due to [22]. Its running time is the total squared persistence of the filtration. In the worst case, it is proportional to $m^3$ but shows significantly better performance in practice. The worst-case time can be improved to $m^\omega$ [37], where $\omega = 2.372\ldots$ is the currently best upper bound on the exponent of matrix multiplication [46]; see also Strassen [41] for a milestone paper in the sequence of improvements. These algorithms work for arbitrary boundary matrices, while we can sometimes exploit special structure to get faster algorithms. For example, if our space is a 2-manifold, we can use Poincaré duality and limit the computation to 0-dimensional homology. In this case, a combinatorial algorithm maintaining disjoint sets computes persistence in time proportional to $m \log m$ [26]. A common special case is that of regular cubical grids used in image processing. The algorithm in [44] takes full advantage of the possibility to compute boundaries implicitly, through subscript computations in an array. Additional savings are possible if we use hierarchical cubical complexes [3], such as quad- and oct-trees.
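To illustrate the role of disjoint sets in the 0-dimensional case, here is a minimal sketch. It is not the algorithm of [26]; it only pairs components of the sublevel sets of a function given at the vertices of a graph, under the assumption that an edge enters the filtration at the larger of its two endpoint values. When two components merge, the one with the larger minimum dies. The function name is hypothetical, and pairs of zero persistence are dropped.

```python
def zero_dim_persistence(values, edges):
    """Birth-death pairs of components for a function on graph vertices.

    values[v] is the function value at vertex v; edges is a list of
    vertex pairs (u, v).
    """
    parent = list(range(len(values)))

    def find(v):
        # Root lookup with path halving; the root of a component is
        # always its vertex of minimum function value.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    by_entry = sorted(edges, key=lambda e: max(values[e[0]], values[e[1]]))
    for u, v in by_entry:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a loop; no component dies
        if values[ru] > values[rv]:
            ru, rv = rv, ru
        # The component rooted at rv has the larger minimum and dies here.
        birth, death = values[rv], max(values[u], values[v])
        if birth < death:
            pairs.append((birth, death))
        parent[rv] = ru
    roots = {find(v) for v in range(len(values))}
    pairs.extend((values[r], float("inf")) for r in roots)
    return sorted(pairs)

# A path with local minima of values 0, 1, 2 separated by maxima 3 and 4.
path_values = [0, 3, 1, 4, 2]
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
```

For the path above, the component born at value 1 dies at 3, the component born at 2 dies at 4, and the component born at 0 never dies.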
3. The practice

In this section, we discuss four applications of persistent homology: the first to atomic structures highlighting the role of scale, the second to human jaws illustrating derived metrics, the third to root systems controlling topological connectivity, and the fourth to natural images mapping data to high dimensions.
3.1. The atomic structure of material. Nature is full of structures that possess features on multiple scales. Persistent homology quantifies scale and can be used to measure the relative abundance of one scale to another. In this section, we approach simulated organic material from this angle, following the work of MacPherson and Schweinhart [36], who take steps toward characterizing the statistical distribution of scale.
Pockets and cages. Let $X$ be a union of finitely many closed balls in $\mathbb{R}^3$. We may think of $X$ as the geometric model of a protein, as commonly used in structural molecular biology [33]. More interesting for biological questions than the model itself is, in many ways, its complement. The cavities of the model are prime candidate areas for interactions with small ligands and other proteins. Here, ‘cavity’ is an informal term for a depression or a partially protected area of the surface that is still accessible from the outside. In an effort to make this intuition concrete, [19] introduces the notion of a pocket, which is a subset of $\mathbb{R}^3 - X$ that turns into a void under uniform thickening of $X$; see [35] for a biologically motivated study of their volume and shape. Similar to the evolution under thickening, which can be complicated, pockets exhibit hierarchical structure. Without going into detail, we note that each pocket of $X$ corresponds to a point in the 2-nd persistence diagram of the distance function, $d_X : \mathbb{R}^3 \to \mathbb{R}$, defined by the geometric model. A point $u = \tfrac{1}{2}(y + x, y - x)$ corresponds to a void that forms at the thickening radius $x$ and disappears at the radius $y$. The existence of this point does not contradict the possibility of the void splitting up into two at a radius $r$ with $x < r < y$, with one of the two voids disappearing at a radius $s$ with $r < s < y$. In this particular case, we have a side pocket that corresponds to another point, $\tfrac{1}{2}(s + r, s - r)$, in $\mathrm{Dgm}_2(d_X)$. This interpretation of points in $\mathrm{Dgm}_2(d_X)$ raises the question about the meaning of points in the 0-th and the 1-st diagrams. We find a common metaphor by interpreting their geometric realizations as cages with dimension and scale. For example, the pocket corresponding to $u$ cages a ball of radius between $x$ and $y$; it does not have enough space for a ball of radius larger than $y$, and it cannot prevent a ball of radius smaller than $x$ from escaping.
Similarly, a point $v = \tfrac{1}{2}(y + x, y - x)$ in $\mathrm{Dgm}_1(d_X)$ cages an endless tube of cross-section radius between $x$ and $y$. Indeed, we can move the tube through the partial loop, but we cannot remove it unless we find a place where its cross-section has radius less than $x$. Finally, a point $w = \tfrac{1}{2}(y + x, y - x)$ in $\mathrm{Dgm}_0(d_X)$ cages a closed surface uniformly thickened to radius between $x$ and $y$.
Random polymers. We construct idealized geometric models of polymers iteratively, at each step randomly adding a unit ball to the growing structure. We call the result a branched polymer if the new ball is glued at a single point, and this point is chosen uniformly at random. As illustrated in Figure 2, the set of points on the boundary that are available for gluing can be constructed by first doubling the radius and second shrinking the boundary back to the original model. This set is a union of open patches on the spheres bounding the balls, and its area can be computed using software based on alpha shapes [31]. To have a comparison, we introduce Brownian trees, which are constructed the same way except that the uniform distribution over the mentioned set is replaced by another distribution that takes into account the difficulty of reaching a point with a unit ball approaching the union by Brownian motion from infinity.

Figure 2. A collection of 12 touching unit disks. The set of points on the boundary where a 13-th disk can be glued without creating any additional intersection is constructed using the boundary of the union of the disks with twice the radius.

The branched polymers and the Brownian trees are easy to distinguish. Indeed, a branched polymer cannot have a large void; a ball that could comfortably fit inside would be added sooner or later. In contrast, a Brownian tree can protect a large void with narrow entrances. Letting $d_P, d_T : \mathbb{R}^3 \to \mathbb{R}$ be the distance functions defined by a branched polymer and by a Brownian tree, we therefore expect points with large persistence in $\mathrm{Dgm}(d_T)$ but not in $\mathrm{Dgm}(d_P)$. There is a less obvious difference in the horizontal or scale direction. Let $P_p(x)$ be the number of points in $\mathrm{Dgm}_p(d_P)$ with death value plus birth value at least $2x$, and let $T_p$ be the similarly defined function for the Brownian tree. MacPherson and Schweinhart find experimental evidence that $P_1$ and $P_2$ are both roughly a constant times $1/x^2$, which is the motivation to say that the branched polymers have persistence dimension 2, both for 1-cages and for pockets. No such exponent seems to exist for Brownian trees.
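The counting function used to define the persistence dimension is easy to evaluate once a diagram is available. The sketch below is only an illustration and not the procedure of MacPherson and Schweinhart: it assumes that a diagram is stored as a list of (birth, death) pairs, and it estimates an exponent by a plain least-squares fit in log-log coordinates. Both function names are hypothetical.

```python
from math import log

def scale_count(diagram, x):
    """Number of points whose birth value plus death value is at least 2x."""
    return sum(1 for birth, death in diagram if birth + death >= 2 * x)

def loglog_slope(xs, counts):
    """Least-squares slope of log(count) against log(x).

    Samples with a non-positive x or count are skipped because their
    logarithm is undefined. A slope near -d is what a count roughly
    proportional to 1/x^d would produce.
    """
    pts = [(log(x), log(c)) for x, c in zip(xs, counts) if x > 0 and c > 0]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    num = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    den = sum((u - mean_u) ** 2 for u, _ in pts)
    return num / den

# Hypothetical diagram given as (birth, death) pairs.
toy_diagram = [(0.5, 1.5), (1.0, 3.0), (2.0, 2.5), (3.0, 5.0)]
```

Counts that decay exactly like a constant times 1/x^2, for instance 16, 4, 1 at x = 1, 2, 4, yield a slope of −2 up to rounding.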
3.2. The shape of a human jaw. Its stability suggests the Wasserstein distance between persistence diagrams as a similarity measure for shapes. Indeed, it is difficult to compare shapes directly, but it is easy to compute and compare persistence diagrams for suitably chosen functions. We discuss this approach by, first, presenting a relation between persistence and the Gromov–Hausdorff distance and, second, reviewing an application to human jaws.

Comparing shapes and metrics. The comparison or fitting of shapes arises in many walks of life, too many to warrant an example. For solid shapes, we may focus on prominent protrusions and cavities, but this is less effective when we deal with flexible shapes. An important subclass of the latter has a boundary that folds but does not stretch or shrink. More formally, this boundary is a space with a constant metric. Following this line of thought, Chazal et al. [9] consider finite metric spaces, $X$ and $Y$, and use persistence diagrams as their stable signatures. We need some definitions to state their results. A correspondence between $X$ and $Y$ is a subset of $X \times Y$ whose projections back to $X$ and to $Y$ are the entire spaces. The Gromov–Hausdorff distance between the two spaces is
$$GH(X, Y) = \tfrac{1}{2} \inf_{\gamma} \sup \left| \, \|x - x'\|_X - \|y - y'\|_Y \, \right|, \tag{7}$$
where we take the infimum over all correspondences, $\gamma$, and the supremum over all pairs $(x, y)$ and $(x', y')$ in $\gamma$. With this definition in place, we consider the Vietoris–Rips complex for a distance threshold, $a \ge 0$. Specifically, we draw an edge between any two points at distance at most $a$, and we let $\mathrm{Rips}_a(X)$ be the flag complex defined by these edges; see Section 2.1. Varying $a$, we let $R_X$ be the resulting sequence of spaces, and, applying the homology functor, we get a persistence module characterized by the persistence diagram, which we denote as $\mathrm{Dgm}(R_X)$. A consequence of the main result in [9] is a relation between the bottleneck distance and the Gromov–Hausdorff distance:
$$W_\infty(\mathrm{Dgm}(R_X), \mathrm{Dgm}(R_Y)) \le GH(X, Y). \tag{8}$$
We thus get a lower bound on a quantity that is generally difficult to compute and to approximate. If we accept the Gromov–Hausdorff distance as a reasonable comparison of metric spaces, we can use the bottleneck distance between persistence diagrams to disprove that two spaces are similar. But because the inequality is one-sided, we cannot prove their similarity. Average and individual jaws. An important shape in orthodontics is the human jaw. Comparisons between them have several practical applications: one being the recognition of medical conditions, such as the Habsburger chin; another is the monitoring of ongoing treatments. Traditionally, this comparison is done with a standardized version of the landmark method in statistical shape analysis [18]. Here, we describe an enhancement of this method using persistent homology, as employed by Gamble and Heo [25]. In this particular study, they consider a collection of N = 240 jaw bones, each represented by k = 22 landmark points chosen
by an expert for their clinical relevance. The points are labeled and denoted as $u_i^j$, for 1 ≤ i ≤ k and 1 ≤ j ≤ N. After aligning the jaw bones in $\mathbb{R}^3$, we average the landmarks to get k points $u_i = \bigl(\sum_{j=1}^{N} u_i^j\bigr)/N$, which we call the mean shape of the N data sets. The k points define a Delaunay triangulation, $D$, which is a 3-dimensional simplicial complex with probability 1. We view it as an abstract (as opposed to geometric) simplicial complex, and filter it differently for each data set. To describe this filtration, let $D_j$ be the j-th copy of the Delaunay triangulation, and define the weight of the edge connecting the points $u_i$ and $u_{i'}$ as
$$\mathrm{weight}_j(i, i') = \frac{\| u_i^j - u_{i'}^j \|}{\sum_{\ell=1}^{N} \| u_i^\ell - u_{i'}^\ell \|}.$$
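The mean shape and the edge weights just defined can be computed directly from the landmark coordinates. The following sketch assumes that the landmarks are stored as `landmarks[j][i]`, the 3-dimensional coordinates of landmark i in data set j, and that the weight is the edge length in data set j divided by the sum of the lengths of the same edge over all data sets. The function names and the tiny example are hypothetical and unrelated to the data of [25].

```python
from math import dist

def mean_shape(landmarks):
    """Average each landmark over all data sets."""
    n = len(landmarks)
    k = len(landmarks[0])
    return [tuple(sum(landmarks[j][i][c] for j in range(n)) / n
                  for c in range(3))
            for i in range(k)]

def edge_weight(landmarks, j, i, i2):
    """Length of the edge (i, i2) in data set j, divided by the sum of
    the lengths of that edge over all data sets."""
    total = sum(dist(shape[i], shape[i2]) for shape in landmarks)
    return dist(landmarks[j][i], landmarks[j][i2]) / total

def edges_up_to(landmarks, j, edges, a):
    """Edges of the triangulation whose weight in data set j is at most a."""
    return [(i, i2) for i, i2 in edges if edge_weight(landmarks, j, i, i2) <= a]

# Two hypothetical data sets with two landmarks each.
toy_landmarks = [[(0, 0, 0), (1, 0, 0)],
                 [(0, 0, 0), (3, 0, 0)]]
```

In this toy example the single edge has length 1 in the first data set and 3 in the second, so its weights are 0.25 and 0.75.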
Inspired by the construction of the Vietoris–Rips complex, we use a real threshold a ≥ 0 and filter Dj by taking the maximal subcomplex whose edges have weight at most a. This gives N filtrations of the Delaunay triangulation and, correspondingly, N persistence diagrams, one for each data set. The final analysis is done in the space of persistence diagrams, which is an important point in this story. Fixing q = 2, we get the Wasserstein distance, Wq , between every pair of diagrams. Doing this for individual dimensions but also for the cumulative diagrams, the most interesting results appear in dimension 1. Switching to traditional methods, the pairwise Wasserstein distances are used to embed the data as points in R2 using multidimensional scaling [32]. Closely examining the results, Gamble and Heo find that one of the coordinates correlates with the expansion of the jaw — the treatment used on the patients. In particular, it distinguishes between the control group and the two treatment groups as they evolve over time. Interestingly, the inter-landmark distances that have the highest positive correlation with that coordinate are those that cross the mouth and measure the width of the jaw. 3.3. The connectivity of root systems. A common theme in the reconstruction of shapes is topologically correct connectivity. One example are brain surfaces, which, at the commonly adopted scale, all have the topology of the 2-dimensional sphere. Another example are root systems of agricultural plants, which are thickened 1-dimensional trees. In this section, we focus on the contribution of persistent homology to the control of the topological connectivity. Reconstruction by ordered selection. A common paradigm in the reconstruction of shapes is the selection of cells from an underlying collection, U. This is often facilitated by estimating a fitness value for each cell; that is: a function f : U → R. Given a threshold, α, we select all cells with fitness at least α. 
In other words, we reconstruct the shape $f^{-1}[\alpha, \infty)$. This is also the strategy in the reconstruction of root systems as described in [47]. Inevitably, there are cells with fitness value close to the threshold for which the decision depends on chance. To avoid such cases, we may put effort into improving the accuracy of the fitness function. Here, we follow an alternative approach that uses global information to influence the selection.
To make the setting more concrete, assume U is a decomposition of a compact subset of R3 into unit cubes called voxels. The information about the root system is obtained from a collection of 2-dimensional photographs taken from different directions, each segmented into foreground and background, the former being the projection of the root system onto the plane of the camera. We construct the shape as the collection of voxels all of whose projections belong to foregrounds. To make this more realistic, we allow for ambiguity entering the setting through uncertainty about the position and the angle of the camera, imperfect lighting conditions, optical distortion, shape details that challenge the resolution of our observations, etc. Instead of a binary we get a real-valued fitness function, as discussed earlier. To shed light on the dependence of the reconstruction on the threshold, we sort the voxels in the order of non-increasing fitness, adding their square sides, edges, and vertices, making sure that every cell (of any dimension) succeeds its faces in the ordering. We call the result a filter, listing its cells in order as σ1 , σ2 , . . . , σm . Letting Ki be the complex consisting of σ1 to σi , we get a filtered complex: ∅ = K0 ⊆ K1 ⊆ . . . ⊆ Km . Each complex Ki is our best choice for f (σi ) ≥ α > f (σi+1 ). How do we know that it would not be better to use a slightly different threshold or to permute some of the cells with same or similar fitness values? We use persistence to elucidate this question. When the target connectivity is clear, this perspective leads to improved local choices. For a root system, we expect β0 = 1 and β1 = β2 = 0; that is: a connected shape without tunnels and voids. Local reordering. To get started, we apply the homology functor to get a persistence module 0 = H(K0 ) → . . . → H(Km ). As explained in Section 2.2, homology classes are born and die. In our case, they correspond to components, loops, and closed walls. 
Since the complex is built up one cell at a time, we can associate these events with individual cells. dim = 0: a vertex gives birth to a new component; there is no other case. dim = 1: an edge gives either birth to a loop, or it kills a component by bridging the gap to another component. dim = 2: a square gives either birth to a wall, or it kills a loop by filling in the last opening of the tunnel. dim = 3: a voxel kills a wall by filling in the last piece of the void it surrounds; there is no other case. Importantly, we can associate each birth with a death, or with infinity if it marks a homology class of the last complex. We visualize these pairs as intervals in the barcode, paying special attention to the ones that contain the threshold, α. Suppose there are β0 + β1 + β2 such intervals, and note that they correspond to the components, loops, and walls in Ki , where f (σi ) ≥ α > f (σi+1 ). In the lucky case, we have β0 = 1 and β1 = β2 = 0, and, therefore, a connected reconstruction of the root system, without loops or voids, as desired. Otherwise, we aim at removing
all surplus intervals, which we do by modifying the fitness values of the cells. This may lead to changes in the ordering, which we decompose into transpositions of contiguous cells, an operation we discuss next. Suppose we increase the fitness of a cell σ = σ` , let τ = σ`−1 be the cell to its left, and assume that the increase improves the fitness of σ beyond that of τ . If there is no dependence between σ and τ , then we can just transpose them. If the transposition affects the pairing we call it a switch and refer to [14] for a complete analysis and a fast update algorithm. The most interesting case is the switch in which σ and τ change their status: from giving birth to giving death and vice versa. Finally, if τ is a face of σ, then the transposition is prohibited, and we have to increase the fitness value of τ along with that of σ. Moving τ may have an adverse effect on the connectivity at α since τ may be the endpoint of another interval. Indeed, obstacles to repair cannot always be avoided as the general problem of optimal reconstruction is NP-hard [2]. Notwithstanding these shortcomings, the filter is an efficient mechanism for the control of the topology of the reconstructed shape. Considering the widespread need of topology repair in the applications, this presents a significant potential for the improvement of reconstruction algorithms. 3.4. The statistics of natural images. After discussing low-dimensional applications, we are ready to extrapolate what we learned to dimensions beyond the visible. The need for such extensions is substantial because scientists collect progressively more and larger datasets whose meaning is hidden in the invisible dimensions. An example are cancer profiles which promise to shed new light on individual differences. In this section, we focus on high-dimensional data derived from photographs, following the work of Carlsson et al. [6]. Image statistics. 
To understand the variation of receptive properties of simple cells in the mammalian visual cortex, van Hateren and van der Schaaf [30] used a carefully calibrated digital camera to gather a collection of 4,212 images of natural environments — woods, open landscapes, and urban areas. Following earlier work, their research relates the statistics of such images to the cell properties and supports the proposition that the cells have evolved to process natural images. Moving toward a mathematically accessible setting, Carlsson et al. [6] consider the topology of 3-by-3 high-contrast patches extracted from these photographs. Studies show that humans look more in the regions of high spatial contrast, which justifies their emphasis. The restriction to the small patches is especially interesting. It allows to dramatically reduce the dimensionality of the problem to nine while preserving information about the global statistics of the image. The patches are selected as follows: (1) after picking 5, 000 patches in each image, we treat each one as a vector with nine coordinates (one per pixel) and therefore as a point in R9 ; (2) we subtract the average from each component, noting that this puts every point on a hyperplane, which we identify with R8 , and moves low contrast patches close to the origin of R8 ;
(3) defining the contrast of a patch as the norm of the point, we select the 1,000 patches with highest contrast from each image;
(4) normalizing by the contrast, we obtain a set of points on the 7-dimensional sphere in $\mathbb{R}^8$, which we denote as $S^7$.
For computational reasons, the space is down-sampled further, from about four million to 50,000 points. Even after the initial filtering, the data set contains more information than we can comprehend. Therefore, it is prudent to focus on its core subsets to expose otherwise obscured phenomena. To this end, the remaining high-contrast patches are filtered by their local densities in $S^7$. Specifically, we compute the distance to the $k$-th nearest neighbor for each point, and we write $X(k, P)$ for the top $P$ percent of the points ordered by this distance measurement.

Popular subspaces. Like the knobs on a microscope, the parameters $k$ and $P$ control the focus of our view. At the coarsest scale, the space $X(300, 30)$ consists of a single circle, noticeable in the 1-dimensional persistence diagram of the distance function. Inspecting it, Carlsson and collaborators find that it consists of linear gradients, rotating around the center of the patch. In Figure 3, it is depicted by the patches on the two horizontal gray lines; they connect into a single circle by identifying the matching patches at their opposite ends.
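The selection steps (2) to (4) and the density filter $X(k, P)$ can be sketched in a few lines. This is only an illustration under stated assumptions: the contrast is taken to be the Euclidean norm, "top $P$ percent" is read as the points with the smallest $k$-th nearest neighbor distance (the densest ones), the input patches are random stand-ins rather than the photographs of [30], and the function names are hypothetical.

```python
import math
import random

def contrast(v):
    """Euclidean norm of a mean-centered patch (assumed choice of norm)."""
    return math.sqrt(sum(x * x for x in v))

def patches_to_sphere(patches, keep):
    """Steps (2)-(4): mean-center, keep the `keep` highest-contrast patches, normalize."""
    centered = []
    for patch in patches:
        mean = sum(patch) / len(patch)
        centered.append([x - mean for x in patch])
    centered.sort(key=contrast, reverse=True)
    return [[x / contrast(v) for x in v] for v in centered[:keep]]

def dense_core(points, k, percent):
    """X(k, P): the `percent` percent of points with the smallest k-th neighbor distance."""
    def kth_neighbor_distance(i):
        dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        return dists[k - 1]
    ranked = sorted(range(len(points)), key=kth_neighbor_distance)
    return [points[i] for i in ranked[: len(points) * percent // 100]]

# Synthetic stand-in for 3-by-3 patches (hypothetical data).
random.seed(0)
patches = [[random.random() for _ in range(9)] for _ in range(200)]
sphere_points = patches_to_sphere(patches, keep=50)
core = dense_core(sphere_points, k=5, percent=30)
```

Every point of `sphere_points` has coordinate sum zero and unit norm, so it lies on a 7-sphere inside the hyperplane of mean-free vectors in $\mathbb{R}^9$. The brute-force neighbor search is quadratic and only suitable for small samples.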
Figure 3. The Klein bottle of 3-by-3 patches. The horizontal edges are glued to each other from left to right, and the vertical edges are glued with a twist.
After sharpening the view by transitioning to X(15, 30), the 1-dimensional persistence diagram detects five prominent homology classes. Inspection of the point set verifies the ‘3-circle model’, suggested in earlier work by Carlsson and de Silva. In addition to the first, two more circles appear in X(15, 30) and intersect the primary circle in two points each. (The first Betti number of the resulting space is indeed five.) In Figure 3, the matching patches at the top and the bottom are
identified, turning the two dotted lines into circles. The appearance of the three circles in the high-density subsample hints at two preferences in natural images: linear intensity functions as well as vertical and horizontal directions. Turning the knobs further, we fill in lower-density regions. Persistent homology of the resulting point set, taken with modulo two coefficients, acquires a 2-dimensional class and retains two independent 1-cycles. The torus and the Klein bottle are the only 2-manifolds with this homology. Examining how they fit into the point set (both experimentally and theoretically) shows that the Klein bottle prevails. Figure 3 illustrates the corresponding arrangement of the patches. Knowing that the bulk of the points lie near a Klein bottle, Carlsson and collaborators push on to find an explicit representation of this 2-manifold inside $S^7$. The motivation is image compression. A point (a 3-by-3 patch) on the Klein bottle is fully specified by only two coordinates. There are not many points far from the Klein bottle, and each such point can be specified by two coordinates plus a residual description of the difference to the projection onto the Klein bottle.
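The two Betti number claims above can be checked by hand, or with a small rank computation over GF(2). The sketch below is not the persistence algorithm of the paper. It only computes ordinary mod-2 Betti numbers of two tiny cell complexes chosen for illustration: a graph of three circles (a primary circle meeting each of two secondary circles in two points) and the standard one-vertex, two-edge, one-face cell structure of the Klein bottle. The function names are hypothetical.

```python
def rank_gf2(matrix):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    rank = 0
    for i in range(len(rows)):
        if rows[i] == 0:
            continue
        rank += 1
        lowest = rows[i] & -rows[i]          # pivot column of row i
        for j in range(i + 1, len(rows)):
            if rows[j] & lowest:
                rows[j] ^= rows[i]           # clear the pivot column below
    return rank

def betti_mod2(counts, d1, d2):
    """counts = (#vertices, #edges, #faces); d1 is vertices x edges, d2 is edges x faces."""
    n0, n1, n2 = counts
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return (n0 - r1, n1 - r1 - r2, n2 - r2)

# Three-circle graph: vertices a, b, c, d; primary arcs a-c, c-b, b-d, d-a;
# two arcs a-b (first secondary circle) and two arcs c-d (second secondary circle).
three_circles_d1 = [
    [1, 0, 0, 1, 1, 1, 0, 0],  # a
    [0, 1, 1, 0, 1, 1, 0, 0],  # b
    [1, 1, 0, 0, 0, 0, 1, 1],  # c
    [0, 0, 1, 1, 0, 0, 1, 1],  # d
]
three_circles = betti_mod2((4, 8, 0), three_circles_d1, [[] for _ in range(8)])

# Klein bottle: one vertex, edges a and b, one face. Each edge occurs twice on the
# face boundary, so both boundary matrices vanish mod 2 (the same holds for the torus).
klein_bottle = betti_mod2((1, 2, 1), [[0, 0]], [[0], [0]])
```

With modulo two coefficients the torus gives the same matrices as the Klein bottle, which is why homology alone cannot separate the two and the fit to the point set has to decide.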
4. Discussion

Persistent homology is a new mathematical concept that has received attention from inside and outside mathematics. It is our interpretation that the reason for the interest is multifaceted. We hope that the structure of this paper has made this point clear. In particular,
• we stress the connection to data in Section 2.1;
• we emphasize the algebraic side of persistent homology in Section 2.2;
• we explain the stability of its diagrams in Section 2.3;
• we sketch its fundamental algorithms in Section 2.4;
• we shed light on the role of scale in Section 3.1;
• we discuss derived metrics facilitating the analysis of shapes in Section 3.2;
• we exhibit the control of topological connectivity in Section 3.3;
• and we show that high dimensions aid our understanding in Section 3.4.
What are the developments we may expect to push the envelope of the method in the next few years? We see multi-parameter persistence, the statistics of persistence, and persistence for dynamical systems as major thrusts of the current research. All three are driven by applications, as was persistent homology from its very beginning.
References

[1] P. K. Agarwal, H. Edelsbrunner, J. Harer, and Y. Wang, Extreme elevation on a 2-manifold. Discrete Comput. Geom. 36 (2006), 553–572.
[2] D. Attali and A. Lieutier, Optimal reconstruction might be hard. In: Proc. 26th Ann. Sympos. Comput. Geom., 2010, 334–343.
[3] P. Bendich, H. Edelsbrunner, and M. Kerber, Computing robustness and persistence for images. IEEE Trans. Visual. Comput. Graphics 16 (2010), 1251–1260.
[4] S. Biasotti, L. De Floriani, B. Falcidieno, P. Frosini, D. Giorgi, C. Landi, L. Papaleo, and M. Spagnuolo, Describing shapes by geometrical-topological properties of real functions. ACM Comput. Surveys 40 (2008), 1–87.
[5] G. Carlsson, Topology and data. Bulletin Amer. Math. Soc. 46 (2009), 255–308.
[6] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, On the local behavior of spaces of natural images. Internat. J. Comput. Vision 76 (2008), 1–12.
[7] G. Carlsson and V. de Silva, Zigzag persistence. Found. Comput. Math. 10 (2010), 367–405.
[8] G. Carlsson, V. de Silva, and D. Morozov, Zigzag persistent homology and real-valued functions. In: Proc. 25th Ann. Sympos. Comput. Geom., 2009, 227–236.
[9] F. Chazal, D. Cohen-Steiner, L. J. Guibas, F. Mémoli, and S. Y. Oudot, Gromov–Hausdorff stable signatures for shapes using persistence. Comput. Graphics Forum 28 (2009), 1393–1403.
[10] F. Chazal, D. Cohen-Steiner, L. Guibas, and S. Oudot, Proximity of persistence modules and their diagrams. In: Proc. 25th Ann. Sympos. Comput. Geom., 2009, 237–246.
[11] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, Stability of persistence diagrams. Discrete Comput. Geom. 37 (2007), 103–120.
[12] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, Extending persistence using Poincaré and Lefschetz duality. Found. Comput. Math. 9 (2009), 79–103.
[13] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko, Lipschitz functions have Lp-stable persistence. Found. Comput. Math. 10 (2010), 127–139.
[14] D. Cohen-Steiner, H. Edelsbrunner, and D. Morozov, Vines and vineyards by updating persistence in linear time. In: Proc. 22nd Ann. Sympos. Comput. Geom., 2006, 119–126.
[15] B. Delaunay, Sur la sphère vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk 7 (1934), 793–800.
[16] C. J. A. Delfinado and H. Edelsbrunner, An incremental algorithm for Betti numbers of simplicial complexes on the 3-sphere. Comput. Aided Geom. Design 12 (1995), 771–784.
[17] H. Derksen and J. Weyman, Quiver representations. Notices Amer. Math. Soc. 52 (2005), 200–206.
[18] I. L. Dryden and K. V. Mardia, Statistical Shape Analysis. Wiley, Chichester, England, 1998.
[19] H. Edelsbrunner, M. A. Facello, and J. Liang, On the definition and the construction of pockets in macromolecules. Discrete Appl. Math. 88 (1998), 83–102.
[20] H. Edelsbrunner and J. Harer, Computational Topology. An Introduction. Amer. Math. Soc., Providence, Rhode Island, 2010.
[21] H. Edelsbrunner, D. G. Kirkpatrick, and R. Seidel, On the shape of a set of points in the plane. IEEE Trans. Inform. Theory 29 (1983), 551–559.
[22] H. Edelsbrunner, D. Letscher, and A. Zomorodian, Topological persistence and simplification. In: Proc. 41st IEEE Sympos. Found. Comput. Sci., 2000, 454–463, also Discrete Comput. Geom. 28 (2002), 511–533.
[23] H. Edelsbrunner and E. P. Mücke, Three-dimensional alpha shapes. ACM Trans. Graphics 13 (1994), 43–72.
[24] P. Frosini, A distance for similarity classes of submanifolds of a Euclidean space. Bulletin Australian Math. Soc. 42 (1990), 407–416.
[25] J. Gamble and G. Heo, Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivar. Analysis 101 (2010), 2184–2199.
[26] L. Georgiadis, R. E. Tarjan, and R. F. Werneck, Design of data structures for mergeable trees. In: Proc. 17th Ann. ACM–SIAM Sympos. Discrete Alg., 2006, 394–403.
[27] R. Ghrist, Barcodes: the persistent topology of data. Bulletin Amer. Math. Soc. 45 (2008), 61–75.
[28] A. B. Hamza and H. Krim, Geodesic object representation and recognition. Lect. Notes Comput. Sci. 2886 (2003), 378–387.
[29] A. Hatcher, Algebraic Topology. Cambridge Univ. Press, England, 2002.
[30] J. H. van Hateren and A. van der Schaaf, Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. Royal Soc. London B 265 (1998), 359–366.
[31] P. Koehl, ProShape: Understanding the shape of protein structures. Software at http://biogeometry.duke.edu/software/proshape/.
[32] J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29 (1964), 1–27.
[33] B. Lee and F. M. Richards, The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55 (1971), 379–400.
[34] J. Leray, L’anneau d’homologie d’une représentation and Structure de l’anneau d’homologie d’une représentation. Les Comptes rendus de l’Académie des Sciences 222 (1946), 1366–1368 and 1419–1422.
[35] J. Liang, H. Edelsbrunner, and C. Woodward, Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand binding. Protein Science 7 (1998), 1884–1897.
[36] R. MacPherson and B. Schweinhart, Measuring shape with topology. arXiv:1011.2258, 2010.
[37] N. Milosavljević, D. Morozov, and P. Škraba, Zigzag persistent homology in matrix multiplication time. In: Proc. 27th Ann. Sympos. Comput. Geom., 2011, 216–225.
[38] J. R. Munkres, Elements of Algebraic Topology. Addison-Wesley, Redwood City, California, 1984.
[39] V. Robins, Toward computing homology from finite approximations. Topology Proceedings 24 (1999), 503–532.
[40] H. Samet, Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006.
[41] V. Strassen, Relative bilinear complexity and matrix multiplication. J. Reine Angew. Math. 375/376 (1987), 406–443.
[42] C. Villani, Topics in Optimal Transportation. Amer. Math. Soc., Providence, Rhode Island, 2003.
[43] G. Voronoi, Nouvelles applications des paramètres continus à la théorie des formes quadratiques. J. Reine Angew. Math. 133 (1907), 97–178 and 134 (1908), 198–287.
[44] H. Wagner, C. Chen, and E. Vuçini, Efficient computation of persistent homology for cubical data. In: Proc. 4th Workshop Topology-based Methods in Data Analysis and Visualization, 2011.
[45] S. Weinberger, What is ... persistent homology? Notices Amer. Math. Soc. 58 (2011), 36–39.
[46] V. Williams, Breaking the Coppersmith–Winograd barrier. Preprint, 2011.
[47] Y. Zheng, S. Gu, H. Edelsbrunner, C. Tomasi, and P. Benfey, Detailed reconstruction of 3D plant root shape. In: Proc. 13th Internat. Conf. Comput. Vision, 2011, 2026–2033.
[48] A. Zomorodian and G. Carlsson, Computing persistent homology. Discrete Comput. Geom. 33 (2005), 249–274.
Herbert Edelsbrunner, IST Austria, Am Campus 1, 3400 Klosterneuburg, Austria, Duke University, Durham, North Carolina, and Geomagic, Research Triangle Park, North Carolina.

Dmitriy Morozov, Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA 94720-8139, USA.
In a search for a structure, Part 1: On entropy. Misha Gromov
Abstract. Mathematics is about “interesting structures”. What makes a structure interesting is an abundance of interesting problems; we study a structure by solving these problems. The worlds of science, as well as of mathematics itself, are abundant with gems (germs?) of simple beautiful ideas. When and how may such an idea direct you toward beautiful mathematics? I try to present in this talk a few suggestive examples.

2010 Mathematics Subject Classification. 28D20, 37A35, 82B10.

Keywords. Shannon–Kolmogorov–Sinai entropy, Boltzmann formula, von Neumann entropy, strong subadditivity.
1. States, Spaces, Crystals and Entropy.

What is the “number of states” of a (classical as opposed to quantum) system, $S$, e.g. of a crystal? A “naive physicist’s” system is an infinite ensemble of infinitely small mutually equal “states”, where you know nothing about what these states are but you believe they are equal because of observable symmetries of $S$. The number of these states, although infinite, can be assigned a specific numerical value by comparing with another system taken for a “unit of entropy”; moreover, this number can be measured experimentally with a predictable error. The logarithm of this number is called the (mean statistical Boltzmann) entropy of $S$ [14].

What is the “space of states” of $S$? “No such thing exists” would be our physicist’s answer (unless he/she is a Schrödinger’s cat). Even the “number of states”, the value of entropy, may depend on the accuracy of your data. This $S$ is not a “real thing”, nor is it a mathematician’s “set”; it is “something” that depends on a class of mutually equivalent imaginary experimental protocols. This probably appeared gibberish to mathematicians of the late 19th century, when Boltzmann developed his concept of entropy, and even of the early 20th century, when Lebesgue (1902) and Kolmogorov (1933) expressed the ideas of measure and probability in the language of the set theory of Cantor (1873). But nowadays this “gibberish” (almost) automatically translates to the language of non-standard analysis (Abraham Robinson, 1966) [15], [18] and even more easily to that of category theory (Eilenberg–MacLane–Steenrod–Cartan–Grothendieck, 1945–1957).
For instance, our physicist’s description of a crystal (see below) amounts to Kolmogorov’s theorem on dynamic entropy of Bernoulli shifts (1958) (that was originally motivated by Shannon’s information theory, 1948). To see this, you just need to realize that the “something” of a physicist is a covariant functor from a suitable “category of protocols” to the category of sets, the outcomes of experiments; all you have to do afterwards is to follow the guidelines prescribed by the syntax of category theory. (Arguably, the category language, which some call “abstract”, reflects mental undercurrents that surface as our “intuitive reasoning”; a comprehensive mathematical description of this “reasoning” will probably be even farther removed from the “real world” than categories and functors.)

Think of $S$ as an “ensemble” of molecules located at some sites/points in the 3-dimensional Euclidean space, denoted $s \in S \subset \mathbb{R}^3$, e.g. at the integer points $s \in S = \mathbb{Z}^3 \subset \mathbb{R}^3$ (i.e. $s = (n_1, n_2, n_3)$ such that $n_1, n_2, n_3$ are integers), where each molecule may occupy finitely many, say $k_s$, different states, e.g. levels of energy characterized by $k_s$ different colors. Represent such a molecule by a finite set of cardinality $k_s$ and declare the Cartesian product of all these finite sets over $s \in S$ to be the space of pure states of $S$; accordingly, let the product of numbers $\prod_{s \in S} k_s = \exp \sum_s \log k_s$ (which should be properly normalized for infinite $S$) represent the number of states of $S$. If, however, molecules exchange pure states exceptionally rarely, we shall see only one state, and the perceived entropy will be zero: a state counts only if it is visited by molecules with a definite frequency, with emission/absorption of energy at the change of states that can be registered by experimental devices. If the molecules at all sites visit their individual states with equal relative frequencies $1/k_s$, then $\sum_s \log k_s$ is, indeed, a fair measure of entropy, provided the molecules do not interact.
Yet, if particles do interact, where, for example, two neighbors in $S$ only reluctantly display the same color, then $S$ will have fewer observable states. How to account for this? And, remember, you have no direct access to the “total space of states”, you do not observe individual molecules, you do not know, a priori, how molecules interact (if at all) and you do not even know what the numbers $k_s$ are. What you have at your disposal are certain devices, “state detectors”, call them $P$, that are also “physical systems” but now with relatively few, say $n_P$, “pure states” in them. You may think of a $P$ as a plate with an array of $n_P$ windows that are sensitive to different “colors”. When you “attach” $P$ to $S$ (you do not have to know the physical nature of this “attachment”) you may see flashes of light in these windows. But you yourself are color blind and you do not know beforehand if two windows have identical or different “colors”. All you can do is to count the numbers of flashes in the windows at various (small or large) time intervals. Moreover, given two plates $P$, you do not know if they have identical colors of their respective windows or different ones; yet, if a plate $P_2$ is obtained by moving $P_1$ along $S$ by a symmetry, that is, by a group element $\gamma \in \Gamma = \mathbb{Z}^3$, then you assume that $P_2$ is “the same” as $P_1$.
You assign a number $|p|$ to each window $p$ in a $P$ attached to $S$, namely the relative frequency of flashes in $p$; thus, $\sum_p |p| = 1$ for all windows. Then you postulate that the “entropy of $S$ perceived by $P$”, call it $\mathrm{ent}(P)$, is given by
$$\mathrm{ent}(P) = -\sum_{p \in P} |p| \log |p|$$
and you assume that the probability (relative frequency) of observing a sequence of flashes in given windows $p_1, p_2, \ldots, p_N \in P$ at consecutive time intervals is roughly $\exp(-N \cdot \mathrm{ent}(P))$ for all sequences $p_1, p_2, \ldots, p_N$ of windows where flashes in this order do “realistically occur”. (You cannot verify this experimentally: the number $\exp(N \cdot \mathrm{ent}(P))$ may be smaller than $n_P^N$ but it is still huge.)

If you attach two plates $P_1$ and $P_2$ with $n_{P_1}$ and $n_{P_2}$ windows, you regard the pair as a new plate (state detector), denoted $P_1 \vee P_2$, with $n_{P_1} \cdot n_{P_2}$ windows. You count the numbers of flashes in the pairs of windows $(p_1 \in P_1, p_2 \in P_2)$ and thus define/determine the entropy $\mathrm{ent}(P_1 \vee P_2)$.

A possible mathematical representation of a “state detector” $P$ attached to $S$ is a finite measurable partition $\bigsqcup_p X_p$ of a measure space $X = (X, \mu)$, i.e. $X = \bigsqcup_p X_p$, where $\mu(X) = |P|$, $\mu(X_p) = |p|$ and where $P_1 \vee P_2$ becomes $\bigsqcup_{p_1, p_2} X_{p_1} \cap X_{p_2}$. But a precise definition of this is heavy: $X$ is not quite a set but “something” associated with a $\sigma$-algebra $\Sigma$ of all (more than continuum!) its subsets; to express this rigorously one needs the language of the Zermelo–Fraenkel set theory. In mathematical practice, one takes a specific model of $X$, that is, a topological space with a Borel measure on it, where $X$ is represented by a set. This is similar to the representation of vectors by $n$-tuples of numbers with a chosen coordinate system in a linear space. On the other hand, one can define “measure spaces” without introducing a particular set-theoretic model as follows.

Finite Measure Spaces. A finite measure space $P = \{p\}$ is a finite set with a positive function denoted $p \mapsto |p| > 0$.
We think of it as a set of atoms $p$ that are one-point sets with positive masses $|p|$ attached to them. We denote by $|P| = \sum_p |p|$ the (total) mass of $P$. If $|P| = 1$, then $P$ is called a probability space. We manipulate spaces $P$ as we do their underlying sets, denoted $\mathrm{set}(P)$, insofar as it does not lead to confusion. For example, we speak of subsets $P' \subset P$, with mass $|P'| = \sum_{p \in P'} |p|$, and of maps $P \to Q$ that are maps $\mathrm{set}(P) \to \mathrm{set}(Q)$, etc.

Reductions and $\mathcal{P}$. Following physicists, we call a map $P \xrightarrow{f} Q$ a reduction if the $q$-fibers $P_q = f^{-1}(q) \subset P$ satisfy $|P_q| = |q|$ for all $q \in Q$. We also express this by saying that $Q$ is a reduction of $P$. (Think of $Q$ as a “plate with windows” through which you “observe” $P$. What you see of the states of $P$ is what “filters” through the windows of $Q$.) We use the notation $\mathcal{P}$ for the category with objects $P$ and reductions taken for morphisms. All morphisms in this category are epimorphisms, and $\mathcal{P}$ looks very much like a partially ordered set (with $P \succ Q$ corresponding to reductions $f : P \to Q$ and few, if any, reductions between given $P$ and $Q$), but we treat it for the time being as a general category.

Why Category? There is a subtle but significant conceptual difference between
writing $P \succ Q$ and $P \xrightarrow{f} Q$. Physically speaking, there is no a priori given “attachment” of $Q$ to $P$; an abstract “$\succ$” is meaningless, it must be implemented by a particular operation $f$. (If one keeps track of the “protocol of attaching $Q$ to $P$”, one arrives at the concept of a 2-category.) The $f$-notation, besides being more precise, is also more flexible. For example, one may write $\mathrm{ent}(f)$ but not $\mathrm{ent}(\succ)$ with no $P$ and $Q$ in the notation.

Spaces over $\mathcal{P}$. A space $\mathcal{X}$ over $\mathcal{P}$ is, by definition, a covariant functor from $\mathcal{P}$ to the category of sets, where the value of $\mathcal{X}$ on $P \in \mathcal{P}$ is denoted $\mathcal{X}(P)$. For example, if $X$ is an ordinary measure space, then the corresponding $\mathcal{X}$ assigns the sets of (classes of) measure preserving maps (modulo . . . ) $f : X \to P$ to all $P \in \mathcal{P}$. In general, an element $f$ in the set $\mathcal{X}(P)$ can be regarded as a morphism $f : \mathcal{X} \to P$ in a category $\mathcal{P}/\mathcal{X}$ that is obtained by augmenting $\mathcal{P}$ with an object corresponding to $\mathcal{X}$, such that every object in $\mathcal{P}/\mathcal{X}$ receives at most one (possibly none) morphism from $\mathcal{X}$. Conversely, every category extension of $\mathcal{P}$ with such an object¹ defines a space over $\mathcal{P}$.

∨-Categories and Measure Spaces. Given a set $I$ of morphisms $f_i : x \to b_i$, $i \in I$, in a category, we call these an $x$-fan over $\{b_i\}$ and say that an $a$-fan $f_i' : a \to b_i$ lies between $x$ and $\{b_i\}$ if there is a morphism $g : x \to a$ such that $f_i' \circ g = f_i$ for all $i \in I$. To abbreviate, we may say “$a$ between $x$ and $b_i$”. Call $\mathcal{P}/\mathcal{X}$ a ∨-category if every $\mathcal{X}$-fan over finitely many $P_i \in \mathcal{P}$ admits a $Q \in \mathcal{P}$ between $\mathcal{X}$ and $\{P_i\}$.

Definition. An $\mathcal{X}$ over the category $\mathcal{P}$ of finite measure spaces $P$ is called a measure space if $\mathcal{P}/\mathcal{X}$ is a ∨-category.

Minimal Fans and Injectivity. An $x$-fan over $b_i$ in a category is called minimal if every $a$ between $x$ and $\{b_i\}$ is isomorphic to $x$. (More precisely, the arrow $x \to a$ that implements “between” is an isomorphism.)
It is obvious that every $\mathcal{X}$-fan over finitely many finite measure spaces $P_i \in \mathcal{P}$ in a ∨-category over $\mathcal{P}$ admits a $Q \in \mathcal{P}$ between $\mathcal{X}$ and $\{P_i\}$ such that the corresponding $Q$-fan over $P_i$ is minimal. This $Q$, when seen as an object in $\mathcal{P}$, is unique up to an isomorphism; the same $Q$ is unique up to a canonical isomorphism in $\mathcal{P}/\mathcal{X}$. We call this $Q$ the ∨-(co)product of the $P_i$ in $\mathcal{P}/\mathcal{X}$ and write $Q = \bigvee_i P_i$. This product naturally/functorially extends to morphisms $g$ in $\mathcal{P}/\mathcal{X}$, denoted $\bigvee_i g_i : \bigvee_i P_i \to \bigvee_i P_i'$ for given reductions $g_i : P_i \to P_i'$. Observe that the ∨-product is defined (only) for those objects and morphisms in $\mathcal{P}/\mathcal{X}$ that lie under $\mathcal{X}$. An essential feature of minimal fans, say $f_i : Q \to P_i$, a feature that does not depend on $\mathcal{X}$ (unlike the ∨-product itself), is the injectivity of the corresponding (set) map from $Q$ to the Cartesian product $\prod_i P_i$ (that, in general, is not a reduction).

¹ This, as was pointed out to me by Thomas Riepe, is carelessly written. In order to have the “at most one” property, each $P \in \mathcal{P}$ must appear in the category $\mathcal{P}/\mathcal{X}$ in several “copies” indexed by the set $\mathcal{X}(P)$.
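For a finite $X$ the ∨-product can be computed directly, which may help to see what minimality and injectivity mean. The sketch below is an illustration under simplifying assumptions: $X$ is a finite probability space, each detector is a labeling of the points of $X$, and the join consists of the label tuples that actually occur. The names `push_forward`, `is_reduction` and `join` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def push_forward(space, f):
    """Masses induced on the image: |q| is the total mass of the fibre f^{-1}(q)."""
    out = defaultdict(Fraction)
    for atom, mass in space.items():
        out[f(atom)] += mass
    return dict(out)

def is_reduction(source, target, f):
    """f is a reduction exactly when every fibre has the mass of its target atom."""
    return push_forward(source, f) == target

def join(space, labelings):
    """Join of detectors: atoms are the label tuples that occur on `space`."""
    return push_forward(space, lambda x: tuple(lab[x] for lab in labelings))

# Six equally likely points observed through two two-window plates.
X = {x: Fraction(1, 6) for x in range(6)}
parity = {x: x % 2 for x in X}
half = {x: x // 3 for x in X}
P1 = push_forward(X, lambda x: parity[x])
P2 = push_forward(X, lambda x: half[x])
Q = join(X, [parity, half])
product = {(a, b): P1[a] * P2[b] for a in P1 for b in P2}
```

Here `Q` has atoms of masses 1/3, 1/6, 1/6, 1/3, both coordinate projections of `Q` are reductions, and the inclusion of `Q` into the Cartesian product (all masses 1/4) is injective without being a reduction.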
Let us express the idea of the “number of states” and/or of entropy, the logarithm of this number, in rigorous terms by reformulating Bernoulli’s law of large numbers (1713) in the language of $\mathcal{P}$ as follows.

Cartesian product: $P \times Q$ is the set of pairs of atoms $(p, q)$ that are given the weights $|p| \cdot |q|$ and denoted $pq = (p, q)$. (This corresponds to observing non-interacting “somethings” with $P$ and $Q$.) The maps $pq \mapsto p$ and $pq \mapsto q$ are called Cartesian projections $P \times Q \to P, Q$. Notice that, say, $pq \mapsto p$ is a reduction only if $Q$ is a probability space. In general, one may rescale/normalize the spaces and make these maps reductions. (Such a rescaling, being a non-trivial symmetry, is a significant structure in its own right; for example, the group of families of such “rescalings” leads to the amazing orthogonal symmetry of the Fisher metric (see section 2); you are not supposed to say “rescale” and forget about it.)

Homogeneous Spaces. A finite measure space $P$ is called homogeneous if all atoms $p \in P$ have equal masses $|p|$. (Categorically speaking, all morphisms $P \to Q$ that are invariant under the group of automorphisms of $P$ factor through $P \to \bullet$ for terminal objects $\bullet \in \mathcal{P}$, that are monoatomic spaces.) The entropy of a homogeneous $P$ is defined as the logarithm of the cardinality of $\mathrm{set}(P)$, that is, $\mathrm{ent}(P) = \log |\mathrm{set}(P)|$. Observe that reductions $f : P \to Q$ between homogeneous spaces (non-canonically) split, that is, $P$ decomposes into a Cartesian product $P = P' \times Q$ where the projection $P \to Q$ equals $f$.

$\mathrm{dist}_\pi(P, Q)$ and Asymptotic Equivalence. Let $P$ and $Q$ be finite measure spaces, and let $\pi : P \to Q$ be an injective correspondence, that is, a partially defined map that sends a subset $P' \subset P$ bijectively onto $Q' \subset Q$. Let us introduce a numerical measure of deviation of $\pi$ from being an isomorphism. To simplify, we assume $P$ and $Q$ are probability spaces, i.e. $|P| = |Q| = 1$; otherwise, normalize them by $p \mapsto p/|P|$ in $P$ and $q \mapsto q/|Q|$ in $Q$. Denote $|p : q| = \max(p/q, q/p)$ for $q = \pi(p)$ and $M = \min(|\mathrm{set}(P)|, |\mathrm{set}(Q)|)$. Let
$$|P - Q|_\pi = |P \setminus P'| + |Q \setminus Q'|, \qquad |\log P : Q|_\pi = \sup_{p \in P'} \frac{\log |p : q|}{\log M}, \quad \text{where } 0/0 \overset{\mathrm{def}}{=} 0,$$
and
$$\mathrm{dist}_\pi(P, Q) = |P - Q|_\pi + |\log P : Q|_\pi.$$
Call a sequence of injective correspondences $\pi_N : P_N \to Q_N$ an asymptotic equivalence if
$$\mathrm{dist}_{\pi_N}(P_N, Q_N) \to 0 \quad \text{as } N \to \infty,$$
and say that two sequences of finite measure spaces $P_N$ and $Q_N$ are asymptotically equivalent if there exists an asymptotic equivalence $\pi_N : P_N \to Q_N$. The law of large numbers applied to the random variable $p \mapsto \log p$ on $P$ can be stated as follows.
Bernoulli Approximation Theorem. The sequence of Cartesian powers $P^N$ of every $P \in \mathcal{P}$ admits an asymptotically equivalent sequence $H_N$ of homogeneous spaces. Such a sequence $H_N$ is called a homogeneous Bernoulli approximation of $P^N$.

Bernoulli Entropy. This is defined as
$$\mathrm{ent}(P) = \lim_{N \to \infty} N^{-1} \log |\mathrm{set}(H_N)|$$
for a homogeneous sequence $H_N$ that is asymptotically equivalent to $P^N$. Entropy can also be defined without an explicit use of the Bernoulli theorem as follows. Call probability spaces $P_1$ and $P_2$ Bernoulli equivalent if the power sequences $P_1^N$ and $P_2^N$ are asymptotically equivalent. The set $\mathrm{Ber}(\mathcal{P})$ of the classes of probability spaces $P \in \mathcal{P}$ under this equivalence carries a natural structure of a commutative semigroup corresponding to the Cartesian product $P \times Q$, as well as a topology for the metric $\limsup_{N \to \infty} \mathrm{dist}_{\pi_N}(P^N, Q^N)$.
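A numerical experiment can make the approximation theorem more tangible. The sketch below treats only the two-atom space $P = \{p, 1-p\}$ and is built on assumptions that are not in the text: the homogeneous space $H_N$ is taken to be the uniform measure on the "typical" sequences whose frequency of the first atom lies within $N^{-1/3}$ of $p$, and $\pi_N$ is the identity on these sequences. The function name and the window width are hypothetical choices.

```python
import math

def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def bernoulli_approximation(p, N):
    """Return (log|set(H_N)| / N, dist_pi(P^N, H_N)) for P = {p, 1 - p}."""
    delta = N ** (-1.0 / 3.0)                       # assumed width of the typical window
    ks = [k for k in range(N + 1) if abs(k / N - p) <= delta]
    log_counts = [log_binom(N, k) for k in ks]      # number of sequences with k first atoms
    log_mass = [k * math.log(p) + (N - k) * math.log(1 - p) for k in ks]
    log_M = logsumexp(log_counts)                   # log of the cardinality of H_N
    covered = math.exp(logsumexp([c + m for c, m in zip(log_counts, log_mass)]))
    missing = max(0.0, 1.0 - covered)               # |P^N \ P'|; here |H_N \ Q'| = 0
    sup_log_ratio = max(abs(m + log_M) for m in log_mass)
    return log_M / N, missing + sup_log_ratio / log_M

def boltzmann(p):
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

estimate_small, dist_small = bernoulli_approximation(0.3, 200)
estimate_large, dist_large = bernoulli_approximation(0.3, 20000)
```

Both the entropy estimate and the distance improve as $N$ grows, although slowly; the rate depends on how fast the window width is allowed to shrink.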
Boltzmann entropy. This, by definition, is the Bernoulli class of $P$ in $\mathrm{Ber}(\mathcal{P})$. A posteriori, the law of large numbers shows that this is equivalent to Bernoulli’s definition: two finite probability spaces $P$ and $Q$ are Bernoulli equivalent if and only if they have equal Bernoulli entropies. More precisely, there is a topological isomorphism of the Bernoulli (Grothendieck) semigroup $\mathrm{Ber}(\mathcal{P})$ onto the multiplicative semigroup $\mathbb{R}^{\times}_{\geq 1}$ of real numbers $\geq 1$ that extends the homomorphism $H \mapsto |\mathrm{set}(H)|$ for homogeneous spaces $H \in \mathcal{P}$. The Bernoulli–Boltzmann entropy is then recaptured by composing this isomorphism with $\log : \mathbb{R}^{\times}_{\geq 1} \to \mathbb{R}_{+}$. (The mathematical significance of this log is not apparent until you take a close look at the Fisher metric.)

Boltzmann Formula:
$$\mathrm{ent}(P) = -\sum_{p \in P} |p| \log |p| \quad \text{for all finite probability spaces } P = \{p\} \in \mathcal{P}.$$
If $|P| \neq 1$, then
$$\mathrm{ent}(P) = -\sum_p \frac{|p|}{|P|} \log \frac{|p|}{|P|} = |P|^{-1} \Big( -\sum_p |p| \log |p| \Big) + \log |P|.$$
This is obvious with Bernoulli’s approximation theorem, but the original $\mathrm{ent}(P) = -K \sum_p |p| \log |p|$, where $K$ is the unit conversion constant, is by no means obvious: it provides a non-trivial link between the microworld on the $10^{-9 \pm 1}$ m scale and what we see with the naked eye. Bernoulli–Boltzmann’s definition (unlike $-\sum_p |p| \log |p|$) fully and rigorously expresses the idea that entropy equals the logarithm of the “number of mutually
equal states encoded/detected by $P$” and, thus, makes essential properties of entropy quite transparent. (There is also an information-theoretic rendition of Boltzmann’s argument, often presented as a “bits bargaining” between “Bob and Alice”. Probably, it is understandable by those who are well versed in the stock market.) For example, one immediately sees the following

$(\log n)$-Bound: $\mathrm{ent}(P) \leq \log |\mathrm{set}(P)|$, with equality for homogeneous spaces with $n$ equal atoms, since the powers $P^N$ “Bernoulli converge” to measures with “nearly equal” atoms on subsets $S_N \subset \mathrm{set}(P)^N$ that have cardinalities $|S_N| \leq |\mathrm{set}(P)|^N$ and where $\frac{1}{N} \log |S_N| \to \mathrm{ent}(P)$. (Textbook proofs, where $-\sum |p| \log |p|$ is taken for the definition of entropy, rely on convexity of $x \log x$. In fact, this convexity follows from the law of large numbers, but the sharpness of the $(\log n)$-bound, that is, the implication $\mathrm{ent}(P) = \log |\mathrm{set}(P)| \Rightarrow P$ is homogeneous, is better seen with $\sum |p| \log |p|$, where real analyticity of $\log x$ implies sharpness of this $(\log n)$-inequality. Also, Boltzmann’s formula implies continuity of entropy as a function of $|p| \geq 0$, $p \in P$.)

Functorial Bernoulli. The law of large numbers not only (trivially) yields Bernoulli approximation of objects (finite measure spaces) in $\mathcal{P}$, but also approximation of reductions (morphisms) in $\mathcal{P}$. Namely, given a reduction $f : P_1 \to P_2$, there exists a sequence of reductions $\phi_N : H_{1N} \to H_{2N}$, where $H_{1N}$ and $H_{2N}$ are homogeneous Bernoulli approximations of $P_1^N$ and of $P_2^N$. We call this a homogeneous Bernoulli approximation of the Cartesian powers $f^N : P_1^N \to P_2^N$ of $f$. The existence of such an approximation immediately implies, for example, that entropy is monotone decreasing under reductions: if $P_2$ is a reduction of $P_1$ then $\mathrm{ent}(P_2) \leq \mathrm{ent}(P_1)$; in particular, $\mathrm{ent}(P \vee Q) \geq \mathrm{ent}(P)$ for all $P$ and $Q$ in $\mathcal{P}/\mathcal{X}$ under $\mathcal{X}$. Let $\{f_i\}$, $i \in I$, be a finite set of reductions between some objects $P$ in $\mathcal{P}$.
Ideally, one would like to have homogeneous Bernoulli approximations $\phi_{iN}$ of all $f_i^N$, such that
$$[\mathrm{BA}]_1 \qquad [f_i = f_j \circ f_k] \Rightarrow [\phi_{iN} = \phi_{jN} \circ \phi_{kN}],$$
and such that injectivity/minimality of all fans is preserved, i.e.
$$[\mathrm{BA}]_2 \qquad \text{minimality of } f_{i_\nu} : P \to Q_\nu \ \Rightarrow\ \text{minimality of } \phi_{i_\nu N} : H_N \to H_{i_\nu N}.$$
Probably, this is not always possible (I have no specific counterexample), but one can achieve this with the following weaker assumption on the approximating sequences. Call a sequence $B_N = \{b_N\}$ of finite measure spaces Bernoulli if it is $\varepsilon_N$-homogeneous for some sequence $\varepsilon_N \to 0$ as $N \to \infty$. This means that the atoms $b_N$ in all $B_N$ satisfy:
$$\frac{1}{N} \Big| \log |b_N| + \log |\mathrm{set}(B_N)| \Big| \leq \varepsilon_N + \frac{1}{N} \log |B_N|.$$
A Bernoulli approximation of $P^N$ is a Bernoulli sequence $B_N$ that is asymptotically equivalent to $P^N$; accordingly, one defines Bernoulli approximations $\phi_N$ of powers $f^N$ of reductions $f : P \to Q$. Now it is easy to see (as in the slice removal lemma from [11]) that the above $\{f_i\}$ do admit Bernoulli (not necessarily homogeneous) approximations that satisfy $[\mathrm{BA}]_1$ and $[\mathrm{BA}]_2$.

Shannon Inequality. If a fan $\phi_i : H_0 \to H_i$ of homogeneous spaces is minimal/injective, i.e. the Cartesian product map $\times_i \phi_i : H_0 \to \prod_i H_i$ is injective, then, obviously, $|\mathrm{set}(H_0)| \leq \prod_i |\mathrm{set}(H_i)|$ and $\mathrm{ent}(H_0) \leq \sum_i \mathrm{ent}(H_i)$. This, applied to an ($\varepsilon_N$-homogeneous) Bernoulli approximation of a minimal/injective fan $P_0 \to P_i$ of arbitrary finite measure spaces, shows that $\mathrm{ent}(P_0) \leq \sum_i \mathrm{ent}(P_i)$. In particular, if $P_i \in \mathcal{P}/\mathcal{X}$ lie under $\mathcal{X}$ (e.g. being represented by finite partitions of an ordinary measure space $X$), then
$$\mathrm{ent}\Big(\bigvee_i P_i\Big) \leq \sum_i \mathrm{ent}(P_i).$$
The above argument, going back to to Boltzmann and Gibbs, is a translation of a naive physicist’s reasoning to mathematical language. In fact, this ∨, physically speaking, is a kind of a sum, the result of pooling together the results of the joint entropy count by all Pi . If all Pi positioned far away one from another on your, say, crystal, then you assume (observe?) that flashes of lights are (essentially) independent: ⋁i Pi = ∏i Pi and ent(⋁i Pi ) = ∑i ent(Pi ). In general however, the observable states may constrain one another by mutual interaction; then, there are less states to observe and ent(⋁i Pi ) < ∑i ent(Pi ) in agreement with experiment. Relative Entropy. Since the fibers Gh = φ−1 (h) ⊂ G, h ∈ H, of a reduction φ ∶ G → H between homogeneous spaces have equal cardinalities, one may define entropy of φ by ent(φ) = log ∣set(Gh )∣, where, obviously, this entropy satisfies: ent(φ) = ent(G) − ent(H). Then one defines relative Boltzmann’s entropy ent(f ) of a reduction f ∶ P → Q between arbitrary finite measure spaces via a homogeneous Bernoulli approximation φN ∶ GN → HN of f N as ent(f ) = lim N −1 ent(φN ). N →∞
Alternatively, one can do it in a more abstract fashion with the relative (Grothendieck) Bernoulli semigroup \overrightarrow{Ber}(P) generated by classes [f] of asymptotic equivalence of reductions f ∈ P with the addition rule [f1 ○ f2] = [f1] + [f2] (compare [1], [16]).
Relative Shannon inequality. It is clear, with Bernoulli approximation as in the absolute case, that reductions fi ∶ Pi ∨ Qi → Qi in P/X for spaces Pi and Qi under X satisfy
ent(⋁i fi) ≤ ∑i ent(fi).
In a search for a structure
Since ent(fi) = ent(Qi ∨ Pi) − ent(Qi), this is equivalent to
ent(⋁i (Qi ∨ Pi)) − ent(⋁i Qi) ≤ ∑i [ent(Qi ∨ Pi) − ent(Qi)].
Alternatively, one can formulate such an inequality in terms of minimal/injective fans of reductions P → Qi, i = 1, 2, ..., n, coming along with (cofans of) reductions Qi → R, such that the obvious diagrams commute:
ent(P) + (n − 1) ent(R) ≤ ∑i ent(Qi).
Another pleasant, albeit obvious (with Bernoulli), feature of the relative entropy of reductions f ∶ P → Q between probability spaces is the representation of ent(f) by the convex combination of the entropies of the q-fibers Pq = f^{−1}(q) ⊂ P, q ∈ Q.
Summation Formula: ent(f) = ∑q ∣q∣ ⋅ ent(Pq).
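As a sanity check, the summation formula can be compared with the difference ent(P) − ent(Q) on a small example. This is a hedged sketch: the atoms, their weights and the map `f` are made up for illustration, and `entropy` normalizes each fiber to mass 1 before computing its entropy, which is an assumption about how the entropy of a fiber is to be read.

```python
import math
from collections import defaultdict

def entropy(weights):
    """Entropy -sum p log p of a finite measure, after normalizing it to mass 1."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

# Hypothetical probability space P and reduction f : P -> Q given on atoms.
P = {"p1": 0.1, "p2": 0.2, "p3": 0.3, "p4": 0.4}
f = {"p1": "q1", "p2": "q1", "p3": "q2", "p4": "q2"}

# Collect the weights of the atoms in each fiber P_q = f^{-1}(q).
fibers = defaultdict(list)
for atom, weight in P.items():
    fibers[f[atom]].append(weight)

ent_P = entropy(list(P.values()))
ent_Q = entropy([sum(ws) for ws in fibers.values()])

relative = ent_P - ent_Q                                          # ent(f) as a difference
fiber_sum = sum(sum(ws) * entropy(ws) for ws in fibers.values())  # sum of |q| * ent(P_q)
```

The two numbers `relative` and `fiber_sum` agree up to rounding, which is the classical chain rule for entropy.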
Remark. The above definition of ent(f) is applicable to f ∶ P → Q, where P and Q are countable probability (sometimes more general) spaces, possibly with ent(P) = ∞ and ent(Q) = ∞, where the formula ent(f) = ent(P) − ent(Q) serves as a definition of the difference between these two infinities.
Resolution of Infinite Spaces X. Let P/X be the ∨-category associated with X and let us formalize the notion of “equivalent protocols” of our physicist with sequences P∞ = {Pi} of finite objects in P/X, i.e. of finite measure spaces. Say that P∞ resolves a finite measure space Q ∈ P/X that lies under X if there is no eventual gain in state detection when you include Q in your protocol:
ent(Q ∨ Pi) − ent(Pi) ≤ εi → 0 as i → ∞.
If P∞ resolves all Q, then, by definition, it is a resolution of X.
Infinite Products. Say that X is representable by a (usually countable) Cartesian product of Ps ∈ P/X, s ∈ S, briefly, X is a Cartesian product ∏s∈S Ps, if the finite Cartesian products ΠT = ∏s∈T Ps lie under X for all finite subsets T ⊂ S and if these ΠT resolve X, namely, some sequence ΠTi resolves X. (The subsets Ti ⊂ S exhaust S in this case.)
Examples. A product X = ∏s∈S Ps is called minimal if a Q in P/X lies under X if and only if it lies under some finite product ΠT. For instance, all Q under the minimal Cartesian power {1/2, 1/2}^S are composed of dyadic atoms. The classical Lebesgue–Kolmogorov product X = ∏s∈S Ps is also a product in this sense, where the resolution property is a reformulation of the Lebesgue density theorem, and where the translation “Lebesgue’s density ⇒ resolution” goes with the following evident property of relative entropy.
Let P ← R → Q be a minimal R-fan of reductions, where f denotes the reduction R → P, let P′ ∈ P be a subspace, denote by Rp′ = f^{−1}(p′) ⊂ R, p′ ∈ P′, the p′-fibers of f and let MII(p′) be the mass of the second greatest atom in Rp′.
Misha Gromov
If ∣P ∖ P′∣ ≤ λ ⋅ ∣P∣ and MII(p′) ≤ λ∣Rp′∣ for some (small) 0 ≤ λ < 1 and all p′ ∈ P′, then
ent(f) ≤ (λ + ε) ⋅ ∣set(Q)∣ for ε = ε(λ) → 0 as λ → 0.
(Secretly, ε ≤ λ ⋅ (1 − log(1 − λ)) by Boltzmann formula.) To see this, observe that ent(Rp) ≤ ∣set(Rp)∣ ≤ ∣set(Q)∣ for all p ∈ P, that ent(Rp′) ≤ ε → 0 as λ → 0 by continuity of entropy for MII(p′) → 0, and conclude by using the summation formula.
Normalization and Symmetry. All of the above properties of entropy of finite spaces appear in Shannon’s information theory. (Probably, this was known to Boltzmann and Gibbs, who had never explicitly formulated something so physically obvious.) Granted these, we can now understand what the “naive physicist” was trying to say. Infinite systems/spaces X have infinite entropies that need to be renormalized, e.g. with some “natural” approximation of X by finite spaces PN, such that
“ent(X ∶ size)” = lim_{N→∞} ent(PN) / “size”(PN).
The simplest case where “size” makes sense is when your state detector PN consists of, say k, “identical parts”; then you may take k for the size of PN. Physically speaking, “identical” means “related by symmetries” of X with detectors attached to it, e.g. for X corresponding to a crystal S. With this in mind, take a finite P and apply several (eventually many) symmetry transformations δ of X to P (assuming these symmetries exist), call the set of these transformations ∆N, denote by ∣∆N∣ its cardinality and let
entP(X ∶ ∆∞) = lim_{N→∞} ∣∆N∣^{−1} ent(⋁δ∈∆N δ(P))
for some sequence ∆N with ∣∆N∣ → ∞, where a sublimit (let it be physically meaningless) will do if there is no limit. (Caution: transformations of categories are functors, not maps, but you can easily define them as maps in P/X.) The answer certainly will depend on {∆N} but what concerns us at the moment is dependence on P. A single P, and even all ⋁δ∈∆N δ(P), may not suffice to fully “resolve” X. So we take a resolution P∞ = {Pi} of X (that, observe, has nothing to do with our transformations) and define
ent(X ∶ ∆∞) = entP∞(X ∶ ∆∞) = lim_{i→∞} entPi(X ∶ ∆∞).
This, indeed, does not depend on P∞. If Q∞ = {Qi} is another resolution (or any sequence for this matter), then the entropy contribution of each Qj to Pi, that is the difference ent(Pi ∨ Qj) − ent(Pi), is smaller than εi = ε(j, i) → 0 as i → ∞, by the above definition of resolution.
Since δ are automorphisms, the entropies do not change under δ-moves and
ent(δ(Pi) ∨ δ(Qj)) − ent(δ(Pi)) = ent(Pi ∨ Qj) − ent(Pi) ≤ εi;
therefore, when “renormalized by size” of ∆N, the corresponding ∨-products satisfy the same inequality:
∣∆N∣^{−1} (ent[⋁δ∈∆N (δ(Pi) ∨ δ(Qj))] − ent[⋁δ∈∆N δ(Pi)]) ≤ εi → 0 as i → ∞
by the relative Shannon inequality. Now we see that adding Q1, Q2, ..., Qj to P∞ does not change the above entropy, since it is defined with i → ∞, and adding all of Q∞ does not change it either. Finally, we turn the tables, resolve Pj by Qi and conclude that P∞ and Q∞, which represent “equivalent experimental protocols”, give us the same entropy:
entP∞(X ∶ ∆∞) = entQ∞(X ∶ ∆∞),
as our physicist has been telling us all along.
Kolmogorov Theorem for Bernoulli Systems. Let P be a finite probability space and X = P^Z. This means in our language that the corresponding X is representable by a Cartesian power P^Z with the obvious (Bernoulli) action of Z on it. If spaces P^Z and Q^Z are Z-equivariantly isomorphic, then ent(P) = ent(Q).
Proof. Let Pi denote the Cartesian power P^{{−i, ..., 0, ..., i}}, let ∆N = {1, ..., N} ⊂ Z, observe that
⋁δ∈∆N δ(Pi) = P^{{1−i, ..., i+N}}
and conclude that ent(⋁δ∈∆N δ(Pi)) = (N + 2i) ent(P) for all i = 1, 2, .... Therefore,
entPi(X ∶ ∆∞) = lim_{N→∞} N^{−1} ent(⋁δ∈∆N δ(Pi)) = lim_{N→∞} ((N + 2i)/N) ent(P) = ent(P)
and
ent(X ∶ ∆∞) = lim_{i→∞} entPi(X ∶ ∆∞) = ent(P).
Similarly, ent(Q^Z ∶ ∆∞) = ent(Q), and since P^Z and Q^Z are Z-equivariantly isomorphic, ent(P^Z ∶ ∆∞) = ent(Q^Z ∶ ∆∞); hence ent(P) = ent(Q).
Discussion (A) The above argument applies to all amenable (e.g. Abelian) groups Γ (that satisfy a generalized “(N + 2i)/N → 1, N → ∞” property) where it also shows that if Q^Γ is a Γ-reduction of P^Γ then ent(Q) ≤ ent(P). (“Reduction” means that Q^Γ receives a Γ-equivariant measure preserving map from P^Γ that is a natural transformation of functors represented by the two Γ-spaces.)
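The normalization in the proof can be illustrated numerically. The sketch below is not part of the original argument and rests on stated assumptions: the three-atom space `P` is made up, additivity of entropy under Cartesian powers is verified by brute force only for a small power, and the join of the shifted copies of Pi is taken to be the Cartesian power of P over the union of the shifted coordinate windows, whose size is computed directly as a set union.

```python
import math
from itertools import product

def entropy(weights):
    """Entropy -sum p log p of a finite measure, after normalizing it to mass 1."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

def power_entropy(weights, n):
    """Entropy of the Cartesian power P^n, by enumerating all of its atoms."""
    return entropy([math.prod(combo) for combo in product(weights, repeat=n)])

def window(i, N):
    """Coordinates seen by the shifts by 1..N of the window {-i, ..., i}."""
    coords = set()
    for delta in range(1, N + 1):
        coords.update(range(-i + delta, i + delta + 1))
    return coords

P = [0.5, 0.3, 0.2]   # hypothetical finite probability space
h = entropy(P)

# Additivity ent(P^n) = n * ent(P), checked by brute force for n = 4.
additive_ok = abs(power_entropy(P, 4) - 4 * h) < 1e-9

def normalized_entropy(i, N):
    """Entropy of the join of the N shifted copies of P_i, divided by N."""
    return len(window(i, N)) * h / N

values = [normalized_entropy(2, N) for N in (10, 100, 10000)]
```

For fixed i the values decrease towards ent(P) as N grows, since the boundary contribution of the window becomes negligible relative to N.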
To our “naive physicist’s” surprise, the invariance of entropy for “Bernoulli crystals” was accepted by mathematicians not around 1900 but in 1958 (see [13] for how it was going after 1958). Had it taken so long because mathematicians were discouraged by the lack of “rigor” in physicists’ reasoning? But had this been already known to physicists, rigor or no rigor? (A related result – the existence of thermodynamic limit for a physically significant class of systems – was published by Van Hove in 1949, but no self-respecting physicist, even if he/she understood it, would care/dare to write anything about one-dimensional systems like P^Z with no interaction in them.)
Maybe the simplicity of Kolmogorov’s argument and an apparent inevitability with which it comes along with translation of “baby-Boltzmann” to “baby-Grothendieck” is illusory. An “entropy barrier” on the road toward a conceptual proof (unlike the “energy barrier” surrounding a “hard proof”) may remain unnoticed by one who follows the marks left by a pathfinder that keep you on the track through the labyrinth under the “mountain of entropy”.
All this is history. The tantalizing possibility suggested by entropy – this is the main reason for telling the story – is that there may be other “little somethings” around us the mathematical beauty of which we still fail to recognize because we see them in a curved mirror of our preconceptions.
Ultralimits and Sofic Groups. In 1987 Ornstein and Weiss constructed Γ-equivariant continuous surjective, hence measure preserving, group homomorphisms A^Γ → (A × A)^Γ, for all free non-cyclic groups Γ and all finite Abelian groups A. It is unknown, in general, when there is such a continuous surjective (injective, bijective) Γ-equivariant homomorphism A^Γ → B^Γ for given Γ and compact (e.g. finite) groups A and B, but many significant examples of “entropy increasing” Γ-reductions for non-amenable groups Γ are constructed in [2], and a general result of this kind is available [7] for groups Γ that contain free subgroups. For example,
If Γ ⊃ F2, then there exists a Γ-reduction P1^Γ → P2^Γ for all finite probability spaces P1 and P2 except for the trivial case where P1 consists of a single atom.
(In view of [2], this is likely to be true for all non-amenable groups.) But, amazingly, the following was shown by Bowen in 2010:
A Γ-isomorphism between Bernoulli systems P1^Γ ↔ P2^Γ implies that ent(P1) = ent(P2) for a class of non-amenable groups Γ, including, for example, all residually finite groups such as free groups.
One can arrive at this class of groups (they are called sofic following Weiss (2000)) by implementing “naive physicist’s reasoning” (probably close to what Boltzmann had in mind) in terms of non-standard analysis, namely, by completing P not with “projective-like limit spaces” X but with “non-standard spaces” that are objects in a non-standard model P∗ of the R-valued first order language of P that can be represented as an ultralimit (or ultraproduct) of P, as it is done by Pestov in [19]. Roughly, objects P in P∗ are collections of N atoms of weights ∣p∣, where N is an infinitely large non-standard integer, ∣p∣ are positive infinitesimals and where the sum ∑P ∣p∣ is an ordinary real number. Then sofic groups are defined as subgroups of automorphism groups of such spaces.
These groups seem rather special, but there is no example at the moment of a countable non-sofic group. Probably, suitably defined random groups [17] are non-sofic. On the other hand, there may be a meaningful class of “random Γ-spaces” parametrized by the same probability measure (space) as random Γ. In 2010, Bowen introduced a spectrum of sofic entropies (with some properties reminiscent of von Neumann entropy) and proved, in particular, that
Minimal/injective fans of reductions of Bernoulli systems P^Γ → Qi^Γ, i = 1, 2, ..., n, for sofic groups Γ satisfy Shannon’s inequality
ent(P) ≤ ∑i=1,...,n ent(Qi).
Moreover, let n = 2 and let Q1^Γ → R^Γ, Q2^Γ → R^Γ be reductions, such that the “cofan” Q1^Γ → R^Γ ← Q2^Γ is minimal (i.e. there is no Γ-space Ro between Qi^Γ and R^Γ) and such that the diamond diagram with four arrows, P^Γ ⇉ Qi^Γ ⇉ R^Γ, i = 1, 2, commutes. Then the four Γ-systems in this diamond satisfy the Relative Shannon Inequality, that is also called Strong Subadditivity:
ent(P) + ent(R) ≤ ent(Q1) + ent(Q2).
(This, and everything else we say about sofic entropies, was explained to me by Lewis Bowen.)
It may be non-surprising that Shannon inequalities persist in the sofic Γ-spaces categories, since Shannon’s inequalities were derived by Bernoulli approximation from the implication [A injects into B] ⇒ ∣A∣ ≤ ∣B∣, rather than from [B surjects onto A] ⇒ ∣A∣ ≤ ∣B∣, but it is unknown if these inequalities are true/false for a single non-sofic group Γ, assuming such a Γ exists.
To get an idea why “injective” rather than “surjective” plays a key role in the sofic world, look at a self-map of a set, say f ∶ A → A. If A is finite, then [f is non-injective] ⇔ [f is non-surjective], but it is not so anymore if we go to some kind of (ultra)limit:
(i) non-injectivity says that an equation, namely, f(a1) = f(a2), does admit a non-trivial solution. This is stable under (ultra)limits and, supported by a counting argument (of the “numbers of pairs of pure states” that come together under fans of reductions), seems to underlie some of Bowen’s entropies.
(ii) non-surjectivity says that another equation, f(a1) = b, does not always admit a solution. New solutions may appear in the (ultra)limit.
(Computationally speaking, you may have a simple rule/algorithm for finding solutions of f(a1) = f(a2) with a1 ≠ a2, e.g. for a polynomial, let it be even linear, self-map of a vector space over Fp, but it may be harder to obtain an effective description of the image f(A) ⊂ A and even less so of its complement A ∖ f(A).)
Questions. Is there a description/definition of (some of) sofic entropies in categorical terms?
More specifically, consider the category of Lebesgue probability Γ-spaces X for a given countable group Γ and let [X ∶ Γ] be the Grothendieck (semi)group generated by Γ-reductions f with the relations [f1 ○ f2] = [f1] + [f2], where, as usual, the Γ-spaces themselves are identified with the reductions to one point spaces. How large is this semigroup? When is it non-trivial? Which part of it is generated by the Bernoulli shifts?
If we do not require any continuity property, this group may be huge; some continuity, e.g. under projective limits of reductions (such as infinite Cartesian products), seems necessary, but it is unclear what should be a Γ-counterpart of the asymptotic equivalence.
Also some extra conditions, e.g. additivity for Cartesian products: [f1 × f2] = [f1] + [f2], or at least, [f^N] = N[f] for Cartesian powers, may be needed.
Besides, the semigroup [X ∶ Γ] must(?) carry a partial order structure that should satisfy (at least some of) the inequalities that hold in P, e.g. the above Shannon type inequalities for minimal/injective fans. (I am not certain if there are entropy inequalities for more complicated diagrams/quivers in P that do not formally follow from Shannon inequalities, but if there are any, they may be required to hold in [X ∶ Γ].)
The most naive entropy invariant that should be expressible in terms of [X ∶ Γ] is the infimum of entropies of generating partitions, or rather, the infimum of (log M)/N, such that the Cartesian power (X^N, Γ) is isomorphic to the Bernoulli action of Γ on the topological infinite power space Y = {1, 2, ..., M}^Γ with some Γ-invariant Borel probability measure on Y (that is not necessarily the Cartesian power measure).
One may expect (require?) the (semi)group [X ∶ Γ] to be functorial in Γ, e.g. for equivariant reductions (X1, Γ1) → (X2, Γ2) for homomorphisms Γ1 → Γ2 and/or for several groups Γi acting on an X, in particular, for Bernoulli shifts on P^∆ for Γi transitively acting on a countable set ∆.
2. Fisher Metric and Von Neumann Entropy.
Let us ponder over Boltzmann’s function e(p) = ∑i pi log pi. All our inequalities for the entropy were reflections of the convexity of this e(p), p = {pi}, i ∈ I, on the unit simplex △(I), ∑i pi = 1, in the positive cone R^I_+ ⊂ R^I. Convexity translates to the language of calculus as positive definiteness of the Hessian h = Hess(e) on △(I); following Fisher (1925) let us regard h as a Riemannian metric on △(I). Can you guess what the Riemannian space (△(I), h) looks like? Is it metrically complete? Have you ever seen anything like that?
In fact, the Riemannian metric h on △(I) has constant sectional curvature, where the real moment map MR ∶ {xi} → {pi = xi^2} is, up to 1/4-factor, an isometry from the positive “quadrant” of the unit Euclidean sphere onto (△(I), h). Unbelievable! Yet this trivially follows from (p log p)′′ = 1/p, since the Riemannian metric induced by MR^{−1} at {pi} equals
∑i (d√pi)^2 = ∑i dpi^2/4pi.
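The two facts used here, namely that the second derivative of p log p is 1/p and that the substitution pi = xi^2 turns one quarter of ∑i dpi^2/pi into the Euclidean ∑i dxi^2, can be checked numerically at a sample point. The following is a minimal sketch under stated assumptions: the point `x` on the unit sphere and the tangent vector `v` are chosen arbitrarily for illustration, and the second derivative is approximated by a central finite difference.

```python
import math

def boltzmann_term(p):
    """One summand p log p of Boltzmann's function e(p)."""
    return p * math.log(p)

def second_derivative(func, p, step=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(p + step) - 2 * func(p) + func(p - step)) / step ** 2

# Sample point on the positive "quadrant" of the unit sphere: 4 + 9 + 36 = 49.
x = [2 / 7, 3 / 7, 6 / 7]
# A tangent vector at x (orthogonal to x): 3 * 2/7 - 2 * 3/7 = 0.
v = [3.0, -2.0, 0.0]

p = [xi ** 2 for xi in x]                     # image under the real moment map
dp = [2 * xi * vi for xi, vi in zip(x, v)]    # differential of p_i = x_i^2 applied to v

quarter_fisher = sum(d * d / (4 * pi) for d, pi in zip(dp, p))   # sum of dp_i^2 / (4 p_i)
euclidean = sum(vi * vi for vi in v)                             # sum of dx_i^2

hessian_entry = second_derivative(boltzmann_term, 0.25)          # should be close to 1 / 0.25
```

The tangent vector is pushed forward to a vector `dp` with zero coordinate sum, as it must be for a tangent vector to the simplex, and the two quadratic forms agree on it.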
This MR extends to the (full) moment map M ∶ C^I → R^I_+ = C^I/T^I for M ∶ zi → zi z̄i, where T^I is the n-torus, n = ∣I∣, naturally acting on C^I and where the restriction of M to the unit sphere in C^I → R^I_+ factors through the complex projective space CP(I) of complex dimension ∣I∣ − 1 and thus gives a map CP(I) → △(I).
This tells you what you could have been feeling all along: the cone R^I_+ is ugly, it breaks the Euclidean/orthogonal symmetry of R^I – the symmetry is invisible (?) in the category P unless we write down and differentiate Boltzmann’s formula.
Now we have the orthogonal symmetry, even better the unitary symmetry of C^I, and may feel proud upon discovering the new world where entropy “truly” lives. Well..., it is not new, physicists came here ahead of us and named this world “quantum”. Yet, even if disappointed, we feel warm toward Nature who shares with us the idea of mathematical beauty.
We are not tied up to a particular orthogonal basis for defining entropy anymore, we forget the coordinate space C^I that we regard as a Hilbert space S, where one basis of orthonormal vectors {s} ⊂ S is as good as another.
An “atomic measure”, or a pure state P in S, is a (complex) line in S with a positive real number ∣p∣ attached to it. In order to be able to add such measures, we regard P as a semipositive Hermitian form of rank one that vanishes on the orthogonal complement to our line, and such that P equals ∣p∣ on the unit vectors in this line.
Accordingly, (non-atomic) states P on S are defined as convex combinations of pure ones. In other words, a quantum state P on a Hilbert space S is a non-zero semipositive Hermitian form on S (that customarily is represented by a semipositive self adjoint operator S → S) that we regard as a real valued quadratic function on S that is invariant under multiplication by √−1. (In fact, one could forget the C-structure in S and admit all non-negative quadratic functions P(s) as states on S.)
We may think of a state P as a “measure” on subspaces T ⊂ S, where the “P-mass” of T, denoted P(T), is the sum ∑t P(t), where the summation is taken over an orthonormal basis {t} in T. (This does not depend on the basis by the Pythagorean theorem.) The total mass of P is denoted ∣P∣ = P(S); if ∣P∣ = 1 then P is called a density (instead of probability) state.
Observe that P(T1 ⊕ T2) = P(T1) + P(T2) for orthogonal subspaces T1 and T2 in S and that the tensor product of states P1 on S1 and P2 on S2, that is a state on S1 ⊗ S2, denoted P = P1 ⊗ P2, satisfies P(T1 ⊗ T2) = P1(T1) ⋅ P2(T2) for all T1 ⊂ S1 and T2 ⊂ S2.
If Σ = {si}i∈I ⊂ S, ∣I∣ = dim(S), is an orthonormal basis in S then the set P(Σ) = {P(si)} is a finite measure space of mass ∣P(Σ)∣ = ∣P∣. Thus, P defines a map from the space FrI(S) of full orthonormal I-frames Σ in S (that is a principal
homogeneous space of the unitary group U(S)) to the Euclidean (∣I∣ − 1)-simplex of measures of mass ∣P∣ on the set I, that is {pi} ⊂ R^I_+, ∑i pi = ∣P∣.
Classical Example. A finite measure space P = {p} defines a quantum state on the Hilbert space S = C^{set(P)} that is the diagonal form P = ∑p∈P ∣p∣ zp z̄p.
Notice that we excluded spaces with zero atoms from the category P in the definition of classical measure spaces with no(?) effect on the essential properties of P. But one needs to keep track of these “zeros” in the quantum case. For example, there is a unique, up to scale, homogeneous state on S, that is the Hilbert form of S, but the states that are homogeneous on their supports (normal to 0(S)) constitute a respectable space of all linear subspaces in S.
Von Neumann Entropy. There are several equivalent definitions of ent(P) that we shall be using interchangeably.
(1) The “minimalistic” definition is given by extracting a single number from the classical entropy function on the space of full orthonormal frames in S, that is Σ ↦ ent(P(Σ)), by taking the infimum of this function over Σ ∈ FrI(S), ∣I∣ = dim(S),
ent(P) = inf_Σ ent(P(Σ)).
(The supremum of ent(P(Σ)) equals log dim(S). In fact, there always exists a full orthonormal frame {si}, such that P(si) = P(sj) for all i, j ∈ I, by the Kakutani–Yamabe–Yujobo theorem that is applicable to all continuous functions on spheres. Also, the average of ent(P(Σ)) over FrI is close to log dim(S) for large ∣I∣ by an easy argument.)
It is immediate with this definition that
The function P ↦ ent(P) is concave on the space of density states:
ent((P1 + P2)/2) ≥ (ent(P1) + ent(P2))/2.
Indeed, the classical entropy is a concave function on the simplex of probability measures on the set I, that is {pi} ⊂ R^I_+, ∑i pi = 1, and infima of families of concave functions are concave.
(2) The traditional “spectral definition” says that the von Neumann entropy of P equals the classical entropy of the spectral measure of P. That is, ent(P) equals ent(P(Σ)) for a frame Σ = {si} that diagonalizes the Hermitian form P, i.e. where si is P-orthogonal to sj for all i ≠ j.
Equivalently, “spectral entropy” can be defined as the (obviously unique) unitary invariant extension of Boltzmann’s entropy from the subspace of classical states to the space of quantum states, where “unitary invariant” means that ent(g(P)) = ent(P) for all unitary transformations g of S.
If concavity of entropy is non-obvious with this definition, it is clear that
The spectrally defined entropy is additive under tensor products of states:
ent(⊗k Pk) = ∑k ent(Pk),
and if ∑k ∣Pk∣ = 1, then the direct sum of Pk satisfies
ent(⊕k Pk) = ∑1≤k≤n ∣Pk∣ ent(Pk) − ∑1≤k≤n ∣Pk∣ log ∣Pk∣.
This follows from the corresponding properties of the classical entropy, since tensor products of states correspond to Cartesian products of measure spaces: (P1 ⊗ P2)(Σ1 ⊗ Σ2) = P1(Σ1) × P2(Σ2), and the direct sums correspond to disjoint unions of sets.
(3) Let us give yet another definition that will bring together the above two. Denote by Tε = Tε(S) the set of the linear subspaces T ⊂ S such that P(T) ≥ (1 − ε)P(S) and define
entε(P) = inf_{T∈Tε} log dim(T).
By Weyl’s variational principle, the supremum of P(T) over all n-dimensional subspaces T ⊂ S is achieved on a subspace, say S+(n) ⊂ S, spanned by n mutually orthogonal spectral vectors sj ∈ S, that are vectors from a basis Σ = {si} that diagonalizes P. Namely, one takes sj for j ∈ J ⊂ I, ∣J∣ = n, such that P(sj) ≥ P(sk) for all j ∈ J and k ∈ I ∖ J.
(To see this, orthogonally split S = S+(n) ⊕ S−(n) and observe that the P-mass of every subspace T ⊂ S increases under the transformations (s+, s−) → (λs+, s−) that eventually, for λ → +∞, bring T to the span of spectral vectors.)
Thus, this entε equals its classical counterpart for the spectral measure of P.
To arrive at the actual entropy, we evaluate entε on the tensorial powers P^{⊗N} on S^{⊗N} of states P on S and, by applying the law of large numbers to the corresponding Cartesian powers of the spectral measure space of P, conclude that
The limit
ent(P) = lim_{N→∞} (1/N) entε(P^{⊗N})
exists and it equals the spectral entropy of P for all 0 < ε < 1.
(One may send ε → 0 if one wishes.) It also follows from Weyl’s variational principle that the entε-definition agrees with the “minimalistic” one. (It takes a little extra effort to check that ent(P(Σ)) is strictly greater than lim (1/N) entε(P^{⊗N}) for all non-spectral frames Σ in S but we shall not need this.)
Unitary Symmetrization and Reduction. Let dµ be a Borel probability measure on the group U(S) of the unitary transformations of S, e.g. the normalized Haar measure dg on a compact subgroup G ⊂ U(S). The µ-average of a state P on S, that is called the G-average for dµ = dg, is defined by
µ ∗ P = ∫_G (g ∗ P) dµ for (g ∗ P)(s) =def P(g(s)).
Notice that ent(µ ∗ P) ≥ ent(P) by concavity of entropy and that the G-average of P, denoted G ∗ P, equals the (obviously unique) G-invariant state on S such that G ∗ P(T) = P(T) for all G-invariant subspaces T ⊂ S. Also observe that the µ-averaging operator commutes with tensor products:
(µ1 × µ2) ∗ (P1 ⊗ P2) = (µ1 ∗ P1) ⊗ (µ2 ∗ P2).
If S = S1 ⊗ S2, and the group G = G1 equals U(S1) that naturally acts on S1 (or any G irreducibly acting on S1 for this matter), then there is a one-to-one correspondence between G1-invariant states on S and states on S2. The state P2 on S2 that corresponds to G1 ∗ P on S is called the canonical reduction of P to S2. Equivalently, one can define P2 by the condition P2(T2) = P(S1 ⊗ T2) for all T2 ⊂ S2.
(Customarily, one regards states as selfadjoint operators O on S defined by ⟨O(s1), s2⟩ = P(s1, s2). The reduction of an O on S1 ⊗ S2 to an operator, say, on S2 is defined as the S1-trace of O that does not use the Hilbertian structure in S.)
Notice that ∣P2∣ = P2(S2) = ∣P∣ = P(S), that
(∗) ent(P2) = ent(G ∗ P) − log dim(S1)
and that the canonical reduction of the tensorial power P^{⊗N} to S2^{⊗N} equals P2^{⊗N}.
Classical Remark. If we admit zero atoms to finite measure spaces, then a classical reduction can be represented by the push-forward of a measure P from a Cartesian product of sets, S = S1 × S2, to P2 on S2 under the coordinate projection S → S2. Thus, canonical reductions generalize classical reductions. (“Reduction by G-symmetrization” with non-compact, say amenable G, may be of interest also for Γ-dynamical spaces/systems, for instance, such as P^Γ in the classical case and P^{⊗Γ} in the quantum setting.)
A novel feature of “quantum” is a possible increase of entropy under reductions (that is similar to what happens to sofic entropies of classical Γ-systems for non-amenable groups Γ). For example, if P is a pure state on S ⊗ T (entropy = 0) that is supported on (the line generated by) the vector ∑i si ⊗ ti for orthonormal bases {si} in S and {ti} in T (here dim(S) = dim(T)), then, obviously, the canonical reduction of P to T is a homogeneous state with entropy = log dim(T). (In fact, every state P on a Hilbert space T equals the canonical reduction of a pure state on T ⊗ S whenever dim(S) ≥ dim(T), because every Hermitian form on T can be represented as a vector in the tensor product of T with its Hermitian dual.)
Thus a single classically indivisible “atom” represented by a pure state on S ⊗ T may appear to the observer looking at it through the kaleidoscope of quantum windows in T as several (equiprobable in the above case) particles.
On the other hand, the Shannon inequality remains valid in the quantum case, where it is usually formulated as follows.
Subadditivity of von Neumann’s Entropy (Lanford–Robinson 1968). The entropies of the canonical reductions P1 and P2 of a state P on S = S1 ⊗ S2 to S1 and to S2 satisfy
ent(P1) + ent(P2) ≥ ent(P).
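The increase of entropy under reduction of a pure state can be seen in the smallest case dim(S) = dim(T) = 2. The sketch below is an illustration under stated assumptions, not the author's construction: coefficients are taken real, a vector ∑ ψij si ⊗ tj is stored as the 2×2 array `psi`, the reduced states are computed as the matrices ψψᵀ and ψᵀψ, and the eigenvalues of a real symmetric 2×2 matrix are found in closed form. The helper names are hypothetical.

```python
import math

def entropy(weights):
    """Entropy -sum p log p of a finite measure, after normalizing it to mass 1."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 1e-15)

def eigenvalues_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [mean + radius, mean - radius]

def reductions(psi):
    """Reduced states on S and on T of the pure state with coefficient array psi."""
    on_S = [[sum(psi[i][k] * psi[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
    on_T = [[sum(psi[k][i] * psi[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return on_S, on_T

r = 1 / math.sqrt(2)
psi = [[r, 0.0], [0.0, r]]   # the unit vector (s1 (x) t1 + s2 (x) t2) / sqrt(2)

on_S, on_T = reductions(psi)
ent_S = entropy(eigenvalues_2x2(on_S))
ent_T = entropy(eigenvalues_2x2(on_T))
ent_pure = 0.0   # a rank-one density state has spectrum (1, 0, 0, 0)
```

Both reductions come out homogeneous with entropy log 2, while the pure state itself has entropy 0, so subadditivity holds here with a wide margin.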
Proof. Let Σ1 and Σ2 be orthonormal bases in S1 and S2 and let Σ = Σ1 × Σ2 be the corresponding basis in S = S1 ⊗ S2. Then the measure spaces P1(Σ1) and P2(Σ2) equal classical reductions of P(Σ) for the Cartesian projections of Σ to Σ1 and to Σ2. Therefore,
ent(P(Σ1 × Σ2)) ≤ ent(P1(Σ1)) + ent(P2(Σ2))
by Shannon inequality, while ent(P) ≤ ent(P(Σ1 × Σ2)) according to our minimalistic definition of von Neumann entropy. Taking Σ1 and Σ2 that diagonalize P1 and P2 makes the right-hand side equal to ent(P1) + ent(P2).
Alternatively, one can derive subadditivity with the entε-definition by observing that
entε1(P1) + entε2(P2) ≥ entε12(P) for ε12 = ε1 + ε2 + ε1ε2
and applying this to P^{⊗N} for N → ∞, say with ε1 = ε2 = 1/3.
Concavity of Entropy Versus Subadditivity. There is a simple link between the two properties. To see this, let P1 and P2 be density states on S and let Q = (1/2)P1 ⊕ (1/2)P2 be their direct sum on S ⊕ S = S ⊗ C^2. Clearly,
ent(Q) = (1/2)(ent(P1) + ent(P2)) + log 2.
On the other hand, the canonical reduction of Q to S equals (1/2)(P1 + P2), while the reduction of Q to C^2 = C ⊕ C is 1/2 ⊕ 1/2. Thus, concavity follows from subadditivity and the converse implication is straightforward.
Here is another rendition of subadditivity. Let compact groups G1 and G2 unitarily act on S such that the two actions commute and the action of G1 × G2 on S is irreducible; then
(⋆) ent(P) + ent((G1 × G2) ∗ P) ≤ ent(G1 ∗ P) + ent(G2 ∗ P)
for all states P on S. This is seen by equivariantly decomposing S into the direct sum of, say n, tensor products:
S = ⊕k (S1k ⊗ S2k), k = 1, 2, ..., n,
for some unitary actions of G1 on all S1k and of G2 on S2k and by observing that (⋆) is equivalent to subadditivity for the reductions of P on these tensor products.
Strong Subadditivity and Bernoulli States. The inequality (⋆) generalizes as follows.
Let H and G be compact groups of unitary transformations of a finite dimensional Hilbert space S and let P be a state (positive semidefinite Hermitian form) on S. If the actions of H and G commute, then the von Neumann entropies of the G- and H-averages of P satisfy
(⋆⋆) ent(G ∗ (H ∗ P)) − ent(G ∗ P) ≤ ent(H ∗ P) − ent(P).
Acknowledgement. This was stated in the earlier version of the paper for non-commuting actions with an indication of an argument justifying it. But Michael Walter pointed out to me that if P is G-invariant, then, in fact, one has the opposite inequality:
ent(G ∗ (H ∗ P)) − ent(G ∗ P) ≥ ent(H ∗ P) − ent(P).
Also he formulated the following (correct) version of (⋆⋆) for non-commuting actions (that follows by the argument similar to that for the derivation of concavity of entropy from subadditivity):
ent(G ∗ (H ∗ P)) − ∫_H ent(G ∗ (h ∗ P)) dh ≤ ent(H ∗ P) − ent(P).
The inequality (⋆⋆), applied to the actions of the unitary groups H = U(S1) and G = U(S2) on S = S1 ⊗ S2 ⊗ S3, is equivalent, by the above (∗), to the following
Strong Subadditivity of von Neumann Entropy (Lieb–Ruskai, 1973). Let P = P123 be a state on S = S1 ⊗ S2 ⊗ S3 and let P23, P13 and P3 be the canonical reductions of P123 to S2 ⊗ S3, to S1 ⊗ S3 and to S3. Then
ent(P3) + ent(P123) ≤ ent(P23) + ent(P13).
Notice that the action of U(S1) × U(S2) on S is a multiple of an irreducible representation, namely it equals the N3-multiple, N3 = dim(S3), of the action of U(S1) × U(S2) on S1 ⊗ S2. This is why one needs (⋆⋆) rather than (⋆) for the proof.
The relative Shannon inequality (that is not fully trivial) for measures reduces by Bernoulli–Gibbs’ argument to a trivial intersection property of subsets in a finite set. Let us do the same for the von Neumann entropy.
The support of a state P on S is the orthogonal complement to the null-space 0(P) ⊂ S – the subspace where the (positive semidefinite) Hermitian form P vanishes. We denote this support by 0⊥(P) and write rank(P) for dim(0⊥(P)). Observe that
(⇔) P(T) = ∣P∣ ⇔ T ⊃ 0⊥(P)
for all linear subspaces T ⊂ S.
A state P is sub-homogeneous, if P(s) is constant, say equal λ(P), on the unit vectors from the support 0⊥(P) ⊂ S of P. (These states correspond to subsets in the classical case.) If, besides being sub-homogeneous, P is a density state, i.e. ∣P∣ = 1, then, obviously, ent(P) = − log λ(P) = log dim(0⊥(P)).
Also observe that if P1 and P2 are sub-homogeneous states such that 0⊥(P1) ⊂ 0⊥(P2), then
(/ ≥ /) P1(s)/P2(s) ≤ λ(P1)/λ(P2)
for all s ∈ S (with the obvious convention for 0/0 applied to s ∈ 0(P2)).
If a sub-homogeneous state Q equals the G-average of some (not necessarily sub-homogeneous)state P , then 0⊥ (Q) ⊃ 0⊥ (P )). Indeed, by the definition of the average, Q(T ) = P (T ) for all G-invariant subspaces T ⊂ S. Since Q(0⊥ (Q)) = Q(S) = P (S) = P (0⊥ (Q)) and the above (⇔) applies. Trivial Corollary. The inequality (⋆⋆) holds in the case where all four states: P , P1 = H ∗ P , P2 = G ∗ P and P12 = G ∗ (H ∗ P ), are sub-homogeneous. Trivial Proof. The inequality (⋆⋆) translates in the sub-homogeneous case to the corresponding inequality between the values of the states on their respective supports: λ2 /λ12 ≤ λ/λ1 , for λ = λ(P ), λ1 = λ(P1 ), etc. and proving the sub-homogeneous (⋆⋆) amounts to showing that the implication (≤⇒≤)
λ ≤ cλ1 ⇒ λ2 ≤ cλ12
holds for all c ≥ 0. Since 0⊥(P) ⊂ 0⊥(P1), the inequality λ ≤ cλ1 implies, by the above (/ ≥ /), that P(s) ≤ cP1(s) for all s, and this integrates over G to P2(s) ≤ cP12(s) for all s ∈ S. Since 0⊥(P2) ⊂ 0⊥(P12), there exists at least one non-zero vector s0 ∈ 0⊥(P2) ∩ 0⊥(P12), and the proof follows because P2(s0)/P12(s0) = λ2/λ12 for such an s0.

“Nonstandard” Proof of (⋆⋆) in the General Case. Since tensorial powers P⊗N of all states P “converge” to “ideal sub-homogeneous states” P⊗∞ by Bernoulli's theorem, the “trivial proof”, applied to these ideal P⊗∞, yields (⋆⋆) for all P.

If “ideal sub-homogeneous states” are understood as objects of a non-standard model of the first order R-language of the category of finite dimensional Hilbert spaces, then the trivial proof applies in the case where the actions of G and of H commute, where the role of “commute” is explained later on.

In truth, one does not need for the proof the full-fledged “non-standard” language: everything can be expressed in terms of infinite families of ordinary states; yet, this needs a bit of additional terminology that we introduce below.

From now on, our states are defined on finite dimensional Hilbert spaces SN that make a countable family, denoted S∗ = {SN}, where N ranges over a countable set 𝒩, e.g. 𝒩 = ℕ, with some non-principal ultrafilter on it. This essentially means that what we say about S∗ must hold for infinitely many N.

Real numbers are replaced by families/sequences of numbers, say a∗ = {aN}, where we may assume, using our ultrafilter, that the limit of aN, N → ∞, always exists (possibly equal to ±∞). This means, in simple terms, that we are allowed to pass to convergent subsequences as often as we wish to. We write a∗ ∼ b∗ if the corresponding sequences have equal limits. If P∗ and Q∗ are states on S∗, we write P∗ ∼ Q∗ if P∗(T∗) ∼ Q∗(T∗) for all linear subspaces T∗ ⊂ S∗.
This signifies that lim PN (TN ) = lim QN (TN ) for all TN ⊂ SN and some subsequence of {N }.
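The “convergence” of tensorial powers to ideal sub-homogeneous states invoked above can be made concrete in the smallest case. The sketch below assumes a density state P on a 2-dimensional space with eigenvalues p and 1 − p, so that P⊗N has eigenvalues p^k (1 − p)^(N−k) with multiplicities given by binomial coefficients; it measures how much of the total mass sits on eigenvalues λ with −(1/N) log λ within ε of ent(P). The function name and the choice of ε are illustrative.

```python
import math

def typical_mass(p, n, eps):
    """Mass of the n-th tensorial power carried by eigenvalues lam with
    |-(1/n) log lam - ent(P)| < eps, for a state P with eigenvalues p, 1 - p."""
    ent = -p * math.log(p) - (1 - p) * math.log(1 - p)
    mass = 0.0
    for k in range(n + 1):
        # log of the eigenvalue p^k (1-p)^(n-k)
        log_lam = k * math.log(p) + (n - k) * math.log(1 - p)
        if abs(-log_lam / n - ent) < eps:
            # log of the multiplicity binom(n, k), via log-gamma to avoid overflow
            log_mult = (math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1))
            mass += math.exp(log_mult + log_lam)
    return mass

small = typical_mass(0.3, 10, 0.05)
large = typical_mass(0.3, 2000, 0.05)
```

As N grows, almost all of the mass concentrates on eigenvalues that are equal up to a factor exp(±εN), which is the sense in which P⊗N is close to a sub-homogeneous state after the normalization by Ent∗ introduced below.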
Let us formulate and prove the counterpart of the above implication P(T) = ∣P∣ ⇒ T ⊃ 0⊥(P) for sub-homogeneous density states P∗. Notice that P∗(T∗) ∼ ∣P∗∣ does not imply that T∗ ⊃ 0⊥(P∗); yet, it does imply that

● there exists a state P∗′ ∼ P∗, such that T∗ ⊃ 0⊥(P∗′).

Proof. Let U∗ be the support of P∗ and let Π∗ ∶ U∗ → T∗ be the normal projection. Then the sub-homogeneous density state P∗′ with the support Π∗(U∗) ⊂ T∗ (there is only one such state) is the required one by a trivial argument.

To complete the translation of the “nonstandard” proof of (⋆⋆) we need a few more definitions.

Multiplicative Homogeneity. Let Ent∗ = {EntN} = {log dim(SN)} and let us normalize positive (multiplicative) constants (scalars) c = c∗ = {cN} ≥ 0 as follows,

\[ |c|_\star = |c_*|^{1/\mathrm{Ent}_*}. \]

In what follows, especially if “⋆” is there, we may omit “∗”.

A state B = B∗ = {BN} is called ⋆-homogeneous if ∣B(s1)∣⋆ ∼ ∣B(s2)∣⋆ for all spectral vectors s1, s2 ∈ 0⊥(B) ⊂ S∗, or, equivalently, if the (unique) sub-homogeneous state B′ for which 0⊥(B′) = 0⊥(B) and ∣B′∣ = ∣B∣ satisfies ∣B′(s)∣⋆ ∼ ∣B(s)∣⋆ for all unit vectors s ∈ 0⊥(B). Since the number ∣B′(s)∣⋆ is independent of the unit vector s ∈ 0⊥(B′), we may denote it by ∣B∣⋆.

Let B be a ⋆-homogeneous density state with support T = 0⊥(B) and A a sub-homogeneous density state with support U = 0⊥(A). If A(T) ∼ B(T) = 1, then there exists a linear subspace U′ ⊂ U such that ∣dim(U′)/dim(U)∣ ∼ 1 and ∣B(s)∣⋆ ∼ ∣B∣⋆ for all unit vectors s ∈ U′.

Proof. Let ΠT ∶ U → T and ΠU ∶ T → U be the normal projections and let ui be the eigenvectors of the (self-adjoint) operator ΠU ○ ΠT ∶ U → U ordered by their eigenvalues λ1 ≤ λ2 ≤ … ≤ λi ≤ … By the Pythagorean theorem, dim(U)⁻¹ ∑i λi = A(T) ∼ 1; therefore the span Uε of those ui where λi ≥ 1 − ε satisfies ∣dim(Uε)/dim(U)∣ ∼ 1 for all ε > 0; any such Uε can be taken for U′.

●● Corollary. Let ℬ be a finite set of ⋆-homogeneous density states B on S∗, such that A(0⊥(B)) ∼ 1 for all B ∈ ℬ. Then there exists a unit vector u ∈ U = 0⊥(A), such that ∣B(u)∣⋆ ∼ ∣B∣⋆ for all B ∈ ℬ.

This is shown by the obvious induction on the cardinality of ℬ with U′ replacing U at each step.

Let us normalize the entropy of A∗ = {AN} by setting

\[ \mathrm{ent}_\star(A_*) = \mathrm{ent}(A_*)/\mathrm{Ent}_* = \left\{ \frac{\mathrm{ent}(A_N)}{\log \dim(S_N)} \right\} \]
and let us call a vector s ∈ S∗ Bernoulli for a density state A∗ on S∗, if log ∣A(s)∣⋆ ∼ − ent⋆(A).

A density state A on S∗ is called Bernoulli if there is a subspace U, called a Bernoulli core of A, spanned by some spectral Bernoulli vectors of A, such that A(U) ∼ 1. For example, all s in the support of a ⋆-homogeneous density state A are Bernoulli. More significantly, the families of tensorial powers, A∗ = {P⊗N} on S∗ = {S⊗N}, are Bernoulli for all density states P on S by Bernoulli's law of large numbers.

Multiplicative Equivalence and Bernoulli Equivalence. Besides the relation A ∼ B it is convenient to have its multiplicative counterpart, denoted $A \overset{\star}{\sim} B$, which signifies ∣A(s)∣⋆ ∼ ∣B(s)∣⋆ for all s ∈ S∗.

Bernoulli equivalence relation on the set of density states on S∗ is defined as the span of A ∼ B and $A \overset{\star}{\sim} B$. For example, if A ∼ B, $B \overset{\star}{\sim} C$ and C ∼ D, then A is Bernoulli equivalent to D.

Observe that Bernoulli equivalence is stable under convex combinations of states. In particular, if $A \overset{\star}{\sim} B$, then $G ∗ A \overset{\star}{\sim} G ∗ B$, for all compact groups G of unitary transformations of S∗ (i.e. for all sequences GN acting on SN).

This Bernoulli equivalence is similar to that for (sequences of) classical finite measure spaces and the following two properties of this equivalence trivially follow from the classical case via Weyl's variational principle. (We explain this below in “non-standard” terms.)

(1) If A is Bernoulli and B is Bernoulli equivalent to A then B is also Bernoulli. Thus, A is Bernoulli if and only if it is Bernoulli equivalent to a sub-homogeneous state on S∗.
(2) If A is Bernoulli equivalent to B then ent⋆(A) ∼ ent⋆(B).

We write a∗ ≳ b∗ for aN, bN ∈ R, if a∗ − b∗ ∼ c∗ ≥ 0. If B is a Bernoulli state on S∗ and A is a density state, write A ≺ B if B admits a Bernoulli core T, such that A(T) ∼ 1. This relation is invariant under equivalence A ∼ A′, but not under B ∼ B′. Neither is this relation transitive for Bernoulli states.

Main Example. If B equals the G-average of A for some compact unitary transformation group G of S∗, then A ≺ B. Indeed, by the definition of the average, B(T) = A(T) for all G-invariant subspaces T. On the other hand, if a G-invariant B is Bernoulli, then it admits a G-invariant core, since the set of spectral Bernoulli vectors is G-invariant and all unit vectors in the span of spectral Bernoulli vectors are Bernoulli.

Main Lemma. Let A, B, C, D be Bernoulli states on S∗, such that A ≺ B and A ≺ D and let G be a compact unitary transformation group of S∗. If C ∼ G ∗ A and D = G ∗ B and if A is sub-homogeneous, then ent⋆(B) − ent⋆(A) ≳ ent⋆(D) − ent⋆(C).
Proof. According to ●, there is a state A′ ∼ A such that its support 0⊥(A′) is contained in some Bernoulli core of B, and since our assumptions and the conclusion are invariant under equivalence A ∼ A′, we may assume that U = 0⊥(A) itself is contained in a Bernoulli core of B. Thus, $A(s) \le c^{\mathrm{Ent}_*} B(s)$ for all c > exp(ent⋆(B) − ent⋆(A)) and all s ∈ S∗. Also, we may assume that C = G ∗ A since averaging and ent⋆ are invariant under the ∼-equivalence. Then C = G ∗ A and D = G ∗ B also satisfy $C(s) \le c^{\mathrm{Ent}_*} D(s)$ for all s ∈ S∗. In particular, $C(u) \le c^{\mathrm{Ent}_*} D(u)$ for a common Bernoulli vector u of C and D, where the existence of such a u ∈ U is ensured by ●●. Thus, ∣C(u)∣⋆ ≤ c∣D(u)∣⋆ for all c > exp(ent⋆(B) − ent⋆(A)). Since C and D are Bernoulli, ent⋆(C) ∼ − log ∣C(u)∣⋆ and ent⋆(D) ∼ − log ∣D(u)∣⋆; hence ent⋆(D) − ent⋆(C) ≤ log c for all c > exp(ent⋆(B) − ent⋆(A)), which means ent⋆(B) − ent⋆(A) ≳ ent⋆(D) − ent⋆(C).

Proof of (⋆⋆). Let P be a density state on a Hilbert space S, let G and H be unitary groups acting on S, and let us show that ent(G ∗ (H ∗ P)) − ent(G ∗ P) ≤ ent(H ∗ P) − ent(P) assuming that G and H commute. In fact, all we need is that the state G ∗ (H ∗ P) equals the K-average of P for some group K, where K = G × H serves this purpose in the commuting case.

Recall that the family {P⊗N} on S∗ = {SN = S⊗N} is Bernoullian for all P on S, and the averages, being tensorial powers themselves, are also Bernoullian. Let A∗ = {AN} be the sub-homogeneous state on S∗ that is Bernoulli equivalent to P⊗N, where, by the above, their averages remain Bernoullian. (Alternatively, one could take $A_N^{\otimes M}$, say, for $M = 2^N$.) Since both states B and D are averages of A in the commuting case, A ≺ B and A ≺ D; thus the lemma applies and the proof follows.

On the above (1) and (2).
A density state P on S is fully characterized, up to unitary equivalence, by its spectral distribution function ΨP(t) ∈ [0, 1], t ∈ [0, dim(S)], that, for integer t = n, equals the maximum of P(T) over linear subspaces T ⊂ S of dimension n, and that is linearly interpolated to t ∈ [n, n + 1]. By Weyl's variational principle this Ψ equals its classical counterpart, where the maximum is taken over spectral subspaces T.
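A small numerical illustration of the variational description of Ψ may help. The sketch assumes that P is diagonal in the standard basis (which loses no generality up to unitary equivalence), that P(T) is computed as the sum of P(t) over an orthonormal basis of T, and, for simplicity, that the subspaces are real; all names below are illustrative.

```python
import math
import random

def psi(eigenvalues, n):
    """Spectral distribution at an integer n: the sum of the n largest eigenvalues."""
    return sum(sorted(eigenvalues, reverse=True)[:n])

def orthonormal_frame(dim, n, rng):
    """n orthonormal vectors in R^dim, by Gram-Schmidt applied to Gaussian vectors."""
    frame = []
    while len(frame) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in frame:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        if norm > 1e-9:  # discard (almost) dependent samples
            frame.append([b / norm for b in v])
    return frame

def value_on_subspace(eigenvalues, frame):
    """P(T) for the diagonal state with the given eigenvalues and T = span(frame)."""
    return sum(lam * x * x for t in frame for lam, x in zip(eigenvalues, t))

eigenvalues = [0.5, 0.2, 0.15, 0.1, 0.05]
rng = random.Random(0)
worst_excess = max(
    value_on_subspace(eigenvalues, orthonormal_frame(5, n, rng)) - psi(eigenvalues, n)
    for n in range(1, 6) for _ in range(50)
)
```

Random subspaces never exceed the value attained on the spectral subspace spanned by the top n eigenvectors, so `worst_excess` stays non-positive up to rounding.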
The ε-entropy and the Bernoullian property are easily readable from this function, and so the properties (1) and (2) follow from their obvious classical counterparts that we have used, albeit implicitly, in the definition of the classical Bernoulli–Boltzmann entropy.

Nonstandard Euclidean/Hilbertian Geometry. Entropy constitutes only a tiny part of the asymptotic information encoded by ΨAN in the limit for N → ∞, where there is no problem with passing to limits since, obviously, Ψ are concave functions. However, most of this information is lost under “naive limits” and one has to use limits in the sense of nonstandard analysis.

Furthermore, individual Ψ do not tell you anything about mutual positions between different states on S∗: the joint Hilbertian geometry of several states is determined by the complex valued functions, kind of (scattering) “matrices”, say Υij ∶ Pi × Pj → C, where the “entries” of Υij equal the scalar products between unit spectral vectors of Pi and of Pj. (There is a phase ambiguity in this definition that becomes significant if there are multiple eigenvalues.)

Since these Υij are unitary “matrices” in an obvious sense, the corresponding Σij = ∣Υij∣² define bistochastic correspondences (customarily represented by matrices) between the respective spectral measure spaces. (Unitarity imposes much stronger constraints on these matrices than mere bistochasticity. Only a minority of bistochastic matrices, that are called unistochastic, have “unitary origin”. In physics, if I get it right, experimentally observable unistochasticity of scattering matrices can be taken for evidence of unitarity of the “quantum universe”.) Moreover, the totality of “entries” of the “matrices” Υij, that is the full array of scalar products between all spectral vectors of all Pi, satisfies a stronger positive definiteness condition.
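The remark that bistochasticity is weaker than “unitary origin” can be illustrated in dimension 3. The sketch below builds a unitary matrix as a product of plane rotations with phases (an arbitrary choice made only for the example), checks that the squared moduli of its entries form a bistochastic matrix, and then examines a standard bistochastic matrix that cannot arise this way: two of its rows share a single non-zero column, so the corresponding rows of any candidate unitary would have a scalar product of modulus 1/2 rather than 0. All names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def plane_rotation(n, i, j, angle, phase):
    """Unitary n x n matrix acting as a rotation with a phase in the (i, j) plane."""
    u = [[1.0 + 0j if r == c else 0j for c in range(n)] for r in range(n)]
    u[i][i] = u[j][j] = complex(math.cos(angle))
    u[i][j] = -cmath.exp(1j * phase) * math.sin(angle)
    u[j][i] = cmath.exp(-1j * phase) * math.sin(angle)
    return u

def sigma(u):
    """Entrywise squared moduli |u_ij|^2."""
    return [[abs(x) ** 2 for x in row] for row in u]

def is_bistochastic(m, tol=1e-9):
    """Non-negative entries with all row sums and column sums equal to 1."""
    n = len(m)
    rows = all(abs(sum(m[i]) - 1.0) < tol for i in range(n))
    cols = all(abs(sum(m[i][j] for i in range(n)) - 1.0) < tol for j in range(n))
    return rows and cols and all(x >= -tol for row in m for x in row)

def forced_overlap(m, r1, r2):
    """If rows r1 and r2 of m share exactly one non-zero column j, then every
    matrix u with |u_ij|^2 = m has |<row r1, row r2>| = sqrt(m[r1][j] * m[r2][j])."""
    common = [j for j in range(len(m)) if m[r1][j] > 0 and m[r2][j] > 0]
    if len(common) != 1:
        raise ValueError("rows must share exactly one non-zero column")
    j = common[0]
    return math.sqrt(m[r1][j] * m[r2][j])

u = matmul(plane_rotation(3, 0, 1, 0.7, 0.3),
           matmul(plane_rotation(3, 1, 2, 1.1, -0.5),
                  plane_rotation(3, 0, 2, 0.4, 2.0)))
half = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
```

Here `sigma(u)` is unistochastic by construction, while `half` is bistochastic with `forced_overlap(half, 0, 1)` equal to 1/2, so no unitary matrix has `half` as its matrix of squared moduli.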
At the end of the day, everything is expressed by scalar products between unit spectral vectors of different Pi and the values of Pi on their spectral vectors; non-standard limits of arrays of these numbers fully describe the nonstandard geometry of finite sets of non-standard states on nonstandard Hilbert spaces.

Reformulation of Reduction. The entropy inequalities for canonical reductions can be more symmetrically expressed in terms of entropies of bilinear forms Φ(s1, s2), si ∈ Si (i = 1, 2), where the entropy of a Φ is defined as the entropy of the Hermitian form P1 on S1 that is induced by the linear map Φ′1 ∶ S1 → S2′ from the Hilbert form on the linear dual S2′ of S2, where, observe, this entropy equals that of the Hermitian form on S2 induced by Φ′2 ∶ S2 → S1′.

In this language, for example, subadditivity translates to

Araki–Lieb Triangular Inequality (1970). The entropies of the three bilinear forms associated to a given 3-linear form Φ(s1, s2, s3) satisfy ent(Φ(s1, s2 ⊗ s3)) ≤ ent(Φ(s2, s1 ⊗ s3)) + ent(Φ(s3, s1 ⊗ s2)).

Discussion. Strong subadditivity was conjectured by Lanford and Robinson in 1968 and proved five years later by Lieb and Ruskai with operator convexity techniques.
Many proofs are based on an easy reduction of strong subadditivity to the trace convexity of the operator function e(x, y) = x log x − x log y. The shortest present day proof of this trace convexity is due to Ruskai [21] and the most transparent one to Effros [9]. On the other hand, as was pointed out to me by Mary Beth Ruskai (along with many other remarks, two of which we indicate below), there are by now other proofs of SSA, e.g. in [12] and in [20], which do not use trace convexity of x log x − x log y.

1. In fact, one of the two original proofs of SSA did not use the trace convexity of x log x − x log y either, but relied on the concavity of the map $x \mapsto \mathrm{trace}(e^{y + \log x})$, as is explained in [22] along with H. Epstein's elegant proof that $e^{y + \log x}$ is a trace concave function in x.

2. The possibility of deriving SSA from the trace concavity of $e^{y + \log x}$ was independently observed in 1973 by A. Uhlmann, who also suggested a reformulation of SSA in terms of group averages.

Recently, Michael Walter explained to me that our “Bernoullian” proof is close to that in [20] and he also pointed out to me the paper [8], where the authors establish asymptotics of recoupling coefficients for tensor products of representations of permutation groups. This refines the Bernoulli theorem and, in particular, directly implies the SSA inequality.

Sharp convexity inequalities are circumvented in our “soft” argument by exploiting the “equalizing effect” of the Bernoulli theorem that reduces evaluation of sums (or integrals) to a point-wise estimate. Some other operator convexity inequalities can also be derived with Bernoulli approximation, but this method is limited (?) to the cases that are stable under tensorization and it seems poorly adjusted to identification of states where such inequalities become equalities.
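The trace convexity discussed above has a scalar prototype: the ordinary function e(x, y) = x log x − x log y is jointly convex on pairs of positive reals. The following sketch only tests this scalar statement on random pairs of points; it says nothing about the operator version, and the sampling ranges are arbitrary assumptions.

```python
import math
import random

def e(x, y):
    """The scalar function x log x - x log y for x, y > 0."""
    return x * math.log(x) - x * math.log(y)

def convexity_gap(p, q, t):
    """t*e(p) + (1-t)*e(q) - e(t*p + (1-t)*q); non-negative if e is jointly convex."""
    (x1, y1), (x2, y2) = p, q
    return (t * e(x1, y1) + (1 - t) * e(x2, y2)
            - e(t * x1 + (1 - t) * x2, t * y1 + (1 - t) * y2))

rng = random.Random(1)
gaps = [
    convexity_gap((rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)),
                  (rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)),
                  rng.random())
    for _ in range(1000)
]
```

The gap vanishes when the two points are proportional, reflecting the homogeneity e(cx, cy) = c e(x, y).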
(I could not find a simple “Bernoullian proof” of the trace convexity of the operator function x log x − x log y, where such a proof of convexity of the ordinary x log x − x log y is as easy as for x log x.)

There are more powerful “equalization techniques” that are used in proofs of “classical” geometric inequalities and that involve elliptic PDE, such as solution of the Monge–Kantorovich transportation problem in the proof of the Brascamp–Lieb refinement of the Shannon–Loomis–Whitney–Shearer inequality (see [3] and references therein) and invertibility of some Hodge operators on toric Kähler manifolds as in the analytic rendition of the Khovanski–Teissier proof of the Alexandrov–Fenchel inequality for mixed volumes of convex sets [10].

It is tempting to find “quantum counterparts” to these proofs. Also it is desirable to find more functorial and more informative proofs of “natural” inequalities in geometric (monoidal?) categories. (See [4], [23] for how it goes along different lines.)

On Algebraic Inequalities. Besides “unitarization” some Shannon inequalities admit linearization, where the first non-trivial instance of this is the following linearized Loomis–Whitney 3D-isoperimetric inequality for ranks of bilinear forms associated with a 4-linear form Φ = Φ(s1, s2, s3, s4), where we denote ∣…∣ = rank(…):

∣Φ(s1, s2 ⊗ s3 ⊗ s4)∣² ≤ ∣Φ(s1 ⊗ s2, s3 ⊗ s4)∣ ⋅ ∣Φ(s1 ⊗ s3, s2 ⊗ s4)∣ ⋅ ∣Φ(s1 ⊗ s4, s2 ⊗ s3)∣
This easily reduces (see [11]) to the original Loomis–Whitney inequality and can also be proven directly with Bernoulli tensorisation. But the counterpart to the strong subadditivity, the relative Shannon inequality:

∣Φ(s1, s2 ⊗ s3 ⊗ s4)∣ ⋅ ∣Φ(s4, s1 ⊗ s2 ⊗ s3)∣ ≤ ∣Φ(s1 ⊗ s2, s3 ⊗ s4)∣ ⋅ ∣Φ(s1 ⊗ s3, s2 ⊗ s4)∣

(that is valid with ent(…) instead of ∣…∣) fails to be true for general Φ. (The obvious counterexamples can be taken care of with suitable Bernoulli-like-core stabilized ranks, but this, probably, does not work in general.)

Such “rank inequalities” are reminiscent of inequalities for spaces of sections (and cohomologies in general) of positive vector bundles, such as, e.g., in the Khovanski–Teissier theorem and in the Esnault–Viehweg proof of the sharpened Dyson–Roth lemma, but a direct link is yet to be found.

Apology to the Reader. Originally, Part 1 of “Structures” was planned as about a half of an introduction to the main body of the text of my talk at the European Congress of Mathematics in Kraków with the sole purpose to motivate what would follow on “mathematics in biology”. But it took me several months, instead of the expected few days, to express apparently well understood simple things in an appropriately simple manner.

Yet, I hope that I managed to convey the message: the mathematical language developed by the end of the 20th century by far exceeds in its expressive power anything, even imaginable, say, before 1960. Any meaningful idea coming from science can be fully developed in this language.

Well…, actually, I planned to give examples where a new language was needed, and to suggest some possibilities. It would take me, I naively believed, a couple of months, but the experience with writing this “introduction” suggested a time coefficient of order 30. I decided to postpone.
References

[1] J. Baez, T. Fritz, and T. Leinster, A Characterization of Entropy in Terms of Information Loss. Entropy, 13 (2011), 1945–1957.
[2] K. Ball, Factors of i.i.d. processes with nonamenable group actions. www.ima.umn.edu/~kball/factor.pdf, 2003.
[3] F. Barthe, On a reverse form of the Brascamp–Lieb inequality. Invent. Math. 134 no. 2 (1998), 335–361.
[4] P. Biane, L. Bouten, and F. Cipriani, Quantum Potential Theory. Springer 2009.
[5] L. Bowen, A new measure conjugacy invariant for actions of free groups. Ann. of Math. 171 No. 2 (2010), 1387–1400.
[6] L. Bowen, Sofic entropy and amenable groups. To appear in Ergodic Theory and Dynam. Systems.
[7] L. Bowen, Weak isomorphisms between Bernoulli shifts. Israel J. of Math 183 no. 1 (2011), 93–102.
[8] M. Christandl, M. B. Sahinoglu, and M. Walter, Recoupling Coefficients and Quantum Entropies. arXiv:1210.0463 (2012).
[9] E. G. Effros, A matrix convexity approach to some celebrated quantum inequalities. PNAS 106 no. 4 (2009), 1006–1008.
[10] M. Gromov, Convex sets and Kähler manifolds. Advances in Differential Geometry and Topology, ed. F. Tricerri, World Scientific, Singapore, 1–38, 1990.
[11] M. Gromov, Entropy and Isoperimetry for Linear and non-Linear Group Actions. Groups Geom. Dyn. 2 no. 4 (2008), 499–593.
[12] M. Horodecki, J. Oppenheim, and A. Winter, Quantum state merging and negative information. CMP 269 (2007), 107.
[13] A. Katok, Fifty years of entropy in dynamics: 1958–2007. J. of Modern Dyn. 1 (2007), 545–596.
[14] O. E. Lanford, Entropy and equilibrium states in classical statistical mechanics, Lecture Notes in Physics 20, 1–113, 1973.
[15] P. A. Loeb, Measure Spaces in Nonstandard Models Underlying Standard Stochastic Processes. Proc. Intern. Congr. Math. Warsaw, 323–335, 1983.
[16] M. Marcolli and R. Thorngren, Thermodynamic Semirings. arXiv:1108.2874.
[17] Y. Ollivier, A January 2005 invitation to random groups. Ensaios Matemáticos [Mathematical Surveys] 10. Sociedade Brasileira de Matemática, Rio de Janeiro, 2005.
[18] A. Ostebee, P. Gambardella, and M. Dresden, Nonstandard approach to the thermodynamic limit. I. Phys. Rev. A 13 (1976), 878–881.
[19] V. G. Pestov, Hyperlinear and sofic groups: a brief guide. Bull. Symbolic Logic 14(4) (2008), 449–480.
[20] R. Renner, Security of Quantum Key Distribution. arXiv:quant-ph/0512258, 2005.
[21] M. B. Ruskai, Another Short and Elementary Proof of Strong Subadditivity of Quantum Entropy. arXiv:quant-ph/0604206v1, 27 Apr 2006.
[22] M. B. Ruskai, Inequalities for quantum entropy: A review with conditions for equality. Journal of Mathematical Physics 43:58, Issue 9 (2002).
[23] E. Størmer, Entropy in operator algebras. In: E. Blanchard, D. Ellwood, M. Khalkhali, M. Marcolli, H. Moscovici, S. Popa (eds.), Quanta of Maths. AMS and Clay Mathematics Institute. 2010.
[24] B. Weiss, Sofic groups and dynamical systems. The Indian Journal of Statistics Special issue on Ergodic Theory and Harmonic Analysis 2000, Volume 62, Series A, Pt. 3, 350–359.
Misha Gromov, IHES, Bures-sur-Yvette, France and Courant Institute, NYU, New York, USA.
Classification of algebraic varieties

Christopher D. Hacon∗
Abstract. We discuss recent results in the Minimal Model Program that have led to several breakthroughs in the classification of algebraic varieties.
1. Introduction

Algebraic geometry is the study of solutions to polynomial equations, say $P_1(x_1, \dots, x_n), \dots, P_r(x_1, \dots, x_n) \in k[x_1, \dots, x_n]$ where $k$ is a field (or even a ring). Depending on the choice of the field $k$ ($\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ etc.), the resulting theories have very different flavours. In this note we will focus on the case of complex coefficients $k = \mathbb{C}$.

A (Zariski) closed subset of $\mathbb{C}^n$ is the subset defined by the zeroes of a finite set of polynomial equations, say $P_1(x_1, \dots, x_n), \dots, P_r(x_1, \dots, x_n) \in \mathbb{C}[x_1, \dots, x_n]$. An affine variety is an irreducible closed subset of $\mathbb{C}^n$. A simple example is the cusp $\{x^2 - y^3 = 0\} \subset \mathbb{C}^2$.

For technical reasons it is convenient to work with compact sets (projective varieties). The natural compactification of $\mathbb{C}^n$ is given by complex projective $n$-space $\mathbb{P}^n_{\mathbb{C}} := (\mathbb{C}^{n+1} \setminus \{(0, \dots, 0)\})/\mathbb{C}^*$. If $P(x_0, \dots, x_n)$ is a homogeneous polynomial of degree $m$, then $P(t \cdot c_0, \dots, t \cdot c_n) = t^m \cdot P(c_0, \dots, c_n)$ for any $t \in \mathbb{C}^*$ and any $(c_0, \dots, c_n) \in \mathbb{C}^{n+1}$. Therefore, even if the value of $P(x_0, \dots, x_n)$ on a point $[c_0 : \dots : c_n] \in \mathbb{P}^n_{\mathbb{C}}$ is not defined, the zero set of $P(x_0, \dots, x_n)$ is a well defined subset of $\mathbb{P}^n_{\mathbb{C}}$. For any variety $Z \subset \mathbb{C}^n$ defined by polynomials $P_j(x_1, \dots, x_n)$ (of degree $d_j$, $1 \le j \le r$), we then consider the compactification $\bar{Z} \subset \mathbb{P}^n_{\mathbb{C}}$ given by the corresponding homogeneous polynomials $x_0^{d_j} P_j(x_1/x_0, \dots, x_n/x_0)$. For example if $Z \subset \mathbb{C}^2$ is the cusp defined above, then the projective variety $\bar{Z} \subset \mathbb{P}^2_{\mathbb{C}}$ is defined by the homogeneous equation $z x^2 - y^3 = 0$.

We will now focus on smooth projective varieties, i.e. varieties $Z$ that are complex manifolds. Notice that “most” varieties are smooth (if $Z$ is defined by $\{P_1, \dots, P_r\}$ a general choice of polynomials, then the Jacobian $(\partial P_j/\partial x_i)_{i,j}$ has maximal rank at each point of $Z$ and so by the implicit function theorem it defines a complex manifold). Notice also that by Hironaka's celebrated theorem on the resolution of singularities, one can find a map of varieties $X' \to X$ which is an isomorphism over the smooth locus of $X$, such that $X'$ is smooth.

∗ The author was partially supported by NSF research grant no: 0757897
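The passage from an affine equation to its projective closure described in the introduction can be sketched in a few lines. The representation of a polynomial as a dictionary from exponent tuples to coefficients, and the function names, are assumptions made only for this illustration; the new variable x0 (called z in the cusp example) is placed first.

```python
def homogenize(poly):
    """Homogenize a polynomial in x1..xn, given as {(e1, ..., en): coefficient}.

    Each monomial is multiplied by the power of the new variable x0 needed to
    reach the total degree d, i.e. the result is x0^d * P(x1/x0, ..., xn/x0).
    """
    d = max(sum(exps) for exps in poly)
    return {(d - sum(exps),) + tuple(exps): c for exps, c in poly.items()}

def evaluate(poly, point):
    """Value of a polynomial {exponents: coefficient} at a point."""
    total = 0
    for exps, c in poly.items():
        term = c
        for x, e in zip(point, exps):
            term *= x ** e
        total += term
    return total

# The cusp x^2 - y^3 = 0 in the affine plane, variables ordered (x, y).
cusp = {(2, 0): 1, (0, 3): -1}
# Its closure: z*x^2 - y^3 = 0, variables ordered (z, x, y).
projective_cusp = homogenize(cusp)
```

Evaluating `projective_cusp` at a rescaled point multiplies the value by the cube of the scaling factor, which is why its zero set is well defined in the projective plane.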
When $\dim_{\mathbb{C}} Z = 1$, complex projective varieties are just compact Riemann surfaces (two dimensional orientable real manifolds). However, since $\dim_{\mathbb{C}} Z = 1$, we will refer to them as curves. From a topological point of view curves are classified by their genus $g \in \mathbb{Z}_{\ge 0}$. There are three rough classes of curves.

(1) $\mathbb{P}^1_{\mathbb{C}}$ is the unique curve of genus $g = 0$.
(2) There is a 1-parameter family of curves of genus 1. These are known as elliptic curves and they can be defined by equations of degree 3 in the plane.
(3) When $g \ge 2$, there is a $3g - 3$ parameter family of such curves.

It should be noted that these three classes have markedly different behaviour. From the topological point of view, $\mathbb{P}^1_{\mathbb{C}}$ is simply connected, elliptic curves have abelian fundamental group, and curves of genus $g \ge 2$ have non-abelian fundamental group. From the differential geometric point of view, we have positive, trivial and negative curvature. From the number theoretic point of view, curves of genus $g \ge 2$ are distinguished by the fact that they have finitely many rational points.

One of the difficulties in studying projective varieties is that there are infinitely many different embeddings $X = \bigcap_{i=1}^{n} \{P_i(x_0, \dots, x_m) = 0\} \subset \mathbb{P}^m_{\mathbb{C}}$. It is important to find a natural (or canonical) description for the embedding $X \subset \mathbb{P}^m_{\mathbb{C}}$. Since $X$ is a compact complex manifold, the only global holomorphic functions are constant. Therefore, to study $X$, we consider meromorphic functions with prescribed poles or equivalently sections of line bundles. If $L$ is a line bundle on $X$, then $H^0(X, L)$ denotes the vector space of global sections of $L$ and $\phi_L : X \dashrightarrow \mathbb{P}^m_{\mathbb{C}}$ the corresponding map to projective space. Here we let $h^0(X, L) = \dim_{\mathbb{C}} H^0(X, L)$ and $m = h^0(X, L) - 1$. The map $\phi_L$ is defined as follows. Let $s_0, \dots, s_m$ be a basis of $H^0(X, L)$, then for any $x \in X$ we let $\phi_L(x) = [s_0(x) : \dots : s_m(x)]$. Note that if $s_0(x) = \dots = s_m(x) = 0$, then the point $[s_0(x) : \dots : s_m(x)]$ is not defined (hence the broken arrow). We say that such a point $x \in X$ is a base point of $H^0(X, L)$. If there are no base points, then we say that $H^0(X, L)$ is base point free. In this case we have a morphism $\phi_L : X \to \mathbb{P}^m_{\mathbb{C}}$ (i.e. the map is everywhere defined). Notice that any section $0 \ne s \in H^0(X, L)$ corresponds to the pullback of a linear polynomial $l = \sum_{i=0}^{m} a_i x_i \in H^0(\mathbb{P}^m_{\mathbb{C}}, \mathcal{O}_{\mathbb{P}^m_{\mathbb{C}}}(1))$ and the line bundle $L$ corresponds to the pullback of the line bundle $\mathcal{O}_{\mathbb{P}^m_{\mathbb{C}}}(1)$, i.e. we have $L \cong \phi_L^* \mathcal{O}_{\mathbb{P}^m_{\mathbb{C}}}(1)$. If moreover $H^0(X, L)$ defines an embedding $\phi_L : X \hookrightarrow \mathbb{P}^m_{\mathbb{C}}$, then we say that $L$ is very ample.

It is hard to find interesting line bundles on projective varieties $X \subset \mathbb{P}^m_{\mathbb{C}}$, and typically the only choice is given by (tensor powers of) the canonical line bundle, which is defined to be the line bundle corresponding to the top exterior power of the cotangent bundle. Thus if $\dim_{\mathbb{C}} X = n$, then $\omega_X = \wedge^n T_X^\vee$. For any $m > 0$, one can consider global sections $s \in H^0(X, \omega_X^{\otimes m})$, which can locally be written as $f(x_1, \dots, x_n)(dx_1 \wedge \dots \wedge dx_n)^{\otimes m}$ for some holomorphic function $f(x_1, \dots, x_n)$. The canonical ring is the graded ring defined by

\[ R(X, \omega_X) = \bigoplus_{m \ge 0} H^0(X, \omega_X^{\otimes m}). \]

When $\dim_{\mathbb{C}} X = 1$, then $\omega_X = \Omega^1_X = T_X^\vee$ is the line bundle corresponding to holomorphic 1-forms. In this case we have $h^0(X, \omega_X) = g$ and $\deg \omega_X = 2g - 2$ where $g$ denotes the genus of $X$. We have $\omega_{\mathbb{P}^1_{\mathbb{C}}} = \mathcal{O}_{\mathbb{P}^1_{\mathbb{C}}}(-2)$ and thus $R(\mathbb{P}^1_{\mathbb{C}}, \omega_{\mathbb{P}^1_{\mathbb{C}}}) = \mathbb{C}$, and if $g = 1$, then $\omega_X \cong \mathcal{O}_X$ so that $R(X, \omega_X) \cong \mathbb{C}[t]$. Thus, $R(X, \omega_X)$ is not interesting for $g \le 1$. However, if $X$ is of general type, i.e. if $g \ge 2$, then $\omega_X^{\otimes 3}$ is very ample and in particular $\omega_X^{\otimes 3} = \phi^*_{\omega_X^{\otimes 3}} \mathcal{O}_{\mathbb{P}^m_{\mathbb{C}}}(1)$. It is then easy to see that the graded ring $R(X, \omega_X)$ is finitely generated and determines $X$. More precisely we have that $X = \mathrm{Proj}\, R(X, \omega_X)$ so that $X$ is determined by the generators and relations of the ring $R(X, \omega_X)$.

Consider now the map $f : X \dashrightarrow \mathbb{P}^2_{\mathbb{C}}$ defined by $\phi_{\omega_X^{\otimes 3}}$ followed by a generic projection $\mathbb{P}^m_{\mathbb{C}} \dashrightarrow \mathbb{P}^2_{\mathbb{C}}$, then $f(X)$ has degree $6g - 6$ and hence its equation depends on finitely many parameters (the coefficients of a homogeneous polynomial of degree $6g - 6$). Since $X$ is uniquely determined by $f(X)$ it follows that $X$ varies in a family depending on finitely many parameters (in fact $3g - 3$ parameters suffice).
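The numbers attached to the tricanonical map above can be reproduced from the Riemann–Roch formula for curves, which gives $h^0(X, L) = \deg L - g + 1$ for a line bundle $L$ of degree greater than $2g - 2$. The sketch below relies on that standard formula, which the text uses only implicitly; the function names and the dictionary keys are illustrative.

```python
def pluricanonical_sections(g, m):
    """h^0 of the m-th tensor power of the canonical bundle on a curve of genus g.

    Valid for g >= 2 and m >= 2, where the degree m(2g - 2) exceeds 2g - 2
    and Riemann-Roch gives h^0 = degree - g + 1.
    """
    if g < 2 or m < 2:
        raise ValueError("formula used here assumes g >= 2 and m >= 2")
    degree = m * (2 * g - 2)
    return degree - g + 1

def tricanonical_data(g):
    """Ambient dimension and degree of the tricanonical embedding, and the
    number of parameters of the family of curves of genus g >= 2."""
    return {
        "ambient_dimension": pluricanonical_sections(g, 3) - 1,
        "degree": 3 * (2 * g - 2),
        "parameters": 3 * g - 3,
    }

genus_two = tricanonical_data(2)
```

For genus 2 this gives a curve of degree 6 in a 4-dimensional projective space, varying in a 3-parameter family.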
2. Surfaces

One could hope to generalize the curve picture explained above to higher dimensions. In dimension 2 we already encounter a first difficulty, due to the fact that given any surface $X$ (i.e. variety of dimension 2), and any point $x \in X$, one can produce a map from a new surface $\mu : X' \to X$ such that

(1) if $E = \mu^{-1}(x) \subset X'$, then $E \cong \mathbb{P}^1_{\mathbb{C}}$, and
(2) $X' \setminus E$ is isomorphic to $X \setminus \{x\}$.

This procedure is known as blowing up the point $x \in X$. In local coordinates it can be defined as follows: Suppose $X = \mathbb{C}^2$ with coordinates $x_1, x_2$ and $x = (0, 0) \in \mathbb{C}^2$ is the origin, then let $X' \subset \mathbb{P}^1_{\mathbb{C}} \times \mathbb{C}^2$ be defined by the equation $y_1 x_2 - y_2 x_1 = 0$ where $y_1, y_2$ are the corresponding coordinates on $\mathbb{P}^1_{\mathbb{C}}$. The map $\mu : X' \to \mathbb{C}^2$ is just the projection onto the second factor. Properties (1-2) above are easy to check. Note that $B_2(X') = B_2(X) + 1$ (where $B_2$ denotes the second Betti number), thus $X'$ and $X$ are not topologically equivalent and hence not isomorphic. This implies that there are uncountably many families of surfaces.

The situation can be remedied by working up to birational equivalence. Two surfaces (or algebraic varieties) are birational if they have isomorphic open subsets. It turns out that any two birational curves are isomorphic. Blowing up a point $x \in X$ on a surface (or higher dimensional variety) gives an example of a non-trivial birational map. In dimension 2, two surfaces $X$ and $X'$ are birational if and only if they are isomorphic after a finite sequence of blow ups

\[ X \leftarrow X_1 \leftarrow \dots \leftarrow X_n \cong X'_m \to \dots \to X'_1 \to X'. \]

(Here each $X_{i+1}$ is obtained by blowing up a point $x_i \in X_i$ and similarly for each $X'_{j+1}$.) We then study surfaces up to birational isomorphism. It is natural to ask if one can choose a distinguished representative in each birational class. The most natural idea is to pick a surface that is not obtained from blowing up another surface. We say that any such surface is minimal. Since, as observed above, blowing up a point on a surface increases the second Betti number (which is a positive integer), it follows that minimal surfaces always exist (in any birational equivalence class). One can then hope that in each birational equivalence class there is a unique minimal surface and that these minimal surfaces belong to countably many finite dimensional families determined by some invariants (similarly to the case of curves).

Therefore, the next natural question is what discrete invariant should replace the genus of a curve? The answer is given in terms of the canonical line bundle. For each $m \ge 0$ we have the plurigenera $P_m(X) := h^0(\omega_X^{\otimes m}) \ge 0$. In dimension 1 we have that $P_m(X) = (2m - 1)(g - 1)$ for any $g \ge 2$ and $m > 1$, but in higher dimension, their behaviour is not so simple. It is useful to define a coarser invariant. The Kodaira dimension of $X$, denoted by $\kappa(X)$, is defined as follows:

(1) If $P_m(X) = 0$ for all $m > 0$, then we say that $\kappa(X) < 0$.
(2) If the maximum of $\{P_m(X) \mid m > 0\}$ is 1, then we say that $\kappa(X) = 0$.
(3) Otherwise, we let $\kappa(X) \in \{1, \dots, \dim_{\mathbb{C}} X\}$ be the maximum dimension of $\phi_m(X)$ for all $m > 0$.

It is known that $\kappa(X) = \mathrm{tr.deg.}_{\mathbb{C}} R(X, \omega_X) - 1$ and that $P_m(X) = O(m^{\kappa(X)})$ for $m > 0$ sufficiently big and divisible. It is also easy to see that if $X$ and $X'$ are birational smooth surfaces, then there is a natural isomorphism $H^0(X, \omega_X^{\otimes m}) \cong H^0(X', \omega_{X'}^{\otimes m})$ and thus $H^0(X, \omega_X^{\otimes m})$ are birational invariants. When $\dim_{\mathbb{C}} X = 1$ we have three cases $\kappa(X) < 0$, $\kappa(X) = 0$ and $\kappa(X) = 1$ corresponding to the cases $g = 0$, $g = 1$ and $g \ge 2$ respectively.
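For curves the definition of the Kodaira dimension can be traced through explicitly. The sketch below encodes the plurigenera of a curve of genus g (0 for g = 0, 1 for g = 1, and g or (2m − 1)(g − 1) for g ≥ 2) and applies rules (1)–(3); since a curve has dimension 1, rule (3) can only return 1. Returning `None` for the case κ(X) < 0 and inspecting only finitely many m are conventions chosen for this illustration.

```python
def plurigenus(g, m):
    """P_m of a smooth projective curve of genus g, for m >= 1."""
    if m < 1 or g < 0:
        raise ValueError("expects m >= 1 and g >= 0")
    if g == 0:
        return 0
    if g == 1:
        return 1
    if m == 1:
        return g
    return (2 * m - 1) * (g - 1)

def kodaira_dimension_of_curve(g, m_max=50):
    """Kodaira dimension of a curve of genus g; None stands for kappa < 0."""
    values = [plurigenus(g, m) for m in range(1, m_max + 1)]
    if max(values) == 0:
        return None   # rule (1): all plurigenera vanish
    if max(values) == 1:
        return 0      # rule (2): the plurigenera are bounded by 1
    return 1          # rule (3): the only remaining value in dimension 1

table = {g: kodaira_dimension_of_curve(g) for g in range(0, 5)}
```

The linear growth of `plurigenus(g, m)` in m for g ≥ 2 matches the estimate that P_m grows like m to the power κ(X).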
Note that “most” curves belong to the $\kappa(X) = 1$ case and thus, in most cases, we can use the sections of $H^0(X, \omega_X^{\otimes m})$ to study $X$. In fact, as remarked above, it is an easy exercise (using the Riemann–Roch formula) to see that $\phi_{\omega_X^{\otimes 3}} : X \to \mathbb{P}^{5g-6}_{\mathbb{C}}$ is an embedding for all $g \ge 2$ whose image has degree $6g - 6$. When $\dim_{\mathbb{C}} X = 2$ we have four cases $\kappa(X) \in \{< 0, 0, 1, 2\}$. As we have seen above, working up to birational equivalence, we may consider only minimal surfaces. The first three cases are well understood and the last case is studied in terms of the pluricanonical maps $\phi_{\omega_X^{\otimes m}}$. More precisely we have:
(1) If $\kappa(X) < 0$, then $X$ is either $\mathbb{P}^2_{\mathbb{C}}$ or has a morphism $X \to C$ to a curve $C$ with general fiber $F \cong \mathbb{P}^1_{\mathbb{C}}$. The minimal surface is not unique (in the birational class), but it is well understood how two minimal surfaces are related. If $\kappa(X) \geq 0$, it is however known that minimal surfaces are unique (in the birational class). We denote such a minimal surface by $X_{\min}$.
(2) If $\kappa(X_{\min}) = 0$, then $X_{\min}$ belongs to one of four well understood cases (abelian surfaces, bielliptic surfaces, K3 surfaces and Enriques surfaces). In particular it is known that $\omega_{X_{\min}}^{\otimes 12} \cong \mathcal{O}_{X_{\min}}$ is the trivial line bundle.
(3) If $\kappa(X_{\min}) = 1$, then for all $m > 0$ divisible by 24, we have that $\phi_{\omega_{X_{\min}}^{\otimes m}} : X_{\min} \to C$ is a morphism to a curve whose general fiber has genus 1.
(4) If $\kappa(X_{\min}) = 2$, then we say that $X_{\min}$ has general type. In this case, by a result of Bombieri, it is known that for all $m \geq 5$ the pluricanonical maps give birational morphisms $\phi_{\omega_{X_{\min}}^{\otimes m}} : X_{\min} \to X_{\mathrm{can}}$, where the canonical model $X_{\mathrm{can}}$ is uniquely determined by $X_{\mathrm{can}} := \mathrm{Proj}\, R(X_{\min}, \omega_{X_{\min}})$ (i.e. it is defined by the generators and relations of the canonical ring). Note that $X_{\mathrm{can}}$ is in general no longer smooth. Its singularities are however mild and well understood (they are known as canonical or Du Val singularities). In particular $\omega_{X_{\mathrm{can}}}$ is an ample line bundle. (Recall that a line bundle $L$ is ample if $L^{\otimes m}$ is very ample for some $m > 0$.) We have $R(\omega_{X_{\min}}) = R(\omega_{X_{\mathrm{can}}}) = R(\omega_X)$ for any smooth surface $X$ birational to $X_{\min}$. Viewing $X_{\mathrm{can}}$ as the image of $X_{\min}$ via $\phi_{\omega_{X_{\min}}^{\otimes 5}}$, we have that $X_{\mathrm{can}}$ is embedded as a subvariety of degree $M$ in $\mathbb{P}^N_{\mathbb{C}}$ where $M = 25 \cdot c_1(\omega_{X_{\min}})^2$ and $N = 10 \cdot c_1(\omega_{X_{\min}})^2 + \chi(\omega_{X_{\min}}) - 1$. Thus $X_{\mathrm{can}}$ is determined by the zeroes of homogeneous polynomials of degree at most $M$ and thus it depends on finitely many parameters. Since $X_{\min}$ is uniquely determined by $X_{\mathrm{can}}$ (it is given by the minimal desingularization), it follows that $X_{\min}$ also depends on finitely many parameters.
At this point it would seem a matter of personal preference to work with minimal models (and hence smooth varieties) or canonical models (and hence varieties with ample canonical line bundle but mild singularities).
3. The minimal model program

Naturally, one would like to generalize these results to higher dimensions. We hope to proceed as follows:
(1) Given a smooth complex projective variety $X$, construct a map $X \dashrightarrow X_{\min}$ to a “minimal” model. Ideally this map is constructed via a finite sequence of well understood elementary operations generalizing the blow up operation discussed above. In particular these elementary operations should remove curves $C$ such that $c_1(\omega_X) \cdot C < 0$.
(2) If $\kappa(X) < \dim_{\mathbb{C}} X$, describe the structure of $X_{\min}$ explicitly. There should be two cases: If $\kappa(X) < 0$, then $X$ is covered by rational curves. (Note that one can show that being covered by rational curves implies $\kappa(X) < 0$ and the converse is a deep conjecture.) If $0 \leq \kappa(X) < \dim_{\mathbb{C}} X$, then we expect that there is a morphism $\phi : X_{\min} \to Z$ where $\dim_{\mathbb{C}} Z = \kappa(X)$ and the general fiber has Kodaira dimension 0. (In fact one expects that $\phi = \phi_{\omega_X^{\otimes m}}$ for all $m > 0$ sufficiently divisible and $Z = \mathrm{Proj}\, R(X, \omega_X)$.)
(3) If $\kappa(X) = \dim_{\mathbb{C}} X$, then we say that $X$ is of general type. In this case, we expect that there exists an integer $m_0$ (depending only on $\dim_{\mathbb{C}} X$) such that for all $m > 0$ divisible by $m_0$, the pluricanonical maps give morphisms $\phi_{\omega_{X_{\min}}^{\otimes m}} : X_{\min} \to X_{\mathrm{can}}$. This then implies that for any constant $C > 0$, there are finitely many families of varieties of general type $X_{\mathrm{can}}$ and degree $c_1(\omega_{X_{\mathrm{can}}})^{\dim_{\mathbb{C}} X} \leq C$.

Conjecturally all of these steps have been well understood since the 1980’s. The first issue is to define minimal models.

3.1. Minimal models. In dimension 2, we construct minimal models by removing certain curves that arise from blowing up smooth points. As we have seen above, these curves $E \subset X$ are isomorphic to $\mathbb{P}^1_{\mathbb{C}}$ and we have $c_1(\omega_X) \cdot E = \deg(\omega_X|_E) = -1$. Any curve in a surface with these properties is known as a $-1$ curve and we may think of the minimal model $X_{\min}$ as being obtained from $X$ by removing all $-1$ curves. (In fact by a theorem of Castelnuovo, if $E \subset X$ is a $-1$ curve, then there is a morphism $\mu : X \to X'$ which is the blow up of a point $x' \in X'$ and $E = \mu^{-1}(x')$.) Once we have removed all $-1$ curves we obtain the minimal model $X_{\min}$. It turns out that if $\kappa(X) \geq 0$, then $\omega_{X_{\min}}$ is nef, i.e. it is positive in the sense that $c_1(\omega_{X_{\min}}) \cdot C \geq 0$ for any curve $C \subset X_{\min}$. In other words, once we have removed all $-1$ curves, we have in fact removed all $\omega_X$ negative curves. Note that this is a necessary condition for $\omega_{X_{\min}}$ to be semi-ample, i.e. for $\phi = \phi_{\omega_{X_{\min}}^{\otimes m}} : X_{\min} \to X_{\mathrm{can}} \subset \mathbb{P}^N_{\mathbb{C}}$ to be a morphism for some $m > 0$. If this
morphism exists, then $c_1(\omega_{X_{\min}}) \cdot C = \frac{1}{m} \deg_{\mathbb{P}^N_{\mathbb{C}}}(\phi(C)) \geq 0$. Therefore we make the following:
Temporary-Definition 3.1. Let $X$ be a smooth projective variety, then $X$ is a smooth minimal model if $\omega_X$ is nef.

It turns out that smooth minimal models do not always exist in dimension $\geq 3$.

Example 3.2. Let $Y$ be an abelian threefold (i.e. a complex projective torus of dimension 3; for example $E \times E \times E$ where $E$ is an elliptic curve). Then $Y$ is an abelian group and $-1_Y : Y \to Y$ defines an automorphism of $Y$ of order 2. Let $X$ be the quotient of $Y$ by this involution. Then $X$ has canonical singularities. It is not smooth at the $2^6$ points corresponding to the fixed points of $-1_Y$. The canonical sheaf $\omega_X$ is not a line bundle, but $\omega_X^{\otimes 2}$ is a nef line bundle. One can show that $X$ does not admit any smooth minimal model (see for example [2, 5.38]).

Therefore it is necessary to allow certain mild singularities known as terminal singularities. The first difficulty is how to define the canonical line bundle on a (normal) singular variety. The natural approach is to consider the smooth locus
$i : X_{\mathrm{sm}} \subset X$ and to let $\omega_X^{\hat{\otimes} m} := i_* \omega_{X_{\mathrm{sm}}}^{\otimes m}$. In other words, the sections of $\omega_X^{\hat{\otimes} m}$ on an open subset $U \subset X$ are determined by those of $\omega_{X_{\mathrm{sm}}}^{\otimes m}$ on $U \cap X_{\mathrm{sm}}$ and so we have that $H^0(X, \omega_X^{\hat{\otimes} m}) = H^0(X_{\mathrm{sm}}, \omega_{X_{\mathrm{sm}}}^{\otimes m})$. We will always assume that $X$ is normal, and in particular the singular locus of $X$ has codimension $\geq 2$. It follows that $H^0(X, \omega_X^{\hat{\otimes} m})$ is finite dimensional. If $\omega_X^{\hat{\otimes} m}$ is a line bundle for some $m > 0$, then we say that $\omega_X^{\hat{\otimes} m}$ is Cartier and $\omega_X$ is $\mathbb{Q}$-Cartier. More generally, if $A = \sum a_i A_i \subset X$ is a divisor (i.e. a formal linear combination of codimension 1 subvarieties $A_i \subset X$ with integer coefficients $a_i \in \mathbb{Z}$), then we let $\mathcal{O}_X(A)$ denote the sheaf whose sections correspond to rational functions $f$ on $X$ such that $(f) + A \geq 0$ (here $(f)$ is the divisor given by the zeroes of $f$ minus the poles of $f$). It is easy to see that if $X$ is smooth then $\mathcal{O}_X(A)$ is Cartier, thus if $X$ is normal, then $\mathcal{O}_X(A)^{\hat{\otimes} m} = \mathcal{O}_X(mA)$. A normal variety $X$ is $\mathbb{Q}$-factorial if each divisor $A$ is $\mathbb{Q}$-Cartier so that there exists an integer $m > 0$ such that $\mathcal{O}_X(mA)$ is a line bundle.
Definition 3.3. Let $X$ be a normal $\mathbb{Q}$-factorial variety. Fix $m > 0$ such that $\omega_X^{\hat{\otimes} m}$ is Cartier. For any resolution of singularities (i.e. a projective birational morphism from a smooth variety) $f : Y \to X$, we may write $\omega_Y^{\otimes m} = f^* \omega_X^{\hat{\otimes} m} \otimes \mathcal{O}_Y(A)$ (the pull-back $f^* \omega_X^{\hat{\otimes} m}$ is defined as $\omega_X^{\hat{\otimes} m}$ is a line bundle, $f^* \omega_X^{\hat{\otimes} m} \otimes \mathcal{O}_Y(A)$ denotes the line bundle whose sections are meromorphic sections of $f^* \omega_X^{\hat{\otimes} m}$ with poles at most the divisor $A$, and the above equation defines the divisor $A$). $X$ is terminal (resp. canonical, klt and log canonical) if for all $f : Y \to X$ as above, we have $A \geq 0$ and its support contains all exceptional divisors (resp. $A \geq 0$, $A = \sum a_i A_i$ where $a_i > -m$, and $A = \sum a_i A_i$ where $a_i \geq -m$).
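The four conditions in Definition 3.3 are inequalities on the coefficients of $A$, so for a single resolution they can be checked mechanically. The sketch below is only illustrative: the function name and the input format are hypothetical, the coefficients are normalised by dividing by $m$ (with $m = 1$ the thresholds are $-1$), and it inspects one resolution only, whereas the definition quantifies over all of them.

```python
from fractions import Fraction


def classify_from_resolution(coeffs, m=1):
    """Classify using the coefficients a_i of A along ALL exceptional divisors
    of one fixed resolution (use 0 for an exceptional divisor not appearing in A).

    The coefficients are divided by m so that the thresholds are 0 and -1.
    """
    d = [Fraction(a, m) for a in coeffs]
    if all(x > 0 for x in d):
        return "terminal"
    if all(x >= 0 for x in d):
        return "canonical"
    if all(x > -1 for x in d):
        return "klt"
    if all(x >= -1 for x in d):
        return "log canonical"
    return "not log canonical"


# Blow up of a smooth point on a surface: omega_Y = f^* omega_X (x) O_Y(E).
print(classify_from_resolution([1]))
# Cone over a twisted cubic (m = 3, coefficient -1 on the exceptional curve).
print(classify_from_resolution([-1], m=3))
```

The two sample inputs are standard examples chosen here for illustration: the first returns "terminal", the second "klt".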
Since exceptional divisors are typically negative on a covering family of curves, the condition “$A \geq 0$ and its support contains all exceptional divisors” can be interpreted as saying that $X$ is obtained from $Y$ by removing $\omega_Y$ negative curves. For example, if $X$ is a smooth surface and $f : Y \to X$ is the blow up of a smooth point and $E$ is the corresponding $-1$ curve, then $\omega_Y = f^* \omega_X \otimes \mathcal{O}_Y(E)$ and $c_1(\omega_Y) \cdot E = -1 < 0$. More generally, if $f : Y \to X$ is the blow up of a smooth subvariety $Z$ of codimension $r + 1$, then $\omega_Y = f^* \omega_X \otimes \mathcal{O}_Y(rE)$ and $c_1(\omega_Y) \cdot C < 0$ for any $f$-exceptional curve $C$. We think of the canonical model as being obtained by contracting curves $C$ such that $c_1(\omega_{X_{\min}}) \cdot C = 0$.

Thus, from now on we will consider varieties with terminal singularities. One of the advantages is that as $\omega_X$ is $\mathbb{Q}$-Cartier (i.e. $\omega_X^{\hat{\otimes} m}$ is a line bundle for some $m > 0$), then $c_1(\omega_X) \cdot C$ is well defined for any curve $C \subset X$ (and computed by $c_1(\omega_X) \cdot C = \frac{1}{m} \deg(\omega_X^{\hat{\otimes} m}|_C)$ for $m > 0$ sufficiently divisible). It then makes sense to ask whether $\omega_X$ is nef. A second issue in dimension $\geq 3$ is that a variety may have many different minimal models. For example:

Example 3.4. Let $V \subset \mathbb{C}^4$ be the cone over $\mathbb{P}^1_{\mathbb{C}} \times \mathbb{P}^1_{\mathbb{C}}$ defined by the equation $wz - xy = 0$. Blowing up the vertex $O$ yields a map $W \to V$ that replaces $O$ by a divisor $E$ isomorphic to $\mathbb{P}^1_{\mathbb{C}} \times \mathbb{P}^1_{\mathbb{C}}$. One can define morphisms $\phi_1 : W \to W_1$ and $\phi_2 : W \to W_2$ which contract the first and second rulings respectively. Let $V' \subset \mathbb{P}^4$
be the corresponding compactification of $V$. Pick a very general hypersurface $H$ of degree $2d \gg 0$. Consider the corresponding degree 2 cover $X \to V'$ ramified over $H|_{V'}$. Define $X_i$ similarly so that $X_i$ is isomorphic to the corresponding double cover of the corresponding compactification of $W_i$. It is easy to see that the $X_i$ are both minimal models (see [2, 5.47]).

It turns out that any two birational minimal models are isomorphic in codimension 1 (cf. [2, 5.37]) and they are related by a finite sequence of flops [6] (the birational map $X_1 \dashrightarrow X_2$ above is an example of a flop). In general a flop $X_1 \dashrightarrow X_2$ is given by two small birational morphisms of normal varieties $X_i \to Z$ such that the $X_i$ have terminal singularities, $\rho(X_i/Z) = 1$ and $c_1(\omega_{X_i}) \cdot C = 0$ for any curve $C \subset X_i$ exceptional over $Z$. Recall that two (linear combinations of) curves $C$ and $C'$ are numerically equivalent, $C \equiv C'$, if $c_1(L) \cdot (C - C') = 0$ for any line bundle $L$ on $X$. Given a morphism $f : X \to Z$, we then let
$$N_1(X/Z) = \Big\{ \sum c_i C_i \;\Big|\; c_i \in \mathbb{R},\ C_i \text{ is a curve in } X \text{ with } f_* C_i = 0 \Big\} \Big/ \equiv,$$
and $\rho(X/Z) = \dim_{\mathbb{R}} N_1(X/Z)$.

3.2. Flips and divisorial contractions. The next question is of course how to construct a minimal model starting from a given variety via a finite sequence of elementary maps. The main ingredient is the following consequence of the cone theorem.

Theorem 3.5. Let $X$ be a projective variety with terminal singularities. If $\omega_X$ is not nef, then there is a curve $C$ such that $c_1(\omega_X) \cdot C < 0$ and a surjective morphism $f : X \to Z$ with connected fibers such that $\rho(X/Z) = 1$ and any curve $D$ is contracted if and only if $[D] \in \mathbb{R}[C] \subset N_1(X)$. The fibers of $f$ are covered by rational curves.

The idea behind the minimal model program is as follows. We start with a variety $X$ with terminal singularities.
(1) If $\omega_X$ is nef, then $X$ is a minimal model and we stop.
(2) Otherwise we consider the corresponding morphism $f : X \to Z$ whose existence is guaranteed by the cone theorem (cf. Theorem 3.5).
(3) If $\dim_{\mathbb{C}} Z < \dim_{\mathbb{C}} X$, then $X$ is covered by rational curves (i.e. curves birational to $\mathbb{P}^1_{\mathbb{C}}$). This implies that $\kappa(X) < 0$ (conjecturally, the converse holds, i.e. if $\kappa(X) < 0$, then $X$ is covered by rational curves). In this case, we say that $f : X \to Z$ is a Mori fiber space.
(4) If $\dim_{\mathbb{C}} Z = \dim_{\mathbb{C}} X$ and $\dim_{\mathbb{C}} \mathrm{Ex}(f) = \dim_{\mathbb{C}} X - 1$, then we say that $f$ is a divisorial contraction. This is the higher dimensional analog of contracting a $-1$ curve. In this case $Z$ has terminal singularities and we may replace $X$ by $Z$. We have $\rho(X/Z) = 1$.
(5) If $\dim_{\mathbb{C}} Z = \dim_{\mathbb{C}} X$ and $\dim_{\mathbb{C}} \mathrm{Ex}(f) < \dim_{\mathbb{C}} X - 1$, then we say that $f$ is a flipping contraction. This is a new phenomenon (it does not happen in dimension 2). It is easy to see that $\omega_Z$ is not $\mathbb{Q}$-Cartier and thus we may not replace $X$ by $Z$. The idea then is to construct the flip $f^+ : X^+ \to Z$. This is a birational morphism such that $\dim_{\mathbb{C}} \mathrm{Ex}(f^+) < \dim_{\mathbb{C}} X - 1$, $\rho(X^+/Z) = 1$, $X^+$ has terminal singularities and $\omega_{X^+} \cdot C^+ > 0$ for any curve $C^+ \subset X^+$ exceptional over $Z$. In other words we replace some $\omega_X$ negative curves by $\omega_{X^+}$ positive curves. If the flip exists, it is unique. If $Z \subset \mathbb{C}^N$ is an affine variety, then the flip is given by $\mathrm{Proj}\, R(X, \omega_X)$. Thus in order to show that flips exist, one must show that $R(X, \omega_X)$ is finitely generated. We may then replace $X$ by $X^+$.
(6) After finitely many iterations we expect to obtain a finite sequence of flips and divisorial contractions $X = X_0 \dashrightarrow X_1 \dashrightarrow \ldots \dashrightarrow X_N$ such that either $X_N$ is a minimal model or there is a Mori fiber space $X_N \to Z$.
(7) Since $\rho(X)$ decreases by 1 with each divisorial contraction and is unchanged by flips, the difficulty is to show that there is no infinite sequence of flips.
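The steps above describe a loop, and its control flow can be sketched in a few lines. Everything in the following block is hypothetical scaffolding: the callables `canonical_is_nef`, `extremal_contraction` and `flip` stand in for the geometric inputs (the nefness test, the contraction of Theorem 3.5, and the existence of flips), and the toy models at the end only imitate the surface case by counting $-1$ curves.

```python
def run_mmp(X, canonical_is_nef, extremal_contraction, flip, max_steps=100):
    """Schematic minimal model program.

    extremal_contraction(X) must return a pair (kind, Z) with kind one of
    "fibration", "divisorial" or "flipping"; flip(X, Z) must return X^+.
    The cap max_steps is a placeholder for the termination question.
    """
    history = []
    for _ in range(max_steps):
        if canonical_is_nef(X):
            return ("minimal model", X, history)
        kind, Z = extremal_contraction(X)
        if kind == "fibration":            # dim Z < dim X
            return ("Mori fiber space", X, history)
        if kind == "divisorial":           # replace X by Z
            X = Z
        elif kind == "flipping":           # replace X by the flip X^+
            X = flip(X, Z)
        else:
            raise ValueError(f"unknown contraction type: {kind}")
        history.append(kind)
    raise RuntimeError("no termination within max_steps")


# Toy model 1: a surface with nef canonical bundle blown up at n points;
# the state is just the number n of -1 curves still to be contracted.
print(run_mmp(3, lambda n: n == 0, lambda n: ("divisorial", n - 1), None))


# Toy model 2: a surface obtained from the projective plane by n blow ups;
# after n contractions the only remaining contraction is a fibration.
def contract_rational(n):
    return ("divisorial", n - 1) if n > 0 else ("fibration", None)


print(run_mmp(2, lambda n: False, contract_rational, None))
```

In both toy runs no flip occurs, in line with the remark that flipping contractions do not happen in dimension 2.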
4. Recent results

We will focus on the case of varieties of general type. By a recent deep result of [1] and [11], it turns out that the canonical ring is always finitely generated.

Theorem 4.1. Let $X$ be a smooth (or terminal or even klt) complex projective variety. The canonical ring $R(X, \omega_X)$ is finitely generated.

In particular if $X$ is of general type, then the canonical model $X_{\mathrm{can}} = \mathrm{Proj}\, R(X, \omega_X)$ is well defined. It is of course singular (as observed above), but its singularities are mild (they are the canonical singularities defined above). It is also the case that minimal models of varieties of general type exist, however they are not necessarily unique and they have terminal singularities. We have the following (cf. [1]).

Theorem 4.2. Let $X$ be a smooth complex projective variety of general type. Then there exists a birational map to a minimal model $X \dashrightarrow X_{\min}$ which is given by a finite sequence of flips and divisorial contractions $X = X_0 \dashrightarrow X_1 \dashrightarrow \ldots \dashrightarrow X_n = X_{\min}$, and $\omega_{X_{\min}}$ is semiample so that $\phi_{\omega_{X_{\min}}^{\hat{\otimes} m}} : X_{\min} \to X_{\mathrm{can}}$ is a birational morphism for some $m > 0$.

One can also generalize the other features of the classification of surfaces. We have
Theorem 4.3 ([13], [3], [12]). For any integer $d > 0$ there exists an integer $M > 0$ such that if $X$ is a smooth projective variety of general type and dimension $d$, then $\phi_{\omega_X^{\otimes M}}$ is birational.

It follows that if $c_1(\omega_{X_{\min}})^d \leq V$ for some fixed integer $V > 0$, then $\phi_{\omega_X^{\otimes M}}(X)$ has degree $\leq M^d V$. Thus each $X$ as above is birational to a (typically very singular) variety $\bar{X} \subset \mathbb{P}^N_{\mathbb{C}}$ of bounded degree. Since $X_{\mathrm{can}}$, the canonical model of $X$, can be recovered uniquely from $\bar{X}$ (it is given by $\mathrm{Proj}\, R(X', \omega_{X'})$ where $X' \to \bar{X}$ is any resolution), it follows that canonical models $X_{\mathrm{can}}$ of fixed dimension and bounded canonical degree $c_1(\omega_{X_{\mathrm{can}}})^d \leq V$ belong to a bounded family. In particular, using techniques developed by Alexeev, Kollár, Shepherd-Barron, Viehweg and others, one can show the existence of a coarse moduli space $M_{d,V}$ (such that each point of $M_{d,V}$ corresponds to a unique canonical model). One interesting consequence is the following.

Corollary 4.4. Fix $d \in \mathbb{N}$. The set $V_d = \{\mathrm{vol}(X)\}$, where $X$ is a smooth complex projective variety of dimension $d$, is discrete. Here $\mathrm{vol}(X) = c_1(\omega_{X_{\mathrm{can}}})^d$ if $X$ is of general type and $\mathrm{vol}(X) = 0$ otherwise.

Note however that the above result is not effective. If $\dim_{\mathbb{C}} X = 2$, then since $\omega_{X_{\mathrm{can}}}$ is Cartier, we have $c_1(\omega_{X_{\mathrm{can}}})^2 \geq 1$, but if $\dim_{\mathbb{C}} X = 3$, then there are examples where $c_1(\omega_{X_{\mathrm{can}}})^3$ is small (in fact equal to $\frac{1}{420}$) and hence $M$ is big.

The next natural question is if one can construct a geometrically meaningful compactification of $M_{d,V}$. In particular what varieties are allowed as limits of canonical models? The answer is once again provided by the minimal model program. Let $f^* : X^* \to T^*$ be a projective morphism to an open subset of a curve $T^* \subset T$. We aim to compactify $f^*$, i.e. to produce a “nice” morphism of projective varieties $f : X \to T$ extending $f^*$. We proceed in several steps.
(1) Since $f^*$ is projective, we may pick a projective morphism $\bar{f} : \bar{X} \to T$ extending $f^*$. However $\bar{X}$ is typically very singular and depends on the given embedding.
(2) We may resolve the singularities of $\bar{X}$ via a morphism $\mu : X' \to \bar{X}$. We may then assume that $X'$ is smooth, the exceptional divisor and each fiber have simple normal crossings (a union of smooth varieties intersecting transversely).
Once again $X'$ is not uniquely determined and it may not extend $f^*$ (we had to resolve singular points on $X^*$). It may also happen that certain fibers are not reduced, i.e. if $O \in T$ is a point, then the fiber $(\bar{f} \circ \mu)^{-1}(O) = \sum y_i Y_i$ where some $y_i$ are $> 1$.
(3) After replacing $T$ by a ramified cover, we may assume that $f' : X' \to T$ is semistable so that each fiber has reduced simple normal crossings (is a union of smooth varieties with coefficient 1 meeting transversely).
(4) We replace $f' : X' \to T$ by the relative canonical model $f : X = \mathrm{Proj}\, R(X'/T, \omega_{X'}) \to T$. It is easy to see that $f$ extends $f^*$ and $f$ has nice properties (e.g. the outcome is uniquely determined, $\omega_X$ is ample over $T$ and all fibers have log canonical singularities).
It follows that we must allow reducible canonical models $X = \bigcup Y_i$. The typical example is a union of smooth varieties meeting transversely, so that $\omega_X|_{Y_i} = \omega_{Y_i}(\sum_{j \neq i} Y_j|_{Y_i})$. In general we have semi-log-canonical pairs (SLC pairs for short), say $(X = \bigcup Y_i, B)$, where $\omega_X(B)|_{Y_i} = \omega_{Y_i}(B_i)$ is a log canonical pair and $\omega_{Y_i}(B_i)$ is an ample $\mathbb{Q}$-Cartier divisor. The coefficients of the divisor $B_i$ belong to the set $\{1 - \frac{1}{k} \mid k \in \mathbb{N}\}$. Many technical issues arise in this context, however there has been much recent progress in understanding the geometry of SLC pairs. Understanding the geometry of SLC pairs is reduced to understanding the geometry of each component and then gluing the components together (cf. [8]). Hacon, McKernan and Xu recently announced the following result (cf. [4]).

Theorem 4.5. Fix an integer $d > 0$ and consider the set of volumes $V = \{\mathrm{vol}(\omega_X(B)) = c_1(\omega_X(B))^d\}$ where $(X, B)$ is a log canonical pair, $\omega_X(B)$ is ample and the coefficients of $B$ lie in the set $\{1 - \frac{1}{k} \mid k \in \mathbb{N}\}$. Then $V$ has no accumulation points from above, in particular there is a positive minimal element in $V$.

If $d = 1$, then $\mathrm{vol}(\omega_X(\sum (1 - \frac{1}{n_i}) B_i)) = 2g - 2 + \sum (1 - \frac{1}{n_i})$ and by an easy computation, one sees that the minimum element of $V$ is $\frac{1}{42}$.

Despite the numerous technical issues, it appears that all the pieces are now in place to construct the moduli space of canonical models of SLC pairs $(X, B)$ (thanks to work of Kollár, Abramovich, Alexeev, Hacking, Hacon, Hassett, Karu, Kovács, McKernan, Viehweg, Xu and others (cf. [9], [10])).
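The “easy computation” in the case $d = 1$ can be illustrated by a small exhaustive search. The sketch below is an illustration under stated assumptions, not a proof: the function names and the search window (`max_genus`, `max_points`, `max_order`) are choices made here, and only a finite range is inspected. The truncation is harmless because every marked point adds at least $\frac{1}{2}$ to the volume and every unit of genus adds 2.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def volume(g, orders):
    """2g - 2 + sum(1 - 1/n_i) for a genus g curve with coefficients 1 - 1/n_i."""
    return 2 * g - 2 + sum(Fraction(n - 1, n) for n in orders)


def min_positive_volume(max_genus=2, max_points=4, max_order=30):
    """Smallest positive volume in a finite search window, with its data."""
    best = None
    for g in range(max_genus + 1):
        for r in range(max_points + 1):
            # n = 1 gives coefficient 0, so orders start at 2.
            for orders in combinations_with_replacement(range(2, max_order + 1), r):
                v = volume(g, orders)
                if v > 0 and (best is None or v < best[0]):
                    best = (v, g, orders)
    return best


print(min_positive_volume())
```

Within this window the minimum is attained by the genus 0 curve with three points of orders $(2, 3, 7)$, where $-2 + \frac{1}{2} + \frac{2}{3} + \frac{6}{7} = \frac{1}{42}$.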
References

[1] C. Birkar, P. Cascini, C. Hacon, and J. McKernan, Existence of minimal models for varieties of log general type. J. Amer. Math. Soc. 23 no. 2 (2010), 405–468.
[2] C. Hacon and S. Kovács, Classification of higher dimensional algebraic varieties. Oberwolfach Seminars, Birkhäuser Boston, Boston, MA, 2010.
[3] C. Hacon and J. McKernan, Boundedness of pluricanonical maps of varieties of general type. Invent. Math. 166 (2006), 1–25.
[4] C. Hacon, J. McKernan, and C. Xu, ACC for the log canonical threshold. Preprint 2012.
[5] C. Hacon and C. Xu, Existence of log canonical closures. arXiv:1105.1169. To appear in Invent. Math.
[6] Y. Kawamata, Flops connect minimal models. Publ. Res. Inst. Math. Sci. 44 (2008), 419–423.
[7] J. Kollár and S. Mori, Birational geometry of algebraic varieties. With the collaboration of C. H. Clemens and A. Corti. Translated from the 1998 Japanese original. Cambridge Tracts in Mathematics, 134. Cambridge University Press, Cambridge, 1998.
[8] J. Kollár, Quotients by finite equivalence relations. arXiv:0812.3608.
[9] J. Kollár, Moduli of varieties of general type. arXiv:1008.0621.
[10] J. Kollár, Moduli of higher dimensional varieties. Book to appear. Available in March 2010 at http://www.math.princeton.edu/~kollar/.
[11] Y.-T. Siu, Finite generation of canonical ring by analytic method. Science in China Series A: Mathematics 51 no. 4 (2008), 481–502.
[12] S. Takayama, Pluricanonical systems on algebraic varieties of general type. Invent. Math. 165 no. 3 (2006), 551–587.
[13] H. Tsuji, Pluricanonical systems of projective varieties of general type. II. Osaka J. Math. 44 no. 3 (2007), 723–764.
Christopher D. Hacon, Department of Mathematics, University of Utah, 155 South 1400 East, Salt Lake City, UT 84112-0090, USA
E-mail: [email protected]
Representations of affine Kac–Moody groups over local and global fields: a survey of some recent results

Alexander Braverman and David Kazhdan

Abstract. Let $G$ be a reductive algebraic group over a local field $\mathcal{K}$ or a global field $F$. It is well known that there exists a non-trivial and interesting representation theory of the group $G(\mathcal{K})$ as well as the theory of automorphic forms on an adelic group $G(\mathbb{A}_F)$. The purpose of this talk is to give a survey of some recent constructions and results, which show that there should exist an analog of the above theories in the case when $G$ is replaced by the corresponding affine Kac–Moody group $\widehat{G}_{\mathrm{aff}}$ (which is essentially built from the formal loop group $G((t))$ of $G$). Specifically we discuss the following topics: affine (classical and geometric) Satake isomorphism, Iwahori–Hecke algebra of $\widehat{G}_{\mathrm{aff}}$, affine Eisenstein series and Tamagawa measure.
1. Introduction

1.1. Reductive groups: notations. Let $G$ be a split connected reductive algebraic group defined over integers. We fix a maximal split torus $T$ in $G$ and denote by $\Lambda$ its lattice of cocharacters (the coweight lattice of $G$). We let $W$ denote the Weyl group of $G$; it acts on $T$ and $\Lambda$. Let $T^\vee$ denote the dual torus of $T$, considered as a group over $\mathbb{C}$. The lattice of characters of $T^\vee$ is canonically isomorphic to $\Lambda$. Let $G^\vee$ denote the Langlands dual group of $G$, which by definition contains $T^\vee$ as a maximal torus.

1.2. The group $G_{\mathrm{aff}}$. To a split connected reductive group $G$ as above one can associate the corresponding affine Kac–Moody group $G_{\mathrm{aff}}$ in the following way. Let $\Lambda$ denote the coweight lattice of $G$ and let $Q$ be an integral, even, negative-definite symmetric bilinear form on $\Lambda$ which is invariant under the Weyl group of $G$. One can consider the polynomial loop group $G[t, t^{-1}]$ (this is an infinite-dimensional group ind-scheme). It is well-known that a form $Q$ as above gives rise to a central extension $\widetilde{G}$ of $G[t, t^{-1}]$:
$$1 \to \mathbb{G}_m \to \widetilde{G} \to G[t, t^{-1}] \to 1.$$
Moreover, $\widetilde{G}$ has again a natural structure of a group ind-scheme. The multiplicative group $\mathbb{G}_m$ acts naturally on $G[t, t^{-1}]$ and this action lifts to $\widetilde{G}$. We denote the corresponding semi-direct product by $G_{\mathrm{aff}}$; we also let $\mathfrak{g}_{\mathrm{aff}}$ denote its Lie algebra. Thus if $G$ is semi-simple then $\mathfrak{g}_{\mathrm{aff}}$ is an untwisted affine Kac–Moody Lie algebra in the sense of [24]; in particular, it can be described by the corresponding affine root system.
Similarly, one can consider the corresponding completed affine Kac–Moody group $\widehat{G}_{\mathrm{aff}}$ by replacing the polynomial loop group $G[t, t^{-1}]$ with the formal loop group $G((t))$ in the above definitions. We shall also denote by $G'_{\mathrm{aff}}$ (resp. $\widehat{G}'_{\mathrm{aff}}$) the quotient of $G_{\mathrm{aff}}$ (resp. of $\widehat{G}_{\mathrm{aff}}$) by the central $\mathbb{G}_m$.

1.3. The dream. Let $\mathcal{K}$ be a non-archimedean local field with ring of integers $\mathcal{O}$ and residue field $k$. A smooth representation of $G(\mathcal{K})$ is a vector space $V$ over $\mathbb{C}$ together with a homomorphism $\pi : G(\mathcal{K}) \to \mathrm{Aut}(V)$ such that the stabilizer of every $v \in V$ contains an open compact subgroup $K$ of $G(\mathcal{K})$. We denote the category of smooth representations by $\mathcal{M}(G(\mathcal{K}))$. The category $\mathcal{M}(G(\mathcal{K}))$ has been extensively studied in the past 50 years. Similarly, given a global field $F$ we can consider automorphic representations of $G(\mathbb{A}_F)$ where $\mathbb{A}_F$ is the ring of adeles of $F$. In both (local and global) cases the most interesting statement about the above representations is the Langlands correspondence, which relates representations of $G(\mathcal{K})$ (resp. automorphic representations of $G(\mathbb{A}_F)$) to homomorphisms from the absolute Galois group of $\mathcal{K}$ (resp. of $F$) to the Langlands dual group $G^\vee$ of $G$.

Our dream would be to develop an analog of the above representation theories and the Langlands correspondence for the group $G_{\mathrm{aff}}$ or $\widehat{G}_{\mathrm{aff}}$ (or, more generally, for any symmetrizable Kac–Moody group). This is a fascinating task by itself but we also believe that a fully developed theory of automorphic forms for $\widehat{G}_{\mathrm{aff}}$ will have powerful applications to automorphic forms on $G$.

Of course, at the moment the above dream remains only a dream; however, in recent years some interesting results about the representation theory of $G_{\mathrm{aff}}$ over either a local or a global field have appeared. The purpose of this paper is to survey some of those results; more precisely, we are going to concentrate on two aspects: the study of some particular Hecke algebras in the local case and the study of Eisenstein series in the global case.
All the results that we are going to discuss generalize well-known results for the group $G$ itself; however, the generalizations are not always straightforward and some new features appear in the affine case.

1.4. Hecke algebras and Satake isomorphism. First let us mention that some version of the above representation theory for $\widehat{G}_{\mathrm{aff}}(\mathcal{K})$ was developed in [25] and [14]–[16]. This theory looks promising, but we are not going to discuss it in this paper; on the other hand, in [9] we generalized the Satake isomorphism to the case of $G_{\mathrm{aff}}$. Let us first recall the usual Satake isomorphism (which can be thought of as the starting point for the Langlands duality mentioned above). Given an open compact subgroup $K$ in $G(\mathcal{K})$ one can consider the Hecke algebra $H(G, K)$ of compactly supported $K$-bi-invariant distributions on $G(\mathcal{K})$ (this is an algebra with respect to convolution). We say that $H(G, K)$ is a Hecke algebra of $G$ with respect to $K$. $H(G, K)$ is a unital associative algebra and it is well-known that the study of the representation theory of $G(\mathcal{K})$ is essentially equivalent to studying the representation theory of $H(G, K)$ for different choices of $K$. The group $G(\mathcal{O})$ is a maximal compact subgroup of $G(\mathcal{K})$. We denote the corresponding Hecke algebra $H(G, G(\mathcal{O}))$ by $H_{\mathrm{sph}}(G, \mathcal{K})$ and call it the spherical Hecke algebra. The Satake isomorphism is an isomorphism between $H_{\mathrm{sph}}(G, \mathcal{K})$ and the
complexified Grothendieck ring $K_0(\mathrm{Rep}(G^\vee))$ of finite-dimensional representations of $G^\vee$. For future purposes it will be convenient to note that $K_0(\mathrm{Rep}(G^\vee))$ is also naturally isomorphic to the algebra $\mathbb{C}(T^\vee)^W$ of polynomial functions on the maximal torus $T^\vee \subset G^\vee$ invariant under $W$. The Satake isomorphism was generalized to $G_{\mathrm{aff}}$ in [9] and it was studied in more detail in [11]. The formulation is similar to the case of $G$ but the details are somewhat more involved. We shall give the precise formulation in Section 4, where we also discuss the analog of the so called Iwahori–Hecke algebra for $G_{\mathrm{aff}}$ and the affine version of the Gindikin–Karpelevich formula and the Macdonald formula for the spherical function (following the papers [7], [8], [11]). A (partial) analog of the geometric Satake isomorphism (cf. [33]) in the affine case (following the papers [3]–[5]) is also discussed in Section 4.

1.5. Eisenstein series for $\widehat{G}_{\mathrm{aff}}$. The most basic example of automorphic forms, the Eisenstein series for $\widehat{G}_{\mathrm{aff}}$ coming from the Borel subgroup of $\widehat{G}_{\mathrm{aff}}$, were studied extensively in the works of Garland (cf. e.g. [17]–[19] and references therein) and Kapranov [26]. In the forthcoming publication [10] we are going to continue the above study of Eisenstein series and give some applications (for example, we are going to describe an affine version of the Tamagawa number formula for $G_{\mathrm{aff}}$). We hope that the above results should have interesting applications to automorphic L-functions for the group $G$ itself, using the affine version of the so called Langlands–Shahidi method (cf. [19]). We discuss it in more detail in Section 5.

1.6. Contents of the paper. In Section 2 we are going to review some facts about spherical and Iwahori Hecke algebras of reductive groups over local non-archimedean fields that we are going to generalize later to the affine case. In addition we also review the corresponding geometric Satake isomorphism.

In Section 3 we review some well-known facts about automorphic forms and, in particular, about Eisenstein series over a global field $F$. To simplify the discussion we concentrate on the case when $F$ is a function field and the automorphic forms in question are everywhere unramified.

In Section 4 we generalize the constructions of Section 2 to the case of $G_{\mathrm{aff}}$. Similarly, in Section 5 we generalize the constructions of Section 3 to the affine case. In particular, we discuss the foundations of the theory of Eisenstein series for $\widehat{G}_{\mathrm{aff}}$ coming from various parabolic subgroups of $\widehat{G}_{\mathrm{aff}}$ and relate this theory to some infinite products of automorphic L-functions for the finite-dimensional group $G$. We also discuss an affine Tamagawa number formula. Finally in Section 6 we discuss some other works related to the above constructions as well as future directions of research.
2. Spherical and Iwahori Hecke algebras in the finite-dimensional case

In this section we recall some well-known facts about representation theory and Hecke algebras (and their geometric version) of reductive groups over a local non-archimedean field $\mathcal{K}$, which we are going to generalize to the affine case.

2.1. Groups over local non-archimedean fields and their representations. For any open compact subgroup $K$ of $G(\mathcal{K})$, we denote by $H(G, K)$ the Hecke algebra of $G$ with respect to $K$. A choice of a Haar measure on $G(\mathcal{K})$ provides an identification of $H(G, K)$ with the space of $K$-bi-invariant functions on $G(\mathcal{K})$. For every $K$ as above, there is a natural functor from $\mathcal{M}(G(\mathcal{K}))$ to the category of left $H(G, K)$-modules, sending every representation $V$ to the corresponding space $V^K$ of $K$-invariants; thus one can try to understand the category $\mathcal{M}(G(\mathcal{K}))$ by studying the categories of $H(G, K)$-modules for different $K$. There are two choices of $K$ that will be of special interest to us. The first case is $K = G(\mathcal{O})$ (which is a maximal compact subgroup of $G(\mathcal{K})$). Recall that the corresponding Hecke algebra in this case is called the spherical Hecke algebra and is denoted by $H_{\mathrm{sph}}(G, \mathcal{K})$. The second case is the case when $K$ is the Iwahori subgroup $I$ (by definition, this is a subgroup of $G(\mathcal{O})$ which is equal to the preimage of a Borel subgroup in $G(k)$ under the natural map $G(\mathcal{O}) \to G(k)$). Let us recall the description of the corresponding algebras in these cases.

2.2. Satake isomorphism. The spherical Hecke algebra $H_{\mathrm{sph}}(G, \mathcal{K})$ is commutative. The Satake isomorphism is a canonical isomorphism between $H_{\mathrm{sph}}(G, \mathcal{K})$ and the algebra $\mathbb{C}[\Lambda]^W$. The latter algebra has several other standard interpretations: it is also isomorphic to the complexified Grothendieck ring $K_0(\mathrm{Rep}(G^\vee))$ of finite-dimensional representations of $G^\vee$ as well as to the algebra $\mathbb{C}(T^\vee)^W$ of polynomial functions on the maximal torus $T^\vee \subset G^\vee$ invariant under the Weyl group $W$ of $G$. The Satake isomorphism is one of the starting points for the celebrated Langlands conjectures. We are going to present a generalization of the Satake isomorphism to the case of affine Kac–Moody groups in Section 4.

2.3. The Iwahori–Hecke algebra. The algebra $H(G, I)$ is known to have the following presentation (usually called “Bernstein presentation”).
It is generated by elements $X_\lambda$ for $\lambda \in \Lambda$ and $T_w$ for $w \in W$, subject to the following relations:
1) $T_w T_{w'} = T_{ww'}$ if $\ell(ww') = \ell(w) + \ell(w')$;
2) $X_\lambda X_\mu = X_{\lambda+\mu}$; in other words, the $X_\lambda$'s generate the algebra $\mathbb{C}(T^\vee)$ inside $H(G, I)$;
3) For any $f \in \mathbb{C}(T^\vee)$ and any simple reflection $s \in W$ we have
$$f\, T_s - T_s\, s(f) = (q - 1)\, \frac{f - s(f)}{1 - X_{-\alpha_s}}$$
(it is easy to see that the right hand side is again an element of $\mathbb{C}(T^\vee)$).

The spherical Hecke algebra is a subalgebra of $H(G, I)$; in terms of the above presentation it is equal to $P \cdot H(G, I) \cdot P$ where $P = \sum_{w \in W} T_w$.

2.4. Explicit description of the Satake isomorphism. Recall that the Satake isomorphism is an isomorphism $S$ between the algebra $H_{\mathrm{sph}}(G, \mathcal{K})$ and $\mathbb{C}(T^\vee)^W$. Both algebras possess a natural basis, parameterized by the set $\Lambda_+$ of dominant coweights of $G$.
Namely, let $\varpi \in \mathcal{O}$ be a uniformizer; for $\lambda \in \Lambda$ let us denote by $\varpi^\lambda$ the image of $\varpi \in \mathcal{K}^*$ under the map $\lambda : \mathcal{K}^* \to T(\mathcal{K}) \subset G(\mathcal{K})$. Then it is known that $G(\mathcal{K})$ is the disjoint union of the cosets $G(\mathcal{O}) \cdot \varpi^\lambda \cdot G(\mathcal{O})$ when $\lambda$ runs over $\Lambda_+$. For every $\lambda \in \Lambda_+$ we denote by $h_\lambda \in H_{\mathrm{sph}}(G, \mathcal{K})$ the characteristic function of the corresponding double coset. This is a basis of $H_{\mathrm{sph}}(G, \mathcal{K})$. On the other hand, for $\lambda \in \Lambda_+$ let $L(\lambda)$ denote the irreducible representation of $G^\vee$ with highest weight $\lambda$. Then the characters $\chi(L(\lambda))$ form a basis of $\mathbb{C}(T^\vee)^W$. We would like to recall what happens to these bases under the Satake isomorphism (and its inverse). Let $W_\lambda$ be the stabilizer of $\lambda$ in $W$ and set
$$W_\lambda(q) = \sum_{w \in W_\lambda} q^{\ell(w)}.$$
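The polynomial $W_\lambda(q)$ is easy to compute by brute force in small cases. The following sketch assumes $G = GL_n$ (an assumption made here for illustration, not a restriction in the text), so that $W = S_n$ permutes the coordinates of $\lambda \in \mathbb{Z}^n$ and the length $\ell(w)$ is the number of inversions of $w$; the function name is hypothetical.

```python
from itertools import permutations


def stabilizer_poincare(lam):
    """Coefficients [c_0, c_1, ...] of W_lambda(q) = sum over w in W_lambda of q^l(w),
    for W = S_n acting on lam by permuting coordinates (the GL_n case)."""
    n = len(lam)
    coeffs = [0] * (n * (n - 1) // 2 + 1)
    for w in permutations(range(n)):
        # keep only the permutations fixing lam
        if all(lam[w[i]] == lam[i] for i in range(n)):
            inversions = sum(
                1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j]
            )
            coeffs[inversions] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()                  # drop trailing zero coefficients
    return coeffs


print(stabilizer_poincare((1, 1, 0)))   # stabilizer S_2: 1 + q
print(stabilizer_poincare((0, 0, 0)))   # all of S_3: 1 + 2q + 2q^2 + q^3
```

For a regular coweight such as $(2, 1, 0)$ the stabilizer is trivial and $W_\lambda(q) = 1$.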
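Theorem 2.1 (Macdonald, [31]). For any $\lambda \in \Lambda^+$ we have
\[ S(h_\lambda) = \frac{q^{\langle \lambda, \rho^\vee \rangle}}{W_\lambda(q^{-1})} \sum_{w \in W} w\left( e^{\lambda} \prod_{\alpha \in R_+} \frac{1 - q^{-1} e^{-\alpha}}{1 - e^{-\alpha}} \right). \tag{1} \]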
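To make formula (1) concrete, it can be written out for $G = SL(2)$. The following is only an illustrative sketch: the choice of group, the abbreviation $z = e^{\alpha}$ and the notation $\chi_m$ are assumptions introduced for this example and do not come from the text.

```latex
% Assumptions: G = SL(2), \alpha the unique positive coroot, \lambda = n\alpha with n \ge 1,
% z := e^{\alpha}, W = \{1, s\} with s(z) = z^{-1}, W_\lambda = \{1\}, \langle \lambda, \rho^\vee \rangle = n.
\begin{align*}
S(h_{n\alpha})
 &= q^{n}\left( z^{n}\,\frac{1-q^{-1}z^{-1}}{1-z^{-1}} + z^{-n}\,\frac{1-q^{-1}z}{1-z} \right) \\
 &= q^{n}\,\frac{(z^{n+1}-z^{-n}) - q^{-1}\,(z^{n}-z^{1-n})}{z-1} \\
 &= q^{n}\,\chi_{n} - q^{n-1}\,\chi_{n-1},
 \qquad \chi_{m} := \frac{z^{m+1}-z^{-m}}{z-1} = z^{m}+z^{m-1}+\dots+z^{-m}.
\end{align*}
% Inverting (with S(h_0) = \chi_0 = 1) by induction on n:
\[
\chi_{n} = q^{-n}\,\bigl( S(h_{n\alpha}) + S(h_{(n-1)\alpha}) + \dots + S(h_{0}) \bigr).
\]
```

Here $\chi_m$ is the character of the $(2m+1)$-dimensional irreducible representation of $G^\vee = PGL(2)$. The inverse formula says that $S^{-1}(\chi_n)$ takes the value $q^{-n}$ on every double coset $G(\mathcal{O}) \varpi^{m\alpha} G(\mathcal{O})$ with $0 \le m \le n$; at $q = 1$ this value is $1$, the multiplicity of the weight $m\alpha$.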
In formula (1), $\rho^\vee$ is the half-sum of the positive roots of $G$. Let us stress the following: there are many proofs of the above theorem, but essentially all of them use the Iwahori–Hecke algebra $H(G, I)$.

On the other hand, we can consider the Satake isomorphism $S$ as an isomorphism between the algebras $H_{sph}(G, \mathcal{K})$ and $K_0(\mathrm{Rep}(G^\vee))$. The classes $[L(\lambda)]$ of the irreducible representations $L(\lambda)$, $\lambda \in \Lambda^+$, form a basis of $K_0(\mathrm{Rep}(G^\vee))$. The functions $S^{-1}([L(\lambda)])$ were described by Lusztig [30] (cf. also [12] and [27]). Namely, we have:

Theorem 2.2 (Lusztig, [30]). Let $\lambda, \mu \in \Lambda^+$. Then $S^{-1}([L(\lambda)])(\varpi^\mu)$ is non-zero if and only if $\mu$ is a weight of $L(\lambda)$, and in that case $S^{-1}([L(\lambda)])(\varpi^\mu)$ is a certain $q$-analog of the weight multiplicity $\dim L(\lambda)_\mu$ (i.e. it is a polynomial in $q$ with integral coefficients whose value at $q = 1$ is equal to the weight multiplicity).

2.5. Gindikin–Karpelevich formula. The classical Gindikin–Karpelevich formula describes explicitly how a certain intertwining operator acts on the spherical vector in a principal series representation of $G(\mathcal{K})$.¹ In more explicit terms it can be formulated as follows. Let us choose a Borel subgroup $B$ of $G$ and an opposite Borel subgroup $B_-$; let $U, U_-$ be their unipotent radicals. In addition, let $L$ denote the coroot lattice of $G$, $R_+ \subset L$ the set of positive coroots, and $L_+$ the subsemigroup of $L$ generated by $R_+$. Thus any $\gamma \in L_+$ can be written as $\sum a_i \alpha_i$, where the $\alpha_i$ are the simple coroots and the $a_i$ are non-negative integers. We shall denote by $|\gamma|$ the sum of all the $a_i$.

¹ More precisely, the Gindikin–Karpelevich formula answers the analogous question for real groups.
Alexander Braverman and David Kazhdan
Set now $\mathrm{Gr}_G = G(\mathcal{K})/G(\mathcal{O})$. Then it is known that $U(\mathcal{K})$-orbits on $\mathrm{Gr}_G$ are in one-to-one correspondence with elements of $\Lambda$; for any $\mu \in \Lambda$ we shall denote by $S^\mu$ the corresponding orbit. The same is true for $U_-(\mathcal{K})$-orbits; for each $\gamma \in \Lambda$ we shall denote by $T^\gamma$ the corresponding orbit. It is well known that $T^\gamma \cap S^\mu$ is non-empty iff $\mu - \gamma \in L_+$, and in that case the intersection is finite. The Gindikin–Karpelevich formula allows one to compute the number of points in $T^{-\gamma} \cap S^0$ for $\gamma \in L_+$ (it is easy to see that this intersection is naturally isomorphic to $T^{-\gamma+\mu} \cap S^\mu$ for any $\mu \in \Lambda$). The answer is most easily stated in terms of the corresponding generating function:

Theorem 2.3 (Gindikin–Karpelevich formula).
\[ \sum_{\gamma \in L_+} \#(T^{-\gamma} \cap S^0)\, q^{-|\gamma|} e^{-\gamma} = \prod_{\alpha \in R_+} \frac{1 - q^{-1} e^{-\alpha}}{1 - e^{-\alpha}}. \]
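For $G = SL(2)$ the Gindikin–Karpelevich formula can be unwound by expanding the single factor on the right-hand side as a geometric series. This is a sketch under stated assumptions: the group $SL(2)$ and the abbreviation $x = e^{-\alpha}$ are choices made for the example.

```latex
% Assumptions: G = SL(2), \alpha the unique positive coroot, x := e^{-\alpha},
% \gamma = n\alpha with n \ge 0, so |\gamma| = n.
\[
\frac{1-q^{-1}x}{1-x} \;=\; 1 + \sum_{n \ge 1} (1-q^{-1})\, x^{n}.
\]
% Comparing the coefficients of e^{-n\alpha} = x^{n} on both sides of the formula:
\[
\#(T^{0}\cap S^{0}) = 1, \qquad
\#(T^{-n\alpha}\cap S^{0}) = q^{n}\,(1-q^{-1}) = q^{n}-q^{n-1} \quad (n \ge 1).
\]
```

In particular the counts are polynomials in $q$, as one expects for numbers of points of varieties over the residue field.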
The proof of the above formula is not difficult: it can be reduced to $G = SL(2)$. However, it can also be obtained from the Macdonald formula (1) by a certain limiting procedure. The second proof is important for our purposes since we are going to use a similar argument in the affine case.

2.6. Geometric Satake isomorphism. The Satake isomorphism recalled above has a geometric version, called the geometric Satake isomorphism.² Let us recall some facts about the geometric Satake isomorphism; later, we are going to discuss a (partial) generalization of it to the affine case. It is probably worthwhile to note that there exists a geometric approach to the Iwahori–Hecke algebra (it was developed in the works of Ginzburg, Kazhdan–Lusztig and Bezrukavnikov), but we are not going to discuss it in this paper.

Let now $\mathcal{K} = \mathbb{C}((s))$ and let $\mathcal{O} = \mathbb{C}[[s]]$; here $s$ is a formal variable. Let $\mathrm{Gr}_G = G(\mathcal{K})/G(\mathcal{O})$. Then the geometric (or categorical) analog of the spherical Hecke algebra considered above is the category $\mathrm{Perv}_{G(\mathcal{O})}(\mathrm{Gr}_G)$ of $G(\mathcal{O})$-equivariant perverse sheaves on $\mathrm{Gr}_G$ (cf. e.g. [27]). According to loc. cit. the category $\mathrm{Perv}_{G(\mathcal{O})}(\mathrm{Gr}_G)$ possesses a canonical tensor structure, and the geometric Satake isomorphism asserts that this category is equivalent to $\mathrm{Rep}(G^\vee)$ as a tensor category. The corresponding fiber functor from $\mathrm{Perv}_{G(\mathcal{O})}(\mathrm{Gr}_G)$ to vector spaces sends every perverse sheaf $\mathcal{S} \in \mathrm{Perv}_{G(\mathcal{O})}(\mathrm{Gr}_G)$ to its cohomology. Another way to construct this (“fiber”) functor is discussed in [32].

More precisely, one can show that $G(\mathcal{O})$-orbits on $\mathrm{Gr}_G$ are finite-dimensional and that they are indexed by the set $\Lambda^+$ of dominant weights of $G^\vee$. For every $\lambda \in \Lambda^+$ we denote by $\mathrm{Gr}_G^\lambda$ the corresponding orbit and by $\overline{\mathrm{Gr}}_G^\lambda$ its closure in $\mathrm{Gr}_G$. Then $\mathrm{Gr}_G^\lambda$ is a non-singular quasi-projective algebraic variety over $\mathbb{C}$ and $\overline{\mathrm{Gr}}_G^\lambda$ is a (usually singular) projective variety. One has
\[ \overline{\mathrm{Gr}}_G^\lambda = \bigcup_{\mu \le \lambda} \mathrm{Gr}_G^\mu. \]

² As was mentioned above, the usual Satake isomorphism is the starting point for Langlands duality; the classical Langlands duality has its geometric counterpart, usually referred to as the geometric Langlands duality, which is based on the geometric version of the Satake isomorphism.
One of the main properties of the geometric Satake isomorphism is that it sends the irreducible $G^\vee$-module $L(\lambda)$ to the intersection cohomology complex $\mathrm{IC}(\overline{\mathrm{Gr}}_G^\lambda)$. In particular, the module $L(\lambda)$ itself gets realized as the intersection cohomology of the variety $\overline{\mathrm{Gr}}_G^\lambda$.

As a byproduct of the geometric Satake isomorphism one can compute $\mathrm{IC}(\overline{\mathrm{Gr}}_G^\lambda)$ in terms of the module $L(\lambda)$. Namely, it is well known that the stalk of $\mathrm{IC}(\overline{\mathrm{Gr}}_G^\lambda)$ at a point of $\mathrm{Gr}_G^\mu$, as a graded vector space, is essentially equal to the associated graded $\mathrm{gr}^F L(\lambda)_\mu$ of the $\mu$-weight space $L(\lambda)_\mu$ in $L(\lambda)$ with respect to a certain filtration, called the Brylinski–Kostant filtration. This is a geometric analog of Theorem 2.2.

One can construct a certain canonical transversal slice $\overline{W}^\lambda_{G,\mu}$ to $\mathrm{Gr}_G^\mu$ inside $\overline{\mathrm{Gr}}_G^\lambda$. This is a conical affine algebraic variety (i.e. it is endowed with an action of the multiplicative group $\mathbb{G}_m$ which contracts it to one point). The above result about the stalks of $\mathrm{IC}(\overline{\mathrm{Gr}}_G^\lambda)$ then gets translated into saying that the stalk of the IC-sheaf of $\overline{W}^\lambda_{G,\mu}$ at the unique $\mathbb{G}_m$-fixed point is essentially isomorphic to $\mathrm{gr}^F L(\lambda)_\mu$. Note that since $\overline{W}^\lambda_{G,\mu}$ is contracted to the above point by the $\mathbb{G}_m$-action, it follows that the stalk of $\mathrm{IC}(\overline{W}^\lambda_{G,\mu})$ at that point is equal to the global intersection cohomology $\mathrm{IH}^*(\overline{W}^\lambda_{G,\mu})$.

The varieties $\overline{W}^\lambda_{G,\mu}$ are important for us because we are going to describe their analogs when $G$ is replaced by $G_{\mathrm{aff}}$; on the other hand, the affine analogs of $\overline{\mathrm{Gr}}_G^\lambda$ should be wildly infinite-dimensional and we don't know how to think about them.
3. Unramified automorphic forms in the finite-dimensional case

In this Section we recall some classical facts about automorphic forms on reductive groups over a global field, which we are going to generalize to the affine case later. To simplify the discussion, we restrict ourselves to unramified automorphic forms over a function field $F$ (i.e. the field of rational functions on a smooth projective curve $X$ over a finite field $k$). In this case, there is a way to think about automorphic forms in terms of functions on the moduli space of $G$-bundles over the curve $X$; this point of view will be convenient for us in the affine case.³

3.1. Automorphic forms over function fields and $G$-bundles. Let $X$ be a smooth projective geometrically irreducible curve over a finite field $k = \mathbb{F}_q$. Let also $G$ be a split semi-simple simply connected group over $k$. We set $F = k(X)$; this is a global field and we let $\mathbb{A}_F$ denote its ring of adeles. We also denote by $\mathcal{O}(\mathbb{A}_F)$ the ring of integral adeles.

³ We would like to note that the pioneering work on the subject in the affine case was done by H. Garland [17]–[19], who dealt with the case when $F = \mathbb{Q}$. Most of the results that we are going to describe in the affine case can be generalized to any global field, but technically the case of number fields is more difficult due to the existence of archimedean places and we prefer not to discuss it in this survey paper.
It is well known that the double quotient
\[ G(\mathcal{O}(\mathbb{A}_F)) \backslash G(\mathbb{A}_F) / G(F) \simeq \mathrm{Bun}_G(X) \tag{2} \]
(this is an equivalence of groupoids). (Complex-valued) functions on the above space will usually be referred to as unramified automorphic forms. We denote the space of such functions by $\mathbb{C}(\mathrm{Bun}_G(X))$.

3.2. The Hecke algebra action. Recall that for a local field $\mathcal{K}$ with ring of integers $\mathcal{O}_{\mathcal{K}}$ we denote by $H_{\mathcal{K}}(G)$ the spherical Hecke algebra of $G$ over $\mathcal{K}$; this algebra is isomorphic to the algebra $\mathbb{C}[G^\vee]^{G^\vee}$ of ad-invariant regular functions on $G^\vee$. Let $v$ be a place of $F$ and let $\mathcal{K}_v$ be the corresponding local completion of $F$ with ring of integers $\mathcal{O}_v$. Let $q_v$ be the number of elements in the residue field of $\mathcal{O}_v$. Let $H_v(G)$ denote the corresponding Hecke algebra. It is well known that $H_v(G)$ acts on $\mathbb{C}(\mathrm{Bun}_G(X))$ by correspondences. Assume that $f \in \mathbb{C}(\mathrm{Bun}_G(X))$ is an eigenfunction of all the $H_v(G)$. In this case we say that $f$ is a Hecke eigenform. The corresponding eigenvalue is given by an element $g_v(f) \in G^\vee$ (well defined up to conjugation) for every place $v$. Given a finite-dimensional representation $\rho : G^\vee \to \mathrm{Aut}(V)$, we define the $L$-function $L(f, \rho, s)$ of $f$ by
\[ L(f, \rho, s) = \prod_{v} \frac{1}{\det\bigl(1 - q_v^{-s} \rho(g_v(f))\bigr)}. \]
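The simplest instance of this Euler product may help fix conventions. The sketch below assumes that $\rho$ is the trivial one-dimensional representation, a choice made only for illustration; the identification with the zeta function of the curve uses the standard Euler product over the closed points of $X$, with $q_v = q^{\deg v}$.

```latex
% Assumption: \rho = triv is the trivial one-dimensional representation of G^\vee,
% so \rho(g_v) = 1 for every place v and the determinant is 1 - q_v^{-s}.
\[
L(f, \mathrm{triv}, s) \;=\; \prod_{v} \frac{1}{1 - q_v^{-s}} \;=\; \zeta(s),
\]
% where \zeta(s) is the zeta function of the curve X; for example, for X = \mathbb{P}^1:
\[
\zeta(s) = \frac{1}{(1-q^{-s})(1-q^{1-s})}.
\]
```

In this degenerate case the $L$-function does not depend on the eigenform $f$ at all; the dependence on $f$ enters only through non-trivial $\rho$.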
3.3. Groupoids. In what follows it will be convenient to treat various double coset spaces as groupoids rather than as sets. By a groupoid we shall mean a small category $\mathcal{X}$ where all morphisms are isomorphisms. Any such groupoid is equivalent to a quotient groupoid $X/H$, where $X$ is a set and $H$ is a group acting on $X$. In particular, given a group $G$ and two subgroups $H_1$ and $H_2$, the double quotient $H_1 \backslash G / H_2$ has a natural structure of a groupoid; in particular, this applies to the double coset space (2). We shall sometimes denote by $|\mathcal{X}|$ the set of isomorphism classes of objects of $\mathcal{X}$. Given $x \in \mathcal{X}$ we denote by $\mathrm{Aut}_{\mathcal{X}}(x)$ the automorphism group of $x$ in $\mathcal{X}$.

Given two groupoids $\mathcal{X}$ and $\mathcal{Y}$ it makes sense to consider functors $f : \mathcal{X} \to \mathcal{Y}$. For every $y \in \mathcal{Y}$, the fiber $f^{-1}(y)$ is also a groupoid: by definition $\mathrm{Aut}_{f^{-1}(y)}(x)$ is the group of automorphisms of $x$ whose image in the group of automorphisms of $y$ is trivial. We say that $f$ is representable if $\mathrm{Aut}_{f^{-1}(y)}(x)$ is trivial for every $x$ and $y$.

For a map $p : X \to Y$ between two sets we are going to denote by $p^*$ the pullback of functions (this is a linear map from functions on $Y$ to functions on $X$) and by $p_!$ the operation of summation over the fiber (this is a linear map from functions on $X$ to functions on $Y$; a priori it is well defined when $p$ has finite fibers). More generally, we can talk about $p^*$ and $p_!$ when $\mathcal{X}$ and $\mathcal{Y}$ are groupoids. In this case we should define $p_!$ in the following way:
\[ p_!(f)(y) = \sum_{x \in |p^{-1}(y)|} \frac{f(x)}{\# \mathrm{Aut}_{p^{-1}(y)}(x)}. \tag{3} \]
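In particular, we can apply this to the case when $p$ is the map from $\mathcal{X}$ to the point. In this case, we shall denote $p_!$ by $\int_{\mathcal{X}}$. In other words, for a function $f : |\mathcal{X}| \to \mathbb{C}$ we set
\[ \int_{\mathcal{X}} f = \sum_{x \in |\mathcal{X}|} \frac{f(x)}{\# \mathrm{Aut}(x)}. \tag{4} \]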
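A standard special case shows what the weights $1/\#\mathrm{Aut}(x)$ in (4) accomplish. The sketch below assumes a finite set $Z$ acted on by a finite group $H$ (both names are introduced only for this example) and uses the orbit–stabilizer theorem.

```latex
% Assumptions: Z is a finite set with an action of a finite group H; \mathcal{X} = Z/H is the
% quotient groupoid, so |\mathcal{X}| is the set of H-orbits and \mathrm{Aut}(x) = \mathrm{Stab}_H(x).
\[
\int_{Z/H} 1
 \;=\; \sum_{Hx \,\in\, |Z/H|} \frac{1}{\#\mathrm{Stab}_H(x)}
 \;=\; \sum_{Hx \,\in\, |Z/H|} \frac{\#(Hx)}{\#H}
 \;=\; \frac{\#Z}{\#H},
 \qquad\text{in particular}\qquad
 \int_{\mathrm{pt}/H} 1 = \frac{1}{\#H}.
\]
```

So the groupoid integral of the constant function $1$ behaves like a "number of points" that is multiplicative under quotients, which is the property used when double coset spaces such as (2) are treated as groupoids.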
The above sum makes sense if $|\mathcal{X}|$ and all the groups $\mathrm{Aut}(x)$ are finite. When $|\mathcal{X}|$ is infinite, the sum sometimes still makes sense (if it converges).

3.4. Eisenstein series and constant term. Let $P \subset G$ be a parabolic subgroup and let $M$ be the corresponding Levi group. We have canonical maps $G \leftarrow P \to M$ which give rise to the diagram
\[ \begin{array}{ccc} \mathrm{Bun}_P(X) & \xrightarrow{\ \eta\ } & \mathrm{Bun}_M(X) \\ {\scriptstyle \pi} \big\downarrow & & \\ \mathrm{Bun}_G(X) & & \end{array} \]
The map $\pi$ is representable but has infinite fibers. The map $\eta$ has finite fibers but it is not representable. The connected components of $\mathrm{Bun}_M$ are numbered by elements of the lattice $\Lambda_M = \mathrm{Hom}(\mathbb{G}_m, M/[M, M])$; we denote by $2\rho_P$ the element of the dual lattice $\mathrm{Hom}(M, \mathbb{G}_m)$ equal to the determinant of the $M$-action on $\mathfrak{n}_P$. Abusing the notation, we shall denote by the same symbol the corresponding function $\mathrm{Bun}_M(X) \to \mathbb{Z}$; it also makes sense to talk about the function $\rho_P : \mathrm{Bun}_M(X) \to \mathbb{Z}$.⁴ Given a function $f \in \mathbb{C}(\mathrm{Bun}_M(X))$ we define the Eisenstein series $\mathrm{Eis}_{GP}(f) \in \mathbb{C}(\mathrm{Bun}_G(X))$ by setting
\[ \mathrm{Eis}_{GP}(f) = \pi_! \eta^* (f q^{\rho_P}). \]
This makes sense when $f$ has finite support. When $f$ has infinite support, well-known convergence issues arise. Similarly, for a function $g \in \mathbb{C}(\mathrm{Bun}_G(X))$ we define the constant term $c_{GP}(g) \in \mathbb{C}(\mathrm{Bun}_M(X))$ by
\[ c_{GP}(g) = q^{\rho_P} \eta_! \pi^*(g). \]
The constant term is well defined for any $g$. We say that $g$ is cuspidal if $c_{GP}(g) = 0$ for all $P$.

⁴ $\rho_P$ takes values in $\mathbb{Z}$ and not in $\frac{1}{2}\mathbb{Z}$ since $G$ was assumed to be simply connected.

3.5. Constant term of Eisenstein series. Let us recall how to compute the composition of the Eisenstein series and constant term operators. For simplicity we are going to restrict ourselves to the following situation. Let $P$ and $M$ be as above and let $P_-$ be a parabolic subgroup opposite to $P$ (i.e. $P \cap P_- = M$). Let $f$ be a cuspidal function on $\mathrm{Bun}_M$. Let $M^\vee$ be the corresponding Langlands dual group and let $P^\vee$, $P_-^\vee$ be the corresponding parabolics in $G^\vee$; let also $\mathfrak{n}^\vee_P$, $\mathfrak{n}^\vee_{P_-}$ be the nilpotent radicals of their Lie algebras. The group $M^\vee$ acts on both of these spaces. For each $w \in N(M)/M = N(M^\vee)/M^\vee$ we denote by $P^w$, $(P^\vee)^w$, $\mathfrak{n}_P^w$ etc. the corresponding groups or Lie algebras obtained by conjugation by $w$. Similarly, for $f \in \mathbb{C}(\mathrm{Bun}_M(X))$ we denote by $f^w$ the corresponding function on $\mathrm{Bun}_M$ obtained by applying conjugation by $w$ to $f$. Then we have the following well-known result:

Theorem 3.1. Assume that $f \in \mathbb{C}(\mathrm{Bun}_M(X))$ is a cuspidal Hecke eigenfunction. Then
(1)
\[ c_{GP} \circ \mathrm{Eis}_{GP}(f) = \sum_{w \in N(M)/M} q^{(g-1) \dim \mathfrak{n}^\vee_P \cap (\mathfrak{n}^\vee_P)^w}\, f^w\, \frac{L(f, \mathfrak{n}^\vee_P \cap (\mathfrak{n}^\vee_P)^w, 0)}{L(f, \mathfrak{n}^\vee_P \cap (\mathfrak{n}^\vee_P)^w, 1)}; \]
(2)
\[ c_{GP_-} \circ \mathrm{Eis}_{GP}(f) = \sum_{w \in N(M)/M} q^{(g-1) \dim \mathfrak{n}^\vee_P \cap (\mathfrak{n}^\vee_{P_-})^w}\, f^w\, \frac{L(f, \mathfrak{n}^\vee_P \cap (\mathfrak{n}^\vee_{P_-})^w, 0)}{L(f, \mathfrak{n}^\vee_P \cap (\mathfrak{n}^\vee_{P_-})^w, 1)}. \]
Remark. A similar formula holds for the composition $c_{GQ} \circ \mathrm{Eis}_{GP}$, where $Q$ is any parabolic having $M$ as its Levi subgroup. However, only the above cases will be relevant for us in the sequel.

3.6. Tamagawa number. In this section we recall the calculation of the Tamagawa number in the finite-dimensional case. For simplicity we stick again to the function field case; we also assume that $G$ is simply connected. Let $X$, $G$, $\mathrm{Bun}_G(X)$ be as above. Let $\zeta(s)$ denote the $\zeta$-function of the curve $X$. In other words,
\[ \zeta(s) = \frac{\det\bigl(1 - q^{-s}\, \mathrm{Fr} : H^1(X, \mathbb{Q}_l) \to H^1(X, \mathbb{Q}_l)\bigr)}{(1 - q^{-s})(1 - q^{-s+1})}. \]
Let also $d_1, \dots, d_r$ be the degrees of the generators of the ring of invariant polynomials on $\mathfrak{g} = \mathrm{Lie}(G)$ (also called the exponents of $G$).

Theorem 3.2. We have
\[ \int_{\mathrm{Bun}_G(X)} 1 = q^{(g-1) \dim G} \prod_{i=1}^{r} \zeta(d_i). \tag{5} \]
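Formula (5) can be checked by hand in the smallest case. The sketch below assumes $X = \mathbb{P}^1$ (so the genus is $g = 0$) and $G = SL(2)$; these choices, and the direct count via Grothendieck's classification of bundles on $\mathbb{P}^1$, are added for illustration and are not part of the text.

```latex
% Assumptions: X = \mathbb{P}^1 over \mathbb{F}_q (genus g = 0), G = SL(2), so r = 1, d_1 = 2, \dim G = 3.
% Right-hand side of (5):
\[
\zeta(s) = \frac{1}{(1-q^{-s})(1-q^{1-s})}, \qquad
q^{(g-1)\dim G}\,\zeta(2) = \frac{q^{-3}}{(1-q^{-2})(1-q^{-1})} = \frac{1}{(q^{2}-1)(q-1)} .
\]
% Left-hand side of (5): every SL(2)-bundle on \mathbb{P}^1 is O(n) + O(-n) with n >= 0;
% its automorphism group is SL(2, F_q) for n = 0 (order q(q^2-1)), and for n >= 1 it consists of
% upper triangular matrices with diagonal (a, a^{-1}), a in F_q^*, and off-diagonal entry in H^0(O(2n)).
\[
\int_{\mathrm{Bun}_G(X)} 1
 = \frac{1}{q(q^{2}-1)} + \sum_{n \ge 1} \frac{1}{(q-1)\,q^{2n+1}}
 = \frac{1}{q(q^{2}-1)}\Bigl(1 + \frac{1}{q-1}\Bigr)
 = \frac{1}{(q^{2}-1)(q-1)} .
\]
```

Both sides agree, and the computation illustrates why the infinite sum over $|\mathrm{Bun}_G(X)|$ converges: the automorphism groups of the unstable bundles grow geometrically.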
The above formula was proved by Langlands in [29].⁵ It can also be derived from a computation of $H^*(\mathrm{Bun}_G, \mathbb{Q}_l)$ (due to a result of Behrend [2]), which was done in [23] (following a computation of Atiyah–Bott [1] over $\mathbb{C}$). Let us formulate this result, since this interpretation of Theorem 3.2 is instructive from the point of view of generalization to the affine case. It is well known that $H^*(\mathrm{pt}/G, \mathbb{Q}_l) = \mathrm{Sym}^*(V)$ for a certain graded vector space $V$ endowed with an action of Frobenius. Explicitly, $V$ has a basis $c_1, \dots, c_r$, where each $c_i$ has cohomological degree $2d_i$ and Frobenius eigenvalue $q^{d_i}$.

Theorem 3.3. $H^*(\mathrm{Bun}_G, \mathbb{Q}_l) = \mathrm{Sym}^*(V \otimes H^*(X, \mathbb{Q}_l)[2])$; here $\mathrm{Sym}^*$ is understood in the “super” sense.

⁵ Langlands considered the case of the global field $\mathbb{Q}$, but the argument in [29] applies to any global field.
3.7. Sketch of the proof of Theorem 3.2. Let us recall the basic steps of Langlands' proof of Theorem 3.2, since this proof will actually be important for some definitions in the affine case. For any $\gamma \in \Lambda$ let us denote by $p_\gamma$ the restriction of the natural map $p : \mathrm{Bun}_B \to \mathrm{Bun}_G$ to $\mathrm{Bun}_B^\gamma$. Let us denote by $\mathfrak{t}^*$ the dual space to the Lie algebra of $T$ (over $\mathbb{C}$). Then for any $s \in \mathfrak{t}^*$ let us set
\[ E_\gamma = (p_\gamma)_!\bigl(1_{\mathrm{Bun}_B^\gamma}\bigr); \qquad E(s) = \mathrm{Eis}_{GB}\Bigl( \sum_{\gamma} 1_{\mathrm{Bun}_T^\gamma}\, q^{\langle \gamma, s \rangle} \Bigr) = \sum_{\gamma \in \Lambda} q^{\langle \gamma, s + \rho \rangle} E_\gamma. \]
The following result is well known:

Lemma 3.4.
(1) The above series is absolutely convergent for $\mathrm{Re}(s) > \rho^\vee$ and it extends to a meromorphic function on the whole of $\mathfrak{t}^*$.
(2) The function $E(s)$ has a simple pole along every hyperplane of the form $\langle s - \rho, \alpha^\vee \rangle = 0$, where $\alpha$ is a simple coroot, and no other poles near $s = \rho$.
(3) The residue of $E(s)$ at $s = \rho$, defined as the limit $\lim_{s \to \rho} E(s) \prod_{i \in I} \langle s - \rho, \alpha_i^\vee \rangle$, is a constant function on $\mathrm{Bun}_G$.

Let us call the above residue $r(G)$. The idea of the proof of Theorem 3.2 is this: on the one hand it is easy to compute $\int_{\mathrm{Bun}_G} r(G) \cdot 1_{\mathrm{Bun}_G}$. On the other hand, one can compute $r(G)$ itself by computing the constant term of $r(G) \cdot 1_{\mathrm{Bun}_G}$. More precisely, we are going to deduce Theorem 3.2 from the following

Theorem 3.5.
(1)
\[ \int_{\mathrm{Bun}_G} r(G) \cdot 1_{\mathrm{Bun}_G} = q^{(g-1) \dim N} (\ln q)^{-r}\, \frac{(\# \mathrm{Pic}^0(X))^r}{(q - 1)^r}. \]
(2)
\[ c_{GB}\bigl(r(G) \cdot 1_{\mathrm{Bun}_G}\bigr)\big|_{\mathrm{Bun}_T^\gamma} = q^{-\langle \gamma, \rho \rangle}\, \frac{(\mathrm{Res}_{s=1} \zeta(s))^r}{\prod_{i=1}^{r} \zeta(d_i)}. \]

Theorem 3.5 implies Theorem 3.2 since
\[ \mathrm{Res}_{s=1} \zeta(s) = \frac{\# \mathrm{Pic}^0(X)}{q^{g-1} (q - 1) \ln q}. \]
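The residue formula can be verified directly for the projective line. This is a sketch under the assumption $X = \mathbb{P}^1$, chosen only because its zeta function has no numerator.

```latex
% Assumption: X = \mathbb{P}^1 over \mathbb{F}_q, so g = 0 and \#\mathrm{Pic}^0(X) = 1.
\[
\zeta(s) = \frac{1}{(1-q^{-s})(1-q^{1-s})}, \qquad
1 - q^{1-s} = (s-1)\ln q + O\bigl((s-1)^{2}\bigr) \quad (s \to 1),
\]
\[
\operatorname{Res}_{s=1}\zeta(s) = \frac{1}{(1-q^{-1})\ln q} = \frac{q}{(q-1)\ln q}
 = \left.\frac{\#\mathrm{Pic}^0(X)}{q^{g-1}(q-1)\ln q}\right|_{g=0,\ \#\mathrm{Pic}^0(X)=1}.
\]
```

The factor $\ln q$ comes from differentiating $q^{1-s}$ in $s$; it is the same $\ln q$ that appears to the power $-r$ in part (1) of Theorem 3.5.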
4. Spherical and Iwahori Hecke algebras of affine Kac–Moody groups over a local non-archimedean field

In this Section we discuss analogs of the spherical and Iwahori Hecke algebras for the group $G_{\mathrm{aff}}$ over a local non-archimedean field. The results are taken from [9], [7], [8] and [11].
4.1. The semigroup $G^+_{\mathrm{aff}}(\mathcal{K})$ and the affine spherical Hecke algebra. One may consider the group $G_{\mathrm{aff}}(\mathcal{K})$ and its subgroup $G_{\mathrm{aff}}(\mathcal{O})$. The group $G_{\mathrm{aff}}$ by definition maps to $\mathbb{G}_m$; thus $G_{\mathrm{aff}}(\mathcal{K})$ maps to $\mathcal{K}^*$. We denote by $\varrho$ the composition of this map with the valuation map $\mathcal{K}^* \to \mathbb{Z}$. We now define the semigroup $G^+_{\mathrm{aff}}(\mathcal{K})$ to be the subsemigroup of $G_{\mathrm{aff}}(\mathcal{K})$ generated by:
• the central $\mathcal{K}^* \subset G_{\mathrm{aff}}(\mathcal{K})$;
• the subgroup $G_{\mathrm{aff}}(\mathcal{O})$;
• all elements $g \in G_{\mathrm{aff}}(\mathcal{K})$ such that $\varrho(g) > 0$.

We show in [9] that there exists an associative algebra structure on a completion $H_{sph}(G_{\mathrm{aff}}, \mathcal{K})$ of the space of finite linear combinations of double cosets of $G^+_{\mathrm{aff}}(\mathcal{K})$ with respect to $G_{\mathrm{aff}}(\mathcal{O})$. We would like to emphasize that this statement is by no means trivial: in [9] it is proved using some cumbersome algebro-geometric machinery. A more elementary proof of this fact is going to appear in [11], but that proof is still rather long. We call the above algebra the spherical Hecke algebra of $G_{\mathrm{aff}}$. The algebra $H_{sph}(G_{\mathrm{aff}}, \mathcal{K})$ is graded by non-negative integers (the grading comes from the map $\varrho$, which is well defined on double cosets with respect to $G_{\mathrm{aff}}(\mathcal{O})$); it is also an algebra over the field $\mathbb{C}((v))$ of Laurent power series in a variable $v$, which comes from the central $\mathcal{K}^*$ in $G_{\mathrm{aff}}(\mathcal{K})$.

4.2. The affine Satake isomorphism. The statement of the Satake isomorphism for $G_{\mathrm{aff}}$ is very similar to that for $G$. First of all, in [9] we define an analog of the algebra $\mathbb{C}(T^\vee)^W$, which we shall denote by $\mathbb{C}(\widehat{T}^\vee_{\mathrm{aff}})^{W_{\mathrm{aff}}}$ (here $T^\vee_{\mathrm{aff}} = \mathbb{C}^* \times T^\vee \times \mathbb{C}^*$ is the dual of the maximal torus of $G_{\mathrm{aff}}$, $W_{\mathrm{aff}}$ is the corresponding affine Weyl group and $\mathbb{C}(\widehat{T}^\vee_{\mathrm{aff}})$ denotes a certain completion of the algebra of regular functions on $T^\vee_{\mathrm{aff}}$). This is a finitely generated $\mathbb{Z}_{\ge 0}$-graded commutative algebra over the field $\mathbb{C}((v))$ of Laurent formal power series in the variable $v$, which should be thought of as a coordinate on the third factor in $T^\vee_{\mathrm{aff}} = \mathbb{C}^* \times T^\vee \times \mathbb{C}^*$ (the grading has to do with the first factor); moreover, each component of the grading is finite-dimensional over $\mathbb{C}((v))$.

To simplify notation we will always assume that $G$ is simple and simply connected (although the case when $G$ is a torus is also very instructive and it is much less trivial than in the usual case, cf. Section 3 of [9]). In this case we define (in [9]) the Langlands dual group $G^\vee_{\mathrm{aff}}$, which is a group ind-scheme over $\mathbb{C}$. Then $G^\vee_{\mathrm{aff}}$ is another Kac–Moody group whose Lie algebra $\mathfrak{g}^\vee_{\mathrm{aff}}$ is an affine Kac–Moody algebra with root system dual to that of $\mathfrak{g}_{\mathrm{aff}}$ (thus, in particular, it might be a twisted affine Lie algebra when $\mathfrak{g}$ is not simply laced). The group $G^\vee_{\mathrm{aff}}$ contains the torus $T^\vee_{\mathrm{aff}}$; moreover, the first $\mathbb{C}^*$-factor in $T^\vee_{\mathrm{aff}}$ is central in $G^\vee_{\mathrm{aff}}$; also the projection $T^\vee_{\mathrm{aff}} \to \mathbb{C}^*$ to the last factor extends to a homomorphism $G^\vee_{\mathrm{aff}} \to \mathbb{C}^*$. One defines a category $\mathrm{Rep}(G^\vee_{\mathrm{aff}})$ of $G^\vee_{\mathrm{aff}}$-modules which properly contains all highest weight integrable representations of finite length, contains also certain infinite direct sums of irreducible highest weight integrable representations, and is stable under tensor product. The complexified Grothendieck ring $K_0(G^\vee_{\mathrm{aff}})$ of this category is naturally isomorphic to the algebra $\mathbb{C}(\widehat{T}^\vee_{\mathrm{aff}})^{W_{\mathrm{aff}}}$ via the character map. The corresponding grading on $K_0(G^\vee_{\mathrm{aff}})$ comes from the central charge of $G^\vee_{\mathrm{aff}}$-modules, and the action of the variable $v$ comes from tensoring $G^\vee_{\mathrm{aff}}$-modules by the one-dimensional representation coming from the homomorphism $G^\vee_{\mathrm{aff}} \to \mathbb{C}^*$ mentioned above. The affine Satake isomorphism (proved in [9] and reproved in a different way in [11]) claims the following:

Theorem 4.1. The algebra $H_{sph}(G_{\mathrm{aff}}, \mathcal{K})$ is canonically isomorphic to $\mathbb{C}(\widehat{T}^\vee_{\mathrm{aff}})^{W_{\mathrm{aff}}}$ (and thus also to $K_0(G^\vee_{\mathrm{aff}})$).
As was mentioned, in the case when $G$ is semi-simple and simply connected, the group $G_{\mathrm{aff}}$ is an affine Kac–Moody group. We expect that with slight modifications our Satake isomorphism should make sense for any symmetrizable Kac–Moody group. However, our proofs are really designed for the affine case and do not seem to generalize to more general Kac–Moody groups. The corresponding Hecke algebra has recently been defined in [22].

4.3. The Iwahori–Hecke algebra of $G_{\mathrm{aff}}$. Let now $I_{\mathrm{aff}} \subset G_{\mathrm{aff}}(\mathcal{O})$ be the Iwahori subgroup and let $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$ denote the space of $I_{\mathrm{aff}}$-bi-invariant functions on $G^+_{\mathrm{aff}}(\mathcal{K})$ supported on a union of finitely many double cosets. It is shown in [11] that the usual convolution is well defined on $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$. Note that this is different from the spherical case, where the convolution was only defined on a completion of the space of functions on double cosets with finite support.

The structure of the algebra $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$ is similar to the finite-dimensional case. For simplicity let us assume that $G$ is semi-simple and simply connected. Let $\Lambda_{\mathrm{aff}} = \mathbb{Z} \oplus \Lambda \oplus \mathbb{Z}$ be the lattice of cocharacters of $T_{\mathrm{aff}}$. Let $\Lambda^+_{\mathrm{aff}} \subset \Lambda_{\mathrm{aff}}$ be the Tits cone, consisting of all elements $(a, \lambda, k) \in \Lambda_{\mathrm{aff}}$ such that either $k > 0$, or $k = 0$ and $\lambda = 0$. Let $H_{\mathrm{aff}}$ denote the algebra generated by elements $X_\lambda$, $\lambda \in \Lambda_{\mathrm{aff}}$, and $T_w$, $w \in W_{\mathrm{aff}}$, with relations 1), 2), 3) as in Subsection 2.3. In this description we consider $H_{\mathrm{aff}}$ as an algebra over $\mathbb{C}[q, q^{-1}]$, where $q$ is a formal variable. The algebra $H_{\mathrm{aff}}$ is $\mathbb{Z}$-graded; this grading is defined by setting
\[ \deg T_w = 0; \qquad \deg X_{(a, \lambda, k)} = k. \]
We denote by $H_{\mathrm{aff},k}$ the space of all elements of degree $k$ in $H_{\mathrm{aff}}$. Note that $H_{\mathrm{aff},0}$ is a subalgebra of $H_{\mathrm{aff}}$, which is isomorphic to Cherednik's double affine Hecke algebra. On the other hand, let
\[ H^+_{\mathrm{aff}} = \mathbb{C}\langle T_w \rangle_{w \in W_{\mathrm{aff}}} \oplus \Bigl( \bigoplus_{k > 0} H_{\mathrm{aff},k} \Bigr). \]
It is easy to see that $H^+_{\mathrm{aff}}$ is just generated by the elements $X_\lambda$, $\lambda \in \Lambda^+_{\mathrm{aff}}$, and $T_w$, $w \in W_{\mathrm{aff}}$. The following result is proved in [11]:
Theorem 4.2. The algebra $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$ is isomorphic to the specialization of $H^+_{\mathrm{aff}}$ to $q$ being the number of elements in the residue field of $\mathcal{K}$.
In particular, the algebra $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$ is closely related to Cherednik's double affine Hecke algebra. We would like to note that another relation between the double affine Hecke algebra and the group $G_{\mathrm{aff}}(\mathcal{K})$ was studied by Kapranov (cf. [25]). By definition, the algebra $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$ is endowed with a natural basis corresponding to characteristic functions of double cosets of $I_{\mathrm{aff}}$ in $G^+_{\mathrm{aff}}(\mathcal{K})$. It is natural to conjecture that this basis in fact comes from a $\mathbb{C}[q, q^{-1}]$-basis of $H^+_{\mathrm{aff}}$, but we don't know how to prove this. It would be interesting to give an algebraic description of this basis.

4.4. Explicit description of the affine Satake isomorphism. We would like now to describe the affine analog of Theorem 2.1 (the proof is going to appear in [11] and it uses the algebra $H(G^+_{\mathrm{aff}}, I_{\mathrm{aff}})$ in an essential way). For simplicity we are going to assume again that $G$ is semi-simple and simply connected. The (topological) basis $\{h_\lambda\}$ of the algebra $H_{sph}(G_{\mathrm{aff}}, \mathcal{K})$ is defined exactly as in the case of $G$, and one might expect that
\[ S_{\mathrm{aff}}(h_\lambda) = \frac{q^{\langle \lambda, \rho^\vee_{\mathrm{aff}} \rangle}}{W_{\mathrm{aff},\lambda}(q^{-1})} \sum_{w \in W_{\mathrm{aff}}} w\left( e^{\lambda} \prod_{\alpha \in R_{+,\mathrm{aff}}} \left( \frac{1 - q^{-1} e^{-\alpha}}{1 - e^{-\alpha}} \right)^{m_\alpha} \right). \]
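Here $R_{+,\mathrm{aff}}$ denotes the set of positive coroots of $G_{\mathrm{aff}}$, $m_\alpha$ is the multiplicity of the coroot $\alpha$, and $\rho^\vee_{\mathrm{aff}}$ is the corresponding affine analog of $\rho^\vee$. However, it turns out that the above formula is wrong! Let us explain how to see this (and then we are going to present the correct statement).

Let $S_{\mathrm{aff}} : H_{sph}(G_{\mathrm{aff}}, \mathcal{K}) \to \mathbb{C}(\widehat{T}^\vee_{\mathrm{aff}})^{W_{\mathrm{aff}}}$ denote the affine Satake isomorphism. We would like to compute $S_{\mathrm{aff}}(h_\lambda)$ for every $\lambda \in \Lambda^+$. Let us consider the case $\lambda = 0$. In this case the element $h_0$ is the unit element of the algebra $H_{sph}(G_{\mathrm{aff}}, \mathcal{K})$ and thus we must have $S_{\mathrm{aff}}(h_0) = 1$. On the other hand, the affine analog of the right-hand side of (1) in Theorem 2.1 is in this case equal to
\[ \frac{1}{W_{\mathrm{aff}}(q^{-1})} \sum_{w \in W_{\mathrm{aff}}} w\left( \prod_{\alpha \in R_{+,\mathrm{aff}}} \left( \frac{1 - q^{-1} e^{-\alpha}}{1 - e^{-\alpha}} \right)^{m_\alpha} \right). \]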
Let us denote this infinite sum by $\Delta$. This function was studied by Macdonald in [32]. It is easy to see that in the finite case this sum is equal to 1, but in the affine case it is different from 1, and Macdonald gave the following explicit product formula for $\Delta$, which is going to be important when we discuss affine Eisenstein series. For simplicity let us assume that $G$ is simply laced and that its Lie algebra is simple. Let also $\delta$ denote the minimal positive imaginary coroot of $G_{\mathrm{aff}}$ and let $d_1, \dots, d_r$ be the exponents of $G$. Then Macdonald proved that
\[ \Delta = \prod_{i=1}^{r} \prod_{j=1}^{\infty} \frac{1 - q^{-d_i} e^{-j\delta}}{1 - q^{-d_i+1} e^{-j\delta}}. \tag{6} \]
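Both statements about $\Delta$ can be illustrated for $G = SL(2)$. The following is a sketch under stated assumptions: the group $SL(2)$, the abbreviation $x = e^{-\alpha}$, and the value $d_1 = 2$ (the degree of the basic invariant polynomial, in the sense of Subsection 3.6) are choices made for the example.

```latex
% Finite case, G = SL(2): W = \{1, s\}, W(q^{-1}) = 1 + q^{-1}, x := e^{-\alpha}, s(x) = x^{-1}.
\[
\frac{1}{1+q^{-1}}\left( \frac{1-q^{-1}x}{1-x} + \frac{1-q^{-1}x^{-1}}{1-x^{-1}} \right)
 = \frac{1}{1+q^{-1}}\cdot\frac{(1-q^{-1}x) + (q^{-1}-x)}{1-x}
 = \frac{1}{1+q^{-1}}\cdot\frac{(1+q^{-1})(1-x)}{1-x}
 = 1 .
\]
% Affine case, G = SL(2): r = 1 and d_1 = 2 in formula (6).
\[
\Delta = \prod_{j=1}^{\infty} \frac{1-q^{-2}e^{-j\delta}}{1-q^{-1}e^{-j\delta}} .
\]
```

The first computation is the case $\lambda = 0$ of Theorem 2.1; the second shows that in the affine case $\Delta$ is a genuinely non-trivial power series in $e^{-\delta}$.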
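Let us now go back to the description of $S_{\mathrm{aff}}(h_\lambda)$. The following theorem is proved in [11]:

Theorem 4.3. For every $\lambda \in \Lambda^+$ we have
\[ S_{\mathrm{aff}}(h_\lambda) = \frac{1}{\Delta} \cdot \frac{q^{\langle \lambda, \rho^\vee_{\mathrm{aff}} \rangle}}{W_{\mathrm{aff},\lambda}(q^{-1})} \sum_{w \in W_{\mathrm{aff}}} w\left( e^{\lambda} \prod_{\alpha \in R_{+,\mathrm{aff}}} \left( \frac{1 - q^{-1} e^{-\alpha}}{1 - e^{-\alpha}} \right)^{m_\alpha} \right). \tag{7} \]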
The appearance of $\Delta$ in the above formula is very curious. Some geometric explanation for it was given in [7] (it is mentioned in a little more detail in the next Subsection).

4.5. Affine Gindikin–Karpelevich formula. Let us recall the notations of Subsection 2.5; it is clear that, at least set-theoretically, it makes sense to consider $\mathrm{Gr}_{\widehat{G}}$ for any completed symmetrizable Kac–Moody group $\widehat{G}$. The notations $\Lambda$, $L_+$, $R_+$, $\mathrm{Gr}_{\widehat{G}}$, $S^\mu$, $T^\gamma$ make sense for $\widehat{G}$ without any changes (at least if we think about $S^\mu$ and $T^\gamma$ as sets and not as geometric objects).

Conjecture 4.4. For any $\gamma \in L_+$ the intersection $T^{-\gamma} \cap S^0$ is finite.

This conjecture is proved in [7] when $\mathcal{K} = \mathbb{F}_q((t))$ and in [8] for any non-archimedean local field $\mathcal{K}$ when $\widehat{G} = \widehat{G}_{\mathrm{aff}}$.⁶ In the cases when Conjecture 4.4 is known, it makes sense to ask whether one can compute the generating function
\[ I_{\widehat{G}}(q) = \sum_{\gamma \in L_+} \#(T^{-\gamma} \cap S^0)\, q^{-|\gamma|} e^{-\gamma}. \]
We do not know the answer in general; however, in [7] and [8] it is proved that when $\widehat{G} = \widehat{G}_{\mathrm{aff}}$ we have
\[ I_{\widehat{G}}(q) = \frac{1}{\Delta} \prod_{\alpha \in R_{+,\mathrm{aff}}} \left( \frac{1 - q^{-1} e^{-\alpha}}{1 - e^{-\alpha}} \right)^{m_\alpha}. \tag{8} \]
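To see what (8) looks like in the smallest case, one can substitute the positive coroots of affine $SL(2)$ and the product formula (6). This is a sketch: the choice $G = SL(2)$, the listing of the positive coroots of $G_{\mathrm{aff}}$ as $\alpha + n\delta$ ($n \ge 0$), $-\alpha + n\delta$ ($n \ge 1$) and $n\delta$ ($n \ge 1$), all of multiplicity one, and the value $d_1 = 2$ in (6) are assumptions made for the example.

```latex
% Assumptions: G = SL(2); positive coroots of G_aff: \alpha + n\delta (n \ge 0), -\alpha + n\delta (n \ge 1),
% n\delta (n \ge 1), each with m_\alpha = 1; \Delta as in (6) with r = 1, d_1 = 2.
\[
I_{\widehat{G}}(q)
 = \underbrace{\prod_{n \ge 1} \frac{1-q^{-1}e^{-n\delta}}{1-q^{-2}e^{-n\delta}}}_{1/\Delta}
 \;\cdot\; \prod_{n \ge 0} \frac{1-q^{-1}e^{-\alpha-n\delta}}{1-e^{-\alpha-n\delta}}
 \;\cdot\; \prod_{n \ge 1} \frac{1-q^{-1}e^{\alpha-n\delta}}{1-e^{\alpha-n\delta}}
 \;\cdot\; \prod_{n \ge 1} \frac{1-q^{-1}e^{-n\delta}}{1-e^{-n\delta}} .
\]
```

Setting $e^{-\delta} = 0$ kills every factor except the one with $n = 0$, and one recovers the finite-dimensional formula of Theorem 2.3 for $SL(2)$.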
The formula (8) is obtained in [8] as a limit of (7). Earlier, a different proof was given in [7] in the case when $\mathcal{K}$ is a local function field, using the geometry of Uhlenbeck spaces studied in [6]. We remark that the latter proof also gave a geometric explanation for the appearance of the correction term $\frac{1}{\Delta}$ in this formula. We don't have enough room for details, but let us just note that, very roughly speaking, it is related to the fact that affine Kac–Moody groups over a local function field can be studied using various moduli spaces of bundles on algebraic surfaces. In particular, by combining the results of [7], [8] and [11] one gets a new proof of the identity (6) (which is independent of Cherednik's results).

⁶ The general case is probably provable by using the techniques of [22].
4.6. Towards geometric Satake isomorphism for $G_{\mathrm{aff}}$. Some parts of the geometric Satake isomorphism have been generalized to $G_{\mathrm{aff}}$ in the papers [3]–[5]. The idea of the approach of loc. cit. belongs in fact to I. Frenkel, who suggested that integrable representations of $G^\vee_{\mathrm{aff}}$ of level $k$ should be realized geometrically in terms of some moduli spaces related to $G$-bundles on $\mathbb{A}^2/\Gamma_k$, where $\Gamma_k$ is the group of roots of unity of order $k$ acting on $\mathbb{A}^2$ by $\zeta(x, y) = (\zeta x, \zeta^{-1} y)$. [3] constitutes an attempt to make this idea more precise.

Let $\mathrm{Bun}_G(\mathbb{A}^2)$ denote the moduli space of principal $G$-bundles on $\mathbb{P}^2$ trivialized at the “infinite” line $\mathbb{P}^1_\infty \subset \mathbb{P}^2$. This is an algebraic variety which has connected components parametrized by non-negative integers, corresponding to different values of the second Chern class of the corresponding bundles. Similarly, one can define $\mathrm{Bun}_G(\mathbb{A}^2/\Gamma_k)$. Very vaguely, the main idea of [3] can be formulated in the following way:

The basic principle:
1) The integrable representations of $G^\vee_{\mathrm{aff}}$ of level $k$ have to do with the geometry (e.g. intersection cohomology) of some varieties closely related to $\mathrm{Bun}_G(\mathbb{A}^2/\Gamma_k)$.
2) This relation should be thought of as similar to the relation between finite-dimensional representations of $G^\vee$ and the geometry of the affine Grassmannian $\mathrm{Gr}_G$.

We believe that 1) above has many different aspects. [3] concentrates on just one such aspect; namely, it is explained in [3] how one can construct an analog of the varieties $\overline{W}^\lambda_\mu$ in the affine case (using the variety $\mathrm{Bun}_G(\mathbb{A}^2/\Gamma_k)$ as well as the corresponding Uhlenbeck compactification of the moduli space of $G$-bundles, cf. [6]). It is conjectured that the stalks of IC-sheaves of these varieties are governed by the affine version of $\mathrm{gr}^F L(\lambda)_\mu$.⁷ This conjecture is still open, but in [3] it is proved in several special cases. More precisely:
1) It is proved in loc. cit. that all of the above conjectures hold in the limit $k \to \infty$ (cf. [3] for the exact formulation).
2) In [3] a slightly weaker version of the above conjecture is proved in the case $k = 1$; the proof is based on the results of [6]. A recent paper [37] proves it in full generality.
3) Again, a slightly weaker version of the above conjecture is proved in [3] for $G = SL(N)$. Let us mention the main ingredient of that proof. Let $\mathfrak{g}$ be a simply laced simple finite-dimensional Lie algebra. Then by the McKay correspondence one can associate with $\mathfrak{g}$ a finite subgroup $\Gamma$ of $SL(2, \mathbb{C})$. Recall that H. Nakajima (cf. e.g. [34]) gave a geometric construction of integrable $\mathfrak{g}_{\mathrm{aff}}$-modules of level $N$ using certain moduli spaces which, roughly speaking, have to do with vector bundles of rank $N$ on $\mathbb{A}^2/\Gamma$. In particular, if $\mathfrak{g} = \mathfrak{sl}(k)$ it follows that
   1) By H. Nakajima, the geometry of vector bundles of rank $N$ on $\mathbb{A}^2/\Gamma_k$ is related to integrable modules over $\mathfrak{sl}(k)_{\mathrm{aff}}$ of level $N$.
   2) By I. Frenkel's suggestion, the geometry of vector bundles of rank $N$ on $\mathbb{A}^2/\Gamma_k$ is related to integrable modules over $\mathfrak{sl}(N)_{\mathrm{aff}}$ of level $k$.

⁷ In fact, the definition of the filtration $F$ given in [3] is slightly wrong; the correct definition is given in [37].
On the other hand, in the representation theory of affine Lie algebras there is a well-known relation, due to I. Frenkel, between integrable modules over $\mathfrak{sl}(k)_{\mathrm{aff}}$ of level $N$ and integrable modules over $\mathfrak{sl}(N)_{\mathrm{aff}}$ of level $k$. This connection is called level-rank duality; one of its aspects is discussed in [13]. It turns out that combining the results of [13] with the results of [34] one can get a proof of a slightly weaker version of our main conjecture.

It is of course reasonable to ask why $G$-bundles on $\mathbb{A}^2/\Gamma_k$ have anything to do with the sought-for affine Grassmannian of $G_{\mathrm{aff}}$. We don't have a satisfactory answer to this question, though some sort of explanation (which would be too long to reproduce here) is provided in [9]. Also, E. Witten produced an explanation of this phenomenon in terms of 6-dimensional conformal field theory (cf. [40]).

In [4] and [5] other aspects of 1) are explored; in particular, an affine analog of convolution of $G(\mathcal{O})$-equivariant perverse sheaves on $\mathrm{Gr}_G$ and the analog of the so-called Beilinson–Drinfeld Grassmannian are discussed in [4], and the analog of the Mirković–Vilonen fiber functor is discussed in [5]. Most of the statements of loc. cit. are still conjectural, but for the case of $G = SL(N)$ almost all of them follow easily from the work of Nakajima [35].
5. Eisenstein series for affine Kac–Moody groups

In this Section we discuss unramified Eisenstein series for $G_{\mathrm{aff}}$ over a function field; we are going to treat the subject in a geometric way. Most of the results described below are adaptations of the corresponding results of H. Garland (who considered the case of the global field $\mathbb Q$); other results are going to appear in [10].

5.1. The double quotient. As was explained in Section 3.1, unramified automorphic forms in the case of finite-dimensional $G$ are functions on the double quotient $G(\mathcal O_{\mathbb A_F})\backslash G(\mathbb A_F)/G(F)$. We would like to introduce an analogous quotient for the group $\widehat G_{\mathrm{aff}}$ and then give some interesting examples of functions on this quotient (given by an affine analog of Eisenstein series).⁸

Since $\widehat G_{\mathrm{aff}}$ is a group ind-scheme over $\mathbb Z$, it makes sense to consider $\widehat G_{\mathrm{aff}}(\mathbb A_F)$. However, it turns out that this is not the right thing to consider. Instead, let us set
$$\widehat G_{\mathrm{aff},\mathbb A_F} = \{(g_v \in \widehat G_{\mathrm{aff}}(K_v)) \mid g_v \in \widehat G_{\mathrm{aff}}(\mathcal O_v) \text{ for almost all } v\}.$$
Then the double quotient on which we are going to produce some interesting functions is the quotient $G_{\mathrm{aff}}(\mathcal O_{\mathbb A_F})\backslash G_{\mathrm{aff},\mathbb A_F}/G_{\mathrm{aff}}(F)$. To motivate the consideration of this double quotient let us provide its geometric interpretation.

⁸ Most of the discussion will go through for the group $\widehat G_{\mathrm{aff}}$ instead of $G_{\mathrm{aff}}$, but for certain purposes which go beyond the scope of this survey paper it seems more appropriate to work with $\widehat G_{\mathrm{aff}}$ here.
Alexander Braverman and David Kazhdan
5.2. The groupoid $\mathrm{Bun}_G(\widehat S)$. Let $S$ be a smooth algebraic surface over a field $k$ and let $X \subset S$ be a smooth projective curve in $S$ corresponding to a sheaf of ideals $\mathcal J_X$. We let $S_n \subset S$ denote the closed subscheme of $S$ corresponding to the sheaf of ideals $\mathcal J_X^{n+1}$. For each $n \ge m$ we have the embedding $i_{mn} : S_m \to S_n$. Let us denote by $\widehat S$ the formal completion of $S$ along $X$ (it can be considered as either a formal scheme or an ind-scheme). For an algebraic group $G$, we would like to consider the groupoid $\mathrm{Bun}_G(\widehat S)$ of $G$-bundles on $\widehat S$. By definition an object $\mathcal F$ of $\mathrm{Bun}_G(\widehat S)$ consists of $G$-bundles $\mathcal F_n$ on each $S_n$ together with a compatible system of isomorphisms $\mathcal F_n|_{S_m} \simeq \mathcal F_m$ (and the notion of isomorphism of such objects is clear).

5.3. $\mathrm{Bun}_G(\widehat S^0)$. We start our definition of $G$-bundles on $\widehat S^0 = \widehat S \setminus X$ with the case $G = GL(n)$. In other words, we are going to define the category $\mathrm{Vect}(\widehat S^0)$ of vector bundles on $\widehat S^0$. Let us denote by $\mathrm{Vect}(\widehat S)$ the category of locally free coherent sheaves on $\widehat S$. Then:

• Objects of $\mathrm{Vect}(\widehat S^0)$ are objects of $\mathrm{Vect}(\widehat S)$.

• Given two objects $\mathcal E_1, \mathcal E_2$ of $\mathrm{Vect}(\widehat S)$ we define
$$\mathrm{Hom}_{\mathrm{Vect}(\widehat S^0)}(\mathcal E_1, \mathcal E_2) = \bigcup_{n \ge 0} \mathrm{Hom}_{\mathrm{Vect}(\widehat S)}(\mathcal E_1, \mathcal E_2(nX)).$$
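Let us now assume that $G$ is simply connected. Since $\mathrm{Vect}(\widehat S^0)$ is a tensor category we can define a groupoid $\mathrm{Bun}'_G(\widehat S^0)$ of exact tensor functors from $\mathrm{Rep}(G)$ to $\mathrm{Vect}(\widehat S^0)$. It is clear that we have a functor $r : \mathrm{Bun}_G(S) \to \mathrm{Bun}'_G(\widehat S^0)$ and we define $\mathrm{Bun}_G(\widehat S^0)$ to be the full subcategory of $\mathrm{Bun}'_G(\widehat S^0)$ consisting of objects which lie in the image of $r$. We conjecture that $\mathrm{Bun}_G(\widehat S^0) = \mathrm{Bun}'_G(\widehat S^0)$. Since we assume that $G$ is simply connected, by Theorem 11.5 of [38] this conjecture is true if we replace $\widehat S$ by $S$.

5.4. Bundles with respect to $\widehat G'_{\mathrm{aff}}$. We would like to define the groupoid of $\widehat G_{\mathrm{aff}}$-bundles on $X$; let us first do it for the group $\widehat G'_{\mathrm{aff}}$. Since we are given a homomorphism $\widehat G'_{\mathrm{aff}} \to \mathbb G_m$, if such a notion makes sense, then we should have a functor $\iota : \mathrm{Bun}_{\widehat G'_{\mathrm{aff}}} \to \mathrm{Pic}(X)$. So, in order to describe $\mathrm{Bun}_{\widehat G'_{\mathrm{aff}}}$, it is enough to describe the groupoid $\iota^{-1}(\mathcal L)$ for each $\mathcal L \in \mathrm{Pic}(X)$ in such a way that the assignment $\mathcal L \mapsto \iota^{-1}(\mathcal L)$ is functorial with respect to isomorphisms in $\mathrm{Pic}(X)$. We set $\iota^{-1}(\mathcal L) = \mathrm{Bun}_G(\widehat S^0_{\mathcal L})$ where $S_{\mathcal L}$ is the total space of $\mathcal L$.

Proposition 5.1. We have an equivalence of groupoids
$$\mathrm{Bun}_{\widehat G'_{\mathrm{aff}}} \simeq \widehat G'_{\mathrm{aff}}(\mathcal O(\mathbb A_F))\backslash \widehat G'_{\mathrm{aff},\mathbb A_F}/\widehat G'_{\mathrm{aff}}(F).$$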
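Let us now turn on the central extension. Let $X \subset S$ be as in the beginning of this Section (in particular, we do not assume anything about the self-intersection of $X$). The central extension
$$1 \to \mathbb G_m \to \widehat G_{\mathrm{aff}} \to \widehat G'_{\mathrm{aff}} \to 1$$
gives rise to a $\mathrm{Pic}(X)$-torsor $\widetilde{\mathrm{Bun}}_G(\widehat S^0)$ over $\mathrm{Bun}_G(\widehat S^0)$ for any $X \subset S$ as above. Denote as before by $\mathrm{Bun}_{\widehat G_{\mathrm{aff}}}$ the union of $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ where $\mathcal L$ runs over all elements of $\mathrm{Pic}(X)$. Then the analog of Proposition 5.1 reads as follows:
$$\mathrm{Bun}_{\widehat G_{\mathrm{aff}}} \simeq \widehat G_{\mathrm{aff}}(\mathcal O(\mathbb A_F))\backslash \widehat G_{\mathrm{aff},\mathbb A_F}/\widehat G_{\mathrm{aff}}(F). \tag{9}$$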
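5.5. Eisenstein series. We now fix $\mathcal L \in \mathrm{Pic}(X)$ and assume that $\deg(\mathcal L) < 0$. For a parabolic subgroup $P \subset G$, let $\mathrm{Bun}_{G,P}(\widehat S_{\mathcal L})$ be the groupoid of $G$-bundles on $\widehat S_{\mathcal L}$ endowed with a $P$-structure on $X \subset \widehat S_{\mathcal L}$. Note that $\mathrm{Bun}_{G,G}(\widehat S_{\mathcal L})$ is just $\mathrm{Bun}_G(\widehat S_{\mathcal L})$ (in the future we shall use the following convention: for any symbol of the form $?_{G,P}$, defined for all parabolic subgroups of $G$, we shall write $?_G$ instead of $?_{G,G}$). Consider the diagram
$$\begin{array}{ccc}
\mathrm{Bun}_{G,P}(\widehat S_{\mathcal L}) \times \mathrm{Pic}(X) & \xrightarrow{\ \eta_{G,P,\mathcal L}\ } & \mathrm{Bun}_M(X) \times \mathrm{Pic}(X) \\
{\scriptstyle \pi_{G,P,\mathcal L}} \big\downarrow & & \\
\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L}) & &
\end{array} \tag{10}$$

We shall mostly be interested in the cases $P = G$ and $P = B$ (a Borel subgroup of $G$); for $P = G$ we shall just write $\mathrm{Eis}_{G,\mathcal L}$ instead of $\mathrm{Eis}_{G,G,\mathcal L}$. Similarly to Subsection 3.4 we let $\rho_{P,\mathrm{aff}} : \mathbb Z \times \Lambda_M \to \mathbb Z$ be the homomorphism such that $\rho_{P,\mathrm{aff}}(\alpha) = 0$ for every coroot $\alpha$ of $M$ and $\rho_{P,\mathrm{aff}}(\alpha_i) = 1$ for every simple coroot of $\widehat G_{\mathrm{aff}}$ not lying in $M$. We also denote by $\rho_{P,\mathrm{aff}}$ the corresponding function $\mathrm{Pic}(X) \times \mathrm{Bun}_M(X) \to \mathbb Z$. For a complex-valued function $f$ on $\mathrm{Bun}_M(X) \times \mathrm{Pic}(X)$ we would like to set
$$\mathrm{Eis}_{G,P,\mathcal L}(f) = (\pi_{G,P,\mathcal L})_!\, \eta^*_{G,P,\mathcal L}\big(q^{\rho_{P,\mathrm{aff}}} f\big). \tag{11}$$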
A priori, it is not clear what sense this makes, since the RHS of (11) might be an infinite sum. We say that $\mathrm{Eis}_{G,P,\mathcal L}(f)$ is well-defined if its value at every point is given by a finite sum. Then we have

Theorem 5.2. (1) Assume that $f$ has finite support. Then $\mathrm{Eis}_{G,P,\mathcal L}(f)$ is well-defined (i.e. its value at every point is given by a finite sum). If $P = G$ then the same is true for any $f$ such that the image of $\mathrm{supp}(f)$ in $\mathrm{Pic}(X)$ is finite.

(2) Assume that $P = B$; then $\mathrm{Bun}_M(X) \times \mathrm{Pic}(X) = \mathrm{Bun}_T(X) \times \mathrm{Pic}(X)$. Let $\deg : \mathrm{Bun}_T(X) \times \mathrm{Pic}(X) \to \Lambda \oplus \mathbb Z$ denote the natural degree map and let $f_s = \chi \cdot q^{\langle \deg, s\rangle}$, where $\chi$ is a unitary character of $\mathrm{Bun}_T \times \mathrm{Pic}(X)$ and $s \in \mathfrak t^* \oplus \mathbb C$. Then the series $\mathrm{Eis}_{G,B,\mathcal L}(f_s)$ is absolutely convergent when $\mathrm{Re}\, s > \rho^\vee_{\mathrm{aff}}$. Moreover, it has a meromorphic continuation to the domain $\mathrm{Re}\, s > 0$.

(3) Assume that $P = G$. Then $\mathrm{Eis}_{G,P,\mathcal L}(f)$ is well-defined if $f$ has finite support modulo $\mathrm{Pic}(X)$. In particular, this is true for any cuspidal $f$.

The first statement is due to Kapranov [26]. The second statement is due to Garland [17] and the third statement will appear in [10].

Given $P$ as above one can consider the parahoric subgroup $\mathcal P_P \subset \widehat G_{\mathrm{aff}}$ defined as $\mathbb G_m \times G[[t]]_P \rtimes \mathbb G_m$, where $G[[t]]_P$ is the preimage of $P$ under the natural map $G[[t]] \to G$. It should be thought of as a parabolic subgroup of the affine Kac–Moody group $\widehat G_{\mathrm{aff}}$, and the corresponding Levi factor is $\mathbb G_m \times M \times \mathbb G_m$ where $M$ is the Levi factor of $P$. The operators $\mathrm{Eis}_{G,P,\mathcal L}$ should be thought of as Eisenstein series $\mathrm{Eis}_{\widehat G_{\mathrm{aff}},\mathcal P_P}$ (restricted to $\iota^{-1}(\mathcal L)$). If $B \subset G$ is a Borel subgroup, then $\mathcal P_B$ should be thought of as a Borel subgroup of $\widehat G_{\mathrm{aff}}$ (later, we are going to consider a different type of Borel subgroups of $\widehat G_{\mathrm{aff}}$). In fact, it is well-known that the group $\widehat G_{\mathrm{aff}}$ has more general parabolic subgroups containing $\mathcal P_B$ than those of the form $\mathcal P_P$ for some $P \subset G$; we shall call such subgroups parabolic subgroups of $\widehat G_{\mathrm{aff}}$ of positive type (since later we are going to consider other parabolic subgroups). The above Eisenstein series can be defined for any parabolic subgroup $\mathcal P \subset \widehat G_{\mathrm{aff}}$ of positive type and the analog of Theorem 5.2 holds for any such $\mathcal P$.

5.6. Constant term. We now want to define the operator of constant term, acting from functions on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ to functions on $\mathrm{Pic}(X) \times \mathrm{Bun}_M(X)$. In principle, we would like to set $c_{G,P,\mathcal L}(f) = q^{\rho_{P,\mathrm{aff}}} (\eta_{G,P,\mathcal L})_!\, \pi^*_{G,P,\mathcal L}(f)$. However, formally there is a problem with this definition. Namely, the morphism $\eta_{G,P,\mathcal L}$ is not representable and the corresponding automorphism groups (whose sizes enter the definition of $(\eta_{G,P,\mathcal L})_!$ as in (3)) are infinite. So, we need to apply a certain “renormalization” procedure in order to define $(\eta_{G,P,\mathcal L})_!$. Let us explain this procedure for $P = G$ (the general case is similar). Let $\mathcal F \in \mathrm{Bun}_G(\widehat S_{\mathcal L})$.
Then, for any $\mathcal M \in \mathrm{Pic}(X)$, the automorphism group of $(\mathcal M, \mathcal F)$ as a point in $\eta^{-1}_{G,\mathcal L}(\mathcal M, \mathcal F|_X)$ is $\mathrm{Aut}^0(\mathcal F)$, which consists of those automorphisms of $\mathcal F$ which are equal to the identity when restricted to $X$. This is an infinite group which is equal to the set of $\mathbb F_q$-points of some pro-unipotent algebraic group, and thus formally its “number of points” should be the same as the number of points in the corresponding Lie algebra, which is equal to $H^0(X, p_*(\mathfrak g_{\mathcal F})(-X))$, where $p : \widehat S_{\mathcal L} \to X$ denotes the natural morphism and $\mathfrak g_{\mathcal F}$ is the vector bundle associated to $\mathcal F$ by means of the adjoint representation of $G$. The space $H^0(X, p_*(\mathfrak g_{\mathcal F})(-X))$ is infinite-dimensional, but the corresponding space $H^1(X, p_*(\mathfrak g_{\mathcal F})(-X))$ is actually finite-dimensional (here we use the fact that $\deg(\mathcal L) < 0$) and we formally set
$$\#\mathrm{Aut}^0(\mathcal F) = q^{\dim H^1(X,\, p_*(\mathfrak g_{\mathcal F})(-X))}.$$

Informally, this definition is reasonable because by the Riemann–Roch theorem, if the Euler characteristic $\dim H^0(X, p_*(\mathfrak g_{\mathcal F})(-X)) - \dim H^1(X, p_*(\mathfrak g_{\mathcal F})(-X))$ made sense, it would have been independent of $\mathcal F$. Given this convention, we can now define $(\eta_{G,P,\mathcal L})_!$ and for a function $f$ on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ we set
$$c_{G,P,\mathcal L}(f) = q^{-\rho_{P,\mathrm{aff}}} (\eta_{G,P,\mathcal L})_!\, \pi^*_{G,P,\mathcal L}(f).$$
In fact, a slight variation gives a definition of $c_{\widehat G_{\mathrm{aff}},\mathcal P}$ for any parabolic subgroup $\mathcal P$ of $\widehat G_{\mathrm{aff}}$ of positive type.⁹ For $P = B$ it is easy to see that this definition coincides with that of [17]. It is easy to see that $c_{G,P,\mathcal L}(f)$ makes sense for any function $f$ on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$, since the fibers of the map $\mathrm{Bun}_G(\widehat S_{\mathcal L}) \to \mathrm{Bun}_G(X)$ have finitely many isomorphism classes of objects.

⁹ Note that the factor $q^{\rho_{P,\mathrm{aff}}}$ is changed to $q^{-\rho_{P,\mathrm{aff}}}$; this is due to the fact that a factor of $q^{2\rho_{P,\mathrm{aff}}}$ is hidden in the renormalization procedure described above.

5.7. More on Eisenstein series and constant term. Given a parabolic subgroup $P$ of $G$ one can consider the subgroup $\mathcal Q_P$ of $\widehat G_{\mathrm{aff}}$ defined as $\mathbb G_m \times G[t^{-1}]_P \rtimes \mathbb G_m$, where $G[t^{-1}]_P$ is the preimage of $P$ under the map $G[t^{-1}] \to G$ obtained by setting $t^{-1} = 0$. We shall call such subgroups parabolic subgroups of negative type (as before there exist more general parabolic subgroups of negative type, but we shall not consider them in this paper). One can talk about constant term and Eisenstein series for parabolic subgroups of negative type. Geometrically, these are defined by means of the following analog of (10).
$$\begin{array}{ccc}
\mathrm{Pic}(X) \times \mathrm{Bun}_{G,P}(S_{\mathcal L^{-1}}) & \xrightarrow{\ \eta^-_{G,P,\mathcal L}\ } & \mathrm{Pic}(X) \times \mathrm{Bun}_M(X) \\
{\scriptstyle \pi^-_{G,P,\mathcal L}} \big\downarrow & & \\
\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L}) & &
\end{array} \tag{12}$$

Here $S_{\mathcal L^{-1}}$ denotes the total space of $\mathcal L^{-1}$ and $\mathrm{Bun}_{G,P}(S_{\mathcal L^{-1}})$ denotes the groupoid of $G$-bundles on $S_{\mathcal L^{-1}}$ endowed with a $P$-structure on $X \subset S_{\mathcal L^{-1}}$. The existence of the map $\mathrm{Bun}_{G,P}(S_{\mathcal L^{-1}}) \to \mathrm{Bun}_G(\widehat S^0_{\mathcal L})$ is clear; its lift to a map $\mathrm{Pic}(X) \times \mathrm{Bun}_{G,P}(S_{\mathcal L^{-1}}) \to \widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ is not difficult to define, but we shall not do it in this paper. Thus one can define the operator $c^-_{G,P,\mathcal L}$. In this case no “renormalization” as in Subsection 5.6 is needed, but the fibers of the map $\eta^-_{G,P,\mathcal L}$ are infinite and thus a priori $c^-_{G,P,\mathcal L}$ can only be applied to functions with finite support; otherwise one needs to check convergence (this is very different from the finite-dimensional case).

One can also define the corresponding Eisenstein series operator $\mathrm{Eis}^-_{G,P,\mathcal L}$; this is an operator from functions with finite support on $\mathrm{Pic}(X) \times \mathrm{Bun}_G(X)$ to functions on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$. However, its definition is not straightforward, since the composition $(\pi^-_{G,P,\mathcal L})_!\, (\eta^-_{G,P,\mathcal L})^*(f)$ is given by an infinite sum even when $f$ has finite support, and we must use a renormalization procedure similar to that from Subsection 5.6. The formal definition will appear in [10].

We now want to present the analog of Theorem 3.1. Let $\mathfrak n_{P,\mathrm{aff}}$ be the nilpotent radical of $\mathrm{Lie}\,\mathcal P_P$ and let $\mathfrak n_{P,\mathrm{aff},-}$ be the nilpotent radical of $\mathrm{Lie}\,\mathcal Q_P$. We can also consider the corresponding Langlands dual subalgebras in $\mathfrak g^\vee_{\mathrm{aff}}$.

Theorem 5.3. Assume that $f \in \mathbb C(\mathrm{Pic}(X) \times \mathrm{Bun}_M(X))$ is a cuspidal Hecke eigenfunction. Then
(1)
$$c_{G,P,\mathcal L} \circ \mathrm{Eis}_{G,P,\mathcal L}(f) = \sum_{w \in W_{\mathrm{aff}}/W(M)} q^{-(g-1)\dim \mathfrak n^\vee_{P,\mathrm{aff}} \cap (\mathfrak n^\vee_{P_-,\mathrm{aff},-})^w}\; \frac{L\big(f, \mathfrak n^\vee_{P,\mathrm{aff}} \cap (\mathfrak n^\vee_{P_-,\mathrm{aff},-})^w, 0\big)}{L\big(f, \mathfrak n^\vee_{P,\mathrm{aff}} \cap (\mathfrak n^\vee_{P_-,\mathrm{aff},-})^w, 1\big)}\; f^w.$$
Note that the factor in front of $f^w$ is slightly different from that in Theorem 3.1(1); this has to do with the renormalization procedure of Subsection 5.6.

(2) Assume that $G$ is simply laced and let $f = (\chi, \varphi)$ where $\chi$ is a character of $\mathrm{Pic}(X)$ and $\varphi$ is a cuspidal Hecke eigenfunction on $\mathrm{Bun}_M(X)$. Then
$$c^-_{G,P,\mathcal L} \circ \mathrm{Eis}_{G,P,\mathcal L}(f) = \prod_{i=1}^{r} \prod_{j=1}^{\infty} \frac{L(\chi^j, d_i)}{L(\chi^j, d_i - 1)} \times \sum_{w \in W_{\mathrm{aff}}/W(M)} q^{(g-1)\dim \mathfrak n^\vee_{P,\mathrm{aff}} \cap (\mathfrak n^\vee_{P_-,\mathrm{aff},-})^w}\; f^w\; \frac{L\big(f, \mathfrak n^\vee_{P,\mathrm{aff}} \cap (\mathfrak n^\vee_{P,\mathrm{aff}})^w, 0\big)}{L\big(f, \mathfrak n^\vee_{P,\mathrm{aff}} \cap (\mathfrak n^\vee_{P,\mathrm{aff}})^w, 1\big)}.$$
A similar statement holds for the composition $c_{G,P,\mathcal L} \circ \mathrm{Eis}^-_{G,P,\mathcal L}$.

A few remarks are in order about the formulation of Theorem 5.3. The first statement is proved in a similar manner to the finite-dimensional case; in the case $P = B$ it appears in [17] (we also refer the reader to [17] for the discussion of convergence of the right hand side of Theorem 5.3(1)). In the second statement the product of abelian $L$-functions in front of the sum comes from the formula (6) for the “correction term” in the affine Gindikin–Karpelevich formula; a variant of this statement holds for non-simply laced groups as well (using the formula for the correction term described in [7]).

Let us now look at the case $P = G$. Then in the right hand side of Theorem 5.3(2) only one term (corresponding to $w = 1$) is left and it is equal to $f$ multiplied by a certain infinite product of ratios of $L$-functions. Moreover, let us consider the case when $\chi$ above is equal to $q^{s \deg}$. Then the product becomes
$$\prod_{i=1}^{r} \prod_{j=1}^{\infty} \frac{L(\chi^j, d_i)}{L(\chi^j, d_i - 1)} = \prod_{i=1}^{r} \prod_{j=1}^{\infty} \frac{\zeta(js + d_i)}{\zeta(js + d_i - 1)},$$
which is a meromorphic function of $s$ when $\mathrm{Re}(s) > 0$. Also,
$$\frac{L(f, \mathfrak n^\vee_{P,\mathrm{aff}}, 0)}{L(f, \mathfrak n^\vee_{P,\mathrm{aff}}, 1)} = \prod_{j=1}^{\infty} \frac{L(\varphi, \rho, js)}{L(\varphi, \rho, js + 1)},$$
where $\rho$ is the adjoint representation of $G^\vee$. The above product is absolutely convergent for $\mathrm{Re}(s) > 1$ and the standard conjectures about automorphic $L$-functions imply that it should have a meromorphic continuation to the domain $\mathrm{Re}(s) > 0$. We do not know how to prove this at the moment, but we expect this observation to be a useful tool in proving that $L(\varphi, \rho, s)$ has a meromorphic continuation.

It is also important to note that we are not claiming anything about the composition $c^-_{G,P,\mathcal L} \circ \mathrm{Eis}^-_{G,P,\mathcal L}$ since we don’t know how to make sense of it.
5.8. Affine Tamagawa number formula. To conclude this Section, we would like to present an analog of Theorem 3.2 in the affine case. First, one needs to define a measure on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ analogous to the one defined by (4). Naively, one can try to define it by the same formula as (4) on $\mathrm{Bun}_G(\widehat S^0_{\mathcal L})$ rather than on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$; however, one quickly discovers that the automorphism groups which appear in (4) are infinite, and to define the measure one needs to perform again a renormalization procedure similar to the one in Subsection 5.6. We are not going to give the details of that procedure here, but let us just mention that it is a little more involved than in Subsection 5.6 and, in particular, the resulting measure makes sense only on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$; moreover, under the natural $\mathrm{Pic}(X)$-action it changes according to the character $\mathcal M \mapsto q^{-2h^\vee \deg(\mathcal M)}$. This phenomenon has been discovered by H. Garland in [17], who gave a group-theoretic definition of this measure.

Since the above measure changes according to a non-trivial character of $\mathrm{Pic}(X)$, we can’t integrate the function 1 with respect to this measure. Instead, we are going to define the volume of $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ by recalling the interpretation of the volume of $\mathrm{Bun}_G(X)$ given by Theorem 3.5. Namely, for $s \in \mathbb C \oplus \mathfrak t^*$ we set $E(s) = \mathrm{Eis}_{G,B,\mathcal L}(f_s)$ where $f_s$ is equal to $\langle (n, \gamma), s\rangle$ on $\mathrm{Pic}^n(X) \times \mathrm{Bun}^\gamma_T(X)$. Thus we can define $r_{\widehat G_{\mathrm{aff}}}$ as the residue of $c^-_{G,B,\mathcal L}(E(s))$ at $s = \rho_{\mathrm{aff}}$ multiplied by the function $q^{\langle ?, \rho_{\mathrm{aff}}\rangle}$ (this is a constant function on $\mathrm{Pic}(X) \times \mathrm{Bun}_T(X)$; a priori, $r_{\widehat G_{\mathrm{aff}}}$ might depend on $\mathcal L$, but it is easy to deduce from Theorem 5.3 that it does not). Then we define the volume of $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ as
$$\frac{(\mathrm{Res}_{s=1}\, \zeta(s))^{r+1}}{r_{\widehat G_{\mathrm{aff}}}}.$$
With this definition we now have the following

Theorem 5.4. Assume that $G$ is simply laced and simple. Let us assume that the exponents $d_1, \dots, d_r$ are numbered in such a way that $d_1 = 2$. Then the volume of $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ is equal to
$$\frac{\prod_{i=1}^{r} \zeta(d_i)}{\prod_{i=2}^{r} \zeta(d_i - 1)}. \tag{13}$$
Let us make two remarks about (13). First, the absence of the factor corresponding to $d_1$ in the denominator has to do with the fact that we are working not with the group $\widehat G'_{\mathrm{aff}}$ but with its central extension $\widehat G_{\mathrm{aff}}$. Second, usually most terms in (13) cancel out (e.g. for $G = SL(n)$ only $\zeta(n)$ survives). However, writing the answer as in (13) is still instructive, since one can give an explanation why the above answer is natural in the spirit of Theorem 3.3 (however, we don’t know how to formulate precisely a “cohomological” statement that would imply Theorem 5.4).
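The cancellation mentioned in the second remark can be checked mechanically. The following Python sketch is only an illustration and not part of the original argument: it treats each factor $\zeta(a)$ as a formal symbol indexed by $a$, and it assumes that for $G = SL(n)$ the numbers $d_1, \dots, d_r$ are $2, 3, \dots, n$ (so $r = n - 1$). The function names are hypothetical.

```python
from collections import Counter


def surviving_zeta_factors(degrees):
    """Formally cancel prod_{i>=1} zeta(d_i) / prod_{i>=2} zeta(d_i - 1).

    Each zeta(a) is treated as a symbol labelled by a. Returns a pair
    (numerator, denominator) of Counters giving the multiplicity of every
    zeta(a) left after cancelling common factors.
    """
    degrees = sorted(degrees)  # the smallest number, d_1 = 2, comes first
    numerator = Counter(degrees)
    denominator = Counter(d - 1 for d in degrees[1:])  # skips the d_1 factor
    common = numerator & denominator  # multiset intersection
    return numerator - common, denominator - common


def sl_degrees(n):
    # Assumption: for G = SL(n) the numbers d_i are 2, 3, ..., n.
    return list(range(2, n + 1))


num, den = surviving_zeta_factors(sl_degrees(5))
print(dict(num), dict(den))  # prints: {5: 1} {}
```

Under the stated assumption, only the symbol for $\zeta(n)$ remains in the numerator and the denominator is empty, in agreement with the remark above.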
6. Further questions and constructions

In this Section we mention some related works and formulate several possible directions of future research.

6.1. Existence of cuspidal representations and automorphic forms. It is interesting whether there exists a reasonable notion of cuspidal automorphic forms for $\widehat G_{\mathrm{aff}}$. We can try to say that a function $f$ on $\widetilde{\mathrm{Bun}}_G(\widehat S^0_{\mathcal L})$ is cuspidal if $c_{\widehat G_{\mathrm{aff}},\mathcal P}(f) = 0$ for every parahoric subgroup $\mathcal P$ of $\widehat G_{\mathrm{aff}}$. Here the words “for every parahoric” can in principle be interpreted in several different ways. Namely, we can require that this holds for all parahoric subgroups of positive type, or for all parahoric subgroups of negative type, or both. We conjecture that these conditions are in fact equivalent. More precisely, we conjecture that if $\mathcal P$ and $\mathcal Q$ are opposite parahoric subgroups then $c_{\widehat G_{\mathrm{aff}},\mathcal P}(f) = 0$ implies that $c_{\widehat G_{\mathrm{aff}},\mathcal Q}(f) = 0$. If the above conjecture is true, then we get an unambiguous definition of cuspidal functions. However, it is not at all clear whether non-zero cuspidal functions exist. It would be very interesting to understand if they do exist and how to describe the space of cuspidal functions.

The above questions make sense locally. As was mentioned in the introduction, in the papers [14]–[15] the theory of representations of the group $\widehat G_{\mathrm{aff}}(K)$ (for $K$ being a local non-archimedean field) was developed. One can define the notion of a cuspidal representation in this framework and it would be very interesting to understand whether $\widehat G_{\mathrm{aff}}$ has any irreducible cuspidal representations.

6.2. Weil representation and theta-correspondence. In [41] Y. Zhu generalized the notion of Weil representation of the double cover $\widetilde{Sp}(2n, K)$ of the symplectic group $Sp(2n, K)$ over a local field $K$ (archimedean or not) to its affine analog; moreover, the affine analog of the corresponding automorphic representation is also constructed in loc. cit.

One of the main applications of the usual Weil representation to the representation theory of $p$-adic groups and to automorphic forms is the construction of the so-called theta-correspondence (cf. [36] for a survey on theta-correspondence). One of the most interesting features of the theta-correspondence is that it provides a fairly explicit tool for producing examples of cuspidal automorphic forms. It would be very interesting to develop an affine analog of theta-correspondence; in particular, it might give rise to a construction of cuspidal automorphic forms for the group $\widehat G_{\mathrm{aff}}$ (for some particular choices of $G$). The full theory of theta-correspondence has not yet been developed in the affine case. However, it was shown in [20] and [21] that one of the main tools used in the theory of (global) theta-correspondence, the so-called Siegel–Weil formula, does have an analogue in the affine case.

6.3. Meromorphic continuation of Eisenstein series. As was claimed above, the Eisenstein series $\mathrm{Eis}_{G,\mathcal L}(f)$ is convergent for any cuspidal function $f$ on $\mathrm{Pic}(X) \times \mathrm{Bun}_G(X)$. This fact is not true for $\mathrm{Eis}^-_{G,\mathcal L}$. Let $f$ be of the form $\chi \times \varphi$ where $\varphi$ is a
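cuspidal function on $\mathrm{Bun}_G(X)$ and $\chi$ is a unitary character of $\mathrm{Pic}(X)$. For every $s \in \mathbb C$ set
$$f_s(\mathcal M, \mathcal F) = \chi(\mathcal M)\,\varphi(\mathcal F)\, q^{s \deg(\mathcal M)}.$$
Then one can show that $\mathrm{Eis}^-_{G,\mathcal L}(f_s)$ is absolutely convergent for $\mathrm{Re}(s) > h^\vee$ where $h^\vee$ is the dual Coxeter number of $G$. We conjecture that in fact $\mathrm{Eis}^-_{G,\mathcal L}(f_s)$ has a meromorphic continuation to the domain $\mathrm{Re}(s) > 0$.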
6.4. Towards Kazhdan–Lusztig theory for DAHA. Let us recall the algebra $H^+_{\mathrm{aff}}$ defined in Subsection 4.3. As was remarked after Theorem 4.2, when we choose a local non-archimedean field $K$, the specialization of $H^+_{\mathrm{aff}}$ obtained by setting $q$ to the number of elements in the residue field of $K$ acquires a natural basis. We conjecture that this basis comes from a $\mathbb C[q, q^{-1}]$-basis of $H^+_{\mathrm{aff}}$; it should be thought of as an analog of the “standard basis” of a finite or affine Hecke algebra. It would be interesting to define an analogue of the Kazhdan–Lusztig basis of $H^+_{\mathrm{aff}}$. We don’t know how to attack this problem algebraically; however, it should be possible to attack it geometrically using an appropriate generalization of the constructions discussed in Subsection 4.6.
References

[1] M. F. Atiyah and R. Bott, The Yang–Mills equations over Riemann surfaces. Phil. Trans. R. Soc. Lond. A 308 (1983), 523–615.
[2] K. A. Behrend, The Lefschetz trace formula for algebraic stacks. Invent. Math. 112 (1993), 127–149.
[3] A. Braverman and M. Finkelberg, Pursuing the double affine Grassmannian I: Transversal slices via instantons on $A_k$-singularities. Duke Math. J. 152 no. 2 (2010), 175–206.
[4] A. Braverman and M. Finkelberg, Pursuing the double affine Grassmannian II: Convolution. Adv. in Math. 230 (2012), 414–432.
[5] A. Braverman and M. Finkelberg, Pursuing the double affine Grassmannian III: Convolution with affine Zastava. arXiv:1010.3499.
[6] A. Braverman, M. Finkelberg, and D. Gaitsgory, Uhlenbeck spaces via affine Lie algebras. The unity of mathematics, 17–135, Progr. Math., 244, Birkhäuser Boston, Boston, MA, 2006.
[7] A. Braverman, M. Finkelberg, and D. Kazhdan, Affine Gindikin–Karpelevich formula via Uhlenbeck spaces.
[8] A. Braverman, H. Garland, D. Kazhdan, and M. Patnaik, A Gindikin–Karpelevich formula for p-adic loop groups. In preparation.
[9] A. Braverman and D. Kazhdan, The spherical Hecke algebra for affine Kac–Moody groups I. Ann. of Math. (2) 174 no. 3 (2011), 1603–1642.
[10] A. Braverman and D. Kazhdan, Affine Eisenstein series I: unramified series over functional fields and affine Tamagawa number formula. In preparation.
[11] A. Braverman, D. Kazhdan, and M. Patnaik, Hecke algebras for p-adic loop groups. In preparation.
[12] R. Brylinski, Limits of weight spaces, Lusztig’s q-analogs, and fiberings of adjoint orbits. J. Amer. Math. Soc. 2 no. 3 (1989), 517–533.
[13] I. B. Frenkel, Representations of affine Lie algebras, Hecke modular forms and Korteweg–de Vries type equations. In: Lie algebras and related topics (New Brunswick, N.J., 1981), 71–110, Lecture Notes in Math. 933, Springer, Berlin–New York, 1982.
[14] D. Gaitsgory and D. Kazhdan, Representations of algebraic groups over a 2-dimensional local field. Geom. Funct. Anal. 14 no. 3 (2004), 535–574.
[15] D. Gaitsgory and D. Kazhdan, Algebraic groups over a 2-dimensional local field: irreducibility of certain induced representations. J. Differential Geom. 70 no. 1 (2005), 113–127.
[16] D. Gaitsgory and D. Kazhdan, Algebraic groups over a 2-dimensional local field: some further constructions. In: Studies in Lie theory, 97–130, Progr. Math. 243, Birkhäuser Boston, Boston, MA, 2006.
[17] H. Garland, Eisenstein series on arithmetic quotients of loop groups. Math. Res. Lett. 6 no. 5–6 (1999), 723–733.
[18] H. Garland, Absolute convergence of Eisenstein series on loop groups. Duke Math. J. 135 no. 2 (2006), 203–260.
[19] H. Garland, On extending the Langlands–Shahidi method to arithmetic quotients of loop groups. In: Representation theory and mathematical physics, 151–167, Contemp. Math., 557, Amer. Math. Soc., Providence, RI, 2011.
[20] H. Garland and Y. Zhu, On the Siegel–Weil theorem for loop groups, I. Duke Math. J. 157 no. 2 (2011), 283–336.
[21] H. Garland and Y. Zhu, On the Siegel–Weil theorem for loop groups, II. arXiv:0906.4749.
[22] S. Gaussent and G. Rousseau, Spherical Hecke algebras for Kac–Moody groups over local fields. arXiv:1201.6050.
[23] J. Heinloth and A. Schmitt, The cohomology rings of moduli stacks of principal bundles over curves. Doc. Math. 15 (2010), 423–488.
[24] V. Kac, Infinite-dimensional Lie algebras. Second edition, Cambridge University Press, Cambridge, 1985.
[25] M. Kapranov, Double affine Hecke algebras and 2-dimensional local fields. J. Amer. Math. Soc. 14 no. 1 (2001), 239–262.
[26] M. Kapranov, The elliptic curve in the S-duality theory and Eisenstein series for Kac–Moody groups, math.AG/0001005.
[27] S. Kato, Spherical functions and a q-analogue of Kostant’s weight multiplicity formula. Invent. Math. 66 no. 3 (1982), 461–468.
[28] R. P. Langlands, On the functional equations satisfied by Eisenstein series. Lecture Notes in Mathematics, 544. Springer-Verlag, Berlin–New York, 1976.
[29] R. P. Langlands, The volume of the fundamental domain for some arithmetical subgroups of Chevalley groups. In: 1966 Algebraic Groups and Discontinuous Subgroups (Proc. Sympos. Pure Math., Boulder, Colo., 1965), 143–148, Amer. Math. Soc., Providence, R.I.
[30] G. Lusztig, Singularities, character formulas, and a q-analog of weight multiplicities. Analysis and topology on singular spaces. Astérisque 101–102 (1983), 208–229.
[31] I. G. Macdonald, Spherical functions on a p-adic Chevalley group. Bull. Amer. Math. Soc. 74 (1968), 520–525.
[32] I. G. Macdonald, A formal identity for affine root systems. Lie groups and symmetric spaces. Amer. Math. Soc. Transl. Ser. 2, 210, 195–211, Amer. Math. Soc., Providence, RI, 2003.
[33] I. Mirković and K. Vilonen, Perverse sheaves on affine Grassmannians and Langlands duality. Math. Res. Lett. 7 no. 1 (2000), 13–24.
[34] H. Nakajima, Geometric construction of representations of affine algebras. Proceedings of the International Congress of Mathematicians, Vol. I (Beijing, 2002), 423–438.
[35] H. Nakajima, Quiver varieties and branching. SIGMA 5 (2009), 003, 37 pages.
[36] D. Prasad, A brief survey on the theta correspondence. Number theory (Tiruchirapalli, 1996), 171–193, Contemp. Math. 210, Amer. Math. Soc., Providence, RI, 1998.
[37] W. Slofstra, A Brylinski filtration for affine Kac–Moody algebras. Adv. Math. 229 no. 2 (2012), 968–983.
[38] J. Starr, Rational points of rationally connected and rationally simply connected varieties, available at http://www.math.sunysb.edu/~jstarr/papers.
[39] S. Viswanath, Kostka–Foulkes polynomials for symmetrizable Kac–Moody algebras. Sém. Lothar. Combin. 58 (2007/08), Art. B58f, 20 pp.
[40] E. Witten, Geometric Langlands from Six Dimensions. A celebration of the mathematical legacy of Raoul Bott, CRM Proc. Lecture Notes 50, Amer. Math. Soc., Providence, RI (2010), 281–310.
[41] Y. Zhu, Theta functions and Weil representations of loop symplectic groups. Duke Math. J. 143 no. 1 (2008), 17–39.
Alexander Braverman, Department of Mathematics, Brown University
David Kazhdan, Einstein Institute of Mathematics, Hebrew University of Jerusalem
Emergence of the Abrikosov lattice in several models with two dimensional Coulomb interaction

Sylvia Serfaty∗
Abstract. We consider three different models coming from physics and involving the Coulomb interaction of points in the plane. The first is the classical Coulomb gas in an external potential, the second is the Ginzburg–Landau model of superconductivity, the third is the Ohta–Kawasaki model of polymers. In superconductivity, one observes in certain regimes the emergence of densely packed point vortices forming perfect triangular lattices named “Abrikosov lattices” in physics. We show how these Abrikosov lattices are expected to emerge in the three systems, via the minimization of a “Coulombian renormalized energy” that we defined with Etienne Sandier. We also present applications to the statistical mechanics of Coulomb gases, which are related to some random matrix models, and show how the previous results lead to expecting crystallisation in the low temperature limit.

2010 Mathematics Subject Classification. 35B25, 82D55, 35Q99, 35J20, 52C15, 82B05, 82D10, 82D99, 15B52.

Keywords. Coulomb gas, log gases, Ginzburg–Landau, superconductivity, vortices, Ohta–Kawasaki, random matrices, Abrikosov lattice, renormalized energy.
1. Introduction

We will consider three different models:

(1) two-dimensional Coulomb gases (work with Etienne Sandier [39])
(2) the Ginzburg–Landau model of superconductivity (work with Etienne Sandier [38])
(3) the Ohta–Kawasaki model of “diblock copolymers” (work with Dorian Goldman and Cyrill Muratov [20])

These three models have in common the fact that at their core is a Coulomb interaction between a divergent number of points in the plane. We wish to describe and exploit this similarity here, to derive a limit “total Coulomb interaction” called “renormalized energy” as the number of points tends to infinity. First let us present the three models.

∗ The author was supported by a EURYI award.
1.1. Two-dimensional Coulomb gases. Consider $n$ points $x_1, \dots, x_n$ in $\mathbb R^2$ and a smooth potential $V$, growing faster than $\log |x|$ at infinity. To this configuration of points we associate the energy
$$w_n(x_1, \dots, x_n) = -\sum_{i \neq j} \log |x_i - x_j| + n \sum_{i=1}^{n} V(x_i). \tag{1.1}$$
This is the energy of a “Coulomb gas” in the potential $V$ at zero temperature (for general background and references see [18]). The ground states of such a system are the minimizers of $w_n$, and we wish to understand their behavior as $n \to \infty$. The minimizers of $w_n$ are also maximizers of
$$\prod_{i \neq j} |x_i - x_j| \prod_{i=1}^{n} e^{-nV(x_i)}.$$
Such points are called “weighted Fekete sets.” These appear in interpolation and are interesting in their own right (cf. [35]). Note that usual (nonweighted) Fekete sets correspond to maximizers of $\prod_{i \neq j} |x_i - x_j|$ where the points $x_i$ are constrained to belong to a certain set $S$ (equivalently this can be obtained by taking $V = 0$ on $S$ and $+\infty$ on the complement of $S$, a nonsmooth situation that we don’t quite treat here). For $V(x) = |x|^2$, some numerical simulations give the shapes of minimizers of $w_n$:
Figure 1. Numerical minimization of wn by Gueron–Shafrir, n = 29
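To make the energy (1.1) concrete, here is a minimal Python sketch that evaluates $w_n$ for a finite configuration. It is an illustration under stated assumptions, not the code behind the simulation in Figure 1: points are represented as `(x, y)` tuples, the sum over $i \neq j$ runs over ordered pairs (so each pair is counted twice, as in (1.1)), and the names `coulomb_energy` and `quadratic_potential` are hypothetical.

```python
import math


def coulomb_energy(points, potential):
    """Evaluate w_n = -sum_{i != j} log|x_i - x_j| + n * sum_i V(x_i).

    `points` is a list of (x, y) tuples with pairwise distinct entries and
    `potential` is a function V(x, y). The double sum runs over ordered
    pairs (i, j) with i != j, as in (1.1).
    """
    n = len(points)
    interaction = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j:
                interaction -= math.log(math.hypot(xi - xj, yi - yj))
    confinement = n * sum(potential(x, y) for x, y in points)
    return interaction + confinement


def quadratic_potential(x, y):
    # The potential V(x) = |x|^2 mentioned in the text.
    return x * x + y * y


# Two points placed symmetrically at distance 1 from each other.
print(coulomb_energy([(0.5, 0.0), (-0.5, 0.0)], quadratic_potential))  # prints: 1.0
```

For two points at $(\pm a, 0)$ the energy reduces to $-2\log(2a) + 4a^2$, which among such symmetric pairs is smallest at $a = 1/2$; the configuration in the usage line is that one.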
We can also examine the statistical mechanics of such a Coulomb gas at nonzero temperature: for that one considers the so-called Gibbs measure
$$d\mathbb P^\beta_n(x_1, \dots, x_n) = \frac{1}{Z^\beta_n}\, e^{-\beta w_n(x_1, \dots, x_n)}\, dx_1 \cdots dx_n \tag{1.2}$$
where $Z^\beta_n$ is the associated partition function, i.e. a normalization factor that makes $d\mathbb P^\beta_n$ a probability measure. The statistical mechanics literature on such Coulomb systems in two dimensions is relatively vast (cf. e.g. [2, 25, 26, 40, 27, 28]).

It was first pointed out by Dyson [17] that Coulomb gases are naturally related to random matrices. This is due to the fact that $e^{\sum_{i \neq j} \log |x_i - x_j|}$ is the square of the Vandermonde determinant $\prod_{i<j} |x_i - x_j|$ […]

[…] $> 0$ with
$$\big\langle \psi,\, \mathcal U^*(t;s)\, (\mathcal N + 1)^k\, \mathcal U(t;s)\, \psi \big\rangle \le C e^{K|t-s|}\, \big\langle \psi,\, (\mathcal N + 1)^{2k+2} \psi \big\rangle$$
for all $t, s \in \mathbb R$.

Remark. The operator inequality $V^2(x) \le D(1 - \Delta)$, meaning that
$$\int dx\, V^2(x)\, |\varphi(x)|^2 \le D \int dx\, \big(|\nabla \varphi(x)|^2 + |\varphi(x)|^2\big) = D \|\varphi\|^2_{H^1}$$
for all $\varphi \in L^2(\mathbb R^3)$, is satisfied, because of Hardy’s inequality, for potentials with Coulomb type singularities $V(x) = \pm 1/|x|$.

Proposition 2.1 immediately implies that the first error term on the r.h.s. of (14) is of the order $1/N$, for any fixed $t \in \mathbb R$. With some more work one can show the same estimate also for the last two terms on the r.h.s. of (14). As a consequence, one obtains convergence towards the Hartree dynamics. The details of the proof of the next theorem can be found in [7].
Benjamin Schlein
Theorem 2.2. Suppose that there exists a constant $D > 0$ such that the operator inequality $V^2(x) \le D(1 - \Delta)$ holds true. Suppose that, at time $t = 0$, $\Psi_N = W(\sqrt{N}\varphi)\Omega$, for some $\varphi \in H^1(\mathbb{R}^3)$. Let $\Psi_{N,t} = e^{-iH_N t}\Psi_N$, with the Hamilton operator (8), and let $\Gamma^{(1)}_{N,t}$ be the one-particle density associated with $\Psi_{N,t}$. Then there exist constants $C, K > 0$ with
$$\operatorname{tr} \Big| \Gamma^{(1)}_{N,t} - |\varphi_t\rangle\langle\varphi_t| \Big| \le \frac{C e^{K|t|}}{N}$$
where $\varphi_t$ is the solution of the Hartree equation (4), with initial data $\varphi_{t=0} = \varphi$. Similar bounds can also be proven for the $k$-particle reduced density, for any fixed $k \in \mathbb{N}$.
Remark. The same result can be obtained, with exactly the same techniques, for initial states of the form $\psi_N = W(\sqrt{N}\varphi)\psi$, for a $\psi \in \mathcal{F}$ with $\langle \psi, \mathcal{N}^2 \psi \rangle \lesssim 1$ (independent of $N$).
This approach to the study of the evolution of coherent states does more than imply the convergence towards the mean field Hartree dynamics: it also establishes the form of the fluctuation dynamics in the limit of large $N$. Formally, the generator $\mathcal{L}_N(t)$ converges, as $N \to \infty$, towards the Fock-space operator
$$\begin{aligned}
\mathcal{L}_\infty(t) = {} & \int dx\, \nabla_x a_x^* \nabla_x a_x + \int dx\, (V * |\varphi_t|^2)(x)\, a_x^* a_x \\
& + \int dx\, dy\, V(x-y)\, \varphi_t(x) \overline{\varphi_t(y)}\, a_x^* a_y \\
& + \int dx\, dy\, V(x-y) \Big( \varphi_t(x) \varphi_t(y)\, a_x^* a_y^* + \overline{\varphi_t(x)}\, \overline{\varphi_t(y)}\, a_x a_y \Big).
\end{aligned}$$
One can expect, therefore, that the fluctuation dynamics $\mathcal{U}(t;s)$ converges, as $N \to \infty$, towards the limiting dynamics $\mathcal{U}_\infty(t;s)$, defined by
$$i\partial_t\, \mathcal{U}_\infty(t;s) = \mathcal{L}_\infty(t)\, \mathcal{U}_\infty(t;s) \quad \text{with} \quad \mathcal{U}_\infty(s;s) = 1 \quad \text{for all } s \in \mathbb{R}.$$
Since $\mathcal{L}_\infty(t)$ is a time-dependent unbounded operator, the definition of the limiting dynamics $\mathcal{U}_\infty(t;s)$ generated by $\mathcal{L}_\infty(t)$ is not at all trivial. The existence of $\mathcal{U}_\infty(t;s)$, and the (strong) convergence $\mathcal{U}(t;s) \to \mathcal{U}_\infty(t;s)$ were rigorously established in [4], making use of appropriate approximations of $\mathcal{L}_\infty(t)$.
Since the generator $\mathcal{L}_\infty(t)$ is a quadratic expression in creation and annihilation operators, it turns out that the limiting dynamics $\mathcal{U}_\infty(t;s)$ can be described as a so-called Bogoliubov transformation. For $f, g \in L^2(\mathbb{R}^3)$, we define on $\mathcal{F}$ the linear combination of creation and annihilation operators $A(f,g) = a^*(f) + a(g)$. By definition, $A$ is linear in both $f$ and $g$. Observe that
$$(A(f,g))^* = A(Jg, Jf) = A\left( \begin{pmatrix} 0 & J \\ J & 0 \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix} \right)$$
Effective equations for quantum dynamics
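where $J$ is the antiunitary operator on $L^2(\mathbb{R}^3)$ defined by $Jf = \overline{f}$. A simple computation shows that, in terms of the operators $A(f,g)$, the canonical commutation relations (6) assume the form
$$[A(f_1, g_1), A(f_2, g_2)] = \left\langle \begin{pmatrix} f_1 \\ g_1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} f_2 \\ g_2 \end{pmatrix} \right\rangle$$
where $\langle \cdot\,, \cdot \rangle$ denotes the standard inner product on $L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3)$. A Bogoliubov transformation is a linear map $\theta : L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3) \to L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3)$ with the properties
$$\theta \begin{pmatrix} 0 & J \\ J & 0 \end{pmatrix} = \begin{pmatrix} 0 & J \\ J & 0 \end{pmatrix} \theta \tag{16}$$
and
$$\theta^* \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \theta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{17}$$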
Eq. (16) guarantees the preservation of the relation between $A$ and its adjoint. Eq. (17), on the other hand, guarantees the preservation of the canonical commutation relations. It is simple to check that every Bogoliubov transformation can be expressed through the operator-valued matrix
$$\theta = \begin{pmatrix} U & JVJ \\ V & JUJ \end{pmatrix} \tag{18}$$
where $U, V : L^2(\mathbb{R}^3) \to L^2(\mathbb{R}^3)$ are such that $U^*U - V^*V = 1$ and $U^*V - V^*U = 0$ (notice that $\theta$ is not unitary, unless $V = 0$). Every Bogoliubov transformation therefore defines a new set of creation and annihilation operators. It is interesting to ask when the new representation of the canonical commutation relations is unitarily equivalent to the one given by the original operators $a^*(f)$, $a(f)$. It turns out that this is the case if and only if the operator $V$ appearing in (18) is Hilbert–Schmidt (Shale–Stinespring condition).
The statement that the limiting fluctuation dynamics $\mathcal{U}_\infty(t;s)$ acts as a Bogoliubov transformation has to be understood as follows. There exists a two-parameter family of Bogoliubov transformations $\theta(t;s) : L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3) \to L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3)$ such that
$$\mathcal{U}_\infty^*(t;s)\, A(f,g)\, \mathcal{U}_\infty(t;s) = A(\theta(t;s)(f,g)) \tag{19}$$
for every $f, g \in L^2(\mathbb{R}^3)$. Formally, $\theta(t;s)$ is given by the solution of
$$i\partial_t\, \theta(t;s) = \theta(t;s) \begin{pmatrix} D_t & -\overline{B}_t \\ B_t & -\overline{D}_t \end{pmatrix}$$
with initial condition $\theta(s;s) = 1$, and with $D_t, B_t : L^2(\mathbb{R}^3) \to L^2(\mathbb{R}^3)$ defined by
$$D_t f = -\Delta f + (V * |\varphi_t|^2) f + (V * \overline{\varphi_t} f)\varphi_t, \qquad B_t f = (V * \overline{\varphi_t} f)\varphi_t.$$
We will not make use of this formal characterization of $\theta(t;s)$. The important observation is that the time evolution $\mathcal{U}_\infty(t;s)$, which is in principle a two-parameter
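family of unitary transformations on the large Hilbert space $\mathcal{F}$, can be completely described in terms of the family $\theta(t;s)$ operating on the much smaller space $L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3)$.
So far we considered the evolution of initial coherent states. Next, we turn our attention back to initially factorized (or approximately factorized) $N$-particle states, as those considered in the introduction. To this end, we notice that, for any $\varphi \in L^2(\mathbb{R}^3)$, we can write the factorized state
$$\{0, \dots, 0, \varphi^{\otimes N}, 0, \dots\} = d_N P_N W(\sqrt{N}\varphi)\Omega$$
where $P_N$ denotes the orthogonal projection onto the sector with exactly $N$ particles, and where $d_N$ is a normalization constant; a simple computation shows that $d_N \simeq N^{1/4}$. The reduced one-particle density associated with the evolution of the factorized initial data is given by
$$\begin{aligned}
\gamma^{(1)}_{N,t}(x,y) &= \frac{1}{N} \left\langle e^{-iH_N t} \frac{(a^*(\varphi))^N}{\sqrt{N!}} \Omega,\; a_x^* a_y\, e^{-iH_N t} \frac{(a^*(\varphi))^N}{\sqrt{N!}} \Omega \right\rangle \\
&= \frac{d_N}{N} \left\langle e^{-iH_N t} \frac{(a^*(\varphi))^N}{\sqrt{N!}} \Omega,\; a_x^* a_y\, e^{-iH_N t} P_N W(\sqrt{N}\varphi)\Omega \right\rangle \\
&= \frac{d_N}{N} \left\langle e^{-iH_N t} \frac{(a^*(\varphi))^N}{\sqrt{N!}} \Omega,\; a_x^* a_y\, e^{-iH_N t} W(\sqrt{N}\varphi)\Omega \right\rangle
\end{aligned}$$
because $P_N$ commutes with $H_N$ and with $a_x^* a_y$, and because $P_N (a^*(\varphi))^N \Omega = (a^*(\varphi))^N \Omega$. Letting
$$\xi = d_N W^*(\sqrt{N}\varphi) \frac{(a^*(\varphi))^N}{\sqrt{N!}} \Omega$$
we find
$$\begin{aligned}
\gamma^{(1)}_{N,t}(x,y) &= \frac{1}{N} \Big\langle \xi,\; \mathcal{U}^*(t;0) \big(a_x^* + \sqrt{N}\,\overline{\varphi_t(x)}\big)\big(a_y + \sqrt{N}\,\varphi_t(y)\big)\, \mathcal{U}(t;0)\Omega \Big\rangle \\
&= \overline{\varphi_t(x)}\,\varphi_t(y) + \frac{1}{N} \langle \xi, \mathcal{U}^*(t;0)\, a_x^* a_y\, \mathcal{U}(t;0)\Omega \rangle \\
&\quad + \frac{\overline{\varphi_t(x)}}{\sqrt{N}} \langle \xi, \mathcal{U}^*(t;0)\, a_y\, \mathcal{U}(t;0)\Omega \rangle + \frac{\varphi_t(y)}{\sqrt{N}} \langle \xi, \mathcal{U}^*(t;0)\, a_x^*\, \mathcal{U}(t;0)\Omega \rangle
\end{aligned} \tag{20}$$
where $\mathcal{U}(t;s)$ denotes the fluctuation dynamics defined in (13). From this formula, we conclude that, similarly as for initial coherent states, proving the convergence towards the Hartree dynamics reduces to the problem of obtaining uniform bounds for the growth of the product
$$|\langle \xi, \mathcal{U}^*(t;0)\, \mathcal{N}\, \mathcal{U}(t;0)\Omega \rangle|. \tag{21}$$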
The only difference compared to the case of coherent initial data is that now, on the l.h.s. of the product, we have the vector $\xi$ instead of the vacuum. Since $\|\xi\| = d_N \simeq N^{1/4}$, it seems now more difficult to get estimates uniformly in $N$. It
turns out, however, that when restricted to sectors with small number of particles, $\xi$ is an order one vector, in the sense that $\|(\mathcal{N}+1)^{-1}\xi\| \lesssim 1$ uniformly in $N$. For this reason,
$$\begin{aligned}
|\langle \xi, \mathcal{U}^*(t;0)\, \mathcal{N}\, \mathcal{U}(t;0)\Omega \rangle| &\le \|(\mathcal{N}+1)^{-1}\xi\|\, \|(\mathcal{N}+1)\, \mathcal{U}^*(t;0)\, \mathcal{N}\, \mathcal{U}(t;0)\Omega\| \\
&\lesssim e^{K|t|}\, \|(\mathcal{N}+1)^3\, \mathcal{U}(t;0)\Omega\| \\
&\lesssim e^{2K|t|}\, \|(\mathcal{N}+1)^7 \Omega\| \lesssim e^{2K|t|}
\end{aligned} \tag{22}$$
applying twice Proposition 2.1. As a corollary of (22), with some additional work needed to bound the last two terms on the r.h.s. of (20), we obtain the convergence of the evolution of initially factorized data towards the Hartree dynamics, with an explicit bound on the rate of convergence. The proof of the following theorem can be found in [2], which optimizes ideas from [7] (in [7], the error for factorized initial data was shown to be at most of the order $N^{-1/2}$, for every fixed $t \in \mathbb{R}$).
Theorem 2.3. Suppose that there exists a constant $D > 0$ such that the operator inequality $V^2(x) \le D(1 - \Delta)$ holds true. Let $\psi_{N,t} = e^{-iH_N t}\varphi^{\otimes N}$ with the Hamilton operator (2), and for some $\varphi \in H^1(\mathbb{R}^3)$. Let $\gamma^{(1)}_{N,t}$ be the one-particle reduced density associated with $\psi_{N,t}$. Then there exist constants $C, K > 0$ with
$$\operatorname{tr} \Big| \gamma^{(1)}_{N,t} - |\varphi_t\rangle\langle\varphi_t| \Big| \le \frac{C e^{K|t|}}{N}.$$
Similar bounds hold for the $k$-particle reduced densities as well.
Remark. The same result can be obtained for initial data of the form $\psi_N = d_N P_N W(\sqrt{N}\varphi)\psi$, for an arbitrary $\psi \in \mathcal{F}$ with $\langle \psi, \mathcal{N}^m \psi \rangle \lesssim 1$ for some $m \in \mathbb{N}$ large enough, and where $d_N \simeq N^{1/4}$ is chosen such that $\psi_N$ is normalized.
3. A probabilistic setting
In this last section, we formulate the convergence towards the mean field Hartree dynamics in a language more common in probability theory. To this end, we consider an $N$-particle system, described by a permutation symmetric wave function $\psi_N \in L^2(\mathbb{R}^{3N})$. We consider, moreover, a self-adjoint operator $O$ acting on $L^2(\mathbb{R}^3)$. For $j = 1, \dots, N$, we define $O^{(j)}$ to be the self-adjoint operator acting as $O$ on the $j$-th particle and as the identity on the other $(N - 1)$ particles. Every $O^{(j)}$ can be thought of as a random variable assuming different values with different probabilities. Through the spectral theorem, the wave function $\psi_N$ determines the law of the random variables $O^{(j)}$. At time $t = 0$, we assume that the system is described by a factorized wave function $\psi_N = \varphi^{\otimes N}$, for some $\varphi \in L^2(\mathbb{R}^3)$. The operators $O^{(j)}$, $j = 1, \dots, N$, define then a sequence of independent and identically distributed random variables, with
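a common distribution determined by $\varphi$. We obtain immediately the (weak) law of large numbers, stating that, for any $\delta > 0$,
$$\mathbb{P}_{\varphi^{\otimes N}} \left( \left| \frac{1}{N} \sum_{j=1}^N O^{(j)} - \langle \varphi, O\varphi \rangle \right| \ge \delta \right) \to 0$$
as $N \to \infty$. We also have a central limit theorem, stating that, in distribution,
$$\frac{1}{\sqrt{N}} \sum_{j=1}^N \left( O^{(j)} - \langle \varphi, O\varphi \rangle \right) \to N(0, \sigma^2)$$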
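where $N(0, \sigma^2)$ denotes a centered Gaussian random variable, with the variance $\sigma^2 = \langle \varphi, O^2\varphi \rangle - \langle \varphi, O\varphi \rangle^2$.
Now, let us consider the evolution of the many body quantum system, as generated by the mean field Hamiltonian (2). The wave function describing the evolved system is given by $\psi_{N,t} = e^{-iH_N t}\varphi^{\otimes N}$. For $t \neq 0$, $\psi_{N,t}$ is not factorized; hence, the random variables $O^{(j)}$ are not independent. Still, the results presented above imply that correlations are small in the limit of large $N$, at least in the sense of the reduced densities. It seems therefore natural to ask whether the law of large numbers and the central limit theorem are still valid at $t \neq 0$. It turns out that the convergence of the reduced density, as stated in Theorem 2.3, easily implies the (weak) law of large numbers. In fact, letting $\widetilde{O} = O - \langle \varphi_t, O\varphi_t \rangle$, we find
$$\begin{aligned}
\mathbb{P}_{\psi_{N,t}} \left( \left| \frac{1}{N} \sum_{i=1}^N \widetilde{O}^{(i)} \right| \ge \delta \right) &\le \frac{1}{\delta^2 N^2} \left\langle \psi_{N,t}, \left( \sum_{i=1}^N \widetilde{O}^{(i)} \right)^2 \psi_{N,t} \right\rangle \\
&= \frac{N-1}{\delta^2 N} \operatorname{tr} \gamma^{(2)}_{N,t} \big( \widetilde{O} \otimes \widetilde{O} \big) + \frac{1}{\delta^2 N} \operatorname{tr} \gamma^{(1)}_{N,t} \widetilde{O}^2 \\
&\to \frac{1}{\delta^2} \operatorname{tr} |\varphi_t\rangle\langle\varphi_t|^{\otimes 2} \big( \widetilde{O} \otimes \widetilde{O} \big) = 0
\end{aligned}$$
because, by definition of $\widetilde{O}$, $\operatorname{tr} |\varphi_t\rangle\langle\varphi_t| \widetilde{O} = \langle \varphi_t, \widetilde{O}\varphi_t \rangle = 0$.
What about the central limit theorem at $t \neq 0$? The answer to this question is given in the next theorem, which was recently proven in [1].
Theorem 3.1. Suppose that $V^2(x) \le D(1 - \Delta)$ for a constant $D > 0$. Let $\psi_{N,t} = e^{-iH_N t}\varphi^{\otimes N}$ with $H_N$ given in (2) and for $\varphi \in H^1(\mathbb{R}^3)$. Let $O$ be a bounded self-adjoint operator on $L^2(\mathbb{R}^3)$, with $\|\nabla O (1 - \Delta)^{-1/2}\| < \infty$. Then, for every $t \in \mathbb{R}$, with respect to the evolved wave function $\psi_{N,t}$, the random variable
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N \left( O^{(i)} - \langle \varphi_t, O\varphi_t \rangle \right)$$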
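converges in distribution, as $N \to \infty$, to a centered Gaussian random variable with variance
$$\sigma_t^2 = \Big[ \big\langle \theta(t;0)\big(O\varphi_t, \overline{O\varphi_t}\big),\; \theta(t;0)\big(O\varphi_t, \overline{O\varphi_t}\big) \big\rangle - \Big| \Big\langle \theta(t;0)\big(O\varphi_t, \overline{O\varphi_t}\big),\; \tfrac{1}{\sqrt{2}}\big(\varphi, \overline{\varphi}\big) \Big\rangle \Big|^2 \Big]$$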
where $\theta(t;0)$ is the time-dependent Bogoliubov transformation introduced in (19). Hence, for $t \neq 0$, the central limit theorem is still valid but the variance of the limiting Gaussian variable is changed. While on the level of the law of large numbers there is no difference between $\psi_{N,t}$ and the product $\varphi_t^{\otimes N}$, this is not true for the central limit theorem; although in both cases the fluctuations are Gaussian, their variance is not the same.
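The $t = 0$ statements of this section (the law of $O^{(j)}$ given by the spectral theorem, the law of large numbers, and the variance $\sigma^2 = \langle \varphi, O^2\varphi \rangle - \langle \varphi, O\varphi \rangle^2$) can be illustrated in a finite-dimensional toy model. The sketch below is only an illustration under assumptions that are not in the text: the one-particle space is $\mathbb{C}^3$ instead of $L^2(\mathbb{R}^3)$, the observable is diagonal, and the names `phi`, `outcomes` and `empirical_average` are hypothetical. It does not touch the interacting dynamics at $t \neq 0$.

```python
import random

# Toy model (assumption): one-particle space C^3, observable O = diag(-1, 0, 2).
phi = [0.5, 0.5j, 2 ** -0.5]        # normalized one-particle state
outcomes = [-1.0, 0.0, 2.0]         # eigenvalues of the diagonal observable O

# Spectral theorem for a diagonal O: the outcome o_k occurs with probability |phi_k|^2.
probs = [abs(c) ** 2 for c in phi]
mean = sum(p * o for p, o in zip(probs, outcomes))          # <phi, O phi>
second = sum(p * o * o for p, o in zip(probs, outcomes))    # <phi, O^2 phi>
variance = second - mean ** 2                               # sigma^2


def empirical_average(n_particles, rng):
    """Sample (1/N) sum_j O^(j) in the product state: N independent draws."""
    draws = rng.choices(outcomes, weights=probs, k=n_particles)
    return sum(draws) / n_particles


rng = random.Random(0)
avg = empirical_average(20000, rng)   # close to `mean` by the law of large numbers
```

For a factorized state the draws are independent, which is exactly why the classical limit theorems apply; the content of Theorem 3.1 is that a central limit theorem survives, with a different variance, once the particles interact.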
References
[1] G. Ben Arous, K. Kirkpatrick, and B. Schlein, A central limit theorem in many-body quantum dynamics. Preprint. arXiv:1111.6999.
[2] L. Chen, J. O. Lee, and B. Schlein, Rate of convergence towards Hartree dynamics. J. Statist. Phys. 144 no. 4 (2011), 872–903.
[3] L. Erdős and H.-T. Yau, Derivation of the nonlinear Schrödinger equation from a many body Coulomb system. Adv. Theor. Math. Phys. 5 no. 6 (2001), 1169–1205.
[4] J. Ginibre and G. Velo, The classical field limit of scattering theory for non-relativistic many-boson systems. I and II. Commun. Math. Phys. 66 (1979), 37–76, and 68 (1979), 45–68.
[5] K. Hepp, The classical limit for quantum mechanical correlation functions. Commun. Math. Phys. 35 (1974), 265–277.
[6] A. Knowles and P. Pickl, Mean-field dynamics: singular potentials and rate of convergence. Comm. Math. Phys. 298 (2010), 101–139.
[7] I. Rodnianski and B. Schlein, Quantum fluctuations and rate of convergence towards mean field dynamics. Comm. Math. Phys. 291 no. 1 (2009), 31–61.
[8] H. Spohn, Kinetic equations from Hamiltonian dynamics. Rev. Mod. Phys. 52 no. 3 (1980), 569–615.
Institute for applied mathematics, University of Bonn, Endenicher Allee 60, 53115 Bonn, Germany E-mail: [email protected]
Combinatorics of asymptotic representation theory Piotr Śniady
Abstract. The representation theory of the symmetric groups S(n) is intimately related to combinatorics: combinatorial objects such as Young tableaux and combinatorial algorithms such as the Murnaghan–Nakayama rule. In the limit as n tends to infinity, the structure of these combinatorial objects and algorithms becomes complicated and it is hard to extract from them meaningful answers to asymptotic questions. In order to overcome these difficulties, a kind of dual combinatorics of the representation theory of the symmetric groups was initiated in the 1990s. We will concentrate on one of its highlights: Kerov polynomials, which express characters in terms of so-called free cumulants.
2010 Mathematics Subject Classification. Primary 20C30; Secondary 05E10, 46L54.
Keywords. representations of symmetric groups, Young diagrams, asymptotic representation theory, free cumulants, Kerov polynomials.
Dedicated to Augustyn Kałuża, my Teacher.
This note is a guided tour through some selected topics of the asymptotic representation theory of the symmetric groups and its combinatorics. Our guide will be the formula
$$\overbrace{\mathrm{Ch}_5}^{\text{character}} = \overbrace{R_6 + 15R_4 + 5R_2^2 + 8R_2}^{\text{shape}}. \tag{$*$}$$
The forthcoming sections are devoted to an explanation of the cryptic quantities involved here as well as to exploration of the interesting features of this equality.
1. Representations and characters
The left-hand side of (∗) is a character, a fundamental object in representation theory. In this section we will briefly review this theory.
1.1. Example: representation of the symmetric group S(3). Roughly speaking, the subject of representation theory is the investigation of the ways in which a given abstract group can be realized concretely as a group of matrices. Before we give a formal definition let us look at a simple example. The symmetric group S(3) is the group of permutations of the set {1, 2, 3}. If we label the vertices of an equilateral triangle by the elements of this set (Figure 1a), any element of S(3) gives rise to a symmetry of the triangle, thus to an isometry of the plane. If the coordinate system is chosen properly, these isometries are linear
and thus described by 2 × 2 matrices. We can say (abusing the terminology a bit) that we represented the symmetric group S(3) as certain 2 × 2 matrices. Formally, a representation of a group G is a homomorphism $\rho : G \to M_n$ which to the elements of the group associates invertible matrices.
Figure 1: (a) Equilateral triangle on the plane, with vertices labeled 1, 2, 3. (b) Regular dodecahedron and one of the five cubes (the dashed lines) which can be inscribed into it.
1.2. Example: representation of the alternating group A(5). The above example was too simple; we will now present a less obvious one. It is possible to inscribe a cube into the regular dodecahedron in such a way that each vertex of the cube is also a vertex of the dodecahedron (Figure 1b). For a fixed dodecahedron there are five such cubes. Thus to any rotation of the dodecahedron corresponds a permutation of the cubes. This permutation is even or, in other words, belongs to the alternating group A(5), and this correspondence is bijective. We can reverse the perspective: to any element of the alternating group A(5) we associate the corresponding rotation of the dodecahedron. If the coordinate system is chosen properly, this rotation is a linear isometry. In this way we constructed an interesting representation of the alternating group A(5).
1.3. Motivations. As we already mentioned, representation theory studies the ways a given abstract group can be represented concretely as a group of matrices. In this way we can use the power of linear algebra in order to study problems from group theory. Another motivation comes from harmonic analysis. One of the most powerful tools for analysis and probability on the real line $\mathbb{R}$ is the Fourier transform. If we replace the real line $\mathbb{R}$ by the finite cyclic group $\mathbb{Z}_n$, we should simply use the discrete Fourier transform instead. It is less obvious how to define the Fourier transform on a non-commutative finite group G.
It turns out that representations of the group G are the right tool to define such an analogue.
1.4. Characters. If we view a representation $\rho : G \to M_n$ as a matrix-valued function, its values depend on the choice of the coordinate system in the vector space. Sometimes it would be preferable to have some quantities which do not
depend on such choices. One such quantity is the trace of a matrix. This motivates the study of the character of ρ,
$$\chi^\rho(g) := \operatorname{Tr} \rho(g) \quad \text{for } g \in G,$$
which is a scalar-valued function on the group G. A significant part of representation theory is devoted to the investigation of such characters. At first sight it might appear that changing the focus from representations to characters might cause some loss of information because a matrix contains much more data than just its trace. Surprisingly, this is not the case: almost all natural questions of representation theory can be reformulated in the language of characters. The left-hand side of our guiding formula (∗) is such a character (up to some normalizing factors which will be discussed later).
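The triangle example of Section 1.1 and the definition of the character can be made concrete in a few lines. The following is a minimal sketch, not taken from the text: the placement of the triangle (centred at the origin, first vertex on the vertical axis) and the helper names `rho`, `compose`, `matmul` are our own choices, and vertices 1, 2, 3 are stored at indices 0, 1, 2.

```python
import itertools
import math

# Vertices 1, 2, 3 of an equilateral triangle centred at the origin (indices 0, 1, 2).
VERTS = [(math.cos(math.pi / 2 + 2 * math.pi * j / 3),
          math.sin(math.pi / 2 + 2 * math.pi * j / 3)) for j in range(3)]


def rho(p):
    """2x2 matrix of the linear map sending vertex i to vertex p[i]."""
    a, b = VERTS[0], VERTS[1]            # two vertices form a basis of the plane
    ta, tb = VERTS[p[0]], VERTS[p[1]]    # their prescribed images
    det = a[0] * b[1] - b[0] * a[1]
    # M = [ta tb] * [a b]^(-1), written out entry by entry.
    return [[(ta[0] * b[1] - tb[0] * a[1]) / det, (tb[0] * a[0] - ta[0] * b[0]) / det],
            [(ta[1] * b[1] - tb[1] * a[1]) / det, (tb[1] * a[0] - ta[1] * b[0]) / det]]


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(3))


def matmul(m1, m2):
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(m):
    return m[0][0] + m[1][1]


S3 = list(itertools.permutations(range(3)))
# Character values: traces are integers here, so rounding removes float noise.
characters = {p: round(trace(rho(p))) for p in S3}
```

Because the three vertices sum to zero, fixing the images of two vertices automatically sends the third one to the right place, so each `rho(p)` is a symmetry of the triangle. The character takes one value on the identity, one on the three reflections and one on the two rotations, and none of these values depends on how the triangle was placed in the plane.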
2. Young diagrams and their shapes
The right-hand side of our favorite equality (∗) describes the shape of a Young diagram. This section is devoted to this concept.
2.1. Irreducible representations. If $\rho_1 : G \to M_{n_1}$ and $\rho_2 : G \to M_{n_2}$ are representations of the same group G, we can define a new representation of G, called the direct sum $\rho_1 \oplus \rho_2 : G \to M_{n_1 + n_2}$, which is given by the block matrices
$$(\rho_1 \oplus \rho_2)(g) = \begin{pmatrix} \rho_1(g) & 0 \\ 0 & \rho_2(g) \end{pmatrix}.$$
Representations which can be written (possibly after a change of the coordinate system) as direct sums of smaller representations are called reducible and are less interesting. Our attention will concentrate on representations for which such a decomposition is not possible; they are called irreducible representations and play a fundamental role in representation theory (for example they are used in the construction of the non-commutative Fourier transform). The corresponding characters are called irreducible characters and they are the focus of this article.
2.2. Irreducible representations of the symmetric groups and Young diagrams. There is a bijection between irreducible representations of the symmetric group S(n) and Young diagrams with n boxes. The latter are collections of boxes which are aligned to the left and to the bottom (Figure 2a). For a Young diagram λ we will denote the corresponding irreducible representation by $\rho^\lambda$. Unfortunately, the details of this bijection are technically involved. In order to give the flavor of this difficulty we mention only that one of the irreducible representations of the symmetric group S(5) is closely related to the not-so-trivial representation of its subgroup A(5) ⊂ S(5) which we discussed in Section 1.2.
Figure 2: (a) Young diagram λ = (3, 1) corresponding to the partition 4 = 3 + 1. (b) The dilation 2λ of the Young diagram λ shown on the left.
2.3. Shape of the Young diagram. What can we say about the irreducible representations of the symmetric groups when the corresponding Young diagrams tend to infinity, having a fixed ‘macroscopic shape’? In order to make this question more concrete, we will use the notion of dilation. If s is a positive integer and λ is a Young diagram we will denote by sλ the dilated Young diagram, obtained from λ by replacing each box by an s × s grid of boxes (Figure 2b). If we disregard the size, such a dilated Young diagram has the same shape as the original diagram (compare Figures 2a and 2b). Our original question can therefore be reformulated as the investigation of the dilated Young diagrams sλ, where λ is fixed and s → ∞.
2.4. Homogeneous functions. For this kind of asymptotic problem we need the right tools: functions on the set of Young diagrams which depend ‘nicely’ on the shape of the Young diagram. For example we could require from such a nice function f that it depends only on the shape of the Young diagram and not on its size: $f(s\lambda) = f(\lambda)$. This requirement is too strong; it would not be a big problem if we allow a simple dependence of f on the size of the Young diagram: $f(s\lambda) = s^k f(\lambda)$ for some exponent k. If this is the case we say that f is homogeneous of degree k.
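As a small illustration of dilation and homogeneity, a Young diagram can be stored as the tuple of its row lengths, bottom row first. The sketch below uses this encoding and the hypothetical helper names `dilate` and `size` (neither comes from the text); the number of boxes is then the simplest example of a homogeneous function, of degree 2.

```python
def dilate(lam, s):
    """Dilated diagram s*lam: every box is replaced by an s x s grid of boxes."""
    # Each row is stretched by the factor s and repeated s times.
    return tuple(s * row for row in lam for _ in range(s))


def size(lam):
    """Number of boxes of the diagram."""
    return sum(lam)


lam = (3, 1)                  # the diagram of Figure 2a
doubled = dilate(lam, 2)      # the diagram of Figure 2b
```

With this encoding, `size(dilate(lam, s))` equals `s**2 * size(lam)` for every positive integer `s`, which is the relation $f(s\lambda) = s^k f(\lambda)$ with $k = 2$.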
3. Relationship between characters and the shape?
Our favorite formula (∗) gives a relationship between the irreducible characters of the symmetric groups and the shape of the Young diagram. In this section we will investigate this kind of relationship. For a wide class of questions concerning irreducible representations of the symmetric groups there is a known answer given in terms of some combinatorial algorithm involving boxes of the Young diagram. For example, the dimension of the
irreducible representation $\rho^\lambda$ is equal to the number of Young tableaux filling λ. A Young tableau is a filling of the boxes of the Young diagram λ with numbers 1, 2, . . . , n (where n is the number of boxes of λ) in such a way that each number is used exactly once and the numbers increase from left to right and from bottom to top (Figure 3). Investigation of algorithms with a similar flavor is one of the important branches of combinatorics.
 75  81  89  98 100
 58  60  72  94  99
 51  56  62  93  95
 26  38  54  79  92
 18  33  37  59  87
 12  20  35  36  42  46  67  68  70  78  82  84  88  90  97
 11  17  19  22  30  43  52  55  64  65  66  74  83  85  96
  8  10  13  21  23  29  34  45  47  49  63  71  76  80  91
  2   7   9  15  16  24  27  39  41  44  48  57  69  77  86
  1   3   4   5   6  14  25  28  31  32  40  50  53  61  73
Figure 3: Example of a Young tableau. Shaded regions show boxes with numbers smaller than some thresholds.
In particular, irreducible characters of the symmetric groups
$$\chi^\lambda(\pi) := \operatorname{Tr} \rho^\lambda(\pi) \quad \text{for } \pi \in S(n)$$
can be calculated using such a combinatorial algorithm, the Murnaghan–Nakayama rule, which is a signed sum over, roughly speaking, Young tableaux filling λ with some additional properties (related to the conjugacy class of the permutation π). Unfortunately, it is a common feature of such combinatorial algorithms that they quickly become cumbersome when the number of boxes of the Young diagram tends to infinity. For this reason they are not very suitable for the investigation of asymptotic problems. In particular, the Murnaghan–Nakayama rule does not give much insight into our favorite question (the answer to which is given by our guiding equality (∗)) about the relationship between the characters and the shape of the Young diagram. In order to overcome these difficulties we will have to find a better normalization of the characters as well as a good way of describing the shape of a Young diagram.
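The statement that the dimension of $\rho^\lambda$ equals the number of Young tableaux filling λ can be turned into a short recursion: the largest entry n must sit in a corner box, and deleting it leaves a Young tableau on a smaller diagram. The sketch below is one way to organize this count, with the diagram again encoded as a tuple of row lengths (bottom row first); the function name `count_tableaux` is ours.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_tableaux(lam):
    """Number of Young tableaux filling the diagram with row lengths lam."""
    if sum(lam) == 0:
        return 1                       # the empty diagram has exactly one filling
    total = 0
    for i, row in enumerate(lam):
        nxt = lam[i + 1] if i + 1 < len(lam) else 0
        if row > nxt:                  # last box of row i is a corner: n may sit there
            smaller = lam[:i] + (row - 1,) + lam[i + 1:]
            total += count_tableaux(tuple(r for r in smaller if r > 0))
    return total
```

Even with memoization, the number of intermediate diagrams grows quickly with n, which is a small-scale view of why such algorithms are awkward for asymptotic questions.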
4. Normalized characters
The left-hand side of our favorite equality (∗) is the normalized character. In this section we will present the details of this quantity.
The usual way of studying the characters of the symmetric groups is to fix the Young diagram λ and to view $\chi^\lambda(\pi)$ as a function of the permutation π. It was a brilliant idea of Kerov and Olshanski to do the opposite and to study the dual combinatorics of representations of the symmetric groups, see below. For a fixed integer k ≥ 1 we will denote by [k] = (1, 2, . . . , k) ∈ S(k) the full cycle; we will investigate the characters evaluated on the permutation [k]. Let λ be a Young diagram with n boxes; we are interested in the character $\operatorname{Tr} \rho^\lambda([k])$. It might seem that this quantity does not make much sense since [k] belongs to S(k) while $\rho^\lambda$ is a representation of a different group, namely S(n). Nevertheless, for n ≥ k we can consider an embedding S(k) ⊂ S(n) and regard [k] as an element of S(n) simply by adding additional fixed points. It turns out that the ‘right’ way to define the normalized character on a cycle of length k is as follows:
$$\mathrm{Ch}_k(\lambda) := \underbrace{n(n-1) \cdots (n-k+1)}_{k \text{ factors}}\; \frac{\operatorname{Tr} \rho^\lambda([k])}{\operatorname{Tr} \rho^\lambda(e)},$$
where n is the number of boxes of λ. As we already mentioned, we will view the normalized character $\mathrm{Ch}_k$ as a function on the set of Young diagrams and we impose no restrictions on the number of boxes of the Young diagrams. The normalization factor in the above definition is equal to zero if n < k; in this way the right-hand side is equal to zero and we do not have to worry that $\rho^\lambda([k])$ is not well-defined in this case. The denominator $\operatorname{Tr} \rho^\lambda(e)$, the character on the group unit, is just the dimension of the representation $\rho^\lambda$ and there are some effective methods of calculating it. This means that the normalized characters $\mathrm{Ch}_k$ contain essentially the same information as the usual characters $\chi^\lambda(\pi)$, thus they are just as interesting. On the other hand the normalized characters $\mathrm{Ch}_k$ have some advantages over the usual characters $\chi^\lambda(\pi)$; for example $(\lambda_1, \lambda_2, \dots) \mapsto \mathrm{Ch}_k(\lambda)$ is a polynomial function of the lengths of the rows of the Young diagram $\lambda = (\lambda_1, \lambda_2, \dots)$.
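For small diagrams the normalized character can be evaluated directly from this definition. The following sketch makes two choices that are not spelled out in the text and should be read as assumptions: the dimension is computed with the classical hook length formula, and the character on a $k$-cycle is computed with the Murnaghan–Nakayama rule in its "beta-number" form (a rim hook of $k$ boxes is removed by lowering one first-column hook length by $k$). All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def dimension(lam):
    """Tr rho^lam(e): number of Young tableaux of shape lam (hook length formula)."""
    hooks = 1
    for i, row in enumerate(lam):
        for j in range(row):
            arm = row - j - 1                              # boxes further along the row
            leg = sum(1 for r in lam[i + 1:] if r > j)     # boxes further along the column
            hooks *= arm + leg + 1
    return factorial(sum(lam)) // hooks


def cycle_character(lam, k):
    """Tr rho^lam([k]): signed sum over the ways of stripping one rim hook of k boxes."""
    ell = len(lam)
    beta = [row + (ell - 1 - i) for i, row in enumerate(lam)]   # first-column hook lengths
    total = 0
    for b in beta:
        if b - k < 0 or (b - k) in beta:
            continue                                       # no rim hook of length k here
        height = sum(1 for c in beta if b - k < c < b)     # decides the sign of the hook
        new_beta = sorted([c for c in beta if c != b] + [b - k], reverse=True)
        mu = tuple(c - (ell - 1 - i) for i, c in enumerate(new_beta))
        total += (-1) ** height * dimension(tuple(r for r in mu if r > 0))
    return total


def normalized_character(lam, k):
    """Ch_k(lam) = n(n-1)...(n-k+1) * Tr rho^lam([k]) / Tr rho^lam(e)."""
    n = sum(lam)
    if n < k:
        return Fraction(0)
    falling = 1
    for i in range(k):
        falling *= n - i
    return Fraction(falling * cycle_character(lam, k), dimension(lam))
```

For instance, the one-row diagram with five boxes carries the trivial representation, so `normalized_character((5,), 5)` reduces to the falling factorial 5 · 4 · 3 · 2 · 1.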
5. Free cumulants
The right-hand side of our favorite formula (∗) concerns the shape of the Young diagram. The question is: how to choose parameters which would describe the shape of the Young diagram in the best way? The answer comes from Voiculescu’s free probability theory.
5.1. Random matrices and free cumulants. Voiculescu initiated a highly non-commutative probability theory called free probability. One of its highlights is related to random matrices. We consider the following concrete problem. Let
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}$$
be an n × n random matrix, selected uniformly from the manifold of all hermitian matrices with prescribed eigenvalues $x_1, \dots, x_n$. Let 1 ≤ m < n; what can we say about the eigenvalues of the m × m upper-left corner
$$A' = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mm} \end{pmatrix}?$$
In the limit as n → ∞ and $\frac{m}{n}$ converges to some limit, a kind of law of large numbers occurs and these eigenvalues with high probability concentrate around some limit distribution depending only on the distribution of the eigenvalues $x_1, \dots, x_n$ of the big matrix. As an illustration, we present the results of a computer experiment for a large matrix A with eigenvalues −80, 0, 120 (each with some high multiplicity which will be discussed later). The eigenvalues of the corner matrix A′ are shown as ‘plus’ markers on the z-axis in Figure 4. Some of these eigenvalues are degenerate and coincide with the three eigenvalues of the original matrix (thick markers). The remaining eigenvalues occupy two intervals: one around −40 and one around 80; the empirical density of these eigenvalues turns out to be quite close to the asymptotic value for n → ∞.
In order to explain this law of large numbers phenomenon we will use free cumulants. The basic idea is that even though the random matrix A is a complicated, multidimensional object, its corner entry $a_{11}$ is just a complex-valued random variable which can be investigated by (the logarithm of) the Fourier–Laplace transform. The fact that our random matrix A has a large symmetry implies that this corner entry $a_{11}$ contains essentially all information about A which is necessary for asymptotic problems. The free cumulants $R_k = R_k(A)$ of the random matrix A (with k ∈ {1, 2, . . . }) are defined as suitably normalized coefficients in the expansion
$$\log \mathbb{E}\, e^{t a_{11}} = t R_1 + \frac{t^2}{2n} R_2 + \frac{t^3}{3n^2} R_3 + \cdots.$$
The normalization was chosen in such a way that the free cumulants converge to finite values as n → ∞. Free cumulants contain the same information as the eigenvalues of the matrix but they are much more convenient for asymptotic problems. For example, the solution to our problem of the eigenvalues of the corner A′ is given by
$$R_{k+1}(A') = \left( \frac{m}{n} \right)^k R_{k+1}(A) \tag{1}$$
which is a direct consequence of the above definition of free cumulants.
Figure 4: The ‘plus’ markers on the z-axis indicate eigenvalues of a large random matrix A′. A large Young tableau analogous to Figure 3 (individual boxes were not shown for clarity). Shaded regions show boxes with numbers smaller than some thresholds. The diagonal lines show the z-coordinates of the concave corners of the Young diagram drawn with a thick line.
5.2. Free cumulants of a Young diagram. It was observed by Biane that to an irreducible representation $\rho^\lambda$ of a symmetric group one can associate a certain large matrix $\Gamma_\lambda$ which contains all information about $\rho^\lambda$. The eigenvalues of this matrix are nicely related to the shape of the Young diagram: the z-coordinates (defined as x − y) of the concave corners of the Young diagram (indicated by the diagonal gray lines in Figure 2a) are the eigenvalues of this matrix. For example, the matrix $\Gamma_\lambda$ associated to the Young diagram from Figure 2a has eigenvalues −2, 0, 3 (each with some high multiplicity). In particular, the eigenvalues of $\Gamma_\lambda$ and of $\Gamma_{s\lambda}$ are related to each other by a simple scaling by the factor s; compare Figures 2a and 2b. Biane defined the free cumulants of the Young diagram λ,
$$R_k(\lambda) := R_k(\Gamma_\lambda),$$
as free cumulants of the corresponding matrix $\Gamma_\lambda$. The free cumulant $R_k$ is a homogeneous function of degree k on the set of Young diagrams: $R_k(s\lambda) = s^k R_k(\lambda)$; in other words free cumulants are examples of the ‘nice’ functions which we were looking for in Section 2.4. From our perspective we can forget that free cumulants of
Young diagrams have this long and interesting history related to random matrices and simply treat them as convenient parameters which describe the shape of the Young diagram and can be calculated efficiently. For example, for a Young diagram λ with n boxes (Figure 5)
$$R_3(\lambda) = 2 \iint_{(x,y) \in \lambda} (x - y)\, dx\, dy; \qquad R_4(\lambda) = 3 \iint_{(x,y) \in \lambda} (x - y)^2\, dx\, dy - \frac{3}{2} n^2.$$
Figure 5: Intuitive meaning of the free cumulants $R_3$ and $R_4$ as parameters describing the shape of a Young diagram.
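The two integral formulas can be evaluated exactly, box by box, which also gives a quick check of the homogeneity $R_k(s\lambda) = s^k R_k(\lambda)$. The sketch below assumes that the box in row $i$ and column $j$ (counted from 0, bottom row first) is the unit square $j \le x \le j+1$, $i \le y \le i+1$; the function names are ours.

```python
from fractions import Fraction


def box_integral(lam, p):
    """Exact double integral of (x - y)^p over the diagram with row lengths lam."""
    total = Fraction(0)
    for i, row in enumerate(lam):          # row i occupies i <= y <= i + 1
        for j in range(row):               # box j of this row occupies j <= x <= j + 1
            c = j - i
            # Integral over one unit box, from the antiderivative
            # -(x - y)^(p + 2) / ((p + 1)(p + 2)) evaluated at the four corners.
            total += Fraction((c + 1) ** (p + 2) + (c - 1) ** (p + 2) - 2 * c ** (p + 2),
                              (p + 1) * (p + 2))
    return total


def R3(lam):
    return 2 * box_integral(lam, 1)


def R4(lam):
    n = sum(lam)
    return 3 * box_integral(lam, 2) - Fraction(3, 2) * n ** 2


def dilate(lam, s):
    """Dilated diagram s*lam (each box becomes an s x s grid)."""
    return tuple(s * row for row in lam for _ in range(s))
```

For the diagram λ = (3, 1) of Figure 2a this gives `R3((3, 1)) == 4` and `R4((3, 1)) == -4`, and dilating by `s` multiplies these values by `s**3` and `s**4` respectively.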
6. Kerov polynomials
It was proved by Kerov that free cumulants $R_k = R_k(\lambda)$ can be used for calculation of the normalized characters $\mathrm{Ch}_k = \mathrm{Ch}_k(\lambda)$. For example,
$$\mathrm{Ch}_2 = R_3, \qquad \mathrm{Ch}_3 = R_4 + R_2, \qquad \mathrm{Ch}_4 = R_5 + 5R_3, \qquad \mathrm{Ch}_5 = R_6 + 15R_4 + 5R_2^2 + 8R_2.$$
The right-hand sides are called Kerov polynomials. The reader can recognize that our guiding formula (∗) is among them. We will discuss some interesting features of these polynomials.
6.1. $\mathrm{Ch}_5 \approx R_6$. We evaluate the equality (∗) on the dilated diagram sλ. Homogeneity of free cumulants implies that
$$\mathrm{Ch}_5(s\lambda) = \underbrace{s^6 R_6(\lambda)}_{\text{degree 6}} + \underbrace{15 s^4 R_4(\lambda) + 5 s^4 R_2^2(\lambda)}_{\text{degree 4}} + \underbrace{8 s^2 R_2(\lambda)}_{\text{degree 2}}.$$
In the limit s → ∞ only the top-degree part really matters, therefore we can informally write $\mathrm{Ch}_5 \approx R_6$. It was shown by Biane that this is a general phenomenon: the value of the (normalized) irreducible character on the cycle [k] is (asymptotically, for large Young diagrams) given by the free cumulant $R_{k+1}$ of the Young diagram: $\mathrm{Ch}_k \approx R_{k+1}$. The left-hand side is the quantity which we wanted to understand because it is so fundamental for representation theory; the right-hand side can be efficiently calculated from the shape of the Young diagram. This is a beautiful result and we will present one of its applications below.
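The four Kerov polynomials listed at the beginning of this section can be spot-checked on diagrams whose characters are known without any computation: the one-row diagram carries the trivial representation and the one-column diagram the sign representation, so $\mathrm{Ch}_k$ equals $n(n-1)\cdots(n-k+1)$, respectively $(-1)^{k-1} n(n-1)\cdots(n-k+1)$. The sketch below rests on an assumption which is standard but is not spelled out in the text: the free cumulants of λ are computed as the free cumulants of the probability measure supported on the concave corners, with weights given by the residues of $\prod (z - y_j) / \prod (z - x_i)$, where the $x_i$ are the z-coordinates of the concave corners and the $y_j$ those of the convex corners (Kerov's transition measure). All names are hypothetical.

```python
from fractions import Fraction


def corners(lam):
    """z = x - y coordinates of the concave and of the convex corners of lam."""
    concave = [Fraction(-len(lam))]
    convex = []
    for i, row in enumerate(lam):
        if i == 0 or row < lam[i - 1]:
            concave.append(Fraction(row - i))        # a box could be added to row i
        nxt = lam[i + 1] if i + 1 < len(lam) else 0
        if row > nxt:
            convex.append(Fraction(row - i - 1))     # last box of row i can be removed
    return concave, convex


def poly_mul(a, b, order):
    """Product of two power series, truncated after degree `order`."""
    c = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                c[i + j] += ai * bj
    return c


def free_cumulants(lam, order):
    """List R with R[k] = R_k(lam) for 1 <= k <= order (R[0] is unused)."""
    concave, convex = corners(lam)
    weights = []
    for x in concave:                                # residue at the concave corner x
        w = Fraction(1)
        for y in convex:
            w *= x - y
        for other in concave:
            if other != x:
                w /= x - other
        weights.append(w)
    moments = [sum(w * x ** k for w, x in zip(weights, concave)) for k in range(order + 1)]
    # Moment-cumulant relation M(z) = 1 + sum_k R_k z^k M(z)^k, solved degree by degree.
    cumulants = [Fraction(0)] * (order + 1)
    for n in range(1, order + 1):
        acc = Fraction(0)
        power = [Fraction(1)] + [Fraction(0)] * order   # coefficients of M(z)^0
        for k in range(1, n):
            power = poly_mul(power, moments, order)     # now the coefficients of M(z)^k
            acc += cumulants[k] * power[n - k]
        cumulants[n] = moments[n] - acc
    return cumulants


def falling(n, k):
    out = 1
    for i in range(k):
        out *= n - i
    return out


def kerov_side(R, k):
    """Right-hand sides of the Kerov polynomials quoted in the text, k = 2, ..., 5."""
    return {2: R[3], 3: R[4] + R[2], 4: R[5] + 5 * R[3],
            5: R[6] + 15 * R[4] + 5 * R[2] ** 2 + 8 * R[2]}[k]
```

For the diagram of Figure 2a, `corners((3, 1))` returns the concave corners −2, 0, 3 quoted in Section 5.2, and the resulting $R_3$ and $R_4$ agree with the integral formulas given there, which lends some support to the assumption above.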
6.2. Biane’s law of large numbers. Let us randomly select a Young tableau filling prescribed Young diagram λ with n boxes and let us remove boxes with numbers bigger than some prescribed threshold 1 ≤ m < n (Figure 3). What can we say about the shape of the resulting smaller Young diagram µ? The results of computer experiments (Figure 4) suggest that with high probability these new Young diagrams asymptotically concentrate around some smooth limiting shapes. This problem can be reformulated in the language of the representation theory: ρλ is a representation of S(n); we consider its restriction ρλ ↓S(m) to the subgroup S(m). This restriction is usually a reducible representation; we decompose it into irreducible components and randomly select one of them, say ρµ . The distribution of the resulting random Young diagram µ has the same distribution as in the original problem. For large Young diagrams λ and µ we can write Tr χλ ([k]) , Tr χλ (e) λ Tr χµ ([k]) k Tr χ ([k]) = m . ERk+1 (µ) ≈ E Chk (µ) ≈ mk E Tr χµ (e) Tr χλ (e) Rk+1 (λ) ≈ Chk (λ) ≈ nk
By comparing these two approximate equalities we conclude that for a typical random Young diagram $\mu$ we can expect that

$$R_{k+1}(\mu) \approx \left(\frac{m}{n}\right)^{k} R_{k+1}(\lambda). \tag{2}$$

What a surprise! This equality has the same form as (1) for the random matrices. This is illustrated in Figure 4: the diagonal lines indicate the $z$-coordinates of the concave corners of the Young diagram $\mu$ drawn with a thick line, while the 'plus' markers indicate eigenvalues of a corner of a large random matrix $A$ with the same eigenvalues as $\Gamma_\lambda$. The parallelism between (1) and (2) implies that (asymptotically) the density of the eigenvalues of the corner matrix $A'$ should match the density of eigenvalues of $\Gamma_\mu$ and thus the $z$-coordinates of the concave corners of the diagram $\mu$. As one can see in Figure 4, there is indeed a good match.

6.3. Kerov positivity conjecture. An interesting feature of the examples of Kerov polynomials presented at the beginning of Section 6 is that their coefficients are non-negative integers. The Kerov positivity conjecture states that this is a general phenomenon. The fact that these coefficients are integers followed easily from Kerov's construction, but their positivity was rather mysterious. Combinatorialists tend to believe that if some reasonable integers turn out to be non-negative, there should be a natural explanation showing that they are cardinalities of some interesting objects. Following this line of thinking, positivity conjectures indicate that the object we are investigating might have some hidden underlying structure, and thus such conjectures are very inspirational for research. This was also the case with the Kerov conjecture; it initiated an investigation of the characters of the symmetric groups from a new perspective. We will review it in the following.
Figure 6: Map on the torus. The left side of the square should be glued to the right side, as well as bottom to top, as indicated by arrows.

6.4. Maps. A map is a graph drawn on an oriented surface (Figure 6). We will consider only maps which are bipartite (each vertex is either white or black, there are no edges between vertices of the same color), unicellular (if we remove the graph from the surface, the remaining part, called the cell, is homeomorphic to a disc), and labeled (the edges are labeled; if we go clockwise along the boundary of the cell and read every second label, they will form the sequence $1, 2, \dots, k$).

A map carries more information than just a graph; for example, for each vertex it makes sense to speak about the cyclic order of the incident edges. We can encode this by a cycle from the permutation group $S(k)$. By merging such disjoint cycles corresponding to white vertices (respectively, black vertices) we obtain a permutation $\sigma_1$ (respectively, $\sigma_2$). For example, the map presented in Figure 6 corresponds to $\sigma_1 = (1, 6)(2)(3)(4, 7, 5)$ and $\sigma_2 = (1, 2, 3, 5)(4, 7, 6)$. The permutations $\sigma_1$ and $\sigma_2$ contain the same information as the original map. In particular, the structure of the cells of our map can be recovered from the product $\sigma_1 \sigma_2$ (with $\sigma_2$ applied first); in our example $\sigma_1 \sigma_2 = (1, 2, 3, \dots, 7) = [7] \in S(7)$ has exactly one cycle, which reflects the fact that our map is unicellular. Studying maps is therefore equivalent to studying solutions of the equation

$$\sigma_1 \sigma_2 = [k] \qquad \text{with } \sigma_1, \sigma_2 \in S(k),$$

but maps have an advantage related to their geometric and graph-theoretic flavor.

6.5. Stanley character formula. Attempts to prove the Kerov conjecture have led in a natural way to the discovery of Stanley's formula for normalized characters:

$$\mathrm{Ch}_k(\lambda) = \sum_{M} (-1)^{k - \#\text{white vertices}}\, N_M(\lambda),$$
where the sum runs over all maps $M$ with $k$ edges. Above, $N_M(\lambda)$ denotes the number of embeddings of the map $M$ into the Young diagram $\lambda$ (Figure 7). An embedding is a function which maps white vertices to columns of $\lambda$, black vertices to rows of $\lambda$, and edges to boxes of $\lambda$. We also require that an embedding preserves incidence, i.e. a vertex $V$ and an incident edge $E$ should be mapped to a row or column $F(V)$ which contains the box $F(E)$.

Figure 7: (a) A map on the torus and (b) an example of its embedding: $F(\Sigma) = \alpha$, $F(\Pi) = \beta$, $F(V) = a$, $F(W) = c$, $F(1) = F(4) = (a\beta)$, $F(2) = F(5) = (a\alpha)$, $F(3) = (c\alpha)$. The columns of the Young diagram are indexed by Latin letters, the rows by Greek letters.

Stanley's formula is a perfect tool for studying asymptotics of characters of symmetric groups in various scalings. It was also the tool which was essential in the proof of the Kerov positivity conjecture (which will be discussed below).

6.6. Genus expansion. The function $\lambda \mapsto N_M(\lambda)$ is homogeneous, which explains why Stanley's formula is a perfect tool for our purposes. The degree of this homogeneous function,

$$\deg N_M = k + 1 - 2\,\mathrm{genus}(M),$$

is directly related to the genus of the surface on which the map $M$ is drawn. Thus the planar maps (which can be drawn on the sphere, genus equal to zero) have the maximal possible degree and asymptotically have the biggest contribution. This kind of genus expansion, in which one can associate to each combinatorial summand a surface which determines the asymptotics, is very common in asymptotic representation theory as well as in random matrix theory. The genus expansion and, in particular, the special role of planar maps explain why the combinatorics of free cumulants (originally formulated by Speicher in terms of non-crossing partitions, which are set partitions that can be drawn on a sphere) is so useful.

6.7. Proof of Kerov's conjecture and combinatorial interpretation of Kerov polynomials. The coefficient of the monomial $R_{i_1} \cdots R_{i_l}$ in the Kerov polynomial $\mathrm{Ch}_k$ turns out to be the number of maps with $k$ edges with black
vertices decorated by $R_{i_1}, \dots, R_{i_l}$ (Figure 6) such that the following transportation problem has a solution. We imagine that each white vertex is a factory producing a unit of some liquid, each black vertex decorated by $R_i$ is a consumer demanding $i - 1$ units of this liquid, and the edges of the map are one-way pipes which can transport the liquid only from white to black vertices. We require that it is possible to arrange the amount of the liquid in each pipe in such a way that each pipe transports a strictly positive amount. The map in Figure 6 fulfills this condition.

The fact that we require a strictly positive solution is quite unusual for such transportation problems and has some interesting consequences. Firstly, the classical criterion (given by Hall's marriage theorem) for checking whether such a transportation problem has a solution has to be changed. Secondly, this strict positivity requirement restricts the maps which could contribute to the coefficients of Kerov polynomials: such a map cannot contain a disconnecting edge except for edges leading to white leaves (Figure 6). This is quite a strong restriction; for example, the number of such maps with a fixed genus grows only polynomially with the size of the map, while the number of all maps with fixed genus grows exponentially. This implies that the coefficients of Kerov polynomials for $\mathrm{Ch}_k$ corresponding to a fixed genus grow relatively slowly (polynomially) with $k$. This should be compared with analogues of Kerov polynomials in which instead of free cumulants we use some other quantities describing the shape of the Young diagram; in the latter case the growth of the integer coefficients with $k$ is usually exponential. This is an indication that Kerov polynomials contain a relatively small amount of information and have small complexity, and thus that free cumulants are the right quantities for studying asymptotics of characters.

6.8. Gaussian fluctuations.
The definition of the normalized characters can be easily adapted to more complicated conjugacy classes; for example, we denote by $\mathrm{Ch}_{k,l}(\lambda)$ the normalized character on a pair of cycles of lengths $k$ and $l$. Following our guide (∗) we can write an analogue of Kerov polynomials, for example

$$\mathrm{Ch}_{3,2} = R_3 R_4 - 5R_2 R_3 - 6R_5 - 18R_3.$$

As one can see, the analogue of Kerov's positivity conjecture does not hold true. This is an indication that the characters $\mathrm{Ch}_{k,l}$ are not the right quantities. It turns out that it is much better to study a kind of covariance

$$\mathrm{Cov}(\mathrm{Ch}_k, \mathrm{Ch}_l) := \mathrm{Ch}_{k,l} - \mathrm{Ch}_k \mathrm{Ch}_l,$$

which measures how the character on two disjoint cycles differs from the product of the characters on each cycle separately. One can consider the corresponding Kerov polynomials, for example

$$\mathrm{Cov}(\mathrm{Ch}_3, \mathrm{Ch}_2) := \mathrm{Ch}_{3,2} - \mathrm{Ch}_3 \mathrm{Ch}_2 = -\left(6R_2 R_3 + 6R_5 + 18R_3\right);$$

apart from the global change of sign, all coefficients are again non-negative integers, which is an indication that such a covariance is the right quantity. Indeed, the combinatorial interpretation of the coefficients of Kerov polynomials from Section
6.7 holds true after some simple adjustments, including the requirement that we consider only connected maps (for unicellular maps this was automatic).

The degree of a function $F$ on Young diagrams is defined as the degree of the polynomial $s \mapsto F(s\lambda)$. The connectivity requirement in the combinatorial interpretation of Kerov polynomials influences the topology of the maps which we count; hence the degree of the covariance $\mathrm{Cov}(\mathrm{Ch}_k, \mathrm{Ch}_l)$ is smaller than the degrees of the individual summands $\mathrm{Ch}_k \mathrm{Ch}_l$ and $\mathrm{Ch}_{k,l}$, which will have interesting consequences.

If the number of cycles is bigger, instead of the covariance one should consider a cumulant $\kappa(\mathrm{Ch}_k, \mathrm{Ch}_l, \dots, \mathrm{Ch}_s)$ which measures in a more refined way how much the character $\mathrm{Ch}_{k,l,\dots,s}$ differs from products of characters with simpler cycle structure. All the above-mentioned results hold true also in this more general setup.

The fact that the degree of the cumulant $\kappa(\mathrm{Ch}_k, \mathrm{Ch}_l, \dots, \mathrm{Ch}_s)$ is much smaller than the sum of the degrees of the individual factors implies that (if a proper normalization is chosen) $\mathrm{Ch}_2, \mathrm{Ch}_3, \dots$, regarded in a rather abstract way as random variables, are asymptotically Gaussian. More specifically, this implies that a generalization of Kerov's Central Limit Theorem holds true: for a wide class of reducible representations of the symmetric groups, if we randomly select an irreducible component $\rho^\mu$ (as in Section 6.2), the fluctuations of the shape of $\mu$ will be asymptotically Gaussian. This is yet another application of Kerov polynomials and their combinatorics.
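The covariance example from Section 6.8 can be recomputed from the expressions for $\mathrm{Ch}_2$, $\mathrm{Ch}_3$ and $\mathrm{Ch}_{3,2}$ quoted in the text. The sketch below is an illustrative check rather than anything from the original paper: it treats $R_2, \dots, R_5$ as free integer variables (the function names are ours) and verifies on random integer inputs that $\mathrm{Ch}_{3,2} - \mathrm{Ch}_3\mathrm{Ch}_2$ equals $-(6R_2R_3 + 6R_5 + 18R_3)$, so that the overall sign is negative and the coefficients inside the bracket are non-negative integers.

```python
import random


def ch2(R):
    """Kerov polynomial Ch_2 = R_3."""
    return R[3]


def ch3(R):
    """Kerov polynomial Ch_3 = R_4 + R_2."""
    return R[4] + R[2]


def ch32(R):
    """Ch_{3,2} = R_3 R_4 - 5 R_2 R_3 - 6 R_5 - 18 R_3, as quoted in Section 6.8."""
    return R[3] * R[4] - 5 * R[2] * R[3] - 6 * R[5] - 18 * R[3]


def cov32(R):
    """Candidate closed form for Cov(Ch_3, Ch_2): a global minus sign times
    a polynomial with non-negative integer coefficients."""
    return -(6 * R[2] * R[3] + 6 * R[5] + 18 * R[3])


def check_identity(trials=200, seed=0):
    """Compare both sides exactly on random integer values of R_2, ..., R_5."""
    rng = random.Random(seed)
    for _ in range(trials):
        R = {k: rng.randint(-50, 50) for k in range(2, 6)}
        if ch32(R) - ch3(R) * ch2(R) != cov32(R):
            return False
    return True


print(check_identity())
```

Agreement on many random integer points is strong evidence for a polynomial identity of this small degree; expanding $(R_4 + R_2)R_3$ by hand gives the same result.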
7. Open problems

As a rule, open problems are much more interesting than solved ones. Fortunately, there are still several mysteries concerning Kerov polynomials and related objects. We will review some of these open problems. Just like Kerov's conjecture, they are related to positivity; thus they hint at some unexpected hidden combinatorial structures, and we hope that investigating them will be as profitable as the investigation of the Kerov conjecture was.

7.1. Goulden–Rattan character polynomials. Goulden–Rattan polynomials express the difference $\mathrm{Ch}_k - R_{k+1}$ as a polynomial in $C_2, C_3, \dots$ (where $C_2(\lambda), C_3(\lambda), \dots$ are some quantities, related to free cumulants, describing the shape of a Young diagram). For example

$$\mathrm{Ch}_7 - R_8 = 14 C_6 + \frac{469}{3} C_4 + \frac{203}{3} C_2^2 + 180 C_2.$$

These polynomials are less complicated than the corresponding Kerov polynomials (i.e., they contain a smaller number of summands), which suggests that the quantities $(C_k)$ are better suited, and more fundamental, for understanding the deviation from the approximation $\mathrm{Ch}_k \approx R_{k+1}$. Furthermore, the coefficients of these polynomials seem to be positive rational numbers with relatively small denominators. It would be more difficult to find a combinatorial interpretation of positive numbers which are not integers; nevertheless this possibility is very tempting.
7.2. Kerov polynomials for Jack characters. Lassalle observed that just like normalized characters describe the dual combinatorics of representations of symmetric groups, it is possible to consider the dual combinatorics of Jack polynomials. In this way one can obtain a quite natural deformation of the characters of the symmetric group with an additional parameter $\gamma$. Also in this more general case it is possible to find Kerov polynomials, for example

$$\mathrm{Ch}_4^{(\gamma)} = R_5 + 6\gamma R_4 + \gamma R_2^2 + (5 + 11\gamma^2) R_3 + (7\gamma + 6\gamma^3) R_2.$$

As the reader can see, the coefficients of these more general Kerov polynomials also seem to be non-negative integers.
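As a quick consistency check on the displayed formula (a remark added here, assuming, as is natural for a deformation, that $\gamma = 0$ corresponds to the undeformed symmetric-group case), setting $\gamma = 0$ recovers the polynomial $\mathrm{Ch}_4$ from the beginning of Section 6:

```latex
% Specialization of the Jack-deformed Kerov polynomial at gamma = 0.
% Every term carrying a positive power of gamma drops out.
\[
  \mathrm{Ch}_4^{(\gamma)}\Big|_{\gamma = 0}
  = R_5 + 0 \cdot R_4 + 0 \cdot R_2^2 + (5 + 0)\, R_3 + 0 \cdot R_2
  = R_5 + 5 R_3
  = \mathrm{Ch}_4 .
\]
```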
8. Further reading

Due to lack of space we will refer mostly to overview articles. Ref. [4] is a haiku-style introduction to free cumulants. Ref. [1] is an overview of the combinatorics of free cumulants and their applications to random matrices and representation theory. The lengthy introduction to [2] gives an overview of Kerov polynomials. Ref. [3] gives more details on Stanley's character formula and its applications to asymptotics of characters. Ref. [5] gives more details on Gaussian fluctuations of Young diagrams.
References

[1] P. Biane, Free probability and combinatorics. In: Proceedings of the International Congress of Mathematicians, Vol. II (Beijing, 2002), 765–774, Beijing, 2002. Higher Ed. Press.
[2] M. Dołęga, V. Féray, and P. Śniady, Explicit combinatorial interpretation of Kerov character polynomials as numbers of permutation factorizations. Adv. Math. 225(1) (2010), 81–120.
[3] V. Féray and P. Śniady, Asymptotics of characters of symmetric groups related to Stanley character formula. Ann. of Math. (2) 173(2) (2011), 887–906.
[4] J. Novak and P. Śniady, What is . . . a free cumulant? Notices Amer. Math. Soc. 58(2) (2011), 300–301.
[5] P. Śniady, Gaussian fluctuations of characters of symmetric groups and of Young diagrams. Probab. Theory Related Fields 136(2) (2006), 263–297.
Piotr Śniady, Institute of Mathematics, Polish Academy of Sciences, Śniadeckich 8, 00-956 Warszawa, Poland Institute of Mathematics, University of Wrocław, pl. Grunwaldzki 2/4, 50-384 Wrocław, Poland E-mail: [email protected]
On scale-invariant solutions of the Navier–Stokes equations

Hao Jia, Vladimír Šverák∗
Abstract. We discuss the forward self-similar solutions of the Navier–Stokes equations. It appears these solutions may provide an interesting window into non-perturbative regimes of the solutions of the equations. 2010 Mathematics Subject Classification. 35Q30, 76D05, 76N10.
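1. Introduction

We consider the classical Cauchy problem for the incompressible Navier–Stokes equation

$$\begin{aligned} u_t + u\nabla u + \nabla p - \Delta u &= 0 \\ \operatorname{div} u &= 0 \end{aligned} \qquad \text{in } \mathbb{R}^3 \times (0, \infty), \tag{1.1}$$

$$u|_{t=0} = u_0 \qquad \text{in } \mathbb{R}^3. \tag{1.2}$$

We recall that the problem is invariant under the scaling

$$\begin{aligned} u(x, t) &\to u_\lambda(x, t) = \lambda u(\lambda x, \lambda^2 t), \\ p(x, t) &\to p_\lambda(x, t) = \lambda^2 p(\lambda x, \lambda^2 t), \\ u_0(x) &\to u_{0\lambda}(x) = \lambda u_0(\lambda x), \end{aligned} \tag{1.3}$$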
where $\lambda > 0$. We say that a solution $u$ is scale-invariant if $u_\lambda = u$ and $p_\lambda = p$ for each $\lambda > 0$. Similarly, we say that an initial condition $u_0$ is scale-invariant if $u_{0\lambda} = u_0$ for each $\lambda > 0$. This is of course the same as requiring that $u_0$ be $(-1)$-homogeneous. We will discuss the following result, which we recently proved in [7].

Theorem 1.1. Assume $u_0$ is scale-invariant and locally Hölder continuous in $\mathbb{R}^3 \setminus \{0\}$ with $\operatorname{div} u_0 = 0$ in $\mathbb{R}^3$. Then the Cauchy problem (1.1), (1.2) has at least one scale-invariant solution $u$ which is smooth in $\mathbb{R}^3 \times (0, \infty)$ and locally Hölder continuous in $\mathbb{R}^3 \times [0, \infty) \setminus \{(0, 0)\}$.

Previously this result was known only under suitable smallness conditions on $u_0$, see for example [2, 10]. For small $u_0$ one can also prove uniqueness (in suitable function classes). It is quite conceivable that uniqueness may fail for large data. We will discuss this point in more detail below.

∗ We thank Gregory Seregin for valuable comments. This work was supported in part by grant DMS 1101428 from the National Science Foundation.
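The invariance of (1.1) under the scaling (1.3) can be checked term by term. The following short computation is a sketch added for the reader's convenience; it uses only the chain rule and the definitions in (1.3), and it also spells out why scale-invariant data are $(-1)$-homogeneous.

```latex
% Each term of (1.1), evaluated on the rescaled pair (u_lambda, p_lambda),
% picks up the same factor lambda^3, so the equation is preserved.
\begin{align*}
  \partial_t u_\lambda(x,t)            &= \lambda^{3}\,(\partial_t u)(\lambda x, \lambda^{2} t), \\
  \Delta u_\lambda(x,t)                &= \lambda^{3}\,(\Delta u)(\lambda x, \lambda^{2} t), \\
  (u_\lambda \nabla u_\lambda)(x,t)    &= \lambda^{3}\,(u \nabla u)(\lambda x, \lambda^{2} t), \\
  \nabla p_\lambda(x,t)                &= \lambda^{3}\,(\nabla p)(\lambda x, \lambda^{2} t), \\
  \operatorname{div} u_\lambda(x,t)    &= \lambda^{2}\,(\operatorname{div} u)(\lambda x, \lambda^{2} t) = 0 .
\end{align*}
% Scale-invariant initial data: choosing lambda = 1/|x| in u_{0 lambda} = u_0 gives
\[
  u_0(x) = \lambda u_0(\lambda x)\Big|_{\lambda = 1/|x|}
         = \frac{1}{|x|}\, u_0\!\left(\frac{x}{|x|}\right), \qquad x \neq 0,
\]
% so u_0 is determined by its values on the unit sphere and is (-1)-homogeneous.
```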
2. Well-posedness and scale invariant initial data

We recall that a function space $X$ of div-free fields on $\mathbb{R}^3$ is homogeneous if $\|u_{0\lambda}\|_X = \lambda^{\alpha} \|u_0\|_X$ for some $\alpha \in \mathbb{R}$. A homogeneous space $X$ is scale invariant (for the Navier–Stokes scaling) if $\alpha = 0$, i.e. $\|u_{0\lambda}\|_X = \|u_0\|_X$. Within the class of homogeneous function spaces, the borderline spaces for the perturbation theory of (1.1), (1.2) should be scale-invariant. Perturbation theory for the well-posedness results for the Navier–Stokes equation with initial data in such spaces was initiated in the well-known paper [8]. Paper [10] can be considered as a culmination of these developments. In [8] the function space $X$ is taken as $X = L^3$ (where we slightly abuse notation by using $L^3$ for div-free vector fields which belong to $L^3$). We note that the function $|x|^{-1}$ "just misses" $L^3(\mathbb{R}^3)$. In [10] the space $X$ is taken to be $X = \mathrm{BMO}^{-1}$ (again restricted to div-free fields). We note that the function $|x|^{-1}$ belongs to $\mathrm{BMO}^{-1}$. The well-posedness result for $X = \mathrm{BMO}^{-1}$ is slightly more subtle than for $X = L^3$ in that the equations are well-posed in $\mathrm{BMO}^{-1}$ only for sufficiently small data, even in the sense of local-in-time well-posedness. To get local-in-time well-posedness for large data, one must further restrict the function space. As we shall see, for $X = \mathrm{BMO}^{-1}$ this smallness assumption may in fact be essential. It is conceivable that the equations are not well-posed (even locally in time) for large initial data in $\mathrm{BMO}^{-1}$.

At a heuristic level it is not hard to see that $(-1)$-homogeneous vector fields should play an important role. If $u_0(x)$ is such a vector field which is smooth away from the origin and $a > 0$, then

$$|\Delta(a u_0)| \sim a |x|^{-3}, \qquad |a u_0 \nabla(a u_0)| \sim a^2 |x|^{-3}. \tag{2.1}$$
We see that for $a > 1$ the non-linear term dominates. At $a \sim 1$ both terms should be of the same order of magnitude (assuming the quantities $u_0$, $\nabla u_0$ are of similar magnitude on the unit sphere).

The solutions obtained by the perturbation theory are often called mild solutions. These solutions exist on a certain maximal interval of existence $[0, T)$ and are regular in $\mathbb{R}^3 \times (0, T)$, see, for example, [3, 4]. For small initial data we can take $T = \infty$, but for large initial data we can conceivably have $T < \infty$, although it is not known whether this really happens. We emphasize again that once some div-free vector field with a singularity of strength $\sim |x|^{-1}$ belongs to $X$, one needs a smallness assumption even for the proofs of local-in-time well-posedness. In the classic paper [14] many of these ideas are considered in slightly different spaces, which are not homogeneous.

In addition to the class of mild solutions, we have the class of weak solutions. Solutions of this kind were first constructed in [14], and their construction is based on the energy inequality, weak convergence and compactness. It was realized relatively recently, see [13], that this technique is applicable even when the energy of
the initial data $u_0$ is only locally finite ($u_0 \in L^2_{\mathrm{loc}}$), with the additional assumption

$$\lim_{x \to \infty} \int_{B_{x,r}} |u_0|^2 \, dx = 0. \tag{2.2}$$
In particular, there is no problem in constructing weak solutions when the initial datum is a $(-1)$-homogeneous field $u_0$ which is locally bounded away from 0. Unlike for the mild solutions, in the construction of the weak solutions the function $|x|^{-1}$ does not play any distinguished role. For example, the scale-invariant fields continuous away from the origin satisfy all the assumptions needed for the construction with good margins.

The function $|x|^{-1}$ "comes back" when we try to investigate uniqueness of the weak solutions. At present the best available results for the uniqueness of the weak solutions are of the same form as already discussed in [14], and later extended in [18], [13] and other works. The results say, roughly speaking, that if we have two weak solutions $u$, $v$ for the same initial datum $u_0$ and one of the solutions has similar regularity to the mild solutions, then the two solutions coincide. The initial datum of the "good solution" must essentially have the same regularity as required by the perturbation theory for the mild solutions. Viewed from the perspective of this proof, the function $|x|^{-1}$ makes its return, even when we deal only with the weak solutions.

Is the borderline role of the $(-1)$-homogeneous functions an artefact of our techniques, or is there something deeper behind it? We will argue for the latter.
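Two of the claims made in this section about $L^3$ can be verified by a one-line change of variables each. The computation below is a sketch added for illustration; it only uses the scaling $u_{0\lambda}(x) = \lambda u_0(\lambda x)$ from (1.3) and polar coordinates in $\mathbb{R}^3$.

```latex
% (i) L^3 is scale invariant for the Navier-Stokes scaling (alpha = 0):
%     substitute y = lambda x, so dx = lambda^{-3} dy.
\[
  \| u_{0\lambda} \|_{L^3}^3
  = \int_{\mathbb{R}^3} \lambda^{3}\, |u_0(\lambda x)|^{3}\, dx
  = \int_{\mathbb{R}^3} |u_0(y)|^{3}\, dy
  = \| u_0 \|_{L^3}^3 .
\]
% (ii) The function |x|^{-1} "just misses" L^3(R^3): in polar coordinates
\[
  \int_{\varepsilon < |x| < R} |x|^{-3}\, dx
  = 4\pi \int_{\varepsilon}^{R} \frac{dr}{r}
  = 4\pi \log\frac{R}{\varepsilon},
\]
% which diverges only logarithmically as epsilon -> 0 or R -> infinity.
```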
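3. Proof of Theorem 1.1

To prove Theorem 1.1, we seek the solution $u(x, t)$ in the form

$$u(x, t) = \frac{1}{\sqrt{t}}\, U\!\left(\frac{x}{\sqrt{t}}\right). \tag{3.1}$$

The Navier–Stokes equation for $u$ gives

$$-\Delta U - \frac{1}{2} U - \frac{1}{2} x \nabla U + U \nabla U + \nabla P = 0, \qquad \operatorname{div} U = 0, \tag{3.2}$$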
in $\mathbb{R}^3$. For a scale-invariant $u_0$ the problem of finding a scale-invariant solution of the Cauchy problem (1.1), (1.2) is equivalent to the problem of finding a solution of (3.2) with the asymptotics

$$|U(x) - u_0(x)| = o\!\left(\frac{1}{|x|}\right), \qquad x \to \infty. \tag{3.3}$$

The problem (3.2), (3.3) is reminiscent of the classical Leray problem of finding steady-state solutions of the Navier–Stokes equation in a bounded domain (which is now replaced by the whole space $\mathbb{R}^3$), with given boundary conditions (which are now replaced by (3.3)). Heuristically it is clear that the main difficulty in pursuing
this analogy is the potentially uncontrolled behavior of $U$ for $x \to \infty$. Roughly speaking, if we can show that near $\infty$ the function $U$ and its derivatives have the same decay as $|x|^{-1}$ and the corresponding derivatives, then we can conclude that nothing surprising is happening near $\infty$, and the situation is indeed analogous to the bounded domain. (One still needs to establish estimates in the finite region, but these are very similar to the classical case of a bounded domain.) The main difficulty is in establishing these estimates. Once such estimates are established, we can essentially follow the classical proof of Leray for the existence of steady solutions in bounded domains, see [7] for details.
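For completeness, here is a sketch of how the ansatz (3.1) leads to (3.2) and to the condition (3.3). It is a routine chain-rule computation added for illustration; the self-similar form assumed for the pressure, $p(x,t) = t^{-1} P(x/\sqrt{t})$, is not written out in the text but follows from $p_\lambda = p$ with $\lambda = t^{-1/2}$.

```latex
% Self-similar variable y = x / sqrt(t); ansatz u = t^{-1/2} U(y), p = t^{-1} P(y).
\begin{align*}
  \partial_t u &= -\tfrac{1}{2}\, t^{-3/2} \bigl( U(y) + y \nabla U(y) \bigr),
  &
  \Delta u &= t^{-3/2}\, \Delta U(y), \\
  u \nabla u &= t^{-3/2}\, U(y) \nabla U(y),
  &
  \nabla p &= t^{-3/2}\, \nabla P(y).
\end{align*}
% Multiplying (1.1) by t^{3/2} gives (3.2), written in the variable y:
\[
  -\Delta U - \tfrac{1}{2} U - \tfrac{1}{2}\, y \nabla U + U \nabla U + \nabla P = 0,
  \qquad \operatorname{div} U = 0 .
\]
% Initial condition: since u_0 is (-1)-homogeneous, u_0(x) = t^{-1/2} u_0(y), hence
\[
  |u(x,t) - u_0(x)| = t^{-1/2}\, |U(y) - u_0(y)| = \frac{1}{|x|}\; |y|\, |U(y) - u_0(y)| ,
\]
% and for fixed x \neq 0 this tends to 0 as t -> 0 (i.e. |y| -> infinity)
% exactly when |U(y) - u_0(y)| = o(1/|y|), which is (3.3).
```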
4. Possible Non-uniqueness

As in the case of bounded domains, the Leray–Schauder approach gives existence of the solutions, but not uniqueness. In the case of bounded domains one does not generically expect uniqueness for large data, and this non-uniqueness is in fact expected to be quite typical in the context of the steady Navier–Stokes equations, once the data is large. Let us for example consider the problem

$$\begin{aligned} -\Delta u + u \nabla u + \nabla p &= 0 && \text{in } \Omega, \\ \operatorname{div} u &= 0 && \text{in } \Omega, \\ u|_{\partial\Omega} &= \lambda g && \text{at } \partial\Omega, \end{aligned} \tag{4.1}$$
where $g$ is a given smooth vector field at the boundary satisfying the compatibility condition $\int_{\partial\Omega} g = 0$ and $\lambda > 0$ is a parameter. Eventually we aim to take $\lambda = 1$. We know the equations (4.1) have a unique solution for small $\lambda$ (by perturbation arguments and the energy inequality, for example), with $u|_{\lambda=0} = 0$. We can try to continue the solution $u$ into $\lambda > 0$ as $u = u(\lambda)$, but the curve of solutions can "turn back" and will then not be the graph of a function of $\lambda$. The existence of these turning points signals non-uniqueness. For bounded domains the existence of such turning points is presumably quite typical, and for generic set-ups we do expect non-uniqueness once the function $\lambda g$ is "sufficiently large". (This is true especially in dimension $n = 3$. In dimension $n = 2$ the situation might be different in some cases, see a related result in [16].)

Could this also be the case for the problem (3.2), (3.3)? This would lead to non-uniqueness for the Cauchy problem (1.1), (1.2) with scale-invariant $u_0$. We believe it is likely that this indeed happens, and that the solution of the Cauchy problem (1.1), (1.2) for scale-invariant $u_0$ may not be unique for large data. This would mean, for example, that the initial value problem may not be well-posed in $\mathrm{BMO}^{-1}$ if the initial condition is not small.

The possible non-uniqueness might be detected by following the curve of solutions $U = U(\lambda)$ of the problem (3.2) with the "boundary condition" (3.3) replaced by

$$|U(x) - \lambda u_0(x)| = o\!\left(\frac{1}{|x|}\right), \qquad x \to \infty, \tag{4.2}$$
starting at $\lambda = 0$. For $\lambda$ small we have a unique solution $U(\lambda)$, and we can observe the spectrum of the linearized problem as we increase $\lambda$. Let us denote the spectrum by $\Sigma(\lambda)$. One expects that for small $\lambda$ we will have $\Sigma(\lambda) \subset \Pi = \{z : \operatorname{Re} z < 0\}$. As we increase $\lambda$, the spectrum may leave $\Pi$. If it does so through $z = 0$, we expect a turning point in the curve of solutions and non-uniqueness as discussed above. What happens when the spectrum leaves $\Pi$ through the imaginary axis? It is natural to expect that (under some natural assumptions) this will correspond to a Hopf bifurcation, with the appearance of a periodic solution to the equation

$$U_s - \Delta U - \frac{1}{2} U - \frac{1}{2} x \nabla U + U \nabla U + \nabla P = 0, \qquad \operatorname{div} U = 0, \tag{4.3}$$

with the "boundary condition" at $\infty$ given by (4.2). By a (standard) change of variables

$$u(x, t) = \frac{1}{\sqrt{t}}\, U\!\left(\frac{x}{\sqrt{t}}, \log\frac{t}{t_0}\right), \tag{4.4}$$

we see that this would correspond to a solution $u$ of the initial value problem which would be only "discretely scale-invariant" for the scale-invariant initial datum $u_0$. By this we mean that $\lambda u(\lambda x, \lambda^2 t) = u(x, t)$ not for all $\lambda > 0$, but only for a discrete subgroup $\{\lambda_0^k, k \in \mathbb{Z}\}$ of $\mathbb{R}_+$. The existence of such solutions for discretely scale-invariant $u_0$ with $\lambda_0$ close to 1 is proved in a recent paper [20]. Such solutions would still violate uniqueness for the Cauchy problem (1.1), (1.2) for scale-invariant initial data $u_0$. In this case there would be a scale-invariant solution guaranteed by Theorem 1.1 and another solution which is not scale-invariant, but only discretely scale-invariant. We believe that such scenarios are quite likely.

The above considerations apply to the Cauchy problem with scale-invariant initial data. Can such considerations be taken even further, to some solutions with finite energy obtained by a suitable "truncation at infinity" of the scale-invariant initial data?
If this is the case, then we might not only have non-uniqueness for scale-invariant initial data, but also non-uniqueness for finite-energy initial data, and in particular for the Leray–Hopf weak solutions. Moreover, the non-uniqueness would appear right at the borderline of the classes for which uniqueness can be proved via the weak-strong uniqueness theorems mentioned earlier.

It is interesting to note the opinion of some prominent mathematicians on the question of the uniqueness of Leray–Hopf weak solutions. In [5] we can find the following comment (p. 217): "It is hard to believe that the initial value problem for the viscous fluid in dimension n = 3 could have more than one solution, and more work should be devoted to the study of the uniqueness question." On the other hand, it is known that O. A. Ladyzhenskaya believed in non-uniqueness of the weak solution. The answer to the uniqueness question is still not known, but our current opinion, based on the discussion above, leans towards non-uniqueness.
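The link between time-periodic solutions of (4.3) and discrete scale invariance, used in Section 4, can be made explicit. The following computation is a sketch added for illustration; the period $T$ of $U$ in the variable $s$ is a symbol introduced here, and the relation $\lambda_0 = e^{T/2}$ is derived below rather than quoted from the text.

```latex
% Variables: y = x / sqrt(t), s = log(t / t_0), u(x,t) = t^{-1/2} U(y, s).
% Since ds/dt = 1/t, the time derivative acquires one extra term compared with
% the stationary self-similar case:
\[
  \partial_t u = t^{-3/2} \Bigl( U_s - \tfrac{1}{2} U - \tfrac{1}{2}\, y \nabla U \Bigr),
\]
% while Delta u, u grad u and grad p scale as t^{-3/2} exactly as before; this gives (4.3).
%
% Action of the scaling (1.3) in the new variables:
\[
  \lambda\, u(\lambda x, \lambda^{2} t)
  = \lambda\, (\lambda^{2} t)^{-1/2}\,
    U\!\left( \frac{\lambda x}{\lambda \sqrt{t}},\ \log\frac{\lambda^{2} t}{t_0} \right)
  = \frac{1}{\sqrt{t}}\; U\bigl( y,\ s + 2 \log \lambda \bigr).
\]
% Hence u_lambda = u holds exactly when 2 log(lambda) is a period of s -> U(., s).
% If U is periodic in s with minimal period T > 0, the admissible lambda are
\[
  \lambda = \lambda_0^{k}, \qquad \lambda_0 = e^{T/2}, \qquad k \in \mathbb{Z},
\]
% and a solution U independent of s corresponds to full scale invariance.
```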
5. Estimates

An important theme in [7] can perhaps be called local-in-space regularity estimates near the initial time $t = 0$. The connection to estimates of solutions of (3.2) near
$\infty$ can be seen from (3.1): if, say, $\nabla^k u$ is bounded in $\{x : 1 \le |x| \le 2\}$ for times close to 0, it means $\nabla^k U(x) = O(|x|^{-1-k})$ as $|x| \to \infty$. The following statement appears to be quite natural:

(S) Modulo the usual (and quite mild) non-local influences of the pressure, local regularity of the initial data propagates for at least a short time.

Results in the direction of (S) can be found already in the classical paper [1]. More recently, related questions about vorticity propagation have been studied in [19]. Our main result in this direction, which is behind the necessary a-priori estimates for the solutions of (3.2), is as follows.

Theorem 5.1. (Local Hölder regularity of Leray solutions) Let $u_0 \in L^2_{\mathrm{loc}}(\mathbb{R}^3)$ with $\sup_{x_0 \in \mathbb{R}^3} \int_{B_1(x_0)} |u_0|^2(x)\, dx \le \alpha < \infty$. Suppose $u_0$ is in $C^\gamma(B_2(0))$ with $\|u_0\|_{C^\gamma(B_2(0))} \le M < \infty$. Then there exists a positive $T = T(\alpha, \gamma, M) > 0$ such that any Leray solution $u$ with the initial datum $u_0$ (which implies $u$ is also a local suitable weak solution in the sense of [1]) satisfies

$$u \in C^{\gamma}_{\mathrm{par}}(B_{1/4} \times [0, T]), \quad \text{and} \quad \|u\|_{C^{\gamma}_{\mathrm{par}}(B_{1/4} \times [0, T])} \le C(M, \alpha, \gamma). \tag{5.1}$$
We refer the reader to [7] for the precise definition of a Leray solution. Our proof of Theorem 5.1 in [7] is based on a combination of techniques from [6, 15, 11, 13, 9]. Heuristically, the main point is that one can obtain sufficient control of the energy flux into "good regions" from the rest of the space, see Section 3. Once we know that only a small amount of energy can move into the "good region", one can use (a slight modification of) the partial regularity schemes in [15, 11] to prove regularity.
References

[1] L. Caffarelli, R.-V. Kohn, and L. Nirenberg, Partial regularity of suitable weak solutions of the Navier–Stokes equations. Comm. Pure Appl. Math. Vol. XXXV (1982), 771–831.
[2] M. Cannone and F. Planchon, Self-similar solutions for Navier–Stokes equations in R^3. Comm. Partial Differential Equations 21 no. 1–2 (1996), 179–193.
[3] H. Dong and D. Du, On the local smoothness of solutions of the Navier–Stokes equations. J. Math. Fluid Mech. 9 no. 2 (2007), 139–152.
[4] P. Germain, N. Pavlović, and G. Staffilani, Regularity of solutions to the Navier–Stokes equations evolving from small data in BMO^{-1}. Int. Math. Res. Not. IMRN 2007, no. 21.
[5] E. Hopf, Über die Anfangswertaufgabe für die hydrodynamischen Grundgleichungen. Math. Nachr. 4 (1951), 213–231.
[6] H. Jia and V. Šverák, Minimal L^3-initial data for potential Navier–Stokes singularities. arXiv:1201.1592.
[7] H. Jia and V. Šverák, Local-in-space estimates near initial time for weak solutions of the Navier–Stokes equations and forward self-similar solutions. arXiv:1204.0529.
[8] T. Kato, Strong L^p-solutions of the Navier–Stokes equation in R^m, with applications to weak solutions. Math. Z. 187 no. 4 (1984), 471–480.
[9] N. Kikuchi and G. Seregin, Weak solutions to the Cauchy problem for the Navier–Stokes equations satisfying the local energy inequality. AMS translations, Series 2, Vol. 220, 141–164.
[10] H. Koch and D. Tataru, Well-posedness for the Navier–Stokes equations. Adv. Math. 157 no. 1 (2001), 22–35.
[11] O. A. Ladyzhenskaya and G. A. Seregin, On partial regularity of suitable weak solutions to the three-dimensional Navier–Stokes equations. J. Math. Fluid Mech. 1 no. 4 (1999), 356–387.
[12] O. A. Ladyzhenskaya, On uniqueness and smoothness of generalized solutions to the Navier–Stokes equations. Zapiski Nauchn. Seminar. POMI 5 (1967), 169–185.
[13] P. G. Lemarié-Rieusset, Recent developments in the Navier–Stokes problem. Chapman & Hall/CRC Research Notes in Mathematics, 431. Chapman & Hall/CRC, Boca Raton, FL, 2002.
[14] J. Leray, Sur le mouvement d'un liquide visqueux emplissant l'espace. Acta Math. 63 (1934), 193–248.
[15] F.-H. Lin, A new proof of the Caffarelli–Kohn–Nirenberg theorem. Comm. Pure Appl. Math. 51 no. 3 (1998), 241–257.
[16] C. Marchioro, An example of absence of turbulence for any Reynolds number. Comm. Math. Phys. 105 (1986), 99–106.
[17] G. Prodi, Un teorema di unicità per le equazioni di Navier–Stokes. Ann. Mat. Pura Appl. 48 (1959), 173–182.
[18] J. Serrin, The initial value problem for the Navier–Stokes equations. 1963 Nonlinear Problems (Proc. Sympos., Madison, Wis., 1962), 69–98. Univ. of Wisconsin Press, Madison, Wis.
[19] T. Tao, Localization and compactness properties of the Navier–Stokes global regularity problem. arXiv:1108.1165v3.
[20] T.-P. Tsai, Forward Discrete Self-Similar Solutions of the Navier–Stokes Equations. arXiv:1210.2783.

Hao Jia, Vladimír Šverák, University of Minnesota, School of Mathematics, 206 Church St. S.E., Minneapolis, MN 55455, USA
Ramsey-theoretic analysis of the conditional structure of weakly-null sequences

Stevo Todorčević∗
Abstract. Understanding the possible conditional structure in a given weakly-null sequence $(x_i)$ in some normed space $X$ lies at the heart of several classical problems of this area of mathematics. We will expose the set-theoretic and Ramsey-theoretic methods relevant to both the lack and the existence of this conditional structure. We will concentrate on more recent results and will point out problems for further study.

2010 Mathematics Subject Classification. 03E05, 05D10, 46B20.

Keywords. Ramsey spaces, unconditional sequences.
1. The unconditional basic sequence problem

Recall that a sequence $(x_i)$ in some normed space $X$ is unconditional if there is a constant $C \ge 1$ such that
\[ \Bigl\| \sum_{i \in I} a_i x_i \Bigr\| \le C \, \Bigl\| \sum_{j \in J} a_j x_j \Bigr\| \]
for any pair $I \subseteq J$ of (finite) subsets of the index-set of $(x_i)$ and for every sequence $(a_j : j \in J)$ of scalars. The unconditional basic sequence problem, asking whether an arbitrary infinite-dimensional¹ normed space contains an infinite unconditional basic² sequence, has played a prominent role both before and after its eventual solution by Gowers and Maurey [18].

Theorem 1.1 ([18]). There is a separable reflexive infinite-dimensional space $X$ with no infinite unconditional basic sequence.

In [1], Argyros, Lopez-Abad and Todorčević were able to extend this to the level of non-separable spaces as well.

Theorem 1.2 ([1]). There is also a non-separable reflexive space $X$ with no infinite unconditional basic sequence.

∗ The author is grateful to the Fields Institute for hospitality during the writing of this paper.
¹ Unless otherwise stated, from now on, all normed spaces are implicitly assumed to be infinite-dimensional although we shall keep stressing this from time to time.
² The 'basic' here refers to the notion of Schauder basic sequence defined at the beginning of the next Section.
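To see concretely how the inequality in the definition of unconditionality can fail, the following Python sketch uses the summing vectors $s_i = e_0 + \dots + e_i$ in $\mathbb{R}^n$ with the sup norm. This example and all function names are illustrative assumptions, not taken from the text. With alternating coefficients the full sum has norm 1, while the sub-sum over the even indices has norm about $n/2$, so no single constant $C$ can serve for all $n$.

```python
def combination(coeffs, n):
    """Return sum_i a_i * s_i in R^n, where s_i = e_0 + ... + e_i.

    coeffs maps an index i to the scalar a_i.
    """
    out = [0.0] * n
    for i, a in coeffs.items():
        for k in range(i + 1):  # s_i has ones in coordinates 0..i
            out[k] += a
    return out


def sup_norm(v):
    return max(abs(x) for x in v)


def conditionality_ratio(n):
    """Ratio ||sum_{i in I} a_i s_i|| / ||sum_{j in J} a_j s_j||
    for J = {0,...,n-1}, I = even indices, a_j = (-1)^j."""
    full = {j: (-1.0) ** j for j in range(n)}
    even = {i: a for i, a in full.items() if i % 2 == 0}
    return sup_norm(combination(even, n)) / sup_norm(combination(full, n))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, conditionality_ratio(n))
```

The ratio grows linearly in `n`, which is the finite-dimensional shadow of a conditional basic sequence.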
The Ramsey-theoretic nature of the unconditional basic sequence problem was apparent quite early, but the following result of Gowers [16] that immediately followed [18] required a new infinite-dimensional Ramsey theorem commonly known today as Gowers' Dichotomy (see also [17]).

Theorem 1.3 ([16]). An infinite-dimensional Banach space contains either an infinite unconditional basic sequence or a hereditarily indecomposable Banach space.³

Concerning this result we note that the space of Theorem 1.1 is actually hereditarily indecomposable, while the space of Theorem 1.2, being reflexive and non-separable, must have many decompositions as a sum of two closed infinite-dimensional subspaces. In this article, we shall discuss the following two general versions of the problem.

Problem 1.4. (1) When does an infinite-dimensional normed space contain an infinite unconditional basic sequence? (2) When does an infinite normalized weakly null sequence in some normed space contain an infinite unconditional subsequence?

In view of the solution of the unconditional basic sequence problem these problems may appear a bit vague, but here is one example showing that even a partial result in this direction sheds some light on another classical problem in this area, the separable quotient problem.

Theorem 1.5 ([19], [24]). If the dual $X^*$ of some Banach space $X$ contains an infinite unconditional basic sequence then $X$ admits a quotient with an unconditional basis.
2. Finite and partial unconditionality

Recall that a sequence $(x_i)_{i=0}^{\infty}$ in some normed space is a Schauder basic sequence if it is normalized and if there is a constant $C \ge 1$ such that […] $> 0$. Suppose that for every $i < k$ we are given a normalized weakly null sequence $(x^i_n)_{n=0}^{\infty}$ in some Banach space $X$. Then there exists an infinite set $M$ of integers such that for every $\{n_0 < \cdots < n_{k-1}\} \subseteq M$ the $k$-sequence $(x^i_{n_i})_{i}$ […] $> 0$ and every $\alpha < \omega^{\omega}$, every normalized weakly null sequence in $C(\alpha + 1)$ has a $(2 + \varepsilon)$-unconditional subsequence.
(2) For every $\varepsilon > 0$ every normalized weakly null sequence in $C(\omega^{\omega} + 1)$ has a $(4 + \varepsilon)$-unconditional subsequence.
(3) There is a normalized weakly null sequence in $C(\omega^{\omega^2} + 1)$ with no unconditional subsequence.

There are several results in the literature that give sufficient conditions on a given weakly null sequence in order for it to contain an infinite unconditional subsequence. Of these we mention the following result that uses the Nash-Williams theory of fronts and barriers.

Theorem 2.4 ([3], [14], [45]). Suppose that $(x_n)$ is a normalized weakly-null sequence in $\ell_\infty(\Gamma)$ with the property that $\inf\{|x_n(\gamma)| : n \in \mathbb{N}, \gamma \in \Gamma\} > 0$. Then $(x_n)$ contains an infinite unconditional basic subsequence.

There is indeed a very natural relation between weakly null sequences $(x_n)_{n=0}^{\infty}$ and compact and precompact families of finite subsets of $\mathbb{N}$ that are subject to the Nash-Williams theory. To see this, assume, without loss of generality, that $(x_n)$ is a weakly null sequence in some space of the form $\ell_\infty(\Gamma)$. For $\gamma \in \Gamma$, set $F_\gamma = \{n \in \mathbb{N} : x_n(\gamma) \ne 0\}$. Let $\mathcal{F} = \{F_\gamma : \gamma \in \Gamma\}$. Then $\mathcal{F}$ is a precompact family of finite subsets of $\mathbb{N}$, i.e., all the pointwise limits of this family are finite sets, or to put it combinatorially, every infinite subset $M$ of $\mathbb{N}$ contains a finite initial segment $s$ such that $s$ is not a proper initial segment of any element of the family $\mathcal{F}$. Let $\mathcal{B}$ be the collection of all finite subsets $s$ of $\mathbb{N}$ that have no proper end-extensions in $\mathcal{F}$ and are minimal with respect to this property, i.e., every proper initial segment of $s$ has an end-extension in $\mathcal{F}$. First of all note that $\mathcal{B}$ is a thin family, i.e., it forms an antichain relative to the ordering $\sqsubseteq$ of end-extension. Moreover, $\mathcal{B}$ is a front, i.e., every infinite subset $M$ of $\mathbb{N}$ has an initial segment in $\mathcal{B}$. These are the notions introduced originally by Nash-Williams [36], where he proved that thin families have the Ramsey property in the following sense.
Theorem 2.5 ([36]). Suppose $\mathcal{H} = \mathcal{H}_0 \cup \cdots \cup \mathcal{H}_l$ is a finite partition of a thin family $\mathcal{H}$ of finite subsets of $\mathbb{N}$. Then there is an infinite set $M \subseteq \mathbb{N}$ and $i \le l$ such that $\mathcal{H} \restriction M \subseteq \mathcal{H}_i$.⁴

Note that for a fixed positive integer $k$ the family $[\mathbb{N}]^k$ of all $k$-element subsets of $\mathbb{N}$ is a thin family (and, in fact, it is a front) and that in this case Nash-Williams' theorem reduces to Ramsey's theorem. However, Nash-Williams' theorem is in fact a far-reaching extension of Ramsey's theorem that initiated the study of Ramsey theory of infinite dimension, the Ramsey theory most relevant to the questions we discuss here. So, going back to our family $\mathcal{F}$ associated to the weakly null sequence $(x_n)$ and the front $\mathcal{B}$, and applying Nash-Williams' theorem, we find an infinite set $M$ such that $\mathcal{F}[M] = \overline{\mathcal{B} \restriction M}$.⁵ From this we conclude that any study of further subsequences of $(x_n)_{n \in M}$ must involve the front $\mathcal{B}$ on $M$. A closer examination reveals that one has to study mappings with domains $\mathcal{B} \restriction M$. The following important result of Pudlák and Rödl [40] reveals the true complexity of any such study.

Theorem 2.6 ([40]). For every front $\mathcal{B}$ on $\mathbb{N}$ and every mapping $f : \mathcal{B} \to \mathbb{N}$ there exist an infinite subset $M$ of $\mathbb{N}$ and a mapping $\varphi : \mathcal{B} \restriction M \to \overline{\mathcal{B} \restriction M}$ such that:
(1) $\varphi$ is an internal mapping, i.e., $\varphi(s) \subseteq s$ for all $s \in \mathcal{B} \restriction M$,
(2) $\varphi(s) \not\sqsubseteq \varphi(t)$ for all $s, t \in \mathcal{B} \restriction M$ such that $\varphi(s) \ne \varphi(t)$, and
(3) for $s, t \in \mathcal{B} \restriction M$, $f(s) = f(t)$ iff $\varphi(s) = \varphi(t)$.

This shows that the complexity of the weakly null subsequence $(x_n)_{n \in M}$ is captured by the complexity of internal mappings on fronts like $\mathcal{B} \restriction M$. It should also be mentioned that the mapping $\varphi$ satisfying the conclusion of Theorem 2.6 must be the unique such mapping with domain $\mathcal{B} \restriction M$. More precisely, suppose that $\varphi_0 : \mathcal{B} \restriction M_0 \to \overline{\mathcal{B} \restriction M_0}$ and $\varphi_1 : \mathcal{B} \restriction M_1 \to \overline{\mathcal{B} \restriction M_1}$ are two mappings satisfying the conclusion of Theorem 2.6. If the intersection $M_0 \cap M_1$ is infinite, then there is an infinite set $N \subseteq M_0 \cap M_1$ such that $\varphi_0 \restriction (\mathcal{B} \restriction N) = \varphi_1 \restriction (\mathcal{B} \restriction N)$.
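The combinatorial notions used above (thin family, front) can be checked mechanically on finite truncations. The following Python sketch is only an illustration under stated assumptions: it works inside $\{0, \dots, n-1\}$ rather than $\mathbb{N}$, represents finite sets as sorted tuples, and tests two families, the pairs $[\{0,\dots,n-1\}]^2$ and the family $\{s : |s| = \min(s) + 1\}$, the latter chosen here purely as a second example. The front property is replaced by a finite proxy: every set that is "long enough" must have an initial segment in the family. It also checks that no member properly contains another. All function names are hypothetical.

```python
from itertools import combinations


def pairs(n):
    """All 2-element subsets of {0,...,n-1}, as sorted tuples."""
    return set(combinations(range(n), 2))


def min_plus_one_family(n):
    """All s within {0,...,n-1} with |s| = min(s) + 1."""
    return {c for size in range(1, n + 1)
            for c in combinations(range(n), size)
            if len(c) == c[0] + 1}


def is_thin(family):
    """No member is a proper initial segment of another member."""
    return not any(t[:j] in family for t in family for j in range(1, len(t)))


def is_inclusion_antichain(family):
    """No member is a proper subset of another member."""
    return not any(set(s) < set(t) for s in family for t in family)


def has_initial_segment_in(t, family):
    return any(t[:j] in family for j in range(1, len(t) + 1))


def front_like(family, n, long_enough):
    """Finite proxy for the front property: every subset of {0,...,n-1}
    accepted by long_enough has an initial segment in the family."""
    return all(has_initial_segment_in(t, family)
               for size in range(1, n + 1)
               for t in combinations(range(n), size)
               if long_enough(t))


if __name__ == "__main__":
    n = 7
    P, S = pairs(n), min_plus_one_family(n)
    print(is_thin(P), front_like(P, n, lambda t: len(t) >= 2))
    print(is_thin(S), front_like(S, n, lambda t: len(t) >= t[0] + 1))
```

A family such as `{(0,), (0, 1)}` fails `is_thin`, since `(0,)` is a proper initial segment of `(0, 1)`.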
Another useful consequence of Theorem 2.5 is that for every front $\mathcal{B}$ there is an infinite set $M$ such that $\mathcal{B} \restriction M$ is, in fact, a barrier on $M$, i.e., every infinite subset of $M$ has an initial segment in $\mathcal{B}$ and, moreover, $\mathcal{B} \restriction M$ is Sperner, i.e., $s \not\subseteq t$ for all $s \ne t$ in $\mathcal{B} \restriction M$. Thus, without loss of generality we may work with barriers instead of fronts. One useful property of barriers $\mathcal{B}$ is that for every infinite set $M$ the topological closure $\overline{\mathcal{B} \restriction M}$ is simply equal to the $\subseteq$-downwards closure of $\mathcal{B} \restriction M$.

⁴ Here, $\mathcal{H} \restriction M = \{s \in \mathcal{H} : s \subseteq M\}$.
⁵ Here, $\mathcal{F}[M] = \{F \cap M : F \in \mathcal{F}\}$ and $\overline{\mathcal{B} \restriction M}$ is the topological closure of the restriction $\mathcal{B} \restriction M$.

Note that the finite rank fronts $[\mathbb{N}]^k$ are
also barriers, but there are barriers $\mathcal{B}$ whose topological closures have arbitrary countable Cantor–Bendixson ranks. One important example of a barrier of rank $\omega$ is the Schreier barrier $\mathcal{S} = \{s \subseteq \mathbb{N} : |s| = \min(s) + 1\}$ that forms the initial stage of a well-studied transfinite hierarchy $\mathcal{S}_\xi$ ($1 \le \xi < \omega_1$) of Schreier barriers of higher ranks. Part of their importance in this area is based on the fact that their topological closures are spreading, i.e., if some $s$ belongs to $\overline{\mathcal{S}_\xi}$ then so does every finite set $t$ of the same cardinality as $s$ such that for every $i < |t|$, the $i$th element of $t$ is greater than or equal to the $i$th element of $s$. We refer the reader to [28], which attempts a systematic study of combinatorial and topological properties of barriers as well as a systematic study of internal mappings on barriers that are relevant to problems about weakly null sequences. We finish this section by mentioning the well-known result of Elton [10] about the unconditional structure found inside arbitrary weakly null sequences.

Theorem 2.7 ([10]). For every $0 < \varepsilon \le 1$ there is a constant $C(\varepsilon) \ge 1$ such that every normalized weakly null sequence $(x_n)$ has an infinite subsequence $(x_{n_i})$ such that $\|\sum_{i \in I} a_i x_{n_i}\| \le C(\varepsilon) \|\sum_{j \in J} a_j x_{n_j}\|$ for every pair $I \subseteq J$ of subsets of $\mathbb{N}$ and every choice $(a_j : j \in J)$ of scalars such that $\varepsilon \le |a_j| \le 1$ for all $j \in J$.

The following problem is known in the literature as the Elton unconditional constant problem (see, for example, [6]).

Problem 2.8. Is $\sup_{0 < \varepsilon \le 1} C(\varepsilon) < \infty$?

[…] $> 0$, the set $\{\gamma \in \Gamma : |f_\gamma(x)| \ge \varepsilon\}$ is finite. That nontrivial such sequences always exist is a theorem due to Josefson [25] and Nissenzweig [37].

Theorem 3.1 ([25], [37]). For every infinite-dimensional normed space $X$ there is a normalized $w^*$-null sequence $(f_n)_{n=0}^{\infty}$ in $X^*$.

Having such a normalized sequence $(f_n)_{n=0}^{\infty}$ in $X^*$ one is tempted to apply the Bessaga–Pełczyński technique to try to select a Schauder basic subsequence. This is exactly what Johnson and Rosenthal [24] did when they realized that one should also be looking for a Schauder basic subsequence $(f_{n_i})$ that has some sequence $(x_i)$ in $X$ as the corresponding sequence of biorthogonal functionals on the closed norm span of $(f_{n_i})$. This is how they proved the following well-known result.

Theorem 3.2 ([24]). Every separable infinite-dimensional space has an infinite-dimensional quotient with a Schauder basis.
It is quite natural to ask if this result can be extended to arbitrary spaces and this is what became known as the separable quotient problem. If one tries using the Ramsey-theoretic or set-theoretic analysis of this problem one will observe that the arguments in [24] are enough for getting separable quotients for spaces of density $< \mathfrak{b}$. Recall that $\mathfrak{b}$ is the minimal cardinality of a subset of $\mathbb{N}^{\mathbb{N}}$ unbounded in the ordering of eventual dominance. Recall also the similar number $\mathfrak{p}$, the minimal cardinality of a family $\mathcal{F}$ of infinite subsets of $\mathbb{N}$ such that $\bigcap \mathcal{F}_0$ is infinite for all finite $\mathcal{F}_0 \subseteq \mathcal{F}$ but there is no infinite $M \subseteq \mathbb{N}$ such that $M \setminus N$ is finite for all $N \in \mathcal{F}$. Recall also that $\mathfrak{m}$ is the minimal cardinality of a family of nowhere dense subsets that cover some nonempty compact $T_2$-space $K$ which has no isolated points and which satisfies the countable chain condition. It is easily seen that $\omega_1 \le \mathfrak{m} \le \mathfrak{p} \le \mathfrak{b}$. Then we have the following extension of the result in [24].

Theorem 3.3 ([46]). Suppose that a Banach space $X$ has density $< \mathfrak{m}$ and that its dual $X^*$ has an uncountable normalized $w^*$-null sequence. Then $X$ has a quotient with a Schauder basis of length $\omega_1$.

Remark 3.4. Given a $w^*$-null sequence $\{f_\gamma : \gamma < \omega_1\} \subseteq X^*$, the proof finds an uncountable subsequence $\{f_\gamma : \gamma \in \Gamma\}$ that forms a Schauder basis of its norm-closed linear span $\overline{\mathrm{span}}\{f_\gamma : \gamma \in \Gamma\}$ and a quotient map $T : X \to (\overline{\mathrm{span}}\{f_\gamma : \gamma \in \Gamma\})^*$ onto the dual of this space, which itself has a Schauder basis $\{f_\gamma^* : \gamma \in \Gamma\}$ formed by the biorthogonal functionals of the Schauder basis $\{f_\gamma : \gamma \in \Gamma\}$. This feature of the proof is of independent interest and has already been used in applications, some of which will be mentioned below.

To satisfy the hypothesis of Theorem 3.3 one needs to invoke a set-theoretic dichotomy, PID. To introduce this dichotomy, we need to recall some standard definitions.

Definition 3.5. Recall that an ideal on an index set $S$ is simply a family $\mathcal{I}$ of subsets of $S$ closed under taking subsets and finite unions of its elements. We shall consider only ideals of countable subsets of $S$ and assume that all our ideals include the ideal of all finite subsets of $S$.

Definition 3.6. We say that such an ideal $\mathcal{I}$ is a P-ideal if for every sequence $(a_n)$ in $\mathcal{I}$ there is $b \in \mathcal{I}$ such that $a_n \setminus b$ is finite for all $n$.

Example. (a) The ideal $[S]$ […] $> \omega_1$. Then every non-separable Banach space has an uncountable biorthogonal system.

Recall that an Asplund space is a Banach space $X$ with the property that separable subspaces of $X$ have separable duals. They were originally introduced as spaces $X$ with the property that every convex continuous function defined on a convex open subset $U$ of $X$ is Fréchet differentiable on a dense $G_\delta$-subset of $U$. Recall also that a Banach space $X$ has the Mazur intersection property if every closed convex subset of $X$ is the intersection of closed balls of $X$. Mazur [32] proved that every Banach space with a Fréchet differentiable norm has the Mazur intersection property, so it was quite natural to ask if Asplund spaces have this property as well. The following two facts connect Theorem 3.8 to this problem.

Theorem 3.10 ([23]). Suppose that a Banach space $X$ has a biorthogonal system $\{(x_i, f_i) : i \in I\} \subseteq X \times X^*$ such that $X^* = \overline{\mathrm{span}}\{f_i : i \in I\}$. Then $X$ admits an equivalent norm with the Mazur intersection property.

Theorem 3.11 ([4]). Suppose $X$ is an Asplund space of density $\aleph_1$ with an uncountable biorthogonal system. Then there is a normalized sequence $\{x_\xi : \xi < \omega_1\}$ of elements of $X$ such that the operator $f \mapsto (f(x_\xi) : \xi < \omega_1)$ maps $X^*$ into a nonseparable subset of $c_0(\omega_1)$.

Remark 3.12. Note that this result gives us a particular instance of the hypothesis of Theorem 3.3. So applying (the proof of) Theorem 3.3 (see Remark 3.4), we get an uncountable subsequence $\{x_\gamma : \gamma \in \Gamma\}$ forming a Schauder basis of its norm-closed linear span $\overline{\mathrm{span}}\{x_\gamma : \gamma \in \Gamma\}$ and a quotient map $T : X^* \to (\overline{\mathrm{span}}\{x_\gamma : \gamma \in \Gamma\})^*$
onto the dual of this space, which itself is spanned by the basis $\{x_\gamma^* : \gamma \in \Gamma\}$ formed by the biorthogonal functionals of the basis $\{x_\gamma : \gamma \in \Gamma\}$. So if in addition $X$ (and therefore $X^*$) has density $\aleph_1$, we satisfy the hypothesis of Theorem 3.10. Combining this with Theorems 3.3 (Remarks 3.4 and 3.12), 3.10 and 3.8, we get the following.

Corollary 3.13 ([4]). Assume PID. Then every Asplund space of density $< \mathfrak{m}$ admits an equivalent norm with the Mazur intersection property.

Assumptions like $\mathfrak{p} > \omega_1$ are necessary in Corollary 3.9 in view of the following fact.⁶

Theorem 3.14 ([43]). If $\mathfrak{b} = \omega_1$ then there is a non-separable Asplund space of the form $X = C(K)$ with no uncountable biorthogonal systems.⁷

We finish this section with another application of Theorem 3.8.

Theorem 3.15 ([46]). Assume PID and $\mathfrak{m} > \omega_1$. Then every non-separable Banach space contains a closed convex subset supported⁸ by all of its points.

Remark 3.16. In [41], Rolewicz showed that a separable Banach space does not contain such a convex subset. There are examples that show that some assumption is needed in Theorem 3.15 (see [27] and [29]). However, given the assumption PID, we feel that it would be of independent interest to determine the exact extra assumptions that are needed for each of the three problems from the geometry of Banach spaces. For example, at this stage it is unclear if PID itself is sufficient for solving the problem of Rolewicz about support sets. On the other hand, it seems plausible that, assuming PID, the set-theoretic assumption $\mathfrak{b} > \omega_1$ is equivalent to the existence of an uncountable biorthogonal system in every nonseparable Asplund space and also equivalent to the statement that every Asplund space of density not bigger than $\aleph_1$ admits an equivalent norm with the Mazur intersection property.
4. Weakly null sequences on Polish spaces

When the weakly null sequence lives in $\ell_\infty(\Gamma)$ and $\Gamma$ is a Polish space, unconditionality results can be obtained using the Ramsey theory of trees based on the Halpern–Läuchli theorem [21] (see [48]). We devote this section to explaining this.

Definition 4.1. Fix a rooted finitely branching tree $U$ with no terminal nodes. A subtree $T$ of $U$ will be called a strong subtree if the levels of $T$ are subsets of the levels of $U$ and if for every $t \in T$ every immediate successor of $t$ in $U$ is extended by a unique immediate successor of $t$ in $T$.

⁶ Recall that PID is consistent with the equality $\mathfrak{b} = \omega_1$.
⁷ In fact, $X = C(K)$ is hereditarily Lindelöf relative to its weak topology so it admits no equivalent norm with the Mazur intersection property.
⁸ Recall that $x$ in $C$ supports $C$ if there is $f \in X^*$ such that $f(x) = \min\{f(y) : y \in C\} < \sup\{f(y) : y \in C\}$.
Theorem 4.2 ([21]). For every sequence $U_0, \dots, U_{d-1}$ of rooted finitely branching trees with no terminal nodes and for every finite colouring of the level product $U_0 \otimes \cdots \otimes U_{d-1}$, we can find for each $i < d$ a strong subtree $T_i$ of $U_i$ such that the $T_i$'s share the same level set and such that the level product $T_0 \otimes \cdots \otimes T_{d-1}$ is monochromatic.

This theorem serves as a pigeonhole principle behind the topological Ramsey space $S_\infty(U)$ of strong subtrees of $U$ (see [48]). The following result of Milliken [34] is the analogue of the well-known result of Galvin and Prikry [13] about the space of all infinite subsets of $\mathbb{N}$, the space that was relevant in the previous section of this paper.

Theorem 4.3 ([34]). For every finite Borel colouring of the space $S_\infty(U)$ of all strong subtrees of $U$ there is a strong subtree $T$ of $U$ such that the set $S_\infty(T)$ of strong subtrees of $T$ is monochromatic.

In applications one usually colours some specific subsets $F$ of $U$. This theorem is relevant because the “shape” of $F$ uniquely determines its strong subtree envelope, so the colouring can be induced to $S_\infty(U)$. For more information about this, the reader is referred to the relevant Chapter of [48]. When the trees $U_0, U_1, \dots, U_{d-1}$ are uniformly branching then the corresponding version of the Halpern–Läuchli theorem is closely related to another well-known pigeonhole principle, the Hales–Jewett theorem [20], and consequently also to the Hindman theorem ([22]) and the Gowers theorem ([15]), which also have Ramsey spaces associated to them (see [48]). Here we mention one of these because of its relevance to the problems we treat here. Let FIN be the collection of all nonempty finite subsets of $\mathbb{N}$. A block-sequence in FIN is a sequence $X = (x_n) \subseteq \mathrm{FIN}$ such that $x_m < x_n$ whenever $m < n$. We say that $X = (x_m)$ is a block-subsequence of $Y = (y_n)$ and write $X \le Y$ whenever every $x_m$ can be written as a union of some of the $y_n$'s.
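The two definitions just given can be made concrete on finite truncations. The sketch below is an illustration only: it assumes that $x < y$ for nonempty finite sets means $\max(x) < \min(y)$, works with finite lists of Python sets in place of infinite sequences, and uses hypothetical function names.

```python
def is_block_sequence(xs):
    """Every term is nonempty and max(x_m) < min(x_n) for consecutive terms."""
    if any(len(x) == 0 for x in xs):
        return False
    return all(max(xs[i]) < min(xs[i + 1]) for i in range(len(xs) - 1))


def is_block_subsequence(xs, ys):
    """X <= Y: both are block sequences and every term of X is a union
    of some of the terms of Y."""
    if not (is_block_sequence(xs) and is_block_sequence(ys)):
        return False
    for x in xs:
        # The only terms of Y that can take part in a union equal to x
        # are those contained in x.
        pieces = [y for y in ys if y <= x]
        if set().union(*pieces) != x:
            return False
    return True


if __name__ == "__main__":
    Y = [{0}, {1, 2}, {4}, {5, 7}, {9}]
    X = [{0, 1, 2}, {5, 7, 9}]
    print(is_block_subsequence(X, Y))           # X is built from blocks of Y
    print(is_block_subsequence([{1}, {4}], Y))  # {1} is not a union of blocks of Y
```

Since the terms of a block sequence are pairwise disjoint, collecting the terms of `ys` contained in `x` is enough to decide whether `x` is such a union.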
Let $\mathrm{FIN}^{[\infty]}$ be the space of all infinite block-sequences in FIN. The Hindman theorem is the pigeonhole principle behind the important fact that $\mathrm{FIN}^{[\infty]}$ forms a topological Ramsey space. We just mention here a consequence of this fact.

Theorem 4.4 ([33]). For every finite Borel colouring of $\mathrm{FIN}^{[\infty]}$ there is $Y = (y_n) \in \mathrm{FIN}^{[\infty]}$ such that the collection of all infinite block subsequences of $Y$ is monochromatic.

Here is a typical application of this result that shows its relationship to both the space $[\mathbb{N}]^\infty$ of all infinite subsets of $\mathbb{N}$ and the space of all perfect subtrees⁹ of the complete binary tree $2$ […]

[…] $> 0$. Hence, as an immediate consequence of (29) we get that, for any $T \ge 2$,
\[ \inf_{t \in [1,T]} D[\rho(t,\cdot)] \le \frac{1}{T-1} \int_1^T D[\rho(t,\cdot)] \, dt \le \frac{1}{T-1} H_{\kappa_0}[\rho(0,\cdot)]. \tag{30} \]
Alessio Figalli
(The reason for considering $t \ge 1$ is to ensure that some time passes so that the solution enjoys some further regularity properties needed to apply our estimates.) Observe now that for any density $\sigma$ on $\mathbb{R}^2$ such that $\|\nabla \sigma^{1/4}\|_2 < \infty$,
\[ D[\sigma] = \Bigl( \|\nabla \sigma^{1/4}\|_2 \, \|\sigma^{1/4}\|_4^2 + \sqrt{\pi} \, \|\sigma^{1/4}\|_6^3 \Bigr) \, \delta_{GN}(\sigma^{1/4}). \tag{31} \]
Hence, taking advantage of some uniform a priori bound on solutions to KS, we deduce the existence of some $\bar t \in [1, T]$ such that
\[ \delta_{GN}[\rho^{1/4}(\bar t, \cdot)] \le \frac{C}{T} H_{\kappa_0}[\rho] \]
for some universal constant $C$, so by the stability Theorem 3.1 we conclude that
\[ \bigl\| \rho(\bar t, \cdot)^{3/2} - \sigma_\kappa(\cdot - x_0)^{3/2} \bigr\|_1 \le C \Bigl( \frac{1}{T} H_{\kappa_0}[\rho] \Bigr)^{1/2} \]
for some $x_0 \in \mathbb{R}^2$ and $\kappa > 0$ (recall that the density $v_\lambda^4$ is a multiple of $\sigma_{1/\lambda}$). Now, using some uniform estimates on the $p$-th moments of the solution and its $L^q$ norms for all $p < 2$ and $q < \infty$, and exploiting that the KS evolution preserves the barycenter (in particular, without loss of generality we can assume that $\rho(t, \cdot)$ has barycenter at the origin for all $t$), we obtain
\[ \| \rho(\bar t, \cdot) - \sigma_\kappa \|_1 \le C \Bigl( \frac{1}{T} H_{\kappa_0}[\rho] \Bigr)^{(p-1)/4p} \tag{32} \]
for all $p < 2$ (here $C$ depends also on $p$). Hence the above inequality bounds the time it takes a solution of the critical mass Keller–Segel equation to approach $\sigma_\kappa$ for some $\kappa$. However, to get a quantitative convergence result, we must do two more things:
(A) Show that $\rho(t, \cdot)$ approaches $\sigma_\kappa$ for $\kappa = \kappa_0$.
(B) Show that eventually it remains close.
While (A) is relatively easy since $H_{\kappa_0}[\sigma_\kappa] = +\infty$ if $\kappa \ne \kappa_0$, (B) is much more involved. To achieve it, we first recall that there is another functional which is decreasing along the KS evolution: this is the Logarithmic Hardy–Littlewood–Sobolev (Log-HLS) functional $\mathcal{F}$, defined by
\[ \mathcal{F}[\rho] := \int_{\mathbb{R}^2} \rho(x) \log \rho(x) \, dx + 2 \Bigl( \int_{\mathbb{R}^2} \rho(x) \, dx \Bigr)^{-1} \iint_{\mathbb{R}^2 \times \mathbb{R}^2} \rho(x) \log|x - y| \, \rho(y) \, dx \, dy. \]
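As a sanity check on the formula for $\mathcal{F}$, one can verify by a change of variables that it does not change under mass-preserving dilations. The short derivation below is a sketch; the notation $M = \int_{\mathbb{R}^2} \rho$ and $\rho_a(x) = a^2 \rho(ax)$, $a > 0$, is introduced here for illustration and is not taken from the text.

```latex
\begin{align*}
\int_{\mathbb{R}^2} \rho_a \, dx &= \int_{\mathbb{R}^2} a^2 \rho(ax) \, dx = M, \\
\int_{\mathbb{R}^2} \rho_a \log \rho_a \, dx
  &= \int_{\mathbb{R}^2} a^2 \rho(ax) \bigl( 2 \log a + \log \rho(ax) \bigr) dx
   = 2 M \log a + \int_{\mathbb{R}^2} \rho \log \rho \, dx, \\
\iint \rho_a(x) \log|x - y| \, \rho_a(y) \, dx \, dy
  &= \iint \rho(x') \bigl( \log|x' - y'| - \log a \bigr) \rho(y') \, dx' \, dy'
   \qquad (x' = ax,\ y' = ay) \\
  &= \iint \rho(x') \log|x' - y'| \, \rho(y') \, dx' \, dy' - M^2 \log a, \\
\mathcal{F}[\rho_a]
  &= \mathcal{F}[\rho] + 2 M \log a - \frac{2}{M} \, M^2 \log a = \mathcal{F}[\rho].
\end{align*}
```

The two $\log a$ contributions cancel exactly because the interaction term is normalized by the total mass.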
(The fact that such a functional is decreasing along the KS equation is not really surprising, since the KS equation can be interpreted as the gradient flow of $\mathcal{F}$ with respect to the 2-Wasserstein distance.) This functional is invariant under the scale changes $\rho(x) \mapsto a^2 \rho(ax)$, $a > 0$. In particular, $\mathcal{F}[\sigma_\kappa]$ is independent of $\kappa$. Moreover the functions $\{\sigma_\kappa\}_{\kappa > 0}$ uniquely minimize $\mathcal{F}$, see [3, 8]. Keeping this in mind, for (B) we proceed as follows: first we show almost Lipschitz regularity of $\mathcal{F}$ in $L^1$ [7, Theorem 3.7], which combined with (32) and
Stability in geometric & functional inequalities, with applications
the fact that $p$ can be chosen close to 2, allows us to deduce that for any $\varepsilon > 0$ and $T \ge 2$ there exists $\bar t \in [1, T]$ such that $\mathcal{F}[\rho(\bar t, \cdot)] - \min \mathcal{F} \le C \, T^{-(1-\varepsilon)/8}$. Then, since $\bar t \le T$ and $\mathcal{F}[\rho(t, \cdot)]$ is decreasing in time, we get
\[ \mathcal{F}[\rho(T, \cdot)] - C(8\pi) \le C \, T^{-(1-\varepsilon)/8} \tag{33} \]
for all $T \ge 2$. Finally, in order to conclude that $\rho(T, \cdot)$ is close to some $\sigma_\kappa$ we prove a stability result for the Log-HLS functional [7, Theorem 1.9]. (This is obtained exploiting Theorem 3.1 and some dissipation properties of the Log-HLS functional along a fast diffusion equation, see the proof of [7, Theorem 1.9] for more details.) Using this second stability result (combined with some additional time regularity estimates on the solution, see [7, Lemma 3.8]), one deduces for all $t \ge 2$ the existence of some constant $\kappa(t) > 0$ such that $\|\rho(t, \cdot) - \sigma_{\kappa(t)}\|_1 \le C \, t^{-(1-\varepsilon)/320}$. Finally, a simple argument using the sensitive dependence of $H_{\kappa_0}$ on tails allows us to show that $\kappa(t)$ converges at a logarithmic rate to $\kappa_0$. Thus, the final convergence result proved in [7, Theorem 3.5] becomes:

Theorem 4.1. Let $\rho(t, x)$ be any properly dissipative solution of the Keller–Segel equation of critical mass $M = 8\pi$ such that $H_{\kappa_0}[\rho(0, \cdot)] < \infty$ for some $\kappa_0 > 0$, and $\mathcal{F}[\rho(0, \cdot)] < \infty$. Assume that $\int_{\mathbb{R}^2} x \, \rho(0, x) \, dx = 0$. Then, for all $\varepsilon > 0$ there are constants $C_1$ and $C_2$, depending only on $\varepsilon$, $\kappa$, $H_{\kappa, 8\pi}[\rho(0, \cdot)]$ and $\mathcal{F}[\rho(0, \cdot)]$, such that, for all $t > 0$,
\[ \mathcal{F}[\rho(t, \cdot)] - C(8\pi) \le C_1 (1 + t)^{-(1-\varepsilon)/8}, \qquad \inf_{\mu > 0} \|\rho(t, \cdot) - \sigma_{\mu, 8\pi}\|_1 \le C_2 (1 + t)^{-(1-\varepsilon)/320}. \]
Moreover, there is a positive number $a > 0$, depending only on $H_{\kappa_0}[\rho(0, \cdot)]$ and $\mathcal{F}[\rho(0, \cdot)]$, so that for each $t > 0$, $\inf_{\kappa > 0} \|\rho(t, \cdot) - \sigma_\kappa\|_1 = \min_{a \dots}$ […]

[…] $> 0$ for which the amplification $M^t$ (the “$t$ by $t$ matrices over $M$”) is isomorphic to $M$. The calculation of the fundamental group $\mathcal{F}(M)$ and the outer automorphism group $\mathrm{Out}(M)$ of a II$_1$ factor $M$ was and is an extremely challenging problem. By [8], these groups are countable if $M$ comes from a property (T) group. Free probability theory was later used to prove that $\mathcal{F}(L(\mathbb{F}_\infty)) = \mathbb{R}_+^*$ [98, 87]. But it was not until Popa's deformation/rigidity theory that the first examples of II$_1$ factors with trivial fundamental and outer automorphism group were found. After giving the first examples of II$_1$ factors with trivial fundamental group [71], Popa showed that $\mathcal{F}(M)$ can be any countable subgroup of $\mathbb{R}_+^*$ [72]. The first calculations of $\mathrm{Out}(M)$ were obtained in joint work with Peterson and Popa [48]. We showed that for any compact abelian group $K$, there exists a II$_1$ factor $M$ such that $\mathrm{Out}(M) = K$. In particular, there are II$_1$ factors without outer automorphisms. The techniques of [48] were further exploited to show that there are II$_1$ factors with no non-trivial subfactors of finite Jones index [93] and that $\mathrm{Out}(M)$ can be any compact group [23]. Subsequently, our understanding of the possible values of $\mathcal{F}(M)$ and $\mathrm{Out}(M)$ has considerably improved. Remarkably, Popa and Vaes showed that $\mathcal{F}(M)$ can be uncountable and $\ne \mathbb{R}_+^*$, and moreover that $\mathcal{F}(M)$ can have arbitrary Hausdorff dimension $\alpha \in [0, 1]$ ([80, 81], see also [20]). Additionally, they proved that $\mathrm{Out}(M)$ can be any countable group [79, 94]. In spite of all this progress on $\mathcal{F}(M)$ and $\mathrm{Out}(M)$, not a single calculation of the endomorphism semigroup $\mathrm{End}(M)$ is yet available. Nevertheless, a description of all endomorphisms of certain II$_1$ factors $M$ was obtained in [45].

In the mid 90s, Voiculescu used his free entropy dimension to show that the free group factors $L(\mathbb{F}_n)$ do not have Cartan subalgebras [99]. Ge then showed that the free group factors are also prime [32], i.e. they cannot be written as the tensor
Adrian Ioana
product of two II$_1$ factors. In the last decade, these indecomposability results have been generalized and strengthened in many ways. Using subtle C$^*$-algebraic techniques, Ozawa proved that II$_1$ factors arising from icc hyperbolic groups $\Gamma$ are solid: the relative commutant $A' \cap L(\Gamma)$ of any diffuse von Neumann subalgebra $A \subset L(\Gamma)$ is amenable [58]. In particular, $L(\Gamma)$ and all of its non-hyperfinite subfactors are prime. Techniques from [71, 58] were then combined in [62] to provide a family of II$_1$ factors that can be uniquely written as a tensor product of prime factors. By developing a novel technique based on closable derivations, Peterson was able to show that II$_1$ factors arising from icc groups with positive first $\ell^2$-Betti number are prime [66]. A new proof of solidity of $L(\mathbb{F}_n)$ was found by Popa in [77], while II$_1$ factors coming from icc groups $\Gamma$ admitting a proper cocycle into $\ell^2(\Gamma)$ were shown to be solid in [66]. For further examples of prime and solid II$_1$ factors, see [59, 75, 10, 5]. In [63], Ozawa and Popa discovered that the free group factors enjoy a remarkable structural property, called strong solidity, which strengthens both solidity and absence of Cartan subalgebras: the normalizer of any diffuse amenable subalgebra $A \subset L(\mathbb{F}_n)$ is amenable. Recently, Chifan and Sinclair showed that, more generally, the group von Neumann algebra of any icc hyperbolic group is strongly solid [17]. For more examples of strongly solid factors, see [64, 36, 39, 90].

1.5. W$^*$-superrigidity and uniqueness of Cartan subalgebras.

Two free ergodic pmp actions $\Gamma \curvearrowright (X, \mu)$ and $\Lambda \curvearrowright (Y, \nu)$ are orbit equivalent (OE) if there exists an isomorphism of probability spaces $\theta : X \to Y$ such that $\theta(\Gamma \cdot x) = \Lambda \cdot \theta(x)$, for almost every $x \in X$. Singer showed that orbit equivalence amounts to the existence of an isomorphism $L^\infty(X) \rtimes \Gamma \cong L^\infty(Y) \rtimes \Lambda$ which identifies the Cartan subalgebras $L^\infty(X)$ and $L^\infty(Y)$ [89]. Thus, W$^*$-equivalence of actions (imposing isomorphism of their group measure space factors) is weaker than orbit equivalence. In addition, orbit equivalence is clearly weaker than conjugacy. By [65], if $\Gamma$ and $\Lambda$ are infinite amenable groups, then any free ergodic pmp actions $\Gamma \curvearrowright (X, \mu)$ and $\Lambda \curvearrowright (Y, \nu)$ are orbit equivalent. On the other hand, the culmination of a series of works [34, 42, 33, 22] has recently resulted in showing that any non-amenable group $\Gamma$ admits uncountably many non-OE free ergodic pmp actions [22] (see the survey [37]). Furthermore, as shown in [42], $\Gamma$ admits uncountably many non-W$^*$-equivalent actions.

Popa's strong rigidity theorem [73] shows that conjugacy of the actions $\Gamma \curvearrowright (X, \mu)$ and $\Lambda \curvearrowright (Y, \nu)$ can be deduced from W$^*$-equivalence of these actions if certain conditions are imposed both on the action $\Gamma \curvearrowright (X, \mu)$ and the group $\Lambda$. This result made it reasonable to believe that there exist actions $\Gamma \curvearrowright (X, \mu)$ which are W$^*$-superrigid in the following sense: an arbitrary action $\Lambda \curvearrowright (Y, \nu)$ that is W$^*$-equivalent to $\Gamma \curvearrowright (X, \mu)$ must be conjugate to it. Specifically, Popa asked in [73] whether Bernoulli actions of icc property (T) groups are W$^*$-superrigid.

By [25, 26], not all Cartan subalgebras come from the group measure space construction. To distinguish the ones that do, we call them group measure space Cartan subalgebras. Then an action $\Gamma \curvearrowright (X, \mu)$ is W$^*$-superrigid if and only if it is OE superrigid and $L^\infty(X)$ is the unique group measure space Cartan subalgebra
Classification and rigidity for von Neumann algebras
of $L^\infty(X) \rtimes \Gamma$, up to unitary conjugacy. Each of these two properties is extremely hard to establish and their combination is even more so. In the last 15 years, several large families of OE superrigid actions have been found [26, 74, 75, 52, 43, 82, 53, 27, 78], including:
(1) the standard action $SL_n(\mathbb{Z}) \curvearrowright (\mathbb{T}^n, \lambda^n)$, for $n \ge 3$ [26];
(2) Bernoulli actions $\Gamma \curvearrowright (X, \mu) = (X_0, \mu_0)^\Gamma$ of many groups, including property (T) groups [74] and products of non-amenable groups [75];
(3) arbitrary free ergodic pmp actions of certain mapping class groups [52];
(4) free ergodic profinite actions of property (T) groups [43].
Note that the actions $\Gamma \curvearrowright (X, \mu)$ from (1) and (4) are “virtually” OE-superrigid: if a free ergodic pmp action $\Lambda \curvearrowright (Y, \nu)$ is OE to $\Gamma \curvearrowright (X, \mu)$, then the restrictions of these actions to some finite index subgroups $\Gamma_0 < \Gamma$ and $\Lambda_0 < \Lambda$ are conjugate.

On the other hand, the first uniqueness result for Cartan subalgebras, up to unitary conjugacy, was obtained only recently (2007) by Ozawa and Popa in their breakthrough work [63]. They proved that II$_1$ factors $L^\infty(X) \rtimes \Gamma$ associated with free ergodic profinite actions of free groups $\Gamma = \mathbb{F}_n$ and their direct products $\Gamma = \mathbb{F}_{n_1} \times \mathbb{F}_{n_2} \times \cdots \times \mathbb{F}_{n_k}$ have a unique Cartan subalgebra. This result was extended in [64] to groups $\Gamma$ that have the complete metric approximation property and admit a proper cocycle into a non-amenable representation. It was then shown in [67] that II$_1$ factors arising from profinite actions of groups $\Gamma$ that admit an unbounded cocycle into a mixing representation but do not have Haagerup's property have a unique group measure space Cartan subalgebra. However, since none of these actions was known to be OE superrigid, W$^*$-superrigidity could not be concluded. The situation changed starting with the work of Peterson [67], who was able to show the existence of virtually W$^*$-superrigid actions. Shortly after, Popa and Vaes discovered the first concrete families of W$^*$-superrigid actions [83]. They first showed a general unique decomposition result: any II$_1$ factor arising from an arbitrary free ergodic pmp action $\Gamma \curvearrowright (X, \mu)$ of a group $\Gamma$ in a large class of amalgamated free product groups has a unique group measure space Cartan subalgebra, up to unitary conjugacy. Applying OE superrigidity results from [74, 75] and [53] then allowed them to provide several wide classes of W$^*$-superrigid actions. For related results, see [24, 38].

Despite this progress, the original question of whether Bernoulli actions of icc property (T) groups are W$^*$-superrigid remained open for some time, until it was answered in the affirmative in [45]. The starting point of the proof is the observation that every group measure space decomposition $M = L^\infty(Y) \rtimes \Lambda$ of a II$_1$ factor $M$ gives rise to an embedding $\Delta : M \to M \mathbin{\overline{\otimes}} M$ [83]. In the case when $M$ arises from a Bernoulli action $\Gamma \curvearrowright (X, \mu) = (X_0, \mu_0)^\Gamma$ of an icc property (T) group $\Gamma$, a classification of all such embeddings was given in [45]. This classification is precise enough to imply that the Cartan subalgebras $L^\infty(X)$ and $L^\infty(Y)$ are unitarily conjugate. In combination with the OE superrigidity theorem [74] it follows that the action $\Gamma \curvearrowright (X, \mu)$ is W$^*$-superrigid.
Adrian Ioana
An icc group Γ is called W∗-superrigid if any group Λ satisfying L(Γ) ≅ L(Λ) must be isomorphic to Γ. The superrigidity question for groups is significantly harder than for actions. While large families of W∗-superrigid actions were found in [83, 45], not a single example of a W∗-superrigid group was known until our joint work with Popa and Vaes [49]. We proved that many generalized wreath product groups are W∗-superrigid, although plain wreath product groups essentially never have this property. For instance, we showed that given any non-amenable group Γ_0, its canonical “augmentation” Γ = (Z/2Z)^(I) ⋊ (Γ_0 ≀ Z) is W∗-superrigid, where the set I is the quotient (Γ_0 ≀ Z)/Z on which the wreath product group Γ_0 ≀ Z = Γ_0^(Z) ⋊ Z acts by left multiplication.
A general conjecture predicts that II₁ factors L∞(X) ⋊ Γ arising from arbitrary free ergodic pmp actions of groups Γ with positive first ℓ²-Betti number have a unique Cartan subalgebra, up to unitary conjugacy. The condition β₁^(2)(Γ) > 0 is equivalent to Γ being non-amenable and having an unbounded cocycle into its left regular representation [3, 86], and is satisfied by any free product group Γ = Γ_1 ∗ Γ_2 with |Γ_1| ≥ 2 and |Γ_2| ≥ 3.
Several recent results provide supporting evidence for this conjecture. Popa and Vaes showed in [83] that if Γ is the free product of a non-trivial group and an infinite property (T) group, then any II₁ factor L∞(X) ⋊ Γ associated with a free ergodic pmp action of Γ has a unique group measure space Cartan subalgebra. Chifan and Peterson then proved that the same holds for groups Γ with positive first ℓ²-Betti number that admit a non-amenable subgroup with the relative property (T) [15]. A common generalization of the last two results was obtained in [96]. Most recently, it was proven in [46, 47] that if a group Γ satisfies β₁^(2)(Γ) > 0, then L∞(X) ⋊ Γ has a unique group measure space Cartan subalgebra whenever the action Γ ↷ (X, µ) is either rigid or profinite.
Generalizing [63], it was shown in [17, 18] that profinite actions of hyperbolic groups and of direct products of hyperbolic groups give rise to II₁ factors with a unique Cartan subalgebra. The proofs of [63, 17, 18] rely both on the fact that free groups (and, more generally, hyperbolic groups, see [60, 61]) are weakly amenable and on the fact that the actions are profinite.
In a very recent breakthrough, Popa and Vaes removed the assumption that the action is profinite and obtained wide-ranging unique Cartan subalgebra results. They proved that if Γ is either a weakly amenable group with β₁^(2)(Γ) > 0 [84] or a hyperbolic group [85] (or a direct product of groups in one of these classes), then II₁ factors L∞(X) ⋊ Γ arising from arbitrary free ergodic pmp actions of Γ have a unique Cartan subalgebra, up to unitary conjugacy. In particular, this settles the “β₁^(2)(Γ) > 0 conjecture” in the key case when Γ is a free group. Note that in the meantime, the main result of [85] has been extended by Houdayer and Vaes to cover non-amenable non-singular actions of hyperbolic groups [40].
Organization of the paper. Besides the introduction and a section of preliminaries, this paper has four other sections. In Section 3 we review the basic notions and techniques of Popa’s deformation/rigidity theory. In Sections 4–6 we expand upon the topics discussed in the last part of the introduction. In doing so, we
Classification and rigidity for von Neumann algebras
follow three directions: W∗-superrigidity for Bernoulli actions and wreath product groups, uniqueness of group measure space Cartan subalgebras, and uniqueness of arbitrary Cartan subalgebras.
Acknowledgments. It is my pleasure to thank Jesse Peterson and Sorin Popa for useful comments, and Stefaan Vaes for many suggestions that helped improve the exposition.
2. Preliminaries
2.1. Tracial von Neumann algebras.
Definition 2.1. A tracial von Neumann algebra (M, τ) is a von Neumann algebra M together with a positive linear functional τ : M → C that is
• a trace: τ(xy) = τ(yx), for all x, y ∈ M,
• faithful: if τ(x) = 0, for some x ≥ 0, then x = 0, and
• normal: τ(Σ_{i∈I} p_i) = Σ_{i∈I} τ(p_i), for any family {p_i}_{i∈I} ⊂ M of mutually orthogonal projections.
Any tracial von Neumann algebra (M, τ) admits a canonical (or standard) representation on a Hilbert space. Indeed, denote by L²(M) the Hilbert space obtained by completing M w.r.t. the 2-norm: ‖x‖₂ = √τ(x∗x). Then the left multiplication on M extends to an injective ∗-homomorphism π : M → B(L²(M)).
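The simplest instance of Definition 2.1 is the matrix algebra M_n(C) with its normalized trace. The sketch below is only a finite-dimensional illustration (the helper names `matmul`, `adjoint`, `tau` and `two_norm` are ours, not notation from the text); it evaluates the trace property and the 2-norm on two sample matrices.

```python
import math

def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(x):
    """Conjugate transpose x* of a square matrix."""
    n = len(x)
    return [[complex(x[j][i]).conjugate() for j in range(n)] for i in range(n)]

def tau(x):
    """Normalized trace on M_n(C): the average of the diagonal entries."""
    n = len(x)
    return sum(x[i][i] for i in range(n)) / n

def two_norm(x):
    """The 2-norm ||x||_2 = sqrt(tau(x* x)); tau(x* x) is real and >= 0."""
    return math.sqrt(tau(matmul(adjoint(x), x)).real)

# Two sample matrices in M_2(C).
x = [[1, 2j], [0, 3]]
y = [[0, 1], [1j, -1]]

# Trace property: tau(xy) and tau(yx) agree although xy != yx.
print(tau(matmul(x, y)), tau(matmul(y, x)))
# The 2-norm is the normalized Hilbert-Schmidt norm.
print(two_norm(x))
```

In this toy case L²(M) is just M_n(C) itself with the normalized Hilbert-Schmidt inner product, and π is left matrix multiplication.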
Next, we recall from [56, 57, 25] three general constructions of tracial von Neumann algebras: from groups, group actions and equivalence relations.
Let Γ be a countable group and denote by {δ_h}_{h∈Γ} the usual orthonormal basis of ℓ²(Γ). The left regular representation u : Γ → U(ℓ²(Γ)) is given by u_g(δ_h) = δ_{gh}. The group von Neumann algebra L(Γ) is the von Neumann algebra generated by {u_g}_{g∈Γ}, i.e. the weak operator closure of the group algebra CΓ = span{u_g}_{g∈Γ}. It has a faithful normal trace given by τ(T) = ⟨Tδ_e, δ_e⟩. In other words, τ(u_g) = 1 if g = e, and τ(u_g) = 0 if g ≠ e.
Now, let Γ ↷ (X, µ) be a probability measure preserving (pmp) action of a countable group Γ on a probability space (X, µ). Denote by (σ_g)_{g∈Γ} the associated action of Γ on L∞(X), i.e. σ_g(a)(x) = a(g⁻¹ · x). Then both Γ and L∞(X) are represented on the Hilbert space L²(X, µ) ⊗̄ ℓ²(Γ) through the formulae: u_g(b ⊗ δ_h) = σ_g(b) ⊗ δ_{gh}, and a(b ⊗ δ_h) = ab ⊗ δ_h. The group measure space von Neumann algebra L∞(X) ⋊ Γ is the von Neumann algebra generated by {u_g}_{g∈Γ} and L∞(X). It has a faithful normal trace defined by τ(T) = ⟨T(1 ⊗ δ_e), 1 ⊗ δ_e⟩. Since u_g a u_g∗ = σ_g(a), the group measure space
algebra L∞(X) ⋊ Γ is equal to the weak operator closure of the span of the set {a u_g | a ∈ L∞(X), g ∈ Γ}. The restriction of τ to this set is given by τ(a u_g) = ∫_X a dµ if g = e, and τ(a u_g) = 0 if g ≠ e.
Finally, let R be a countable pmp equivalence relation on a probability space (X, µ) [25]. This means that R has countable classes, R is a Borel subset of X × X, and that every Borel automorphism θ of X satisfying (θ(x), x) ∈ R, almost everywhere, preserves the measure µ. The group of such automorphisms θ is called the full group of R and denoted [R]. Further, we consider on R the infinite Borel measure given by µ̃(A) = ∫_X |{y ∈ X | (x, y) ∈ A}| dµ(x), for every Borel subset A ⊂ R.
Then both [R] and L∞(X) are represented on the Hilbert space L²(R, µ̃) through the formulae u_θ(f)(x, y) = f(θ⁻¹(x), y) and (af)(x, y) = a(x)f(x, y), for all f ∈ L²(R, µ̃) and every θ ∈ [R], a ∈ L∞(X). The generalized group measure space von Neumann algebra L(R) associated with R is generated by {u_θ}_{θ∈[R]} and L∞(X). This algebra also has a faithful normal trace given by τ(T) = ⟨T(1_∆), 1_∆⟩, where ∆ = {(x, x) | x ∈ X}, or explicitly by τ(u_θ) = µ({x ∈ X | θ(x) = x}), and τ(a) = ∫_X a dµ(x).
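The formula τ(u_θ) = µ({x | θ(x) = x}) can be checked by hand in a finite toy model, which is our own illustrative assumption and not a case treated in the text: X = {0, ..., n−1} with the uniform measure and R = X × X, so that µ̃ gives every pair mass 1/n and [R] consists of all permutations of X. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def trace_of_u_theta(theta):
    """Compute <u_theta 1_Delta, 1_Delta> in L^2(R, mu~) for R = X x X.

    theta is a permutation of X = {0, ..., n-1}: theta[x] is the image of x.
    """
    n = len(theta)
    inverse = [0] * n
    for x, image in enumerate(theta):
        inverse[image] = x

    def diagonal(x, y):
        # The vector 1_Delta, the indicator function of the diagonal.
        return 1 if x == y else 0

    def shifted(x, y):
        # (u_theta 1_Delta)(x, y) = 1_Delta(theta^{-1}(x), y)
        return diagonal(inverse[x], y)

    # mu~ assigns mass 1/n to each pair (x, y) because mu is uniform on X.
    return sum(Fraction(shifted(x, y) * diagonal(x, y), n)
               for x, y in product(range(n), repeat=2))

def measure_of_fixed_points(theta):
    """mu({x : theta(x) = x}) for the uniform measure on X."""
    n = len(theta)
    return Fraction(sum(1 for x in range(n) if theta[x] == x), n)

theta = [1, 0, 2, 3]          # swaps 0 and 1, fixes 2 and 3
print(trace_of_u_theta(theta), measure_of_fixed_points(theta))
```

In this model L(R) is the matrix algebra M_n(C) and τ is its normalized trace, so the example is finite-dimensional and only mirrors the formulas above.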
More generally, to every scalar 2-cocycle w on R, one can associate a tracial von Neumann algebra L(R, w), which, as above, contains a copy of L∞(X) [25]. Note that if Γ ↷ (X, µ) is a free pmp action and we denote by R_{Γ↷X} the equivalence relation given by its orbits, then there exists a canonical isomorphism L∞(X) ⋊ Γ = L(R_{Γ↷X}) whose restriction to L∞(X) is the identity.
2.2. II₁ factors and Cartan subalgebras.
A II₁ factor is an infinite dimensional tracial von Neumann algebra with trivial center. We have that L(Γ) is a II₁ factor if and only if Γ is icc and that L∞(X) ⋊ Γ is a II₁ factor whenever the action Γ ↷ (X, µ) is free and ergodic. Moreover, L(R) is a II₁ factor if and only if R is ergodic: any R-invariant Borel subset of X has measure 0 or 1.
Let A be a von Neumann subalgebra of a II₁ factor M. The normalizer of A in M, denoted N_M(A), is the group of unitaries u ∈ M satisfying uAu∗ = A. We say that A is a Cartan subalgebra of M if it is maximal abelian and its normalizer generates M. If Γ ↷ (X, µ) is a free ergodic pmp action, then L∞(X) is a group measure space Cartan subalgebra of L∞(X) ⋊ Γ. If R is an ergodic pmp equivalence relation, then L∞(X) is a Cartan subalgebra of L(R). More generally, L∞(X) is a Cartan subalgebra of L(R, w), for every 2-cocycle w on R. Conversely, Feldman and Moore [25] showed that any Cartan subalgebra inclusion arises in this way.
By [9], any two Cartan subalgebras of the hyperfinite II₁ factor are conjugated by an automorphism. The first examples of II₁ factors admitting two Cartan
subalgebras that are not conjugated by an automorphism were given by Connes and Jones in [12]. Examples of such II₁ factors where the two Cartan subalgebras are explicit were recently found by Ozawa and Popa ([64], see also [83]). Very recently, a class of II₁ factors M whose Cartan subalgebras cannot be classified, in the sense that the equivalence relation of being conjugated by an automorphism of M is not Borel, has been constructed in [91].
Proving uniqueness of Cartan subalgebras plays a crucial role in the classification of group measure space II₁ factors. As the next result shows, it allows one to reduce the classification of group measure space factors, up to isomorphism, to the classification of the corresponding group actions, up to orbit equivalence.
Proposition 2.2 (Singer, [89]). Let Γ ↷ (X, µ) and Λ ↷ (Y, ν) be free ergodic pmp actions. Then the following are equivalent:
• there exists a ∗-isomorphism θ : L∞(X) ⋊ Γ → L∞(Y) ⋊ Λ such that θ(L∞(X)) = L∞(Y).
• the actions Γ ↷ (X, µ) and Λ ↷ (Y, ν) are orbit equivalent: there exists an isomorphism α : X → Y of probability spaces such that α(Γ · x) = Λ · α(x), for almost all x ∈ X.
The proof of this proposition relies on the fact that any unitary element u in L∞(X) ⋊ Γ = L(R_{Γ↷X}) that normalizes L∞(X) is of the form u = a u_θ, for some unitary a ∈ L∞(X) and some θ ∈ [R_{Γ↷X}].
2.3. Popa’s intertwining-by-bimodules.
Let Γ ↷ (X, µ) and Λ ↷ (Y, ν) be free ergodic pmp actions and θ : L∞(X) ⋊ Γ → L∞(Y) ⋊ Λ be a ∗-isomorphism. Then both θ(L∞(X)) and L∞(Y) are Cartan subalgebras of M = L∞(Y) ⋊ Λ. By Proposition 2.2, in order to conclude that the initial actions are orbit equivalent, it suffices to find an automorphism ρ of M such that ρ(θ(L∞(X))) = L∞(Y). In [71, 72], Popa developed a powerful technique for showing unitary conjugacy of subalgebras of a tracial von Neumann algebra. In particular, this provides a criterion for the existence of an inner automorphism ρ of M with the desired property.
Theorem 2.3 (Popa, [71, 72]). Let P, Q be von Neumann subalgebras of a tracial von Neumann algebra (M, τ). Then the following conditions are equivalent:
• there exist projections p ∈ P, q ∈ Q, a ∗-homomorphism θ : pPp → qQq and a non-zero partial isometry v ∈ qMp such that θ(x)v = vx, for all x ∈ pPp.
• there is no sequence of unitaries u_n ∈ P satisfying ‖E_Q(x u_n y)‖₂ → 0, for all x, y ∈ M.
If one of these conditions is satisfied, we say that a corner of P embeds into Q. Moreover, if P and Q are Cartan subalgebras of M, and a corner of P embeds into Q, then there exists a unitary u ∈ M such that uPu∗ = Q.
Here, as usual, E_Q : M → Q denotes the unique τ-preserving conditional expectation onto Q.
3. Popa’s deformation/rigidity theory
3.1. Deformations.
In the last decade, Popa’s deformation/rigidity theory has completely reshaped the landscape of von Neumann algebras. At the heart of Popa’s theory is the notion of deformation of II₁ factors. In the first part of this section, we define this notion and then illustrate it with many examples.
Definition 3.1. A deformation of the identity of a tracial von Neumann algebra (M, τ) is a sequence φ_n : M → M of unital, trace preserving, completely positive maps satisfying ‖φ_n(x) − x‖₂ → 0, for all x ∈ M.
A linear map φ : M → M is completely positive if for all m ≥ 1 the amplification φ^(m) : M_m(M) → M_m(M) given by φ^(m)([x_{i,j}]) = [φ(x_{i,j})] is positive.
Example 3.2. Let φ_n : Γ → C be a sequence of positive definite functions on a countable group Γ such that φ_n(e) = 1, for all n, and φ_n(g) → 1, for all g ∈ Γ. Then we have a deformation of L(Γ) given by u_g ↦ φ_n(g)u_g, and one of any group measure space algebra L∞(X) ⋊ Γ given by a u_g ↦ φ_n(g) a u_g. If Γ has Haagerup’s property [35], then there is such a sequence φ_n : Γ → C satisfying φ_n ∈ c_0(Γ), for all n. In this case, the resulting deformation of L∞(X) ⋊ Γ is compact relative to L∞(X). This fact is a crucial ingredient of Popa’s proof that the II₁ factor L∞(T²) ⋊ SL_2(Z) has trivial fundamental group [71].
Example 3.3. Let Γ be a countable group and Γ ↷ (X, µ) = ([0, 1], Leb)^Γ be the Bernoulli action: g · (x_h)_{h∈Γ} = (x_{g⁻¹h})_{h∈Γ}. Popa discovered in [70, 72] that Bernoulli actions have a remarkable deformation property, called malleability: there exists a continuous family of automorphisms (α_t)_{t∈R} of the probability space X × X, which commute with the diagonal action of Γ, and satisfy α_0 = id, α_1(x, y) = (y, x). To see this, first construct a continuous family of automorphisms (α⁰_t) of the probability space [0, 1] × [0, 1] such that α⁰_0 = id and α⁰_1(x, y) = (y, x).
For instance, one can take α⁰_t(x, y) = (x, y) if |x − y| ≥ t, and α⁰_t(x, y) = (y, x) if |x − y| < t. Then identify X × X = ([0, 1] × [0, 1])^Γ and define α_t((x_h)_{h∈Γ}) = (α⁰_t(x_h))_{h∈Γ}.
Next, let us explain how to get a deformation of M = L∞(X) ⋊ Γ from (α_t)_{t∈R}. Denote by θ_t the automorphism of L∞(X × X) given by θ_t(a)(x) = a(α_t⁻¹(x)). Since θ_t commutes with the diagonal action of Γ, it extends to an automorphism of M̃ = L∞(X × X) ⋊ Γ by letting θ_t(u_g) = u_g, for g ∈ Γ. Then (θ_t)_{t∈R} is a continuous family of automorphisms of M̃ such that θ_0 = id.
Now, in general, whenever one has such a pair (M̃, (θ_t)_{t∈R}), letting φ_t = E_M ◦ θ_t|M : M → M and choosing any sequence t_n → 0 gives a deformation (φ_{t_n})_n of M. This is why from now on, abusing notation, we say that the pair (M̃, (θ_t)_{t∈R}) is a deformation of M.
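The coordinatewise map α⁰_t of Example 3.3 is concrete enough to experiment with. The following sketch implements it together with a finite stand-in for the Bernoulli shift; replacing Γ by the cyclic group Z/nZ is our simplifying assumption, made only so that points of ([0, 1] × [0, 1])^Γ become finite tuples, and the function names are hypothetical.

```python
def alpha0(t, x, y):
    """The map alpha_t^0 on [0,1] x [0,1]: swap coordinates when |x - y| < t."""
    return (y, x) if abs(x - y) < t else (x, y)

def alpha(t, point):
    """Apply alpha_t^0 in every coordinate of a point of ([0,1] x [0,1])^Gamma."""
    return tuple(alpha0(t, x, y) for (x, y) in point)

def shift(g, point):
    """Bernoulli shift by g in Z/nZ: (g . p)_h = p_{h - g}."""
    n = len(point)
    return tuple(point[(h - g) % n] for h in range(n))

# A sample point, indexed by Z/3Z; each entry is a pair (x_h, y_h).
point = ((0.1, 0.9), (0.5, 0.6), (0.3, 0.3))

# alpha_0 is the identity and alpha_1 is the flip (off the null set |x - y| = 1).
print(alpha(0, point) == point)
print(alpha(1, point) == tuple((y, x) for (x, y) in point))
# alpha_t acts coordinatewise, so it commutes with the shift.
print(alpha(0.5, shift(1, point)) == shift(1, alpha(0.5, point)))
```

Each α⁰_t is an involution that is either the identity or the reflection (x, y) ↦ (y, x) on a set symmetric under that reflection, which is why it preserves Lebesgue measure on the square.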
Since they were first introduced in [70, 72], malleable deformations have been found in several other contexts and are now a central tool in Popa’s deformation/rigidity theory (see [76, Section 6]). Next, we review several constructions of malleable deformations.
Example 3.4. A malleable deformation for Bernoulli actions related to the one in [72] was introduced in [41]. Note that if (B, τ) is a tracial von Neumann algebra and I is a set, then one can construct a tensor product von Neumann algebra ⊗̄_{i∈I} B. This algebra is tracial, with its trace given by τ̃(x) = ∏_{i∈I} τ(x_i), for every element x = ⊗_{i∈I} x_i whose support, {i ∈ I | x_i ≠ 1}, is finite.
With the notations from Example 3.3, we can now identify A = L∞(X) with the infinite tensor product algebra ⊗̄_{g∈Γ} A_0, where A_0 = L∞([0, 1]). Define Ã_0 to be the free product von Neumann algebra A_0 ∗ L(Z). Let u ∈ L(Z) be the canonical generating unitary and choose a self-adjoint operator h such that u = exp(ih). For t ∈ R, let θ⁰_t be the inner automorphism of Ã_0 given by θ⁰_t = Ad(exp(ith)). Then θ_t = ⊗_{g∈Γ} θ⁰_t is an automorphism of Ã = ⊗̄_{g∈Γ} Ã_0 which commutes with the Bernoulli action of Γ. Thus, θ_t extends to an automorphism of M̃ = Ã ⋊ Γ by letting θ_t(u_g) = u_g.
Example 3.5. Let (M_1, τ_1) and (M_2, τ_2) be tracial von Neumann algebras with a common von Neumann subalgebra A such that τ_1|A = τ_2|A. Denote by M the amalgamated free product von Neumann algebra M_1 ∗_A M_2 (see [69] and [97] for the definition). Following [48], M admits a natural malleable deformation. More precisely, define M̃ = M ∗_A (A ⊗̄ L(F_2)). Then M ⊂ M̃ and one constructs a 1-parameter group of automorphisms (θ_t)_{t∈R} of M̃ as follows. Let u_1 and u_2 be the canonical generating unitaries of L(F_2). Choose self-adjoint operators h_1 and h_2 such that u_1 = exp(ih_1) and u_2 = exp(ih_2). Then θ_t is defined to be the identity on L(F_2) and inner on both M_1 and M_2: θ_t|M_1 = Ad(exp(ith_1))|M_1 and θ_t|M_2 = Ad(exp(ith_2))|M_2.
See also [68, 77] for the construction of a malleable deformation for the free group factors.
Example 3.6. Next, we recall the construction of a malleable deformation from cocycles ([90], see also [78]). Start with a countable group Γ, an orthogonal representation π : Γ → O(H) on a separable real Hilbert space H, and a cocycle c : Γ → H. In other words, the cocycle relation c(gh) = c(g) + π(g)c(h) holds for all g, h ∈ Γ. Let Γ ↷ (X, µ) be a pmp action. Define A = L∞(X) and M = A ⋊ Γ.
Let (D, τ) be the unique tracial von Neumann algebra generated by unitaries ω(ξ), with ξ ∈ H, subject to the relations ω(ξ + η) = ω(ξ)ω(η), ω(ξ)∗ = ω(−ξ) and τ(ω(ξ)) = exp(−‖ξ‖²). Consider the Gaussian action of Γ on D which on the generating unitaries ω(ξ) is given by σ_g(ω(ξ)) = ω(π(g)(ξ)). Further, consider the diagonal product action of Γ on A ⊗̄ D and denote M̃ = (A ⊗̄ D) ⋊ Γ. Then M ⊂ M̃. Defining θ_t to be the identity on A ⊗̄ D and θ_t(u_g) = (1 ⊗ ω(tc(g))) u_g for all g ∈ Γ gives a 1-parameter group of automorphisms (θ_t)_{t∈R} of M̃.
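It may help to see where the cocycle relation enters in Example 3.6. The computation below is a sketch of the multiplicativity check θ_t(u_g)θ_t(u_h) = θ_t(u_{gh}), using only the relations stated above: u_g d u_g∗ = σ_g(d) for d ∈ A ⊗̄ D, σ_g(ω(ξ)) = ω(π(g)ξ), ω(ξ + η) = ω(ξ)ω(η), and c(gh) = c(g) + π(g)c(h). It is not a full proof that θ_t is a well-defined automorphism.

```latex
\begin{align*}
\theta_t(u_g)\,\theta_t(u_h)
  &= \bigl(1\otimes\omega(tc(g))\bigr)\,u_g\,\bigl(1\otimes\omega(tc(h))\bigr)\,u_h \\
  &= \bigl(1\otimes\omega(tc(g))\bigr)\,\bigl(1\otimes\sigma_g(\omega(tc(h)))\bigr)\,u_g u_h \\
  &= \bigl(1\otimes\omega(tc(g))\,\omega(t\pi(g)c(h))\bigr)\,u_{gh} \\
  &= \bigl(1\otimes\omega\bigl(t\,[c(g)+\pi(g)c(h)]\bigr)\bigr)\,u_{gh} \\
  &= \bigl(1\otimes\omega(tc(gh))\bigr)\,u_{gh} \;=\; \theta_t(u_{gh}).
\end{align*}
```

The same additivity of ω gives θ_s ∘ θ_t = θ_{s+t} on the unitaries u_g, which is the 1-parameter group property.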
Example 3.7. Now, assume that c : Γ → H is a quasi-cocycle rather than a cocycle. Thus, the cocycle relation only holds up to bounded error: there exists κ > 0 such that ‖c(gh) − c(g) − π(g)c(h)‖ ≤ κ, for all g, h ∈ Γ. It was recently discovered in [17] that quasi-cocycles can still be used to construct deformations. It is clear that defining θ_t as in the cocycle case will not work. The original idea of [17] is to use the canonical unitary implementation of θ_t instead. More precisely, identify L²(M̃) with L²(A) ⊗ L²(D) ⊗ ℓ²(Γ) and define a unitary operator V_t on L²(M̃) by letting V_t(ξ ⊗ η ⊗ δ_g) = ξ ⊗ ω(tc(g))η ⊗ δ_g. For x ∈ M̃, define θ_t(x) = V_t x V_t∗. If c is a cocycle, then this formula coincides with the above one and thus θ_t gives an automorphism of M̃. In general, it is only true that θ_t leaves invariant the larger von Neumann algebra generated by M̃ and ℓ∞(Γ). Since this larger algebra is not tracial, we cannot derive a deformation of M in the usual sense. Nevertheless, θ_t is a “deformation at the C∗-algebraic level”, in a sense made precise in [17].
Example 3.8. Deformations arise naturally from closable derivations [16, 66]. Let (M, τ) be a tracial von Neumann algebra and δ : M → H be a closable, densely defined, real derivation into an M-M bimodule H. Defining φ_t = exp(−t δ∗δ̄) : M → M, where δ̄ denotes the closure of δ, we obtain a continuous semigroup (φ_t)_{t≥0} of unital trace preserving completely positive maps. It was recently shown in [19] that any such semigroup admits a dilation: there exist a tracial von Neumann algebra M̃ containing M and a continuous family (θ_t)_{t≥0} of automorphisms of M̃ such that φ_t = E_M ◦ θ_t|M.
3.2. Rigidity.
A second central notion in deformation/rigidity theory is Popa’s relative property (T) for inclusions of von Neumann algebras.
Definition 3.9.
A von Neumann subalgebra P of a tracial von Neumann algebra (M, τ) has the relative property (T) if any deformation φ_n : M → M of the identity of M must converge uniformly to the identity on the unit ball of P [71].
Note that if P = M, then this property amounts to the property (T) of M, in the sense of Connes and Jones [13]. Given two countable groups Γ_0 < Γ, the inclusion of group von Neumann algebras L(Γ_0) ⊂ L(Γ) has the relative property (T) if and only if the inclusion Γ_0 < Γ has the relative property (T) [71]. Examples of inclusions of groups with the relative property (T) include SL_n(Z) < SL_n(Z), for n ≥ 3 [51], and Z² < Z² ⋊ Γ, for any non-amenable subgroup Γ < SL_2(Z) [4]. Several classes of inclusions of von Neumann algebras satisfying the relative property (T) that do not arise from inclusions of groups have been recently found in [44, 11, 50]. For instance, the main result of [44] asserts that if M ⊂ L(Z² ⋊ SL_2(Z)) is a non-hyperfinite subfactor which contains L(Z²), then the inclusion L(Z²) ⊂ M has the relative property (T).
If (M, τ) is a tracial von Neumann algebra, then an M-M bimodule is a Hilbert space endowed with commuting ∗-representations of M and its opposite algebra,
M^op. Examples of M-M bimodules include the trivial bimodule L²(M) and the coarse bimodule L²(M) ⊗̄ L²(M). These are particular instances of the bimodule L²(M) ⊗̄_P L²(M) obtained by completing the algebraic tensor product M ⊗ M with respect to the scalar product ⟨x_1 ⊗ x_2, y_1 ⊗ y_2⟩ = τ(y_2∗ E_P(y_1∗ x_1) x_2). Note that this bimodule is isomorphic to the L²-space of Jones’s basic construction ⟨M, e_P⟩.
The natural correspondence between completely positive maps and bimodules allows one to reformulate relative property (T) in terms of bimodules. Recall that if H is an M-M bimodule, then a vector ξ ∈ H is tracial if ⟨xξ, ξ⟩ = ⟨ξx, ξ⟩ = τ(x), for all x ∈ M; it is P-central if xξ = ξx, for all x ∈ P. Also, a net of vectors (ξ_n)_n is called almost central if ‖xξ_n − ξ_n x‖ → 0, for all x ∈ M. Then the relative property (T) for an inclusion P ⊂ M means that any M-M bimodule without P-central vectors does not admit a net (ξ_n)_n of tracial, almost central vectors [71].
Thus, relative property (T) requires that all bimodules satisfy a certain spectral gap property. One way to obtain less restrictive notions of rigidity is to impose that only certain bimodules have spectral gap (see [76, Section 6.5]). This brings us to the following:
Definition 3.10. A tracial von Neumann algebra (M, τ) is amenable if there exists a net ξ_n ∈ L²(M) ⊗̄ L²(M) of tracial, almost central vectors [68]. If P, Q ⊂ M are von Neumann subalgebras, then Q is amenable relative to P if there exists a net ξ_n ∈ L²(M) ⊗̄_P L²(M) of tracial, Q-almost central vectors [63].
Thus, the failure of an algebra to be amenable (or amenable relative to some other algebra) can be viewed as a source of “spectral gap rigidity”. The notion of spectral gap rigidity was introduced by Popa in the context of von Neumann algebras, and has been used to great effect in [75, 77, 63].
3.3. Deformation vs. rigidity.
We finally explain how deformation and rigidity are put together in Popa’s proof of his strong rigidity theorem.
Theorem 3.11 (Popa, [72, 73]). Let Γ be an icc group and Γ ↷ (X, µ) = (X_0, µ_0)^Γ be a Bernoulli action. Let Λ be a property (T) group and Λ ↷ (Y, ν) be any free ergodic pmp action. If L∞(X) ⋊ Γ ≅ L∞(Y) ⋊ Λ, then the actions Γ ↷ (X, µ) and Λ ↷ (Y, ν) are conjugate.
In the first part of the proof, Popa essentially shows that any subalgebra P of M = L∞(X) ⋊ Γ with property (T) can be unitarily conjugated into L(Γ). To give an idea of why this is plausible, let M̃ be the tracial algebra containing M together with its 1-parameter group of automorphisms (θ_t)_{t∈R} constructed in Example 3.4. Then the deformation φ_t = E_M ◦ θ_t|M converges uniformly to the identity on the unit ball of P, as t → 0. It is immediate that the same is true for θ_t. This forces that, for small enough t > 0, the restriction of θ_t to P is inner.
If we denote A_0 = L∞(X_0), then M = (⊗̄_{g∈Γ} A_0) ⋊ Γ. By its definition, θ_t is inner both on L(Γ) and on ⊗̄_{g∈F} A_0, for every finite subset F ⊂ Γ. As it turns out, the converse is also true: any subalgebra of M on which θ_t is inner can be unitarily conjugated into one of these algebras. Since ⊗̄_{g∈F} A_0 is abelian, it cannot contain any property (T) subalgebra. This altogether implies the initial claim.
Now, identify L∞(X) ⋊ Γ = L∞(Y) ⋊ Λ. Since L(Λ) has property (T), by the first part of the proof, we may assume that L(Λ) ⊂ L(Γ). In the second part of the proof, Popa proves that any group measure space Cartan subalgebra of M that is normalized by many unitaries in L(Γ) can be unitarily conjugated into L∞(X). Thus, we have that both L(Λ) ⊂ L(Γ) and uL∞(Y)u∗ ⊂ L∞(X), for some unitary u ∈ M. In the final part of the proof, Popa is able to show that we may take u = 1. This readily implies that the actions are conjugate.
4. W∗-superrigidity for Bernoulli actions and generalized wreath product groups
The natural question underlying Theorem 3.11 is whether Bernoulli actions of icc property (T) groups are W∗-superrigid [73]. In the first part of this section, we discuss the recent positive resolution of this question.
Theorem 4.1 (Ioana, [45]). Let Γ be any icc property (T) group. Then the Bernoulli action Γ ↷ (X, µ) = (X_0, µ_0)^Γ is W∗-superrigid.
More generally, Theorem 4.1 holds for icc groups Γ which admit an infinite normal subgroup with the relative property (T).
To outline the strategy of proof, denote A = L∞(X) and M = A ⋊ Γ. Assume that M = B ⋊ Λ, where B = L∞(Y). This new group measure space decomposition of M gives rise to an embedding ∆ : M → M ⊗̄ M defined by ∆(bv_h) = bv_h ⊗ v_h, for all b ∈ B and h ∈ Λ, where (v_h)_{h∈Λ} ⊂ M denote the canonical unitaries [83]. The proof relies on a classification of all possible embeddings ∆ : M → M ⊗̄ M. This classification is precise enough to imply that A and B are unitarily conjugate, and hence that the actions Γ ↷ (X, µ) and Λ ↷ (Y, ν) are orbit equivalent. Since the action Γ ↷ (X, µ) is OE superrigid by a result of Popa [74], it follows that the actions are indeed conjugate.
Now, let us say a few words about the techniques that we use in order to classify embeddings ∆ : M → M ⊗̄ M. Since the same techniques are needed in the study of embeddings θ : M → M, for simplicity, we only discuss the latter issue here. Since Γ has property (T), by the first part of the proof of Theorem 3.11, we may assume that θ(L(Γ)) ⊂ L(Γ). Denoting D = θ(A), we have that D is an abelian algebra which is normalized by a group of unitary elements, θ(Γ), from L(Γ).
The main novelty of [45] is a structural result for abelian subalgebras D of M: suppose that D is normalized by a sequence of unitary elements u_n ∈ L(Γ) such that u_n → 0, weakly.
Then essentially either D can be unitarily conjugated into L(Γ), or its relative commutant D′ ∩ M can be unitarily conjugated into A. Assuming that D cannot be unitarily conjugated into L(Γ), we deduce that the unitaries u_n ∈ L(Γ) are uniformly close to the discrete subgroup Γ ⊂ L(Γ). More precisely, there exists a sequence g_n ∈ Γ such that sup_n ‖u_n − u_{g_n}‖₂ < √2. This allows us to carry out subsequent calculations by analogy with the case u_n ∈ Γ and thereby conclude that D′ ∩ M can be unitarily conjugated into A.
The techniques from [45] also yield a new class of II₁ factors that do not arise from groups. In the setting of Theorem 4.1, assume additionally that Γ is torsion
free. Then for any projection p ∈ M \ {0, 1}, the II₁ factor pMp is not isomorphic to the group von Neumann algebra of any countable group. Moreover, pMp is not isomorphic to any twisted group von Neumann algebra L_α(G), where α is a scalar 2-cocycle on a countable group G. This gave the first examples of such II₁ factors.
In the second part of this section, we present the recent discovery of the first classes of W∗-superrigid groups.
Theorem 4.2 (Ioana, Popa, Vaes, [49]). Let Γ_0 be any non-amenable group and S be any infinite amenable group. Let Γ_0^(S) = ⊕_{s∈S} Γ_0 and define the wreath product group Γ = Γ_0^(S) ⋊ S. Consider the left multiplication action of Γ on the coset space I = Γ/S. Then the generalized wreath product G = (Z/2Z)^(I) ⋊ Γ is W∗-superrigid. That is, if Λ is any countable group such that L(G) ≅ L(Λ), then G ≅ Λ.
The fact that certain generalized wreath product groups are W∗-superrigid should not be surprising, since such groups have been recently recognized to be remarkably rigid in the von Neumann algebra context. For instance, Popa’s strong rigidity theorem implies that if G_i = (Z/2Z)^(Γ_i) ⋊ Γ_i, where Γ_i is a property (T) group for i ∈ {1, 2}, then L(G_1) ≅ L(G_2) entails G_1 ≅ G_2 [73].
Note, however, that the conclusion of Theorem 4.2 does not hold for plain wreath product groups G = (Z/2Z)^(Γ) ⋊ Γ. In fact, for any non-trivial torsion free group Γ, there exists a torsion free group Λ such that L(G) ≅ L(Λ) [49]. Nevertheless, for certain classes of groups Γ, including icc property (T) groups and products of non-amenable groups, we are still able to classify more or less explicitly all groups Λ with L(G) ≅ L(Λ).
To describe the main steps of the proof of Theorem 4.2, let M = L(G) and assume that M = L(Λ), for a countable group Λ. Denote by ∆_Λ : M → M ⊗̄ M the embedding given by ∆_Λ(v_h) = v_h ⊗ v_h, where (v_h)_{h∈Λ} ⊂ M are the canonical unitaries. Similarly, define ∆_G : M → M ⊗̄ M. We start by viewing M as the group measure space II₁ factor of the generalized Bernoulli action Γ ↷ ({0, 1}, µ_0)^I, where µ_0 is the measure on {0, 1} given by µ_0({0}) = µ_0({1}) = 1/2. By extending the methods of [45] from plain Bernoulli actions to generalized Bernoulli actions, we then give a classification of all possible embeddings ∆ : M → M ⊗̄ M.
When applied to ∆_Λ, this enables us to deduce the existence of a unitary element Ω ∈ M ⊗̄ M such that ∆_Λ(x) = Ω∆_G(x)Ω∗, for all x ∈ M. Moreover, it follows that Ω satisfies a certain “dual” 2-cocycle relation. A main novelty of [49] is a vanishing result for dual 2-cocycles which allows us to conclude from the existence of Ω that the groups G and Λ are isomorphic.
Let us emphasize a particular case of this result that provides a surprising criterion for the unitary conjugacy of arbitrary icc groups G, Λ giving the same II₁ factor, L(G) = L(Λ). Assume that there exists a constant κ < √2 with the property that for every g ∈ G we can find h ∈ Λ such that ‖u_g − v_h‖₂ ≤ κ. Then G ≅ Λ and there exist a group isomorphism δ : G → Λ, a character η : G → T and a unitary element u ∈ L(G) such that u u_g u∗ = η(g) v_{δ(g)}, for all g ∈ G.
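The threshold √2 in this criterion has a simple meaning, which the following elementary computation (our own remark, using only that u_g and v_h are unitaries and that τ vanishes on non-trivial group elements) makes explicit: the hypothesis says that u_g has a uniformly positive inner product with some v_h, while distinct canonical unitaries of the same group always sit at distance exactly √2.

```latex
\begin{align*}
\|u_g - v_h\|_2^2
  &= \tau\bigl((u_g - v_h)^*(u_g - v_h)\bigr)
   = 2 - 2\,\mathrm{Re}\,\tau(v_h^* u_g), \\
\|u_g - v_h\|_2 \le \kappa
  &\iff \mathrm{Re}\,\tau(v_h^* u_g) \ge 1 - \tfrac{\kappa^2}{2},
  \qquad\text{where } 1 - \tfrac{\kappa^2}{2} > 0 \text{ since } \kappa < \sqrt{2}, \\
\|v_h - v_{h'}\|_2^2
  &= 2 - 2\,\mathrm{Re}\,\tau(v_{h^{-1}h'}) = 2
  \qquad\text{for } h \ne h' \text{ in } \Lambda .
\end{align*}
```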
5. Uniqueness of group measure space Cartan subalgebras
In this section we discuss several uniqueness results for group measure space Cartan subalgebras. We start with a general uniqueness result of Popa and Vaes:
Theorem 5.1 (Popa, Vaes, [83]). Let Γ = Γ_1 ∗_Σ Γ_2 be a non-trivial amalgamated free product such that Γ_1 admits a non-amenable subgroup with the relative property (T), Σ is amenable and there exist g_1, g_2, ..., g_n ∈ Γ such that ∩_{i=1}^n g_i Σ g_i⁻¹ is finite. Let Γ ↷ (X, µ) be any free ergodic pmp action. Then L∞(X) ⋊ Γ has a unique group measure space Cartan subalgebra, up to unitary conjugacy.
Theorem 5.1 covers in particular arbitrary free ergodic pmp actions of any free product Γ = Γ_1 ∗ Γ_2 of an infinite property (T) group and a non-trivial group. In combination with the OE superrigidity theorems of Popa [74, 75] and Kida [53], it led in [83] to the first families of W∗-superrigid actions. For instance, if T_n < PSL_n(Z) is the group of triangular matrices for some n ≥ 3, then any free mixing pmp action of PSL_n(Z) ∗_{T_n} PSL_n(Z) is W∗-superrigid. For a group all of whose free ergodic pmp actions are W∗-superrigid, see [38].
To give an overview of the proof, assume for simplicity that Σ = {e}. Define A = L∞(X) and M = A ⋊ Γ. Then we have an amalgamated free product decomposition M = M_1 ∗_A M_2, where M_1 = A ⋊ Γ_1 and M_2 = A ⋊ Γ_2. Let (M̃, (θ_t)_{t∈R}) be the malleable deformation constructed in Example 3.5.
Assume that M = B ⋊ Λ is another group measure space decomposition and denote by (v_h)_{h∈Λ} the canonical unitaries. The first part of the proof amounts to transferring some of the rigidity of Γ to Λ. Intuitively, since Γ has a non-amenable subgroup with the relative property (T) while A and B are amenable, Λ must admit a “non-amenable subset S with the relative property (T)”.
Concretely, Popa and Vaes prove that given ε > 0, there exist a sequence h_n ∈ Λ and t > 0 such that the unitary elements v_n := v_{h_n} satisfy
(1) ‖θ_t(v_n) − v_n‖₂ < ε, for all n, and
(2) ‖E_A(x v_n y)‖₂ → 0, for all x, y ∈ M.
Condition (1) implies that the unitaries v_n have “uniformly bounded length” in M = M_1 ∗_A M_2: they are almost supported on words of length ≤ κ in M_1 and M_2, for a fixed κ ≥ 1. In the second part of the proof, using a combinatorial argument, Popa and Vaes conclude that since B is abelian and is normalized by the unitaries v_n, the whole unit ball of B must also have uniformly bounded length.
It is clear that both M_1 and M_2 have uniformly bounded length. Conversely, the main technical result of [48] shows that any subalgebra B with this property can be unitarily conjugated into either M_1 or M_2. Finally, since the normalizer of B generates the whole M, this forces that B can be unitarily conjugated into A.
Next, we comment on a result of Chifan and Peterson which gives a 1-cohomology approach to uniqueness of group measure space Cartan subalgebras.
Theorem 5.2 (Chifan, Peterson, [15]). Let $\Gamma$ be a countable group which admits a non-amenable subgroup with the relative property (T) and an unbounded cocycle $c : \Gamma \to H$ into a mixing orthogonal representation $\pi : \Gamma \to O(H)$. Let $\Gamma \curvearrowright (X, \mu)$ be any free ergodic pmp action. Then $L^\infty(X) \rtimes \Gamma$ has a unique group measure space Cartan subalgebra, up to unitary conjugacy.

Recall that $\pi$ is mixing if $\langle \pi(g)\xi, \eta \rangle \to 0$, as $g \to \infty$, for any vectors $\xi, \eta \in H$. Define $A = L^\infty(X)$ and $M = A \rtimes \Gamma$. The original proof of Theorem 5.2 uses Peterson’s technique of closable derivations [66]. Following [96], we consider instead the deformation of $M$ given by the cocycle $c : \Gamma \to H$ as in Example 3.6.

The proof of Theorem 5.2 has the same skeleton as the proof of Theorem 5.1. For this reason, we only emphasize two new ingredients. Thus, let $B \subset M$ be an abelian von Neumann subalgebra. Assume that for every $\varepsilon > 0$ there exist $t > 0$ and a sequence of unitaries $v_n$ normalizing $B$ such that conditions (1) and (2) hold. Chifan and Peterson then prove that the deformation $\theta_t$ must converge uniformly on the unit ball of $B$. If this is the case, then a result from [66] further implies that either $B$ can be unitarily conjugated into $A$, or $\theta_t$ converges uniformly on the normalizer of $B$. When $B$ is a Cartan subalgebra, the normalizer of $B$ generates $M$, and the latter condition is impossible, since $c$ is unbounded.

In [96], Vaes generalized Theorem 5.2 by replacing the condition that $\pi$ is mixing with the weaker condition that $\pi$ is mixing relative to a family of amenable subgroups of $\Gamma$. This result also recovers Theorem 5.1, since any amalgamated free product $\Gamma = \Gamma_1 *_\Sigma \Gamma_2$ admits an unbounded cocycle into the quasi-regular representation $\pi$ of $\Gamma$ on $\ell^2(\Gamma/\Sigma)$, which is mixing relative to $\Sigma$.

Theorems 5.1 and 5.2 provide supporting evidence for the general conjecture that $L^\infty(X) \rtimes \Gamma$ must have a unique Cartan subalgebra, for any free ergodic pmp action of any group with $\beta_1^{(2)}(\Gamma) > 0$.
The following result provides further positive evidence towards this conjecture.

Theorem 5.3 (Ioana, [46, 47]). Let $\Gamma$ be a countable group with $\beta_1^{(2)}(\Gamma) > 0$. Let $\Gamma \curvearrowright (X, \mu)$ be a free ergodic pmp action which is either rigid or profinite. Then $L^\infty(X) \rtimes \Gamma$ has a unique group measure space Cartan subalgebra, up to unitary conjugacy.

Recall that an action $\Gamma \curvearrowright (X, \mu)$ is rigid if the inclusion $L^\infty(X) \subset L^\infty(X) \rtimes \Gamma$ has the relative property (T) [71]. Examples of rigid actions are given by the actions $\mathrm{SL}_2(\mathbb{Z}) \curvearrowright \mathbb{T}^2$ [71] and $\mathrm{SL}_2(\mathbb{Z}) \curvearrowright \mathrm{SL}_2(\mathbb{R})/\mathrm{SL}_2(\mathbb{Z})$ [50]. By [30] any free product group $\Gamma = \Gamma_1 * \Gamma_2$ with $|\Gamma_1| \geq 2$ and $|\Gamma_2| \geq 3$ admits a continuum of rigid actions whose II$_1$ factors are mutually non-isomorphic.

Also, recall that an action $\Gamma \curvearrowright (X, \mu)$ is profinite if it is the inverse limit $\varprojlim \Gamma \curvearrowright (X_n, \mu_n)$ of actions of $\Gamma$ on finite probability spaces $(X_n, \mu_n)$. Note that if $G = \varprojlim \Gamma/\Gamma_n$ is the profinite completion of $\Gamma$ with respect to a descending chain of finite index subgroups, then the left translation action $\Gamma \curvearrowright G$ is profinite.

To outline the proof of Theorem 5.3, define $A = L^\infty(X)$ and $M = A \rtimes \Gamma$. Suppose that $M = B \rtimes \Lambda$ is another group measure space decomposition.
In the first part of the proof, we show that $A$ can be unitarily conjugated into $B \rtimes \Sigma$, for some amenable subgroup $\Sigma < \Lambda$. This is quite unexpected, because a priori one has no knowledge about the subgroups of $\Lambda$. To derive this, we consider the deformation of $M$ arising from an unbounded cocycle $c : \Gamma \to \ell^2(\Gamma)$ (such a cocycle exists since $\beta_1^{(2)}(\Gamma) > 0$ [86]). We then combine the above-mentioned results of [15] with quite delicate estimates in an ultraproduct algebra $M^\omega$ associated with a cofinal ultrafilter $\omega$ over a (possibly uncountable) directed set.

Since $\Sigma$ is amenable, the algebra $B \rtimes \Sigma$ is also amenable. Thus, we may essentially assume that the von Neumann algebra $N$ generated by $A$ and $B$ is amenable. In the second part of the proof, we use this to conclude that $A$ and $B$ are unitarily conjugate. If $A$ and $B$ are not conjugate, then the equivalence relation $\mathcal{R}$ on $(X, \mu)$ associated with the inclusion $A \subset N$ [25] must be “weakly normal” in $\mathcal{R}_{\Gamma \curvearrowright X}$. On the other hand, since $\Gamma$ has positive first $\ell^2$-Betti number, $\mathcal{R}_{\Gamma \curvearrowright X}$ does not admit an aperiodic, amenable subequivalence relation that is weakly normal [28, 29].
6. Uniqueness of arbitrary Cartan subalgebras

In this section we discuss several results providing classes of II$_1$ factors with a unique Cartan subalgebra, up to unitary conjugacy. The proofs of these results make crucial use of the following approximation property for groups introduced by Cowling and Haagerup:

Definition 6.1. A countable group $\Gamma$ is weakly amenable [14] if there exists a sequence of functions $\varphi_k : \Gamma \to \mathbb{C}$ such that
• $\varphi_k$ has finite support, for all $k$,
• $\lim_k \varphi_k(g) = 1$, for all $g \in \Gamma$,
• $\limsup_k \|\phi_k\|_{cb} < \infty$, where $\phi_k : L(\Gamma) \to L(\Gamma)$ is the unique map satisfying $\phi_k(u_g) = \varphi_k(g) u_g$, for all $g \in \Gamma$, and $\|\phi_k\|_{cb}$ is its completely bounded norm.
Moreover, if there exist $\varphi_k : \Gamma \to \mathbb{C}$ as above such that $\limsup_k \|\phi_k\|_{cb} = 1$, then we say that $\Gamma$ has the complete metric approximation property (CMAP) [35].

The first result ever showing uniqueness, up to unitary conjugacy, of arbitrary Cartan subalgebras, was obtained by Ozawa and Popa:

Theorem 6.2 (Ozawa, Popa, [63]). Let $\mathbb{F}_n \curvearrowright (X, \mu)$ be a free ergodic profinite pmp action of a free group $\mathbb{F}_n$, for some $n \geq 2$. Then $L^\infty(X) \rtimes \mathbb{F}_n$ has a unique Cartan subalgebra, up to unitary conjugacy.

Denote $A = L^\infty(X)$ and $M = A \rtimes \mathbb{F}_n$. Since $\mathbb{F}_n$ has the CMAP [35] and the action $\mathbb{F}_n \curvearrowright X$ is profinite, the II$_1$ factor $M$ also has the CMAP: there exists a sequence of finite rank completely bounded maps $\phi_k : M \to M$ such that $\|\phi_k(x) - x\|_2 \to 0$, for all $x \in M$, and $\limsup_k \|\phi_k\|_{cb} = 1$. Consider an arbitrary diffuse amenable von Neumann subalgebra $P \subset M$ and denote by $\mathcal{G}$ its normalizer.
Ozawa and Popa made the amazing discovery that since $M$ has the CMAP, the action of $\mathcal{G}$ on $P$ by conjugation is weakly compact. More precisely, there exists a net of positive vectors $\xi_k \in L^2(P) \bar{\otimes} L^2(P)$ which are tracial, $P$-almost central and almost invariant under the diagonal action of $\mathcal{G}$. Note that if the vectors $\xi_k$ are actually invariant under the diagonal action of $\mathcal{G}$, then the action $\mathcal{G} \curvearrowright P$ is compact, i.e. the closure of $\mathcal{G}$ inside $\mathrm{Aut}(P)$ is compact.

In the second part of the proof, Ozawa and Popa combine the free malleable deformation of $M$ [68, 77] with the weak compactness of the action $\mathcal{G} \curvearrowright P$. They conclude that either a corner of $P$ embeds into $A$ or the von Neumann algebra generated by $\mathcal{G}$ is amenable. If $P \subset M$ is a Cartan subalgebra, then $\mathcal{G}'' = M$ is not amenable. Therefore, by Theorem 2.3, $P$ must be unitarily conjugate to $A$.

In [61] Ozawa showed that one can replace the use of the CMAP by that of weak amenability in the proof of Theorem 6.2. This result opened up the possibility that Theorem 6.2 could be extended to weakly amenable groups $\Gamma$ that do not have the CMAP. Motivated by this, Chifan and Sinclair proved the following:

Theorem 6.3 (Chifan, Sinclair, [17]). Let $\Gamma \curvearrowright (X, \mu)$ be a free ergodic profinite pmp action of a non-elementary hyperbolic group $\Gamma$. Then $L^\infty(X) \rtimes \Gamma$ has a unique Cartan subalgebra, up to unitary conjugacy.

Without explaining further details, let us mention the three main ingredients of the proof of Theorem 6.3. Firstly, since hyperbolic groups are weakly amenable [60], the conjugation action $\mathcal{N}_M(P) \curvearrowright P$ is weakly compact, for any diffuse von Neumann subalgebra $P$ of $M = L^\infty(X) \rtimes \Gamma$ [61]. Secondly, following [55], for any hyperbolic group $\Gamma$ there is a proper quasi-cocycle $c : \Gamma \to \ell^2(\Gamma)$. This gives rise to a C$^*$-algebraic deformation of $M$ ([17], see Example 3.7) that is compact relative to $L^\infty(X)$. Finally, since hyperbolic groups are exact, elements of $M$ admit “good” approximations by elements in the reduced C$^*$-algebra $L^\infty(X) \rtimes_r \Gamma$.
Very recently, Popa and Vaes obtained sweeping “unique Cartan subalgebra” results. In particular, they were able to extend Theorems 6.2 and 6.3 to arbitrary actions of free groups and non-elementary hyperbolic groups.

Theorem 6.4 (Popa, Vaes, [84, 85]). Let $\Gamma$ be a countable group and let $\Gamma \curvearrowright (X, \mu)$ be any free ergodic pmp action. Assume that either (1) $\Gamma$ is weakly amenable and admits an unbounded cocycle into a non-amenable mixing orthogonal representation $\pi : \Gamma \to O(H)$, or (2) $\Gamma$ is non-elementary hyperbolic. Then $L^\infty(X) \rtimes \Gamma$ has a unique Cartan subalgebra, up to unitary conjugacy.

Recall that a representation $\pi$ is amenable if $\pi \otimes \bar{\pi}$ has almost invariant vectors. Hence, the left regular representation of a non-amenable group $\Gamma$ is non-amenable. Therefore, any weakly amenable group $\Gamma$ with $\beta_1^{(2)}(\Gamma) > 0$ satisfies condition (1) of Theorem 6.4.

Theorem 6.4 has the following beautiful consequence. If $2 \leq m, n \leq \infty$ and $m \neq n$, then any free ergodic pmp actions $\mathbb{F}_m \curvearrowright (X, \mu)$ and $\mathbb{F}_n \curvearrowright (Y, \nu)$ give rise to non-isomorphic II$_1$ factors, $L^\infty(X) \rtimes \mathbb{F}_m \not\cong L^\infty(Y) \rtimes \mathbb{F}_n$.
Indeed, if these factors were isomorphic, then by Theorem 6.4, the actions $\mathbb{F}_m \curvearrowright X$ and $\mathbb{F}_n \curvearrowright Y$ would be orbit equivalent. However, it is proven in [28, 29] that free actions of free groups of different ranks are never orbit equivalent. In combination with [1, 2], Theorem 6.4 also allows one to classify completely all amplifications of the II$_1$ factors associated with the wreath product groups $\mathbb{Z} \wr \mathbb{F}_n$. Thus, $L(\mathbb{Z} \wr \mathbb{F}_m)^t \cong L(\mathbb{Z} \wr \mathbb{F}_n)^s$ if and only if $(m - 1)/t = (n - 1)/s$.

In order to give an overview of the proof of Theorem 6.4, denote $M = L^\infty(X) \rtimes \Gamma$. Since $\Gamma$ is weakly amenable, there exists a sequence $\phi_k : M \to M$ of completely bounded maps of “finite rank relative to $L^\infty(X)$” such that $\|\phi_k(x) - x\|_2 \to 0$, for all $x \in M$, and $\limsup_k \|\phi_k\|_{cb} = 1$. The existence of such maps, however, cannot imply that actions of the form $\mathcal{N}_M(P) \curvearrowright P$ are weakly compact. Indeed, the action $\Gamma \curvearrowright X$ might itself not be weakly compact (e.g. if it is a Bernoulli action and $\Gamma$ is non-amenable).

Nevertheless, Popa and Vaes discovered that there is an appropriate notion of weak compactness which is in some sense relative to $L^\infty(X)$. They then proved, by extending a technique from [63], that the action $\mathcal{N}_M(P) \curvearrowright P$ has the relative weak compactness property, for any diffuse amenable subalgebra $P \subset M$. To complete the proof, Popa and Vaes combine this property with the malleable deformation of $M$ associated with an unbounded cocycle $c : \Gamma \to H$ [90], in case (1), and a proper quasi-cocycle $c : \Gamma \to \ell^2(\Gamma)$ [17], in case (2). As shown in a later version of [85], in case (2) one can alternatively use the fact that hyperbolic groups are biexact [58].
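The consequence of Theorem 6.4 for free groups of different ranks, discussed above, can be written out as a chain of implications. The following is only a sketch: it uses the first $\ell^2$-Betti number from [29] as the orbit equivalence invariant (the cost from [28] would serve equally well), together with the standard value $\beta_1^{(2)}(\mathbb{F}_k) = k - 1$, which is an assumption imported here and not stated in the text.

```latex
% Sketch: free groups of different ranks give non-isomorphic crossed products.
% Assumes 2 \le m, n \le \infty and free ergodic pmp actions, as in the text.
% For k = \infty the value k - 1 is read as \infty.
\begin{align*}
L^\infty(X) \rtimes \mathbb{F}_m \cong L^\infty(Y) \rtimes \mathbb{F}_n
  &\;\Longrightarrow\; \mathbb{F}_m \curvearrowright X \text{ and } \mathbb{F}_n \curvearrowright Y
     \text{ are orbit equivalent}
  && \text{(Theorem 6.4)} \\
  &\;\Longrightarrow\; \beta_1^{(2)}(\mathbb{F}_m) = \beta_1^{(2)}(\mathbb{F}_n)
  && \text{(invariance under orbit equivalence, [29])} \\
  &\;\Longrightarrow\; m - 1 = n - 1
  && \text{(since } \beta_1^{(2)}(\mathbb{F}_k) = k - 1\text{)} \\
  &\;\Longrightarrow\; m = n .
\end{align*}
```

Read contrapositively, $m \neq n$ forces the two crossed products to be non-isomorphic, which is the statement made above.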
7. References

[1] L. Bowen, Orbit equivalence, coinduced actions and free products. Groups Geom. Dyn. 5 (2011), 1–15.
[2] L. Bowen, Stable orbit equivalence of Bernoulli shifts over free groups. Groups Geom. Dyn. 5 (2011), 17–38.
[3] B. Bekka and A. Valette, Group cohomology, harmonic functions and the first $L^2$-Betti number. Potential Anal. 6 (1997), 313–326.
[4] M. Burger, Kazhdan constants for SL(3, Z). J. Reine Angew. Math. 413 (1991), 36–67.
[5] I. Chifan and C. Houdayer, Bass–Serre rigidity results in von Neumann algebras. Duke Math. J. 153 (2010), 23–54.
[6] A. Connes, Sur la classification des facteurs de type II. C. R. Acad. Sci. Paris Sér. A-B 281 (1975), A13–A15.
[7] A. Connes, Classification of injective factors. Ann. of Math. (2) 104 (1976), 73–115.
[8] A. Connes, A factor of type II$_1$ with countable fundamental group. J. Operator Theory 4 (1980), 151–153.
[9] A. Connes, J. Feldman, and B. Weiss, An amenable equivalence relation is generated by a single transformation. Ergodic. Th. and Dynam. Sys. 1 no. 4 (1981), 431–450.
[10] I. Chifan and A. Ioana, Ergodic subequivalence relations induced by a Bernoulli action. Geom. Funct. Anal. 20 (2010), 53–67.
[11] I. Chifan and A. Ioana, On Relative property (T) and Haagerup’s property. Trans. Amer. Math. Soc. 363 (2011), 6407–6420.
[12] A. Connes and V. F. R. Jones, A II$_1$ factor with two non-conjugate Cartan subalgebras. Bull. Amer. Math. Soc. 6 (1982), 211–212.
[13] A. Connes and V. F. R. Jones, Property (T) for von Neumann algebras. Bull. London Math. Soc. 17 (1985), 57–62.
[14] M. Cowling and U. Haagerup, Completely bounded multipliers of the Fourier algebra of a simple Lie group of real rank one. Invent. Math. 96 (1989), 507–549.
[15] I. Chifan and J. Peterson, Some unique group-measure space decomposition results. Preprint arXiv:1010.5194.
[16] F. Cipriani and J.-L. Sauvageot, Derivations as square roots of Dirichlet forms. J. Funct. Anal. 201 (2003), 78–120.
[17] I. Chifan and T. Sinclair, On the structural theory of II$_1$ factors of negatively curved groups. Preprint arXiv:1103.4299, to appear in Ann. Sci. École Norm. Sup.
[18] I. Chifan, T. Sinclair, and B. Udrea, On the structural theory of II$_1$ factors of negatively curved groups, II. Actions by product groups. Preprint arXiv:1108.4200.
[19] Y. Dabrowski, A non-commutative Path Space approach to stationary free Stochastic Differential Equations. Preprint arXiv:1006.4351.
[20] S. Deprez, Explicit examples of equivalence relations and factors with prescribed fundamental group and outer automorphism group. Preprint arXiv:1010.3612.
[21] H. A. Dye, On groups of measure preserving transformations. I. Amer. J. Math. 81 (1959), 119–159.
[22] I. Epstein, Orbit inequivalent actions of non-amenable groups. Preprint arXiv:0707.4215.
[23] S. Falguières and S. Vaes, Every compact group arises as the outer automorphism group of a II$_1$ factor. J. Funct. Anal. 254 (2008), 2317–2328.
[24] P. Fima and S. Vaes, HNN extensions and unique group measure space decomposition of II$_1$ factors. Trans. Amer. Math. Soc. 364 (2012), 2601–2617.
[25] J. Feldman and C. C. Moore, Ergodic Equivalence Relations, Cohomology, and Von Neumann Algebras. I, II. Trans. Amer. Math. Soc. 234 (1977), 289–324, 325–359.
[26] A. Furman, Orbit equivalence rigidity. Ann. of Math. (2) 150 (1999), 1083–1108.
[27] A. Furman, A survey of measured group theory. Geometry, rigidity, and group actions, 296–374, Chicago Lectures in Math., Univ. Chicago Press, Chicago, IL, 2011.
[28] D. Gaboriau, Coût des relations d’équivalence et des groupes. Invent. Math. 139 (2000), 41–98.
[29] D. Gaboriau, Invariants $\ell^2$ de relations d’équivalence et de groupes. Publ. Math. Inst. Hautes Études Sci. 95 (2002), 93–150.
[30] D. Gaboriau, Relative Property (T) Actions and Trivial Outer Automorphism Groups. J. Funct. Anal. 260 no. 2 (2011), 414–427.
[31] D. Gaboriau, Orbit equivalence and measured group theory. In: Proceedings of the International Congress of Mathematicians (Hyderabad, India, 2010), Vol. III, Hindustan Book Agency, 2010, 1501–1527.
[32] L. Ge, Applications of free entropy to finite von Neumann algebras. II. Ann. of Math. 147 (1998), 143–157.
[33] D. Gaboriau and R. Lyons, A measurable-group-theoretic solution to von Neumann’s problem. Invent. Math. 177 (2009), 533–540.
[34] D. Gaboriau and S. Popa, An uncountable family of non-orbit equivalent actions of $\mathbb{F}_n$. J. Amer. Math. Soc. 18 (2005), 547–559.
[35] U. Haagerup, An example of a non-nuclear C$^*$-algebra, which has the metric approximation property. Invent. Math. 50 (1978/79), 279–293.
[36] C. Houdayer, Strongly solid group factors which are not interpolated free group factors. Math. Ann. 346 (2010), 969–989.
[37] C. Houdayer, Invariant percolation and measured theory of nonamenable groups. Preprint arXiv:1106.5337, to appear in Astérisque.
[38] C. Houdayer, S. Popa, and S. Vaes, A class of groups for which every action is W$^*$-superrigid. Preprint arXiv:1010.5077, to appear in Groups Geom. Dyn.
[39] C. Houdayer and D. Shlyakhtenko, Strongly solid II$_1$ factors with an exotic MASA. Int. Math. Res. Not. IMRN 2011, 1352–1380.
[40] C. Houdayer and S. Vaes, Type III factors with unique Cartan decomposition. Preprint arXiv:1203.1254.
[41] A. Ioana, Rigidity results for wreath product II$_1$ factors. J. Funct. Anal. 252 (2007), 763–791.
[42] A. Ioana, Orbit inequivalent actions for groups containing a copy of $\mathbb{F}_2$. Invent. Math. 185 (2011), 55–73.
[43] A. Ioana, Cocycle superrigidity for profinite actions of property (T) groups. Duke Math. J. 157 no. 2 (2011), 337–367.
[44] A. Ioana, Relative property (T) for the subequivalence relations induced by the action of $\mathrm{SL}_2(\mathbb{Z})$ on $\mathbb{T}^2$. Adv. Math. 224 no. 4 (2010), 1589–1617.
[45] A. Ioana, W$^*$-superrigidity for Bernoulli actions of property (T) groups. J. Amer. Math. Soc. 24 (2011), 1175–1226.
[46] A. Ioana, Uniqueness of the group measure space decomposition for Popa’s HT factors. Preprint arXiv:1104.2913, to appear in Geom. Funct. Anal.
[47] A. Ioana, Compact actions and uniqueness of the group measure space decomposition of II$_1$ factors. J. Funct. Anal. 262 (2012), 4525–4533.
[48] A. Ioana, J. Peterson and S. Popa, Amalgamated free products of weakly rigid factors and calculation of their symmetry groups. Acta Math. 200 (2008), 85–153.
[49] A. Ioana, S. Popa, and S. Vaes, A class of superrigid group von Neumann algebras. Preprint arXiv:1007.1412.
[50] A. Ioana and Y. Shalom, Rigidity for equivalence relations on homogeneous spaces. Preprint arXiv:1010.3778, to appear in Groups Geom. Dyn.
[51] D. Kazhdan, Connection of the dual space of a group with the structure of its closed subgroups. Funct. Anal. and its Appl. 1 (1967), 63–65.
[52] Y. Kida, Measure equivalence rigidity of the mapping class group. Ann. of Math. (2) 171 (2010), 1851–1901.
[53] Y. Kida, Rigidity of amalgamated free products in measure equivalence. J. Topol. 4 (2011), 687–735.
[54] D. McDuff, Uncountably many II$_1$ factors. Ann. of Math. 90 (1969), 372–377.
[55] I. Mineyev, N. Monod, and Y. Shalom, Ideal bicombings for hyperbolic groups and applications. Topology 43 (2004), 1319–1344.
[56] F. J. Murray and J. von Neumann, On rings of operators. Ann. of Math. 37 (1936), 116–229.
[57] F. J. Murray and J. von Neumann, Rings of operators IV. Ann. of Math. 44 (1943), 716–808.
[58] N. Ozawa, Solid von Neumann algebras. Acta Math. 192 (2004), 111–117.
[59] N. Ozawa, A Kurosh type theorem for type II$_1$ factors. Internat. Math. Res. Notices 2006, Article ID 97560.
[60] N. Ozawa, Weak amenability of hyperbolic groups. Groups Geom. Dyn. 2 (2008), 271–280.
[61] N. Ozawa, Examples of groups which are not weakly amenable. Kyoto J. Math. 52 (2012), 333–344.
[62] N. Ozawa and S. Popa, Some prime factorization results for type II$_1$ factors. Invent. Math. 156 (2004), 223–234.
[63] N. Ozawa and S. Popa, On a class of II$_1$ factors with at most one Cartan subalgebra. Ann. of Math. 172 (2010), 713–749.
[64] N. Ozawa and S. Popa, On a class of II$_1$ factors with at most one Cartan subalgebra. II. Amer. J. Math. 132 (2010), 841–866.
[65] D. Ornstein and B. Weiss, Ergodic theory of amenable group actions. I. The Rohlin lemma. Bull. Amer. Math. Soc. (N.S.) 2 no. 1 (1980), 161–164.
[66] J. Peterson, $L^2$-rigidity in von Neumann algebras. Invent. Math. 175 (2009), 417–433.
[67] J. Peterson, Examples of group actions which are virtually W$^*$-superrigid. Preprint arXiv:1002.1745.
[68] S. Popa, Correspondences. INCREST preprint 56 (1986), available at http://www.math.ucla.edu/~popa/preprints.html.
[69] S. Popa, Markov traces on universal Jones algebras and subfactors of finite index. Invent. Math. 111 (1993), 375–405.
[70] S. Popa, Some rigidity results for non-commutative Bernoulli shifts. J. Funct. Anal. 230 (2006), 273–328.
[71] S. Popa, On a class of type II$_1$ factors with Betti numbers invariants. Ann. of Math. 163 (2006), 809–899.
[72] S. Popa, Strong rigidity of II$_1$ factors arising from malleable actions of w-rigid groups, I. Invent. Math. 165 (2006), 369–408.
[73] S. Popa, Strong rigidity of II$_1$ factors arising from malleable actions of w-rigid groups, II. Invent. Math. 165 (2006), 409–452.
[74] S. Popa, Cocycle and orbit equivalence superrigidity for malleable actions of w-rigid groups. Invent. Math. 170 (2007), 243–295.
[75] S. Popa, On the superrigidity of malleable actions with spectral gap. J. Amer. Math. Soc. 21 (2008), 981–1000.
[76] S. Popa, Deformation and rigidity for group actions and von Neumann algebras. In: Proceedings of the International Congress of Mathematicians (Madrid, 2006), Vol. I, European Mathematical Society Publishing House, 2007, 445–477.
[77] S. Popa, On Ozawa’s property for free group factors. Int. Math. Res. Not. IMRN 2007, no. 11, Art. ID rnm036, 10 pp.
[78] J. Peterson and T. Sinclair, On cocycle superrigidity for Gaussian actions. Erg. Th. Dyn. Sys. 32 (2012), 249–272.
[79] S. Popa and S. Vaes, Strong rigidity of generalized Bernoulli actions and computations of their symmetry groups. Adv. Math. 217 (2008), 833–872.
[80] S. Popa and S. Vaes, Actions of $\mathbb{F}_\infty$ whose II$_1$ factors and orbit equivalence relations have prescribed fundamental group. J. Amer. Math. Soc. 23 (2010), 383–403.
[81] S. Popa and S. Vaes, On the fundamental group of II$_1$ factors and equivalence relations arising from group actions. Quanta of maths, 519–541, Clay Math. Proc. 11, Amer. Math. Soc., Providence, RI, 2010.
[82] S. Popa and S. Vaes, Cocycle and orbit superrigidity for lattices in SL(n, R) acting on homogeneous spaces. In: Geometry, rigidity, and group actions, 419–451, Chicago Lectures in Math., Univ. Chicago Press, Chicago, IL, 2011.
[83] S. Popa and S. Vaes, Group measure space decomposition of II$_1$ factors and W$^*$-superrigidity. Invent. Math. 182 no. 2 (2010), 371–417.
[84] S. Popa and S. Vaes, Unique Cartan decomposition for II$_1$ factors arising from arbitrary actions of free groups. Preprint arXiv:1111.6951.
[85] S. Popa and S. Vaes, Unique Cartan decomposition for II$_1$ factors arising from arbitrary actions of hyperbolic groups. Preprint arXiv:1201.2824.
[86] J. Peterson and A. Thom, Group cocycles and the ring of affiliated operators. Invent. Math. 185 (2011), 561–592.
[87] F. Rădulescu, The fundamental group of the von Neumann algebra of a free group with infinitely many generators is $\mathbb{R}_+ \setminus \{0\}$. J. Amer. Math. Soc. 5 (1992), 517–532.
[88] Y. Shalom, Measurable group theory. In: European Congress of Mathematics, European Mathematical Society Publishing House, 2005, 391–423.
[89] I. M. Singer, Automorphisms of finite factors. Amer. J. Math. 77 (1955), 117–133.
[90] T. Sinclair, Strong solidity of group factors from lattices in SO(n, 1) and SU(n, 1). J. Funct. Anal. 260 (2011), 3209–3221.
[91] A. Speelman and S. Vaes, A class of II$_1$ factors with many non conjugate Cartan subalgebras. Preprint arXiv:1107.1356, to appear in Adv. Math.
[92] S. Vaes, Rigidity results for Bernoulli actions and their von Neumann algebras (after Sorin Popa). Séminaire Bourbaki, exp. no. 961. Astérisque 311 (2007), 237–294.
[93] S. Vaes, Factors of type II$_1$ without non-trivial finite index subfactors. Trans. Amer. Math. Soc. 361 (2009), 2587–2606.
[94] S. Vaes, Explicit computations of all finite index bimodules for a family of II$_1$ factors. Ann. Sci. École Norm. Sup. 41 (2008), 743–788.
[95] S. Vaes, Rigidity for von Neumann algebras and their invariants. In: Proceedings of the ICM (Hyderabad, India, 2010), Vol. III, Hindustan Book Agency (2010), 1624–1650.
[96] S. Vaes, One-cohomology and the uniqueness of the group measure space decomposition of a II$_1$ factor. Preprint arXiv:1012.5377, to appear in Math. Ann.
[97] D. V. Voiculescu, K. J. Dykema, and A. Nica, Free random variables. CRM Monograph Series 1, American Mathematical Society, Providence, RI, 1992.
[98] D. V. Voiculescu, Circular and semicircular systems and free product factors. In: Operator algebras, unitary representations, enveloping algebras, and invariant theory (Paris, 1989), Progr. Math. 92, Birkhäuser, Boston, 1990, 45–60.
[99] D. V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability theory, III. Geom. Funct. Anal. 6 (1996), 172–199.
[100] R. Zimmer, Ergodic theory and semisimple groups. Monographs in Mathematics, 81. Birkhäuser Verlag, Basel, 1984.
Adrian Ioana, Department of Mathematics, UC San Diego, La Jolla, CA, 92093, USA E-mail: [email protected]
A nonlinear variational problem in relativistic quantum mechanics

Mathieu Lewin∗
Abstract. We describe several recent results obtained in collaboration with P. Gravejat, C. Hainzl, É. Séré and J. P. Solovej, concerning a nonlinear model for the relativistic quantum vacuum in interaction with a classical electromagnetic field.

2010 Mathematics Subject Classification. Primary 35Q40; Secondary 81V10.

Keywords. Nonlinear analysis, variational methods, quantum mechanics, Dirac operator, renormalization, quantum electrodynamics, vacuum polarization.
1. Introduction

Quantum Electrodynamics (QED) is one of the most successful and precise theories in physics. Already studied in the early years of quantum mechanics by Dirac, Pauli, Heisenberg, Fermi, Weisskopf and others, it was finally formulated in its definitive form by Bethe, Dyson, Feynman, Schwinger and Tomonaga between 1946 and 1950. This gave Feynman, Schwinger and Tomonaga a Nobel prize in 1965. The theory is the combination of quantum mechanics and Einstein’s special relativity. It aims at describing the interactions between matter and light, at the microscopic scale where quantum effects are dominant. It allows one to determine the time-dependent behavior of charged particles (like the electrons in an atom or a molecule) when they are coupled to photons (the quanta of light).

Quantum Electrodynamics has an important symmetry called charge conjugation. The latter means that the behavior of positively charged and negatively charged particles is described in a similar manner. More importantly, the theory predicts that any charged particle automatically has an anti-particle with the same mass but an opposite charge (for the electron, this particle is called the positron), and that it is possible to create a particle/anti-particle pair by providing a sufficient amount of energy to the vacuum. From Einstein’s famous relation, this energy must be at least $2mc^2$. Because one can create matter from energy, the vacuum can no longer be seen as the empty and inert object it is considered to be in everyday life. At the microscopic scale, the quantum vacuum is a complicated fluctuating system which participates in every physical phenomenon.

Quantum Electrodynamics is an extremely accurate theory.
Its predictions are the most precise ever obtained from a physical model, when compared with experiments. The most famous successful predictions are the Lamb shift in the spectrum of the hydrogen atom and the electron anomalous magnetic dipole moment. The agreement with experiment is within a window of about $10^{-8}$.

∗ Grants from the French Ministry of Research (ANR BLAN-10-0101) and from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013 Grant Agreement MNIQS 258023) are gratefully acknowledged.

In more technical terms, Quantum Electrodynamics is an abelian gauge field theory. The ideas of Dyson, Feynman, Schwinger and Tomonaga were later used to invent more complicated non-abelian gauge field (a.k.a. Yang–Mills) theories, like those describing the strong and weak forces.

In units where the speed of light is $c = 1$ and Planck’s constant is $\hbar = 1$, QED only depends on two parameters whose value has to be determined by experiment: the mass $m$ of the electron, and the Sommerfeld coupling constant $\alpha$ which is the square of the electron charge, $\alpha = e^2$. Of course, there might also be external fields which are applied to the system.

The physical value of the coupling constant $\alpha$ is small (about 1/137) and, in the physical literature, Quantum Electrodynamics is always formulated as a perturbative theory. This means that the interesting physical quantities are formally expanded as a power series in $\alpha$, and that only the coefficients of the series are computed explicitly, order by order. But divergences occur and these coefficients are all infinite! In order to solve this problem, a regularization parameter $\Lambda$ has to be introduced in the model. Then, the divergences are absorbed by making a change of variable for $m$ and $\alpha$, in order to obtain a well-defined formal series in the limit $\Lambda \to \infty$. This procedure is called renormalization.

It is fair to say that this perturbative formulation of QED is perfectly rigorous in the sense that there is a unique and well-defined way to get the final answer (which, in most cases, is then in surprising agreement with experiment). But, on the other hand, the perturbative nature is certainly frustrating from a mathematical point of view.
In a famous quotation, Feynman himself said in 1985:

The shell game that we play (...) is technically called ‘renormalization’. But no matter how clever the word, it is still what I would call a dippy process! Having to resort to such hocus-pocus has prevented us from proving that the theory of quantum electrodynamics is mathematically self-consistent. It’s surprising that the theory still hasn’t been proved self-consistent one way or the other by now; I suspect that renormalization is not mathematically legitimate. [19]

There has been no dramatic change since Feynman’s quotation, and renormalization has become a common (and somewhat accepted) tool. On the mathematical side, there have been several works on models originating from QED, but not so many on the true theory itself. We should recall that one of the famous Millennium Prize Problems of the Clay Mathematics Institute concerns the construction of a well-defined Yang–Mills theory as well as the understanding of its low energy excitation spectrum. Yang–Mills theory gathers all the non-abelian gauge field theories alluded to above, and any deeper understanding of the (abelian) theory of Quantum Electrodynamics is therefore desirable.

In the last few years, mathematicians have been particularly interested in studying the interaction of light with non-relativistic matter. In this simplified model the electrons are quantum but they are assumed to have a speed which is much
smaller than the speed of light, such that relativistic effects can be neglected. Their description then involves the Laplacian instead of the Dirac operator. Events like the creation of electron-positron pairs are encoded in the Dirac operator and they cannot be described within such a non-relativistic theory. But it is already a very important and fundamental problem to understand the effect of quantized light on non-relativistic particles. Some of the recent works concern for instance the existence of a lowest eigenvalue for the underlying Hamiltonian [7, 25, 34], resonances and the relaxation to the ground state [1, 6, 7], problems related to divergences and mass renormalization [4, 5, 31, 33, 42], and the stability of large systems [18, 40, 43, 44].

Coming back to relativistic particles, some authors have considered Lattice QED [47, 51]. Others studied linear and nonlinear models based on the Dirac operator for finitely many particles in an ‘inert’ vacuum and with a classical electromagnetic field, see e.g. [13, 15, 17, 20, 21, 35] and the references in [16]. Finally, the quantum vacuum and the process of pair creation were investigated in a non-interacting setting (meaning without any light at all), e.g. in [36, 37, 45, 46, 49].

With P. Gravejat, C. Hainzl, É. Séré and J.-P. Solovej, we followed another route in a series of works [22–24, 26–30] which originated from a fundamental paper of Chaix and Iracane [8, 9], and which was stimulated by the previous works [3, 32]. We considered relativistic particles in a fluctuating vacuum, both described by the Dirac equation, and in interaction with a classical electromagnetic field. That light is not quantized in this model prevents us from describing important physical effects. But, on the other hand, these seem to be the first mathematical results dealing with the quantum vacuum in interaction with light.
Also, as we will explain, we are able to construct the associated model in a fully non-perturbative fashion, in a simplified setting.

The purpose of this article is to review these recent results. We will particularly emphasize the uncommon mathematical aspects, like those related to renormalization, which are tied to important physical effects. The reader interested in more details can also read the last section of the review [16].
2. The quantum vacuum in classical electromagnetic fields

In this section we present a model for quantum relativistic particles evolving in a fluctuating quantum vacuum, and interacting with a classical electromagnetic field. We will not discuss too much here the mathematical meaning of the equations, which will be explained in the next sections. Actually, most of the terms that we write here are infinite quantities if no special care is employed.

We will concentrate on the time-independent model and look for stationary states. Our system is composed of particles (electrons or positrons) together with the quantum vacuum on the one side, and of a classical electromagnetic field $(E, B)$ describing light on the other side, all evolving in the physical space $\mathbb{R}^3$. It is useful to use electromagnetic potentials $V$ and $A$ which are such that $E = -\nabla V$ and $B = \nabla \wedge A$. We might also consider fixed external fields $E_{\rm ext} = -\nabla V_{\rm ext}$ and $B_{\rm ext} = \nabla \wedge A_{\rm ext}$ which are applied to the system. The particles and the vacuum
are described by the Dirac operator
$$D_{m,e\boldsymbol{A}_{\rm tot}} := \sum_{k=1}^{3} \alpha_k \Big( -i \frac{\partial}{\partial x_k} - e A_{\rm tot}(x)_k \Big) + m\beta - e V_{\rm tot}(x), \tag{1}$$
where $\boldsymbol{A}_{\rm tot} := (V_{\rm tot}, A_{\rm tot})$ with $V_{\rm tot} = V + V_{\rm ext}$ and $A_{\rm tot} = A + A_{\rm ext}$ are the total electromagnetic potentials, and $e$ is the (bare) electron charge. The four Dirac matrices $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta$ are equal to
$$\alpha_k := \begin{pmatrix} 0 & \sigma_k \\ \sigma_k & 0 \end{pmatrix} \quad\text{and}\quad \beta := \begin{pmatrix} I_2 & 0 \\ 0 & -I_2 \end{pmatrix},$$
the Pauli matrices $\sigma_1$, $\sigma_2$ and $\sigma_3$ being defined by
$$\sigma_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \quad\text{and}\quad \sigma_3 := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
These matrices are chosen to ensure that $(D_{m,0})^2 = -\Delta + m^2$, which is the quantum equivalent of Einstein’s fundamental relation $E^2 = c^2 p^2 + m^2 c^4$ for the relativistic classical energy $E$ in terms of the momentum $p$, with $c = 1$ in our case.

The operator in (1) acts on the Hilbert space $\mathfrak{H} := L^2(\mathbb{R}^3, \mathbb{C}^4)$. Under reasonable assumptions on $V_{\rm tot}$ and $A_{\rm tot}$, it is self-adjoint on the Sobolev space $H^1(\mathbb{R}^3, \mathbb{C}^4)$, see [53]. The charge conjugation is the anti-unitary operator defined on $\mathfrak{H}$ by $\mathcal{C} f := i\beta\alpha_2 \overline{f}$, and which satisfies
$$\mathcal{C}\, D_{m,e\boldsymbol{A}_{\rm tot}}\, \mathcal{C}^{-1} = - D_{m,-e\boldsymbol{A}_{\rm tot}}.$$
This relation implies that the spectrum of $D_{m,0}$ is symmetric with respect to 0,
$$\sigma\big(D_{m,0}\big) = (-\infty, -m] \cup [m, \infty).$$
When $V_{\rm tot}$ and $A_{\rm tot}$ decay at infinity, the essential spectrum stays the same by the Rellich–Kato theorem,
$$\sigma_{\rm ess}\big(D_{m,e\boldsymbol{A}_{\rm tot}}\big) = (-\infty, -m] \cup [m, \infty).$$
Isolated eigenvalues of finite multiplicity can however appear in the gap $(-m, m)$ when $V_{\rm tot} \neq 0$.

The state of one particle is described by a normalized wave function $\varphi \in L^2(\mathbb{R}^3, \mathbb{C}^4)$, $\int_{\mathbb{R}^3} |\varphi|^2 = 1$, and its corresponding energy is $\langle \varphi, D_{m,e\boldsymbol{A}_{\rm tot}} \varphi \rangle$ (which has to be understood in the form sense when $\varphi \in H^{1/2}(\mathbb{R}^3, \mathbb{C}^4)$). The fact that the Dirac operator is unbounded from below, hence that the energy can be arbitrarily negative, was very surprising for its inventor, P. A. M. Dirac [10, 12]. In quantum mechanics it is always assumed that the most stable state of the system is the one with the lowest energy, and there does not seem to be any here. From a mathematical point of view this makes a huge difference as compared to non-relativistic models based on the Laplace operator $-\Delta$. That the energy is unbounded from below means that minimization methods cannot be employed and that one has to resort to complicated min-max techniques to construct solutions [16].

Dirac did not want to renounce the physical picture that states with lower energy are more stable.
So, in 1930 he suggested reinterpreting the problem by changing the role of the vacuum:

We make the assumption that, in the world as we know it, nearly all the states of negative energy for the electrons are occupied, with just one electron in each state, and that a uniform filling of all the negative-energy states is completely unobservable to us. [11]
Physically, one therefore has to imagine that the vacuum (called the Dirac sea) is filled with infinitely many virtual particles occupying the negative energy states. With this conjecture, a real electron cannot be in a negative energy state. This is because electrons are fermions and two such particles can never be in the same quantum state (Pauli principle). With this interpretation, Dirac was able to conjecture the existence of “holes” in the vacuum, interpreted as anti-electrons or positrons, which were later experimentally discovered by Anderson [2]. Dirac also predicted the phenomenon of vacuum polarization: in an external electric field, the virtual electrons are displaced, and the vacuum acquires a non-constant density of charge. The idea of a fluctuating quantum vacuum was born.

Let us now explain Dirac’s idea in more mathematical terms. The state of $N$ electrons can be represented by $N$ wave functions $\varphi_1, \ldots, \varphi_N \in H^1(\mathbb{R}^3, \mathbb{C}^4)$, such that $\langle \varphi_i, \varphi_j \rangle_{L^2} = \delta_{ij}$, the latter constraint being the mathematical formulation of the Pauli principle. The total energy is the sum of the energies of the individual electrons, which can be written as
$$\sum_{j=1}^{N} \langle \varphi_j, D_{m,e\boldsymbol{A}_{\rm tot}} \varphi_j \rangle = \operatorname{Tr}\big( D_{m,e\boldsymbol{A}_{\rm tot}} P \big),$$
where $P$ is the orthogonal projector onto the $N$-dimensional space spanned by the $\varphi_j$’s. Using the bra-ket notation we have $P = \sum_{j=1}^{N} |\varphi_j\rangle\langle\varphi_j|$. Now if we allow any number $N$ of particles and look for the state of lowest energy, the formal solution is the negative spectral projector
$$P = P^-_{m,e\boldsymbol{A}_{\rm tot}} := \mathbf{1}_{(-\infty,0)}\big( D_{m,e\boldsymbol{A}_{\rm tot}} \big). \tag{2}$$
Think of a finite Hermitian matrix $M$: the solution to the minimization problem $\inf\{\operatorname{Tr}(MP),\ P^2 = P = P^*\}$ is the negative spectral projector $P = \mathbf{1}_{(-\infty,0)}(M)$ of $M$, and the corresponding energy is $-\operatorname{Tr} M_-$ with $x_- := -\min(x, 0)$ denoting the negative part. The minimizer is unique when $\ker(M) = \{0\}$. In our case the state in (2) corresponds to filling with particles all the negative energies of $D_{m,e\boldsymbol{A}_{\rm tot}}$, as suggested by Dirac. This state $P^-_{m,e\boldsymbol{A}_{\rm tot}}$ is the quantum vacuum in the presence of the electromagnetic potential $\boldsymbol{A}_{\rm tot}$. It has infinitely many particles and the corresponding total energy is also infinite.

Remark 2.1. There are several ways to give a mathematical meaning to the assertion that $P^-_{m,e\boldsymbol{A}_{\rm tot}}$ minimizes $P \mapsto \operatorname{Tr}(D_{m,e\boldsymbol{A}_{\rm tot}} P)$. The first is to use a so-called thermodynamic limit. The ambient space $\mathfrak{H} = L^2(\mathbb{R}^3, \mathbb{C}^4)$ is approximated by a sequence of finite-dimensional spaces in which everything makes sense, and it is proved that in the limit the minimizer converges to $P^-_{m,e\boldsymbol{A}_{\rm tot}}$. The second possibility is to directly argue in the whole space $\mathfrak{H}$ that
$$\operatorname{Tr} D_{m,e\boldsymbol{A}_{\rm tot}} \big( P - P^-_{m,e\boldsymbol{A}_{\rm tot}} \big) \geq 0$$
for all projections $P$. Under appropriate assumptions on $P$ (for instance $P - P^-_{m,0}$
finite rank and smooth), we can write, with $D := D_{m,e\boldsymbol{A}_{\rm tot}}$ and $P^\pm := P^\pm_{m,e\boldsymbol{A}_{\rm tot}}$ for shortness,
$$\begin{aligned} \operatorname{Tr} D \big( P - P^- \big) &= \operatorname{Tr} |D| \Big( P^+ (P - P^-) P^+ - P^- (P - P^-) P^- \Big) \\ &= \operatorname{Tr} |D|^{1/2} \Big( P^+ P P^+ + P^- (1 - P) P^- \Big) |D|^{1/2} \geq 0. \end{aligned} \tag{3}$$
We have used here the commutativity of the trace and that $D = |D| (P^+ - P^-)$, by definition of the spectral projections. The last term in (3) is the trace of a non-negative operator, which is always well-defined in $[0, \infty]$, and which can be taken as a definition for the trace on the left side. See [26, Sec. 2.1] for a systematic and abstract theory of traces of this form.

When $\boldsymbol{A}_{\rm tot} = 0$, then $P^-_{m,0}$ is nothing but the negative spectral projector of the free Dirac operator $D_{m,0}$, which represents the free vacuum. This operator is translation invariant and, even if the corresponding charge density is infinite, it is somehow constant. This is the ‘uniformity’ alluded to in Dirac’s quotation. When $e\boldsymbol{A}_{\rm tot} \neq 0$, the state of the vacuum changes.

Later we will want to optimize over $\boldsymbol{A}$ with $\boldsymbol{A}_{\rm ext}$ fixed, but first we have to modify our theory a bit in order to make it charge-conjugation invariant. If we take an arbitrary state $P = P^2$, then we get
$$\operatorname{Tr} D_{m,e\boldsymbol{A}_{\rm tot}} P = - \operatorname{Tr} D_{m,-e\boldsymbol{A}_{\rm tot}}\, \mathcal{C} P \mathcal{C}^{-1}.$$
This is not very satisfactory because $-\mathcal{C} P \mathcal{C}^{-1}$ is not a fermionic state and we cannot reinterpret this as the energy of something. It is better to subtract half the identity from the operator $P$, which amounts to adding an (infinite) constant. So we consider instead the energy
$$\operatorname{Tr} D_{m,e\boldsymbol{A}_{\rm tot}} \Big( P - \frac{1}{2} \Big) = \operatorname{Tr} D_{m,e\boldsymbol{A}_{\rm tot}} \frac{P - P^\perp}{2}$$
with $P^\perp := 1 - P$. Now we see that $\operatorname{Tr} D_{m,e\boldsymbol{A}_{\rm tot}} (P - P^\perp) = \operatorname{Tr} D_{m,-e\boldsymbol{A}_{\rm tot}} (P' - P'^\perp)$ with $P' := \mathcal{C} P^\perp \mathcal{C}^{-1}$. The subtraction of half the identity from $P$ is a common technique for systems at half filling like ours. It is explained using the formalism of second quantization in [30].

We are now able to write the total Lagrangian of our system, which is the (formal) sum of the particle energy and Maxwell’s classical Lagrangian:
$$\mathcal{L}_{m,e}(P, \boldsymbol{A}\,;\, \boldsymbol{A}_{\rm ext}) := \operatorname{Tr} D_{m,e(\boldsymbol{A} + \boldsymbol{A}_{\rm ext})} \Big( P - \frac{1}{2} \Big) + \frac{1}{8\pi} \int_{\mathbb{R}^3} |B|^2 - |E|^2,$$
where we recall that $E = -\nabla V$ and $B = \nabla \wedge A$. Our purpose is to look for critical points of this Lagrangian, obtained by minimizing over $P$ and $A$ and maximizing over $V$, as is usually done in classical electrodynamics. We have already explained that the minimum over $P$ with $\boldsymbol{A}$ fixed gives $P = P^-_{m,e\boldsymbol{A}_{\rm tot}}$. It is also possible to optimize over $\boldsymbol{A}$, with $P$ fixed. The optimal potentials solve Gauss’ equation
$$-\Delta V = 4\pi e\, \rho_{P - 1/2}, \qquad -\Delta A = 4\pi e\, j_{P - 1/2}$$
where $\rho_{P-1/2}$ and $j_{P-1/2}$ are the (formal) density of charge and density of current of the vacuum in the state $P$. These are defined by $\rho_M(x) = \operatorname{Tr}_{\mathbb{C}^4} M(x, x)$ and $j_M(x)_k = \operatorname{Tr}_{\mathbb{C}^4} \alpha_k M(x, x)$, for any locally trace-class operator $M$ with integral kernel $M(x, y)$. Note that these are zero for the free vacuum:
$$\rho_{P^-_{m,0} - 1/2} \equiv 0, \qquad j_{P^-_{m,0} - 1/2} \equiv 0,$$
as is shown using the commutation properties of the matrices $\alpha_k$. This emphasizes the importance of subtracting half the identity from our vacuum state.

Our task is to optimize over both $P$ and $\boldsymbol{A}$ and it is reasonable to think that the free vacuum $P = P^-_{m,0}$ is the optimal state when $\boldsymbol{A}_{\rm ext} = 0$. But the functional $\mathcal{L}_{m,e}$ is infinite and so we cannot easily optimize it. We can however subtract the universal (infinite) constant $\mathcal{L}_{m,e}(P^-_{m,0}, 0\,;\, 0)$ and consider the relative Lagrangian
$$\begin{aligned} \mathcal{L}^{\rm rel}_{m,e}(P, \boldsymbol{A}\,;\, \boldsymbol{A}_{\rm ext}) = {} & \operatorname{Tr} D_{m,0} \big( P - P^-_{m,0} \big) - e \int_{\mathbb{R}^3} j_{P - P^-_{m,0}} \cdot (A + A_{\rm ext}) \\ & - e \int_{\mathbb{R}^3} \rho_{P - P^-_{m,0}} (V + V_{\rm ext}) + \frac{1}{8\pi} \int_{\mathbb{R}^3} |B|^2 - |E|^2. \end{aligned} \tag{4}$$
Because the difference $P - P^-_{m,0}$ is a much better behaved operator than $P - 1/2$, we will be able to give a clear meaning to this Lagrangian. The purely electrostatic case $A = A_{\rm ext} = 0$ is much easier to deal with than the general case, and we discuss it first in the next section.
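The algebraic facts used in this section can be checked in finite dimension. The following is only an illustrative sketch in Python with NumPy, not part of the constructions of [22–30]: the sample momentum, the matrix size and the helper names (`free_dirac_symbol`, `negative_projector`, `random_projector`) are assumptions made for the example. It verifies that the symbol $\alpha\cdot p + m\beta$ of $D_{m,0}$ squares to $(|p|^2 + m^2)$ times the identity, that the $\mathbb{C}^4$-trace of $P^-_{m,0} - 1/2$ vanishes at a fixed momentum, and that the negative spectral projector minimizes $P \mapsto \operatorname{Tr}(MP)$ for a Hermitian matrix $M$.

```python
import numpy as np

# Pauli matrices and the 4x4 Dirac matrices, in the representation used in the text.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
alphas = [np.block([[Z2, s], [s, Z2]]) for s in (s1, s2, s3)]
beta = np.block([[I2, Z2], [Z2, -I2]])


def free_dirac_symbol(p, m):
    """Matrix alpha.p + m*beta: the free operator D_{m,0} at momentum p."""
    return sum(pk * ak for pk, ak in zip(p, alphas)) + m * beta


def negative_projector(M):
    """Spectral projector 1_(-inf,0)(M) of a Hermitian matrix M."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    v = eigenvectors[:, eigenvalues < 0]
    return v @ v.conj().T


def random_projector(rng, n, rank):
    """Orthogonal projector onto a random rank-dimensional subspace of C^n."""
    a = rng.normal(size=(n, rank)) + 1j * rng.normal(size=(n, rank))
    q, _ = np.linalg.qr(a)
    return q @ q.conj().T


m = 1.0
p = np.array([0.3, -1.2, 0.7])          # arbitrary sample momentum (assumption)
D = free_dirac_symbol(p, m)
E = np.sqrt(p @ p + m**2)

# D(p)^2 = (|p|^2 + m^2) * identity, so the eigenvalues are -E, -E, +E, +E.
square_is_scalar = np.allclose(D @ D, E**2 * np.eye(4))
eigenvalues_D = np.linalg.eigvalsh(D)

# Tr_{C^4}(P^- - 1/2) vanishes at every momentum: the pointwise reason why
# subtracting half the identity removes the (formal) free charge density.
free_density_symbol = np.trace(negative_projector(D) - np.eye(4) / 2).real

# Finite-dimensional analogue of (2) for a random Hermitian matrix M.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
M = (A + A.conj().T) / 2
spectrum_M = np.linalg.eigvalsh(M)
minimum = np.trace(M @ negative_projector(M)).real      # equals -Tr(M_-)
trials = [np.trace(M @ random_projector(rng, 6, r)).real
          for r in range(1, 7) for _ in range(200)]

print("D^2 scalar:", square_is_scalar, "| eigenvalues:", np.round(eigenvalues_D, 6))
print("Tr(M P^-) =", minimum, "| best random trial:", min(trials))
```

Random projectors only probe the inequality; the equality $\operatorname{Tr}(M P^-) = -\operatorname{Tr} M_-$ is the exact statement, and it is the one that survives (suitably interpreted) in Remark 2.1.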
3. The purely electrostatic case

In the relative Lagrangian (4), we now assume $A = A_{\rm ext} = 0$ and we maximize with respect to $V$. We find the following energy functional
$$\operatorname{Tr} D_{m,0} Q - e \int_{\mathbb{R}^3} \rho_Q V_{\rm ext} + \frac{\alpha}{2} D(\rho_Q, \rho_Q)$$
with $\alpha = e^2$ (the Sommerfeld coupling constant), $Q := P - P^-_{m,0}$ and
$$D(f, g) := \int_{\mathbb{R}^3} \int_{\mathbb{R}^3} \frac{f(x)\, g(y)}{|x - y|}\, dx\, dy = 4\pi \int_{\mathbb{R}^3} \frac{\overline{\hat{f}(k)}\, \hat{g}(k)}{|k|^2}\, dk$$
which is the Coulomb (i.e. $\dot{H}^{-1}(\mathbb{R}^3)$) scalar product. The associated space will be denoted by $\mathcal{C} := \{ f \in \mathcal{S}'(\mathbb{R}^3) : D(f, f) < \infty \}$. In our setting we are interested in taking for $V_{\rm ext}$ the external field induced by some localized density of charge $\nu_{\rm ext}$, describing for instance the nuclei of a molecule. So we take $V_{\rm ext} = e\, \nu_{\rm ext} * |x|^{-1}$ and get the energy
$$\mathcal{E}^{\nu_{\rm ext}}_{m,\alpha}(Q) = \operatorname{Tr} D_{m,0} Q - \alpha D(\rho_Q, \nu_{\rm ext}) + \frac{\alpha}{2} D(\rho_Q, \rho_Q) \tag{5}$$
which is called the reduced Bogoliubov–Dirac–Fock energy [8, 27].
Under suitable assumptions on $\nu_{\rm ext}$, the minimization of this functional in $Q$ makes perfect sense. First we remark that, by the Cauchy–Schwarz inequality,
$$-\alpha D(\rho_Q, \nu_{\rm ext}) + \frac{\alpha}{2} D(\rho_Q, \rho_Q) \geq -\frac{\alpha}{2} D(\nu_{\rm ext}, \nu_{\rm ext})$$
which is finite when $\nu_{\rm ext} \in \mathcal{C}$ (for instance for $\nu_{\rm ext} \in L^{6/5}(\mathbb{R}^3)$, by the Hardy–Littlewood–Sobolev inequality [41]). Then we recall that $\operatorname{Tr} D_{m,0} Q \geq 0$ for any operator $Q = P - P^-_{m,0}$, provided the trace is defined like in (3). This is reminiscent of the fact that $P^-_{m,0}$ formally minimizes $P \mapsto \operatorname{Tr} D_{m,0} P$. From all this we conclude that $\mathcal{E}^{\nu_{\rm ext}}_{m,\alpha}(Q) \geq -\alpha D(\nu_{\rm ext}, \nu_{\rm ext})/2$ and the energy is bounded from below.

However we are unlucky that the infimum of $\mathcal{E}^{\nu_{\rm ext}}_{m,\alpha}$ is never attained [27, Thm 2], except when $\nu_{\rm ext} \equiv 0$ where $Q = 0$ is the optimizer. The clear mathematical statement is
$$\inf_{-P^-_{m,0} \leq Q \leq P^+_{m,0}} \Big\{ \operatorname{Tr} D_{m,0} Q - \alpha D(\rho_Q, \nu_{\rm ext}) + \frac{\alpha}{2} D(\rho_Q, \rho_Q) \Big\} = -\frac{\alpha}{2} D(\nu_{\rm ext}, \nu_{\rm ext}).$$
In the infimum one can restrict to finite-rank smooth operators $Q$, which ensures that all the terms make sense. We see that there is never a minimizer when $\nu_{\rm ext} \neq 0$, because it should satisfy both $\rho_Q = \nu_{\rm ext}$ and $Q = 0$.

As explained in [27], this issue is due to certain divergences in Fourier space at infinity. In order to give a meaning to our minimization problem, we have to impose a cut-off for the large Fourier frequencies. There are several ways to do this, some being more natural than others. In the purely electric case, we can consider a simple sharp cut-off consisting in replacing our ambient Hilbert space $\mathfrak{H}$ by
$$\mathfrak{H}_\Lambda := \big\{ f \in L^2(\mathbb{R}^3, \mathbb{C}^4) : \hat{f}(k) = 0 \text{ for } |k| \geq \Lambda \big\}.$$
Note that $D_{m,0}$ maps $\mathfrak{H}_\Lambda$ into itself. For simplicity we use the same notation $D_{m,0}$ and $P^\pm_{m,0}$ for the associated restrictions to $\mathfrak{H}_\Lambda$. The operator $Q$ is defined on $\mathfrak{H}_\Lambda$ and its energy has the same expression as in (5). It now has minimizers.

Theorem 3.1 (Existence of the polarized vacuum [26, 27]). Let $m > 0$, $0 < \Lambda < \infty$ and $\alpha \geq 0$. For any fixed $\nu_{\rm ext} \in \mathcal{C} = \dot{H}^{-1}(\mathbb{R}^3)$, the minimization problem
$$\inf \Big\{ \mathcal{E}^{\nu_{\rm ext}}_{m,\alpha}(Q) \ :\ Q : \mathfrak{H}_\Lambda \to \mathfrak{H}_\Lambda,\ -P^-_{m,0} \leq Q \leq P^+_{m,0},\ \pm \operatorname{Tr} P^\pm_{m,0} Q P^\pm_{m,0} < \infty \Big\} \tag{6}$$
admits at least one minimizer $Q_*$. All these minimizers share the same density $\rho_{Q_*}$. Any minimizer $Q_*$ is a solution of the nonlinear equation
$$\begin{cases} Q_* = \mathbf{1}_{(-\infty,0)}(D_*) - P^-_{m,0} + \delta, \\ D_* = \Pi_\Lambda \big( D_{m,0} + \alpha (\rho_{Q_*} - \nu_{\rm ext}) * |x|^{-1} \big) \Pi_\Lambda \end{cases} \tag{7}$$
where $\Pi_\Lambda$ is the projection onto $\mathfrak{H}_\Lambda$, and $0 \leq \delta \leq \mathbf{1}_{\{0\}}(D_*)$.
If $\alpha\, \pi^{1/6} 2^{11/6} D(\nu_{\rm ext}, \nu_{\rm ext})^{1/2} < \sqrt{m}$, then $\ker(D_*) = \{0\}$ and $\operatorname{Tr}(P^+_{m,0} Q_* P^+_{m,0} + P^-_{m,0} Q_* P^-_{m,0}) = 0$. Hence $\delta \equiv 0$ in (7), and the minimizer $Q_*$ is unique.
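The lower bound on the two Coulomb terms of (5), obtained above from the Cauchy–Schwarz inequality, can equivalently be seen by completing the square. The short computation below is a sketch which only assumes that $D(\cdot,\cdot)$ is a real, symmetric, non-negative bilinear form on real densities of $\mathcal{C}$:

```latex
\begin{align*}
-\alpha D(\rho_Q,\nu_{\rm ext}) + \frac{\alpha}{2} D(\rho_Q,\rho_Q)
  &= \frac{\alpha}{2}\Big( D(\rho_Q,\rho_Q) - 2 D(\rho_Q,\nu_{\rm ext}) + D(\nu_{\rm ext},\nu_{\rm ext}) \Big)
     - \frac{\alpha}{2} D(\nu_{\rm ext},\nu_{\rm ext}) \\
  &= \frac{\alpha}{2} D(\rho_Q-\nu_{\rm ext},\,\rho_Q-\nu_{\rm ext})
     - \frac{\alpha}{2} D(\nu_{\rm ext},\nu_{\rm ext}) \\
  &\geq -\frac{\alpha}{2} D(\nu_{\rm ext},\nu_{\rm ext}).
\end{align*}
```

Equality in the last line forces $\rho_Q = \nu_{\rm ext}$, while equality in $\operatorname{Tr} D_{m,0} Q \geq 0$ forces $Q = 0$. This is the incompatibility behind the absence of minimizers without a cut-off.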
Equation (7) on the infinite-rank operator $P_* = Q_* + P^-_{m,0}$ can be interpreted as an infinite system of coupled nonlinear partial differential equations.

As we have said we always use the convention
$$\operatorname{Tr} D_{m,0} Q := \operatorname{Tr} |D_{m,0}| \big( Q^{++} - Q^{--} \big)$$
where $Q^{++} := P^+_{m,0} Q P^+_{m,0} \geq 0$ and $Q^{--} := P^-_{m,0} Q P^-_{m,0} \leq 0$ when $Q$ satisfies the constraint $-P^-_{m,0} \leq Q \leq P^+_{m,0}$. Since with the cut-off $\Lambda$ the operator $D_{m,0}$ is bounded, it is sufficient to ask that $Q^{++}$ and $Q^{--}$ are trace-class in order to properly define the trace, as required in (6). On the other hand simple algebraic manipulations show that the constraint is equivalent to
$$-P^-_{m,0} \leq Q \leq P^+_{m,0} \iff Q^2 \leq Q^{++} - Q^{--} \tag{8}$$
as remarked in [3]. So we see that we must also have $\operatorname{Tr} Q^2 < \infty$. In other words, for the first term in the energy to be finite, $Q$ must be a Hilbert–Schmidt operator whose diagonal blocks $Q^{\pm\pm}$ are both trace-class. Under these conditions and with the ultraviolet cut-off $\Lambda$, it was proved in [28] that $\rho_Q \in L^2(\mathbb{R}^3) \cap \mathcal{C}$. This is sufficient to prove Theorem 3.1 using simple arguments from convex analysis. In [26, 27] a more complicated model with an additional so-called exchange term is considered and this term makes the analysis much more involved.

Let us emphasize that Theorem 3.1 is valid for any value of the coupling constant $\alpha$ and any value of the ultraviolet cut-off $\Lambda$. This is therefore a non-perturbative result. Because no smallness assumption is made on the external density $\nu_{\rm ext}$, the model is appropriate for the description of non-perturbative events like the spontaneous creation of electron-positron pairs in strong external fields [50]. Note that for simplicity we have minimized the energy with the relaxed constraint (8) instead of the original constraint that $P := Q + P^-_{m,0} = P^2$. However, for a small external density $\nu_{\rm ext}$, the minimizer $Q_*$ is unique and the corresponding state $P_*$ is an orthogonal projection, as expected.

In Theorem 3.1 we have constructed the polarized quantum vacuum in the presence of the external density $\nu_{\rm ext}$, which is by definition a global minimizer of the energy. It is also possible to construct states having a certain number $N$ of ‘real’ electrons (or positrons). More precisely, we can minimize the energy under a charge constraint of the form “$\operatorname{Tr} Q$” $:= \operatorname{Tr}(Q^{++} + Q^{--}) = N$, see [23, 28]. Minimizers do not always exist, depending on the value of $N$ and on the strength of the density $\nu_{\rm ext}$. Estimates on the maximal number of electrons that can be bound by $\nu_{\rm ext}$ were provided in [23], following ideas of Lieb [39]. Any minimizer with charge constraint solves a nonlinear equation similar to (7), with the energy level 0 replaced by a Lagrange multiplier $\mu \in (-m, m)$:
$$Q_* = \mathbf{1}_{(-\infty,\mu)}(D_*) - P^-_{m,0} + \delta \tag{9}$$
where now $0 \leq \delta \leq \mathbf{1}_{\{\mu\}}(D_*)$. If $\mu > 0$, splitting $\mathbf{1}_{(-\infty,\mu)} = \mathbf{1}_{(-\infty,0)} + \mathbf{1}_{(0,\mu)}$ gives the corresponding states for the polarized vacuum and for the electrons. If $\mu < 0$, we split $\mathbf{1}_{(-\infty,\mu)} = \mathbf{1}_{(-\infty,0)} - \mathbf{1}_{(\mu,0)}$ and get the state of the polarized vacuum and of the positrons. In the rest of this article we will not discuss further this
constrained minimization problem, and we will instead concentrate on the pure vacuum case considered in Theorem 3.1.

The minimizers obtained in Theorem 3.1 are very singular mathematical objects. The following says that $Q_*$ is never a trace-class operator.

Theorem 3.2 (Renormalized charge [23]). Let $Q_*$ be a minimizer as in Theorem 3.1, for some $\nu_{\rm ext} \in L^1(\mathbb{R}^3) \cap \mathcal{C}$. Then we have $\rho_{Q_*} \in L^1(\mathbb{R}^3)$ and
$$\int_{\mathbb{R}^3} \rho_{Q_*} - \int_{\mathbb{R}^3} \nu_{\rm ext} = \frac{\operatorname{Tr}(Q_*^{++} + Q_*^{--}) - \int_{\mathbb{R}^3} \nu_{\rm ext}}{1 + \alpha B_{\Lambda/m}} \tag{10}$$
where
$$B_{\Lambda/m} = \frac{1}{\pi} \int_0^{\frac{\Lambda}{\sqrt{m^2 + \Lambda^2}}} \frac{z^2 - z^4/3}{1 - z^2}\, dz = \frac{2}{3\pi} \log \frac{\Lambda}{m} - \frac{5}{9\pi} + \frac{2 \log 2}{3\pi} + O(m^2/\Lambda^2). \tag{11}$$
In particular, if $\alpha\, \pi^{1/6} 2^{11/6} D(\nu_{\rm ext}, \nu_{\rm ext})^{1/2} < \sqrt{m}$, then $Q_*$ is not trace-class.

Let us recall that a compact operator $B$ is trace-class when $\operatorname{Tr} |B| = \operatorname{Tr} \sqrt{B^* B} < \infty$. This is equivalent to saying that $\sum_j \langle \varphi_j, |B| \varphi_j \rangle < \infty$ for one (hence for all) orthonormal basis $\{\varphi_j\}$ of the ambient Hilbert space. For a self-adjoint $B$, by the spectral theorem, this is also the same as saying that the eigenvalues of $B$ are summable, $\sum_j |\lambda_j| < \infty$. If the ambient space is an $L^2$ space, we can write by the spectral theorem $B = \sum_j \lambda_j |\psi_j\rangle\langle\psi_j|$ and we get that the corresponding density $\rho_B = \sum_j \lambda_j |\psi_j|^2$ is in $L^1$ and that $\int \rho_B = \operatorname{Tr} B$.

For an operator $B$ which has no definite sign, it can happen that $\sum_j \langle \varphi_j, B \varphi_j \rangle$ is convergent for one basis and not for another. This is exactly what is happening for our operator $Q_*$. The two diagonal blocks $Q_*^{++}$ and $Q_*^{--}$ are trace-class, which means that $\sum_j \langle \varphi_j^+, Q_* \varphi_j^+ \rangle + \langle \varphi_j^-, Q_* \varphi_j^- \rangle$ converges for any orthonormal basis $\{\varphi_j^\pm\}$ of $\mathfrak{H}_\Lambda^\pm := P^\pm_{m,0} \mathfrak{H}_\Lambda$. The surprise is that $\rho_{Q_*}$ is always in $L^1(\mathbb{R}^3)$ but that in general $\int_{\mathbb{R}^3} \rho_{Q_*} \neq \operatorname{Tr}(Q_*^{++} + Q_*^{--})$. The discrepancy between these two quantities is universal and it is given by the relation (10). The problem comes from the off-diagonal densities $\rho_{Q_*^{\pm\mp}}$, which are in $L^1(\mathbb{R}^3)$ but do not have a vanishing integral. More precisely, only the first order term, when $\rho_{Q_*^{\pm\mp}}$ are expanded in a power series in $\alpha$, contributes to $\int_{\mathbb{R}^3} \rho_{Q_*}$. The proof of Theorem 3.2 consists in studying this term in detail [23].

Theorem 3.2 has a natural interpretation in terms of charge renormalization. Imagine that we put a nucleus of charge $Z = \int_{\mathbb{R}^3} \nu_{\rm ext}$ in the vacuum, which is weak in the sense that $D(\nu_{\rm ext}, \nu_{\rm ext})$ is small, and let $P_* = Q_* + P^-_{m,0}$ be the corresponding unique polarized vacuum obtained by Theorem 3.1. This vacuum is neutral, $\operatorname{Tr}(Q_*^{++} + Q_*^{--}) = 0$, which means that the external field is not strong enough to create electron-positron pairs. In reality we never measure the charge of the nucleus alone, but we always also observe the corresponding vacuum polarization. Hence we do not see $Z$, but rather $Z/(1 + \alpha B_{\Lambda/m})$. This corresponds to having a physical coupling constant $\alpha_{\rm ph}$ given by the renormalization formula
$$\alpha_{\rm ph} = \frac{\alpha}{1 + \alpha B_{\Lambda/m}} \iff \alpha = \frac{\alpha_{\rm ph}}{1 - \alpha_{\rm ph} B_{\Lambda/m}}. \tag{12}$$
In our theory we must take $\alpha_{\rm ph} \simeq 1/137$; the ‘bare’ $\alpha$ can never be observed. So using the change of variable (12) we should express any other physical quantity predicted by our model in terms of $\alpha_{\rm ph}$, $m$ and $\Lambda$ only.

The natural question arises whether it is possible to remove the ultraviolet cut-off $\Lambda$, by keeping $\alpha_{\rm ph}$ and $m$ fixed (note that the mass $m$ is not renormalized in this model). The answer is clearly no! We immediately get from (12) that $\alpha_{\rm ph} B_{\Lambda/m} < 1$ and therefore $\alpha_{\rm ph} \to 0$ whenever we try to take $\Lambda \to \infty$. This phenomenon is called the Landau pole [38] and one has to look for a weaker definition of renormalizability.

The cut-off $\Lambda$, which was first introduced as a mathematical trick to regularize the model, actually has a physical meaning. A natural scale occurs beyond which the model does not make sense. Fortunately, this corresponds to momenta of the order $\Lambda \sim m\, e^{3\pi/(2\alpha_{\rm ph})}$, a huge number for $\alpha_{\rm ph} \simeq 1/137$.

In [24], the regime $\alpha_{\rm ph} \ll 1$, $\Lambda \gg 1$ with $\alpha_{\rm ph} \log \Lambda$ fixed was studied. It was proved that the dressed density of the nucleus admits a Taylor expansion
$$\alpha (\nu_{\rm ext} - \rho_{Q_*}) = \alpha_{\rm ph}\, \nu_{\rm ext} + \sum_{k=1}^{K} \nu_k\, \alpha_{\rm ph}^{k+1} + O\big(\alpha_{\rm ph}^{K+2}\big) \tag{13}$$
where the terms $\nu_k$ in the expansion are independent of the value of $\alpha_{\rm ph} \log \Lambda$. The first correction $\nu_1$ gives rise to the famous Uehling potential [52, 54]. The relation (13) shows that the density of the nucleus can be renormalized order by order. It is believed that the series in (13) is divergent [14], but no mathematical argument has been provided so far.

To summarize, in the purely electrostatic case we have a well-defined non-perturbative theory for all $m > 0$, all $\alpha_{\rm ph} \geq 0$ and all cut-offs $\Lambda$ such that $\alpha_{\rm ph} B_{\Lambda/m} < 1$. In the regime $\alpha_{\rm ph} \ll 1$, $\Lambda \gg 1$ with $\alpha_{\rm ph} \log \Lambda$ fixed, the perturbation series computed by the physicists is recovered.
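The orders of magnitude involved in (11) and (12) can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the integral in (11) is evaluated through the elementary antiderivative of its integrand, and the Landau scale is estimated from the leading terms of (11) only, so that it should be read as an order of magnitude.

```python
import math


def B(ratio):
    """B_{Lambda/m} of (11) for ratio = Lambda/m, from the exact antiderivative.

    (z^2 - z^4/3)/(1 - z^2) = z^2/3 - 2/3 + (2/3)/(1 - z^2), integrated on [0, a]
    with a = Lambda/sqrt(m^2 + Lambda^2). The logarithm log((1+a)/(1-a)) is
    rewritten using 1 - a^2 = 1/(1 + ratio^2) to stay accurate when a is near 1.
    """
    a = ratio / math.sqrt(1.0 + ratio**2)
    log_term = 2.0 * math.log1p(a) + math.log1p(ratio**2)
    return (a**3 / 9.0 - 2.0 * a / 3.0 + log_term / 3.0) / math.pi


def B_asymptotic(ratio):
    """Leading terms of (11): (2/3pi) log(Lambda/m) - 5/(9pi) + 2 log 2/(3pi)."""
    return (2.0 * math.log(ratio) / 3.0 - 5.0 / 9.0
            + 2.0 * math.log(2.0) / 3.0) / math.pi


def physical_alpha(alpha, ratio):
    """Formula (12): alpha_ph = alpha / (1 + alpha * B)."""
    return alpha / (1.0 + alpha * B(ratio))


def bare_alpha(alpha_ph, ratio):
    """Inverse of (12): alpha = alpha_ph / (1 - alpha_ph * B), if alpha_ph * B < 1."""
    denominator = 1.0 - alpha_ph * B(ratio)
    if denominator <= 0.0:
        raise ValueError("alpha_ph * B >= 1: the cut-off is beyond the Landau pole")
    return alpha_ph / denominator


def log10_landau_ratio(alpha_ph):
    """log10 of the ratio Lambda/m at which alpha_ph * B_asymptotic equals 1."""
    log_ratio = 3.0 * math.pi / (2.0 * alpha_ph) + 5.0 / 6.0 - math.log(2.0)
    return log_ratio / math.log(10.0)


alpha_ph = 1.0 / 137.0
for ratio in (1e1, 1e3, 1e6):
    print(f"Lambda/m = {ratio:.0e}: B = {B(ratio):.6f}, "
          f"asymptotic = {B_asymptotic(ratio):.6f}, "
          f"bare alpha = {bare_alpha(alpha_ph, ratio):.8f}")
print("log10(Lambda/m) at the Landau pole ~", log10_landau_ratio(alpha_ph))
```

For $\alpha_{\rm ph} = 1/137$ the last line gives a ratio $\Lambda/m$ of roughly $10^{280}$, which is the "huge number" mentioned above; for any realistic cut-off the bare and physical coupling constants differ only slightly.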
4. The electromagnetic case

Let us go back to our relative Lagrangian (4), including now the electromagnetic fields $B_{\rm tot} = B + B_{\rm ext}$. Optimizing with respect to $P$ we find a formal Lagrangian depending only on the classical fields
$$\mathcal{L}^{\rm rel}_{m,e}(\boldsymbol{A}\,;\, \boldsymbol{A}_{\rm ext}) = \frac{1}{2} \operatorname{Tr} \Big( |D_{m,0}| - |D_{m,e(\boldsymbol{A} + \boldsymbol{A}_{\rm ext})}| \Big) + \frac{1}{8\pi} \int_{\mathbb{R}^3} |B|^2 - |E|^2. \tag{14}$$
This Lagrangian is not really well-defined because of the ultraviolet divergences which have to be taken care of. Contrary to the purely electrostatic case studied in the last section, we cannot impose a sharp cut-off here, as it is extremely important to keep the magnetic gauge invariance corresponding to replacing $A$ by $A + \nabla\varphi$. In [22] we used the Pauli–Villars method [48], which consists in introducing two fictitious particle fields of very high masses $m_1, m_2 \gg 1$ which play the role of ultraviolet cut-offs. The Pauli–Villars-regulated Lagrangian is
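defined by
$$\mathcal{L}^{\rm PV}_{m,e}(\boldsymbol{A}\,;\, \boldsymbol{A}_{\rm ext}) = \frac{1}{2} \operatorname{Tr} \sum_{j=0}^{2} c_j \Big( |D_{m_j,0}| - |D_{m_j,e(\boldsymbol{A} + \boldsymbol{A}_{\rm ext})}| \Big) + \frac{1}{8\pi} \int_{\mathbb{R}^3} |B|^2 - |E|^2 \tag{15}$$
where $c_0 = 1$, $m_0 = m$ and the fictitious fields are chosen such that
$$\sum_{j=0}^{2} c_j = \sum_{j=0}^{2} c_j m_j^2 = 0. \tag{16}$$
The main result of [22] is the following.

Theorem 4.1 (The Pauli–Villars-regulated vacuum in electromagnetic fields [22]). Let $e \in \mathbb{R}$, $m > 0$ and $c_1, c_2, m_1, m_2$ satisfying (16).

(i) The functional $\boldsymbol{A} \mapsto \operatorname{Tr} \Big( \operatorname{Tr}_{\mathbb{C}^4} \sum_{j=0}^{2} c_j \big( |D_{m_j,0}| - |D_{m_j,\boldsymbol{A}}| \big) \Big)$ is well-defined for $\boldsymbol{A} \in C_c^\infty(\mathbb{R}^3, \mathbb{R}^4)$ and it admits a unique continuous extension to
$$\dot{H}^1_{\rm div}(\mathbb{R}^3) = \big\{ (V, A) \in L^6(\mathbb{R}^3, \mathbb{R}^4) \ :\ \operatorname{div} A = 0,\ (-\nabla V, \nabla \wedge A) \in L^2(\mathbb{R}^3, \mathbb{R}^6) \big\}.$$
So the Pauli–Villars-regulated Lagrangian (15) is well-defined and continuous on $\dot{H}^1_{\rm div}(\mathbb{R}^3) \times \dot{H}^1_{\rm div}(\mathbb{R}^3)$.

(ii) For any $\boldsymbol{A}_{\rm ext} \in \dot{H}^1_{\rm div}(\mathbb{R}^3)$ with $e \|\boldsymbol{A}_{\rm ext}\|_{\dot{H}^1_{\rm div}(\mathbb{R}^3)} < r\sqrt{m}/2$, there exists a unique solution $\boldsymbol{A}_* = (V_*, A_*) \in \dot{H}^1_{\rm div}(\mathbb{R}^3)$ to the min-max problem
$$\mathcal{L}^{\rm PV}_{m,e}(\boldsymbol{A}_*\,;\, \boldsymbol{A}_{\rm ext}) = \max_{\|\nabla V\|_{L^2}\ \cdots}$$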
0 for $i = 1, 2$, and such that $\delta H^1(N; \mathbb{Z}) = 0 \subset H^2(X; \mathbb{Z})$. An admissible cut can be found for any $X$. Given an admissible cut, one can delete a four-ball from the interior of each piece $X_i$ to obtain two cobordisms $W_1$ (from $S^3$ to $N$) and $W_2$ (from $N$ to $S^3$). Let $\mathfrak{s}$ be a Spin$^c$ structure on $X$. By combining the cobordism maps $F^-_{W_1, \mathfrak{s}|_{W_1}}$ and $F^+_{W_2, \mathfrak{s}|_{W_2}}$ in a certain way, one can construct a mixed map from $HF^-(S^3)$ to $HF^+(S^3)$. The image of $1 \in HF^-(S^3) \cong \mathbb{Z}$ under this map defines the Ozsváth–Szabó mixed invariant of the pair $(X, \mathfrak{s})$. This is conjecturally equivalent to the well-known Seiberg–Witten invariant [46], and is known to share many of its properties. In particular, it can be used to distinguish homeomorphic 4-manifolds that are not diffeomorphic.
3. Link Floer homology

Ozsváth–Szabó [38] and, independently, Rasmussen [44] used Heegaard Floer theory to define invariants for knots in 3-manifolds: these are the various versions of knot Floer homology.

Recall that a marked Heegaard diagram $(\Sigma, \alpha, \beta, z)$ represents a 3-manifold $Y$. If one specifies another basepoint $w$ in the complement of the alpha and beta curves, this gives rise to a knot $K \subset Y$. Indeed, one can join $w$ to $z$ by a path in $\Sigma \setminus \cup \alpha_i$, and then push this path into the interior of the handlebody $U_\alpha$. Similarly,
Grid diagrams in Heegaard Floer theory
one can join $w$ to $z$ in the complement of the beta curves, and push the path into $U_\beta$. The union of these two paths is the knot $K$. For simplicity, we will assume that $K$ is null-homologous. (Of course, this happens automatically if $Y = S^3$.)

In the definition of the Heegaard Floer complex $CF^-$ we kept track of intersections with $z$ through the exponent $n_z(\phi)$ of the variable $U$. Now that we have two basepoints, we have two quantities $n_z(\phi)$ and $n_w(\phi)$. One thing we can do is to count only disks in classes with $n_z(\phi) = 0$, and keep track of $n_w(\phi)$ in the exponent of $U$. The result is a complex of $\mathbb{Z}[U]$-modules denoted $CFK^-(Y, K)$, with homology $HFK^-(Y, K)$. If we set the variable $U$ to zero, we get a complex $\widehat{CFK}(Y, K)$, with homology $\widehat{HFK}(Y, K)$. These are two of the variants of knot Floer homology. There exist many other variants, some of which involve classes $\phi$ with $n_z(\phi) \neq 0$ and $n_w(\phi) \neq 0$; an example of this, denoted $A^{\pm}(K)$, will be mentioned in Section 5.

Let us focus on the case when $Y = S^3$. The groups $\widehat{HFK}(S^3, K)$ naturally split as direct sums:
$$\widehat{HFK}(S^3, K) = \bigoplus_{m, s \in \mathbb{Z}} \widehat{HFK}_m(S^3, K, s).$$
Here, $m$ and $s$ are certain quantities called the Maslov and Alexander gradings, respectively. We can encode some of the information in $\widehat{HFK}$ into a polynomial
$$P_K(t, q) = \sum_{m, s \in \mathbb{Z}} t^m q^s \cdot \operatorname{rank} \widehat{HFK}_m(S^3, K, s).$$
The specialization $P_K(-1, q)$ is the classical Alexander polynomial of $K$. However, the applications of knot Floer homology go well beyond those of the Alexander polynomial. In particular, the genus of the knot, which is defined as
$$g(K) = \min\{ g \mid \exists \text{ embedded, oriented surface } \Sigma \subset S^3 \text{ of genus } g,\ \partial\Sigma = K \},$$
can be read from $\widehat{HFK}$:

Theorem 3.1 ([39]). For any knot $K \subset S^3$, we have
$$g(K) = \max\{ s \geq 0 \mid \exists\, m,\ \widehat{HFK}_m(S^3, K, s) \neq 0 \}.$$

Since the only knot of genus zero is the unknot, we have:

Corollary 3.2. $K$ is the unknot if and only if $P_K(t, q) = 1$.

By a result of Ghiggini [13], the polynomial $P_K$ has enough information to also detect the right-handed trefoil, the left-handed trefoil, and the figure-eight knot. Ni [31] extended the work in [13] to show that $S^3 \setminus K$ fibers over the circle if and only if $\oplus_m \widehat{HFK}_m(S^3, K, g(K)) \cong \mathbb{Z}$.

Other applications of knot Floer homology include the construction of a concordance invariant called $\tau$ [35], and a complete characterization of which lens spaces can be obtained by surgery on knots [14].

If instead of a knot $K \subset S^3$ we have a link $L$ (a disjoint union of knots), we can define invariants $\widehat{HFL}(S^3, L)$, $HFL^-(S^3, L)$, which are versions of link Floer
Ciprian Manolescu
homology. When $L$ is a knot, $\widehat{HFL}$ and $HFL^-$ reduce to $\widehat{HFK}$ and $HFK^-$, respectively.

In general, the definition of link Floer homology involves choosing a new kind of Heegaard diagram for $S^3$, in which the number of alpha (or beta) curves exceeds the genus of the Heegaard surface. The details can be found in [42]. If the diagram has $g + k - 1$ alpha curves, it should also have $g + k - 1$ beta curves, $k$ basepoints of type $z$, and $k$ basepoints of type $w$. In the simplest version, $k$ is the same as the number $\ell$ of components of the link, and joining the $w$ basepoints to the $z$ basepoints in pairs (by a total of $2\ell$ paths) produces the link $L$. Instead of $\mathbb{Z}[U]$, the link Floer complexes are defined over a polynomial ring $\mathbb{Z}/2[U_1, \ldots, U_\ell]$, with one variable for each component. (In fact, we expect the complexes to be defined over $\mathbb{Z}[U_1, \ldots, U_\ell]$. However, at the moment some orientation issues are not yet settled, and the theory is only defined with mod 2 coefficients.)

More generally, we could have $k \geq \ell$, and break the link into more segments. We can then define a link Floer complex over $\mathbb{Z}/2[U_1, \ldots, U_k]$, with one variable for each basepoint; see [27, 26]. The homology of this complex is still $HFL^-$, and all the variables corresponding to basepoints on the same link component act the same way. If we set one $U_i$ variable from each link component to zero in the complex, the resulting homology is $\widehat{HFL}$. If we set all the $U_i$ variables to zero in the complex, the homology becomes
$$\widehat{HFL}(S^3, L) \otimes V^{k - \ell}, \tag{1}$$
where $V$ is a two-dimensional vector space over $\mathbb{Z}/2$.
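The way $P_K(t, q)$ packages the bigraded ranks, and how Theorem 3.1 and Corollary 3.2 are read off from it, can be illustrated on three small knots. The sketch below is in Python; the function names are hypothetical, and the table of ranks is not computed here but quoted from the standard computations for these knots (for the trefoil and the figure-eight knot, which are alternating, the ranks are the absolute values of the Alexander coefficients, placed on a single diagonal).

```python
# Ranks of \widehat{HFK}_m(S^3, K, s), keyed by (Maslov grading m, Alexander grading s).
# These values are quoted from the literature, not derived in this example.
KNOT_FLOER_RANKS = {
    "unknot": {(0, 0): 1},
    "right-handed trefoil": {(0, 1): 1, (-1, 0): 1, (-2, -1): 1},
    "figure-eight": {(1, 1): 1, (0, 0): 3, (-1, -1): 1},
}


def alexander_coefficients(ranks):
    """Coefficients of P_K(-1, q) in q: {s: sum over m of (-1)^m * rank}."""
    coefficients = {}
    for (m, s), rank in ranks.items():
        sign = -1 if m % 2 else 1
        coefficients[s] = coefficients.get(s, 0) + sign * rank
    return {s: c for s, c in coefficients.items() if c != 0}


def genus(ranks):
    """Theorem 3.1: the largest Alexander grading supporting nonzero homology."""
    return max(s for (m, s), rank in ranks.items() if rank != 0)


def has_trivial_polynomial(ranks):
    """Corollary 3.2: P_K = 1 means a single generator, in bigrading (0, 0)."""
    return {k: r for k, r in ranks.items() if r != 0} == {(0, 0): 1}


for name, ranks in KNOT_FLOER_RANKS.items():
    print(name, "| Alexander:", alexander_coefficients(ranks),
          "| genus:", genus(ranks),
          "| unknot:", has_trivial_polynomial(ranks))
```

The specialization $t = -1$ recovers $q - 1 + q^{-1}$ for the trefoil and $-q + 3 - q^{-1}$ for the figure-eight knot, while the genus is visible directly as the top Alexander grading.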
4. Grid diagrams and combinatorial link Floer complexes

Definition 4.1. Let $L \subset S^3$ be a link. A grid diagram for $L$ consists of an $n$-by-$n$ grid in the plane with O and X markings inside, such that:
(1) Each row and each column contains exactly one X and one O;
(2) As we trace the vertical and horizontal segments between O’s and X’s (with the vertical segments passing over the horizontal segments), we see a planar diagram for the link $L$.

An example is shown on the left hand side of Figure 1. It is not hard to see that every link admits a grid diagram. In fact, as a way of representing links, grid diagrams are equivalent to arc presentations, which originated in the work of Brunn [5]. The minimal number $n$ such that $L$ admits a grid diagram of size $n$ is called the arc index of $L$.

Grid diagrams can be viewed as particular examples of Heegaard diagrams with multiple basepoints, of the kind discussed at the end of the previous section. Indeed, if we identify the opposite sides of a grid diagram $G$ to get a torus, we can let this torus be the Heegaard surface $\Sigma$, the horizontal circles be the $\alpha$ curves, the vertical circles be the $\beta$ curves, the O markings be the $w$ basepoints, and the X markings be the $z$ basepoints. A point $x = \{x_1, \ldots, x_n\}$ in the intersection $\mathbb{T}_\alpha \cap \mathbb{T}_\beta$
Figure 1. A grid diagram for the trefoil, and two empty rectangles in Rect◦ (x, y). On the right, x is indicated by the collection of black dots, and y by the collection of white dots at the intersection of the grid lines. One empty rectangle is darkly shaded. The other rectangle is wrapped around the torus, and consists of the union of the four lightly shaded areas.
consists of an $n$-tuple of points on the grid (one on each vertical and horizontal circle). There are $n!$ such intersection points, and they are precisely the generators of the link Floer complex. We denote the set of these generators by $S(G)$.

Definition 4.2. Let $G$ be a grid diagram, and $x, y \in S(G)$. We define a rectangle from $x$ to $y$ to be a rectangle on the grid torus with the lower left and upper right corners being points of $x$, the lower right and upper left corners being points of $y$, and such that all the other components of $x$ and $y$ coincide. (In particular, for such a rectangle to exist we need $x$ to differ from $y$ in exactly two rows.) A rectangle is called empty if it contains no components of $x$ or $y$ in its interior. The set of empty rectangles from $x$ to $y$ is denoted $\mathrm{Rect}^\circ(x, y)$.

Of course, the space $\mathrm{Rect}^\circ(x, y)$ has at most two elements. An example where it has exactly two is shown on the right hand side of Figure 1.

The reason why grid diagrams are useful in Heegaard Floer theory is that they make pseudo-holomorphic disks of Maslov index 1 easy to count:

Proposition 4.3 ([27]). Let $G$ be a grid diagram, and let $x, y \in S(G)$. Then, there is a 1-to-1 correspondence:
$$\big\{ \phi \in \pi_2(x, y) \mid \mu(\phi) = 1,\ c(\phi, J) \equiv 1 \pmod 2 \text{ for generic } J \big\} \longleftrightarrow \mathrm{Rect}^\circ(x, y).$$

Sketch of proof. In any Heegaard diagram, if we have a relative homotopy class $\phi \in \pi_2(x, y)$, we can associate to it a two-chain $D(\phi)$ on the Heegaard surface $\Sigma$, as follows. Let $n$ be the number of alpha (or beta) curves. Together, the alpha and the beta curves split $\Sigma$ into several connected regions $R_1, \ldots, R_m$. For each $i$, let us pick a point $p_i$ in the interior of $R_i$, and define the multiplicity of $\phi$ at $R_i$ to be the intersection number $n_{p_i}(\phi)$ between $\phi$ and $\{p_i\} \times \mathrm{Sym}^{n-1}(\Sigma)$. We set
$$D(\phi) = \sum_i n_{p_i}(\phi)\, R_i.$$
This is called the domain of $\phi$. If $\phi$ admits any pseudo-holomorphic representatives, then the multiplicities $n_{p_i}(\phi)$ must be nonnegative. Lipshitz [21] showed that the Maslov index of $\phi$ can be expressed in terms of the domain:
$$\mu(\phi) = e(D(\phi)) + N(D(\phi)),$$
where $e$ and $N$ are certain quantities called the Euler measure and total vertex multiplicity, respectively. The Euler measure is additive on regions, that is, we can define $e(R_i)$ such that $e(D(\phi)) = \sum_i n_{p_i}(\phi) e(R_i)$. If we take the sum $\sum_i e(R_i)$ we get the Euler characteristic of the Heegaard surface $\Sigma$. As for the total vertex multiplicity, it is the sum of $2n$ vertex multiplicities $N_q(D(\phi))$, one for each point $q \in x$ or $q \in y$. The quantity $N_q(D(\phi))$ is the average of the multiplicities of $\phi$ in the four quadrants around $q$.

In the case of a grid $G$, the regions $R_i$ are the $n^2$ unit squares of $G$. Each square has Euler measure zero. If we have $\phi \in \pi_2(x, y)$ with $\mu(\phi) = 1$ and $c(\phi, J) \neq 0$, then the coefficients of $R_i$ in $D(\phi)$ are nonnegative. This implies that $1 = \mu(\phi) = N(D(\phi))$ is a sum of vertex multiplicities $N_q(D(\phi))$. Each $N_q(D(\phi))$ is either zero or at least $1/4$. A short analysis shows that $D(\phi)$ must be an empty rectangle. Conversely, given an empty rectangle, there is a corresponding class $\phi$ with $\mu(\phi) = 1$. An application of the Riemann mapping theorem shows that $\phi$ has an odd number of pseudo-holomorphic representatives for generic $J$.

In view of Proposition 4.3, the link Floer complex associated to a grid can be defined in a purely combinatorial way. Precisely, we define $C^-(G)$ to be freely generated by $S(G)$ over the ring $\mathbb{Z}/2[U_1, \dots, U_n]$, with differential:
$$\partial x = \sum_{y \in S(G)} \ \sum_{\{r \in \mathrm{Rect}^\circ(x, y) \,\mid\, X_i(r) = 0,\ \forall i\}} U_1^{O_1(r)} \cdots U_n^{O_n(r)} \cdot y.$$
Here, $O_i(r)$ encodes whether or not the $i$th marking of type $O$ is in the interior of the rectangle $r$: if it is, we set $O_i(r)$ to be 1; otherwise it is 0. The quantity $X_i(r) \in \{0, 1\}$ is defined similarly, in terms of the $i$th marking of type $X$. The homology of $C^-(G)$ is the link Floer homology $HFL^-(S^3, L)$.

Remark 4.4. Although the complex $C^-(G)$ is defined with mod 2 coefficients, one can add signs in the differential to get a complex over $\mathbb{Z}[U_1, \dots, U_n]$, whose homology is still a link invariant. See [28, 12].

If in $C^-(G)$ we set one variable $U_i$ from each link component to zero, we get a complex $\widehat{C}(G)$ with homology $\widehat{HFL}(S^3, L)$. Perhaps the simplest complex is
$$\widetilde{C}(G) = C^-(G)/(U_1 = U_2 = \dots = U_n = 0),$$
for which we only need to count the empty rectangles with no markings of any type in their interior. The homology of $\widetilde{C}(G)$ is $\widehat{HFL}(S^3, L) \otimes V^{n-\ell}$; compare (1).
In particular, when $L = K$ is a knot, the homology of $\widetilde{C}(G)$ is $\widehat{HFK}(S^3, K) \otimes V^{n-1}$. There exist simple combinatorial formulas for the Maslov and Alexander gradings of the generators in $S(G)$, and from them one gets a bi-grading on $H_*(\widetilde{C}(G))$; see [27], [28]. From here one can recover $\widehat{HFK}(S^3, K)$ as a bi-graded group, taking into account that each $V$ factor is spanned by one generator in bi-degree $(0, 0)$ and another in bi-degree $(-1, -1)$. This method of calculating $\widehat{HFK}$ was implemented on the computer by Baldwin and Gillam [2]; see also Droz [9] for a more efficient program, using a variation of this method due to Beliakova [4]. Note that the size of the combinatorial knot Floer complex $\widetilde{C}(G)$ increases super-exponentially (like $n!$) in the size $n$ of the grid. Nevertheless, the programs can effectively compute $\widehat{HFK}$ for knots of arc index at least up to 13.

In view of Theorem 3.1, we see that grid diagrams yield an algorithm for detecting the genus of a knot. In particular, if we are given a knot diagram and want to see if it represents the unknot, we can turn it into a grid diagram (after some suitable isotopies), then set up the complex $\widetilde{C}(G)$, and check if $H_*(\widetilde{C}(G)) \cong V^{n-1}$.

Among the other applications of the grid diagram method we mention Sarkar's combinatorial proof of the Milnor conjecture [47]. The Milnor conjecture states that the slice genus of the torus knot $T(p, q)$ is $(p - 1)(q - 1)/2$; a corollary is that the minimum number of crossing changes needed to turn $T(p, q)$ into the unknot is also $(p - 1)(q - 1)/2$. The conjecture was first proved by Kronheimer and Mrowka using gauge theory [16]; for other proofs, see [35], [45].

Slight variations of grid diagrams can be used to compute the knot Floer homology of knots inside lens spaces [1], and of a knot inside its cyclic branched covers [19].

Finally, we mention that there exists a purely combinatorial proof that $H_*(C^-(G))$ and $H_*(\widehat{C}(G))$ are link invariants [28]. However, it remains an open problem to prove combinatorially that knot Floer homology detects the genus of a knot.
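To make the combinatorial definition concrete, the following is a minimal Python sketch (not the programs of [2] or [9]) that builds the complex $\widetilde{C}(G)$ by brute force and returns the rank of its homology over $\mathbb{Z}/2$. The encoding is an assumption of the sketch: rows and columns are indexed $0, \dots, n-1$, the lists `O` and `X` give the row of the $O$ and $X$ marking in each column, and a generator is a permutation `x` with one point `(c, x[c])` on each vertical circle. All function names are hypothetical.

```python
from itertools import permutations


def marking_free_empty_rectangles(x, O, X):
    """Yield the target y of every empty rectangle from x that contains no marking."""
    n = len(x)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            # Rectangle on the torus with lower left corner (a, x[a])
            # and upper right corner (b, x[b]); w and h are its sides.
            w = (b - a) % n
            h = (x[b] - x[a]) % n
            # Empty: no point of x strictly inside the rectangle.
            if any(0 < (c - a) % n < w and 0 < (x[c] - x[a]) % n < h
                   for c in range(n)):
                continue
            # No O or X marking in any unit square of the rectangle.
            if any((c - a) % n < w and (m[c] - x[a]) % n < h
                   for m in (O, X) for c in range(n)):
                continue
            y = list(x)
            y[a], y[b] = x[b], x[a]
            yield tuple(y)


def rank_mod2(rows):
    """Rank over Z/2 of a matrix whose rows are stored as integer bitmasks."""
    pivots = {}  # leading bit -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)


def tilde_homology_rank(O, X):
    """Rank of the homology of the complex with all U_i set to zero."""
    n = len(O)
    gens = list(permutations(range(n)))
    index = {g: i for i, g in enumerate(gens)}
    rows = []
    for x in gens:
        row = 0
        for y in marking_free_empty_rectangles(x, O, X):
            row ^= 1 << index[y]  # rectangles are counted mod 2
        rows.append(row)
    # Since the differential squares to zero: dim H = dim C - 2 rank(d).
    return len(gens) - 2 * rank_mod2(rows)
```

For the $3 \times 3$ diagram of the unknot given by `O = [0, 1, 2]`, `X = [1, 2, 0]`, the function returns 4, the rank of $V^{2}$, in agreement with the unknot criterion stated above. Because it enumerates all $n!$ generators, the sketch is only practical for small grids.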
5. Three-manifolds and four-manifolds

For a general Heegaard diagram, counting pseudo-holomorphic disks in the symmetric product is very difficult. Why is it easy for a grid diagram? If we look at the proof of Proposition 4.3, a key point we find is that the regions $R_i$ have zero Euler measure. In fact, what is important is that they have nonnegative Euler measure: since the total vertex multiplicity is always nonnegative, the fact that $e(D(\phi)) + N(D(\phi)) = 1$ imposes tight constraints on the possibilities for $D(\phi)$.

In general, if a Heegaard surface $\Sigma$ can be partitioned into regions of nonnegative Euler measure, its Euler characteristic (which is the sum of all the Euler measures) must be nonnegative; that is, $\Sigma$ must be a sphere or a torus. Our grid diagrams were set on a torus. There is also a variant on the sphere, which produces another combinatorial link Floer complex, and in the end yields the same homology.
Instead of a knot in $S^3$, we could take a 3-manifold $Y$ and try to compute its Heegaard Floer homology using this method. The problem is that a typical 3-manifold does not admit a Heegaard diagram of genus 0 or 1; only $S^3$, $S^1 \times S^2$ and lens spaces do. However, Sarkar and Wang [48] proved that one can find a Heegaard diagram for $Y$, called a nice diagram, in which all regions except one have nonnegative Euler measure. (This is related to the fact that on a surface of higher genus we can move all negative curvature to a neighborhood of a point.) If we put the basepoint $z$ in the bad region (the one with negative Euler measure), then we can understand pseudo-holomorphic curve counts for all classes $\phi$ with $n_z(\phi) = 0$. These are the only classes that appear when defining the complex $\widehat{CF}(Y)$. Thus, we get an algorithm for computing $\widehat{HF}$ of any 3-manifold. We refer to [48] for more details, and to [33, 34] for related work. Similarly, one can compute the cobordism maps $\hat{F}_{W,s}$ for any simply connected $W$ [22]. These suffice to detect exotic smooth structures on some 4-manifolds with boundary, but not on any closed 4-manifolds.

This line of thought runs into major difficulties if one wants to understand combinatorially the plus and minus versions of $HF$, or the mixed invariants of 4-manifolds. Instead, what is helpful is to reduce everything to the case of links in $S^3$, and then appeal to grid diagrams. This program was developed in [26, 29], and is summarized below.

We start by recalling that any closed 3-manifold $Y$ is integral surgery on a link in $S^3$:
$$Y = (S^3 \setminus \nu(L)) \cup_\phi \nu(L).$$
Here, $\nu(L)$ is a tubular neighborhood, and $\phi$ is a self-diffeomorphism of $\partial \nu(L)$. The diffeomorphism can be specified in terms of a framing of the link, which in turn is determined by choosing one integer for each link component. For example, the Poincaré sphere is surgery on the right-handed trefoil with $+1$ framing. In general, we denote by $S^3_\Lambda(L)$ the result of surgery on $L$ with framing $\Lambda$.
Four-manifolds can also be expressed in terms of links, using Kirby diagrams [15]. By Morse theory, a closed 4-manifold can be broken into a 0-handle, some 1-handles (represented in a Kirby diagram by circles marked with a dot), some 2-handles (represented by framed knots), some 3-handles, and a 4-handle. The positions of the 1-handles and 2-handles determine the manifold.

The next step in the program is to express the Heegaard Floer homology of surgery on a link in terms of data associated to the link. The first result in this direction was obtained by Ozsváth and Szabó [41], who dealt with surgery on knots:

Theorem 5.1 ([41]). There is an (infinitely generated) version of the knot Floer complex, $A^+(K)$, such that
$$HF^+(S^3_n(K)) = H_*\Bigl(\mathrm{Cone}\bigl(A^+(K) \xrightarrow{\ \Phi^K_n\ } A^+(\emptyset)\bigr)\Bigr),$$
where in $A^+(K)$, $A^+(\emptyset)$ we count pseudo-holomorphic bigons and in $\Phi^K_n$ we count pseudo-holomorphic triangles.
The complex $A^+(\emptyset)$ is a direct sum of infinitely many copies of $CF^+(S^3)$. The inclusion of one of these copies into the mapping cone complex $\mathrm{Cone}\bigl(A^+(K) \xrightarrow{\ \Phi^K_n\ } A^+(\emptyset)\bigr)$ induces on homology the map $F_{W,s} : HF^+(S^3) \longrightarrow HF^+(S^3_n(K))$ corresponding to the surgery cobordism (2-handle attachment along $K$), equipped with a Spin$^c$ structure $s$.
The proof of Theorem 5.1 is based on an important property of $HF^+$ called the surgery exact triangle. The version $HF^-$ does not have a similar exact triangle, but a slight variant of it, $\mathbf{HF}^-$, does. The version $\mathbf{HF}^-$ is obtained from $HF^-$ by completion with respect to the $U$ variable. For torsion Spin$^c$ structures $s$, one can recover $HF^-(Y, s)$ from $\mathbf{HF}^-(Y, s)$, so in that case the two versions contain equivalent information. There is an analogue of Theorem 5.1 with $\mathbf{HF}^-$ instead of $HF^+$, and with a knot Floer complex denoted $A^-$ instead of $A^+$.

There is also an extension of Theorem 5.1 to surgeries on links rather than single knots. Phrased in terms of $\mathbf{HF}^-$, it reads:

Theorem 5.2 ([26]). If $L = K_1 \cup K_2 \subset S^3$ is a link with framing $\Lambda$, then $\mathbf{HF}^-(S^3_\Lambda(L))$ is isomorphic to the homology of a complex $C^-(L, \Lambda)$ of the form
$$\begin{array}{ccc} A^-(L) & \longrightarrow & A^-(K_1) \\ \downarrow & \searrow & \downarrow \\ A^-(K_2) & \longrightarrow & A^-(\emptyset) \end{array} \tag{2}$$
where the edge maps count holomorphic triangles, and the diagonal map counts holomorphic quadrilaterals.

This can be generalized to links with any number of components. The higher diagonals involve counting higher holomorphic polygons. Further, the inclusion of the subcomplex corresponding to $L' \subseteq L$ corresponds to the cobordism maps given by surgery on $L - L'$.

Remark 5.3. For technical reasons, at the moment Theorem 5.2 is only established with mod 2 coefficients.

If $W$ is a cobordism between (connected) 3-manifolds that consists of 2-handles only, then we can express one boundary piece of $W$ as surgery on a link $L' \subset S^3$, and $W$ as a handle attachment along a link $L - L'$. Thus, Theorem 5.2 gives a description of the maps on $\mathbf{HF}^-$ associated to any such cobordism $W$. In fact, 2-handles are the main source of complexity in 4-manifolds. Once we understand them, it is not hard to incorporate the maps induced by 1-handles and 3-handles into the picture. The result is a description of the Ozsváth–Szabó mixed invariant of a 4-manifold $X$ in terms of link Floer complexes. For this one needs to represent $X$ by a slight variant of a Kirby diagram, called a cut link presentation; we refer to [26] for more details.
Figure 2. Snail domains. Darker shading corresponds to higher local multiplicities. The domains in each row (top or bottom) are part of an infinite sequence, corresponding to increasing complexities. The larger circles represent certain fixed points on the grid, called destabilization points. Each domain corresponds to a pseudo-holomorphic triangle in the symmetric product of the grid.
Theorem 5.4 ([29]). Given any 3-manifold $Y$ with a Spin$^c$ structure $s$, the Heegaard Floer homologies $HF^+(Y, s)$ and $\mathbf{HF}^-(Y, s)$ (with mod 2 coefficients) are algorithmically computable. So are the mixed invariants $\Psi_{X,s}$ (mod 2) for closed 4-manifolds $X$ with $b_2^+(X) > 1$ and $s \in \mathrm{Spin}^c(X)$.

Sketch of proof. We can represent the 3-manifold or the 4-manifold in terms of a link, as above (by a surgery diagram or a cut link presentation). The idea is to take a grid diagram $G$ for the link, and then use Theorem 5.2. We know that index 1 holomorphic disks (bigons) on the symmetric product of the grid correspond to empty rectangles. However, to apply Theorem 5.2 we also need to be able to count higher pseudo-holomorphic polygons. In [29], it is shown that isolated pseudo-holomorphic triangles on the symmetric product are in 1-to-1 correspondence with domains on the grid of certain shapes, as shown in Figure 2.

No such easy description is available for counts of pseudo-holomorphic $m$-gons on $\mathrm{Sym}^n(G)$ with $m \geq 4$. The trouble is that, unlike for $m = 2$ or 3, the counts for $m \geq 4$ depend on the choice of a generic family $J$ of almost complex structures on $\mathrm{Sym}^n(G)$. Still, the counts are required to satisfy certain constraints, coming from positivity of intersections and Gromov compactness. We define a formal complex structure on $G$ to be any count of domains on the grid that satisfies these constraints.

A formal complex structure is a purely combinatorial object. Each such structure $c$ gives rise to a complex $C^-(G, \Lambda, c)$, similar to (2), but where instead of pseudo-holomorphic polygon counts we use the domain counts prescribed by $c$. In particular, a family of almost complex structures $J$ on the symmetric product produces a formal complex structure, whose corresponding complex is exactly (2). There is a definition of homotopy between formal complex structures, and if two such structures are homotopic, they give rise to quasi-isomorphic complexes $C^-(G, \Lambda, c)$.
We conjecture that any two formal complex structures on a grid diagram are homotopic. A weaker form of this conjecture, sufficient for our purposes, is proved in [29]. Instead of an ordinary grid diagram $G$, we use its sparse double G#. This is obtained from $G$ by introducing $n$ additional rows, columns, and $O$ markings, interspersed between the previous rows and columns. The sparse double is not a grid diagram in the usual sense, because the new rows and columns have no $X$ markings. Nevertheless, it can still be viewed as a type of Heegaard diagram for the link, and pseudo-holomorphic bigons and triangles correspond to empty rectangles and snail domains, just as before. One result of [29] is that on the sparse double, any two formal complex structures are homotopic.

With this in mind, the desired algorithm for computing $\mathbf{HF}^-$ is as follows: Choose any formal complex structure $c$ on G#, and then calculate the homology of $C^-(G, \Lambda, c)$. This homology is independent of $c$, so it agrees with the homology of the complex (2). By Theorem 5.2, this gives exactly $\mathbf{HF}^-$ of surgery on the framed link. Similar algorithms can be constructed for computing $HF^+$ and $\Psi_{X,s}$.

Acknowledgement. I would like to thank Tye Lidman for several helpful suggestions about the exposition.
References

[1] K. Baker, J. Grigsby, and M. Hedden, Grid diagrams for lens spaces and combinatorial knot Floer homology. Int. Math. Res. Not. IMRN, no. 10 2008, ID rnm024, 39 pp.
[2] J. A. Baldwin and W. D. Gillam, Computations of Heegaard–Floer knot homology. Preprint (2006), arXiv:math/0610167.
[3] J. A. Baldwin and A. S. Levine, A combinatorial spanning tree model for knot Floer homology. Preprint (2011), arXiv:1105.5199.
[4] A. Beliakova, A simplification of combinatorial link Floer homology. J. Knot Theory Ramifications 19 no. 2 (2010), 125–144.
[5] H. Brunn, Über verknotete Kurven. In: Verhandlungen des Internationalen Math. Kongresses (Zurich 1897), 256–259, 1898.
[6] V. Colin, P. Ghiggini, and K. Honda, Equivalence of Heegaard Floer homology and embedded contact homology via open book decompositions. Proc. Nat. Acad. Sci. 108 (20), 8100–8105.
[7] S. Donaldson, An application of gauge theory to four-dimensional topology. J. Differential Geom. 18 no. 2 (1983), 279–315.
[8] S. Donaldson, Polynomial invariants for smooth four-manifolds. Topology 29 no. 3 (1990), 257–315.
[9] J.-M. Droz, Effective computation of knot Floer homology. Acta Math. Vietnam. 33 no. 3 (2008), 471–491.
[10] A. Floer, An instanton-invariant for 3-manifolds. Comm. Math. Phys. 118 no. 2 (1988), 215–240.
[11] K. Frøyshov, Monopole Floer homology for rational homology 3-spheres. Duke Math. J. 155 no. 3 (2010), 519–576.
[12] E. Gallais, Sign refinement for combinatorial link Floer homology. Algebr. Geom. Topol. 8 no. 3 (2008), 1581–1592.
[13] P. Ghiggini, Knot Floer homology detects genus-one fibred knots. Amer. J. Math. 130 no. 5 (2008), 1151–1169.
[14] J. E. Greene, The lens space realization problem. Preprint (2010), arXiv:1010.6257.
[15] R. Kirby, A calculus for framed links in $S^3$. Invent. Math. 45 no. 1 (1978), 35–56.
[16] P. Kronheimer and T. Mrowka, Gauge theory for embedded surfaces I. Topology 32 no. 4 (1993), 773–826.
[17] P. Kronheimer and T. Mrowka, Monopoles and three-manifolds. New Mathematical Monographs 10, Cambridge University Press, Cambridge, 2007.
[18] C. Kutluhan, Y.-J. Lee, and C. Taubes, HF=HM I: Heegaard Floer homology and Seiberg–Witten Floer homology. Preprint (2010), arXiv:1007.1979.
[19] A. Levine, Computing knot Floer homology in cyclic branched covers. Algebr. Geom. Topol. 8 no. 2 (2008), 1163–1190.
[20] T. Lidman, Heegaard Floer homology and triple cup products. Preprint (2010), arXiv:1011.4277.
[21] R. Lipshitz, A cylindrical reformulation of Heegaard Floer homology. Geom. Topol. 10 (2006), 955–1097.
[22] R. Lipshitz, C. Manolescu, and J. Wang, Combinatorial cobordism maps in hat Heegaard Floer theory. Duke Math. J. 145 no. 2 (2008), 207–247.
[23] R. Lipshitz, P. S. Ozsváth, and D. P. Thurston, Computing $\widehat{HF}$ by factoring mapping classes. Preprint (2010), arXiv:1010.2550.
[24] P. Lisca and A. Stipsicz, On the existence of tight contact structures on Seifert fibered 3-manifolds. Duke Math. J. 148 no. 2 (2009), 175–209.
[25] C. Manolescu, Seiberg–Witten–Floer stable homotopy type of three-manifolds with $b_1 = 0$. Geom. Topol. 7 (2003), 889–932.
[26] C. Manolescu and P. S. Ozsváth, Heegaard Floer homology and integer surgeries on links. Preprint (2010), arXiv:1011.1317.
[27] C. Manolescu, P. S. Ozsváth and S. Sarkar, A combinatorial description of knot Floer homology. Ann. of Math. (2) 169 no. 2 (2009), 633–660.
[28] C. Manolescu, P. S. Ozsváth, Z. Szabó, and D. P. Thurston, On combinatorial link Floer homology. Geom. Topol. 11 (2007), 2339–2412.
[29] C. Manolescu, P. S. Ozsváth, and D. P. Thurston, Grid diagrams and Heegaard Floer invariants. Preprint (2009), arXiv:0910.0078.
[30] M. Marcolli and B.-L. Wang, Equivariant Seiberg–Witten Floer homology. Comm. Anal. Geom. 9 no. 3 (2001), 451–639.
[31] Y. Ni, Knot Floer homology detects fibred knots. Invent. Math. 170 no. 3 (2007), 577–608.
[32] Y. Ni, Heegaard Floer homology and fibred 3-manifolds. Amer. J. Math. 131 no. 4 (2009), 1047–1063.
[33] P. S. Ozsváth, A. Stipsicz, and Z. Szabó, A combinatorial description of the $U^2 = 0$ version of Heegaard Floer homology. Preprint (2008), arXiv:0811.3395.
[34] P. S. Ozsváth, A. Stipsicz, and Z. Szabó, Combinatorial Heegaard Floer homology and nice Heegaard diagrams. Preprint (2009), arXiv:0912.0830.
[35] P. S. Ozsváth and Z. Szabó, Knot Floer homology and the four-ball genus. Geom. Topol. 7 (2003), 615–639.
[36] P. S. Ozsváth and Z. Szabó, Holomorphic disks and topological invariants for closed three-manifolds. Ann. of Math. (2) 159 no. 3 (2004), 1027–1158.
[37] P. S. Ozsváth and Z. Szabó, Holomorphic disks and three-manifold invariants: properties and applications. Ann. of Math. (2) 159 no. 3 (2004), 1159–1245.
[38] P. S. Ozsváth and Z. Szabó, Holomorphic disks and knot invariants. Adv. Math. 186 no. 1 (2004), 58–116.
[39] P. S. Ozsváth and Z. Szabó, Holomorphic disks and genus bounds. Geom. Topol. 8 (2004), 311–334.
[40] P. S. Ozsváth and Z. Szabó, Holomorphic triangles and invariants for smooth four-manifolds. Adv. Math. 202 no. 2 (2006), 326–400.
[41] P. S. Ozsváth and Z. Szabó, Knot Floer homology and integer surgeries. Algebr. Geom. Topol. 8 no. 1 (2008), 101–153.
[42] P. S. Ozsváth and Z. Szabó, Holomorphic disks, link invariants and the multi-variable Alexander polynomial. Algebr. Geom. Topol. 8 no. 2 (2008), 615–692.
[43] P. S. Ozsváth and Z. Szabó, A cube of resolutions of knot Floer homology. J. Topol. 2 no. 4 (2009), 865–910.
[44] J. Rasmussen, Floer homology and knot complements. Ph. D. thesis, Harvard University, 2003.
[45] J. Rasmussen, Khovanov homology and the slice genus. Invent. Math. 182 no. 2 (2010), 419–447.
[46] N. Seiberg and E. Witten, Electric-magnetic duality, monopole condensation, and confinement in $N = 2$ supersymmetric Yang–Mills theory. Nuclear Phys. B 426 no. 1 (1994), 19–52.
[47] S. Sarkar, Grid diagrams and the Ozsváth–Szabó tau-invariant. Preprint, arXiv:1011.5265.
[48] S. Sarkar and J. Wang, A combinatorial description of some Heegaard Floer homologies. Ann. of Math. (2) 171 no. 2 (2010), 1213–1236.
[49] E. Witten, Monopoles and four-manifolds. Math. Res. Lett. 1 no. 6 (1994), 769–796.
Ciprian Manolescu, Department of Mathematics, UCLA, 520 Portola Plaza, Los Angeles, CA 90095-1555, USA E-mail: [email protected]
Random maps and continuum random 2-dimensional geometries
Grégory Miermont
Abstract. In recent years, much progress has been made in the mathematical understanding of the scaling limit of random maps, making precise the sense in which random embedded graphs approach a model of continuum surface. In particular, it is now known that many natural models of random plane maps, for which the face degrees remain small, admit a universal scaling limit, the Brownian map. Other models, favoring large faces, also admit a one-parameter family of scaling limits, called stable maps. The latter are believed to describe the asymptotic geometry of random maps carrying statistical physics models, as has now been established in some important cases (including the so-called rigid O(n) model on quadrangulations).

2010 Mathematics Subject Classification. Primary 60-XX; Secondary 05C10, 82B20.

Keywords. Random maps, random trees, Brownian map, stable maps, O(n) model.
1. Introduction

A map is a finite graph that is properly embedded into a 2-dimensional oriented topological surface, and that dissects the latter into a collection of topological polygons, called the faces of the map. Two maps are equivalent (and henceforth are identified) if they can be put in correspondence via a direct homeomorphism between the underlying surfaces. When the underlying surface is the 2-dimensional sphere $S^2$, we say that the map is plane. There are many other equivalent definitions for maps, reflecting the central role they play in many different branches of mathematics, including graph theory (e.g. the 4-color theorem), combinatorics, representation theory, algebraic geometry, and mathematical physics. The book [33] gives a very accessible introduction to a variety of topics featuring maps as a key concept.

Starting in the 1980s, it was recognized by theoretical physicists that maps could provide a useful tool in the theory of 2-dimensional quantum gravity. See for instance the survey [24] or the book [3]. A basic object in this theory is a partition function defined as the integral of a certain action functional over the space of all Riemannian metrics on a 2-dimensional surface, considered up to diffeomorphisms. This integral, which can be seen as a 2-dimensional analog of path integrals, is a problematic and ill-defined object from a mathematical point of view. It was therefore suggested that the integral could be approximated by a (finite) sum over maps with a fixed, but large size. Here the family of maps over which one takes a sum (and the notion of size) is not specified: It is expected that any “reasonable” choice for
such a family will provide an acceptable approximation of the same “universal” object. Keeping in mind the fact that path integrals can be formulated in terms of Brownian motion, and that the latter is (by Donsker's theorem) the scaling limit of any random walk with centered, independent increments having a finite variance, this last assertion does not seem unreasonable.

From a probabilistic point of view, the above questions can be formulated as follows: Taking a map at random in a certain collection, and letting its size go to infinity, can one approximate a continuum random 2-dimensional geometry?

Let us be more specific. We say that a map is rooted if one of the edges is distinguished and given an orientation. We are only going to consider rooted maps in the sequel, which is only a matter of mathematical convenience: All results discussed below are believed to hold also for non-rooted maps. Let $\mathbf{Q}_n$ be the set of rooted plane quadrangulations with $n$ faces, meaning that all faces are bounded by 4 edges (here, an edge that lies entirely in one face should be counted twice). Since equivalent maps are identified, the set $\mathbf{Q}_n$ is a finite set, and in fact [55]
$$\mathrm{Card}(\mathbf{Q}_n) = \frac{2}{n+2} \cdot \frac{3^n}{n+1} \binom{2n}{n}. \tag{1}$$
Let $Q_n$ be a random variable that is uniformly distributed in $\mathbf{Q}_n$. We can view $Q_n$ as a metric space by naturally endowing the set $V(Q_n)$ of its vertices with the graph distance $d_{Q_n}$: For $u, v \in V(Q_n)$, $d_{Q_n}(u, v)$ is the minimal number of edges needed to go from $u$ to $v$ in $Q_n$. We want to understand the geometry of the space $(V(Q_n), d_{Q_n})$ as $n \to \infty$.

At this point, we choose to consider properly renormalized versions of these spaces as $n \to \infty$. Without a renormalization, these spaces become unbounded, and their discrete limits, called local limits, describe so to speak a random infinite lattice, the Uniform Infinite Planar Quadrangulation, which was introduced (for the slightly different case of triangulations) by Angel and Schramm, see [6, 5, 32, 20, 44].
The theory of local limits is very rich, but to obtain such limits, it is not necessary to have a very detailed understanding of the distance $d_{Q_n}$. Rather, we want to consider scaling limits and obtain bounded, continuum limiting objects.

In order to address such problems, a natural approach is to try to count quadrangulations with vertices at certain prescribed distances. The enumerative theory of maps started with the “census” works of Tutte [55], who established (1) alongside many other similar results for other families of maps. A striking connection between map enumeration and matrix integrals was established in the 1970s [54, 17] and spawned a huge literature, see [24] for a survey. However, these approaches do not seem to allow one to keep track of the extra information of graph distances. Despite a spectacular semi-rigorous computation by Ambjørn and Watabiki [4, 3] for the two-point function of triangulations, there was no clear way to attack the problem from a mathematical angle.

Yet another, more direct approach to the enumeration of maps had been noted in 1981 by Cori and Vauquelin [22]. Motivated by the simple form of formulas such as (1), they were able to provide a bijection between rooted maps and a family of trees, called well-labeled. Despite some notable exceptions such as [7],
these techniques were mostly put to sleep until they reached their full potential starting with the PhD thesis of Schaeffer [51], who developed a more systematic study of bijective enumeration techniques for maps. A key feature of the Cori–Vauquelin–Schaeffer bijections is that the labels of a well-labeled tree allow one to keep track of certain graph distances in the associated map. Based on this observation and on the fact that random labeled trees are a relatively common object in the probability literature, Chassaing and Schaeffer [21] were able to derive rigorously the limit distribution of the radius of $Q_n$, as well as other interesting functionals. Specifically, they proved that if $u_*$ is the origin of the root edge of $Q_n$, then
$$\Bigl(\frac{9}{8n}\Bigr)^{1/4} \max_{u \in V(Q_n)} d_{Q_n}(u, u_*) \xrightarrow[n \to \infty]{} \Delta \tag{2}$$
in distribution, where $\Delta$ is a random variable that can be defined in terms of Le Gall's Brownian snake [34] (or Aldous' Integrated SuperBrownian Excursion [1]); here, the normalization $(8/9)^{1/4}$ is a matter of convention. Using generalizations of the Cori–Vauquelin–Schaeffer bijections by Bouttier, Di Francesco and Guitter [14], it was shown in [42, 56, 45, 49] that (2) holds for models of plane maps that are far more general than uniform quadrangulations, with the same random variable $\Delta$ arising in the limit and the same normalization exponent $n^{1/4}$, but with possibly different scaling constants. These models include uniform $p$-angulations for any $p \geq 3$, i.e. uniform maps with $n$ faces all of degree $p$.

The work by Chassaing and Schaeffer suggested that the whole renormalized metric space $(V(Q_n), (9/8n)^{1/4} d_{Q_n})$ should approach a limiting metric space as $n \to \infty$ in some sense. Such a result was obtained by Marckert and Mokkadem [43], but for an ad hoc topology that does not take fully into account the metric structure. However, the limit object that they introduced is a well-defined random metric space, which they called the Brownian map.

A series of papers by Le Gall [35, 36] and by Le Gall and Paulin [41] set important milestones in the theory. In particular, [35] shows that the laws of the random metric spaces $((V(Q_n), (9/8n)^{1/4} d_{Q_n}), n \geq 1)$ form a relatively compact family of probability laws on metric spaces, endowed with the Gromov–Hausdorff topology (see below). This shows that these spaces converge, at least along proper extractions, to a limiting space $(S, D)$. Further, Le Gall showed that the topology of any such limit is a.s. the same as that of Marckert–Mokkadem's Brownian map, and [41] showed that the latter is none other than that of the 2-dimensional sphere (see also [46] for an alternative approach). Results on uniqueness of typical geodesics in the subsequential limits of random quadrangulations were obtained in [36, 47].
We also mention that Bouttier and Guitter [16] derived the 3-point function of quadrangulations, i.e. the limit law of $n^{-1/4}(d_{Q_n}(u_1, u_2), d_{Q_n}(u_1, u_3), d_{Q_n}(u_2, u_3))$, where $u_1, u_2, u_3$ are three random points chosen uniformly and independently in $V(Q_n)$. Other references and surveys on the topic include [8, 10, 9, 15, 19, 23, 40].

These works left open the question of the uniqueness of the limiting distribution of $(V(Q_n), (9/8n)^{1/4} d_{Q_n})$, and this problem was only solved recently in two independent works by Le Gall and the author [37, 48], where it is proved that the limit is indeed the random metric space introduced by Marckert and Mokkadem.
We postpone an exact statement to Section 2.3. We also mention that Le Gall obtains universality results in the same context as in [42], as well as for uniform random triangulations. The “uniqueness of the Brownian map” somehow justifies the initial statement of physicists that random maps approximate a continuum random surface.

The story is far from ending here, though. It is noted in [39] that other natural models of random maps admit scaling limits that are different from the Brownian map, when one allows models where the degrees of faces are large. These limits, the “stable maps”, which form a one-parameter family of mutually singular random spaces, are not topological spheres anymore, but should rather be thought of as random fractal carpets. One initial motivation for this work was that maps with large faces were believed to describe the interfaces of statistical physics models on random maps. This has been partially established in two recent works by Borot, Bouttier and Guitter [12, 11], for the so-called O(n) model on quadrangulations and certain variants, giving a renewed viewpoint on these well-studied models [29, 30, 13].

Let us conclude this introduction by stressing that there is another, purely continuum approach to quantum gravity, the so-called Liouville quantum gravity. The mathematical grounds for this theory are starting to emerge after the work by Duplantier and Sheffield [26]. This theory crucially involves conformal invariance, as opposed to the random maps approach, yet it is believed that the two approaches have very deep connections.

The rest of the paper is organized as follows. In Section 2.1 we introduce the Cori–Vauquelin–Schaeffer bijection, and use it to motivate the definition of the Brownian map. We introduce the latter in Section 2.3, as well as the main convergence result. Then in Section 3 we briefly introduce the model of stable maps, and motivate its connection to the O(n) model on quadrangulations.
Section 4 gives some final remarks.
2. Convergence to the Brownian map

In this section, we explain how to construct plane quadrangulations from labeled trees. This will motivate the definition of the Brownian map in the next section.

2.1. The Cori–Vauquelin–Schaeffer bijection. A plane tree is a plane map with a single face, or equivalently, a finite tree that is embedded in the sphere. If the tree, considered as a map, is rooted, then the root vertex is by definition the origin of the root-edge. A well-labeled tree is a pair $(t, l)$, where $t$ is a rooted plane tree, and $l : V(t) \to \mathbb{Z}$ is a labeling function on the set of vertices of $t$, that satisfies the following properties:
• $l(\mathrm{root}) = 0$,
• $|l(u) - l(v)| \le 1$ if $u, v$ are adjacent vertices in $t$.
It is well-known that there are $\frac{1}{n+1}\binom{2n}{n}$ rooted plane trees with $n$ edges, and it follows that there are $\frac{3^n}{n+1}\binom{2n}{n}$ well-labeled trees with $n$ edges. We let $\mathbf{T}_n$ be the set of such well-labeled trees.

Let $(t, l)$ be a fixed well-labeled tree with $n$ edges. As we go around the tree in cyclic order, we encounter each of the $2n$ corners incident to the vertices of the tree exactly once. For every such corner $c$, we draw an arc from $c$ to its successor, which is the first corner $s(c)$ coming after $c$ in cyclic order around $t$, whose label is strictly smaller than the label of the vertex incident to $c$. If there is no such corner, then we draw an arc from $c$ to an extra vertex named $v_*$. The vertex $v_*$ should not belong to the support of the embedding of $t$, and the arcs can be drawn in such a way that they do not cross, and do not intersect the support of $t$ (except of course at its vertices).

Finally, consider the embedded graph $q$ whose edge-set is the set of arcs, and whose vertex-set is $V(q) = V(t) \cup \{v_*\}$. It is easy to see that $q$ is a map. An edge of this map is naturally distinguished: it is the arc going from the corner incident to the root-edge of $t$ to its successor. This edge can be given two orientations, yielding two different rooted maps. The choice of the orientation can be specified using a parameter $\epsilon \in \{-1, 1\}$. The map $q$ is also naturally pointed, in the sense that it has a distinguished vertex $v_*$. We let $\mathbf{Q}_n^\bullet$ be the set of pointed, rooted plane quadrangulations with $n$ faces.
Figure 1. Illustration of the Cori–Vauquelin–Schaeffer bijection
Theorem 2.1 ([22, 21]). The map $q$ is a plane quadrangulation with $n$ faces, and the mapping $((t, l), \epsilon) \mapsto q$ defined above is a bijection between $\mathbf{T}_n \times \{-1, 1\}$ and $\mathbf{Q}_n^\bullet$. Moreover, if $v \in V(t) = V(q) \setminus \{v_*\}$, then the graph distance in $q$ from $v$ to the distinguished vertex $v_*$ is given by the formula
$$d_q(v, v_*) = l(v) - \min_{u \in V(t)} l(u) + 1. \tag{3}$$
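The construction and the distance formula (3) can be checked on a small example. The following is a minimal sketch, not the authors' implementation: the tree encoding (a list of ordered children per vertex), the function names and the example tree are all assumptions made for illustration, and the rooting and the orientation parameter of the quadrangulation are ignored.

```python
from collections import deque

def contour_corners(children, root=0):
    """Vertices met in contour order, one entry per corner (2n entries for n >= 1 edges)."""
    sequence = []

    def visit(v):
        sequence.append(v)
        for child in children[v]:
            visit(child)
            sequence.append(v)  # back at v after exploring the subtree of child

    visit(root)
    return sequence[:-1]  # the final return to the root repeats the root corner

def cvs_arcs(children, labels, root=0):
    """Arcs of q: each corner is joined to its successor, or to v_* if it has none."""
    corners = contour_corners(children, root)
    m = len(corners)
    v_star = len(labels)  # index chosen here for the extra vertex v_*
    arcs = []
    for i, v in enumerate(corners):
        target = v_star
        for step in range(1, m):  # scan the other corners in cyclic order
            w = corners[(i + step) % m]
            if labels[w] < labels[v]:
                target = w
                break
        arcs.append((v, target))
    return arcs, v_star

def graph_distances(source, arcs, num_vertices):
    """Breadth-first search distances from source in the graph with the given arcs."""
    neighbours = [[] for _ in range(num_vertices)]
    for a, b in arcs:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = [None] * num_vertices
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if dist[w] is None:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical example: root 0 with children 1 and 2, and vertex 1 with child 3.
children = [[1, 2], [3], [], []]
labels = [0, -1, 1, -1]
arcs, v_star = cvs_arcs(children, labels)
dist = graph_distances(v_star, arcs, len(labels) + 1)
predicted = [l - min(labels) + 1 for l in labels]
print(dist[:-1], predicted)  # both lists are [2, 1, 3, 1]
```

The tree has n = 3 edges, so the sketch produces 2n = 6 arcs on n + 2 = 5 vertices, as expected for a quadrangulation with 3 faces.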
Let us draw some immediate consequences of this theorem. For every q ∈ Qn , it holds that Card(V (q)) = n + 2, which is a simple consequence of the Euler
formula. Each choice of a vertex in $q$ yields a distinct element of $\mathbf{Q}_n^\bullet$, so that $\mathrm{Card}(\mathbf{Q}_n^\bullet) = (n+2)\,\mathrm{Card}(\mathbf{Q}_n)$. Together with the previous theorem and the formula we derived earlier for $\mathrm{Card}(\mathbf{T}_n)$, this entails the counting formula (1).

A second consequence is that if $(T_n, L_n)$ is a uniform random element in $\mathbf{T}_n$, then the pointed rooted quadrangulation $(Q_n, v_*)$ associated with it (choosing the orientation of the root uniformly at random) is such that $Q_n$ is uniform over $\mathbf{Q}_n$, and $v_*$ is uniform over $V(Q_n)$ conditionally given $Q_n$. Hence, we see that (3) implies roughly that through the Cori–Vauquelin–Schaeffer bijection, the label function $(L_n(v), v \in V(T_n))$ describes graph distances to a uniformly chosen vertex, in a uniformly chosen random quadrangulation with $n$ faces.

2.2. Scaling limits of well-labeled trees. In order to derive the large $n$ behavior of $Q_n$, we are thus led to understand that of the random labeled tree $(T_n, L_n)$. A convenient way to do this is by describing the latter using the so-called contour and label processes. Let $u_0, u_1, \ldots, u_{2n-1}, u_{2n} = u_0$ be the sequence of vertices of $T_n$ visited in contour order, starting from the root corner (some vertices are visited more than once), so in particular, $u_0$ is the root vertex. Let $C_n(i)$ be the graph distance in $T_n$ from $u_0$ to $u_i$, and let $L_n(i) = L_n(u_i)$, with a slight abuse of notation. Both $C_n, L_n$ are extended to continuous processes from $[0, 2n]$ to $\mathbb{R}$ by linear interpolation between integer times.

Recall that the normalized Brownian excursion $e = (e_t, 0 \le t \le 1)$ is a random continuous process, which is so to speak an excursion of Brownian motion away from 0, conditioned to have total duration 1. For $s, t \in [0, 1]$, let
$$d_e(s, t) = e_s + e_t - 2 \inf_{s \wedge t \le u \le s \vee t} e_u, \qquad 0 \le s, t \le 1.$$
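The same formula, applied to the discrete contour function of a finite tree at integer times, returns the graph distance in the tree between the vertices visited at those times. A minimal sketch follows; the function name and the example tree are assumptions made for illustration.

```python
def tree_pseudo_distance(contour, s, t):
    """d(s, t) = C(s) + C(t) - 2 * min of C over [min(s, t), max(s, t)], at integer times."""
    lo, hi = min(s, t), max(s, t)
    return contour[s] + contour[t] - 2 * min(contour[lo:hi + 1])

# Contour function of a hypothetical tree with 3 edges:
# root 0 with children 1 and 2, and vertex 1 with child 3.
# Visited vertices: 0  1  3  1  0  2  0
contour = [0, 1, 2, 1, 0, 1, 0]

print(tree_pseudo_distance(contour, 2, 5))  # vertex 3 to vertex 2, path 3-1-0-2: prints 3
print(tree_pseudo_distance(contour, 1, 3))  # both times visit vertex 1: prints 0
```

Pairs of times at pseudo-distance 0 are exactly the times at which the contour visits the same vertex, which is why the quotient by this relation recovers the tree.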
This quantity, denoted by $d_e(s, t)$, defines a pseudo-distance on $[0, 1]$, and the quotient metric space $\mathcal{T}_e = [0, 1]/\{d_e = 0\}$ is an important probabilistic object, the Continuum Random Tree [2]. It is a random $\mathbb{R}$-tree, see [28]. Conditionally given $e$, we define a label process $Z = (Z_t, 0 \le t \le 1)$ as a continuous centered Gaussian process satisfying $Z_0 = 0$ and
$$E\big[|Z_s - Z_t|^2 \mid e\big] = d_e(s, t), \qquad 0 \le s, t \le 1.$$
It is easy to see that $Z$ is a class function for the relation $\{d_e = 0\}$, and hence that it induces a random function on $\mathcal{T}_e$, which we still denote by $Z$. Informally, $Z$ should be understood as a Brownian motion indexed by the tree $\mathcal{T}_e$. The pair $(\mathcal{T}_e, Z)$ is the continuum counterpart of the tree $(T_n, L_n)$, which can be formalized in the following statement.

Theorem 2.2. As $n \to \infty$, the following convergence in distribution holds in the space $C([0, 1], \mathbb{R})^2$ (endowed with the uniform norm):
$$\left( \left( \frac{C_n(2nt)}{\sqrt{2n}} \right)_{0 \le t \le 1},\ \left( \left( \frac{9}{8n} \right)^{1/4} L_n(2nt) \right)_{0 \le t \le 1} \right) \longrightarrow (e, Z).$$
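To get a feeling for the rescaling in Theorem 2.2, one can simulate the contour and label processes of a uniform well-labeled tree. The sketch below is an illustration under stated assumptions, not part of the paper: it samples a uniform Dyck path by the cycle lemma, attaches independent uniform increments in {-1, 0, 1} to the edges, and all function names are hypothetical.

```python
import random

def uniform_dyck_path(n, rng):
    """Heights of a uniform Dyck path with 2n steps, sampled via the cycle lemma."""
    steps = [1] * n + [-1] * (n + 1)
    rng.shuffle(steps)
    height, lowest, first_min = 0, 0, 0
    for i, step in enumerate(steps, start=1):
        height += step
        if height < lowest:
            lowest, first_min = height, i  # first time a new minimum is reached
    rotated = steps[first_min:] + steps[:first_min]
    heights = [0]
    for step in rotated[:-1]:  # the last step is the final descent to -1
        heights.append(heights[-1] + step)
    return heights

def label_process(heights, rng):
    """Labels read along the contour: a uniform increment in {-1, 0, 1} per new edge."""
    labels, stack = [0], [0]
    for previous, current in zip(heights, heights[1:]):
        if current > previous:
            stack.append(stack[-1] + rng.choice((-1, 0, 1)))  # step to a new child
        else:
            stack.pop()  # step back to the parent
        labels.append(stack[-1])
    return labels

def rescaled_processes(n, seed=0):
    """Contour and label processes at integer times, raw and rescaled as in Theorem 2.2."""
    rng = random.Random(seed)
    contour = uniform_dyck_path(n, rng)
    labels = label_process(contour, rng)
    contour_scaled = [c / (2 * n) ** 0.5 for c in contour]
    labels_scaled = [(9 / (8 * n)) ** 0.25 * l for l in labels]
    return contour, labels, contour_scaled, labels_scaled
```

Plotting `contour_scaled` and `labels_scaled` against t = i/(2n) for large n gives a picture of the pair (e, Z); the values at index i correspond to time 2nt = i.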
The renormalizations that appear in this statement are relatively transparent: the diffusive rescaling $\sqrt{n}$ of the first component comes from the fact that $C_n$ is very similar to a simple random walk (it is in fact such a walk conditioned to be positive and to be back to the origin at time $2n$), while the $n^{1/4}$ rescaling of the second component comes from the fact that $L_n$ describes a family of centered random walks indexed by the branches of the tree, which have lengths of order $\sqrt{n}$. The exact scaling constants come from applications of the central limit theorem.

2.3. The Brownian map. Starting from the continuum labeled tree $(\mathcal{T}_e, Z)$, one can try to define a continuous analog of the Cori–Vauquelin–Schaeffer bijection. Due to the fact that we want to rescale distances in $Q_n$, the arcs involved in the construction of $Q_n$ from $(T_n, L_n)$ should become smaller and smaller, and in the limit they correspond to certain identifications of points in $\mathcal{T}_e$. Similarly to $d_e$, one can define a pseudo-distance $d_Z$ on $[0, 1]$ by the formula
$$d_Z(s, t) = Z_s + Z_t - 2 \max\Big( \inf_{u \in I(s,t)} Z_u,\ \inf_{u \in I(t,s)} Z_u \Big),$$
where $I(s, t) = [s, t]$ if $s \le t$, and $I(s, t) = [s, 1] \cup [0, t]$ if $t < s$: this is the circular arc from $s$ to $t$ if $[0, 1]$ is seen as a circle by identifying 0 with 1. In turn the quotient $([0, 1]/\{d_Z = 0\}, d_Z)$ is a random real tree. Following [43, 35], the Brownian map is the metric space obtained by quotienting the pseudo-metric space $([0, 1], d_Z)$ with respect to the two equivalence relations $\{d_e = 0\}$ and $\{d_Z = 0\}$. Formally, define a pseudo-metric on $[0, 1]$ by letting
$$D^*(s, t) = \inf\Big\{ \sum_{i=1}^{k} d_Z(s_i, t_i) \ :\ k \ge 1,\ s = s_1,\ t = t_k,\ d_e(t_i, s_{i+1}) = 0,\ 1 \le i \le k-1 \Big\},$$
and let $S = [0, 1]/\{D^* = 0\}$.

Definition 2.3. The Brownian map is the random metric space $(S, D^*)$.

We can now state the main convergence result of [37, 48]. Recall that the Gromov–Hausdorff distance between two compact metric spaces is the infimum Hausdorff distance between isometric embeddings of these two spaces in a common metric space [18]. In order to give a more complete picture, we also include results from [35, 41, 46] on the Hausdorff dimension and topology of the limiting metric space at the end of the following statement.

Theorem 2.4. We have the following convergence in distribution in the Gromov–Hausdorff topology:
$$\Big( V(Q_n),\ \Big( \frac{9}{8n} \Big)^{1/4} d_{Q_n} \Big) \xrightarrow[n \to \infty]{} (S, D^*).$$
The Brownian map is a random geodesic metric space which is almost surely homeomorphic to the 2-dimensional sphere, and has Hausdorff dimension 4.
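The definition of D* as an infimum over chains has a direct discrete analogue: on the integer times of a finite contour and label sequence, take d_Z as edge weights, add zero-weight identifications whenever d_e vanishes, and compute all-pairs shortest paths. The sketch below does this with the Floyd–Warshall algorithm. It is only a toy illustration on a grid, assuming hypothetical function names and a hand-made example; it is not the continuum object and not the method of [37, 48].

```python
def arc_min(values, s, t):
    """Minimum of values over the circular arc I(s, t) of integer times."""
    if s <= t:
        return min(values[s:t + 1])
    return min(min(values[s:]), min(values[:t + 1]))

def d_Z(labels, s, t):
    """Discrete analogue of the pseudo-distance d_Z."""
    return labels[s] + labels[t] - 2 * max(arc_min(labels, s, t), arc_min(labels, t, s))

def d_e(contour, s, t):
    """Discrete analogue of the tree pseudo-distance d_e."""
    lo, hi = min(s, t), max(s, t)
    return contour[s] + contour[t] - 2 * min(contour[lo:hi + 1])

def d_star(contour, labels):
    """Largest pseudo-distance bounded by d_Z that vanishes wherever d_e vanishes."""
    m = len(contour)
    D = [[0 if d_e(contour, i, j) == 0 else d_Z(labels, i, j) for j in range(m)]
         for i in range(m)]
    for k in range(m):  # Floyd-Warshall relaxation over intermediate times k
        for i in range(m):
            for j in range(m):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

# Hypothetical example: contour and labels of a well-labeled tree with 3 edges.
contour = [0, 1, 2, 1, 0, 1, 0]
labels = [0, -1, -1, -1, 0, 1, 0]
D = d_star(contour, labels)
i_star = labels.index(min(labels))  # a time at which the labels are minimal
print([D[i][i_star] for i in range(len(labels))])  # [1, 0, 0, 0, 1, 2, 1]
```

In this discrete setting the distance to a time of minimal label equals the label minus the minimal label, the analogue of the first key property discussed in the text below.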
Let us give some intuition on this result and elements of the proof. The existence of limits of $(V(Q_n), (9/(8n))^{1/4} d_{Q_n})$ along subsequences can be obtained as a consequence of Theorem 2.2 and Gromov’s compactness theorem [18]. Then, from the description of $Q_n$ in terms of $(T_n, L_n)$, it is not too difficult to see that any subsequential limit should be described as a pseudo-metric $D$ on $[0, 1]$, satisfying the key properties, for $s, t \in [0, 1]$:
$$D(s, s_*) = Z_s - \inf Z, \qquad D(s, t) \le d_Z(s, t), \qquad D(s, t) = 0 \ \text{ if } \ d_e(s, t) = 0,$$
where $s_*$ is the point of $[0, 1]$ at which the process $Z$ attains its overall minimum. The first formula is a continuum analog of (3), and the second can be easily obtained from the discrete picture by building explicitly a path with length $d_Z(s, t)$ from $s$ to $t$ in $([0, 1], D)$, by gluing two pieces of geodesic paths from $s, t$ towards $s_*$. Such geodesics are obtained as continuum analogs of the chain from a given corner to its consecutive successors until reaching $v_*$. In the continuum, they correspond to negative records of the process $Z$ from $s$ to $s_*$. Finally, the last constraint is obvious from the discrete picture: it just says that two corners incident to the same vertex of $T_n$ also correspond to a single vertex in $Q_n$.

It can then be checked that $D^*$ is the maximal pseudo-distance on $[0, 1]$ that satisfies the three constraints above. In particular, it always holds that $D \le D^*$, and the uniqueness property of the Brownian map boils down to showing that $D^* \le D$. To this end, one must show that in the metric space $([0, 1], D)$, any given points $s, t \in [0, 1]$ can be joined by a path made of pieces of geodesic paths pointing towards $s_*$, whose total length can be made arbitrarily close to $D(s, t)$. This property can look surprising (it is certainly wrong in Euclidean geometry), but it turns out to be true in our situation. More precisely, if $\gamma$ is a geodesic path from $s$ to $t$ in $([0, 1], D)$, it so happens that for almost every point $u$ on $\gamma$, a geodesic from $u$ to $s_*$ intersects $\gamma$ along a non-trivial segment, so that geodesics in $([0, 1], D)$ tend to stick together (a property related to the coalescence of geodesics studied in [36]). Most of the work in [48] is to show that the “bad” set $\Gamma$ of exceptional points $u$ on $\gamma$ from which a geodesic to $s_*$ does not re-intersect $\gamma$ is small, in the sense that its box-counting dimension is strictly less than 1. See Figure 2.
This is proved essentially by counting arguments based on a bijection developed in [47], which is a generalization of the Cori–Vauquelin–Schaeffer bijection taking into account several distinguished vertices rather than one. This bijection allows one to estimate the probability of certain star-shaped configurations of geodesics, which can be related to the event that a uniformly random point is close to $\Gamma$.

Figure 2. Illustration of a bad point $u \in \Gamma$: the geodesic from $u$ to $s_*$ branches away from the geodesic $\gamma$ from $s$ to $t$ immediately.
3. Boltzmann maps and O(n) models

A natural model of random maps, generalizing the model of uniform quadrangulations considered so far, consists in fixing a family of non-negative local weights and choosing a map with probability proportional to the product of these local weights indexed by the faces of the map. Here we are going to focus only on bipartite maps, where all the faces have even degree, as it is a technically simpler situation.

3.1. Boltzmann maps. We fix a family $w = (w_1, w_2, \ldots)$ of non-negative real numbers, and assume that $w_i > 0$ for some $i > 1$. By convention, fix $w_0 = 1$. Let $\mathbf{M}$ be the set of rooted bipartite plane maps, and $\mathbf{M}_n$ the subset of such maps with $n$ vertices. It is assumed that $\mathbf{M}_1$ contains a single element, the vertex map with one vertex, one face and no edge. For every $m \in \mathbf{M}$, let $F(m)$ be the set of faces of $m$ and $\deg(f)$ be the degree of an element $f \in F(m)$. Then, let
$$W_w(m) = \prod_{f \in F(m)} w_{\deg(f)/2}.$$
This defines a σ-finite, non-negative measure on $\mathbf{M}$, with total mass $Z_w = W_w(\mathbf{M}) \in (0, \infty]$. If it is finite, then we can define a Boltzmann probability distribution by letting $P_w = W_w / Z_w$. For technical reasons, it is useful to require slightly more than the finiteness of $Z_w$, so we say that $w$ is admissible if
$$Z_w^\bullet = W_w(\mathrm{Card}(V(\cdot))) = \sum_{m \in \mathbf{M}} \mathrm{Card}(V(m))\, W_w(m) < \infty.$$
It turns out that some of the key features of Boltzmann measures can be obtained in terms of the function
$$f_w(x) = \sum_{k \ge 0} \binom{2k+1}{k} w_{k+1} x^k, \qquad x \ge 0. \tag{4}$$
For instance, it is shown in [42] that $w$ is admissible if and only if the equation
$$f_w(x) = 1 - \frac{1}{x}$$
admits a solution in $(1, \infty)$. In this case, the smallest such solution $z_w^+$ is equal to $(Z_w^\bullet + 1)/2$. The interesting situation occurs when the graphs of the two functions $f_w$ and $x \mapsto 1 - x^{-1}$ are tangent at $z_w^+$, which is then necessarily the unique solution. One then says that $w$ is critical.
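As a worked instance of this criterion, consider weights supported on quadrangles only, $w_2 = g$ and $w_k = 0$ otherwise, so that $f_w(x) = 3gx$ by (4). The equation $f_w(x) = 1 - 1/x$ becomes $3gx^2 - x + 1 = 0$, which has a real solution if and only if $g \le 1/12$; tangency occurs at $g = 1/12$ with $z_w^+ = 2$. The sketch below checks this numerically; the function names and the dictionary encoding of the weights are assumptions made for illustration.

```python
from math import comb, sqrt

def f_w(weights, x):
    """f_w(x) = sum_k binom(2k+1, k) w_{k+1} x^k, for finitely many weights {k: w_k}."""
    return sum(comb(2 * k - 1, k - 1) * w * x ** (k - 1) for k, w in weights.items())

def smallest_solution_quadrangulations(g):
    """Smallest root of 3 g x^2 - x + 1 = 0, i.e. of f_w(x) = 1 - 1/x when only w_2 = g."""
    discriminant = 1 - 12 * g
    if discriminant < -1e-12:
        return None  # no real solution: the weight is not admissible
    return (1 - sqrt(max(discriminant, 0.0))) / (6 * g)

g_critical = 1 / 12
z = smallest_solution_quadrangulations(g_critical)
print(z)                                        # approximately 2
print(f_w({2: g_critical}, z), 1 - 1 / z)       # both approximately 0.5
print(3 * g_critical, 1 / z ** 2)               # slopes agree: both approximately 0.25
print(smallest_solution_quadrangulations(0.1))  # None, since 0.1 > 1/12
```

For $g < 1/12$ the two graphs cross transversally, so such weights are admissible but not critical.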
3.2. Regular critical maps. Moreover, $w$ is said to be regular critical if it is critical, and if the radius of convergence $R_w$ of $f_w$ satisfies $R_w > z_w^+$. Regular critical random maps behave like uniform quadrangulations in the scaling limit, as the following result by Le Gall shows, generalizing Theorem 2.4.
Theorem 3.1 ([37]). Let $w$ be a regular critical sequence, and let $M_n$ be a random element of $\mathbf{M}_n$ with distribution $P_w(\cdot \mid \mathbf{M}_n)$. Then there is a constant $C_w \in (0, \infty)$ such that $(V(M_n), C_w n^{-1/4} d_{M_n})$ converges as $n \to \infty$ to the Brownian map $(S, D^*)$, in distribution for the Gromov–Hausdorff topology.

Partial results toward this theorem had been obtained in [42], which extends the Chassaing–Schaeffer results of [21], such as the convergence (2). See also [45, 56, 49].

3.3. Maps with large faces. However, an interesting phenomenon occurs in certain situations where $w$ is a critical, non-regular weight. More precisely, we want to look at situations where the second derivative of $f_w$ explodes at $z_w^+$. This happens when, so to speak, the distribution of face degrees under $P_w$ is heavy-tailed. To look for such situations we follow [39] and introduce a base weight sequence $w^\circ = (w_k^\circ, k \ge 1)$ satisfying
$$\lim_{k \to \infty} k^a w_k^\circ = 1, \tag{5}$$
for some positive parameter $a > 3/2$. Set $f_\circ = f_{w^\circ}$, so by (4) and (5), the radius of convergence of $f_\circ$ is $1/4$. The fact that $a > 3/2$ guarantees that $f_\circ(1/4)$ and $f_\circ'(1/4)$ are finite. Set
$$c = \frac{4}{4 f_\circ(1/4) + f_\circ'(1/4)}, \qquad \beta = \frac{f_\circ'(1/4)}{4 f_\circ(1/4) + f_\circ'(1/4)},$$
and consider the weight sequence $w = (w_k, k \ge 1)$ defined by $w_k = c (\beta/4)^{k-1} w_k^\circ$. Then [39, Proposition 2] shows that $w$ is admissible, critical and that $z_w^+ = R_w = \beta^{-1}$. Moreover, these choices for $c, \beta$ are the only ones for which these properties hold. We finally assume that $a$ is less than $5/2$. Under these hypotheses, we consider a random map $M_n$ with distribution $P_w(\cdot \mid \mathbf{M}_n)$. Then it holds that the largest degree of a face of $M_n$ is of order $n^{1/\alpha}$, and the typical graph distances in $M_n$ are of order $n^{1/(2\alpha)}$, where $\alpha = a - 1/2 \in (1, 2)$. We obtain in [39] the following partial scaling limit result.
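Before stating it, the role of the constants $c$ and $\beta$ can be sanity-checked numerically. Since $f_w(x) = c\, f_\circ(\beta x / 4)$, criticality at $z_w^+ = \beta^{-1}$ amounts to the two identities $c f_\circ(1/4) = 1 - \beta$ (the graphs meet) and $c \beta f_\circ'(1/4)/4 = \beta^2$ (the slopes agree). The sketch below evaluates truncated series for the hypothetical base weights $w_k^\circ = k^{-a}$, which satisfy (5); the truncation level and the function names are assumptions, and the truncated sums are only approximations of $f_\circ(1/4)$ and $f_\circ'(1/4)$.

```python
def base_series(a, terms=100_000):
    """Truncated values of f_o(1/4) and f_o'(1/4) for the base weights w_k = k**(-a)."""
    ratio = 1.0  # binom(2k+1, k) / 4**k, starting at k = 0
    f0 = 0.0
    f1 = 0.0
    for k in range(terms):
        weight = (k + 1) ** (-a)       # w_{k+1}
        f0 += ratio * weight           # term of f_o at x = 1/4
        f1 += 4 * k * ratio * weight   # term of f_o' at x = 1/4
        ratio *= (2 * k + 3) / (2 * (k + 2))
    return f0, f1

def critical_constants(f0, f1):
    """The constants c and beta defined from f_o(1/4) and f_o'(1/4)."""
    total = 4 * f0 + f1
    return 4 / total, f1 / total

a = 2.0                        # any value in (3/2, 5/2) is allowed by the text
f0, f1 = base_series(a)
c, beta = critical_constants(f0, f1)
print(c * f0, 1 - beta)              # f_w(1/beta) agrees with 1 - beta
print(c * beta * f1 / 4, beta ** 2)  # f_w'(1/beta) agrees with beta**2
print(a - 0.5)                       # the stable index alpha = a - 1/2
```

The two identities hold for any positive values of the two series, so the check is insensitive to the truncation; only the numerical values of $c$ and $\beta$ depend on it.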
Theorem 3.2. From every subsequence, we can extract a further subsequence along which the following convergence in distribution holds in the Gromov–Hausdorff topology:
$$(V(M_n),\ n^{-1/(2\alpha)} d_{M_n}) \xrightarrow[n \to \infty]{} (S_\alpha, D_\alpha).$$
The limit $(S_\alpha, D_\alpha)$ is a random metric space called the stable map of exponent $\alpha$. Its Hausdorff dimension equals $2\alpha$ almost surely.

Note that the convergence in this statement holds only along appropriate subsequences; it is still an open question to show that the distribution of $(S_\alpha, D_\alpha)$ is uniquely defined. We see that the laws of these spaces are mutually singular when
$\alpha$ varies, because they have different dimensions, and also mutually singular with respect to the law of the Brownian map, which has Hausdorff dimension 4.

The stable maps are described as random quotients, similarly to the Brownian map, but the processes that encode these objects are more elaborate than the Brownian snake $(e, Z)$. Using a bijection by Bouttier, Di Francesco and Guitter [14], one can relate $P_w$-distributed maps with certain models of Galton–Watson trees with two types, and with labeled vertices. Under the hypotheses of Theorem 3.1, these trees still admit the Brownian snake as a scaling limit. But under the hypotheses of Theorem 3.2, the trees converge to the so-called stable trees of Duquesne, Le Gall and Le Jan [38, 27, 28], which are models of random $\mathbb{R}$-trees with branchpoints of infinite degrees. The stable maps are random quotients of these trees.

Many questions remain on the topological nature of the spaces $(S_\alpha, D_\alpha)$. It is expected that these are random fractal carpets, i.e. spheres minus a countable collection of mutually disjoint open subsets. Depending on the value of $\alpha$, it is believed that these “holes” have simple and mutually non-intersecting boundaries a.s., or have self and mutual intersections a.s., the critical value for $\alpha$ being $3/2$. These conjectures come from analogies with the so-called conformal loop ensembles CLE from [52, 53], which are believed to describe the interfaces of conformally invariant statistical physics models on regular lattices. In the next section, we explain how stable maps play a similar role in the situation where the lattices are random rather than regular.

The reason why one believes that topological aspects of CLEs and stable maps should be similar comes from physical motivation, namely, the so-called Knizhnik–Polyakov–Zamolodchikov correspondence [31, 25], which relates conformally invariant models to models in random metrics.
Despite spectacular recent progress [26] towards its mathematical understanding, this correspondence is still quite mysterious, and far from being well understood on the side of random maps.

3.4. The O(n) model on quadrangulations. As an a posteriori justification for the model of maps introduced around (5), let us discuss the rigid O(n) model introduced in [12]. Here, we consider maps made of two building blocks: plain quadrangles and quadrangles traversed by a piece of arc from a side to the opposite side. The rigid O(n) configurations are maps made of these two building blocks with obvious compatibility conditions, namely, that pieces of arcs should connect to form a collection of closed loops. See Figure 3. For given positive parameters $g, h, n$, we assign weights $g, h$ to the two building blocks respectively, and weight $n$ to every loop of the configuration. The total weight of a configuration $c$ is then the product $W_{g,h}^{(n)}(c)$ of weights of its blocks and loops. If the sum of these total weights over all configurations is finite, we can consider a probability measure $P_{g,h}^{(n)}$ by renormalizing $W_{g,h}^{(n)}$. A partial account of a $P_{g,h}^{(n)}$-distributed random map can be given by the exterior gasket, which is obtained by removing the interior of the loops (i.e. the part that does not contain the root-edge) as well as the faces traversed by the loops. It is then an easy exercise to check that this exterior gasket has a Boltzmann law $P_w$, with $w_k = n h^{2k} F_k + g \mathbf{1}_{\{k=2\}}$, where $F_k$ is the sum of
total weights of O(n) configurations with a boundary of size 2k. Such maps are made of the usual building blocks, but the face incident to the root is a polygon of degree 2k, which is not traversed by a loop.
Figure 3. An example of a rigid O(n) configuration, and its exterior gasket
It is shown in [12] that for any fixed $n \in (0, 2]$, the weights $w_k$ are exactly of the form discussed after (5) if and only if the parameters $(g, h)$ belong to a concave critical line, which we assume from now on. If $h$ is smaller than a value $h_c = h_c(n) > 0$, then the hypotheses of Theorem 3.1 hold, and the scaling limit of the exterior gasket is the Brownian map. If $h$ is larger than or equal to $h_c$ then the hypotheses of Theorem 3.2 are in force, with $a = 3/2 + \pi^{-1} \arcsin(n/2) \in (3/2, 2]$ if $h > h_c$, and $a = 5/2 - \pi^{-1} \arcsin(n/2) \in [2, 5/2)$ if $h = h_c$, these situations being called respectively the dense and dilute phases of the O(n) model in physics. Note that for $n = 2$, the dense and dilute phases coincide in a single phase corresponding to $a = 2$.
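The exponents in the two phases are easy to tabulate. The sketch below evaluates the two expressions for $a$ given above, together with the stable index $\alpha = a - 1/2$ and the Hausdorff dimension $2\alpha$ from Theorem 3.2; the function names and the string encoding of the phases are assumptions made for illustration.

```python
from math import asin, pi

def exponent_a(n, phase):
    """Tail exponent a of the gasket weights, for loop weight n in (0, 2]."""
    if not 0 < n <= 2:
        raise ValueError("n must lie in (0, 2]")
    shift = asin(n / 2) / pi
    if phase == "dense":       # h > h_c
        return 1.5 + shift
    if phase == "dilute":      # h = h_c
        return 2.5 - shift
    raise ValueError("phase must be 'dense' or 'dilute'")

def stable_index(a):
    """alpha = a - 1/2, the exponent of the stable map."""
    return a - 0.5

def hausdorff_dimension(a):
    """2 * alpha, the Hausdorff dimension of the stable map."""
    return 2 * stable_index(a)

for n in (0.5, 1.0, 2.0):
    for phase in ("dense", "dilute"):
        a = exponent_a(n, phase)
        print(n, phase, round(a, 4), round(hausdorff_dimension(a), 4))
```

For n = 1 the sketch gives a = 5/3 in the dense phase and a = 7/3 in the dilute phase, and for n = 2 both phases give a = 2.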
4. Conclusion

There are many interesting aspects of the geometry of maps that we have not covered in this short review. One of them is the analogous problem of the scaling limit of maps on surfaces other than the sphere. In the case of the g-torus or of the disk, Chapuy [19] and Bettinelli [8, 10, 9] have set the first milestones in this problem (see also [47]), but the analog of Theorem 2.4 in this context is still open.

Arguably, the most crucial question in the theory of random maps and their scaling limits is to relate these to other approaches of 2-dimensional quantum gravity, and in particular, to discover connections with the approaches based on conformal geometry or the moduli space of curves [50]. The genuinely combinatorial nature of the bijections underlying the study of random maps makes these potential links quite mysterious, but this contributes to the intrinsic beauty of the topic.
References

[1] D. J. Aldous. Tree-based models for random distribution of mass. J. Statist. Phys. 73(3–4) (1993), 625–641.
[2] D. J. Aldous. The continuum random tree. I. Ann. Probab. 19(1) (1991), 1–28.
[3] J. Ambjørn, B. Durhuus, and T. Jonsson. Quantum geometry. A statistical field theory approach. Cambridge Monographs on Mathematical Physics. Cambridge University Press, Cambridge, 1997.
[4] J. Ambjørn and Y. Watabiki. Scaling in quantum gravity. Nuclear Phys. B 445(1) (1995), 129–142.
[5] O. Angel. Growth and percolation on the uniform infinite planar triangulation. Geom. Funct. Anal. 13(5) (2003), 935–974.
[6] O. Angel and O. Schramm. Uniform infinite planar triangulations. Comm. Math. Phys. 241(2–3) (2003), 191–213.
[7] D. Arquès. Les hypercartes planaires sont des arbres très bien étiquetés. Discrete Math. 58(1) (1986), 11–24.
[8] J. Bettinelli. Scaling limits for random quadrangulations of positive genus. Electron. J. Probab. 15 no. 52 (2010), 1594–1644.
[9] J. Bettinelli. Scaling limit of random planar quadrangulations with a boundary. 2011. arXiv:1111.7227.
[10] J. Bettinelli. The topology of scaling limits of positive genus random quadrangulations. Ann. Probab. (2012+). To appear.
[11] G. Borot, J. Bouttier, and E. Guitter. More on the O(n) model on random maps via nested loops: loops with bending energy. 2012.
[12] G. Borot, J. Bouttier, and E. Guitter. A recursive approach to the O(n) model on random maps via nested loops. J. Phys. A: Math. Theor. (45):045002, 2012.
[13] G. Borot and B. Eynard. Enumeration of maps with self-avoiding loops and the O(n) model on random lattices of all topologies. J. Stat. Mech. Theory Exp. (1):P01010, 62, 2011.
[14] J. Bouttier, P. Di Francesco, and E. Guitter. Planar maps as labeled mobiles. Electron. J. Combin. 11:Research Paper 69. (electronic), 2004.
[15] J. Bouttier and E. Guitter. Statistics in geodesics in large quadrangulations. J. Phys. A 41(14):145001, 30, 2008.
[16] J. Bouttier and E. Guitter. The three-point function of planar quadrangulations. J. Stat. Mech. Theory Exp. (7):P07020, 39, 2008.
[17] E. Brézin, C. Itzykson, G. Parisi, and J. B. Zuber. Planar diagrams. Comm. Math. Phys. 59(1) (1978), 35–51.
[18] D. Burago, Y. Burago, and S. Ivanov. A course in metric geometry. Graduate Studies in Mathematics vol. 33. American Mathematical Society, Providence, RI, 2001.
[19] G. Chapuy. The structure of unicellular maps, and a connection between maps of positive genus and planar labelled trees. Probab. Theory Related Fields 147(3–4) (2010), 415–447.
[20] P. Chassaing and B. Durhuus. Local limit of labeled trees and expected volume growth in a random quadrangulation. Ann. Probab. 34(3) (2006), 879–917.
[21] P. Chassaing and G. Schaeffer. Random planar lattices and integrated super-Brownian excursion. Probab. Theory Related Fields 128(2) (2004), 161–212.
[22] R. Cori and B. Vauquelin. Planar maps are well labeled trees. Canad. J. Math. 33(5) (1981), 1023–1042.
[23] N. Curien, J.-F. Le Gall, and G. Miermont. The Brownian cactus I. Scaling limits of discrete cactuses. 2011. arXiv:1102.4177.
[24] P. Di Francesco, P. Ginsparg, and J. Zinn-Justin. 2D gravity and random matrices. Phys. Rep. 254(1–2) (1995), 133.
[25] B. Duplantier. Conformal fractal geometry & boundary quantum gravity. In: Fractal geometry and applications: a jubilee of Benoît Mandelbrot, Part 2, Proc. Sympos. Pure Math. vol. 72, 365–482. Amer. Math. Soc., Providence, RI, 2004.
[26] B. Duplantier and S. Sheffield. Liouville quantum gravity and KPZ. Invent. Math. 185(2) (2011), 333–393.
[27] T. Duquesne and J.-F. Le Gall. Random trees, Lévy processes and spatial branching processes. Astérisque, 281 (2002), vi+147.
[28] T. Duquesne and J.-F. Le Gall. Probabilistic and fractal aspects of Lévy trees. Probab. Theory Related Fields 131(4) (2005), 553–603.
[29] B. Eynard and C. Kristjansen. Exact solution of the O(n) model on a random lattice. Nuclear Phys. B 455(3) (1995), 577–618.
[30] B. Eynard and C. Kristjansen. More on the exact solution of the O(n) model on a random lattice and an investigation of the case |n| > 2. Nuclear Phys. B 466(3) (1996), 463–487.
[31] V. G. Knizhnik, A. M. Polyakov, and A. B. Zamolodchikov. Fractal structure of 2D-quantum gravity. Modern Phys. Lett. A 3(8) (1988), 819–826.
[32] M. Krikun. Local structure of random quadrangulations. 2005. Preprint.
[33] S. K. Lando and A. K. Zvonkin. Graphs on surfaces and their applications. Encyclopaedia of Mathematical Sciences vol. 141. Springer-Verlag, Berlin, 2004.
[34] J.-F. Le Gall. Spatial branching processes, random snakes and partial differential equations. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 1999.
[35] J.-F. Le Gall. The topological structure of scaling limits of large planar maps. Invent. Math. 169(3) (2007), 621–670.
[36] J.-F. Le Gall. Geodesics in large planar maps and in the Brownian map. Acta Math. 205(2) (2010), 287–360.
[37] J.-F. Le Gall. Uniqueness and universality of the Brownian map. 2011. arXiv:1105.4842.
[38] J.-F. Le Gall and Y. Le Jan. Branching processes in Lévy processes: the exploration process. Ann. Probab. 26(1) (1998), 213–252.
[39] J.-F. Le Gall and G. Miermont. Scaling limits of random planar maps with large faces. Ann. Probab. 39(1) (2011), 1–69.
[40] J.-F. Le Gall and G. Miermont. Scaling limits of random trees and planar maps. In: Probability and Statistical Physics in two and more Dimensions, Clay Mathematics Proceedings. American Mathematical Society, 2012. To appear.
[41] J.-F. Le Gall and F. Paulin. Scaling limits of bipartite planar maps are homeomorphic to the 2-sphere. Geom. Funct. Anal. 18(3) (2008), 893–918.
[42] J.-F. Marckert and G. Miermont. Invariance principles for random bipartite planar maps. Ann. Probab. 35(5) (2007), 1642–1705.
[43] J.-F. Marckert and A. Mokkadem. Limit of normalized random quadrangulations: the Brownian map. Ann. Probab. 34(6) (2006), 2144–2202.
[44] L. Ménard. The two uniform infinite quadrangulations of the plane have the same law. Ann. Inst. Henri Poincaré Probab. Stat. 46(1) (2010), 190–208.
[45] G. Miermont. An invariance principle for random planar maps. In: Fourth Colloquium on Mathematics and Computer Sciences CMCS’06, Discrete Math. Theor. Comput. Sci. Proc., AG, 39–58 (electronic). Nancy, 2006.
[46] G. Miermont. On the sphericity of scaling limits of random planar quadrangulations. Electron. Commun. Probab. 13 (2008), 248–257.
[47] G. Miermont. Tessellations of random maps of arbitrary genus. Ann. Sci. Éc. Norm. Supér. (4) 42(5) (2009), 725–781.
[48] G. Miermont. The Brownian map is the scaling limit of uniform random plane quadrangulations. Acta Math. (2012+). To appear. arXiv:1104.1606.
[49] G. Miermont and M. Weill. Radius and profile of random planar maps with faces of arbitrary degrees. Electron. J. Probab. 13 no. 4 (2008), 79–106.
[50] A. Okounkov and R. Pandharipande. Gromov–Witten theory, Hurwitz numbers, and matrix models, I. 2001. arXiv:math.AG/0101147.
[51] G. Schaeffer. Conjugaison d’arbres et cartes combinatoires aléatoires. PhD thesis, Université Bordeaux I, 1998.
[52] S. Sheffield. Exploration trees and conformal loop ensembles. Duke Math. J. 147(1) (2009), 79–129.
[53] S. Sheffield and W. Werner. Conformal loop ensembles: the Markovian characterization and the loop-soup construction. Ann. Math. (2012+). To appear. arXiv:1006.2373, arXiv:1006.2374.
[54] G. ’t Hooft. A planar diagram theory for strong interactions. Nucl. Phys. B 72 (1974), 461–473.
[55] W. T. Tutte. A census of planar maps. Canad. J. Math. 15 (1963), 249–271.
[56] M. Weill. Asymptotics for rooted planar maps and scaling limits of two-type spatial trees. Electron. J. Probab. 12, Paper no. 31, 862–925 (electronic), 2007.
Grégory Miermont, Département de Mathématiques, Université de Paris-Sud 11, Bât. 425, 91405 Orsay Cedex, France
E-mail: [email protected]
Approximate (Abelian) groups Tom Sanders
Abstract. Our aim is to discuss the structure of subsets of Abelian groups which behave ‘a bit like’ cosets (of subgroups). One version of ‘a bit like’ can be arrived at by relaxing the usual characterisation of cosets: a subset S of an Abelian group is a coset if for every three elements x, y, z ∈ S we have x + y − z ∈ S. What happens if this is not true 100% of the time but is true, say, 1% of the time? It turns out that this is a situation which comes up quite a lot, and one possible answer is called Freĭman’s theorem. We shall discuss it and some recent related quantitative advances.

2010 Mathematics Subject Classification. Primary 20K99; Secondary 11P70.

Keywords. Freĭman’s theorem, sumsets.
1. Introduction

The aim of this article is to cover some of the recent developments in the theory of approximate Abelian groups. Our starting point is a common characterisation of cosets of subgroups: suppose that $G$ is an Abelian group and $A \subset G$ is a coset of a subgroup of $G$ – we call this a coset in $G$. A simple characterisation of $A$ being a coset in $G$ is that
(i) $A \neq \emptyset$; and
(ii) $x, y, z \in A \Rightarrow x + y - z \in A$.
The theory of approximate groups is concerned with relaxing these conditions. Relaxing the first does not deliver particularly exciting results; relaxing the second, however, turns out to be very fruitful.

Our relaxations will be statistical in nature, and so we shall think of $G$ as being discrete and endowed with Haar counting measure. It follows that we shall be interested in finite sets $A$. We write
$$E(A) := \sum_{x, y, z \in G} 1_A(x) 1_A(y) 1_A(z) 1_A(x + y - z),$$
a quantity which is called the additive energy of $A$. We see that $A$ is a coset if and only if it is non-empty and $E(A) = |A|^3$.

Our first question is what happens if condition (ii) is true only a proportion $1 - \delta$ of the time. In particular, what do sets $A$ look like for which
$$E(A) \ge (1 - \delta)|A|^3, \tag{1.1}$$
where $\delta$ is to be thought of as a small constant, say $\delta \le 1/10$, and $|A|$ is to be thought of as tending to infinity?
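For small sets the additive energy can be computed directly from the definition. The sketch below works in a cyclic group $\mathbb{Z}/N\mathbb{Z}$ and compares a coset with a set that is not a coset; the choice of group, the function name and the example sets are assumptions made for illustration.

```python
from itertools import product

def additive_energy_mod(A, modulus):
    """Number of triples (x, y, z) in A^3 with x + y - z in A, computed in Z/modulus."""
    A = {a % modulus for a in A}
    return sum((x + y - z) % modulus in A for x, y, z in product(A, repeat=3))

coset = {1, 4, 7, 10}     # a coset of the subgroup {0, 3, 6, 9} of Z/12Z
interval = {0, 1, 2, 3}   # not a coset in Z/12Z

print(additive_energy_mod(coset, 12), len(coset) ** 3)        # 64 64
print(additive_energy_mod(interval, 12), len(interval) ** 3)  # 44 64
```

The coset attains the maximal value $|A|^3$, while the interval of the same size falls short of it.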
It is instructive to begin with some examples. The natural way to create sets with this property is based around cosets in $G$. Indeed, suppose that $H$ is a coset in $G$ and $A$ is any set satisfying
$$|A \cap H| \ge (1 - \epsilon)|A| \quad \text{and} \quad |A \cap H| \ge (1 - \eta)|H|.$$
In words this says that a proportion $1 - \epsilon$ of $A$ is contained in the coset $H$, and a proportion $1 - \eta$ of $H$ is in the intersection of $A$ and $H$. Then, after a short calculation, we find that $E(A) \ge (1 - O(\epsilon + \eta))|A|^3$.

It turns out that sets constructed in the above way are essentially the only sets with large additive energy in the sense of (1.1). The following result is classical and has been considered in far more generality than the statement here suggests. A proof can be read out of Fournier [4], but it seems likely that it was known before then.

Proposition 1.1. Suppose that $G$ is an Abelian group and $A \subset G$ is finite with $E(A) \ge (1 - \delta)|A|^3$. Then there is some coset $H$ in $G$ such that
$$|A \cap H| \ge (1 - O(\delta^{1/2}))|A| \quad \text{and} \quad |A \cap H| \ge (1 - O(\delta^{1/2}))|H|.$$

The main strength of this result is that it is a rough equivalence: every set satisfying the conclusion also satisfies the hypothesis with $\delta$ replaced by $O(\delta^{1/2})$ so that up to powers the hypothesis and conclusion are equivalent. The main weakness of the result is that there may be very few sets satisfying the hypothesis. For example, suppose that $G = \mathbb{Z}/p\mathbb{Z}$ where $p$ is a prime. Then $G$ has no non-trivial subgroups, so if $A$ is of ‘intermediate’ size then a short argument from Proposition 1.1 tells us we must have $E(A) \le (1 - \Omega(1))|A|^3$; equivalently, no set of ‘intermediate’ size satisfies the hypothesis.

The weakness highlighted in the above discussion can be rectified by a further relaxation of condition (ii), and this is the main concern of the paper. We ask what happens if condition (ii) is true only a proportion $\delta$ of the time. In particular, what do sets $A$ look like for which
$$E(A) \ge \delta|A|^3, \tag{1.2}$$
where this time $\delta$ is to be thought of as tending to 0 (if at all) much more slowly than $|A|$ tends to infinity?
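Returning to the first construction above (a set mostly contained in, and mostly filling, a coset $H$), the ‘short calculation’ can be illustrated on a concrete case. If $B = A \cap H$, then for each pair $(x, y) \in B^2$ and each missing point $h \in H \setminus B$ there is exactly one $z$ with $x + y - z = h$, so $E(A) \ge E(B) \ge |B|^2 (|B| - |H \setminus B|)$. The sketch below checks this bound in $\mathbb{Z}/60\mathbb{Z}$; the group, the sets and the function name are assumptions made for illustration.

```python
from itertools import product

def additive_energy_mod(A, modulus):
    """Number of triples (x, y, z) in A^3 with x + y - z in A, computed in Z/modulus."""
    A = {a % modulus for a in A}
    return sum((x + y - z) % modulus in A for x, y, z in product(A, repeat=3))

modulus = 60
H = set(range(0, modulus, 3))   # a subgroup of Z/60Z with 20 elements
A = (H - {0, 3}) | {1}          # drop two points of H and add one point outside H

B = A & H
lower_bound = len(B) ** 2 * (len(B) - len(H - B))
energy = additive_energy_mod(A, modulus)
print(energy >= lower_bound)    # True
print(energy / len(A) ** 3)     # a ratio close to 1
```

Here 18 of the 19 points of $A$ lie in $H$ and they fill 18 of the 20 points of $H$, so both proportions in the construction are close to 1.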
Once again we start by trying to construct examples of such sets. As before we can use cosets to generate a large class of sets with large additive energy (in the sense of (1.2) this time), but what is more interesting is that a genuinely new sort of structure emerges, that of arithmetic progressions. If $P$ is an arithmetic progression a short calculation shows that $E(P) \sim \frac{2}{3}|P|^3$, but it turns out that this is just one example from a much wider class.
Definition 1.2 (Convex coset progressions). A convex progression in $G$ is a set of the form $\phi(Q \cap \mathbb{Z}^d)$ where $Q$ is a symmetric convex body about the origin in $\mathbb{R}^d$, and $\phi : \mathbb{Z}^d \to G$ is a homomorphism. A convex coset progression in $G$ is then a set $H + P$ where $P$ is a convex progression and $H$ is a coset in $G$. In both cases we say that the progression is $d$-dimensional.
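The asymptotic $E(P) \sim \frac{2}{3}|P|^3$ can be checked numerically. The following is a minimal sketch, assuming that $E(A)$ denotes the usual additive energy, that is, the number of quadruples $(a,b,c,d) \in A^4$ with $a+b=c+d$ (the definition (1.1) lies outside this excerpt). The helper names `additive_energy` and `progression` are illustrative only.

```python
from collections import Counter

def additive_energy(A):
    """Number of quadruples (a, b, c, d) in A^4 with a + b = c + d."""
    # reps[s] = number of ordered pairs (a, b) in A^2 with a + b = s
    reps = Counter(a + b for a in A for b in A)
    return sum(r * r for r in reps.values())

def progression(start, step, length):
    """Arithmetic progression in the integers."""
    return [start + i * step for i in range(length)]

for n in (10, 50, 200):
    P = progression(3, 7, n)
    E = additive_energy(P)
    # Exact count for a progression of length n in Z: (2n^3 + n) / 3.
    assert E == (2 * n**3 + n) // 3
    print(n, E, E / n**3)   # the last column approaches 2/3
```

The exact formula $(2n^3+n)/3$ follows by summing the squares of the representation counts $n - |s - (n-1)|$ over $0 \le s \le 2n-2$.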
Approximate groups
It is worth making two small remarks here. First, dimension is monotonic so that if a convex coset progression $M$ is $d$-dimensional then it is also $d'$-dimensional for all $d' \ge d$; secondly, in this article we are only interested in dimension up to a constant multiple. Crucially, convex coset progressions inherit growth properties from the convex body used in their definition. It is not hard to show that if $M$ is a $d$-dimensional convex coset progression then $|M + M| \le \exp(O(d))|M|$.
We now return to constructing sets with large additive energy. Suppose that $M$ is a $d$-dimensional convex coset progression and $A$ is any set satisfying $|A \cap M| \ge \eta|A|$ and $|A \cap M| \ge \epsilon|M|$. Then a short calculation using the fact that $|M + M| \le \exp(O(d))|M|$ tells us that $E(A) \ge \eta^3\epsilon\exp(-O(d))|A|^3$.
Again it turns out that sets constructed in the above way are essentially the only sets having large additive energy. The following result captures this fact and is a combination of the Balog–Szemerédi Lemma [1] and the Green–Ruzsa Theorem [7]. It should be remarked that the Green–Ruzsa Theorem is also called Freĭman’s theorem for Abelian groups and extends Freĭman’s theorem [5] from $\mathbb{Z}$ to general Abelian groups.
Theorem 1.3. Suppose that $G$ is an Abelian group and $A \subset G$ is finite with $E(A) \ge \delta|A|^3$. Then there is a convex coset progression $M$ such that $|A \cap M| \ge \eta(\delta)|A|$, $|A \cap M| \ge \epsilon(\delta)|M|$ and $\dim M \le d(\delta)$, for some (increasing) functions $\eta$, $\epsilon$ and $d$.
While the result is appealing in its own right, since the breakthrough work of Gowers [6] it has become a central result in additive combinatorics and allied areas as a result of numerous applications. This wealth of applications provides strong empirical evidence for the utility of the theorem, but there are also some rather compelling theoretical reasons why it should be so useful. We turn to some of these now.
(i) The hypothesis of the theorem is robust under small perturbations.
This is particularly useful because while the input is flexible, the output, a convex coset progression, is rather rigid.
(ii) The hypothesis of the theorem is easily satisfied. From a theoretical perspective this is because convex coset progressions are ubiquitous in contrast to subgroups (in some groups).
(iii) A convex coset progression supports a lot of structure. While it is not a coset, it behaves enough like a coset that it can support many commonly used analytic arguments, and in particular a sort of approximate harmonic analysis. This means that many results for groups can also be established for convex coset progressions.
(iv) The result is a rough equivalence: any set satisfying the conclusion of the theorem satisfies the hypothesis with $\delta$ replaced by $\epsilon(\delta)\eta(\delta)^3\exp(-O(d(\delta)))$.
The quality of the rough equivalence serves as a measure of the strength of Theorem 1.3 and conjecturally this equivalence is polynomial. Our interest in this paper is in the quality of this equivalence, that is, in the strength of the bounds on $\eta(\delta)$, $\epsilon(\delta)$ and $d(\delta)$: the stronger they are, the stronger the results in applications. Conjecturally we can take
$\log \eta(\delta)^{-1}, \log \epsilon(\delta)^{-1}, d(\delta) = O(\log \delta^{-1})$
in Theorem 1.3. This is called the Polynomial Freĭman–Ruzsa conjecture, and if true means that any set satisfying the output of Theorem 1.3 automatically satisfies the hypothesis with $\delta$ replaced by $\delta^{O(1)}$, so that the rough equivalence would be rather strong.
Combining Gowers’ refinement of the Balog–Szemerédi Lemma [6] with the Green–Ruzsa Theorem [7] gives
$\log \eta(\delta)^{-1}, \log \epsilon(\delta)^{-1}, d(\delta) = O(\delta^{-O(1)})$.
Green and Ruzsa actually gave an explicit value for the constant implied in the $O(1)$ term and there was some work improving this constant before an important breakthrough by Schoen [16] who showed that it is smaller than any power. Specifically he proved that one may take
$\log \eta(\delta)^{-1}, \log \epsilon(\delta)^{-1}, d(\delta) = O(\exp(O(\sqrt{\log \delta^{-1}})))$. (1.3)
Following on from this we were recently able to show in [14] that
$\log \eta(\delta)^{-1}, \log \epsilon(\delta)^{-1}, d(\delta) = O(\log^{O(1)} \delta^{-1})$. (1.4)
The results we have chosen to mention above are far from a complete history of work on Theorem 1.3. Indeed, as a centrepiece of additive combinatorics it has been investigated from many different angles, but we do not have the space to discuss these here. The interested reader may wish to consult the notes [13] of Ruzsa.
2. De-coupling the argument
The arguments to prove Theorem 1.3 separate naturally into two parts: one more combinatorial, and one more algebraic. The quality of the bounds is almost entirely dependent on the combinatorial part of the argument and that is where most of the recent progress has been made, so we shall now briefly explain how to de-couple the two parts so that we can then focus on the combinatorial one. The algebraic part of the argument essentially shows that being a convex coset progression is equivalent to satisfying a relative polynomial growth condition. The latter condition is easier to satisfy combinatorially, so that proving Theorem 1.3
comes down to finding a set with relative polynomial growth rather than a convex coset progression. To be more concrete we start with an observation about convex sets: if $Q$ is a convex set in $\mathbb{R}^d$ then $\mu(nQ) \le n^d\mu(Q)$ for all $n \ge 1$, and this property is inherited by $d$-dimensional convex coset progressions. We say that a set $X$ has relative polynomial growth of order $d$ if
$|nX| \le n^d|X|$ for all $n \ge 1$,
where $nX := X + \dots + X$ and the sum is $n$-fold¹. It turns out that if $M$ is a $d$-dimensional coset progression then $M$ has relative polynomial growth of order $O(d)$, and that having relative polynomial growth is essentially characteristic for convex coset progressions. To this end we have the following theorem.
Theorem 2.1. Suppose that $G$ is an Abelian group and $X \subset G$ has relative polynomial growth of order $d$. Then there is a (centred) convex coset progression $M$ in $G$ such that $X - X \subset M$, $|M| \le \exp(O(d \log d))|X|$ and $\dim M = O(d \log d)$.
We shall sketch the proof of this theorem in §8, but it is not the focus of the paper and is more or less a rearrangement of the ideas in Green and Ruzsa [7]. By considering the $N \times \dots \times N$ cube in $\mathbb{Z}^d$ we see that the result is tight up to the logarithmic factors and we think of it as providing an equivalence between $d$-dimensional convex coset progressions and sets with relative polynomial growth of order $d$. Indeed, instead of proving Theorem 1.3 we shall prove the following.
Theorem 2.2. Suppose that $G$ is an Abelian group and $A \subset G$ is such that $E(A) \ge \delta|A|^3$. Then there is a set $Y$ which is a translate of $X - X$ such that
$|A \cap Y| \ge \eta'(\delta)|A|$, $|A \cap Y| \ge \epsilon'(\delta)|Y|$ and $|nX| \le n^{d'(\delta)}|X|$ for all $n \ge 1$
for some (increasing) functions $\eta'$, $\epsilon'$ and $d'$.
We should like to combine Theorems 2.1 and 2.2 to get Theorem 1.3. We cannot do this directly but it turns out that by delving a little into the proofs of each one can combine them to do so, and in particular the ways we establish Theorem 2.2 for given functions $\eta'$, $\epsilon'$ and $d'$ lead to arguments for establishing Theorem 1.3 with $\eta \approx \eta'$, $\epsilon \approx \epsilon'$ and $d \approx d'$.
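The notion of relative polynomial growth of order $d$ used in this section can be illustrated on the cube mentioned after Theorem 2.1. The sketch below assumes $X = \{0,\dots,N-1\}^d \subset \mathbb{Z}^d$ with small hypothetical values of $N$ and $d$; the helper names are illustrative and not part of the paper.

```python
from itertools import product

def sumset(X, Y):
    """X + Y for sets of integer tuples, with coordinatewise addition."""
    return {tuple(a + b for a, b in zip(x, y)) for x in X for y in Y}

def iterated_sumset(X, n):
    """The n-fold sumset nX = X + ... + X."""
    S = set(X)
    for _ in range(n - 1):
        S = sumset(S, X)
    return S

d, N = 2, 4
X = set(product(range(N), repeat=d))    # the N x N cube in Z^2

for n in range(1, 6):
    size = len(iterated_sumset(X, n))
    # nX is the cube {0, ..., n(N-1)}^d
    assert size == (n * (N - 1) + 1) ** d
    # relative polynomial growth of order d
    assert size <= n**d * len(X)
    print(n, size, n**d * len(X))
```

Here $|nX| = (n(N-1)+1)^d$, which is close to $n^d|X|$ when $N$ is large, consistent with the cube being an extremal example.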
¹ In particular $nX := \{x_1 + \dots + x_n : x_1, \dots, x_n \in X\}$.

3. Overview of the combinatorial obstacles
Our goal now is to prove Theorem 2.2, and in light of the equivalence mentioned in the previous section we shall think of cosets, convex coset progressions and sets with relative polynomial growth as being the same thing for the purpose of constructing examples.
One of the reasons that proving Theorem 2.2 (and hence Theorem 1.3) is hard (and also one reason it is so powerful) is that there are qualitatively three different sorts of structure having large additive energy. We got a sense of roughly what these are earlier but it is helpful now to record them a little more precisely.
(i) (Random sets) Suppose that $H$ is a coset in $G$ and $A \subset H$ is chosen by including each $h \in H$ independently with probability $\delta$. Then (with high probability) $E(A) \approx \delta|A|^3$.
(ii) (Independent copies of the same coset) Suppose $H$ is a coset in $G$ and $A$ is a union of $k \sim \delta^{-1}$ independent cosets in $G/H$. To be clear this means that $A = \bigcup_{i=1}^{k}(x_i + H)$ where $\{x_i + H\}_{i=1}^{k}$ is a set of $k$ elements of $G/H$ such that
$n \in \mathbb{Z}^k$ and $n_1x_1 + \dots + n_kx_k \in H \Rightarrow n_ix_i \in H$ for all $i \in \{1, \dots, k\}$.
Then $E(A) \approx \delta|A|^3$.
(iii) (Independent copies of different cosets) Suppose that $k \sim \delta^{-1/2}$ and $A$ is the union of $H_1, \dots, H_k$, which are ‘internally independent cosets’ all of the same size; this is to be taken to mean that $|H_1 + \dots + H_k| = |H_1| \cdots |H_k|$, and intuitively means that there are no non-trivial relations between elements in the $H_i$s. Then $E(A) \approx \delta|A|^3$.
It is a little easier to unify the first two classes of example with each other than the third with either of the first two. This is because in the first two classes there is an obvious choice of coset: $H$. (In fact the obvious choice is really of subgroup, but this turns out not to be an important distinction here.) On the other hand in the third class any of the cosets (or, rather, corresponding subgroups of) $H_1, \dots, H_k$ are reasonable choices and there is no particular reason to pick one over the other.
A unifying aspect of the first two classes above is that the sets $A$ given have small sumset. In particular, in both classes we have that $|A + A| = \Theta(\delta^{-1}|A|)$, which we think of as saying that $A$ has ‘small doubling’; in the third class $|A + A| = \Omega(|A|^2)$ so that it is almost as large as can be.
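The contrast between classes (ii) and (iii) can be seen in a toy computation. The sketch below is only an illustration under assumptions not made in the paper: the ambient group is $(\mathbb{Z}/2\mathbb{Z})^n$ with elements stored as bitmasks (so XOR is the group operation), $k = 3$, and the subgroups have 16 elements each. With such small parameters the implied constants matter, so only the qualitative difference in doubling should be read off.

```python
from collections import Counter

def energy(A):
    """Additive energy in (Z/2Z)^n: elements are bitmasks, XOR is the group law."""
    reps = Counter(a ^ b for a in A for b in A)
    return sum(r * r for r in reps.values())

def sumset(A):
    """A + A in (Z/2Z)^n."""
    return {a ^ b for a in A for b in A}

# Class (ii): three independent cosets x_i + H of one subgroup H (bits 0-3).
H = set(range(16))
A2 = {(1 << j) ^ h for j in (4, 5, 6) for h in H}

# Class (iii): three subgroups on disjoint blocks of bits, so that
# |H_1 + H_2 + H_3| = |H_1| |H_2| |H_3|.
blocks = [{h << (4 * i) for h in range(16)} for i in range(3)]
A3 = set().union(*blocks)

for name, A in (("class (ii)", A2), ("class (iii)", A3)):
    # columns: |A|, E(A) / |A|^3, |A + A| / |A|
    print(name, len(A), energy(A) / len(A) ** 3, len(sumset(A)) / len(A))
```

In this example both sets have additive energy a fixed proportion of $|A|^3$, but the class (ii) set has doubling $4/3$ while the class (iii) set has $|A+A| = 721$ against $|A|^2 = 2116$.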
The first step in proving Theorem 2.2 is then in converting sets from the third class into the first two, and this is the purpose of the Balog–Szemerédi–Gowers Lemma. Qualitatively this was proved by Balog and Szemerédi in [1], but it was a very important step when Gowers established polynomial bounds in [6].
Theorem 3.1 (Balog–Szemerédi–Gowers Lemma). Suppose that $A \subset G$ has $E(A) \ge \delta|A|^3$. Then there is a set $A' \subset A$ such that $|A'| \ge \delta^{O(1)}|A|$ and $|A' + A'| \le \delta^{-O(1)}|A'|$.
This result has been studied extensively elsewhere and will not be our focus here. The interested reader might like to consult the book [17] of Tao and Vu. It
is worth saying that the proof is elementary, albeit rather clever, and is set around the idea of examining $A \cap (x + A)$ for suitable randomly chosen $x$. In the third class of examples considered above this has the effect of selecting one of the cosets (at random).
From now on we shall be interested in the case of so-called ‘small doubling’ mentioned earlier, meaning the case when $|A + A| \le K|A|$, and shall prove the following.
Theorem 3.2. Suppose that $G$ is an Abelian group and $A \subset G$ is such that $|A + A| \le K|A|$. Then there is a set $Y$ which is a translate of $X - X$ such that
$|A \cap Y| \ge \eta''(K)|A|$, $|A \cap Y| \ge \epsilon''(K)|Y|$ and $|nX| \le n^{d''(K)}|X|$ for all $n \ge 1$
for some (decreasing) functions $\eta''$, $\epsilon''$ and $d''$.
This yields Theorem 2.2 on combination with the Balog–Szemerédi–Gowers Lemma with $\eta'(\delta) = \delta^{O(1)}\eta''(\delta^{-O(1)})$, $\epsilon'(\delta) = \delta^{O(1)}\epsilon''(\delta^{-O(1)})$ and $d'(\delta) = d''(\delta^{-O(1)})$. In actual fact for the best bounds one also goes into the details of the proof of the Balog–Szemerédi–Gowers Lemma, but for this overview that improvement will not concern us.
The focus of the paper now is on proving Theorem 3.2 and we split into three sections. In §5 we establish Theorem 3.2 with bounds corresponding roughly to the original work of Green and Ruzsa; in §6 we develop Schoen’s improvement of this; and, finally, in §7 we develop the improvement leading to the bounds in (1.4).
4. Sumset estimates and polynomial growth
In this brief section we record a couple of useful results which will help us to establish relative polynomial growth on all scales from a growth condition on just one scale. In particular we have the following result of Chang [2].
Lemma 4.1 (Variant of Chang’s covering lemma). Suppose that $G$ is an Abelian group and $X \subset G$ is symmetric with $|(3k + 1)X| < 2^k|X|$ for some $k$. Then $|nX| \le n^k|X|$ for all $n \ge 1$.
To establish the hypothesis of this lemma it will also be useful to have Plünnecke’s inequality.
Theorem 4.2 (Plünnecke’s inequality). Suppose that $|A + A| \le K|A|$. Then $|nA| \le K^n|A|$ for all $n \ge 1$.
This result was proved by Plünnecke in [11], and his proof was rediscovered and popularised somewhat later by Ruzsa. Very recently, however, Petridis [10] found a new proof which is very direct and well worth reading.
Before closing this short section it is worth saying that the above two results are part of a rich family of sumset estimates we do not have time to touch on here, but we direct the interested reader towards Tao and Vu [17] for more details.
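Plünnecke’s inequality is easy to test on small examples. The following sketch takes an arbitrary, purely illustrative finite set $A \subset \mathbb{Z}$, sets $K := |A+A|/|A|$, and checks $|nA| \le K^n|A|$ for the first few $n$; the function names are hypothetical.

```python
def sumset(X, Y):
    """X + Y for finite sets of integers."""
    return {x + y for x in X for y in Y}

def n_fold(A, n):
    """The n-fold sumset nA = A + ... + A."""
    S = set(A)
    for _ in range(n - 1):
        S = sumset(S, A)
    return S

A = {0, 1, 2, 3, 10, 11, 12, 13, 50}     # arbitrary illustrative set in Z
K = len(sumset(A, A)) / len(A)           # doubling constant of A

for n in range(1, 6):
    size = len(n_fold(A, n))
    assert size <= K**n * len(A)         # Pluennecke's inequality
    print(n, size, K**n * len(A))
```

For structured sets such as this one the true growth of $|nA|$ is far below $K^n|A|$, which is one reason why Lemma 4.1 is useful for upgrading a bound at a single scale to polynomial growth at all scales.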
5. The basic argument
We now turn our attention to proving Theorem 3.2 with bounds of the quality arrived at by Green and Ruzsa in [7]. We are thus considering a set $A$ with $|A + A| \le K|A|$ and in light of our earlier discussions we can restrict ourselves to sets coming from the first two classes of structure in §3.
Even with the work we have done the two classes of possible structure behave differently: in the first class when $A$ is chosen randomly from $H$, $A$ will typically have a lot of gaps so that $1_A$ is not very smooth. One way of smoothing a function is by averaging or convolving and to this end we make a definition. Given $f, g \in \ell^1(G)$ we define the convolution of $f$ and $g$ to be the function
$f * g(x) = \sum_{y+z=x} f(y)g(z)$ for all $x \in G$.
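The convolution just defined can be computed directly in a small cyclic group. The sketch below assumes $G = \mathbb{Z}/N\mathbb{Z}$ with a hypothetical $N$ and set $A$, and checks two standard facts: the squared $\ell^2$-norm of $1_A * 1_A$ is the additive energy of $A$, and by Cauchy–Schwarz this is at least $|A|^4/|A+A|$.

```python
def convolve(f, g, N):
    """(f * g)(x) = sum over y + z = x of f(y) g(z), for f, g indexed by Z/NZ."""
    return [sum(f[y] * g[(x - y) % N] for y in range(N)) for x in range(N)]

def indicator(A, N):
    """The indicator function 1_A as a list of length N."""
    return [1 if x in A else 0 for x in range(N)]

N = 31
A = {1, 2, 4, 8, 16}                      # illustrative set in Z/31Z
f = indicator(A, N)
conv = convolve(f, f, N)

# 1_A * 1_A(x) counts ordered pairs (a, b) in A^2 with a + b = x.
assert conv[3] == 2                       # 3 = 1 + 2 = 2 + 1

# The squared l^2-norm of 1_A * 1_A is the additive energy of A.
energy = sum(c * c for c in conv)
brute = sum(1 for a in A for b in A for c in A for d in A
            if (a + b - c - d) % N == 0)
assert energy == brute

# Cauchy-Schwarz: E(A) >= |A|^4 / |A + A|, where A + A is the support of conv.
support = sum(1 for c in conv if c > 0)
assert energy * support >= len(A) ** 4
```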
To relate this to sets we define some level sets called symmetry sets. Given a set $A$ the symmetry set at threshold $\eta$ is defined to be the set
$\mathrm{Sym}_\eta(A) := \{x \in G : 1_A * 1_{-A}(x) \ge \eta|A|\}$.
Note that $1_A * 1_{-A}(x) \le |A|$ so that $\mathrm{Sym}_\eta(A)$ is the set of points where $1_A * 1_{-A}$ is at least a proportion $\eta$ of its maximum.
Heuristically we expect $1_A * 1_A$ to be pretty smooth: on a qualitative level $1_A$ is an element of $L^2$ and the convolution of two $L^2$ functions is continuous. Concretely then we expect to have a quantitative version of the notion of uniform continuity, meaning there should be a (large) set $X$ such that on translation by elements of $X$, the convolution does not vary very much. To formulate this precisely we define translation: given $f \in \ell^2(G)$ we write
$\tau_x(f)(y) = f(y + x)$ for all $x, y \in G$.
It will also be helpful to write $\mu_A$ for the function $1_A/|A|$ so that $f * \mu_A(x)$ is the average value of $f$ over the set $x - A$.
The heuristic above can be made precise in a number of ways but one very powerful approach is due to Croot and Sisask [3] who proved the following lemma.
Lemma 5.1 (Croot–Sisask Lemma). Suppose that $G$ is an Abelian group, $f \in \ell^2(G)$ and $|A + A| \le K|A|$. Then there is a set $X$ with $|X| \ge (2K)^{-O(\epsilon^{-2})}|A|$ such that
$\|\tau_x(f * \mu_A) - f * \mu_A\|_{\ell^2(G)} \le \epsilon\|f\|_{\ell^2(G)}$ for all $x \in X$.
Sketch proof. The basic idea is to start with the equality $f * \mu_A = \mathbb{E}_{a \in A}\tau_{-a}(f)$. Given this we can randomly sample from $A$, say $k$ times, and get a good approximation to $f * \mu_A$:
$f * \mu_A \approx \frac{1}{k}\sum_{i=1}^{k}\tau_{-z_i}(f)$
for $z_1, \dots, z_k$ chosen uniformly at random from $A$. Here ‘$\approx$’ means approximately equal in $\ell^2$-norm, and the larger the value of $k$, the better the approximation.
We now examine the set $L$ of vectors $(z_i)_{i=1}^{k}$ such that the approximation is good. By averaging we prove that $L - L$ has a large intersection, call it $X$, with the diagonal set $\{(a, \dots, a) : a \in A\}$. On the other hand, if $x \in X$ then it follows that there is some $z \in L$ such that
$f * \mu_A \approx \frac{1}{k}\sum_{i=1}^{k}\tau_{-z_i-x}(f)$ and $f * \mu_A \approx \frac{1}{k}\sum_{i=1}^{k}\tau_{-z_i}(f)$
and hence $\tau_x(f * \mu_A) \approx f * \mu_A$ for all $x \in X$. Working through the details of this sketch gives the proof.
Given the above result we shall now prove the following, which is the version of Theorem 3.2 corresponding to the bounds of Green and Ruzsa although the argument is somewhat different.
Theorem 5.2. Suppose that $G$ is an Abelian group and $A \subset G$ is such that $|A + A| \le K|A|$. Then there is a set $Y$ which is a translate of $X - X$ such that
$|A \cap Y| \ge \exp(-K^{1+o(1)})|A|$, $|A \cap Y| = \Omega(|Y|/K)$ and $|nX| \le n^{K^{1+o(1)}}|X|$ for all $n \ge 1$.
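The random sampling step in the sketch proof of the Croot–Sisask lemma (Lemma 5.1) can be simulated. The sketch below is an illustration only: it assumes $G = \mathbb{Z}/N\mathbb{Z}$ with a hypothetical set $A$ and test function $f$, and compares $f * \mu_A = \mathbb{E}_{a \in A}\tau_{-a}(f)$ with the empirical average over $k$ random samples from $A$. A variance computation suggests a relative $\ell^2$-error of order $k^{-1/2}$.

```python
import random

def translate_avg(f, shifts, N):
    """x -> average over z in shifts of f(x - z), i.e. the mean of the translates tau_{-z}(f)."""
    return [sum(f[(x - z) % N] for z in shifts) / len(shifts) for x in range(N)]

def l2_norm(h):
    """The l^2-norm of a function given as a list of values."""
    return sum(v * v for v in h) ** 0.5

random.seed(0)
N = 101
A = list(range(20))                                   # hypothetical set A in Z/NZ
f = [1.0 if x % 3 == 0 else 0.0 for x in range(N)]    # hypothetical test function

exact = translate_avg(f, A, N)                        # this is f * mu_A

for k in (10, 100, 1000):
    sample = [random.choice(A) for _ in range(k)]     # z_1, ..., z_k uniform from A
    approx = translate_avg(f, sample, N)
    err = l2_norm([a - b for a, b in zip(approx, exact)])
    print(k, err / l2_norm(f))                        # relative error, shrinking with k
```

The lemma itself needs more than this: one must also show that many translates by elements of $A$ keep the sample vector inside the set $L$ of good samples, which is where the small doubling hypothesis enters.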
Sketch proof. We let $k$ be a natural number to be optimised later and apply the Croot–Sisask lemma with $\epsilon = 1/(2k\sqrt{K})$ to get a set $X$ with $|X| \ge |A|/(2K)^{O(k^2K)}$ such that
$\|\tau_x(1_A * 1_A) - 1_A * 1_A\|^2_{\ell^2(G)} \le |A|^3/4K$ for all $x \in kX$
by the triangle inequality. Now, by Cauchy–Schwarz we have $\|1_A * 1_A\|^2_{\ell^2(G)} \ge |A|^3/K$, which by the triangle inequality and the output of the Croot–Sisask lemma tells us two things:
$kX \subset 2A - 2A$ and $\|1_A * 1_A * \mu_{X-X}\|^2_{\ell^2(G)} \ge |A|^3/4K$.
The second of these gives us the translate $Y$ of $X - X$ such that $|A \cap Y| = \Omega(|Y|/K)$ via an averaging argument; the first lets us control the degree of polynomial growth of $X$. Since $kX \subset 2A - 2A$ we have by Plünnecke’s inequality that
$|klX| \le K^{4l}|A| \le K^{4l}(2K)^{O(k^2K)}|X|$.
Putting $3r + 1 = kl$ and $l = k^2K$ we get that
$|(3r + 1)X| \le K^{O(r^{2/3}K^{1/3})}|X|$;
it follows that we can take $r = O(K \log^3 K)$ such that $|(3r + 1)X| < 2^r|X|$. Chang’s covering lemma then tells us that $X$ has the right order of relative polynomial growth. Finally from the definition of $r$ and $l$ in terms of $k$ we get that $k = O(\log K)$, from which the bound on the size of $|A \cap Y|/|A|$ follows.
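The choice of parameters at the end of this sketch can be unpacked as follows. This is a heuristic calculation rather than part of the original argument: $C$ stands for an unspecified absolute constant, rounding to integers is ignored, and $K \ge 2$ is assumed.

```latex
% From 3r + 1 = kl and l = k^2 K:
k^3 K = 3r + 1
  \quad\Longrightarrow\quad
  k \approx (r/K)^{1/3}, \qquad l = k^2 K \approx r^{2/3} K^{1/3}.

% Growth bound, using k^2 K = l:
|(3r+1)X| = |klX| \le K^{4l} (2K)^{O(k^2 K)} |X| = K^{O(r^{2/3} K^{1/3})} |X|.

% The right-hand side is less than 2^r |X| as soon as
C \, r^{2/3} K^{1/3} \log K < r
  \iff r^{1/3} > C K^{1/3} \log K
  \iff r > C^3 K \log^3 K,

% so one may take r = O(K \log^3 K), and then
k \approx (r/K)^{1/3} = O(\log K).
```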
6. Schoen’s refinement
Schoen in [16] made a major breakthrough when he proved the bounds mentioned in (1.3). If we study the argument above the weakness was that we had to take $\epsilon \approx 1/\sqrt{K}$ in our application of the Croot–Sisask lemma, and since the resultant set $X$ has size exponentially dependent on $\epsilon^{-2}$ this led to exponential losses in $K$.
To some extent these losses are necessary as can be seen by considering the examples in class (i) of §3. In this class $A$ is chosen randomly with probability $1/K$ from a coset $H$, so that (with high probability) $1_A * 1_A(x) \approx |A|/K$ for all $x \in A - A$. On the other hand $A + A$ is very structured: it is the whole coset $H$. Similarly, in class (ii) of §3, $A + A$ is again very structured. It follows that in either of the above cases $1_{A+A} * 1_{A+A}$ takes rather large values; certainly much larger than those of $1_A * 1_A$ in the case when $A$ is chosen randomly.
If we can guarantee that some convolution takes a lot of values much larger than its average value then the arguments at the end of the last section can be applied much more effectively. This is roughly speaking Schoen’s idea and the following is one of the key ingredients from [16].
Proposition 6.1. Suppose that $G$ is an Abelian group, $A$ is a finite subset of $G$ with $|A + A| \le K|A|$, and $\eta \in (0, 1]$ is a parameter. Then there is a non-empty set $A' \subset A$ such that
$|\mathrm{Sym}_{K^{-\eta}}(A' + A)| \ge \exp(-\exp(O(\eta^{-1}))\log K)|A|$.
The proof of this is iterative and based around an important observation which seems to have been first made by Katz and Koester in [8]. Suppose that $A'' \subset G$ is such that $|A + A''| \le M|A|$ and $|A'' + A''| \le L|A''|$. Then we have
$1_{A+A''} * 1_{-(A+A'')}(x) = |(A + A'') \cap (x + A + A'')| \ge |A + (A'' \cap (x + A''))|$.
Writing $S$ for the set of $x$ such that $A'' \cap (x + A'')$ is large, that is
$S := \{x \in G : 1_{A''} * 1_{-A''}(x) \ge |A''|/2L\}$,
we have two possibilities:
(i) either $1_{A+A''} * 1_{-(A+A'')}(x) \ge |A + A''|/R$ for all $x \in S$;
(ii) or, putting $A''' := A'' \cap (x + A'')$ for some $x \in S$ at which this fails, we have $|A''' + A'''| \le 2L^2|A'''|$ and $|A + A'''| \le (M/R)|A|$.
Given this we proceed by downward induction on $|A + A''|/|A|$, terminating when we are in the first case and repeating with $A''$ replaced by $A'''$, $M$ by $M/R$ and $L$ by $2L^2$ in the second. This yields the proposition.
With the above result we can now prove the following.
Theorem 6.2. Suppose that $G$ is an Abelian group and $A \subset G$ is such that $|A + A| \le K|A|$. Then there is a set $Y$ which is a translate of $X - X$ such that
$|A \cap Y| \ge \exp(-\exp(O(\sqrt{\log K})))|A|$, $|A \cap Y| = \Omega(|Y|/K^{O(1)})$ and $|nX| \le n^{\exp(O(\sqrt{\log K}))}|X|$ for all $n \ge 1$.
Sketch proof. We apply Proposition 6.1 and put $S := \mathrm{Sym}_{K^{-\eta}}(A' + A)$ so that
$|S| \ge \exp(-\exp(O(\eta^{-1}))\log K)|A|$.
By definition and the Cauchy–Schwarz inequality we have that
$\|1_{A+A'} * 1_S\|^2_{\ell^2(G)} \ge K^{-2\eta}|A + A'||S|^2$,
and (since $S \subset 2A - 2A$) that
$|S + S| \le \exp(\exp(O(\eta^{-1}))\log K)|S|$.
We then proceed as in the proof of Theorem 5.2 but this time apply Croot–Sisask to the function $1_{A+A'}$ and the set $S$ with parameter $\epsilon = K^{-\eta}/2k$ and get a set $X$ with
$|(3r + 1)X| \le (2K)^{O(r^{2/3}K^{O(\eta)}\exp(O(\eta^{-1})))}|X|$.
Optimising for $\eta$ we take $\eta = 1/\sqrt{\log K}$, and then the argument proceeds much as before to give the result.
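The choice $\eta = 1/\sqrt{\log K}$ balances the two competing factors in the exponent. The following heuristic calculation, with implied constants left unspecified and $K$ assumed large, indicates why this leads to the bounds in Theorem 6.2.

```latex
% The two factors in the exponent are
K^{O(\eta)} = \exp(O(\eta \log K))
  \qquad\text{and}\qquad
  \exp(O(\eta^{-1})).

% They are balanced when \eta \log K = \eta^{-1}, that is \eta = 1/\sqrt{\log K}, giving
K^{O(\eta)} \exp(O(\eta^{-1})) = \exp(O(\sqrt{\log K})).

% The growth bound then reads
|(3r+1)X| \le (2K)^{O(r^{2/3} \exp(O(\sqrt{\log K})))} |X|,

% which is less than 2^r |X| once
r^{1/3} > \exp(O(\sqrt{\log K})) \log 2K,
  \quad\text{so one may take}\quad
  r = \exp(O(\sqrt{\log K})).
```

The factor $\log 2K$ is absorbed because $\log K = \exp(O(\sqrt{\log K}))$.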
7. The López–Ross trick and generalised Croot–Sisask
The Croot–Sisask lemma has a rather powerful generalisation to $\ell^p$-norms.
Lemma 7.1 (Croot–Sisask lemma, $\ell^p$-norm version). Suppose that $G$ is an Abelian group, $f \in \ell^p(G)$ and $|A + A| \le K|A|$. Then there is a set $X$ with $|X| \ge (2K)^{-O(\epsilon^{-2}p)}|A|$ such that
$\|\tau_x(f * \mu_A) - f * \mu_A\|_{\ell^p(G)} \le \epsilon\|f\|_{\ell^p(G)}$ for all $x \in X$.
This result is also from [3] and the proof is the same as that of the $\ell^2$ version except that Khintchine’s inequality has to be replaced by the Marcinkiewicz–Zygmund inequality. The reason that this result is so much more powerful than the $\ell^2$ version of the Croot–Sisask lemma is in the bounds. In particular the $p$ dependence is exponential in $p$, rather than doubly exponential which is what all previous arguments had given. To understand why it is useful here we now sketch the proof of the following.
Theorem 7.2. Suppose that $G$ is an Abelian group and $A \subset G$ is such that $|A + A| \le K|A|$. Then there is a set $Y$ which is a translate of $X - X$ such that
$|A \cap Y| \ge \exp(-\log^{O(1)} K)|A|$, $|A \cap Y| = \Omega(|Y|/K^{O(1)})$ and $|nX| \le n^{\log^{O(1)} K}|X|$ for all $n \ge 1$.
Sketch proof. Rather than examining the $\ell^2$-norm of the convolution of two functions we use an observation of López and Ross from [9]:
$\langle 1_{A+A}, 1_A * 1_A\rangle = |A|^2$.
On the other hand if we know that
$\|\tau_x(1_{A+A} * 1_A) - 1_{A+A} * 1_A\|_{\ell^p(G)} \le \epsilon\|1_{A+A}\|_{\ell^p(G)}$,
then we conclude that
$\langle\tau_x(1_{A+A}), 1_A * 1_A\rangle \ge |A|^2 - \epsilon\|1_{A+A}\|_{\ell^p(G)}|A| = |A|^2(1 - \epsilon K^{1/p})$.
We conclude that we can take $p \sim \log K$ and $\epsilon = \Omega(1)$ such that $\langle\tau_x(1_{A+A}), 1_A * 1_A\rangle \ge |A|^2/2$. But this means by the Croot–Sisask lemma that there is a set $X$ of size at least $|A|K^{-O(k^2)}$ such that
$\langle\tau_x(1_{A+A}), 1_A * 1_A\rangle \ge |A|^2/2$ for all $x \in kX$.
This can then be plugged back into a similar argument to the ones we had before to get Theorem 7.2. The advantage here is that the set $X$ we have found is a lot bigger than those we had previously found as a result of the good bounds in the Croot–Sisask lemma.
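The López–Ross identity used at the start of this sketch holds because $1_A * 1_A$ is supported on $A + A$ and has total mass $|A|^2$. The sketch below verifies it for a hypothetical set in $\mathbb{Z}/N\mathbb{Z}$, and also records why $p \sim \log K$ is a natural choice: with $p = \log K$ the factor $K^{1/p}$ equals $e$.

```python
import math

def sumset(A, B, N):
    """A + B in Z/NZ."""
    return {(a + b) % N for a in A for b in B}

def conv_indicator(A, N):
    """The function 1_A * 1_A on Z/NZ as a list."""
    c = [0] * N
    for a in A:
        for b in A:
            c[(a + b) % N] += 1
    return c

N = 97
A = {0, 1, 5, 11, 24, 25, 27}            # illustrative set in Z/97Z
AA = sumset(A, A, N)
conv = conv_indicator(A, N)

# <1_{A+A}, 1_A * 1_A> = |A|^2
inner = sum(conv[x] for x in AA)
assert inner == len(A) ** 2

# With p = log K the loss factor K^(1/p) is the absolute constant e.
K = len(AA) / len(A)
p = math.log(K)
assert abs(K ** (1 / p) - math.e) < 1e-9
```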
8. Polynomial growth and convex progressions
In this section we shall sketch a proof of Theorem 2.1 which we restate now as a reminder.
Theorem 8.1 (Theorem 2.1). Suppose that $G$ is an Abelian group and $X \subset G$ is such that $|nX| \le n^d|X|$ for all $n \ge 1$. Then there is a (centred) convex coset progression $M$ in $G$ such that $X - X \subset M$, $|M| \le \exp(O(d \log d))|X|$ and $\dim M = O(d \log d)$.
As indicated this is largely a rearrangement of the ideas of Green and Ruzsa in [7], which are themselves developed from the hugely influential paper [12] of Ruzsa.
One of the key tools is the Fourier transform which in this case we define regarding $G$ as a discrete group. We write $\widehat{G}$ for the compact Abelian group of characters on $G$ and given $f \in \ell^1(G)$ the Fourier transform of $f$ is defined to be the function
$\widehat{f} : \widehat{G} \to \mathbb{C};\ \gamma \mapsto \sum_{x \in G} f(x)\gamma(x)$.
There is a useful notion of approximate annihilator on $G$ called Bohr sets. Given $\Gamma \subset \widehat{G}$ a compact set and $\delta \in (0, 2]$ we write
$\mathrm{Bohr}(\Gamma, \delta) := \{x \in G : |\gamma(x) - 1| \le \delta \text{ for all } \gamma \in \Gamma\}$.
Bohr sets interact particularly well with the large spectrum of a set. Given $A \subset G$ we write
$\mathrm{LSpec}(A, \epsilon) := \{\gamma \in \widehat{G} : \|1 - \gamma\|_{L^2(\mu_A * \mu_{-A})} \le \epsilon\}$.
The basic idea is to show that if $X$ has polynomial growth then $X - X$ is contained in the Bohr set of the large spectrum of (a dilate of) $X$. We then show that this Bohr set is not too large, and finally that it is actually a low dimensional convex coset progression.
The following proposition deals with the first objective above; it is only slightly more general than [17, Proposition 4.39].
Proposition 8.2. Suppose that $X \subset G$, $l$ is a positive integer such that $|lX| \le K|(l - 1)X|$ and $\epsilon \in (0, 1]$ is a parameter. Then
$X - X \subset \mathrm{Bohr}(\mathrm{LSpec}(lX, \epsilon), 2\sqrt{2K}\,\epsilon)$.
The proof is fairly straightforward after unpacking the definitions.
The second objective above, that the Bohr set not be too large, is proved using an idea of Schoen [15] introduced to Freĭman-type problems by Green and Ruzsa in [7].
Proposition 8.3. Suppose that $X \subset G$ has $|nX| \le n^d|X|$ for all $n \ge 1$, and $\epsilon \in (0, 1/2]$ is a parameter. Then we have the estimate
$|\mathrm{Bohr}(\mathrm{LSpec}(X, \epsilon), 1/2\pi)| \le \exp(O(d \log \epsilon^{-1}d))|X|$.
The proof of this is via the Fourier transform which shows that the large spectrum of the specified Bohr set must support a lot of the $\ell^2$-mass of $1_X$.
To deal with similar concerns to those of our final objective Ruzsa introduced the geometry of numbers to Freĭman-type theorems in [12]. There is a great deal to say about this, and we direct the reader to [17, Chapter 3.5] for a much more comprehensive discussion. For our purposes we have the following proposition.
Proposition 8.4. Suppose that $G$ is an Abelian group, $d \in \mathbb{N}$ and $B$ is a Bohr set such that
$|\mathrm{Bohr}(\Gamma, (3d + 1)\delta)| < 2^d|\mathrm{Bohr}(\Gamma, \delta)|$
for some $\delta < 1/4(3d + 1)$. Then $\mathrm{Bohr}(\Gamma, \delta)$ is a $d$-dimensional convex coset progression.
The proof of this involves the covering lemma of Chang mentioned earlier and a very important embedding defined by Ruzsa
$R_\Gamma : G \to C(\Gamma, \mathbb{R});\ x \mapsto R_\Gamma(x) : \Gamma \to \mathbb{R};\ \gamma \mapsto \frac{1}{2\pi}\arg(\gamma(x))$,
where the argument is taken to lie in $(-\pi, \pi]$. The map $R_\Gamma$ acts as something called a Freĭman morphism² which lets us embed Bohr sets into a lattice.
With those three ingredients it is possible to stitch together a proof of Theorem 2.1 and the section is complete.
² We direct the unfamiliar reader to [17, Chapter 5.3].

References
[1] A. Balog and E. Szemerédi, A statistical theorem of set addition. Combinatorica 14(3) (1994), 263–268.
[2] M.-C. Chang, A polynomial bound in Freĭman’s theorem. Duke Math. J. 113(3) (2002), 399–419.
[3] E. S. Croot and O. Sisask, A probabilistic technique for finding almost-periods of convolutions. Geom. Funct. Anal. 20(6) (2010), 1367–1396.
[4] J. J. F. Fournier, Sharpness in Young’s inequality for convolution. Pacific J. Math. 72(2) (1977), 383–397.
[5] G. A. Freĭman, Nachala strukturnoi teorii slozheniya mnozhestv. Kazan. Gosudarstv. Ped. Inst, 1966.
[6] W. T. Gowers, A new proof of Szemerédi’s theorem for arithmetic progressions of length four. Geom. Funct. Anal. 8(3) (1998), 529–551.
[7] B. J. Green and I. Z. Ruzsa, Freĭman’s theorem in an arbitrary abelian group. J. Lond. Math. Soc. (2) 75(1) (2007), 163–175.
[8] N. H. Katz and P. Koester, On additive doubling and energy. SIAM J. Discrete Math. 24(4) (2010), 1684–1693.
[9] J. M. López and K. A. Ross, Sidon sets. Marcel Dekker Inc., New York, 1975. Lecture Notes in Pure and Applied Mathematics, Vol. 13.
[10] G. Petridis, New proofs of Plünnecke-type estimates for product sets in groups. 2011, arXiv:1101.3507.
[11] H. Plünnecke, Eigenschaften und Abschätzungen von Wirkungsfunktionen. BMwF-GMD-22. Gesellschaft für Mathematik und Datenverarbeitung, Bonn, 1969.
[12] I. Z. Ruzsa, Generalized arithmetical progressions and sumsets. Acta Math. Hungar. 65(4) (1994), 379–388.
[13] I. Z. Ruzsa, Sumsets and structure. In: Combinatorial number theory and additive group theory, Adv. Courses Math. CRM Barcelona, 87–210. Birkhäuser Verlag, Basel, 2009.
[14] T. Sanders, On the Bogolyubov–Ruzsa lemma. Anal. PDE, to appear, 2010, arXiv:1011.0107.
[15] T. Schoen, Multiple set addition in $\mathbb{Z}_p$. Integers 3:A17 (2003), (electronic).
[16] T. Schoen, Near optimal bounds in Freĭman’s theorem. Duke Math. J. 158 (2011), 1–12.
[17] T. C. Tao and H. V. Vu, Additive combinatorics. Cambridge Studies in Advanced Mathematics, vol. 105. Cambridge University Press, Cambridge, 2006.
Tom Sanders, Mathematical Institute, University of Oxford, 24–29 St. Giles’, Oxford OX1 3LB, England E-mail: [email protected]
Shearing and mixing in parabolic flows Corinna Ulcigrai∗
Abstract. Parabolic flows are dynamical systems in which nearby trajectories diverge with polynomial speed. A classical example is the horocycle flow on a surface of constant negative curvature. Other important classes of examples are smooth area-preserving flows on surfaces, whose study is connected with Teichmüller dynamics, and Heisenberg nilflows. We survey some of the chaotic properties of these flows and some recent results on time changes of the above mentioned classes of examples. We focus in particular on mixing and we explain the shearing mechanism which is responsible for mixing in parabolic dynamics.
2010 Mathematics Subject Classification. 37A25, 37E35, 37D40.
Keywords. Parabolic flows, shearing, mixing, time-changes, horocycle flow, area preserving flows, locally Hamiltonian flows, Heisenberg nilflows.
∗ The author is partially supported by an RCUK Fellowship and an EPSRC Grant.

Introduction
Dynamical systems are often grouped in three main classes: elliptic, parabolic and hyperbolic. A non-singular flow is called parabolic if nearby orbits diverge polynomially in time. If nearby orbits diverge exponentially, the flow is called hyperbolic. If there is no divergence (or perhaps it is slower than polynomial) the flow is called elliptic. In contrast with the hyperbolic case, and to a lesser extent with the elliptic case, there is no general theory which describes the dynamics of parabolic flows.
In this paper, we survey some classical and some recent results on the chaotic properties of parabolic flows. We focus on three main classes of parabolic flows and their time-changes. The first fundamental example of a parabolic flow is given by horocycle flows on compact negatively curved surfaces (see section 1). Another important class of flows which are sometimes considered parabolic is given by area-preserving flows on surfaces of higher genus (genus greater than two) with saddle-like singularities (see section 2). In this case the orbit divergence is entirely produced by the splitting of trajectories near the singularities. Finally, another class of parabolic flows of algebraic nature is given by nilflows on nilmanifolds, which are quotients of Lie groups by unipotent lattices. The prototype of a nilflow is the Heisenberg nilflow which is defined in section 3.
While some of the above examples (such as the horocycle flow on a negatively curved surface and nilflows) have been classically studied and are well understood, not much is known for general smooth parabolic flows, not even for smooth perturbations of these standard examples. In fact, even the dynamics of non-trivial smooth time-changes is poorly understood. Let us recall that a flow $\{\tilde{h}_t\}_{t \in \mathbb{R}}$ is called a reparametrization or a time-change of a flow $\{h_t\}_{t \in \mathbb{R}}$ on $X$ if there exists a measurable function $\tau : X \times \mathbb{R} \to \mathbb{R}$ such that for all $x \in X$ and $t \in \mathbb{R}$ we have $\tilde{h}_t(x) = h_{\tau(x,t)}(x)$. The reparametrized flow $\{\tilde{h}_t\}_{t \in \mathbb{R}}$ has the same trajectories as $\{h_t\}_{t \in \mathbb{R}}$, but a different speed of motion. Since $\{\tilde{h}_t\}_{t \in \mathbb{R}}$ is assumed to be a flow, the function $\tau(x, \cdot) : \mathbb{R} \to \mathbb{R}$ is an additive cocycle over the flow $\{\tilde{h}_t\}_{t \in \mathbb{R}}$, that is, it satisfies the cocycle identity:
$\tau(x, s + t) = \tau(\tilde{h}_s(x), t) + \tau(x, s)$, for all $x \in X$, $s, t \in \mathbb{R}$. (1)
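The cocycle identity (1) can be checked on an explicit toy example. The sketch below is not taken from the paper: it assumes the translation flow $h_t(x) = x + t$ on $\mathbb{R}$ and the time-change obtained by dividing its generator by the hypothetical positive function $\alpha(x) = \cosh x$, chosen only because the equation $\dot{x} = 1/\cosh x$ has the closed-form solution $\sinh x(t) = \sinh x(0) + t$.

```python
import math

def h(t, x):
    """Original flow on R: translation by t."""
    return x + t

def tau(x, t):
    """Cocycle of the time-change solving dx/dt = 1 / cosh(x) (illustrative choice)."""
    return math.asinh(t + math.sinh(x)) - x

def h_tilde(t, x):
    """Reparametrized flow: h_tilde_t(x) = h_{tau(x, t)}(x)."""
    return h(tau(x, t), x)

x, s, t = 0.3, 1.7, -2.4

# Cocycle identity (1): tau(x, s + t) = tau(h_tilde_s(x), t) + tau(x, s)
lhs = tau(x, s + t)
rhs = tau(h_tilde(s, x), t) + tau(x, s)
assert abs(lhs - rhs) < 1e-9

# Equivalent statement: h_tilde is itself a flow.
assert abs(h_tilde(s + t, x) - h_tilde(t, h_tilde(s, x))) < 1e-9
```

The two assertions are equivalent here: the cocycle identity is exactly what is needed for $\tilde{h}_{s+t} = \tilde{h}_t \circ \tilde{h}_s$.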
If $X$ is a manifold and $\{h_t\}_{t\in\mathbb{R}}$ is a smooth flow, we will say that $\{\tilde h_t\}_{t\in\mathbb{R}}$ is a smooth reparametrization if the cocycle $\tau$ is a smooth function. As a step towards a better understanding of parabolic dynamical systems, one can ask to what extent the dynamical properties of the known examples persist after a smooth time-change. Results on the ergodic and spectral theory of smooth time-changes of the classical horocycle flow were recently obtained in joint work with G. Forni (see Section 1), while time-changes of the Heisenberg nilflow were studied in joint work with A. Avila and G. Forni (see Section 3). We focus in particular on the chaotic property known as mixing. We recall that a measure-preserving flow $\{\varphi_t\}_{t\in\mathbb{R}}$ on a probability space $(X,\mu)$ is (strongly) mixing if for each pair of measurable sets $A, B$ one has
$$\lim_{t\to\infty} \mu(\varphi_t(A)\cap B) = \mu(A)\mu(B),$$
that is, under the dynamics sets become asymptotically independent. Equivalently, for any pair of square-integrable functions $f, g\in L^2(X,\mu)$ such that $f$ belongs to the space $L^2_0(X,\mu)$ of zero-average square-integrable functions, that is $\int_X f\,d\mu = 0$, the correlations
$$\langle f\circ\varphi_t, g\rangle_{L^2(X,\mu)} = \int_X f(\varphi_t x)\,g(x)\,d\mu(x)$$
converge to zero as $t$ tends to infinity. The geometric mechanism that seems to be universal in producing mixing (when there is mixing) in parabolic flows is shearing. We describe the shearing phenomenon in the three classes of examples mentioned above.
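To make the definition of mixing via correlations concrete, here is a minimal numerical sketch. It does not use any of the flows studied in this paper: as a stand-in it compares two discrete-time toy maps on a finite grid of the 2-torus, a rotation (elliptic, correlations do not decay) and the hyperbolic cat map (correlations of the chosen observable vanish). The grid size `N`, the shift `k` and the observable are illustrative assumptions.

```python
import math

N = 64  # assumed grid size; grid points are (i/N, j/N) on the 2-torus


def observable(i):
    """Zero-average observable f(x, y) = cos(2*pi*x), evaluated at x = i/N."""
    return math.cos(2 * math.pi * i / N)


def correlation(step, n):
    """Grid average of f(phi^n(p)) * f(p), where `step` maps (i, j) to (i', j') mod N."""
    total = 0.0
    for i in range(N):
        for j in range(N):
            a, b = i, j
            for _ in range(n):
                a, b = step(a, b)
            total += observable(a) * observable(i)
    return total / (N * N)


def rotation(i, j, k=5):
    """Toy elliptic map: translation by k/N in the x-coordinate."""
    return (i + k) % N, j


def cat_map(i, j):
    """Toy hyperbolic map (x, y) -> (2x + y, x + y) mod 1, restricted to the grid."""
    return (2 * i + j) % N, (i + j) % N


# Correlations at times n = 1, ..., 5. On a finite grid the cat map is
# periodic, so only small n are meaningful in this discretization.
rotation_corr = [correlation(rotation, n) for n in range(1, 6)]
cat_corr = [correlation(cat_map, n) for n in range(1, 6)]
```

For the rotation the correlations keep oscillating like $\tfrac12\cos(2\pi k n/N)$, while for the cat map they vanish up to rounding for these small times; the parabolic flows discussed below sit between these two behaviours.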
1. Horocycle flow and its time changes
The classical horocycle flow is a fundamental example of a unipotent, parabolic flow. Consider the homogeneous space $M := \Gamma\backslash PSL(2,\mathbb{R})$, where $\Gamma$ is a cocompact lattice in $PSL(2,\mathbb{R})$. The space $M$ can be interpreted as the unit tangent bundle $T^1S$ of the compact hyperbolic surface $S = \Gamma\backslash\mathbb{H}$ obtained by quotienting the
Shearing and mixing in parabolic flows
upper half plane $\mathbb{H} = \{z\in\mathbb{C} \mid \operatorname{Im}(z) > 0\}$ by the action of $\Gamma$ by fractional linear transformations. The geodesic flow $\{\phi^X_t\}_{t\in\mathbb{R}}$ and the stable and unstable horocycle flows $\{h^U_t\}_{t\in\mathbb{R}}$ and $\{h^V_t\}_{t\in\mathbb{R}}$ on $M$ are defined respectively by the multiplicative action on the right of the 1-parameter subgroups $\{\exp(tX)\}_{t\in\mathbb{R}}$, $\{\exp(tU)\}_{t\in\mathbb{R}}$, $\{\exp(tV)\}_{t\in\mathbb{R}}$ of the group $PSL(2,\mathbb{R})$, where $\{U, V, X\}$ is a basis of the Lie algebra $\mathfrak{sl}(2,\mathbb{R})$ of $PSL(2,\mathbb{R})$ which satisfies the following commutation relations:
$$[U, V] = X, \qquad [X, U] = U, \qquad [X, V] = -V.$$
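As a sanity check on the commutation relations, the sketch below verifies them for one explicit choice of $2\times 2$ matrices. The particular matrices (in particular the factor $1/2$ in `V`) are an assumption made here so that $[U,V] = X$ holds with this normalization; the text above does not specify a matrix realization.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def bracket(a, b):
    """Lie bracket [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    """Scalar multiple c * a."""
    return [[c * entry for entry in row] for row in a]


# Hypothetical basis of sl(2, R), chosen to match the relations in the text.
X = [[F(1, 2), F(0)], [F(0), F(-1, 2)]]   # generator of the geodesic flow
U = [[F(0), F(1)], [F(0), F(0)]]          # generator of the stable horocycle flow
V = [[F(0), F(0)], [F(1, 2), F(0)]]       # generator of the unstable horocycle flow

relations_hold = (
    bracket(U, V) == X
    and bracket(X, U) == U
    and bracket(X, V) == scale(-1, V)
)
```

Exact rational arithmetic is used so that the three identities can be compared with `==` rather than up to rounding.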
Let us recall that, geometrically, the stable (respectively unstable) horocycle containing a given point $(x, v)\in T^1S = M$ consists of all points $(x', v')$ such that the geodesic through $x'$ in direction $v'$ has the same final point (respectively, initial point) on the boundary of $\mathbb{H}$ as the geodesic in direction $v$ passing through $x$. The dynamical properties of the horocycle flow $\{h^U_t\}_{t\in\mathbb{R}}$ have been studied in great detail. It is known, for example, that the flow is minimal [12], uniquely ergodic [10], has Lebesgue spectrum and is therefore strongly mixing [22], in fact mixing of all orders [19], and has zero entropy [11]. Its finer ergodic and rigidity properties, as well as the rate of mixing [23], were investigated by M. Ratner in a series of papers. G. Forni and L. Flaminio [6] proved precise bounds on ergodic integrals of smooth functions. Not much was known for smooth time-changes of classical horocycle flows. A smooth time-change $\{h^\alpha_t\}_{t\in\mathbb{R}}$ of the (stable) horocycle flow on $M$ given by a smooth positive function $\alpha : M\to\mathbb{R}^+$ is the flow generated by the smooth vector field $U_\alpha := U/\alpha$. One can check that $\{h^\alpha_t\}_{t\in\mathbb{R}}$ is given by the formula
$$h^\alpha_{\tau(x,t)}(x) := h^U_t(x), \qquad \text{for all } (x,t)\in M\times\mathbb{R}, \tag{2}$$
where $\tau : M\times\mathbb{R}\to\mathbb{R}$ is a smooth cocycle over the flow $\{h^U_t\}_{t\in\mathbb{R}}$ (see (1)) which has $\alpha$ as infinitesimal generator, i.e. such that $\alpha(x) := \frac{\partial\tau}{\partial t}(x,t)\big|_{t=0}$ for all $x\in M$. The flow $\{h^\alpha_t\}_{t\in\mathbb{R}}$ preserves the (smooth) volume form $\mathrm{vol}_\alpha := \alpha\,\mathrm{vol}$ (we assume that $\alpha$ is normalized so that $\int_M \mathrm{vol}_\alpha = \int_M \alpha\,\mathrm{vol} = 1$). Let us remark that the time-change $\{h^\alpha_t\}_{t\in\mathbb{R}}$ is also parabolic; in fact the infinitesimal divergence of trajectories is at most quadratic with respect to time, as is the case for the standard horocycle flow. The most important classical result to date which applies to time-changes of horocycle flows is the proof by B. Marcus, more than thirty years ago, that all time-changes satisfying a mild differentiability condition are mixing [19]. Marcus' result generalized earlier work by Kushnirenko, who proved mixing for all time-changes with sufficiently small derivative in the geodesic direction [18]. In joint work with G. Forni, by combining B. Marcus' proof of mixing with quantitative equidistribution results and techniques developed by G. Forni and L. Flaminio [6] and by A. Bufetov and G. Forni [4], we showed that B. Marcus' proof of mixing can be made quantitative. More precisely, we prove the following quantitative mixing result for smooth time-changes of horocycle flows [8]. Let $\mu_0 > 0$ be the smallest positive eigenvalue of the hyperbolic Laplacian on the compact surface $\Gamma\backslash\mathbb{H}$. Let $\nu_0\in[0,1)$ and $\epsilon_0\in\{0,1\}$ be the parameters defined as follows:
$$\nu_0 := \begin{cases}\sqrt{1-4\mu_0}, & \text{if } \mu_0 < 1/4,\\ 0, & \text{if } \mu_0\ge 1/4;\end{cases} \qquad \epsilon_0 := \begin{cases} 0, & \text{if } \mu_0\ne 1/4,\\ 1, & \text{if } \mu_0 = 1/4.\end{cases} \tag{3}$$
Let us denote by $W^r(M)$ the standard Sobolev space and by $\|\cdot\|_X$ the graph norm¹ of the Lie derivative operator $\mathcal{L}_X$.
Theorem 1.1 ([8]). For any $\alpha\in W^r(M)$ where $r > 11/2$, there exists a constant $C_r(\alpha) > 0$ such that the following holds. For any zero-average function $f\in W^r(M)\cap L^2_0(M,\mathrm{vol}_\alpha)$, for any function $g\in\mathrm{dom}(\mathcal{L}_X)$ and for any $t\ge 1$,
$$|\langle f\circ h^\alpha_t, g\rangle_{L^2(M,\mathrm{vol}_\alpha)}| \le C_r(\alpha)\,\|f\|_r\,\|g\|_X\, t^{-\frac{1-\nu_0}{2}}\,(1+\log t)^{\epsilon_0}.$$
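The dependence of the decay rate in Theorem 1.1 on the spectral gap can be made explicit with a small helper. This is only an illustrative sketch: the function names are hypothetical, the constant $C_r(\alpha)\|f\|_r\|g\|_X$ is left out, and the comparison `mu0 == 0.25` is an exact floating-point test that is meaningful only when $\mu_0$ is given exactly.

```python
import math


def decay_parameters(mu0):
    """Return (nu0, eps0) as defined in (3), for the smallest positive eigenvalue mu0."""
    if mu0 <= 0:
        raise ValueError("mu0 must be positive")
    nu0 = math.sqrt(1 - 4 * mu0) if mu0 < 0.25 else 0.0
    eps0 = 1 if mu0 == 0.25 else 0
    return nu0, eps0


def decay_rate(t, mu0):
    """The t-dependent factor t^{-(1 - nu0)/2} * (1 + log t)^{eps0} of Theorem 1.1."""
    if t < 1:
        raise ValueError("the bound is stated for t >= 1")
    nu0, eps0 = decay_parameters(mu0)
    return t ** (-(1 - nu0) / 2) * (1 + math.log(t)) ** eps0
```

For instance, when $\mu_0 > 1/4$ the factor reduces to $t^{-1/2}$, while a small eigenvalue $\mu_0 < 1/4$ gives $\nu_0 > 0$ and hence a slower polynomial rate.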
The key idea used by Marcus to prove mixing relies on the shearing phenomenon, which we now explain. Consider the push-forward under the flow of a small geodesic arc. Let $\sigma\in\mathbb{R}^+$ and $(x,t)\in M\times\mathbb{R}$. Let $\gamma^\sigma_{x,t} : [0,\sigma]\to M$ be the parametrized path defined as follows:
$$\gamma^\sigma_{x,t}(s) := h^\alpha_t\circ\phi^X_s(x), \qquad \text{for all } s\in[0,\sigma].$$
One can compute the velocity of the path $\gamma^\sigma_{x,t}$ and its length and check that for all $(x,t,s)\in M\times\mathbb{R}\times[0,\sigma]$ the tangent vector to the path is given by
$$\frac{d\gamma^\sigma_{x,t}}{ds}(s) := v_t(x,s)\,U_\alpha(\gamma^\sigma_{x,t}(s)) + X(\gamma^\sigma_{x,t}(s)), \tag{4}$$
where the coefficient $v_t(x,s)$ is given by the following integral along the flow $\{h^\alpha_\tau\}_{\tau\in\mathbb{R}}$:
$$v_t(x,s) := \int_0^t \Bigl(\frac{X\alpha}{\alpha} - 1\Bigr)\circ h^\alpha_\tau\circ\phi^X_s(x)\,d\tau.$$
Thus, while the original curve is tangent to the geodesic direction, the push-forward $\gamma^\sigma_{x,t}$ has not only a component in the $X$ direction, but also a component in the horocycle direction $U_\alpha$ (or, which is the same, in direction $U$). The coefficient $v_t(x,s)$ of this component is given by an ergodic integral along an orbit of $\{h^\alpha_\tau\}_{\tau\in\mathbb{R}}$. Since the horocycle flow is uniquely ergodic [10] and the integral of the function $\frac{X\alpha}{\alpha} - 1$ is equal to $-1$, we have that $v_t(x,s)/t$ converges to $-1$. In other words, as $t$ increases, the absolute value of the coefficient $v_t(x,s)$ of the component in the horocycle direction grows asymptotically like $t$. More precisely, in [8], we show that, for all $(x,s)\in M\times[0,\sigma]$ and for all $t > 0$,
$$\Bigl|\frac{v_t(x,s)}{t} + 1\Bigr| \le C_r\,\|\alpha\|_r\, t^{-\frac{1-\nu_0}{2}}\,(1+\log^+ t)^{\epsilon_0}, \tag{5}$$
where $\epsilon_0, \nu_0$ are the exponents in (3) above.
¹ For all functions $g\in L^2(M)$ which belong to the maximal domain $\mathrm{dom}(\mathcal{L}_X)\subset L^2(M)$ of the maximally defined Lie derivative operator $\mathcal{L}_X : L^2(M)\to L^2(M)$, the graph norm of $g$ is defined by $\|g\|_X := (\|g\|_0^2 + \|Xg\|_0^2)^{1/2}$.
Geometrically, equation (4) describes the shearing phenomenon mentioned above: geodesic arcs, pushed under the flow, shear and gain a component in the flow direction. As $t$ increases, the push-forward of the small original arc becomes a curve which is more and more stretched in the flow direction. This geometric phenomenon produces mixing in the following way. Consider two sufficiently smooth functions $f, g\in L^2(M,\mathrm{vol}_\alpha)$, where $f\in L^2_0(M,\mathrm{vol}_\alpha)$ has zero average. The correlation integral $\langle f\circ h^\alpha_t, g\rangle_{L^2(M,\mathrm{vol}_\alpha)}$, by the invariance of the standard volume under the geodesic flow and integration by parts, can be rewritten as
$$\langle f\circ h^\alpha_t, g\rangle = \frac{1}{\sigma}\int_0^\sigma \langle f\circ h^\alpha_t\circ\phi^X_s,\ g\circ\phi^X_s\rangle\,ds = \Bigl\langle \frac{1}{\sigma}\int_0^\sigma f\circ h^\alpha_t\circ\phi^X_s\,ds,\ g\circ\phi^X_\sigma\Bigr\rangle - \frac{1}{\sigma}\int_0^\sigma \Bigl\langle \int_0^S f\circ h^\alpha_t\circ\phi^X_s\,ds,\ \mathcal{L}_X g\circ\phi^X_S\Bigr\rangle\,dS.$$
Thus, $\langle f\circ h^\alpha_t, g\rangle_{L^2(M,\mathrm{vol}_\alpha)}$ can be estimated by estimating integrals along the push-forward curves $\gamma^\sigma_{x,t}$. Marcus proves that if $f$ is continuous, one can choose $\sigma$ sufficiently small so that these integrals are close to the integrals along flow trajectories. Thus, mixing follows from uniform distribution of long trajectories of the horocycle flow [10]. More precisely, in [8], we show that for any zero-average function $f\in W^r(M)\cap L^2_0(M,\mathrm{vol}_\alpha)$,
$$\Bigl|\int_0^S (f\circ h^\alpha_t\circ\phi^X_s)(x)\,ds\Bigr| \le C_{r,\sigma}(\alpha)\,\|f\|_r\,(St)^{\frac{1+\nu_0}{2}}\,[1+\log^+(St)]^{\epsilon_0}/t.$$
These estimates can then be used to give a quantitative estimate of $\langle f\circ h^\alpha_t, g\rangle$ and prove Theorem 1.1. These results on decay of correlations of time-changes turned out to be the starting point to prove a conjecture by A. Katok and J. P. Thouvenot [14] on the spectral properties of smooth time-changes of the horocycle flow.
Let us recall that for each function $f\in L^2$ the associated spectral measure $\mu_f$ is the unique measure on the real line such that its Fourier transform $\hat\mu_f$ (which is the bounded function defined by $\hat\mu_f(t) := \int_{\mathbb{R}} e^{i\xi t}\,d\mu_f(\xi)$, for all $t\in\mathbb{R}$) is given by the autocorrelation of $f$, that is, satisfies the following identity:
$$\hat\mu_f(t) = \langle f\circ h^\alpha_t, f\rangle_{L^2(M,\mathrm{vol}_\alpha)}.$$
The flow is said to have absolutely continuous spectrum if all spectral measures are absolutely continuous with respect to the Lebesgue measure on the real line. In [8] we show the following, which proves one of the conjectures in [14]:
Theorem 1.2. Let $r > 11/2$ and let $\alpha\in W^r(M)$. The time-change $\{h^\alpha_t\}_{t\in\mathbb{R}}$ of the (stable) horocycle flow $\{h^U_t\}_{t\in\mathbb{R}}$ with infinitesimal generator $U_\alpha := U/\alpha$ has purely absolutely continuous spectrum.
Theorem 1.2 is derived from square-mean bounds on twisted ergodic integrals of smooth functions, which are equivalent to bounds on the Fourier transform of the spectral measures. The decay of correlations of a general smooth function
under the horocycle flow is not square-integrable. However, thanks to a bootstrap trick, our results on decay of correlations of time-changes are precise enough to give optimal (in particular square-integrable) decay of correlations for smooth coboundaries (that is, functions of the form $U_\alpha u$ for some smooth $u$). Once it is established that all smooth coboundaries have absolutely continuous spectral measures, it follows (for instance by a density argument) that the spectrum is purely absolutely continuous. Moreover, our estimates on decay of correlations of coboundaries are also used to prove that the maximal spectral type is Lebesgue [8].
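The step from square-integrable decay of correlations to absolute continuity of a spectral measure is the standard Plancherel argument. The following sketch records it with the Fourier convention $\hat\mu_f(t) = \int_{\mathbb{R}} e^{i\xi t}\,d\mu_f(\xi)$ used above; the symbol $\rho_f$ for the density is notation introduced here for illustration only.

```latex
% Assume the autocorrelation t -> \hat\mu_f(t) belongs to L^2(\mathbb{R}, dt).
% Then \mu_f is absolutely continuous, with density given by Fourier inversion
% (understood in the L^2 sense):
\begin{align*}
d\mu_f(\xi) &= \rho_f(\xi)\,d\xi, &
\rho_f(\xi) &= \frac{1}{2\pi}\int_{\mathbb{R}} e^{-i\xi t}\,\hat\mu_f(t)\,dt, \\
\|\rho_f\|_{L^2(\mathbb{R})}^2 &= \frac{1}{2\pi}\,\|\hat\mu_f\|_{L^2(\mathbb{R})}^2 .
\end{align*}
```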
2. Area preserving (locally Hamiltonian) flows on surfaces
Another class of flows with parabolic behavior are flows on surfaces of higher genus which preserve a smooth area form. A natural construction for such flows is the following. On a closed connected orientable surface $S$ of genus $g\ge 1$ with a fixed smooth area form $\omega$, consider a smooth closed real-valued differential 1-form $\eta$. Let $X$ be the vector field determined by $\eta = i_X\omega = \omega(X,\cdot)$ and consider the flow $\{\varphi_t\}_{t\in\mathbb{R}}$ on $S$ associated to $X$. Since $\eta$ is closed, the transformations $\varphi_t$, $t\in\mathbb{R}$, preserve a smooth area measure, which we will denote by $\mu$. The flow $\{\varphi_t\}_{t\in\mathbb{R}}$ is known as the locally Hamiltonian flow (or flow given by a multi-valued Hamiltonian) associated to $\eta$. Indeed, locally² one can find coordinates $(x,y)$ on $S$ in which $\{\varphi_t\}_{t\in\mathbb{R}}$ is given by the solution to the Hamilton equations $\dot x = \partial H/\partial y$, $\dot y = -\partial H/\partial x$ for some smooth real-valued Hamiltonian function $H$. The study of locally Hamiltonian flows was initiated by S. P. Novikov and his school, see for example [21, 30]. Another well-studied class of area-preserving flows are directional flows on translation surfaces, whose study is connected with polygonal billiards and Teichmüller dynamics. Directional flows on translation surfaces are known to be typically (uniquely) ergodic [20, 29], but never mixing [13]. Locally Hamiltonian flows can be seen as time-changes of the directional flows, thus they are also typically ergodic. The question of mixing (which, unlike ergodicity, depends on the time-parametrization) in locally Hamiltonian flows was raised by Arnold in [1]. For $g\ge 2$, the form $\eta$ always has zeros, which correspond to points fixed by the flow. We assume that $\eta$ is Morse, so that the flow $\{\varphi_t\}_{t\in\mathbb{R}}$ has only simple saddles and centers. The parabolic nature of this class of flows is entirely due to the presence of the saddles, which split nearby trajectories and are responsible for the polynomial divergence of nearby orbits.
Moreover, Hamiltonian saddles produce a shearing effect similar to the one described for the horocycle flow. The closer a trajectory passes to a saddle point, the more it is slowed down. Hence, if $\gamma\subset S$ is a small arc transversal to the flow, when its evolution $\varphi_t(\gamma)$ under the flow passes near a saddle separatrix (without hitting the saddle point), it tends to shear in the direction of the flow (see Figure 1). While in the case of horocycle flows the shear happens coherently, for flows on surfaces the shear which is gained when passing on one side of a separatrix can be lost when passing near the other side of the same or another separatrix.
Figure 1. Shearing in locally Hamiltonian flows.
² A global Hamiltonian $H$ cannot in general be defined, but one can think of $\{\varphi_t\}_{t\in\mathbb{R}}$ as globally given by a multi-valued Hamiltonian function.
In this set-up, a necessary condition for the flow $\{\varphi_t\}_{t\in\mathbb{R}}$ to be mixing is that $\{\varphi_t\}_{t\in\mathbb{R}}$ is minimal, that is, any trajectory which is not a fixed point or a separatrix is dense. It is clear that if there is a center, the flow is not minimal, since around centers there are closed periodic orbits. Nevertheless, the surface can be decomposed into finitely many periodic components, i.e. connected components filled up by periodic trajectories, and components, called minimal, on which the orbits are dense, see for example [30]. In genus one, locally Hamiltonian flows with at least one singularity have a unique minimal component. Arnold's conjecture that the restriction to this minimal component is mixing was proved by Khanin and Sinai [15]. We discuss below the case $g\ge 2$ and address the question of mixing for the restriction to minimal components. We will see that there are two open classes of locally Hamiltonian flows: in one mixing is typical, in the other absence of mixing is typical. Let us first describe a representation of locally Hamiltonian flows as special flows. Let us fix a transversal to the flow restricted to a minimal component and consider the Poincaré map, i.e. the map that sends a point of the transversal to the point where its orbit under the flow first returns to the transversal (the map is well defined almost everywhere by minimality). Since the flow is area-preserving, the Poincaré map also preserves a smooth measure and one can choose coordinates in which it is the Lebesgue measure. In these coordinates, the first return map is an interval exchange transformation. Let us recall that an interval exchange transformation, or IET, is a piecewise isometry of $[0,1]$ determined by combinatorial data (e.g. a permutation $\pi$ of $d\ge 2$ symbols) and a length vector $\lambda = (\lambda_1,\dots,\lambda_d)\in\mathbb{R}^d_+$ with $\sum_{i=1}^d\lambda_i = 1$.
The interval exchange $T$ associated to $\pi$ and $\lambda$ rearranges the subintervals of lengths given by $\lambda$ in the order determined by $\pi$. Let us say that a property holds for almost every IET if, for any $\pi$ which is irreducible (that is, such that the only $\pi$-invariant set of the form $\{1,2,\dots,k\}$ is $\{1,2,\dots,d\}$), the property holds for almost every length vector $\lambda = (\lambda_1,\dots,\lambda_d)$ with respect to the restriction of the Lebesgue measure on $\mathbb{R}^d_+$ to the simplex $\sum_{i=1}^d\lambda_i = 1$. Let $f > 0$ and assume for convenience that $f$ is normalized so that $\int f = 1$. Let $X_f$ be the region under the graph of $f$ given by $X_f = \{(x,y) \mid x\in[0,1],\ 0\le y\le f(x)\}$. Let us recall that the special flow $\{\varphi_t\}_{t\in\mathbb{R}}$ over $T$ and under $f$ is defined on $X_f$ as the vertical unit speed flow under the identification of the point $(x, f(x))$ with $(T(x), 0)$. In other words, the flow $\{\varphi_t\}_{t\in\mathbb{R}}$ moves a point vertically along the $y$-axis with unit speed, i.e. $\varphi_t(x,y) = (x, y+t)$ as long as $y+t < f(x)$; then, at time $t = f(x) - y$, it returns to the base according to $T$, at the point $(T(x), 0)$. Let us consider special flows under roof functions with logarithmic singularities (see an example in Figure 1), i.e. of the form:
$$f(x) = \sum_{i=1}^{n} C_i^+ \ln\,(x - x_i^+)^{pos} + \sum_{i=1}^{m} C_i^- \ln\,(x_i^- - x)^{pos} + g(x), \tag{6}$$
where $g$ has bounded variation and $x^{pos} = x$ if $x > 0$ and $x^{pos} = 0$ otherwise. At the points $\{x_i^+,\ i = 1,\dots,n\}$ the function $f$ has a right-singularity and at $\{x_i^-,\ i = 1,\dots,m\}$ it has a left-singularity. Let us assume that $\{x_i^+,\ i = 1,\dots,n\}\cup\{x_i^-,\ i = 1,\dots,m\}$ is a subset of the discontinuities of $T$. We say that $f$ has asymmetric logarithmic singularities and write $f\in\mathrm{AsymLog}$ if $\sum_{i=1}^m C_i^- \ne \sum_{i=1}^n C_i^+$, while we say that $f$ has symmetric logarithmic singularities and write $f\in\mathrm{SymLog}$ if $\sum_{i=1}^m C_i^- = \sum_{i=1}^n C_i^+$. The restriction of a locally Hamiltonian flow to each minimal component can be represented as a special flow under a roof function with logarithmic singularities of the form (6). If the flow is minimal and there are only simple saddles, $f\in\mathrm{SymLog}$. On the other hand, a typical flow with saddle loops (that is, separatrices which start and end at the same saddle) homologous to zero gives rise to a return function $f\in\mathrm{AsymLog}$. We remark that a typical flow with centers or more than one minimal component also has saddle loops. Indeed, saddle loops bound the periodic orbits around a center and the cylinders filled by periodic orbits which separate different minimal components. Thus, the roof functions of the minimal components of such flows belong to AsymLog. In the set-up of special flows we proved the following two main results³, which show that the presence or absence of mixing depends crucially on the asymmetry or symmetry of the singularities:
Theorem 2.1 ([27]⁴). For a.e. IET $T$, the suspension flow built over $T$ under any roof function $f\in\mathrm{AsymLog}$ is mixing.
We remark that mixing in Theorem 2.1 is very slow. The upper estimates in [15] and [27] are sub-polynomial (more precisely, of the form $(\ln t)^{\gamma}$ for some $\gamma < 0$). This type of estimate is presumably optimal, but giving lower bounds on the decay of correlations for these flows is an open problem.
³ Let us remark that some special cases of the above theorems were proved earlier. In particular, if the number $d$ of exchanged intervals is two, IETs reduce to rotations. For rotations ($d = 2$), Theorem 2.1 was proved in [15] (and implies that, as mentioned above, the unique minimal component of a typical locally Hamiltonian flow in genus one is mixing), Theorem 2.2 by Kochergin (first for a.e. rotation number $\alpha$ [16], later for all $\alpha$ [17]) and Theorem 2.3 by Frączek and Lemańczyk in [9]. Furthermore, Scheglov in [24] proved Theorem 2.2 for $d = 5$ and $\pi = (54321)$ (which implies that on a surface of genus $g = 2$ typical multi-valued Hamiltonian flows with two isometric saddles are not mixing).
⁴ The result is proved in [27] for roofs with one logarithmic singularity, but it can be extended to roofs with more singularities under the asymmetry condition $f\in\mathrm{AsymLog}$.
Theorem 2.2 ([26]). For a.e. IET $T$, the suspension flow built over $T$ under any roof function $f\in\mathrm{SymLog}$ is not mixing.
It is natural to ask whether the flows in Theorem 2.2 are at least weakly mixing. We recall that the flow $\{\varphi_t\}_{t\in\mathbb{R}}$ is weakly mixing if for each pair of measurable sets $A, B$ one has
$$\lim_{T\to\infty}\frac{1}{T}\int_0^T |\mu(\varphi_t(A)\cap B) - \mu(A)\mu(B)|\,dt = 0.$$
We also proved the following result.
Theorem 2.3 ([28]). For a.e. IET $T$, the suspension flow built over $T$ under any roof function $f\in\mathrm{SymLog}$ is weakly mixing.
Theorems 2.1, 2.2 and 2.3 yield a complete description of (weak) mixing in locally Hamiltonian flows. Since the presence of saddle loops is an open condition, that is, it persists under small perturbations by a closed 1-form, we have the following picture. There are two open sets of locally Hamiltonian flows, the first containing minimal flows with only simple saddles (which are represented as special flows with roofs with symmetric logarithmic singularities), the second containing flows with saddle loops homologous to zero (which include flows with centers or with more than one minimal component separated by cylinders of periodic orbits, whose minimal components correspond to roofs with asymmetric singularities). A typical flow in the first open set is minimal and weakly mixing but not mixing, while typical flows in the second open set restricted to each minimal component are mixing. Let us briefly mention how shearing (or its absence) plays a crucial role in the proofs of Theorems 2.1 and 2.2. Near a Hamiltonian saddle, both in the symmetric and in the asymmetric case, transversal arcs are sheared in the flow direction, but the direction of shearing (that is, the sign of the slope of the asymptotic curves) depends on the side from which the saddle is approached.
In the asymmetric case, the shearing happens prevalently in one direction, so it accumulates and, after a long time, many small arcs transversal to the flow become long sheared arcs almost in the flow direction (see Figure 1) which wrap around the surface. In the special flow representation, the long time evolution of a small horizontal segment in the asymmetric case consists of many almost vertical arcs, as shown in Figure 1. Since the transverse dynamics, that is the IET, is typically ergodic, these arcs become equidistributed, as in the case of time-changes of horocycle flows. In the symmetric case, though, the shear which is gained when passing near one side of a singularity is lost when passing on a different side of other saddles. To make these heuristic arguments rigorous and to produce quantitative estimates on the shear, one uses the special flow representation. In the set-up of special flows the shearing mechanism is also known as stretching of Birkhoff sums. Indeed, the evolution of a small transversal arc is described by Birkhoff sums of the roof function over the base transformation, that is, expressions of the form $S_n f(x) = \sum_{i=0}^{n-1} f(T^i x)$. More precisely, if $I\subset[0,1]$ is a small horizontal arc such
that $\varphi_s(I)$ does not hit any singularity of $T$ for $0\le s\le t$, then $\varphi_t(I)$ consists of several curves $\gamma_j$ (an example in the asymmetric case is shown in Figure 1), each of which has a parametrization of the form
$$\bigl(T^{n_j}x,\ S_{n_j}f(x)\bigr), \qquad x\in I_j, \tag{7}$$
for some integer $n_j$ and subinterval $I_j\subset I$. Thus, quantitative estimates on the shear of $\gamma_j$ reduce to estimates of the derivative $(S_{n_j}f)'$ which, since $T$ is almost everywhere an isometry, is the Birkhoff sum $S_{n_j}f'$ of the derivative $f'$ of the roof function $f$. Since $f$ has logarithmic singularities, $f'$ has singularities of type $1/x$ and thus it is non-integrable. If $f\in\mathrm{AsymLog}$, one can prove a non-standard limit theorem for Birkhoff sums of $f'$, showing that $S_n f'/(n\ln n)$ converges in probability to a constant. This is one of the main ingredients to obtain the quantitative stretch estimate used in proving mixing. Stretching of Birkhoff sums is the only phenomenon that can create mixing if the base transformation $T$ of a special flow is sufficiently rigid (in the sense explained below), as shown by the following criterion. In [16], Kochergin proved that if (K1) the base transformation is partially rigid, that is, there exist subsets $E_k\subset[0,1]$ with $\mathrm{Leb}(E_k)\ge c > 0$ for all $k$, and an increasing sequence of times $\{r_k\}_{k\in\mathbb{N}}$ such that the iterates $T^{r_k}$, restricted to $E_k$, converge to the identity transformation, and (K2) Birkhoff sums $S_{r_k}f$ do not stretch on $E_k$, that is, there exists an $M > 0$ such that for all $k\in\mathbb{N}$ and for all $y_1, y_2\in E_k$, $|S_{r_k}f(y_1) - S_{r_k}f(y_2)| < M$, then the special flow $\{\varphi_t\}_{t\in\mathbb{R}}$ is not mixing. We use this criterion to prove Theorem 2.2. The existence for any IET of partial rigidity sets as in (K1) is due to Katok [13]. We use a variation of Katok's construction. The heart of the proof of Theorem 2.2 consists in proving (K2). While if $f\in\mathrm{AsymLog}$ the Birkhoff sums $S_n f'$ grow as $n\ln n$, if $f\in\mathrm{SymLog}$ one needs to show uniform estimates of the form $|S_n f'|\le Mn$.
Heuristically, the absence of growth of $S_n f'$ is related to the fact that in the symmetric case there are delicate cancellations between the contributions to $S_n f'$ coming from positive and negative singularities of $f'$. Further crucial technical ingredients in the proofs of both Theorem 2.1 and Theorem 2.2 (which we do not discuss here) are conditions on the IET which allow a fine control on equidistribution of orbits. In both Theorems we define a suitable Diophantine-type condition on IETs which is satisfied by a.e. IET and gives the full measure sets of IETs for which the conclusions of the Theorems hold.
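The objects used throughout this section (an interval exchange transformation, Birkhoff sums over it, and the derivative of a roof with logarithmic singularities) can be sketched in a few lines. This is a minimal illustration under stated assumptions: intervals are indexed from 0, `order` lists the original interval indices from left to right after the exchange, and the model roof $f(x) = C^+|\ln x| + C^-|\ln(1-x)|$ with singularities only at the endpoints is a hypothetical example, not the general form (6).

```python
from fractions import Fraction


def make_iet(lengths, order):
    """Build the IET that rearranges intervals of the given lengths.

    `order` lists the original interval indices from left to right
    after the exchange (an assumed convention for the permutation).
    """
    d = len(lengths)
    left = [0]
    for length in lengths:
        left.append(left[-1] + length)      # left endpoints before the exchange
    new_left = {}
    position = 0
    for j in order:
        new_left[j] = position              # left endpoint of interval j afterwards
        position += lengths[j]

    def T(x):
        if not 0 <= x < 1:
            raise ValueError("x must lie in [0, 1)")
        j = 0
        while j < d - 1 and x >= left[j + 1]:
            j += 1
        return x - left[j] + new_left[j]    # translate interval j to its new place

    return T


def birkhoff_sum(g, T, x, n):
    """S_n g(x) = g(x) + g(T x) + ... + g(T^{n-1} x)."""
    total = 0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total


def roof_derivative(x, c_plus, c_minus):
    """Derivative of the model roof c_plus*|ln x| + c_minus*|ln(1 - x)| on (0, 1)."""
    return -c_plus / x + c_minus / (1 - x)
```

With `order = [1, 0]` and two intervals the map is a rotation, which is the case $d = 2$ mentioned in footnote 3; `birkhoff_sum(lambda x: roof_derivative(x, c_plus, c_minus), T, x, n)` then gives the quantity $S_n f'(x)$ whose growth distinguishes the symmetric case ($C^+ = C^-$) from the asymmetric one.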
3. Time changes of Heisenberg nilflows
Another important class of (homogeneous) parabolic flows is given by nilflows. The basic example is the Heisenberg nilflow, defined as follows. The 3-dimensional Heisenberg group $N$ is the unique connected, simply connected Lie group with 3-dimensional Lie algebra $\mathfrak{n}$ on two generators $X, Y$ satisfying the Heisenberg commutation relations
$$[X, Y] = Z, \qquad [X, Z] = [Y, Z] = 0.$$
Up to isomorphisms, $N$ is the group of upper triangular unipotent matrices
$$[x, y, z] := \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}, \qquad x, y, z\in\mathbb{R}.$$
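The group law in the coordinates $[x,y,z]$ follows from multiplying the matrices above. The sketch below (with hypothetical helper names) implements it, checks it against matrix multiplication, and shows that the group commutator of $[1,0,0]$ and $[0,1,0]$ is the central element $[0,0,1]$, mirroring the relation $[X,Y] = Z$.

```python
def to_matrix(g):
    """Upper triangular unipotent matrix representing g = (x, y, z)."""
    x, y, z = g
    return [[1, x, z], [0, 1, y], [0, 0, 1]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def multiply(g, h):
    """Group law [x, y, z][a, b, c] = [x + a, y + b, z + c + x*b]."""
    (x, y, z), (a, b, c) = g, h
    return (x + a, y + b, z + c + x * b)


def inverse(g):
    """Inverse element: [x, y, z]^{-1} = [-x, -y, -z + x*y]."""
    x, y, z = g
    return (-x, -y, -z + x * y)


def commutator(g, h):
    """Group commutator g h g^{-1} h^{-1}."""
    return multiply(multiply(g, h), multiply(inverse(g), inverse(h)))
```

Taking all coordinates to be integers corresponds to the lattice with $E = 1$ in the description of compact Heisenberg nilmanifolds that follows.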
A compact Heisenberg nilmanifold is the quotient $M := \Gamma\backslash N$ of the Heisenberg group over a co-compact lattice $\Gamma < N$. It is well known that there exists a positive integer $E\in\mathbb{N}$ such that, up to an automorphism of $N$, the lattice $\Gamma$ coincides with the lattice
$$\Gamma := \left\{ \begin{pmatrix} 1 & x & z/E \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : x, y, z\in\mathbb{Z} \right\}.$$
Any Heisenberg nilmanifold $M$ has a natural probability measure $\mu$ locally given by the Haar measure of $N$. The group $N$ acts transitively on $M$ by right multiplication, given by $R_g(x) := xg$, for $x\in M$, $g\in N$. By definition, Heisenberg nilflows are the flows obtained by the restriction of this right action to the one-parameter subgroups of $N$. The measure $\mu$ defined above, which is invariant for the right action of $N$ on $M$, is, in particular, invariant for all nilflows on $M$. Thus each $W := w_x X + w_y Y + w_z Z\in\mathfrak{n}$ defines a measure-preserving flow $(\phi^W, \mu)$ on $M$, where $\phi^W := \{\phi^W_t\}_{t\in\mathbb{R}}$ is given by the formula
$$\phi^W_t(x) = x\exp(tW), \qquad x\in M,\ t\in\mathbb{R}.$$
By classical results of homogeneous dynamics, see [2], minimal nilflows are uniquely ergodic. In contrast with horocycle flows, however, they are never mixing, not even weakly mixing. There is a clear geometric obstruction to the (weak) mixing property: every nilflow is only partially parabolic, in the sense that it has an elliptic factor given by a linear flow on a torus. Nevertheless, all nilflows are relatively mixing in the following sense. Let $H := \pi^*L^2(\mathbb{T}^2)\subset L^2(M)$ be the subspace obtained by pull-back (under the projection $\pi : M\to\mathbb{T}^2$ onto the toral factor) of the square-integrable functions on the torus $\mathbb{T}^2$ and let $H^\perp\subset L^2(M)$ be its orthogonal complement. Then the restriction of any nilflow $(\phi^W,\mu)$ to the $N$-invariant subspace $H^\perp\subset L^2(M)$ has countable Lebesgue spectrum [2], hence it is mixing. In fact, it is possible to prove by the theory of unitary representations of the Heisenberg group that for all sufficiently smooth functions in $H^\perp$ the decay of correlations is polynomial (it is faster than any polynomial for infinitely differentiable functions in $H^\perp$). Results on the speed of equidistribution of Heisenberg nilflows for smooth functions were proved in [7] by G. Forni and L. Flaminio. In joint work with A. Avila and G. Forni we investigated mixing for time-changes of Heisenberg nilflows. Let $\{h_t\}_{t\in\mathbb{R}}$ be a uniquely ergodic homogeneous flow on the Heisenberg nilmanifold $M$. For any positive function $\alpha\in C^\infty(M)$, let $h^\alpha := \{h^\alpha_t\}_{t\in\mathbb{R}}$ be the time-change with infinitesimal generator $\alpha$, given by the formula
$$\frac{dh^\alpha_t}{dt}\Big|_{t=0} = \alpha\,\frac{dh_t}{dt}\Big|_{t=0}.$$
Time-changes of Heisenberg nilflows admit the following special flow representation (see Section 2 for the definition of special flow). In any uniquely ergodic Heisenberg nilflow the smooth compact surface $\Sigma\subset M$ defined by $\Sigma := \{\Gamma\exp(xX + zZ) : (x,z)\in\mathbb{R}^2\}$ is transversal to the flow. Since the subspace $\langle X, Z\rangle$ generated in $\mathfrak{n}$ by $X, Z\in\mathfrak{n}$ is an abelian ideal, the surface $\Sigma$ is isomorphic to a 2-dimensional torus. One can compute the return map and the return time function (see [25], §3). It turns out that the return time is constant and the return map is a linear skew-shift over an irrational rotation of the circle, that is, a map $f : \mathbb{T}^2\to\mathbb{T}^2$ of the following form:
$$f(x,z) = (x+\alpha,\ z+x+\beta), \qquad \text{for all } (x,z)\in\mathbb{T}^2. \tag{8}$$
In particular, any (uniquely ergodic) Heisenberg nilflow can be represented as a special flow over a linear skew-shift of the form (8) under a constant roof function, while each (smooth) time-change $\{h^\alpha_t\}_{t\in\mathbb{R}}$ corresponds to a special flow under a (smooth) roof function $\Phi_\alpha : \mathbb{T}^2\to\mathbb{R}^+$. Let us recall that $f$ in (8) (and hence the special flow) is (uniquely) ergodic if and only if the rotation number $\alpha$ in (8) is irrational. It is well known that if the roof function $\Phi > 0$ is a measurable (smooth) almost coboundary, that is, if there exists a measurable (smooth) function $u : \mathbb{T}^2\to\mathbb{R}$ such that
$$u\circ f - u = \Phi - \int_{\mathbb{T}^2}\Phi\,d\mathrm{Leb},$$
then the special flow under $\Phi$ is measurably (smoothly) trivial, that is, measurably (smoothly) isomorphic to a special flow with constant roof function. Any measurably trivial special flow is not weakly mixing, hence not mixing. We prove in [3] that this is the only obstruction to mixing in a dense class of roof functions which contains all trigonometric polynomials. Let $f$ be as in (8) with $\alpha$ irrational.
Theorem 3.1 ([3]). There exists a dense subspace $\mathcal{R}\subset C^\infty(\mathbb{T}^2)$ such that, for any positive function $\Phi\in\mathcal{R}$, the special flow $f^\Phi$ under $\Phi$ and over $f$ is mixing if and only if it is not measurably trivial and if and only if it is not smoothly trivial.
The result can also be formulated directly in the set-up of smooth time-changes of Heisenberg nilflows (see Theorem 3 in [3]). In particular, since the set of smoothly trivial functions has countable codimension, mixing is typical. We remark also that the set $\mathcal{M}_f$ of mixing roof functions in Theorem 3.1 is concretely described in [3] in terms of invariant distributions, that is, it is possible to check explicitly whether a given smooth roof function given in terms of a Fourier expansion belongs to $\mathcal{M}_f$. Thus one can construct concrete examples of mixing reparametrizations, such as the one given by the roof function $\Phi(x,y) = \sin(2\pi y) + 2$. As in the case of locally Hamiltonian flows, the main mechanism that we use to prove Theorem 3.1 is a phenomenon of stretching of Birkhoff sums, but the stretch happens here only in the $z$-direction. Let us give a heuristic explanation of this mechanism. Fix $x_0\in\mathbb{T}$ and let $I = \{x_0\}\times[a,b]$ be a subinterval of the $y$-fiber $\{x_0\}\times\mathbb{T}$. The stretch of the Birkhoff sum $\Phi_n$ on $I$ is by definition the following quantity:
$$\Delta\Phi_n(I) := \max_{a\le y\le b}\Phi_n(x_0,y) - \min_{a\le y\le b}\Phi_n(x_0,y).$$
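The skew-shift (8), the Birkhoff sums $\Phi_n$ and the stretch $\Delta\Phi_n(I)$ are easy to experiment with numerically. The sketch below is only illustrative: torus coordinates are written $(x, y)$ as in the definition of the stretch (they are called $(x, z)$ in (8)), the maximum and minimum over $[a,b]$ are approximated on a finite sample of points, and the function names and the default `samples=200` are assumptions.

```python
import math


def skew_shift(x, y, alpha, beta):
    """One step of the linear skew-shift (8): (x, y) -> (x + alpha, y + x + beta) mod 1."""
    return (x + alpha) % 1.0, (y + x + beta) % 1.0


def skew_shift_power(x, y, alpha, beta, n):
    """Closed form of the n-th iterate; the fiber coordinate grows quadratically in n."""
    return ((x + n * alpha) % 1.0,
            (y + n * x + n * (n - 1) / 2 * alpha + n * beta) % 1.0)


def birkhoff_sum(roof, x, y, alpha, beta, n):
    """Phi_n(x, y): sum of the roof along the first n points of the orbit."""
    total = 0.0
    for _ in range(n):
        total += roof(x, y)
        x, y = skew_shift(x, y, alpha, beta)
    return total


def stretch(roof, x0, a, b, alpha, beta, n, samples=200):
    """Approximate Delta Phi_n(I) for I = {x0} x [a, b] by sampling the fiber."""
    values = [birkhoff_sum(roof, x0, a + (b - a) * k / samples, alpha, beta, n)
              for k in range(samples + 1)]
    return max(values) - min(values)


def sine_roof(x, y):
    """The example roof Phi(x, y) = sin(2*pi*y) + 2 mentioned in the text."""
    return math.sin(2 * math.pi * y) + 2.0
```

The closed form in `skew_shift_power` makes the parabolic (polynomial) divergence visible: two points on the same fiber whose $x$-coordinates differ by $\delta$ are separated by about $n\delta$ in the fiber direction after $n$ steps. A constant roof has zero stretch on every interval, in line with the fact that constant roofs give non-mixing special flows.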
Let $n_t(x,y)$ be the number of discrete iterations of $f$ that the orbit of the point $(x,y)$ undergoes up to time $t$. One can show that, for all sufficiently large $t$, there is a set of intervals $I = \{x\}\times[y', y'']$ whose union has large measure in $\mathbb{T}^2$ and which have large stretch $\Delta\Phi_n(I)$ for all times $n$ of the form $n_t(x,y)$ for some $(x,y)\in I$. Large stretch implies that the variation of the number of discrete iterations $n_t(x,y)$ with $(x,y)\in I$ is large. Moreover, we show that in this construction $y\mapsto n_t(x,y)$ is monotone on $[y', y'']$. If we subdivide $I$ into intervals $I_i$ on which $n_t(x,y)$ is constant, the image under the special flow $\{f^\Phi_t\}_{t\in\mathbb{R}}$ of each $I_i$ is a 1-dimensional curve $\gamma_i = f^\Phi_t(I_i)$ which goes from the base (i.e. the set $\mathbb{T}^2\times\{0\}$) to the roof (i.e. the set $\{(x,y,\Phi(x,y)) : (x,y)\in\mathbb{T}^2\}$). Since $f$ sends $y$-fibers to $y$-fibers and preserves distances within $y$-fibers, the projection of each curve $\gamma_i$ under the map $(x,y,z)\mapsto(x,y)$ is an interval in another $y$-fiber of the same length as $I_i$. If the intervals $I$ are chosen sufficiently small, the projections of the curves $\gamma_i$ shadow with good approximation an orbit of $f$. Moreover, one can estimate the distortion of the curves $\gamma_i$ and show that they are close to segments in the $z$-direction. Using that the skew-product $f$ is uniquely ergodic, together with estimates on the distortion, we can hence show that $f^\Phi_t(I)$ (which is the union of the curves $\gamma_i$) becomes equidistributed and hence prove mixing. The stretching of Birkhoff sums for Heisenberg nilflows is derived from a theorem on the growth of Birkhoff sums of functions which are not coboundaries with a measurable transfer function. More precisely, we consider the function $\phi(x,z) = \Phi(x,z) - \int\Phi(x,z)\,dz$ and show that if $\phi$ is not a measurable coboundary, then, for each $C > 1$,
$$\lim_{n\to\infty}\mathrm{Leb}(|\phi_n| < C) = 0.$$
This result is quite general and can be proved for all nilflows. In fact, it is essentially based on a measurable Gottschalk–Hedlund theorem, which holds for any volume-preserving uniquely ergodic dynamical system, and on the parabolic divergence of orbits (although in a quite explicit form). The second part of the proof of Theorem 3.1 consists of a cocycle effectiveness result for the Heisenberg case, which states that if a smooth function is a coboundary with a measurable transfer function, then the transfer function is in fact smooth. This result is based on the sharp bounds [7] for ergodic sums, which are only available in the Heisenberg case.

Let us remark that a similar shearing mixing mechanism was also used by Fayad in [5] to produce smooth (analytic) mixing time-changes of some elliptic flows, i.e. linear flows on tori T^n with n ≥ 3. If n = 2, smooth time-changes of a linear flow on T² are never mixing (for example by [13]). Moreover, for Diophantine linear flows on T^n all smooth time-changes are trivial by KAM theory, so mixing examples as in [5] can exist only for Liouvillean frequencies. Thus, in the elliptic realm, the phenomenon of stretching of Birkhoff sums and mixing time-changes is not generic and can occur only for Liouvillean frequencies, in contrast with our result for nilflows, where mixing time-changes are generic for any uniquely ergodic nilflow, or equivalently, as long as the frequency of the elliptic factor is irrational.

Theorem 3.1 leaves open several natural questions on possible generalizations, for example whether Theorem 3.1 holds within the class of all smooth time-changes and whether it extends to nilflows on 2-step nilmanifolds on several generators or
nilflows on s-step nilmanifolds for any s ≥ 3. Other natural questions on the dynamics of nilflows are whether correlations decay polynomially in time for sufficiently smooth functions (under a Diophantine condition on the frequency) and what the nature of the spectrum of mixing time-changes is, in particular whether it is absolutely continuous or countable Lebesgue.
References
[1] V. I. Arnold, Topological and ergodic properties of closed 1-forms with incommensurable periods. Funktsional’nyi Analiz i Ego Prilozheniya 25 (1991), 1–12.
[2] L. Auslander, L. Green, and F. Hahn, Flows on homogeneous spaces. Princeton University Press, Princeton, NJ, 1963.
[3] A. Avila, G. Forni, and C. Ulcigrai, Mixing for time-changes of Heisenberg nilflows. Journal of Differential Geometry 89 no. 3 (2011), 369–410.
[4] A. Bufetov and G. Forni, Limit Theorems for Horocycle Flows. Preprint (2011), arXiv:1104.4502v1.
[5] B. R. Fayad, Analytic mixing reparametrizations of irrational flows. Ergodic Theory Dynam. Systems 22 (2002), 437–468.
[6] L. Flaminio and G. Forni, Invariant distributions and time averages for horocycle flows. Duke Math. J. 119 no. 3 (2003), 465–526.
[7] L. Flaminio and G. Forni, Equidistribution of nilflows and applications to theta sums. Ergodic Theory Dynam. Systems 26 (2006), 409–433.
[8] G. Forni and C. Ulcigrai, Time changes of horocycle flows. To appear in J. Modern Dyn., Preprint (2011), arXiv:1202.4986.
[9] K. Frączek and M. Lemańczyk, On symmetric logarithm and some old examples in smooth ergodic theory. Fund. Math. 180 no. 3 (2003), 241–255.
[10] H. Furstenberg, The unique ergodicity of the horocycle flow. In: Recent Advances in Topological Dyn. (New Haven, Conn., 1972), Lecture Notes in Math. 318, Springer, Berlin, 1973, 95–115.
[11] B. M. Gurevich, The entropy of horocycle flows. Dokl. Akad. Nauk SSSR 136 (1961), 768–770.
[12] G. A. Hedlund, Fuchsian groups and transitive horocycles. Duke Math. J. 2 (1936), 530–542.
[13] A. Katok, Interval exchange transformations and some special flows are not mixing. Israel J. Math. 35 no. 4 (1980), 301–310.
[14] A. Katok and J.-P. Thouvenot, Spectral Properties and Combinatorial Constructions in Ergodic Theory. Handbook of Dynamical Systems, Vol. 1B (B. Hasselblatt and A. Katok, editors), Elsevier, 2006, 649–743.
[15] K. M. Khanin and Ya. G. Sinai, Mixing for some classes of special flows over rotations of the circle. Funktsional’nyi Analiz i Ego Prilozheniya 26 no. 3 (1992), 1–21.
[16] A. V. Kochergin, Nonsingular saddle points and the absence of mixing. Mat. Zametki 19 no. 3 (1976), 453–468.
[17] A. V. Kočergin, Nondegenerate saddles and absence of mixing in flows on surfaces. Tr. Mat. Inst. Steklova 256 (2007), 252–266.
[18] A. G. Kushnirenko, Spectral properties of some dynamical systems with polynomial divergence of orbits. Vestnik Moskovskogo Universiteta. Matematika 29 no. 1 (1974), 101–108.
[19] B. Marcus, Ergodic properties of horocycle flows for surfaces of negative curvature. Ann. of Math. 105 (1977), 81–105.
[20] H. Masur, Interval exchange transformations and measured foliations. Ann. of Math. 115 (1982), 169–200.
[21] S. P. Novikov, The Hamiltonian formalism and a multivalued analogue of Morse theory. Uspekhi Matematicheskikh Nauk 37 no. 5 (1982), 3–49.
[22] O. S. Parasyuk, Flows of horocycles on surfaces of constant negative curvature. Uspekhi Mat. Nauk 8 no. 3 (1953), 125–126.
[23] M. Ratner, The rate of mixing for geodesic and horocycle flows. Ergodic Theory Dynam. Systems 7 (1987), 267–288.
[24] D. Scheglov, Absence of mixing for smooth flows on genus two surfaces. J. Modern Dyn. 3 no. 1 (2009), 13–34.
[25] A. Starkov, Dynamical Systems on Homogeneous Spaces. Translations of the American Mathematical Society, 190, Providence, Rhode Island, 2002.
[26] C. Ulcigrai, Absence of mixing in area-preserving flows on surfaces. Ann. of Math. 173 no. 2 (2011), 1743–1778.
[27] C. Ulcigrai, Mixing for suspension flows over interval exchange transformations. Ergodic Theory Dynam. Systems 27 no. 3 (2007), 991–1035.
[28] C. Ulcigrai, Weak mixing for logarithmic flows over interval exchange transformations. J. Modern Dyn. 3 no. 1 (2009), 35–49.
[29] W. Veech, Gauss measures for transformations on the space of interval exchange maps. Ann. of Math. 115 (1982), 201–242.
[30] A. Zorich, How do the leaves of a closed 1-form wind around a surface? Translations of American Mathematical Society 197 (1999), 135–178.
Corinna Ulcigrai, School of Mathematics, University of Bristol, University Walk, Clifton, Bristol BS8 1TW, United Kingdom E-mail: [email protected]
Optimal control theory and some applications to aerospace problems

Emmanuel Trélat
Abstract. In this proceedings article we first briefly report on some classical techniques of nonlinear optimal control, such as the Pontryagin Maximum Principle and the conjugate point theory, and on their numerical implementation. We illustrate these issues with problems coming from aerospace applications, such as the orbit transfer problem, which is taken as a motivating example. Such problems are encountered in a longstanding collaboration with the European space industry EADS Astrium. On this kind of nonacademic problem it is shown that the knowledge resulting from the maximum principle is insufficient for solving the problem adequately, in particular due to the difficulty of initializing the shooting method, which is an approach for solving the boundary value problem resulting from the application of the maximum principle. On the orbit transfer problem we show how the shooting method can be successfully combined with a numerical continuation method in order to improve its performance significantly. We comment on assumptions ensuring the feasibility of continuation or homotopy methods, which consist of continuously deforming a problem towards a simpler one, and then of solving a series of parametrized problems to end up with the solution of the initial problem. Finally, in view of designing low-cost interplanetary space missions, we show how optimal control can also be combined with dynamical systems theory, which brings out properties of the celestial dynamics around Lagrange points that are of great interest for mission design.

2010 Mathematics Subject Classification. Primary 49J15, 93B40; Secondary 65H20, 37N05, 37N35.

Keywords. Optimal control, Pontryagin Maximum Principle, conjugate points, numerical methods, shooting method, orbit transfer, continuation method, dynamical systems.
1. Introduction

Let n and m be nonzero integers. Consider on ℝ^n the control system
\[ \dot{x}(t) = f(t, x(t), u(t)), \tag{1} \]
where f : ℝ × ℝ^n × ℝ^m → ℝ^n is smooth, and where the controls are bounded and measurable functions, defined on intervals [0, T(u)] of ℝ_+, and taking their values in a subset U of ℝ^m. Let M_0 and M_1 be two subsets of ℝ^n. Denote by 𝒰 the set of admissible controls, that is, the controls whose corresponding trajectories steer the system from an initial point of M_0 to a final point in M_1. For such a control u, the cost of the
corresponding trajectory x_u(·) is defined by
\[ C(t_f, u) = \int_0^{t_f} f^0(t, x_u(t), u(t))\,dt + g(t_f, x_u(t_f)), \tag{2} \]
where f^0 : ℝ × ℝ^n × ℝ^m → ℝ and g : ℝ × ℝ^n → ℝ are smooth. We investigate the optimal control problem of determining a trajectory x_u(·), solution of (1) associated with a control u on [0, t_f], such that x_u(0) ∈ M_0, x_u(t_f) ∈ M_1, and minimizing the cost C. The final time t_f can be fixed or not. When the optimal control problem has a solution, we say that the corresponding control (or the corresponding trajectory) is minimizing or optimal.

As a motivating example, we consider the orbit transfer problem with low thrust, modeled with the controlled Kepler equations
\[ \ddot{q}(t) = -q(t)\,\frac{\mu}{r(t)^3} + \frac{T(t)}{m(t)}, \qquad \dot{m}(t) = -\beta \|T(t)\|, \tag{3} \]
where q(t) ∈ ℝ³ is the position of the engine at time t, r(t) = ‖q(t)‖, T(t) is the thrust at time t, and m(t) is the mass, with β = 1/(I_sp g_0). Here g_0 is the standard acceleration of gravity at the Earth's surface and I_sp is the specific impulse of the engine. The thrust is subject to the constraint ‖T(t)‖ ≤ T_max, where the typical value of the maximal thrust T_max is around 0.1 Newton for low-thrust engines. The orbit transfer problem consists of steering the engine from a given initial orbit (e.g. an initial eccentric inclined orbit) to a final one (e.g. the geostationary orbit), either in minimal time or by minimizing the fuel consumption. This is an optimal control problem of the form stated above, where the state is x = (q, q̇, m) ∈ ℝ⁷, the control is the thrust, and the set U of constraints on the control is the closed ball of ℝ³ centered at the origin with radius T_max. If one considers the minimal time problem, then one can choose f^0 = 1 and g = 0, and if one considers the minimal fuel consumption problem, then one can choose f^0(t, x, u) = β‖u‖ and g = 0. Note that controllability properties, ensuring the feasibility of the problem, have been studied in [10, 13], based on a careful analysis of the Lie algebra generated by the vector fields of the system (3).

The purpose of this proceedings article is to report briefly on some of the main issues of optimal control theory, with special attention to applications to aerospace problems. We will show that the most classical techniques of optimal control, namely the Pontryagin Maximum Principle, the conjugate point theory, and the associated numerical methods, are in general insufficient to solve a given optimal control problem efficiently. They can however be significantly improved by combining them with other tools such as numerical continuation (homotopy) methods, geometric optimal control, or results of dynamical systems theory. 
These items will be illustrated with the minimal time or minimal consumption orbit transfer problem with strong or low thrust (mentioned above), and space mission design using the dynamics around Lagrange points. We mention the recent survey [51] for a more detailed exposition of optimal control applied to aerospace.
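To fix ideas, the controlled Kepler equations (3) can be written as a first-order system in the state x = (q, q̇, m). The code below is a minimal sketch under stated assumptions: SI units, the Earth's gravitational parameter for μ, and an illustrative specific impulse of 2000 s. The function names `kepler_rhs` and `clip_thrust` and the numerical values other than T_max = 0.1 N are assumptions made for the example, not data from this article.

```python
import math

MU = 3.986004418e14   # gravitational parameter of the Earth in m^3/s^2 (assumed)
G0 = 9.80665          # standard acceleration of gravity in m/s^2
ISP = 2000.0          # specific impulse in s (illustrative value)
BETA = 1.0 / (ISP * G0)
T_MAX = 0.1           # maximal thrust in N (low-thrust value quoted in the text)


def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def clip_thrust(thrust, t_max=T_MAX):
    """Project a thrust vector onto the closed ball of radius t_max."""
    n = norm(thrust)
    if n <= t_max:
        return tuple(thrust)
    return tuple(c * t_max / n for c in thrust)


def kepler_rhs(state, thrust, beta=BETA, mu=MU):
    """Right-hand side of (3) for state = (q1, q2, q3, v1, v2, v3, m)."""
    q, v, m = state[0:3], state[3:6], state[6]
    r = norm(q)
    acc = [-mu * q[i] / r ** 3 + thrust[i] / m for i in range(3)]
    return list(v) + acc + [-beta * norm(thrust)]


if __name__ == "__main__":
    x = [7.0e6, 0.0, 0.0, 0.0, 7.5e3, 0.0, 1500.0]
    print(kepler_rhs(x, clip_thrust((0.3, 0.4, 0.0))))
```

With these values the thrust acceleration T_max/m is several orders of magnitude smaller than the gravitational acceleration μ/r², which is the numerical signature of the low-thrust regime discussed in the sequel.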
2. Classical optimal control theory and numerical approaches

Throughout this section, we assume that the optimal control problem (1)–(2) has an optimal solution. Note that there exists a large literature on existence results for optimal controls. Such results usually require some convexity assumptions on the dynamics (see e.g. [18]). Here, we are not concerned with such issues. The aim of this section is to provide the most classical first- and second-order necessary and/or sufficient conditions allowing one to characterize and compute optimal trajectories.

2.1. First-order necessary optimality conditions. The set of admissible controls on [0, t_f] is denoted 𝒰_{t_f,ℝ^m}, and the set of admissible controls on [0, t_f] taking their values in U is denoted 𝒰_{t_f,U}.

Definition 2.1. The end-point mapping E : ℝ^n × ℝ_+ × 𝒰 → ℝ^n of the system is defined by E(x_0, T, u) = x(x_0, T, u), where t ↦ x(x_0, t, u) is the trajectory solution of (1), corresponding to the control u, such that x(x_0, 0, u) = x_0.

The set 𝒰_{t_f,ℝ^m}, endowed with the standard topology of L^∞([0, t_f], ℝ^m), is open, and the end-point mapping is smooth on 𝒰_{t_f,ℝ^m}. In terms of the end-point mapping, the optimal control problem (1)–(2) can be recast as the infinite-dimensional minimization problem
\[ \min\{ C(t_f, u) \mid x_0 \in M_0,\ E(x_0, t_f, u) \in M_1,\ u \in L^\infty(0, t_f; U) \}. \tag{4} \]
Although it is in infinite dimension, this is a classical optimization problem with constraints. It is well known that in such a problem it is usually important that the set of constraints have (at least locally) the structure of a manifold. This is one of the motivations of the following definition.

Definition 2.2. Assume that M_0 = {x_0}. A control u defined on [0, t_f] is said to be singular if and only if the differential ∂E/∂u (x_0, t_f, u) is not of full rank.

Singular controls are one of the main notions in optimal control theory. Note that, in the above constrained minimization problem, the set of constraints is a local manifold around a given control u provided u is nonsingular.

Assume temporarily that we are in the simplified situation where M_0 = {x_0}, M_1 = {x_1}, T is fixed, g = 0 and U = ℝ^m. The optimal control problem then consists of steering the system (1) from the initial point x_0 to the final point x_1 in time T and minimizing the cost (2) among controls u ∈ L^∞([0, T], ℝ^m). Assuming that the extremities are fixed is not a big simplification and it is not difficult to extend the following statements to the case of general subsets. Here, the main (important) simplification is the fact that the controls are unconstrained. In that case, the optimization problem (4) reduces to
\[ \min_{E(x_0, T, u) = x_1} C(T, u). \tag{5} \]
If u is optimal then there exists a Lagrange multiplier (ψ, ψ^0) ∈ (ℝ^n × ℝ) \ {0} such that
\[ \psi \cdot dE_{x_0,T}(u) = -\psi^0\, dC_T(u), \tag{6} \]
written equivalently as ∂L_T/∂u (u, ψ, ψ^0) = 0, where L_T(u, ψ, ψ^0) = ψ E_{x_0,T}(u) + ψ^0 C_T(u) is the so-called Lagrangian of the optimization problem (5). It can be noted that with such an approach it would be more difficult to take into account control (or even state) constraints. One of the main contributions of Pontryagin and his collaborators is the consideration of needle-like variations, allowing one to deal efficiently with control constraints. The Pontryagin Maximum Principle, which is the milestone of optimal control theory, is a far-reaching extension of the first-order necessary condition (6). In some sense it is a pointwise parametrization of (6) along the trajectory. The following statement is the most usual Pontryagin Maximum Principle, valid for general nonlinear optimal control problems (1)–(2), with control constraints but without state constraints. Usual proofs rely on a fixed point argument and on the use of Pontryagin cones (see e.g. [42, 35]).
Theorem 2.3 (Pontryagin Maximum Principle). If the trajectory x(·), associated to the optimal control u on [0, t_f], is optimal, then it is the projection of an extremal (x(·), p(·), p^0, u(·)) (called extremal lift), where p^0 ≤ 0 and p(·) : [0, t_f] → ℝ^n is an absolutely continuous mapping called adjoint vector, with (p(·), p^0) ≠ (0, 0), such that
\[ \dot{x}(t) = \frac{\partial H}{\partial p}(t, x(t), p(t), p^0, u(t)), \qquad \dot{p}(t) = -\frac{\partial H}{\partial x}(t, x(t), p(t), p^0, u(t)), \]
almost everywhere on [0, t_f], where H(t, x, p, p^0, u) = ⟨p, f(t, x, u)⟩ + p^0 f^0(t, x, u) is called the Hamiltonian of the optimal control problem, and there holds
\[ H(t, x(t), p(t), p^0, u(t)) = \max_{v \in U} H(t, x(t), p(t), p^0, v) \tag{7} \]
almost everywhere on [0, t_f]. Moreover, if the final time t_f to reach the target M_1 is not fixed, then one has the condition at the final time t_f
\[ \max_{v \in U} H(t_f, x(t_f), p(t_f), p^0, v) = -p^0\, \frac{\partial g}{\partial t}(t_f, x(t_f)). \tag{8} \]
Additionally, if M_0 and M_1 (or just one of them) are submanifolds of ℝ^n locally around x(0) ∈ M_0 and x(t_f) ∈ M_1, then the adjoint vector can be built in order to satisfy the transversality conditions at both extremities (or just one of them)
\[ p(0) \perp T_{x(0)} M_0, \qquad p(t_f) - p^0\, \frac{\partial g}{\partial x}(t_f, x(t_f)) \perp T_{x(t_f)} M_1, \tag{9} \]
where T_x M_i denotes the tangent space to M_i at the point x. The relation between the Lagrange multipliers and (p(·), p^0) is that the adjoint vector can be constructed so that (ψ, ψ^0) = (p(t_f), p^0) up to some multiplicative scalar. In particular, the Lagrange multiplier ψ is unique (up to a multiplicative scalar) if and only if the trajectory x(·) admits a unique extremal lift (up to a multiplicative scalar). If p^0 < 0 then the extremal is said to be normal, and in this case, since the Lagrange multiplier is defined up to a multiplicative scalar, it is usual to normalize it so that p^0 = −1. If p^0 = 0 then the extremal is said to be abnormal.
It can also be noted (using (6)) that, in the absence of control constraints, abnormal extremals project exactly onto singular trajectories. The scalar p^0 is a Lagrange multiplier associated with the instantaneous cost. Abnormal extremals, corresponding to p^0 = 0, are not detected with the usual Calculus of Variations approach, because this approach postulates at the very beginning that, in a neighborhood of some given reference trajectory, there are other trajectories having the same terminal points, whose respective costs can be compared (and this leads to Euler–Lagrange equations). But this postulate fails whenever the reference trajectory is isolated: it may indeed happen that there is only one trajectory joining the terminal points under consideration. A typical situation where this phenomenon occurs is when there is a unique trajectory joining the desired extremities: then obviously it will be optimal, for any possible optimization criterion. In this case the trajectory is singular and the corresponding extremal is abnormal. In many situations, where some qualification conditions hold, abnormal extremals do not exist in the problem under consideration, but in general it is impossible to say whether, given some initial and final conditions, these qualification conditions hold or not. Finally, we mention that in the normal case the Lagrange multiplier ψ (or the adjoint vector p(t_f) at the final time) coincides up to some multiplicative scalar with the gradient of the value function (solution of a Hamilton–Jacobi equation); see e.g. [22] for precise results.

Remark 2.4. The Pontryagin Maximum Principle admits many possible generalizations: intrinsic version on manifolds (see [2]), wider classes of functionals and boundary conditions, delayed systems, hybrid or nonsmooth systems, etc. 
An important extension is the case of state constraints (see [21, 32]): in that case the adjoint vector becomes a bounded variation measure and may have some jumps when the trajectory meets the boundary of the allowed state domain.

In practice, in order to compute optimal trajectories with the Pontryagin Maximum Principle, the first step is to make explicit the maximization condition. Under the usual strict Legendre assumption, that is, the Hessian ∂²H/∂u² (t, x, p, p^0, u) is negative definite, a standard implicit function argument allows one to express (locally) the optimal control u as a function of x and p. To simplify, below we assume that we are in the normal case (p^0 = −1). Plugging the resulting expression of the control in the Hamiltonian equations, and defining H_r(t, x, p) = H(t, x, p, −1, u(x, p)), it follows that every normal extremal is a solution of
\[ \dot{x}(t) = \frac{\partial H_r}{\partial p}(t, x(t), p(t)), \qquad \dot{p}(t) = -\frac{\partial H_r}{\partial x}(t, x(t), p(t)). \tag{10} \]
The exponential mapping is then defined by exp_{x_0}(t, p_0) = x(t, x_0, p_0), where the solution of (10) starting from (x_0, p_0) at t = 0 is denoted by (x(t, x_0, p_0), p(t, x_0, p_0)); here p_0 denotes the initial adjoint vector, not to be confused with the scalar p^0. It parametrizes the (normal) extremal flow, and is the natural extension to optimal control theory of the Riemannian exponential mapping. In Riemannian geometry the extremal equations correspond to the cotangent formulation of the geodesic equations. Note as well that the equations (10) are the cotangent version of the usual Euler–Lagrange equations of the calculus of variations.
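The construction of the exponential mapping can be made concrete on a scalar toy problem, chosen here only for illustration: ẋ = u with f^0(x, u) = (u² − x²)/2. In the normal case p^0 = −1 the Hamiltonian is H = pu − (u² − x²)/2, the strict Legendre assumption holds (∂²H/∂u² = −1), the maximization condition gives u = p, and H_r(x, p) = (p² + x²)/2. The sketch below integrates (10) with a classical fourth-order Runge–Kutta scheme; the function names and the choice of integrator are assumptions of this example.

```python
import math


def extremal_rhs(z):
    """System (10) for H_r(x, p) = (p**2 + x**2) / 2: dx/dt = p, dp/dt = -x."""
    x, p = z
    return (p, -x)


def rk4_step(f, z, h):
    """One classical Runge-Kutta step of size h for dz/dt = f(z)."""
    k1 = f(z)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(z, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(z, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(z, k3)))
    return tuple(a + h * (b + 2.0 * c + 2.0 * d + e) / 6.0
                 for a, b, c, d, e in zip(z, k1, k2, k3, k4))


def exp_map(x0, p0, t, steps=200):
    """Numerical exponential mapping exp_{x0}(t, p0) = x(t, x0, p0)."""
    z = (x0, p0)
    h = t / steps
    for _ in range(steps):
        z = rk4_step(extremal_rhs, z, h)
    return z[0]


if __name__ == "__main__":
    # For this toy problem the extremals are known in closed form:
    # x(t, x0, p0) = x0*cos(t) + p0*sin(t).
    print(exp_map(0.5, 2.0, 1.0), 0.5 * math.cos(1.0) + 2.0 * math.sin(1.0))
```

For this example the derivative of exp_{x_0}(t, ·) with respect to p_0 is sin t, so the mapping is a local diffeomorphism in p_0 for every t in (0, π).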
The abnormal extremal flow can be parametrized as well, provided that a Legendre assumption of this kind holds in the abnormal case. When the Hessian of the Hamiltonian is degenerate the situation is more intricate. This is the case for instance when one considers the minimal time problem for single-input control-affine systems ẋ(t) = f_0(x(t)) + u(t) f_1(x(t)) without constraint on controls. In that case, the maximization condition leads to ∂H/∂u = 0, that is, there must hold ⟨p(t), f_1(x(t))⟩ = 0 along the corresponding extremal. To compute the control, the method consists of differentiating this relation twice with respect to t, which leads first to ⟨p(t), [f_0, f_1](x(t))⟩ = 0 and then to ⟨p(t), [f_0, [f_0, f_1]](x(t))⟩ + u(t)⟨p(t), [f_1, [f_0, f_1]](x(t))⟩ = 0, where [·, ·] denotes the Lie bracket of vector fields. This again makes it possible to express the optimal control u(t) as a function of x(t) and p(t), provided that the quantity ⟨p(t), [f_1, [f_0, f_1]](x(t))⟩ does not vanish along the extremal (strong generalized Legendre–Clebsch condition, see [12]). It can also be shown that this kind of computation is valid in a generic situation (see [19, 20]).

Note that, when facing an optimal control problem, in general we have to deal with two extremal flows, distinguished by the binary variable p^0 ∈ {0, −1}. Due to additional constraints it is however expected that the abnormal flow fills less space than the normal one, in the sense that almost every point of the accessible set should be reached by a normal extremal. Such a statement is however difficult to derive. We refer to [43, 44, 49] for precise results for control-affine systems.

2.2. Second-order optimality conditions. We stress that the Pontryagin Maximum Principle is a first-order necessary condition for optimality. Conversely, in order to ensure that a given extremal is indeed optimal (at least locally), sufficient second-order conditions are required. 
For the sake of simplicity we assume that we are in the simplified situation where M_0 = {x_0}, M_1 = {x_1}, g = 0 and U = ℝ^m. In this situation, where there is no constraint on the control, conditions of order two are standard. Defining the usual intrinsic second-order derivative Q_T of the Lagrangian as the Hessian ∂²L_T/∂u² (u, ψ, ψ^0) restricted to the subspace ker ∂L_T/∂u, it is well known that a second-order necessary condition for optimality is that Q_T be nonpositive (recall the convention ψ^0 ≤ 0), and a second-order sufficient condition for local optimality is that Q_T be negative definite. As previously with the Pontryagin Maximum Principle, these second-order conditions can be parametrized along the extremals, as derived in the theory of conjugate points, whose main points are the following. Under the strict Legendre assumption, the quadratic form Q_T is negative definite whenever T > 0 is small enough. This leads naturally to defining the first conjugate time t_c along a given extremal as the infimum of times t > 0 such that Q_t has a nontrivial kernel. Under the strict Legendre assumption, there holds t_c > 0, and this first conjugate time characterizes the (local) optimality status of the trajectory, in the sense that the trajectory x(·) is locally optimal (in L^∞ topology) on [0, t] if and only if t < t_c (see [2, 12] for more details on that theory). The following result is important in view of practical computations of conjugate times.
Theorem 2.5. The time t_c is a conjugate time along x(·) if and only if the mapping exp_{x_0}(t_c, ·) is not an immersion at p_0 (that is, its differential is not injective).

Essentially, it states that computing a first conjugate time reduces to detecting the vanishing of some determinant along the extremal. Indeed, the fact that the exponential mapping is not an immersion can be translated in terms of so-called vertical Jacobi fields. Note however that the domain of definition of the exponential mapping requires particular attention in order to define these Jacobi fields properly according to the context: normal or abnormal extremal, final time fixed or not. A more complete exposition can be found in the survey article [11], which also provides some algorithms to compute first conjugate times in various contexts.

Remark 2.6. The conjugate point theory sketched above can be extended to more general situations, such as general initial and final sets (notion of focal point), discontinuous controls (see [37, 3]) by using the notion of extremal field (see [1]). We refer the reader to [47], whose introduction contains a brief survey giving a unifying point of view on the different approaches. Up to now a complete theory of conjugate points that would cover all possible cases (with trajectories involving singular arcs and/or boundary arcs) still does not exist.

2.3. Numerical methods in optimal control. It is usual to distinguish between direct and indirect numerical approaches in optimal control. Direct methods consist of discretizing the state and the control; after this discretization the problem is reduced to a nonlinear finite-dimensional optimization problem with constraints of the form
\[ \min\{ F(Z) \mid g(Z) = 0,\ h(Z) \le 0 \}. \tag{11} \]
The dimension is of course larger as the discretization is finer. There exist many ways to carry out such discretizations (collocation, spectral or pseudospectral methods, etc.). In any case, one has to choose finite-dimensional representations of the control and of the state, and then express in a discrete way the differential equation representing the system. Once all static or dynamic constraints have been transcribed into a problem with a finite number of variables, one has to solve the resulting optimization problem with constraints, using some adapted optimization method (gradient methods, penalization, dual methods, etc.). Note that this kind of method is easy to implement in the sense that it does not require a precise a priori knowledge of the optimal control problem. Moreover, it is easy to take into account state constraints or any other kinds of constraints in the optimal control problem. In this sense, this approach is not very sensitive to the model. In practice, note that it is possible to combine automatic differentiation software (such as the modeling language AMPL, see [25]) with expert optimization routines (such as the open-source package IPOPT, see [52]), which allows one to implement even difficult optimal control problems in a simple way, with a few lines of code. We refer the reader to [6] for an excellent survey on direct methods with a special interest in applications in aerospace.
Another approach, which we only mention briefly, is to solve numerically the Hamilton–Jacobi equation satisfied (in the viscosity sense) by the value function of the optimal control problem. We refer to [46] for level set methods but mention that this approach can only be applied to problems of very low dimension.

Indirect methods consist of solving numerically the boundary value problem derived from the application of the Pontryagin Maximum Principle, and lead to the shooting methods. More precisely, since every optimal trajectory is the projection of an extremal, the problem is reduced to finding an extremal, solution of the extremal equations and satisfying some boundary conditions:
\[ \dot{z}(t) = F(t, z(t)), \qquad R(z(0), z(t_f)) = 0. \]
Denoting by z(t, z_0) the solution of the Cauchy problem ż(t) = F(t, z(t)), z(0) = z_0, and setting G(z_0) = R(z_0, z(t_f, z_0)), this boundary value problem is then equivalent to solving G(z_0) = 0, that is, one should determine a zero of the so-called shooting function G. This can be achieved in practice by using a Newton-like method. This approach is called a shooting method. It has many possible refinements, among which is the multiple shooting method, which consists of subdividing the time interval into N intervals and of considering as unknowns the values at each node (with some gluing conditions). The aim is to improve the stability of the method (see e.g. [48]). From the practical implementation point of view, there exist many variants of Newton methods, among which the Broyden method and the Powell hybrid method are quite competitive. Moreover, as for direct methods, the shooting methods can be combined with automatic differentiation, used in order to generate the Hamiltonian equations of extremals.¹

¹ We refer to the code COTCOT (Conditions of Order Two and COnjugate Times), available for free on the web, documented in [11], implementing such issues as well as efficient algorithms to compute conjugate points.

It should be noted that, when implementing a shooting method, the structure of the trajectory has to be known a priori, particularly in the case where the trajectory involves singular arcs (see e.g. [8]). This raises the question of being able to determine at least locally the structure of optimal trajectories, as investigated in geometric optimal control theory (see further). Note that the shooting method is well-posed if and only if the Jacobian of the shooting function is invertible. In the simplest situation this is equivalent to requiring that the exponential mapping be a local immersion or, in other words, that the point under consideration not be a conjugate point.

Remark 2.7. As sketched above, direct methods consist of discretizing first, and then of dualizing (by applying to the discretized problem (11) some necessary condition for optimality), whereas indirect methods consist of dualizing first (by applying the Pontryagin Maximum Principle, which is a necessary condition for optimality), and then of discretizing (by applying a shooting method, that is, a Newton method composed with a numerical integration method). It must be noted that this diagram may fail to be commutative in general (see [31] for very simple counterexamples), due to a possible lack of coerciveness in the discrete scheme. In
other words, although the usual assumptions of consistency and stability (Lax scheme) allow one to show convergence of the indirect approach, they may be insufficient to ensure the convergence of the direct approach. Up to now, very few conditions on the numerical schemes are known that ensure the commutativity of this diagram (see [31, 7, 45]), and this problem is still widely open in general.

Remark 2.8. Direct and indirect approaches have their own advantages and drawbacks and are complementary. Roughly speaking, direct methods are less precise and more computationally demanding than indirect methods, but less sensitive and more robust, in particular with respect to the model (for instance, it is easy to add state constraints in a direct approach). Indirect methods inherit the main advantage of Newton methods, namely their extremely quick convergence and good accuracy, but also suffer from their main drawback, namely their sensitivity to the initial guess. Moreover, the structure of the trajectory has to be known a priori in an indirect approach. We refer to [6, 40] for details on these methods and some solutions to bypass the difficulties. The reader can also find many other references in the recent survey [51].

In this article we focus on applications of optimal control to aerospace, and in such problems indirect methods are often preferred because, although they are difficult to initialize and thus to make converge, they offer extremely good numerical accuracy and a very quick execution time. In the sequel we will show how this initialization difficulty can be overcome by combining optimal control tools with numerical continuation and geometric optimal control for orbit transfer problems (Section 3), and with dynamical systems theory for interplanetary mission design (Section 4).
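The shooting method of Section 2.3 can be illustrated on a toy problem. This is a minimal sketch and not the implementation used for the orbit transfer problem: it assumes the double integrator ẋ_1 = x_2, ẋ_2 = u with f^0 = u²/2, fixed final time 1, M_0 = {(0, 0)} and M_1 = {(1, 0)}. In the normal case the maximization condition gives u = p_2 and the adjoint equations are ṗ_1 = 0, ṗ_2 = −p_1. The shooting function maps the initial adjoint vector to x(1) − (1, 0), and its zero is sought with a Newton method using a finite-difference Jacobian. All function names are illustrative.

```python
def flow(p0, x0=(0.0, 0.0), T=1.0, steps=50):
    """Integrate the extremal system with RK4 and return the final state x(T)."""
    def rhs(z):
        x1, x2, p1, p2 = z
        return (x2, p2, 0.0, -p1)  # u = p2 maximizes H = p1*x2 + p2*u - u**2/2

    z = (x0[0], x0[1], p0[0], p0[1])
    h = T / steps
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs(tuple(a + 0.5 * h * b for a, b in zip(z, k1)))
        k3 = rhs(tuple(a + 0.5 * h * b for a, b in zip(z, k2)))
        k4 = rhs(tuple(a + h * b for a, b in zip(z, k3)))
        z = tuple(a + h * (b + 2.0 * c + 2.0 * d + e) / 6.0
                  for a, b, c, d, e in zip(z, k1, k2, k3, k4))
    return z[0], z[1]


def shooting_function(p0, target=(1.0, 0.0)):
    """G(p0) = x(T; p0) - target; a zero of G yields an extremal reaching the target."""
    x1, x2 = flow(p0)
    return (x1 - target[0], x2 - target[1])


def newton_2d(G, p, tol=1e-10, max_iter=20, eps=1e-6):
    """Newton iteration in dimension 2 with a forward-difference Jacobian."""
    for _ in range(max_iter):
        g = G(p)
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        ga = G((p[0] + eps, p[1]))
        gb = G((p[0], p[1] + eps))
        j11, j21 = (ga[0] - g[0]) / eps, (ga[1] - g[1]) / eps
        j12, j22 = (gb[0] - g[0]) / eps, (gb[1] - g[1]) / eps
        det = j11 * j22 - j12 * j21
        # Solve J d = -g by Cramer's rule.
        d1 = (-g[0] * j22 + g[1] * j12) / det
        d2 = (-g[1] * j11 + g[0] * j21) / det
        p = (p[0] + d1, p[1] + d2)
    return p


if __name__ == "__main__":
    print(newton_2d(shooting_function, (0.0, 0.0)))
```

For this linear-quadratic example the shooting function is affine, so the Newton iteration converges from any initial guess in one or two steps; the closed-form solution is p(0) = (12, 6), that is, u(t) = 6 − 12t. The sensitivity to the initial guess discussed above only shows up for nonlinear problems such as (3).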
3. Solving the orbit transfer problem by continuation

Coming back to the orbit transfer problem (3) mentioned in the introduction as a motivating example, one immediately realizes that the main difficulty of this problem is that the maximal authorized thrust modulus is very low. It is not surprising to observe numerically that the lower the maximal thrust, the smaller the domain of convergence of the Newton method in the shooting problem. Under these conditions it is natural to consider first a larger value of $T_{\max}$ (e.g., 60 N), for which the domain of convergence of the shooting method is much larger and the shooting method is therefore easy to initialize successfully, and then to decrease the value of $T_{\max}$ step by step down to $T_{\max} = 0.1$ N. At each step, the shooting method is initialized with the solution obtained at the previous step. This strategy was implemented in [15] to realize the minimal time 3D transfer of a satellite from a low, eccentric and inclined initial orbit to the geostationary orbit. From the mathematical point of view, this method consists of implementing a so-called continuation, or homotopy, on the parameter $T_{\max}$. To ensure its feasibility, one should at least ensure that, along the continuation path, the optimal
Emmanuel Trélat
solution depends continuously on the continuation parameter. Let us provide several mathematical details on the continuation method.

The objective of continuation or homotopy methods is to solve a problem step by step, starting from a simpler one, by parameter deformation. There exists a well-developed theory, together with many algorithms and numerical methods implementing these ideas, and the field of applications encompasses Brouwer fixed point problems, polynomial and nonlinear systems of equations, boundary value problems in many diverse forms, etc. We refer the reader to [4] for a complete report on these theories and methods.

Here we use the continuation or homotopy approach in order to solve the shooting problem resulting from the application of the Pontryagin Maximum Principle to an optimal control problem. More precisely, the method consists of deforming the problem into a simpler one that we are able to solve (without any careful initialization of the shooting method), and then of solving a series of shooting problems parametrized by some parameter $\lambda \in [0, 1]$, step by step, to come back to the original problem. The continuation method consists of tracking the set of zeros of the shooting function as the parameter $\lambda$ evolves. Numerical continuation can fail whenever the path of zeros that is tracked has bifurcation points or, more generally, singularities, or whenever this path fails to exist globally and does not reach the final desired value of the parameter (say, $\lambda = 1$).

Let us briefly give the basic arguments ensuring the local feasibility of the continuation method. From the theoretical point of view, regularity properties require at least that the optimal solution be continuous, or differentiable, with respect to the parameter $\lambda$. This kind of property is usually derived using an implicit function argument, which is encountered in the literature as sensitivity analysis.
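The step-by-step procedure described above can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the method of [15]: the control system (a pendulum whose gravity term is scaled by $\lambda$), the constant `K`, the targets and all function names are hypothetical choices made for the example. The cost is $\frac12\int_0^T u^2\,dt$, so the Pontryagin Maximum Principle gives $u = p_v$, and the shooting function maps the initial adjoint vector to the mismatch on the final state. For $\lambda = 0$ the problem is linear and the Newton method converges from a zero initial guess; each later step is warm-started from the previous solution.

```python
import math

T = 1.0                              # fixed final time (assumed for this toy example)
K = 3.0                              # strength of the nonlinearity reached at lam = 1
X_TARGET, V_TARGET = math.pi, 0.0    # hypothetical final state

def rk4(f, y, t0, t1, n):
    """Integrate y' = f(t, y) from t0 to t1 with n classical RK4 steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def shoot(p0, lam):
    """Shooting function: final-state mismatch for the initial adjoint p0 = (px, pv)."""
    def rhs(t, y):
        x, v, px, pv = y
        u = pv  # maximizes H = px*v + pv*(-lam*K*sin(x) + u) - u**2/2
        return [v, -lam * K * math.sin(x) + u,
                lam * K * pv * math.cos(x), -px]
    x, v, _, _ = rk4(rhs, [0.0, 0.0, p0[0], p0[1]], 0.0, T, 200)
    return [x - X_TARGET, v - V_TARGET]

def newton(p, lam, tol=1e-9, max_iter=30, eps=1e-6):
    """Newton method on the shooting function, with a finite-difference Jacobian."""
    for _ in range(max_iter):
        s = shoot(p, lam)
        if max(abs(s[0]), abs(s[1])) < tol:
            return p
        sx = shoot([p[0] + eps, p[1]], lam)
        sy = shoot([p[0], p[1] + eps], lam)
        a, c = (sx[0] - s[0]) / eps, (sx[1] - s[1]) / eps   # first Jacobian column
        b, d = (sy[0] - s[0]) / eps, (sy[1] - s[1]) / eps   # second Jacobian column
        det = a * d - b * c
        if abs(det) < 1e-14:
            raise RuntimeError("singular Jacobian: the tracked path hit a singularity")
        # Solve J * dp = -s for the 2x2 system.
        p = [p[0] + (-d * s[0] + b * s[1]) / det,
             p[1] + (c * s[0] - a * s[1]) / det]
    raise RuntimeError("Newton method did not converge")

def continuation(steps=10):
    """Discrete continuation: solve for lam = 0, 1/steps, ..., 1 with warm starts."""
    p = [0.0, 0.0]          # no careful initialization is needed at lam = 0
    for k in range(steps + 1):
        p = newton(p, k / steps)
    return p
```

Calling `continuation()` returns an initial adjoint vector that zeroes the shooting function of the problem at $\lambda = 1$. In this sketch the step in $\lambda$ is fixed; in practice one would typically adapt it, reducing the step when the Newton method fails.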
Let us explain the general reasoning of sensitivity analysis, in the simplified situation where $M_0 = \{x_0\}$, $M_1 = \{x_1\}$ and $U = \mathbb{R}^m$. We are faced with a family of optimal control problems, parametrized by $\lambda$, that can be written, as in (5), in the form

\[
\min_{E_{x_0,T,\lambda}(u) = x_1} C_{T,\lambda}(u), \tag{12}
\]

that is, in the form of an optimization problem depending on the parameter $\lambda$. According to the Lagrange multipliers rule, if $u_\lambda$ is optimal then there exists $(\psi_\lambda, \psi_\lambda^0) \in (\mathbb{R}^n \times \mathbb{R}) \setminus \{0\}$ such that $\psi_\lambda \, dE_{x_0,T,\lambda}(u_\lambda) + \psi_\lambda^0 \, dC_{T,\lambda}(u_\lambda) = 0$. Assume that there are no minimizing abnormal extremals in the problem. Under this assumption, since the Lagrange multiplier $(\psi_\lambda, \psi_\lambda^0)$ is defined up to a multiplicative scalar, we can assume that $\psi_\lambda^0 = -1$. Then, we are seeking $(u_\lambda, \psi_\lambda)$ such that $F(\lambda, u_\lambda, \psi_\lambda) = 0$, where the function $F$ is defined by

\[
F(\lambda, u, \psi) =
\begin{pmatrix} \dfrac{\partial L_{T,\lambda}}{\partial u}(u, \psi) \\[2mm] E_{x_0,T,\lambda}(u) - x_1 \end{pmatrix}
=
\begin{pmatrix} \psi \, dE_{x_0,T,\lambda}(u) - dC_{T,\lambda}(u) \\[1mm] E_{x_0,T,\lambda}(u) - x_1 \end{pmatrix},
\]

where $L_{T,\lambda}(u, \psi) = \psi E_{x_0,T,\lambda}(u) - C_{T,\lambda}(u)$. Let $(\bar\lambda, u_{\bar\lambda}, \psi_{\bar\lambda})$ be a zero of $F$. Assume that $F$ is of class $C^1$. If the Jacobian of $F$ with respect to $(u, \psi)$, taken at the point $(\bar\lambda, u_{\bar\lambda}, \psi_{\bar\lambda})$, is invertible, then according to a usual implicit function argument one can solve the equation $F(\lambda, u_\lambda, \psi_\lambda) = 0$ for $\lambda$ close to $\bar\lambda$, and the solution $(u_\lambda, \psi_\lambda)$ depends in a $C^1$ way on the parameter $\lambda$. The Jacobian of $F$ with respect to $(u, \psi)$ is

\[
\begin{pmatrix} Q_{T,\lambda} & dE_{x_0,T,\lambda}(u)^* \\ dE_{x_0,T,\lambda}(u) & 0 \end{pmatrix}, \tag{13}
\]

where $Q_{T,\lambda}$ is defined as the restriction of the Hessian $\dfrac{\partial^2 L_{T,\lambda}}{\partial u^2}(u, \psi)$ to the subspace $\ker \dfrac{\partial L_{T,\lambda}}{\partial u}$, and $dE_{x_0,T,\lambda}(u)^*$ is the transpose of $dE_{x_0,T,\lambda}(u)$. The matrix (13) (which is a matrix of operators) is called the sensitivity matrix in sensitivity analysis. It is easy to prove that it is invertible if and only if the linear mapping $dE_{x_0,T,\lambda}(u)$ is surjective and the quadratic form $Q_{T,\lambda}$ is nondegenerate. The surjectivity of $dE_{x_0,T,\lambda}(u)$ means exactly that the control $u$ is not singular (see Definition 2.2). The nondegeneracy of $Q_{T,\lambda}$ is exactly related to the concept of conjugate point (see Section 2.2). Note that, as long as we do not encounter any conjugate time along the continuation path, the extremals that are computed are locally optimal. It follows that, to ensure the surjectivity of $dE_{x_0,T,\lambda}(u)$ along the continuation process, it suffices to assume the absence of singular minimizing trajectories (in the simplified problem considered here, with unconstrained controls, singular trajectories are exactly the projections of abnormal extremals). Therefore, we conclude that, as long as we encounter neither a minimizing singular control nor a conjugate point along the continuation procedure, the continuation method is locally feasible, and the extremal solution $(u_\lambda, \psi_\lambda)$ which is locally computed as above is of class $C^1$ with respect to the parameter $\lambda$.

Remark 3.1. The absence of conjugate points can be tested numerically, as explained previously. The assumption of the absence of minimizing singular trajectories is of a much more geometric nature. Such results exist for some classes of control-affine systems under some strong Lie bracket assumptions (see [2, 12]). Moreover, it is proved in [19, 20] that for generic (in the Whitney sense) control-affine systems with more than two controls, there is no minimizing singular trajectory; hence for such systems the assumption of the absence of minimizing singular trajectories is automatically satisfied.
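The invertibility criterion for the sensitivity matrix (13) can be checked by hand in a finite-dimensional analogue. The sketch below is only an illustration under simplifying assumptions: the control space is replaced by $\mathbb{R}^2$, the differential of the end-point mapping by a row vector `a`, and the Hessian of the Lagrangian by a symmetric 2x2 matrix `q`; the function names are hypothetical. In this setting the determinant of the block matrix equals $-z^\top Q z$, where $z = (a_2, -a_1)$ spans the kernel of `a`, so the matrix is invertible exactly when `a` is nonzero (surjectivity) and the quadratic form does not vanish on the kernel (nondegeneracy).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def sensitivity_matrix(q, a):
    """Finite-dimensional analogue of (13): the block matrix [[Q, A^T], [A, 0]]."""
    return [[q[0][0], q[0][1], a[0]],
            [q[1][0], q[1][1], a[1]],
            [a[0], a[1], 0]]

def form_on_kernel(q, a):
    """Value z^T Q z at z = (a2, -a1), which spans ker A whenever A is nonzero."""
    z = (a[1], -a[0])
    return sum(z[i] * q[i][j] * z[j] for i in range(2) for j in range(2))

def is_invertible(q, a):
    """True when the analogue of the sensitivity matrix is invertible."""
    return det3(sensitivity_matrix(q, a)) != 0

Q = [[1, 0], [0, -1]]                 # an indefinite Hessian, chosen for illustration
regular = is_invertible(Q, (1, 0))    # A surjective, Q nondegenerate on ker A
degenerate = is_invertible(Q, (1, 1)) # Q vanishes on ker A (conjugate-point analogue)
singular = is_invertible(Q, (0, 0))   # A not surjective (singular-control analogue)
```

With these data `regular` is `True` while `degenerate` and `singular` are `False`, which mirrors the two ways in which the implicit function argument can break down.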
To ensure the global feasibility of the continuation procedure, we must ensure that the path of zeros is globally defined on $[0, 1]$. It could indeed happen that the path either reaches some singularity or wanders off to infinity before reaching $\lambda = 1$. To eliminate the first possibility, we can assume the absence of minimizing singular trajectories and of conjugate points over the whole domain under consideration (not only along the continuation path). As remarked above, the absence of singular minimizing trajectories over the whole space is generic for large classes of systems, hence this is a reasonable assumption; however, the global absence of conjugate points is a strong assumption, and there are other ways to handle this difficulty (see Remark 3.3 below). To eliminate the second possibility, we must provide sufficient conditions ensuring that the tracked paths remain bounded. In other words, we have to ensure that the initial adjoint vectors computed along the continuation procedure remain bounded, uniformly with respect to the homotopy parameter $\lambda$. This fact is exactly ensured by assuming the absence of minimizing abnormal extremals over the domain (see [51] for details, see also [49]), hence by one of the (generic) assumptions made above.
Proposition 3.2. In the simplified case where $M_0 = \{x_0\}$, $M_1 = \{x_1\}$ and $U = \mathbb{R}^m$, if there are neither minimizing singular trajectories nor conjugate points over the whole domain, then the continuation procedure is globally feasible on $[0, 1]$.

Remark 3.3. There are other ways to eliminate the first possibility above. Singularities due to conjugate points may be either detected and then handled with specific methods (see [4]), or removed generically by Sard arguments, by considering a global perturbation of the homotopy function (probability-one homotopy method, see the survey [53]).

We also refer the reader to [51] for a more detailed discussion of continuation and homotopy methods. The continuation method is a powerful tool that significantly improves the efficiency of shooting approaches and makes them realistic in many applications. This combination has been applied successfully to a number of aerospace problems (see e.g. [9, 15, 26]), and to the application below.

Automatic solving of the optimal flight of the last stage of Ariane launchers. We mention here our recent work with EADS Astrium (Les Mureaux, France), which consists of solving the minimal consumption transfer for the last stage of the Ariane V and forthcoming Ariane VI launchers. The objective is to obtain robust software able to provide automatically (that is, without any careful initialization) and instantaneously (actually, within one second) the optimal solution of that problem, for any possible initial and final conditions prescribed by the user. This software is operational on a very large range of possible values, covering in particular the domain of applications usually treated at EADS Astrium for civilian launchers. Successfully integrated into the global optimization tools of EADS Astrium, this real-time algorithm brought a significant improvement for Ariane V trajectory planning and also allows one to consider new strategies for the forthcoming Ariane VI launchers.
From the mathematical point of view, the approach is based on a combination of the Pontryagin Maximum Principle with numerical continuation methods, with a refined geometric analysis of the extremal flow, and with recent tools of geometric optimal control (see below). Although we are not allowed to describe the precise method we employed, we mention some byproduct works carried out in collaboration with EADS. In [17] we provide an alternative approach to the strong-thrust minimal consumption orbit transfer planning problem: we first consider the problem for a flat model of the Earth with constant gravity (which is extremely easy to solve), and then introduce step by step, by continuation, the variable gravity and the curvature of the Earth, in order to end up with the true model. In [30] we show how a shadow cone (eclipse) constraint can be taken into account in the orbit transfer problem, by defining a hybridization of the problem in which the controlled vector fields are zero when crossing the shadow cone. A regularization procedure consisting of smoothing the system, combined with a continuation, is also implemented, and strong convergence properties of the smoothing procedure are derived.
We end this section with a very brief overview of geometric optimal control.

Geometric optimal control. Geometric optimal control can be described as the combination of the knowledge inferred from the Pontryagin Maximum Principle with geometric considerations such as the use of Lie brackets, of subanalytic sets, of differential geometry on manifolds, of symplectic geometry and Hamiltonian systems, and of singularity theory, with the ultimate objective of deriving optimal synthesis results that describe precisely the structure of optimal trajectories. In other words, the objective is to derive results stating that, for the class of control systems under consideration, the optimal trajectories have a precise structure and are of a given kind. The geometric tools mentioned above complement the Pontryagin Maximum Principle whenever its application alone is insufficient to adequately solve an optimal control problem, owing to a lack of information. We refer the reader to the textbooks [2, 12, 33] and to [51] for a more detailed exposition of geometric optimal control and of other applications to aerospace (in particular, the atmospheric re-entry of a space shuttle), where it is shown how results of geometric optimal control theory can help a shooting method converge, or at least can simplify its implementation by describing precisely the structure of the optimal trajectory.
4. Dynamical systems theory and mission design

We now switch to another problem, also studied in collaboration with EADS Astrium, on mission design planning with the help of Lagrange points. The objective here is to show how dynamical systems theory can supply new directions for control issues.

4.1. Dynamics around Lagrange points. Consider the so-called circular restricted three-body problem, in which a body with negligible mass evolves in the gravitational field of two masses $m_1$ and $m_2$, called primaries, which are assumed to have circular coplanar orbits with the same period around their center of mass. The gravitational forces exerted by any other planet or body are neglected. In the solar system this problem provides a good approximation for studying a large class of problems. In a rotating frame the equations are of the form

\[
\ddot{x} - 2\dot{y} = \frac{\partial \Phi}{\partial x}, \qquad
\ddot{y} + 2\dot{x} = \frac{\partial \Phi}{\partial y}, \qquad
\ddot{z} = \frac{\partial \Phi}{\partial z},
\]

with

\[
\Phi(x, y, z) = \frac{x^2 + y^2}{2} + (1 - \mu)\big((x + \mu)^2 + y^2 + z^2\big)^{-1/2} + \mu\big((x - 1 + \mu)^2 + y^2 + z^2\big)^{-1/2} + \frac{\mu(1 - \mu)}{2}.
\]

These equations have the first integral (called the Jacobi first integral) $J = 2\Phi - (\dot{x}^2 + \dot{y}^2 + \dot{z}^2)$, hence the solutions evolve on a five-dimensional energy manifold, the topology of which determines the so-called Hill's region of possible motions (see e.g. [34]). It is well known that the above dynamics admit five equilibrium points called Lagrange points: three collinear ones $L_1$, $L_2$ and $L_3$ (already known by Euler),
and the equilateral ones $L_4$ and $L_5$ (see Figure 1). The linearized system around these equilibrium points admits eigenvalues with zero real part, hence the study of their stability is not obvious. It follows from a generalization of a theorem of Lyapunov (due to Moser [39]) that, for a value of the Jacobi integral slightly less than that of the Lagrange points, the solutions have the same qualitative behavior as the solutions of the linearized system around the Lagrange points. It was then established in [36] that the three collinear Lagrange points are always unstable, whereas $L_4$ and $L_5$ are stable under some conditions (which are satisfied in the solar system, for instance, for the Earth–Moon system, or for the system formed by the Sun and any other planet).

Figure 1. Lagrange points in the Sun–Earth system, and phase portrait around them.

These five equilibrium points are natural candidate sites for space observation. At first one may think of the stability of $L_4$ and $L_5$ as good news: indeed, if one places an observation spacecraft near one of these points, then the spacecraft will remain in a neighborhood of the Lagrange point. However, because of this stability, the neighborhood of these stable points is full of small particles that have been trapped in the potential well by astronomical chance.² These small particles are of course extremely dangerous for a spacecraft, and in the end these sites must be avoided. In view of that, the instability of $L_1$, $L_2$ and $L_3$ is ultimately good news, because the vicinity of these points is clean in some sense. The counterpart is that, since they are unstable, any spacecraft located in the neighborhood of such a point must be stabilized by means of a control. However, stabilizing a control system near an unstable equilibrium point requires only little energy, and hence this appears to be a good strategy for astronomical observation. The Lagrange points $L_1$ and $L_2$ of the Sun–Earth system have indeed been used for such purposes for years: the satellite SOHO, whose mission is to observe the surface of the Sun, is located near³ the point $L_1$; the James Webb Space Telescope (JWST) will be launched within the next years and placed near the point $L_2$, which is an ideal site to observe the cosmos. There are many other objects near Lagrange points.

The dynamics around these Lagrange points have particularly interesting features for space mission design. Using the Lyapunov–Poincaré theorem, it is shown that there exists a two-parameter family of periodic trajectories around every Lagrange point (see [36], see also [13]), among which the well-known halo orbits are periodic orbits that are diffeomorphic to circles (see [14]), and whose interest for mission design was highlighted by Farquhar (see [23, 24]). There exist many other families of periodic orbits (called Lissajous orbits) and quasi-periodic orbits around Lagrange points (see [28, 29]). The invariant (stable and unstable) manifolds of these periodic orbits, consisting of all trajectories converging to the orbit (as the time tends to $\pm\infty$), are four-dimensional tubes, topologically equivalent to $S^3 \times \mathbb{R}$, in the five-dimensional energy manifold (see [27]). Hence they play the role of separatrices. Roughly speaking, these tubes can be seen as gravity currents, similar to ocean currents except that their existence is due to gravity effects. Therefore they can be used for mission design and space exploration, since a trajectory starting inside such a tube (called a transit orbit) stays inside this tube. Many recent studies have been devoted to this subject. It can be noted, however, that the invariant manifolds of halo orbits (which can really be seen as tubes) are chaotic in large time: far from the halo orbit they lose their tube-like aspect and behave in a chaotic way (see [34]). In contrast, the invariant manifolds of eight-shaped Lissajous orbits⁴ (which are eight-shaped tubes) are numerically shown in [5] to keep their regular structure globally in time (see Figure 2 on the left). Moreover, in the Earth–Moon system, it is shown that they make it possible to fly over almost all the surface of the Moon, passing very close to the surface (between 1500 and 5000 kilometers, see Figure 2 on the right). These features are of particular interest for designing low-cost space missions to the Moon. Indeed, in future space exploration the Moon could serve as an intermediate point (with a lunar space station) for more distant space missions.

Figure 2. Invariant manifolds of an Eight Lissajous orbit, in the Earth–Moon system (left), and a zoom around the Moon (right).

² As a striking example, we mention the Trojan asteroids, located near the points $L_4$ and $L_5$ in the Sun–Jupiter system.

³ Actually not exactly: the satellite SOHO is stabilized along a halo orbit of quite large amplitude around the point $L_1$.

⁴ Eight-shaped Lissajous orbits are the Lissajous orbits of the second kind. They are diffeomorphic to a curve having the shape of an eight. They are chiefly investigated in [5].

4.2. Applications to mission design and challenges. The idea of using the specific properties of the dynamics around Lagrange points in order to explore lunar regions is far from new but has recently received renewed interest. In [34, 38], the authors combine low-thrust propulsion with the favorable properties of invariant manifolds of periodic orbits around Lagrange points in order to design low-cost trajectories for space exploration. Their techniques consist of stating an optimal control problem that is solved numerically using either a direct or an indirect transcription, carefully initialized with the trajectories of the previously studied system (with no thrust). In this way they are able to reach a reasonable compromise between fuel consumption and transfer time, and to design trajectories requiring moderate propellant mass and reaching the target within reasonable time. In these studies the circular restricted three-body problem approximation studied above is used to provide an appropriate first guess for carefully initializing an optimal control method (for instance, a shooting method) applied to a more precise model. In view of that, and having in mind the previous methodology based on continuation, it is natural to develop an optimal planning method based on the combination of the dynamics of the three-body problem with a continuation on the value of the maximal authorized thrust. This idea opens new directions for future investigations and is a promising method for efficiently designing space missions with low fuel consumption.
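As a concrete reminder of the natural (thrust-free) dynamics that supply such first guesses, the following sketch evaluates the potential $\Phi$ and the Jacobi integral of the circular restricted three-body problem of Section 4.1 and locates the five Lagrange points. It is a minimal illustration under stated assumptions: the value `MU = 0.01215` is an approximate Earth–Moon mass parameter chosen for the example, $\mu$ is taken to be the normalized mass of the smaller primary located at $(1 - \mu, 0, 0)$, and all function names are hypothetical. The collinear points are found by bisection on $\partial\Phi/\partial x$ along the $x$-axis, which is increasing on each of the three intervals delimited by the primaries.

```python
import math

MU = 0.01215  # assumed approximate Earth-Moon mass parameter

def phi(x, y, z, mu=MU):
    """Potential Phi of the rotating-frame equations."""
    r1 = math.sqrt((x + mu) ** 2 + y * y + z * z)      # distance to the larger primary
    r2 = math.sqrt((x - 1 + mu) ** 2 + y * y + z * z)  # distance to the smaller primary
    return 0.5 * (x * x + y * y) + (1 - mu) / r1 + mu / r2 + 0.5 * mu * (1 - mu)

def grad_phi(x, y, z, mu=MU):
    """Analytic gradient of Phi."""
    r1c = ((x + mu) ** 2 + y * y + z * z) ** 1.5
    r2c = ((x - 1 + mu) ** 2 + y * y + z * z) ** 1.5
    gx = x - (1 - mu) * (x + mu) / r1c - mu * (x - 1 + mu) / r2c
    gy = y - (1 - mu) * y / r1c - mu * y / r2c
    gz = -(1 - mu) * z / r1c - mu * z / r2c
    return gx, gy, gz

def rhs(s, mu=MU):
    """First-order form of the equations of motion, s = (x, y, z, vx, vy, vz)."""
    x, y, z, vx, vy, vz = s
    gx, gy, gz = grad_phi(x, y, z, mu)
    return [vx, vy, vz, gx + 2 * vy, gy - 2 * vx, gz]

def jacobi(s, mu=MU):
    """Jacobi first integral J = 2*Phi - |velocity|^2."""
    x, y, z, vx, vy, vz = s
    return 2 * phi(x, y, z, mu) - (vx * vx + vy * vy + vz * vz)

def rk4_step(s, h, mu=MU):
    """One classical RK4 step of size h."""
    def shifted(k, c):
        return [si + c * ki for si, ki in zip(s, k)]
    k1 = rhs(s, mu)
    k2 = rhs(shifted(k1, h / 2), mu)
    k3 = rhs(shifted(k2, h / 2), mu)
    k4 = rhs(shifted(k3, h), mu)
    return [si + h / 6 * (a + 2 * b + 2 * c + d)
            for si, a, b, c, d in zip(s, k1, k2, k3, k4)]

def collinear_point(lo, hi, mu=MU, tol=1e-12):
    """Bisection on dPhi/dx along the x-axis (increasing between singularities)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if grad_phi(mid, 0.0, 0.0, mu)[0] > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lagrange_points(mu=MU, gap=1e-6):
    """(x, y) coordinates of the five equilibrium points in the rotating frame."""
    return {
        "L1": (collinear_point(-mu + gap, 1 - mu - gap, mu), 0.0),
        "L2": (collinear_point(1 - mu + gap, 2.0, mu), 0.0),
        "L3": (collinear_point(-2.0, -mu - gap, mu), 0.0),
        "L4": (0.5 - mu, math.sqrt(3) / 2),
        "L5": (0.5 - mu, -math.sqrt(3) / 2),
    }
```

A trajectory integrated with `rk4_step` from a state close to `L1` can then serve as a thrust-free first guess, and monitoring `jacobi` along it gives a simple check of the integration accuracy.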
Although the properties of the dynamics around Lagrange points have been widely used for developing planning strategies, to the best of our knowledge they have not yet been combined with continuation procedures that would make it possible to introduce, for instance, the gravitational effects of other bodies, or values of the maximal thrust that are low or mild, or other more complex models. This is a challenge for potential future studies. Note that, in [41], the author implements a numerical continuation procedure to compute minimal-energy low-thrust trajectories steering the spacecraft from the Earth to the Lagrange point $L_1$ in the Earth–Moon system, by making a continuation on the gravitational constant of the Moon. The continuation procedure is initialized with the usual Kepler transfer, in which the Moon coincides with the point $L_1$, and ends up with a trajectory reaching the point $L_1$ with a realistic gravitational effect of the Moon.

With a view to future mission design, a precise cartography should be established of all invariant manifolds generated by all possible periodic orbits (not only halo or eight-shaped orbits) around Lagrange points. The existence of such invariant manifolds indeed makes the design of low-cost interplanetary missions possible. The design of trajectories taking advantage of these gravity currents, of the gravitational effects of celestial bodies of the solar system, and of "swing-by" strategies is a difficult problem related to techniques of continuous and discrete optimization (multidisciplinary optimization). It is an open challenge to design a tool combining refined techniques of nonlinear optimal control, continuation procedures, mixed optimization, and global optimization procedures.
Another challenge, which must be solved within the next years, is the problem of debris cleaning. Indeed, a drastic growth of space debris has recently been observed in the space around the Earth, in particular near the Sun-synchronous orbit (SSO) and polar orbits with altitude between 600 and 1200 km (these orbits are intensively used for Earth observation): there are around 22000 debris of more than 10 cm (which are cataloged), around 500000 debris between 1 and 10 cm (which are not cataloged), and, probably, millions of smaller debris that cannot be detected. These debris are due to former satellites that were abandoned, and they now cause high collision risks for future space flights. It has become an urgent challenge to clean the space at least of its biggest debris in order to stabilize the debris population; otherwise it will soon become impossible to launch additional satellites. At present, all space agencies in the world are aware of this problem and are working to provide efficient solutions for designing space debris collecting missions. One of them, currently carried out with EADS Astrium (see [16]), consists of deorbiting five heavy debris per year, selected from a list of debris (in the LEO region) so that the fuel consumption required for the mission is minimized. The problem to be solved turns into a global optimization problem consisting of several continuous transfer problems and of a combinatorial path problem (selection of the debris and of the collecting order). It is not easy to solve, since it must combine continuous optimal control methods with combinatorial optimization and with other considerations that are specific to the problem. The results of [16] (which are valid for high-thrust engines) provide first solutions in this direction, and open new problems for further investigation.
For instance, it is an open problem to design efficient space cleaning missions for low-thrust engines, taking advantage of the gravitational effects due to Lagrange points and to the invariant manifolds associated with their periodic orbits. Such studies can probably be carried out with appropriate continuation procedures, carefully initialized with trajectories computed from the natural dynamics of the three-body problem. One of the top priorities in the next years is to clean the space of its biggest fragments. Although we have at our disposal a precise catalog of fragments, wreckage and scraps, it is a challenging problem to design optimally a space vehicle able to collect in minimal time a certain number of fragments, themselves chosen in advance from the catalog in an optimal way. This problem combines techniques of continuous optimal control, in order to determine a minimal time trajectory between two successive fragments, with techniques of discrete optimization for the best possible choice of the fragments to be collected.
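The combinatorial layer of such a collecting mission can be illustrated on a toy instance. The sketch below is not the method of [16]; it assumes that the cost of every elementary transfer (fuel or time) has already been computed by a continuous optimal control solver and stored in a hypothetical matrix `cost`, and it simply enumerates all ordered selections of `k` debris. The function name and the numerical values are made up for the example, and the exhaustive search is only practical for very small catalogs.

```python
from itertools import permutations

def best_mission(cost, start, k):
    """Return (total_cost, order) of the cheapest mission visiting k debris.

    cost[i][j] is the (precomputed, hypothetical) cost of the transfer from
    object i to object j; `start` is the index of the initial orbit.
    """
    debris = [i for i in range(len(cost)) if i != start]
    best = (float("inf"), ())
    # permutations(debris, k) enumerates every selection of k debris in every order.
    for order in permutations(debris, k):
        total = cost[start][order[0]]
        total += sum(cost[a][b] for a, b in zip(order, order[1:]))
        if total < best[0]:
            best = (total, order)
    return best

# Toy data: index 0 is the initial orbit, indices 1 to 3 are candidate debris.
COST = [
    [0, 4, 2, 7],
    [4, 0, 1, 3],
    [2, 1, 0, 5],
    [7, 3, 5, 0],
]
total, order = best_mission(COST, start=0, k=2)
```

On this toy matrix the cheapest two-debris mission visits debris 2 and then debris 1, for a total cost of 3. In a realistic setting each entry of the matrix is itself the value of an optimal transfer problem, which is what makes the coupled problem expensive.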
References

[1] A. Agrachev and R. Gamkrelidze, Symplectic methods for optimization and control. In: Geometry of Feedback and Optimal Control (ed. by B. Jakubczyk and W. Respondek). Marcel Dekker, 1998, 19–78.
[2] A. Agrachev and Y. Sachkov, Control theory from the geometric viewpoint. Encyclopaedia Math. Sciences 87, Springer-Verlag, 2004.
[3] A. Agrachev, G. Stefani, and P. Zezza, Strong optimality for a bang-bang trajectory. SIAM J. Control Optim. 41 no. 4 (2002), 991–1014.
[4] E. Allgower and K. Georg, Numerical continuation methods. An introduction. Springer Series in Computational Mathematics 13, Springer-Verlag, Berlin, 1990.
[5] G. Archambeau, P. Augros, and E. Trélat, Eight Lissajous orbits in the Earth–Moon system. MathS in Action 4 no. 1 (2011), 1–23.
[6] J. T. Betts, Practical methods for optimal control and estimation using nonlinear programming. Second edition, Advances in Design and Control 19, SIAM, Philadelphia, PA, 2010.
[7] F. Bonnans and J. Laurent-Varin, Computation of order conditions for symplectic partitioned Runge–Kutta schemes with application to optimal control. Numer. Math. 103 (2006), 1–10.
[8] F. Bonnans, P. Martinon, and E. Trélat, Singular arcs in the generalized Goddard's problem. J. Optim. Theory Appl. 139 no. 2 (2008), 439–461.
[9] B. Bonnard and J.-B. Caillau, Riemannian metric of the averaged energy minimization problem in orbital transfer with low thrust. Ann. Inst. H. Poincaré Anal. Non Linéaire 24 no. 3 (2007), 395–411.
[10] B. Bonnard, J.-B. Caillau, and E. Trélat, Geometric optimal control of elliptic Keplerian orbits. Discrete Contin. Dyn. Syst. Ser. B 5 no. 4 (2005), 929–956.
[11] B. Bonnard, J.-B. Caillau, and E. Trélat, Second order optimality conditions in the smooth case and applications in optimal control. ESAIM Control Optim. Calc. Var. 13 no. 2 (2007), 207–236.
[12] B. Bonnard and M. Chyba, The role of singular trajectories in control theory. Springer-Verlag, 2003.
[13] B. Bonnard, L. Faubourg, and E. Trélat, Mécanique céleste et contrôle de systèmes spatiaux. Math. & Appl. 51, Springer-Verlag, 2006.
[14] J. V. Breakwell and J. V. Brown, The halo family of 3-dimensional periodic orbits in the Earth–Moon restricted 3-body problem. Celestial Mechanics 20 (1979), 389–404.
[15] J.-B. Caillau, J. Gergaud, and J. Noailles, 3D geosynchronous transfer of a satellite: continuation on the thrust. J. Optim. Theory Appl. 118 no. 3 (2003), 541–565.
[16] M. Cerf, Multiple space debris collecting mission: debris selection and trajectory optimization. To appear in J. Optim. Theory Appl. (2012).
[17] M. Cerf, T. Haberkorn, and E. Trélat, Continuation from a flat to a round Earth model in the coplanar orbit transfer problem. To appear in Optimal Control Appl. Methods (2012).
[18] L. Cesari, Optimization – theory and applications. Problems with ordinary differential equations. Applications of Mathematics 17, Springer-Verlag, 1983.
[19] Y. Chitour, F. Jean, and E. Trélat, Genericity results for singular curves. J. Differential Geom. 73 no. 1 (2006), 45–73.
[20] Y. Chitour, F. Jean, and E. Trélat, Singular trajectories of control-affine systems. SIAM J. Control Optim. 47 no. 2 (2008), 1078–1095.
[21] F. H. Clarke, Optimization and nonsmooth analysis. Canadian Mathematical Society Series of Monographs and Advanced Texts, John Wiley & Sons, Inc., New York, 1983.
[22] F. H. Clarke and R. Vinter, The relationship between the maximum principle and dynamic programming. SIAM J. Control Optim. 25 no. 5 (1987), 1291–1311.
[23] R. W. Farquhar, Station-keeping in the vicinity of collinear libration points with an application to a Lunar communications problem. Space Flight Mechanics, Science and Technology Series 11 (1966), 519–535.
[24] R. W. Farquhar, A halo-orbit lunar station. Astronautics & Aerospace 10 no. 6 (1972), 59–63.
[25] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A modeling language for mathematical programming. Second edition, Duxbury Press, 2002.
[26] J. Gergaud and T. Haberkorn, Homotopy method for minimum consumption orbit transfer problem. ESAIM Control Optim. Calc. Var. 12 no. 2 (2006), 294–310.
[27] G. Gómez, W. S. Koon, M. W. Lo, J. E. Marsden, J. Masdemont, and S. D. Ross, Connecting orbits and invariant manifolds in the spatial three-body problem. Nonlinearity 17 (2004), 1571–1606.
[28] G. Gómez, J. Masdemont, and C. Simó, Lissajous orbits around halo orbits. Adv. Astronaut. Sci. 95 (1997), 117–134.
[29] G. Gómez, J. Masdemont, and C. Simó, Quasihalo orbits associated with libration points. J. Astronaut. Sci. 46 (1998), 135–176.
[30] T. Haberkorn and E. Trélat, Convergence results for smooth regularizations of hybrid nonlinear optimal control problems. SIAM J. Control Optim. 49 no. 4 (2011), 1498–1522.
[31] W. W. Hager, Runge–Kutta methods in optimal control and the transformed adjoint system. Numer. Math. 87 (2000), 247–282.
[32] R. F. Hartl, S. P. Sethi, and R. G. Vickson, A survey of the maximum principles for optimal control problems with state constraints. SIAM Rev. 37 no. 2 (1995), 181–218.
[33] V. Jurdjevic, Geometric control theory. Cambridge Studies in Advanced Mathematics 52, Cambridge University Press, 1997.
[34] W. S. Koon, M. W. Lo, J. E. Marsden, and S. D. Ross, Dynamical systems, the three-body problem and space mission design. Springer, 2008.
[35] E. B. Lee and L. Markus, Foundations of optimal control theory. John Wiley, 1967.
[36] K. R. Meyer and G. R. Hall, Introduction to Hamiltonian dynamical systems and the N-body problem. Applied Math. Sci. 90, Springer-Verlag, New York, 1992.
[37] A. A. Milyutin and N. P. Osmolovskii, Calculus of Variations and Optimal Control. Transl. Math. Monogr. 180, AMS, Providence, 1998.
[38] G. Mingotti, F. Topputo, and F. Bernelli-Zazzera, Low-energy, low-thrust transfers to the Moon. Celestial Mech. Dynam. Astronomy 105 no. 1–3 (2009), 61–74.
[39] J. Moser, On the generalization of a theorem of A. Lyapunov. Commun. Pure Appl. Math. 11 (1958), 257–271.
[40] H. J. Pesch, A practical guide to the solution of real-life optimal control problems. Control Cybernet. 23 no. 1/2 (1994).
[41] G. Picot, Shooting and numerical continuation methods for computing time-minimal and energy-minimal trajectories in the Earth–Moon system using low propulsion. Discrete Cont. Dynam. Syst. Ser. B 17 no. 1 (2012), 245–269.
726
Emmanuel Tr´elat
[42] L. Pontryagin, V. Boltyanskii, R. Gramkrelidze, and E. Mischenko, The mathematical theory of optimal processes. Wiley Interscience, 1962. [43] L. Rifford and E. Tr´elat, Morse–Sard type results in sub-Riemannian geometry. Math. Ann. 332 no. 1 (2005), 145–159. [44] L. Rifford and E. Tr´elat, On the stabilization problem for nonholonomic distributions, J. Eur. Math. Soc. 11 no. 2 (2009), 223–255. [45] I. M. Ross, A roadmap for optimal control: the right way to commute. Annals of the New York Academy of Sciences 1065 (2006), 210–231. [46] J. A. Sethian, Level set methods and fast marching methods. Cambridge Monographs on Applied and Computational Mathematics 3, 1999. [47] C. J. Silva and E. Tr´elat, Asymptotic approach on conjugate points for minimal time bang-bang controls. Syst. Cont. Letters 59 no. 11 (2010), 720–733. [48] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis. Springer-Verlag, 1983. [49] E. Tr´elat, Some properties of the value function and its level sets for affine control systems with quadratic cost. J. Dyn. Cont. Syst. 6 no. 4 (2000), 511–541. [50] E. Tr´elat, Contrˆ ole optimal: th´eorie & applications. Vuibert, 2005. [51] E. Tr´elat, Optimal control and applications to aerospace: some results and challenges. To appear in Journal Optim. Theory Appl. 154 no. 3 (2012). [52] A. W¨ achter and L. T. Biegler, On the implementation of an interior-point filter linesearch algorithm for large-scale nonlinear programming. Math. Programming 106 (2006), 25–57. [53] L. T. Watson, Probability-one homotopies in computational science, J. Comput. Appl. Math. 140 no. 1–2 (2002), 785–807.
Emmanuel Trélat, Université Pierre et Marie Curie (Univ. Paris 6) and Institut Universitaire de France, CNRS UMR 7598, Laboratoire Jacques-Louis Lions, 75005 Paris, France.
Mathematics and geometric ornamentation in the medieval Islamic world
Jan P. Hogendijk
Abstract. We discuss medieval Arabic and Persian sources on the design and construction of geometric ornaments in Islamic civilization.
2010 Mathematics Subject Classification. Primary 01A30; Secondary 51-03.
Keywords. Islamic mathematics, tilings, pentagon, heptagon.
1. Introduction
Many medieval Islamic mosques and palaces are adorned with highly intricate geometric ornaments. These decorations have inspired modern artists and art historians, and they have been discussed in connection with modern mathematical concepts such as crystallographic groups and aperiodic tilings. The Islamic ornamental patterns can certainly be used to illustrate such modern notions.
Medieval Islamic civilization has also left us an impressive written heritage in mathematics. Hundreds of Arabic and Persian mathematical manuscripts have been preserved in libraries in different parts of the world. These manuscripts include Arabic translations of the main works of ancient Greek geometry such as the Elements of Euclid (ca. 300 BC) and the Conics of Apollonius (ca. 200 BC), as well as texts by medieval authors between the eighth and seventeenth centuries, with different religious and national backgrounds. In what follows I will refer to ‘Islamic’ authors and ‘Islamic’ texts, but the word ‘Islamic’ will have a cultural meaning only. Most ‘Islamic’ mathematical texts were not related to the religion of Islam, and although the majority of ‘Islamic’ authors were Muslims, substantial contributions were made by Christians, Jews and authors with other religious backgrounds who lived in the Islamic world.
Many Islamic texts on geometry are related to spherical trigonometry and astronomy, and most Islamic scholars who studied the Elements of Euclid were studying in order to become astronomers and possibly astrologers. Yet there are also Islamic works on geometrical subjects unrelated to astronomy. In almost all medieval Islamic geometrical texts that have been published thus far, one does not find the slightest reference to decorative ornaments. This may be surprising because the authors of these texts lived in the main Islamic centers of civilization and may have seen geometric ornaments frequently.
In this paper we will see that the Islamic geometric ornaments were in general designed and constructed not by mathematician-astronomers but by craftsmen (Arabic: ṣunnāʿ). Our main question will be as follows: what kind of mathematical methods, if any, did these craftsmen use, and to what extent did they interact with mathematician-astronomers who were trained in the methodology of Greek geometry? We will discuss these questions on the basis of the extant manuscript material, which is very fragmentary. In Sections 2–5 we will discuss four relevant sources, and we will draw our conclusions in the final Section 6. For reasons of space, we will restrict ourselves to plane ornaments and pay no attention to decorative patterns on cupolas and to muqarnas (stalactite vaults).
2. Abu’l-Wafā’
We first turn to the “book on what the craftsman needs of the science of geometry”¹ by the tenth-century mathematician-astronomer Abu’l-Wafā’ al-Būzjānī. This work contains some information on the working methods of the craftsmen, which will be useful for us in Section 4 below. Abu’l-Wafā’ worked in Baghdad, one of the intellectual centers of the Islamic world. He dedicated his booklet to Bahā’ al-Dawla, who ruled Iraq from 988 to 1012, and who apparently employed mathematicians as well as craftsmen at his court. Almost all of the booklet consists of ruler-and-compass constructions belonging to plane Euclidean geometry. They are explained in the usual way, that is, by means of geometric figures in which the points are labeled by letters, but without proofs. Abu’l-Wafā’ says that he does not provide arguments and proofs in order to make the subject more suitable and easier to understand for craftsmen [1, 23].
The booklet consists of eleven chapters on (1) the ruler, the compass and the gonia (i.e., a set square); (2) fundamental Euclidean ruler-and-compass constructions, and in addition a construction of two mean proportionals, a trisection of the angle, and a pointwise construction of a (parabolic) burning mirror; (3) constructions of regular polygons, including some constructions by a single compass-opening; (4) inscribing figures in a circle; (5) circumscribing a circle around figures; (6) inscribing a circle in figures; (7) inscribing figures in one another; (8) division of triangles; (9) division of quadrilaterals; (10) combining squares into one square, and dividing a square into squares, all by cut-and-paste constructions; and (11) the five regular and a few semi-regular polyhedra. Abu’l-Wafā’ does not mention geometric ornaments.
Most of the information on the working methods of craftsmen is contained in Chapter 10. In that chapter, Abu’l-Wafā’ reports on a meeting between geometers and craftsmen in which they discussed the problem of constructing a square equal to three times a given square (for an English translation see [16, 173–183]). The craftsmen seem to have had three equal squares in front of them and wanted to cut them and rearrange the pieces into one big square. The geometers easily constructed the side of the required big square by means of Euclid’s Elements, but were unable to suggest a cut-and-paste construction of the big square from the three small squares. Abu’l-Wafā’ presents several cut-and-paste methods that were used by the craftsmen, but he regards these methods with some disdain because they are approximations. Abu’l-Wafā’ was trained in Euclid’s Elements and therefore he believed that geometry is about infinitely thin lines and points without magnitude, which exist in the imagination only. He complains that the craftsmen always want to find an easy construction which seems to be correct to the eyesight, but that they do not care about a proof by what Abu’l-Wafā’ calls “the imagination.” He declares that the constructions that can be rigorously proven should be distinguished from approximate constructions, and that the craftsmen should be provided with correct constructions so that they do not need to use approximations anymore.² We do not know how the booklet was received, but the 16th-century Persian manuscript which we will study in Section 4 contains a rich variety of approximate constructions.
¹ Incomplete French and German versions are to be found in [21] and [20]. The complete version in Arabic is in [1] and in facsimile in [18].
3. The Topkapı Scroll
The craftsmen themselves seem to have left us with very few documents about their activities in the field of geometric ornamentation. The most important published example is the so-called Topkapı Scroll, which is now preserved in the Topkapı Palace in Istanbul, and which has appeared in the magnificent volume [14]. This 29.5 m long and 33 cm wide paper scroll is undated and may have been compiled in Northwestern Iran in the 16th century, but the dating is uncertain. The scroll consists of diagrams without explanatory text. Many of these diagrams are related to calligraphy or muqarnas and therefore do not concern us here.
Some of the diagrams concern plane tilings. I have selected one non-trivial example in order to draw attention to the characteristic (and frustrating) problems of interpretation. The drawing on the scroll [14, p. 300] consists of red, black and orange lines, which are indicated by bold, thin and broken lines respectively in Figure 1 (for a photo of the manuscript drawing see also [17]). The broken lines in Figure 1 define a set of five tiles, called gireh tiles in the modern research literature, from the Persian word gīreh, which means knot. The thin lines form a decorative pattern which can be obtained by bisecting the sides of the gireh tiles, and by drawing suitable straight line segments through the bisecting points. It is likely that the pattern was designed this way, but one cannot be sure because the scroll does not contain any explanatory text. The gireh tiles of Figure 1 have drawn recent attention because they can be used to define aperiodic tilings. In the absence of textual evidence, it is impossible to say whether the craftsmen had an intuitive notion of aperiodicity (for a good discussion see [8]).
² Note that Abu’l-Wafā’ presents an approximate construction of the regular heptagon by ruler and compass. Just like many of his Islamic contemporaries, he probably believed that the regular heptagon cannot be constructed by ruler and compass.
Figure 1. Drawing by Dr Steven Wepster.
4. An anonymous Persian treatise
One would like to have a medieval Islamic treatise, written by a craftsman, in which the design and construction of ornaments are clearly explained. Such a treatise has not been found, and thus far, only a single manuscript has been discovered in which diagrams on geometrical ornaments are accompanied by textual explanations. In this section we will discuss what this manuscript can tell us about the main question posed at the beginning of the paper.
The manuscript is a rather chaotic collection of 40 pages of Persian text and drawings (for some photos see [14, 146–150]). The text consists of small paragraphs which are written close to the drawing to which they refer, and although the texts and drawings appear in a disorganized order and may not be the work of a single author, I will consider the collection as one treatise.³ It may have been compiled in the sixteenth century, although some of the material must be older, as we shall see. The treatise belongs to a manuscript volume of approximately 400 pages [5, 55–56]. Some of the other texts in the manuscript volume are standard mathematical works such as an Arabic translation of a small part of Euclid’s Elements. But the treatise itself does not resemble a usual work by a mathematician or astronomer in the Islamic tradition. I believe that the treatise is the work of one or more craftsmen because it agrees with most of what Abu’l-Wafā’ says about their methodology. The treatise provides much additional information on the working methods of the craftsmen and it also shows that they were really involved with the design and construction of geometrical ornaments. In order to illustrate these points, I have selected the following four examples 4.1 through 4.4 from the treatise.
³ The treatise was translated into Russian [6, 315–340] and modern Persian [2, 73–93], and a full publication of it with English translation was planned by Alpay Özdural (cf. [15]), who unfortunately passed away in 2003 before he completed the project. The Persian text is scheduled to be published, with translation and commentary, by an interdisciplinary research team in 2013.
4.1. The treatise contains many approximate constructions, including a series of ruler-and-compass constructions of a regular pentagon by means of a single compass-opening. In these constructions, the compass-opening is assumed to be either the side of the required regular pentagon, or the diagonal, the altitude, or the radius of the circumscribing circle. Here is one such construction with my paraphrase of the manuscript text [12, 184b]. Figure 2 is a transcription of the figure in the manuscript, in which the labels (the Arabic letters alif, bā’, . . . ) are rendered as A, B, . . . , and Hindu-Arabic number symbols are represented by their modern equivalents. The Persian text says:
Figure 2. (Diagram not reproduced; labeled points A, B, G, D, E, Z, H, with angle labels 4, 5, 6, 9, 15, 21.)
“On the construction of gonia 5 by means of the compass-opening of the radius, from gonia 6. On line AG describe semicircle ADG with center B. Then make point A the center and describe arc BE. Then make point G the center and on the circumference of the arc find point D and draw line AD to meet arc EB at point Z. Draw line GZ to meet the circumference of the arc at point H. Join lines AH, GH.⁴ Each of the triangles AZH, GZD is gonia 5, and the original triangle ADG was gonia 6, . . . ”
Points A, E, D and G are four angular points of a regular hexagon, and DH is the side of the regular pentagon inscribed in the same circle. The construction is a good approximation,⁵ but it is not exact, so Abu’l-Wafā’ would not have approved of it. In Chapters 3 and 4 of his booklet, Abu’l-Wafā’ provided exact constructions of the regular pentagon using a fixed compass-opening. The gonia is mentioned by Abu’l-Wafā’ as an instrument used by craftsmen. From the Persian treatise we infer that gonia n is a set square with angles $90^\circ$, $\frac{180^\circ}{n}$ and $90^\circ - \frac{180^\circ}{n}$. In Figure 2, angles are expressed in units such that 15 units are a right angle. In the Islamic tradition, the division of the right angle into 90 degrees, subdivided sexagesimally, was only used in mathematical astronomy and mathematical geography.
4.2. Abu’l-Wafā’ says that the craftsmen are interested in cut-and-paste constructions, and the Persian treatise contains many such constructions. Some of these are explained by one or more paragraphs of text, but the following example is presented without accompanying text.
⁴ Instead of GH the manuscript says incorrectly DH.
⁵ This is easily shown by modern elementary geometry. Suppose that the radius of the circle is 1, and drop a perpendicular ZP onto AG. Then $ZA = 1$, $\angle ZAP = 30^\circ$, $ZP = \frac{1}{2}$, $AP = \frac{\sqrt{3}}{2}$, $GP = 2 - \frac{\sqrt{3}}{2}$, $\angle ZGP = \arctan\frac{ZP}{GP} \approx 23.8^\circ$. Because $\angle DGP = 60^\circ$, $\angle ZAH = \angle ZGD \approx 36.2^\circ$.
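The accuracy of the single-opening pentagon construction of 4.1 can be checked numerically. The following Python sketch repeats the computation of footnote 5 for a circle of radius 1 and compares the resulting chord DH with the side of a true regular pentagon inscribed in the same circle. The function and variable names are illustrative and do not come from the manuscript or from any edition of it.

```python
import math


def single_opening_pentagon_angle() -> float:
    """Return angle ZGD in degrees for the construction quoted in 4.1.

    Setup as in the footnote: circle of radius 1 with diameter AG,
    Z on AD with AZ = 1, angle ZAG = 30 degrees, angle DGA = 60 degrees.
    """
    zp = math.sin(math.radians(30.0))   # distance from Z to line AG
    ap = math.cos(math.radians(30.0))   # distance from A to the foot P of the perpendicular
    gp = 2.0 - ap                       # AG = 2, hence GP = AG - AP
    angle_zgp = math.degrees(math.atan2(zp, gp))
    return 60.0 - angle_zgp             # angle ZGD = angle DGA - angle ZGA


approx_angle = single_opening_pentagon_angle()
# DH subtends the inscribed angle DGH = ZGD, so its length is 2 * sin(ZGD).
approx_side = 2.0 * math.sin(math.radians(approx_angle))
exact_side = 2.0 * math.sin(math.radians(36.0))   # true pentagon side, unit circle
relative_error = (approx_side - exact_side) / exact_side

print(f"angle ZGD      = {approx_angle:.3f} degrees (regular pentagon: 36)")
print(f"chord DH       = {approx_side:.5f}")
print(f"pentagon side  = {exact_side:.5f}")
print(f"relative error = {relative_error:.3%}")
```

The chord comes out slightly too long, by well under one percent, which is in line with the description of the construction as a good approximation that is not exact.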
Figure 3. (Diagram not reproduced; the pieces of the dissection are numbered 1–5.)
Figure 3 displays a regular hexagon and an isosceles triangle, dissected into pieces such that both figures can be composed from these pieces. Figure 3 is derived from the manuscript [12, 197a], with the difference that I have arbitrarily assumed the isosceles triangle to be equilateral, and I have drawn the figure in a mathematically correct way. In the manuscript, the pieces are indicated by numbers (as in Figure 3), so the correspondence is clear. Since there is no text in the manuscript, the reader is given no hint as to how exactly the pieces have to be cut. I invite the reader to work out the details for herself or himself. After this exercise, she or he will probably be convinced that the manuscript was intended to be used under the guidance of a competent teacher who could provide further information.
It should be noted that the pieces no. 1 and 2 in the manuscript are drawn in such a way that no. 1 is wider than no. 2. This may happen if the vertex angle of the isosceles triangle is less than 54°; Figure 4 has been drawn for a vertex angle of $\frac{360^\circ}{7}$. It is tempting to assume that the craftsmen had a general dissection of an isosceles (rather than an equilateral) triangle in mind, but because there is no accompanying text, one cannot be sure. The construction is mathematically correct, but there are also approximate cut-and-paste constructions in the Persian treatise.
It is not necessary to assume that the fancy cut-and-paste construction of Figures 3 and 4 was used in practice. Just like European arithmetic teachers in later centuries, Islamic craftsmen may have challenged one another with problems which surpassed the requirements of their routine work.
Figure 4. (Diagram not reproduced; the pieces of the dissection are numbered 1–5.)
4.3. The many drawings of geometric ornaments in the Persian treatise show that its authors were deeply involved with the design and construction of ornamental patterns. I have selected an example which is also found on a real building, namely the North Cupola of the Friday Mosque in Isfahan, which was built in the late eleventh century. The Persian text laconically introduces the ornamental pattern as follows ([12, 192a], [14, 148]) with reference to Figure 5.⁶
Figure 5. (Diagram not reproduced; labeled points A, B, G, D, E, Z, H, T, I, K, L, M, N, S, F, O, C.)
“Make angle BAG three sevenths of a right angle. Bisect AG at point D. Cut off BE equal to AD. Produce line EZ parallel to AG. Draw line TI⁷ parallel to BE, bisect TE at point H, and make TI equal to TH. Extend EI until it intersects AB at point K. Produce KL parallel to BE. With center Z draw circular arc KMN in such a way that its part KM is equal to MN. On line AF take point S and that is the center of a heptagon. Complete the construction, if God Most High wants.
Or construct angle ELN equal to angle ELK and by means of line LN find the center S. Or cut off EO equal to EL, so that O is the center of a heptagon. And make line OS parallel to GA and equal to AG.⁸ Then point S is the center of another heptagon. Or else let GO be equal to AS. God knows best.”
⁶ Broken lines in Figure 5 also appear as broken lines in the manuscript.
⁷ The text does not make clear that T is an arbitrary point on segment EZ.
Figure 6. (Diagram not reproduced; tiles labeled P, Q and H.)
The text does not inform the reader what should be done with the completed figure. Apparently the rectangular figure in the manuscript and its mirror image should be repeated as suggested by Figure 6. Thus one obtains the pattern in the North Cupola of the Friday Mosque.⁹ The pattern can be linked to gireh tiles such as in Figure 1 above. These gireh tiles are not mentioned explicitly in the Persian treatise; all information in the treatise about Figure 5 is contained in the passages quoted above. Let $\alpha = \frac{1}{7} \times 180^\circ$ and take as gireh tiles two types of equilateral hexagons with equal sides (thin lines in Figure 6), of type P with angles 4α, 5α, 5α, 4α, 5α, 5α, and of type Q with angles 4α, 4α, 6α, 4α, 4α, 6α. Now draw suitable lines through the midpoints of the sides, in such a way that the “stars” inscribed in P and Q emerge, with angles 2α at the midpoints of the sides of the gireh tiles. The heptagons H in Figure 6 are regular. Patterns with regular heptagons are rarely found on Islamic buildings, so the pattern in the manuscript and the one on the North Cupola probably go back to the same designer or designers. The pattern on the North Cupola of the Friday Mosque consists of the thick lines in Figure 6 with some additional embellishments but without the gireh tiles in Figure 6.
4.4. My fourth and final example from the Persian treatise will reveal some information about the relationship between craftsmen and Islamic mathematician-astronomers who had been trained in Greek mathematics. As an introduction, consider a pattern from the Hakim Mosque in Isfahan (Figure 7). The pattern is inspired by a division of a big square into a small square and four kites.¹⁰ Two of the angles of each of the kites are right angles.
⁸ The manuscript has AD by scribal error.
⁹ For a photograph see [9].
Figure 7.
Figure 8. (Diagram not reproduced; labeled points U, Q, R, E, H, B, S, Z, C, T, P.)
Figure 8 is a partial transcription of a figure in the Persian treatise [12, 189b], but the labels and broken lines are my own additions.¹¹ The figure displays a big square with side ZP, subdivided into a small square with side RQ, and four big kites such as EQTZ and RTPU, each with two right angles, and with pairwise equal sides (QE = EZ, QT = TZ, RT = TP, RU = UP). Note that the four longer diagonals of the big kites also form a square with side ET, which I call the intermediate square. In the special case of Figure 8, the side QR of the small square is supposed to be equal to the distance RB between each angular point of the small square and the closest side of the intermediate square. Then each big kite such as EQTZ can be divided into two right-angled triangles BRT, BCT, and two small kites such as EQRB, EBCZ with two right angles and pairwise equal sides (EQ = EB, RQ = RB, EB = EZ, CB = CZ). Thus we have four big kites and eight small kites, and for easy reference, I will call the resulting division of the big square the twelve kite pattern. Almost a quarter of the Persian treatise is somehow devoted to the twelve kite pattern.
¹⁰ See [7]. The pattern is inscribed with calligraphy: Allāh in the central square and Muḥammad and ʿAlī in the four kites.
¹¹ I have labelled the points in Figure 8 to highlight the correspondence with Figure 10 below.
If we draw perpendiculars ZH and RS to ET and TU respectively, ZH = RS = BT. The two sides EZ and EB of the small kite EBCZ are also equal, so in the right-angled triangle EZT we have ZH + EZ = ET. The twelve kite pattern can be constructed if a right-angled triangle (such as EZT) can be found with the property that the altitude (ZH) plus the smallest side (ZE) is equal to the hypotenuse (ET). The text states that “Ibn-e Heitham” wrote a treatise on this triangle and constructed it by means of two conic sections, namely “a parabola and a hyperbola”. No further details are given, and no conic section is drawn anywhere in the Persian treatise. But the text contains a series of approximate constructions of the twelve kite pattern, such as the following [12, 189b] (Figure 9). The text reads:
Figure 9. (Diagram not reproduced; labeled points A, B, G, D, E, Z, H, K, L, F, V, W, X, Y, and the number 5.)
“Line AD is the diagonal of a square. The magnitudes of AB, BG are equal and AD is equal to AB. Find point E on the rectilinear extension of line GD. Then each of EZ, ZH is equal to AG. Join line GH and through point K draw line KL parallel to GH. Find point L, the desired point has now been obtained.”
The approximation is sufficiently close for all practical purposes: if the side of the square is 1 meter, the difference between the correct and approximate positions of L is only a few millimeters.¹² It does not follow that the approximation presupposes a deep mathematical knowledge. In the figure in the manuscript, the eight small kites are all subdivided into three even smaller kites with pairwise equal sides and at most one right angle. In Figure 9 the subdivision is indicated by broken lines in only one kite VWXY (these labels are mine) in the upper left corner. One may guess that $FV = \frac{1}{2} VW$ and note that F is located on the bisector of angle WVY. The first step of the approximation boils down to the construction of a triangle ADG similar to VFW.
For further details on the Persian treatise we refer to the planned edition with translation and commentary which is scheduled to appear in 2013.
¹² If the side of the “square” in the beginning is set equal to 1, we have $AD = \sqrt{2}$, $AG = 2\sqrt{2}$, $\frac{AE}{AG} = \frac{1}{2\sqrt{2}-1}$, so $AE = \frac{1}{7}(8 + 2\sqrt{2})$, $AZ = \frac{1}{7}(8 + 16\sqrt{2})$, $\angle ZGA \approx 57.12\ldots^\circ \approx 57^\circ 7'$. Note that ∠ZGA in Figure 9 corresponds to α = ∠ZET in Figures 8 and 10.
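The quality of the approximate construction of Figure 9 can be gauged by comparing its angle ZGA with the angle α of the right-angled triangle in which altitude plus shortest side equals the hypotenuse. The Python sketch below rests on two assumptions that are mine and are labeled in the comments: that ZA is perpendicular to AG (this reproduces the value of the angle given in footnote 12), and that the triangle is scaled so that its shortest side EZ is 1, in which case the altitude is sin α, the hypotenuse is 1/cos α, and the defining property reads sin α + 1 = 1/cos α. The function names are illustrative only.

```python
import math


def approximate_angle_deg() -> float:
    """Angle ZGA (degrees) from the lengths stated for the construction of Figure 9."""
    ad = math.sqrt(2.0)                      # diagonal of the unit square
    ag = 2.0 * ad                            # AB = AD and BG = AB
    ae = ag / (2.0 * math.sqrt(2.0) - 1.0)   # ratio AE : AG as stated in the footnote
    az = ae + ag                             # EZ = AG
    # Assumption: ZA is perpendicular to AG, so tan(ZGA) = AZ / AG.
    return math.degrees(math.atan2(az, ag))


def exact_angle_deg(tol: float = 1e-12) -> float:
    """Solve sin(a) + 1 = 1/cos(a) between 45 and 80 degrees by bisection."""
    def f(a: float) -> float:
        # Assumption: shortest side EZ = 1, altitude = sin(a), hypotenuse = 1/cos(a).
        return math.sin(a) + 1.0 - 1.0 / math.cos(a)

    lo, hi = math.radians(45.0), math.radians(80.0)   # f(lo) > 0 > f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.degrees(0.5 * (lo + hi))


approx = approximate_angle_deg()
exact = exact_angle_deg()
print(f"approximate construction: {approx:.4f} degrees")
print(f"exact triangle:           {exact:.4f} degrees")
print(f"difference:               {approx - exact:.4f} degrees")
```

The two angles differ by a few hundredths of a degree. The sketch compares angles only and does not recompute the position of point L.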
5. Mathematicians on the twelve kite pattern
The reference to “Ibn-e Heitham” in the Persian treatise shows that the twelve kite pattern was also studied by mathematician-astronomers. We will now discuss what is known about these studies because they will give us some further hints about the interactions between mathematician-astronomers and craftsmen. “Ibn-e Heitham” is a Persian form of Ibn al-Haytham (ca. 965–1041), a well-known Islamic mathematician-astronomer who was interested in conic sections. His treatise on the twelve kite pattern has not been found, but one of the extant works of the famous mathematician-astronomer and poet ʿUmar Khayyām (1048–1131) is also of interest here. The work is written in Arabic and entitled “treatise on the division of a quadrant”. It begins in the following uninspiring way (Figure 10, [10, 73]): “We wish to divide the quadrant AB of the circle ABGD into two parts at a point such as Z and to draw a perpendicular ZH onto the diameter BD in such a way that the ratio of AE to ZH is equal to the ratio of EH to HB, where E is the center of the circle and AE is the radius.” Khayyām does not give the slightest indication of the origin or relevance of this problem. He draws the tangent to the circle at Z, which tangent intersects BE extended at T, and he shows that in the right-angled triangle EZT, the sum of the altitude ZH plus the shortest side ZE is equal to the hypotenuse ET.¹³ Thus the problem is inspired by the twelve kite pattern, but Khayyām does not mention the relationship with this pattern or with geometric ornamentation in general.
In a new figure (not rendered here), Khayyām puts, in the notation of Figure 10, EH = 10 and ZH = x, so $ZE = \sqrt{100 + x^2}$ and by similar triangles $HT = \frac{x^2}{10}$. He then shows that the property ZH + EZ = ET boils down to the cubic equation $x^3 + 200x = 20x^2 + 2000$, or in a literal translation of his words: “a cube and two hundred things are equal to twenty squares plus two thousand in number” [10, 78]. He then proceeds to construct a line segment with length equal to the (positive) root x of this equation by the intersection of a circle and a hyperbola. An anonymous appendix [10, 91] to Khayyām’s text contains a direct construction of point Z in Figure 10 as a point of intersection of the circle and the hyperbola through point B whose asymptotes are the diameter AEG and the tangent GM (broken lines in Figure 10).
None of this was relevant to a craftsman who wanted to draw the twelve kite pattern, and Khayyām declares that numerical solutions of the cubic equation could not be found. In order to find a numerical approximation of arc ZB, Khayyām rephrases the problem about the quadrant in trigonometrical form as follows: to find an arc such that “the ratio of the radius of the circle to the sine of the arc is equal to the ratio of the cosine to the versed sine.” In modern terms, if α = ∠ZET and the radius is 1, the ratio AE : ZH = EH : BH is equivalent to 1 : sin α = cos α : (1 − cos α). Khayyām says that this problem can be solved by trial and error using trigonometrical tables and that he found in this way α ≈ 57°, and if AE = 60 then ZH ≈ 50, EH ≈ 32⅔ and BH ≈ 27⅓. He also says that one can solve the problem more accurately. Using the trigonometrical tables that were available in his time, he could have computed the required arc in degrees and minutes by linear interpolation.¹⁴ This information on sexagesimal degrees and minutes may not have been of much use to craftsmen, as we have already seen in 4.1 above.
We may also compare with a reference by the Iranian mathematician and astronomer al-Bīrūnī (976–1043) in a work on the qibla (direction of prayer towards Mecca). Al-Bīrūnī computes the qibla at Ghazni (Afghanistan) by trigonometrical methods as 70 degrees and 47 minutes West of the South point on the local horizon. He then adds a ruler-and-compass approximation construction for “builders and craftsmen,” who “are not guided by degrees and minutes” ([4, 286], compare [3, 255–256]).
¹³ Proof: In Figure 10 by similar triangles EH : EZ = EZ : ET, and because EZ = EB we have EH : EB = EB : ET and therefore EH : (EB − EH) = EB : (ET − EB), that is to say EH : HB = EB : BT. By assumption EH : HB = AE : ZH, so because AE = BE also EH : HB = EB : ZH. We conclude ZH = BT, so EZ + ZH = EB + BT = ET.
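The step from the property ZH + EZ = ET to the cubic equation is only asserted in the account of Khayyām’s treatise. The following is a reconstruction in modern algebraic notation, using the quantities EH = 10, ZH = x, ZE and HT as defined there. It is offered as a plausible sketch and is not a rendering of Khayyām’s own argument.

```latex
\begin{align*}
ET &= EH + HT = 10 + \frac{x^2}{10}, \\
ZH + EZ = ET
  &\iff \sqrt{100 + x^2} = 10 - x + \frac{x^2}{10}, \\
100 + x^2 &= \Bigl(10 - x + \frac{x^2}{10}\Bigr)^2
  = 100 + 3x^2 - 20x - \frac{x^3}{5} + \frac{x^4}{100}, \\
0 &= 2x^2 - 20x - \frac{x^3}{5} + \frac{x^4}{100}, \\
x^3 + 200x &= 20x^2 + 2000
  \qquad \text{(after multiplying by } 100/x,\ x > 0\text{)}.
\end{align*}
```

Since $10 - x + \frac{x^2}{10}$ is positive for every real x, the squaring step in the third line does not introduce spurious positive solutions.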
Figure 10. (Diagram not reproduced; labeled points A, B, G, D, E, Z, H, T, M.)
¹⁴ If we use modern methods and put x = tan α, we have HZ = 10x if HE = 10, so 10x is a root of Khayyām’s cubic equation, and therefore $x^3 + 2x = 2x^2 + 2$. The equation is irreducible over the rational numbers, so the twelve kite pattern cannot be constructed by ruler and compass. The equation has one real root x = 1.54369 . . . so $\alpha \approx 57.06^\circ \approx 57^\circ 4'$.
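The trial-and-error procedure attributed to Khayyām, and the refinement by linear interpolation that he could have carried out, can be imitated in a few lines of Python. This is a sketch under stated assumptions: modern floating-point values of sine and cosine stand in for a medieval table tabulated at whole degrees, and all names are illustrative. The result is then checked against the cubic equation via x = 10 tan α.

```python
import math


def defect(alpha_deg: float) -> float:
    """(radius : sine) minus (cosine : versed sine) for a circle of radius 1."""
    a = math.radians(alpha_deg)
    return 1.0 / math.sin(a) - math.cos(a) / (1.0 - math.cos(a))


# Trial and error over whole degrees, as one might do with a table of sines.
best_degree = min(range(1, 90), key=lambda d: abs(defect(d)))

# The defect changes sign between two consecutive whole degrees;
# interpolate linearly between them to estimate the minutes.
lower = next(d for d in range(1, 89) if defect(d) < 0.0 <= defect(d + 1))
alpha = lower - defect(lower) / (defect(lower + 1) - defect(lower))
minutes = round((alpha - lower) * 60)

# Compare with the cubic: x = 10 * tan(alpha) should nearly satisfy it.
x = 10.0 * math.tan(math.radians(alpha))
residual = x**3 + 200.0 * x - 20.0 * x**2 - 2000.0

# Lengths for radius AE = 60, to set beside the values reported in the text.
zh = 60.0 * math.sin(math.radians(alpha))
eh = 60.0 * math.cos(math.radians(alpha))
bh = 60.0 - eh

print(f"best whole degree: {best_degree}")
print(f"interpolated arc:  {lower} degrees {minutes} minutes")
print(f"x = {x:.4f}, residual of the cubic = {residual:.3f}")
print(f"ZH = {zh:.2f}, EH = {eh:.2f}, BH = {bh:.2f}")
```

The residual of the cubic is not exactly zero because the interpolated arc is itself an approximation. A finer table, or one more interpolation step, would reduce it.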
6. Conclusion
We now return to the main question in the introduction to this paper. Because the evidence is so scarce, it is not clear to what extent we are able to generalize the information which we can obtain from the available manuscript sources. But if this can be done, the following may be suggested about the main differences between Islamic craftsmen who designed and constructed ornaments, and Islamic mathematician-astronomers who were trained in Greek geometry:
• Mathematician-astronomers worked with geometric proofs in the style of Euclid’s Elements. Craftsmen were familiar with the Euclidean way to draw figures, using letters as labels of points (but also the number 5 in Figure 9 above). Craftsmen did not use geometric proofs and they had not been trained in the methods of Euclid’s Elements.
• Texts written by mathematician-astronomers usually contain sufficient explanation to understand the mathematics. An oral explanation is not absolutely necessary. Texts and diagrams by craftsmen are often ambiguous, and oral explanations were essential.
• Mathematician-astronomers distinguished between exact and approximate geometrical constructions. Craftsmen did not distinguish between these constructions if the result was acceptable from a practical point of view.
• Craftsmen used some geometrical instruments not found in the theoretical works of Greek geometry, such as a set square and a compass with fixed opening.
The following relationship between craftsmen and mathematicians may be suggested. Mathematicians such as Ibn al-Haytham and ʿUmar Khayyām may have regarded the designs of craftsmen as a hunting ground for interesting mathematical problems. Thus the twelve kite pattern led to constructions by means of conic sections, as in Figure 10 above. These constructions were a favorite research topic in the tenth and eleventh centuries among Islamic mathematicians who had studied the Conics of Apollonius (ca. 200 BC). However, Khayyām did not reveal that his geometric construction problem was inspired by a decorative ornament.¹⁵ Other Islamic geometric problems may also have a hitherto unidentified historical context related to ornaments.
The craftsmen knew that the mathematicians had worked on some problems related to ornamentation and they regarded the solutions with respect, even though they probably did not understand the details and technicalities. The Persian treatise states [12, 185a] that the construction of a right-angled triangle such as EZT in Figure 8 “falls outside the Elements of Euclid” and requires the “science of conic sections”. No drawing of a conic section occurs anywhere in the Persian treatise.
¹⁵ When Khayyām’s text on the division of the quadrant was published in 1960 [13] and in 1981 [10], the modern editors had no way of knowing that the problem was inspired by ornaments. Around 1995 Özdural discovered the connection as a result of his study of the anonymous Persian treatise [15].
Of course we cannot exclude the possibility that a few mathematicians were also involved in the design and construction of geometric ornaments. The heptagonal pattern in Figure 6 is explained in our treatise in the language of the craftsmen, but since ʿUmar Khayyām lived in Isfahan at the time that the North Cupola was built, it is possible that he was somehow involved in the design. That a combination of mathematical learning and manual skill was possible in Islamic civilization is shown by the case of Abū Ḥāmid al-Khujandī (ca. 980), who was trained in Greek geometry and astronomy, authored a number of geometrical and astronomical works, and was also a superb metal-worker.¹⁶ The source materials that we have discussed in this paper give a fascinating glimpse into a design tradition about which little is known. Our knowledge is based to a large extent on one single Persian manuscript which is now preserved in Paris. It is likely that a systematic search in manuscript libraries in the Islamic world will produce many more relevant documents, and lead to a significant increase in our insight into the working methods of the medieval Islamic craftsmen.
Acknowledgement. I thank Viktor Blåsjö for his comments on a preliminary version of this paper.
¹⁶ He made one of the most beautiful astrolabes of the entire Islamic tradition, which is now in the Museum of Islamic Art in Doha, Qatar; see [11, 503-517] and also the illustration on the front page of [11].
References
[1] (Abu’l-Wafā’) Ṣ. A. ʿAlī, ed., Mā yuḥtāj ilayhi al-ṣāniʿ min ʿilm al-handasa, li-Abu’l-Wafā’ al-Būzjānī. Baghdad: University of Baghdad, 1979 [in Arabic].
[2] (Abu’l-Wafā’) Applied geometry, Abolvefa Mohammad ibn Mohammad Albuzjani, rewritten into modern Persian with appendices by Seyyed Alireza Jazbi. Tehran: Soroush Press, 1991 [in Persian].
[3] (al-Bīrūnī) Jamil Ali, transl. The Determination of the Coordinates of Positions for the Correction of Distances between Cities, a translation from the Arabic of al-Bīrūnī’s Kitāb Taḥdīd nihāyāt al-amākin li-taṣḥīḥ masāfāt al-masākin. Beirut: American University of Beirut, 1967.
[4] (al-Bīrūnī) P. Bulgakov, ed., Kitāb Taḥdīd nihāyāt al-amākin li-taṣḥīḥ masāfāt al-masākin li Abī’l-Rayḥān . . . al-Bīrūnī, Cairo 1962, reprint edition ed. F. Sezgin. Frankfurt, Institut für Geschichte der arabisch-islamischen Wissenschaften, 1992, series Islamic Geography vol. 25.
[5] S. Brentjes, Textzeugen und Hypothesen zum arabischen Euklid in der Überlieferung von al-Ḥaǧǧāǧ b. Yūsuf b. Maṭar (zwischen 786 und 833). Archive for History of Exact Sciences 7 (1994), 53-92.
[6] M. S. Bulatov, Geometricheskaya Garmonizatsiya v arkhitekture Srednei Azii IX–XV vv. Moscow: Nauka 1988 [in Russian].
[7] P. R. Cromwell and E. Beltrami, The Whirling Kites of Isfahan: Geometric Variations on a Theme. Mathematical Intelligencer 33 (2011), 84–93.
[8] P. R. Cromwell, The Search for Quasi-Periodicity in Islamic 5-fold Ornament. Mathematical Intelligencer 31 (2009), 36–56.
[9] J. P. Hogendijk, Ancient and modern secrets of Isfahan. Nieuw Archief voor Wiskunde fifth series, 9 (2008), 121.
[10] (Khayyām) L’Oeuvre Algébrique d’al-Khayyām, ed. R. Rashed, A. Djebbar. Aleppo, Institute for History of Arabic Science, 1981.
[11] D. A. King, In Synchrony with the Heavens: Studies in Astronomical Timekeeping and Instrumentation in Medieval Islamic Civilization, Volume 2: Instruments of Mass Calculation, Leiden: Brill, 2005.
[12] Manuscript Paris, Bibliothèque Nationale, Persan 169, fol. 180a–199a.
[13] Gh. Ḥ. Mossaheb, Hakim Omare Khayyam as an Algebraist. Tehran: Bahman Printing 1960. Anjomane Asare Melli Publications No. 38.
[14] G. Necipoğlu, The Topkapı Scroll: Geometry and Ornament in Islamic Architecture. Santa Monica, Ca., Getty Center for the History of Art and the Humanities, 1995.
[15] A. Özdural, On Interlocking Similar or Corresponding Figures and Ornamental Patterns of Cubic Equations. Muqarnas 13 (1996), 191-211.
[16] A. Özdural, Mathematics and Arts: Connections between Theory and Practice in the Medieval Islamic World. Historia Mathematica 27 (2000), 171-201.
[17] S. R. Prange, The tiles of infinity. Saudi Aramco World 60, September/October 2009, 24-31. http://www.saudiaramcoworld.com/issue/200905/the.tiles.of.infinity.htm
[18] A. Q. Qorbani, Būzjānī-Nāmeh. Tehran 1371 A.H. (solar)
[19] F. Sezgin, ed., Abu’l-Wafâ al-Būzjānī. Texts and Studies, Collected and Reprinted. Vol. 2. Frankfurt, Institut für Geschichte der arabisch-islamischen Wissenschaften, 1998. Series: Islamic Mathematics and Astronomy, vol. 61.
[20] H. Suter, Das Buch der geometrischen Konstruktionen des Abu’l-Wefā’, Beiträge zur Geschichte der Mathematik bei den Griechen und Arabern, Hsg, J. Frank, Abhandlungen zur Geschichte der Naturwissenschaften und der Medizin IV. Erlangen 1922. Reprinted in Heinrich Suter, Beiträge zur Geschichte der Mathematik und Astronomie im Islam, ed. F. Sezgin. Frankfurt, Institut für Geschichte der arabisch-islamischen Wissenschaften, 1986, vol. 2, 635–630. Also reprinted in [19, 280–295].
[21] F. Woepcke, Recherches sur l’Histoire des Mathématiques chez les Orientaux. Deuxième Article: Analyse et Extrait d’un recueuil de constructions géométriques par Aboûl Wafâ. Journal Asiatique 5 (1855), 218–255, 309–359. Reprinted in: Franz Woepcke, Études sur les mathématiques arabo-islamiques, ed. F. Sezgin. Frankfurt, Institut für Geschichte der arabisch-islamischen Wissenschaften, 1986, vol. 1, 483–572. Also reprinted in [19, 84–174]. Digital version: http://books.google.com/books?id=Z4gvAAAAYAAJ
Jan P. Hogendijk, Mathematics Department, Utrecht University, PO Box 80.010, NL 3508 TA Utrecht, Netherlands E-mail: [email protected]
Some mathematical aspects of the planet Earth
José Francisco Rodrigues
Abstract. The Planet Earth System is composed of several sub-systems: the atmosphere, the liquid oceans, the internal structure, the icecaps and the biosphere. In all of them Mathematics, enhanced by supercomputers, currently plays a key role through the “universal method” for their study, which consists of mathematical modeling, analysis, simulation and control, as restated by Jacques-Louis Lions in [41]. Long before the advent of computers, the representation of the Earth, navigation and cartography contributed decisively to the mathematical sciences. Nowadays the International Geosphere-Biosphere Program, sponsored by the International Council of Scientific Unions, may help to stimulate several mathematical research topics. In this article, we present a brief historical introduction to some of the essential mathematics for understanding the Planet Earth, stressing the importance of Mathematical Geography and its role in the Scientific Revolution(s), as well as the efforts to model Winds, Heating, Earthquakes and Climate and their influence on basic aspects of the theory of Partial Differential Equations. As a special topic to illustrate the wide scope of these (Geo)physical problems, we briefly describe some examples from History and from current research and advances in Free Boundary Problems arising in the Planet Earth. Finally we conclude by referring to the potential impact of the international initiative Mathematics of Planet Earth (http://www.mpe2013.org) in Raising Public Awareness of Mathematics, in Research and in the Communication of the Mathematical Sciences to the new generations.
2010 Mathematics Subject Classification. 00-XX
Ancient Mathematics and the Earth
There is no doubt that the planet Earth is a main ancient root of mathematics. Distancing, constructing, spacing, surveying or angulating led to Geometry, which literally means measurement of the earth (from the ancient Greek geo and metron, respectively). The Babylonian tablets and the Egyptian papyri, which date back about 4000 years, are the first known records of elementary geometry. Even if it may be controversial to attribute to Pythagoras the idea that the shape of the Earth is a sphere, this was clear already to Aristotle (384–322 BCE) in his “On the Heavens”: “Its shape must be spherical. . . If the earth were not spherical, eclipses of the moon would not exhibit segments of the shape they do. . . Observation of the stars also shows not only that the earth is spherical but that it is not of great size, since a small change of position on our part southward or northward visibly alters the circle of the horizon, so that the stars above our heads change their position considerably, and we do not see the same stars as we move to the North or South.”
But if the Hellenistic scientists had observed the sphericity of the planet, they had also obtained a relatively accurate estimate of its radius. Indeed, we owe one of the first estimates of the circumference of the earth to Eratosthenes (276–194 BCE), a member of the Alexandrine school, who established it at 250,000 stadia. He measured in Alexandria the angle of the sun at midday at the summer solstice, i.e. its angular distance from the zenith, and found 1/50th of a circle (about 7° 12′); knowing that Syene (Aswan) lay on the Tropic of Cancer at a distance of about 5000 stadia, he then obtained the circumference by proportion. Taking the commonly quoted adjusted value of 252,000 stadia, if the stadion meant 185 m he obtained 46,620 km, an error of 16.3% too great, but if the stadion meant 157.5 m, then the result of 39,690 km has an error of less than 2%!
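The proportion behind this estimate can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are invented here, the figure of 252,000 stadia is the traditionally quoted adjustment of the raw 250,000 that reproduces the kilometre values in the text, and the modern reference circumference of 40,075 km is an assumption not stated in the text.

```python
from fractions import Fraction

# Modern equatorial circumference in km: an assumed reference value,
# not given in the text.
REFERENCE_KM = 40_075.0


def circumference_stadia(arc_fraction, arc_length_stadia):
    """Scale the measured arc up to the full circle (the proportion in the text)."""
    return Fraction(arc_length_stadia) / Fraction(arc_fraction)


def stadia_to_km(stadia, stadion_m):
    """Convert stadia to kilometres for an assumed stadion length in metres."""
    return float(stadia) * stadion_m / 1000.0


def relative_error(estimate_km, reference_km=REFERENCE_KM):
    """Signed relative deviation of an estimate from the reference value."""
    return (estimate_km - reference_km) / reference_km


# Arc of 1/50 of a circle over about 5000 stadia between Alexandria and Syene.
raw = circumference_stadia(Fraction(1, 50), 5000)
print(f"raw proportion: {int(raw):,} stadia")

# Traditionally quoted adjusted figure, under the two stadion lengths in the text.
adjusted = 252_000
for stadion_m in (185.0, 157.5):
    km = stadia_to_km(adjusted, stadion_m)
    print(f"stadion = {stadion_m} m: {km:,.0f} km, error {relative_error(km):+.1%}")
```

The sketch makes visible that the quality of the estimate depends almost entirely on which length is assumed for the stadion, not on the proportion itself.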
Eratosthenes of Cyrene, as Heath wrote [27], “was, indeed, recognised by his contemporaries as a man of great distinction in all branches of knowledge”. He is remembered for his prime number sieve, still a useful tool in number theory, and was the first to use the word geography and to attempt to make a map of the world, for which he invented a system of lines of latitude and longitude. Another old trigonometric technique, the basic principle of triangulation to determine distances of inaccessible points on earth, was used by Aristarchus of Samos (about 310–230 BCE) to estimate the relative sizes and distances of the Sun and the Moon. Even if these estimates were an order of magnitude too small, this was a remarkable intellectual achievement of the Hellenic mathematician. He was also a precursor of Copernicus, as one of the philosophers of Antiquity to suggest the heliocentric theory in Astronomy. Ptolemy (about 100–178), the most influential Hellenic astronomer and geographer of his time, credited Eratosthenes with having measured the tilt of the Earth’s axis with great accuracy, obtaining the value of 11/83 of 180° (23° 51′ 15″). In his Guide to Geography he gave information on the construction of maps of the known world in Europe, Africa and Asia. However, as we may see from a world map redrawn in the 15th century, from the present point of view his representation of the earth is not accurate at all, in particular showing the Atlantic and the Indian Oceans as closed seas. Ptolemy used Strabo’s value for the circumference of the Earth, which was too small with an error of 27.7%. This crude estimate has been used to explain Columbus’ error of looking for Cipango (Japan) by going West more than thirteen centuries later [53], but historians have recently discovered other reasons for this fact.
In his great astronomical treatise of the second century, the Almagest, whose geocentric theory was not superseded until a century after Copernicus’ book De Revolutionibus Orbium Coelestium of 1543, Ptolemy describes, in particular, a kind of ‘astrolabe’, a combination of graduated circles that later became a more sophisticated chief astronomical instrument, reintroduced into Europe from the Islamic world. The nautical adaptation of the planispheric astrolabe was one of the tools used by the Portuguese navigator Bartolomeu Dias in his ocean expedition rounding Africa and crossing the Cape of “Boa Esperança” in 1488. This showed the connection between the Atlantic and the Indian Oceans, a discovery that would dramatically change the geographical vision of the world, and it happened four years before Columbus’ first voyage to the Antilles.
15th-century redrawing of Ptolemy’s world map http://en.wikipedia.org/wiki/File:PtolemyWorldMap.jpg
Martellus world map of 1489 or 1490 http://en.wikipedia.org/wiki/File:Martellus world map.jpg
This fact was immediately reflected in the world map made in 1489 or 1490 by Henricus Martellus and in the Nuremberg Globe of Martin Behaim, of 1492 [16]. Atlases and globes are treasures of Renaissance cartography that illustrate how necessary mathematical techniques were for map making in the late 15th century, for practical navigation or for helping European minds to change their concept of the world, as did the famous Globus Jagellonicus of 1510, which is considered to be the oldest existing globe showing the Americas. The strategic importance of the new terrestrial representations and of the ocean navigations, as new key technologies, goes beyond their scientific meaning and consequences. They represented technological breakthroughs and were decisive tools for the European expansion in the period 1400–1700, as “the conquest of the high seas gave Europe a world supremacy that lasted for centuries” [12]. If shape, measure and representation of the Earth were key elements in ancient mathematics, the novel problems and concepts of Renaissance mathematics, in particular those associated with a new geometric approach to the theory and practice of navigation, as well as to mapping techniques, were instrumental in the rise of modern science. As the Dutch historian of science R. Hooykaas has stressed, “the great change (not only in astronomy or physics, but in all scientific disciplines) occurred when, not incidentally but in principle and in practice, the scientists definitively recognized the priority of Experience. The change of attitude caused by the voyages of discovery is a landmark affecting not only geography and cartography, but the whole of ‘natural history’.” [29]
Mathematical Geography and the Scientific Revolution(s)
Recently historians of Mathematics have been recognizing the importance of Renaissance methods [33], often invoking the significant and countlessly repeated phrase of Galileo in “Il Saggiatore” (1623): “Philosophy is written in this grand book, the universe, (. . . ) written in the language of mathematics, and its characters are triangles, circles, and other geometric figures without which it is humanly impossible to understand a single word of it; without these, one wanders about in a dark labyrinth.” The Elements of Euclid were first printed, in Latin, in Venice in 1482, and had several vernacular translations in the following century in Italian, German, French, English and Spanish. The English edition, printed in London in 1570, contains a “very fruitfull Preface made by M. I. Dee, specifying the chiefe Mathematicall Sciences” [2]. In this influential text, the English scientist John Dee (1527–1608), after stating that “Of Mathematicall thinges, are two principall kindes: namely, Number, and Magnitude”, describes among the branches of his remarkable “Mathematicall Tree” the “Arte of Nauigation, demonstrateth how, by the shortest good way, by the aptest Directiõ, & in the shortest time, a sufficient Ship, betwene any two places (in passage Nauigable,) assigned: may be cõducted: and in all stormes, & naturall disturbances chauncyng, how, to vse the best possible meanes, whereby to recouer the place first assigned” [17].
Geography and navigation were in fact extremely important in the 16th century [4] and it has now become clear that Dee’s mathematical program has roots in the works of the Portuguese mathematician and cosmographer Pedro Nunes (1502–1578). In a letter of 1558 to Mercator, Dee considered Nunes as the “most learned and grave man who is the sole relic and ornament and prop of the mathematical arts among us” [3]. Gerardus Mercator (1512–1594), the German cartographer and mathematician, in his Mapamundi of 1569 had constructed a new projection to represent the rhumb lines, i.e. the curves with a constant angle V (0 < V < π/2) with all meridians, as straight lines on a flat map [13]. Those spiral curves on the sphere, later also called loxodromes, were brought to Mercator’s attention by the works of Nunes, as historians have confirmed with recent new evidence [3]; a few years earlier Nunes had imagined and discussed a method for their representation for nautical purposes. Although this major advance in mathematical cartography was of great importance for navigation [25], it took a century and a half to understand completely the mathematical equation of the loxodrome and to establish that its stereographic projection in the plane is the logarithmic spiral, as was published by the English astronomer Edmund Halley in 1696 [26].
The original 1537 representation of the loxodromes by Pedro Nunes [48]
Directly questioned by the seamen returning from the oceanic navigations, Pedro Nunes was the first to distinguish on the terrestrial globe between the loxodromic course, then called rhumb line, which consists in navigating at a constant angle, and the orthodromic course, which follows the shortest distance along the arc of a great circle, i.e. the geodesic. In two small original treatises published in Portuguese in 1537, in particular in “Tratado em defensam da carta de marear” (‘Treatise defending the nautical chart’) [4], Nunes described the spiral nature of the rhumb lines and represented them in a symmetrical picture inside an equinoctial circle. However, only in his Opera, a Latin Collectanea of his extended works published in Basel in 1566, did Nunes clearly state that the loxodrome behaves similarly to a helix never reaching the pole, a concept he also described in a manuscript of the 1540’s found in Florence. Later, in a manuscript of 1595, Thomas Harriot suggested a relation with the logarithmic spiral in the plane. In the book “Certain errors in navigation”, published in London in 1599, another English mathematician and cartographer, Edward Wright (1561–1615), who had carefully studied Nunes’ works, described precisely the process of representing rhumb lines as straight lines in Mercator charts.
In modern notation, on a sphere with unit radius the equation of the loxodrome with angle V may be given by $\varphi = -\tau \log \tan(\theta/2)$, where $\tau = \tan V$, $\varphi$ is the longitude ($\varphi = 0$ at its intersection with the equator) and $\theta$ the colatitude. Hence, in Cartesian coordinates, $z = \cos\theta$ and, in the xy-plane of the equator, $x = \sin\theta \cos\varphi$ and $y = \sin\theta \sin\varphi$. By eliminating $\theta$ in the equation $\rho = \sin\theta$ and using the loxodrome equation, we obtain $\rho = \operatorname{sech}(\varphi/\tau) = 2(e^{\varphi/\tau} + e^{-\varphi/\tau})^{-1}$. This equation describes, in polar coordinates, the orthogonal projection of the loxodrome on the equatorial plane and represents a Poinsot spiral [24], a planar curve considered by the French mathematician Louis Poinsot in his 1834 geometrical mechanics theory [51].
Another remarkable work of Pedro Nunes is the book on twilights, De Crepusculis, published for the first time in Lisbon in 1542. This geophysical problem had been considered by the Islamic Al-Andalus mathematician Ibn Mu’adh in the 11th century and by the Polish scientist Witelo in the 13th century. In his book, Nunes treated the twilight variation produced by the sun during the annual course through the ecliptic. Using spherical trigonometry only, he was able, in particular, to answer completely the question about the shortest twilight, while Jakob Bernoulli, l’Hospital and d’Alembert gave only an indirect, incomplete solution of the problem, as recognized by Delambre in 1815 and refereed by Gauss in 1817 [36].
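Returning to the loxodrome equation φ = −τ log tan(θ/2) stated earlier in this section, the two claims made there (constant angle V with every meridian, and projection ρ = sech(φ/τ) on the equatorial plane) can be checked numerically. The Python sketch below is only an illustration under the unit-sphere conventions of the text; the function names and the finite-difference step are choices made here, not part of the original.

```python
import math


def colatitude(phi, V):
    """Invert phi = -tan(V) * log(tan(theta / 2)) for the colatitude theta."""
    return 2.0 * math.atan(math.exp(-phi / math.tan(V)))


def loxodrome_point(phi, V):
    """Cartesian point on the unit sphere at longitude phi on the loxodrome."""
    theta = colatitude(phi, V)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def poinsot_radius(phi, V):
    """Polar radius of the projected curve: rho = sech(phi / tan V)."""
    return 1.0 / math.cosh(phi / math.tan(V))


def crossing_angle(phi, V, step=1e-6):
    """Angle between the curve and the meridian, by a central difference.

    Along the curve the displacement has a meridian component d(theta) and a
    parallel component sin(theta) * d(phi); their ratio gives the tangent of
    the angle with the meridian.
    """
    dtheta = (colatitude(phi + step, V) - colatitude(phi - step, V)) / (2.0 * step)
    return math.atan2(math.sin(colatitude(phi, V)), abs(dtheta))


V = math.radians(70.0)  # an arbitrary sample course angle
for phi in (-6.0, -1.0, 0.0, 2.5, 10.0):
    x, y, z = loxodrome_point(phi, V)
    print(f"phi = {phi:6.2f}  rho = {math.hypot(x, y):.6f}  "
          f"sech = {poinsot_radius(phi, V):.6f}  "
          f"angle = {math.degrees(crossing_angle(phi, V)):.4f} deg")
```

For every sampled longitude the projected radius agrees with the sech formula and the estimated crossing angle stays at the chosen V, while the curve winds towards a pole without reaching it.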
No less important is the method invented by Nunes to improve the measurement of angles through the division of the scale of a quadrant or of a nautical astrolabe. The concept of the “nonius” arises in the second part of “De Crepusculis”, after proposition III: “to construct an instrument well suited to observation of the heavenly bodies, with which one can accurately determine their altitudes” [20]. The method was used by Tycho Brahe (1546–1601) in two quadrants he constructed for his observatory in Uraniborg, in Denmark, which were equipped with the nonius and which he described in the book “Astronomiae Instauratae Mechanica” [11], first published in 1598, referring explicitly to it: “inside this division there is yet another according to the principles set forth by . . . Petrus Nonnius in his learned little book on the Twilight”. The only known existing instrument of this period reproducing Nunes’ invention is a quadrant used to measure altitudes, dated 1595 and belonging to the Museum of the History of Science in Florence [20].
The 16th-century quadrant of the Museum of the History of Science in Florence and a picture from Brahe’s “Astronomiae Instauratae Mechanica” of 1598
A pioneering and most important feature of this Iberian Renaissance mathematician was his professional activity as royal cosmographer from 1529 on and, in addition to being professor of mathematics at the University of Coimbra from 1544, his appointment in 1547 as Cosmographer-in-Chief of Portugal, in charge of examining the chart masters and nautical instrument makers and of certifying their quality [20]. These facts placed Pedro Nunes as one of the “few learned authors [who] began to be interested in the mechanical arts, which had become economically so important” in the sixteenth century, since “natural science needs theory and mathematics as well as experiments and observations” and “only theoretically educated men with rationally trained intellects were able to supply that other half of its methods to science”, so that, paraphrasing Edgar Zilsel [62], “eventually the social barrier between the two components of the scientific method broke down, and the methods of the superior craftsmen were adopted by academically trained scholars: the real science was born”. These original contributions in mathematical navigation, together with other advances in geography, astronomy, architecture, mechanics and music, are significant examples of a paradigm of Renaissance mathematics on the establishment of a “program for the mathematization of the real world” [38] and contributed to the creation of the first “Academia Real Mathematica” de Madrid, by Filipe II, while he was in Lisbon in 1582.
In a remarkable text on the institutional foundations of this pioneer Iberian academy, Juan de Herrera [27] refers to the importance of “las disciplinas Mathematicas que abren la entrada y puerta a todas las demas sciencias por su grande certitude y mucha euidencia, donde tomaron el nombre de Mathematicas o disciplinas que todo es vno, porque manifestan el methodo verdadero y orden de saber” (“the Mathematical disciplines that open the door and entrance to all other sciences for their great certitude and much evidence, which took the name of Mathematics or disciplines that the whole is one, because they express the true method and order of knowledge”). Indeed, almost one century before Galileo’s famous statement, Nunes in his 1537 work on mathematical navigation had already written “(. . . ) because no rule that is based on speculative or theoretical knowledge can be well practiced and understood if one doesn’t know these same principles” since “nothing is more evident than mathematical demonstration, which by no means, is possible to be contested” [4].
Winds, Heating, Earthquakes and Climate
The atmosphere, the liquid oceans, the icecaps, the internal structure and the biosphere are the sub-systems of the Planet Earth whose mathematical models and simulations, nowadays enhanced by supercomputers, play an essential role, as observed by J.-L. Lions (1928–2001) in [41]. Long before the advent of computers, navigation and cartography contributed to the advance of mathematical knowledge in the Renaissance; after the invention of the Calculus in the following century, the planet Earth continued to contribute in several ways to the advance of Mathematical Analysis in the eighteenth and nineteenth centuries as well.
The partial differential calculus appeared at the end of the 17th century with trajectory and isoperimetric problems involving one-parameter families of curves. However, the creation of the theory of partial differential equations took a decisive step in the works of Jean d’Alembert (1717–1783) on the “vibrating string problem” and on a problem of the Planet Earth [18]. The introduction in 1743, in his “Traité de dynamique”, of the first wave equation of the type
$$\frac{ddy}{dt^2} = \frac{dy}{ds} - (l - s)\,\frac{ddy}{ds^2}$$
was followed by the more developed work published in Paris in 1747, “Réflexions sur la cause générale des vents”. In this book, which won the 1746 prize of the Academy of Sciences of Berlin, d’Alembert considered the geophysical problem of the vibrations of a layer of air (winds) under the action of the Earth’s rotation and derived for the first time the general formula for the solution of the wave equation that now bears his name. He developed the method of Euler, introducing first order systems of the type
$$\frac{\partial \alpha}{\partial t} = \frac{\partial \beta}{\partial s} \quad \text{and} \quad \nu\,\frac{\partial \beta}{\partial t} = \rho\,\frac{\partial \alpha}{\partial s} + \varphi(t, s).$$
Looking for total differentials and using the usual change of coordinates, d’Alembert arrived at the general solution of the wave equation in terms of two arbitrary functions defined on the characteristics $t + \lambda s = C_1$ and $t - \lambda s = C_2$, where $\lambda^2 = \nu/\rho$. The third part of his Mémoire actually initiated this new branch of mathematical analysis, which was immediately followed and developed by Euler and Lagrange. Also related to the Planet Earth, the Academy of Sciences of Paris had proposed in 1738, in the class of mathematics, the theme “the cause of the flux and reflux of the sea”, and among the prizewinners were Daniel Bernoulli, Euler and MacLaurin. But, as J.-L. Lions also observed in his interesting little book [41], Joseph Fourier (1768–1830) became a forerunner by calling attention in 1824 to the possible effect of anthropogenic factors on the surface temperature of the Earth. In fact, Fourier recognized in [23]: “La question des températures terrestres m’a toujours paru un des plus grands objets des études cosmologiques, et je l’avais principalement en vue en établissant la théorie mathématique de la chaleur” (“The question of the terrestrial temperatures has always seemed to me one of the greatest subjects of cosmological studies, and I had it mainly in mind in establishing the mathematical theory of heat”); and in [23] he also conjectured explicitly that “L’établissement et le progrès des sociétés humaines, l’action des forces naturelles peuvent changer notablement, et dans de vastes contrées, l’état de la surface du sol, la distribution des eaux et les grands mouvements de l’air” (“The establishment and progress of human societies and the action of natural forces can change significantly, and over vast regions, the surface condition of the soil, the distribution of water and the large air movements”), as possible causes of the variation of the average temperatures in the course of several centuries.
Indeed the interest of Fourier in the temperature of the Earth dated from the first manuscripts of 1807 of his “Théorie Analytique de la Chaleur”, whose final version was published in Paris only in 1822. In his Mémoire [22] of 1820 he referred to three factors acting upon the heat of the planet: the heating by the sun’s rays, the internal heat of the globe and the secular dissipation due to the cooling of the Earth. Then he described the model of the “refroidissement séculaire” of the sphere using the heat equation in polar coordinates for the temperature ν at time t and at the layer of radius x:
$$\frac{\partial \nu}{\partial t} = \frac{K}{CD}\left(\frac{\partial^2 \nu}{\partial x^2} + \frac{2}{x}\,\frac{\partial \nu}{\partial x}\right),$$
where K, C and D are physical constants. To model the heat change at the surface of the Earth, Fourier was the first to write the differential equation of the qualitative law observed experimentally by Newton in 1701:
$$K\,\frac{\partial \nu}{\partial x} + h\nu = 0,$$
which means that the heat flux is proportional to the surface temperature with constant h/K. He then gave the exact general solution in the form of a trigonometric series in x with an exponential in t, with precise coefficients in terms of the initial temperature, by referring to his earlier work of 1807. Finally he concluded that “on peut connaître, au moyen de cette formule, toutes les circonstances du refroidissement d’un globe solide dont le diamètre n’est pas extrêmement grand” (“By means of this formula, we may know all circumstances of the cooling of a solid globe whose diameter is not extremely large”).
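The structure of such a series can be illustrated as follows. Writing ν = u/x reduces the radial heat equation to $u_t = (K/CD)\,u_{xx}$, so the separated solutions are $\nu = x^{-1}\sin(\varepsilon x/R)\,e^{-(K/CD)(\varepsilon/R)^2 t}$ for a globe of radius R, and the surface condition $K\,\partial\nu/\partial x + h\nu = 0$ at x = R selects the values of ε through $\varepsilon\cos\varepsilon = (1 - hR/K)\sin\varepsilon$. The Python sketch below finds the first few roots by bisection; it is a minimal illustration, not Fourier's own procedure, and the names `biot` (for hR/K), `kappa` (for K/CD) and `radius` are introduced here only for the example.

```python
import math


def surface_condition(eps, biot):
    """g(eps) = eps*cos(eps) - (1 - biot)*sin(eps); its positive roots fix the modes."""
    return eps * math.cos(eps) - (1.0 - biot) * math.sin(eps)


def cooling_eigenvalues(biot, count, iterations=200):
    """First `count` positive roots of g, one in each interval (n*pi, (n+1)*pi).

    For biot = h*R/K > 0 the function g changes sign exactly once on each of
    these intervals, so plain bisection is sufficient.
    """
    if biot <= 0:
        raise ValueError("biot = h*R/K must be positive")
    roots = []
    for n in range(count):
        lo = n * math.pi if n > 0 else 1e-9  # avoid the trivial root eps = 0
        hi = (n + 1) * math.pi
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if surface_condition(lo, biot) * surface_condition(mid, biot) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots


def mode_temperature(x, t, eps, radius, kappa):
    """One separated solution: sin(eps*x/R)/x * exp(-kappa*(eps/R)**2 * t)."""
    return math.sin(eps * x / radius) / x * math.exp(-kappa * (eps / radius) ** 2 * t)


for n, eps in enumerate(cooling_eigenvalues(biot=1.0, count=3), start=1):
    print(f"mode {n}: eps = {eps:.6f}, decay rate factor = {eps ** 2:.6f}")
```

In the special case hR/K = 1 the condition reduces to cos ε = 0, so the roots are the odd multiples of π/2; since the decay rate of each mode grows like ε², the long-time cooling is governed by the first mode alone.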
The young S. L. Sobolev with A. N. Krilov in the 1930’s
More than a century later, after graduating at the Physics and Mathematics Faculty of the Leningrad (St Petersburg) University in 1929, Sergey Sobolev (1908–1989) worked at the Theoretical Department of the Seismological Institute and at the Steklov Institute of Physics and Mathematics of the USSR Academy of Sciences in his city. In the first decade of his mathematical production, he introduced and developed new theories that were fundamental for the development of partial differential equations and functional analysis in the 20th century [7]. Besides his deep work in hyperbolic partial differential equations of second order and the proof of the seminal Sobolev inequalities, which are at the basis of the theory of Sobolev spaces, he introduced in 1934 generalized solutions [58] for the wave equation, a landmark that became adaptable to an enormous number of problems of mathematical physics and other areas of mathematics. The following year he introduced the novel concept of “solution in functionals” for the Cauchy problem for hyperbolic linear equations, which corresponds to what is now called a distribution. Also in 1934 another young mathematician, Jean Leray (1906–1998), after completing his PhD at the Faculty of Science of Paris the year before, introduced the fundamental concept of weak solutions of the Navier–Stokes equations, which he called “solutions turbulentes”, based on the new definition of “quasi-dérivées” and on the approximation of integrable functions by mollification in [39]. In fact, Leray implicitly used the Sobolev space $H^1(\mathbb{R}^3)$ in his existence theory and raised a still open double question. Sixty years later he described in [40] this double question, which became known as one of “The Millennium Problems”: “The theoretical study of a fluid motion with initial condition leads in various cases to a same conclusion: the existence of at least one weak solution that is regular and unique near the initial time, and exists for any later time ( . . . ) but does it remain regular and uniquely determined?” If the wave equation in heterogeneous media is an essential tool in theoretical seismology, the Navier–Stokes equations are of no less importance in meteorology, in particular in developing mathematical models of the atmosphere or of the ocean.
For instance, with the purpose to understand the long-term weather prediction and climate changes, in 1992 J.-L. Lions proposed, in collaboration with R. Temam and S. Wang [42], a long range project based on the mathematical study and computational simulations of an important class in the hierarchy of models in geophysical fluid dynamics, the primitive equations of the dynamics of the atmosphere and of the oceans, as well as their coupling effects, by analogy to the mathematical theory of Navier–Stokes equations (see [50], for a recent survey). From a different point of view, the Serbian mathematical-physicist Milutin Milankovitch (1879–1958) in the first quarter of the 20th century studied the effect of solar radiation in different latitudes and its consequence in the planetary albedo and proposed a decisive theory on “mathematical climate” [46] to explain the Earth’s long term climate changes caused by the variation of its position, including obliquity and precession, in the orbit around the Sun (the Milankovitch cycles). In a certain sense, this work started a whole hierarchy of climate models, paving the way to John von Neumann optimistic suggestion in 1955, by referring to climate control through managing solar radiation: “Probably intervention in atmospheric and climate matters will come in a few decades, and will unfold on a scale difficult to imagine at present” [47]. In recent decades, the “rising tide of scientific data” and the advances of mathematical and computational tools has raised climate science to a whole new level, although global representations and predictions are still very difficult, of limited
range and often quite deficient. Nevertheless, applied mathematics is bringing new contributions among the whole range of multiscale problems of this interdisciplinary research, by blending asymptotic, qualitative, numerical simulations with rigorous analysis. For instance, the modeling of climate phenomena in the tropics in the range of scales from kilometers to ten thousand kilometers and shorter time scales has been recently surveyed in [34]. In this review paper, the dynamics of precipitation fronts in the tropical atmosphere are modeled as large-scale boundaries between moist and dry regions and treated as a new hyperbolic free boundary problem for which rigorous mathematical analysis is possible in the framework of weak solutions.
Free Boundary Problems (FBP) in the Planet Earth
Generally speaking, free boundaries are interfaces, for instance curves or surfaces, which are a priori unknown and separate different regions in space and/or time and appear typically in models with phase changes. Their mathematical treatment has a long history and a large scope of applications. It was the subject of a five year scientific program of the European Science Foundation in the 1990s [55], and interfaces and free boundaries appear often in the mathematics of models for climatology and environment [19]. As typical examples of FBPs relevant to these models, we have the dynamics of glaciers and ice sheets, which lead to very complex and rich mathematical questions, such as the motion of grounding lines, for which solutions with zero contact angle have been shown to exist and were asymptotically characterized in the recent work [21]. The first FBP was motivated precisely by an application of the “Théorie Analytique de la Chaleur” to the cooling of the terrestrial globe. Lamé and Clapeyron presented, at the session of 10th May 1830 of the Academy of Science of Paris, the elegant solution of what is now known as the one-phase Stefan problem in one space dimension: the depth of the solidification front of the Earth, supposed homogeneous and initially liquid [37]. Looking for solutions of the heat equation depending on the single variable $y = x/\sqrt{t}$, they obtained the formulas for the temperature $\nu(x,t)$ in the solid and for the free boundary $x = \varphi(t)$:
\[ \nu(x,t) = A \int_0^{x/\sqrt{t}} e^{-y^2/4}\,dy \quad\text{and}\quad \varphi(t) = \beta\sqrt{t}, \]
where the constants $A$ and $\beta$ are determined as the unique roots of the transcendental equations involving the normalized latent heat $\lambda$ and the solidification temperature $\sigma$ at the two free boundary conditions
\[ \sigma = \nu(\varphi(t),t) \quad\text{and}\quad \lambda\,\frac{d\varphi}{dt}(t) = \frac{\partial\nu}{\partial x}(\varphi(t),t). \]
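The two free boundary conditions above can be reduced to one scalar equation for β. Substituting the similarity formulas gives σ = A √π erf(β/2) from the first condition and λβ/2 = A exp(−β²/4) from the second, hence σ = (λβ√π/2) exp(β²/4) erf(β/2), whose right-hand side increases strictly from 0 to infinity, so the root is unique. The following Python sketch finds it by bisection. It is only an illustration under stated assumptions: the heat equation is taken in the normalized form ν_t = ν_xx, σ and λ are assumed positive, and the function name stefan_constants is hypothetical, not taken from [37].

```python
import math


def stefan_constants(sigma, lam, tol=1e-12):
    """Return (A, beta) for nu(x,t) = A * int_0^{x/sqrt(t)} exp(-y^2/4) dy
    and phi(t) = beta * sqrt(t), assuming nu_t = nu_xx, sigma > 0, lam > 0."""
    if sigma <= 0 or lam <= 0:
        raise ValueError("sigma and lam are assumed positive in this sketch")

    def residual(beta):
        # sigma = (lam * beta * sqrt(pi) / 2) * exp(beta^2 / 4) * erf(beta / 2)
        rhs = 0.5 * lam * beta * math.sqrt(math.pi)
        rhs *= math.exp(beta * beta / 4.0) * math.erf(beta / 2.0)
        return rhs - sigma

    # Bracket the unique root: the residual is negative at 0 and increasing.
    lo, hi = 0.0, 1.0
    while residual(hi) < 0.0:
        hi *= 2.0

    # Plain bisection on the bracket [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid

    beta = 0.5 * (lo + hi)
    # Second free boundary condition: lam * beta / 2 = A * exp(-beta^2 / 4).
    A = 0.5 * lam * beta * math.exp(beta * beta / 4.0)
    return A, beta
```

For example, `A, beta = stefan_constants(1.0, 1.0)` returns the two constants for σ = λ = 1; the front is then x = β√t. Bisection is chosen here for robustness, since the monotonicity of the right-hand side guarantees a single sign change.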
This type of phase transition problem was named after the Slovene mathematical physicist Joseph Stefan, who published in 1889 a series of four papers [60] in which a model for ice formation in the polar oceans was discussed. The
impressive bibliography [61], containing about 5900 references up to the year 2000, reflects the huge number of variants and applications of Stefan-type problems. For instance, the first proof of existence and uniqueness of the generalized solution of the multidimensional (one or two-phase) Stefan problem was given by Shoshana Kamin [32] in her 1958 doctoral thesis at Moscow University under the supervision of Olga Oleinik [49]. From the mathematical point of view, the one-phase Stefan problem is closely related to another FBP arising in filtration through porous media. In the “dam problem”, an engineering model arising in the control of a particular but important Earth problem, Claudio Baiocchi observed in 1971 [8] that a simple transformation allowed the problem to be reformulated as a variational inequality, opening new directions of research [9]. Similarly, for the multi-dimensional one-phase Stefan problem, the variational inequality approach enabled the analysis of the regularity of the free boundary, in particular by Luis Caffarelli in 1976 [14]. As a consequence, it became possible to show, under certain conditions, that the generalized solution is also the classical one, in which the jump conditions are satisfied at the moving interface (see, for instance, [54] for references and details).
[Figure: The breakthrough of salt water creates a cusp in the free boundary separating the fresh from the salted water.]
It is not possible to give a complete picture of the FBPs encountered and modeled in the Planet Earth, but in order to illustrate the variety of mathematical results and contributions in recent years we give a few examples. For instance, the design of freshwater reservoirs in coastal regions, where freshwater overlays salt water from the sea, may lead, in a special case, to a two-phase free boundary problem for the stationary flow in a porous medium. In [5] Alt and van Duijn, using a weak formulation, have shown rigorously that the separation of fresh and salt water, in two or three dimensions, is a continuous interface, which ends up in the well and has an asymptotic behavior that can be precisely described as a function of a real parameter, the water discharge at the well. This special case is related to another relevant but different model for the joint motion of two immiscible liquids, the Muskat problem. Starting with the stationary Stokes system for an incompressible inhomogeneous viscous fluid occupying a pore space, coupled with the stationary Lamé equations for an incompressible elastic solid skeleton through suitable boundary conditions on the common boundary “solid skeleton – pore space”, and a transport equation for the unknown liquid density, Meirmanov [45] has recently used homogenization techniques to derive a well-posed model of viscoelastic filtration.
FBPs for the Navier–Stokes and related equations constitute a major challenge and source of open questions. In the framework of coupled models for the dynamics of the atmosphere and the ocean, a free surface problem is considered and some existence and uniqueness results are shown for an eddy viscosity model in [43]. Another classical fundamental problem is the shape and stability of equilibrium figures of a uniformly rotating isolated fluid, at least since the famous publication in 1743 of “Théorie de la figure de la Terre”, by Alexis Clairaut, supporting the Newton–Huygens conjecture that the Earth was flattened at the poles. Studying the Navier–Stokes equations with surface tension and kinematic free boundary conditions, Vsevolod Solonnikov in [59] has recently shown that the positivity of the second variation of the energy in an appropriate functional space is a sufficient condition for the stability of even certain nonsymmetrical equilibrium figures of rotating viscous fluids, confirming an old conjecture of Poincaré and Lyapunov of the end of the nineteenth century.
A very interesting mathematical model in Aeolian research is the “sand pile problem”, motivated by the detachment, transport and deposition of sediments by wind. Among different approaches [30], Prigozhin in [52] has observed that the shape of a growing pile $z = u(x,t)$, $x \in \Omega \subset \mathbb{R}^2$, $t > 0$, of a cohesionless granular material, being characterized by its angle of repose $\alpha > 0$, is constrained through its surface slope, i.e. $|\nabla u(x,t)| \le \gamma = \tan\alpha$. This condition is complementary to a general conservation of mass in the form
\[ \frac{\partial u}{\partial t} + \nabla\cdot\Phi(u,\nabla u) = f, \]
where the source of material $f$ is given. The horizontal projection of the material flux $\Phi$ in the case without convection is directed along the steepest descent, $\Phi = -\mu\nabla u$, and subject to the unilateral condition
\[ \mu \ge 0, \quad |\nabla u| \le \gamma \quad\text{and}\quad |\nabla u| < \gamma \implies \mu = 0. \]
This model corresponds to an evolutionary variational inequality, may be regarded as a limit case of the p-Laplacian diffusion when p → ∞ [6] and has been extended in several directions. The case γ = G(u) corresponds to a quasi-variational inequality and has applications to other critical-state problems, such as magnetization of type-II superconductors, formation of networks of lakes and rivers, and was preceded by problems in elastic-plastic deformations (see [56] for a recent mathematical treatment and references).
The Planet Earth is full of very complex FBPs and challenges to the mathematical modeler, as the mere example of a sand pile or an avalanche may illustrate. As a simple toy problem to illustrate the variational inequality approach, we describe the free boundary evolution for the sand pile with constant convection in one space dimension, with slope $\gamma = 1$ and constantly increasing source. We may find the explicit solution that solves the following simple problem: Find $u = u(\cdot,t) \in K = \{\nu \in H_0^1(0,1) : |\nu'(x)| \le 1\}$, $t > 0$, such that:
\[ \int_0^1 \Big( \frac{\partial u}{\partial t}(t) + \frac{\partial u}{\partial x}(t) - t \Big)\,(\nu - u(t))\,dx \ge 0, \quad t > 0, \quad \forall \nu \in K, \]
with initial condition $u(x,0) = -x^2/2$ if $0 \le x \le \xi(0)$, and $u(x,0) = x - 1$ if $\xi(0) \le x \le 1$. Choosing as initial free boundary $\xi(0) = \sqrt{3} - 1$, a second free boundary point $\varsigma(t) < \xi(t)$ appears for $t > 1$ and the solution attains the steady state exactly at $t = 5/4$, when $\varsigma(5/4) = \xi(5/4) = 1/2$ (see figure at times t = 0, 3/4, 8/9 and 5/4).
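The choice of the initial free boundary in the toy problem above is forced by continuity: the two branches of the initial condition agree exactly where x² + 2x − 2 = 0, that is at x = √3 − 1. The short Python sketch below checks this, together with the membership of the initial datum in the convex set K (zero boundary values and slope bounded by 1). It only verifies the stated initial data and does not solve the variational inequality; the names u0, u0_slope and check_initial_data are hypothetical helpers introduced for illustration.

```python
import math

XI0 = math.sqrt(3.0) - 1.0  # initial free boundary xi(0)


def u0(x):
    """Initial profile u(x, 0) of the toy sand pile problem."""
    return -x * x / 2.0 if x <= XI0 else x - 1.0


def u0_slope(x):
    """Derivative of u(x, 0) away from the free boundary point."""
    return -x if x < XI0 else 1.0


def check_initial_data(n=1000):
    """Return (jump at xi(0), largest |slope| on a grid, u0(0), u0(1))."""
    jump = abs((-XI0 * XI0 / 2.0) - (XI0 - 1.0))
    grid = [i / n for i in range(n + 1)]
    max_slope = max(abs(u0_slope(x)) for x in grid)
    return jump, max_slope, u0(0.0), u0(1.0)
```

On the left branch the slope has absolute value x < 1, so the gradient constraint is inactive there, while on the right branch the slope equals 1 and the constraint is saturated; the point ξ(0) separates these two regions.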
[Figure: Satellite image and simulated rivers and lakes of the Réunion island, by Barrett and Prigozhin (FBP2012) http://www.uni-regensburg.de/Fakultaeten/nat Fak I/fbp2012/FBP2012 files/Talks/Prigozhin talk.pdf]
It is interesting to notice that this approach is applicable to other geological problems, such as lakes and rivers, by taking also a quasi-variational inequality model that allows the solution to describe “rivers” running out of “lakes” and flowing in the steepest descent direction until reaching the next horizontal surface of a lake. By approximating the continuous problem, the zero-repose-angle limit of a growing sand pile model, with a network of fluxes along the edges of a triangulation, Barrett and Prigozhin [10] were able in 2012 to provide a numerical simulation using real public-domain data (from the Shuttle Radar Topography Mission) of the Réunion island in the Indian Ocean, with a mesh of 504 thousand triangles, obtaining a very realistic picture.
Raising Awareness of Mathematics and MPE2013
The “Mathematics of Planet Earth” initiative (MPE2013) aims to be an important occasion for showing the essential relevance of mathematical sciences in planetary
issues at research level for solving some of the greatest challenges of our century. The scope is broad, as we may see from the four themes compiled by C. Rousseau [57], which range from “a planet to discover” (oceans, meteorology and climate, mantle processes, natural resources, celestial mechanics, cartography) to “a planet at risk” (climate change, sustainable development, epidemics, invasive species, natural disasters), not forgetting that “our planet supports life” (ecology, biodiversity, evolution) and is “organized by humans and structured by civilization” (political, economic, social and financial systems, transport organization and communication networks, management of resources and energy). As with the World Mathematical Year 2000, it should also be an opportunity for stimulating a collective reflection on the great challenges of the 21st century, on the role of mathematics as a key for development and on the importance of the image of mathematics in the public understanding.
As observed by Jones [31], “since the Earth is not available for experimentation, climate science relies on mathematical models to make up its ‘laboratory’”; the mathematical community should therefore address this topic and also be aware “that the greatest challenges as well as the greatest promise for novel and innovative mathematical thinking is at this interface between data and models”. On the other hand, for the no less important task of Raising Public Awareness of mathematics at the educational and societal levels, mathematicians should be encouraged to write expository versions of popular mathematical lectures, for instance, on global change, as in [35], or on sand piles and avalanches, as in [15] and [1]. But any other well-established Public Awareness of some terrestrial phenomenon X may be chosen for Raising the Public Awareness of Mathematics, as in the example of X = wildfires, illustrated with interesting ideas in [44]. Among several initiatives promoted by MPE2013, the international competition for modules for a virtual global exhibition, whose launch has UNESCO patronage, and the research initiatives announced for 2013, in particular in Europe by the ERCOM centers, deserve special reference.
Acknowledgements. During the preparation of the lecture, which was sponsored by the London Mathematical Society, and of this article, the author has profited from several conversations and references kindly shared with B. Almeida, J. I. Diaz, K. Hutter, H. Leitão, L. Prigozhin, L. Santos, D. Tarzia and J. P. Xavier.
References
[1] C. Acary-Robert, D. Dutykh, and M. Gisclon, Une approche pour simuler des avalanches de neige. Images des Mathématiques, CNRS, 2011. http://images.math.cnrs.fr/Une-approche-pour-simuler-des.html
[2] G. L. Alexanderson and W. S. Greenwalt, About the cover: Billingsley’s Euclid in English. Bull. AMS, vol. 49 no. 1 (2012), 163–167.
[3] B. Almeida, On the origins of Dee’s mathematical programme: The John Dee-Pedro Nunes connection. Studies in History and Philosophy of Science, Part A, vol. 43 no. 3 (2012), 460–469.
[4] B. Almeida and H. Leitão, Pedro Nunes (1502–1578). Mathematics, cosmography and nautical Science in the 16th century. http://pedronunes.fc.ul.pt (2009), Accessed 15 September 2012.
[5] H. W. Alt and C. J. van Duijn, A free boundary problem involving a cusp: breakthrough of salt water. Interfaces and Free Boundaries 2 (2000), 21–72.
[6] G. Aronson, L. C. Evans, and Y. Wu, Fast/Slow Diffusion and Growing Sandpiles. J. Diff. Eq. 131 (1996), 304–335.
[7] V. Babich, On the Mathematical Works of S. L. Sobolev in the 1930’s. In: V. Maz’ya (ed.) Sobolev Spaces in Mathematics II, International Mathematical Series, Springer (2009), 1–9.
[8] C. Baiocchi, Sur un problème à frontière libre traduisant le filtrage de liquides à travers le milieu poreux. C. R. Acad. Sci. Paris 273-A (1971), 1215–1217.
[9] C. Baiocchi and A. Capelo, Disequazioni variazionali e quasivariazionali. Applicazioni a problemi di frontiera libera, Vol. 1, 2. Pitagora Editrice, Bologna (1978) (English transl. Wiley, 1983).
[10] J. W. Barrett and L. Prigozhin, Lakes and Rivers in the Landscape: A Quasi-Variational Inequality Approach. Communication to 12th International Conference on Free Boundary Problems: Theory & Applications, Germany, 11–15 June 2012. http://www.uni-regensburg.de/Fakultaeten/nat Fak I/fbp2012/
[11] T. Brahe, Astronomiæ instauratæ mechanica. Wandsbek 1598. English transl. in: Tycho Brahe’s Description of his Instruments and Scientific Work, København 1946; http://www.kb.dk/en/nb/tema/webudstillinger/brahe mechanica/index.html
[12] F. Braudel, Civilization and Capitalism 15th–18th Centuries, The Structures of Everyday Life (vol. 1). 1979 (English translation, 1982).
[13] N. Crane, Mercator, The Man who Mapped the Planet. Weidenfeld & Nicolson, London, 2002.
[14] L. Caffarelli, The regularity of free boundaries in higher dimensions. Acta Math. 139 (1977), 156–184. (see also: Indiana Univ. Math. J. 27 (1978), 73–77.)
[15] P. Cannarsa and S. Finzi Vita, Pile di sabia, dune, valanghe: modelli matematici per la material granulare. Lettera Matematica PRISTEM 70-71 (2009), 47–61.
[16] A. Davies, Behaim, Martellus and Columbus. The Geographical Journal vol. 143, no. 3 (Nov., 1977), 451–459.
[17] J. Dee, The Project Gutenberg EBook of The Mathematicall Praeface to Elements of Geometrie of Euclid of Megara, by John Dee. http://www.gutenberg.org/files/22062/22062-h/main.html
[18] S. S. Demidov, Création et développement de la théorie des équations différentielles aux dérivées partielles dans les travaux de J. d’Alembert. Rev. Hist. Sci. XXXV (1982), 3–42.
[19] J. I. Diaz (Ed.), The Mathematics of Models for Climatology and Environment. Proceedings of the NATO Advanced Institute, January 11–21, 1995, Tenerife, Spain, Springer-Verlag Berlin, 1997.
[20] A. Estácio dos Reis, Pedro Nunes’ Nonius. In: The Practice of Mathematics in Portugal. Acta Universitatis Conimbrigensis, Coimbra 2004, 195–223.
[21] M. A. Fontelos and A. I. Muñoz, A free boundary problem in glaciology: The motion of grounding lines. Interfaces and Free Boundaries 9 (2007), 67–93.
[22] J. Fourier, Sur le refroidissement séculaire du globe terrestre. Annales de Chimie et de Physique 13 (1820), 418–438 (partially in Oeuvres, II, 271–288).
[23] J. Fourier, Remarques générales sur la température du globe terrestre et des espaces planétaires. Annales de Chimie et de Physique 27 (1824), 136–167 (also in Oeuvres, II, 97–125).
[24] F. Gomes Teixeira, Traité des Courbes Spéciales Remarquable Planes et Gauches, Ouvrage couronné et publié par l’Académie Royale des Sciences de Madrid, Traduit de l’espagnol, revu et très augmenté, Tome II. Coimbra 1909, Reprint 1995 Ed. J. Gabay, Paris.
[25] R. D’Hollander, La théorie de la Loxodromie de Pedro Nunes. In: Proceedings of 2002 International Conference on Petrii Nonii Salaciensis Opera, Universidade de Lisboa, 2003, 63–111.
[26] E. Halley, An Easie Demonstration of the Analogy of the Logarithmick Tangents to the Meridian Line. Philosophical Transactions of the Royal Society of London 19 (1696), 199–214.
[27] T. L. Heath, A History of Greek Mathematics (2 vols.). Oxford, 1921.
[28] J. de Herrera, Institvcion de la Academia Real Mathematica. Madrid, 1584 (Reprint in 1995, by Instituto de Estudios Madrileños).
[29] R. Hooykaas, The Rise of Modern Science: When and Why? British Journal for History of Science 20(4) (1987), 453–473.
[30] K. Hutter and N. Kirchner (Eds.), Dynamic Response of Granular and Porous Materials under Large and Catastrophic Deformations. Springer, Berlin–Heidelberg, 2003.
[31] C. K. T. Jones, Will climate change mathematics(?) IMA J. Appl. Math. 76 (2011), 353–370.
[32] S. L. Kamenomostskaya, On Stefan Problem. Nauchnye Doklady Vysshey Shkoly, Fiziko-Matematicheskie Nauki 1(1) (1958), 60–62 (in Russian), Zbl 0143.13901. See also Matematicheskii Sbornik 53(95) (4), (1961) 489–514, MR 0141895, Zbl 0102.09301.
[33] V. J. Katz, A History of Mathematics. Harper-Collins College Publishers, New York, 1993.
[34] B. Khouider, A. J. Majda, and S. N. Stechmann, Climate Science in the Tropics: Waves, Vortices and PDEs. Nonlinearity (in print, 2013).
[35] R. Klein, Mathematics in the Climate of Global Change, Chap. 15. “Mathematics Everywhere” M. Aigner, E. Behrends (Eds.). American Mathematical Society, Providence, R.I. 2010, 197–216.
[36] E. Knobloch, Nunes’ “Book on Twilights”. In: Proceedings of the 2002 International Conference on Petrii Nonii Salaciensis Opera, Universidade de Lisboa, Lisboa, 2003, 113–140.
[37] G. Lamé and B. P. Clapeyron, Memoire sur la solidification par refroidissement d’un globe liquide. Annales Chimie Physique 47 (1831), 250–256.
[38] H. Leitão, Ars e ratio: A náutica e a constituição da ciência moderna. In: M. V. Maroto and M. E. Piñeiro (Eds.), La ciencia y el mar (183–207). Valladolid (2006): Los autores.
[39] J. Leray, Sur le mouvement d’un liquide visqueux emplissant l’espace. Acta Mathematica 63 (1934), 193–248.
[40] J. Leray, Aspects de la mécanique théorique des fluides, La Vie des Sciences. Comptes Rendus de l’Académie des Sciences, Paris, Série Générale, 11 (1994), 287–290.
[41] J.-L. Lions, El Planeta Tierra, El papel de las Matemáticas y de los super ordenadores. Instituto de España, Madrid, 1990.
[42] J.-L. Lions, R. Temam, and S. Wang, New formulations of the primitive equations of the atmosphere and applications and On the equations of the large-scale ocean. Nonlinearity 5 (1992), 237–288 and 1007–1053.
[43] J.-L. Lions, R. Temam, and S. Wang, Problème à frontière libre pour les modèles couplés de l’océan et de l’atmosphère. C. R. Acad. Sci. Paris, Sér. I Math. 318 (1994), 1165–1171.
[44] S. Markvorsen, From PA(X) to RPAM(X). In: E. Behrends, N. Crato and J. F. Rodrigues (Eds.) Raising Public Awareness of Mathematics. Springer, Berlin–Heidelberg 2012, 255–267.
[45] A. Meirmanov, The Muskat problem for a viscoelastic filtration. Interfaces and Free Boundaries 13 (2011), 463–484.
[46] M. Milankovitch, Théorie mathématique des phénomènes thermiques produits par la radiation solaire. Gauthier-Villars et Cie, Paris, 1920.
[47] J. von Neumann, Can we survive Technology? Fortune, June 1955 (also in Collected Works, vol. VI, Pergamon Press, 1963; see also: Population and Development Review 12, no. 1 March 1986, 117–126).
[48] P. Nunes, Obras, vol. 1. Academia de Ciências de Lisboa (1940). A new edition of six volumes of Nunes works has been done by the Fundação Calouste Gulbenkian, Lisbon 2002–2010. See direct links to the digitized original works in http://pedronunes.fc.ul.pt/works.html
[49] O. A. Oleinik, A method of solution of the general Stefan problem (in Russian). Doklady Akademii Nauk SSSR 135, 1050–1057, MR 0125341, Zbl 0131.09202.
[50] M. Petcu, R. Temam, and M. Ziane, Some Mathematical Problems in Geophysical Fluid Dynamics. Handbook of Numerical Analysis 14 (2009), 577–750.
[51] L. Poinsot, Théorie nouvelle de la rotation des corps. J. Math. Pures et Appl. 1er s., t. 16 (1851), 289–336.
[52] L. Prigozhin, Sandpiles and river networks: Extended systems with nonlocal interactions. Phys. Rev. E 49 (1994), 1161–1167 (see also: Eur. J. Appl. Math. 7 (1996), 225–235).
[53] H. L. Resnikoff and R. O. Wells, Mathematics in Civilization. Dover, New York, 1973.
[54] J. F. Rodrigues, The variational inequality approach to the one-phase Stefan problem. Acta Appl. Math. 8 (1987), 1–35.
[55] J. F. Rodrigues, Mathematical Treatment of Free Boundary Problems. ESF Communications #28, April 1993, 18–19 http://newsletter.fbpnews.org, Accessed 15 September 2012.
[56] J. F. Rodrigues and L. Santos, Quasivariational Solutions for First Order Quasilinear Equations with Gradient Constraint. Arch. Rational Mech. Anal. 205 (2012), 493–514.
[57] C. Rousseau, Four themes with potential examples of modules for a virtual exhibition on the “Mathematics of Planet Earth”. Centro Internacional de Matemática Bulletin #30 Jul 2011, 31–32.
[58] S. L. Sobolev, Generalized solutions to the wave equation (Russian). In: Proc. the 2nd All-Union Math. Congr. (Leningrad, 24–30 June 1934), Vol. 2 (1936), p. 259. Akad. Nauk SSSR, Moscow–Leningrad.
[59] V. A. Solonnikov, On the stability of nonsymmetric equilibrium figures of a rotating viscous incompressible liquid. Interfaces and Free Boundaries 6 (2004), 461–492.
[60] J. Stefan, Über die Theorie der Eisbildung, insbesondere über die Eisbildung im Polarmeere. Sitzungsber. Wien. Akad. Mat. Nat. 98 (1889), 965–983 (see also pp.: 473–484, 614–634 and 1418–1442).
[61] D. Tarzia, A Bibliography on Moving-Free Boundary Problems for the Heat-Diffusion Equation. The Stefan and Related Problems. MAT, Series A: Conferencias, seminarios y trabajos de matemática. Univ. Austral., Rosario, #2 (2000), 1–297, MR 1802028, Zbl. 0963.35207.
[62] E. Zilsel, The Sociological Roots of Science. In: The social origins of modern science, by E. Zilsel, Kluwer Academic Publ., Dordrecht, 2003.
José Francisco Rodrigues, Centro de Matemática e Aplicações Fundamentais, Universidade de Lisboa, Av. Prof Gama Pinto, 2, 1649-003 Lisboa, Portugal
E-mail: [email protected]
Turing’s mathematical work P. D. Welch∗
Abstract. We sketch a brief outline of the mathematical, and in particular the logical, achievements of Turing in this, his centenary year.
2010 Mathematics Subject Classification. Primary 01-A70; Secondary 03-02.
Keywords. Turing, computable numbers, ordinal logics.
1. Introduction
This is the centenary year of Alan Mathison Turing’s birth: there have been many celebrations of the life and work of the man, with a veritable accumulation point around his birthday, June 23rd. It would have been impossible 20 years ago to imagine this year’s stream of events, a considerable portion of which is not restricted to academic circles, but is in the very public eye: perhaps the arc of his life strikes a particular chord as someone emblematic of his country’s history, his milieu, of a time past. As is appropriate for a Proceedings of this type, in this review we intend to take stock of his purely mathematical contributions, and put aside his war-time coding work, and the post-1945 work on the development of computers, and of morphogenesis on which he was working when he died. In this we have made a somewhat personal choice of his papers. This includes some of his unpublished work. His major contributions are in mathematical logic and I concentrate largely on explications of his two main papers there. However, within logic, there are a number of papers (and unpublished work) on type theory that are perhaps a bit too specialised or too dated for this account, so we have simply decided to omit any discussion of them. For a full list of his papers the reader should of course consult the mathematical volumes of the Collected Works [17] and [18] as well as the further volumes on Computation and Morphogenesis for his work there. This account is, in the main, chronological. His mathematical upbringing was in a conventional English public school. Sherborne College, where he was sent as a boarder (as his parents lived abroad – a fate of many children of that class in the Britain of that era), appears not to have exerted a great influence on him scientifically. He seems to have educated himself in many respects: he evidenced a lively curiosity in all things scientific, in matters biological, mechanical and physical from a young age.
At home he would make up chemistry experiments, and there is a charming drawing by his mother of Alan playing hockey for his team – except that he isn’t: the teams are engaged on the horizon, and Turing is bent over a flower examining it in the foreground. (He expressed interest in the puzzle of how flowers know how to grow – a question that stayed with him and resurfaced in his last work on morphogenesis.) He showed mathematical strength certainly, but without evidencing any Gauss-like precocity. However it was enough for him to win a scholarship to King’s College, Cambridge in 1931 (although he failed to get one to Trinity, his first choice, and at that time the acme for the aspiring mathematician or scientist). At Sherborne he had been interested in relativity, and the relatively new field of quantum mechanics. He had read Arthur Eddington’s “The Nature of the Physical World”. Eddington was an astrophysicist who, in Britain at least, was famous for having led the expedition to verify, during the 1919 solar eclipse, Einstein’s predictions on the gravitational effects on the curvature of light. For the undergraduate Turing, the scientific luminaries at that time in Cambridge would have been Eddington himself, who would be an early influence, and G. H. Hardy. His undergraduate tutor at King’s College was the group theorist Philip Hall. Besides Hardy he also read and absorbed von Neumann’s “Mathematische Grundlagen der Quantenmechanik”, also a topic of enduring interest. During his undergraduate studies he is supposed to have given an improved proof of a theorem of Sierpiński, but what that was, or which theorem it was, has been lost in the mists of time.
∗ The author is grateful to the Isaac Newton Institute, Cambridge for a Visiting Research Fellowship during part of the preparation of this paper.
2. The Central Limit Theorem
In 1933 he attended Eddington’s lectures entitled “The Distribution of Measurements in Scientific Experiments.” This must have sparked his curiosity more than usual, since he distilled for himself a mathematical problem from Eddington’s heuristic description, which he then proceeded to solve. This resulted in a theorem: in fact it was the Central Limit Theorem, but to Turing this was unknown. It seems to be a pattern throughout his life that he would endeavour to work things out for himself, preferably from first principles. For a young mathematician it is perhaps excusable not to be conversant with the relevant literature, but this tendency of working from scratch seems to have stayed with him. The version of the Central Limit Theorem he proved had been discovered 12 years earlier by the Finnish mathematician Jarl Lindeberg [10]. S. Zabell [21] gives an account of the history of the Central Limit Theorem and a full discussion of Turing’s proof and its context. De Moivre’s original formulation had been in terms of expressing the probability of success, $S_n$, after $n$ trials from an infinite sequence $X_1, X_2, \ldots, X_n, \ldots$ of random variables, and its approximation to the Gaussian Error function. Turing developed a condition for convergence that follows from Lindeberg’s convergence condition. Lindeberg’s condition was later (1935) shown by Feller and Lévy to be necessary. Feller also discovered a subsequence phenomenon if his condition failed. Turing anticipated this by also demonstrating that a subsequence of the $X_i$ would contribute a set of values converging to the Gaussian limit. He also proved a special case of the later (1936) theorem of Cramér: If X and Y
are independent, and X + Y is Gaussian, then X and Y are Gaussian. Turing showed just the special (and simpler) case that if it is additionally assumed that X is Gaussian then Y can be deduced to be Gaussian. The other insight stressed by Zabell is that, whereas earlier statements and proofs (and textbook versions) of the Central Limit Theorem were in terms of densities that needed stronger assumptions, Turing realised that the best results would be obtained by working with distribution functions rather than densities throughout – an insight used also by Lindeberg to get the optimal results. Turing remained interested in statistical theories throughout his life. The article by I. J. Good in [17] gives an account of Turing’s ideas concerning statistical evidence from the Bletchley Park years:
“Turing did not publish these war-time statistical ideas because, after the war, he was too busy working on the ground floor of computer science and artificial intelligence. I was impressed by the importance of his statistical ideas, for other applications, and developed and published some of them in various places.” (Good, [8] p. 211)
Notwithstanding the lack of priority, Philip Hall encouraged him to write up this work as a Fellowship Dissertation for the King’s College competition in 1934, which he did, under the title On the Gaussian Error Function. This was accepted on 16 March 1935, Hall arguing that the rediscovery of a known theorem was a significant enough sign of Turing’s strength (which he argued had not yet achieved its full potential). Turing thus won a 3 year Fellowship, renewable for another 3, with £300 per annum with room and board. He was 22 years old. Probably under Hall’s influence his first published paper was in group theory. This was a small and easily stated advance on a recent theorem of von Neumann’s. The latter had defined two notions of ‘left’ and ‘right’ periodicity in [20] but had missed the fact that they are equivalent. Turing proved this, and it appeared as a two page paper in April 1935 [11]. By coincidence von Neumann arrived on a sabbatical visit from Princeton that month and proceeded to lecture on the subject; it is from this time that the two must have been acquainted.
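The content of the Central Limit Theorem discussed in this section, stated in terms of distribution functions rather than densities, can be illustrated numerically. The Python sketch below is not Turing’s or Lindeberg’s argument; it is a small simulation under assumptions chosen purely for illustration: the summands are independent but not identically distributed, each uniform on an interval [−a_i, a_i] with half-widths cycling through 1 to 5, and the function names are hypothetical. It compares the empirical distribution function of the standardized sum with the Gaussian distribution function at a few points.

```python
import math
import random


def standardized_sums(n_terms=200, n_samples=4000, seed=0):
    """Sample the sum of n_terms independent uniform variables on
    [-a_i, a_i], divided by the standard deviation of the sum."""
    rng = random.Random(seed)
    half_widths = [1.0 + (i % 5) for i in range(n_terms)]
    # The variance of a uniform variable on [-a, a] is a^2 / 3.
    scale = math.sqrt(sum(a * a / 3.0 for a in half_widths))
    return [
        sum(rng.uniform(-a, a) for a in half_widths) / scale
        for _ in range(n_samples)
    ]


def normal_cdf(z):
    """Distribution function of the standard Gaussian law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_gap(samples, grid=(-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)):
    """Largest gap on the grid between the empirical and Gaussian
    distribution functions."""
    n = len(samples)
    return max(
        abs(sum(s <= z for s in samples) / n - normal_cdf(z)) for z in grid
    )
```

Calling `max_cdf_gap(standardized_sums())` gives a small number, consistent with convergence of the distribution functions. Bounded summands with unbounded total variance, as assumed here, are a simple case in which Lindeberg’s condition holds.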
3. “On Computable Numbers”
Probably more decisive than meeting von Neumann was his contact with Max Newman. In Spring 1935 he went on a Part III course of Newman’s on the Foundations of Mathematics. (Part III courses at Cambridge were, and are, of a level beyond the usual undergraduate curriculum but preparatory to undertaking a research career.) Newman was a topologist, and interested in the theory of sets. Newman had attended Hilbert’s lecture at the 1928 International Congress of Mathematicians where three strands of the latter’s ‘Program’ were stated. Hilbert had worked on Foundational matters for the previous decades and would continue to do so. His aim to obtain secure foundations for mathematics by finding proofs of consistency of large parts (if not all) of mathematics by a process
P. D. Welch
of systematic axiomatisation, and then showing that these axiomatisations were safe by providing finite consistency proofs, looked both reasonable and possible. By systematic effort Hilbert and his school had reduced the question of the consistency of geometry to that of analysis. There seemed reasonable hope that genuinely finitary methods of proof could render arithmetic provably consistent within finite arithmetical means. Hilbert’s program might be summarised as tripartite.
• (I Completeness) The question, or rather Hilbert’s belief, that mathematics was complete: that is, given any properly formulated mathematical proposition P, either a proof of P could be found, or a disproof.
• (II Consistency) The question of consistency: given a set of axioms for, say, arithmetic, such as the Dedekind–Peano axioms, PA, could it be shown that no proof of a contradiction can possibly arise? Hilbert stringently wanted a proof of consistency that was finitary, that made no appeal to infinite objects or methods.
• (III Decidability – the Entscheidungsproblem) Could there be a finitary process or algorithm that would decide for any properly formulated proposition P whether it was derivable from axioms or not?
Of course the main interest was consistency, but there was also hope (discernible from some of the writings of the Göttingen group) that there was a positive solution to the Entscheidungsproblem. However, as is well known, Gödel’s Incompleteness Theorems block Hilbert’s program.
Theorem 3.1. (Gödel–Rosser First Incompleteness Theorem – 1931) For any theory T containing a moderate amount of arithmetical strength, with T having an effectively given list of axioms: if T is consistent, then it is incomplete, that is, for some proposition P neither $T \vdash P$ nor $T \vdash \neg P$.
The theorem is, deliberately, written out in a semi-modern form. Here, it suffices that T contain the Dedekind–Peano axioms, PA, to qualify as having a ‘moderate amount of arithmetical strength’.
The axioms of PA can be written out as an ‘effectively given’ list, since although the axioms of PA include an infinite list of instances of the Induction Axiom, we may write out an effective prescription for listing them. Hence PA satisfies the theorem’s hypothesis. Gödel had used a version of the system of Principia Mathematica of Russell and Whitehead but was explicit in saying that the theorem had a wide applicability to sufficiently strong “formal systems” (although without being able to specify completely what that meant). This immediately established that PA is incomplete, as is any theory containing the arithmetic of PA. This destroyed any prospect of the full resolution of his program that Hilbert had hoped for. Within a few months there was more to come:
Theorem 3.2. (Gödel’s Second Incompleteness Theorem – 1931) For any consistent T as above, containing the axioms of PA, the statement that ‘T is consistent’ (when formalised as ‘$\mathrm{Con}_T$’) is an example of such an unprovable sentence. Symbolically: $T \nvdash \mathrm{Con}_T$.
The First Theorem thus demonstrated the incompleteness of any such formal system, and the Second the impossibility of demonstrating the consistency of the
Turing’s mathematical work
system by the means of formal proof within that system. The first two of Hilbert’s questions were thus negatively answered. What was left open by this was the Entscheidungsproblem. That there might be some effective or finitary process is not ruled out by the Incompleteness Theorems. But what could such a process be like? How could one prove something about a putative system that was not precisely described, and certainly not mathematically formulated?
Church and the λ-calculus
One attempt at resolving this final issue was the system of functional equations called the “λ-calculus” of Alonzo Church. This gave a strict, but rather forbidding, formalism for writing out terms defining a class of functions from base functions and a generalised recursion or induction scheme. Church had only established that the simple number successor function was “λ-definable”, when his future PhD student Stephen Cole Kleene arrived in 1931; by 1934 Kleene had shown that all the usual number theoretic functions were also λ-definable. They used the term “effectively calculable” for the class of functions that could be computed in the informal sense of effective procedure or algorithm alluded to above. Church ventured that the notion of λ-definability should be taken to coincide with “effectively calculable”.
Church’s Thesis (1934 – First version, unpublished) The effectively calculable functions coincide with the λ-definable functions.
At first Kleene tried to refute this by a diagonalisation argument along the lines of Cantor’s proof of the uncountability of the real numbers. He failed in this but instead produced a theorem: the Recursion Theorem. Gödel’s view of the suggestion contained in the thesis, when Church presented it to him, was that it was “thoroughly unsatisfactory.” Gödel meanwhile had formulated an expanded notion of the primitive recursive functions that he had used in his Incompleteness papers; these became known as the Herbrand–Gödel general recursive functions.
He lectured on these in 1934 whilst visiting the IAS, Princeton. Church and Kleene were in the audience, and seem to have decided to switch to the perhaps more mathematically appealing general recursive functions. By 1935 Church could show that there was no λ-formula deciding “A conv B”, that is, no λ-definable test of whether the λ-terms A and B were convertible to each other within the λ-calculus. Moreover, mostly by the work of Kleene, they could show the λ-definable functions were co-extensive with the general recursive functions. Putting this “non-λ-definable-conversion” property together with this last fact, there was therefore a problem which, when coded in number theory, could not be solved using general recursive functions. This was published by Church [6]. Another thesis was formulated:
Church’s Thesis (1936 – second version) The effectively calculable functions coincide with the [H-G] general recursive functions.
Gödel still indicated at the time that the issue was unresolved, and that he was unsure that the general recursive functions captured all informally calculable functions.
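
As a rough illustration of what it means for the successor function and the usual arithmetic functions to be “λ-definable”, the following minimal sketch encodes natural numbers as Church numerals using Python lambdas. The names zero, succ, add, mul, to_int and from_int are chosen here for illustration and are not Church’s or Kleene’s notation; Python also evaluates eagerly, so this is only an analogy to the λ-calculus, not an implementation of it.

```python
# Church numerals: the numeral for n is the function that applies f to x exactly n times.
zero = lambda f: lambda x: x

# Successor: apply f once more than n does.
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# Multiplication: repeat "apply f n times" m times.
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(numeral):
    """Decode a Church numeral by counting applications of +1 starting from 0."""
    return numeral(lambda k: k + 1)(0)


def from_int(k):
    """Build the Church numeral for a non-negative Python integer k."""
    numeral = zero
    for _ in range(k):
        numeral = succ(numeral)
    return numeral
```

For instance, to_int(add(from_int(2))(from_int(3))) decodes the numeral obtained by adding the numerals for 2 and 3. Everything is expressed by function application alone, which is the sense in which such functions are definable by λ-terms.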
“On Computable Numbers”
Newman and Turing were unaware of these developments in Princeton. Turing’s classic paper’s first subject is ostensibly ‘Computable Numbers’ and is said to be only “with an application to the Entscheidungsproblem”. He starts by restricting his domain of interest to the natural numbers, although he says it is almost as easy to deal with computable functions of computable real numbers, but he will deal with integers as being the ‘least cumbrous.’ He briefly initiates the discussion by calling computable numbers those ‘calculable by finite means.’
In the first Section he compares a man computing a real number to a machine with a finite number of states or ‘m-configurations’ q1, . . . , qR. The machine is supplied with a ‘tape’ divided into cells, each capable of containing a single symbol from a finite alphabet. The machine is regarded as scanning, and being aware of, only the single symbol in the cell being viewed at any moment in time. The possible behaviour of the machine is determined only by the current state qn and the current scanned symbol Sr, which together make up the current configuration of the machine. The machine may operate on the scanned square by erasing the scanned symbol or writing a symbol. It may move one square along the tape to the left, or to the right. It may also change its m-configuration. He says that some of the symbols written will represent the decimal expansion of the real number being computed, and others (subject to erasure) will be for scratch work. He thus envisages the machine continuously producing output, rather than halting at some stage. It is his contention that “these operations include all those which are used in the computation of a number.” His intentions are often confused with statements such as ‘Turing viewed any machine calculation as reducible to one on a Turing machine’ or some thesis of this form.
Or that he had ‘distilled the essence of machine computability down to that of a Turing machine.’ He explicitly warns us that no “real justification will be given for these definitions until Section 9.”
In Section 2 he goes on to develop a theory of his “automatic” or a-machines, giving and discussing some definitions. He also states: “For some purposes we may use machines whose motion is only partly determined. When such a machine reaches one of these ambiguous configurations, it cannot go on until some arbitrary choice has been made. . . ” Having thus in two sentences prefigured the notion of what we now call a non-deterministic Turing machine, he says that he will stick in the current paper only to a-machines, and will drop the ‘a’. He remarks that such a non-deterministic machine ‘could be used to deal with axiomatic systems.’ (He is probably thinking here of the choices that need to be made when developing a proof line-by-line in a formal system.)
The succeeding sections develop the theory of the machines: the theory of a “universal machine” is explicitly described, as is in particular the conception of program as input or stored data; further, the mathematical argument using Cantor’s diagonalisation technique is given, to show the impossibility of determining by a machine whether a machine program was ‘circular’ (that is, writing only finitely many output symbols) or not. (Thus, as he does not consider a complete computation as a halted one, he instead considers first the problem of whether one
can determine a looping behaviour.)
Section 9, “The extent of the computable numbers”, is in some ways the heart of the paper, in particular for later discussions of the so-called ‘Turing’ or ‘Church–Turing’ theses. It is possibly of a unique character for a paper in a purely mathematical journal of that date (although perhaps reminiscent of Cantor’s discussions on the nature of infinite sets in Mathematische Annalen). He admits that any argument that any number calculable by a human is “computable” (i.e. in his machine sense) is bound to hang on intuition and so be mathematically somewhat unsatisfactory. He argues that the basis of the machine’s construction earlier in the paper is grounded on an analysis, which he then proceeds to give, of what a human computor does when calculating. This is done by appealing to the obvious finiteness conditions of human capabilities: the possibilities of surveying the writing paper, observing symbols together with their writing and erasing. It is important to see that this analysis should be taken prior to the machine’s description. (Indeed one can imagine the paper re-ordered with this section placed at the start.) He had asked: “What are the possible processes which can be carried out in computing a real number” [My emphasis]. It is as if the difference between the Princeton approach and Turing’s is that the former appeared to be concentrating on discovering a definition whose extension covered in one blow the notion of effectively calculable, whereas Turing concentrated on process, the very act of calculating. According to Gandy [7], Turing has in this section in fact proved a theorem, albeit one with unusual subject matter. What has been achieved is a complete analysis of human computation in terms of finiteness of the human acts of calculation broken down into discrete, simple, and locally determined steps. Hence:
Turing’s Thesis: Anything that is humanly calculable is computable by a Turing machine.
(i) Turing provides a philosophical paradigm of analysis when defining “effectively calculable”: a vague intuitive notion is given a unique meaning which moreover can be stated with complete precision.
(ii) He also makes possible a completely precise understanding of what is a ‘formal system’, thereby making an exact statement of Gödel’s results possible (see the quotation below). He claims to have a machine that will enumerate the theorems of predicate calculus. This also makes possible a correct formulation of Hilbert’s 10th problem. It is important to note in this regard that Turing thus makes expressions along the lines of “such and such a proposition is undecidable” have mathematical content.
(iii) In the final 4 pages he gives his solution to the Entscheidungsproblem. He proves that there is no machine that will decide of any formula ϕ of the predicate calculus whether it is derivable or not.
He was 23. His mentor and teacher Max Newman was astonished, and at first reacted with disbelief. He had achieved what the combined mental resources of
Hilbert’s Göttingen school and Princeton had not, and in the most straightforward, direct, even simple manner. Within 14 months of starting to attend the Foundations of Mathematics course he had solved the last general open problem associated with Hilbert’s program. However this triumph was then tempered by the arrival of Church’s preprint of [5], which came just after Turing’s proof was read by Newman. The latter however convinced the London Mathematical Society that the two approaches were sufficiently different to warrant publication; this was done in November 1936, with an appendix demonstrating that the machine approach was co-extensional with the λ-definable functions, and with Church as referee. Gödel again:¹
“When I first published my paper about undecidable propositions the result could not be pronounced in this generality, because for the notions of mechanical procedure and of formal system no mathematically satisfactory definition had been given at that time. . . The essential point is to define what a procedure is.”
“That this really is the correct definition of mechanical computability was established beyond any doubt by Turing.”
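
The machine model described in this section (a finite table of m-configurations, a tape of squares, one scanned square at a time) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function run, the dictionary layout and the particular table are invented here, the table being a small machine that prints 0 and 1 alternately on every other square and never halts, in keeping with the view of a machine as continuously producing output.

```python
from collections import defaultdict

BLANK = " "

# (m-configuration, scanned symbol) -> (symbol to write, head move, next m-configuration)
# Illustrative table (an assumption of this sketch): print 0 and 1 alternately,
# leaving a blank square between consecutive figures.
TABLE = {
    ("b", BLANK): ("0", +1, "c"),
    ("c", BLANK): (BLANK, +1, "e"),
    ("e", BLANK): ("1", +1, "f"),
    ("f", BLANK): (BLANK, +1, "b"),
}


def run(table, start, steps):
    """Run the machine for at most `steps` moves and return the visited tape as a string."""
    tape = defaultdict(lambda: BLANK)  # unvisited squares read as blank
    state, head = start, 0
    for _ in range(steps):
        key = (state, tape[head])
        if key not in table:
            break  # no applicable entry: the machine cannot move
        symbol, move, state = table[key]
        tape[head] = symbol  # write (or erase, by writing a blank)
        head += move         # move one square right (+1) or left (-1)
    if not tape:
        return ""
    lo, hi = min(tape), max(tape)
    return "".join(tape[i] for i in range(lo, hi + 1))
```

Calling run(TABLE, "b", 8) executes eight moves; since the machine never stops of its own accord, the step bound is what ends the simulation. The behaviour at each move depends only on the pair (m-configuration, scanned symbol), which is the finiteness condition the analysis of Section 9 is meant to justify.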
¹ There are several approving quotes from Gödel; this is taken from an unpublished (and ungiven) Lecture in the Nachlass: Gödel, Collected Works, Vol III, p. 166–168.
4. Normal Numbers
Turing’s unpublished “A Note on Normal Numbers” (in [17]) dates presumably from about 1936 (the manuscript is on the reverse of some pages of a proof copy of the “On Computable Numbers” manuscript). The notion of normal number is due to Borel who showed, measure theoretically and hence highly non-constructively, that almost all real numbers are normal. A number, say in (0, 1), is called normal if in every base, every block of digits of the same length occurs with the same limit frequency. Thus, in a binary expansion 0 and 1 must each occur half the number of times, each of the blocks 00, 01, 10 and 11 one quarter of the times and so on. As Turing’s typescript starts out:
Although it is known that almost all numbers are normal no example of a normal number has been given. I propose to show how normal numbers may be constructed and to prove that almost all numbers are normal constructively.
Becher [1] gives an account of the typescript and accompanying manuscript notes. In the latter Turing gives a partial example due to his friend David Champernowne in the explicit base 10 only: 0.1234567891011121314 . . . obtained by simply stringing together all base 10 numerals one after the other (so a ‘semi-normal’ number). So this example had a simple description. Turing asserts that his solution, although constructive – it makes use of his own new theory of computable reals –
does not give what he calls a ‘convenient’ solution, such as exemplified by Champernowne’s number. Nevertheless it is perfectly constructive, and indeed Turing uses this word in his paper rather than ‘computable’ which would have been perfectly appropriate. Both Sierpiński and Lebesgue gave constructions of normal numbers, but these proofs are not finitary and so not computable or constructive in the modern sense; Becher speculates that perhaps these previous proofs put Turing off from publishing his own note.
Theorem 4.1 (Turing). We can find a constructive function c(k, n) of two integer variables with values in finite sets of pairs of rational numbers such that, for each k and n, if $E_{c(k,n)} = (a_1, b_1) \cup \cdots \cup (a_m, b_m)$ denotes the finite union of the intervals whose rational endpoints are the pairs given by c(k, n), then $E_{c(k,n)} \subset E_{c(k,n-1)}$ and the measure of $E_{c(k,n)}$ is greater than $1 - \frac{1}{k}$. Further, for each k, $E(k) = \bigcap_n E_{c(k,n)}$ has measure $1 - \frac{1}{k}$ and consists entirely of normal numbers.
Becher et al. ([2]) have reconstructed the proof of the following second theorem (see her discussion in [1] of this in relation to the introductory note of J. L. Britton in [17] which had questioned the veracity of this theorem). It produces explicitly computable normal numbers:
Theorem 4.2 (Turing). There is a rule whereby given an integer k, and an infinite sequence θ of zeros and ones, we can find a normal number α(k, θ) ∈ (0, 1) and in such a way that for a fixed k these numbers form a set of measure at least $1 - \frac{2}{k}$, and so that the first n digits of θ determine α(k, θ) to within $2^{-n}$.
In modern day terms, the ‘rule’ is a computable algorithm, and when the sequence θ is a computable one, then the output is a computable normal number. Becher points out that the time complexity of the algorithm needed to produce the n’th digit of α(k, θ) is doubly exponential in n, and such appears to be the best to date.
(They also note that an ‘effectivized’ version of Sierpiński’s argument also gives a doubly exponential time algorithm.) There it is also remarked that the proof shows that random numbers (a later concept related to work of Martin-Löf, and others) are all normal.
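
The definition of normality and Champernowne’s base 10 example can be explored numerically. The sketch below generates the digits of 0.123456789101112... and tabulates the relative frequency of digit blocks in a finite prefix; the function names and the prefix length are choices made for this illustration. A finite prefix can of course prove nothing about limit frequencies, and this is not Turing’s construction, only an illustration of the definition.

```python
from collections import Counter
from itertools import count, islice


def champernowne_digits():
    """Yield the base-10 digits of 0.123456789101112... one at a time."""
    for n in count(1):
        for ch in str(n):
            yield int(ch)


def block_frequencies(digits, block_len):
    """Relative frequency of each block of `block_len` consecutive digits (overlapping windows)."""
    windows = [
        tuple(digits[i:i + block_len])
        for i in range(len(digits) - block_len + 1)
    ]
    total = len(windows)
    return {block: c / total for block, c in Counter(windows).items()}


# Illustrative prefix length; any reasonably large value will do.
prefix = list(islice(champernowne_digits(), 10_000))
single = block_frequencies(prefix, 1)   # frequencies of single digits 0-9
pairs = block_frequencies(prefix, 2)    # frequencies of two-digit blocks
```

Normality in base 10 requires each single digit to have limit frequency 1/10, each two-digit block 1/100, and so on. Inspecting single and pairs for growing prefixes shows that the approach to these limits is slow and uneven, since the leading digits of the numerals being concatenated bias any finite prefix.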
5. Princeton Years
After the triumph of “On Computable Numbers” it was natural for Turing to visit Princeton, which he did in 1937, but he was somewhat dismayed to find only Church and Kleene there. (He had naturally hoped to meet Gödel, but their paths were not to cross.) He published quite quickly two papers on group theory (described in a letter to Philip Hall as ‘small papers, just bits and pieces’; nevertheless they appeared in Compositio [12] and Annals of Mathematics [13]). He had first asked von Neumann for a problem, and von Neumann passed on one from Ulam concerning the possibility of approximating continuous groups with finite ones, which Turing soon answered negatively in [13].
Let G be a multiplicative group with a product · and a metric d. Let ε > 0 be fixed. A finite group Hε with a product ◦ is said to be an ε-approximation to G if Hε ⊆ G and
(i) every x ∈ G is within distance ε of some h ∈ Hε;
(ii) a, b ∈ Hε ⇒ d(a ◦ b, a · b) < ε.
G itself is said to be approximable if it has an ε-approximation for every ε > 0. Turing then proved two theorems:
Theorem 5.1 (Turing). Let G be an approximable group with a faithful representation over complex matrices of degree n. Then G may be approximated by finite groups with faithful representations of the same degree n.
Theorem 5.2 (Turing). An approximable Lie group is compact and abelian.
The Compositio paper (which Turing had described in the letter as something ‘Baer thinks is worth publishing’) concerns the problem of determining the extensions of a given group G by a given group H inducing given classes of automorphisms.
He stayed on in Princeton on a Procter Fellowship (of these there were three, one each for candidates from Cambridge, Oxford and the Collège de France). He decided to work towards a Ph.D. under Church. He still had a King’s Fellowship, and thus a Ph.D. would not have been of great use to him in the Cambridge of that day. He completed his thesis in two years (even whilst grumbling about Church’s “suggestions which resulted in the thesis being expanded to an appalling length” – it is 106 pages).
To illustrate the thesis problem by an example (where we may think of $T_0$ as PA again), set: $T_1 : T_0 + \mathrm{Con}(T_0)$ where “$\mathrm{Con}(T_0)$” is some expression arising from the Incompleteness Theorems expressing that “$T_0$ is a consistent system”; as $\mathrm{Con}(T_0)$ is not provable from $T_0$, this is a deductively stronger theory; continuing, define: $T_{k+1} : T_k + \mathrm{Con}(T_k)$ for $k < \omega$, and then: $T_\omega = \bigcup_{k<\omega} T_k$.